Baidu AI Cloud
中国站

百度智能云

Data Warehouse

CREATE-TABLE

CREATE TABLE

Description

This command is used to create a table. This article mainly introduces the syntax of creating tables self-maintained by Palo. For syntax of external tables, refer to [CREATE EXTERNAL TABLE](PALO/Operating Manual/Data Load /Synchronize Data Through External Table.md#) Document.

CREATE TABLE [IF NOT EXISTS] [database.]table
(
    column_definition_list,
    [index_definition_list]
)
[engine_type]
[keys_type]
[table_comment]
[partition_info]
distribution_info
[rollup_list]
[properties]
[extra_properties]
  • column_definition_list

    Column definition list:

    column_definition[, column_definition]

    • column_definition

      Column definition:

      column_name column_type [KEY] [aggr_type] [NULL] [default_value] [column_comment]

      • column_type

        Column type, the following types are supported:

        TINYINT(1 byte)
           Range: -2^7 + 1 ~ 2^7 - 1
        SMALLINT(2 bytes)
            Range: -2^15 + 1 ~ 2^15 - 1
        INT(4 bytes)
            Range: -2^31 + 1 ~ 2^31 - 1
        BIGINT(8 bytes)
           Range: -2^63 + 1 ~ 2^63 - 1
        LARGEINT(16 bytes)
            Range: -2^127 + 1 ~ 2^127 - 1
        FLOAT(4 bytes)
            support scientific counting method
        DOUBLE(12 bytes)
            support scientific counting method
        DECIMAL[(precision, scale)] (16 bytes)
            guarantee the decimal type of precision, default to DECIMAL(10, 0)
            precision: 1 ~ 27
            scale: 0 ~ 9
            In which, the integer part is 1 ~ 18
            not support the scientific counting method
        DATE(3 bytes)
            Range: 0000-01-01 ~ 9999-12-31
        DATETIME(8 bytes)
            Range:0000-01-01 00:00:00 ~ 9999-12-31 23:59:59
        CHAR[(length)]
            fixed-length string. Range of length: 1 ~ 255. Default to 1
        VARCHAR[(length)]
            Variable-length string. Range of length: 1 ~ 65533. Default to 1
        HLL (1~16385 bytes)
            HyperLogLog column type,there is no need to specify the length and default value.The length is controlled in the system according to the aggregation degree of data.
           Must be used in combination with HLL_UNION aggregation type.
        BITMAP
            bitmap column type,there is no need to specify the length and default value.It represents the sets of integers, elements support up to 2^64 - 1.
            Must be used in combination with BITMAP_UNION aggregation type.
      • aggr_type

        Aggregation type, the following types are supported:

        SUM:sum.Applicable to numeric type.
        MIN:minimum value. Applicable to numeric type.
        MAX:maximum value. Applicable to numeric type.
        REPLACE:replace. For rows with the same dimension column, the index columns will be imported in the order of import, and replace the one imported previously with the one imported later.
        REPLACE_IF_NOT_NULL:replace if not null. The difference between it with REPLACE lies in only replacing values not null.
        HLL_UNION:aggregation method for columns of HLL type, aggregate by HyperLogLog algorithm.
        BITMAP_UNION:aggregation method for columns of BIMTAP type, aggregate the union set of bitmaps. 

      Example:

        ​```
        k1 TINYINT,
        k2 DECIMAL(10,2) DEFAULT "10.5",
        k4 BIGINT NULL DEFAULT VALUE "1000" COMMENT "This is column k4",
        v1 VARCHAR(10) REPLACE NOT NULL,
        v2 BITMAP BITMAP_UNION,
        v3 HLL HLL_UNION,
        v4 INT SUM NOT NULL DEFAULT "1" COMMENT "This is column v4"
        ​```
  • index_definition_list

    Index list definition:

    index_definition[, index_definition]

    • index_definition

      Index definition:

      INDEX index_name (col_name[, col_name, ...]) [USING BITMAP] COMMENT 'xxxxxx'

      Example:

      INDEX idx1 (k1, k2) USING BITMAP COMMENT "This is a bitmap index"
  • engine_type

    Table engine type. All types in this document are OLAP. For other external table engine types, refer to CREATE EXTERNAL TABLE document. Example:

    ENGINE=olap

  • key_desc

    Data model.

    key_type(col1, col2, ...)

    key_type supports the following models:

    • DUPLICATE KEY(default): columns specified after that are ordering columns.
    • AGGREGATE KEY:the columns specified after that are dimensional columns.
    • UNIQUE KEY:the columns specified after that are key columns.

    Example:

    DUPLICATE KEY(col1, col2),
    AGGREGATE KEY(k1, k2, k3),
    UNIQUE KEY(k1, k2)
  • table_comment

    Table comment. Example:

    COMMENT "This is my first PALO table"
  • partition_desc

    Partition information, support two kinds of writing:

    1. LESS THAN:only define the upper bound of partition. Lower bound is determined by the upper bound of last partition.

      PARTITION BY RANGE(col1[, col2, ...])
      (
          PARTITION partition_name1 VALUES LESS THAN MAXVALUE|("value1", "value2", ...),
          PARTITION partition_name2 VALUES LESS THAN MAXVALUE|("value1", "value2", ...)
      )
    2. FIXED RANGE: define the left-closed and right open interval of partition.

      PARTITION BY RANGE(col1[, col2, ...])
      (
          PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)),
          PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, ))
      )
  • distribution_desc

    Define the data bucketing method.

    DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]

  • rollup_list

    Multiple materialized views (ROLLUP) can be created while creating tables.

    ROLLUP (rollup_definition[, rollup_definition, ...])

    • rollup_definition

      rollup_name (col1[, col2, ...]) [DUPLICATE KEY(col1[, col2, ...])] [PROPERTIES("key" = "value")]

      Example:

      ROLLUP (
          r1 (k1, k3, v1, v2),
          r2 (k1, v1)
      )
  • properties

    Set table properties. Currently the following properties are supported:

    • replication_num

      Replication number. The default replication number is 3. If the number of BE node is less than 3, it is required to specify the replication number less than or equal to the number of BE node.

    • storage_medium/storage_cooldown_time

      Data storage medium.storage_medium is used to declare the initial storage medium of table data, while storage_cooldown_time is used to set the expiration time. Example:

      "storage_medium" = "SSD",
      "storage_cooldown_time" = "2020-11-20 00:00:00"

      This example indicates that the data is stored in SSD and will be automatically migrated to HDD storage after it expires at 00: 00: 00 on 2020-11-20.

    • colocate_with

      When it needs to use Colocation Join function, use this parameter to set Colocation Group.

      "colocate_with" = "group1"

    • bloom_filter_columns

      Users specify the list of column names to be added with Bloom Filter index. The Bloom Filter index of each column is independent, not combined.

      "bloom_filter_columns" = "k1, k2, k3"

    • in_memory

      Set whether the table is Memory Table through this property.

      "in_memory" = "true"

    • function_column.sequence_type

      When using the UNIQUE KEY model, a sequence column can be specified. When the KEY columns are the same, REPLACE will be done by the sequence column (use the larger value to replace the smaller value, otherwise the replacement cannot be executed)

      Here we only need to specify the type of sequence columns, time type or integer type is supported. Palo will create a hidden sequence column.

      "function_column.sequence_type" = 'Date'

    • Dynamic partition related

      Parameters related to dynamic partition are as follows:

      • dynamic_partition.enable: used to specify whether the dynamic partitioning function at table level is turned on. Default to true。
      • dynamic_partition.time_unit: Used to specify the time unit for dynamically adding partitions, which can be selected as DAY, WEEK and MONTH
      • dynamic_partition.start: Used to specify how many partitions are deleted forward. The value must be less than 0. Default to Integer.MIN_VALUE。
      • dynamic_partition.end: Used to specify the number of partitions created in advance. The value must be greater than 0.
      • dynamic_partition.prefix: Used to specify the prefix of the created partition name. for example, if the prefix of the partition name is p, the partition name will be automatically created as p20200108
      • dynamic_partition.buckets: Used to specify the number of partition and buckets automatically created

Example

  1. Create a detailed model table

    CREATE TABLE example_db.table_hash
    (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        k3 CHAR(10) COMMENT "string column",
        k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
    )
    COMMENT "my first table"
    DISTRIBUTED BY HASH(k1) BUCKETS 32
  2. Create a detailed model table, partition and specify the ordering column, set replication number as 1

    CREATE TABLE example_db.table_hash
    (
        k1 DATE,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        k3 CHAR(10) COMMENT "string column",
        k4 INT NOT NULL DEFAULT "1" COMMENT "int column"
    )
    DUPLICATE KEY(k1, k2)
    COMMENT "my first table"
    PARTITION BY RANGE(k1)
    (
        PARTITION p1 VALUES LESS THAN ("2020-02-01"),
        PARTITION p1 VALUES LESS THAN ("2020-03-01"),
        PARTITION p1 VALUES LESS THAN ("2020-04-01")
    )
    DISTRIBUTED BY HASH(k1) BUCKETS 32
    PROPERTIES (
        "replication_num" = "1"
    );
  3. Create a unique key model table, set the initial storage medium and cooldown time

    CREATE TABLE example_db.table_hash
    (
        k1 BIGINT,
        k2 LARGEINT,
        v1 VARCHAR(2048) REPLACE,
        v2 SMALLINT SUM DEFAULT "10"
    )
    UNIQUE KEY(k1, k2)
    DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
    PROPERTIES(
        "storage_medium" = "SSD",
        "storage_cooldown_time" = "2015-06-04 00:00:00"
    );
  4. Create an aggregate model table and describe by partition with fixed zone

    CREATE TABLE table_range
    (
        k1 DATE,
        k2 INT,
        k3 SMALLINT,
        v1 VARCHAR(2048) REPLACE,
        v2 INT SUM DEFAULT "1"
    )
    AGGREGATE KEY(k1, k2, k3)
    PARTITION BY RANGE (k1, k2, k3)
    (
        PARTITION p1 VALUES [("2014-01-01", "10", "200"), ("2014-01-01", "20", "300")),
        PARTITION p2 VALUES [("2014-06-01", "100", "200"), ("2014-07-01", "100", "300"))
    )
    DISTRIBUTED BY HASH(k2) BUCKETS 32
  5. Create an aggregate model table containing HLL and BITMAP column types

    CREATE TABLE example_db.example_table
    (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        v1 HLL HLL_UNION,
        v2 BITMAP BITMAP_UNION
    )
    ENGINE=olap
    AGGREGATE KEY(k1, k2)
    DISTRIBUTED BY HASH(k1) BUCKETS 32
  6. Create two tables self-maintained by same Colocation Group.

    CREATE TABLE t1 (
        id int(11) COMMENT "",
        value varchar(8) COMMENT ""
    )
    DUPLICATE KEY(id)
    DISTRIBUTED BY HASH(id) BUCKETS 10
    PROPERTIES (
        "colocate_with" = "group1"
    );
    
    CREATE TABLE t2 (
        id int(11) COMMENT "",
        value1 varchar(8) COMMENT "",
        value2 varchar(8) COMMENT ""
    )
    DUPLICATE KEY(`id`)
    DISTRIBUTED BY HASH(`id`) BUCKETS 10
    PROPERTIES (
        "colocate_with" = "group1"
    );
  7. Create a memory table with bitmap index and bloom filter index

    CREATE TABLE example_db.table_hash
    (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        v1 CHAR(10) REPLACE,
        v2 INT SUM,
        INDEX k1_idx (k1) USING BITMAP COMMENT 'my first index'
    )
    AGGREGATE KEY(k1, k2)
    DISTRIBUTED BY HASH(k1) BUCKETS 32
    PROPERTIES (
        "bloom_filter_columns" = "k2",
        "in_memory" = "true"
    );
  8. Create a dynamic partition table.

    Create partitions for three days in the table in advance every day, and delete partitions three days before. For example, today is 2020-01-08, then the partitions named as p20200108, p20200109, p20200110, p20200111 will be created. The ranges of such partitions are respectively:

    [types: [DATE]; keys: [2020-01-08]; ‥types: [DATE]; keys: [2020-01-09]; )
    [types: [DATE]; keys: [2020-01-09]; ‥types: [DATE]; keys: [2020-01-10]; )
    [types: [DATE]; keys: [2020-01-10]; ‥types: [DATE]; keys: [2020-01-11]; )
    [types: [DATE]; keys: [2020-01-11]; ‥types: [DATE]; keys: [2020-01-12]; )
    CREATE TABLE example_db.dynamic_partition
    (
        k1 DATE,
        k2 INT,
        k3 SMALLINT,
        v1 VARCHAR(2048),
        v2 DATETIME DEFAULT "2014-02-04 15:36:00"
    )
    DUPLICATE KEY(k1, k2, k3)
    PARTITION BY RANGE (k1) ()
    DISTRIBUTED BY HASH(k2) BUCKETS 32
    PROPERTIES(
        "dynamic_partition.time_unit" = "DAY",
        "dynamic_partition.start" = "-3",
        "dynamic_partition.end" = "3",
        "dynamic_partition.prefix" = "p",
        "dynamic_partition.buckets" = "32" 
    );
  9. Create a table with materialized view (ROLLUP).

    CREATE TABLE example_db.rolup_index_table
    (
        event_day DATE,
        siteid INT DEFAULT '10',
        citycode SMALLINT,
        username VARCHAR(32) DEFAULT '',
        pv BIGINT SUM DEFAULT '0'
    )
    AGGREGATE KEY(event_day, siteid, citycode, username)
    DISTRIBUTED BY HASH(siteid) BUCKETS 10
    ROLLUP (
        r1(event_day,siteid),
        r2(event_day,citycode),
        r3(event_day)
    )
    PROPERTIES("replication_num" = "3");

Keyword

CREATE, TABLE

Best Practices

Partition and bucketing

In a table, a bucketing column must be specified, and a partition may not. Detailed introduction of partition and bucketing can be found in Data Division document.

Tables in Palo can be divided into partitioned tables and non-partitioned tables. This property is determined when the table is created, and cannot be changed afterwards. For partitioned tables, operations such as addition, deletion can be done during the process of use, while for non-partitioned tables, such operations are not allowed.

At the same time, the partition columns and bucketing columns cannot be changed after the table is created, neither the type of partition column and bucketing column can be changed, nor any addition or deletion operation can be performed on these columns.

So it is suggested to confirm the usage mode before building the table.

Dynamic Partition

The dynamic partition function is mainly used to help users manage partitions automatically. After set with related rules, Palo system regularly adds new partitions or deletes historical partitions. For more help, refer to Dynamic Partition document.

Materialized Views

Users can create multiple materialized views (ROLLUP) while building tables. Materialized views can also be added after the table is created. It can facilitate users to create all materialized views at one time by writing it in table building statements.

If materialized views are created during table building process, all subsequent data import operations will generate materialized view data synchronously. The number of materialized views may affect the efficiency of data import.

If materialized views are added during later use, if there is data in the table, the creation time of materialized views depends on the current data amount.

Introduction to materialized views can be found in Materialized Views.

Index

Users can create indexes of multiple columns while building tables. Indexes can also be added after the table is created.

When adding index in the later use, if there is data existing in the table, it is required to rewrite all the data, and the index creation time will be determined by the current data volume.

Memory Table

When the "in_memory" = "true" property is specified when building table, Palo will cache the data blocks of this table in the PageCache of the storage engine as far as possible to reduce the disk IO. However, this property cannot guarantee the data block is stored in the memory all the time but only works as an indicator of best-effort.

Previous
CREATE-FILE
Next
CREATE-REPOSITORY