CREATE-TABLE
CREATE TABLE
Description
This command is used to create a table. This article mainly introduces the syntax of creating tables self-maintained by Palo. For syntax of external tables, refer to [CREATE EXTERNAL TABLE](PALO/Operating Manual/Data Load /Synchronize Data Through External Table.md#) Document.
CREATE TABLE [IF NOT EXISTS] [database.]table
(
column_definition_list,
[index_definition_list]
)
[engine_type]
[keys_type]
[table_comment]
[partition_info]
distribution_info
[rollup_list]
[properties]
[extra_properties]
-
column_definition_list
Column definition list:
column_definition[, column_definition]
-
column_definition
Column definition:
column_name column_type [KEY] [aggr_type] [NULL] [default_value] [column_comment]
-
column_type
Column type, the following types are supported:
TINYINT(1 byte) Range: -2^7 + 1 ~ 2^7 - 1 SMALLINT(2 bytes) Range: -2^15 + 1 ~ 2^15 - 1 INT(4 bytes) Range: -2^31 + 1 ~ 2^31 - 1 BIGINT(8 bytes) Range: -2^63 + 1 ~ 2^63 - 1 LARGEINT(16 bytes) Range: -2^127 + 1 ~ 2^127 - 1 FLOAT(4 bytes) support scientific counting method DOUBLE(12 bytes) support scientific counting method DECIMAL[(precision, scale)] (16 bytes) guarantee the decimal type of precision, default to DECIMAL(10, 0) precision: 1 ~ 27 scale: 0 ~ 9 In which, the integer part is 1 ~ 18 not support the scientific counting method DATE(3 bytes) Range: 0000-01-01 ~ 9999-12-31 DATETIME(8 bytes) Range:0000-01-01 00:00:00 ~ 9999-12-31 23:59:59 CHAR[(length)] fixed-length string. Range of length: 1 ~ 255. Default to 1 VARCHAR[(length)] Variable-length string. Range of length: 1 ~ 65533. Default to 1 HLL (1~16385 bytes) HyperLogLog column type,there is no need to specify the length and default value.The length is controlled in the system according to the aggregation degree of data. Must be used in combination with HLL_UNION aggregation type. BITMAP bitmap column type,there is no need to specify the length and default value.It represents the sets of integers, elements support up to 2^64 - 1. Must be used in combination with BITMAP_UNION aggregation type.
-
aggr_type
Aggregation type, the following types are supported:
SUM:sum.Applicable to numeric type. MIN:minimum value. Applicable to numeric type. MAX:maximum value. Applicable to numeric type. REPLACE:replace. For rows with the same dimension column, the index columns will be imported in the order of import, and replace the one imported previously with the one imported later. REPLACE_IF_NOT_NULL:replace if not null. The difference between it with REPLACE lies in only replacing values not null. HLL_UNION:aggregation method for columns of HLL type, aggregate by HyperLogLog algorithm. BITMAP_UNION:aggregation method for columns of BIMTAP type, aggregate the union set of bitmaps.
Example:
``` k1 TINYINT, k2 DECIMAL(10,2) DEFAULT "10.5", k4 BIGINT NULL DEFAULT VALUE "1000" COMMENT "This is column k4", v1 VARCHAR(10) REPLACE NOT NULL, v2 BITMAP BITMAP_UNION, v3 HLL HLL_UNION, v4 INT SUM NOT NULL DEFAULT "1" COMMENT "This is column v4" ```
-
-
-
index_definition_list
Index list definition:
index_definition[, index_definition]
-
index_definition
Index definition:
INDEX index_name (col_name[, col_name, ...]) [USING BITMAP] COMMENT 'xxxxxx'
Example:
INDEX idx1 (k1, k2) USING BITMAP COMMENT "This is a bitmap index"
-
-
engine_type
Table engine type. All types in this document are OLAP. For other external table engine types, refer to CREATE EXTERNAL TABLE document. Example:
ENGINE=olap
-
key_desc
Data model.
key_type(col1, col2, ...)
key_type
supports the following models:- DUPLICATE KEY(default): columns specified after that are ordering columns.
- AGGREGATE KEY:the columns specified after that are dimensional columns.
- UNIQUE KEY:the columns specified after that are key columns.
Example:
DUPLICATE KEY(col1, col2), AGGREGATE KEY(k1, k2, k3), UNIQUE KEY(k1, k2)
-
table_comment
Table comment. Example:
COMMENT "This is my first PALO table"
-
partition_desc
Partition information, support two kinds of writing:
-
LESS THAN:only define the upper bound of partition. Lower bound is determined by the upper bound of last partition.
PARTITION BY RANGE(col1[, col2, ...]) ( PARTITION partition_name1 VALUES LESS THAN MAXVALUE|("value1", "value2", ...), PARTITION partition_name2 VALUES LESS THAN MAXVALUE|("value1", "value2", ...) )
-
FIXED RANGE: define the left-closed and right open interval of partition.
PARTITION BY RANGE(col1[, col2, ...]) ( PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1", "k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)), PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...), ("k1-upper1-2", MAXVALUE, )) )
-
-
distribution_desc
Define the data bucketing method.
DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]
-
rollup_list
Multiple materialized views (ROLLUP) can be created while creating tables.
ROLLUP (rollup_definition[, rollup_definition, ...])
-
rollup_definition
rollup_name (col1[, col2, ...]) [DUPLICATE KEY(col1[, col2, ...])] [PROPERTIES("key" = "value")]
Example:
ROLLUP ( r1 (k1, k3, v1, v2), r2 (k1, v1) )
-
-
properties
Set table properties. Currently the following properties are supported:
-
replication_num
Replication number. The default replication number is 3. If the number of BE node is less than 3, it is required to specify the replication number less than or equal to the number of BE node.
-
storage_medium/storage_cooldown_time
Data storage medium.
storage_medium
is used to declare the initial storage medium of table data, whilestorage_cooldown_time
is used to set the expiration time. Example:"storage_medium" = "SSD", "storage_cooldown_time" = "2020-11-20 00:00:00"
This example indicates that the data is stored in SSD and will be automatically migrated to HDD storage after it expires at 00: 00: 00 on 2020-11-20.
-
colocate_with
When it needs to use Colocation Join function, use this parameter to set Colocation Group.
"colocate_with" = "group1"
-
bloom_filter_columns
Users specify the list of column names to be added with Bloom Filter index. The Bloom Filter index of each column is independent, not combined.
"bloom_filter_columns" = "k1, k2, k3"
-
in_memory
Set whether the table is Memory Table through this property.
"in_memory" = "true"
-
function_column.sequence_type
When using the UNIQUE KEY model, a sequence column can be specified. When the KEY columns are the same, REPLACE will be done by the sequence column (use the larger value to replace the smaller value, otherwise the replacement cannot be executed)
Here we only need to specify the type of sequence columns, time type or integer type is supported. Palo will create a hidden sequence column.
"function_column.sequence_type" = 'Date'
-
Dynamic partition related
Parameters related to dynamic partition are as follows:
dynamic_partition.enable
: used to specify whether the dynamic partitioning function at table level is turned on. Default to true。dynamic_partition.time_unit:
Used to specify the time unit for dynamically adding partitions, which can be selected as DAY, WEEK and MONTHdynamic_partition.start
: Used to specify how many partitions are deleted forward. The value must be less than 0. Default to Integer.MIN_VALUE。dynamic_partition.end
: Used to specify the number of partitions created in advance. The value must be greater than 0.dynamic_partition.prefix
: Used to specify the prefix of the created partition name. for example, if the prefix of the partition name is p, the partition name will be automatically created as p20200108dynamic_partition.buckets
: Used to specify the number of partition and buckets automatically created
-
Example
-
Create a detailed model table
CREATE TABLE example_db.table_hash ( k1 TINYINT, k2 DECIMAL(10, 2) DEFAULT "10.5", k3 CHAR(10) COMMENT "string column", k4 INT NOT NULL DEFAULT "1" COMMENT "int column" ) COMMENT "my first table" DISTRIBUTED BY HASH(k1) BUCKETS 32
-
Create a detailed model table, partition and specify the ordering column, set replication number as 1
CREATE TABLE example_db.table_hash ( k1 DATE, k2 DECIMAL(10, 2) DEFAULT "10.5", k3 CHAR(10) COMMENT "string column", k4 INT NOT NULL DEFAULT "1" COMMENT "int column" ) DUPLICATE KEY(k1, k2) COMMENT "my first table" PARTITION BY RANGE(k1) ( PARTITION p1 VALUES LESS THAN ("2020-02-01"), PARTITION p1 VALUES LESS THAN ("2020-03-01"), PARTITION p1 VALUES LESS THAN ("2020-04-01") ) DISTRIBUTED BY HASH(k1) BUCKETS 32 PROPERTIES ( "replication_num" = "1" );
-
Create a unique key model table, set the initial storage medium and cooldown time
CREATE TABLE example_db.table_hash ( k1 BIGINT, k2 LARGEINT, v1 VARCHAR(2048) REPLACE, v2 SMALLINT SUM DEFAULT "10" ) UNIQUE KEY(k1, k2) DISTRIBUTED BY HASH (k1, k2) BUCKETS 32 PROPERTIES( "storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00" );
-
Create an aggregate model table and describe by partition with fixed zone
CREATE TABLE table_range ( k1 DATE, k2 INT, k3 SMALLINT, v1 VARCHAR(2048) REPLACE, v2 INT SUM DEFAULT "1" ) AGGREGATE KEY(k1, k2, k3) PARTITION BY RANGE (k1, k2, k3) ( PARTITION p1 VALUES [("2014-01-01", "10", "200"), ("2014-01-01", "20", "300")), PARTITION p2 VALUES [("2014-06-01", "100", "200"), ("2014-07-01", "100", "300")) ) DISTRIBUTED BY HASH(k2) BUCKETS 32
-
Create an aggregate model table containing HLL and BITMAP column types
CREATE TABLE example_db.example_table ( k1 TINYINT, k2 DECIMAL(10, 2) DEFAULT "10.5", v1 HLL HLL_UNION, v2 BITMAP BITMAP_UNION ) ENGINE=olap AGGREGATE KEY(k1, k2) DISTRIBUTED BY HASH(k1) BUCKETS 32
-
Create two tables self-maintained by same Colocation Group.
CREATE TABLE t1 ( id int(11) COMMENT "", value varchar(8) COMMENT "" ) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 10 PROPERTIES ( "colocate_with" = "group1" ); CREATE TABLE t2 ( id int(11) COMMENT "", value1 varchar(8) COMMENT "", value2 varchar(8) COMMENT "" ) DUPLICATE KEY(`id`) DISTRIBUTED BY HASH(`id`) BUCKETS 10 PROPERTIES ( "colocate_with" = "group1" );
-
Create a memory table with bitmap index and bloom filter index
CREATE TABLE example_db.table_hash ( k1 TINYINT, k2 DECIMAL(10, 2) DEFAULT "10.5", v1 CHAR(10) REPLACE, v2 INT SUM, INDEX k1_idx (k1) USING BITMAP COMMENT 'my first index' ) AGGREGATE KEY(k1, k2) DISTRIBUTED BY HASH(k1) BUCKETS 32 PROPERTIES ( "bloom_filter_columns" = "k2", "in_memory" = "true" );
-
Create a dynamic partition table.
Create partitions for three days in the table in advance every day, and delete partitions three days before. For example, today is
2020-01-08
, then the partitions named asp20200108
,p20200109
,p20200110
,p20200111
will be created. The ranges of such partitions are respectively:[types: [DATE]; keys: [2020-01-08]; ‥types: [DATE]; keys: [2020-01-09]; ) [types: [DATE]; keys: [2020-01-09]; ‥types: [DATE]; keys: [2020-01-10]; ) [types: [DATE]; keys: [2020-01-10]; ‥types: [DATE]; keys: [2020-01-11]; ) [types: [DATE]; keys: [2020-01-11]; ‥types: [DATE]; keys: [2020-01-12]; )
CREATE TABLE example_db.dynamic_partition ( k1 DATE, k2 INT, k3 SMALLINT, v1 VARCHAR(2048), v2 DATETIME DEFAULT "2014-02-04 15:36:00" ) DUPLICATE KEY(k1, k2, k3) PARTITION BY RANGE (k1) () DISTRIBUTED BY HASH(k2) BUCKETS 32 PROPERTIES( "dynamic_partition.time_unit" = "DAY", "dynamic_partition.start" = "-3", "dynamic_partition.end" = "3", "dynamic_partition.prefix" = "p", "dynamic_partition.buckets" = "32" );
-
Create a table with materialized view (ROLLUP).
CREATE TABLE example_db.rolup_index_table ( event_day DATE, siteid INT DEFAULT '10', citycode SMALLINT, username VARCHAR(32) DEFAULT '', pv BIGINT SUM DEFAULT '0' ) AGGREGATE KEY(event_day, siteid, citycode, username) DISTRIBUTED BY HASH(siteid) BUCKETS 10 ROLLUP ( r1(event_day,siteid), r2(event_day,citycode), r3(event_day) ) PROPERTIES("replication_num" = "3");
Keyword
CREATE, TABLE
Best Practices
Partition and bucketing
In a table, a bucketing column must be specified, and a partition may not. Detailed introduction of partition and bucketing can be found in Data Division document.
Tables in Palo can be divided into partitioned tables and non-partitioned tables. This property is determined when the table is created, and cannot be changed afterwards. For partitioned tables, operations such as addition, deletion can be done during the process of use, while for non-partitioned tables, such operations are not allowed.
At the same time, the partition columns and bucketing columns cannot be changed after the table is created, neither the type of partition column and bucketing column can be changed, nor any addition or deletion operation can be performed on these columns.
So it is suggested to confirm the usage mode before building the table.
Dynamic Partition
The dynamic partition function is mainly used to help users manage partitions automatically. After set with related rules, Palo system regularly adds new partitions or deletes historical partitions. For more help, refer to Dynamic Partition document.
Materialized Views
Users can create multiple materialized views (ROLLUP) while building tables. Materialized views can also be added after the table is created. It can facilitate users to create all materialized views at one time by writing it in table building statements.
If materialized views are created during table building process, all subsequent data import operations will generate materialized view data synchronously. The number of materialized views may affect the efficiency of data import.
If materialized views are added during later use, if there is data in the table, the creation time of materialized views depends on the current data amount.
Introduction to materialized views can be found in Materialized Views.
Index
Users can create indexes of multiple columns while building tables. Indexes can also be added after the table is created.
When adding index in the later use, if there is data existing in the table, it is required to rewrite all the data, and the index creation time will be determined by the current data volume.
Memory Table
When the "in_memory" = "true"
property is specified when building table, Palo will cache the data blocks of this table in the PageCache of the storage engine as far as possible to reduce the disk IO. However, this property cannot guarantee the data block is stored in the memory all the time but only works as an indicator of best-effort.