Relation Model and Data Division
This document introduces table creation and data partitioning in Palo, as well as problems that may be encountered during table creation and their solutions.
Basic concepts
In Palo, data is logically described in the form of relational tables.
Row & Column
A table consists of rows and columns. A row is a single record of user data, and a column describes a field within a row.
In the default data model, columns are divided only into sorted and non-sorted columns. The storage engine sorts and stores the data by the sorted columns and builds a sparse index on them to look up the sorted data quickly.
In the aggregation model, columns are divided into two categories: Key and Value. From a business perspective, Key and Value correspond to dimension columns and metric columns respectively. In the aggregation model, rows with the same Key columns are aggregated into one row, and the aggregation method of each Value column is specified by the user when the table is created. Refer to the Data model document for more details on the aggregation model.
Partition & Tablet
In Palo's storage engine, user data is first divided into several Partitions, usually according to a partition column specified by the user, such as a time column. Within each Partition, the data is further divided into buckets by hashing: the values of the user-specified bucket columns are hashed to determine which bucket a row belongs to. Each bucket is a data Tablet, which is also the smallest logical unit of data division.
Tablets are stored independently and their data does not overlap. A Tablet is also the smallest physical storage unit for operations such as data movement and replication.
A Partition can be regarded as the smallest logical management unit. Data can be loaded into, or deleted from, a single Partition.
Data partition
We use a table creation example to illustrate data partitioning in Palo.
Table creation in Palo is a synchronous command: if the command returns successfully, the table has been created successfully.
Refer to CREATE TABLE for more help.
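If you are connected to Palo through the MySQL client, the full syntax can usually also be viewed with the HELP command:
HELP CREATE TABLE;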
This section introduces how to create a table in Palo through an example.
CREATE TABLE IF NOT EXISTS example_db.example_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "user id",
`date` DATE NOT NULL COMMENT "date and time of data injection",
`timestamp` DATETIME NOT NULL COMMENT "timestamp of data injection",
`city` VARCHAR(20) COMMENT "user's city",
`age` SMALLINT COMMENT "user's age",
`sex` TINYINT COMMENT "user's sex",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total consumption of user",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum dwelling time of users",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum dwelling time of users"
)
ENGINE=olap
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "3",
"storage_medium" = "SSD",
"storage_cooldown_time" = "2018-01-01 12:00:00"
);
Column definition
Here we only take the AGGREGATE KEY data model as an example. Refer to the Data model document for the other data models.
Refer to the CREATE TABLE document for the basic column types.
In the AGGREGATE KEY data model, all columns without a specified aggregation method (SUM, REPLACE, MAX, MIN) are regarded as Key columns, and the rest are Value columns.
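As a concrete reading of this rule against the example_tbl table above (the sample values below are made up purely to illustrate the aggregation behavior):
-- Key columns of example_tbl (no aggregation method specified):
--   user_id, date, timestamp, city, age, sex
-- Value columns of example_tbl (aggregation method specified):
--   last_visit_date (REPLACE), cost (SUM), max_dwell_time (MAX), min_dwell_time (MIN)
--
-- Two loaded rows whose Key columns are identical, for example
--   (10000, 2017-01-01, ..., cost = 20, max_dwell_time = 10)
--   (10000, 2017-01-01, ..., cost = 15, max_dwell_time = 2)
-- are stored as a single aggregated row:
--   (10000, 2017-01-01, ..., cost = 35, max_dwell_time = 10)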
Refer to the following suggestions for column definition:
- Key columns must precede all Value columns.
- Use integer types whenever possible, because the computation and lookup efficiency of integer types is much higher than that of strings.
- When choosing among integer types of different lengths, pick a length that is just sufficient.
- For the lengths of VARCHAR and CHAR types, likewise pick a length that is just sufficient.
- The total byte length of all columns (including Key and Value) cannot exceed 100KB.
Partition and bucket
Palo supports two levels of data partitioning. The first level is Partition, which supports only Range partitioning. The second level is Bucket (Tablet), which supports only Hash partitioning.
You can also use only a single level of partitioning. When a single level is used, only Bucket is supported.
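As an illustration of the single-level case, here is a minimal sketch of a table that uses only Hash bucketing, without a PARTITION BY clause (the table and column names are hypothetical):
CREATE TABLE IF NOT EXISTS example_db.session_log
(
`session_id` BIGINT NOT NULL COMMENT "session id",
`user_id` LARGEINT NOT NULL COMMENT "user id",
`duration` INT SUM DEFAULT "0" COMMENT "session duration"
)
ENGINE=olap
AGGREGATE KEY(`session_id`, `user_id`)
DISTRIBUTED BY HASH(`session_id`) BUCKETS 10
PROPERTIES ("replication_num" = "3");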
- Partition
- One or more partition columns can be specified, and partition columns must be Key columns. The usage of multi-column partitions is described in the Multi-column partition section below.
- No matter what type the partition column is, double quotes are required when writing partition values.
- The partition column is usually a time column, to facilitate the management of old and new data.
- In theory, there is no upper limit on the number of partitions.
- When a table is created without Partition, the system automatically generates a Partition with the same name as the table that covers the full value range. This Partition is invisible to the user and cannot be deleted or modified.
- Partition supports specifying only the upper bound through VALUES LESS THAN (...); the system takes the upper bound of the previous partition as the lower bound of the new partition, generating a left-closed, right-open interval. It also supports specifying both the upper and lower bounds through VALUES [...), which directly generates a left-closed, right-open interval.
- VALUES [...) is easy to understand because both bounds are given explicitly. Here is an example showing how partition ranges change when VALUES LESS THAN (...) is used to add or delete partitions (the corresponding ALTER TABLE statements are sketched after this example):
- When the table in the example above is created, the following three partitions are automatically generated:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201703: [2017-03-01, 2017-04-01)
- When we add partition p201705 VALUES LESS THAN ("2017-06-01"), the partitions are as follows:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201703: [2017-03-01, 2017-04-01)
p201705: [2017-04-01, 2017-06-01)
- Now we delete partition p201703, and the partitions are as follows:
p201701: [MIN_VALUE, 2017-02-01)
p201702: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
Note that the ranges of p201702 and p201705 have not changed, but there is now a hole between the two partitions: [2017-03-01, 2017-04-01). This means that data falling into this hole cannot be loaded.
- We continue and delete partition p201702, and the partitions are as follows:
p201701: [MIN_VALUE, 2017-02-01)
p201705: [2017-04-01, 2017-06-01)
The hole becomes: [2017-02-01, 2017-04-01)
- Now we add partition p201702new VALUES LESS THAN ("2017-03-01"), and the partitions are as follows:
p201701: [MIN_VALUE, 2017-02-01)
p201702new: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
The hole is reduced to: [2017-03-01, 2017-04-01)
- Now we delete partition p201701 and add partition p201612 VALUES LESS THAN ("2017-01-01"); the partitions are as follows:
p201612: [MIN_VALUE, 2017-01-01)
p201702new: [2017-02-01, 2017-03-01)
p201705: [2017-04-01, 2017-06-01)
A new hole appears: [2017-01-01, 2017-02-01)
- To sum up, deleting a partition does not change the ranges of the remaining partitions, but it may leave a hole. When a partition is added through a VALUES LESS THAN statement, its lower bound immediately follows the upper bound of the previous partition. Partitions with overlapping ranges cannot be added.
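The add and delete steps above can be expressed with ALTER TABLE statements. A sketch against the example_tbl table created earlier (partition names and bounds follow the walkthrough above):
-- add a partition by specifying only its upper bound
ALTER TABLE example_db.example_tbl ADD PARTITION p201705 VALUES LESS THAN ("2017-06-01");
-- drop an existing partition (this may leave a hole in the overall range)
ALTER TABLE example_db.example_tbl DROP PARTITION p201703;
-- add a partition by specifying both bounds explicitly (left-closed, right-open)
ALTER TABLE example_db.example_tbl ADD PARTITION p201702new VALUES [("2017-02-01"), ("2017-03-01"));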
- Bucket
- If Partition is used, the DISTRIBUTED ... clause describes how data is divided within each partition; if Partition is not used, it describes how the data of the whole table is divided.
- Multiple bucket columns can be specified, but they must be Key columns. The bucket columns can be the same as or different from the partition columns.
- The selection of bucket columns is a trade-off between query throughput and query concurrency (as sketched below):
- If multiple bucket columns are selected, the data is distributed more evenly. If a query condition does not contain equality conditions on all of the bucket columns, the query scans all buckets simultaneously, so query throughput increases and the latency of a single query decreases. This approach suits high-throughput, low-concurrency query scenarios.
- If only one or a few bucket columns are selected, a point query can hit a single bucket. When many point queries run concurrently, they are likely to hit different buckets, so the I/O interference between queries is small (especially when the buckets are spread over different disks). This approach suits high-concurrency point-query scenarios.
- Theoretically, there is no upper limit on the number of buckets.
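As a sketch of this trade-off, using the example_tbl table above, the two DISTRIBUTED BY clauses below lean toward concurrency and throughput respectively:
-- fewer bucket columns: a point query with an equality condition on user_id hits a single
-- bucket, which suits high-concurrency point-query scenarios
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
-- more bucket columns: data is spread more evenly, and a query without equality conditions
-- on all three columns scans all buckets in parallel, which suits high-throughput,
-- low-concurrency scenarios
DISTRIBUTED BY HASH(`user_id`, `city`, `age`) BUCKETS 16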
- Suggestions on the number and data volume of Partitions and Buckets:
- The total number of Tablets in a table is equal to (Partition num * Bucket num).
- Without considering expansion, the recommended number of Tablets in a table is slightly more than the number of disks in the whole cluster.
- In theory, there is no upper or lower bound on the data volume of a single Tablet, but a range of 1GB - 10GB is recommended. If the data volume of a single Tablet is too small, the data aggregation effect is poor and the metadata management pressure is high. If it is too large, it hinders the migration and repair of replicas and increases the cost of retrying failed Schema Change or Rollup operations (the retry granularity of these operations is a Tablet).
- When the data volume principle and the quantity principle of Tablets conflict, it is recommended to give priority to the data volume principle.
- When a table is created, the number of Buckets in each partition is specified uniformly. However, when a partition is added dynamically (ADD PARTITION), the number of Buckets in the new partition can be specified separately (as sketched below). This makes it easy to deal with shrinking or growing data volumes.
- Once the number of Buckets in a Partition is specified, it cannot be changed. Therefore, when determining the number of Buckets, cluster expansion should be considered in advance. For example, if there are currently only 3 hosts, each with 1 disk, and the number of Buckets is set to 3 or less, then adding machines later will not increase concurrency.
- Here is an example: suppose there are 10 BEs, each with one disk. If the total size of a table is 500MB, 4-8 Tablets can be considered; for 5GB, 8-16 Tablets; for 50GB, 32 Tablets; for 500GB, it is recommended to split the table into partitions of about 50GB each, with 16-32 Tablets per partition; for 5TB, likewise partitions of about 50GB each, with 16-32 Tablets per partition.
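As referenced above, a sketch of adding a new partition whose bucket count differs from the one specified at table creation (the partition name and bound are hypothetical):
ALTER TABLE example_db.example_tbl
ADD PARTITION p201706 VALUES LESS THAN ("2017-07-01")
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;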
Note: the data volume of a table can be viewed with the SHOW DATA command, and the result should be divided by the number of replicas to get the actual data volume of the table.
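For example, a sketch of checking the data volume of the example table:
SHOW DATA FROM example_db.example_tbl;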
Multi-column partition
Palo supports specifying multiple columns as partition columns. The example is as follows:
PARTITION BY RANGE(`date`, `id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
In the above example, we specify date (DATE type) and id (INT type) as the partition columns. The resulting partitions are as follows:
* p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
* p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
* p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
Note that in the last partition, the user only specifies the partition value of the date column, so the partition value of the id column is filled with MIN_VALUE by default. When data is loaded, the partition column values of each row are compared in order to determine the corresponding partition. Examples:
* Data --> Partition
* 2017-01-01, 200 --> p201701_1000
* 2017-01-01, 2000 --> p201701_1000
* 2017-02-01, 100 --> p201701_1000
* 2017-02-01, 2000 --> p201702_2000
* 2017-02-15, 5000 --> p201702_2000
* 2017-03-01, 2000 --> p201703_all
* 2017-03-10, 1 --> p201703_all
* 2017-04-01, 1000 --> Unable to load
* 2017-05-01, 1000 --> Unable to load