Full Data Export
Full data export (Export) is a data export feature provided by Palo.
It exports the data of a user-specified table, or of selected partitions of that table, in text format to remote storage such as HDFS or BOS through the Broker process.
This document introduces the basic usage of the Export feature.
Function introduction
Export is asynchronous. After the user specifies a table, or some of its partitions, in an EXPORT statement, the system generates a distributed data scanning plan. Multiple Compute Nodes then scan the data and write it to remote storage through the Broker process.
The minimum granularity of Export is a table partition.
Export is currently basic: it supports only exporting all columns of a table, and does not support mapping, filtering, or converting columns.
Submit export operation
Submit an export operation with the following statement:
EXPORT TABLE example_tbl
PARTITION(p1, p2)
TO "bos://my_bucket/export/"
WITH BROKER "bos"
(
"bos_endpoint" = "http://bj.bcebos.com",
"bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
"bos_secret_accesskey"="yyyyyyyyyyyyyyyyyyyy"
);
This statement exports partitions p1 and p2 of table example_tbl to the directory bos://my_bucket/export/ on BOS.
Refer to EXPORT for detailed help on the export command.
Execution of export operation
An export operation generates multiple query plans, each responsible for scanning a subset of the data tablets (Tablets).
Each query plan scans 5 tablets by default. That is, assuming there are 100 Tablets in total, 20 query plans will be generated.
The user can also change this value with the operation property tablet_num_per_task when submitting the operation.
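The relationship between the Tablet count and the number of query plans can be sketched as a simple ceiling division. This is only an illustration of the arithmetic described above, not the system's actual planner; the function name is hypothetical, and the handling of a final, partially filled group is an assumption.

```python
import math

# Default number of Tablets scanned by one query plan, as stated in the text.
DEFAULT_TABLETS_PER_TASK = 5


def query_plan_count(total_tablets: int,
                     tablets_per_task: int = DEFAULT_TABLETS_PER_TASK) -> int:
    """Estimate how many query plans an export operation generates."""
    if total_tablets < 0:
        raise ValueError("total_tablets must not be negative")
    if tablets_per_task <= 0:
        raise ValueError("tablets_per_task must be positive")
    # Assumption: a leftover group smaller than tablets_per_task
    # still gets a query plan of its own.
    return math.ceil(total_tablets / tablets_per_task)


print(query_plan_count(100))      # 20, the example from the text
print(query_plan_count(100, 10))  # 10, with tablet_num_per_task set to 10
```

Because the query plans of one operation run one after another, a larger tablet_num_per_task means fewer but heavier plans.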
Multiple query plans of one operation are executed sequentially.
A query plan scans multiple Tablets, organizes the data it reads into rows, and writes the rows to remote storage by calling the Broker in batches of 1024 rows.
The query plan will automatically retry 3 times if it encounters any error. If a query plan still fails after 3 retries, the whole operation fails.
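The sequential execution and retry behavior described above can be modeled roughly as follows. This is a sketch under stated assumptions, not the actual scheduler: each query plan is represented by a hypothetical callable that raises on error, and "retry 3 times" is read as one initial attempt plus up to three retries.

```python
from typing import Callable, Iterable

# Number of retries after the first attempt, as stated in the text.
MAX_RETRIES = 3


class ExportFailed(Exception):
    """Raised when a query plan still fails after all retries."""


def run_plan_with_retry(plan: Callable[[], None],
                        max_retries: int = MAX_RETRIES) -> int:
    """Run one query plan and return the number of attempts it took."""
    last_error = None
    # Assumption: one initial attempt, then up to max_retries retries.
    for attempt in range(1, max_retries + 2):
        try:
            plan()
            return attempt
        except Exception as exc:
            last_error = exc
    raise ExportFailed(
        f"query plan failed after {max_retries} retries") from last_error


def run_export(plans: Iterable[Callable[[], None]]) -> None:
    """Run the plans one after another; the first exhausted plan aborts the export."""
    for plan in plans:
        run_plan_with_retry(plan)
```

With this structure a single persistently failing plan stops the remaining plans from running, which matches the statement that the whole operation fails.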
Structure of exported file
First, the export operation creates a temporary directory named __doris_export_tmp_12345 (where 12345 is the operation ID) under the specified remote storage path.
The exported data is first written to this temporary directory. Each query plan generates one file, named as in the following example:
export-data-c69fcf2b6db5420f-a96b94c1ff8bccef-1561453713822
Here c69fcf2b6db5420f-a96b94c1ff8bccef is the ID of the query plan, and 1561453713822 is the timestamp at which the file was generated.
After all the data has been exported, Doris moves these files to the path specified by the user.
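When post-processing exported files, it can be useful to split a file name back into its parts. The sketch below assumes the layout shown in the example name (the prefix export-data-, then the query plan ID, then the timestamp) and assumes the timestamp is in milliseconds since the Unix epoch. The source does not state the unit, although under that reading the example value corresponds to 2019-06-25 09:08:33 UTC. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

PREFIX = "export-data-"


class ExportFileName(NamedTuple):
    query_id: str
    created_at: datetime


def parse_export_file_name(name: str) -> ExportFileName:
    """Split an exported file name into query plan ID and creation time."""
    if not name.startswith(PREFIX):
        raise ValueError(f"unexpected file name: {name!r}")
    # The query plan ID itself contains a hyphen, so the timestamp
    # is split off from the right-hand side.
    query_id, _, raw_ts = name[len(PREFIX):].rpartition("-")
    if not query_id or not raw_ts.isdigit():
        raise ValueError(f"unexpected file name: {name!r}")
    # Assumption: the timestamp counts milliseconds since the Unix epoch.
    seconds, millis = divmod(int(raw_ts), 1000)
    created_at = (datetime.fromtimestamp(seconds, tz=timezone.utc)
                  + timedelta(milliseconds=millis))
    return ExportFileName(query_id, created_at)


parsed = parse_export_file_name(
    "export-data-c69fcf2b6db5420f-a96b94c1ff8bccef-1561453713822")
print(parsed.query_id)    # c69fcf2b6db5420f-a96b94c1ff8bccef
print(parsed.created_at)  # 2019-06-25 09:08:33.822000+00:00
```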
View operation progress
After submitting the operation, the user can check the status of the export operation with the SHOW EXPORT command. A sample result is as follows:
JobId: 14008
State: FINISHED
Progress: 100%
TaskInfo: {"partitions":["p1", "p2"],"exec mem limit":2147483648,"column separator":",","line delimiter":"\n","tablet num":1,"broker":"hdfs","coord num":1,"db":"default_cluster:db1","tbl":"tbl3"}
Path: bos://my_bucket/export/
CreateTime: 2019-06-25 17:08:24
StartTime: 2019-06-25 17:08:28
FinishTime: 2019-06-25 17:08:34
Timeout: 3600
ErrorMsg: N/A
The export is complete when the operation status shows FINISHED.
Refer to SHOW EXPORT for detailed help on the SHOW EXPORT command.
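Since Export is asynchronous, a client typically polls the status until the operation ends. The following is a minimal polling sketch, not an official client. fetch_state is a hypothetical callable that runs SHOW EXPORT through whatever database client is in use and returns the State column. Only FINISHED appears in this document; treating CANCELLED as a failure state is an assumption, and the default timeout simply mirrors the Timeout value in the sample output.

```python
import time
from typing import Callable

SUCCESS_STATE = "FINISHED"
# Assumption: the source only shows FINISHED; CANCELLED is assumed
# here to mark an operation that ended unsuccessfully.
FAILURE_STATES = frozenset({"CANCELLED"})


def wait_for_export(fetch_state: Callable[[], str],
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the export reaches an end state, then return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        if state == SUCCESS_STATE or state in FAILURE_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"export still in state {state!r}")
        sleep(poll_seconds)
```

Passing sleep as a parameter keeps the loop easy to exercise without real waiting, for example with a stub that returns a fixed sequence of states.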