SELECT-INTO-OUTFILE

Last Updated：2021-04-13

SELECT INTO OUTFILE

Description

This command is used to export the result set of SQL output to local disk or to remote storage through Broker.

query_stmt
INTO OUTFILE "file_path"
[format_as]
[properties]

query_stmt

Any query statement.
file_path

Path to file storage and file prefix. For example:
```
bos://my_bucket/my_file_            # remote catalog
file:///to/loca/path/my_file_       # local catalog
```
The final file name will consist of my_file_ + fileS/N + file format suffix .

The file serial number starts from 0, and the number is the number of files divided. For example:
```
my_file_0.csv
my_file_1.csv
my_file_2.csv
```
If the local file mode is used, the exported file will be stored on a random Compute Node inthe cluster. Specific node information is known in the URL of the returned result.
format_as

Export file format. Only support CN currently
```
FORMAT AS CSV
```
properties

Related properties. The properties required by broker must start with broker. . For example:
```
(
    "broker.prop_key" = "prop_val",
)
```
其他参数如：
- column_separator：Column separator, only applicable to CSV format. Default to \t.
- line_delimiter：line delimiter, only applicable to CSV format. Default to \n.
- max_file_size：The maximum size of a single file. Default to 1GB. The value range is between 5MB and 2GB. Files larger than this size will be split.
- success_file_name：Whether to generate an empty file identifier after success. File name is "my_file_file_name". In which, mysq_file is the prefix specified in file_path，file_name is the value of the parameter.

Description of return results:

If it exports and returns normally, the result is as follows:

mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
+------------+-----------+----------+--------------+
| FileNumber | TotalRows | FileSize | URL          |
+------------+-----------+----------+--------------+
|          1 |         2 |        8 | 192.168.1.10 |
+------------+-----------+----------+--------------+
1 row in set (0.05 sec)

FileNumber：The number of files finally generated.
TotalRows：Number of rows in result set.
FileSize：Total size of export file, in bytes.
URL：It exporting to local disk, it shows which Compute Node to export to.

If there is error in execution, an error message will be returned, such as:

mysql> SELECT * FROM tbl INTO OUTFILE ...
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...

Example

Export simple query results to file bos://my_bucket/result_. Specify that the export format is CSV. Use my_broker and set kerberos authentication information. The specified column separator is ,，and the line delimiter is \n.

SELECT * FROM tbl
INTO OUTFILE "bos://my_bucket/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "bos",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "100MB"
);

If the final generated file is not larger than 100MB, then：result_0.csv.

If it is larger than 100MB, it may be result_0.csv, result_1.csv, ....

Export the query result of CTE statement to file bos://my_bucket/result_.

WITH
x1 AS
(SELECT k1, k2 FROM tbl1),
x2 AS
(SELECT k3 FROM tbl2)
SELEC k1 FROM x1 UNION SELECT k3 FROM x2
INTO OUTFILE "bos://my_bucket/result_"
PROPERTIES
(
    "broker.name" = "bos",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy"
);

If the final generated file is not larger than 1GB, then：result_0.csv.

If it is larger than 1GB, it may be result_0.csv, result_1.csv, ....

Export the query result of UNION statement to file bos: bos://my_bucket/result.txt. And an empty file identifier is generated after successful exporting.

SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
INTO OUTFILE "bos://bucket/result_"
PROPERTIES
(
    "broker.name" = "my_broker",
    "broker.bos_endpoint" = "http://bj.bcebos.com",
    "broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyyy",
    "success_file_name" = "SUCCESS"
);

If the final generated file is not larger than 1GB, then：result_0.parquet.

If it is larger than 1GB, it may be result_0.parquet, result_1.parquet, ....

A successful file is identified as result_SUCCESS.

Export results to local disk.

SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
INTO OUTFILE "file:///local/path/result_"
PROPERTIES
(
    "column_separator" = ",",
    "line_delimiter" = "\n",
    "max_file_size" = "100MB"
);

Keywords

SELECT, INTO, OUTFILE

Best Practices

Export data quantity and export efficiency

The function is essentially to execute a SQL query command. The final result is single-threaded output. Therefore, the time consumption of the whole export includes the time consumption of the query itself and the time consumption of writing the final result set. If the query is large, it is necessary to set the session variable query_timeout to extend the query timeout appropriately.
Management of export files

Palo does not manage exported files, Including those files that were successfully exported or remained after the export failed, all need to be handled by users themselves.
Export to local files

The function of exporting to local files is not applicable to public cloud users, but only to users deployed in privatization. And the default user has complete control authority over the cluster nodes. Palo does not check the legality of the export path filled in by the user. If the process user of Palo does not have write access to the path, or the path does not exist, an error will be reported. At the same time, for security reasons, if a file with the same name already exists in this path, the export will also fail.

Palo does not manage files exported locally, and does not check disk space, etc. These files need to be managed by users themselves, such as cleaning.
Guarantee of result integrity

This command is a synchronous command, so it is possible that the task connection is disconnected during the execution, and it is impossible to live. Whether the exported data ends normally or is complete. At this time, user can use the success_file_name parameter to request that a successful file identifier be generated in the directory after the task is successful. And user can judge whether the export ends normally through this file.

ROUTINE-LOAD

INSERT

百度智能云

Data Warehouse

SELECT-INTO-OUTFILE