Load Overview

Supported data sources

Palo provides several load methods to choose from, depending on the data source.

Data source | Load method
Baidu Object Storage (BOS), HDFS, AFS | Load data with Broker Load
Local file | Load local data
Baidu Message Service (Kafka) | Subscribe to Kafka log
MySQL, Oracle, PostgreSQL | Synchronize data through external table
Load data through JDBC | Synchronize data through JDBC
Load data in JSON format | Instructions for importing data in JSON format
MySQL binlog | Coming soon

General description of data load

This section describes features common to all data load methods in Palo, to help users make better use of them.

Atomicity guarantee

Every load job in Palo is a complete transaction, whether it is a batch load through Broker Load or a single import through an INSERT statement. The load transaction guarantees that the data in a batch takes effect atomically: either all of it becomes visible or none of it does, so no partial write occurs.

In addition, each load job has a Label, which is unique within a Database and identifies the load job. Labels can be specified by the user; for some load methods, the system can also generate them automatically.

A Label ensures that the corresponding load job can be imported successfully only once. If a Label that has already been imported successfully is used again, the job is rejected with the error "Label already used". This mechanism gives Palo At-Most-Once semantics. Combined with At-Least-Once semantics in the upstream system, it achieves Exactly-Once semantics for the imported data.
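
The interaction between Labels and retries can be illustrated with a small in-memory model. This is a toy sketch, not Palo's implementation: the ToyDatabase class, its load method, the LabelAlreadyUsedError exception, and the label "orders_batch_001" are all hypothetical names chosen for illustration. It shows an upstream that resends the same batch under the same Label (At-Least-Once) while the database rejects a reused Label (At-Most-Once), so the batch is applied exactly once.

```python
class LabelAlreadyUsedError(Exception):
    """Raised when a label that already loaded successfully is reused."""


class ToyDatabase:
    """Hypothetical in-memory stand-in for a database that tracks load labels."""

    def __init__(self):
        self.rows = []
        self.used_labels = set()

    def load(self, label, batch):
        # Reject a label that has already been imported successfully.
        if label in self.used_labels:
            raise LabelAlreadyUsedError(f"Label already used: {label}")
        # Apply the whole batch and record the label together.
        self.rows.extend(batch)
        self.used_labels.add(label)


def load_with_retries(db, label, batch, attempts=3):
    """Upstream side: resend the same batch under the same label."""
    applied = 0
    for _ in range(attempts):
        try:
            db.load(label, batch)
            applied += 1
        except LabelAlreadyUsedError:
            # The batch is already in the database, so this retry changes nothing.
            pass
    return applied


db = ToyDatabase()
applied_count = load_with_retries(db, "orders_batch_001", [1, 2, 3])
```

In this model the upstream sends the batch three times, but only the first attempt changes the stored rows. The key design point is that a retry must reuse the original Label; a fresh Label on each retry would defeat the deduplication.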

Refer to "Load Transaction and Atomicity" for best practices on the atomicity guarantee.

Synchronous and asynchronous load

Load methods are either synchronous or asynchronous. For a synchronous load, the returned result indicates whether the load succeeded or failed. For an asynchronous load, a successful return only means that the job was submitted successfully, not that the data was loaded; the user needs to check the running status of the load job with the corresponding commands.
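
The asynchronous pattern can be sketched as submit-then-poll. The following is a minimal simulation under stated assumptions, not a client for Palo: ToyLoadJob, submit_async, and wait_for_job are hypothetical helpers, and the state names "LOADING", "FINISHED", and "CANCELLED" are placeholders for this sketch. It only illustrates that a successful submission says nothing about the final outcome until the job status is checked.

```python
import time


class ToyLoadJob:
    """Hypothetical asynchronous load job that finishes after a few status checks."""

    def __init__(self, checks_until_done, final_state="FINISHED"):
        self._remaining = checks_until_done
        self._final_state = final_state

    def status(self):
        # Each call simulates asking the system for the job's current state.
        if self._remaining > 0:
            self._remaining -= 1
            return "LOADING"
        return self._final_state


def submit_async(checks_until_done, final_state="FINISHED"):
    # A successful return here only means the job was accepted.
    return ToyLoadJob(checks_until_done, final_state)


def wait_for_job(job, poll_interval=0.01, max_polls=100):
    """Poll until the job reaches a terminal state or the poll budget runs out."""
    for _ in range(max_polls):
        state = job.status()
        if state in ("FINISHED", "CANCELLED"):
            return state
        time.sleep(poll_interval)
    return "TIMEOUT"


ok_state = wait_for_job(submit_async(3))
failed_state = wait_for_job(submit_async(2, final_state="CANCELLED"))
```

Both jobs are submitted successfully, yet only the first ends in the successful state. Bounding the number of polls keeps a caller from waiting forever on a job that never reaches a terminal state.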

Supported data formats

The supported data formats differ slightly between load methods.

Load method | Supported formats
Broker Load | Parquet, ORC, CSV, gzip
Stream Load | CSV, gzip, JSON
Routine Load | CSV, JSON
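
When choosing a load method for a given file format, the table above can be encoded as a simple lookup. This is only an illustrative sketch: the SUPPORTED_FORMATS mapping copies the table as listed here, and the methods_supporting helper is a hypothetical name, not part of Palo.

```python
# Format support per load method, copied from the table in this section.
SUPPORTED_FORMATS = {
    "Broker Load": {"parquet", "orc", "csv", "gzip"},
    "Stream Load": {"csv", "gzip", "json"},
    "Routine Load": {"csv", "json"},
}


def methods_supporting(fmt):
    """Return the load methods that list the given format, sorted by name."""
    wanted = fmt.lower()
    return sorted(
        method for method, formats in SUPPORTED_FORMATS.items() if wanted in formats
    )


json_methods = methods_supporting("JSON")
parquet_methods = methods_supporting("parquet")
```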