Load Transaction and Atomicity
Load atomicity
All load operations in Palo are atomic: the data in a load job is either imported in full or not at all, and a job never leaves partially imported data behind.
Atomic loading into multiple tables is also possible with BROKER LOAD.
For materialized views attached to a table, atomicity and consistency with the base table are guaranteed as well.
Label mechanism
A load job in Palo can be assigned a label, usually a user-defined string that carries some business meaning.
The main purpose of a label is to uniquely identify a load job and to ensure that data submitted under the same label is imported successfully only once.
The label mechanism ensures that loaded data is neither lost nor duplicated. If the upstream data source guarantees at-least-once semantics, combining it with Palo's label mechanism yields exactly-once semantics.
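The interaction between at-least-once delivery and labels can be illustrated with a small in-memory model. This is only a sketch of the idea, not Palo's implementation: ToyLoader, LabelAlreadyUsed, and deliver_at_least_once are hypothetical names introduced for illustration, and no real Palo API is called.

```python
class LabelAlreadyUsed(Exception):
    """Raised when a label has already been loaded successfully."""


class ToyLoader:
    """Hypothetical in-memory stand-in for a database that enforces unique labels."""

    def __init__(self):
        self.finished_labels = set()
        self.rows = []

    def load(self, label, batch):
        # Reject any batch whose label already completed successfully.
        if label in self.finished_labels:
            raise LabelAlreadyUsed(label)
        # The rows and the label are recorded together in this toy model.
        self.rows.extend(batch)
        self.finished_labels.add(label)


def deliver_at_least_once(loader, label, batch, attempts=3):
    """Resend the same batch several times, as an at-least-once source might."""
    applied = 0
    for _ in range(attempts):
        try:
            loader.load(label, batch)
            applied += 1
        except LabelAlreadyUsed:
            pass  # duplicate delivery: the batch is already loaded
    return applied


loader = ToyLoader()
applied = deliver_at_least_once(loader, "my_business1_20201010_125000", [1, 2, 3])
print(applied, loader.rows)
```

Although the batch is delivered three times, it is applied once, which is the exactly-once outcome described above.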
A label is unique within a database. Its default retention period is 3 days: the label of a completed job is cleared automatically after 3 days, after which the label can be reused.
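The retention rule can be modeled as a simple time comparison. The sketch below is an assumption-laden illustration: label_is_free and the completed dictionary are hypothetical, and the exact moment at which Palo clears an expired label is not specified here, so the example only checks times well inside and well outside the 3-day window.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=3)  # default retention period stated in the text


def label_is_free(completed, label, now):
    """Return True if `label` could be used for a new load at time `now`.

    `completed` maps each label to the time its job finished (hypothetical model).
    """
    finished_at = completed.get(label)
    if finished_at is None:
        return True  # never used in this database
    return now - finished_at >= RETENTION


completed = {"my_business1_20201010_125000": datetime(2020, 10, 10, 12, 55)}
label = "my_business1_20201010_125000"

free_next_day = label_is_free(completed, label, datetime(2020, 10, 11, 12, 55))
free_after_retention = label_is_free(completed, label, datetime(2020, 10, 14, 0, 0))
print(free_next_day, free_after_retention)
```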
Best practices
A label is usually set in the form business logic + time, for example my_business1_20201010_125000. Such a label typically represents the batch of data produced by business my_business1 at 2020-10-10 12:50:00. With this convention, the status of the load job can be queried by label to find out whether the batch for that time point was loaded successfully. If the load failed, retry it with the same label.
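A minimal sketch of this naming convention and of retrying with the same label is shown below. The helpers make_label and load_with_retry are hypothetical, and submit stands for whatever function actually submits the load job and reports success. Here it is replaced by a fake that fails once.

```python
from datetime import datetime


def make_label(business, batch_time):
    """Build a label of the form <business>_<YYYYMMDD>_<HHMMSS>."""
    return f"{business}_{batch_time:%Y%m%d_%H%M%S}"


def load_with_retry(submit, label, max_attempts=3):
    """Call submit(label) until it returns True; return the attempt count or None."""
    for attempt in range(1, max_attempts + 1):
        if submit(label):
            return attempt
    return None


# Fake submit function for illustration: fails on the first call only.
calls = []


def flaky_submit(label):
    calls.append(label)
    return len(calls) >= 2


label = make_label("my_business1", datetime(2020, 10, 10, 12, 50, 0))
attempts = load_with_retry(flaky_submit, label)
print(label, attempts)
```

Reusing the same label on every attempt is what makes the retry safe: if an earlier attempt had in fact succeeded, the label would prevent the batch from being imported a second time.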