Advanced Operations Guide

Last Updated: 2021-11-02

This document lists some commonly used advanced features of Palo to give users a fuller picture of the system.

Each feature is described in detail in its own document.

Relational tables, partitioning, and bucketing

In Palo, user data is stored in two-dimensional relational tables. Because Palo is built on a Shared-Nothing distributed architecture, the data of a table is split horizontally into multiple data slices (tablets) according to the partitioning and bucketing method specified by the user, and these tablets are stored on different nodes.
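The mapping from a row to a tablet can be pictured with a small Python sketch. It is illustrative only: the partition names, the date ranges, the bucket count, and the use of CRC32 as the hash are all assumptions made for this example and do not describe Palo's internal implementation.

```python
import zlib
from datetime import date

# Hypothetical range partitions: name -> (inclusive lower bound, exclusive upper bound).
PARTITIONS = {
    "p202109": (date(2021, 9, 1), date(2021, 10, 1)),
    "p202110": (date(2021, 10, 1), date(2021, 11, 1)),
}
NUM_BUCKETS = 4  # assumed number of buckets per partition


def locate_tablet(event_day, user_id):
    """Return (partition name, bucket index), which together identify a tablet."""
    for name, (lower, upper) in PARTITIONS.items():
        if lower <= event_day < upper:
            # CRC32 is a stand-in for whatever hash function the system really uses.
            bucket = zlib.crc32(str(user_id).encode("utf-8")) % NUM_BUCKETS
            return name, bucket
    raise ValueError(f"no partition covers {event_day}")
```

In this sketch the partition is chosen by the range that contains the partition column value, and the bucket is chosen by hashing the bucket column, so rows with the same `user_id` in the same partition always land in the same tablet.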

Refer to the Relation model and data partition document for details on partitioning and bucketing.

Data model

One of Palo's characteristics is that it supports both detail queries and aggregate queries. Users can choose the data model of a table to suit different application scenarios.

Palo currently supports three data models: 1) Duplicate (detail) model. 2) Aggregate (aggregation) model. 3) Unique (primary key) model.
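The difference between the three models can be sketched in Python by loading the same rows under each one. This is a simplified illustration, not Palo code: the function names are hypothetical, each row is reduced to a (key, value) pair, and SUM is assumed as the aggregate function for the Aggregate model.

```python
def load_duplicate(rows):
    """Duplicate model: every loaded row is kept as-is."""
    return list(rows)


def load_aggregate(rows):
    """Aggregate model: rows sharing a key are merged (SUM is assumed here)."""
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return table


def load_unique(rows):
    """Unique model: a later row replaces an earlier row with the same key."""
    table = {}
    for key, value in rows:
        table[key] = value
    return table


rows = [("u1", 10), ("u2", 5), ("u1", 7)]
duplicate_result = load_duplicate(rows)
aggregate_result = load_aggregate(rows)
unique_result = load_unique(rows)
```

With these three input rows, the Duplicate sketch keeps all three rows, the Aggregate sketch stores 17 for `u1`, and the Unique sketch stores 7 for `u1`.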

Refer to the Data model document for details and usage recommendations for the three models.

Materialized view

A materialized view is a technique for accelerating data analysis. Palo supports creating materialized views on base tables. For example, an aggregate view over a subset of columns can be created on a table that uses the Duplicate (detail) data model, so that both detail queries and aggregate queries are answered quickly.

Palo automatically keeps materialized views consistent with their base tables and automatically selects a suitable materialized view at query time. This greatly reduces the data maintenance cost for users and gives them consistent, transparent query acceleration.
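The idea of a view that stays consistent with its base table can be sketched in Python as follows. This is a conceptual model under stated assumptions, not Palo's implementation: the class, its method names, and the (day, user, amount) row layout are hypothetical.

```python
from collections import defaultdict


class DetailTableWithView:
    """A detail table plus one aggregate view: day -> SUM(amount)."""

    def __init__(self):
        self.base_rows = []                 # detail rows: (day, user, amount)
        self.sum_by_day = defaultdict(int)  # the materialized aggregate

    def load(self, day, user, amount):
        # Base table and view are updated in the same step, so they cannot diverge.
        self.base_rows.append((day, user, amount))
        self.sum_by_day[day] += amount

    def total_for_day(self, day):
        # Aggregate query: answered from the view without scanning detail rows.
        return self.sum_by_day.get(day, 0)

    def rows_for_user(self, user):
        # Detail query: answered from the base table.
        return [row for row in self.base_rows if row[1] == user]
```

The caller only issues queries. Which structure answers them is decided inside the table, which mirrors the transparent matching described above.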

Refer to the [Materialized view](TODO) document for details on materialized views.

Table structure changes

Palo supports online table structure changes. Supported operations include adding, deleting, and reordering columns, modifying column types, adding and deleting partitions, and renaming databases, tables, and partitions. None of these operations affects ongoing loads or queries, so users can change table structures smoothly in a production environment.

Refer to the [Changing of table structure](TODO) document for details on each change operation.

Multiple loading methods

The Basic Operations Guide introduces how to load data stored in BOS. Beyond that, Palo supports several other loading methods, such as loading local data over HTTP or subscribing to Kafka messages through the Routine Load function. Data can also be loaded directly in real time with INSERT statements.

Refer to Loading overview for more loading methods.

Data deletion and update

Palo supports two ways to delete loaded data. The first is a DELETE FROM statement with a WHERE condition. This method is more general and is suitable for scheduled deletion tasks that run at low frequency.

The other method, available only for the Unique primary key model, is to load the primary key rows that need to be deleted. Palo internally uses a delete sign bit to physically delete the data. This method is suitable for deleting data in real time.
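Deleting by load on a Unique key table can be pictured with the Python sketch below. It is a conceptual model only: the class, the `delete_sign` parameter, and the `compact` step that performs the physical removal are assumptions made for illustration, not Palo's actual interface or internals.

```python
class UniqueKeyTable:
    """Toy Unique key table where a loaded row may carry a delete sign."""

    def __init__(self):
        self._rows = {}  # key -> (value, delete_sign)

    def load(self, key, value=None, delete_sign=False):
        # As in the Unique model, a later load replaces the earlier row for the key.
        self._rows[key] = (value, delete_sign)

    def query(self):
        # Rows marked with the delete sign are hidden from readers immediately.
        return {k: v for k, (v, deleted) in self._rows.items() if not deleted}

    def compact(self):
        # Assumed background step: physically drop the marked rows.
        self._rows = {k: row for k, row in self._rows.items() if not row[1]}

    def stored_row_count(self):
        return len(self._rows)
```

In this model a delete is just another load, which is why it can run alongside normal real-time loading. A row disappears from query results as soon as the marked row is loaded, while its storage is reclaimed later.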

Refer to the Data update document for details on delete and update operations.
