Basic Operations Guide
In the quick start tutorial, we completed some basic operations of Palo through UI interface of Palo. In the actual production environment, we usually need to connect to Palo and carry out various operations with the programs.
Palo uses MySQL protocol for communication, so users can use standard MySQL client, or MySQL library, JDBC, ODBC and other languages to connect Palo. This paper takes MySQL client as an example to show the basic usage of Palo through a complete process.
MySQL client with the version after 5.1 is recommended when choosing MySQL client, because the user name with a length of more than 16 characters cannot be supported by the version before 5.1.
Download the MySQL client of Linux version here: mysql-5.7.22-linux-glibc2.12-x86_64.(After decompression, there is a
mysql
binary program inbin/
directory.)
Creating users and databases
The password set by the user when creating the Palo cluster is the password of the Palo admin user. By default, Palo cluster initially contains an admin user, and the user can connect with Palo for the first time through the admin user.
mysql -hPALO_HOST -PPALO_PORT -uadmin -pyour_password
For MySQL clients of version 8.0 or above, please add the following parameters:
mysql --default-auth=mysql_native_password -hPALO_HOST -PPALO_PORT -uadmin -pyour_password
The host and port here are the MySQL protocol connection targets given on the Palo console page. If the user is bound with EIP, replace it with EIP.
Note: for Baidu intranet special users, please contact student on duty of Palo obtain the IP address of the connection target.
The admin user has all the operation privileges of the cluster. Only administrators are recommended to use. Administrators can use the admin user to create ordinary users and grant corresponding privileges.
Create an ordinary user through the following command:
CREATE USER 'jack' IDENTIFIED BY 'jack_passwd';
The newly-created ordinary user does not have any privileges by default. We can then create a database and authorize the user Jack.
CREATE DATABASE example_db;
GRANT ALL ON example_db to "jack";
After that, we can use user jack to log in and view the database.
mysql -hPALO_HOST -PPALO_PORT -ujack -pjack_password
MySQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| example_db |
| information_schema |
+--------------------+
2 rows in set (0.00 sec)
Create a table
Switch the database first:
USE example_db;
Next create a table:
CREATE TABLE lineorder (
lo_orderkey BIGINT,
lo_linenumber BIGINT,
lo_custkey INT,
lo_partkey INT,
lo_suppkey INT,
lo_orderdate INT,
lo_orderpriotity VARCHAR(16),
lo_shippriotity INT,
lo_quantity BIGINT,
lo_extendedprice BIGINT,
lo_ordtotalprice BIGINT,
lo_discount BIGINT,
lo_revenue BIGINT,
lo_supplycost BIGINT,
lo_tax BIGINT,
lo_commitdate BIGINT,
lo_shipmode VARCHAR(11)
)
DISTRIBUTED BY HASH(lo_orderkey)
PROPERTIES ("replication_num"="1");
Here we create a lineorder
table in Star Schema Benchmark with lo_orderkey
being bucket column,and set the number of copies to 1.
By default, the number of copies is 3, when the number of BE nodes in the cluster is less than 3, the number of copies should not be greater than the number of BE nodes.
After the table is created, we can view the information of table example_db:
MySQL> SHOW TABLES;
+----------------------+
| Tables_in_example_db |
+----------------------+
| lineorder |
+----------------------+
2 rows in set (0.01 sec)
mysql> DESC lineorder;
+------------------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-------------+------+-------+---------+-------+
| lo_orderkey | BIGINT | Yes | true | NULL | |
| lo_linenumber | BIGINT | Yes | true | NULL | |
| lo_custkey | INT | Yes | true | NULL | |
| lo_partkey | INT | Yes | false | NULL | NONE |
| lo_suppkey | INT | Yes | false | NULL | NONE |
| lo_orderdate | INT | Yes | false | NULL | NONE |
| lo_orderpriotity | VARCHAR(16) | Yes | false | NULL | NONE |
| lo_shippriotity | INT | Yes | false | NULL | NONE |
| lo_quantity | BIGINT | Yes | false | NULL | NONE |
| lo_extendedprice | BIGINT | Yes | false | NULL | NONE |
| lo_ordtotalprice | BIGINT | Yes | false | NULL | NONE |
| lo_discount | BIGINT | Yes | false | NULL | NONE |
| lo_revenue | BIGINT | Yes | false | NULL | NONE |
| lo_supplycost | BIGINT | Yes | false | NULL | NONE |
| lo_tax | BIGINT | Yes | false | NULL | NONE |
| lo_commitdate | BIGINT | Yes | false | NULL | NONE |
| lo_shipmode | VARCHAR(11) | Yes | false | NULL | NONE |
+------------------+-------------+------+-------+---------+-------+
17 rows in set (0.02 sec)
Loading data
Palo supports a variety of data loading methods. Please refer to the data loading document for details. Here we take Broker loading as an example.
Broker loading reads data from external storage for loading through built-in Broker process in the cluster. Please refer to the introduction of Broker in the operation manual for more help.
In order to use Broker loading, we need to store the loaded data files on Baidu object storage BOS in advance. Here we prepare data of lineorder
table (about 100MB) and we can download and upload them to our own BOS to start loading.
Refer to Data loaded to BOS for detailed documents on how to upload to and load data from BOS.
Suppose that the BOS path where the user stores the loading file is:bos://example_bucket/lineorder_1.tbl
Then we can load data through the following command:
LOAD LABEL example_db.my_first_load
(
DATA INFILE("bos://example_bucket/lineorder_1.tbl")
INTO TABLE lineorder
)
WITH BROKER 'bos'
(
"bos_endpoint" = "http://bj.bcebos.com",
"bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxx",
"bos_secret_accesskey"="yyyyyyyyyyyyyyyyyyyy"
);
- The Broker name of the public cloud Palo cluster is BOS. Refer to
SHOW BROKER;
to view.- The
bos_endpoint
in tis example ishttp://bj.bcebos.com
. Users need to specify different endpoints according to different regions.- accesskey and secret_accesskey can be viewed in public cloud security certification center.
Broker loading is an asynchronous command. Successful execution of the above command only means successful submission of the task.
Whether the loading is successful or not depends on the loading label, view through SHOW LOAD
command. The label in this example is my_first_load
:
SHOW LOAD WHERE LABLE = "my_first_load";
In the returned results, if the State
field is FINISHED, the loading is successful. Then we can query the data.
Data query
Palo supports SQL syntax in most SQL 92 and SQL 99 standards, as well as some SQL 2003 standards, which, basically, covers most SQL usage scenarios. Here are some simple examples.
Simple query
Examples:
MySQL> SELECT * FROM table1 LIMIT 3;
+--------+----------+----------+------+
| siteid | citycode | username | pv |
+--------+----------+----------+------+
| 2 | 1 | 'grace' | 2 |
| 5 | 3 | 'helen' | 3 |
| 3 | 2 | 'tom' | 2 |
+--------+----------+----------+------+
5 rows in set (0.01 sec)
MySQL> SELECT * FROM table1 ORDER BY citycode;
+--------+----------+----------+------+
| siteid | citycode | username | pv |
+--------+----------+----------+------+
| 2 | 1 | 'grace' | 2 |
| 1 | 1 | 'jim' | 2 |
| 3 | 2 | 'tom' | 2 |
| 4 | 3 | 'bush' | 3 |
| 5 | 3 | 'helen' | 3 |
+--------+----------+----------+------+
5 rows in set (0.01 sec)
Join query
Examples:
MySQL> SELECT SUM(table1.pv) FROM table1 JOIN table2 WHERE table1.siteid = table2.siteid;
+--------------------+
| sum(`table1`.`pv`) |
+--------------------+
| 12 |
+--------------------+
1 row in set (0.20 sec)
Subquery
Examples:
MySQL> SELECT SUM(pv) FROM table2 WHERE siteid IN (SELECT siteid FROM table1 WHERE siteid > 2);
+-----------+
| sum(`pv`) |
+-----------+
| 8 |
+-----------+
1 row in set (0.13 sec)