Synchronize Data Through External Table
Palo can create external tables accessed through ODBC protocol. After the creation, the user can query the data of the external table directly through SELECT statement, or load the external table data through the method of INSERT INTO SELECT
.
This document mainly introduces how to create external tables accessed through ODBC protocol and how to load these external table data. Currently supported data sources include:
- MySQL
- Oracle
- PostgreSQL
Create external table
Refer to CREATE ODBC TABLE syntax help manual for detailed introductions to creating ODBC external table.
Here, we describe the usage only by examples.
-
Create ODBC Resource
ODBC Resource is to unify the management of connection information for external tables.
CREATE EXTERNAL RESOURCE `oracle_odbc` PROPERTIES ( "type" = "odbc_catalog", "host" = "192.168.0.1", "port" = "8086", "user" = "test", "password" = "test", "database" = "test", "odbc_type" = "oracle", "driver" = "Oracle" );
We created here a Resource called
oracle_odbc
, whose type isodbc_catalog
, indicating that this is a Resource used to store ODBC information.odbc_type
isoracle
,indicating this OBDC Resource is used to connect to Oracle database. Refer to Resource management document for other types of resources. -
Create external table
CREATE EXTERNAL TABLE `ext_oracle_tbl` ( `k1` decimal(9, 3) NOT NULL COMMENT "", `k2` char(10) NOT NULL COMMENT "", `k3` datetime NOT NULL COMMENT "", `k5` varchar(20) NOT NULL COMMENT "", `k6` double NOT NULL COMMENT "" ) ENGINE=ODBC COMMENT "ODBC" PROPERTIES ( "odbc_catalog_resource" = "oracle_odbc", "database" = "test", "table" = "baseall" );
We created here an external table called
ext_oracle_tbl
, quoting previously createdoracle_odbc
Resource.
Connect to Baidu cloud database RDS
-
Create RDS
Create an RDS example through RDS quick start tutorial .
Note: When creating RDS example, select the same network (VPC) as the Palo cluster at
Network type
->Select network
. Availability zones can be different. -
Create resources
CREATE EXTERNAL RESOURCE `rds_odbc` PROPERTIES ( "type" = "odbc_catalog", "host" = "mysql56.rdsxxxxx.rds.gz.baidubce.com", "port" = "3306", "user" = "rdsroot", "password" = "12345", "odbc_type" = "mysql", "driver" = "MySQL" );
The user need to alter parameters corresponding to
host
,port
,user
,password
. host port can be checked in RDS example information. user and password should be acquired after creating accounts on RDS console. -
Create external table
CREATE EXTERNAL TABLE `mysql_table` ( k1 int, k2 int ) ENGINE=ODBC PROPERTIES ( "odbc_catalog_resource" = "rds_odbc", "database" = "mysql_db", "table" = "mysql_tbl" );
The user can carry out query and other operations after creating external tale.
Load data
-
Create Palo table
Here we create a Palo table with column information being the same as that of the external table
ext_oracle_tbl
created in the previous step:CREATE EXTERNAL TABLE `palo_tbl` ( `k1` decimal(9, 3) NOT NULL COMMENT "", `k2` char(10) NOT NULL COMMENT "", `k3` datetime NOT NULL COMMENT "", `k5` varchar(20) NOT NULL COMMENT "", `k6` double NOT NULL COMMENT "" ) COMMENT "Palo Table" DISTRIBUTED BY HASH(k1) BUCKETS 2; PROPERTIES ( "replication_num" = "1" );
Refer to CREATE-TABLE syntax help for detailed instructions of creating Palo table.
-
Load data (from table
ext_oracle_tbl
to tablepalo_tbl
)INSERT INTO palo_tbl SELECT k1,k2,k3 FROM ext_oracle_tbl limit 100;
INSERT command is a synchronization command. Successful return indicates successful load.
Points of attention
- It is necessary to ensure that the external data source and Palo cluster are in the same VPC, and the Compute Node can be interworking with the network of the external data source.
- The essence of ODBC external tables is to access the data source through a single ODBC client, so it is improper to load a large amount of data at one time, loading in batches for several times is recommended.