Baidu AI Cloud
中国站

百度智能云

All Product Document

          Data Warehouse

          Synchronize Data Through External Table

          Palo can create external tables accessed through ODBC protocol. After the creation, the user can query the data of the external table directly through SELECT statement, or load the external table data through the method of INSERT INTO SELECT.

          This document mainly introduces how to create external tables accessed through ODBC protocol and how to load these external table data. Currently supported data sources include:

          • MySQL
          • Oracle
          • PostgreSQL

          Create external table

          Refer to CREATE ODBC TABLE syntax help manual for detailed introductions to creating ODBC external table.

          Here, we describe the usage only by examples.

          1. Create ODBC Resource

            ODBC Resource is to unify the management of connection information for external tables.

            CREATE EXTERNAL RESOURCE `oracle_odbc`
            PROPERTIES (
                "type" = "odbc_catalog",
                "host" = "192.168.0.1",
                "port" = "8086",
                "user" = "test",
                "password" = "test",
                "database" = "test",
                "odbc_type" = "oracle",
                "driver" = "Oracle"
            );

            We created here a Resource called oracle_odbc, whose type is odbc_catalog, indicating that this is a Resource used to store ODBC information. odbc_type is oracle,indicating this OBDC Resource is used to connect to Oracle database. Refer to Resource management document for other types of resources.

          2. Create external table

            CREATE EXTERNAL TABLE `ext_oracle_tbl` (
              `k1` decimal(9, 3) NOT NULL COMMENT "",
              `k2` char(10) NOT NULL COMMENT "",
              `k3` datetime NOT NULL COMMENT "",
              `k5` varchar(20) NOT NULL COMMENT "",
              `k6` double NOT NULL COMMENT ""
            ) ENGINE=ODBC
            COMMENT "ODBC"
            PROPERTIES (
                "odbc_catalog_resource" = "oracle_odbc",
                "database" = "test",
                "table" = "baseall"
            );

            We created here an external table called ext_oracle_tbl, quoting previously created oracle_odbc Resource.

          Connect to Baidu cloud database RDS

          1. Create RDS

            Create an RDS example through RDS quick start tutorial .

            Note: When creating RDS example, select the same network (VPC) as the Palo cluster at Network type->Select network. Availability zones can be different.

          2. Create resources

            CREATE EXTERNAL RESOURCE `rds_odbc`
            PROPERTIES (
                "type" = "odbc_catalog",
                "host" = "mysql56.rdsxxxxx.rds.gz.baidubce.com",
                "port" = "3306",
                "user" = "rdsroot",
                "password" = "12345",
                "odbc_type" = "mysql",
                "driver" = "MySQL"
            );

            The user need to alter parameters corresponding to hostportuserpassword . host port can be checked in RDS example information. user and password should be acquired after creating accounts on RDS console.

          3. Create external table

            CREATE EXTERNAL TABLE `mysql_table` (
               k1 int,
               k2 int
            ) ENGINE=ODBC
            PROPERTIES (
                "odbc_catalog_resource" = "rds_odbc",
                "database" = "mysql_db",
                "table" = "mysql_tbl"
            );

            The user can carry out query and other operations after creating external tale.

          Load data

          1. Create Palo table

            Here we create a Palo table with column information being the same as that of the external table ext_oracle_tbl created in the previous step:

            CREATE EXTERNAL TABLE `palo_tbl` (
              `k1` decimal(9, 3) NOT NULL COMMENT "",
              `k2` char(10) NOT NULL COMMENT "",
              `k3` datetime NOT NULL COMMENT "",
              `k5` varchar(20) NOT NULL COMMENT "",
              `k6` double NOT NULL COMMENT ""
            )
            COMMENT "Palo Table"
            DISTRIBUTED BY HASH(k1) BUCKETS 2;
            PROPERTIES (
                "replication_num" = "1"
            );

            Refer to CREATE-TABLE syntax help for detailed instructions of creating Palo table.

          2. Load data (from table ext_oracle_tblto table palo_tbl )

            INSERT INTO palo_tbl SELECT k1,k2,k3 FROM ext_oracle_tbl limit 100;

            INSERT command is a synchronization command. Successful return indicates successful load.

          Points of attention

          • It is necessary to ensure that the external data source and Palo cluster are in the same VPC, and the Compute Node can be interworking with the network of the external data source.
          • The essence of ODBC external tables is to access the data source through a single ODBC client, so it is improper to load a large amount of data at one time, loading in batches for several times is recommended.
          Previous
          Use JDBC to Synchronize Data
          Next
          Load JSON Data