百度智能云

All Product Document

          Database Transmission Server

          Use PostgreSQL Database as DTS Source

          1. Applicable Scenarios

          This article applies to scenarios where Baidu AI Cloud Data Transmission Service DTS (hereinafter referred to as DTS) is used to migrate data from self-built Postgresql to a data destination already supported by DTS.

          2. Limits of Using PostgreSQL Database as a DTS Source

          • Do not support timestamps that include time zone type columns.
          • Incremental synchronization does not support DDL statements that synchronize relational databases.
          • A data migration task can only perform data migration on one database. If there are multiple databases to migrate, you need to create a data migration task for each database.
          • If the source database does not have a primary key or unique constraint, and all fields are not unique, it may cause duplicate data in the destination database.
          • Copying multiple tables with the same name but different cases (for example, table1, TABLE1, and Table1) can cause unpredictable behavior.
          • For the structure migration from PostgreSQL to PostgreSQL, it does not support the migration of trigger and extension temporarily.
          • For the structure migration from PostgreSQL to heterogeneous data sources, it only supports the structure migration of database table, but does not support the migration of other objects, such as function and view.

          3. Preconditions for Using PostgreSQL Database as DTS Source

          3.1 Environmental Requirements

          DTS supports self-built databases of PostgreSQL 9.4 and 10.10 as the source.

          3.2 Database account Privilege Requirements

          Source:

          Database Full migration Incremental migration
          Self-built PostgreSQL database Select privilege of migration object superuser

          In self-built PostgreSQL database, for database account creation and authorization method, please refer to CREATE USER and GRANT syntax.

          3.3 Network Configuration

          Add the DTS IP address to the self-built PostgreSQL configuration file pg_hba.conf, and enable replication and socket connection.

             # Replication Instance 
             host all all 10.1.4.56/00 md5 
             # Allow replication connections from localhost, by a user with the 
             # replication privilege. 
             host replication dms 10.1.4.56/00 md5 

          The pg_hba.conf configuration file of PostgreSQL controls client authentication. Files are usually stored in the data directory of the database cluster.

          3.4 Preparation for Incremental Migration of Source Self-built PostgreSQL

          1. Install the logic flow replication plug in

            Log in to the server to which the self-built PostgreSQL belongs, and download the wal2json logical flow replication plug-in. The download address is as follows: https://github.com/eulerto/wal2json Follow the official instructions to install the wal2json plug in.

          2. Modify PostgreSQL related parameter configuration

            Modify the following parameters in the postgresql.conf configuration file:

            1. Set max_replication_slots to an integer greater than 1. The value of max_replication_slots should be set according to the number of tasks you need to run at the same time. For example, to run twelve tasks at the same time, you need to set at least twelve slots.

              Example: max_replication_slots = 12

            2. Set wal_level to logical to enable wal logical log.

              Example: wal_level = logical

            3. Set wal_sender_timeout to 0. When the replication connection is inactive for more than wal_sender_timeout, it will be terminated by the source library. The default value of this parameter is 60 seconds, but we recommend that you set this parameter to zero, which disables the timeout mechanism.

              Example: wal_sender_timeout = 0

            4. Set max_wal_senders to an integer greater than 10. The max_wal_senders parameter is used to set the number of concurrent tasks that can be run. It is recommended to be consistent with the value set by max_replication_slots.

              Example: max_wal_senders = 12

          3. Restart the PostgreSQL database
          4. Set the replica identity of the table to migrate to FULL

            ALTER TABLE TABLENAME REPLICA IDENTITY FULL; 

          4. Use PostgreSQL Database as DTS Source

          Using the PostgreSQL database as the source, please refer to the best practice document for the operation procedures of task creation, task configuration, pre-check, task start, task pause, and task termination. The task configuration parameters and object mapping are slightly different from other data sources.

          4.1 Task Configuration Parameters

          image.png

          As shown in FIG. Currently, the source of DTS supports self-built PostgreSQL instances on the public network. The configuration parameters of the PostgreSQL source are described as follows:

          • Access type: Support public network/BCC/BBC/DCC self-built PostgreSQL instance.
          • Type of data: Fixed selection of PostgreSQL.
          • IP/port: Access IP and service port of self-built PostgreSQL database.
          • Database: Name of the database to migrate in the self-built PostgreSQL database.
          • Account number: Self-built PostgreSQL database account.
          • Password: Password corresponding to the database account.

          4.2 Object Mapping

          PostgreSQL is a three-level schema. When you need to migrate data to the destination of the two-level schema, DTS provides two database table name mapping methods for users to choose. As shown in the figure below, the selected migration object will appear in the selected object list on the right. DTS supports upstream and downstream database table name mapping, column filtering black and white lists and other functions. You can click [Edit] to configure mapping and filtering rules for each migration object. The destination database object can be mapped to the database and schema in the source library:

          • 1. The database object is mapped to the Database in the source library: Ignore the schema in PostgreSQL, the tables under different schemas in PostgreSQL are all mapped to the mysql specified library, and the destination library name uses the source library name by default.
          • 2. Database objects are mapped to the Schema in the source library: Ignore the library name in PostgreSQL. Different schemas in PostgreSQL are mapped to different libraries in mysql. The destination library name uses the source schema name by default.

          After completing the object mapping configuration, click [Save and pre-check] to start the pre-check of the task.

          5. PostgreSQL Source Data Types Supported by DTS

          The following table lists the PostgreSQL source data types supported when using DTS and the default mapping with DTS data types.

          For information on how to view the data types mapped on the destination side, please refer to the Data Type Mapping section for data sources on the destination side.

          PostgreSQL data types DTS field type
          INTEGER DTS_TYPE_INT4
          INTEGER[] DTS_TYPE_STRING
          TEXT[] DTS_TYPE_STRING
          BIT DTS_TYPE_BYTES
          BIT VARYING DTS_TYPE_BYTES
          MONEY DTS_TYPE_NUMBER
          CHARACTER DTS_TYPE_STRING
          CHARACTER VARYING DTS_TYPE_STRING
          TEXT DTS_TYPE_STRING
          BYTEA DTS_TYPE_BYTES
          COMPLEX DTS_TYPE_STRING
          ENUM DTS_TYPE_STRING
          CIDR DTS_TYPE_STRING
          POINT DTS_TYPE_STRING
          LINE DTS_TYPE_STRING
          LSEG DTS_TYPE_STRING
          BOX DTS_TYPE_STRING
          PATH DTS_TYPE_STRING
          POLYGON DTS_TYPE_STRING
          CIRCLE DTS_TYPE_STRING
          INET DTS_TYPE_STRING
          MACADDR DTS_TYPE_STRING
          MACADDR8 DTS_TYPE_STRING
          JSON DTS_TYPE_STRING
          SMALLINT DTS_TYPE_INT2
          BIGINT DTS_TYPE_INT8
          NUMERIC(P, S) If the precision is between 0 and 38, use DTS_TYPE_NUMBER, if the precision is 39 or greater, use DTS_TYPE_STRING
          DECIMAL(P, S) If the precision is between 0 and 38, use DTS_TYPE_NUMBER, if the precision is 39 or greater, use DTS_TYPE_STRING
          DECIMAL DTS_TYPE_STRING
          NUMBER DTS_TYPE_STRING
          DECIMAL(p) DTS_TYPE_NUMBER
          NUMBER(p) DTS_TYPE_NUMBER
          REAL DTS_TYPE_FLOAT
          DOUBLE PRECISION DTS_TYPE_DOUBLE
          SMALLSERIAL DTS_TYPE_INT2
          SERIAL DTS_TYPE_INT4
          BIGSERIAL DTS_TYPE_INT8
          TIME WITHOUT TIME ZONE DTS_TYPE_TIME
          DATE DTS_TYPE_DATE
          TIMESTAMP WITHOUT TIMEZONE DTS_TYPE_DATETIME
          INTERVAL DTS_TYPE_STRING
          BOOLEAN DTS_TYPE_BOOLEAN
          Previous
          DTS Public Network IP Address Range
          Next
          Use SQL Server Database as DTS Source