Use PostgreSQL Database as DTS Source
1. Applicable Scenarios
This article applies to scenarios where Baidu AI Cloud Data Transmission Service DTS (hereinafter referred to as DTS) is used to migrate data from self-built Postgresql to a data destination already supported by DTS.
2. Limits of Using PostgreSQL Database as a DTS Source
- Do not support timestamps that include time zone type columns.
- Incremental synchronization does not support DDL statements that synchronize relational databases.
- A data migration task can only perform data migration on one database. If there are multiple databases to migrate, you need to create a data migration task for each database.
- If the source database does not have a primary key or unique constraint, and all fields are not unique, it may cause duplicate data in the destination database.
- Copying multiple tables with the same name but different cases (for example, table1, TABLE1, and Table1) can cause unpredictable behavior.
- For the structure migration from PostgreSQL to PostgreSQL, it does not support the migration of trigger and extension temporarily.
- For the structure migration from PostgreSQL to heterogeneous data sources, it only supports the structure migration of database table, but does not support the migration of other objects, such as function and view.
3. Preconditions for Using PostgreSQL Database as DTS Source
3.1 Environmental Requirements
DTS supports self-built databases of PostgreSQL 9.4 and 10.10 as the source.
3.2 Database account Privilege Requirements
Source:
Database | Full migration | Incremental migration |
---|---|---|
Self-built PostgreSQL database | Select privilege of migration object | superuser |
In self-built PostgreSQL database, for database account creation and authorization method, please refer to CREATE USER and GRANT syntax.
3.3 Network Configuration
Add the DTS IP address to the self-built PostgreSQL configuration file pg_hba.conf, and enable replication and socket connection.
# Replication Instance
host all all 10.1.4.56/00 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
host replication dms 10.1.4.56/00 md5
The pg_hba.conf configuration file of PostgreSQL controls client authentication. Files are usually stored in the data directory of the database cluster.
3.4 Preparation for Incremental Migration of Source Self-built PostgreSQL
-
Install the logic flow replication plug in
Log in to the server to which the self-built PostgreSQL belongs, and download the wal2json logical flow replication plug-in. The download address is as follows: https://github.com/eulerto/wal2json Follow the official instructions to install the wal2json plug in.
-
Modify PostgreSQL related parameter configuration
Modify the following parameters in the postgresql.conf configuration file:
-
Set max_replication_slots to an integer greater than 1. The value of max_replication_slots should be set according to the number of tasks you need to run at the same time. For example, to run twelve tasks at the same time, you need to set at least twelve slots.
Example:
max_replication_slots = 12
-
Set wal_level to logical to enable wal logical log.
Example:
wal_level = logical
-
Set wal_sender_timeout to 0. When the replication connection is inactive for more than wal_sender_timeout, it will be terminated by the source library. The default value of this parameter is 60 seconds, but we recommend that you set this parameter to zero, which disables the timeout mechanism.
Example:
wal_sender_timeout = 0
-
Set max_wal_senders to an integer greater than 10. The max_wal_senders parameter is used to set the number of concurrent tasks that can be run. It is recommended to be consistent with the value set by max_replication_slots.
Example:
max_wal_senders = 12
-
- Restart the PostgreSQL database
-
Set the replica identity of the table to migrate to FULL
ALTER TABLE TABLENAME REPLICA IDENTITY FULL;
4. Use PostgreSQL Database as DTS Source
Using the PostgreSQL database as the source, please refer to the best practice document for the operation procedures of task creation, task configuration, pre-check, task start, task pause, and task termination. The task configuration parameters and object mapping are slightly different from other data sources.
4.1 Task Configuration Parameters
As shown in FIG. Currently, the source of DTS supports self-built PostgreSQL instances on the public network. The configuration parameters of the PostgreSQL source are described as follows:
- Access type: Support public network/BCC/BBC/DCC self-built PostgreSQL instance.
- Type of data: Fixed selection of PostgreSQL.
- IP/port: Access IP and service port of self-built PostgreSQL database.
- Database: Name of the database to migrate in the self-built PostgreSQL database.
- Account number: Self-built PostgreSQL database account.
- Password: Password corresponding to the database account.
4.2 Object Mapping
PostgreSQL is a three-level schema. When you need to migrate data to the destination of the two-level schema, DTS provides two database table name mapping methods for users to choose. As shown in the figure below, the selected migration object will appear in the selected object list on the right. DTS supports upstream and downstream database table name mapping, column filtering black and white lists and other functions. You can click [Edit] to configure mapping and filtering rules for each migration object. The destination database object can be mapped to the database and schema in the source library:
- 1. The database object is mapped to the Database in the source library: Ignore the schema in PostgreSQL, the tables under different schemas in PostgreSQL are all mapped to the mysql specified library, and the destination library name uses the source library name by default.
- 2. Database objects are mapped to the Schema in the source library: Ignore the library name in PostgreSQL. Different schemas in PostgreSQL are mapped to different libraries in mysql. The destination library name uses the source schema name by default.
After completing the object mapping configuration, click [Save and pre-check] to start the pre-check of the task.
5. PostgreSQL Source Data Types Supported by DTS
The following table lists the PostgreSQL source data types supported when using DTS and the default mapping with DTS data types.
For information on how to view the data types mapped on the destination side, please refer to the Data Type Mapping section for data sources on the destination side.
PostgreSQL data types | DTS field type |
---|---|
INTEGER | DTS_TYPE_INT4 |
INTEGER[] | DTS_TYPE_STRING |
TEXT[] | DTS_TYPE_STRING |
BIT | DTS_TYPE_BYTES |
BIT VARYING | DTS_TYPE_BYTES |
MONEY | DTS_TYPE_NUMBER |
CHARACTER | DTS_TYPE_STRING |
CHARACTER VARYING | DTS_TYPE_STRING |
TEXT | DTS_TYPE_STRING |
BYTEA | DTS_TYPE_BYTES |
COMPLEX | DTS_TYPE_STRING |
ENUM | DTS_TYPE_STRING |
CIDR | DTS_TYPE_STRING |
POINT | DTS_TYPE_STRING |
LINE | DTS_TYPE_STRING |
LSEG | DTS_TYPE_STRING |
BOX | DTS_TYPE_STRING |
PATH | DTS_TYPE_STRING |
POLYGON | DTS_TYPE_STRING |
CIRCLE | DTS_TYPE_STRING |
INET | DTS_TYPE_STRING |
MACADDR | DTS_TYPE_STRING |
MACADDR8 | DTS_TYPE_STRING |
JSON | DTS_TYPE_STRING |
SMALLINT | DTS_TYPE_INT2 |
BIGINT | DTS_TYPE_INT8 |
NUMERIC(P, S) | If the precision is between 0 and 38, use DTS_TYPE_NUMBER, if the precision is 39 or greater, use DTS_TYPE_STRING |
DECIMAL(P, S) | If the precision is between 0 and 38, use DTS_TYPE_NUMBER, if the precision is 39 or greater, use DTS_TYPE_STRING |
DECIMAL | DTS_TYPE_STRING |
NUMBER | DTS_TYPE_STRING |
DECIMAL(p) | DTS_TYPE_NUMBER |
NUMBER(p) | DTS_TYPE_NUMBER |
REAL | DTS_TYPE_FLOAT |
DOUBLE PRECISION | DTS_TYPE_DOUBLE |
SMALLSERIAL | DTS_TYPE_INT2 |
SERIAL | DTS_TYPE_INT4 |
BIGSERIAL | DTS_TYPE_INT8 |
TIME WITHOUT TIME ZONE | DTS_TYPE_TIME |
DATE | DTS_TYPE_DATE |
TIMESTAMP WITHOUT TIMEZONE | DTS_TYPE_DATETIME |
INTERVAL | DTS_TYPE_STRING |
BOOLEAN | DTS_TYPE_BOOLEAN |