Use Rsync to Sync From Old File System to CFS File System
This document provides a data synchronization solution based on rsync, helping users back up or migrate file system data stored in on-premises data centers or other cloud service providers to Baidu AI Cloud CFS.
The following assumes that the file system to be synchronized has been mounted on one or more servers, with the path /old_fs.
Operation steps
1. Sign in to Baidu Cloud Compute
- Bind an EIP to the Baidu Cloud Compute (BCC) to obtain a public IP address.
- Ensure that the remote server can log in to the BCC’s public IP via SSH.
2. Configure the public key
Sign in to the remote server and execute the following commands:
```
ssh-keygen -t rsa        # Creates a key pair; press Enter at each prompt to accept the defaults
cat ~/.ssh/id_rsa.pub    # Prints the generated public key; copy it for use in the next step
```

Sign in to the BCC, edit the file ~/.ssh/authorized_keys, and append the public key copied in the previous step to this file. Once the steps above are completed, the remote server can log in to the BCC over SSH without a password.
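The key installation on the BCC can also be scripted. The following is a minimal sketch, not an official procedure: `install_pubkey` is a hypothetical helper name, and it assumes the default OpenSSH setup in which sshd reads authorized keys from `~/.ssh/authorized_keys` and expects restrictive permissions on that file and its directory.

```shell
# Hypothetical helper: add a public key to authorized_keys if it is not there yet.
# $1 = the full public key line (the output of `cat ~/.ssh/id_rsa.pub`)
# $2 = target .ssh directory (assumed default: $HOME/.ssh)
install_pubkey() {
    key="$1"
    ssh_dir="${2:-$HOME/.ssh}"

    # sshd commonly rejects keys when these permissions are too open
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"

    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    grep -qxF "$key" "$ssh_dir/authorized_keys" ||
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
}

# Example (run on the BCC, with the key pasted in place of the placeholder):
# install_pubkey "ssh-rsa AAAA... user@remote-server"
```

Because the helper checks for an existing identical line first, running it twice with the same key does not create duplicates.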
3. Strategy evaluation
Users need to evaluate the following factors to select a more appropriate data synchronization method:
- Can writes to the current file system be paused during synchronization? If yes, perform a one-time full synchronization (see Synchronization Solution 4.1). If not, calculate how much data will be written during synchronization. For small volumes, adopt "full synchronization + one incremental synchronization" (see Synchronization Solution 4.2). Otherwise, use the periodic synchronization approach (see Synchronization Solution 4.3).
- Consider the data volume of the file system. For large volumes, synchronize by subdirectories. Each subdirectory's sync process is independent and can be done simultaneously or one after another. CFS supports concurrent mounting by multiple VMs, so increasing the number of VMs for subdirectory sync can speed up the process.
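Splitting a large file system by subdirectory can be sketched as follows. `plan_sync` is a hypothetical helper that only prints one rsync command per first-level subdirectory, so the plan can be reviewed and divided among several servers before anything is transferred. It assumes directory names without spaces; `user@bcc_ip` and `/mnt/cfs` are the same placeholders used elsewhere in this document.

```shell
# Hypothetical helper: print one rsync command per first-level subdirectory.
# $1 = source root on the remote server, e.g. /old_fs
# $2 = rsync destination prefix, e.g. user@bcc_ip:/mnt/cfs
# Assumes directory names contain no spaces (the output is not quoted).
plan_sync() {
    src_root="${1%/}"    # drop a trailing slash if present
    dest="$2"
    for dir in "$src_root"/*/; do
        [ -d "$dir" ] || continue          # skip when the glob matched nothing
        name=$(basename "$dir")
        printf 'rsync -zvr %s/%s/ %s/%s\n' "$src_root" "$name" "$dest" "$name"
    done
}

# Example: review the plan first, then run the commands one after another.
# plan_sync /old_fs user@bcc_ip:/mnt/cfs
# plan_sync /old_fs user@bcc_ip:/mnt/cfs | sh
```

Files that sit directly in the source root are not covered by this sketch and would need a separate command.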
4. Synchronization solutions
The synchronization approach changes depending on whether there are write operations on the old file system. The following situations are addressed separately:
4.1 No write IO requests to the original file system
First, confirm that the CFS file system has been mounted to /mnt/cfs/ (or another directory) on the BCC. Then, execute the following command on the remote server:
```
rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
...
```

> **Notes**
>
> -z: Enable compression
>
> -v: Enable verbose output
>
> -r: Enable recursive synchronization
>
> user: BCC login account
>
> bcc_ip: Public IP address of the BCC

4.2 A small number of write IO requests to the original file system
While the old file system is still receiving writes, a single rsync run cannot guarantee that the old file system and CFS are fully synchronized, because files can change during the transfer. The first run still brings the two systems mostly in sync. After it finishes, temporarily stop write IO operations to the old file system and run rsync again to copy the remaining changes. The process is as follows:
- With write IO operations to the old file system still running, run these commands on the remote server:

```
rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
...
```

- Pause write IO operations to the old file system. Execute the following commands on the remote server:

```
rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
...
```

When running rsync a second time, since a previous sync has already taken place, rsync will only identify and synchronize the differences. This makes the second rsync process significantly faster.
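After the second pass, a dry run can confirm that nothing is left to transfer. The sketch below is one possible check rather than part of the original procedure; `pending_changes` is a hypothetical helper name. Since the commands in this document use `-r` without `-t`, modification times are not preserved on the destination, so the sketch compares files by checksum (`-c`) instead of by size and modification time. This reads every file on both sides and can be slow on large file systems.

```shell
# Hypothetical helper: list regular files that another rsync run would still transfer.
# $1 = source directory, e.g. /old_fs/sub_folder_1
# $2 = destination, e.g. user@bcc_ip:/mnt/cfs/sub_folder_1
pending_changes() {
    src="${1%/}/"    # trailing slash: compare the directory's contents
    dest="$2"
    # -r recursive, -c compare by checksum, -n dry run (nothing is copied),
    # -i itemize each change; file transfers are reported as lines starting with "<f" or ">f"
    rsync -rcn -i "$src" "$dest" | grep '^[<>]f' || true
}

# Example: empty output means the two sides hold the same file contents.
# pending_changes /old_fs/sub_folder_1 user@bcc_ip:/mnt/cfs/sub_folder_1
```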
4.3 A large number of write IO requests to the original file system
If there are numerous write IO operations and pausing IO is not possible, achieving strong consistency between the old file system and CFS through synchronization is not realistic. In this case, consider running rsync periodically to maintain as much consistency as possible. On Linux, multiple methods are available for periodic execution of commands. We suggest using crontab to schedule rsync at regular intervals. For instance:
```
crontab -e
```

Add the following entry to the crontab configuration (it schedules the rsync command to run every hour).
```
0 * * * * rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
```

Run the following command to verify that the crontab entry was saved.
```
crontab -l
```

After a period of synchronization from the old file system to Baidu CFS, select an appropriate time to pause the application's write operations to the old file system, and delete the crontab task with the following command:

```
crontab -e    # Follow the prompts to delete the rsync task and exit
```

Finally, manually execute the rsync command to ensure complete synchronization between the two file systems.

```
rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2
rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3
...
```

At this stage, users can migrate their application from the old server to the Baidu BCC server.
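With periodic synchronization, one rsync run may still be in progress when cron starts the next one. A possible safeguard is to wrap the command in a simple lock. The sketch below is an assumption, not part of the original procedure: `run_locked`, the lock path, and the script path in the comments are all hypothetical. It relies on `mkdir` failing when the directory already exists.

```shell
# Hypothetical helper: run a command only if no earlier run holds the lock.
# $1 = lock directory (hypothetical path, e.g. /tmp/cfs_sync.lock)
# remaining arguments = the command to run
run_locked() {
    lock_dir="$1"
    shift
    # mkdir is atomic: it fails if the lock directory already exists
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lock_dir"      # release the lock
    return "$status"
}

# Example wrapper script body, scheduled from crontab instead of the bare rsync command:
# run_locked /tmp/cfs_sync.lock rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
```

If the script is killed before it releases the lock, the lock directory has to be removed by hand before the next run can proceed.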
5. Statistical results
After each rsync execution, the synchronization results will be displayed, including:
- Time spent on synchronization;
- Number of bytes transferred;
- IO transfer speed;
- Total count of synchronized files.
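To record these figures across runs, the summary can be extracted from rsync's verbose output. The sketch below assumes the usual final summary line of the form `sent N bytes  received M bytes  R bytes/sec` and that the `-h` (human-readable) option is not in use; `sent_bytes` and the log file name are hypothetical.

```shell
# Hypothetical helper: read rsync -v output on stdin and print the number of bytes sent.
# Assumes the summary line starts with "sent" and that -h (human-readable) is not used.
sent_bytes() {
    awk '/^sent .* bytes/ {
        gsub(",", "", $2)    # newer rsync versions print digit grouping, e.g. 1,234
        print $2
    }'
}

# Example: keep the full log and print the byte count of this run.
# rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 | tee sync.log | sent_bytes
```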
