          Use Rsync to Synchronize from the Old File System to the CFS File System

          This document describes an rsync-based data synchronization solution for backing up or migrating file system data from on-premises data centers or other cloud vendors to Baidu AI Cloud CFS.

          The following assumes that the file system to be synchronized has been mounted on one or more servers and the path is /old_fs.

          Operation Steps

          1.Log in to the cloud server

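          1. Bind an EIP (elastic IP) to the Baidu Cloud Compute (BCC) instance so that it has a public IP address.
          2. Ensure that the remote server (the server on which /old_fs is mounted) can log in to the BCC instance over SSH through its public IP address.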

          2.Configure the public key

          Log in to the remote server and execute the following command:

          ssh-keygen -t rsa       # Generate an RSA key pair; press Enter at each prompt to accept the defaults
          cat ~/.ssh/id_rsa.pub   # Print the generated public key; copy it for use in the next step

          Log in to BCC, open the file ~/.ssh/authorized_keys, and append the public key copied in the previous step.

          The remote server can now log in to BCC without a password.
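Before starting a long transfer, it can help to confirm that key-based login really works without any prompt. The following is a minimal sketch, not part of the official procedure: `check_key_login` is a hypothetical helper name, and `user@bcc_ip` is the same placeholder used in the rsync commands in this document. Because `BatchMode=yes` disables all prompts, the check also fails if the BCC host key has not been accepted yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if SSH login works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password;
# ConnectTimeout=10 gives up after 10 seconds if the host is unreachable.
check_key_login() {
    local target="$1"   # e.g. user@bcc_ip (placeholder)
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true
}

# Usage (placeholder target):
#   check_key_login user@bcc_ip && echo "key login OK"
```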

          3.Policy evaluation

          Evaluate the following factors to choose an appropriate synchronization approach:

          1. Whether writes to the old file system can be suspended during synchronization. If they can, a single full synchronization is enough (scheme 4.1). If they cannot, estimate how much data will be written during synchronization: if the amount is small, use a full synchronization followed by one incremental synchronization (scheme 4.2); otherwise, use periodic synchronization (scheme 4.3).
          2. The amount of data in the file system. If the old file system holds a large amount of data, synchronize it by subdirectory. The subdirectories are synchronized independently of one another, so the transfers can run in parallel or one after another. CFS can also be mounted on multiple virtual machines at the same time, so adding virtual machines and synchronizing different subdirectories concurrently can speed up the migration.
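The per-subdirectory approach in item 2 can be sketched as a small shell function that starts one rsync per immediate subdirectory and waits for all of them. This is an illustrative sketch under stated assumptions, not a tuned tool: `sync_subdirs` is a hypothetical name, it starts every transfer at once with no concurrency limit, it skips files that sit directly in the source root, and the `*/` glob does not match hidden directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sync each immediate subdirectory of src_root concurrently.
sync_subdirs() {
    local src_root="$1"     # e.g. /old_fs
    local dest_root="$2"    # e.g. user@bcc_ip:/mnt/cfs (placeholder)
    local dir name pid status=0
    local pids=()

    for dir in "$src_root"/*/; do
        [ -d "$dir" ] || continue           # the glob matched nothing
        name=$(basename "$dir")
        # $dir ends with a slash, so its contents land in $dest_root/$name
        rsync -zr "$dir" "$dest_root/$name" &
        pids+=("$!")
    done

    # Wait for every transfer; report failure if any of them failed
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Usage (placeholders from this document):
#   sync_subdirs /old_fs user@bcc_ip:/mnt/cfs
```

For a file system with many subdirectories, limiting the number of simultaneous transfers or splitting the subdirectories across several BCC instances is likely a better fit than starting them all at once.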

          4.Synchronization scheme

          The synchronization process varies slightly depending on whether the old file system receives writes during synchronization. Three cases are covered below:

          4.1 No writes to the old file system during synchronization

          First confirm that the CFS file system is mounted on the BCC instance at /mnt/cfs/ (or another directory), and then execute the following commands on the remote server:

          rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 
          rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2 
          rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3 
          ...

          Parameters

          -z: compress file data during transfer

          -v: verbose output

          -r: recurse into directories

          user: BCC login account

          bcc_ip: public IP address of the BCC instance
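After a full synchronization, one way to check the result is to ask rsync what it would still transfer, without copying anything. The sketch below is an optional extra rather than part of the documented procedure; `list_differences` is a hypothetical helper name. It uses the standard rsync options -n (--dry-run), -c (--checksum) and -i (--itemize-changes). Note that -c reads every file on both sides, so it can be slow on large file systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files that differ between source and destination.
# -r recurse, -n dry run (nothing is copied), -c compare by checksum instead of
# size and modification time, -i print one itemized line per difference.
list_differences() {
    local src="$1"    # e.g. /old_fs/sub_folder_1/
    local dest="$2"   # e.g. user@bcc_ip:/mnt/cfs/sub_folder_1 (placeholder)
    rsync -rnci "$src" "$dest"
}

# Usage (placeholders from this document); empty output means no differences were found:
#   list_differences /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
```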

          4.2 A small number of writes to the old file system during synchronization

          If the old file system is still receiving writes, a single rsync run cannot guarantee that the old file system and the CFS file system are identical, although it brings them mostly in sync. After the first run, suspend the application's writes to the old file system and run rsync again. The steps are as follows:

          1. Leave writes to the old file system enabled and execute the following commands on the remote server:
          rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 
          rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2 
          rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3 
          ...
          2. Suspend writes to the old file system, then execute the same commands again on the remote server:
          rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 
          rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2 
          rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3 
          ...

          Because the first run has already copied most of the data, the second run only has to transfer the differences between the two file systems, so it finishes much faster.
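One caveat worth checking: the commands above use -r without -t, so file modification times are not preserved on CFS. According to the rsync manual, without -t (or -a, which includes it) rsync cannot skip unchanged files using its size-and-time quick check, and instead re-examines every file with the delta-transfer algorithm. The following is a minimal sketch, assuming that preserving modification times is acceptable for your data; `sync_dir` is a hypothetical helper name, and it would need to be used for both the first and the second pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same options as the commands above, plus -t to preserve
# modification times so that a later pass can skip files that did not change.
sync_dir() {
    local src="$1"    # e.g. /old_fs/sub_folder_1/
    local dest="$2"   # e.g. user@bcc_ip:/mnt/cfs/sub_folder_1 (placeholder)
    rsync -zvrt "$src" "$dest"
}

# Usage (placeholders from this document):
#   sync_dir /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
```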

          4.3 A large number of writes to the old file system during synchronization

          If the application writes heavily to the old file system and the writes cannot be suspended for the duration of the transfer, synchronization alone cannot, in theory, guarantee strong consistency between the old file system and CFS. In this case, consider running rsync periodically to keep the two file systems as consistent as possible. Linux offers several ways to run commands periodically; crontab is recommended here because it makes it easy to call rsync on a schedule. For example:

          crontab -e 

          Then enter the following line, which runs the rsync command once an hour, at minute 0:

          0 * * * * rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 

          Use the following command to check that the crontab entry was saved:

          crontab -l 
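If one synchronization run takes longer than the interval, cron starts the next run while the previous one is still in progress. The following is a minimal sketch of a guard against overlapping runs, assuming the `flock` utility from util-linux is installed; the function name, the lock file path and the script path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command only if nothing else holds the lock.
# With -n, flock exits immediately with a non-zero status instead of waiting
# when the lock is already held.
run_exclusive() {
    local lock_file="$1"
    shift
    flock -n "$lock_file" "$@"
}

# In a wrapper script (hypothetical path /usr/local/bin/cfs_sync.sh) it could be called as:
#   run_exclusive /tmp/cfs_sync.lock rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1
# with the crontab entry pointing at the script instead of at rsync directly:
#   0 * * * * /usr/local/bin/cfs_sync.sh
```

With this guard, a run that finds the lock taken is simply skipped, and the next scheduled run picks up whatever changed in the meantime.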

          After the periodic synchronization has been running for a while, choose a suitable time to suspend the application's writes to the old file system and delete the crontab task:

          crontab -e # In the editor, delete the rsync entry, then save and exit 

          Finally, run the following rsync commands manually one last time so that the two file systems are fully synchronized.

          rsync -zvr /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 
          rsync -zvr /old_fs/sub_folder_2/ user@bcc_ip:/mnt/cfs/sub_folder_2 
          rsync -zvr /old_fs/sub_folder_3/ user@bcc_ip:/mnt/cfs/sub_folder_3 
          ...

          The applications can then be migrated from the old server to the BCC instance.

          5.Statistical results

          Each rsync run prints a summary of the synchronization, including:

          1. Time spent on synchronization;
          2. How many bytes were transferred;
          3. IO transmission speed;
          4. How many files are synchronized in total.
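To keep these results for later review, the output of each run can be appended to a log file. The following is a minimal sketch, not part of the official procedure: `sync_with_log` and the log file path are hypothetical. It adds the standard rsync option --stats, which prints additional transfer statistics such as file counts, and records start and end timestamps so that the elapsed time can be read from the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run rsync and append its output and statistics to a log file.
sync_with_log() {
    local src="$1"        # e.g. /old_fs/sub_folder_1/
    local dest="$2"       # e.g. user@bcc_ip:/mnt/cfs/sub_folder_1 (placeholder)
    local log_file="$3"   # e.g. /var/log/cfs_sync.log (hypothetical path)
    local status=0
    {
        echo "=== start $(date '+%F %T') $src -> $dest"
        rsync -zvr --stats "$src" "$dest" || status=$?
        echo "=== end $(date '+%F %T') exit status $status"
    } >> "$log_file" 2>&1
    return "$status"      # pass rsync's exit status back to the caller
}

# Usage (placeholders from this document):
#   sync_with_log /old_fs/sub_folder_1/ user@bcc_ip:/mnt/cfs/sub_folder_1 /var/log/cfs_sync.log
```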