Operations and Practices
Attach
Run the following command to mount a bucket to a local directory with BOS FS:
bosfs mybucket my_local_directory -o endpoint=http://bj.bcebos.com -o ak=xxxxxxxxxxxxxxxx -o sk=xxxxxxxxxxxxxxxxx -o logfile=xx/xx.log
After execution, a daemon process named bosfs starts in the background. If the mount succeeds, you can see the mount target with the df command. You can also check /etc/fstab to see the specific mount options in use.
Additionally, bosfs allows subdirectory mounting by appending the subdirectory to the bucket name in the format bucket/subdir, with the subdirectory becoming the mount target's root.
Note: By default, only the root user has permission to run fusermount. To let another user run fusermount, add that user to the fuse group created during the fuse installation, using the usermod -a -G fuse YourAccount command.
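The mount step can be wrapped in a small shell helper. This is a minimal sketch, not part of bosfs: the function name mount_bucket and the variables BOSFS, BOS_ENDPOINT, BOS_AK, and BOS_SK are hypothetical names introduced here for illustration. Only the bosfs arguments themselves come from the command shown earlier.

```shell
# Hypothetical helper. BOSFS can be overridden (e.g. BOSFS=echo) for a dry run.
BOSFS=${BOSFS:-bosfs}

# mount_bucket BUCKET[/SUBDIR] LOCAL_DIR LOGFILE
# Passes the same -o options as the mount command shown earlier.
# BOS_ENDPOINT, BOS_AK and BOS_SK are assumed to be set by the caller.
mount_bucket() {
  "$BOSFS" "$1" "$2" \
    -o endpoint="$BOS_ENDPOINT" \
    -o ak="$BOS_AK" \
    -o sk="$BOS_SK" \
    -o logfile="$3"
}

# Example calls with placeholder values, left commented out so that
# nothing is mounted by accident:
#   BOS_ENDPOINT=http://bj.bcebos.com BOS_AK=xxxx BOS_SK=xxxx
#   mount_bucket mybucket        my_local_directory xx/xx.log  # whole bucket
#   mount_bucket mybucket/subdir my_local_directory xx/xx.log  # subdir becomes the root
#   df -h my_local_directory                                   # confirm the mount target
```

Setting BOSFS=echo before calling mount_bucket prints the arguments instead of mounting, which is a cheap way to check the options before touching a real bucket.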
Detach
- The umount command can be used to unmount directly.
- Non-root users who lack unmount permission but have been added to the fuse group can unmount with fusermount -u my_local_directory.
- The mount can also be removed by terminating the background bosfs process, though this is not recommended because unflushed files may be corrupted.
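The choice between the first two options can be sketched as a short shell function. The name unmount_command is hypothetical, and the function only prints the command rather than running it, so it is safe to try. It assumes that any non-root caller has already been added to the fuse group.

```shell
# Print the unmount command that matches the given numeric user ID:
# root (UID 0) can use umount; other users go through fusermount -u.
unmount_command() {
  uid=$1
  dir=$2
  if [ "$uid" -eq 0 ]; then
    printf 'umount %s\n' "$dir"
  else
    printf 'fusermount -u %s\n' "$dir"
  fi
}

# Dry run for the current user; this only prints the command.
unmount_command "$(id -u)" my_local_directory
```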
Example
This example illustrates BOS FS usage by synchronizing data between two buckets.
Imagine there are two storage buckets, bucket-a and bucket-b, possibly located in different regions (e.g., Beijing and Guangzhou). The requirement is to back up data from bucket-a to bucket-b while transferring only modified files to minimize resource usage. The following steps can be performed:
- Find an available virtual machine with a CentOS image (purchase one if none is available) and enable the EIP service;
- Install bosfs;
- Mount bucket-a and bucket-b to local directories bucket-a and bucket-b using the bosfs mount command;
- Start synchronizing data between the two buckets using the rsync -av --partial-dir=/tmp bucket-a bucket-b command.
- Consider adding the rsync command to crontab for regular execution.
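The last two steps can be sketched as a small script plus a crontab entry. This is an illustration under stated assumptions: the script name sync.sh, the function sync_once, the RSYNC override variable, the schedule, and all /path/to/... locations are hypothetical. Only the rsync options come from the steps above.

```shell
#!/bin/sh
# sync.sh: hypothetical wrapper around the rsync command from the steps above.
# Assumes both buckets are already mounted with bosfs.
RSYNC=${RSYNC:-rsync}   # override (e.g. RSYNC=echo) to preview the command

sync_once() {
  # $1: source directory, $2: destination directory
  "$RSYNC" -av --partial-dir=/tmp "$1" "$2"
}

# Run only when called with two arguments, e.g.: sync.sh bucket-a bucket-b
if [ "$#" -eq 2 ]; then
  sync_once "$1" "$2"
fi

# Hypothetical crontab entry (added with crontab -e) that runs the script
# every day at 02:00; all paths are placeholders:
#   0 2 * * * cd /path/to/mounts && /path/to/sync.sh bucket-a bucket-b >> /path/to/sync.log 2>&1
```

Note that rsync treats a source without a trailing slash (bucket-a) as a directory to copy into the destination, producing bucket-b/bucket-a. Writing bucket-a/ instead copies the contents of bucket-a directly into bucket-b.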
Notes:
- During the synchronization of a file, the /tmp directory or a selected cache directory will temporarily use space equivalent to the file's size. This space will be freed once the synchronization is complete. Ensure local disk space is adequately available.
- When using the rsync command, add the --partial-dir=/tmp option to place temporary files in the /tmp directory, avoiding unnecessary data transfer within the mount target.
- A single rsync process may synchronize slowly, and throughput on one machine is also bounded by its network interface card. To improve performance, if the bucket's directory structure can be divided, consider running multiple rsync processes in parallel or deploying several virtual machines to synchronize distinct subdirectories simultaneously.
- If the same file is edited concurrently on multiple mounted endpoints, BOS cannot guarantee predictable outcomes. This could lead to file truncation, loss of written data, or newly written content becoming unreadable on other endpoints. Such scenarios should thus be avoided.
- In this scenario, because the two buckets are in different regions, enabling EIP is necessary. If either bucket and BCC are situated in different regions, the EIP feature must also be activated. There is no need to enable EIP only if all three components are in the same region.
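The parallel rsync suggestion in the notes above can be sketched as follows, with one rsync process per top-level subdirectory. The function name parallel_sync, the RSYNC override variable, and the default of 4 parallel jobs are assumptions made for this sketch. It relies on find and xargs options (-mindepth, -maxdepth, -print0, -0, -P) that are available in the GNU tools shipped with CentOS.

```shell
RSYNC=${RSYNC:-rsync}   # override (e.g. RSYNC=echo) to preview the commands

# parallel_sync SRC DST [JOBS]
# Starts one rsync per top-level subdirectory of SRC, at most JOBS at a time.
# Files that sit directly in SRC (outside any subdirectory) are not covered
# and would need a separate pass.
parallel_sync() {
  src=$1
  dst=$2
  njobs=${3:-4}
  find "$src" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P "$njobs" -I{} "$RSYNC" -a --partial-dir=/tmp {} "$dst/"
}

# Example (placeholders; both directories are bosfs mount targets):
#   parallel_sync bucket-a bucket-b 4
```

Each subdirectory is passed without a trailing slash, so bucket-a/sub ends up as bucket-b/sub. Because every rsync process buffers partial files in /tmp, the local disk space requirement grows with the number of parallel jobs.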
Practice
- bosfs performance and notes:
- bosfs is developed and mounted based on fuse. Due to semantic restrictions of its framework API, operations on local files within the mount target may require multiple network interactions with the Baidu AI Cloud Object Storage (BOS) service. Therefore, the network environment between the bosfs mount target instance and the BOS endpoint will affect bosfs performance.
- bosfs provides numerous parameters that can be optimized according to specific usage scenarios. Refer to Documentation.
- bosfs operations are not atomic. In particular, when multiple bosfs processes mount the same BOS bucket, concurrent operations on that bucket across the mount targets may result in undefined behavior.
- If bosfs cannot meet service requirements, boscmd is recommended. For non-sensitive online services, bosfs may be used. For scenarios requiring reliability, stability and controllability, API integration is recommended.
- Using bosfs in a docker/kubernetes environment:
- In a k8s environment, static PV/PVC can be used to access BOS through a bosfs mount.
- It is recommended to manage business pods and bosfs pods separately. A liveness probe can be used to restart bosfs when it becomes abnormal.
- When a single business pod uses multiple bosfs mount targets, or when multiple pods are launched in the k8s environment, running several bosfs containers on one node may consume excessive memory and bandwidth. Adjust the deployment according to actual conditions.
- Cloud Container Engine (CCE) of Baidu AI Cloud has integrated components for accessing BOS through bosfs mounting in containers. Refer to Documentation.
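The liveness probe idea mentioned above could be backed by a check like the one below. This is a sketch under assumptions, not the CCE component's actual probe: the function name bosfs_mount_alive and the mount path /mnt/bos are hypothetical. It treats the mount as healthy only if the directory is itself a mount target according to df and can still be listed.

```shell
# bosfs_mount_alive DIR
# Succeeds only if DIR is itself a mount target and can be listed.
bosfs_mount_alive() {
  dir=$1
  # Resolve the absolute path; this fails if DIR is missing or inaccessible.
  abs=$(cd "$dir" 2>/dev/null && pwd -P) || return 1
  # df -P prints the mount target in the last column of its second line.
  target=$(df -P "$abs" 2>/dev/null | awk 'NR==2 {print $NF}')
  [ "$target" = "$abs" ] || return 1
  # Listing the directory forces a round trip through the mounted file system.
  ls "$abs" >/dev/null 2>&1
}

# Example use as a probe command (hypothetical mount path):
#   bosfs_mount_alive /mnt/bos || exit 1
```

Kubernetes exec probes treat exit code 0 as healthy and any other exit code as a failure, so a script built around this function can be used directly as the probe command. Mount paths that contain spaces are not handled by the awk step.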
