BOS-AlluxioExtension Tool
Updated at: 2025-11-03
Introduction to Alluxio
Alluxio is a memory-speed virtual distributed storage system.
In the big data ecosystem, Alluxio acts as a bridge between data-driven frameworks or applications (e.g., Apache Spark) and persistent storage systems (e.g., BOS, HDFS, S3). It unifies data across these storage systems, offering a single client API and global namespace for upper-level applications.
Advantages of using BOS via Alluxio
Alluxio's memory-first hierarchical architecture simplifies access to BOS data and significantly reduces the number of requests sent to BOS APIs for frequently accessed (hot) data.
Read
- Memory-level I/O throughput: Alluxio's hierarchical storage serves frequently accessed data from the cache
- Lower latency for operations that are slow in object storage, such as listing directories or renaming
- Simpler data management: Alluxio provides a single access point for data from multiple sources
- Compatibility: existing data analysis applications, such as Spark and MapReduce programs, can run on Alluxio without any code changes
Write
Intelligent caching, with a write policy that can be configured as needed:
- MUST_CACHE: Write to the Alluxio cache only. Suitable for temporary data that does not need to be preserved. It offers the best performance but carries a high risk of data loss
- THROUGH: Write directly to BOS without caching. Suitable for infrequently used data, leaving more Alluxio memory for frequently read data
- CACHE_THROUGH: Write synchronously to both Alluxio and BOS. Suitable for data that will soon be used by other Alluxio applications
- ASYNC_THROUGH: Default mode. Write to the cache and persist to BOS asynchronously. Suitable for data requiring persistence but not immediate use
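The choice among the four policies above can be sketched as a small decision helper. This is only an illustration: the function name `choose_write_type` and its argument labels (`none`, `async`, `sync`, `yes`, `no`) are hypothetical and not part of Alluxio; only the four policy names and the `alluxio.user.file.writetype.default` property come from this document.

```shell
# choose_write_type PERSIST KEEP_CACHED  (hypothetical helper, not an Alluxio command)
#   PERSIST:     none  = data never needs to reach BOS
#                async = persist to BOS in the background
#                sync  = persist to BOS before the write returns
#   KEEP_CACHED: yes/no, whether the data should also stay in the Alluxio cache
choose_write_type() {
  persist="$1"; keep_cached="$2"
  case "$persist:$keep_cached" in
    none:*)   echo "MUST_CACHE" ;;      # cache only, fastest, data can be lost
    async:*)  echo "ASYNC_THROUGH" ;;   # cache now, write to BOS later
    sync:yes) echo "CACHE_THROUGH" ;;   # write to Alluxio and BOS together
    sync:no)  echo "THROUGH" ;;         # write to BOS only, keep memory free
    *)        echo "unknown combination: $persist $keep_cached" >&2; return 1 ;;
  esac
}

# Print the property line for data that must be persisted and read again soon.
echo "alluxio.user.file.writetype.default=$(choose_write_type sync yes)"
```

The printed line can be pasted into alluxio-site.properties; the example call yields the CACHE_THROUGH setting.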
Quick start
Deployment
- Download Alluxio and the precompiled BOS under-storage module alluxio-underfs-bos, then unzip
- Install the alluxio-underfs-bos JAR package
Bash
$ cd ${ALLUXIO_HOME}
$ ./bin/alluxio extensions install <path>/alluxio-underfs-bos-0.1.0.jar
- In the ${ALLUXIO_HOME}/conf directory, create the alluxio-site.properties configuration file from the template, then configure the BOS AK/SK and endpoint. Temporary STS access can also be enabled.
Bash
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
Bash
fs.bos.accessKey=<BOS_AK>
fs.bos.secretKey=<BOS_SK>
fs.bos.endpoint=<BOS_ENDPOINT>
alluxio.user.file.writetype.default=CACHE_THROUGH
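Before starting Alluxio, it can help to confirm that the placeholders in the configuration file were actually replaced. The following is a minimal sketch: `check_bos_conf` is a hypothetical helper (not shipped with Alluxio or the BOS extension) that only checks the three `fs.bos.*` keys shown above.

```shell
# check_bos_conf FILE  (hypothetical helper)
# Succeeds only if each BOS key is present with a value that is neither
# empty nor still a <PLACEHOLDER>.
check_bos_conf() {
  conf_file="$1"
  for key in fs.bos.accessKey fs.bos.secretKey fs.bos.endpoint; do
    # Print the text after "key=" for the first matching line, if any.
    value=$(sed -n "s/^${key}=//p" "$conf_file" | head -n 1)
    case "$value" in
      ""|"<"*">")
        echo "missing or placeholder value: $key" >&2
        return 1 ;;
    esac
  done
  echo "BOS settings look complete"
}
```

A typical call would be `check_bos_conf conf/alluxio-site.properties`, run from ${ALLUXIO_HOME}. The check is purely textual and does not verify that the credentials are valid.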
Alluxio mounts BOS
- Format the Alluxio journal and worker storage directories.
Bash
$ ./bin/alluxio format
- Start Alluxio locally
Bash
$ ./bin/alluxio-start.sh local
- Create a directory and mount BOS. The bucket must already exist on BOS; this example uses the "test-979" bucket
Bash
$ ./bin/alluxio fs mkdir /mnt
Successfully created directory /mnt
Bash
$ ./bin/alluxio fs mount /mnt/bos bos://test-979
Mounted bos://test-979 at /mnt/bos
- Copy a local file to Alluxio
Bash
$ ./bin/alluxio fs copyFromLocal LICENSE /mnt/bos
Copied file:///alluxio-2.0.0/LICENSE to /mnt/bos
- Use the ls command to view the copied file. The output columns are: permissions, file size, whether the file is persisted to BOS, creation date, percentage cached in Alluxio, and file name
Bash
$ ./bin/alluxio fs ls /mnt/bos/LICENSE
-rwxrwxrwx 27040 PERSISTED 07-21-2020 15:06:46:000 100% /mnt/bos/LICENSE
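To make the column meanings concrete, the sketch below labels each field of one listing line. It assumes the seven whitespace-separated columns shown in the sample output above (other Alluxio versions may print additional columns), and `describe_ls_line` is a hypothetical helper name.

```shell
# Sample line copied from the listing above.
line='-rwxrwxrwx 27040 PERSISTED 07-21-2020 15:06:46:000 100% /mnt/bos/LICENSE'

# describe_ls_line LINE  (hypothetical helper)
# Label each whitespace-separated column of one `fs ls` output line.
describe_ls_line() {
  echo "$1" | awk '{
    printf "permission=%s size=%s state=%s created=%s %s cached=%s path=%s\n",
           $1, $2, $3, $4, $5, $6, $7
  }'
}

describe_ls_line "$line"
```

Because the split is on whitespace, paths that contain spaces would be truncated; the sketch is meant for reading the output, not for robust parsing.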
- Stop Alluxio
Bash
$ ./bin/alluxio-stop.sh local
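The mkdir and mount steps in this section can be wrapped in one function for repeated use. This is a sketch under assumptions: `mount_bos` and the `ALLUXIO` variable are hypothetical names introduced here, while the two `fs` subcommands are the ones used above. Setting `ALLUXIO=echo` gives a dry run that prints the arguments instead of calling Alluxio.

```shell
# Path to the Alluxio CLI; override it (for example ALLUXIO=echo) for a dry run.
ALLUXIO="${ALLUXIO:-./bin/alluxio}"

# mount_bos BUCKET [MOUNT_POINT]  (hypothetical helper)
# Create the parent directory, then mount the bucket under it.
mount_bos() {
  bucket="$1"
  mount_point="${2:-/mnt/bos}"
  parent=$(dirname "$mount_point")
  # Stop at the first failing step so a failed mkdir does not trigger a mount.
  "$ALLUXIO" fs mkdir "$parent" &&
    "$ALLUXIO" fs mount "$mount_point" "bos://$bucket"
}
```

For the example bucket, `mount_bos test-979` issues the same two commands as the steps above. The mkdir step may fail if the parent directory already exists, in which case the mount can be run on its own.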
Experience Alluxio's access acceleration for BOS data
Alluxio stores data in memory, enabling faster access to BOS data. Try it out:
- List the files in /mnt/bos; 0% indicates that a file is not cached in Alluxio memory
Bash
$ time ./bin/alluxio fs ls /mnt/bos
-rwxrwxrwx 27040 PERSISTED 07-21-2020 15:06:46:000 0% /mnt/bos/LICENSE
-rwxrwxrwx 51307896 PERSISTED 07-21-2020 15:05:49:000 0% /mnt/bos/alluxio-underfs-bos-0.1.0.jar
real 0m2.297s
user 0m2.703s
sys 0m0.269s
- Count the lines containing "the" in the file /mnt/bos/LICENSE
Bash
$ time ./bin/alluxio fs cat /mnt/bos/LICENSE | grep -c the
200
real 0m3.357s
user 0m2.974s
sys 0m0.289s
- Data read from BOS is cached in memory for faster subsequent access; 100% indicates that the file is fully loaded into Alluxio memory
Bash
$ ./bin/alluxio fs ls /mnt/bos/LICENSE
-rwxrwxrwx 27040 PERSISTED 07-21-2020 15:06:46:000 100% /mnt/bos/LICENSE
- Count the lines containing "the" in /mnt/bos/LICENSE again. The time drops from 3.357s to 2.189s, because no data is fetched from BOS on the second read
Bash
$ time ./bin/alluxio fs cat /mnt/bos/LICENSE | grep -c the
200
real 0m2.189s
user 0m2.835s
sys 0m0.286s
- List the files in /mnt/bos again. The time drops from 2.297s to 1.793s, which is noticeably faster
Bash
$ time ./bin/alluxio fs ls /mnt/bos
-rwxrwxrwx 27040 PERSISTED 07-21-2020 15:06:46:000 100% /mnt/bos/LICENSE
-rwxrwxrwx 51307896 PERSISTED 07-21-2020 15:05:49:000 0% /mnt/bos/alluxio-underfs-bos-0.1.0.jar
real 0m1.793s
user 0m2.630s
sys 0m0.238s
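The two before/after timings above can be expressed as percentage reductions. The sketch below does the arithmetic with awk; `reduction_pct` is a hypothetical helper name, and the inputs are the "real" times copied from the runs above.

```shell
# reduction_pct BEFORE AFTER  (hypothetical helper)
# Print how much lower AFTER is than BEFORE, as a percentage with one decimal.
reduction_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

reduction_pct 3.357 2.189   # cat + grep on /mnt/bos/LICENSE
reduction_pct 2.297 1.793   # ls on /mnt/bos
```

This prints 34.8 for the read and 21.9 for the listing. Each figure comes from a single run that also includes the startup time of the command-line client, so treat the numbers as indicative and repeat the measurement for a reliable comparison.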
