Overview
What is the BOS HDFS Tool?
The BOS HDFS Tool, from Baidu AI Cloud, is built on the Hadoop framework. It addresses the difficulty of reading, writing, and using data stored in Baidu AI Cloud Object Storage (BOS) in big data scenarios.
Data analysis in big data scenarios is becoming a key focus for businesses. Hadoop, known for its reliability, efficiency, scalability, and concurrent processing power, has become one of the most popular open-source frameworks for big data. Hadoop includes a distributed file system, HDFS (Hadoop Distributed File System), which offers high fault tolerance and high-throughput data access, making it well suited to very large datasets. HDFS stores vast amounts of data reliably and is a cornerstone of the Hadoop ecosystem.
As data volumes grow, enterprises find native Hadoop clusters costly to maintain, and storing massive datasets on local HDFS infrastructure becomes difficult. Many enterprises therefore migrate data to the cloud and choose object storage services. However, the APIs of object storage systems differ from the HDFS interfaces, which makes it hard for big data workloads to access, read, and write data across object storage and self-managed HDFS systems. BOS HDFS addresses this issue.
BOS HDFS is fully compatible with Hadoop 2.7+ and 3.1+ and supports large-scale storage of HDFS data in BOS. Upper-layer applications access, read, and write that data through the standard HDFS interfaces, which avoids the high operational cost and limited scalability of a self-managed HDFS cluster. With this tool, users can take full advantage of the ultra-low cost, high performance, reliability, and high throughput of BOS, meeting enterprise requirements for data read/write and usage in big data scenarios.
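One reason object storage APIs do not map directly onto HDFS interfaces is that an object store keeps a flat set of keys, while HDFS callers expect directories. The sketch below illustrates the general idea of deriving a one-level directory listing from flat keys by treating "/" as a delimiter. It is a minimal illustration under stated assumptions, not the BOS HDFS implementation: the function name `list_directory` and the sample keys are hypothetical.

```python
from typing import Iterable, List, Tuple


def list_directory(keys: Iterable[str], directory: str) -> Tuple[List[str], List[str]]:
    """Emulate a one-level directory listing over a flat set of object keys.

    Returns (subdirectories, files) found directly under `directory`.
    """
    # Normalize the directory into a key prefix: "" for the root, else "a/b/".
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""

    subdirs = set()
    files = []
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue  # outside the directory, or the directory marker itself
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # A further "/" means the key belongs to a subdirectory.
            subdirs.add(prefix + head + "/")
        else:
            files.append(key)
    return sorted(subdirs), sorted(files)


# Hypothetical object keys in a bucket (illustrative only).
keys = [
    "logs/2024/01/part-0000",
    "logs/2024/02/part-0000",
    "logs/README",
    "tmp/scratch",
]
print(list_directory(keys, "logs"))
# (['logs/2024/'], ['logs/README'])
```

A real connector would typically push the prefix and delimiter to the storage service instead of scanning every key on the client; the scan here only keeps the example self-contained.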
Advantages of the BOS HDFS tool
- Framework compatibility: Fully compatible with Hadoop 2.7+/3.1+
- Seamless access: Data in BOS can be read and written transparently through the HDFS interfaces
- Cost-effective data storage: Combines the ultra-low cost, ultra-high performance, high reliability, high availability, and high throughput of Baidu AI Cloud Object Storage (BOS)
Update records
【1.0.5】
- Support the append/truncate interface
- Support multi-bucket isolation configuration for ak/sk/endpoint
- Support EnvironmentVariableCredentialsProvider
- Optimize the getFileStatus/create/delete API
- Optimize the isFile/isDirectory API
- Update the hadoop-common dependency to version 3.2.2
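The multi-bucket isolation item in 1.0.5 can be pictured as a lookup in which bucket-scoped settings take priority over global ones. The sketch below is only an illustration of that idea. Everything in it is an assumption: the property names under `example.bos.` are hypothetical placeholders (this overview does not list the tool's real configuration keys), and the precedence rule (bucket-scoped value first, then global value) is assumed rather than confirmed.

```python
from typing import Dict, Mapping

# Hypothetical property layout for illustration only; this overview does not
# list the tool's real configuration keys.
GLOBAL_TEMPLATE = "example.bos.{field}"
BUCKET_TEMPLATE = "example.bos.bucket.{bucket}.{field}"
FIELDS = ("access.key", "secret.key", "endpoint")


def resolve_bucket_settings(conf: Mapping[str, str], bucket: str) -> Dict[str, str]:
    """Return the ak/sk/endpoint settings to use for `bucket`.

    Assumed precedence: a bucket-scoped property overrides the global one.
    """
    resolved = {}
    for field in FIELDS:
        scoped = BUCKET_TEMPLATE.format(bucket=bucket, field=field)
        fallback = GLOBAL_TEMPLATE.format(field=field)
        if scoped in conf:
            resolved[field] = conf[scoped]
        elif fallback in conf:
            resolved[field] = conf[fallback]
        else:
            raise KeyError(f"no value configured for {field!r} (bucket {bucket!r})")
    return resolved


# Hypothetical configuration: the "logs" bucket has its own key pair,
# while every other bucket falls back to the global values.
conf = {
    "example.bos.access.key": "global-ak",
    "example.bos.secret.key": "global-sk",
    "example.bos.endpoint": "https://global.example.invalid",
    "example.bos.bucket.logs.access.key": "logs-ak",
    "example.bos.bucket.logs.secret.key": "logs-sk",
}
print(resolve_bucket_settings(conf, "logs")["access.key"])   # logs-ak
print(resolve_bucket_settings(conf, "other")["access.key"])  # global-ak
```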
【1.0.4】
- Support CRC32C Checksum verification
- Optimize the create API
- Optimize the open API
- Optimize the hierarchical rename API
- Adjust the default part size from 10 MB to 12 MB
- Optimize sequential read policies
【1.0.3】
- Support hierarchical buckets
- Optimize the read cache, which is now enabled by default
- Optimize sequential read
- Optimize multi-file deletion
- Fix known issues
