          Configuration and Use

          To use the BOS HDFS tool, download the corresponding SDK packages and modify a few configuration entries.

          Download

          • Download the BOS FS JAR, extract it, and copy the jar package to $hadoop_dir/share/hadoop/common. MD5 = 98a1260c63013c612ca5de1b793c6ddc.
          • Download the BOS Java SDK, extract it, and copy the jar package to $hadoop_dir/share/hadoop/common/lib/. The Java SDK version must be V0.10.82 or later.
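The MD5 value listed for the BOS FS JAR can be checked before the jar is copied into the Hadoop directory. The following is a minimal sketch, assuming a Linux host with `md5sum` available. The function name `verify_md5` and the file name in the usage comment are hypothetical, and the source does not say whether the digest applies to the downloaded archive or to the extracted jar, so check whichever file matches.

```shell
#!/usr/bin/env bash
# Verify a file against an expected MD5 digest.
# Usage: verify_md5 <file> <expected_md5>
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<digest>  <file>"; keep only the digest.
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "MD5 OK: $file"
  else
    echo "MD5 mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}

# Example (bos-hdfs-sdk.tar.gz is a hypothetical file name):
# verify_md5 bos-hdfs-sdk.tar.gz 98a1260c63013c612ca5de1b793c6ddc
```

The function returns a non-zero status on mismatch, so it can gate the copy step in an installation script.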

          Preparations before Use

          • Modify log4j.properties in the Hadoop configuration path to adjust the log level of the BOS SDK: log4j.logger.com.baidubce.http=WARN
          • Add or modify the BOS HDFS configuration in the file $hadoop_dir/etc/core-site.xml:
          <property> 
            <name>fs.bos.access.key</name> 
            <value>{Your AK}</value> 
          </property> 
          
          <property> 
            <name>fs.bos.secret.access.key</name> 
            <value>{Your SK}</value> 
          </property> 
          
          <property> 
            <name>fs.bos.endpoint</name> 
            <value>http://bj.bcebos.com</value> 
          </property> 
          
          <property> 
            <name>fs.bos.impl</name> 
            <value>org.apache.hadoop.fs.bos.BaiduBosFileSystem</value> 
          </property> 
          
          <property> 
            <name>fs.bos.multipart.uploads.attempts</name> 
            <value>5</value> 
          </property> 
          
          <property> 
            <name>fs.bos.multipart.uploads.block.size</name> 
            <value>9437184</value> 
          </property> 
          
          <property> 
            <name>fs.bos.multipart.uploads.cocurrent.size</name> 
            <value>3</value> 
          </property> 
          
          <property> 
            <name>fs.bos.multipart.uploads.factor</name> 
            <value>10.0</value> 
          </property> 
          
          <property> 
            <name>fs.bos.multipart.uploads.speed</name> 
            <value>10485760</value> 
          </property> 
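The source does not state the units of fs.bos.multipart.uploads.block.size and fs.bos.multipart.uploads.speed. If they are byte counts (an assumption), the sample values correspond to 9 MiB and 10 MiB. A small sketch for computing such values when tuning them (the helper name `mib_to_bytes` is hypothetical):

```shell
# Convert a size in MiB to bytes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 9    # 9437184, the sample block.size value
mib_to_bytes 10   # 10485760, the sample speed value
```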

          Start to Use

          Paths must start with bos:// when BOS HDFS is used to access the BOS service. For example:

          hdfs dfs -ls bos://{bucket}
          hdfs dfs -put ${local_file} bos://{bucket}/a/b/c 
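The same pattern extends to other standard `hdfs dfs` subcommands. The sketch below wraps an upload, a listing, and a download in one function. It assumes a Hadoop client already configured as described above and assumes the BOS file system supports these standard operations. The helper names `bos_uri` and `bos_round_trip` and the `a/b/c` prefix are illustrative only.

```shell
#!/usr/bin/env bash
# Build a bos:// URI from a bucket name and an object key.
bos_uri() {
  local bucket="$1" key="${2#/}"   # drop one leading slash from the key
  echo "bos://${bucket}/${key}"
}

# Upload a local file, list the target prefix, and download the file again.
# Requires a configured Hadoop client; defined here but not invoked.
bos_round_trip() {
  local bucket="$1" local_file="$2" name
  name="$(basename "$local_file")"
  hdfs dfs -put "$local_file" "$(bos_uri "$bucket" "a/b/c/$name")"
  hdfs dfs -ls "$(bos_uri "$bucket" a/b/c)"
  hdfs dfs -get "$(bos_uri "$bucket" "a/b/c/$name")" "./${name}.copy"
}

# Example: bos_round_trip my-bucket ./data.csv
```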

          Advanced Use

          A self-built Hadoop cluster has limited scalability and takes considerable manpower to operate and maintain. If you have higher demands for performance and security, we recommend Baidu MapReduce (BMR), provided by Baidu AI Cloud. BMR is a fully hosted Hadoop/Spark cluster service: you can deploy and scale clusters on demand and focus only on processing, analyzing, and reporting on big data. The Baidu operation and maintenance team, with many years of experience in large-scale distributed computing, is fully responsible for cluster operation and maintenance, which can significantly improve performance, security, and convenience.
