HBase Uses BOS as Underlying Storage

Updated at: 2025-11-03

HBase

HBase is a column-oriented distributed database designed for fast, random access to large-scale structured data. Its underlying storage is typically HDFS; this document describes how to use BOS as the underlying storage instead.

Prerequisites

First, install and configure BOS HDFS as described in the BOS HDFS document. This guide assumes that hadoop-3.3.2 is installed on the local machine. Complete the basic BOS HDFS trial in the “Getting Started” section of that document, then set the environment variables:

Bash

export HADOOP_HOME=/opt/hadoop-3.3.2
export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
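
Before installing HBase, it can help to confirm that HADOOP_HOME points at a usable Hadoop installation. The following is a minimal sketch; the function name check_hadoop_env is hypothetical and is not part of Hadoop or BOS HDFS.

```shell
# Hypothetical helper: verifies that HADOOP_HOME points at a Hadoop install.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "${HADOOP_HOME}/bin/hadoop" ]; then
        echo "no executable found at ${HADOOP_HOME}/bin/hadoop" >&2
        return 1
    fi
    echo "HADOOP_HOME looks usable: ${HADOOP_HOME}"
}

# Report the result without aborting the calling shell.
check_hadoop_env || echo "fix the Hadoop environment before continuing"
```

The check only looks at the local installation; it does not verify that the BOS HDFS jar and credentials are configured.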

Install

1. HBase environment preparation

Bash

# Download to a local path (the closer.lua URL serves a mirror-selection page, not the archive itself)
wget https://archive.apache.org/dist/hbase/2.6.0/hbase-2.6.0-bin.tar.gz
# Extract the archive
tar zxvf hbase-2.6.0-bin.tar.gz

2. Configuration

Configure JAVA_HOME in conf/hbase-env.sh:

Bash

# Use a Java installation on this machine, version 1.8 or later
export JAVA_HOME=/usr/java/jdk1.8.0/

Configure conf/hbase-site.xml to use BOS for data storage:

XML

<property>
    <name>hbase.rootdir</name>
    <value>bos://{bucket}/hbase</value>
    <description>Sets the persistent storage path for HBase data. When using BOS, it must be a path prefixed with “bos://{bucket}/”.</description>
</property>

<property>
    <name>hbase.wal.dir</name>
    <value></value>
    <description>Sets the WAL data path, which requires low latency and typically uses HDFS storage. If using BOS, make sure that the cluster's BOS-HDFS version supports the hflush/hsync API.</description>
</property>

<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
    <description>Sets the directory where ZooKeeper stores its data. If not set, it defaults to a location under /tmp, and the data will be lost on restart.</description>
</property>

<property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
    <description>Sets the cluster mode: false means standalone mode (all HBase and ZooKeeper daemons run in one JVM), and true means distributed mode.</description>
</property>
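
These properties belong inside the <configuration> root element of conf/hbase-site.xml. As a minimal sketch, the file could be generated as follows. The helper name write_hbase_site, the bucket name my-bucket, and the output path are placeholders. hbase.wal.dir is left out here because its value depends on the cluster; when it is unset, HBase keeps the WAL under hbase.rootdir, so the hflush/hsync requirement described above then applies to BOS.

```shell
# Hypothetical helper: writes a minimal hbase-site.xml for a given BOS bucket.
# Usage: write_hbase_site <bucket> <output-file>
write_hbase_site() {
    bucket="$1"
    out="$2"
    cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>bos://${bucket}/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/var/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>
EOF
}

# Example: write to a scratch file first, then review it and copy it into conf/.
write_hbase_site "my-bucket" "${TMPDIR:-/tmp}/hbase-site.generated.xml"
```

Writing to a scratch file rather than directly into conf/ keeps an existing configuration from being overwritten by accident.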
            

Use

1. Start HBase

Bash

./bin/start-hbase.sh

2. Create a table

Bash

./bin/hbase shell
> status           # View cluster status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
Took 0.7840 seconds
> create 'students','name','age' # Create the students table
2024-09-02 19:23:25,153 INFO  [main] client.HBaseAdmin (HBaseAdmin.java:postOperationResult(3746)) - Operation: CREATE, Table Name: default:students, procId: 9 completed
Created table students
Took 2.4410 seconds
=> Hbase::Table - students
            

3. Insert data

Bash

> put 'students', 'row1', 'name:lastname', 'zhang'
Took 0.0820 seconds
> put 'students', 'row1', 'name:firstname', 'san'
Took 0.0900 seconds
> put 'students', 'row1', 'age', '23'
Took 0.0990 seconds
> put 'students', 'row2', 'name:lastname', 'li'
Took 0.0710 seconds
> put 'students', 'row2', 'name:firstname', 'si'
Took 0.0520 seconds
> put 'students', 'row2', 'age', '30'
Took 0.0920 seconds
            

View the data stored on BOS
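
One way to inspect what HBase has written to BOS is to list the table directory under hbase.rootdir with the Hadoop filesystem shell. The sketch below assumes the standard HBase 2.x layout, in which a table in the default namespace lives under data/default/<table>. The bucket name my-bucket and the helper name table_dir are placeholders. Recently written rows may still be in the MemStore and WAL, so running flush 'students' in the HBase shell first may be needed before store files appear.

```shell
# Hypothetical helper: prints the BOS path of a table in the default namespace.
table_dir() {
    bucket="$1"
    table="$2"
    echo "bos://${bucket}/hbase/data/default/${table}"
}

target="$(table_dir "my-bucket" "students")"
if [ -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
    # Recursively list region directories and store files for the table.
    "${HADOOP_HOME}/bin/hadoop" fs -ls -R "$target"
else
    echo "hadoop not found; would run: hadoop fs -ls -R $target"
fi
```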

4. Full table scan

Bash

> scan 'students'
ROW                                                          COLUMN+CELL
 row1                                                        column=age:, timestamp=2024-09-02T19:37:56.571, value=23
 row1                                                        column=name:firstname, timestamp=2024-09-02T19:37:31.480, value=san
 row1                                                        column=name:lastname, timestamp=2024-09-02T19:36:09.318, value=zhang
 row2                                                        column=age:, timestamp=2024-09-02T19:38:50.066, value=30
 row2                                                        column=name:firstname, timestamp=2024-09-02T19:38:38.772, value=si
 row2                                                        column=name:lastname, timestamp=2024-09-02T19:38:24.245, value=li
2 row(s)
Took 0.0350 seconds
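
For scripted checks, the same kind of query can be run without an interactive session. The sketch below writes a few HBase shell commands to a file and feeds them to hbase shell on standard input with the -n (non-interactive) flag. The file path and the HBASE_BIN variable are assumptions made for illustration.

```shell
# Write the HBase shell commands to a script file (the path is a placeholder).
script="${TMPDIR:-/tmp}/students_check.txt"
cat > "$script" <<'EOF'
get 'students', 'row1'
scan 'students', {LIMIT => 1}
count 'students'
EOF

# HBASE_BIN is an assumed variable; it defaults to the relative path used above.
HBASE_BIN="${HBASE_BIN:-./bin/hbase}"
if [ -x "$HBASE_BIN" ]; then
    # -n runs the shell in non-interactive mode, reading commands from stdin.
    "$HBASE_BIN" shell -n < "$script"
else
    echo "hbase not found at $HBASE_BIN; commands saved to $script"
fi
```

Here get reads a single row, scan with LIMIT returns only the first row, and count reports the number of rows in the table.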
            

5. Exit

Bash

> quit
