Flink Usage Guide

Updated at: 2025-11-03

Flink

flink-bos-hadoop is the Flink file system implementation for Baidu AI Cloud Object Storage (BOS). It supports Flink's RecoverableWriter API, so Flink can use it to read and write BOS data and as a state backend for streaming applications.

Install

1. Flink environment preparation

The steps below use Flink 1.15.0 as an example.

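Bash

# Download the Flink 1.15.0 binary distribution
wget https://archive.apache.org/dist/flink/flink-1.15.0/flink-1.15.0-bin-scala_2.12.tgz
# Unpack it and change into the Flink directory (later commands use paths relative to it)
tar zxvf flink-1.15.0-bin-scala_2.12.tgz
cd flink-1.15.0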

2. Add dependency jar packages and configuration

Before starting Flink, download the flink-bos-hadoop plugin.

Bash

# Install the flink-bos-hadoop plugin into Flink's plugins directory
mkdir -p ./plugins/bos-fs-hadoop
cp flink-bos-hadoop-1.15.0-0.1.0.jar ./plugins/bos-fs-hadoop/
# Add the settings required for accessing BOS to the Flink configuration
vim ./conf/flink-conf.yaml
...
# After editing, the file should contain the following entries
cat ./conf/flink-conf.yaml
fs.bos.impl: org.apache.hadoop.fs.bos.BaiduBosFileSystem
fs.AbstractFileSystem.bos.impl: org.apache.hadoop.fs.bos.BOS
fs.bos.access.key: {your ak}
fs.bos.secret.access.key: {your sk}
fs.bos.endpoint: {your bucket endpoint, e.g. bj.bcebos.com}
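Before starting the cluster, it can help to confirm that each of the BOS keys listed above is actually present in the configuration file. The following is a minimal sketch and not part of the plugin. The function name `check_bos_conf` is hypothetical, and it only checks that each key starts a line; it does not validate the values.

```shell
# Hypothetical helper: report which BOS keys are missing from a Flink config file.
# $1: path to flink-conf.yaml
# Prints "missing: <key>" for each absent key and returns non-zero if any is absent.
check_bos_conf() {
    conf_file="$1"
    status=0
    for key in \
        fs.bos.impl \
        fs.AbstractFileSystem.bos.impl \
        fs.bos.access.key \
        fs.bos.secret.access.key \
        fs.bos.endpoint
    do
        # index(...) == 1 means the line begins with the literal text "<key>:"
        if ! awk -v k="$key" 'index($0, k ":") == 1 { found = 1 } END { exit !found }' "$conf_file"; then
            echo "missing: $key"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_bos_conf ./conf/flink-conf.yaml` prints nothing and returns 0 when all five keys are present.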
            

Use

Start the cluster

Bash

./bin/start-cluster.sh

Submit the job

Bash

# BOS objects can be used like regular files through paths of the form bos://<your-bucket>/<object-name>
./bin/flink run examples/streaming/WordCount.jar --input "bos://my_bucket/students.txt" --output "bos://my_bucket/out"
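Because the file system can also hold state for streaming applications, checkpoints can be directed to a BOS path through the standard Flink options `state.backend`, `state.checkpoints.dir`, and `execution.checkpointing.interval`. The sketch below appends such settings to a configuration file. The helper name `add_bos_checkpoint_conf`, the `hashmap` backend, the 60s interval, and the checkpoint directory used in the usage note are illustrative assumptions, not values prescribed by the plugin.

```shell
# Hypothetical helper: append checkpoint settings that point at a BOS directory.
# $1: path to flink-conf.yaml
# $2: checkpoint directory URI, e.g. bos://my_bucket/flink/checkpoints (placeholder)
add_bos_checkpoint_conf() {
    cat >> "$1" <<EOF
state.backend: hashmap
state.checkpoints.dir: $2
execution.checkpointing.interval: 60s
EOF
}
```

A possible invocation is `add_bos_checkpoint_conf ./conf/flink-conf.yaml bos://my_bucket/flink/checkpoints`. The cluster reads its configuration at startup, so the settings apply the next time it is started.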

View the results

Bash

# View the WordCount output
$ hadoop fs -ls bos://my_bucket/out
Found 1 items
drwxrwxrwx   -          0 1970-01-01 08:00 bos://my_bucket/out/2023-08-10--15
$ hadoop fs -ls bos://my_bucket/out/2023-08-10--15/
Found 1 items
-rw-rw-rw-   1       1792 2023-08-10 15:52 bos://my_bucket/out/2023-08-10--15/part-3053774f-2d8e-40c5-aa3c-01402ce4b6b4-0
$ hadoop fs -cat bos://my_bucket/out/2023-08-10--15/part-3053774f-2d8e-40c5-aa3c-01402ce4b6b4-0
(name,1)
(studentname,1)
(age,1)
...
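The output file holds one `(word,count)` pair per line, so it can be post-processed with ordinary text tools. As a small sketch, the hypothetical function `sort_wordcount` below reads such lines from standard input and prints them as `count word`, highest count first. It assumes the exact `(word,count)` layout shown above and silently skips lines that do not match it.

```shell
# Hypothetical helper: turn "(word,count)" lines into "count word", sorted by count.
sort_wordcount() {
    # Split each line at its last comma, drop the parentheses,
    # then sort numerically in descending order on the count field.
    sed -n 's/^(\(.*\),\([0-9][0-9]*\))$/\2 \1/p' | sort -k1,1nr
}
```

It can be placed at the end of a pipeline, for example `hadoop fs -cat bos://my_bucket/out/2023-08-10--15/part-3053774f-2d8e-40c5-aa3c-01402ce4b6b4-0 | sort_wordcount | head`.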
            
