Flink Usage Guide
Updated at: 2025-11-03
Flink
flink-bos-hadoop is the Flink file system implementation for Baidu AI Cloud Object Storage (BOS). It supports Flink's RecoverableWriter API. Flink can use this file system to read and write BOS data and as state backend storage for streaming applications.
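To illustrate the state backend use case, the following is a minimal sketch that appends checkpoint settings pointing at BOS to a Flink configuration file. It assumes the plugin is already installed and configured as described under Install. The function name `append_bos_checkpoint_conf`, the `flink/checkpoints` prefix, and the 60s interval are illustrative assumptions, not values from this guide; `state.checkpoints.dir` and `execution.checkpointing.interval` are standard Flink configuration keys.

```shell
# Sketch: append checkpoint settings that point at BOS to a Flink config file.
# The "flink/checkpoints" prefix and the 60s interval are hypothetical choices.
append_bos_checkpoint_conf() {
  local conf_file="$1"   # for example ./conf/flink-conf.yaml
  local bucket="$2"      # for example my_bucket
  cat >> "$conf_file" <<EOF
state.checkpoints.dir: bos://${bucket}/flink/checkpoints
execution.checkpointing.interval: 60s
EOF
}
```

For example, `append_bos_checkpoint_conf ./conf/flink-conf.yaml my_bucket` adds both settings to the end of the file; restart the cluster afterwards so they take effect.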
Install
1. Flink environment preparation
The following steps use Flink 1.15.0 as an example.
Bash
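# Download the Flink binary package to a local path
wget https://archive.apache.org/dist/flink/flink-1.15.0/flink-1.15.0-bin-scala_2.12.tgz
# Extract the archive
tar zxvf flink-1.15.0-bin-scala_2.12.tgz
# Enter the extracted directory; the commands below use paths relative to it
cd flink-1.15.0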
2. Add the dependency JAR and configuration
Before starting Flink, download the flink-bos-hadoop plugin JAR, copy it into Flink's plugins directory, and add the BOS settings to flink-conf.yaml.
Bash
# Copy the flink-bos-hadoop plugin into Flink's plugins directory
mkdir -p ./plugins/bos-fs-hadoop
cp flink-bos-hadoop-1.15.0-0.1.0.jar ./plugins/bos-fs-hadoop/
# Add the configuration required to access BOS
vim ./conf/flink-conf.yaml
...
cat ./conf/flink-conf.yaml
fs.bos.impl: org.apache.hadoop.fs.bos.BaiduBosFileSystem
fs.AbstractFileSystem.bos.impl: org.apache.hadoop.fs.bos.BOS
fs.bos.access.key: {your ak}
fs.bos.secret.access.key: {your sk}
fs.bos.endpoint: bj.bcebos.com # replace with your bucket's endpoint
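Because a missing key typically only surfaces once a job tries to access BOS, it can help to check the configuration file before starting the cluster. The following is a minimal sketch; the function name `check_bos_conf` is hypothetical, and it only checks that the five keys listed above are present, not that their values are valid.

```shell
# Sketch: report which BOS settings are missing from a Flink config file.
# Prints one "missing: <key>" line per absent key and returns non-zero if any is absent.
check_bos_conf() {
  local conf_file="$1"
  local missing=0
  local key
  for key in fs.bos.impl fs.AbstractFileSystem.bos.impl \
             fs.bos.access.key fs.bos.secret.access.key fs.bos.endpoint; do
    # A key counts as present when some line has exactly that key before the first ": ".
    if ! awk -F': ' -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$conf_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bos_conf ./conf/flink-conf.yaml && ./bin/start-cluster.sh` starts the cluster only when all five keys are present.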
Use
Start the cluster
Bash
./bin/start-cluster.sh
Submit the job
Bash
# Specify paths in the following format to use BOS objects like regular files: bos://<your-bucket>/<object-name>
./bin/flink run examples/streaming/WordCount.jar --input "bos://my_bucket/students.txt" --output "bos://my_bucket/out"
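When job paths are assembled in a script, the path format described above can be wrapped in a small helper. This is a minimal sketch; the function name `bos_uri` is hypothetical and the helper does no validation of bucket or object names.

```shell
# Sketch: build a bos:// URI from a bucket name and an object key.
bos_uri() {
  local bucket="$1"
  local object="${2#/}"   # drop one leading slash so no "//" follows the bucket name
  printf 'bos://%s/%s\n' "$bucket" "$object"
}
```

For example, the submit command could then be written as `./bin/flink run examples/streaming/WordCount.jar --input "$(bos_uri my_bucket students.txt)" --output "$(bos_uri my_bucket out)"`.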
View the results
Bash
# View the WordCount results
$ hadoop fs -ls bos://my_bucket/out
Found 1 items
drwxrwxrwx - 0 1970-01-01 08:00 bos://my_bucket/out/2023-08-10--15
$ hadoop fs -ls bos://my_bucket/out/2023-08-10--15/
Found 1 items
-rw-rw-rw- 1 1792 2023-08-10 15:52 bos://my_bucket/out/2023-08-10--15/part-3053774f-2d8e-40c5-aa3c-01402ce4b6b4-0
$ hadoop fs -cat bos://my_bucket/out/2023-08-10--15/part-3053774f-2d8e-40c5-aa3c-01402ce4b6b4-0
(name,1)
(studentname,1)
(age,1)
...
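The output consists of `(word,count)` tuples, one per line. For further processing with standard command-line tools, the tuples can be converted to tab-separated lines. The following is a minimal sketch; the function name `wordcount_to_tsv` is hypothetical, and it assumes every result line has the tuple shape shown above.

```shell
# Sketch: turn WordCount tuples such as "(name,1)" into "name<TAB>1" lines.
wordcount_to_tsv() {
  local tab=$'\t'
  # Lines that do not match the "(word,count)" pattern are dropped.
  sed -n -E "s/^\((.*),([0-9]+)\)\$/\1${tab}\2/p"
}
```

For example, pipe the `hadoop fs -cat` output of a part file into `wordcount_to_tsv | sort -k2,2nr` to list the words by descending count.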
