Flume Data Storage to BOS
Updated at: 2025-11-03
Flume
Flume is a distributed, reliable, and highly available system for collecting, aggregating, and moving large volumes of log data. It supports customizable data senders (sources) for collecting data, provides simple data processing, and can write data to a variety of customizable receivers (sinks).
Flume supports multiple sink types. To store collected data in BOS, use the HDFS sink.
Start
1. Download and install apache-flume
The installation procedure is omitted here.
2. Configure environment
If a Hadoop environment already exists and is configured to access BOS, skip this step. Otherwise:
- Download the bos-hdfs jar package to the /opt/apache-flume-1.xx.0-bin/lib directory.
- Add the BOS access configuration to the core-site.xml file in the Hadoop directory, then copy that file to the /opt/apache-flume-1.xx.0-bin/conf directory.
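The two copy operations above can be sketched as a small shell function. This is only an illustration: the function name `prepare_flume_for_bos` and its three arguments are hypothetical, and the caller supplies the actual Flume directory, the downloaded bos-hdfs jar, and a core-site.xml that already contains the BOS access configuration.

```shell
# Sketch of step 2. The function name and all paths passed in are
# placeholders chosen by the caller, not names defined by Flume or BOS.
#
# Usage: prepare_flume_for_bos FLUME_HOME BOS_HDFS_JAR CORE_SITE_XML
prepare_flume_for_bos() {
  local flume_home="$1"
  local bos_jar="$2"
  local core_site="$3"

  # Fail early with a readable message if an input is missing.
  [ -d "$flume_home" ] || { echo "Flume home not found: $flume_home" >&2; return 1; }
  [ -f "$bos_jar" ]    || { echo "bos-hdfs jar not found: $bos_jar" >&2; return 1; }
  [ -f "$core_site" ]  || { echo "core-site.xml not found: $core_site" >&2; return 1; }

  # Put the jar on Flume's classpath and the Hadoop configuration
  # (with the BOS settings) into Flume's conf directory.
  mkdir -p "$flume_home/lib" "$flume_home/conf"
  cp "$bos_jar" "$flume_home/lib/"
  cp "$core_site" "$flume_home/conf/core-site.xml"
}
```

The function only copies files; it does not check that core-site.xml actually contains the BOS settings.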
3. Create a flume configuration file
Configure Flume with StressSource as the source, a memory channel, and an HDFS sink that writes to BOS using the HDFS protocol.
Bash
# ss2bos.properties
agent.sources = stress_source
agent.channels = mem_channel
agent.sinks = bos_hdfs_sink

# Source: StressSource generates synthetic events for testing
agent.sources.stress_source.type = org.apache.flume.source.StressSource
agent.sources.stress_source.channels = mem_channel
agent.sources.stress_source.size = 1024
agent.sources.stress_source.maxTotalEvents = 1000
agent.sources.stress_source.maxEventsPerSecond = 10
agent.sources.stress_source.batchSize = 10

# Channel: in-memory buffer between source and sink
agent.channels.mem_channel.type = memory
agent.channels.mem_channel.capacity = 1000000
agent.channels.mem_channel.transactionCapacity = 100

# Sink: HDFS sink writing to BOS
agent.sinks.bos_hdfs_sink.channel = mem_channel
agent.sinks.bos_hdfs_sink.type = hdfs
agent.sinks.bos_hdfs_sink.hdfs.useLocalTimeStamp = true
# %{host} expands to the event's "host" header; it distinguishes files
# per host to avoid concurrent write conflicts. Comments must be on their
# own line: a trailing "# ..." would become part of the property value.
agent.sinks.bos_hdfs_sink.hdfs.filePrefix = %{host}_bos_hdfs_sink
# Replace {your bucket} with your bucket path
agent.sinks.bos_hdfs_sink.hdfs.path = bos://{your bucket}/flume/%Y-%m-%d-%H-%M
agent.sinks.bos_hdfs_sink.hdfs.fileType = DataStream
agent.sinks.bos_hdfs_sink.hdfs.writeFormat = Text
# Roll files by event count only (0 disables size- and time-based rolling)
agent.sinks.bos_hdfs_sink.hdfs.rollSize = 0
agent.sinks.bos_hdfs_sink.hdfs.rollCount = 100
agent.sinks.bos_hdfs_sink.hdfs.rollInterval = 0
agent.sinks.bos_hdfs_sink.hdfs.batchSize = 100
# Round the path timestamp down to 10-minute boundaries
agent.sinks.bos_hdfs_sink.hdfs.round = true
agent.sinks.bos_hdfs_sink.hdfs.roundValue = 10
agent.sinks.bos_hdfs_sink.hdfs.roundUnit = minute
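
To illustrate how hdfs.round, hdfs.roundValue, and hdfs.roundUnit affect the directory name in hdfs.path, the sketch below rounds a timestamp down to a 10-minute boundary and formats it with the same %Y-%m-%d-%H-%M pattern. The helper name `bucket_dir` is hypothetical. The sketch assumes GNU date (for the `-d @N` form) and uses UTC to keep the output reproducible, whereas the sink with hdfs.useLocalTimeStamp = true works in the agent's local time zone.

```shell
# bucket_dir EPOCH_SECONDS
# Mimics roundValue = 10 / roundUnit = minute: rounds the timestamp down to
# a 10-minute boundary, then formats it like the hdfs.path pattern.
# Assumptions: GNU date is available; UTC is used instead of local time.
bucket_dir() {
  local ts="$1"
  local window=$((10 * 60))             # 10 minutes in seconds
  local rounded=$((ts - ts % window))   # round down to the window start
  date -u -d "@$rounded" +%Y-%m-%d-%H-%M
}

# 2025-11-03 14:37:25 UTC falls into the 14:30 directory.
bucket_dir 1762180645   # prints 2025-11-03-14-30
```

With these settings, all events received between 14:30:00 and 14:39:59 are written under the same directory, so a new directory appears at most once every 10 minutes.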
4. Start the Flume agent
Bash
./bin/flume-ng agent -n agent -c conf/ -f ss2bos.properties
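
As a rough sanity check after the agent has run, the expected run time and number of rolled files can be estimated from the values in ss2bos.properties. This is back-of-the-envelope arithmetic, not Flume output. It assumes that the source emits at its capped rate and that the run does not cross a 10-minute directory boundary, which would split one file into two.

```shell
# Values copied from ss2bos.properties.
max_total_events=1000
max_events_per_second=10
roll_count=100

# Ceiling division in integer arithmetic: (a + b - 1) / b.
expected_seconds=$(( (max_total_events + max_events_per_second - 1) / max_events_per_second ))
expected_files=$(( (max_total_events + roll_count - 1) / roll_count ))

echo "approx. run time: ${expected_seconds}s"       # 100s
echo "approx. rolled files: ${expected_files}"      # 10
```

If the number of files under bos://{your bucket}/flume/ differs noticeably from this estimate, the rolling settings or the run duration are worth checking first.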
