I. Configuration diagram:
II. Flume parameter configuration notes:
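III. Issues encountered:
Notes on rolling to a new file
1. Set hdfs.minBlockReplicas = 1. Otherwise HDFS block replication activity can cause the sink to roll files early, so the roll settings appear to have no effect.
Reference: "Flume sink to HDFS: files are created too frequently and the file roll configuration has no effect?"
http://blog.csdn.net/simonchi/article/details/43231891
2. Keep rollCount/rollSize/rollInterval simple: set only one of them and set the others to 0, which disables them. (I have not verified whether setting several at once works.)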
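The two notes above can be combined into a small fragment. This is only a sketch: it reuses the agent and sink names (a1, k2) from the listing in section IV, and the one-hour interval is an illustrative value rather than something taken from the original setup.

```properties
# Time-based rolling only: close the current file and start a new one every hour.
# 3600 (seconds) is an illustrative value, not part of the original configuration.
a1.sinks.k2.hdfs.rollInterval = 3600
# 0 disables the size-based and count-based roll triggers.
a1.sinks.k2.hdfs.rollSize = 0
a1.sinks.k2.hdfs.rollCount = 0
# Keep block replication changes from forcing early rolls.
a1.sinks.k2.hdfs.minBlockReplicas = 1
```

To roll by size instead, as the full listing does, set rollSize to the byte threshold and leave rollInterval and rollCount at 0.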
IV. Configuration listing:
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
a1.sources.r1.selector.type = replicating
# Describe/configure the source
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler
a1.sources.r1.channels = c1 c2
# Use a channel which buffers events in memory
a1.sinks.k1.channel = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k2.channel = c2
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
######to kafka
# Describe the sink k1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = test
a1.sinks.k1.brokerList = 192.168.206.10:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
#####a1 to hdfs#####
# Describe the sink
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = hdfs://master:9000/flume/%Y%m%d
a1.sinks.k2.hdfs.filePrefix = log_%H_%M
a1.sinks.k2.hdfs.fileSuffix = .log
a1.sinks.k2.hdfs.useLocalTimeStamp = true
a1.sinks.k2.hdfs.writeFormat = Text
a1.sinks.k2.hdfs.fileType = DataStream
#### round timestamps down to the hour (so %M in filePrefix always renders as 00)
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 1
a1.sinks.k2.hdfs.roundUnit = hour
#### roll to a new file once it reaches 1 MB (time and count triggers disabled)
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollSize = 1048576
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.batchSize = 100
a1.sinks.k2.hdfs.threadsPoolSize = 10
a1.sinks.k2.hdfs.idleTimeout = 0
a1.sinks.k2.hdfs.minBlockReplicas = 1
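
The Kafka sink above uses the older property names (topic, brokerList, requiredAcks, batchSize). If you are running Flume 1.7 or later, which is an assumption about your environment and not something the setup above states, the User Guide lists these names as deprecated. A rough equivalent of sink k1 with the newer names might look like this:

```properties
# Sketch of sink k1 using the Kafka sink property names from Flume 1.7 and later.
# Assumes the same broker address and topic as the listing above.
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel = c1
a1.sinks.k1.kafka.topic = test
a1.sinks.k1.kafka.bootstrap.servers = 192.168.206.10:9092
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 20
```
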
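V. References:
1. HDFS Sink configuration parameters in Flume:
http://lxw1234.com/archives/2015/10/527.htm
2. Flume (NG) architecture design essentials and configuration practice
http://shiyanjun.cn/archives/915.html
3. Flume NG introduction and hands-on configuration
https://yq.aliyun.com/articles/50487
4. Flume official User Guide (HDFS Sink)
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink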