Modifying the Flume Source so the Taildir Source Supports Recursive Directories

Steps
1. Import the Flume source into IDEA; see the earlier article on importing the Flume source into IDEA.

2. Modify the code; see GitHub for the changes.

Jianshu does not support links out to GitHub, so please go directly to https://github.com/UniqueChun/flume-recursive-tairDir
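
The essence of the change is in the taildir source's directory matcher: the stock TaildirMatcher lists only the single directory named in the filegroup and matches file names within it, while the patched version also descends into subdirectories. The real diff is in the repository above; the sketch below only illustrates the idea under that assumption, and the class and method names are made up for illustration:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveTaildirSketch {
    // Stock behavior: candidates are only the regular files directly inside parentDir.
    // Patched behavior: every regular file under parentDir, at any depth, is a candidate;
    // the filegroup's filename regex is applied to each candidate's name.
    static List<Path> candidateFiles(Path parentDir, Pattern fileNameRegex) throws IOException {
        try (Stream<Path> walk = Files.walk(parentDir)) {   // Files.list(parentDir) would be the non-recursive equivalent
            return walk.filter(Files::isRegularFile)
                       .filter(p -> fileNameRegex.matcher(p.getFileName().toString()).matches())
                       .collect(Collectors.toList());
        }
    }
}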

3. After making the changes, go to the directory
D:\Hadoop\source\flume-ng-1.6.0-cdh5.7.0-src\flume-ng-1.6.0-cdh5.7.0\flume-ng-sources\flume-taildir-source and build with mvn clean package.

The build fails at this point.

The taildir source does not build under Windows, so I packed the modified code into a zip, uploaded it to a Linux machine, and ran mvn clean package again; this time it succeeded.
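
A likely reason for the Windows failure (my assumption, not something the build output above confirms): the taildir source identifies files by their inode through the POSIX "unix" file-attribute view, which Windows does not provide, so the tests that run during mvn package cannot pass there. A minimal illustration of that lookup:

import java.nio.file.Files;
import java.nio.file.Paths;

public class InodeCheck {
    public static void main(String[] args) throws Exception {
        // The taildir source tracks each tailed file by inode, obtained roughly like this.
        // On Windows the "unix" attribute view is unavailable and this call throws,
        // which would explain why the module only builds and tests cleanly on Linux.
        long inode = (long) Files.getAttribute(Paths.get("access.log"), "unix:ino");
        System.out.println("inode = " + inode);
    }
}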

[hadoop@hadoop000 flume-taildir-source]$ pwd
/home/hadoop/soul/soft/source/flume-ng-1.6.0-cdh5.7.0-src/flume-ng-1.6.0-cdh5.7.0/flume-ng-sources/flume-taildir-source

[hadoop@hadoop000 flume-taildir-source]$ mvn clean package

4. Once the build finishes, back up the existing flume-taildir-source-1.6.0-cdh5.7.0.jar in the lib directory of the production Flume installation.

mv flume-taildir-source-1.6.0-cdh5.7.0.jar flume-taildir-source-1.6.0-cdh5.7.0.jar.bak

5. Copy the newly built flume-taildir-source-1.6.0-cdh5.7.0.jar (under target) into the lib directory of the deployed Flume.

The Flume built from source:
[hadoop@hadoop000 target]$ pwd
/home/hadoop/soul/soft/source/flume-ng-1.6.0-cdh5.7.0-src/flume-ng-1.6.0-cdh5.7.0/flume-ng-sources/flume-taildir-source/target

// $FLUME_HOME/lib belongs to the Flume installation already in use in production
[hadoop@hadoop000 target]$ cp flume-taildir-source-1.6.0-cdh5.7.0.jar $FLUME_HOME/lib

6. Test
The configuration file is as follows:

tairdir-hdfs-agent.sources = tairdir-source
tairdir-hdfs-agent.sinks = hdfs-sink
tairdir-hdfs-agent.channels = memory-channel

tairdir-hdfs-agent.sources.tairdir-source.type = TAILDIR
tairdir-hdfs-agent.sources.tairdir-source.filegroups = f1
tairdir-hdfs-agent.sources.tairdir-source.filegroups.f1 = /home/hadoop/soul/data/flume/tairdir/.*.log
# location of the position (metadata) file
tairdir-hdfs-agent.sources.tairdir-source.positionFile = /home/hadoop/soul/data/flume/taildir_position.json


tairdir-hdfs-agent.channels.memory-channel.type = memory
tairdir-hdfs-agent.channels.memory-channel.capacity = 1000
tairdir-hdfs-agent.channels.memory-channel.transactionCapacity = 100


tairdir-hdfs-agent.sinks.hdfs-sink.type = hdfs
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop000:8020/g6/flume/tairDir/%Y%m%d/%H%M
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.filePrefix = baidu
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.rollInterval = 90
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.rollSize = 20000000
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.rollCount = 0
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.codeC = gzip
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text
tairdir-hdfs-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true


tairdir-hdfs-agent.sources.tairdir-source.channels = memory-channel
tairdir-hdfs-agent.sinks.hdfs-sink.channel= memory-channel

The monitored path stops at the tairdir directory, but the log file actually lives one level deeper, in src (see the sketch after the directory listing below).

[hadoop@hadoop000 src]$ pwd
/home/hadoop/soul/data/flume/tairdir/src

[hadoop@hadoop000 src]$ ll
total 80244
-rw-r--r-- 1 hadoop hadoop 82166362 Jun 24 20:00 access.log
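
To see why this layout exercises the recursive behavior, here is a small standalone check (not part of Flume; the paths are the ones used in this test): a flat listing of the monitored directory, which is all the stock taildir source looks at, finds no .log file, while a recursive walk finds src/access.log.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class RecursionCheck {
    public static void main(String[] args) throws IOException {
        Path monitored = Paths.get("/home/hadoop/soul/data/flume/tairdir");

        // Flat listing: only direct children of tairdir/, so access.log under src/ is missed.
        try (Stream<Path> flat = Files.list(monitored)) {
            System.out.println("flat matches:      " + flat.filter(p -> p.toString().endsWith(".log")).count());
        }

        // Recursive walk: descends into src/ and finds access.log.
        try (Stream<Path> deep = Files.walk(monitored)) {
            System.out.println("recursive matches: " + deep.filter(p -> p.toString().endsWith(".log")).count());
        }
    }
}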

Start Flume:

flume-ng agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/tairdir-hdfs.conf --name tairdir-hdfs-agent -Dflume.root.logger=INFO,console

You will see that the log has been collected.
