Flume Configuration and Getting Started

I. Installing Flume

1. Upload the tarball

2. Extract it

3. Set the JDK path in conf/flume-env.sh

Note: if transferring large files triggers an out-of-memory error, increase the heap via the JAVA_OPTS setting in this file.

4. Verify the installation: ./flume-ng version

5. Configure environment variables

export FLUME_HOME=/home/apache-flume-1.6.0-bin
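A minimal sketch of the corresponding profile entries, so that flume-ng can be run from any directory (the install path is taken from the line above; adjust it to your layout):

```shell
# Append Flume to the environment, e.g. in /etc/profile or ~/.bashrc
export FLUME_HOME=/home/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin
```

After sourcing the profile, `flume-ng version` should work without the ./ prefix.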

II. Examples

1. A simple example

http://flume.apache.org/FlumeUserGuide.html#a-simple-example

Configuration file (netcat-logger.conf):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

1.1 Start Flume

flume-ng agent -n a1 -f /usr/local/flume/conf/netcat-logger.conf -Dflume.root.logger=INFO,console

1.2 Open a new shell window on node1

Install telnet:
yum install telnet

telnet node1 44444

1.3 Type some text; it appears on the Flume console.

1.4 To exit telnet, press Ctrl+] and then type quit.

1.5 Memory Channel configuration
capacity: the maximum number of events the channel can hold (default 100)
transactionCapacity: the maximum number of events taken from a source, or given to a sink, in one transaction (default 100)
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it
byteCapacity: a limit on the total bytes of events held in the channel, counting only event bodies
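As a sketch, these extra knobs can be set on channel c1 like this (the values are illustrative, not recommendations):

```properties
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Seconds a put/take will wait on a full/empty channel before failing
a1.channels.c1.keep-alive = 3
# Upper bound, in bytes, on the event bodies buffered in the channel
a1.channels.c1.byteCapacity = 800000
# Share of byteCapacity reserved for event headers (percent)
a1.channels.c1.byteCapacityBufferPercentage = 20
```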

2. Chaining two Flume agents

On node01, create the configuration file (netcat-logger1.conf):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Describe the sink
# a1.sinks.k1.type = logger
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node2
a1.sinks.k1.port = 60000

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

On node02, install Flume (same steps as above), then create the configuration file (netcat-logger2.conf):
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = node2
a1.sources.r1.port = 60000

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

2.1 Start Flume on node02 first

flume-ng agent  -n a1 -c conf -f /usr/local/flume/conf/netcat-logger2.conf -Dflume.root.logger=INFO,console

2.2 Then start Flume on node01

flume-ng agent  -n a1 -c conf -f /usr/local/flume/conf/netcat-logger1.conf -Dflume.root.logger=INFO,console

2.3 Open another node1 window in Xshell and run telnet node1 44444; the typed input is printed on the node02 console.

3. Exec Source

http://flume.apache.org/FlumeUserGuide.html#exec-source

Configuration file (exec.conf):
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/flume.exec.log

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

3.1 Start Flume

flume-ng agent -n a1 -c conf -f /usr/local/flume/conf/exec.conf -Dflume.root.logger=INFO,console

3.2 Create an empty flume.exec.log file under /home

touch flume.exec.log

3.3 Append data to flume.exec.log in a loop; each line then shows up on the console:

for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done

4. Spooling Directory Source

http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

Configuration file (spool.conf):
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################

4.1 Start Flume

flume-ng agent -n a1 -c conf -f /usr/local/flume/conf/spool.conf -Dflume.root.logger=INFO,console

4.2 Create the spool directory: mkdir /home/logs

4.3 Copy a file into it: cp /home/flume.exec.log /home/logs/. Its lines appear on the console, and once the file is ingested the source renames it to flume.exec.log.COMPLETED.

5. HDFS Sink

http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

Configuration file (hdfs.conf):
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true

# Describe the sink
# Same as the spool example above, except the sink type changes from logger to hdfs
a1.sinks.k1.type=hdfs
# myhadoop is the HDFS nameservice of the Hadoop cluster
a1.sinks.k1.hdfs.path=hdfs://myhadoop/flume/%Y-%m-%d/%H%M

## Roll a new file every 60 seconds or once it exceeds 10 KB
# Roll after this many events; 0 = do not roll based on event count
a1.sinks.k1.hdfs.rollCount=0
# Roll after this many seconds; 0 = do not roll based on time
a1.sinks.k1.hdfs.rollInterval=60
# Roll once the file reaches this many bytes; 0 = do not roll based on size
a1.sinks.k1.hdfs.rollSize=10240
# Close and rename the open temporary file after this many seconds without writes
a1.sinks.k1.hdfs.idleTimeout=3

a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp=true

## Create a new directory every five minutes:
# Whether to round the event timestamp down; if enabled, this affects every
# time escape in hdfs.path except %t
a1.sinks.k1.hdfs.round=true
# The value to round down to
a1.sinks.k1.hdfs.roundValue=5
# The unit of rounding: second, minute or hour
a1.sinks.k1.hdfs.roundUnit=minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################
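The round settings above work by rounding the event timestamp down before the path escapes are expanded: with roundValue=5 and roundUnit=minute, an event stamped 10:23 is written under the 1020 directory. The arithmetic, sketched in plain shell (an illustration, not Flume code):

```shell
# Round the minute down to the nearest multiple of 5, as hdfs.round does
minute=23
roundValue=5
rounded=$(( minute - minute % roundValue ))
printf '10%02d\n' "$rounded"   # prints 1020
```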

5.1 Create the HDFS directory

hadoop fs -mkdir /flume

5.2 Start Flume

flume-ng agent -n a1 -c conf -f /usr/local/flume/conf/hdfs.conf -Dflume.root.logger=INFO,console

5.3 Copy a file into the spool directory: cp /home/flume.exec.log /home/logs/. Whatever flume.exec.log contains then appears in the files under /flume/... on HDFS.

5.4 Inspect the HDFS files

hadoop fs -ls /flume/...
hadoop fs -get /flume/...