I. Background
Before Hadoop 2.0.0, the NameNode was a single point of failure in an HDFS cluster. Each cluster had exactly one NameNode, and if that process or its machine failed, the entire cluster became unavailable until the NameNode was restarted or brought up on another machine.
Two situations in particular affect cluster availability:
1. Unplanned events, such as a machine crash, leave the cluster unavailable until the NameNode is restarted.
2. Planned maintenance, such as software or hardware upgrades on the NameNode machine, results in windows of cluster downtime.
The HDFS High Availability feature addresses these problems by running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to the other NameNode when a machine crashes, as well as a graceful administrator-initiated failover for planned maintenance.
II. Architecture
In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of them is in the Active state and the other is in the Standby state. The Active NameNode handles all client operations in the cluster, while the Standby acts as a slave, maintaining enough state to provide a fast failover when necessary.
To keep the Standby node in sync with the Active node, both nodes communicate with a group of separate daemons called JournalNodes (JNs). When the Active node modifies the namespace, it logs the edit to a majority of these JNs. The Standby node reads the edits from the JNs and constantly watches for changes, applying them to its own namespace. In the event of a failover, the Standby makes sure it has read all of the edits from the JournalNodes before promoting itself to the Active state, which guarantees that the namespace is fully synchronized at the moment of failover.
To support fast failover, the Standby node also needs up-to-date information about the blocks in the cluster. To achieve this, every DataNode is configured with the addresses of both NameNodes and sends block reports and heartbeats to both.
It is vital for correct operation of an HA cluster that only one NameNode is Active at a time. Otherwise, the namespaces of the two NameNodes would quickly diverge, risking data loss or other incorrect results. To ensure this and to prevent the so-called "split-brain scenario", the JournalNodes only allow a single NameNode to be a writer at any moment. During a failover, the NameNode that is to become Active (the former Standby) takes over responsibility for writing to the JournalNodes, which effectively prevents the other NameNode from continuing in the Active state and allows the new Active node to proceed with the failover safely.
III. Hardware Resources
To deploy an HA cluster, prepare the following:
1. NameNode machines: the machines that run the Active and Standby NameNodes should have hardware equivalent to each other, and equivalent to what would be used in a non-HA cluster.
2. JournalNode machines: the machines that run the JournalNode daemons. The JournalNode daemon is relatively lightweight, so it can be collocated with other Hadoop daemons, for example the NameNodes, the JobTracker, or the YARN ResourceManager. There must be at least three JournalNode daemons, because edit log modifications must be written to a majority of the JNs to be considered successful; this lets the system keep working when a single machine fails. You may run more than three JournalNodes, but to improve fault tolerance you should run an odd number, since that makes it easier to form a majority. With N JournalNodes, the system can tolerate at most (N-1)/2 failures and continue to operate normally; for example, 3 JNs tolerate 1 failure and 5 JNs tolerate 2.
In an HA cluster, the Standby NameNode also performs periodic checkpoints of the namespace state, so it is unnecessary to run a Secondary NameNode, CheckpointNode, or BackupNode. In fact, doing so is an error and is not allowed.
IV. Deployment
1. Cluster plan
Node | IP | Installed software | Processes |
---|---|---|---|
work1 | 192.168.162.11 | jdk、hadoop | NameNode、DFSZKFailoverController |
work2 | 192.168.162.12 | jdk、hadoop | NameNode、DFSZKFailoverController |
work3 | 192.168.162.13 | jdk、hadoop | ResourceManager |
work4 | 192.168.162.14 | jdk、hadoop | ResourceManager |
work5 | 192.168.162.15 | jdk、hadoop、zookeeper | QuorumPeerMain、JournalNode、NodeManager、DataNode |
work6 | 192.168.162.16 | jdk、hadoop、zookeeper | QuorumPeerMain、JournalNode、NodeManager、DataNode |
work7 | 192.168.162.17 | jdk、hadoop、zookeeper | QuorumPeerMain、JournalNode、NodeManager、DataNode |
2. Software versions
Software | Version |
---|---|
jdk | jdk1.8.0_172 |
hadoop | hadoop-2.6.0 |
zookeeper | zookeeper-3.4.5 |
3. Installation steps
Note: all operations are performed as the root user, and all software is installed under /usr/local/src.
3.1 Configure hosts
Edit the hosts file on every node in the cluster:
vim /etc/hosts
Add the IP-to-hostname mappings for all nodes.
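Based on the cluster plan above, the entries look like this:
192.168.162.11 work1
192.168.162.12 work2
192.168.162.13 work3
192.168.162.14 work4
192.168.162.15 work5
192.168.162.16 work6
192.168.162.17 work7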
3.2 Configure SSH
3.2.1 Generate SSH keys
Run the key generation command on every node:
ssh-keygen
Press Enter at each prompt to accept the defaults.
3.2.2 Configure passwordless login between servers
Run the following on every node:
ssh-copy-id work1
ssh-copy-id work2
ssh-copy-id work3
ssh-copy-id work4
ssh-copy-id work5
ssh-copy-id work6
ssh-copy-id work7
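To confirm that passwordless login works, you can try connecting from any node; the command below should print the remote hostname without prompting for a password:
ssh work2 hostname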
3.3 Install ZooKeeper
Install ZooKeeper on work5, work6, and work7.
3.3.1 Unpack
tar -zxvf zookeeper-3.4.5.tar.gz
3.3.2 Configure ZooKeeper
Go to the configuration directory zookeeper-3.4.5/conf/ and copy the sample configuration file:
cp zoo_sample.cfg zoo.cfg
Edit the configuration file:
vim zoo.cfg
1. Change the dataDir path (create the directory in advance); in this guide it is set to:
dataDir=/usr/local/src/zookeeper-3.4.5
2. Append the following lines at the end of the file:
server.0=work5:2888:3888
server.1=work6:2888:3888
server.2=work7:2888:3888
These entries have the form server.A=B:C:D, where:
A is the server ID (see 3.3.3); B is the server IP address or hostname; C is the port that followers use to communicate with the leader; and D is the port used for leader election.
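For reference, a minimal zoo.cfg after these changes might look like the following; tickTime, initLimit, syncLimit, and clientPort are assumed to keep the defaults from zoo_sample.cfg:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/src/zookeeper-3.4.5
server.0=work5:2888:3888
server.1=work6:2888:3888
server.2=work7:2888:3888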
3.3.3 Configure myid
Go to the dataDir directory configured in 3.3.2 and run:
echo 0 >> myid
The value written here is determined by the server.X entries added in 3.3.2. In this guide server.0 corresponds to work5, so on work5 run:
echo 0 >> myid
Similarly, on work6 and work7 run, respectively:
echo 1 >> myid
echo 2 >> myid
At this point the ZooKeeper cluster setup is complete.
3.4 Install Hadoop
Hadoop must be installed on every node. You can install and configure it on work1 first and then copy it to the other nodes.
3.4.1 Unpack
tar -zxvf hadoop-2.6.0.tar.gz
3.4.2 Create data directories
Before configuring Hadoop, create the directories used to store data. In this guide, a tmp directory is created under the Hadoop installation directory, with two subdirectories: dfs and journal. Under dfs there are two further subdirectories, name and data, which hold the data produced by the NameNode and the DataNode respectively; the journal directory holds the data produced for the HA setup by the JournalNodes.
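The directories can be created in one step; the paths below match the values used in the configuration files later in this guide:
mkdir -p /usr/local/src/hadoop-2.6.0/tmp/dfs/name
mkdir -p /usr/local/src/hadoop-2.6.0/tmp/dfs/data
mkdir -p /usr/local/src/hadoop-2.6.0/tmp/journal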
3.4.3 Configure Hadoop
Go to the Hadoop configuration directory, etc/hadoop under the Hadoop installation directory.
- Configure core-site.xml
<configuration>
<property>
<!-- Default filesystem URI: the HDFS nameservice -->
<name>fs.defaultFS</name>
<value>hdfs://ns/</value>
</property>
<property>
<!-- Size of read/write buffer used in SequenceFiles -->
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/src/hadoop-2.6.0/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>work5:2181,work6:2181,work7:2181</value>
</property>
</configuration>
- Configure hdfs-site.xml
<configuration>
<property>
<!-- Set the HDFS nameservice to ns; must match the value in core-site.xml -->
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!-- The nameservice ns has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>work1:9000</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>work2:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>work1:50070</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>work2:50070</value>
</property>
<!-- Location on the JournalNodes where the NameNode shared edits are stored -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://work5:8485;work6:8485;work7:8485/ns</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/src/hadoop-2.6.0/tmp/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider that clients use to locate the Active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence method requires passwordless SSH; path to the private key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Timeout (milliseconds) for the sshfence method -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!-- NameNode data directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/src/hadoop-2.6.0/tmp/dfs/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/src/hadoop-2.6.0/tmp/dfs/data</value>
</property>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
- Configure mapred-site.xml
Copy the template first:
cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Configure yarn-site.xml
<configuration>
<property>
<!-- Enable ResourceManager HA -->
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- Cluster ID for the ResourceManagers -->
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<!-- Logical IDs of the two ResourceManagers -->
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>work3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>work4</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>work5:2181,work6:2181,work7:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Configure slaves
Edit the slaves file and add the DataNode hosts; in this guide those are work5, work6, and work7.
work5
work6
work7
3.4.4 Sync to the other nodes
Copy the configured Hadoop installation to the other nodes:
scp -r /usr/local/src/hadoop-2.6.0 root@work2:/usr/local/src/
scp -r /usr/local/src/hadoop-2.6.0 root@work3:/usr/local/src/
scp -r /usr/local/src/hadoop-2.6.0 root@work4:/usr/local/src/
scp -r /usr/local/src/hadoop-2.6.0 root@work5:/usr/local/src/
scp -r /usr/local/src/hadoop-2.6.0 root@work6:/usr/local/src/
scp -r /usr/local/src/hadoop-2.6.0 root@work7:/usr/local/src/
3.5 Configure environment variables
Because the Hadoop configuration file hadoop-env.sh relies on the JDK environment variable, JAVA_HOME must be set; the other entries add the relevant bin directories to PATH so the scripts can be run from any directory.
vim ~/.bashrc
On work1, work2, work3, and work4, append the following to the end of the file:
export JAVA_HOME=/usr/local/src/jdk1.8.0_172
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
On work5, work6, and work7, append the following to the end of the file:
export JAVA_HOME=/usr/local/src/jdk1.8.0_172
export ZOOKEEPER_HOME=/usr/local/src/zookeeper-3.4.5
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin
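After editing, apply the changes to the current shell session:
source ~/.bashrc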
V. Starting the Cluster
Follow the steps below strictly in order.
1. Start ZooKeeper
On work5, work6, and work7, run the start command:
zkServer.sh start
Once all three have started, check the ZooKeeper status:
zkServer.sh status
The output looks like this:
JMX enabled by default
Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
Two of the nodes should report Mode: follower and the remaining one Mode: leader; that indicates the ensemble is healthy.
You can also check the process with jps:
2610 QuorumPeerMain
2. Start the JournalNode daemons
On work5, work6, and work7, run:
hadoop-daemon.sh start journalnode
After startup, verify the process with jps:
2677 JournalNode
3. Format HDFS
On work1, run:
hdfs namenode -format
Formatting creates files under the directory configured by hadoop.tmp.dir in core-site.xml; you can inspect that directory afterwards.
Because work2 will act as the Standby NameNode, the NameNode metadata generated on work1 must be copied to work2.
You can copy it over manually, as follows:
scp -r /usr/local/src/hadoop-2.6.0/tmp/ root@work2:/usr/local/src/hadoop-2.6.0/
Alternatively, after completing step 4, start the NameNode on work1:
hadoop-daemon.sh start namenode
and then run the following on work2:
hdfs namenode -bootstrapStandby
4. Format ZKFC
On work1, run:
hdfs zkfc -formatZK
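Optionally, you can confirm that the HA state was created in ZooKeeper. On one of the ZK nodes, open a client shell with zkCli.sh -server work5:2181 and run the following inside it (assuming the default parent znode /hadoop-ha):
ls /hadoop-ha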
5. Start HDFS
Start HDFS from work1:
start-dfs.sh
The startup output looks like this:
18/06/22 02:46:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [work1 work2]
work2: starting namenode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-namenode-work2.out
work1: starting namenode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-namenode-work1.out
work7: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work7.out
work6: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work6.out
work5: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work5.out
Starting journal nodes [work5 work6 work7]
work5: journalnode running as process 2677. Stop it first.
work7: journalnode running as process 2688. Stop it first.
work6: journalnode running as process 2671. Stop it first.
18/06/22 02:46:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [work1 work2]
work1: starting zkfc, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-zkfc-work1.out
work2: starting zkfc, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-zkfc-work2.out
Running jps on work1, work2, work5, work6, and work7 shows, respectively:
2752 NameNode
5096 Jps
3039 DFSZKFailoverController
11233 Jps
2725 DFSZKFailoverController
2623 NameNode
30418 Jps
2610 QuorumPeerMain
2677 JournalNode
2793 DataNode
30297 Jps
2781 DataNode
2591 QuorumPeerMain
2671 JournalNode
2688 JournalNode
30291 Jps
2615 QuorumPeerMain
2798 DataNode
work1 and work2 are each running a NameNode and a DFSZKFailoverController process.
work5, work6, and work7 are each running a DataNode process.
6. Start YARN
Start YARN from work3:
start-yarn.sh
Startup output:
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-resourcemanager-work3.out
work5: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work5.out
work7: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work7.out
work6: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work6.out
Running jps on work3, work5, work6, and work7 shows, respectively:
2551 ResourceManager
2616 Jps
30418 Jps
2610 QuorumPeerMain
2677 JournalNode
2919 NodeManager
2793 DataNode
30297 Jps
2907 NodeManager
2781 DataNode
2591 QuorumPeerMain
2671 JournalNode
2688 JournalNode
30291 Jps
2615 QuorumPeerMain
2924 NodeManager
2798 DataNode
work3 is running the ResourceManager process.
work5, work6, and work7 are each running a NodeManager process.
On work4, start the standby ResourceManager:
yarn-daemon.sh start resourcemanager
Startup output:
starting resourcemanager, logging to /usr/local/src/hadoop-2.6.0-ha/logs/yarn-root-resourcemanager-work4.out
Check the process with jps:
2881 Jps
2851 ResourceManager
VI. Verifying the Cluster
1. Browser access
Open the NameNode web UI on work1:
http://192.168.162.11:50070
The page shows:
'work1:9000' (active)
Open the NameNode web UI on work2:
http://192.168.162.12:50070
The page shows:
'work2:9000' (standby)
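As an alternative to the web UI, the NameNode HA states can also be queried from the command line; each command prints active or standby:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2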
Open the ResourceManager web UI on work3:
http://192.168.162.13:8088/cluster/cluster
The page shows:
Cluster ID: 1529681241532
ResourceManager state: STARTED
ResourceManager HA state: active
ResourceManager RMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore
ResourceManager started on: 22-Jun-2018 08:27:21
ResourceManager version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 7e1415f8c555842b6118a192d86f5e8 on 2014-11-13T21:17Z
Hadoop version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 18e43357c8f927c0695f1e9522859d6a on 2014-11-13T21:10Z
Open the ResourceManager web UI on work4:
http://192.168.162.14:8088/cluster/cluster
The page shows:
Cluster ID: 1529681248588
ResourceManager state: STARTED
ResourceManager HA state: standby
ResourceManager RMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore
ResourceManager started on: 22-Jun-2018 08:27:28
ResourceManager version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 7e1415f8c555842b6118a192d86f5e8 on 2014-11-13T21:17Z
Hadoop version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 18e43357c8f927c0695f1e9522859d6a on 2014-11-13T21:10Z
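The ResourceManager HA states can likewise be queried from the command line:
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2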
2. Verify HDFS HA
(1) Upload a file
On work1, run:
hadoop fs -put /etc/profile /profile
Verify that it exists:
hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 1796 2018-06-22 08:36 /profile
(2) Kill the active NameNode
Run jps on work1 to find the process ID:
2752 NameNode
10254 Jps
3039 DFSZKFailoverController
Kill the NameNode process:
kill -9 2752
(3) Check the NameNodes on work1 and work2
Try to open work1's NameNode web UI:
http://192.168.162.11:50070/
It is no longer reachable.
Open work2's NameNode web UI:
http://192.168.162.12:50070/
work2's NameNode is now shown in the active state:
'work2:9000' (active)
(4) Check that the uploaded file still exists
On work2, run:
hadoop fs -ls /
The output is:
Found 1 items
-rw-r--r-- 3 root supergroup 1796 2018-06-22 08:36 /profile
The previously uploaded file is still there.
(5) Restart the NameNode on work1:
hadoop-daemon.sh start namenode
Open work1's NameNode web UI:
http://192.168.162.11:50070/
The page shows:
'work1:9000' (standby)
HDFS HA verification complete.
3. Verify YARN HA
(1) Run a sample job
Building on the verification above, run the example job on work2:
hadoop jar /usr/local/src/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out
Job output:
18/06/22 08:45:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/22 08:45:28 INFO input.FileInputFormat: Total input paths to process : 1
18/06/22 08:45:28 INFO mapreduce.JobSubmitter: number of splits:1
18/06/22 08:45:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529681241532_0001
18/06/22 08:45:30 INFO impl.YarnClientImpl: Submitted application application_1529681241532_0001
18/06/22 08:45:30 INFO mapreduce.Job: The url to track the job: http://work3:8088/proxy/application_1529681241532_0001/
18/06/22 08:45:30 INFO mapreduce.Job: Running job: job_1529681241532_0001
18/06/22 08:45:48 INFO mapreduce.Job: Job job_1529681241532_0001 running in uber mode : false
18/06/22 08:45:48 INFO mapreduce.Job: map 0% reduce 0%
18/06/22 08:46:02 INFO mapreduce.Job: map 100% reduce 0%
18/06/22 08:46:14 INFO mapreduce.Job: map 100% reduce 100%
18/06/22 08:46:14 INFO mapreduce.Job: Job job_1529681241532_0001 completed successfully
18/06/22 08:46:14 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2058
FILE: Number of bytes written=220233
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1878
HDFS: Number of bytes written=1429
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=12274
Total time spent by all reduces in occupied slots (ms)=6928
Total time spent by all map tasks (ms)=12274
Total time spent by all reduce tasks (ms)=6928
Total vcore-seconds taken by all map tasks=12274
Total vcore-seconds taken by all reduce tasks=6928
Total megabyte-seconds taken by all map tasks=12568576
Total megabyte-seconds taken by all reduce tasks=7094272
Map-Reduce Framework
Map input records=78
Map output records=258
Map output bytes=2573
Map output materialized bytes=2058
Input split bytes=82
Combine input records=258
Combine output records=156
Reduce input groups=156
Reduce shuffle bytes=2058
Reduce input records=156
Reduce output records=156
Spilled Records=312
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=218
CPU time spent (ms)=1750
Physical memory (bytes) snapshot=271466496
Virtual memory (bytes) snapshot=4126367744
Total committed heap usage (bytes)=138362880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=1429
(2) Check the result
hadoop fs -ls /out
The output is:
Found 2 items
-rw-r--r-- 3 root supergroup 0 2018-06-22 08:46 /out/_SUCCESS
-rw-r--r-- 3 root supergroup 1429 2018-06-22 08:46 /out/part-r-00000
The job ran successfully.
(3) Kill the ResourceManager on work3 and rerun the job. On work3, find the ResourceManager process with jps and kill it:
10131 Jps
8890 ResourceManager
kill -9 8890
Rerun the jar on work2; because the /out directory already exists, use a different output directory, /out1:
hadoop jar /usr/local/src/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out1
Job output:
18/06/22 09:34:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/22 09:34:38 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/06/22 09:34:40 INFO input.FileInputFormat: Total input paths to process : 1
18/06/22 09:34:40 INFO mapreduce.JobSubmitter: number of splits:1
18/06/22 09:34:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529685157948_0001
18/06/22 09:34:41 INFO impl.YarnClientImpl: Submitted application application_1529685157948_0001
18/06/22 09:34:41 INFO mapreduce.Job: The url to track the job: http://work4:8088/proxy/application_1529685157948_0001/
18/06/22 09:34:41 INFO mapreduce.Job: Running job: job_1529685157948_0001
18/06/22 09:35:01 INFO mapreduce.Job: Job job_1529685157948_0001 running in uber mode : false
18/06/22 09:35:01 INFO mapreduce.Job: map 0% reduce 0%
18/06/22 09:35:13 INFO mapreduce.Job: map 100% reduce 0%
18/06/22 09:35:27 INFO mapreduce.Job: map 100% reduce 100%
18/06/22 09:35:28 INFO mapreduce.Job: Job job_1529685157948_0001 completed successfully
18/06/22 09:35:28 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2058
FILE: Number of bytes written=220235
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1878
HDFS: Number of bytes written=1429
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=10069
Total time spent by all reduces in occupied slots (ms)=9950
Total time spent by all map tasks (ms)=10069
Total time spent by all reduce tasks (ms)=9950
Total vcore-seconds taken by all map tasks=10069
Total vcore-seconds taken by all reduce tasks=9950
Total megabyte-seconds taken by all map tasks=10310656
Total megabyte-seconds taken by all reduce tasks=10188800
Map-Reduce Framework
Map input records=78
Map output records=258
Map output bytes=2573
Map output materialized bytes=2058
Input split bytes=82
Combine input records=258
Combine output records=156
Reduce input groups=156
Reduce shuffle bytes=2058
Reduce input records=156
Reduce output records=156
Spilled Records=312
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=257
CPU time spent (ms)=2260
Physical memory (bytes) snapshot=258924544
Virtual memory (bytes) snapshot=4125798400
Total committed heap usage (bytes)=136077312
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=1429
The job still succeeds, and the log shows that the client failed over to the second ResourceManager ("Failing over to rm2").
This completes the HA cluster setup and verification.