Installing Hadoop 2.7.7 on DiDi's AlphaCloud R&D Cloud
Personal study notes based on DiDi's AlphaCloud R&D cloud environment.
Preparation
Package downloads
To save time, download the installation packages in advance:
- Hadoop: hadoop-2.7.7.tar.gz
- JDK: jdk-8u191-linux-x64.tar.gz
Cluster plan
The cluster in this article runs on DiDi's R&D cloud; readers can build their own virtual machine environment instead.
Node overview
role | ip | hostname | system |
---|---|---|---|
master | 10.96.81.166 | jms-master-01 | CentOS 7.2 [ CPU: 4 cores, RAM: 12 GB, disk: 100 GB ] |
node | 10.96.113.243 | jms-master-02 | CentOS 7.2 [ CPU: 4 cores, RAM: 12 GB, disk: 100 GB ] |
node | 10.96.85.231 | jms-master-03 | CentOS 7.2 [ CPU: 4 cores, RAM: 12 GB, disk: 100 GB ] |
User conventions
Use a single group and user named hadoop on every node.
[root@jms-master-01 ~]# groupadd hadoop
[root@jms-master-01 ~]# useradd -g hadoop hadoop
[root@jms-master-01 ~]# cat /etc/group | grep hadoop
hadoop:x:500:
[root@jms-master-01 ~]# cat /etc/passwd | grep hadoop
hadoop:x:500:500::/home/hadoop:/bin/bash
Directory conventions
Software install directory: /home/hadoop/tools
Package directory: /home/hadoop/tools/package
[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/package
System configuration
hosts
Note: comment out the 127.0.0.1 hostname mapping. This must be done on every node.
[root@jms-master-01 ~]# vim /etc/hosts
[root@jms-master-01 ~]# cat /etc/hosts
# 127.0.0.1 jms-master-01
10.96.81.166 jms-master-01
10.96.113.243 jms-master-02
10.96.85.231 jms-master-03
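A quick sanity check can confirm the mappings took effect on each node. This is a sketch: getent prints the matching /etc/hosts entry if the name resolves, and flags it otherwise.

```shell
# Check that every cluster hostname resolves; run this on each node.
for h in jms-master-01 jms-master-02 jms-master-03; do
  getent hosts "$h" || echo "unresolved: $h"
done
```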
Passwordless SSH
Note: the local machine itself also needs passwordless SSH, and keys are per user, so setting it up for root does not cover the hadoop user. Configure this on every node.
- Check whether a key already exists
[hadoop@jms-master-01 ~]$ cat ~/.ssh/id_rsa.pub
- If not, generate one (just press Enter at each prompt)
[hadoop@jms-master-01 ~]$ ssh-keygen -t rsa
- Copy the public key to each node
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-01 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-02 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.81.166
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.113.243
[hadoop@jms-master-03 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub 10.96.85.231
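The nine commands above can be collapsed into a loop run once per node. In this sketch the leading echo makes it a dry run that only prints each command; remove the echo to actually copy keys.

```shell
# Distribute this node's public key to every cluster node (dry run).
NODES="10.96.81.166 10.96.113.243 10.96.85.231"
for ip in $NODES; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub "$ip"
done
```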
Installing the JDK
Create the JDK install directory:
[hadoop@jms-master-01 ~]$ mkdir -p /home/hadoop/tools/java
Upload jdk-8u191-linux-x64.tar.gz to /home/hadoop/tools/package:
scp jdk-8u191-linux-x64.tar.gz hadoop@10.96.81.166:~/tools/package/
Extract it:
tar -xzvf jdk-8u191-linux-x64.tar.gz -C /home/hadoop/tools/java/
Configure environment variables
[root@jms-master-01 ~]# vim /etc/profile
[root@jms-master-01 ~]# cat /etc/profile
# java home
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JRE_HOME=$JAVA_HOME/jre
A cleaner alternative is a drop-in file under /etc/profile.d/:
sudo vi /etc/profile.d/jdk-1.8.sh
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Reload the configuration in both the root and hadoop shells so it takes effect immediately, then verify:
[root@jms-master-01 ~]# source /etc/profile
[root@jms-master-01 ~]# su hadoop
[hadoop@jms-master-01 ~]$ source /etc/profile
[hadoop@jms-master-01 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
We only install the JDK on the master node; the other nodes simply receive a copy via scp.
Copy the JDK:
[hadoop@jms-master-01 ~]$ scp -r /home/hadoop/tools/java/jdk1.8.0_191 hadoop@10.96.113.243:/home/hadoop/tools/java/jdk1.8.0_191
[hadoop@jms-master-01 ~]$ scp -r /home/hadoop/tools/java/jdk1.8.0_191 hadoop@10.96.85.231:/home/hadoop/tools/java/jdk1.8.0_191
Copy the profile:
[hadoop@jms-master-01 ~]$ scp /etc/profile root@10.96.113.243:/etc/profile
[hadoop@jms-master-01 ~]$ scp /etc/profile root@10.96.85.231:/etc/profile
Finally, don't forget to verify the JDK on the other two nodes.
[hadoop@jms-master-02 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
[hadoop@jms-master-03 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
At this point the JDK is installed on all nodes.
Disabling the firewall (can be skipped on the R&D cloud)
Check firewall status
firewall-cmd --state
Stop the firewall
systemctl stop firewalld.service
Start the firewall
systemctl start firewalld.service
Keep the firewall from starting at boot
systemctl disable firewalld.service
Installing Hadoop
Install directory plan
item | planned directory | note |
---|---|---|
Hadoop install directory | /home/hadoop/tools/hadoop-2.7.7 | consider a symlink for a uniform layout |
Hadoop data root | /home/hadoop/tools/hadoop_data | suggested: /data |
hdfs-site.xml dfs.namenode.name.dir | /home/hadoop/tools/hadoop_data/hadoop/dfs/name | suggested: /data/hadoop/dfs/name |
hdfs-site.xml dfs.datanode.data.dir | /home/hadoop/tools/hadoop_data/hadoop/dfs/data | suggested: /data/hadoop/dfs/data |
core-site.xml hadoop.tmp.dir | /home/hadoop/tools/hadoop_temp | suggested: /temp |
Ideally the data directories would live at the filesystem root, decoupled from the user's home directory; this is only a temporary setup, so that is not a hard requirement here.
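The table above boils down to three mkdir targets. A sketch: BASE defaults to a local demo directory so the commands can be tried anywhere; on the cluster, set BASE=/home/hadoop/tools.

```shell
# Create the planned data and temp directories in one go.
BASE="${BASE:-./hadoop_dirs_demo}"
mkdir -p "$BASE/hadoop_data/hadoop/dfs/name" \
         "$BASE/hadoop_data/hadoop/dfs/data" \
         "$BASE/hadoop_temp"
ls -R "$BASE"
```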
Upload and extract
Upload the package to the master and extract it into the planned directory:
scp hadoop-2.7.7.tar.gz hadoop@10.96.81.166:~/tools/package/
[hadoop@jms-master-01 package]$ tar -xzvf hadoop-2.7.7.tar.gz -C /home/hadoop/tools/
Environment variables
Add the following to /etc/profile:
# hadoop home
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=-Djava.library.path=$HADOOP_HOME/lib
# open hadoop debug mode
export HADOOP_ROOT_LOGGER=DEBUG,console
If Hadoop fails to start after installation, the following variable turns on debug mode for detailed logs:
export HADOOP_ROOT_LOGGER=DEBUG,console
As with the JDK, a cleaner alternative is a drop-in file under /etc/profile.d/:
sudo vi /etc/profile.d/hadoop-2.7.7.sh
export HADOOP_HOME=/home/hadoop/tools/hadoop-2.7.7
export PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Editing the Hadoop configuration files
Six files need changes:
# Hadoop environment variables
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/hadoop-env.sh
# YARN environment variables
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/yarn-env.sh
# slave node registry
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/slaves
# global Hadoop settings; may be overridden by the files below
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/core-site.xml
# HDFS settings; inherits from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
# MapReduce settings; inherits from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/mapred-site.xml
# YARN settings; inherits from core-site.xml
/home/hadoop/tools/hadoop-2.7.7/etc/hadoop/yarn-site.xml
hadoop-env.sh and yarn-env.sh
Just add the JDK settings. Replace the stock line
# The java implementation to use.
export JAVA_HOME=${JAVA_HOME}
with an explicit path:
export JAVA_HOME=/home/hadoop/tools/java/jdk1.8.0_191
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hadoop/tools/hadoop-2.7.7/lib/native
export HADOOP_OPTS=-Djava.library.path=/home/hadoop/tools/hadoop-2.7.7/lib
# Optional settings (omitting them does no harm):
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"
slaves
Register the worker nodes (note: from Hadoop 3.0 on, the slaves file is renamed to workers):
[hadoop@jms-master-01 package]$ cat /home/hadoop/tools/hadoop-2.7.7/etc/hadoop/slaves
jms-master-02
jms-master-03
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://jms-master-01:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tools/hadoop_temp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>jms-master-01:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/tools/hadoop_data/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/tools/hadoop_data/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml
Create mapred-site.xml by copying mapred-site.xml.template:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>jms-master-01:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>jms-master-01:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>jms-master-01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>jms-master-01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>jms-master-01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>jms-master-01:8033</value>
</property>
</configuration>
This completes the initial Hadoop installation on the master. Next, copy the extracted Hadoop directory and the profile script from the master to the other two nodes.
scp -r /home/hadoop/tools/hadoop-2.7.7 hadoop@10.96.113.243:/home/hadoop/tools/
scp -r /home/hadoop/tools/hadoop-2.7.7 hadoop@10.96.85.231:/home/hadoop/tools/
scp /etc/profile.d/hadoop-2.7.7.sh root@10.96.113.243:/etc/profile.d/
scp /etc/profile.d/hadoop-2.7.7.sh root@10.96.85.231:/etc/profile.d/
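The four scp commands above can also be written as a loop over the worker nodes. In this sketch the echo makes it a dry run that only prints each command; remove it to actually copy.

```shell
# Push the Hadoop tree and the profile.d script to each worker (dry run).
WORKERS="10.96.113.243 10.96.85.231"
for ip in $WORKERS; do
  echo scp -r /home/hadoop/tools/hadoop-2.7.7 "hadoop@$ip:/home/hadoop/tools/"
  echo scp /etc/profile.d/hadoop-2.7.7.sh "root@$ip:/etc/profile.d/"
done
```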
Don't forget to run source /etc/profile to apply the settings, then check the installation:
[hadoop@jms-master-01 ~]$ hadoop version
Hadoop 2.7.7
Subversion Unknown -r c1aad84bd27cd79c3d1a7dd58202a8c3ee1ed3ac
Compiled by stevel on 2018-07-18T22:47Z
Compiled with protoc 2.5.0
From source with checksum 792e15d20b12c74bd6f19a1fb886490
This command was run using /home/hadoop/tools/hadoop-2.7.7/share/hadoop/common/hadoop-common-2.7.7.jar
Starting Hadoop
Before the first start, format HDFS on the master:
/home/hadoop/tools/hadoop-2.7.7/bin/hdfs namenode -format testCluster
Start the daemons:
/home/hadoop/tools/hadoop-2.7.7/sbin/start-dfs.sh
/home/hadoop/tools/hadoop-2.7.7/sbin/start-yarn.sh
Check the daemons with jps
master node
[hadoop@jms-master-01 ~]$ jps
9651 NodeManager
9364 SecondaryNameNode
47268 Jps
9029 NameNode
9529 ResourceManager
9150 DataNode
worker nodes
[hadoop@jms-master-02 ~]$ jps
18643 Jps
2410 DataNode
2538 NodeManager
[hadoop@jms-master-03 ~]$ jps
16869 Jps
2232 DataNode
2360 NodeManager
Web UIs
HDFS UI: http://10.96.81.166:50070/dfshealth.html#tab-overview
YARN/MR UI: http://10.96.81.166:8088/cluster/apps/RUNNING
Open the links to view the cluster details.
Running a wordcount
- Create two test files
vim test1
aaa bbb ccc ddd
eee fff ggg hhh
vim test2
aaa bbb ccc ddd 111
eee fff ggg hhh 111
- Create an HDFS input directory and upload the test files (the job below reads /user/hadoop/input)
hadoop fs -mkdir -p /user/hadoop/input
hadoop fs -put test* /user/hadoop/input/
- If the cluster is still in safe mode, leave it first
hdfs dfsadmin -safemode leave
- Submit the job to YARN
yarn jar /home/hadoop/tools/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/hadoop/input /user/hadoop/output
- Running:
[hadoop@jms-master-02 ~]$ yarn jar /home/hadoop/tools/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /user/hadoop/input /user/hadoop/output
19/03/15 20:08:39 INFO client.RMProxy: Connecting to ResourceManager at jms-master-01/10.96.81.166:8032
19/03/15 20:08:40 INFO input.FileInputFormat: Total input paths to process : 2
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: number of splits:2
19/03/15 20:08:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552651623473_0002
19/03/15 20:08:41 INFO impl.YarnClientImpl: Submitted application application_1552651623473_0002
19/03/15 20:08:41 INFO mapreduce.Job: The url to track the job: http://jms-master-01:8088/proxy/application_1552651623473_0002/
19/03/15 20:08:41 INFO mapreduce.Job: Running job: job_1552651623473_0002
19/03/15 20:08:48 INFO mapreduce.Job: Job job_1552651623473_0002 running in uber mode : false
19/03/15 20:08:48 INFO mapreduce.Job: map 0% reduce 0%
19/03/15 20:08:57 INFO mapreduce.Job: map 100% reduce 0%
19/03/15 20:09:06 INFO mapreduce.Job: map 100% reduce 100%
19/03/15 20:09:07 INFO mapreduce.Job: Job job_1552651623473_0002 completed successfully
19/03/15 20:09:07 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=75
FILE: Number of bytes written=368945
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=274
HDFS: Number of bytes written=31
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=12594
Total time spent by all reduces in occupied slots (ms)=6686
Total time spent by all map tasks (ms)=12594
Total time spent by all reduce tasks (ms)=6686
Total vcore-milliseconds taken by all map tasks=12594
Total vcore-milliseconds taken by all reduce tasks=6686
Total megabyte-milliseconds taken by all map tasks=12896256
Total megabyte-milliseconds taken by all reduce tasks=6846464
Map-Reduce Framework
Map input records=4
Map output records=8
Map output bytes=78
Map output materialized bytes=81
Input split bytes=228
Combine input records=8
Combine output records=6
Reduce input groups=4
Reduce shuffle bytes=81
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=306
CPU time spent (ms)=2490
Physical memory (bytes) snapshot=698810368
Virtual memory (bytes) snapshot=6430330880
Total committed heap usage (bytes)=556793856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=46
File Output Format Counters
Bytes Written=31
- Inspect the results
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -ls /user/hadoop/output
19/03/15 20:16:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2019-03-15 20:15 /user/hadoop/output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 54 2019-03-15 20:15 /user/hadoop/output/part-r-00000
[hadoop@jms-master-01 xiepengjie]$ hadoop fs -cat /user/hadoop/output/part-r-00000
19/03/15 20:16:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
111 2
aaa 2
bbb 2
ccc 2
ddd 2
eee 2
fff 2
ggg 2
hhh 2
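The cluster result above can be cross-checked locally with a one-line pipeline over the same two test files: split on spaces, sort, and count each word.

```shell
# Recreate the test files and compute word counts with standard tools.
printf 'aaa bbb ccc ddd\neee fff ggg hhh\n' > test1
printf 'aaa bbb ccc ddd 111\neee fff ggg hhh 111\n' > test2
cat test1 test2 | tr ' ' '\n' | sort | uniq -c
```

Each of the nine distinct words appears twice, matching the job's part-r-00000 output.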
With that, the Hadoop cluster setup is complete.