1 Downloading installation files
JDK 1.8.0_131:
Download jdk-8u131-linux-x64.tar.gz from the official Oracle site.
Hadoop 2.7.1:
hadoop@master:~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
2 Configuring hosts
Using master as an example (editing /etc/hosts requires root):
hadoop@master:~$ sudo vi /etc/hosts
192.168.1.245 master
192.168.1.247 slave01
192.168.1.249 slave02
Verify:
hadoop@master:~$ cat /etc/hosts
192.168.1.245 master
192.168.1.247 slave01
192.168.1.249 slave02
slave01 and slave02 are configured the same way.
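The three entries above can be sanity-checked with a short script. This is a hypothetical helper, not part of the original setup; the inline sample file stands in for /etc/hosts so the sketch runs anywhere, and on a real node you would set HOSTS_FILE=/etc/hosts:

```shell
# Sketch: check that every cluster hostname appears in a hosts file.
# hosts.sample mirrors the entries above; override HOSTS_FILE on a real node.
printf '192.168.1.245 master\n192.168.1.247 slave01\n192.168.1.249 slave02\n' > hosts.sample
HOSTS_FILE="${HOSTS_FILE:-hosts.sample}"
for host in master slave01 slave02; do
  grep -qw "$host" "$HOSTS_FILE" && echo "$host OK" || echo "$host MISSING"
done
```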
3 Configuring passwordless SSH from master to each node
Set up passwordless SSH:
hadoop@master:~$ cd /home/hadoop/.ssh/
hadoop@master:~/.ssh$ ssh-keygen
hadoop@master:~/.ssh$ ls -la
total 24
drwxr-xr-x 2 hadoop hadoop 4096 Apr 22 19:51 .
drwxr-xr-x 23 hadoop hadoop 4096 Apr 25 14:43 ..
-rw-rw-r-- 1 hadoop hadoop 395 Apr 22 19:51 authorized_keys
-rw------- 1 hadoop hadoop 1675 Apr 22 19:50 id_rsa
-rw-r--r-- 1 hadoop hadoop 395 Apr 22 19:50 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop 1332 Apr 22 20:08 known_hosts
hadoop@master:~/.ssh$ cat id_rsa.pub >> authorized_keys
hadoop@master:~/.ssh$ scp authorized_keys hadoop@slave01:/home/hadoop/.ssh
hadoop@master:~/.ssh$ scp authorized_keys hadoop@slave02:/home/hadoop/.ssh
Verify:
hadoop@master:~/.ssh$ ssh slave01
Welcome to Ubuntu 14.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Tue Apr 23 13:59:41 2019 from 192.168.1.153
hadoop@slave01:~$
Permissions on the home directory, the .ssh directory, and the keys:
hadoop@master:~$ ls -l /home/ | grep hadoop
drwxr-xr-x 23 hadoop hadoop 4096 Apr 25 14:43 hadoop # 755
hadoop@master:~$ ls -la /home/hadoop | grep .ssh
drwxr-xr-x 2 hadoop hadoop 4096 Apr 22 19:51 .ssh # 755
hadoop@master:~$ ls -l /home/hadoop/.ssh/
total 16
-rw-r--r-- 1 hadoop hadoop 395 Apr 22 19:51 authorized_keys # 644
-rw------- 1 hadoop hadoop 1675 Apr 22 19:50 id_rsa # 600
-rw-r--r-- 1 hadoop hadoop 395 Apr 22 19:50 id_rsa.pub # 644
-rw-r--r-- 1 hadoop hadoop 1332 Apr 22 20:08 known_hosts
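If key-based login still prompts for a password, sshd is usually rejecting loose permissions. A hedged sketch of enforcing the values listed above; the scratch directory H only makes the snippet runnable anywhere, and on a real node you would set H=/home/hadoop (and skip the mkdir/touch lines, since the files already exist):

```shell
# Sketch: enforce the permissions sshd expects for key-based login.
# H defaults to a scratch dir for demonstration; use H=/home/hadoop on a node.
H="${H:-$(mktemp -d)}"
mkdir -p "$H/.ssh"
touch "$H/.ssh/authorized_keys" "$H/.ssh/id_rsa" "$H/.ssh/id_rsa.pub"
chmod 755 "$H"                       # home dir must not be group/world-writable
chmod 700 "$H/.ssh"                  # 700 is conventional; 755 as listed above also works
chmod 644 "$H/.ssh/authorized_keys"  # as listed above; 600 is also accepted
chmod 600 "$H/.ssh/id_rsa"           # private key must be owner-only
chmod 644 "$H/.ssh/id_rsa.pub"
stat -c '%a %n' "$H/.ssh" "$H/.ssh/authorized_keys" "$H/.ssh/id_rsa"
```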
4 Installing the JDK
Install the JDK:
hadoop@master:~$ tar zxvf jdk-8u131-linux-x64.tar.gz
hadoop@master:~$ sudo mv jdk1.8.0_131 /usr/lib/jvm/
Environment variables (append to .bashrc):
hadoop@master:~$ vi /home/hadoop/.bashrc
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
Apply the changes:
hadoop@master:~$ source /home/hadoop/.bashrc
Verify:
hadoop@master:~$ env | grep JAVA
JAVA_HOME=/usr/lib/jvm/jdk1.8.0_131
hadoop@master:~$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
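Before relying on the .bashrc entry, it can help to confirm that the candidate JAVA_HOME really contains a JDK. The check_java_home helper below is a hypothetical sketch, not part of the original steps; the throwaway directory only makes it self-contained, and on a real node you would call it on /usr/lib/jvm/jdk1.8.0_131:

```shell
# Sketch: validate a candidate JAVA_HOME before writing it into .bashrc.
check_java_home() {
  [ -x "$1/bin/java" ] && echo "OK: $1" || echo "invalid: $1"
}
# Demo with a fake JDK layout; on a real node pass the actual install path.
demo="$(mktemp -d)/jdk1.8.0_131"
mkdir -p "$demo/bin"
touch "$demo/bin/java" && chmod +x "$demo/bin/java"
check_java_home "$demo"
check_java_home /nonexistent
```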
5 Installing and configuring Hadoop
Install Hadoop:
hadoop@master:~$ mkdir /home/hadoop/bigdata
hadoop@master:~$ tar zxvf hadoop-2.7.1.tar.gz -C bigdata/
hadoop@master:~$ cd bigdata/
hadoop@master:~/bigdata$ mv hadoop-2.7.1/ hadoop
Environment variables (append to .bashrc):
hadoop@master:~/bigdata$ vi /home/hadoop/.bashrc
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the changes:
hadoop@master:~$ source /home/hadoop/.bashrc
Verify:
hadoop@master:~/bigdata$ env | grep HADOOP
HADOOP_HOME=/home/hadoop/bigdata/hadoop
HADOOP_USER_NAME=hadoop
Edit core-site.xml:
hadoop@master:~$ vi /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/bigdata/data/hadoop/tmp</value>
</property>
</configuration>
Edit hdfs-site.xml (note: dfs.replication of 3 exceeds the two DataNodes in this cluster, which is why the report in section 7 shows 14 under-replicated blocks; set it to 2 to match the cluster size):
hadoop@master:~$ vi /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
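hadoop.tmp.dir and the two dfs.*.dir paths above live on each node's local filesystem; creating them up front avoids first-start failures (hdfs namenode -format creates the name dir, but the DataNodes create nothing until they start). A sketch; BASE is parameterized only so it runs anywhere, and on the real nodes it would be /home/hadoop/bigdata/data:

```shell
# Sketch: pre-create the local directories referenced in core-site.xml / hdfs-site.xml.
BASE="${BASE:-$(mktemp -d)}"   # on real nodes: BASE=/home/hadoop/bigdata/data
mkdir -p "$BASE/hadoop/tmp" \
         "$BASE/hadoop/hdfs/namenode" \
         "$BASE/hadoop/hdfs/datanode"
ls "$BASE/hadoop/hdfs"
```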
Edit mapred-site.xml (Hadoop 2.7.1 ships only mapred-site.xml.template; copy it to mapred-site.xml first):
hadoop@master:~$ vi /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml:
hadoop@master:~$ vi /home/hadoop/bigdata/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit slaves:
hadoop@master:~$ vi /home/hadoop/bigdata/hadoop/etc/hadoop/slaves
slave01
slave02
Copy the configuration above to the corresponding locations on slave01:
hadoop@master:~$ scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/
hadoop@master:~$ scp /etc/hosts hadoop@slave01:/etc/hosts
(Writing /etc/hosts on the slave requires root; if this fails with permission denied, copy the file to /tmp on the slave and move it into place with sudo.)
hadoop@master:~$ scp -r /home/hadoop/bigdata/hadoop/ hadoop@slave01:/home/hadoop/bigdata/
Repeat for slave02.
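The per-slave scp commands can be collected into one loop. A dry-run sketch (an addition, not from the original steps) that only prints the commands; drop the echo-into-variable wrapper and run the scp lines directly to execute, assuming the passwordless SSH from section 3 is in place:

```shell
# Dry-run sketch: print the copy commands for every worker node.
cmds=$(for node in slave01 slave02; do
  echo "scp /home/hadoop/.bashrc hadoop@$node:/home/hadoop/"
  echo "scp -r /home/hadoop/bigdata/hadoop/ hadoop@$node:/home/hadoop/bigdata/"
done)
echo "$cmds"
```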
6 Starting the cluster
On master:
hadoop@master:~$ hdfs namenode -format
hadoop@master:~$ cd /home/hadoop/bigdata/hadoop/sbin/
hadoop@master:~/bigdata/hadoop/sbin$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-namenode-master.out
slave01: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave01: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
slave02: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave02.out
hadoop@master:~/bigdata/hadoop/sbin$ jps
2049 NameNode
2257 SecondaryNameNode
2699 Jps
2415 ResourceManager
slave01:
hadoop@master:~/bigdata/hadoop/sbin$ ssh slave01
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 4.4.0-142-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Thu Apr 25 15:10:12 2019 from master
hadoop@slave01:~$ jps
3619 DataNode
3924 Jps
3708 NodeManager
slave02:
hadoop@master:~/bigdata/hadoop/sbin$ ssh slave02
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 4.4.0-142-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Tue Apr 23 13:59:44 2019 from 192.168.1.153
hadoop@slave02:~$ jps
3555 DataNode
3860 Jps
3644 NodeManager
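The jps checks above can be scripted: compare the output against the daemon list expected on each node. The captured sample is pasted in so the sketch runs anywhere; on a live node replace it with jps_out=$(jps) and swap the daemon list to DataNode/NodeManager when checking a slave:

```shell
# Sketch: verify the expected daemons appear in jps output.
# Sample captured from the master above; use jps_out=$(jps) on a live node.
jps_out='2049 NameNode
2257 SecondaryNameNode
2415 ResourceManager
2699 Jps'
for daemon in NameNode SecondaryNameNode ResourceManager; do
  echo "$jps_out" | grep -qw "$daemon" && echo "$daemon running" || echo "$daemon MISSING"
done
```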
7 Verification
Basic information:
hadoop@master:~/bigdata/hadoop$ hdfs dfsadmin -report
Configured Capacity: 31354429440 (29.20 GB)
Present Capacity: 20549148672 (19.14 GB)
DFS Remaining: 20548468736 (19.14 GB)
DFS Used: 679936 (664 KB)
DFS Used%: 0.00%
Under replicated blocks: 14
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.1.249:50010 (slave02)
Hostname: slave02
Decommission Status : Normal
Configured Capacity: 15677214720 (14.60 GB)
DFS Used: 339968 (332 KB)
Non DFS Used: 5402562560 (5.03 GB)
DFS Remaining: 10274312192 (9.57 GB)
DFS Used%: 0.00%
DFS Remaining%: 65.54%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Apr 25 16:42:27 CST 2019
Name: 192.168.1.247:50010 (slave01)
Hostname: slave01
Decommission Status : Normal
Configured Capacity: 15677214720 (14.60 GB)
DFS Used: 339968 (332 KB)
Non DFS Used: 5402718208 (5.03 GB)
DFS Remaining: 10274156544 (9.57 GB)
DFS Used%: 0.00%
DFS Remaining%: 65.54%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Apr 25 16:42:26 CST 2019
Storage test:
hadoop@master:~/bigdata/hadoop$ hdfs dfs -mkdir /test01
hadoop@master:~/bigdata/hadoop$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-04-25 16:44 /test01
drwx------ - hadoop supergroup 0 2019-04-23 12:50 /tmp
hadoop@master:~/bigdata/hadoop$ hdfs dfs -copyFromLocal ./README.txt /test01
hadoop@master:~/bigdata/hadoop$ hdfs dfs -ls /test01
Found 1 items
-rw-r--r-- 3 hadoop supergroup 1366 2019-04-25 16:45 /test01/README.txt
Compute test:
hadoop@master:~/bigdata/hadoop$ hadoop jar ~/bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test01/README.txt /output01
19/04/25 16:51:11 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.245:8032
19/04/25 16:51:13 INFO input.FileInputFormat: Total input paths to process : 1
19/04/25 16:51:13 INFO mapreduce.JobSubmitter: number of splits:1
19/04/25 16:51:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556181225140_0001
19/04/25 16:51:15 INFO impl.YarnClientImpl: Submitted application application_1556181225140_0001
19/04/25 16:51:15 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1556181225140_0001/
19/04/25 16:51:15 INFO mapreduce.Job: Running job: job_1556181225140_0001
19/04/25 16:52:25 INFO mapreduce.Job: Job job_1556181225140_0001 running in uber mode : false
19/04/25 16:52:25 INFO mapreduce.Job: map 0% reduce 0%
19/04/25 16:53:14 INFO mapreduce.Job: map 100% reduce 0%
19/04/25 16:53:58 INFO mapreduce.Job: map 100% reduce 100%
19/04/25 16:54:02 INFO mapreduce.Job: Job job_1556181225140_0001 completed successfully
19/04/25 16:54:03 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=1836
FILE: Number of bytes written=234579
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1467
HDFS: Number of bytes written=1306
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=40232
Total time spent by all reduces in occupied slots (ms)=43204
Total time spent by all map tasks (ms)=40232
Total time spent by all reduce tasks (ms)=43204
Total vcore-seconds taken by all map tasks=40232
Total vcore-seconds taken by all reduce tasks=43204
Total megabyte-seconds taken by all map tasks=41197568
Total megabyte-seconds taken by all reduce tasks=44240896
Map-Reduce Framework
Map input records=31
Map output records=179
Map output bytes=2055
Map output materialized bytes=1836
Input split bytes=101
Combine input records=179
Combine output records=131
Reduce input groups=131
Reduce shuffle bytes=1836
Reduce input records=131
Reduce output records=131
Spilled Records=262
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1310
CPU time spent (ms)=9000
Physical memory (bytes) snapshot=244305920
Virtual memory (bytes) snapshot=664309760
Total committed heap usage (bytes)=136065024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1366
File Output Format Counters
Bytes Written=1306
hadoop@master:~/bigdata/hadoop$ hdfs dfs -ls /output01
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2019-04-25 16:54 /output01/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 1306 2019-04-25 16:53 /output01/part-r-00000
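The part-r-00000 file holds the word/count pairs (viewable with hdfs dfs -cat /output01/part-r-00000). What the job computes can be sketched locally with coreutils on a tiny sample file; this is only an illustration of the same idea, not the actual HDFS result:

```shell
# Local sketch of what the wordcount example computes, on a sample file;
# the real job did the same over /test01/README.txt on HDFS.
printf 'hadoop cluster test\nhadoop test\n' > sample.txt
tr -s ' ' '\n' < sample.txt | sort | uniq -c
```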
8 Summary
Installation packages (Baidu Netdisk): https://pan.baidu.com/s/1Nxd82L800_JAWqTlZrDSOA (extraction code: xwbu)
Reference configuration on GitHub: https://github.com/zhixingkad/bigdata