Version: hadoop-2.6.0-cdh5.16.2.tar.gz (roughly equivalent to Apache Hadoop 2.9-2.10)
1. Hadoop HDFS installation
1.1 Create the user and directories
[root@hadoop001 ~]# useradd hadoop
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ mkdir tmp sourcecode software shell log lib app data
[hadoop@hadoop001 ~]$ cd software/
// installation packages uploaded in advance (via rz)
[hadoop@hadoop001 software]$ ll
total 1266604
-rw-r--r-- 1 root root 434354462 Feb 24 14:01 hadoop-2.6.0-cdh5.16.2.tar.gz
-rw-r--r-- 1 hadoop hadoop 185646832 Feb 24 12:03 jdk-8u181-linux-x64.tar.gz
1.2 Install and deploy the JDK
jdk-8u181-linux-x64.tar.gz
[root@hadoop001 ~]# mkdir /usr/java
[root@hadoop001 ~]# tar -xzvf jdk-8u181-linux-x64.tar.gz -C /usr/java/
[root@hadoop001 ~]# cd /usr/java
[root@hadoop001 java]# chown -R root:root jdk1.8.0_181
[root@hadoop001 java]# vi /etc/profile
#hadoop env
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
[root@hadoop001 ~]# source /etc/profile
[root@hadoop001 ~]# which java
/usr/java/jdk1.8.0_181/bin/java
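Beyond `which java`, it is worth checking that the directory in JAVA_HOME actually contains an executable JDK. A minimal sketch; the `check_java` helper is hypothetical, not part of the install:

```shell
# check_java DIR - report whether DIR looks like a usable JDK home
check_java() {
  if [ -x "$1/bin/java" ]; then
    echo "JDK found at $1"
  else
    echo "no JDK at $1"
  fi
}

# against the layout used above:
check_java /usr/java/jdk1.8.0_181
```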
1.3 Extract Hadoop and create a symlink
[hadoop@hadoop001 software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
[hadoop@hadoop001 app]$
[hadoop@hadoop001 app]$ ll
total 4
drwxr-xr-x 14 hadoop hadoop 4096 Jun 3 2019 hadoop-2.6.0-cdh5.16.2
[hadoop@hadoop001 app]$ ln -s hadoop-2.6.0-cdh5.16.2 hadoop
[hadoop@hadoop001 app]$ ll
total 4
lrwxrwxrwx 1 hadoop hadoop 22 May 6 22:05 hadoop -> hadoop-2.6.0-cdh5.16.2
drwxr-xr-x 14 hadoop hadoop 4096 Jun 3 2019 hadoop-2.6.0-cdh5.16.2
[hadoop@hadoop001 app]$
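The symlink exists so that a later upgrade only has to repoint `~/app/hadoop` at the new release; everything that references the link (environment variables, configs) keeps working. A sketch of the repoint step, assuming a newer release has been extracted alongside the old one (`relink` is an illustrative helper):

```shell
# relink TARGET LINK - repoint LINK at TARGET
# -f replaces an existing link; -n treats LINK as the link itself
# instead of descending into the directory it points to
relink() {
  ln -sfn "$1" "$2"
}

# e.g. after extracting a newer release into ~/app:
#   relink hadoop-2.6.0-cdh5.16.2 ~/app/hadoop
```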
1.4 Configure passwordless SSH to hadoop001
[hadoop@hadoop001 ~]$ rm -rf .ssh
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$
[hadoop@hadoop001 ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:fhAts9iahMuFy0r/djKCcAO7m8vPm5lf2ExdkWUqIdw hadoop@ruozedata001
The key's randomart image is:
+---[RSA 2048]----+
| .... .oo |
| ..E..+ |
| +..o |
|. o o.=o |
| o o +.S. |
|o oo ==+ . |
| +.o=.o+. . |
|oooo= = .. |
|++oB+=.+ |
+----[SHA256]-----+
[hadoop@hadoop001 ~]$ cd .ssh
[hadoop@hadoop001 .ssh]$
[hadoop@hadoop001 .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$ chmod 0600 ~/.ssh/authorized_keys
[hadoop@hadoop001 .ssh]$
[hadoop@hadoop001 .ssh]$ ssh hadoop001 date
The authenticity of host 'hadoop001 (192.168.0.3)' can't be established.
ECDSA key fingerprint is SHA256:OLqoaMxlGFbCq4sC9pYgF+FdbcXHbEbtSrnMiGGFbVw.
ECDSA key fingerprint is MD5:d3:5b:4a:ef:8e:00:41:a0:5e:80:ef:75:76:8a:a3:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ruozedata001,192.168.0.3' (ECDSA) to the list of known hosts.
Wed May 6 22:26:57 CST 2020
[hadoop@hadoop001 .ssh]$
[hadoop@hadoop001 .ssh]$
[hadoop@hadoop001 .ssh]$ ssh hadoop001 date
Wed May 6 22:27:07 CST 2020
[hadoop@hadoop001 .ssh]$
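The interactive yes/no prompt above only appears on the first connection. For scripted setups it can be avoided by pre-accepting the host key with `ssh-keyscan`. The `append_once` helper below is illustrative and keeps re-runs idempotent:

```shell
# append_once LINE FILE - append LINE to FILE only if it is not already there
append_once() {
  grep -qxF "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

# against the real host this would be, e.g.:
#   ssh-keyscan hadoop001 2>/dev/null |
#     while IFS= read -r key; do append_once "$key" ~/.ssh/known_hosts; done
```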
1.5 Edit the configuration files so that all three HDFS daemons start under the hostname hadoop001
The NameNode (NN) starts as hadoop001:
etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop001:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/</value>
</property>
</configuration>
The SecondaryNameNode (SNN) starts as hadoop001:
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop001:9868</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>hadoop001:9869</value>
</property>
</configuration>
The DataNode (DN) starts as hadoop001:
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@hadoop001 hadoop]$ vi slaves
hadoop001
1.6 Add environment variables
In the hadoop user's home directory:
[hadoop@hadoop001 ~]$ vi .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[hadoop@hadoop001 ~]$ source .bashrc
[hadoop@hadoop001 ~]$
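A quick way to confirm the new entries took effect is to check that `$HADOOP_HOME/bin` is actually a component of PATH. A small sketch; the `path_contains` helper is hypothetical:

```shell
# path_contains DIR - succeed if DIR is a component of $PATH
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# expected after sourcing .bashrc:
#   path_contains "$HADOOP_HOME/bin" && echo ok
```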
1.7 Format the NameNode (needed only once; it initializes the metadata storage layout)
[hadoop@hadoop001 hadoop]$ pwd
/home/hadoop/app/hadoop
[hadoop@hadoop001 hadoop]$ bin/hdfs namenode -format
1.8 Start HDFS
[hadoop@hadoop001 hadoop]$ sbin/start-dfs.sh
20/05/06 22:43:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
20/05/06 22:43:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 hadoop]$ jps
21712 DataNode           # DN: stores the data blocks (the worker)
21585 NameNode           # NN: manages metadata and assigns block storage (the master)
21871 SecondaryNameNode  # SNN: checkpoints the NN's metadata, by default once an hour
21999 Jps
[hadoop@hadoop001 hadoop]$
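For scripting, the three HDFS daemons can be counted out of the jps listing instead of eyeballing it. A minimal sketch; `hdfs_daemons_up` is an illustrative helper that reads jps output from stdin:

```shell
# hdfs_daemons_up - count HDFS daemon lines in a jps listing read from stdin
hdfs_daemons_up() {
  grep -cE '(NameNode|DataNode|SecondaryNameNode)$'
}

# expected on this single-node setup:
#   [ "$(jps | hdfs_daemons_up)" -eq 3 ] && echo "HDFS is up"
```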
1.9 Check the web UI
http://192.168.131.128:50070/dfshealth.html#tab-overview
1.10 Create a directory
[hadoop@hadoop001 ~]$ hdfs dfs -mkdir /user
20/05/18 20:03:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 ~]$ hdfs dfs -ls /
20/05/18 20:03:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwx------ - hadoop supergroup 0 2020-05-10 15:52 /tmp
drwxr-xr-x - hadoop supergroup 0 2020-05-18 20:03 /user
drwxr-xr-x - hadoop supergroup 0 2020-05-10 15:52 /wordcount
[hadoop@hadoop001 ~]$
1.11 Upload and download files
[hadoop@hadoop001 ~]$ hdfs dfs -put error.log /wordcount
20/05/18 20:07:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 ~]$ hdfs dfs -ls /wordcount
20/05/18 20:07:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r-- 1 hadoop supergroup 1763 2020-05-18 20:07 /wordcount/error.log
drwxr-xr-x - hadoop supergroup 0 2020-05-10 15:16 /wordcount/input
drwxr-xr-x - hadoop supergroup 0 2020-05-10 15:53 /wordcount/output
[hadoop@hadoop001 ~]$
Download:
[hadoop@hadoop001 ~]$ hdfs dfs -get /wordcount/output/part-r-00000
20/05/18 20:11:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 ~]$ ll
total 12
drwxrwxr-x. 3 hadoop hadoop 50 May 7 11:38 app
drwxrwxr-x. 2 hadoop hadoop 32 May 10 15:13 data
-rw-rw-r--. 1 hadoop hadoop 3039 May 10 14:43 error1.log
-rw-rw-r--. 1 hadoop hadoop 1763 May 10 14:40 error.log
drwxrwxr-x. 2 hadoop hadoop 6 May 7 11:21 lib
drwxrwxr-x. 2 hadoop hadoop 6 May 7 11:21 log
-rw-r--r--. 1 hadoop hadoop 64 May 18 20:11 part-r-00000
drwxrwxr-x. 2 hadoop hadoop 6 May 7 11:21 shell
drwxrwxr-x. 2 hadoop hadoop 77 May 7 11:25 software
drwxrwxr-x. 2 hadoop hadoop 6 May 7 11:21 sourcecode
drwxrwxr-x. 4 hadoop hadoop 222 May 18 20:00 tmp
[hadoop@hadoop001 ~]$
2. YARN installation and deployment
2.1 Edit the configuration files
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Change this port from the default 8088: scanners fingerprint 8088 as a YARN
     web UI and abuse exposed clusters for crypto-mining. -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop001:18088</value>
</property>
</configuration>
2.2 Start the YARN daemons
[hadoop@hadoop001 hadoop]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-resourcemanager-hadoop001.out
hadoop001: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-nodemanager-hadoop001.out
[hadoop@hadoop001 hadoop]$ jps
9539 DataNode
12135 NodeManager
12360 Jps
9401 NameNode
12011 ResourceManager
9708 SecondaryNameNode
2.3 Check the web UI
http://192.168.131.128:18088/cluster
2.4 Word-count example
[hadoop@hadoop001 hadoop]$ hadoop jar share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /wordcount/error.log /user/output
20/05/18 20:17:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/05/18 20:17:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
20/05/18 20:17:28 INFO input.FileInputFormat: Total input paths to process : 1
20/05/18 20:17:28 INFO mapreduce.JobSubmitter: number of splits:1
20/05/18 20:17:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1589803231154_0001
20/05/18 20:17:28 INFO impl.YarnClientImpl: Submitted application application_1589803231154_0001
20/05/18 20:17:28 INFO mapreduce.Job: The url to track the job: http://hadoop001:18088/proxy/application_1589803231154_0001/
20/05/18 20:17:28 INFO mapreduce.Job: Running job: job_1589803231154_0001
20/05/18 20:17:39 INFO mapreduce.Job: Job job_1589803231154_0001 running in uber mode : false
20/05/18 20:17:39 INFO mapreduce.Job: map 0% reduce 0%
20/05/18 20:17:46 INFO mapreduce.Job: map 100% reduce 0%
20/05/18 20:17:53 INFO mapreduce.Job: map 100% reduce 100%
20/05/18 20:17:54 INFO mapreduce.Job: Job job_1589803231154_0001 completed successfully
20/05/18 20:17:54 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=1554
FILE: Number of bytes written=289077
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1869
HDFS: Number of bytes written=1180
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4100
Total time spent by all reduces in occupied slots (ms)=4615
Total time spent by all map tasks (ms)=4100
Total time spent by all reduce tasks (ms)=4615
Total vcore-milliseconds taken by all map tasks=4100
Total vcore-milliseconds taken by all reduce tasks=4615
Total megabyte-milliseconds taken by all map tasks=4198400
Total megabyte-milliseconds taken by all reduce tasks=4725760
Map-Reduce Framework
Map input records=11
Map output records=139
Map output bytes=2316
Map output materialized bytes=1554
Input split bytes=106
Combine input records=139
Combine output records=92
Reduce input groups=92
Reduce shuffle bytes=1554
Reduce input records=92
Reduce output records=92
Spilled Records=184
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=129
CPU time spent (ms)=1640
Physical memory (bytes) snapshot=306192384
Virtual memory (bytes) snapshot=5457453056
Total committed heap usage (bytes)=165810176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1763
File Output Format Counters
Done! The pseudo-distributed setup is complete. Go try it out.
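The result lands in /user/output and can be read back with `hdfs dfs -cat /user/output/part-r-00000`. For intuition about what the job computed, the same word count can be reproduced locally with standard tools (a sketch; `wordcount_local` is an illustrative helper, only sensible for small files):

```shell
# wordcount_local FILE - print word<TAB>count lines, like the example job's output
wordcount_local() {
  tr -s '[:space:]' '\n' < "$1" |  # map: one word per line
    grep -v '^$' |                 # drop empty lines
    sort | uniq -c |               # shuffle + reduce: group identical words, count
    awk '{print $2 "\t" $1}'       # format as word<TAB>count
}
```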