Setting up a simple single-node Hadoop environment for development.
## Environment

### Hardware

| Item | Test configuration |
| --- | --- |
| CPU | 1.8 GHz |
| Memory | 4 GB |
| Cores | 4 |
| Bandwidth | 1000 Mb/s |

### Software

- VMware® Workstation 16 Pro 16.1.1 build-17801498
- CentOS Linux release 7.6.1810 (Core)
- jdk-8u202-linux-x64
- hadoop-3.3.2
## System settings

### SSH

Install the SSH client and server (on CentOS 7 the packages are named openssh, not ssh):

```shell
yum install -y openssh-clients openssh-server
```

Check whether passwordless login to the local machine already works:

```shell
ssh localhost
```

If it prompts for a password, enable passwordless access:

```shell
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```
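Note that the `cat >> authorized_keys` step appends a fresh copy of the key every time it runs. A small idempotent variant, as a sketch (the `add_authorized_key` helper name is my own, not a standard command):

```shell
# Sketch: append a public key to authorized_keys only if it is not already there.
# add_authorized_key is a hypothetical helper, not part of OpenSSH.
add_authorized_key() {
  pub="$1"
  auth="$HOME/.ssh/authorized_keys"
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  touch "$auth" && chmod 600 "$auth"
  # -q quiet, -x whole-line match, -F fixed string (no regex surprises)
  grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}
```

Running it twice leaves a single copy of the key, so repeated provisioning does not bloat the file.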
### Disable the firewall

```shell
systemctl stop firewalld.service
systemctl disable firewalld.service
```

### Disable SELinux

`setenforce 0` only disables SELinux until the next reboot; to make the change permanent, set `SELINUX=disabled` in `/etc/selinux/config`:

```shell
setenforce 0
vi /etc/selinux/config   # set SELINUX=disabled
```

### Reboot

```shell
shutdown -r now
```
## Download Hadoop

## Install the JDK

Refer to the article “CentOS7 安装jdk”.
## Install Hadoop

Upload the tarball, copy it into the working directory, and unpack it:

```shell
mkdir /joinway                          # create the working directory
cp /mnt/hadoop-3.3.2.tar.gz /joinway/   # copy the tarball into it
cd /joinway
chmod 755 hadoop-3.3.2.tar.gz
tar -xvf hadoop-3.3.2.tar.gz            # unpack
chown -R root:root hadoop-3.3.2         # fix the owner and group of the unpacked tree
mv hadoop-3.3.2/ hadoop/                # rename; optional, purely cosmetic (〃'▽'〃)
```
### Configure JAVA_HOME

```shell
cd /joinway/hadoop/
vim etc/hadoop/hadoop-env.sh
```

Set `JAVA_HOME`; the path must match your actual JDK install location (the software list above uses jdk-8u202):

```shell
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/java/jdk1.8.0_202
```
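If you are unsure of the install path, it can be derived from whichever `java` binary is on `PATH`. A sketch (the `detect_java_home` helper is hypothetical; `readlink -f` resolves `/etc/alternatives`-style symlinks):

```shell
# Sketch: derive JAVA_HOME from the java binary found on PATH.
detect_java_home() {
  java_bin="$(command -v java)" || return 1  # fail if java is not installed
  java_bin="$(readlink -f "$java_bin")"      # resolve any symlink chain
  dirname "$(dirname "$java_bin")"           # strip the trailing /bin/java
}
```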
### Configure core-site.xml

`fs.defaultFS` tells clients where the NameNode listens:

```shell
vim /joinway/hadoop/etc/hadoop/core-site.xml
```

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
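Every Hadoop config file uses the same `<property><name>…</name><value>…</value></property>` shape, so the files can also be generated from a script when you provision several machines. A sketch (the `hprop` helper is my own shorthand, not a Hadoop tool):

```shell
# Sketch: emit one Hadoop <property> element for a name/value pair.
hprop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Generate a minimal core-site.xml (written to a temp path for illustration).
out="$(mktemp)"
{
  echo '<configuration>'
  hprop fs.defaultFS hdfs://localhost:9000
  echo '</configuration>'
} > "$out"
cat "$out"
```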
### Configure hdfs-site.xml

A single-node cluster can only hold one replica of each block, so set `dfs.replication` to 1:

```shell
vim /joinway/hadoop/etc/hadoop/hdfs-site.xml
```

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
### Configure the startup users

```shell
vim /joinway/hadoop/sbin/start-dfs.sh
vim /joinway/hadoop/sbin/stop-dfs.sh
```

Add the following to both scripts:

```shell
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
```
### Format the filesystem

```shell
/joinway/hadoop/bin/hdfs namenode -format
```
## A quick test of Hadoop

### Start and stop

```shell
/joinway/hadoop/sbin/start-dfs.sh
/joinway/hadoop/sbin/stop-dfs.sh
```

Web UI: http://ip:9870/
### Run a MapReduce job

The commands below assume the current directory is /joinway/hadoop; relative HDFS paths such as `input` resolve under /user/root:

```shell
/joinway/hadoop/bin/hdfs dfs -mkdir /user
/joinway/hadoop/bin/hdfs dfs -mkdir /user/root
/joinway/hadoop/bin/hdfs dfs -mkdir input
/joinway/hadoop/bin/hdfs dfs -put etc/hadoop/*.xml input
/joinway/hadoop/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar grep input output 'dfs[a-z.]+'
```

Check the result:

```shell
/joinway/hadoop/bin/hdfs dfs -cat output/*
```
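The example `grep` job extracts every match of `dfs[a-z.]+` from the input files, counts occurrences per distinct match, and sorts by count. The same computation can be approximated on the local filesystem with a plain pipeline, which is handy for sanity-checking the job's output (the `count_dfs_matches` wrapper is my own naming):

```shell
# Sketch: local equivalent of the grep example job over a directory of XML files.
# -o prints each match on its own line, -h omits filenames, -E enables extended regex.
count_dfs_matches() {
  grep -ohE 'dfs[a-z.]+' "$1"/*.xml | sort | uniq -c | sort -rn
}
```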
## Install YARN

### Configure mapred-site.xml

```shell
vim /joinway/hadoop/etc/hadoop/mapred-site.xml
```

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
```
### Configure yarn-site.xml

```shell
vim /joinway/hadoop/etc/hadoop/yarn-site.xml
```

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
```
### Configure the root user

```shell
vim /joinway/hadoop/sbin/start-yarn.sh
vim /joinway/hadoop/sbin/stop-yarn.sh
```

Add the following to both scripts:

```shell
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```
## A quick test of YARN

Start and stop:

```shell
/joinway/hadoop/sbin/start-yarn.sh
/joinway/hadoop/sbin/stop-yarn.sh
```

Web UI: http://ip:8088/

```shell
/joinway/hadoop/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar pi 2 3
```

You can watch the job's progress in the web UI as it runs.
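The `pi` example (here 2 map tasks, 3 samples each, so the estimate is rough) works by sampling random points and counting how many land inside the unit quarter-circle. The same Monte Carlo idea as a local awk sketch, with a larger sample count chosen purely for illustration:

```shell
# Sketch: Monte Carlo estimate of pi, the same idea as the Hadoop pi example.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++   # point falls inside the quarter circle
  }
  printf "%.4f\n", 4 * inside / n  # inside/n approximates pi/4
}'
```

With 100000 samples the estimate typically lands within about 0.01 of π, which shows why the tiny `pi 2 3` run above prints such a coarse value.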