Hadoop Pseudo-Distributed Cluster Setup
Preface
The following preparation is needed first (a command sketch follows the list):
- Disable the firewall
- Set the hostname
- Bind the IP address to the hostname
- Install the JDK (Hadoop runs on a Java environment)
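A minimal sketch of these preparation steps, assuming a CentOS 7-style system with systemd; the hostname hadoop01 and the IP 192.168.1.100 are placeholders for your own values:

```bash
# Disable the firewall (systemd-based distros)
systemctl stop firewalld
systemctl disable firewalld

# Set the hostname (hadoop01 is a placeholder)
hostnamectl set-hostname hadoop01

# Bind the IP to the hostname (placeholder IP)
echo "192.168.1.100 hadoop01" >> /etc/hosts

# Verify the JDK is on the PATH
java -version
```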
1. Configuring the Hadoop Environment
- Extract the Hadoop archive to /opt/app:

```bash
tar -zxvf hadoop-2.7.1.tar.gz -C /opt/app
```
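Optionally, and not required by the steps below (which run everything from the Hadoop root directory), you can put Hadoop on your PATH so the bin/ and sbin/ commands work from anywhere; a sketch assuming the paths used in this guide:

```bash
# Append to ~/.bashrc, then run: source ~/.bashrc
export HADOOP_HOME=/opt/app/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```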
- Configure the files under etc/hadoop, as covered in the following steps
- Set the ownership of the Hadoop directory:

```bash
sudo chown -R root:root /opt/app/hadoop-2.7.1
```

- Configure hadoop-env.sh, pointing JAVA_HOME at the JDK directory:

```bash
# The java implementation to use.
export JAVA_HOME=/opt/app/jdk1.8.0_152
```
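If you prefer not to open an editor, the same change can be made with sed; a sketch assuming hadoop-env.sh still contains its stock `export JAVA_HOME=` line:

```bash
# Replace the stock JAVA_HOME line (run from the Hadoop root directory)
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/app/jdk1.8.0_152|' etc/hadoop/hadoop-env.sh
```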
- Return to the Hadoop root directory and verify that the environment is set up correctly:

```bash
$ bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.
```
- Configure core-site.xml:

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```
- Configure hdfs-site.xml (replication is set to 1 because this is a single-node setup):

```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
- Configure mapred-site.xml (mapred-site.xml.template must first be copied to mapred-site.xml; see the command after this snippet):

```xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
```
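The template copy mentioned above can be done from the Hadoop root directory:

```bash
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
```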
- Configure yarn-site.xml:

```xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```
2. Starting the Hadoop Services
Return to the Hadoop root directory.
- Format the filesystem:

```bash
bin/hdfs namenode -format
```
- Start the NameNode and DataNode daemons:

```bash
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
```
- Start the YARN ResourceManager and NodeManager daemons:

```bash
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
```
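As an alternative, Hadoop also ships combined scripts that start the same daemons in one step; a sketch, noting that these scripts require passwordless SSH to localhost:

```bash
# One-time setup for passwordless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# Start the HDFS and YARN daemons together
sbin/start-dfs.sh
sbin/start-yarn.sh
```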
- Check the running Java processes with jps:

```bash
$ jps
126098 Jps
14532 DataNode
17284 ResourceManager
14235 NameNode
17679 NodeManager
```

The service web UIs can also be viewed in a browser (by default, the NameNode at http://localhost:50070 and the ResourceManager at http://localhost:8088).
3. Running a Simple Word Count Example
- Write a file a.txt:

```
hadoop java hbase hello hadoop java zookeeper hello sqoop hbase flume spark
```
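One way to create the file from the shell:

```bash
echo "hadoop java hbase hello hadoop java zookeeper hello sqoop hbase flume spark" > a.txt
```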
- Upload it to the root of the HDFS cluster (you can also browse HDFS from the web UI at localhost:50070):

```bash
bin/hdfs dfs -put a.txt /a.txt
```
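To confirm the upload from the command line rather than the web UI:

```bash
# List the HDFS root; a.txt should appear
bin/hdfs dfs -ls /
```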
- Run the MapReduce wordcount example (the output directory is given as /output, matching the listing below; a relative path would resolve under /user/<username> instead):

```bash
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /a.txt /output
```
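Worth knowing: MapReduce refuses to write into an existing output directory, so remove it before re-running the job:

```bash
# Delete the previous output before re-running wordcount
bin/hdfs dfs -rm -r /output
```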
- List the output directory on the HDFS filesystem:

```bash
$ bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2020-02-24 23:39 /output/_SUCCESS
-rw-r--r--   1 hadoop supergroup         68 2020-02-24 23:39 /output/part-r-00000
```
- View the contents of the result file:

```bash
$ bin/hdfs dfs -text /output/part*
flume	1
hadoop	2
hbase	2
hello	2
java	2
spark	1
sqoop	1
zookeeper	1
```
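To copy the result back to the local filesystem (the local filename here is arbitrary):

```bash
bin/hdfs dfs -get /output/part-r-00000 ./wordcount-result.txt
```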
Done!
Folders in the Hadoop directory:
- bin: basic management scripts
- sbin: scripts for starting and stopping the services
- share: jar packages
- etc: configuration files