Software prerequisites
- A Linux virtual machine
  I used a CentOS-6.6 VM with the hostname repo
  (see "Installing a Linux virtual machine on Windows")
- The JDK installed on that VM
  (see "Installing the JDK on Linux")
- The VM able to SSH into itself without a password
  (see "Configuring passwordless SSH login between virtual machines")
- The Hadoop installation package
  Download: https://mirrors.aliyun.com/apache/hadoop/common/
  I used hadoop-2.6.5
1. Upload the Hadoop package to the server and extract it
[root@repo ~]# mkdir -p /opt/apps
[root@repo ~]# tar zxvf hadoop-2.6.5.tar.gz -C /opt/apps/
2. Configure environment variables
# vi + jumps straight to the last line of the file
[root@repo hadoop-2.6.5]# vi + /etc/profile
export HADOOP_HOME=/opt/apps/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@repo hadoop-2.6.5]# . /etc/profile
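The two exports above can be sanity-checked before moving on; a minimal sketch, assuming the install path /opt/apps/hadoop-2.6.5 from step 1:

```shell
# Same exports as written into /etc/profile (install path from step 1)
export HADOOP_HOME=/opt/apps/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Verify both variables resolved as expected
echo "$HADOOP_HOME"
case "$PATH" in
  *"$HADOOP_HOME/bin"*) echo "PATH ok" ;;
  *) echo "PATH is missing $HADOOP_HOME/bin" ;;
esac
```

On the real machine, `hadoop version` is the quickest end-to-end check that the PATH entries work.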
3. Edit the hadoop-env.sh, mapred-env.sh, and yarn-env.sh configuration files to add JAVA_HOME
[root@repo hadoop]# pwd
/opt/apps/hadoop-2.6.5/etc/hadoop
[root@repo hadoop]# vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_73
[root@repo hadoop]# vi mapred-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_73
[root@repo hadoop]# vi yarn-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_73
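Since the same JAVA_HOME line goes into all three files, the edits can also be done in one pass; a sketch, assuming the JDK path above and a working directory of $HADOOP_HOME/etc/hadoop:

```shell
# Append the JAVA_HOME export to each of the three env scripts
# (run inside $HADOOP_HOME/etc/hadoop; JDK path as installed earlier)
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/usr/local/jdk1.8.0_73' >> "$f"
done

# Confirm the line landed in each file
grep -l 'JAVA_HOME=/usr/local/jdk1.8.0_73' hadoop-env.sh mapred-env.sh yarn-env.sh
```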
4. Edit core-site.xml and hdfs-site.xml to configure the pseudo-distributed settings
[root@repo hadoop]# vi core-site.xml
<configuration>
<!-- Set the node where the NameNode runs -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://repo:9000</value>
</property>
<!--
Set the directory where Hadoop stores its data.
By default Hadoop keeps block metadata and block data under the OS's /tmp directory,
but /tmp is cleared from time to time, so this must be changed.
-->
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/pseudo</value>
</property>
</configuration>
[root@repo hadoop]# vi hdfs-site.xml
<configuration>
<!-- Set the block replication factor; it cannot exceed the number of nodes -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Set the node where the SecondaryNameNode runs -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>repo:50090</value>
</property>
</configuration>
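All of Hadoop's *-site.xml files share the same property/name/value layout, so the values set above can be double-checked with a few lines of script before starting anything. A minimal sketch using Python's standard xml.etree (the sample string mirrors the core-site.xml written above):

```python
import xml.etree.ElementTree as ET

def read_site_xml(text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

# Same content as the core-site.xml configured above
core_site = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://repo:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/var/hadoop/pseudo</value></property>
</configuration>"""

props = read_site_xml(core_site)
print(props["fs.defaultFS"])   # hdfs://repo:9000
```

On the server you would pass the file contents (e.g. `open("core-site.xml").read()`) instead of the inline string.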
5. Edit the slaves configuration file to specify the nodes where DataNodes run
[root@repo hadoop]# vi slaves
repo
6. Format the file system
[root@repo hadoop]# hdfs namenode -format
# Success message
17/09/16 21:17:11 INFO common.Storage: Storage directory /var/hadoop/pseudo/dfs/name has been successfully formatted.
7. Start HDFS and YARN
[root@repo hadoop]# start-dfs.sh
Starting namenodes on [repo]
repo: starting namenode, logging to /opt/apps/hadoop-2.6.5/logs/hadoop-root-namenode-repo.out
repo: starting datanode, logging to /opt/apps/hadoop-2.6.5/logs/hadoop-root-datanode-repo.out
Starting secondary namenodes [repo]
repo: starting secondarynamenode, logging to /opt/apps/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-repo.out
[root@repo hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/apps/hadoop-2.6.5/logs/yarn-root-resourcemanager-repo.out
repo: starting nodemanager, logging to /opt/apps/hadoop-2.6.5/logs/yarn-root-nodemanager-repo.out
[root@repo hadoop]# jps
4368 Jps
3957 ResourceManager
3512 NameNode
3641 DataNode
4058 NodeManager
3805 SecondaryNameNode
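The jps listing above can also be checked mechanically: a pseudo-distributed HDFS + YARN node should show exactly these five daemons. A small sketch (the sample output is the transcript above; the expected set is an assumption to adjust for other layouts):

```python
# Expected daemons for a single-node pseudo-distributed HDFS + YARN setup
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}

# Sample jps output, as captured in the transcript above
jps_output = """4368 Jps
3957 ResourceManager
3512 NameNode
3641 DataNode
4058 NodeManager
3805 SecondaryNameNode"""

# Each line is "<pid> <ProcessName>"; jps lists itself, so drop it
running = {line.split()[1] for line in jps_output.splitlines()} - {"Jps"}
missing = EXPECTED - running
print("all daemons up" if not missing else f"missing: {missing}")
```

On the server you would capture the real output with `subprocess.check_output(["jps"], text=True)` instead of the inline string.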
8. Visit the web UIs
With Hadoop 2.x defaults, the NameNode UI is at http://repo:50070 and the ResourceManager UI at http://repo:8088.
Setup complete!