2 Hadoop Installation and Setup
2.1 Cluster Planning
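Derived from the configuration files in section 2.3, the role layout assumed throughout this chapter is as follows (adjust the hostnames to your environment):

Host      HDFS                                   YARN                          MapReduce
hadoop01  NameNode, SecondaryNameNode, DataNode  ResourceManager, NodeManager  JobHistoryServer
hadoop02  DataNode                               NodeManager                   -
hadoop03  DataNode                               NodeManager                   -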
2.2 Upload the Installation Package to /opt/software
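A typical sequence, assuming the package is the tarball hadoop-2.7.5.tar.gz (adjust the name to whatever you actually uploaded):
cd /opt/software
tar -zxvf hadoop-2.7.5.tar.gz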
2.3 Modify the Configuration Files ★
All files below are located in /opt/software/hadoop-2.7.5/etc/hadoop.
2.3.1 Modify core-site.xml (hadoop01 node)
<configuration>
  <!-- Default file system for the cluster: HDFS, served by the NameNode on hadoop01.
       fs.defaultFS replaces the deprecated key fs.default.name. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:8020</value>
  </property>
  <!-- Base directory for temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.7.5/hadoopDatas/tempDatas</value>
  </property>
  <!-- I/O buffer size; tune to server performance in production -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <!-- Enable the HDFS trash so deleted data can be recovered; value in minutes (10080 = 7 days) -->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>
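With fs.trash.interval set to 10080 minutes, a plain delete stays recoverable for a week. A sketch of the behavior once the cluster is running, assuming a file /demo.txt and the root user (the trash path varies with the user name):
hdfs dfs -rm /demo.txt            # moved to /user/root/.Trash/Current/demo.txt
hdfs dfs -rm -skipTrash /demo.txt # bypasses the trash and deletes immediately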
2.3.2 Modify hdfs-site.xml (hadoop01 node)
<configuration>
  <!-- SecondaryNameNode HTTP address and port -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop01:50090</value>
  </property>
  <!-- NameNode web UI address and port -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop01:50070</value>
  </property>
  <!-- Where the NameNode stores its metadata (fsimage); multiple directories are comma-separated -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/software/hadoop-2.7.5/hadoopDatas/namenodeDatas,file:///opt/software/hadoop-2.7.5/hadoopDatas/namenodeDatas2</value>
  </property>
  <!-- Where the DataNode stores block data; in production, decide the disk mount points first, then separate multiple directories with commas -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/software/hadoop-2.7.5/hadoopDatas/datanodeDatas,file:///opt/software/hadoop-2.7.5/hadoopDatas/datanodeDatas2</value>
  </property>
  <!-- Where the NameNode stores its edit log -->
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>file:///opt/software/hadoop-2.7.5/hadoopDatas/nn/edits</value>
  </property>
  <!-- SecondaryNameNode checkpoint image directory -->
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///opt/software/hadoop-2.7.5/hadoopDatas/snn/name</value>
  </property>
  <!-- SecondaryNameNode checkpoint edits directory -->
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///opt/software/hadoop-2.7.5/hadoopDatas/dfs/snn/edits</value>
  </property>
  <!-- Number of replicas per block -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Disable HDFS permission checking (dfs.permissions.enabled replaces the deprecated dfs.permissions) -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Block size in bytes (134217728 = 128 MB) -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
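After editing, the effective values can be read back with hdfs getconf, which is a quick way to catch typos in property names:
bin/hdfs getconf -confKey dfs.blocksize      # expect 134217728
bin/hdfs getconf -confKey dfs.replication    # expect 3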
2.3.3 Modify hadoop-env.sh (hadoop01 node)
# The java implementation to use.
export JAVA_HOME=/opt/software/jdk1.8.0_11
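If your JDK lives elsewhere, point JAVA_HOME there instead; a quick check that the path is valid:
/opt/software/jdk1.8.0_11/bin/java -version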
2.3.4 Modify mapred-site.xml (hadoop01 node)
Copy the mapred-site.xml template first, as shown below.
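In Hadoop 2.7.5 this file ships only as a template, so create an editable copy first (run inside the configuration directory):
cp mapred-site.xml.template mapred-site.xml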
<configuration>
  <!-- Run MapReduce on YARN; without this, jobs run in local mode and never reach the cluster -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Enable uber mode so small jobs run inside a single JVM -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- JobHistory server host and port -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
  </property>
  <!-- JobHistory web UI host and port -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
  </property>
</configuration>
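Once the cluster and the history server are running (section 2.6), completed jobs are visible at hadoop01:19888 and can also be listed from the command line:
mapred job -list all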
2.3.5 Modify yarn-site.xml (hadoop01 node)
<configuration>
  <!-- Hostname of the YARN master (ResourceManager) -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <!-- Auxiliary shuffle service required by MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- How long aggregated logs are kept on HDFS, in seconds (604800 = 7 days) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Memory scheme for the YARN cluster: total memory a NodeManager may allocate -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <!-- Minimum allocation per container -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <!-- Allowed ratio of virtual to physical memory per container -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
</configuration>
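With aggregation enabled, the logs of a finished application can be fetched from HDFS by ID; <application_id> is a placeholder for an ID such as application_1234567890123_0001, as reported by yarn application -list:
yarn logs -applicationId <application_id>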
2.3.6 Modify mapred-env.sh (hadoop01 node)
export JAVA_HOME=/opt/software/jdk1.8.0_11
2.3.7 Modify slaves (hadoop01 node)
The slaves file lists the worker hosts on which the start scripts launch the DataNode and NodeManager daemons. Edit it, then distribute the installation to the other machines (section 2.4) before starting the cluster (section 2.6).
Run on the hadoop01 node:
vi slaves
hadoop01
hadoop02
hadoop03
2.4 Run the Following Commands on the First Machine
Create the data directories referenced by the configuration files above:
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/tempDatas
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/namenodeDatas
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/namenodeDatas2
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/datanodeDatas
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/datanodeDatas2
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/nn/edits
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/snn/name
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/dfs/snn/edits
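If you prefer, bash brace expansion creates the same directory tree in a single command:
mkdir -p /opt/software/hadoop-2.7.5/hadoopDatas/{tempDatas,namenodeDatas,namenodeDatas2,datanodeDatas,datanodeDatas2,nn/edits,snn/name,dfs/snn/edits}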
Distribute the installation to hadoop02 and hadoop03 (run from /opt/software so that $PWD resolves to the same path on the targets):
scp -r hadoop-2.7.5 hadoop02:$PWD
scp -r hadoop-2.7.5 hadoop03:$PWD
2.5 Configure the Hadoop Environment Variables
Configure the environment variables on all three machines:
vi /etc/profile
# hadoop environment
export HADOOP_HOME=/opt/software/hadoop-2.7.5
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /etc/profile
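To verify that the variables took effect, each machine should now resolve the Hadoop commands from any directory:
hadoop version   # should report Hadoop 2.7.5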
2.6 Start the Cluster
Starting the Hadoop cluster means starting two modules, HDFS and YARN. Note: the first time HDFS is started it must be formatted. Formatting is essentially a cleanup-and-preparation step, since at that point HDFS does not yet physically exist.
Run the following commands on the first machine, from the Hadoop installation directory (/opt/software/hadoop-2.7.5):
bin/hdfs namenode -format    # first startup only
sbin/start-dfs.sh            # stop with sbin/stop-dfs.sh
sbin/start-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
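Once the start scripts and the history server are up, jps is a quick sanity check; with the plan from section 2.1, hadoop01 should show all six daemons, while hadoop02 and hadoop03 show only DataNode and NodeManager:
jps
# expected on hadoop01: NameNode, SecondaryNameNode, DataNode,
#                       ResourceManager, NodeManager, JobHistoryServer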
The web UIs are served on three ports:
http://192.168.182.171:50070/explorer.html  (browse HDFS)
http://192.168.182.171:8088/cluster  (YARN cluster)
http://192.168.182.171:19888/jobhistory  (completed job history)
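As an end-to-end check, the example jar bundled with the distribution can be submitted from the installation directory; the job should appear on port 8088 while running and on port 19888 once finished (the jar path below matches the 2.7.5 binary layout):
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 2 10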