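To run Hadoop in cluster (fully distributed) mode, you need to edit the configuration files under /usr/local/hadoop/etc/hadoop. Only the settings required for a normal startup are covered here, spread across five files: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.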
My environment:
Ubuntu 16.04 LTS
Hadoop 2.6.0
192.168.56.102 Master
192.168.56.105 data1
192.168.56.104 data2
192.168.56.103 data3
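The configuration files below refer to the nodes by hostname (for example Master), so each name in this list has to resolve to its address on every node. The article does not say how name resolution is set up. The following is a minimal sketch that assumes it is done through /etc/hosts, and it writes the entries to a scratch file (the variable name hosts_snippet is only illustrative) so that nothing is changed until you have reviewed it:

```shell
# Assumption: hostnames are resolved through /etc/hosts on every node.
# The entries go to a scratch file first so nothing is modified by accident.
hosts_snippet="$(mktemp)"
cat > "$hosts_snippet" <<'EOF'
192.168.56.102 Master
192.168.56.105 data1
192.168.56.104 data2
192.168.56.103 data3
EOF
cat "$hosts_snippet"
# After review, append the entries on each node, for example:
#   sudo tee -a /etc/hosts < "$hosts_snippet"
```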
(1) Edit the slaves file:
sudo gedit /usr/local/hadoop/etc/hadoop/slaves
Write the hostnames of all data nodes into this file, one per line.
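For the environment listed above, the file would contain data1, data2, and data3. As a sketch, the same content can be produced from the shell instead of the editor. The draft goes to a scratch file first (slaves_draft is an illustrative name), and the final copy step is left as a comment:

```shell
# Hostnames are taken from the environment list in this article.
slaves_draft="$(mktemp)"
printf '%s\n' data1 data2 data3 > "$slaves_draft"
cat "$slaves_draft"
# After review, install it at the path used in this article:
#   sudo cp "$slaves_draft" /usr/local/hadoop/etc/hadoop/slaves
```

Because Master is not listed, no DataNode is started on it. It runs only the management daemons.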
(2) Edit core-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
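After saving the file, it helps to confirm that the value you meant to set is the one actually in it. The helper below is a rough sketch, not a Hadoop tool. The name get_prop is hypothetical, and it assumes one tag per line as in the listing above, so it is no substitute for a real XML parser. It is demonstrated on a scratch copy of the two properties from this step:

```shell
# get_prop FILE NAME: print the <value> of the property called NAME.
# Assumes one tag per line, as in the listing above; not a real XML parser.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Demo on a scratch copy of the two properties set in this step.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
</property>
</configuration>
EOF
get_prop "$demo_conf" fs.defaultFS
get_prop "$demo_conf" hadoop.tmp.dir
```

On a real installation you would pass /usr/local/hadoop/etc/hadoop/core-site.xml as the first argument. If the Hadoop commands are on your PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.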
(3) Edit hdfs-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
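dfs.replication is set to 3 here, which matches the three data nodes in this environment. With fewer live DataNodes than the replication factor, HDFS cannot place every replica and reports blocks as under-replicated. The check below is only a sketch of that rule of thumb. The function name check_replication is hypothetical, and it counts the non-blank lines of a slaves-style file:

```shell
# check_replication SLAVES_FILE FACTOR
# Succeeds when the file lists at least FACTOR hostnames (non-blank lines).
check_replication() {
  nodes="$(grep -c '[^[:space:]]' "$1" || true)"
  if [ "$nodes" -ge "$2" ]; then
    echo "ok: $nodes data nodes for replication factor $2"
  else
    echo "warning: only $nodes data nodes for replication factor $2"
    return 1
  fi
}

# Demo with the three data nodes from this article.
slaves_demo="$(mktemp)"
printf '%s\n' data1 data2 data3 > "$slaves_demo"
check_replication "$slaves_demo" 3
```

On a real installation the first argument would be /usr/local/hadoop/etc/hadoop/slaves.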
(4) Edit mapred-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
</configuration>
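Hadoop 2.x releases ship this file as mapred-site.xml.template, so on a fresh install the editor may have opened an empty new file. In that case it is cleaner to create mapred-site.xml from the template first. The helper below is a sketch with a hypothetical name (ensure_mapred_site), shown on a scratch directory that imitates a fresh install:

```shell
# ensure_mapred_site CONF_DIR
# Copies mapred-site.xml.template to mapred-site.xml if the latter is missing.
ensure_mapred_site() {
  if [ -f "$1/mapred-site.xml" ]; then
    echo "mapred-site.xml already exists"
  elif [ -f "$1/mapred-site.xml.template" ]; then
    cp "$1/mapred-site.xml.template" "$1/mapred-site.xml"
    echo "created mapred-site.xml from the template"
  else
    echo "no mapred-site.xml or template found in $1"
    return 1
  fi
}

# Demo in a scratch directory that mimics a fresh install.
demo_dir="$(mktemp -d)"
echo '<configuration></configuration>' > "$demo_dir/mapred-site.xml.template"
ensure_mapred_site "$demo_dir"
```

For the real directory, /usr/local/hadoop/etc/hadoop, the copy needs write permission there, which in this article's setup means running the cp with sudo.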
(5) Edit yarn-site.xml:
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
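The data nodes read the same settings, so a common practice is to copy the finished configuration directory from Master to each of them. The article does not show this step, so the following is only a dry-run sketch. It assumes Hadoop is installed at the same path on every node and that Master can reach the data nodes over SSH. It prints the scp commands rather than running them:

```shell
# Dry run: print one scp command per data node instead of executing it.
# Assumptions: same install path on every node, SSH access from Master.
conf_dir=/usr/local/hadoop/etc/hadoop
sync_plan="$(
  for host in data1 data2 data3; do
    echo "scp -r $conf_dir $host:/usr/local/hadoop/etc/"
  done
)"
echo "$sync_plan"
```

Once the printed commands look right, they can be run one by one from Master.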
Recommended reading:
Passwordless SSH login for a fully distributed Hadoop cluster
References:
《大数据基础编程、实验和案例教程》, Lin Ziyu (林子雨)
《Hadoop+Spark大数据巨量分析与机器学习》, Lin Dagui (林大贵)