This article uses Hadoop version hadoop-2.6.0-cdh5.7.0 and JDK version 1.7.
-
Hadoop Environment Setup
Download Hadoop
Download address: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
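The tarball then needs to be unpacked so that the ~/app/hadoop-2.6.0-cdh5.7.0 directory used in the configuration steps below exists; following the same pattern as the JDK step (adjust the path if you downloaded it elsewhere):
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
-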
Install the JDK
Download jdk-7u51-linux-x64.tar.gz
Extract it into the app directory: tar -zxvf jdk-7u51-linux-x64.tar.gz -C ~/app/
Verify the installation: ~/app/jdk1.7.0_51/bin/java -version
It is recommended to add the bin directory to the system environment variables (~/.bash_profile):
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_51
export PATH=$JAVA_HOME/bin:$PATH
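After editing ~/.bash_profile, reload it and check that java resolves from any directory (the exact version string printed depends on your JDK build):
source ~/.bash_profile
java -version
echo $JAVA_HOME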
-
机器参数设置
设置机器hostname为hadoop001修改机器名: /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hadoop001设置ip和hostname的映射关系: /etc/hosts
192.168.199.200 hadoop001
127.0.0.1 localhostssh免密码登陆(本步骤可以省略,但是后面你重启hadoop进程时是需要手工输入密码才行)
ssh-keygen -t rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
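To confirm that key-based login works, log in to the machine itself once over SSH; the chmod is an extra precaution not in the original steps, since sshd usually rejects an authorized_keys file with loose permissions:
chmod 600 ~/.ssh/authorized_keys
ssh hadoop001 date
The first connection may ask you to accept the host key fingerprint; after that, no password prompt should appear.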
-
Modify the Hadoop configuration files under ~/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop
hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_51
-
core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/tmp</value>
</property>
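hadoop.tmp.dir points at a directory that may not exist yet; creating it up front is a small precaution (not a step from the original):
mkdir -p /home/hadoop/app/tmp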
-
hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
Format HDFS
Note: run this step only the very first time; if you format on every start, the data on HDFS will be wiped.
bin/hdfs namenode -format
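A quick way to check that the format succeeded is to look for the NameNode metadata directory created under hadoop.tmp.dir (dfs/name is the default layout in Hadoop 2.x; treat the exact path as an assumption):
ls /home/hadoop/app/tmp/dfs/name/current
-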
Start HDFS
sbin/start-dfs.sh
Verify that it started successfully:
jps
DataNode
SecondaryNameNode
NameNode
Browse to http://hadoop001:50070/
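As an optional smoke test, put a small file onto HDFS and list it; the file name matches the hello.txt used by the wordcount example later, but its content here is made up:
echo "hello world hello hadoop" > hello.txt
bin/hdfs dfs -mkdir -p /input/wc
bin/hdfs dfs -put hello.txt /input/wc/
bin/hdfs dfs -ls /input/wc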
Stop HDFS
sbin/stop-dfs.sh
-
YARN Environment Setup
-
Configuration file changes
- mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
- yarn-site.xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
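Note: if etc/hadoop contains only mapred-site.xml.template and no mapred-site.xml (the usual case with stock Apache Hadoop 2.x; whether the CDH tarball ships the file is an assumption here), copy the template first and put the mapreduce.framework.name property into the copy:
cp mapred-site.xml.template mapred-site.xml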
Start YARN
sbin/start-yarn.sh
-
Verify that it started successfully
jps
ResourceManager
NodeManager
Browse to http://hadoop001:8088
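Besides the web UI, the registered nodes can also be listed from the command line (an extra check, not part of the original steps):
bin/yarn node -list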
Stop YARN
sbin/stop-yarn.sh
Submit an MR job to run on YARN: wordcount (wc)
hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar wordcount /input/wc/hello.txt /output/wc/
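To inspect the result once the job finishes (part-r-00000 is the conventional reducer output file name; note also that the output directory must not already exist when the job is submitted, otherwise the job fails):
hadoop fs -ls /output/wc/
hadoop fs -cat /output/wc/part-r-00000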