Two hosts: spark0 (IP 10.112.154.98) and spark1 (IP 10.112.203.31).
The image already has Hadoop installed.
Start the two containers, one on each host:
docker run -it -h master --name master $image_name # master node
docker run -it -h slave01 --name slave01 $image_name # slave node
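Use weave to connect the two containers. Each container needs its own address on the weave network, matching the /etc/hosts entries used later:
# on the host running master
weave attach 192.168.0.2/24 $master_container_id
weave connect $ip # IP of the host running slave01
# on the host running slave01
weave attach 192.168.0.3/24 $slave01_container_id
weave connect $ip # IP of the host running master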
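The per-host sequence can be sketched as a small dry-run helper that only prints the two weave commands for one host. The function name weave_plan is hypothetical, and the example call assumes master runs on spark0 and slave01 on spark1, which the notes above do not state explicitly.

```shell
# Print (not run) the weave commands for one host.
# $1: container address in CIDR form
# $2: container id
# $3: IP of the other host
weave_plan() {
    printf 'weave attach %s %s\n' "$1" "$2"
    printf 'weave connect %s\n' "$3"
}

# Example (assumed placement: master on spark0, so the peer is spark1):
#   weave_plan 192.168.0.2/24 "$master_container_id" 10.112.203.31
```

Printing the commands first makes it easy to confirm that the two containers get different addresses before anything is attached.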
Hadoop configuration
Open hadoop-env.sh and set JAVA_HOME:
# assuming the current directory is /usr/local/hadoop
vim etc/hadoop/hadoop-env.sh
# replace export JAVA_HOME=${JAVA_HOME} with
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
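If the image is built by a script, the same change can be made without an editor. This is a minimal sketch: set_java_home is a hypothetical helper, and it assumes the file contains a line beginning with "export JAVA_HOME=".

```shell
# Rewrite the JAVA_HOME line of a hadoop-env.sh file in place.
# $1: path to hadoop-env.sh
# $2: JDK directory
set_java_home() {
    # Write to a temporary file first so the original is only replaced on success.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Example:
#   set_java_home etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-openjdk-amd64/
```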
Open core-site.xml and enter the following:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Open hdfs-site.xml and enter the following:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/namenode_dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/datanode_dir</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Create mapred-site.xml (copy mapred-site.xml.template and rename the copy), then enter the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
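The copy step can be sketched as follows. init_mapred_site is a hypothetical helper; it leaves an existing mapred-site.xml untouched so that rerunning it does not discard edits.

```shell
# Create mapred-site.xml from the template unless it already exists.
# $1: Hadoop configuration directory (etc/hadoop under the install directory)
init_mapred_site() {
    [ -f "$1/mapred-site.xml" ] ||
        cp "$1/mapred-site.xml.template" "$1/mapred-site.xml"
}

# Example:
#   init_mapred_site /usr/local/hadoop/etc/hadoop
```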
Edit yarn-site.xml and enter the following:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
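After editing the four files it can help to read a value back and confirm it is what was intended. The sketch below uses a hypothetical helper, get_prop, and assumes each <name> and <value> sits on its own line, as in the listings above; it is not a general XML parser.

```shell
# Print the <value> of one property in a Hadoop configuration file.
# $1: configuration file
# $2: property name
get_prop() {
    awk -v key="$2" '
        # Remember that the wanted <name> line has been seen.
        index($0, "<name>" key "</name>") { found = 1; next }
        # The next <value> line belongs to that property.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example:
#   get_prop etc/hadoop/core-site.xml fs.defaultFS
```

On a node where Hadoop is installed, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.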
Edit /etc/hosts in both containers and add:
192.168.0.2 master
192.168.0.3 slave01
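Because the entries have to be added in every container, a small idempotent helper avoids duplicate lines when the step is repeated. add_host is a hypothetical name, and the match is a simple pattern that assumes plain hostnames such as master and slave01.

```shell
# Append "ip name" to a hosts file unless that name is already mapped.
# $1: hosts file
# $2: IP address
# $3: hostname
add_host() {
    grep -qE "[[:space:]]$3([[:space:]]|\$)" "$1" ||
        printf '%s %s\n' "$2" "$3" >> "$1"
}

# Example (run inside each container):
#   add_host /etc/hosts 192.168.0.2 master
#   add_host /etc/hosts 192.168.0.3 slave01
```

Docker may rewrite /etc/hosts when a container is restarted, so passing --add-host master:192.168.0.2 and --add-host slave01:192.168.0.3 to docker run is an alternative worth considering.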
Starting Hadoop
Open the slaves file on master and enter the slave's hostname:
vim etc/hadoop/slaves
# replace localhost with the slave's hostname
slave01
On the master terminal, change to /usr/local/hadoop and run the following commands:
cd /usr/local/hadoop
bin/hdfs namenode -format
sbin/start-all.sh
The Hadoop cluster is now running. Run jps on master and on slave01 to check which daemons are up.
[Screenshot of the jps output on each node]
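The jps check can also be scripted. The sketch below assumes a Hadoop 2.x layout like the one configured here, where start-all.sh typically brings up NameNode, SecondaryNameNode and ResourceManager on master, and DataNode and NodeManager on each host listed in the slaves file. has_daemons is a hypothetical helper.

```shell
# Succeed only if every named daemon appears in the given jps output.
# $1: output of jps
# remaining arguments: daemon names to look for
has_daemons() {
    out=$1
    shift
    for d in "$@"; do
        # jps prints "<pid> <name>"; match the whole name so that
        # NameNode is not satisfied by SecondaryNameNode.
        printf '%s\n' "$out" | grep -qE "^[0-9]+ $d\$" || {
            echo "missing: $d"
            return 1
        }
    done
}

# Example on master:
#   has_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
# Example on slave01:
#   has_daemons "$(jps)" DataNode NodeManager
```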
Running the Hadoop example program grep
Create a directory on HDFS:
./bin/hdfs dfs -mkdir -p /user/hadoop/input
Copy all XML files under /usr/local/hadoop/etc/hadoop/ into that HDFS directory:
./bin/hdfs dfs -put ./etc/hadoop/*.xml /user/hadoop/input
Use ls to check that the files were uploaded to HDFS correctly:
./bin/hdfs dfs -ls /user/hadoop/input
Run the example program with the following command:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep /user/hadoop/input output 'dfs[a-z.]+'
View the results in the output directory on HDFS:
./bin/hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
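To see what the example computes, roughly the same counts can be produced locally with standard tools. This is only an approximation of the MapReduce job for sanity checking, and count_matches is a hypothetical helper.

```shell
# Count each distinct string matching 'dfs[a-z.]+' in the given files,
# most frequent first.
count_matches() {
    grep -ohE 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}

# Example (from /usr/local/hadoop):
#   count_matches etc/hadoop/*.xml
```

The job refuses to start if its output directory already exists, so remove it with ./bin/hdfs dfs -rm -r output before running the example a second time.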