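Anyone who has set up Hadoop knows how tedious it is: there is a lot of environment to configure and many configuration files to edit, so building a usable test environment wastes a great deal of time. Docker addresses exactly this kind of problem. With it, we can quickly stand up a usable Hadoop cluster for testing.
This article uses a Dockerfile from GitHub, with a few minor changes that improve the experience for users in China. GitHub repository
Clone the GitHub repository and enter the repository directory:
The following is excerpted from README.md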
Apache Hadoop 2.7.1 Docker image
Note: this is the master branch - for a particular Hadoop version always check the related branch
A few weeks ago we released an Apache Hadoop 2.3 Docker image - this quickly became the most popular Hadoop image in the Docker registry.
Following the success of our previous Hadoop Docker images, the feedback and feature requests we received, we aligned with the Hadoop release cycle, so we have released an Apache Hadoop 2.7.1 Docker image - same as the previous version, it's available as a trusted and automated build on the official Docker registry.
FYI: All the former Hadoop releases (2.3, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0) are available in the GitHub branches or our Docker Registry - check the tags.
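Changes for use in China
This version sets the time zone in the Dockerfile to China. Downloading the files listed below over Chinese networks can be very slow, so the Dockerfile no longer fetches them with curl and expects them to be supplied locally instead. The following files must therefore be present in the current directory:
Each of them can be downloaded separately through other channels.
A docker-compose.yml file has been added, with a logs mapping, for quick startup.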
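Because the build now depends on files supplied locally, it can help to confirm they are all in place before running docker build. The following is a minimal sketch, not part of the repository: check_build_files is a hypothetical helper, and the file names are passed in as arguments because none are hard-coded here.

```shell
# Hypothetical helper: report which of the given files are missing
# from the current directory. Returns non-zero if any file is missing.
check_build_files() {
  local missing=0
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be chained in front of the build, for example `check_build_files first-file second-file && docker build -t sequenceiq/hadoop-docker:2.7.1 .`, where first-file and second-file stand in for the real file names.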
Build the image
If you'd like to try directly from the Dockerfile you can build the image as:
docker build -t sequenceiq/hadoop-docker:2.7.1 .
Pull the image
The image is also released as an official Docker image from Docker's automated build repository - you can always pull or refer to the image when launching containers.
docker pull sequenceiq/hadoop-docker:2.7.1
Start with docker-compose
docker-compose up -d
Verify that the environment works
Use
docker exec -it <container-name> bash
to open a terminal inside the container.
Then run the following commands:
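If the container name is not known, it can be looked up from the running containers. The sketch below assumes the container was started from the sequenceiq/hadoop-docker:2.7.1 image tag used above; hadoop_container is a hypothetical helper name, and a compose file that builds the image under a different tag would need that tag passed as the argument.

```shell
# Hypothetical helper: print the name of the first running container
# created from the given image (default: the tag built or pulled above).
hadoop_container() {
  local image="${1:-sequenceiq/hadoop-docker:2.7.1}"
  docker ps --filter "ancestor=${image}" --format '{{.Names}}' | head -n 1
}
```

With that in place, `docker exec -it "$(hadoop_container)" bash` opens the shell without typing the name by hand.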
cd $HADOOP_PREFIX
# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
# check the output
bin/hdfs dfs -cat output/*
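Running the example a second time fails because MapReduce refuses to write into an output directory that already exists. A minimal sketch of a repeatable run, assuming it is executed inside the container where HADOOP_PREFIX is set; rerun_grep_example is a hypothetical name:

```shell
# Hypothetical helper: remove any previous output directory in HDFS,
# run the grep example again, then print the result.
# The body runs in a subshell so the caller's working directory is unchanged.
rerun_grep_example() (
  cd "$HADOOP_PREFIX" || exit 1
  # -f suppresses the error when the output directory does not exist yet
  bin/hdfs dfs -rm -r -f output
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    grep input output 'dfs[a-z.]+' || exit 1
  bin/hdfs dfs -cat 'output/*'
)
```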
Hadoop native libraries, build, Bintray, etc
The Hadoop build process is no easy task - it requires many libraries in the right versions, protobuf, etc., and takes some time. We have simplified all of this, made the build, and released a 64-bit version of the Hadoop native libs on this Bintray repo. Enjoy.
Automate everything
As we have mentioned previously, a Dockerfile was created and released in the official Docker repository
Wrapping up
Finally, here are a few commonly used Hadoop web URLs:
- Cluster status: http://server:8088/cluster
- Browse HDFS files: http://server:50070/explorer.html
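The two addresses above can also be probed from the command line. This is a rough sketch under two assumptions: ports 8088 and 50070 are published to the host by the compose file, and curl is installed. Both function names are hypothetical.

```shell
# Hypothetical helper: print the two web UI URLs for a host (default: localhost).
hadoop_ui_urls() {
  local host="${1:-localhost}"
  echo "http://${host}:8088/cluster"
  echo "http://${host}:50070/explorer.html"
}

# Hypothetical helper: request each URL and print the HTTP status code.
# curl reports 000 when it cannot connect at all.
check_hadoop_uis() {
  hadoop_ui_urls "${1:-localhost}" | while read -r url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    echo "$url -> $code"
  done
}
```

Calling `check_hadoop_uis server` prints one line per URL; a 200 status suggests the corresponding UI is reachable.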