This chapter is based on Chapter 5 of the original book.
1. Start HDFS and YARN
root@10049605-ThinkPad-T470-W10DG:~# start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [10049605-ThinkPad-T470-W10DG]
root@10049605-ThinkPad-T470-W10DG:~# start-yarn.sh
Starting resourcemanager
Starting nodemanagers
root@10049605-ThinkPad-T470-W10DG:~#
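The start scripts print little beyond the names of what they launch, so it can help to confirm that the daemons are actually running. The following is a minimal sketch, assuming the JDK's `jps` tool is on the PATH and that this is a single-node setup where all five daemons run on the same machine; the function name `missing_daemons` is made up for illustration.

```shell
# Print the name of every expected Hadoop daemon that `jps` does not list.
# Prints nothing when all five daemons are running.
missing_daemons() {
    running=$(jps)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
        echo "$running" | grep -qw "$d" || echo "$d"
    done
}
```

Calling `missing_daemons` after `start-dfs.sh` and `start-yarn.sh` should then print nothing if everything started.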
2. MapReduce examples
2.1. Put the input file into HDFS
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data
mkdir: `data': No such file or directory
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user/root
mkdir: `/user/root': No such file or directory
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir /user/root
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -mkdir data/weblogs
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -copyFromLocal /home/yay/下载/hcb/chapter4/resources data/weblogs
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -ls data/weblogs
Found 1 items
drwxr-xr-x - root supergroup 0 2019-01-12 23:25 data/weblogs/resources
root@10049605-ThinkPad-T470-W10DG:/# hdfs dfs -ls data/weblogs/resources
Found 1 items
-rw-r--r-- 1 root supergroup 10851 2019-01-12 23:25 data/weblogs/resources/NASA_log_sample.txt
root@10049605-ThinkPad-T470-W10DG:/#
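The first two `mkdir` attempts above fail because a relative HDFS path such as `data` resolves under `/user/<username>`, and neither `/user` nor `/user/root` existed yet. The `-p` option of `hdfs dfs -mkdir` creates missing parent directories, so the four separate `mkdir` calls can be collapsed into one. A minimal sketch follows; the helper name `make_input_dir` and its optional user argument are made up for illustration.

```shell
# Create the input directory under the user's HDFS home in a single step.
# $1 is the HDFS user name; it defaults to root, as in the transcript above.
make_input_dir() {
    hdfs_home="/user/${1:-root}"
    # -p creates /user, /user/<name> and data along the way if they are missing
    hdfs dfs -mkdir -p "$hdfs_home/data/weblogs"
}
```

Running `make_input_dir` with no argument targets `/user/root/data/weblogs`, the same directory the transcript ends up with.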
2.2. Run the MapReduce job
root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.MsgSizeAggregateMapReduce data/weblogs/resources data/msgsize-out
root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -cat data/msgsize-out/part*
Mean 15195
Max 305722
Min 0
root@10049605-ThinkPad-T470-W10DG:~#
You can also copy the output to the local filesystem and view it there:
root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -copyToLocal data/msgsize-out /home/yay/
root@10049605-ThinkPad-T470-W10DG:~# cat /home/yay/msgsize-out/*
Mean 15195
Max 305722
Min 0
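For a small sample file, the job's result can be cross-checked locally without Hadoop. The sketch below assumes the log is in Common Log Format, that the message size is the last whitespace-separated field, and that a `-` in that field counts as 0; none of this is confirmed by the job's source, and the function name `msgsize_summary` is made up for illustration. The mean is truncated to an integer, which may or may not match how the job rounds.

```shell
# Print the mean, maximum and minimum of the last field of each log line.
msgsize_summary() {
    awk '
        NF == 0 { next }                               # skip blank lines
        {
            size = ($NF ~ /^[0-9]+$/) ? $NF + 0 : 0    # "-" counts as 0 (assumption)
            sum += size
            if (n == 0 || size > max) max = size
            if (n == 0 || size < min) min = size
            n++
        }
        END {
            if (n > 0) printf "Mean %d\nMax %d\nMin %d\n", sum / n, max, min
        }
    ' "$1"
}
```

It would be called on the local copy of the input, for example `msgsize_summary /home/yay/下载/hcb/chapter4/resources/NASA_log_sample.txt`.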
2.3. Run another MapReduce job
root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.HitCountMapReduce data/weblogs/resources data/hit-count-out
root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -cat data/hit-count-out/part*
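The output of this job is not shown above. As a rough local stand-in, the sketch below counts hits per requested URL with awk. It assumes that `HitCountMapReduce` groups by URL, that the log is in Common Log Format, and that the URL is therefore the 7th whitespace-separated field; the function name `hit_counts` and the tab-separated output layout are made up for illustration and may differ from what the job writes.

```shell
# Count how many log lines request each URL and print "url<TAB>count", sorted by URL.
hit_counts() {
    awk '
        $7 != "" { hits[$7]++ }                        # field 7 is the request path (assumption)
        END { for (url in hits) print url "\t" hits[url] }
    ' "$1" | LC_ALL=C sort
}
```

It would be called as `hit_counts NASA_log_sample.txt` on a local copy of the input file.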
2.4. Run a MapReduce job that takes the output of 2.3 as its input
root@10049605-ThinkPad-T470-W10DG:~# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.FrequencyDistributionMapReduce data/hit-count-out data/freq-dist-out
3. Install the gnuplot plotting program
yay@10049605-ThinkPad-T470-W10DG:~/下载$ mv gnuplot-5.2.6 /home/yay/software/gnuplot
yay@10049605-ThinkPad-T470-W10DG:~/software$ cd gnuplot
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ dir
aclocal.m4 configure.ac GNUmakefile Makefile.maint README VERSION
BUGS configure.vms INSTALL man RELEASE_NOTES win
ChangeLog Copyright INSTALL.gnu missing share
compile demo install-sh mkinstalldirs src
config depcomp m4 NEWS term
config.hin docs Makefile.am PATCHLEVEL TODO
configure FAQ.pdf Makefile.in PGPKEYS tutorial
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ ./configure --prefix=/home/yay/gnuplot5
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
........
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ make
yay@10049605-ThinkPad-T470-W10DG:~/software/gnuplot$ make install
Then append the following to the end of /etc/profile:
#gnuplot
export GNUPLOT=/home/yay/gnuplot5
export PATH=$PATH:$GNUPLOT/bin
export MANPATH=/home/yay/gnuplot5/share/man/man1:$MANPATH
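Note that the gnuplot X11 package also needs to be installed; otherwise `set term png` fails with the following error: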
gnuplot> set term png
Terminal type is now 'unknown'
^
unknown or ambiguous terminal type; type just 'set terminal' for a list
root@10049605-ThinkPad-T470-W10DG:~/plots# sudo apt-get install gnuplot-x11
4. Plotting
root@10049605-ThinkPad-T470-W10DG:~# hdfs dfs -copyToLocal data/freq-dist-out/part-r-00000 2.dat
root@10049605-ThinkPad-T470-W10DG:~# dir
2.dat
root@10049605-ThinkPad-T470-W10DG:~# cp -r /home/yay/下载/hcb/chapter5/plots .
root@10049605-ThinkPad-T470-W10DG:~#
root@10049605-ThinkPad-T470-W10DG:~# cd plots
root@10049605-ThinkPad-T470-W10DG:~/plots# mv ../2.dat .
root@10049605-ThinkPad-T470-W10DG:~/plots# dir
2.dat httpfreqdist.plot httphitsvsmsgsize.plot sendvsreceive.plot
data httphistbyhour.plot plot-images
root@10049605-ThinkPad-T470-W10DG:~/plots# mv 2.dat 2.data
root@10049605-ThinkPad-T470-W10DG:~/plots# gnuplot httpfreqdist.plot
root@10049605-ThinkPad-T470-W10DG:~/plots# dir
2.data freqdist.png httphistbyhour.plot plot-images
data httpfreqdist.plot httphitsvsmsgsize.plot sendvsreceive.plot
root@10049605-ThinkPad-T470-W10DG:~/plots#
Open the generated freqdist.png to view the plot.
5. Plot another chart
root@10049605-ThinkPad-T470-W10DG:~/plots# hadoop jar /home/yay/eclipse-workspace/countwords/target/countwords-0.0.1-SNAPSHOT.jar chapter5.weblog.HistogramGenerationMapReduce data/weblogs/resources data/histogram-out