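The previous article, Hadoop: Writing WordCount, set up a local Hadoop runtime environment and ran the program successfully on the local machine. This article builds on that: it packages the finished WordCount program as an executable jar and runs it on a cluster. If you do not have a cluster yet, set one up by following Hadoop Cluster Environment Setup (Three Nodes).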
Main contents:
- 1. Change the Job's input and output paths
- 2. Build an executable jar
- 3. Submit the jar to the cluster and run it
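Related articles:
1. Installing and Configuring CentOS 7 on VM12
2. Hadoop Cluster Environment Setup (Three Nodes)
3. Hadoop: Running WordCount Locally
4. Hadoop: Running WordCount on a Cluster
5. Log Collection with Log4j2 + Flume + HDFS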
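1. Change the Job's input and output paths
The previous article ran the job locally, so both the input path and the output directory pointed at the local file system: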
FileInputFormat.setInputPaths(job, "D:\\hadoop\\input");
FileOutputFormat.setOutputPath(job, new Path("D:\\hadoop\\output"));
Now change them to paths on HDFS:
FileInputFormat.setInputPaths(job, "/input/words.txt");
FileOutputFormat.setOutputPath(job, new Path("/output/wc"));
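Upload words.txt to the /input directory on HDFS beforehand. The output directory /output/wc must not exist yet; Hadoop refuses to submit a job whose output directory already exists.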
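Hard-coded paths mean rebuilding the jar for every new input. As a minimal sketch, the paths could instead come from the command-line arguments, with the values above as a fallback. `JobPaths` is a hypothetical helper that is not part of the original project; the two strings it returns would be passed on to `FileInputFormat.setInputPaths` and `FileOutputFormat.setOutputPath`.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original WordCount project.
final class JobPaths {
    // Defaults are the HDFS paths used in this article.
    static final String DEFAULT_INPUT = "/input/words.txt";
    static final String DEFAULT_OUTPUT = "/output/wc";

    /** Returns {input, output}: args[0] and args[1] when given, otherwise the defaults. */
    static String[] resolve(String[] args) {
        String input = args.length > 0 ? args[0] : DEFAULT_INPUT;
        String output = args.length > 1 ? args[1] : DEFAULT_OUTPUT;
        return new String[] {input, output};
    }

    public static void main(String[] args) {
        // Prints the pair of paths that the job driver would use.
        System.out.println(Arrays.toString(resolve(args)));
    }
}
```

With a driver written this way, arguments placed after the jar name on the `hadoop jar` command line would reach `main` and override the defaults.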
2. Package WordCount as an executable jar
Package with Maven by adding the following to pom.xml:
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.4</version>
            <configuration>
                <excludes>
                    <!-- Exclude the specified files from the jar -->
                    <exclude>org/**</exclude>
                </excludes>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <classpathPrefix>lib/</classpathPrefix>
                        <!-- Main class to run -->
                        <mainClass>me.jinkun.mr.wc.RunWcJob</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
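If you develop in IntelliJ IDEA, double-click package in the Maven panel on the right (on the command line, run mvn package).
A jar named mapreduce-wc-1.0.jar then appears under the project's target directory.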
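Before copying the jar to the cluster, it can be worth confirming that the manifest really names the main class. The sketch below reads the `Main-Class` entry with the JDK's `java.util.jar` API. `ManifestCheck` is a hypothetical helper, and the default path `target/mapreduce-wc-1.0.jar` is an assumption based on the jar name above.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the original project.
final class ManifestCheck {
    /** Returns the Main-Class entry of a manifest, or null when it is absent. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null; // the jar has no manifest at all
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the jar produced by the build described in this article.
        String jarPath = args.length > 0 ? args[0] : "target/mapreduce-wc-1.0.jar";
        try (JarFile jar = new JarFile(jarPath)) {
            System.out.println("Main-Class: " + mainClassOf(jar.getManifest()));
        }
    }
}
```

If the pom.xml configuration took effect, the printed value should be me.jinkun.mr.wc.RunWcJob; a null value would mean the manifest carries no main class and the class name has to be given on the command line.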
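3. Submit the jar to the cluster and run it
On a cluster node that has the jar (hadoop1 in the log below), run the following command. No main class is given because the jar manifest already names it: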
hadoop jar mapreduce-wc-1.0.jar
The output is as follows:
[hadoop@hadoop1 soft-install]$ hadoop jar mapreduce-wc-1.0.jar
18/03/08 17:00:25 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.2.111:8032
18/03/08 17:00:26 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/03/08 17:00:27 INFO input.FileInputFormat: Total input paths to process : 1
18/03/08 17:00:28 INFO mapreduce.JobSubmitter: number of splits:1
18/03/08 17:00:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1520498386048_0001
18/03/08 17:00:29 INFO impl.YarnClientImpl: Submitted application application_1520498386048_0001
18/03/08 17:00:29 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1520498386048_0001/
18/03/08 17:00:29 INFO mapreduce.Job: Running job: job_1520498386048_0001
18/03/08 17:00:40 INFO mapreduce.Job: Job job_1520498386048_0001 running in uber mode : false
18/03/08 17:00:40 INFO mapreduce.Job: map 0% reduce 0%
18/03/08 17:00:47 INFO mapreduce.Job: map 100% reduce 0%
18/03/08 17:00:56 INFO mapreduce.Job: map 100% reduce 100%
18/03/08 17:00:56 INFO mapreduce.Job: Job job_1520498386048_0001 completed successfully
18/03/08 17:00:56 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=100
        FILE: Number of bytes written=237705
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=146
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4342
        Total time spent by all reduces in occupied slots (ms)=6070
        Total time spent by all map tasks (ms)=4342
        Total time spent by all reduce tasks (ms)=6070
        Total vcore-milliseconds taken by all map tasks=4342
        Total vcore-milliseconds taken by all reduce tasks=6070
        Total megabyte-milliseconds taken by all map tasks=4446208
        Total megabyte-milliseconds taken by all reduce tasks=6215680
    Map-Reduce Framework
        Map input records=4
        Map output records=8
        Map output bytes=78
        Map output materialized bytes=100
        Input split bytes=100
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=100
        Reduce input records=8
        Reduce output records=5
        Spilled Records=16
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=206
        CPU time spent (ms)=1430
        Physical memory (bytes) snapshot=300036096
        Virtual memory (bytes) snapshot=4156841984
        Total committed heap usage (bytes)=141660160
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=46
    File Output Format Counters
        Bytes Written=39
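The counters can be cross-checked against each other. As a small worked example, dividing the megabyte-milliseconds counter by the task time gives the memory each task container was charged for. `CounterCheck` is a hypothetical helper; the numbers are copied from the log above, and reading the quotient as "container size" is an inference from those numbers, not something the log states.

```java
// Hypothetical helper; the inputs are counters copied from the job log.
final class CounterCheck {
    /** Memory per container in MB, derived as megabyte-milliseconds / task milliseconds. */
    static long containerMb(long megabyteMillis, long taskMillis) {
        return megabyteMillis / taskMillis;
    }

    public static void main(String[] args) {
        // 4446208 / 4342 and 6215680 / 6070 both come out to 1024.
        System.out.println("map container MB:    " + containerMb(4446208L, 4342L));
        System.out.println("reduce container MB: " + containerMb(6215680L, 6070L));
    }
}
```

Both quotients are 1024, which suggests the map and reduce containers in this run were each allocated 1024 MB. In the same spirit, Reduce input records (8) matches Map output records (8), and Reduce output records (5) matches Reduce input groups (5).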
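View the result:
The output directory /output/wc can be browsed in the HDFS web UI.
The part-r-00000 file there holds the computed result.
That covers two ways of running MapReduce: local mode, which is convenient for debugging, and cluster mode, which is used in production.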
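Once part-r-00000 has been downloaded, its lines can be parsed back into a map. The sketch below assumes the job uses the default text output format, which writes one `key<TAB>value` pair per line, with a word as the key and an integer count as the value; the WordCount source is not shown here, so treat that as an assumption. `WcOutputParser` is a hypothetical helper, and the sample lines in `main` are made up for illustration, not the actual output of this run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class WcOutputParser {
    /** Parses "word<TAB>count" lines into a map that keeps the file's order. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines such as a trailing newline
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("missing tab separator: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, only to show the expected shape of the file.
        List<String> sample = Arrays.asList("hadoop\t2", "hello\t3");
        System.out.println(parse(sample));
    }
}
```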