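Xzg Big Data Lab Code Summary V1.0
This summary is divided into two parts.
Framework structure shared by the first eight labs (using word count as the example):
Three parts: build the Mapper class, build the Reducer class, build main.
The code structure is as follows: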
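Part 1
Package declaration: package
Imports of the classes from the jar files: import
Define a concrete class that extends Mapper: WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable>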
{
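The four methods of Mapper:
setup usually does preparation work before the map function runs (default implementation)
map is the main data-processing function
cleanup does cleanup work after map has finished, much like a finally clause (default implementation)
run method (default implementation)
Override the Mapper's map() method with the parameters (LongWritable key, Text value, Mapper<LongWritable, Text, Text, LongWritable>.Context context), that is (key, value, context object)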
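//get the value as a String so it is easier to work with
String valueString = value.toString();
//split the string on spaces and store the words in an array
String[] wArr = valueString.split(" ");
//iterate over the words to form <key,value> pairs
for(int i=0;i<wArr.length;i++){
//map out key/value: emit each word with a count of 1 so the reducer can sum the counts
context.write(new Text(wArr[i]), new LongWritable(1));
}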
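The map step can be tried out without a cluster. The following is a minimal plain-Java sketch, assuming each word is emitted with a count of 1; the class name MapStepSketch and its map helper are hypothetical stand-ins and do not use the Hadoop Mapper or Context classes. It also skips empty tokens, which split(" ") produces when a line contains repeated spaces; that filtering is an assumption, not part of the lab code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java stand-in for the map step; no Hadoop classes are used.
class MapStepSketch {
    // Returns one (word, 1) pair per space-separated token of the line.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // assumption: ignore empty tokens caused by repeated spaces
            }
            out.add(new SimpleEntry<>(word, 1L));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the emitted pairs as key<TAB>value.
        for (Map.Entry<String, Long> pair : map("hello world hello")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Each occurrence of a word yields its own pair, so "hello" appears twice in the emitted list; adding the counts up is left to the reduce step.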
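The rest of the Mapper's work is done by the default cleanup() and run()
setup(Context context) does nothing by default; in practice it can read values from the context to initialize fields of the concrete Mapper class, such as name
cleanup(Context context) does nothing by default
public void run(Context context) throws IOException, InterruptedException {
setup(context);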
try {
while (context.nextKeyValue()) {
map(context.getCurrentKey(), context.getCurrentValue(), context);
}
} finally {
cleanup(context);
}
}
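The call order in run() can be illustrated with a small plain-Java sketch. The class LifecycleSketch below is hypothetical and is not the Hadoop Mapper; it only mirrors the same shape (setup once, map once per record, cleanup in a finally block) and records each call in a list so the order is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the Mapper lifecycle; not the Hadoop class itself.
class LifecycleSketch {
    // Records the order in which the lifecycle methods are called.
    final List<String> log = new ArrayList<>();

    void setup() { log.add("setup"); }

    void map(String record) { log.add("map:" + record); }

    void cleanup() { log.add("cleanup"); }

    // Same shape as run(): setup once, map per record, cleanup even if map throws.
    void run(Iterator<String> records) {
        setup();
        try {
            while (records.hasNext()) {
                map(records.next());
            }
        } finally {
            cleanup();
        }
    }

    public static void main(String[] args) {
        LifecycleSketch task = new LifecycleSketch();
        task.run(Arrays.asList("line1", "line2").iterator());
        System.out.println(task.log); // prints [setup, map:line1, map:line2, cleanup]
    }
}
```

Because cleanup sits in the finally block, it still runs when map throws an exception partway through the input.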
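Part 2
Package declaration: package
Imports of the classes from the jar files: import
Define a concrete class that extends Reducer, WordCountReducer, with the type parameters <Text, LongWritable, Text, LongWritable>:
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
Internally it is much like the Mapper; the actual work is implemented in the reduce(Text key, Iterable<LongWritable> v2s, Context context) method, with the parameters (key, values, context)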
Iterator<LongWritable> it = v2s.iterator();
//define var sum
long sum = 0;
// iterate over the values of this key and add them up
while(it.hasNext()){
sum += it.next().get();
}
context.write(key, new LongWritable(sum));
}
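Between map and reduce, the framework groups all values that share a key, so reduce sees each word once together with its list of counts. A minimal plain-Java sketch of that grouping and of the summation follows; ReduceStepSketch, group and reduce are hypothetical names, and a TreeMap is used only as a simple stand-in for the sorted shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for shuffle plus reduce; no Hadoop classes are used.
class ReduceStepSketch {
    // Collects a 1 for every occurrence of a word, grouped by word in sorted key order.
    static Map<String, List<Long>> group(List<String> words) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String word : words) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
        }
        return grouped;
    }

    // Sums all values of one key.
    static long reduce(Iterable<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> grouped = group(Arrays.asList("hello", "world", "hello"));
        // Print one key<TAB>sum line per distinct word.
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getValue()));
        }
    }
}
```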
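Part 3
Package declaration: package
Imports of the classes from the jar files: import
public class TestMapReducer {
public static void main(String[] args) throws Exception{
Seven steps:
1 create the default configuration
2 get the Job object
3 set the job's main class
4 set the map and reduce classes
5 set the map and reduce output types
6 set the MapReduce input and output file paths
7 run the job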
// step1: create the default configuration
Configuration conf = new Configuration();
// optional: point at an HDFS cluster (newer Hadoop versions name this key fs.defaultFS)
//conf.set("fs.default.name", "hdfs://192.168.0.202:9000");
// step2: get the Job object
Job job = Job.getInstance(conf);
// step3: set the job's main class
job.setJarByClass(TestMapReducer.class);
// step4: set the map class and the reduce class
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// step5: set the map and reduce output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(LongWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
// step6: set the MapReduce input and output file paths
FileInputFormat.setInputPaths(job, new Path("file:///simple/source.txt"));
FileOutputFormat.setOutputPath(job, new Path("file:///simple/wcout4"));
// step7: submit the job and wait for it to finish
job.waitForCompletion(true);
}
}
Testing the code
"Run As" ---> "Java Application"
cd /simple/wcout4 (the output path set in main)
cat part-r-00000
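The result file can also be read back in code. The sketch below assumes the job uses the default text output format, which writes one line per key with the key and value separated by a tab. OutputLineSketch and parse are hypothetical names, and the sample lines in main are made up for illustration, not real job output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that parses "word<TAB>count" lines such as those in part-r-00000.
class OutputLineSketch {
    // Turns each well-formed line into a map entry, keeping the order of the lines.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a key/value separator
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList("hello\t2", "world\t1");
        System.out.println(parse(sample));
    }
}
```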