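This article describes how to set up a local Eclipse development environment that connects to Hadoop running on a remote Alibaba Cloud (Aliyun) server.
First, a Hadoop runtime environment must be deployed on the remote server.
For that step, see "Hadoop pseudo-distributed environment setup".
Once the remote Hadoop environment is ready, the local Eclipse environment can be set up.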
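I. Download Eclipse and the Hadoop plugin
1. Download Eclipse from the official Eclipse website.
2. Next, obtain the Hadoop plugin that matches your Hadoop version.
The Hadoop environment in this example is 2.7.3, so the plugin to download is hadoop-eclipse-plugin-2.7.3.jar.
3. Put the downloaded hadoop-eclipse-plugin-2.7.3.jar into eclipse/dropins, then restart Eclipse.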
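II. Plugin configuration
1. After Eclipse restarts, the part circled in red in the screenshot appears, which indicates that the plugin has loaded.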
2. Select File -> New -> Project -> Map/Reduce Project
and create a project named WordCount.
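3. Open the Eclipse Preferences dialog.
Select the Hadoop Map/Reduce option
and set it to the Hadoop installation directory.
A natural question here: Hadoop is installed on the remote Aliyun server, so which directory should be selected?
The answer is to copy the remote Hadoop distribution to the local machine, extract it, and select that local directory.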
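4. Set up the Hadoop tools
Click Window -> Show View -> MapReduce Tools, then click Map/Reduce Locations.
The dialog shown below pops up; enter the settings there.
After the setup succeeds, the following view appears.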
5. Configure the Aliyun server
1. Edit hadoop/etc/hadoop/hdfs-site.xml
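<configuration>
    <!-- Keep a single replica per block, as suits a single-node setup -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Turn off HDFS permission checking.
         In Hadoop 2.x this key is the deprecated name of dfs.permissions.enabled. -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>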
2. Edit hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://120.27.4.193:9000</value>
    </property>
</configuration>
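Before relying on this address from Eclipse, it can help to confirm that the host and port in fs.defaultFS are reachable from the local machine. The following is only a minimal sketch using the plain JDK: the class NameNodeProbe and its methods are hypothetical names made up for this illustration (they are not part of Hadoop), and the 3-second timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper, not part of Hadoop: checks whether the NameNode
// address configured in fs.defaultFS accepts TCP connections.
class NameNodeProbe {

    // Extracts the host part of a value such as hdfs://host:9000.
    static String hostOf(String defaultFs) {
        return URI.create(defaultFs).getHost();
    }

    // Extracts the port part of a value such as hdfs://host:9000.
    static int portOf(String defaultFs) {
        return URI.create(defaultFs).getPort();
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the fs.defaultFS value used in this article.
        String defaultFs = args.length > 0 ? args[0] : "hdfs://120.27.4.193:9000";
        String host = hostOf(defaultFs);
        int port = portOf(defaultFs);
        boolean ok = isReachable(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

If the probe reports the port as unreachable, possible causes include the NameNode not running or the port being blocked by a firewall or cloud security rule; the probe itself cannot tell which.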
III. Create a demo to test the setup
1. In the WordCount project, add the class WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (token, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts of each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains is <in> and <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
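To make it easier to see what the mapper and reducer compute together, here is a rough plain-Java equivalent that needs no Hadoop at all. It is only a sketch: the class LocalWordCount is a hypothetical name for this illustration, and the TreeMap stands in for the shuffle and sort step (its String ordering matches Hadoop's key ordering only for simple ASCII words).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process stand-in for the WordCount job.
class LocalWordCount {

    // Tokenizes each line like TokenizerMapper and sums per word like IntSumReducer.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "Emit (word, 1)" and "sum the ones" collapse into one merge call.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        // Print one "word<TAB>count" pair per line.
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the two sample lines, the counts are hadoop=1, hello=2 and world=1.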
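2. Add an input file named Input to the project; it holds the program's input data.
3. Right-click WordCount.java and choose Run As -> Run Configurations...
As program arguments, enter the input file name (the Input file just added) and the output directory name Out (it is created automatically), then click Run in the lower-right corner.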
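One thing to watch when running the job more than once: Hadoop refuses to start a job whose output directory already exists, so Out has to be deleted or renamed before a rerun. A small pre-flight check along the following lines can catch this early. It is a sketch that assumes both arguments are local file system paths, as in this demo; the class RunArgsCheck is a hypothetical name and not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: sanity-checks the <in> and <out> arguments of WordCount
// when both refer to the local file system.
class RunArgsCheck {

    // Returns null when the arguments look usable, otherwise a short description.
    static String problemWith(Path input, Path output) {
        if (!Files.exists(input)) {
            return "input does not exist: " + input;
        }
        if (Files.exists(output)) {
            return "output already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: RunArgsCheck <in> <out>");
            return;
        }
        String problem = problemWith(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```

Running it with the same two arguments as the job, for example Input and Out, reports whether a stale Out directory is in the way.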
4. When the run finishes, select the WordCount project and choose Refresh.
The output directory now appears, with the result files inside it.
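The result files are plain text with one word and its count per line, separated by a tab. As a sketch of how they could be read back in Java, the following assumes the usual single-reducer file name part-r-00000 inside Out; the class ResultReader is a hypothetical name for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines written by the WordCount job.
class ResultReader {

    // Keeps the order of the lines in the file; skips lines without a tab.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // blank or malformed line
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another path as the first argument if needed.
        Path file = Paths.get(args.length > 0 ? args[0] : "Out/part-r-00000");
        if (!Files.exists(file)) {
            System.err.println("No result file at " + file);
            return;
        }
        Map<String, Integer> counts = parse(Files.readAllLines(file, StandardCharsets.UTF_8));
        counts.forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```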
This completes the setup for local Eclipse development against remote Hadoop on Aliyun.