Setting up a Hadoop environment that can be developed against from Windows
Step 1: Disable the firewall
Disable the firewall first, so that Hadoop ports such as 50070 can be reached from outside the VM.
On CentOS 6.5 the firewall is disabled as follows:
Stop it for the current session: service iptables stop
Disable it permanently across reboots: chkconfig iptables off
Run both commands, then verify that the firewall is indeed stopped:
service iptables status
Step 2: Set up a pseudo-distributed environment
For the detailed setup steps, refer to the official Hadoop documentation.
Note: for IDEA on Windows to reach the Hadoop instance running in the VM, use the IP address rather than the hostname in core-site.xml and the other configuration files; otherwise the Windows side fails with a Connection Error.
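For example, a minimal core-site.xml for this kind of setup could point fs.defaultFS at the VM's IP address. The address 192.168.137.131 and port 9000 are simply the values reused in the WordCount example later in this post; substitute your own:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.137.131:9000</value>
  </property>
</configuration>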
Format the NameNode: bin/hadoop namenode -format
Start HDFS: sbin/start-dfs.sh
Start YARN: sbin/start-yarn.sh
Step 3: Windows-side configuration
1. Configure the Hadoop environment variables on Windows (HADOOP_HOME, with its bin directory added to PATH).
2. For Windows to be able to talk to Hadoop, a few extra binaries (typically winutils.exe and hadoop.dll) have to be placed in the bin folder of the Hadoop directory.
3. Add the hostname of the VM running Hadoop to the Windows hosts file (C:\Windows\System32\drivers\etc\hosts) so that it can be resolved.
4. Open the project in IDEA and copy the Hadoop configuration files into the resources folder; the small connectivity check after this list can be used to confirm everything is wired up.
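Before moving on to MapReduce it is worth confirming that the Windows client can actually reach HDFS in the VM. Below is a minimal sketch; HdfsCheck is just an illustrative class name, and it assumes the core-site.xml placed in the resources folder is on the classpath (the commented-out line reuses the hdfs://192.168.137.131:9000 address from the WordCount example):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath (the resources folder)
        Configuration conf = new Configuration();
        // If the config files are not found, the address can be set explicitly:
        // conf.set("fs.defaultFS", "hdfs://192.168.137.131:9000");
        FileSystem fs = FileSystem.get(conf);
        // List the HDFS root directory to prove the NameNode is reachable
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}
If this prints the directories under / (for example /kason), the connection, the winutils setup, and the configuration files are all in order.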
Step 4: Developing a Hadoop YARN job from IDEA
The classic WordCount program is used as the example.
package ComponentApp;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
/**
 * Created by IBM on 2017/7/16.
 */
public class WordCount2 implements Tool {

    public void setConf(Configuration configuration) {}

    public Configuration getConf() {
        return new JobConf(WordCount2.class);
    }

    public int run(String[] strings) throws Exception {
        try {
            Configuration conf = getConf();
            // Jar produced by the IDEA artifact build; it is shipped to the cluster on submission
            conf.set("mapreduce.job.jar", "D:\\java\\idea\\ComponentApp\\out\\artifacts\\ComponentApp_jar\\ComponentApp.jar");
            // Submit to YARN instead of running locally
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("yarn.resourcemanager.hostname", "192.168.137.131");
            // Required when submitting from a Windows client to a Linux cluster
            conf.set("mapreduce.app-submission.cross-platform", "true");

            Job job = Job.getInstance(conf);
            job.setJarByClass(WordCount2.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            job.setMapperClass(WcMapper.class);
            job.setReducerClass(WcReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.setInputPaths(job, "hdfs://192.168.137.131:9000/kason/myid");
            FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.137.131:9000/kason/out4"));
            job.waitForCompletion(true);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return 0;
    }
    public static class WcMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String mVal = value.toString();
            String[] strs = mVal.split(" ");
            for (String s : strs) {
                System.out.println("data:" + s);
                context.write(new Text(s), new LongWritable(1));
            }
        }
    }

    public static class WcReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable lVal : values) {
                sum += lVal.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new WordCount2(), args);
    }
}
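After the job finishes, the result can be read back from the output directory without leaving IDEA. A small sketch, assuming the /kason/out4 output path used above and the part-r-00000 file that a single-reducer job writes by default (PrintWordCountOutput is just an illustrative class name):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PrintWordCountOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.137.131:9000");
        FileSystem fs = FileSystem.get(conf);
        // Default output file of a job that ran with a single reducer
        Path result = new Path("/kason/out4/part-r-00000");
        // Stream the file to stdout; 4096 is the copy buffer size, true closes the streams afterwards
        IOUtils.copyBytes(fs.open(result), System.out, 4096, true);
        fs.close();
    }
}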
IDEA run output (screenshot)
YARN web UI (screenshot)
HDFS web UI (screenshot)