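In real projects it is common to split the contents of a single file into several categories and write each category out separately, so that the next round of jobs can tell them apart. In Hadoop MapReduce this is the multi-path output problem.
Hadoop MapReduce handles multi-path output with the MultipleOutputs class. Its most commonly used method is: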
public void write(KEYOUT key, VALUEOUT value, String baseOutputPath)
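The first two parameters are the same as those of Context.write. The third, baseOutputPath, is the prefix that identifies the output category. For example, the calls
multipleOutput.write(key, value, "ONE");
multipleOutput.write(key, value, "TWO");
produce the following files: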
ONE-r-00000
TWO-r-00000
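The method also supports subdirectories, which keeps each category of output in its own folder. For example, the calls
multipleOutput.write(key, value, "folder1/ONE");
multipleOutput.write(key, value, "folder2/TWO");
produce the following files: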
folder1/ONE-r-00000
folder2/TWO-r-00000
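The naming pattern in the two examples above can be modelled in plain Java. This is only a sketch of the convention as described here (prefix, then the task type letter, then a five-digit partition number); it does not call Hadoop, and the class name OutputNameSketch, its methods, and the directory /data/out are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Illustrative model only: mirrors the file naming pattern described above.
// It does not use Hadoop; every name in this class is hypothetical.
class OutputNameSketch {

    // Builds "<baseOutputPath>-<taskType>-<5-digit partition>", e.g. "ONE-r-00000".
    // taskType is assumed to be 'r' for a reduce task and 'm' for a map task.
    static String fileName(String baseOutputPath, char taskType, int partition) {
        return String.format(Locale.ROOT, "%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    // Joins the job output directory with the generated name, so a
    // baseOutputPath such as "folder2/TWO" ends up in a subdirectory.
    static String fullPath(String outputDir, String baseOutputPath, char taskType, int partition) {
        String dir = outputDir.endsWith("/")
                ? outputDir.substring(0, outputDir.length() - 1)
                : outputDir;
        return dir + "/" + fileName(baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("ONE", 'r', 0));                      // ONE-r-00000
        System.out.println(fullPath("/data/out", "folder2/TWO", 'r', 3)); // /data/out/folder2/TWO-r-00003
    }
}
```

Each reduce task writes its own partition number, so a job with several reducers would be expected to leave several files per prefix (for example ONE-r-00000, ONE-r-00001, and so on).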
This method is mainly used for reduce-side output. A reducer example follows:
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import com.dataeye.mr.util.OutFieldsBaseModel;

public class MultiReducer extends Reducer<OutFieldsBaseModel, OutFieldsBaseModel, NullWritable, OutFieldsBaseModel> {

    // Reused output value, so no new object is allocated per key.
    private OutFieldsBaseModel mapValueObj = new OutFieldsBaseModel();
    private MultipleOutputs<NullWritable, OutFieldsBaseModel> multipleOutput;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // One MultipleOutputs instance per task, bound to the task context.
        multipleOutput = new MultipleOutputs<NullWritable, OutFieldsBaseModel>(context);
    }

    @Override
    protected void reduce(OutFieldsBaseModel key, Iterable<OutFieldsBaseModel> values, Context context) throws IOException, InterruptedException {
        String[] keyArray = key.getOutFields();
        String deviceId = keyArray[0];
        mapValueObj.setOutFields(keyArray);

        // hashCode() can be negative, so the remainder is -1, 0, or 1:
        // 0 goes to ZERO, everything else goes to ONE.
        int code = deviceId.hashCode() % 2;
        if (code == 0) {
            multipleOutput.write(NullWritable.get(), mapValueObj, "ZERO/ZERO");
        } else {
            multipleOutput.write(NullWritable.get(), mapValueObj, "ONE/ONE");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close the writers when the task ends so buffered records are flushed.
        multipleOutput.close();
    }
}
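The reducer above routes records by deviceId.hashCode() % 2. In Java the % operator keeps the sign of the dividend, so the result can be -1 as well as 0 or 1; the else branch happens to absorb that case when there are only two buckets. If the same idea were extended to more buckets, the remainder would need to be normalized first. The following is a minimal sketch of that under stated assumptions: BucketRoutingSketch, bucket, and basePathFor are hypothetical helpers, not part of Hadoop or of the original reducer.

```java
// Illustrative sketch only: shows sign-safe bucketing of a hash code.
// All names in this class are hypothetical.
class BucketRoutingSketch {

    // Math.floorMod always returns a value in [0, buckets) for buckets > 0,
    // unlike %, which can return a negative remainder.
    static int bucket(int hash, int buckets) {
        return Math.floorMod(hash, buckets);
    }

    // Same two-way split as the reducer above: even hashes go to ZERO,
    // odd hashes (positive or negative) go to ONE.
    static String basePathFor(String deviceId) {
        return bucket(deviceId.hashCode(), 2) == 0 ? "ZERO/ZERO" : "ONE/ONE";
    }

    public static void main(String[] args) {
        System.out.println(-3 % 2);            // -1
        System.out.println(bucket(-3, 2));     // 1
        System.out.println(basePathFor("a"));  // ONE/ONE  ("a".hashCode() is 97)
        System.out.println(basePathFor("b"));  // ZERO/ZERO ("b".hashCode() is 98)
    }
}
```

In a reducer, the returned string would be passed as the third argument of multipleOutput.write, in place of the hard-coded "ZERO/ZERO" and "ONE/ONE" literals.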