最近在执行Hive SQL的时候遇到一个类型转换异常的错误,无法将Text类型转换为Long。由于异常信息栈比较多,只贴出一些核心异常信息,具体如下:
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPLessThan.evaluate(GenericUDFOPLessThan.java:130)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:65)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFWhen.evaluate(GenericUDFWhen.java:93)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:65)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead._evaluate(ExprNodeEvaluatorHead.java:44)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:112)
at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
其中查询语句如下:
A1 long,
...
},
B AS {
B1, //string类型
B2, //long类型
...
},
C AS {
C1, //string类型
C2, //long类型
...
}
SELECT
A1,
IF(duration/1000 > xxx,1,0) as valid,
...
FROM
(SELECT
B1 AS id, //long类型
B2 AS duration, //long类型
...
FROM
B
UNION
SELECT
C1 AS id, //long类型
C2 AS duration, //long类型
...
FROM
C
) temp
LEFT JOIN
A ON A.A1 = temp.id
实际上,其中有两个问题:
1.join的关键key类型不匹配,一个是long,一个是string
2.在进行数据计算的时候,会有long类型参与计算转换
优化结果:
1.数据类型一致之后进行join,任务的执行时间由1h减少到0.5h
2.long类型的转换报错,因为有null类型的数据,需要任务的执行从null类型转换变为long类型,就导致报异常错误。
结论:
一般需要检查字段的类型以及字段中的value是否有异常值,做比较充分的数据探查工作。