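Hive Optimization, Part 1: Fetch Task
Scenarios this optimization covers:
1. Running a full-table query in Hive such as select * from emp, or a query that selects only specific columns
2. Running a query in Hive that filters on partition columns
3. Running a LIMIT query that returns only the first 10 or 20 rows
With fetch task conversion enabled, none of these operations launches a MapReduce job.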
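The three scenarios can be sketched as HiveQL queries. Only the emp table comes from the text; the column names ename and sal, the partitioned table emp_part, and its partition column dt are hypothetical and used purely for illustration.

```sql
-- Scenario 1: full-table select, and a select of specific columns
-- (ename and sal are assumed column names).
SELECT * FROM emp;
SELECT ename, sal FROM emp;

-- Scenario 2: filter on a partition column
-- (emp_part and its partition column dt are hypothetical).
SELECT * FROM emp_part WHERE dt = '2015-01-01';

-- Scenario 3: return only the first 10 rows.
SELECT * FROM emp LIMIT 10;
```

Going by the property description, the SELECT * queries with a partition filter or LIMIT already qualify under minimal, while selecting individual columns needs more.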
How to enable it
Add the hive.fetch.task.conversion property to hive-site.xml, as described below:
<property>
<name>hive.fetch.task.conversion</name>
<value>more</value>
<description>
Some select queries can be converted to single FETCH task minimizing latency.
Currently the query should be single sourced not having any subquery and should not have
any aggregations or distincts (which incurs RS), lateral views and joins.
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
</description>
</property>
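Testing the optimization
Exit the Hive command line and start it again, so that the property newly added to the hive-site.xml configuration file is loaded.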
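After restarting, one way to confirm that the property was picked up is to print it from the Hive command line. In the CLI, set followed by a property name and no value prints the setting currently in effect.

```sql
-- Print the value of the property for the current session.
set hive.fetch.task.conversion;
```

If the change to hive-site.xml was loaded, this should report the value more.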
Before and after comparison
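A minimal sketch of such a comparison, assuming the emp table has a column named ename (a hypothetical name). It switches the property within one session using set instead of editing hive-site.xml, so the same query can be run under both values.

```sql
-- Before: with minimal, selecting a single column is not converted,
-- so this query is expected to launch a MapReduce job.
set hive.fetch.task.conversion=minimal;
SELECT ename FROM emp;

-- After: with more, the same query should run as a single fetch task,
-- with no MapReduce job appearing in the console output.
set hive.fetch.task.conversion=more;
SELECT ename FROM emp;
```

Prefixing either query with EXPLAIN prints the planned stages without running the query, which is another way to check whether a MapReduce stage is involved.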