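1. Introduction to Hive fetch mode
1. Under the hood, Hive runs queries as MapReduce programs, and running a MapReduce program is time-consuming. Fetch mode lets certain simple queries skip the MapReduce job.
2. Configuration parameter: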
<property>
<name>hive.fetch.task.conversion</name>
<value>minimal</value>
<description>
Some select queries can be converted to single FETCH task minimizing latency.
Currently the query should be single sourced not having any subquery and should not have
any aggregations or distincts (which incurs RS), lateral views and joins.
0. none : disable hive.fetch.task.conversion
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
</description>
</property>
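-> none: every SQL statement runs as a MapReduce job, whatever the query
-> minimal: no MapReduce job for SELECT *, filters on partition columns, or LIMIT
-> more: no MapReduce job for SELECT (including specific columns), filters on any column, or LIMIT
-> The default is more in Hive 0.14.0 and later; earlier releases default to minimal, which is the value shown in the snippet above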
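
The three modes can be compared in a Hive session by changing the setting and rerunning similar queries. This is only a sketch: the table emp, its columns name and sal, and its partition column dt are hypothetical and do not come from these notes.

```sql
-- Hypothetical table: emp(name STRING, sal DOUBLE) PARTITIONED BY (dt STRING)

-- none: even the simplest query is submitted as a MapReduce job
SET hive.fetch.task.conversion=none;
SELECT * FROM emp LIMIT 5;

-- minimal: SELECT *, partition-column filters and LIMIT use a fetch task
SET hive.fetch.task.conversion=minimal;
SELECT * FROM emp WHERE dt = '2015-01-01' LIMIT 5;   -- fetch task
SELECT name FROM emp;                                -- column projection: launches a job

-- more: column projection and filters on ordinary columns also use a fetch task
SET hive.fetch.task.conversion=more;
SELECT name, sal FROM emp WHERE sal > 3000 LIMIT 5;  -- fetch task

-- Aggregations are not converted to a fetch task under any setting
SELECT dt, count(*) FROM emp GROUP BY dt;
```

Prefixing any of these queries with EXPLAIN shows whether the plan is a fetch stage only or includes a MapReduce stage.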
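2. Introduction to Hive virtual columns
0. Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
1. INPUT__FILE__NAME: shows which file each row of data comes from
2. BLOCK__OFFSET__INSIDE__FILE: the byte offset of the first character of each row, measured from the start of the file
3. When typing a virtual column name by hand, note that the words are separated by two consecutive underscores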
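
As a minimal sketch, both virtual columns can be selected next to ordinary columns or used in a filter. The table emp and its column name are hypothetical placeholders.

```sql
-- Hypothetical table: emp(name STRING, ...)
-- Each word in a virtual column name is separated by TWO underscores.
SELECT INPUT__FILE__NAME,            -- file the row was read from
       BLOCK__OFFSET__INSIDE__FILE,  -- byte offset of the row within that file
       name
FROM emp
LIMIT 10;

-- Virtual columns can also appear in a WHERE clause
SELECT * FROM emp WHERE BLOCK__OFFSET__INSIDE__FILE < 100;
```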