Ecosystem


Component architecture:
HiveServer2 (accessed via beeline), Hive, and the metastore database (metadb)

Execution Engine – The component which executes the execution plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these different stages of the plan and executes these stages on the appropriate system components.
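
One way to look at the stage DAG described above is Hive's EXPLAIN statement, which prints the plan instead of running the query. This is a minimal sketch; the table page_views and its columns are hypothetical and do not come from the original notes.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS page_views (
  user_id STRING,
  url     STRING
);

-- EXPLAIN prints the execution plan rather than executing the query.
EXPLAIN
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;
```

The output typically contains a STAGE DEPENDENCIES section, listing which stages depend on which, followed by a STAGE PLANS section describing each stage.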

Connecting to HiveServer2
GUI, CLI, or JDBC (beeline)
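Data sources
Tools such as Kafka and Sqoop bring data in and land it on HDFS. The data comes in all kinds of structures:
relational database tables, MongoDB or JSON data, or logs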
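
Once such files are on HDFS, one common way to make them queryable is an external table that points at the directory. The sketch below assumes tab-separated log files under a hypothetical path /data/logs/web; the table name, columns, and path are all illustrative.

```sql
-- Hypothetical layout: tab-separated log lines already sitting on HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ts      STRING,
  level   STRING,
  message STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/logs/web';

-- The files are read in place; no data is copied when the table is created.
SELECT level, COUNT(*) AS n
FROM web_logs
GROUP BY level;
```

Because the table is EXTERNAL, dropping it removes only the metadata and leaves the files on HDFS.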
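Executing HQL
Behind the scenes, a query runs as MapReduce or Tez jobs (much as a Pig Latin script is executed by Pig).
insert into test values("wangyuq","123");
Check the tracking URL that the job prints to follow its progress.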
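
For context, a fuller session around that insert might look like the sketch below. The notes do not show the schema of test, so the column names here are assumptions; switching the engine to Tez also assumes Tez is installed on the cluster.

```sql
-- Optional: pick the execution engine for this session (for example mr or tez).
SET hive.execution.engine=tez;

-- Assumed schema: the original notes only show two string values being inserted.
CREATE TABLE IF NOT EXISTS test (
  name   STRING,
  passwd STRING
);

-- This statement is what launches the job.
INSERT INTO test VALUES ('wangyuq', '123');

SELECT * FROM test;
```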
Stage
Before your data is moved to its destination, it is staged in a temporary location for a while. The staging files are discarded at the end.
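Comparison
Pig is a good ETL tool for processing unstructured data.
Hive is not a relational database. It only maintains metadata for data stored on HDFS, so that working with big data feels like working with SQL tables, although HQL differs slightly from SQL. It lets us use SQL to run MapReduce and to query data on HDFS.
Hive keeps table definitions in the metastore. The metastore defaults to Derby, but it can be configured to use another database.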
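
To see what the metastore records for a table, DESCRIBE FORMATTED is a convenient starting point. A small sketch, using a hypothetical table demo_users:

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS demo_users (
  id   INT,
  name STRING
);

-- List the tables registered in the current database.
SHOW TABLES;

-- Show the metadata Hive keeps for the table: columns, table type,
-- storage format, and the HDFS location of the data files.
DESCRIBE FORMATTED demo_users;
```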
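Drawbacks
Hive makes no promises about optimization; its goal is simplicity. As a result it is not suited to real-time queries, and performance is poor.
Support for indexes and views is limited (partitions and buckets help make up for this).
Its data types are not exactly the same as SQL's.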
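
As an illustration of how partitions and buckets stand in for indexes, here is a minimal sketch. The table events, its columns, the bucket count, and the date value are all hypothetical.

```sql
-- Each distinct dt value becomes its own directory;
-- rows within a partition are hashed on user_id into 8 files.
CREATE TABLE IF NOT EXISTS events (
  user_id STRING,
  action  STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS ORC;

-- A filter on the partition column lets Hive read only the matching directory
-- instead of scanning the whole table.
SELECT action, COUNT(*) AS n
FROM events
WHERE dt = '2017-12-11'
GROUP BY action;
```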
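Relationship with HDFS
Hive lives on top of HDFS: the data is on HDFS, and the schema is in the metastore.
LOAD statement: moves data from an HDFS path into Hive, so the original HDFS location no longer holds it. The actual data files are simply moved into Hive's directory (which is itself on HDFS).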
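
The move behavior can be sketched as follows. The table raw_lines and both file paths are hypothetical. The LOCAL variant is included for contrast, since it copies the file rather than moving it.

```sql
-- Hypothetical single-column table for raw text lines.
CREATE TABLE IF NOT EXISTS raw_lines (line STRING);

-- Without LOCAL: the file is moved from the given HDFS path into the
-- table's directory, so it no longer exists at the source path afterwards.
LOAD DATA INPATH '/tmp/input/sample.txt' INTO TABLE raw_lines;

-- With LOCAL: the file is copied from the local filesystem,
-- and the original file stays where it was.
LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE raw_lines;
```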

Last edited: 2017.12.11 06:32:12

