I. Using Hive Tables (reading data from Hive into Spark SQL)
1. First, set up the Hive environment (split deployment)
(1) Topology: Hive is deployed in split mode:
- one machine serves as the Hive Server (hadoop2)
- one machine serves as the Hive Client (hadoop3)
(2) For installing and configuring Hive itself, see the earlier articles.
(3) Modify the hive-site.xml file on both machines.
All other configuration files are identical on the two machines; only hive-site.xml differs.
(a) The Hive Server's hive-site.xml is configured as follows:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop1:3306/metastore1?serverTimezone=UTC</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/data/hive/iotmp</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/data/hive/operation_logs</value>
</property>
<property>
<name>datanucleus.readOnlyDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
</configuration>
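Note: the JDBC URL above assumes the metastore1 database already exists in MySQL on hadoop1 (the plain URL will not create it). A minimal way to create it, reusing the credentials from the config above:
mysql -h hadoop1 -u root -p123456 -e "CREATE DATABASE IF NOT EXISTS metastore1;"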
(b) The Hive Client's hive-site.xml is configured as follows:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
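<!-- address of the remote metastore; 192.168.1.122 is the Hive Server (hadoop2) in this walkthrough -->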
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.1.122:9083</value>
</property>
</configuration>
2. Configure Spark SQL to support Hive
(1) You only need to copy the following files into the $SPARK_HOME/conf directory (see the example after this list):
- $HIVE_HOME/conf/hive-site.xml (copy the Hive Client's hive-site.xml)
- $HADOOP_CONF_DIR/core-site.xml
- $HADOOP_CONF_DIR/hdfs-site.xml
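For example (a sketch that assumes HIVE_HOME, HADOOP_CONF_DIR, and SPARK_HOME are all set on the machine running Spark):
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
cp $HADOOP_CONF_DIR/core-site.xml $SPARK_HOME/conf/
cp $HADOOP_CONF_DIR/hdfs-site.xml $SPARK_HOME/conf/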
3. Start the Hadoop cluster
4. Start Hive:
(1) On hadoop2: start the Hive Server (the metastore service)
cd /opt/module/hive-1.2.1
bin/hive --service metastore
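Optionally, run the metastore in the background so it keeps running after the terminal closes (the log file name here is arbitrary):
nohup bin/hive --service metastore > metastore.log 2>&1 &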
(2) On hadoop3: start the Hive Client
cd /opt/module/hive-1.2.1
bin/hive
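A quick sanity check from the client that the remote metastore is reachable:
hive> show databases;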
5. Operate on Hive from the Spark Shell
(1) When starting the Spark Shell, specify the MySQL driver with --jars (the jar path below is a placeholder; substitute the location of your MySQL connector jar):
Start Spark:
cd /opt/module/spark-2.1.0-bin-hadoop2.7
bin/spark-shell --master spark://hadoop1:7077 --jars /path/to/mysql-connector-java.jar
spark.sql("select * from default.emp").show
(2) Create a table (this assumes the movle database already exists):
spark.sql("create table movle.src (key INT, value STRING) row format delimited fields terminated by ','")
(3) Load data
spark.sql("load data local inpath '/root/temp/data.txt' into table movle.src")
(4) Query the data
spark.sql("select * from movle.src").show
6. Operate on Hive with the spark-sql CLI
(1) As with the Spark Shell, spark-sql must be started with --jars pointing at the MySQL driver.
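For example (the jar path is again a placeholder for your MySQL connector jar):
cd /opt/module/spark-2.1.0-bin-hadoop2.7
bin/spark-sql --master spark://hadoop1:7077 --jars /path/to/mysql-connector-java.jar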
(2) Operate on Hive. spark-sql is an interactive SQL shell, so statements are entered directly rather than wrapped in spark.sql(...):
show tables;