Hive Installation
2) Extract apache-hive-3.1.2-bin.tar.gz into the /opt/module/ directory
[atguigu@hadoop102 software]$ tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
3) Rename /opt/module/apache-hive-3.1.2-bin/ to hive
[atguigu@hadoop102 software]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
4) Edit /etc/profile.d/my_env.sh to add the environment variables (the content to add is listed in step 5 below)
[atguigu@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
Create the hive-site.xml configuration file under $HIVE_HOME/conf with the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop01:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Xindeco123!</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop01:9083</value>
</property>
<!-- hiveserver2 -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop01</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
<!-- Increase the Spark client connection timeout -->
<property>
<name>hive.spark.client.connect.timeout</name>
<value>10000ms</value>
</property>
<!-- Set the target YARN queue -->
<property>
<name>mapred.job.queue.name</name>
<value>a</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>a</value>
</property>
<property>
<name>mapred.queue.names</name>
<value>a</value>
</property>
<!-- Location of the Spark dependencies -->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://djcluster/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
</configuration>
5) Add the following content to /etc/profile.d/my_env.sh (the file opened in step 4)
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
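After saving my_env.sh, source it so the new variables take effect; hive --version is an optional sanity check (not part of the original steps):
[atguigu@hadoop102 software]$ source /etc/profile.d/my_env.sh
[atguigu@hadoop102 software]$ hive --version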
6) Resolve the log4j Jar conflict
[atguigu@hadoop102 software]$ mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
Note: watching hive.log under /tmp/atguigu helps troubleshoot Hive problems (by default Hive writes its log to /tmp/<user>/hive.log).
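For example, you can follow the log in another terminal while reproducing a problem:
[atguigu@hadoop102 software]$ tail -f /tmp/atguigu/hive.log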
MySQL Installation
1) Check whether MySQL is already installed on this system
[atguigu@hadoop102 ~]$ rpm -qa|grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
// If it exists, uninstall it with the following command
[atguigu @hadoop102 ~]$ sudo rpm -e --nodeps mariadb-libs
Uninstall the bundled mysql-libs (if MySQL was installed before, remove all of it):
[atguigu@hadoop102 software]$ rpm -qa | grep -i -E mysql\|mariadb | xargs -n1 sudo rpm -e --nodeps
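Optional check (not part of the original steps): after uninstalling, the same query should return nothing.
[atguigu@hadoop102 software]$ rpm -qa | grep -i -E mysql\|mariadb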
2) Copy the MySQL installation packages to the /opt/software directory
Upload the installation packages and the JDBC driver to /opt/software, 6 files in total:
01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
mysql-connector-java-5.1.27-bin.jar
3) Install MySQL
(1) Install the MySQL dependencies
[atguigu@hadoop102 software]$ sudo rpm -ivh 01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$ sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$ sudo rpm -ivh 03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
(2) Install mysql-client
[atguigu@hadoop102 software]$ sudo rpm -ivh 04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
(3) Install mysql-server
[atguigu@hadoop102 software]$ sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
(4) Start MySQL
[atguigu@hadoop102 software]$ sudo systemctl start mysqld
(5) Look up the temporary MySQL root password
[atguigu@hadoop102 software]$ sudo cat /var/log/mysqld.log | grep password
================================================
If no password can be found:
1. Delete the data left over from a previous MySQL installation (this step is critical; it is usually the cause of the problem)
rm -rf /var/lib/mysql
2. Restart the mysqld service
systemctl restart mysqld
3. Look for the temporary password again
grep 'temporary password' /var/log/mysqld.log
================================================
If dependencies are missing, install them via yum and then reinstall mysql-community-server-5.7.16-1.el7.x86_64:
[atguigu@hadoop102 software]$ sudo yum install -y libaio
================================================
4) Log in to the MySQL database
[atguigu @hadoop102 opt]$ mysql -uroot -p
Enter password: (enter the temporary password generated above)
Login succeeds.
(1) Set a complex password (the MySQL password policy requires a sufficiently complex password)
mysql> set password=password("Qs23=zs32");
(2) Relax the MySQL password policy (if the complexity requirement is a nuisance)
mysql> set global validate_password_length=4;
mysql> set global validate_password_policy=0;
5) The root user's password must be changed first; otherwise other operations will fail with an error
mysql> set password = password("<new password>");
6) Update the root user in the mysql library's user table to allow connections from any IP
mysql> update mysql.user set host='%' where user='root';
mysql> flush privileges;
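Optional check (not part of the original steps): confirm that the host column for root is now '%'.
mysql> select host, user from mysql.user;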
Configure Hive Metadata in MySQL
1) Copy the driver
Copy the MySQL JDBC driver into Hive's lib directory
[atguigu@hadoop102 software]$ cp /opt/software/mysql-connector-java-5.1.27-bin.jar $HIVE_HOME/lib
2) Log in to MySQL
[atguigu@hadoop102 software]$ mysql -uroot -p
Enter password: (enter the root password set earlier)
3) Create the Hive metastore database
mysql> create database metastore;
mysql> quit;
4) Initialize the Hive metastore schema
[atguigu@hadoop102 software]$ schematool -initSchema -dbType mysql -verbose
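Optional check (not part of the original steps): log back into MySQL and verify that the metastore database now contains the Hive metadata tables (e.g. DBS, TBLS):
mysql> use metastore;
mysql> show tables;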
5) Change the character set of the fields that store comments in the Hive metastore database to utf-8 (these tables only exist after the schema has been initialized, which is why this runs after schematool):
[atguigu@hadoop102 software]$ mysql -uroot -p
mysql> use metastore;
(1) Column comments
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
(2) Table comments
mysql> alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
mysql> quit;
6) Modify the JDBC URL in hive-site.xml as follows:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
Access Hive via the Metastore Service
Write a startup script for the Hive services. Started by hand, the two services look like this:
[atguigu@hadoop202 hive]$ nohup hive --service metastore 2>&1 &
[atguigu@hadoop202 hive]$ nohup hive --service hiveserver2 2>&1 &
The script below wraps these two commands. Create it with:
[atguigu@hadoop102 hive]$ vim $HIVE_HOME/bin/hiveservices.sh
=====================================================================
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
    mkdir -p $HIVE_LOG_DIR
fi

# Check whether a process is running normally: $1 is the process name, $2 is its port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    [ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running properly"
    ;;
*)
    echo "Invalid Args!"
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
===========================================================================
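Add execute permission and use the script. Once HiveServer2 is up, an optional check (not part of the original steps) is to connect with Beeline; the host and port below assume the hive.server2 settings from the hive-site.xml above:
[atguigu@hadoop102 hive]$ chmod +x $HIVE_HOME/bin/hiveservices.sh
[atguigu@hadoop102 hive]$ hiveservices.sh start
[atguigu@hadoop102 hive]$ hiveservices.sh status
[atguigu@hadoop102 hive]$ beeline -u jdbc:hive2://hadoop01:10000 -n atguigu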
Hive on Spark
(1) Upload and extract spark-3.0.0-bin-hadoop3.2.tgz
[atguigu@hadoop102 software]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
[atguigu@hadoop102 software]$ mv /opt/module/spark-3.0.0-bin-hadoop3.2 /opt/module/spark
(2) Edit spark-env.sh in /opt/module/spark/conf
[atguigu@hadoop102 software]$ cd /opt/module/spark/conf
[atguigu@hadoop102 conf]$ vi spark-env.sh
Add the following line at the very bottom of the file (Shift+G jumps to the end in vi):
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
(3) Configure the SPARK_HOME environment variable
[atguigu@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
Add the following content:
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
source the file so it takes effect
[atguigu@hadoop102 software]$ source /etc/profile.d/my_env.sh
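Optional sanity check (not part of the original steps): confirm the spark commands are now on the PATH.
[atguigu@hadoop102 software]$ spark-submit --version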
(4) Create a Spark configuration file for Hive
[atguigu@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
Add the following content (jobs submitted by Hive will run with these parameters):
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://djcluster/spark-history
spark.executor.memory 8g
spark.serializer org.apache.spark.serializer.KryoSerializer
Create the following path in HDFS to store the Spark history logs
[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-history
Upload and extract spark-3.0.0-bin-without-hadoop.tgz (the pure Spark build without bundled Hadoop)
[atguigu@hadoop102 software]$ tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz
Upload the jars from the pure (without-Hadoop) Spark build to HDFS
[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[atguigu@hadoop102 software]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
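Optional check (not part of the original steps): confirm the jars landed in HDFS.
[atguigu@hadoop102 software]$ hadoop fs -ls /spark-jars | head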
Modify the hive-site.xml file
[atguigu@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml
Add the following content (these two properties are already included in the full hive-site.xml shown earlier; if you used that file, this step is already done):
<!-- Location of the Spark dependencies (note: if a host:port HDFS URL is used, the port, e.g. 8020, must match the NameNode port; here the HA nameservice djcluster is used, so no port appears) -->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://djcluster/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
Test Hive on Spark
(1) Start the Hive client
[atguigu@hadoop102 hive]$ bin/hive
(2) Create a test table
hive (default)> create table student(id int, name string);
(3) Test with an insert (the first query has to launch a Spark session on YARN, so it may take a while)
hive (default)> insert into table student values(1,'abc');
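If the insert succeeds, an optional follow-up check (not part of the original steps) is to read the row back:
hive (default)> select * from student;
The result should include the row 1  abc.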