I. Introduction
Hive is a data warehouse tool built on top of Hadoop for data extraction, transformation, and loading (ETL): a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. It maps structured data files onto database tables and provides SQL query capability, translating SQL statements into MapReduce jobs for execution. Hive's main advantage is its low learning curve: SQL-like statements deliver quick MapReduce-based statistics without the need to develop dedicated MapReduce applications, which makes it a good fit for the statistical analysis side of a data warehouse. (Adapted from Baidu Baike.)
II. Download
On the Apache download page, pick the Tsinghua mirror and choose version 3.1.2; the download yields apache-hive-3.1.2-bin.tar.gz. The compatibility matrix on the download page shows that 3.1.2 works with Hadoop 3.x.x.
III. Installation
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /mylab/soft
IV. Configuration
Configuration is the most tedious part of the whole setup. Dear reader, please be patient, patient, and patient again.
1. Set environment variables
Edit ~/.bashrc:
vi ~/.bashrc
#hive 3.1.2
export HIVE_HOME=/mylab/soft/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
source ~/.bashrc
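After sourcing ~/.bashrc, a quick sanity check confirms the variables actually took effect. A minimal sketch, assuming the same paths as above:

```shell
# Re-declare the variables the way ~/.bashrc sets them, then verify that
# $HIVE_HOME/bin really landed on PATH.
export HIVE_HOME=/mylab/soft/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "PATH OK" ;;
  *)                    echo "PATH missing $HIVE_HOME/bin" ;;
esac
```

If the check passes, the hive command will resolve from any directory (it will still refuse to run until the configuration below is finished).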
2. Configure MySQL for Hive
Create a new user:
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
Create the metastore database:
create database hive;
List all users:
SELECT user, host FROM mysql.user;
Grant privileges (note that ALL ON *.* is broader than Hive needs; granting on hive.* alone would suffice):
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' with grant option;
Flush the privilege tables:
flush privileges;
Show the account's grants:
SHOW GRANTS FOR 'hive'@'%';
Test the login (mind the space: `-p` alone prompts for the password, and the trailing `hive` selects the database; to pass the password inline, write `-phive` with no space):
mysql -u hive -p hive
3. Edit the Hive configuration files
a) Create working directories
Note: these are the directories referenced by the configuration file. Frankly, I am not sure what some of them are for; wherever the configuration asks for a directory, I give it one, all placed under $HIVE_HOME/working, and then grant read/write permission on $HIVE_HOME/working and every subdirectory.
mkdir -p $HIVE_HOME/working/exec.scratchdir
mkdir -p $HIVE_HOME/working/repl.rootdir
mkdir -p $HIVE_HOME/working/repl.cmrootdir
mkdir -p $HIVE_HOME/working/replica.functions.root
mkdir -p $HIVE_HOME/working/local.scratchdir
mkdir -p $HIVE_HOME/working/downloaded.resources
mkdir -p $HIVE_HOME/working/metastore.warehouse
mkdir -p $HIVE_HOME/working/operation.log
mkdir -p $HIVE_HOME/working/querylog
mkdir -p $HIVE_HOME/working/logs
chmod -R 777 $HIVE_HOME/working
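The mkdir calls above can be collapsed into a single loop. In this sketch, WORKROOT points at a scratch path so it can be tried safely; on the real box, point it at $HIVE_HOME/working instead:

```shell
# Create the same working directories as above in one loop.
# WORKROOT is a throwaway demo path; use "$HIVE_HOME/working" for real.
WORKROOT=/tmp/hive-working-demo/working
for d in exec.scratchdir repl.rootdir repl.cmrootdir replica.functions.root \
         local.scratchdir downloaded.resources metastore.warehouse \
         operation.log querylog logs; do
  mkdir -p "$WORKROOT/$d"
done
chmod -R 777 "$WORKROOT"
ls "$WORKROOT" | wc -l   # prints 10, one line per directory
```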
b) Copy mysql-connector-java
The simplest route is to copy mysql-connector-java-8.0.21.jar straight into $HIVE_HOME/lib:
cp /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib
or create a symlink instead:
ln -s /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib/mysql-connector-java-8.0.21.jar
If you are not sure where the jar lives, locate it with find:
sudo find / -name mysql-connector-java-8.0.21.jar
c) Edit hive-env.sh
This file is created by copying its template:
cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh
vi $HIVE_HOME/conf/hive-env.sh
HADOOP_HOME=/mylab/soft/hadoop-3.2.1
export HIVE_CONF_DIR=/mylab/soft/apache-hive-3.1.2-bin/conf
export HIVE_AUX_JARS_PATH=/mylab/soft/apache-hive-3.1.2-bin/lib
d) Edit hive-site.xml
This one is also created from a template (whose name, hive-default.xml.template, is a little misleading):
cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
The file is long, so the full set of changed properties is listed in the appendix (hive-site.xml).
e) Remove an invalid character from hive-site.xml
<property>
<name>hive.txn.xlock.iow</name>
<value>true</value>
<description>
Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks fortransactional tables. This ensures that inserts (w/o overwrite) running concurrently
are not hidden by the INSERT OVERWRITE.
</description>
</property>
The template embeds an invalid character entity (&#8;, an ASCII backspace) between "for" and "transactional" in this description, which is why the two words appear fused above. Delete that character (or the whole <description> element); otherwise Hive aborts at startup with an "Illegal character entity" XML parse error pointing at the exact row and column.
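Rather than hunting for the character by hand, sed can strip it in one pass. A sketch on a scratch file; run the same sed against $HIVE_HOME/conf/hive-site.xml (after backing it up):

```shell
# Scratch file reproducing the broken description from the template.
printf '<description>acquire Exclusive locks for&#8;transactional tables</description>\n' \
  > /tmp/demo-hive-site.xml
# Replace the invalid entity with a plain space (GNU sed in-place edit).
sed -i 's/&#8;/ /g' /tmp/demo-hive-site.xml
cat /tmp/demo-hive-site.xml
```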
f) Fix a bug in the hive launcher script
Copy the file $HIVE_HOME/bin/hive into the shared folder, open it with Windows Notepad (or any editor), and replace every occurrence of "file://" with "file:///" (three forward slashes; how an error like this ever shipped is anyone's guess).
When done, put the fixed file back into $HIVE_HOME/bin.
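If round-tripping through a Windows share and Notepad feels cumbersome, the same replacement can be done in place with sed. A sketch on a scratch stand-in file; aim it at $HIVE_HOME/bin/hive (backed up first) on the real box:

```shell
# Scratch file standing in for $HIVE_HOME/bin/hive (contents are made up
# for the demo; only the file:// pattern matters).
printf 'DEMO_JAR=file://$HIVE_LIB/demo.jar\n' > /tmp/demo-hive-script
# Back up, then replace every file:// with file:/// in place.
cp /tmp/demo-hive-script /tmp/demo-hive-script.bak
sed -i 's|file://|file:///|g' /tmp/demo-hive-script
cat /tmp/demo-hive-script
```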
g) Resolve the guava library version conflict
ls /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava*
guava-27.0-jre.jar
ls /mylab/soft/apache-hive-3.1.2-bin/lib/guava*
guava-19.0.jar
Delete the older jar shipped with Hive and copy Hadoop's newer one into $HIVE_HOME/lib:
rm /mylab/soft/apache-hive-3.1.2-bin/lib/guava-19.0.jar
cp /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /mylab/soft/apache-hive-3.1.2-bin/lib
Alternatively, fetch the latest release from https://mvnrepository.com/artifact/com.google.guava/guava and use that instead.
h) Create hive-log4j2.properties from its template
cp $HIVE_HOME/conf/hive-log4j2.properties.template $HIVE_HOME/conf/hive-log4j2.properties
i) Create hive-exec-log4j2.properties from its template
cp $HIVE_HOME/conf/hive-exec-log4j2.properties.template $HIVE_HOME/conf/hive-exec-log4j2.properties
V. Verification
At last, time to run it.
Initialize the metastore schema:
schematool -dbType mysql -initSchema
Start the metastore service:
hive --service metastore > $HIVE_HOME/working/logs/metastore.log 2>&1 &
Start the hiveserver2 service:
hive --service hiveserver2 > $HIVE_HOME/working/logs/hiveserver2.log 2>&1 &
Or run both in the foreground (interactive mode):
hive --service metastore
hive --service hiveserver2
Check the Hive processes
With jps:
jps
Or with ps (plain form):
ps -ef | grep hive
Compact form (the command is a mouthful, but its output is terse: just the Linux user name and the process ID):
ps -ef | grep hive | grep -v grep | awk '{print $1 " " $2}'
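To see exactly what the awk stage contributes, here it is run against a canned ps-style line (in `ps -ef` output, field 1 is the user and field 2 is the PID; the sample line below is fabricated):

```shell
# One fabricated ps -ef line; awk keeps only the user ($1) and PID ($2).
printf 'hadoop    4242     1  0 10:00 ?  00:00:05 java HiveMetaStore\n' \
  | awk '{print $1 " " $2}'
# prints: hadoop 4242
```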
Start a client
Option 1: the hive CLI
hive
Option 2: beeline
beeline -u jdbc:hive2://master:10000 -n root
beeline
!connect jdbc:hive2://master:10000 -n root
Exit beeline:
!exit
Check hiveserver2 (it listens on port 10000 by default):
netstat -tulnp | grep 10000
For more on using beeline, see:
https://www.cnblogs.com/lenmom/p/11218807.html
https://www.cnblogs.com/xinfang520/p/7684598.html
Stop the Hive services
There is no graceful stop command; the processes can only be killed.
Plain form:
ps -ef | grep hive
kill -9 <pid>
Compact form (a mouthful of a command, but one stroke finishes the job):
ps -ef | grep hive | grep -v grep | awk '{print "kill -9 " $2}' | sh
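Since `| sh` executes whatever awk prints, it is worth previewing the generated commands first: dropping the final `| sh` turns the pipeline into a dry run. Here is the idea on canned input (fabricated process lines):

```shell
# Two fabricated process lines; without "| sh" the pipeline only prints the
# kill commands it would have executed.
printf 'hadoop 4242 1 0 10:00 ? 00:00:05 java HiveMetaStore\nhadoop 4243 1 0 10:00 ? 00:00:07 java HiveServer2\n' \
  | awk '{print "kill -9 " $2}'
# prints:
# kill -9 4242
# kill -9 4243
```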
VI. Appendix
1. hive-site.xml
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/scratchdir</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.repl.rootdir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.rootdir</value>
<description>HDFS root dir for all replication dumps.</description>
</property>
<property>
<name>hive.repl.cmrootdir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.cmrootdir</value>
<description>Root dir for ChangeManager, used for deleted files.</description>
</property>
<property>
<name>hive.repl.replica.functions.root.dir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.replica.functions</value>
<description>Root directory on the replica warehouse where the repl sub-system will store jars from the primary warehouse</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/local_scratchdir</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/downloaded.resources/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/metastore.warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/lib</value>
<description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/mylab/soft/apache-hive-3.1.2-bin/working/operation_log</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- <value>jdbc:derby:;databaseName=metastore_db;create=true</value> -->
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<!-- <value>org.apache.derby.jdbc.EmbeddedDriver</value> -->
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on, default 10000</description>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>