Fun with Big Data, Part 17: Installing Hive (Hive 3.1.2)

I. Introduction


Hive is a data warehouse tool built on top of Hadoop for extracting, transforming, and loading data; it provides a mechanism for storing, querying, and analyzing large-scale datasets kept in Hadoop. Hive maps structured data files onto database tables and offers SQL querying, translating SQL statements into MapReduce jobs for execution. Its big advantage is the low learning curve: SQL-like statements give you quick MapReduce-based statistics without writing dedicated MapReduce programs, which makes it a natural fit for statistical analysis over a data warehouse. (Adapted from Baidu Baike)

II. Download


Download site: http://hive.apache.org/

The download yields: apache-hive-3.1.2-bin.tar.gz

As the download page shows, 3.1.2 is compatible with Hadoop 3.x.x.

(Screenshots: Hive homepage → download page → choose the Tsinghua mirror → choose 3.1.2 → choose the bin tarball.)

III. Installation


        tar zxvf apache-hive-3.1.2-bin.tar.gz -C /mylab/soft

IV. Configuration

    This configuration is the most tedious part of the whole install. Dear reader, please be patient. Patient, and then patient some more.


    1. Set environment variables

            Edit ~/.bashrc

                    vi ~/.bashrc

                        #hive 3.1.2

                        export HIVE_HOME=/mylab/soft/apache-hive-3.1.2-bin

                        export PATH=$PATH:$HIVE_HOME/bin                        

                        export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib

                    source ~/.bashrc
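
            A quick sanity check that the variables took effect (plain shell, nothing Hive-specific):

                    echo $HIVE_HOME    # should print /mylab/soft/apache-hive-3.1.2-bin

                    which hive         # should resolve to $HIVE_HOME/bin/hive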

    2. Configure MySQL for Hive

            Create a new user

                    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';

            Create the database

                    create database hive;

            List all users

                    SELECT user, host FROM mysql.user;

            Grant privileges

                    GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' with grant option;

            Flush privileges

                    flush privileges;

            Show the account's grants

                    SHOW GRANTS FOR 'hive'@'%';

            Test the login

                    mysql -u hive -p hive    # prompts for the password (hive); the trailing argument selects the hive database
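
            Once logged in you can poke around interactively; a non-interactive one-liner also verifies both the credentials and the database (the -phive form passes the password inline, handy in a lab, unwise in production):

                    mysql -u hive -phive -e "SHOW DATABASES LIKE 'hive';"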

    3. Modify the Hive configuration files

        a) Create working directories

Note: these are the directories referenced in the configuration file. Some of them I honestly cannot tell you the purpose of, but wherever the config wants a directory I give it one, all under $HIVE_HOME/working, and then grant read/write permissions on $HIVE_HOME/working and every subdirectory. The directory names below match the values used in the hive-site.xml appendix.

mkdir -p $HIVE_HOME/working/scratchdir

mkdir -p $HIVE_HOME/working/repl.rootdir

mkdir -p $HIVE_HOME/working/repl.cmrootdir

mkdir -p $HIVE_HOME/working/repl.replica.functions

mkdir -p $HIVE_HOME/working/local_scratchdir

mkdir -p $HIVE_HOME/working/downloaded.resources

mkdir -p $HIVE_HOME/working/metastore.warehouse

mkdir -p $HIVE_HOME/working/operation_log

mkdir -p $HIVE_HOME/working/querylog

mkdir -p $HIVE_HOME/working/logs

chmod -R 777 $HIVE_HOME/working
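
Note: this setup keeps everything on the local filesystem. The stock template instead points hive.exec.scratchdir and hive.metastore.warehouse.dir at HDFS paths (/tmp/hive and /user/hive/warehouse); if you keep those HDFS defaults, create the directories in HDFS too:

hdfs dfs -mkdir -p /tmp/hive

hdfs dfs -mkdir -p /user/hive/warehouse

hdfs dfs -chmod g+w /tmp/hive /user/hive/warehouse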

   b) Copy mysql-connector-java

        To keep things simple, copy mysql-connector-java-8.0.21.jar straight into $HIVE_HOME/lib

        cp /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib

        or

        ln -s /usr/share/java/mysql-connector-java-8.0.21.jar $HIVE_HOME/lib/mysql-connector-java-8.0.21.jar

        If you are not sure which directory the jar lives in, hunt it down with find:

         sudo find / -name mysql-connector-java-8.0.21.jar

   c) Modify hive-env.sh

        This file is created by copying the template:

        cp $HIVE_HOME/conf/hive-env.sh.template  $HIVE_HOME/conf/hive-env.sh

        vi $HIVE_HOME/conf/hive-env.sh

            HADOOP_HOME=/mylab/soft/hadoop-3.2.1

            export HIVE_CONF_DIR=/mylab/soft/apache-hive-3.1.2-bin/conf

            export HIVE_AUX_JARS_PATH=/mylab/soft/apache-hive-3.1.2-bin/lib

   d) Modify hive-site.xml

            This one is also copied from a template (whose name, hive-default.xml.template, is a bit of a trap):

            cp $HIVE_HOME/conf/hive-default.xml.template   $HIVE_HOME/conf/hive-site.xml

            The file is long, so see the hive-site.xml appendix for the full set of changes.

   e) Remove a stray special character from hive-site.xml

<property>

    <name>hive.txn.xlock.iow</name>

    <value>true</value>

    <description>

      Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for&#8;transactional tables.  This ensures that inserts (w/o overwrite) running concurrently

      are not hidden by the INSERT OVERWRITE.

    </description>

  </property>

    Delete the &#8; entity so the description reads "for transactional tables"; left in place, this invalid character reference makes hive-site.xml fail to parse at startup.
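
    If you would rather not hunt for it in an editor, a sed one-liner makes the same fix (the pattern is deliberately narrow so nothing else in the file is touched):

        sed -i 's/for&#8;transactional/for transactional/' $HIVE_HOME/conf/hive-site.xml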

        f) Fix a quirk in the hive launcher script

            Copy the $HIVE_HOME/bin/hive script into the shared folder, open it with Notepad on Windows (or any editor), and replace every occurrence of "file://" with "file:///" (three forward slashes; why this error is in there at all is anyone's guess).

            When done, put it back under $HIVE_HOME/bin.
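
            The same edit can be done in place with sed, skipping the round trip through Windows. Note the substitution is global, so it assumes the script contains no file:/// URLs already (those would pick up a fourth slash):

            sed -i 's|file://|file:///|g' $HIVE_HOME/bin/hive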

        g) Resolve the guava version conflict

                ls /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava*

                        guava-27.0-jre.jar

                ls /mylab/soft/apache-hive-3.1.2-bin/lib/guava*

                        guava-19.0.jar

                Delete the older jar shipped with Hive and copy Hadoop's newer one into $HIVE_HOME/lib:

                    rm /mylab/soft/apache-hive-3.1.2-bin/lib/guava-19.0.jar

                    cp /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /mylab/soft/apache-hive-3.1.2-bin/lib

           Alternatively, fetch the newest guava from https://mvnrepository.com/artifact/com.google.guava/guava and use that as the replacement.
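
                A version-agnostic sketch of the same swap, in case your Hadoop ships a different guava (paths as used throughout this post):

                    rm -f $HIVE_HOME/lib/guava-*.jar

                    cp /mylab/soft/hadoop-3.2.1/share/hadoop/common/lib/guava-*-jre.jar $HIVE_HOME/lib/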

        h) Copy hive-log4j2.properties

 cp $HIVE_HOME/conf/hive-log4j2.properties.template   $HIVE_HOME/conf/hive-log4j2.properties

         i) Copy hive-exec-log4j2.properties

cp $HIVE_HOME/conf/hive-exec-log4j2.properties.template $HIVE_HOME/conf/hive-exec-log4j2.properties


V. Verification


        At last, time to take it for a spin.

        Initialize the metastore database

                schematool -dbType mysql -initSchema
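
        If the init succeeds, schematool can report the schema it created, and the metastore tables should be visible in MySQL (reusing the hive/hive credentials from section IV):

                schematool -dbType mysql -info

                mysql -u hive -phive -e "USE hive; SHOW TABLES;"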

        Start the metastore service

                hive --service metastore > $HIVE_HOME/working/logs/metastore.log 2>&1 &

        Start the hiveserver2 service

                hive --service hiveserver2 > $HIVE_HOME/working/logs/hiveserver2.log 2>&1 &

        Or run them in the foreground (interactive mode)

                hive --service metastore

                hive --service hiveserver2

                jps    # from another terminal; both services show up as RunJar processes

        Check the Hive processes

            Plain version

                    ps -ef | grep hive 

            Compact version (the command looks complicated, but the output is minimal: Linux username and process ID)

                    ps -ef | grep hive | grep -v grep | awk '{print $1 " " $2}'

        Start a client

                Method 1

                        hive

                Method 2

                                beeline -u jdbc:hive2://master:10000 -n root

                                beeline

                                        !connect  jdbc:hive2://master:10000 -n root

                                Exit beeline

                                        !exit
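
                                While still connected (before !exit), a quick smoke test confirms HiveServer2 can reach the metastore (the database name here is a throwaway):

                                        show databases;

                                        create database smoke_test;

                                        drop database smoke_test;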

                Check hiveserver2 (default port 10000)

                                netstat -tulnp | grep 10000
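
                The metastore listens on its own port, 9083 by default (that is the thrift://master:9083 URI in hive-site.xml), and can be checked the same way:

                                netstat -tulnp | grep 9083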

                For more on using beeline, see

                https://www.cnblogs.com/lenmom/p/11218807.html

                https://www.cnblogs.com/xinfang520/p/7684598.html

        Stop the Hive services

            There is no stop command; you can only kill them.

            Plain version

                        ps -ef | grep hive

                        kill -9 <pid>

            Compact version (the command looks complicated, but it finishes everything in one stroke)

                        ps -ef | grep hive | grep -v grep | awk '{print "kill -9 " $2}' | sh        

VI. Appendix


    1. hive-site.xml

<property>

    <name>hive.metastore.uris</name>

    <value>thrift://master:9083</value>

    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>

  </property>

<property>

    <name>hive.exec.scratchdir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/scratchdir</value>

    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>

  </property>

<property>

    <name>hive.repl.rootdir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.rootdir</value>

    <description>HDFS root dir for all replication dumps.</description>

  </property>

<property>

    <name>hive.repl.cmrootdir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.cmrootdir</value>

    <description>Root dir for ChangeManager, used for deleted files.</description>

  </property>

<property>

    <name>hive.repl.replica.functions.root.dir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/repl.replica.functions</value>

    <description>Root directory on the replica warehouse where the repl sub-system will store jars from the primary warehouse</description>

  </property>

<property>

    <name>hive.exec.local.scratchdir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/local_scratchdir</value>

    <description>Local scratch space for Hive jobs</description>

  </property>

<property>

    <name>hive.downloaded.resources.dir</name>

<value>/mylab/soft/apache-hive-3.1.2-bin/working/downloaded.resources/${hive.session.id}_resources</value>

    <description>Temporary local directory for added resources in the remote file system.</description>

  </property>

<property>

    <name>hive.metastore.warehouse.dir</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/metastore.warehouse</value>

    <description>location of default database for the warehouse</description>

  </property>

<property>

    <name>hive.aux.jars.path</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/lib</value>

    <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>

  </property>

<property>

    <name>hive.querylog.location</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/querylog</value>

    <description>Location of Hive run time structured log file</description>

  </property>

<property>

    <name>hive.server2.logging.operation.log.location</name>

    <value>/mylab/soft/apache-hive-3.1.2-bin/working/operation_log</value>

    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>

  </property>

<property>

    <name>javax.jdo.option.ConnectionURL</name>

    <!-- <value>jdbc:derby:;databaseName=metastore_db;create=true</value> -->

    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>

    <description>

      JDBC connect string for a JDBC metastore.

      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.

      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.

    </description>

  </property>

<property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <!-- <value>org.apache.derby.jdbc.EmbeddedDriver</value> -->

    <value>com.mysql.cj.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

  </property>

<property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

    <description>Username to use against metastore database</description>

  </property>

<property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>hive</value>

    <description>password to use against metastore database</description>

  </property>

<property>

    <name>hive.server2.thrift.bind.host</name>

    <value>master</value>

    <description>Bind host on which to run the HiveServer2 Thrift service.</description>

  </property>

<property>

    <name>hive.server2.thrift.port</name>

    <value>10000</value>

    <description>TCP port number to listen on, default 10000</description>

  </property>

<property>

    <name>hive.metastore.event.db.notification.api.auth</name>

    <value>false</value>

  </property>

<property>

    <name>hive.cli.print.header</name>

    <value>true</value>

  </property>

<property>

    <name>hive.cli.print.current.db</name>

    <value>true</value>

  </property>


