Hive 2.1.0-用mysql作metastore数据库-安装配置和问题

Hive 2.1.0-用mysql作为metastore数据库-安装配置和问题

环境

  • 虚拟机:
 [root@master whm]# cat /etc/issue 
CentOS release 6.5 (Final) Kernel \r on an \m
  • Hadoop
    伪分布式的hadoop 2.7.3

下载hive 2.1,解压,放到喜欢的文件夹下面

下载apache-hive-2.1.0-bin.tar.gz,解压,放到喜欢的位置.本文放在用户whm的home文件夹.

[whm@master ~]$ ll
total 80
drwxrwxr-x. 10 whm whm 4096 Jan 13 23:16 apache-hive-2.1.0-bin

配置HOME_HIVE和PATH

[whm@master ~]$ su root
Password: 
[root@master whm]# vim /etc/profile
---------------
export HIVE_HOME=/home/whm/apache-hive-2.1.0-bin
export PATH=PATH:HIVE_HOME/bin

在hadoop里建立tmp和warehouse文件夹,更改权限

[whm@master ~]$ hdfs dfs -mkdir /tmp
[whm@master ~]$ hdfs dfs -mkdir /hive/warehouse
[whm@master ~]$ hdfs dfs -chmod 777 /tmp
[whm@master ~]$ hdfs dfs -chmod 777 /hive/warehouse

hive的hive-site.xml中,默认的warehouse地址是/user/hive/warehouse,也可以使用默认值.

安装mysql

使用mysql作为metastore数据库.

[root@master ~]$ yum search mysql

里面有一条:
mysql-server.x86_64 : The MySQL server and related files
安装它:

[root@master ~]$ yum install mysql-server.x86_64

安装mysql-connector并复制到hive的lib

刚才yum search的时候,有一条:
mysql-connector-java.noarch : Official JDBC driver for MySQL
安装它:

[root@master ~]$ yum install mysql-connector-java.noarch

安装好后在/usr/share/java/文件夹.
拷贝到hive的lib文件夹:

[whm@master ~]$ cp /usr/share/java/mysql-connector-java ~/apache-hive-2.1.0-bin/lib/

在mysql中,添加hive metastore数据库,初始化

[whm@master ~]$ su root

Password: 
[root@master whm]# service mysqld start
Starting mysqld:                                           [ OK ]
[root@master whm]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor. Commands end with ; or \g.

创建hive metastore数据库,本文这个数据库叫hive:

mysql> create database hive;

mysql> use hive;

使用hive的schematool初始化(在hive的bin目录):

schematool -dbType mysql -initSchema

hive schemetool

The Hive distribution now includes an offline tool for Hive metastore schema manipulation. This tool can be used to initialize the metastore schema for the current Hive version. It can also handle upgrading the schema from an older version to current. It tries to find the current schema from the metastore if it is available. This will be applicable to future upgrades like 0.12.0 to 0.13.0. In case of upgrades from older releases like 0.7.0 or 0.10.0, you can specify the schema version of the existing metastore as a command line option to the tool.
The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore DB connection information like JDBC URL, JDBC driver and DB credentials are extracted from the Hive configuration. You can provide alternate DB credentials if needed.

注意 :
不知道可不可以不初始化,但是,如果初始化,一定要使用这个schemetool...一开始参考的一篇文章是手动初始化:

mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;

查看$HIVE_HOME/scripts/metastore/upgrade/mysql/,确实有这些文件:

[whm@master ~]$ ll apache-hive-2.1.0-bin/scripts/metastore/upgrade/mysql/
total 760
...
-rw-r--r--. 1 whm whm 35192 Jun  3  2016 hive-schema-1.2.0.mysql.sql
-rw-r--r--. 1 whm whm 34845 Jun  3  2016 hive-schema-1.3.0.mysql.sql
-rw-r--r--. 1 whm whm 34845 Jun  3  2016 hive-schema-2.0.0.mysql.sql
-rw-r--r--. 1 whm whm 35601 Jan 14 01:33 hive-schema-2.1.0.mysql.sql
-rw-r--r--. 1 whm whm  2845 Jun  3  2016 hive-txn-schema-0.13.0.mysql.sql
-rw-r--r--. 1 whm whm  2845 Jun  3  2016 hive-txn-schema-0.14.0.mysql.sql
-rw-r--r--. 1 whm whm  4139 Jun  3  2016 hive-txn-schema-1.3.0.mysql.sql
-rw-r--r--. 1 whm whm  3770 Jun  3  2016 hive-txn-schema-2.0.0.mysql.sql
-rw-r--r--. 1 whm whm  4327 Jan 14 01:32 hive-txn-schema-2.1.0.mysql.sql
...

但如果如法炮制2.1.0的话,就会报错:

Failed to open file 'hive-txn-schema-2.1.0.mysql.sql', error: 2

因为参考的0.14版本,虽然也有txn那个文件,但是scheme文件中完全没有用到txn文件

而2.1.0的scheme文件中有一句:

SOURCE hive-txn-schema-2.1.0.mysql.sql;

都没有绝对路径当然找不到了...

如果试图手动改成绝对路径重新执行的话,因为其他的sql语句里有一些insert,就会报一些Duplicate key name,Duplicate entry等错误...而且txn那个文件里建表不检查是否exists,多执行几遍又会报表已经存在错误..

**使用schemetool就可以避免这些麻烦,而且很明确的知道是否初始化成功. **

在mysql中,添加连接metastore使用的user,授权

mysql> create user 'hiveuser'@'%' identified by 'hivepassword';
mysql> grant all on *.* to 'hiveuser'@localhost identified by 'hivepassword';
mysql> flush privileges;

此处hiveuser替换为喜欢的名字,hivepassword替换为喜欢的密码.本文直接使用这俩...

配置hive-env.sh

复制template:

[whm@master ~]$ cp apache-hive-2.1.0-bin/conf/hive-env.sh.template apache-hive-2.1.0-bin/conf/hive-env.sh

配置:

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/home/whm/hadoop-2.7.3

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/home/whm/apache-hive-2.1.0-bin/conf

配置hive-site.sml

复制template:

[whm@master ~]$ cp apache-hive-2.1.0-bin/conf/hive-default.xml.template apache-hive-2.1.0-bin/conf/hive-site.xml

配置:

<configuration>
    <property>  
        <name>hive.metastore.warehouse.dir</name>  
        <!--之前hdfs中创建的warehouse文件夹-->  
        <value>/hive/warehouse</value>  
        <description>location of default database for the warehouse</description>  
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!--这里的hive就是刚才建立的metastore数据库名字-->
        <value>jdbc:mysql://localhost/hive?createDatabaseIfNotExist=true</value>
        <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <!--刚才建的mysql用户-->
        <value>hiveuser</value>
        <description>user name for connecting to mysql server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <!--刚才建的mysql用户的密码-->
        <value>hivepassword</value>
        <description>password for connecting to mysql server</description>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <!--之前在hdfs中建的tmp文件夹-->
        <value>/tmp/hive</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
    </property>
  
    <!--以下5个属性在default文件中都是通过${system:java.io.tmpdir}/${system:user.name}定义的,改成自己在hive目录下建的iotmp-->
    <property>
        <name>hive.querylog.location</name>
        <value>/home/whm/apache-hive-2.1.0-bin/iotmp/querylog</value>
        <description>Location of Hive run time structured log file</description>
    </property>
    <property>
        <name>hive.server2.logging.operation.log.location</name>
        <value>/home/whm/apache-hive-2.1.0-bin/iotmp/operation_logs</value>
        <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
   </property>
    <property>
        <name>hive.server2.logging.operation.log.location</name>
        <value>/home/whm/apache-hive-2.1.0-bin/iotmp/operation_logs</value>
        <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
    </property>
    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/home/whm/apache-hive-2.1.0-bin/iotmp/resource_dir</value>
        <description>Temporary local directory for added resources in the remote file system.</description>
    </property>
    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/home/whm/apache-hive-2.1.0-bin/iotmp/scratchdir</value>
        <description>Local scratch space for Hive jobs</description>
    </property>
</configuration>

改后面5个属性的文件夹,是因为我的报过错,然后在网上找的解决办法:

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D  

如果幸运的话,就可以用了

Hive console

[whm@master ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
...
hive> create table test (id int, name string);
hive> show tables;
OK
test
Time taken: 0.163 seconds, Fetched: 1 row(s)

查看metastore:

mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
|      1 |  1484329787 |     1 |                0 | whm  |         0 |     1 | test     | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)

beeline连接hiveserver2

连接之前要先设置代理用户,否则会报错.

在hadoop的core-site.xml中,设置如下属性(proxyuser后面是运行hive的超级用户,本文用whm运行):

<property> 
    <name>hadoop.proxyuser.whm.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.whm.groups</name>
    <value>*</value>
</property>

设置了以后, 无论使用什么用户登陆 (后面举例使用whatever),都使用hive超级用户 (本文是whm用户, 其启动了hiveserver2) 来代理whatever, 使whatever以whm的权限进行操作, 但所建立的表还是属于whatever.后面有例子.

在一个窗口中启动hiveserver2:

[whm@master ~]$ hiveserver2 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/whm/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/whm/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

此时jps,多了一个RunJar:

[whm@master ~]$ jps
9456 SecondaryNameNode
9824 Jps
9282 DataNode
9149 NameNode
9583 RunJar

另一个窗口启动beeline,用户名写随便什么都可以,密码空,直接回车.(能做到这一点和我们前面把hdfs上的文件夹/tmp和/hive/warehouse权限设置为777有关.)

比如先用whm连接,建表, 再用随便起的whatever连接, 建表,效果如下:

whm连接:

[whm@master ~]$ beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: whm
Enter password for jdbc:hive2://localhost:10000: 
17/01/15 18:57:29 INFO jdbc.Utils: Supplied authorities: localhost:10000
17/01/15 18:57:29 INFO jdbc.Utils: Resolved authority: localhost:10000
17/01/15 18:57:29 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (1.832 seconds)
0: jdbc:hive2://localhost:10000> create table whmbeelineteset (id int, name string);
No rows affected (0.491 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-------------------+--+
|     tab_name      |
+-------------------+--+
| test              |
| whmbeelineteset  |
+-------------------+--+
2 rows selected (0.238 seconds)
0: jdbc:hive2://localhost:10000> 

whatever连接:

[whm@master ~]$ beeline
Beeline version 1.2.1.spark2 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: whatever
Enter password for jdbc:hive2://localhost:10000: 
17/01/15 20:53:48 INFO jdbc.Utils: Supplied authorities: localhost:10000
17/01/15 20:53:48 INFO jdbc.Utils: Resolved authority: localhost:10000
17/01/15 20:53:48 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.1.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> create table whatevertest (id int, name string);
No rows affected (0.116 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+-------------------+--+
|     tab_name      |
+-------------------+--+
| test              |
| whatevertest      |
| whmbeelinetest  |
+-------------------+--+
4 rows selected (0.079 seconds)

如果之前warehouse没有设置为777,会报错:

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hiveuser, access=WRITE, inode="/hive/warehouse/hiveusertest":whm:supergroup:drwxrwxr-x

当然如果只用hive超级用户连接不必设置为777,设置为775就可以了.

此时hdfs上,warehouse:

[whm@master ~]$ hdfs dfs -ls /hive/warehouse
Found 3 items
drwxrwxrwx   - whm     supergroup          0 2017-01-14 01:49 /hive/warehouse/test
drwxrwxrwx   - whatever supergroup          0 2017-01-15 20:54 /hive/warehouse/whatevertest
drwxrwxrwx   - whm     supergroup          0 2017-01-15 20:45 /hive/warehouse/whmbeelinetest

表whatevertest属于用户whatever, 其他的属于whm.

删除表:

0: jdbc:hive2://localhost:10000> drop table whmbeelinetest;
No rows affected (0.206 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+---------------+--+
|   tab_name    |
+---------------+--+
| test          |
| whatevertest  |
+---------------+--+
2 rows selected (0.066 seconds)

warehouse:

[whm@master ~]$ hdfs dfs -ls /hive/warehouse
Found 2 items
drwxrwxrwx   - whm     supergroup          0 2017-01-14 01:49 /hive/warehouse/test
drwxrwxrwx   - whatever supergroup          0 2017-01-15 20:54 /hive/warehouse/whatevertest

但还有一个疑问

为什么不设置代理用户,用whm连接会报错呢?不输密码报错:

Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: whm is not allowed to impersonate whm (state=,code=0)

User: whm is not allowed to impersonate whm
whm不允许模拟whm...

输密码 (输了用户在linux的密码) 也报错:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default;auth=noSasl: java.net.ConnectException: Connection refused (state=08S01,code=0)

beeline也是whm启动的,hiveserver2也是whm启动的,whm都是它们的超级用户.

使用whm作为beeline连接时的用户名,为什么也需要设置代理用户呢...

其他

最初安装的时候不知怎么搞得把mysql搞坏了,hive那边可以正常建表,hdfs上也能看到,但是mysql的TBLS表里就是没有信息,Empty Set.于是完全卸载重装mysql:

yum remove mysql mysql-server mysql-libs compat-mysql51
rm -rf /var/lib/mysql #这步很重要
rm /etc/my.cnf

参考

http://bit1129.iteye.com/blog/2169918
http://mangocool.com/1465288694998.html
https://dzone.com/articles/how-configure-mysql-metastore
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-RemoteMetastoreDatabase
http://blog.csdn.net/reesun/article/details/8556078

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 216,039评论 6 498
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 92,223评论 3 392
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 161,916评论 0 351
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,009评论 1 291
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,030评论 6 388
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,011评论 1 295
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,934评论 3 416
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,754评论 0 271
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,202评论 1 309
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,433评论 2 331
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,590评论 1 346
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,321评论 5 342
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,917评论 3 325
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,568评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,738评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,583评论 2 368
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,482评论 2 352

推荐阅读更多精彩内容