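The official Apache Hive documentation describes how to configure Hive. Configuration can be supplied in several ways. The main ones are listed below, using the MapReduce scratch directory property hive.exec.scratchdir as the example: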
- Method 1: the set command inside a Hive session
set hive.exec.scratchdir=/tmp/mydir;
- Method 2: the --hiveconf option of the hive command
bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir
- Method 3: the hive-site.xml configuration file
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
- Method 4: the hivemetastore-site.xml and hiveserver2-site.xml configuration files (read only by the metastore service and by HiveServer2, respectively)
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/mydir</value>
<description>Scratch space for Hive jobs</description>
</property>
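When a property is set in more than one of these places, the value is resolved in the following order (later entries take precedence):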
hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> '--hiveconf' 命令行参数
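The precedence rule above can be modeled as an ordered merge in which each later source overrides the earlier ones. The Python sketch below is illustrative only: the dictionary-of-sources shape and the function name effective_config are hypothetical, and the model ignores the fact that hivemetastore-site.xml and hiveserver2-site.xml are read only by their own server processes.

```python
# Minimal model of the precedence order: sources are applied from lowest to
# highest priority, so a value from a later source replaces an earlier one.
# The source names mirror the list above; the values are made up.
from typing import Dict

# Lowest priority first, matching the order listed above.
PRECEDENCE = [
    "hive-site.xml",
    "hivemetastore-site.xml",
    "hiveserver2-site.xml",
    "--hiveconf",
]


def effective_config(sources: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge configuration sources; later entries in PRECEDENCE win."""
    merged: Dict[str, str] = {}
    for name in PRECEDENCE:
        # A source that is not present contributes nothing.
        merged.update(sources.get(name, {}))
    return merged


if __name__ == "__main__":
    sources = {
        "hive-site.xml": {"hive.exec.scratchdir": "/tmp/from-site"},
        "hiveserver2-site.xml": {"hive.exec.scratchdir": "/tmp/from-hs2"},
        "--hiveconf": {"hive.exec.scratchdir": "/tmp/mydir"},
    }
    print(effective_config(sources)["hive.exec.scratchdir"])  # /tmp/mydir
```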
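$HIVE_HOME/conf also contains a default configuration template, hive-default.xml.template, which lists the default values of the parameters. Copy this template to hive-site.xml and set new parameter values there: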
cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
Next, continue configuring hive-site.xml so that the metastore is stored in MySQL; the official Hive documentation describes this setup.
# Copy the MySQL JDBC driver jar into Hive's lib directory
cp mysql-connector-java-8.0.17.jar $HIVE_HOME/lib/
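Edit hive-site.xml and set the JDBC connection URL (wkh11 is the hostname of the MySQL server), the driver class, the username and password, and the default warehouse location in HDFS: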
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://wkh11:3306/hivedb?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
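<!-- Connector/J 8.x class name; the old com.mysql.jdbc.Driver still loads but is deprecated -->
<value>com.mysql.cj.jdbc.Driver</value>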
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
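Because hive-site.xml starts as a copy of the large template, it is easy to add a second <property> with the same name instead of editing the existing one, or to mistype the JDBC URL. The Python sketch below, using only the standard library, reads the <property> entries and splits the MySQL URL into host, port, and database. The helper names read_properties and mysql_target are hypothetical, and keeping the last value for a repeated name is an assumption of this sketch, not a statement about Hive's behavior.

```python
# Sketch: list <property> values from a hive-site.xml and check the
# metastore connection settings. Assumes the Hadoop-style layout
# <configuration><property><name/><value/></property>...</configuration>.
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Optional, Tuple, Union
from urllib.parse import urlparse

# Properties edited in the step above.
METASTORE_KEYS = [
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "hive.metastore.warehouse.dir",
]

# Illustrative input mirroring part of the configuration above.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://wkh11:3306/hivedb?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>"""


def read_properties(xml_text: Union[str, bytes]) -> Tuple[Dict[str, str], List[str]]:
    """Return (name -> value, names that are defined more than once).

    For a repeated name this sketch keeps the last value it sees.
    """
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    seen: List[str] = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if not name:
            continue
        props[name] = (prop.findtext("value") or "").strip()
        seen.append(name)
    dupes = [n for n, count in Counter(seen).items() if count > 1]
    return props, dupes


def mysql_target(jdbc_url: str) -> Tuple[Optional[str], int, str]:
    """Split jdbc:mysql://host:port/db?... into (host, port, database)."""
    if not jdbc_url.startswith("jdbc:mysql://"):
        raise ValueError(f"not a MySQL JDBC URL: {jdbc_url}")
    # Strip "jdbc:" so urlparse sees an ordinary mysql:// URL.
    parsed = urlparse(jdbc_url[len("jdbc:"):])
    # 3306 is MySQL's default port when the URL omits one.
    return parsed.hostname, parsed.port or 3306, parsed.path.lstrip("/")


if __name__ == "__main__":
    props, dupes = read_properties(SAMPLE)
    for key in METASTORE_KEYS:
        print(f"{key} = {props.get(key, '<not set>')}")
    print("duplicates:", dupes)
    print(mysql_target(props["javax.jdo.option.ConnectionURL"]))  # ('wkh11', 3306, 'hivedb')
```

To check a real file, pass the contents of $HIVE_HOME/conf/hive-site.xml read in binary mode, for example read_properties(open(path, "rb").read()), so that the XML declaration's encoding is honored.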
After the basic configuration is in place, note the following points:
- a. The official documentation also strongly recommends setting datanucleus.autoStartMechanism in hive-site.xml to avoid failures under concurrent reads (HIVE-4762):
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
- b. Also disable metastore schema verification in hive-site.xml; otherwise Hive fails at startup with:
Caused by: MetaException(message:Version information not found in metastore. )
The setting is:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
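When this is set to true, Hive requires the version information stored in the metastore to match the version of the Hive jars. The error above is not really a version mismatch: on a freshly created MySQL database no metastore schema has been initialized yet, so there is no version information to compare against, even when Hive was installed from the official release package. Setting the property to false works around this. Alternatively, initialize the schema first with $HIVE_HOME/bin/schematool -dbType mysql -initSchema, after which verification can stay enabled.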
- c. Configure the I/O temporary directory; otherwise Hive throws the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:254)
at org.apache.hadoop.fs.Path.<init>(Path.java:212)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:644)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:563)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:531)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:251)
... 12 more
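This happens because the hive-site.xml copied from the template builds its I/O temporary directory paths from the variables ${system:java.io.tmpdir} and ${system:user.name}, but nothing defines their values, so the placeholders remain literal text. As the stack trace shows, Hadoop's Path then treats "${system" (the text before the first colon) as a URI scheme and the remainder as a relative path, which java.net.URI rejects with "Relative path in absolute URI".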
Create the I/O temporary directory (it must match the path configured below):
mkdir -p /opt/software/apache-hive-2.3.5-bin/hive_tmp
Then set the I/O temporary directory path in hive-site.xml (this walkthrough runs Hive under the Linux account hive; adjust both values to your environment):
<property>
<name>system:java.io.tmpdir</name>
<value>/opt/software/apache-hive-2.3.5-bin/hive_tmp/</value>
</property>
<property>
<name>system:user.name</name>
<value>hive</value>
</property>
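The error and the fix above are consistent with ${...} placeholders being expanded in the style of Hadoop's Configuration class: a placeholder is replaced by the property with exactly that name and left as-is otherwise, which is why defining properties literally named system:java.io.tmpdir and system:user.name helps. The Python sketch below models that behavior together with the scheme detection visible in the stack trace (Path.initialize). It is a simplified model, not Hive or Hadoop code; expand and split_scheme are hypothetical names, and details such as Java system-property lookup are omitted.

```python
# Simplified model of ${name} expansion in Hadoop's Configuration style:
# a placeholder is replaced by the property with exactly that name, and is
# left untouched when no such property exists. Java system-property lookup
# and nested-expansion limits are deliberately left out.
import re
from typing import Dict, Optional, Tuple

# Modeled on Hadoop's variable pattern: ${...} with no '}', '$' or space inside.
PLACEHOLDER = re.compile(r"\$\{([^}$ ]+)\}")


def expand(value: str, props: Dict[str, str]) -> str:
    """Replace each ${name} whose name is defined in props; keep the rest."""
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)


def split_scheme(path: str) -> Tuple[Optional[str], str]:
    """Treat text before a ':' that precedes the first '/' as a URI scheme,
    the way Hadoop's Path parses a path string."""
    colon = path.find(":")
    slash = path.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path[:colon], path[colon + 1:]
    return None, path


if __name__ == "__main__":
    raw = "${system:java.io.tmpdir}/${system:user.name}"

    # Without the extra properties the placeholders survive unchanged, and
    # "${system" is mistaken for a scheme followed by a relative path.
    unresolved = expand(raw, {})
    print(split_scheme(unresolved))  # ('${system', 'java.io.tmpdir}/${system:user.name}')

    # With properties named like the placeholders, the path becomes absolute
    # (trailing slash of the configured directory dropped for readability).
    props = {
        "system:java.io.tmpdir": "/opt/software/apache-hive-2.3.5-bin/hive_tmp",
        "system:user.name": "hive",
    }
    print(expand(raw, props))  # /opt/software/apache-hive-2.3.5-bin/hive_tmp/hive
```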
With the steps above, the hive-site.xml configuration is complete.