Installing Hive
1. Extract apache-hive-1.2.1-bin.tar.gz to the target directory and rename the extracted directory to hive. Rename the configuration template hive-env.sh.template to hive-env.sh: mv hive-env.sh.template hive-env.sh
2. Edit hive-env.sh: set HADOOP_HOME to the Hadoop installation path and HIVE_CONF_DIR to Hive's conf directory.
3. Before starting Hive, make sure Hadoop is already running.
4. Basic operations:
Start Hive: bin/hive
List the databases: show databases;
Switch to the default database: use default;
List the tables in the default database: show tables;
Describe a table's structure: desc student;
Create a table: create table student(id int, name string);
Insert a row into the table: insert into student values(1000,"ss");
Query the table: select * from student;
Exit Hive: quit;
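Besides the interactive shell, the Hive CLI can run statements non-interactively with its -e and -f options, which is convenient for scripting. A minimal sketch (the install path /opt/module/hive is illustrative, not from this guide):

```shell
# run a single statement without entering the interactive shell
/opt/module/hive/bin/hive -e "select * from student;"

# run a file of HiveQL statements (the script path is illustrative)
/opt/module/hive/bin/hive -f /tmp/student.sql
```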
Installing MySQL
1. Switch to the root account and check whether MySQL is already installed; if it is, uninstall it.
2. If MySQL is not installed, first uninstall the MariaDB database before installing MySQL. List the installed RPM packages that match: rpm -qa | grep mariadb, then force-remove each one: rpm -e --nodeps followed by the package name.
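A typical removal sequence looks like this; the exact package name varies by system, so check the output of the query first (the name below is only an example):

```shell
# list installed MariaDB packages
rpm -qa | grep mariadb

# force-remove a package reported above (example name; use the one your system prints)
rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64

# verify nothing is left
rpm -qa | grep mariadb
```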
3. Install the MySQL server: rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
Look up the generated random password: cat /root/.mysql_secret. Check MySQL's status: service mysql status. Start MySQL: service mysql start
4. Install the MySQL client: rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm. Connect to MySQL using the random password generated earlier: mysql -uroot -p followed by that password. Change the password: SET PASSWORD=PASSWORD('xxxxxx'); Exit MySQL: exit
5. Configure the Host column of MySQL's user table so that the root user, with the password, can log in to MySQL from any host.
1. Log in to MySQL: mysql -uroot -p followed by the password
2. List the databases: show databases;
3. Switch to the mysql database: use mysql;
4. List all tables in the mysql database: show tables;
5. Describe the user table: desc user;
6. Query the user table: select User, Host, Password from user;
7. Change the Host value for root to %: update user set host='%' where host='localhost';
8. Delete root's entries for the remaining hosts (run one delete per leftover Host value shown in step 6): delete from user where Host='<host>';
9. Refresh the privileges: flush privileges;
10. Exit: quit;
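The same changes can be applied in one pass from the shell via a heredoc. This is a sketch: the password is a placeholder, and the generic delete below (keeping only the '%' row) is an illustrative alternative to deleting specific Host values one by one:

```shell
# apply the Host changes non-interactively; replace 'yourpassword' with the real one
mysql -uroot -p'yourpassword' <<'SQL'
use mysql;
update user set host='%' where host='localhost';
-- remove root rows for any other host, keeping only the '%' entry
delete from user where User='root' and Host<>'%';
flush privileges;
SQL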
6. Storing Hive Metadata in MySQL
Extract the JDBC driver package and copy the driver jar into Hive's lib directory. Log out of the root account and change the driver jar's owner to the regular user.
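For example, with the MySQL Connector/J tarball (the version number, paths, and user name below are illustrative, not specified by this guide):

```shell
# extract the JDBC driver package (version is an example)
tar -zxvf mysql-connector-java-5.1.27.tar.gz

# copy the driver jar into Hive's lib directory (install path is an example)
cp mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/

# leave the root shell, then give the jar to the regular user (user name is an example)
exit
sudo chown user:user /opt/module/hive/lib/mysql-connector-java-5.1.27-bin.jar
```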
7. Point the Metastore at MySQL. In the conf directory, create hive-site.xml (vi hive-site.xml) and fill in the parameters according to the official hive-site documentation.
The contents are as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hostname:port/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>MySQL username</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>MySQL password</value>
<description>password to use against metastore database</description>
</property>
</configuration>
After this configuration, if Hive fails to start, try rebooting the virtual machine. (After the reboot, remember to start the Hadoop cluster again.)
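To confirm the metastore is wired up, start Hive once and then check that the metastore database named in the ConnectionURL now exists in MySQL. A quick check from the shell:

```shell
# after running bin/hive at least once, the metastore database should be listed
mysql -uroot -p -e "show databases;" | grep metastore
```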
8. Display settings for query results. Adding the following to hive-site.xml makes the CLI show the current database name and print column headers for query results.
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
Edit hive-site.xml to turn off metastore schema verification by adding the following configuration:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
3. The Tez Execution Engine for Hive
1. Extract apache-tez-0.9.1-bin.tar.gz and rename the directory: mv apache-tez-0.9.1-bin/ tez-0.9.1
2. Configure Tez in Hive: add the Tez environment variable and dependency-jar settings to hive-env.sh:
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/path/to/hadoop   # your Hadoop installation directory
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/path/to/hive/conf   # your Hive conf directory
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/path/to/tez-0.9.1   # the directory Tez was extracted to
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/path/to/hadoop-lzo.jar$TEZ_JARS   # the lzo jar under Hadoop's common directory
3. Add the following to hive-site.xml to switch Hive's execution engine to Tez:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
4. Configure Tez: in Hive's conf directory, create a tez-site.xml file with the following contents:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name> <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.lib.uris.classpath</name> <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>tez.history.logging.service.class</name> <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>
5. Upload Tez to the cluster: put the tez-0.9.1 directory into HDFS at the locations referenced by tez.lib.uris.
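Assuming Tez was extracted to /opt/module/tez-0.9.1 (the local path is illustrative), uploading it to the HDFS paths referenced by tez.lib.uris looks like:

```shell
# create the target directory in HDFS and upload the Tez package
hadoop fs -mkdir /tez
hadoop fs -put /opt/module/tez-0.9.1/ /tez

# verify the upload
hadoop fs -ls /tez
```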
6. Test: start Hive, create a table, and insert data into it. If no error occurs, the setup works. If instead you see an error like:
The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
Solution: edit Hadoop's yarn-site.xml:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
This disables YARN's virtual-memory check.
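Since yarn-site.xml is read by every NodeManager, the edited file has to reach all cluster nodes and YARN has to be restarted before the setting takes effect. A sketch, assuming a default Hadoop layout (the hostname node2 is a placeholder):

```shell
# copy the edited file to each cluster node (repeat per node; hostname is a placeholder)
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml node2:$HADOOP_HOME/etc/hadoop/

# restart YARN so all NodeManagers pick up the change
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh
```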