Introduction
Hive is a client-side tool (you can also think of it as a standalone application) that translates HQL (a SQL-like language) into MapReduce jobs and runs them to produce the desired result.
The idea is that Hive stores metadata, i.e. how to interpret files of a given format in the Hadoop file system, in MySQL (or another relational database); queries then read that metadata from the database and use it to operate on the files in the distributed file system.
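For instance, a simple aggregation like the following (the table and column names here are hypothetical) is compiled into a MapReduce job behind the scenes:
select word, count(*) from docs group by word;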
Environment preparation
1. Three CentOS 6.5 hosts
Disable the firewall
Install the JDK
Configure /etc/hosts (zk1, zk2, zk3)
Configure passwordless SSH (each host to every host, including itself)
2. One MySQL host (hostname mysql)
Allow remote connections
Grant database privileges, as sketched below
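A minimal sketch of the SSH and MySQL preparation steps above. The key type and the GRANT scope are assumptions; the password 123123 mirrors the hive-site.xml further down, so adjust both together.
# On each of zk1/zk2/zk3: generate a key pair and copy it to every node, itself included
ssh-keygen -t rsa
ssh-copy-id zk1
ssh-copy-id zk2
ssh-copy-id zk3
# On the mysql host, inside the mysql client (MySQL 5.x syntax): allow remote access and grant privileges
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123123';
FLUSH PRIVILEGES;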
Install and run Hadoop
1. Configure Hadoop
Extract the tarball
mkdir -p /opt/modules/cdh/
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/modules/cdh
cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
Edit the configuration files
- Drop the .template suffix (where present) from the following files, e.g. with the loop after this list:
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
hadoop-env.sh
yarn-env.sh
mapred-env.sh
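A one-liner for the rename, assuming you are already in etc/hadoop and the templates exist as listed:
for f in *.template; do mv "$f" "${f%.template}"; done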
- In
hadoop-env.sh
yarn-env.sh
mapred-env.sh
set the JAVA_HOME variable, as shown below
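The same line works in all three env scripts; /usr/local/jdk mirrors the hive-env.sh below and is an assumption about where your JDK actually lives:
export JAVA_HOME=/usr/local/jdk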
- core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://zk1:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Disable permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>zk3:50090</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>zk1:50070</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
- yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>zk2</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<!-- Job history log server -->
<property>
<name>yarn.log.server.url</name>
<value>http://zk1:19888/jobhistory/logs/</value>
</property>
</configuration>
- mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>zk1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>zk1:19888</value>
</property>
</configuration>
Create the slaves file (in the etc/hadoop/ directory)
vi slaves
and add the three hostnames:
zk1
zk2
zk3
After configuration, scp the directory to the other two hosts:
scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk2:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/
scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk3:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/
Format the NameNode on the NameNode host (zk1):
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs namenode -format
2. Start Hadoop
Start the namenode (zk1)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start namenode
Start the secondarynamenode (zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start secondarynamenode
Start the datanodes (zk1, zk2, zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start datanode
Start the resourcemanager (zk2)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start resourcemanager
Start the nodemanagers (zk1, zk2, zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start nodemanager
Start the historyserver (zk1)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/mr-jobhistory-daemon.sh start historyserver
To verify everything started, open http://zk1:50070 in a browser; you can also check the daemons with jps, as shown below.
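Given the start commands above, jps on each host should list roughly the following processes:
jps
# zk1: NameNode, DataNode, NodeManager, JobHistoryServer
# zk2: DataNode, ResourceManager, NodeManager
# zk3: DataNode, SecondaryNameNode, NodeManager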
Install and run Hive
Install Hive
Extract the tarball
tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/modules/cdh/
Edit the configuration files
- Rename the configuration files
mv hive-default.xml.template hive-site.xml
mv hive-env.sh.template hive-env.sh
mv hive-log4j.properties.template hive-log4j.properties
- hive-env.sh
JAVA_HOME=/usr/local/jdk
HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
export HIVE_CONF_DIR=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
- hive-site.xml (modify the existing properties; do not add duplicates)
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mysql:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123123</value>
<description>password to use against metastore database</description>
</property>
- hive-log4j.properties
hive.log.dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs
Copy the JDBC driver into Hive's lib directory
cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh/hive-0.13.1-cdh5.3.6/lib/
Run Hive
/opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive
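Once the CLI prompt appears, a quick smoke test confirms the MySQL metastore connection works; the table name here is just an illustrative example:
show databases;
create table hive_test (id int, name string);
show tables;
drop table hive_test;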