Using Hive 3.1 as the example, this article configures Hadoop and then Spark as Hive's execution engine.
1 Environment preparation
1.1 Download the JDK
Log in to the Oracle website and download the JDK.
1.2 Configure the JDK environment variables
Edit the shell profile with vim ~/.bash_profile and add:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Reload the profile with source ~/.bash_profile.
1.3 Verify
Run java -version to check the installed JDK version:
duyanyong@duyanyongdeMacBook-Air apache-hive-3.1.3-bin % java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
2 Deploy Hadoop
Check the Hive source (the root pom.xml) to confirm the matching Hadoop version: <hadoop.version>3.1.0</hadoop.version>
2.1 Download Hadoop
Go to the Hadoop download page and fetch the matching version.
2.2 Passwordless SSH login
Hadoop is a distributed platform whose machines need to cooperate; setting up passwordless SSH login avoids the hassle of typing a password on every login (even in pseudo-distributed mode, the start scripts ssh into localhost).
- In the Mac's System Preferences → Sharing, enable Remote Login.
- In a terminal, run ssh-keygen -t rsa to generate an RSA key pair; just press Enter (or type y) at every prompt.
- Run cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys to append the public key to the authorized_keys file.
- Run ssh localhost; if it logs in without asking for a password, the setup succeeded.
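A quick non-interactive way to confirm the setup above took effect; BatchMode makes ssh fail immediately instead of falling back to a password prompt (the 5-second timeout is just an illustrative choice):

```shell
# Probe passwordless SSH to localhost: BatchMode=yes forbids password
# prompts, so the command either succeeds via the key or fails fast.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  echo "passwordless ssh OK"
else
  echo "passwordless ssh NOT configured"
fi
```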
2.3 Edit the Hadoop configuration files
Go to /Users/duyanyong/bigdata/hadoop-3.1.0/etc/hadoop
and edit the following configuration files.
1. core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<!-- Points at a tmp folder under the Hadoop install directory -->
<value>/Users/duyanyong/bigdata/hadoop-3.1.0/tmp</value>
</property>
</configuration>
2. hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!--Number of replicas HDFS keeps for each block, counting the original; the default is 3-->
<!--Must be 1 in pseudo-distributed mode-->
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<!-- Directory where the NameNode stores the name table (fsimage) -->
<value>file:/Users/duyanyong/bigdata/hadoop-3.1.0/tmp/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<!-- Directory where the DataNode stores data blocks -->
<value>file:/Users/duyanyong/bigdata/hadoop-3.1.0/tmp/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
3. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<!--Run MapReduce on YARN-->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
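On Hadoop 3.x, jobs submitted to YARN often also need to be told where the MapReduce framework lives; otherwise the ApplicationMaster can fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster". If you hit that, the usual fix is adding the following to the same mapred-site.xml (the path is the install directory used throughout this article; adjust it to yours):

```xml
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/Users/duyanyong/bigdata/hadoop-3.1.0</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/Users/duyanyong/bigdata/hadoop-3.1.0</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/Users/duyanyong/bigdata/hadoop-3.1.0</value>
</property>
```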
4. yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<!--Auxiliary service through which the NodeManager serves map output to reducers-->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/Users/duyanyong/bigdata/hadoop-3.1.0/etc/hadoop:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/common/lib/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/common/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/hdfs:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/hdfs/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/mapreduce/lib/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/mapreduce/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/yarn:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/yarn/lib/*:/Users/duyanyong/bigdata/hadoop-3.1.0/share/hadoop/yarn/*</value>
</property>
</configuration>
2.4 Initialize and start
Initialize HDFS: hdfs namenode -format
Start Hadoop: start-all.sh
(or run start-dfs.sh
and start-yarn.sh separately)
2.5 Verify
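Verification can be as simple as listing the running Java daemons: on a healthy pseudo-distributed node, jps shows NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (plus Jps itself). In Hadoop 3.x the NameNode web UI is at http://localhost:9870 and the ResourceManager UI at http://localhost:8088. A guarded sketch:

```shell
# List JVM daemons if the JDK's jps tool is on PATH; otherwise say so
# instead of failing, since a PATH problem is itself a useful signal.
if command -v jps >/dev/null; then
  jps
else
  echo "jps not found; check that JAVA_HOME/bin is on PATH"
fi
```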
3 Deploy Hive
3.1 Download Hive
Go to the official download page, fetch the Hive source, open it in IDEA, and switch to the 3.1.3 branch.
3.2 Build the Hive source
Go to the Hive root directory /Users/duyanyong/IdeaProjects/hive
Build the source: mvn clean package -DskipTests -Phadoop -Pdist
Once the build finishes, the distribution is generated under the packaging
directory in the Hive root.
3.3 Add the Hadoop configuration
Go to /Users/duyanyong/IdeaProjects/hive/packaging/target/apache-hive-3.1.3-bin/apache-hive-3.1.3-bin/conf
Edit hive-env.sh
and add the Hadoop path:
HADOOP_HOME=/Users/duyanyong/bigdata/hadoop-3.1.0
3.4 Initialize the metastore database
3.4.1 Create a hive user in MySQL
# Log in to MySQL
hadoop@ubuntu:~$ mysql -u root -p mysql
# Create a hive user with password hive
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
# Grant privileges to the hive user
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
# Reload the privilege tables
mysql> flush privileges;
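One step the session above does not cover: the JDBC URL configured in the next section points at a database named hive_metastore, and schematool will not create it for you. A guarded sketch that creates it with the mysql client, using the hive/hive credentials created above:

```shell
# Create the metastore database if the mysql client is available; the
# fallback messages keep this snippet harmless on machines without MySQL.
if command -v mysql >/dev/null; then
  mysql -u hive -phive -e "CREATE DATABASE IF NOT EXISTS hive_metastore;" \
    || echo "could not connect; create database hive_metastore manually"
else
  echo "mysql client not found; create database hive_metastore manually"
fi
```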
3.4.2 Configure the Hive metastore connection
Add the following to the hive-site.xml file in Hive's conf directory:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_metastore?characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
</configuration>
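A note in case your MySQL is 8.x: with Connector/J 8 the driver class above is the legacy name, and the current one is com.mysql.cj.jdbc.Driver. Only swap in the fragment below if startup reports a deprecation warning or a class-not-found error. Either way, the Connector/J jar itself must be on Hive's classpath, typically dropped into the lib directory of the Hive distribution.

```xml
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
```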
3.4.3 Initialize the MySQL metastore
Go to /Users/duyanyong/IdeaProjects/hive/bin
Run ./schematool --dbType mysql --initSchema
to initialize the metastore schema.
3.5 Enable Hive debugging
3.5.1 Run Hive in debug mode
Run Hive in debug mode: hive --debug -hiveconf hive.root.logger=DEBUG,console
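What --debug does under the hood (an assumption based on common Hive client scripts, not verified against this exact build): it adds a JDWP agent to the client JVM's options, so the process suspends on startup and waits for a debugger on port 8000, roughly equivalent to:

```shell
# Hypothetical equivalent of `hive --debug`: attach a JDWP agent that
# listens on port 8000 and suspends the JVM until a debugger connects.
export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
```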
3.5.2 Add a remote debug configuration in IDEA
Create a Remote JVM Debug run configuration pointing at localhost, port 8000 (the port hive --debug listens on by default).
3.5.3 Set a breakpoint in the source
Open the CliDriver.java class and set a breakpoint.
3.5.4 Debug
Start the debugger, then enter a Hive command such as show databases; to step through the source.