机器:
172.21.51.87
172.21.51.88
172.21.51.89
修改每台机器hosts
vim /etc/hosts
172.21.51.87 dev5187
172.21.51.88 dev5188
172.21.51.89 dev5189
禁用191.21.51.87
网卡
sudo systemctl stop firewalld.service
sudo systemctl disable firewalld.service
vim /etc/selinux/config
SELINUX=enforcing 改为 disabled
关闭swap;
临时关闭 swapoff -a 永久关闭 : sudo echo 'vm.swappiness=0'>> /etc/sysctl.conf 重启
创建用户(每一台)
groupadd hadoop
useradd hadoop -g hadoop
passwd hadoop
修改权限
vim /etc/sudoers
注意权限 第一次需要chmod 640 /etc/sudoers
(每一台)
hadoop ALL =(ALL) NOPASSWD: ALL
修改配置
vim /etc/security/limits.conf
(这步可以通过root也可以用别的用户sudo实现)
* soft nofile 32768
* soft nproc 65536
* hard nofile 1048576
* hard nproc unlimited
* hard memlock unlimited
* soft memlock unlimited
gz包存放位置
/APS/usr/vdsappas/package
解压后文件存放位置
/APS/usr/vdsappas/soft
1.zookeeper搭建
a.解压
tar -xzvf zookeeper-3.4.5-cdh5.14.4.tar.gz -C /APS/usr/vdsappas/soft/
b.创建符号连接
ln -s zookeeper-3.4.5-cdh5.14.4 zookeeper
c.配置环境变量
vim ~/.bash_profile
ZOOKEEPER_HOME=/APS/usr/vdsappas/soft/zookeeper/bin
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$UPSQL_CLIENT_HOME:$ZOOKEEPER_HOME
source ~/.bash_profile
d.修改配置文件
cd /APS/usr/vdsappas/soft/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
dataDir=/APS/usr/vdsappas/softdata/zookeeper/data
dataLogDir=/APS/usr/vdsappas/softdata/zookeeper/logs
server.1=172.21.51.87:2888:3888
server.2=172.21.51.88:2888:3888
server.3=172.21.51.89:2888:3888
e.修改日志配置文件
vim /APS/usr/vdsappas/soft/zookeeper/conf/log4j.properties
zookeeper.log.dir=/APS/usr/vdsappas/softdata/zookeeper/logs
zookeeper.tracelog.dir=/APS/usr/vdsappas/softdata/zookeeper/logs
f.添加myid文件
echo 1 > /APS/usr/vdsappas/softdata/zookeeper/data/myid
g.按以上步骤安装其他服务器
h.启动zk(所有安装的服务器)
zkServer.sh start
查看状态
zkServer.sh status
正常启动的情况下,会显示leader follower
2.配置ssh
在每台服务器的当前用户目录下
cd ~
生成
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
操作172.21.51.87
服务器,配置ssh连接其他服务器
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys #公钥写入自己的认证文件
chmod 600 ~/.ssh/authorized_keys
将87
上的公匙发送至其他服务器
scp ~/.ssh/id_rsa.pub vdsappas@172.21.51.89:~/.ssh/authorized_keys
操作172.21.51.88
服务器,配置ssh连接其他服务器
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys #公钥写入自己的认证文件
chmod 600 ~/.ssh/authorized_keys
将88
上的公匙发送至其他服务器
scp ~/.ssh/id_rsa.pub vdsappas@172.21.51.89:~/.ssh/id_rsa.pub.2
将88
上的公匙写入认证文件
cat ~/.ssh/id_rsa.pub.2 >> ~/.ssh/authorized_keys
3.配置hadoop
a.解压
tar -xzvf hadoop-2.6.0-cdh5.14.4.tar.gz -C ~/soft/
ln -s hadoop-2.6.0-cdh5.14.4 hadoop
b.修改配置文件
cd /APS/usr/vdsappas/soft/hadoop/etc/hadoop
[hdfs-site.xml]
<configuration>
<property>
<name>dfs.nameservices</name>
<value>unionpayCluster</value>
</property>
<property>
<name>dfs.ha.namenodes.unionpayCluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.unionpayCluster.nn1</name>
<value>172.21.51.87:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.unionpayCluster.nn2</name>
<value>172.21.51.88:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.unionpayCluster.nn1</name>
<value>172.21.51.87:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.unionpayCluster.nn2</name>
<value>172.21.51.88:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://172.21.51.87:8485;172.21.51.88:8485;172.21.51.89:8485/unionpayCluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.unionpayCluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/APS/usr/vdsappas/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/APS/usr/vdsappas/softdata/hadoop/journal</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/APS/usr/vdsappas/softdata/hadoop/hdfs/dfs/name</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/APS/usr/vdsappas/softdata/hadoop/diskb/dfs</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://unionpayCluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>172.21.51.87:2181,172.21.51.88:2181,172.21.51.89:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/APS/usr/vdsappas/softdata/hadoop/log</value>
</property>
</configuration>
[yarn-site.xml]
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>172.21.51.87:2181,172.21.51.88:2181,172.21.51.89:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.client.failover-sleep-base-ms</name>
<value>100</value>
</property>
<property>
<name>yarn.client.failover-sleep-max-ms</name>
<value>2000</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarncluster</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>172.21.51.87:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>172.21.51.87:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>172.21.51.87:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>172.21.51.87:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>172.21.51.87:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm1</name>
<value>172.21.51.87:8090</value>
</property>
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>172.21.51.89:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>172.21.51.89:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>172.21.51.89:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>172.21.51.89:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>172.21.51.89:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address.rm2</name>
<value>172.21.51.89:8090</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.admin.client.thread-count</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value>
</property>
<property>
<name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name>
<value>1000</value>
</property>
<property>
<name>yarn.am.liveness-monitor.expiry-interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
<value>1000</value>
</property>
<property>
<name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
<value>600000</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.resourcemanager.max-completed-applications</name>
<value>10000</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://unionpayCluster/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/APS/usr/vdsappas/softdata/hadoop/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/APS/usr/vdsappas/softdata/hadoop/hadoop-yarn/log/containers</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
</configuration>
配置队列
[yarn-site.xml]可不做
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,queue1,queue2,queue3</value>
<description>
The queues at the this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>56</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1.0</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
Typically this should be set to number of nodes in the cluster, By default is setting
approximately number of nodes in one rack which is 40.
</description>
</property>
<!--root.queue1队列使用限制-->
<property>
<name>yarn.scheduler.capacity.root.queue1.capacity</name>
<value>10</value>
<description>Default queue target capacity.</description>
</property>
<!--root.queue1队列默认用户最大使用限制是在其他队列空闲的情况下若是其他队列有充足的任务进行,是按照比例分配的 -->
<property>
<name>yarn.scheduler.capacity.root.queue1.maximum-capacity</name>
<value>10</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<!--root.queue1队列默认用户使用使用限制 表示单个用户最大可以占该队列容量的100%-->
<property>
<name>yarn.scheduler.capacity.root.queue1.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<!--root.queue1队列默认显示状态-->
<property>
<name>yarn.scheduler.capacity.root.queue1.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<!--root.queue1队列访问权限-->
<property>
<name>yarn.scheduler.capacity.root.queue1.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<!--root.queue1队列默认管理用户-->
<property>
<name>yarn.scheduler.capacity.root.queue1.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<!--root.queue2队列使用限制-->
<property>
<name>yarn.scheduler.capacity.root.queue2.capacity</name>
<value>4</value>
<description>Default queue target capacity.</description>
</property>
<!--root.queue2队列默认用户最大使用限制是在其他队列空闲的情况下若是其他队列有充足的任务进行,是按照比例分配的 -->
<property>
<name>yarn.scheduler.capacity.root.queue2.maximum-capacity</name>
<value>4</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<!--root.queue2队列默认用户使用使用限制 表示单个用户最大可以占该队列容量的100%-->
<property>
<name>yarn.scheduler.capacity.root.queue2.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<!--root.queue2队列默认显示状态-->
<property>
<name>yarn.scheduler.capacity.root.queue2.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<!--root.queue2队列访问权限-->
<property>
<name>yarn.scheduler.capacity.root.queue2.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<!--root.queue2队列默认管理用户-->
<property>
<name>yarn.scheduler.capacity.root.queue2.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<!--root.queue3队列使用限制-->
<property>
<name>yarn.scheduler.capacity.root.queue3.capacity</name>
<value>30</value>
<description>Default queue target capacity.</description>
</property>
<!--root.queue3队列默认用户最大使用限制是在其他队列空闲的情况下若是其他队列有充足的任务进行,是按照比例分配的 -->
<property>
<name>yarn.scheduler.capacity.root.queue3.maximum-capacity</name>
<value>30</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<!--root.queue3队列默认用户使用使用限制 表示单个用户最大可以占该队列容量的100%-->
<property>
<name>yarn.scheduler.capacity.root.queue3.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<!--root.queue3队列默认显示状态-->
<property>
<name>yarn.scheduler.capacity.root.queue3.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<!--root.queue3队列访问权限-->
<property>
<name>yarn.scheduler.capacity.root.queue3.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<!--root.queue3队列默认管理用户-->
<property>
<name>yarn.scheduler.capacity.root.queue3.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
</configuration>
[mapred-site.xml]
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/user/history/done_intermediate</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/user/history/done</value>
</property>
</configuration>
[slaves]
172.21.51.87
172.21.51.88
172.21.51.89
c.配置环境变量
vim ~/.bash_profile
HADOOP_HOME=/APS/usr/vdsappas/soft/hadoop
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$UPSQL_CLIENT_HOME:$ZOOKEEPER_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
d.创建相关文件夹
e.启动journalnode
hadoop-daemon.sh start journalnode
f.格式化
hadoop namenode -format
g.启动所有
start-all.sh
h.验证
jps 查看所有服务是否启动
访问http://172.21.51.87:50070
http://172.21.51.88:50070
页面查看
4.安装hive
a.解压
tar -xzvf hive-1.1.0-cdh5.14.4.tar.gz -C /APS/usr/vdsappas/soft/
b.创建符号连接
ln -s hive-1.1.0-cdh5.14.4 hive
c.配置环境变量
vim ~/.bash_profile
HIVE_HOME=/APS/usr/vdsappas/soft/hive/bin
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$UPSQL_CLIENT_HOME:$ZOOKEEPER_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME
d.在mysql数据库中创建metastore数据库
mysql -uroot -proot
CREATE DATABASE metastore;
e.修改配置文件
[hive-site.xml]
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.21.51.87:3306/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
</configuration>
[hive-env.sh]
指定hadoop_home
HADOOP_HOME=/APS/usr/vdsappas/soft/hadoop
export HIVE_CONF_DIR=/APS/usr/vdsappas/soft/hive/conf
export HIVE_AUX_JARS_PATH=/soft/hive/lib,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/common,/soft/hive/lib,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/common/lib,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/mapreduce,/soft/hadoop-2.6.0-cdh5.14.4/etc/hadoop,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/hdfs,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/yarn,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/yarn/lib,/soft/hadoop-2.6.0-cdh5.14.4/share/hadoop/mapreduce/lib
[hive-log4j.properties]
修改日志文件路径
hive.log.dir=/APS/usr/vdsappas/softdata/hive/logs
f.初始化元数据(表结构)到mysql
schematool -dbType mysql -initSchema
g.启动metastore和hiveserver2服务
hive --service metastore &
hive --service hiveserver2 &
h.测试
netstat -anop | grep 10000 #查看hiveserver2服务是否启动
beeline验证
5.安装kafka(使用原来87-89的kafka)
a.解压
tar -xzvf kafka-0.10.0-kafka2.1.1.tar.gz -C /APS/usr/vdsappas/soft/
b.创建符号连接
ln -s kafka-0.10.0-kafka2.1.1 kafka
c.配置环境变量
vim ~/.bash_profile
KAFKA_HOME=/APS/usr/vdsappas/soft/kafka/bin PATH=$PATH:$HOME/.local/bin:$HOME/bin:$UPSQL_CLIENT_HOME:$ZOOKEEPER_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME:$KAFKA_HOME
d.修改配置文件
cd /APS/usr/vdsappas/soft/kafka/config
vim server.properties
broker.id=1
listeners=PLAINTEXT://:9092
log.dirs=/APS/usr/vdsappas/softdata/kafka/logs
zookeeper.connect=172.21.51.87:2181,172.21.51.88:2181,172.21.51.89:2181
log.retention.hours=168 #数据的保留时间(168 hours=7天)
delete.topic.enable=true #可以删除已创建主题
注:每台服务器上的broker.id是不同的
e.启动每台服务器的kafka
kafka-server-start.sh /APS/usr/vdsappas/soft/kafka/config/server.properties
6.安装spark
a.解压
tar -xzvf spark-1.6.0-cdh5.14.4.tar.gz -C /APS/usr/vdsappas/soft/
b 创建符号链接
ln -s spark-1.6.0-cdh5.14.4 spark
c.配置环境变量
vim ~/.bash_profile
export SPARK_HOME=/APS/usr/vdsappas/soft/spark
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$UPSQL_CLIENT_HOME:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME::$HIVE_HOME/bin:$SPARK_HOME/bin
d.修改配置文件
cd /APS/usr/vdsappas/soft/spark/conf/
vim spark-env.sh
增加如下配置
export HADOOP_CONF_DIR=/APS/usr/vdsappas/soft/hadoop/etc/hadoop
export SPARK_CONF_DIR=/APS/usr/vdsappas/soft/spark/conf
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=172.21.51.87:2181,172.21.51.88:2181,172.21.51.89:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_DIST_CLASSPATH=$(/APS/usr/vdsappas/soft/hadoop/bin/hadoop classpath)
vim spark-defaults.conf
#增加如下配置
spark.driver.memory 5g
spark.eventLog.enabled true
spark.eventLog.compress true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.master yarn
vim slaves
#增加如下配置
172.21.51.87
172.21.51.88
172.21.51.89
在/APS/usr/vdsappas/soft/hadoop/share/hadoop/common/lib
目录下上传如下jar包
jackson-annotations-2.4.0.jar
jackson-core-2.4.2.jar
jackson-databind-2.4.2.jar
parquet-hadoop-1.4.3.jar
在三台机器重复以上步骤
e.启动spark集群
cd /APS/usr/vdsappas/soft/spark/sbin
./start-all.sh
在另外两台机器/APS/usr/vdsappas/soft/spark/sbin
目录下执行 start-master.sh
,实现spark master高可用