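Overview
I bought a student server on Alibaba Cloud a while ago and, for everyday study, decided to set up a pseudo-distributed development environment on that single machine. I had deployed a cluster on three virtual machines before, but I was afraid of forgetting the steps over time, so I am redeploying with upgraded versions and writing everything down for later reference. There are bound to be mistakes; corrections are welcome so we can learn together.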
I. Prepare the required software
1. SSH software:
xshell5 xftp
2. Use the Cloudera CDH 5.8.0 series; download from:
http://archive-primary.cloudera.com/cdh5/cdh/5/
- hadoop-2.6.0-cdh5.8.0.tar.gz
- hbase-1.2.0-cdh5.8.0.tar.gz
- hive-1.1.0-cdh5.8.0.tar.gz
- zookeeper-3.4.5-cdh5.8.0.tar.gz
- spark-2.2.0-bin-hadoop2.6.tgz (for Spark, pick the Hadoop 2.6 build on the official site)
- apache-maven-3.5.2-bin.tar.gz
- jdk-8u152-linux-x64.tar.gz
- kafka_2.11-0.10.0.0.tgz
- scala-2.11.8.tgz
3. Choosing the system
Image: I picked a system image that already has JDK 1.8, Tomcat and MySQL deployed, which is convenient later on.
[Figure: system image]
4. Web UI pages
hdfs hostname:50070
yarn hostname:8088
hbase hostname:60010
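5. Configure the environment variables
II. Basic configuration
1. Remote login
Go to the console and generate a key pair; it takes effect only after the server is restarted. Alibaba Cloud seems to disable password-based remote login by default; the configuration below re-enables it.
This is usually because the PasswordAuthentication parameter of the SSH service has been changed to disable password authentication.
Comment out the line in /etc/ssh/sshd_config that sets PasswordAuthentication to no.
Then restart the SSH service, after which you can log in remotely with a password: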
systemctl restart sshd
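As a minimal sketch of the edit described above, the function below comments out an active `PasswordAuthentication no` line. The name `allow_password_login` is made up for illustration, and the file path is passed as an argument so the function can be tried on a copy before it touches the real /etc/ssh/sshd_config.

```shell
# Hypothetical helper: comment out an active "PasswordAuthentication no" line.
# $1 is the path to an sshd_config file (try it on a copy first).
allow_password_login() {
  cfg="$1"
  # Keep a backup as <file>.bak, then prefix the matching line with '#'.
  sed -i.bak -E 's/^[[:space:]]*(PasswordAuthentication[[:space:]]+no)[[:space:]]*$/#\1/' "$cfg"
}

# Example on the real file (needs root):
#   allow_password_login /etc/ssh/sshd_config && systemctl restart sshd
```

With the line commented out, sshd falls back to its default for this option, which allows password authentication.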
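2. Add a new user and passwordless SSH login
1. Create the user hadoop
adduser hadoop
2. Set its password
passwd hadoop
3. Allow hadoop to switch to root without a password
vi /etc/sudoers
Add one line:
hadoop ALL=(root)NOPASSWD:ALL
4. Firewall settings
systemctl stop firewalld
systemctl disable firewalld    # do not start at boot
5. Set up passwordless SSH login for the hadoop user
(go to the user's home directory)
cd ~
(generate the key pair, pressing Enter at every prompt)
ssh-keygen -t rsa
(go to the .ssh directory)
cd ~/.ssh
(append this machine's public key to authorized_keys)
ssh-copy-id hostname
Notes:
1. The user's home directory must not be writable by group or others (700 works); otherwise passwordless login cannot be set up.
2. Configure the Java environment variables
vi /etc/profile
export JAVA_HOME=/jdk
export PATH=$PATH:$JAVA_HOME/bin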
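Going back to note 1: with its default StrictModes setting, sshd refuses key-based login when the home directory, ~/.ssh, or authorized_keys is writable by other users. The sketch below tightens those permissions; the function name `fix_ssh_perms` is made up, and the modes 700/600 are the conventional safe choice rather than the only values that work.

```shell
# Hypothetical helper: tighten permissions so sshd accepts key-based login.
# $1 is the user's home directory.
fix_ssh_perms() {
  home_dir="$1"
  mkdir -p "$home_dir/.ssh"
  touch "$home_dir/.ssh/authorized_keys"
  chmod 700 "$home_dir" "$home_dir/.ssh"        # owner-only access to both directories
  chmod 600 "$home_dir/.ssh/authorized_keys"    # owner-only read/write on the key file
}

# Example: fix_ssh_perms /home/hadoop
```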
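III. Hadoop and YARN deployment
1. Create the HDFS data directory under the home directory
cd /home/hadoop
mkdir hdfs
2. Extract all of the software listed above into /opt
tar -xzvf /software/hadoop-2.6.0-cdh5.8.0.tar.gz -C /opt    # repeat for each archive under /software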
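Since there are nine archives, the extraction step can be looped. This is only a sketch: the function name `extract_all` is made up, and it assumes all downloads sit in one directory (such as /software) and end in .tar.gz or .tgz.

```shell
# Hypothetical helper: extract every .tar.gz / .tgz archive in $1 into $2.
extract_all() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  for f in "$src"/*.tar.gz "$src"/*.tgz; do
    [ -e "$f" ] || continue      # skip a pattern that matched nothing
    tar -xzf "$f" -C "$dest"
  done
}

# Example: extract_all /software /opt
```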
3. Configuration
1. core-site.xml
1. cd /opt/hadoop-2.6.0-cdh5.8.0/etc/hadoop
2. vi core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hdfs</value> <!-- the directory created earlier -->
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hostname:9000</value>
</property>
</configuration>
2.hdfs-site.xml
1.vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name> <!-- number of replicas kept for stored data -->
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hdfs/name</value> <!-- NameNode data directory -->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hdfs/data</value> <!-- DataNode data directory -->
</property>
</configuration>
3. hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_111
4. slaves
hostname #hostname of the DataNode node
4. Format and start HDFS
1. Format HDFS
bin/hdfs namenode -format
2. Start the NameNode and DataNode processes
sbin/start-dfs.sh
3. Check the processes
jps
DataNode
NameNode
SecondaryNameNode
Notes:
1. If an error is reported at startup, see item 1 of the basic configuration above.
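Checking the jps list by eye gets tedious once more services are added, so here is a small sketch that reports which expected daemons are missing. The function name `missing_daemons` is made up; it only parses text in the `<pid> <name>` form that jps prints.

```shell
# Hypothetical helper: print each expected daemon that is absent from jps output.
# $1 is the text printed by jps ("<pid> <name>" per line); the rest are expected names.
missing_daemons() {
  jps_out="$1"
  shift
  for name in "$@"; do
    # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
    printf '%s\n' "$jps_out" | awk '{print $2}' | grep -qx "$name" || printf '%s\n' "$name"
  done
}

# Example: missing_daemons "$(jps)" NameNode DataNode SecondaryNameNode
```

No output means every expected process is running.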
5. YARN configuration
1.yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
2.mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
6. Start YARN
sbin/start-yarn.sh
Check the processes:
jps
ResourceManager
NodeManager
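V. ZooKeeper deployment
1. Configuration
1. cd /opt/zookeeper-3.4.5-cdh5.8.0
2. Create the data directory zkData
mkdir /opt/zookeeper-3.4.5-cdh5.8.0/zkData
Create the myid file in it, containing the number 1
echo 1 > /opt/zookeeper-3.4.5-cdh5.8.0/zkData/myid
3. Create the log directory
mkdir /opt/zookeeper-3.4.5-cdh5.8.0/logs
4. Go to the conf directory, rename zoo_sample.cfg to zoo.cfg, and edit zoo.cfg
vi zoo.cfg
# heartbeat interval in ms; the minimum session timeout is twice tickTime
tickTime=2000
dataDir=/opt/zookeeper-3.4.5-cdh5.8.0/zkData
# port that listens for client connections
clientPort=2181
5. Set the log directory
vi log4j.properties
zookeeper.log.dir=/opt/zookeeper-3.4.5-cdh5.8.0/logs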
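The directory, myid and zoo.cfg steps above can be collected into one function. This is a sketch under the same settings as the post; the function name `init_zk_standalone` is made up, and it overwrites any existing zoo.cfg under the given directory.

```shell
# Hypothetical helper: write a minimal standalone ZooKeeper setup under $1.
init_zk_standalone() {
  zk_home="$1"
  mkdir -p "$zk_home/conf" "$zk_home/zkData" "$zk_home/logs"
  echo 1 > "$zk_home/zkData/myid"          # server id of this node
  cat > "$zk_home/conf/zoo.cfg" <<EOF
# heartbeat interval in milliseconds
tickTime=2000
dataDir=$zk_home/zkData
# port that listens for client connections
clientPort=2181
EOF
}

# Example: init_zk_standalone /opt/zookeeper-3.4.5-cdh5.8.0
```

The comments sit on their own lines because zoo.cfg is read as a Java properties file, where a trailing `#...` would become part of the value.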
2. Startup
1. Start
bin/zkServer.sh start
2. Check the status of the current node
bin/zkServer.sh status
3. Connect with the ZooKeeper client
bin/zkCli.sh -server hostname:2181
VI. HBase deployment
1. Configuration
cd /opt/hbase-1.2.0-cdh5.8.0/conf
1. hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name> <!-- I use a standalone ZooKeeper, so this has to be true -->
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hostname:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name> <!-- hostname of the ZooKeeper node -->
<value>hostname</value>
</property>
</configuration>
2. hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_111
#With JDK 8+, comment out the following two settings
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#I use an external, standalone ZooKeeper, so HBase's bundled ZooKeeper has to be turned off
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
3. regionservers (a single line with this machine's hostname)
hostname
# in log4j.properties; create the logs directory first to hold the log files
hbase.log.dir=/opt/hbase-1.2.0-cdh5.8.0/logs
2. Start HBase
bin/start-hbase.sh
Check the processes with jps:
HMaster
HRegionServer
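VII. Hive deployment
1. Required software and JARs
MySQL (stores the metadata; already installed by default in the system image)
Add mysql-connector-java-5.1.45-bin.jar to Hive's lib directory.
2. Integrating Hive with MySQL
1. Start MySQL
systemctl start mysqld
2. Log in to MySQL
mysql -uroot -p****
3. Change the root password
SET PASSWORD = PASSWORD("password"); #enough for the current user
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('password'); #for a named account
4. Create a dedicated MySQL account for Hive
CREATE USER 'name'@'localhost' IDENTIFIED BY '1234'; #local login, password '1234'
CREATE USER 'name'@'%' IDENTIFIED BY '1234'; #remote login
mysql> create user 'hive'@'%' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
5. Log out of the current account, log in as hive, and create the database that stores Hive's metadata
mysql> create database hive;
Notes:
1. Changing the password fails with: ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
Cause: a password security level is enforced; it has to be lowered before a simple password can be set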
mysql> select @@validate_password_policy; #the password level is MEDIUM
+----------------------------+
| @@validate_password_policy |
+----------------------------+
| MEDIUM                     |
+----------------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'validate_password%'; #show the password validation settings
+--------------------------------------+--------+
| Variable_name                        | Value  |
+--------------------------------------+--------+
| validate_password_check_user_name    | OFF    | #whether to check the password against the user name
| validate_password_dictionary_file    |        | #path of the dictionary file used for strength checks
| validate_password_length             | 8      | #minimum length, 8 by default
| validate_password_mixed_case_count   | 1      | #minimum number of lowercase and of uppercase letters
| validate_password_number_count       | 1      | #minimum number of digits
| validate_password_policy             | MEDIUM | #strength check level, 0: LOW, 1: MEDIUM, 2: STRONG
| validate_password_special_char_count | 1      | #minimum number of special characters
+--------------------------------------+--------+
7 rows in set (0.03 sec)
mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_mixed_case_count=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_number_count=3;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_special_char_count=0;
Query OK, 0 rows affected (0.00 sec)
mysql> set global validate_password_length=3;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'validate_password%';
+--------------------------------------+-------+
| Variable_name                        | Value |
+--------------------------------------+-------+
| validate_password_check_user_name    | OFF   |
| validate_password_dictionary_file    |       |
| validate_password_length             | 3     |
| validate_password_mixed_case_count   | 0     |
| validate_password_number_count       | 3     |
| validate_password_policy             | LOW   |
| validate_password_special_char_count | 0     |
+--------------------------------------+-------+
7 rows in set (0.00 sec)
#the change now succeeds
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('123');
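The values in the session above fit together for a reason. According to the MySQL documentation for the validate_password plugin, validate_password_length cannot be set lower than number_count + special_char_count + 2 × mixed_case_count. A small sketch of that arithmetic (the function name `min_password_length` is made up for illustration):

```shell
# Hypothetical helper: smallest validate_password_length MySQL will accept.
# $1 = number_count, $2 = special_char_count, $3 = mixed_case_count
min_password_length() {
  numbers="$1"
  specials="$2"
  mixed="$3"
  echo $(( numbers + specials + 2 * mixed ))
}

# Example: min_password_length 3 0 0
```

With the counts used above (3, 0, 0) the floor is 3, so a length of 3 is accepted and a three-digit password such as 123 passes.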
3. Hive configuration
1. hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://Jeason:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore (the JDBC connection to the self-created hive database in MySQL); newer MySQL versions need the SSL option set</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>(connection driver) Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
<description>(show the column names of the data) Whether to print the names of the columns in query output.</description>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
<description>(show the database in use) Whether to include the current database in the Hive prompt.</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
<description>Hive data storage directory on HDFS</description>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/usr/hive/tmp</value>
<description>Directory on HDFS for temporary data files</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/hive/log</value>
<description>Location of the query log</description>
</property>
</configuration>
2. hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.8.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/hive-1.1.0-cdh5.8.0/conf
3. hive-log4j.properties
# create the logs directory yourself
hive.log.dir=/opt/hive-1.1.0-cdh5.8.0/logs
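One detail about the ConnectionURL in hive-site.xml: inside an XML value, a literal & has to be written as &amp;, otherwise the file is not well-formed. The sketch below escapes a JDBC URL before it is pasted into the file. The function name `xml_escape_amp` is made up, and it handles only the ampersand, so do not feed it text that is already escaped.

```shell
# Hypothetical helper: escape '&' so a URL can be used inside an XML <value>.
xml_escape_amp() {
  # In the sed replacement, \& is a literal ampersand.
  printf '%s\n' "$1" | sed 's/&/\&amp;/g'
}

# Example:
#   xml_escape_amp 'jdbc:mysql://Jeason:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
```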
4. Create the directories Hive needs on HDFS
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /usr/hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod 777 /usr/hive/warehouse
hdfs dfs -chmod 777 /usr/hive/tmp
hdfs dfs -chmod 777 /usr/hive/log
4. Start Hive
bin/hive
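Notes:
1. If there is a JAR conflict, simply delete the duplicate JAR that is not the one under Hadoop.
2. Newer MySQL versions need the SSL option set; see the first property in hive-site.xml.
VIII. Spark deployment
IX. Hue deployment
1. Build
Install the dependencies needed for the build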
sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel mysql-devel openldap-devel python-devel sqlite-devel gmp-devel openssl-devel
Build
cd /hue
make apps
2. Configuration
1. Hadoop configuration
#Add this to hdfs-site.xml to enable WebHDFS (or run HttpFS)
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
#Add to core-site.xml
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
Notes:
If the web page reports the error: Failed to access filesystem root
#also add the following to hdfs-site.xml (an answer found on Stack Overflow)
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
2. hue.ini configuration
http_host=hostname
hive_server_port = 10016
time_zone=Asia/Shanghai
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
hive_conf_dir=/opt/hive-1.1.0-cdh5.8.0/conf
#disable the apps whose services are not running
app_blacklist=beeswax,impala,jobsub,pig,sqoop,oozie,indexer
#Hue and HDFS
fs_defaultfs=hdfs://hostname:9000
webhdfs_url=http://hostname:50070/webhdfs/v1
hadoop_conf_dir=/opt/hadoop-2.6.0-cdh5.8.0/etc/hadoop
hadoop_bin=/opt/hadoop-2.6.0-cdh5.8.0/bin
hadoop_hdfs_home=/opt/hadoop-2.6.0-cdh5.8.0
#Hue and YARN
resourcemanager_host=Jeason
resourcemanager_port=8032
submit_to=True
#Hue and Hive
hive_server_host=Jeason
hive_server_port=10000
hive_conf_dir=/opt/hive-1.1.0-cdh5.8.0/conf
#Hue and HBase
hbase_clusters=(Cluster|Jeason:9090)
hbase_conf_dir=/opt/hbase-1.2.0-cdh5.8.0/conf/
#Hue and MySQL (database and account created the same way as for the Hive metastore)
host=hostname
port=3306
engine=mysql
user=hue
password=hue
name=hue
3. Startup order
1.zookeeper
2.hdfs
3.yarn
4.hbase
5.mysql
6.thrift /opt/hbase-1.2.0-cdh5.8.0/bin/hbase-daemon.sh start thrift
7.hiveserver2 /opt/hive-1.1.0-cdh5.8.0/bin/hiveserver2
8.beeline /opt/hive-1.1.0-cdh5.8.0/bin/beeline
9.hue /opt/hue-3.9.0-cdh5.8.0/build/env/bin/supervisor
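The first six steps of the list above all return once the service is in the background, so they can be chained in one function. This is only a sketch: it assumes the /opt install paths used earlier in this post, the names `run`, `start_backends` and the `DRY_RUN` switch are made up, and the systemctl step needs root. hiveserver2, beeline and the Hue supervisor stay in the foreground, so they are left to separate terminals.

```shell
# Install locations as used in this post (adjust if yours differ).
ZK_HOME=/opt/zookeeper-3.4.5-cdh5.8.0
HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.8.0
HBASE_HOME=/opt/hbase-1.2.0-cdh5.8.0

# Run a command, or only print it when DRY_RUN=1 (a made-up switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start the background services in order (steps 1 to 6); stop at the first failure.
start_backends() {
  run "$ZK_HOME/bin/zkServer.sh" start &&
  run "$HADOOP_HOME/sbin/start-dfs.sh" &&
  run "$HADOOP_HOME/sbin/start-yarn.sh" &&
  run "$HBASE_HOME/bin/start-hbase.sh" &&
  run systemctl start mysqld &&
  run "$HBASE_HOME/bin/hbase-daemon.sh" start thrift
}

# Preview the commands without starting anything:
#   DRY_RUN=1 start_backends
```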