1. Download Hadoop
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
2. Install the JDK
rpm -ivh jdk-8u261-linux-x64.rpm
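To confirm the JDK installed correctly:
java -version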
3. Extract the Hadoop archive
tar zxvf hadoop-3.3.1.tar.gz
4. Point Hadoop at the JDK
Edit the hadoop-3.3.1/etc/hadoop/hadoop-env.sh file. From the hadoop-3.3.1 directory, run
vi etc/hadoop/hadoop-env.sh
and add:
export JAVA_HOME=/usr/java/jdk1.8.0_261-amd64
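With JAVA_HOME set, a quick sanity check from the hadoop-3.3.1 directory:
bin/hadoop version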
5. Pseudo-distributed deployment
1) Configure etc/hadoop/core-site.xml
vi etc/hadoop/core-site.xml
Add the following:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
2) Configure etc/hadoop/hdfs-site.xml
vi etc/hadoop/hdfs-site.xml
Add the following:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
3) Set up passwordless SSH to localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
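Verify that it works; this should log in without prompting for a password:
ssh localhost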
4) Format the filesystem
bin/hdfs namenode -format
5) Run sbin/start-dfs.sh; it fails with the following errors:
[root@iZ2zeb8tcng37z21t5bk9cZ hadoop-3.3.1]# sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZ2zeb8tcng37z21t5bk9cZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Fix: add the following near the top of sbin/start-dfs.sh and sbin/stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
6) Likewise, add the following to sbin/start-yarn.sh and sbin/stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Then start DFS and YARN:
sbin/start-dfs.sh
sbin/start-yarn.sh
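If everything came up, jps (shipped with the JDK) should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, plus Jps itself:
jps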
7) Create the HDFS directories required to run MapReduce jobs:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
8) Copy the input files into the distributed filesystem:
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
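Step 9) inspects job output, so a job needs to run first. A minimal run using the examples jar bundled with Hadoop 3.3.1 (the grep example from the official single-node guide):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input output 'dfs[a-z.]+'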
9) Check the output: copy the output files from the distributed filesystem to the local filesystem and inspect them:
bin/hdfs dfs -get output output
cat output/*
Or view the output files directly on the distributed filesystem:
bin/hdfs dfs -cat output/*
10) Configure single-node YARN
Configure etc/hadoop/mapred-site.xml:
vi etc/hadoop/mapred-site.xml
Add:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
Configure etc/hadoop/yarn-site.xml:
vi etc/hadoop/yarn-site.xml
Add:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
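These files are only read at daemon startup, so since YARN was already started in step 6), restart it for the settings to take effect:
sbin/stop-yarn.sh
sbin/start-yarn.sh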
11) Install and configure Hive
Download from:
https://hive.apache.org/general/downloads/
Extract the archive:
tar zxvf apache-hive-2.3.9-bin.tar.gz
Rename the directory:
mv apache-hive-2.3.9-bin/ hive-2.3.9
12) In the hive-2.3.9/conf directory, rename the template configuration file:
mv hive-env.sh.template hive-env.sh
13) Edit hive-env.sh
vi hive-env.sh
Add the following:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/root/hadoop-3.3.1
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/root/hive-2.3.9/conf
14) Update environment variables
vi /etc/profile
Add the following:
export HIVE_HOME=/root/hive-2.3.9
export PATH=$PATH:$HIVE_HOME/bin
# Add the Hive libraries to the Hadoop classpath
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
Apply the changes:
source /etc/profile
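A quick check that the new variables took effect:
echo $HIVE_HOME
hive --version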
15) Install MySQL
Extract the bundle:
tar xvf mysql-8.0.26-1.el8.x86_64.rpm-bundle.tar
Install the RPMs in this order:
rpm -ivh mysql-community-common-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-8.0.26-1.el8.x86_64.rpm
rpm -ivh mysql-community-server-8.0.26-1.el8.x86_64.rpm
16) Initialize the database
mysqld --initialize
17) Check the configuration file and fix ownership
cat /etc/my.cnf
It contains the line datadir=/var/lib/mysql, the data file location. That directory must be owned by the mysql user, or the database will not start. Grant ownership with
chown mysql:mysql /var/lib/mysql -R
Run this after the database has been initialized (this applies when MySQL was installed by a user other than mysql, e.g. root); otherwise starting the database will fail.
18) Start the database
Start MySQL:
systemctl start mysqld.service
Stop MySQL:
systemctl stop mysqld.service
Restart MySQL:
systemctl restart mysqld.service
Enable MySQL at boot:
systemctl enable mysqld
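Check the service state at any time with:
systemctl status mysqld.service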
19) View and change the root user's password
Find the initial password in the log:
cat /var/log/mysqld.log | grep password
Change the initial password:
mysqladmin -uroot -p'Ush&4PGR=0Vj' password Mysql123456
If the initial password contains special characters such as < or &, wrap the password in single quotes, as shown above.
20) Allow remote clients to log in to MySQL
mysql -u root -p
use mysql
update user set host='%' where user = 'root';
select host,user from user;
The database must be restarted for the change to take effect:
systemctl restart mysqld.service
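Alternatively, because the grant tables were edited directly with UPDATE, reloading them in the same mysql session has the same effect as a restart:
flush privileges;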
21) Add the MySQL JDBC driver to Hive
Install the MySQL connector package:
rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
This fails with a dependency error saying java-headless is required. Install it:
yum install java-headless
Then install the connector package again:
rpm -ivh mysql-connector-java-8.0.26-1.el8.noarch.rpm
Locate the driver jar:
find / -name mysql-connector-java.jar
Copy mysql-connector-java.jar into hive-2.3.9/lib:
cp /usr/share/java/mysql-connector-java.jar /root/hive-2.3.9/lib
22) Create the configuration file hive-site.xml in hive-2.3.9/conf
vi hive-site.xml
Add the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>HadoopMysql123456</value>
<description>password to use against metastore database</description>
</property>
<!-- Show column headers in query results -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- Show the current database in the CLI prompt -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
</configuration>
Initialize the metastore and start the metastore service:
schematool -dbType mysql -initSchema
hive --service metastore &
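To confirm the initialization succeeded, schematool can report the schema version it finds in MySQL (this assumes $HIVE_HOME/bin is on the PATH, as set in step 14):
schematool -dbType mysql -info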
23) Edit the Hadoop configuration file etc/hadoop/core-site.xml:
vi etc/hadoop/core-site.xml
Add the following:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
Restart DFS and YARN for the change to take effect:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
24) Create a database and grant permissions
Log in to Hive and create the chinese_consul database:
hive
create database if not exists chinese_consul;
quit;
Grant permissions:
bin/hdfs dfs -chmod -R 777 /user/hive/warehouse/chinese_consul.db
Inserting data through a frontend client fails with this error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn":root:supergroup:drwx------
This is caused by insufficient permissions. Grant access with:
hadoop fs -chown hadoop:hadoop /tmp/hadoop-yarn
hadoop fs -chmod -R 777 /tmp/hadoop-yarn
Known issues
Exiting with status 1: java.io.IOException: NameNode is not formatted.
When this error occurs, port 9000 is not open.
Fix: reformat the filesystem:
bin/hdfs namenode -format
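Reformatting wipes all existing HDFS metadata, so only do this on a fresh or disposable install. To check whether the NameNode is actually listening on port 9000 (assuming the iproute ss tool is available):
ss -tlnp | grep 9000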
To change the ResourceManager web UI address (default http://localhost:8088), edit hadoop-3.3.1/etc/hadoop/yarn-site.xml and add:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
Fixing garbled Chinese in Hive table comments
Log in to MySQL:
mysql -u root -p
Switch to the metastore database:
use metastore;
Run the following statements:
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
Then recreate the affected tables.
Change the ResourceManager's default web port. The default is 8088, which is frequently targeted by crypto-mining attacks.
Edit yarn-site.xml:
vi etc/hadoop/yarn-site.xml
Add the following:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8888</value>
</property>
Here 8888 is the new port.
Restart:
sbin/stop-dfs.sh
sbin/stop-yarn.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
References:
https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml