Notes:
JDK version: JDK 1.8.0_261 or later
- All Hadoop machines and Kerberos clients must run at least this JDK version; otherwise you may hit the following error (a quick version check is sketched below):
java KrbException: Message stream modified (41)
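A minimal check, assuming java is on the PATH (otherwise use $JAVA_HOME/bin/java):
# Print the JDK version on each Hadoop machine and Kerberos client
java -version 2>&1 | head -n 1
# Expect something like: java version "1.8.0_261" (or later)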
Install the Kerberos server and client on the machine that hosts snc-platform-node
Note: if installation via yum is not possible, download the required offline rpms yourself, matching the target OS and CPU architecture (x86 or arm); a sketch follows below. If the yum repository configuration has been changed, run yum clean all; yum makecache to refresh the dependency metadata.
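A minimal offline-install sketch, assuming the matching rpms have already been downloaded into the current directory:
# Refresh yum metadata after changing the repository configuration
yum clean all; yum makecache
# Install the downloaded rpms, resolving dependencies among the local files
yum localinstall ./*.rpm -y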
Kylin OS - ky10
# Install devel first, for dependency version compatibility
yum install krb5-devel
yum install krb5-libs
yum install krb5-client
# Server side
yum install krb5-server
yum install openldap-clients
Other systems
- Server
yum install krb5-server krb5-libs krb5-auth-dialog krb5-workstation openldap-clients -y
- Client (a post-install check is sketched after the commands below)
# RHEL
yum install krb5-workstation krb5-libs
# SUSE
zypper install krb5-client
# Ubuntu, Debian
apt-get install krb5-user
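A quick post-install check that the client tools are present (standard MIT Kerberos command names assumed):
# Confirm the Kerberos client tools are on the PATH
which kinit klist kdestroy
# On rpm-based systems, list the installed krb5 packages
rpm -qa | grep -i krb5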
Configure and start the Kerberos server
After installing the server, three files are generated: /etc/krb5.conf, /var/kerberos/krb5kdc/kadm5.acl, and /var/kerberos/krb5kdc/kdc.conf.
We change the default realm to HADOOP.COM; adjust it as needed. Multiple Hadoop clusters can also share a single realm.
2.1 Modify the configuration
- Edit /etc/krb5.conf
# Unless there is a special reason, keep this line commented out as well (it can cause the client to load the wrong configuration); put any additional settings directly in krb5.conf instead
#includedir /etc/krb5.conf.d/
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 24h
# renew_lifetime must be commented out; it can cause decryption failures on the client
# renew_lifetime = 7d
forwardable = true
rdns = false
pkinit_anchors = FILE:/etc/pki/tls/certs/ca-bundle.crt
default_realm = HADOOP.COM
# Recommended to remove this setting; it may cause client authentication failures
# default_ccache_name = KEYRING:persistent:%{uid}
[realms]
HADOOP.COM = {
# Hostname of the KDC service; it must be configured in /etc/hosts beforehand (see the sketch after this file)
kdc = hadoop-node0
admin_server = hadoop-node0
}
[domain_realm]
.hadoop.com = HADOOP.COM
hadoop.com = HADOOP.COM
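The kdc and admin_server hostname must resolve on every Hadoop machine and Kerberos client. A hypothetical /etc/hosts entry, with a made-up IP address to replace with your own:
# /etc/hosts (the IP address below is an example)
192.168.1.10  hadoop-node0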
- Edit /var/kerberos/krb5kdc/kadm5.acl
*/admin@HADOOP.COM *
- Edit /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
HADOOP.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
2.2 Initialize and start the Kerberos service
- Create the Kerberos database; you will be prompted for a password, enter it twice
kdb5_util create -r HADOOP.COM -s
- Run kadmin.local and enter list_principals to view the existing principals
- Enable and start the KDC and kadmin services (a post-start check is sketched after these commands)
systemctl enable krb5kdc
systemctl enable kadmin
systemctl start krb5kdc
systemctl start kadmin
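A quick post-start check, run as root on the KDC host:
# Confirm both services are active
systemctl status krb5kdc kadmin
# Confirm the KDC is listening on port 88
netstat -lnpt | grep :88
# List principals non-interactively
kadmin.local -q "list_principals"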
Configure Kerberos for Hadoop
3.1 Enable SSL for the HDFS DataNode
Because DataNode data transfer goes over HTTP rather than RPC, the DataNode must be configured with SSL to support SASL; otherwise the process will fail to start.
3.1.1 Generate the SSL files keystore.jks and truststore.jks
- Generate a CA certificate on any one Hadoop machine
Run this only once; it produces two files, ca-cert and ca-key. Copy them to the designated directory on every Hadoop machine, e.g. /data/hadoop/kerberos
export password=user123
export sslSubj="/C=CN/L=Guangzhou/O=user/CN=user.com"
rm -fr *.jks
export aliasName=user_hadoop
openssl req -new -x509 -keyout ca-key -out ca-cert -days 3650 -passin pass:${password} -passout pass:${password} -subj ${sslSubj}
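Optionally, inspect the generated CA certificate before distributing it:
# Show the CA certificate subject and validity period
openssl x509 -in ca-cert -noout -subject -dates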
- On every Hadoop machine, run the following script to generate the jks files and sign them with the CA
The keytool command requires JAVA_HOME to be configured; otherwise specify its full path
rm -fr *.jks
export password=user123
export aliasName=user_hadoop
export keytoolDname="C=CN,L=Guangzhou,O=user,CN=user.com"
#Step 1: generate the server-side SSL key and certificate
keytool -keystore keystore.jks -alias ${aliasName} -validity 365 -keyalg RSA -genkey -storepass ${password} -keypass ${password} -dname ${keytoolDname}
keytool -keystore truststore.jks -alias CARoot -import -file ca-cert -storepass ${password} -keypass ${password} -dname ${keytoolDname} -noprompt
#Step 2: sign with the CA
keytool -keystore keystore.jks -alias ${aliasName} -certreq -file cert-file -storepass ${password} -keypass ${password} -dname ${keytoolDname} -noprompt
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed -days 365 -CAcreateserial -passin pass:${password}
keytool -keystore keystore.jks -alias CARoot -import -file ca-cert -storepass ${password} -keypass ${password} -dname ${keytoolDname} -noprompt
keytool -keystore keystore.jks -alias ${aliasName} -import -file cert-signed -storepass ${password} -keypass ${password} -dname ${keytoolDname} -noprompt
rm -fr ca-cert.srl cert-file cert-signed
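To confirm that both the CA root and the signed server certificate landed in the keystores, a quick listing (reuses the ${password} variable from the script above):
keytool -list -keystore keystore.jks -storepass ${password}
keytool -list -keystore truststore.jks -storepass ${password}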
3.1.2 Edit ssl-client.xml and ssl-server.xml; create them if they do not exist
- Edit ./etc/hadoop/ssl-client.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value>/data/hadoop/kerberos/truststore.jks</value>
<description>Truststore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>user123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value>/data/hadoop/kerberos/keystore.jks</value>
<description>Keystore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value>user123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.keypassword</name>
<value>user123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>
- Edit ./etc/hadoop/ssl-server.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>ssl.server.truststore.location</name>
<value>/data/hadoop/kerberos/truststore.jks</value>
<description>Truststore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value>user123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value>/data/hadoop/kerberos/keystore.jks</value>
<description>Keystore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>user123</value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>user123</value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.exclude.cipher.list</name>
<value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
SSL_RSA_WITH_RC4_128_MD5</value>
<description>Optional. The weak security cipher suites that you want excluded
from SSL communication.</description>
</property>
</configuration>
- Edit ./etc/hadoop/hdfs-site.xml
<!-- Enable SSL for the DataNode -->
<property>
<name>dfs.data.transfer.protection</name>
<value>authentication</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>ssl.server.truststore.location</name>
<value>/data/hadoop/kerberos/truststore.jks</value>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value>/data/hadoop/kerberos/keystore.jks</value>
</property>
3.1.3 Restart HDFS
This step mainly verifies that SSL is configured correctly. If passwordless SSH is set up between the Hadoop machines, you can use ./sbin/stop-dfs.sh and ./sbin/start-dfs.sh to stop and start HDFS.
Note that the stop script may fail to stop the processes, so confirm with ps (if the /tmp directory has been cleaned, some of Hadoop's runtime information is gone and the processes cannot be stopped via the script; kill them manually instead).
After a successful start, the web UI is reachable at https://hadoop-node0:9871
(before HTTPS was enabled, the default port was 9870). If the web UI cannot be exposed for network-security reasons, verify with curl, netstat, or hdfs dfsadmin -report instead, as sketched below.
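A sketch of that command-line verification, assuming the NameNode runs on hadoop-node0; -k is needed because the certificate is signed by our own CA:
# Check that the HTTPS port is listening
netstat -lnpt | grep 9871
# Fetch the NameNode web UI over HTTPS
curl -k https://hadoop-node0:9871/
# Or query HDFS directly
./bin/hdfs dfsadmin -report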
3.2 Enable Kerberos authentication in Hadoop
3.2.1 Create a principal and export the keytab file
On the machine that hosts the Kerberos server (KDC), e.g. snc-platform-node, create a principal (roughly the equivalent of a user identity).
Format: hdfs/hadoop_cluster@HADOOP.COM, made up of three parts: username/service-name@realm.
Within one Hadoop cluster a single principal can be shared to simplify deployment. With multiple Hadoop clusters, distinguish them by service name, e.g. hadoop_cluster_1, hadoop_cluster_2; to run multiple KDCs, distinguish them by realm, e.g. @HADOOP_1.COM, @HADOOP_2.COM.
Steps:
- Create the principal. As root, run kadmin.local; when prompted, enter the same password twice. After creation, verify with list_principals.
kadmin.local:
add_principal hdfs/hadoop_cluster@HADOOP.COM
- Export the keytab file, named user.keytab; for now this name is used uniformly
kadmin.local:
ktadd -kt /data/hadoop/kerberos/user.keytab hdfs/hadoop_cluster@HADOOP.COM
# After exporting, run kinit as the user that starts the Hadoop processes to verify the keytab, so that file-permission or other issues do not cause authentication failures later (a klist check is sketched below)
kinit -kt /data/hadoop/kerberos/user.keytab hdfs/hadoop_cluster@HADOOP.COM
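Once kinit succeeds, confirm the ticket:
# Show the ticket obtained from the keytab; expect a krbtgt/HADOOP.COM@HADOOP.COM entry for the principal above
klist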
- Copy the exported keytab file to the same directory on every Hadoop machine
3.2.2 Copy the Kerberos server's /etc/krb5.conf to the /etc directory of every Hadoop node, overwriting the existing file
Assume the designated directory is /data/hadoop/kerberos/ (a distribution sketch follows)
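A distribution sketch, assuming passwordless SSH and a hypothetical node list; replace the hostnames with your own:
for node in hadoop-node1 hadoop-node2 hadoop-node3; do
  scp /data/hadoop/kerberos/user.keytab ${node}:/data/hadoop/kerberos/
  scp /etc/krb5.conf ${node}:/etc/
done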
3.2.3 Point Hadoop at the principal and keytab
- Edit ./etc/hadoop/core-site.xml
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.kerberos.keytab.login.autorenewal.enabled</name>
<value>true</value>
</property>
- Edit ./etc/hadoop/hdfs-site.xml
<!-- Custom properties: multiple services share one principal and keytab to reduce configuration -->
<property>
<name>custom.kerberos.common.principal</name>
<value>hdfs/hadoop_cluster@HADOOP.COM</value>
</property>
<property>
<name>custom.kerberos.common.keytab.file</name>
<value>/data/hadoop/kerberos/user.keytab</value>
</property>
<!-- Web configuration -->
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<!-- NameNode / SecondaryNameNode configuration -->
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<!-- DataNode configuration -->
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!-- JournalNode configuration -->
<property>
<name>dfs.journalnode.kerberos.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>dfs.journalnode.keytab.file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
- Edit ./etc/hadoop/yarn-site.xml
<property>
<name>custom.kerberos.common.principal</name>
<value>hdfs/hadoop_cluster@HADOOP.COM</value>
</property>
<property>
<name>custom.kerberos.common.keytab.file</name>
<value>/data/hadoop/kerberos/user.keytab</value>
</property>
<!-- resourcemanager -->
<property>
<name>yarn.resourcemanager.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.spnego-principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.spnego-keytab-file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<!-- nodemanager -->
<property>
<name>yarn.nodemanager.principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.spnego-principal</name>
<value>${custom.kerberos.common.principal}</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
<property>
<name>yarn.nodemanager.webapp.spnego-keytab-file</name>
<value>${custom.kerberos.common.keytab.file}</value>
</property>
3.2.4 Restart the Hadoop cluster
With passwordless SSH between the machines, the scripts under sbin can be used for batch operations
Restart HDFS
./sbin/stop-dfs.sh
./sbin/start-dfs.sh
Restart YARN
./sbin/stop-yarn.sh
./sbin/start-yarn.sh
With batch operations it is common for some Hadoop process on some machine to fail to start; handle those cases individually. For example, to start a single DataNode, run ./bin/hdfs --daemon start datanode; the other processes are handled the same way (see the sketch below).
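For reference, a sketch of the per-daemon commands (Hadoop 3.x style) for the processes involved here:
# Restart a single DataNode
./bin/hdfs --daemon stop datanode
./bin/hdfs --daemon start datanode
# The same pattern applies to the other daemons
./bin/hdfs --daemon start namenode
./bin/hdfs --daemon start journalnode
./bin/yarn --daemon start resourcemanager
./bin/yarn --daemon start nodemanager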
3.2.5 Verify the Hadoop services
On any Hadoop machine with a Kerberos client installed, run
kinit -kt /data/hadoop/kerberos/user.keytab hdfs/hadoop_cluster@HADOOP.COM
Then check HDFS-related information; if the commands return normally, the services are healthy.
DataNode status:
./bin/hdfs dfsadmin -report
List the root directory:
./bin/hdfs dfs -ls /
Check YARN-related information:
./bin/yarn node -list
Exception handling records
[Kingsoft Docs | WPS Cloud Docs] Hadoop authentication exception records
https://kdocs.cn/l/cuhZSkouygW6