OpenShift is a PaaS developed by Red Hat and requires a paid subscription; its community edition is OKD. The installation procedures are nearly identical, differing only in the operating system and the upper-layer application software. This article walks through installing OKD.
Cluster environment
Notes:
The hosts in the cluster are ordinary physical PCs, not virtual machines.
Give each host at least 16 GB of RAM, especially the worker hosts: deploying "openshift-logging" alone consumes a fair amount of memory, so the more memory the better.
If you plan to run storage (e.g. Ceph) on the worker hosts, fit the extra non-system disks before building the cluster; otherwise you will have to shut the machines down later to add them.
The installation pulls a large number of images from Quay.io; on a slow network it can take a very long time, so consider setting up a mirror registry or otherwise improving your connectivity.
The installation process below is divided into six parts.
Part 1: DHCP and DNS
Before installing the cluster, DHCP and DNS must be configured.
1. DHCP
The cluster hosts install their operating system via PXE and obtain their network settings from DHCP. Two points matter here (I used the DHCP service bundled with Windows Server 2008):
(1) Reserve a fixed IP address for each host's MAC, which makes the DNS configuration straightforward.
(2) Configure the PXE-related options.
The boot file used here is "lpxelinux.0"; the reason is explained later.
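I configured this through the Windows Server 2008 DHCP console, but the same two points can be sketched with ISC dhcpd for reference; the subnet, router, and MAC address below are illustrative assumptions, not values from this cluster:

```conf
# Hypothetical ISC dhcpd.conf fragment: PXE boot file plus one static reservation
subnet 10.1.99.0 netmask 255.255.255.0 {
  range 10.1.99.100 10.1.99.200;
  option routers 10.1.99.1;        # assumed gateway
  next-server 10.1.95.10;          # TFTP server
  filename "lpxelinux.0";          # HTTP-capable PXE loader
}
host master-1 {
  hardware ethernet 52:54:00:aa:bb:cc;  # example MAC
  fixed-address 10.1.99.11;
}
```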
2. DNS
I had planned to use the DNS service bundled with Windows Server 2008 as well, but I happened to have a CoreDNS container handy, so I used that instead. The zone file is as follows:
$ORIGIN okd-infra.wumi.ai. ; designates the start of this zone file in the namespace
$TTL 1h ; default expiration time of all resource records without their own TTL value
okd-infra.wumi.ai. IN SOA ns.okd-infra.wumi.ai. host-1.example.xyz. ( 2007120710 1d 2h 4w 1h )
okd-infra.wumi.ai. IN NS ns ; ns.example.com is a nameserver for example.com
okd-infra.wumi.ai. IN A 10.1.95.9 ; IPv4 address for example.com
ns IN A 10.1.95.9 ; IPv4 address for ns.example.com
bootstrap IN A 10.1.99.7
master-1 IN A 10.1.99.11
master-2 IN A 10.1.99.3
master-3 IN A 10.1.99.8
worker-1 IN A 10.1.99.14
worker-2 IN A 10.1.99.15
worker-3 IN A 10.1.99.16
etcd-0 IN A 10.1.99.11
etcd-1 IN A 10.1.99.3
etcd-2 IN A 10.1.99.8
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-0
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-1
_etcd-server-ssl._tcp 86400 IN SRV 0 10 2380 etcd-2
api IN A 10.1.95.9 ; host-1 haproxy
api-int IN A 10.1.95.9 ; host-1 haproxy
*.apps IN A 10.1.95.9 ; host-1 haproxy
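For completeness, a minimal CoreDNS Corefile that serves this zone might look like the following sketch; the zone-file path and the upstream forwarder are assumptions:

```conf
# Hypothetical Corefile: serve the cluster zone, forward everything else upstream
okd-infra.wumi.ai {
    file /etc/coredns/db.okd-infra.wumi.ai   # the zone file shown above
    log
}
. {
    forward . /etc/resolv.conf
}
```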
Part 2: HAProxy
HAProxy load-balances access to the API server and to the Ingress routers. The configuration file:
/etc/haproxy/haproxy.cfg
defaults
mode tcp
option dontlognull
timeout connect 10s
timeout client 1m
timeout server 1m
#---------------------------------------------------------------------
frontend openshift-api-server
bind 10.1.95.9:6443
default_backend api-backend
mode tcp
#---------------------------------------------------------------------
backend api-backend
balance source
mode tcp
# server bootstrap 10.1.99.7:6443 check port 6443
server master-1 10.1.99.11:6443 check port 6443
server master-2 10.1.99.3:6443 check port 6443
server master-3 10.1.99.8:6443 check port 6443
#---------------------------------------------------------------------
frontend machine-config-server
bind 10.1.95.9:22623
default_backend machine-config-server
mode tcp
#---------------------------------------------------------------------
backend machine-config-server
balance source
mode tcp
# server bootstrap 10.1.99.7:22623 check port 22623
server master-1 10.1.99.11:22623 check port 22623
server master-2 10.1.99.3:22623 check port 22623
server master-3 10.1.99.8:22623 check port 22623
#---------------------------------------------------------------------
frontend ingress-http
bind 10.1.95.9:80
default_backend ingress-http
mode tcp
#---------------------------------------------------------------------
backend ingress-http
balance source
mode tcp
server worker-1 10.1.99.14:80 check port 80
server worker-2 10.1.99.15:80 check port 80
server worker-3 10.1.99.16:80 check port 80
#---------------------------------------------------------------------
frontend ingress-https
bind 10.1.95.9:443
default_backend ingress-https
mode tcp
#---------------------------------------------------------------------
backend ingress-https
balance source
mode tcp
server worker-1 10.1.99.14:443 check port 443
server worker-2 10.1.99.15:443 check port 443
server worker-3 10.1.99.16:443 check port 443
#---------------------------------------------------------------------
listen admin_stats # web stats page
bind 0.0.0.0:8081
mode http
log 127.0.0.1 local0 err
stats refresh 10s
stats uri /haproxy
stats realm welcome login\ Haproxy
stats hide-version
stats admin if TRUE
In the initial configuration, do not comment out the bootstrap entries in "backend api-backend" and "backend machine-config-server". During the installation, comment them out only once "./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info" reports that it is safe to remove the bootstrap.
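Commenting out the two bootstrap lines can be done by hand; a small sed substitution also works. The sketch below runs against a local sample fragment so the effect is visible; adapt the file name to /etc/haproxy/haproxy.cfg and reload HAProxy afterwards (e.g. systemctl reload haproxy):

```shell
# Demo on a sample fragment; the pattern assumes lines of the form
# "    server bootstrap ..." as in the config above.
cat > haproxy-sample.cfg <<'EOF'
backend api-backend
    server bootstrap 10.1.99.7:6443 check port 6443
    server master-1 10.1.99.11:6443 check port 6443
EOF
sed -i 's/^\([[:space:]]*server bootstrap \)/# \1/' haproxy-sample.cfg
grep -n 'bootstrap' haproxy-sample.cfg   # the bootstrap line is now commented out
```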
Part 3: Download the required software and prepare the installation configuration file
1. Download the required software
(1) Download the cluster installer, openshift-install, from "https://github.com/openshift/okd/releases". It drives the deployment of OpenShift 4 clusters on public clouds and on your own infrastructure.
(2) Download the latest cluster management tool, oc, from "https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/". oc connects to and manages the cluster from the command line.
2. Customize the installation configuration file
Installing an OpenShift 4 cluster is completely different from OpenShift 3: the installation configuration file is written before the install. Below is a sample (the file must be named install-config.yaml):
apiVersion: v1
baseDomain: wumi.ai
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0  # must be 0 for a user-provisioned (bare-metal) install
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: okd-infra
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: 'pullsecret obtained from redhat'
sshKey: 'sshkey that is created by ssh-keygen command'
The cluster has three master hosts, which run the apiserver, the etcd cluster, and so on.
"pullSecret": obtained from the Red Hat website. The images needed to deploy the cluster are hosted on Quay.io, and this secret authenticates the pulls.
"sshKey": an SSH public key, generated with the "ssh-keygen" command. From the host holding the matching private key you can ssh into any server in the cluster without a password, which is convenient for debugging.
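Generating a keypair for the sshKey field is a one-liner; the key type, file name, and comment below are assumptions, and any key ssh-keygen produces will do:

```shell
# Create a passphrase-less keypair; paste the .pub contents into sshKey
ssh-keygen -t ed25519 -N '' -f ./okd_key -C 'okd-installer'
cat ./okd_key.pub
```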
Part 4: Generate the Kubernetes manifests and Ignition configs
1. Generate the Kubernetes manifests
Create a directory "config-install", copy the "install-config.yaml" written in the previous step into it, then run:
./openshift-install create manifests --dir=config-install
The installer generates the manifest files in "config-install" (install-config.yaml is consumed in the process).
We do not want user pods to run on the masters, so edit "config-install/manifests/cluster-scheduler-02-config.yml", set "mastersSchedulable" to "false", then save and close the file.
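After the edit, the scheduler manifest should look roughly like this; the sketch is based on the OKD 4.5 default, with every field other than mastersSchedulable left as generated:

```yaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: null
  name: cluster
spec:
  mastersSchedulable: false
  policy:
    name: ""
status: {}
```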
2. Generate the Ignition config files, which customize Fedora CoreOS (every host in an OpenShift 4 cluster must run CoreOS):
./openshift-install create ignition-configs --dir=config-install
The installer generates the Ignition files in "config-install" (the manifest files are consumed in the process).
Part 5: Set up the PXE installation environment
With the Ignition files and the OS image files in hand, configure the PXE environment. The Ignition configs, kernel, and initrd are downloaded by the cluster hosts over HTTP, so an HTTP server and a TFTP server are needed first.
1. TFTP server
Two parts of the TFTP configuration matter:
The PXE boot file must be "lpxelinux.0"; that is what enables downloads over HTTP.
The pxelinux.cfg configuration:
# D-I config version 2.0
# search path for the c32 support libraries (libcom32, libutil etc.)
path debian-installer/amd64/boot-screens/
include debian-installer/amd64/boot-screens/menu.cfg
default debian-installer/amd64/boot-screens/vesamenu.c32
prompt 0
timeout 0
label fedora-coreos-bootstrap
  KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
  APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://10.1.95.10:8000/bootstrap.ign coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
label fedora-coreos-master
  KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
  APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://10.1.95.10:8000/master.ign coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
label fedora-coreos-worker
  KERNEL http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-kernel-x86_64
  APPEND ip=dhcp initrd=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img console=tty0 console=ttyS0 coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://10.1.95.10:8000/worker.ign coreos.live.rootfs_url=http://10.1.95.10:8000/fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
2. HTTP server
I use nginx for the HTTP service. Its configuration is unremarkable: place the required files in "/var/www/html", and the cluster hosts will fetch them during the PXE install:
aneirin@vm-1:/var/www/html$ ls -lh
total 732M
-rwxrwxrwx 1 root root 297K Oct 16 15:32 bootstrap.ign
-rwxrwxrwx 1 root root 70M Oct 15 10:44 fedora-coreos-32.20200923.3.0-live-initramfs.x86_64.img
-rwxrwxrwx 1 root root 12M Oct 15 10:44 fedora-coreos-32.20200923.3.0-live-kernel-x86_64
-rwxrwxrwx 1 root root 651M Oct 15 10:45 fedora-coreos-32.20200923.3.0-live-rootfs.x86_64.img
-rwxrwxrwx 1 root root 11K Sep 5 2019 index.html # nginx default file
-rwxrwxrwx 1 root root 612 Apr 22 11:50 index.nginx-debian.html # nginx default file
-rwxrwxrwx 1 root root 1.9K Oct 16 15:32 master.ign
-rwxrwxrwx 1 root root 1.9K Oct 16 15:32 worker.ign
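Since the pxelinux entries above fetch from port 8000 rather than nginx's default port 80, a minimal server block might look like the following sketch; drop it into a file under /etc/nginx/conf.d/ and adjust listen and root to your layout:

```nginx
server {
    listen 8000;
    root /var/www/html;
    autoindex on;   # handy for eyeballing what is being served
}
```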
Part 6: Install the cluster
With the PXE environment ready, install the operating system on the seven cluster hosts (bootstrap -> master -> worker). There is no need to wait for one host to finish before starting the next; installing them all at the same time works fine.
Use "./openshift-install --dir=<installation_directory> wait-for bootstrap-complete --log-level=info" to watch the bootstrap process. When it reports that the bootstrap can be removed, delete the bootstrap entries from the HAProxy configuration; the bootstrap host's job is then done.
1. Configure the login credentials
export KUBECONFIG=<installation_directory>/auth/kubeconfig
This line can go straight into "~/.bashrc", so the KUBECONFIG environment variable is set in every new shell.
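For example (the installation directory in the path below is an assumption; substitute your own <installation_directory>):

```shell
# Persist KUBECONFIG for future shells
echo 'export KUBECONFIG=$HOME/okd4/config-install/auth/kubeconfig' >> ~/.bashrc
tail -n 1 ~/.bashrc
```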
2. Connect to the cluster and approve CSRs
With the credentials configured, you can connect as the user "system:admin", which has full administrative rights over the cluster (disable this account once the installation is complete; it is a security liability). Certificate signing requests generated for some components must be approved before their installation can proceed:
oc get csr                               # list the CSRs awaiting approval
oc adm certificate approve <csr_name>    # approve the named CSR
Once the approvals are done, re-running "oc get csr" should show the requests as approved.
3. Wait for the clusteroperators to finish installing
The OKD cluster infrastructure relies heavily on its "clusteroperators"; wait until the "AVAILABLE" column shows "True" for all of them:
aneirin@host-1:~$ oc get clusteroperators
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication 4.5.0-0.okd-2020-10-03-012432 True False False 3h22m
cloud-credential 4.5.0-0.okd-2020-10-03-012432 True False False 3h55m
cluster-autoscaler 4.5.0-0.okd-2020-10-03-012432 True False False 3h29m
config-operator 4.5.0-0.okd-2020-10-03-012432 True False False 3h29m
console 4.5.0-0.okd-2020-10-03-012432 True False False 3h24m
csi-snapshot-controller 4.5.0-0.okd-2020-10-03-012432 True False False 3h33m
dns 4.5.0-0.okd-2020-10-03-012432 True False False 3h41m
etcd 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
image-registry 4.5.0-0.okd-2020-10-03-012432 True False False 3h36m
ingress 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
insights 4.5.0-0.okd-2020-10-03-012432 True False False 3h36m
kube-apiserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
kube-controller-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h42m
kube-scheduler 4.5.0-0.okd-2020-10-03-012432 True False False 3h40m
kube-storage-version-migrator 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
machine-api 4.5.0-0.okd-2020-10-03-012432 True False False 3h35m
machine-approver 4.5.0-0.okd-2020-10-03-012432 True False False 3h40m
machine-config 4.5.0-0.okd-2020-10-03-012432 True False False 3h28m
marketplace 4.5.0-0.okd-2020-10-03-012432 True False False 3h35m
monitoring 4.5.0-0.okd-2020-10-03-012432 True False False 3h25m
network 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
node-tuning 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
openshift-apiserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h27m
openshift-controller-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h34m
openshift-samples 4.5.0-0.okd-2020-10-03-012432 True False False 3h24m
operator-lifecycle-manager 4.5.0-0.okd-2020-10-03-012432 True False False 3h43m
operator-lifecycle-manager-catalog 4.5.0-0.okd-2020-10-03-012432 True False False 3h43m
operator-lifecycle-manager-packageserver 4.5.0-0.okd-2020-10-03-012432 True False False 3h34m
service-ca 4.5.0-0.okd-2020-10-03-012432 True False False 3h44m
storage 4.5.0-0.okd-2020-10-03-012432 True False False 3h33m
4. Configure storage for the image-registry
When OKD 4 is deployed outside a public cloud, the image-registry has no ready-made storage. Outside production you can use "emptyDir" as temporary storage (images are lost whenever the registry restarts, so never use it in production), which is enough to run the cluster's internal registry. The command:
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
And with that, the installation is complete:
aneirin@host-1:~$ ./openshift-install --dir=config-install wait-for install-complete
INFO Waiting up to 30m0s for the cluster at https://api.okd-infra.wumi.ai:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/aneirin/okd4/config-install/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.okd-infra.wumi.ai
INFO Login to the console with user: "kubeadmin", and password: "CaEJY-myzAi-R7Wtj-XXXX"
INFO Time elapsed: 1s
This is only the first step in the long march of running OpenShift 4; plenty of work remains, such as monitoring, logging, and storage. Stay tuned!