1 Planning and design
Host | Service network | Heartbeat network | Storage network |
---|---|---|---|
node1 | 192.168.234.129 | 172.16.1.231 | 10.0.1.231 |
node2 | 192.168.234.130 | 172.16.1.232 | 10.0.1.232 |
storage | 192.168.234.250 | - | 10.0.1.235 |
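As a rough sketch of how these addresses might be assigned on node1 with NetworkManager (the connection names ens33/ens34/ens35 are assumptions; substitute your own NIC/connection names):
nmcli con mod ens33 ipv4.method manual ipv4.addresses 192.168.234.129/24
nmcli con mod ens34 ipv4.method manual ipv4.addresses 172.16.1.231/24
nmcli con mod ens35 ipv4.method manual ipv4.addresses 10.0.1.231/24
nmcli con up ens33; nmcli con up ens34; nmcli con up ens35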
2 Node preparation
Install the virtualization package groups
yum groups install -y "Virtualization Platform"
yum groups install -y "Virtualization Hypervisor"
yum groups install -y "Virtualization Tools"
yum groups install -y "Virtualization Client"
Install the cluster packages
yum install pacemaker corosync pcs psmisc policycoreutils-python fence-agents-all -y
On the storage host, install the nfs and rpcbind packages
yum -y install nfs-utils rpcbind
Add name resolution for every host to /etc/hosts
[root@node1 ~]$ cat /etc/hosts
192.168.234.129 node1
192.168.234.130 node2
10.0.1.231 node1-stor
10.0.1.232 node2-stor
10.0.1.235 stor
172.16.1.231 node1-sync
172.16.1.232 node2-sync
Configure passwordless SSH authentication
ssh-keygen -t rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1 #passwordless login to itself
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2 #passwordless login to node2 (repeat on node2 so it works in both directions)
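With passwordless SSH in place, the hosts file can simply be copied to the other node, for example:
scp /etc/hosts node2:/etc/hosts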
Set up scheduled time synchronization
yum install ntpdate -y
crontab -e
*/30 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null
Configure the firewall
#allow the high-availability cluster service through the firewall
firewall-cmd --permanent --add-service=high-availability
#trust the heartbeat and storage networks
firewall-cmd --zone=trusted --add-source=10.0.1.0/24 --permanent
firewall-cmd --zone=trusted --add-source=172.16.1.0/24 --permanent
#allow live migration
firewall-cmd --permanent --add-port=16509/tcp
firewall-cmd --permanent --add-port=49152-49215/tcp
#open the VM VNC port so virt-manager can connect remotely
firewall-cmd --permanent --add-port=5900/tcp
#on the NFS storage host, open the NFS-related services
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
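All of the rules above are added with --permanent, so reload firewalld on each host for them to take effect in the running configuration:
firewall-cmd --reload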
Configure the services
#enable the pcsd daemon on the hypervisor hosts
systemctl start pcsd
systemctl enable pcsd
#configure the shared directory on the storage host
vim /etc/exports
/vm *(rw,async,no_root_squash)
#start and enable the nfs and rpcbind services on the storage host
systemctl start nfs
systemctl start rpcbind
systemctl enable nfs
systemctl enable rpcbind
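If the /vm export directory does not exist on the storage host yet, create it and re-export, then verify (a minimal sketch):
mkdir -p /vm
exportfs -r
exportfs -v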
Set the password for the cluster account hacluster
echo 'linuxplus' | passwd --stdin hacluster
Authenticate the nodes and create the cluster
pcs cluster auth node1-sync node2-sync -u hacluster
pcs cluster setup --name cluster1 node1-sync node2-sync
pcs cluster start --all
Mount the NFS share
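The mount point must exist on both nodes before mounting; assuming it has not been created yet:
mkdir -p /vm    #run on node1 and node2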
[root@node1 ~]$ vim /etc/fstab
#add this line
stor:/vm /vm nfs defaults 0 0
[root@node1 ~]$ mount -a
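A quick way to confirm the share is mounted, for example:
df -hT /vm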
3 Virtual machine creation
3.1 Create the KVM virtual machine
Create the VM disk file
[root@node1 ~]$ cd /vm/
[root@node1 vm]$ qemu-img create -f qcow2 centos6.qcow2 10G
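qemu-img info can be used to verify the format and virtual size of the new image, e.g.:
qemu-img info /vm/centos6.qcow2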
Create the virtual machine
virt-install \
--name centos6 \
--memory 1024 --vcpus 1 \
--cdrom /iso/CentOS-6.10-x86_64-minimal.iso \
--disk /vm/centos6.qcow2,cache=none \
--graphics vnc,listen=0.0.0.0 \
--network network=default \
--os-variant rhel6 \
--os-type linux
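After the installation finishes you can check the domain state and, if desired, open its console from a workstation; virt-viewer here is just one option and assumes it is installed locally:
virsh list --all
virt-viewer --connect qemu+ssh://root@node1/system centos6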
Test migrating the VM with libvirt
virsh # migrate centos6 qemu+ssh://root@node2/system --live --persistent --undefinesource --migrateuri tcp://node2-sync
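To confirm the migration landed, list the domains on node2, for example:
virsh --connect qemu+ssh://root@node2/system list --all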
3.2 Create the PCS-managed virtual machine
A VM resource managed by PCS requires that every node can access the VM's configuration file and disk file. At any moment the VM runs on only one node; the other nodes stand by.
First, dump the KVM VM's configuration file and place it on the shared storage
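Assuming the qemu_config directory does not exist on the shared storage yet, create it first:
mkdir -p /vm/qemu_config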
virsh dumpxml centos6 > /vm/qemu_config/centos6.xml
On the hypervisor, remove the original KVM definition; the VM will now be controlled by pcs instead of directly by libvirt
virsh undefine centos6
Start the cluster
[root@node1 ~]$ pcs cluster start --all
node1-sync: Starting Cluster (corosync)...
node2-sync: Starting Cluster (corosync)...
node1-sync: Starting Cluster (pacemaker)...
node2-sync: Starting Cluster (pacemaker)...
Query the available cluster resource agents
[root@node1 ~]$ pcs resource list | grep domain
ocf:heartbeat:VirtualDomain - Manages virtual domains through the libvirt
service:rhel-domainname - systemd unit file for rhel-domainname
systemd:rhel-domainname - systemd unit file for rhel-domainname
View the parameters and operations supported by ocf:heartbeat:VirtualDomain
[root@node1 ~]$ pcs resource describe ocf:heartbeat:VirtualDomain
ocf:heartbeat:VirtualDomain - Manages virtual domains through the libvirt virtualization framework
Resource agent for a virtual domain (a.k.a. domU, virtual machine,
virtual environment etc., depending on context) managed by libvirtd.
Resource options:
config (required) (unique): Absolute path to the libvirt configuration file, for this virtual domain.
hypervisor: Hypervisor URI to connect to. See the libvirt documentation for details on supported URI formats.
The default is system dependent. Determine the system's default uri by running 'virsh --quiet uri'.
force_stop: Always forcefully shut down ("destroy") the domain on stop. The default behavior is to resort to a
forceful shutdown only after a graceful shutdown attempt has failed. You should only set this to
true if your virtual domain (or your virtualization backend) does not support graceful shutdown.
migration_transport: Transport used to connect to the remote hypervisor while migrating. Please refer to the
libvirt documentation for details on transports available. If this parameter is omitted,
the resource will use libvirt's default transport to connect to the remote hypervisor.
migration_user: The username will be used in the remote libvirt remoteuri/migrateuri. No user will be given
(which means root) in the username if omitted If remoteuri is set, migration_user will be
ignored.
migration_downtime: Define max downtime during live migration in milliseconds
migration_speed: Define live migration speed per resource in MiB/s
migration_network_suffix: Use a dedicated migration network. The migration URI is composed by adding this
parameters value to the end of the node name. If the node name happens to be an FQDN
(as opposed to an unqualified host name), insert the suffix immediately prior to the
first period (.) in the FQDN. At the moment Qemu/KVM and Xen migration via a dedicated
network is supported. Note: Be sure this composed host name is locally resolveable and
the associated IP is reachable through the favored network. This suffix will be added
to the remoteuri and migrateuri parameters. See also the migrate_options parameter
below.
migrateuri: You can also specify here if the calculated migrate URI is unsuitable for your environment. If
migrateuri is set then migration_network_suffix, migrateport and --migrateuri in migrate_options are
effectively ignored. Use "%n" as the placeholder for the target node name. Please refer to the
libvirt documentation for details on guest migration.
migrate_options: Extra virsh options for the guest live migration. You can also specify here --migrateuri if the
calculated migrate URI is unsuitable for your environment. If --migrateuri is set then
migration_network_suffix and migrateport are effectively ignored. Use "%n" as the placeholder
for the target node name. Please refer to the libvirt documentation for details on guest
migration.
monitor_scripts: To additionally monitor services within the virtual domain, add this parameter with a list of
scripts to monitor. Note: when monitor scripts are used, the start and migrate_from operations
will complete only when all monitor scripts have completed successfully. Be sure to set the
timeout of these operations to accommodate this delay.
autoset_utilization_cpu: If set true, the agent will detect the number of domainU's vCPUs from virsh, and put it
into the CPU utilization of the resource when the monitor is executed.
autoset_utilization_hv_memory: If set true, the agent will detect the number of *Max memory* from virsh, and put
it into the hv_memory utilization of the resource when the monitor is executed.
migrateport: This port will be used in the qemu migrateuri. If unset, the port will be a random highport.
remoteuri: Use this URI as virsh connection URI to commuicate with a remote hypervisor. If remoteuri is set then
migration_user and migration_network_suffix are effectively ignored. Use "%n" as the placeholder for
the target node name. Please refer to the libvirt documentation for details on guest migration.
save_config_on_stop: Changes to a running VM's config are normally lost on stop. This parameter instructs the RA
to save the configuration back to the xml file provided in the "config" parameter.
sync_config_on_stop: Setting this automatically enables save_config_on_stop. When enabled this parameter
instructs the RA to call csync2 -x to synchronize the file to all nodes. csync2 must be
properly set up for this to work.
snapshot: Path to the snapshot directory where the virtual machine image will be stored. When this parameter is
set, the virtual machine's RAM state will be saved to a file in the snapshot directory when stopped.
If on start a state file is present for the domain, the domain will be restored to the same state it
was in right before it stopped last. This option is incompatible with the 'force_stop' option.
backingfile: When the VM is used in Copy-On-Write mode, this is the backing file to use (with its full path).
The VMs image will be created based on this backing file. This backing file will never be changed
during the life of the VM.
stateless: If set to true and backingfile is defined, the start of the VM will systematically create a new qcow2
based on the backing file, therefore the VM will always be stateless. If set to false, the start of
the VM will use the COW (<vmname>.qcow2) file if it exists, otherwise the first start will create a
new qcow2 based on the backing file given as backingfile.
copyindirs: List of directories for the virt-copy-in before booting the VM. Used only in stateless mode.
shutdown_mode: virsh shutdown method to use. Please verify that it is supported by your virsh toolsed with
'virsh help shutdown' When this parameter is set --mode shutdown_mode is passed as an additional
argument to the 'virsh shutdown' command. One can use this option in case default acpi method
does not work. Verify that this mode is supported by your VM. By default --mode is not passed.
Default operations:
start: interval=0s timeout=90s
stop: interval=0s timeout=90s
monitor: interval=10s timeout=30s
migrate_from: interval=0s timeout=60s
migrate_to: interval=0s timeout=120s
Add the VM resource to the cluster
pcs resource create centos6 ocf:heartbeat:VirtualDomain \
hypervisor="qemu:///system" \
config="/vm/qemu_config/centos6.xml" \
migration_transport=ssh \
meta allow-migrate="true" priority="100" \
op start timeout="120s" \
op stop timeout="120s" \
op monitor timeout="30" interval="10" \
op migrate_from interval="0" timeout="120s" \
op migrate_to interval="0" timeout="120s"
#meta allow-migrate="true" is the key option: it determines the migration mode
#the op ... overrides are optional
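The resulting resource definition can be reviewed, for example, with the pcs 0.9 syntax used on CentOS 7:
pcs resource show centos6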
Because a pcs cluster checks for fence devices by default, the VM resource cannot start after it is created. Change the cluster properties to disable STONITH for now; with it disabled, the VM resource starts normally
[root@node1 ~]$ pcs property set stonith-enabled=false
[root@node1 ~]$ pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster1
dc-version: 1.1.21-4.el7-f14e36fd43
have-watchdog: false
stonith-enabled: false
4 Migrating the virtual machine
With the KVM version shipped with CentOS, live migration works when driven directly through libvirt, but pcs cannot live-migrate the VM: it is shut down first and then started on the target node.
Migrate the VM from node1 to node2
[root@node1 ~]$ pcs resource move centos6 node2-sync
If no target node is specified, pcs adds a location constraint on the node being vacated to keep the VM from migrating back and forth
#the VM is currently on node2; move it without specifying a target node, which sends it to node1
[root@node1 ~]$ pcs resource move centos6
Warning: Creating location constraint cli-ban-centos6-on-node2-sync with a score of -INFINITY for resource centos6 on node node2-sync.
This will prevent centos6 from running on node2-sync until the constraint is removed. This will be the case even if node2-sync is the last node in the cluster.
[root@node2 ~]$ pcs constraint --full
Location Constraints:
Resource: centos6
Disabled on: node2-sync (score:-INFINITY) (role: Started) (id:cli-ban-centos6-on-node2-sync)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
#node2 now has a constraint with a score of -INFINITY
After a node has been given such a constraint, the constraint must be removed before the VM can run on that node again; otherwise a VM migrated there will stay in the stopped state forever
#pcs constraint remove <constraint id>
[root@node1 ~]$ pcs constraint remove cli-ban-centos6-on-node2-sync
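Depending on the pcs version, the constraints created by move/ban for a resource can also be cleared in one step, for example:
pcs resource clear centos6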
Putting a node into standby automatically migrates all of its resources away
#the VM is currently on node1; put node1 into standby
[root@node1 ~]$ pcs node standby node1-sync
#check the cluster status: node1 is in standby and the VM is running on node2
[root@node1 ~]$ pcs status
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Oct 5 13:39:50 2020
Last change: Mon Oct 5 13:39:06 2020 by root via cibadmin on node1-sync
2 nodes configured
1 resource configured
Node node1-sync: standby
Online: [ node2-sync ]
Full list of resources:
centos6 (ocf::heartbeat:VirtualDomain): Started node2-sync
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
#bring node1 back online
[root@node1 ~]$ pcs node unstandby node1-sync
Stopping the cluster service on a node also migrates all of its resources away
#the VM is currently running on node2; stop the cluster service on node2
[root@node1 ~]$ pcs cluster stop node2-sync
node2-sync: Stopping Cluster (pacemaker)...
node2-sync: Stopping Cluster (corosync)...
#check pcs status: the VM has migrated to node1
[root@node1 ~]$ pcs status
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Oct 5 13:44:33 2020
Last change: Mon Oct 5 13:41:18 2020 by root via cibadmin on node1-sync
2 nodes configured
1 resource configured
Online: [ node1-sync ]
OFFLINE: [ node2-sync ]
Full list of resources:
centos6 (ocf::heartbeat:VirtualDomain): Started node1-sync
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
#start node2's cluster service again
[root@node1 ~]$ pcs cluster start node2-sync
node2-sync: Starting Cluster (corosync)...
node2-sync: Starting Cluster (pacemaker)...
5 Configuring STONITH
Install the fence agent
yum -y install fence-agents-ipmilan
Rather than editing the live cluster configuration directly, dump the cluster CIB to a file named s_cfg in the current directory
[root@node1 ~]$ pcs cluster cib s_cfg
Edit the copied configuration file s_cfg to add the STONITH device
Usage: pcs stonith [commands]...
Configure fence devices for use with pacemaker
create <stonith id> <stonith device type> [stonith device options]
[op <operation action> <operation options> [<operation action>
<operation options>]...] [meta <meta options>...]
[--group <group id> [--before <stonith id> | --after <stonith id>]]
[--disabled] [--wait[=n]]
Create stonith device with specified type and options.
If --group is specified the stonith device is added to the group named.
You can use --before or --after to specify the position of the added
stonith device relatively to some stonith device already existing in the
group.
If --disabled is specified the stonith device is not used.
If --wait is specified, pcs will wait up to 'n' seconds for the stonith
device to start and then return 0 if the stonith device is started, or 1
if the stonith device has not yet started. If 'n' is not specified it
defaults to 60 minutes.
Example: Create a device for nodes node1 and node2
pcs stonith create MyFence fence_virt pcmk_host_list=node1,node2
Example: Use port p1 for node n1 and ports p2 and p3 for node n2
pcs stonith create MyFence fence_virt 'pcmk_host_map=n1:p1;n2:p2,p3'
pcs -f s_cfg stonith create ipmi-fencing fence_ipmilan \
pcmk_host_list="node1-sync node2-sync" ipaddr=10.0.1.1 login=testuser \
passwd=abc123 op monitor interval=60s
In the s_cfg copy, set the stonith-enabled property to true
[root@node1 ~]$ pcs -f s_cfg property set stonith-enabled=true
Push s_cfg to refresh the live cluster configuration
[root@node1 ~]$ pcs cluster cib-push s_cfg
CIB updated
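The new fence device definition can be inspected with, for example:
pcs stonith show ipmi-fencing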
Check the cluster status: the ipmi-fencing resource is now visible. Because this is a lab environment on virtual machines with no real IPMI device, the resource will fail after starting
[root@node1 ~]$ pcs status
Cluster name: cluster1
Stack: corosync
Current DC: node1-sync (version 1.1.21-4.el7-f14e36fd43) - partition with quorum
Last updated: Mon Oct 5 16:24:14 2020
Last change: Mon Oct 5 16:23:46 2020 by root via cibadmin on node1-sync
2 nodes configured
2 resources configured
Online: [ node1-sync node2-sync ]
Full list of resources:
centos6 (ocf::heartbeat:VirtualDomain): Started node1-sync
ipmi-fencing (stonith:fence_ipmilan): Starting node1-sync