9. MySQL High Availability: Galera Cluster

9.2 Galera Cluster

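9.2.1 Introduction to Galera Cluster

Galera Cluster is a MySQL cluster built on the Galera plugin: a shared-nothing, highly redundant high-availability solution.
Two widely used distributions exist: Percona XtraDB Cluster and MariaDB Galera Cluster. Galera is multi-master by design, and it is a robust high-availability solution that performs well on data consistency, integrity, and throughput.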

Galera Cluster features


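    Multi-master: a true multi-node read/write cluster; reads and writes on any node see current data
    Synchronous replication: data is kept in sync across nodes with no replication lag to speak of, so committed data is not lost when a database node goes down
    Parallel apply: a node applies replicated changes in parallel, which improves performance
    Failover: because every node accepts writes, switching over after a database failure is easy
    Hot-plug: if a database node goes down while in service, downtime is very short as long as monitoring detects it quickly, and a failed node has very little impact on the rest of the cluster
    Automatic node cloning: when adding a node or bringing one back from maintenance, neither the base data nor the incremental data has to be backed up and supplied by hand; Galera Cluster pulls it from an online node and the cluster converges to a consistent state
    Transparent to applications: cluster maintenance is invisible to the application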

Galera Cluster drawbacks


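    Every write (DDL in particular, which runs in total order on all nodes) must be validated cluster-wide, so cluster performance is bounded by the slowest node (nodes are normally given identical hardware)
    A new node, or a node that has fallen far behind, must copy the full data set when it joins (SST, State Snapshot Transfer). The donor (the node that supplies the data) may be unable to serve reads and writes during the transfer, depending on the SST method
    Only InnoDB tables are supported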

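How Galera Cluster works

Galera relies on a global transaction ID that is unique across the whole cluster. A client connects to one master and makes changes; on commit, the transaction is first checked on that node. If the check fails, the update fails immediately. If it succeeds, the changes are sent to the other masters; if a conflict is detected there, the transaction is rolled back and an error is returned. This ensures every server applies the same changes without conflicts.

Because every node has to take part in this check, each commit pays a latency cost. For that reason a Galera cluster is limited in size in practice: too many master nodes make it too slow.

Galera Cluster has several implementations, all built on two components: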

Galera replication library
WSREP: MySQL extended with the Write Set Replication

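Implementations of WSREP replication

PXC: Percona XtraDB Cluster, Percona's implementation of Galera
MariaDB Galera Cluster, MariaDB's implementation of Galera

Note: both need at least three nodes. Do not install the stock mysql-server or mariadb-server packages on these hosts: PXC ships as its own dedicated server build (for MariaDB, Galera has been bundled into the standard server since 10.1).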
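
The three-node minimum follows from Galera's quorum rule: a partition keeps serving only while it holds a strict majority of the last known membership. The sketch below is a simplified illustration, assuming equally weighted nodes that fail at the same time; the function name has_quorum is made up for this example and is not part of Galera.

```shell
#!/usr/bin/env bash
# has_quorum ALIVE TOTAL
# Succeeds when ALIVE nodes form a strict majority of TOTAL equally weighted nodes.
has_quorum() {
    local alive=$1 total=$2
    [ $(( alive * 2 )) -gt "$total" ]
}

# For each cluster size, count how many simultaneous failures still leave a majority.
for total in 2 3 4 5; do
    tolerated=0
    while has_quorum $(( total - tolerated - 1 )) "$total"; do
        tolerated=$(( tolerated + 1 ))
    done
    echo "nodes=$total tolerated_failures=$tolerated"
done
```

Under these assumptions a two-node cluster tolerates no failure at all, while three nodes tolerate one, which is why three is the practical minimum.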

9.2.2 Hands-on example: Percona XtraDB Cluster (PXC 5.7)

Environment:

Four hosts:
pxc1 - 10.0.0.237
pxc2 - 10.0.0.227
pxc3 - 10.0.0.217
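pxc4 - 10.0.0.207 # added later as a new node, and used to simulate repairing a failed node

This walkthrough was carried out on CentOS 7; the package names and paths below are for that platform.

Procedure

1. Install Percona XtraDB Cluster 5.7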

[14:13:53 root@pxc1 ~]#vim /etc/yum.repos.d/pxc.repo

[percona]
name=percona_repo                                                                                                                                   
baseurl=https://mirrors.tuna.tsinghua.edu.cn/percona/release/$releasever/RPMS/$basearch/
enabled=1
gpgcheck=0

[14:28:33 root@pxc1 ~]#scp /etc/yum.repos.d/pxc.repo 10.0.0.227:/etc/yum.repos.d
[14:28:33 root@pxc1 ~]#scp /etc/yum.repos.d/pxc.repo 10.0.0.217:/etc/yum.repos.d
[14:28:33 root@pxc1 ~]#scp /etc/yum.repos.d/pxc.repo 10.0.0.207:/etc/yum.repos.d

[16:26:27 root@pxc1 ~]#yum -y install Percona-XtraDB-Cluster-57
[16:26:27 root@pxc2 ~]#yum -y install Percona-XtraDB-Cluster-57
[16:26:27 root@pxc3 ~]#yum -y install Percona-XtraDB-Cluster-57
[16:26:27 root@pxc4 ~]#yum -y install Percona-XtraDB-Cluster-57

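2. Configure MySQL and the cluster settings on each node

/etc/my.cnf is the main configuration file. In this version, the remaining configuration files live in /etc/percona-xtradb-cluster.conf.d: mysqld.cnf, mysqld_safe.cnf, and wsrep.cnf.

# /etc/my.cnf needs no changes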
[16:29:42 root@pxc1 ~]#cat /etc/my.cnf
#
# The Percona XtraDB Cluster 5.7 configuration file.
#
#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#   Please make any edits and changes to the appropriate sectional files
#   included below.
#
!includedir /etc/my.cnf.d/
!includedir /etc/percona-xtradb-cluster.conf.d/

# mysqld.cnf needs no changes
[16:29:45 root@pxc1 ~]#cat /etc/percona-xtradb-cluster.conf.d/mysqld.cnf 
# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=/var/lib/mysql/mysql.sock

[mysqld]
server-id=1 # ideally unique per node, but it does not have to be changed by hand
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin # recommended, but not required
log_slave_updates
expire_logs_days=7

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# mysqld_safe.cnf needs no changes
[16:33:06 root@pxc1 ~]#cat /etc/percona-xtradb-cluster.conf.d/mysqld_safe.cnf 
#
# The Percona Server 5.7 configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

[mysqld_safe]
pid-file = /var/run/mysqld/mysqld.pid
socket   = /var/lib/mysql/mysql.sock
nice     = 0

The PXC configuration file does need changes:
/etc/percona-xtradb-cluster.conf.d/wsrep.cnf

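#pxc1
wsrep_cluster_address=gcomm://10.0.0.237,10.0.0.227,10.0.0.217 # IPs of all cluster members; the initial cluster has three servers, and 10.0.0.207 is added later
wsrep_node_address=10.0.0.237 # this node's own IP
wsrep_node_name=pxc-cluster-node-237 # this node's own name
wsrep_sst_auth="sstuser:s3cretPass" # uncomment this line; used for state transfer between nodes; change it in production and keep it identical on all nodes. The account itself must be created by hand after the database starts

#pxc2
wsrep_cluster_address=gcomm://10.0.0.237,10.0.0.227,10.0.0.217 # IPs of all cluster members; the initial cluster has three servers, and 10.0.0.207 is added later
wsrep_node_address=10.0.0.227 # this node's own IP
wsrep_node_name=pxc-cluster-node-227 # this node's own name
wsrep_sst_auth="sstuser:s3cretPass" # uncomment this line; used for state transfer between nodes; change it in production

#pxc3
wsrep_cluster_address=gcomm://10.0.0.237,10.0.0.227,10.0.0.217 # IPs of all cluster members; the initial cluster has three servers, and 10.0.0.207 is added later
wsrep_node_address=10.0.0.217 # this node's own IP
wsrep_node_name=pxc-cluster-node-217 # this node's own name
wsrep_sst_auth="sstuser:s3cretPass" # uncomment this line; used for state transfer between nodes; change it in production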
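
Since only two of these four settings differ between nodes, they can be generated rather than typed by hand. The following is a rough sketch, not part of PXC: render_wsrep_settings is a hypothetical helper, and it assumes the node name is derived from the last octet of the IP, as in the values above.

```shell
#!/usr/bin/env bash
# render_wsrep_settings NODE_IP PEER_IP...
# Prints the four per-node wsrep.cnf settings used in this walkthrough.
# (Hypothetical helper; the naming scheme is taken from the example values.)
render_wsrep_settings() {
    local node_ip=$1; shift
    local peers
    peers=$(IFS=,; echo "$*")          # join the peer list with commas
    local last_octet=${node_ip##*.}    # e.g. 10.0.0.227 -> 227
    cat <<EOF
wsrep_cluster_address=gcomm://${peers}
wsrep_node_address=${node_ip}
wsrep_node_name=pxc-cluster-node-${last_octet}
wsrep_sst_auth="sstuser:s3cretPass"
EOF
}

# Settings for pxc2 in the initial three-node cluster.
render_wsrep_settings 10.0.0.227 10.0.0.237 10.0.0.227 10.0.0.217
```

The output still has to be merged into /etc/percona-xtradb-cluster.conf.d/wsrep.cnf by hand, and the password should be replaced in production.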

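Although Galera Cluster no longer replicates through the binlog, enabling binary logging in the configuration is still recommended. When a new node joins later, a full SST from an existing node can drag down cluster performance, so bringing the new node up to date from the binlog first and then joining it to the cluster can be the better choice.

3. Start the cluster service

All PXC nodes are peers, so the cluster can be started from any node; the other nodes are then joined to it.

Note: the first node is started differently from the rest.

Here the cluster is bootstrapped on pxc1.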

[16:53:45 root@pxc1 ~]#systemctl start mysql@bootstrap.service

[17:06:44 root@pxc1 ~]#ss -ntlp
State      Recv-Q Send-Q                             Local Address:Port                                            Peer Address:Port              
LISTEN     0      100                                    127.0.0.1:25                                                         *:*                   users:(("master",pid=949,fd=13))
LISTEN     0      128                                            *:22                                                         *:*                   users:(("sshd",pid=806,fd=3))
LISTEN     0      128                                            *:4567                                                       *:*                   users:(("mysqld",pid=27074,fd=11)) # port 4567 opens once the service starts
LISTEN     0      100                                        [::1]:25                                                      [::]:*                   users:(("master",pid=949,fd=14))
LISTEN     0      80                                          [::]:3306                                                    [::]:*                   users:(("mysqld",pid=27074,fd=32)) # port 3306 opens once the service starts
LISTEN     0      128                                         [::]:22                                                      [::]:*                   users:(("sshd",pid=806,fd=4))

# PXC 5.7 is based on MySQL 5.7, so installation generates a temporary root password; retrieve it from the log
[17:08:22 root@pxc1 ~]#grep 'temporary password' /var/log/mysqld.log 
2020-11-27T09:05:05.380840Z 1 [Note] A temporary password is generated for root@localhost: Hih(ptaF2TWq

[17:10:18 root@pxc1 ~]#mysql -uroot -p
Enter password:

mysql> show databases;  # the initial password must be changed first
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

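mysql> alter user root@'localhost' identified by ''; # an empty password is used here for convenience; set a real one in production. Note: because the password is changed right after the first node starts, the account and password replicate to all other nodes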
Query OK, 0 rows affected (0.00 sec)

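# The SST account must be created before the other nodes are started; it then replicates to them. If the other nodes are started before the account exists, mysql fails to start on them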

mysql> create user sstuser@'localhost' identified by 's3cretPass';
Query OK, 0 rows affected (0.00 sec)
mysql> grant reload, lock tables, process, replication client on *.* to 'sstuser'@'localhost';
Query OK, 0 rows affected (0.00 sec)

# Check the status variables that report cluster size and state
[17:15:24 root@pxc1 ~]#mysql -e "show status like 'wsre%'"
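| wsrep_cluster_size               | 1  # the cluster currently has one node
| wsrep_local_state_comment        | Synced # state Synced (4): data is in sync (this is the bootstrap node, so there was nothing to sync); Joiner means the SST has not finished; add new nodes only when all nodes are Synced
| wsrep_cluster_status             | Primary # cluster status is Primary: fully connected and ready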
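
These three variables are enough for a basic readiness check before adding another node. The sketch below is one possible way to script it: node_is_healthy is a hypothetical function that reads tab-separated name/value pairs on stdin, so it can be tried here with sample data instead of a live server.

```shell
#!/usr/bin/env bash
# node_is_healthy EXPECTED_SIZE
# Reads "name<TAB>value" lines on stdin and succeeds only if the node reports
# the expected cluster size, the Synced state, and Primary cluster status.
node_is_healthy() {
    local expected_size=$1 name value
    local size="" state="" status=""
    while IFS=$'\t' read -r name value; do
        case $name in
            wsrep_cluster_size)        size=$value ;;
            wsrep_local_state_comment) state=$value ;;
            wsrep_cluster_status)      status=$value ;;
        esac
    done
    [ "$size" = "$expected_size" ] && [ "$state" = "Synced" ] && [ "$status" = "Primary" ]
}

# Sample input mirroring the values shown for the bootstrap node.
if printf 'wsrep_cluster_size\t1\nwsrep_local_state_comment\tSynced\nwsrep_cluster_status\tPrimary\n' \
        | node_is_healthy 1; then
    echo "node healthy"
else
    echo "node NOT healthy"
fi
```

Against a real node the input could come from something like mysql -N -B -e "show status like 'wsrep%'" piped into node_is_healthy 3, which prints one tab-separated pair per line.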

4. Add the other nodes to the cluster

#pxc2
[17:32:28 root@pxc2 ~]#systemctl start mysql
[17:34:59 root@pxc2 ~]#ss -ntl
State      Recv-Q Send-Q                             Local Address:Port                                            Peer Address:Port              
LISTEN     0      128                                            *:22                                                         *:*                  
LISTEN     0      128                                            *:4567                                                       *:*                  
LISTEN     0      100                                    127.0.0.1:25                                                         *:*                  
LISTEN     0      80                                          [::]:3306                                                    [::]:*                  
LISTEN     0      128                                         [::]:22                                                      [::]:*                  
LISTEN     0      100                                        [::1]:25                                                      [::]:*

[17:36:16 root@pxc1 ~]#mysql -e "show status like 'wsre%'" | grep size
wsrep_cluster_size  2 # cluster size is now 2, so pxc2 joined successfully

#pxc3
[17:38:42 root@pxc3 ~]#systemctl start mysql
[17:39:27 root@pxc3 ~]#ss -ntl
State      Recv-Q Send-Q                             Local Address:Port                                            Peer Address:Port              
LISTEN     0      128                                            *:22                                                         *:*                  
LISTEN     0      128                                            *:4567                                                       *:*                  
LISTEN     0      100                                    127.0.0.1:25                                                         *:*                  
LISTEN     0      128                                         [::]:22                                                      [::]:*                  
LISTEN     0      100                                        [::1]:25                                                      [::]:*                  
LISTEN     0      80                                          [::]:3306                                                    [::]:*

[17:40:15 root@pxc1 ~]#mysql -e "show status like 'wsre%'" | grep size
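wsrep_cluster_size  3 # cluster size is now 3, so pxc3 joined successfully

The three-node PXC cluster is now set up.

5. Verify that every node accepts writes and that they replicate to the other nodes

Note: PXC runs with pxc_strict_mode=ENFORCING by default, which rejects some statements that stock MySQL accepts, so SQL written for MySQL may need changes before it runs on PXC. For example, importing a dump that contains LOCK TABLES fails: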

[00:23:01 root@pxc1 ~]#mysql < hellodb_innodb.sql 
ERROR 1105 (HY000) at line 45: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK/FOR EXPORT with pxc_strict_mode = ENFORCING

On pxc2, create the hellodb database and import testlog.sql, which defines the following table and stored procedure:

create table testlog (id int auto_increment primary key,name char(10),age int default 20);

delimiter $$

create procedure  sp_testlog() 
begin  
declare i int;
set i = 1; 
while i <= 100000 
do  insert into testlog(name,age) values (concat('wang',i),i); 
set i = i +1; 
end while; 
end$$

delimiter ;

#pxc2
mysql> create database hellodb;
Query OK, 1 row affected (0.01 sec)
#pxc1
[17:40:20 root@pxc1 ~]#mysql -e 'show databases'
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hellodb 
#pxc3
[17:46:12 root@pxc3 ~]#mysql -e 'show databases';
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hellodb            |

#pxc2
[17:47:22 root@pxc2 ~]#mysql hellodb < testlog.sql
mysql> use hellodb;
mysql> call sp_testlog;
Query OK, 1 row affected (5 min 14.05 sec)

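# PXC checks every commit against all master nodes, so row-by-row autocommitted inserts are very slow; run the procedure inside a transaction instead

# On pxc3, run the testlog procedure at the same time, inside a transaction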
mysql> use hellodb;
mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> call sp_testlog;
Query OK, 1 row affected (2.36 sec)

mysql> commit;
Query OK, 0 rows affected (0.23 sec)

#pxc1
mysql> use hellodb;

mysql>  select count(*) from testlog;
+----------+
| count(*) |
+----------+
|   142229 |
+----------+
1 row in set (0.02 sec)

#pxc2
mysql> use hellodb;
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.07 sec)
#pxc3
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
|   132279 |
+----------+
1 row in set (0.02 sec)

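PXC masters can all be read and written at the same time without interfering with each other, and concurrent writes replicate in both directions. The cost is slower execution. A cluster needs at least 3 nodes and, as a rule of thumb, should have no more than about 8. Run SQL inside transactions where possible: turning many commits into one means the data only has to be checked once.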
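
The batching advice above can also be applied without a stored procedure. As a minimal sketch, the hypothetical helper emit_batched_inserts below prints INSERT statements for the testlog table and commits once per batch; the batch size is an assumption to tune, since Galera also caps the size of a single write set (wsrep_max_ws_size).

```shell
#!/usr/bin/env bash
# emit_batched_inserts TOTAL_ROWS BATCH_SIZE
# Prints INSERT statements for the testlog table, wrapped so that each
# group of BATCH_SIZE rows is committed as one transaction.
emit_batched_inserts() {
    local total=$1 batch=$2 i
    for (( i = 1; i <= total; i++ )); do
        if (( (i - 1) % batch == 0 )); then echo "BEGIN;"; fi
        echo "INSERT INTO testlog(name, age) VALUES ('wang${i}', ${i});"
        if (( i % batch == 0 || i == total )); then echo "COMMIT;"; fi
    done
}

# 10 rows in batches of 4 give 3 transactions (4 + 4 + 2 rows).
emit_batched_inserts 10 4 | grep -c '^COMMIT;'
```

To load real data the output could be piped into the client, for example emit_batched_inserts 100000 1000 | mysql hellodb.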

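When the same data is created on all nodes at once and the writes conflict, only one node's write takes effect; the other nodes return an error.

6. Add a new node: pxc4 (10.0.0.207)

Edit the configuration file on the new node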

#pxc4
[00:32:18 root@pxc4 ~]#vim /etc/percona-xtradb-cluster.conf.d/wsrep.cnf

wsrep_cluster_address=gcomm://10.0.0.237,10.0.0.227,10.0.0.217,10.0.0.207 
wsrep_node_address=10.0.0.207
wsrep_node_name=pxc-cluster-node-207
wsrep_sst_auth="sstuser:s3cretPass"

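# On all other nodes, add 10.0.0.207 to the configuration file. Even without this change pxc4 joins the cluster automatically once it starts, but updating the configuration files afterwards is still recommended
wsrep_cluster_address=gcomm://10.0.0.237,10.0.0.227,10.0.0.217,10.0.0.207

Start the service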

[00:36:09 root@pxc4 ~]#systemctl start mysql

[18:04:27 root@pxc1 ~]#mysql -e "show status like 'wsre%'" | grep size
wsrep_cluster_size  4 # cluster size is now 4, so the node joined successfully

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
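| hellodb |  # the data has been synced as well

7. Simulate repairing a failed node

# Stop the mysql service on any one node
[18:10:22 root@pxc4 ~]#systemctl stop mysql

# The cluster size drops to 3 at once; there is no waiting period before the cluster notices that a master node is gone (here the node was shut down cleanly)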
[18:06:59 root@pxc1 ~]#mysql -e "show status like 'wsre%'" | grep size
wsrep_cluster_size  3

# On pxc1, run the testlog procedure again

mysql> use hellodb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> call sp_testlog;
Query OK, 1 row affected (2.02 sec)

mysql> commit;
Query OK, 0 rows affected (0.18 sec)

# Start pxc4

[01:23:43 root@pxc4 ~]#systemctl start mysql

# Verify that the data was synced automatically

mysql> use hellodb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
|   200000 |
+----------+
1 row in set (0.03 sec)

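PXC supports:

Multiple master nodes inserting the same data at the same time: when the primary key is not specified (or does not collide), every node's insert succeeds, so three nodes yield three rows with the same data. PXC staggers the auto-increment values per node, so each row gets a different primary key.
Multiple master nodes inserting rows with the same primary key at the same time: only one node reports success, and that row is replicated to the other nodes.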

mysql> insert testlog  (id,name,age)value(1,'haha',2);
Query OK, 1 row affected (0.01 sec)

mysql> insert testlog  (id,name,age)value(1,'haha',2);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

mysql> insert testlog  (id,name,age)value(1,'haha',2);
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
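
The staggered primary keys mentioned above come from the auto-increment settings: with wsrep_auto_increment_control enabled (the Galera default), each node gets auto_increment_increment equal to the cluster size and its own auto_increment_offset. The sketch below only illustrates the resulting arithmetic; node_ids is a hypothetical helper, and the node numbering 1 to 3 is an assumption.

```shell
#!/usr/bin/env bash
# node_ids NODE_OFFSET CLUSTER_SIZE COUNT
# Prints the first COUNT auto-increment values a node would generate when
# auto_increment_offset=NODE_OFFSET and auto_increment_increment=CLUSTER_SIZE.
node_ids() {
    local offset=$1 increment=$2 count=$3 k ids=()
    for (( k = 0; k < count; k++ )); do
        ids+=( $(( offset + k * increment )) )
    done
    echo "${ids[*]}"
}

# Three nodes never hand out the same id.
for node in 1 2 3; do
    echo "node $node: $(node_ids "$node" 3 4)"
done
```

Each node draws from its own arithmetic sequence, so inserts that rely on auto_increment do not collide, whereas an explicit id such as 1 in the statements above can.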

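A node joins a Galera cluster in one of two situations: as a brand-new node, or as a former member rejoining after a temporary absence.

1) A new node joining the cluster. The new node must pull its data from a donor node chosen from the current cluster; this is the State Snapshot Transfer (SST). The transfer method is set by wsrep_sst_method, usually xtrabackup (xtrabackup-v2 in PXC 5.7). Note that joining wipes any data already on the new node, which is then restored from a full backup taken from the donor (assuming xtrabackup is used). With a large data set the join is therefore slow. Do not add another new node until the previous one has reached the Synced state, or the cluster can easily be overwhelmed. If that is a concern, wsrep_sst_method=rsync is an alternative: rsync copies only the files that differ, so it pays off only when the new node already holds part of the data; otherwise it is no different from a full copy. It also places a global read-only lock on the donor for the duration.

2) A former member rejoining the cluster. A rejoining node was part of the cluster before, so it already holds most of the data and is missing only the changes made while it was away. In this case the cluster uses IST (Incremental State Transfer). The incremental data comes from the GCache file on the donor, which has a size limit; if the missing range is no longer fully cached, the transfer falls back to SST. The same fallback happens when the data on the rejoining node does not match the donor (for example, someone modified data on the node while it was out of the cluster).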
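
The IST-or-SST decision described in these two cases can be summarized in a few comparisons. This is a simplified sketch of the rule, not Galera's actual code: choose_transfer is a hypothetical function, the UUID strings are placeholders, and the donor's oldest cached sequence number is assumed to be what the wsrep_local_cached_downto status variable reports.

```shell
#!/usr/bin/env bash
# choose_transfer JOINER_UUID CLUSTER_UUID JOINER_SEQNO DONOR_CACHED_DOWNTO
# Prints IST when the joiner's missing range is still in the donor's GCache,
# otherwise SST. A JOINER_SEQNO of -1 stands for "position unknown".
choose_transfer() {
    local joiner_uuid=$1 cluster_uuid=$2 joiner_seqno=$3 cached_downto=$4
    if [ "$joiner_uuid" != "$cluster_uuid" ]; then
        echo SST    # different history: new node or diverged data
    elif [ "$joiner_seqno" -lt 0 ]; then
        echo SST    # no usable position recorded on the joiner
    elif [ $(( joiner_seqno + 1 )) -lt "$cached_downto" ]; then
        echo SST    # first missing write set already evicted from GCache
    else
        echo IST
    fi
}

choose_transfer cluster-a cluster-a 1000 900   # short absence
choose_transfer cluster-a cluster-a 100 900    # gap older than the cache
choose_transfer none cluster-a -1 900          # brand-new node
```

The three sample calls print IST, SST, and SST in that order. A larger GCache widens the window in which a returning node can still use IST.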

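Choosing a MySQL architecture

mycat + MHA: mycat routes write requests to the VIP managed by MHA; MHA is responsible for failover, and mycat only does the routing
lvs|haproxy + pxc: PXC provides the multi-master layer and every node accepts reads and writes; lvs or haproxy does the load balancing, and also handles backend health checks and failover.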