说明
所有mysql实例皆为MYSQL8版本,使用的Xtrabackup备份组件为xtrabackup8.
生产mysql使用基于percona的分支,相对于原版mysql多了一些性能调教和监控视图,版本为:percona-server-server-8.0.22,备份相关工具对mysql8的官方版本也是完全兼容的.percona的分支相关信息:https://www.percona.com/software/mysql-database/percona-server
生产环境mysql一共3个实例,使用MGR组成集群以实现灵活的高可用与读写分离.并由前置的proxysql进行数据的路由与转发.
root@127.0.0.1 14:38: [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 128c392c-a4cf-11eb-8c2a-f01090785866 | xxx-mysql-03 | 3306 | ONLINE | SECONDARY | 8.0.22 |
| group_replication_applier | 39f6440e-a4a3-11eb-8627-f0109078507a | xxx-mysql-01 | 3306 | ONLINE | PRIMARY | 8.0.22 |
| group_replication_applier | cd04c01c-a4c7-11eb-a80a-f01090783b98 | xxx-mysql-02 | 3306 | ONLINE | SECONDARY | 8.0.22 |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
模拟故障为:在生产环境mysql8 的MGR集群完全不可用,有前一天的Xtrabackup全量备份和增量备份,需要从之前的全量和增量备份完整恢复到故障最近的时间点.本次故障恢复的要求是读取前一天的所有增量和全量数据并恢复.快速重组生产MGR集群.
在这里会用ansible脚本快速搭建3个mysql实例,以模拟生产环境(略过).
备份所使用的脚本:
backup_func.sh
:
#!/bin/bash
# 此脚本只适用于mysql8版本,早先的5.6,5.7版本不能混用(会由tools变量做安装包检测)
# crontab设置:(每天0点全备,其他时间每隔两小时一次增量)
# 0 0 * * * /home/backups/backup_func.sh full
# 0 2-22/2 * * * /home/backups/backup_func.sh incr
# 备份用户创建及权限配置(密码随机生成`openssl rand -base64 12 | cut -b 1-12`)
# mysql> CREATE USER 'bkpuser'@'localhost' IDENTIFIED BY 'xxxxxxxxxx';
# mysql> GRANT BACKUP_ADMIN, PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'bkpuser'@'localhost';
# mysql> GRANT SELECT ON performance_schema.log_status TO 'bkpuser'@'localhost';
# mysql> GRANT SELECT ON performance_schema.keyring_component_status TO bkpuser@'localhost'
# mysql> FLUSH PRIVILEGES;
# 备份相关基础路径(必填!!!)
BACKUPDIR=/home/backups/mysql
# 全量备份目录
Full=$BACKUPDIR/full
# 增量备份目录
Incr=$BACKUPDIR/incr
# 备份归档路径
Old=$BACKUPDIR/old
# 日志路径
baklog=$BACKUPDIR/backup.log
# 日期信息,用于区分当天备份用于归档
# TODAY=$(date +%Y%m%d%H%M)
YESTERDAY=$(date -d"yesterday" +%Y%m%d)
# Mysql实例相关信息(必填!!!)
MYSQL=/usr/bin/mysql
MYSQLADMIN=/usr/bin/mysqladmin
DB_HOST='localhost' # 填写localhost时,会尝试使用socket连接
DB_PORT=3306 # 这里没有用到,需要的话,在备份命令也加上端口参数
DB_USER='bkpuser'
DB_PASS='xxxxxxxxxxx'
DB_SOCK=/home/data/mysql/mysql.sock
DB_CONF=/etc/my.cnf
# 备份必备工具检查(压缩备份需要qpresss)
tools="percona-xtrabackup-80 qpress"
# Check packages before proceeding
for i in $tools; do
if ! [[ $(rpm -qa $i) =~ ${i} ]]; then
echo -e " Needed package $i not found.\n Pre check failed !!!"
exit 1
fi
done
# mysql 运行状态监测
if [ -z "$($MYSQLADMIN --host=$DB_HOST --socket=${DB_SOCK} --user=$DB_USER --password=$DB_PASS --port=$DB_PORT status | grep 'Uptime')" ]; then
echo -e "HALTED: MySQL does not appear to be running or incorrect username and password"
exit 1
fi
# 备份用户名密码监测 # 好像有点多余...
if ! $(echo 'exit' | $MYSQL -s --host=$DB_HOST --socket=${DB_SOCK} --user=$DB_USER --password=$DB_PASS --port=$DB_PORT); then
echo -e "HALTED: Supplied mysql username or password appears to be incorrect (not copied here for security, see script)."
exit 1
fi
####################################################
#归档备份函数(全量时自动触发)
####################################################
function Xtr_tar_backup() {
# if [ ! -d "${Old}" ]; then
# mkdir ${Old}
# fi
for i in $Full $Incr $Old; do
if [ ! -d $i ]; then
mkdir -pv $i
fi
done
# 压缩上传前一天的备份
echo "压缩前一天的备份,移动到${Old}"
cd $BACKUPDIR
tar -zcvf $YESTERDAY.tar.gz ./full/ ./incr/
#scp -P 8022 $YESTERDAY.tar.gz root@192.168.10.46:/data/backup/mysql/
mv $YESTERDAY.tar.gz $Old
if [ $? = 0 ]; then
rm -rf $Full $Incr
echo "Tar old backup succeed" | tee -a ${baklog} 2>&1
else
echo "Error with old backup." | tee -a ${baklog} 2>&1
fi
}
####################################################
#全量备份函数(手动触发)
####################################################
function Xtr_full_backup() {
if [ ! -d "${Full}" ]; then
mkdir ${Full}
fi
Xtr_tar_backup
# 第一步 创建本次的备份目录
FullBakTime=$(date +%Y%m%d-%H%M%S)
mkdir -p ${Full}/${FullBakTime}
FullBakDir=${Full}/${FullBakTime}
# 第二步 开始全量备份
echo -e "备份时间: ${FullBakTime}\n" | tee -a ${baklog} 2>&1
echo -e "本次全量备份目录为 ${FullBakDir}\n" | tee -a ${baklog} 2>&1
xtrabackup --defaults-file=${DB_CONF} --host=${DB_HOST} --port=$DB_PORT --user=${DB_USER} --password=${DB_PASS} --socket=${DB_SOCK} --backup --compress --compress-threads=4 --target-dir=${FullBakDir}
dirStorage=$(du -sh ${FullBakDir})
echo -e "本次备份数据 ${dirStorage}\n" | tee -a ${baklog} 2>&1
echo -e "备份完成...\n\n\n" | tee -a ${baklog} 2>&1
exit 0
}
####################################################
#增量备份函数(手动触发)
####################################################
function Xtr_incr_backup() {
# 第一步 获取上一次全量备份和增量备份信息
LATEST_INCR=$(find $Incr -mindepth 1 -maxdepth 1 -type d | sort -nr | head -1)
LATEST_FULL=$(find $Full -mindepth 1 -maxdepth 1 -type d | sort -nr | head -1)
if [ ! $LATEST_FULL]; then
echo "xtrabackup_info does not exist. Please make sure full backup exist."
exit 1
fi
echo "LATEST_INCR=$LATEST_INCR"
if [ ! -d "${Incr}" ]; then
mkdir ${Incr}
fi
# 判断上一次的备份路径,如果增量备份路径为空,则使用全量备份路径为--incremental-basedir
if [ ! $LATEST_INCR ]; then
CompliteLatestFullDir=$LATEST_FULL
else
CompliteLatestFullDir=$LATEST_INCR
fi
# 第二步 创建备份目录
IncrBakTime=$(date +%Y%m%d-%H%M%S)
mkdir -p ${Incr}/${IncrBakTime}
IncrBakDir=${Incr}/${IncrBakTime}
# 第三步 开始增量备份
echo -e "日期: ${IncrBakTime}\n" | tee -a ${baklog} 2>&1
echo -e "整点: ${Hour}\n" | tee -a ${baklog} 2>&1
echo -e "本次备份为基于上一次备份${CompliteLatestFullDir}的增量备份\n" | tee -a ${baklog} 2>&1
echo -e "本次增量备份目录为: ${IncrBakDir}\n" | tee -a ${baklog} 2>&1
xtrabackup --defaults-file=${DB_CONF} --host=${DB_HOST} --port=$DB_PORT --user=${DB_USER} --password=${DB_PASS} --socket=${DB_SOCK} --backup --compress --compress-threads=4 --parallel=4 --target-dir=${IncrBakDir} --incremental-basedir=${CompliteLatestFullDir}
dirStorage=$(du -sh ${IncrBakDir})
echo -e "本次备份数据 ${dirStorage}\n" | tee -a ${baklog} 2>&1
echo -e "备份完成...\n\n\n" | tee -a ${baklog} 2>&1
exit 0
}
####################################################
#主体备份函数
####################################################
function printInfo() {
echo "Your choice is $1"
}
case $1 in
"full")
echo "Your choice is $1"
Xtr_full_backup
;;
"incr")
echo "Your choice is $1"
Xtr_incr_backup
;;
*)
echo -e "No parameters specified!\nFor example:\n$0 full\n$0 incr"
;;
esac
exit 0
定时任务相关设置
策略是每天的0:30做一次全量备份,之后每两个小时的半点会做一次增量备份.
[root@xxx-mysql-01 backups]# crontab -l
30 0 * * * /home/backups/backup_func.sh full
30 2-23/2 * * * /home/backups/backup_func.sh incr
注: 本次恢复属于完整的生产环境集群恢复,下面的获取备份文件
,准备备份
,开始恢复
相关流程,需要在每一台服务器上执行,实际操作中,如果只需要恢复并查看数据则只做一个点就可以了.
获取备份文件
这里取的是前一天的打过包的备份文件,根据备份脚本的规则,每天凌晨的全量备份之前,会自动将前一天的全量备份和增量备份目录全部打包并以前一天的日期命名:
20210609.tar.gz
将其scp到待恢复的服务器上并解压:
tar -zxvf 20210609.tar.gz
解压后的目录结构:
# tree -L 2
.
├── 20210609.tar.gz
├── full
│ └── 20210609-092746
└── incr
├── 20210609-103002
├── 20210609-123002
├── 20210609-143002
├── 20210609-163002
├── 20210609-183002
├── 20210609-203002
└── 20210609-223002
12 directories, 1 file
解压备份:
确保需要恢复的服务器有mysql实例,xtrabackup8工具已安装,由于备份是经过压缩的,确保qpress也已安装.
[root@xxx-prod-percona-02 /]# rpm -qa|grep percona-xtrabackup-80
percona-xtrabackup-80-8.0.23-16.1.el7.x86_64
[root@xxx-prod-percona-02 /]# rpm -qa|grep qpress
qpress-11-1.el7.x86_64
由于备份脚本在备份时使用了--compress
指令,在恢复备份前,需要先解压缩备份,
这里将所有增量和全量备份路径均执行一下解压缩指令:
xtrabackup --decompress --target-dir=/home/full/20210609-092746/
xtrabackup --decompress --target-dir=/home/incr/20210609-103002/
xtrabackup --decompress --target-dir=/home/incr/20210609-123002/
xtrabackup --decompress --target-dir=/home/incr/20210609-143002/
xtrabackup --decompress --target-dir=/home/incr/20210609-163002/
xtrabackup --decompress --target-dir=/home/incr/20210609-183002/
xtrabackup --decompress --target-dir=/home/incr/20210609-203002/
xtrabackup --decompress --target-dir=/home/incr/20210609-223002/
准备备份:
在同时存在全量和增量备份需要合并的情况下,准备备份时需要带上--apply-log-only
参数,但是要注意在准备最后一个增量备份的时候,不需要加该参数.
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-103002/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-123002/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-143002/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-163002/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-183002/
xtrabackup --prepare --apply-log-only --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-203002/
xtrabackup --prepare --target-dir=/home/full/20210609-092746/ --incremental-dir=/home/incr/20210609-223002/
以上操作会将所有的增量备份合并到全量备份中.
开始恢复
1.关闭mysql
systemctl stop mysql
2.移除旧的数据文件
根据各自安装时指定的相关路径去删除数据.(data,binglog,logs,undolog),一般在my.cnf中有指定
rm -rf /home/data/mysql/data/* /home/data/mysql/logs/* /home/data/mysql/binlogs/* /home/data/mysql/undolog/*
3.拷贝完全准备好的数据库文件
如果my.cnf不是在默认路径(/etc/my.cnf),需要指定一下mysql配置文件的路径:--defaults-file=${DB_CONF}
xtrabackup --copy-back --target-dir=/home/full/20210609-092746/
4.修改数据路径的文件权限
chown mysql:mysql -R /home/data/mysql/
5.启动mysql
systemctl start mysql
至此,备份数据在单节点上的恢复已经完成了.
6.检查数据
查看下最后一份增量备份里才会有的一些数据
root@127.0.0.1 16:46: [(none)]> select created_time from xxxxxxxx.xxxxxxx where created_time >='2021-06-09 20:00:00' and created_time <='2021-06-09 22:00:00' order by xxx asc;
+---------------------+
| created_time |
+---------------------+
| 2021-06-09 20:57:08 |
| 2021-06-09 21:16:20 |
| 2021-06-09 21:23:03 |
| 2021-06-09 21:28:20 |
| 2021-06-09 21:32:01 |
+---------------------+
5 rows in set (0.02 sec)
7.MGR状态恢复
先确保同样的mysql恢复操作分别在三台服务器上执行完成.(如果在新建服务器上恢复MGR集群,一定要检查my.cnf中MGR集群相关配置,比如loose-group_replication_local_address
,loose-group_replication_group_seeds
,loose-group_replication_start_on_boot
)
master节点启动:
MGR的三台节点的权重实际上是一样的,选择其中的一台做master即可.
root@127.0.0.1 10:56: [(none)]> SET GLOBAL group_replication_bootstrap_group=ON;
Query OK, 0 rows affected (0.00 sec)
root@127.0.0.1 10:57: [(none)]> change master to master_user='repuser',master_password='xxxxxxxxx' for channel 'group_replication_recovery';
Query OK, 0 rows affected, 1 warning (0.03 sec)
root@127.0.0.1 10:57: [(none)]> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.26 sec)
root@127.0.0.1 10:57: [(none)]> SET GLOBAL group_replication_bootstrap_group=OFF;
Query OK, 0 rows affected (0.00 sec)
root@127.0.0.1 10:57: [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 694a6c7e-c994-11eb-bcd8-005056b82c4e | xxx-prod-percona-01 | 3306 | ONLINE | PRIMARY | 8.0.22 |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)
mster节点启动完成后,再分别在两台slave节点启动MGR:
root@127.0.0.1 11:01: [(none)]> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.93 sec)
root@127.0.0.1 11:01: [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 234f91d5-c998-11eb-bb83-005056b81703 | xxx-prod-percona-02 | 3306 | ONLINE | SECONDARY | 8.0.22 |
| group_replication_applier | 694a6c7e-c994-11eb-bcd8-005056b82c4e | xxx-prod-percona-01 | 3306 | ONLINE | PRIMARY | 8.0.22 |
| group_replication_applier | 72870194-c997-11eb-975f-005056b80dd1 | xxx-prod-percona-03 | 3306 | ONLINE | SECONDARY | 8.0.22 |
+---------------------------+--------------------------------------+----------------------+-------------+--------------+-------------+----------------+
3 rows in set (0.01 sec)
由于3台mysql实例的数据是一样的,节点间状态同步迅速就OK了.
至此,3台mysql的MGR状态均为online
,整个集群启动完成,全演练流程结束.