前言
搭建完成 MHA 环境,然后模拟 master 故障,验证是否正确切换成了新的 master
参考1
搭建思路
- 搭建好一主二从的复制结构
- 配置各个服务器间的 SSH 免登陆
- 在各个服务器中都安装 mha-node,在 服务器4
slave2
上安装 mha-manager
- 使用 MHA 提供的脚本对 SSH 免登陆的配置、集群的复制状态进行验证,看是否正确。
- 如果验证通过,启动 manager 。
- 测试一下,把 master 停掉,看是否自动选出了新的 master
详细过程
因为资源有限,现在使用3台服务器进行搭建
192.168.1.11 作为 master
192.168.1.12 作为 slave1
192.168.1.13 作为 slave2,同时把 MHA manager 也安装到这台服务器
配置 SSH 免登陆
- 在每台服务器上都执行以下命令
ssh-keygen
(执行后会有多个输入提示,不用输入任何内容,全部直接回车即可) - 免登陆授权
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.11
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.12
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.13
验证 ssh 192.168.0.x
搭建环境
- 主节点配置 安装可参考:mysql搭建手册
vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=11
gtid_mode=on
#开启gtid, 必须主从全开
enforce_gtid_consistency=1
log_slave_updates=1
#开启半同步复制 否则自动切换主从的时候会报主键错误
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000
#需要同步的数据库
binlog-do-db=test
##需要忽略的数据库
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
- 重启
service mysqld restart
- 在主服务器上建立帐户并授权slave:
mysql -uroot -proot
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
#在所有Node上创建Manage监控用户
grant all privileges on *.* to 'mha'@'192.168.1.%' identified by '123456';
flush privileges;
#创建同步账号
CREATE USER 'repl'@'%' IDENTIFIED BY '123456';
#给同步账号赋权
grant replication slave,replication client on *.* to 'repl'@'192.168.1.%' identified by "123456";
//grant ALL PRIVILEGES on *.* to repl@"%" identified by "123456";
#生效
flush privileges;
#查看指定用户,IP权限
show grants for repl@'192.168.1.%';
- 登录主服务器的mysql,查询master的状态
show master status;
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| mysql-bin.000002 | 151 | test | information_schema,mysql,performance_schema,sys | |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
1 row in set (0.00 sec)
- 在 master 上查询日志目录
show variables like 'log_bin_basename%';
从节点配置
- 两个从节点的server-id需要改一下
vi /etc/my.cnf
server-id=12
server-id=13
[mysqld]
log-bin=mysql-bin
server-id=12
gtid_mode=on
#开启gtid, 必须主从全开
enforce_gtid_consistency=1
log_slave_updates=1
#开启半同步复制 否则自动切换主从的时候会报主键错误
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000
#需要同步的数据库
binlog-do-db=test
##需要忽略的数据库
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
#从服务器,要加上relay_log_purge=0,不加的话,会报出warning,relay_log_purge=0 is not set on slave
relay_log_purge=0
- 重启
service mysqld restart
- 192.168.0.12上执行
mysql -uroot -proot
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
#在所有Node上创建Manage监控用户
grant all privileges on *.* to 'mha'@'192.168.1.%' identified by '123456';
flush privileges;
#从服务的id Master_Server_Id: 0
change master to master_auto_position=0;
#指定master
change master to master_host='192.168.1.11',master_user='repl',master_password='123456',master_log_file='mysql-bin.000009',master_log_pos=191;
flush privileges;
#启动复制
start slave;
#从节点上运行
#set global read_only=1;
- 查看状态
show slave status \G;
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.11
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 151
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 361
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
如上,当IO和SQL线程的状态均为Yes,则表示主从已实现同步了!
- 192.168.0.13上执行(MHA-manager管理节点)
mysql -uroot -proot
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
grant ALL PRIVILEGES on *.* to mha@"%" Identified by "123456";
flush privileges;
#从服务的id Master_Server_Id: 0
change master to master_auto_position=0;
#指定master
change master to master_host='192.168.1.11',master_user='repl',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=151;
#启动复制
start slave;
#从节点上运行
#set global read_only=1;
- 查看状态
show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.11
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 151
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 361
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
安装 MHA
创建安装目录
- Node服务器安装
mkdir -p /usr/local/mha - manage服务器安装
mkdir -p /usr/local/mha/ha1/fail_script
mkdir -p /usr/local/mha/ha1/workdir
/usr/local/mha:程序安装目录
/usr/local/mha/ha1:用于区别每一个mha方案,当前方案ha1
/usr/local/mha/ha1/fail_script:方案ha1的failover脚本保存路径
/usr/local/mha/ha1/workdir:方案ha1的的日志和failover产生的binlog保存路径
安装 MHA-node
-
在每台服务器上都执行以下命令
安装epel源(所有节点)
yum -y install perl-DBD-MySQL ncftp perl-DBI.x86
若遇到如图报错则:yum -y install perl-DBI.x*
下载需翻墙:
cd /usr/local/src/
wget https://raw.githubusercontent.com/linyue515/mysql-master-ha/master/mha4mysql-node-0.57-0.el7.noarch.rpm
#安装并在安装过程中显示正在安装的文件信息及安装进度
rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm
安装 MHA-manager
在 slave2上执行命令
wget https://raw.githubusercontent.com/linyue515/mysql-master-ha/master/mha4mysql-manager-0.57-0.el7.noarch.rpm
rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm
- 错误:依赖检测失败:
perl(Config::Tiny) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
perl(Log::Dispatch) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
perl(Log::Dispatch::File) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
perl(Log::Dispatch::Screen) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
perl(Parallel::ForkManager) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
- yum安装,先更新yum源
yum -y update
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
没有可用软件包 perl-Log-Dispatch。没有可用软件包 perl-Parallel-ForkManager。
注意:这两步的安装方式为centos 系统,在其他系统中,需要自己调整安装方法
yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yum install -y rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate
创建 MHA 配置文件
- slave2管理节点上,位置可以自定义
mkdir /usr/local/mha
vi /usr/local/mha/mha.cnf
配置文件添加如下:
[server default]
# mha监听用户
hostname=192.168.1.13
user=mha
password=123456
ssh_user=root
# 配置主从复制时创建的复制用户
repl_user=repl
repl_password=123456
#监控mater,ping的频率
ping_interval=1
# 在各台服务器上创建目录mkdir 如:/usr/local/mha
manager_workdir=/usr/local/mha
# 日志位置
manager_log=/usr/local/mha/manager.log
remote_workdir=/usr/local/mha
master_binlog_dir=/usr/local/mysql/data/mysql
#master_ip_failover:自动切换时vip管理的脚本
#master_ip_online_change:手动切换使用的脚本
#power_manager:故障发生后关闭主机的脚本
#send_report:发送报警的脚本。
master_ip_failover_script=/usr/local/mha/ha1/fail_script/master_ip_failover
master_ip_online_change_script=/usr/local/mha/ha1/fail_script/master_ip_online_change
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.1.12 -s 192.168.1.13
report_script=/usr/local/mha/ha1/fail_script/send_report
shutdown_script=""
[server1]
hostname=192.168.1.11
port=3306
[server2]
hostname=192.168.1.12
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.1.13
port=3306
ignore_fail=1
no_master=1
基本配置部分
- mnager监控用的mysql,mha用户
一般设置为root用户
user=mha
password=123456 - ssh登入用户名
ssh_user=root - 配置主从复制时创建的复制用户,每个node服务器都需要存在
repl_user=repl
repl_password=123456 - mha_manager项目的主目录
前面创建的mha放脚本的目录
manager_workdir=/usr/local/mha - mha_manager记录日志
manager_log=/usr/local/mha/manager.log - master服务器上查询 show variables like 'log_bin_basename%';
例如:值为 /data/mysql/mysql-bin,需要的是 /data/mysql
master_binlog_dir=/usr/local/mysql/data/mysql - 监控mater,ping的频率
ping_interval=1 - node服务器在发生master切换时,binlog保持的路径,每个node都会在该目录下保存一份差异的binlog,除非没有差异。
remote_workdir=/usr/local/mha - 设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave
candidate_master=1 - 默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master
check_repl_delay=0 - 如果不加上该参数,当该slave主机故障了,mha将无法启动,加上该参数会忽略该主机是否正常,在mha启动的时候加上参数--ignore_fail_on_start
ignore_fail=1 - 不将该主机转换为master
no_master=1
高可用配置部分
- mha在线自动failover时处理VIP的配置文件
master_ip_failover_script=/usr/local/mha/ha1/fail_script/master_ip_failover - 在线手动执行master切换时VIP的处理文件
master_ip_online_change_script=/usr/local/mha/ha1/fail_script/master_ip_online_change - 一旦MHA到新master之间的网络出现问题,manager会尝试从backup登入到masger
secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.1.12 -s 192.168.1.13 - 发生切换后执行的报警脚本
report_script=/usr/local/mha/ha1/fail_script/send_report - 故障后关闭master主机的脚本(主要是使用keepalive做VIP时会出现脑裂导致VIP频繁切换所以会将故障的master关闭)
shutdown_script=""
ha1脚本
- master_ip_failover
VIP的配置可以使用keepalived也可以写脚本,keepalived对网络的要求很高否则容易脑裂,在我前面搭建双主环境讲过keepalived的搭建方法,我这里使用脚本的方式。
需要修改VIP、网卡ens33
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.0.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
- master_ip_online_change
perl脚本
#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';
use Getopt::Long;
my $vip = '192.168.0.100/24'; # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password,
);
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "\n\n\n***************************************************************\n";
print "Disabling the VIP - $vip on old master: $orig_master_host\n";
print "***************************************************************\n\n\n\n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "\n\n\n***************************************************************\n";
print "Enabling the VIP - $vip on new master: $new_master_host \n";
print "***************************************************************\n\n\n\n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $orig_master_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
- shell脚本
#/bin/bash
#source /root/.bash_profile
vip=`echo '192.168.0.100/24'` # Virtual IP
key=`echo '1'`
command=`echo "$1" | awk -F = '{print $2}'`
orig_master_host=`echo "$2" | awk -F = '{print $2}'`
new_master_host=`echo "$7" | awk -F = '{print $2}'`
stop_vip=`echo "ssh root@$orig_master_host /sbin/ifconfig ens33:$key down"`
start_vip=`echo "ssh root@$new_master_host /sbin/ifconfig ens33:$key $vip"`
if [ $command = 'stop' ]
then
echo -e "\n\n\n***************************************************************\n"
echo -e "Disabling the VIP - $vip on old master: $orig_master_host\n"
$stop_vip
if [ $? -eq 0 ]
then
echo "Disabled the VIP successfully"
else
echo "Disabled the VIP failed"
fi
echo -e "***************************************************************\n\n\n\n"
fi
if [ $command = 'start' -o $command = 'status' ]
then
echo -e "\n\n\n***************************************************************\n"
echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"
$start_vip
if [ $? -eq 0 ]
then
echo "Enabled the VIP successfully"
else
echo "Enabled the VIP failed"
fi
echo -e "***************************************************************\n\n\n\n"
fi
- send_report
#!/usr/bin/perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='1019533934@qq.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, "> /tmp/monitormail.log"
or die "Can't open the debug file:$!\n";
my $sender = new Mail::Sender {
ctype => 'text/plain; charset=utf-8',
encoding => 'utf-8',
smtp => $smtp,
from => $mail_from,
auth => 'LOGIN',
TLS_allowed => '0',
authid => $user,
authpwd => $passwd,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg(
{ msg => $msg,
debug => $DEBUG
}
) or print $Mail::Sender::Error;
return 1;
}
# Do whatever you want here
exit 0;
- 修改用户,默认监控用户是mha
cd /usr/bin
vi masterha_secondary_check - 给上面脚本赋权:
chmod -R 755 /usr/local/mha/ha1/fail_script
master手动添加虚拟VIP
注意:需要手动先在master服务器上面添加VIP
yum install -y net-tools.x86_64
查看网卡:
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:c4:82:82 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fec4:8282/64 scope link
valid_lft forever preferred_lft forever
添加虚拟VIP:
ifconfig ens33:1 192.168.1.100/24
ifconfig ens33:1 192.168.1.100/24 down
配置relay_log的清除方式(在每个Node上)
- 所有Node的cnf配置文件加上
relay_log_purge=0
MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。
在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。
提示:在mysql数据库中,删除大表时,通常也采用建立硬链接的方式
MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。
pure_relay_logs脚本参数如下所示:
--user mysql 用户名
--password mysql 密码
--port 端口号
--workdir 指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge 默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。
在每台slave Node上创建
vim /usr/local/mha/purge_relay_log.sh
#!/bin/bash
user=root
passwd=root
#确保用户和密码能通过127.0.0.1登入(本地的IP)
host='127.0.0.1'
port=3306
work_dir='/usr/local/mysql/data/mysql'
purge='/usr/bin/purge_relay_logs'
$purge --user=$user --password=$passwd --host=$host --disable_relay_log_purge --port=$port --workdir=$work_dir >> /usr/local/mha/purge_relay_logs.log 2>&1
- 授权
chmod u+x /usr/local/mha/purge_relay_log.sh
cd /usr/local/mha
./purge_relay_log.sh
2019-06-23 12:07:51: purge_relay_logs script started.
relay_log_purge is enabled. Disabling..
Found relay_log.info: /usr/local/mysql/data/mysql/relay-log.info
Opening /usr/local/mysql/data/mysql/localhost-relay-bin.000001 ..
Opening /usr/local/mysql/data/mysql/localhost-relay-bin.000002 ..
Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2019-06-23 12:07:54: All relay log purging operations succeeded.
将脚本加入到os定时任务中
启动 manager
check检查
- 检查SSH配置
masterha_check_ssh --conf=/usr/local/mha/mha.cnf - 检查复制
masterha_check_repl --conf=/usr/local/mha/mha.cnf - 检查状态
masterha_check_status --conf=/usr/local/mha/mha.cnf
必需保证所有的检查都通过 - 错误
在验证时,我遇到过这个错误:Can't exec "mysqlbinlog" ......
解决方法是在所有服务器上执行:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
注意:加粗位置,需要根据自己环境修改路径 - 验证成功
MySQL Replication Health is OK.
- 启动MHA
nohup masterha_manager --conf=/usr/local/mha/mha.cnf --ignore_fail_on_start --ignore_last_failover < /dev/null > /usr/local/mha/start.log 2>&1 & - 查看启动日志
tail -f manager.log
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.0.100/24===
Checking the Status of the script.. OK
Sun Jun 23 09:44:20 2019 - [info] OK.
Sun Jun 23 09:44:20 2019 - [warning] shutdown_script is not defined.
Sun Jun 23 09:44:20 2019 - [info] Set master ping interval 1 seconds.
Sun Jun 23 09:44:20 2019 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s backup -s master --user=root --master_host=master --master_ip=192.168.0.11 --master_port=3306
Sun Jun 23 09:44:20 2019 - [info] Starting ping health check on 192.168.0.11(192.168.0.11:3306)..
Sun Jun 23 09:44:20 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
- 状态检查
ps -ef|grep masterha_manager
masterha_check_status --conf=/usr/local/mha/mha.cnf
mha (pid:29238) is running(0:PING_OK), master:192.168.0.11
- 处理故障master,将其配置为从库chang到新的master,可以从manager.log找到change语句。
grep "CHANGE MASTER TO MASTER" /usr/local/mha/manager.log | tail -1 - 停掉MHA监控
masterha_stop --conf=/usr/local/mha/mha.cnf
故障转移验证
验证MHA
验证的方式是先停掉 master,因为之前的配置文件中,把 slave1 作为了候选人,那么就到 slave2 上查看 master 的 IP 是否变为了 slave1 的 IP
- 停掉 master,在 master(192.168.0.11) 上把 mysql 停掉
service mysqld stop - 查看 MHA 日志,上面的配置文件中指定了日志位置为 /usr/local/mha/manager.log
tail -f /usr/local/mha/manager.log
Started automated(non-interactive) failover.
Selected 192.168.0.12(192.168.0.12:3306) as a new master.
192.168.0.12(192.168.0.12:3306): OK: Applying all logs succeeded.
192.168.0.13(192.168.0.13:3306): OK: Slave started, replicating from 192.168.0.12(192.168.0.12:3306)
192.168.0.12(192.168.0.12:3306): Resetting slave info succeeded.
Master failover to 192.168.0.12(192.168.0.12:3306) completed successfully.
-
删除fail文件(非必需)
由于启动mha的时候加上了--ignore_last_failover参数,所以不删除failower生成的文件也能启动,否则需要删除failower生成的文件“mha.failover.complete”。
rm -rf /usr/local/mha/mha.failover.complete
检查新的slave
登陆 slave(192.168.0.12)的mysql,查看 slave 状态
show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.0.12
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000006
Read_Master_Log_Pos: 473
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 408
Relay_Master_Log_File: mysql-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
可以看到 master 的 IP 现在为 12,说明 MHA 已经把 slave1 提升为了新的 master,IO线程和SQL线程也正确运行,MHA 搭建成功。
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.1.100/24===
Checking the Status of the script.. OK
Sun Jun 23 13:52:00 2019 - [info] OK.
Sun Jun 23 13:52:00 2019 - [warning] shutdown_script is not defined.
Sun Jun 23 13:52:00 2019 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
failover处理
dead master处理阶段
down掉主服务器的VIP
/usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.1.11 --orig_master_ip=192.168.1.100 --orig_master_port=3306 --command=stopssh --ssh_user=root
- 问题1:
slave replicates is not defined in the configuration file!
MySQL彻底清除slave信息
stop slave;
reset slave all;
- 问题2:
Tue Feb 23 13:47:18 2016 - [info] read_only=1 is not set on slave db
Tue Feb 23 13:47:18 2016 - [warning] relay_log_purge=0 is not set on slave db
解决办法:
备库执行
set global read_only=1;
set global relay_log_purge=0;
keepalived配置
配置keepalived的配置文件,在master上配置(192.168.1.11)
vi /etc/keepalived/keepalived.conf
global_defs {
notification_email {
saltstack@163.com
}
notification_email_from dba@dbserver.com
smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51 priority 150 advert_int 1 nopreempt
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.100
}
}
其中router_id MySQL HA表示设定keepalived组的名称,将192.168.1.100这个虚拟ip绑定到该主机的eth1网卡上,并且设置了状态为backup模式,将keepalived的模式设置为非抢占模式(nopreempt),priority 150表示设置的优先级为150。下面的配置略有不同,但是都是一个意思。
在候选master上配置(192.168.1.12)
vi /etc/keepalived/keepalived.conf
global_defs {
notification_email {
saltstack@163.com
}
notification_email_from dba@dbserver.com
smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MySQL-HA
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51 priority 120 advert_int 1 nopreempt
authentication {
auth_type PASS
auth_pass 1111 }
virtual_ipaddress {
192.168.1.100
}
}
/etc/init.d/keepalived start ; tail -f /var/log/messages