mysql高可用集群 MHA 搭建实例

前言

搭建完成 MHA 环境,然后模拟 master 故障,验证是否正确切换成了新的 master
参考1

搭建思路
  1. 搭建好一主二从的复制结构
  2. 配置各个服务器间的 SSH 免登陆
  3. 在各个服务器中都安装 mha-node,在 服务器4slave2 上安装 mha-manager
    image.png
  • 使用 MHA 提供的脚本对 SSH 免登陆的配置、集群的复制状态进行验证,看是否正确。
  • 如果验证通过,启动 manager 。
  • 测试一下,把 master 停掉,看是否自动选出了新的 master

详细过程

因为资源有限,现在使用3台服务器进行搭建
192.168.1.11 作为 master
192.168.1.12 作为 slave1
192.168.1.13 作为 slave2,同时把 MHA manager 也安装到这台服务器

配置 SSH 免登陆

  1. 在每台服务器上都执行以下命令
    ssh-keygen
    (执行后会有多个输入提示,不用输入任何内容,全部直接回车即可)
  2. 免登陆授权
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.11
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.12
ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@192.168.1.13

验证 ssh 192.168.0.x

搭建环境

  1. 主节点配置 安装可参考:mysql搭建手册
    vi /etc/my.cnf
[mysqld]
log-bin=mysql-bin
server-id=11
gtid_mode=on
#开启gtid, 必须主从全开
enforce_gtid_consistency=1
log_slave_updates=1
#开启半同步复制 否则自动切换主从的时候会报主键错误
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000
#需要同步的数据库
binlog-do-db=test
##需要忽略的数据库
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
  • 重启
    service mysqld restart
  1. 在主服务器上建立帐户并授权slave:
mysql -uroot -proot  
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
#在所有Node上创建Manage监控用户
grant all privileges on *.* to 'mha'@'192.168.1.%' identified  by '123456';
flush privileges;
#创建同步账号
CREATE USER 'repl'@'%' IDENTIFIED BY '123456';
#给同步账号赋权
grant replication slave,replication client on *.* to 'repl'@'192.168.1.%' identified by "123456";
//grant ALL PRIVILEGES on *.* to repl@"%" identified by "123456";
#生效
flush privileges;
#查看指定用户,IP权限
show grants for repl@'192.168.1.%'; 
  1. 登录主服务器的mysql,查询master的状态
    show master status;
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB                                | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
| mysql-bin.000002 |      151 | test         | information_schema,mysql,performance_schema,sys |                   |
+------------------+----------+--------------+-------------------------------------------------+-------------------+
1 row in set (0.00 sec)

  1. 在 master 上查询日志目录
    show variables like 'log_bin_basename%';
image.png

从节点配置

  1. 两个从节点的server-id需要改一下
    vi /etc/my.cnf
    server-id=12
    server-id=13
[mysqld]
log-bin=mysql-bin
server-id=12
gtid_mode=on
#开启gtid, 必须主从全开
enforce_gtid_consistency=1
log_slave_updates=1
#开启半同步复制 否则自动切换主从的时候会报主键错误
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
loose_rpl_semi_sync_master_timeout = 5000
#需要同步的数据库
binlog-do-db=test
##需要忽略的数据库
binlog-ignore-db=information_schema
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=sys
#从服务器,要加上relay_log_purge=0,不加的话,会报出warning,relay_log_purge=0 is not set on slave
relay_log_purge=0
  • 重启
    service mysqld restart
  1. 192.168.0.12上执行
mysql -uroot -proot  
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
#在所有Node上创建Manage监控用户
grant all privileges on *.* to 'mha'@'192.168.1.%' identified  by '123456';
flush privileges;
#从服务的id Master_Server_Id: 0
change master to master_auto_position=0;
#指定master 
change  master to master_host='192.168.1.11',master_user='repl',master_password='123456',master_log_file='mysql-bin.000009',master_log_pos=191;
flush privileges;
#启动复制
start slave;  
#从节点上运行
#set global read_only=1;
  • 查看状态
    show slave status \G;
 Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.11
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 151
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

如上,当IO和SQL线程的状态均为Yes,则表示主从已实现同步了!

  1. 192.168.0.13上执行(MHA-manager管理节点)
mysql -uroot -proot  
use mysql;
#mha测试账号,可以也使用root
CREATE USER 'mha'@'%' IDENTIFIED BY '123456';
grant ALL PRIVILEGES on *.* to mha@"%" Identified by "123456";
flush privileges;
#从服务的id Master_Server_Id: 0
change master to master_auto_position=0;
#指定master
change  master to master_host='192.168.1.11',master_user='repl',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=151;
#启动复制
start slave;  
#从节点上运行
#set global read_only=1;
  • 查看状态
    show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.11
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000003
          Read_Master_Log_Pos: 151
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 361
        Relay_Master_Log_File: mysql-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

安装 MHA

创建安装目录
  1. Node服务器安装
    mkdir -p /usr/local/mha
  2. manage服务器安装
    mkdir -p /usr/local/mha/ha1/fail_script
    mkdir -p /usr/local/mha/ha1/workdir

/usr/local/mha:程序安装目录
/usr/local/mha/ha1:用于区别每一个mha方案,当前方案ha1
/usr/local/mha/ha1/fail_script:方案ha1的failover脚本保存路径
/usr/local/mha/ha1/workdir:方案ha1的的日志和failover产生的binlog保存路径

安装 MHA-node
  1. 在每台服务器上都执行以下命令
    安装epel源(所有节点)
    yum -y install perl-DBD-MySQL ncftp perl-DBI.x86


    image.png
  2. 若遇到如图报错则:yum -y install perl-DBI.x*
    下载需翻墙:

cd /usr/local/src/
wget https://raw.githubusercontent.com/linyue515/mysql-master-ha/master/mha4mysql-node-0.57-0.el7.noarch.rpm
#安装并在安装过程中显示正在安装的文件信息及安装进度
rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm 
安装 MHA-manager

在 slave2上执行命令

wget https://raw.githubusercontent.com/linyue515/mysql-master-ha/master/mha4mysql-manager-0.57-0.el7.noarch.rpm
rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm
  • 错误:依赖检测失败:

    perl(Config::Tiny) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
    perl(Log::Dispatch) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
    perl(Log::Dispatch::File) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
    perl(Log::Dispatch::Screen) 被 mha4mysql-manager-0.57-0.el7.noarch 需要
    perl(Parallel::ForkManager) 被 mha4mysql-manager-0.57-0.el7.noarch 需要

  1. yum安装,先更新yum源
yum -y update
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

没有可用软件包 perl-Log-Dispatch。没有可用软件包 perl-Parallel-ForkManager。
注意:这两步的安装方式为centos 系统,在其他系统中,需要自己调整安装方法

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
yum -y install  perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch   perl-Parallel-ForkManager  
yum install -y rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate

创建 MHA 配置文件

  1. slave2管理节点上,位置可以自定义
    mkdir /usr/local/mha
    vi /usr/local/mha/mha.cnf
    配置文件添加如下:
[server default]
# mha监听用户
hostname=192.168.1.13
user=mha
password=123456
ssh_user=root
# 配置主从复制时创建的复制用户
repl_user=repl
repl_password=123456
#监控mater,ping的频率
ping_interval=1
# 在各台服务器上创建目录mkdir 如:/usr/local/mha
manager_workdir=/usr/local/mha
# 日志位置
manager_log=/usr/local/mha/manager.log
remote_workdir=/usr/local/mha
master_binlog_dir=/usr/local/mysql/data/mysql

#master_ip_failover:自动切换时vip管理的脚本
#master_ip_online_change:手动切换使用的脚本
#power_manager:故障发生后关闭主机的脚本
#send_report:发送报警的脚本。
master_ip_failover_script=/usr/local/mha/ha1/fail_script/master_ip_failover
master_ip_online_change_script=/usr/local/mha/ha1/fail_script/master_ip_online_change
secondary_check_script=/usr/bin/masterha_secondary_check  -s 192.168.1.12 -s 192.168.1.13  
report_script=/usr/local/mha/ha1/fail_script/send_report
shutdown_script=""
[server1]
hostname=192.168.1.11
port=3306
[server2]
hostname=192.168.1.12
port=3306
candidate_master=1
check_repl_delay=0
[server3]
hostname=192.168.1.13
port=3306
ignore_fail=1
no_master=1
基本配置部分
  1. mnager监控用的mysql,mha用户 一般设置为root用户
    user=mha
    password=123456
  2. ssh登入用户名
    ssh_user=root
  3. 配置主从复制时创建的复制用户,每个node服务器都需要存在
    repl_user=repl
    repl_password=123456
  4. mha_manager项目的主目录 前面创建的mha放脚本的目录
    manager_workdir=/usr/local/mha
  5. mha_manager记录日志
    manager_log=/usr/local/mha/manager.log
  6. master服务器上查询 show variables like 'log_bin_basename%';
    例如:值为 /data/mysql/mysql-bin,需要的是 /data/mysql
    master_binlog_dir=/usr/local/mysql/data/mysql
  7. 监控mater,ping的频率
    ping_interval=1
  8. node服务器在发生master切换时,binlog保持的路径,每个node都会在该目录下保存一份差异的binlog,除非没有差异。
    remote_workdir=/usr/local/mha
  9. 设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中事件最新的slave
    candidate_master=1
  10. 默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间,通过设置check_repl_delay=0,MHA触发切换在选择一个新master的时候将会忽略复制延时,这个参数对于设置了candidate_master=1的主机非常有用,因为这个候选主在切换的过程中一定是新的master
    check_repl_delay=0
  11. 如果不加上该参数,当该slave主机故障了,mha将无法启动,加上该参数会忽略该主机是否正常,在mha启动的时候加上参数--ignore_fail_on_start
    ignore_fail=1
  12. 不将该主机转换为master
    no_master=1
高可用配置部分
  1. mha在线自动failover时处理VIP的配置文件
    master_ip_failover_script=/usr/local/mha/ha1/fail_script/master_ip_failover
  2. 在线手动执行master切换时VIP的处理文件
    master_ip_online_change_script=/usr/local/mha/ha1/fail_script/master_ip_online_change
  3. 一旦MHA到新master之间的网络出现问题,manager会尝试从backup登入到masger
    secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.1.12 -s 192.168.1.13
  4. 发生切换后执行的报警脚本
    report_script=/usr/local/mha/ha1/fail_script/send_report
  5. 故障后关闭master主机的脚本(主要是使用keepalive做VIP时会出现脑裂导致VIP频繁切换所以会将故障的master关闭)
    shutdown_script=""

ha1脚本

  1. master_ip_failover
    VIP的配置可以使用keepalived也可以写脚本,keepalived对网络的要求很高否则容易脑裂,在我前面搭建双主环境讲过keepalived的搭建方法,我这里使用脚本的方式。
    需要修改VIP、网卡ens33
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

my $vip = '192.168.0.100/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
  1. master_ip_online_change
    perl脚本
#!/usr/bin/env perl  
use strict;
use warnings FATAL =>'all';

use Getopt::Long;

my $vip = '192.168.0.100/24';  # Virtual IP  
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
my $exit_code = 0;

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
);

exit &main();

sub main {

#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  

if ( $command eq "stop" || $command eq "stopssh" ) {

        # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  
        # If you manage master ip address at global catalog database,  
        # invalidate orig_master_ip here.  
        my $exit_code = 1;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Disabling the VIP - $vip on old master: $orig_master_host\n";
            print "***************************************************************\n\n\n\n";
&stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
}
elsif ( $command eq "start" ) {

        # all arguments are passed.  
        # If you manage master ip address at global catalog database,  
        # activate new_master_ip here.  
        # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;
        eval {
            print "\n\n\n***************************************************************\n";
            print "Enabling the VIP - $vip on new master: $new_master_host \n";
            print "***************************************************************\n\n\n\n";
&start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
}

elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $orig_master_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
}
else {
&usage();
        exit 1;
}
}

# A simple system call that enable the VIP on the new master  
sub start_vip() {
`ssh $new_master_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {
`ssh $orig_master_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
  1. shell脚本
#/bin/bash  
#source /root/.bash_profile  
  
vip=`echo '192.168.0.100/24'`  # Virtual IP  
key=`echo '1'`  
  
command=`echo "$1" | awk -F = '{print $2}'`  
orig_master_host=`echo "$2" | awk -F = '{print $2}'`  
new_master_host=`echo "$7" | awk -F = '{print $2}'`    
  
stop_vip=`echo "ssh root@$orig_master_host /sbin/ifconfig  ens33:$key  down"`  
start_vip=`echo "ssh root@$new_master_host /sbin/ifconfig  ens33:$key  $vip"`  
  
if [ $command = 'stop' ]  
   then  
   echo -e "\n\n\n***************************************************************\n"  
   echo -e "Disabling the VIP - $vip on old master: $orig_master_host\n"  
   $stop_vip  
   if [ $? -eq 0 ]  
      then  
      echo "Disabled the VIP successfully"  
   else  
      echo "Disabled the VIP failed"  
   fi  
   echo -e "***************************************************************\n\n\n\n"  
fi  
  
if [ $command = 'start' -o $command = 'status' ]  
   then  
   echo -e "\n\n\n***************************************************************\n"  
   echo -e "Enabling the VIP - $vip on new master: $new_master_host \n"  
   $start_vip  
   if [ $? -eq 0 ]  
      then  
      echo "Enabled the VIP successfully"  
   else  
      echo "Enabled the VIP failed"  
   fi  
   echo -e "***************************************************************\n\n\n\n"  
fi 
  1. send_report
#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='1019533934@qq.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}
# Do whatever you want here
exit 0;
  • 修改用户,默认监控用户是mha
    cd /usr/bin
    vi masterha_secondary_check
  • 给上面脚本赋权:
    chmod -R 755 /usr/local/mha/ha1/fail_script
master手动添加虚拟VIP

注意:需要手动先在master服务器上面添加VIP
yum install -y net-tools.x86_64
查看网卡:
ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:c4:82:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fec4:8282/64 scope link 
       valid_lft forever preferred_lft forever

添加虚拟VIP:
ifconfig ens33:1 192.168.1.100/24
ifconfig ens33:1 192.168.1.100/24 down

配置relay_log的清除方式(在每个Node上)

  1. 所有Node的cnf配置文件加上
    relay_log_purge=0
    MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。
    在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。
    提示:在mysql数据库中,删除大表时,通常也采用建立硬链接的方式
    MHA节点中包含了pure_relay_logs命令工具,它可以为中继日志创建硬链接,执行SET GLOBAL relay_log_purge=1,等待几秒钟以便SQL线程切换到新的中继日志,再执行SET GLOBAL relay_log_purge=0。
    pure_relay_logs脚本参数如下所示:
--user mysql                      用户名
--password mysql                  密码
--port                            端口号
--workdir                         指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge         默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。

在每台slave Node上创建
vim /usr/local/mha/purge_relay_log.sh

#!/bin/bash
user=root
passwd=root  
#确保用户和密码能通过127.0.0.1登入(本地的IP)
host='127.0.0.1'
port=3306
work_dir='/usr/local/mysql/data/mysql'
purge='/usr/bin/purge_relay_logs'
$purge --user=$user --password=$passwd --host=$host --disable_relay_log_purge --port=$port --workdir=$work_dir >> /usr/local/mha/purge_relay_logs.log 2>&1
  • 授权
    chmod u+x /usr/local/mha/purge_relay_log.sh
    cd /usr/local/mha
    ./purge_relay_log.sh
2019-06-23 12:07:51: purge_relay_logs script started.
 relay_log_purge is enabled. Disabling..
 Found relay_log.info: /usr/local/mysql/data/mysql/relay-log.info
 Opening /usr/local/mysql/data/mysql/localhost-relay-bin.000001 ..
 Opening /usr/local/mysql/data/mysql/localhost-relay-bin.000002 ..
 Executing SET GLOBAL relay_log_purge=1; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=0; .. ok.
2019-06-23 12:07:54: All relay log purging operations succeeded.

将脚本加入到os定时任务中


image

启动 manager

check检查
  • 检查SSH配置
    masterha_check_ssh --conf=/usr/local/mha/mha.cnf
  • 检查复制
    masterha_check_repl --conf=/usr/local/mha/mha.cnf
  • 检查状态
    masterha_check_status --conf=/usr/local/mha/mha.cnf
    必需保证所有的检查都通过
  • 错误
    在验证时,我遇到过这个错误:Can't exec "mysqlbinlog" ......
    解决方法是在所有服务器上执行:
    ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
    ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
    注意:加粗位置,需要根据自己环境修改路径
  • 验证成功
    MySQL Replication Health is OK.
  1. 启动MHA
    nohup masterha_manager --conf=/usr/local/mha/mha.cnf --ignore_fail_on_start --ignore_last_failover < /dev/null > /usr/local/mha/start.log 2>&1 &
  2. 查看启动日志
    tail -f manager.log
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.0.100/24===
Checking the Status of the script.. OK 
Sun Jun 23 09:44:20 2019 - [info]  OK.
Sun Jun 23 09:44:20 2019 - [warning] shutdown_script is not defined.
Sun Jun 23 09:44:20 2019 - [info] Set master ping interval 1 seconds.
Sun Jun 23 09:44:20 2019 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s backup -s master --user=root --master_host=master --master_ip=192.168.0.11 --master_port=3306
Sun Jun 23 09:44:20 2019 - [info] Starting ping health check on 192.168.0.11(192.168.0.11:3306)..
Sun Jun 23 09:44:20 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
  1. 状态检查
    ps -ef|grep masterha_manager
    masterha_check_status --conf=/usr/local/mha/mha.cnf
mha (pid:29238) is running(0:PING_OK), master:192.168.0.11
  1. 处理故障master,将其配置为从库chang到新的master,可以从manager.log找到change语句。
    grep "CHANGE MASTER TO MASTER" /usr/local/mha/manager.log | tail -1
  2. 停掉MHA监控
    masterha_stop --conf=/usr/local/mha/mha.cnf

故障转移验证


image.png

验证MHA

验证的方式是先停掉 master,因为之前的配置文件中,把 slave1 作为了候选人,那么就到 slave2 上查看 master 的 IP 是否变为了 slave1 的 IP

  1. 停掉 master,在 master(192.168.0.11) 上把 mysql 停掉
    service mysqld stop
  2. 查看 MHA 日志,上面的配置文件中指定了日志位置为 /usr/local/mha/manager.log
    tail -f /usr/local/mha/manager.log
Started automated(non-interactive) failover.
Selected 192.168.0.12(192.168.0.12:3306) as a new master.
192.168.0.12(192.168.0.12:3306): OK: Applying all logs succeeded.
192.168.0.13(192.168.0.13:3306): OK: Slave started, replicating from 192.168.0.12(192.168.0.12:3306)
192.168.0.12(192.168.0.12:3306): Resetting slave info succeeded.
Master failover to 192.168.0.12(192.168.0.12:3306) completed successfully.
  1. 删除fail文件(非必需)
    由于启动mha的时候加上了--ignore_last_failover参数,所以不删除failower生成的文件也能启动,否则需要删除failower生成的文件“mha.failover.complete”。
    rm -rf /usr/local/mha/mha.failover.complete


    image.png
  2. 检查新的slave
    登陆 slave(192.168.0.12)的mysql,查看 slave 状态
    show slave status \G

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 473
               Relay_Log_File: localhost-relay-bin.000002
                Relay_Log_Pos: 408
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

可以看到 master 的 IP 现在为 12,说明 MHA 已经把 slave1 提升为了新的 master,IO线程和SQL线程也正确运行,MHA 搭建成功。

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 192.168.1.100/24===

Checking the Status of the script.. OK 
Sun Jun 23 13:52:00 2019 - [info]  OK.
Sun Jun 23 13:52:00 2019 - [warning] shutdown_script is not defined.
Sun Jun 23 13:52:00 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

failover处理

dead master处理阶段

down掉主服务器的VIP

/usr/local/mha/ha1/fail_script/master_ip_failover --orig_master_host=192.168.1.11 --orig_master_ip=192.168.1.100 --orig_master_port=3306 --command=stopssh --ssh_user=root
  • 问题1:

slave replicates is not defined in the configuration file!

MySQL彻底清除slave信息
stop slave;
reset slave all;

  • 问题2:

Tue Feb 23 13:47:18 2016 - [info] read_only=1 is not set on slave db
Tue Feb 23 13:47:18 2016 - [warning] relay_log_purge=0 is not set on slave db

解决办法:
备库执行
set global read_only=1;
set global relay_log_purge=0;

keepalived配置

配置keepalived的配置文件,在master上配置(192.168.1.11)
vi /etc/keepalived/keepalived.conf

global_defs {
     notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51 priority 150 advert_int 1 nopreempt

    authentication {
    auth_type PASS
    auth_pass 1111 
}

    virtual_ipaddress { 
    192.168.1.100
     }
}

其中router_id MySQL HA表示设定keepalived组的名称,将192.168.1.100这个虚拟ip绑定到该主机的eth1网卡上,并且设置了状态为backup模式,将keepalived的模式设置为非抢占模式(nopreempt),priority 150表示设置的优先级为150。下面的配置略有不同,但是都是一个意思。

在候选master上配置(192.168.1.12)

vi /etc/keepalived/keepalived.conf

global_defs {
     notification_email {
     saltstack@163.com
   }
   notification_email_from dba@dbserver.com
   smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id MySQL-HA
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51 priority 120 advert_int 1 nopreempt

    authentication {
    auth_type PASS
    auth_pass 1111 }

    virtual_ipaddress { 
  192.168.1.100
   }
} 

/etc/init.d/keepalived start ; tail -f /var/log/messages

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,904评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,581评论 3 389
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,527评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,463评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,546评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,572评论 1 293
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,582评论 3 414
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,330评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,776评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,087评论 2 330
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,257评论 1 344
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,923评论 5 338
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,571评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,192评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,436评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,145评论 2 366
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,127评论 2 352

推荐阅读更多精彩内容