1,简介
1)MHA (master high availability) 用于在mysql高可用性环境下(主库)故障切换和主从提升(
提供高性能的从库为主
)的高可用策略。
2)组成部分(Perl编写
)。MHA Manager(管理节点)和MHA Node(数据节点)。manager可以部署在单独的节点上(最好
),也可以部署在一台slave上(若这台slave网络不通的话,就会出现问题
)。
mha manager可以管理多个master/slave集群。
3)原理。MHA node运行在每个mysql 节点上,MHA Manager 会定时检测master节点,当master故障时,自动将最新数据的slave提升为主,将其他slave指向新的master。
4)MHA在切换过程中,会尝试从宕机的主服务器上保存二进制日志(binlog)
,如果master硬件故障或者网络不通,则无法保存binlog,进行故障转移会丢失这部分数据。
5)MHA node程序,并不是一直运行的,当MHA manager programs,在进行configuration check, failover时,才会被调用。
2,基本过程
1)从宕机的master保存binlog。
2)通过relay log识别最新的slave。
3)应用差异的relay log到其他的slave。
4)应用从master保存的binlog。
5) 提升一个slave为新的master,并将其他slave指向新的master。
3,MHA提供的工具
1)Manager管理端工具
masterha_check_sshSSH 环境检测
masterha_check_replMySQL 复制环境检测
masterha_managermanager服务主程序
masterha_check_statusmha运行状态检测
masterha_master_monitormaster节点可用性监测
masterha_master_switchmaster节点切换工具
masterha_stop关闭mha服务
2)node端工具
save_binary_logs保存和复制 master binlog
apply_diff_relay_logs识别差异的relay log,(当要提升为新的master时)并应用于其它 slave
purge_relay_logs清除relay log,不会阻塞SQL线程(主从复制时,会使用relay log应用到从库)
3)扩展
secondary_check_script:通过多条网络路由检测 master 的可用性;
report_script:发送报告;
ssh免密登录配置。
注意点
1)配置4台机器的ssh免密登录
必须有登录权限,执行mysql权限,读取mysql上binlog和relaylog权限,写remote_workdir权限
useradd mysql (mkdir -p /home/mysql/)
chown -R mysql:mysql /home/mysql
chmod 600 id_rsa
chmod 600 authorized_keys
vim /etc/passwd
....../home/mysql:/bin/bash
usermod -a -G admin mysql
su mysql 再使用ssh登录。
ssh mysql@10.103.131.39 -p 52722 -i /home/mysql/.ssh/id_rsa
ssh mysql@10.103.131.40 -p 52722 -i /home/mysql/.ssh/id_rsa
ssh mysql@10.103.131.38 -p 52722 -i /home/mysql/.ssh/id_rsa
ssh mysql@10.103.131.43 -p 52722 -i /home/mysql/.ssh/id_rsa
2)l-qa-duobeitest-salt.cls.beta.ali.dm manager节点 10.103.131.43
l-qa-duobeitest-db1.cls.beta.ali.dm master 10.103.131.39
l-qa-duobeitest-db2.cls.beta.ali.dm slave 10.103.131.40
l-qa-duobeitest-file1.cls.beta.ali.dm slave 10.103.131.38
4,MHA搭建
1)每个节点安装perl环境
源码https://github.com/yoshinorim/mha4mysql-manager/wiki/Installation#downloading-mha-node-and-mha-manager
mkdir -p /usr/local/mha
chown -R premha:premha /usr/local/mha
2)manager管理端安装perl环境,安装manager端
yum install perl-Module-Install -y
yum install perl cpan perl-DBD-MySQL perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Net-Telnet -y
yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y
cd mha4mysql-manager-master/
perl Makefile.PL
make & make install
cp mha4mysql-manager-master/samples/scripts/ ./
3)node 节点安装perl环境,安装node端
yum install perl-Module-Install -y
yum install perl-DBD-MySQL -y
yum install perl cpan perl-DBD-MySQL perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Net-Telnet -y
yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y
cd mha4mysql-node-master/
perl Makefile.PL
make & make install
4)mha配置文件与脚本
mkdir -p /usr/local/mha/scripts
cd /usr/local/mha
cp mha4mysql-manager-master/samples/conf/app1.cnf ./
5)node上
mkdir -p /usr/local/mha/mha4mysql-node/logs node的工作目录
chown -R premha:premha /usr/local/mha
6)使用cpan安装模块
perl -MCPAN -e shell
http://mirrors.sohu.com/CPAN/
install ExtUtils::Constant
install Socket
7)vim /usr/local/mha/app.cnf
或者cp mha4mysql-manager-master/samples/conf/app1.cnf ./app.cnf
[server default]
#MHA的工作目录,注释和参数不能写在一行
manager_workdir=/usr/local/mha
#MHA的日志路径
manager_log=/usr/local/mha/logs/manager.log
#免秘钥登陆的用户名
#必须有登录权限,执行mysql权限,读取mysql上binlog和relaylog权限,写remote_workdir权限
ssh_user=mysql
ssh_port=52722
ssh_options="-i /home/mysql/.ssh/id_rsa"
`使用tomcat用户运行salt,需要tomcat用户有root权限`
#mha管理mysql的用户
# mha管理mysql的密码
# 需要有root权限,stop slave; change masterto,reset slave
user=root
password=root
#主从复制账号,用来在主从之间同步数据
repl_user=root
repl_password=root
# ping间隔时间,用来检查master是否正常
# manager在和mysql创建连接的时候,收到多次连接错误或者认证错误,则认为master down。
ping_interval=3
ping_type=CONNECT
#二次检测,使用不同路由策略来检测master是否存活
##正常状态下,master: predb1, slave: predb2、prefile
secondary_check_script=masterha_secondary_check -s 10.103.131.40 -s 10.103.131.38 --port=52722 --user=mysql --master_host=l-qa-duobeitest-db1.cls.beta.ali.dm --master_ip=10.103.131.39 --master_port=3306
##发生故障切换后,master: predb2, slave: predb1、prefile
#secondary_check_script=masterha_secondary_check -s 10.103.131.39 -s 10.103.131.38 --port=52722 --user=mysql --master_host=l-qa-duobeitest-db2.cls.beta.ali.dm --master_ip=10.103.131.40 --master_port=3306
##指定master在线切脚本位置
master_ip_online_change_script=/usr/local/mha/scripts/master_ip_online_change
##发送通知
report_script=/usr/local/mha/scripts/send_report
[server1]
hostname=10.103.131.39
port=3306
master_binlog_dir=/var/lib/mysql
#mha在选择new master时,会忽略延迟
check_repl_delay=0
##master宕机后,优先启用这台作为master
candidate_master=1
##mha node节点的工作目录
remote_workdir=/usr/local/mha/mha4mysql-node/logs
[server2]
hostname=10.103.131.40
port=3306
master_binlog_dir=/var/lib/mysql
check_repl_delay=0
candidate_master=1
remote_workdir=/usr/local/mha/mha4mysql-node/logs
[server3]
hostname=10.103.131.38
port=3306
master_binlog_dir=/var/lib/mysql
check_repl_delay=0
candidate_master=0
##设置na_master=1,使服务器不能成为master
no_master=1
remote_workdir=/usr/local/mha/mha4mysql-node/logs
4,基本使用
1)ssh连通性检测
su mysql
masterha_check_ssh --conf=/usr/local/mha/app.cnf
2)mha二次检测
masterha_secondary_check -s 10.103.131.40 -s 10.103.131.38 --port=52722 --user=mysql --master_host=l-qa-duobeitest-db1.cls.beta.ali.dm --master_ip=10.103.131.39 --master_port=3306
echo $?
每个node上需要配置好host和ip的静态路由
10.103.131.38 l-qa-duobeitest-file1.cls.beta.ali.dm
10.103.131.39 l-qa-duobeitest-db1.cls.beta.ali.dm
10.103.131.40 l-qa-duobeitest-db2.cls.beta.ali.dm
10.103.131.43 l-qa-duobeitest-salt.cls.beta.ali.dm
特殊host主机名 .或者( 等,需要修改脚本
vim /usr/local/share/perl5/MHA/DBHelper.pm
#my $dsn = "DBI:mysql:;host=[$opt{host}];port=$opt{port}";
my $dsn = "DBI:mysql:;host=$opt{host};port=$opt{port}";
#$self->{dsn} = "DBI:mysql:;host=[$host];port=$port;mysql_connect_timeout=4";
$self->{dsn} = "DBI:mysql:;host=$host;port=$port;mysql_connect_timeout=4";
vim /usr/local/share/perl5/MHA/SlaveUtil.pm
#my $dsn = "DBI:mysql:;host=[$ip];port=$port";
my $dsn = "DBI:mysql:;host=$ip;port=$port";
vim /usr/local/share/perl5/MHA/HealthCheck.pm
#"DBI:mysql:;host=[$self->{ip}];"
"DBI:mysql:;host=$self->{ip};"
vim /usr/local/bin/purge_relay_logs (node节点上
)
# my $dsn = "DBI:mysql:;host=[$opt{host}];port=$opt{port}";
my $dsn = "DBI:mysql:;host=$opt{host};port=$opt{port}";
二次检测,当3条链路都不通,才会failover
Manager - master
Manager - slave1 - master
Manager - slave2 - master
3)查看主从复制状态
masterha_check_repl --conf=/usr/local/mha/app.cnf
4)查看manager状态
masterha_check_status --conf=/usr/local/mha/app.cnf
5,故障(masterha_manager)自动切换
masterha_master_switch --conf=/usr/local/mha/app.cnf --master_state=dead --dead_master_host=10.103.131.39 --new_master_host=10.103.131.40 --interactive=0
1)每次执行之前,最好
masterha_check_ssh --conf=/usr/local/mha/app.cnf
masterha_check_repl --conf=/usr/local/mha/app.cnf
检测成功之后,启动mha服务。
masterha_manager --conf=/usr/local/mha/app.cnf
nohup masterha_manager --conf=/usr/local/mha/app.cnf > /usr/local/mha/logs/manager.log 2>&1 &
nohup是永久执行,&是指在后台运行(session断开就退出),nohup COMMAND & == 使命令永久的在后台执行
2)master执行/etc/init.d/mysqld stop操作(故障自动failover)
Phase 1: Configuration Check Phase..
Phase 2: Dead Master Shutdown Phase..
Phase 3: Master Recovery Phase..
Getting Latest Slaves Phase..
Saving Dead Master's Binlog Phase..
Determining New Master Phase.
New Master Diff Log Generation Phase..
Master Log Apply Phase..
Phase 4: Slaves Recovery Phase..
Resetting slave 10.103.131.38(10.103.131.38:3306) and starting replication from the new master 10.103.131.40(10.103.131.40:3306)..
3)恢复master
/etc/init.d/mysqld start
grep MASTER_HOST /usr/local/mha/logs/manager.log
Master is not reachable from all other monitoring servers. Failover should start.
6,mha手工在线切换
masterha_master_switch --conf=/usr/local/mha/app.cnf --master_state=alive --new_master_host=10.103.131.40 --orig_master_is_new_slave --interactive=0
当mha manager运行时,不能进行alive手工切换。
7,其他
1)mysql增加replication_one用户
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE ON *.* TO 'replication_one'@'localhost' identified by 'replication_one';
127.0.0.1
localhost
10.103.131.%
2)mysql read_only和super_read_only
MySQL的slave设置为只读,read_only, 但对super用户无效.(使用super用户在开启read_only的slave下还是可以正常的读写)
给用户赋予super权限
grant super on *.* to 'super'@'localhost';
grant all privileges on *.* to 'super'@'localhost';
可以设置slave
set global super_read_only = ON;
show variables like "%read_only%";
动态设置,重启会失效,my.cnf中可以持久化)v