MHA - Master High Availability
是由 Perl
实现的一款高可用程序,出现故障时,MHA 以最小的停机时间(通常10-30秒)执行 master 的故障转移以及 slave 的升级。MHA 可防止复制一致性问题,并且易于安装,不需要改变现有部署。
MHA 由MHA manager
和MHA node
组成, MHA manager
是一个监控管理程序,用于监控MySQL master
状态; MHA node
是具有故障转移的工具脚本,如解析 MySQL 二进制/中继日志,传输应用事件到Slave
, MHA node
在每个MySQL服务器上运行。
MHA manager
调用MHA node
工具脚本的方式是SSH
到主机上然后执行命令,所以各节点需要做等效验证。
当Master
宕机后,MHA
会尝试保存宕机Master
的二进制日志,然后自动判断MySQL
集群中哪个实例的中继日志是最新的,并将有最新日志的实例的差异日志传到其他实例补齐,从而实现所有实例数据一致。然后把宕机Master
的二进制日志应用到选定节点,并提升为 Master
。
具体流程如下:
Master
中保存二进制日志Slave
Master
保存的二进制日志事件Slave
为Master
Slave
向该新Master
同步从切换流程流程可以看到,如果宕机Master
主机无法SSH
登录,那么第一步就没办法实现,对于MySQL5.5
以前的版本,数据还是有丢失的风险。对于5.5
后的版本,开启半同步复制后,真正有助于避免数据丢失,半同步复制保证至少一个 (不是所有)slave
在 master
提交时接收到二进制日志事件。因此,对于可以处理一致性问题的MHA
可以实现"几乎没有数据丢失"和"从属一致性"。
Perl
编写MHA
会做日志补齐操作,尽可能减少数据丢失,保证数据一SSH
信任,有一定的安全隐患Slave
的高可用 M(RW)
|
+-------+-------+
S1(R) S2(R) S3(R)
这种复制方式非常常见,当Master
宕机时,MHA
会选一个日志最新的主机升级为Master
, 如果不希望个节点成为Master
,把no_master
设为1就可以。
M(RW)----M2(R, candidate_master=1)
|
+-------+-------+
S1(R) S2(R)
双主结构也是常见的复制模式,如果当前Master
崩溃, MHA
会选择只读Master
成为新的Master
本次演示使用复制方式是主主从,主主从数据库搭建方式参考以前文章
IP
系统
端口
MySQL版本
节点
读写
说明
10.0.0.247
Centos6.5
3306
5.7.9
Master
读写
主节点
10.0.0.248
Centos6.5
3306
5.7.9
Standby
只读,可切换为读写
备主节点
10.0.0.249
Centos6.5
3306
5.7.9
Slave
只读
从节点
10.0.0.24
Centos6.5
-
-
manager
-
MHA Manager
10.0.0.237
-
-
-
-
-
VIP
Master1
[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock
[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2473306
default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0
auto_increment_offset = 1
auto_increment_increment = 2
#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log
#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%
#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
Master2
[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock
[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2483306
default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0
auto_increment_offset = 2
auto_increment_increment = 2
#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log
#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%
#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
Slave
[client]
port = 3306
default-character-set=utf8mb4
socket = /data/mysql_db/test_db/mysql.sock
[mysqld]
datadir = /data/mysql_db/test_db
basedir = /usr/local/mysql57
tmpdir = /tmp
socket = /data/mysql_db/test_db/mysql.sock
pid-file = /data/mysql_db/test_db/mysql.pid
skip-external-locking = 1
skip-name-resolve = 1
port = 3306
server_id = 2493306
default-storage-engine = InnoDB
character-set-server = utf8mb4
default_password_lifetime=0
read_only=1
#### log ####
log_timestamps=system
log_bin = /data/mysql_log/test_db/mysql-bin
log_bin_index = /data/mysql_log/test_db/mysql-bin.index
binlog_format = row
relay_log_recovery=ON
relay_log=/data/mysql_log/test_db/mysql-relay-bin
relay_log_index=/data/mysql_log/test_db/mysql-relay-bin.index
log_error = /data/mysql_log/test_db/mysql-error.log
#### replication ####
log_slave_updates = 1
replicate_wild_ignore_table = information_schema.%,performance_schema.%,sys.%
#### semi sync replication settings #####
plugin_dir=/usr/local/mysql57/lib/plugin
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
loose_rpl_semi_sync_master_enabled = 1
loose_rpl_semi_sync_slave_enabled = 1
进入 MHA 下载页面 Downloads, 下载Manager
和Node
节点安装包,由于我的服务器是centos6
,所以下载了MHA Manager 0.56 rpm RHEL6
和MHA Node 0.56 rpm RHEL6
Node安装
在所有主机(包括Manager)上执行
# 安装依赖
yum install perl perl-devel perl-DBD-MySQL
# 安装 node 工具
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Manager安装
在 Manager 主机上执行
# 安装依赖
yum install -y perl perl-devel perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
# 安装 manager
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
管理用户需要执行一些数据库管理命令包括STOP SLAVE, CHANGE MASTER, RESET SLAVE
create user mha_manager@'%' identified by 'mha_manager';
grant all on *.* to mha_manager@'%';
flush privileges;
配置 VIP
需要有 sudo
权限
打开/etc/sudoers
文件, 增加一条
root ALL=(ALL) ALL
# 这个是增加的
mysql ALL=(ALL) NOPASSWD: ALL
然后把Defaults requiretty
注释掉
# Defaults requiretty
所有主机执行
# 进入 mysql 用户
su - mysql
# 生成密钥对, 执行命令,然后按回车
ssh-keygen -t rsa
# 复制公钥到相应主机
ssh-copy-id mysql@10.0.0.247
ssh-copy-id mysql@10.0.0.248
ssh-copy-id mysql@10.0.0.249
ssh-copy-id mysql@10.0.1.24
新建/etc/masterha
目录,我们把配置文件放到这里
mkdir /etc/masterha
创建配置文件/etc/masterha/app1.cnf
, 写上配置
[server default]
manager_workdir=/etc/masterha # 设置 manager 的工作目录, 可以自己调整
manager_log=/etc/masterha/manager.log # 设置 manager 的日志文件
master_binlog_dir=/data/mysql_log/test_db # 设置 master binlog 的日志的位置
master_ip_failover_script= /etc/masterha/script/master_ip_failover # 设置自动 failover 时的切换脚本, 脚本参考附件
master_ip_online_change_script= /etc/masterha/script/master_ip_online_change # 设置手动切换时执行的切换脚本, 脚本参考附件
user=mha_manager # 设置管理用户, 用来监控、配置 MySQL(STOP SLAVE, CHANGE MASTER, RESET SLAVE), 默认为 root
password=mha_manager # 设置管理用户密码
repl_user=repl # 设置复制环境中的复制用户名
repl_password=repl # 设置复制用户的密码
ping_interval=1 # 发送 ping 包的时间间隔,三次没有回应就自动进行 failover
remote_workdir=/tmp # 设置远端 MySQL 的工作目录
report_script=/etc/masterha/script/send_report # 设置发生切换后执行的脚本
# 检查脚本
secondary_check_script= /usr/bin/masterha_secondary_check-s 10.0.0.247 -s 10.0.0.248
shutdown_script="" #设置故障发生后关闭故障主机脚本(可以用于防止脑裂)
ssh_user=mysql #设置 ssh 的登录用户名
[server1]
hostname=10.0.0.247
port=3306
[server2]
hostname=10.0.0.248
port=3306
candidate_master=1 # 设置为候选 master, 如果发生宕机切换,会把该节点设为新 Master,即使它不是数据最新的节点
check_repl_delay=0 # 默认情况下,一个 Slave 落后 Master 100M 的中继日志,MHA 不会选择它作为新的 Master,因为这对于 Slave 恢复数据要很长时间,check_repl_delay=0 的时候会忽略延迟,可以和 candidate_master=1 配合用
[server3]
hostname=10.0.0.249
port=3306
no_master=1 # 从不将这台主机升级为 Master
ignore_fail=1 # 默认情况下,如果有 Slave 节点挂了, 就不进行切换,设置 ignore_fail=1 可以忽然它
创建配置文件/etc/masterha/app2.cnf
, 以备用Master
为 Master
, 方便切换后启动MHA
[server default]
manager_workdir=/etc/masterha # 设置 manager 的工作目录, 可以自己调整
manager_log=/etc/masterha/manager.log # 设置 manager 的日志文件
master_binlog_dir=/data/mysql_log/test_db # 设置 master binlog 的日志的位置
master_ip_failover_script= /etc/masterha/script/master_ip_failover # 设置自动 failover 时的切换脚本
master_ip_online_change_script= /etc/masterha/script/master_ip_online_change # 设置手动切换时执行的切换脚本
user=mha_manager # 设置管理用户, 用来监控、配置 MySQL(STOP SLAVE, CHANGE MASTER, RESET SLAVE), 默认为 root
password=mha_manager # 设置管理用户密码
repl_user=repl # 设置复制环境中的复制用户名
repl_password=repl # 设置复制用户的密码
ping_interval=1 # 发送 ping 包的时间间隔,三次没有回应就自动进行 failover
remote_workdir=/tmp # 设置远端 MySQL 的工作目录
report_script=/etc/masterha/script/send_report # 设置发生切换后执行的脚本
# 检查脚本
secondary_check_script= /usr/bin/masterha_secondary_check -s 10.0.0.248 -s 10.0.0.247
shutdown_script="" #设置故障发生后关闭故障主机脚本(可以用于防止脑裂)
ssh_user=mysql #设置 ssh 的登录用户名
[server1]
hostname=10.0.0.248
port=3306
[server2]
hostname=10.0.0.247
port=3306
candidate_master=1 # 设置为候选 master, 如果发生宕机切换,会把该节点设为新 Master,即使它不是数据最新的节点
check_repl_delay=0 # 默认情况下,一个 Slave 落后 Master 100M 的中继日志,MHA 不会选择它作为新的 Master,因为这对于 Slave 恢复数据要很长时间,check_repl_delay=0 的时候会忽略延迟,可以和 candidate_master=1 配合用
[server3]
hostname=10.0.0.249
port=3306
no_master=1 # 从不将这台主机升级为 Master
ignore_fail=1 # 默认情况下,如果有 Slave 节点挂了, 就不进行切换,设置 ignore_fail=1 可以忽然它
注意:使用的时候去掉注释
MHA
管理VIP
有两种方案,一种是使用Keepalived
,另一种是自己写命令实现增删VIP
,由于Keepalived
容易受到网络波动造成VIP
切换,而且无法在多实例机器上使用,所以建议写脚本管理VIP
。
当前主机的网卡是eth0
, 可以通过下列命令增删 VIP
up VIP
sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255
down VIP
sudo /sbin/ifconfig eth0:1 down
master_ip_failover
, master_ip_online_change
和send_report
脚本在附录里面
MHA
的检测比较严格,所以我们把除Master
外的节点设为read_only
, 有必要可以写进配置文件里面
# mysql shell
set global read_only=1;
MHA
需要使用中继日志来实现数据一致性,所以所有节点要设置不自动清理中继日志
# mysql shell
set global relay_log_purge=0;
也可以写入配置文件
# my.cnf
relay_log_purge=0
Manager
masterha_check_ssh 检查 MHA 的 SSH 配置状况
masterha_check_repl 检查 MySQL 复制状况
masterha_manger 启动 MHA
masterha_stop 停止 MHA
masterha_check_status 检测当前 MHA 运行状态
masterha_master_monitor 检测 master 是否宕机
masterha_master_switch 手动故障转移
masterha_conf_host 添加或删除配置的 server 信息
Node
save_binary_logs 保存 master 的二进制日志
apply_diff_relay_logs 对比识别中继日志的差异部分
purge_relay_logs 清除中继日志(MHA中继日志需要使用这个命令清除)
命令的使用方法可以通过执行命令 --help 得到
在 manager
节点执行 masterha_check_ssh --conf=/etc/masterha/app1.cnf
检测SSH
状态,下面是执行结果
[mysql@chengqm ~]$ masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Dec 20 19:47:18 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 20 19:47:18 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Dec 20 19:47:18 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Dec 20 19:47:18 2018 - [info] Starting SSH connection tests..
Thu Dec 20 19:47:19 2018 - [debug]
Thu Dec 20 19:47:18 2018 - [debug] Connecting via SSH from mysql@10.0.0.247(10.0.0.247:22) to mysql@10.0.0.248(10.0.0.248:22)..
Thu Dec 20 19:47:19 2018 - [debug] ok.
Thu Dec 20 19:47:19 2018 - [debug] Connecting via SSH from mysql@10.0.0.247(10.0.0.247:22) to mysql@10.0.0.249(10.0.0.249:22)..
Thu Dec 20 19:47:19 2018 - [debug] ok.
Thu Dec 20 19:47:19 2018 - [debug]
Thu Dec 20 19:47:19 2018 - [debug] Connecting via SSH from mysql@10.0.0.248(10.0.0.248:22) to mysql@10.0.0.247(10.0.0.247:22)..
Thu Dec 20 19:47:19 2018 - [debug] ok.
Thu Dec 20 19:47:19 2018 - [debug] Connecting via SSH from mysql@10.0.0.248(10.0.0.248:22) to mysql@10.0.0.249(10.0.0.249:22)..
Thu Dec 20 19:47:19 2018 - [debug] ok.
Thu Dec 20 19:47:20 2018 - [debug]
Thu Dec 20 19:47:19 2018 - [debug] Connecting via SSH from mysql@10.0.0.249(10.0.0.249:22) to mysql@10.0.0.247(10.0.0.247:22)..
Thu Dec 20 19:47:20 2018 - [debug] ok.
Thu Dec 20 19:47:20 2018 - [debug] Connecting via SSH from mysql@10.0.0.249(10.0.0.249:22) to mysql@10.0.0.248(10.0.0.248:22)..
Thu Dec 20 19:47:20 2018 - [debug] ok.
Thu Dec 20 19:47:20 2018 - [info] All SSH connection tests passed successfully.
在 manager
节点执行 masterha_check_repl --conf=/etc/masterha/app1.cnf
检测同步状态,下面是执行结果
[mysql@chengqm ~]$ masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Dec 20 20:05:03 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Dec 20 20:05:03 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Dec 20 20:05:03 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Dec 20 20:05:03 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Dec 20 20:05:03 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is 10.0.0.247(10.0.0.247:3306)
Thu Dec 20 20:05:03 2018 - [info] Master configurations are as below:
Master 10.0.0.247(10.0.0.247:3306), replicating from 10.0.0.248(10.0.0.248:3306)
Master 10.0.0.248(10.0.0.248:3306), replicating from 10.0.0.247(10.0.0.247:3306), read-only
================ 省略 ==================
Thu Dec 20 20:05:08 2018 - [info] /etc/masterha/script/master_ip_failover --command=status --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306
Thu Dec 20 20:05:08 2018 - [info] OK.
Thu Dec 20 20:05:08 2018 - [warning] shutdown_script is not defined.
Thu Dec 20 20:05:08 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
出现 MySQL Replication Health is OK.
表示成功
如果出现Failed to get master_ip_failover_script status with return code 255:0
这个错误,就注释掉master_ip_failover
脚本的FIXME_xxx
注意:要想正常运行,系统路径必须要有 mysqlbinlog 和 mysql 命令
使用脚本管理 VIP
不会自动设置 VIP
,所以先手动在 Master
设置 VIP
[root@cluster01 ~]# /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255
[root@cluster01 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr FA:16:3E:DE:80:33
inet addr:10.0.0.247 Bcast:10.0.255.255 Mask:255.255.0.0
inet6 addr: fe80::f816:3eff:fede:8033/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17333247 errors:0 dropped:0 overruns:0 frame:0
TX packets:5472004 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1476157398 (1.3 GiB) TX bytes:1064253754 (1014.9 MiB)
eth0:1 Link encap:Ethernet HWaddr FA:16:3E:DE:80:33
inet addr:10.0.0.237 Bcast:10.0.0.237 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
...
启动 MHA Manager
[mysql@chengqm ~]$ nohup /usr/bin/masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover &
[1] 21668
--ignore_last_failover
忽略上次切换。MHA
每次故障切换后都会生成一个app1.failover.complete
这样的文件,如果不加这个参数,需要删除这个文件才能再次启动检查启动日志
[mysql@chengqm ~]$ tail -18 /etc/masterha/manager.log
Fri Dec 21 13:56:39 2018 - [info]
10.0.0.247(10.0.0.247:3306) (current master)
+--10.0.0.248(10.0.0.248:3306)
+--10.0.0.249(10.0.0.249:3306)
Fri Dec 21 13:56:39 2018 - [info] Checking master_ip_failover_script status:
Fri Dec 21 13:56:39 2018 - [info] /etc/masterha/script/master_ip_failover --command=status --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306
VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down
Check script.. OK
Fri Dec 21 13:56:39 2018 - [info] OK.
Fri Dec 21 13:56:39 2018 - [warning] shutdown_script is not defined.
Fri Dec 21 13:56:39 2018 - [info] Set master ping interval 1 seconds.
Fri Dec 21 13:56:39 2018 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s 10.0.0.247 -s 10.0.0.248
Fri Dec 21 13:56:39 2018 - [info] Starting ping health check on 10.0.0.247(10.0.0.247:3306)..
Fri Dec 21 13:56:39 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
日志中显示 Ping(SELECT) succeeded, waiting until MySQL doesn't respond
表示启动成功
如果查看Master
的general
日志,会发现MHA
不断执行SELECT 1 As Value
检查命令
我们模拟Master
数据库宕机的情况
[root@cluster01 ~]# ps -ef | grep mysql
mysql 20061 1 0 11:19 pts/0 00:00:00 /bin/sh /usr/local/mysql57/bin/mysqld_safe --defaults-file=/data/mysql_db/test_db/my.cnf --datadir=/data/mysql_db/test_db --pid-file=/data/mysql_db/test_db/mysql.pid
mysql 20494 20061 0 11:19 pts/0 00:00:21 /usr/local/mysql57/bin/mysqld --defaults-file=/data/mysql_db/test_db/my.cnf --basedir=/usr/local/mysql57 --datadir=/data/mysql_db/test_db --plugin-dir=/usr/local/mysql57/lib/plugin --log-error=/data/mysql_log/test_db/mysql-error.log --pid-file=/data/mysql_db/test_db/mysql.pid --socket=/data/mysql_db/test_db/mysql.sock --port=3306
[root@cluster01 ~]# kill -9 20061 20494
查看MHA
日志可以看到整个切换过程
Fri Dec 21 14:04:49 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Fri Dec 21 14:04:49 2018 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 10.0.0.247 -s 10.0.0.248 --user=mysql --master_host=10.0.0.247 --master_ip=10.0.0.247 --master_port=3306 --master_user=mha_manager --master_password=mha_manager --ping_type=SELECT
Fri Dec 21 14:04:49 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql_log/test_db --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Monitoring server 10.0.0.247 is reachable, Master is not reachable from 10.0.0.247. OK.
Fri Dec 21 14:04:49 2018 - [info] HealthCheck: SSH to 10.0.0.247 is reachable.
Monitoring server 10.0.0.248 is reachable, Master is not reachable from 10.0.0.248. OK.
Fri Dec 21 14:04:49 2018 - [info] Master is not reachable from all other monitoring servers. Failover should start.
=============== 省略 ================
Fri Dec 21 14:04:52 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Dec 21 14:04:52 2018 - [info] Executing master IP deactivation script:
Fri Dec 21 14:04:52 2018 - [info] /etc/masterha/script/master_ip_failover --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 --command=stopssh --ssh_user=mysql
VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down
Disabling the VIP on old master: 10.0.0.247
SIOCSIFFLAGS: Cannot assign requested address
Fri Dec 21 14:04:52 2018 - [info] done.
=============== 省略 ================
Fri Dec 21 14:04:53 2018 - [info] Starting master failover..
Fri Dec 21 14:04:53 2018 - [info]
From:
10.0.0.247(10.0.0.247:3306) (current master)
+--10.0.0.248(10.0.0.248:3306)
+--10.0.0.249(10.0.0.249:3306)
To:
10.0.0.248(10.0.0.248:3306) (new master)
+--10.0.0.249(10.0.0.249:3306)
Fri Dec 21 14:04:53 2018 - [info]
=============== 省略 ================
Fri Dec 21 14:04:53 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.248', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=154, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Dec 21 14:04:53 2018 - [info] Executing master IP activate script:
Fri Dec 21 14:04:53 2018 - [info] /etc/masterha/script/master_ip_failover --command=start --ssh_user=mysql --orig_master_host=10.0.0.247 --orig_master_ip=10.0.0.247 --orig_master_port=3306 --new_master_host=10.0.0.248 --new_master_ip=10.0.0.248 --new_master_port=3306 --new_master_user='mha_manager' --new_master_password='mha_manager'
VIP Command: start=sudo /sbin/ifconfig eth0:1 10.0.0.237 netmask 255.255.255.255 stop=sudo /sbin/ifconfig eth0:1 down
Set read_only=0 on the new master.
Enabling the VIP - 10.0.0.237 on the new master - 10.0.0.248
=============== 省略 ================
Fri Dec 21 14:04:55 2018 - [info] 10.0.0.248: Resetting slave info succeeded.
Fri Dec 21 14:04:55 2018 - [info] Master failover to 10.0.0.248(10.0.0.248:3306) completed successfully.
查看新Master
VIP
[mysql@cluster02 ~]$ ifconfig
eth0 Link encap:Ethernet HWaddr FA:16:3E:66:7E:E8
inet addr:10.0.0.248 Bcast:10.0.255.255 Mask:255.255.0.0
inet6 addr: fe80::f816:3eff:fe66:7ee8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:40197173 errors:0 dropped:0 overruns:0 frame:0
TX packets:10470689 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4063358126 (3.7 GiB) TX bytes:2269241789 (2.1 GiB)
eth0:1 Link encap:Ethernet HWaddr FA:16:3E:66:7E:E8
inet addr:10.0.0.237 Bcast:10.0.0.237 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
可以看到VIP
已经成功切换
查看新Master
的general
日志,可以看到MHA
的操作过程, 下面展示部分日志
...
2018-12-21T14:04:41.782336+08:00 5525 Query SHOW SLAVE STATUS
2018-12-21T14:04:41.788318+08:00 5525 Query STOP SLAVE IO_THREAD
2018-12-21T14:04:41.900734+08:00 5525 Query SHOW SLAVE STATUS
2018-12-21T14:04:42.044801+08:00 5525 Query SHOW SLAVE STATUS
2018-12-21T14:04:42.668581+08:00 5525 Query SHOW SLAVE STATUS
2018-12-21T14:04:42.670336+08:00 5525 Query STOP SLAVE SQL_THREAD
...
2018-12-21T14:04:42.863904+08:00 5526 Query SET GLOBAL read_only=0
...
2018-12-21T14:04:43.950986+08:00 5527 Query SET @rpl_semi_sync_slave= 1
...
查看Slave
的general
日志,可以看到Slave
会重新指向
2018-12-21T14:04:04.835218+08:00 90 Query STOP SLAVE IO_THREAD
2018-12-21T14:04:04.955706+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:05.092123+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.018838+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.034225+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.036613+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.038475+08:00 90 Query STOP SLAVE SQL_THREAD
2018-12-21T14:04:06.160142+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.162224+08:00 90 Query STOP SLAVE
2018-12-21T14:04:06.163171+08:00 90 Query SHOW SLAVE STATUS
2018-12-21T14:04:06.164554+08:00 90 Query RESET SLAVE
2018-12-21T14:04:06.825564+08:00 90 Query CHANGE MASTER TO MASTER_HOST = '10.0.0.248' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000005' MASTER_LOG_POS = 154
2018-12-21T14:04:06.981718+08:00 90 Query SET GLOBAL relay_log_purge=0
2018-12-21T14:04:06.982802+08:00 90 Query START SLAVE
注意: MHA在切换完成后会结束 Manager 进程
切换后Master
为Cluster2
, 把Cluster1
重新指向Cluster2
,现在测试一下手动切换,把Master
切回Cluster1
, 命令如下
masterha_master_switch --conf=/etc/masterha/app2.cnf --master_state=alive --new_master_host=10.0.0.247 --new_master_port=3306 --orig_master_is_new_slave
--orig_master_is_new_slave
是将原master切换为新主的slave,默认情况下,是不添加的。下面是执行过程, 有两个地方要回答 yes/no
[mysql@chengqm ~]$ masterha_master_switch --conf=/etc/masterha/app2.cnf --master_state=alive --new_master_host=10.0.0.247 --new_master_port=3306 --orig_master_is_new_slave
......
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.0.0.248(10.0.0.248:3306)? (YES/no): yes
......
Sun Dec 23 16:50:48 2018 - [info]
From:
10.0.0.248(10.0.0.248:3306) (current master)
+--10.0.0.247(10.0.0.247:3306)
+--10.0.0.249(10.0.0.249:3306)
To:
10.0.0.247(10.0.0.247:3306) (new master)
+--10.0.0.249(10.0.0.249:3306)
+--10.0.0.248(10.0.0.248:3306)
Starting master switch from 10.0.0.248(10.0.0.248:3306) to 10.0.0.247(10.0.0.247:3306)? (yes/NO): yes
......
Sun Dec 23 16:51:36 2018 - [info] 10.0.0.247: Resetting slave info succeeded.
Sun Dec 23 16:51:36 2018 - [info] Switching master to 10.0.0.247(10.0.0.247:3306) completed successfully.
切换成功,查看Cluster1
的VIP
[mysql@cluster01 ~]$ ifconfig
eth0 Link encap:Ethernet HWaddr FA:16:3E:DE:80:33
inet addr:10.0.0.247 Bcast:10.0.255.255 Mask:255.255.0.0
inet6 addr: fe80::f816:3eff:fede:8033/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:20585872 errors:0 dropped:0 overruns:0 frame:0
TX packets:5519122 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1785787985 (1.6 GiB) TX bytes:1068115408 (1018.6 MiB)
eth0:1 Link encap:Ethernet HWaddr FA:16:3E:DE:80:33
inet addr:10.0.0.237 Bcast:10.0.0.237 Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
注意:手动切换的时候先把 MHA Manager 停了
停止 MHA
的命令如下,就不演示了
masterha_stop --conf=配置文件
总的来说,MHA
是一套非常优秀而且使用比较广的高可用程序,它可以自动补齐日志使得一致性有保证,部署的时候不需要改变原有架构就可以使用。但是使用起来还是有一点复杂的,因为MHA
不接管VIP
,所以要自己写脚本实现,而且只保证Master
高可用,没有Slave
高可用,还有就是中继日志要自己设定时任务来清理。
不管怎么说,在没有更好的方案下,MHA
还是值得使用的。
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
my $vip = '10.0.0.237';
my $key = '1';
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip netmask 255.255.255.255";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
print "\n\n VIP Command: start=$ssh_start_vip stop=$ssh_stop_vip\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
# updating global catalog, etc
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local();
print "Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
$new_master_handler->disconnect();
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
# If you want to continue failover, exit 10.
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Check script.. OK \n";
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;
my $_tstart;
my $_running_interval = 0.1;
my (
$command, $orig_master_is_new_slave, $orig_master_host,
$orig_master_ip, $orig_master_port, $orig_master_user,
$orig_master_password, $orig_master_ssh_user, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password, $new_master_ssh_user
);
GetOptions(
'command=s' => \$command,
'orig_master_is_new_slave' => \$orig_master_is_new_slave,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
'orig_master_password=s' => \$orig_master_password,
'orig_master_ssh_user=s' => \$orig_master_ssh_user,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
'new_master_ssh_user=s' => \$new_master_ssh_user,
);
my $vip = '10.0.0.237';
my $key = '1';
my $ssh_start_vip = "sudo /sbin/ifconfig eth0:$key $vip netmask 255.255.255.255";
my $ssh_stop_vip = "sudo /sbin/ifconfig eth0:$key down";
exit &main();
sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}
sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );
if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}
if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}
push @threads, $ref;
}
return @threads;
}
sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );
$orig_master_handler->disable_log_bin_local();
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
## Update master ip on the catalog database, etc
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
return 0 unless ($new_master_ssh_user);
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($orig_master_ssh_user);
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}
#!/usr/bin/perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body, $title, $content);
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
# 调用外部脚本
$title="[mha switch]";
$content="`date +'%Y-%m-%d %H:%M'` old_master=".$dead_master_host." new_master=".$new_master_host;
system("sh /etc/masterha/script/send_report.sh $title $content");
exit 0;
下面是我的定时任务,参数自行替换, workdir
需要和中继日志在同一个盘
# 每小时清理一次
0 * * * * (/usr/bin/purge_relay_logs --user=mha_manager --password=mha_manager --disable_relay_log_purge --port=3306 --workdir=/tmp/relaylogtmp >> /var/log/purge_relay_logs.log 2>&1)
MHA Wiki: https://github.com/yoshinorim…
手机扫一扫
移动阅读更方便
你可能感兴趣的文章