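Case description:
This test case covers online scale-out of a KingbaseES V8R3 read/write-splitting cluster. It consists of three main steps:
1. Create a new standby with sys_basebackup.
2. Add the standby to Cluster nodes management so that it can be started and stopped in one step with kingbase_monitor.sh.
3. Test primary/standby replication switchover.
Operating system and database versions:
1) Operating system environment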
[kingbase@#localhost ~]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
2) Database version
TEST=# select version();
version
-------------------------------------------------------------------------------
Kingbase V008R003C002B0100 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
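Preparation:
1. Existing nodes: primary (192.168.4.127) and standby (192.168.4.129); new node: 192.168.4.159.
2. Install the same version of KingbaseES on the new node.
3. Configure SSH trust from the new node to the existing nodes (for the root and kingbase users).
4. Disable the firewall and SELinux on the new node, and start the crond service.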
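The preparation steps on the new node can be sketched as a small shell helper. This is only a sketch under stated assumptions: it prints the commands for review rather than running them, it assumes a CentOS 7 host where firewalld is the active firewall and services are managed with systemctl, and it assumes each user already has an SSH key pair. The function name prep_node_cmds is hypothetical.

```shell
# Sketch only: prints the preparation commands instead of executing them.
# Assumptions (not from the original text): firewalld is the firewall in use,
# and an SSH key pair already exists for each user (e.g. made with ssh-keygen).
prep_node_cmds() {
  # "$@" = IPs of the existing nodes the new node must be able to reach
  echo "systemctl stop firewalld"
  echo "systemctl disable firewalld"
  # setenforce only lasts until reboot; SELINUX=disabled in
  # /etc/selinux/config is needed to make the change permanent.
  echo "setenforce 0"
  echo "systemctl start crond"
  for ip in "$@"; do
    for user in root kingbase; do
      # Run each ssh-copy-id as the matching local user.
      echo "ssh-copy-id ${user}@${ip}"
    done
  done
}
```

Calling `prep_node_cmds 192.168.4.127 192.168.4.129` on the new node lists the commands to run, so they can be checked before anything is changed.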
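I. Adding a database replication node online
1.1 Check the kingbase cluster node processes
First confirm that the Cluster and KingbaseES services on the existing primary and standby are running normally.
Primary:
[kingbase@srv127 bin]$ ps -ef |grep kingbase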
kingbase 15845 1 0 15:11 ? 00:00:00 /home/kingbase/cluster/ESHA/db/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 15877 15845 0 15:11 ? 00:00:00 kingbase: logger process
kingbase 15879 15845 0 15:11 ? 00:00:00 kingbase: checkpointer process
kingbase 15880 15845 0 15:11 ? 00:00:00 kingbase: writer process
kingbase 15881 15845 0 15:11 ? 00:00:00 kingbase: wal writer process
kingbase 15882 15845 0 15:11 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 15883 15845 0 15:11 ? 00:00:00 kingbase: archiver process
kingbase 15884 15845 0 15:11 ? 00:00:00 kingbase: stats collector process
kingbase 15885 15845 0 15:11 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 16851 15845 0 15:11 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.4.127(52967) streaming 0/180000D0
root 17781 1 0 15:11 ? 00:00:00 ./kingbasecluster -n
root 17830 17781 0 15:11 ? 00:00:00 kingbasecluster: watchdog
root 17895 17781 0 15:11 ? 00:00:00 kingbasecluster: lifecheck
root 17897 17895 0 15:11 ? 00:00:00 kingbasecluster: heartbeat receiver
root 17898 17895 0 15:11 ? 00:00:00 kingbasecluster: heartbeat sender
root 17899 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17900 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17901 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17912 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17913 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17914 17781 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 17924 17781 0 15:11 ? 00:00:00 kingbasecluster: PCP: wait for connection request
root 17925 17781 0 15:11 ? 00:00:00 kingbasecluster: worker process
kingbase 20397 15845 0 15:12 ? 00:00:00 kingbase: SUPERMANAGER_V8ADMIN TEST 192.168.4.127(53139) idle
Standby:
[kingbase@srv129 bin]$ ps -ef |grep kingbase
kingbase 30708 1 0 15:11 ? 00:00:00 /home/kingbase/cluster/ESHA/db/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 30709 30708 0 15:11 ? 00:00:00 kingbase: logger process
kingbase 30710 30708 0 15:11 ? 00:00:00 kingbase: startup process recovering 000000030000000000000018
kingbase 30714 30708 0 15:11 ? 00:00:00 kingbase: checkpointer process
kingbase 30715 30708 0 15:11 ? 00:00:00 kingbase: writer process
kingbase 30716 30708 0 15:11 ? 00:00:00 kingbase: stats collector process
kingbase 30995 1 0 Oct18 ? 00:00:17 /home/kingbase/cluster/KHA6/KHA/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/KHA6/KHA/kingbase/bin/../etc/repmgr.conf
kingbase 31173 30708 0 15:11 ? 00:00:00 kingbase: wal receiver process streaming 0/180000D0
root 31222 1 0 15:11 ? 00:00:00 ./kingbasecluster -n
root 31264 31222 0 15:11 ? 00:00:00 kingbasecluster: watchdog
root 31396 31222 0 15:11 ? 00:00:00 kingbasecluster: lifecheck
root 31398 31396 0 15:11 ? 00:00:00 kingbasecluster: heartbeat receiver
root 31399 31396 0 15:11 ? 00:00:00 kingbasecluster: heartbeat sender
root 31400 31222 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
root 31402 31222 0 15:11 ? 00:00:00 kingbasecluster: wait for connection request
……
root 31421 31222 0 15:11 ? 00:00:00 kingbasecluster: PCP: wait for connection request
root 31422 31222 0 15:11 ? 00:00:00 kingbasecluster: worker process
kingbase 32062 30708 0 15:12 ? 00:00:00 kingbase: SUPERMANAGER_V8ADMIN TEST 192.168.4.127(19332) idle
1.2 Check primary/standby replication information
Make sure primary/standby replication in the cluster is healthy.
[kingbase@srv127 ~]$ ksql -U SYSTEM -W 123456 TEST
WARNING: License file will expire in 1 days.
ksql (V008R003C002B0160)
Type "help" for help.
TEST=# select * from sys_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-------------
 16851 | 10 | SYSTEM | srv129 | 192.168.4.129 | | 52967 | 2020-10-23 15:11:17.836557+08 | | streaming | 0/180000D0 | 0/180000D0 | 0/180000D0 | 0/180000D0 | 0 | async
(1 row)
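1.3 Add a new standby node online
To keep the test environment close to production, a large transaction is run on the primary while the standby is being created. Once the standby is ready, check whether its data is consistent with the primary.
Run the transaction on the primary (simulating a production database):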
prod=# insert into t10 values (generate_series(1,10000000),'tom');
prod=# select count(*) from t10;
count
----------
11000000
(1 row)
1.3.1 Create the standby online on the new node with sys_basebackup
1) Create the standby's data directory
Note: keep the directory location the same as on the existing nodes where possible, so that fewer configuration changes are needed later.
[kingbase@srv159 v8r3]$ mkdir -p /home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 v8r3]$ chmod 0700 /home/kingbase/cluster/ESHA/db/data
2) Create the standby with sys_basebackup
[kingbase@srv159 v8r3]$ sys_basebackup -h 192.168.4.127 -D /home/kingbase/cluster/ESHA/db/data -F p -x -v -P -U SYSTEM -W 123456 -p 54321
transaction log start point: 0/B20060D0 on timeline 3
2079583/2079583 kB (100%), 1/1 tablespace
transaction log end point: 0/D5C48538
sys_basebackup: base backup completed
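The two steps above can be bundled into a helper that assembles the commands for a given node. The sketch below only prints them, so they can be reviewed before being run on the new standby. The function name basebackup_cmds is hypothetical, and the sys_basebackup options are copied from the transcript above rather than from reference documentation.

```shell
# Sketch only: prints the commands used above for creating the standby.
# All sys_basebackup options are taken verbatim from the walkthrough.
basebackup_cmds() {
  primary=$1 datadir=$2 user=$3 password=$4 port=$5
  echo "mkdir -p $datadir"
  echo "chmod 0700 $datadir"
  echo "sys_basebackup -h $primary -D $datadir -F p -x -v -P -U $user -W $password -p $port"
}
```

For example, `basebackup_cmds 192.168.4.127 /home/kingbase/cluster/ESHA/db/data SYSTEM 123456 54321` reproduces the three commands of this section.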
1.3.2 Configure the standby's recovery.conf file
[kingbase@srv159 data]$ pwd
/home/kingbase/cluster/ESHA/db/data
[kingbase@srv159 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.4.127 user=SYSTEM password=MTIzNDU2 application_name=srv159'
recovery_target_timeline='latest'
primary_slot_name ='slot_srv159'
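A recovery.conf of this shape can be generated from a few parameters. The sketch below is based on two observations from the file above that should be treated as assumptions: the password value MTIzNDU2 is the base64 encoding of 123456, and the slot name follows the pattern slot_<node name>. Whether base64 encoding is required by KingbaseES is not stated in this document. The function name make_recovery_conf is hypothetical.

```shell
# Sketch only: writes a recovery.conf like the one above to stdout.
# Assumptions: the password is stored base64-encoded, and the replication
# slot is named slot_<node name>, as in the example file.
make_recovery_conf() {
  primary=$1 port=$2 user=$3 password=$4 node=$5
  encoded=$(printf '%s' "$password" | base64)
  cat <<EOF
standby_mode='on'
primary_conninfo='port=$port host=$primary user=$user password=$encoded application_name=$node'
recovery_target_timeline='latest'
primary_slot_name ='slot_$node'
EOF
}
```

Redirecting the output of `make_recovery_conf 192.168.4.127 54321 SYSTEM 123456 srv159` into the data directory's recovery.conf would give the same four settings as shown above.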
1.3.3 Configure the standby's kingbase.conf file (excerpt)
# Add settings for extensions here
max_wal_senders=4
wal_keep_segments=256
hot_standby_feedback=on
max_prepared_transactions=100
port=54321
control_file_copy='/home/kingbase/cluster/ESHA/template.bk'
fsync=on
wal_log_hints=on
archive_mode=on
hot_standby=on
wal_level=replica
###synchronous_standby_names='1(srv129)'
archive_dest='/home/kingbase/cluster/ESHA/archivedir/'
max_replication_slots=4
log_directory='/home/kingbase/cluster/ESHA/db/data/sys_log/'
1.3.4 Configure the sys_hba.conf file on all nodes
# "local" is for Unix domain socket connections only
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5
# IPv6 local connections:
host all all ::1/128 md5
host all all ::0/0 md5
# Allow replication connections from localhost, by a user with the
# replication private
host all SYSTEM 192.168.4.127/16 md5
host all SYSTEM 192.168.4.129/16 md5
host all SYSTEM 192.168.4.159/16 md5
host replication SYSTEM 192.168.4.127/16 md5
host replication SYSTEM 192.168.4.129/16 md5
host replication SYSTEM 192.168.4.159/16 md5
Note: after modifying sys_hba.conf on an existing node, reload the database process for the change to take effect.
[kingbase@srv129 data]$ sys_ctl reload -D /home/kingbase/cluster/ESHA/db/data
server signaled
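Because the same per-node entries must be present on every node, it can help to generate them from the node list. The following is a minimal sketch; the function name hba_lines is hypothetical. It keeps the /16 mask used in the file above, which in CIDR terms covers a whole 192.168.0.0/16 range; a /32 mask would restrict each entry to a single address.

```shell
# Sketch only: prints the per-node sys_hba.conf entries shown above.
# The /16 mask is copied from the walkthrough; it is wider than a single host.
hba_lines() {
  user=$1
  shift
  for db in all replication; do
    for ip in "$@"; do
      printf 'host %s %s %s/16 md5\n' "$db" "$user" "$ip"
    done
  done
}
```

`hba_lines SYSTEM 192.168.4.127 192.168.4.129 192.168.4.159` prints the six host entries; after appending them on an existing node, the sys_ctl reload shown above is still needed.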
1.3.5 Copy the following directories and files from the primary to the new standby
Use scp to copy the following directories and files to the same locations on the new node:
Explanation: "/home/kingbase/cluster/ESHA" is the cluster deployment directory, where "ESHA" is the cluster name. For archivedir, creating an empty directory also works.
[kingbase@#localhost ESHA]$ pwd
/home/kingbase/cluster/ESHA
[kingbase@#localhost ESHA]$ ls -lh
total 8.0K
drwxrwxr-x 2 kingbase kingbase 6 Oct 30 14:23 archivedir
drwxrwxr-x 7 kingbase kingbase 84 Oct 30 17:00 db
drwxrwxr-x 3 kingbase kingbase 181 Oct 30 17:23 log
drwxrwxr-x 3 kingbase kingbase 29 Oct 30 14:55 run
-rw------- 1 kingbase kingbase 8.0K Oct 31 13:19 template.bk
[kingbase@srv127 db]$ pwd
/home/kingbase/cluster/ESHA/db
[kingbase@srv127 db]$ ls -lh
total 20K
drwxrwxr-x 2 kingbase kingbase 4.0K Oct 29 17:29 bin
drwxrwxr-x 2 kingbase kingbase 295 Oct 29 17:26 etc
-rw------- 1 kingbase kingbase 151 Aug 18 20:13 kingbase.log
drwxrwxr-x 2 kingbase kingbase 4.0K Jul 27 10:41 lib
drwxrwxr-x 9 kingbase kingbase 4.0K Jul 27 10:41 share
Note: do not copy the kingbasecluster directory to the new node. In a V8R3 cluster, the cluster layer supports only two nodes (primary and standby), so the new node can only be a data node.
Copy the primary's /etc/cron.d/KINGBASECRON file to the same directory on the standby.
1.3.6 Add a replication_slot on the primary
1) List the existing replication slots:
TEST=# select * from sys_replication_slots;
 slot_name   | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_srv127 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv129 |        | physical  |        |          | f      |            |      |              |             |
(2 rows)
2) Create the replication_slot for the new standby
TEST=# select * from sys_create_physical_replication_slot('slot_srv159');
 slot_name   | xlog_position
-------------+---------------
 slot_srv159 |
(1 row)
3) List the replication slots again
TEST=# select * from sys_replication_slots;
 slot_name   | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 slot_srv127 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv129 |        | physical  |        |          | f      |            |      |              |             |
 slot_srv159 |        | physical  |        |          | f      |            |      |              |             |
(3 rows)
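The slot step can be scripted as two small helpers: one that builds the SQL statement for a node, and one that checks whether the slot appears in the sys_replication_slots output. This is a sketch that assumes the slot_<node name> naming used above and the default column-aligned ksql output; both function names are hypothetical.

```shell
# Sketch only: helpers for the replication slot step.
# Assumption: slots are named slot_<node name>, as in the walkthrough.
create_slot_sql() {
  printf "select * from sys_create_physical_replication_slot('slot_%s');\n" "$1"
}

# Reads "select * from sys_replication_slots;" output on stdin and
# succeeds if the slot for node $1 is listed in the first column.
slot_listed() {
  awk -F'|' -v want="slot_$1" '
    { name = $1; gsub(/[ \t]/, "", name); if (name == want) found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

The statement printed by `create_slot_sql srv159` can be pasted into a ksql session on the primary, and the query output can then be piped through `slot_listed srv159` as a quick check.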
1.3.7 Start the standby instance and join it to the replication topology
[kingbase@srv159 data]$ sys_ctl start -D /home/kingbase/cluster/ESHA/db/data
server starting
[kingbase@srv159 data]$ LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "/home/kingbase/cluster/ESHA/db/data/sys_log".
Check the standby processes:
[kingbase@srv159 data]$ ps -ef |grep kingbase
kingbase 5071 1 0 20:01 pts/3 00:00:00 /opt/Kingbase/ES/V8R3/Server/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 5072 5071 0 20:01 ? 00:00:00 kingbase: logger process
kingbase 5073 5071 90 20:01 ? 00:00:19 kingbase: startup process waiting for 0000000300000000000000D6
kingbase 5077 5071 0 20:01 ? 00:00:00 kingbase: checkpointer process
kingbase 5078 5071 0 20:01 ? 00:00:00 kingbase: writer process
kingbase 5199 5071 0 20:01 ? 00:00:00 kingbase: stats collector process
kingbase 5200 5071 0 20:01 ? 00:00:00 kingbase: wal receiver process
kingbase 5201 5071 0 20:01 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.4.159(62391) idle
1.3.8 Check primary/standby replication information
1) Check the replication status
TEST=# select * from sys_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-------------
 16851 | 10 | SYSTEM | srv129 | 192.168.4.129 | | 52967 | 2020-10-23 15:11:17.836557+08 | | streaming | 0/ED000000 | 0/ED000000 | 0/ED000000 | 0/ECFFF418 | 0 | async
 12202 | 10 | SYSTEM | srv159 | 192.168.4.159 | | 47993 | 2020-10-23 20:10:32.521960+08 | 2059 | streaming | 0/ED000000 | 0/ED000000 | 0/ED000000 | 0/ECFFF418 | 0 | async
(2 rows)
2) Check the database data
Primary:
prod=# select count(*) from t10;
 count
----------
 21000000
(1 row)
Standby:
prod=# select count(*) from t10;
 count
----------
 21000000
(1 row)
Remarks:
1) The replication status now shows the new node (srv159).
2) The primary and the standby return the same data, which shows that the standby stayed consistent with the primary while the base backup was being taken.
3) From the above, the standby has been created successfully.
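The check in remark 1) can be automated by scanning the sys_stat_replication output for the new node. The sketch below assumes the default column-aligned ksql output and the column order shown above, where application_name is the 4th column and state is the 10th. The function name standby_is_streaming is hypothetical.

```shell
# Sketch only: reads "select * from sys_stat_replication;" output on stdin
# and succeeds if the standby named $1 is in state "streaming".
# Assumption: column order as shown above (application_name = 4, state = 10).
standby_is_streaming() {
  awk -F'|' -v app="$1" '
    {
      name = $4; state = $10
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", state)
      if (name == app && state == "streaming") found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

Piping the query output through `standby_is_streaming srv159` returns success only when a row for srv159 is present and streaming, which makes it usable in a wait loop after starting the standby.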
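II. Configuring the new node for one-step cluster start/stop
2.1 Edit HAmodule.conf on all nodes
Note: add the new standby node's IP to the KB_ALL_IP parameter.
2.2 Modify HAmodule.conf on the new node
Note: set the KB_LOCALHOST_IP parameter to the local machine's IP.
2.3 Configure the kingbase_cluster.conf file on the existing nodes (the Cluster primary and standby)
Note: the parts highlighted in yellow in the figure are the new node's settings. Edit them carefully; otherwise show pool_nodes will not find the new node.
2.4 Test one-step cluster start/stop with kingbase_monitor.sh: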
[kingbase@srv129 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2020-10-29 17:32:33 KingbaseES automation beging...
2020-10-29 17:32:33 stop kingbasecluster [192.168.4.127] ...
DEL VIP NOW AT 2020-10-29 17:32:32 ON ens192
No VIP on my dev, nothing to do.
2020-10-29 17:32:33 Done...
2020-10-29 17:32:33 stop kingbasecluster [192.168.4.129] ...
DEL VIP NOW AT 2020-10-29 17:32:34 ON ens192
No VIP on my dev, nothing to do.
2020-10-29 17:32:34 Done...
2020-10-29 17:32:34 stop kingbase [192.168.4.127] ...
set /home/kingbase/cluster/ESHA/db/data down now...
2020-10-29 17:32:37 Done...
2020-10-29 17:32:38 Del kingbase VIP [192.168.4.99/24] ...
DEL VIP NOW AT 2020-10-29 17:32:37 ON ens192
No VIP on my dev, nothing to do.
2020-10-29 17:32:38 Done...
2020-10-29 17:32:38 stop kingbase [192.168.4.129] ...
set /home/kingbase/cluster/ESHA/db/data down now...
2020-10-29 17:32:40 Done...
2020-10-29 17:32:41 Del kingbase VIP [192.168.4.99/24] ...
DEL VIP NOW AT 2020-10-29 17:32:42 ON ens192
execute: [/sbin/ip addr del 192.168.4.99/24 dev ens192]
Oprate del ip cmd end.
2020-10-29 17:32:42 Done...
2020-10-29 17:32:42 stop kingbase [192.168.4.159] ...
2020-10-29 17:32:42 Done...
2020-10-29 17:32:43 Del kingbase VIP [192.168.4.99/24] ...
DEL VIP NOW AT 2020-10-29 17:32:43 ON ens192
No VIP on my dev, nothing to do.
2020-10-29 17:32:43 Done...
......................
all stop..
start crontab kingbase position : [3]
Redirecting to /bin/systemctl restart crond.service
start crontab kingbase position : [3]
Redirecting to /bin/systemctl restart crond.service
ADD VIP NOW AT 2020-10-29 17:32:51 ON ens192
execute: [/sbin/ip addr add 192.168.4.99/24 dev ens192 label ens192:2]
execute: /usr/sbin/arping -U 192.168.4.99 -I ens192 -w 1
ARPING 192.168.4.99 from 192.168.4.99 ens192
Sent 1 probes (1 broadcast(s))
Received 0 response(s)
Redirecting to /bin/systemctl restart crond.service
wait kingbase recovery 5 sec...
start crontab kingbasecluster line number: [4]
Redirecting to /bin/systemctl restart crond.service
start crontab kingbasecluster line number: [4]
Redirecting to /bin/systemctl restart crond.service
......................
all started..
...
now we check again
==================================================================
| ip | program| [status]
[ 192.168.4.127]| [kingbasecluster]| [active]
[ 192.168.4.129]| [kingbasecluster]| [active]
[ 192.168.4.127]| [kingbase]| [active]
[ 192.168.4.129]| [kingbase]| [active]
[ 192.168.4.159]| [kingbase]| [active]
==================================================================
Note: the output above shows that the cluster consists of 2 cluster nodes and 3 data nodes.
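The status table at the end of the kingbase_monitor.sh output lends itself to an automated check. The sketch below reads that output and lists every program whose status is anything other than [active]; it assumes the row layout shown above and treats all other status values as a problem, since this document does not list what else the script can report. The function name inactive_programs is hypothetical.

```shell
# Sketch only: reads kingbase_monitor.sh output on stdin and prints
# "ip program" for every status row that is not [active].
# Assumption: status rows start with "[" and use "|" as the separator.
inactive_programs() {
  awk -F'|' '
    function clean(s) {
      gsub(/\[/, "", s); gsub(/\]/, "", s); gsub(/[ \t]/, "", s)
      return s
    }
    /^\[/ { if (clean($3) != "active") print clean($1), clean($2) }'
}
```

An empty result from `./kingbase_monitor.sh restart | inactive_programs` would correspond to the all-active table above, with both kingbasecluster rows and all three kingbase rows reported as active.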
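2.5 View the new node's information
III. Cluster switchover test
3.1 Test objective
When the original standby's database service is stopped or down and the primary's service then also stops or goes down, the new standby is promoted to primary.
3.2 Test steps
First stop the KingbaseES service on the original standby. Then stop the KingbaseES service on the primary. Finally, check whether the new standby has been promoted to primary.
3.3 Perform the switchover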
1) Stop the KingbaseES service on the original standby
[kingbase@srv129 bin]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
waiting for server to shut down.... done
server stopped
2) Check the replication status
1) Replication status
TEST=# select * from sys_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
------+----------+---------+------------------+---------------+--------------
 1556 | 10 | SYSTEM | srv129 | 192.168.4.129 | | 20766 | 2020-10-31 15:17:03.946716+08 | | streaming | 1/5603A788 | 1/5603A788 | 1/5603A788 | 1/5603A750 | 1 | sync
 3387 | 10 | SYSTEM | srv159 | 192.168.4.159 | | 47333 | 2020-10-31 15:21:05.275911+08 | | streaming | 1/5603A788 | 1/5603A788 | 1/5603A788 | 1/5603A750 | 2 | potential
(2 rows)
Note: in a one-primary, two-standby topology, the new node's sync_state is "potential".
2) After KingbaseES on the original standby is stopped
TEST=# select * from sys_stat_replication;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
------+----------+---------+------------------+---------------+---------------
 3387 | 10 | SYSTEM | srv159 | 192.168.4.159 | | 47333 | 2020-10-31 15:21:05.275911+08 | | streaming | 1/5603A980 | 1/5603A980 | 1/5603A980 | 1/5603A8D8 | 0 | async
(1 row)
Note: in a one-primary, one-standby topology, the new node's sync_state is "async".
3) Check the pool node information
TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.4.127 | 54321 | up     | 0.333333  | primary | 5          | false             | 0
 1       | 192.168.4.129 | 54321 | down   | 0.333333  | standby | 0          | false             | 0
 2       | 192.168.4.159 | 54321 | up     | 0.333333  | standby | 0          | true              | 0
(3 rows)
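During a switchover test it is convenient to pull the down nodes out of the show pool_nodes output automatically. The following sketch assumes the column layout shown above, with hostname as the 2nd column and status as the 4th. The function name down_nodes is hypothetical.

```shell
# Sketch only: reads "show pool_nodes;" output on stdin and prints the
# hostname of every node whose status column is "down".
# Assumption: column order as shown above (hostname = 2, status = 4).
down_nodes() {
  awk -F'|' 'NF >= 4 {
    host = $2; status = $4
    gsub(/[ \t]/, "", host)
    gsub(/[ \t]/, "", status)
    if (status == "down") print host
  }'
}
```

For the table above, piping the output through `down_nodes` would report only the stopped original standby.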
3) Stop the KingbaseES database service on the primary
[kingbase@srv127 etc]$ sys_ctl stop -D /home/kingbase/cluster/ESHA/db/data
waiting for server to shut down.... done
server stopped
4) After the switchover, check the standby
1) Check the KingbaseES processes on the new standby
[kingbase@#localhost log]$ ps -ef |grep kingbase
kingbase 11680 1 0 15:21 pts/1 00:00:01 /home/kingbase/cluster/ESHA/db/bin/kingbase -D /home/kingbase/cluster/ESHA/db/data
kingbase 11681 11680 0 15:21 ? 00:00:00 kingbase: logger process
kingbase 11686 11680 0 15:21 ? 00:00:00 kingbase: checkpointer process
kingbase 11687 11680 0 15:21 ? 00:00:00 kingbase: writer process
kingbase 11689 11680 0 15:21 ? 00:00:00 kingbase: stats collector process
kingbase 14821 31212 0 Oct30 pts/0 00:00:00 ../bin/kingbasecluster -n -d
kingbase 14840 14821 0 Oct30 pts/0 00:00:35 kingbasecluster: watchdog
root 14850 14675 0 09:48 pts/1 00:00:00 su - kingbase
kingbase 14851 14850 0 09:48 pts/1 00:00:00 -bash
kingbase 15518 11680 0 15:33 ? 00:00:00 kingbase: wal writer process
kingbase 15519 11680 0 15:33 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 15520 11680 0 15:33 ? 00:00:00 kingbase: archiver process last was 00000005.history
kingbase 15521 11680 0 15:33 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 16075 11680 0 15:35 ? 00:00:00 kingbase: SUPERMANAGER_V8ADMIN TEST 192.168.4.127(34260) idle
2) Connect to the instance and check the database state
[kingbase@#localhost data]$ ksql -U system -W 123456 TEST
ksql (V008R003C002B0160)
Type "help" for help.
TEST=# select sys_is_in_recovery();
 sys_is_in_recovery
--------------------
 f
(1 row)
Note: the standby has been promoted to primary.
3) Check the recovery log
[kingbase@#localhost log]$ tail -f recovery.log
---------------------------------------------------------------------
2020-10-31 15:36:02 recover beging...
2020-10-31 15:36:02 check if the network is ok
ping trust ip 192.168.4.1 scuccess ping times :[3], scuccess times:[3]
determine if i am master or standby
I,m node is primary, determine whether there is a standby db can be set to synchronous standby
The log above confirms that the standby has been promoted to primary.
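The promotion check relies on sys_is_in_recovery() returning f on a primary and t on a node that is still a standby. A minimal sketch of turning that query output into a role name follows; it assumes the default ksql output format, where the value appears alone on its own line, and the function name node_role is hypothetical.

```shell
# Sketch only: reads "select sys_is_in_recovery();" output on stdin and
# prints "primary" for f or "standby" for t. Fails if neither is found.
node_role() {
  awk '
    { line = $0; gsub(/[ \t]/, "", line) }
    line == "f" { print "primary"; seen = 1 }
    line == "t" { print "standby"; seen = 1 }
    END { exit (seen ? 0 : 1) }'
}
```

Run against the new node after both the original standby and the primary have been stopped, this should print primary if the switchover succeeded.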
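IV. Summary
A KingbaseES V8R3 read/write-splitting cluster can be extended online by adding new nodes manually, scaling the cluster architecture horizontally.
After modifying kingbasecluster.conf, the kingbasecluster service must be restarted. First stop the service on the cluster standby node (kingbasecluster -m fast stop), then stop it on the cluster primary node. To start, bring up the cluster primary node first (kingbasecluster -n -d > kingbasecluster.log 2>&1 &), and start the cluster standby node once the primary has finished starting.
Test thoroughly before adding a node, and perform the change during off-peak business hours.
This test was carried out only on CentOS 7; for other versions, test offline before use.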
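The restart order for the kingbasecluster service described in the summary can be written down as an explicit plan. The sketch below only prints the ordered steps. It assumes that the same stop and start commands apply on both cluster nodes, which the text implies but does not state for every step, and it leaves out the working directory and user, which are not given here. The function name cluster_restart_plan is hypothetical.

```shell
# Sketch only: prints the kingbasecluster restart order from the summary.
# Assumption: the stop/start commands quoted in the text are used on both
# cluster nodes; nothing is executed here.
cluster_restart_plan() {
  cluster_primary=$1 cluster_standby=$2
  echo "on $cluster_standby: kingbasecluster -m fast stop"
  echo "on $cluster_primary: kingbasecluster -m fast stop"
  echo "on $cluster_primary: kingbasecluster -n -d > kingbasecluster.log 2>&1 &"
  echo "on $cluster_standby: kingbasecluster -n -d > kingbasecluster.log 2>&1 &"
}
```

`cluster_restart_plan 192.168.4.127 192.168.4.129` lists the four steps in order: standby stop, primary stop, primary start, standby start.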