KingbaseES R3 受限dba影响集群切换
阅读原文时间:2023年07月08日阅读:3

一、受限dba功能说明(参考自官方文档)

受限DBA

受限DBA可以对当前DBA的权限进行一定限制。当功能开启后DBA将不能更改以下对象:

Table
Database
Function(by name)
Language
large object
Namespace(by name)
Tablespace(by name)
Foreign data wrapper
Foreign server
Type(by name)
Relation(by oid)
Type(by oid)
Operator(by oid)
Function(by oid)
Namespace(by oid)
Tablespace(by oid)
Operator class(by oid)
search dictionary(by oid)
search configuration(by oid)
database(by oid)
conversion(by oid)
extension(by oid)

二、测试环境

1、系统主机

[kingbase@node102 data]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.101   node101   ---原主库
192.168.1.102   node102   ---原备库

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.1.101 | 54321 | up     | 0.500000  | primary | 0          | true              | 0
 1       | 192.168.1.102 | 54321 | up     | 0.500000  | standby | 0          | false             | 0

2、测试版本

TEST=# select version();
                                                         VERSION
-------------------------------------------------------------------------------------------------------------------------
 Kingbase V008R003C002B0290 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

三、切换前集群状态

1、流复制状态

TEST=# select * from sys_stat_replication;
 PID  | USESYSID | USENAME | APPLICATION_NAME |  CLIENT_ADDR  | CLIENT_HOSTNAME | CLIENT_PORT |         BACKEND_START         | BACKEND_XMIN |   STATE   | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 7758 |       10 | SYSTEM  | node102          | 192.168.1.102 |                 |       33190 | 2022-04-12 10:46:11.448058+08 |     | streaming | 0/5020F48     | 0/5020F48      | 0/5020F48      | 0/5020F48       |             2 | sync
(1 row)

2、集群节点状态

[kingbase@node101 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help.

TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.1.101 | 54321 | up     | 0.500000  | primary | 0          | true              | 0
 1       | 192.168.1.102 | 54321 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

二、配置集群信息及受限dba

1、关闭自动recovery primary

[kingbase@node102 etc]$ cat HAmodule.conf |grep -i auto_primary
#whether to turn on automatic recovery,0->off,1->on.example:AUTO_PRIMARY_RECOVERY="1"
AUTO_PRIMARY_RECOVERY=0

2、主备库开启受限dba

[kingbase@node101 bin]$ cat ../data/kingbase.conf |grep -i dba
restricted_dba = on

[kingbase@node102 bin]$ cat ../data/kingbase.conf |grep -i dba
restricted_dba = on

3、测试受限dba功能(重启集群后)

[kingbase@node101 bin]$ ./ksql -U SYSTEM -W 123456 TEST
ksql (V008R003C002B0290)
Type "help" for help.

TEST=# show restricted_dba;
 restricted_dba
----------------
 on
(1 row)

TEST=# create database test1;
CREATE DATABASE
TEST=# create table t1 (id int);
CREATE TABLE
TEST=# alter table t1 add column name varchar(10);
ALTER TABLE

TEST=# create user tom with password 'beijing';
CREATE ROLE

TEST=# alter user tom with superuser;
ERROR:  Restricted DBA can not alter user

TEST=# alter user tom with password 'beijing01';
ERROR:  Restricted DBA can not alter user

=== 从以上信息可知,system用户现在为restricted dba,其权限受到了限制,但是和官网文档说法有出入。===

四、测试集群主备切换

1、关闭主库数据库服务

[kingbase@node101 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped

2、查看备库recovery.log

=== 从以下日志信息可知,主备切换成功,由于关闭自动recovery primary,原主库节点现在为down状态===

2022-04-12 11:36:01 recover beging...
my pid is 7728,officially began to perform recovery
2022-04-12 11:36:01 check read/write on mount point
2022-04-12 11:36:01 check read/write on mount point (1 / 6).
2022-04-12 11:36:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ...
2022-04-12 11:36:01 stat the directory of the mount point "/home/kingbase/cluster/R3HA/db/data" ... OK
2022-04-12 11:36:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ...
2022-04-12 11:36:01 create/write the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ... OK
2022-04-12 11:36:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ...
2022-04-12 11:36:01 stat the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ... OK
2022-04-12 11:36:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ...
2022-04-12 11:36:01 read the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ... OK
2022-04-12 11:36:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ...
2022-04-12 11:36:01 delete the file "/home/kingbase/cluster/R3HA/db/data/rw_status_file_662084106" ... OK
2022-04-12 11:36:01 success to check read/write on mount point (1 / 6).
2022-04-12 11:36:01 check read/write on mount point ... ok
2022-04-12 11:36:01 check if the network is ok
ping trust ip 192.168.1.1 success ping times :[3], success times:[2]
determine if i am master or standby
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.1.101 | 54321 | down   | 0.500000  | standby | 0          | false             | 0
 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)

3、原备库已经切换为新主库

[kingbase@node102 log]$ ps -ef |grep kingbase

kingbase  4749     1  0 13:56 ?        00:00:01 /home/kingbase/cluster/R3HA/db/bin/kingbase -D /home/kingbase/cluster/R3HA/db/data
kingbase  4759  4749  0 13:56 ?        00:00:00 kingbase: logger process
kingbase  4764  4749  0 13:56 ?        00:00:00 kingbase: checkpointer process
kingbase  4765  4749  0 13:56 ?        00:00:00 kingbase: writer process
kingbase  4766  4749  0 13:56 ?        00:00:00 kingbase: stats collector process
root      5530     1  0 13:57 ?        00:00:00 ./kingbasecluster -n
root      5569  5530  0 13:57 ?        00:00:00 kingbasecluster: watchdog
root      5814  5530  0 13:57 ?        00:00:00 kingbasecluster: lifecheck
root      5816  5814  0 13:57 ?        00:00:00 kingbasecluster: heartbeat receiver
root      5817  5814  0 13:57 ?        00:00:00 kingbasecluster: heartbeat sender
kingbase 19169  4749  0 14:15 ?        00:00:00 kingbase: wal writer process
kingbase 19170  4749  0 14:15 ?        00:00:00 kingbase: autovacuum launcher process
kingbase 19171  4749  0 14:15 ?        00:00:00 kingbase: archiver process   last was 00000003.history
kingbase 19172  4749  0 14:15 ?        00:00:00 kingbase: bgworker: syslogical supervisor

TEST=# select sys_is_in_recovery();
 SYS_IS_IN_RECOVERY
--------------------
 f
(1 row)

五、人工recovery原主库为新备库

1、在原主库生成recovery.conf文件

[kingbase@node101 data]$ cp ../etc/recovery.done recovery.conf
[kingbase@node101 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.1.101 user=SYSTEM password=MTIzNDU2 application_name=node102'
recovery_target_timeline='latest'
primary_slot_name ='slot_node102'

2、查看原主库recovery.log

=== 从以下可以获知,备库生成recovery.conf文件后,集群 自动启动rewind服务做recovery,在recovery过程中,出现无法访问函数的错误。===

 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.1.101 | 54321 | down   | 0.500000  | standby | 0          | false             | 0
 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)

if recover node up, let it down , for rewind
2022-04-12 11:49:34 sys_rewind...
sys_rewind  --target-data=/home/kingbase/cluster/R3HA/db/data --source-server="host=192.168.1.102 port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST"
datadir_source = /home/kingbase/cluster/R3HA/db/data

error running query (select xlog_record_read.read_record('1/0/7000028')) in source server: ERROR:  permission denied for schema XLOG_RECORD_READ
LINE 1: select xlog_record_read.read_record('1/0/7000028')
               ^
status check: restore all the backup temp file, Done!
Failure, exiting
 sed conf change #synchronous_standby_names
2022-04-12 11:49:36 file operate
cp recovery.conf...
 change recovery.conf ip -> primary.ip
2022-04-12 11:49:37 no need change recovery.conf, primary node is 192.168.1.102
delete pid file if exist
del the replication_slots if exis
drop the slot [slot_node101].
drop the slot [slot_node102].
2022-04-12 11:49:37 start up the kingbase...
waiting for server to start....LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log".

done
server started
ksql "port=54321 user=SUPERMANAGER_V8ADMIN dbname=TEST connect_timeout=10"   -c "select 33333;"
 SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
 (slot_node101,)
(1 row)

2022-04-12 11:49:39 create the slot [slot_node101] success.
 SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
 (slot_node102,)
(1 row)

2022-04-12 11:49:39 create the slot [slot_node102] success.
2022-04-12 11:49:39 start up standby successful!
cluster is sync cluster.
SYNC RECOVER MODE ...
2022-04-12 11:49:39 remote primary node change sync
ALTER SYSTEM
ERROR:  permission denied for function SYS_RELOAD_CONF
reload conf file failed,exit

3、查看切换后集群状态

1)主备流复制状态正常

[kingbase@node102 bin]$ ./ksql -U SYSTEM -W 123456 TEST
ksql (V008R003C002B0290)
Type "help" for help.

TEST=# select * from sys_stat_replication;
^[[5~  PID  | USESYSID | USENAME | APPLICATION_NAME |  CLIENT_ADDR  | CLIENT_HOSTNAME | CLIENT_PORT |         BACKEND_START         | BACKEND_XMIN |   STATE   | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 15790 |       10 | SYSTEM  | node101          | 192.168.1.101 |                 |       32411 | 2022-04-12 11:49:36.658658+08 |      | streaming | 0/7000E18     | 0/7000E18      | 0/7000E18      | 0/7000E18       |             0 | async
(1 row)

2)节点状态正常

[kingbase@node102 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help.

TEST=# show pool_nodes;
 node_id |   hostname    | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
 0       | 192.168.1.101 | 54321 | up     | 0.500000  | standby | 0          | false             | 0
 1       | 192.168.1.102 | 54321 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)

4、验证切换过程的错误信息

=在切换过程中执行报错信息,在切换完成后,执行不会再报错。=

TEST=# show restricted_dba ;
 restricted_dba
----------------
 on
(1 row)

TEST=# alter user tom with superuser;
ERROR:  Restricted DBA can not alter user

TEST=# select sys_reload_conf();
 SYS_RELOAD_CONF
-----------------
 t
(1 row)

TEST=# select xlog_record_read.read_record('1/0/7000028');
 READ_RECORD
-------------
 -1284001950
(1 row)

六、总结

对于生产环境为安全需要,需开启“受限dba”功能,对于单实例环境,只是权限会受到相应的限制;但是对于集群环境,涉及的因素比较多,导致主备切换出现一定的故障,虽然最终切换成功,但是存在风险,需要完善受限dba的功能,以便用于集群环境。