Redis 5.0.5 Cluster Deployment and Simulated Server Failure
Original article accessed: 2023-07-08

Background

Business stability requirements call for a clustered Redis deployment, so Redis Cluster is used.

Environment

Name        IP address    CPU   Memory   Master port   Slave port
redis-651   10.65.6.51    4c    8G       7001          7002
redis-652   10.65.6.52    4c    8G       7001          7002
redis-653   10.65.6.53    4c    8G       7001          7002

Installation and configuration (10.65.6.51 as the example)

#Download the source package
wget http://download.redis.io/releases/redis-5.0.5.tar.gz

#Build and install
tar -zxvf redis-5.0.5.tar.gz
cd redis-5.0.5
make && make install

#Disable transparent huge pages on the running system
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

#Make the setting persistent across reboots: add the following two lines to /etc/rc.local
vi /etc/rc.local
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled

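#Set kernel parameters: add the following three lines to /etc/sysctl.conf, then load them with sysctl -p
vi /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 1024
vm.max_map_count = 655360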

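#Raise the open file descriptor limit (applies to the current shell session)
ulimit -n 655350

#Create the redis user
useradd -d /exporter/redis -m redis

#Create the redis_cluster and logs directories in the redis user's home directory
su - redis
mkdir redis_cluster
mkdir logs

#Under redis_cluster, create directories named 7001 and 7002 and copy redis.conf into both; have the Redis configuration files prepared in advance
cd redis_cluster
mkdir 7001 7002

#Configure redis.conf for port 7001 and start the instance
cd 7001

#cat redis.conf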

bind 0.0.0.0
protected-mode yes
port 7001
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /exporter/redis/redis_7001.pid
loglevel notice
logfile "/exporter/redis/logs/redis-7001.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump-7001.rdb
dir ./
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
maxmemory 2147483648
appendonly yes
appendfilename "appendonly-7001.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
cluster-enabled yes
cluster-config-file nodes-7001.conf
cluster-node-timeout 15000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
masterauth m2i3s5
requirepass m2i3s5

# cat start.sh
redis-server ./redis.conf &

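#Start the 7001 Redis instance (start.sh uses a relative config path, so run it from the instance directory)
su - redis
cd /exporter/redis/redis_cluster/7001
bash start.sh

#Configure redis.conf for port 7002 and start the instance
cd /exporter/redis/redis_cluster/7002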

$ cat redis.conf
bind 0.0.0.0
protected-mode yes
port 7002
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /exporter/redis/redis_7002.pid
loglevel notice
logfile "/exporter/redis/logs/redis-7002.log"
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump-7002.rdb
dir ./
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
maxmemory 2147483648
appendonly yes
appendfilename "appendonly-7002.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
cluster-enabled yes
cluster-config-file nodes-7002.conf
cluster-node-timeout 15000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
masterauth m2i3s5
requirepass m2i3s5

# cat start.sh
redis-server ./redis.conf &

#Start the 7002 Redis instance (again from the instance directory)
su - redis
cd /exporter/redis/redis_cluster/7002
bash start.sh

#Check that the instance ports are listening; both instances were confirmed to be up
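
The original does not show the command used for this check. A minimal sketch, assuming `ss` from iproute2 is available and that its listening-socket output carries the local address in the fourth column; the helper name `missing_ports` is hypothetical. Besides its client port, each cluster instance also listens on a cluster bus port equal to the client port plus 10000 (17001 and 17002 here, matching the `@17001` suffix that CLUSTER NODES prints), so both are worth checking.

```shell
# missing_ports PORT...: read `ss -lnt` style output on stdin and print every
# expected port that has no listening socket. Prints nothing if all are up.
missing_ports() {
    awk -v want="$*" '
        BEGIN { n = split(want, ports, " ") }
        $1 == "LISTEN" {
            # Local address is the 4th column, e.g. 0.0.0.0:7001 or [::]:7001;
            # the port is whatever follows the last colon.
            k = split($4, parts, ":")
            seen[parts[k]] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(ports[i] in seen)) print ports[i]
        }
    '
}

# Usage on a node (client ports plus cluster bus ports):
#   ss -lnt | missing_ports 7001 7002 17001 17002
```

An empty result means all four ports are listening on that server.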

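Configure and start the Redis instances on the other two servers

Following the 10.65.6.51 configuration, start the Redis instances on each remaining server in turn.

Create the cluster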

su - redis

redis-cli -a m2i3s5 --cluster create 10.65.6.51:7001 10.65.6.51:7002 10.65.6.52:7001 10.65.6.52:7002 10.65.6.53:7001 10.65.6.53:7002 --cluster-replicas 1

Type yes at the prompt and the cluster is created automatically.

#Parameters
--cluster-replicas 1
The trailing number is how many slave nodes each master gets.
Here it is 1, so each master has one slave. With 2, each master would have 2 slaves.

-a specifies the password
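
To make the arithmetic behind --cluster-replicas concrete: with N instances and R slaves per master, the cluster ends up with N / (R + 1) masters, and the 16384 hash slots are divided roughly evenly among them. The sketch below is only an illustration; the function names `master_count` and `slot_ranges` are hypothetical, and the rounding is an assumption chosen to reproduce the even split that appears in this article's CLUSTER NODES output, not a copy of redis-cli's code.

```shell
# master_count NODES REPLICAS: number of masters when each master gets REPLICAS slaves
master_count() {
    echo $(( $1 / ($2 + 1) ))
}

# slot_ranges MASTERS: split the 16384 hash slots evenly, one "first-last" range per line
slot_ranges() {
    awk -v m="$1" 'BEGIN {
        per = 16384 / m; first = 0; cursor = 0
        for (i = 1; i <= m; i++) {
            last = int(cursor + per - 1 + 0.5)   # round to the nearest slot
            if (i == m) last = 16383             # last master takes the remainder
            printf "%d-%d\n", first, last
            first = last + 1
            cursor += per
        }
    }'
}

# 6 instances with 1 slave per master -> 3 masters
slot_ranges "$(master_count 6 1)"
# 0-5460
# 5461-10922
# 10923-16383
```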

# View cluster information
$ redis-cli -c -p 7002 -h 10.65.6.53 -a m2i3s5
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.65.6.53:7002> CLUSTER nodes
5a3ac40d4fb508294581d54a5f1c78482e7510bc 10.65.6.51:7001@17001 master - 0 1667381997109 1 connected 0-5460
feec77be74afc3822711614ae9108f5b77f3fa11 10.65.6.53:7001@17001 master - 0 1667381995099 10 connected 10923-16383
3988c010767e71c4b86941a709d5ae7c96d2a662 10.65.6.53:7002@17002 myself,slave 5a3ac40d4fb508294581d54a5f1c78482e7510bc 0 1667381985000 0 connected
3299250eeb002bd9a24a7f69900ab6795a908c67 10.65.6.52:7002@17002 slave feec77be74afc3822711614ae9108f5b77f3fa11 0 1667381996103 10 connected
26a308f4be77175789d8d400aec57ae16548122b 10.65.6.51:7002@17002 slave 218f90ba77b311c205ca7c96daa64a6f27aa363c 0 1667381994094 9 connected
218f90ba77b311c205ca7c96daa64a6f27aa363c 10.65.6.52:7001@17001 master - 0 1667381993091 9 connected 5461-10922

#The cluster assigns the master/slave relationships automatically
master               slave
10.65.6.51:7001   10.65.6.53:7002
10.65.6.52:7001   10.65.6.51:7002
10.65.6.53:7001   10.65.6.52:7002
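
This pairing can be derived mechanically from CLUSTER NODES output, because a slave's line carries its master's node ID in the fourth field. A small sketch, assuming the Redis 5 line format shown above (`<id> <ip:port@bus-port> <flags> <master-id> ...`); the function name `master_slave_pairs` is hypothetical.

```shell
# master_slave_pairs: read CLUSTER NODES output on stdin and print
# "master-address slave-address" pairs, sorted by master address.
master_slave_pairs() {
    awk '
        { sub(/@.*/, "", $2); addr[$1] = $2 }   # node id -> ip:port (bus port dropped)
        $3 ~ /slave/ { slave_of[$2] = $4 }      # slave address -> master node id
        END {
            for (s in slave_of) print addr[slave_of[s]], s
        }
    ' | sort
}

# Usage:
#   redis-cli -h 10.65.6.51 -p 7001 -a m2i3s5 cluster nodes | master_slave_pairs
```

Running it again after a failover gives a quick before/after comparison of which instance is serving which master.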

Adjust cluster memory

# redis-cli -c -p 7001 -h 10.65.6.53 -a m2i3s5
10.65.6.53:7001> info memory

#maxmemory_human is 2.00G. Raise maxmemory to 3G on each of the 6 Redis instances; CONFIG SET takes effect immediately. Afterwards, also write the new value into redis.conf so that it survives a restart.
# redis-cli -c -p 7001 -h 10.65.6.53 -a m2i3s5
10.65.6.53:7001> info memory
10.65.6.53:7001> config set maxmemory 3221225472
OK
10.65.6.53:7001> info memory
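
Since the change has to be repeated on all six instances, a loop helps. The sketch below only prints the commands so they can be reviewed before anything runs; the function names `gib_to_bytes` and `maxmemory_commands` are hypothetical. CONFIG REWRITE is a standard Redis command that writes the running configuration back to the redis.conf the server was started with, which is one way to cover the "write it into redis.conf" step; editing each file by hand is the alternative.

```shell
# gib_to_bytes N: convert N GiB to bytes for use with maxmemory
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# maxmemory_commands GIB: print the redis-cli invocations for every instance.
# Hosts, ports, and password are the ones used in this article.
maxmemory_commands() {
    bytes=$(gib_to_bytes "$1")
    for host in 10.65.6.51 10.65.6.52 10.65.6.53; do
        for port in 7001 7002; do
            echo "redis-cli -h $host -p $port -a m2i3s5 config set maxmemory $bytes"
            echo "redis-cli -h $host -p $port -a m2i3s5 config rewrite"
        done
    done
}

# Dry run: prints 12 commands (2 per instance) using 3221225472 bytes
maxmemory_commands 3
```

Once the printed commands look right, piping the output to `sh` executes them.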

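Simulate a server outage: shut down any one server, observe the master/slave relationships, then build a new server and join it to the Redis cluster. Here 10.65.6.53 is shut down to simulate the failure.

#Check the cluster: CLUSTER NODES shows that instances 7001 and 7002 on 10.65.6.53 are in the fail state, and that 10.65.6.52:7002, the slave of 10.65.6.53:7001, has been promoted to master automatically
10.65.6.51:7002> CLUSTER nodes

#Check the slot assignment: 10.65.6.52:7002 has taken over the slots of 10.65.6.53:7001
10.65.6.51:7002> CLUSTER SLOTS

#Create a new virtual machine, 10.65.6.54. After it boots, change its IP to the original node's IP, 10.65.6.53, complete the node initialization, and start the two instances.
#Look up the node IDs
10.65.6.51:7002> CLUSTER nodes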

#Remove the failed nodes; testing showed that this must be run on each of the 4 surviving instances
10.65.6.51:7002> cluster forget b6e5b93d9f2e923f64b7bf67d8dffc37c4b45500
10.65.6.51:7002> CLUSTER forget d595f2a9cc43384642ed92d8f4384c7dee6ffdba
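
CLUSTER FORGET only removes the node from the node table of the instance that receives the command, which is why it has to be repeated on every surviving instance. Redis also bans the forgotten ID on that instance for 60 seconds, so the whole round should finish within that window; otherwise gossip from an instance that still knows the ID can add it back. The sketch below is a dry run that prints the commands; the function name `forget_commands` is hypothetical, and the node IDs are the two from the step above.

```shell
# forget_commands ID...: print CLUSTER FORGET for each failed node ID,
# once per surviving instance (10.65.6.51 and 10.65.6.52, ports 7001 and 7002).
forget_commands() {
    for host in 10.65.6.51 10.65.6.52; do
        for port in 7001 7002; do
            for id in "$@"; do
                echo "redis-cli -h $host -p $port -a m2i3s5 cluster forget $id"
            done
        done
    done
}

# Dry run: 4 surviving instances x 2 failed node IDs = 8 commands
forget_commands b6e5b93d9f2e923f64b7bf67d8dffc37c4b45500 d595f2a9cc43384642ed92d8f4384c7dee6ffdba
```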

#Add the new nodes; the first address is the node being added, the second is any existing cluster node
redis-cli -a m2i3s5 --cluster add-node 10.65.6.53:7001 10.65.6.51:7001
redis-cli -a m2i3s5 --cluster add-node 10.65.6.53:7002 10.65.6.51:7001

#Check the nodes: both newly added Redis instances joined as masters, so the master/slave relationships must be assigned manually
10.65.6.51:7002> CLUSTER nodes

#Make 10.65.6.53:7002 a slave of 10.65.6.51:7001 (the argument is the node ID of master 10.65.6.51:7001)
10.65.6.53:7002> CLUSTER nodes
10.65.6.53:7002> cluster replicate 5a3ac40d4fb508294581d54a5f1c78482e7510bc
10.65.6.53:7002> CLUSTER nodes

#Make 10.65.6.53:7001 a slave of 10.65.6.52:7002 (the argument is the node ID of the current master 10.65.6.52:7002)
10.65.6.53:7001> CLUSTER nodes
10.65.6.53:7001> cluster replicate 3299250eeb002bd9a24a7f69900ab6795a908c67

#Run CLUSTER FAILOVER on the slave 10.65.6.53:7001 to swap the master and slave roles
10.65.6.53:7001> CLUSTER FAILOVER

#Final master/slave relationships (master on the left, slave on the right)
10.65.6.51:7001   10.65.6.53:7002
10.65.6.52:7001   10.65.6.51:7002
10.65.6.53:7001   10.65.6.52:7002

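#Check the original keys to confirm the data is still available

At this point the Redis cluster is deployed and the server outage drill is complete, with no data lost!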