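On March 30, 2024, MogDB released version 5.0.6. It introduces an interesting small feature that protects a standby from being promoted to primary by a switchover or failover, which lets users control whether a given standby may be promoted. The requirement grew out of real user scenarios.
To support this feature, MogDB adds a new parameter, protect_standby, a Boolean that can be set to on or off.
Below, we run hands-on tests to show how the feature actually behaves.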
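First, prepare the environment:
1. Create an alias to save typing: alias c="cm_ctl query -Cvid"
2. Use at least one primary and two standbys, i.e. a cluster of three or more nodes.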
[root@mogdb114 506]# ptk ls
  cluster_name  |  id  |             addr              | user |      data_dir       |          db_version          |     create_time     | comment
----------------+------+-------------------------------+------+---------------------+------------------------------+---------------------+----------
  cluster_26000 | 6001 | 172.20.22.114:26000(cm:15300) | omm  | /data/mogdb5.0/data | MogDB 5.0.6 (build 8b0a6ca8) | 2024-04-01T16:57:25 |
                | 6002 | 172.20.22.115:26000(cm:15300) | omm  | /data/mogdb5.0/data |                              |                     |
                | 6003 | 172.20.22.117:26000(cm:15300) | omm  | /data/mogdb5.0/data |                              |                     |
[root@mogdb114 506]#
[omm@mogdb114 ~]$ cm_ctl show
[ Network Connect State ]
Network timeout:       6s
Current CMServer time: 2024-04-01 17:42:13
Network stat('Y' means connected, otherwise 'N'):
|     |  Y  |  Y  |
|  Y  |     |  Y  |
|  Y  |  Y  |     |
[ Node Disk HB State ]
Node disk hb timeout:    200s
Current CMServer time: 2024-04-01 17:42:14
Node disk hb stat('Y' means connected, otherwise 'N'):
|  N  |  N  |  N  |
[ FloatIp Network State ]
node            instance    base_ip         float_ip_name   float_ip
---------------------------------------------------------------
1  mogdb114     6001        172.20.22.114   VIP_az240917    172.20.22.180
[omm@mogdb114 ~]$
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Primary Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb114 ~]$
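Reading roles off this status screen by eye works for a demo, but a test script usually needs them programmatically. The following Python sketch parses one Datanode State row into a host-to-role map. It assumes the column order shown above (node, host, IP, instance, data directory, P/S flag, role, state); `parse_datanode_line` is a hypothetical helper, not part of cm_ctl, and the role group deliberately allows multi-word roles such as the protected-standby role that appears later in this test.

```python
import re
from typing import Dict

# Hypothetical pattern for one "|"-separated datanode entry, assuming the
# layout shown above: node, host, IP, instance, data dir, P/S flag, role, state.
_SEGMENT = re.compile(
    r"^(?P<node>\d+)\s+(?P<name>\S+)\s+(?P<ip>\S+)\s+(?P<instance>\d+)\s+"
    r"(?P<datadir>\S+)\s+(?P<flag>[PS])\s+(?P<role>.+?)\s+(?P<state>\S+)$"
)


def parse_datanode_line(line: str) -> Dict[str, Dict[str, str]]:
    """Map host name -> {"role": ..., "state": ...} for one Datanode State row."""
    result: Dict[str, Dict[str, str]] = {}
    for segment in line.split("|"):
        match = _SEGMENT.match(segment.strip())
        if match is None:
            continue  # header text or anything that is not a datanode entry
        # The lazy role group lets roles span several words.
        result[match.group("name")] = {
            "role": match.group("role"),
            "state": match.group("state"),
        }
    return result


row = (
    "1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Primary Normal | "
    "2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Standby Normal | "
    "3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal"
)
print(parse_datanode_line(row))
```

Running the same helper after every step makes it easy to assert which node holds which role instead of eyeballing the output.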
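OK! With the environment ready, we can start testing.
Verifying switchover before enabling the parameter
Before trying the new 5.0.6 feature, let's first run a manual switchover to see how the cluster behaves by default.
For example, here we switch the primary over to node 2.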
[omm@mogdb114 ~]$ cm_ctl switchover -n 2 -D $PGDATA
..Killed
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Standby Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Primary Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb114 ~]$
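Next, let's look at the switchover in detail. The log below appears to come from a test script (tctest) that repeatedly inserts rows through the floating IP 172.20.22.180 and reports which host served each insert: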
### 2024-04-01 17:50:49.956 : sleep 1
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 17:49:30.495 ---- 2024-04-01 17:50:50.987 : 76
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 17:50:51.020981+08 | mogdb114     |           760
(1 row)
              now
-------------------------------
 2024-04-01 17:50:51.021683+08
(1 row)
### 2024-04-01 17:50:51.025 : sleep 1
failed to connect 172.20.22.180:26000.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 17:49:30.495 >>>> 2024-04-01 17:50:52.291 : 77
failed to connect 172.20.22.180:26000.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 17:49:30.495 >>>> 2024-04-01 17:50:53.352 : 77
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 17:49:30.495 ---- 2024-04-01 17:50:54.384 : 77
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 17:50:54.418988+08 | mogdb115     |           770
(1 row)
              now
-------------------------------
 2024-04-01 17:50:54.419806+08
(1 row)
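The log shows that the primary moved from node 114 to node 115, as expected; while the switchover was in progress, the client briefly could not connect through the floating IP.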
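To put a number on that interruption, the following Python sketch scans the result rows of such a log and reports the first host change together with the time between the last insert on the old primary and the first insert on the new one. `primary_change` is a hypothetical helper, not part of MogDB or of the test script, and it assumes the row format shown above. Because the script sleeps 1 second between attempts, the gap is only an approximation of client-visible downtime.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches result rows such as "2024-04-01 17:50:51.020981+08 | mogdb114 | 760".
_ROW = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\+\d{2}\s*\|\s*(\S+)\s*\|\s*(\d+)\s*$"
)


def primary_change(log: str) -> Optional[Tuple[str, str, float]]:
    """Return (old_host, new_host, gap_seconds) for the first host change, if any."""
    rows: List[Tuple[datetime, str]] = []
    for line in log.splitlines():
        m = _ROW.match(line.strip())
        if m:
            # The "+08" offset is ignored; all rows share the same time zone.
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            rows.append((ts, m.group(2)))
    # Compare consecutive successful inserts and stop at the first host change.
    for (t1, h1), (t2, h2) in zip(rows, rows[1:]):
        if h1 != h2:
            return h1, h2, (t2 - t1).total_seconds()
    return None


sample = """
 2024-04-01 17:50:51.020981+08 | mogdb114     |           760
failed to connect 172.20.22.180:26000.
 2024-04-01 17:50:54.418988+08 | mogdb115     |           770
"""
print(primary_change(sample))  # ('mogdb114', 'mogdb115', 3.398007)
```

For this switchover the gap is roughly 3.4 s. Applying the same calculation to the later logs in this article gives about 9.1 s for the switchover to node 117 and about 11.2 s for the failover.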
Can we switch the primary to node 3 as well? Of course:
[omm@mogdb114 ~]$ cm_ctl switchover -n 3 -D $PGDATA
......
cm_ctl: switchover successfully.
[omm@mogdb114 ~]$
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Standby Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Primary Normal
[omm@mogdb114 ~]$
Again, let's look at how the switchover appears to the client.
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 17:51:40.596 ---- 2024-04-01 17:51:58.970 : 18
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 17:51:59.002625+08 | mogdb115     |           180
(1 row)
              now
-------------------------------
 2024-04-01 17:51:59.003296+08
(1 row)
### 2024-04-01 17:51:59.006 : sleep 1
failed to connect 172.20.22.180:26000.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 17:51:40.596 >>>> 2024-04-01 17:52:00.026 : 19
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 17:51:40.596 ---- 2024-04-01 17:52:08.072 : 19
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 17:52:08.125014+08 | mogdb117     |           190
(1 row)
              now
-------------------------------
 2024-04-01 17:52:08.125707+08
(1 row)
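As expected, just as in the previous test, the primary was switched over to node 3, i.e. node 117.
Verifying failover before enabling the parameter
First, let's check the current state of the cluster: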
[omm@mogdb115 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Standby Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Primary Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb115 ~]$ ps -fu omm
UID        PID  PPID  C STIME TTY          TIME CMD
omm       5451     1  0 16:57 ?        00:01:10 /data/mogdb5.0/app/5.0.5/bin/om_monitor -L /data/mogdb5.0/log/cm/om_monitor
omm      13162     1  2 17:57 ?        00:01:46 /data/mogdb5.0/app/5.0.5/bin/mogdb -D /data/mogdb5.0/data -M pending
omm      26464 30588  0 18:58 ?        00:00:00 arping -D -f -w 1 -I ens192 172.20.22.180
omm      26465 28012  0 18:58 pts/0    00:00:00 ps -fu omm
omm      28012 28011  0 17:46 pts/0    00:00:00 -bash
omm      30588  5451 17 17:48 ?        00:12:05 /data/mogdb5.0/app/5.0.5/bin/cm_agent
omm      30608     1 13 17:48 ?        00:09:23 /data/mogdb5.0/app/5.0.5/bin/cm_server
omm      30632     1  0 17:48 ?        00:00:00 mogdb fenced UDF master process
The primary is currently on node 115. We forcibly kill the mogdb process on 115 (PID 13162) to simulate a failover scenario.
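Here the PID was read from the ps output by hand. When the same crash test is scripted, a small Python helper can pick out the server process instead. This is a sketch under stated assumptions: `find_mogdb_pid` is hypothetical, and it assumes the `ps -fu omm` column layout shown above, a server binary path ending in `/bin/mogdb`, and a server started with `-D <data dir>`.

```python
from typing import Optional


def find_mogdb_pid(ps_output: str, data_dir: str) -> Optional[int]:
    """Return the PID of the mogdb server started with `-D data_dir`, if any."""
    for line in ps_output.splitlines():
        # Assumed columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header or malformed line
        cmd = fields[7].split()
        # Match the server binary itself, not cm_agent, om_monitor or the fenced UDF process.
        if cmd[0].endswith("/bin/mogdb") and "-D" in cmd:
            idx = cmd.index("-D")
            if idx + 1 < len(cmd) and cmd[idx + 1] == data_dir:
                return int(fields[1])
    return None


ps_sample = """UID        PID  PPID  C STIME TTY          TIME CMD
omm       5451     1  0 16:57 ?        00:01:10 /data/mogdb5.0/app/5.0.5/bin/om_monitor -L /data/mogdb5.0/log/cm/om_monitor
omm      13162     1  2 17:57 ?        00:01:46 /data/mogdb5.0/app/5.0.5/bin/mogdb -D /data/mogdb5.0/data -M pending
omm      30632     1  0 17:48 ?        00:00:00 mogdb fenced UDF master process
"""
print(find_mogdb_pid(ps_sample, "/data/mogdb5.0/data"))  # 13162
```

Matching on the `-D` argument rather than on the word "mogdb" alone avoids picking up the fenced UDF master process, whose command line also starts with "mogdb".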
[omm@mogdb115 ~]$ kill -9 13162
[omm@mogdb115 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Primary Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb115 ~]$
Node 114 successfully took over as primary. How did the simulated insert workload fare?
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 18:58:31.448 ---- 2024-04-01 18:58:54.326 : 22
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 18:58:54.361219+08 | mogdb115     |           220
(1 row)
              now
-------------------------------
 2024-04-01 18:58:54.361889+08
(1 row)
### 2024-04-01 18:58:54.365 : sleep 1
failed to connect 172.20.22.180:26000.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 18:58:31.448 >>>> 2024-04-01 18:58:55.448 : 23
failed to connect 172.20.22.180:26000.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 18:58:31.448 >>>> 2024-04-01 18:58:56.468 : 23
gsql: FATAL: cannot accept connection in pending mode.
### failed to connect mogdb. sleep 1, tctest.log.insert tctest.log.ustore 2024-04-01 18:58:31.448 >>>> 2024-04-01 18:58:57.521 : 23
>>>>>>>>>>>>>>> tctest_insert, log=tctest.log.insert 2024-04-01 18:58:31.448 ---- 2024-04-01 18:59:05.569 : 23
INSERT 0 10
              now              | get_hostname | tctest_insert
-------------------------------+--------------+---------------
 2024-04-01 18:59:05.604845+08 | mogdb114     |           230
(1 row)
              now
-------------------------------
 2024-04-01 18:59:05.605569+08
(1 row)
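As the test shows, CM performed a failover and promoted 114 to primary, and node 115 became a standby.
Testing with the feature enabled
Before verifying the feature, we first set the parameter on node 115, the standby we want to protect, and reload the configuration: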
[omm@mogdb115 ~]$ gs_guc set -D $PGDATA -c "protect_standby=on"
The gs_guc run with the following arguments: [gs_guc -D /data/mogdb5.0/data -c protect_standby=on set ].
expected instance path: [/data/mogdb5.0/data/postgresql.conf]
gs_guc set: protect_standby=on: [/data/mogdb5.0/data/postgresql.conf]
Total instances: 1. Failed instances: 0.
Success to perform gs_guc!
[omm@mogdb115 ~]$ gs_ctl reload
[2024-04-01 19:04:37.767][4895][][gs_ctl]: gs_ctl reload ,datadir is /data/mogdb5.0/data
server signaled
[omm@mogdb115 ~]$ gsql -r
gsql ((MogDB 5.0.6 build 8b0a6ca8) compiled at 2024-03-27 11:05:29 commit 0 last mr 1804 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
MogDB=# show protect_standby;
 protect_standby
-----------------
 on
(1 row)
MogDB=#
What does the cluster state look like now?
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Primary Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Protect Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb114 ~]$
We can see that the datanode on node 115 has changed from Standby Normal to Protect Standby Normal.
Next, let's test switchover and failover in turn.
[omm@mogdb114 ~]$ cm_ctl switchover -n 2 -D $PGDATA
.
cm_ctl: cannot do switchover at current role (not standby), You can execute "cm_ctl query -v" and check
[omm@mogdb114 ~]$
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Primary Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Protect Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Standby Normal
[omm@mogdb114 ~]$
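As you can see, the switchover to node 2 is now rejected: cm_ctl reports that the target's current role is not standby, and the cluster state stays unchanged.
What about failover? Without this feature enabled, killing the primary process on 114 could lead CM to promote 115. Now for the moment of truth.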
[omm@mogdb114 ~]$ ps -fu omm
UID        PID  PPID  C STIME TTY          TIME CMD
omm       7693 17884  0 19:07 ?        00:00:00 arping -D -f -w 1 -I ens192 172.20.22.180
omm       7702 27470  0 19:07 pts/2    00:00:00 ps -fu omm
omm      17884 25302 15 17:48 ?        00:12:32 /data/mogdb5.0/app/5.0.5/bin/cm_agent
omm      17904     1 14 17:48 ?        00:11:47 /data/mogdb5.0/app/5.0.5/bin/cm_server
omm      17930     1  0 17:48 ?        00:00:00 mogdb fenced UDF master process
omm      25302     1  1 16:57 ?        00:01:17 /data/mogdb5.0/app/5.0.5/bin/om_monitor -L /data/mogdb5.0/log/cm/om_monitor
omm      27470 27469  0 17:02 pts/2    00:00:00 -bash
omm      30626     1  6 18:58 ?        00:00:34 /data/mogdb5.0/app/5.0.5/bin/mogdb -D /data/mogdb5.0/data -M pending
[omm@mogdb114 ~]$ kill -9 30626
[omm@mogdb114 ~]$
[omm@mogdb114 ~]$ c
[ CMServer State ]
node      node_ip         instance                            state
--------------------------------------------------------------------
1  mogdb114 172.20.22.114   1    /data/mogdb5.0/cm/cm_server Primary
2  mogdb115 172.20.22.115   2    /data/mogdb5.0/cm/cm_server Standby
3  mogdb117 172.20.22.117   3    /data/mogdb5.0/cm/cm_server Standby
[ Cluster State ]
cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL
[ Datanode State ]
node      node_ip         instance                    state            | node      node_ip         instance                    state            | node      node_ip         instance                    state
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  mogdb114 172.20.22.114   6001 /data/mogdb5.0/data P Standby Normal | 2  mogdb115 172.20.22.115   6002 /data/mogdb5.0/data S Protect Standby Normal | 3  mogdb117 172.20.22.117   6003 /data/mogdb5.0/data S Primary Normal
[omm@mogdb114 ~]$
After we kill the mogdb process on primary node 114, the cluster fails over to node 117, and the primary is no longer moved to 115. This is exactly what we expected!
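The behavior observed in both tests can be summarized with a deliberately simplified Python model of candidate selection. It is not CM's actual election logic, which presumably also considers factors such as replication progress; `promotion_candidates` is a hypothetical helper that captures only the rule demonstrated here: a node reported as Protect Standby is never offered for promotion, while an ordinary Standby is.

```python
from typing import Dict, List


def promotion_candidates(roles: Dict[str, str], failed_primary: str) -> List[str]:
    """Hypothetical, simplified model: nodes eligible to become the new primary.

    Only nodes whose role is exactly "Standby" qualify, so a node shown as
    "Protect Standby" (protect_standby=on) is always skipped. Real CM
    arbitration involves more criteria than this sketch models.
    """
    return sorted(
        node
        for node, role in roles.items()
        if node != failed_primary and role == "Standby"
    )


roles_before_kill = {
    "mogdb114": "Primary",
    "mogdb115": "Protect Standby",
    "mogdb117": "Standby",
}
print(promotion_candidates(roles_before_kill, failed_primary="mogdb114"))  # ['mogdb117']
```

With protect_standby off on 115, both remaining standbys would be candidates, which matches the earlier failover test where CM was free to pick either node.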
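So what is this feature useful for in practice?
In some scenarios, users need to make sure that a particular standby is never acted on by the cluster manager and always remains a standby, so that it is switched over only through manual intervention when necessary. The feature also serves the requirement of promoting a specific, chosen node to primary, rather than leaving that choice to the cluster manager.
So, do you find this small feature useful?
References:
MogDB 5.0.6 new features: https://docs.mogdb.io/zh/mogdb/v5.0/5.0.6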
This article was published via mdnice multi-platform publishing.