Oracle RAC 12.1 and later: improved node eviction algorithm for split brain

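From Oracle 12.1 onward (specifically 12.1.0.2), the algorithm that decides which node to evict when RAC suffers a split brain has changed. Previously, in a two-node RAC, the node evicted was always the one that did not have the lower node number (the master node), so in practice node 2 was usually evicted. From 12.1 onward the clusterware maintains a weight value, computed mainly from the services in use on each node (or sub-cluster) and the load of the connections to those services. When a split brain occurs, the node with the smaller weight, that is, the more lightly loaded node, is evicted, so that the node serving more users survives. For details see the documents 12c: Which Node Will Survive when Split Brain Takes Place (Doc ID 1951726.1) and Split Brain: What’s new in Oracle Database 12.1.0.2c?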

 

Understanding the node survival policy after a split brain, starting with 12.1.0.2.

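In 11.2 and earlier releases, the node with the lowest node number survives when a split brain occurs. Starting with 12.1.0.2, the concept of node weight is introduced, and during split-brain resolution the node with the higher weight survives.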
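The difference between the two policies can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not Oracle's implementation: the `Node` class, the single integer `weight`, and both function names are hypothetical, and the tie-break on equal weights (falling back to the lowest node number) is an assumption that the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    number: int   # cluster node number
    weight: int   # hypothetical scalar standing in for the clusterware's node weight


def survivor_pre_12102(nodes):
    """11.2 and earlier: the node with the lowest node number survives."""
    return min(nodes, key=lambda n: n.number)


def survivor_12102(nodes):
    """12.1.0.2 and later: the node with the higher weight survives.

    Assumption: on equal weights, fall back to the lowest node number.
    """
    return max(nodes, key=lambda n: (n.weight, -n.number))


if __name__ == "__main__":
    nodes = [Node(number=1, weight=0), Node(number=2, weight=3)]
    print("pre-12.1.0.2 survivor:", survivor_pre_12102(nodes).number)
    print("12.1.0.2+ survivor:", survivor_12102(nodes).number)
```

With a lightly loaded node 1 and a heavily loaded node 2, the old rule keeps node 1 while the weight-based rule keeps node 2.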

 

 

The function responsible for computing the weight is clssnmrCheckNodeWeight. The clssnm prefix refers to Node Monitoring (clssnm.c): Node monitoring (NM) is used to verify the health of all members of the cluster. It maintains consistency with vendor clusterware (if it exists) via skgxn.

 

12c: Which Node Will Survive when Split Brain Takes Place (Doc ID 1951726.1)

 

PURPOSE
To understand the new behavior, from 12.1.0.2, of which node will survive when split brain takes place.

DETAILS
In 11.2 and older versions, the node with the lowest node number survives when split brain takes place. This changed in 12.1.0.2 with the introduction of node weight. Starting from 12.1.0.2, during split brain resolution, the node with the higher weight survives:

2014-11-24 14:25:41.140603 : CSSD:1117321536: clssnmrCheckNodeWeight: node(1) has weight stamp(0), pebble(0)
2014-11-24 14:25:41.140609 : CSSD:1117321536: clssnmrCheckNodeWeight: node(2) has weight stamp(311972654), pebble(3)
2014-11-24 14:25:41.140612 : CSSD:1117321536: clssnmrCheckNodeWeight: stamp(311972654), completed(1/2)
2014-11-24 14:25:41.140615 : CSSD:1117321536: clssnmrCheckSplit: Waiting for node weights, stamp(311972654)
2014-11-24 14:25:41.188880 : CSSD:1084811584: clssnmvDiskKillCheck: not evicted, file /dev/raw/raw2 flags 0x00000000, kill block unique 0, my unique 1416805718
2014-11-24 14:25:41.558921 : CSSD:1114167616: clssnmvDiskPing: Writing with status 0x3, timestamp 1416810341/1022717334
2014-11-24 14:25:41.731912 : CSSD:1086388544: clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 311972655, wrtcnt, 9527468, LATS 1022717514, lastSeqNo 9527467, uniqueness 1416808381, timestamp 1416810341/1022722074
2014-11-24 14:25:41.731928 : CSSD:1086388544: clssnmvReadDskHeartbeat: manual shutdown of nodename node1, nodenum 1 epoch 1416810341 msec 1022722074
2014-11-24 14:25:41.732266 : CSSD:1117321536: clssnmrCheckNodeWeight: node(2) has weight stamp(311972654), pebble(3)
2014-11-24 14:25:41.732273 : CSSD:1117321536: clssnmrCheckNodeWeight: stamp(311972654), completed(1/1)
2014-11-24 14:25:41.732294 : CSSD:1117321536: clssnmCheckDskInfo: My cohort: 2
2014-11-24 14:25:41.732299 : CSSD:1117321536: clssnmRemove: Start
2014-11-24 14:25:41.732306 : CSSD:1117321536: (:CSSNM00007:)clssnmrRemoveNode: Evicting node 1, node1, from the cluster in incarnation 311972655, node birth incarnation 311972654, death incarnation 311972655, stateflags 0x225000 uniqueness value 1416808381

The weight takes into account the number of resources running on each node, among other factors.
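When reading a CSSD trace like the one above, it can help to pull out just the weight and eviction lines. The following is a minimal sketch that assumes the log lines have exactly the format shown in this excerpt. The function names and regular expressions are illustrative and not part of any Oracle tool, and the `stamp` and `pebble` fields are reported as-is, with no claim about how the clusterware combines them.

```python
import re

# Patterns based solely on the log excerpt above (format assumed stable).
WEIGHT_RE = re.compile(
    r"clssnmrCheckNodeWeight: node\((\d+)\) has weight stamp\((\d+)\), pebble\((\d+)\)"
)
EVICT_RE = re.compile(r"clssnmrRemoveNode: Evicting node (\d+), ([^,]+),")


def parse_node_weights(lines):
    """Return {node_number: {"stamp": int, "pebble": int}}; later lines win."""
    weights = {}
    for line in lines:
        m = WEIGHT_RE.search(line)
        if m:
            node, stamp, pebble = map(int, m.groups())
            weights[node] = {"stamp": stamp, "pebble": pebble}
    return weights


def parse_evictions(lines):
    """Return a list of (node_number, node_name) tuples for evicted nodes."""
    evicted = []
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            evicted.append((int(m.group(1)), m.group(2)))
    return evicted


# Three lines copied from the excerpt above, shortened after the matched part.
SAMPLE = [
    "2014-11-24 14:25:41.140603 : CSSD:1117321536: clssnmrCheckNodeWeight: node(1) has weight stamp(0), pebble(0)",
    "2014-11-24 14:25:41.140609 : CSSD:1117321536: clssnmrCheckNodeWeight: node(2) has weight stamp(311972654), pebble(3)",
    "2014-11-24 14:25:41.732306 : CSSD:1117321536: (:CSSNM00007:)clssnmrRemoveNode: Evicting node 1, node1, from the cluster",
]

if __name__ == "__main__":
    print(parse_node_weights(SAMPLE))
    print(parse_evictions(SAMPLE))
```

On the sample lines this reports node 1 with stamp 0 and pebble 0, node 2 with stamp 311972654 and pebble 3, and node 1 (node1) as the evicted node, which matches the outcome in the trace.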