Simulating a cluster network failure

Description: Simulate a network failure to test how the cluster behaves in a split-brain scenario.

Run node: Can be run on any node. In this test case, it is run on node B (sechana).

Run steps:

Drop all the traffic coming from and going to node A with the following command:


iptables -A INPUT -s <<Primary IP address of Node A>> -j DROP; iptables -A OUTPUT -d <<Primary IP address of Node A>> -j DROP
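The two rules above can be wrapped in a small helper so that they are easy to apply and later remove. The following is a minimal sketch, not part of the original procedure: the names NODE_A_IP, DRY_RUN, run, isolate_node, and restore_node are hypothetical, the IP address is a placeholder from the documentation range, and removing the rules with iptables -D after the test is an assumption about how you might want to restore connectivity.

```shell
# Hypothetical helper names; only the iptables rules themselves come from the step above.
NODE_A_IP="${NODE_A_IP:-192.0.2.10}"   # placeholder; replace with node A's primary IP address
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Drop all traffic coming from and going to the given address.
isolate_node() {
  run iptables -A INPUT -s "$1" -j DROP
  run iptables -A OUTPUT -d "$1" -j DROP
}

# Delete the same two rules again (-D removes a rule matching the specification).
restore_node() {
  run iptables -D INPUT -s "$1" -j DROP
  run iptables -D OUTPUT -d "$1" -j DROP
}

# Usage (as root on node B):
#   DRY_RUN=0 isolate_node "$NODE_A_IP"
#   DRY_RUN=0 restore_node "$NODE_A_IP"
```

Defaulting to dry-run mode lets you review the exact commands before any rule is changed.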


[root@sechana ~]# pcs status
Cluster name: rhelhanaha
Stack: corosync
Current DC: prihana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Fri Jan 22 14:45:24 2021
Last change: Fri Jan 22 14:45:11 2021 by hacluster via crmd on sechana
2 nodes configured
6 resources configured
Online: [ prihana sechana ]
Full list of resources:
 clusterfence   (stonith:fence_aws):    Started prihana
 Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
     Started: [ prihana sechana ]
 Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
     Masters: [ prihana ]
     Slaves: [ sechana ]
 hana-oip       (ocf::heartbeat:aws-vpc-move-ip):       Started prihana
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@sechana ~]# iptables -A INPUT -s xxx.xxx.xxx.xxx -j DROP; iptables -A OUTPUT -d xxx.xxx.xxx.xxx -j DROP

Expected result:

The cluster detects the network failure and fences node 1 (prihana). It then promotes the secondary SAP HANA database on node 2 (sechana) to primary, without entering a split-brain situation.


[root@sechana ~]# pcs status
Cluster name: rhelhanaha
Stack: corosync
Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
Last updated: Fri Jan 22 15:11:43 2021
Last change: Fri Jan 22 15:10:48 2021 by root via crm_attribute on sechana
2 nodes configured
6 resources configured
Online: [ sechana ]
OFFLINE: [ prihana ]
Full list of resources:
 clusterfence   (stonith:fence_aws):    Started sechana
 Clone Set: SAPHanaTopology_DRL_00-clone [SAPHanaTopology_DRL_00]
     Started: [ sechana ]
     Stopped: [ prihana ]
 Master/Slave Set: SAPHana_DRL_00-master [SAPHana_DRL_00]
     Masters: [ sechana ]
     Stopped: [ prihana ]
 hana-oip       (ocf::heartbeat:aws-vpc-move-ip):       Started sechana
Failed Actions:
* clusterfence_monitor_60000 on sechana 'unknown error' (1): call=-1, status=Timed Out, exitreason='',
    last-rc-change='Fri Jan 22 14:59:14 2021', queued=0ms, exec=0ms
Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@sechana ~]#
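To check the expected result from a script rather than by eye, the "Masters:" line of the pcs status output can be extracted. The following is a minimal sketch; the function name current_master is hypothetical, and it assumes the output format shown above (other Pacemaker versions may label this line differently).

```shell
# Hypothetical helper: read `pcs status` output on stdin and print the node(s)
# listed on the "Masters:" line, without the surrounding brackets.
current_master() {
  sed -n 's/^ *Masters: *\[ *\(.*[^ ]\) *] *$/\1/p'
}

# Usage (as root on a cluster node):
#   pcs status | current_master
```

After a successful failover in this test case, the command is expected to print sechana instead of prihana.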

Recovery procedure:

Clean up the failed actions reported by the cluster (listed under "Failed Actions" in the pcs status output).
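The cleanup itself is typically done with pcs resource cleanup followed by the resource name. The following is a minimal sketch that derives the resource names from the "Failed Actions" section and prints one cleanup command for each; the function name cleanup_commands is hypothetical, and the parsing assumes the pcs status format shown above, where each failed action starts with "* <resource>_<operation>_<interval>".

```shell
# Hypothetical helper: read `pcs status` output on stdin and print one
# "pcs resource cleanup <resource>" command per resource with a failed action.
cleanup_commands() {
  # 1. Keep the first field after "* " on lines between "Failed Actions:" and "Daemon Status:".
  # 2. Strip the trailing "_<operation>_<interval>" to get the resource name.
  # 3. De-duplicate and print the cleanup command for each resource.
  awk '/^Failed Actions:/ { f = 1; next }
       /^Daemon Status:/  { f = 0 }
       f && /^ *\* /      { print $2 }' |
    sed 's/_[a-z-]*_[0-9]*$//' |
    sort -u |
    while read -r res; do
      echo "pcs resource cleanup $res"
    done
}

# Usage (as root on a cluster node); review the printed commands before running them:
#   pcs status | cleanup_commands
```

For the output in this test case, the only failed action belongs to the clusterfence resource, so the sketch prints a single command: pcs resource cleanup clusterfence.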
