Stop the SAP HANA database on the secondary node

Description — Stop the primary SAP HANA database (running on node 2) during normal cluster operation.

Run node — Primary SAP HANA database node (node 2)

Run steps:

  • Stop the SAP HANA database gracefully as <sid>adm on node 2.

    [root@sechana ~]# su - hdbadm
    hdbadm@sechana:/usr/sap/HDB/HDB00> HDB stop
    hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
    Stopping instance using: /usr/sap/HDB/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

    12.11.2020 11:45:21
    Stop
    OK
    Waiting for stopped instance using: /usr/sap/HDB/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2

    12.11.2020 11:45:53
    WaitforStopped
    OK
    hdbdaemon is stopped.
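
    Optionally, before checking the cluster reaction, you can confirm that the instance is fully stopped under the same <sid>adm user. This is an optional check, assuming the instance number 00 used throughout this test; all processes should report a stopped status (for example, GRAY, Stopped):

    hdbadm@sechana:/usr/sap/HDB/HDB00> sapcontrol -nr 00 -function GetProcessList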

Expected result:

  • The cluster detects the stopped primary SAP HANA database (on node 2) and promotes the secondary SAP HANA database (on node 1) to take over as primary.

    [root@sechana ~]# pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Tue Nov 10 18:04:01 2020
    Last change: Tue Nov 10 18:04:00 2020 by root via crm_attribute on prihana

    2 nodes configured
    6 resources configured

    Online: [ prihana sechana ]

    Full list of resources:

     clusterfence   (stonith:fence_aws):    Started prihana
     Clone Set: SAPHanaTopology_HDB_00-clone [SAPHanaTopology_HDB_00]
         Started: [ prihana sechana ]
     Master/Slave Set: SAPHana_HDB_00-master [SAPHana_HDB_00]
         SAPHana_HDB_00 (ocf::heartbeat:SAPHana):       Promoting prihana
         Slaves: [ sechana ]
     hana-oip       (ocf::heartbeat:aws-vpc-move-ip):   Started prihana

    Failed Actions:
    * SAPHana_HDB_00_monitor_59000 on sechana 'master (failed)' (9): call=41, status=complete, exitreason='',
        last-rc-change='Tue Nov 10 18:03:49 2020', queued=0ms, exec=0ms

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~]#
  • The overlay IP address is migrated to the new primary (on node 1). An optional AWS-side check of the route table is sketched after this list.

    [root@prihana ~]# ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
        link/ether 0a:38:1c:ce:b4:3d brd ff:ff:ff:ff:ff:ff
        inet xx.xx.xx.xx/24 brd 11.0.1.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet xx.xx.xx.xx/32 scope global eth0:1
           valid_lft forever preferred_lft forever
        inet 192.168.10.16/32 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::838:1cff:fece:b43d/64 scope link
           valid_lft forever preferred_lft forever
  • With AUTOMATED_REGISTER set to true, the cluster restarts the failed SAP HANA database and registers it against the new primary.

    Check the status of the secondary using the following command:

    hdbadm@sechana:/usr/sap/HDB/HDB00> sapcontrol -nr 00 -function GetProcessList

    10.11.2020 18:08:47
    GetProcessList
    OK
    name, description, dispstatus, textstatus, starttime, elapsedtime, pid
    hdbdaemon, HDB Daemon, GREEN, Running, 2020 11 10 18:05:44, 0:03:03, 6601
    hdbcompileserver, HDB Compileserver, GREEN, Running, 2020 11 10 18:05:48, 0:02:59, 6725
    hdbindexserver, HDB Indexserver-HDB, GREEN, Running, 2020 11 10 18:05:49, 0:02:58, 6828
    hdbnameserver, HDB Nameserver, GREEN, Running, 2020 11 10 18:05:44, 0:03:03, 6619
    hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2020 11 10 18:05:48, 0:02:59, 6730
    hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2020 11 10 18:05:58, 0:02:49, 7797
    hdbxsengine, HDB XSEngine-HDB, GREEN, Running, 2020 11 10 18:05:49, 0:02:58, 6831
    hdbadm@sechana:/usr/sap/HDB/HDB00>
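
    To confirm that node 2 was re-registered as the new secondary, you can also check the system replication state as <sid>adm on node 2. This is an optional check using the standard hdbnsutil tool; the output should show the site registered in secondary mode against the new primary:

    hdbadm@sechana:/usr/sap/HDB/HDB00> hdbnsutil -sr_state

    The overlay IP migration can also be verified from the AWS side. The following is a minimal sketch, assuming the overlay IP 192.168.10.16 shown above and an AWS CLI profile with permission to describe route tables; the matching route should now point to the network interface of the new primary (node 1):

    aws ec2 describe-route-tables \
        --filters "Name=route.destination-cidr-block,Values=192.168.10.16/32" \
        --query "RouteTables[].Routes[?DestinationCidrBlock=='192.168.10.16/32']" \
        --output table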

Recovery procedure:

  • Clean up the cluster “failed actions” on node 2 as root using the following command:

    [root@sechana ~]# pcs resource cleanup SAPHana_HDB_00 --node sechana
  • After the resource cleanup, verify that the “failed actions” no longer appear in the cluster status.

    [root@sechana ~]# pcs status
    Cluster name: rhelhanaha
    Stack: corosync
    Current DC: sechana (version 1.1.19-8.el7_6.5-c3c624ea3d) - partition with quorum
    Last updated: Tue Nov 10 18:13:35 2020
    Last change: Tue Nov 10 18:12:51 2020 by hacluster via crmd on sechana

    2 nodes configured
    6 resources configured

    Online: [ prihana sechana ]

    Full list of resources:

     clusterfence   (stonith:fence_aws):    Started prihana
     Clone Set: SAPHanaTopology_HDB_00-clone [SAPHanaTopology_HDB_00]
         Started: [ prihana sechana ]
     Master/Slave Set: SAPHana_HDB_00-master [SAPHana_HDB_00]
         Masters: [ prihana ]
         Slaves: [ sechana ]
     hana-oip       (ocf::heartbeat:aws-vpc-move-ip):   Started prihana

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
    [root@sechana ~]#
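
To combine the recovery and its verification in one step, you can run the cleanup and immediately re-check the cluster status. This is a minimal sketch, assuming the resource and node names used throughout this test (SAPHana_HDB_00 and sechana); run it as root on node 2:

    # Clean up the failed monitor action, give the cluster a moment to re-probe
    # the resource, then confirm that no failed actions remain in the status output.
    pcs resource cleanup SAPHana_HDB_00 --node sechana
    sleep 30
    pcs status | grep -A 3 "Failed Actions" || echo "No failed actions remaining"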