Best Practice 3.5 – Use playbooks to investigate issues
Enable consistent and prompt responses to issues that are not well understood by documenting the investigation process in playbooks. Validate and evolve these playbooks by using them regularly, both in day-to-day operations and in non-production environments and designated practice sessions such as game days.
Suggestion 3.5.1 - Create problem playbooks for use in incident response
Identify frequently occurring problems and the troubleshooting steps used for each, then create specific, versioned documentation with a review cycle. Suggested playbooks include:
- Performance Issue Investigation
- Capacity Issue Investigation
- Authentication and Sign On Issue Investigation
- Security Incident Investigation
- Connectivity and Networking Investigation
- Ransomware and Virus Investigation
- Interface Error Investigation
- Batch Job Error Investigation
- Deployment or Transport Error Investigation
Ensure that your playbooks include integration and communication steps with related support functions and teams. Common communication steps include notifications and progress updates to a critical incident desk, a security incident team, a change management team, or any combination of these.
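As an illustration of "versioned documentation with a review cycle", the following is a minimal sketch of how a playbook record might be tracked in code. The `Playbook` class, its field names, the 180-day review interval, and the example steps are all assumptions made for this sketch; they are not prescribed by this guidance or by any SAP or AWS tool.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Playbook:
    """Hypothetical record for one versioned investigation playbook."""

    name: str
    version: str
    last_reviewed: date
    review_interval_days: int = 180  # assumed review cycle, adjust to your policy
    steps: list[str] = field(default_factory=list)
    notify: list[str] = field(default_factory=list)  # teams to keep informed

    def review_due(self, today: date) -> bool:
        """Return True once the review interval has elapsed since the last review."""
        return today >= self.last_reviewed + timedelta(days=self.review_interval_days)


# Example entry; the steps are illustrative placeholders only.
performance = Playbook(
    name="Performance Issue Investigation",
    version="1.2.0",
    last_reviewed=date(2024, 1, 15),
    steps=[
        "Confirm the scope of the slowdown with the reporter",
        "Check system and database resource utilization",
        "Send a progress update to the teams listed in notify",
    ],
    notify=["critical incident desk", "change management team"],
)
```

A scheduled job could call `review_due(date.today())` for each playbook and flag the ones whose review cycle has lapsed.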
Suggestion 3.5.2 - Run regular SAP game days to test operational procedures and validate playbooks
Consider running SAP game days regularly for your operations team. A game day simulates a failure or event to test systems, processes, and team responses. The purpose is to have the team actually perform the actions it would take if an exceptional event occurred, so that it builds "muscle memory" for responding. Game days should cover the areas of operations, security, reliability, performance, and cost. Use a dedicated experimentation environment to simulate real-world scenarios and to validate and practice operational procedures and recovery processes.
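One way to check that a game day plan covers all five areas is a small coverage check, sketched below. The function name `uncovered_areas` and the scenario names are hypothetical and chosen only for illustration.

```python
# The five areas named in the guidance above.
AREAS = ("operations", "security", "reliability", "performance", "cost")


def uncovered_areas(scenarios: dict[str, set[str]]) -> list[str]:
    """Return the areas that no planned scenario exercises, in AREAS order."""
    covered = set().union(*scenarios.values())
    return [area for area in AREAS if area not in covered]


# Hypothetical game day plan: scenario name -> areas it exercises.
plan = {
    "Simulated database failover": {"reliability", "operations"},
    "Simulated compromised user account": {"security"},
    "Simulated batch job overload": {"performance"},
}

print(uncovered_areas(plan))  # ['cost']
```

Any area in the returned list is a prompt to add or extend a scenario before the next game day.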