Run AWS Systems Manager Automation tasks synchronously from AWS Step Functions - AWS Prescriptive Guidance

Created by Elie El khoury (AWS)

Code repository: amazon-stepfunctions-ssm-waitfortasktoken

Environment: Production

Technologies: Serverless; DevOps; End-user computing; Operations

AWS services: AWS Step Functions; AWS Systems Manager

Summary

This pattern explains how to integrate AWS Step Functions with AWS Systems Manager. It uses AWS SDK service integrations to call the Systems Manager startAutomationExecution API with a task token from a state machine workflow. The workflow then pauses until the token is returned through a success or failure callback. To demonstrate the integration, this pattern implements an Automation document (runbook) wrapper around the AWS-RunShellScript or AWS-RunPowerShellScript document, and uses the .waitForTaskToken option to call the wrapped document synchronously. For more information about AWS SDK service integrations in Step Functions, see the AWS Step Functions Developer Guide.

Step Functions is a low-code, visual workflow service that you can use to build distributed applications, automate IT and business processes, and build data and machine learning pipelines by using AWS services. Workflows manage failures, retries, parallelization, service integrations, and observability so you can focus on higher-value business logic.

Automation, a capability of AWS Systems Manager, simplifies common maintenance, deployment, and remediation tasks for AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Relational Database Service (Amazon RDS), Amazon Redshift, and Amazon Simple Storage Service (Amazon S3). Automation gives you granular control over the concurrency of your automations. For example, you can specify how many resources to target concurrently, and how many errors can occur before an automation is stopped.

For implementation details, including runbook steps, parameters, and examples, see the Additional information section.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • AWS Identity and Access Management (IAM) permissions to access Step Functions and Systems Manager

  • An EC2 instance with Systems Manager Agent (SSM Agent) installed

  • An IAM instance profile for Systems Manager attached to the instance where you plan to run the runbook

  • A Step Functions role that has the following IAM permissions (which follow the principle of least privilege):

    {
      "Effect": "Allow",
      "Action": "ssm:StartAutomationExecution",
      "Resource": "*"
    }
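
The statement above is a single policy element; a complete IAM policy document also needs a Version and a Statement list. The following Python sketch builds such a document. As an optional tightening that is not part of the original pattern, it can scope Resource to specific runbooks instead of "*". The function name, the Region and account values, and the automation-definition ARN layout are assumptions for illustration; check the ARN format against the Systems Manager documentation before relying on it.

```python
import json


def build_start_automation_policy(region, account_id, runbook_names=None):
    """Return an IAM policy document (dict) that allows ssm:StartAutomationExecution.

    With no runbook names, Resource is "*", exactly as in the pattern.
    With runbook names, Resource is narrowed to those runbooks
    (assumed ARN layout: automation-definition/<name>:<version>).
    """
    if runbook_names:
        resource = [
            f"arn:aws:ssm:{region}:{account_id}:automation-definition/{name}:*"
            for name in runbook_names
        ]
    else:
        resource = "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartAutomationExecution",
                "Resource": resource,
            }
        ],
    }


def policy_as_json(policy):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(policy, indent=2)
```

For example, build_start_automation_policy("us-east-1", "111122223333", ["SfnRunCommandByInstanceIds", "SfnRunCommandByTargets"]) would produce a policy limited to the two runbooks that this pattern deploys; the Region and account ID here are placeholders.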

Product versions

  • SSM document schema version 0.3 or later

  • SSM Agent version 2.3.672.0 or later

Architecture

Target technology stack  

  • AWS Step Functions

  • AWS Systems Manager Automation

Target architecture

Architecture for running Systems Manager automation tasks synchronously from Step Functions

Automation and scale

Tools

AWS services

  • AWS CloudFormation helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and Regions.

  • AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.

  • AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.

  • AWS Systems Manager helps you manage your applications and infrastructure running in the AWS Cloud. It simplifies application and resource management, shortens the time to detect and resolve operational problems, and helps you manage your AWS resources securely at scale.

Code 

The code for this pattern is available in the GitHub Step Functions and Systems Manager implementation repository. 

Epics

Task | Description | Skills required

Download the CloudFormation template.

Download the ssm-automation-documents.cfn.json template from the cloudformation folder of the GitHub repository.

AWS DevOps

Create runbooks.

Sign in to the AWS Management Console, open the AWS CloudFormation console, and deploy the template. For more information about deploying CloudFormation templates, see Creating a stack on the AWS CloudFormation console in the CloudFormation documentation. 

The CloudFormation template deploys three resources:

  • SfnRunCommandByInstanceIds – Runbook that lets you run AWS-RunShellScript or AWS-RunPowerShellScript by using instance IDs.

  • SfnRunCommandByTargets – Runbook that lets you run AWS-RunShellScript or AWS-RunPowerShellScript by using targets.

  • SSMSyncRole – The IAM role assumed by the runbooks.
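
As an alternative to the console, the deployment could be scripted. The following is a minimal Python sketch, assuming boto3 is available and that the client passed in is a boto3 CloudFormation client. The function names and the stack name are hypothetical, and CAPABILITY_NAMED_IAM is an assumption based on the template creating an IAM role.

```python
import json


def build_create_stack_args(stack_name, template_body):
    """Build keyword arguments for the CloudFormation create_stack call."""
    # Raises ValueError early if the downloaded template is not valid JSON.
    json.loads(template_body)
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Assumption: the template creates an IAM role (SSMSyncRole),
        # so IAM capabilities must be acknowledged explicitly.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy_runbooks(cfn_client, stack_name, template_body):
    """Create the stack and return its ID.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    args = build_create_stack_args(stack_name, template_body)
    response = cfn_client.create_stack(**args)
    return response["StackId"]
```

With credentials configured, a call might look like deploy_runbooks(boto3.client("cloudformation"), "sfn-ssm-runbooks", template_text), where template_text holds the contents of ssm-automation-documents.cfn.json and the stack name is a placeholder.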

AWS DevOps

Task | Description | Skills required

Create a test state machine.

Follow the instructions in the AWS Step Functions Developer Guide to create and run a state machine. For the definition, use the following code. Make sure to update the InstanceIds value with the ID of a valid Systems Manager-enabled instance in your account.

{
  "Comment": "A description of my state machine",
  "StartAt": "StartAutomationWaitForCallBack",
  "States": {
    "StartAutomationWaitForCallBack": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:ssm:startAutomationExecution.waitForTaskToken",
      "Parameters": {
        "DocumentName": "SfnRunCommandByInstanceIds",
        "Parameters": {
          "InstanceIds": [ "i-1234567890abcdef0" ],
          "taskToken.$": "States.Array($$.Task.Token)",
          "workingDirectory": [ "/home/ssm-user/" ],
          "Commands": [
            "echo \"This is a test running automation waitForTaskToken\" >> automation.log",
            "sleep 100"
          ],
          "executionTimeout": [ "10800" ],
          "deliveryTimeout": [ "30" ],
          "shell": [ "Shell" ]
        }
      },
      "End": true
    }
  }
}

This code calls the runbook to run two commands that demonstrate the waitForTaskToken call to Systems Manager Automation.

The shell parameter value (Shell or PowerShell) determines whether the Automation document runs AWS-RunShellScript or AWS-RunPowerShellScript.

The task writes "This is a test running automation waitForTaskToken" into the /home/ssm-user/automation.log file, and then sleeps for 100 seconds before it responds with the task token and releases the next task in the workflow.

If you want to call the SfnRunCommandByTargets runbook instead, set DocumentName to SfnRunCommandByTargets and replace the Parameters section of the previous code with the following:

"Parameters": {
  "Targets": [
    {
      "Key": "InstanceIds",
      "Values": [ "i-02573cafcfEXAMPLE", "i-0471e04240EXAMPLE" ]
    }
  ],
AWS DevOps
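
The InstanceIds-based definition shown earlier in this task can also be generated programmatically, which avoids hand-editing JSON when the instance IDs or commands change. The following Python sketch reproduces that definition; the function name and its arguments are hypothetical, and the timeout and working directory values are copied from the example rather than being required values.

```python
import json

# Resource ARN used by the pattern for the callback (.waitForTaskToken) integration.
RESOURCE_ARN = "arn:aws:states:::aws-sdk:ssm:startAutomationExecution.waitForTaskToken"


def build_definition(instance_ids, commands, shell="Shell"):
    """Return the state machine definition (dict) for SfnRunCommandByInstanceIds."""
    if shell not in ("Shell", "PowerShell"):
        raise ValueError("shell must be 'Shell' or 'PowerShell'")
    runbook_parameters = {
        "InstanceIds": list(instance_ids),
        # Runbook parameters are passed as lists, so the task token
        # from the context object is wrapped with States.Array.
        "taskToken.$": "States.Array($$.Task.Token)",
        "workingDirectory": ["/home/ssm-user/"],
        "Commands": list(commands),
        "executionTimeout": ["10800"],
        "deliveryTimeout": ["30"],
        "shell": [shell],
    }
    return {
        "Comment": "A description of my state machine",
        "StartAt": "StartAutomationWaitForCallBack",
        "States": {
            "StartAutomationWaitForCallBack": {
                "Type": "Task",
                "Resource": RESOURCE_ARN,
                "Parameters": {
                    "DocumentName": "SfnRunCommandByInstanceIds",
                    "Parameters": runbook_parameters,
                },
                "End": True,
            }
        },
    }


def definition_as_json(definition):
    """Serialize the definition for the Step Functions console or API."""
    return json.dumps(definition, indent=2)
```

Passing the output of definition_as_json to the state machine editor should give the same definition as the hand-written example, with your own instance IDs and commands substituted.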

Update the IAM role for the state machine.

The previous step automatically creates a dedicated IAM role for the state machine. However, it doesn’t grant permissions to call the runbook. Update the role by adding the following permissions:

{
  "Effect": "Allow",
  "Action": "ssm:StartAutomationExecution",
  "Resource": "*"
}
AWS DevOps

Validate the synchronous calls.

Run the state machine to validate the synchronous call between Step Functions and Systems Manager Automation. 

For sample output, see the Additional information section. 

AWS DevOps
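
The validation can also be scripted. The following Python sketch starts an execution and polls until it reaches a terminal status. It assumes that the client behaves like boto3.client("stepfunctions"); the function name, the polling interval, and the empty input document are illustrative choices, not part of the pattern.

```python
import json
import time

# Statuses after which an execution no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn, poll_seconds=5.0, sleep=time.sleep):
    """Start an execution and block until it finishes; return the final status."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({}),  # the test state machine needs no input
    )
    execution_arn = started["executionArn"]
    while True:
        description = sfn_client.describe_execution(executionArn=execution_arn)
        status = description["status"]
        if status in TERMINAL_STATUSES:
            return status
        # Still running: the task is paused, waiting for the runbook callback.
        sleep(poll_seconds)
```

With the sample definition, the returned status should be SUCCEEDED only after the sleep 100 command has completed on the instance, which is the synchronous behavior that this pattern is meant to show.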

Related resources

Additional information

Implementation details

This pattern provides a CloudFormation template that deploys two Systems Manager runbooks:

  • SfnRunCommandByInstanceIds runs the AWS-RunShellScript or AWS-RunPowerShellScript command by using instance IDs.

  • SfnRunCommandByTargets runs the AWS-RunShellScript or AWS-RunPowerShellScript command by using targets.

Each runbook implements four steps to achieve a synchronous call when using the .waitForTaskToken option in Step Functions.

Step | Action | Description
1 | Branch | Checks the shell parameter value (Shell or PowerShell) to decide whether to run AWS-RunShellScript for Linux or AWS-RunPowerShellScript for Windows.
2 | RunCommand_Shell or RunCommand_PowerShell | Takes several inputs and runs the RunShellScript or RunPowerShellScript command. For more information, check the Details tab for the RunCommand_Shell or RunCommand_PowerShell Automation document on the Systems Manager console.
3 | SendTaskFailure | Runs when step 2 is aborted or canceled. It calls the Step Functions send_task_failure API, which accepts three parameters as input: the token passed by the state machine, the failure error, and a description of the cause of the failure.
4 | SendTaskSuccess | Runs when step 2 is successful. It calls the Step Functions send_task_success API, which accepts the token passed by the state machine as input.
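
Steps 3 and 4 are what release the paused state machine. The following Python sketch shows the general shape of such a callback; it is not the script that the runbooks actually contain. It assumes that the client behaves like boto3.client("stepfunctions"), where send_task_success also requires an output argument that holds a JSON string. The function name, the output payload, and the default error name are hypothetical.

```python
import json


def report_result(sfn_client, task_token, succeeded,
                  error="AutomationFailed", cause="Run Command step did not succeed"):
    """Return the task token to Step Functions with a success or failure result."""
    if succeeded:
        # Resumes the workflow; the output becomes the result of the Task state.
        sfn_client.send_task_success(
            taskToken=task_token,
            output=json.dumps({"status": "Success"}),
        )
    else:
        # Fails the Task state; error and cause appear in the execution history.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error=error,
            cause=cause,
        )
```

Until one of these two calls is made with the matching token, the StartAutomationWaitForCallBack state stays paused, so a runbook that exits without calling either would leave the execution waiting until it times out or is stopped.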

Runbook parameters

SfnRunCommandByInstanceIds runbook:

Parameter name | Type | Optional or required | Description
shell | String | Required | The shell of the target instances (Shell or PowerShell), which determines whether to run AWS-RunShellScript for Linux or AWS-RunPowerShellScript for Windows.
deliveryTimeout | Integer | Optional | The time, in seconds, to wait for a command to deliver to the SSM Agent on an instance. This parameter has a minimum value of 30 (0.5 minutes) and a maximum value of 2592000 (720 hours).
executionTimeout | String | Optional | The time, in seconds, for a command to complete before it is considered to have failed. The default value is 3600 (1 hour). The maximum value is 172800 (48 hours).
workingDirectory | String | Optional | The path to the working directory on your instance.
Commands | StringList | Required | The shell script or command to run.
InstanceIds | StringList | Required | The IDs of the instances where you want to run the command.
taskToken | String | Required | The task token to use for callback responses.

SfnRunCommandByTargets runbook:

Parameter name | Type | Optional or required | Description
shell | String | Required | The shell of the target instances (Shell or PowerShell), which determines whether to run AWS-RunShellScript for Linux or AWS-RunPowerShellScript for Windows.
deliveryTimeout | Integer | Optional | The time, in seconds, to wait for a command to deliver to the SSM Agent on an instance. This parameter has a minimum value of 30 (0.5 minutes) and a maximum value of 2592000 (720 hours).
executionTimeout | Integer | Optional | The time, in seconds, for a command to complete before it is considered to have failed. The default value is 3600 (1 hour). The maximum value is 172800 (48 hours).
workingDirectory | String | Optional | The path to the working directory on your instance.
Commands | StringList | Required | The shell script or command to run.
Targets | MapList | Required | An array of search criteria that identifies instances by using key-value pairs that you specify. For example: [{"Key":"InstanceIds","Values":["i-02573cafcfEXAMPLE","i-0471e04240EXAMPLE"]}]
taskToken | String | Required | The task token to use for callback responses.
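
The limits listed in the two tables can be checked before an execution is started, so that an out-of-range value fails fast instead of surfacing later as an Automation error. The following Python sketch covers only the parameters that both runbooks share. The function name is hypothetical, and the checks are limited to the constraints stated above.

```python
def validate_common_parameters(shell, commands, delivery_timeout=None, execution_timeout=3600):
    """Return a list of problems found in the shared runbook parameters.

    An empty list means that the values satisfy the documented constraints.
    Timeouts are given in seconds.
    """
    problems = []
    if shell not in ("Shell", "PowerShell"):
        problems.append("shell must be 'Shell' or 'PowerShell'")
    if not commands:
        problems.append("Commands is required and must not be empty")
    # deliveryTimeout is optional: minimum 30, maximum 2592000 (720 hours).
    if delivery_timeout is not None and not 30 <= delivery_timeout <= 2592000:
        problems.append("deliveryTimeout must be between 30 and 2592000")
    # executionTimeout defaults to 3600 and has a maximum of 172800 (48 hours).
    if execution_timeout > 172800:
        problems.append("executionTimeout must not exceed 172800")
    return problems
```

For the sample state machine in this pattern (shell Shell, deliveryTimeout 30, executionTimeout 10800), the function returns an empty list.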

Sample output

The following table provides sample output from the state machine execution. The elapsed time between step 5 (TaskSubmitted, 657 ms) and step 6 (TaskSucceeded, 103835 ms) is about 103 seconds. This demonstrates that the state machine waited for the sleep 100 command to finish before moving to the next task in the workflow.

ID | Type | Step | Resource | Elapsed Time (ms) | Timestamp
1 | ExecutionStarted | | - | 0 | Mar 11, 2022 02:50:34.303 PM
2 | TaskStateEntered | StartAutomationWaitForCallBack | - | 40 | Mar 11, 2022 02:50:34.343 PM
3 | TaskScheduled | StartAutomationWaitForCallBack | - | 40 | Mar 11, 2022 02:50:34.343 PM
4 | TaskStarted | StartAutomationWaitForCallBack | - | 154 | Mar 11, 2022 02:50:34.457 PM
5 | TaskSubmitted | StartAutomationWaitForCallBack | - | 657 | Mar 11, 2022 02:50:34.960 PM
6 | TaskSucceeded | StartAutomationWaitForCallBack | - | 103835 | Mar 11, 2022 02:52:18.138 PM
7 | TaskStateExited | StartAutomationWaitForCallBack | - | 103860 | Mar 11, 2022 02:52:18.163 PM
8 | ExecutionSucceeded | | - | 103897 | Mar 11, 2022 02:52:18.200 PM
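
The wait time can be read directly from the elapsed-time column. The following Python sketch does the arithmetic for the sample output above; the function and variable names are illustrative.

```python
# Elapsed times in milliseconds, copied from the sample output.
SAMPLE_EVENTS = {
    "TaskSubmitted": 657,
    "TaskSucceeded": 103835,
}


def callback_wait_ms(events):
    """Return how long the task waited for the callback, in milliseconds."""
    return events["TaskSucceeded"] - events["TaskSubmitted"]


def callback_wait_seconds(events):
    """Same interval, expressed in seconds."""
    return callback_wait_ms(events) / 1000
```

For the sample values, the interval is 103835 - 657 = 103178 ms, or about 103 seconds: the 100-second sleep plus the time that Systems Manager needs to start the automation, deliver the command, and return the token.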