How Step Functions generates IAM policies for integrated services - AWS Step Functions

How Step Functions generates IAM policies for integrated services

When you create a state machine in the AWS Step Functions console, Step Functions produces an AWS Identity and Access Management (IAM) policy based on the resources used in your state machine definition as follows:

  • If your state machine uses one of the Optimized integrations, Step Functions will create a policy with the necessary permissions and roles for your state machine. (Exception: MediaConvert integration requires you to manually set up permissions – see IAM policies for calling AWS Elemental MediaConvert.)

  • If your state machine uses one of the AWS SDK integrations, an IAM role with partial permissions will be created. Afterwards, you can use the IAM console to add any missing role policies.

The following examples show how Step Functions generates an IAM policy based on your state machine definition. Items in the example code such as [[resourceName]] are replaced with the static resources listed in your state machine definition. If you have multiple static resources, there will be an entry for each in the IAM role.

Dynamic vs. Static Resources

Static resources are defined directly in the task state of your state machine. When you include the information about the resources you want to call directly in your task states, Step Functions creates an IAM role for only those resources.

Dynamic resources are those that are passed in to your state input, and accessed using a Path (see Using paths to access input for Step Functions workflows). If you are passing dynamic resources to your task, Step Functions will create a more privileged policy that specifies: "Resource": "*".

Additional permissions for tasks using the Run a Job pattern

For tasks that use the Run a Job pattern (those ending in .sync), additional permissions are needed to monitor and receive a response from the API actions of connected services. The related policies include more permissions than for tasks that use the Request Response or Wait for Callback patterns. See Discover service integration patterns in Step Functions for information about synchronous tasks.

Note

You need to provide additional permissions for service integrations that support the Run a Job (.sync) pattern.

Step Functions uses two methods to monitor a job's status when a job is run on a connected service, polling and events.

Polling requires permission for Describe or Get API actions, such as ecs:DescribeTasks or glue:GetJobRun. If these permissions are missing from your role, then Step Functions may be unable to determine the status of your job. This is because some Run a Job (.sync) service integrations do not support EventBridge events, and some services only send events on a best-effort basis.

Events sent from AWS services to Amazon EventBridge are directed to Step Functions using a managed rule, and require permissions for events:PutTargets, events:PutRule, and events:DescribeRule. If these permissions are missing from your role, there may be a delay before Step Functions becomes aware of the completion of your job. For more information about EventBridge events, see Events from AWS services.

Note

For Run a Job (.sync) tasks that support both polling and events, your task may still complete properly using events. This can occur even if your role lacks the required permissions for polling. In this case, you may not immediately notice that the polling permissions are incorrect or missing. In the rare instance that the event fails to be delivered to or processed by Step Functions, your execution could become stuck. To verify that your polling permissions are configured correctly, you can run an execution in an environment without EventBridge events in the following ways:

  • Delete the managed rule from EventBridge, which is responsible for forwarding events to Step Functions. This managed rule is shared by all state machines in your account, so you should perform this action only in a test or development account to avoid any unintentional impact on other state machines. You can identify the specific managed rule to delete by inspecting the Resource field used for events:PutRule in the policy template for the target service. The managed rule will be recreated the next time you create or update a state machine that uses that service integration. For more information on deleting EventBridge rules, see Disabling or deleting a rule.

  • Test with Step Functions Local, which does not support the use of events to complete Run a Job (.sync) tasks. To use Step Functions Local, assume the IAM role used by your state machine. You may need to edit the Trust Relationship. Set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables to the assumed role's values, then launch Step Functions Local using java -jar StepFunctionsLocal.jar. Last, use the AWS CLI with the --endpoint-url parameter to create a state machine, start an execution, and get the execution history. For more information, see Testing state machines with Step Functions Local (unsupported).

If a task that uses the Run a Job (.sync) pattern is stopped, Step Functions will make a best-effort attempt to cancel the task. This requires permission to Cancel, Stop, Terminate, or Delete API actions, such as batch:TerminateJob or eks:DeleteCluster. If these permissions are missing from your role, Step Functions will be unable to cancel your task and you may accrue additional charges while it continues to run. For more information on stopping tasks, see Run a Job.