How Step Functions generates IAM policies for integrated services - AWS Step Functions

How Step Functions generates IAM policies for integrated services

When you create a state machine in the AWS Step Functions console, Step Functions produces an AWS Identity and Access Management (IAM) policy based on the resources used in your state machine definition, as follows:

  • For optimized integrations, Step Functions will create a policy with all the necessary permissions and roles for your state machine.

    Tip: You can see example policies in each of the service pages under Integrating optimized services.

  • For standard integrations, Step Functions will create an IAM role with partial permissions.

    You must add any missing role policies that your state machine needs to interact with the service.

Dynamic and static resources

Static resources are defined directly in the task state of your state machine. When you include the information about the resources you want to call directly in your task states, Step Functions can create an IAM role for only those resources.

Dynamic resources are passed as input when starting your state machine, or as input to an individual state, and accessed using JSONata or JSONPath. When you pass dynamic resources to your task, Step Functions cannot automatically scope down the permissions, so it creates a more permissive policy that specifies "Resource": "*".
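The difference between static and dynamic resources can be illustrated with a minimal Python sketch. It is not the console's actual algorithm. It assumes a hypothetical Lambda invoke task, treats a field name ending in ".$" (JSONPath) or a value wrapped in {% ... %} (JSONata) as dynamic, and uses a placeholder function ARN. The helper names is_dynamic and lambda_invoke_statement are invented for illustration, and a generated policy may contain more than this single statement.

```python
import re

# JSONata expressions in a state machine definition are wrapped in {% ... %}.
_JSONATA = re.compile(r"^\{%.*%\}$", re.DOTALL)


def is_dynamic(field_name, value):
    """Return True if the field is resolved at run time rather than fixed."""
    if field_name.endswith(".$"):  # JSONPath style: "FunctionName.$": "$.fn"
        return True
    return isinstance(value, str) and bool(_JSONATA.match(value.strip()))


def lambda_invoke_statement(field_name, value):
    """Build one IAM statement for a hypothetical Lambda invoke task."""
    # A static ARN can be used as the resource; a dynamic value cannot be
    # known in advance, so the statement falls back to "*".
    resource = "*" if is_dynamic(field_name, value) else value
    return {
        "Effect": "Allow",
        "Action": ["lambda:InvokeFunction"],
        "Resource": resource,
    }


# Placeholder ARN, used only for illustration.
static = lambda_invoke_statement(
    "FunctionName", "arn:aws:lambda:us-east-1:123456789012:function:my-fn"
)
dynamic = lambda_invoke_statement("FunctionName.$", "$.functionArn")

print(static["Resource"])
print(dynamic["Resource"])
```

If you know the set of resources that a dynamic value can refer to, you can replace the "*" in the generated policy with those specific ARNs.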

Additional permissions for tasks using .sync

Tasks that use the Run a Job (.sync) pattern require additional permissions for monitoring and receiving a response from the API of connected services.

Step Functions uses two approaches to monitor a job's status when a job is run on a connected service: polling and events.

Polling requires permission for Describe or Get API actions. For example, for Amazon ECS the state machine must have allow permission for ecs:DescribeTasks, and for AWS Glue the state machine requires allow permission for glue:GetJobRun. If the necessary permissions are missing from the role, Step Functions may be unable to determine the status of your job. One reason to use polling is that some service integrations do not support EventBridge events, and some services send events only on a best-effort basis.

Alternatively, you might use events sent from AWS services to Amazon EventBridge. Events are routed to Step Functions by EventBridge with a managed rule, so the role requires permissions for events:PutTargets, events:PutRule, and events:DescribeRule. If these permissions are missing from the role, there may be a delay before Step Functions becomes aware of the completion of your job. For more information about EventBridge events, see Events from AWS services.
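As a rough illustration of how the polling and event permissions sit side by side, the following Python sketch assembles a policy document for a hypothetical Amazon ECS task that uses the .sync pattern. The function name ecs_sync_policy, both ARNs, and the ecs:RunTask statement are assumptions made for the example. The "Resource": "*" on the polling statement is a simplification. Consult the policy template for your target service for the exact statements.

```python
import json


def ecs_sync_policy(task_definition_arn, managed_rule_arn):
    """Sketch of a role policy for an ECS Run a Job (.sync) task."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Start the job (assumed action for this example).
                "Effect": "Allow",
                "Action": ["ecs:RunTask"],
                "Resource": [task_definition_arn],
            },
            {
                # Polling: lets Step Functions ask ECS for the task status.
                "Effect": "Allow",
                "Action": ["ecs:DescribeTasks"],
                "Resource": "*",
            },
            {
                # Events: lets Step Functions set up the managed rule that
                # routes EventBridge events back to the state machine.
                "Effect": "Allow",
                "Action": [
                    "events:PutTargets",
                    "events:PutRule",
                    "events:DescribeRule",
                ],
                "Resource": [managed_rule_arn],
            },
        ],
    }


# Placeholder ARNs, used only for illustration.
policy = ecs_sync_policy(
    "arn:aws:ecs:us-east-1:123456789012:task-definition/example-task:1",
    "arn:aws:events:us-east-1:123456789012:rule/EXAMPLE_MANAGED_RULE",
)
print(json.dumps(policy, indent=2))
```

Keeping the polling and event statements separate makes it easier to see which monitoring approach is affected if one of them is removed.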

Troubleshooting stuck .sync workflows

For Run a Job (.sync) tasks that support both polling and events, your task may complete properly using events, even when the role lacks the required permissions for polling.

In the previous scenario, you might not notice the polling permissions are missing or incorrect. In the rare case that an event fails to be delivered to or processed by Step Functions, your execution could become stuck.

To verify that your polling permissions are configured correctly, you can run an execution in an environment without EventBridge events, as follows:

  • Delete the managed rule in EventBridge that is responsible for forwarding events to Step Functions.

    Note

    Because managed rules are shared by all state machines in your account, you should use a test or development account to avoid unintentional impact to other state machines.

  • You can identify the specific managed rule to delete by inspecting the Resource field used for events:PutRule in the policy template for the target service. The managed rule will be recreated the next time you create or update a state machine that uses that service integration.

  • For more information on deleting EventBridge rules, see Disabling or deleting a rule.
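As a complement to running a test execution, you could statically check a role's policy document for the polling actions that a service integration needs. The Python sketch below is a simplified approximation, not a replacement for IAM policy evaluation. It looks only at Allow statements and their Action patterns, and it ignores Deny statements, NotAction, Resource, and Condition. The helper names and the sample policy are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows Action to be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def allowed_action_patterns(policy):
    """Collect lower-cased Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") == "Allow" and "Action" in stmt:
            patterns += [a.lower() for a in _as_list(stmt["Action"])]
    return patterns


def missing_actions(policy, required):
    """Return the required actions that no Allow pattern matches."""
    patterns = allowed_action_patterns(policy)
    return [
        action
        for action in required
        # Action names are compared case-insensitively; "*" and "?" in a
        # pattern act as wildcards.
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Hypothetical role policy that can start a Glue job but cannot poll it.
role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["glue:StartJobRun"], "Resource": "*"},
        {"Effect": "Allow", "Action": "events:Put*", "Resource": "*"},
    ],
}

print(missing_actions(role_policy, ["glue:GetJobRun", "events:PutRule"]))
# prints ['glue:GetJobRun']
```

A non-empty result suggests that the role may lack a permission needed for polling, which you can then confirm with a test execution as described above.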

Permissions for cancelling workflows

If a task that uses the Run a Job (.sync) pattern is stopped, Step Functions will make a best-effort attempt to cancel the task.

Cancelling a task requires permission for Cancel, Stop, Terminate, or Delete API actions, such as batch:TerminateJob or eks:DeleteCluster. If these permissions are missing from your role, Step Functions will be unable to cancel your task, and you may accrue additional charges while it continues to run. For more information on stopping tasks, see Run a Job.
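To get a quick view of which cancellation-style actions a role grants, you could filter its action list by verb. The Python sketch below is a naming heuristic only. It matches actions whose API name starts with Cancel, Stop, Terminate, or Delete, so it does not detect wildcard grants or actions that carry the verb later in the name. The function name and the sample action list are hypothetical.

```python
# Verbs named in the text above for cancelling a task.
CANCEL_VERBS = ("Cancel", "Stop", "Terminate", "Delete")


def cancellation_actions(actions):
    """Return actions whose API name starts with a cancellation verb."""
    result = []
    for action in actions:
        # Split "service:ApiName" and inspect only the API name.
        _, _, name = action.partition(":")
        if name.startswith(CANCEL_VERBS):
            result.append(action)
    return result


# Hypothetical list of actions granted to a state machine role.
granted = ["batch:SubmitJob", "batch:DescribeJobs", "batch:TerminateJob"]
print(cancellation_actions(granted))
# prints ['batch:TerminateJob']
```

An empty result for a role that runs .sync tasks is a hint to review the policy template for the target service.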

Learn more about integration patterns

To learn about synchronous tasks, see Discover service integration patterns in Step Functions.