

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/big-data/migrate-workloads-from-aws-data-pipeline/)

# Identity and Access Management for AWS Data Pipeline
<a name="dp-control-access"></a>

Your security credentials identify you to services in AWS and grant you permissions to use AWS resources, such as your pipelines. You can use features of AWS Data Pipeline and AWS Identity and Access Management (IAM) to allow AWS Data Pipeline and other users to access your AWS Data Pipeline resources without sharing your security credentials.

Organizations can share access to pipelines so that the individuals in that organization can develop and maintain them collaboratively. However, it might be necessary to do the following:
+ Control which users can access specific pipelines
+ Protect a production pipeline from being edited by mistake
+ Allow an auditor to have read-only access to pipelines, but prevent them from making changes

AWS Data Pipeline is integrated with AWS Identity and Access Management (IAM), which offers a wide range of features:
+ Create users and groups in your AWS account.
+ Easily share your AWS resources between the users in your AWS account.
+ Assign unique security credentials to each user.
+ Control each user's access to services and resources.
+ Get a single bill for all users in your AWS account.

By using IAM with AWS Data Pipeline, you can control whether users in your organization can perform a task using specific API actions and whether they can use specific AWS resources. You can use IAM policies based on pipeline tags and worker groups to share your pipelines with other users and control the level of access they have.

**Topics**
+ [IAM Policies for AWS Data Pipeline](dp-iam-resourcebased-access.md)
+ [Example Policies for AWS Data Pipeline](dp-example-tag-policies.md)
+ [IAM Roles for AWS Data Pipeline](dp-iam-roles.md)

# IAM Policies for AWS Data Pipeline
<a name="dp-iam-resourcebased-access"></a>

By default, IAM entities don't have permission to create or modify AWS resources. To allow IAM entities to create or modify resources and perform tasks, you must create IAM policies that grant IAM entities permission to use the specific resources and API actions they'll need, and then attach those policies to the IAM entities that require those permissions.

When you attach a policy to a user or group of users, it allows or denies the users permission to perform the specified tasks on the specified resources. For general information about IAM policies, see [Permissions and Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/PermissionsAndPolicies.html) in the *IAM User Guide*. For more information about managing and creating custom IAM policies, see [Managing IAM Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/ManagingPolicies.html).

**Topics**
+ [Policy Syntax](#dp-policy-syntax)
+ [Controlling Access to Pipelines Using Tags](#dp-control-access-tags)
+ [Controlling Access to Pipelines Using Worker Groups](#dp-control-access-workergroup)

## Policy Syntax
<a name="dp-policy-syntax"></a>

An IAM policy is a JSON document that consists of one or more statements. Each statement is structured as follows:

```
{
  "Statement":[{
    "Effect":"effect",
    "Action":"action",
    "Resource":"*",
    "Condition":{
      "condition":{
        "key":"value"
        }
      }
    }
  ]
}
```

The following elements make up a policy statement:
+ **Effect:** The *effect* can be `Allow` or `Deny`. By default, IAM entities don't have permission to use resources and API actions, so all requests are denied. An explicit allow overrides the default. An explicit deny overrides any allows.
+ **Action**: The *action* is the specific API action for which you are granting or denying permission. For a list of actions for AWS Data Pipeline, see [Actions](https://docs.aws.amazon.com/datapipeline/latest/APIReference/API_Operations.html) in the *AWS Data Pipeline API Reference*.
+ **Resource**: The resource that's affected by the action. The only valid value here is `"*"`. 
+ **Condition**: Conditions are optional. They can be used to control when your policy will be in effect.

  AWS Data Pipeline implements the AWS-wide context keys (see [Available Keys for Conditions](https://docs.aws.amazon.com/IAM/latest/UserGuide/AccessPolicyLanguage_ElementDescriptions.html#AvailableKeys)), plus the following service-specific keys.
  + `datapipeline:PipelineCreator` — To grant access to the user that created the pipeline. For an example, see [Grant the pipeline owner full access](dp-example-tag-policies.md#ex3).
  + `datapipeline:Tag` — To grant access based on pipeline tagging. For more information, see [Controlling Access to Pipelines Using Tags](#dp-control-access-tags).
  + `datapipeline:workerGroup` — To grant access based on the name of the worker group. For more information, see [Controlling Access to Pipelines Using Worker Groups](#dp-control-access-workergroup).

## Controlling Access to Pipelines Using Tags
<a name="dp-control-access-tags"></a>

You can create IAM policies that reference the tags for your pipeline. This enables you to use pipeline tagging to do the following:
+ Grant read-only access to a pipeline
+ Grant read/write access to a pipeline
+ Block access to a pipeline

For example, suppose that a manager has two pipeline environments, production and development, and an IAM group for each environment. For pipelines in the production environment, the manager grants read/write access to users in the production IAM group, but grants read-only access to users in the developer IAM group. For pipelines in the development environment, the manager grants read/write access to both the production and developer IAM groups.

To achieve this scenario, the manager tags the production pipelines with the "environment=production" tag and attaches the following policy to the developer IAM group. The first statement grants read-only access to all pipelines. The second statement grants read/write access to pipelines that do not have an "environment=production" tag.

------
#### [ JSON ]

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datapipeline:Describe*",
        "datapipeline:ListPipelines",
        "datapipeline:GetPipelineDefinition",
        "datapipeline:QueryObjects"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "datapipeline:*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {"datapipeline:Tag/environment": "production"}
      }
    }
  ]
}
```

------

In addition, the manager attaches the following policy to the production IAM group. This statement grants full access to all pipelines.

------
#### [ JSON ]

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "datapipeline:*",
      "Resource": "*"
    }
  ]
}
```

------

For more examples, see [Grant users read-only access based on a tag](dp-example-tag-policies.md#ex1) and [Grant users full access based on a tag](dp-example-tag-policies.md#ex2).

## Controlling Access to Pipelines Using Worker Groups
<a name="dp-control-access-workergroup"></a>

You can create IAM policies that reference worker group names.

For example, suppose that a manager has two pipeline environments, production and development, and an IAM group for each environment. The manager has three database servers with task runners configured for production, pre-production, and developer environments, respectively. The manager wants to ensure that users in the production IAM group can create pipelines that push tasks to production resources, and that users in the development IAM group can create pipelines that push tasks to both pre-production and developer resources.

To achieve this scenario, the manager installs Task Runner on the production resources with production credentials, and sets `workerGroup` to "prodresource". In addition, the manager installs Task Runner on the development resources with development credentials, and sets `workerGroup` to "pre-production" and "development". The manager attaches the following policy to the developer IAM group to block access to "prodresource" resources. The first statement grants read-only access to all pipelines. The second statement grants read/write access to pipelines when the name of the worker group has a prefix of "dev" or "pre-prod".
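A sketch of such a policy follows. The exact statements are an assumption reconstructed from the description above; adjust the `dev*` and `pre-prod*` prefixes to match your worker group names.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datapipeline:Describe*",
        "datapipeline:ListPipelines",
        "datapipeline:GetPipelineDefinition",
        "datapipeline:QueryObjects"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "datapipeline:*",
      "Resource": "*",
      "Condition": {
        "StringLike": {"datapipeline:workerGroup": ["dev*", "pre-prod*"]}
      }
    }
  ]
}
```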

In addition, the manager attaches the following policy to the production IAM group to grant access to "prodresource" resources. The first statement grants read-only access to all pipelines. The second statement grants read/write access when the name of the worker group has a prefix of "prod".
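A sketch consistent with that description (the statements are an assumption; adjust the `prod*` prefix to your worker group names):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datapipeline:Describe*",
        "datapipeline:ListPipelines",
        "datapipeline:GetPipelineDefinition",
        "datapipeline:QueryObjects"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "datapipeline:*",
      "Resource": "*",
      "Condition": {
        "StringLike": {"datapipeline:workerGroup": "prod*"}
      }
    }
  ]
}
```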

# Example Policies for AWS Data Pipeline
<a name="dp-example-tag-policies"></a>

The following examples demonstrate how to grant users full or restricted access to pipelines.

**Topics**
+ [Example 1: Grant users read-only access based on a tag](#ex1)
+ [Example 2: Grant users full access based on a tag](#ex2)
+ [Example 3: Grant the pipeline owner full access](#ex3)
+ [Example 4: Grant users access to the AWS Data Pipeline console](#example4-grant-users-access-to-console)

## Example 1: Grant users read-only access based on a tag
<a name="ex1"></a>

The following policy allows users to use the read-only AWS Data Pipeline API actions, but only with pipelines that have the tag "environment=production". 

**Note**  
The ListPipelines API action does not support tag-based authorization.

------
#### [ JSON ]

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datapipeline:Describe*",
        "datapipeline:GetPipelineDefinition",
        "datapipeline:ValidatePipelineDefinition",
        "datapipeline:QueryObjects"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "datapipeline:Tag/environment": "production"
        }
      }
    }
  ]
}
```

------

## Example 2: Grant users full access based on a tag
<a name="ex2"></a>

The following policy allows users to use all AWS Data Pipeline API actions, with the exception of ListPipelines, but only with pipelines that have the tag "environment=test".

------
#### [ JSON ]

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datapipeline:*"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "datapipeline:Tag/environment": "test"
        }
      }
    }
  ]
}
```

------

## Example 3: Grant the pipeline owner full access
<a name="ex3"></a>

The following policy allows users to use all the AWS Data Pipeline API actions, but only with their own pipelines.
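One way to express this is with the `datapipeline:PipelineCreator` condition key, comparing it to the ID of the requesting user; treat the following as a sketch rather than a definitive policy.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "datapipeline:*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {"datapipeline:PipelineCreator": "${aws:userid}"}
      }
    }
  ]
}
```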

## Example 4: Grant users access to the AWS Data Pipeline console
<a name="example4-grant-users-access-to-console"></a>

The following policy allows users to create and manage a pipeline using the AWS Data Pipeline console. 

This policy includes `iam:PassRole` permissions for the specific role ARNs that AWS Data Pipeline needs. For more information about the identity-based (IAM) `PassRole` permission, see the blog post [Granting Permission to Launch EC2 Instances with IAM Roles (PassRole Permission)](https://aws.amazon.com/blogs/security/granting-permission-to-launch-ec2-instances-with-iam-roles-passrole-permission/).

------
#### [ JSON ]

```
{
	"Version":"2012-10-17",		 	 	 
	"Statement": [{
			"Action": [
				"cloudwatch:*",
				"datapipeline:*",
				"dynamodb:DescribeTable",
				"elasticmapreduce:AddJobFlowSteps",
				"elasticmapreduce:ListInstance*",
				"iam:AddRoleToInstanceProfile",
				"iam:CreateInstanceProfile",
				"iam:GetInstanceProfile",
				"iam:GetRole",
				"iam:GetRolePolicy",
				"iam:ListInstanceProfiles",
				"iam:ListInstanceProfilesForRole",
				"iam:ListRoles",
				"rds:DescribeDBInstances",
				"rds:DescribeDBSecurityGroups",
				"redshift:DescribeClusters",
				"redshift:DescribeClusterSecurityGroups",
				"s3:List*",
				"sns:ListTopics"
			],
			"Effect": "Allow",
			"Resource": [
				"*"
			]
		},
		{
			"Action": "iam:PassRole",
			"Effect": "Allow",
			"Resource": [
				"arn:aws:iam::*:role/DataPipelineDefaultResourceRole",
				"arn:aws:iam::*:role/DataPipelineDefaultRole"
			]
		}
	]
}
```

------

# IAM Roles for AWS Data Pipeline
<a name="dp-iam-roles"></a>

AWS Data Pipeline uses AWS Identity and Access Management roles. The permissions policies attached to IAM roles determine what actions AWS Data Pipeline and your applications can perform, and what AWS resources they can access. For more information, see [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) in the *IAM User Guide*.

AWS Data Pipeline requires two IAM roles:
+ **The pipeline role** controls AWS Data Pipeline access to your AWS resources. In pipeline object definitions, the `role` field specifies this role.
+ **The EC2 instance role** controls the access that applications running on EC2 instances, including the EC2 instances in Amazon EMR clusters, have to AWS resources. In pipeline object definitions, the `resourceRole` field specifies this role.
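For example, a pipeline definition file might specify both fields on its `Default` object. The role names shown here are the console defaults; substitute the roles you use.

```
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    }
  ]
}
```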

**Important**  
If you created a pipeline before October 3, 2022 using the AWS Data Pipeline console with default roles, AWS Data Pipeline created the `DataPipelineDefaultRole` for you and attached the `AWSDataPipelineRole` managed policy to the role. As of October 3, 2022, the `AWSDataPipelineRole` managed policy is deprecated and the pipeline role must be specified for a pipeline when using the console.  
We recommend that you review existing pipelines and determine if the `DataPipelineDefaultRole` is associated with the pipeline and whether the `AWSDataPipelineRole` is attached to that role. If so, review the access that this policy allows to ensure it is appropriate for your security requirements. Add, update, or replace the policies and policy statements attached to this role as necessary. Alternatively, you can update a pipeline to use a role that you create with different permissions policies.

## Example Permissions Policies for AWS Data Pipeline Roles
<a name="dp-role-permissions-policy-examples"></a>

Each role has one or more permissions policies attached to it that determine the AWS resources that the role can access and the actions that the role can perform. This topic provides an example permissions policy for the pipeline role. It also provides the contents of the `AmazonEC2RoleforDataPipelineRole`, which is the managed policy for the default EC2 instance role, `DataPipelineDefaultResourceRole`.

### Example Pipeline Role Permissions Policy
<a name="dp-role-example-policy"></a>

The example policy that follows is scoped to allow essential functions that AWS Data Pipeline requires to run a pipeline with Amazon EC2 and Amazon EMR resources. It also provides permissions to access other AWS resources, such as Amazon Simple Storage Service and Amazon Simple Notification Service, that many pipelines require. If the objects defined in a pipeline do not require the resources of an AWS service, we strongly recommend that you remove permissions to access that service. For example, if your pipeline does not define a [DynamoDBDataNode](dp-object-dynamodbdatanode.md) or use the [SnsAlarm](dp-object-snsalarm.md) action, we recommend that you remove the allow statements for those actions.
+ Replace `111122223333` with your AWS account ID.
+ Replace `NameOfDataPipelineRole` with the name of the pipeline role (the role to which this policy is attached).
+ Replace `NameOfDataPipelineResourceRole` with the name of the EC2 instance role.
+ Replace `us-west-1` with the appropriate Region for your application.
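The following trimmed sketch illustrates the overall shape of such a policy using the placeholders above: describe and launch permissions for the compute and storage services a pipeline typically touches, plus `iam:PassRole` limited to the two pipeline roles. It is an illustration, not the complete example policy; the full policy also scopes some statements to a Region such as `us-west-1`.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:TerminateJobFlows",
        "s3:Get*",
        "s3:List*",
        "s3:Put*",
        "sns:Publish"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::111122223333:role/NameOfDataPipelineRole",
        "arn:aws:iam::111122223333:role/NameOfDataPipelineResourceRole"
      ]
    }
  ]
}
```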

### Default Managed Policy for the EC2 Instance Role
<a name="dp-resource-role-example-policy"></a>

The contents of the `AmazonEC2RoleforDataPipelineRole` policy are shown below. This is the managed policy attached to the default resource role for AWS Data Pipeline, `DataPipelineDefaultResourceRole`. When you define a resource role for your pipeline, we recommend that you begin with this permissions policy and then remove permissions for AWS service actions that are not required.

Version 3 of the policy is shown; it is the most recent version at the time of this writing. You can view the most recent version of the policy in the IAM console.

------
#### [ JSON ]

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [{
      "Effect": "Allow",
      "Action": [
        "cloudwatch:*",
        "datapipeline:*",
        "dynamodb:*",
        "ec2:Describe*",
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:ListInstance*",
        "elasticmapreduce:ModifyInstanceGroups",
        "rds:Describe*",
        "redshift:DescribeClusters",
        "redshift:DescribeClusterSecurityGroups",
        "s3:*",
        "sdb:*",
        "sns:*",
        "sqs:*"
      ],
      "Resource": ["*"]
    }]
}
```

------

## Creating IAM Roles for AWS Data Pipeline and Editing Role Permissions
<a name="dp-iam-roles-new"></a>

Use the following procedures to create roles for AWS Data Pipeline using the IAM console. The process consists of two steps. First, you create a permissions policy to attach to the role. Next, you create the role and attach the policy. After you create a role, you can change the role's permissions by attaching and detaching permissions policies.

**Note**  
When you create roles for AWS Data Pipeline using the console as described below, IAM creates and attaches the appropriate trust policies that the role requires.
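For reference, the trust policy that IAM attaches to a pipeline role has this minimal shape (the EC2 instance role uses the `ec2.amazonaws.com` service principal instead):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "datapipeline.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
```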

**To create a permissions policy to use with a role for AWS Data Pipeline**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies**, and then choose **Create policy**.

1. <a name="step3"></a>Choose the **JSON** tab.

1. If you are creating a pipeline role, copy and paste the contents of the policy example in [Example Pipeline Role Permissions Policy](#dp-role-example-policy), editing it as appropriate for your security requirements. Alternatively, if you are creating a custom EC2 instance role, do the same for the example in [Default Managed Policy for the EC2 Instance Role](#dp-resource-role-example-policy).

1. Choose **Review policy**.

1. Enter a name for the policy—for example, `MyDataPipelineRolePolicy`—and an optional **Description**, and then choose **Create policy**.

1. Note the name of the policy. You need it when you create your role.

**To create an IAM role for AWS Data Pipeline**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**, and then choose **Create role**.

1. Under **Choose a use case**, choose **Data Pipeline**.

1. Under **Select your use case**, do one of the following:
   + Choose `Data Pipeline` to create a pipeline role.
   + Choose `EC2 Role for Data Pipeline` to create a resource role.

1. Choose **Next: Permissions**.

1. If the default policy for AWS Data Pipeline is listed, proceed with the following steps to create the role and then edit it according to the instructions in the next procedure. Otherwise, enter the name of the policy that you created in the procedure above, and then select it from the list.

1. Choose **Next: Tags**, enter any tags to add to the role, and then choose **Next: Review**.

1. Enter a name for the role—for example, `MyDataPipelineRole`—and an optional **Description**, and then choose **Create role**.

**To attach or detach a permissions policy for an IAM role for AWS Data Pipeline**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**.

1. In the search box, begin typing the name of the role you want to edit—for example, **DataPipelineDefaultRole** or **MyDataPipelineRole**—and then choose the **Role name** from the list.

1. On the **Permissions** tab, do the following:
   + To detach a permissions policy, under **Permissions policies**, choose the remove button on the far right of the policy entry. Choose **Detach** when prompted to confirm.
   + To attach a policy that you created earlier, choose **Attach policies**. In the search box, begin typing the name of the policy you want to edit, select it from the list, and then choose **Attach policy**.

## Changing Roles for an Existing Pipeline
<a name="dp-iam-change-console"></a>

If you want to assign a different pipeline role or resource role to a pipeline, you can use the architect editor in the AWS Data Pipeline console.

**To edit the roles assigned to a pipeline using the console**

1. Open the AWS Data Pipeline console at [https://console.aws.amazon.com/datapipeline/](https://console.aws.amazon.com/datapipeline/).

1. Select the pipeline from the list, and then choose **Actions**, **Edit**.

1. In the right pane of the architect editor, choose **Others**.

1. From the **Resource Role** and **Role** lists, choose the roles for AWS Data Pipeline that you want to assign, and then choose **Save**.