Accessing S3 data in another AWS account from EMR Serverless
You can run Amazon EMR Serverless jobs from one AWS account and configure them to access data in Amazon S3 buckets that belong to another AWS account. This page describes how to configure cross-account access to S3 from EMR Serverless.
Jobs that run on EMR Serverless can use an S3 bucket policy or an assumed role to access data in Amazon S3 from a different AWS account.
Prerequisites
To set up cross-account access for Amazon EMR Serverless, you must complete tasks while signed in to two AWS accounts:
- AccountA: This is the AWS account where you have created an Amazon EMR Serverless application. Before you set up cross-account access, you must have the following ready in this account:
  - An Amazon EMR Serverless application where you want to run jobs.
  - A job execution role that has the required permissions to run jobs in the application. For more information, see Job runtime roles for Amazon EMR Serverless.
- AccountB: This is the AWS account that contains the S3 bucket that you want your Amazon EMR Serverless jobs to access.
Use an S3 bucket policy to access cross-account S3 data
To access the S3 bucket in account B from account A, attach the following policy to the S3 bucket in account B.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Example permissions 1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AccountA:root"
      },
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket_name_in_AccountB"
      ]
    },
    {
      "Sid": "Example permissions 2",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::AccountA:root"
      },
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket_name_in_AccountB/*"
      ]
    }
  ]
}
For more information about S3 cross-account access with S3 bucket policies, see Example 2: Bucket owner granting cross-account bucket permissions in the Amazon Simple Storage Service User Guide.
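As a minimal sketch, the bucket policy above can be generated from an account ID and a bucket name with Python's standard library. The function name `build_bucket_policy` and the sample account ID `111122223333` are illustrative assumptions, not part of the EMR Serverless documentation. Attaching the policy to the bucket is not shown.

```python
import json


def build_bucket_policy(account_a_id: str, bucket_name: str) -> dict:
    """Return the cross-account bucket policy from this section as a dict.

    account_a_id and bucket_name are placeholders supplied by the caller.
    """
    principal = {"AWS": f"arn:aws:iam::{account_a_id}:root"}
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: the resource is the bucket itself.
                "Sid": "Example permissions 1",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level permissions: the resource is every key in the bucket.
                "Sid": "Example permissions 2",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(json.dumps(build_bucket_policy("111122223333", "bucket_name_in_AccountB"), indent=2))
```

Note that `s3:ListBucket` is granted on the bucket ARN, while the object actions are granted on the `/*` resource; mixing the two up is a common cause of access-denied errors.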
Use an assumed role to access cross-account S3 data
Another way to set up cross-account access for Amazon EMR Serverless is with the AssumeRole action from the AWS Security Token Service (AWS STS). AWS STS is a global web service that lets you request temporary, limited-privilege credentials for users. You can make API calls to EMR Serverless and Amazon S3 with the temporary security credentials that you create with AssumeRole.
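For orientation, the following is a minimal sketch of what an AssumeRole call looks like outside of EMR Serverless, using the AWS SDK for Python (boto3). It assumes that boto3 is installed and that the caller is allowed to assume the role. The helper names and the session name are hypothetical. Within an EMR Serverless job, you would not normally write this yourself; the credentials provider settings described later on this page are the intended mechanism.

```python
def client_kwargs_from_assume_role(response: dict) -> dict:
    """Map an STS AssumeRole response to boto3 client keyword arguments."""
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_cross_account_keys(role_arn: str, bucket: str) -> list:
    """Assume role_arn, then list object keys in bucket with the temporary credentials.

    Returns at most one page of results (up to 1,000 keys).
    """
    import boto3  # imported lazily so the helper above works without boto3

    sts = boto3.client("sts")
    # "cross-account-demo" is an arbitrary, hypothetical session name.
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName="cross-account-demo")
    s3 = boto3.client("s3", **client_kwargs_from_assume_role(response))
    listing = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in listing.get("Contents", [])]
```

The temporary credentials always include a session token, so all three values must be passed to the client together.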
The following steps illustrate how to use an assumed role to access cross-account S3 data from EMR Serverless:
1. Create an Amazon S3 bucket, cross-account-bucket, in AccountB. For more information, see Creating a bucket in the Amazon Simple Storage Service User Guide. If you want to have cross-account access to DynamoDB, you can also create a DynamoDB table in AccountB. For more information, see Creating a DynamoDB table in the Amazon DynamoDB Developer Guide.
2. Create a Cross-Account-Role-B IAM role in AccountB that can access the cross-account-bucket.
   a. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
   b. Choose Roles and create a new role: Cross-Account-Role-B. For more information about how to create IAM roles, see Creating IAM roles in the IAM User Guide.
   c. Create an IAM policy that specifies the permissions for Cross-Account-Role-B to access the cross-account-bucket S3 bucket, as the following policy statement demonstrates. Then attach the IAM policy to Cross-Account-Role-B. For more information, see Creating IAM policies in the IAM User Guide.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
              "arn:aws:s3:::cross-account-bucket",
              "arn:aws:s3:::cross-account-bucket/*"
            ]
          }
        ]
      }

      If you require DynamoDB access, create an IAM policy that specifies permissions to access the cross-account DynamoDB table. Then attach the IAM policy to Cross-Account-Role-B. For more information, see Amazon DynamoDB: Allows access to a specific table in the IAM User Guide.

      The following is a policy to allow access to the DynamoDB table CrossAccountTable.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "dynamodb:*",
            "Resource": "arn:aws:dynamodb:MyRegion:AccountB:table/CrossAccountTable"
          }
        ]
      }
3. Edit the trust relationship for the Cross-Account-Role-B role.
   a. To configure the trust relationship for the role, choose the Trust Relationships tab in the IAM console for the role Cross-Account-Role-B that you created in Step 2.
   b. Select Edit Trust Relationship.
   c. Add the following policy document. This allows Job-Execution-Role-A in AccountA to assume the Cross-Account-Role-B role.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::AccountA:role/Job-Execution-Role-A"
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
4. Grant Job-Execution-Role-A in AccountA the AWS STS AssumeRole permission to assume Cross-Account-Role-B.
   a. In the IAM console for AWS account AccountA, select Job-Execution-Role-A.
   b. Add the following policy statement to the Job-Execution-Role-A to allow the AssumeRole action on the Cross-Account-Role-B role.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::AccountB:role/Cross-Account-Role-B"
          }
        ]
      }
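The three policy documents from steps 2 through 4 can be sketched as plain Python dictionaries, which makes the relationship between the two accounts easier to see: the trust policy on Cross-Account-Role-B names the role in AccountA, and the policy on Job-Execution-Role-A names the role in AccountB. The function names and the 12-digit account IDs below are illustrative assumptions. Creating the roles and attaching the policies is not shown.

```python
import json


def role_b_s3_policy(bucket: str) -> dict:
    """Permissions policy for Cross-Account-Role-B (step 2)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def role_b_trust_policy(account_a_id: str, job_role: str = "Job-Execution-Role-A") -> dict:
    """Trust policy on Cross-Account-Role-B: who may assume it (step 3)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_a_id}:role/{job_role}"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def role_a_assume_policy(account_b_id: str, cross_role: str = "Cross-Account-Role-B") -> dict:
    """Policy on Job-Execution-Role-A: which role it may assume (step 4)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::{account_b_id}:role/{cross_role}",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account IDs: 111122223333 is AccountA, 444455556666 is AccountB.
    for doc in (
        role_b_s3_policy("cross-account-bucket"),
        role_b_trust_policy("111122223333"),
        role_a_assume_policy("444455556666"),
    ):
        print(json.dumps(doc, indent=2))
```

Both sides are required: the trust policy alone does not let the job role call AssumeRole, and the job role's permission alone does not make Cross-Account-Role-B trust it.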
Assumed role examples
You can use a single assumed role to access all S3 resources in an account, or with Amazon EMR 6.11 and higher, you can configure multiple IAM roles to assume when you access different cross-account S3 buckets.
Access S3 resources with one assumed role
Note
When you configure a job to use a single assumed role, all S3 resources throughout the job use that role, including the entryPoint script.
If you want to use a single assumed role to access all S3 resources in account B, specify the following configurations:
- Set the EMRFS configuration fs.s3.customAWSCredentialsProvider by specifying spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.emr.AssumeRoleAWSCredentialsProvider.
- For Spark, use spark.emr-serverless.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN and spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN to specify the environment variables on the driver and executors.
- For Hive, use hive.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN, tez.am.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN, and tez.task.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN to specify the environment variables on the Hive driver, Tez application master, and Tez task containers.
The following examples show how to use an assumed role to start an EMR Serverless job run with cross-account access.
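A minimal sketch of a Spark job run that uses a single assumed role, written for the AWS SDK for Python (boto3), might look like the following. The property names come from the list above. The helper names, the `spark-defaults` classification, the application ID, the script location, and the account IDs are assumptions for illustration. The request is built as a plain dictionary so that it can be inspected before anything is submitted.

```python
ASSUME_ROLE_PROVIDER = "com.amazonaws.emr.AssumeRoleAWSCredentialsProvider"


def spark_assume_role_properties(role_arn: str) -> dict:
    """Spark properties that make the driver and executors assume role_arn."""
    return {
        "spark.hadoop.fs.s3.customAWSCredentialsProvider": ASSUME_ROLE_PROVIDER,
        "spark.emr-serverless.driverEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": role_arn,
        "spark.executorEnv.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": role_arn,
    }


def hive_assume_role_env_properties(role_arn: str) -> dict:
    """Hive and Tez environment properties from the list above."""
    return {
        "hive.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": role_arn,
        "tez.am.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": role_arn,
        "tez.task.emr-serverless.launch.env.ASSUME_ROLE_CREDENTIALS_ROLE_ARN": role_arn,
    }


def build_spark_job_run_request(
    application_id: str, execution_role_arn: str, entry_point: str, assumed_role_arn: str
) -> dict:
    """Build keyword arguments for the EMR Serverless StartJobRun API."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        # With a single assumed role, the entry point is also read with that role.
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "spark-defaults",
                    "properties": spark_assume_role_properties(assumed_role_arn),
                }
            ]
        },
    }


def start_job_run(request: dict) -> str:
    """Submit the request and return the job run ID (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    return boto3.client("emr-serverless").start_job_run(**request)["jobRunId"]


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    request = build_spark_job_run_request(
        application_id="application-id",
        execution_role_arn="arn:aws:iam::111122223333:role/Job-Execution-Role-A",
        entry_point="s3://cross-account-bucket/scripts/job.py",
        assumed_role_arn="arn:aws:iam::444455556666:role/Cross-Account-Role-B",
    )
    print(request["configurationOverrides"])
```

The execution role stays Job-Execution-Role-A in AccountA; only the S3 access inside the job goes through Cross-Account-Role-B.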
Access S3 resources with multiple assumed roles
With EMR Serverless releases 6.11.0 and higher, you can configure multiple IAM roles to assume when you access different cross-account buckets. If you want to access different S3 resources with different assumed roles in account B, use the following configurations when you start the job run:
- Set the EMRFS configuration fs.s3.customAWSCredentialsProvider to com.amazonaws.emr.serverless.credentialsprovider.BucketLevelAssumeRoleCredentialsProvider.
- Set the EMRFS configuration fs.s3.bucketLevelAssumeRoleMapping to define the mapping from each S3 bucket name to the IAM role in account B to assume. The value should be in the format bucket1->role1;bucket2->role2.
For example, you can use arn:aws:iam::AccountB:role/Cross-Account-Role-B-1 to access bucket bucket1, and use arn:aws:iam::AccountB:role/Cross-Account-Role-B-2 to access bucket bucket2. The following examples show how to start an EMR Serverless job run with cross-account access through multiple assumed roles.
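As a minimal sketch, the bucket-to-role mapping string can be built from a Python dictionary and placed alongside the credentials provider setting. The helper names are hypothetical. Prefixing both EMRFS keys with `spark.hadoop.` is an assumption that mirrors how the single-role provider is passed to Spark earlier on this page.

```python
BUCKET_LEVEL_PROVIDER = (
    "com.amazonaws.emr.serverless.credentialsprovider."
    "BucketLevelAssumeRoleCredentialsProvider"
)


def format_bucket_role_mapping(mapping: dict) -> str:
    """Render {bucket: role_arn} in the bucket1->role1;bucket2->role2 format."""
    return ";".join(f"{bucket}->{role_arn}" for bucket, role_arn in mapping.items())


def spark_multi_role_properties(mapping: dict) -> dict:
    """Spark properties for per-bucket assumed roles (spark.hadoop. prefix assumed)."""
    return {
        "spark.hadoop.fs.s3.customAWSCredentialsProvider": BUCKET_LEVEL_PROVIDER,
        "spark.hadoop.fs.s3.bucketLevelAssumeRoleMapping": format_bucket_role_mapping(mapping),
    }


if __name__ == "__main__":
    # AccountB is left as a placeholder, as in the text above.
    properties = spark_multi_role_properties(
        {
            "bucket1": "arn:aws:iam::AccountB:role/Cross-Account-Role-B-1",
            "bucket2": "arn:aws:iam::AccountB:role/Cross-Account-Role-B-2",
        }
    )
    for key, value in properties.items():
        print(f"{key}={value}")
```

These properties would go in the `properties` map of an application configuration entry when you start the job run. Each role in the mapping also needs the trust relationship and AssumeRole permission described in the steps earlier on this page.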