Set up the permissions to enable listing and launching Amazon EMR applications from SageMaker Studio
In this section, we detail the roles and permissions required to list and connect to EMR Serverless applications from SageMaker Studio, considering scenarios where Studio and the EMR Serverless applications are deployed in the same AWS account or across different accounts.
The roles to which you must add the necessary permissions depend on whether Studio and your EMR Serverless applications reside in the same AWS account (Single Account) or in separate accounts (Cross Account). There are two types of roles involved:
-
Execution roles:
-
Runtime execution roles (Role-Based Access Control roles) used by EMR Serverless: These are the IAM roles used by the EMR Serverless job execution environments to access other AWS services and resources needed during runtime, such as Amazon S3 for data access, CloudWatch for logging, access to the AWS Glue Data Catalog or other services based on your workload requirements. We recommend creating these roles in the account where the EMR Serverless applications are running.
To learn more about runtime roles, see Job runtime roles in the EMR Serverless User Guide.
Note
You can define several RBAC roles for your EMR Serverless application. These roles can be based on the responsibilities and access levels needed by different users or groups within your organization. For more information about RBAC permissions, see Security best practices for Amazon EMR Serverless.
-
SageMaker execution role: The execution role that allows SageMaker to perform tasks such as reading data from Amazon S3 buckets, writing logs to CloudWatch, and accessing other AWS services that your workflow might need. The SageMaker execution role also has the iam:PassRole permission, which allows SageMaker to pass runtime execution roles to the EMR Serverless applications. These runtime roles give the EMR Serverless applications the permissions they need to interact with other AWS resources while they are running.
-
-
Assumable roles (also referred to as service access roles):
-
These are the IAM roles that SageMaker's execution role can assume to perform operations related to managing EMR Serverless applications. These roles define the permissions and access policies required when listing, connecting to, or managing EMR Serverless applications. They are typically used in cross-account scenarios, where the EMR Serverless applications are located in a different AWS account than the SageMaker domain. Having a dedicated IAM role for your EMR Serverless applications helps to follow the principle of least privilege and ensures that Amazon EMR has only the required permissions to run your jobs while protecting other resources in your AWS account.
-
By understanding and configuring these roles correctly, you can ensure that SageMaker Studio has the necessary permissions to interact with EMR Serverless applications, regardless of whether they are deployed in the same account or across different accounts.
Single account
The following diagrams illustrate the roles and permissions required to list and connect to EMR Serverless applications from Studio when Studio and the applications are deployed in the same AWS account.
If your Amazon EMR applications and Studio are deployed in the same AWS account, follow these steps:
-
Step 1: Retrieve the ARN of the Amazon S3 bucket you use for data sources and output data storage in the Amazon S3 console. To learn how to find a bucket by name, see Accessing and listing an Amazon S3 bucket. For information on how to create an Amazon S3 bucket, see Creating a bucket.
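If you know the bucket name, the ARN can also be derived from it, since ARNs for general purpose buckets contain neither a region nor an account ID. The sketch below assumes the standard `aws` partition; the helper names and the bucket name `amzn-s3-demo-bucket` are placeholders chosen for illustration.

```python
def s3_bucket_arn(bucket_name: str) -> str:
    """Return the ARN of a general purpose S3 bucket (assumes the 'aws' partition)."""
    # Bucket ARNs leave the region and account fields empty: arn:aws:s3:::<bucket-name>
    return f"arn:aws:s3:::{bucket_name}"


def s3_objects_arn(bucket_name: str) -> str:
    """Return the ARN pattern that matches every object stored in the bucket."""
    return f"{s3_bucket_arn(bucket_name)}/*"


# Placeholder bucket name, replace with your own.
print(s3_bucket_arn("amzn-s3-demo-bucket"))
print(s3_objects_arn("amzn-s3-demo-bucket"))
```

Policies that grant access to a bucket typically list both forms: the bucket ARN for bucket-level actions and the `/*` pattern for object-level actions.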
-
Step 2: Create at least one job runtime execution role for your EMR Serverless application in your account (the EMRServerlessRuntimeExecutionRoleA in the Single account use case diagram above). Choose Custom trust policy as the trusted entity. Add the permissions required by your job. At a minimum, you need full access to an Amazon S3 bucket, and create and read access to AWS Glue Data Catalog.
For detailed instructions about how to create a new runtime execution role for your EMR Serverless applications, follow these steps:
-
Navigate to the IAM console.
-
In the left navigation pane, choose Policies, and then Create policy.
-
Add the permissions required by your runtime role, name the policy, and then choose Create policy.
You can refer to Job runtime roles for EMR Serverless to find sample runtime policies for an EMR Serverless runtime role.
-
In the left navigation pane, choose Roles and then Create role.
-
On the Create role page, choose Custom trust policy as the trusted entity.
-
Paste in the following JSON document in the Custom trust policy section and then choose Next.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "emr-serverless.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
-
In the Add permissions page, add the policy you created and then choose Next.
-
On the Review page, enter a name for the role such as EMRServerlessAppRuntimeRoleA and an optional description.
-
Review the role details and choose Create role.
With these roles, you and your teammates can connect to the same application, each using a runtime role scoped with permissions matching your individual level of access to data.
Note
The Spark sessions operate differently. Spark sessions are isolated based on the execution role used from Studio, so users with different execution roles will have separate, isolated Spark sessions. Additionally, if you have enabled source identity for your domain, there is further isolation of Spark sessions across different source identities.
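As an illustration of Step 2, the two documents attached to a runtime role can be generated programmatically. This is a minimal sketch, not the policy recommended by the EMR Serverless guide: the function names, the `Sid` values, the bucket name, and the particular subset of AWS Glue actions are assumptions picked to match the "full access to a bucket, create and read access to the Data Catalog" minimum described above.

```python
import json


def emr_serverless_trust_policy() -> dict:
    """Trust policy that lets the EMR Serverless service assume the runtime role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "emr-serverless.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def minimal_runtime_permissions(bucket_name: str) -> dict:
    """Illustrative permissions: full access to one bucket, Glue create and read."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "FullAccessToBucket",  # hypothetical Sid
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
            {
                "Sid": "GlueCreateAndRead",  # hypothetical Sid
                "Effect": "Allow",
                # Assumed subset of Glue actions; adjust to your workload.
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:CreateDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:CreateTable",
                    "glue:GetPartition",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(emr_serverless_trust_policy(), indent=2))
print(json.dumps(minimal_runtime_permissions("amzn-s3-demo-bucket"), indent=2))
```

The printed JSON can be pasted into the Custom trust policy section and the policy editor respectively. Creating one such role per access level is what lets teammates share an application while seeing different data.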
-
-
Step 3: Retrieve the ARN of the SageMaker execution role used by your private space.
For information on spaces and execution roles in SageMaker, see Understanding domain space permissions and execution roles.
For more information about how to retrieve the ARN of SageMaker's execution role, see Get your execution role.
Note
Alternatively, users new to SageMaker can simplify their setup process by automatically creating a new SageMaker execution role with the appropriate permissions. In this case, skip steps 3 and 4. Instead, users can either:
-
Choose the Set up for organizations option when creating a new domain from the Domain menu in the left navigation of the SageMaker console.
-
Create a new execution role from the Role manager menu of the console, and then attach the role to an existing domain or user profile.
When creating the role, choose the Run Studio EMR Serverless Applications option in What ML activities will users perform? Then, provide the name of your Amazon S3 bucket and the job runtime execution role you want your EMR Serverless application to use (step 2).
The SageMaker Role Manager automatically adds the necessary permissions for running and connecting to EMR Serverless applications to the new execution role.
Using the SageMaker Role Manager, you can only assign one runtime role to your EMR Serverless application, and the application must run in the same account where Studio is deployed, using a runtime role created within that same account.
-
-
Step 4: Attach the following permissions to the SageMaker execution role accessing your EMR Serverless application.
-
Open the IAM console at https://console.aws.amazon.com/iam/.
-
Choose Roles and then search for your execution role by name in the Search field. The role name is the last part of the ARN, after the last forward slash (/).
-
Follow the link to your role.
-
Choose Add permissions and then Create inline policy.
-
In the JSON tab, add the Amazon EMR Serverless permissions allowing EMR Serverless access and operations. For details on the policy document, see EMR Serverless policies in Reference policies. Replace the region, accountID, and passed EMRServerlessAppRuntimeRole(s) with their actual values before copying the list of statements to the inline policy of your role.
Note
You can include as many ARN strings of runtime roles as needed within the permission, separating them with commas.
-
Choose Next and then provide a Policy name.
-
Choose Create policy.
-
Repeat the Create inline policy step to add another inline policy granting the role permissions to update the domains, user profiles, and spaces. For details on the SageMakerUpdateResourcesPolicy policy document, see Domain, user profile, and space update actions policy in Reference policies. Replace the region and accountID with their actual values before copying the list of statements to the inline policy of your role.
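When several runtime roles must be passable, the `Resource` element of the iam:PassRole statement becomes a JSON array of role ARNs. The following sketch builds such a statement; the function name, the account ID `111122223333`, and the two role names are placeholders, and the ARNs assume roles without a path in the standard `aws` partition.

```python
import json
from typing import List


def pass_role_statement(account_id: str, runtime_role_names: List[str]) -> dict:
    """Build an iam:PassRole statement covering one or more runtime roles."""
    if not runtime_role_names:
        raise ValueError("at least one runtime role name is required")
    return {
        "Sid": "EMRServerlessPassRole",
        "Effect": "Allow",
        "Action": "iam:PassRole",
        # One ARN per runtime role; json.dumps emits them comma-separated.
        "Resource": [
            f"arn:aws:iam::{account_id}:role/{name}" for name in runtime_role_names
        ],
        # Only allow passing these roles to the EMR Serverless service.
        "Condition": {
            "StringLike": {"iam:PassedToService": "emr-serverless.amazonaws.com"}
        },
    }


statement = pass_role_statement(
    "111122223333",  # placeholder account ID
    ["EMRServerlessAppRuntimeRoleA", "EMRServerlessAppRuntimeRoleA2"],  # placeholders
)
print(json.dumps(statement, indent=2))
```

The resulting statement can replace the single-role EMRServerlessPassRole statement when you paste the policy into the JSON tab.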
-
-
Step 5:
Associate the list of runtime roles with your user profile or domain so you can visually browse the list of roles and select the one to use when connecting to an EMR Serverless application from JupyterLab. You can use the SageMaker console or the following script. Subsequently, all your Apache Spark or Apache Hive jobs created from your notebook will access only the data and resources permitted by the policies attached to the selected runtime role.
Important
Failure to complete this step will prevent you from connecting a JupyterLab notebook to an EMR Serverless application.
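One possible way to perform this association programmatically is through the SageMaker UpdateUserProfile API, whose JupyterLabAppSettings include an EmrSettings structure with an ExecutionRoleArns list. The sketch below is an assumption-based illustration, not the guide's own script: it only builds the request, and the domain ID, user profile name, and role ARN are placeholders.

```python
import json
from typing import List


def user_profile_emr_request(
    domain_id: str, user_profile_name: str, runtime_role_arns: List[str]
) -> dict:
    """Build keyword arguments for the SageMaker UpdateUserProfile API call."""
    return {
        "DomainId": domain_id,
        "UserProfileName": user_profile_name,
        "UserSettings": {
            "JupyterLabAppSettings": {
                "EmrSettings": {
                    # Runtime roles offered for selection when connecting from JupyterLab.
                    "ExecutionRoleArns": list(runtime_role_arns),
                }
            }
        },
    }


request = user_profile_emr_request(
    "d-xxxxxxxxxxxx",  # placeholder domain ID
    "my-user-profile",  # placeholder user profile name
    ["arn:aws:iam::111122223333:role/EMRServerlessAppRuntimeRoleA"],  # placeholder ARN
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("sagemaker").update_user_profile(**request)
```

Keeping the request construction separate from the API call makes it easy to review the exact list of roles before applying it.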
Cross account
The following diagrams illustrate the roles and permissions required to list and connect to EMR Serverless applications from Studio when Studio and the applications are deployed in different AWS accounts.
For more information about creating a role on an AWS account, see Creating an IAM role (console) at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html.
Before you get started:
-
Retrieve the ARN of the SageMaker execution role used by your private space. For information on spaces and execution roles in SageMaker, see Understanding domain space permissions and execution roles. For more information about how to retrieve the ARN of SageMaker's execution role, see Get your execution role.
-
Retrieve the ARN of the Amazon S3 bucket you will use for data sources and output data storage in the Amazon S3 console. For information on how to create an Amazon S3 bucket, see Creating a bucket. To learn how to find a bucket by name, see Accessing and listing an Amazon S3 bucket.
If your EMR Serverless applications and Studio are deployed in separate AWS accounts, you configure the permissions on both accounts.
On the EMR Serverless account
Follow these steps to create the necessary roles and policies on the account where your EMR Serverless application is running, also referred to as the trusting account:
-
Step 1: Create at least one job runtime execution role for your EMR Serverless application in your account (the EMRServerlessRuntimeExecutionRoleB in the Cross account diagram above). Choose Custom trust policy as the trusted entity. Add the permissions required by your job. At a minimum, you need full access to an Amazon S3 bucket, and create and read access to AWS Glue Data Catalog.
For detailed instructions on how to create a new runtime execution role for your EMR Serverless applications, follow these steps:
-
Navigate to the IAM console.
-
In the left navigation pane, choose Policies, and then Create policy.
-
Add the permissions required by your runtime role, name the policy, and then choose Create policy.
For sample runtime policies of an EMR Serverless runtime role, see Job runtime roles for Amazon EMR Serverless.
-
In the left navigation pane, choose Roles and then Create role.
-
On the Create role page, choose Custom trust policy as the trusted entity.
-
Paste in the following JSON document in the Custom trust policy section and then choose Next.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "emr-serverless.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
-
In the Add permissions page, add the policy you created and then choose Next.
-
On the Review page, enter a name for the role such as EMRServerlessAppRuntimeRoleB and an optional description.
-
Review the role details and choose Create role.
With these roles, you and your teammates can connect to the same application, each using a runtime role scoped with permissions matching your individual level of access to data.
Note
The Spark sessions operate differently. Spark sessions are isolated based on the execution role used from Studio, so users with different execution roles will have separate, isolated Spark sessions. Additionally, if you have enabled source identity for your domain, there is further isolation of Spark sessions across different source identities.
-
-
Step 2: Create a custom IAM role named AssumableRole with the following configuration:
-
Permissions: Grant the necessary permissions (Amazon EMR Serverless policies) to the AssumableRole to allow accessing EMR Serverless resources. This role is also known as an Access role.
-
Trust relationship: Configure the trust policy for the AssumableRole so that the execution role (the SageMakerExecutionRole in the cross-account diagram) from the Studio account that requires access can assume it.
By assuming the role, Studio can gain temporary access to the permissions it needs in the EMR Serverless account.
For detailed instructions on how to create a new AssumableRole in your EMR Serverless AWS account, follow these steps:
-
Navigate to the IAM console.
-
In the left navigation pane, choose Policies, and then Create policy.
-
In the JSON tab, add the Amazon EMR Serverless permissions allowing EMR Serverless access and operations. For details on the policy document, see EMR Serverless policies in Reference policies. Replace the region, accountID, and passed EMRServerlessAppRuntimeRole(s) with their actual values before copying the list of statements to your policy.
Note
The EMRServerlessAppRuntimeRole here is the job runtime execution role created in Step 1 (the EMRServerlessAppRuntimeRoleB in the Cross account diagram above). You can include as many ARN strings of runtime roles as needed within the permission, separating them with commas.
-
Choose Next and then provide a Policy name.
-
Choose Create policy.
-
In the left navigation pane, choose Roles and then Create role.
-
On the Create role page, choose Custom trust policy as the trusted entity.
-
Paste in the following JSON document in the Custom trust policy section and then choose Next.
Replace studio-account with the Studio account ID, and AmazonSageMaker-ExecutionRole with the execution role used by your JupyterLab space.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::studio-account:role/service-role/AmazonSageMaker-ExecutionRole"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
-
On the Add permissions page, add the EMR Serverless permissions policy you created earlier in this step and then choose Next.
-
On the Review page, enter a name for the role such as AssumableRole and an optional description.
-
Review the role details and choose Create role.
For more information about creating a role on an AWS account, see Creating an IAM role (console).
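Cross-account access only works when two documents agree: the trust policy on the AssumableRole must name the Studio execution role as principal, and the Studio execution role must in turn be allowed to call sts:AssumeRole on the AssumableRole (that permission is attached on the Studio account). The sketch below generates both halves so they can be compared side by side. The function names and the account IDs `111122223333` (Studio) and `444455556666` (EMR Serverless) are placeholders.

```python
import json


def assumable_role_trust_policy(studio_execution_role_arn: str) -> dict:
    """Trust policy attached to the AssumableRole in the EMR Serverless account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the Studio execution role may assume the AssumableRole.
                "Principal": {"AWS": studio_execution_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def assume_role_permission(emr_account_id: str, role_name: str = "AssumableRole") -> dict:
    """Inline policy attached to the execution role in the Studio account."""
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Sid": "AllowSTSToAssumeAssumableRole",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::{emr_account_id}:role/{role_name}",
        },
    }


# Placeholder ARN of the execution role used by the JupyterLab space.
studio_role = "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole"
print(json.dumps(assumable_role_trust_policy(studio_role), indent=2))
print(json.dumps(assume_role_permission("444455556666"), indent=2))
```

If either side is missing or names the wrong account or role, the sts:AssumeRole call is denied.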
-
On the Studio account
On the account where Studio is deployed, also referred to as the trusted account, update the SageMaker execution role accessing your EMR Serverless applications with the required permissions to access resources in the trusting account.
-
Step 1: Retrieve the ARN of the SageMaker execution role used by your space.
For information on spaces and execution roles in SageMaker, see Understanding domain space permissions and execution roles.
For more information about how to retrieve the ARN of SageMaker's execution role, see Get your execution role.
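Once you have the ARN, the name to search for in the IAM console is the segment after the last forward slash. A minimal sketch of that extraction follows; the function name and the sample ARN are illustrative.

```python
def role_name_from_arn(role_arn: str) -> str:
    """Return the role name, i.e. the part of the ARN after the last forward slash."""
    # Expected shape: arn:<partition>:iam::<account-id>:role/<optional/path/><role-name>
    parts = role_arn.split(":", 5)
    if len(parts) != 6 or not parts[5].startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    return parts[5].rsplit("/", 1)[-1]


# Placeholder ARN with the service-role path that SageMaker-created roles often use.
arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole"
print(role_name_from_arn(arn))
```

Note that any path such as `service-role/` is part of the ARN but not part of the role name.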
-
Step 2: Attach the following permissions to the SageMaker execution role accessing your EMR Serverless application.
-
Open the IAM console at https://console.aws.amazon.com/iam/.
-
Choose Roles and then search for your execution role by name in the Search field. The role name is the last part of the ARN, after the last forward slash (/).
-
Follow the link to your role.
-
Choose Add permissions and then Create inline policy.
-
In the JSON tab, add the inline policy granting the role permissions to update the domains, user profiles, and spaces. For details on the SageMakerUpdateResourcesPolicy policy document, see Domain, user profile, and space update actions policy in Reference policies. Replace the region and accountID with their actual values before copying the list of statements to the inline policy of your role.
-
Choose Next and then provide a Policy name.
-
Choose Create policy.
-
Repeat the Create inline policy step to add another policy granting the execution role the permissions to assume the AssumableRole and then perform actions permitted by the role's access policy.
Replace emr-account with the Amazon EMR Serverless account ID, and AssumableRole with the name of the assumable role created in the Amazon EMR Serverless account.
{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowSTSToAssumeAssumableRole",
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::emr-account:role/AssumableRole"
    }
}
-
-
Step 3:
Associate the list of runtime roles with your domain or user profile so you can visually browse the list of roles and select the one to use when connecting to an EMR Serverless application from JupyterLab. You can use the SageMaker console or the following script. Subsequently, all your Apache Spark or Apache Hive jobs created from your notebook will access only the data and resources permitted by the policies attached to the selected runtime role.
Important
Failure to complete this step will prevent you from connecting a JupyterLab notebook to an EMR Serverless application.
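In the cross-account case, the association can be made at the domain level and typically carries two lists: the assumable role(s) in the EMR Serverless account and the runtime roles. The SageMaker UpdateDomain API accepts both under JupyterLabAppSettings.EmrSettings, as AssumableRoleArns and ExecutionRoleArns. The sketch below is an illustration under those assumptions, not the guide's own script; it only builds the request, and every ID and ARN shown is a placeholder.

```python
import json
from typing import List


def domain_emr_request(
    domain_id: str, assumable_role_arns: List[str], runtime_role_arns: List[str]
) -> dict:
    """Build keyword arguments for the SageMaker UpdateDomain API call."""
    return {
        "DomainId": domain_id,
        "DefaultUserSettings": {
            "JupyterLabAppSettings": {
                "EmrSettings": {
                    # Roles in the EMR Serverless account that the execution role can assume.
                    "AssumableRoleArns": list(assumable_role_arns),
                    # Runtime roles offered for selection when connecting from JupyterLab.
                    "ExecutionRoleArns": list(runtime_role_arns),
                }
            }
        },
    }


request = domain_emr_request(
    "d-xxxxxxxxxxxx",  # placeholder domain ID
    ["arn:aws:iam::444455556666:role/AssumableRole"],  # placeholder EMR account ID
    ["arn:aws:iam::444455556666:role/EMRServerlessAppRuntimeRoleB"],
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("sagemaker").update_domain(**request)
```

Settings applied with DefaultUserSettings act as defaults for the user profiles in the domain, which is convenient when the same roles apply to a whole team.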
Reference policies
-
EMR Serverless policies: This policy allows managing EMR Serverless applications, including listing, creating (with required SageMaker tags), starting, stopping, getting details, deleting, accessing Livy endpoints, and getting job run dashboards. It also allows passing the required EMR Serverless application runtime role to the service.
-
EMRServerlessListApplications: Allows the ListApplications action on all EMR Serverless resources in the specified region and AWS account.
-
EMRServerlessPassRole: Allows passing the specified runtime role(s) in the provided AWS account, but only when the role is being passed to the emr-serverless.amazonaws.com service.
-
EMRServerlessCreateApplicationAction: Allows the CreateApplication and TagResource actions on EMR Serverless resources in the specified region and AWS account. However, it requires that the resources being created or tagged have specific tag keys (sagemaker:domain-arn, sagemaker:user-profile-arn, and sagemaker:space-arn) present with non-null values.
-
EMRServerlessDenyTaggingAction: Denies the TagResource and UntagResource actions on EMR Serverless resources in the specified region and AWS account if the resources do not have any of the specified tag keys (sagemaker:domain-arn, sagemaker:user-profile-arn, and sagemaker:space-arn) set.
-
EMRServerlessActions: Allows various actions (StartApplication, StopApplication, GetApplication, DeleteApplication, AccessLivyEndpoints, and GetDashboardForJobRun) on EMR Serverless resources, but only if the resources have the specified tag keys (sagemaker:domain-arn, sagemaker:user-profile-arn, and sagemaker:space-arn) set with non-null values.
The IAM policy defined in the provided JSON document grants those permissions, but limits that access to the presence of specific SageMaker tags on the EMR Serverless applications to ensure that only Amazon EMR Serverless resources associated with a particular SageMaker domain, user profile, and space can be managed.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EMRServerlessListApplications",
            "Effect": "Allow",
            "Action": [
                "emr-serverless:ListApplications"
            ],
            "Resource": "arn:aws:emr-serverless:region:accountID:/*"
        },
        {
            "Sid": "EMRServerlessPassRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::accountID:role/EMRServerlessAppRuntimeRole",
            "Condition": {
                "StringLike": {
                    "iam:PassedToService": "emr-serverless.amazonaws.com"
                }
            }
        },
        {
            "Sid": "EMRServerlessCreateApplicationAction",
            "Effect": "Allow",
            "Action": [
                "emr-serverless:CreateApplication",
                "emr-serverless:TagResource"
            ],
            "Resource": "arn:aws:emr-serverless:region:accountID:/*",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "sagemaker:domain-arn",
                        "sagemaker:user-profile-arn",
                        "sagemaker:space-arn"
                    ]
                },
                "Null": {
                    "aws:RequestTag/sagemaker:domain-arn": "false",
                    "aws:RequestTag/sagemaker:user-profile-arn": "false",
                    "aws:RequestTag/sagemaker:space-arn": "false"
                }
            }
        },
        {
            "Sid": "EMRServerlessDenyTaggingAction",
            "Effect": "Deny",
            "Action": [
                "emr-serverless:TagResource",
                "emr-serverless:UntagResource"
            ],
            "Resource": "arn:aws:emr-serverless:region:accountID:/*",
            "Condition": {
                "Null": {
                    "aws:ResourceTag/sagemaker:domain-arn": "true",
                    "aws:ResourceTag/sagemaker:user-profile-arn": "true",
                    "aws:ResourceTag/sagemaker:space-arn": "true"
                }
            }
        },
        {
            "Sid": "EMRServerlessActions",
            "Effect": "Allow",
            "Action": [
                "emr-serverless:StartApplication",
                "emr-serverless:StopApplication",
                "emr-serverless:GetApplication",
                "emr-serverless:DeleteApplication",
                "emr-serverless:AccessLivyEndpoints",
                "emr-serverless:GetDashboardForJobRun"
            ],
            "Resource": "arn:aws:emr-serverless:region:accountID:/applications/*",
            "Condition": {
                "Null": {
                    "aws:ResourceTag/sagemaker:domain-arn": "false",
                    "aws:ResourceTag/sagemaker:user-profile-arn": "false",
                    "aws:ResourceTag/sagemaker:space-arn": "false"
                }
            }
        }
    ]
}
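To make the tag conditions concrete, the following sketch models them locally. It is a simplified approximation of how the `Null` condition operator behaves (a key counts as set when it is present on the resource) and not a substitute for IAM policy evaluation; the function names and sample tag values are hypothetical.

```python
from typing import Dict

REQUIRED_TAG_KEYS = (
    "sagemaker:domain-arn",
    "sagemaker:user-profile-arn",
    "sagemaker:space-arn",
)


def actions_condition_met(resource_tags: Dict[str, str]) -> bool:
    """Model of EMRServerlessActions: every required tag key must be set."""
    # "Null": {...: "false"} on all three keys means each key must exist.
    return all(key in resource_tags for key in REQUIRED_TAG_KEYS)


def tagging_deny_applies(resource_tags: Dict[str, str]) -> bool:
    """Model of EMRServerlessDenyTaggingAction: none of the required keys is set."""
    # "Null": {...: "true"} on all three keys means each key must be absent.
    return all(key not in resource_tags for key in REQUIRED_TAG_KEYS)


# Hypothetical tag sets for two applications.
created_from_studio = {
    "sagemaker:domain-arn": "arn:aws:sagemaker:us-east-1:111122223333:domain/d-xxxxxxxxxxxx",
    "sagemaker:user-profile-arn": "arn:aws:sagemaker:us-east-1:111122223333:user-profile/d-xxxxxxxxxxxx/alice",
    "sagemaker:space-arn": "arn:aws:sagemaker:us-east-1:111122223333:space/d-xxxxxxxxxxxx/my-space",
}
created_elsewhere = {"team": "analytics"}

print(actions_condition_met(created_from_studio), tagging_deny_applies(created_from_studio))
print(actions_condition_met(created_elsewhere), tagging_deny_applies(created_elsewhere))
```

Under this model, an application without the SageMaker tags can be neither managed nor retagged through this policy, which is what confines management to applications associated with a SageMaker domain, user profile, and space.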
-
Domain, user profile, and space update actions policy: The following policy grants permissions to update SageMaker domains, user profiles, and spaces within the specified region and AWS account.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SageMakerUpdateResourcesPolicy",
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain",
                "sagemaker:UpdateUserProfile",
                "sagemaker:UpdateSpace"
            ],
            "Resource": [
                "arn:aws:sagemaker:region:accountID:domain/*",
                "arn:aws:sagemaker:region:accountID:user-profile/*"
            ]
        }
    ]
}