

# Amazon EMR Studio
<a name="emr-studio"></a>

Amazon EMR Studio is a web-based integrated development environment (IDE) for fully managed Jupyter notebooks that run on Amazon EMR clusters. You can set up an EMR Studio for your team to develop, visualize, and debug applications written in R, Python, Scala, and PySpark. EMR Studio is integrated with AWS Identity and Access Management (IAM) and IAM Identity Center so users can log in using their corporate credentials.

You can create an EMR Studio at no cost. Applicable charges for Amazon S3 storage and for Amazon EMR clusters apply when you use EMR Studio. For product details and highlights, see the service page for [Amazon EMR Studio](https://aws.amazon.com/emr/features/studio/).

## Key features of EMR Studio
<a name="emr-studio-key-features"></a>

Amazon EMR Studio provides the following features:
+ Authenticate users with AWS Identity and Access Management (IAM), or with AWS IAM Identity Center with or without [trusted identity propagation](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation.html) and your enterprise identity provider.
+ Access and launch Amazon EMR clusters on-demand to run Jupyter Notebook jobs.
+ Connect to Amazon EMR on EKS clusters to submit work as job runs.
+ Explore and save example notebooks. For more information about example notebooks, see the [EMR Studio Notebook examples GitHub repository](https://github.com/aws-samples/emr-studio-notebook-examples).
+ Analyze data using Python, PySpark, Spark Scala, Spark R, or SparkSQL, and install custom kernels and libraries.
+ Collaborate in real time with other users in the same Workspace. For more information, see [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md).
+ Use the EMR Studio SQL Explorer to browse your data catalog, run SQL queries, and download results before you work with the data in a notebook.
+ Run parameterized notebooks as part of scheduled workflows with an orchestration tool such as Apache Airflow or Amazon Managed Workflows for Apache Airflow. For more information, see [Orchestrating analytics jobs on EMR Notebooks using MWAA](https://aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-on-amazon-emr-notebooks-using-amazon-mwaa/) in the AWS Big Data Blog.
+ Link code repositories such as GitHub and Bitbucket.
+ Track and debug jobs using the Spark History Server, Tez UI, or YARN timeline server. 

EMR Studio is HIPAA eligible and is certified under HITRUST CSF and SOC 2. For more information about HIPAA compliance for AWS services, see [https://aws.amazon.com/compliance/hipaa-compliance/](https://aws.amazon.com/compliance/hipaa-compliance/). To learn more about HITRUST CSF compliance for AWS services, see [https://aws.amazon.com/compliance/hitrust/](https://aws.amazon.com/compliance/hitrust/).

EMR Studio is also FedRAMP compliant. For more information about the compliance programs that Amazon EMR conforms with, see [Compliance validation for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-compliance.html). For more information about additional compliance programs for AWS services, see [AWS Services in Scope by Compliance Program](https://aws.amazon.com/compliance/services-in-scope/).

## Amazon SageMaker Unified Studio integrated development environment
<a name="emr-studio-unified"></a>

Amazon SageMaker Unified Studio provides an integrated development environment (IDE) for [Jupyter notebooks](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/jupyterlab.html) that run on [Amazon EMR on EC2 clusters](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/managing-emr-on-ec2.html) or use [EMR Serverless compute connections](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/adding-deleting-emr-serverless.html). By combining Amazon EMR with the end-to-end workflow capabilities of Amazon SageMaker Unified Studio, teams can handle data preparation, pipeline development, and ML experimentation in a single environment. Amazon EMR in SageMaker supports open-source frameworks such as Apache Spark, Trino, and Apache Flink, so you can scale analytics workloads with less infrastructure management. To learn more, see [Amazon EMR](https://aws.amazon.com/emr/).

## Amazon EMR Studio feature history
<a name="emr-studio-history"></a>

This table lists updates to Amazon EMR Studio capabilities.


| Release date | Capability | 
| --- | --- | 
| January 5, 2024 |  Added support for EMR Studio in AWS GovCloud (US-East) and AWS GovCloud (US-West).  | 
| November 26, 2023 |  Added support for trusted identity propagation for EMR Studio with IAM Identity Center authentication.  | 
| October 26, 2023 |  Added ability to create an EMR Serverless application with interactive capability.  | 
| February 28, 2023 |  Added AWS KMS customer-managed key support for application log storage for EMR Serverless applications.  | 
| February 23, 2023 |  Added one-click IAM role creation for EMR Serverless job submission. Added ECR lookup for when you select a custom image for EMR Serverless applications.  | 
| January 27, 2023 |  Headless execution notebooks can track the progress of each cell execution with `%execute_notebook` magic.  | 
| January 23, 2023 |  Persistent applications have been optimized for faster launch times.  | 

# How Amazon EMR Studio works
<a name="how-emr-studio-works"></a>

An Amazon EMR Studio is an Amazon EMR resource that you create for a team of users. Each Studio is a self-contained, web-based integrated development environment for Jupyter notebooks that run on Amazon EMR clusters. Users log in to a Studio using corporate credentials. 

Each EMR Studio that you create uses the following AWS resources: 
+ **An Amazon Virtual Private Cloud (VPC) with subnets** - Users run Studio kernels and applications on Amazon EMR and Amazon EMR on EKS clusters in the specified VPC. An EMR Studio can connect to any cluster in the subnets that you specify when you create the Studio.
+ **IAM roles and permissions policies** - To manage user permissions, you create IAM permissions policies that you attach to a user's IAM identity or to a user role. EMR Studio also uses an IAM service role and security groups to interoperate with other AWS services. For more information, see [Access control](#emr-studio-access-control) and [Define security groups to control EMR Studio network traffic](emr-studio-security-groups.md).
+ **Security groups** - EMR Studio uses security groups to establish a secure network channel between the Studio and an EMR cluster.
+ **An Amazon S3 backup location** - EMR Studio saves notebook work in an Amazon S3 location.

The following steps outline how to create and administer an EMR Studio:

1. Create a Studio in your AWS account with either IAM or IAM Identity Center authentication. For instructions, see [Set up an EMR Studio](emr-studio-set-up.md).

1. Assign users and groups to your Studio. Use permissions policies to set fine-grained permissions for each user. For more information, see the topic [Assign and manage EMR Studio users](emr-studio-manage-users.md).

1. Start monitoring EMR Studio actions with AWS CloudTrail events. For more information, see [Monitor Amazon EMR Studio actions](emr-studio-manage-studio.md#emr-studio-monitor).

1. Provide more cluster options to Studio users with cluster templates and Amazon EMR on EKS managed endpoints. 

## Authentication and user login
<a name="emr-studio-login"></a>

Amazon EMR Studio supports two authentication modes: IAM authentication mode and IAM Identity Center authentication mode. IAM mode uses AWS Identity and Access Management (IAM), while IAM Identity Center mode uses AWS IAM Identity Center. When you create an EMR Studio, you choose the authentication mode for all users of that Studio.

### IAM authentication mode
<a name="emr-studio-login-iam-mode"></a>

With IAM authentication mode, you can use IAM authentication or IAM federation. 

IAM *authentication* lets you manage IAM identities such as users, groups, and roles in IAM. You grant users access to a Studio with IAM permissions policies and [attribute-based access control (ABAC)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html). 

IAM *federation* lets you establish trust between a third-party identity provider (IdP) and AWS so that you can manage user identities through your IdP.

### IAM Identity Center authentication mode
<a name="emr-studio-login-sso-mode"></a>

IAM Identity Center authentication mode lets you give users federated access to an EMR Studio. You can use IAM Identity Center to authenticate users and groups from your IAM Identity Center directory, your existing corporate directory, or an external IdP such as Azure Active Directory (AD). You then manage users with your identity provider (IdP).

EMR Studio supports using the following identity providers for IAM Identity Center:
+ **AWS Managed Microsoft AD and self-managed Active Directory** – For more information, see [Connect to your Microsoft AD directory](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source-ad.html).
+ **SAML-based providers** – For a full list, see [Supported identity providers](https://docs.aws.amazon.com/singlesignon/latest/userguide/supported-idps.html).
+ **The IAM Identity Center directory** – For more information, see [Manage identities in IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source-sso.html) and [Trusted Identity Propagation across applications](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation.html) in the *AWS IAM Identity Center User Guide*.

### How authentication affects login and user assignment
<a name="emr-studio-login-auth-differences"></a>

The authentication mode that you choose for EMR Studio affects how users log in to a Studio, how you assign a user to a Studio, and how you *authorize* (give permissions to) users to perform actions such as creating new Amazon EMR clusters.

The following table summarizes login methods for EMR Studio according to authentication mode.


**EMR Studio login options by authentication mode**  

| Authentication mode | Login method | Description | 
| --- | --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/how-emr-studio-works.html)  | EMR Studio URL |  Users log in to a Studio using the Studio access URL. For example, `https://xxxxxxxxxxxxxxxxxxxxxxx.emrstudio-prod.us-east-1.amazonaws.com`.  Users enter IAM credentials when you use IAM authentication. When you use IAM federation or IAM Identity Center, EMR Studio redirects users to your identity provider's sign-in URL to enter credentials. In the context of identity federation, this login option is called Service Provider (SP) initiated sign-in.  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/how-emr-studio-works.html)  | Identity provider (IdP) portal |  Users log in to your identity provider's portal, such as the Azure portal, and launch the Amazon EMR console. After launching the Amazon EMR console, users select and open a Studio from the **Studios list**. You can also configure EMR Studio as a SAML application so that users can log in to a specific Studio from your identity provider's portal. For instructions, see [To configure an EMR Studio as a SAML application in your IdP portal](emr-studio-authentication.md#emr-studio-create-federation-deeplink). In the context of identity federation, this login option is called identity provider (IdP) initiated sign-in.  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/how-emr-studio-works.html)  | AWS Management Console | Users sign in to the AWS Management Console using IAM credentials and open a Studio from the Studios list in the Amazon EMR console. | 

The following table outlines user assignment and authorization for EMR Studio by authentication mode.


**EMR Studio user assignment and authorization by authentication mode**  

| Authentication mode | User assignment | User authorization | 
| --- | --- | --- | 
|  IAM (authentication and federation)  |  Allow the `CreateStudioPresignedUrl` action in an IAM permissions policy attached to an IAM identity (user, group, or role).  For federated users, allow the `CreateStudioPresignedUrl` action in the permissions policy that you configure for the IAM role you use for federation. Use attribute-based access control (ABAC) to specify the Studio or Studios that the user can access. For instructions, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).  |  Define IAM permissions policies that allow certain EMR Studio actions.  For native users, attach the IAM permissions policy to an IAM identity (user, group, or role). For federated users, allow Studio actions in the permissions policy that you configure for the IAM role you use for federation. For more information, see [Configure EMR Studio user permissions for Amazon EC2 or Amazon EKS](emr-studio-user-permissions.md).  | 
| IAM Identity Center |  For Studios created with `IdCUserAssignment` set to `REQUIRED`, map users to the Studio with a specified session policy. For more information, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups). For Studios created with `IdCUserAssignment` set to `OPTIONAL`, any Identity Center user or group can access the Studio.  |  *Optional:* Define IAM session policies that allow certain EMR Studio actions. Map a session policy to a user when you assign the user to a Studio. For more information, see [User permissions for IAM Identity Center authentication mode](#emr-studio-sso-authorization).  | 
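
As a rough illustration of the IAM-mode user assignment described in the preceding table, the following Python sketch builds a permissions policy that allows `CreateStudioPresignedUrl` and narrows access with a tag condition (ABAC). The `Team` tag key, its value, the Region, and the account ID are placeholder assumptions. Verify the `elasticmapreduce:ResourceTag/` condition key against the linked user-assignment topic before you rely on it.

```python
import json


def build_studio_access_policy(region, account_id, tag_key, tag_value):
    """Return an IAM policy document (as a dict) that allows opening
    only the Studios that carry the given tag (ABAC)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateStudioPresignedUrl",
                "Effect": "Allow",
                "Action": ["elasticmapreduce:CreateStudioPresignedUrl"],
                # Any Studio in the account; the tag condition narrows it down.
                "Resource": f"arn:aws:elasticmapreduce:{region}:{account_id}:studio/*",
                "Condition": {
                    "StringEquals": {
                        f"elasticmapreduce:ResourceTag/{tag_key}": tag_value
                    }
                },
            }
        ],
    }


# Placeholder values, chosen only for illustration.
studio_access_policy = build_studio_access_policy(
    region="us-east-1",
    account_id="111122223333",
    tag_key="Team",
    tag_value="Data Analytics",
)
print(json.dumps(studio_access_policy, indent=2))
```

The printed JSON could be pasted into an IAM policy and attached to a user, group, or federation role.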

## Access control
<a name="emr-studio-access-control"></a>

In Amazon EMR Studio, you configure user authorization (permissions) with AWS Identity and Access Management (IAM) identity-based policies. In these policies, you specify allowed actions and resources, as well as the conditions under which the actions are allowed.

### User permissions for IAM authentication mode
<a name="emr-studio-iam-authorization"></a>

To set user permissions when you use IAM authentication for EMR Studio, you allow actions such as `elasticmapreduce:RunJobFlow` in an IAM permissions policy. You can create one or more permissions policies to use. For example, you might create a basic policy that does not allow a user to create new Amazon EMR clusters, and another policy that does allow cluster creation. For a list of all Studio actions, see [AWS Identity and Access Management permissions for EMR Studio users](emr-studio-user-permissions.md#emr-studio-iam-permissions-table).
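
The basic and cluster-creating policies described above might be generated along the following lines. This is a minimal Python sketch, not a complete policy: the two read-only actions are an illustrative subset chosen as an assumption, and a working policy typically needs more of the actions listed in the linked permissions table.

```python
import json

# Illustrative subset of read-only actions (assumption); the full list
# is in the permissions table linked above.
BASIC_ACTIONS = [
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListClusters",
]
# Action that lets a user create new Amazon EMR clusters.
CLUSTER_CREATION_ACTIONS = ["elasticmapreduce:RunJobFlow"]


def build_user_policy(allow_cluster_creation):
    """Return a policy dict; RunJobFlow is included only on request."""
    actions = list(BASIC_ACTIONS)
    if allow_cluster_creation:
        actions += CLUSTER_CREATION_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EmrStudioUserActions",
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
            }
        ],
    }


basic_policy = build_user_policy(allow_cluster_creation=False)
advanced_policy = build_user_policy(allow_cluster_creation=True)
print(json.dumps(advanced_policy, indent=2))
```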

### User permissions for IAM Identity Center authentication mode
<a name="emr-studio-sso-authorization"></a>

When you use IAM Identity Center authentication, you create a single EMR Studio user role. The *user role* is a dedicated IAM role that a Studio assumes when a user logs in.

You attach IAM session policies to the EMR Studio user role. A *session policy* is a special kind of IAM permissions policy that limits what a federated user can do during a Studio login session. Session policies let you set specific permissions for a user or group without creating multiple user roles for EMR Studio.

When you [assign users and groups](emr-studio-manage-users.md#emr-studio-assign-users-groups) to a Studio, you map a session policy to that user or group to apply fine-grained permissions. You can also update a user or group's session policy at any time. Amazon EMR stores each session policy mapping that you create.

For more information about session policies, see [Policies and permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#policies_session) in the *AWS Identity and Access Management User Guide*. 
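
To make the session policy mapping more concrete, the following Python sketch assembles the parameters for one mapping. The Studio ID, group name, and policy ARN are hypothetical placeholders. The sketch only builds the request; the comment at the end shows how it might be passed to the `create_studio_session_mapping` operation in boto3, assuming boto3 is installed and credentials are configured.

```python
def build_session_mapping_request(studio_id, identity_name, identity_type,
                                  session_policy_arn):
    """Build the parameters that map a session policy to a user or group."""
    if identity_type not in ("USER", "GROUP"):
        raise ValueError("identity_type must be 'USER' or 'GROUP'")
    return {
        "StudioId": studio_id,
        "IdentityName": identity_name,
        "IdentityType": identity_type,
        "SessionPolicyArn": session_policy_arn,
    }


# Hypothetical identifiers, for illustration only.
mapping_request = build_session_mapping_request(
    studio_id="es-EXAMPLE12345",
    identity_name="data-analysts",
    identity_type="GROUP",
    session_policy_arn="arn:aws:iam::111122223333:policy/ExampleStudioSessionPolicy",
)
print(mapping_request)

# The request could then be sent with boto3, for example:
#   import boto3
#   boto3.client("emr").create_studio_session_mapping(**mapping_request)
```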

## Workspaces
<a name="emr-studio-workspaces"></a>

Workspaces are the primary building blocks of Amazon EMR Studio. To organize notebooks, users create one or more Workspaces in a Studio. For more information, see [Learn EMR Studio workspaces](emr-studio-configure-workspace.md).

Similar to [workspaces in JupyterLab](https://jupyterlab.readthedocs.io/en/latest/user/urls.html#managing-workspaces-ui), a Workspace preserves the state of notebook work. However, the Workspace user interface extends the open-source [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/user/interface.html) interface with additional tools to let you create and attach EMR clusters, run jobs, explore sample notebooks, and link Git repositories.

The following list includes key features of EMR Studio Workspaces:
+ Workspace visibility is Studio-based. Workspaces that you create in one Studio aren't visible in other Studios.
+ By default, a Workspace is shared and can be seen by all Studio users. However, only one user can open and work in a Workspace at a time. To work simultaneously with other users, you can [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md).
+ You can collaborate simultaneously with other users in a Workspace when you enable Workspace collaboration. For more information, see [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md).
+ Notebooks in a Workspace share the same EMR cluster to run commands. You can attach a Workspace to an Amazon EMR cluster running on Amazon EC2 or to an Amazon EMR on EKS virtual cluster and managed endpoint.
+ Workspaces can switch over to another Availability Zone that you associate with a Studio's subnets. You can stop and restart a Workspace to prompt the failover process. When you restart a Workspace, EMR Studio launches the Workspace in a different Availability Zone in the Studio's VPC when the Studio is configured with access to multiple Availability Zones. If the Studio has only one Availability Zone, EMR Studio attempts to launch the Workspace in a different subnet. For more information, see [Resolve Workspace connectivity issues](emr-studio-workspace-stop-start.md).
+ A Workspace can connect to clusters in any of the subnets that are associated with a Studio.

For more information about creating and configuring EMR Studio Workspaces, see [Learn EMR Studio workspaces](emr-studio-configure-workspace.md).

## Notebook storage in Amazon EMR Studio
<a name="emr-studio-storage"></a>

When you use a Workspace, EMR Studio autosaves the cells in notebook files at a regular cadence in the Amazon S3 location that is associated with your Studio. This backup process preserves work between sessions so that you can come back to it later without committing changes to a Git repository. For more information, see [Save Workspace content in EMR Studio](emr-studio-save-workspace.md).

When you delete a notebook file from a Workspace, EMR Studio deletes the backup version from Amazon S3 for you. However, if you delete a Workspace without first deleting its notebook files, the notebook files remain in Amazon S3 and continue to accrue storage charges. To learn more, see [Delete a Workspace and notebook files in EMR Studio](emr-studio-delete-workspace.md).

# EMR Studio features, requirements, and limits
<a name="emr-studio-considerations"></a>

This topic includes items to consider when working with Amazon EMR Studio, including considerations about Regions and tools, cluster requirements, and technical limitations.

## Considerations
<a name="emr-studio-considerations-general"></a>

Consider the following when you work with EMR Studio:
+ EMR Studio is available in the following AWS Regions: 
  + US East (Ohio) (us-east-2)
  + US East (N. Virginia) (us-east-1)
  + US West (N. California) (us-west-1)
  + US West (Oregon) (us-west-2)
  + Africa (Cape Town) (af-south-1)
  + Asia Pacific (Hong Kong) (ap-east-1)
  + Asia Pacific (Jakarta) (ap-southeast-3)\*
  + Asia Pacific (Melbourne) (ap-southeast-4)\*
  + Asia Pacific (Mumbai) (ap-south-1)
  + Asia Pacific (Osaka) (ap-northeast-3)\*
  + Asia Pacific (Seoul) (ap-northeast-2)
  + Asia Pacific (Singapore) (ap-southeast-1)
  + Asia Pacific (Sydney) (ap-southeast-2)
  + Asia Pacific (Tokyo) (ap-northeast-1)
  + Canada (Central) (ca-central-1)
  + Europe (Frankfurt) (eu-central-1)
  + Europe (Ireland) (eu-west-1)
  + Europe (London) (eu-west-2)
  + Europe (Milan) (eu-south-1)
  + Europe (Paris) (eu-west-3)
  + Europe (Spain) (eu-south-2)
  + Europe (Stockholm) (eu-north-1)
  + Europe (Zurich) (eu-central-2)\*
  + Israel (Tel Aviv) (il-central-1)\*
  + Middle East (UAE) (me-central-1)\*
  + South America (São Paulo) (sa-east-1)
  + AWS GovCloud (US-East) (us-gov-east-1)
  + AWS GovCloud (US-West) (us-gov-west-1)

  \* The live Spark UI isn't supported in these Regions.
+ To let users provision new EMR clusters running on Amazon EC2 for a Workspace, you can associate an EMR Studio with a set of cluster templates. Administrators can define cluster templates with Service Catalog and can choose whether a user or group can access the cluster templates, or no cluster templates, within a Studio.
+ When you define access permissions to notebook files stored in Amazon S3 or read secrets from AWS Secrets Manager, use the Amazon EMR service role. Session policies aren't supported with these permissions.
+ You can create multiple EMR Studios to control access to EMR clusters in different VPCs.
+ Use the AWS CLI to set up Amazon EMR on EKS clusters. You can then use the Studio interface to attach clusters to Workspaces with a managed endpoint to run notebook jobs.
+ There are additional considerations when you use trusted identity propagation with Amazon EMR that also apply to EMR Studio. For more information, see [Considerations and limitations for Amazon EMR with the Identity Center integration](emr-idc-considerations.md).
+ EMR Studio doesn't support the following Python magic commands:
  + `%alias`
  + `%alias_magic`
  + `%automagic`
  + `%macro`
  + `%%js`
  + `%%javascript`
  + Modifying `proxy_user` using `%configure`
  + Modifying `KERNEL_USERNAME` using `%env` or `%set_env`
+ Amazon EMR on EKS clusters don't support SparkMagic commands for EMR Studio.
+ To write multi-line Scala statements in notebook cells, make sure that all but the last line end with a period. The following example uses the correct syntax for multi-line Scala statements.

  ```
  val df = spark.sql("SELECT * from table_name").
          filter("col1=='value'").
          limit(50)
  ```
+ To augment the security for the off-console applications that you might use with Amazon EMR, the application hosting domains are registered in the Public Suffix List (PSL). Examples of these hosting domains include the following: `emrstudio-prod.us-east-1.amazonaws.com`, `emrnotebooks-prod.us-east-1.amazonaws.com`, `emrappui-prod.us-east-1.amazonaws.com`. For further security, if you ever need to set sensitive cookies in the default domain name, we recommend that you use cookies with a `__Host-` prefix. This helps to defend your domain against cross-site request forgery attempts (CSRF). For more information, see the [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes) page in the *Mozilla Developer Network*. 
+ Amazon EMR Studio Workspaces and Persistent UI endpoints use FIPS 140 validated cryptographic modules for encryption-in-transit, which enables easier adoption of the service for regulated workloads. For additional context on Persistent UI endpoints, see [View persistent application user interfaces in Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html). For additional context regarding notebooks, see [Amazon EMR Notebooks overview](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks.html).

## Known issues
<a name="emr-studio-known-issues"></a>
+ An EMR Studio that uses IAM Identity Center with trusted identity propagation enabled can only associate with EMR clusters that also use trusted identity propagation.
+ Make sure you deactivate proxy management tools such as FoxyProxy or SwitchyOmega in the browser before you create a Studio. Active proxies can cause errors when you choose **Create Studio**, and result in a **Network Failure** error message.
+ Kernels that run on Amazon EMR on EKS clusters can fail to start due to timeout issues. If you encounter an error or issue starting the kernel, close the notebook file, shut down the kernel, and then reopen the notebook file.
+ The **Restart kernel** operation doesn't work as expected when you use an Amazon EMR on EKS cluster. After you select **Restart kernel**, refresh the Workspace for the restart to take effect.
+ If a Workspace isn't attached to a cluster, an error message appears when a Studio user opens a notebook file and tries to select a kernel. You can ignore this error message by choosing **Ok**, but you must attach the Workspace to a cluster and select a kernel before you can run notebook code.
+ When you use Amazon EMR 6.2.0 with a [security configuration](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-security-configurations.html) to set up cluster security, the Workspace interface appears blank and doesn't work as expected. We recommend that you use a different supported version of Amazon EMR if you want to configure data encryption or Amazon S3 authorization for EMRFS for a cluster. EMR Studio works with Amazon EMR versions 5.32.0 (Amazon EMR 5.x series) and 6.2.0 (Amazon EMR 6.x series) and higher.
+ When you [Debug Amazon EMR running on Amazon EC2 jobs](emr-studio-debug.md#emr-studio-debug-ec2), the links to the on-cluster Spark UI might not work or might not appear. To regenerate the links, create a new notebook cell and run the `%%info` command.
+ Jupyter Enterprise Gateway doesn't clean up idle kernels on the primary node of a cluster in the following Amazon EMR release versions: 5.32.0, 5.33.0, 6.2.0, and 6.3.0. Idle kernels consume computing resources and can cause long running clusters to fail. You can configure idle kernel cleanup for Jupyter Enterprise Gateway using the following example script. You can [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md), or submit the script as a step. For more information, see [Run commands and scripts on an Amazon EMR cluster](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-commandrunner.html).

  ```
  #!/bin/bash
  sudo tee -a /emr/notebook-env/conf/jupyter_enterprise_gateway_config.py << EOF
  c.MappingKernelManager.cull_connected = True
  c.MappingKernelManager.cull_idle_timeout = 10800
  c.MappingKernelManager.cull_interval = 300
  EOF
  sudo systemctl daemon-reload
  sudo systemctl restart jupyter_enterprise_gateway
  ```
+ When you use an auto-termination policy with Amazon EMR versions 5.32.0, 5.33.0, 6.2.0, or 6.3.0, Amazon EMR marks a cluster as idle and may automatically terminate the cluster even if you have an active Python3 kernel. This is because executing a Python3 kernel does not submit a Spark job on the cluster. To use auto-termination with a Python3 kernel, we recommend that you use Amazon EMR version 6.4.0 or later. For more information about auto-termination, see [Using an auto-termination policy for Amazon EMR cluster cleanup](emr-auto-termination-policy.md).
+ When you use `%%display` to display a Spark DataFrame in a table, very wide tables may get truncated. You can right-click the output and select **Create New View for Output** to get a scrollable view of the output.
+ Starting a Spark-based kernel, such as PySpark, Spark, or SparkR, starts a Spark session, and running a cell in a notebook queues up Spark jobs in that session. When you interrupt a running cell, the Spark job continues to run. To stop the Spark job, you should use the on-cluster Spark UI. For instructions on how to connect to the Spark UI, see [Debug applications and jobs with EMR Studio](emr-studio-debug.md).
+ Using Amazon EMR Studio Workspaces as the root user in an AWS account causes a `403: Forbidden` error. This is because the Jupyter Enterprise Gateway configuration in Amazon EMR doesn't allow access to the root user. We recommend that you don't use the root user for your everyday tasks. For other authentication options, see [AWS Identity and Access Management for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-iam.html).

## Feature limitations
<a name="emr-studio-limitations"></a>

Amazon EMR Studio doesn't support the following Amazon EMR features:
+ Attaching and running jobs on EMR clusters with a security configuration that specifies Kerberos authentication
+ Clusters with multiple primary nodes
+ Clusters that use Amazon EC2 instances based on AWS Graviton2 for Amazon EMR 6.x releases lower than 6.9.0, and 5.x releases lower than 5.36.1 

The following features aren't supported from a Studio that uses trusted identity propagation:
+ Creating EMR clusters without a template.
+ Using EMR Serverless applications.
+ Launching Amazon EMR on EKS clusters.
+ Using a runtime role.
+ Enabling SQL Explorer or Workspace collaboration.

## Service limits for EMR Studio
<a name="emr-studio-default-limits"></a>

The following table displays service limits for EMR Studio.



| Item | Limit | 
| --- | --- | 
| EMR Studios | Maximum of 100 per AWS account | 
| Subnets | Maximum of 5 associated with each EMR Studio | 
| IAM Identity Center Groups | Maximum of 5 assigned to each EMR Studio | 
| IAM Identity Center Users | Maximum of 100 assigned to each EMR Studio | 
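
When you plan a Studio, a small pre-flight check against these limits can catch problems early. The following Python sketch uses the per-Studio values from the table above; the function name and the sample inputs are hypothetical.

```python
# Per-Studio limits copied from the service limits table above.
MAX_SUBNETS_PER_STUDIO = 5
MAX_IDC_GROUPS_PER_STUDIO = 5
MAX_IDC_USERS_PER_STUDIO = 100


def check_studio_plan(subnet_ids, idc_groups, idc_users):
    """Return a list of limit violations; an empty list means the plan fits."""
    checks = [
        ("subnets", len(subnet_ids), MAX_SUBNETS_PER_STUDIO),
        ("IAM Identity Center groups", len(idc_groups), MAX_IDC_GROUPS_PER_STUDIO),
        ("IAM Identity Center users", len(idc_users), MAX_IDC_USERS_PER_STUDIO),
    ]
    return [
        f"{count} {name} exceeds the limit of {limit}"
        for name, count, limit in checks
        if count > limit
    ]


# Hypothetical plan: six subnets is one more than a Studio allows.
problems = check_studio_plan(
    subnet_ids=[f"subnet-{i}" for i in range(6)],
    idc_groups=["analysts", "engineers"],
    idc_users=[],
)
print(problems)
```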

# VPC and subnet best practices for EMR Studio
<a name="emr-studio-vpc-subnet-best-practices"></a>

Use the following best practices to set up an Amazon Virtual Private Cloud (Amazon VPC) with subnets for EMR Studio:
+ You can specify a maximum of five subnets in your VPC to associate with the Studio. We recommend that you provide multiple subnets in different Availability Zones in order to support Workspace availability and give Studio users access to clusters across different Availability Zones. To learn more about working with VPCs, subnets, and Availability Zones, see [VPCs and subnets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html) in the *Amazon Virtual Private Cloud User Guide*.
+ The subnets that you specify should be able to communicate with each other.
+ To let users link a Workspace to publicly hosted Git repositories, you should specify only private subnets that have access to the internet through Network Address Translation (NAT). For more information about setting up a private subnet for Amazon EMR, see [Private subnets](emr-clusters-in-a-vpc.md#emr-vpc-private-subnet).
+ When you use Amazon EMR on EKS with EMR Studio, there must be *at least one subnet in common* between your Studio and the Amazon EKS cluster that you use to register a virtual cluster. Otherwise, your managed endpoint won't appear as an option in Studio Workspaces. You can create an Amazon EKS cluster and associate it with a subnet that belongs to the Studio, or create a Studio and specify your EKS cluster's subnets. 
+ If you plan to use Amazon EMR on EKS with EMR Studio, choose the same VPC as your Amazon EKS cluster worker nodes.
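
The shared-subnet requirement for Amazon EMR on EKS can be checked with a simple set intersection. The following Python sketch assumes you have already collected the subnet IDs for the Studio and for the Amazon EKS cluster; the IDs shown are placeholders.

```python
def shared_subnets(studio_subnet_ids, eks_subnet_ids):
    """Return the subnet IDs that the Studio and the EKS cluster have in common."""
    return set(studio_subnet_ids) & set(eks_subnet_ids)


# Placeholder subnet IDs, for illustration only.
studio_subnets = ["subnet-aaa", "subnet-bbb"]
eks_subnets = ["subnet-bbb", "subnet-ccc"]

common = shared_subnets(studio_subnets, eks_subnets)
if common:
    print(f"Shared subnets: {sorted(common)}")
else:
    print("No shared subnet: the managed endpoint won't appear in Workspaces.")
```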

# Amazon EMR cluster requirements
<a name="emr-studio-cluster-requirements"></a>

**Amazon EMR Clusters Running on Amazon EC2**

All Amazon EMR clusters running on Amazon EC2 that you create for an EMR Studio Workspace must meet the following requirements. Clusters that you create using the EMR Studio interface automatically meet these requirements.
+ The cluster must use Amazon EMR release 5.32.0 or later (Amazon EMR 5.x series) or 6.2.0 or later (Amazon EMR 6.x series). You can create a cluster using the Amazon EMR console, AWS Command Line Interface, or SDK, and then attach it to an EMR Studio Workspace. Studio users can also provision and attach clusters when creating or working in an EMR Studio Workspace. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).
+ The cluster must be within an Amazon Virtual Private Cloud. The EC2-Classic platform isn't supported.
+ The cluster must have Spark, Livy, and Jupyter Enterprise Gateway installed. If you plan to use the cluster for SQL Explorer, you should install both Presto and Spark.
+ To use SQL Explorer, the cluster must use Amazon EMR version 5.34.0 or later or version 6.4.0 or later and have Presto installed. If you want to specify the AWS Glue Data Catalog as the Hive metastore for Presto, you must configure it on the cluster. For more information, see [Using Presto with the AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html).
+ The cluster must be in a private subnet with network address translation (NAT) to use publicly-hosted Git repositories with EMR Studio.
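
The release requirements above can be checked programmatically before you attach a cluster. The following sketch assumes release labels in the `emr-x.y.z` form and treats any series newer than 6.x as later than the 6.x minimum. The function names are illustrative, not part of an AWS API.

```python
import re

# Minimum releases taken from the requirements listed above.
MIN_STUDIO = {5: (5, 32, 0), 6: (6, 2, 0)}
MIN_SQL_EXPLORER = {5: (5, 34, 0), 6: (6, 4, 0)}


def parse_release_label(label):
    """Parse a release label such as 'emr-6.4.0' into a tuple like (6, 4, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"Unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(label, minimums):
    """Compare a release label against the per-series minimum versions."""
    version = parse_release_label(label)
    major = version[0]
    if major < 5:
        return False
    if major in minimums:
        return version >= minimums[major]
    # Assumption: series newer than 6.x count as "later" than the 6.x minimum.
    return True


def supports_studio(label):
    return meets_minimum(label, MIN_STUDIO)


def supports_sql_explorer(label):
    return meets_minimum(label, MIN_SQL_EXPLORER)
```

A version check covers only the release requirement. The cluster must still have the required applications installed, such as Spark, Livy, and Jupyter Enterprise Gateway.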

We recommend the following cluster configurations when you work with EMR Studio.
+ Set deploy mode for Spark sessions to cluster mode. Cluster mode places the application master processes on the core nodes and not on the primary node of a cluster. Doing so relieves the primary node of potential memory pressures. For more information, see [Cluster Mode Overview](https://spark.apache.org/docs/latest/cluster-overview.html) in the Apache Spark documentation.
+ Change the Livy timeout from the default of one hour to six hours as in the following example configuration.

  ```
  {
      "Classification": "livy-conf",
      "Properties": {
          "livy.server.session.timeout": "6h",
          "livy.spark.deploy-mode": "cluster"
      }
  }
  ```
+ Create diverse instance fleets with up to 30 instances, and select multiple instance types in your Spot Instance fleet. For example, you might specify the following memory-optimized instance types for Spark workloads: `r5.2xlarge`, `r5.4xlarge`, `r5.8xlarge`, `r5.12xlarge`, `r5.16xlarge`, `r4.2xlarge`, `r4.4xlarge`, and `r4.8xlarge`. For more information, see [Planning and configuring instance fleets for your Amazon EMR cluster](emr-instance-fleet.md).
+ Use the capacity-optimized allocation strategy for Spot Instances to help Amazon EMR make effective instance selections based on real-time capacity insights from Amazon EC2. For more information, see [Allocation strategy for instance fleets](emr-instance-fleet.md#emr-instance-fleet-allocation-strategy).
+ Enable managed scaling on your cluster. Set the maximum core nodes parameter to the minimum persistent capacity that you plan to use, and configure scaling on a well-diversified task fleet that runs on Spot Instances to save on costs. For more information, see [Using managed scaling in Amazon EMR](emr-managed-scaling.md).
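
If you generate cluster configurations from code, the Livy settings recommended above might be built as in the following sketch. The `livy_configuration` helper is a hypothetical name. The `Classification` and `Properties` keys follow the configuration object shape used by the Amazon EMR API, and the property names come from the example configuration above.

```python
import json


def livy_configuration(timeout_hours=6, deploy_mode="cluster"):
    """Build a livy-conf configuration object with the recommended settings."""
    if timeout_hours <= 0:
        raise ValueError("timeout_hours must be positive")
    return {
        "Classification": "livy-conf",
        "Properties": {
            # Session timeout expressed in hours, for example "6h".
            "livy.server.session.timeout": f"{timeout_hours}h",
            # Cluster mode keeps application master processes off the primary node.
            "livy.spark.deploy-mode": deploy_mode,
        },
    }


# Serialize a list of configuration objects, for example to save as a JSON file.
configurations_json = json.dumps([livy_configuration()], indent=2)
```

You could supply the resulting list wherever you provide configuration classifications when you create a cluster.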

We also recommend that you keep Amazon EMR block public access enabled and restrict inbound SSH traffic to trusted sources. Inbound access to a cluster lets users run notebooks on the cluster. For more information, see [Using Amazon EMR block public access](emr-block-public-access.md) and [Control network traffic with security groups for your Amazon EMR cluster](emr-security-groups.md).

**Amazon EMR on EKS Clusters**

In addition to EMR clusters running on Amazon EC2, you can set up and manage Amazon EMR on EKS clusters for EMR Studio using the AWS CLI. Set up Amazon EMR on EKS clusters using the following guidelines:
+ Create a managed HTTPS endpoint for the Amazon EMR on EKS cluster. Users attach a Workspace to a managed endpoint. The Amazon Elastic Kubernetes Service (EKS) cluster that you use to register a virtual cluster must have a private subnet to support managed endpoints.
+ Use an Amazon EKS cluster with at least one private subnet and network address translation (NAT) when you want to use publicly-hosted Git repositories.
+ Avoid using [Amazon EKS optimized Arm Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#arm-ami), which aren't supported for Amazon EMR on EKS managed endpoints.
+ Avoid using AWS Fargate-only Amazon EKS clusters, which aren't supported.

# Configure Amazon EMR Studio
<a name="emr-studio-configure"></a>

This section is for EMR Studio administrators. It covers how to set up an EMR Studio for your team and provides instructions for tasks such as assigning users and groups, setting up cluster templates, and optimizing Apache Spark for EMR Studio.

**Topics**
+ [Administrator permissions to create and manage an EMR Studio](emr-studio-admin-permissions.md)
+ [Set up an EMR Studio](emr-studio-set-up.md)
+ [Monitor, update and delete Amazon EMR Studio resources](emr-studio-manage-studio.md)
+ [Encrypting EMR Studio workspace notebooks and files](emr-studio-workspace-storage-encryption.md)
+ [Define security groups to control EMR Studio network traffic](emr-studio-security-groups.md)
+ [Create AWS CloudFormation templates for Amazon EMR Studio](emr-studio-cluster-templates.md)
+ [Establish access and permissions for Git-based repositories](emr-studio-enable-git.md)
+ [Optimize Spark jobs in EMR Studio](emr-studio-spark-optimization.md)

# Administrator permissions to create and manage an EMR Studio
<a name="emr-studio-admin-permissions"></a>

The IAM permissions described on this page permit you to create and manage an EMR Studio. For detailed information about each required permission, see [Permissions required to manage an EMR Studio](#emr-studio-admin-permissions-table).

## Permissions required to manage an EMR Studio
<a name="emr-studio-admin-permissions-table"></a>

The following table lists the operations related to creating and managing an EMR Studio. The table also displays the permissions needed for each operation.

**Note**  
You only need IAM Identity Center and Studio `SessionMapping` actions when you use IAM Identity Center authentication mode.


**Permissions to create and manage an EMR Studio**  

| Operation | Permissions | 
| --- | --- | 
| Create a Studio |  <pre>"elasticmapreduce:CreateStudio", <br />"sso:CreateApplication",<br />"sso:PutApplicationAuthenticationMethod",<br />"sso:PutApplicationGrant",<br />"sso:PutApplicationAccessScope",<br />"sso:PutApplicationAssignmentConfiguration",<br />"iam:PassRole"</pre>  | 
| Describe a Studio |  <pre>"elasticmapreduce:DescribeStudio",<br />"sso:GetManagedApplicationInstance"</pre>  | 
| List Studios |  <pre>"elasticmapreduce:ListStudios"</pre>  | 
| Delete a Studio |  <pre>"elasticmapreduce:DeleteStudio",<br />"sso:DeleteApplication",<br />"sso:DeleteApplicationAuthenticationMethod",<br />"sso:DeleteApplicationAccessScope",<br />"sso:DeleteApplicationGrant"</pre>  | 
| **Additional permissions required when you use IAM Identity Center mode** |  | 
|  Assign users or groups to a Studio  |  <pre>"elasticmapreduce:CreateStudioSessionMapping",<br />"sso:GetProfile",<br />"sso:ListDirectoryAssociations",<br />"sso:ListProfiles",<br />"sso:AssociateProfile",<br />"sso-directory:SearchUsers",<br />"sso-directory:SearchGroups",<br />"sso-directory:DescribeUser",<br />"sso-directory:DescribeGroup",<br />"sso:ListInstances",<br />"sso:CreateApplicationAssignment",<br />"sso:DescribeInstance",<br />"organizations:DescribeOrganization",<br />"organizations:ListDelegatedAdministrators",<br />"sso:CreateInstance",<br />"sso:DescribeRegisteredRegions",<br />"sso:GetSharedSsoConfiguration",<br />"iam:ListPolicies"</pre>  | 
|  Retrieve Studio assignment details for a specific user or group  |  <pre>"sso-directory:SearchUsers",<br />"sso-directory:SearchGroups",<br />"sso-directory:DescribeUser",<br />"sso-directory:DescribeGroup",<br />"sso:DescribeApplication",<br />"elasticmapreduce:GetStudioSessionMapping"</pre>  | 
| List all users and groups assigned to a Studio |  <pre>"elasticmapreduce:ListStudioSessionMappings"</pre>  | 
| Update the session policy attached to a user or group assigned to a Studio |  <pre>"sso-directory:SearchUsers",<br />"sso-directory:SearchGroups",<br />"sso-directory:DescribeUser",<br />"sso-directory:DescribeGroup",<br />"sso:DescribeApplication",<br />"sso:DescribeInstance",<br />"elasticmapreduce:UpdateStudioSessionMapping"</pre>  | 
| Remove a user or group from a Studio |  <pre>"elasticmapreduce:DeleteStudioSessionMapping",<br />"sso-directory:SearchUsers",<br />"sso-directory:SearchGroups",<br />"sso-directory:DescribeUser",<br />"sso-directory:DescribeGroup",<br />"sso:ListDirectoryAssociations",<br />"sso:GetProfile",<br />"sso:DescribeApplication",<br />"sso:DescribeInstance",<br />"sso:ListProfiles",<br />"sso:DisassociateProfile",<br />"sso:DeleteApplicationAssignment",<br />"sso:ListApplicationAssignments"<br /></pre>  | 

**To create a policy with admin permissions for EMR Studio**

1. Follow the instructions in [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) to create a policy using one of the following examples. The permissions you need depend on your [authentication mode for EMR Studio](emr-studio-authentication.md). 

   Insert your own values for these items:
   + Replace *`<your-resource-ARN>`* with the Amazon Resource Name (ARN) of the object or objects that the statement covers for your use case.
   + Replace *<region>* with the code of the AWS Region where you plan to create the Studio.
   + Replace *<aws-account_id>* with the ID of the AWS account for the Studio.
   + Replace *<EMRStudio-Service-Role>* and *<EMRStudio-User-Role>* with the names of your [EMR Studio service role](emr-studio-service-role.md) and [EMR Studio user role](emr-studio-user-permissions.md#emr-studio-create-user-role).  
**Example policy: Admin permissions when you use IAM authentication mode**  

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Resource": [
           "arn:aws:elasticmapreduce:*:123456789012:studio/*"
         ],
         "Action": [
           "elasticmapreduce:CreateStudio",
           "elasticmapreduce:DescribeStudio",
           "elasticmapreduce:DeleteStudio"
         ],
         "Sid": "AllowELASTICMAPREDUCECreatestudio"
       },
       {
         "Effect": "Allow",
         "Resource": [
           "*"
         ],
         "Action": [
           "elasticmapreduce:ListStudios"
         ],
         "Sid": "AllowELASTICMAPREDUCEListstudios"
       },
       {
         "Effect": "Allow",
         "Resource": [
           "arn:aws:iam::123456789012:role/EMRStudioServiceRole"
         ],
         "Action": [
           "iam:PassRole"
         ],
         "Sid": "AllowIAMPassrole"
       }
     ]
   }
   ```

------  
**Example policy: Admin permissions when you use IAM Identity Center authentication mode**  
**Note**  
IAM Identity Center and IAM Identity Center directory APIs don't support specifying an ARN in the resource element of an IAM policy statement. To allow access to IAM Identity Center and the IAM Identity Center directory, the following permissions specify all resources, `"Resource":"*"`, for IAM Identity Center actions. For more information, see [Actions, resources, and condition keys for IAM Identity Center Directory](https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsssodirectory.html#awsssodirectory-actions-as-permissions).

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Resource": [
           "arn:aws:elasticmapreduce:*:123456789012:studio/*"
         ],
         "Action": [
           "elasticmapreduce:CreateStudio",
           "elasticmapreduce:DescribeStudio",
           "elasticmapreduce:DeleteStudio",
           "elasticmapreduce:CreateStudioSessionMapping",
           "elasticmapreduce:GetStudioSessionMapping",
           "elasticmapreduce:UpdateStudioSessionMapping",
           "elasticmapreduce:DeleteStudioSessionMapping"
         ],
         "Sid": "AllowELASTICMAPREDUCECreatestudio"
       },
       {
         "Effect": "Allow",
         "Resource": [
           "*"
         ],
         "Action": [
           "elasticmapreduce:ListStudios",
           "elasticmapreduce:ListStudioSessionMappings"
         ],
         "Sid": "AllowELASTICMAPREDUCEListstudios"
       },
       {
         "Effect": "Allow",
         "Resource": [
           "arn:aws:iam::123456789012:role/EMRStudio-SvcRole",
           "arn:aws:iam::123456789012:role/EMRStudio-User-Role"
         ],
         "Action": [
           "iam:PassRole"
         ],
         "Sid": "AllowIAMPassrole"
       },
       {
         "Effect": "Allow",
         "Resource": [
           "*"
         ],
         "Action": [
           "sso:CreateApplication",
           "sso:PutApplicationAuthenticationMethod",
           "sso:PutApplicationGrant",
           "sso:PutApplicationAccessScope",
           "sso:PutApplicationAssignmentConfiguration",
           "sso:DescribeApplication",
           "sso:DeleteApplication",
           "sso:DeleteApplicationAuthenticationMethod",
           "sso:DeleteApplicationAccessScope",
           "sso:DeleteApplicationGrant",
           "sso:ListInstances",
           "sso:CreateApplicationAssignment",
           "sso:DeleteApplicationAssignment",
           "sso:ListApplicationAssignments",
           "sso:DescribeInstance",
           "sso:AssociateProfile",
           "sso:DisassociateProfile",
           "sso:GetProfile",
           "sso:ListDirectoryAssociations",
           "sso:ListProfiles",
           "sso-directory:SearchUsers",
           "sso-directory:SearchGroups",
           "sso-directory:DescribeUser",
           "sso-directory:DescribeGroup",
           "organizations:DescribeOrganization",
           "organizations:ListDelegatedAdministrators",
           "sso:CreateInstance",
           "sso:DescribeRegisteredRegions",
           "sso:GetSharedSsoConfiguration",
           "iam:ListPolicies"
         ],
         "Sid": "AllowSSOCreateapplication"
       }
     ]
   }
   ```

------

1. Attach the policy to your IAM identity (user, role, or group). For instructions, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).
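
To avoid editing placeholder values by hand, you might generate the IAM authentication mode policy from your own Region, account ID, and service role name. The following sketch mirrors the statements in the IAM mode example above. The function name `iam_mode_admin_policy` and its parameters are illustrative assumptions, not an AWS API.

```python
import json
import re


def iam_mode_admin_policy(region, account_id, service_role_name):
    """Return the IAM-mode admin policy as a dict with placeholders filled in."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    studio_arn = f"arn:aws:elasticmapreduce:{region}:{account_id}:studio/*"
    role_arn = f"arn:aws:iam::{account_id}:role/{service_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowELASTICMAPREDUCECreatestudio",
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:CreateStudio",
                    "elasticmapreduce:DescribeStudio",
                    "elasticmapreduce:DeleteStudio",
                ],
                "Resource": [studio_arn],
            },
            {
                "Sid": "AllowELASTICMAPREDUCEListstudios",
                "Effect": "Allow",
                "Action": ["elasticmapreduce:ListStudios"],
                "Resource": ["*"],
            },
            {
                "Sid": "AllowIAMPassrole",
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [role_arn],
            },
        ],
    }


# Example values only; substitute your own Region, account ID, and role name.
policy_document = json.dumps(
    iam_mode_admin_policy("us-east-1", "123456789012", "EMRStudioServiceRole"),
    indent=2,
)
```

You can paste the resulting JSON document into the policy editor when you create the policy. Review the output before you use it, because it scopes Studio actions to a single Region, whereas the example above uses `*` for the Region.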

# Set up an EMR Studio
<a name="emr-studio-set-up"></a>

Complete the following steps to set up an EMR Studio.

**Before you start**

**Note**  
If you plan to use EMR Studio with Amazon EMR on EKS, we recommend that you first set up Amazon EMR on EKS for EMR Studio before you set up a Studio.

Before you set up an EMR Studio, make sure you have the following items:
+ An AWS account. For instructions, see [Before you set up Amazon EMR](emr-setting-up.md).
+ Permissions to create and manage an EMR Studio. For more information, see [Administrator permissions to create and manage an EMR Studio](emr-studio-admin-permissions.md).
+ An Amazon S3 bucket where EMR Studio can back up the Workspaces and notebook files in your Studio. For instructions, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the *Amazon Simple Storage Service (S3) User Guide*.
+ If you want to attach to an Amazon EMR on EC2 or Amazon EMR on EKS cluster, or use Git repositories, you need an Amazon Virtual Private Cloud (VPC) for the Studio, and a maximum of five subnets. You don't need a VPC to use EMR Studio with EMR Serverless. For tips on how to configure networking, see [VPC and subnet best practices for EMR Studio](emr-studio-vpc-subnet-best-practices.md).

**To set up an EMR Studio**

1.  [Choose an authentication mode for Amazon EMR Studio](emr-studio-authentication.md)

1. Create the following Studio resources.
   + [Create an EMR Studio service role](emr-studio-service-role.md)
   + [Configure EMR Studio user permissions for Amazon EC2 or Amazon EKS](emr-studio-user-permissions.md)
   + (Optional) [Define security groups to control EMR Studio network traffic](emr-studio-security-groups.md).

1. [Create an EMR Studio](emr-studio-create-studio.md)

1. [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups)

After you complete the setup steps, you can [Use an Amazon EMR Studio](use-an-emr-studio.md).

# Choose an authentication mode for Amazon EMR Studio
<a name="emr-studio-authentication"></a>

EMR Studio supports two authentication modes: IAM authentication mode and IAM Identity Center authentication mode. IAM mode uses AWS Identity and Access Management (IAM), while IAM Identity Center mode uses AWS IAM Identity Center. When you create an EMR Studio, you choose the authentication mode for all users of that Studio. For more information about the different authentication modes, see [Authentication and user login](how-emr-studio-works.md#emr-studio-login).

Use the following table to choose an authentication mode for EMR Studio.



| If you are... | We recommend... | 
| --- | --- | 
| Already familiar with or have previously set up IAM authentication or federation |  [IAM authentication mode](#emr-studio-iam-authentication), which offers the following benefits: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-authentication.html)  | 
| New to AWS or Amazon EMR |  [IAM Identity Center authentication mode](#emr-studio-enable-sso), which provides the following features: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-authentication.html)  | 

## Set up IAM authentication mode for Amazon EMR Studio
<a name="emr-studio-iam-authentication"></a>

With IAM authentication mode, you can use either IAM authentication or IAM federation. IAM *authentication* lets you manage IAM identities such as users, groups, and roles in IAM. You grant users access to a Studio with IAM permissions policies and [attribute-based access control (ABAC)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html). IAM *federation* lets you establish trust between a third-party identity provider (IdP) and AWS so that you can manage user identities through your IdP.

**Note**  
If you already use IAM to control access to AWS resources, or if you've already configured your identity provider (IdP) for IAM, see [User permissions for IAM authentication mode](how-emr-studio-works.md#emr-studio-iam-authorization) to set user permissions when you use IAM authentication mode for EMR Studio.

### Use IAM federation for Amazon EMR Studio
<a name="emr-studio-iam-federation"></a>

To use IAM federation for EMR Studio, you create a trust relationship between your AWS account and your identity provider (IdP) and enable federated users to access the AWS Management Console. The steps you take to create this trust relationship differ depending on your IdP's federation standard.

In general, you complete the following tasks to configure federation with an external IdP. For complete instructions, see [Enabling SAML 2.0 federated users to access the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html) and [Enabling custom identity broker access to the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-custom-url.html) in the *AWS Identity and Access Management User Guide*.

1. Gather information from your IdP. This usually means generating a metadata document to validate SAML authentication requests from your IdP.

1. Create an identity provider IAM entity to store information about your IdP. For instructions, see [Creating IAM identity providers](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create.html).

1. Create one or more IAM roles for your IdP. EMR Studio assigns a role to a federated user when the user logs in. The role permits your IdP to request temporary security credentials for access to AWS. For instructions, see [Creating a role for a third-party identity provider (federation)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp.html). The permissions policies that you assign to the role determine what federated users can do in AWS and in an EMR Studio. For more information, see [User permissions for IAM authentication mode](how-emr-studio-works.md#emr-studio-iam-authorization).

1. (For SAML providers) Complete the SAML trust by configuring your IdP with information about AWS and the roles that you want federated users to assume. This configuration process creates *relying party trust* between your IdP and AWS. For more information, see [Configuring your SAML 2.0 IdP with relying party trust and adding claims](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_saml_relying-party.html).

**To configure an EMR Studio as a SAML application in your IdP portal**

You can configure a particular EMR Studio as a SAML application using a deep link to the Studio. Doing so lets users log in to your IdP portal and launch a specific Studio instead of navigating through the Amazon EMR console.
+ Use the following format to configure a deep link to your EMR Studio as a landing URL after SAML assertion verification. 

  ```
  https://console.aws.amazon.com/emr/home?region=<aws-region>#studio/<your-studio-id>/start
  ```
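
If you manage several Studios, you might build these deep links in code. The following sketch fills in the URL format shown above. The `studio_deep_link` function and the sample Studio ID are hypothetical.

```python
from urllib.parse import quote


def studio_deep_link(region, studio_id):
    """Build the landing URL for a specific Studio using the format above."""
    if not region or not studio_id:
        raise ValueError("region and studio_id are required")
    # Percent-encode both values so that unexpected characters can't alter the URL.
    return (
        "https://console.aws.amazon.com/emr/home"
        f"?region={quote(region, safe='')}"
        f"#studio/{quote(studio_id, safe='')}/start"
    )


# "es-EXAMPLE12345" is a made-up Studio ID used only for illustration.
example_link = studio_deep_link("us-west-2", "es-EXAMPLE12345")
```

Use the resulting value as the landing URL for the SAML application in your IdP portal.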

## Set up IAM Identity Center authentication mode for Amazon EMR Studio
<a name="emr-studio-enable-sso"></a>

To prepare AWS IAM Identity Center for EMR Studio, you must configure your identity source and provision users and groups. Provisioning is the process of making user and group information available for use by IAM Identity Center and by applications that use IAM Identity Center. For more information, see [User and group provisioning](https://docs.aws.amazon.com/singlesignon/latest/userguide/users-groups-provisioning.html#user-group-provision). 

EMR Studio supports using the following identity providers for IAM Identity Center:
+ **AWS Managed Microsoft AD and self-managed Active Directory** – For more information, see [Connect to your Microsoft AD directory](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source-ad.html).
+ **SAML-based providers** – For a full list, see [Supported identity providers](https://docs.aws.amazon.com/singlesignon/latest/userguide/supported-idps.html).
+ **The IAM Identity Center directory** – For more information, see [Manage identities in IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source-sso.html).

**To set up IAM Identity Center for EMR Studio**

1. To set up IAM Identity Center for EMR Studio, you need the following:
   + A management account in your AWS organization if you use multiple accounts in your organization. 
**Note**  
You should only use your management account to enable IAM Identity Center and *provision* users and groups. After you set up IAM Identity Center, use a member account to create an EMR Studio and *assign* users and groups. To learn more about AWS terminology, see [AWS Organizations terminology and concepts](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_getting-started_concepts.html). 
   + If you enabled IAM Identity Center before November 25, 2019, you might have to enable applications that use IAM Identity Center for the accounts in your AWS organization. For more information, see [Enable IAM Identity Center-integrated applications in AWS accounts](https://docs.aws.amazon.com/singlesignon/latest/userguide/app-enablement.html#enable-app-enablement).
   + Make sure that you have the prerequisites listed on the [IAM Identity Center prerequisites](https://docs.aws.amazon.com/singlesignon/latest/userguide/prereqs.html) page.

1. Follow the instructions in [Enable IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/step1.html) to enable IAM Identity Center in the AWS Region where you want to create the EMR Studio.

1. Connect IAM Identity Center to your identity provider and provision the users and groups that you want to assign to the Studio.   
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-authentication.html)

You can now assign users and groups from your Identity Store to an EMR Studio. For instructions, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).

# Create an EMR Studio service role
<a name="emr-studio-service-role"></a>

## About the EMR Studio service role
<a name="emr-studio-about-service-role"></a>

Each EMR Studio uses an IAM role with permissions that let the Studio interact with other AWS services. This service role must include permissions that allow EMR Studio to establish a secure network channel between Workspaces and clusters, to store notebook files in Amazon S3, and to access AWS Secrets Manager while linking a Workspace to a Git repository.

Use the Studio service role (instead of session policies) to define all Amazon S3 access permissions for storing notebook files, and to define AWS Secrets Manager access permissions.

## How to create a service role for EMR Studio on Amazon EC2 or Amazon EKS
<a name="emr-studio-service-role-instructions"></a>

1. Follow the instructions in [Creating a role to delegate permissions to an AWS service](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html) to create the service role with the following trust policy. 
**Important**  
The following trust policy includes the [`aws:SourceArn`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) and [`aws:SourceAccount`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) global condition keys to limit the permissions that you give EMR Studio to particular resources in your account. Doing so can protect you against [the confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html).

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Service": "elasticmapreduce.amazonaws.com"
         },
         "Action": [
           "sts:AssumeRole"
         ],
         "Condition": {
           "StringEquals": {
             "aws:SourceAccount": "123456789012"
           },
           "ArnLike": {
             "aws:SourceArn": "arn:aws:elasticmapreduce:*:123456789012:*"
           }
         },
         "Sid": "AllowSTSAssumerole"
       }
     ]
   }
   ```

------

1. Remove the default role permissions. Then, include the permissions from the following sample IAM permissions policy. Alternatively, you can create a custom policy that uses the [EMR Studio service role permissions](#emr-studio-service-role-permissions-table).
**Important**  
+ For Amazon EC2 tag-based access control to work with EMR Studio, you must set access for the `ModifyNetworkInterfaceAttribute` API as shown in the following policy.
+ For EMR Studio to work with the service role, you must not change the following statements: `AllowAddingEMRTagsDuringDefaultSecurityGroupCreation` and `AllowAddingTagsDuringEC2ENICreation`.
+ To use the example policy, you must tag the following resources with the key `"for-use-with-amazon-emr-managed-policies"` and value `"true"`. You can apply tags to resources using the **Tags** tab on the relevant resource screen in the AWS Management Console.
  + Your Amazon Virtual Private Cloud (VPC) for EMR Studio.
  + Each subnet that you want to use with the Studio.
  + Any custom EMR Studio security groups. You must tag any security groups that you created during the EMR Studio preview period if you want to continue to use them.
  + Secrets maintained in AWS Secrets Manager that Studio users use to link Git repositories to a Workspace.

   Where applicable, change the `*` in `"Resource":"*"` in the following policy to specify the Amazon Resource Name (ARN) of the resources that the statement covers for your use case.

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Sid": "AllowEMRReadOnlyActions",
         "Effect": "Allow",
         "Action": [
           "elasticmapreduce:ListInstances",
           "elasticmapreduce:DescribeCluster",
           "elasticmapreduce:ListSteps"
         ],
         "Resource": [
           "*"
         ]
       },
       {
         "Sid": "AllowEC2ENIActionsWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateNetworkInterfacePermission",
           "ec2:DeleteNetworkInterface"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:network-interface/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowEC2ENIAttributeAction",
         "Effect": "Allow",
         "Action": [
           "ec2:ModifyNetworkInterfaceAttribute"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:instance/*",
           "arn:aws:ec2:*:*:network-interface/*",
           "arn:aws:ec2:*:*:security-group/*"
         ]
       },
       {
         "Sid": "AllowEC2SecurityGroupActionsWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:AuthorizeSecurityGroupEgress",
           "ec2:AuthorizeSecurityGroupIngress",
           "ec2:RevokeSecurityGroupEgress",
           "ec2:RevokeSecurityGroupIngress",
           "ec2:DeleteNetworkInterfacePermission"
         ],
         "Resource": [
           "*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowDefaultEC2SecurityGroupsCreationWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateSecurityGroup"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:security-group/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowDefaultEC2SecurityGroupsCreationInVPCWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateSecurityGroup"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:vpc/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowAddingEMRTagsDuringDefaultSecurityGroupCreation",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateTags"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:security-group/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true",
             "ec2:CreateAction": "CreateSecurityGroup"
           }
         }
       },
       {
         "Sid": "AllowEC2ENICreationWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateNetworkInterface"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:network-interface/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowEC2ENICreationInSubnetAndSecurityGroupWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateNetworkInterface"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:subnet/*",
           "arn:aws:ec2:*:*:security-group/*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowAddingTagsDuringEC2ENICreation",
         "Effect": "Allow",
         "Action": [
           "ec2:CreateTags"
         ],
         "Resource": [
           "arn:aws:ec2:*:*:network-interface/*"
         ],
         "Condition": {
           "StringEquals": {
             "ec2:CreateAction": "CreateNetworkInterface"
           }
         }
       },
       {
         "Sid": "AllowEC2ReadOnlyActions",
         "Effect": "Allow",
         "Action": [
           "ec2:DescribeSecurityGroups",
           "ec2:DescribeNetworkInterfaces",
           "ec2:DescribeTags",
           "ec2:DescribeInstances",
           "ec2:DescribeSubnets",
           "ec2:DescribeVpcs"
         ],
         "Resource": [
           "*"
         ]
       },
       {
         "Sid": "AllowSecretsManagerReadOnlyActionsWithEMRTags",
         "Effect": "Allow",
         "Action": [
           "secretsmanager:GetSecretValue"
         ],
         "Resource": [
           "arn:aws:secretsmanager:*:*:secret:*"
         ],
         "Condition": {
           "StringEquals": {
             "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
         }
       },
       {
         "Sid": "AllowWorkspaceCollaboration",
         "Effect": "Allow",
         "Action": [
           "iam:GetUser",
           "iam:GetRole",
           "iam:ListUsers",
           "iam:ListRoles",
           "sso:GetManagedApplicationInstance",
           "sso-directory:SearchUsers"
         ],
         "Resource": [
           "*"
         ]
       }
     ]
   }
   ```

------

1. Give your service role read and write access to your Amazon S3 location for EMR Studio. Use the following minimum set of permissions. For more information, see the [Amazon S3: Allows read and write access to objects in an S3 Bucket, programmatically and in the console](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_rw-bucket-console.html) example.

   ```
   "s3:PutObject",
   "s3:GetObject",
   "s3:GetEncryptionConfiguration",
   "s3:ListBucket",
   "s3:DeleteObject"
   ```

   If you encrypt your Amazon S3 bucket, include the following permissions for AWS Key Management Service.

   ```
   "kms:Decrypt",
   "kms:GenerateDataKey",
   "kms:ReEncryptFrom",
   "kms:ReEncryptTo",
   "kms:DescribeKey"
   ```

1. If you want to control access to Git secrets at the user level, add tag-based permissions for `secretsmanager:GetSecretValue` to the EMR Studio **user role policy**, and remove the `secretsmanager:GetSecretValue` permission from the EMR Studio **service role policy**. For more information on setting fine-grained user permissions, see [Create permissions policies for EMR Studio users](emr-studio-user-permissions.md#emr-studio-permissions-policies).

## Minimum service role for EMR Serverless
<a name="emr-studio-service-role-serverless"></a>

If you want to run interactive workloads with EMR Serverless through EMR Studio notebooks, use the same trust policy that you use to set up EMR Studio in the previous section, [How to create a service role for EMR Studio on Amazon EC2 or Amazon EKS](#emr-studio-service-role-instructions).

For your IAM policy, the minimum viable policy has the following permissions. Replace `bucket-name` with the name of the bucket that you plan to use when you configure your EMR Studio and Workspace. EMR Studio uses the bucket to back up the Workspaces and notebook files in your Studio.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ObjectActions",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name/*"
      ]
    },
    {
      "Sid": "BucketActions",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetEncryptionConfiguration"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name"
      ]
    }
  ]
}
```

------

If you plan to use an encrypted Amazon S3 bucket, add the following permissions to your policy:

```
"kms:Decrypt",
"kms:GenerateDataKey",
"kms:ReEncryptFrom",
"kms:ReEncryptTo",
"kms:DescribeKey"
```
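
The S3 and AWS KMS permissions above can be combined into one policy document. The following Python sketch shows one possible way to assemble it with the standard library only. The function name `build_storage_policy`, the `KmsActions` Sid, and the example key ARN are hypothetical, and scoping the KMS actions to a single key ARN is an assumption rather than a requirement stated in this guide.

```python
import json

# Action lists copied from the minimum EMR Serverless policy in this section.
OBJECT_ACTIONS = ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"]
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetEncryptionConfiguration"]
KMS_ACTIONS = [
    "kms:Decrypt",
    "kms:GenerateDataKey",
    "kms:ReEncryptFrom",
    "kms:ReEncryptTo",
    "kms:DescribeKey",
]


def build_storage_policy(bucket_name, kms_key_arn=None):
    """Return the minimum S3 policy, plus a KMS statement when a key ARN is given."""
    statements = [
        {
            "Sid": "ObjectActions",
            "Effect": "Allow",
            "Action": list(OBJECT_ACTIONS),
            "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
        },
        {
            "Sid": "BucketActions",
            "Effect": "Allow",
            "Action": list(BUCKET_ACTIONS),
            "Resource": [f"arn:aws:s3:::{bucket_name}"],
        },
    ]
    if kms_key_arn is not None:
        statements.append(
            {
                "Sid": "KmsActions",  # hypothetical Sid, not from this guide
                "Effect": "Allow",
                "Action": list(KMS_ACTIONS),
                "Resource": [kms_key_arn],  # assumption: one key per bucket
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_storage_policy(
        "bucket-name",
        kms_key_arn="arn:aws:kms:us-east-1:123456789012:key/example-key-id",
    )
    print(json.dumps(policy, indent=2))
```

Omit `kms_key_arn` for an unencrypted bucket to get the two-statement policy shown earlier in this section.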

## EMR Studio service role permissions
<a name="emr-studio-service-role-permissions-table"></a>

The following table lists the operations that EMR Studio performs using the service role, along with the IAM actions required for each operation.


| Operation | Actions | 
| --- | --- | 
| Establish a secure network channel between a Workspace and an EMR cluster, and perform necessary cleanup actions. |  <pre>"ec2:CreateNetworkInterface", <br />"ec2:CreateNetworkInterfacePermission", <br />"ec2:DeleteNetworkInterface", <br />"ec2:DeleteNetworkInterfacePermission", <br />"ec2:DescribeNetworkInterfaces", <br />"ec2:ModifyNetworkInterfaceAttribute", <br />"ec2:AuthorizeSecurityGroupEgress", <br />"ec2:AuthorizeSecurityGroupIngress", <br />"ec2:CreateSecurityGroup",<br />"ec2:DescribeSecurityGroups", <br />"ec2:RevokeSecurityGroupEgress",<br />"ec2:DescribeTags",<br />"ec2:DescribeInstances",<br />"ec2:DescribeSubnets",<br />"ec2:DescribeVpcs",<br />"elasticmapreduce:ListInstances", <br />"elasticmapreduce:DescribeCluster", <br />"elasticmapreduce:ListSteps"</pre>  | 
| Use Git credentials stored in AWS Secrets Manager to link Git repositories to a Workspace. |  <pre>"secretsmanager:GetSecretValue"</pre>  | 
| Apply AWS tags to the network interface and default security groups that EMR Studio creates while setting up the secure network channel. For more information, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). |  <pre>"ec2:CreateTags"</pre>  | 
| Access or upload notebook files and metadata to Amazon S3. |  <pre>"s3:PutObject",<br />"s3:GetObject",<br />"s3:GetEncryptionConfiguration",<br />"s3:ListBucket",<br />"s3:DeleteObject" </pre> If you use an encrypted Amazon S3 bucket, include the following permissions. <pre>"kms:Decrypt",<br />"kms:GenerateDataKey",<br />"kms:ReEncryptFrom",<br />"kms:ReEncryptTo",<br />"kms:DescribeKey"</pre>  | 
| Enable and configure Workspace collaboration. |  <pre>"iam:GetUser",<br />"iam:GetRole",<br />"iam:ListUsers",<br />"iam:ListRoles",<br />"sso:GetManagedApplicationInstance",<br />"sso-directory:SearchUsers",<br />"sso:DescribeApplication",<br />"sso:DescribeInstance"</pre>  | 
| [ Encrypt EMR Studio workspace notebooks and files using customer managed keys (CMK) with AWS Key Management Service](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-workspace-storage-encryption)  |  <pre>"kms:Decrypt",<br />"kms:GenerateDataKey",<br />"kms:ReEncryptFrom",<br />"kms:ReEncryptTo",<br />"kms:DescribeKey"</pre>  | 
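
To compare an existing service role policy against a row of this table, a small offline check can help. The following Python sketch is illustrative only: `missing_actions` and `S3_NOTEBOOK_ACTIONS` are hypothetical names, and the check looks only at `Allow` statements and their `Action` patterns. It ignores `Resource`, `Condition`, `NotAction`, and explicit `Deny` statements, so it does not replace IAM policy evaluation.

```python
from fnmatch import fnmatchcase

# Actions from the "Access or upload notebook files and metadata" row above.
S3_NOTEBOOK_ACTIONS = [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetEncryptionConfiguration",
    "s3:ListBucket",
    "s3:DeleteObject",
]


def allowed_action_patterns(policy):
    """Collect lowercased Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy, required_actions):
    """Return the required actions that no Allow statement pattern matches."""
    patterns = allowed_action_patterns(policy)
    return [
        action
        for action in required_actions
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    example_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:PutObject", "s3:ListBucket"],
                "Resource": "*",
            }
        ],
    }
    print(missing_actions(example_policy, S3_NOTEBOOK_ACTIONS))
```

Action names are compared case-insensitively, and `*` and `?` in a policy pattern are treated as wildcards.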

# Configure EMR Studio user permissions for Amazon EC2 or Amazon EKS
<a name="emr-studio-user-permissions"></a>

You must configure user permissions policies for Amazon EMR Studio so that you can set fine-grained user and group permissions. For information about how user permissions work in EMR Studio, see [Access control](how-emr-studio-works.md#emr-studio-access-control) in [How Amazon EMR Studio works](how-emr-studio-works.md). 

**Note**  
The permissions covered in this section don't enforce data access control. To manage access to input datasets, you should configure permissions for the clusters that your Studio uses. For more information, see [Security in Amazon EMR](emr-security.md).

## Create an EMR Studio user role for IAM Identity Center authentication mode
<a name="emr-studio-create-user-role"></a>

You must create an EMR Studio user role when you use IAM Identity Center authentication mode. 

**To create a user role for EMR Studio**

1. Follow the instructions in [Creating a role to delegate permissions to an AWS service](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html) in the *AWS Identity and Access Management User Guide* to create a user role.

   When you create the role, use the following trust relationship policy.

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "AllowSTSAssumerole",
         "Effect": "Allow",
         "Principal": {
           "Service": "elasticmapreduce.amazonaws.com"
         },
         "Action": [
           "sts:AssumeRole",
           "sts:SetContext"
         ]
       }
     ]
   }
   ```

------

1. Remove the default role permissions and policies. 

1. Before you assign users and groups to a Studio, attach your EMR Studio session policies to the user role. For instructions on how to create session policies, see [Create permissions policies for EMR Studio users](#emr-studio-permissions-policies).

## Create permissions policies for EMR Studio users
<a name="emr-studio-permissions-policies"></a>

Refer to the following sections to create permissions policies for EMR Studio.

**Topics**
+ [Create the permissions policies](#emr-studio-permissions-policies-create)
+ [Set ownership for Workspace collaboration](#emr-studio-workspace-collaboration-permissions)
+ [Create user-level Git secrets policy](#emr-studio-permissions-policies-git)
+ [Attach the permissions policy to your IAM identity](#emr-studio-permissions-policies-attach)

**Note**  
To set Amazon S3 access permissions for storing notebook files, and to set AWS Secrets Manager access permissions to read secrets when you link Workspaces to Git repositories, use the EMR Studio service role. 

### Create the permissions policies
<a name="emr-studio-permissions-policies-create"></a>

Create one or more IAM permissions policies that specify what actions a user can take in your Studio. For example, you can create three separate policies for [basic](#basic), [intermediate](#intermediate), and [advanced](#advanced) Studio user types with the example policies on this page.

For a breakdown of each Studio operation that a user might perform, and the minimum IAM actions that are required to perform each operation, see [AWS Identity and Access Management permissions for EMR Studio users](#emr-studio-iam-permissions-table). For steps to create the policies, see [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create-console.html) in the *IAM User Guide*.

Your permissions policy must include the following statements.

```
{
            "Sid": "AllowAddingTagsOnSecretsWithEMRStudioPrefix",
            "Effect": "Allow",
            "Action": "secretsmanager:TagResource",
            "Resource": "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
},
{
            "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
            "Action": "iam:PassRole",
            "Resource": [
                "arn:aws:iam::*:role/your-emr-studio-service-role"
            ],
            "Effect": "Allow"
}
```
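
The two required statements above are fragments, so they must sit inside the `Statement` array of a complete policy. The following Python sketch shows one way to merge them with your own statements and to catch duplicate `Sid` values before you create the policy, because each `Sid` must be unique within a policy. The function names are hypothetical, and `your-emr-studio-service-role` remains a placeholder for your own role name.

```python
import json


def required_statements(service_role_name):
    """Return the two statements that every EMR Studio user policy must include."""
    return [
        {
            "Sid": "AllowAddingTagsOnSecretsWithEMRStudioPrefix",
            "Effect": "Allow",
            "Action": "secretsmanager:TagResource",
            "Resource": "arn:aws:secretsmanager:*:*:secret:emr-studio-*",
        },
        {
            "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": [f"arn:aws:iam::*:role/{service_role_name}"],
        },
    ]


def build_user_policy(extra_statements, service_role_name):
    """Combine the required statements with extra ones and reject duplicate Sids."""
    statements = required_statements(service_role_name) + list(extra_statements)
    sids = [statement["Sid"] for statement in statements if "Sid" in statement]
    duplicates = sorted({sid for sid in sids if sids.count(sid) > 1})
    if duplicates:
        raise ValueError(f"duplicate Sid values: {duplicates}")
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Hypothetical extra statement, taken from the example policies on this page.
    extra = [
        {
            "Sid": "AllowSecretManagerListSecrets",
            "Effect": "Allow",
            "Action": ["secretsmanager:ListSecrets"],
            "Resource": ["*"],
        }
    ]
    policy = build_user_policy(extra, "your-emr-studio-service-role")
    print(json.dumps(policy, indent=2))
```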

### Set ownership for Workspace collaboration
<a name="emr-studio-workspace-collaboration-permissions"></a>

Workspace collaboration lets multiple users work simultaneously in the same Workspace. You configure it with the **Collaboration** panel in the Workspace UI. A user needs the following permissions to see and use the **Collaboration** panel, and any user who has them can do so.

```
"elasticmapreduce:UpdateEditor",
"elasticmapreduce:PutWorkspaceAccess",
"elasticmapreduce:DeleteWorkspaceAccess",
"elasticmapreduce:ListWorkspaceAccessIdentities"
```

To restrict access to the **Collaboration** panel, you can use tag-based access control. When a user creates a Workspace, EMR Studio applies a default tag with a key of `creatorUserId` whose value is the ID of the user creating the Workspace. 

**Note**  
EMR Studio adds the `creatorUserId` tag to Workspaces created after November 16, 2021. To restrict who can configure collaboration for Workspaces that you created before this date, we recommend that you manually add the `creatorUserId` tag to each of those Workspaces, and then use tag-based access control in your user permissions policies.

The following example statement allows a user to configure collaboration for any Workspace with the tag key `creatorUserId` whose value matches the user's ID (indicated by the policy variable `aws:userId`). In other words, the statement lets a user configure collaboration for the Workspaces that they create. To learn more about policy variables, see [IAM policy elements: Variables and tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html) in the *IAM User Guide*.

```
    {
        "Sid": "UserRolePermissionsForCollaboration",
        "Action": [
            "elasticmapreduce:UpdateEditor",
            "elasticmapreduce:PutWorkspaceAccess",
            "elasticmapreduce:DeleteWorkspaceAccess",
            "elasticmapreduce:ListWorkspaceAccessIdentities"
        ],
        "Resource": "*",
        "Effect": "Allow",
        "Condition": {
            "StringEquals": {
                "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userid}"
            }
        }
    }
```

### Create user-level Git secrets policy
<a name="emr-studio-permissions-policies-git"></a>

**Topics**
+ [To use user-level permissions](#emr-studio-permissions-policies-user)
+ [To transition from service-level permissions to user-level permissions](#emr-studio-permissions-policies-transition)
+ [To use service-level permissions](#emr-studio-permissions-policies-service)

#### To use user-level permissions
<a name="emr-studio-permissions-policies-user"></a>

EMR Studio automatically adds the `for-use-with-amazon-emr-managed-user-policies` tag when it creates Git secrets. If you want to control access to Git secrets at the user level, add tag-based permissions to the EMR Studio **user role policy** with `secretsmanager:GetSecretValue` as shown in the [To transition from service-level permissions to user-level permissions](#emr-studio-permissions-policies-transition) section below.

If you have existing permissions for `secretsmanager:GetSecretValue` in the EMR Studio **service role policy**, you should remove those permissions.

#### To transition from service-level permissions to user-level permissions
<a name="emr-studio-permissions-policies-transition"></a>

**Note**  
The `for-use-with-amazon-emr-managed-user-policies` tag ensures that the permissions from **Step 1** below grant the creator of the Workspace access to the Git secret. However, if you linked Git repositories before September 1, 2023, access to the corresponding Git secrets is denied because those secrets don't have the `for-use-with-amazon-emr-managed-user-policies` tag applied. To apply user-level permissions, you must recreate the old secrets from JupyterLab and link the appropriate Git repositories again.  
For more information about policy variables, see [IAM policy elements: Variables and tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html) in the *IAM User Guide*.

1. Add the following permissions to the [EMR Studio **user role policy**](emr-studio-service-role.md). The statement uses the `for-use-with-amazon-emr-managed-user-policies` tag key with the value `"${aws:userid}"`.

   ```
   {
       "Sid": "AllowSecretsManagerReadOnlyActionsWithEMRTags",
       "Effect": "Allow",
       "Action": "secretsmanager:GetSecretValue",
       "Resource": "arn:aws:secretsmanager:*:*:secret:*",
       "Condition": {
           "StringEquals": {
               "secretsmanager:ResourceTag/for-use-with-amazon-emr-managed-user-policies": "${aws:userid}"
           }
       }
   }
   ```

1. If present, remove the following permission from the [EMR Studio **service role policy**](emr-studio-service-role.md). Because the service role policy applies to all secrets defined by each user, you only need to do this one time.

   ```
   {
       "Sid": "AllowSecretsManagerReadOnlyActionsWithEMRTags",
       "Effect": "Allow",
       "Action": [
           "secretsmanager:GetSecretValue"
       ],
       "Resource": "arn:aws:secretsmanager:*:*:secret:*",
       "Condition": {
           "StringEquals": {
               "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
           }
       }
   }
   ```
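
The two steps above can also be applied to policy documents that you manage as JSON files. The following Python sketch is a minimal illustration that uses hypothetical helper names (`remove_action`, `add_statement`, `user_level_secret_statement`). It assumes that `Statement` is a list and matches the action name exactly, so a wildcard such as `secretsmanager:*` in the service role policy is not detected. It does not call any AWS API.

```python
import copy

SECRET_ACTION = "secretsmanager:GetSecretValue"
USER_TAG_KEY = (
    "secretsmanager:ResourceTag/for-use-with-amazon-emr-managed-user-policies"
)


def user_level_secret_statement():
    """Return the Step 1 statement for the EMR Studio user role policy."""
    return {
        "Sid": "AllowSecretsManagerReadOnlyActionsWithEMRTags",
        "Effect": "Allow",
        "Action": SECRET_ACTION,
        "Resource": "arn:aws:secretsmanager:*:*:secret:*",
        "Condition": {"StringEquals": {USER_TAG_KEY: "${aws:userid}"}},
    }


def add_statement(policy, statement):
    """Return a copy of the policy with one more statement appended."""
    result = copy.deepcopy(policy)
    result["Statement"].append(copy.deepcopy(statement))
    return result


def remove_action(policy, action):
    """Return a copy of the policy without the given action (Step 2).

    Statements whose only action was the removed one are dropped.
    """
    result = copy.deepcopy(policy)
    kept = []
    for statement in result["Statement"]:
        if "Action" not in statement:  # for example, NotAction statements
            kept.append(statement)
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        remaining = [a for a in actions if a.lower() != action.lower()]
        if len(remaining) == len(actions):
            kept.append(statement)  # nothing removed, keep as-is
        elif remaining:
            statement["Action"] = remaining
            kept.append(statement)
    result["Statement"] = kept
    return result
```

Both helpers return modified copies, so you can review the difference against the original document before you update the roles.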

#### To use service-level permissions
<a name="emr-studio-permissions-policies-service"></a>

As of September 1, 2023, EMR Studio automatically adds the `for-use-with-amazon-emr-managed-user-policies` tag for user-level access control. Because this is an added capability, you can continue to use service-level access that's available through the `GetSecretValue` permission in the [EMR Studio service role](emr-studio-service-role.md).

For secrets created before September 1, 2023, EMR Studio didn't add the `for-use-with-amazon-emr-managed-user-policies` tag. To keep using service-level permissions, simply retain your existing [EMR Studio service role](emr-studio-service-role.md) and user role permissions. However, to restrict who can access an individual secret, we recommend that you follow the steps in [To use user-level permissions](#emr-studio-permissions-policies-user) to manually add the `for-use-with-amazon-emr-managed-user-policies` tag to your secrets, and then use tag-based access control in your user permissions policies.

For more information about policy variables, see [IAM policy elements: Variables and tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html) in the *IAM User Guide*.

### Attach the permissions policy to your IAM identity
<a name="emr-studio-permissions-policies-attach"></a>

The following table summarizes which IAM identity you attach a permissions policy to, depending on your EMR Studio authentication mode. For instructions on how to attach a policy, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).



| If you use... | Attach the policy to... | 
| --- | --- | 
| IAM authentication | Your IAM identities (users, groups of users, or roles). For example, you can attach a permissions policy to a user in your AWS account. | 
| IAM federation with an external identity provider (IdP) |  The IAM role or roles that you create for your external IdP. For example, an IAM role for SAML 2.0 federation.  EMR Studio uses the permissions that you attach to your IAM role(s) for users with federated access to a Studio.  | 
| IAM Identity Center | Your Amazon EMR Studio user role. | 

## Example user policies
<a name="emr-studio-example-policies"></a>

The following basic user policy allows most EMR Studio actions, but does not let a user create new Amazon EMR clusters. 

### Basic policy
<a name="basic"></a>

**Important**  
The example policy does not include the `CreateStudioPresignedUrl` permission, which you must allow for a user when you use IAM authentication mode. For more information, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).

The example policy includes `Condition` elements to enforce tag-based access control (TBAC) so that you can use the policy with the example service role for EMR Studio. For more information, see [Create an EMR Studio service role](emr-studio-service-role.md).

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDefaultEC2SecurityGroupsCreationInVPCWithEMRTags",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSecurityGroup"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:vpc/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "AllowAddingEMRTagsDuringDefaultSecurityGroupCreation",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:security-group/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true",
          "ec2:CreateAction": "CreateSecurityGroup"
        }
      }
    },
    {
      "Sid": "AllowSecretManagerListSecrets",
      "Action": [
        "secretsmanager:ListSecrets"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowSecretCreationWithEMRTagsAndEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:CreateSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "AllowAddingTagsOnSecretsWithEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:TagResource"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ]
    },
    {
      "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/your-emr-studio-service-role"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ListAndLocationPermissions",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ReadOnlyAccessToLogs",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::aws-logs-111122223333-region/elasticmapreduce/*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowConfigurationForWorkspaceCollaboration",
      "Action": [
        "elasticmapreduce:UpdateEditor",
        "elasticmapreduce:PutWorkspaceAccess",
        "elasticmapreduce:DeleteWorkspaceAccess",
        "elasticmapreduce:ListWorkspaceAccessIdentities"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userId}"
        }
      }
    },
    {
      "Sid": "DescribeNetwork",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ListIAMRoles",
      "Effect": "Allow",
      "Action": [
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

The following intermediate user policy allows most EMR Studio actions, and lets a user create new Amazon EMR clusters using a cluster template. 

### Intermediate policy
<a name="intermediate"></a>

**Important**  
The example policy does not include the `CreateStudioPresignedUrl` permission, which you must allow for a user when you use IAM authentication mode. For more information, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).

The example policy includes `Condition` elements to enforce tag-based access control (TBAC) so that you can use the policy with the example service role for EMR Studio. For more information, see [Create an EMR Studio service role](emr-studio-service-role.md).

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMRBasicActions",
      "Action": [
        "elasticmapreduce:CreateEditor",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:StartEditor",
        "elasticmapreduce:StopEditor",
        "elasticmapreduce:DeleteEditor",
        "elasticmapreduce:OpenEditorInConsole",
        "elasticmapreduce:AttachEditor",
        "elasticmapreduce:DetachEditor",
        "elasticmapreduce:CreateRepository",
        "elasticmapreduce:DescribeRepository",
        "elasticmapreduce:DeleteRepository",
        "elasticmapreduce:ListRepositories",
        "elasticmapreduce:LinkRepository",
        "elasticmapreduce:UnlinkRepository",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL",
        "elasticmapreduce:GetOnClusterAppUIPresignedURL"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowEMRContainersBasicActions",
      "Action": [
        "emr-containers:DescribeVirtualCluster",
        "emr-containers:ListVirtualClusters",
        "emr-containers:DescribeManagedEndpoint",
        "emr-containers:ListManagedEndpoints",
        "emr-containers:DescribeJobRun",
        "emr-containers:ListJobRuns"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowRetrievingManagedEndpointCredentials",
      "Effect": "Allow",
      "Action": [
        "emr-containers:GetManagedEndpointSessionCredentials"
      ],
      "Resource": [
        "arn:aws:emr-containers:us-west-1:123456789012:/virtualclusters/virtual-cluster-id/endpoints/managed-endpoint-id"
      ],
      "Condition": {
        "StringEquals": {
          "emr-containers:ExecutionRoleArn": [
            "arn:aws:iam::123456789012:role/emr-on-eks-execution-role"
          ]
        }
      }
    },
    {
      "Sid": "AllowSecretManagerListSecrets",
      "Action": [
        "secretsmanager:ListSecrets"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowSecretCreationWithEMRTagsAndEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:CreateSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "AllowAddingTagsOnSecretsWithEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:TagResource"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ]
    },
    {
      "Sid": "AllowClusterTemplateRelatedIntermediateActions",
      "Action": [
        "servicecatalog:DescribeProduct",
        "servicecatalog:DescribeProductView",
        "servicecatalog:DescribeProvisioningParameters",
        "servicecatalog:ProvisionProduct",
        "servicecatalog:SearchProducts",
        "servicecatalog:UpdateProvisionedProduct",
        "servicecatalog:ListProvisioningArtifacts",
        "servicecatalog:ListLaunchPaths",
        "servicecatalog:DescribeRecord",
        "cloudformation:DescribeStackResources"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/your-emr-studio-service-role"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ListAndLocationPermissions",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ReadOnlyAccessToLogs",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::aws-logs-123456789012-us-east-1/elasticmapreduce/*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowConfigurationForWorkspaceCollaboration",
      "Action": [
        "elasticmapreduce:UpdateEditor",
        "elasticmapreduce:PutWorkspaceAccess",
        "elasticmapreduce:DeleteWorkspaceAccess",
        "elasticmapreduce:ListWorkspaceAccessIdentities"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userId}"
        }
      }
    },
    {
      "Sid": "DescribeNetwork",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ListIAMRoles",
      "Effect": "Allow",
      "Action": [
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AllowServerlessActions",
      "Action": [
        "emr-serverless:CreateApplication",
        "emr-serverless:UpdateApplication",
        "emr-serverless:DeleteApplication",
        "emr-serverless:ListApplications",
        "emr-serverless:GetApplication",
        "emr-serverless:StartApplication",
        "emr-serverless:StopApplication",
        "emr-serverless:StartJobRun",
        "emr-serverless:CancelJobRun",
        "emr-serverless:ListJobRuns",
        "emr-serverless:GetJobRun",
        "emr-serverless:GetDashboardForJobRun",
        "emr-serverless:AccessInteractiveEndpoints"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingRuntimeRoleForRunningServerlessJob",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/serverless-runtime-role"
      ],
      "Effect": "Allow"
    }
  ]
}
```

------

The following advanced user policy allows all EMR Studio actions, and lets a user create new Amazon EMR clusters using a cluster template or by providing a cluster configuration. 

### Advanced policy
<a name="advanced"></a>

**Important**  
The example policy does not include the `CreateStudioPresignedUrl` permission, which you must allow for a user when you use IAM authentication mode. For more information, see [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).

The example policy includes `Condition` elements to enforce tag-based access control (TBAC) so that you can use the policy with the example service role for EMR Studio. For more information, see [Create an EMR Studio service role](emr-studio-service-role.md).

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMRBasicActions",
      "Action": [
        "elasticmapreduce:CreateEditor",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:StartEditor",
        "elasticmapreduce:StopEditor",
        "elasticmapreduce:DeleteEditor",
        "elasticmapreduce:OpenEditorInConsole",
        "elasticmapreduce:AttachEditor",
        "elasticmapreduce:DetachEditor",
        "elasticmapreduce:CreateRepository",
        "elasticmapreduce:DescribeRepository",
        "elasticmapreduce:DeleteRepository",
        "elasticmapreduce:ListRepositories",
        "elasticmapreduce:LinkRepository",
        "elasticmapreduce:UnlinkRepository",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL",
        "elasticmapreduce:GetOnClusterAppUIPresignedURL"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowEMRContainersBasicActions",
      "Action": [
        "emr-containers:DescribeVirtualCluster",
        "emr-containers:ListVirtualClusters",
        "emr-containers:DescribeManagedEndpoint",
        "emr-containers:ListManagedEndpoints",
        "emr-containers:DescribeJobRun",
        "emr-containers:ListJobRuns"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowRetrievingManagedEndpointCredentials",
      "Effect": "Allow",
      "Action": [
        "emr-containers:GetManagedEndpointSessionCredentials"
      ],
      "Resource": [
        "arn:aws:emr-containers:*:123456789012:/virtualclusters/virtual-cluster-id/endpoints/managed-endpoint-id"
      ],
      "Condition": {
        "StringEquals": {
          "emr-containers:ExecutionRoleArn": [
            "arn:aws:iam::123456789012:role/emr-on-eks-execution-role"
          ]
        }
      }
    },
    {
      "Sid": "AllowSecretManagerListSecrets",
      "Action": [
        "secretsmanager:ListSecrets"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowSecretCreationWithEMRTagsAndEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:CreateSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "AllowAddingTagsOnSecretsWithEMRStudioPrefix",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:TagResource"
      ],
      "Resource": [
        "arn:aws:secretsmanager:*:*:secret:emr-studio-*"
      ]
    },
    {
      "Sid": "AllowClusterTemplateRelatedIntermediateActions",
      "Action": [
        "servicecatalog:DescribeProduct",
        "servicecatalog:DescribeProductView",
        "servicecatalog:DescribeProvisioningParameters",
        "servicecatalog:ProvisionProduct",
        "servicecatalog:SearchProducts",
        "servicecatalog:UpdateProvisionedProduct",
        "servicecatalog:ListProvisioningArtifacts",
        "servicecatalog:ListLaunchPaths",
        "servicecatalog:DescribeRecord",
        "cloudformation:DescribeStackResources"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowEMRCreateClusterAdvancedActions",
      "Action": [
        "elasticmapreduce:RunJobFlow"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/your-emr-studio-service-role",
        "arn:aws:iam::*:role/EMR_DefaultRole_V2",
        "arn:aws:iam::*:role/EMR_EC2_DefaultRole"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ListAndLocationPermissions",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ReadOnlyAccessToLogs",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::aws-logs-123456789012-us-east-1/elasticmapreduce/*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowConfigurationForWorkspaceCollaboration",
      "Action": [
        "elasticmapreduce:UpdateEditor",
        "elasticmapreduce:PutWorkspaceAccess",
        "elasticmapreduce:DeleteWorkspaceAccess",
        "elasticmapreduce:ListWorkspaceAccessIdentities"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userId}"
        }
      }
    },
    {
      "Sid": "SageMakerDataWranglerForEMRStudio",
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreatePresignedDomainUrl",
        "sagemaker:DescribeDomain",
        "sagemaker:ListDomains",
        "sagemaker:ListUserProfiles"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "DescribeNetwork",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ListIAMRoles",
      "Effect": "Allow",
      "Action": [
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AllowServerlessActions",
      "Action": [
        "emr-serverless:CreateApplication",
        "emr-serverless:UpdateApplication",
        "emr-serverless:DeleteApplication",
        "emr-serverless:ListApplications",
        "emr-serverless:GetApplication",
        "emr-serverless:StartApplication",
        "emr-serverless:StopApplication",
        "emr-serverless:StartJobRun",
        "emr-serverless:CancelJobRun",
        "emr-serverless:ListJobRuns",
        "emr-serverless:GetJobRun",
        "emr-serverless:GetDashboardForJobRun",
        "emr-serverless:AccessInteractiveEndpoints"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingRuntimeRoleForRunningServerlessJob",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/serverless-runtime-role"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowCodeWhisperer",
      "Effect": "Allow",
      "Action": [
        "codewhisperer:GenerateRecommendations"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AllowAthenaSQL",
      "Action": [
        "athena:StartQueryExecution",
        "athena:StopQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryRuntimeStatistics",
        "athena:GetQueryResults",
        "athena:ListQueryExecutions",
        "athena:BatchGetQueryExecution",
        "athena:GetNamedQuery",
        "athena:ListNamedQueries",
        "athena:BatchGetNamedQuery",
        "athena:UpdateNamedQuery",
        "athena:DeleteNamedQuery",
        "athena:ListDataCatalogs",
        "athena:GetDataCatalog",
        "athena:ListDatabases",
        "athena:GetDatabase",
        "athena:ListTableMetadata",
        "athena:GetTableMetadata",
        "athena:ListWorkGroups",
        "athena:GetWorkGroup",
        "athena:CreateNamedQuery",
        "athena:GetPreparedStatement",
        "glue:CreateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:UpdateDatabase",
        "glue:CreateTable",
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:UpdateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:BatchCreatePartition",
        "glue:CreatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:UpdatePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition",
        "kms:ListAliases",
        "kms:ListKeys",
        "kms:DescribeKey",
        "lakeformation:GetDataAccess",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload",
        "s3:PutObject",
        "s3:PutBucketPublicAccessBlock",
        "s3:ListAllMyBuckets"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
```

------
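
Because the advanced policy omits `CreateStudioPresignedUrl`, it can help to confirm which actions a policy document actually lists before you attach it. The following Python sketch shows one way to do that with the standard library. The helper names `allowed_actions` and `mentions_action` are hypothetical and aren't part of any AWS SDK, and `example_policy` is an abbreviated stand-in for the full policy. The check only looks at `Allow` statements and ignores `Resource`, `Condition`, and explicit `Deny` elements, so it doesn't replace IAM policy evaluation.

```python
import fnmatch
import json


def allowed_actions(policy: dict) -> set:
    """Collect every action pattern listed in the policy's Allow statements."""
    actions = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        listed = statement.get("Action", [])
        # IAM accepts either a single string or a list of strings.
        if isinstance(listed, str):
            listed = [listed]
        actions.update(listed)
    return actions


def mentions_action(policy: dict, action: str) -> bool:
    """Return True if an Allow statement lists the action (wildcards honored)."""
    return any(
        fnmatch.fnmatchcase(action.lower(), pattern.lower())
        for pattern in allowed_actions(policy)
    )


# Abbreviated stand-in for the advanced policy (assumption: not the full document).
example_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMRBasicActions",
      "Effect": "Allow",
      "Action": ["elasticmapreduce:CreateEditor", "elasticmapreduce:ListEditors"],
      "Resource": ["*"]
    }
  ]
}
""")

print(mentions_action(example_policy, "elasticmapreduce:CreateStudioPresignedUrl"))
print(mentions_action(example_policy, "elasticmapreduce:ListEditors"))
```

If the first check reports that the action is missing and your Studio uses IAM authentication mode, add `CreateStudioPresignedUrl` in this policy or in a separate one.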

The following user policy contains the minimum user permissions that are required to use an EMR Serverless interactive application with EMR Studio Workspaces.

### EMR Serverless interactive policy
<a name="serverless-interactive"></a>

In this example policy for using EMR Serverless interactive applications with EMR Studio, replace the placeholders *serverless-runtime-role* and *emr-studio-service-role* with your [EMR Serverless runtime role](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) and your [EMR Studio service role](emr-studio-service-role.md), respectively.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowServerlessActions",
      "Action": [
        "emr-serverless:CreateApplication",
        "emr-serverless:UpdateApplication",
        "emr-serverless:DeleteApplication",
        "emr-serverless:ListApplications",
        "emr-serverless:GetApplication",
        "emr-serverless:StartApplication",
        "emr-serverless:StopApplication",
        "emr-serverless:StartJobRun",
        "emr-serverless:CancelJobRun",
        "emr-serverless:ListJobRuns",
        "emr-serverless:GetJobRun",
        "emr-serverless:GetDashboardForJobRun",
        "emr-serverless:AccessInteractiveEndpoints"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowEMRBasicActions",
      "Action": [
        "elasticmapreduce:CreateEditor",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:UpdateStudio",
        "elasticmapreduce:StartEditor",
        "elasticmapreduce:StopEditor",
        "elasticmapreduce:DeleteEditor",
        "elasticmapreduce:OpenEditorInConsole",
        "elasticmapreduce:AttachEditor",
        "elasticmapreduce:DetachEditor",
        "elasticmapreduce:CreateStudio",
        "elasticmapreduce:DescribeStudio",
        "elasticmapreduce:DeleteStudio",
        "elasticmapreduce:ListStudios",
        "elasticmapreduce:CreateStudioPresignedUrl"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingRuntimeRoleForRunningEMRServerlessJob",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/serverless-runtime-role"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/emr-studio-service-role"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "AllowS3ListAndGetPermissions",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ],
      "Effect": "Allow"
    },
    {
      "Sid": "DescribeNetwork",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpcs",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ListIAMRoles",
      "Effect": "Allow",
      "Action": [
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------
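
If you generate this policy from a template, you can substitute the two role placeholders programmatically. The following Python sketch shows one possible approach. The function name `fill_role_placeholders` and the role names `MyServerlessRuntimeRole` and `MyStudioServiceRole` are hypothetical, and `POLICY_TEMPLATE` contains only the two `iam:PassRole` statements from the example policy.

```python
import json

# Only the two PassRole statements from the example policy (abbreviated).
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPassingRuntimeRoleForRunningEMRServerlessJob",
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": ["arn:aws:iam::*:role/serverless-runtime-role"]
    },
    {
      "Sid": "AllowPassingServiceRoleForWorkspaceCreation",
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": ["arn:aws:iam::*:role/emr-studio-service-role"]
    }
  ]
}
"""


def fill_role_placeholders(template: str, runtime_role: str, service_role: str) -> dict:
    """Replace the placeholder role names in every role ARN of the policy."""
    replacements = {
        "serverless-runtime-role": runtime_role,
        "emr-studio-service-role": service_role,
    }
    policy = json.loads(template)
    for statement in policy["Statement"]:
        resources = []
        for arn in statement["Resource"]:
            prefix, separator, role_name = arn.rpartition("role/")
            if separator:  # leave non-role resources such as "*" untouched
                arn = prefix + separator + replacements.get(role_name, role_name)
            resources.append(arn)
        statement["Resource"] = resources
    return policy


policy = fill_role_placeholders(
    POLICY_TEMPLATE, "MyServerlessRuntimeRole", "MyStudioServiceRole"
)
print(json.dumps(policy, indent=2))
```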

## AWS Identity and Access Management permissions for EMR Studio users
<a name="emr-studio-iam-permissions-table"></a>

The following table includes each Amazon EMR Studio operation that a user might perform, and lists the minimum IAM actions needed to perform that operation. You allow these actions in your IAM permissions policies (when you use IAM authentication) or in your user role session policies (when you use IAM Identity Center authentication) for EMR Studio.

The table also shows which operations each example permissions policy for EMR Studio allows. For more information about the example permissions policies, see [Create permissions policies for EMR Studio users](#emr-studio-permissions-policies).


| Action | Basic | Intermediate | Advanced | Associated actions | 
| --- | --- | --- | --- | --- | 
| Create and delete Workspaces | Yes | Yes | Yes |  <pre>"elasticmapreduce:CreateEditor", <br />"elasticmapreduce:DescribeEditor",<br />"elasticmapreduce:ListEditors", <br />"elasticmapreduce:DeleteEditor"</pre>  | 
| View the Collaboration panel, enable Workspace collaboration, and add collaborators. For more information, see [Set ownership for Workspace collaboration](#emr-studio-workspace-collaboration-permissions). | Yes | Yes | Yes |  <pre>"elasticmapreduce:UpdateEditor",<br />"elasticmapreduce:PutWorkspaceAccess",<br />"elasticmapreduce:DeleteWorkspaceAccess",<br />"elasticmapreduce:ListWorkspaceAccessIdentities"</pre>  | 
| See a list of Amazon S3 buckets in the same account as the Studio when creating a new EMR cluster, and access container logs when using a web UI to debug applications | Yes | Yes | Yes |  <pre>"s3:ListAllMyBuckets",<br />"s3:ListBucket", <br />"s3:GetBucketLocation",<br />"s3:GetObject"</pre>  | 
| Access Workspaces | Yes | Yes | Yes |  <pre>"elasticmapreduce:DescribeEditor", <br />"elasticmapreduce:ListEditors",<br />"elasticmapreduce:StartEditor", <br />"elasticmapreduce:StopEditor",<br />"elasticmapreduce:OpenEditorInConsole"</pre>  | 
| Attach or detach existing Amazon EMR clusters associated with the Workspace | Yes | Yes | Yes |  <pre>"elasticmapreduce:AttachEditor",<br />"elasticmapreduce:DetachEditor",<br />"elasticmapreduce:ListClusters",<br />"elasticmapreduce:DescribeCluster",<br />"elasticmapreduce:ListInstanceGroups",<br />"elasticmapreduce:ListBootstrapActions"</pre>  | 
| Attach or detach Amazon EMR on EKS clusters  | Yes | Yes | Yes |  <pre>"elasticmapreduce:AttachEditor", <br />"elasticmapreduce:DetachEditor",<br />"emr-containers:ListVirtualClusters", <br />"emr-containers:DescribeVirtualCluster",<br />"emr-containers:ListManagedEndpoints",<br />"emr-containers:DescribeManagedEndpoint",<br />"emr-containers:GetManagedEndpointSessionCredentials"</pre>  | 
| Attach or detach EMR Serverless applications that are associated with the Workspace | No | Yes | Yes |  <pre>"elasticmapreduce:AttachEditor",<br />"elasticmapreduce:DetachEditor",<br />"emr-serverless:GetApplication",<br />"emr-serverless:StartApplication",<br />"emr-serverless:ListApplications",<br />"emr-serverless:GetDashboardForJobRun",<br />"emr-serverless:AccessInteractiveEndpoints",<br />"iam:PassRole"</pre> The `PassRole` permission is required to pass the EMR Serverless job runtime role. For more information, see [Job runtime roles](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*. | 
| Debug Amazon EMR on EC2 jobs with persistent application user interfaces | Yes | Yes | Yes |  <pre>"elasticmapreduce:CreatePersistentAppUI",<br />"elasticmapreduce:DescribePersistentAppUI",<br />"elasticmapreduce:GetPersistentAppUIPresignedURL",<br />"elasticmapreduce:ListClusters",<br />"elasticmapreduce:ListSteps",<br />"elasticmapreduce:DescribeCluster",<br />"s3:ListBucket",<br />"s3:GetObject"</pre>  | 
| Debug Amazon EMR on EC2 jobs with on-cluster application user interfaces | Yes | Yes | Yes |  <pre>"elasticmapreduce:GetOnClusterAppUIPresignedURL"</pre>  | 
| Debug Amazon EMR on EKS job runs using the Spark History Server | Yes | Yes | Yes |  <pre>"elasticmapreduce:CreatePersistentAppUI",<br />"elasticmapreduce:DescribePersistentAppUI",<br />"elasticmapreduce:GetPersistentAppUIPresignedURL",<br />"emr-containers:ListVirtualClusters",<br />"emr-containers:DescribeVirtualCluster",<br />"emr-containers:ListJobRuns",<br />"emr-containers:DescribeJobRun",<br />"s3:ListBucket",<br />"s3:GetObject"</pre>  | 
| Create and delete Git repositories | Yes | Yes | Yes |  <pre>"elasticmapreduce:CreateRepository", <br />"elasticmapreduce:DeleteRepository",<br />"elasticmapreduce:ListRepositories",<br />"elasticmapreduce:DescribeRepository",<br />"secretsmanager:CreateSecret",<br />"secretsmanager:ListSecrets",<br />"secretsmanager:TagResource"</pre>  | 
| Link and unlink Git repositories | Yes | Yes | Yes |  <pre>"elasticmapreduce:LinkRepository",<br />"elasticmapreduce:UnlinkRepository",<br />"elasticmapreduce:ListRepositories",<br />"elasticmapreduce:DescribeRepository"</pre>  | 
| Create new clusters from predefined cluster templates | No | Yes | Yes |  <pre>"servicecatalog:SearchProducts", <br />"servicecatalog:DescribeProduct",<br />"servicecatalog:DescribeProductView",<br />"servicecatalog:DescribeProvisioningParameters",<br />"servicecatalog:ProvisionProduct",<br />"servicecatalog:UpdateProvisionedProduct",<br />"servicecatalog:ListProvisioningArtifacts", <br />"servicecatalog:DescribeRecord",<br />"servicecatalog:ListLaunchPaths",<br />"cloudformation:DescribeStackResources", <br />"elasticmapreduce:ListClusters",<br />"elasticmapreduce:DescribeCluster"</pre>  | 
| Provide a cluster configuration to create new clusters. | No | No | Yes |  <pre>"elasticmapreduce:RunJobFlow",<br />"iam:PassRole",<br />"elasticmapreduce:ListClusters",<br />"elasticmapreduce:DescribeCluster"</pre>  | 
| [Assign a user to a Studio when you use IAM authentication mode.](emr-studio-manage-users.md#emr-studio-assign-users-groups) | No | No | No |  <pre>"elasticmapreduce:CreateStudioPresignedUrl"</pre>  | 
| Describe network objects. | Yes | Yes | Yes |  <pre>"ec2:DescribeVpcs",<br />"ec2:DescribeSubnets",<br />"ec2:DescribeSecurityGroups"</pre>  | 
| List IAM roles. | Yes | Yes | Yes |  <pre>"iam:ListRoles"</pre>  | 
| [Connect to EMR Studio from Amazon SageMaker AI Studio and use the Data Wrangler visual interface.](https://aws.amazon.com/blogs/machine-learning/prepare-data-from-amazon-emr-for-machine-learning-using-amazon-sagemaker-data-wrangler/)  | No | No | Yes |  <pre>"sagemaker:CreatePresignedDomainUrl",<br />"sagemaker:DescribeDomain",<br />"sagemaker:ListDomains",<br />"sagemaker:ListUserProfiles"</pre>  | 
| [Use Amazon CodeWhisperer in your EMR Studio.](emr-studio-codewhisperer.md) | No | No | Yes |  <pre>"codewhisperer:GenerateRecommendations"</pre>  | 
| [Access the Amazon Athena SQL editor from your EMR Studio.](emr-studio-athena.md) This list might not include all of the permissions that you need to use all Athena features. For the most up-to-date list, see the [Athena full access policy](https://docs.aws.amazon.com/athena/latest/ug/managed-policies.html#amazonathenafullaccess-managed-policy). | No | No | Yes |  <pre>"athena:StartQueryExecution",<br />"athena:StopQueryExecution",<br />"athena:GetQueryExecution",<br />"athena:GetQueryRuntimeStatistics",<br />"athena:GetQueryResults",<br />"athena:ListQueryExecutions",<br />"athena:BatchGetQueryExecution",<br />"athena:GetNamedQuery",<br />"athena:ListNamedQueries",<br />"athena:BatchGetNamedQuery",<br />"athena:UpdateNamedQuery",<br />"athena:DeleteNamedQuery",<br />"athena:ListDataCatalogs",<br />"athena:GetDataCatalog",<br />"athena:ListDatabases",<br />"athena:GetDatabase",<br />"athena:ListTableMetadata",<br />"athena:GetTableMetadata",<br />"athena:ListWorkGroups",<br />"athena:GetWorkGroup",<br />"athena:CreateNamedQuery",<br />"athena:GetPreparedStatement",<br />"glue:CreateDatabase",<br />"glue:DeleteDatabase",<br />"glue:GetDatabase",<br />"glue:GetDatabases",<br />"glue:UpdateDatabase",<br />"glue:CreateTable",<br />"glue:DeleteTable",<br />"glue:BatchDeleteTable",<br />"glue:UpdateTable",<br />"glue:GetTable",<br />"glue:GetTables",<br />"glue:BatchCreatePartition",<br />"glue:CreatePartition",<br />"glue:DeletePartition",<br />"glue:BatchDeletePartition",<br />"glue:UpdatePartition",<br />"glue:GetPartition",<br />"glue:GetPartitions",<br />"glue:BatchGetPartition",<br />"kms:ListAliases",<br />"kms:ListKeys",<br />"kms:DescribeKey",<br />"lakeformation:GetDataAccess",<br />"s3:GetBucketLocation",<br />"s3:GetObject",<br />"s3:ListBucket",<br />"s3:ListBucketMultipartUploads",<br />"s3:ListMultipartUploadParts",<br />"s3:AbortMultipartUpload",<br />"s3:PutObject",<br />"s3:PutBucketPublicAccessBlock",<br />"s3:ListAllMyBuckets"</pre>  | 
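
One way to use the table is to treat each row as a set of required actions and compare it with the actions that a user has been granted. The following Python sketch illustrates the idea with three rows copied from the table. The names `REQUIRED_ACTIONS` and `missing_actions` are hypothetical, and the comparison is a plain set difference that doesn't account for wildcards, conditions, or resource scoping.

```python
# Three rows copied from the table; extend the mapping with any other rows you need.
REQUIRED_ACTIONS = {
    "Create and delete Workspaces": {
        "elasticmapreduce:CreateEditor",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:DeleteEditor",
    },
    "Link and unlink Git repositories": {
        "elasticmapreduce:LinkRepository",
        "elasticmapreduce:UnlinkRepository",
        "elasticmapreduce:ListRepositories",
        "elasticmapreduce:DescribeRepository",
    },
    "Provide a cluster configuration to create new clusters": {
        "elasticmapreduce:RunJobFlow",
        "iam:PassRole",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:DescribeCluster",
    },
}


def missing_actions(operation: str, granted: set) -> list:
    """Return the required actions for an operation that are not in `granted`."""
    return sorted(REQUIRED_ACTIONS[operation] - set(granted))


# Hypothetical set of actions granted to a user.
granted = {
    "elasticmapreduce:CreateEditor",
    "elasticmapreduce:DescribeEditor",
    "elasticmapreduce:ListEditors",
    "elasticmapreduce:DeleteEditor",
    "elasticmapreduce:ListClusters",
    "elasticmapreduce:DescribeCluster",
}

for operation in REQUIRED_ACTIONS:
    print(operation, "->", missing_actions(operation, granted) or "all actions granted")
```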

# Create an EMR Studio
<a name="emr-studio-create-studio"></a>

You can create an EMR Studio for your team with the Amazon EMR console or the AWS CLI. Creating a Studio instance is part of setting up Amazon EMR Studio.

**Prerequisites**

Before you create a Studio, make sure you've completed the previous tasks in [Set up an EMR Studio](emr-studio-set-up.md).

To create a Studio using the AWS CLI, you should have the latest version installed. For more information, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

**Important**  
Deactivate proxy management tools such as FoxyProxy or SwitchyOmega in the browser before you create a Studio. Active proxies can result in a **Network Failure** error message when you choose **Create Studio**.

Amazon EMR provides a simple console experience for creating a Studio, so you can quickly get started running interactive workloads or batch jobs with the default settings. Creating an EMR Studio also creates an EMR Serverless application that's ready for your interactive jobs.

If you want full control over your Studio's settings, you can choose **Custom**, which lets you configure all of the additional settings. 

------
#### [ Interactive workloads ]

**To create an EMR Studio for interactive workloads**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR Studio** on the left navigation, choose **Getting started**. You can also create a new Studio from the **Studios** page.

1. Amazon EMR provides default settings when you create an EMR Studio for interactive workloads, but you can edit these settings. Configurable settings include the EMR Studio's name, the S3 location for your Workspace, the service role to use, the Workspaces that you want to use, the EMR Serverless application name, and the associated runtime role.

1. Choose **Create Studio and launch Workspace** to finish and navigate to the **Studios** page. Your new Studio appears in the list with details such as **Studio name**, **Creation date**, and **Studio access URL**. Your Workspace opens in a new tab in your browser.

------
#### [ Batch jobs ]

**To create an EMR Studio for batch jobs**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR Studio** on the left navigation, choose **Getting started**. You can also create a new Studio from the **Studios** page.

1. Amazon EMR provides default settings when you create an EMR Studio for batch jobs, but you can edit these settings. Configurable settings include the EMR Studio's name, the EMR Serverless application name, and the associated runtime role.

1. Choose **Create Studio and launch Workspace** to finish and navigate to the **Studios** page. Your new Studio appears in the list with details such as **Studio name**, **Creation date**, and **Studio access URL**. Your EMR Studio opens in a new tab in your browser.

------
#### [ Custom settings ]

**To create an EMR Studio with custom settings**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR Studio** on the left navigation, choose **Getting started**. You can also create a new Studio from the **Studios** page.

1. Choose **Create a Studio** to open the **Create a Studio** page.

1. Enter a **Studio name**.

1. Choose to create a new S3 bucket or use an existing location.

1. Choose the Workspace to add to the Studio. You can add up to 3 Workspaces.

1. Under **Authentication**, choose an authentication mode for the Studio and provide information according to the following table. To learn more about authentication for EMR Studio, see [Choose an authentication mode for Amazon EMR Studio](emr-studio-authentication.md).  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-studio.html)

1. For **VPC**, choose an Amazon Virtual Private Cloud (VPC) for the Studio from the dropdown list.

1. Under **Subnets**, select a maximum of five subnets in your VPC to associate with the Studio. You have the option to add more subnets after you create the Studio.

1. For **Security groups**, choose either the default security groups or custom security groups. For more information, see [Define security groups to control EMR Studio network traffic](emr-studio-security-groups.md).  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-studio.html)

1. Add tags to your Studio and other resources. For more information about tags, see [Tag clusters](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-tags.html).

1. Choose **Create Studio and launch Workspace** to finish and navigate to the **Studios** page. Your new Studio appears in the list with details such as **Studio name**, **Creation date**, and **Studio access URL**.

After you create a Studio, follow the instructions in [Assign a user or group to an EMR Studio](emr-studio-manage-users.md#emr-studio-assign-users-groups).

------
#### [ CLI ]

**Note**  
Linux line continuation characters (\\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

**Example – Create an EMR Studio that uses IAM for authentication**  
The following example AWS CLI command creates an EMR Studio with IAM authentication mode. When you use IAM authentication or federation for the Studio, you don't specify a `--user-role`.   
To let federated users log in using the Studio URL and credentials for your identity provider (IdP), specify your `--idp-auth-url` and `--idp-relay-state-parameter-name`. For a list of IdP authentication URLs and RelayState names, see [Identity provider RelayState parameters and authentication URLs](#emr-studio-idp-reference-table).  

```
aws emr create-studio \
--name <example-studio-name> \
--auth-mode IAM \
--vpc-id <example-vpc-id> \
--subnet-ids <subnet-id-1> <subnet-id-2>... <subnet-id-5>  \
--service-role <example-studio-service-role-name> \
--workspace-security-group-id <example-workspace-sg-id> \
--engine-security-group-id <example-engine-sg-id> \
--default-s3-location <example-s3-location> \
--idp-auth-url <https://EXAMPLE/login/> \
--idp-relay-state-parameter-name <example-RelayState>
```

**Example – Create an EMR Studio that uses Identity Center for authentication**  
The following AWS CLI example command creates an EMR Studio that uses IAM Identity Center authentication mode. When you use IAM Identity Center authentication, you must specify a `--user-role`.   
For more information about IAM Identity Center authentication mode, see [Set up IAM Identity Center authentication mode for Amazon EMR Studio](emr-studio-authentication.md#emr-studio-enable-sso).  

```
aws emr create-studio \
--name <example-studio-name> \
--auth-mode SSO \
--vpc-id <example-vpc-id> \
--subnet-ids <subnet-id-1> <subnet-id-2>... <subnet-id-5>  \
--service-role <example-studio-service-role-name> \
--user-role <example-studio-user-role-name> \
--workspace-security-group-id <example-workspace-sg-id> \
--engine-security-group-id <example-engine-sg-id> \
--default-s3-location <example-s3-location> \
--trusted-identity-propagation-enabled \
--idc-user-assignment OPTIONAL \
--idc-instance-arn <iam-identity-center-instance-arn>
```

**Example – CLI output for `aws emr create-studio`**  
The following is an example of the output that appears after you create a Studio.  

```
{
    "StudioId": "es-123XXXXXXXXX",
    "Url": "https://es-123XXXXXXXXX.emrstudio-prod.us-east-1.amazonaws.com"
}
```

For more information about the `create-studio` command, see [https://docs.aws.amazon.com/cli/latest/reference/emr/create-studio.html](https://docs.aws.amazon.com/cli/latest/reference/emr/create-studio.html).

------
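
If you script Studio creation, you can encode the rules from the preceding examples in a small helper: IAM authentication mode takes no `--user-role`, IAM Identity Center (`SSO`) mode requires one, and a Studio accepts at most five subnets. The following Python sketch builds the `aws emr create-studio` argument list under those rules. The function name `build_create_studio_command` and all of the resource identifiers are hypothetical. The sketch only assembles and prints the command; it doesn't call AWS.

```python
import shlex


def build_create_studio_command(*, name, auth_mode, vpc_id, subnet_ids,
                                service_role, workspace_sg_id, engine_sg_id,
                                default_s3_location, user_role=None):
    """Return the argument list for `aws emr create-studio`, validating inputs."""
    if auth_mode not in ("IAM", "SSO"):
        raise ValueError("auth_mode must be IAM or SSO")
    if auth_mode == "SSO" and not user_role:
        raise ValueError("IAM Identity Center (SSO) mode requires a user role")
    if auth_mode == "IAM" and user_role:
        raise ValueError("IAM mode does not take a user role")
    if not 1 <= len(subnet_ids) <= 5:
        raise ValueError("specify between one and five subnets")

    command = [
        "aws", "emr", "create-studio",
        "--name", name,
        "--auth-mode", auth_mode,
        "--vpc-id", vpc_id,
        "--subnet-ids", *subnet_ids,
        "--service-role", service_role,
        "--workspace-security-group-id", workspace_sg_id,
        "--engine-security-group-id", engine_sg_id,
        "--default-s3-location", default_s3_location,
    ]
    if user_role:
        command += ["--user-role", user_role]
    return command


# All identifiers below are placeholders for illustration.
command = build_create_studio_command(
    name="example-studio",
    auth_mode="SSO",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0example1", "subnet-0example2"],
    service_role="example-studio-service-role",
    workspace_sg_id="sg-0workspace",
    engine_sg_id="sg-0engine",
    default_s3_location="s3://amzn-s3-demo-bucket/studio/",
    user_role="example-studio-user-role",
)
print(shlex.join(command))
```

Returning a list instead of a single string lets you pass the result directly to `subprocess.run` without shell quoting concerns.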

## Identity provider RelayState parameters and authentication URLs
<a name="emr-studio-idp-reference-table"></a>

When you use IAM federation, and you want users to log in using your Studio URL and credentials for your identity provider (IdP), you can specify your **Identity provider (IdP) login URL** and **RelayState** parameter name when you [Create an EMR Studio](#emr-studio-create-studio).

The following table shows the standard authentication URL and RelayState parameter name for some popular identity providers.


| Identity provider | Parameter | Authentication URL | 
| --- | --- | --- | 
| Auth0 | RelayState | https://<sub\_domain>.auth0.com/samlp/<app\_id> | 
| Google accounts | RelayState | https://accounts.google.com/o/saml2/initsso?idpid=<idp\_id>&spid=<sp\_id>&forceauthn=false | 
| Microsoft Azure | RelayState | https://myapps.microsoft.com/signin/<app\_name>/<app\_id>?tenantId=<tenant\_id> | 
| Okta | RelayState | https://<sub\_domain>.okta.com/app/<app\_name>/<app\_id>/sso/saml | 
| PingFederate | TargetResource | https://<host>/idp/<idp\_id>/startSSO.ping?PartnerSpId=<sp\_id> | 
| PingOne | TargetResource | https://sso.connect.pingidentity.com/sso/sp/initsso?saasid=<app\_id>&idpid=<idp\_id> | 
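
As an illustration of how the table maps to the `--idp-auth-url` and `--idp-relay-state-parameter-name` options, the following Python sketch stores each row as a URL template and fills in the placeholders. The names `IDP_REFERENCE` and `idp_settings` are hypothetical, and the sample values (`example`, `aws`, `abc123`) are made up. Verify the final URL against your identity provider's own configuration.

```python
# Each entry: (RelayState parameter name, authentication URL template from the table).
IDP_REFERENCE = {
    "Auth0": ("RelayState", "https://{sub_domain}.auth0.com/samlp/{app_id}"),
    "Google accounts": (
        "RelayState",
        "https://accounts.google.com/o/saml2/initsso"
        "?idpid={idp_id}&spid={sp_id}&forceauthn=false",
    ),
    "Microsoft Azure": (
        "RelayState",
        "https://myapps.microsoft.com/signin/{app_name}/{app_id}?tenantId={tenant_id}",
    ),
    "Okta": (
        "RelayState",
        "https://{sub_domain}.okta.com/app/{app_name}/{app_id}/sso/saml",
    ),
    "PingFederate": (
        "TargetResource",
        "https://{host}/idp/{idp_id}/startSSO.ping?PartnerSpId={sp_id}",
    ),
    "PingOne": (
        "TargetResource",
        "https://sso.connect.pingidentity.com/sso/sp/initsso"
        "?saasid={app_id}&idpid={idp_id}",
    ),
}


def idp_settings(provider: str, **values: str) -> dict:
    """Return the two IdP values for a provider, with placeholders filled in."""
    parameter_name, url_template = IDP_REFERENCE[provider]
    return {
        "idp_relay_state_parameter_name": parameter_name,
        # str.format raises KeyError if a required placeholder value is missing.
        "idp_auth_url": url_template.format(**values),
    }


print(idp_settings("Okta", sub_domain="example", app_name="aws", app_id="abc123"))
```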

# Assign and manage EMR Studio users
<a name="emr-studio-manage-users"></a>

After you create an EMR Studio, you can assign users and groups to it. The method you use to assign, update, and remove users depends on the Studio authentication mode. 
+ When you use IAM authentication mode, you configure EMR Studio user assignment and permissions in IAM or with IAM and your identity provider. 
+ With IAM Identity Center authentication mode, you use the Amazon EMR management console or the AWS CLI to manage users.

To learn more about authentication for Amazon EMR Studio, see [Choose an authentication mode for Amazon EMR Studio](emr-studio-authentication.md).

## Assign a user or group to an EMR Studio
<a name="emr-studio-assign-users-groups"></a>

------
#### [ IAM ]

When you use IAM authentication mode (see [Set up IAM authentication mode for Amazon EMR Studio](emr-studio-authentication.md#emr-studio-iam-authentication)), you must allow the `CreateStudioPresignedUrl` action in a user's IAM permissions policy and restrict the user to a particular Studio. You can include `CreateStudioPresignedUrl` in the policy described in [User permissions for IAM authentication mode](how-emr-studio-works.md#emr-studio-iam-authorization) or use a separate policy.

To restrict a user to a Studio (or set of Studios), you can use attribute-based access control (ABAC) or specify the Amazon Resource Name (ARN) of a Studio in the `Resource` element of the permissions policy. 

**Example Assign a user to a Studio using a Studio ARN**  
The following example policy gives a user access to a particular EMR Studio by allowing the `CreateStudioPresignedUrl` action and specifying the Studio's Amazon Resource Name (ARN) in the `Resource` element.    

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreateStudioPresignedUrl",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreateStudioPresignedUrl"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:studio/studio-id"
      ]
    }
  ]
}
```
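If you manage several Studios, you might generate this policy instead of writing it by hand. The following Python sketch builds the same policy document from a list of Studio ARNs. The function name `studio_access_policy` is a hypothetical helper and is not part of any AWS SDK; the result is plain JSON that you would still attach through IAM.

```python
import json


def studio_access_policy(studio_arns):
    """Return a policy document that allows CreateStudioPresignedUrl
    only for the given list of Studio ARNs (hypothetical helper)."""
    if not studio_arns:
        raise ValueError("at least one Studio ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateStudioPresignedUrl",
                "Effect": "Allow",
                "Action": ["elasticmapreduce:CreateStudioPresignedUrl"],
                "Resource": list(studio_arns),
            }
        ],
    }


if __name__ == "__main__":
    # Example ARN copied from the policy above.
    arn = "arn:aws:elasticmapreduce:us-east-1:123456789012:studio/studio-id"
    print(json.dumps(studio_access_policy([arn]), indent=2))
```

Pass the ARNs as a list, even when there is only one, so that each ARN becomes a separate entry in the `Resource` element.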

**Example Assign a user to a Studio with ABAC for IAM authentication**  
There are multiple ways to configure attribute-based access control (ABAC) for a Studio. For example, you might attach one or more tags to an EMR Studio, and then create an IAM policy that restricts the `CreateStudioPresignedUrl` action to a particular Studio or set of Studios with those tags.   
You can add tags during or after Studio creation. To add tags to an existing Studio, you can use the AWS CLI [`emr add-tags`](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/emr/add-tags.html) command. The following example adds a tag with the key-value pair `Team = Data Analytics` to an EMR Studio.   

```
aws emr add-tags --resource-id <example-studio-id> --tags Team="Data Analytics"
```
The following example permissions policy allows the `CreateStudioPresignedUrl` action for EMR Studios with the tag key-value pair `Team = Data Analytics`. For more information about using tags to control access, see [Controlling access to and for IAM users and roles using tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_iam-tags.html) or [Controlling access to AWS resources using tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_tags.html).

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreateStudioPresignedUrl",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreateStudioPresignedUrl"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:studio/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/Team": "Data Analytics"
        }
      }
    }
  ]
}
```
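The `StringEquals` operator compares tag values exactly, so `Data Analytics` and `DataAnalytics` are different values. The following Python sketch illustrates that behavior with a deliberately simplified check. It is not IAM's policy evaluation logic, and the function name `tag_condition_allows` is hypothetical.

```python
def tag_condition_allows(statement, studio_tags):
    """Simplified check: does every StringEquals ResourceTag condition in the
    statement match the Studio's tags? Illustration only; real IAM evaluation
    handles many more operators and rules."""
    prefix = "elasticmapreduce:ResourceTag/"
    conditions = statement.get("Condition", {}).get("StringEquals", {})
    for key, expected in conditions.items():
        if not key.startswith(prefix):
            continue  # this sketch ignores condition keys that are not tags
        allowed = expected if isinstance(expected, list) else [expected]
        if studio_tags.get(key[len(prefix):]) not in allowed:
            return False
    return True


# Statement copied from the example policy above.
statement = {
    "Effect": "Allow",
    "Action": ["elasticmapreduce:CreateStudioPresignedUrl"],
    "Resource": ["arn:aws:elasticmapreduce:*:123456789012:studio/*"],
    "Condition": {
        "StringEquals": {"elasticmapreduce:ResourceTag/Team": "Data Analytics"}
    },
}

print(tag_condition_allows(statement, {"Team": "Data Analytics"}))  # True
print(tag_condition_allows(statement, {"Team": "DataAnalytics"}))   # False
```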

**Example Assign a user to a Studio using the aws:SourceIdentity global condition key**  
When you use IAM federation, you can use the global condition key `aws:SourceIdentity` in a permissions policy to give users Studio access when they assume your IAM role for federation.   
You must first configure your identity provider (IdP) to return an identifying string, such as an email address or username, when a user authenticates and assumes your IAM role for federation. IAM sets the global condition key `aws:SourceIdentity` to the identifying string returned by your IdP.  
For more information, see the [How to relate IAM role activity to corporate identity](https://aws.amazon.com/blogs/security/how-to-relate-iam-role-activity-to-corporate-identity/) blog post in the AWS Security Blog and the [aws:SourceIdentity](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceidentity) entry in the global condition keys reference.   
The following example policy allows the `CreateStudioPresignedUrl` action for users whose `aws:SourceIdentity` matches `example-source-identity`, and limits access to the EMR Studio whose ARN is specified in the `Resource` element.

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreateStudioPresignedUrl"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:studio/studio-name"
      ],
      "Condition": {
        "StringLike": {
          "aws:SourceIdentity": "example-source-identity"
        }
      },
      "Sid": "AllowELASTICMAPREDUCECreatestudiopresignedurl"
    }
  ]
}
```

------
#### [ IAM Identity Center ]

When you assign a user or group to an EMR Studio, you specify a session policy that defines fine-grained permissions, such as the ability to create a new EMR cluster, for that user or group. Amazon EMR stores these session policy mappings. You can update a user or group's session policy after assignment.

**Note**  
The final permissions for a user or group are the intersection of the permissions defined in your EMR Studio user role and the permissions defined in the session policy for that user or group. If a user belongs to more than one group assigned to the Studio, EMR Studio uses the union of those groups' permissions for that user.

**To assign users or groups to an EMR Studio using the Amazon EMR console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **EMR Studio** from the left navigation.

1. Choose your Studio name from the **Studios** list, or select the Studio and choose **View details**, to open the Studio detail page.

1. Choose **Add Users** to see the **Users** and **Groups** search table.

1. Select the **Users** tab or the **Groups** tab, and enter a search term in the search bar to find a user or group. 

1. Select one or more users or groups from the search results list. You can switch back and forth between the **Users** tab and the **Groups** tab.

1. After you select users and groups to add to the Studio, choose **Add**. You should see the users and groups appear in the **Studio users** list. It might take a few seconds for the list to refresh.

1. Follow the instructions in [Update permissions for a user or group assigned to a Studio](#emr-studio-update-user) to refine the Studio permissions for a user or group.

**To assign a user or group to an EMR Studio using the AWS CLI**

Insert your own values for the following `create-studio-session-mapping` arguments. For more information about the command, see [create-studio-session-mapping](https://docs.aws.amazon.com/cli/latest/reference/emr/create-studio-session-mapping.html) in the *AWS CLI Command Reference*.
+ **`--studio-id`** – The ID of the Studio you want to assign the user or group to. For instructions on how to retrieve a Studio ID, see [View Studio details](emr-studio-manage-studio.md#emr-studio-get-studio-id).
+ **`--identity-name`** – The name of the user or group from the Identity Store. For more information, see [UserName](https://docs.aws.amazon.com/singlesignon/latest/IdentityStoreAPIReference/API_User.html#singlesignon-Type-User-UserName) for users and [DisplayName](https://docs.aws.amazon.com/singlesignon/latest/IdentityStoreAPIReference/API_Group.html#singlesignon-Type-Group-DisplayName) for groups in the *Identity Store API Reference*.
+ **`--identity-type`** – Use either `USER` or `GROUP` to specify the identity type.
+ **`--session-policy-arn`** – The Amazon Resource Name (ARN) for the session policy you want to associate with the user or group. For example, `arn:aws:iam::<aws-account-id>:policy/EMRStudio_Advanced_User_Policy`. For more information, see [Create permissions policies for EMR Studio users](emr-studio-user-permissions.md#emr-studio-permissions-policies).

```
aws emr create-studio-session-mapping \
 --studio-id <example-studio-id> \
 --identity-name <example-identity-name> \
 --identity-type <USER-or-GROUP> \
 --session-policy-arn <example-session-policy-arn>
```

**Note**  
Linux line continuation characters (`\`) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (`^`).

Use the `get-studio-session-mapping` command to verify the new assignment. Replace *<user-or-group-name>* with the IAM Identity Center name of the user or group that you assigned.

```
aws emr get-studio-session-mapping \
 --studio-id <example-studio-id> \
 --identity-type <USER-or-GROUP> \
 --identity-name <user-or-group-name>
```
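If you script assignments instead of running the CLI by hand, the same two steps (create the mapping, then read it back) can be sketched in Python. This sketch assumes that `emr` is a boto3 EMR client, for example one created with `boto3.client("emr")`. The client is passed in as an argument so that the function has no hard dependency on boto3. The function name `assign_to_studio` is hypothetical.

```python
def assign_to_studio(emr, studio_id, identity_name, identity_type, session_policy_arn):
    """Assign a user or group to a Studio, then return the stored mapping.

    `emr` is assumed to be a boto3 EMR client (or any object that exposes the
    same create_studio_session_mapping / get_studio_session_mapping methods).
    """
    if identity_type not in ("USER", "GROUP"):
        raise ValueError("identity_type must be USER or GROUP")

    # Equivalent of: aws emr create-studio-session-mapping ...
    emr.create_studio_session_mapping(
        StudioId=studio_id,
        IdentityName=identity_name,
        IdentityType=identity_type,
        SessionPolicyArn=session_policy_arn,
    )

    # Equivalent of: aws emr get-studio-session-mapping ...
    response = emr.get_studio_session_mapping(
        StudioId=studio_id,
        IdentityName=identity_name,
        IdentityType=identity_type,
    )
    return response["SessionMapping"]
```

Validating `identity_type` before the call is a convenience in this sketch. The service performs its own validation as well.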

------

## Update permissions for a user or group assigned to a Studio
<a name="emr-studio-update-user"></a>

------
#### [ IAM ]

To update user or group permissions when you use IAM authentication mode, use IAM to change the IAM permissions policies attached to your IAM identities (users, groups, or roles). 

For more information, see [User permissions for IAM authentication mode](how-emr-studio-works.md#emr-studio-iam-authorization).

------
#### [ IAM Identity Center ]

**To update EMR Studio permissions for a user or group using the console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **EMR Studio** from the left navigation.

1. Choose your Studio name from the **Studios** list, or select the Studio and choose **View details**, to open the Studio detail page.

1. In the **Studio users** list on the Studio detail page, search for the user or group you want to update. You can search by name or identity type.

1. Select the user or group that you want to update and choose **Assign policy** to open the **Session policy** dialog box.

1. Select a policy to apply to the user or group that you chose in step 5, and choose **Apply policy**. The **Studio users** list should display the policy name in the **Session policy** column for the user or group that you updated.

**To update EMR Studio permissions for a user or group using the AWS CLI**

Insert your own values for the following `update-studio-session-mapping` arguments. For more information about the command, see [update-studio-session-mapping](https://docs.aws.amazon.com/cli/latest/reference/emr/update-studio-session-mapping.html) in the *AWS CLI Command Reference*.

```
aws emr update-studio-session-mapping \
 --studio-id <example-studio-id> \
 --identity-name <name-of-user-or-group-to-update> \
 --session-policy-arn <new-session-policy-arn-to-apply> \
 --identity-type <USER-or-GROUP>
```

Use the `get-studio-session-mapping` command to verify the new session policy assignment. Replace *<user-or-group-name>* with the IAM Identity Center name of the user or group that you updated.

```
aws emr get-studio-session-mapping \
 --studio-id <example-studio-id> \
 --identity-type <USER-or-GROUP> \
 --identity-name <user-or-group-name>
```

------

## Remove a user or group from a Studio
<a name="emr-studio-remove-user"></a>

------
#### [ IAM ]

To remove a user or group from an EMR Studio when you use IAM authentication mode, you must revoke the user's access to the Studio by reconfiguring the user's IAM permissions policy. 

In the following example policy, assume that you have an EMR Studio with the tag key-value pair `Team = Quality Assurance`. According to the policy, the user can access Studios tagged with the `Team` key whose value is equal to either `Data Analytics` or `Quality Assurance`. To remove the user from the Studio tagged with `Team = Quality Assurance`, remove `Quality Assurance` from the list of tag values.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCreateStudioPresignedUrl",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreateStudioPresignedUrl"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:studio/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/Team": [
            "Data Analytics",
            "Quality Assurance"
          ]
        }
      }
    }
  ]
}
```
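The edit described above (dropping one value from the list of allowed tag values) can be sketched in Python as follows. The function name `remove_tag_value` is hypothetical. The sketch only rewrites the policy document in memory, so you would still save the result through IAM.

```python
import copy


def remove_tag_value(policy, tag_key, value):
    """Return a copy of the policy with `value` removed from the StringEquals
    list for elasticmapreduce:ResourceTag/<tag_key>."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    condition_key = f"elasticmapreduce:ResourceTag/{tag_key}"
    for statement in updated["Statement"]:
        string_equals = statement.get("Condition", {}).get("StringEquals", {})
        if condition_key not in string_equals:
            continue
        current = string_equals[condition_key]
        values = current if isinstance(current, list) else [current]
        remaining = [v for v in values if v != value]
        if not remaining:
            # This sketch refuses to write an empty condition; in that case,
            # remove or rewrite the whole statement instead.
            raise ValueError("no tag values would remain for " + condition_key)
        string_equals[condition_key] = remaining
    return updated
```

For the policy above, `remove_tag_value(policy, "Team", "Quality Assurance")` leaves `Data Analytics` as the only allowed value for the `Team` tag.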

------

------
#### [ IAM Identity Center ]

**To remove a user or group from an EMR Studio using the console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **EMR Studio** from the left navigation.

1. Choose your Studio name from the **Studios** list, or select the Studio and choose **View details**, to open the Studio detail page.

1. In the **Studio users** list on the Studio detail page, find the user or group you want to remove from the Studio. You can search by name or identity type.

1. Select the user or group that you want to delete, choose **Delete** and confirm. The user or group that you deleted disappears from the **Studio users** list.

**To remove a user or group from an EMR Studio using the AWS CLI**

Insert your own values for the following `delete-studio-session-mapping` arguments. For more information about the command, see [delete-studio-session-mapping](https://docs.aws.amazon.com/cli/latest/reference/emr/delete-studio-session-mapping.html) in the *AWS CLI Command Reference*.

```
aws emr delete-studio-session-mapping \
 --studio-id <example-studio-id> \
 --identity-type <USER-or-GROUP> \
 --identity-name <name-of-user-or-group-to-delete>
```

------

# Monitor, update and delete Amazon EMR Studio resources
<a name="emr-studio-manage-studio"></a>

This section includes instructions to help you monitor, update, or delete an EMR Studio resource. For information about assigning users or updating user permissions, see [Assign and manage EMR Studio users](emr-studio-manage-users.md).

## View Studio details
<a name="emr-studio-get-studio-id"></a>

------
#### [ Console ]

**To view details about an EMR Studio with the new console**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR Studio** on the left navigation, choose **Studios**.

1. Select the Studio from the **Studios** list to open the Studio detail page. The Studio detail page includes **Studio setting** information, such as the Studio **Description**, **VPC**, and **Subnets**.

------
#### [ CLI ]

**To retrieve details for an EMR Studio by Studio ID using the AWS CLI**

Use the following `describe-studio` AWS CLI command to fetch detailed information about a particular EMR Studio. For more information, see [describe-studio](https://docs.aws.amazon.com/cli/latest/reference/emr/describe-studio.html) in the *AWS CLI Command Reference*.

```
aws emr describe-studio \
 --studio-id <id-of-studio-to-describe>
```

**To retrieve a list of EMR Studios using the AWS CLI**

Use the following `list-studios` AWS CLI command. For more information, see [list-studios](https://docs.aws.amazon.com/cli/latest/reference/emr/list-studios.html) in the *AWS CLI Command Reference*.

```
aws emr list-studios
```

The following is an example return value for the `list-studios` command in JSON format. 

```
{
    "Studios": [
        {
            "AuthMode": "IAM",
            "VpcId": "vpc-b21XXXXX", 
            "Name": "example-studio-name", 
            "Url": "https://es-7HWP74SNGDXXXXXXXXXXXXXXX.emrstudio-prod.us-east-1.amazonaws.com", 
            "CreationTime": 1605672582.781, 
            "StudioId": "es-7HWP74SNGDXXXXXXXXXXXXXXX", 
            "Description": "example studio description"
        }
    ]
}
```
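Other commands in this guide take a Studio ID, so a common next step is to look up the ID for a Studio name in this output. The following Python sketch parses JSON shaped like the example above. The function name `studio_ids_by_name` is hypothetical, and the sample string is a shortened copy of the example response.

```python
import json


def studio_ids_by_name(list_studios_json, name):
    """Return the StudioId of every Studio whose Name matches exactly."""
    studios = json.loads(list_studios_json).get("Studios", [])
    return [s["StudioId"] for s in studios if s.get("Name") == name]


# Shortened copy of the example `list-studios` output.
sample = """
{
    "Studios": [
        {
            "AuthMode": "IAM",
            "Name": "example-studio-name",
            "StudioId": "es-7HWP74SNGDXXXXXXXXXXXXXXX"
        }
    ]
}
"""

print(studio_ids_by_name(sample, "example-studio-name"))
```

The function returns a list because this sketch does not assume that Studio names are unique in your account.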

------

## Monitor Amazon EMR Studio actions
<a name="emr-studio-monitor"></a>

### View EMR Studio and API activity
<a name="emr-studio-cloudtrail-events"></a>

EMR Studio is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, by an IAM role, or by another AWS service in EMR Studio. CloudTrail captures API calls for EMR Studio as events. You can view events using the CloudTrail console at [https://console.aws.amazon.com/cloudtrail/](https://console.aws.amazon.com/cloudtrail/). 

EMR Studio events provide information such as which Studio or IAM user makes a request, and what kind of request it is.

**Note**  
On-cluster actions, such as running notebook jobs, do not emit AWS CloudTrail events.

You can also create a trail for continuous delivery of EMR Studio CloudTrail events to an Amazon S3 bucket. For more information, see the *[AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html)*.

**Example CloudTrail event: a user calls the DescribeStudio API**

The following is an example AWS CloudTrail event that is created when a user, `admin`, calls the [DescribeStudio](https://docs.aws.amazon.com/emr/latest/APIReference/API_DescribeStudio.html) API. CloudTrail records the user name as `admin`.

**Note**  
To protect Studio details, the EMR Studio API event for DescribeStudio excludes a value for `responseElements`.

```
{
   "eventVersion":"1.08",
   "userIdentity":{
      "type":"IAMUser",
      "principalId":"AIDXXXXXXXXXXXXXXXXXX",
      "arn":"arn:aws:iam::653XXXXXXXXX:user/admin",
      "accountId":"653XXXXXXXXX",
      "accessKeyId":"AKIAIOSFODNN7EXAMPLE",
      "userName":"admin"
   },
   "eventTime":"2021-01-07T19:13:58Z",
   "eventSource":"elasticmapreduce.amazonaws.com",
   "eventName":"DescribeStudio",
   "awsRegion":"us-east-1",
   "sourceIPAddress":"72.XX.XXX.XX",
   "userAgent":"aws-cli/1.18.188 Python/3.8.5 Darwin/18.7.0 botocore/1.19.28",
   "requestParameters":{
      "studioId":"es-9O5XXXXXXXXXXXXXXXXXXXXXX"
   },
   "responseElements":null,
   "requestID":"0fxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
   "eventID":"b0xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
   "readOnly":true,
   "eventType":"AwsApiCall",
   "managementEvent":true,
   "eventCategory":"Management",
   "recipientAccountId":"653XXXXXXXXX"
}
```
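When you review many events, for example from a trail delivered to Amazon S3, it can help to reduce each record to the caller, the action, and the Studio involved. The following Python sketch does that for an event that is already loaded as a dictionary. The function name `summarize_studio_event` and the output keys are hypothetical. The input field names are the ones shown in the example event above.

```python
def summarize_studio_event(event):
    """Pull the caller, action, and Studio ID out of a CloudTrail event dict."""
    identity = event.get("userIdentity", {})
    # requestParameters can be null in CloudTrail records, so fall back to {}.
    params = event.get("requestParameters") or {}
    return {
        "who": identity.get("userName") or identity.get("arn"),
        "action": event.get("eventName"),
        "studio_id": params.get("studioId"),
        "when": event.get("eventTime"),
        "read_only": event.get("readOnly"),
    }


# Subset of the example event above.
example_event = {
    "userIdentity": {"type": "IAMUser", "userName": "admin"},
    "eventTime": "2021-01-07T19:13:58Z",
    "eventName": "DescribeStudio",
    "requestParameters": {"studioId": "es-9O5XXXXXXXXXXXXXXXXXXXXXX"},
    "responseElements": None,
    "readOnly": True,
}

print(summarize_studio_event(example_event))
```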

### View Spark user and job activity
<a name="emr-studio-monitor-spark-job-by-user"></a>

To view Spark job activity by Amazon EMR Studio users, you can configure user impersonation on a cluster. With user impersonation, each Spark job that is submitted from a Workspace is associated with the Studio user who ran the code.

When user impersonation is enabled, Amazon EMR creates an HDFS user directory on the cluster's primary node for each user that runs code in the Workspace. For example, if user `studio-user-1@example.com` runs code, you can connect to the primary node and see that `hadoop fs -ls /user` has a directory for `studio-user-1@example.com`.

To set up Spark user impersonation, set the properties shown in the following example for these configuration classifications:
+ `core-site`
+ `livy-conf`

```
[
    {
        "Classification": "core-site",
        "Properties": {
          "hadoop.proxyuser.livy.groups": "*",
          "hadoop.proxyuser.livy.hosts": "*"
        }
    },
    {
        "Classification": "livy-conf",
        "Properties": {
          "livy.impersonation.enabled": "true"
        }
    }
]
```
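Before you rely on per-user job attribution, you might check that a cluster's configuration list contains these settings. The following Python sketch compares a configuration list against the exact values in the example above. The names `REQUIRED_IMPERSONATION_PROPERTIES` and `missing_impersonation_settings` are hypothetical. Because the check is an exact match, it also flags narrower proxy-user values (for example, a specific host list instead of `*`).

```python
# Values copied from the example configuration above.
REQUIRED_IMPERSONATION_PROPERTIES = {
    "core-site": {
        "hadoop.proxyuser.livy.groups": "*",
        "hadoop.proxyuser.livy.hosts": "*",
    },
    "livy-conf": {"livy.impersonation.enabled": "true"},
}


def missing_impersonation_settings(configurations):
    """Return (classification, property) pairs that are absent or set to a
    value different from the example configuration."""
    found = {}
    for entry in configurations:
        found.setdefault(entry.get("Classification"), {}).update(
            entry.get("Properties", {})
        )
    missing = []
    for classification, properties in REQUIRED_IMPERSONATION_PROPERTIES.items():
        for name, value in properties.items():
            if found.get(classification, {}).get(name) != value:
                missing.append((classification, name))
    return missing
```

An empty list means that every property in the example is present with the same value.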

To view history server pages, see [Debug applications and jobs with EMR Studio](emr-studio-debug.md). You can also connect to the primary node of the cluster using SSH to view application web interfaces. For more information, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

## Update an Amazon EMR Studio
<a name="emr-studio-update-studio"></a>

After you create an EMR Studio, you can update the following attributes using the AWS CLI:
+ Name
+ Description
+ Default S3 location
+ Subnets

**To update an EMR Studio using the AWS CLI**

Use the `update-studio` AWS CLI command to update an EMR Studio. For more information, see [update-studio](https://docs.aws.amazon.com/cli/latest/reference/emr/update-studio.html) in the *AWS CLI Command Reference*.

**Note**  
You can associate a Studio with a maximum of 5 subnets. These subnets must belong to the same VPC as the Studio. The list of subnet IDs that you submit to the `update-studio` command can include new subnet IDs, but must also include all of the subnet IDs that you already associated with the Studio. You can't remove subnets from a Studio.

```
aws emr update-studio \
 --studio-id <example-studio-id-to-update> \
 --name <example-new-studio-name> \
 --subnet-ids <old-subnet-id-1 old-subnet-id-2 old-subnet-id-3 new-subnet-id>
```

To verify the changes, use the `describe-studio` AWS CLI command and specify your Studio ID. For more information, see [describe-studio](https://docs.aws.amazon.com/cli/latest/reference/emr/describe-studio.html) in the *AWS CLI Command Reference*.

```
aws emr describe-studio \
 --studio-id <id-of-updated-studio>
```
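Because the subnet list that you submit must contain every subnet already associated with the Studio, and a Studio supports at most 5 subnets, it can be safer to compute the list than to type it. The following Python sketch merges the existing subnet IDs with the new ones and enforces the limit. The names `MAX_STUDIO_SUBNETS` and `merged_subnet_ids` are hypothetical. The existing IDs would typically come from the `describe-studio` output.

```python
MAX_STUDIO_SUBNETS = 5  # limit stated in the note for update-studio


def merged_subnet_ids(existing_ids, new_ids):
    """Return existing subnet IDs followed by any new ones, without duplicates.

    Existing IDs are always kept, because subnets can't be removed from a
    Studio. Raises ValueError if the result would exceed the subnet limit.
    """
    merged = list(existing_ids)
    for subnet_id in new_ids:
        if subnet_id not in merged:
            merged.append(subnet_id)
    if len(merged) > MAX_STUDIO_SUBNETS:
        raise ValueError(
            f"a Studio supports at most {MAX_STUDIO_SUBNETS} subnets, got {len(merged)}"
        )
    return merged


# Placeholder subnet IDs for illustration only.
print(" ".join(merged_subnet_ids(["subnet-aaa", "subnet-bbb"], ["subnet-ccc"])))
```

The space-separated output matches the form that the `--subnet-ids` argument expects in the command above.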

## Delete an Amazon EMR Studio and Workspaces
<a name="emr-studio-delete-studio"></a>

When you delete a Studio, EMR Studio deletes all of the IAM Identity Center user and group assignments that are associated with the Studio. 

**Note**  
When you delete a Studio, Amazon EMR does *not* delete the Workspaces associated with that Studio. You must delete the Workspaces in your Studio separately.

**Delete Workspaces**

------
#### [ Console ]

Because each EMR Studio Workspace is an EMR notebook instance, you can use the Amazon EMR management console to delete Workspaces. You can delete Workspaces using the Amazon EMR console before or after you delete your Studio.

**To delete a Workspace using the Amazon EMR console**

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. Choose **Notebooks**.

1. Select the Workspace(s) that you want to delete.

1. Choose **Delete**, then choose **Delete** again to confirm.

1. Follow the instructions for [Deleting objects](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service Console User Guide* to remove the notebook files associated with the deleted Workspace from Amazon S3.

------
#### [ EMR Studio UI ]

------
#### [ From the Workspace UI ]

**Delete a Workspace and its associated backup files from EMR Studio**

1. Log in to your EMR Studio with your Studio access URL and choose **Workspaces** from the left navigation.

1. Find your Workspace in the list, then select the check box next to its name. You can select multiple Workspaces to delete at the same time.

1. Choose **Delete** in the upper right of the **Workspaces** list, then choose **Delete** again to confirm that you want to delete the selected Workspaces.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------
#### [ From the Workspaces list ]

**Delete a Workspace and its associated backup files from the Workspaces list**

1. Navigate to the **Workspaces** list in the console.

1. Select the Workspace that you want to delete from the list and then choose **Actions**.

1. Choose **Delete**.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------

------

**Delete an EMR Studio**

------
#### [ Console ]

**To delete an EMR Studio with the new console**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR Studio** on the left navigation, choose **Studios**.

1. Select the Studio from the **Studios** list with the toggle to the left of the Studio name, then choose **Delete**.

------
#### [ Old console ]

**To delete an EMR Studio with the old console**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/elasticmapreduce/home](https://console.aws.amazon.com/elasticmapreduce/home).

1. Choose **EMR Studio** from the left navigation.

1. Select the Studio from the **Studios** list and choose **Delete**.

------
#### [ CLI ]

**To delete an EMR Studio with the AWS CLI**

Use the `delete-studio` AWS CLI command to delete an EMR Studio. For more information, see [delete-studio](https://docs.aws.amazon.com/cli/latest/reference/emr/delete-studio.html) in the *AWS CLI Command Reference*.

```
aws emr delete-studio --studio-id <id-of-studio-to-delete>
```

------

# Encrypting EMR Studio workspace notebooks and files
<a name="emr-studio-workspace-storage-encryption"></a>

In EMR Studio, you can create and configure different workspaces to organize and run notebooks. These workspaces store notebooks and related files in the Amazon S3 bucket that you specify. By default, these files are encrypted with server-side encryption using Amazon S3 managed keys (SSE-S3). You can instead choose to encrypt your files with a customer managed KMS key (SSE-KMS). You can do so by using the Amazon EMR management console, the AWS CLI, or an AWS SDK when you create an EMR Studio.

EMR Studio workspace storage encryption is available in all the [Regions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-considerations.html#emr-studio-considerations-general) where EMR Studio is available.

## Prerequisites
<a name="emr-studio-workspace-storage-encryption-prereqs"></a>

Before you can encrypt EMR Studio workspace notebooks and files, you must use AWS Key Management Service to [create a symmetric customer managed key (CMK)](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html#create-symmetric-cmk) in the same AWS account and Region as your EMR Studio.

The resource policy of your AWS KMS key must grant the necessary access permissions to your EMR Studio's service role. The following is a sample policy statement that grants the minimum access permissions for EMR Studio Workspace storage encryption:

```
{
    "Sid": "AllowEMRStudioServiceRoleAccess",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
    },
    "Action": [
        "kms:Decrypt", 
        "kms:GenerateDataKey", 
        "kms:ReEncryptFrom",
        "kms:ReEncryptTo",
        "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:CallerAccount": "<ACCOUNT_ID>",
            "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::<S3_BUCKET_NAME>",
            "kms:ViaService": "s3.<AWS_REGION>.amazonaws.com"
        }
    }
}
```
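The placeholders in this statement (`<ACCOUNT_ID>`, `<ROLE_NAME>`, `<S3_BUCKET_NAME>`, and `<AWS_REGION>`) appear in several places, so filling them in programmatically helps keep them consistent. The following Python sketch returns the same statement with the placeholders substituted. The function name `emr_studio_key_policy_statement` is hypothetical. You would still add the result to your key policy through AWS KMS.

```python
import json


def emr_studio_key_policy_statement(account_id, role_name, bucket_name, region):
    """Return the sample key policy statement above with placeholders filled in."""
    return {
        "Sid": "AllowEMRStudioServiceRoleAccess",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/{role_name}"},
        "Action": [
            "kms:Decrypt",
            "kms:GenerateDataKey",
            "kms:ReEncryptFrom",
            "kms:ReEncryptTo",
            "kms:DescribeKey",
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "kms:CallerAccount": account_id,
                "kms:EncryptionContext:aws:s3:arn": f"arn:aws:s3:::{bucket_name}",
                "kms:ViaService": f"s3.{region}.amazonaws.com",
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    statement = emr_studio_key_policy_statement(
        "123456789012", "ExampleStudioServiceRole", "example-bucket", "us-east-1"
    )
    print(json.dumps(statement, indent=4))
```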

Your EMR Studio service role must also have the access permissions to use your AWS KMS key. The following is a sample IAM policy granting the minimum access permissions for EMR Studio Workspace storage encryption:

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMRStudioWorkspaceStorageEncryptionAccess",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:ReEncryptFrom",
        "kms:ReEncryptTo",
        "kms:DescribeKey"
      ],
      "Resource": [
        "arn:aws:kms:*:123456789012:key/12345678-1234-1234-1234-123456789012"
      ]
    }
  ]
}
```

------

## Create a new EMR Studio
<a name="emr-studio-workspace-storage-encryption-setup"></a>

Follow these steps to create a new EMR Studio that uses workspace storage encryption.

1. Open the Amazon EMR console at [https://console.aws.amazon.com/elasticmapreduce/](https://console.aws.amazon.com/elasticmapreduce/).

1. Choose **Studios**, then choose **Create Studio**.

1. For **S3 location for storage**, enter or choose an Amazon S3 path. This is the Amazon S3 location where Amazon EMR stores workspace notebooks and files.

1. For **Service role**, enter or choose an IAM role. This is the IAM role that Amazon EMR assumes.

1. Choose **Encrypt Workspace files with your own AWS KMS key**.

1. Enter or choose an AWS KMS key to use to encrypt workspace notebooks and files in Amazon S3.

1. Choose **Create Studio** or **Create Studio and Launch Workspaces**.

The following steps demonstrate how to update an EMR Studio and set up workspace storage encryption.

1. Open the Amazon EMR console at [https://console.aws.amazon.com/elasticmapreduce/](https://console.aws.amazon.com/elasticmapreduce/).

1. Choose an existing EMR Studio from the list, then choose **Edit**.

1. Choose **Encrypt Workspace files with your own AWS KMS key**.

1. Enter or choose an AWS KMS key to use to encrypt workspace notebooks and files in Amazon S3.

1. Choose **Save Changes**.

# Define security groups to control EMR Studio network traffic
<a name="emr-studio-security-groups"></a>

## About the EMR Studio security groups
<a name="emr-studio-about-security-groups"></a>

Amazon EMR Studio uses two security groups to control network traffic between Workspaces in the Studio and an attached Amazon EMR cluster running on Amazon EC2:
+ An **engine security group** that uses port 18888 to communicate with an attached Amazon EMR cluster running on Amazon EC2.
+ A **Workspace security group** associated with the Workspaces in a Studio. This security group includes an outbound HTTPS rule to allow the Workspace to route traffic to the internet and must allow outbound traffic to the internet on port 443 to enable linking Git repositories to a Workspace.

EMR Studio uses these security groups in addition to any security groups associated with an EMR cluster attached to a Workspace. 

You must create these security groups when you use the AWS CLI to create a Studio. 

**Note**  
You can customize the security groups for EMR Studio with rules tailored to your environment, but you must include the rules noted on this page. Your Workspace security group can't allow any inbound traffic, and the engine security group must allow inbound traffic from the Workspace security group.

**Use the Default EMR Studio Security Groups**

When you use the Amazon EMR console, you can choose the following default security groups. The default security groups are created by EMR Studio on your behalf, and include the minimum required inbound and outbound rules for Workspaces in an EMR Studio. 
+ `DefaultEngineSecurityGroup`
+ `DefaultWorkspaceSecurityGroupGit` or `DefaultWorkspaceSecurityGroupWithoutGit`

## Prerequisites
<a name="emr-studio-security-group-prereqs"></a>

To create the security groups for EMR Studio, you need an Amazon Virtual Private Cloud (VPC) for the Studio. You choose this VPC when you create the security groups. This should be the same VPC that you specify when you create the Studio. If you plan to use Amazon EMR on EKS with EMR Studio, choose the VPC for your Amazon EKS cluster worker nodes.

## Instructions
<a name="emr-studio-security-group-instructions"></a>

Follow the instructions in [Creating a security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#creating-security-group) in the *Amazon EC2 User Guide for Linux Instances* to create an engine security group and a Workspace security group in your VPC. The security groups must include the rules summarized in the following tables.

When you create security groups for EMR Studio, note the IDs for both. You specify each security group by ID when you create a Studio.

**Engine security group**  
EMR Studio uses port 18888 to communicate with an attached cluster.    
**Inbound rules**    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-security-groups.html)

**Workspace security group**  
This security group is associated with the Workspaces in an EMR Studio.     
**Outbound rules**    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-security-groups.html)
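
The two requirements in the note above (no inbound traffic on the Workspace security group, and inbound traffic from the Workspace security group on the engine security group) can be checked mechanically. The following Python sketch is illustrative only. It assumes rules shaped like the `IpPermissions` entries that the Amazon EC2 API returns, and it assumes the engine inbound rule is a TCP rule on port 18888, the port named above. The function names and security group ID are hypothetical.

```python
ENGINE_PORT = 18888  # port EMR Studio uses to communicate with an attached cluster


def workspace_group_is_valid(inbound_rules):
    """The Workspace security group can't allow any inbound traffic."""
    return len(inbound_rules) == 0


def engine_group_allows_workspace(inbound_rules, workspace_group_id):
    """Return True if an inbound TCP rule covers ENGINE_PORT from the Workspace group.

    Simplification: rules that allow all protocols (IpProtocol "-1") are ignored.
    """
    for rule in inbound_rules:
        if rule.get("IpProtocol") != "tcp":
            continue
        if not rule.get("FromPort", -1) <= ENGINE_PORT <= rule.get("ToPort", -1):
            continue
        sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if workspace_group_id in sources:
            return True
    return False


# Hypothetical security group ID, for illustration only.
workspace_sg = "sg-0workspace000000000"
engine_inbound = [
    {
        "IpProtocol": "tcp",
        "FromPort": ENGINE_PORT,
        "ToPort": ENGINE_PORT,
        "UserIdGroupPairs": [{"GroupId": workspace_sg}],
    }
]

print(workspace_group_is_valid([]))
print(engine_group_allows_workspace(engine_inbound, workspace_sg))
```

A check like this is only a convenience before you create a Studio. The rules documented for this page remain the source of truth.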

# Create AWS CloudFormation templates for Amazon EMR Studio
<a name="emr-studio-cluster-templates"></a>

## About EMR Studio cluster templates
<a name="emr-studio-about-cluster-templates"></a>

You can create AWS CloudFormation templates to help EMR Studio users launch new Amazon EMR clusters in a Workspace. CloudFormation templates are formatted text files in JSON or YAML. In a template, you describe a stack of AWS resources and tell CloudFormation how to provision those resources for you. For EMR Studio, you can create one or more templates that describe an Amazon EMR cluster. 

You organize your templates in AWS Service Catalog. AWS Service Catalog lets you create and manage commonly deployed IT services called *products* on AWS. You collect your templates as products in a *portfolio* that you share with your EMR Studio users. After you create cluster templates, Studio users can launch a new cluster for a Workspace with one of your templates. Users must have permission to create new clusters from templates. You can set user permissions in your [EMR Studio permissions policies](emr-studio-user-permissions.md).

To learn more about CloudFormation templates, see [Templates](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-concepts.html#w2ab1b5c15b7) in the *AWS CloudFormation User Guide*. For more information about AWS Service Catalog, see [What is AWS Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/introduction.html).

The following video demonstrates how to set up cluster templates in AWS Service Catalog for EMR Studio. You can also learn more in the [Build a self-service environment for each line of business using Amazon EMR and Service Catalog](https://aws.amazon.com/blogs/big-data/build-a-self-service-environment-for-each-line-of-business-using-amazon-emr-and-aws-service-catalog/) blog post.

[![AWS Videos](http://img.youtube.com/vi/9w_TXTdFLpo/0.jpg)](http://www.youtube.com/watch?v=9w_TXTdFLpo)


### Optional template parameters
<a name="emr-studio-cluster-template-parameters"></a>

You can include additional options in the [Parameters](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html) section of your template. *Parameters* let Studio users input or select custom values for a cluster. For example, you could add a parameter that lets users select a particular Amazon EMR release. For more information, see [Parameters](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html) in the *AWS CloudFormation User Guide*.

The following example `Parameters` section defines additional input parameters such as `ClusterName`, `EmrRelease` version, and `ClusterInstanceType`.

```
Parameters:
  ClusterName:
    Type: "String"
    Default: "Cluster_Name_Placeholder"
  EmrRelease:
    Type: "String"
    Default: "emr-6.2.0"
    AllowedValues:
    - "emr-6.2.0"
    - "emr-5.32.0"
  ClusterInstanceType:
    Type: "String"
    Default: "m5.xlarge"
    AllowedValues:
    - "m5.xlarge"
    - "m5.2xlarge"
```

When you add parameters, Studio users see additional form options after selecting a cluster template. The following image shows additional form options for **EmrRelease** version, **ClusterName**, and **InstanceType**.

![\[Screenshot of the additional inputs in the Studio user interface when a user selects a cluster template with parameters.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/cluster-template-parameters-studio-ui.jpg)


## Prerequisites
<a name="emr-studio-cluster-template-prereqs"></a>

Before you create a cluster template, make sure you have IAM permissions to access the Service Catalog administrator console view. You also need the required IAM permissions to perform Service Catalog administrative tasks. For more information, see [Grant permissions to Service Catalog administrators](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-iamadmin.html). 

## Create EMR cluster templates
<a name="emr-studio-cluster-template-instructions"></a>

**To create EMR cluster templates using Service Catalog**

1. Create one or more CloudFormation templates. Where you store your templates is up to you. Since templates are formatted text files, you can upload them to Amazon S3 or keep them in your local file system. To learn more about CloudFormation templates, see [Templates](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-concepts.html#w2ab1b5c15b7) in the *AWS CloudFormation User Guide*.

   Use the following rules to name your templates, or check your names against the pattern `[a-zA-Z0-9][a-zA-Z0-9._-]*`.
   + Template names must start with a letter or a number.
   + Template names can only consist of letters, numbers, periods (.), underscores (\_), and hyphens (-).

   Each cluster template that you create must include the following options: 

   **Input parameters**
   + `ClusterName` – A name for the cluster to help users identify it after it has been provisioned.

   **Output**
   + `ClusterId` – The ID of the newly-provisioned EMR cluster.

   Following is an example CloudFormation template in YAML format for a cluster with two nodes. The example template includes the required template options and defines additional input parameters for `EmrRelease` and `ClusterInstanceType`.

   ```
   AWSTemplateFormatVersion: "2010-09-09"
   
   Parameters:
     ClusterName:
       Type: "String"
       Default: "Example_Two_Node_Cluster"
     EmrRelease:
       Type: "String"
       Default: "emr-6.2.0"
       AllowedValues:
       - "emr-6.2.0"
       - "emr-5.32.0"
     ClusterInstanceType:
       Type: "String"
       Default: "m5.xlarge"
       AllowedValues:
       - "m5.xlarge"
       - "m5.2xlarge"
   
   Resources:
     EmrCluster:
       Type: AWS::EMR::Cluster
       Properties:
         Applications:
         - Name: Spark
         - Name: Livy
         - Name: JupyterEnterpriseGateway
         - Name: Hive
         EbsRootVolumeSize: '10'
         Name: !Ref ClusterName
         JobFlowRole: EMR_EC2_DefaultRole
         ServiceRole: EMR_DefaultRole_V2
         ReleaseLabel: !Ref EmrRelease
         VisibleToAllUsers: true
         LogUri: 
           Fn::Sub: 's3://aws-logs-${AWS::AccountId}-${AWS::Region}/elasticmapreduce/'
         Instances:
           TerminationProtected: false
           Ec2SubnetId: 'subnet-ab12345c'
           MasterInstanceGroup:
             InstanceCount: 1
             InstanceType: !Ref ClusterInstanceType
           CoreInstanceGroup:
             InstanceCount: 1
             InstanceType: !Ref ClusterInstanceType
             Market: ON_DEMAND
             Name: Core
   
   Outputs:
     ClusterId:
       Value:
         Ref: EmrCluster
       Description: The ID of the EMR cluster
   ```

1. Create a portfolio for your cluster templates in the same AWS account as your Studio. 

   1. Open the AWS Service Catalog console at [https://console.aws.amazon.com/servicecatalog/](https://console.aws.amazon.com/servicecatalog/).

   1. Choose **Portfolios** in the left navigation menu, then choose **Create portfolio**.

   1. Enter the requested information on the **Create portfolio** page.

   1. Choose **Create**. AWS Service Catalog creates the portfolio and displays the portfolio details.

1. Use the following steps to add your cluster templates as AWS Service Catalog products.

   1. Navigate to the **Products** page under **Administration** in the AWS Service Catalog management console.

   1. Choose **Upload new product**.

   1. Enter a **Product name** and **Owner**.

   1. Specify your template file under **Version details**. 

   1. Choose **Review** to review your product settings, then choose **Create product**.

1. Complete the following steps to add your products to your portfolio.

   1. Navigate to the **Products** page in the AWS Service Catalog management console.

   1. Choose your product, choose **Actions**, then choose **Add product to portfolio**.

   1. Choose your portfolio, then choose **Add product to portfolio**.

1. Create a launch constraint for your products. A launch constraint is an IAM role that specifies user permissions for launching a product. You can tailor your launch constraints, but must allow permissions to use CloudFormation, Amazon EMR, and AWS Service Catalog. For more information and instructions, see [Service Catalog launch constraints](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/constraints-launch.html).

1. Apply your launch constraint to each product in your portfolio. You must apply the launch constraint to each product individually.

   1. Select your portfolio from the **Portfolios** page in the AWS Service Catalog management console.

   1. Choose the **Constraints** tab and choose **Create constraint**.

   1. Choose your product and choose **Launch** under **Constraint type**. Choose **Continue**.

   1. Select your launch constraint role in the **Launch constraint** section, then choose **Create**.

1. Grant access to your portfolio.

   1. Select your portfolio from the **Portfolios** page in the AWS Service Catalog management console.

   1. Expand the **Groups, roles, and users** tab and choose **Add groups, roles, users**.

   1. Search for your EMR Studio IAM role in the **Roles** tab, select your role, and choose **Add access**.  
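
The naming rules and required options from step 1 can be checked before you upload a template to AWS Service Catalog. The following Python sketch is one way to do that, under stated assumptions: the template has already been parsed into a dictionary (parsing YAML is out of scope here), and the helper names are hypothetical. Only the name pattern, the `ClusterName` input parameter, and the `ClusterId` output come from the procedure above.

```python
import re

# Pattern quoted in step 1 for valid template names.
TEMPLATE_NAME_PATTERN = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._-]*")


def is_valid_template_name(name):
    """True if the whole name matches the naming pattern."""
    return TEMPLATE_NAME_PATTERN.fullmatch(name) is not None


def missing_required_options(template):
    """List the required options that a parsed template (a dict) lacks."""
    missing = []
    if "ClusterName" not in template.get("Parameters", {}):
        missing.append("Parameters.ClusterName")
    if "ClusterId" not in template.get("Outputs", {}):
        missing.append("Outputs.ClusterId")
    return missing


# A trimmed, already-parsed stand-in for the YAML example in step 1.
example_template = {
    "Parameters": {"ClusterName": {"Type": "String"}},
    "Outputs": {"ClusterId": {"Value": {"Ref": "EmrCluster"}}},
}

print(is_valid_template_name("two-node_cluster.v1"))
print(is_valid_template_name("_starts-with-underscore"))
print(missing_required_options(example_template))
```

The sketch uses `fullmatch` rather than `match` so that a name with a valid prefix but an invalid character later in the string is still rejected.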

# Establish access and permissions for Git-based repositories
<a name="emr-studio-enable-git"></a>

EMR Studio supports the following Git-based services:
+ [AWS CodeCommit](https://aws.amazon.com/codecommit)
+ [GitHub](https://github.com)
+ [Bitbucket](https://bitbucket.org/)
+ [GitLab](https://about.gitlab.com/)

To let EMR Studio users associate a Git repository with a Workspace, set up the following access and permissions requirements. You can also configure Git-based repositories that you host in a private network by following the instructions in [Configure a privately hosted Git repository for EMR Studio](#emr-studio-private-git-repo).

**Cluster internet access**  
Both Amazon EMR clusters running on Amazon EC2 and Amazon EMR on EKS clusters attached to Studio Workspaces must be in a private subnet that uses a network address translation (NAT) gateway, or they must be able to access the internet through a virtual private gateway. For more information, see [Amazon VPC options when you launch a cluster](emr-clusters-in-a-vpc.md).  
The security groups that you use with EMR Studio must also include an outbound rule that allows Workspaces to route traffic to the internet from an attached EMR cluster. For more information, see [Define security groups to control EMR Studio network traffic](emr-studio-security-groups.md).  
If the network interface is in a public subnet, it won't be able to communicate with the internet through an internet gateway (IGW).

**Permissions for AWS Secrets Manager**  
To let EMR Studio users access Git repositories with secrets stored in AWS Secrets Manager, add a permissions policy to the [service role for EMR Studio](emr-studio-service-role.md) that allows the `secretsmanager:GetSecretValue` operation.
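
As an illustration, the following Python sketch builds a permissions policy document that allows `secretsmanager:GetSecretValue`. The statement ID, the function name, and the secret ARN are hypothetical placeholders. Scoping `Resource` to specific secrets is a design choice made in this sketch, not a requirement stated above.

```python
import json


def git_secret_policy(secret_arns):
    """Build an IAM policy document that allows reading the given secrets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadingGitSecrets",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": list(secret_arns),
            }
        ],
    }


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-git-secret-AbCdEf"
policy = git_secret_policy([example_arn])
print(json.dumps(policy, indent=2))
```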

For information about how to link Git-based repositories to Workspaces, see [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md).

## Configure a privately hosted Git repository for EMR Studio
<a name="emr-studio-private-git-repo"></a>

Use the following instructions to configure privately hosted repositories for Amazon EMR Studio. Provide a configuration file with information about your DNS and Git servers. EMR Studio uses this information to configure Workspaces that can route traffic to your self-managed repositories.

**Note**  
If you configure `DnsServerIpV4`, EMR Studio uses your DNS server to resolve both your `GitServerDnsName` and AWS endpoints. We strongly recommend that you avoid resolving AWS endpoints with your `DnsServerIpV4`, because doing so can disrupt essential service functionality.

**Prerequisites**

Before you configure a privately hosted Git repository for EMR Studio, you need an Amazon S3 storage location where EMR Studio can back up the Workspaces and notebook files in the Studio. Use the same S3 bucket that you specify when you create a Studio.

**To configure one or more privately hosted Git repositories for EMR Studio**

1. Create a configuration file using the following template. Include the following values for each Git server that you want to specify in your configuration:
   + **`DnsServerIpV4`** - The IPv4 address of your DNS server. If you provide values for both `DnsServerIpV4` and `GitServerIpV4List`, the value for `DnsServerIpV4` takes precedence and EMR Studio uses `DnsServerIpV4` to resolve your `GitServerDnsName`.
     **Note**  
     To use privately hosted Git repositories, your DNS server must allow inbound access from EMR Studio. We urge you to secure your DNS server against other, unauthorized access.
   + **`GitServerDnsName`** - The DNS name of your Git server. For example, `"git.example.com"`.
   + **`GitServerIpV4List`** - A list of IPv4 addresses that belong to your Git servers.

   ```
   [
       {
           "Type": "PrivatelyHostedGitConfig",
           "Value": [
               {
                   "DnsServerIpV4": "<10.24.34.xxx>",
                   "GitServerDnsName": "<enterprise.git.com>",
                   "GitServerIpV4List": [
                       "<xxx.xxx.xxx.xxx>",
                       "<xxx.xxx.xxx.xxx>"
                   ]
               },
               {
                   "DnsServerIpV4": "<10.24.34.xxx>",
                   "GitServerDnsName": "<git.example.com>",
                   "GitServerIpV4List": [
                       "<xxx.xxx.xxx.xxx>",
                       "<xxx.xxx.xxx.xxx>"
                   ]
               }
           ]
       }
   ]
   ```

1. Save your configuration file as `configuration.json`.

1. Upload the configuration file into your default Amazon S3 storage location in a folder called `life-cycle-configuration`. For example, if your default S3 location is `s3://amzn-s3-demo-bucket/workspace`, your configuration file would be in `s3://amzn-s3-demo-bucket/workspace/life-cycle-configuration/configuration.json`.
   **Important**  
   We urge you to restrict access to your `life-cycle-configuration` folder to Studio administrators and to your EMR Studio service role, and to secure `configuration.json` against unauthorized access. For instructions, see [Controlling access to a bucket with user policies](https://docs.aws.amazon.com/AmazonS3/latest/userguide/walkthrough1.html) or [Security Best Practices for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).

   For upload instructions, see [Creating a folder](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html#create-folder) and [Uploading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) in the *Amazon Simple Storage Service User Guide*. To apply your configuration to an existing Workspace, close and restart the Workspace after you upload your configuration file to Amazon S3.
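
The configuration file from step 1 and its upload location from step 3 can also be generated with a short script. The following Python sketch is a minimal example that uses only the standard library. The function names and IP addresses are hypothetical, and the IPv4 validation is an extra safeguard added here rather than something EMR Studio requires.

```python
import ipaddress
import json


def git_server_entry(dns_server_ip, git_dns_name, git_server_ips):
    """Build one entry for the "Value" list, rejecting malformed IPv4 addresses."""
    ipaddress.IPv4Address(dns_server_ip)  # raises ValueError if not valid IPv4
    for ip in git_server_ips:
        ipaddress.IPv4Address(ip)
    return {
        "DnsServerIpV4": dns_server_ip,
        "GitServerDnsName": git_dns_name,
        "GitServerIpV4List": list(git_server_ips),
    }


def build_configuration(entries):
    """Wrap the entries in the structure shown in the template in step 1."""
    return [{"Type": "PrivatelyHostedGitConfig", "Value": list(entries)}]


def configuration_s3_uri(default_s3_location):
    """Return the S3 URI where the file is uploaded in step 3."""
    return default_s3_location.rstrip("/") + "/life-cycle-configuration/configuration.json"


# Hypothetical private addresses, for illustration only.
config = build_configuration(
    [git_server_entry("10.24.34.10", "git.example.com", ["10.24.35.20", "10.24.35.21"])]
)
print(json.dumps(config, indent=4))
print(configuration_s3_uri("s3://amzn-s3-demo-bucket/workspace"))
```

Write the printed JSON to `configuration.json` and upload it to the printed location.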

# Optimize Spark jobs in EMR Studio
<a name="emr-studio-spark-optimization"></a>

When running a Spark job using EMR Studio, there are a few steps you can take to help ensure that you're optimizing your Amazon EMR cluster resources.

## Prolong your Livy session
<a name="optimize-spark-livy-timeout"></a>

If you use Apache Livy along with Spark on your Amazon EMR cluster, we recommend that you increase your Livy session timeout by doing one of the following:
+ When you create an Amazon EMR cluster, set this configuration classification in the **Enter Configuration** field.

  ```
  [
      {
          "Classification": "livy-conf",
          "Properties": {
            "livy.server.session.timeout": "8h"
          }
      }
  ]
  ```
+ For an already-running EMR cluster, connect to your cluster using `ssh` and set the `livy-conf` configuration classification in `/etc/livy/conf/livy.conf`.

  ```
  [
      {
          "Classification": "livy-conf",
          "Properties": {
            "livy.server.session.timeout": "8h"
          }
      }
  ]
  ```

  You may need to restart Livy after changing the configuration.
+ If you don't want your Livy session to time out at all, set the property `livy.server.session.timeout-check` to `false` in `/etc/livy/conf/livy.conf`.
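
If you build cluster configurations in code rather than typing them into the console, the `livy-conf` classification shown above can be generated programmatically. The following Python sketch is illustrative, and the function name is hypothetical. It produces the same structure as the JSON in the first option, which you can serialize and paste into the **Enter Configuration** field.

```python
import json


def livy_timeout_configuration(timeout="8h"):
    """Return the livy-conf classification that sets the Livy session timeout."""
    return [
        {
            "Classification": "livy-conf",
            "Properties": {"livy.server.session.timeout": timeout},
        }
    ]


configurations = livy_timeout_configuration("8h")
print(json.dumps(configurations, indent=4))
```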

## Run Spark in cluster mode
<a name="optimize-spark-cluster-mode"></a>

In cluster mode, the Spark driver runs on a core node instead of on the primary node, improving resource utilization on the primary node. 

To run your Spark application in cluster mode instead of the default client mode, choose **Cluster** mode when you set **Deploy mode** while configuring your Spark step in your new Amazon EMR cluster. For more information, see [Cluster mode overview](https://spark.apache.org/docs/latest/cluster-overview.html) in the Apache Spark documentation.

## Increase Spark driver memory
<a name="optimize-spark-memory"></a>

To increase the Spark driver memory, configure your Spark session using the `%%configure` magic command in your EMR notebook, as in the following example.

```
%%configure -f
{"driverMemory": "6000M"}
```
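
If you generate notebooks from scripts, you can render the same cell text from a dictionary of session settings. The following Python sketch is a small, hypothetical helper and not part of EMR Studio. It only formats the magic command and its JSON body, and the only setting used is `driverMemory` from the example above.

```python
import json


def configure_cell(session_settings, force=True):
    """Render the text of a %%configure cell from a dict of session settings."""
    magic = "%%configure -f" if force else "%%configure"
    return magic + "\n" + json.dumps(session_settings)


print(configure_cell({"driverMemory": "6000M"}))
```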

# Use an Amazon EMR Studio
<a name="use-an-emr-studio"></a>

This section contains topics that help you configure and interact with an Amazon EMR Studio.

The following video covers practical information such as how to create a new Workspace, and how to launch a new Amazon EMR cluster with a cluster template. The video also runs through a sample notebook.

[![AWS Videos](http://img.youtube.com/vi/rZ3zeJ6WKPY/0.jpg)](http://www.youtube.com/watch?v=rZ3zeJ6WKPY)


**Topics**
+ [Learn EMR Studio workspaces](emr-studio-configure-workspace.md)
+ [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md)
+ [Run an EMR Studio Workspace with a runtime role](emr-studio-runtime.md)
+ [Run Amazon EMR Studio Workspace notebooks programmatically](emr-studio-run-programmatically.md)
+ [Browse data with SQL Explorer for EMR Studio](emr-studio-sql-explorer.md)
+ [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md)
+ [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md)
+ [Use the Amazon Athena SQL editor in EMR Studio](emr-studio-athena.md)
+ [Amazon CodeWhisperer integration with EMR Studio Workspaces](emr-studio-codewhisperer.md)
+ [Debug applications and jobs with EMR Studio](emr-studio-debug.md)
+ [Install kernels and libraries in an EMR Studio Workspace](emr-studio-install-libraries-and-kernels.md)
+ [Enhance kernels with magic commands in EMR Studio](emr-studio-magics.md)
+ [Use multi-language notebooks with Spark kernels](emr-multi-language-kernels.md)

# Learn EMR Studio workspaces
<a name="emr-studio-configure-workspace"></a>

When you use an EMR Studio, you can create and configure different *Workspaces* to organize and run notebooks. This section covers creating and working with Workspaces. For a conceptual overview, see [Workspaces](how-emr-studio-works.md#emr-studio-workspaces) on the [How Amazon EMR Studio works](how-emr-studio-works.md) page.

**Topics**
+ [Create an EMR Studio Workspace](emr-studio-create-workspace.md)
+ [Launch a Workspace in EMR Studio](emr-studio-use-workspace.md)
+ [Understand the Workspace user interface in EMR Studio](emr-studio-workspace-ui.md)
+ [Explore notebook examples in an EMR Studio workspace](emr-studio-notebook-examples.md)
+ [Save Workspace content in EMR Studio](emr-studio-save-workspace.md)
+ [Delete a Workspace and notebook files in EMR Studio](emr-studio-delete-workspace.md)
+ [Understand Workspace status](emr-studio-workspace-status.md)
+ [Resolve Workspace connectivity issues](emr-studio-workspace-stop-start.md)

# Create an EMR Studio Workspace
<a name="emr-studio-create-workspace"></a>

You can create EMR Studio Workspaces to run notebook code using the EMR Studio interface. 

**To create a Workspace in an EMR Studio**

1. Log in to your EMR Studio.

1. Choose **Create a Workspace**.

1. Enter a **Workspace name** and a **Description**. Naming a Workspace helps you identify it on the **Workspaces** page.

1. If you want to work with other Studio users in this Workspace in real time, enable Workspace collaboration. You can configure collaborators after you launch the Workspace.

1. If you want to attach a cluster to a Workspace, expand the **Advanced configuration** section. You can attach a cluster later, if you prefer. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).
   **Note**  
   To provision a new cluster, you need access permissions from your administrator.

   Choose one of the cluster options for the Workspace and attach the cluster. For more information about provisioning a cluster when you create a Workspace, see [Create and attach a new EMR cluster to an EMR Studio Workspace](emr-studio-create-use-clusters.md#emr-studio-create-cluster).

1. Choose **Create a Workspace** in the lower right of the page. 

After you create a Workspace, EMR Studio opens the **Workspaces** page. A green success banner appears at the top of the page, and the newly created Workspace appears in the list.

By default, a Workspace is shared and can be seen by all Studio users. However, only one user can open and work in a Workspace at a time. To work simultaneously with other users, you can [Configure Workspace collaboration in EMR Studio](emr-studio-workspace-collaboration.md).

# Launch a Workspace in EMR Studio
<a name="emr-studio-use-workspace"></a>

To start working with notebook files, launch a Workspace to access the notebook editor. The **Workspaces** page in a Studio lists all of the Workspaces that you have access to with details including **Name**, **Status**, **Creation time**, and **Last modified**. 

**Note**  
If you had EMR notebooks in the old Amazon EMR console, you can find them in the console as EMR Studio Workspaces. EMR Notebooks users need additional IAM role permissions to access or create Workspaces. If you recently created a notebook in the old console, you might need to refresh the Workspaces list to see it in the console. For more information about the transition, see [Amazon EMR Notebooks are available as Amazon EMR Studio Workspaces in the console](emr-managed-notebooks-migration.md) and [Managing Amazon EMR clusters with the console](whats-new-in-console.md).

**To launch a Workspace for editing and running notebooks**

1. On the **Workspaces** page of your Studio, find the Workspace. You can filter the list by keyword or by column value.

1. Choose the Workspace name to launch the Workspace in a new browser tab. It may take a few minutes for the Workspace to open if it's **Idle**. Alternatively, select the row for the Workspace and then select **Launch Workspace**. You can choose from the following launch options:
   + **Quick launch** – Quickly launch your Workspace with default options. Choose **Quick launch** if you want to attach clusters to the Workspace in JupyterLab.
   + **Launch with options** – Launch your Workspace with custom options. You can choose to launch in either Jupyter or JupyterLab, attach your Workspace to an EMR cluster, and select your security groups.
   **Note**  
   Only one user can open and work in a Workspace at a time. If you select a Workspace that is already in use, EMR Studio displays a notification when you try to open it. The **User** column on the **Workspaces** page shows the user working in the Workspace.

# Understand the Workspace user interface in EMR Studio
<a name="emr-studio-workspace-ui"></a>

The EMR Studio Workspace user interface is based on the [JupyterLab interface](https://jupyterlab.readthedocs.io/en/latest/user/interface.html) with icon-denoted tabs on the left sidebar. When you pause over an icon, you can see a tooltip that shows the name of the tab. Choose tabs from the left sidebar to access the following panels.
+ **File Browser** – Displays the files and directories in the Workspace, as well as the files and directories of linked Git repositories.
+ **Running Kernels and Terminals** – Lists all of the kernels and terminals running in the Workspace. For more information, see [Managing kernels and terminals](https://jupyterlab.readthedocs.io/en/latest/user/running.html) in the official JupyterLab documentation.
+ **Git** – Provides a graphical user interface for performing commands in the Git repositories attached to the Workspace. This panel is a JupyterLab extension called jupyterlab-git. For more information, see [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).
+ **EMR clusters** – Lets you attach a cluster to or detach a cluster from the Workspace to run notebook code. The EMR cluster configuration panel also provides advanced configuration options to help you create and attach a *new* cluster to the Workspace. For more information, see [Create and attach a new EMR cluster to an EMR Studio Workspace](emr-studio-create-use-clusters.md#emr-studio-create-cluster).
+ **Amazon EMR Git Repository** – Helps you link the Workspace with up to three Git repositories. For details and instructions, see [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md).
+ **Notebook Examples** – Provides a list of notebook examples that you can save to the Workspace. You can also access the examples by choosing **Notebook Examples** on the **Launcher** page of the Workspace. 
+ **Commands** – Offers a keyboard-driven way to search for and run JupyterLab commands. For more information, see the [Command palette](https://jupyterlab.readthedocs.io/en/latest/user/commands.html) page in the JupyterLab documentation.
+ **Notebook Tools** – Lets you select and set options such as cell slide type and metadata. The **Notebook Tools** option appears in the left sidebar after you open a notebook file.
+ **Open Tabs** – Lists the open documents and activities in the main work area so that you can jump to an open tab. For more information, see the [Tabs and single-document mode](https://jupyterlab.readthedocs.io/en/latest/user/interface.html#tabs-and-single-document-mode) page in the JupyterLab documentation.
+ **Collaboration** – Lets you enable or disable Workspace collaboration, and manage collaborators. To see the **Collaboration** panel, you must have the necessary permissions. For more information, see [Set ownership for Workspace collaboration](emr-studio-user-permissions.md#emr-studio-workspace-collaboration-permissions).

# Explore notebook examples in an EMR Studio workspace
<a name="emr-studio-notebook-examples"></a>

Every EMR Studio Workspace includes a set of notebook examples that you can use to explore EMR Studio features. To edit or run a notebook example, you can save it to the Workspace.

**To save a notebook example to a Workspace**

1. From the left sidebar, choose the **Notebook Examples** tab to open the **Notebook Examples** panel. You can also access the examples by choosing **Notebook Examples** on the **Launcher** page of the Workspace. 

1. Choose a notebook example to preview it in the main work area. The example is read-only.

1. To save the notebook example to the Workspace, choose **Save to Workspace**. EMR Studio saves the example in your home directory. After you save a notebook example to the Workspace, you can rename, edit, and run it.

For more information about the notebook examples, see the [EMR Studio Notebook examples GitHub repository](https://github.com/aws-samples/emr-studio-notebook-examples).

# Save Workspace content in EMR Studio
<a name="emr-studio-save-workspace"></a>

When you work in the notebook editor of a Workspace, EMR Studio saves the content of notebook cells and output for you in the Amazon S3 location associated with the Studio. This backup process preserves work between sessions. 

You can also save a notebook by pressing **CTRL+S** in the open notebook tab or by using one of the save options under **File**.

Another way to back up the notebook files in a Workspace is to associate the Workspace with a Git-based repository and sync your changes with the remote repository. Doing so also lets you save and share notebooks with team members who use a different Workspace or Studio. For instructions, see [Link Git-based repositories to an EMR Studio Workspace](emr-studio-git-repo.md).

# Delete a Workspace and notebook files in EMR Studio
<a name="emr-studio-delete-workspace"></a>

When you delete a notebook file from an EMR Studio Workspace, you delete the file from the **File browser**, and EMR Studio removes its backup copy in Amazon S3. You do not have to take any further steps to avoid storage charges when you delete a file from a Workspace.

When you delete *an entire Workspace*, its notebook files and folders will remain in the Amazon S3 storage location. The files continue to accrue storage charges. To avoid storage charges, remove all backed-up files and folders that are associated with your deleted Workspace from Amazon S3.

**To delete a notebook file from an EMR Studio Workspace**

1. Select the **File browser** panel from the left sidebar in the Workspace.

1. Select the file or folder you want to delete. Right-click your selection and choose **Delete**. The file disappears from the list. EMR Studio removes the file or folder from Amazon S3 for you.

------
#### [ From the Workspace UI ]

**Delete a Workspace and its associated backup files from EMR Studio**

1. Log in to your EMR Studio with your Studio access URL and choose **Workspaces** from the left navigation.

1. Find your Workspace in the list, then select the check box next to its name. You can select multiple Workspaces to delete at the same time.

1. Choose **Delete** in the upper right of the **Workspaces** list, then choose **Delete** again to confirm that you want to delete the selected Workspaces.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------
#### [ From the Workspaces list ]

**Delete a Workspace and its associated backup files from the Workspaces list**

1. Navigate to the **Workspaces** list in the console.

1. Select the Workspace that you want to delete from the list and then choose **Actions**.

1. Choose **Delete**.

1. If you want to remove the notebook files that were associated with the deleted Workspace from Amazon S3, follow the instructions for [Deleting objects](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delete-objects.html) in the *Amazon Simple Storage Service Console User Guide*. If you did not create the Studio, consult your Studio administrator to determine the Amazon S3 backup location for the deleted Workspace.

------
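
If you prefer to script the Amazon S3 cleanup described in the last step, the following is a minimal sketch rather than an official procedure. It assumes that you already know the backup bucket and prefix for the deleted Workspace, and it takes the S3 client as a parameter. The function names and the `dry_run` flag are hypothetical.

```python
from typing import Any, Iterator, List


def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def delete_prefix(s3_client: Any, bucket: str, prefix: str,
                  dry_run: bool = True) -> List[str]:
    """Return the keys under the prefix, deleting them unless dry_run is set.

    With dry_run=True (the default) nothing is deleted, so you can review
    the matching keys first.
    """
    if not prefix:
        raise ValueError("refusing to operate on an empty prefix")
    keys = list(iter_keys(s3_client, bucket, prefix))
    if not dry_run:
        # DeleteObjects accepts at most 1,000 keys per request.
        for start in range(0, len(keys), 1000):
            batch = [{"Key": key} for key in keys[start:start + 1000]]
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
    return keys
```

With boto3 installed and credentials configured, you could pass `boto3.client("s3")` as `s3_client`. Run the function with `dry_run=True` first and confirm that the returned keys belong only to the deleted Workspace.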

# Understand Workspace status
<a name="emr-studio-workspace-status"></a>

After you create an EMR Studio Workspace, it appears as a row in the **Workspaces** list in your Studio with its name, status, creation time, and last modified timestamp. The following table describes Workspace statuses.



| Status | Description | 
| --- | --- | 
| Starting | The Workspace is being prepared, but is not yet ready to use. You can't open a Workspace when its status is Starting. | 
| Ready | You can open the Workspace to use the notebook editor, but you must attach the Workspace to an EMR cluster before you can run notebook code. | 
| Attaching | The Workspace is being attached to a cluster. | 
| Attached | The Workspace is attached to an EMR cluster and ready for you to write and run notebook code. If a Workspace's status is not Attached, you must attach it to a cluster before you can run notebook code. | 
| Idle | The Workspace has stopped. To reactivate an idle Workspace, select it from the Workspaces list. The status changes from Idle to Starting to Ready when you select the Workspace. | 
| Stopping | The Workspace is shutting down and will be set to Idle. When you stop a Workspace, EMR Studio terminates any corresponding notebook kernels. EMR Studio also stops notebooks that have been inactive for a long time. | 
| Deleting | When you delete a Workspace, EMR Studio marks it for deletion and starts the deletion process. After the deletion process completes, the Workspace disappears from the list. When you delete a Workspace, its notebook files will remain in the Amazon S3 storage location. | 
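
As an illustration of the table above, the following sketch encodes the statuses and the action that each one implies. It is only a reading aid. EMR Studio doesn't provide this enum, and the names `WorkspaceStatus`, `can_run_notebook_code`, and `next_step` are hypothetical.

```python
from enum import Enum


class WorkspaceStatus(Enum):
    STARTING = "Starting"
    READY = "Ready"
    ATTACHING = "Attaching"
    ATTACHED = "Attached"
    IDLE = "Idle"
    STOPPING = "Stopping"
    DELETING = "Deleting"


# What the table says you can do (or must wait for) in each status.
NEXT_STEP = {
    WorkspaceStatus.STARTING: "Wait until the status is Ready before you open the Workspace.",
    WorkspaceStatus.READY: "Open the Workspace and attach it to a cluster to run code.",
    WorkspaceStatus.ATTACHING: "Wait for the Workspace to finish attaching to the cluster.",
    WorkspaceStatus.ATTACHED: "Write and run notebook code.",
    WorkspaceStatus.IDLE: "Select the Workspace from the list to restart it.",
    WorkspaceStatus.STOPPING: "Wait for the status to change to Idle.",
    WorkspaceStatus.DELETING: "No action. Notebook files remain in Amazon S3.",
}


def can_run_notebook_code(status: WorkspaceStatus) -> bool:
    """Only an Attached Workspace can run notebook code."""
    return status is WorkspaceStatus.ATTACHED


def next_step(status_text: str) -> str:
    """Look up the suggested action for a status string from the list."""
    return NEXT_STEP[WorkspaceStatus(status_text)]
```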

# Resolve Workspace connectivity issues
<a name="emr-studio-workspace-stop-start"></a>

To resolve Workspace connectivity issues, you can stop and restart a Workspace. When you restart a Workspace, EMR Studio launches the Workspace in a different Availability Zone or a different subnet that is associated with your Studio.

**To stop and restart an EMR Studio Workspace**

1. Close the Workspace in your browser.

1. Navigate to the **Workspaces** list in the console.

1. Select your Workspace from the list and choose **Actions**.

1. Choose **Stop** and wait for the Workspace status to change from **Stopping** to **Idle**.

1. Choose **Actions** again, and then choose **Start** to restart the Workspace.

1. Wait for the Workspace status to change from **Starting** to **Ready**, then choose the Workspace name to reopen it in a new browser tab.

# Configure Workspace collaboration in EMR Studio
<a name="emr-studio-workspace-collaboration"></a>

Workspace collaboration lets you write and run notebook code simultaneously with other members of your team. When you work in the same notebook file, you'll see changes as your collaborators make them. You can enable collaboration when you create a Workspace, or switch collaboration on and off in an existing Workspace. 

**Note**  
EMR Studio Workspace collaboration isn't supported with [EMR Serverless interactive applications](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html) or if trusted identity propagation is enabled.

**Prerequisites**

Before you configure collaboration for a Workspace, make sure you complete the following tasks:
+ Ensure that your EMR Studio admin has given you the necessary permissions. For example, the following statement allows a user to configure collaboration for any Workspace with the tag key `creatorUserId` whose value matches the user's ID (indicated by the policy variable `aws:userId`).

  ```
  {
      "Sid": "UserRolePermissionsForCollaboration",
      "Action": [
          "elasticmapreduce:UpdateEditor",
          "elasticmapreduce:PutWorkspaceAccess",
          "elasticmapreduce:DeleteWorkspaceAccess",
          "elasticmapreduce:ListWorkspaceAccessIdentities"
      ],
      "Resource": "*",
      "Effect": "Allow",
      "Condition": {
          "StringEquals": {
              "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userid}"
          }
      }
  }
  ```
+ Ensure that the service role associated with your EMR Studio has the permissions required to enable and configure Workspace collaboration, as in the following example statement.

  ```
  {
      "Sid": "AllowWorkspaceCollaboration",
      "Effect": "Allow",
      "Action": [
          "iam:GetUser",
          "iam:GetRole",
          "iam:ListUsers",
          "iam:ListRoles",
          "sso:GetManagedApplicationInstance",
          "sso-directory:SearchUsers"
      ],
      "Resource": "*"
  }
  ```

  For more information, see [Create an EMR Studio service role](emr-studio-service-role.md).
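
The statements above are fragments. An IAM policy document also needs a `Version` element and a `Statement` list. The following sketch shows one way to wrap a statement into a complete document before you attach it. The helper name `build_policy_document` is hypothetical, and its check covers only the `Effect`, `Action`, and `Resource` elements that the examples on this page use.

```python
import json
from typing import Any, Dict

POLICY_VERSION = "2012-10-17"


def build_policy_document(*statements: Dict[str, Any]) -> str:
    """Wrap one or more statements into a complete policy document (JSON)."""
    for statement in statements:
        missing = {"Effect", "Action", "Resource"} - statement.keys()
        if missing:
            raise ValueError(f"statement is missing {sorted(missing)}")
    document = {"Version": POLICY_VERSION, "Statement": list(statements)}
    return json.dumps(document, indent=4)


# The user-role statement from the first prerequisite.
collaboration_statement = {
    "Sid": "UserRolePermissionsForCollaboration",
    "Effect": "Allow",
    "Action": [
        "elasticmapreduce:UpdateEditor",
        "elasticmapreduce:PutWorkspaceAccess",
        "elasticmapreduce:DeleteWorkspaceAccess",
        "elasticmapreduce:ListWorkspaceAccessIdentities",
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userid}"
        }
    },
}

policy_json = build_policy_document(collaboration_statement)
print(policy_json)
```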

**To enable Workspace collaboration and add collaborators**

1. In your Workspace, choose the **Collaboration** icon from the Launcher screen or the bottom of the left panel. 
**Note**  
You won't see the **Collaboration** panel unless your Studio administrator has given you permission to configure collaboration for the Workspace. For more information, see [Set ownership for Workspace collaboration](emr-studio-user-permissions.md#emr-studio-workspace-collaboration-permissions).

1. Make sure the **Allow Workspace collaboration** toggle is in the on position. When you enable collaboration, only you and the collaborators that you add can see the Workspace in the list on the Studio **Workspaces** page.

1. Enter a **Collaborator name**. Your Workspace can have a maximum of five collaborators including yourself. A collaborator can be any user with access to your EMR Studio. If you don't enter a collaborator, the Workspace is a private Workspace that is only accessible to you.

   The following table specifies the applicable collaborator values to enter based on the identity type of the owner.
**Note**  
An owner can only invite collaborators with the same identity type. For example, an IAM user can only add other IAM users, and an IAM Identity Center user can only add other IAM Identity Center users.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-workspace-collaboration.html)

1. Choose **Add**. The collaborator can now see the Workspace on their EMR Studio **Workspaces** page, and launch the Workspace to use it in real time with you.

**Note**  
If you disable Workspace collaboration, the Workspace returns to its shared state and can be seen by all Studio users. In the shared state, only one Studio user can open and work in the Workspace at a time. 
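
The collaborator rules in this procedure (at most five collaborators including the owner, all with the same identity type) can be sketched as a small pre-check. EMR Studio enforces these rules itself. The `StudioUser` class, the identity-type labels, and `validate_collaborators` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

MAX_COLLABORATORS = 5  # the limit includes the Workspace owner


@dataclass(frozen=True)
class StudioUser:
    name: str
    identity_type: str  # hypothetical labels, such as "IAM" or "IDENTITY_CENTER"


def validate_collaborators(owner: StudioUser,
                           collaborators: List[StudioUser]) -> List[str]:
    """Return a list of rule violations; an empty list means the set is valid."""
    problems = []
    if len(collaborators) + 1 > MAX_COLLABORATORS:
        problems.append(
            f"too many collaborators: {len(collaborators) + 1} including the "
            f"owner (maximum {MAX_COLLABORATORS})"
        )
    for user in collaborators:
        if user.identity_type != owner.identity_type:
            problems.append(
                f"{user.name} has identity type {user.identity_type}, "
                f"but the owner has {owner.identity_type}"
            )
    return problems
```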

# Run an EMR Studio Workspace with a runtime role
<a name="emr-studio-runtime"></a>

**Note**  
The runtime role functionality described on this page only applies to Amazon EMR running on Amazon EC2, and doesn't refer to the runtime role functionality in EMR Serverless interactive applications. To learn more about how to use runtime roles in EMR Serverless, see [Job runtime roles](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*.

A *runtime role* is an AWS Identity and Access Management (IAM) role that you can specify when you submit a job or query to an Amazon EMR cluster. The job or query that you submit to your EMR cluster uses the runtime role to access AWS resources, such as objects in Amazon S3.

When you attach an EMR Studio Workspace to an EMR cluster that uses Amazon EMR 6.11 or higher, you can select a runtime role for the job or query that you submit to use when it accesses AWS resources. However, if the EMR cluster doesn't support runtime roles, the EMR cluster won't assume the role when it accesses AWS resources.

Before you can use a runtime role with an Amazon EMR Studio Workspace, an administrator must configure user permissions so that the Studio user can call the `elasticmapreduce:GetClusterSessionCredentials` API on the runtime role. Then, launch a new cluster with a runtime role that you can use with your Amazon EMR Studio Workspace.

**Topics**
+ [Configure user permissions for the runtime role](#emr-studio-runtime-setup-permissions)
+ [Launch a new cluster with a runtime role](#emr-studio-runtime-setup-cluster)
+ [Use the EMR cluster with a runtime role in Workspaces](#emr-studio-runtime-use)
+ [Considerations](#emr-studio-runtime-considerations)

## Configure user permissions for the runtime role
<a name="emr-studio-runtime-setup-permissions"></a>

Configure user permissions so that the Studio user can call the `elasticmapreduce:GetClusterSessionCredentials` API on the runtime role that the user wants to use. You must also complete the steps in [Configure EMR Studio user permissions for Amazon EC2 or Amazon EKS](emr-studio-user-permissions.md) before the user can start using Studio.

**Warning**  
To grant this permission, create a condition based on the `elasticmapreduce:ExecutionRoleArn` context key when you grant a caller access to call the `GetClusterSessionCredentials` API. The following examples demonstrate how to do so. The first example allows the caller to use only the two listed IAM roles as runtime roles.

```
{
    "Sid": "AllowSpecificExecRoleArn",
    "Effect": "Allow",
    "Action": [
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "elasticmapreduce:ExecutionRoleArn": [
                "arn:aws:iam::111122223333:role/test-emr-demo1",
                "arn:aws:iam::111122223333:role/test-emr-demo2"
            ]
        }
    }
}
```

The following example demonstrates how to allow an IAM principal to use an IAM role named `test-emr-demo3` as the runtime role. Additionally, the policy holder will only be able to access Amazon EMR clusters with the cluster ID `j-123456789`.

```
{
    "Sid":"AllowSpecificExecRoleArn",
    "Effect":"Allow",
    "Action":[
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource":[
        "arn:aws:elasticmapreduce:<region>:111122223333:cluster/j-123456789"
    ],
    "Condition":{
        "StringEquals":{
            "elasticmapreduce:ExecutionRoleArn":[
                "arn:aws:iam::111122223333:role/test-emr-demo3"
            ]
        }
    }
}
```

The following example lets an IAM principal use any IAM role with a name starting with the string `test-emr-demo4` as the runtime role. Additionally, the policy holder will only be able to access Amazon EMR clusters tagged with the key-value pair `tagKey: tagValue`.

```
{
    "Sid":"AllowSpecificExecRoleArn",
    "Effect":"Allow",
    "Action":[
        "elasticmapreduce:GetClusterSessionCredentials"
    ],
    "Resource": "*",
    "Condition":{
        "StringEquals":{
            "elasticmapreduce:ResourceTag/tagKey": "tagValue"
        },
        "StringLike":{
            "elasticmapreduce:ExecutionRoleArn":[
                "arn:aws:iam::111122223333:role/test-emr-demo4*"
            ]
        }
    }
}
```
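
The three policies above differ only in the role ARNs, the resource, and an optional tag condition. The following sketch generates the same statement shapes from parameters. It is one possible helper, not an AWS-provided tool, and the function names are hypothetical. `role_matches` approximates `StringLike` with Python's `fnmatch` module, which is close enough for ARNs that contain only the `*` and `?` wildcards.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional, Tuple

ACTION = "elasticmapreduce:GetClusterSessionCredentials"
ROLE_KEY = "elasticmapreduce:ExecutionRoleArn"


def runtime_role_statement(role_arns: List[str],
                           cluster_arn: Optional[str] = None,
                           cluster_tag: Optional[Tuple[str, str]] = None,
                           sid: str = "AllowSpecificExecRoleArn") -> Dict[str, Any]:
    """Build a statement that limits which runtime roles a caller can use."""
    # Use StringLike only when a role ARN contains a wildcard.
    uses_wildcard = any("*" in arn or "?" in arn for arn in role_arns)
    operator = "StringLike" if uses_wildcard else "StringEquals"
    condition: Dict[str, Any] = {operator: {ROLE_KEY: list(role_arns)}}
    if cluster_tag is not None:
        key, value = cluster_tag
        tag_key = f"elasticmapreduce:ResourceTag/{key}"
        condition.setdefault("StringEquals", {})[tag_key] = value
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": [ACTION],
        "Resource": [cluster_arn] if cluster_arn else "*",
        "Condition": condition,
    }


def role_matches(pattern: str, role_arn: str) -> bool:
    """Approximate a StringLike check on a role ARN (case-sensitive)."""
    return fnmatchcase(role_arn, pattern)
```

For example, `role_matches("arn:aws:iam::111122223333:role/test-emr-demo4*", "arn:aws:iam::111122223333:role/test-emr-demo4-analyst")` returns `True`, while the same pattern doesn't match `test-emr-demo3`.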

## Launch a new cluster with a runtime role
<a name="emr-studio-runtime-setup-cluster"></a>

Now that you have the required permissions, launch a new cluster with a runtime role that you can use with your Amazon EMR Studio Workspace.

If you have already launched a new cluster with a runtime role, you can skip to the [Use the EMR cluster with a runtime role in Workspaces](#emr-studio-runtime-use) section.

1. First, complete the prerequisites in the [Runtime roles for Amazon EMR steps](emr-steps-runtime-roles.md#emr-steps-runtime-roles-configure) section.

1. Then, launch a cluster with the following settings to use runtime roles with Amazon EMR Studio Workspaces. For instructions on how to launch your cluster, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).
   + Choose release label emr-6.11.0 or later.
   + Select Spark, Livy, and Jupyter Enterprise Gateway as your cluster applications.
   + Use the security configuration that you created in the previous step.
   + Optionally, you can enable Lake Formation for your EMR cluster. For more information, see [Enable Lake Formation with Amazon EMR](emr-lf-enable.md).

After you launch your cluster, you're ready to [use the runtime role-enabled cluster with an EMR Studio Workspace](#emr-studio-runtime-use).
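
If you launch the cluster with the AWS SDK instead of the console, the settings listed above map to `RunJobFlow` parameters roughly as follows. This is a sketch under stated assumptions: the instance type, core node count, and default role names are placeholders, the helper names are hypothetical, and the release-label parser handles only plain labels such as `emr-6.11.0`.

```python
import re
from typing import Any, Dict, Tuple

REQUIRED_APPLICATIONS = ("Spark", "Livy", "JupyterEnterpriseGateway")
MINIMUM_RELEASE = (6, 11, 0)


def parse_release_label(label: str) -> Tuple[int, ...]:
    """Turn a label such as 'emr-6.11.0' into a comparable tuple (6, 11, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def build_run_job_flow_args(name: str, release_label: str,
                            security_configuration: str, subnet_id: str,
                            instance_type: str = "m5.xlarge",
                            core_count: int = 2) -> Dict[str, Any]:
    """Assemble RunJobFlow arguments for a runtime role-enabled cluster."""
    if parse_release_label(release_label) < MINIMUM_RELEASE:
        raise ValueError("runtime roles in EMR Studio need emr-6.11.0 or later")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in REQUIRED_APPLICATIONS],
        # The security configuration created in the prerequisites step.
        "SecurityConfiguration": security_configuration,
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": instance_type,
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": instance_type,
                 "InstanceCount": core_count},
            ],
            "Ec2SubnetId": subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Placeholder role names; substitute the roles used in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed, you could pass the result to `boto3.client("emr").run_job_flow(**args)`.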

**Note**  
The [ExecutionRoleArn](https://docs.aws.amazon.com/emr/latest/APIReference/API_ExecutionEngineConfig.html#EMR-Type-ExecutionEngineConfig-ExecutionRoleArn) value is currently not supported with the [StartNotebookExecution](https://docs.aws.amazon.com/emr/latest/APIReference/API_StartNotebookExecution.html) API operation when the `ExecutionEngineConfig.Type` value is `EMR`.

## Use the EMR cluster with a runtime role in Workspaces
<a name="emr-studio-runtime-use"></a>

Once you have set up and launched your cluster, you can use the runtime role-enabled cluster with your EMR Studio Workspace.

1. Create a new Workspace or launch an existing Workspace. For more information, see [Create an EMR Studio Workspace](emr-studio-create-workspace.md).

1. Choose the **EMR clusters** tab in the left sidebar of your open Workspace and expand the **Compute type** section. Then choose your cluster from the **EMR cluster on EC2** menu and the runtime role from the **Runtime role** menu.  
![\[The EMR Studio Workspace user interface, based on the JupyterLab interface, with icon-denoted tabs on the left sidebar.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/emr-studio-jupyter-runtime.png)

1. Choose **Attach** to attach the cluster with runtime role to your Workspace.

**Note**  
A runtime role can have underlying managed policies associated with it. In most cases, we recommend choosing a role that is scoped to limited resources, such as specific notebooks. If you choose a runtime role that includes access for all of your notebooks, for instance, the managed policy associated with the role provides full access.

## Considerations
<a name="emr-studio-runtime-considerations"></a>

Keep in mind the following considerations when you use a runtime role-enabled cluster with your Amazon EMR Studio Workspace:
+ You can only select a runtime role when you attach an EMR Studio Workspace to an EMR cluster that uses Amazon EMR release 6.11 or higher.
+ The runtime role functionality described on this page is only supported with Amazon EMR running on Amazon EC2, and isn't supported with EMR Serverless interactive applications. To learn more about runtime roles for EMR Serverless, see [Job runtime roles](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/security-iam-runtime-role.html) in the *Amazon EMR Serverless User Guide*.
+ Although you need to configure additional permissions before you can specify a runtime role when submitting a job to a cluster, you don't need additional permissions to access the files generated by an EMR Studio Workspace. The permissions for such files are the same as files generated from clusters without runtime roles.
+ You can't use SQL Explorer in an EMR Studio Workspace with a cluster that has a runtime role. Amazon EMR disables SQL Explorer in the UI when a Workspace is attached to a runtime role-enabled EMR cluster.
+ You can't use collaboration mode in an EMR Studio Workspace with a cluster that has a runtime role. Amazon EMR disables Workspace collaboration capabilities when a Workspace is attached to a runtime role-enabled EMR cluster. The Workspace will remain accessible only to the user who attached the Workspace.
+ You can't use runtime roles in a Studio with IAM Identity Center trusted identity propagation enabled.
+ You might encounter the warning **"Page may not be safe!"** from the Spark UI for a runtime role-enabled cluster that uses Amazon EMR release 7.4.0 or lower. If this happens, bypass the alert to continue to the Spark UI.

# Run Amazon EMR Studio Workspace notebooks programmatically
<a name="emr-studio-run-programmatically"></a>

**Note**  
Programmatic execution of notebooks isn't supported with Amazon EMR Serverless interactive applications.

You can run your Amazon EMR Studio Workspace notebooks programmatically with a script or with the AWS CLI. To learn how to run your notebook programmatically, see [Sample programmatic commands for EMR Notebooks](emr-managed-notebooks-headless.md).
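
As a rough illustration of the script approach, the following sketch starts a notebook execution through the `StartNotebookExecution` API operation. It assumes that the Workspace ID is passed as `EditorId`, and it takes the EMR client as a parameter. The helper names and all argument values are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_notebook_execution_args(editor_id: str, relative_path: str,
                                  cluster_id: str, service_role: str,
                                  params: Optional[Dict[str, Any]] = None
                                  ) -> Dict[str, Any]:
    """Assemble the arguments for StartNotebookExecution on an EMR cluster."""
    args: Dict[str, Any] = {
        "EditorId": editor_id,            # the Workspace ID
        "RelativePath": relative_path,    # notebook path inside the Workspace
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }
    if params:
        # Notebook parameters are passed as a JSON-encoded string.
        args["NotebookParams"] = json.dumps(params)
    return args


def start_notebook_execution(emr_client: Any, editor_id: str,
                             relative_path: str, cluster_id: str,
                             service_role: str,
                             params: Optional[Dict[str, Any]] = None) -> str:
    """Start the execution and return its ID."""
    response = emr_client.start_notebook_execution(
        **build_notebook_execution_args(editor_id, relative_path, cluster_id,
                                        service_role, params)
    )
    return response["NotebookExecutionId"]
```

With boto3 installed, you could pass `boto3.client("emr")` as `emr_client`, and then check progress with the client's `describe_notebook_execution` method.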

# Browse data with SQL Explorer for EMR Studio
<a name="emr-studio-sql-explorer"></a>

**Note**  
SQL Explorer for EMR Studio isn't supported with Amazon EMR Serverless interactive applications or in a Studio with IAM Identity Center trusted identity propagation enabled. 

This topic provides information to help you get started with SQL Explorer in Amazon EMR Studio. SQL Explorer is a single-page tool in your Workspace that helps you understand the data sources in your EMR cluster's data catalog. You can use SQL Explorer to browse your data, run SQL queries to retrieve data, and download query results.

SQL Explorer supports Presto. Before you use SQL Explorer, make sure you have a cluster that uses Amazon EMR version 5.34.0 or later or version 6.4.0 or later with Presto installed. The Amazon EMR Studio SQL Explorer doesn't support Presto clusters that you've configured with in-transit encryption. This is because Presto runs in TLS mode on these clusters.
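
The release requirement above can be checked in a script before you choose a cluster. The following sketch assumes one reading of the requirement: 5.34.0 or later within the 5.x series, or 6.4.0 or later otherwise. The function names are hypothetical, and the check doesn't verify that Presto is installed or that in-transit encryption is off.

```python
import re
from typing import Tuple


def parse_release(label: str) -> Tuple[int, ...]:
    """Parse 'emr-6.4.0' or '6.4.0' into a comparable tuple (6, 4, 0)."""
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def supports_sql_explorer(release_label: str) -> bool:
    """Return True if the release meets the SQL Explorer version requirement."""
    version = parse_release(release_label)
    if version[0] == 5:
        return version >= (5, 34, 0)
    return version >= (6, 4, 0)
```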

## Browse your cluster's data catalog
<a name="emr-studio-sql-explorer-browse"></a>

SQL Explorer provides a catalog browser interface that you can use to explore and understand how your data is organized. For example, you can use the data catalog browser to verify table and column names before you write a SQL query.

**To browse your data catalog**

1. Open SQL Explorer in your Workspace.

1. Make sure your Workspace is attached to an EMR cluster running on EC2 that uses Amazon EMR version 6.4.0 or later with Presto installed. You can choose an existing cluster, or create a new one. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Select a **Database** from the dropdown list to browse.

1. Expand a table in your database to see the table's column names. You can also enter a keyword in the search bar to filter table results.

## Run a SQL query to retrieve data
<a name="emr-studio-sql-explorer-run-query"></a>

**To retrieve data with a SQL query and download the results**

1. Open SQL Explorer in your Workspace.

1. Make sure your Workspace is attached to an EMR cluster running on EC2 with Presto and Spark installed. You can choose an existing cluster, or create a new one. For more information, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Select **Open editor** to open a new editor tab in your Workspace.

1. Compose your SQL query in the editor tab.

1. Choose **Run**.

1. View your query results under **Result preview**. SQL Explorer displays the first 100 results by default. You can choose a different number of results to display (up to 1000) using the **Preview first 100 query results** dropdown.

1. Choose **Download results** to download your results in CSV format. You can download up to 1000 rows of results.

# Attach a compute to an EMR Studio Workspace
<a name="emr-studio-create-use-clusters"></a>

Amazon EMR Studio runs notebook commands using a kernel on an EMR cluster. Before you can select a kernel, you must attach the Workspace to a cluster that uses Amazon EC2 instances, to an Amazon EMR on EKS cluster, or to an EMR Serverless application. EMR Studio lets you attach Workspaces to new or existing clusters, and gives you the flexibility to change clusters without closing the Workspace.

**Topics**
+ [Attach an Amazon EC2 cluster](#emr-studio-attach-cluster)
+ [Attach an Amazon EMR on EKS cluster](#emr-studio-use-eks-cluster)
+ [Attach an EMR Serverless application](#emr-studio-use-serverless-studio)
+ [Create a cluster](#emr-studio-create-cluster)
+ [Detach a compute](#emr-studio-detach-cluster)

## Attach an Amazon EC2 cluster to an EMR Studio Workspace
<a name="emr-studio-attach-cluster"></a>

You can attach an EMR cluster running on Amazon EC2 to a Workspace when you create the Workspace, or attach a cluster to an existing Workspace. If you want to create and attach a *new* cluster, see [Create and attach a new EMR cluster to an EMR Studio Workspace](#emr-studio-create-cluster).

**Note**  
A Workspace in a Studio that has IAM Identity Center trusted identity propagation enabled can only attach to an EMR cluster with a security configuration that has Identity Center enabled.

------
#### [ On create ]

**Attach to an Amazon EMR compute cluster when you create a Workspace**

1. In the **Create a Workspace** dialog box, make sure you've already selected a subnet for the new Workspace. Expand the **Advanced configuration** section.

1. Choose **Attach Workspace to an EMR cluster**.

1. In the **EMR cluster** dropdown list, select an existing EMR cluster to attach to the Workspace.

After you attach a cluster, finish creating the Workspace. When you open the new Workspace for the first time and choose the **EMR clusters** panel, you should see your selected cluster attached.

------
#### [ On launch ]

**Attach to an Amazon EMR compute cluster when you launch the Workspace**

1. Navigate to the Workspaces list and select the row for the Workspace that you want to launch. Then, select **Launch Workspace** > **Launch with options**.

1. Choose an EMR cluster to attach to your Workspace.

After you attach a cluster, finish launching the Workspace. When the Workspace opens, choose the **EMR clusters** panel. You should see your selected cluster attached.

------
#### [ In JupyterLab ]

**Attach a Workspace to an Amazon EMR compute cluster in JupyterLab**

1. Select your Workspace, then select **Launch Workspace** > **Quick launch**.

1. Inside JupyterLab, open the **Cluster** tab in the left sidebar.

1. Select the **EMR on EC2 cluster** dropdown, or select an Amazon EMR on EKS cluster.

1. Select **Attach** to attach the cluster to your Workspace.

After you attach the cluster, choose the **EMR clusters** panel. You should see your selected cluster attached.

------
#### [ In the Workspace UI ]

**Attach a Workspace to an Amazon EMR compute cluster from the Workspace user interface**

1. In the Workspace that you want to attach to a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Under **Cluster type**, expand the dropdown and select **EMR cluster on EC2**.

1. Choose a cluster from the dropdown list. You might need to detach an existing cluster first to enable the cluster selection dropdown list.

1. Choose **Attach**. When the cluster is attached, you should see a success message appear.

------
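
Before you pick a cluster in the dropdown list, it can help to see which clusters in the account are up. The following sketch lists them with the EMR `ListClusters` API operation. It assumes that only clusters in the `RUNNING` or `WAITING` state are of interest, and it doesn't check whether your Studio can reach a given cluster. The function name is hypothetical.

```python
from typing import Any, Dict, List

# Assumption: only clusters that are up are worth attaching to.
ACTIVE_STATES = ["RUNNING", "WAITING"]


def list_active_clusters(emr_client: Any) -> List[Dict[str, str]]:
    """Return the ID, name, and state of each running or waiting cluster."""
    paginator = emr_client.get_paginator("list_clusters")
    clusters = []
    for page in paginator.paginate(ClusterStates=ACTIVE_STATES):
        for cluster in page.get("Clusters", []):
            clusters.append({
                "id": cluster["Id"],
                "name": cluster["Name"],
                "state": cluster["Status"]["State"],
            })
    return clusters
```

With boto3 installed, you could pass `boto3.client("emr")` as `emr_client`.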

## Attach an Amazon EMR on EKS cluster to an EMR Studio Workspace
<a name="emr-studio-use-eks-cluster"></a>

In addition to using Amazon EMR clusters running on Amazon EC2, you can attach a Workspace to an Amazon EMR on EKS cluster to run notebook code. For more information about Amazon EMR on EKS, see [What is Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html).

Before you can connect a Workspace to an Amazon EMR on EKS cluster, your Studio administrator must grant you access permissions.

**Note**  
You can't launch an Amazon EMR on EKS cluster in an EMR Studio that uses IAM Identity Center trusted identity propagation. 

------
#### [ On create ]

**To attach an Amazon EMR on EKS cluster when you create a Workspace**

1. In the **Create a Workspace** dialog box, expand the **Advanced configuration** section.

1. Choose **Attach Workspace to an Amazon EMR on EKS cluster**.

1. Under **Amazon EMR on EKS cluster**, choose a cluster from the dropdown list.

1. Under **Select an endpoint**, choose a managed endpoint to attach to the Workspace. A managed endpoint is a gateway that lets EMR Studio communicate with your chosen cluster.

1. Choose **Create a Workspace** to finish the Workspace creation process and attach the selected cluster.

After you attach a cluster, you can finish the Workspace creation process. When you open the new Workspace for the first time and choose the **EMR clusters** panel, you should see that your selected cluster is attached.

------
#### [ In the Workspace UI ]

**To attach an Amazon EMR on EKS cluster from the Workspace user interface**

1. In the Workspace that you want to attach to a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Expand the **Cluster type** dropdown and choose **EMR clusters on EKS**.

1. Under **EMR cluster on EKS**, choose a cluster from the dropdown list.

1. Under **Endpoint**, choose a managed endpoint to attach to the Workspace. A managed endpoint is a gateway that lets EMR Studio communicate with your chosen cluster.

1. Choose **Attach**. When the cluster is attached, you should see a success message appear.

------
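
Both procedures ask you to choose a managed endpoint. To see which endpoints exist for a virtual cluster, you can call the Amazon EMR on EKS `ListManagedEndpoints` API operation, as in the following sketch. It takes an `emr-containers` client as a parameter and keeps only endpoints in the `ACTIVE` state, which is an assumption about which endpoints are usable. The function name is hypothetical.

```python
from typing import Any, Dict, List


def list_active_endpoints(emr_containers_client: Any,
                          virtual_cluster_id: str) -> List[Dict[str, str]]:
    """Return the ID and name of each ACTIVE managed endpoint."""
    endpoints = []
    next_token = None
    while True:
        kwargs = {"virtualClusterId": virtual_cluster_id}
        if next_token:
            kwargs["nextToken"] = next_token
        page = emr_containers_client.list_managed_endpoints(**kwargs)
        for endpoint in page.get("endpoints", []):
            if endpoint.get("state") == "ACTIVE":
                endpoints.append({"id": endpoint["id"], "name": endpoint["name"]})
        # Keep requesting pages until the service stops returning a token.
        next_token = page.get("nextToken")
        if not next_token:
            return endpoints
```

With boto3 installed, you could pass `boto3.client("emr-containers")` as the client.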

## Attach an Amazon EMR Serverless application to an EMR Studio Workspace
<a name="emr-studio-use-serverless-studio"></a>

You can attach a Workspace to an EMR Serverless application to run interactive workloads. For more information, see [Using notebooks to run interactive workloads with EMR Serverless through EMR Studio](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html).

**Note**  
You can't attach an EMR Serverless application to an EMR Studio that uses IAM Identity Center trusted identity propagation. 

**Example Attach a Workspace to an EMR Serverless application in JupyterLab**  
Before you can connect a Workspace to an EMR Serverless application, your account administrator must grant you access permissions as described in [Required permissions for interactive workloads](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/interactive-workloads.html#interactive-permissions).  

1. Navigate to EMR Studio, select your Workspace, then select **Launch Workspace** > **Quick launch**.

1. Inside JupyterLab, open the **Cluster** tab in the left sidebar.

1. Select **EMR Serverless** as a compute option, then select an EMR Serverless application and a runtime role.

1. To attach the application to your Workspace, choose **Attach**.

Now when you open this Workspace, you should see your selected application attached.

## Create and attach a new EMR cluster to an EMR Studio Workspace
<a name="emr-studio-create-cluster"></a>

Advanced EMR Studio users can provision new EMR clusters running on Amazon EC2 to use with a Workspace. The new cluster has all of the big data applications that are required for EMR Studio installed by default. 

To create clusters, your Studio administrator must first give you permission using a session policy. For more information, see [Create permissions policies for EMR Studio users](emr-studio-user-permissions.md#emr-studio-permissions-policies).

You can create a new cluster in the **Create a Workspace** dialog box or from the **Cluster** panel in the Workspace UI. Either way, you have two cluster creation options:

1. **Create an EMR cluster** – Create an EMR cluster by choosing the Amazon EC2 instance type and count.

1. **Use a cluster template** – Provision a cluster by selecting a predefined cluster template. This option appears if you have permission to use cluster templates.
**Note**  
If you enabled trusted identity propagation with IAM Identity Center for your Studio, then you must use a template to create a cluster.

**To create an EMR cluster by providing a cluster configuration**

1. Choose a starting point.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-use-clusters.html)

1. Enter a **Cluster name**. Naming the cluster helps you find it later in the EMR Studio Clusters list.

1. For **Amazon EMR release**, choose an Amazon EMR release version for the cluster.

1. For **Instance**, select the type and number of Amazon EC2 instances for the cluster. For more information about selecting instance types, see [Configure Amazon EC2 instance types for use with Amazon EMR](emr-plan-ec2-instances.md). One instance will be used as the primary node.

1. Select a **Subnet** where EMR Studio can launch the new cluster. Each subnet option is preapproved by your Studio administrator, and your Workspace should be able to connect to a cluster in any listed subnet.

1. Choose an **S3 URI for log storage**.

1. Choose **Create EMR cluster** to provision the cluster. If you use the **Create a Workspace** dialog box, choose **Create a Workspace** to create the Workspace and provision the cluster. After EMR Studio provisions the new cluster, it attaches the cluster to the Workspace.

**To create a cluster using a cluster template**

1. Choose a starting point.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-create-use-clusters.html)

1. Select a cluster template from the dropdown list. Each available cluster template includes a brief description to help you make a selection.

1. The cluster template you choose may have additional parameters such as Amazon EMR release version or cluster name. You can choose or insert values, or use the default values that your administrator selected.

1. Select a **Subnet** where EMR Studio can launch the new cluster. Each subnet option is preapproved by your Studio administrator, and your Workspace should be able to connect to a cluster in any subnet.

1. Choose **Use cluster template** to provision the cluster and attach it to the Workspace. It will take a few minutes for EMR Studio to create the cluster. If you use the **Create a Workspace** dialog box, choose **Create a Workspace** to create the Workspace and provision the cluster. After EMR Studio provisions the new cluster, it attaches the cluster to your Workspace.

## Detach a compute from an EMR Studio Workspace
<a name="emr-studio-detach-cluster"></a>

To change the cluster attached to a Workspace, first detach the current cluster from the Workspace UI.

**To detach a cluster from a Workspace**

1. In the Workspace that you want to detach from a cluster, choose the **EMR clusters** icon from the left sidebar to open the **Cluster** panel.

1. Under **Select cluster**, choose **Detach** and wait for EMR Studio to detach the cluster. When the cluster is detached, you will see a success message.

**To detach an EMR Serverless application from an EMR Studio Workspace**

To exchange the compute attached to a Workspace, you can detach the application from the Workspace UI. 

1. In the Workspace that you want to detach from an application, choose the **Amazon EMR compute** icon from the left sidebar to open the **Compute** panel.

1. Under **Select compute**, choose **Detach** and wait for EMR Studio to detach the application. When the application is detached, you will see a success message.

# Link Git-based repositories to an EMR Studio Workspace
<a name="emr-studio-git-repo"></a>

Associate up to three Git-based repositories with an Amazon EMR Studio Workspace to save and share notebook files.

## About Git repositories for EMR Studio
<a name="emr-studio-git-repo-about"></a>

You can associate a maximum of three Git repositories with an EMR Studio Workspace. By default, each Workspace lets you choose from a list of Git repositories that are associated with the same AWS account as the Studio. You can also create a new Git repository as a resource for a Workspace.

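You can run Git commands like the following while connected to the primary node of a cluster. The leading `!` is the IPython shell escape for running the command from a notebook cell. Omit it if you type the command directly in a terminal.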

```
!git pull origin <branch-name>
```

Alternatively, you can use the jupyterlab-git extension. Open it from the left sidebar by choosing the **Git** icon. For information about the jupyterlab-git extension for JupyterLab, see [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git).
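
If you want to run the same pull from Python code instead of a shell escape, the following sketch wraps the command with the standard `subprocess` module. It assumes that `git` is on the `PATH` and that `repo_dir` is an existing clone. The function names are hypothetical.

```python
import subprocess
from typing import List


def git_pull_command(branch: str, remote: str = "origin") -> List[str]:
    """Build the argument list for `git pull <remote> <branch>`."""
    # Reject empty names and names that Git would parse as an option.
    if not branch or branch.startswith("-"):
        raise ValueError(f"invalid branch name: {branch!r}")
    return ["git", "pull", remote, branch]


def git_pull(repo_dir: str, branch: str) -> str:
    """Run the pull inside repo_dir and return Git's standard output."""
    completed = subprocess.run(
        git_pull_command(branch),
        cwd=repo_dir,
        check=True,            # raise CalledProcessError if Git fails
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the command as a list avoids shell quoting issues with unusual branch names.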

## Prerequisites
<a name="emr-studio-git-prereqs"></a>
+ To associate a Git repository with a Workspace, the Studio must be configured to allow Git repository linking. Your Studio administrator should take steps to [Establish access and permissions for Git-based repositories](emr-studio-enable-git.md).
+ If you use a CodeCommit repository, you must use Git credentials and HTTPS. SSH keys and HTTPS with the AWS Command Line Interface credential helper are not supported. CodeCommit also does not support personal access tokens (PATs). For more information, see [Using IAM with CodeCommit](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_ssh-keys.html) in the *IAM User Guide* and [Setup for HTTPS users using Git credentials](https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-gc.html) in the *AWS CodeCommit User Guide*.

## Instructions
<a name="emr-studio-link-git-repo"></a>

**To link an associated Git repository to a Workspace**

1. Open the Workspace that you want to link to a repository from the **Workspaces** list in the Studio.

1. In the left sidebar, choose the **Amazon EMR Git Repository** icon to open the **Git repository** tool panel.

1. Under **Git repositories**, expand the dropdown list and select a maximum of three repositories to link to the Workspace. EMR Studio registers your selection and begins linking each repository. 

It might take some time for the linking process to complete. You can see the status for each repository that you selected in the **Git repository** tool panel. After EMR Studio links a repository to a Workspace, you should see the files that belong to that repository appear in the **File browser** panel.

**To add a new Git repository to a Workspace as a resource**

1. Open the Workspace that you want to link to a repository from the Workspaces list in your Studio.

1. In the left sidebar, choose the **Amazon EMR Git Repository** icon to open the **Git repository** tool panel.

1. Choose **Add new Git repository**.

1. For **Repository name**, enter a descriptive name for the repository in EMR Studio. Names may only contain alphanumeric characters, hyphens, and underscores.

1. For **Git repository URL**, enter the URL for the repository. When you use a CodeCommit repository, this is the URL that is copied when you choose **Clone URL** and then **Clone HTTPS**. For example, `https://git-codecommit.us-west-2.amazonaws.com/v1/repos/[MyCodeCommitRepoName]`.

1. For **Branch**, enter the name of an existing branch that you want to check out.

1. For Git credentials, choose an option according to the following guidelines. EMR Studio accesses your Git credentials using secrets stored in Secrets Manager.
**Note**  
If you use a GitHub repository, we recommend that you use a personal access token (PAT) to authenticate. As of August 13, 2021, GitHub requires token-based authentication and no longer accepts passwords when authenticating Git operations. For more information, see the [Token authentication requirements for Git operations](https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/) post in *The GitHub Blog*.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-git-repo.html)

1. Choose **Add repository** to create the new repository. After EMR Studio creates the new repository, you will see a success message. The new repository appears in the dropdown list under **Git repositories**.

1. To link the new repository to your Workspace, choose it from the dropdown list under **Git repositories**.

It might take some time for the linking process to complete. After EMR Studio links the new repository to the Workspace, you should see a new folder with the same name as your repository appear in the **File browser** panel.

To open a different linked repository, navigate to its folder in the **File browser**. 
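
Before you fill in the **Add new Git repository** form, it can help to check the repository name and URL in advance. The following is a minimal Python sketch of the naming rule and the CodeCommit HTTPS URL format described in the steps above. The helper names `is_valid_repo_name` and `codecommit_https_url` are hypothetical, and the pattern assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_REPO_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_repo_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, hyphens, and underscores."""
    return _REPO_NAME_PATTERN.fullmatch(name) is not None


def codecommit_https_url(region: str, repo_name: str) -> str:
    """Build a CodeCommit HTTPS clone URL in the format shown in the steps above."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


print(is_valid_repo_name("team-notebooks_2024"))   # name without spaces
print(is_valid_repo_name("team notebooks"))        # name that contains a space
print(codecommit_https_url("us-west-2", "MyCodeCommitRepoName"))
```

The URL helper only reproduces the documented pattern. To avoid typos, copy the value from **Clone URL** in the CodeCommit console when you can.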

# Use the Amazon Athena SQL editor in EMR Studio
<a name="emr-studio-athena"></a>

## Overview
<a name="emr-studio-athena-overview"></a>

You can use Amazon EMR Studio to develop and run interactive queries on Amazon Athena. That means that you can perform SQL analytics on Athena from the same EMR Studio interface that you use to run your Spark, Scala, and other workloads. With this integration, you can use auto-completion to develop queries quickly, browse data in your AWS Glue Data Catalog, create saved queries, view your query history, and more.

For more information on using Amazon Athena, see [Using Athena SQL](https://docs.aws.amazon.com/athena/latest/ug/using-athena-sql.html) in the *Amazon Athena User Guide*.

## Use the Athena SQL editor in EMR Studio
<a name="emr-studio-athena-use"></a>

Use the following steps to develop and run interactive queries on Amazon Athena from your EMR Studio:

1. Add the required permissions to the user role for the users who access the Workspaces in this Studio. The permissions are listed in the [AWS Identity and Access Management permissions for EMR Studio users](emr-studio-user-permissions.md#emr-studio-iam-permissions-table) table in the column **Access Amazon Athena SQL editor from your EMR Studio**. Alternatively, you can choose to copy the **Advanced** policy contents from the [Example user policies](emr-studio-user-permissions.md#emr-studio-example-policies) to grant users full permissions to EMR Studio capabilities including this one.

1. [Set up](emr-studio-set-up.md) and [create an EMR Studio](emr-studio-create-studio.md).

1. Navigate to your Studio and select **Query editor** from the sidebar.

You should now see the familiar Athena editor UI. For information on getting started and using Athena SQL to run interactive queries, see [Getting started](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) and [Using Athena SQL](https://docs.aws.amazon.com/athena/latest/ug/using-athena-sql.html) in the *Amazon Athena User Guide*.

**Note**  
If you have enabled trusted identity propagation through IAM Identity Center for your EMR Studio, then you must use Athena workgroups to control query access, and the workgroup that you use must also use trusted identity propagation. For steps to set up Identity Center and enable trusted identity propagation for your workgroup, see [Using IAM Identity Center enabled Athena workgroups](https://docs.aws.amazon.com/athena/latest/ug/workgroups-identity-center.html) in the *Amazon Athena User Guide*.

## Considerations for using the Athena SQL editor in EMR Studio
<a name="emr-studio-athena-considerations"></a>
+ Integration with Athena is available in all commercial Regions where EMR Studio and Athena are available.
+ The following Athena features are not available in EMR Studio:
  + Admin features like creating or updating Athena workgroups, data sources, or capacity reservations
  + Athena for Spark or Spark notebooks
  + Amazon DataZone integration
  + Cost Based Optimizer (CBO)
  + Step functions

# Amazon CodeWhisperer integration with EMR Studio Workspaces
<a name="emr-studio-codewhisperer"></a>

## Overview
<a name="emr-studio-codewhisperer-overview"></a>

You can use [Amazon CodeWhisperer](https://docs.aws.amazon.com/codewhisperer/latest/userguide/what-is-cwspr.html) with Amazon EMR Studio to get real-time recommendations as you write code in JupyterLab. CodeWhisperer can complete your comments, finish single lines of code, make line-by-line recommendations, and generate fully-formed functions. 

**Note**  
When you use Amazon EMR Studio, AWS might store data about your usage and content for service improvement purposes. For more information and instructions to opt out of data sharing, see [Sharing your data with AWS](https://docs.aws.amazon.com/codewhisperer/latest/userguide/sharing-data.html) in the *Amazon CodeWhisperer User Guide*. 

## Considerations for using CodeWhisperer with Workspaces
<a name="emr-studio-codewhisperer-considerations"></a>
+ CodeWhisperer integration is available in the same AWS Regions where EMR Studio is available, as documented in the [EMR Studio considerations](emr-studio-considerations.md).
+ Amazon EMR Studio automatically uses the CodeWhisperer endpoint in US East (N. Virginia) (us-east-1) for recommendations, regardless of the Region that your studio is in.
+ CodeWhisperer supports only the Python language for coding ETL scripts for Spark jobs in EMR Studio.
+ A client-side telemetry option quantifies your usage of CodeWhisperer. This functionality isn't supported with EMR Studio.

## Permissions required for CodeWhisperer
<a name="emr-studio-codewhisperer-permissions"></a>

To use CodeWhisperer, you must attach the following policy to your IAM user role for Amazon EMR Studio:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CodeWhispererPermissions",
      "Effect": "Allow",
      "Action": [
        "codewhisperer:GenerateRecommendations"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------
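
If you manage roles from a script rather than the console, the same policy can be built and attached programmatically. The following Python sketch assumes that boto3 is installed and that your credentials are allowed to call `iam:PutRolePolicy`. The function names and the role and policy names that you pass in are hypothetical placeholders.

```python
import json


def codewhisperer_policy() -> dict:
    """Return the policy document shown above as a Python dictionary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CodeWhispererPermissions",
                "Effect": "Allow",
                "Action": ["codewhisperer:GenerateRecommendations"],
                "Resource": ["*"],
            }
        ],
    }


def attach_policy(role_name: str, policy_name: str) -> None:
    """Attach the policy inline to an existing role (requires boto3 and IAM access)."""
    import boto3  # imported here so that the rest of the sketch runs without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(codewhisperer_policy()),
    )


# Print the document so that you can review it before attaching it.
print(json.dumps(codewhisperer_policy(), indent=2))
```

To apply the policy, you might call `attach_policy` with the name of your EMR Studio user role and a policy name of your choice.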

## Use CodeWhisperer with Workspaces
<a name="emr-studio-codewhisperer-use"></a>

To display the CodeWhisperer reference log in JupyterLab, open the **CodeWhisperer** panel at the bottom of the JupyterLab window and choose **Open Code Reference Log**.

The following list contains shortcuts that you can use to interact with CodeWhisperer suggestions:
+ **Pause recommendations** – Use **Pause Auto-Suggestions** from the CodeWhisperer settings.
+ **Accept a recommendation** – Press **Tab** on your keyboard.
+ **Reject a recommendation** – Press **Escape** on your keyboard.
+ **Navigate recommendations** – Use the **Up** and **Down** arrows on your keyboard.
+ **Manual invoke** – Press **Alt** and **C** on your keyboard. If you're using a Mac, press **Cmd** and **C**.

You can also use CodeWhisperer to change settings like log level and get suggestions for code references. For more information, see [Setting up CodeWhisperer with JupyterLab](https://docs.aws.amazon.com/codewhisperer/latest/userguide/jupyterlab-setup.html) and [Features](https://docs.aws.amazon.com/codewhisperer/latest/userguide/features.html) in the *Amazon CodeWhisperer User Guide*.

# Debug applications and jobs with EMR Studio
<a name="emr-studio-debug"></a>

With Amazon EMR Studio, you can launch data application interfaces to analyze applications and job runs in the browser.

You can also launch the persistent, off-cluster user interfaces for Amazon EMR running on EC2 clusters from the Amazon EMR console. For more information, see [View persistent application user interfaces in Amazon EMR](app-history-spark-UI.md).

**Note**  
Depending on your browser settings, you might need to enable pop-ups for an application UI to open.

For information about configuring and using the application interfaces, see [The YARN Timeline Server](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html), [Monitoring and instrumentation](https://spark.apache.org/docs/latest/monitoring.html), or [Tez UI overview](https://tez.apache.org/tez-ui.html).

## Debug Amazon EMR running on Amazon EC2 jobs
<a name="emr-studio-debug-ec2"></a>

------
#### [ Workspace UI ]

**Launch an on-cluster UI from a notebook file**

When you use Amazon EMR release versions 5.33.0 and later, you can launch the Spark web user interface (the Spark UI or Spark History Server) from a notebook in your Workspace. 

On-cluster UIs work with the PySpark, Spark, or SparkR kernels. The maximum viewable file size for Spark event logs or container logs is 10 MB. If your log files exceed 10 MB, we recommend that you use the persistent Spark History Server instead of the on-cluster Spark UI to debug jobs.
**Important**  
In order for EMR Studio to launch on-cluster application user interfaces from a Workspace, a cluster must be able to communicate with the Amazon API Gateway. You must configure the EMR cluster to allow outgoing network traffic to Amazon API Gateway, and make sure that Amazon API Gateway is reachable from the cluster.   
The Spark UI accesses container logs by resolving hostnames. If you use a custom domain name, you must make sure that the hostnames of your cluster nodes can be resolved by Amazon DNS or by the DNS server you specify. To do so, set the Dynamic Host Configuration Protocol (DHCP) options for the Amazon Virtual Private Cloud (VPC) that is associated with your cluster. For more information about DHCP options, see [DHCP option sets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html) in the *Amazon Virtual Private Cloud User Guide*.

1. In your EMR Studio, open the Workspace that you want to use and make sure that it is attached to an Amazon EMR cluster running on EC2. For instructions, see [Attach a compute to an EMR Studio Workspace](emr-studio-create-use-clusters.md).

1. Open a notebook file and use the PySpark, Spark, or SparkR kernel. To select a kernel, choose the kernel name from the upper right of the notebook toolbar to open the **Select Kernel** dialog box. The name appears as **No Kernel!** if no kernel has been selected.

1. Run your notebook code. The following appears as output in the notebook when you start the Spark context. It might take a few seconds to appear. If you have started the Spark context, you can run the `%%info` command to access a link to the Spark UI at any time.
**Note**  
If the Spark UI links do not work or do not appear after a few seconds, create a new notebook cell and run the `%%info` command to regenerate the links.  
![\[Screenshot of the Spark application master information, with link to the Spark UI. The link appears in a notebook when you run a Spark application.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/spark-app-ui-link.jpg)

1. To launch the Spark UI, choose **Link** under **Spark UI**. If your Spark application is running, the Spark UI opens in a new tab. If the application has completed, the Spark History Server opens instead.

   After you launch the Spark UI, you can modify the URL in the browser to open the YARN ResourceManager or the YARN Timeline Server. Add one of the following paths after `amazonaws.com`.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-debug.html)

------
#### [ Studio UI ]

**Launch the persistent YARN Timeline Server, Spark History Server, or Tez UI from the EMR Studio UI**

1. In your EMR Studio, select **Amazon EMR on EC2** on the left side of the page to open the **Amazon EMR on EC2** clusters list. 

1. Filter the list of clusters by **name**, **state**, or **ID** by entering values in the search box. You can also search by creation **time range**.

1. Select a cluster and then choose **Launch application UIs** to select an application user interface. The Application UI opens in a new browser tab and might take some time to load.

------

## Debug EMR Studio running on EMR Serverless
<a name="emr-studio-debug-serverless"></a>

Similar to Amazon EMR running on Amazon EC2, you can use the Workspace user interface to analyze your EMR Serverless applications. From the Workspace UI, when you use Amazon EMR releases 6.14.0 and higher, you can launch the Spark web user interface (the Spark UI or Spark History Server) from a notebook in your Workspace. For your convenience, we also provide a link to the driver log for quick access to the Spark driver logs.

## Debug Amazon EMR on EKS job runs with the Spark History Server
<a name="emr-studio-debug-eks"></a>

When you submit a job run to an Amazon EMR on EKS cluster, you can access logs for that job run using the Spark History Server. The Spark History Server provides tools for monitoring Spark applications, such as a list of scheduler stages and tasks, a summary of RDD sizes and memory usage, and environmental information. You can launch the Spark History Server for Amazon EMR on EKS job runs in the following ways:
+ When you submit a job run using EMR Studio with an Amazon EMR on EKS managed endpoint, you can launch the Spark History Server from a notebook file in your Workspace.
+ When you submit a job run using the AWS CLI or AWS SDK for Amazon EMR on EKS, you can launch the Spark History Server from the EMR Studio UI.

For information about how to use the Spark History Server, see [Monitoring and Instrumentation](https://spark.apache.org/docs/latest/monitoring.html) in the Apache Spark documentation. For more information about job runs, see [Concepts and components](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-concepts.html) in the *Amazon EMR on EKS Development Guide*.
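
For job runs that you submit with the AWS SDK, the request might look like the following Python sketch. It assumes boto3 and the `emr-containers` client, and every value in angle brackets is a placeholder that you replace with your own. The helper names `build_job_run_request` and `submit_job_run` are hypothetical.

```python
def build_job_run_request(
    name: str,
    virtual_cluster_id: str,
    execution_role_arn: str,
    release_label: str,
    entry_point: str,
) -> dict:
    """Assemble the keyword arguments for the StartJobRun API call."""
    return {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {"sparkSubmitJobDriver": {"entryPoint": entry_point}},
    }


def submit_job_run(request: dict) -> str:
    """Submit the job run and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported here so that the request builder runs without boto3

    client = boto3.client("emr-containers")
    response = client.start_job_run(**request)
    return response["id"]


request = build_job_run_request(
    name="example-job-run",
    virtual_cluster_id="<virtual-cluster-id>",
    execution_role_arn="<execution-role-arn>",
    release_label="<release-label>",
    entry_point="s3://amzn-s3-demo-bucket/test.py",
)
print(request)
```

A job run submitted this way is the kind that appears in the **Jobs** list of the EMR Studio UI, where you can launch the Spark History Server for it.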

**To launch the Spark History Server from a notebook file in your EMR Studio Workspace**

1. Open a Workspace that is connected to an Amazon EMR on EKS cluster.

1. Select and open your notebook file in the Workspace.

1. Choose **Spark UI** at the top of the notebook file to open the persistent Spark History Server in a new tab.

**To launch the Spark History Server from the EMR Studio UI**
**Note**  
The **Jobs** list in the EMR Studio UI displays only job runs that you submit using the AWS CLI or AWS SDK for Amazon EMR on EKS.

1. In your EMR Studio, select **Amazon EMR on EKS** on the left side of the page. 

1. Search for the Amazon EMR on EKS virtual cluster that you used to submit your job run. You can filter the list of clusters by **status** or **ID** by entering values in the search box.

1. Select the cluster to open its detail page. The detail page displays information about the cluster, such as ID, namespace, and status. The page also shows a list of all the job runs submitted to that cluster. 

1. From the cluster detail page, select a job run to debug.

1. In the upper right of the **Jobs** list, choose **Launch Spark History Server** to open the application interface in a new browser tab.

# Install kernels and libraries in an EMR Studio Workspace
<a name="emr-studio-install-libraries-and-kernels"></a>

Each Amazon EMR Studio Workspace comes with a set of pre-installed libraries and kernels. 

## Kernels and libraries on clusters that run on Amazon EC2
<a name="emr-studio-ec2-kernels-libraries"></a>

You can also customize the environment for EMR Studio in the following ways when you use EMR clusters running on Amazon EC2:
+ **Install Jupyter Notebook kernels and Python libraries on a cluster primary node** – When you install libraries using this option, all Workspaces attached to the same cluster share those libraries. You can install kernels or libraries from within a notebook cell or while connected using SSH to the primary node of a cluster.
+ **Use notebook-scoped libraries** – When Workspace users install and use libraries from within a notebook cell, those libraries are available only to that notebook. This option lets different notebooks that use the same cluster work without conflicting library versions.

EMR Studio Workspaces have the same underlying architecture as EMR Notebooks. You can install and use Jupyter Notebook kernels and Python libraries with EMR Studio in the same way you would with EMR Notebooks. For instructions, see [Installing and using kernels and libraries in EMR Studio](emr-managed-notebooks-installing-libraries-and-kernels.md). 

## Kernels and libraries on Amazon EMR on EKS clusters
<a name="emr-studio-eks-kernels-libraries"></a>

Amazon EMR on EKS clusters include the PySpark and Python 3.7 kernels with a set of pre-installed libraries. Amazon EMR on EKS does not support installing additional libraries or kernels.

Each Amazon EMR on EKS cluster comes with the following Python and PySpark libraries installed:
+ **Python** – boto3, cffi, future, ggplot, jupyter, kubernetes, matplotlib, numpy, pandas, plotly, pycryptodomex, py4j, requests, scikit-learn, scipy, seaborn
+ **PySpark** – ggplot, jupyter, matplotlib, numpy, pandas, plotly, pycryptodomex, py4j, requests, scikit-learn, scipy, seaborn

## Kernels and libraries on EMR Serverless applications
<a name="emr-studio-serverless-kernels-libraries"></a>

Each EMR Serverless application comes with the following Python and PySpark libraries installed:
+ **Python** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy, seaborn
+ **PySpark** – ggplot, matplotlib, numpy, pandas, plotly, bokeh, scikit-learn, scipy, seaborn
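
To confirm which of the listed libraries are available to a Python kernel, you might run a check like the following sketch in a notebook cell. It only looks up module specs in the kernel's own environment, so it says nothing about Spark executors. The mapping of package names to import names is an assumption based on common usage, for example `sklearn` for scikit-learn.

```python
import importlib.util

# Package name -> import name. The import names are assumptions based on
# how these packages are commonly imported.
SERVERLESS_PYTHON_LIBRARIES = {
    "ggplot": "ggplot",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "plotly": "plotly",
    "bokeh": "bokeh",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "seaborn": "seaborn",
}


def find_missing(libraries: dict) -> list:
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in libraries.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing(SERVERLESS_PYTHON_LIBRARIES)
print("Missing libraries:", missing or "none")
```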

# Enhance kernels with magic commands in EMR Studio
<a name="emr-studio-magics"></a>

## Overview
<a name="overview-magics"></a>

EMR Studio and EMR Notebooks support magic commands. *Magic* commands, or *magics*, are enhancements that the IPython kernel provides to help you run and analyze data. IPython is an interactive shell environment that is built with Python.

Amazon EMR also supports Sparkmagic, a package that provides Spark-related kernels (PySpark, SparkR, and Scala kernels) with specific magic commands and that uses Livy on the cluster to submit Spark jobs.

You can use magic commands as long as you have a Python kernel in your EMR notebook. Similarly, any Spark-related kernel supports Sparkmagic commands.

Magic commands come in two varieties:
+ **Line magics** – These magic commands are denoted by a single `%` prefix and operate on a single line of code.
+ **Cell magics** – These magic commands are denoted by a double `%%` prefix and operate on multiple lines of code.

For all available magics, see [List magic and Sparkmagic commands](#accessing-all-magic-commands).

## Considerations and limitations
<a name="considerations-limitations-magics"></a>
+ EMR Serverless doesn't support using `%%sh` to run `spark-submit`. It also doesn't support the EMR Notebooks magics.
+ Amazon EMR on EKS clusters don't support Sparkmagic commands for EMR Studio. This is because Spark kernels that you use with managed endpoints are built into Kubernetes, and they aren't supported by Sparkmagic and Livy. As a workaround, you can set the Spark configuration directly on the Spark session object (`spark`), as the following example demonstrates.

  ```
  spark.conf.set("spark.driver.maxResultSize", '6g') 
  ```
+ The following magic commands and actions are prohibited by AWS:
  + `%alias`
  + `%alias_magic`
  + `%automagic`
  + `%macro`
  + Modifying `proxy_user` with `%configure`
  + Modifying `KERNEL_USERNAME` with `%env` or `%set_env`

## List magic and Sparkmagic commands
<a name="accessing-all-magic-commands"></a>

Use the following commands to list the available magic commands:
+ `%lsmagic` lists all currently-available magic functions.
+ `%%help` lists currently-available Spark-related magic functions provided by the Sparkmagic package.

## Use `%%configure` to configure Spark
<a name="using-configure-sparkmagic"></a>

One of the most useful Sparkmagic commands is the `%%configure` command, which configures the session creation parameters. Using `conf` settings, you can configure any Spark configuration that's mentioned in the [configuration documentation for Apache Spark](https://spark.apache.org/docs/latest/configuration.html).

**Example: Add an external JAR file to EMR Notebooks from a Maven repository or Amazon S3**  
You can use the following approach to add an external JAR file dependency to any Spark-related kernel that's supported by Sparkmagic.  

```
%%configure -f
{"conf": {
    "spark.jars.packages": "com.jsuereth:scala-arm_2.11:2.0,ml.combust.bundle:bundle-ml_2.11:0.13.0,com.databricks:dbutils-api_2.11:0.0.3",
    "spark.jars": "s3://amzn-s3-demo-bucket/my-jar.jar"
    }
}
```

**Example: Configure Hudi**  
You can use the notebook editor to configure your EMR notebook to use Hudi.  

```
%%configure
{ "conf": {
     "spark.jars": "hdfs:///apps/hudi/lib/hudi-spark-bundle.jar,hdfs:///apps/hudi/lib/spark-avro.jar",
     "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
     "spark.sql.hive.convertMetastoreParquet":"false"
     }
}
```
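
Because the body of a `%%configure` cell is JSON, a misplaced comma or quote makes the cell fail. One way to avoid this is to generate the cell text from a Python dictionary, as in the following sketch. The helper name `configure_cell` is hypothetical. In Sparkmagic, the `-f` flag forces an existing session to be dropped and recreated with the new settings.

```python
import json


def configure_cell(conf: dict, force: bool = False) -> str:
    """Return the text of a %%configure cell for the given Spark settings."""
    header = "%%configure -f" if force else "%%configure"
    body = json.dumps({"conf": conf}, indent=4)
    return f"{header}\n{body}"


print(
    configure_cell(
        {
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.sql.hive.convertMetastoreParquet": "false",
        }
    )
)
```

You can paste the printed text into a new notebook cell that uses a Spark-related kernel.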

## Use `%%sh` to run `spark-submit`
<a name="using-sh-sparkmagic"></a>

The `%%sh` magic runs shell commands in a subprocess on an instance of your attached cluster. Typically, you'd use one of the Spark-related kernels to run Spark applications on your attached cluster. However, if you want to use a Python kernel to submit a Spark application, you can use the following magic, replacing the bucket name with your bucket name in lowercase.

```
%%sh
spark-submit --master yarn --deploy-mode cluster s3://amzn-s3-demo-bucket/test.py
```

In this example, the cluster needs access to the location of `s3://amzn-s3-demo-bucket/test.py`, or the command will fail.
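
If you assemble the `spark-submit` command in Python first, for example to reuse it across notebooks, quoting matters when a path contains spaces or shell metacharacters. The following sketch builds the same command line as the example above with the standard-library `shlex` module. The helper name `spark_submit_command` is hypothetical.

```python
import shlex


def spark_submit_command(script_uri: str, deploy_mode: str = "cluster") -> str:
    """Build a shell-safe spark-submit command line for a script location."""
    args = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        script_uri,
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)


print(spark_submit_command("s3://amzn-s3-demo-bucket/test.py"))
```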

You can use any Linux command with the `%%sh` magic. If you want to run any Spark or YARN commands, use one of the following options to create an `emr-notebook` Hadoop user and grant the user permissions to run the commands:
+ You can explicitly create a new user by running the following commands.

  ```
  hadoop fs -mkdir /user/emr-notebook
  hadoop fs -chown emr-notebook /user/emr-notebook
  ```
+ You can turn on user impersonation in Livy, which automatically creates the user. See [Enabling user impersonation to monitor Spark user and job activity](emr-managed-notebooks-spark-monitor.md) for more information.

## Use `%%display` to visualize Spark dataframes
<a name="using-display-sparkmagic"></a>

You can use the `%%display` magic to visualize a Spark dataframe. To use this magic, run the following command. 

```
%%display df
```

Choose to view the results in a table format, as the following image shows.

![\[Output of using the %%display magic that shows results in a table format.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/magic-display-table.png)


You can also choose to visualize your data with five types of charts. Your options include pie, scatter, line, area, and bar charts.

![\[Output of using the %%display magic that shows results in a chart format.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/magic-display-chart.png)


## Use EMR Notebooks magics
<a name="emr-magics"></a>

Amazon EMR provides the following EMR Notebooks magics that you can use with Python 3 and Spark-based kernels:
+ `%mount_workspace_dir` – Mounts your Workspace directory to your cluster so that you can import and run code from other files in your Workspace.
**Note**  
With `%mount_workspace_dir`, only the Python 3 kernel can access your local file systems. Spark executors will not have access to the mounted directory with this kernel.
+ `%umount_workspace_dir` – Unmounts your Workspace directory from your cluster.
+ `%generate_s3_download_url` – Generates a temporary download link in your notebook output for an Amazon S3 object.

### Prerequisites
<a name="emr-magics-prereqs"></a>

Before you install EMR Notebooks magics, complete the following tasks:
+ Make sure that your [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) has read access for Amazon S3. The `EMR_EC2_DefaultRole` with the `AmazonElasticMapReduceforEC2Role` managed policy fulfills this requirement. If you use a custom role or policy, make sure that it has the necessary S3 permissions.
**Note**  
EMR Notebooks magics run on a cluster as the notebook user and use the EC2 instance profile to interact with Amazon S3. When you mount a Workspace directory on an EMR cluster, all Workspaces and EMR notebooks with permission to attach to that cluster can access the mounted directory.  
Directories are mounted as read-only by default. While `s3fs-fuse` and `goofys` allow read-write mounts, we strongly recommend that you do not modify mount parameters to mount directories in read-write mode. If you allow write access, any changes made to the directory are written to the S3 bucket. To avoid accidental deletion or overwriting, you can enable versioning for your S3 bucket. To learn more, see [Using versioning in S3 buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html).
+ Run one of the following scripts on your cluster to install the dependencies for EMR Notebooks magics. To run a script, you can either [Use custom bootstrap actions](emr-plan-bootstrap.md#bootstrapCustom) or follow the instructions in [Run commands and scripts on an Amazon EMR cluster](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-commandrunner.html) when you already have a running cluster.

  You can choose which dependency to install. Both [s3fs-fuse](https://github.com/s3fs-fuse/s3fs-fuse) and [goofys](https://github.com/kahing/goofys) are FUSE (Filesystem in Userspace) tools that let you mount an Amazon S3 bucket as a local file system on a cluster. The `s3fs` tool provides an experience similar to POSIX. The `goofys` tool is a good choice when you prefer performance over a POSIX-compliant file system.

  The Amazon EMR 7.x series uses Amazon Linux 2023, which doesn't support EPEL repositories. If you're running Amazon EMR 7.x, follow the [s3fs-fuse GitHub](https://github.com/s3fs-fuse/s3fs-fuse/blob/master/COMPILATION.md) instructions to install `s3fs-fuse`. If you use the 5.x or 6.x series, use the following commands to install `s3fs-fuse`.

  ```
  #!/bin/sh
  
  # Install the s3fs dependency for EMR Notebooks magics 
  sudo amazon-linux-extras install epel -y
  sudo yum install s3fs-fuse -y
  ```

  **OR**

  ```
  #!/bin/sh
  
  # Install the goofys dependency for EMR Notebooks magics 
  sudo wget https://github.com/kahing/goofys/releases/latest/download/goofys -P /usr/bin/
  sudo chmod ugo+x /usr/bin/goofys
  ```

### Install EMR Notebooks magics
<a name="emr-magics-install"></a>

**Note**  
With Amazon EMR releases 6.0 through 6.9.0, and 5.0 through 5.36.0, only `emr-notebooks-magics` package versions 0.2.0 and higher support `%mount_workspace_dir` magic.

Complete the following steps to install EMR Notebooks magics.

1. In your notebook, run the following commands to install the [emr-notebooks-magics](https://pypi.org/project/emr-notebooks-magics/) package.

   ```
   %pip install boto3 --upgrade
   %pip install botocore --upgrade
   %pip install emr-notebooks-magics --upgrade
   ```

1. Restart your kernel to load the EMR Notebooks magics.

1. Verify your installation with the following command, which should display help text for `%mount_workspace_dir`.

   ```
   %mount_workspace_dir?
   ```
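
To check whether the installed package meets the 0.2.0 minimum mentioned in the note above, you might compare version numbers as in the following sketch. It assumes that the kernel's Python provides `importlib.metadata` (Python 3.8 or later) and that the package reports a plain dotted version such as `0.2.0`. The helper names are hypothetical.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Convert a plain dotted version such as '0.2.0' into a tuple of integers."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '0.2'
    return tuple(parts)


def supports_mount_magic(version: str) -> bool:
    """Return True if the version is 0.2.0 or higher."""
    return version_tuple(version) >= (0, 2, 0)


try:
    installed = metadata.version("emr-notebooks-magics")
    print(installed, supports_mount_magic(installed))
except metadata.PackageNotFoundError:
    print("emr-notebooks-magics is not installed")
```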

### Mount a Workspace directory with `%mount_workspace_dir`
<a name="emr-magics-mount-workspace"></a>

The `%mount_workspace_dir` magic lets you mount your Workspace directory onto your EMR cluster so that you can import and run other files, modules, or packages stored in your directory.

The following example mounts the entire Workspace directory onto a cluster, and specifies the optional *`<--fuse-type>`* argument to use goofys for mounting the directory.

```
%mount_workspace_dir . <--fuse-type goofys>
```

To verify that your Workspace directory is mounted, use the following example to display the current working directory with the `ls` command. The output should display all of the files in your Workspace.

```
%%sh
ls
```

When you're done making changes in your Workspace, you can unmount the Workspace directory with the following command:

**Note**  
Your Workspace directory stays mounted to your cluster even when the Workspace is stopped or detached. You must explicitly unmount your Workspace directory.

```
%umount_workspace_dir
```

### Download an Amazon S3 object with `%generate_s3_download_url`
<a name="emr-magics-generate-s3-download-url"></a>

The `%generate_s3_download_url` magic creates a presigned URL for an object stored in Amazon S3. You can use the presigned URL to download the object to your local machine. For example, you might run `%generate_s3_download_url` to download the result of a SQL query that your code writes to Amazon S3.

The presigned URL is valid for 60 minutes by default. You can change the expiration time by specifying a number of seconds for the `--expires-in` flag. For example, `--expires-in 1800` creates a URL that is valid for 30 minutes.

The following example generates a download link for an object by specifying the full Amazon S3 path: `s3://EXAMPLE-DOC-BUCKET/path/to/my/object`.

```
%generate_s3_download_url s3://EXAMPLE-DOC-BUCKET/path/to/my/object
```
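
The expiration arithmetic described above can be sketched in Python. The helper below is hypothetical (`build_download_magic` is not part of `emr-notebooks-magics`); it only assembles the command line for the magic and converts minutes into the seconds that `--expires-in` expects. Placing the flag after the S3 path is an assumption, so run `%generate_s3_download_url?` to confirm the exact syntax.

```python
def build_download_magic(s3_uri: str, valid_minutes: int = 60) -> str:
    """Return a %generate_s3_download_url line for the given object.

    Hypothetical helper for illustration only. Placing --expires-in
    after the S3 path is an assumption, not documented behavior.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("expected a full Amazon S3 path such as s3://bucket/key")
    if valid_minutes <= 0:
        raise ValueError("valid_minutes must be positive")
    # --expires-in takes seconds, so 30 minutes becomes 1800.
    expires_in_seconds = valid_minutes * 60
    return f"%generate_s3_download_url {s3_uri} --expires-in {expires_in_seconds}"


# Build a line that requests a URL valid for 30 minutes.
print(build_download_magic("s3://EXAMPLE-DOC-BUCKET/path/to/my/object", valid_minutes=30))
```

You can paste the printed line into a notebook cell and run it as you would any other magic.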

To learn more about using `%generate_s3_download_url`, run the following command to display help text.

```
%generate_s3_download_url?
```

### Run a notebook in headless mode with `%execute_notebook`
<a name="headless-execution"></a>

With the `%execute_notebook` magic, you can run another notebook in headless mode and view the output for each cell that you've run. This magic requires additional permissions for the instance role that Amazon EMR and Amazon EC2 share. For more details on how to grant additional permissions, run the command `%execute_notebook?`.

During a long-running job, your system might go to sleep because of inactivity, or might temporarily lose internet connectivity. Either event can disrupt the connection between your browser and the Jupyter Server. In this case, you might lose the output that the Jupyter Server sends for the cells that you've run.

If you run the notebook in headless mode with the `%execute_notebook` magic, EMR Notebooks captures output from the cells that have run, even if the local network experiences disruption. EMR Notebooks saves the output incrementally in a new notebook with the same name as the notebook that you've run. EMR Notebooks then places the notebook into a new folder within the Workspace. By default, headless runs occur on the same cluster and use the service role `EMR_Notebooks_DefaultRole`, but you can override these defaults with additional arguments.

To run a notebook in headless mode, use the following command:

```
%execute_notebook <relative-file-path>
```

To specify a cluster ID and service role for a headless run, use the following command:

```
%execute_notebook <notebook_name>.ipynb --cluster-id <emr-cluster-id> --service-role <emr-notebook-service-role>
```
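
As a minimal sketch of how the optional arguments combine, the following Python helper assembles the `%execute_notebook` line from a notebook path and the two flags shown above. The function name `build_execute_notebook_magic` and the sample path, cluster ID, and role name are hypothetical placeholders; the helper only builds the string and does not run anything.

```python
from typing import Optional


def build_execute_notebook_magic(
    notebook_path: str,
    cluster_id: Optional[str] = None,
    service_role: Optional[str] = None,
) -> str:
    """Return a %execute_notebook line (hypothetical helper, illustration only)."""
    parts = ["%execute_notebook", notebook_path]
    # Omitting a flag leaves the corresponding default in place.
    if cluster_id:
        parts += ["--cluster-id", cluster_id]
    if service_role:
        parts += ["--service-role", service_role]
    return " ".join(parts)


# Placeholder values; substitute your own notebook, cluster ID, and role.
print(build_execute_notebook_magic("my_notebook.ipynb"))
print(
    build_execute_notebook_magic(
        "my_notebook.ipynb",
        cluster_id="j-EXAMPLECLUSTERID",
        service_role="EMR_Notebooks_DefaultRole",
    )
)
```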

When Amazon EMR and Amazon EC2 share an instance role, the role requires the following additional permissions:

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:StartNotebookExecution",
        "elasticmapreduce:DescribeNotebookExecution",
        "ec2:DescribeInstances"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEStartnotebookexecution"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/EMR_Notebooks_DefaultRole"
      ],
      "Sid": "AllowIAMPassrole"
    }
  ]
}
```

------
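
If you manage several accounts, you might generate this policy rather than edit it by hand. The following Python sketch rebuilds the same document with the account ID and role name as parameters. The function name `build_headless_run_policy` is hypothetical, and the sketch only produces JSON; attaching the policy to the instance role is a separate step that is not shown.

```python
import json


def build_headless_run_policy(
    account_id: str, role_name: str = "EMR_Notebooks_DefaultRole"
) -> dict:
    """Return the additional instance-role permissions as a dict.

    Hypothetical helper; the statements mirror the JSON policy shown above.
    """
    pass_role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:StartNotebookExecution",
                    "elasticmapreduce:DescribeNotebookExecution",
                    "ec2:DescribeInstances",
                ],
                "Resource": ["*"],
                "Sid": "AllowELASTICMAPREDUCEStartnotebookexecution",
            },
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                # Only the notebook service role may be passed.
                "Resource": [pass_role_arn],
                "Sid": "AllowIAMPassrole",
            },
        ],
    }


# 123456789012 is the placeholder account ID used in the policy above.
print(json.dumps(build_headless_run_policy("123456789012"), indent=2))
```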

**Note**  
To use the `%execute_notebook` magic, install the `emr-notebooks-magics` package, version 0.2.3 or higher.

# Use multi-language notebooks with Spark kernels
<a name="emr-multi-language-kernels"></a>

Each Jupyter notebook kernel has a default language. For example, the Spark kernel's default language is Scala, and the PySpark kernel's default language is Python. With Amazon EMR 6.4.0 and later, EMR Studio supports multi-language notebooks. This means that each kernel in EMR Studio can support the following languages in addition to the default language: Python, Scala, R, and Spark SQL.

To activate this feature, specify one of the following magic commands at the beginning of any cell.


| Language | Command | 
| --- | --- | 
| Python | `%%pyspark` | 
| Scala | `%%scalaspark` | 
| R | `%%rspark` (not supported for interactive workloads with EMR Serverless) | 
| Spark SQL | `%%sql` | 

When invoked, these commands execute the entire cell within the same Spark session using the interpreter of the corresponding language.

The `%%pyspark` cell magic allows users to write PySpark code in all Spark kernels.

```
%%pyspark
a = 1
```

The `%%sql` cell magic allows users to execute Spark SQL code in all Spark kernels.

```
%%sql
SHOW TABLES
```

The `%%rspark` cell magic allows users to execute SparkR code in all Spark kernels.

```
%%rspark
a <- 1
```

The `%%scalaspark` cell magic allows users to execute Spark Scala code in all Spark kernels.

```
%%scalaspark
val a = 1
```
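
If you generate notebook cells programmatically, the mapping in the table above can be captured in a small lookup. The following Python sketch is a hypothetical helper (`with_language_magic` and `LANGUAGE_MAGICS` are illustrative names, not part of EMR Studio) that places the matching cell magic on the first line of a cell, where it must appear.

```python
# Language-to-magic mapping, copied from the table above.
LANGUAGE_MAGICS = {
    "python": "%%pyspark",
    "scala": "%%scalaspark",
    "r": "%%rspark",
    "sql": "%%sql",
}


def with_language_magic(language: str, code: str) -> str:
    """Return cell text that starts with the magic for `language`."""
    try:
        magic = LANGUAGE_MAGICS[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
    # The cell magic has to be the first line of the cell.
    return f"{magic}\n{code.strip()}\n"


print(with_language_magic("scala", "val a = 1"))
```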

## Share data across language interpreters with temporary tables
<a name="emr-temp-tables"></a>

You can also share data between language interpreters using temporary tables. The following example uses `%%pyspark` in one cell to create a temporary table in Python and uses `%%scalaspark` in the following cell to read data from that table in Scala.

```
%%pyspark
df = spark.sql("SELECT * FROM nyc_top_trips_report LIMIT 20")
# Create a temporary table called nyc_top_trips_report_view in Python
df.createOrReplaceTempView("nyc_top_trips_report_view")
```

```
%%scalaspark
// Read the temporary table in Scala
val df = spark.sql("SELECT * FROM nyc_top_trips_report_view")
df.show(5)
```