User guide

This section covers how data scientists and data engineers can launch, discover, connect to, or terminate an Amazon EMR cluster from Studio or Studio Classic.

Before users can list or launch clusters, administrators must have configured the necessary settings in the Studio environment. For information on how administrators can configure a Studio environment to allow self-provisioning and listing of Amazon EMR clusters, see Admin guide.

Supported images and kernels to connect to an Amazon EMR cluster from Studio or Studio Classic

The following images and kernels come with sagemaker-studio-analytics-extension, the JupyterLab extension that connects to a remote Spark (Amazon EMR) cluster via the SparkMagic library using Apache Livy.

  • For Studio users: SageMaker Distribution is a Docker environment for data science used as the default image of JupyterLab notebook instances. All versions of SageMaker AI Distribution come with sagemaker-studio-analytics-extension pre-installed.

  • For Studio Classic users: The following images come pre-installed with sagemaker-studio-analytics-extension:

    • DataScience – Python 3 kernel

    • DataScience 2.0 – Python 3 kernel

    • DataScience 3.0 – Python 3 kernel

    • SparkAnalytics 1.0 – SparkMagic and PySpark kernels

    • SparkAnalytics 2.0 – SparkMagic and PySpark kernels

    • SparkMagic – SparkMagic and PySpark kernels

    • PyTorch 1.8 – Python 3 kernel

    • TensorFlow 2.6 – Python 3 kernel

    • TensorFlow 2.11 – Python 3 kernel

To connect to Amazon EMR clusters using another built-in image or your own image, follow the instructions in Bring your own image.

Bring your own image

To bring your own image in Studio or Studio Classic and allow your notebooks to connect to Amazon EMR clusters, install the sagemaker-studio-analytics-extension extension and its dependencies in your kernel by running the following commands. The extension supports connecting SageMaker Studio or Studio Classic notebooks to Spark (Amazon EMR) clusters through the SparkMagic library.

pip install sparkmagic
pip install sagemaker-studio-sparkmagic-lib
pip install sagemaker-studio-analytics-extension

Additionally, to connect to Amazon EMR with Kerberos authentication, you must install the kinit client. The command to install the kinit client varies by operating system. For an Ubuntu (Debian-based) image, use the apt-get install -y -qq krb5-user command.
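
The install steps above could be grouped into a small script that runs while building a custom Ubuntu (Debian-based) image. The following is a minimal sketch, not an official build script: the function names (run, install_spark_extension, install_kinit_client) and the DRY_RUN switch are hypothetical helpers introduced for illustration, and the sketch assumes that pip and apt-get are available and that the script runs with sufficient privileges.

```shell
#!/bin/sh
# Sketch: install the Amazon EMR connectivity packages into a custom image.
set -eu

# Print the command when DRY_RUN=1 (hypothetical switch); otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install the three pip packages listed in this guide, in the same order.
install_spark_extension() {
  for pkg in sparkmagic sagemaker-studio-sparkmagic-lib sagemaker-studio-analytics-extension; do
    run pip install "$pkg"
  done
}

# Install the kinit client. Needed only for Kerberos authentication.
# Assumption: the package index is already up to date in the image.
install_kinit_client() {
  run apt-get install -y -qq krb5-user
}
```

To use the sketch, call install_spark_extension from the image build, and call install_kinit_client only if the clusters use Kerberos authentication. Setting DRY_RUN=1 prints the commands without running them, which can help when reviewing the build steps.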

For more information on bringing your own image in SageMaker Studio or Studio Classic, see Bring your own SageMaker image.
