User guide
This section covers how data scientists and data engineers can launch, discover, connect to, or terminate an Amazon EMR cluster from Studio or Studio Classic.
Before users can list or launch clusters, administrators must have configured the necessary settings in the Studio environment. For information on how administrators can configure a Studio environment to allow self-provisioning and listing of Amazon EMR clusters, see Admin guide.
Topics
- Supported images and kernels to connect to an Amazon EMR cluster from Studio or Studio Classic
- Bring your own image
- Launch an Amazon EMR cluster from Studio or Studio Classic
- List Amazon EMR clusters from Studio or Studio Classic
- Connect to an Amazon EMR cluster from SageMaker Studio or Studio Classic
- Terminate an Amazon EMR cluster from Studio or Studio Classic
- Access Spark UI from Studio or Studio Classic
Supported images and kernels to connect to an Amazon EMR cluster from Studio or Studio Classic
The following images and kernels come with sagemaker-studio-analytics-extension pre-installed:
- For Studio users: SageMaker Distribution is a Docker environment for data science used as the default image of JupyterLab notebook instances. All versions of SageMaker Distribution come with sagemaker-studio-analytics-extension pre-installed.
- For Studio Classic users: The following images come pre-installed with sagemaker-studio-analytics-extension:
  - DataScience – Python 3 kernel
  - DataScience 2.0 – Python 3 kernel
  - DataScience 3.0 – Python 3 kernel
  - SparkAnalytics 1.0 – SparkMagic and PySpark kernels
  - SparkAnalytics 2.0 – SparkMagic and PySpark kernels
  - SparkMagic – SparkMagic and PySpark kernels
  - PyTorch 1.8 – Python 3 kernels
  - TensorFlow 2.6 – Python 3 kernel
  - TensorFlow 2.11 – Python 3 kernel
To connect to Amazon EMR clusters using another built-in image or your own image, follow the instructions in Bring your own image.
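To find out whether the image attached to a notebook already includes the packages needed to reach Amazon EMR (and so whether you need to bring your own image), a small shell sketch like the following can help. It assumes that `python3 -m pip` resolves to the same Python environment the notebook kernel uses; `check_pkg` is a hypothetical helper name, and the package names are the ones listed under Bring your own image.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Amazon EMR connectivity packages are installed.
# Assumption: `python3 -m pip` points at the notebook kernel's environment;
# adjust the interpreter for your image if it does not.

check_pkg() {
  # `pip show` exits non-zero when the package is not installed (and the
  # command also fails if pip or python3 is unavailable), so both cases
  # are reported as "missing".
  if python3 -m pip show "$1" >/dev/null 2>&1; then
    echo "$1: installed"
  else
    echo "$1: missing"
  fi
}

for pkg in sparkmagic sagemaker-studio-sparkmagic-lib sagemaker-studio-analytics-extension; do
  check_pkg "$pkg"
done
```

Run it from a terminal in the same image as the notebook; any "missing" line suggests following the Bring your own image steps.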
Bring your own image
To bring your own image in Studio or Studio Classic and allow your notebooks to connect to Amazon EMR clusters, install the following packages, including sagemaker-studio-analytics-extension, in your image:
pip install sparkmagic
pip install sagemaker-studio-sparkmagic-lib
pip install sagemaker-studio-analytics-extension
Additionally, to connect to Amazon EMR with Kerberos authentication, you must install the kinit client. The install command varies by operating system. For an Ubuntu (Debian-based) image, use the following command:
apt-get install -y -qq krb5-user
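Because the kinit install command depends on the base image's operating system, one way to cover both the Python packages and the Kerberos client in a single step is a small shell sketch that detects the package manager and prints the matching commands. `detect_pkg_manager` and `kinit_install_cmd` are hypothetical helper names, and mapping yum or dnf to the krb5-workstation package is an assumption about RHEL-family images, not something this guide specifies.

```shell
#!/usr/bin/env bash
# Sketch: print the commands a custom image needs for Amazon EMR connectivity.
# Commands are printed rather than run, so they can be reviewed first.
# Assumption: RHEL-family images (dnf/yum) provide kinit through the
# krb5-workstation package; verify this for your base image.
set -euo pipefail

detect_pkg_manager() {
  # Print the first supported package manager found on PATH.
  local pm
  for pm in apt-get dnf yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

kinit_install_cmd() {
  # Map a package manager to the command that installs the kinit client.
  case "$1" in
    apt-get) echo "apt-get install -y -qq krb5-user" ;;
    dnf|yum) echo "$1 install -y krb5-workstation" ;;
    *) return 1 ;;
  esac
}

# Python packages from the section above (pip accepts several at once).
echo "pip install sparkmagic sagemaker-studio-sparkmagic-lib sagemaker-studio-analytics-extension"

# Kerberos client, only needed for clusters that use Kerberos authentication.
if pm="$(detect_pkg_manager)"; then
  kinit_install_cmd "$pm"
else
  echo "No apt-get, dnf, or yum found; install a kinit client manually." >&2
fi
```

Printing instead of executing keeps the sketch safe to run anywhere; during an image build you would run the printed commands with the privileges the package manager requires.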
For more information on bringing your own image in SageMaker Studio or Studio Classic, see Bring your own SageMaker image.