Create an Amazon SageMaker notebook instance
Important
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see Provide permissions for tagging SageMaker AI resources.
AWS managed policies for Amazon SageMaker AI that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.
An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook application. SageMaker AI manages creating the instance and related resources. Use Jupyter notebooks in your notebook instance to:
-
prepare and process data
-
write code to train models
-
deploy models to SageMaker AI hosting
-
test or validate your models
To create a notebook instance, use either the SageMaker AI console or the CreateNotebookInstance API.
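As a rough sketch of the API route, the following Python assembles a CreateNotebookInstance request and sends it with the AWS SDK for Python (boto3). The helper names, the example instance name, the instance type, and the role ARN are placeholders chosen for illustration, not values from this guide. Sending the request assumes boto3 is installed and that AWS credentials and a Region are configured.

```python
from typing import Any, Dict, List, Optional


def build_create_request(
    name: str,
    instance_type: str,
    role_arn: str,
    volume_size_gb: int = 5,
    subnet_id: Optional[str] = None,
    security_group_ids: Optional[List[str]] = None,
    minimum_imds_version: str = "2",
) -> Dict[str, Any]:
    """Assemble keyword arguments for CreateNotebookInstance (hypothetical helper)."""
    request: Dict[str, Any] = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_size_gb,
        "InstanceMetadataServiceConfiguration": {
            "MinimumInstanceMetadataServiceVersion": minimum_imds_version
        },
    }
    # VPC settings are optional, so add them only when they are supplied.
    if subnet_id is not None:
        request["SubnetId"] = subnet_id
    if security_group_ids:
        request["SecurityGroupIds"] = list(security_group_ids)
    return request


def create_notebook_instance(request: Dict[str, Any]) -> str:
    """Send the request and return the ARN of the new notebook instance."""
    import boto3  # imported lazily so that building a request needs no SDK

    client = boto3.client("sagemaker")
    response = client.create_notebook_instance(**request)
    return response["NotebookInstanceArn"]


# Placeholder values for illustration only.
example_request = build_create_request(
    name="example-notebook",
    instance_type="ml.t3.medium",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
)
```

Calling `create_notebook_instance(example_request)` would submit the request. Keeping request construction separate from the network call makes the parameters easy to inspect before anything is created.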
The notebook instance type you choose depends on how you use your notebook instance. Ensure that your notebook instance is not bound by memory, CPU, or IO. To load a dataset into memory on the notebook instance for exploration or preprocessing, choose an instance type with enough RAM for your dataset. This requires an instance with at least 16 GB of memory (.xlarge or larger). If you plan to use the notebook for compute-intensive preprocessing, we recommend that you choose a compute-optimized instance such as a c4 or c5.
A best practice when using a SageMaker notebook is to use the notebook instance to orchestrate other AWS services. For example, you can use the notebook instance to manage large dataset processing. To do this, make calls to AWS Glue for ETL (extract, transform, and load) services or Amazon EMR for mapping and data reduction using Hadoop. You can use AWS services as temporary forms of computation or storage for your data.
You can store and retrieve your training and test data using an Amazon Simple Storage Service bucket. You can then use SageMaker AI to train and build your model. As a result, the instance type of your notebook would have no bearing on the speed of your model training and testing.
After receiving the request, SageMaker AI does the following:
-
Creates a network interface: If you choose the optional VPC configuration, SageMaker AI creates the network interface in your VPC. It uses the subnet ID that you provide in the request to determine which Availability Zone to create the network interface in. SageMaker AI associates the security group that you provide in the request with the network interface. For more information, see Connect a Notebook Instance in a VPC to External Resources.
-
Launches an ML compute instance: SageMaker AI launches an ML compute instance in a SageMaker AI VPC. SageMaker AI performs the configuration tasks that allow it to manage your notebook instance. If you specified a VPC, SageMaker AI enables traffic between your VPC and the notebook instance.
-
Installs Anaconda packages and libraries for common deep learning platforms: SageMaker AI installs all of the Anaconda packages that are included in the installer. For more information, see Anaconda package list. SageMaker AI also installs the TensorFlow and Apache MXNet deep learning libraries.
-
Attaches an ML storage volume: SageMaker AI attaches an ML storage volume to the ML compute instance. You can use the volume as a working area to clean up the training dataset or to temporarily store validation, test, or other data. Choose any size between 5 GB and 16,384 GB, in 1 GB increments, for the volume. The default is 5 GB. ML storage volumes are encrypted, so SageMaker AI can't determine the amount of available free space on the volume. Because of this, you can increase the volume size when you update a notebook instance, but you can't decrease the volume size. If you want to decrease the size of the ML storage volume in use, create a new notebook instance with the desired size.
Only files and data saved within the /home/ec2-user/SageMaker folder persist between notebook instance sessions. Files and data that are saved outside this directory are overwritten when the notebook instance stops and restarts. Each notebook instance's /tmp directory provides a minimum of 10 GB of storage in an instance store. An instance store is temporary, block-level storage that isn't persistent. When the instance is stopped or restarted, SageMaker AI deletes the directory's contents. This temporary storage is part of the root volume of the notebook instance.
If the instance type used by the notebook instance has NVMe support, customers can use the NVMe instance store volumes available for that instance type. For instances with NVMe store volumes, all instance store volumes are automatically attached to the instance at launch. For more information about instance types and their associated NVMe store volumes, see the Amazon Elastic Compute Cloud Instance Type Details. To make the attached NVMe store volume available for your notebook instance, complete the steps in Make instance store volumes available on your instance. Complete the steps with root access or by using a lifecycle configuration script.
Note
NVMe instance store volumes are not persistent storage. This storage is short-lived with the instance and must be reconfigured every time an instance with this storage is launched.
-
Copies example Jupyter notebooks: These Python code examples show model training and hosting exercises using different algorithms and training datasets.
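The ML storage volume rules described above (5 GB to 16,384 GB, whole gigabytes, and growth only when updating) can be checked before a request is sent. The following is a minimal sketch; the function name, constants, and error messages are illustrative and are not part of any AWS SDK.

```python
from typing import Optional

MIN_VOLUME_GB = 5
MAX_VOLUME_GB = 16384


def validate_volume_size(requested_gb: int, current_gb: Optional[int] = None) -> int:
    """Return requested_gb if it satisfies the documented rules, else raise ValueError."""
    # Sizes are specified in 1 GB increments, so only whole numbers are accepted.
    if isinstance(requested_gb, bool) or not isinstance(requested_gb, int):
        raise ValueError("volume size must be a whole number of GB")
    if not MIN_VOLUME_GB <= requested_gb <= MAX_VOLUME_GB:
        raise ValueError(
            f"volume size must be between {MIN_VOLUME_GB} and {MAX_VOLUME_GB} GB"
        )
    # On update, the volume can grow but not shrink.
    if current_gb is not None and requested_gb < current_gb:
        raise ValueError(
            "volume size can't be decreased; create a new notebook instance instead"
        )
    return requested_gb
```

Pass `current_gb` only when updating an existing notebook instance; for a new instance, the range check alone applies.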
To create a SageMaker AI notebook instance:
-
Open the SageMaker AI console at https://console.aws.amazon.com/sagemaker/.
-
Choose Notebook instances, then choose Create notebook instance.
-
On the Create notebook instance page, provide the following information:
-
For Notebook instance name, type a name for your notebook instance.
-
For Notebook instance type, choose an instance size suitable for your use case. For a list of supported instance types and quotas, see Amazon SageMaker AI Service Quotas.
-
For Platform Identifier, choose a platform type to create the notebook instance on. This platform type determines the operating system and the JupyterLab version that your notebook instance is created with. For information about platform identifier types, see Amazon Linux 2 notebook instances. For information about JupyterLab versions, see JupyterLab versioning.
-
(Optional) Additional configuration lets advanced users create a shell script that can run when you create or start the instance. This script, called a lifecycle configuration script, can be used to set the environment for the notebook or to perform other functions. For information, see Customization of a SageMaker notebook instance using an LCC script.
-
(Optional) Additional configuration also lets you specify the size, in GB, of the ML storage volume that is attached to the notebook instance. You can choose a size between 5 GB and 16,384 GB, in 1 GB increments. You can use the volume to clean up the training dataset or to temporarily store validation or other data.
-
(Optional) For Minimum IMDS Version, select a version from the dropdown list. If this value is set to v1, both versions can be used with the notebook instance. If v2 is selected, then only IMDSv2 can be used with the notebook instance. For information about IMDSv2, see Use IMDSv2.
Note
Starting October 31, 2022, the default minimum IMDS version for SageMaker notebook instances changes from IMDSv1 to IMDSv2.
Starting February 1, 2023, IMDSv1 is no longer available for new notebook instance creation. After this date, you can create notebook instances only with a minimum IMDS version of 2.
-
For IAM role, choose either an existing IAM role in your account with the necessary permissions to access SageMaker AI resources or Create a new role. If you choose Create a new role, SageMaker AI creates an IAM role named AmazonSageMaker-ExecutionRole-YYYYMMDDTHHmmSS. The AWS managed policy AmazonSageMakerFullAccess is attached to the role. The role provides permissions that allow the notebook instance to call SageMaker AI and Amazon S3.
-
For Root access, to give root access for all notebook instance users, choose Enable. To remove root access for users, choose Disable. If you give root access, all notebook instance users have administrator privileges and can access and edit all files on the instance.
-
(Optional) Encryption key lets you encrypt data on the ML storage volume attached to the notebook instance using an AWS Key Management Service (AWS KMS) key. If you plan to store sensitive information on the ML storage volume, consider encrypting the information.
-
(Optional) Network lets you put your notebook instance inside a Virtual Private Cloud (VPC). A VPC provides additional security and limits access to resources in the VPC from sources outside the VPC. For more information on VPCs, see Amazon VPC User Guide.
To add your notebook instance to a VPC:
-
Choose the VPC and a SubnetId.
-
For Security Group, choose your VPC's default security group.
-
If your notebook instance needs internet access, choose Enable for Direct internet access. Internet access can make your notebook instance less secure. For more information, see Connect a Notebook Instance in a VPC to External Resources.
-
(Optional) To associate Git repositories with the notebook instance, choose a default repository and up to three additional repositories. For more information, see Git repositories with SageMaker AI Notebook Instances.
-
Choose Create notebook instance.
In a few minutes, Amazon SageMaker AI launches an ML compute instance (in this case, a notebook instance) and attaches an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries. For more information, see the CreateNotebookInstance API.
-
When the status of the notebook instance is InService in the console, the notebook instance is ready to use. Choose Open Jupyter next to the notebook name to open the classic Jupyter dashboard.
Note
To augment the security of your Amazon SageMaker notebook instance, all regional notebook.region.sagemaker.aws domains are registered in the internet Public Suffix List (PSL). For further security, we recommend that you use cookies with a __Host- prefix to set sensitive cookies for the domains of your SageMaker notebook instances. This helps to defend your domain against cross-site request forgery (CSRF) attempts. For more information, see the Set-Cookie page in the mozilla.org developer documentation website.
You can choose Open JupyterLab to open the JupyterLab dashboard. The dashboard provides access to your notebook instance and sample SageMaker AI notebooks that contain complete code walkthroughs. These walkthroughs show how to use SageMaker AI to perform common machine learning tasks. For more information, see Access example notebooks. For more information about root access, see Control root access to a SageMaker notebook instance.
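Outside the console, the InService check in the last step can be approximated by polling the notebook instance status. The sketch below assumes the status is read from the NotebookInstanceStatus field of a DescribeNotebookInstance response. The function names, polling interval, and attempt limit are illustrative choices, and the boto3-backed helper needs AWS credentials to run.

```python
import time
from typing import Callable, Dict


def wait_until_in_service(
    describe: Callable[[], Dict[str, str]],
    poll_seconds: float = 30.0,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll describe() until the status is InService; raise on failure or timeout."""
    for _ in range(max_attempts):
        status = describe()["NotebookInstanceStatus"]
        if status == "InService":
            return
        if status == "Failed":
            raise RuntimeError("notebook instance creation failed")
        sleep(poll_seconds)
    raise TimeoutError("notebook instance did not reach InService in time")


def describe_with_boto3(name: str) -> Callable[[], Dict[str, str]]:
    """Return a describe() callable backed by boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the polling logic has no SDK dependency

    client = boto3.client("sagemaker")
    return lambda: client.describe_notebook_instance(NotebookInstanceName=name)
```

For example, `wait_until_in_service(describe_with_boto3("example-notebook"))` blocks until the hypothetical instance named example-notebook is ready. boto3 also provides a built-in `notebook_instance_in_service` waiter, which may be preferable in practice.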
For more information about Jupyter notebooks, see The Jupyter notebook.
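For the __Host- cookie recommendation in the note above, the following sketch shows one way a web application served from a notebook instance domain might build such a Set-Cookie value with Python's standard library. The function name and cookie name are hypothetical. Browsers accept the __Host- prefix only when the cookie has the Secure attribute, a Path of /, and no Domain attribute, and is set from a secure origin.

```python
from http.cookies import SimpleCookie


def host_prefixed_cookie(name: str, value: str) -> str:
    """Build a Set-Cookie header value that satisfies the __Host- prefix rules."""
    key = f"__Host-{name}"
    cookie = SimpleCookie()
    cookie[key] = value
    cookie[key]["path"] = "/"       # the __Host- prefix requires Path=/
    cookie[key]["secure"] = True    # the __Host- prefix requires Secure
    cookie[key]["httponly"] = True  # optional hardening, not required by the prefix
    # No Domain attribute is set, which the __Host- prefix also requires.
    return cookie[key].OutputString()


header_value = host_prefixed_cookie("session", "abc123")
```

The returned string is meant to be sent as the value of a Set-Cookie response header over HTTPS.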