Available options

The following table displays all available options you can use to customize your notebook job, whether you run your notebook job in Studio, in a local Jupyter environment, or with the SageMaker Python SDK. The table includes the type of custom option, a description, additional guidelines about how to use the option, the field name for the option in Studio (if available), and the parameter name for the notebook job step in the SageMaker Python SDK (if available).

For some options, you can also preset custom default values so you don’t have to specify them every time you set up a notebook job. For Studio, these options are Role, Input folder, Output folder, and KMS Key ID, and are specified in the following table. If you preset custom defaults for these options, these fields are prepopulated in the Create Job form when you create your notebook job. For details about how to create custom defaults in Studio and local Jupyter environments, see Set up default options for local notebooks.

The SageMaker SDK also gives you the option to set intelligent defaults so that you don’t have to specify these parameters when you create a NotebookJobStep. These parameters are role, s3_root_uri, s3_kms_key, volume_kms_key, subnets, and security_group_ids, as indicated in the following table. For information about how to set intelligent defaults, see Set up default options.
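For example, the following is a minimal sketch of what a NotebookJobStep can look like when intelligent defaults are configured, so that role, s3_root_uri, and the encryption and networking parameters are omitted and resolved from your administrator-managed configuration. The notebook path, image URI, kernel name, and pipeline name are hypothetical placeholders, not values required by SageMaker.

# Minimal sketch: intelligent defaults supply role, s3_root_uri, s3_kms_key,
# volume_kms_key, subnets, and security_group_ids, so they are omitted here.
# The notebook path, image URI, kernel name, and pipeline name are hypothetical.
from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

notebook_job_step = NotebookJobStep(
    input_notebook="my-analysis.ipynb",          # notebook to run noninteractively
    image_uri="<ecr-uri-of-a-supported-image>",  # see the Image option in the table
    kernel_name="python3",                       # a kernel present in the image
)

pipeline = Pipeline(name="MyNotebookJobPipeline", steps=[notebook_job_step])

You could then register and run the pipeline, for example with pipeline.create() or pipeline.upsert() followed by pipeline.start(), once the remaining parameters resolve from your configured defaults.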

Custom option Description Studio-specific guideline Local Jupyter environment guideline SageMaker Python SDK guideline
Job name Your job name as it should appear in the Notebook Jobs dashboard. Field Job name. Same as Studio. Parameter notebook_job_name. Defaults to None.
Image The container image used to run the notebook noninteractively on the chosen compute type. Field Image. This field defaults to your notebook’s current image. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. This image can be a custom, bring-your-own image or an available Amazon SageMaker image. For a list of available SageMaker images supported by the notebook scheduler, see Amazon SageMaker images available for use with Studio Classic. Field Image. This field requires an ECR URI of a Docker image that can run the provided notebook on the selected compute type. By default, the scheduler extension uses a pre-built SageMaker AI Docker image, Base Python 2.0. This is the official Python 3.8 image from DockerHub with boto3, the AWS CLI, and the Python 3 kernel. You can also provide any ECR URI that meets the notebook custom image specification. For details, see Custom SageMaker image specifications. This image should have all the kernels and libraries needed for the notebook run. Required. Parameter image_uri. URI location of a Docker image on ECR. You can use specific SageMaker Distribution Images, a custom image based on those images, or your own image pre-installed with notebook job dependencies that meets additional requirements. For details, see Image constraints for SageMaker AI Python SDK notebook jobs.
Instance type The EC2 instance type to use to run the notebook job. The notebook job uses a SageMaker Training Job as a computing layer, so the specified instance type should be a SageMaker Training supported instance type. Field Compute type. Defaults to ml.m5.large. Same as Studio. Parameter instance_type. Defaults to ml.m5.large.
Kernel The Jupyter kernel used to run the notebook job. Field Kernel. This field defaults to your notebook’s current kernel. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. Field Kernel. This kernel should be present in the image and follow the Jupyter kernel specs. This field defaults to the Python3 kernel found in the base Python 2.0 SageMaker image. Change this field to a custom value if needed. Required. Parameter kernel_name. This kernel should be present in the image and follow the Jupyter kernel specs. To see the kernel identifiers for your image, see (LINK).
SageMaker AI session The underlying SageMaker AI session to which SageMaker AI service calls are delegated. N/A N/A Parameter sagemaker_session. If unspecified, one is created using a default configuration chain.
Role ARN The role’s Amazon Resource Name (ARN) used with the notebook job. Field Role ARN. This field defaults to the Studio execution role. Change this field to a custom value if needed.
Note

If Studio cannot infer this value, the Role ARN field is blank. In this case, insert the ARN you want to use.

Field Role ARN. This field defaults to any role prefixed with SagemakerJupyterScheduler. If you have multiple roles with the prefix, the extension chooses one. Change this field to a custom value if needed. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see Set up default options for local notebooks. Parameter role. Defaults to the SageMaker AI default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks. Otherwise, it throws a ValueError. Allows intelligent defaults.
Input notebook The name of the notebook that you are scheduling to run. Required. Field Input file. Same as Studio. Required. Parameter input_notebook.
Input folder The folder containing your inputs. The job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. Field Input folder. If you don’t provide a folder, the scheduler creates a default Amazon S3 bucket for your inputs. Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see Set up default options for local notebooks. N/A. The input folder is placed inside the location specified by parameter s3_root_uri.
Output folder The folder containing your outputs. The job outputs, including the output notebook and logs, are put in this folder. Field Output folder. If you don’t specify a folder, the scheduler creates a default Amazon S3 bucket for your outputs. Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see Set up default options for local notebooks. N/A. The output folder is placed inside the location specified by parameter s3_root_uri.
Parameters A dictionary of variables and values to pass to your notebook job. Field Parameters. You need to parameterize your notebook to accept parameters. Same as Studio. Parameter parameters. You need to parameterize your notebook to accept parameters.
Additional (file or folder) dependencies A list of file or folder dependencies that the notebook job uploads to a staging folder in Amazon S3. Not supported. Not supported. Parameter additional_dependencies. The notebook job uploads these dependencies to an S3 staging folder so they can be consumed during execution.
S3 root URI The root Amazon S3 folder for your job. The job's input and output folders are created under this location. N/A. Use Input folder and Output folder. Same as Studio. Parameter s3_root_uri. Defaults to a default S3 bucket. Allows intelligent defaults.
Environment variables Any existing environment variables that you want to override, or new environment variables that you want to introduce and use in your notebook. Field Environment variables. Same as Studio. Parameter environment_variables. Defaults to None.
Tags A list of tags attached to the job. N/A N/A Parameter tags. Defaults to None. Your tags control how the Studio UI captures and displays the job created by the pipeline. For details, see View your notebook jobs in the Studio UI dashboard.
Start-up script A script preloaded in the notebook startup menu that you can choose to run before you run the notebook. Field Start-up script. Select a Lifecycle Configuration (LCC) script that runs on the image at start-up.
Note

A start-up script runs in a shell outside of the Studio environment. Therefore, this script cannot depend on the Studio local storage, environment variables, or app metadata (in /opt/ml/metadata). Also, if you use a start-up script and an initialization script, the start-up script runs first.

Not supported. Not supported.
Initialization script A path to a local script you can run when your notebook starts up. Field Initialization script. Enter the EFS file path where a local script or a Lifecycle Configuration (LCC) script is located. If you use a start-up script and an initialization script, the start-up script runs first.
Note

An initialization script is sourced from the same shell as the notebook job. This is not the case for a start-up script described previously. Also, if you use a start-up script and an initialization script, the start-up script runs first.

Field Initialization script. Enter the local file path where a local script or a Lifecycle Configuration (LCC) script is located. Parameter initialization_script. Defaults to None.
Max retry attempts The number of times Studio tries to rerun a failed job run. Field Max retry attempts. Defaults to 1. Same as Studio. Parameter max_retry_attempts. Defaults to 1.
Max run time (in seconds) The maximum length of time, in seconds, that a notebook job can run before it is stopped. If you configure both Max run time and Max retry attempts, the run time applies to each retry. If a job does not complete in this time, its status is set to Failed. Field Max run time (in seconds). Defaults to 172800 seconds (2 days). Same as Studio. Parameter max_runtime_in_seconds. Defaults to 172800 seconds (2 days).
Retry policies A list of retry policies, which govern actions to take in case of failure. Not supported. Not supported. Parameter retry_policies. Defaults to None.
Add Step or StepCollection dependencies A list of Step or StepCollection names or instances on which the job depends. Not supported. Not supported. Parameter depends_on. Defaults to None. Use this to define explicit dependencies between steps in your pipeline graph.
Volume size The size in GB of the storage volume for storing input and output data during training. Not supported. Not supported. Parameter volume_size. Defaults to 30 GB.
Encrypt traffic between containers A flag that specifies whether traffic between training containers is encrypted for the training job. N/A. Enabled by default. N/A. Enabled by default. Parameter encrypt_inter_container_traffic. Defaults to True.
Configure job encryption An indicator that you want to encrypt your notebook job outputs, job instance volume, or both. Field Configure job encryption. Check this box to choose encryption. If left unchecked, the job outputs are encrypted with the account's default KMS key and the job instance volume is not encrypted. Same as Studio. Not supported.
Output encryption KMS key A KMS key to use if you want to customize the encryption key used for your notebook job outputs. This field is only applicable if you checked Configure job encryption. Field Output encryption KMS key. If you do not specify this field, your notebook job outputs are encrypted with SSE-KMS using the default Amazon S3 KMS key. Also, if you create the Amazon S3 bucket yourself and use encryption, your encryption method is preserved. Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see Set up default options for local notebooks. Parameter s3_kms_key. Defaults to None. Allows intelligent defaults.
Job instance volume encryption KMS key A KMS key to use if you want to encrypt your job instance volume. This field is only applicable if you checked Configure job encryption. Field Job instance volume encryption KMS key. Field Job instance volume encryption KMS key. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see Set up default options for local notebooks. Parameter volume_kms_key. Defaults to None. Allows intelligent defaults.
Use a Virtual Private Cloud to run this job (for VPC users) An indicator that you want to run this job in a Virtual Private Cloud (VPC). For better security, we recommend that you use a private VPC. Field Use a Virtual Private Cloud to run this job. Check this box if you want to use a VPC. At minimum, create VPC endpoints so that your notebook job can privately connect to the AWS services it uses.
If you choose to use a VPC, you need to specify at least one private subnet and at least one security group in the following options. If you don’t use any private subnets, you need to consider other configuration options. For details, see Public VPC subnets not supported in Constraints and considerations.
Same as Studio. N/A
Subnet(s) (for VPC users) Your subnets. This field must contain at least one and at most five subnets, and all the subnets you provide should be private. For details, see Public VPC subnets not supported in Constraints and considerations. Field Subnet(s). This field defaults to the subnets associated with the Studio domain, but you can change this field if needed. Field Subnet(s). The scheduler cannot detect your subnets, so you need to enter any subnets you configured for your VPC. Parameter subnets. Defaults to None. Allows intelligent defaults.
Security group(s) (for VPC users) Your security groups. This field must contain at least one and at most 15 security groups. For details, see Public VPC subnets not supported in Constraints and considerations. Field Security groups. This field defaults to the security groups associated with the domain VPC, but you can change this field if needed. Field Security groups. The scheduler cannot detect your security groups, so you need to enter any security groups you configured for your VPC. Parameter security_group_ids. Defaults to None. Allows intelligent defaults.
Name The name of the notebook job step. N/A N/A Parameter name. If unspecified, it is derived from the notebook file name.
Display name Your job name as it should appear in your list of pipeline executions. N/A N/A Parameter display_name. Defaults to None.
Description A description of your job. N/A N/A Parameter description.
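To illustrate how the options above map onto SDK parameters, the following is a hedged sketch of a NotebookJobStep that sets many of them explicitly and then runs the step in a pipeline. All ARNs, URIs, bucket names, subnet and security group IDs, names, and tag values are hypothetical placeholders for your own account's values, and the tag shown is an illustrative convention rather than a value required by SageMaker.

# Fuller sketch exercising many options from the table above. Every ARN, URI,
# subnet ID, security group ID, bucket, and name is a hypothetical placeholder.
from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

nb_step = NotebookJobStep(
    name="nightly-report-step",                      # Name (step name)
    notebook_job_name="nightly-report",              # Job name in the dashboard
    input_notebook="notebooks/report.ipynb",         # Input notebook
    additional_dependencies=["notebooks/helpers/"],  # uploaded to the S3 staging folder
    image_uri="<ecr-uri-of-a-supported-image>",      # Image
    kernel_name="python3",                           # Kernel present in the image
    role="arn:aws:iam::111122223333:role/MyNotebookJobRole",
    instance_type="ml.m5.xlarge",                    # SageMaker Training supported type
    volume_size=50,                                  # Volume size in GB
    s3_root_uri="s3://amzn-s3-demo-bucket/notebook-jobs",  # S3 root URI
    parameters={"report_date": "2024-01-01"},        # requires a parameterized notebook
    environment_variables={"LOG_LEVEL": "INFO"},     # Environment variables
    max_retry_attempts=2,                            # Max retry attempts
    max_runtime_in_seconds=7200,                     # Max run time
    subnets=["subnet-0123456789abcdef0"],            # private subnets only
    security_group_ids=["sg-0123456789abcdef0"],     # Security group(s)
    tags=[{"Key": "team", "Value": "analytics"}],    # Tags
)

pipeline = Pipeline(name="NightlyReportPipeline", steps=[nb_step])
pipeline.create(role_arn="arn:aws:iam::111122223333:role/MyPipelineExecutionRole")
execution = pipeline.start()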