SageMaker AI environment variables and the default paths for training storage locations - Amazon SageMaker AI

The following table summarizes the input and output paths for training datasets, checkpoints, model artifacts, and outputs, managed by the SageMaker training platform.

| Local path in SageMaker training instance | SageMaker AI environment variable | Purpose | Read from S3 during start | Read from S3 during Spot-restart | Writes to S3 during training | Writes to S3 when job is terminated |
| --- | --- | --- | --- | --- | --- | --- |
| /opt/ml/input/data/channel_name¹ | SM_CHANNEL_CHANNEL_NAME | Reading training data from the input channels specified through the SageMaker AI Python SDK Estimator class or the CreateTrainingJob API operation. For more information about how to specify it in your training script using the SageMaker Python SDK, see Prepare a Training script. | Yes | Yes | No | No |
| /opt/ml/output/data² | SM_OUTPUT_DIR | Saving outputs such as loss, accuracy, intermediate layers, weights, gradients, bias, and TensorBoard-compatible outputs. You can also save any other output to this path. Note that this path is different from /opt/ml/model/, which stores the final model artifact. | No | No | No | Yes |
| /opt/ml/model³ | SM_MODEL_DIR | Storing the final model artifact. This is also the path from which the model artifact is deployed for real-time inference in SageMaker AI Hosting. | No | No | No | Yes |
| /opt/ml/checkpoints⁴ | - | Saving model checkpoints (the state of the model) to resume training from a certain point and to recover from unexpected interruptions or Managed Spot Training interruptions. | Yes | Yes | Yes | No |
| /opt/ml/code | SAGEMAKER_SUBMIT_DIRECTORY | Copying training scripts, additional libraries, and dependencies. | Yes | Yes | No | No |
| /tmp | - | Reading or writing to /tmp as a scratch space. | No | No | No | No |
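As an illustration of how a training script might use these locations, the following is a minimal sketch that looks up the environment variables listed in the table and falls back to the default local paths when a variable is not set (for example, when the script runs outside a SageMaker training container). The helper names `channel_dir` and `model_dir` are hypothetical, and the sketch assumes that the variable for a channel is formed by upper-casing the channel name after the `SM_CHANNEL_` prefix, as the table's `channel_name` / `SM_CHANNEL_CHANNEL_NAME` pairing suggests.

```python
import os


def channel_dir(channel_name, env=None):
    """Return the local directory for one training input channel.

    Reads SM_CHANNEL_<CHANNEL_NAME> (assumption: channel name upper-cased)
    and falls back to the default path /opt/ml/input/data/<channel_name>.
    """
    env = os.environ if env is None else env
    default = f"/opt/ml/input/data/{channel_name}"
    return env.get(f"SM_CHANNEL_{channel_name.upper()}", default)


def model_dir(env=None):
    """Return the directory for the final model artifact.

    Reads SM_MODEL_DIR and falls back to the default path /opt/ml/model.
    """
    env = os.environ if env is None else env
    return env.get("SM_MODEL_DIR", "/opt/ml/model")


if __name__ == "__main__":
    # "train" is a hypothetical channel name chosen for this example.
    print("training data:", channel_dir("train"))
    print("model output: ", model_dir())
```

Passing an explicit `env` mapping instead of relying on `os.environ` keeps the helpers easy to exercise locally before submitting a job.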

¹ channel_name is a placeholder for a user-defined channel name for training data input. Each training job can have several data input channels, up to 20 per job. The time spent downloading data from the channels counts toward billable time. For more information about data input paths, see How Amazon SageMaker AI Provides Training Information. SageMaker AI supports three data input modes: File, FastFile, and Pipe. To learn more about the data input modes for training in SageMaker AI, see Access Training Data.

² SageMaker AI compresses training artifacts and writes them to TAR files (tar.gz). Compression and upload time counts toward billable time. For more information, see How Amazon SageMaker AI Processes Training Output.

³ SageMaker AI compresses the final model artifact and writes it to a TAR file (tar.gz). Compression and upload time counts toward billable time. For more information, see How Amazon SageMaker AI Processes Training Output.

⁴ SageMaker AI syncs this path with Amazon S3 during training. Checkpoints are written as is, without being compressed into TAR files. For more information, see Use Checkpoints in Amazon SageMaker AI.
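Because the checkpoint path is read back from S3 on start and on Spot-restart, a training script can resume by looking for existing files in that directory before it begins. The following is a minimal sketch under stated assumptions, not a SageMaker API: the function names, the JSON file format, and the `checkpoint-<step>.json` naming scheme are all hypothetical, and a real script would typically use its framework's own checkpoint format instead.

```python
import json
import os
from pathlib import Path

CHECKPOINT_DIR = "/opt/ml/checkpoints"  # default local checkpoint path


def save_checkpoint(state, step, checkpoint_dir=CHECKPOINT_DIR):
    """Write the training state for `step` as a JSON file and return its path."""
    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"checkpoint-{step:08d}.json"
    tmp_path = directory / f".checkpoint-{step:08d}.json.tmp"
    tmp_path.write_text(json.dumps({"step": step, "state": state}))
    # Rename into place so an interruption cannot leave a half-written
    # file under the final checkpoint name.
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(checkpoint_dir=CHECKPOINT_DIR):
    """Return (step, state) from the newest checkpoint, or (0, None) if none exist."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return 0, None
    # Zero-padded step numbers make lexicographic order match step order.
    candidates = sorted(directory.glob("checkpoint-*.json"))
    if not candidates:
        return 0, None
    payload = json.loads(candidates[-1].read_text())
    return payload["step"], payload["state"]
```

A training loop could call `load_latest_checkpoint()` once at startup, continue from the returned step, and call `save_checkpoint()` at whatever interval suits the job.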