

# Mapping of training storage paths managed by Amazon SageMaker AI
<a name="model-train-storage"></a>

This page provides a high-level summary of how the SageMaker training platform manages storage paths for training datasets, model artifacts, checkpoints, and outputs between AWS cloud storage and training jobs in SageMaker AI. Throughout this guide, you learn how to identify the default paths set by the SageMaker AI platform and how to align data channels with your data sources in Amazon Simple Storage Service (Amazon S3), Amazon FSx for Lustre, and Amazon EFS. For more information about the available data channel input modes and storage options, see [Setting up training jobs to access datasets](model-access-training-data.md).

## Overview of how SageMaker AI maps storage paths
<a name="model-train-storage-overview"></a>

The following diagram shows an example of how SageMaker AI maps input and output paths when you run a training job using the SageMaker Python SDK [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) class. 

![\[An example of how SageMaker AI maps paths between the training job container and the storage when you run a training job using the SageMaker Python SDK Estimator class and its fit method.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sagemaker-training-storage.png)


SageMaker AI maps storage paths between storage services (such as Amazon S3, Amazon FSx, and Amazon EFS) and the SageMaker training container based on the paths and input mode specified through a SageMaker AI estimator object. For more information about how SageMaker AI reads from and writes to these paths, and the purpose of each path, see [SageMaker AI environment variables and the default paths for training storage locations](model-train-storage-env-var-summary.md).

You can use `OutputDataConfig` in the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API to save the results of model training to an S3 bucket. Use the [ModelArtifacts](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelArtifacts.html) API to find the S3 bucket that contains your model artifacts. See the [abalone\_build\_train\_deploy](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb) notebook for an example of output paths and how they are used in API calls.
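As a minimal sketch, the `OutputDataConfig` portion of a `CreateTrainingJob` request can be assembled as follows. The helper function name and the bucket and prefix values are illustrative, not part of the API:

```
# Sketch of building the OutputDataConfig portion of a CreateTrainingJob
# request; the helper name and bucket/prefix values are placeholders.
def output_data_config(bucket, prefix, kms_key_id=None):
    config = {"S3OutputPath": f"s3://{bucket}/{prefix}"}
    if kms_key_id:
        # Optional customer managed key for encrypting the output
        config["KmsKeyId"] = kms_key_id
    return config
```

After the job completes, the artifact location can be read back from the `ModelArtifacts` field of the `DescribeTrainingJob` response, for example `response["ModelArtifacts"]["S3ModelArtifacts"]` when using an AWS SDK.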

For more information and examples of how SageMaker AI manages data sources, input modes, and local paths in SageMaker training instances, see [Access Training Data](https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html).

**Topics**
+ [Overview of how SageMaker AI maps storage paths](#model-train-storage-overview)
+ [Uncompressed model output](model-train-storage-uncompressed.md)
+ [Managing storage paths for different types of instance local storage](model-train-storage-tips-considerations.md)
+ [SageMaker AI environment variables and the default paths for training storage locations](model-train-storage-env-var-summary.md)

# Uncompressed model output
<a name="model-train-storage-uncompressed"></a>

SageMaker AI stores your model in `/opt/ml/model` and your data in `/opt/ml/output/data`. After the model and data are written to those locations, they're uploaded to your Amazon S3 bucket as compressed files by default. 
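As a sketch of how a training script typically targets these locations, the following example writes the final model to the model directory and auxiliary outputs to the output data directory. The `SM_MODEL_DIR` and `SM_OUTPUT_DATA_DIR` environment variables are set by the SageMaker Training Toolkit inside the container; the fallbacks and the `model.bin` and `metrics.json` file names here are illustrative:

```
import json
import os

# Default locations described above; inside a SageMaker training container
# the Training Toolkit exposes them as environment variables.
MODEL_DIR = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
OUTPUT_DATA_DIR = os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")

def save_artifacts(model_bytes, metrics, model_dir=MODEL_DIR, output_dir=OUTPUT_DATA_DIR):
    """Write the final model to model_dir and auxiliary outputs to output_dir.
    Everything under these paths is uploaded to your Amazon S3 bucket when the
    job ends (compressed into tar.gz files unless uncompressed upload mode is on)."""
    os.makedirs(model_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.bin"), "wb") as f:
        f.write(model_bytes)
    with open(os.path.join(output_dir, "metrics.json"), "w") as f:
        json.dump(metrics, f)
```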

You can save time on large data file compression by uploading model and data outputs to your S3 bucket as uncompressed files. To do this, create a training job in uncompressed upload mode by using either the AWS Command Line Interface (AWS CLI) or the SageMaker Python SDK. 

The following code example shows how to create a training job in uncompressed upload mode when using the AWS CLI. To enable uncompressed upload mode, set the `CompressionType` field in the `OutputDataConfig` API to `NONE`.

```
{
   "TrainingJobName": "uncompressed_model_upload",
   ...
   "OutputDataConfig": { 
      "S3OutputPath": "s3://amzn-s3-demo-bucket/uncompressed_upload/output",
      "CompressionType": "NONE"
   },
   ...
}
```

The following code example shows you how to create a training job in uncompressed upload mode using the SageMaker Python SDK.

```
import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="your-own-image-uri",
    role=sagemaker.get_execution_role(), 
    sagemaker_session=sagemaker.Session(),
    instance_count=1,
    instance_type='ml.c4.xlarge',
    disable_output_compression=True
)
```

# Managing storage paths for different types of instance local storage
<a name="model-train-storage-tips-considerations"></a>

Consider the following when setting up storage paths for training jobs in SageMaker AI.
+ If you want to store training artifacts for distributed training in the `/opt/ml/output/data` directory, you must properly append subdirectories or use unique file names for the artifacts through your model definition or training script. If the subdirectories and file names are not properly configured, all of the distributed training workers might write outputs to the same file name in the same output path in Amazon S3.
+ If you use a custom training container, make sure that you install the [SageMaker Training Toolkit](https://github.com/aws/sagemaker-training-toolkit), which helps set up the environment for SageMaker training jobs. Otherwise, you must specify the environment variables explicitly in your Dockerfile. For more information, see [Create a container with your own algorithms and models](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-create.html).
+ When using an ML instance with [NVMe SSD volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes), SageMaker AI doesn't provision Amazon EBS gp2 storage. Available storage is fixed to the NVMe instance storage capacity. SageMaker AI configures the storage paths for training datasets, checkpoints, model artifacts, and outputs to use the entire capacity of the instance storage. Examples of ML instance families with NVMe instance storage include `ml.p4d`, `ml.g4dn`, and `ml.g5`. When using an ML instance with the EBS-only storage option and no instance storage, you must define the size of the EBS volume through the `volume_size` parameter in the SageMaker AI estimator class (or `VolumeSizeInGB` if you are using the `ResourceConfig` API). Examples of ML instance families that use EBS volumes include `ml.c5` and `ml.p2`. To look up instance types and their instance storage types and volumes, see [Amazon EC2 Instance Types](https://aws.amazon.com/ec2/instance-types/).
+ The default paths for SageMaker training jobs are mounted to Amazon EBS volumes or NVMe SSD volumes of the ML instance. When you adapt your training script to SageMaker AI, make sure that you use the default paths listed in [SageMaker AI environment variables and the default paths for training storage locations](model-train-storage-env-var-summary.md). We recommend that you use the `/tmp` directory as scratch space for temporarily storing large objects during training. To avoid out-of-space errors, don't use directories that are mounted to the small disk space allocated for the system, such as `/user` and `/home`.
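For the first consideration above, one minimal way to keep distributed workers from overwriting each other's outputs is to derive each output file name from the worker's host name. `SM_CURRENT_HOST` is set by the SageMaker Training Toolkit; the helper name, the `algo-1` fallback, and the `metrics` stem here are illustrative:

```
import os

def worker_output_path(output_dir=None, host=None, stem="metrics"):
    """Build a per-worker file name so that distributed workers never write
    to the same key when /opt/ml/output/data is uploaded to Amazon S3."""
    output_dir = output_dir or os.environ.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data")
    # Host names in a SageMaker cluster look like algo-1, algo-2, and so on.
    host = host or os.environ.get("SM_CURRENT_HOST", "algo-1")
    return os.path.join(output_dir, f"{stem}-{host}.json")
```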

To learn more, see the AWS machine learning blog post [Choose the best data source for your Amazon SageMaker training job](https://aws.amazon.com/blogs/machine-learning/choose-the-best-data-source-for-your-amazon-sagemaker-training-job/), which further discusses case studies and performance benchmarks of data sources and input modes.

# SageMaker AI environment variables and the default paths for training storage locations
<a name="model-train-storage-env-var-summary"></a>

The following table summarizes the input and output paths for training datasets, checkpoints, model artifacts, and outputs, managed by the SageMaker training platform.


| Local path in SageMaker training instance | SageMaker AI environment variable | Purpose | Read from S3 during start | Read from S3 during Spot-restart | Writes to S3 during training | Writes to S3 when job is terminated | 
| --- | --- | --- | --- | --- | --- | --- | 
|  `/opt/ml/input/data/channel_name`1   |  SM\_CHANNEL\_*CHANNEL\_NAME*  |  Reading training data from the input channels specified through the SageMaker AI Python SDK [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) class or the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API operation. For more information about how to specify it in your training script using the SageMaker Python SDK, see [Prepare a Training script](https://sagemaker.readthedocs.io/en/stable/overview.html?highlight=VPC#prepare-a-training-script).  | Yes | Yes | No | No | 
|  `/opt/ml/output/data`2  | SM\_OUTPUT\_DATA\_DIR |  Saving outputs such as loss, accuracy, intermediate layers, weights, gradients, bias, and TensorBoard-compatible outputs. You can also save any arbitrary output you’d like using this path. Note that this is a different path from the one for storing the final model artifact, `/opt/ml/model/`.  | No | No | No | Yes | 
|  `/opt/ml/model`3  | SM\_MODEL\_DIR |  Storing the final model artifact. This is also the path from which the model artifact is deployed for [Real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) in SageMaker AI Hosting.  | No | No | No | Yes | 
|  `/opt/ml/checkpoints`4  | - |  Saving model checkpoints (the state of the model) to resume training from a certain point, and to recover from unexpected or [Managed Spot Training](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html) interruptions.  | Yes | Yes | Yes | No | 
|  `/opt/ml/code`  | SAGEMAKER\_SUBMIT\_DIRECTORY |  Copying training scripts, additional libraries, and dependencies.  | Yes | Yes | No | No | 
|  `/tmp`  | - |  Reading or writing to `/tmp` as a scratch space.  | No | No | No | No | 

1 `channel_name` is the place to specify user-defined channel names for training data inputs. Each training job can contain several data input channels; you can specify up to 20 training input channels per training job. Note that the time spent downloading data from the data channels is counted toward the billable time. For more information about data input paths, see [How Amazon SageMaker AI Provides Training Information](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html). SageMaker AI supports three data input modes: File, FastFile, and Pipe mode. To learn more about the data input modes for training in SageMaker AI, see [Access Training Data](https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html).

2 SageMaker AI compresses and writes training artifacts to TAR files (`tar.gz`). Compression and upload time is counted toward the billable time. For more information, see [How Amazon SageMaker AI Processes Training Output](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html).

3 SageMaker AI compresses and writes the final model artifact to a TAR file (`tar.gz`). Compression and upload time is counted toward the billable time. For more information, see [How Amazon SageMaker AI Processes Training Output](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-output.html).

4 Checkpoint files are synced with Amazon S3 during training and are written as-is, without compression into TAR files. For more information, see [Use Checkpoints in Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/model-checkpoints.html).
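As a sketch of how the checkpoint path is typically used in a training script, the following example saves training state under `/opt/ml/checkpoints` and resumes from it if a previous checkpoint exists, for example after a Managed Spot Training interruption. The `state.json` file name and function names are illustrative:

```
import json
import os

# Default local checkpoint path from the table above.
CHECKPOINT_DIR = "/opt/ml/checkpoints"

def save_state(state, checkpoint_dir=CHECKPOINT_DIR):
    """Persist training state; files under checkpoint_dir are synced to
    Amazon S3 during training without TAR compression."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    with open(os.path.join(checkpoint_dir, "state.json"), "w") as f:
        json.dump(state, f)

def load_state(checkpoint_dir=CHECKPOINT_DIR):
    """Resume from the latest checkpoint if one exists; otherwise start
    training from scratch."""
    path = os.path.join(checkpoint_dir, "state.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0}
```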