

# Setting up training jobs to access datasets
<a name="model-access-training-data"></a>

When creating a training job, you specify the location of training datasets in a data storage of your choice and the data input mode for the job. Amazon SageMaker AI supports Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), and Amazon FSx for Lustre. You can choose one of the input modes to stream the dataset in real time or download the whole dataset at the start of the training job.

**Note**  
Your dataset must reside in the same AWS Region as the training job.

## SageMaker AI input modes and AWS cloud storage options
<a name="model-access-training-data-input-modes"></a>

This section provides an overview of the input modes supported by SageMaker AI for data stored in Amazon S3, Amazon EFS, and Amazon FSx for Lustre.

![\[Summary of the SageMaker AI input modes for Amazon S3 and file systems in Amazon EFS and Amazon FSx for Lustre.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sagemaker-training-input-mode.png)

+ *File mode* presents a file system view of the dataset to the training container. This is the default input mode if you don't explicitly specify one of the other two options. If you use file mode, SageMaker AI downloads the training data from the storage location to a local directory in the Docker container. Training starts after the full dataset has been downloaded. In file mode, the training instance must have enough storage space to fit the entire dataset. File mode download speed depends on the size of the dataset, the average size of files, and the number of files. You can configure the dataset for file mode by providing an Amazon S3 prefix, a manifest file, or an augmented manifest file. You should use an S3 prefix when all your dataset files are located within a common S3 prefix. File mode is compatible with [SageMaker AI local mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode) (starting a SageMaker training container interactively in seconds). For distributed training, you can shard the dataset across multiple instances with the `ShardedByS3Key` option.
+ *Fast file mode* provides file system access to an Amazon S3 data source while leveraging the performance advantage of pipe mode. At the start of training, fast file mode identifies the data files but does not download them. Training can start without waiting for the entire dataset to download. This means that the training startup takes less time when there are fewer files in the Amazon S3 prefix provided.

  In contrast to pipe mode, fast file mode works with random access to the data. However, it works best when data is read sequentially. Fast file mode doesn't support augmented manifest files.

  Fast file mode exposes S3 objects using a POSIX-compliant file system interface, as if the files are available on the local disk of your training instance. It streams S3 content on demand as your training script consumes data. This means that your dataset no longer needs to fit into the training instance storage space as a whole, and you don't need to wait for the dataset to be downloaded to the training instance before training starts. Fast file currently supports S3 prefixes only (it does not support manifest and augmented manifest). Fast file mode is compatible with SageMaker AI local mode.
**Note**  
Using fast file mode might lead to increased CloudTrail costs due to additional logging of the following:  
Amazon S3 data events (if enabled in CloudTrail).  
AWS KMS decryption events when accessing Amazon S3 objects encrypted with AWS KMS keys.  
Management events related to AWS KMS operations.  
Review your CloudTrail configuration and monitoring costs if you have CloudTrail logging enabled for these event types.
+ *Pipe mode* streams data directly from an Amazon S3 data source. Streaming can provide faster start times and better throughput than file mode.

  When you stream the data directly, you can reduce the size of the Amazon EBS volumes used by the training instance. Pipe mode needs only enough disk space to store the final model artifacts.

  Pipe mode is an older streaming mode that has largely been replaced by the newer and simpler-to-use fast file mode. In pipe mode, data is pre-fetched from Amazon S3 at high concurrency and throughput, and streamed into a named pipe, which is also known as a First-In-First-Out (FIFO) pipe for its behavior. Each pipe may only be read by a single process. A SageMaker AI-specific extension to TensorFlow conveniently [integrates pipe mode into the native TensorFlow data loader](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html#training-with-pipe-mode-using-pipemodedataset) for streaming text, TFRecords, or RecordIO file formats. Pipe mode also supports managed sharding and shuffling of data.
+ *Amazon S3 Express One Zone* is a high-performance, single Availability Zone storage class that can deliver consistent, single-digit millisecond data access for the most latency-sensitive applications including SageMaker model training. Amazon S3 Express One Zone allows customers to collocate their object storage and compute resources in a single AWS Availability Zone, optimizing both compute performance and costs with increased data processing speed. To further increase access speed and support hundreds of thousands of requests per second, data is stored in a new bucket type, an Amazon S3 directory bucket.

  SageMaker AI model training supports high-performance Amazon S3 Express One Zone directory buckets as a data input location for file mode, fast file mode, and pipe mode. To use Amazon S3 Express One Zone, input the location of the Amazon S3 Express One Zone directory bucket instead of an Amazon S3 bucket. Provide the ARN for the IAM role with the required access control and permissions policy. Refer to the [AmazonSageMakerFullAccess policy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) for details. You can only encrypt your SageMaker AI output data in directory buckets with server-side encryption with Amazon S3 managed keys (SSE-S3). Server-side encryption with AWS KMS keys (SSE-KMS) is not currently supported for storing SageMaker AI output data in directory buckets. For more information, see [Amazon S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html).
+ Amazon FSx for Lustre – FSx for Lustre can scale to hundreds of gigabytes per second of throughput and millions of IOPS with low-latency file retrieval. When starting a training job, SageMaker AI mounts the FSx for Lustre file system to the training instance file system, then starts your training script. Mounting itself is a relatively fast operation that doesn't depend on the size of the dataset stored in FSx for Lustre.

  To access FSx for Lustre, your training job must connect to an Amazon Virtual Private Cloud (VPC), which requires setup and involvement from your DevOps team. To avoid data transfer costs, the file system uses a single Availability Zone, and you need to specify a VPC subnet that maps to this Availability Zone ID when running the training job.
+ Amazon EFS – To use Amazon EFS as a data source, the data must already reside in Amazon EFS prior to training. SageMaker AI mounts the specified Amazon EFS file system to the training instance, then starts your training script. Your training job must connect to a VPC to access Amazon EFS.
**Tip**  
To learn more about how to specify your VPC configuration to SageMaker AI estimators, see [Use File Systems as Training Inputs](https://sagemaker.readthedocs.io/en/stable/overview.html?highlight=VPC#use-file-systems-as-training-inputs) in the *SageMaker AI Python SDK documentation*.
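To make the pipe-mode behavior described above concrete, the following sketch simulates it locally with a named pipe. In a real job, SageMaker AI creates the FIFO under the container's input data directory (commonly `/opt/ml/input/data/<channel_name>_<epoch_number>`); the channel name, payload, and local paths here are illustrative.

```python
import os
import tempfile
import threading

# In a real pipe-mode job, SageMaker AI creates the FIFO for you;
# here we create one locally to show the read pattern a training
# script uses.
fifo_dir = tempfile.mkdtemp()
fifo_path = os.path.join(fifo_dir, "train_0")  # channel "train", epoch 0
os.mkfifo(fifo_path)

def feed_pipe():
    # Stands in for SageMaker AI streaming S3 objects into the pipe.
    with open(fifo_path, "wb") as writer:
        writer.write(b"record-1\nrecord-2\n")

feeder = threading.Thread(target=feed_pipe)
feeder.start()

# The training script opens the FIFO and reads it sequentially to EOF;
# each pipe can be read by only a single process.
with open(fifo_path, "rb") as reader:
    records = reader.read().splitlines()
feeder.join()
print(records)  # [b'record-1', b'record-2']
```

Because the pipe is consumed sequentially to EOF, this pattern fits streaming data loaders but not random access, which is why fast file mode is the better fit for random read patterns.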

# Configure data input mode using the SageMaker Python SDK
<a name="model-access-training-data-using-pysdk"></a>

The SageMaker Python SDK provides the generic [Estimator class](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) and its [variations for ML frameworks](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) for launching training jobs. You can specify one of the data input modes while configuring the SageMaker AI `Estimator` class or the `Estimator.fit` method. The following code templates show the two ways to specify input modes.

**To specify the input mode using the Estimator class**

```
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    checkpoint_s3_uri='s3://amzn-s3-demo-bucket/checkpoint-destination/',
    output_path='s3://amzn-s3-demo-bucket/output-path/',
    base_job_name='job-name',
    input_mode='File',  # Available options: File | Pipe | FastFile
    ...
)

# Run the training job
estimator.fit(
    inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data/train")
)
```

For more information, see the [sagemaker.estimator.Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) class in the *SageMaker Python SDK documentation*.

**To specify the input mode through the `estimator.fit()` method**

```
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    checkpoint_s3_uri='s3://amzn-s3-demo-bucket/checkpoint-destination/',
    output_path='s3://amzn-s3-demo-bucket/output-path/',
    base_job_name='job-name',
    ...
)

# Run the training job
estimator.fit(
    inputs=TrainingInput(
        s3_data="s3://amzn-s3-demo-bucket/my-data/train",
        input_mode='File'  # Available options: File | Pipe | FastFile
    )
)
```

For more information, see the [sagemaker.estimator.Estimator.fit](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator.fit) class method and the [sagemaker.inputs.TrainingInput](https://sagemaker.readthedocs.io/en/stable/api/utility/inputs.html#sagemaker.inputs.TrainingInput) class in the *SageMaker Python SDK documentation*.

**Tip**  
To learn more about how to configure Amazon FSx for Lustre or Amazon EFS with your VPC configuration using the SageMaker Python SDK estimators, see [Use File Systems as Training Inputs](https://sagemaker.readthedocs.io/en/stable/overview.html?highlight=VPC#use-file-systems-as-training-inputs) in the *SageMaker AI Python SDK documentation*.

**Tip**  
The data input mode integrations with Amazon S3, Amazon EFS, and FSx for Lustre are the recommended ways to configure your data source, but you're not strictly limited to them. You can write your own data reading logic directly in your training container. For example, you can read from a different data source, write your own S3 data loader class, or use a third-party framework's data loading functions within your training script. However, you must make sure that you specify paths that SageMaker AI can recognize.

**Tip**  
If you use a custom training container, make sure you install the [SageMaker training toolkit](https://github.com/aws/sagemaker-training-toolkit) that helps set up the environment for SageMaker training jobs. Otherwise, you must specify the environment variables explicitly in your Dockerfile. For more information, see [Create a container with your own algorithms and models](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-create.html).

For more information about how to set the data input modes using the low-level SageMaker APIs, see [How Amazon SageMaker AI Provides Training Information](your-algorithms-training-algo-running-container.md), the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API, and the `TrainingInputMode` parameter in [AlgorithmSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html).
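As a sketch of that low-level path, the following builds a minimal `CreateTrainingJob` request with the input mode set in `AlgorithmSpecification`. The job name, image URI, role ARN, bucket names, and instance settings are placeholders; you would pass the dict to `boto3.client("sagemaker").create_training_job(**request)`.

```python
# Minimal CreateTrainingJob request sketch; all <...> values are placeholders.
request = {
    "TrainingJobName": "job-name",
    "AlgorithmSpecification": {
        "TrainingImage": "<your-training-image-uri>",
        "TrainingInputMode": "FastFile",  # File | Pipe | FastFile
    },
    "RoleArn": "arn:aws:iam::<account-id>:role/<your-execution-role>",
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://amzn-s3-demo-bucket/my-data/train",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    "OutputDataConfig": {"S3OutputPath": "s3://amzn-s3-demo-bucket/output-path/"},
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
```

The SageMaker Python SDK templates above assemble the same request for you; the low-level form is useful when you are not using the SDK.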

# Configure data input channel to use Amazon FSx for Lustre
<a name="model-access-training-data-fsx"></a>

Learn how to use Amazon FSx for Lustre as your data source for higher throughput and faster training by reducing the time for data loading.

**Note**  
When you use EFA-enabled instances such as P4d and P3dn, make sure that you set appropriate inbound and outbound rules in the security group. Specifically, opening these ports is necessary for SageMaker AI to access the Amazon FSx file system in the training job. To learn more, see [File System Access Control with Amazon VPC](https://docs.aws.amazon.com/fsx/latest/LustreGuide/limit-access-security-groups.html).

## Sync Amazon S3 and Amazon FSx for Lustre
<a name="model-access-training-data-fsx-sync-s3"></a>

To link your Amazon S3 bucket to Amazon FSx for Lustre and upload your training datasets, do the following.

1. Prepare your dataset and upload it to an Amazon S3 bucket. For example, assume that the Amazon S3 paths for a train dataset and a test dataset are in the following format.

   ```
   s3://amzn-s3-demo-bucket/data/train
   s3://amzn-s3-demo-bucket/data/test
   ```

1. To create an FSx for Lustre file system linked with the Amazon S3 bucket with the training data, follow the steps at [Linking your file system to an Amazon S3 bucket](https://docs.aws.amazon.com/fsx/latest/LustreGuide/create-dra-linked-data-repo.html) in the *Amazon FSx for Lustre User Guide*. Make sure that you add an endpoint to your VPC allowing Amazon S3 access. For more information, see [Create an Amazon S3 VPC Endpoint](train-vpc.md#train-vpc-s3). When you specify **Data repository path**, provide the Amazon S3 bucket URI of the folder that contains your datasets. For example, based on the example S3 paths in step 1, the data repository path should be the following.

   ```
   s3://amzn-s3-demo-bucket/data
   ```

1. After the FSx for Lustre file system is created, check the configuration information by running the following commands.

   ```
   aws fsx describe-file-systems && \
   aws fsx describe-data-repository-associations
   ```

   These commands return `FileSystemId`, `MountName`, `FileSystemPath`, and `DataRepositoryPath`. For example, the outputs should look like the following.

   ```
   # Output of aws fsx describe-file-systems
   "FileSystemId": "fs-0123456789abcdef0"
   "MountName": "1234abcd"
   
   # Output of aws fsx describe-data-repository-associations
   "FileSystemPath": "/ns1",
   "DataRepositoryPath": "s3://amzn-s3-demo-bucket/data/"
   ```

   After the sync between Amazon S3 and Amazon FSx has completed, your datasets are saved in Amazon FSx in the following directories.

   ```
   /ns1/train  # synced with s3://amzn-s3-demo-bucket/data/train
   /ns1/test   # synced with s3://amzn-s3-demo-bucket/data/test
   ```

## Set the Amazon FSx file system path as the data input channel for SageMaker training
<a name="model-access-training-data-fsx-set-as-input-channel"></a>

The following procedures walk you through the process of setting the Amazon FSx file system as the data source for SageMaker training jobs.

------
#### [ Using the SageMaker Python SDK ]

To set the Amazon FSx file system as the data source, configure the SageMaker AI estimator classes and `FileSystemInput` using the following instructions.

1. Configure a `FileSystemInput` class object.

   ```
   from sagemaker.inputs import FileSystemInput
   
   train_fs = FileSystemInput(
       file_system_id="fs-0123456789abcdef0",
       file_system_type="FSxLustre",
       directory_path="/1234abcd/ns1/",
       file_system_access_mode="ro",
   )
   ```
**Tip**  
When you specify `directory_path`, make sure that you provide the Amazon FSx file system path starting with `MountName`.

1. Configure a SageMaker AI estimator with the VPC configuration used for the Amazon FSx file system.

   ```
   from sagemaker.estimator import Estimator
   
   estimator = Estimator(
       ...
       role="your-iam-role-with-access-to-your-fsx",
       subnets=["subnet-id"],  # Should be the same as the subnet used for Amazon FSx
       security_group_ids=["security-group-id"]  # Should be the same as the security group used for Amazon FSx
   )
   ```

   Make sure that the IAM role for the SageMaker training job has the permissions to access and read from Amazon FSx.

1. Launch the training job by running the `estimator.fit` method with the Amazon FSx file system.

   ```
   estimator.fit(train_fs)
   ```

To find more code examples, see [Use File Systems as Training Inputs](https://sagemaker.readthedocs.io/en/stable/overview.html#use-file-systems-as-training-inputs) in the *SageMaker Python SDK documentation*.

------
#### [ Using the SageMaker AI CreateTrainingJob API ]

As part of the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request JSON, configure `InputDataConfig` as follows.

```
"InputDataConfig": [ 
    { 
        "ChannelName": "string",
        "DataSource": { 
            "FileSystemDataSource": { 
                "DirectoryPath": "/1234abcd/ns1/",
                "FileSystemAccessMode": "ro",
                "FileSystemId": "fs-0123456789abcdef0",
                "FileSystemType": "FSxLustre"
            }
        }
    }
],
```

**Tip**  
When you specify `DirectoryPath`, make sure that you provide the Amazon FSx file system path starting with `MountName`.

------

# Choosing an input mode and a storage unit
<a name="model-access-training-data-best-practices"></a>

The best data source for your training job depends on workload characteristics such as the size of the dataset, the file format, the average size of files, the training duration, a sequential or random data loader read pattern, and how fast your model can consume the training data. The following best practices provide guidelines to get started with the most suitable input mode and data storage service for your use case.

![\[Flowchart summarizing best practices of choosing the best storage as the data source and input file mode.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sagemaker-training-choose-mode-and-storage.png)


## When to use Amazon EFS
<a name="model-access-training-data-best-practices-efs"></a>

If your dataset is already stored in Amazon Elastic File System, for example because a preprocessing or annotation application writes to Amazon EFS, you can run a training job configured with a data channel that points to the Amazon EFS file system. For more information, see [Speed up training on Amazon SageMaker AI using Amazon FSx for Lustre and Amazon EFS file systems](https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems/). If you don't achieve the performance you expect, check your optimization options following the [Amazon Elastic File System performance guide](https://docs.aws.amazon.com/efs/latest/ug/performance.html#performance-overview), or consider using different input modes or data storage.

## Use file mode for small datasets
<a name="model-access-training-data-best-practices-file-mode"></a>

If the dataset is stored in Amazon Simple Storage Service and its overall volume is relatively small (for example, less than 50-100 GB), try using file mode. The overhead of downloading a 50 GB dataset can vary based on the total number of files. For example, it takes about 5 minutes if a dataset is chunked into 100 MB shards. Whether this startup overhead is acceptable primarily depends on the overall duration of your training job, because a longer training phase means a proportionally smaller download phase.
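As a back-of-the-envelope check of that figure, assume an aggregate download throughput of roughly 170 MB/s (an assumption for illustration; actual throughput varies with instance type, Region, and file layout):

```python
dataset_mb = 50 * 1024   # 50 GB dataset
shard_mb = 100           # chunked into 100 MB shards
throughput_mb_s = 170    # assumed aggregate S3 download throughput

num_shards = dataset_mb // shard_mb
download_seconds = dataset_mb / throughput_mb_s
print(num_shards, round(download_seconds / 60, 1))  # 512 shards, ~5.0 minutes
```

If your training job runs for hours, a five-minute download phase is usually negligible; for short jobs, it can dominate, which is when the streaming modes below become attractive.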

## Serializing many small files
<a name="model-access-training-data-best-practices-serialize"></a>

If your dataset size is small (less than 50-100 GB), but is made up of many small files (less than 50 MB per file), the file mode download overhead grows, because each file needs to be downloaded individually from Amazon Simple Storage Service to the training instance volume. To reduce this overhead and data traversal time in general, consider serializing groups of such small files into fewer larger file containers (such as 150 MB per file) by using file formats such as [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) for TensorFlow, [WebDataset](https://webdataset.github.io/webdataset/) for PyTorch, and [RecordIO](https://mxnet.apache.org/versions/1.8.0/api/faq/recordio) for MXNet.
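WebDataset-style shards are plain POSIX tar archives, so the grouping step can be sketched with the standard library alone. The helper name, shard naming scheme, and the tiny demo sizes are illustrative; in practice you would target roughly 150 MB per shard as suggested above.

```python
import os
import tarfile
import tempfile

def write_shards(file_paths, out_dir, max_bytes=150 * 1024 * 1024):
    """Group small files into tar shards of roughly max_bytes each."""
    shards, current, current_bytes = [], [], 0
    for path in file_paths:
        size = os.path.getsize(path)
        if current and current_bytes + size > max_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        shards.append(current)

    shard_names = []
    for i, members in enumerate(shards):
        name = os.path.join(out_dir, f"shard-{i:06d}.tar")
        with tarfile.open(name, "w") as tar:
            for member in members:
                tar.add(member, arcname=os.path.basename(member))
        shard_names.append(name)
    return shard_names

# Demo: pack ten 10-byte files into shards capped at 25 bytes each.
tmp = tempfile.mkdtemp()
files = []
for i in range(10):
    p = os.path.join(tmp, f"sample-{i}.txt")
    with open(p, "w") as f:
        f.write("0123456789")  # 10 bytes each
    files.append(p)

shards = write_shards(files, tmp, max_bytes=25)
print(len(shards))  # 5 shards, two files per shard
```

After sharding, you would upload the tar files to Amazon S3 and point your data channel at the prefix; your data loader then reads whole shards sequentially instead of issuing one request per small file.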

## When to use fast file mode
<a name="model-access-training-data-best-practices-fastfile"></a>

For larger datasets with larger files (more than 50 MB per file), the first option is to try fast file mode, which is more straightforward to use than FSx for Lustre because it doesn't require creating a file system or connecting to a VPC. Fast file mode is ideal for large file containers (more than 150 MB), and might also do well with files more than 50 MB. Because fast file mode provides a POSIX interface, it supports random reads (reading non-sequential byte ranges). However, this is not the ideal use case, and your throughput might be lower than with sequential reads. That said, if you have a relatively large and computationally intensive ML model, fast file mode might still be able to saturate the effective bandwidth of the training pipeline and not result in an I/O bottleneck. You'll need to experiment and see. To switch from file mode to fast file mode (and back), just add (or remove) the `input_mode='FastFile'` parameter while defining your input channel using the SageMaker Python SDK:

```
sagemaker.inputs.TrainingInput(S3_INPUT_FOLDER, input_mode='FastFile')
```

## When to use Amazon FSx for Lustre
<a name="model-access-training-data-best-practices-fsx"></a>

If your dataset is too large for file mode, has many small files that you can't serialize easily, or uses a random read access pattern, FSx for Lustre is a good option to consider. Its file system scales to hundreds of gigabytes per second (GB/s) of throughput and millions of IOPS, which is ideal when you have many small files. However, note that there might be a cold-start issue due to lazy loading and the overhead of setting up and initializing the FSx for Lustre file system.

**Tip**  
To learn more, see [Choose the best data source for your Amazon SageMaker training job](https://aws.amazon.com/blogs/machine-learning/choose-the-best-data-source-for-your-amazon-sagemaker-training-job/). This AWS machine learning blog post further discusses case studies and performance benchmarks of data sources and input modes.

# Use attribute-based access control (ABAC) for multi-tenancy training
<a name="model-access-training-data-abac"></a>

In a multi-tenant environment, it is crucial to ensure that each tenant's data is isolated and accessible only to authorized entities. SageMaker AI supports the use of [attribute-based access control (ABAC)](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html) to achieve this isolation for training jobs. Instead of creating multiple IAM roles for each tenant, you can use the same IAM role for all tenants by configuring a session chaining configuration that uses AWS Security Token Service (AWS STS) session tags to request temporary, limited-privilege credentials for your training job to access specific tenants. For more information about session tags, see [Passing session tags in AWS STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html).

When creating a training job, your session chaining configuration uses AWS STS to request temporary security credentials. This request generates a session, which is tagged. Each SageMaker training job can only access a specific tenant using a single role shared by all training jobs. By implementing ABAC with session chaining, you can ensure that each training job has access only to the tenant specified by the session tag, effectively isolating and securing each tenant. The following section guides you through the steps to set up and use ABAC for multi-tenant training job isolation using the SageMaker Python SDK. 

## Prerequisites
<a name="model-access-training-data-abac-prerequisites"></a>

To get started with ABAC for multi-tenant training job isolation, you must have the following:
+ Tenants with consistent naming across locations. For example, if an input data Amazon S3 URI for a tenant is `s3://your-input-s3-bucket/example-tenant`, the Amazon FSx directory for that same tenant should be `/fsx-train/train/example-tenant` and the output data Amazon S3 URI should be `s3://your-output-s3-bucket/example-tenant`.
+ A SageMaker AI job creation role. You can create a SageMaker AI job creation role using Amazon SageMaker AI Role Manager. For information, see [Using the role manager](https://docs.aws.amazon.com/sagemaker/latest/dg/role-manager-tutorial.html).
+ A SageMaker AI execution role that has `sts:AssumeRole` and `sts:TagSession` permissions in its trust policy. For more information on SageMaker AI execution roles, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).

  The execution role should also have a policy that allows tenants in any attribute-based multi-tenancy architecture to read from the prefix attached to a principal tag. The following is an example policy that limits the SageMaker AI execution role to have access to the value associated with the `tenant-id` key. For more information on naming tag keys, see [Rules for tagging in IAM and STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags.html#id_tags_rules).

------
#### [ JSON ]

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Action": [
                  "s3:GetObject",
                  "s3:PutObject"
              ],
              "Resource": [
                  "arn:aws:s3:::your-input-s3-bucket/${aws:PrincipalTag/tenant-id}/*"
              ],
              "Effect": "Allow"
          },
          {
              "Action": [
                  "s3:PutObject"
              ],
              "Resource": "arn:aws:s3:::your-output-s3-bucket/${aws:PrincipalTag/tenant-id}/*",
              "Effect": "Allow"
          },
          {
              "Action": "s3:ListBucket",
              "Resource": "*",
              "Effect": "Allow"
          }
      ]
  }
  ```

------
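To see how the `${aws:PrincipalTag/tenant-id}` policy variable scopes each request, the substitution that IAM performs at evaluation time can be illustrated with plain string formatting. This is an illustration only, not an IAM API; the bucket and tenant names follow the examples above.

```python
resource_template = "arn:aws:s3:::your-input-s3-bucket/${aws:PrincipalTag/tenant-id}/*"

def resolve_principal_tag(template: str, session_tags: dict) -> str:
    # IAM substitutes the session's principal tags into policy variables
    # when it evaluates the policy; this mimics that substitution.
    resolved = template
    for key, value in session_tags.items():
        resolved = resolved.replace("${aws:PrincipalTag/%s}" % key, value)
    return resolved

print(resolve_principal_tag(resource_template, {"tenant-id": "example-tenant"}))
# arn:aws:s3:::your-input-s3-bucket/example-tenant/*
```

Because the tenant prefix comes from the session tag rather than from a per-tenant role, one execution role can serve every tenant while each session only reaches its own prefix.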

## Create a training job with session tag chaining enabled
<a name="model-access-training-data-abac-create-training-job"></a>

The following procedure shows you how to create a training job with session tag chaining using the SageMaker Python SDK for ABAC-enabled multi-tenancy training.

**Note**  
In addition to multi-tenancy data storage, you can also use the ABAC workflow to pass session tags to your execution role for Amazon VPC, AWS Key Management Service, and any other services you allow SageMaker AI to call.

**Enable session tag chaining for ABAC**

1. Import `boto3` and the SageMaker Python SDK. ABAC-enabled training job isolation is only available in version [2.217](https://pypi.org/project/sagemaker/2.217.0/) or later of the SageMaker AI Python SDK. 

   ```
   import boto3
   import sagemaker
   
   from sagemaker.estimator import Estimator
   from sagemaker.inputs import TrainingInput
   ```

1. Set up AWS STS and SageMaker AI clients to use the tenant-labeled session tags. You can change the tag value to specify a different tenant.

   ```
   # Start an AWS STS client
   sts_client = boto3.client('sts')
   
   # Define your tenants using tags
   # The session tag key must match the principal tag key in your execution role policy
   tags = []
   tag = {}
   tag['Key'] = "tenant-id"
   tag['Value'] = "example-tenant"
   tags.append(tag)
   
   # Have AWS STS assume your ABAC-enabled job creation role
   response = sts_client.assume_role(
       RoleArn="arn:aws:iam::<account-id>:role/<your-training-job-creation-role>",
       RoleSessionName="SessionName",
       Tags=tags)
   credentials = response['Credentials']
   
   # Create a client with your job creation role (which was assumed with tags)
   sagemaker_client = boto3.client(
       'sagemaker',
       aws_access_key_id=credentials['AccessKeyId'],
       aws_secret_access_key=credentials['SecretAccessKey'],
       aws_session_token=credentials['SessionToken']
   )
   sagemaker_session = sagemaker.Session(sagemaker_client=sagemaker_client)
   ```

    When you append the tag `tenant-id=example-tenant` to the job creation role, the execution role extracts the tag and resolves its policy variables to the following:

------
#### [ JSON ]

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Action": [
                   "s3:GetObject",
                   "s3:PutObject"
               ],
               "Resource": [
                   "arn:aws:s3:::your-input-s3-bucket/example-tenant/*"
               ],
               "Effect": "Allow"
           },
           {
               "Action": [
                   "s3:PutObject"
               ],
               "Resource": "arn:aws:s3:::your-output-s3-bucket/example-tenant/*",
               "Effect": "Allow"
           },
           {
               "Action": "s3:ListBucket",
               "Resource": "*",
               "Effect": "Allow"
           }
       ]
   }
   ```

------

1. Define an estimator to create a training job using the SageMaker Python SDK. Set `enable_session_tag_chaining` to `True` to allow your SageMaker AI training execution role to retrieve the tags from your job creation role.

   ```
   # Specify your training input
   trainingInput = TrainingInput(
       s3_data='s3://<your-input-bucket>/example-tenant',
       distribution='ShardedByS3Key',
       s3_data_type='S3Prefix'
   )
   
   # Specify your training job execution role 
   execution_role_arn = "arn:aws:iam::<account-id>:role/<your-training-job-execution-role>"
   
   # Define your estimator with session tag chaining enabled
   estimator = Estimator(
       image_uri="<your-training-image-uri>",
       role=execution_role_arn,
       instance_count=1,
       instance_type='ml.m4.xlarge',
       volume_size=20,
       max_run=3600,
       sagemaker_session=sagemaker_session,
       output_path="s3://<your-output-bucket>/example-tenant",
       enable_session_tag_chaining=True
   )
   
   estimator.fit(inputs=trainingInput, job_name="abac-demo")
   ```

SageMaker AI can only read tags provided in the training job request and does not add any tags to resources on your behalf.

ABAC for SageMaker training is compatible with SageMaker AI managed warm pools. To use ABAC with warm pools, matching training jobs must have identical session tags. For more information, see [Matching training jobs](train-warm-pools.md#train-warm-pools-matching-criteria).