

# Container creation with your own algorithms and models
<a name="docker-containers-create"></a>

If none of the existing SageMaker AI containers meet your needs and you don't have an existing container of your own, you may need to create a new Docker container. The following sections show how to create Docker containers with your training and inference algorithms for use with SageMaker AI.

**Topics**
+ [Containers with custom training algorithms](your-algorithms-training-algo.md)
+ [Containers with custom inference code](your-algorithms-inference-main.md)

# Containers with custom training algorithms
<a name="your-algorithms-training-algo"></a>

This section explains how Amazon SageMaker AI interacts with a Docker container that runs your custom training algorithm. Use this information to write training code and create a Docker image for your training algorithms. 

**Topics**
+ [How Amazon SageMaker AI Runs Your Training Image](your-algorithms-training-algo-dockerfile.md)
+ [How Amazon SageMaker AI Provides Training Information](your-algorithms-training-algo-running-container.md)
+ [Run Training with EFA](your-algorithms-training-efa.md)
+ [How Amazon SageMaker AI Signals Algorithm Success and Failure](your-algorithms-training-signal-success-failure.md)
+ [How Amazon SageMaker AI Processes Training Output](your-algorithms-training-algo-output.md)

# How Amazon SageMaker AI Runs Your Training Image
<a name="your-algorithms-training-algo-dockerfile"></a>

SageMaker AI runs your training image using a Docker container entrypoint script. You can use a custom entrypoint script to automate infrastructure for training in a production environment. If you pass your entrypoint script into your Docker container from outside, you can also update it and run it as a standalone script without rebuilding your images.

This section shows you how to use a custom entrypoint without using the training toolkit. If you want to use a custom entrypoint but are unfamiliar with how to manually configure a Docker container, we recommend that you use the [SageMaker training toolkit library](https://github.com/aws/sagemaker-training-toolkit) instead. For more information about how to use the training toolkit, see [Adapting your own training container](adapt-training-container.md). 

By default, SageMaker AI looks for a script called `train` inside your container. You can also manually provide your own custom entrypoint by using the `ContainerArguments` and `ContainerEntrypoint` parameters of the [AlgorithmSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html) API. 

You have the following two options to manually configure your Docker container to run your image.
+ Use the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API and a Docker container with an entrypoint instruction contained inside of it.
+ Use the `CreateTrainingJob` API, and pass your training script from outside of your Docker container.

If you pass your training script from outside your Docker container, you don't need to rebuild the Docker container when you update your script. You can also use several different scripts to run in the same container.

Your entrypoint script should contain training code for your image. If you use the optional `source_dir` parameter inside an [estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html), it should reference the relative Amazon S3 path to the folder containing your entrypoint script. You can reference multiple files using the `source_dir` parameter. If you do not use `source_dir`, you can specify the entrypoint using the `entry_point` parameter. For an example of a custom entrypoint script that contains an estimator, see [Bring Your Own Model with SageMaker AI Script Mode](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/sagemaker-script-mode.html).

SageMaker AI model training supports high-performance S3 Express One Zone directory buckets as a data input location for file mode, fast file mode, and pipe mode. You can also use S3 Express One Zone directory buckets to store your training output. To use S3 Express One Zone, provide the URI of an S3 Express One Zone directory bucket instead of an Amazon S3 general purpose bucket. You can only encrypt your SageMaker AI output data in directory buckets with server-side encryption with Amazon S3 managed keys (SSE-S3). Server-side encryption with AWS KMS keys (SSE-KMS) is not currently supported for storing SageMaker AI output data in directory buckets. For more information, see [S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html).

## Run a training job with an entrypoint script bundled inside the Docker container
<a name="your-algorithms-training-algo-dockerfile-api-ep-in"></a>

SageMaker AI can run an entrypoint script bundled inside your Docker container. 
+ By default, Amazon SageMaker AI runs your container as follows.

  ```
  docker run image train
  ```
+ SageMaker AI overrides any default [CMD](https://docs.docker.com/engine/reference/builder/#cmd) statements in a container by specifying the `train` argument after the image name. In your Docker container, use the following `exec` form of the `ENTRYPOINT` instruction.

  ```
  ENTRYPOINT ["executable", "param1", "param2", ...]
  ```

  The following example shows how to specify a Python entrypoint script called `k-means-algorithm.py`.

  ```
  ENTRYPOINT ["python", "k-means-algorithm.py"]
  ```

  The `exec` form of the `ENTRYPOINT` instruction starts the executable directly, not as a child of `/bin/sh`. This enables it to receive signals like `SIGTERM` and `SIGKILL` from SageMaker APIs. The following conditions apply when using the SageMaker APIs. 
  + The [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API has a stopping condition that directs SageMaker AI to stop model training after a specific time.
  + The [StopTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopTrainingJob.html) API issues the equivalent of the following `docker stop` command, with a 2-minute timeout, to gracefully stop the specified container.

    ```
    docker stop -t 120
    ```

    The command attempts to stop the running container by sending a `SIGTERM` signal. After the 2-minute timeout, the API sends `SIGKILL` and forcibly stops the container. If the container handles the `SIGTERM` gracefully and exits within 120 seconds of receiving it, no `SIGKILL` is sent. 

  If you want access to the intermediate model artifacts after SageMaker AI stops the training, add code to handle saving artifacts in your `SIGTERM` handler.
+ If you plan to use GPU devices for model training, make sure that your containers are `nvidia-docker` compatible. Include only the CUDA toolkit on containers; don't bundle NVIDIA drivers with the image. For more information about `nvidia-docker`, see [NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker).
+ You can't use the `tini` initializer as your entrypoint script in SageMaker AI containers because it gets confused by the `train` and `serve` arguments.
+ `/opt/ml` and all subdirectories are reserved by SageMaker training. When building your algorithm's Docker image, don't place any data that's required by your algorithm in this directory, because the data may no longer be visible during training.
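To sketch the `SIGTERM` handling described above, the following minimal Python training-script fragment registers a handler that saves intermediate artifacts before exiting. The in-memory `state` and the checkpoint format are illustrative assumptions; `/opt/ml/model` is the conventional output directory that SageMaker AI uploads after training.

```python
import json
import os
import signal
import sys

# /opt/ml/model is the conventional directory SageMaker AI uploads to Amazon S3.
MODEL_DIR = "/opt/ml/model"

# Hypothetical in-memory training state, used only for illustration.
state = {"epoch": 0, "weights": [0.0]}

def save_checkpoint(model_dir=MODEL_DIR):
    """Persist the current state so intermediate artifacts survive a stop."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "checkpoint.json")
    with open(path, "w") as f:
        json.dump(state, f)
    return path

def handle_sigterm(signum, frame):
    # StopTrainingJob delivers SIGTERM; SIGKILL follows if we don't exit in time.
    save_checkpoint()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)
```

Because the handler runs only when the exec-form `ENTRYPOINT` lets the process receive signals directly, this pattern depends on the container setup described earlier in this section.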

To bundle your shell or Python scripts inside your Docker image, or to provide the script in an Amazon S3 bucket or by using the AWS Command Line Interface (CLI), continue to the following section.

### Bundle your shell script in a Docker container
<a name="your-algorithms-training-algo-dockerfile-script-sh"></a>

 If you want to bundle a custom shell script inside your Docker image, use the following steps. 

1. Copy your shell script from your working directory into your Docker container. The following code snippet copies a custom entrypoint script `custom_entrypoint.sh` from the current working directory to the `/mydir` directory inside the container. The following example assumes that the base Docker image has Python installed.

   ```
   FROM <base-docker-image>:<tag>
   
   # Copy custom entrypoint from current dir to /mydir on container
   COPY ./custom_entrypoint.sh /mydir/
   ```

1. Build and push a Docker container to the Amazon Elastic Container Registry ([Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html)) by following the instructions at [Pushing a Docker image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html) in the *Amazon ECR User Guide*.

1. Launch the training job by running the following AWS CLI command.

   ```
   aws --region <your-region> sagemaker create-training-job \
   --training-job-name <your-training-job-name> \
   --role-arn <your-execution-role-arn> \
   --algorithm-specification '{
       "TrainingInputMode": "File",
       "TrainingImage": "<your-ecr-image>",
       "ContainerEntrypoint": ["/bin/sh"],
       "ContainerArguments": ["/mydir/custom_entrypoint.sh"]}' \
   --output-data-config '{"S3OutputPath": "s3://custom-entrypoint-output-bucket/"}' \
   --resource-config '{"VolumeSizeInGB":10,"InstanceCount":1,"InstanceType":"ml.m5.2xlarge"}' \
   --stopping-condition '{"MaxRuntimeInSeconds": 180}'
   ```

### Bundle your Python script in a Docker container
<a name="your-algorithms-training-algo-dockerfile-script-py"></a>

To bundle a custom Python script inside your Docker image, use the following steps. 

1. Copy your Python script from your working directory into your Docker container. The following code snippet copies a custom entrypoint script `custom_entrypoint.py` from the current working directory to the `/mydir` directory inside the container.

   ```
   FROM <base-docker-image>:<tag>
   # Copy custom entrypoint from current dir to /mydir on container
   COPY ./custom_entrypoint.py /mydir/
   ```

1. Launch the training job by running the same AWS CLI command as in the previous section, changing the `--algorithm-specification` option as follows.

   ```
   --algorithm-specification '{
       "TrainingInputMode": "File",
       "TrainingImage": "<your-ecr-image>",
       "ContainerEntrypoint": ["python"],
       "ContainerArguments": ["/mydir/custom_entrypoint.py"]}' \
   ```

## Run a training job with an entrypoint script outside the Docker container
<a name="your-algorithms-training-algo-dockerfile-api-pass-ep"></a>

You can use your own Docker container for training and pass in an entrypoint script from outside the Docker container. There are some benefits to structuring your entrypoint script outside the container. If you update your entrypoint script, you don't need to rebuild the Docker container. You can also use several different scripts to run in the same container. 

Specify the location of your training script using the `ContainerEntrypoint` and `ContainerArguments` parameters of the [AlgorithmSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html) API. These entrypoints and arguments behave in the same manner as Docker entrypoints and arguments. The values in these parameters override the corresponding `ENTRYPOINT` or `CMD` provided as part of the Docker container. 

When you pass your custom entrypoint script to your Docker training container, the inputs that you provide determine the behavior of the container.
+ For example, if you provide only `ContainerEntrypoint`, the request syntax for the `CreateTrainingJob` API is as follows.

  ```
  {
      "AlgorithmSpecification": {
          "ContainerEntrypoint": ["string"],   
          ...     
          }       
  }
  ```

  Then, the SageMaker training backend runs your custom entrypoint as follows.

  ```
  docker run --entrypoint <ContainerEntrypoint> image
  ```
**Note**  
If `ContainerEntrypoint` is provided, the SageMaker training backend runs the image with the given entrypoint and overrides the default `ENTRYPOINT` in the image.
+ If you provide only `ContainerArguments`, SageMaker AI assumes that the Docker container contains an entrypoint script. The request syntax using the `CreateTrainingJob` API is as follows.

  ```
  {
      "AlgorithmSpecification": {
          "ContainerArguments": ["arg1", "arg2"],
          ...
      }
  }
  ```

  The SageMaker training backend runs your custom entrypoint as follows.

  ```
  docker run image <ContainerArguments>
  ```
+ If you provide both `ContainerEntrypoint` and `ContainerArguments`, then the request syntax for the `CreateTrainingJob` API is as follows.

  ```
  {
      "AlgorithmSpecification": {
          "ContainerEntrypoint": ["string"],
          "ContainerArguments": ["arg1", "arg2"],
          ...
      }
  }
  ```

   The SageMaker training backend runs your custom entrypoint as follows.

  ```
  docker run --entrypoint <ContainerEntrypoint> image <ContainerArguments>
  ```

You can use any supported `InputDataConfig` source in the `CreateTrainingJob` API to provide an entrypoint script to run your training image. 
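The three request shapes above can be sketched as a small helper that assembles the `AlgorithmSpecification` portion of a `CreateTrainingJob` request. The helper name is hypothetical; the resulting dictionary matches the request syntax shown previously.

```python
def algorithm_specification(image_uri, entrypoint=None, arguments=None,
                            input_mode="File"):
    """Assemble the AlgorithmSpecification portion of a CreateTrainingJob request.

    entrypoint maps to: docker run --entrypoint <ContainerEntrypoint> image
    arguments map to:   docker run image <ContainerArguments>
    """
    spec = {"TrainingImage": image_uri, "TrainingInputMode": input_mode}
    if entrypoint is not None:
        spec["ContainerEntrypoint"] = list(entrypoint)
    if arguments is not None:
        spec["ContainerArguments"] = list(arguments)
    return spec

# Entrypoint and arguments together:
# docker run --entrypoint /bin/sh image /mydir/custom_entrypoint.sh
spec = algorithm_specification(
    "<your-ecr-image>",
    entrypoint=["/bin/sh"],
    arguments=["/mydir/custom_entrypoint.sh"],
)
```

The dictionary could then be passed as the `AlgorithmSpecification` field of a `create_training_job` call, for example through `boto3`.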

### Provide your entrypoint script in an Amazon S3 bucket
<a name="your-algorithms-training-algo-dockerfile-script-s3"></a>

 To provide a custom entrypoint script using an S3 bucket, use the `S3DataSource` parameter of the [DataSource](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DataSource.html#sagemaker-Type-DataSource-S3DataSource) API to specify the location of the script. If you use the `S3DataSource` parameter, the following are required.
+ The [InputMode](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html#sagemaker-Type-Channel-InputMode) must be of the type `File`.
+ The [S3DataDistributionType](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DataSource.html#sagemaker-Type-DataSource-S3DataSource) must be `FullyReplicated`.

The following example shows a script called `custom_entrypoint.sh` placed at the S3 path `s3://<bucket-name>/<bucket_prefix>/custom_entrypoint.sh`.

```
#!/bin/bash
echo "Running custom_entrypoint.sh"
echo "Hello you have provided the following arguments: " "$@"
```

Next, you must set the configuration of the input data channel to run a training job. Do this either by using the AWS CLI directly or with a JSON file.

#### Configure the input data channel using AWS CLI with a JSON file
<a name="your-algorithms-training-algo-dockerfile-script-s3-json"></a>

To configure your input data channel with a JSON file, use AWS CLI as shown in the following code structure. Ensure that all of the following fields use the request syntax defined in the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#API_CreateTrainingJob_RequestSyntax) API.

```
// run-my-training-job.json
{
    "AlgorithmSpecification": {
        "ContainerEntrypoint": ["/bin/sh"],
        "ContainerArguments": ["/opt/ml/input/data/<your_channel_name>/custom_entrypoint.sh"],
        ...
    },
    "InputDataConfig": [
        {
            "ChannelName": "<your_channel_name>",
            "DataSource": {
                "S3DataSource": {
                    "S3DataDistributionType": "FullyReplicated",
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<bucket-name>/<bucket_prefix>"
                }
            },
            "InputMode": "File"
        },
        ...
    ]
}
```

Next, run the AWS CLI command to launch the training job from the JSON file as follows.

```
aws sagemaker create-training-job --cli-input-json file://run-my-training-job.json
```

#### Configure the input data channel using AWS CLI directly
<a name="your-algorithms-training-algo-dockerfile-script-s3-directly"></a>

To configure your input data channel without a JSON file, use the following AWS CLI code structure.

```
aws --region <your-region> sagemaker create-training-job \
--training-job-name <your-training-job-name> \
--role-arn <your-execution-role-arn> \
--algorithm-specification '{
    "TrainingInputMode": "File",
    "TrainingImage": "<your-ecr-image>",
    "ContainerEntrypoint": ["/bin/sh"],
    "ContainerArguments": ["/opt/ml/input/data/<your_channel_name>/custom_entrypoint.sh"]}' \
--input-data-config '[{
    "ChannelName":"<your_channel_name>",
    "DataSource":{
        "S3DataSource":{
            "S3DataType":"S3Prefix",
            "S3Uri":"s3://<bucket-name>/<bucket_prefix>",
            "S3DataDistributionType":"FullyReplicated"}}}]' \
--output-data-config '{"S3OutputPath": "s3://custom-entrypoint-output-bucket/"}' \
--resource-config '{"VolumeSizeInGB":10,"InstanceCount":1,"InstanceType":"ml.m5.2xlarge"}' \
--stopping-condition '{"MaxRuntimeInSeconds": 180}'
```

# How Amazon SageMaker AI Provides Training Information
<a name="your-algorithms-training-algo-running-container"></a>

This section explains how SageMaker AI makes training information, such as training data, hyperparameters, and other configuration information, available to your Docker container. 

When you send a [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request to SageMaker AI to start model training, you specify the Amazon Elastic Container Registry (Amazon ECR) path of the Docker image that contains the training algorithm. You also specify the Amazon Simple Storage Service (Amazon S3) location where training data is stored and algorithm-specific parameters. SageMaker AI makes this information available to the Docker container so that your training algorithm can use it. For more information on the way that SageMaker AI containers organize information, see [SageMaker Training and Inference Toolkits](amazon-sagemaker-toolkits.md).

**Topics**
+ [Hyperparameters](#your-algorithms-training-algo-running-container-hyperparameters)
+ [Environment Variables](#your-algorithms-training-algo-running-container-environment-variables)
+ [Input Data Configuration](#your-algorithms-training-algo-running-container-inputdataconfig)
+ [Training Data](#your-algorithms-training-algo-running-container-trainingdata)
+ [Distributed Training Configuration](#your-algorithms-training-algo-running-container-dist-training)

## Hyperparameters
<a name="your-algorithms-training-algo-running-container-hyperparameters"></a>

 SageMaker AI makes the hyperparameters in a `CreateTrainingJob` request available in the Docker container in the `/opt/ml/input/config/hyperparameters.json` file.

The following is an example of a hyperparameter configuration in `hyperparameters.json` to specify the `num_round` and `eta` hyperparameters in the `CreateTrainingJob` operation for [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). 

```
{
    "num_round": "128",
    "eta": "0.001"
}
```

For a complete list of hyperparameters that can be used for the SageMaker AI built-in XGBoost algorithm, see [XGBoost Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html).

The hyperparameters that you can tune depend on the algorithm that you are training. For a list of hyperparameters available for a SageMaker AI built-in algorithm, find them listed in **Hyperparameters** under the algorithm link in [Use Amazon SageMaker AI Built-in Algorithms or Pre-trained Models](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).
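Note that SageMaker AI serializes every hyperparameter value as a string, as in the `num_round` and `eta` example above, so your training code must cast the values to the types it expects. The following is a minimal sketch; the helper name and the default-path constant are illustrative.

```python
import json

# Conventional location where SageMaker AI writes hyperparameters.json.
HYPERPARAM_PATH = "/opt/ml/input/config/hyperparameters.json"

def load_hyperparameters(path=HYPERPARAM_PATH):
    """Read hyperparameters.json; SageMaker AI passes every value as a string."""
    with open(path) as f:
        return json.load(f)

# Example casts for the XGBoost-style values shown above:
#   hp = load_hyperparameters()
#   num_round = int(hp["num_round"])    # "128"   -> 128
#   eta = float(hp["eta"])              # "0.001" -> 0.001
```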

## Environment Variables
<a name="your-algorithms-training-algo-running-container-environment-variables"></a>

SageMaker AI sets the following environment variables in your container:
+ `TRAINING_JOB_NAME` – Specified in the `TrainingJobName` parameter of the `CreateTrainingJob` request.
+ `TRAINING_JOB_ARN` – The Amazon Resource Name (ARN) of the training job, returned as `TrainingJobArn` in the `CreateTrainingJob` response.
+ All environment variables specified in the [Environment](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-Environment) parameter in the `CreateTrainingJob` request.

## Input Data Configuration
<a name="your-algorithms-training-algo-running-container-inputdataconfig"></a>

SageMaker AI makes the data channel information in the `InputDataConfig` parameter from your `CreateTrainingJob` request available in the `/opt/ml/input/config/inputdataconfig.json` file in your Docker container.

For example, suppose that you specify three data channels (`train`, `evaluation`, and `validation`) in your request. SageMaker AI provides the following JSON:

```
{
  "train" : {"ContentType":  "trainingContentType",
             "TrainingInputMode": "File",
             "S3DistributionType": "FullyReplicated",
             "RecordWrapperType": "None"},
  "evaluation" : {"ContentType":  "evalContentType",
                  "TrainingInputMode": "File",
                  "S3DistributionType": "FullyReplicated",
                  "RecordWrapperType": "None"},
  "validation" : {"TrainingInputMode": "File",
                  "S3DistributionType": "FullyReplicated",
                  "RecordWrapperType": "None"}
}
```

**Note**  
SageMaker AI provides only relevant information about each data channel (for example, the channel name and the content type) to the container, as shown in the previous example. `S3DistributionType` is set to `FullyReplicated` if you specify Amazon EFS or Amazon FSx for Lustre as the input data source.
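Inside the container, your algorithm can pair `inputdataconfig.json` with the channel directories under `/opt/ml/input/data` (the `File` mode layout described in the next section). The following sketch is illustrative; the helper name and the parameterized paths are assumptions.

```python
import json
import os

def discover_channels(config_path="/opt/ml/input/config/inputdataconfig.json",
                      data_root="/opt/ml/input/data"):
    """Map each configured channel name to its on-disk directory (File mode)."""
    with open(config_path) as f:
        channels = json.load(f)
    return {name: os.path.join(data_root, name) for name in channels}
```

For the three-channel example above, this would yield directories for `train`, `evaluation`, and `validation`.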

## Training Data
<a name="your-algorithms-training-algo-running-container-trainingdata"></a>

The `TrainingInputMode` parameter in the `AlgorithmSpecification` of the [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request specifies how the training dataset is made available to your container. The following input modes are available.
+ **`File` mode**

  If you use `File` mode as your `TrainingInputMode` value, SageMaker AI sets the following parameters in your container.
  + Your `TrainingInputMode` parameter is written to `inputdataconfig.json` as "File".
  + Your data channel directory is written to `/opt/ml/input/data/channel_name`.

  If you use `File` mode, SageMaker AI creates a directory for each channel. For example, if you have three channels named `training`, `validation`, and `testing`, SageMaker AI makes the following three directories in your Docker container: 
  + `/opt/ml/input/data/training`
  + `/opt/ml/input/data/validation`
  + `/opt/ml/input/data/testing`

  `File` mode also supports the following data sources.
  + Amazon Simple Storage Service (Amazon S3)
  + Amazon Elastic File System (Amazon EFS)
  + Amazon FSx for Lustre
**Note**  
Channels that use file system data sources such as Amazon EFS and Amazon FSx must use `File` mode. In this case, the directory path provided in the channel is mounted at `/opt/ml/input/data/channel_name`.
+ **`FastFile` mode**

  If you use `FastFile` mode as your `TrainingInputMode` value, SageMaker AI sets the following parameters in your container.
  + Similar to `File` mode, in `FastFile` mode, your `TrainingInputMode` parameter is written to `inputdataconfig.json` as "File".
  + Your data channel directory is written to `/opt/ml/input/data/channel_name`.

  `FastFile` mode supports the following data sources.
  + Amazon S3

  If you use `FastFile` mode, the channel directory is mounted with read-only permission.

  Historically, `File` mode preceded `FastFile` mode. To ensure backwards compatibility, algorithms that support `File` mode can also seamlessly work with `FastFile` mode as long as the `TrainingInputMode` parameter is set to `File` in `inputdataconfig.json`.
**Note**  
Channels that use `FastFile` mode must use an `S3DataType` of "S3Prefix".  
`FastFile` mode presents a folder view that uses the forward slash (`/`) as the delimiter for grouping Amazon S3 objects into folders. `S3Uri` prefixes must not correspond to a partial folder name. For example, if an Amazon S3 dataset contains `s3://amzn-s3-demo-bucket/train-01/data.csv`, then neither `s3://amzn-s3-demo-bucket/train` nor `s3://amzn-s3-demo-bucket/train-01` are allowed as `S3Uri` prefixes.  
A trailing forward slash is recommended to define a channel corresponding to a folder. For example, the `s3://amzn-s3-demo-bucket/train-01/` channel for the `train-01` folder. Without the trailing forward slash, the channel would be ambiguous if there existed another folder `s3://amzn-s3-demo-bucket/train-011/` or file `s3://amzn-s3-demo-bucket/train-01.txt/`.
+ **`Pipe` mode**
  + `TrainingInputMode` parameter written to `inputdataconfig.json`: "Pipe"
  + Data channel directory in the Docker container: `/opt/ml/input/data/channel_name_epoch_number`
  + Supported data sources: Amazon S3

  You need to read from a separate pipe for each channel. For example, if you have three channels named `training`, `validation`, and `testing`, you need to read from the following pipes:
  + `/opt/ml/input/data/training_0, /opt/ml/input/data/training_1, ...`
  + `/opt/ml/input/data/validation_0, /opt/ml/input/data/validation_1, ...`
  + `/opt/ml/input/data/testing_0, /opt/ml/input/data/testing_1, ...`

  Read the pipes sequentially. For example, if you have a channel called `training`, read the pipes in this sequence: 

  1. Open `/opt/ml/input/data/training_0` in read mode and read it to end-of-file (EOF) or, if you are done with the first epoch, close the pipe file early. 

  1. After closing the first pipe file, look for `/opt/ml/input/data/training_1` and read it until you have completed the second epoch, and so on.

  If the file for a given epoch doesn't exist yet, your code may need to retry until the pipe is created. There is no sequencing restriction across channel types. For example, you can read multiple epochs for the `training` channel and only start reading the `validation` channel when you are ready. Or, you can read them simultaneously if your algorithm requires that.

  For an example of a Jupyter notebook that shows how to use Pipe mode when bringing your own container, see [Bring your own pipe-mode algorithm to Amazon SageMaker AI](https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pipe_bring_your_own/pipe_bring_your_own.ipynb).
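The sequential, epoch-by-epoch reading pattern described above can be sketched in Python as follows. The function shape and the polling interval are illustrative, not part of the SageMaker AI contract.

```python
import os
import time

def read_epochs(channel, num_epochs, data_dir="/opt/ml/input/data",
                retry_seconds=1.0):
    """Yield the contents of each epoch's pipe for one channel, in order."""
    for epoch in range(num_epochs):
        pipe = os.path.join(data_dir, f"{channel}_{epoch}")
        # The pipe for this epoch may not exist yet; poll until it is created.
        while not os.path.exists(pipe):
            time.sleep(retry_seconds)
        with open(pipe, "rb") as f:
            yield f.read()  # read to EOF, or stream in chunks instead
```

Closing each file before opening the next mirrors the sequence given in the numbered steps: finish (or abandon) `training_0`, then look for `training_1`, and so on.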

  

SageMaker AI model training supports high-performance S3 Express One Zone directory buckets as a data input location for file mode, fast file mode, and pipe mode. To use S3 Express One Zone, input the location of the S3 Express One Zone directory bucket instead of an Amazon S3 general purpose bucket. Provide the ARN for the IAM role with the required access control and permissions policy. Refer to [AmazonSageMakerFullAccesspolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) for details. You can only encrypt your SageMaker AI output data in directory buckets with server-side encryption with Amazon S3 managed keys (SSE-S3). Server-side encryption with AWS KMS keys (SSE-KMS) is not currently supported for storing SageMaker AI output data in directory buckets. For more information, see [S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html).

## Distributed Training Configuration
<a name="your-algorithms-training-algo-running-container-dist-training"></a>

If you're performing distributed training with multiple containers, SageMaker AI makes information about all containers available in the `/opt/ml/input/config/resourceconfig.json` file.

To enable inter-container communication, this JSON file contains information for all containers. SageMaker AI makes this file available for both `File` and `Pipe` mode algorithms. The file provides the following information:
+ `current_host`—The name of the current container on the container network. For example, `algo-1`. Host values can change at any time. Don't write code with specific values for this variable.
+ `hosts`—The list of names of all containers on the container network, sorted lexicographically. For example, `["algo-1", "algo-2", "algo-3"]` for a three-node cluster. Containers can use these names to address other containers on the container network. Host values can change at any time. Don't write code with specific values for these variables.
+ `network_interface_name`—The name of the network interface that is exposed to your container. For example, containers running the Message Passing Interface (MPI) can use this information to set the network interface name.
+ Do not use the information in `/etc/hostname` or `/etc/hosts` because it might be inaccurate.
+ Hostname information may not be immediately available to the algorithm container. We recommend adding a retry policy on hostname resolution operations as nodes become available in the cluster.

The following is an example file on node 1 in a three-node cluster:

```
{
    "current_host": "algo-1",
    "hosts": ["algo-1","algo-2","algo-3"],
    "network_interface_name":"eth1"
}
```
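Using this file, a container can determine its position in the cluster, for example to choose a coordinator for MPI-style setups. The following sketch is illustrative; the "rank" terminology is ours, not SageMaker AI's.

```python
import json

def cluster_info(path="/opt/ml/input/config/resourceconfig.json"):
    """Return (rank, world_size, is_leader) derived from resourceconfig.json."""
    with open(path) as f:
        cfg = json.load(f)
    hosts = cfg["hosts"]                    # already sorted lexicographically
    rank = hosts.index(cfg["current_host"])
    return rank, len(hosts), rank == 0
```

Because host values can change at any time, the function reads them from the file at runtime rather than hard-coding names like `algo-1`.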

# Run Training with EFA
<a name="your-algorithms-training-efa"></a>

 SageMaker AI provides integration with [EFA](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html) devices to accelerate High Performance Computing (HPC) and machine learning applications. This integration allows you to leverage an EFA device when running your distributed training jobs. You can add EFA integration to an existing Docker container that you bring to SageMaker AI. The following information outlines how to configure your own container to use an EFA device for your distributed training jobs. 

## Prerequisites
<a name="your-algorithms-training-efa-prereq"></a>

 Your container must satisfy the [SageMaker Training container specification](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html).  

## Install EFA and required packages
<a name="your-algorithms-training-efa-install"></a>

Your container must download and install the [ EFA software](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html). This allows your container to recognize the EFA device, and provides compatible versions of Libfabric and Open MPI. 

Tools such as MPI and NCCL must be installed and managed inside the container to be used as part of your EFA-enabled training job. For a list of all available EFA versions, see [Verify the EFA installer using a checksum](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-verify.html). The following example shows how to modify the Dockerfile of your EFA-enabled container to install EFA, MPI, OFI, NCCL, and NCCL-TEST.

**Note**  
When using PyTorch with EFA on your container, the NCCL version of your container should match the NCCL version of your PyTorch installation. To verify the PyTorch NCCL version, use the following command:  

```
import torch
torch.cuda.nccl.version()
```

```
ARG OPEN_MPI_PATH=/opt/amazon/openmpi/
ENV NCCL_VERSION=2.7.8
ENV EFA_VERSION=1.30.0
ENV BRANCH_OFI=1.1.1

#################################################
## EFA and MPI SETUP
RUN cd $HOME \
  && curl -O https://s3-us-west-2.amazonaws.com/aws-efa-installer/aws-efa-installer-${EFA_VERSION}.tar.gz \
  && tar -xf aws-efa-installer-${EFA_VERSION}.tar.gz \
  && cd aws-efa-installer \
  && ./efa_installer.sh -y --skip-kmod -g

ENV PATH="$OPEN_MPI_PATH/bin:$PATH"
ENV LD_LIBRARY_PATH="$OPEN_MPI_PATH/lib/:$LD_LIBRARY_PATH"

#################################################
## NCCL, OFI, NCCL-TEST SETUP
RUN cd $HOME \
  && git clone https://github.com/NVIDIA/nccl.git -b v${NCCL_VERSION}-1 \
  && cd nccl \
  && make -j64 src.build BUILDDIR=/usr/local

RUN apt-get update && apt-get install -y autoconf
RUN cd $HOME \
  && git clone https://github.com/aws/aws-ofi-nccl.git -b v${BRANCH_OFI} \
  && cd aws-ofi-nccl \
  && ./autogen.sh \
  && ./configure --with-libfabric=/opt/amazon/efa \
       --with-mpi=/opt/amazon/openmpi \
       --with-cuda=/usr/local/cuda \
       --with-nccl=/usr/local --prefix=/usr/local \
  && make && make install
  
RUN cd $HOME \
  && git clone https://github.com/NVIDIA/nccl-tests \
  && cd nccl-tests \
  && make MPI=1 MPI_HOME=/opt/amazon/openmpi CUDA_HOME=/usr/local/cuda NCCL_HOME=/usr/local
```

## Considerations when creating your container
<a name="your-algorithms-training-efa-considerations"></a>

The EFA device is exposed to the container as `/dev/infiniband/uverbs0` in the list of devices accessible to the container. On P4d instances, the container has access to four EFA devices, which appear in that list as: 
+  `/dev/infiniband/uverbs0` 
+  `/dev/infiniband/uverbs1` 
+  `/dev/infiniband/uverbs2` 
+  `/dev/infiniband/uverbs3` 

 To get information about the hostname, peer hostnames, and network interface (for MPI) from the `resourceconfig.json` file provided to each container instance, see [Distributed Training Configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html#your-algorithms-training-algo-running-container-dist-training). Your container handles regular TCP traffic among peers through the default Elastic Network Interface (ENI), and handles OFI (kernel bypass) traffic through the EFA device. 

## Verify that your EFA device is recognized
<a name="your-algorithms-training-efa-verify"></a>

  To verify that the EFA device is recognized, run the following command from within your container. 

```
/opt/amazon/efa/bin/fi_info -p efa
```

Your output should look similar to the following.

```
provider: efa
    fabric: EFA-fe80::e5:56ff:fe34:56a8
    domain: efa_0-rdm
    version: 2.0
    type: FI_EP_RDM
    protocol: FI_PROTO_EFA
provider: efa
    fabric: EFA-fe80::e5:56ff:fe34:56a8
    domain: efa_0-dgrm
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_EFA
provider: efa;ofi_rxd
    fabric: EFA-fe80::e5:56ff:fe34:56a8
    domain: efa_0-dgrm
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXD
```
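
If you prefer to run this check from a startup script, a small wrapper can fail fast when no EFA provider is reported. The following sketch shells out to `fi_info` at the installer's default path and scans its output:

```
import subprocess

def efa_available(fi_info="/opt/amazon/efa/bin/fi_info"):
    """Return True if fi_info reports at least one EFA provider."""
    try:
        out = subprocess.run(
            [fi_info, "-p", "efa"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # Binary missing or no EFA provider found
        return False
    return "provider: efa" in out
```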

## Running a training job with EFA
<a name="your-algorithms-training-efa-run"></a>

 Once you’ve created an EFA-enabled container, you can run a training job with EFA using a SageMaker AI Estimator the same way as you would with any other Docker image. For more information on registering your container and using it for training, see [Adapting Your Own Training Container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html#byoc-training-step5).

# How Amazon SageMaker AI Signals Algorithm Success and Failure
<a name="your-algorithms-training-signal-success-failure"></a>

A training algorithm indicates whether it succeeded or failed using the exit code of its process. 

A successful training execution should exit with code 0, and an unsuccessful one should exit with a non-zero exit code. These exit codes are converted to `Completed` and `Failed` in the `TrainingJobStatus` returned by `DescribeTrainingJob`. This exit code convention is standard and easily implemented in all languages. For example, in Python, you can use `sys.exit(1)` to signal failure, and running to the end of the main routine causes Python to exit with code 0.

In the case of failure, the algorithm can write a description of the failure to the failure file. See the next section for details.
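
This convention can be sketched in Python as follows; `train_fn` is a hypothetical training entry point, and the `/opt/ml/output/failure` path and 1024-character truncation match the behavior described in the next section:

```
import sys
import traceback

FAILURE_FILE = "/opt/ml/output/failure"  # SageMaker reads FailureReason from here

def run_training(train_fn, failure_file=FAILURE_FILE):
    """Run train_fn and return the exit code SageMaker expects from the process."""
    try:
        train_fn()
        return 0  # reported as Completed
    except Exception:
        # Write a description of the failure before exiting non-zero; SageMaker
        # returns the first 1024 characters of this file as FailureReason.
        try:
            with open(failure_file, "w") as f:
                f.write(traceback.format_exc()[:1024])
        except OSError:
            pass  # still exit non-zero even if the file can't be written
        return 1  # reported as Failed

if __name__ == "__main__":
    # Replace the lambda with your actual training entry point
    sys.exit(run_training(lambda: None))
```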

# How Amazon SageMaker AI Processes Training Output
<a name="your-algorithms-training-algo-output"></a>

As your algorithm runs in a container, it generates output that includes the status of the training job, model artifacts, and other output artifacts. Your algorithm should write this information to the following locations in the container's `/opt/ml` directory. Amazon SageMaker AI processes the information as follows:
+ `/opt/ml/model` – Your algorithm should write all final model artifacts to this directory. At the end of the training job, SageMaker AI aggregates the contents into a single object in compressed tar format and uploads it to the S3 location that you specified in the `CreateTrainingJob` request. If multiple containers in a single training job write to this directory, make sure that file and directory names don't clash.
+ `/opt/ml/output/data` – Your algorithm should write any artifacts other than the final model that you want to store to this directory. At the end of the training job, SageMaker AI aggregates the contents into a single object in compressed tar format and uploads it to the S3 location that you specified in the `CreateTrainingJob` request. If multiple containers in a single training job write to this directory, make sure that file and directory names don't clash.
+ `/opt/ml/output/failure` – If training fails, after all algorithm output (for example, logging) completes, your algorithm should write the failure description to this file. In a `DescribeTrainingJob` response, SageMaker AI returns the first 1024 characters from this file as `FailureReason`. 
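
As a sketch, a training script might write its output to these locations as follows; the file names `model.pkl` and `metrics.json` are illustrative:

```
import json
import os
import pickle

MODEL_DIR = "/opt/ml/model"            # tarred and uploaded as the model artifact
OUTPUT_DATA_DIR = "/opt/ml/output/data"  # tarred and uploaded as auxiliary output

def save_outputs(model, metrics, model_dir=MODEL_DIR, data_dir=OUTPUT_DATA_DIR):
    """Write final artifacts and auxiliary output where SageMaker collects them."""
    os.makedirs(model_dir, exist_ok=True)
    os.makedirs(data_dir, exist_ok=True)
    # Everything under /opt/ml/model becomes the final model artifact.
    with open(os.path.join(model_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    # Non-model artifacts (for example, training metrics) go to /opt/ml/output/data.
    with open(os.path.join(data_dir, "metrics.json"), "w") as f:
        json.dump(metrics, f)
```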

You can specify either an S3 general purpose or S3 directory bucket to store your training output. Directory buckets use only the Amazon S3 Express One Zone storage class, which is designed for workloads or performance-critical applications that require consistent single-digit millisecond latency. Choose the bucket type that best fits your application and performance requirements. For more information on S3 directory buckets, see [Directory buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/directory-buckets-overview.html) in the *Amazon Simple Storage Service User Guide*. 

**Note**  
You can only encrypt your SageMaker AI output data in S3 directory buckets with server-side encryption with Amazon S3 managed keys (SSE-S3). Server-side encryption with AWS KMS keys (SSE-KMS) isn't currently supported for storing SageMaker AI output data in directory buckets.

# Containers with custom inference code
<a name="your-algorithms-inference-main"></a>

You can use Amazon SageMaker AI to interact with Docker containers and run your own inference code in one of two ways:
+ To use your own inference code with a persistent endpoint to get one prediction at a time, use SageMaker AI hosting services.
+ To use your own inference code to get predictions for an entire dataset, use SageMaker AI batch transform.

**Topics**
+ [Custom Inference Code with Hosting Services](your-algorithms-inference-code.md)
+ [Custom Inference Code with Batch Transform](your-algorithms-batch-code.md)

# Custom Inference Code with Hosting Services
<a name="your-algorithms-inference-code"></a>

This section explains how Amazon SageMaker AI interacts with a Docker container that runs your own inference code for hosting services. Use this information to write inference code and create a Docker image. 

**Topics**
+ [How SageMaker AI Runs Your Inference Image](#your-algorithms-inference-code-run-image)
+ [How SageMaker AI Loads Your Model Artifacts](#your-algorithms-inference-code-load-artifacts)
+ [How Your Container Should Respond to Inference Requests](#your-algorithms-inference-code-container-response)
+ [How Your Container Should Respond to Health Check (Ping) Requests](#your-algorithms-inference-algo-ping-requests)
+ [Container Contract to Support Bidirectional Streaming Capabilities](#your-algorithms-inference-algo-bidi)
+ [Use a Private Docker Registry for Real-Time Inference Containers](your-algorithms-containers-inference-private.md)

## How SageMaker AI Runs Your Inference Image
<a name="your-algorithms-inference-code-run-image"></a>

To configure a container to run as an executable, use an `ENTRYPOINT` instruction in a Dockerfile. Note the following: 
+ For model inference, SageMaker AI runs the container as:

  ```
  docker run image serve
  ```

  SageMaker AI overrides default `CMD` statements in a container by specifying the `serve` argument after the image name. The `serve` argument overrides arguments that you provide with the `CMD` command in the Dockerfile.

   
+ SageMaker AI expects all containers to run as the root user. Create your container so that it uses only the root user. When SageMaker AI runs your container, users that don't have root-level access can cause permission issues.

   
+ We recommend that you use the `exec` form of the `ENTRYPOINT` instruction:

  ```
  ENTRYPOINT ["executable", "param1", "param2"]
  ```

  For example:

  ```
  ENTRYPOINT ["python", "k_means_inference.py"]
  ```

  The `exec` form of the `ENTRYPOINT` instruction starts the executable directly, not as a child of `/bin/sh`. This enables it to receive signals like `SIGTERM` and `SIGKILL` from the SageMaker API operations, which is a requirement. 

   

  For example, when you use the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API to create an endpoint, SageMaker AI provisions the number of ML compute instances required by the endpoint configuration, which you specify in the request. SageMaker AI runs the Docker container on those instances. 

   

  If you reduce the number of instances backing the endpoint (by calling the [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpointWeightsAndCapacities.html) API), SageMaker AI runs a command to stop the Docker container on the instances that are being terminated. The command sends the `SIGTERM` signal, then sends the `SIGKILL` signal 30 seconds later.

   

  If you update the endpoint (by calling the [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) API), SageMaker AI launches another set of ML compute instances and runs the Docker containers that contain your inference code on them. Then it runs a command to stop the previous Docker containers. To stop a Docker container, the command sends the `SIGTERM` signal, then sends the `SIGKILL` signal 30 seconds later. 

   
+ SageMaker AI uses the container definition that you provided in your [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) request to set environment variables and the DNS hostname for the container as follows:

   
  + It sets environment variables using the `ContainerDefinition.Environment` string-to-string map.
  + It sets the DNS hostname using the `ContainerDefinition.ContainerHostname`.

     
+ If you plan to use GPU devices for model inferences (by specifying GPU-based ML compute instances in your `CreateEndpointConfig` request), make sure that your containers are `nvidia-docker` compatible. Don't bundle NVIDIA drivers with the image. For more information about `nvidia-docker`, see [NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker). 

   
+ You can't use the `tini` initializer as your entry point in SageMaker AI containers because it gets confused by the `train` and `serve` arguments.

  

## How SageMaker AI Loads Your Model Artifacts
<a name="your-algorithms-inference-code-load-artifacts"></a>

In your [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API request, you can use either the `ModelDataUrl` or `S3DataSource` parameter to identify the S3 location where model artifacts are stored. SageMaker AI copies your model artifacts from the S3 location to the `/opt/ml/model` directory for use by your inference code. Your container has read-only access to `/opt/ml/model`. Do not write to this directory.

The `ModelDataUrl` must point to a tar.gz file. Otherwise, SageMaker AI won't download the file. 

If you trained your model in SageMaker AI, the model artifacts are saved as a single compressed tar file in Amazon S3. If you trained your model outside SageMaker AI, you need to create this single compressed tar file and save it in an S3 location. SageMaker AI decompresses this tar file into the `/opt/ml/model` directory before your container starts.

For deploying large models, we recommend that you follow [Deploying uncompressed models](large-model-inference-uncompressed.md).
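
As a sketch, inference code typically loads the extracted artifacts once at container startup; the `model.pkl` file name is illustrative, while `/opt/ml/model` is the directory SageMaker AI populates:

```
import os
import pickle

MODEL_DIR = "/opt/ml/model"  # read-only; SageMaker extracts your tar.gz here

def load_model(model_dir=MODEL_DIR):
    """Load the artifact that SageMaker decompressed before the container started."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)
```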

## How Your Container Should Respond to Inference Requests
<a name="your-algorithms-inference-code-container-response"></a>

To obtain inferences, the client application sends a POST request to the SageMaker AI endpoint. SageMaker AI passes the request to the container, and returns the inference result from the container to the client.

For more information about the inference requests that your container will receive, see the following actions in the *Amazon SageMaker AI API Reference*:
+ [InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html)
+ [InvokeEndpointAsync](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointAsync.html)
+ [InvokeEndpointWithResponseStream](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithResponseStream.html)
+ [InvokeEndpointWithBidirectionalStream](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpointWithBidirectionalStream.html)

**Requirements for inference containers**

To respond to inference requests, your container must meet the following requirements:
+ SageMaker AI strips all `POST` headers except those supported by `InvokeEndpoint`. SageMaker AI might add additional headers. Inference containers must be able to safely ignore these additional headers.
+ To receive inference requests, the container must have a web server listening on port 8080 and must accept `POST` requests to the `/invocations` and `/ping` endpoints. 
+ A customer's model container must accept socket connection requests within 250 ms.
+ A customer's model container must respond to requests within 60 seconds. The model itself can take a maximum of 60 seconds of processing time before responding to the `/invocations` endpoint. If your model takes 50-60 seconds of processing time, set the SDK socket timeout to 70 seconds.
+ A customer’s model container that supports bidirectional streaming must:
  + Support WebSocket connections on port 8080 to `/invocations-bidirectional-stream` by default.
  + Have a web server listening on port 8080 that accepts `POST` requests to the `/ping` endpoint.
  + In addition to container health checks over HTTP, respond to each WebSocket ping frame it receives with a pong frame, per [RFC6455](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.3).

**Example invocation functions**  
The following examples demonstrate how the code in your container can process inference requests. These examples handle requests that client applications send by using the InvokeEndpoint action.  
FastAPI is a web framework for building APIs with Python.  

```
from fastapi import FastAPI, status, Request, Response
. . .
app = FastAPI()
. . .
@app.post('/invocations')
async def invocations(request: Request):
    # model() is a hypothetical function that gets the inference output:
    model_resp = await model(request)

    response = Response(
        content=model_resp,
        status_code=status.HTTP_200_OK,
        media_type="text/plain",
    )
    return response
. . .
```
In this example, the `invocations` function handles the inference request that SageMaker AI sends to the `/invocations` endpoint.
Flask is a framework for developing web applications with Python.  

```
import flask
. . .
app = flask.Flask(__name__)
. . .
@app.route('/invocations', methods=["POST"])
def invoke():
    # model() is a hypothetical function that gets the inference output:
    resp_body = model(flask.request)
    return flask.Response(resp_body, mimetype='text/plain')
```
In this example, the `invoke` function handles the inference request that SageMaker AI sends to the `/invocations` endpoint.

**Example invocation functions for streaming requests**  
The following examples demonstrate how the code in your inference container can process streaming inference requests. These examples handle requests that client applications send by using the InvokeEndpointWithResponseStream action.  
When a container handles a streaming inference request, it returns the model's inference as a series of parts incrementally as the model generates them. Client applications start receiving responses immediately when they're available. They don't need to wait for the model to generate the entire response. You can implement streaming to support fast interactive experiences, such as chatbots, virtual assistants, and music generators.  
FastAPI is a web framework for building APIs with Python.  

```
from starlette.responses import StreamingResponse
from fastapi import FastAPI, status, Request
. . .
app = FastAPI()
. . .
@app.post('/invocations')
async def invocations(request: Request):
    # Streams inference response using HTTP chunked encoding
    async def generate():
        # model() is a hypothetical function that gets the inference output:
        yield await model(request)
        yield "\n"

    response = StreamingResponse(
        content=generate(),
        status_code=status.HTTP_200_OK,
        media_type="text/plain",
    )
    return response
. . .
```
In this example, the `invocations` function handles the inference request that SageMaker AI sends to the `/invocations` endpoint. To stream the response, the example uses the `StreamingResponse` class from the Starlette framework.
Flask is a framework for developing web applications with Python.  

```
import flask
. . .
app = flask.Flask(__name__)
. . .
@app.route('/invocations', methods=["POST"])
def invocations():
    # Streams inference response using HTTP chunked encoding

    def generate():
        # model() is a hypothetical function that gets the inference output:
        yield model(flask.request)
        yield "\n"
    return flask.Response(
        flask.stream_with_context(generate()), mimetype='text/plain')
. . .
```
In this example, the `invocations` function handles the inference request that SageMaker AI sends to the `/invocations` endpoint. To stream the response, the example uses the `flask.stream_with_context` function from the Flask framework.

**Example invocation functions for bidirectional streaming**  
The following examples demonstrate how the code in your container can process streaming inference requests and responses. These examples handle streaming requests that client applications send by using the InvokeEndpointWithBidirectionalStream action.  
A container with bidirectional streaming capability handles streaming inference requests whose parts are incrementally generated at the client and streamed to the container. It returns the model's inference back to the client as a series of parts as the model generates them. Client applications start receiving responses as soon as they're available. They don't need to wait for the request to be fully generated at the client or for the model to generate the entire response. You can implement bidirectional streaming to support fast interactive experiences, such as chatbots, interactive voice AI assistants, and real-time translation.  
FastAPI is a web framework for building APIs with Python.  

```
import sys
from fastapi import FastAPI, WebSocket
import uvicorn

app = FastAPI()
...
@app.websocket("/invocations-bidirectional-stream")
async def websocket_invoke(websocket: WebSocket):
    """
    WebSocket endpoint with RFC 6455 ping/pong and fragmentation support
    
    Handles:
    - Text messages (JSON) - including fragmented frames
    - Binary messages - including fragmented frames
    - Ping frames (automatically responds with pong)
    - Pong frames (logs receipt)
    - Fragmented frames per RFC 6455 Section 5.4
    """
    await websocket.accept()
    
    # Fragment reassembly buffers per RFC 6455 Section 5.4
    text_fragments = []
    binary_fragments = []
    
    while True:
        # Use receive() to handle all WebSocket frame types
        message = await websocket.receive()
        print(f"Received message: {message}")
        if message["type"] == "websocket.receive":
            if "text" in message:
                # Handle text frames (including fragments)
                text_data = message["text"]
                more_body = message.get("more_body", False)
                
                if more_body:
                    # This is a fragment, accumulate it
                    text_fragments.append(text_data)
                    print(f"Received text fragment: {len(text_data)} chars (more coming)")
                else:
                    # This is the final frame or a complete message
                    if text_fragments:
                        # Reassemble fragmented message
                        text_fragments.append(text_data)
                        complete_text = "".join(text_fragments)
                        text_fragments.clear()
                        print(f"Reassembled fragmented text message: {len(complete_text)} chars total")
                        await handle_text_message(websocket, complete_text)
                    else:
                        # Complete message in single frame
                        await handle_text_message(websocket, text_data)
                
            elif "bytes" in message:
                # Handle binary frames (including fragments)
                binary_data = message["bytes"]
                more_body = message.get("more_body", False)
                
                if more_body:
                    # This is a fragment, accumulate it
                    binary_fragments.append(binary_data)
                    print(f"Received binary fragment: {len(binary_data)} bytes (more coming)")
                else:
                    # This is the final frame or a complete message
                    if binary_fragments:
                        # Reassemble fragmented message
                        binary_fragments.append(binary_data)
                        complete_binary = b"".join(binary_fragments)
                        binary_fragments.clear()
                        print(f"Reassembled fragmented binary message: {len(complete_binary)} bytes total")
                        await handle_binary_message(websocket, complete_binary)
                    else:
                        # Complete message in single frame
                        await handle_binary_message(websocket, binary_data)
                
        elif message["type"] == "websocket.ping":
            # Handle ping frames - RFC 6455 Section 5.5.2
            ping_data = message.get("bytes", b"")
            print(f"Received PING frame with payload: {ping_data}")
            # FastAPI automatically sends pong response
            
        elif message["type"] == "websocket.pong":
            # Handle pong frames
            pong_data = message.get("bytes", b"")
            print(f"Received PONG frame with payload: {pong_data}")
            
        elif message["type"] == "websocket.close":
            # Handle close frames - RFC 6455 Section 5.5.1
            close_code = message.get("code", 1000)
            close_reason = message.get("reason", "")
            print(f"Received CLOSE frame - Code: {close_code}, Reason: '{close_reason}'")
            
            # Send close frame response if not already closing
            try:
                await websocket.close(code=close_code, reason=close_reason)
                print(f"Sent CLOSE frame response - Code: {close_code}")
            except Exception as e:
                print(f"Error sending close frame: {e}")
            break
            
        elif message["type"] == "websocket.disconnect":
            print("Client initiated disconnect")
            break

        else:
            print(f"Received unknown message type: {message['type']}")
            break

                        
async def handle_binary_message(websocket: WebSocket, binary_data: bytes):
    """Handle incoming binary messages (complete or reassembled from fragments)"""
    print(f"Processing complete binary message: {len(binary_data)} bytes")
    
    try:
        # Echo back the binary data
        await websocket.send_bytes(binary_data)
    except Exception as e:
        print(f"Error handling binary message: {e}")

async def handle_text_message(websocket: WebSocket, data: str):
    """Handle incoming text messages"""
    try:
        # Send response back to the same client
        await websocket.send_text(data)
    except Exception as e:
        print(f"Error handling text message: {e}")

def main():
    if len(sys.argv) > 1 and sys.argv[1] == "serve":
        print("Starting server on port 8080...")
        uvicorn.run(app, host="0.0.0.0", port=8080)
    else:
        print("Usage: python app.py serve")
        sys.exit(1)

if __name__ == "__main__":
    main()
```
In this example, the `websocket_invoke` function handles the inference request that SageMaker AI sends to the `/invocations-bidirectional-stream` endpoint. It shows how to handle streamed requests and stream responses back to the client.

## How Your Container Should Respond to Health Check (Ping) Requests
<a name="your-algorithms-inference-algo-ping-requests"></a>

SageMaker AI launches new inference containers in the following situations:
+ Responding to `CreateEndpoint`, `UpdateEndpoint`, and `UpdateEndpointWeightsAndCapacities` API calls
+ Security patching
+ Replacing unhealthy instances

Soon after container startup, SageMaker AI starts sending periodic GET requests to the `/ping` endpoint.

The simplest requirement on the container is to respond with an HTTP 200 status code and an empty body. This indicates to SageMaker AI that the container is ready to accept inference requests at the `/invocations` endpoint.

If the container does not begin to pass health checks by consistently responding with 200s during the 8 minutes after startup, the new instance launch fails. This causes `CreateEndpoint` to fail, leaving the endpoint in a failed state. The update requested by `UpdateEndpoint` isn't completed, security patches aren't applied, and unhealthy instances aren't replaced.

While the minimum bar is for the container to return a static 200, a container developer can use this functionality to perform deeper checks. For example, the container can verify that the model is loaded into memory and can serve inference requests. The request timeout on `/ping` attempts is 2 seconds.

We strongly recommend implementing meaningful health checks rather than returning a static 200. The `/ping` endpoint is the main signal SageMaker AI uses to determine whether an instance is healthy. If your container always returns 200, even when the model has failed to load, run out of memory, or entered a bad state, SageMaker AI continues routing inference requests to that instance. This results in sustained invocation errors for your application until the instance is manually replaced or the endpoint is updated.

A well-implemented `/ping` handler should verify that:
+ The model artifact is loaded and ready to serve
+ Critical resources (memory, disk, GPU if applicable) are available
+ The inference code path is functional (for example, a lightweight test prediction succeeds)

When `/ping` correctly reports an unhealthy state by returning a non-200 response, SageMaker AI detects the failure and automatically replaces the instance (excluding endpoints that use inference components), minimizing downtime for your application.
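
Framework aside, the health decision itself can be factored into a small function. The following sketch returns the status code a `/ping` handler should report; the disk-space threshold and the `predict()` interface are illustrative assumptions, not SageMaker requirements:

```
import shutil

def ping_status(model, min_free_bytes=100 * 1024 * 1024):
    """Return the HTTP status code that /ping should report: 200 or 503."""
    # Model artifact loaded and ready to serve?
    if model is None:
        return 503
    # Critical resources available? (disk headroom as an illustrative check)
    if shutil.disk_usage("/").free < min_free_bytes:
        return 503
    # Inference code path functional: a lightweight test prediction.
    try:
        model.predict(None)  # predict() is a hypothetical model interface
    except Exception:
        return 503
    return 200
```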

Additionally, a container that is capable of handling bidirectional streaming requests must respond with a pong frame (per the WebSocket protocol, [RFC6455](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.3)) to each ping frame it receives. If no pong frame is received for 5 consecutive pings, the SageMaker AI platform closes the connection to the container. The SageMaker AI platform also responds to ping frames from the model container with pong frames.

## Container Contract to Support Bidirectional Streaming Capabilities
<a name="your-algorithms-inference-algo-bidi"></a>

If you want to host your model container as a SageMaker AI endpoint that supports bidirectional streaming capabilities, the model container must support the following contract:

**1. Bidirectional Docker Label**

The model container should have a Docker label indicating to the SageMaker AI platform that bidirectional streaming capability is supported on this container.

```
com.amazonaws.sagemaker.capabilities.bidirectional-streaming=true
```

**2. Support WebSocket Connection for invocations**

A customer’s model container that supports bidirectional streaming must support WebSocket connections on port 8080 to `/invocations-bidirectional-stream` by default. 

This path can be overridden by passing the `X-Amzn-SageMaker-Model-Invocation-Path` header when invoking the `InvokeEndpointWithBidirectionalStream` API. Additionally, users can specify a query string to be appended to this path by passing the `X-Amzn-SageMaker-Model-Query-String` header when invoking the `InvokeEndpointWithBidirectionalStream` API.

**3. Request Stream Handling**

The `InvokeEndpointWithBidirectionalStream` API input payloads are streamed in as a series of PayloadParts. Each PayloadPart is a wrapper around a binary chunk (`"Bytes": <Blob>`):

```
{
   "PayloadPart": {
      "Bytes": <Blob>,
      "DataType": <String: UTF8 | BINARY>,
      "CompletionState": <String: PARTIAL | COMPLETE>
   }
}
```

**3.1. Data Frames**

SageMaker AI passes the input PayloadParts to Model container as WebSocket Data Frames ([RFC6455-Section-5.6](https://datatracker.ietf.org/doc/html/rfc6455#section-5.6))

1. SageMaker AI does not inspect the binary chunk.

1. On receiving an input PayloadPart:
   + SageMaker AI creates exactly one WebSocket Data Frame from `PayloadPart.Bytes` and passes it to the model container.
   + If `PayloadPart.DataType = UTF8`, SageMaker AI creates a Text Data Frame.
   + If `PayloadPart.DataType` is not present or `PayloadPart.DataType = BINARY`, SageMaker AI creates a Binary Data Frame.

1. SageMaker AI translates a sequence of PayloadParts with `PayloadPart.CompletionState = PARTIAL`, terminated by a PayloadPart with `PayloadPart.CompletionState = COMPLETE`, into a WebSocket fragmented message ([RFC6455-Section-5.4: Fragmentation](https://datatracker.ietf.org/doc/html/rfc6455#section-5.4)):
   + The initial PayloadPart with `PayloadPart.CompletionState = PARTIAL` is translated into a WebSocket Data Frame with the FIN bit clear.
   + Subsequent PayloadParts with `PayloadPart.CompletionState = PARTIAL` are translated into WebSocket Continuation Frames with the FIN bit clear.
   + The final PayloadPart with `PayloadPart.CompletionState = COMPLETE` is translated into a WebSocket Continuation Frame with the FIN bit set.

1. SageMaker AI does not encode or decode the binary chunk from the input PayloadPart; the bytes are passed to the model container as-is.

1. SageMaker AI does not combine multiple input PayloadParts into one Data Frame.

1. SageMaker AI does not chunk one input PayloadPart into multiple Data Frames.

**Example: Fragmented Message Flow**

```
Client sends:
PayloadPart 1: {Bytes: "Hello ", DataType: "UTF8", CompletionState: "PARTIAL"}
PayloadPart 2: {Bytes: "World", DataType: "UTF8", CompletionState: "COMPLETE"}

Container receives:
Frame 1: Text Data Frame with "Hello " (FIN=0)
Frame 2: Continuation Frame with "World" (FIN=1)
```
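
The mapping rules above can be modeled in a few lines of Python. This is an illustrative model of the documented behavior, not SageMaker AI's implementation; frames are represented here as `(opcode, fin, payload)` tuples:

```python
def payload_parts_to_frames(parts):
    """Map each input PayloadPart to exactly one WebSocket frame.

    opcode is "text" or "binary" for the first frame of a message and
    "continuation" for subsequent fragments; fin marks the final fragment.
    """
    frames = []
    in_message = False  # True while inside a fragmented message
    for part in parts:
        data_type = part.get("DataType", "BINARY")       # BINARY when absent
        state = part.get("CompletionState", "COMPLETE")
        if in_message:
            opcode = "continuation"
        else:
            opcode = "text" if data_type == "UTF8" else "binary"
        fin = state == "COMPLETE"
        frames.append((opcode, fin, part["Bytes"]))      # bytes pass through as-is
        in_message = not fin
    return frames
```

Running the fragmented-message example through this model yields a Text Data Frame with FIN clear followed by a Continuation Frame with FIN set, matching the flow shown above.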

**3.2. Control Frames**

Besides Data Frames, SageMaker AI also sends Control Frames to model container ([RFC6455-Section-5.5](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5)):

1. Close Frame: SageMaker AI may send a Close Frame ([RFC6455-Section-5.5.1](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.1)) to the model container if the connection is closed for any reason.

1. Ping Frame: SageMaker AI sends a Ping Frame ([RFC6455-Section-5.5.2](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.2)) every 60 seconds, and the model container must respond with a Pong Frame. If no Pong Frame ([RFC6455-Section-5.5.3](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.3)) is received for 5 consecutive Pings, SageMaker AI closes the connection.

1. Pong Frame: SageMaker AI responds to Ping Frames from the model container with Pong Frames.
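
The keepalive behavior amounts to a small counter that resets on every Pong and triggers a close after 5 unanswered Pings. A minimal sketch (the class and method names are illustrative assumptions):

```python
class KeepaliveTracker:
    """Track unanswered Pings; the connection closes after 5 misses in a row."""

    MAX_MISSED_PINGS = 5

    def __init__(self):
        self.missed = 0

    def on_ping_sent(self):
        """Called each time a Ping Frame is sent (every 60 seconds).

        Returns True when the connection should be closed.
        """
        self.missed += 1
        return self.missed >= self.MAX_MISSED_PINGS

    def on_pong_received(self):
        """Any Pong Frame resets the counter."""
        self.missed = 0
```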

**4. Response Stream Handling**

The output is streamed out as a series of PayloadParts, ModelStreamErrors, or InternalStreamFailures.

```
{
   "PayloadPart": {
      "Bytes": <Blob>,
      "DataType": <String: UTF8 | BINARY>,
      "CompletionState": <String: PARTIAL | COMPLETE>
   },
   "ModelStreamError": {
      "ErrorCode": <String>,
      "Message": <String>
   },
   "InternalStreamFailure": {
      "Message": <String>
   }
}
```

**4.1. Data Frames**

SageMaker AI converts Data Frames received from the model container into output PayloadParts:

1. On receiving a WebSocket Text Data Frame from the model container, SageMaker AI gets the raw bytes from the Text Data Frame, wraps them into a response PayloadPart, and sets `PayloadPart.DataType = UTF8`.

1. On receiving a WebSocket Binary Data Frame from the model container, SageMaker AI directly wraps the bytes from the Data Frame into a response PayloadPart and sets `PayloadPart.DataType = BINARY`.

1. For a fragmented message as defined in [RFC6455-Section-5.4: Fragmentation](https://datatracker.ietf.org/doc/html/rfc6455#section-5.4):
   + The initial Data Frame with the FIN bit clear is translated into a PayloadPart with `PayloadPart.CompletionState = PARTIAL`.
   + Subsequent Continuation Frames with the FIN bit clear are translated into PayloadParts with `PayloadPart.CompletionState = PARTIAL`.
   + The final Continuation Frame with the FIN bit set is translated into a PayloadPart with `PayloadPart.CompletionState = COMPLETE`.

1. SageMaker AI does not encode or decode the bytes received from the model container; the bytes are streamed back to the caller as-is.

1. SageMaker AI does not combine multiple Data Frames received from the model container into one response PayloadPart.

1. SageMaker AI does not chunk a Data Frame received from the model container into multiple response PayloadParts.

**Example: Streaming Response Flow**

```
Container sends:
Frame 1: Text Data Frame with "Generating" (FIN=0)
Frame 2: Continuation Frame with " response..." (FIN=1)

Client receives:
PayloadPart 1: {Bytes: "Generating", DataType: "UTF8", CompletionState: "PARTIAL"}
PayloadPart 2: {Bytes: " response...", DataType: "UTF8", CompletionState: "COMPLETE"}
```
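
The response-side mapping can be modeled the same way, with frames represented as `(opcode, fin, payload)` tuples. Note that the `DataType` assigned to parts derived from Continuation Frames is an assumption here; the contract above only specifies the mapping for Text and Binary Data Frames:

```python
def frames_to_payload_parts(frames):
    """Translate WebSocket frames from the model container into response PayloadParts."""
    parts = []
    message_is_text = False  # type of the message currently being fragmented
    for opcode, fin, payload in frames:
        if opcode != "continuation":
            message_is_text = opcode == "text"
        parts.append({
            "Bytes": payload,  # one part per frame, bytes passed through as-is
            # Assumption: continuation-derived parts inherit the message's type.
            "DataType": "UTF8" if message_is_text else "BINARY",
            "CompletionState": "COMPLETE" if fin else "PARTIAL",
        })
    return parts
```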

**4.2. Control Frames**

SageMaker AI handles the following Control Frames from the model container:

1. On receiving a Close Frame ([RFC6455-Section-5.5.1](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.1)) from the model container, SageMaker AI wraps the status code ([RFC6455-Section-7.4](https://datatracker.ietf.org/doc/html/rfc6455#section-7.4)) and failure message into a ModelStreamError and streams it back to the end user.

1. On receiving a Ping Frame ([RFC6455-Section-5.5.2](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.2)) from the model container, SageMaker AI responds with a Pong Frame.

1. Pong Frame ([RFC6455-Section-5.5.3](https://datatracker.ietf.org/doc/html/rfc6455#section-5.5.3)): If no Pong Frame is received for 5 consecutive Pings, SageMaker AI closes the connection.

# Use a Private Docker Registry for Real-Time Inference Containers
<a name="your-algorithms-containers-inference-private"></a>

Amazon SageMaker AI hosting enables you to use images stored in Amazon ECR to build your containers for real-time inference by default. Optionally, you can build containers for real-time inference from images in a private Docker registry. The private registry must be accessible from an Amazon VPC in your account. Models that you create based on the images stored in your private Docker registry must be configured to connect to the same VPC where the private Docker registry is accessible. For information about connecting your model to a VPC, see [Give SageMaker AI Hosted Endpoints Access to Resources in Your Amazon VPC](host-vpc.md).

Your Docker registry must be secured with a TLS certificate from a known public certificate authority (CA).

**Note**  
Your private Docker registry must allow inbound traffic from the security groups you specify in the VPC configuration for your model, so that SageMaker AI hosting is able to pull model images from your registry.  
SageMaker AI can pull model images from DockerHub if there's a path to the open internet inside your VPC.

**Topics**
+ [Store Images in a Private Docker Registry other than Amazon Elastic Container Registry](#your-algorithms-containers-inference-private-registry)
+ [Use an Image from a Private Docker Registry for Real-time Inference](#your-algorithms-containers-inference-private-use)
+ [Allow SageMaker AI to authenticate to a private Docker registry](#inference-private-docker-authenticate)
+ [Create the Lambda function](#inference-private-docker-lambda)
+ [Give your execution role permission to Lambda](#inference-private-docker-perms)
+ [Create an interface VPC endpoint for Lambda](#inference-private-docker-vpc-interface)

## Store Images in a Private Docker Registry other than Amazon Elastic Container Registry
<a name="your-algorithms-containers-inference-private-registry"></a>

To use a private Docker registry to store your images for SageMaker AI real-time inference, create a private registry that is accessible from your Amazon VPC. For information about creating a Docker registry, see [Deploy a registry server](https://docs.docker.com/registry/deploying/) in the Docker documentation. The Docker registry must comply with the following:
+ The registry must be a [Docker Registry HTTP API V2](https://docs.docker.com/registry/spec/api/) registry.
+ The Docker registry must be accessible from the same VPC that you specify in the `VpcConfig` parameter that you specify when you create your model.

## Use an Image from a Private Docker Registry for Real-time Inference
<a name="your-algorithms-containers-inference-private-use"></a>

When you create a model and deploy it to SageMaker AI hosting, you can specify that it use an image from your private Docker registry to build the inference container. Specify this in the `ImageConfig` object in the `PrimaryContainer` parameter that you pass to a call to the [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) function.

**To use an image stored in your private Docker registry for your inference container**

1. Create the image configuration object and specify a value of `Vpc` for the `RepositoryAccessMode` field.

   ```
   image_config = {
       'RepositoryAccessMode': 'Vpc'
   }
   ```

1. If your private Docker registry requires authentication, add a `RepositoryAuthConfig` object to the image configuration object. For the `RepositoryCredentialsProviderArn` field of the `RepositoryAuthConfig` object, specify the Amazon Resource Name (ARN) of an AWS Lambda function that provides credentials that allows SageMaker AI to authenticate to your private Docker Registry. For information about how to create the Lambda function to provide authentication, see [Allow SageMaker AI to authenticate to a private Docker registry](#inference-private-docker-authenticate).

   ```
   image_config = {
       'RepositoryAccessMode': 'Vpc',
       'RepositoryAuthConfig': {
           'RepositoryCredentialsProviderArn': 'arn:aws:lambda:Region:Acct:function:FunctionName'
       }
   }
   ```

1. Create the primary container object that you want to pass to `create_model`, using the image configuration object that you created in the previous step. 

   Provide your image in [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier) form. If you provide your image using the `:latest` tag, there is a risk that SageMaker AI pulls a newer version of the image than intended. Using the digest form ensures that SageMaker AI pulls the intended image version.

   ```
   primary_container = {
       'ContainerHostname': 'ModelContainer',
       'Image': 'myteam.myorg.com/docker-local/my-inference-image@sha256:<IMAGE-DIGEST>',
       'ImageConfig': image_config
   }
   ```

1. Specify the model name and the execution role that you want to pass to `create_model`.

   ```
   model_name = 'vpc-model'
   execution_role_arn = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'
   ```

1. Specify one or more security groups and subnets for the VPC configuration for your model. Your private Docker registry must allow inbound traffic from the security groups that you specify. The subnets that you specify must be in the same VPC as your private Docker registry.

   ```
   vpc_config = {
       'SecurityGroupIds': ['sg-0123456789abcdef0'],
       'Subnets': ['subnet-0123456789abcdef0','subnet-0123456789abcdef1']
   }
   ```

1. Get a Boto3 SageMaker AI client.

   ```
   import boto3
   sm = boto3.client('sagemaker')
   ```

1. Create the model by calling `create_model`, using the values you specified in the previous steps for the `PrimaryContainer` and `VpcConfig` parameters.

   ```
   try:
       resp = sm.create_model(
           ModelName=model_name,
           PrimaryContainer=primary_container,
           ExecutionRoleArn=execution_role_arn,
           VpcConfig=vpc_config,
       )
   except Exception as e:
       print(f'error calling CreateModel operation: {e}')
   else:
       print(resp)
   ```

1. Finally, call [create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config) and [create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint) to create the hosting endpoint, using the model that you created in the previous step.

   ```
   endpoint_config_name = 'my-endpoint-config'
   sm.create_endpoint_config(
       EndpointConfigName=endpoint_config_name,
       ProductionVariants=[
           {
               'VariantName': 'MyVariant',
               'ModelName': model_name,
               'InitialInstanceCount': 1,
               'InstanceType': 'ml.t2.medium'
           },
       ],
   )
   
   endpoint_name = 'my-endpoint'
   sm.create_endpoint(
       EndpointName=endpoint_name,
       EndpointConfigName=endpoint_config_name,
   )
   
   sm.describe_endpoint(EndpointName=endpoint_name)
   ```

## Allow SageMaker AI to authenticate to a private Docker registry
<a name="inference-private-docker-authenticate"></a>

To pull an inference image from a private Docker registry that requires authentication, create an AWS Lambda function that provides credentials, and provide the Amazon Resource Name (ARN) of the Lambda function when you call [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). When SageMaker AI runs `create_model`, it calls the Lambda function that you specified to get credentials to authenticate to your Docker registry.

## Create the Lambda function
<a name="inference-private-docker-lambda"></a>

Create an AWS Lambda function that returns a response with the following form:

```
def handler(event, context):
   response = {
      "Credentials": {"Username": "username", "Password": "password"}
   }
   return response
```

Depending on how you set up authentication for your private Docker registry, the credentials that your Lambda function returns can mean either of the following:
+ If you set up your private Docker registry to use basic authentication, provide the sign-in credentials to authenticate to the registry.
+ If you set up your private Docker registry to use bearer token authentication, the sign-in credentials are sent to your authorization server, which returns a Bearer token that can then be used to authenticate to the private Docker registry.
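
A minimal handler might look like the following sketch. It reads the credentials from the function's environment variables (the variable names are hypothetical); in practice you would more likely fetch them from a secrets store such as AWS Secrets Manager:

```python
import os

def handler(event, context):
    # Return registry credentials in the shape SageMaker AI expects.
    # DOCKER_REGISTRY_USERNAME / DOCKER_REGISTRY_PASSWORD are hypothetical
    # names; set them in the Lambda function's environment configuration.
    return {
        "Credentials": {
            "Username": os.environ["DOCKER_REGISTRY_USERNAME"],
            "Password": os.environ["DOCKER_REGISTRY_PASSWORD"],
        }
    }
```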

## Give your execution role permission to Lambda
<a name="inference-private-docker-perms"></a>

The execution role that you use to call `create_model` must have permissions to call AWS Lambda functions. Add the following to the permissions policy of your execution role.

```
{
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction"
    ],
    "Resource": [
        "arn:aws:lambda:*:*:function:*myLambdaFunction*"
    ]
}
```

Where *myLambdaFunction* is the name of your Lambda function. For information about editing a role permissions policy, see [Modifying a role permissions policy (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy) in the *AWS Identity and Access Management User Guide*.

**Note**  
An execution role with the `AmazonSageMakerFullAccess` managed policy attached to it has permission to call any Lambda function with **SageMaker** in its name.

## Create an interface VPC endpoint for Lambda
<a name="inference-private-docker-vpc-interface"></a>

Create an interface endpoint so that your Amazon VPC can communicate with your AWS Lambda function without sending traffic over the internet. For information about how to do this, see [Configuring interface VPC endpoints for Lambda](https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc-endpoints.html) in the *AWS Lambda Developer Guide*.

SageMaker AI hosting sends a request through your VPC to `lambda.region.amazonaws.com`, to call your Lambda function. If you choose Private DNS Name when you create your interface endpoint, Amazon Route 53 routes the call to the Lambda interface endpoint. If you use a different DNS provider, make sure to map `lambda.region.amazonaws.com` to your Lambda interface endpoint.

# Custom Inference Code with Batch Transform
<a name="your-algorithms-batch-code"></a>

This section explains how Amazon SageMaker AI interacts with a Docker container that runs your own inference code for batch transform. Use this information to write inference code and create a Docker image. 

**Topics**
+ [How SageMaker AI Runs Your Inference Image](#your-algorithms-batch-code-run-image)
+ [How SageMaker AI Loads Your Model Artifacts](#your-algorithms-batch-code-load-artifacts)
+ [How Containers Serve Requests](#your-algorithms-batch-code-how-containe-serves-requests)
+ [How Your Container Should Respond to Inference Requests](#your-algorithms-batch-code-how-containers-should-respond-to-inferences)
+ [How Your Container Should Respond to Health Check (Ping) Requests](#your-algorithms-batch-algo-ping-requests)

## How SageMaker AI Runs Your Inference Image
<a name="your-algorithms-batch-code-run-image"></a>

To configure a container to run as an executable, use an `ENTRYPOINT` instruction in a Dockerfile. Note the following: 
+ For batch transforms, SageMaker AI invokes the model on your behalf. SageMaker AI runs the container as:

  ```
  docker run image serve
  ```

  The input to batch transforms must be of a format that can be split into smaller files to process in parallel. These formats include CSV, [JSON](https://www.json.org/json-en.html), [JSON Lines](https://jsonlines.org/), [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord), and [RecordIO](https://mesos.apache.org/documentation/latest/recordio/).

  SageMaker AI overrides default `CMD` statements in a container by specifying the `serve` argument after the image name. The `serve` argument overrides arguments that you provide with the `CMD` command in the Dockerfile.

   
+ We recommend that you use the `exec` form of the `ENTRYPOINT` instruction:

  ```
  ENTRYPOINT ["executable", "param1", "param2"]
  ```

  For example:

  ```
  ENTRYPOINT ["python", "k_means_inference.py"]
  ```

   
+ SageMaker AI sets environment variables specified in [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) and [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html) on your container. Additionally, the following environment variables are populated:
  + `SAGEMAKER_BATCH` is set to `true` when the container runs batch transforms.
  + `SAGEMAKER_MAX_PAYLOAD_IN_MB` is set to the largest size payload that is sent to the container via HTTP.
  + `SAGEMAKER_BATCH_STRATEGY` is set to `SINGLE_RECORD` when the container is sent a single record per call to invocations and `MULTI_RECORD` when the container gets as many records as will fit in the payload.
  + `SAGEMAKER_MAX_CONCURRENT_TRANSFORMS` is set to the maximum number of `/invocations` requests that can be opened simultaneously.
**Note**  
The last three environment variables come from the API call made by the user. If the user doesn’t set values for them, they aren't passed. In that case, either the default values or the values requested by the algorithm (in response to the `/execution-parameters` request) are used.
+ If you plan to use GPU devices for model inferences (by specifying GPU-based ML compute instances in your `CreateTransformJob` request), make sure that your containers are nvidia-docker compatible. Don't bundle NVIDIA drivers with the image. For more information about nvidia-docker, see [NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker). 

   
+ You can't use the `init` initializer as your entry point in SageMaker AI containers because it gets confused by the `train` and `serve` arguments.
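
Inside your inference code, the batch-related environment variables above can be read with fallbacks for the case where they aren't passed. The defaults below are illustrative assumptions, not documented values:

```python
import os

def batch_config():
    """Read the batch transform settings SageMaker AI injects into the container."""
    return {
        "is_batch": os.environ.get("SAGEMAKER_BATCH", "false").lower() == "true",
        "max_payload_mb": int(os.environ.get("SAGEMAKER_MAX_PAYLOAD_IN_MB", "6")),
        "batch_strategy": os.environ.get("SAGEMAKER_BATCH_STRATEGY", "MULTI_RECORD"),
        "max_concurrent": int(os.environ.get("SAGEMAKER_MAX_CONCURRENT_TRANSFORMS", "1")),
    }
```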

  

## How SageMaker AI Loads Your Model Artifacts
<a name="your-algorithms-batch-code-load-artifacts"></a>

In a [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) request, container definitions include the `ModelDataUrl` parameter, which identifies the location in Amazon S3 where model artifacts are stored. When you use SageMaker AI to run inferences, it uses this information to determine from where to copy the model artifacts. It copies the artifacts to the `/opt/ml/model` directory in the Docker container for use by your inference code.

The `ModelDataUrl` parameter must point to a tar.gz file. Otherwise, SageMaker AI can't download the file. If you train a model in SageMaker AI, it saves the artifacts as a single compressed tar file in Amazon S3. If you train a model in another framework, you need to store the model artifacts in Amazon S3 as a compressed tar file. SageMaker AI decompresses this tar file and saves it in the `/opt/ml/model` directory in the container before the batch transform job starts. 
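
If you trained your model with another framework, you can build the expected tar.gz with the Python standard library before uploading it to Amazon S3. A sketch (the file names are placeholders):

```python
import os
import tarfile

def package_model(artifact_paths, output="model.tar.gz"):
    """Bundle model artifact files into the tar.gz layout SageMaker AI expects."""
    with tarfile.open(output, "w:gz") as tar:
        for path in artifact_paths:
            # arcname keeps entries at the archive root, so they land
            # directly under /opt/ml/model when SageMaker AI extracts them.
            tar.add(path, arcname=os.path.basename(path))
    return output
```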

## How Containers Serve Requests
<a name="your-algorithms-batch-code-how-containe-serves-requests"></a>

Containers must implement a web server that responds to invocations and ping requests on port 8080. For batch transforms, you can optionally implement the execution-parameters request in your algorithm to provide a dynamic runtime configuration to SageMaker AI. SageMaker AI uses the following endpoints: 
+ `ping`—Used to periodically check the health of the container. SageMaker AI waits for an HTTP `200` status code and an empty body for a successful ping request before sending an invocations request. You might use a ping request to load a model into memory to generate inference when invocations requests are sent.
+ (Optional) `execution-parameters`—Allows the algorithm to provide the optimal tuning parameters for a job during runtime. Based on the memory and CPUs available for a container, the algorithm chooses the appropriate `MaxConcurrentTransforms`, `BatchStrategy`, and `MaxPayloadInMB` values for the job.

Before calling the invocations request, SageMaker AI attempts to invoke the execution-parameters request. When you create a batch transform job, you can provide values for the `MaxConcurrentTransforms`, `BatchStrategy`, and `MaxPayloadInMB` parameters. SageMaker AI determines the values for these parameters using this order of precedence:

1. The parameter values that you provide in the `CreateTransformJob` request.

1. The values that the model container returns when SageMaker AI invokes the execution-parameters endpoint.

1. The default parameter values, listed in the following table.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html)

The response for a `GET` execution-parameters request is a JSON object with keys for `MaxConcurrentTransforms`, `BatchStrategy`, and `MaxPayloadInMB` parameters. This is an example of a valid response:

```
{
    "MaxConcurrentTransforms": 8,
    "BatchStrategy": "MULTI_RECORD",
    "MaxPayloadInMB": 6
}
```
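
Your container could build this response from its own resources. The sizing heuristic below is an illustrative assumption, not a recommendation:

```python
import json
import os

def execution_parameters():
    """Build the JSON body for a GET /execution-parameters response."""
    # Hypothetical heuristic: one concurrent transform per CPU, capped at 8.
    cpus = os.cpu_count() or 1
    return {
        "MaxConcurrentTransforms": min(cpus, 8),
        "BatchStrategy": "MULTI_RECORD",
        "MaxPayloadInMB": 6,
    }

body = json.dumps(execution_parameters())
```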

## How Your Container Should Respond to Inference Requests
<a name="your-algorithms-batch-code-how-containers-should-respond-to-inferences"></a>

To obtain inferences, Amazon SageMaker AI sends a POST request to the inference container. The POST request body contains data from Amazon S3. Amazon SageMaker AI passes the request to the container, returns the inference result from the container's response, and saves the data from the response to Amazon S3.

To receive inference requests, the container must have a web server listening on port 8080 and must accept POST requests to the `/invocations` endpoint. The inference request timeout and max retries can be configured through [`ModelClientConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ModelClientConfig.html).

## How Your Container Should Respond to Health Check (Ping) Requests
<a name="your-algorithms-batch-algo-ping-requests"></a>

The simplest requirement on the container is to respond with an HTTP 200 status code and an empty body. This indicates to SageMaker AI that the container is ready to accept inference requests at the `/invocations` endpoint.

While the minimum bar is for the container to return a static 200, a container developer can use this functionality to perform deeper checks. The request timeout on `/ping` attempts is 2 seconds.
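
Putting the contract together, a minimal stdlib-only server that answers `/ping` and `/invocations` might look like the following sketch. It echoes the request body back as a placeholder "inference"; a real container would load the model from `/opt/ml/model` and typically use a production serving stack rather than `http.server`:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ping":
            # Health check: HTTP 200 with an empty body means "ready".
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == "/invocations":
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            # Placeholder "inference": echo the input back unchanged.
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.end_headers()
            self.wfile.write(payload)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example output quiet

# To serve: HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```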