

# AWS Clean Rooms ML
<a name="machine-learning"></a>

AWS Clean Rooms ML allows two or more parties to run machine learning models on their data without sharing that data with each other. The service provides privacy-enhancing controls that allow data owners to safeguard their data and their model IP. You can use AWS authored models or bring your own custom model.

For a more detailed explanation of how this works, see [Cross-account jobs](ml-behaviors.md#ml-behaviors-cross-account-jobs).

For more information about the capabilities of Clean Rooms ML models, see the following topics. 

**Topics**
+ [AWS Clean Rooms ML terminology](#ml-terminology)
+ [How AWS Clean Rooms ML works with AWS models](#ml-how-it-works)
+ [How AWS Clean Rooms ML works with custom models](#custML-how-it-works)
+ [AWS models in Clean Rooms ML](aws-models.md)
+ [Custom models in Clean Rooms ML](custom-models.md)

## AWS Clean Rooms ML terminology
<a name="ml-terminology"></a>

It is important to understand the following terminology when using Clean Rooms ML:
+ *Training data provider* – The party that contributes the training data, creates and configures a lookalike model, and then associates that lookalike model with a collaboration.
+ *Seed data provider* – The party that contributes the seed data, generates a lookalike segment, and exports their lookalike segment.
+ *Training data* – The training data provider's data, which is used to generate a lookalike model. The training data is used to measure similarity in user behaviors.

  The training data must contain a user ID, item ID, and timestamp column. Optionally, the training data can contain other interactions as numerical or categorical features. Examples of interactions are a list of videos watched, items purchased, or articles read. 
+ *Seed data* – The seed data provider's data, which is used to create a lookalike segment. The seed data can be provided directly or it can come from the results of an AWS Clean Rooms query. The lookalike segment output is a set of users from the training data that most closely resembles the seed users.
+ *Lookalike model* – A machine learning model of the training data that is used to find similar users in other datasets.

  When using the API, the term *audience model* is used equivalently to lookalike model. For example, you use the [CreateAudienceModel](https://docs.aws.amazon.com/cleanrooms-ml/latest/APIReference/API_CreateAudienceModel.html) API to create a lookalike model.
+ *Lookalike segment* – A subset of the training data that most closely resembles the seed data.

  When using the API, you create a lookalike segment with the [StartAudienceGenerationJob](https://docs.aws.amazon.com/cleanrooms-ml/latest/APIReference/API_StartAudienceGenerationJob.html) API.

The training data provider's data is never shared with the seed data provider and the seed data provider's data is never shared with the training data provider. The lookalike segment output is shared with the training data provider, but never the seed data provider.

## How AWS Clean Rooms ML works with AWS models
<a name="ml-how-it-works"></a>

![An overview of how AWS Clean Rooms ML works with AWS models.](http://docs.aws.amazon.com/clean-rooms/latest/userguide/images/howItWorksML.png)

Working with lookalike models requires that two parties, a training data provider and a seed data provider, work sequentially in AWS Clean Rooms to bring their data into a collaboration. This is the workflow that the training data provider must complete first:

1. The training data provider's data must be stored in an AWS Glue Data Catalog table of user-item interactions. At a minimum, the training data must contain a user ID column, an item ID column, and a timestamp column.

1. The training data provider registers the training data with AWS Clean Rooms.

1. The training data provider creates a lookalike model that can be shared with multiple seed data providers. The lookalike model is a deep neural network that can take up to 24 hours to train. It isn't automatically retrained and we recommend that you retrain the model weekly.

1. The training data provider configures the lookalike model, including whether to share relevance metrics and the Amazon S3 location of the output segments. The training data provider can create multiple configured lookalike models from a single lookalike model.

1. The training data provider associates the configured lookalike model with a collaboration that's shared with a seed data provider.

This is the workflow that the seed data provider must complete next:

1. The seed data provider's data can be stored in an Amazon S3 bucket or it can come from the results of a query.

1. The seed data provider opens the collaboration that they share with the training data provider.

1. The seed data provider creates a lookalike segment from the Clean Rooms ML tab of the collaboration page. 

1. The seed data provider can evaluate the relevance metrics, if they were shared, and export the lookalike segment for use outside AWS Clean Rooms.

## How AWS Clean Rooms ML works with custom models
<a name="custML-how-it-works"></a>

With Clean Rooms ML, members of a collaboration can use a dockerized custom model algorithm that is stored in Amazon ECR to jointly analyze their data. To do this, the *model provider* must create an image and store it in Amazon ECR. Follow the steps in the [Amazon Elastic Container Registry User Guide](https://docs.aws.amazon.com/AmazonECR/latest/userguide/) to create a private repository that will contain the custom ML model.

Any member of a collaboration can be the *model provider*, provided they have the correct permissions. All members of a collaboration can contribute training data, inference data, or both to the model. For the purpose of this guide, members contributing data are referred to as *data providers*. The member who creates the collaboration is the *collaboration creator*, and this member can be either the *model provider*, one of the *data providers*, or both.

At the highest level, here are the steps that must be completed to perform custom ML modeling:

1. The collaboration creator creates a collaboration and assigns each member the proper member abilities and payment configuration. The collaboration creator must assign the member ability to either receive model outputs or receive inference results to the appropriate member in this step because it can't be updated after the collaboration is created. For more information, see [Creating and joining the collaboration in AWS Clean Rooms ML](create-custom-ml-collaboration.md).

1. The model provider configures and associates their containerized ML model to the collaboration and ensures privacy constraints are set for exported data. For more information, see [Configuring a model algorithm in AWS Clean Rooms ML](configure-model-algorithm.md).

1. The data providers contribute their data to the collaboration and ensure their privacy needs are specified. Data providers must allow the model to access their data. For more information, see [Contributing training data in AWS Clean Rooms ML](custom-model-training-data.md) and [Associating the configured model algorithm in AWS Clean Rooms ML](associate-model-algorithm.md).

1. A collaboration member creates the ML configuration, which defines where the model artifacts or inference results are exported to.

1. A collaboration member creates an ML input channel that provides input to the training container or inference container. The ML input channel is a query that defines the data to be used in the context of the model algorithm.

1. A collaboration member invokes model training using the ML input channel and the configured model algorithm. For more information, see [Creating a trained model in AWS Clean Rooms ML](create-trained-model.md).

1. (Optional) The model trainer invokes the model export job and the model artifacts are sent to the model results receiver. Only members with a valid ML configuration and the member ability to receive model output can receive model artifacts. For more information, see [Exporting model artifacts from AWS Clean Rooms ML](export-model-artifacts.md).

1. (Optional) A collaboration member invokes model inference using the ML input channel, the trained model ARN, and the configured model algorithm for inference. The inference results are sent to the inference output receiver. Only members with a valid ML configuration and the member ability to receive inference output can receive inference results.

Here are the steps that must be completed by the *model provider*:

1. Create a SageMaker AI compatible Amazon ECR docker image. Clean Rooms ML supports only SageMaker AI compatible docker images.

1. After you have created a SageMaker AI compatible docker image, push the image to Amazon ECR. Follow the directions in the [Amazon Elastic Container Registry User Guide](https://docs.aws.amazon.com/AmazonECR/latest/userguide/) to create a container training image.

1. Configure the model algorithm for use in Clean Rooms ML.

   1. Provide the Amazon ECR repository link and any arguments necessary to configure the model algorithm.

   1. Provide a service access role that allows Clean Rooms ML to access the Amazon ECR repository.

   1. Associate the configured model algorithm with the collaboration. This includes providing a privacy policy that defines controls for container logs, failure logs, CloudWatch metrics, and limits on how much data can be exported from the container results.

Here are the steps that must be completed by the *data provider* to collaborate with a custom ML model:

1. Configure an existing AWS Glue table with a custom analysis rule. This allows a specific set of pre-approved queries or pre-approved accounts to use your data.

1. Associate your configured table with a collaboration and provide a service access role that can access your AWS Glue tables.

1. [Add a collaboration analysis rule](add-collaboration-analysis-rule.md) to the table that allows the configured model algorithm association to access the configured table.

1. After the model and data are associated and configured in Clean Rooms ML, the member with the ability to run queries provides an SQL query and selects the model algorithm to use.

After model training is finished, that member initiates the export of model training artifacts or inference results. These artifacts or results are sent to the member with the ability to receive trained model output. The results receiver must configure their `MachineLearningConfiguration` before they can receive model output.
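For reference, the custom analysis rule added to a configured table has roughly the following shape. This is an illustrative sketch only: the account ID is a placeholder, and you should confirm the exact fields and permitted combinations in [Add a collaboration analysis rule](add-collaboration-analysis-rule.md).

```
{
  "allowedAnalyses": ["ANY_QUERY"],
  "allowedAnalysisProviders": ["111122223333"]
}
```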

# AWS models in Clean Rooms ML
<a name="aws-models"></a>

AWS Clean Rooms ML provides a privacy-preserving method for two parties to identify similar users in their data without the need to share their data with each other. The first party brings the training data to AWS Clean Rooms so that they can create and configure a lookalike model and associate it with a collaboration. Then, seed data is brought to the collaboration to create a lookalike segment that resembles the training data.

For a more detailed explanation of how this works, see [Cross-account jobs](ml-behaviors.md#ml-behaviors-cross-account-jobs).

The following topics provide information on how to create and configure AWS models in Clean Rooms ML.

**Topics**
+ [Privacy protections of AWS Clean Rooms ML](ml-privacy.md)
+ [Training data requirements for Clean Rooms ML](ml-training-data-requirements.md)
+ [Seed data requirements for Clean Rooms ML](ml-seed-data-requirements.md)
+ [AWS Clean Rooms ML model evaluation metrics](ml-metrics.md)

# Privacy protections of AWS Clean Rooms ML
<a name="ml-privacy"></a>

Clean Rooms ML is designed to reduce the risk of *membership inference attacks* where the training data provider can learn who is in the seed data and the seed data provider can learn who is in the training data. Several steps are taken to prevent this attack.

First, seed data providers don't directly observe the Clean Rooms ML output and training data providers can never observe the seed data. Seed data providers can choose to include the seed data in the output segment.

Next, the lookalike model is created from a random sample of the training data. This sample includes a significant number of users who don't match the seed audience. This process makes it harder to determine whether a user was not in the training data, which is another avenue for membership inference.

Further, multiple seed users are required for every parameter of seed-specific lookalike model training. This limits how much the model can overfit, and thus how much can be inferred about an individual user. As a result, we recommend that the seed data contain at least 500 users.

Finally, user-level metrics are never provided to training data providers, which eliminates another avenue for a membership inference attack.

# Training data requirements for Clean Rooms ML
<a name="ml-training-data-requirements"></a>

To successfully create a lookalike model, your training data must meet the following requirements:
+ The training data must be in Parquet, CSV, or JSON format.

  **Note**  
  Zstandard (ZSTD) compressed Parquet data is not supported.
+ Your training data must be cataloged in AWS Glue. For more information, see [Getting started with the AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/start-data-catalog.html) in the AWS Glue Developer Guide. We recommend using AWS Glue crawlers to create your tables because the schema is inferred automatically.
+ The Amazon S3 bucket that contains the training data and seed data must be in the same AWS Region as your other Clean Rooms ML resources.
+ The training data must contain at least 100,000 unique user IDs with at least two item interactions each.
+ The training data must contain at least 1 million records.
+ The schema specified in the [CreateTrainingDataset](https://docs.aws.amazon.com/cleanrooms-ml/latest/APIReference/API_CreateTrainingDataset.html) action must align with the schema defined when the AWS Glue table was created.
+ The required fields, as described in the [CreateTrainingDataset](https://docs.aws.amazon.com/cleanrooms-ml/latest/APIReference/API_CreateTrainingDataset.html) action, must be present in the provided table.
+ Optionally, you can provide up to 10 total categorical or numerical features.

Here is an example of a valid training data set in CSV format:

```
USER_ID,ITEM_ID,TIMESTAMP,EVENT_TYPE(CATEGORICAL FEATURE),EVENT_VALUE (NUMERICAL FEATURE)
196,242,881250949,click,15
186,302,891717742,click,13
22,377,878887116,click,10
244,51,880606923,click,20
166,346,886397596,click,10
```
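Before cataloging your training data in AWS Glue, you can sanity-check it against these requirements locally. The following Python sketch is illustrative only (the column names follow the sample above, and `check_training_data` is not part of any AWS SDK):

```
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = {"USER_ID", "ITEM_ID", "TIMESTAMP"}

def check_training_data(csv_text, min_users=100_000, min_interactions=2,
                        min_records=1_000_000):
    """Check a CSV training data set against the Clean Rooms ML requirements."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Feature columns may carry annotations such as "(CATEGORICAL FEATURE)".
    base_columns = {name.split("(")[0].strip() for name in reader.fieldnames}
    interactions = Counter(row["USER_ID"] for row in reader)
    qualified_users = sum(1 for n in interactions.values() if n >= min_interactions)
    return {
        "has_required_columns": REQUIRED_COLUMNS <= base_columns,
        "enough_users": qualified_users >= min_users,
        "enough_records": sum(interactions.values()) >= min_records,
    }
```

Run against the five-row sample above, the column check passes while the 100,000-user and 1 million-record minimums do not.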

# Seed data requirements for Clean Rooms ML
<a name="ml-seed-data-requirements"></a>

The seed data for a lookalike model can either come directly from an Amazon S3 bucket or from the results of an SQL query. 

Seed data that's provided directly must meet the following requirements:
+ The seed data must be in JSON lines format with a list of user IDs.
+ The seed size should be between 25 and 500,000 unique user IDs.
+ The minimum number of seed users must match the minimum matching seed size value that was specified when you created the configured lookalike model.

The following is an example of valid seed data in JSON Lines format:

```
{"user_id": "abc"}
{"user_id": "def"}
{"user_id": "ghijkl"}
{"user_id": "123"}
{"user_id": "456"}
{"user_id": "7890"}
```
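Seed data in this format is straightforward to produce with Python's standard library. The following sketch is illustrative (the `user_id` key follows the example above, and the size bounds come from the requirements listed earlier):

```
import json

def write_seed_file(user_ids, path, min_size=25, max_size=500_000):
    """Write one {"user_id": ...} object per line, enforcing the seed size bounds."""
    unique_ids = list(dict.fromkeys(user_ids))  # de-duplicate, preserve order
    if not min_size <= len(unique_ids) <= max_size:
        raise ValueError(f"seed size {len(unique_ids)} is outside [{min_size}, {max_size}]")
    with open(path, "w") as f:
        for uid in unique_ids:
            f.write(json.dumps({"user_id": uid}) + "\n")
    return len(unique_ids)
```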

# AWS Clean Rooms ML model evaluation metrics
<a name="ml-metrics"></a>

Clean Rooms ML computes the *recall* and *relevance score* to determine how well your model performs. Recall compares the similarity between the lookalike data and training data. The relevance score is used to decide how large the audience should be, not whether the model is well-performing.

*Recall* is an unbiased measure of how similar the lookalike segment is to the training data. Recall is the percentage of the most similar users (by default, the most similar 20%) from a sample of the training data that are included in the seed audience by the audience generation job. Values range from 0 to 1; larger values indicate a better audience. A recall value approximately equal to the maximum bin percentage indicates that the lookalike model is equivalent to random selection.

We consider this a better evaluation metric than accuracy, precision, and F1 scores because Clean Rooms ML doesn't have accurately labeled true negative users when building its model.

Segment-level *relevance score* is a measure of similarity with values ranging from -1 (least similar) to 1 (most similar). Clean Rooms ML computes a set of relevance scores for various segment sizes to help you determine the best segment size for your data. Relevance scores decrease monotonically as the segment size increases, because larger segments are less similar to the seed data. When the segment-level relevance score reaches 0, the model predicts that all users in the lookalike segment are from the same distribution as the seed data; increasing the output size beyond that point is likely to include users that aren't from the same distribution as the seed data.

Relevance scores are normalized within a single campaign and shouldn't be used to compare across campaigns. They also shouldn't be used as the sole evidence for any business outcome, because outcomes are affected by multiple complex factors in addition to relevance, such as inventory quality, inventory type, and the timing of advertising.

Relevance scores shouldn't be used to judge the quality of the seed data, but rather to decide whether the lookalike segment size can be increased or decreased. Consider the following examples:
+ All positive scores – This indicates that there are more output users that are predicted as similar than are included in the lookalike segment. This is common for seed data that's part of a large market, such as everybody who has bought toothpaste in the past month. We recommend looking at smaller seed data, such as everybody who has bought toothpaste more than once in the past month.
+ All negative scores or negative for your desired lookalike segment size – This indicates that Clean Rooms ML predicts there aren't enough similar users in the desired lookalike segment size. This can be because the seed data is too specific or the market is too small. We recommend either applying fewer filters to the seed data or widening the market. For example, if the original seed data was customers that bought a stroller and car seat, you could expand the market to customers that bought multiple baby products.

Training data providers determine whether the relevance scores are exposed and the bucket bins where relevance scores are computed.
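One way to apply this guidance programmatically is to take the largest candidate segment size whose relevance score is still non-negative. This is an illustrative heuristic, not part of Clean Rooms ML, and the score mapping below is hypothetical:

```
def pick_segment_size(relevance_by_size):
    """Return the largest segment size whose relevance score is non-negative,
    or None if every score is negative (widen the market or relax the seed
    filters in that case)."""
    viable = [size for size, score in relevance_by_size.items() if score >= 0]
    return max(viable) if viable else None
```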

# Custom models in Clean Rooms ML
<a name="custom-models"></a>

With Clean Rooms ML, members of a collaboration can use a dockerized custom model algorithm that is stored in Amazon ECR to jointly analyze their data. To do this, the *model provider* must create an image and store it in Amazon ECR. Follow the steps in the [Amazon Elastic Container Registry User Guide](https://docs.aws.amazon.com/AmazonECR/latest/userguide/) to create a private repository that will contain the custom ML model.

Any member of a collaboration can be the *model provider*, provided they have the correct permissions. All members of a collaboration can contribute data to the model. For the purpose of this guide, members contributing data are referred to as *data providers*. The member who creates the collaboration is the *collaboration creator*, and this member can be either the *model provider*, one of the *data providers*, or both.

The following topics describe the information necessary to create a custom ML model.

**Topics**
+ [Custom ML modeling prerequisites](custom-model-prerequisites.md)
+ [Model authoring guidelines for the training container](custom-model-guidelines.md)
+ [Model authoring guidelines for the inference container](inference-model-guidelines.md)
+ [Receiving model logs and metrics](custom-model-logs.md)

# Custom ML modeling prerequisites
<a name="custom-model-prerequisites"></a>

Before you can perform custom ML modeling, you should consider the following:
+ Determine whether both model training and inference on the trained model are going to be performed in the collaboration.
+ Determine the role that each collaboration member will perform and assign them the appropriate abilities.
  + Assign the `CAN_QUERY` ability to the member who will train the model and run inference on the trained model.
  + Assign the `CAN_RECEIVE_RESULTS` ability to at least one member of the collaboration.
  + Assign `CAN_RECEIVE_MODEL_OUTPUT` or `CAN_RECEIVE_INFERENCE_OUTPUT` abilities to the member that will receive trained model exports or inference output, respectively. You can choose to use both abilities if they are required by your use-case.
+ Determine the maximum size of the trained model artifacts or inference results that you will allow to be exported.
+ We recommend that all users have the `CleanroomsFullAccess` and `CleanroomsMLFullAccess` policies attached to their role. Using custom ML models requires using both the AWS Clean Rooms and AWS Clean Rooms ML SDKs.
+ Consider the following information about IAM roles.
  + All data providers must have a service access role that allows AWS Clean Rooms to read data from their AWS Glue catalogs and tables, and the underlying Amazon S3 locations. These roles are similar to those required for SQL querying. This allows you to use the `CreateConfiguredTableAssociation` action. For more information, see [Create a service role to create a configured table association](ml-roles.md#ml-roles-custom-configure-table). 
  + All members that want to receive metrics must have a service access role that allows them to write CloudWatch metrics and logs. This role is used by Clean Rooms ML to write all model metrics and logs to the member's AWS account during model training and inference. We also provide privacy controls to determine which members have access to the metrics and logs. This allows you to use the `CreateMLConfiguration` action. For more information, see [Create a service role for custom ML modeling - ML Configuration](ml-roles.md#ml-roles-custom-configure). 

    The member receiving results must provide a service access role with permissions to write to their Amazon S3 bucket. This role allows Clean Rooms ML to export results (trained model artifacts or inference results) to an Amazon S3 bucket. This allows you to use the `CreateMLConfiguration` action. For more information, see [Create a service role for custom ML modeling - ML Configuration](ml-roles.md#ml-roles-custom-configure). 
  + The model provider must provide a service access role with permissions to read their Amazon ECR repository and image. This allows you to use the `CreateConfiguredModelAlgorithm` action. For more information, see [Create a service role to provide a custom ML model](ml-roles.md#ml-roles-custom-model-provider). 
  + The member that creates the `MLInputChannel` to generate datasets for training or inference must provide a service access role that allows Clean Rooms ML to execute an SQL query in AWS Clean Rooms. This allows you to use the `CreateTrainedModel` and `StartTrainedModelInferenceJob` actions. For more information, see [Create a service role to query a dataset](ml-roles.md#ml-roles-custom-query-dataset). 
+ Model authors should follow the [Model authoring guidelines for the training container](custom-model-guidelines.md) and [Model authoring guidelines for the inference container](inference-model-guidelines.md) to ensure model inputs and outputs are configured as expected by AWS Clean Rooms.
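As a starting point, the trust policy on these service access roles generally needs to allow the Clean Rooms ML service principal to assume the role. The following is a sketch; confirm the exact principal and any recommended condition keys in the role-creation topics linked above.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cleanrooms-ml.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```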

# Model authoring guidelines for the training container
<a name="custom-model-guidelines"></a>

This section details the guidelines that model providers should follow when creating a custom ML model algorithm for Clean Rooms ML.
+ Use the appropriate SageMaker AI training-supported container base image, as described in the [SageMaker AI Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/sagemaker-algo-docker-registry-paths.html). The following code allows you to pull the supported container base images from public SageMaker AI endpoints.

  ```
  ecr_registry_endpoint="763104351884.dkr.ecr.$REGION.amazonaws.com"
  base_image='pytorch-training:2.3.0-cpu-py311-ubuntu20.04-sagemaker'
  aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ecr_registry_endpoint
  docker pull $ecr_registry_endpoint/$base_image
  ```
+ When authoring the model locally, ensure the following so that you can test your model locally, on a development instance, on SageMaker AI Training in your AWS account, and on Clean Rooms ML.
  + We recommend writing a training script that accesses useful properties about the training environment through various environment variables. Clean Rooms ML uses the following arguments to invoke training on your model code: `SM_MODEL_DIR`, `SM_OUTPUT_DIR`, `SM_CHANNEL_TRAIN`, and `FILE_FORMAT`. These defaults are used by Clean Rooms ML to train your ML model in its own execution environment with the data from all parties.
  + Clean Rooms ML makes your training input channels available via the `/opt/ml/input/data/channel-name` directories in the docker container. Each ML input channel is mapped based on its corresponding `channel_name` provided in the `CreateTrainedModel` request.

    ```
    import argparse
    import os

    # Data, model, and output directories
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR', "/opt/ml/model"))
    parser.add_argument('--output_dir', type=str, default=os.environ.get('SM_OUTPUT_DIR', "/opt/ml/output/data"))
    parser.add_argument('--train_dir', type=str, default=os.environ.get('SM_CHANNEL_TRAIN', "/opt/ml/input/data/train"))
    parser.add_argument('--train_file_format', type=str, default=os.environ.get('FILE_FORMAT', "csv"))
    ```
  + Ensure that you can generate a synthetic or test dataset that matches the schema of the collaborators' data that your model code will use.
  + Ensure that you can run a SageMaker AI training job in your own AWS account before you associate the model algorithm with an AWS Clean Rooms collaboration.

    The following code contains a sample Dockerfile that is compatible with local testing, SageMaker AI Training environment testing, and Clean Rooms ML:

    ```
    FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.3.0-cpu-py311-ubuntu20.04-sagemaker
    LABEL maintainer="$author_name"
    
    ENV PYTHONDONTWRITEBYTECODE=1 \
        PYTHONUNBUFFERED=1 \
        LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"
    
    ENV PATH="/opt/ml/code:${PATH}"
    
    # this environment variable is used by the SageMaker PyTorch container to determine our user code directory
    ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
    
    # copy the training script inside the container
    COPY train.py /opt/ml/code/train.py
    # define train.py as the script entry point
    ENV SAGEMAKER_PROGRAM train.py
    ENTRYPOINT ["python", "/opt/ml/code/train.py"]
    ```
+ To help diagnose container failures, we recommend writing the failure reason to the `/opt/ml/output/failure` file. In a `GetTrainedModel` response, Clean Rooms ML returns the first 1024 characters from this file under `StatusDetails`. 
+ After you have completed any model changes and you are ready to test it in the SageMaker AI environment, run the following commands in the order provided.

  ```
  export ACCOUNT_ID=xxx
  export REPO_NAME=xxx
  export REPO_TAG=xxx
  export REGION=xxx
  
  docker build -t $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:$REPO_TAG .
  
  # Sign in to AWS account $ACCOUNT_ID (for example, by running aws configure)
  # Check that you are using the correct role/credentials
  aws sts get-caller-identity
  aws ecr create-repository --repository-name $REPO_NAME --region $REGION
  aws ecr describe-repositories --repository-name $REPO_NAME --region $REGION
  
  # Authenticate Docker with Amazon ECR
  aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
  
  # Push the image to Amazon ECR
  docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:$REPO_TAG
  
  # Create SageMaker training job
  # Configure the training_job.json with
  # 1. TrainingImage
  # 2. Input DataConfig
  # 3. Output DataConfig
  aws sagemaker create-training-job --cli-input-json file://training_job.json --region $REGION
  ```

  After the SageMaker AI job is complete and you are satisfied with your model algorithm, you can register the image with AWS Clean Rooms ML. Use the `CreateConfiguredModelAlgorithm` action to register the model algorithm and the `CreateConfiguredModelAlgorithmAssociation` action to associate it with a collaboration.
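Putting the conventions in this section together, a minimal `train.py` skeleton might look like the following. This is a sketch only: the environment-variable defaults mirror the argument parsing shown earlier, the training step is a placeholder record count, and `/opt/ml/output/failure` follows the SageMaker AI failure-reason convention.

```
import os
import sys
import traceback

# Environment-variable defaults mirror the argument parsing shown earlier.
MODEL_DIR = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
TRAIN_DIR = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
FAILURE_FILE = "/opt/ml/output/failure"  # SageMaker AI failure-reason convention

def train(model_dir, train_dir):
    """Placeholder training: count records in the input channel and write a model artifact."""
    records = 0
    for name in sorted(os.listdir(train_dir)):
        with open(os.path.join(train_dir, name)) as f:
            records += sum(1 for _ in f)
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.txt"), "w") as f:
        f.write(f"trained on {records} records\n")
    return records

def main():
    try:
        train(MODEL_DIR, TRAIN_DIR)
    except Exception:
        # The first 1024 characters of this file are surfaced in the
        # StatusDetails field of the GetTrainedModel response.
        os.makedirs(os.path.dirname(FAILURE_FILE), exist_ok=True)
        with open(FAILURE_FILE, "w") as f:
            f.write(traceback.format_exc()[:1024])
        sys.exit(1)
```

In the container, `main()` would be invoked by the `SAGEMAKER_PROGRAM`/`ENTRYPOINT` wiring shown in the sample Dockerfile.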

# Model authoring guidelines for the inference container
<a name="inference-model-guidelines"></a>

This section details the guidelines that model providers should follow when creating an inference algorithm for Clean Rooms ML.
+ Use the appropriate SageMaker AI inference-supported container base image, as described in the [SageMaker AI Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/sagemaker-algo-docker-registry-paths.html). The following code allows you to pull the supported container base images from public SageMaker AI endpoints.

  ```
  ecr_registry_endpoint="763104351884.dkr.ecr.$REGION.amazonaws.com"
  base_image='pytorch-inference:2.3.0-cpu-py311-ubuntu20.04-sagemaker'
  aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ecr_registry_endpoint
  docker pull $ecr_registry_endpoint/$base_image
  ```
+ When authoring the model locally, ensure the following so that you can test your model locally, on a development instance, on SageMaker AI Batch Transform in your AWS account, and on Clean Rooms ML.
  + Clean Rooms ML makes the trained model artifacts available to your inference code via the `/opt/ml/model` directory in the docker container.
  + Clean Rooms ML splits input by line, uses a `MultiRecord` batch strategy, and adds a newline character at the end of every transformed record.
  + Ensure that you can generate a synthetic or test inference dataset that matches the schema of the collaborators' data that your model code will use.
  + Ensure that you can run a SageMaker AI batch transform job in your own AWS account before you associate the model algorithm with an AWS Clean Rooms collaboration.

    The following code contains a sample Dockerfile that is compatible with local testing, SageMaker AI transform environment testing, and Clean Rooms ML:

    ```
    FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.12.1-cpu-py38-ubuntu20.04-sagemaker
    
    ENV PYTHONUNBUFFERED=1
    
    COPY serve.py /opt/ml/code/serve.py
    COPY inference_handler.py /opt/ml/code/inference_handler.py
    COPY handler_service.py /opt/ml/code/handler_service.py
    COPY model.py /opt/ml/code/model.py
    
    RUN chmod +x /opt/ml/code/serve.py
    
    ENTRYPOINT ["/opt/ml/code/serve.py"]
    ```
+ After you have completed any model changes and you are ready to test it in the SageMaker AI environment, run the following commands in the order provided.

  ```
  export ACCOUNT_ID=xxx
  export REPO_NAME=xxx
  export REPO_TAG=xxx
  export REGION=xxx
  
  docker build -t $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:$REPO_TAG .
  
  # Sign in to AWS account $ACCOUNT_ID (run aws configure)
  # Verify that the expected role and credentials are in use
  aws sts get-caller-identity
  aws ecr create-repository --repository-name $REPO_NAME --region $REGION
  aws ecr describe-repositories --repository-name $REPO_NAME --region $REGION
  
  # Authenticate Docker
  aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
  
  # Push To ECR Repository
  docker push $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO_NAME:$REPO_TAG
  
  # Create SageMaker model
  # Configure the create_model.json with
  # 1. Primary container - 
      # a. ModelDataUrl - S3 Uri of the model.tar from your training job
  aws sagemaker create-model --cli-input-json file://create_model.json --region $REGION
  
  # Create SageMaker transform job
  # Configure the transform_job.json with
  # 1. Model created in the step above 
  # 2. MultiRecord batch strategy
  # 3. Line SplitType for TransformInput
  # 4. AssembleWith Line for TransformOutput
  aws sagemaker create-transform-job --cli-input-json file://transform_job.json --region $REGION
  ```

  After the SageMaker AI job is complete and you are satisfied with your batch transform results, you can register the image in your Amazon ECR repository with AWS Clean Rooms ML. Use the `CreateConfiguredModelAlgorithm` action to register the model algorithm and the `CreateConfiguredModelAlgorithmAssociation` action to associate it with a collaboration.
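Because Clean Rooms ML splits input with `Line`, uses the `MultiRecord` batch strategy, and assembles output with `Line`, your inference code must consume newline-delimited records and emit one newline-terminated record per input record. The following sketch (function and field names are illustrative, not part of any Clean Rooms ML or SageMaker AI API) shows transform logic you could exercise against a synthetic dataset during local testing.

```python
import json

def transform_batch(payload: str) -> str:
    """Transform a MultiRecord batch: one JSON record per input line,
    one newline-terminated JSON record per output line."""
    outputs = []
    for line in payload.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Illustrative scoring logic; replace with your model's inference.
        record["score"] = len(record.get("features", []))
        outputs.append(json.dumps(record))
    # AssembleWith=Line expects newline-delimited, newline-terminated output.
    return "\n".join(outputs) + "\n"
```

Running this against a two-record synthetic batch locally lets you confirm the line-in, line-out contract before moving to a SageMaker AI batch transform job.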

# Receiving model logs and metrics
<a name="custom-model-logs"></a>

To receive logs and metrics from custom model training or inference, members must have [created an ML Configuration](https://docs.aws.amazon.com/clean-rooms/latest/userguide/create-custom-ml-collaboration.html) with a valid role that provides the necessary CloudWatch permissions (see [Create a service role for custom ML modeling - ML Configuration](https://docs.aws.amazon.com/clean-rooms/latest/userguide/ml-roles.html#ml-roles-custom-configure)).

**System metrics**

System metrics for both training and inference, such as CPU and memory utilization, are published to all members in the collaboration with valid ML Configurations. These metrics can be viewed as the job progresses via CloudWatch Metrics in the `/aws/cleanroomsml/TrainedModels` or `/aws/cleanroomsml/TrainedModelInferenceJobs` namespaces, respectively.

**Model logs**

Access to the model logs is provided by the privacy configuration policy of each configured model algorithm. The model author sets the privacy configuration policy when associating a configured model algorithm (either via the console or the `CreateConfiguredModelAlgorithmAssociation` API) to a collaboration. Setting the privacy configuration policy controls which members can receive the model logs.

Additionally, the model author can set a filter pattern in the privacy configuration policy to filter log events. All logs that a model container sends to `stdout` or `stderr` and that match the filter pattern (if one is set) are sent to Amazon CloudWatch Logs. Model logs are available in the CloudWatch log groups `/aws/cleanroomsml/TrainedModels` (training) and `/aws/cleanroomsml/TrainedModelInferenceJobs` (inference).
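For example, if a model author set a hypothetical filter pattern such as `METRIC` in the privacy configuration policy, only container output containing that term would be forwarded to CloudWatch Logs. The following sketch mimics that term-matching behavior locally; the `METRIC` prefix is an illustrative convention, not a Clean Rooms ML requirement.

```python
# A hypothetical filter pattern set in the privacy configuration policy.
FILTER_PATTERN = "METRIC"

def matches_filter(log_line: str) -> bool:
    """Mimic a simple term filter: the line is forwarded only if it
    contains the filter term."""
    return FILTER_PATTERN in log_line

def shareable(message: str) -> str:
    """Prefix a log line so it passes the filter above (illustrative)."""
    return f"{FILTER_PATTERN} {message}"
```

With a convention like this, model code can deliberately separate lines intended for collaborators from internal diagnostics that the filter excludes.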

**Custom defined metrics**

When configuring a model algorithm (either via the console or the `CreateConfiguredModelAlgorithm` API), the model author can provide specific metric names and regex statements to search for in the output logs. These can be viewed as the job progresses via CloudWatch Metrics in the `/aws/cleanroomsml/TrainedModels` namespace. When associating a configured model algorithm, the model author can set an optional noise level in the metrics privacy configuration to avoid outputting raw data while still providing visibility into custom metric trends. If a noise level is set, the metrics are published at the end of the job rather than in real time.
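Each custom metric pairs a name with a regex whose first capture group extracts the value from a log line. The following sketch (the metric name and regex are illustrative assumptions, not values required by Clean Rooms ML) shows how such a definition maps container log output to metric values.

```python
import re

# Hypothetical metric definition; the name and regex are illustrative.
metric_definitions = [{"Name": "loss", "Regex": r"loss=([0-9.]+)"}]

def extract_metrics(log_line: str) -> dict:
    """Apply each metric regex to a log line; the first capture group
    is taken as the metric value."""
    found = {}
    for definition in metric_definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found
```

Testing your regexes this way against sample log lines helps confirm that the values you expect will actually be captured before the job runs.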