AWS Clean Rooms ML
AWS Clean Rooms ML allows two or more parties to run machine learning models on their data without sharing that data with each other. The service provides privacy-enhancing controls that allow data owners to safeguard their data and their model IP. You can use AWS-authored models or bring your own custom model.
For a more detailed explanation of how this works, see Cross-account jobs.
For more information about the capabilities of Clean Rooms ML models, see the following topics.
How AWS Clean Rooms ML works with AWS models
Working with lookalike models requires that two parties, a training data provider and a seed data provider, work sequentially in AWS Clean Rooms to bring their data into a collaboration. This is the workflow that the training data provider must complete first:
- The training data provider's data must be stored in an AWS Glue data catalog table of user-item interactions. At a minimum, the training data must contain a user ID column, an interaction ID column, and a timestamp column.
- The training data provider registers the training data with AWS Clean Rooms.
- The training data provider creates a lookalike model that can be shared with multiple seed data providers. The lookalike model is a deep neural network that can take up to 24 hours to train. It isn't automatically retrained, so we recommend that you retrain the model weekly.
- The training data provider configures the lookalike model, including whether to share relevance metrics and the Amazon S3 location of the output segments. The training data provider can create multiple configured lookalike models from a single lookalike model.
- The training data provider associates the configured lookalike model with a collaboration that's shared with a seed data provider.
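The two checks implied by this workflow, the minimum column set and the weekly retraining recommendation, can be sketched in a few lines of Python. This is a minimal illustration, not part of the service: the role labels and the function names `missing_roles` and `needs_retraining` are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, Set

# Hypothetical role labels for this sketch. The guide only requires that the
# table has a user ID column, an interaction ID column, and a timestamp column.
REQUIRED_ROLES = {"user_id", "interaction_id", "timestamp"}


def missing_roles(column_roles: Dict[str, str]) -> Set[str]:
    """Return the required roles that no column in the table fills.

    column_roles maps a column name to the role that column plays.
    """
    return REQUIRED_ROLES - set(column_roles.values())


def needs_retraining(last_trained: datetime, now: datetime,
                     max_age: timedelta = timedelta(days=7)) -> bool:
    """Return True when the model is at least max_age old (weekly by default)."""
    return now - last_trained >= max_age
```

You could run `missing_roles` against your table's column mapping before you register the training data, and schedule `needs_retraining` to flag a lookalike model that is more than a week old.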
This is the workflow that the seed data provider must complete next:
- The seed data provider's data can be stored in an Amazon S3 bucket, or it can come from the results of a query.
- The seed data provider opens the collaboration that they share with the training data provider.
- The seed data provider creates a lookalike segment from the Clean Rooms ML tab of the collaboration page.
- The seed data provider can evaluate the relevance metrics, if they were shared, and export the lookalike segment for use outside AWS Clean Rooms.
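The seed data provider's choices can be modeled as a small sketch: the seed data comes from exactly one source (an Amazon S3 location or a query), and relevance metrics are available only if the training data provider shared them. The `SeedSource` type and the `segment_outputs` function are hypothetical names for this illustration and don't correspond to a Clean Rooms ML API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SeedSource:
    """Where the seed data comes from: an S3 location or a query (hypothetical type)."""
    s3_uri: Optional[str] = None
    query: Optional[str] = None

    def kind(self) -> str:
        # Exactly one of the two sources must be set.
        if (self.s3_uri is None) == (self.query is None):
            raise ValueError("provide exactly one of s3_uri or query")
        return "s3" if self.s3_uri is not None else "query"


def segment_outputs(metrics_shared: bool) -> List[str]:
    """List what the seed data provider can work with after the segment is created."""
    outputs = ["lookalike_segment"]
    if metrics_shared:
        # Relevance metrics exist only if the training data provider shared them.
        outputs.append("relevance_metrics")
    return outputs
```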
How AWS Clean Rooms ML works with custom models
With Clean Rooms ML, members of a collaboration can use a dockerized custom model algorithm that is stored in Amazon ECR to jointly analyze their data. To do this, the model provider must create an image and store it in Amazon ECR. Follow the steps in the Amazon Elastic Container Registry User Guide to create a private repository that will contain the custom ML model.
Any member of a collaboration can be the model provider, provided they have the correct permissions. All members of a collaboration can contribute training data, inference data, or both to the model. For the purpose of this guide, members contributing data are referred to as data providers. The member who creates the collaboration is the collaboration creator. This member can also be the model provider, one of the data providers, or both.
At the highest level, here are the steps that must be completed to perform custom ML modeling:
- The collaboration creator creates a collaboration and assigns each member the proper member abilities and payment configuration. In this step, the collaboration creator must assign the member ability to receive model outputs or to receive inference results to the appropriate member, because these abilities can't be updated after the collaboration is created. For more information, see Creating the collaboration.
- The model provider configures their containerized ML model, associates it with the collaboration, and ensures that privacy constraints are set for exported data. For more information, see Configuring a model algorithm.
- The data providers contribute their data to the collaboration and ensure their privacy needs are specified. Data providers must allow the model to access their data. For more information, see Contributing training data and Associating the configured model algorithm.
- A collaboration member creates the ML configuration, which defines where the model artifacts or inference results are exported to.
- A collaboration member creates an ML input channel that provides input to the training container or inference container. The ML input channel is a query that defines the data to be used in the context of the model algorithm.
- A collaboration member invokes model training using the ML input channel and the configured model algorithm. For more information, see Creating a trained model.
- (Optional) The model trainer invokes the model export job and the model artifacts are sent to the model results receiver. Only members with a valid ML configuration and the member ability to receive model output can receive model artifacts. For more information, see Exporting model artifacts.
- (Optional) A collaboration member invokes model inference using the ML input channel, the trained model ARN, and the inference configured model algorithm. The inference results are sent to the inference output receiver. Only members with a valid ML configuration and the member ability to receive inference output can receive inference results.
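The receiver rule in the two optional steps can be expressed as a single check: a member can receive an output only if they have a valid ML configuration and the matching member ability. The following is a minimal sketch. The `Member` type and the `can_receive` function are hypothetical, and the two ability strings are assumptions about how the abilities are named; verify them against the AWS Clean Rooms API reference.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumed ability names; check the AWS Clean Rooms API reference before relying on them.
CAN_RECEIVE_MODEL_OUTPUT = "CAN_RECEIVE_MODEL_OUTPUT"
CAN_RECEIVE_INFERENCE_OUTPUT = "CAN_RECEIVE_INFERENCE_OUTPUT"


@dataclass(frozen=True)
class Member:
    """A collaboration member as seen by this sketch (hypothetical type)."""
    account_id: str
    abilities: FrozenSet[str] = frozenset()
    has_ml_configuration: bool = False


def can_receive(member: Member, ability: str) -> bool:
    """Both conditions must hold: a valid ML configuration and the member ability."""
    return member.has_ml_configuration and ability in member.abilities
```

Because member abilities can't be changed after the collaboration is created, a check like this is most useful when you plan the collaboration, before anyone runs training.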
Here are the steps that must be completed by the model provider:
- Create a SageMaker AI compatible Docker image to store in Amazon ECR. Clean Rooms ML supports only SageMaker AI compatible Docker images.
- After you have created a SageMaker AI compatible Docker image, push the image to Amazon ECR. Follow the directions in the Amazon Elastic Container Registry User Guide to create a container training image.
- Configure the model algorithm for use in Clean Rooms ML.
- Provide the Amazon ECR repository link and any arguments necessary to configure the model algorithm.
- Provide a service access role that allows Clean Rooms ML to access the Amazon ECR repository.
- Associate the configured model algorithm with the collaboration. This includes providing a privacy policy that defines controls for container logs, failure logs, CloudWatch metrics, and limits on how much data can be exported from the container results.
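The information that the model provider supplies in these steps can be sketched as two request payloads. The key names follow our understanding of the boto3 `cleanroomsml` client (`create_configured_model_algorithm` and `create_configured_model_algorithm_association`) and should be treated as assumptions to verify against the current API reference. The builder functions themselves are hypothetical, and the sketch covers only container logs and the export size limit, not every control in the privacy policy.

```python
from typing import Any, Dict, List


def build_algorithm_request(name: str, role_arn: str, image_uri: str,
                            arguments: List[str]) -> Dict[str, Any]:
    """Describe a configured model algorithm (key names are assumptions)."""
    return {
        "name": name,
        "roleArn": role_arn,  # service access role that can read the ECR repository
        "trainingContainerConfig": {
            "imageUri": image_uri,   # Amazon ECR image link
            "arguments": arguments,  # arguments passed to the container
        },
    }


def build_privacy_configuration(log_account_ids: List[str],
                                max_export_gb: float) -> Dict[str, Any]:
    """Describe the privacy policy supplied when associating the algorithm."""
    if max_export_gb <= 0:
        raise ValueError("max_export_gb must be positive")
    return {
        "policies": {
            "trainedModels": {
                # Which accounts may see container logs.
                "containerLogs": [{"allowedAccountIds": log_account_ids}],
            },
            "trainedModelExports": {
                # Upper bound on how much data can leave the container results.
                "maxSize": {"unit": "GB", "value": max_export_gb},
                "filesToExport": ["MODEL"],
            },
        }
    }
```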
Here are the steps that must be completed by the data provider to collaborate with a custom ML model:
- Configure an existing AWS Glue table with a custom analysis rule. This allows a specific set of pre-approved queries or pre-approved accounts to use your data.
- Associate your configured table with a collaboration and provide a service access role that can access your AWS Glue tables.
- Add a collaboration analysis rule to the table that allows the configured model algorithm association to access the configured table.
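The first step, a custom analysis rule that allows either pre-approved queries or pre-approved accounts, can be sketched as a request payload. The key names follow our understanding of the AWS Clean Rooms `CreateConfiguredTableAnalysisRule` operation, including the `ANY_QUERY` value for rules that are scoped by account instead of by query. Treat these as assumptions and confirm them in the API reference. The `build_custom_analysis_rule` function is hypothetical.

```python
from typing import Any, Dict, Sequence


def build_custom_analysis_rule(table_id: str,
                               template_arns: Sequence[str] = (),
                               provider_account_ids: Sequence[str] = ()) -> Dict[str, Any]:
    """Allow specific analysis templates, specific accounts, or both."""
    if not template_arns and not provider_account_ids:
        raise ValueError("approve at least one query or one account")
    custom: Dict[str, Any] = {
        # With no pre-approved queries, fall back to any query from approved accounts.
        "allowedAnalyses": list(template_arns) or ["ANY_QUERY"],
    }
    if provider_account_ids:
        custom["allowedAnalysisProviders"] = list(provider_account_ids)
    return {
        "configuredTableIdentifier": table_id,
        "analysisRuleType": "CUSTOM",
        "analysisRulePolicy": {"v1": {"custom": custom}},
    }
```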
After the model and data are associated and configured in Clean Rooms ML, the member with the ability to run queries provides an SQL query and selects the model algorithm to use.
After model training is finished, that member initiates the export of model training artifacts or inference results. These artifacts or results are sent to the member with the ability to receive trained model output. The results receiver must configure their MachineLearningConfiguration before they can receive model output.
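As an illustration of what the results receiver provides, the following sketch builds an ML configuration payload that names an Amazon S3 output location and a role that can write to it. The key names follow our understanding of the boto3 `cleanroomsml` client's `put_ml_configuration` operation and are assumptions to verify. The `build_ml_configuration` function is hypothetical.

```python
from typing import Any, Dict


def build_ml_configuration(membership_id: str, s3_uri: str,
                           role_arn: str) -> Dict[str, Any]:
    """Describe where model output is delivered for this membership."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return {
        "membershipIdentifier": membership_id,
        "defaultOutputLocation": {
            "roleArn": role_arn,  # role that Clean Rooms ML uses to write output
            "destination": {"s3Destination": {"s3Uri": s3_uri}},
        },
    }


# Assumed usage, not executed here:
#   import boto3
#   boto3.client("cleanroomsml").put_ml_configuration(**request)
```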