AWS models in Clean Rooms ML
AWS Clean Rooms ML provides a privacy-preserving method for two parties to identify similar users in their data without sharing that data with each other. The first party brings training data to AWS Clean Rooms, creates and configures a lookalike model, and associates it with a collaboration. The second party then brings seed data to the collaboration to create a lookalike segment, which is the subset of the training data that most closely resembles the seed data.
For a more detailed explanation of how this works, see Cross-account jobs.
The following topics provide information on how to create and configure AWS models in Clean Rooms ML.
AWS Clean Rooms ML terminology
It is important to understand the following terminology when using Clean Rooms ML:
- Training data provider – The party that contributes the training data, creates and configures a lookalike model, and then associates that lookalike model with a collaboration.
- Seed data provider – The party that contributes the seed data, generates a lookalike segment, and exports their lookalike segment.
- Training data – The training data provider's data, which is used to generate a lookalike model. The training data is used to measure similarity in user behaviors.
  The training data must contain a user ID, item ID, and timestamp column. Optionally, the training data can contain other interactions as numerical or categorical features. Examples of interactions are a list of videos watched, items purchased, or articles read.
- Seed data – The seed data provider's data, which is used to create a lookalike segment. The seed data can be provided directly or it can come from the results of an AWS Clean Rooms query. The lookalike segment output is a set of users from the training data that most closely resembles the seed users.
- Lookalike model – A machine learning model of the training data that is used to find similar users in other datasets.
  When using the API, the term audience model is used equivalently to lookalike model. For example, you use the CreateAudienceModel API to create a lookalike model.
- Lookalike segment – A subset of the training data that most closely resembles the seed data.
  When using the API, you create a lookalike segment with the StartAudienceGenerationJob API.
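To make the relationship between training data, seed data, and the lookalike segment concrete, the following is a toy sketch in Python. It is not how Clean Rooms ML computes similarity, which this page does not describe. It assumes Jaccard similarity over sets of item IDs as a stand-in, and the function names `jaccard` and `toy_lookalike_segment` are hypothetical. The point it illustrates is that the segment contains users from the training data, ranked by how closely they resemble the seed users.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def toy_lookalike_segment(training, seed, size):
    """Return the `size` training user IDs most similar to any seed user.

    training: dict mapping a training user ID to a set of item IDs
    seed:     list of item-ID sets, one per seed user
    NOTE: illustrative stand-in only, not the Clean Rooms ML algorithm.
    """
    # Score each training user by their best match against the seed users.
    scores = {
        user: max((jaccard(items, s) for s in seed), default=0.0)
        for user, items in training.items()
    }
    # Highest score first; break ties by user ID so the result is deterministic.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:size]
```

In this sketch, only training user IDs ever appear in the result. The seed users are used for scoring and are never returned.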
The training data provider's data is never shared with the seed data provider, and the seed data provider's data is never shared with the training data provider. The lookalike segment output is shared with the training data provider, but never with the seed data provider.
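Before contributing training data, a provider may want to confirm that it has the required user ID, item ID, and timestamp columns. The following is a minimal local check in Python, offered as a sketch under stated assumptions. The column names `user_id`, `item_id`, and `timestamp`, the CSV input, and the ISO 8601 timestamp format are all assumptions for illustration, and the function `check_training_rows` is hypothetical. This page does not specify required column names or formats.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names: the documentation requires a user ID, item ID,
# and timestamp column but does not say what they must be called.
REQUIRED_COLUMNS = ("user_id", "item_id", "timestamp")


def check_training_rows(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing column: " + c for c in missing]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ("user_id", "item_id"):
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        try:
            # Assumes ISO 8601 timestamps, for example 2024-01-05T10:00:00.
            datetime.fromisoformat(row["timestamp"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: bad timestamp")
    return problems
```

Extra columns, such as categorical or numerical interaction features, pass through this check untouched because only the three required columns are inspected.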