Users dataset schema requirements (custom)
A Users dataset stores metadata about your users. This might include information such as age, gender, and loyalty membership for each user. For information on the types of user data you can import into Amazon Personalize, see User metadata.
The data you provide for each user must match your schema. At minimum, you must provide a User ID for each user (max length 256 characters). Depending on your schema, user metadata can include empty/null values. Your Users schema must have at least one metadata field, but if you add a null type to that field, its value can be null for a given user. You are free to add additional fields depending on your use case and your data. As long as the fields aren't listed as required or reserved, and the data types are listed in Schema data types, the field names and data types are up to you.
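As a rough illustration of a nullable metadata field, the following Python sketch builds a Users schema in which each metadata field may be null. It assumes that "adding a null type" means an Avro union such as ["null", "int"]; the helper name build_users_schema is hypothetical and is not part of any Amazon Personalize SDK.

```python
import json


def build_users_schema(nullable_fields):
    """Return a Users schema dict (hypothetical helper, not an SDK call).

    nullable_fields maps a metadata field name to its base Avro type.
    """
    fields = [{"name": "USER_ID", "type": "string"}]  # required field
    for name, base_type in nullable_fields.items():
        # Assumption: a union that includes "null" lets the value be missing.
        fields.append({"name": name, "type": ["null", base_type]})
    return {
        "type": "record",
        "name": "Users",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": fields,
        "version": "1.0",
    }


schema = build_users_schema({"AGE": "int"})
print(json.dumps(schema, indent=2))
```

The resulting JSON string is what you would supply as the schema definition when you create the schema.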
To use categorical data, add a field of type string and set the field's categorical attribute to true in your schema.
Then include the categorical data in your bulk CSV file and individual record imports.
For users with multiple categories, separate each value using the vertical bar, '|'. For example, for a SUBSCRIPTION_MODEL field, your data for a user might be student|monthly|discount.
Categorical values can have at most 1000 characters. If you have a user with a categorical value with more than 1000 characters, your dataset import job will fail.
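To catch oversized categorical values before an import job fails, you could run a check like the following minimal Python sketch. The function name check_categorical is hypothetical, and the sketch assumes the 1000-character limit applies to the whole pipe-delimited string, which is the stricter reading of the limit.

```python
MAX_CATEGORICAL_CHARS = 1000  # limit stated in this section


def check_categorical(raw):
    """Split a pipe-delimited categorical value into its categories.

    Raises ValueError if the value is longer than the limit.
    Assumption: the limit covers the whole string, separators included.
    """
    if len(raw) > MAX_CATEGORICAL_CHARS:
        raise ValueError(
            f"categorical value has {len(raw)} characters; "
            f"the maximum is {MAX_CATEGORICAL_CHARS}"
        )
    return raw.split("|") if raw else []


print(check_categorical("student|monthly|discount"))
# → ['student', 'monthly', 'discount']
```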
For more information on minimum requirements and maximum data limits for a Users dataset, see Service quotas.
Users schema example (custom)
The following example shows how to structure a Users schema. The USER_ID field is required and the AGE and GENDER fields are metadata. At least one metadata field is required and you can add at most 25 metadata fields. For information about schema requirements, see Custom dataset and schema requirements.
{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "USER_ID",
      "type": "string"
    },
    {
      "name": "AGE",
      "type": "int"
    },
    {
      "name": "GENDER",
      "type": "string",
      "categorical": true
    }
  ],
  "version": "1.0"
}
For this schema, the first few lines of historical data in a CSV file might look like the following.
USER_ID,AGE,GENDER
5,34,Male
6,56,Female
8,65,Male
...
...
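Before uploading a file like this, it can help to check the rows locally against the schema. The following Python sketch is one possible pre-import check, not an Amazon Personalize feature. The function name validate_users_csv is hypothetical, the field list mirrors the example schema above, and the sketch assumes the CSV header must contain exactly the schema's field names.

```python
import csv
import io

SCHEMA_FIELDS = ("USER_ID", "AGE", "GENDER")  # mirrors the example schema
MAX_USER_ID_CHARS = 256  # User ID limit stated in this topic


def validate_users_csv(text):
    """Return a list of (line_number, message) problems found in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # Assumption: header names must match the schema field names.
    if set(reader.fieldnames or []) != set(SCHEMA_FIELDS):
        return [(1, f"header {reader.fieldnames} does not match the schema")]
    problems = []
    for row in reader:
        line = reader.line_num
        user_id = row["USER_ID"]
        if not user_id or len(user_id) > MAX_USER_ID_CHARS:
            problems.append((line, "USER_ID is missing or too long"))
        try:
            int(row["AGE"])  # AGE is declared as int in the schema
        except (TypeError, ValueError):
            problems.append((line, f"AGE {row['AGE']!r} is not an int"))
    return problems


sample = "USER_ID,AGE,GENDER\n5,34,Male\n6,56,Female\n8,65,Male\n"
print(validate_users_csv(sample))  # → []
```

An empty list means no problems were found by these particular checks; the import job can still reject data for reasons this sketch does not cover.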