

# Importing training data into Amazon Personalize datasets
<a name="import-data"></a>

After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md), you are ready to import your training data into the dataset. When you import data, you can choose to import records in bulk, individually, or both.
+ Bulk imports involve importing a large number of historical records at once. You can prepare bulk data yourself, and import it directly into Amazon Personalize from a CSV file in Amazon S3. For information about how to prepare your data, see [Preparing training data for Amazon Personalize](preparing-training-data.md). If you need help preparing your data, you can use SageMaker AI Data Wrangler to prepare and import your bulk item interaction, user, and item data. For more information, see [Preparing and importing bulk data using Amazon SageMaker AI Data Wrangler](preparing-importing-with-data-wrangler.md).
+ If you don't have bulk data, you can use individual import operations to collect data and stream events until you meet Amazon Personalize training requirements and the data requirements of your domain use case or recipe. For information about recording events, see [Recording real-time events to influence recommendations](recording-events.md). For information about importing individual records, see [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md). 

 After you import data into an Amazon Personalize dataset, you can [analyze it](analyzing-data.md), [export it to an Amazon S3 bucket](export-data.md), [update it](updating-datasets.md), or [delete it](delete-dataset.md) by deleting the dataset.

If you import an item, user, or action with the same ID as a record that's already in your dataset, Amazon Personalize replaces it with the new record. If you record two item interaction or action interaction events with exactly the same timestamp and identical properties, Amazon Personalize keeps only one of the events.

As your catalog grows, update your historical data with additional bulk or individual data import operations. For real-time recommendations, keep your Item interactions dataset up to date with your users' behavior. You do this by recording real-time interaction *[events](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#event)* with an event tracker and the [PutEvents](API_UBS_PutEvents.md) operation. For more information, see [Recording real-time events to influence recommendations](recording-events.md).

After you import your data, you are ready to create domain recommenders (for Domain dataset groups) or custom resources (for Custom dataset groups) to train a model on your data. You use these resources to generate recommendations. For more information, see [Domain recommenders in Amazon Personalize](creating-recommenders.md) or [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).

**Topics**
+ [Importing bulk data into Amazon Personalize with a dataset import job](bulk-data-import-step.md)
+ [Preparing and importing bulk data using Amazon SageMaker AI Data Wrangler](preparing-importing-with-data-wrangler.md)
+ [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md)

# Importing bulk data into Amazon Personalize with a dataset import job
<a name="bulk-data-import-step"></a>

After you have formatted your input data (see [Preparing training data for Amazon Personalize](preparing-training-data.md)) and completed [Creating a schema and a dataset](data-prep-creating-datasets.md), you are ready to import your bulk data with a dataset import job. A *dataset import job* is a bulk import tool that populates a dataset with data from Amazon S3.

To import data from Amazon S3, your CSV files must be in an Amazon S3 bucket, and you must give Amazon Personalize permission to access your Amazon S3 resources:
+ For information about uploading files to Amazon S3, see [Uploading Files and Folders by Using Drag and Drop](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/upload-objects.html) in the Amazon Simple Storage Service User Guide.
+ For information about giving Amazon Personalize access to your files in Amazon S3, see [Giving Amazon Personalize access to Amazon S3 resources](granting-personalize-s3-access.md).

   If you use AWS Key Management Service (AWS KMS) for encryption, you must grant Amazon Personalize and your Amazon Personalize IAM service role permission to use your key. For more information, see [Giving Amazon Personalize permission to use your AWS KMS key](granting-personalize-key-access.md).

You can create a dataset import job using the Amazon Personalize console, AWS Command Line Interface (AWS CLI), or AWS SDKs. If you previously created a dataset import job for a dataset, you can use a new dataset import job to add to or replace the existing bulk data. For more information, see [Updating data in datasets after training](updating-datasets.md). 

If you import an item, user, or action with the same ID as a record that's already in your dataset, Amazon Personalize replaces it with the new record. If you record two item interaction or action interaction events with exactly the same timestamp and identical properties, Amazon Personalize keeps only one of the events.

After you import your data, you are ready to create domain recommenders (for Domain dataset groups) or custom resources (for Custom dataset groups) to train a model on your data. You use these resources to generate recommendations. For more information, see [Domain recommenders in Amazon Personalize](creating-recommenders.md) or [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).



**Topics**
+ [Import modes](#bulk-import-modes)
+ [Creating a dataset import job (console)](#bulk-data-import-console)
+ [Creating a dataset import job (AWS CLI)](#bulk-data-import-cli)
+ [Creating a dataset import job (AWS SDKs)](#python-import-ex)

## Import modes
<a name="bulk-import-modes"></a>

If you already created an import job for the dataset, you can configure how Amazon Personalize adds your new records. To do this, you specify an import mode for your dataset import job. If you haven't imported bulk records, the **Import mode** field is not available in the console and you can only specify `FULL` in the `CreateDatasetImportJob` API operation. The default is a full replacement.
+ To overwrite all existing bulk data in your dataset, choose **Replace existing data** in the Amazon Personalize console or specify `FULL` in the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) API operation. This doesn't replace data you imported individually, including events recorded in real time.
+ To append the records to the existing data in your dataset, choose **Add to existing data** or specify `INCREMENTAL` in the `CreateDatasetImportJob` API operation. Amazon Personalize replaces any record with the same ID with the new one.
**Note**  
To append data to an Item interactions dataset or Action interactions dataset with a dataset import job, you must have a minimum of 1,000 new item interaction or action interaction records.
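The rules above can be summarized in a small helper. The following is an illustrative sketch only; the function and its arguments are not part of the Amazon Personalize API, and it simply encodes the import-mode rules described in this section:

```python
# Illustrative helper: picks an import mode for a dataset import job
# based on the rules above. Not part of the Amazon Personalize API.

MIN_INCREMENTAL_INTERACTIONS = 1000  # minimum new records to append to an interactions dataset

def choose_import_mode(has_prior_bulk_import: bool, replace_existing: bool,
                       dataset_type: str = "ITEMS", new_record_count: int = 0) -> str:
    """Return 'FULL' or 'INCREMENTAL' for CreateDatasetImportJob."""
    if not has_prior_bulk_import:
        # Before the first bulk import, only FULL is accepted.
        return "FULL"
    if replace_existing:
        # Overwrites all existing bulk data (not individually imported records).
        return "FULL"
    # Appending to an interactions dataset requires at least 1,000 new records.
    if dataset_type in ("INTERACTIONS", "ACTION_INTERACTIONS") \
            and new_record_count < MIN_INCREMENTAL_INTERACTIONS:
        raise ValueError("INCREMENTAL interactions imports need at least 1000 records")
    return "INCREMENTAL"
```

For example, `choose_import_mode(True, False, "ITEMS")` would return `INCREMENTAL`, while a first-time import always resolves to `FULL`.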

## Creating a dataset import job (console)
<a name="bulk-data-import-console"></a>

**Important**  
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. If you already imported bulk data, you can append data by changing the job's [import mode](#bulk-import-modes).

 To import bulk records into a dataset with the Amazon Personalize console, create a dataset import job with a name, the IAM service role, and the location of your data.

If you just created your dataset in [Creating a schema and a dataset](data-prep-creating-datasets.md), skip to step 5.

**To import bulk records (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1.  On the **Dataset groups** page, choose your dataset group. The dataset group **Overview** displays.

1. In the navigation pane, choose **Datasets** and choose the dataset you want to import bulk data into.

1. In **Dataset import jobs**, choose **Create dataset import job**.

1. If this is your first dataset import job, for **Data import source** choose **Import data from S3**.

1. For **Dataset import job name**, specify a name for your import job.

1. If you already imported bulk data, for **Import mode**, choose how to update the dataset: either **Replace existing data** or **Add to existing data**. This option doesn't appear if it's your first import job for the dataset. For more information, see [Updating data in datasets after training](updating-datasets.md).

1. In **Data import source**, for **Data Location**, specify where your data file is stored in Amazon S3. Use the following syntax:

   **s3://amzn-s3-demo-bucket/<folder path>/<CSV filename>**

   If your CSV files are in a folder in your Amazon S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, you can specify the path to the folder. Amazon Personalize uses only the files in the first level of your folder; it doesn't use any data in subfolders. Use the following syntax with a `/` after the folder name:

   **s3://amzn-s3-demo-bucket/<folder path>/**

1. In **IAM role**, choose to either create a new role or use an existing one. If you completed the prerequisites, choose **Use an existing service role** and specify the role that you created in [Creating an IAM role for Amazon Personalize](set-up-required-permissions.md#set-up-create-role-with-permissions). 

1. If you created a metric attribution and want to publish metrics related to this job to Amazon S3, in **Publish event metrics to S3** choose **Publish metrics for this import job**. 

   If you haven't created one and want to publish metrics for this job, choose **Create metric attribution** to create a new one on a different tab. After you create the metric attribution, you can return to this screen and finish creating the import job. 

   For more information on metric attributions, see [Measuring the impact of Amazon Personalize recommendations](measuring-recommendation-impact.md).

1. For **Tags**, optionally add any tags. For more information about tagging Amazon Personalize resources, see [Tagging Amazon Personalize resources](tagging-resources.md).

1. Choose **Start import**. The data import job starts and the **Dashboard Overview** page is displayed. The dataset import is complete when the status shows as ACTIVE. After you import data into an Amazon Personalize dataset, you can [analyze it](analyzing-data.md), [export it to an Amazon S3 bucket](export-data.md), [update it](updating-datasets.md), or [delete it](delete-dataset.md) by deleting the dataset. 

   After you import your data, you are ready to create domain recommenders (for Domain dataset groups) or custom resources (for Custom dataset groups) to train a model on your data. You use these resources to generate recommendations. For more information, see [Domain recommenders in Amazon Personalize](creating-recommenders.md) or [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).
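The data location formats used in the console steps above can be built programmatically. The following sketch is illustrative (the helper name is hypothetical); only the resulting string formats come from the documentation above:

```python
def s3_data_location(bucket: str, folder: str = "", filename: str = "") -> str:
    """Build the Amazon S3 data location for a dataset import job.

    With a filename, the job imports a single CSV file. With only a
    folder, the job imports every CSV in the first level of that folder
    (subfolders are ignored), signaled by the trailing slash.
    """
    parts = [p.strip("/") for p in (bucket, folder, filename) if p]
    location = "s3://" + "/".join(parts)
    if not filename:
        location += "/"  # trailing slash signals a folder import
    return location
```

For example, `s3_data_location("amzn-s3-demo-bucket", "data", "ratings.csv")` yields a single-file path, while omitting the filename yields the folder form ending in `/`.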

## Creating a dataset import job (AWS CLI)
<a name="bulk-data-import-cli"></a>

**Important**  
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. If you already imported bulk data, you can append data by changing the job's [import mode](#bulk-import-modes).

 To import bulk records using the AWS CLI, create a dataset import job using the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) command. If you've previously created a dataset import job for a dataset, you can use the import mode parameter to specify how to add the new data. For more information about updating existing bulk data, see [Updating data in datasets after training](updating-datasets.md).

**Import bulk records (AWS CLI)**

1. Create a dataset import job by running the following command. Provide the Amazon Resource Name (ARN) for your dataset and specify the path to your Amazon S3 bucket where you stored the training data. Use the following syntax for the path:

   **s3://amzn-s3-demo-bucket/<folder path>/<CSV filename>**

   If your CSV files are in a folder in your Amazon S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, you can specify the path to the folder. Amazon Personalize uses only the files in the first level of your folder; it doesn't use any data in subfolders. Use the following syntax with a `/` after the folder name:

   **s3://amzn-s3-demo-bucket/<folder path>/**

   Provide the AWS Identity and Access Management (IAM) role Amazon Resource Name (ARN) that you created in [Creating an IAM role for Amazon Personalize](set-up-required-permissions.md#set-up-create-role-with-permissions). The default `import-mode` is `FULL`. For more information see [Updating data in datasets after training](updating-datasets.md). For more information about the operation, see [CreateDatasetImportJob](API_CreateDatasetImportJob.md).

   ```
   aws personalize create-dataset-import-job \
   --job-name dataset import job name \
   --dataset-arn dataset arn \
   --data-source dataLocation=s3://amzn-s3-demo-bucket/filename \
   --role-arn roleArn \
   --import-mode FULL
   ```

   The dataset import job ARN is displayed, as shown in the following example.

   ```
   {
     "datasetImportJobArn": "arn:aws:personalize:us-west-2:acct-id:dataset-import-job/DatasetImportJobName"
   }
   ```

1. Check the status by using the `describe-dataset-import-job` command. Provide the dataset import job ARN that was returned in the previous step. For more information about the operation, see [DescribeDatasetImportJob](API_DescribeDatasetImportJob.md).

   ```
   aws personalize describe-dataset-import-job \
   --dataset-import-job-arn dataset import job arn
   ```

   The properties of the dataset import job, including its status, are displayed. Initially, the `status` shows as CREATE PENDING.

   ```
   {
     "datasetImportJob": {
         "jobName": "Dataset Import job name",
         "datasetImportJobArn": "arn:aws:personalize:us-west-2:acct-id:dataset-import-job/DatasetImportJobArn",
         "datasetArn": "arn:aws:personalize:us-west-2:acct-id:dataset/DatasetGroupName/INTERACTIONS",
         "dataSource": {
             "dataLocation": "s3://amzn-s3-demo-bucket/ratings.csv"
         },
         "importMode": "FULL",
         "roleArn": "role-arn",
         "status": "CREATE PENDING",
         "creationDateTime": 1542392161.837,
         "lastUpdatedDateTime": 1542393013.377
     }
   }
   ```

   The dataset import is complete when the status shows as ACTIVE. After you import data into an Amazon Personalize dataset, you can [analyze it](analyzing-data.md), [export it to an Amazon S3 bucket](export-data.md), [update it](updating-datasets.md), or [delete it](delete-dataset.md) by deleting the dataset. 

   After you import your data, you are ready to create domain recommenders (for Domain dataset groups) or custom resources (for Custom dataset groups) to train a model on your data. You use these resources to generate recommendations. For more information, see [Domain recommenders in Amazon Personalize](creating-recommenders.md) or [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).

## Creating a dataset import job (AWS SDKs)
<a name="python-import-ex"></a>

**Important**  
By default, a dataset import job replaces any existing data in the dataset that you imported in bulk. If you already imported bulk data, you can append data by changing the job's [import mode](#bulk-import-modes).

To import data, create a dataset import job with the [CreateDatasetImportJob](API_CreateDatasetImportJob.md) operation. The following code shows how to create a dataset import job.

Give the job a name, set the `datasetArn` to the Amazon Resource Name (ARN) of your dataset, and set the `dataLocation` to the path to your Amazon S3 bucket where you stored the training data. Use the following syntax for the path:

**s3://amzn-s3-demo-bucket/<folder path>/<CSV filename>.csv**

If your CSV files are in a folder in your Amazon S3 bucket and you want to upload multiple CSV files to a dataset with one dataset import job, you can specify the path to the folder. Amazon Personalize uses only the files in the first level of your folder; it doesn't use any data in subfolders. Use the following syntax with a `/` after the folder name:

**s3://amzn-s3-demo-bucket/<folder path>/**

For the `roleArn`, specify the AWS Identity and Access Management (IAM) role that gives Amazon Personalize permissions to access your S3 bucket. See [Creating an IAM role for Amazon Personalize](set-up-required-permissions.md#set-up-create-role-with-permissions). The default `importMode` is `FULL`. This replaces all bulk data in the dataset. To append data, set it to `INCREMENTAL`. For more information about updating existing bulk data, see [Updating data in datasets after training](updating-datasets.md). 

------
#### [ SDK for Python (Boto3) ]

```
import boto3

personalize = boto3.client('personalize')

response = personalize.create_dataset_import_job(
    jobName = 'YourImportJob',
    datasetArn = 'dataset_arn',
    dataSource = {'dataLocation':'s3://amzn-s3-demo-bucket/filename.csv'},
    roleArn = 'role_arn',
    importMode = 'FULL'
)

dsij_arn = response['datasetImportJobArn']

print ('Dataset Import Job arn: ' + dsij_arn)

description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])
```

------
#### [ SDK for Java 2.x ]

```
public static String createPersonalizeDatasetImportJob(PersonalizeClient personalizeClient,
                                                      String jobName,
                                                      String datasetArn,
                                                      String s3BucketPath,
                                                      String roleArn,
                                                      ImportMode importMode) {

  long waitInMilliseconds = 60 * 1000;
  String status;
  String datasetImportJobArn;
  
  try {
      DataSource importDataSource = DataSource.builder()
              .dataLocation(s3BucketPath)
              .build();
      
      CreateDatasetImportJobRequest createDatasetImportJobRequest = CreateDatasetImportJobRequest.builder()
              .datasetArn(datasetArn)
              .dataSource(importDataSource)
              .jobName(jobName)
              .roleArn(roleArn)
              .importMode(importMode)
              .build();
  
      datasetImportJobArn = personalizeClient.createDatasetImportJob(createDatasetImportJobRequest)
              .datasetImportJobArn();
      
      DescribeDatasetImportJobRequest describeDatasetImportJobRequest = DescribeDatasetImportJobRequest.builder()
              .datasetImportJobArn(datasetImportJobArn)
              .build();
  
      long maxTime = Instant.now().getEpochSecond() + 3 * 60 * 60;
  
      while (Instant.now().getEpochSecond() < maxTime) {
  
          DatasetImportJob datasetImportJob = personalizeClient
                  .describeDatasetImportJob(describeDatasetImportJobRequest)
                  .datasetImportJob();
  
          status = datasetImportJob.status();
          System.out.println("Dataset import job status: " + status);
  
          if (status.equals("ACTIVE") || status.equals("CREATE FAILED")) {
              break;
          }
          try {
              Thread.sleep(waitInMilliseconds);
          } catch (InterruptedException e) {
              System.out.println(e.getMessage());
          }
      }
      return datasetImportJobArn;
  
  } catch (PersonalizeException e) {
      System.out.println(e.awsErrorDetails().errorMessage());
  }
  return "";
}
```

------
#### [ SDK for JavaScript v3 ]

```
// Get service clients and commands using ES6 syntax.
import { CreateDatasetImportJobCommand, PersonalizeClient } from
  "@aws-sdk/client-personalize";

// create personalizeClient
const personalizeClient = new PersonalizeClient({
  region: "REGION"
});

// Set the dataset import job parameters.
export const datasetImportJobParam = {
  datasetArn: 'DATASET_ARN', /* required */
  dataSource: {  
    dataLocation: 's3://amzn-s3-demo-bucket/<folderName>/<CSVfilename>.csv'  /* required */
  },
  jobName: 'NAME',           /* required */
  roleArn: 'ROLE_ARN',       /* required */
  importMode: "FULL"         /* optional, default is FULL */
};

export const run = async () => {
  try {
    const response = await personalizeClient.send(new CreateDatasetImportJobCommand(datasetImportJobParam));
    console.log("Success", response);
    return response; // For unit tests.
  } catch (err) {
    console.log("Error", err);
  }
};
run();
```

------

The response from the [DescribeDatasetImportJob](API_DescribeDatasetImportJob.md) operation includes the status of the operation.

You must wait until the status changes to ACTIVE before you can use the data to train a model.

The dataset import is complete when the status shows as ACTIVE. After you import data into an Amazon Personalize dataset, you can [analyze it](analyzing-data.md), [export it to an Amazon S3 bucket](export-data.md), [update it](updating-datasets.md), or [delete it](delete-dataset.md) by deleting the dataset. 
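As in the Java example above, you can poll `DescribeDatasetImportJob` until the status reaches a terminal state. A minimal, library-agnostic sketch follows; the helper names are illustrative, and in practice `describe_status` would wrap your SDK call, for example `lambda: personalize.describe_dataset_import_job(datasetImportJobArn=arn)['datasetImportJob']['status']` with Boto3:

```python
import time

# Terminal statuses for a dataset import job.
TERMINAL_STATUSES = ("ACTIVE", "CREATE FAILED")

def wait_for_import(describe_status, poll_seconds=60, max_polls=180):
    """Poll a zero-argument status callable until the job finishes.

    Returns the terminal status string, or raises TimeoutError if the
    job doesn't finish within max_polls * poll_seconds.
    """
    for _ in range(max_polls):
        status = describe_status()
        print("Dataset import job status:", status)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("Dataset import job did not finish in time")
```

Injecting the describe call as a callable keeps the waiting logic independent of any particular SDK client and easy to test.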

After you import your data, you are ready to create domain recommenders (for Domain dataset groups) or custom resources (for Custom dataset groups) to train a model on your data. You use these resources to generate recommendations. For more information, see [Domain recommenders in Amazon Personalize](creating-recommenders.md) or [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).

# Preparing and importing bulk data using Amazon SageMaker AI Data Wrangler
<a name="preparing-importing-with-data-wrangler"></a>

**Important**  
As you use Data Wrangler, you incur SageMaker AI costs. For a complete list of charges and prices, see the Data Wrangler tab of [Amazon SageMaker AI pricing](https://aws.amazon.com/sagemaker/pricing/). To avoid incurring additional fees, when you are finished, shut down your Data Wrangler instance. For more information, see [Shut Down Data Wrangler](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-shut-down.html). 

After you create a dataset group, you can use Amazon SageMaker AI Data Wrangler (Data Wrangler) to import data from more than 40 sources into an Amazon Personalize dataset. Data Wrangler is a feature of Amazon SageMaker AI Studio Classic that provides an end-to-end solution to import, prepare, transform, and analyze data. You can't use Data Wrangler to prepare and import data into an Actions dataset or Action interactions dataset.

 When you use Data Wrangler to prepare and import data, you use a data flow. A *data flow* defines a series of machine learning data prep steps, starting with importing data. Each time you add a step to your flow, Data Wrangler takes an action on your data, such as transforming it or generating a visualization. 

The following are some of the steps that you can add to your flow to prepare data for Amazon Personalize:
+ **Insights:** You can add Amazon Personalize specific insight steps to your flow. These insights can help you learn about your data and what actions you can take to improve it.
+ **Visualizations:** You can add visualization steps to generate graphs such as histograms and scatter plots. Graphs can help you discover issues in your data, such as outliers or missing values.
+ **Transformations:** You can use Amazon Personalize specific and general transformation steps to make sure your data meets Amazon Personalize requirements. The Amazon Personalize transformation helps you map your data columns to required columns depending on the Amazon Personalize dataset type.

If you need to leave Data Wrangler before importing data into Amazon Personalize, you can return to where you left off by choosing the same dataset type when you [launch Data Wrangler from the Amazon Personalize console](dw-launch-dw-from-personalize.md). Or you can access Data Wrangler directly through SageMaker AI Studio Classic.

We recommend that you import data from Data Wrangler into Amazon Personalize as follows. The transformation, visualization, and analysis steps are optional, repeatable, and can be completed in any order.

1. **[Set up permissions](dw-data-prep-minimum-permissions.md)** - Set up permissions for Amazon Personalize and SageMaker AI service roles, and set up permissions for your users.

1. **[Launch Data Wrangler in SageMaker AI Studio Classic from the Amazon Personalize console](dw-launch-dw-from-personalize.md)** - Use the Amazon Personalize console to configure a SageMaker AI domain and launch Data Wrangler in SageMaker AI Studio Classic.

1. **[Import your data into Data Wrangler](dw-import-data.md)** - Import data from more than 40 sources into Data Wrangler. Sources include AWS services, such as Amazon Redshift, Amazon EMR, or Amazon Athena, and third-party platforms such as Snowflake or Databricks.

1. **[Transform your data](dw-transform-data.md)** - Use Data Wrangler to transform your data to meet Amazon Personalize requirements.

1. **[Visualize and analyze your data](dw-analyze-data.md)** - Use Data Wrangler to visualize your data and analyze it through Amazon Personalize specific insights.

1. **[Process and import data into Amazon Personalize](dw-export-data.md)** - Use a SageMaker AI Studio Classic Jupyter notebook to import your processed data into Amazon Personalize.

## Additional information
<a name="dw-additional-info"></a>

The following resources provide additional information about using Amazon SageMaker AI Data Wrangler and Amazon Personalize.
+ For a tutorial that walks you through processing and transforming a sample dataset, see [Demo: Data Wrangler Titanic Dataset Walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-demo) in the *Amazon SageMaker AI Developer Guide*. This tutorial introduces the fields and functions of Data Wrangler.
+ For information on onboarding to Amazon SageMaker AI domains, see [Quick onboard to Amazon SageMaker AI Domain](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html) in the *Amazon SageMaker AI Developer Guide*.
+ For information on Amazon Personalize data requirements, see [Preparing training data for Amazon Personalize](preparing-training-data.md).

# Setting up permissions
<a name="dw-data-prep-minimum-permissions"></a>

To prepare data with Data Wrangler, you must set up the following permissions: 
+ **Create a service role for Amazon Personalize:** If you haven't already, complete the instructions in [Setting up Amazon Personalize](setup.md) to create an IAM service role for Amazon Personalize. This role must have `GetObject` and `ListBucket` permissions for the Amazon S3 buckets that store your processed data, and it must have permission to use any AWS KMS keys.

   For information about granting Amazon Personalize access to your Amazon S3 buckets, see [Giving Amazon Personalize access to Amazon S3 resources](granting-personalize-s3-access.md). For information about granting Amazon Personalize access to your AWS KMS keys, see [Giving Amazon Personalize permission to use your AWS KMS key](granting-personalize-key-access.md). 
+  **Create an administrative user with SageMaker AI permissions:** Your administrator must have full access to SageMaker AI and must be able to create a SageMaker AI domain. For more information, see [Create an Administrative User and Group](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html#gs-account-user) in the *Amazon SageMaker AI Developer Guide*. 
+ **Create a SageMaker AI execution role:** Create a SageMaker AI execution role with access to SageMaker AI resources and Amazon Personalize data import operations. The SageMaker AI execution role must have the [https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) policy attached. If you require more granular Data Wrangler permissions, see [Data Wrangler Security and Permissions](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-security.html#data-wrangler-security-iam-policy) in the *Amazon SageMaker AI Developer Guide*. For more information on SageMaker AI roles, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html). 

  To grant access to Amazon Personalize data import operations, attach the following IAM policy to the SageMaker AI execution role. This policy grants the permissions required to import data into Amazon Personalize and attach a policy to your Amazon S3 bucket, and it grants `PassRole` permission when the service is Amazon Personalize. Replace `amzn-s3-demo-bucket` with the name of the Amazon S3 bucket that you want to use as the destination for your formatted data after you prepare it with Data Wrangler.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "personalize:Create*",
                  "personalize:List*",
                  "personalize:Describe*"
              ],
              "Resource": "*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "s3:PutBucketPolicy"
              ],
              "Resource": [
                  "arn:aws:s3:::amzn-s3-demo-bucket",
                  "arn:aws:s3:::amzn-s3-demo-bucket/*"
              ]
          },
          {
              "Effect": "Allow",
              "Action": [
                  "iam:PassRole"
              ],
              "Resource": "*",
              "Condition": {
                  "StringEquals": {
                      "iam:PassedToService": "personalize.amazonaws.com"
                  }
              }
          }
      ]
  }
  ```

------

  For information on creating an IAM policy, see [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) in the *IAM User Guide*. For information on attaching an IAM policy to a role, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) in the *IAM User Guide*.

# Launching Data Wrangler from Amazon Personalize
<a name="dw-launch-dw-from-personalize"></a>

To launch Data Wrangler from Amazon Personalize, you use the Amazon Personalize console to configure a SageMaker AI domain and launch Data Wrangler. 

**To launch Data Wrangler from Amazon Personalize**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose your dataset group.

1. In **Set up datasets** choose **Create dataset** and choose the type of dataset to create. You can't use Data Wrangler to prepare an Actions dataset or Action interactions dataset.

1. Choose **Import data using Data Wrangler** and choose **Next**.

1. For **SageMaker domain**, choose to use an existing domain or create a new one. You need a SageMaker AI domain to access Data Wrangler in SageMaker AI Studio Classic. For information about domains and user profiles, see [SageMaker AI Domain](https://docs.aws.amazon.com/sagemaker/latest/dg/sm-domain.html) in the *Amazon SageMaker AI Developer Guide*.

1. To use an existing domain, choose a **SageMaker AI domain** and **User profile** to configure the domain.

1. To create a new domain:
   + Give the new domain a name.
   + Choose a **User profile name**.
   +  For **Execution role**, choose the role you created in [Setting up permissions](dw-data-prep-minimum-permissions.md). Or, if you have CreateRole permissions, create a new role using the role creation wizard. The role you use must have the `AmazonSageMakerFullAccess` policy attached. 

1. Choose **Next**. If you are creating a new domain, SageMaker AI starts creating your domain. This can take up to ten minutes.

1. Review the details for your SageMaker AI domain.

1. Choose **Import data with Data Wrangler**. SageMaker AI Studio Classic starts creating your environment, and when complete, the **Data flow** page of Data Wrangler in SageMaker AI Studio Classic opens in a new tab. It can take up to five minutes for SageMaker AI Studio Classic to finish creating your environment. When it finishes, you are ready to start importing data into Data Wrangler. For more information, see [Importing data into Data Wrangler](dw-import-data.md).

# Importing data into Data Wrangler
<a name="dw-import-data"></a>

 After you configure a SageMaker AI domain and launch Data Wrangler in a new tab, you are ready to import data from your source into Data Wrangler. When you use Data Wrangler to prepare data for Amazon Personalize, you import one dataset at a time. We recommend starting with an Item interactions dataset. You can't use Data Wrangler to prepare an Actions dataset or Action interactions dataset.

 You start on the **Data flow** page. The page should look similar to the following. 

![\[Depicts the Data flow page of Data Wrangler with Import data and Use sample dataset options.\]](http://docs.aws.amazon.com/personalize/latest/dg/images/dw-data-sources.png)


To start importing data, you choose **Import data** and specify your data source. Data Wrangler supports 40+ sources. These include AWS services, such as Amazon Redshift, Amazon EMR, or Amazon Athena, and third parties, such as Snowflake or Databricks. Different data sources have different procedures for connecting and importing data. 

For a complete list of available sources and step-by-step instructions on importing data, see [Import](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-import.html) in the *Amazon SageMaker AI Developer Guide*. 

After you import data into Data Wrangler, you are ready to transform it. For information about transforming data, see [Transforming data](dw-transform-data.md).

# Transforming data
<a name="dw-transform-data"></a>

 To transform data in Data Wrangler, you add a **Transform** step to your data flow. Data Wrangler includes over 300 transforms that you can use to prepare your data, including a **Map columns for Amazon Personalize** transform. And you can use the general Data Wrangler transforms to fix issues such as outliers, type issues, and missing values. 

After you finish transforming your data, you can analyze it with Data Wrangler. Or, if you are finished preparing your data in Data Wrangler, you can process it and import it into Amazon Personalize. For information about analyzing data, see [Generating visualizations and data insights](dw-analyze-data.md). For information about processing and importing data, see [Processing data and importing it into Amazon Personalize](dw-export-data.md).

**Topics**
+ [Mapping columns for Amazon Personalize](#dw-personalize-transform)
+ [General Data Wrangler transforms](#dw-general-transform)

## Mapping columns for Amazon Personalize
<a name="dw-personalize-transform"></a>

 To transform your data so it meets Amazon Personalize requirements, you add the **Map columns for Amazon Personalize** transform and map your columns to the required and optional fields for Amazon Personalize.

**To use the Map columns for Amazon Personalize transform**

1.  Choose the **+** for your latest transform and choose **Add transform**. If you haven't added a transform, choose the **+** for the **Data types** transform. Data Wrangler adds this transform automatically to your flow. 

1.  Choose **Add step**. 

1.  Choose **Transforms for Amazon Personalize**. The **Map columns for Amazon Personalize** transform is selected by default. 

1. Use the transform fields to map your data to required Amazon Personalize attributes.

   1. Choose the dataset type that matches your data (Interactions, Items, or Users). 

   1. Choose your domain (ECOMMERCE, VIDEO\_ON\_DEMAND, or custom). The domain you choose must match the domain you specified when you created your dataset group.

   1. Choose the columns that match the required and optional fields for Amazon Personalize. For example, for the item\_ID column, choose the column in your data that stores the unique identification information for each of your items. 

      Each column field is filtered by data type. Only the columns in your data that meet Amazon Personalize data type requirements are available. If your data is not of the required type, you can use the [Parse Value as Type](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cast-type) Data Wrangler transform to convert it.
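If you prefer to do such a conversion in code, a custom pandas transform can produce the Unix epoch-second values that Amazon Personalize `TIMESTAMP` fields require. The following is a minimal sketch; the sample data is illustrative, and in Data Wrangler's custom transform editor the current dataset is already provided as `df`, so you would omit the sample DataFrame:

```
import pandas as pd

# Illustrative stand-in for the dataset; in a Data Wrangler custom (pandas)
# transform, the current dataset is already provided as `df`.
df = pd.DataFrame({"TIMESTAMP": ["2023-01-01 00:00:00", "2023-01-02 12:30:00"]})

# Amazon Personalize expects TIMESTAMP values as Unix epoch time in seconds.
df["TIMESTAMP"] = (
    pd.to_datetime(df["TIMESTAMP"]) - pd.Timestamp("1970-01-01")
) // pd.Timedelta("1s")
```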

## General Data Wrangler transforms
<a name="dw-general-transform"></a>

 The following general Data Wrangler transforms can help you prepare data for Amazon Personalize: 
+ Data type conversion: If your field is not listed as a possible option in the **Map columns for Amazon Personalize** transform, you might need to convert its data type. The Data Wrangler transform [Parse Value as Type](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cast-type) can help you convert your data. Or you can use the **Data types** transform that Data Wrangler adds by default when you create a flow. To use this transform, you choose the data type from the **Type** drop-down lists, choose **Preview** and then choose **Update**.

   For information on required data types for fields, see the section for your domain and dataset type in [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). 
+ Handling missing values and outliers: If you generate missing value or outlier insights, you can use the Data Wrangler transforms [Handle Outliers](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-handle-outlier) and [Handle Missing Values](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-handle-missing) to resolve these issues. 
+  Custom transformations: With Data Wrangler, you can create your own transformations with Python (User-Defined Function), PySpark, pandas, or PySpark (SQL). You might use a custom transform to perform tasks such as dropping duplicate columns or grouping by columns. For more information, see [Custom Transforms](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-custom) in the *Amazon SageMaker AI Developer Guide*. 
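As a sketch of the custom transformation option, the following pandas snippet drops duplicate columns, one of the tasks mentioned above. The sample data is illustrative; in Data Wrangler's custom (pandas) transform editor, the current dataset is already provided as `df`:

```
import pandas as pd

# Illustrative sample; in a Data Wrangler custom (pandas) transform, the
# current dataset is already provided as `df`. Here ITEM_ID_2 duplicates
# the values of ITEM_ID, as can happen after a join.
df = pd.DataFrame({
    "USER_ID": ["1", "2"],
    "ITEM_ID": ["a", "b"],
    "ITEM_ID_2": ["a", "b"],
})

# Keep only the first of any set of columns with identical values.
df = df.loc[:, ~df.T.duplicated()]
```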

# Generating visualizations and data insights
<a name="dw-analyze-data"></a>

After you import your data into Data Wrangler, you can use it to generate visualizations and data insights. 
+  **[Visualizations](#dw-visualizing-data)**: Data Wrangler can generate different types of graphs, such as histograms and scatter plots. For example, you can generate a histogram to identify outliers in your data. 
+ **[Data insights](#dw-generating-insights)**: You can use a *Data Quality and Insights Report for Amazon Personalize* to learn about your data through data insights and column and row statistics. This report can let you know if you have any type issues in your data. And you can learn what actions you can take to improve your data. These actions can help you meet Amazon Personalize resource requirements, such as model training requirements, or they can lead to improved recommendations.

 After you learn about your data through visualizations and insights, you can use this information to help you apply additional transforms to improve your data. Or, if you are finished preparing your data, you can process it and import it into Amazon Personalize. For information about transforming your data, see [Transforming data](dw-transform-data.md). For information about processing and importing data, see [Processing data and importing it into Amazon Personalize](dw-export-data.md). 

## Generating visualizations
<a name="dw-visualizing-data"></a>

You can use Data Wrangler to create different types of graphs, such as histograms and scatter plots. For example, you can generate a histogram to identify outliers in your data. To generate a data visualization, you add an **Analysis** step to your flow and, from **Analysis type**, choose the visualization you want to create. 

 For more information about creating visualizations in Data Wrangler, see [Analyze and Visualize](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-analyses.html) in the *Amazon SageMaker AI Developer Guide*. 

## Generating data insights
<a name="dw-generating-insights"></a>

 You can use Data Wrangler to generate a **Data Quality and Insights Report for Amazon Personalize** report specific to your dataset type. Before generating the report, we recommend that you transform your data to meet Amazon Personalize requirements. This will lead to more relevant insights. For more information, see [Transforming data](dw-transform-data.md). 

**Topics**
+ [Report content](#dw-report-content)
+ [Generating the report](#dw-generating-insight-report)

### Report content
<a name="dw-report-content"></a>

The **Data Quality and Insights Report for Amazon Personalize** includes the following sections: 
+ **Summary:** The report summary includes dataset statistics and high priority warnings:
  + **Dataset statistics:** These include Amazon Personalize specific statistics, such as the number of unique users in your interactions data, and general statistics, such as the number of missing values or outliers.
  +  **High priority warnings:** These are Amazon Personalize specific insights that have the most impact on training or recommendations. Each warning includes a recommended action that you can take to resolve the issue. 
+  **Duplicate rows and Incomplete rows:** These sections include information on which rows have missing values and which rows are duplicated in your data. 
+  **Feature summary:** This section includes the data type for each column, invalid or missing data information, and warning counts. 
+  **Feature details:** This section includes subsections with detailed information for each of your columns of data. Each subsection includes statistics for the column, such as categorical value count, and missing value information. And each subsection includes Amazon Personalize specific insights and recommended actions for columns of data. For example, an insight might indicate that a column has more than 30 possible categories. 

#### Data type issues
<a name="dw-report-type-issues"></a>

 The report identifies columns that are not of the correct data type and specifies the required type. To get insights related to these features, you must convert the data type of the column and generate the report again. To convert the type, you can use the Data Wrangler transform [Parse Value as Type](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cast-type). 

#### Amazon Personalize insights
<a name="dw-report-insights"></a>

The Amazon Personalize insights include a finding and a suggested action. The action is optional. For example, the report might include an insight and action related to the number of categories for a column of categorical data. If you don't believe the column is categorical, you can disregard this insight and take no action.

 Except for minor wording differences, the Amazon Personalize specific insights are the same as the *single dataset* insights you might generate when you analyze your data with Amazon Personalize. For example, the insights report in Data Wrangler includes insights such as "The Item interactions dataset has only X unique users with two or more interactions." But it doesn't include insights like "X% of items in the *Items dataset* have no interactions in the *Item interactions dataset*."

 For a list of possible Amazon Personalize specific insights, see the insights that don't reference multiple datasets in [Data insights](analyzing-data.md#data-insights).

#### Report examples
<a name="dw-insight-report-examples"></a>

The look and feel of the Amazon Personalize report is the same as the general insights report in Data Wrangler. For examples of the general insights report, see [Get Insights On Data and Data Quality](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-data-insights.html) in the *Amazon SageMaker AI Developer Guide*. The following example shows how the summary section of a report for an Item interactions dataset might appear. It includes dataset statistics and some possible high priority Item interactions dataset warnings.

![\[Depicts the summary section of a report for an Item interactions dataset.\]](http://docs.aws.amazon.com/personalize/latest/dg/images/dw-reports-summary.png)


 The following example shows how the feature details section for an EVENT\_TYPE column of an Item interactions dataset might appear in a report. 

![\[Depicts the feature details section for an EVENT_TYPE column of an Item interactions dataset.\]](http://docs.aws.amazon.com/personalize/latest/dg/images/dw-event-type-report.png)


### Generating the report
<a name="dw-generating-insight-report"></a>

To generate the **Data Quality and Insights Report for Amazon Personalize**, you choose **Get data insights** for your transform and create an analysis.

**To generate the Data Quality and Insights Report for Amazon Personalize**

1. Choose the **+** option for the transform you are analyzing. If you haven't added a transform, choose the **+** for the **Data types** transform. Data Wrangler adds this transform automatically to your flow. 

1. Choose **Get data insights**. The **Create analysis** panel displays.

1. For **Analysis type**, choose **Data Quality and Insights Report for Amazon Personalize**. 

1.  For **Dataset type**, choose the type of Amazon Personalize dataset you are analyzing. 

1. Optionally choose **Run on full data**. By default, Data Wrangler generates insights on only a sample of your data. 

1. Choose **Create**. When analysis completes, the report appears. 

# Processing data and importing it into Amazon Personalize
<a name="dw-export-data"></a>

 When you are finished analyzing and transforming your data, you are ready to process it and import it into Amazon Personalize. 
+  **[Processing data](#dw-process-data)** – Processing the data applies your transform to your entire dataset and outputs it to a destination you specify. In this case you specify an Amazon S3 bucket. 
+ **[Importing data into Amazon Personalize](#dw-import-into-personalize)** – To import processed data into Amazon Personalize, you run a Jupyter Notebook provided in SageMaker AI Studio Classic. This notebook creates your Amazon Personalize datasets and imports your data into them. 

## Processing data
<a name="dw-process-data"></a>

 Before you import data into Amazon Personalize, you must apply your transform to your entire dataset and output it to an Amazon S3 bucket. To do this, you create a destination node with the destination set to an Amazon S3 bucket, and then launch a processing job for the transformation.

For step-by-step instructions on specifying a destination and launching a processing job, see [Launch processing jobs with a few clicks using Amazon SageMaker AI Data Wrangler](https://aws.amazon.com/blogs/machine-learning/launch-processing-jobs-with-a-few-clicks-using-amazon-sagemaker-data-wrangler/). When you add a destination, choose **Amazon S3**. You will use this location when you import the processed data into Amazon Personalize.

When you finish processing your data, you are ready to import it from the Amazon S3 bucket into Amazon Personalize.

## Importing data into Amazon Personalize
<a name="dw-import-into-personalize"></a>

After you process your data, you are ready to import it into Amazon Personalize. To import processed data into Amazon Personalize, you run a Jupyter Notebook provided in SageMaker AI Studio Classic. This notebook creates your Amazon Personalize datasets and imports your data into them.

**To import processed data into Amazon Personalize**

1. For the transformation you want to export, choose **Export to** and choose **Amazon Personalize (via Jupyter Notebook)**.

1. Modify the notebook to specify the Amazon S3 bucket you used as the data destination for the processing job. Optionally specify the domain for your dataset group. By default, the notebook creates a custom dataset group.

1. Review the notebook cells that create the schema. Verify that the schema fields have the expected types and attributes before running the cell. 
   +  Verify that fields that support null data have `null` listed in the list of types. The following example shows how to add `null` for a field. 

     ```
     {
       "name": "GENDER",
       "type": [
         "null",
         "string"
       ],
       "categorical": true
     }
     ```
   +  Verify that categorical fields have the categorical attribute set to true. The following example shows how to mark a field categorical. 

     ```
     {
               "name": "SUBSCRIPTION_MODEL",
               "type": "string",
               "categorical": true
     }
     ```
   + Verify that textual fields have the textual attribute set to true. The following example shows how to mark a field as textual.

     ```
     {
           "name": "DESCRIPTION",
           "type": [
             "null",
             "string"
           ],
           "textual": true
     }
     ```

1. Run the notebook to create a schema, create a dataset, and import your data into the Amazon Personalize dataset. You run the notebook just as you would a notebook outside of SageMaker AI Studio Classic. For information on running Jupyter notebooks, see [Running Code](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Running%20Code.html). For information on notebooks in SageMaker AI Studio Classic, see [Use Amazon SageMaker AI Notebooks](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html) in the *Amazon SageMaker AI Developer Guide*.

    After you complete the notebook, if you imported interactions data, you are ready to create recommenders or custom resources. Or you can repeat the process with an items dataset or users dataset.
   + For information about creating domain recommenders, see [Domain recommenders in Amazon Personalize](creating-recommenders.md). 
   + For information about creating and deploying custom resources, see [Custom resources for training and deploying Amazon Personalize models](create-custom-resources.md).
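For reference, the field snippets from the schema verification step fit into a complete schema such as the following Users schema sketch. The `GENDER` and `SUBSCRIPTION_MODEL` fields are illustrative; the schema cells in your notebook will reflect your own columns:

```
{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "GENDER", "type": [ "null", "string" ], "categorical": true },
    { "name": "SUBSCRIPTION_MODEL", "type": "string", "categorical": true }
  ],
  "version": "1.0"
}
```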

# Importing individual records into an Amazon Personalize dataset
<a name="incremental-data-updates"></a>

 After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md), you can import individual records, including item interactions, users, items, actions, or action interactions, into an existing dataset. Importing data individually allows you to add small batches of records to your Amazon Personalize datasets as your catalog grows. You can import up to 10 records per individual import operation.

If you import an item, user, or action with the same ID as a record that's already in your dataset, Amazon Personalize replaces it with the new record. If you record two item interaction or action interaction events with exactly the same timestamp and identical properties, Amazon Personalize keeps only one of the events.

If you use Apache Kafka, you can use the *Kafka connector for Amazon Personalize* to stream data in real time to Amazon Personalize. For information see [Kafka Connector for Amazon Personalize ](https://github.com/aws/personalize-kafka-connector/blob/main/README.md) in the *personalize-kafka-connector* Github repository.

 If you have a large number of historical records, we recommend that you first import data in bulk and then import data individually as necessary. See [Importing bulk data into Amazon Personalize with a dataset import job](bulk-data-import-step.md). 

**Filter updates for individual record imports**

Amazon Personalize updates any filters you created in the dataset group with your new interaction, item, and user data within 20 minutes from the last individual import. This update allows your campaigns to use your most recent data when filtering recommendations for your users. 

If you already created a recommender or deployed a custom solution version with a campaign, how new individual records influence recommendations depends on the domain use case or recipe that you use. For more information, see [Updating data in datasets after training](updating-datasets.md).

**Topics**
+ [Importing interactions individually](importing-interactions.md)
+ [Importing users individually](importing-users.md)
+ [Importing items individually](importing-items.md)
+ [Importing actions individually](importing-actions.md)

# Importing interactions individually
<a name="importing-interactions"></a>

 After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md) to create an Item interactions dataset, you can individually import one or more new events into the dataset. To import interaction *[events](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#event)* individually, you create an *[event tracker](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#event-tracker)* and then import one or more events into your Item interactions dataset. You can import historical individual interaction events using the Amazon Personalize console, or import historical or real-time events using the AWS Command Line Interface (AWS CLI) or the AWS SDKs.

This section includes information about importing events with the Amazon Personalize console. We recommend using the Amazon Personalize console to import *only* historical events. For information about using the AWS CLI or the AWS SDKs to record events in real time, see [Recording real-time events to influence recommendations](recording-events.md). 

For information about how Amazon Personalize updates filters for new records and how new records influence recommendations, see [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md). 

**Topics**
+ [Creating an event tracker (console)](#event-tracker-console)
+ [Importing events individually (console)](#importing-interactions-console)

## Creating an event tracker (console)
<a name="event-tracker-console"></a>

**Note**  
 If you've created an event tracker, you can skip to [Importing events individually (console)](#importing-interactions-console). 

Before you can import an event to an Interactions dataset, you must create an *[event tracker](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#event-tracker)* for the dataset group. 

**To create an event tracker (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1.  On the **Dataset groups** page, choose the dataset group with the Item interactions dataset that you want to import events to.

1. On the **Dashboard** for the dataset group, in **Install event ingestion SDK**, choose **Start**. 

1. On the **Configure tracker** page, in **Tracker configurations**, for **Tracker name**, provide a name for the event tracker, and choose **Next**.

1. The **Install the SDK** page shows the **Tracking ID** for the new event tracker and instructions for using AWS Amplify or AWS Lambda to stream event data.

   You can ignore this information because you're using the Amazon Personalize console to upload event data. If you want to stream event data using AWS Amplify or AWS Lambda in the future, you can view this information by choosing the event tracker on the **Event trackers** page. 

1. Choose **Finish**. You can now import events with the console (see [Importing events individually (console)](#importing-interactions-console)) or record events in real time using the `PutEvents` operation (see [Recording real-time events to influence recommendations](recording-events.md)). 
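For context on the `PutEvents` path, the following sketch shows the shape of a single event payload. The event type, IDs, and tracking ID in the comments are illustrative placeholders; see [Recording real-time events to influence recommendations](recording-events.md) for the full requirements:

```
import time

def build_watch_event(item_id, event_type="Watch"):
    """Build one event dict in the shape the PutEvents eventList expects."""
    return {
        "eventType": event_type,     # must match an event type in your interactions data
        "itemId": item_id,
        "sentAt": int(time.time()),  # when the interaction occurred (Unix epoch seconds)
    }

# Sending the event requires boto3, the tracking ID from your event tracker,
# and the user and session the event belongs to. For example:
#
#   import boto3
#   personalize_events = boto3.client("personalize-events")
#   personalize_events.put_events(
#       trackingId="your-tracking-id",   # shown on the Event trackers page
#       userId="user-456",
#       sessionId="session-789",
#       eventList=[build_watch_event("item-123")],
#   )
```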

## Importing events individually (console)
<a name="importing-interactions-console"></a>

 After you create an event tracker, you can import events individually into an Item interactions dataset. This procedure assumes you have already created an Item interactions dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

**To import events individually (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose the dataset group with the Item interactions dataset that you want to import events to. 

1. In the navigation pane, choose **Datasets**. 

1. On the **Datasets** page, choose the Interactions dataset. 

1. At the top right of the dataset details page, choose **Modify dataset**, and choose **Create record**. 

1. On the **Create user-item interaction record(s)** page, for **Record input**, enter the event details in JSON format. The event's field names and values must match the schema that you used when you created the Item interactions dataset. Amazon Personalize provides a JSON template with field names and data types from this schema. You can import up to 10 events at a time.

1. Choose **Create record(s)**. In **Response**, the result of the import is listed and a success or failure message is displayed. 
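For example, for an Item interactions dataset whose schema has `USER_ID`, `ITEM_ID`, `TIMESTAMP`, and an optional `EVENT_TYPE` field, the record input might look similar to the following (the IDs and values are illustrative):

```
[
  {
    "USER_ID": "user-1",
    "ITEM_ID": "item-123",
    "EVENT_TYPE": "Watch",
    "TIMESTAMP": 1672531200
  }
]
```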

# Importing users individually
<a name="importing-users"></a>

 After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md) to create a Users dataset, you can individually import one or more new users into the dataset. Individually importing users allows you to keep your Users dataset current with small batch imports as your catalog grows. You can import up to 10 users at a time. If you have a large number of new users, we recommend that you first import data in bulk and then import user data individually as necessary. See [Importing bulk data into Amazon Personalize with a dataset import job](bulk-data-import-step.md). 

You can use the Amazon Personalize console, the AWS Command Line Interface (AWS CLI), or AWS SDKs to import users. If you import a user with the same `userId` as a user that's already in your Users dataset, Amazon Personalize replaces the user with the new one.

For information about how Amazon Personalize updates filters for new records and how new records influence recommendations, see [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md). 

**Topics**
+ [Importing users individually (console)](#importing-users-console)
+ [Importing users individually (AWS CLI)](#importing-users-cli)
+ [Importing users individually (AWS SDKs)](#importing-users-sdk)

## Importing users individually (console)
<a name="importing-users-console"></a>

You can import up to 10 users at a time. This procedure assumes you have already created a Users dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

**To import users individually (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose the dataset group with the Users dataset that you want to import the user to. 

1. In the navigation pane, choose **Datasets**. 

1. On the **Datasets** page, choose the Users dataset. 

1. On the dataset details page, at the top right, choose **Modify dataset** and choose **Create record**. 

1. On the **Create user record(s)** page, for record input, enter the user details in JSON format. The user's field names and values must match the schema you used when you created the Users dataset. Amazon Personalize provides a JSON template with field names and data types from this schema. 

1. Choose **Create record(s)**. In **Response**, the result of the import is listed and a success or failure message is displayed.

## Importing users individually (AWS CLI)
<a name="importing-users-cli"></a>

Add one or more users to your Users dataset with the [PutUsers](API_UBS_PutUsers.md) operation. You can import up to 10 users with a single `PutUsers` call. This section assumes that you have already created a Users dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

Use the following `put-users` command to add one or more users with the AWS CLI. Replace `dataset arn` with the Amazon Resource Name (ARN) of your dataset and `user Id` with the ID of the user. If a user with the same `userId` is already in your Users dataset, Amazon Personalize replaces it with the new one.

For `properties`, for each field in your Users dataset, replace the `propertyName` with the field name from your schema in camel case. For example, GENDER would be `gender` and MEMBERSHIP\_TYPE would be `membershipType`. Replace `user data` with the data for the user. For categorical string data, to include multiple categories for a single property, separate each category with a pipe (`|`). For example `\"Premium Class|Legacy Member\"`.

```
aws personalize-events put-users \
  --dataset-arn dataset arn \
  --users '[{
      "userId": "user Id", 
      "properties": "{\"propertyName\": \"user data\"}" 
    }, 
    {
      "userId": "user Id", 
      "properties": "{\"propertyName\": \"user data\"}" 
    }]'
```

## Importing users individually (AWS SDKs)
<a name="importing-users-sdk"></a>

Add one or more users to your Users dataset with the [PutUsers](API_UBS_PutUsers.md) operation. If a user with the same `userId` is already in your Users dataset, Amazon Personalize replaces it with the new one. You can import up to 10 users with a single `PutUsers` call. This section assumes that you have already created a Users dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

 The following code shows how to add one or more users to your Users dataset. For each property name parameter, pass the field name from your schema in camel case. For example, GENDER would be `gender` and MEMBERSHIP\_TYPE would be `membershipType`. For each property value parameter, pass the data for the user. 

For categorical string data, to include multiple categories for a single property, separate each category with a pipe (`|`). For example `"Premium Class|Legacy Member"`.

------
#### [ SDK for Python (Boto3) ]

```
import boto3

personalize_events = boto3.client(service_name='personalize-events')

personalize_events.put_users(
    datasetArn = 'dataset arn',
    users = [{
      'userId': 'user ID',
      'properties': "{\"propertyName\": \"user data\"}"   
      },
      {
      'userId': 'user ID',
      'properties': "{\"propertyName\": \"user data\"}"   
      }]
)
```

------
#### [ SDK for Java 2.x ]

```
public static int putUsers(PersonalizeEventsClient personalizeEventsClient,
                         String datasetArn,
                         String user1Id,
                         String user1PropertyName,
                         String user1PropertyValue,
                         String user2Id,
                         String user2PropertyName,
                         String user2PropertyValue) {

    int responseCode = 0;
    ArrayList<User> users = new ArrayList<>();

    try {
        User user1 = User.builder()
          .userId(user1Id)
          .properties(String.format("{\"%1$s\": \"%2$s\"}", user1PropertyName, user1PropertyValue))
          .build();

        users.add(user1);

        User user2 = User.builder()
          .userId(user2Id)
          .properties(String.format("{\"%1$s\": \"%2$s\"}", user2PropertyName, user2PropertyValue))
          .build();

        users.add(user2);

        PutUsersRequest putUsersRequest = PutUsersRequest.builder()
          .datasetArn(datasetArn)
          .users(users)
          .build();

        responseCode = personalizeEventsClient.putUsers(putUsersRequest).sdkHttpResponse().statusCode();
        System.out.println("Response code: " + responseCode);
        return responseCode;

    } catch (PersonalizeEventsException e) {
        System.out.println(e.awsErrorDetails().errorMessage());
    }
    return responseCode;
}
```

------
#### [ SDK for JavaScript v3 ]

```
import {
  PutUsersCommand,
  PersonalizeEventsClient,
} from "@aws-sdk/client-personalize-events";

const personalizeEventsClient = new PersonalizeEventsClient({
  region: "REGION",
});

// set the put users parameters
var putUsersParam = {
  datasetArn:
    "DATASET ARN",
  users: [
    {
      userId: "userId",
      properties: '{"column1Name": "value", "column2Name": "value"}',
    },
    {
      userId: "userId",
      properties: '{"column1Name": "value", "column2Name": "value"}',
    },
  ],
};
export const run = async () => {
  try {
    const response = await personalizeEventsClient.send(
      new PutUsersCommand(putUsersParam)
    );
    console.log("Success!", response);
    return response; // For unit tests.
  } catch (err) {
    console.log("Error", err);
  }
};
run();
```

------

# Importing items individually
<a name="importing-items"></a>

After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md) to create an Items dataset, you can individually import one or more new items into the dataset. Individually importing items allows you to keep your Items dataset current with small batch imports as your catalog grows. You can import up to 10 items at a time. If you have a large number of new items, we recommend that you first import data in bulk and then import item data individually as necessary. See [Importing bulk data into Amazon Personalize with a dataset import job](bulk-data-import-step.md).

You can use the Amazon Personalize console, the AWS Command Line Interface (AWS CLI), or AWS SDKs to import items. If you import an item with the same `itemId` as an item that's already in your Items dataset, Amazon Personalize replaces it with the new item.

 For information about how Amazon Personalize updates filters for new records and how new records influence recommendations, see [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md). 
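Because each `PutItems` call accepts at most 10 items, a client with more than 10 new items must split them into batches. The following is a minimal sketch of that batching loop, assuming a boto3 `personalize-events` client; the commented-out line shows where the `put_items` call would go:

```python
def batches(records, size=10):
    """Yield successive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Hypothetical new items; properties is a JSON-encoded string.
items = [{"itemId": str(i), "properties": '{"genres": "Horror"}'} for i in range(25)]

for batch in batches(items):
    # personalize_events.put_items(datasetArn='dataset arn', items=batch)
    print(len(batch))  # 10, 10, 5
```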

**Topics**
+ [Importing items individually (console)](#importing-items-console)
+ [Importing items individually (AWS CLI)](#importing-items-cli)
+ [Importing items individually (AWS SDKs)](#importing-items-cli-sdk)

## Importing items individually (console)
<a name="importing-items-console"></a>

You can import up to 10 items to an Items dataset at a time. This procedure assumes that you have already created an Items dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

**To import items individually (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose the dataset group with the Items dataset that you want to import the items to. 

1. In the navigation pane, choose **Datasets**. 

1. On the **Datasets** page, choose the Items dataset. 

1. At the top right of the dataset details page, choose **Modify dataset**, and then choose **Create record**. 

1. On the **Create item record(s)** page, for **Record input**, enter the item details in JSON format. The item's field names and values must match the schema you used when you created the Items dataset. Amazon Personalize provides a JSON template with field names and data types from this schema.

1. Choose **Create record(s)**. In **Response**, the result of the import is listed and a success or failure message is displayed.

## Importing items individually (AWS CLI)
<a name="importing-items-cli"></a>

Add one or more items to your Items dataset using the [PutItems](API_UBS_PutItems.md) operation. You can import up to 10 items with a single `PutItems` call. This section assumes that you have already created an Items dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

Use the following `put-items` command to add one or more items with the AWS CLI. Replace `dataset arn` with the Amazon Resource Name (ARN) of your dataset and `item Id` with the ID of the item. If an item with the same `itemId` is already in your Items dataset, Amazon Personalize replaces it with the new one.

For `properties`, for each field in your Items dataset, replace `propertyName` with the field name from your schema in camel case. For example, GENRES would be `genres` and CREATION_TIMESTAMP would be `creationTimestamp`. Replace `item data` with the data for the item. `CREATION_TIMESTAMP` data must be in [Unix epoch time format](interactions-datasets.md#timestamp-data) and in seconds. For categorical string data, to include multiple categories for a single property, separate each category with a pipe (`|`). For example `\"Horror|Action\"`.

```
aws personalize-events put-items \
  --dataset-arn dataset arn \
  --items '[{
      "itemId": "item Id", 
      "properties": "{\"propertyName\": \"item data\"}" 
    }, 
    {
      "itemId": "item Id", 
      "properties": "{\"propertyName\": \"item data\"}" 
    }]'
```
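`CREATION_TIMESTAMP` values must be Unix epoch time in seconds, not milliseconds. One way to produce such a value in Python (the date here is only an example):

```python
from datetime import datetime, timezone

# Convert a datetime to Unix epoch seconds for CREATION_TIMESTAMP.
created = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
creation_timestamp = int(created.timestamp())
print(creation_timestamp)  # 1705320000
```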

## Importing items individually (AWS SDKs)
<a name="importing-items-cli-sdk"></a>

Add one or more items to your Items dataset using the [PutItems](API_UBS_PutItems.md) operation. You can import up to 10 items with a single `PutItems` call. If an item with the same `itemId` is already in your Items dataset, Amazon Personalize replaces it with the new one. This section assumes that you have already created an Items dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

 The following code shows how to add one or more items to your Items dataset. For each property name parameter, pass the field name from your schema in camel case. For example, GENRES would be `genres` and CREATION_TIMESTAMP would be `creationTimestamp`. For each property value parameter, pass the data for the item. `CREATION_TIMESTAMP` data must be in [Unix epoch time format](interactions-datasets.md#timestamp-data) and in seconds. 

For categorical string data, to include multiple categories for a single property, separate each category with a pipe (`|`). For example `"Horror|Action"`.
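Because `properties` is a JSON-encoded string, it is usually safer to build it with `json.dumps` than to hand-escape quotes. A short sketch using the GENRES field from the example above:

```python
import json

# Join multiple categories for one categorical field with a pipe,
# then serialize the whole properties map to a JSON string.
genres = ["Horror", "Action"]
properties = json.dumps({"genres": "|".join(genres)})
print(properties)  # {"genres": "Horror|Action"}
```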

------
#### [ SDK for Python (Boto3) ]

```
import boto3

personalize_events = boto3.client(service_name='personalize-events')

personalize_events.put_items(
    datasetArn = 'dataset arn',
    items = [{
      'itemId': 'item ID',
      'properties': "{\"propertyName\": \"item data\"}"   
      },
      {
      'itemId': 'item ID',
      'properties': "{\"propertyName\": \"item data\"}"   
      }]
)
```

------
#### [ SDK for Java 2.x ]

```
public static int putItems(PersonalizeEventsClient personalizeEventsClient,
                           String datasetArn,
                           String item1Id,
                           String item1PropertyName,
                           String item1PropertyValue,
                           String item2Id,
                           String item2PropertyName,
                           String item2PropertyValue) {

    int responseCode = 0;
    ArrayList<Item> items = new ArrayList<>();

    try {
        Item item1 = Item.builder()
                .itemId(item1Id)
                .properties(String.format("{\"%1$s\": \"%2$s\"}",
                        item1PropertyName, item1PropertyValue))
                .build();

        items.add(item1);

        Item item2 = Item.builder()
                .itemId(item2Id)
                .properties(String.format("{\"%1$s\": \"%2$s\"}",
                        item2PropertyName, item2PropertyValue))
                .build();

        items.add(item2);

        PutItemsRequest putItemsRequest = PutItemsRequest.builder()
                .datasetArn(datasetArn)
                .items(items)
                .build();

        responseCode = personalizeEventsClient.putItems(putItemsRequest).sdkHttpResponse().statusCode();
        System.out.println("Response code: " + responseCode);
        return responseCode;

    } catch (PersonalizeEventsException e) {
        System.out.println(e.awsErrorDetails().errorMessage());
    }
    return responseCode;
}
```

------
#### [ SDK for JavaScript v3 ]

```
import {
  PutItemsCommand,
  PersonalizeEventsClient,
} from "@aws-sdk/client-personalize-events";

const personalizeEventsClient = new PersonalizeEventsClient({
  region: "REGION",
});

// set the put items parameters
var putItemsParam = {
  datasetArn:
    "DATASET ARN",
  items: [
    {
      itemId: "itemId", 
      properties: '{"column1Name": "value", "column2Name": "value"}',
    },
    {
      itemId: "itemId",
      properties: '{"column1Name": "value", "column2Name": "value"}',
    },
  ],
};
export const run = async () => {
  try {
    const response = await personalizeEventsClient.send(
      new PutItemsCommand(putItemsParam)
    );
    console.log("Success!", response);
    return response; // For unit tests.
  } catch (err) {
    console.log("Error", err);
  }
};
run();
```

------

# Importing actions individually
<a name="importing-actions"></a>

After you complete [Creating a schema and a dataset](data-prep-creating-datasets.md) to create an [Actions dataset](actions-datasets.md), you can individually import one or more new actions into the dataset. When you individually import actions, you keep your Actions dataset current with small batch imports as your catalog grows. You can import up to 10 actions at a time. If you have a large number of new actions, we recommend that you first import data in bulk and then import action data individually as necessary. See [Importing bulk data into Amazon Personalize with a dataset import job](bulk-data-import-step.md).

You can use the Amazon Personalize console, the AWS Command Line Interface (AWS CLI), or AWS SDKs to import actions. If you import an action with the same `actionId` as an action that's already in your Actions dataset, Amazon Personalize replaces it with the new action.

For information about how new records influence recommendations, see [Updating data in datasets after training](updating-datasets.md). 

**Topics**
+ [Importing actions individually (console)](#importing-actions-console)
+ [Importing actions individually (AWS CLI)](#importing-actions-cli)
+ [Importing actions individually (AWS SDKs)](#importing-actions-cli-sdk)

## Importing actions individually (console)
<a name="importing-actions-console"></a>

You can import up to 10 actions into an Actions dataset at a time. This section assumes that you have already created an Actions dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

**To import actions individually (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose the dataset group with the Actions dataset that you want to import the actions to.

1. In the navigation pane, choose **Datasets**. 

1. On the **Datasets** page, choose the Actions dataset. 

1. At the top right of the dataset details page, choose **Modify dataset**, and then choose **Create record**. 

1. On the **Create action record(s)** page, for **Record input**, enter the action details in JSON format. The action's field names and values must match the schema you used when you created the Actions dataset. Amazon Personalize provides a JSON template with field names and data types from this schema.

1. Choose **Create record(s)**. In **Response**, the result of the import is listed and a success or failure message is displayed.

## Importing actions individually (AWS CLI)
<a name="importing-actions-cli"></a>

Add one or more actions to your Actions dataset using the `PutActions` API operation. You can import up to 10 actions at once. This section assumes that you have already created an Actions dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

Use the following `put-actions` command to add one or more actions with the AWS CLI. Replace `dataset arn` with the Amazon Resource Name (ARN) of your dataset and `actionId` with the ID of the action. If an action with the same `actionId` is already in your Actions dataset, Amazon Personalize replaces it with the new one.

For `properties`, for each field in your Actions dataset, replace `propertyName` with the field name from your schema in camel case. For example, `ACTION_EXPIRATION_TIMESTAMP` would be `actionExpirationTimestamp` and `CREATION_TIMESTAMP` would be `creationTimestamp`. Replace `property data` with the data for the property.

```
aws personalize-events put-actions \
  --dataset-arn dataset arn \
  --actions '[{
      "actionId": "actionId", 
      "properties": "{\"propertyName\": \"property data\"}" 
    }, 
    {
      "actionId": "actionId", 
      "properties": "{\"propertyName\": \"property data\"}" 
    }]'
```
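Hand-escaping the `properties` string for the CLI is error-prone, especially when it includes timestamp fields. A sketch that builds one action record with `json.dumps`, using the `actionExpirationTimestamp` field from the example above (the action ID and expiration window are hypothetical):

```python
import json
import time

# Build the properties string for one action.
# Timestamps are Unix epoch seconds; this one expires in a week.
expiration = int(time.time()) + 7 * 24 * 3600
action = {
    "actionId": "action-1",
    "properties": json.dumps({"actionExpirationTimestamp": expiration}),
}
print(action["properties"])
```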

## Importing actions individually (AWS SDKs)
<a name="importing-actions-cli-sdk"></a>

Add one or more actions to your Actions dataset using the `PutActions` operation. You can import up to 10 actions with a single `PutActions` call. If an action with the same `actionId` is already in your Actions dataset, Amazon Personalize replaces it with the new one. This section assumes that you have already created an Actions dataset. For information about creating datasets, see [Creating a schema and a dataset](data-prep-creating-datasets.md).

 The following code shows how to add one or more actions to your Actions dataset with the SDK for Python (Boto3). For each action, specify the `actionId`. For `properties`, for each additional field in your Actions dataset, replace `propertyName` with the field name from your schema in camel case. For example, `ACTION_EXPIRATION_TIMESTAMP` would be `actionExpirationTimestamp` and `CREATION_TIMESTAMP` would be `creationTimestamp`. Replace `property value` with the data for the property. 

```
import boto3

personalize_events = boto3.client(service_name='personalize-events')

personalize_events.put_actions(
    datasetArn = 'dataset arn',
    actions = [{
      'actionId': 'actionId',
      'properties': "{\"propertyName\": \"property value\"}"   
      },
      {
      'actionId': 'actionId',
      'properties': "{\"propertyName\": \"property value\"}"   
      }]
)
```