

# DynamoDB data import from Amazon S3: how it works
<a name="S3DataImport.HowItWorks"></a>

To import data into DynamoDB, your data must be in an Amazon S3 bucket in CSV, DynamoDB JSON, or Amazon Ion format. Data can be compressed in ZSTD or GZIP format, or can be directly imported in uncompressed form. Source data can either be a single Amazon S3 object or multiple Amazon S3 objects that use the same prefix.

Your data will be imported into a new DynamoDB table, which will be created when you initiate the import request. You can create this table with secondary indexes, then query and update your data across all primary and secondary indexes as soon as the import is complete. You can also add a global table replica after the import is complete. 

**Note**  
During the Amazon S3 import process, DynamoDB creates a new target table for the imported data. Importing into existing tables is not currently supported by this feature.

Import from Amazon S3 does not consume write capacity on the new table, so you do not need to provision any extra capacity for importing data into DynamoDB. Data import pricing is based on the uncompressed size of the source data in Amazon S3 that is processed as a result of the import. Items that are processed but fail to load into the table due to formatting or other inconsistencies in the source data are also billed as part of the import process. See [Amazon DynamoDB pricing](https://aws.amazon.com/dynamodb/pricing) for details.

You can import data from an Amazon S3 bucket owned by a different account if you have the correct permissions to read from that specific bucket. The new table may also be in a different Region from the source Amazon S3 bucket. For more information, see [Amazon Simple Storage Service setup and permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-walkthroughs-managing-access.html).

Import times are directly related to your data’s characteristics in Amazon S3. This includes data size, data format, compression scheme, uniformity of data distribution, number of Amazon S3 objects, and other related variables. In particular, data sets with uniformly distributed keys will be faster to import than skewed data sets. For example, if your secondary index's key is using the month of the year for partitioning, and all your data is from the month of December, then importing this data may take significantly longer. 

The attributes associated with keys are expected to be unique on the base table. If any keys are not unique, the import will overwrite the associated items until only the last overwrite remains. For example, if the primary key is the month and multiple items are set to the month of September, each new item will overwrite the previously written items and only one item with the primary key of "month" set to September will remain. In such cases, the number of items processed in the import table description will not match the number of items in the target table. 
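The overwrite behavior amounts to a last-write-wins insert into a map keyed by the primary key. The following Python sketch is purely illustrative (the `simulate_import` function and sample items are hypothetical, not part of the import feature):

```python
# Illustrative only: the import applies items in order, so duplicate primary
# keys overwrite earlier writes (last write wins).
def simulate_import(items, key_attr):
    table = {}
    for item in items:
        table[item[key_attr]] = item  # same key -> earlier item is replaced
    return table

source = [
    {"month": "September", "revenue": 100},
    {"month": "September", "revenue": 250},
    {"month": "October", "revenue": 75},
]
table = simulate_import(source, "month")
print(len(source))                    # 3 items processed
print(len(table))                     # 2 items in the target table
print(table["September"]["revenue"])  # 250 -- only the last write remains
```

Here the import description would report three items processed while the table holds only two, matching the mismatch described above.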

AWS CloudTrail logs all console and API actions for table import. For more information, see [Logging DynamoDB operations by using AWS CloudTrail](logging-using-cloudtrail.md).

The following video is an introduction to importing directly from Amazon S3 into DynamoDB.

[![AWS Videos](http://img.youtube.com/vi/fqq0CMOnOaI/0.jpg)](http://www.youtube.com/watch?v=fqq0CMOnOaI)


**Topics**
+ [Requesting a table import in DynamoDB](S3DataImport.Requesting.md)
+ [Amazon S3 import formats for DynamoDB](S3DataImport.Format.md)
+ [Import format quotas and validation](S3DataImport.Validation.md)
+ [Best practices for importing from Amazon S3 into DynamoDB](S3DataImport.BestPractices.md)

# Requesting a table import in DynamoDB
<a name="S3DataImport.Requesting"></a>

DynamoDB import allows you to import data from an Amazon S3 bucket to a new DynamoDB table. You can request a table import using the [DynamoDB console](https://console.aws.amazon.com/), the [CLI](AccessingDynamoDB.md#Tools.CLI), [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-dynamodb-table.html), or the [DynamoDB API](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/Welcome.html).

If you want to use the AWS CLI, you must configure it first. For more information, see [Accessing DynamoDB](AccessingDynamoDB.md).

**Note**  
The Import Table feature interacts with multiple AWS services, such as Amazon S3 and CloudWatch. Before you begin an import, make sure that the user or role that invokes the import APIs has permissions to all services and resources the feature depends on.
Do not modify the Amazon S3 objects while the import is in progress, as this can cause the operation to fail or be cancelled.
For more information on errors and troubleshooting, see [Import format quotas and validation](S3DataImport.Validation.md).

**Topics**
+ [Setting up IAM permissions](#DataImport.Requesting.Permissions)
+ [Requesting an import using the AWS Management Console](#S3DataImport.Requesting.Console)
+ [Getting details about past imports in the AWS Management Console](#S3DataImport.Requesting.Console.Details)
+ [Requesting an import using the AWS CLI](#S3DataImport.Requesting.CLI)
+ [Getting details about past imports in the AWS CLI](#S3DataImport.Requesting.CLI.Details)

## Setting up IAM permissions
<a name="DataImport.Requesting.Permissions"></a>

You can import data from any Amazon S3 bucket you have permission to read from. The source bucket does not need to be in the same Region or have the same owner as the target table. Your AWS Identity and Access Management (IAM) policy must include the relevant actions on the source Amazon S3 bucket, as well as the CloudWatch permissions required for providing debugging information. An example policy is shown below.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDynamoDBImportAction",
      "Effect": "Allow",
      "Action": [
        "dynamodb:ImportTable",
        "dynamodb:DescribeImport"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/my-table*"
    },
    {
      "Sid": "AllowS3Access",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket/*",
        "arn:aws:s3:::your-bucket"
      ]
    },
    {
      "Sid": "AllowCloudwatchAccess",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:PutLogEvents",
        "logs:PutRetentionPolicy"
      ],
      "Resource": "arn:aws:logs:us-east-1:111122223333:log-group/aws-dynamodb/*"
    },
    {
      "Sid": "AllowDynamoDBListImports",
      "Effect": "Allow",
      "Action": "dynamodb:ListImports",
      "Resource": "*"
    }
  ]
}
```

------

### Amazon S3 permissions
<a name="DataImport.Requesting.Permissions.s3"></a>

When starting an import from an Amazon S3 bucket that is owned by another account, ensure that the role or user has access to the Amazon S3 objects. You can verify this by running an Amazon S3 `GetObject` command with those credentials. When using the API, the Amazon S3 bucket owner parameter defaults to the current user's account ID. For cross-account imports, ensure that this parameter is populated with the bucket owner's account ID. The following code is an example Amazon S3 bucket policy in the source account.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExampleStatement",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/Dave"},
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
        }
    ]
}
```

------

### AWS Key Management Service
<a name="DataImport.Requesting.Permissions.kms"></a>

When creating the new table for import, if you select an encryption at rest key that is not owned by DynamoDB, you must provide the AWS KMS permissions required to operate a DynamoDB table encrypted with customer managed keys. For more information, see [Authorizing use of your AWS KMS key](encryption.usagenotes.html#dynamodb-kms-authz). If the Amazon S3 objects are encrypted with server-side encryption with AWS KMS keys (SSE-KMS), ensure that the role or user initiating the import has permission to decrypt using the AWS KMS key. This feature does not support Amazon S3 objects encrypted with customer-provided encryption keys (SSE-C).

### CloudWatch permissions
<a name="DataImport.Requesting.Permissions.cw"></a>

The role or user that initiates the import needs permissions to create and manage the log group and log streams associated with the import.

## Requesting an import using the AWS Management Console
<a name="S3DataImport.Requesting.Console"></a>

The following example demonstrates how to use the DynamoDB console to import existing data to a new table named `MusicCollection`.

**To request a table import**

1. Sign in to the AWS Management Console and open the DynamoDB console at [https://console.aws.amazon.com/dynamodb/](https://console.aws.amazon.com/dynamodb/).

1. In the navigation pane on the left side of the console, choose **Import from S3**.

1. On the page that appears, choose **Import from S3**.

1. In **Source S3 URL**, enter the Amazon S3 source URL.

   If you own the source bucket, choose **Browse S3** to search for it. Alternatively, enter the bucket's URL in the following format – `s3://bucket/prefix`. The `prefix` is an Amazon S3 key prefix. It's either the Amazon S3 object name that you want to import or the key prefix shared by all the Amazon S3 objects that you want to import.
**Note**  
You can't use the same prefix as your DynamoDB export request. The export feature creates a folder structure and manifest files for all the exports. If you use the same Amazon S3 path, it will result in an error.   
Instead, point the import at the folder that contains data from that specific export. The format of the correct path in this case is `s3://bucket/prefix/AWSDynamoDB/<XXXXXXXX-XXXXXX>/data/`, where `XXXXXXXX-XXXXXX` is the export ID. You can find the export ID in the export ARN, which has the following format – `arn:aws:dynamodb:<Region>:<AccountID>:table/<TableName>/export/<XXXXXXXX-XXXXXX>`. For example, `arn:aws:dynamodb:us-east-1:123456789012:table/ProductCatalog/export/01234567890123-a1b2c3d4`.

1. Specify if you are the **S3 bucket owner**. If the source bucket is owned by a different account, select **A different AWS account**. Then enter the account ID of the bucket owner.

1. Under **Import file compression**, select **No compression**, **GZIP**, or **ZSTD** as appropriate.

1. Select the appropriate **Import file format**. The options are **DynamoDB JSON**, **Amazon Ion**, or **CSV**. If you select **CSV**, you will have two additional options: **CSV header** and **CSV delimiter character**.

   For **CSV header**, choose whether the header will be taken from the first line of the file or customized. If you select **Customize your headers**, you can specify the header values to import with. CSV headers specified by this method are case-sensitive and are expected to contain the keys of the target table.

   For **CSV delimiter character**, set the character that separates values within each row. Comma is selected by default. If you select **Custom delimiter character**, the delimiter must match the regex pattern `[,;:|\t ]`.

1. Select **Next** and choose the options for the new table that will be created to store your data. 
**Note**  
The partition key and sort key must match the attributes in the file, or the import will fail. The attributes are case sensitive.

1. Select **Next** again to review your import options, then choose **Import** to begin the import task. You will first see your new table listed on the **Tables** page with the status **Creating**. At this time the table is not accessible.

1. Once the import completes, the status will show as **Active** and you can start using the table.
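Two of the console inputs above are easy to get wrong: the source URL when importing a previous export, and the custom CSV delimiter. The following Python sketch captures both rules; the helper names are hypothetical, but the path layout and delimiter pattern come from the steps above:

```python
import re

def export_data_prefix(export_arn: str, bucket: str, prefix: str) -> str:
    """Build the import source URL for a previous export's data folder."""
    export_id = export_arn.split("/")[-1]  # export ID is the last ARN path element
    return f"s3://{bucket}/{prefix}/AWSDynamoDB/{export_id}/data/"

def is_valid_custom_delimiter(ch: str) -> bool:
    """A custom CSV delimiter must match the documented pattern [,;:|\\t ]."""
    return re.fullmatch(r"[,;:|\t ]", ch) is not None

arn = ("arn:aws:dynamodb:us-east-1:123456789012:"
       "table/ProductCatalog/export/01234567890123-a1b2c3d4")
print(export_data_prefix(arn, "bucket", "prefix"))
# s3://bucket/prefix/AWSDynamoDB/01234567890123-a1b2c3d4/data/
print(is_valid_custom_delimiter(";"))  # True
print(is_valid_custom_delimiter("#"))  # False
```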

## Getting details about past imports in the AWS Management Console
<a name="S3DataImport.Requesting.Console.Details"></a>

You can find information about import tasks you've run in the past by clicking **Import from S3** in the navigation sidebar, then selecting the **Imports** tab. The import panel contains a list of all imports you've created in the past 90 days. Selecting the ARN of a task listed in the Imports tab will retrieve information about that import, including any advanced configuration settings you chose.

## Requesting an import using the AWS CLI
<a name="S3DataImport.Requesting.CLI"></a>

The following example imports CSV-formatted data from an S3 bucket named `bucket` with a prefix of `prefix` to a new table named `target-table`.

```
aws dynamodb import-table --s3-bucket-source S3Bucket=bucket,S3KeyPrefix=prefix \
    --input-format CSV \
    --table-creation-parameters '{"TableName":"target-table","KeySchema":[{"AttributeName":"hk","KeyType":"HASH"}],"AttributeDefinitions":[{"AttributeName":"hk","AttributeType":"S"}],"BillingMode":"PAY_PER_REQUEST"}' \
    --input-format-options '{"Csv": {"HeaderList": ["hk", "title", "artist", "year_of_release"], "Delimiter": ";"}}'
```

**Note**  
If you choose to encrypt your import using a key protected by AWS Key Management Service (AWS KMS), the key must be in the same Region as the destination Amazon S3 bucket.

## Getting details about past imports in the AWS CLI
<a name="S3DataImport.Requesting.CLI.Details"></a>

You can find information about import tasks you've run in the past by using the `list-imports` command. This command returns a list of all imports you've created in the past 90 days. Note that although import task metadata expires after 90 days and jobs older than that are no longer found on this list, DynamoDB does not delete any of the objects in your Amazon S3 bucket or the table created during import.

```
aws dynamodb list-imports
```

To retrieve detailed information about a specific import task, including any advanced configuration settings, use the `describe-import` command.

```
aws dynamodb describe-import \
    --import-arn arn:aws:dynamodb:us-east-1:123456789012:table/ProductCatalog/exp
```

# Amazon S3 import formats for DynamoDB
<a name="S3DataImport.Format"></a>

DynamoDB can import data in three formats: CSV, DynamoDB JSON, and Amazon Ion.

**Topics**
+ [CSV](#S3DataImport.Requesting.Formats.CSV)
+ [DynamoDB JSON](#S3DataImport.Requesting.Formats.DDBJson)
+ [Amazon Ion](#S3DataImport.Requesting.Formats.Ion)

## CSV
<a name="S3DataImport.Requesting.Formats.CSV"></a>

A file in CSV format consists of multiple items delimited by newlines. By default, DynamoDB interprets the first line of an import file as the header and expects columns to be delimited by commas. You can also define headers that will be applied, as long as they match the number of columns in the file. If you define headers explicitly, the first line of the file will be imported as values. 

**Note**  
When importing from CSV files, all columns other than the hash and range keys of your base table and secondary indexes are imported as DynamoDB strings.

**Escaping double quotes**

Any double quote characters that exist in the CSV file must be escaped. If they are not escaped, as in the following example, the import will fail: 

```
id,value
"123",Women's Full "Length" Dress
```

This same import will succeed if each embedded double quote is escaped by doubling it:

```
id,value
"""123""","Women's Full ""Length"" Dress"
```

Once the text has been properly escaped and imported, it will appear as it did in the original CSV file:

```
id,value
"123",Women's Full "Length" Dress
```
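Python's standard `csv` module applies this same doubling convention automatically, so writing files with it avoids hand-escaping. A minimal sketch with made-up sample data:

```python
import csv
import io

rows = [["id", "value"], ["123", 'Women\'s Full "Length" Dress']]

# csv.writer doubles embedded double quotes and wraps the field in quotes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
escaped = buf.getvalue()
print(escaped)
# id,value
# 123,"Women's Full ""Length"" Dress"

# Reading it back recovers the original text, quotes included.
parsed = list(csv.reader(io.StringIO(escaped)))
print(parsed[1][1])  # Women's Full "Length" Dress
```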

**Importing heterogeneous item types**

You can use a single CSV file to import different item types into one table. Define a header row that includes all attributes across your item types, and leave columns empty for attributes that don't apply to a given item. Empty columns are omitted from the imported item rather than stored as empty strings.

```
PK,SK,EntityType,Name,Email,OrderDate,Amount,ProductName,Quantity
USER#1,PROFILE,User,Alice,alice@example.com,,,,
USER#1,ORDER#2024-01-15,Order,,,2024-01-15,99.99,,
USER#1,ORDER#2024-02-10,Order,,,2024-02-10,149.50,,
PRODUCT#101,METADATA,Product,,,,,Laptop,50
PRODUCT#102,METADATA,Product,,,,,Mouse,200
USER#2,PROFILE,User,Bob,bob@example.com,,,,
USER#2,ORDER#2024-01-20,Order,,,2024-01-20,75.00,,
PRODUCT#103,METADATA,Product,,,,,Keyboard,150
USER#3,PROFILE,User,Charlie,charlie@example.com,,,,
PRODUCT#104,METADATA,Product,,,,,Monitor,30
```

In this example, user profiles, orders, and products share the same table. Each item type uses only the columns relevant to it.
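The empty-column behavior can be previewed locally before importing. This sketch mimics it with Python's `csv` module; the parsing here is illustrative only, since the import service does its own parsing:

```python
import csv
import io

data = """PK,SK,EntityType,Name,Email,OrderDate,Amount
USER#1,PROFILE,User,Alice,alice@example.com,,
USER#1,ORDER#2024-01-15,Order,,,2024-01-15,99.99
"""

# Mirror the documented behavior: empty columns are omitted from the item,
# not stored as empty strings.
items = [{k: v for k, v in row.items() if v != ""}
         for row in csv.DictReader(io.StringIO(data))]

print(sorted(items[0]))  # ['Email', 'EntityType', 'Name', 'PK', 'SK']
print(sorted(items[1]))  # ['Amount', 'EntityType', 'OrderDate', 'PK', 'SK']
```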

## DynamoDB JSON
<a name="S3DataImport.Requesting.Formats.DDBJson"></a>

A file in DynamoDB JSON format can consist of multiple Item objects. Each individual object is in DynamoDB's standard marshalled JSON format, and newlines are used as item delimiters. As an added benefit, point-in-time exports are supported as an import source by default.

**Note**  
Newlines are used as item delimiters for a file in DynamoDB JSON format and shouldn't be used within an item object.

```
{"Item": {"Authors": {"SS": ["Author1", "Author2"]}, "Dimensions": {"S": "8.5 x 11.0 x 1.5"}, "ISBN": {"S": "333-3333333333"}, "Id": {"N": "103"}, "InPublication": {"BOOL": false}, "PageCount": {"N": "600"}, "Price": {"N": "2000"}, "ProductCategory": {"S": "Book"}, "Title": {"S": "Book 103 Title"}}}
{"Item": {"Authors": {"SS": ["Author1", "Author2"]}, "Dimensions": {"S": "8.5 x 11.0 x 1.5"}, "ISBN": {"S": "444-4444444444"}, "Id": {"N": "104"}, "InPublication": {"BOOL": false}, "PageCount": {"N": "600"}, "Price": {"N": "2000"}, "ProductCategory": {"S": "Book"}, "Title": {"S": "Book 104 Title"}}}
{"Item": {"Authors": {"SS": ["Author1", "Author2"]}, "Dimensions": {"S": "8.5 x 11.0 x 1.5"}, "ISBN": {"S": "555-5555555555"}, "Id": {"N": "105"}, "InPublication": {"BOOL": false}, "PageCount": {"N": "600"}, "Price": {"N": "2000"}, "ProductCategory": {"S": "Book"}, "Title": {"S": "Book 105 Title"}}}
```
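Because each line is independent, standard JSON, you can sanity-check a file locally before importing it. The following `unmarshal` helper is a hypothetical, partial decoder written for illustration (the AWS SDKs provide full deserializers for the complete type system):

```python
import json

def unmarshal(av):
    """Decode a DynamoDB attribute value (subset of types, for illustration)."""
    ((t, v),) = av.items()
    if t == "S":
        return v
    if t == "N":
        return float(v) if "." in v else int(v)
    if t == "BOOL":
        return v
    if t == "SS":
        return set(v)
    raise ValueError(f"type {t} not handled in this sketch")

line = ('{"Item": {"Id": {"N": "103"}, "Title": {"S": "Book 103 Title"}, '
        '"InPublication": {"BOOL": false}, "Authors": {"SS": ["Author1", "Author2"]}}}')
item = {k: unmarshal(v) for k, v in json.loads(line)["Item"].items()}
print(item["Id"], item["InPublication"])  # 103 False
```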

## Amazon Ion
<a name="S3DataImport.Requesting.Formats.Ion"></a>

[Amazon Ion](https://amzn.github.io/ion-docs/) is a richly-typed, self-describing, hierarchical data serialization format built to address rapid development, decoupling, and efficiency challenges faced every day while engineering large-scale, service-oriented architectures.

When you import data in Ion format, the Ion datatypes are mapped to DynamoDB datatypes in the new DynamoDB table.


| Ion data type | DynamoDB representation | 
| --- | --- | 
| `string` | String (S) | 
| `bool` | Boolean (BOOL) | 
| `decimal` | Number (N) | 
| `blob` | Binary (B) | 
| `list` (with type annotation `$dynamodb_SS`, `$dynamodb_NS`, or `$dynamodb_BS`) | Set (SS, NS, or BS) | 
| `list` | List (L) | 
| `struct` | Map (M) | 

Items in an Ion file are delimited by newlines. Each line begins with an Ion version marker, followed by an item in Ion format.

**Note**  
In the following example, we've formatted items from an Ion-formatted file on multiple lines to improve readability.

```
$ion_1_0
[
  {
    Item:{
      Authors:$dynamodb_SS::["Author1","Author2"],
      Dimensions:"8.5 x 11.0 x 1.5",
      ISBN:"333-3333333333",
      Id:103.,
      InPublication:false,
      PageCount:6d2,
      Price:2d3,
      ProductCategory:"Book",
      Title:"Book 103 Title"
    }
  },
  {
    Item:{
      Authors:$dynamodb_SS::["Author1","Author2"],
      Dimensions:"8.5 x 11.0 x 1.5",
      ISBN:"444-4444444444",
      Id:104.,
      InPublication:false,
      PageCount:6d2,
      Price:2d3,
      ProductCategory:"Book",
      Title:"Book 104 Title"
    }
  },
  {
    Item:{
      Authors:$dynamodb_SS::["Author1","Author2"],
      Dimensions:"8.5 x 11.0 x 1.5",
      ISBN:"555-5555555555",
      Id:105.,
      InPublication:false,
      PageCount:6d2,
      Price:2d3,
      ProductCategory:"Book",
      Title:"Book 105 Title"
    }
  }
]
```

# Import format quotas and validation
<a name="S3DataImport.Validation"></a>

## Import quotas
<a name="S3DataImport.Validation.limits"></a>

DynamoDB import from Amazon S3 supports up to 50 concurrent import jobs with a total import source object size of 15 TB at a time in the us-east-1, us-west-2, and eu-west-1 Regions. In all other Regions, up to 50 concurrent import tasks with a total size of 1 TB are supported. Each import job can include up to 50,000 Amazon S3 objects in all Regions. These default quotas are applied to every account. If you need these quotas revised, contact your account team; requests are considered on a case-by-case basis. For more details on DynamoDB limits, see [Service Quotas](ServiceQuotas.html).
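A quick local pre-check against these defaults can save a rejected request. This sketch is hypothetical: the function name is made up, and it assumes binary terabytes, so treat the size cap as approximate:

```python
# Documented defaults: 50,000 objects per job; 15 TB total in the three
# large-quota Regions, 1 TB elsewhere (TB taken as 1024**4 bytes here).
REGIONS_15TB = {"us-east-1", "us-west-2", "eu-west-1"}
TB = 1024 ** 4

def within_default_quotas(region: str, object_count: int, total_bytes: int) -> bool:
    size_cap = 15 * TB if region in REGIONS_15TB else 1 * TB
    return object_count <= 50_000 and total_bytes <= size_cap

print(within_default_quotas("us-east-1", 10_000, 5 * TB))     # True
print(within_default_quotas("eu-central-1", 10_000, 5 * TB))  # False -- over 1 TB
```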

## Validation errors
<a name="S3DataImport.Validation.Errors"></a>

During the import process, DynamoDB may encounter errors while parsing your data. For each error, DynamoDB emits a CloudWatch log and keeps a count of the total number of errors encountered. If the Amazon S3 object itself is malformed or if its contents cannot form a DynamoDB item, then we may skip processing the remaining portion of the object.

**Note**  
If the Amazon S3 data source has multiple items that share the same key, the items overwrite one another until only one remains. This can appear as if one item was imported and the others were ignored. The duplicate items are overwritten in random order, are not counted as errors, and are not emitted to CloudWatch logs.  
Once the import is complete you can see the total count of items imported, total count of errors, and total count of items processed. For further troubleshooting you can also check the total size of items imported and total size of data processed.

There are three categories of import errors: API validation errors, data validation errors, and configuration errors.

### API validation errors
<a name="S3DataImport.Validation.Errors.API"></a>

API validation errors are item-level errors from the sync API. Common causes are permissions issues, missing required parameters, and parameter validation failures. Details about why the API call failed are contained in the exceptions thrown by the `ImportTable` request.

### Data validation errors
<a name="S3DataImport.Validation.Errors.Data"></a>

Data validation errors can occur at either the item level or the file level. During import, items are validated based on DynamoDB rules before being imported into the target table. When an item fails validation and is not imported, the import job skips over that item and continues with the next item. At the end of the job, the import status is set to FAILED with the FailureCode `ItemValidationError` and the FailureMessage "Some of the items failed validation checks and were not imported. Please check CloudWatch error logs for more details."

 Common causes for data validation errors include objects that can't be parsed, objects that are in the incorrect format (the input specifies DYNAMODB_JSON but the object is not in DYNAMODB_JSON), and schema mismatch with the specified source table keys.

### Configuration errors
<a name="S3DataImport.Validation.Errors.Configuration"></a>

Configuration errors are typically workflow errors due to permission validation. The import workflow checks some permissions after accepting the request. If there are issues calling any of the required dependencies, such as Amazon S3 or CloudWatch, the process marks the import status as FAILED. The `FailureCode` and `FailureMessage` point to the reason for failure. Where applicable, the failure message also contains the request ID that you can use to investigate the reason for failure in CloudTrail.

Common configuration errors include having the wrong URL for the Amazon S3 bucket, and not having permission to access the Amazon S3 bucket, CloudWatch Logs, or the AWS KMS keys used to decrypt the Amazon S3 object. For more information, see [AWS KMS keys and data keys](encryption.usagenotes.html#dynamodb-kms). 

### Validating source Amazon S3 objects
<a name="S3DataImport.Validation.Errors.S3Objects"></a>

To validate source Amazon S3 objects, take the following steps.

1. Validate the data format and compression type 
   + Make sure that all matching Amazon S3 objects under the specified prefix have the same format (DYNAMODB_JSON, ION, CSV)
   + Make sure that all matching Amazon S3 objects under the specified prefix are compressed the same way (GZIP, ZSTD, NONE)
**Note**  
The Amazon S3 objects do not need to have an extension (such as `.csv`, `.json`, `.ion`, `.gz`, or `.zstd`) corresponding to the input format, because the format specified in the `ImportTable` call takes precedence.

1. Validate that the import data conforms to the desired table schema
   + Make sure that each item in the source data has the primary key. A sort key is optional for imports.
   + Make sure that the attribute types associated with the partition key and any sort key match the attribute types in the table and GSI schema, as specified in the table creation parameters
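A quick local pre-flight check for the schema conditions above can catch mismatches before you pay for a failed import. This is a hypothetical validator, written against the same `AttributeDefinitions` shape used in the table creation parameters:

```python
# Hypothetical pre-flight check: every item must carry the key attributes
# declared in the table creation parameters, with matching types.
TYPE_CHECKS = {"S": str, "N": (int, float), "B": bytes}

def find_schema_violations(items, attribute_definitions):
    errors = []
    for i, item in enumerate(items):
        for attr in attribute_definitions:
            name, ddb_type = attr["AttributeName"], attr["AttributeType"]
            if name not in item:
                errors.append(f"item {i}: missing key attribute {name!r}")
            elif not isinstance(item[name], TYPE_CHECKS[ddb_type]):
                errors.append(f"item {i}: {name!r} is not of type {ddb_type}")
    return errors

schema = [{"AttributeName": "pk", "AttributeType": "S"}]
items = [{"pk": "USER#1"}, {"pk": 42}, {"title": "no key"}]
errors = find_schema_violations(items, schema)
print(errors[0])  # item 1: 'pk' is not of type S
print(errors[1])  # item 2: missing key attribute 'pk'
```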

### Troubleshooting
<a name="S3DataImport.Validation.Troubleshooting"></a>

#### CloudWatch logs
<a name="S3DataImport.Validation.Troubleshooting.Cloudwatch"></a>

For import jobs that fail, detailed error messages are posted to CloudWatch Logs. To access these logs, first retrieve the `ImportArn` from the output of the `import-table` call, and then run `describe-import` with it:

```
aws dynamodb describe-import --import-arn arn:aws:dynamodb:us-east-1:ACCOUNT:table/target-table/import/01658528578619-c4d4e311
```

Example output:

```
aws dynamodb describe-import --import-arn "arn:aws:dynamodb:us-east-1:531234567890:table/target-table/import/01658528578619-c4d4e311"
{
    "ImportTableDescription": {
        "ImportArn": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/target-table/import/01658528578619-c4d4e311",
        "ImportStatus": "FAILED",
        "TableArn": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/target-table",
        "TableId": "7b7ecc22-302f-4039-8ea9-8e7c3eb2bcb8",
        "ClientToken": "30f8891c-e478-47f4-af4a-67a5c3b595e3",
        "S3BucketSource": {
            "S3BucketOwner": "ACCOUNT",
            "S3Bucket": "my-import-source",
            "S3KeyPrefix": "import-test"
        },
        "ErrorCount": 1,
        "CloudWatchLogGroupArn": "arn:aws:logs:us-east-1:ACCOUNT:log-group:/aws-dynamodb/imports:*",
        "InputFormat": "CSV",
        "InputCompressionType": "NONE",
        "TableCreationParameters": {
            "TableName": "target-table",
            "AttributeDefinitions": [
                {
                    "AttributeName": "pk",
                    "AttributeType": "S"
                }
            ],
            "KeySchema": [
                {
                    "AttributeName": "pk",
                    "KeyType": "HASH"
                }
            ],
            "BillingMode": "PAY_PER_REQUEST"
        },
        "StartTime": 1658528578.619,
        "EndTime": 1658528750.628,
        "ProcessedSizeBytes": 70,
        "ProcessedItemCount": 1,
        "ImportedItemCount": 0,
        "FailureCode": "ItemValidationError",
        "FailureMessage": "Some of the items failed validation checks and were not imported. Please check CloudWatch error logs for more details."
    }
}
```

Retrieve the log group and the import ID from the above response and use them to retrieve the error logs. The import ID is the last path element of the `ImportArn` field. The log group name is `/aws-dynamodb/imports`, and the error log stream name is `import-id/error`. For this example, it would be `01658528578619-c4d4e311/error`.
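That naming convention can be captured in a small helper. This sketch (with a hypothetical function name) derives the log group and error stream from any import ARN:

```python
def import_error_logs(import_arn: str) -> tuple[str, str]:
    """Return the CloudWatch log group and error log stream for an import."""
    import_id = import_arn.split("/")[-1]  # last path element of the ImportArn
    return "/aws-dynamodb/imports", f"{import_id}/error"

arn = ("arn:aws:dynamodb:us-east-1:111122223333:"
       "table/target-table/import/01658528578619-c4d4e311")
group, stream = import_error_logs(arn)
print(group)   # /aws-dynamodb/imports
print(stream)  # 01658528578619-c4d4e311/error
```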

#### Missing the key pk in the item
<a name="S3DataImport.Validation.Troubleshooting.Missing"></a>

If the source Amazon S3 object does not contain the primary key that was provided as a parameter, the import will fail. For example, the following request defines the primary key as a column named `pk`.

```
aws dynamodb import-table --s3-bucket-source S3Bucket=my-import-source,S3KeyPrefix=import-test.csv \
    --input-format CSV \
    --table-creation-parameters '{"TableName":"target-table","KeySchema":[{"AttributeName":"pk","KeyType":"HASH"}],"AttributeDefinitions":[{"AttributeName":"pk","AttributeType":"S"}],"BillingMode":"PAY_PER_REQUEST"}'
```

The column `pk` is missing from the source object `import-test.csv`, which has the following contents:

```
title,artist,year_of_release
The Dark Side of the Moon,Pink Floyd,1973
```

This import will fail due to item validation error because of the missing primary key in the data source.

Example CloudWatch error log:

```
aws logs get-log-events --log-group-name /aws-dynamodb/imports --log-stream-name 01658528578619-c4d4e311/error
{
    "events": [
        {
            "timestamp": 1658528745319,
            "message": "{\"itemS3Pointer\":{\"bucket\":\"my-import-source\",\"key\":\"import-test.csv\",\"itemIndex\":0},\"importArn\":\"arn:aws:dynamodb:us-east-1:531234567890:table/target-table/import/01658528578619-c4d4e311\",\"errorMessages\":[\"One or more parameter values were invalid: Missing the key pk in the item\"]}",
            "ingestionTime": 1658528745414
        }
    ],
    "nextForwardToken": "f/36986426953797707963335499204463414460239026137054642176/s",
    "nextBackwardToken": "b/36986426953797707963335499204463414460239026137054642176/s"
}
```

This error log indicates "One or more parameter values were invalid: Missing the key pk in the item". Because this import job failed, the table `target-table` now exists and is empty, since no items were imported. The first item was processed, and it failed item validation. 

To fix the issue, first delete `target-table` if it is no longer needed. Then either use a primary key column name that exists in the source object, or update the source data to:

```
pk,title,artist,year_of_release
Albums::Rock::Classic::1973::AlbumId::ALB25,The Dark Side of the Moon,Pink Floyd,1973
```

#### Target table exists
<a name="S3DataImport.Validation.Troubleshooting.TargetTable"></a>

If the target table already exists when you start an import job, you will receive a response like the following:

```
An error occurred (ResourceInUseException) when calling the ImportTable operation: Table already exists: target-table
```

To fix this error, choose a table name that doesn't already exist and retry the import. 

#### The specified bucket does not exist
<a name="S3DataImport.Validation.Troubleshooting.Bucket"></a>

If the source bucket does not exist, the import will fail and log the error message details in CloudWatch. 

Example `describe-import` output:

```
aws dynamodb describe-import --import-arn "arn:aws:dynamodb:us-east-1:531234567890:table/target-table/import/01658530687105-e6035287"
{
    "ImportTableDescription": {
        "ImportArn": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/target-table/import/01658530687105-e6035287",
        "ImportStatus": "FAILED",
        "TableArn": "arn:aws:dynamodb:us-east-1:ACCOUNT:table/target-table",
        "TableId": "e1215a82-b8d1-45a8-b2e2-14b9dd8eb99c",
        "ClientToken": "3048e16a-069b-47a6-9dfb-9c259fd2fb6f",
        "S3BucketSource": {
            "S3BucketOwner": "531234567890",
            "S3Bucket": "BUCKET_DOES_NOT_EXIST",
            "S3KeyPrefix": "import-test"
        },
        "ErrorCount": 0,
        "CloudWatchLogGroupArn": "arn:aws:logs:us-east-1:ACCOUNT:log-group:/aws-dynamodb/imports:*",
        "InputFormat": "CSV",
        "InputCompressionType": "NONE",
        "TableCreationParameters": {
            "TableName": "target-table",
            "AttributeDefinitions": [
                {
                    "AttributeName": "pk",
                    "AttributeType": "S"
                }
            ],
            "KeySchema": [
                {
                    "AttributeName": "pk",
                    "KeyType": "HASH"
                }
            ],
            "BillingMode": "PAY_PER_REQUEST"
        },
        "StartTime": 1658530687.105,
        "EndTime": 1658530701.873,
        "ProcessedSizeBytes": 0,
        "ProcessedItemCount": 0,
        "ImportedItemCount": 0,
        "FailureCode": "S3NoSuchBucket",
        "FailureMessage": "The specified bucket does not exist (Service: Amazon S3; Status Code: 404; Error Code: NoSuchBucket; Request ID: Q4W6QYYFDWY6WAKH; S3 Extended Request ID: ObqSlLeIMJpQqHLRX2C5Sy7n+8g6iGPwy7ixg7eEeTuEkg/+chU/JF+RbliWytMlkUlUcuCLTrI=; Proxy: null)"
    }
}
```

The `FailureCode` is `S3NoSuchBucket`, and the `FailureMessage` contains details such as the request ID and the service that returned the error. Because this error was caught before any data was imported, no new DynamoDB table is created. If such errors are encountered after the data import has started, the table with the partially imported data is retained. 

To fix this error, make sure that the source Amazon S3 bucket exists and then restart the import process.

# Best practices for importing from Amazon S3 into DynamoDB
<a name="S3DataImport.BestPractices"></a>

The following are the best practices for importing data from Amazon S3 into DynamoDB.

## Stay under the limit of 50,000 S3 objects
<a name="S3DataImport.BestPractices.S3Limit"></a>

Each import job supports a maximum of 50,000 S3 objects. If your dataset contains more than 50,000 objects, consider consolidating them into larger objects.
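As a minimal sketch of consolidation (the `consolidate_csv` helper and the sample rows are hypothetical, not part of the import feature), the following merges several small CSV strings that share a header into one larger object, keeping the header only once:

```python
import csv
import io

def consolidate_csv(chunks):
    """Merge CSV strings that share the same header into a single, larger object."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_written = False
    for chunk in chunks:
        rows = list(csv.reader(io.StringIO(chunk)))
        if not header_written:
            writer.writerow(rows[0])  # keep the header from the first chunk only
            header_written = True
        writer.writerows(rows[1:])    # skip each subsequent chunk's header line
    return out.getvalue()

merged = consolidate_csv([
    "pk,title\nALB1,Abbey Road\n",
    "pk,title\nALB2,Wish You Were Here\n",
])
print(merged)  # one header, then all data rows
```

The same approach scales to reading thousands of small files and writing out a few hundred large ones before uploading to S3.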

## Avoid excessively large S3 objects
<a name="S3DataImport.BestPractices.AvoidLargeObjects"></a>

S3 objects are imported in parallel. Having numerous mid-sized S3 objects allows for parallel execution without excessive overhead. For items under 1 KB, consider placing 4,000,000 items into each S3 object. If you have a larger average item size, place proportionally fewer items into each S3 object.
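The proportional scaling above can be sketched as follows (the helper and its baseline constants are illustrative, derived from the 4,000,000-items-at-1-KB guideline, not an AWS API):

```python
BASELINE_ITEMS = 4_000_000  # suggested items per object at roughly 1 KB per item
BASELINE_BYTES = 1_024

def items_per_object(avg_item_bytes):
    """Scale the per-object item count down proportionally for larger items."""
    if avg_item_bytes <= BASELINE_BYTES:
        return BASELINE_ITEMS
    return BASELINE_ITEMS * BASELINE_BYTES // avg_item_bytes

print(items_per_object(500))    # items under 1 KB: 4,000,000 per object
print(items_per_object(4_096))  # 4 KB items: 1,000,000 per object
```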

## Randomize sorted data
<a name="S3DataImport.BestPractices.RandomizeSortedData"></a>

If an S3 object holds data in sorted order, it can create a *rolling hot partition*. This is a situation where one partition receives all the activity, and then the next partition after that, and so on. Data in sorted order is defined as items in sequence in the S3 object that will be written to the same target partition during the import. One common situation where data is in sorted order is a CSV file where items are sorted by partition key so that repeated items share the same partition key.

To avoid a rolling hot partition, we recommend that you randomize the order in these cases. This can improve performance by spreading the write operations. For more information, see [Distributing write activity efficiently during data upload in DynamoDB](bp-partition-key-data-upload.md).
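One way to randomize a sorted CSV before uploading it is to shuffle the data rows while keeping the header first. This sketch (the `shuffle_csv_rows` helper is hypothetical; the `seed` parameter is only for reproducibility) uses the standard library:

```python
import csv
import io
import random

def shuffle_csv_rows(csv_text, seed=None):
    """Shuffle the data rows of a CSV (header stays first) so that consecutive
    rows no longer target the same DynamoDB partition during import."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    random.Random(seed).shuffle(data)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data)
    return out.getvalue()

sorted_csv = "pk,val\n" + "".join(f"A{i},{i}\n" for i in range(5))
print(shuffle_csv_rows(sorted_csv, seed=42))
```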

## Compress data to keep the total S3 object size below the Regional limit
<a name="S3DataImport.BestPractices.CompressData"></a>

In the [import from S3 process](S3DataImport.Requesting.md), there is a limit on the total size of the S3 object data to be imported. The limit is 15 TB in the us-east-1, us-west-2, and eu-west-1 Regions, and 1 TB in all other Regions. The limit is based on the raw S3 object sizes.

Compression allows more raw data to fit within the limit. If compression alone isn’t sufficient to fit the import within the limit, you can also contact [AWS Premium Support](https://aws.amazon.com/premiumsupport/) for a quota increase.
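As a local sketch of this check (the `REGIONAL_LIMITS` table is a hypothetical helper built from the limits above, not an AWS API), you can verify that GZIP shrinks repetitive source data and that the summed object sizes stay under the Regional limit:

```python
import gzip

TB = 1024**4
REGIONAL_LIMITS = {  # illustrative lookup based on the limits described above
    "us-east-1": 15 * TB,
    "us-west-2": 15 * TB,
    "eu-west-1": 15 * TB,
}

def fits_import_limit(object_sizes_bytes, region):
    """Check whether the summed S3 object sizes stay under the Regional limit."""
    limit = REGIONAL_LIMITS.get(region, 1 * TB)  # 1 TB in all other Regions
    return sum(object_sizes_bytes) <= limit

# Repetitive DynamoDB JSON lines compress very well with GZIP.
raw = b'{"Item": {"pk": {"S": "Albums::Rock::Classic"}}}\n' * 100_000
compressed = gzip.compress(raw)
print(len(raw), len(compressed))
```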

## Be aware of how item size impacts performance
<a name="S3DataImport.BestPractices.ItemSize"></a>

If your average item size is very small (below 200 bytes), the import process might take a little longer than for larger item sizes.

## Do not modify S3 objects during active imports
<a name="S3DataImport.BestPractices.NoModification"></a>

Ensure that your source S3 objects remain unchanged while an import operation is in progress. If an S3 object is modified during an import, the operation will fail with error code `ObjectModifiedInS3DuringImport` and the message "The S3 object could not be imported because it was overwritten."

If you encounter this error, restart the import operation with a stable version of your S3 object. To avoid this issue, wait for the current import to complete before making changes to the source files.

## Consider importing without any Global Secondary Indexes
<a name="S3DataImport.BestPractices.GSI"></a>

The duration of an import task may depend on the presence of one or multiple global secondary indexes (GSIs). If you plan to establish indexes with partition keys that have low cardinality, you may see a faster import if you defer index creation until after the import task is finished (rather than including them in the import job).

**Note**  
Creating a GSI does not incur write charges, whether it is created during or after the import.