DynamoDB table export output format

A DynamoDB table export includes manifest files in addition to the files containing your table data. These files are all saved in the Amazon S3 bucket that you specify in your export request. The following sections describe the format and contents of each output object.

Full exports

    Manifest files

    DynamoDB creates manifest files, along with their checksum files, in the specified S3 bucket for each export request.

    export-prefix/AWSDynamoDB/ExportId/manifest-summary.json
    export-prefix/AWSDynamoDB/ExportId/manifest-summary.checksum
    export-prefix/AWSDynamoDB/ExportId/manifest-files.json
    export-prefix/AWSDynamoDB/ExportId/manifest-files.checksum

    You choose an export-prefix when you request a table export. This helps you keep files in the destination S3 bucket organized. The ExportId is a unique token generated by the service to ensure that multiple exports to the same S3 bucket and export-prefix don't overwrite each other.
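    The key layout above can be sketched in a few lines of Python. This is only an illustration of the naming pattern: the helper name manifest_keys is hypothetical, and the handling of an empty prefix is an assumption rather than documented behavior.

```python
def manifest_keys(export_prefix: str, export_id: str) -> dict[str, str]:
    """Build the S3 keys of the four manifest objects for one export."""
    # Assumption: an empty prefix means the AWSDynamoDB folder sits at the bucket root.
    parts = (export_prefix.strip("/"), "AWSDynamoDB", export_id)
    base = "/".join(p for p in parts if p)
    names = (
        "manifest-summary.json",
        "manifest-summary.checksum",
        "manifest-files.json",
        "manifest-files.checksum",
    )
    return {name: f"{base}/{name}" for name in names}


keys = manifest_keys("2020-Nov", "01693685827463-2d8752fd")
print(keys["manifest-summary.json"])
```

    Because the ExportId is part of every key, two exports that share a bucket and prefix produce disjoint key sets.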

    The export creates at least one file per partition. For an empty partition, the export creates an empty file. All of the items in each file come from that partition's hashed keyspace.

    Note

    DynamoDB also creates an empty file named _started in the same directory as the manifest files. This file verifies that the destination bucket is writable and that the export has begun. It can safely be deleted.

    The summary manifest

    The manifest-summary.json file contains summary information about the export job. This allows you to know which data files in the shared data folder are associated with this export. Its format is as follows:

    {
        "version": "2020-06-30",
        "exportArn": "arn:aws:dynamodb:us-east-1:123456789012:table/ProductCatalog/export/01234567890123-a1b2c3d4",
        "startTime": "2020-11-04T07:28:34.028Z",
        "endTime": "2020-11-04T07:33:43.897Z",
        "tableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/ProductCatalog",
        "tableId": "12345a12-abcd-123a-ab12-1234abc12345",
        "exportTime": "2020-11-04T07:28:34.028Z",
        "s3Bucket": "ddb-productcatalog-export",
        "s3Prefix": "2020-Nov",
        "s3SseAlgorithm": "AES256",
        "s3SseKmsKeyId": null,
        "manifestFilesS3Key": "AWSDynamoDB/01693685827463-2d8752fd/manifest-files.json",
        "billedSizeBytes": 0,
        "itemCount": 8,
        "outputFormat": "DYNAMODB_JSON",
        "exportType": "FULL_EXPORT"
    }

    The files manifest

    The manifest-files.json file contains information about the files that contain your exported table data. The file is in JSON lines format, so newlines are used as item delimiters. In the following example, the details of one data file from a files manifest are formatted on multiple lines for the sake of readability.

    {
        "itemCount": 8,
        "md5Checksum": "sQMSpEILNgoQmarvDFonGQ==",
        "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
        "dataFileS3Key": "AWSDynamoDB/01693685827463-2d8752fd/data/asdl123dasas.json.gz"
    }
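    As a rough sketch of how the files manifest might be consumed, the following Python reads it one line at a time and checks a data file against its md5Checksum. It assumes that md5Checksum is the Base64-encoded MD5 digest of the data file's bytes. The function names and the sample bytes are hypothetical, and downloading from S3 is left out.

```python
import base64
import hashlib
import json


def parse_files_manifest(text: str) -> list[dict]:
    """Parse manifest-files.json: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def md5_matches(data: bytes, entry: dict) -> bool:
    """Compare the Base64-encoded MD5 of data with the entry's md5Checksum.

    Assumption: md5Checksum is the Base64 MD5 digest of the stored object.
    """
    digest = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return digest == entry["md5Checksum"]


# Hypothetical stand-ins for a downloaded data file and its manifest line.
sample_data = b"example bytes"
sample_line = json.dumps({
    "itemCount": 1,
    "md5Checksum": base64.b64encode(hashlib.md5(sample_data).digest()).decode("ascii"),
    "etag": "hypothetical-etag",
    "dataFileS3Key": "AWSDynamoDB/hypothetical-export-id/data/hypothetical.json.gz",
})

entries = parse_files_manifest(sample_line + "\n")
print(entries[0]["dataFileS3Key"], md5_matches(sample_data, entries[0]))
```

    Summing itemCount over all entries and comparing the total with itemCount in the summary manifest is another inexpensive consistency check.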

    Data files

    DynamoDB can export your table data in two formats: DynamoDB JSON and Amazon Ion. Regardless of the format you choose, your data will be written to multiple compressed files named by the keys. These files are also listed in the manifest-files.json file.

    After a full export, your Amazon S3 bucket contains all of your manifest files and data files under the export ID folder.

    amzn-s3-demo-bucket/DestinationPrefix
    .
    └── AWSDynamoDB
        ├── 01693685827463-2d8752fd     // the single full export
        │   ├── manifest-files.json     // manifest points to files under 'data' subfolder
        │   ├── manifest-files.checksum
        │   ├── manifest-summary.json   // stores metadata about request
        │   ├── manifest-summary.md5
        │   ├── data                    // The data exported by full export
        │   │   ├── asdl123dasas.json.gz
        │   │   ...
        │   └── _started                // empty file for permission check

    DynamoDB JSON

    A table export in DynamoDB JSON format consists of multiple Item objects. Each individual object is in DynamoDB's standard marshalled JSON format.

    If you create a custom parser for DynamoDB JSON export data, note that the format is JSON Lines: newlines are used as item delimiters. Many AWS services, such as Athena and AWS Glue, parse this format automatically.

    In the following example, a single item from a DynamoDB JSON export has been formatted on multiple lines for the sake of readability.

    {
        "Item":{
            "Authors":{
                "SS":[
                    "Author1",
                    "Author2"
                ]
            },
            "Dimensions":{
                "S":"8.5 x 11.0 x 1.5"
            },
            "ISBN":{
                "S":"333-3333333333"
            },
            "Id":{
                "N":"103"
            },
            "InPublication":{
                "BOOL":false
            },
            "PageCount":{
                "N":"600"
            },
            "Price":{
                "N":"2000"
            },
            "ProductCategory":{
                "S":"Book"
            },
            "Title":{
                "S":"Book 103 Title"
            }
        }
    }
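    A custom parser for this format could look roughly like the Python sketch below. It decompresses a data file, splits it on newlines, and converts the type-tagged attribute values to plain Python values. The function names are hypothetical, binary types are left as their encoded strings, and the gzip bytes are built in memory instead of being downloaded from S3.

```python
import gzip
import json
from decimal import Decimal


def unmarshal(attr: dict):
    """Convert one type-tagged DynamoDB attribute value to a Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        return Decimal(value)          # numbers arrive as strings
    if type_tag == "NULL":
        return None
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(n) for n in value}
    if type_tag == "L":
        return [unmarshal(v) for v in value]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    # S and BOOL need no conversion; B and BS are left as encoded text here.
    return value


def read_items(gz_bytes: bytes):
    """Yield one plain dict per line of a gzip-compressed data file."""
    for line in gzip.decompress(gz_bytes).decode("utf-8").splitlines():
        if line.strip():
            item = json.loads(line)["Item"]
            yield {name: unmarshal(attr) for name, attr in item.items()}


# In-memory stand-in for a downloaded .json.gz data file.
line = (
    '{"Item":{"Id":{"N":"103"},"Title":{"S":"Book 103 Title"},'
    '"InPublication":{"BOOL":false},"Authors":{"SS":["Author1","Author2"]}}}'
)
items = list(read_items(gzip.compress((line + "\n").encode("utf-8"))))
print(items[0]["Title"], items[0]["Id"])
```

    Decimal is used for numbers here so that values are not rounded the way a float conversion might round them.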

    Amazon Ion

    Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format built to address rapid development, decoupling, and efficiency challenges faced every day while engineering large-scale, service-oriented architectures. DynamoDB supports exporting table data in Ion's text format, which is a superset of JSON.

    When you export a table to Ion format, the DynamoDB datatypes used in the table are mapped to Ion datatypes. DynamoDB sets use Ion type annotations to disambiguate the datatype used in the source table.

    The following table lists the mapping of DynamoDB data types to Ion data types:

    DynamoDB data type    Ion representation
    String (S)            string
    Boolean (BOOL)        bool
    Number (N)            decimal
    Binary (B)            blob
    Set (SS, NS, BS)      list (with type annotation $dynamodb_SS, $dynamodb_NS, or $dynamodb_BS)
    List                  list
    Map                   struct

    Items in an Ion export are delimited by newlines. Each line begins with an Ion version marker, followed by an item in Ion format. In the following example, an item from an Ion export has been formatted on multiple lines for the sake of readability.

    $ion_1_0 {
        Item:{
            Authors:$dynamodb_SS::["Author1","Author2"],
            Dimensions:"8.5 x 11.0 x 1.5",
            ISBN:"333-3333333333",
            Id:103.,
            InPublication:false,
            PageCount:6d2,
            Price:2d3,
            ProductCategory:"Book",
            Title:"Book 103 Title"
        }
    }
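    The decimal literals in the Ion example (103., 6d2, 2d3) can look unfamiliar. In Ion text, d marks a decimal exponent, so 6d2 is 6 × 10^2. The short Python sketch below only illustrates how those tokens relate to the values in the DynamoDB JSON example. The function name is hypothetical and this is not an Ion parser; a full Ion library would be the usual choice for real data.

```python
from decimal import Decimal


def ion_decimal_to_python(token: str) -> Decimal:
    """Convert an Ion text decimal token such as '6d2' or '103.' to Decimal."""
    # Ion marks the decimal exponent with 'd' or 'D'; Python's Decimal expects 'e'.
    return Decimal(token.replace("d", "e").replace("D", "e"))


for token in ("103.", "6d2", "2d3"):
    print(token, "=", int(ion_decimal_to_python(token)))
```

    The three tokens correspond to the Id, PageCount, and Price values 103, 600, and 2000.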

Incremental exports

    Manifest files

    DynamoDB creates manifest files, along with their checksum files, in the specified S3 bucket for each export request.

    export-prefix/AWSDynamoDB/ExportId/manifest-summary.json
    export-prefix/AWSDynamoDB/ExportId/manifest-summary.checksum
    export-prefix/AWSDynamoDB/ExportId/manifest-files.json
    export-prefix/AWSDynamoDB/ExportId/manifest-files.checksum

    You choose an export-prefix when you request a table export. This helps you keep files in the destination S3 bucket organized. The ExportId is a unique token generated by the service to ensure that multiple exports to the same S3 bucket and export-prefix don't overwrite each other.

    The export creates at least one file per partition. For an empty partition, the export creates an empty file. All of the items in each file come from that partition's hashed keyspace.

    Note

    DynamoDB also creates an empty file named _started in the same directory as the manifest files. This file verifies that the destination bucket is writable and that the export has begun. It can safely be deleted.

    The summary manifest

    The manifest-summary.json file contains summary information about the export job. This allows you to know which data files in the shared data folder are associated with this export. Its format is as follows:

    {
        "version": "2023-08-01",
        "exportArn": "arn:aws:dynamodb:us-east-1:599882009758:table/export-test/export/01695097218000-d6299cbd",
        "startTime": "2023-09-19T04:20:18.000Z",
        "endTime": "2023-09-19T04:40:24.780Z",
        "tableArn": "arn:aws:dynamodb:us-east-1:599882009758:table/export-test",
        "tableId": "b116b490-6460-4d4a-9a6b-5d360abf4fb3",
        "exportFromTime": "2023-09-18T17:00:00.000Z",
        "exportToTime": "2023-09-19T04:00:00.000Z",
        "s3Bucket": "jason-exports",
        "s3Prefix": "20230919-prefix",
        "s3SseAlgorithm": "AES256",
        "s3SseKmsKeyId": null,
        "manifestFilesS3Key": "20230919-prefix/AWSDynamoDB/01693685934212-ac809da5/manifest-files.json",
        "billedSizeBytes": 20901239349,
        "itemCount": 169928274,
        "outputFormat": "DYNAMODB_JSON",
        "outputView": "NEW_AND_OLD_IMAGES",
        "exportType": "INCREMENTAL_EXPORT"
    }

    The files manifest

    The manifest-files.json file contains information about the files that contain your exported table data. The file is in JSON lines format, so newlines are used as item delimiters. In the following example, the details of one data file from a files manifest are formatted on multiple lines for the sake of readability.

    {
        "itemCount": 8,
        "md5Checksum": "sQMSpEILNgoQmarvDFonGQ==",
        "etag": "af83d6f217c19b8b0fff8023d8ca4716-1",
        "dataFileS3Key": "AWSDynamoDB/data/sgad6417s6vss4p7owp0471bcq.json.gz"
    }

    Data files

    DynamoDB can export your table data in two formats: DynamoDB JSON and Amazon Ion. Regardless of the format you choose, your data will be written to multiple compressed files named by the keys. These files are also listed in the manifest-files.json file.

    The data files for incremental exports are all contained in a common data folder in your S3 bucket. Your manifest files are under your export ID folder.

    amzn-s3-demo-bucket/DestinationPrefix
    .
    └── AWSDynamoDB
        ├── 01693685934212-ac809da5     // an incremental export ID
        │   ├── manifest-files.json     // manifest points to files under 'data' folder
        │   ├── manifest-files.checksum
        │   ├── manifest-summary.json   // stores metadata about request
        │   ├── manifest-summary.md5
        │   └── _started                // empty file for permission check
        ├── 01693686034521-ac809da5
        │   ├── manifest-files.json
        │   ├── manifest-files.checksum
        │   ├── manifest-summary.json
        │   ├── manifest-summary.md5
        │   └── _started
        ├── data                        // stores all the data files for incremental exports
        │   ├── sgad6417s6vss4p7owp0471bcq.json.gz
        │   ...

    In your export files, each item’s output includes a timestamp that represents when that item was updated in your table and a data structure that indicates whether the change was an insert, update, or delete operation. The timestamp is based on an internal system clock and can vary from your application clock. For incremental exports, you can choose between two export view types for your output structure: new and old images or new images only.

    • New image provides the latest state of the item

    • Old image provides the state of the item right before the specified start date and time

    View types can be helpful if you want to see how an item changed within the export period. They can also be useful for efficiently updating your downstream systems, especially if those systems have a partition key that is not the same as your DynamoDB partition key.

    You can infer whether an item in your incremental export output was an insert, update, or delete by looking at the structure of the output. The incremental export structure and its corresponding operations are summarized in the table below for both export view types.

    Operation          New images only     New and old images
    Insert             Keys + new image    Keys + new image
    Update             Keys + new image    Keys + new image + old image
    Delete             Keys                Keys + old image
    Insert + delete    No output           No output

    DynamoDB JSON

    A table export in DynamoDB JSON format consists of a metadata timestamp that indicates the write time of the item, followed by the keys of the item and the values. The following example shows DynamoDB JSON output with the new and old images export view type.

    // Ex 1: Insert
    //   An insert means the item did not exist before the incremental export window
    //   and was added during the incremental export window
    {
        "Metadata": {
            "WriteTimestampMicros": "1680109764000000"
        },
        "Keys": {
            "PK": {
                "S": "CUST#100"
            }
        },
        "NewImage": {
            "PK": {
                "S": "CUST#100"
            },
            "FirstName": {
                "S": "John"
            },
            "LastName": {
                "S": "Don"
            }
        }
    }

    // Ex 2: Update
    //   An update means the item existed before the incremental export window
    //   and was updated during the incremental export window.
    //   The OldImage would not be present if choosing "New images only".
    {
        "Metadata": {
            "WriteTimestampMicros": "1680109764000000"
        },
        "Keys": {
            "PK": {
                "S": "CUST#200"
            }
        },
        "OldImage": {
            "PK": {
                "S": "CUST#200"
            },
            "FirstName": {
                "S": "Mary"
            },
            "LastName": {
                "S": "Grace"
            }
        },
        "NewImage": {
            "PK": {
                "S": "CUST#200"
            },
            "FirstName": {
                "S": "Mary"
            },
            "LastName": {
                "S": "Smith"
            }
        }
    }

    // Ex 3: Delete
    //   A delete means the item existed before the incremental export window
    //   and was deleted during the incremental export window
    //   The OldImage would not be present if choosing "New images only".
    {
        "Metadata": {
            "WriteTimestampMicros": "1680109764000000"
        },
        "Keys": {
            "PK": {
                "S": "CUST#300"
            }
        },
        "OldImage": {
            "PK": {
                "S": "CUST#300"
            },
            "FirstName": {
                "S": "Jose"
            },
            "LastName": {
                "S": "Hernandez"
            }
        }
    }

    // Ex 4: Insert + Delete
    //   Nothing is exported if an item is inserted and deleted within the
    //   incremental export window.
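    The inference rules from the operations table can be sketched in Python as follows. The function names and the old_images_exported flag are hypothetical; the flag stands for whether the export used the new and old images view. With new images only, inserts and updates have the same shape, so the sketch reports them together. WriteTimestampMicros is assumed to be microseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone


def classify(record: dict, old_images_exported: bool = True) -> str:
    """Infer the operation from which images are present in a record."""
    has_new = "NewImage" in record
    has_old = "OldImage" in record
    if not old_images_exported:
        # New images only: inserts and updates both appear as Keys + NewImage.
        return "insert or update" if has_new else "delete"
    if has_new and has_old:
        return "update"
    return "insert" if has_new else "delete"


def write_time(record: dict) -> datetime:
    """Convert WriteTimestampMicros (assumed Unix epoch microseconds) to UTC."""
    micros = int(record["Metadata"]["WriteTimestampMicros"])
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)


# Trimmed versions of the example records, keeping only what the sketch reads.
meta = {"WriteTimestampMicros": "1680109764000000"}
insert = {"Metadata": meta, "Keys": {}, "NewImage": {}}
update = {"Metadata": meta, "Keys": {}, "OldImage": {}, "NewImage": {}}
delete = {"Metadata": meta, "Keys": {}, "OldImage": {}}

for record in (insert, update, delete):
    print(classify(record), write_time(record).isoformat())
```

    An item that was inserted and then deleted within the export window produces no record, so there is no case for it here.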

    Amazon Ion

    Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format built to address rapid development, decoupling, and efficiency challenges faced every day while engineering large-scale, service-oriented architectures. DynamoDB supports exporting table data in Ion's text format, which is a superset of JSON.

    When you export a table to Ion format, the DynamoDB datatypes used in the table are mapped to Ion datatypes. DynamoDB sets use Ion type annotations to disambiguate the datatype used in the source table.

    The following table lists the mapping of DynamoDB data types to Ion data types:

    DynamoDB data type    Ion representation
    String (S)            string
    Boolean (BOOL)        bool
    Number (N)            decimal
    Binary (B)            blob
    Set (SS, NS, BS)      list (with type annotation $dynamodb_SS, $dynamodb_NS, or $dynamodb_BS)
    List                  list
    Map                   struct

    Items in an Ion export are delimited by newlines. Each line begins with an Ion version marker, followed by an item in Ion format. In the following example, an item from an Ion export has been formatted on multiple lines for the sake of readability.

    $ion_1_0 {
        Record:{
            Keys:{
                ISBN:"333-3333333333"
            },
            Metadata:{
                WriteTimestampMicros:1684374845117899.
            },
            OldImage:{
                Authors:$dynamodb_SS::["Author1","Author2"],
                ISBN:"333-3333333333",
                Id:103.,
                InPublication:false,
                ProductCategory:"Book",
                Title:"Book 103 Title"
            },
            NewImage:{
                Authors:$dynamodb_SS::["Author1","Author2"],
                Dimensions:"8.5 x 11.0 x 1.5",
                ISBN:"333-3333333333",
                Id:103.,
                InPublication:true,
                PageCount:6d2,
                Price:2d3,
                ProductCategory:"Book",
                Title:"Book 103 Title"
            }
        }
    }