DynamoDBExportDataFormat - AWS Data Pipeline

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more

DynamoDBExportDataFormat

Applies a schema to an DynamoDB table to make it accessible by a Hive query. Use DynamoDBExportDataFormat with a HiveCopyActivity object and DynamoDBDataNode or S3DataNode input and output. DynamoDBExportDataFormat has the following benefits:

  • Provides both DynamoDB and Amazon S3 support

  • Allows you to filter data by certain columns in your Hive query

  • Exports all attributes from DynamoDB even if you have a sparse schema

Note

DynamoDB Boolean types are not mapped to Hive Boolean types. However, it is possible to map DynamoDB integer values of 0 or 1 to Hive Boolean types.

Example

The following example shows how to use HiveCopyActivity and DynamoDBExportDataFormat to copy data from one DynamoDBDataNode to another, while filtering based on a time stamp.

{ "objects": [ { "id" : "DataFormat.1", "name" : "DataFormat.1", "type" : "DynamoDBExportDataFormat", "column" : "timeStamp BIGINT" }, { "id" : "DataFormat.2", "name" : "DataFormat.2", "type" : "DynamoDBExportDataFormat" }, { "id" : "DynamoDBDataNode.1", "name" : "DynamoDBDataNode.1", "type" : "DynamoDBDataNode", "tableName" : "item_mapped_table_restore_temp", "schedule" : { "ref" : "ResourcePeriod" }, "dataFormat" : { "ref" : "DataFormat.1" } }, { "id" : "DynamoDBDataNode.2", "name" : "DynamoDBDataNode.2", "type" : "DynamoDBDataNode", "tableName" : "restore_table", "region" : "us_west_1", "schedule" : { "ref" : "ResourcePeriod" }, "dataFormat" : { "ref" : "DataFormat.2" } }, { "id" : "EmrCluster.1", "name" : "EmrCluster.1", "type" : "EmrCluster", "schedule" : { "ref" : "ResourcePeriod" }, "masterInstanceType" : "m1.xlarge", "coreInstanceCount" : "4" }, { "id" : "HiveTransform.1", "name" : "Hive Copy Transform.1", "type" : "HiveCopyActivity", "input" : { "ref" : "DynamoDBDataNode.1" }, "output" : { "ref" : "DynamoDBDataNode.2" }, "schedule" : { "ref" : "ResourcePeriod" }, "runsOn" : { "ref" : "EmrCluster.1" }, "filterSql" : "`timeStamp` > unix_timestamp(\"#{@scheduledStartTime}\", \"yyyy-MM-dd'T'HH:mm:ss\")" }, { "id" : "ResourcePeriod", "name" : "ResourcePeriod", "type" : "Schedule", "period" : "1 Hour", "startDateTime" : "2013-06-04T00:00:00", "endDateTime" : "2013-06-04T01:00:00" } ] }

Syntax

Optional Fields Description Slot Type
column Column name with datatype specified by each field for the data described by this data node. Ex: hostname STRING String
parent Parent of the current object from which slots will be inherited. Reference Object, e.g. "parent":{"ref":"myBaseObjectId"}

Runtime Fields Description Slot Type
@version Pipeline version the object was created with. String

System Fields Description Slot Type
@error Error describing the ill-formed object String
@pipelineId Id of the pipeline to which this object belongs to String
@sphere The sphere of an object denotes its place in the lifecycle: Component Objects give rise to Instance Objects which execute Attempt Objects String