AWS::DataPipeline::Pipeline
The AWS::DataPipeline::Pipeline resource specifies a data pipeline that you can use to automate the movement and transformation of data.
Important
AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.
In each pipeline, you define pipeline objects, such as activities, schedules, data nodes, and resources.
The AWS::DataPipeline::Pipeline resource adds tasks, schedules, and preconditions to the specified pipeline. You can use PutPipelineDefinition to populate a new pipeline.
PutPipelineDefinition also validates the configuration as it adds it to the pipeline. Changes to the pipeline are saved unless one of the following validation errors exists in the pipeline:
- An object is missing a name or identifier field.
- A string or reference field is empty.
- The number of objects in the pipeline exceeds the allowed maximum number of objects.
- The pipeline is in a FINISHED state.
Pipeline object definitions are passed to the PutPipelineDefinition action and returned by the GetPipelineDefinition action.
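For illustration, the following is a minimal sketch of how pipeline objects are encoded by this resource: a Default configuration object, a Schedule, and a ShellCommandActivity task. The HourlySchedule and EchoActivity object names, the echo command, and the myWorkerGroup worker group are hypothetical values chosen for this sketch; the Syntax and Examples sections below give the complete property reference.
YAML
MinimalPipeline:
  Type: AWS::DataPipeline::Pipeline
  Properties:
    Name: minimal-pipeline
    Activate: true
    PipelineObjects:
      # Every pipeline object is an Id and Name plus a list of Key/StringValue or Key/RefValue fields.
      - Id: "Default"
        Name: "Default"
        Fields:
          - Key: "type"
            StringValue: "Default"
          - Key: "scheduleType"
            StringValue: "cron"
          - Key: "schedule"
            RefValue: "HourlySchedule"
          - Key: "role"
            StringValue: "DataPipelineDefaultRole"
          - Key: "resourceRole"
            StringValue: "DataPipelineDefaultResourceRole"
      # A schedule object that the Default object references by ID.
      - Id: "HourlySchedule"
        Name: "HourlySchedule"
        Fields:
          - Key: "type"
            StringValue: "Schedule"
          - Key: "startAt"
            StringValue: "FIRST_ACTIVATION_DATE_TIME"
          - Key: "period"
            StringValue: "1 Hour"
      # A task that runs a shell command on workers registered to a worker group (hypothetical values).
      - Id: "EchoActivity"
        Name: "EchoActivity"
        Fields:
          - Key: "type"
            StringValue: "ShellCommandActivity"
          - Key: "command"
            StringValue: "echo hello"
          - Key: "workerGroup"
            StringValue: "myWorkerGroup"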
Syntax
To declare this entity in your AWS CloudFormation template, use the following syntax:
JSON
{ "Type" : "AWS::DataPipeline::Pipeline", "Properties" : { "Activate" :
Boolean
, "Description" :String
, "Name" :String
, "ParameterObjects" :[ ParameterObject, ... ]
, "ParameterValues" :[ ParameterValue, ... ]
, "PipelineObjects" :[ PipelineObject, ... ]
, "PipelineTags" :[ PipelineTag, ... ]
} }
YAML
Type: AWS::DataPipeline::Pipeline
Properties:
  Activate: Boolean
  Description: String
  Name: String
  ParameterObjects:
    - ParameterObject
  ParameterValues:
    - ParameterValue
  PipelineObjects:
    - PipelineObject
  PipelineTags:
    - PipelineTag
Properties
Activate
Indicates whether to validate and start the pipeline or stop an active pipeline. By default, the value is set to true.
Required: No
Type: Boolean
Update requires: No interruption
Description
A description of the pipeline.
Required: No
Type: String
Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*
Minimum: 0
Maximum: 1024
Update requires: Replacement
Name
The name of the pipeline.
Required: Yes
Type: String
Pattern: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\n\t]*
Minimum: 1
Maximum: 1024
Update requires: Replacement
ParameterObjects
The parameter objects used with the pipeline.
Required: No
Type: Array of ParameterObject
Update requires: No interruption
ParameterValues
The parameter values used with the pipeline.
Required: No
Type: Array of ParameterValue
Update requires: No interruption
PipelineObjects
The objects that define the pipeline. These objects overwrite the existing pipeline definition. Not all objects, fields, and values can be updated. For information about restrictions, see Editing Your Pipeline in the AWS Data Pipeline Developer Guide.
Required: No
Type: Array of PipelineObject
Update requires: No interruption
PipelineTags
A list of arbitrary tags (key-value pairs) to associate with the pipeline, which you can use to control permissions. For more information, see Controlling Access to Pipelines and Resources in the AWS Data Pipeline Developer Guide.
Required: No
Type: Array of PipelineTag
Update requires: No interruption
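The example at the end of this page doesn't use PipelineTags, so the following is a minimal sketch of how the ParameterObjects, ParameterValues, and PipelineTags properties fit together. The Environment tag, the myS3LogsPath parameter, and the amzn-s3-demo-bucket path are hypothetical values chosen for this sketch.
YAML
TaggedPipeline:
  Type: AWS::DataPipeline::Pipeline
  Properties:
    Name: tagged-pipeline
    # Tags are plain key-value pairs that IAM policies can match on.
    PipelineTags:
      - Key: "Environment"
        Value: "production"
    # A parameter object declares the parameter, its type, and its description...
    ParameterObjects:
      - Id: "myS3LogsPath"
        Attributes:
          - Key: "type"
            StringValue: "AWS::S3::ObjectKey"
          - Key: "description"
            StringValue: "S3 folder for pipeline logs"
    # ...and a parameter value supplies it. Pipeline objects can then reference it as #{myS3LogsPath}.
    ParameterValues:
      - Id: "myS3LogsPath"
        StringValue: "s3://amzn-s3-demo-bucket/logs"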
Return values
Ref
When you pass the logical ID of this resource to the intrinsic Ref function, Ref returns the pipeline ID.
For more information about using the Ref function, see Ref.
Fn::GetAtt
PipelineId
The ID of the pipeline.
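For example, a template that declares the DynamoDBInputS3OutputHive pipeline shown in the Examples section below could surface the pipeline ID through stack outputs; the output names here are arbitrary.
YAML
Outputs:
  PipelineIdFromRef:
    Description: "Pipeline ID returned by Ref"
    Value:
      Ref: "DynamoDBInputS3OutputHive"
  PipelineIdFromGetAtt:
    Description: "Pipeline ID returned by Fn::GetAtt"
    Value:
      Fn::GetAtt:
        - "DynamoDBInputS3OutputHive"
        - "PipelineId"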
Examples
The following data pipeline backs up data from an Amazon DynamoDB table to an Amazon S3 bucket. The pipeline uses the HiveCopyActivity activity to copy the data, and runs it once a day. The roles for the pipeline and the pipeline resource are declared elsewhere in the same template.
JSON
"DynamoDBInputS3OutputHive": { "Type": "AWS::DataPipeline::Pipeline", "Properties": { "Name": "DynamoDBInputS3OutputHive", "Description": "Pipeline to backup DynamoDB data to S3", "Activate": "true", "ParameterObjects": [ { "Id": "myDDBReadThroughputRatio", "Attributes": [ { "Key": "description", "StringValue": "DynamoDB read throughput ratio" }, { "Key": "type", "StringValue": "Double" }, { "Key": "default", "StringValue": "0.2" } ] }, { "Id": "myOutputS3Loc", "Attributes": [ { "Key": "description", "StringValue": "S3 output bucket" }, { "Key": "type", "StringValue": "AWS::S3::ObjectKey" }, { "Key": "default", "StringValue": { "Fn::Join" : [ "", [ "s3://", { "Ref": "S3OutputLoc" } ] ] } } ] }, { "Id": "myDDBTableName", "Attributes": [ { "Key": "description", "StringValue": "DynamoDB Table Name " }, { "Key": "type", "StringValue": "String" } ] } ], "ParameterValues": [ { "Id": "myDDBTableName", "StringValue": { "Ref": "TableName" } } ], "PipelineObjects": [ { "Id": "S3BackupLocation", "Name": "Copy data to this S3 location", "Fields": [ { "Key": "type", "StringValue": "S3DataNode" }, { "Key": "dataFormat", "RefValue": "DDBExportFormat" }, { "Key": "directoryPath", "StringValue": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}" } ] }, { "Id": "DDBSourceTable", "Name": "DDBSourceTable", "Fields": [ { "Key": "tableName", "StringValue": "#{myDDBTableName}" }, { "Key": "type", "StringValue": "DynamoDBDataNode" }, { "Key": "dataFormat", "RefValue": "DDBExportFormat" }, { "Key": "readThroughputPercent", "StringValue": "#{myDDBReadThroughputRatio}" } ] }, { "Id": "DDBExportFormat", "Name": "DDBExportFormat", "Fields": [ { "Key": "type", "StringValue": "DynamoDBExportDataFormat" } ] }, { "Id": "TableBackupActivity", "Name": "TableBackupActivity", "Fields": [ { "Key": "resizeClusterBeforeRunning", "StringValue": "true" }, { "Key": "type", "StringValue": "HiveCopyActivity" }, { "Key": "input", "RefValue": "DDBSourceTable" }, { "Key": "runsOn", "RefValue": "EmrClusterForBackup" }, { "Key": "output", "RefValue": "S3BackupLocation" } ] }, { "Id": "DefaultSchedule", "Name": "RunOnce", "Fields": [ { "Key": "occurrences", "StringValue": "1" }, { "Key": "startAt", "StringValue": "FIRST_ACTIVATION_DATE_TIME" }, { "Key": "type", "StringValue": "Schedule" }, { "Key": "period", "StringValue": "1 Day" } ] }, { "Id": "Default", "Name": "Default", "Fields": [ { "Key": "type", "StringValue": "Default" }, { "Key": "scheduleType", "StringValue": "cron" }, { "Key": "failureAndRerunMode", "StringValue": "CASCADE" }, { "Key": "role", "StringValue": "DataPipelineDefaultRole" }, { "Key": "resourceRole", "StringValue": "DataPipelineDefaultResourceRole" }, { "Key": "schedule", "RefValue": "DefaultSchedule" } ] }, { "Id": "EmrClusterForBackup", "Name": "EmrClusterForBackup", "Fields": [ { "Key": "terminateAfter", "StringValue": "2 Hours" }, { "Key": "amiVersion", "StringValue": "3.3.2" }, { "Key": "masterInstanceType", "StringValue": "m1.medium" }, { "Key": "coreInstanceType", "StringValue": "m1.medium" }, { "Key": "coreInstanceCount", "StringValue": "1" }, { "Key": "type", "StringValue": "EmrCluster" } ] } ] } }
YAML
DynamoDBInputS3OutputHive:
  Type: AWS::DataPipeline::Pipeline
  Properties:
    Name: DynamoDBInputS3OutputHive
    Description: "Pipeline to backup DynamoDB data to S3"
    Activate: true
    ParameterObjects:
      - Id: "myDDBReadThroughputRatio"
        Attributes:
          - Key: "description"
            StringValue: "DynamoDB read throughput ratio"
          - Key: "type"
            StringValue: "Double"
          - Key: "default"
            StringValue: "0.2"
      - Id: "myOutputS3Loc"
        Attributes:
          - Key: "description"
            StringValue: "S3 output bucket"
          - Key: "type"
            StringValue: "AWS::S3::ObjectKey"
          - Key: "default"
            StringValue:
              Fn::Join:
                - ""
                - - "s3://"
                  - Ref: "S3OutputLoc"
      - Id: "myDDBTableName"
        Attributes:
          - Key: "description"
            StringValue: "DynamoDB Table Name "
          - Key: "type"
            StringValue: "String"
    ParameterValues:
      - Id: "myDDBTableName"
        StringValue:
          Ref: "TableName"
    PipelineObjects:
      - Id: "S3BackupLocation"
        Name: "Copy data to this S3 location"
        Fields:
          - Key: "type"
            StringValue: "S3DataNode"
          - Key: "dataFormat"
            RefValue: "DDBExportFormat"
          - Key: "directoryPath"
            StringValue: "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"
      - Id: "DDBSourceTable"
        Name: "DDBSourceTable"
        Fields:
          - Key: "tableName"
            StringValue: "#{myDDBTableName}"
          - Key: "type"
            StringValue: "DynamoDBDataNode"
          - Key: "dataFormat"
            RefValue: "DDBExportFormat"
          - Key: "readThroughputPercent"
            StringValue: "#{myDDBReadThroughputRatio}"
      - Id: "DDBExportFormat"
        Name: "DDBExportFormat"
        Fields:
          - Key: "type"
            StringValue: "DynamoDBExportDataFormat"
      - Id: "TableBackupActivity"
        Name: "TableBackupActivity"
        Fields:
          - Key: "resizeClusterBeforeRunning"
            StringValue: "true"
          - Key: "type"
            StringValue: "HiveCopyActivity"
          - Key: "input"
            RefValue: "DDBSourceTable"
          - Key: "runsOn"
            RefValue: "EmrClusterForBackup"
          - Key: "output"
            RefValue: "S3BackupLocation"
      - Id: "DefaultSchedule"
        Name: "RunOnce"
        Fields:
          - Key: "occurrences"
            StringValue: "1"
          - Key: "startAt"
            StringValue: "FIRST_ACTIVATION_DATE_TIME"
          - Key: "type"
            StringValue: "Schedule"
          - Key: "period"
            StringValue: "1 Day"
      - Id: "Default"
        Name: "Default"
        Fields:
          - Key: "type"
            StringValue: "Default"
          - Key: "scheduleType"
            StringValue: "cron"
          - Key: "failureAndRerunMode"
            StringValue: "CASCADE"
          - Key: "role"
            StringValue: "DataPipelineDefaultRole"
          - Key: "resourceRole"
            StringValue: "DataPipelineDefaultResourceRole"
          - Key: "schedule"
            RefValue: "DefaultSchedule"
      - Id: "EmrClusterForBackup"
        Name: "EmrClusterForBackup"
        Fields:
          - Key: "terminateAfter"
            StringValue: "2 Hours"
          - Key: "amiVersion"
            StringValue: "3.3.2"
          - Key: "masterInstanceType"
            StringValue: "m1.medium"
          - Key: "coreInstanceType"
            StringValue: "m1.medium"
          - Key: "coreInstanceCount"
            StringValue: "1"
          - Key: "type"
            StringValue: "EmrCluster"
See also
- Pipeline Object Reference in the AWS Data Pipeline Developer Guide.
- PutPipelineDefinition in the AWS Data Pipeline API Reference.