Ec2Resource - AWS Data Pipeline

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more

Ec2Resource

An Amazon EC2 instance that performs the work defined by a pipeline activity.

AWS Data Pipeline now supports IMDSv2 for the Amazon EC2 instance, which uses a session-oriented method to better handle authentication when retrieving metadata information from instances. A session begins and ends a series of requests that software running on an Amazon EC2 instance uses to access the locally-stored Amazon EC2 instance metadata and credentials. The software starts a session with a simple HTTP PUT request to IMDSv2. IMDSv2 returns a secret token to the software running on the Amazon EC2 instance, which will use the token as a password to make requests to IMDSv2 for metadata and credentials.

Note

To use IMDSv2 for your Amazon EC2 instance, you will need to modify the settings, as the default AMI is not compatible with IMDSv2. You can specify a new AMI version which you can retrieve via the following SSM parameter: /aws/service/ami-amazon-linux-latest/amzn-ami-hvm-x86_64-ebs.

For information about default Amazon EC2 instances that AWS Data Pipeline creates if you do not specify an instance, see Default Amazon EC2 Instances by AWS Region.

Examples

EC2-Classic

Important

Only AWS accounts created before December 4, 2013 support the EC2-Classic platform. If you have one of these accounts, you may have the option to create EC2Resource objects for a pipeline in an EC2-Classic network rather than a VPC. We strongly recommend that you create resources for all your pipelines in VPCs. In addition, if you have existing resources in EC2-Classic, we recommend that you migrate them to a VPC.

The following example object launches an EC2 instance into EC2-Classic, with some optional fields set.

{ "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", "instanceType" : "m5.large", "securityGroups" : [ "test-group", "default" ], "keyPair" : "my-key-pair" }

EC2-VPC

The following example object launches an EC2 instance into a nondefault VPC, with some optional fields set.

{ "id" : "MyEC2Resource", "type" : "Ec2Resource", "actionOnTaskFailure" : "terminate", "actionOnResourceFailure" : "retryAll", "maximumRetries" : "1", "instanceType" : "m5.large", "securityGroupIds" : [ "sg-12345678", "sg-12345678" ], "subnetId": "subnet-12345678", "associatePublicIpAddress": "true", "keyPair" : "my-key-pair" }

Syntax

Required Fields Description Slot Type
resourceRole The IAM role that controls the resources that the Amazon EC2 instance can access. String
role The IAM role that AWS Data Pipeline uses to create the EC2 instance. String

Object Invocation Fields Description Slot Type
schedule

This object is invoked within the execution of a schedule interval.

To set the dependency execution order for this object,specify a schedule reference to another object. You can do this in one of the following ways:

  • To ensure that all objects in the pipeline inherit the schedule, set a schedule on the object explicitly: "schedule": {"ref": "DefaultSchedule"}. In most cases, it is useful to put the schedule reference on the default pipeline object, so that all objects inherit that schedule.

  • If the pipeline has schedules nested within the master schedule, you can create a parent object that has a schedule reference. For more information about example optional schedule configurations, see https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html

Reference Object, for example "schedule":{"ref":"myScheduleId"}

Optional Fields Description Slot Type
actionOnResourceFailure The action taken after a resource failure for this resource. Valid values are "retryall" and "retrynone". String
actionOnTaskFailure The action taken after a task failure for this resource. Valid values are "continue" or "terminate". String
associatePublicIpAddress Indicates whether to assign a public IP address to the instance. If the instance is in Amazon EC2 or Amazon VPC, the default value is true. Otherwise, the default value is false. Boolean
attemptStatus Most recently reported status from the remote activity. String
attemptTimeout Timeout for the remote work completion. If set, then a remote activity that does not complete within the specified starting time may be retried. Period
availabilityZone The Availability Zone in which to launch the Amazon EC2 instance. String
disableIMDSv1 Default value is false and enables both IMDSv1 and IMDSv2. If you set it to true then it disables IMDSv1 and only provides IMDSv2s Boolean
failureAndRerunMode Describes consumer node behavior when dependencies fail or are rerun. Enumeration
httpProxy The proxy host that clients use to connect to AWS services. Reference Object, for example, "httpProxy":{"ref":"myHttpProxyId"}
imageId

The ID of the AMI to use for the instance. By default, AWS Data Pipeline uses the HVM AMI virtualization type. The specific AMI IDs used are based on a Region. You can overwrite the default AMI by specifying the HVM AMI of your choice. For more information about AMI types, see Linux AMI Virtualization Types and Finding a Linux AMI in the Amazon EC2 User Guide.

String
initTimeout The amount of time to wait for the resource to start. Period
instanceCount Deprecated. Integer
instanceType The type of Amazon EC2 instance to start. String
keyPair The name of the key pair. If you launch an Amazon EC2 instance without specifying a key pair, you cannot log on to it. String
lateAfterTimeout The elapsed time after pipeline start within which the object must complete. It is triggered only when the schedule type is not set to ondemand. Period
maxActiveInstances The maximum number of concurrent active instances of a component. Re-runs do not count toward the number of active instances. Integer
maximumRetries The maximum number of attempt retries on failure. Integer
minInstanceCount Deprecated. Integer
onFail An action to run when the current object fails. Reference Object, for example "onFail":{"ref":"myActionId"}
onLateAction Actions that should be triggered if an object has not yet been scheduled or is still running. Reference Object, for example"onLateAction":{"ref":"myActionId"}
onSuccess An action to run when the current object succeeds. Reference Object, for example, "onSuccess":{"ref":"myActionId"}
parent The parent of the current object from which slots are inherited. Reference Object, for example, "parent":{"ref":"myBaseObjectId"}
pipelineLogUri The Amazon S3 URI (such as 's3://BucketName/Key/') for uploading logs for the pipeline. String
region The code for the Region in which the Amazon EC2 instance should run. By default, the instance runs in the same Region as the pipeline. You can run the instance in the same Region as a dependent dataset. Enumeration
reportProgressTimeout The timeout for remote work successive calls to reportProgress. If set, then remote activities that do not report progress for the specified period may be considered stalled and will be retried. Period
retryDelay The timeout duration between two retry attempts. Period
runAsUser The user to run the TaskRunner. String
runsOn This field is not allowed on this object. Reference Object, for example, "runsOn":{"ref":"myResourceId"}
scheduleType

Schedule type allows you to specify whether the objects in your pipeline definition should be scheduled at the beginning of an interval, at the end of the interval, or on demand.

Values are:

  • timeseries. Instances are scheduled at the end of each interval.

  • cron. Instances are scheduled at the beginning of each interval.

  • ondemand. Allows you to run a pipeline one time per activation. You do not have to clone or re-create the pipeline to run it again. If you use an on-demand schedule, it must be specified in the default object and must be the only scheduleType specified for objects in the pipeline. To use on-demand pipelines, call the ActivatePipeline operation for each subsequent run.

Enumeration
securityGroupIds The IDs of one or more Amazon EC2 security groups to use for the instances in the resource pool. String
securityGroups One or more Amazon EC2 security groups to use for the instances in the resource pool. String
spotBidPrice The maximum amount per hour for your Spot Instance in dollars, which is a decimal value between 0 and 20.00, exclusive. String
subnetId The ID of the Amazon EC2 subnet in which to start the instance. String
terminateAfter The number of hours after which to terminate the resource. Period
useOnDemandOnLastAttempt On the last attempt to request a Spot Instance, make a request for On-Demand Instances rather than a Spot Instance. This ensures that if all previous attempts have failed, the last attempt is not interrupted. Boolean
workerGroup This field is not allowed on this object. String

Runtime Fields Description Slot Type
@activeInstances List of the currently scheduled active instance objects. Reference Object, for example, "activeInstances":{"ref":"myRunnableObjectId"}
@actualEndTime Time when the execution of this object finished. DateTime
@actualStartTime Time when the execution of this object started. DateTime
cancellationReason The cancellationReason if this object was cancelled. String
@cascadeFailedOn Description of the dependency chain on which the object failed. Reference Object, for example, "cascadeFailedOn":{"ref":"myRunnableObjectId"}
emrStepLog Step logs are available only on Amazon EMR activity attempts. String
errorId The error ID if this object failed. String
errorMessage The error message if this object failed. String
errorStackTrace The error stack trace if this object failed. String
@failureReason The reason for the resource failure. String
@finishedTime The time at which this object finished its execution. DateTime
hadoopJobLog Hadoop job logs available on attempts for Amazon EMR activities. String
@healthStatus The health status of the object that reflects success or failure of the last object instance that reached a terminated state. String
@healthStatusFromInstanceId Id of the last instance object that reached a terminated state. String
@healthStatusUpdatedTime Time at which the health status was updated last time. DateTime
hostname The host name of client that picked up the task attempt. String
@lastDeactivatedTime The time at which this object was last deactivated. DateTime
@latestCompletedRunTime Time the latest run for which the execution completed. DateTime
@latestRunTime Time the latest run for which the execution was scheduled. DateTime
@nextRunTime Time of run to be scheduled next. DateTime
reportProgressTime The most recent time that the remote activity reported progress. DateTime
@scheduledEndTime The schedule end time for the object. DateTime
@scheduledStartTime The schedule start time for the object. DateTime
@status The status of this object. String
@version The pipeline version with which the object was created. String
@waitingOn Description of the list of dependencies on which this object is waiting. Reference Object, for example, "waitingOn":{"ref":"myRunnableObjectId"}

System Fields Description Slot Type
@error Error describing the ill-formed object. String
@pipelineId ID of the pipeline to which this object belongs. String
@sphere The place of an object in the lifecycle. Component objects give rise to instance objects, which execute attempt objects. String