

AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. [Learn more](https://aws.amazon.com/blogs/big-data/migrate-workloads-from-aws-data-pipeline/)

# What is AWS Data Pipeline?
<a name="what-is-datapipeline"></a>

**Note**  
The AWS Data Pipeline service is in maintenance mode, and no new features or Region expansions are planned. To learn more and to find out how to migrate your existing workloads, see [Migrating workloads from AWS Data Pipeline](migration.md).

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. You define the parameters of your data transformations and AWS Data Pipeline enforces the logic that you've set up. 

The following components of AWS Data Pipeline work together to manage your data:
+ A *pipeline definition* specifies the business logic of your data management. For more information, see [Pipeline definition file syntax](dp-writing-pipeline-definition.md). 
+ A *pipeline* schedules and runs tasks by creating Amazon EC2 instances to perform the defined work activities. You upload your pipeline definition to the pipeline, and then activate the pipeline. You can edit the pipeline definition for a running pipeline and activate the pipeline again for it to take effect. You can deactivate the pipeline, modify a data source, and then activate the pipeline again. When you are finished with your pipeline, you can delete it.
+ *Task Runner* polls for tasks and then performs those tasks. For example, Task Runner could copy log files to Amazon S3 and launch Amazon EMR clusters. Task Runner is installed and runs automatically on resources created by your pipeline definitions. You can write a custom task runner application, or you can use the Task Runner application that is provided by AWS Data Pipeline. For more information, see [Task Runners](dp-how-remote-taskrunner-client.md).
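
The poll-then-execute pattern that Task Runner follows can be sketched in a few lines. This is a minimal, self-contained illustration only; the in-memory queue, task IDs, and no-op handler are hypothetical stand-ins for the AWS Data Pipeline task service APIs that a real runner would call.

```python
import queue

# Hypothetical in-memory stand-in for the pipeline's task list; a real
# task runner would poll the AWS Data Pipeline web service instead.
tasks = queue.Queue()
tasks.put({"taskId": "copy-logs", "action": "copy log files to Amazon S3"})
tasks.put({"taskId": "launch-emr", "action": "launch an Amazon EMR cluster"})

def poll_and_run(task_queue, handler):
    """Poll for tasks, perform each one, and record its final status."""
    results = {}
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            break  # a real runner would sleep and long-poll again here
        try:
            handler(task)  # perform the work the task describes
            results[task["taskId"]] = "FINISHED"
        except Exception:
            results[task["taskId"]] = "FAILED"
    return results

statuses = poll_and_run(tasks, lambda task: None)  # no-op handler
print(statuses)  # {'copy-logs': 'FINISHED', 'launch-emr': 'FINISHED'}
```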

For example, you can use AWS Data Pipeline to archive your web server's logs to Amazon Simple Storage Service (Amazon S3) each day and then run a weekly Amazon EMR cluster over those logs to generate traffic reports. AWS Data Pipeline schedules the daily tasks to copy data and the weekly task to launch the Amazon EMR cluster. AWS Data Pipeline also ensures that Amazon EMR waits for the final day's data to be uploaded to Amazon S3 before it begins its analysis, even if there is an unforeseen delay in uploading the logs.

![\[AWS Data Pipeline functional overview\]](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/images/dp-how-dp-works-v2.png)


**Topics**
+ [Migrating workloads from AWS Data Pipeline](migration.md)
+ [Related services](datapipeline-related-services.md)
+ [Accessing AWS Data Pipeline](#accessing-datapipeline)
+ [Pricing](#datapipeline-pricing)
+ [Supported Instance Types for Pipeline Work Activities](dp-supported-instance-types.md)

# Migrating workloads from AWS Data Pipeline
<a name="migration"></a>

AWS launched the AWS Data Pipeline service in 2012. At that time, customers were looking for a service to help them reliably move data between different data sources using a variety of compute options. Now, there are other services that offer customers a better experience. For example, you can use AWS Glue to run and orchestrate Apache Spark applications, AWS Step Functions to help orchestrate AWS service components, or Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to help manage workflow orchestration for Apache Airflow.

This topic explains how to migrate from AWS Data Pipeline to alternative options. The option you choose depends on your current workload on AWS Data Pipeline. You can migrate typical use cases of AWS Data Pipeline to AWS Glue, AWS Step Functions, or Amazon MWAA.

## Migrating workloads to AWS Glue
<a name="migration-glue"></a>

[AWS Glue](https://aws.amazon.com/glue/) is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. It includes tooling for authoring, running jobs, and orchestrating workflows. With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

We recommend migrating your AWS Data Pipeline workload to AWS Glue when:
+ You're looking for a serverless data integration service that supports various data sources, authoring interfaces including visual editors and notebooks, and advanced data management capabilities such as data quality and sensitive data detection.
+ Your workload can be migrated to AWS Glue workflows, jobs (in Python or Apache Spark) and crawlers (for example, your existing pipeline is built on top of Apache Spark).
+ You require a single platform that can handle all aspects of your data pipeline, including ingestion, processing, transfer, integrity testing, and quality checks.
+ Your existing pipeline was created from a pre-defined template on the AWS Data Pipeline console, such as exporting a DynamoDB table to Amazon S3, and you are looking for a template that serves the same purpose.
+ Your workload does not depend on a specific Hadoop ecosystem application like Apache Hive.
+ Your workload does not require orchestrating on-premises servers.

AWS charges an hourly rate, billed by the second, for crawlers (discovering data) and ETL jobs (processing and loading data). AWS Glue Studio is a built-in orchestration engine for AWS Glue resources, and is offered at no additional cost. Learn more about pricing in [AWS Glue Pricing](https://aws.amazon.com/glue/pricing/).

## Migrating workloads to AWS Step Functions
<a name="migration-step-functions"></a>

[AWS Step Functions](https://aws.amazon.com/step-functions/) is a serverless orchestration service that lets you build workflows for your business-critical applications. With Step Functions, you use a visual editor to build workflows and integrate directly with over 11,000 actions for over 250 AWS services, such as AWS Lambda, Amazon EMR, DynamoDB, and more. You can use Step Functions for orchestrating data processing pipelines, handling errors, and working with the throttling limits on the underlying AWS services. You can create workflows that process and publish machine learning models, orchestrate microservices, and control AWS services, such as AWS Glue, to create extract, transform, and load (ETL) workflows. You also can create long-running, automated workflows for applications that require human interaction.

Like AWS Data Pipeline, AWS Step Functions is a fully managed service provided by AWS. You are not required to manage infrastructure, patch workers, or manage OS version updates.

We recommend migrating your AWS Data Pipeline workload to AWS Step Functions when:
+ You're looking for a serverless, highly available workflow orchestration service.
+ You're looking for a cost-effective solution that charges at a granularity of a single task execution.
+ Your workloads are orchestrating tasks for multiple other AWS services, such as Amazon EMR, Lambda, AWS Glue, or DynamoDB.
+ You're looking for a low-code solution that comes with a drag-and-drop visual designer for workflow creation and does not require learning new programming concepts.
+ You're looking for a service that provides integrations with over 250 other AWS services covering over 11,000 actions out-of-the-box, as well as allowing integrations with custom non-AWS services and activities.

Both AWS Data Pipeline and Step Functions use JSON format to define workflows. This allows you to store your workflows in source control, manage versions, control access, and automate with CI/CD. Step Functions uses a syntax called Amazon States Language, which is fully based on JSON and allows a seamless transition between the textual and visual representations of the workflow.
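
To make that concrete, the following sketch builds a minimal two-state Amazon States Language definition as a plain Python dictionary and serializes it to JSON. The state names and the Lambda function ARN are hypothetical placeholders, not part of any real workflow.

```python
import json

# A minimal Amazon States Language definition with two states.
# The state names and the Lambda ARN are hypothetical placeholders.
definition = {
    "Comment": "Copy logs, then finish",
    "StartAt": "CopyLogs",
    "States": {
        "CopyLogs": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:copy-logs",
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 3}],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# Because ASL is plain JSON, the definition round-trips through text,
# which is what makes it easy to diff, version, and review.
text = json.dumps(definition, indent=2)
assert json.loads(text) == definition
```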

With Step Functions, you can choose the same version of Amazon EMR that you're currently using in AWS Data Pipeline.

For migrating activities on AWS Data Pipeline managed resources, you can use [AWS SDK service integration](https://docs.aws.amazon.com/step-functions/latest/dg/supported-services-awssdk.html) on Step Functions to automate resource provisioning and cleaning up.

For migrating activities on on-premises servers, user-managed EC2 instances, or a user-managed EMR cluster, you can install an [SSM agent](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-prereqs.html) on the instance. You can initiate the command through the [AWS Systems Manager Run Command](https://docs.aws.amazon.com/systems-manager/latest/userguide/execute-remote-commands.html) from Step Functions. You can also initiate the state machine on a schedule defined in [Amazon EventBridge](https://aws.amazon.com/eventbridge/).
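
As a sketch of what such a step might look like, the following builds a Task state that invokes Run Command through the Step Functions AWS SDK integration. The instance ID and script path are hypothetical placeholders; treat this as a starting point to adapt, not a complete state machine.

```python
import json

# Sketch of an Amazon States Language Task state that calls Systems Manager
# Run Command through the Step Functions AWS SDK integration.
# The instance ID and shell command are hypothetical placeholders.
run_command_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:ssm:sendCommand",
    "Parameters": {
        "DocumentName": "AWS-RunShellScript",
        "InstanceIds": ["i-0123456789abcdef0"],
        "Parameters": {"commands": ["/opt/scripts/archive-logs.sh"]},
    },
    "End": True,
}

# The state is plain JSON, so it can be embedded in a larger definition.
print(json.dumps(run_command_state, indent=2))
```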

AWS Step Functions has two types of workflows: Standard Workflows and Express Workflows. For Standard Workflows, you’re charged based on the number of state transitions required to run your application. For Express Workflows, you’re charged based on the number of requests for your workflow and its duration. Learn more about pricing in [AWS Step Functions Pricing](https://aws.amazon.com/step-functions/pricing/).

## Migrating workloads to Amazon MWAA
<a name="migration-mwaa"></a>

[Amazon MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/) (Managed Workflows for Apache Airflow) is a managed orchestration service for [Apache Airflow](https://airflow.apache.org/) that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as "workflows". With Amazon MWAA, you can use Airflow and the Python programming language to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Amazon MWAA automatically scales its workflow execution capacity to meet your needs, and is integrated with AWS security services to help provide you with fast and secure access to your data.

Like AWS Data Pipeline, Amazon MWAA is a fully managed service provided by AWS. While you need to learn several new concepts specific to the service, you are not required to manage infrastructure, patch workers, or manage OS version updates.

We recommend migrating your AWS Data Pipeline workloads to Amazon MWAA when:
+ You're looking for a managed, highly available service to orchestrate workflows written in Python.
+ You want to transition to a fully managed, widely adopted open-source technology, Apache Airflow, for maximum portability.
+ You require a single platform that can handle all aspects of your data pipeline, including ingestion, processing, transfer, integrity testing, and quality checks.
+ You're looking for a service designed for data pipeline orchestration with features such as rich UI for observability, restarts for failed workflows, backfills, and retries for tasks.
+ You're looking for a service that comes with more than 800 pre-built operators and sensors, covering AWS as well as non-AWS services.

Amazon MWAA workflows are defined as Directed Acyclic Graphs (DAGs) using Python, so you can also treat them as source code. Airflow's extensible Python framework enables you to build workflows connecting with virtually any technology. It comes with a rich user interface for viewing and monitoring workflows and can be easily integrated with version control systems to automate the CI/CD process.
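
To illustrate the DAG model, the following stdlib-only sketch (not actual Airflow code) declares task dependencies and resolves a valid execution order, much as Airflow's scheduler does for a DAG. The task names are hypothetical; in a real DAG file they would be operator instances wired together with Airflow's `>>` dependency syntax.

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Each task maps to the set of tasks it depends on (its upstream tasks).
dag = {
    "extract": set(),            # no upstream dependencies
    "transform": {"extract"},    # runs after extract
    "load": {"transform"},       # runs after transform
    "quality_check": {"load"},   # runs last
}

# Resolve a dependency-respecting execution order.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load', 'quality_check']
```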

With Amazon MWAA, you can choose the same version of Amazon EMR that you’re currently using in AWS Data Pipeline.

AWS charges for the time your Airflow environment runs plus any additional auto scaling to provide more worker or web server capacity. Learn more about pricing in [Amazon Managed Workflows for Apache Airflow Pricing](https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/).

## Mapping the concepts
<a name="migration-mapping"></a>

The following table maps the major concepts used by the services. It helps readers familiar with AWS Data Pipeline understand the corresponding AWS Glue, Step Functions, and Amazon MWAA terminology.


| Data Pipeline | Glue | Step Functions | Amazon MWAA | 
| --- | --- | --- | --- | 
| Pipelines | [Workflows](https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html) | [Workflows](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-standard-vs-express.html) | [Directed acyclic graphs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html) | 
| Pipeline definition JSON | [Workflow definition](https://docs.aws.amazon.com/glue/latest/dg/creating_running_workflows.html) or [Python-based blueprints](https://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html) | [Amazon States Language JSON](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) | [Python-based](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html#example-pipeline-definition) | 
| Activities | [Jobs](https://docs.aws.amazon.com/glue/latest/dg/etl-jobs-section.html) | [States](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-states.html) and [Tasks](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-task-state.html) | [Tasks](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html) ([Operators](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/operators.html) and [Sensors](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/sensors.html)) | 
| Instances | [Job runs](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-runs.html) | [Executions](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-state-machine-executions.html) | [DAG runs](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dag-run.html) | 
| Attempts | Retry attempts | [Catchers and retriers](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html) | [Retries](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html#default-arguments) | 
| Pipeline schedule | [Schedule triggers](https://docs.aws.amazon.com/glue/latest/dg/about-triggers.html) | [EventBridge Scheduler tasks](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html) | [Cron](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timezone.html), [timetables](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html), [data-aware scheduling](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html) | 
| Pipeline expressions and functions | [Blueprint library](https://docs.aws.amazon.com/glue/latest/dg/developing-blueprints-overview.html) | [Step Functions intrinsic functions](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-intrinsic-functions.html) and [AWS Lambda](https://docs.aws.amazon.com/step-functions/latest/dg/use-cases-data-processing.html) | [Extensible Python framework](https://airflow.apache.org/docs/apache-airflow/stable/howto/custom-operator.html) | 

## Samples
<a name="migration-samples"></a>

The following sections list public examples that you can refer to when migrating from AWS Data Pipeline to individual services. Use them as references, and build your own pipeline on the individual services by updating and testing it based on your use case.

### AWS Glue samples
<a name="migration-samples-aws-glue"></a>

The following list contains sample implementations for the most common AWS Data Pipeline use cases with AWS Glue.
+ [Running Spark jobs](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-intro-tutorial.html)
+ [Copying data from JDBC to Amazon S3](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/jdbc_to_s3) (including Amazon Redshift)
+ [Copying data from Amazon S3 to JDBC](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/s3_to_jdbc) (including Amazon Redshift)
+ [Copying data from Amazon S3 to DynamoDB](https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/s3_to_dynamodb)
+ [Moving data to and from Amazon Redshift](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-redshift.html)
+ [Cross-account cross-Region access to DynamoDB tables](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-dynamo-db-cross-account.html)

### AWS Step Functions samples
<a name="migration-samples-aws-step-functions"></a>

The following list contains sample implementations for the most common AWS Data Pipeline use cases with AWS Step Functions.
+ [Managing an Amazon EMR job](https://docs.aws.amazon.com/step-functions/latest/dg/sample-emr-job.html)
+ [Running a data processing job on Amazon EMR Serverless](https://aws.amazon.com/blogs/big-data/run-a-data-processing-job-on-amazon-emr-serverless-with-aws-step-functions/)
+ [Running Hive/Pig/Hadoop jobs](https://catalog.us-east-1.prod.workshops.aws/workshops/c86bd131-f6bf-4e8f-b798-58fd450d3c44/en-US/step-functions/01-execute-step-function)
+ [Querying large datasets](https://docs.aws.amazon.com/step-functions/latest/dg/sample-query-large-datasets.html) (Amazon Athena, Amazon S3, AWS Glue)
+ [Running ETL workflows using Amazon Redshift](https://docs.aws.amazon.com/step-functions/latest/dg/sample-etl-orchestration.html)
+ [Orchestrating AWS Glue crawlers](https://aws.amazon.com/blogs/compute/orchestrating-aws-glue-crawlers-using-aws-step-functions/)

See additional [tutorials](https://docs.aws.amazon.com/step-functions/latest/dg/tutorials.html) and [sample projects](https://docs.aws.amazon.com/step-functions/latest/dg/create-sample-projects.html) for using AWS Step Functions.

### Amazon MWAA samples
<a name="migration-samples-amazon-mwaa"></a>

The following list contains sample implementations for the most common AWS Data Pipeline use cases with Amazon MWAA.
+ [Running an Amazon EMR job](https://catalog.us-east-1.prod.workshops.aws/workshops/795e88bb-17e2-498f-82d1-2104f4824168/en-US/workshop-2-2-2/m1-processing/emr)
+ [Creating a custom plugin for Apache Hive and Hadoop](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-hive.html)
+ [Copying data from Amazon S3 to Redshift](https://catalog.us-east-1.prod.workshops.aws/workshops/795e88bb-17e2-498f-82d1-2104f4824168/en-US/workshop-2-2-2/m1-processing/redshift)
+ [Executing a Shell script on a remote EC2 instance](https://docs.aws.amazon.com/mwaa/latest/userguide/samples-ssh.html)
+ [Orchestrating hybrid (on-prem) workflows](https://dev.to/aws/orchestrating-hybrid-workflows-using-amazon-managed-workflows-for-apache-airflow-mwaa-2boc)

See additional [tutorials](https://docs.aws.amazon.com/mwaa/latest/userguide/tutorials.html) and [sample projects](https://docs.aws.amazon.com/mwaa/latest/userguide/sample-code.html) for using Amazon MWAA.

# Related services
<a name="datapipeline-related-services"></a>

AWS Data Pipeline works with the following services to store data.
+ Amazon DynamoDB — Provides a fully managed NoSQL database with fast performance at a low cost. For more information, see *[Amazon DynamoDB Developer Guide](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/)*.
+ Amazon RDS — Provides a fully managed relational database that scales to large datasets. For more information, see *[Amazon Relational Database Service Developer Guide](https://docs.aws.amazon.com/AmazonRDS/latest/DeveloperGuide/)*.
+ Amazon Redshift — Provides a fast, fully managed, petabyte-scale data warehouse that makes it easy and cost-effective to analyze a vast amount of data. For more information, see *[Amazon Redshift Database Developer Guide](https://docs.aws.amazon.com/redshift/latest/dg/)*.
+ Amazon S3 — Provides secure, durable, and highly scalable object storage. For more information, see *[Amazon Simple Storage Service User Guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/)*.

AWS Data Pipeline works with the following compute services to transform data.
+ Amazon EC2 — Provides resizable computing capacity—literally, servers in Amazon's data centers—that you use to build and host your software systems. For more information, see *[Amazon EC2 User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/)*.
+ Amazon EMR — Makes it easy, fast, and cost-effective for you to distribute and process vast amounts of data across Amazon EC2 servers, using a framework such as Apache Hadoop or Apache Spark. For more information, see *[Amazon EMR Developer Guide](https://docs.aws.amazon.com/emr/latest/DeveloperGuide/)*.

## Accessing AWS Data Pipeline
<a name="accessing-datapipeline"></a>

You can create, access, and manage your pipelines using any of the following interfaces:
+ **AWS Management Console** — Provides a web interface that you can use to access AWS Data Pipeline.
+ **AWS Command Line Interface (AWS CLI)** — Provides commands for a broad set of AWS services, including AWS Data Pipeline, and is supported on Windows, macOS, and Linux. For more information about installing the AWS CLI, see [AWS Command Line Interface](https://aws.amazon.com/cli/). For a list of commands for AWS Data Pipeline, see [datapipeline](https://docs.aws.amazon.com/cli/latest/reference/datapipeline/index.html).
+ **AWS SDKs** — Provide language-specific APIs and take care of many of the connection details, such as calculating signatures, handling request retries, and error handling. For more information, see [AWS SDKs](http://aws.amazon.com/tools/#SDKs).
+ **Query API** — Provides low-level APIs that you call using HTTPS requests. Using the Query API is the most direct way to access AWS Data Pipeline, but it requires that your application handle low-level details such as generating the hash to sign the request, and error handling. For more information, see the *[AWS Data Pipeline API Reference](https://docs.aws.amazon.com/datapipeline/latest/APIReference/)*.

## Pricing
<a name="datapipeline-pricing"></a>

With Amazon Web Services, you pay only for what you use. For AWS Data Pipeline, you pay for your pipeline based on how often your activities and preconditions are scheduled to run and where they run. For more information, see [AWS Data Pipeline Pricing](https://aws.amazon.com/datapipeline/pricing/).

If your AWS account is less than 12 months old, you are eligible to use the free tier. The free tier includes three low-frequency preconditions and five low-frequency activities per month at no charge. For more information, see [AWS Free Tier](https://aws.amazon.com/free/).

# Supported Instance Types for Pipeline Work Activities
<a name="dp-supported-instance-types"></a>

When AWS Data Pipeline runs a pipeline, it compiles the pipeline components to create a set of actionable instances. Each instance contains all the information for performing a specific task. The complete set of instances is the to-do list of the pipeline. AWS Data Pipeline hands the instances out to task runners to process.

EC2 instances come in different configurations, which are known as *instance types*. Each instance type has a different CPU, input/output, and storage capacity. In addition to specifying the instance type for an activity, you can choose different purchasing options. Not all instance types are available in all AWS Regions. If an instance type is not available, your pipeline may fail to provision or may become stuck during provisioning. For information about instance availability, see the [Amazon EC2 Pricing Page](https://aws.amazon.com/ec2/pricing/). Open the link for your instance purchasing option and filter by **Region** to see if an instance type is available in the Region. For more information about these instance types, families, and virtualization types, see [Amazon EC2 Instances](https://aws.amazon.com/ec2/instance-types/) and [Amazon Linux AMI Instance Type Matrix](https://aws.amazon.com/amazon-linux-ami/instance-type-matrix/).

The following tables describe the instance types that AWS Data Pipeline supports. You can use AWS Data Pipeline to launch Amazon EC2 instances in any Region, including Regions where AWS Data Pipeline is not supported. For information about Regions where AWS Data Pipeline is supported, see [AWS Regions and Endpoints](https://docs.aws.amazon.com/general/latest/gr/rande.html#datapipeline_region).

**Topics**
+ [Default Amazon EC2 Instances by AWS Region](dp-ec2-default-instance-types.md)
+ [Additional Supported Amazon EC2 Instances](dp-ec2-supported-instance-types.md)
+ [Supported Amazon EC2 Instances for Amazon EMR Clusters](dp-emr-supported-instance-types.md)

# Default Amazon EC2 Instances by AWS Region
<a name="dp-ec2-default-instance-types"></a>

If you do not specify an instance type in your pipeline definition, AWS Data Pipeline launches an instance by default. 

The following table lists the Amazon EC2 instances that AWS Data Pipeline uses by default in those Regions where AWS Data Pipeline is supported. 


| Region Name | Region | Instance Type | 
| --- | --- | --- | 
| US East (N. Virginia) | us-east-1 | m1.small | 
| US West (Oregon) | us-west-2 | m1.small | 
| Asia Pacific (Sydney) | ap-southeast-2 | m1.small | 
| Asia Pacific (Tokyo) | ap-northeast-1 | m1.small | 
| EU (Ireland) | eu-west-1 | m1.small | 

The following table lists the Amazon EC2 instances that AWS Data Pipeline launches by default in those Regions where AWS Data Pipeline is not supported. 


| Region Name | Region | Instance Type | 
| --- | --- | --- | 
| US East (Ohio) | us-east-2 | t2.small | 
| US West (N. California) | us-west-1 | m1.small | 
| Asia Pacific (Mumbai) | ap-south-1 | t2.small | 
| Asia Pacific (Singapore) | ap-southeast-1 | m1.small | 
| Asia Pacific (Seoul) | ap-northeast-2 | t2.small | 
| Canada (Central) | ca-central-1 | t2.small | 
| EU (Frankfurt) | eu-central-1 | t2.small | 
| EU (London) | eu-west-2 | t2.small | 
| EU (Paris) | eu-west-3 | t2.small | 
| South America (São Paulo) | sa-east-1 | m1.small | 

# Additional Supported Amazon EC2 Instances
<a name="dp-ec2-supported-instance-types"></a>

In addition to the default instances that are created if you don't specify an instance type in your pipeline definition, the following instances are supported. 

The following table lists the Amazon EC2 instances that AWS Data Pipeline supports and can create, if specified. 


| Instance Class | Instance Types | 
| --- | --- | 
| General purpose | t2.nano, t2.micro, t2.small, t2.medium, t2.large | 
| Compute optimized | c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge, c5.18xlarge, c5d.xlarge, c5d.2xlarge, c5d.4xlarge, c5d.9xlarge, c5d.18xlarge | 
| Memory optimized | m3.medium, m3.large, m3.xlarge, m3.2xlarge, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, m4.16xlarge, m5.xlarge, m5.2xlarge, m5.4xlarge, m5.12xlarge, m5.24xlarge, m5d.xlarge, m5d.2xlarge, m5d.4xlarge, m5d.12xlarge, m5d.24xlarge, r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge, r4.large, r4.xlarge, r4.2xlarge, r4.4xlarge, r4.8xlarge, r4.16xlarge | 
| Storage optimized | i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, hs1.8xlarge, g2.2xlarge, g2.8xlarge, d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge | 

# Supported Amazon EC2 Instances for Amazon EMR Clusters
<a name="dp-emr-supported-instance-types"></a>

This table lists the Amazon EC2 instances that AWS Data Pipeline supports and can create for Amazon EMR clusters, if specified. For more information, see [Supported Instance Types](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-supported-instance-types.html) in the *Amazon EMR Management Guide*.


| Instance Class | Instance Types | 
| --- | --- | 
| General purpose | m1.small, m1.medium, m1.large, m1.xlarge, m3.xlarge, m3.2xlarge | 
| Compute optimized | c1.medium, c1.xlarge, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, cc1.4xlarge, cc2.8xlarge, c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge, c5.18xlarge, c5d.xlarge, c5d.2xlarge, c5d.4xlarge, c5d.9xlarge, c5d.18xlarge | 
| Memory optimized | m2.xlarge, m2.2xlarge, m2.4xlarge, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge, cr1.8xlarge, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge, m4.16xlarge, m5.xlarge, m5.2xlarge, m5.4xlarge, m5.12xlarge, m5.24xlarge, m5d.xlarge, m5d.2xlarge, m5d.4xlarge, m5d.12xlarge, m5d.24xlarge, r4.large, r4.xlarge, r4.2xlarge, r4.4xlarge, r4.8xlarge, r4.16xlarge | 
| Storage optimized | h1.4xlarge, hs1.2xlarge, hs1.4xlarge, hs1.8xlarge, i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge, d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge | 
| Accelerated computing | g2.2xlarge, cg1.4xlarge | 