Working with Amazon RDS zero-ETL integrations with Amazon Redshift
An Amazon RDS zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from RDS. It's a fully managed solution for making transactional data available in Amazon Redshift after it is written to an RDS database. Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central data warehouse.
A zero-ETL integration makes the data in your RDS database available in Amazon Redshift in near real-time. Once that data is in Amazon Redshift, you can power your analytics, ML, and AI workloads using the built-in capabilities of Amazon Redshift, such as machine learning, materialized views, data sharing, federated access to multiple data stores and data lakes, and integrations with Amazon SageMaker, Amazon QuickSight, and other AWS services.
To create a zero-ETL integration, you specify an RDS database as the source, and an Amazon Redshift data warehouse as the target. The integration replicates data from the source database into the target data warehouse.
The following diagram illustrates this functionality:
The integration monitors the health of the data pipeline and recovers from issues when possible. You can create integrations from multiple RDS databases into a single Amazon Redshift namespace, enabling you to derive insights across multiple applications.
Topics
- Benefits
- Key concepts
- Limitations
- Quotas
- Supported Regions
- Getting started with Amazon RDS zero-ETL integrations with Amazon Redshift
- Creating Amazon RDS zero-ETL integrations with Amazon Redshift
- Data filtering for Amazon RDS zero-ETL integrations with Amazon Redshift
- Adding data to a source RDS database and querying it in Amazon Redshift
- Viewing and monitoring Amazon RDS zero-ETL integrations with Amazon Redshift
- Modifying Amazon RDS zero-ETL integrations with Amazon Redshift
- Deleting Amazon RDS zero-ETL integrations with Amazon Redshift
- Troubleshooting Amazon RDS zero-ETL integrations with Amazon Redshift
Benefits
RDS zero-ETL integrations with Amazon Redshift have the following benefits:
-
Help you derive holistic insights from multiple data sources.
-
Eliminate the need to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. Zero-ETL integrations remove the challenges that come with building and managing pipelines by provisioning and managing them for you.
-
Reduce operational burden and cost, and let you focus on improving your applications.
-
Let you leverage Amazon Redshift's analytics and ML capabilities to derive insights from transactional and other data, to respond effectively to critical, time-sensitive events.
Key concepts
As you get started with zero-ETL integrations, consider the following concepts:
- Integration
-
A fully managed data pipeline that automatically replicates transactional data and schemas from an RDS database to an Amazon Redshift data warehouse.
- Source database
-
The RDS database where data is replicated from. You can specify a Single-AZ or Multi-AZ DB instance, or a Multi-AZ DB cluster.
- Target data warehouse
-
The Amazon Redshift data warehouse where the data is replicated to. There are two types of data warehouse: a provisioned cluster data warehouse and a serverless data warehouse. A provisioned cluster data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. A serverless data warehouse is comprised of a workgroup that stores compute resources, and a namespace that houses the database objects and users. Both data warehouses run an Amazon Redshift engine and contain one or more databases.
Multiple source databases can write to the same target.
For more information, see Data warehouse system architecture in the Amazon Redshift Developer Guide.
Limitations
The following limitations apply to RDS zero-ETL integrations with Amazon Redshift.
General limitations
-
The source database must be in the same Region as the target Amazon Redshift data warehouse.
-
You can't rename a database if it has existing integrations.
-
You can't create multiple integrations between the same source and target databases.
-
You can't delete a database that has existing integrations. You must delete all associated integrations first.
-
If you stop the source database, the last few transactions might not be replicated to the target data warehouse until you resume the database.
-
You can’t delete an integration if the source database is stopped.
-
If your database is the source of a blue/green deployment, the blue and green environments can't have existing zero-ETL integrations during switchover. You must delete the integration first and switch over, then recreate it.
-
You can't create an integration for a source database that has another integration being actively created.
-
When you initially create an integration, or when a table is being resynchronized, data seeding from the source to the target can take 20-25 minutes or more depending on the size of the source database. This delay can lead to increased replica lag.
-
Some data types aren't supported. For more information, see Data type differences between RDS and Amazon Redshift databases.
-
Object identifiers (including database name, table name, column names, and others) can contain only alphanumeric characters, numbers, $, and _ (underscore).
-
System tables, temporary tables, and views aren't replicated to Amazon Redshift.
RDS for MySQL limitations
-
Your source database must be running a supported version of RDS for MySQL. For a list of supported versions, see Supported Regions and DB engines for Amazon RDS zero-ETL integrations with Amazon Redshift.
-
Zero-ETL integrations rely on MySQL binary logging (binlog) to capture ongoing data changes. Don't use binlog-based data filtering, as it can cause data inconsistencies between the source and target databases.
-
Zero-ETL integrations are supported only for databases configured to use the InnoDB storage engine.
-
Foreign key references with predefined table updates aren't supported. Specifically,
ON DELETE
andON UPDATE
rules aren't supported withCASCADE
,SET NULL
, andSET DEFAULT
actions. Attempting to create or update a table with such references to another table will put the table into a failed state. -
ALTER TABLE
partition operations cause your table to resynchronize in order to reload data from RDS to Amazon Redshift. The table will be unavailable for querying while it's resynchronizing. For more information, see One or more of my Amazon Redshift tables requires a resync.
Amazon Redshift limitations
For a list of Amazon Redshift limitations related to zero-ETL integrations, see Considerations when using zero-ETL integrations with Amazon Redshift in the Amazon Redshift Management Guide.
Quotas
Your account has the following quotas related to RDS zero-ETL integrations with Amazon Redshift. Each quota is per-Region unless otherwise specified.
Name | Default | Description |
---|---|---|
Integrations | 100 | The total number of integrations within an AWS account. |
Integrations per target data warehouse | 50 | The number of integrations sending data to a single target Amazon Redshift data warehouse. |
Integrations per source instance | 1 | The number of integrations sending data from a single source DB instance. |
In addition, Amazon Redshift places certain limits on the number of tables allowed in each DB instance or cluster node. For more information, see Quotas and limits in Amazon Redshift in the Amazon Redshift Management Guide.
Supported Regions
RDS zero-ETL integrations with Amazon Redshift are available in a subset of AWS Regions. For a list of supported Regions, see Supported Regions and DB engines for Amazon RDS zero-ETL integrations with Amazon Redshift.