Working with Aurora zero-ETL integrations with Amazon Redshift

An Aurora zero-ETL integration with Amazon Redshift enables near real-time analytics and machine learning (ML) using Amazon Redshift on petabytes of transactional data from Aurora. It's a fully managed solution for making transactional data available in Amazon Redshift after it is written to an Aurora DB cluster. Extract, transform, and load (ETL) is the process of combining data from multiple sources into a large, central data warehouse.

A zero-ETL integration makes the data in your Aurora DB cluster available in Amazon Redshift in near real-time. Once that data is in Amazon Redshift, you can power your analytics, ML, and AI workloads using the built-in capabilities of Amazon Redshift, such as machine learning, materialized views, data sharing, federated access to multiple data stores and data lakes, and integrations with Amazon SageMaker, Amazon QuickSight, and other AWS services.

To create a zero-ETL integration, you specify an Aurora DB cluster as the source, and an Amazon Redshift data warehouse as the target. The integration replicates data from the source database into the target data warehouse.
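To illustrate, creating an integration amounts to pairing a source cluster ARN with a target warehouse ARN under an integration name. The sketch below, in Python, assembles that request; the ARNs are hypothetical placeholders, and the helper name is ours. The resulting keyword arguments correspond to the RDS CreateIntegration API (for example, boto3's `rds.create_integration`).

```python
# Sketch: assembling a CreateIntegration request.
# The ARNs below are hypothetical placeholders, not real resources.

def build_create_integration_request(integration_name, source_arn, target_arn):
    """Return keyword arguments for the RDS CreateIntegration API call."""
    if not source_arn.startswith("arn:aws:rds:"):
        raise ValueError("source must be an Aurora DB cluster ARN")
    if not target_arn.startswith("arn:aws:redshift"):
        raise ValueError("target must be an Amazon Redshift warehouse ARN")
    return {
        "IntegrationName": integration_name,
        "SourceArn": source_arn,   # the Aurora DB cluster to replicate from
        "TargetArn": target_arn,   # the Redshift warehouse to replicate into
    }

request = build_create_integration_request(
    "my-zero-etl-integration",
    "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
    "arn:aws:redshift-serverless:us-east-1:123456789012:namespace/my-namespace",
)
```

A client would then issue the call as, for example, `boto3.client("rds").create_integration(**request)`; the prefix checks above accept both provisioned (`arn:aws:redshift:`) and serverless (`arn:aws:redshift-serverless:`) targets.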

The following diagram illustrates this functionality:

[Diagram: a zero-ETL integration replicating data from an Aurora DB cluster into an Amazon Redshift data warehouse]

The integration monitors the health of the data pipeline and recovers from issues when possible. You can create integrations from multiple Aurora DB clusters into a single Amazon Redshift namespace, enabling you to derive insights across multiple applications.

For information about pricing for zero-ETL integrations, see Amazon Aurora pricing and Amazon Redshift pricing.

Benefits

Aurora zero-ETL integrations with Amazon Redshift have the following benefits:

  • Help you derive holistic insights from multiple data sources.

  • Eliminate the need to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. Zero-ETL integrations remove the challenges that come with building and managing pipelines by provisioning and managing them for you.

  • Reduce operational burden and cost, and let you focus on improving your applications.

  • Let you leverage Amazon Redshift's analytics and ML capabilities to derive insights from transactional and other data, to respond effectively to critical, time-sensitive events.

Key concepts

As you get started with zero-ETL integrations, consider the following concepts:

Integration

A fully managed data pipeline that automatically replicates transactional data and schemas from an Aurora DB cluster to an Amazon Redshift data warehouse.

Source DB cluster

The Aurora DB cluster where data is replicated from. For Aurora MySQL, you can specify a DB cluster that uses provisioned DB instances or Aurora Serverless v2 DB instances as the source. For the Aurora PostgreSQL preview, you can only specify a cluster that uses provisioned DB instances.

Target data warehouse

The Amazon Redshift data warehouse where the data is replicated to. There are two types of data warehouse: a provisioned cluster data warehouse and a serverless data warehouse. A provisioned cluster data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. A serverless data warehouse consists of a workgroup, which is a collection of compute resources, and a namespace, which contains the database objects and users. Both types run an Amazon Redshift engine and contain one or more databases.

Multiple source DB clusters can write to the same target.

For more information, see Data warehouse system architecture in the Amazon Redshift Developer Guide.

Limitations

The following limitations apply to Aurora zero-ETL integrations with Amazon Redshift.

General limitations

  • The source DB cluster must be in the same Region as the target Amazon Redshift data warehouse.

  • You can't rename a DB cluster or any of its instances if it has existing integrations.

  • You can't delete a DB cluster that has existing integrations. You must delete all associated integrations first.

  • If you stop the source DB cluster, the last few transactions might not be replicated to the target data warehouse until you resume the cluster.

  • If your cluster is the source of a blue/green deployment, the blue and green environments can't have existing zero-ETL integrations during switchover. You must delete the integration first, perform the switchover, and then recreate the integration.

  • A DB cluster must contain at least one DB instance in order to be the source of an integration.

  • If your source cluster is the primary DB cluster in an Aurora global database and it fails over to one of its secondary clusters, the integration becomes inactive. You must delete and recreate the integration.

  • You can't create an integration for a source database that has another integration being actively created.

  • When you initially create an integration, or when a table is being resynchronized, data seeding from the source to the target can take 20-25 minutes or more depending on the size of the source database. This delay can lead to increased replica lag.

  • Some data types aren't supported. For more information, see Data type differences between Aurora and Amazon Redshift databases.

  • Foreign key references with predefined table updates aren't supported. Specifically, the CASCADE, SET NULL, and SET DEFAULT actions aren't supported for ON DELETE and ON UPDATE rules. Attempting to create or update a table with such references to another table puts the table into a failed state.

  • ALTER TABLE partition operations cause your table to resynchronize in order to reload data from Aurora to Amazon Redshift. The table will be unavailable for querying while it's resynchronizing. For more information, see One or more of my Amazon Redshift tables requires a resync.

  • XA transactions aren't supported.

  • Object identifiers (including database names, table names, and column names) can contain only alphanumeric characters, dollar signs ($), and underscores (_).
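The identifier rule above can be checked ahead of time. A minimal sketch in Python (the helper name and regex are ours, not part of any AWS tool):

```python
import re

# Object identifiers (database, table, and column names) may contain only
# alphanumeric characters, dollar signs ($), and underscores (_).
VALID_IDENTIFIER = re.compile(r"^[A-Za-z0-9$_]+$")

def is_replicable_identifier(name: str) -> bool:
    """Return True if the identifier meets the zero-ETL character rule."""
    return bool(VALID_IDENTIFIER.match(name))

is_replicable_identifier("order_items$2024")  # True
is_replicable_identifier("order-items")       # False: hyphen not allowed
```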

Aurora MySQL limitations

  • Your source DB cluster must be running Aurora MySQL version 3.05 (compatible with MySQL 8.0.32) or higher.

  • Zero-ETL integrations rely on MySQL binary logging (binlog) to capture ongoing data changes. Don't use binlog-based data filtering, as it can cause data inconsistencies between the source and target databases.

  • Aurora MySQL system tables, temporary tables, and views aren't replicated to Amazon Redshift.

  • Zero-ETL integrations are supported only for databases configured to use the InnoDB storage engine.

Aurora PostgreSQL preview limitations

Important

The zero-ETL integrations with Amazon Redshift feature for Aurora PostgreSQL is in preview release. The documentation and the feature are both subject to change. You can use this feature only in test environments, not in production environments. For preview terms and conditions, see Betas and Previews in AWS Service Terms.

  • Your source DB cluster must be running Aurora PostgreSQL (compatible with PostgreSQL 15.4 and Zero-ETL Support).

  • You can create and manage zero-ETL integrations for Aurora PostgreSQL only in the Amazon RDS Database Preview Environment, in the US East (Ohio) (us-east-2) AWS Region. You can use the preview environment to test beta, release candidate, and early production versions of PostgreSQL database engine software.

  • You can create and manage integrations for Aurora PostgreSQL only using the AWS Management Console. You can't use the AWS Command Line Interface (AWS CLI), the Amazon RDS API, or any of the AWS SDKs.

  • When you create a source DB cluster, the parameter group that you choose must already have the required DB cluster parameter values configured. You can't create a new parameter group afterwards and then associate it with the cluster. For a list of required parameters, see Step 1: Create a custom DB cluster parameter group.

  • You can't modify an integration after you create it. If you need to change certain settings, you must delete and recreate the integration.

  • Currently, Aurora PostgreSQL DB clusters that are the source of an integration don't perform garbage collection of logical replication data.

  • All databases created within the source Aurora PostgreSQL DB cluster must use UTF-8 encoding.

  • Column names can't contain any of the following characters: commas (,), semicolons (;), parentheses ( ), curly brackets { }, newlines (\n), tabs (\t), equal signs (=), and spaces.

  • Zero-ETL integrations with Aurora PostgreSQL don't support the following:

    • Aurora Serverless v2 DB instances. Your source DB cluster must use provisioned DB instances.

    • Custom data types or data types created by extensions.

    • Subtransactions on the source DB cluster.

    • Renaming of schemas or databases within a source DB cluster.

    • Restoring from a DB cluster snapshot or using Aurora cloning to create a source DB cluster. If you want to bring existing data into a preview cluster, then you must use the pg_dump or pg_restore utilities.

    • Creation of logical replication slots on the writer instance of the source DB cluster.

    • Large field values that require the oversized-attribute storage technique (TOAST).

    • ALTER TABLE partition operations. These operations can cause your table to resynchronize and eventually enter a Failed state. If a table fails, then you must drop and recreate it.
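As a quick pre-check against the preview's column-name restriction above, a minimal sketch in Python (the helper name is ours):

```python
# Characters that the Aurora PostgreSQL preview doesn't allow in column
# names: commas, semicolons, parentheses, curly brackets, newlines, tabs,
# equal signs, and spaces.
FORBIDDEN_CHARS = set(",;(){}\n\t= ")

def column_name_ok(name: str) -> bool:
    """Return True if the column name avoids all restricted characters."""
    return not any(ch in FORBIDDEN_CHARS for ch in name)

column_name_ok("order_total")   # True
column_name_ok("order total")   # False: contains a space
```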

Amazon Redshift limitations

For a list of Amazon Redshift limitations related to zero-ETL integrations, see Considerations in the Amazon Redshift Management Guide.

Quotas

Your account has the following quotas related to Aurora zero-ETL integrations with Amazon Redshift. Each quota is per-Region unless otherwise specified.

  • Integrations: Default 100. The total number of integrations within an AWS account.
  • Integrations per target data warehouse: Default 50. The number of integrations sending data to a single target Amazon Redshift data warehouse.
  • Integrations per source cluster: Default 5 for Aurora MySQL, 1 for Aurora PostgreSQL. The number of integrations sending data from a single source DB cluster.

In addition, Amazon Redshift places certain limits on the number of tables allowed in each DB instance or cluster node. For more information, see Quotas and limits in Amazon Redshift in the Amazon Redshift Management Guide.
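The default quotas above can be encoded as a simple planning check. A sketch in Python using the documented defaults (the helper and key names are ours; an account's actual limits may differ if a quota increase was granted):

```python
# Default quotas from the table above (per Region unless otherwise noted).
QUOTAS = {
    "integrations_per_account": 100,
    "integrations_per_target": 50,
    "integrations_per_source": {"aurora-mysql": 5, "aurora-postgresql": 1},
}

def can_add_integration(engine, account_count, target_count, source_count):
    """Return True if one more integration would stay within the defaults."""
    return (
        account_count < QUOTAS["integrations_per_account"]
        and target_count < QUOTAS["integrations_per_target"]
        and source_count < QUOTAS["integrations_per_source"][engine]
    )

can_add_integration("aurora-mysql", 10, 3, 4)       # True: one slot left
can_add_integration("aurora-postgresql", 10, 3, 1)  # False: source quota is 1
```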

Supported Regions

Aurora zero-ETL integrations with Amazon Redshift are available in a subset of AWS Regions. For a list of supported Regions, see Supported Regions and Aurora DB engines for zero-ETL integrations with Amazon Redshift.