

# Aurora zero-ETL integrations
<a name="zero-etl"></a>

An Aurora zero-ETL integration with Amazon Redshift and Amazon SageMaker AI enables near real-time analytics and machine learning (ML) using data from Aurora. It's a fully managed solution for making transactional data available in your analytics destination after it is written to an Aurora DB cluster. *Extract, transform,* and *load* (ETL) is the process of combining data from multiple sources into a large, central data warehouse.

A zero-ETL integration makes the data in your Aurora DB cluster available in Amazon Redshift or an Amazon SageMaker AI lakehouse in near real time. Once that data is in the target data warehouse or data lake, you can power your analytics, ML, and AI workloads using built-in capabilities such as machine learning, materialized views, data sharing, federated access to multiple data stores and data lakes, and integrations with Amazon SageMaker AI, Amazon QuickSight, and other AWS services.

To create a zero-ETL integration, you specify an Aurora DB cluster as the *source*, and a supported data warehouse or lakehouse as the *target*. The integration replicates data from the source database into the target data warehouse or lakehouse.

The following diagram illustrates this functionality for zero-ETL integration with Amazon Redshift:

![A zero-ETL integration](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-integrations.png)


The following diagram illustrates this functionality for zero-ETL integration with an Amazon SageMaker AI lakehouse:

![A zero-ETL integration with an Amazon SageMaker AI lakehouse](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-aurora-lakehouse.png)


The integration monitors the health of the data pipeline and recovers from issues when possible. You can create integrations from multiple Aurora DB clusters into a single target data warehouse or lakehouse, enabling you to derive insights across multiple applications.

For information about pricing for zero-ETL integrations, see [Amazon Aurora pricing](https://aws.amazon.com/rds/aurora/pricing/) and [Amazon Redshift pricing](https://aws.amazon.com/redshift/pricing/).

**Topics**
+ [Benefits](#zero-etl.benefits)
+ [Key concepts](#zero-etl.concepts)
+ [Limitations](#zero-etl.reqs-lims)
+ [Quotas](#zero-etl.quotas)
+ [Supported Regions](#zero-etl.regions)
+ [Getting started with Aurora zero-ETL integrations](zero-etl.setting-up.md)
+ [Creating Aurora zero-ETL integrations with Amazon Redshift](zero-etl.creating.md)
+ [Creating Aurora zero-ETL integrations with an Amazon SageMaker lakehouse](zero-etl.creating-smlh.md)
+ [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md)
+ [Adding data to a source Aurora DB cluster and querying it](zero-etl.querying.md)
+ [Viewing and monitoring Aurora zero-ETL integrations](zero-etl.describingmonitoring.md)
+ [Modifying Aurora zero-ETL integrations](zero-etl.modifying.md)
+ [Deleting Aurora zero-ETL integrations](zero-etl.deleting.md)
+ [Troubleshooting Aurora zero-ETL integrations](zero-etl.troubleshooting.md)

## Benefits
<a name="zero-etl.benefits"></a>

Aurora zero-ETL integrations have the following benefits:
+ Help you derive holistic insights from multiple data sources.
+ Eliminate the need to build and maintain complex data pipelines that perform extract, transform, and load (ETL) operations. Zero-ETL integrations remove the challenges that come with building and managing pipelines by provisioning and managing them for you.
+ Reduce operational burden and cost, and let you focus on improving your applications.
+ Let you leverage the target destination's analytics and ML capabilities to derive insights from transactional and other data, to respond effectively to critical, time-sensitive events.

## Key concepts
<a name="zero-etl.concepts"></a>

As you get started with zero-ETL integrations, consider the following concepts:

**Integration**  
A fully managed data pipeline that automatically replicates transactional data and schemas from an Aurora DB cluster to a data warehouse or catalog.

**Source DB cluster**  
The Aurora DB cluster where data is replicated from. You can specify a DB cluster that uses provisioned DB instances or Aurora Serverless v2 DB instances as the source.

**Target**  
The data warehouse or lakehouse where the data is replicated to. There are two types of data warehouses: a [provisioned cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html) data warehouse and a [serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-workgroup-namespace.html) data warehouse. A provisioned cluster data warehouse is a collection of computing resources called nodes, which are organized into a group called a *cluster*. A serverless data warehouse is composed of a workgroup that stores compute resources, and a namespace that houses the database objects and users. Both data warehouses run an analytics engine and contain one or more databases.  
A target lakehouse consists of catalogs, databases, tables, and views. For more information about lakehouse architecture, see [Lakehouse architecture](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse-components.html) in the *Amazon SageMaker Unified Studio User Guide*.  
Multiple source DB clusters can write to the same target.  
For more information, see [Data warehouse system architecture](https://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html) in the *Amazon Redshift Developer Guide*.

## Limitations
<a name="zero-etl.reqs-lims"></a>

The following limitations apply to Aurora zero-ETL integrations.

**Topics**
+ [General limitations](#zero-etl.reqs-lims-general)
+ [Aurora MySQL limitations](#zero-etl.reqs-lims-mysql)
+ [Aurora PostgreSQL limitations](#zero-etl.reqs-lims-apg)
+ [Amazon Redshift limitations](#zero-etl.reqs-lims-redshift)
+ [Amazon SageMaker AI lakehouse limitations](#zero-etl.reqs-lims-sagemaker-)

### General limitations
<a name="zero-etl.reqs-lims-general"></a>
+ The source DB cluster must be in the same Region as the target.
+ You can't rename a DB cluster or any of its instances if it has existing integrations.
+ You can't create multiple integrations between the same source and target databases.
+ You can't delete a DB cluster that has existing integrations. You must delete all associated integrations first.
+ If you stop the source DB cluster, the last few transactions might not be replicated to the target until you resume the cluster.
+ If your cluster is the source of a blue/green deployment, the blue and green environments can't have existing zero-ETL integrations during switchover. You must delete the integration first, perform the switchover, and then recreate the integration.
+ A DB cluster must contain at least one DB instance in order to be the source of an integration.
+ You can't create an integration for a source DB cluster that is a cross-account clone, such as those shared using AWS Resource Access Manager (AWS RAM).
+ If your source cluster is the primary DB cluster in an Aurora global database and it fails over to one of its secondary clusters, the integration becomes inactive. You must delete and recreate the integration.
+ You can't create an integration for a source database that has another integration being actively created.
+ When you initially create an integration, or when a table is being resynchronized, data seeding from the source to the target can take 20-25 minutes or more depending on the size of the source database. This delay can lead to increased replica lag.
+ Some data types aren't supported. For more information, see [Data type differences between Aurora and Amazon Redshift databases](zero-etl.querying.md#zero-etl.data-type-mapping).
+ System tables, temporary tables, and views aren't replicated to target warehouses.
+ Performing DDL commands (for example `ALTER TABLE`) on a source table can trigger a table resynchronization, making the table unavailable for querying while it's resynchronizing. For more information, see [One or more of my Amazon Redshift tables requires a resync](zero-etl.troubleshooting.md#zero-etl.troubleshooting.resync).

### Aurora MySQL limitations
<a name="zero-etl.reqs-lims-mysql"></a>
+ Your source DB cluster must be running a supported version of Aurora MySQL. For a list of supported versions, see [Supported Regions and Aurora DB engines for zero-ETL integrations](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Zero-ETL.md).
+ Zero-ETL integrations rely on MySQL binary logging (binlog) to capture ongoing data changes. Don't use binlog-based data filtering, as it can cause data inconsistencies between the source and target databases.
+ Zero-ETL integrations are supported only for databases configured to use the InnoDB storage engine.
+ Foreign key references with predefined table updates aren't supported. Specifically, `ON DELETE` and `ON UPDATE` rules aren't supported with `CASCADE`, `SET NULL`, and `SET DEFAULT` actions. Attempting to create or update a table with such references to another table will put the table into a failed state.
+ [XA transactions](https://dev.mysql.com/doc/refman/8.0/en/xa.html) performed on the source DB cluster cause the integration to enter a state of `Syncing`.

### Aurora PostgreSQL limitations
<a name="zero-etl.reqs-lims-apg"></a>
+ Your source DB cluster must be running a supported version of Aurora PostgreSQL. For a list of supported versions, see [Supported Regions and Aurora DB engines for zero-ETL integrations](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Zero-ETL.md).
+ If you select an Aurora PostgreSQL source DB cluster, you must specify at least one data filter pattern. At minimum, the pattern must include a single database (`database-name.*.*`) for replication to the target warehouse. For more information, see [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md).
+ All databases created within the source Aurora PostgreSQL DB cluster must use UTF-8 encoding.
+ When using declarative partitioning, the table partitions will be replicated to Amazon Redshift. However, the partitioned table itself isn't replicated to Amazon Redshift.
+ [Two-phase transactions](https://www.postgresql.org/docs/current/two-phase.html) aren't supported.
+ If you delete all DB instances from a DB cluster that is the source of an integration and then re-add a DB instance, replication breaks between the source and the target clusters.
+ The source DB cluster can't use Aurora Limitless Database.
+ Primary keys are required on all tables present in the data filter. Any tables without a primary key will be put into the failed state.

### Amazon Redshift limitations
<a name="zero-etl.reqs-lims-redshift"></a>

For a list of Amazon Redshift limitations related to zero-ETL integrations, see [Considerations when using zero-ETL integrations with Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl.reqs-lims.html) in the *Amazon Redshift Management Guide*.

### Amazon SageMaker AI lakehouse limitations
<a name="zero-etl.reqs-lims-sagemaker-"></a>

The following limitation applies to Amazon SageMaker AI lakehouse zero-ETL integrations.
+ Catalog names are limited to 19 characters in length.

## Quotas
<a name="zero-etl.quotas"></a>

Your account has the following quotas related to Aurora zero-ETL integrations. Each quota is per-Region unless otherwise specified.


| Name | Default | Description | 
| --- | --- | --- | 
| Integrations | 100 | The total number of integrations within an AWS account. | 
| Integrations per target | 50 | The number of integrations sending data to a single target data warehouse or lakehouse. | 
| Integrations per source cluster | 5 | The number of integrations sending data from a single source DB cluster. | 

In addition, the target warehouse places certain limits on the number of tables allowed in each DB instance or cluster node. For more information about Amazon Redshift quotas and limits, see [Quotas and limits in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html) in the *Amazon Redshift Management Guide*.

## Supported Regions
<a name="zero-etl.regions"></a>

Aurora zero-ETL integrations are available in a subset of AWS Regions. For a list of supported Regions, see [Supported Regions and Aurora DB engines for zero-ETL integrations](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Zero-ETL.md).

# Getting started with Aurora zero-ETL integrations
<a name="zero-etl.setting-up"></a>

Before you create a zero-ETL integration, configure your Aurora DB cluster and your data warehouse with the required parameters and permissions. During setup, you'll complete the following steps:

1. [Create a custom DB cluster parameter group](#zero-etl.parameters).

1. [Create a source DB cluster](#zero-etl.create-cluster).

1. [Create a target data warehouse for Amazon Redshift](#zero-etl-setting-up.data-warehouse) or [Create a target Amazon SageMaker Lakehouse](#zero-etl-setting-up.sagemaker).

After you complete these tasks, continue to [Creating Aurora zero-ETL integrations with Amazon Redshift](zero-etl.creating.md) or [Creating Aurora zero-ETL integrations with an Amazon SageMaker lakehouse](zero-etl.creating-smlh.md).

You can use the AWS SDKs to automate the setup process for you. For more information, see [Set up an integration using the AWS SDKs](#zero-etl.setup-sdk).

**Tip**  
You can have RDS complete these setup steps for you while you're creating the integration, rather than performing them manually. To immediately start creating an integration, see [Creating Aurora zero-ETL integrations with Amazon Redshift](zero-etl.creating.md).

For Step 3, you can choose to create either a target data warehouse (Step 3a) or a target lakehouse (Step 3b) depending on your needs:
+ Choose a data warehouse if you need traditional data warehousing capabilities with SQL-based analytics.
+ Choose an Amazon SageMaker Lakehouse if you need machine learning capabilities and want to use lakehouse features for data science and ML workflows.

## Step 1: Create a custom DB cluster parameter group
<a name="zero-etl.parameters"></a>

Aurora zero-ETL integrations require specific values for the DB cluster parameters that control replication. Specifically, Aurora MySQL requires *enhanced binlog* (`aurora_enhanced_binlog`), and Aurora PostgreSQL requires *enhanced logical replication* (`aurora.enhanced_logical_replication`).

To configure binary logging or logical replication, you must first create a custom DB cluster parameter group, and then associate it with the source DB cluster.

**Aurora MySQL (aurora-mysql8.0 family)**:
+ `aurora_enhanced_binlog=1`
+ `binlog_backup=0`
+ `binlog_format=ROW`
+ `binlog_replication_globaldb=0`
+ `binlog_row_image=full`
+ `binlog_row_metadata=full`

In addition, make sure that the `binlog_transaction_compression` parameter is *not* set to `ON`, and that the `binlog_row_value_options` parameter is *not* set to `PARTIAL_JSON`.

For more information about Aurora MySQL enhanced binlog, see [Setting up enhanced binlog for Aurora MySQL](AuroraMySQL.Enhanced.binlog.md).

**Aurora PostgreSQL (aurora-postgresql16 family):**
+ `rds.logical_replication=1`
+ `aurora.enhanced_logical_replication=1`
+ `aurora.logical_replication_backup=0`
+ `aurora.logical_replication_globaldb=0`

When you enable enhanced logical replication (`aurora.enhanced_logical_replication`), Aurora always writes all column values to the write-ahead log (WAL), even if `REPLICA IDENTITY FULL` isn't enabled. This might increase the IOPS for your source DB cluster.

**Important**  
If you enable or disable the `aurora.enhanced_logical_replication` DB cluster parameter, the primary DB instance invalidates all logical replication slots. This stops replication from the source to the target, and you must recreate replication slots on the primary DB instance. To prevent interruptions, keep the parameter state consistent during replication.
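
Before you associate the parameter group with a cluster, you can confirm that it has the required values by querying it with the AWS SDK for Python (Boto3). The following is a minimal sketch that assumes a parameter group named `my-source-param-group` and checks the Aurora MySQL values listed above; swap in the `rds.logical_replication` and `aurora.*` parameters if you created an Aurora PostgreSQL parameter group.

```
import boto3

rds = boto3.client('rds')

# Values that zero-ETL integrations require (Aurora MySQL family shown here).
required = {
    'aurora_enhanced_binlog': '1',
    'binlog_backup': '0',
    'binlog_format': 'ROW',
    'binlog_replication_globaldb': '0',
    'binlog_row_image': 'full',
    'binlog_row_metadata': 'full'
}

# Page through the parameter group and collect the current values.
current = {}
paginator = rds.get_paginator('describe_db_cluster_parameters')
for page in paginator.paginate(DBClusterParameterGroupName='my-source-param-group'):
    for param in page['Parameters']:
        if param['ParameterName'] in required:
            current[param['ParameterName']] = param.get('ParameterValue')

for name, expected in required.items():
    actual = current.get(name)
    print(name + ': ' + ('OK' if actual == expected else 'expected ' + expected + ', found ' + str(actual)))
```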

## Step 2: Select or create a source DB cluster
<a name="zero-etl.create-cluster"></a>

After you create a custom DB cluster parameter group, choose or create an Aurora DB cluster. This cluster will be the source of data replication to the target data warehouse. You can specify a DB cluster that uses provisioned DB instances or Aurora Serverless v2 DB instances as the source. For instructions to create a DB cluster, see [Creating an Amazon Aurora DB cluster](Aurora.CreateInstance.md) or [Creating a DB cluster that uses Aurora Serverless v2](aurora-serverless-v2.create.md). 

The database must be running a supported DB engine version. For a list of supported versions, see [Supported Regions and Aurora DB engines for zero-ETL integrations](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Zero-ETL.md).

When you create the database, under **Additional configuration**, change the default **DB cluster parameter group** to the custom parameter group that you created in the previous step.

**Note**  
If you associate the parameter group with the DB cluster *after* the cluster is already created, you must reboot the primary DB instance in the cluster to apply the changes before you can create a zero-ETL integration. For instructions, see [Rebooting an Amazon Aurora DB cluster or Amazon Aurora DB instance](USER_RebootCluster.md).
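
If the cluster already exists, the following Boto3 sketch shows one way to associate the custom parameter group and reboot the writer instance so that the static parameters take effect. The cluster and parameter group names are placeholders.

```
import boto3

rds = boto3.client('rds')

cluster_id = 'my-source-cluster'        # existing Aurora DB cluster (placeholder)
param_group = 'my-source-param-group'   # custom parameter group from Step 1

# Associate the custom DB cluster parameter group with the cluster.
rds.modify_db_cluster(
    DBClusterIdentifier=cluster_id,
    DBClusterParameterGroupName=param_group,
    ApplyImmediately=True
)

# Find the writer (primary) DB instance and reboot it to apply the new parameters.
cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)['DBClusters'][0]
writer = next(m['DBInstanceIdentifier'] for m in cluster['DBClusterMembers'] if m['IsClusterWriter'])
rds.reboot_db_instance(DBInstanceIdentifier=writer)
print('Rebooting writer instance ' + writer)
```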

## Step 3a: Create a target data warehouse
<a name="zero-etl-setting-up.data-warehouse"></a>

After you create your source DB cluster, you must create and configure a target data warehouse. The data warehouse must meet the following requirements:
+ It must use an RA3 node type with at least two nodes, or Redshift Serverless.
+ It must be encrypted (if it's a provisioned cluster). For more information, see [Amazon Redshift database encryption](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html).

For instructions to create a data warehouse, see [Creating a cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/create-cluster) for provisioned clusters, or [Creating a workgroup with a namespace](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-console-workgroups-create-workgroup-wizard.html) for Redshift Serverless.
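
If you choose Redshift Serverless, the following Boto3 sketch creates a namespace and a workgroup that already satisfy the case sensitivity requirement described in the next section. The names, credentials, and base capacity are placeholder values. Redshift Serverless is encrypted by default, so the encryption requirement above applies only to provisioned clusters.

```
import boto3

redshift_serverless = boto3.client('redshift-serverless')

namespace_name = 'my-target-namespace'   # placeholder
workgroup_name = 'my-target-workgroup'   # placeholder

# Create the namespace, which holds the database objects and users.
redshift_serverless.create_namespace(
    namespaceName=namespace_name,
    adminUsername='username',
    adminUserPassword='Password01**'
)

# Create the workgroup (compute) with case sensitivity enabled up front,
# which zero-ETL integrations require on the target.
redshift_serverless.create_workgroup(
    workgroupName=workgroup_name,
    namespaceName=namespace_name,
    baseCapacity=8,   # example capacity in Redshift Processing Units (RPUs)
    configParameters=[
        {'parameterKey': 'enable_case_sensitive_identifier', 'parameterValue': 'true'}
    ]
)
print('Creating Redshift Serverless target ' + workgroup_name + '/' + namespace_name)
```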

### Enable case sensitivity on the data warehouse
<a name="zero-etl-setting-up.case-sensitivity"></a>

For the integration to be successful, the case sensitivity parameter ([enable_case_sensitive_identifier](https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html)) must be enabled for the data warehouse. By default, case sensitivity is disabled on all provisioned clusters and Redshift Serverless workgroups.

To enable case sensitivity, perform the following steps depending on your data warehouse type:
+ **Provisioned cluster** – To enable case sensitivity on a provisioned cluster, create a custom parameter group with the `enable_case_sensitive_identifier` parameter enabled. Then, associate the parameter group with the cluster. For instructions, see [Managing parameter groups using the console](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-parameter-groups-console.html) or [Configuring parameter values using the AWS CLI](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html#configure-parameters-using-the-cli). A Boto3 sketch of these steps follows this list.
**Note**  
Remember to reboot the cluster after you associate the custom parameter group with it.
+ **Serverless workgroup** – To enable case sensitivity on a Redshift Serverless workgroup, you must use the AWS CLI. The Amazon Redshift console doesn't currently support modifying Redshift Serverless parameter values. Send the following [update-workgroup](https://docs.aws.amazon.com/cli/latest/reference/redshift-serverless/update-workgroup.html) request:

  ```
  aws redshift-serverless update-workgroup \
    --workgroup-name target-workgroup \
    --config-parameters parameterKey=enable_case_sensitive_identifier,parameterValue=true
  ```

  You don't need to reboot a workgroup after you modify its parameter values.
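
The following Boto3 sketch covers the provisioned-cluster path from the first bullet: it creates a custom parameter group with `enable_case_sensitive_identifier` enabled, associates it with an existing cluster, and reboots the cluster. The cluster and parameter group names are placeholders.

```
import boto3

redshift = boto3.client('redshift')

cluster_id = 'my-target-cluster'        # existing provisioned cluster (placeholder)
param_group = 'my-target-param-group'   # custom parameter group to create (placeholder)

# Create a custom parameter group and enable case sensitivity in it.
redshift.create_cluster_parameter_group(
    ParameterGroupName=param_group,
    ParameterGroupFamily='redshift-1.0',
    Description='Enables case sensitivity for zero-ETL integrations'
)
redshift.modify_cluster_parameter_group(
    ParameterGroupName=param_group,
    Parameters=[
        {'ParameterName': 'enable_case_sensitive_identifier', 'ParameterValue': 'true'}
    ]
)

# Associate the parameter group with the cluster, then reboot to apply it.
redshift.modify_cluster(ClusterIdentifier=cluster_id, ClusterParameterGroupName=param_group)
redshift.reboot_cluster(ClusterIdentifier=cluster_id)
```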

### Configure authorization for the data warehouse
<a name="zero-etl.setup-auth"></a>

After you create a data warehouse, you must configure the source Aurora DB cluster as an authorized integration source. For instructions, see [Configure authorization for your Amazon Redshift data warehouse](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.setting-up.html#zero-etl-using.redshift-iam).

## Set up an integration using the AWS SDKs
<a name="zero-etl.setup-sdk"></a>

Rather than setting up each resource manually, you can run the following Python script to automatically set up the required resources for you. The code example uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to create a source Amazon Aurora DB cluster and target data warehouse, each with the required parameter values. It then waits for the databases to be available before creating a zero-ETL integration between them. You can comment out different functions depending on which resources you need to set up. 

To install the required dependency, run the following command. (The script also uses the `time` module, which is part of the Python standard library and doesn't need to be installed.)

```
pip install boto3
```

Within the script, optionally modify the names of the source, target, and parameter groups. The final function creates an integration named `my-integration` after the resources are set up.

### Python code example
<a name="zero-etl.setup-sdk-python"></a>

------
#### [ Aurora MySQL ]

```
import boto3
import time

# Build the client using the default credential configuration.
# You can use the CLI and run 'aws configure' to set access key, secret
# key, and default Region.

rds = boto3.client('rds')
redshift = boto3.client('redshift')
sts = boto3.client('sts')

source_cluster_name = 'my-source-cluster' # A name for the source cluster
source_param_group_name = 'my-source-param-group' # A name for the source parameter group
target_cluster_name = 'my-target-cluster' # A name for the target cluster
target_param_group_name = 'my-target-param-group' # A name for the target parameter group

def create_source_cluster(*args):
    """Creates a source Aurora MySQL DB cluster"""

    response = rds.create_db_cluster_parameter_group(
        DBClusterParameterGroupName=source_param_group_name,
        DBParameterGroupFamily='aurora-mysql8.0',
        Description='For Aurora MySQL binary logging'
    )
    print('Created source parameter group: ' + response['DBClusterParameterGroup']['DBClusterParameterGroupName'])

    response = rds.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=source_param_group_name,
        Parameters=[
            {
                'ParameterName': 'aurora_enhanced_binlog',
                'ParameterValue': '1',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'binlog_backup',
                'ParameterValue': '0',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'binlog_format',
                'ParameterValue': 'ROW',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'binlog_replication_globaldb',
                'ParameterValue': '0',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'binlog_row_image',
                'ParameterValue': 'full',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'binlog_row_metadata',
                'ParameterValue': 'full',
                'ApplyMethod': 'pending-reboot'
            }
        ]
    )
    print('Modified source parameter group: ' + response['DBClusterParameterGroupName'])

    response = rds.create_db_cluster(
        DBClusterIdentifier=source_cluster_name,
        DBClusterParameterGroupName=source_param_group_name,
        Engine='aurora-mysql',
        EngineVersion='8.0.mysql_aurora.3.05.2',
        DatabaseName='myauroradb',
        MasterUsername='username',
        MasterUserPassword='Password01**'
    )
    print('Creating source cluster: ' + response['DBCluster']['DBClusterIdentifier'])
    source_arn = (response['DBCluster']['DBClusterArn'])
    create_target_cluster(target_cluster_name, source_arn, target_param_group_name)

    response = rds.create_db_instance(
        DBInstanceClass='db.r6g.2xlarge',
        DBClusterIdentifier=source_cluster_name,
        DBInstanceIdentifier=source_cluster_name + '-instance',
        Engine='aurora-mysql'
    )
    return(response)

def create_target_cluster(target_cluster_name, source_arn, target_param_group_name):
    """Creates a target Redshift cluster"""

    response = redshift.create_cluster_parameter_group(
        ParameterGroupName=target_param_group_name,
        ParameterGroupFamily='redshift-1.0',
        Description='For Aurora MySQL zero-ETL integrations'
    )
    print('Created target parameter group: ' + response['ClusterParameterGroup']['ParameterGroupName'])

    response = redshift.modify_cluster_parameter_group(
        ParameterGroupName=target_param_group_name,
        Parameters=[
            {
                'ParameterName': 'enable_case_sensitive_identifier',
                'ParameterValue': 'true'
            }
        ]
    )
    print('Modified target parameter group: ' + response['ParameterGroupName'])

    response = redshift.create_cluster(
        ClusterIdentifier=target_cluster_name,
        NodeType='ra3.4xlarge',
        NumberOfNodes=2,
        Encrypted=True,
        MasterUsername='username',
        MasterUserPassword='Password01**',
        ClusterParameterGroupName=target_param_group_name
    )
    print('Creating target cluster: ' + response['Cluster']['ClusterIdentifier'])
    
    # Retrieve the target cluster ARN
    response = redshift.describe_clusters(
        ClusterIdentifier=target_cluster_name
    )
    target_arn = response['Clusters'][0]['ClusterNamespaceArn']

    # Retrieve the current user's account ID
    response = sts.get_caller_identity()
    account_id = response['Account']

    # Create a resource policy specifying cluster ARN and account ID
    response = redshift.put_resource_policy(
        ResourceArn=target_arn,
        Policy='''
        {
            \"Version\":\"2012-10-17\",		 	 	 
            \"Statement\":[
                {\"Effect\":\"Allow\",
                \"Principal\":{
                    \"Service\":\"redshift.amazonaws.com\"
                },
                \"Action\":[\"redshift:AuthorizeInboundIntegration\"],
                \"Condition\":{
                    \"StringEquals\":{
                        \"aws:SourceArn\":\"%s\"}
                    }
                },
                {\"Effect\":\"Allow\",
                \"Principal\":{
                    \"AWS\":\"arn:aws:iam::%s:root\"},
                \"Action\":\"redshift:CreateInboundIntegration\"}
            ]
        }
        ''' % (source_arn, account_id)
    )
    return(response)

def wait_for_cluster_availability(*args):
    """Waits for both clusters to be available"""

    print('Waiting for clusters to be available...')

    response = rds.describe_db_clusters(
        DBClusterIdentifier=source_cluster_name
    )
    source_status = response['DBClusters'][0]['Status']
    source_arn = response['DBClusters'][0]['DBClusterArn']

    response = rds.describe_db_instances(
        DBInstanceIdentifier=source_cluster_name + '-instance'
    )
    source_instance_status = response['DBInstances'][0]['DBInstanceStatus']

    response = redshift.describe_clusters(
        ClusterIdentifier=target_cluster_name
    )
    target_status = response['Clusters'][0]['ClusterStatus']
    target_arn = response['Clusters'][0]['ClusterNamespaceArn']

    # Every 60 seconds, check whether the clusters are available.
    if source_status != 'available' or target_status != 'available' or  source_instance_status != 'available':
        time.sleep(60)
        response = wait_for_cluster_availability(
            source_cluster_name, target_cluster_name)
    else:
        print('Clusters available. Ready to create zero-ETL integration.')
        create_integration(source_arn, target_arn)
        return

def create_integration(source_arn, target_arn):
    """Creates a zero-ETL integration using the source and target clusters"""

    response = rds.create_integration(
        SourceArn=source_arn,
        TargetArn=target_arn,
        IntegrationName='my-integration'
    )
    print('Creating integration: ' + response['IntegrationName'])
    
def main():
    """main function"""
    create_source_cluster(source_cluster_name, source_param_group_name)
    wait_for_cluster_availability(source_cluster_name, target_cluster_name)

if __name__ == "__main__":
    main()
```

------
#### [ Aurora PostgreSQL ]

```
import boto3
import time

# Build the client using the default credential configuration.
# You can use the CLI and run 'aws configure' to set access key, secret
# key, and default Region.

rds = boto3.client('rds')
redshift = boto3.client('redshift')
sts = boto3.client('sts')

source_cluster_name = 'my-source-cluster' # A name for the source cluster
source_param_group_name = 'my-source-param-group' # A name for the source parameter group
target_cluster_name = 'my-target-cluster' # A name for the target cluster
target_param_group_name = 'my-target-param-group' # A name for the target parameter group

def create_source_cluster(*args):
    """Creates a source Aurora PostgreSQL DB cluster"""

    response = rds.create_db_cluster_parameter_group(
        DBClusterParameterGroupName=source_param_group_name,
        DBParameterGroupFamily='aurora-postgresql16',
        Description='For Aurora PostgreSQL logical replication'
    )
    print('Created source parameter group: ' + response['DBClusterParameterGroup']['DBClusterParameterGroupName'])

    response = rds.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=source_param_group_name,
        Parameters=[
            {
                'ParameterName': 'rds.logical_replication',
                'ParameterValue': '1',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'aurora.enhanced_logical_replication',
                'ParameterValue': '1',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'aurora.logical_replication_backup',
                'ParameterValue': '0',
                'ApplyMethod': 'pending-reboot'
            },
            {
                'ParameterName': 'aurora.logical_replication_globaldb',
                'ParameterValue': '0',
                'ApplyMethod': 'pending-reboot'
            }
        ]
    )
    print('Modified source parameter group: ' + response['DBClusterParameterGroupName'])

    response = rds.create_db_cluster(
        DBClusterIdentifier=source_cluster_name,
        DBClusterParameterGroupName=source_param_group_name,
        Engine='aurora-postgresql',
        EngineVersion='16.4',
        DatabaseName='mypostgresdb',
        MasterUsername='username',
        MasterUserPassword='Password01**'
    )
    print('Creating source cluster: ' + response['DBCluster']['DBClusterIdentifier'])
    source_arn = (response['DBCluster']['DBClusterArn'])
    create_target_cluster(target_cluster_name, source_arn, target_param_group_name)

    response = rds.create_db_instance(
        DBInstanceClass='db.r6g.2xlarge',
        DBClusterIdentifier=source_cluster_name,
        DBInstanceIdentifier=source_cluster_name + '-instance',
        Engine='aurora-postgresql'
    )
    return(response)

def create_target_cluster(target_cluster_name, source_arn, target_param_group_name):
    """Creates a target Redshift cluster"""

    response = redshift.create_cluster_parameter_group(
        ParameterGroupName=target_param_group_name,
        ParameterGroupFamily='redshift-1.0',
        Description='For Aurora PostgreSQL zero-ETL integrations'
    )
    print('Created target parameter group: ' + response['ClusterParameterGroup']['ParameterGroupName'])

    response = redshift.modify_cluster_parameter_group(
        ParameterGroupName=target_param_group_name,
        Parameters=[
            {
                'ParameterName': 'enable_case_sensitive_identifier',
                'ParameterValue': 'true'
            }
        ]
    )
    print('Modified target parameter group: ' + response['ParameterGroupName'])

    response = redshift.create_cluster(
        ClusterIdentifier=target_cluster_name,
        NodeType='ra3.4xlarge',
        NumberOfNodes=2,
        Encrypted=True,
        MasterUsername='username',
        MasterUserPassword='Password01**',
        ClusterParameterGroupName=target_param_group_name
    )
    print('Creating target cluster: ' + response['Cluster']['ClusterIdentifier'])
    
    # Retrieve the target cluster ARN
    response = redshift.describe_clusters(
        ClusterIdentifier=target_cluster_name
    )
    target_arn = response['Clusters'][0]['ClusterNamespaceArn']

    # Retrieve the current user's account ID
    response = sts.get_caller_identity()
    account_id = response['Account']

    # Create a resource policy specifying cluster ARN and account ID
    response = redshift.put_resource_policy(
        ResourceArn=target_arn,
        Policy='''
        {
            \"Version\":\"2012-10-17\",		 	 	 
            \"Statement\":[
                {\"Effect\":\"Allow\",
                \"Principal\":{
                    \"Service\":\"redshift.amazonaws.com\"
                },
                \"Action\":[\"redshift:AuthorizeInboundIntegration\"],
                \"Condition\":{
                    \"StringEquals\":{
                        \"aws:SourceArn\":\"%s\"}
                    }
                },
                {\"Effect\":\"Allow\",
                \"Principal\":{
                    \"AWS\":\"arn:aws:iam::%s:root\"},
                \"Action\":\"redshift:CreateInboundIntegration\"}
            ]
        }
        ''' % (source_arn, account_id)
    )
    return(response)

def wait_for_cluster_availability(*args):
    """Waits for both clusters to be available"""

    print('Waiting for clusters to be available...')

    response = rds.describe_db_clusters(
        DBClusterIdentifier=source_cluster_name
    )
    source_status = response['DBClusters'][0]['Status']
    source_arn = response['DBClusters'][0]['DBClusterArn']

    response = rds.describe_db_instances(
        DBInstanceIdentifier=source_cluster_name + '-instance'
    )
    source_instance_status = response['DBInstances'][0]['DBInstanceStatus']

    response = redshift.describe_clusters(
        ClusterIdentifier=target_cluster_name
    )
    target_status = response['Clusters'][0]['ClusterStatus']
    target_arn = response['Clusters'][0]['ClusterNamespaceArn']

    # Every 60 seconds, check whether the clusters are available.
    if source_status != 'available' or target_status != 'available' or  source_instance_status != 'available':
        time.sleep(60)
        response = wait_for_cluster_availability(
            source_cluster_name, target_cluster_name)
    else:
        print('Clusters available. Ready to create zero-ETL integration.')
        create_integration(source_arn, target_arn)
        return

def create_integration(source_arn, target_arn):
    """Creates a zero-ETL integration using the source and target clusters"""

    response = rds.create_integration(
        SourceArn=source_arn,
        TargetArn=target_arn,
        IntegrationName='my-integration'
    )
    print('Creating integration: ' + response['IntegrationName'])
    
def main():
    """main function"""
    create_source_cluster(source_cluster_name, source_param_group_name)
    wait_for_cluster_availability(source_cluster_name, target_cluster_name)

if __name__ == "__main__":
    main()
```

------

## Step 3b: Create an AWS Glue catalog for Amazon SageMaker Lakehouse zero-ETL integration
<a name="zero-etl-setting-up.sagemaker"></a>

When creating a zero-ETL integration with an Amazon SageMaker Lakehouse, you must create an AWS Glue managed catalog in AWS Lake Formation. The target catalog must be an Amazon Redshift managed catalog. To create an Amazon Redshift managed catalog, first create the `AWSServiceRoleForRedshift` service-linked role. In the Lake Formation console, add the `AWSServiceRoleForRedshift` as a read-only administrator.

For more information about the previous tasks, see the following topics.
+ For information about creating an Amazon Redshift managed catalog, see [Creating an Amazon Redshift managed catalog in the AWS Glue Data Catalog](https://docs.aws.amazon.com/lake-formation/latest/dg/create-rms-catalog.html) in the *AWS Lake Formation Developer Guide*.
+ For information about the service-linked role for Amazon Redshift, see [Using service-linked roles for Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/using-service-linked-roles.html) in the *Amazon Redshift Management Guide*.
+ For information about read-only administrator permissions for Lake Formation, see [Lake Formation personas and IAM permissions reference](https://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html) in the *AWS Lake Formation Developer Guide*.
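
As a convenience, the following Boto3 sketch shows one way to perform the two setup tasks mentioned above: creating the `AWSServiceRoleForRedshift` service-linked role and registering it as a Lake Formation read-only administrator. Creating the Amazon Redshift managed catalog itself is done in Lake Formation, as described in the preceding links.

```
import boto3

iam = boto3.client('iam')
lakeformation = boto3.client('lakeformation')
account_id = boto3.client('sts').get_caller_identity()['Account']

# Create the Amazon Redshift service-linked role (skip if it already exists).
try:
    iam.create_service_linked_role(AWSServiceName='redshift.amazonaws.com')
except iam.exceptions.InvalidInputException:
    pass  # The role already exists in this account.

role_arn = ('arn:aws:iam::' + account_id +
            ':role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift')

# Add the service-linked role as a Lake Formation read-only administrator.
settings = lakeformation.get_data_lake_settings()['DataLakeSettings']
admins = settings.get('ReadOnlyAdmins', [])
if not any(a['DataLakePrincipalIdentifier'] == role_arn for a in admins):
    admins.append({'DataLakePrincipalIdentifier': role_arn})
    settings['ReadOnlyAdmins'] = admins
    lakeformation.put_data_lake_settings(DataLakeSettings=settings)
```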

### Configure permissions for the target AWS Glue catalog
<a name="zero-etl-setting-up.sagemaker-permissions"></a>

Before creating a target catalog for zero-ETL integration, you must create the Lake Formation target creation role and the AWS Glue data transfer role. Use the Lake Formation target creation role to create the target catalog. When creating the target catalog, enter the Glue data transfer role in the **IAM role** field in the **Access from engines** section.

#### Lake Formation target creation role
<a name="zero-etl-setting-up.target-creation-role"></a>

The target creation role must be a Lake Formation administrator and requires the following permissions.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "lakeformation:RegisterResource",
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutEncryptionConfiguration",
                "iam:PassRole",
                "glue:CreateCatalog",
                "glue:GetCatalog",
                "s3:PutBucketTagging",
                "s3:PutLifecycleConfiguration",
                "s3:PutBucketPolicy",
                "s3:CreateBucket",
                "redshift-serverless:CreateNamespace",
                "s3:DeleteBucket",
                "s3:PutBucketVersioning",
                "redshift-serverless:CreateWorkgroup"
            ],
            "Resource": [
                "arn:aws:glue:*:111122223333:catalog",
                "arn:aws:glue:*:111122223333:catalog/*",
                "arn:aws:s3:::*",
                "arn:aws:redshift-serverless:*:111122223333:workgroup/*",
                "arn:aws:redshift-serverless:*:111122223333:namespace/*",
                "arn:aws:iam::111122223333:role/GlueDataCatalogDataTransferRole"
            ]
        }
    ]
}
```

------

The target creation role must have the following trust relationship.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        },
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::111122223333:user/Username"
          },
          "Action": "sts:AssumeRole"
        }
    ]
}
```

------

#### Glue data transfer role
<a name="zero-etl-setting-up.glue-data-transfer-role"></a>

The Glue data transfer role is required for MySQL catalog operations and must have the following permissions.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "DataTransferRolePolicy",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey",
                "kms:Decrypt",
                "glue:GetCatalog",
                "glue:GetDatabase"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

------

The Glue data transfer role must have the following trust relationship.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "glue.amazonaws.com",
                    "redshift.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------
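
The following Boto3 sketch shows one way to create the Glue data transfer role with the permissions and trust relationship shown above. The role name `GlueDataCatalogDataTransferRole` matches the name referenced in the target creation role's policy; adjust it if you use a different name.

```
import boto3
import json

iam = boto3.client('iam')

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["glue.amazonaws.com", "redshift.amazonaws.com"]},
        "Action": "sts:AssumeRole"
    }]
}

permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DataTransferRolePolicy",
        "Effect": "Allow",
        "Action": ["kms:GenerateDataKey", "kms:Decrypt", "glue:GetCatalog", "glue:GetDatabase"],
        "Resource": ["*"]
    }]
}

# Create the role with the trust relationship, then attach the inline permissions policy.
iam.create_role(
    RoleName='GlueDataCatalogDataTransferRole',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
iam.put_role_policy(
    RoleName='GlueDataCatalogDataTransferRole',
    PolicyName='DataTransferRolePolicy',
    PolicyDocument=json.dumps(permissions_policy)
)
```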

## Next steps
<a name="zero-etl.setup-next"></a>

With a source Aurora DB cluster and either an Amazon Redshift target data warehouse or an Amazon SageMaker lakehouse, you can create a zero-ETL integration and replicate data. For instructions, see [Creating Aurora zero-ETL integrations with Amazon Redshift](zero-etl.creating.md) or [Creating Aurora zero-ETL integrations with an Amazon SageMaker lakehouse](zero-etl.creating-smlh.md).

# Creating Aurora zero-ETL integrations with Amazon Redshift
<a name="zero-etl.creating"></a>

When you create an Aurora zero-ETL integration, you specify the source Aurora DB cluster and the target Amazon Redshift data warehouse. You can also customize encryption settings and add tags. Aurora creates an integration between the source DB cluster and its target. Once the integration is active, any data that you insert into the source DB cluster will be replicated into the configured Amazon Redshift target.

## Prerequisites
<a name="zero-etl.create-prereqs"></a>

Before you create a zero-ETL integration, you must create a source DB cluster and a target Amazon Redshift data warehouse. You also must allow replication into the data warehouse by adding the DB cluster as an authorized integration source.

For instructions to complete each of these steps, see [Getting started with Aurora zero-ETL integrations](zero-etl.setting-up.md).

## Required permissions
<a name="zero-etl.create-permissions"></a>

Certain IAM permissions are required to create a zero-ETL integration. At minimum, you need permissions to perform the following actions:
+ Create zero-ETL integrations for the source Aurora DB cluster.
+ View and delete all zero-ETL integrations.
+ Create inbound integrations into the target data warehouse.

The following sample policies demonstrate the [least privilege permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) required to create and manage integrations. You might not need these exact permissions if your user or role has broader permissions, such as an `AdministratorAccess` managed policy.

**Note**  
Redshift Amazon Resource Names (ARNs) have the following format. Note the use of a forward slash (`/`) rather than a colon (`:`) before the serverless namespace UUID.  
Provisioned cluster – `arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid`
Serverless – `arn:aws:redshift-serverless:{region}:{account-id}:namespace/namespace-uuid`

### Sample policy for Redshift target
<a name="zero-etl.create-sample-policy"></a>

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "CreateIntegration",
      "Effect": "Allow",
      "Action": [
        "rds:CreateIntegration"
      ],
      "Resource": [
        "arn:aws:rds:us-east-1:123456789012:db:source-db",
        "arn:aws:rds:us-east-1:123456789012:integration:*"
      ]
    },
    {
      "Sid": "DescribeIntegrationDetails",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeIntegrations"
      ],
      "Resource": [
      "arn:aws:rds:us-east-1:123456789012:integration:*"
  ]
    },
    {
      "Sid": "ChangeIntegrationDetails",
      "Effect": "Allow",
      "Action": [
        "rds:DeleteIntegration",
        "rds:ModifyIntegration"
      ],
      "Resource": [
        "arn:aws:rds:us-east-1:123456789012:integration:*"
      ]
    },
    {
      "Sid": "AllowRedShiftIntegration",
      "Effect": "Allow",
      "Action": [
        "redshift:CreateInboundIntegration"
      ],
      "Resource": [
        "arn:aws:redshift:us-east-1:123456789012:namespace:namespace-uuid"
      ]
    }
  ]
}
```

------

### Choosing a target data warehouse in a different account
<a name="zero-etl.create-permissions-cross-account"></a>

If you plan to specify a target Amazon Redshift data warehouse that's in another AWS account, you must create a role that allows users in the current account to access resources in the target account. For more information, see [Providing access to an IAM user in another AWS account that you own](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html).

The role must have the following permissions, which allow the user to view available Amazon Redshift provisioned clusters and Redshift Serverless namespaces in the target account.

#### Required permissions and trust policy
<a name="zero-etl.cross-account-sample-policy"></a>

------
#### [ JSON ]

****  

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "redshift:DescribeClusters",
            "redshift-serverless:ListNamespaces"
         ],
         "Resource":[
            "*"
         ]
      }
   ]
}
```

------

The role must have the following trust policy, which specifies the target account ID.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

For instructions to create the role, see [Creating a role using custom trust policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html).

## Creating zero-ETL integrations
<a name="zero-etl.create"></a>

You can create a zero-ETL integration using the AWS Management Console, the AWS CLI, or the RDS API.

**Important**  
Zero-ETL integrations don't support refresh or resync operations. If you encounter issues with an integration after creation, you must delete the integration and create a new one.

### RDS console
<a name="zero-etl.create-console"></a>

**To create a zero-ETL integration**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. In the left navigation pane, choose **Zero-ETL integrations**.

1. Choose **Create zero-ETL integration**.

1. For **Integration identifier**, enter a name for the integration. The name can have up to 63 alphanumeric characters and can include hyphens.
**Important**  
Catalog names are limited to 19 characters in length. Ensure your integration identifier meets this requirement if it will be used as a catalog name.

1. Choose **Next**.

1. For **Source**, select the Aurora DB cluster where the data will originate from.
**Note**  
RDS notifies you if the DB cluster parameters aren't configured correctly. If you receive this message, you can either choose **Fix it for me**, or configure them manually. For instructions to fix them manually, see [Step 1: Create a custom DB cluster parameter group](zero-etl.setting-up.md#zero-etl.parameters).  
Modifying DB cluster parameters requires a reboot. Before you can create the integration, the reboot must be complete and the new parameter values must be successfully applied to the cluster.

1. (Optional) Select **Customize data filtering options** and add data filters to your integration. You can use data filters to define the scope of replication to the target data warehouse. For more information, see [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md).

1. Once your source DB cluster is successfully configured, choose **Next**.

1. For **Target**, do the following:

   1. (Optional) To use a different AWS account for the Amazon Redshift target, choose **Specify a different account**. Then, enter the ARN of an IAM role with permissions to display your data warehouses. For instructions to create the IAM role, see [Choosing a target data warehouse in a different account](#zero-etl.create-permissions-cross-account).

   1. For **Amazon Redshift data warehouse**, select the target for replicated data from the source DB cluster. You can choose a provisioned Amazon Redshift *cluster* or a Redshift Serverless *namespace* as the target.
**Note**  
RDS notifies you if the resource policy or case sensitivity settings for the specified data warehouse aren't configured correctly. If you receive this message, you can either choose **Fix it for me**, or configure them manually. For instructions to fix them manually, see [Turn on case sensitivity for your data warehouse](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.setting-up.html#zero-etl-setting-up.case-sensitivity) and [Configure authorization for your data warehouse](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.setting-up.html#zero-etl-using.redshift-iam) in the *Amazon Redshift Management Guide*.  
Modifying case sensitivity for a *provisioned* Redshift cluster requires a reboot. Before you can create the integration, the reboot must be complete and the new parameter value must be successfully applied to the cluster.  
If your selected source and target are in different AWS accounts, then Amazon RDS cannot fix these settings for you. You must navigate to the other account and fix them manually in Amazon Redshift.

1. Once your target data warehouse is configured correctly, choose **Next**.

1. (Optional) For **Tags**, add one or more tags to the integration. For more information, see [Tagging Amazon Aurora and Amazon RDS resources](USER_Tagging.md).

1. For **Encryption**, specify how you want your integration to be encrypted. By default, RDS encrypts all integrations with an AWS owned key. To choose a customer managed key instead, enable **Customize encryption settings** and choose a KMS key to use for encryption. For more information, see [Encrypting Amazon Aurora resources](Overview.Encryption.md).

   Optionally, add an encryption context. For more information, see [Encryption context](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#encrypt_context) in the *AWS Key Management Service Developer Guide*.
**Note**  
Amazon RDS adds the following encryption context pairs in addition to any that you add:  
`aws:redshift:integration:arn` - `IntegrationArn`
`aws:servicename:id` - `Redshift`
This reduces the overall number of pairs that you can add from 8 to 6, and contributes to the overall character limit of the grant constraint. For more information, see [Using grant constraints](https://docs.aws.amazon.com/kms/latest/developerguide/create-grant-overview.html#grant-constraints) in the *AWS Key Management Service Developer Guide*.

1. Choose **Next**.

1. Review your integration settings and choose **Create zero-ETL integration**.

   If creation fails, see [I can't create a zero-ETL integration](zero-etl.troubleshooting.md#zero-etl.troubleshooting.creation) for troubleshooting steps.

The integration has a status of `Creating` while it's being created, and the target Amazon Redshift data warehouse has a status of `Modifying`. During this time, you can't query the data warehouse or make any configuration changes on it.

When the integration is successfully created, the status of the integration and the target Amazon Redshift data warehouse both change to `Active`.
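
If you create integrations programmatically, you can poll the integration status with the AWS SDK instead of watching the console. The following Boto3 sketch is a minimal example that assumes an integration named `my-integration`.

```
import boto3
import time

rds = boto3.client('rds')

integration_name = 'my-integration'   # placeholder: the name you gave the integration

while True:
    integrations = rds.describe_integrations()['Integrations']
    integration = next((i for i in integrations if i['IntegrationName'] == integration_name), None)
    if integration is None:
        print('Integration not found yet')
    else:
        print('Integration status: ' + integration['Status'])
        if integration['Status'].lower() in ('active', 'failed', 'needs_attention'):
            break
    time.sleep(60)   # check again in a minute
```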

### AWS CLI
<a name="zero-etl.create-cli"></a>

To create a zero-ETL integration using the AWS CLI, use the [create-integration](https://docs.aws.amazon.com/cli/latest/reference/rds/create-integration.html) command with the following options:

**Note**  
Remember that catalog names are limited to 19 characters. Choose your integration name accordingly if it will be used as a catalog name.
+ `--integration-name` – Specify a name for the integration.
+ `--source-arn` – Specify the ARN of the Aurora DB cluster that will be the source for the integration.
+ `--target-arn` – Specify the ARN of the Amazon Redshift data warehouse that will be the target for the integration.

**Example**  
For Linux, macOS, or Unix:  

```
aws rds create-integration \
    --integration-name my-integration \
    --source-arn arn:aws:rds:{region}:{account-id}:cluster:my-db \
    --target-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid
```
For Windows:  

```
aws rds create-integration ^
    --integration-name my-integration ^
    --source-arn arn:aws:rds:{region}:{account-id}:cluster:my-db ^
    --target-arn arn:aws:redshift:{region}:{account-id}:namespace:namespace-uuid
```

### RDS API
<a name="zero-etl.create-api"></a>

To create a zero-ETL integration by using the Amazon RDS API, use the [CreateIntegration](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateIntegration.html) operation with the following parameters:

**Note**  
Catalog names are limited to 19 characters. Ensure your IntegrationName parameter meets this requirement if it will be used as a catalog name.
+ `IntegrationName` – Specify a name for the integration.
+ `SourceArn` – Specify the ARN of the Aurora DB cluster that will be the source for the integration.
+ `TargetArn` – Specify the ARN of the Amazon Redshift data warehouse that will be the target for the integration.
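
The following Boto3 sketch calls the same operation through the AWS SDK for Python. The ARNs and the optional settings (data filter, customer managed key, encryption context, and tags) are placeholder values; omit any optional parameters that you don't need.

```
import boto3

rds = boto3.client('rds')

response = rds.create_integration(
    IntegrationName='my-integration',
    # Placeholder ARNs -- replace with your source DB cluster and target namespace.
    SourceArn='arn:aws:rds:us-east-1:123456789012:cluster:my-source-cluster',
    TargetArn='arn:aws:redshift-serverless:us-east-1:123456789012:namespace/namespace-uuid',
    # Optional settings. The three-part filter pattern follows the Aurora PostgreSQL form;
    # Aurora MySQL filters use database.table patterns instead.
    DataFilter='include: mydb.*.*',
    KMSKeyId='1234abcd-12ab-34cd-56ef-1234567890ab',
    AdditionalEncryptionContext={'department': 'analytics'},
    Tags=[{'Key': 'Environment', 'Value': 'production'}]
)
print('Creating integration: ' + response['IntegrationName'])
```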

## Encrypting integrations with a customer managed key
<a name="zero-etl.create-encrypt"></a>

If you specify a customer managed key rather than an AWS owned key when you create an integration, the key policy must provide the Amazon Redshift service principal access to the `CreateGrant` action. In addition, it must allow the current user or role to perform the `DescribeKey` and `CreateGrant` actions.

The following sample policy demonstrates how to provide the required permissions in the key policy. It includes context keys to further reduce the scope of permissions.

### Sample key policy
<a name="zero-etl.kms-sample-policy"></a>

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Id": "Key policy",
    "Statement": [
        {
            "Sid": "Enables IAM user permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allows the Redshift service principal to add a grant to a KMS key",
            "Effect": "Allow",
            "Principal": {
                "Service": "redshift.amazonaws.com"
            },
            "Action": "kms:CreateGrant",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "kms:EncryptionContext:{context-key}": "{context-value}"
                },
                "ForAllValues:StringEquals": {
                    "kms:GrantOperations": [
                        "Decrypt",
                        "GenerateDataKey",
                        "CreateGrant"
                    ]
                }
            }
        },
        {
            "Sid": "Allows the current user or role to add a grant to a KMS key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/{role-name}"
            },
            "Action": "kms:CreateGrant",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "kms:EncryptionContext:{context-key}": "{context-value}",
                    "kms:ViaService": "rds.us-east-1.amazonaws.com"
                },
                "ForAllValues:StringEquals": {
                    "kms:GrantOperations": [
                        "Decrypt",
                        "GenerateDataKey",
                        "CreateGrant"
                    ]
                }
            }
        },
        {
            "Sid": "Allows the current uer or role to retrieve information about a KMS key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/{role-name}"
            },
            "Action": "kms:DescribeKey",
            "Resource": "*"
        }
    ]
}
```

------

For more information, see [Creating a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-overview.html) in the *AWS Key Management Service Developer Guide*.
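
If you manage key policies from the command line, you can apply the updated policy document with the AWS KMS `put-key-policy` command. The following is a minimal sketch; the key ID and policy file name are placeholders.

```
# Apply an updated key policy from a local JSON file
# (the key ID and file name are placeholders).
aws kms put-key-policy \
    --key-id 1234abcd-12ab-34cd-56ef-1234567890ab \
    --policy-name default \
    --policy file://zero-etl-key-policy.json
```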

## Next steps
<a name="zero-etl.create-next"></a>

After you successfully create a zero-ETL integration, you must create a destination database within your target Amazon Redshift cluster or workgroup. Then, you can start adding data to the source Aurora DB cluster and querying it in Amazon Redshift. For instructions, see [Creating destination databases in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.creating-db.html).

# Creating Aurora zero-ETL integrations with an Amazon SageMaker lakehouse
<a name="zero-etl.creating-smlh"></a>

When you create an Aurora zero-ETL integration with an Amazon SageMaker lakehouse, you specify the source Aurora DB cluster and the target AWS Glue managed catalog. You can also customize encryption settings and add tags. Aurora creates an integration between the source DB cluster and its target. Once the integration is active, any data that you insert into the source DB cluster will be replicated into the configured target.

## Prerequisites
<a name="zero-etl.create-prereqs-smlh"></a>

Before you create a zero-ETL integration with an Amazon SageMaker lakehouse, you must create a source DB cluster and a target AWS Glue managed catalog. You also must allow replication into the catalog by adding the DB cluster as an authorized integration source.

For instructions to complete each of these steps, see [Getting started with Aurora zero-ETL integrations](zero-etl.setting-up.md).

## Required permissions
<a name="zero-etl.create-permissions-smlh"></a>

Certain IAM permissions are required to create a zero-ETL integration with an Amazon SageMaker lakehouse. At minimum, you need permissions to perform the following actions:
+ Create zero-ETL integrations for the source Aurora DB cluster.
+ View and delete all zero-ETL integrations.
+ Create inbound integrations into the target AWS Glue managed catalog.
+ Access Amazon S3 buckets used by the AWS Glue managed catalog.
+ Use AWS KMS keys for encryption if custom encryption is configured.
+ Register resources with Lake Formation.
+ Put a resource policy on the AWS Glue managed catalog to authorize inbound integrations.

The following sample policy demonstrates the [least privilege permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) required to create and manage integrations with an Amazon SageMaker lakehouse. You might not need these exact permissions if your user or role has broader permissions, such as an `AdministratorAccess` managed policy.

Additionally, you must configure a resource policy on the target AWS Glue managed catalog to authorize inbound integrations. Use the following AWS CLI command to apply the resource policy.

### Sample AWS CLI command to authorize inbound integrations on the target catalog
<a name="zero-etl.create-sample-policy-smlh"></a>

```
aws glue put-resource-policy \
      --policy-in-json  '{
    "Version": "2012-10-17",		 	 	 
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Service": "glue.amazonaws.com"
        },
        "Action": [
            "glue:AuthorizeInboundIntegration"
        ],
        "Resource": ["arn:aws:glue:region:account_id:catalog/catalog_name"],
        "Condition": {
            "StringEquals": {
                "aws:SourceArn": "arn:aws:rds:region:account_id:db:source_name"
            }
        }
    },
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "account_id"
        },
        "Action": ["glue:CreateInboundIntegration"],
        "Resource": ["arn:aws:glue:region:account_id:catalog/catalog_name"]
    }
    ]
}' \
      --region region
```

**Note**  
Glue catalog Amazon Resource Names (ARNs) have the following format:  
Glue catalog – `arn:aws:glue:{region}:{account-id}:catalog/catalog-name`

### Choosing a target AWS Glue managed catalog in a different account
<a name="zero-etl.create-permissions-cross-account-smlh"></a>

If you plan to specify a target AWS Glue managed catalog that's in another AWS account, you must create a role that allows users in the current account to access resources in the target account. For more information, see [Providing access to an IAM user in another AWS account that you own](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html).

The role must have the following permissions, which allow the user to view available AWS Glue catalogs in the target account.

#### Required permissions and trust policy
<a name="zero-etl.cross-account-sample-policy-smlh"></a>

------
#### [ JSON ]

****  

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "glue:GetCatalog"
         ],
         "Resource":[
            "*"
         ]
      }
   ]
}
```

------

The role must have the following trust policy, which specifies the target account ID.

------
#### [ JSON ]

****  

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "AWS": "arn:aws:iam::111122223333:root"
         },
         "Action":"sts:AssumeRole"
      }
   ]
}
```

------

For instructions to create the role, see [Creating a role using custom trust policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-custom.html).
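
As a rough sketch, you could also create such a role with the AWS CLI. The role name and policy file names below are placeholders, and the policy documents are the permissions and trust policies shown above.

```
# Create the cross-account role with the trust policy shown above
# (role name and file names are placeholders).
aws iam create-role \
    --role-name CrossAccountGlueCatalogRole \
    --assume-role-policy-document file://trust-policy.json

# Attach the inline permissions policy that allows glue:GetCatalog.
aws iam put-role-policy \
    --role-name CrossAccountGlueCatalogRole \
    --policy-name AllowGetCatalog \
    --policy-document file://permissions-policy.json
```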

## Creating zero-ETL integrations with an Amazon SageMaker lakehouse
<a name="zero-etl.create-smlh"></a>

You can create a zero-ETL integration with an Amazon SageMaker lakehouse using the AWS Management Console, the AWS CLI, or the RDS API.

**Important**  
Zero-ETL integrations with an Amazon SageMaker lakehouse do not support refresh or resync operations. If you encounter issues with an integration after creation, you must delete the integration and create a new one.

### RDS console
<a name="zero-etl.create-console-smlh"></a>

**To create a zero-ETL integration with an Amazon SageMaker lakehouse**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. In the left navigation pane, choose **Zero-ETL integrations**.

1. Choose **Create zero-ETL integration**.

1. For **Integration identifier**, enter a name for the integration. The name can have up to 63 alphanumeric characters and can include hyphens.

1. Choose **Next**.

1. For **Source**, select the Aurora DB cluster where the data will originate from.
**Note**  
RDS notifies you if the DB cluster parameters aren't configured correctly. If you receive this message, you can either choose **Fix it for me**, or configure them manually. For instructions to fix them manually, see [Step 1: Create a custom DB cluster parameter group](zero-etl.setting-up.md#zero-etl.parameters).  
Modifying DB cluster parameters requires a reboot. Before you can create the integration, the reboot must be complete and the new parameter values must be successfully applied to the cluster.

1. (Optional) Select **Customize data filtering options** and add data filters to your integration. You can use data filters to define the scope of replication to the target Amazon SageMaker lakehouse. For more information, see [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md).

1. Once your source DB cluster is successfully configured, choose **Next**.

1. For **Target**, do the following:

   1. (Optional) To use a different AWS account for the Amazon SageMaker lakehouse target, choose **Specify a different account**. Then, enter the ARN of an IAM role with permissions to display your AWS Glue catalogs. For instructions to create the IAM role, see [Choosing a target AWS Glue managed catalog in a different account](#zero-etl.create-permissions-cross-account-smlh).

   1. For **AWS Glue catalog**, select the target for replicated data from the source DB cluster. You can choose an existing AWS Glue managed catalog as the target.

   1. The target IAM role needs describe permissions on the target catalog. At minimum, it must have the following permissions:

------
#### [ JSON ]

****  

      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": "glue:GetCatalog",
                  "Resource": [
                      "arn:aws:glue:us-east-1:111122223333:catalog/*",
                      "arn:aws:glue:us-east-1:111122223333:catalog"
                  ]
              }
          ]
      }
      ```

------

      The target IAM role must have the following trust relationship:

------
#### [ JSON ]

****  

      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "glue.amazonaws.com"
                  },
                  "Action": "sts:AssumeRole"
              }
          ]
      }
      ```

------

   1. Using the Lake Formation administrator role created in [Step 3b: Create an AWS Glue catalog for Amazon SageMaker Lakehouse zero-ETL integration](zero-etl.setting-up.md#zero-etl-setting-up.sagemaker), grant the target IAM role describe permissions for the target AWS Glue managed catalog.
**Note**  
RDS notifies you if the resource policy or configuration settings for the specified AWS Glue managed catalog aren't configured correctly. If you receive this message, you can either choose **Fix it for me**, or configure them manually.  
If your selected source and target are in different AWS accounts, then Amazon RDS cannot fix these settings for you. You must navigate to the other account and fix them manually in SageMaker Unified Studio.

1. Once your target AWS Glue managed catalog is configured correctly, choose **Next**.

1. (Optional) For **Tags**, add one or more tags to the integration. For more information, see [Tagging Amazon Aurora and Amazon RDS resources](USER_Tagging.md).

1. For **Encryption**, specify how you want your integration to be encrypted. By default, RDS encrypts all integrations with an AWS owned key. To choose a customer managed key instead, enable **Customize encryption settings** and choose a KMS key to use for encryption. For more information, see [Encrypting Amazon Aurora resources](Overview.Encryption.md).

   Optionally, add an encryption context. For more information, see [Encryption context](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#encrypt_context) in the *AWS Key Management Service Developer Guide*.
**Note**  
Amazon RDS adds the following encryption context pairs in addition to any that you add:  
`aws:glue:integration:arn` - `IntegrationArn`
`aws:servicename:id` - `glue`
This reduces the overall number of pairs that you can add from 8 to 6, and contributes to the overall character limit of the grant constraint. For more information, see [Using grant constraints](https://docs.aws.amazon.com/kms/latest/developerguide/create-grant-overview.html#grant-constraints) in the *AWS Key Management Service Developer Guide*.

1. Choose **Next**.

1. Review your integration settings and choose **Create zero-ETL integration**.

   If creation fails, see [Troubleshooting Aurora zero-ETL integrations](zero-etl.troubleshooting.md) for troubleshooting steps.

The integration has a status of `Creating` while it's being created, and the target Amazon SageMaker lakehouse has a status of `Modifying`. During this time, you can't query the catalog or make any configuration changes on it.

When the integration is successfully created, the status of the integration and the target Amazon SageMaker lakehouse both change to `Active`.

### AWS CLI
<a name="zero-etl.create-cli-smlh"></a>

To prepare a target AWS Glue managed catalog for a zero-ETL integration using the AWS CLI, first use the AWS Glue [create-integration-resource-property](https://docs.aws.amazon.com/cli/latest/reference/rds/create-integration.html) command with the following options:
+ `--resource-arn` – Specify the ARN of the AWS Glue managed catalog that will be the target for the integration.
+ `--target-processing-properties` – Specify the ARN of the IAM role that is used to access the target AWS Glue managed catalog.

```
aws glue create-integration-resource-property \
    --region us-east-1 \
    --resource-arn arn:aws:glue:region:account_id:catalog/catalog_name \
    --target-processing-properties '{"RoleArn": "arn:aws:iam::account_id:role/TargetIamRole"}'
```

To create a zero-ETL integration with an Amazon SageMaker lakehouse using the AWS CLI, use the [create-integration](https://docs.aws.amazon.com/cli/latest/reference/rds/create-integration.html) command with the following options:
+ `--integration-name` – Specify a name for the integration.
+ `--source-arn` – Specify the ARN of the Aurora DB cluster that will be the source for the integration.
+ `--target-arn` – Specify the ARN of the AWS Glue managed catalog that will be the target for the integration.

**Example**  
For Linux, macOS, or Unix:  

```
aws rds create-integration \
    --integration-name my-sagemaker-integration \
    --source-arn arn:aws:rds:{region}:{account-id}:cluster:my-db \
    --target-arn arn:aws:glue:{region}:{account-id}:catalog/catalog-name
```
For Windows:  

```
aws rds create-integration ^
    --integration-name my-sagemaker-integration ^
    --source-arn arn:aws:rds:{region}:{account-id}:cluster:my-db ^
    --target-arn arn:aws:glue:{region}:{account-id}:catalog/catalog-name
```
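
If you want the integration encrypted with a customer managed key, the same command can also reference the key and an optional encryption context. The following is a hedged sketch; the `--kms-key-id` and `--additional-encryption-context` options mirror the `KMSKeyId` and `AdditionalEncryptionContext` API parameters, and the key ID and context pair are placeholders.

```
# Create the integration with a customer managed key and an encryption context
# (key ID and context values are placeholders).
aws rds create-integration \
    --integration-name my-sagemaker-integration \
    --source-arn arn:aws:rds:{region}:{account-id}:cluster:my-db \
    --target-arn arn:aws:glue:{region}:{account-id}:catalog/catalog-name \
    --kms-key-id 1234abcd-12ab-34cd-56ef-1234567890ab \
    --additional-encryption-context department=finance
```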

### RDS API
<a name="zero-etl.create-api-smlh"></a>

To create a zero-ETL integration with Amazon SageMaker by using the Amazon RDS API, use the [https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateIntegration.html](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_CreateIntegration.html) operation with the following parameters:

**Note**  
Catalog names are limited to 19 characters. Ensure your IntegrationName parameter meets this requirement if it will be used as a catalog name.
+ `IntegrationName` – Specify a name for the integration.
+ `SourceArn` – Specify the ARN of the Aurora DB cluster that will be the source for the integration.
+ `TargetArn` – Specify the ARN of the AWS Glue managed catalog that will be the target for the integration.

## Encrypting integrations with a customer managed key
<a name="zero-etl.create-encrypt-smlh"></a>

If you specify a customer managed KMS key rather than an AWS owned key when you create an integration with Amazon SageMaker, the key policy must give the SageMaker Unified Studio service principal access to the `CreateGrant` action. In addition, it must allow the current user or role to perform the `DescribeKey` and `CreateGrant` actions.

The following sample policy demonstrates how to provide the required permissions in the key policy. It includes context keys to further reduce the scope of permissions.

### Sample key policy
<a name="zero-etl.kms-sample-policy"></a>

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Id": "Key policy",
    "Statement": [
        {
            "Sid": "EnablesIAMUserPermissions",
            "Effect": "Allow",
            "Principal": {
            "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "GlueServicePrincipalAddGrant",
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "kms:CreateGrant",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "kms:EncryptionContext:{context-key}":"{context-value}"
                },
                "ForAllValues:StringEquals": {
                    "kms:GrantOperations": [
                        "Decrypt",
                        "GenerateDataKey",
                        "CreateGrant"
                    ]
                }
            }
        },
        {
            "Sid": "AllowsCurrentUserRoleAddGrantKMSKey",
            "Effect": "Allow",
            "Principal": {
            "AWS": "arn:aws:iam::111122223333:role/{role-name}"
            },
            "Action": "kms:CreateGrant",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "kms:EncryptionContext:{context-key}":"{context-value}",
                    "kms:ViaService": "rds.us-east-1.amazonaws.com"
                },
                "ForAllValues:StringEquals": {
                    "kms:GrantOperations": [
                        "Decrypt",
                        "GenerateDataKey",
                        "CreateGrant"
                    ]
                }
            }
        },
        {
            "Sid": "AllowsCurrentUserRoleRetrieveKMSKeyInformation",
            "Effect": "Allow",
            "Principal": {
            "AWS": "arn:aws:iam::111122223333:role/{role-name}"
            },
            "Action": "kms:DescribeKey",
            "Resource": "*"
        }
    ]
}
```

------

For more information, see [Creating a key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policy-overview.html) in the *AWS Key Management Service Developer Guide*.

## Next steps
<a name="zero-etl.create-next-smlh"></a>

After you successfully create a zero-ETL integration with Amazon SageMaker, you can start adding data to the source Aurora DB cluster and querying it in your Amazon SageMaker lakehouse. The data will be automatically replicated and made available for analytics and machine learning workloads.

# Data filtering for Aurora zero-ETL integrations
<a name="zero-etl.filtering"></a>

Aurora zero-ETL integrations support data filtering, which lets you control which data is replicated from your source Aurora DB cluster to your target data warehouse. Instead of replicating the entire database, you can apply one or more filters to selectively include or exclude specific tables. This helps you optimize storage and query performance by ensuring that only relevant data is transferred. Currently, filtering is limited to the database and table levels. Column- and row-level filtering are not supported.

Data filtering can be useful when you want to:
+ Join certain tables from two or more different source clusters, and you don't need complete data from either cluster.
+ Save costs by performing analytics using only a subset of tables rather than an entire fleet of databases.
+ Filter out sensitive information—such as phone numbers, addresses, or credit card details—from certain tables.

You can add data filters to a zero-ETL integration using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the Amazon RDS API.

If the integration has a provisioned cluster as its target, the cluster must be on [patch 180](https://docs.aws.amazon.com/redshift/latest/mgmt/cluster-versions.html#cluster-version-180) or higher to use data filtering.

**Topics**
+ [Format of a data filter](#zero-etl.filtering-format)
+ [Filter logic](#zero-etl.filtering-evaluate)
+ [Filter precedence](#zero-etl.filtering-precedence)
+ [Aurora MySQL examples](#zero-etl.filtering-examples-mysql)
+ [Aurora PostgreSQL examples](#zero-etl.filtering-examples-postgres)
+ [Adding data filters to an integration](#zero-etl.add-filter)
+ [Removing data filters from an integration](#zero-etl.remove-filter)

## Format of a data filter
<a name="zero-etl.filtering-format"></a>

You can define multiple filters for a single integration. Each filter either includes or excludes any existing and future database tables that match one of the patterns in the filter expression. Aurora zero-ETL integrations use [Maxwell filter syntax](https://maxwells-daemon.io/filtering/) for data filtering.

Each filter has the following elements:


| Element | Description | 
| --- | --- | 
| Filter type |  An `Include` filter type *includes* all tables that match one of the patterns in the filter expression. An `Exclude` filter type *excludes* all tables that match one of the patterns.  | 
| Filter expression |  A comma-separated list of patterns. Expressions must use [Maxwell filter syntax](https://maxwells-daemon.io/filtering/).  | 
| Pattern |  A filter pattern in the format `database.table` for Aurora MySQL, or `database.schema.table` for Aurora PostgreSQL. You can specify literal names, or define regular expressions.  For Aurora MySQL, regular expressions are supported in both the database and table name. For Aurora PostgreSQL, regular expressions are supported only in the schema and table name, not in the database name.  You can't include column-level filters or denylists. A single integration can have a maximum of 99 total patterns. In the console, you can enter patterns within a single filter expression, or spread them out among multiple expressions. A single pattern can't exceed 256 characters in length.  | 

**Important**  
If you select an Aurora PostgreSQL source DB cluster, you must specify at least one data filter pattern. At minimum, the pattern must include a single database (`database-name.*.*`) for replication to the target data warehouse.

The following image shows the structure of Aurora MySQL data filters in the console:

![\[Data filters for a zero-ETL integration\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-filter.png)


**Important**  
Do not include personally identifying, confidential, or sensitive information in your filter patterns.

### Data filters in the AWS CLI
<a name="zero-etl.filtering-cli"></a>

When using the AWS CLI to add a data filter, the syntax differs slightly from the console. You must assign a filter type (`Include` or `Exclude`) to each pattern individually, so you can't group multiple patterns under one filter type.

For example, in the console you can group the following comma-separated patterns under a single `Include` statement:

**Aurora MySQL**

```
mydb.mytable, mydb./table_\d+/
```

**Aurora PostgreSQL**

```
mydb.myschema.mytable, mydb.myschema./table_\d+/
```

However, when using the AWS CLI, the same data filter must be in the following format:

**Aurora MySQL**

```
'include: mydb.mytable, include: mydb./table_\d+/'
```

**Aurora PostgreSQL**

```
'include: mydb.myschema.mytable, include: mydb.myschema./table_\d+/'
```

## Filter logic
<a name="zero-etl.filtering-evaluate"></a>

If you don't specify any data filters in your integration, Aurora assumes a default filter of `include:*.*`, which replicates all tables to the target data warehouse. However, if you add at least one filter, the default logic switches to `exclude:*.*`, which excludes all tables by default. This lets you explicitly define which databases and tables to include in replication.

For example, if you define the following filter:

```
'include: db.table1, include: db.table2'
```

Aurora evaluates the filter as follows:

```
'exclude:*.*, include: db.table1, include: db.table2'
```

Therefore, Aurora only replicates `table1` and `table2` from the database named `db` to the target data warehouse.

## Filter precedence
<a name="zero-etl.filtering-precedence"></a>

Aurora evaluates data filters in the order you specify. In the AWS Management Console, it processes filter expressions from left to right and top to bottom. A second filter or an individual pattern that follows the first can override it.

For example, if the first filter is `Include books.stephenking`, it includes only the `stephenking` table from the `books` database. However, if you add a second filter, `Exclude books.*`, it overrides the first filter. This prevents any tables from the `books` database from being replicated to the target data warehouse.

When you specify at least one filter, the logic starts by assuming `exclude:*.*` by default, which automatically *excludes* all tables from replication. As a best practice, define filters from broadest to most specific. Start with one or more `Include` statements to specify the data to replicate, then add `Exclude` filters to selectively remove certain tables.

The same principle applies to filters that you define using the AWS CLI. Aurora evaluates these filter patterns in the order that you specify them, so a pattern might override one that you specify before it.

## Aurora MySQL examples
<a name="zero-etl.filtering-examples-mysql"></a>

The following examples demonstrate how data filtering works for Aurora MySQL zero-ETL integrations:
+  Include all databases and all tables:

  ```
  'include: *.*'
  ```
+  Include all tables within the `books` database:

  ```
  'include: books.*'
  ```
+ Exclude any tables named `mystery`:

  ```
  'include: *.*, exclude: *.mystery'
  ```
+ Include two specific tables within the `books` database:

  ```
  'include: books.stephen_king, include: books.carolyn_keene'
  ```
+ Include all tables in the `books` database, except for those containing the substring `mystery`:

  ```
  'include: books.*, exclude: books./.*mystery.*/'
  ```
+ Include all tables in the `books` database, except those starting with `mystery`:

  ```
  'include: books.*, exclude: books./mystery.*/'
  ```
+ Include all tables in the `books` database, except those ending with `mystery`:

  ```
  'include: books.*, exclude: books./.*mystery/'
  ```
+ Include all tables in the `books` database that start with `table_`, except for the one named `table_stephen_king`. For example, `table_movies` or `table_books` would be replicated, but not `table_stephen_king`.

  ```
  'include: books./table_.*/, exclude: books.table_stephen_king'
  ```

## Aurora PostgreSQL examples
<a name="zero-etl.filtering-examples-postgres"></a>

The following examples demonstrate how data filtering works for Aurora PostgreSQL zero-ETL integrations:
+ Include all tables within the `books` database:

  ```
  'include: books.*.*'
  ```
+ Exclude any tables named `mystery` in the `books` database:

  ```
  'include: books.*.*, exclude: books.*.mystery'
  ```
+ Include one table within the `books` database in the `mystery` schema, and one table within `employee` database in the `finance` schema:

  ```
  'include: books.mystery.stephen_king, include: employee.finance.benefits'
  ```
+ Include all tables in the `books` database and `science_fiction` schema, except for those containing the substring `king`:

  ```
  'include: books.science_fiction.*, exclude: books.*./.*king.*/'
  ```
+ Include all tables in the `books` database, except those with a schema name starting with `sci`:

  ```
  'include: books.*.*, exclude: books./sci.*/.*'
  ```
+ Include all tables in the `books` database, except those in the `mystery` schema ending with `king`:

  ```
  'include: books.*.*, exclude: books.mystery./.*king/'
  ```
+ Include all tables in the `books` database that start with `table_`, except for the one named `table_stephen_king`. For example, `table_movies` in the `fiction` schema and `table_books` in the `mystery` schema are replicated, but not `table_stephen_king` in either schema:

  ```
  'include: books.*./table_.*/, exclude: books.*.table_stephen_king'
  ```

## Adding data filters to an integration
<a name="zero-etl.add-filter"></a>

You can configure data filtering using the AWS Management Console, the AWS CLI, or the Amazon RDS API. 

**Important**  
If you add a filter after you create an integration, Aurora treats it as if it always existed. It removes any data in the target data warehouse that doesn’t match the new filtering criteria and resynchronizes all affected tables.

### RDS console
<a name="add-filter-console"></a>

**To add data filters to a zero-ETL integration**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. In the navigation pane, choose **Zero-ETL integrations**. Select the integration that you want to add data filters to, and then choose **Modify**.

1. Under **Source**, add one or more `Include` and `Exclude` statements.

   The following image shows an example of data filters for a MySQL integration:  
![\[Data filters for a zero-ETL integration in the RDS console\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-filter-data.png)

1. When you're satisfied with the changes, choose **Continue** and **Save changes**.

### AWS CLI
<a name="add-filter-cli"></a>

To add data filters to a zero-ETL integration using the AWS CLI, call the [modify-integration](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/rds/modify-integration.html) command. In addition to the integration identifier, specify the `--data-filter` parameter with a comma-separated list of `Include` and `Exclude` Maxwell filters.

**Example**  
The following example adds filter patterns to `my-integration`.  
For Linux, macOS, or Unix:  

```
aws rds modify-integration \
    --integration-identifier my-integration \
    --data-filter 'include: foodb.*, exclude: foodb.tbl, exclude: foodb./table_\d+/'
```
For Windows:  

```
aws rds modify-integration ^
    --integration-identifier my-integration ^
    --data-filter 'include: foodb.*, exclude: foodb.tbl, exclude: foodb./table_\d+/'
```

### RDS API
<a name="add-filter-api"></a>

To modify a zero-ETL integration using the RDS API, call the [ModifyIntegration](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_ModifyIntegration.html) operation. Specify the integration identifier and provide a comma-separated list of filter patterns.

## Removing data filters from an integration
<a name="zero-etl.remove-filter"></a>

When you remove a data filter from an integration, Aurora reevaluates the remaining filters as if the removed filter never existed. It then replicates any previously excluded data that now meets the criteria into the target data warehouse. This triggers a resynchronization of all affected tables.

# Adding data to a source Aurora DB cluster and querying it
<a name="zero-etl.querying"></a>

To finish creating a zero-ETL integration that replicates data from Amazon Aurora into Amazon Redshift, you must create a database in the target destination.

For connections with Amazon Redshift, connect to your Amazon Redshift cluster or workgroup and create a database with a reference to your integration identifier. Then, you can add data to your source Aurora DB cluster and see it replicated in Amazon Redshift or Amazon SageMaker.

**Topics**
+ [Creating a target database](#zero-etl.create-db)
+ [Adding data to the source DB cluster](#zero-etl.add-data-rds)
+ [Querying your Aurora data in Amazon Redshift](#zero-etl.query-data-redshift)
+ [Data type differences between Aurora and Amazon Redshift databases](#zero-etl.data-type-mapping)
+ [DDL operations for Aurora PostgreSQL](#zero-etl.ddl-postgres)

## Creating a target database
<a name="zero-etl.create-db"></a>

After you create an integration, you must create a database in your target data warehouse before you can start replicating data into Amazon Redshift. This database must include a reference to the integration identifier. You can use the Amazon Redshift console or the Query editor v2 to create the database.

For instructions to create a destination database, see [Create a destination database in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.creating-db.html#zero-etl-using.create-db).
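
If you prefer the command line, you can run the statement through the Amazon Redshift Data API. The following is a minimal sketch; the workgroup name, database names, and integration ID are placeholders. For an Aurora PostgreSQL source, the statement also identifies the source named database, for example by appending `DATABASE 'named_db'`.

```
# Create the destination database from the integration by using the Redshift Data API
# (workgroup name, database names, and integration ID are placeholders).
aws redshift-data execute-statement \
    --workgroup-name my-workgroup \
    --database dev \
    --sql "CREATE DATABASE destination_database FROM INTEGRATION 'ee605691-6c47-48e8-8622-83f99b1af374';"
```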

## Adding data to the source DB cluster
<a name="zero-etl.add-data-rds"></a>

After you configure your integration, you can populate the source Aurora DB cluster with data that you want to replicate into your data warehouse.

**Note**  
There are differences between data types in Amazon Aurora and the target analytics warehouse. For a table of data type mappings, see [Data type differences between Aurora and Amazon Redshift databases](#zero-etl.data-type-mapping).

First, connect to the source DB cluster using the MySQL or PostgreSQL client of your choice. For instructions, see [Connecting to an Amazon Aurora DB cluster](Aurora.Connecting.md).

Then, create a table and insert a row of sample data.

**Important**  
Make sure that the table has a primary key. Otherwise, it can't be replicated to the target data warehouse.

The pg\_dump and pg\_restore PostgreSQL utilities initially create tables without a primary key and then add it afterwards. If you're using one of these utilities, we recommend first creating a schema and then loading data in a separate command.
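
For example, a minimal sketch of that approach with standard PostgreSQL client tools (the source host, user, and database names are placeholders):

```
# Dump and load the schema first so that primary keys exist before any rows arrive.
pg_dump --schema-only -h source-host -U postgres -d sourcedb -f schema.sql
psql -h mycluster.cluster-123456789012.us-east-2.rds.amazonaws.com -U postgres -d named_db -f schema.sql

# Then load the data in a separate step.
pg_dump --data-only -h source-host -U postgres -d sourcedb -f data.sql
psql -h mycluster.cluster-123456789012.us-east-2.rds.amazonaws.com -U postgres -d named_db -f data.sql
```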

**MySQL**

The following example uses the [MySQL Workbench utility](https://dev.mysql.com/downloads/workbench/).

```
CREATE DATABASE my_db;

USE my_db;

CREATE TABLE books_table (ID int NOT NULL, Title VARCHAR(50) NOT NULL, Author VARCHAR(50) NOT NULL,
Copyright INT NOT NULL, Genre VARCHAR(50) NOT NULL, PRIMARY KEY (ID));

INSERT INTO books_table VALUES (1, 'The Shining', 'Stephen King', 1977, 'Supernatural fiction');
```

**PostgreSQL**

The following example uses the `[psql](https://www.postgresql.org/docs/current/app-psql.html)` PostgreSQL interactive terminal. When connecting to the cluster, include the named database that you specified when creating the integration.

```
psql -h mycluster.cluster-123456789012.us-east-2.rds.amazonaws.com -p 5432 -U username -d named_db;

named_db=> CREATE TABLE books_table (ID int NOT NULL, Title VARCHAR(50) NOT NULL, Author VARCHAR(50) NOT NULL,
Copyright INT NOT NULL, Genre VARCHAR(50) NOT NULL, PRIMARY KEY (ID));

named_db=> INSERT INTO books_table VALUES (1, 'The Shining', 'Stephen King', 1977, 'Supernatural fiction');
```

## Querying your Aurora data in Amazon Redshift
<a name="zero-etl.query-data-redshift"></a>

After you add data to the Aurora DB cluster, it's replicated into the destination database and is ready to be queried.

**To query the replicated data**

1. Navigate to the Amazon Redshift console and choose **Query editor v2** from the left navigation pane.

1. Connect to your cluster or workgroup and choose your destination database (which you created from the integration) from the dropdown menu (**destination\_database** in this example). For instructions to create a destination database, see [Create a destination database in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.creating-db.html#zero-etl-using.create-db).

1. Use a SELECT statement to query your data. In this example, you can run the following command to select all data from the table that you created in the source Aurora DB cluster:

   ```
   SELECT * from my_db."books_table";
   ```  
![\[Run a SELECT statement within the query editor. The result is a single row of sample data that was added to the Amazon RDS database.\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-redshift-editor.png)
   + `my_db` is the Aurora database schema name. This option is only needed for MySQL databases.
   + `books_table` is the Aurora table name.

You can also query the data using a command line client. For example:

```
destination_database=# select * from my_db."books_table";

 ID |    Title    |    Author    | Copyright |        Genre         | txn_seq | txn_id
----+-------------+--------------+-----------+----------------------+---------+--------
  1 | The Shining | Stephen King |      1977 | Supernatural fiction |       2 |  12192
```

**Note**  
For case-sensitivity, use double quotes (" ") for schema, table, and column names. For more information, see [enable\_case\_sensitive\_identifier](https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html).

## Data type differences between Aurora and Amazon Redshift databases
<a name="zero-etl.data-type-mapping"></a>

The following tables show the mappings of Aurora MySQL and Aurora PostgreSQL data types to corresponding destination data types. *Amazon Aurora currently supports only these data types for zero-ETL integrations.*

If a table in your source DB cluster includes an unsupported data type, the table goes out of sync and isn't available in the target destination. Streaming from the source to the target continues, but the table with the unsupported data type remains unavailable. To fix the table and make it available in the target destination, manually revert the breaking change and then refresh the integration by running `[ALTER DATABASE...INTEGRATION REFRESH](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_DATABASE.html)`.

**Note**  
You can't refresh zero-ETL integrations with an Amazon SageMaker lakehouse. Instead, delete and try to create the integration again.

**Topics**
+ [Aurora MySQL](#zero-etl.data-type-mapping-mysql)
+ [Aurora PostgreSQL](#zero-etl.data-type-mapping-postgres)

### Aurora MySQL
<a name="zero-etl.data-type-mapping-mysql"></a>

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.querying.html)

### Aurora PostgreSQL
<a name="zero-etl.data-type-mapping-postgres"></a>

Zero-ETL integrations for Aurora PostgreSQL don't support custom data types or data types created by extensions.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.querying.html)

## DDL operations for Aurora PostgreSQL
<a name="zero-etl.ddl-postgres"></a>

Amazon Redshift is derived from PostgreSQL, so it shares several features with Aurora PostgreSQL. Zero-ETL integrations take advantage of these similarities to streamline data replication from Aurora PostgreSQL to Amazon Redshift, mapping databases by name and using the shared database, schema, and table structure.

Consider the following points when managing Aurora PostgreSQL zero-ETL integrations:
+ Isolation is managed at the database level.
+ Replication occurs at the database level. 
+ Aurora PostgreSQL databases are mapped to Amazon Redshift databases by name. If you rename the source database, data flows to the Amazon Redshift database that corresponds to the new name.

Despite their similarities, Amazon Redshift and Aurora PostgreSQL have important differences. The following sections outline Amazon Redshift system responses for common DDL operations.

**Topics**
+ [Database operations](#zero-etl.ddl-postgres-database)
+ [Schema operations](#zero-etl.ddl-postgres-schema)
+ [Table operations](#zero-etl.ddl-postgres-table)

### Database operations
<a name="zero-etl.ddl-postgres-database"></a>

The following table shows the system responses for database DDL operations.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.querying.html)

### Schema operations
<a name="zero-etl.ddl-postgres-schema"></a>

The following table shows the system responses for schema DDL operations.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.querying.html)

### Table operations
<a name="zero-etl.ddl-postgres-table"></a>

The following table shows the system responses for table DDL operations.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/zero-etl.querying.html)

# Viewing and monitoring Aurora zero-ETL integrations
<a name="zero-etl.describingmonitoring"></a>

You can view the details of an Amazon Aurora zero-ETL integration to see its configuration information and current status. You can also monitor the status of your integration by querying specific system views in Amazon Redshift. In addition, Amazon Redshift publishes certain integration-related metrics to Amazon CloudWatch, which you can view within the Amazon Redshift console.

**Topics**
+ [Viewing integrations](#zero-etl.describing)
+ [Monitoring integrations using system tables for Amazon Redshift](#zero-etl.monitoring)
+ [Monitoring integrations with Amazon EventBridge for Amazon Redshift](#zero-etl.eventbridge)

## Viewing integrations
<a name="zero-etl.describing"></a>

You can view Aurora zero-ETL integrations using the AWS Management Console, the AWS CLI, or the RDS API.

### Console
<a name="zero-etl.describing-console"></a>

**To view the details of a zero-ETL integration**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. From the left navigation pane, choose **Zero-ETL integrations**. 

1. Select an integration to view more details about it, such as its source DB cluster and target data warehouse.  
![\[Details about a zero-ETL integration\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/images/zero-etl-integration-view.png)

An integration can have the following statuses:
+ `Creating` – The integration is being created.
+ `Active` – The integration is sending transactional data to the target data warehouse.
+ `Syncing` – The integration has encountered a recoverable error and is reseeding data. Affected tables aren't available for querying until they finish resyncing.
+ `Needs attention` – The integration encountered an event or error that requires manual intervention to resolve it. To fix the issue, follow the instructions in the error message on the integration details page.
+ `Failed` – The integration encountered an unrecoverable event or error that can't be fixed. You must delete and recreate the integration.
+ `Deleting` – The integration is being deleted.

### AWS CLI
<a name="zero-etl.describing-cli"></a>

To view zero-ETL integrations in the current account using the AWS CLI, use the [describe-integrations](https://docs.aws.amazon.com/cli/latest/reference/rds/describe-integrations.html) command. To view the details of a specific integration, specify the `--integration-identifier` option.

**Example**  
For Linux, macOS, or Unix:  

```
aws rds describe-integrations \
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374
```
For Windows:  

```
aws rds describe-integrations ^
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374
```

### RDS API
<a name="zero-etl.describing-api"></a>

To view a zero-ETL integration using the Amazon RDS API, use the [https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeIntegrations.html](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DescribeIntegrations.html) operation with the `IntegrationIdentifier` parameter.

## Monitoring integrations using system tables for Amazon Redshift
<a name="zero-etl.monitoring"></a>

Amazon Redshift has system tables and views that contain information about how the system is functioning. You can query these system tables and views the same way that you would query any other database table. For more information about system tables and views in Amazon Redshift, see [System tables and views reference](https://docs.aws.amazon.com//redshift/latest/dg/cm_chap_system-tables.html) in the *Amazon Redshift Database Developer Guide*.

You can query the following system views and tables to get information about your Aurora zero-ETL integrations:
+ [SVV\_INTEGRATION](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_INTEGRATION.html) – Provides configuration details for your integrations.
+ [SVV\_INTEGRATION\_TABLE\_STATE](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_INTEGRATION_TABLE_STATE.html) – Describes the state of each table within an integration.
+ [SYS\_INTEGRATION\_TABLE\_STATE\_CHANGE](https://docs.aws.amazon.com/redshift/latest/dg/r_SYS_INTEGRATION_TABLE_STATE_CHANGE.html) – Displays table state change logs for an integration.
+ [SYS\_INTEGRATION\_ACTIVITY](https://docs.aws.amazon.com/redshift/latest/dg/r_SYS_INTEGRATION_ACTIVITY.html) – Provides information about completed integration runs.
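
For example, a quick way to check integration configuration and state from the command line is to run a query against SVV\_INTEGRATION through the Amazon Redshift Data API. In the following sketch, the workgroup and database names and the statement ID are placeholders.

```
# Query the SVV_INTEGRATION view through the Redshift Data API
# (workgroup and database names are placeholders).
aws redshift-data execute-statement \
    --workgroup-name my-workgroup \
    --database destination_database \
    --sql "SELECT * FROM svv_integration;"

# Retrieve the result by using the statement ID returned by the previous command.
aws redshift-data get-statement-result --id statement-id
```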

All integration-related Amazon CloudWatch metrics originate from Amazon Redshift. For more information, see [Metrics for zero-ETL integrations](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.monitoring.html) in the *Amazon Redshift Management Guide*. Currently, Amazon Aurora doesn't publish any integration metrics to CloudWatch.

## Monitoring integrations with Amazon EventBridge for Amazon Redshift
<a name="zero-etl.eventbridge"></a>

Amazon Redshift sends integration-related events to Amazon EventBridge. For a list of events and their corresponding event IDs, see [Zero-ETL integration event notifications with Amazon EventBridge](https://docs.aws.amazon.com/redshift/latest/mgmt/integration-event-notifications) in the *Amazon Redshift Management Guide*.
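
As a hedged sketch, you could route these events to a notification target with an EventBridge rule. The rule name and SNS topic ARN below are placeholders, and the event pattern simply matches all Amazon Redshift events rather than specific integration event types.

```
# Create a rule that matches Amazon Redshift events and send them to an SNS topic
# (rule name and topic ARN are placeholders).
aws events put-rule \
    --name redshift-zero-etl-events \
    --event-pattern '{"source": ["aws.redshift"]}'

aws events put-targets \
    --rule redshift-zero-etl-events \
    --targets "Id"="1","Arn"="arn:aws:sns:us-east-1:111122223333:my-sns-topic"
```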

# Modifying Aurora zero-ETL integrations
<a name="zero-etl.modifying"></a>

You can modify only the name, description, and data filtering options for a zero-ETL integration in a supported data warehouse. You can't modify the AWS KMS key used to encrypt the integration, or the source or target databases.

If you add a data filter to an existing integration, Aurora reevaluates the filter as if it always existed. It removes any data that is currently in the target data warehouse that doesn't match the new filtering criteria. If you *remove* a data filter from an integration, it replicates any data that previously didn't match the filtering criteria (but now does) into the target data warehouse. For more information, see [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md).

You can modify a zero-ETL integration using the AWS Management Console, the AWS CLI, or the Amazon RDS API.

## RDS console
<a name="modify-integration-console"></a>

**To modify a zero-ETL integration**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. In the navigation pane, choose **Zero-ETL integrations**, and then choose the integration that you want to modify. 

1. Choose **Modify** and make modifications to any available settings.

1. When all the changes are as you want them, choose **Modify**.

## AWS CLI
<a name="modify-integration-cli"></a>

To modify a zero-ETL integration using the AWS CLI, call the [modify-integration](https://docs.aws.amazon.com/cli/latest/reference/rds/modify-integration.html) command. Along with the `--integration-identifier`, specify any of the following options:
+ `--integration-name` – Specify a new name for the integration.
+ `--description` – Specify a new description for the integration.
+ `--data-filter` – Specify data filtering options for the integration. For more information, see [Data filtering for Aurora zero-ETL integrations](zero-etl.filtering.md).

**Example**  
The following request modifies an existing integration.  
For Linux, macOS, or Unix:  

```
aws rds modify-integration \
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374 \
    --integration-name my-renamed-integration
```
For Windows:  

```
aws rds modify-integration ^
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374 ^
    --integration-name my-renamed-integration
```

## RDS API
<a name="modify-integration-api"></a>

To modify a zero-ETL integration using the RDS API, call the [ModifyIntegration](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_ModifyIntegration.html) operation. Specify the integration identifier, and the parameters that you want to modify.

# Deleting Aurora zero-ETL integrations
<a name="zero-etl.deleting"></a>

When you delete a zero-ETL integration, Amazon Aurora removes it from the source Aurora DB cluster. Your transactional data isn't deleted from Amazon Aurora or the analytics destination, but Aurora doesn't send new data to Amazon Redshift or Amazon SageMaker.

You can only delete an integration when it has a status of `Active`, `Failed`, `Syncing`, or `Needs attention`.

You can delete zero-ETL integrations using the AWS Management Console, the AWS CLI, or the RDS API.

## Console
<a name="zero-etl.deleting-console"></a>

**To delete a zero-ETL integration**

1. Sign in to the AWS Management Console and open the Amazon RDS console at [https://console.aws.amazon.com/rds/](https://console.aws.amazon.com/rds/).

1. From the left navigation pane, choose **Zero-ETL integrations**. 

1. Select the zero-ETL integration that you want to delete. 

1. Choose **Actions**, **Delete**, and confirm deletion.

## AWS CLI
<a name="zero-etl.deleting-cli"></a>

To delete a zero-ETL integration, use the [delete-integration](https://docs.aws.amazon.com/cli/latest/reference/rds/delete-integration.html) command and specify the `--integration-identifier` option.

**Example**  
For Linux, macOS, or Unix:  

```
aws rds delete-integration \
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374
```
For Windows:  

```
aws rds delete-integration ^
    --integration-identifier ee605691-6c47-48e8-8622-83f99b1af374
```

## RDS API
<a name="zero-etl.deleting-api"></a>

To delete a zero-ETL integration using the Amazon RDS API, use the [https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DeleteIntegration.html](https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DeleteIntegration.html) operation with the `IntegrationIdentifier` parameter.

# Troubleshooting Aurora zero-ETL integrations
<a name="zero-etl.troubleshooting"></a>

You can check the state of a zero-ETL integration by querying the [SVV\_INTEGRATION](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_INTEGRATION.html) system table in the analytics destination. If the `state` column has a value of `ErrorState`, it means something's wrong. For more information, see [Monitoring integrations using system tables for Amazon Redshift](zero-etl.describingmonitoring.md#zero-etl.monitoring).

Use the following information to troubleshoot common issues with Aurora zero-ETL integrations.

**Important**  
Resync and refresh operations are not available for zero-ETL integrations with an Amazon SageMaker AI lakehouse. If there are issues with an integration, you must delete the integration and create a new integration. You can't refresh or resync an existing integration.

**Topics**
+ [I can't create a zero-ETL integration](#zero-etl.troubleshooting.creation)
+ [My integration is stuck in a state of `Syncing`](#zero-etl.troubleshooting.syncing)
+ [My tables aren't replicating to Amazon Redshift](#zero-etl.troubleshooting.primarykey)
+ [One or more of my Amazon Redshift tables requires a resync](#zero-etl.troubleshooting.resync)
+ [Integration failed issues for Amazon SageMaker AI lakehouse zero-ETL integrations](#zero-etl.troubleshooting.integration-issues)
+ [DDL changes are in Amazon Redshift before the DDL transaction is complete for Aurora PostgreSQL](#zero-etl.troubleshooting.ddl)

## I can't create a zero-ETL integration
<a name="zero-etl.troubleshooting.creation"></a>

If you can't create a zero-ETL integration, make sure that the following are correct for your source database:
+ Your source database must be running a supported DB engine version. For a list of supported versions, see [Supported Regions and Aurora DB engines for zero-ETL integrations](Concepts.Aurora_Fea_Regions_DB-eng.Feature.Zero-ETL.md).
+  You correctly configured DB parameters. If the required parameters are set incorrectly or not associated with the database, creation fails. See [Step 1: Create a custom DB cluster parameter group](zero-etl.setting-up.md#zero-etl.parameters).

In addition, make sure the following are correct for your target data warehouse:
+ Case sensitivity is enabled. See [Turn on case sensitivity for your data warehouse](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.setting-up.html#zero-etl-setting-up.case-sensitivity).
+ You added the correct authorized principal and integration source. See [Configure authorization for your Amazon Redshift data warehouse](https://docs.aws.amazon.com/redshift/latest/mgmt/zero-etl-using.setting-up.html#zero-etl-using.redshift-iam).
+ The data warehouse is encrypted (if it's a provisioned cluster). See [Amazon Redshift database encryption](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html).
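
The following sketch shows a couple of quick CLI checks against a provisioned target cluster; the cluster identifier and parameter group name are placeholders.

```
# Confirm that the provisioned cluster is encrypted (cluster identifier is a placeholder).
aws redshift describe-clusters \
    --cluster-identifier my-cluster \
    --query "Clusters[0].Encrypted"

# Confirm that case sensitivity is turned on in the cluster parameter group.
aws redshift describe-cluster-parameters \
    --parameter-group-name my-parameter-group \
    --query "Parameters[?ParameterName=='enable_case_sensitive_identifier']"
```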

## My integration is stuck in a state of `Syncing`
<a name="zero-etl.troubleshooting.syncing"></a>

Your integration might consistently show a status of `Syncing` if you change the value of one of the required DB parameters.

To fix this issue, check the values of the parameters in the parameter group associated with the source DB cluster, and make sure that they match the required values. For more information, see [Step 1: Create a custom DB cluster parameter group](zero-etl.setting-up.md#zero-etl.parameters).

If you modify any parameters, make sure to reboot the DB cluster to apply the changes.
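
For example, a minimal sketch of checking the parameter values and applying a change from the CLI follows. The parameter group name, query filter, and instance identifier are placeholders, and the filter shown fits Aurora MySQL sources.

```
# List the replication-related parameters in the cluster parameter group
# (group name and query filter are placeholders).
aws rds describe-db-cluster-parameters \
    --db-cluster-parameter-group-name my-zero-etl-params \
    --query "Parameters[?contains(ParameterName, 'binlog')].[ParameterName,ParameterValue]"

# After correcting a parameter value, reboot the writer DB instance so that the change takes effect.
aws rds reboot-db-instance --db-instance-identifier my-cluster-writer-instance
```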

## My tables aren't replicating to Amazon Redshift
<a name="zero-etl.troubleshooting.primarykey"></a>

If you don't see one or more tables reflected in Amazon Redshift, you can run the following command to resynchronize them:

```
ALTER DATABASE dbname INTEGRATION REFRESH TABLES table1, table2;
```

For more information, see [ALTER DATABASE](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_DATABASE.html) in the Amazon Redshift SQL reference.

Your data might not be replicating because one or more of your source tables doesn't have a primary key. The monitoring dashboard in Amazon Redshift displays the status of these tables as `Failed`, and the status of the overall zero-ETL integration changes to `Needs attention`. To resolve this issue, you can identify an existing key in your table that can become a primary key, or you can add a synthetic primary key. For detailed solutions, see the following resources:
+ [Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift](https://aws.amazon.com/blogs/database/handle-tables-without-primary-keys-while-creating-amazon-aurora-mysql-or-amazon-rds-for-mysql-zero-etl-integrations-with-amazon-redshift/)
+ [Handle tables without primary keys while creating Amazon Aurora PostgreSQL zero-ETL integrations with Amazon Redshift](https://aws.amazon.com/blogs/database/handle-tables-without-primary-keys-while-creating-amazon-aurora-postgresql-zero-etl-integrations-with-amazon-redshift/)

## One or more of my Amazon Redshift tables requires a resync
<a name="zero-etl.troubleshooting.resync"></a>

Running certain commands on your source database might require your tables to be resynchronized. In these cases, the [SVV\_INTEGRATION\_TABLE\_STATE](https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_INTEGRATION_TABLE_STATE.html) system view shows a `table_state` of `ResyncRequired`, which means that the integration must completely reload data for that specific table from MySQL to Amazon Redshift.

When the table starts to resynchronize, it enters a state of `Syncing`. You don't need to take any manual action to resynchronize a table. While table data is resynchronizing, you can't access it in Amazon Redshift.

The following are some example operations that can put a table into a `ResyncRequired` state, and possible alternatives to consider.


| Operation | Example | Alternative | 
| --- | --- | --- | 
| Adding a column into a specific position  |  <pre>ALTER TABLE table_name<br />  ADD COLUMN column_name INTEGER<br />  NOT NULL first;</pre>  | Amazon Redshift doesn't support adding columns into specific positions using first or after keywords. If the order of columns in the target table isn't critical, add the column to the end of the table using a simpler command:<pre>ALTER TABLE table_name<br />  ADD COLUMN column_name column_type;</pre> | 
| Adding a timestamp column with the default CURRENT\_TIMESTAMP |  <pre>ALTER TABLE table_name<br />  ADD COLUMN column_name TIMESTAMP<br />  NOT NULL DEFAULT CURRENT_TIMESTAMP;</pre>  | The CURRENT\_TIMESTAMP value for existing table rows is calculated by Aurora MySQL and can't be simulated in Amazon Redshift without full table data resynchronization. If possible, switch the default value to a literal constant like `2023-01-01 00:00:15` to avoid latency in table availability. | 
| Performing multiple column operations within a single command |  <pre>ALTER TABLE table_name<br />  ADD COLUMN column_1,<br />  RENAME COLUMN column_2 TO column_3;</pre>  | Consider splitting the command into two separate operations, ADD and RENAME, which won't require resynchronization. | 

## Integration failed issues for Amazon SageMaker AI lakehouse zero-ETL integrations
<a name="zero-etl.troubleshooting.integration-issues"></a>

If you encounter issues with an existing zero-ETL integration with an Amazon SageMaker AI lakehouse, the only resolution is to delete the integration and create a new one. Unlike zero-ETL integrations with Amazon Redshift, integrations with an Amazon SageMaker AI lakehouse don't support refresh or resync operations.

To resolve integration issues:

1. Delete the problematic zero-ETL integration using the console, CLI, or API.

1. Verify that the source database and target data warehouse configurations are correct.

1. Create a new zero-ETL integration with the same or updated configuration.

This process will result in a complete re-initialization of the data pipeline, which may take time depending on the size of your source database.

## DDL changes are in Amazon Redshift before the DDL transaction is complete for Aurora PostgreSQL
<a name="zero-etl.troubleshooting.ddl"></a>

DDL changes can appear in Amazon Redshift before a DDL operation finishes in Aurora PostgreSQL zero-ETL integrations. For more information, see [DDL operations for Aurora PostgreSQL](zero-etl.querying.md#zero-etl.ddl-postgres).