

# Targets for data migration
<a name="CHAP_Target"></a>

AWS Database Migration Service (AWS DMS) can use many of the most popular databases as a target for data replication. The target can be on an Amazon Elastic Compute Cloud (Amazon EC2) instance, an Amazon Relational Database Service (Amazon RDS) instance, or an on-premises database. 

For a comprehensive list of valid targets, see [Targets for AWS DMS](CHAP_Introduction.Targets.md).

**Note**  
AWS DMS doesn't support migration across AWS Regions for the following target endpoint types:
+ Amazon DynamoDB
+ Amazon OpenSearch Service
+ Amazon Kinesis Data Streams

**Note**  
Amazon Aurora PostgreSQL Limitless is available as a target for AWS Database Migration Service (AWS DMS). For more information, see [Using a PostgreSQL database as a target for AWS Database Migration Service](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.PostgreSQL.html).

**Topics**
+ [Using an Oracle database as a target for AWS Database Migration Service](CHAP_Target.Oracle.md)
+ [Using a Microsoft SQL Server database as a target for AWS Database Migration Service](CHAP_Target.SQLServer.md)
+ [Using a PostgreSQL database as a target for AWS Database Migration Service](CHAP_Target.PostgreSQL.md)
+ [Using a MySQL-compatible database as a target for AWS Database Migration Service](CHAP_Target.MySQL.md)
+ [Using an Amazon Redshift database as a target for AWS Database Migration Service](CHAP_Target.Redshift.md)
+ [Using a SAP ASE database as a target for AWS Database Migration Service](CHAP_Target.SAP.md)
+ [Using Amazon S3 as a target for AWS Database Migration Service](CHAP_Target.S3.md)
+ [Using an Amazon DynamoDB database as a target for AWS Database Migration Service](CHAP_Target.DynamoDB.md)
+ [Using Amazon Kinesis Data Streams as a target for AWS Database Migration Service](CHAP_Target.Kinesis.md)
+ [Using Apache Kafka as a target for AWS Database Migration Service](CHAP_Target.Kafka.md)
+ [Using an Amazon OpenSearch Service cluster as a target for AWS Database Migration Service](CHAP_Target.Elasticsearch.md)
+ [Using Amazon DocumentDB as a target for AWS Database Migration Service](CHAP_Target.DocumentDB.md)
+ [Using Amazon Neptune as a target for AWS Database Migration Service](CHAP_Target.Neptune.md)
+ [Using Redis OSS as a target for AWS Database Migration Service](CHAP_Target.Redis.md)
+ [Using Babelfish as a target for AWS Database Migration Service](CHAP_Target.Babelfish.md)
+ [Using Amazon Timestream as a target for AWS Database Migration Service](CHAP_Target.Timestream.md)
+ [Using Amazon RDS for Db2 and IBM Db2 LUW as a target for AWS DMS](CHAP_Target.DB2.md)

# Using an Oracle database as a target for AWS Database Migration Service
<a name="CHAP_Target.Oracle"></a>

You can migrate data to Oracle database targets using AWS DMS, either from another Oracle database or from one of the other supported databases. You can use Secure Sockets Layer (SSL) to encrypt connections between your Oracle endpoint and the replication instance. For more information on using SSL with an Oracle endpoint, see [Using SSL with AWS Database Migration Service](CHAP_Security.SSL.md). AWS DMS also supports the use of Oracle transparent data encryption (TDE) to encrypt data at rest in the target database because Oracle TDE does not require an encryption key or password to write to the database.

For information about versions of Oracle that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md). 

When you use Oracle as a target, we assume that the data is to be migrated into the schema or user that is used for the target connection. If you want to migrate data to a different schema, use a schema transformation to do so. For example, suppose that your target endpoint connects to the user `RDSMASTER` and you want to migrate from the user `PERFDATA1` to `PERFDATA2`. In this case, create a transformation like the following.

```
{
   "rule-type": "transformation",
   "rule-id": "2",
   "rule-name": "2",
   "rule-action": "rename",
   "rule-target": "schema",
   "object-locator": {
      "schema-name": "PERFDATA1"
   },
   "value": "PERFDATA2"
}
```

When using Oracle as a target, AWS DMS migrates all tables and indexes to default table and index tablespaces in the target. If you want to migrate tables and indexes to different table and index tablespaces, use a tablespace transformation to do so. For example, suppose that you have a set of tables in the `INVENTORY` schema assigned to some tablespaces in the Oracle source. For the migration, you want to assign all of these tables to a single `INVENTORYSPACE` tablespace in the target. In this case, create a transformation like the following.

```
{
   "rule-type": "transformation",
   "rule-id": "3",
   "rule-name": "3",
   "rule-action": "rename",
   "rule-target": "table-tablespace",
   "object-locator": {
      "schema-name": "INVENTORY",
      "table-name": "%",
      "table-tablespace-name": "%"
   },
   "value": "INVENTORYSPACE"
}
```

For more information about transformations, see [Specifying table selection and transformations rules using JSON](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.md).

If Oracle is both source and target, you can preserve existing table or index tablespace assignments by setting the Oracle source extra connection attribute, `enableHomogenousTablespace=true`. For more information, see [Endpoint settings when using Oracle as a source for AWS DMS](CHAP_Source.Oracle.md#CHAP_Source.Oracle.ConnectionAttrib).

For additional details on working with Oracle databases as a target for AWS DMS, see the following sections: 

**Topics**
+ [Limitations on Oracle as a target for AWS Database Migration Service](#CHAP_Target.Oracle.Limitations)
+ [User account privileges required for using Oracle as a target](#CHAP_Target.Oracle.Privileges)
+ [Configuring an Oracle database as a target for AWS Database Migration Service](#CHAP_Target.Oracle.Configuration)
+ [Endpoint settings when using Oracle as a target for AWS DMS](#CHAP_Target.Oracle.ConnectionAttrib)
+ [Target data types for Oracle](#CHAP_Target.Oracle.DataTypes)

## Limitations on Oracle as a target for AWS Database Migration Service
<a name="CHAP_Target.Oracle.Limitations"></a>

Limitations when using Oracle as a target for data migration include the following:
+ AWS DMS doesn't create schemas on the target Oracle database, so any schema that you want to migrate to must already exist on the target. Tables from the source schema are imported to the user or schema that AWS DMS uses to connect to the target instance. To migrate multiple schemas, create multiple replication tasks. You can also migrate data to different schemas on the target by using schema transformation rules in the AWS DMS table mappings.
+ AWS DMS doesn't support the `Use direct path full load` option for tables with INDEXTYPE CONTEXT. As a workaround, you can use array load. 
+ With the batch optimized apply option, loading into the net changes table uses a direct path, which doesn't support XML type. As a workaround, you can use transactional apply mode.
+ Empty strings migrated from source databases can be treated differently by the Oracle target (converted to one-space strings, for example). This can result in AWS DMS validation reporting a mismatch.
+ You can express the maximum number of columns per table supported in batch optimized apply mode using the following formula:

  ```
  2 * columns_in_original_table + columns_in_primary_key <= 999
  ```

  For example, if the original table has 25 columns and its Primary Key consists of 5 columns, then the total number of columns is 55. If a table exceeds the supported number of columns, then all of the changes are applied in one-by-one mode.
+ AWS DMS doesn't support Autonomous DB on Oracle Cloud Infrastructure (OCI).
+ In transactional apply mode, an Oracle target can process DML statements up to 32 KB in size. While this limit is sufficient for many use cases, DML statements exceeding 32 KB will fail with the error: "ORA-01460: unimplemented or unreasonable conversion requested." To resolve this issue, you must enable the batch apply feature by setting the `BatchApplyEnabled` task setting to `true`. Batch apply reduces the overall statement size, allowing you to bypass the 32 KB limitation. For more information, see [Target metadata task settings](CHAP_Tasks.CustomizingTasks.TaskSettings.TargetMetadata.md).
+ A direct path full load of tables with LOB columns may fail with error ORA-39777, because LOB data requires special handling during the direct path load process. This error can disrupt migration tasks involving LOB columns. To resolve it, disable the `useDirectPathFullLoad` setting on the target endpoint and retry the load operation.

## User account privileges required for using Oracle as a target
<a name="CHAP_Target.Oracle.Privileges"></a>

To use an Oracle target in an AWS Database Migration Service task, grant the following privileges in the Oracle database. You grant these to the user account specified in the Oracle database definitions for AWS DMS.
+ SELECT ANY TRANSACTION 
+ SELECT on V\$NLS\_PARAMETERS 
+ SELECT on V\$TIMEZONE\_NAMES 
+ SELECT on ALL\_INDEXES 
+ SELECT on ALL\_OBJECTS 
+ SELECT on DBA\_OBJECTS
+ SELECT on ALL\_TABLES 
+ SELECT on ALL\_USERS 
+ SELECT on ALL\_CATALOG 
+ SELECT on ALL\_CONSTRAINTS 
+ SELECT on ALL\_CONS\_COLUMNS 
+ SELECT on ALL\_TAB\_COLS 
+ SELECT on ALL\_IND\_COLUMNS 
+ DROP ANY TABLE 
+ SELECT ANY TABLE
+ INSERT ANY TABLE 
+ UPDATE ANY TABLE
+ CREATE ANY VIEW
+ DROP ANY VIEW
+ CREATE ANY PROCEDURE
+ ALTER ANY PROCEDURE
+ DROP ANY PROCEDURE
+ CREATE ANY SEQUENCE
+ ALTER ANY SEQUENCE
+ DROP ANY SEQUENCE 
+ DELETE ANY TABLE

For the following requirements, grant these additional privileges:
+ To use a specific table list, grant SELECT on any replicated table and also ALTER on any replicated table.
+ To allow a user to create a table in a default tablespace, grant the privilege GRANT UNLIMITED TABLESPACE.
+ For logon, grant the privilege CREATE SESSION.
+ If you are using a direct path load (the default for full load), grant LOCK ANY TABLE to the AWS DMS user.
+ If the target table schema is different from the AWS DMS user's when using the "DROP and CREATE" table preparation mode, grant CREATE ANY INDEX.
+ For some full load scenarios, you might choose the "DROP and CREATE table" or "TRUNCATE before loading" option where the target table schema is different from the DMS user's. In this case, grant DROP ANY TABLE.
+ To store changes in change tables or an audit table where the target table schema is different from the DMS user's, grant CREATE ANY TABLE and CREATE ANY INDEX.
+ To validate LOB columns with the validation feature, grant the EXECUTE privilege on `SYS.DBMS_CRYPTO` to the DMS user.
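
As a sketch, the base privileges above can be granted to a hypothetical `dms_user` account like the following. The account name is illustrative; issue one GRANT per privilege in the lists above, and add only the conditional grants that apply to your scenario.

```
-- Sketch only: replace dms_user with your AWS DMS account name.
GRANT SELECT ANY TRANSACTION TO dms_user;
GRANT SELECT ON V_$NLS_PARAMETERS TO dms_user;  -- grant on V_$xxx if V$xxx fails
GRANT SELECT ON V_$TIMEZONE_NAMES TO dms_user;
GRANT SELECT ANY TABLE TO dms_user;
GRANT INSERT ANY TABLE TO dms_user;
GRANT UPDATE ANY TABLE TO dms_user;
GRANT DELETE ANY TABLE TO dms_user;
GRANT DROP ANY TABLE TO dms_user;

-- Conditional grants, depending on your scenario:
GRANT CREATE SESSION TO dms_user;          -- logon
GRANT UNLIMITED TABLESPACE TO dms_user;    -- create tables in a default tablespace
GRANT LOCK ANY TABLE TO dms_user;          -- direct path full load
```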

### Read privileges required for AWS Database Migration Service on the target database
<a name="CHAP_Target.Oracle.Privileges.Read"></a>

The AWS DMS user account must be granted read permissions for the following DBA tables:
+ SELECT on DBA\_USERS
+ SELECT on DBA\_TAB\_PRIVS
+ SELECT on DBA\_OBJECTS
+ SELECT on DBA\_SYNONYMS
+ SELECT on DBA\_SEQUENCES
+ SELECT on DBA\_TYPES
+ SELECT on DBA\_INDEXES
+ SELECT on DBA\_TABLES
+ SELECT on DBA\_TRIGGERS
+ SELECT on SYS.DBA\_REGISTRY

If any of the required privileges cannot be granted on V\$xxx, then grant them on V\_\$xxx.

### Premigration assessments
<a name="CHAP_Target.Oracle.Privileges.Premigration"></a>

To use the premigration assessments listed in [Oracle assessments](CHAP_Tasks.AssessmentReport.Oracle.md) with Oracle as a Target, you must add the following permissions to the user account specified in the Oracle database target endpoint:

```
GRANT SELECT ON V_$INSTANCE TO dms_user;
GRANT EXECUTE ON SYS.DBMS_XMLGEN TO dms_user;
```

## Configuring an Oracle database as a target for AWS Database Migration Service
<a name="CHAP_Target.Oracle.Configuration"></a>

Before using an Oracle database as a data migration target, you must provide an Oracle user account to AWS DMS. The user account must have read/write privileges on the Oracle database, as specified in [User account privileges required for using Oracle as a target](#CHAP_Target.Oracle.Privileges).

## Endpoint settings when using Oracle as a target for AWS DMS
<a name="CHAP_Target.Oracle.ConnectionAttrib"></a>

You can use endpoint settings to configure your Oracle target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--oracle-settings '{"EndpointSetting": "value", ...}'` JSON syntax.
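
For example, a `create-endpoint` call for an Oracle target might look like the following. The identifier, connection values, and the `UseDirectPathFullLoad` setting shown here are illustrative; substitute your own.

```
aws dms create-endpoint \
    --endpoint-identifier my-oracle-target \
    --endpoint-type target \
    --engine-name oracle \
    --username dms_user \
    --password my-password \
    --server-name oracle.example.com \
    --port 1521 \
    --database-name ORCL \
    --oracle-settings '{"UseDirectPathFullLoad": false}'
```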

The following table shows the endpoint settings that you can use with Oracle as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Oracle.html)

## Target data types for Oracle
<a name="CHAP_Target.Oracle.DataTypes"></a>

A target Oracle database used with AWS DMS supports most Oracle data types. The following table shows the Oracle target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types. For more information about how to view the data type that is mapped from the source, see the section for the source you are using.


|  AWS DMS data type  |  Oracle data type  | 
| --- | --- | 
|  BOOLEAN  |  NUMBER (1)  | 
|  BYTES  |  RAW (length)  | 
|  DATE  |  DATETIME  | 
|  TIME  | TIMESTAMP (0) | 
|  DATETIME  |  TIMESTAMP (scale)  | 
|  INT1  | NUMBER (3) | 
|  INT2  |  NUMBER (5)  | 
|  INT4  | NUMBER (10) | 
|  INT8  |  NUMBER (19)  | 
|  NUMERIC  |  NUMBER (p,s)  | 
|  REAL4  |  FLOAT  | 
|  REAL8  | FLOAT | 
|  STRING  |  With date indication: DATE  With time indication: TIMESTAMP  With timestamp indication: TIMESTAMP  With timestamp\_with\_timezone indication: TIMESTAMP WITH TIMEZONE  With timestamp\_with\_local\_timezone indication: TIMESTAMP WITH LOCAL TIMEZONE  With interval\_year\_to\_month indication: INTERVAL YEAR TO MONTH  With interval\_day\_to\_second indication: INTERVAL DAY TO SECOND  If length > 4000: CLOB  In all other cases: VARCHAR2 (length)  | 
|  UINT1  |  NUMBER (3)  | 
|  UINT2  |  NUMBER (5)  | 
|  UINT4  |  NUMBER (10)  | 
|  UINT8  |  NUMBER (19)  | 
|  WSTRING  |  If length > 2000: NCLOB In all other cases: NVARCHAR2 (length)  | 
|  BLOB  |  BLOB To use this data type with AWS DMS, you must enable the use of BLOBs for a specific task. BLOB data types are supported only in tables that include a primary key  | 
|  CLOB  |  CLOB To use this data type with AWS DMS, you must enable the use of CLOBs for a specific task. During change data capture (CDC), CLOB data types are supported only in tables that include a primary key. STRING An Oracle VARCHAR2 data type on the source with a declared size greater than 4000 bytes maps through the AWS DMS CLOB to a STRING on the Oracle target.  | 
|  NCLOB  |  NCLOB To use this data type with AWS DMS, you must enable the use of NCLOBs for a specific task. During CDC, NCLOB data types are supported only in tables that include a primary key. WSTRING An Oracle VARCHAR2 data type on the source with a declared size greater than 4000 bytes maps through the AWS DMS NCLOB to a WSTRING on the Oracle target.   | 
| XMLTYPE |  The XMLTYPE target data type is only relevant in Oracle-to-Oracle replication tasks. When the source database is Oracle, the source data types are replicated as-is to the Oracle target. For example, an XMLTYPE data type on the source is created as an XMLTYPE data type on the target.  | 

# Using a Microsoft SQL Server database as a target for AWS Database Migration Service
<a name="CHAP_Target.SQLServer"></a>

You can migrate data to Microsoft SQL Server databases using AWS DMS. With an SQL Server database as a target, you can migrate data from either another SQL Server database or one of the other supported databases.

For information about versions of SQL Server that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md). 

AWS DMS supports the Enterprise, Standard, Workgroup, and Developer editions of SQL Server, both on-premises and on Amazon RDS.

For additional details on working with AWS DMS and SQL Server target databases, see the following.

**Topics**
+ [Limitations on using SQL Server as a target for AWS Database Migration Service](#CHAP_Target.SQLServer.Limitations)
+ [Security requirements when using SQL Server as a target for AWS Database Migration Service](#CHAP_Target.SQLServer.Security)
+ [Endpoint settings when using SQL Server as a target for AWS DMS](#CHAP_Target.SQLServer.ConnectionAttrib)
+ [Target data types for Microsoft SQL Server](#CHAP_Target.SQLServer.DataTypes)

## Limitations on using SQL Server as a target for AWS Database Migration Service
<a name="CHAP_Target.SQLServer.Limitations"></a>

The following limitations apply when using a SQL Server database as a target for AWS DMS:
+ When you manually create a SQL Server target table with a computed column, full load replication is not supported when using the BCP bulk-copy utility. To use full load replication, disable BCP loading by setting the extra connection attribute (ECA) `'useBCPFullLoad=false'` on the endpoint. For information about setting ECAs on endpoints, see [Creating source and target endpoints](CHAP_Endpoints.Creating.md). For more information on working with BCP, see the [Microsoft SQL Server documentation](https://docs.microsoft.com/en-us/sql/relational-databases/import-export/import-and-export-bulk-data-by-using-the-bcp-utility-sql-server).
+ When replicating tables with SQL Server spatial data types (GEOMETRY and GEOGRAPHY), AWS DMS replaces any spatial reference identifier (SRID) that you might have inserted with the default SRID. The default SRID is 0 for GEOMETRY and 4326 for GEOGRAPHY.
+ Temporal tables are not supported. Migrating temporal tables may work with a replication-only task in transactional apply mode if those tables are manually created on the target.
+ Currently, `boolean` data types in a PostgreSQL source are migrated to a SQL Server target as the `bit` data type with inconsistent values. 

  As a workaround, do one of the following:
  + Precreate the table with a `VARCHAR(1)` data type for the column (or let AWS DMS create the table). Then have downstream processing treat an "F" as False and a "T" as True.
  + To avoid having to change downstream processing, add a transformation rule to the task that changes the "F" values to "0" and the "T" values to "1", and store them as the SQL Server `bit` data type.
+ AWS DMS doesn't support change processing to set column nullability (using the `ALTER COLUMN [SET|DROP] NOT NULL` clause with `ALTER TABLE` statements).
+ Windows Authentication isn't supported.

## Security requirements when using SQL Server as a target for AWS Database Migration Service
<a name="CHAP_Target.SQLServer.Security"></a>

The following describes the security requirements for using AWS DMS with a Microsoft SQL Server target:
+ The AWS DMS user account must have at least the `db_owner` user role on the SQL Server database that you are connecting to.
+ A SQL Server system administrator must provide this permission to all AWS DMS user accounts.

## Endpoint settings when using SQL Server as a target for AWS DMS
<a name="CHAP_Target.SQLServer.ConnectionAttrib"></a>

You can use endpoint settings to configure your SQL Server target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--microsoft-sql-server-settings '{"EndpointSetting": "value", ...}'` JSON syntax.
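
For example, the BCP workaround described in the limitations earlier in this section could be applied through an endpoint setting like the following. The identifier and connection values are illustrative.

```
aws dms create-endpoint \
    --endpoint-identifier my-sqlserver-target \
    --endpoint-type target \
    --engine-name sqlserver \
    --username dms_user \
    --password my-password \
    --server-name sqlserver.example.com \
    --port 1433 \
    --database-name TargetDB \
    --microsoft-sql-server-settings '{"UseBcpFullLoad": false}'
```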

The following table shows the endpoint settings that you can use with SQL Server as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.SQLServer.html)

## Target data types for Microsoft SQL Server
<a name="CHAP_Target.SQLServer.DataTypes"></a>

The following table shows the Microsoft SQL Server target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types. For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data type  |  SQL Server data type  | 
| --- | --- | 
|  BOOLEAN  |  TINYINT  | 
|  BYTES  |  VARBINARY(length)  | 
|  DATE  |  For SQL Server 2008 and higher, use DATE. For earlier versions, if the scale is 3 or less use DATETIME. In all other cases, use VARCHAR (37).  | 
|  TIME  |  For SQL Server 2008 and higher, use DATETIME2 (%d). For earlier versions, if the scale is 3 or less use DATETIME. In all other cases, use VARCHAR (37).  | 
|  DATETIME  |  For SQL Server 2008 and higher, use DATETIME2 (scale).  For earlier versions, if the scale is 3 or less use DATETIME. In all other cases, use VARCHAR (37).  | 
|  INT1  | SMALLINT | 
|  INT2  |  SMALLINT  | 
|  INT4  | INT | 
|  INT8  |  BIGINT  | 
|  NUMERIC  |  NUMERIC (p,s)  | 
|  REAL4  |  REAL  | 
|  REAL8  | FLOAT | 
|  STRING  |  If the column is a date or time column, then do the following:  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.SQLServer.html) If the column is not a date or time column, use VARCHAR (length).  | 
|  UINT1  |  TINYINT  | 
|  UINT2  |  SMALLINT  | 
|  UINT4  |  INT  | 
|  UINT8  |  BIGINT  | 
|  WSTRING  |  NVARCHAR (length)  | 
|  BLOB  |  VARBINARY(max) IMAGE To use this data type with AWS DMS, you must enable the use of BLOBs for a specific task. AWS DMS supports BLOB data types only in tables that include a primary key.  | 
|  CLOB  |  VARCHAR(max) To use this data type with AWS DMS, you must enable the use of CLOBs for a specific task. During change data capture (CDC), AWS DMS supports CLOB data types only in tables that include a primary key.  | 
|  NCLOB  |  NVARCHAR(max) To use this data type with AWS DMS, you must enable the use of NCLOBs for a specific task. During CDC, AWS DMS supports NCLOB data types only in tables that include a primary key.  | 

# Using a PostgreSQL database as a target for AWS Database Migration Service
<a name="CHAP_Target.PostgreSQL"></a>

You can migrate data to PostgreSQL databases using AWS DMS, either from another PostgreSQL database or from one of the other supported databases. 

For information about versions of PostgreSQL that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md).

**Note**  
Amazon Aurora Serverless is available as a target for Amazon Aurora with PostgreSQL compatibility. For more information about Amazon Aurora Serverless, see [Using Amazon Aurora Serverless v2](https://docs.aws.amazon.com//AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html) in the *Amazon Aurora User Guide*.
Aurora Serverless DB clusters are accessible only from an Amazon VPC and can't use a [public IP address](https://docs.aws.amazon.com//AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.requirements.html). So, if you intend to have a replication instance in a different AWS Region than your Aurora PostgreSQL Serverless cluster, you must configure [VPC peering](https://docs.aws.amazon.com//dms/latest/userguide/CHAP_ReplicationInstance.VPC.html#CHAP_ReplicationInstance.VPC.Configurations.ScenarioVPCPeer). Otherwise, check the availability of Aurora PostgreSQL Serverless [Regions](https://docs.aws.amazon.com//AmazonRDS/latest/AuroraUserGuide/Concepts.AuroraFeaturesRegionsDBEngines.grids.html#Concepts.Aurora_Fea_Regions_DB-eng.Feature.Serverless), and use one of those Regions for both Aurora PostgreSQL Serverless and your replication instance.
Babelfish capability is built into Amazon Aurora and doesn't have an additional cost. For more information, see [Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service](#CHAP_Target.PostgreSQL.Babelfish).

AWS DMS takes a table-by-table approach when migrating data from source to target in the Full Load phase. Table order during the full load phase cannot be guaranteed. Tables are out of sync during the full load phase and while cached transactions for individual tables are being applied. As a result, active referential integrity constraints can result in task failure during the full load phase.

In PostgreSQL, foreign keys (referential integrity constraints) are implemented using triggers. During the full load phase, AWS DMS loads each table one at a time. We strongly recommend that you disable foreign key constraints during a full load, using one of the following methods:
+ Temporarily disable all triggers from the instance, and finish the full load.
+ Use the `session_replication_role` parameter in PostgreSQL.

At any given time, a trigger can be in one of the following states: `origin`, `replica`, `always`, or `disabled`. When the `session_replication_role` parameter is set to `replica`, only triggers in the `replica` state are active, and they are fired when they are called. Otherwise, the triggers remain inactive. 

PostgreSQL has a failsafe mechanism to prevent a table from being truncated, even when `session_replication_role` is set. You can use this as an alternative to disabling triggers, to help the full load run to completion. To do this, set the target table preparation mode to `DO_NOTHING`. Otherwise, DROP and TRUNCATE operations fail when there are foreign key constraints.
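
In the task settings JSON, that table preparation mode is set in the full load settings, for example:

```
"FullLoadSettings": {
   "TargetTablePrepMode": "DO_NOTHING"
}
```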

In Amazon RDS, you can set this parameter using a parameter group. For a PostgreSQL instance running on Amazon EC2, you can set the parameter directly.
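
As a sketch, either approach to deactivating foreign key enforcement looks like the following on the target. The table name is hypothetical.

```
-- Option 1: temporarily disable the triggers (including foreign key
-- triggers) on each target table, then re-enable them after full load.
ALTER TABLE inventory.products DISABLE TRIGGER ALL;
-- ... run the full load ...
ALTER TABLE inventory.products ENABLE TRIGGER ALL;

-- Option 2: for the migration session, fire only replica-state triggers.
SET session_replication_role = replica;
```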

For additional details on working with a PostgreSQL database as a target for AWS DMS, see the following sections: 

**Topics**
+ [Limitations on using PostgreSQL as a target for AWS Database Migration Service](#CHAP_Target.PostgreSQL.Limitations)
+ [Limitations on using Amazon Aurora PostgreSQL Limitless as a target for AWS Database Migration Service](#CHAP_Target.PostgreSQL.Aurora.Limitations)
+ [Security requirements when using a PostgreSQL database as a target for AWS Database Migration Service](#CHAP_Target.PostgreSQL.Security)
+ [Endpoint settings and Extra Connection Attributes (ECAs) when using PostgreSQL as a target for AWS DMS](#CHAP_Target.PostgreSQL.ConnectionAttrib)
+ [Target data types for PostgreSQL](#CHAP_Target.PostgreSQL.DataTypes)
+ [Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service](#CHAP_Target.PostgreSQL.Babelfish)

## Limitations on using PostgreSQL as a target for AWS Database Migration Service
<a name="CHAP_Target.PostgreSQL.Limitations"></a>

The following limitations apply when using a PostgreSQL database as a target for AWS DMS:
+ For heterogeneous migrations, the JSON data type is converted to the Native CLOB data type internally.
+ In an Oracle to PostgreSQL migration, if a column in Oracle contains a NULL character (hex value U+0000), AWS DMS converts the NULL character to a space (hex value U+0020). This is due to a PostgreSQL limitation.
+ AWS DMS doesn't support replication to a table with a unique index created with the COALESCE function.
+ If your tables use sequences, then update the value of `NEXTVAL` for each sequence in the target database after you stop the replication from the source database. AWS DMS copies data from your source database, but doesn't migrate sequences to the target during the ongoing replication.
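
For example, after you stop replication you might realign each target sequence with the migrated data. The sequence, table, and column names here are hypothetical.

```
SELECT setval('public.orders_id_seq',
              (SELECT max(id) FROM public.orders));
```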

## Limitations on using Amazon Aurora PostgreSQL Limitless as a target for AWS Database Migration Service
<a name="CHAP_Target.PostgreSQL.Aurora.Limitations"></a>

The following limitations apply when using Amazon Aurora PostgreSQL Limitless as a target for AWS DMS:
+ AWS DMS Data Validation does not support Amazon Aurora PostgreSQL Limitless.
+ AWS DMS migrates source tables as Standard tables, which are not distributed. After migration, you can convert these Standard tables to Limitless tables by following the official conversion guide.

## Security requirements when using a PostgreSQL database as a target for AWS Database Migration Service
<a name="CHAP_Target.PostgreSQL.Security"></a>

For security purposes, the user account used for the data migration must be a registered user in any PostgreSQL database that you use as a target.

Your PostgreSQL target endpoint requires a minimum set of user permissions to run an AWS DMS migration, as shown in the following examples.

```
CREATE USER newuser WITH PASSWORD 'your-password';
ALTER SCHEMA schema_name OWNER TO newuser;
```

Or,

```
GRANT USAGE ON SCHEMA schema_name TO myuser;
GRANT CONNECT ON DATABASE postgres TO myuser;
GRANT CREATE ON DATABASE postgres TO myuser;
GRANT CREATE ON SCHEMA schema_name TO myuser;
GRANT UPDATE, INSERT, SELECT, DELETE, TRUNCATE ON ALL TABLES IN SCHEMA schema_name TO myuser;
GRANT TRUNCATE ON schema_name."BasicFeed" TO myuser;
```

## Endpoint settings and Extra Connection Attributes (ECAs) when using PostgreSQL as a target for AWS DMS
<a name="CHAP_Target.PostgreSQL.ConnectionAttrib"></a>

You can use endpoint settings and Extra Connection Attributes (ECAs) to configure your PostgreSQL target database. 

You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--postgre-sql-settings '{"EndpointSetting": "value", ...}'` JSON syntax.
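
For example, a call for a PostgreSQL target might look like the following. The identifier, connection values, and the `MaxFileSize` setting are illustrative.

```
aws dms create-endpoint \
    --endpoint-identifier my-postgresql-target \
    --endpoint-type target \
    --engine-name postgres \
    --username myuser \
    --password my-password \
    --server-name postgresql.example.com \
    --port 5432 \
    --database-name postgres \
    --postgre-sql-settings '{"MaxFileSize": 32768}'
```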

You specify ECAs using the `ExtraConnectionAttributes` parameter for your endpoint.

The following table shows the endpoint settings that you can use with PostgreSQL as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.PostgreSQL.html)

## Target data types for PostgreSQL
<a name="CHAP_Target.PostgreSQL.DataTypes"></a>

The PostgreSQL database endpoint for AWS DMS supports most PostgreSQL database data types. The following table shows the PostgreSQL database target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data type  |  PostgreSQL data type  | 
| --- | --- | 
|  BOOLEAN  |  BOOLEAN  | 
|  BLOB  |  BYTEA  | 
|  BYTES  |  BYTEA  | 
|  DATE  |  DATE  | 
|  TIME  |  TIME  | 
|  DATETIME  |  If the scale is from 0 through 6, then use TIMESTAMP. If the scale is from 7 through 9, then use VARCHAR (37).  | 
|  INT1  |  SMALLINT  | 
|  INT2  |  SMALLINT  | 
|  INT4  |  INTEGER  | 
|  INT8  |  BIGINT  | 
|  NUMERIC   |  DECIMAL (P,S)  | 
|  REAL4  |  FLOAT4  | 
|  REAL8  |  FLOAT8  | 
|  STRING  |  If the length is from 1 through 21,845, then use VARCHAR (length in bytes).  If the length is 21,846 through 2,147,483,647, then use VARCHAR (65535).  | 
|  UINT1  |  SMALLINT  | 
|  UINT2  |  INTEGER  | 
|  UINT4  |  BIGINT  | 
|  UINT8  |  BIGINT  | 
|  WSTRING  |  If the length is from 1 through 21,845, then use VARCHAR (length in bytes).  If the length is 21,846 through 2,147,483,647, then use VARCHAR (65535).  | 
|  NCLOB  |  TEXT  | 
|  CLOB  |  TEXT  | 

**Note**  
When replicating from a PostgreSQL source, AWS DMS creates the target table with the same data types for all columns, apart from columns with user-defined data types. In such cases, the data type is created as "character varying" in the target.

## Using Babelfish for Aurora PostgreSQL as a target for AWS Database Migration Service
<a name="CHAP_Target.PostgreSQL.Babelfish"></a>

You can migrate SQL Server source tables to a Babelfish for Amazon Aurora PostgreSQL target using AWS Database Migration Service. With Babelfish, Aurora PostgreSQL understands T-SQL, Microsoft SQL Server's proprietary SQL dialect, and supports the same communications protocol. So, applications written for SQL Server can now work with Aurora with fewer code changes. Babelfish capability is built into Amazon Aurora and doesn't have an additional cost. You can activate Babelfish on your Amazon Aurora cluster from the Amazon RDS console.

When you create your AWS DMS target endpoint using the AWS DMS console, API, or CLI commands, specify the target engine as **Amazon Aurora PostgreSQL**, and name the database, **babelfish\_db**. In the **Endpoint Settings** section, add settings to set `DatabaseMode` to `Babelfish` and `BabelfishDatabaseName` to the name of the target Babelfish T-SQL database.
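As an illustrative sketch with the AWS CLI, the equivalent endpoint creation might look like the following. The cluster hostname and credentials are placeholders, and the exact casing of the `DatabaseMode` value may differ depending on your API version.

```
aws dms create-endpoint \
    --endpoint-identifier babelfish-target \
    --endpoint-type target \
    --engine-name aurora-postgresql \
    --server-name my-cluster.cluster-example.us-east-1.rds.amazonaws.com \
    --port 5432 \
    --database-name babelfish_db \
    --username postgres \
    --password "<password>" \
    --postgre-sql-settings '{"DatabaseMode":"babelfish","BabelfishDatabaseName":"mydb"}'
```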

### Adding transformation rules to your migration task
<a name="CHAP_Target.PostgreSQL.Babelfish.transform"></a>

When you define a migration task for a Babelfish target, you need to include transformation rules that ensure DMS uses the pre-created T-SQL Babelfish tables in the target database.

First, add a transformation rule to your migration task that makes all table names lowercase. Babelfish stores the names of tables that you create using T-SQL as lowercase in the PostgreSQL `pg_class` catalog. However, when you have SQL Server tables with mixed-case names, DMS creates those tables using PostgreSQL native data types instead of the T-SQL compatible data types. For that reason, be sure to add a transformation rule that makes all table names lowercase. Note that column names shouldn't be transformed to lowercase.

Next, if you used the multidatabase migration mode when you defined your cluster, add a transformation rule that renames the original SQL Server schema. Make sure to rename the SQL Server schema name to include the name of the T-SQL database. For example, if the original SQL Server schema name is dbo, and your T-SQL database name is mydb, rename the schema to mydb\_dbo using a transformation rule.

**Note**  
When using Babelfish for Aurora PostgreSQL 16 or later, the default migration mode is multidatabase. When running DMS migration tasks, be sure to review the migration mode parameter and update the transformation rules if needed.

If you use single database mode, you don't need a transformation rule to rename schema names. Schema names have a one-to-one mapping with the target T-SQL database in Babelfish.

The following sample transformation rule makes all table names lowercase, and renames the original SQL Server schema name from `dbo` to `mydb_dbo`.

```
{
   "rules": [
   {
      "rule-type": "transformation",
      "rule-id": "566251737",
      "rule-name": "566251737",
      "rule-target": "schema",
      "object-locator": {
         "schema-name": "dbo"
      },
      "rule-action": "rename",
      "value": "mydb_dbo",
      "old-value": null
   },
   {
      "rule-type": "transformation",
      "rule-id": "566139410",
      "rule-name": "566139410",
      "rule-target": "table",
      "object-locator": {
         "schema-name": "%",
         "table-name": "%"
      },
      "rule-action": "convert-lowercase",
      "value": null,
      "old-value": null
   },
   {
      "rule-type": "selection",
      "rule-id": "566111704",
      "rule-name": "566111704",
      "object-locator": {
         "schema-name": "dbo",
         "table-name": "%"
      },
      "rule-action": "include",
      "filters": []
   }
]
}
```
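The rule IDs in the sample are arbitrary strings. As an illustrative sketch (not part of DMS itself), you can also generate this JSON programmatically so that the schema rename stays consistent with your T-SQL database name; `babelfish_rules` below is a hypothetical helper:

```python
import json

def babelfish_rules(schema, tsql_db):
    """Hypothetical helper: build the three rules a Babelfish target
    needs -- rename <schema> to <tsql_db>_<schema>, lowercase all
    table names, and include all tables in <schema>."""
    return {"rules": [
        {"rule-type": "transformation", "rule-id": "1", "rule-name": "1",
         "rule-target": "schema",
         "object-locator": {"schema-name": schema},
         "rule-action": "rename",
         "value": f"{tsql_db}_{schema}", "old-value": None},
        {"rule-type": "transformation", "rule-id": "2", "rule-name": "2",
         "rule-target": "table",
         "object-locator": {"schema-name": "%", "table-name": "%"},
         "rule-action": "convert-lowercase",
         "value": None, "old-value": None},
        {"rule-type": "selection", "rule-id": "3", "rule-name": "3",
         "object-locator": {"schema-name": schema, "table-name": "%"},
         "rule-action": "include", "filters": []},
    ]}

# Produces JSON equivalent to the sample above (with different rule IDs).
print(json.dumps(babelfish_rules("dbo", "mydb"), indent=3))
```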

### Limitations to using a PostgreSQL target endpoint with Babelfish tables
<a name="CHAP_Target.PostgreSQL.Babelfish.limitations"></a>

The following limitations apply when using a PostgreSQL target endpoint with Babelfish tables:
+ For **Target table preparation** mode, use only the **Do nothing** or **Truncate** modes. Don't use the **Drop tables on target** mode. In that mode, DMS creates the tables as PostgreSQL tables that T-SQL might not recognize.
+ AWS DMS doesn't support the sql\_variant data type.
+ The Babelfish endpoint doesn't support the `HIERARCHYID` data type, or the `GEOMETRY` and `GEOGRAPHY` data types prior to version 3.5.4. To migrate these data types, you can add transformation rules to convert the data type to wstring(250).
+ Babelfish only supports migrating `BINARY`, `VARBINARY`, and `IMAGE` data types using the `BYTEA` data type. For earlier versions of Aurora PostgreSQL, you can use DMS to migrate these tables to a [Babelfish target endpoint](CHAP_Target.Babelfish.md). You don't have to specify a length for the `BYTEA` data type, as shown in the following example.

  ```
  [Picture] [VARBINARY](max) NULL
  ```

  Change the preceding T-SQL data type to the T-SQL supported `BYTEA` data type.

  ```
  [Picture] BYTEA NULL
  ```
+ For earlier versions of Aurora PostgreSQL Babelfish, if you create a migration task for ongoing replication from SQL Server to Babelfish using the PostgreSQL target endpoint, you need to assign the `SERIAL` data type to any tables that use `IDENTITY` columns. Starting with Aurora PostgreSQL (version 15.3/14.8 and higher) and Babelfish (version 3.2.0 and higher), the identity column is supported, and it is no longer necessary to assign the `SERIAL` data type. For more information, see [SERIAL Usage](https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.tsql.sequences..html) in the Sequences and Identity section of the *SQL Server to Aurora PostgreSQL Migration Playbook*. When you create the table in Babelfish, change the column definition from the following.

  ```
      [IDCol] [INT] IDENTITY(1,1) NOT NULL PRIMARY KEY
  ```

  Change the preceding into the following.

  ```
      [IDCol] SERIAL PRIMARY KEY
  ```

  Babelfish-compatible Aurora PostgreSQL creates a sequence using the default configuration and adds a `NOT NULL` constraint to the column. The newly created sequence behaves like a regular sequence (incremented by 1) and has no composite `SERIAL` option.
+ After migrating data with tables that use `IDENTITY` columns or the `SERIAL` data type, reset the PostgreSQL-based sequence object based on the maximum value for the column. After performing a full load of the tables, use the following T-SQL query to generate statements to seed the associated sequence object.

  ```
  DECLARE @schema_prefix NVARCHAR(200) = ''
  
  IF current_setting('babelfishpg_tsql.migration_mode') = 'multi-db'
          SET @schema_prefix = db_name() + '_'
  
  SELECT 'SELECT setval(pg_get_serial_sequence(''' + @schema_prefix + schema_name(tables.schema_id) + '.' + tables.name + ''', ''' + columns.name + ''')
                 ,(select max(' + columns.name + ') from ' + schema_name(tables.schema_id) + '.' + tables.name + '));'
  FROM sys.tables tables
  JOIN sys.columns columns ON tables.object_id = columns.object_id
  WHERE columns.is_identity = 1
  
  UNION ALL
  
  SELECT 'SELECT setval(pg_get_serial_sequence(''' + @schema_prefix + table_schema + '.' + table_name + ''', 
  ''' + column_name + '''),(select max(' + column_name + ') from ' + table_schema + '.' + table_name + '));'
  FROM information_schema.columns
  WHERE column_default LIKE 'nextval(%';
  ```

  The query generates a series of SELECT statements that you execute in order to update the maximum IDENTITY and SERIAL values.
+ For Babelfish versions prior to 3.2, **Full LOB mode** might result in a table error. If that happens, create a separate task for the tables that failed to load. Then use **Limited LOB mode** to specify the appropriate value for the **Maximum LOB size (KB)**. Another option is to set the SQL Server Endpoint Connection Attribute setting `ForceFullLob=True`.
+ For Babelfish versions prior to 3.2, performing data validation with Babelfish tables that don't use integer based primary keys generates a message that a suitable unique key can't be found. Starting with Aurora PostgreSQL (version 15.3/14.8 and higher) and Babelfish (version 3.2.0 and higher), data validation for non-integer primary keys is supported. 
+ Because of precision differences in the number of decimal places for seconds, DMS reports data validation failures for Babelfish tables that use `DATETIME` data types. To suppress those failures, you can add the following validation rule type for `DATETIME` data types.

  ```
  {
           "rule-type": "validation",
           "rule-id": "3",
           "rule-name": "3",
           "rule-target": "column",
           "object-locator": {
               "schema-name": "dbo",
               "table-name": "%",
               "column-name": "%",
               "data-type": "datetime"
           },
           "rule-action": "override-validation-function",
           "source-function": "case when ${column-name} is NULL then NULL else 0 end",
           "target-function": "case when ${column-name} is NULL then NULL else 0 end"
       }
  ```
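The T-SQL query in the identity-column item earlier in this list generates the reseeding statements from the Babelfish catalog. As a hypothetical sketch under the same naming rules, the equivalent `setval` statements can also be composed directly from a known list of identity columns (`seed_sequence_sql` is illustrative, not a DMS API):

```python
def seed_sequence_sql(identity_columns, db_name=None):
    """Build one setval statement per (schema, table, column) tuple.
    Pass db_name when the cluster uses multidatabase migration mode,
    which prefixes PostgreSQL schema names with '<db>_'."""
    prefix = f"{db_name}_" if db_name else ""
    return [
        f"SELECT setval(pg_get_serial_sequence('{prefix}{schema}.{table}', "
        f"'{column}'), (SELECT max({column}) FROM {schema}.{table}));"
        for schema, table, column in identity_columns
    ]

# Example: one IDENTITY column in a multidatabase-mode cluster named mydb.
for stmt in seed_sequence_sql([("dbo", "orders", "IDCol")], db_name="mydb"):
    print(stmt)
```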

# Using a MySQL-compatible database as a target for AWS Database Migration Service
<a name="CHAP_Target.MySQL"></a>

You can migrate data to any MySQL-compatible database using AWS DMS, from any of the source data engines that AWS DMS supports. If you are migrating to an on-premises MySQL-compatible database, then AWS DMS requires that your source engine reside within the AWS ecosystem. The engine can be on an AWS-managed service such as Amazon RDS, Amazon Aurora, or Amazon S3. Or the engine can be on a self-managed database on Amazon EC2. 

You can use SSL to encrypt connections between your MySQL-compatible endpoint and the replication instance. For more information on using SSL with a MySQL-compatible endpoint, see [Using SSL with AWS Database Migration Service](CHAP_Security.SSL.md). 

For information about versions of MySQL that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md).

You can use the following MySQL-compatible databases as targets for AWS DMS:
+ MySQL Community Edition
+ MySQL Standard Edition
+ MySQL Enterprise Edition
+ MySQL Cluster Carrier Grade Edition
+ MariaDB Community Edition
+ MariaDB Enterprise Edition
+ MariaDB Column Store
+ Amazon Aurora MySQL

**Note**  
Regardless of the source storage engine (MyISAM, MEMORY, and so on), AWS DMS creates a MySQL-compatible target table as an InnoDB table by default.   
If you need a table in a storage engine other than InnoDB, you can manually create the table on the MySQL-compatible target and migrate the table using the **Do nothing** option. For more information, see [Full-load task settings](CHAP_Tasks.CustomizingTasks.TaskSettings.FullLoad.md).

For additional details on working with a MySQL-compatible database as a target for AWS DMS, see the following sections. 

**Topics**
+ [Using any MySQL-compatible database as a target for AWS Database Migration Service](#CHAP_Target.MySQL.Prerequisites)
+ [Limitations on using a MySQL-compatible database as a target for AWS Database Migration Service](#CHAP_Target.MySQL.Limitations)
+ [Endpoint settings when using a MySQL-compatible database as a target for AWS DMS](#CHAP_Target.MySQL.ConnectionAttrib)
+ [Target data types for MySQL](#CHAP_Target.MySQL.DataTypes)

## Using any MySQL-compatible database as a target for AWS Database Migration Service
<a name="CHAP_Target.MySQL.Prerequisites"></a>

Before you begin to work with a MySQL-compatible database as a target for AWS DMS, make sure that you have completed the following prerequisites:
+ Provide a user account to AWS DMS that has read/write privileges to the MySQL-compatible database. To create the necessary privileges, run the following commands.

  ```
  CREATE USER '<user acct>'@'%' IDENTIFIED BY '<user password>';
  GRANT ALTER, CREATE, DROP, INDEX, INSERT, UPDATE, DELETE, SELECT, CREATE TEMPORARY TABLES  ON <schema>.* TO 
  '<user acct>'@'%';
  GRANT ALL PRIVILEGES ON awsdms_control.* TO '<user acct>'@'%';
  ```
+ During the full-load migration phase, you must disable foreign keys on your target tables. To disable foreign key checks on a MySQL-compatible database during a full load, you can add the following command to the **Extra connection attributes** section of the AWS DMS console for your target endpoint.

  ```
  Initstmt=SET FOREIGN_KEY_CHECKS=0;
  ```
+ Set the database parameter `local_infile = 1` to enable AWS DMS to load data into the target database.
+ Grant the following privileges if you use MySQL-specific premigration assessments.

  ```
  grant select on mysql.user to <dms_user>;
  grant select on mysql.db to <dms_user>;
  grant select on mysql.tables_priv to <dms_user>;
  grant select on mysql.role_edges to <dms_user>;  #only for MySQL version 8.0.11 and higher
  ```
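For example, you can confirm the `local_infile` prerequisite from any MySQL client before starting a task. This assumes your account has the privileges to read and set global variables:

```
SHOW GLOBAL VARIABLES LIKE 'local_infile';
SET GLOBAL local_infile = 1;
```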

## Limitations on using a MySQL-compatible database as a target for AWS Database Migration Service
<a name="CHAP_Target.MySQL.Limitations"></a>

When using a MySQL database as a target, AWS DMS doesn't support the following:
+ The data definition language (DDL) statements TRUNCATE PARTITION, DROP TABLE, and RENAME TABLE.
+ Using an `ALTER TABLE table_name ADD COLUMN column_name` statement to add columns to the beginning or the middle of a table.
+ When loading data to a MySQL-compatible target in a full load task, AWS DMS doesn't report errors caused by constraints in the task logs, which can cause duplicate key errors or mismatches with the number of records. This is caused by the way MySQL handles local data with the `LOAD DATA` command. Be sure to do the following during the full load phase: 
  + Disable constraints
  + Use AWS DMS validation to make sure the data is consistent.
+ When you update a column's value to its existing value, MySQL-compatible databases return a `0 rows affected` warning. Although this behavior isn't technically an error, it is different from how the situation is handled by other database engines. For example, Oracle performs an update of one row. For MySQL-compatible databases, AWS DMS generates an entry in the awsdms\_apply\_exceptions control table and logs the following warning.

  ```
  Some changes from the source database had no impact when applied to
  the target database. See awsdms_apply_exceptions table for details.
  ```
+ Aurora Serverless is available as a target for Amazon Aurora version 2, compatible with MySQL version 5.7. (Select Aurora MySQL version 2.07.1 to be able to use Aurora Serverless with MySQL 5.7 compatibility.) For more information about Aurora Serverless, see [Using Aurora Serverless v2](https://docs.aws.amazon.com//AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.html) in the *Amazon Aurora User Guide*.
+ AWS DMS does not support using a reader endpoint for Aurora or Amazon RDS, unless the instances are in writable mode, that is, the `read_only` and `innodb_read_only` parameters are set to `0` or `OFF`. For more information about using Amazon RDS and Aurora as targets, see the following:
  +  [ Determining which DB instance you are connected to](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.BestPractices.html#AuroraMySQL.BestPractices.DeterminePrimaryInstanceConnection) 
  +  [ Updating read replicas with MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_MySQL.Replication.ReadReplicas.html#USER_MySQL.Replication.ReadReplicas.Updates) 
+ When replicating the TIME data type, the fractional part of the time value isn't replicated.
+ When replicating the TIME data type with the extra connection attribute `loadUsingCSV=false`, the time value is capped to the range `[00:00:00, 23:59:59]`.

## Endpoint settings when using a MySQL-compatible database as a target for AWS DMS
<a name="CHAP_Target.MySQL.ConnectionAttrib"></a>

You can use endpoint settings to configure your MySQL-compatible target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--my-sql-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following table shows the endpoint settings that you can use with MySQL as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html)

You can also use extra connection attributes to configure your MySQL-compatible target database.

The following table shows the extra connection attributes that you can use with MySQL as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html)

Alternatively, you can use the `AfterConnectScript` parameter of the `--my-sql-settings` command to disable foreign key checks and specify the time zone for your database.
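As an illustrative sketch (the endpoint ARN is a placeholder), disabling foreign key checks through `AfterConnectScript` might look like the following:

```
aws dms modify-endpoint \
    --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE \
    --my-sql-settings '{"AfterConnectScript":"SET FOREIGN_KEY_CHECKS=0"}'
```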

## Target data types for MySQL
<a name="CHAP_Target.MySQL.DataTypes"></a>

The following table shows the MySQL database target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data types  |  MySQL data types  | 
| --- | --- | 
|  BOOLEAN  |  BOOLEAN  | 
|  BYTES  |  If the length is from 1 through 65,535, then use VARBINARY (length).  If the length is from 65,536 through 2,147,483,647, then use LONGBLOB.  | 
|  DATE  |  DATE  | 
|  TIME  |  TIME  | 
|  TIMESTAMP  |  If the scale is from 0 through 6, then use DATETIME (scale). If the scale is from 7 through 9, then use VARCHAR (37).  | 
|  INT1  |  TINYINT  | 
|  INT2  |  SMALLINT  | 
|  INT4  |  INTEGER  | 
|  INT8  |  BIGINT  | 
|  NUMERIC  |  DECIMAL (p,s)  | 
|  REAL4  |  FLOAT  | 
|  REAL8  |  DOUBLE PRECISION  | 
|  STRING  |  If the length is from 1 through 21,845, then use VARCHAR (length). If the length is from 21,846 through 2,147,483,647, then use LONGTEXT.  | 
|  UINT1  |  UNSIGNED TINYINT  | 
|  UINT2  |  UNSIGNED SMALLINT  | 
|  UINT4  |  UNSIGNED INTEGER  | 
|  UINT8  |  UNSIGNED BIGINT  | 
|  WSTRING  |  If the length is from 1 through 32,767, then use VARCHAR (length).  If the length is from 32,768 through 2,147,483,647, then use LONGTEXT.  | 
|  BLOB  |  If the length is from 1 through 65,535, then use BLOB. If the length is from 65,536 through 2,147,483,647, then use LONGBLOB. If the length is 0, then use LONGBLOB (full LOB support).  | 
|  NCLOB  |  If the length is from 1 through 65,535, then use TEXT. If the length is from 65,536 through 2,147,483,647, then use LONGTEXT with ucs2 for CHARACTER SET. If the length is 0, then use LONGTEXT (full LOB support) with ucs2 for CHARACTER SET.  | 
|  CLOB  |  If the length is from 1 through 65,535, then use TEXT. If the length is from 65,536 through 2147483647, then use LONGTEXT. If the length is 0, then use LONGTEXT (full LOB support).  | 

# Using an Amazon Redshift database as a target for AWS Database Migration Service
<a name="CHAP_Target.Redshift"></a>

You can migrate data to Amazon Redshift databases using AWS Database Migration Service. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With an Amazon Redshift database as a target, you can migrate data from all of the other supported source databases.

You can use Amazon Redshift Serverless as a target for AWS DMS. For more information, see [Using AWS DMS with Amazon Redshift Serverless as a target](#CHAP_Target.Redshift.RSServerless) following.

 The Amazon Redshift cluster must be in the same AWS account and same AWS Region as the replication instance. 

During a database migration to Amazon Redshift, AWS DMS first moves data to an Amazon S3 bucket. When the files reside in an Amazon S3 bucket, AWS DMS then transfers them to the proper tables in the Amazon Redshift data warehouse. AWS DMS creates the S3 bucket in the same AWS Region as the Amazon Redshift database. The AWS DMS replication instance must be located in that same AWS Region.

If you use the AWS CLI or DMS API to migrate data to Amazon Redshift, set up an AWS Identity and Access Management (IAM) role to allow S3 access. For more information about creating this IAM role, see [Creating the IAM roles to use with AWS DMS](security-iam.md#CHAP_Security.APIRole).

The Amazon Redshift endpoint provides full automation for the following:
+ Schema generation and data type mapping
+ Full load of source database tables
+ Incremental load of changes made to source tables
+ Application of schema changes in data definition language (DDL) made to the source tables
+ Synchronization between full load and change data capture (CDC) processes

AWS Database Migration Service supports both full load and change processing operations. AWS DMS reads the data from the source database and creates a series of comma-separated value (.csv) files. For full-load operations, AWS DMS creates files for each table. AWS DMS then copies the table files for each table to a separate folder in Amazon S3. When the files are uploaded to Amazon S3, AWS DMS sends a copy command and the data in the files are copied into Amazon Redshift. For change-processing operations, AWS DMS copies the net changes to the .csv files. AWS DMS then uploads the net change files to Amazon S3 and copies the data to Amazon Redshift.

For additional details on working with Amazon Redshift as a target for AWS DMS, see the following sections: 

**Topics**
+ [Prerequisites for using an Amazon Redshift database as a target for AWS Database Migration Service](#CHAP_Target.Redshift.Prerequisites)
+ [Privileges required for using Redshift as a target](#CHAP_Target.Redshift.Privileges)
+ [Limitations on using Amazon Redshift as a target for AWS Database Migration Service](#CHAP_Target.Redshift.Limitations)
+ [Configuring an Amazon Redshift database as a target for AWS Database Migration Service](#CHAP_Target.Redshift.Configuration)
+ [Using enhanced VPC routing with Amazon Redshift as a target for AWS Database Migration Service](#CHAP_Target.Redshift.EnhancedVPC)
+ [Creating and using AWS KMS keys to encrypt Amazon Redshift target data](#CHAP_Target.Redshift.KMSKeys)
+ [Endpoint settings when using Amazon Redshift as a target for AWS DMS](#CHAP_Target.Redshift.ConnectionAttrib)
+ [Using a data encryption key, and an Amazon S3 bucket as intermediate storage](#CHAP_Target.Redshift.EndpointSettings)
+ [Multithreaded task settings for Amazon Redshift](#CHAP_Target.Redshift.ParallelApply)
+ [Target data types for Amazon Redshift](#CHAP_Target.Redshift.DataTypes)
+ [Using AWS DMS with Amazon Redshift Serverless as a Target](#CHAP_Target.Redshift.RSServerless)

## Prerequisites for using an Amazon Redshift database as a target for AWS Database Migration Service
<a name="CHAP_Target.Redshift.Prerequisites"></a>

The following list describes the prerequisites necessary for working with Amazon Redshift as a target for data migration:
+ Use the AWS Management Console to launch an Amazon Redshift cluster. Note the basic information about your AWS account and your Amazon Redshift cluster, such as your password, user name, and database name. You need these values when creating the Amazon Redshift target endpoint. 
+ The Amazon Redshift cluster must be in the same AWS account and the same AWS Region as the replication instance.
+ The AWS DMS replication instance needs network connectivity to the Amazon Redshift endpoint (hostname and port) that your cluster uses.
+ AWS DMS uses an Amazon S3 bucket to transfer data to the Amazon Redshift database. For AWS DMS to create the bucket, the console uses an IAM role, `dms-access-for-endpoint`. If you use the AWS CLI or DMS API to create a database migration with Amazon Redshift as the target database, you must create this IAM role. For more information about creating this role, see [Creating the IAM roles to use with AWS DMS](security-iam.md#CHAP_Security.APIRole). 
+ AWS DMS converts BLOBs, CLOBs, and NCLOBs to a VARCHAR on the target Amazon Redshift instance. Amazon Redshift does not support VARCHAR data types larger than 64 KB, so you can't store traditional LOBs on Amazon Redshift. 
+ Set the target metadata task setting [BatchApplyEnabled](CHAP_Tasks.CustomizingTasks.TaskSettings.ChangeProcessingTuning.md) to `true` for AWS DMS to handle changes to Amazon Redshift target tables during CDC. A Primary Key on both the source and target table is required. Without a Primary Key, changes are applied statement by statement. And that can adversely affect task performance during CDC by causing target latency and impacting the cluster commit queue. 
+ When Row Level Security is enabled on the tables in Redshift, you must grant appropriate permissions to all your DMS users.

## Privileges required for using Redshift as a target
<a name="CHAP_Target.Redshift.Privileges"></a>

Use the GRANT command to define access privileges for a user or user group. Privileges include access options such as being able to read data in tables and views, write data, and create tables. For more information about using GRANT with Amazon Redshift, see [GRANT](https://docs.aws.amazon.com//redshift/latest/dg/r_GRANT.html) in the *Amazon Redshift Database Developer Guide*. 

The following is the syntax to give specific privileges for a table, database, schema, function, procedure, or language-level privileges on Amazon Redshift tables and views.

```
GRANT { { SELECT | INSERT | UPDATE | DELETE | REFERENCES } [,...] | ALL [ PRIVILEGES ] }
    ON { [ TABLE ] table_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { { CREATE | TEMPORARY | TEMP } [,...] | ALL [ PRIVILEGES ] }
    ON DATABASE db_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { { CREATE | USAGE } [,...] | ALL [ PRIVILEGES ] }
    ON SCHEMA schema_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON { FUNCTION function_name ( [ [ argname ] argtype [, ...] ] ) [, ...] | ALL FUNCTIONS IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON { PROCEDURE procedure_name ( [ [ argname ] argtype [, ...] ] ) [, ...] | ALL PROCEDURES IN SCHEMA schema_name [, ...] }
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]

GRANT USAGE 
    ON LANGUAGE language_name [, ...]
    TO { username [ WITH GRANT OPTION ] | GROUP group_name | PUBLIC } [, ...]
```

The following is the syntax for column-level privileges on Amazon Redshift tables and views. 

```
GRANT { { SELECT | UPDATE } ( column_name [, ...] ) [, ...] | ALL [ PRIVILEGES ] ( column_name [,...] ) }
     ON { [ TABLE ] table_name [, ...] }
     TO { username | GROUP group_name | PUBLIC } [, ...]
```

The following is the syntax for the ASSUMEROLE privilege granted to users and groups with a specified role.

```
GRANT ASSUMEROLE
    ON { 'iam_role' [, ...] | ALL }
    TO { username | GROUP group_name | PUBLIC } [, ...]
    FOR { ALL | COPY | UNLOAD } [, ...]
```
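For example, a minimal set of grants for a hypothetical `dms_user` loading into the `public` schema of database `mydb` might look like the following; adjust the object names and privileges to your own environment and security requirements:

```
GRANT CREATE, TEMPORARY ON DATABASE mydb TO dms_user;
GRANT CREATE, USAGE ON SCHEMA public TO dms_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO dms_user;
```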

## Limitations on using Amazon Redshift as a target for AWS Database Migration Service
<a name="CHAP_Target.Redshift.Limitations"></a>

The following limitations apply when using an Amazon Redshift database as a target:
+ Don’t enable versioning for the S3 bucket you use as intermediate storage for your Amazon Redshift target. If you need S3 versioning, use lifecycle policies to actively delete old versions. Otherwise, you might encounter endpoint test connection failures because of an S3 `list-object` call timeout. To create a lifecycle policy for an S3 bucket, see [ Managing your storage lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html). To delete a version of an S3 object, see [ Deleting object versions from a versioning-enabled bucket](https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingObjectVersions.html).
+ The following DDL is not supported:

  ```
  ALTER TABLE table_name MODIFY COLUMN column_name data_type;
  ```
+  AWS DMS can't migrate or replicate changes to a schema with a name that begins with an underscore (\_). If you have schemas that have a name that begins with an underscore, use mapping transformations to rename the schema on the target. 
+  Amazon Redshift does not support VARCHARs larger than 64 KB. LOBs from traditional databases can't be stored in Amazon Redshift.
+  Applying a DELETE statement to a table with a multi-column primary key is not supported when any of the primary key column names use a reserved word. For a list of Amazon Redshift reserved words, see [Reserved words](https://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html).
+ You might experience performance issues if your source system performs UPDATE operations on the primary key of a source table. These performance issues occur when applying changes to the target. This is because UPDATE (and DELETE) operations depend on the primary key value to identify the target row. If you update the primary key of a source table, your task log will contain messages like the following:

  ```
  Update on table 1 changes PK to a PK that was previously updated in the same bulk update.
  ```
+ DMS does not support custom DNS names when configuring an endpoint for a Redshift cluster, and you need to use the Amazon provided DNS name. Since the Amazon Redshift cluster must be in the same AWS account and Region as the replication instance, validation fails if you use a custom DNS endpoint.
+ Amazon Redshift has a default 4-hour idle session timeout. When there isn't any activity within the DMS replication task, Redshift disconnects the session after 4 hours. Errors can result because DMS can't connect and might need to restart the task. As a workaround, set a SESSION TIMEOUT limit greater than 4 hours for the DMS replication user. For more information, see the description of [ALTER USER](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_USER.html) in the *Amazon Redshift Database Developer Guide*.
+ When AWS DMS replicates source table data without a primary or unique key, CDC latency might be high, resulting in an unacceptable level of performance.
+ Truncating partitions is not supported during CDC replication from Oracle source to Redshift target.
+ Duplicate records might appear in target tables because Amazon Redshift does not enforce primary keys and AWS DMS may replay CDC when a task is resumed. To prevent duplicates, use the `ApplyErrorInsertPolicy=INSERT_RECORD` setting. For more information, see [Error handling task settings](CHAP_Tasks.CustomizingTasks.TaskSettings.ErrorHandling.md). Alternatively, you can implement application-level duplicate detection and post-migration cleanup procedures.

## Configuring an Amazon Redshift database as a target for AWS Database Migration Service
<a name="CHAP_Target.Redshift.Configuration"></a>

AWS Database Migration Service must be configured to work with the Amazon Redshift instance. The following table describes the configuration properties available for the Amazon Redshift endpoint.


| Property | Description | 
| --- | --- | 
| server | The name of the Amazon Redshift cluster you are using. | 
| port | The port number for Amazon Redshift. The default value is 5439. | 
| username | An Amazon Redshift user name for a registered user. | 
| password | The password for the user named in the username property. | 
| database | The name of the Amazon Redshift data warehouse (service) you are working with. | 

If you want to add extra connection string attributes to your Amazon Redshift endpoint, you can specify the `maxFileSize` and `fileTransferUploadStreams` attributes. For more information on these attributes, see [Endpoint settings when using Amazon Redshift as a target for AWS DMS](#CHAP_Target.Redshift.ConnectionAttrib).
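With the AWS CLI, these attributes correspond to the `MaxFileSize` and `FileTransferUploadStreams` keys of `--redshift-settings`; the endpoint ARN and values below are illustrative placeholders:

```
aws dms modify-endpoint \
    --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE \
    --redshift-settings '{"MaxFileSize":250000,"FileTransferUploadStreams":20}'
```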

## Using enhanced VPC routing with Amazon Redshift as a target for AWS Database Migration Service
<a name="CHAP_Target.Redshift.EnhancedVPC"></a>

If you use Enhanced VPC Routing with your Amazon Redshift target, all COPY traffic between your Amazon Redshift cluster and your data repositories goes through your VPC. Because Enhanced VPC Routing affects the way that Amazon Redshift accesses other resources, COPY commands might fail if you haven't configured your VPC correctly.

AWS DMS can be affected by this behavior because it uses the COPY command to move data in S3 to an Amazon Redshift cluster.

Following are the steps AWS DMS takes to load data into an Amazon Redshift target:

1. AWS DMS copies data from the source to .csv files on the replication server.

1. AWS DMS uses the AWS SDK to copy the .csv files into an S3 bucket on your account.

1. AWS DMS then uses the COPY command in Amazon Redshift to copy data from the .csv files in S3 to an appropriate table in Amazon Redshift.

If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the internet, including traffic to other services within the AWS network. If the feature is not enabled, you do not have to configure the network path. If the feature is enabled, you must specifically create a network path between your cluster's VPC and your data resources. For more information on the configuration required, see [Enhanced VPC routing](https://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-routing.html) in the Amazon Redshift documentation. 
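When Enhanced VPC Routing is enabled, a gateway VPC endpoint for Amazon S3 is one common way to provide the required network path for the COPY traffic described above. The following AWS CLI sketch assumes placeholder VPC and route table IDs and the us-east-1 Region; confirm the details against the Amazon Redshift documentation linked in the preceding paragraph:

```
aws ec2 create-vpc-endpoint --vpc-id your-vpc-id 
--service-name com.amazonaws.us-east-1.s3 
--route-table-ids your-route-table-id
```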

## Creating and using AWS KMS keys to encrypt Amazon Redshift target data
<a name="CHAP_Target.Redshift.KMSKeys"></a>

You can encrypt your target data pushed to Amazon S3 before it is copied to Amazon Redshift. To do so, you can create and use custom AWS KMS keys. You can use the key you created to encrypt your target data using one of the following mechanisms when you create the Amazon Redshift target endpoint:
+ Use the following option when you run the `create-endpoint` command using the AWS CLI.

  ```
  --redshift-settings '{"EncryptionMode": "SSE_KMS", "ServerSideEncryptionKmsKeyId": "your-kms-key-ARN"}'
  ```

  Here, `your-kms-key-ARN` is the Amazon Resource Name (ARN) for your KMS key. For more information, see [Using a data encryption key, and an Amazon S3 bucket as intermediate storage](#CHAP_Target.Redshift.EndpointSettings).
+ Set the extra connection attribute `encryptionMode` to the value `SSE_KMS` and the extra connection attribute `serverSideEncryptionKmsKeyId` to the ARN for your KMS key. For more information, see [Endpoint settings when using Amazon Redshift as a target for AWS DMS](#CHAP_Target.Redshift.ConnectionAttrib).

To encrypt Amazon Redshift target data using a KMS key, you need an AWS Identity and Access Management (IAM) role that has permissions to access Amazon Redshift data. This IAM role is then accessed in a policy (a key policy) attached to the encryption key that you create. You can do this in your IAM console by creating the following:
+ An IAM role with an AWS-managed policy.
+ A KMS key with a key policy that references this role.

The following procedures describe how to do this.

**To create an IAM role with the required AWS-managed policy**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. The **Roles** page opens.

1. Choose **Create role**. The **Create role** page opens.

1. With **AWS service** chosen as the trusted entity, choose **DMS** as the service to use the role.

1. Choose **Next: Permissions**. The **Attach permissions policies** page appears.

1. Find and select the `AmazonDMSRedshiftS3Role` policy.

1. Choose **Next: Tags**. The **Add tags** page appears. Here, you can add any tags you want.

1. Choose **Next: Review** and review your results.

1. If the settings are what you need, enter a name for the role (for example, `DMS-Redshift-endpoint-access-role`), and any additional description, then choose **Create role**. The **Roles** page opens with a message indicating that your role has been created.

You have now created the new role to access Amazon Redshift resources for encryption with a specified name, for example `DMS-Redshift-endpoint-access-role`.

**To create an AWS KMS encryption key with a key policy that references your IAM role**
**Note**  
For more information about how AWS DMS works with AWS KMS encryption keys, see [Setting an encryption key and specifying AWS KMS permissions](CHAP_Security.md#CHAP_Security.EncryptionKey).

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. In the navigation pane, choose **Customer managed keys**.

1. Choose **Create key**. The **Configure key** page opens.

1. For **Key type**, choose **Symmetric**.
**Note**  
When you create this key, you can create only a symmetric key, because AWS services such as Amazon Redshift work only with symmetric encryption keys.

1. Choose **Advanced Options**. For **Key material origin**, make sure that **KMS** is chosen, then choose **Next**. The **Add labels** page opens.

1. For **Create alias and description**, enter an alias for the key (for example, `DMS-Redshift-endpoint-encryption-key`) and any additional description.

1. For **Tags**, add any tags that you want to help identify the key and track its usage, then choose **Next**. The **Define key administrative permissions** page opens showing a list of users and roles that you can choose from.

1. Add the users and roles that you want to manage the key. Make sure that these users and roles have the required permissions to manage the key. 

1. For **Key deletion**, choose whether key administrators can delete the key, then choose **Next**. The **Define key usage permissions** page opens showing an additional list of users and roles that you can choose from.

1. For **This account**, choose the users that you want to be able to perform cryptographic operations on Amazon Redshift targets. Also choose the role that you previously created in **Roles** to enable access to encrypt Amazon Redshift target objects (for example, `DMS-Redshift-endpoint-access-role`).

1. If you want to add other accounts not listed to have this same access, for **Other AWS accounts**, choose **Add another AWS account**, then choose **Next**. The **Review and edit key policy** page opens, showing the JSON for the key policy that you can review and edit by typing into the existing JSON. Here, you can see where the key policy references the role and users (for example, `Admin` and `User1`) that you chose in the previous step. You can also see the different key actions permitted for the different principals (users and roles), as shown in the following example.

------
#### [ JSON ]


   ```
   {
       "Id": "key-consolepolicy-3",
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "Enable IAM User Permissions",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:root"
                   ]
               },
               "Action": "kms:*",
               "Resource": "*"
           },
           {
               "Sid": "Allow access for Key Administrators",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/Admin"
                   ]
               },
               "Action": [
                   "kms:Create*",
                   "kms:Describe*",
                   "kms:Enable*",
                   "kms:List*",
                   "kms:Put*",
                   "kms:Update*",
                   "kms:Revoke*",
                   "kms:Disable*",
                   "kms:Get*",
                   "kms:Delete*",
                   "kms:TagResource",
                   "kms:UntagResource",
                   "kms:ScheduleKeyDeletion",
                   "kms:CancelKeyDeletion"
               ],
               "Resource": "*"
           },
           {
               "Sid": "Allow use of the key",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/DMS-Redshift-endpoint-access-role",
                       "arn:aws:iam::111122223333:role/Admin",
                       "arn:aws:iam::111122223333:role/User1"
                   ]
               },
               "Action": [
                   "kms:Encrypt",
                   "kms:Decrypt",
                   "kms:ReEncrypt*",
                   "kms:GenerateDataKey*",
                   "kms:DescribeKey"
               ],
               "Resource": "*"
           },
           {
               "Sid": "Allow attachment of persistent resources",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/DMS-Redshift-endpoint-access-role",
                       "arn:aws:iam::111122223333:role/Admin",
                       "arn:aws:iam::111122223333:role/User1"
                   ]
               },
               "Action": [
                   "kms:CreateGrant",
                   "kms:ListGrants",
                   "kms:RevokeGrant"
               ],
               "Resource": "*",
               "Condition": {
                   "Bool": {
                       "kms:GrantIsForAWSResource": true
                   }
               }
           }
       ]
   }
   ```

------

1. Choose **Finish**. The **Encryption keys** page opens with a message indicating that your AWS KMS key has been created.

You have now created a new KMS key with a specified alias (for example, `DMS-Redshift-endpoint-encryption-key`). This key enables AWS DMS to encrypt Amazon Redshift target data.

## Endpoint settings when using Amazon Redshift as a target for AWS DMS
<a name="CHAP_Target.Redshift.ConnectionAttrib"></a>

You can use endpoint settings to configure your Amazon Redshift target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--redshift-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following table shows the endpoint settings that you can use with Amazon Redshift as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redshift.html)

## Using a data encryption key, and an Amazon S3 bucket as intermediate storage
<a name="CHAP_Target.Redshift.EndpointSettings"></a>

You can use Amazon Redshift target endpoint settings to configure the following:
+ A custom AWS KMS data encryption key. You can then use this key to encrypt your data pushed to Amazon S3 before it is copied to Amazon Redshift.
+ A custom S3 bucket as intermediate storage for data migrated to Amazon Redshift.
+ Map a boolean as a boolean from a PostgreSQL source. By default, a BOOLEAN type is migrated as varchar(1). You can specify `MapBooleanAsBoolean` to let your Redshift target migrate the boolean type as boolean, as shown in the example following.

  ```
  --redshift-settings '{"MapBooleanAsBoolean": true}'
  ```

  Note that you must set this setting on both the source and target endpoints for it to take effect.
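  Because `MapBooleanAsBoolean` must be set on both endpoints, you might apply it with two `modify-endpoint` calls, as in the following sketch. The endpoint ARNs are placeholders, and the example assumes a PostgreSQL source:

  ```
  aws dms modify-endpoint --endpoint-arn your-source-endpoint-ARN 
  --postgre-sql-settings '{"MapBooleanAsBoolean": true}'

  aws dms modify-endpoint --endpoint-arn your-target-endpoint-ARN 
  --redshift-settings '{"MapBooleanAsBoolean": true}'
  ```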

### KMS key settings for data encryption
<a name="CHAP_Target.Redshift.EndpointSettings.KMSkeys"></a>

The following examples show how to configure a custom KMS key to encrypt your data pushed to S3. To start, you might make the following `create-endpoint` call using the AWS CLI.

```
aws dms create-endpoint --endpoint-identifier redshift-target-endpoint --endpoint-type target 
--engine-name redshift --username your-username --password your-password 
--server-name your-server-name --port 5439 --database-name your-db-name 
--redshift-settings '{"EncryptionMode": "SSE_KMS", 
"ServerSideEncryptionKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/24c3c5a1-f34a-4519-a85b-2debbef226d1"}'
```

Here, the JSON object specified by the `--redshift-settings` option defines two parameters. One is an `EncryptionMode` parameter with the value `SSE_KMS`. The other is a `ServerSideEncryptionKmsKeyId` parameter with the value `arn:aws:kms:us-east-1:111122223333:key/24c3c5a1-f34a-4519-a85b-2debbef226d1`. This value is the Amazon Resource Name (ARN) for your custom KMS key.

By default, S3 data encryption occurs using S3 server-side encryption. For the previous example's Amazon Redshift target, this default is equivalent to specifying the following endpoint settings.

```
aws dms create-endpoint --endpoint-identifier redshift-target-endpoint --endpoint-type target 
--engine-name redshift --username your-username --password your-password 
--server-name your-server-name --port 5439 --database-name your-db-name 
--redshift-settings '{"EncryptionMode": "SSE_S3"}'
```

For more information about working with S3 server-side encryption, see [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html) in the *Amazon Simple Storage Service User Guide.*

**Note**  
You can also use the CLI `modify-endpoint` command to change the value of the `EncryptionMode` parameter for an existing endpoint from `SSE_KMS` to `SSE_S3`. But you can’t change the `EncryptionMode` value from `SSE_S3` to `SSE_KMS`.

### Amazon S3 bucket settings
<a name="CHAP_Target.Redshift.EndpointSettings.S3Buckets"></a>

When you migrate data to an Amazon Redshift target endpoint, AWS DMS uses a default Amazon S3 bucket as intermediate task storage before copying the migrated data to Amazon Redshift. For example, the examples shown for creating an Amazon Redshift target endpoint with an AWS KMS data encryption key use this default S3 bucket (see [KMS key settings for data encryption](#CHAP_Target.Redshift.EndpointSettings.KMSkeys)). 

You can instead specify a custom S3 bucket for this intermediate storage by including the following parameters in the value of your `--redshift-settings` option on the AWS CLI `create-endpoint` command:
+ `BucketName` – A string you specify as the name of the S3 bucket storage. If your service access role is based on the `AmazonDMSRedshiftS3Role` policy, this value must have a prefix of `dms-`, for example, `dms-my-bucket-name`.
+ `BucketFolder` – (Optional) A string you can specify as the name of the storage folder in the specified S3 bucket.
+ `ServiceAccessRoleArn` – The ARN of an IAM role that permits administrative access to the S3 bucket. Typically, you create this role based on the `AmazonDMSRedshiftS3Role` policy. For an example, see the procedure to create an IAM role with the required AWS-managed policy in [Creating and using AWS KMS keys to encrypt Amazon Redshift target data](#CHAP_Target.Redshift.KMSKeys).
**Note**  
If you specify the ARN of a different IAM role using the `--service-access-role-arn` option of the `create-endpoint` command, this IAM role option takes precedence.

The following example shows how you might use these parameters to specify a custom Amazon S3 bucket in the following `create-endpoint` call using the AWS CLI. 

```
aws dms create-endpoint --endpoint-identifier redshift-target-endpoint --endpoint-type target 
--engine-name redshift --username your-username --password your-password 
--server-name your-server-name --port 5439 --database-name your-db-name 
--redshift-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", 
"BucketName": "your-bucket-name", "BucketFolder": "your-bucket-folder-name"}'
```

## Multithreaded task settings for Amazon Redshift
<a name="CHAP_Target.Redshift.ParallelApply"></a>

You can improve performance of full load and change data capture (CDC) tasks for an Amazon Redshift target endpoint by using multithreaded task settings. They enable you to specify the number of concurrent threads and the number of records to store in a buffer.

### Multithreaded full load task settings for Amazon Redshift
<a name="CHAP_Target.Redshift.ParallelApply.FullLoad"></a>

To promote full load performance, you can use the following `ParallelLoad*` task settings:
+ `ParallelLoadThreads` – Specifies the number of concurrent threads that DMS uses during a full load to push data records to an Amazon Redshift target endpoint. The default value is zero (0) and the maximum value is 32. For more information, see [Full-load task settings](CHAP_Tasks.CustomizingTasks.TaskSettings.FullLoad.md).

  When you use the `ParallelLoadThreads` task setting, you can also set the `enableParallelBatchInMemoryCSVFiles` attribute to `false`. Doing so improves performance of larger multithreaded full load tasks by having DMS write to disk instead of memory. The default value is `true`.
+ `ParallelLoadBufferSize` – Specifies the maximum number of data records to buffer when using parallel load threads with a Redshift target. The default value is 100 and the maximum value is 1,000. We recommend using this option when `ParallelLoadThreads` > 1 (greater than one).

**Note**  
Support for the use of `ParallelLoad*` task settings during FULL LOAD to Amazon Redshift target endpoints is available in AWS DMS versions 3.4.5 and higher.  
The `ReplaceInvalidChars` Redshift endpoint setting is not supported for use during change data capture (CDC) or during a parallel load enabled FULL LOAD migration task. It is supported for FULL LOAD migration when parallel load isn't enabled. For more information, see [RedshiftSettings](https://docs.aws.amazon.com/dms/latest/APIReference/API_RedshiftSettings.html) in the *AWS Database Migration Service API Reference*.
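As a sketch, the `ParallelLoad*` settings above belong in the `TargetMetadata` section of your task settings JSON. The values shown are illustrative only:

```
{
    "TargetMetadata": {
        "ParallelLoadThreads": 16,
        "ParallelLoadBufferSize": 100
    }
}
```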

### Multithreaded CDC task settings for Amazon Redshift
<a name="CHAP_Target.Redshift.ParallelApply.CDC"></a>

To promote CDC performance, you can use the following `ParallelApply*` task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to an Amazon Redshift target endpoint. The default value is zero (0) and the maximum value is 32. The minimum recommended value equals the number of slices in your cluster.
+ `ParallelApplyBufferSize` – Specifies the maximum number of data records to buffer when using parallel apply threads with a Redshift target. The default value is 100 and the maximum value is 1,000. We recommend using this option when `ParallelApplyThreads` > 1 (greater than one). 

  To obtain the most benefit for Redshift as a target, we recommend that the value of `ParallelApplyBufferSize` be at least two times (double or more) the number of `ParallelApplyThreads`.

**Note**  
Support for the use of `ParallelApply*` task settings during CDC to Amazon Redshift target endpoints is available in AWS DMS versions 3.4.3 and higher.

The level of parallelism applied depends on the correlation between the total *batch size* and the *maximum file size* used to transfer data. When using multithreaded CDC task settings with a Redshift target, benefits are gained when batch size is large in relation to the maximum file size. For example, you can use the following combination of endpoint and task settings to tune for optimal performance. 

```
// Redshift endpoint setting
                
        MaxFileSize=250000;

// Task settings

        BatchApplyEnabled=true;
        BatchSplitSize=8000;
        BatchApplyTimeoutMax=1800;
        BatchApplyTimeoutMin=1800;
        ParallelApplyThreads=32;
        ParallelApplyBufferSize=100;
```

With the settings in the previous example, a customer with a heavy transactional workload benefits because the batch buffer, which holds 8,000 records, fills within the 1,800-second timeout and is applied by 32 parallel threads with a 250 MB maximum file size.

For more information, see [Change processing tuning settings](CHAP_Tasks.CustomizingTasks.TaskSettings.ChangeProcessingTuning.md).

**Note**  
DMS queries that run during ongoing replication to a Redshift cluster can share the same WLM (workload management) queue with other application queries that are running. So, consider properly configuring WLM properties to influence performance during ongoing replication to a Redshift target. For example, if other parallel ETL queries are running, DMS runs slower and performance gains are lost.

## Target data types for Amazon Redshift
<a name="CHAP_Target.Redshift.DataTypes"></a>

The Amazon Redshift endpoint for AWS DMS supports most Amazon Redshift data types. The following table shows the Amazon Redshift target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


| AWS DMS data types | Amazon Redshift data types | 
| --- | --- | 
| BOOLEAN | BOOL | 
| BYTES | VARCHAR (Length) | 
| DATE | DATE | 
| TIME | VARCHAR(20) | 
| DATETIME |  If the scale is => 0 and =< 6, depending on Redshift target column type, then one of the following: TIMESTAMP (s) TIMESTAMPTZ (s) — If source timestamp contains a zone offset (such as in SQL Server or Oracle) it converts to UTC on insert/update. If it does not contain an offset, then time is considered in UTC already. If the scale is => 7 and =< 9, then:  VARCHAR (37) | 
| INT1 | INT2 | 
| INT2 | INT2 | 
| INT4 | INT4 | 
| INT8 | INT8 | 
| NUMERIC | If the scale is => 0 and =< 37, then:  NUMERIC (p,s)  If the scale is => 38 and =< 127, then:  VARCHAR (Length) | 
| REAL4 | FLOAT4 | 
| REAL8 | FLOAT8 | 
| STRING | If the length is 1–65,535, then use VARCHAR (length in bytes)  If the length is 65,536–2,147,483,647, then use VARCHAR (65535) | 
| UINT1 | INT2 | 
| UINT2 | INT2 | 
| UINT4 | INT4 | 
| UINT8 | NUMERIC (20,0) | 
| WSTRING |  If the length is 1–65,535, then use NVARCHAR (length in bytes)  If the length is 65,536–2,147,483,647, then use NVARCHAR (65535) | 
| BLOB | VARCHAR (maximum LOB size \*2)  The maximum LOB size cannot exceed 31 KB. Amazon Redshift does not support VARCHARs larger than 64 KB. | 
| NCLOB | NVARCHAR (maximum LOB size)  The maximum LOB size cannot exceed 63 KB. Amazon Redshift does not support VARCHARs larger than 64 KB. | 
| CLOB | VARCHAR (maximum LOB size)  The maximum LOB size cannot exceed 63 KB. Amazon Redshift does not support VARCHARs larger than 64 KB. | 

## Using AWS DMS with Amazon Redshift Serverless as a Target
<a name="CHAP_Target.Redshift.RSServerless"></a>

AWS DMS supports using Amazon Redshift Serverless as a target endpoint. For information about using Amazon Redshift Serverless, see [Amazon Redshift Serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html) in the [Amazon Redshift Management Guide](https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html).

This topic describes how to use an Amazon Redshift Serverless endpoint with AWS DMS.

**Note**  
When creating an Amazon Redshift Serverless endpoint, for the **DatabaseName** field of your [RedshiftSettings](https://docs.aws.amazon.com/dms/latest/APIReference/API_RedshiftSettings.html) endpoint configuration, use either the name of the Amazon Redshift data warehouse or the name of the workgroup endpoint. For the **ServerName** field, use the value for Endpoint displayed in the **Workgroup** page for the serverless cluster (for example, `default-workgroup.093291321484.us-east-1.redshift-serverless.amazonaws.com`). For information about creating an endpoint, see [Creating source and target endpoints](CHAP_Endpoints.Creating.md). For information about the workgroup endpoint, see [Connecting to Amazon Redshift Serverless](https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-connecting.html).

### Trust Policy with Amazon Redshift Serverless as a target
<a name="CHAP_Target.Redshift.RSServerless.policy"></a>

When using Amazon Redshift Serverless as a target endpoint, you must allow the Amazon Redshift Serverless service principal in the trust policy that is attached to the `dms-access-for-endpoint` role.
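In outline, the resulting trust policy might look like the following, with the Amazon Redshift Serverless service principal added alongside the AWS DMS principal. Treat this as a sketch and confirm it against the role that AWS DMS created in your account:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "dms.amazonaws.com",
                    "redshift-serverless.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```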

For more information about using a trust policy with AWS DMS, see [Creating the IAM roles to use with AWS DMS](security-iam.md#CHAP_Security.APIRole).

### Limitations when using Amazon Redshift Serverless as a target
<a name="CHAP_Target.Redshift.RSServerless.Limitations"></a>

Using Redshift Serverless as a target has the following limitations:
+ AWS DMS only supports Amazon Redshift Serverless as an endpoint in regions that support Amazon Redshift Serverless. For information about which regions support Amazon Redshift Serverless, see **Redshift Serverless API** in the [Amazon Redshift endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/redshift-service.html) topic in the [AWS General Reference](https://docs.aws.amazon.com/general/latest/gr/Welcome.html).
+ When using Enhanced VPC Routing, make sure that you create an Amazon S3 endpoint in the same VPC as your Redshift Serverless or Redshift Provisioned cluster. For more information, see [Using enhanced VPC routing with Amazon Redshift as a target for AWS Database Migration Service](#CHAP_Target.Redshift.EnhancedVPC).
+ AWS DMS does not support Enhanced Throughput for Amazon Redshift Serverless as a target. For more information, see [Enhanced Throughput for Full-Load Oracle to Amazon Redshift and Amazon S3 Migrations](CHAP_Serverless.Components.md#CHAP_Serverless.Throughput).
+ AWS DMS does not support connections to Amazon Redshift Serverless when the SSL mode is set to `verify-full`. For connections requiring SSL verification to Amazon Redshift Serverless targets, use alternative SSL modes such as `require` or `verify-ca`.

# Using a SAP ASE database as a target for AWS Database Migration Service
<a name="CHAP_Target.SAP"></a>

You can migrate data to SAP Adaptive Server Enterprise (ASE) databases (formerly known as Sybase) using AWS DMS from any of the supported database sources.

For information about versions of SAP ASE that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md).

## Prerequisites for using a SAP ASE database as a target for AWS Database Migration Service
<a name="CHAP_Target.SAP.Prerequisites"></a>

Before you begin to work with a SAP ASE database as a target for AWS DMS, make sure that you have the following prerequisites:
+ Provide SAP ASE account access to the AWS DMS user. This user must have read/write privileges in the SAP ASE database.
+ In some cases, you might replicate to SAP ASE version 15.7 installed on an Amazon EC2 instance on Microsoft Windows that is configured with non-Latin characters (for example, Chinese). In such cases, AWS DMS requires SAP ASE 15.7 SP121 to be installed on the target SAP ASE machine.

## Limitations when using a SAP ASE database as a target for AWS DMS
<a name="CHAP_Target.SAP.Limitations"></a>

The following limitations apply when using a SAP ASE database as a target for AWS DMS:
+ AWS DMS doesn't support tables that include fields with the following data types. Replicated columns with these data types show as null. 
  + User-defined type (UDT)

## Endpoint settings when using SAP ASE as a target for AWS DMS
<a name="CHAP_Target.SAP.ConnectionAttrib"></a>

You can use endpoint settings to configure your SAP ASE target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--sybase-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following table shows the endpoint settings that you can use with SAP ASE as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.SAP.html)

## Target data types for SAP ASE
<a name="CHAP_Target.SAP.DataTypes"></a>

The following table shows the SAP ASE database target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data types  |  SAP ASE data types  | 
| --- | --- | 
| BOOLEAN | BIT | 
| BYTES | VARBINARY (Length) | 
| DATE | DATE | 
| TIME | TIME | 
| TIMESTAMP |  If scale is => 0 and =< 6, then: BIGDATETIME  If scale is => 7 and =< 9, then: VARCHAR (37)  | 
| INT1 | TINYINT | 
| INT2 | SMALLINT | 
| INT4 | INTEGER | 
| INT8 | BIGINT | 
| NUMERIC | NUMERIC (p,s) | 
| REAL4 | REAL | 
| REAL8 | DOUBLE PRECISION | 
| STRING | VARCHAR (Length) | 
| UINT1 | TINYINT | 
| UINT2 | UNSIGNED SMALLINT | 
| UINT4 | UNSIGNED INTEGER | 
| UINT8 | UNSIGNED BIGINT | 
| WSTRING | VARCHAR (Length) | 
| BLOB | IMAGE | 
| CLOB | UNITEXT | 
| NCLOB | TEXT | 

# Using Amazon S3 as a target for AWS Database Migration Service
<a name="CHAP_Target.S3"></a>

You can migrate data to Amazon S3 using AWS DMS from any of the supported database sources. When using Amazon S3 as a target in an AWS DMS task, both full load and change data capture (CDC) data are written in comma-separated value (.csv) format by default. For more compact storage and faster query options, you can instead have the data written in Apache Parquet (.parquet) format. 

AWS DMS names files created during a full load using an incremental hexadecimal counter, for example LOAD00001.csv, LOAD00002.csv, and so on through LOAD00009.csv, then LOAD0000A.csv, and so on for .csv files. AWS DMS names CDC files using timestamps, for example 20141029-1134010000.csv. For each source table that contains records, AWS DMS creates a folder under the specified target folder. AWS DMS writes all full load and CDC files to the specified Amazon S3 bucket. You can control the size of the files that AWS DMS creates by using the [MaxFileSize](https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-MaxFileSize) endpoint setting. 

The parameter `bucketFolder` contains the location where the .csv or .parquet files are stored before being uploaded to the S3 bucket. With .csv files, table data is stored in the following format in the S3 bucket, shown with full-load files.

```
database_schema_name/table_name/LOAD00000001.csv
database_schema_name/table_name/LOAD00000002.csv
...
database_schema_name/table_name/LOAD00000009.csv
database_schema_name/table_name/LOAD0000000A.csv
database_schema_name/table_name/LOAD0000000B.csv
...
database_schema_name/table_name/LOAD0000000F.csv
database_schema_name/table_name/LOAD00000010.csv
...
```

You can specify the column delimiter, row delimiter, and other parameters using the extra connection attributes. For more information on the extra connection attributes, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring) at the end of this section.
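For instance, the following `--s3-settings` fragment sketches how the delimiters might be overridden; the values shown are the defaults (a comma column delimiter and a newline row delimiter):

```
--s3-settings '{"CsvDelimiter": ",", "CsvRowDelimiter": "\\n"}'
```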

To prevent spoofing, AWS DMS validates bucket ownership before performing operations. By default, when the `ExpectedBucketOwner` Amazon S3 endpoint setting is not specified, AWS DMS uses the AWS account ID that owns the AWS DMS service role as the expected bucket owner.

To migrate data to an S3 bucket owned by a different AWS account, you must explicitly specify the actual bucket owner in the `ExpectedBucketOwner` Amazon S3 endpoint setting, as shown following. Otherwise, the cross-account replication task will fail.

```
--s3-settings '{"ExpectedBucketOwner": "AWS_Account_ID"}'
```

When you use AWS DMS to replicate data changes using a CDC task, the first column of the .csv or .parquet output file indicates how the row data was changed as shown for the following .csv file.

```
I,101,Smith,Bob,4-Jun-14,New York
U,101,Smith,Bob,8-Oct-15,Los Angeles
U,101,Smith,Bob,13-Mar-17,Dallas
D,101,Smith,Bob,13-Mar-17,Dallas
```

For this example, suppose that there is an `EMPLOYEE` table in the source database. AWS DMS writes data to the .csv or .parquet file, in response to the following events:
+ A new employee (Bob Smith, employee ID 101) is hired on 4-Jun-14 at the New York office. In the .csv or .parquet file, the `I` in the first column indicates that a new row was `INSERT`ed into the EMPLOYEE table at the source database.
+ On 8-Oct-15, Bob transfers to the Los Angeles office. In the .csv or .parquet file, the `U` indicates that the corresponding row in the EMPLOYEE table was `UPDATE`d to reflect Bob's new office location. The rest of the line reflects the row in the EMPLOYEE table as it appears after the `UPDATE`. 
+ On 13-Mar-17, Bob transfers again to the Dallas office. In the .csv or .parquet file, the `U` indicates that this row was `UPDATE`d again. The rest of the line reflects the row in the EMPLOYEE table as it appears after the `UPDATE`.
+ After some time working in Dallas, Bob leaves the company. In the .csv or .parquet file, the `D` indicates that the row was `DELETE`d in the source table. The rest of the line reflects how the row in the EMPLOYEE table appeared before it was deleted.
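
As a rough sketch of how a consumer might interpret these flags, the following Python replays the example rows above in order, applying each `I`, `U`, and `D` to an in-memory copy of the table. The column layout (operation flag first, then the full row image keyed by employee ID) is taken from the example; everything else is illustrative.

```python
import csv
import io

# The CDC rows from the example above: operation flag, then the row image.
cdc_data = """\
I,101,Smith,Bob,4-Jun-14,New York
U,101,Smith,Bob,8-Oct-15,Los Angeles
U,101,Smith,Bob,13-Mar-17,Dallas
D,101,Smith,Bob,13-Mar-17,Dallas
"""

def replay(cdc_file):
    """Apply I/U/D flags in order, keyed on the first data column (employee ID)."""
    rows = {}
    for op, *row in csv.reader(cdc_file):
        key = row[0]
        if op in ("I", "U"):
            rows[key] = row          # insert, or overwrite with the new row image
        elif op == "D":
            rows.pop(key, None)      # delete removes the row entirely
    return rows

print(replay(io.StringIO(cdc_data)))   # {} -- Bob's row was eventually deleted
```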

Note that by default for CDC, AWS DMS stores the row changes for each database table without regard to transaction order. If you want to store the row changes in CDC files according to transaction order, you need to use S3 endpoint settings to specify this and the folder path where you want the CDC transaction files to be stored on the S3 target. For more information, see [Capturing data changes (CDC) including transaction order on the S3 target](#CHAP_Target.S3.EndpointSettings.CdcPath).

To control the frequency of writes to an Amazon S3 target during a data replication task, you can configure the `cdcMaxBatchInterval` and `cdcMinFileSize` extra connection attributes. Tuning these settings can improve performance when you analyze the data, without any additional overhead operations. For more information, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring).
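
A minimal sketch of these two settings in an `--s3-settings` JSON document follows. The key names use the `S3Settings` API casing (`CdcMaxBatchInterval` in seconds, `CdcMinFileSize` in kilobytes); the bucket name and the specific values are placeholders, not recommendations.

```python
import json

# Illustrative values: flush a CDC file once 120 seconds have elapsed or
# 64,000 KB of data has accumulated, whichever condition is met first.
s3_settings = {
    "BucketName": "your-bucket-name",   # placeholder
    "CdcMaxBatchInterval": 120,         # seconds
    "CdcMinFileSize": 64000,            # kilobytes
}

# Rendered for the AWS CLI's --s3-settings option:
print(f"--s3-settings '{json.dumps(s3_settings)}'")
```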

**Topics**
+ [Prerequisites for using Amazon S3 as a target](#CHAP_Target.S3.Prerequisites)
+ [Limitations to using Amazon S3 as a target](#CHAP_Target.S3.Limitations)
+ [Security](#CHAP_Target.S3.Security)
+ [Using Apache Parquet to store Amazon S3 objects](#CHAP_Target.S3.Parquet)
+ [Amazon S3 object tagging](#CHAP_Target.S3.Tagging)
+ [Creating AWS KMS keys to encrypt Amazon S3 target objects](#CHAP_Target.S3.KMSKeys)
+ [Using date-based folder partitioning](#CHAP_Target.S3.DatePartitioning)
+ [Parallel load of partitioned sources when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.ParallelLoad)
+ [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring)
+ [Using AWS Glue Data Catalog with an Amazon S3 target for AWS DMS](#CHAP_Target.S3.GlueCatalog)
+ [Using data encryption, parquet files, and CDC on your Amazon S3 target](#CHAP_Target.S3.EndpointSettings)
+ [Indicating source DB operations in migrated S3 data](#CHAP_Target.S3.Configuring.InsertOps)
+ [Target data types for S3 Parquet](#CHAP_Target.S3.DataTypes)

## Prerequisites for using Amazon S3 as a target
<a name="CHAP_Target.S3.Prerequisites"></a>

Before using Amazon S3 as a target, check that the following are true: 
+ The S3 bucket that you're using as a target is in the same AWS Region as the DMS replication instance you are using to migrate your data.
+ The AWS account that you use for the migration has an IAM role with write and delete access to the S3 bucket you are using as a target.
+ This role has tagging access so you can tag any S3 objects written to the target bucket.
+ The IAM role has AWS DMS (`dms.amazonaws.com`) added as a *trusted entity*. 
+ For AWS DMS version 3.4.7 and higher, DMS must access the S3 bucket through a VPC endpoint or a public route. For information about VPC endpoints, see [Configuring VPC endpoints for AWS DMS](CHAP_VPC_Endpoints.md).

To set up this account access, ensure that the role assigned to the user account used to create the migration task has the following set of permissions.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:PutObjectTagging"
            ],
            "Resource": [
                "arn:aws:s3:::buckettest2/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::buckettest2"
            ]
        }
    ]
}
```

------

For prerequisites for using validation with S3 as a target, see [S3 target validation prerequisites](CHAP_Validating_S3.md#CHAP_Validating_S3_prerequisites).

## Limitations to using Amazon S3 as a target
<a name="CHAP_Target.S3.Limitations"></a>

The following limitations apply when using Amazon S3 as a target:
+ Don’t enable versioning for S3. If you need S3 versioning, use lifecycle policies to actively delete old versions. Otherwise, you might encounter endpoint test connection failures because of an S3 `list-object` call timeout. To create a lifecycle policy for an S3 bucket, see [ Managing your storage lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html). To delete a version of an S3 object, see [ Deleting object versions from a versioning-enabled bucket](https://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingObjectVersions.html).
+ Access to an S3 bucket through a gateway VPC endpoint is supported in versions 3.4.7 and higher.
+ The following data definition language (DDL) commands are supported for change data capture (CDC): Truncate Table, Drop Table, Create Table, Rename Table, Add Column, Drop Column, Rename Column, and Change Column Data Type. Note that when a column is added, dropped, or renamed on the source database, no ALTER statement is recorded in the target S3 bucket, and AWS DMS does not alter previously created records to match the new structure. After the change, AWS DMS creates any new records using the new table structure.
**Note**  
A truncate DDL operation removes all files and corresponding table folders from an S3 bucket. You can use task settings to disable that behavior and configure the way DMS handles DDL behavior during change data capture (CDC). For more information, see [Task settings for change processing DDL handling](CHAP_Tasks.CustomizingTasks.TaskSettings.DDLHandling.md).
+ Full LOB mode is not supported.
+ Changes to the source table structure during full load are not supported. Changes to data are supported during full load.
+ Multiple tasks that replicate data from the same source table to the same target S3 endpoint bucket result in those tasks writing to the same file. We recommend that you specify different target endpoints (buckets) if your data source is from the same table.
+ `BatchApply` is not supported for an S3 endpoint. Using Batch Apply (for example, the `BatchApplyEnabled` target metadata task setting) for an S3 target might result in loss of data.
+ You can't use `DatePartitionEnabled` or `addColumnName` together with `PreserveTransactions` or `CdcPath`.
+ AWS DMS doesn't support renaming multiple source tables to the same target folder using transformation rules.
+ If there is intensive writing to the source table during the full load phase, DMS might write duplicate records to the S3 bucket or to the cached changes.
+ If you configure the task with a `TargetTablePrepMode` of `DO_NOTHING`, DMS may write duplicate records to the S3 bucket if the task stops and resumes abruptly during the full load phase.
+ If you configure the target endpoint with a `PreserveTransactions` setting of `true`, reloading a table doesn't clear previously generated CDC files. For more information, see [Capturing data changes (CDC) including transaction order on the S3 target](#CHAP_Target.S3.EndpointSettings.CdcPath).

For limitations for using validation with S3 as a target, see [Limitations for using S3 target validation](CHAP_Validating_S3.md#CHAP_Validating_S3_limitations).

## Security
<a name="CHAP_Target.S3.Security"></a>

To use Amazon S3 as a target, the account used for the migration must have write and delete access to the Amazon S3 bucket that is used as the target. Specify the Amazon Resource Name (ARN) of an IAM role that has the permissions required to access Amazon S3. 

AWS DMS supports a set of predefined grants for Amazon S3, known as canned access control lists (ACLs). Each canned ACL has a set of grantees and permissions that you can use to set permissions for the Amazon S3 bucket. You can specify a canned ACL using the `cannedAclForObjects` connection string attribute for your S3 target endpoint. For more information about using the extra connection attribute `cannedAclForObjects`, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring). For more information about Amazon S3 canned ACLs, see [Canned ACL](http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl).

The IAM role that you use for the migration must be able to perform the `s3:PutObjectAcl` API operation.

## Using Apache Parquet to store Amazon S3 objects
<a name="CHAP_Target.S3.Parquet"></a>

The comma-separated value (.csv) format is the default storage format for Amazon S3 target objects. For more compact storage and faster queries, you can instead use Apache Parquet (.parquet) as the storage format.

Apache Parquet is an open-source file storage format originally designed for Hadoop. For more information on Apache Parquet, see [https://parquet.apache.org/](https://parquet.apache.org/).

To set .parquet as the storage format for your migrated S3 target objects, you can use the following mechanisms:
+ Endpoint settings that you provide as parameters of a JSON object when you create the endpoint using the AWS CLI or the API for AWS DMS. For more information, see [Using data encryption, parquet files, and CDC on your Amazon S3 target](#CHAP_Target.S3.EndpointSettings).
+ Extra connection attributes that you provide as a semicolon-separated list when you create the endpoint. For more information, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring).
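
As a hedged sketch, the endpoint-settings route might look like the following `--s3-settings` JSON. `DataFormat` and `ParquetVersion` are `S3Settings` fields; the specific values chosen here are illustrative, not required.

```python
import json

# Settings sketch: store target objects as .parquet instead of the
# default .csv. "parquet-2-0" allows more efficient column encodings.
parquet_settings = {
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "EnableStatistics": True,   # write Parquet column statistics
}

print(f"--s3-settings '{json.dumps(parquet_settings)}'")
```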

## Amazon S3 object tagging
<a name="CHAP_Target.S3.Tagging"></a>

You can tag Amazon S3 objects that a replication instance creates by specifying appropriate JSON objects as part of task-table mapping rules. For more information about requirements and options for S3 object tagging, including valid tag names, see [Object tagging](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) in the *Amazon Simple Storage Service User Guide*. For more information about table mapping using JSON, see [Specifying table selection and transformations rules using JSON](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.md).

You tag S3 objects created for specified tables and schemas by using one or more JSON objects of the `selection` rule type. You then follow this `selection` object (or objects) by one or more JSON objects of the `post-processing` rule type with `add-tag` action. These post-processing rules identify the S3 objects that you want to tag and specify the names and values of the tags that you want to add to these S3 objects.

You can find the parameters to specify in JSON objects of the `post-processing` rule type in the following table.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html)

When you specify multiple `post-processing` rule types to tag a selection of S3 objects, each S3 object is tagged using only one `tag-set` object from one post-processing rule. The particular tag set used to tag a given S3 object is the one from the post-processing rule whose associated object locator best matches that S3 object. 

For example, suppose that two post-processing rules identify the same S3 object. Suppose also that the object locator from one rule uses wildcards and the object locator from the other rule uses an exact match to identify the S3 object (without wildcards). In this case, the tag set associated with the post-processing rule with the exact match is used to tag the S3 object. If multiple post-processing rules match a given S3 object equally well, the tag set associated with the first such post-processing rule is used to tag the object.
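
The "most specific locator wins, first rule breaks ties" behavior can be sketched as follows. This is an illustration of the selection logic described above, not DMS's actual implementation; treating `%` as a multi-character wildcard and counting exact-match locator fields as specificity are assumptions made for the sketch.

```python
from fnmatch import fnmatchcase

def locator_matches(pattern: str, name: str) -> bool:
    # '%' in an object-locator value acts as a multi-character wildcard.
    return fnmatchcase(name, pattern.replace("%", "*"))

def pick_tag_set(rules, schema, table):
    """Among matching post-processing rules, prefer exact-match locators
    over wildcard ones; ties go to the first rule listed."""
    def specificity(rule):
        loc = rule["object-locator"]
        return ("%" not in loc["schema-name"]) + ("%" not in loc["table-name"])
    matching = [
        r for r in rules
        if locator_matches(r["object-locator"]["schema-name"], schema)
        and locator_matches(r["object-locator"]["table-name"], table)
    ]
    # max() is stable, so the first rule wins when specificities are equal.
    return max(matching, key=specificity, default=None)

wildcard = {"object-locator": {"schema-name": "%", "table-name": "%"},
            "tag-set": [{"key": "dw-schema-name", "value": "${schema-name}"}]}
exact = {"object-locator": {"schema-name": "aat", "table-name": "ITEM"},
         "tag-set": [{"key": "tag_1", "value": "value_1"}]}

print(pick_tag_set([wildcard, exact], "aat", "ITEM") is exact)   # True
```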

**Example Adding static tags to an S3 object created for a single table and schema**  
The following selection and post-processing rules add three tags (`tag_1`, `tag_2`, and `tag_3` with corresponding static values `value_1`, `value_2`, and `value_3`) to a created S3 object. This S3 object corresponds to a single table in the source named `STOCK` with a schema named `aat2`.  

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "5",
            "rule-name": "5",
            "object-locator": {
                "schema-name": "aat2",
                "table-name": "STOCK"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "post-processing",
            "rule-id": "41",
            "rule-name": "41",
            "rule-action": "add-tag",
            "object-locator": {
                "schema-name": "aat2",
                "table-name": "STOCK"
            },
            "tag-set": [
              {
                "key": "tag_1",
                "value": "value_1"
              },
              {
                "key": "tag_2",
                "value": "value_2"
              },
              {
                "key": "tag_3",
                "value": "value_3"
              }                                     
           ]
        }
    ]
}
```

**Example Adding static and dynamic tags to S3 objects created for multiple tables and schemas**  
The following example has one selection and two post-processing rules, where input from the source includes all tables and all of their schemas.  

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "%",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "post-processing",
            "rule-id": "21",
            "rule-name": "21",
            "rule-action": "add-tag",
            "object-locator": {
                "schema-name": "%",
                "table-name": "%"
            },
            "tag-set": [
              { 
                "key": "dw-schema-name",
                "value":"${schema-name}"
              },
              {
                "key": "dw-schema-table",
                "value": "my_prefix_${table-name}"
              }
            ]
        },
        {
            "rule-type": "post-processing",
            "rule-id": "41",
            "rule-name": "41",
            "rule-action": "add-tag",
            "object-locator": {
                "schema-name": "aat",
                "table-name": "ITEM"
            },
            "tag-set": [
              {
                "key": "tag_1",
                "value": "value_1"
              },
              {
                "key": "tag_2",
                "value": "value_2"
              }
            ]
        }
    ]
}
```
The first post-processing rule adds two tags (`dw-schema-name` and `dw-schema-table`) with corresponding dynamic values (`${schema-name}` and `my_prefix_${table-name}`) to almost all S3 objects created in the target. The exception is the S3 object identified and tagged with the second post-processing rule. Thus, each target S3 object identified by the wildcard object locator is created with tags that identify the schema and table to which it corresponds in the source.  
The second post-processing rule adds `tag_1` and `tag_2` with corresponding static values `value_1` and `value_2` to a created S3 object that is identified by an exact-match object locator. This created S3 object thus corresponds to the single table in the source named `ITEM` with a schema named `aat`. Because of the exact match, these tags replace any tags on this object added from the first post-processing rule, which matches S3 objects by wildcard only.

**Example Adding both dynamic tag names and values to S3 objects**  
The following example has two selection rules and one post-processing rule. Here, input from the source includes just the `ITEM` table in either the `retail` or `wholesale` schema.  

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "retail",
                "table-name": "ITEM"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "wholesale",
                "table-name": "ITEM"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "post-processing",
            "rule-id": "21",
            "rule-name": "21",
            "rule-action": "add-tag",
            "object-locator": {
                "schema-name": "%",
                "table-name": "ITEM"
            },
            "tag-set": [
              { 
                "key": "dw-schema-name",
                "value":"${schema-name}"
              },
              {
                "key": "dw-schema-table",
                "value": "my_prefix_ITEM"
              },
              {
                "key": "${schema-name}_ITEM_tag_1",
                "value": "value_1"
              },
              {
                "key": "${schema-name}_ITEM_tag_2",
                "value": "value_2"
              }
            ]
        }
    ]
}
```
The tag set for the post-processing rule adds two tags (`dw-schema-name` and `dw-schema-table`) to all S3 objects created for the `ITEM` table in the target. The first tag has the dynamic value `"${schema-name}"` and the second tag has a static value, `"my_prefix_ITEM"`. Thus, each target S3 object is created with tags that identify the schema and table to which it corresponds in the source.   
In addition, the tag set adds two additional tags with dynamic names (`${schema-name}_ITEM_tag_1` and `"${schema-name}_ITEM_tag_2"`). These have the corresponding static values `value_1` and `value_2`. Thus, these tags are each named for the current schema, `retail` or `wholesale`. You can't create a duplicate dynamic tag name in this object, because each object is created for a single unique schema name. The schema name is used to create an otherwise unique tag name.
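
The `${schema-name}` and `${table-name}` substitution described above can be sketched in Python. This is an illustration of the template behavior only; the substitution mechanics inside DMS are not specified here.

```python
def render_tags(tag_set, schema, table):
    """Substitute ${schema-name} and ${table-name} in both tag keys and
    tag values (a sketch of the dynamic-tag behavior described above)."""
    def fill(s):
        return s.replace("${schema-name}", schema).replace("${table-name}", table)
    return {fill(t["key"]): fill(t["value"]) for t in tag_set}

tag_set = [
    {"key": "dw-schema-name", "value": "${schema-name}"},
    {"key": "${schema-name}_ITEM_tag_1", "value": "value_1"},
]
print(render_tags(tag_set, "retail", "ITEM"))
# {'dw-schema-name': 'retail', 'retail_ITEM_tag_1': 'value_1'}
```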

## Creating AWS KMS keys to encrypt Amazon S3 target objects
<a name="CHAP_Target.S3.KMSKeys"></a>

You can create and use custom AWS KMS keys to encrypt your Amazon S3 target objects. After you create a KMS key, you can use it to encrypt objects using one of the following approaches when you create the S3 target endpoint:
+ Use the following options for S3 target objects (with the default .csv file storage format) when you run the `create-endpoint` command using the AWS CLI.

  ```
  --s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", 
  "CsvRowDelimiter": "\n", "CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", 
  "BucketName": "your-bucket-name", "EncryptionMode": "SSE_KMS", 
  "ServerSideEncryptionKmsKeyId": "your-KMS-key-ARN"}'
  ```

  Here, `your-KMS-key-ARN` is the Amazon Resource Name (ARN) for your KMS key. Your IAM role must have permission to use this key. For more information, see [Using data encryption, parquet files, and CDC on your Amazon S3 target](#CHAP_Target.S3.EndpointSettings).
+ Set the extra connection attribute `encryptionMode` to the value `SSE_KMS` and the extra connection attribute `serverSideEncryptionKmsKeyId` to the ARN for your KMS key. For more information, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring).

To encrypt Amazon S3 target objects using a KMS key, you need an IAM role that has permissions to access the Amazon S3 bucket. This IAM role is then accessed in a policy (a key policy) attached to the encryption key that you create. You can do this in your IAM console by creating the following:
+ A policy with permissions to access the Amazon S3 bucket.
+ An IAM role with this policy.
+ A KMS encryption key with a key policy that references this role.

The following procedures describe how to do this.

**To create an IAM policy with permissions to access the Amazon S3 bucket**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies**. The **Policies** page opens.

1. Choose **Create policy**. The **Create policy** page opens.

1. Choose **Service** and choose **S3**. A list of action permissions appears.

1. Choose **Expand all** to expand the list and choose the following permissions at a minimum:
   + **ListBucket**
   + **PutObject**
   + **DeleteObject**

   Choose any other permissions you need, and then choose **Collapse all** to collapse the list.

1. Choose **Resources** to specify the resources that you want to access. At a minimum, choose **All resources** to provide general Amazon S3 resource access.

1. Add any other conditions or permissions you need, then choose **Review policy**. Check your results on the **Review policy** page.

1. If the settings are what you need, enter a name for the policy (for example, `DMS-S3-endpoint-access`), and any description, then choose **Create policy**. The **Policies** page opens with a message indicating that your policy has been created.

1. Search for and choose the policy name in the **Policies** list. The **Summary** page appears displaying JSON for the policy similar to the following.

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:ListBucket",
                   "s3:DeleteObject"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

You have now created the new policy to access Amazon S3 resources for encryption with a specified name, for example `DMS-S3-endpoint-access`.

**To create an IAM role with this policy**

1. On your IAM console, choose **Roles** in the navigation pane. The **Roles** detail page opens.

1. Choose **Create role**. The **Create role** page opens.

1. With AWS service selected as the trusted entity, choose **DMS** as the service to use the IAM role.

1. Choose **Next: Permissions**. The **Attach permissions policies** view appears in the **Create role** page.

1. Find and select the IAM policy for the IAM role that you created in the previous procedure (`DMS-S3-endpoint-access`).

1. Choose **Next: Tags**. The **Add tags** view appears in the **Create role** page. Here, you can add any tags you want.

1. Choose **Next: Review**. The **Review** view appears in the **Create role** page. Here, you can verify the results.

1. If the settings are what you need, enter a name for the role (required, for example, `DMS-S3-endpoint-access-role`), and any additional description, then choose **Create role**. The **Roles** detail page opens with a message indicating that your role has been created.

You have now created the new role to access Amazon S3 resources for encryption with a specified name, for example, `DMS-S3-endpoint-access-role`.

**To create a KMS encryption key with a key policy that references your IAM role**
**Note**  
For more information about how AWS DMS works with AWS KMS encryption keys, see [Setting an encryption key and specifying AWS KMS permissions](CHAP_Security.md#CHAP_Security.EncryptionKey).

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. In the navigation pane, choose **Customer managed keys**.

1. Choose **Create key**. The **Configure key** page opens.

1. For **Key type**, choose **Symmetric**.
**Note**  
When you create this key, you can create only a symmetric key, because AWS services such as Amazon S3 work only with symmetric encryption keys.

1. Choose **Advanced Options**. For **Key material origin**, make sure that **KMS** is chosen, then choose **Next**. The **Add labels** page opens.

1. For **Create alias and description**, enter an alias for the key (for example, `DMS-S3-endpoint-encryption-key`) and any additional description.

1. For **Tags**, add any tags that you want to help identify the key and track its usage, then choose **Next**. The **Define key administrative permissions** page opens showing a list of users and roles that you can choose from.

1. Add the users and roles that you want to manage the key. Make sure that these users and roles have the required permissions to manage the key. 

1. For **Key deletion**, choose whether key administrators can delete the key, then choose **Next**. The **Define key usage permissions** page opens showing an additional list of users and roles that you can choose from.

1. For **This account**, choose the available users that you want to perform cryptographic operations on Amazon S3 targets. Also choose the role that you previously created in **Roles** to enable access to encrypt Amazon S3 target objects (for example, `DMS-S3-endpoint-access-role`).

1. If you want to add other accounts not listed to have this same access, for **Other AWS accounts**, choose **Add another AWS account**, then choose **Next**. The **Review and edit key policy** page opens, showing the JSON for the key policy that you can review and edit by typing into the existing JSON. Here, you can see where the key policy references the role and users (for example, `Admin` and `User1`) that you chose in the previous step. You can also see the different key actions permitted for the different principals (users and roles), as shown in the example following.

------
#### [ JSON ]


   ```
   {
       "Id": "key-consolepolicy-3",
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "Enable IAM User Permissions",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:root"
                   ]
               },
               "Action": "kms:*",
               "Resource": "*"
           },
           {
               "Sid": "Allow access for Key Administrators",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/Admin"
                   ]
               },
               "Action": [
                   "kms:Create*",
                   "kms:Describe*",
                   "kms:Enable*",
                   "kms:List*",
                   "kms:Put*",
                   "kms:Update*",
                   "kms:Revoke*",
                   "kms:Disable*",
                   "kms:Get*",
                   "kms:Delete*",
                   "kms:TagResource",
                   "kms:UntagResource",
                   "kms:ScheduleKeyDeletion",
                   "kms:CancelKeyDeletion"
               ],
               "Resource": "*"
           },
           {
               "Sid": "Allow use of the key",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/DMS-S3-endpoint-access-role",
                       "arn:aws:iam::111122223333:role/Admin",
                       "arn:aws:iam::111122223333:role/User1"
                   ]
               },
               "Action": [
                   "kms:Encrypt",
                   "kms:Decrypt",
                   "kms:ReEncrypt*",
                   "kms:GenerateDataKey*",
                   "kms:DescribeKey"
               ],
               "Resource": "*"
           },
           {
               "Sid": "Allow attachment of persistent resources",
               "Effect": "Allow",
               "Principal": {
                   "AWS": [
                       "arn:aws:iam::111122223333:role/DMS-S3-endpoint-access-role",
                       "arn:aws:iam::111122223333:role/Admin",
                       "arn:aws:iam::111122223333:role/User1"
                   ]
               },
               "Action": [
                   "kms:CreateGrant",
                   "kms:ListGrants",
                   "kms:RevokeGrant"
               ],
               "Resource": "*",
               "Condition": {
                   "Bool": {
                       "kms:GrantIsForAWSResource": true
                   }
               }
           }
       ]
   }
   ```

------

1. Choose **Finish**. The **Encryption keys** page opens with a message indicating that your KMS key has been created.

You have now created a new KMS key with a specified alias (for example, `DMS-S3-endpoint-encryption-key`). This key enables AWS DMS to encrypt Amazon S3 target objects.

## Using date-based folder partitioning
<a name="CHAP_Target.S3.DatePartitioning"></a>

AWS DMS supports S3 folder partitions based on a transaction commit date when you use Amazon S3 as your target endpoint. Using date-based folder partitioning, you can write data from a single source table to a time-hierarchy folder structure in an S3 bucket. By partitioning folders when creating an S3 target endpoint, you can do the following:
+ Better manage your S3 objects
+ Limit the size of each S3 folder
+ Optimize data lake queries or other subsequent operations

You can enable date-based folder partitioning when you create an S3 target endpoint. You can enable it when you either migrate existing data and replicate ongoing changes (full load and CDC), or replicate data changes only (CDC only). When you migrate existing data and replicate ongoing changes, only ongoing changes will be partitioned. Use the following target endpoint settings:
+ `DatePartitionEnabled` – Specifies partitioning based on dates. Set this Boolean option to `true` to partition S3 bucket folders based on transaction commit dates. 

  You can't use this setting with `PreserveTransactions` or `CdcPath`.

  The default value is `false`. 
+ `DatePartitionSequence` – Identifies the sequence of the date format to use during folder partitioning. Set this ENUM option to `YYYYMMDD`, `YYYYMMDDHH`, `YYYYMM`, `MMYYYYDD`, or `DDMMYYYY`. The default value is `YYYYMMDD`. Use this setting when `DatePartitionEnabled` is set to `true`.
+ `DatePartitionDelimiter` – Specifies a date separation delimiter to use during folder partitioning. Set this ENUM option to `SLASH`, `DASH`, `UNDERSCORE`, or `NONE`. The default value is `SLASH`. Use this setting when `DatePartitionEnabled` is set to `true`.
+ `DatePartitionTimezone` – When creating an S3 target endpoint, set `DatePartitionTimezone` to convert the current UTC time into a specified time zone. The conversion occurs when a date partition folder is created and a CDC filename is generated. The time zone format is Area/Location. Use this parameter when `DatePartitionEnabled` is set to `true`, as shown in the following example:

  ```
  s3-settings='{"DatePartitionEnabled": true, "DatePartitionSequence": "YYYYMMDDHH", "DatePartitionDelimiter": "SLASH", "DatePartitionTimezone":"Asia/Seoul", "BucketName": "dms-nattarat-test"}'
  ```

The following example shows how to enable date-based folder partitioning, with default values for the date partition sequence and the delimiter. It uses the `--s3-settings '{json-settings}'` option of the AWS CLI `create-endpoint` command. 

```
   --s3-settings '{"DatePartitionEnabled": true,"DatePartitionSequence": "YYYYMMDD","DatePartitionDelimiter": "SLASH"}'
```
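To make the interaction between these settings concrete, the following is a minimal Python sketch of how a date-based folder prefix could be derived from a transaction commit time, given `DatePartitionSequence`, `DatePartitionDelimiter`, and `DatePartitionTimezone` values. This is an illustration of the documented behavior, not AWS DMS source code; the helper name `date_partition_prefix` is invented for this example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical helper illustrating how the date-partition settings combine
# into an S3 folder prefix. Not DMS source code.
_DELIMITERS = {"SLASH": "/", "DASH": "-", "UNDERSCORE": "_", "NONE": ""}
_SEQUENCES = {
    "YYYYMMDD": ["%Y", "%m", "%d"],
    "YYYYMMDDHH": ["%Y", "%m", "%d", "%H"],
    "YYYYMM": ["%Y", "%m"],
    "MMYYYYDD": ["%m", "%Y", "%d"],
    "DDMMYYYY": ["%d", "%m", "%Y"],
}

def date_partition_prefix(commit_utc, sequence="YYYYMMDD",
                          delimiter="SLASH", tz=None):
    # Convert the UTC commit time into the configured time zone, if any,
    # before formatting the folder components.
    if tz:
        commit_utc = commit_utc.astimezone(ZoneInfo(tz))
    parts = [commit_utc.strftime(fmt) for fmt in _SEQUENCES[sequence]]
    return _DELIMITERS[delimiter].join(parts)

commit = datetime(2024, 5, 1, 23, 30, tzinfo=timezone.utc)
print(date_partition_prefix(commit))  # 2024/05/01
# Asia/Seoul is UTC+9, so 23:30 UTC rolls into the next local day:
print(date_partition_prefix(commit, "YYYYMMDDHH", "SLASH", "Asia/Seoul"))
# 2024/05/02/08
```

Note how the time-zone conversion can move a change into a different date folder than its UTC commit date, which is the main effect of `DatePartitionTimezone`.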

## Parallel load of partitioned sources when using Amazon S3 as a target for AWS DMS
<a name="CHAP_Target.S3.ParallelLoad"></a>

You can configure a parallel full load of partitioned data sources to Amazon S3 targets. This approach improves the load times for migrating partitioned data from supported source database engines to the S3 target. To improve the load times of partitioned source data, you create S3 target subfolders mapped to the partitions of every table in the source database. These partition-bound subfolders allow AWS DMS to run parallel processes to populate each subfolder on the target.

To configure a parallel full load of an S3 target, S3 supports three `parallel-load` rule types for the `table-settings` rule of table mapping:
+ `partitions-auto`
+ `partitions-list`
+ `ranges`

For more information on these parallel-load rule types, see [Table and collection settings rules and operations](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Tablesettings.md).

For the `partitions-auto` and `partitions-list` rule types, AWS DMS uses each partition name from the source endpoint to identify the target subfolder structure, as follows.

```
bucket_name/bucket_folder/database_schema_name/table_name/partition_name/LOADseq_num.csv
```

Here, the subfolder path where data is migrated and stored on the S3 target includes an additional `partition_name` subfolder that corresponds to a source partition with the same name. This `partition_name` subfolder then stores one or more `LOADseq_num.csv` files containing data migrated from the specified source partition. Here, `seq_num` is the sequence number postfix on the .csv file name, such as `00000001` in the file named `LOAD00000001.csv`.
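As a quick illustration of this layout, the following Python sketch builds an object key in the `partitions-auto`/`partitions-list` structure shown above. The function name and sample values are invented for this example; only the path shape comes from the documentation.

```python
def partition_object_key(bucket_folder, schema, table, partition, seq_num):
    """Illustrative key builder for the partition-bound subfolder layout.
    seq_num becomes the zero-padded postfix of the LOAD .csv file name."""
    return (f"{bucket_folder}/{schema}/{table}/{partition}/"
            f"LOAD{seq_num:08d}.csv")

print(partition_object_key("dms-data", "sales_db", "orders", "p2024q1", 1))
# dms-data/sales_db/orders/p2024q1/LOAD00000001.csv
```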

However, some database engines, such as MongoDB and DocumentDB, don't have the concept of partitions. For these database engines, AWS DMS adds the running source segment index as a prefix to the target .csv file name, as follows.

```
.../database_schema_name/table_name/SEGMENT1_LOAD00000001.csv
.../database_schema_name/table_name/SEGMENT1_LOAD00000002.csv
...
.../database_schema_name/table_name/SEGMENT2_LOAD00000009.csv
.../database_schema_name/table_name/SEGMENT3_LOAD0000000A.csv
```

Here, the files `SEGMENT1_LOAD00000001.csv` and `SEGMENT1_LOAD00000002.csv` are named with the same running source segment index prefix, `SEGMENT1`. They're named this way because the migrated source data for these two .csv files is associated with the same running source segment index. On the other hand, the migrated data stored in each of the target `SEGMENT2_LOAD00000009.csv` and `SEGMENT3_LOAD0000000A.csv` files is associated with different running source segment indexes. Each file has its file name prefixed with the name of its running segment index, `SEGMENT2` and `SEGMENT3`.

For the `ranges` parallel-load type, you define the column names and column values using the `columns` and `boundaries` settings of the `table-settings` rules. With these rules, you can specify partitions corresponding to segment names, as follows.

```
"parallel-load": {
    "type": "ranges",
    "columns": [
         "region",
         "sale"
    ],
    "boundaries": [
          [
               "NORTH",
               "1000"
          ],
          [
               "WEST",
               "3000"
          ]
    ],
    "segment-names": [
          "custom_segment1",
          "custom_segment2",
          "custom_segment3"
    ]
}
```

Here, the `segment-names` setting defines names for three partitions to migrate data in parallel on the S3 target. The migrated data is parallel-loaded and stored in .csv files under the partition subfolders in order, as follows.

```
.../database_schema_name/table_name/custom_segment1/LOAD[00000001...].csv
.../database_schema_name/table_name/custom_segment2/LOAD[00000001...].csv
.../database_schema_name/table_name/custom_segment3/LOAD[00000001...].csv
```

Here, AWS DMS stores a series of .csv files in each of the three partition subfolders. The series of .csv files in each partition subfolder is named incrementally starting from `LOAD00000001.csv` until all the data is migrated.
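To illustrate how `boundaries` and `segment-names` can work together, the following Python sketch maps a row to a segment: the first boundary the row's column values do not exceed wins, and rows beyond the last boundary fall into the final segment. The exact comparison semantics DMS applies are an assumption here; this is only a conceptual model of the `ranges` example above.

```python
def assign_segment(row, boundaries, segment_names):
    # Hypothetical model of ranges-based segmentation: each boundary is an
    # upper bound for one segment, and the last segment is open-ended.
    for boundary, name in zip(boundaries, segment_names):
        if list(row) <= boundary:
            return name
    return segment_names[-1]

boundaries = [["NORTH", "1000"], ["WEST", "3000"]]
names = ["custom_segment1", "custom_segment2", "custom_segment3"]
print(assign_segment(("EAST", "0500"), boundaries, names))   # custom_segment1
print(assign_segment(("SOUTH", "2000"), boundaries, names))  # custom_segment2
print(assign_segment(("WEST", "9000"), boundaries, names))   # custom_segment3
```

Under this model, two boundaries partition the data into three segments, which is why the example defines three `segment-names`.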

In some cases, you might not explicitly name partition subfolders for a `ranges` parallel-load type using the `segment-names` setting. In these cases, AWS DMS applies the default of creating each series of .csv files under its `table_name` subfolder. Here, AWS DMS prefixes the file names of each series of .csv files with the name of the running source segment index, as follows.

```
.../database_schema_name/table_name/SEGMENT1_LOAD[00000001...].csv
.../database_schema_name/table_name/SEGMENT2_LOAD[00000001...].csv
.../database_schema_name/table_name/SEGMENT3_LOAD[00000001...].csv
...
.../database_schema_name/table_name/SEGMENTZ_LOAD[00000001...].csv
```

## Endpoint settings when using Amazon S3 as a target for AWS DMS
<a name="CHAP_Target.S3.Configuring"></a>

You can use endpoint settings to configure your Amazon S3 target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--s3-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

**Note**  
DMS writes changes to Parquet files based on the commit order from the source database, but when migrating multiple tables, the original transaction order is not preserved due to table-level partitioning. To maintain transaction sequence information, configure the `TimestampColumnName` endpoint setting to include the source commit timestamp for each row, which you can then use in downstream processing to reconstruct the original transaction sequence. Unlike CSV format, which offers the `PreserveTransactions` setting, Parquet files handle transactions differently due to their columnar storage structure, but this approach enables accurate tracking of source commit times, supports post-migration transaction order reconstruction, and allows efficient data processing while maintaining data consistency.

The following table shows the endpoint settings that you can use with Amazon S3 as a target.


| **Option** | **Description** | 
| --- | --- | 
| CsvNullValue |  An optional parameter that specifies how AWS DMS treats null values. While handling the null value, you can use this parameter to pass a user-defined string as null when writing to the target. For example, when target columns are nullable, you can use this option to differentiate between the empty string value and the null value.  Default value: `""` Valid values: any valid string Example: `--s3-settings '{"CsvNullValue": "NULL"}'` With this setting, if the source database column value is null, the column value in the S3 .csv file is `NULL` instead of an empty string.  | 
| AddColumnName |  An optional parameter that when set to `true` or `y` you can use to add column name information to the .csv output file. You can't use this parameter with `PreserveTransactions` or `CdcPath`. Default value: `false` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"AddColumnName": true}'`  | 
| AddTrailingPaddingCharacter |  Use the S3 target endpoint setting `AddTrailingPaddingCharacter` to add padding on string data. The default value is `false`. Type: Boolean Example: `--s3-settings '{"AddTrailingPaddingCharacter": true}'`  | 
| BucketFolder |  An optional parameter to set a folder name in the S3 bucket. If provided, target objects are created as .csv or .parquet files in the path `BucketFolder/schema_name/table_name/`. If this parameter isn't specified, then the path used is `schema_name/table_name/`.  Example: `--s3-settings '{"BucketFolder": "testFolder"}'`  | 
| BucketName |  The name of the S3 bucket where S3 target objects are created as .csv or .parquet files. Example: `--s3-settings '{"BucketName": "buckettest"}'`  | 
| CannedAclForObjects |  A value that enables AWS DMS to specify a predefined (canned) access control list for objects created in the S3 bucket as .csv or .parquet files. For more information about Amazon S3 canned ACLs, see [Canned ACL](http://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) in the *Amazon S3 Developer Guide.* Default value: NONE Valid values for this attribute are: NONE; PRIVATE; PUBLIC\_READ; PUBLIC\_READ\_WRITE; AUTHENTICATED\_READ; AWS\_EXEC\_READ; BUCKET\_OWNER\_READ; BUCKET\_OWNER\_FULL\_CONTROL. Example: `--s3-settings '{"CannedAclForObjects": "PUBLIC_READ"}'`  | 
| CdcInsertsOnly |  An optional parameter during a change data capture (CDC) load to write only INSERT operations to the comma-separated value (.csv) or columnar storage (.parquet) output files. By default (the `false` setting), the first field in a .csv or .parquet record contains the letter I (INSERT), U (UPDATE), or D (DELETE). This letter indicates whether the row was inserted, updated, or deleted at the source database for a CDC load to the target. If `cdcInsertsOnly` is set to `true` or `y`, only INSERTs from the source database are migrated to the .csv or .parquet file. For .csv format only, how these INSERTS are recorded depends on the value of `IncludeOpForFullLoad`. If `IncludeOpForFullLoad` is set to `true`, the first field of every CDC record is set to I to indicate the INSERT operation at the source. If `IncludeOpForFullLoad` is set to `false`, every CDC record is written without a first field to indicate the INSERT operation at the source. For more information about how these parameters work together, see [Indicating source DB operations in migrated S3 data](#CHAP_Target.S3.Configuring.InsertOps). Default value: `false` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"CdcInsertsOnly": true}'`  | 
| CdcInsertsAndUpdates |  Enables a change data capture (CDC) load to write INSERT and UPDATE operations to .csv or .parquet (columnar storage) output files. The default setting is `false`, but when `cdcInsertsAndUpdates` is set to `true` or `y`, INSERTs and UPDATEs from the source database are migrated to the .csv or .parquet file.  For .csv file format only, how these INSERTs and UPDATEs are recorded depends on the value of the `includeOpForFullLoad` parameter. If `includeOpForFullLoad` is set to `true`, the first field of every CDC record is set to either `I` or `U` to indicate INSERT and UPDATE operations at the source. But if `includeOpForFullLoad` is set to `false`, CDC records are written without an indication of INSERT or UPDATE operations at the source.   For more information about how these parameters work together, see [Indicating source DB operations in migrated S3 data](#CHAP_Target.S3.Configuring.InsertOps).  `CdcInsertsOnly` and `cdcInsertsAndUpdates` can't both be set to true for the same endpoint. Set either `cdcInsertsOnly` or `cdcInsertsAndUpdates` to `true` for the same endpoint, but not both.   Default value: `false` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"CdcInsertsAndUpdates": true}'`  | 
|  `CdcPath`  |  Specifies the folder path of CDC files. For an S3 source, this setting is required if a task captures change data; otherwise, it's optional. If `CdcPath` is set, DMS reads CDC files from this path and replicates the data changes to the target endpoint. For an S3 target if you set `PreserveTransactions` to true, DMS verifies that you have set this parameter to a folder path on your S3 target where DMS can save the transaction order for the CDC load. DMS creates this CDC folder path in either your S3 target working directory or the S3 target location specified by `BucketFolder` and `BucketName`. You can't use this parameter with `DatePartitionEnabled` or `AddColumnName`. Type: String For example, if you specify `CdcPath` as `MyChangedData`, and you specify `BucketName` as `MyTargetBucket` but do not specify `BucketFolder`, DMS creates the following CDC folder path: `MyTargetBucket/MyChangedData`.  If you specify the same `CdcPath`, and you specify `BucketName` as `MyTargetBucket` and `BucketFolder` as `MyTargetData`, DMS creates the following CDC folder path: `MyTargetBucket/MyTargetData/MyChangedData`. This setting is supported in AWS DMS versions 3.4.2 and higher. When capturing data changes in transaction order, DMS always stores the row changes in .csv files regardless of the value of the DataFormat S3 setting on the target.   | 
|  `CdcMaxBatchInterval`  |  Maximum interval length condition, defined in seconds, to output a file to Amazon S3. Default Value: 60 seconds When `CdcMaxBatchInterval` is specified and `CdcMinFileSize` is specified, the file write is triggered by whichever parameter condition is met first.  Starting with AWS DMS version 3.5.3, when using PostgreSQL or Aurora PostgreSQL as the source and Amazon S3 with Parquet as the target, the frequency of `confirmed_flush_lsn` updates depends on the amount of data the target endpoint is configured to retain in memory. AWS DMS sends the `confirmed_flush_lsn` back to the source only after the data in memory is written to Amazon S3. If you configure the `CdcMaxBatchInterval` parameter to a higher value, you may observe increased replication slot usage on the source database.   | 
|  `CdcMinFileSize`  |  Minimum file size condition as defined in kilobytes to output a file to Amazon S3. Default Value: 32000 KB When `CdcMinFileSize` is specified and `CdcMaxBatchInterval` is specified, the file write is triggered by whichever parameter condition is met first.  | 
|  `PreserveTransactions`  |  If set to `true`, DMS saves the transaction order for change data capture (CDC) on the Amazon S3 target specified by `CdcPath`. You can't use this parameter with `DatePartitionEnabled` or `AddColumnName`. Type: Boolean When capturing data changes in transaction order, DMS always stores the row changes in .csv files regardless of the value of the DataFormat S3 setting on the target. This setting is supported in AWS DMS versions 3.4.2 and higher.   | 
| IncludeOpForFullLoad |  An optional parameter during a full load to write the INSERT operations to the comma-separated value (.csv) output files only. For full load, records can only be inserted. By default (the `false` setting), there is no information recorded in these output files for a full load to indicate that the rows were inserted at the source database. If `IncludeOpForFullLoad` is set to `true` or `y`, the INSERT is recorded as an I annotation in the first field of the .csv file.  This parameter works together with `CdcInsertsOnly` or `CdcInsertsAndUpdates` for output to .csv files only. For more information about how these parameters work together, see [Indicating source DB operations in migrated S3 data](#CHAP_Target.S3.Configuring.InsertOps).  Default value: `false` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"IncludeOpForFullLoad": true}'`  | 
| CompressionType |  An optional parameter when set to `GZIP` uses GZIP to compress the target .csv files. When this parameter is set to the default, it leaves the files uncompressed. Default value: `NONE` Valid values: `GZIP` or `NONE` Example: `--s3-settings '{"CompressionType": "GZIP"}'`  | 
| CsvDelimiter |  The delimiter used to separate columns in .csv source files. The default is a comma (,). Example: `--s3-settings '{"CsvDelimiter": ","}'`  | 
| CsvRowDelimiter |  The delimiter used to separate rows in the .csv source files. The default is a newline (\n). Example: `--s3-settings '{"CsvRowDelimiter": "\n"}'`  | 
|   `MaxFileSize`   |  A value that specifies the maximum size (in KB) of any .csv file to be created while migrating to an S3 target during full load. Default value: 1,048,576 KB (1 GB) Valid values: 1–1,048,576 Example: `--s3-settings '{"MaxFileSize": 512}'`  | 
| Rfc4180 |  An optional parameter used to set behavior to comply with RFC for data migrated to Amazon S3 using .csv file format only. When this value is set to `true` or `y` using Amazon S3 as a target, if the data has quotation marks, commas, or newline characters in it, AWS DMS encloses the entire column with an additional pair of double quotation marks ("). Every quotation mark within the data is repeated twice. This formatting complies with RFC 4180. Default value: `true` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"Rfc4180": false}'`  | 
| EncryptionMode |  The server-side encryption mode that you want to use to encrypt your .csv or .parquet object files copied to S3. The valid values are `SSE_S3` (S3 server-side encryption) or `SSE_KMS` (KMS key encryption). If you choose `SSE_KMS`, set the `ServerSideEncryptionKmsKeyId` parameter to the Amazon Resource Name (ARN) for the KMS key to be used for encryption.  You can also use the CLI `modify-endpoint` command to change the value of the `EncryptionMode` attribute for an existing endpoint from `SSE_KMS` to `SSE_S3`. But you can’t change the `EncryptionMode` value from `SSE_S3` to `SSE_KMS`.  Default value: `SSE_S3` Valid values: `SSE_S3` or `SSE_KMS` Example: `--s3-settings '{"EncryptionMode": "SSE_S3"}'`  | 
| ServerSideEncryptionKmsKeyId |  If you set `EncryptionMode` to `SSE_KMS`, set this parameter to the Amazon Resource Name (ARN) for the KMS key. You can find this ARN by selecting the key alias in the list of AWS KMS keys created for your account. When you create the key, you must associate specific policies and roles associated with this KMS key. For more information, see [Creating AWS KMS keys to encrypt Amazon S3 target objects](#CHAP_Target.S3.KMSKeys). Example: `--s3-settings '{"ServerSideEncryptionKmsKeyId":"arn:aws:kms:us-east-1:111122223333:key/11a1a1a1-aaaa-9999-abab-2bbbbbb222a2"}'`  | 
| DataFormat |  The output format for the files that AWS DMS uses to create S3 objects. For Amazon S3 targets, AWS DMS supports either .csv or .parquet files. The .parquet files have a binary columnar storage format with efficient compression options and faster query performance. For more information about .parquet files, see [https://parquet.apache.org/](https://parquet.apache.org/). Default value: `csv` Valid values: `csv` or `parquet` Example: `--s3-settings '{"DataFormat": "parquet"}'`  | 
| EncodingType |  The Parquet encoding type. The encoding type options include the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html) Default value: `rle-dictionary` Valid values: `rle-dictionary`, `plain`, or `plain-dictionary` Example: `--s3-settings '{"EncodingType": "plain-dictionary"}'`  | 
| DictPageSizeLimit |  The maximum allowed size, in bytes, for a dictionary page in a .parquet file. If a dictionary page exceeds this value, the page uses plain encoding. Default value: 1,024,000 (1 MB) Valid values: Any valid integer value Example: `--s3-settings '{"DictPageSizeLimit": 2048000}'`  | 
| RowGroupLength |  The number of rows in one row group of a .parquet file. Default value: 10,024 (10 KB) Valid values: Any valid integer Example: `--s3-settings '{"RowGroupLength": 20048}'`  | 
| DataPageSize |  The maximum allowed size, in bytes, for a data page in a .parquet file. Default value: 1,024,000 (1 MB) Valid values: Any valid integer Example: `--s3-settings '{"DataPageSize": 2048000}'`  | 
| ParquetVersion |  The version of the .parquet file format. Default value: `PARQUET_1_0` Valid values: `PARQUET_1_0` or `PARQUET_2_0` Example: `--s3-settings '{"ParquetVersion": "PARQUET_2_0"}'`  | 
| EnableStatistics |  Set to `true` or `y` to enable statistics about .parquet file pages and row groups. Default value: `true` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"EnableStatistics": false}'`  | 
| TimestampColumnName |  An optional parameter to include a timestamp column in the S3 target endpoint data. AWS DMS includes an additional `STRING` column in the .csv or .parquet object files of your migrated data when you set `TimestampColumnName` to a non blank value. For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.  For a CDC load, each row of the timestamp column contains the timestamp for the commit of that row in the source database. The string format for this timestamp column value is `yyyy-MM-dd HH:mm:ss.SSSSSS`. By default, the precision of this value is in microseconds. For a CDC load, the rounding of the precision depends on the commit timestamp supported by DMS for the source database. When the `AddColumnName` parameter is set to `true`, DMS also includes the name for the timestamp column that you set as the non blank value of `TimestampColumnName`. Example: `--s3-settings '{"TimestampColumnName": "TIMESTAMP"}'`  | 
| UseTaskStartTimeForFullLoadTimestamp |  When set to `true`, this parameter uses the task start time as the timestamp column value instead of the time data is written to target. For full load, when `UseTaskStartTimeForFullLoadTimestamp` is set to `true`, each row of the timestamp column contains the task start time. For CDC loads, each row of the timestamp column contains the transaction commit time. When `UseTaskStartTimeForFullLoadTimestamp` is set to `false`, the full load timestamp in the timestamp column increments with the time data arrives at the target. Default value: `false` Valid values: `true`, `false` Example: `--s3-settings '{"UseTaskStartTimeForFullLoadTimestamp": true}'` `UseTaskStartTimeForFullLoadTimestamp: true` helps make the S3 target `TimestampColumnName` for a full load sortable with `TimestampColumnName` for a CDC load.  | 
| ParquetTimestampInMillisecond |  An optional parameter that specifies the precision of any `TIMESTAMP` column values written to an S3 object file in .parquet format. When this attribute is set to `true` or `y`, AWS DMS writes all `TIMESTAMP` columns in a .parquet formatted file with millisecond precision. Otherwise, DMS writes them with microsecond precision. Currently, Amazon Athena and AWS Glue can handle only millisecond precision for `TIMESTAMP` values. Set this attribute to true for .parquet formatted S3 endpoint object files only if you plan to query or process the data with Athena or AWS Glue.    AWS DMS writes any `TIMESTAMP` column values written to an S3 file in .csv format with microsecond precision.   The setting of this attribute has no effect on the string format of the timestamp column value inserted by setting the `TimestampColumnName` attribute.    Default value: `false` Valid values: `true`, `false`, `y`, `n` Example: `--s3-settings '{"ParquetTimestampInMillisecond": true}'`  | 
| GlueCatalogGeneration |  To generate an AWS Glue Data Catalog, set this endpoint setting to `true`. Default value: `false` Valid values: `true`, `false` Example: `--s3-settings '{"GlueCatalogGeneration": true}'` **Note:** Don't use `GlueCatalogGeneration` with `PreserveTransactions` and `CdcPath`.  | 

## Using AWS Glue Data Catalog with an Amazon S3 target for AWS DMS
<a name="CHAP_Target.S3.GlueCatalog"></a>

AWS Glue is a service that provides simple ways to categorize data, and consists of a metadata repository known as the AWS Glue Data Catalog. You can integrate the AWS Glue Data Catalog with your Amazon S3 target endpoint and query Amazon S3 data through other AWS services such as Amazon Athena. Amazon Redshift also works with AWS Glue, but AWS DMS doesn't support that combination as a pre-built option. 

To generate the data catalog, set the `GlueCatalogGeneration` endpoint setting to `true`, as shown in the following AWS CLI example.

```
aws dms create-endpoint --endpoint-identifier s3-target-endpoint 
            --engine-name s3 --endpoint-type target --s3-settings '{"ServiceAccessRoleArn": 
            "your-service-access-ARN", "BucketFolder": "your-bucket-folder", "BucketName": 
            "your-bucket-name", "DataFormat": "parquet", "GlueCatalogGeneration": true}'
```

For a full-load replication task that includes `csv` type data, set `IncludeOpForFullLoad` to `true`.

Don't use `GlueCatalogGeneration` with `PreserveTransactions` and `CdcPath`. The AWS Glue crawler can't reconcile the different schemas of files stored under the specified `CdcPath`.

For Amazon Athena to index your Amazon S3 data, and for you to query your data using standard SQL queries through Amazon Athena, the IAM role attached to the endpoint must have the following policy:

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [ 
        {
            "Effect": "Allow", 
            "Action": [
                "s3:GetBucketLocation", 
                "s3:GetObject",
                "s3:ListBucket", 
                "s3:ListBucketMultipartUploads", 
                "s3:ListMultipartUploadParts", 
                "s3:AbortMultipartUpload" 
            ], 
            "Resource": [
                "arn:aws:s3:::bucket123", 
                "arn:aws:s3:::bucket123/*" 
            ]
        },
        {
            "Effect": "Allow", 
            "Action": [ 
                "glue:CreateDatabase", 
                "glue:GetDatabase", 
                "glue:CreateTable", 
                "glue:DeleteTable", 
                "glue:UpdateTable", 
                "glue:GetTable", 
                "glue:BatchCreatePartition", 
                "glue:CreatePartition", 
                "glue:UpdatePartition", 
                "glue:GetPartition", 
                "glue:GetPartitions", 
                "glue:BatchGetPartition"
            ], 
            "Resource": [
                "arn:aws:glue:*:111122223333:catalog", 
                "arn:aws:glue:*:111122223333:database/*", 
                "arn:aws:glue:*:111122223333:table/*" 
            ]
        }, 
        {
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution", 
                "athena:CreateWorkGroup"
            ],
            "Resource": "arn:aws:athena:*:111122223333:workgroup/glue_catalog_generation_for_task_*"
        }
    ]
}
```

------

**References**
+ For more information about AWS Glue, see [Concepts](https://docs.aws.amazon.com//glue/latest/dg/components-key-concepts.html) in the *AWS Glue Developer Guide*.
+ For more information about the AWS Glue Data Catalog, see [Components](https://docs.aws.amazon.com/glue/latest/dg/components-overview.html) in the *AWS Glue Developer Guide*.

## Using data encryption, parquet files, and CDC on your Amazon S3 target
<a name="CHAP_Target.S3.EndpointSettings"></a>

You can use S3 target endpoint settings to configure the following:
+ A custom KMS key to encrypt your S3 target objects.
+ Parquet files as the storage format for S3 target objects.
+ Change data capture (CDC) including transaction order on the S3 target.
+ Integrate AWS Glue Data Catalog with your Amazon S3 target endpoint and query Amazon S3 data through other services such as Amazon Athena.

### AWS KMS key settings for data encryption
<a name="CHAP_Target.S3.EndpointSettings.KMSkeys"></a>

The following examples show configuring a custom KMS key to encrypt your S3 target objects. To start, you might run the following `create-endpoint` CLI command.

```
aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target 
--s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "CsvRowDelimiter": "\n", 
"CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", 
"BucketName": "your-bucket-name", 
"EncryptionMode": "SSE_KMS", 
"ServerSideEncryptionKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/72abb6fb-1e49-4ac1-9aed-c803dfcc0480"}'
```

Here, the JSON object specified by the `--s3-settings` option defines two parameters. One is an `EncryptionMode` parameter with the value `SSE_KMS`. The other is a `ServerSideEncryptionKmsKeyId` parameter with the value `arn:aws:kms:us-east-1:111122223333:key/72abb6fb-1e49-4ac1-9aed-c803dfcc0480`. This value is an Amazon Resource Name (ARN) for your custom KMS key. For an S3 target, you also specify additional settings. These identify the service access role, provide delimiters for the default CSV object storage format, and give the bucket location and name to store S3 target objects.

By default, S3 data encryption occurs using S3 server-side encryption. For the previous example's S3 target, this is also equivalent to specifying its endpoint settings as in the following example.

```
aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target
--s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "CsvRowDelimiter": "\n", 
"CsvDelimiter": ",", "BucketFolder": "your-bucket-folder", 
"BucketName": "your-bucket-name", 
"EncryptionMode": "SSE_S3"}'
```

For more information about working with S3 server-side encryption, see [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html).

**Note**  
You can also use the CLI `modify-endpoint` command to change the value of the `EncryptionMode` parameter for an existing endpoint from `SSE_KMS` to `SSE_S3`. But you can’t change the `EncryptionMode` value from `SSE_S3` to `SSE_KMS`.

### Settings for using .parquet files to store S3 target objects
<a name="CHAP_Target.S3.EndpointSettings.Parquet"></a>

The default format for creating S3 target objects is .csv files. The following examples show some endpoint settings for specifying .parquet files as the format for creating S3 target objects. You can specify the .parquet files format with all the defaults, as in the following example.

```
aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target 
--s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "DataFormat": "parquet"}'
```

Here, the `DataFormat` parameter is set to `parquet` to enable the format with all the S3 defaults. These defaults include a dictionary encoding (`"EncodingType": "rle-dictionary"`) that uses a combination of bit-packing and run-length encoding to more efficiently store repeating values.

You can add additional settings for options other than the defaults as in the following example.

```
aws dms create-endpoint --endpoint-identifier s3-target-endpoint --engine-name s3 --endpoint-type target
--s3-settings '{"ServiceAccessRoleArn": "your-service-access-ARN", "BucketFolder": "your-bucket-folder",
"BucketName": "your-bucket-name", "DataFormat": "parquet", "EncodingType": "plain-dictionary", "DictPageSizeLimit": 3072000,
"EnableStatistics": false }'
"EnableStatistics": false }'
```

Here, in addition to parameters for several standard S3 bucket options and the `DataFormat` parameter, the following additional .parquet file parameters are set:
+ `EncodingType` – Set to a dictionary encoding (`plain-dictionary`) that stores values encountered in each column in a per-column chunk of the dictionary page.
+ `DictPageSizeLimit` – Set to a maximum dictionary page size of 3 MB.
+ `EnableStatistics` – Disables the default that enables the collection of statistics about Parquet file pages and row groups.

### Capturing data changes (CDC) including transaction order on the S3 target
<a name="CHAP_Target.S3.EndpointSettings.CdcPath"></a>

By default, when AWS DMS runs a CDC task, it stores all the row changes logged in your source database (or databases) in one or more files for each table. Each set of files containing changes for the same table resides in a single target directory associated with that table. AWS DMS creates as many target directories as there are database tables migrated to the Amazon S3 target endpoint. The files are stored on the S3 target in these directories without regard to transaction order. For more information on the file naming conventions, data contents, and format, see [Using Amazon S3 as a target for AWS Database Migration Service](#CHAP_Target.S3).

To capture source database changes in a manner that also captures the transaction order, you can specify S3 endpoint settings that direct AWS DMS to store the row changes for *all* database tables in one or more .csv files created depending on transaction size. These .csv *transaction files* contain all row changes listed sequentially in transaction order for all tables involved in each transaction. These transaction files reside together in a single *transaction directory* that you also specify on the S3 target. In each transaction file, the transaction operation and the identity of the database and source table for each row change are stored as part of the row data as follows.

```
operation,table_name,database_schema_name,field_value,...
```

Here, `operation` is the transaction operation on the changed row, `table_name` is the name of the database table where the row is changed, `database_schema_name` is the name of the database schema where the table resides, and `field_value` is the first of one or more field values that specify the data for the row.

The example following of a transaction file shows changed rows for one or more transactions that involve two tables.

```
I,Names_03cdcad11a,rdsTempsdb,13,Daniel
U,Names_03cdcad11a,rdsTempsdb,23,Kathy
D,Names_03cdcad11a,rdsTempsdb,13,Cathy
I,Names_6d152ce62d,rdsTempsdb,15,Jane
I,Names_6d152ce62d,rdsTempsdb,24,Chris
I,Names_03cdcad11a,rdsTempsdb,16,Mike
```

Here, the transaction operation on each row is indicated by `I` (insert), `U` (update), or `D` (delete) in the first column. The table name is the second column value (for example, `Names_03cdcad11a`). The name of the database schema is the value of the third column (for example, `rdsTempsdb`). And the remaining columns are populated with your own row data (for example, `13,Daniel`).
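Because each line is plain CSV with a fixed prefix of metadata columns, a row can be split mechanically. The following sketch (a hypothetical helper, not part of AWS DMS) parses rows like the ones above:

```python
import csv
import io

def parse_txn_rows(csv_text):
    """Split each transaction-file row into (operation, table, schema, field values)."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text)):
        operation, table_name, schema_name, *field_values = rec
        rows.append((operation, table_name, schema_name, field_values))
    return rows

sample = "I,Names_03cdcad11a,rdsTempsdb,13,Daniel\nD,Names_03cdcad11a,rdsTempsdb,13,Cathy\n"
rows = parse_txn_rows(sample)
```

Note that the metadata columns always come first, so the remaining columns can be collected as the row data regardless of how many fields each table has.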

In addition, AWS DMS names the transaction files it creates on the Amazon S3 target using a time stamp according to the following naming convention.

```
CDC_TXN-timestamp.csv
```

Here, `timestamp` is the time when the transaction file was created, as in the following example. 

```
CDC_TXN-20201117153046033.csv
```

This time stamp in the file name ensures that the transaction files are created and listed in transaction order when you list them in their transaction directory.
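Because the timestamp is fixed-width and zero-padded, an ordinary lexicographic sort of the file names reproduces creation order, as this small check (with made-up file names) illustrates:

```python
# Hypothetical transaction-file names following the CDC_TXN-timestamp.csv pattern
files = [
    "CDC_TXN-20201117153046033.csv",
    "CDC_TXN-20201117153012001.csv",
    "CDC_TXN-20201116235959999.csv",
]

# A plain string sort orders the files chronologically, oldest first
ordered = sorted(files)
```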

**Note**  
When capturing data changes in transaction order, AWS DMS always stores the row changes in .csv files regardless of the value of the `DataFormat` S3 setting on the target.

To control the frequency of writes to an Amazon S3 target during a data replication task, you can configure the `CdcMaxBatchInterval` and `CdcMinFileSize` settings. This can result in better performance when analyzing the data without any additional overhead operations. For more information, see [Endpoint settings when using Amazon S3 as a target for AWS DMS](#CHAP_Target.S3.Configuring).

**To tell AWS DMS to store all row changes in transaction order**

1. Set the `PreserveTransactions` S3 setting on the target to `true`.

1. Set the `CdcPath` S3 setting on the target to a relative folder path where you want AWS DMS to store the .csv transaction files.

   AWS DMS creates this path either under the default S3 target bucket and working directory or under the bucket and bucket folder that you specify using the `BucketName` and `BucketFolder` S3 settings on the target.

## Indicating source DB operations in migrated S3 data
<a name="CHAP_Target.S3.Configuring.InsertOps"></a>

When AWS DMS migrates records to an S3 target, it can create an additional field in each migrated record. This additional field indicates the operation applied to the record at the source database. How AWS DMS creates and sets this first field depends on the migration task type and settings of `includeOpForFullLoad`, `cdcInsertsOnly`, and `cdcInsertsAndUpdates`.

For a full load when `includeOpForFullLoad` is `true`, AWS DMS always creates an additional first field in each .csv record. This field contains the letter I (INSERT) to indicate that the row was inserted at the source database. For a CDC load when `cdcInsertsOnly` is `false` (the default), AWS DMS also always creates an additional first field in each .csv or .parquet record. This field contains the letter I (INSERT), U (UPDATE), or D (DELETE) to indicate whether the row was inserted, updated, or deleted at the source database.

In the following table, you can see how the settings of the `includeOpForFullLoad` and `cdcInsertsOnly` attributes work together to affect the setting of migrated records.


DMS sets target records as follows for .csv and .parquet output.

| includeOpForFullLoad | cdcInsertsOnly | For full load | For CDC load | 
| --- | --- | --- | --- | 
| true | true | Added first field value set to I | Added first field value set to I | 
| false | false | No added field | Added first field value set to I, U, or D | 
| false | true | No added field | No added field | 
| true | false | Added first field value set to I | Added first field value set to I, U, or D | 
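The table can also be read as a small decision function. The following sketch (an illustration of the documented behavior, not DMS code) returns the marker set that the added first field can take, or `None` when no field is added:

```python
def first_field_markers(include_op_for_full_load, cdc_inserts_only, load_type):
    """Model of the behavior above for .csv and .parquet output."""
    if load_type == "full":
        # Full load: controlled entirely by includeOpForFullLoad
        return "I" if include_op_for_full_load else None
    # CDC load
    if include_op_for_full_load == cdc_inserts_only:
        # Same values: cdcInsertsOnly controls the CDC record settings
        return "I" if cdc_inserts_only else "I, U, or D"
    # Different values: conform to the full-load setting (includeOpForFullLoad)
    return "I, U, or D" if include_op_for_full_load else None
```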

When `includeOpForFullLoad` and `cdcInsertsOnly` are set to the same value, the target records are set according to the attribute that controls record settings for the current migration type. That attribute is `includeOpForFullLoad` for full load and `cdcInsertsOnly` for CDC load.

When `includeOpForFullLoad` and `cdcInsertsOnly` are set to different values, AWS DMS makes the target record settings consistent for both CDC and full load. It does this by making the record settings for a CDC load conform to the record settings for any earlier full load specified by `includeOpForFullLoad`. 

In other words, suppose that a full load is set to add a first field to indicate an inserted record. In this case, a following CDC load is set to add a first field that indicates an inserted, updated, or deleted record as appropriate at the source. In contrast, suppose that a full load is set to *not* add a first field to indicate an inserted record. In this case, a CDC load is also set to not add a first field to each record regardless of its corresponding record operation at the source.

Similarly, how DMS creates and sets an additional first field depends on the settings of `includeOpForFullLoad` and `cdcInsertsAndUpdates`. In the following table, you can see how the settings of the `includeOpForFullLoad` and `cdcInsertsAndUpdates` attributes work together to affect the setting of migrated records in this format. 


DMS sets target records as follows for .csv output.

| includeOpForFullLoad | cdcInsertsAndUpdates | For full load | For CDC load | 
| --- | --- | --- | --- | 
| true | true | Added first field value set to I | Added first field value set to I or U | 
| false | false | No added field | Added first field value set to I, U, or D | 
| false | true | No added field | Added first field value set to I or U | 
| true | false | Added first field value set to I | Added first field value set to I, U, or D | 

## Target data types for S3 Parquet
<a name="CHAP_Target.S3.DataTypes"></a>

The following table shows the Parquet target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data type  |  S3 parquet data type   | 
| --- | --- | 
| BYTES | BINARY | 
| DATE | DATE32 | 
| TIME | TIME32 | 
| DATETIME | TIMESTAMP | 
| INT1 | INT8 | 
| INT2 | INT16 | 
| INT4 | INT32 | 
| INT8 | INT64 | 
| NUMERIC | DECIMAL | 
| REAL4 | FLOAT | 
| REAL8 | DOUBLE | 
| STRING | STRING | 
| UINT1 | UINT8 | 
| UINT2 | UINT16 | 
| UINT4 | UINT32 | 
| UINT8 | UINT64 | 
| WSTRING | STRING | 
| BLOB | BINARY | 
| NCLOB | STRING | 
| CLOB | STRING | 
| BOOLEAN | BOOL | 

# Using an Amazon DynamoDB database as a target for AWS Database Migration Service
<a name="CHAP_Target.DynamoDB"></a>

You can use AWS DMS to migrate data to an Amazon DynamoDB table. Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. AWS DMS supports using a relational database or MongoDB as a source.

In DynamoDB, tables, items, and attributes are the core components that you work with. A *table* is a collection of items, and each *item* is a collection of attributes. DynamoDB uses primary keys, called partition keys, to uniquely identify each item in a table. You can also use keys and secondary indexes to provide more querying flexibility.

You use object mapping to migrate your data from a source database to a target DynamoDB table. Object mapping enables you to determine where the source data is located in the target. 

When AWS DMS creates tables on a DynamoDB target endpoint, it creates as many tables as there are in the source database endpoint. AWS DMS also sets several DynamoDB parameter values. The cost for the table creation depends on the amount of data and the number of tables to be migrated.

**Note**  
The **SSL Mode** option on the AWS DMS console or API doesn’t apply to some data streaming and NoSQL services like Kinesis and DynamoDB. They are secure by default, so AWS DMS shows the SSL mode setting is equal to none (**SSL Mode=None**). You don’t need to provide any additional configuration for your endpoint to make use of SSL. For example, when using DynamoDB as a target endpoint, it is secure by default. All API calls to DynamoDB use SSL, so there is no need for an additional SSL option in the AWS DMS endpoint. You can securely put data and retrieve data through SSL endpoints using the HTTPS protocol, which AWS DMS uses by default when connecting to a DynamoDB database.

To help increase the speed of the transfer, AWS DMS supports a multithreaded full load to a DynamoDB target instance. DMS supports this multithreading with task settings that include the following:
+ `MaxFullLoadSubTasks` – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding DynamoDB target table using a dedicated subtask. The default value is 8. The maximum value is 49.
+ `ParallelLoadThreads` – Use this option to specify the number of threads that AWS DMS uses to load each table into its DynamoDB target table. The default value is 0 (single-threaded). The maximum value is 200. You can ask to have this maximum limit increased.
**Note**  
DMS assigns each segment of a table to its own thread for loading. Therefore, set `ParallelLoadThreads` to the maximum number of segments that you specify for a table in the source.
+ `ParallelLoadBufferSize` – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the DynamoDB target. The default value is 50. The maximum value is 1,000. Use this setting with `ParallelLoadThreads`. `ParallelLoadBufferSize` is valid only when there is more than one thread.
+ Table-mapping settings for individual tables – Use `table-settings` rules to identify individual tables from the source that you want to load in parallel. Also use these rules to specify how to segment the rows of each table for multithreaded loading. For more information, see [Table and collection settings rules and operations](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Tablesettings.md).

**Note**  
When AWS DMS sets DynamoDB parameter values for a migration task, the default Read Capacity Units (RCU) parameter value is set to 200.  
The Write Capacity Units (WCU) parameter value is also set, but its value depends on several other settings:  
The default value for the WCU parameter is 200.
If the `ParallelLoadThreads` task setting is set greater than 1 (the default is 0), then the WCU parameter is set to 200 times the `ParallelLoadThreads` value.
Standard AWS DMS usage fees apply to resources you use.
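The WCU rule in the note above amounts to the following (a model of the documented behavior, not DMS code):

```python
def provisioned_wcu(parallel_load_threads=0):
    """Default WCU is 200; when ParallelLoadThreads is greater than 1,
    WCU is set to 200 times the ParallelLoadThreads value."""
    if parallel_load_threads > 1:
        return 200 * parallel_load_threads
    return 200
```

For example, a task with `ParallelLoadThreads` set to 8 would provision 1,600 WCU on the target table under this rule.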

## Migrating from a relational database to a DynamoDB table
<a name="CHAP_Target.DynamoDB.RDBMS2DynamoDB"></a>

AWS DMS supports migrating data to DynamoDB scalar data types. When migrating from a relational database like Oracle or MySQL to DynamoDB, you might want to restructure how you store this data.

Currently AWS DMS supports single table to single table restructuring to DynamoDB scalar type attributes. If you are migrating data into DynamoDB from a relational database table, you take data from a table and reformat it into DynamoDB scalar data type attributes. These attributes can accept data from multiple columns, and you can map a column to an attribute directly.

AWS DMS supports the following DynamoDB scalar data types:
+ String
+ Number
+ Boolean

**Note**  
NULL data from the source is ignored on the target.

## Prerequisites for using DynamoDB as a target for AWS Database Migration Service
<a name="CHAP_Target.DynamoDB.Prerequisites"></a>

Before you begin to work with a DynamoDB database as a target for AWS DMS, make sure that you create an IAM role. This IAM role should allow AWS DMS to assume it and should grant access to the DynamoDB tables that are being migrated into. The following IAM trust policy allows AWS DMS to assume the role.

------
#### [ JSON ]


```
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "",
         "Effect": "Allow",
         "Principal": {
            "Service": "dms.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
      }
   ]
}
```

------

The role that you use for the migration to DynamoDB must have the following permissions.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteTable",
                "dynamodb:DeleteItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-west-2:111122223333:table/name1",
                "arn:aws:dynamodb:us-west-2:111122223333:table/OtherName*",
                "arn:aws:dynamodb:us-west-2:111122223333:table/awsdms_apply_exceptions",
                "arn:aws:dynamodb:us-west-2:111122223333:table/awsdms_full_load_exceptions"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:ListTables"
            ],
            "Resource": "*"
        }
    ]
}
```

------

## Limitations when using DynamoDB as a target for AWS Database Migration Service
<a name="CHAP_Target.DynamoDB.Limitations"></a>

The following limitations apply when using DynamoDB as a target:
+ DynamoDB limits the precision of the Number data type to 38 places. Store all data types with a higher precision as a String. You need to explicitly specify this using the object-mapping feature.
+ Because DynamoDB doesn't have a Date data type, data using the Date data type are converted to strings.
+ DynamoDB doesn't allow updates to the primary key attributes. This restriction is important when using ongoing replication with change data capture (CDC) because it can result in unwanted data in the target. Depending on how you have configured the object mapping, a CDC operation that updates the primary key can do one of two things. It can either fail or insert a new item with the updated primary key and incomplete data.
+ AWS DMS only supports replication of tables with noncomposite primary keys. The exception is if you specify an object mapping for the target table with a custom partition key or sort key, or both.
+ AWS DMS doesn't support LOB data unless it is a CLOB. AWS DMS converts CLOB data into a DynamoDB string when migrating the data.
+ When you use DynamoDB as target, only the Apply Exceptions control table (`dmslogs.awsdms_apply_exceptions`) is supported. For more information about control tables, see [Control table task settings](CHAP_Tasks.CustomizingTasks.TaskSettings.ControlTable.md).
+ AWS DMS doesn't support the task setting `TargetTablePrepMode=TRUNCATE_BEFORE_LOAD` for DynamoDB as a target. 
+ AWS DMS doesn't support the task setting `TaskRecoveryTableEnabled` for DynamoDB as a target. 
+ `BatchApply` is not supported for a DynamoDB endpoint.
+ AWS DMS cannot migrate attributes whose names match reserved words in DynamoDB. For more information, see [Reserved words in DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html) in the *Amazon DynamoDB Developer Guide*.
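For the Number precision limitation in particular, the idea of falling back to String for wider values can be sketched with a client-side check (the helper name is hypothetical, and in DMS itself this fallback is configured through object mapping rather than code):

```python
from decimal import Decimal

def to_dynamodb_attribute(num_str):
    """DynamoDB Number precision tops out at 38 digits; wider values
    must be stored as a String, per the limitation above."""
    digits = len(Decimal(num_str).as_tuple().digits)
    return {"N": num_str} if digits <= 38 else {"S": num_str}
```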

## Using object mapping to migrate data to DynamoDB
<a name="CHAP_Target.DynamoDB.ObjectMapping"></a>

AWS DMS uses table-mapping rules to map data from the source to the target DynamoDB table. To map data to a DynamoDB target, you use a type of table-mapping rule called *object-mapping*. Object mapping lets you define the attribute names and the data to be migrated to them. You must have selection rules when you use object mapping.

DynamoDB doesn't have a preset structure other than having a partition key and an optional sort key. If you have a noncomposite primary key, AWS DMS uses it. If you have a composite primary key or you want to use a sort key, define these keys and the other attributes in your target DynamoDB table.

To create an object-mapping rule, you specify the `rule-type` as *object-mapping*. This rule specifies what type of object mapping you want to use. 

The structure for the rule is as follows:

```
{ "rules": [
    {
      "rule-type": "object-mapping",
      "rule-id": "<id>",
      "rule-name": "<name>",
      "rule-action": "<valid object-mapping rule action>",
      "object-locator": {
        "schema-name": "<case-sensitive schema name>",
        "table-name": "<table name>"
      },
      "target-table-name": "<table_name>"
    }
  ]
}
```

AWS DMS currently supports `map-record-to-record` and `map-record-to-document` as the only valid values for the `rule-action` parameter. These values specify what AWS DMS does by default to records that aren't excluded as part of the `exclude-columns` attribute list. These values don't affect the attribute mappings in any way. 
+ You can use `map-record-to-record` when migrating from a relational database to DynamoDB. It uses the primary key from the relational database as the partition key in DynamoDB and creates an attribute for each column in the source database. When using `map-record-to-record`, for any column in the source table not listed in the `exclude-columns` attribute list, AWS DMS creates a corresponding attribute on the target DynamoDB instance. It does so regardless of whether that source column is used in an attribute mapping. 
+ You use `map-record-to-document` to put source columns into a single, flat DynamoDB map attribute on the target. This attribute is called "\_doc". This placement applies to any column in the source table not listed in the `exclude-columns` attribute list. 
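The contrast between the two rule actions can be sketched as follows for columns that are neither excluded nor otherwise mapped. This toy function models only the leftover-column behavior described above; `DOC_ATTR` is an assumption about the flat map attribute's name, and real DMS does much more:

```python
DOC_ATTR = "_doc"  # assumed name of the single flat map attribute

def map_leftover_columns(row, excluded, rule_action):
    """Toy model: how each rule action treats source columns that are
    not listed in exclude-columns."""
    leftovers = {k: v for k, v in row.items() if k not in excluded}
    if rule_action == "map-record-to-record":
        return leftovers              # each column becomes its own attribute
    return {DOC_ATTR: leftovers}      # map-record-to-document: one flat map

row = {"FirstName": "Randy", "NickName": "Lorde", "income": "29000"}
flat = map_leftover_columns(row, {"FirstName"}, "map-record-to-document")
```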

One way to understand the difference between the `rule-action` parameters `map-record-to-record` and `map-record-to-document` is to see the two parameters in action. For this example, assume that you are starting with a relational database table row with the following structure and data:

![\[sample database for example\]](http://docs.aws.amazon.com/dms/latest/userguide/images/datarep-dynamodb1.png)


To migrate this information to DynamoDB, you create rules to map the data into a DynamoDB table item. Note the columns listed for the `exclude-columns` parameter. These columns are not directly mapped over to the target. Instead, attribute mapping is used to combine the data into new items, such as where *FirstName* and *LastName* are grouped together to become *CustomerName* on the DynamoDB target. *NickName* and *income* are not excluded.

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToDDB",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "test",
                "table-name": "customer"
            },
            "target-table-name": "customer_t",
            "mapping-parameters": {
                "partition-key-name": "CustomerName",
                "exclude-columns": [
                    "FirstName",
                    "LastName",
                    "HomeAddress",
                    "HomePhone",
                    "WorkAddress",
                    "WorkPhone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${FirstName},${LastName}"
                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "document",
                        "attribute-sub-type": "dynamodb-map",
                        "value": {
                            "M": {
                                "Home": {
                                    "M": {
                                        "Address": {
                                            "S": "${HomeAddress}"
                                        },
                                        "Phone": {
                                            "S": "${HomePhone}"
                                        }
                                    }
                                },
                                "Work": {
                                    "M": {
                                        "Address": {
                                            "S": "${WorkAddress}"
                                        },
                                        "Phone": {
                                            "S": "${WorkPhone}"
                                        }
                                    }
                                }
                            }
                        }
                    }
                ]
            }
        }
    ]
}
```

By using the `rule-action` parameter *map-record-to-record*, the data for *NickName* and *income* are mapped to items of the same name in the DynamoDB target. 

![\[Get started with AWS DMS\]](http://docs.aws.amazon.com/dms/latest/userguide/images/datarep-dynamodb2.png)


However, suppose that you use the same rules but change the `rule-action` parameter to *map-record-to-document*. In this case, the columns not listed in the `exclude-columns` parameter, *NickName* and *income*, are mapped to a *\_doc* item.

![\[Get started with AWS DMS\]](http://docs.aws.amazon.com/dms/latest/userguide/images/datarep-dynamodb3.png)


### Using custom condition expressions with object mapping
<a name="CHAP_Target.DynamoDB.ObjectMapping.ConditionExpression"></a>

You can use a feature of DynamoDB called conditional expressions to manipulate data that is being written to a DynamoDB table. For more information about condition expressions in DynamoDB, see [Condition expressions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Expressions.ConditionExpressions.html).

A condition expression member consists of:
+ An expression (required).
+ Expression attribute values (required). These specify a DynamoDB JSON structure of the attribute value. They are useful for comparing an attribute with a value in DynamoDB that you might not know until runtime. You can define an expression attribute value as a placeholder for an actual value.
+ Expression attribute names (required). These help avoid potential conflicts with DynamoDB reserved words, attribute names that contain special characters, and similar.
+ Options for when to use the condition expression (optional). The default is `apply-during-cdc = false` and `apply-during-full-load = true`.

The structure for the rule is as follows:

```
"target-table-name": "customer_t",
      "mapping-parameters": {
        "partition-key-name": "CustomerName",
        "condition-expression": {
          "expression":"<conditional expression>",
          "expression-attribute-values": [
              {
                "name":"<attribute name>",
                "value":<attribute value>
              }
          ],
          "apply-during-cdc":<optional Boolean value>,
          "apply-during-full-load": <optional Boolean value>
        }
```

The following sample highlights the sections used for condition expression.

![\[Get started with AWS DMS\]](http://docs.aws.amazon.com/dms/latest/userguide/images/datarep-Tasks-conditional1.png)


### Using attribute mapping with object mapping
<a name="CHAP_Target.DynamoDB.ObjectMapping.AttributeMapping"></a>

Attribute mapping lets you specify a template string using source column names to restructure data on the target. There is no formatting done other than what you specify in the template.
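The `${column}` placeholder syntax in attribute-mapping value strings happens to match Python's `string.Template`, which makes the substitution easy to illustrate. DMS performs this internally; the snippet below is only a demonstration with sample values:

```python
from string import Template

# Source row values feeding the attribute-mapping template
row = {"FirstName": "Randy", "LastName": "Marsh"}

# The same kind of "value" string used in a CustomerName attribute mapping
customer_name = Template("${FirstName},${LastName}").substitute(row)
```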

The following example shows the structure of the source database and the desired structure of the DynamoDB target. First is shown the structure of the source, in this case an Oracle database, and then the desired structure of the data in DynamoDB. The example concludes with the JSON used to create the desired target structure.

The structure of the Oracle data is as follows:



| FirstName | LastName | StoreId | HomeAddress | HomePhone | WorkAddress | WorkPhone | DateOfBirth | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
| Primary Key | Primary Key | N/A | N/A | N/A | N/A | N/A | N/A | 
| Randy | Marsh | 5 | 221B Baker Street  | 1234567890 | 31 Spooner Street, Quahog  | 9876543210  | 02/29/1988  | 

The structure of the DynamoDB data is as follows:



| CustomerName | StoreId | ContactDetails | DateOfBirth | 
| --- | --- | --- | --- | 
| Partition Key | Sort Key | N/A | N/A | 
| <pre>Randy,Marsh</pre> | <pre>5</pre> | <pre>{<br />    "Name": "Randy",<br />    "Home": {<br />        "Address": "221B Baker Street",<br />        "Phone": 1234567890<br />    },<br />    "Work": {<br />        "Address": "31 Spooner Street, Quahog",<br />        "Phone": 9876543210<br />    }<br />}</pre> | <pre>02/29/1988</pre> | 

The following JSON shows the object mapping and column mapping used to achieve the DynamoDB structure:

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToDDB",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "test",
                "table-name": "customer"
            },
            "target-table-name": "customer_t",
            "mapping-parameters": {
                "partition-key-name": "CustomerName",
                "sort-key-name": "StoreId",
                "exclude-columns": [
                    "FirstName",
                    "LastName",
                    "HomeAddress",
                    "HomePhone",
                    "WorkAddress",
                    "WorkPhone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${FirstName},${LastName}"
                    },
                    {
                        "target-attribute-name": "StoreId",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${StoreId}"
                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "{\"Name\":\"${FirstName}\",\"Home\":{\"Address\":\"${HomeAddress}\",\"Phone\":\"${HomePhone}\"}, \"Work\":{\"Address\":\"${WorkAddress}\",\"Phone\":\"${WorkPhone}\"}}"
                    }
                ]
            }
        }
    ]
}
```

Another way to use column mapping is to use DynamoDB format as your document type. The following code example uses *dynamodb-map* as the `attribute-sub-type` for attribute mapping. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToDDB",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "test",
                "table-name": "customer"
            },
            "target-table-name": "customer_t",
            "mapping-parameters": {
                "partition-key-name": "CustomerName",
                "sort-key-name": "StoreId",
                "exclude-columns": [
                    "FirstName",
                    "LastName",
                    "HomeAddress",
                    "HomePhone",
                    "WorkAddress",
                    "WorkPhone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${FirstName},${LastName}"
                    },
                    {
                        "target-attribute-name": "StoreId",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${StoreId}"
                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "document",
                        "attribute-sub-type": "dynamodb-map",
                        "value": {
                          "M": {
                            "Name": {
                              "S": "${FirstName}"
                            },
                            "Home": {
                                    "M": {
                                        "Address": {
                                            "S": "${HomeAddress}"
                                        },
                                        "Phone": {
                                            "S": "${HomePhone}"
                                        }
                                    }
                                },
                                "Work": {
                                    "M": {
                                        "Address": {
                                            "S": "${WorkAddress}"
                                        },
                                        "Phone": {
                                            "S": "${WorkPhone}"
                                        }
                                    }
                                }
                            }
                        }        
                    }
                ]
            }
        }
    ]
}
```

As an alternative to `dynamodb-map`, you can use `dynamodb-list` as the `attribute-sub-type` for attribute mapping, as shown in the following example.

```
{
    "target-attribute-name": "ContactDetailsList",
    "attribute-type": "document",
    "attribute-sub-type": "dynamodb-list",
    "value": {
        "L": [
            {
                "S": "${FirstName}"
            },
            {
                "S": "${HomeAddress}"
            },
            {
                "S": "${HomePhone}"
            },
            {
                "S": "${WorkAddress}"
            },
            {
                "S": "${WorkPhone}"
            }
        ]
    }
}
```
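In both variants, each `${column-name}` placeholder in a `value` field is replaced with the corresponding source column value during migration. The following Python sketch (an illustration of the substitution behavior, not AWS DMS code) resolves such templates against a sample row:

```python
import re

def resolve_template(template, row):
    """Replace each ${column} placeholder with the value from a source row."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

row = {"FirstName": "Randy", "LastName": "Marsh", "StoreId": 5}

# Scalar attribute built from two source columns, as in the mappings above
assert resolve_template("${FirstName},${LastName}", row) == "Randy,Marsh"
# Single-column pass-through
assert resolve_template("${StoreId}", row) == "5"
```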

### Example 1: Using attribute mapping with object mapping
<a name="CHAP_Target.DynamoDB.ColumnMappingExample1"></a>

The following example migrates data from two MySQL database tables, *nfl_data* and *sport_team*, to two DynamoDB tables called *NFLTeams* and *SportTeams*. The structure of the tables and the JSON used to map the data from the MySQL database tables to the DynamoDB tables are shown following.

The structure of the MySQL database table *nfl_data* is shown below:

```
mysql> desc nfl_data;
+---------------+-------------+------+-----+---------+-------+
| Field         | Type        | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+-------+
| Position      | varchar(5)  | YES  |     | NULL    |       |
| player_number | smallint(6) | YES  |     | NULL    |       |
| Name          | varchar(40) | YES  |     | NULL    |       |
| status        | varchar(10) | YES  |     | NULL    |       |
| stat1         | varchar(10) | YES  |     | NULL    |       |
| stat1_val     | varchar(10) | YES  |     | NULL    |       |
| stat2         | varchar(10) | YES  |     | NULL    |       |
| stat2_val     | varchar(10) | YES  |     | NULL    |       |
| stat3         | varchar(10) | YES  |     | NULL    |       |
| stat3_val     | varchar(10) | YES  |     | NULL    |       |
| stat4         | varchar(10) | YES  |     | NULL    |       |
| stat4_val     | varchar(10) | YES  |     | NULL    |       |
| team          | varchar(10) | YES  |     | NULL    |       |
+---------------+-------------+------+-----+---------+-------+
```

The structure of the MySQL database table *sport_team* is shown below:

```
mysql> desc sport_team;
+---------------------------+--------------+------+-----+---------+----------------+
| Field                     | Type         | Null | Key | Default | Extra          |
+---------------------------+--------------+------+-----+---------+----------------+
| id                        | mediumint(9) | NO   | PRI | NULL    | auto_increment |
| name                      | varchar(30)  | NO   |     | NULL    |                |
| abbreviated_name          | varchar(10)  | YES  |     | NULL    |                |
| home_field_id             | smallint(6)  | YES  | MUL | NULL    |                |
| sport_type_name           | varchar(15)  | NO   | MUL | NULL    |                |
| sport_league_short_name   | varchar(10)  | NO   |     | NULL    |                |
| sport_division_short_name | varchar(10)  | YES  |     | NULL    |                |
```

The table-mapping rules used to map the two tables to the two DynamoDB tables are shown below:

```
{
  "rules":[
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "dms_sample",
        "table-name": "nfl_data"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "2",
      "object-locator": {
        "schema-name": "dms_sample",
        "table-name": "sport_team"
      },
      "rule-action": "include"
    },
    {
      "rule-type":"object-mapping",
      "rule-id":"3",
      "rule-name":"MapNFLData",
      "rule-action":"map-record-to-record",
      "object-locator":{
        "schema-name":"dms_sample",
        "table-name":"nfl_data"
      },
      "target-table-name":"NFLTeams",
      "mapping-parameters":{
        "partition-key-name":"Team",
        "sort-key-name":"PlayerName",
        "exclude-columns": [
          "player_number", "team", "name"
        ],
        "attribute-mappings":[
          {
            "target-attribute-name":"Team",
            "attribute-type":"scalar",
            "attribute-sub-type":"string",
            "value":"${team}"
          },
          {
            "target-attribute-name":"PlayerName",
            "attribute-type":"scalar",
            "attribute-sub-type":"string",
            "value":"${name}"
          },
          {
            "target-attribute-name":"PlayerInfo",
            "attribute-type":"scalar",
            "attribute-sub-type":"string",
            "value":"{\"Number\": \"${player_number}\",\"Position\": \"${Position}\",\"Status\": \"${status}\",\"Stats\": {\"Stat1\": \"${stat1}:${stat1_val}\",\"Stat2\": \"${stat2}:${stat2_val}\",\"Stat3\": \"${stat3}:${
stat3_val}\",\"Stat4\": \"${stat4}:${stat4_val}\"}"
          }
        ]
      }
    },
    {
      "rule-type":"object-mapping",
      "rule-id":"4",
      "rule-name":"MapSportTeam",
      "rule-action":"map-record-to-record",
      "object-locator":{
        "schema-name":"dms_sample",
        "table-name":"sport_team"
      },
      "target-table-name":"SportTeams",
      "mapping-parameters":{
        "partition-key-name":"TeamName",
        "exclude-columns": [
          "name", "id"
        ],
        "attribute-mappings":[
          {
            "target-attribute-name":"TeamName",
            "attribute-type":"scalar",
            "attribute-sub-type":"string",
            "value":"${name}"
          },
          {
            "target-attribute-name":"TeamInfo",
            "attribute-type":"scalar",
            "attribute-sub-type":"string",
            "value":"{\"League\": \"${sport_league_short_name}\",\"Division\": \"${sport_division_short_name}\"}"
          }
        ]
      }
    }
  ]
}
```

The sample output for the *NFLTeams* DynamoDB table is shown below:

```
  "PlayerInfo": "{\"Number\": \"6\",\"Position\": \"P\",\"Status\": \"ACT\",\"Stats\": {\"Stat1\": \"PUNTS:73\",\"Stat2\": \"AVG:46\",\"Stat3\": \"LNG:67\",\"Stat4\": \"IN 20:31\"}",
  "PlayerName": "Allen, Ryan",
  "Position": "P",
  "stat1": "PUNTS",
  "stat1_val": "73",
  "stat2": "AVG",
  "stat2_val": "46",
  "stat3": "LNG",
  "stat3_val": "67",
  "stat4": "IN 20",
  "stat4_val": "31",
  "status": "ACT",
  "Team": "NE"
}
```

The sample output for the *SportTeams* DynamoDB table is shown below:

```
{
  "abbreviated_name": "IND",
  "home_field_id": 53,
  "sport_division_short_name": "AFC South",
  "sport_league_short_name": "NFL",
  "sport_type_name": "football",
  "TeamInfo": "{\"League\": \"NFL\",\"Division\": \"AFC South\"}",
  "TeamName": "Indianapolis Colts"
}
```

## Target data types for DynamoDB
<a name="CHAP_Target.DynamoDB.DataTypes"></a>

The DynamoDB endpoint for AWS DMS supports most DynamoDB data types. The following table shows the AWS DMS target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).

When AWS DMS migrates data from heterogeneous databases, we map data types from the source database to intermediate data types called AWS DMS data types. We then map the intermediate data types to the target data types. The following table shows each AWS DMS data type and the data type it maps to in DynamoDB:


| AWS DMS data type | DynamoDB data type | 
| --- | --- | 
|  String  |  String  | 
|  WString  |  String  | 
|  Boolean  |  Boolean  | 
|  Date  |  String  | 
|  DateTime  |  String  | 
|  INT1  |  Number  | 
|  INT2  |  Number  | 
|  INT4  |  Number  | 
|  INT8  |  Number  | 
|  Numeric  |  Number  | 
|  Real4  |  Number  | 
|  Real8  |  Number  | 
|  UINT1  |  Number  | 
|  UINT2  |  Number  | 
|  UINT4  |  Number  | 
| UINT8 | Number | 
| CLOB | String | 

# Using Amazon Kinesis Data Streams as a target for AWS Database Migration Service
<a name="CHAP_Target.Kinesis"></a>

You can use AWS DMS to migrate data to an Amazon Kinesis data stream. Amazon Kinesis data streams are part of the Amazon Kinesis Data Streams service. You can use Kinesis data streams to collect and process large streams of data records in real time.

A Kinesis data stream is made up of shards. *Shards* are uniquely identified sequences of data records in a stream. For more information on shards in Amazon Kinesis Data Streams, see [Shard](https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard) in the *Amazon Kinesis Data Streams Developer Guide.*

AWS Database Migration Service publishes records to a Kinesis data stream using JSON. During conversion, AWS DMS serializes each record from the source database into an attribute-value pair in JSON format or a JSON_UNFORMATTED message format. A JSON_UNFORMATTED message is a single-line JSON string with a newline delimiter. It allows Amazon Data Firehose to deliver Kinesis data to an Amazon S3 destination, and then query it using various query engines, including Amazon Athena.
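The practical difference between the two formats can be sketched as follows. This illustrates only the serialization styles; the record content here is a placeholder, not the exact attribute layout that AWS DMS emits.

```python
import json

record = {"data": {"id": 1, "name": "example"}, "metadata": {"operation": "insert"}}

# MessageFormat=JSON: human-readable, multi-line output
pretty = json.dumps(record, indent=4)

# MessageFormat=JSON_UNFORMATTED: one line terminated by a newline, which
# lets Firehose write newline-delimited JSON to Amazon S3 for Athena to query
unformatted = json.dumps(record, separators=(",", ":")) + "\n"

assert "\n" in pretty
assert unformatted.count("\n") == 1 and unformatted.endswith("\n")
```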

You use object mapping to migrate your data from any supported data source to a target stream. With object mapping, you determine how to structure the data records in the stream. You also define a partition key for each table, which Kinesis Data Streams uses to group the data into its shards. 
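Kinesis Data Streams assigns a record to a shard by taking an MD5 hash of its partition key and comparing it to each shard's hash-key range. The following simplified Python sketch (assuming evenly divided hash-key ranges) shows why records that share a partition key always land on the same shard:

```python
import hashlib

def shard_for(partition_key, shard_count):
    """Simplified routing: MD5 the partition key into a 128-bit integer,
    then map it onto evenly sized hash-key ranges, one per shard."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return min(h * shard_count // 2 ** 128, shard_count - 1)

# The same partition key always maps to the same shard, so the key you
# choose in object mapping controls how data spreads across shards.
assert shard_for("dms_sample.nfl_data", 4) == shard_for("dms_sample.nfl_data", 4)
assert all(0 <= shard_for(k, 4) < 4 for k in ("a", "b", "c"))
```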

AWS DMS also sets several Kinesis Data Streams parameter values. The cost for the table creation depends on the amount of data and the number of tables to be migrated.

**Note**  
The **SSL Mode** option on the AWS DMS console or API doesn’t apply to some data streaming and NoSQL services like Kinesis and DynamoDB. They are secure by default, so AWS DMS shows the SSL mode setting is equal to none (**SSL Mode=None**). You don’t need to provide any additional configuration for your endpoint to make use of SSL. For example, when using Kinesis as a target endpoint, it is secure by default. All API calls to Kinesis use SSL, so there is no need for an additional SSL option in the AWS DMS endpoint. You can securely put data and retrieve data through SSL endpoints using the HTTPS protocol, which AWS DMS uses by default when connecting to a Kinesis Data Stream.

**Kinesis Data Streams endpoint settings**

When you use Kinesis Data Streams target endpoints, you can get transaction and control details using the `KinesisSettings` option in the AWS DMS API. 

You can set connection settings in the following ways:
+ In the AWS DMS console, using endpoint settings.
+ In the CLI, using the `kinesis-settings` option of the [CreateEndpoint](https://docs.aws.amazon.com/dms/latest/APIReference/API_CreateEndpoint.html) command.

In the CLI, use the following request parameters of the `kinesis-settings` option:
**Note**  
Support for the `IncludeNullAndEmpty` endpoint setting is available in AWS DMS version 3.4.1 and higher. Except where noted, the other endpoint settings following are available in all AWS DMS versions that support Kinesis Data Streams targets.
+ `MessageFormat` – The output format for the records created on the endpoint. The message format is `JSON` (default) or `JSON_UNFORMATTED` (a single line with no tab).
+ `IncludeControlDetails` – Shows detailed control information for table definition, column definition, and table and column changes in the Kinesis message output. The default is `false`.
+ `IncludeNullAndEmpty` – Include NULL and empty columns in the target. The default is `false`.
+ `IncludePartitionValue` – Shows the partition value within the Kinesis message output, unless the partition type is `schema-table-type`. The default is `false`.
+ `IncludeTableAlterOperations` – Includes any data definition language (DDL) operations that change the table in the control data, such as `rename-table`, `drop-table`, `add-column`, `drop-column`, and `rename-column`. The default is `false`.
+ `IncludeTransactionDetails` – Provides detailed transaction information from the source database. This information includes a commit timestamp, a log position, and values for `transaction_id`, `previous_transaction_id`, and `transaction_record_id `(the record offset within a transaction). The default is `false`.
+ `PartitionIncludeSchemaTable` – Prefixes schema and table names to partition values, when the partition type is `primary-key-type`. Doing this increases data distribution among Kinesis shards. For example, suppose that a `SysBench` schema has thousands of tables and each table has only limited range for a primary key. In this case, the same primary key is sent from thousands of tables to the same shard, which causes throttling. The default is `false`.
+ `UseLargeIntegerValue` – Use large-integer values with up to 18 digits instead of casting them to doubles. Available in AWS DMS version 3.5.4 and higher. The default is `false`.

The following example shows the `kinesis-settings` option in use with an example `create-endpoint` command issued using the AWS CLI.

```
aws dms \
  create-endpoint \
    --region <aws-region> \
    --endpoint-identifier <user-endpoint-identifier> \
    --endpoint-type target \
    --engine-name kinesis \
    --kinesis-settings ServiceAccessRoleArn=arn:aws:iam::<account-id>:role/<kinesis-role-name>,StreamArn=arn:aws:kinesis:<aws-region>:<account-id>:stream/<stream-name>,MessageFormat=json-unformatted,IncludeControlDetails=true,IncludeTransactionDetails=true,IncludePartitionValue=true,PartitionIncludeSchemaTable=true,IncludeTableAlterOperations=true
```

**Multithreaded full load task settings**

To help increase the speed of the transfer, AWS DMS supports a multithreaded full load to a Kinesis Data Streams target instance. DMS supports this multithreading with task settings that include the following:
+ `MaxFullLoadSubTasks` – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding Kinesis target table using a dedicated subtask. The default is 8; the maximum value is 49.
+ `ParallelLoadThreads` – Use this option to specify the number of threads that AWS DMS uses to load each table into its Kinesis target table. The maximum value for a Kinesis Data Streams target is 32. You can ask to have this maximum limit increased.
+ `ParallelLoadBufferSize` – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the Kinesis target. The default value is 50. The maximum value is 1,000. Use this setting with `ParallelLoadThreads`. `ParallelLoadBufferSize` is valid only when there is more than one thread.
+ `ParallelLoadQueuesPerThread` – Use this option to specify the number of queues each concurrent thread accesses to take data records out of queues and generate a batch load for the target. The default is 1. However, for Kinesis targets of various payload sizes, the valid range is 5–512 queues per thread.
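Taken together, these options might appear in a task-settings document as follows. This is a hypothetical fragment with illustrative values chosen within the documented ranges, not a tuning recommendation:

```python
import json

# Hypothetical full-load task settings for a Kinesis target. The values are
# examples that stay within the ranges documented above.
task_settings = {
    "FullLoadSettings": {
        "MaxFullLoadSubTasks": 8          # default; maximum 49
    },
    "TargetMetadata": {
        "ParallelLoadThreads": 16,        # maximum 32 for Kinesis targets
        "ParallelLoadBufferSize": 500,    # default 50; maximum 1,000
        "ParallelLoadQueuesPerThread": 16 # valid range 5-512 for Kinesis
    }
}

tm = task_settings["TargetMetadata"]
assert tm["ParallelLoadThreads"] <= 32 and tm["ParallelLoadBufferSize"] <= 1000
print(json.dumps(task_settings, indent=4))
```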

**Multithreaded CDC load task settings**

You can improve the performance of change data capture (CDC) for real-time data streaming target endpoints like Kinesis using task settings to modify the behavior of the `PutRecords` API call. To do this, you can specify the number of concurrent threads, queues per thread, and the number of records to store in a buffer using `ParallelApply*` task settings. For example, suppose you want to perform a CDC load and apply 128 threads in parallel. You also want to access 64 queues per thread, with 50 records stored per buffer. 

To promote CDC performance, AWS DMS supports these task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to a Kinesis target endpoint. The default value is zero (0) and the maximum value is 32.
+ `ParallelApplyBufferSize` – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to a Kinesis target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when `ParallelApplyThreads` specifies more than one thread. 
+ `ParallelApplyQueuesPerThread` – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for a Kinesis endpoint during CDC. The default value is 1 and the maximum value is 512.

When using `ParallelApply*` task settings, the `partition-key-type` default is the `primary-key` of the table, not `schema-name.table-name`.
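A hypothetical `TargetMetadata` fragment tuning these settings might look like the following; the values are illustrative and kept within the documented maxima:

```python
# Hypothetical TargetMetadata fragment for tuning CDC throughput to a
# Kinesis target. Values are illustrative and within the documented maxima.
target_metadata = {
    "ParallelApplyThreads": 32,          # 0 disables multithreaded apply; maximum 32
    "ParallelApplyQueuesPerThread": 64,  # maximum 512
    "ParallelApplyBufferSize": 50,       # default 100; maximum 1,000
}

# Upper bound on records buffered across all queues with these settings
in_flight = (target_metadata["ParallelApplyThreads"]
             * target_metadata["ParallelApplyQueuesPerThread"]
             * target_metadata["ParallelApplyBufferSize"])
assert in_flight == 32 * 64 * 50
```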

## Using a before image to view original values of CDC rows for a Kinesis data stream as a target
<a name="CHAP_Target.Kinesis.BeforeImage"></a>

When writing CDC updates to a data-streaming target like Kinesis, you can view a source database row's original values before they were changed by an update. To make this possible, AWS DMS populates a *before image* of update events based on data supplied by the source database engine. 

Different source database engines provide different amounts of information for a before image: 
+ Oracle provides updates to columns only if they change. 
+ PostgreSQL provides only data for columns that are part of the primary key (changed or not). To provide data for all columns (changed or not), you need to set `REPLICA_IDENTITY` to `FULL` instead of `DEFAULT`. Note that you should choose the `REPLICA_IDENTITY` setting carefully for each table. If you set `REPLICA_IDENTITY` to `FULL`, all of the column values are written to write-ahead logging (WAL) continuously. This may cause performance or resource issues with tables that are updated frequently.
+ MySQL generally provides data for all columns except for BLOB and CLOB data types (changed or not).

To enable before imaging to add original values from the source database to the AWS DMS output, use either the `BeforeImageSettings` task setting or the `add-before-image-columns` parameter. This parameter applies a column transformation rule. 

`BeforeImageSettings` adds a new JSON attribute to every update operation with values collected from the source database system, as shown following.

```
"BeforeImageSettings": {
    "EnableBeforeImage": boolean,
    "FieldName": string,  
    "ColumnFilter": pk-only (default) / non-lob / all (but only one)
}
```

**Note**  
Only apply `BeforeImageSettings` to AWS DMS tasks that contain a CDC component, such as full load plus CDC tasks (which migrate existing data and replicate ongoing changes), or to CDC only tasks (which replicate data changes only). Don't apply `BeforeImageSettings` to tasks that are full load only.

For `BeforeImageSettings` options, the following applies:
+ Set the `EnableBeforeImage` option to `true` to enable before imaging. The default is `false`. 
+ Use the `FieldName` option to assign a name to the new JSON attribute. When `EnableBeforeImage` is `true`, `FieldName` is required and can't be empty.
+ The `ColumnFilter` option specifies which columns to add by using before imaging. To add only columns that are part of the table's primary keys, use the default value, `pk-only`. To add any column that has a before image value, use `all`. Note that the before image does not contain columns with LOB data types, such as CLOB or BLOB.

  ```
  "BeforeImageSettings": {
      "EnableBeforeImage": true,
      "FieldName": "before-image",
      "ColumnFilter": "pk-only"
    }
  ```

**Note**  
Amazon S3 targets don't support `BeforeImageSettings`. For S3 targets, use only the `add-before-image-columns` transformation rule to perform before imaging during CDC.

### Using a before image transformation rule
<a name="CHAP_Target.Kinesis.BeforeImage.Transform-Rule"></a>

As an alternative to task settings, you can use the `add-before-image-columns` parameter, which applies a column transformation rule. With this parameter, you can enable before imaging during CDC on data streaming targets like Kinesis.

By using `add-before-image-columns` in a transformation rule, you can apply more fine-grained control of the before image results. Transformation rules enable you to use an object locator that gives you control over tables selected for the rule. Also, you can chain transformation rules together, which allows different rules to be applied to different tables. You can then manipulate the columns produced by using other rules. 

**Note**  
Don't use the `add-before-image-columns` parameter together with the `BeforeImageSettings` task setting within the same task. Instead, use either the parameter or the setting, but not both, for a single task.

A `transformation` rule type with the `add-before-image-columns` parameter for a column must provide a `before-image-def` section. The following shows an example.

```
    {
      "rule-type": "transformation",
      …
      "rule-target": "column",
      "rule-action": "add-before-image-columns",
      "before-image-def":{
        "column-filter": one-of  (pk-only / non-lob / all),
        "column-prefix": string,
        "column-suffix": string,
      }
    }
```

The value of `column-prefix` is prepended to a column name, and the default value of `column-prefix` is `BI_`. The value of `column-suffix` is appended to the column name, and the default is empty. Don't set both `column-prefix` and `column-suffix` to empty strings.

Choose one value for `column-filter`. To add only columns that are part of table primary keys, choose `pk-only`. Choose `non-lob` to add only columns that are not of LOB type. Or choose `all` to add any column that has a before-image value.
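The naming rule for the generated before-image columns can be sketched as follows (an illustration of the prefix/suffix behavior described above, not AWS DMS code):

```python
def before_image_name(column, prefix="BI_", suffix=""):
    """Build a before-image column name from column-prefix and
    column-suffix; the defaults here match before-image-def's defaults."""
    if not prefix and not suffix:
        raise ValueError("column-prefix and column-suffix can't both be empty")
    return f"{prefix}{column}{suffix}"

assert before_image_name("emp_no") == "BI_emp_no"
assert before_image_name("emp_no", prefix="", suffix="_old") == "emp_no_old"
```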

### Example for a before image transformation rule
<a name="CHAP_Target.Kinesis.BeforeImage.Example"></a>

The transformation rule in the following example adds a new column called `BI_emp_no` in the target. So a statement like `UPDATE employees SET emp_no = 3 WHERE emp_no = 1;` populates the `BI_emp_no` field with 1. When you write CDC updates to Amazon S3 targets, the `BI_emp_no` column makes it possible to tell which original row was updated.

```
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "2",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "employees"
      },
      "rule-action": "add-before-image-columns",
      "before-image-def": {
        "column-prefix": "BI_",
        "column-suffix": "",
        "column-filter": "pk-only"
      }
    }
  ]
}
```
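The effect of this rule on an update event can be modeled as follows. This is a simplified sketch of the `pk-only` behavior, not the exact record layout that AWS DMS emits:

```python
def apply_before_image(before_row, after_row, pk_columns, prefix="BI_"):
    """Simplified model of add-before-image-columns with column-filter
    pk-only: add a prefixed copy of each primary-key column's old value."""
    record = dict(after_row)
    for col in pk_columns:
        record[prefix + col] = before_row[col]
    return record

# UPDATE employees SET emp_no = 3 WHERE emp_no = 1;
updated = apply_before_image({"emp_no": 1}, {"emp_no": 3}, ["emp_no"])
assert updated == {"emp_no": 3, "BI_emp_no": 1}
```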

For information on using the `add-before-image-columns` rule action, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

## Prerequisites for using a Kinesis data stream as a target for AWS Database Migration Service
<a name="CHAP_Target.Kinesis.Prerequisites"></a>

### IAM role for using a Kinesis data stream as a target for AWS Database Migration Service
<a name="CHAP_Target.Kinesis.Prerequisites.IAM"></a>

Before you set up a Kinesis data stream as a target for AWS DMS, make sure that you create an IAM role. This role must allow AWS DMS to assume it, and must grant access to the Kinesis data streams that are being migrated into. The minimum set of access permissions is shown in the following IAM policy.


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```


The role that you use for the migration to a Kinesis data stream must have the following permissions.


```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:PutRecord",
        "kinesis:PutRecords"
      ],
      "Resource": "*"
    }
  ]
}
```


### Accessing a Kinesis data stream as a target for AWS Database Migration Service
<a name="CHAP_Target.Kinesis.Prerequisites.Access"></a>

In AWS DMS version 3.4.7 and higher, to connect to a Kinesis endpoint, you must do one of the following:
+ Configure DMS to use VPC endpoints. For information about configuring DMS to use VPC endpoints, see [Configuring VPC endpoints for AWS DMS](CHAP_VPC_Endpoints.md).
+ Configure DMS to use public routes, that is, make your replication instance public. For information about public replication instances, see [Public and private replication instances](CHAP_ReplicationInstance.PublicPrivate.md).

## Limitations when using Kinesis Data Streams as a target for AWS Database Migration Service
<a name="CHAP_Target.Kinesis.Limitations"></a>

The following limitations apply when using Kinesis Data Streams as a target:
+ AWS DMS publishes each update to a single record in the source database as one data record in a given Kinesis data stream regardless of transactions. However, you can include transaction details for each data record by using relevant parameters of the `KinesisSettings` API.
+ Full LOB mode is not supported.
+ The maximum supported LOB size is 1 MB.
+ Kinesis Data Streams don't support deduplication. Applications that consume data from a stream need to handle duplicate records. For more information, see [Handling duplicate records](https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-duplicates.html) in the *Amazon Kinesis Data Streams Developer Guide.*
+ AWS DMS supports the following four forms for partition keys:
  + `SchemaName.TableName`: A combination of the schema and table name.
  + `${AttributeName}`: The value of one of the fields in the JSON, or the primary key of the table in the source database.
  + `transaction-id`: The CDC transaction ID. All records within the same transaction go to the same partition.
  + `constant`: A fixed literal value for every record, regardless of table or data. Because every record shares the same partition key, all records map to the same shard, which provides strict ordering across all tables.

  ```
  {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "PartitionKeyTypeExample",
      "rule-action": "map-record-to-document",
      "object-locator": {
          "schema-name": "onprem",
          "table-name": "it_system"
      },
      "mapping-parameters": {
          "partition-key-type": "transaction-id | constant | attribute-name | schema-table"
      }
  }
  ```
+ For information about encrypting your data at rest within Kinesis Data Streams, see [Data protection in Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/server-side-encryption.html) in the *Amazon Kinesis Data Streams Developer Guide*. 
+ The `IncludeTransactionDetails` endpoint setting is only supported when the source endpoint is Oracle, SQL Server, PostgreSQL, or MySQL. For other source endpoint types, transaction details aren't included.
+ `BatchApply` is not supported for a Kinesis endpoint. Using Batch Apply (for example, the `BatchApplyEnabled` target metadata task setting) for a Kinesis target causes task failure and data loss. Do not enable `BatchApply` when using Kinesis as a target endpoint.
+ Kinesis targets are only supported for a Kinesis data stream in the same AWS account and the same AWS Region as the replication instance.
+ When migrating from a MySQL source, the BeforeImage data doesn't include CLOB and BLOB data types. For more information, see [Using a before image to view original values of CDC rows for a Kinesis data stream as a target](#CHAP_Target.Kinesis.BeforeImage).
+ AWS DMS doesn't support migrating values of `BigInt` data type with more than 16 digits. To work around this limitation, you can use the following transformation rule to convert the `BigInt` column to a string. For more information about transformation rules, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

  ```
  {
      "rule-type": "transformation",
      "rule-id": "id",
      "rule-name": "name",
      "rule-target": "column",
      "object-locator": {
          "schema-name": "valid object-mapping rule action",
          "table-name": "",
          "column-name": ""
      },
      "rule-action": "change-data-type",
      "data-type": {
          "type": "string",
          "length": 20
      }
  }
  ```
+ When multiple DML operations within a single transaction modify a Large Object (LOB) column on the source database, the target database retains only the final LOB value from the last operation in that transaction. The intermediate LOB values set by earlier operations in the same transaction are overwritten, which can result in potential data loss or inconsistencies. This behavior occurs due to how LOB data is processed during replication.
+ AWS DMS does not support source data containing embedded `'\0'` characters when using Kinesis as a target endpoint. Data containing embedded `'\0'` characters will be truncated at the first `'\0'` character.
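Because Kinesis Data Streams doesn't deduplicate (see the limitations above), consumers typically track an idempotency key per record. The following sketch assumes each record carries fields (here, hypothetical `table`, `pk`, and `commit_ts` values) that together form a stable key:

```python
def deduplicate(records):
    """Drop records whose idempotency key was already seen. The key fields
    here (table, pk, commit_ts) are hypothetical; use whatever uniquely
    identifies a change event in your stream."""
    seen = set()
    unique = []
    for record in records:
        key = (record["table"], record["pk"], record["commit_ts"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

batch = [
    {"table": "employees", "pk": 1, "commit_ts": "2024-01-01T00:00:00Z"},
    {"table": "employees", "pk": 1, "commit_ts": "2024-01-01T00:00:00Z"},  # duplicate delivery
    {"table": "employees", "pk": 2, "commit_ts": "2024-01-01T00:00:01Z"},
]
assert len(deduplicate(batch)) == 2
```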

## Using object mapping to migrate data to a Kinesis data stream
<a name="CHAP_Target.Kinesis.ObjectMapping"></a>

AWS DMS uses table-mapping rules to map data from the source to the target Kinesis data stream. To map data to a target stream, you use a type of table-mapping rule called object mapping. You use object mapping to define how data records in the source map to the data records published to the Kinesis data stream. 

Kinesis data streams don't have a preset structure other than having a partition key. In an object mapping rule, the possible values of a `partition-key-type` for data records are `schema-table`, `transaction-id`, `primary-key`, `constant`, and `attribute-name`.

To create an object-mapping rule, you specify `rule-type` as `object-mapping`. This rule specifies what type of object mapping you want to use. 

The structure for the rule is as follows.

```
{
    "rules": [
        {
            "rule-type": "object-mapping",
            "rule-id": "id",
            "rule-name": "name",
            "rule-action": "valid object-mapping rule action",
            "object-locator": {
                "schema-name": "case-sensitive schema name",
                "table-name": ""
            }
        }
    ]
}
```

AWS DMS currently supports `map-record-to-record` and `map-record-to-document` as the only valid values for the `rule-action` parameter. These settings affect values that aren't excluded as part of the `exclude-columns` attribute list. The `map-record-to-record` and `map-record-to-document` values specify how AWS DMS handles these records by default. These values don't affect the attribute mappings in any way. 

Use `map-record-to-record` when migrating from a relational database to a Kinesis data stream. This rule type uses the `taskResourceId.schemaName.tableName` value from the relational database as the partition key in the Kinesis data stream and creates an attribute for each column in the source database. 

When using `map-record-to-record`, note the following:
+ This setting affects only columns that aren't excluded by the `exclude-columns` attribute list.
+ For every such column, AWS DMS creates a corresponding attribute in the target stream.
+ AWS DMS creates this corresponding attribute regardless of whether the source column is used in an attribute mapping. 

Use `map-record-to-document` to put source columns into a single, flat document in the appropriate target stream using the attribute name "`_doc`". AWS DMS places the data into a single, flat map on the target called "`_doc`". This placement applies to any column in the source table not listed in the `exclude-columns` attribute list.

One way to understand `map-record-to-record` is to see it in action. For this example, assume that you are starting with a relational database table row with the following structure and data.


| FirstName | LastName | StoreId | HomeAddress | HomePhone | WorkAddress | WorkPhone | DateOfBirth | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
| Randy | Marsh | 5 | 221B Baker Street | 1234567890 | 31 Spooner Street, Quahog  | 9876543210 | 02/29/1988 | 

To migrate this information from a schema named `Test` to a Kinesis data stream, you create rules to map the data to the target stream. The following rule illustrates the mapping. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "DefaultMapToKinesis",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customers"
            }
        }
    ]
}
```

The following illustrates the resulting record format in the Kinesis data stream: 
+ StreamName: XXX
+ PartitionKey: Test.Customers //schemaName.tableName
+ Data: //The following JSON message

  ```
    {
       "FirstName": "Randy",
       "LastName": "Marsh",
       "StoreId":  "5",
       "HomeAddress": "221B Baker Street",
       "HomePhone": "1234567890",
       "WorkAddress": "31 Spooner Street, Quahog",
       "WorkPhone": "9876543210",
       "DateOfBirth": "02/29/1988"
    }
  ```
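
The partition key determines which shard receives each record: Kinesis takes the MD5 hash of the key and routes the record to the shard whose hash-key range contains the result. The following sketch models that routing for the `Test.Customers` partition key, assuming equal-width shard ranges; the `shard_for_partition_key` helper is illustrative, not part of AWS DMS or the Kinesis API.

```python
import hashlib

def shard_for_partition_key(partition_key: str, shard_count: int) -> int:
    """Model Kinesis shard routing: MD5-hash the partition key to a
    128-bit integer and pick the shard whose range contains it."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_width = 2 ** 128 // shard_count
    return min(hash_key // range_width, shard_count - 1)

# With map-record-to-record, every row of Test.Customers shares the same
# partition key, so all rows land in the same shard.
print(shard_for_partition_key("Test.Customers", 4))
```

Because `map-record-to-record` derives one partition key per table, a single hot table migrates through a single shard; the `attribute-name` or `primary-key` partition-key types spread the load more evenly.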

However, suppose that you use the same rules but change the `rule-action` parameter to `map-record-to-document` and exclude certain columns. The following rule illustrates the mapping.

```
{
	"rules": [
	   {
			"rule-type": "selection",
			"rule-id": "1",
			"rule-name": "1",
			"rule-action": "include",
			"object-locator": {
				"schema-name": "Test",
				"table-name": "%"
			}
		},
		{
			"rule-type": "object-mapping",
			"rule-id": "2",
			"rule-name": "DefaultMapToKinesis",
			"rule-action": "map-record-to-document",
			"object-locator": {
				"schema-name": "Test",
				"table-name": "Customers"
			},
			"mapping-parameters": {
				"exclude-columns": [
					"homeaddress",
					"homephone",
					"workaddress",
					"workphone"
				]
			}
		}
	]
}
```

In this case, the columns not listed in the `exclude-columns` parameter, `FirstName`, `LastName`, `StoreId` and `DateOfBirth`, are mapped to `_doc`. The following illustrates the resulting record format. 

```
       {
            "data":{
                "_doc":{
                    "FirstName": "Randy",
                    "LastName": "Marsh",
                    "StoreId":  "5",
                    "DateOfBirth": "02/29/1988"
                }
            }
        }
```
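
The fold-into-`_doc` behavior can be sketched in a few lines. This is a toy model of `map-record-to-document`, not the DMS implementation; the `map_record_to_document` helper is hypothetical.

```python
import json

def map_record_to_document(row: dict, exclude_columns: list) -> str:
    """Toy model: columns not listed in exclude-columns are folded
    into a single flat "_doc" map, matching the record format above."""
    excluded = {name.lower() for name in exclude_columns}
    doc = {col: val for col, val in row.items() if col.lower() not in excluded}
    return json.dumps({"data": {"_doc": doc}})

row = {
    "FirstName": "Randy", "LastName": "Marsh", "StoreId": "5",
    "HomeAddress": "221B Baker Street", "HomePhone": "1234567890",
    "WorkAddress": "31 Spooner Street, Quahog", "WorkPhone": "9876543210",
    "DateOfBirth": "02/29/1988",
}
print(map_record_to_document(row, ["homeaddress", "homephone", "workaddress", "workphone"]))
```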

### Restructuring data with attribute mapping
<a name="CHAP_Target.Kinesis.AttributeMapping"></a>

You can restructure the data while you are migrating it to a Kinesis data stream using an attribute map. For example, you might want to combine several fields in the source into a single field in the target. The following attribute map illustrates how to restructure the data.

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToKinesis",
            "rule-action": "map-record-to-record",
            "target-table-name": "CustomerData",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customers"
            },
            "mapping-parameters": {
                "partition-key-type": "attribute-name",
                "partition-key-name": "CustomerName",
                "exclude-columns": [
                    "firstname",
                    "lastname",
                    "homeaddress",
                    "homephone",
                    "workaddress",
                    "workphone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${lastname}, ${firstname}"
                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "document",
                        "attribute-sub-type": "json",
                        "value": {
                            "Home": {
                                "Address": "${homeaddress}",
                                "Phone": "${homephone}"
                            },
                            "Work": {
                                "Address": "${workaddress}",
                                "Phone": "${workphone}"
                            }
                        }
                    }
                ]
            }
        }
    ]
}
```
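
Conceptually, each attribute mapping is a template over the source row: `${column}` placeholders are replaced by the row's values, recursively for `document`-type attributes. The following sketch models that substitution for the `CustomerName` and `ContactDetails` mappings above; the `resolve` helper is illustrative, not part of DMS.

```python
import re

def resolve(template, row):
    """Recursively substitute ${column} placeholders with row values."""
    if isinstance(template, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: str(row[m.group(1)]), template)
    if isinstance(template, dict):
        return {key: resolve(value, row) for key, value in template.items()}
    return template

row = {"firstname": "Randy", "lastname": "Marsh",
       "homeaddress": "221B Baker Street", "homephone": "1234567890",
       "workaddress": "31 Spooner Street, Quahog", "workphone": "9876543210"}

customer_name = resolve("${lastname}, ${firstname}", row)
contact = resolve({"Home": {"Address": "${homeaddress}", "Phone": "${homephone}"},
                   "Work": {"Address": "${workaddress}", "Phone": "${workphone}"}}, row)
print(customer_name)
```

Here `customer_name` also becomes the partition key, because the rule sets `partition-key-type` to `attribute-name` with a `partition-key-name` of `CustomerName`.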

To set a constant value for the partition key, specify `"partition-key-type": "constant"` in the mapping parameters. You might do this, for example, to force all the data to be stored in a single shard. The following mapping illustrates this approach. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToKinesis",
            "rule-action": "map-record-to-document",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customer"
            },
            "mapping-parameters": {
                "partition-key-type": "constant",
                "exclude-columns": [
                    "FirstName",
                    "LastName",
                    "HomeAddress",
                    "HomePhone",
                    "WorkAddress",
                    "WorkPhone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${FirstName},${LastName}"

                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": {
                            "Home": {
                                "Address": "${HomeAddress}",
                                "Phone": "${HomePhone}"
                            },
                            "Work": {
                                "Address": "${WorkAddress}",
                                "Phone": "${WorkPhone}"
                            }
                        }
                    },
                    {
                        "target-attribute-name": "DateOfBirth",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${DateOfBirth}"
                    }
                ]
            }
        }
    ]
}
```

**Note**  
The `partition-key` value for a control record that is for a specific table is `TaskId.SchemaName.TableName`. The `partition-key` value for a control record that is for a specific task is that record's `TaskId`. Specifying a `partition-key` value in the object mapping has no impact on the `partition-key` for a control record.  
 When `partition-key-type` is set to `attribute-name` in a table-mapping rule, you must specify `partition-key-name`, which must reference either a column from the source table or a custom column defined in the mapping. Additionally, you must provide `attribute-mappings` to define how source columns map to the target Kinesis data stream.

### Message format for Kinesis Data Streams
<a name="CHAP_Target.Kinesis.Messageformat"></a>

The JSON output is simply a list of key-value pairs. The `JSON_UNFORMATTED` message format is a single-line JSON string with a newline delimiter.

AWS DMS provides the following reserved fields to make it easier to consume the data from the Kinesis Data Streams: 

**RecordType**  
The record type can be either data or control. *Data records* represent the actual rows in the source. *Control records* are for important events in the stream, for example a restart of the task.

**Operation**  
For data records, the operation can be `load`, `insert`, `update`, or `delete`.  
For control records, the operation can be `create-table`, `rename-table`, `drop-table`, `change-columns`, `add-column`, `drop-column`, `rename-column`, or `column-type-change`.

**SchemaName**  
The source schema for the record. This field can be empty for a control record.

**TableName**  
The source table for the record. This field can be empty for a control record.

**Timestamp**  
The timestamp for when the JSON message was constructed. The field is formatted with the ISO 8601 format.
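
A minimal sketch of how such a message can be assembled and serialized in the `JSON_UNFORMATTED` format (one JSON object per line). The envelope layout and lowercase metadata key names here are an assumption for illustration; the `build_dms_message` helper is hypothetical.

```python
import json
from datetime import datetime, timezone

def build_dms_message(record_type, operation, schema, table, data):
    """Assemble a DMS-style message: row data plus reserved metadata
    fields (record type, operation, schema, table, ISO 8601 timestamp)."""
    envelope = {
        "data": data,
        "metadata": {
            "record-type": record_type,   # "data" or "control"
            "operation": operation,       # load / insert / update / delete
            "schema-name": schema,
            "table-name": table,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    }
    # JSON_UNFORMATTED: a single-line JSON string with a newline delimiter
    return json.dumps(envelope, separators=(",", ":")) + "\n"

msg = build_dms_message("data", "insert", "Test", "Customers", {"StoreId": "5"})
```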

# Using Apache Kafka as a target for AWS Database Migration Service
<a name="CHAP_Target.Kafka"></a>

You can use AWS DMS to migrate data to an Apache Kafka cluster. Apache Kafka is a distributed streaming platform. You can use Apache Kafka for ingesting and processing streaming data in real-time.

AWS also offers Amazon Managed Streaming for Apache Kafka (Amazon MSK) to use as an AWS DMS target. Amazon MSK is a fully managed Apache Kafka streaming service that simplifies the implementation and management of Apache Kafka instances. It works with open-source Apache Kafka versions, and you access Amazon MSK instances as AWS DMS targets exactly like any Apache Kafka instance. For more information, see [What is Amazon MSK?](https://docs.aws.amazon.com/msk/latest/developerguide/what-is-msk.html) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

A Kafka cluster stores streams of records in categories called topics that are divided into partitions. *Partitions* are uniquely identified sequences of data records (messages) in a topic. Partitions can be distributed across multiple brokers in a cluster to enable parallel processing of a topic’s records. For more information on topics and partitions and their distribution in Apache Kafka, see [Topics and logs](https://kafka.apache.org/documentation/#intro_topics) and [Distribution](https://kafka.apache.org/documentation/#intro_distribution).

Your Kafka cluster can be either an Amazon MSK instance, a cluster running on an Amazon EC2 instance, or an on-premises cluster. An Amazon MSK instance or a cluster on an Amazon EC2 instance can be in the same VPC or a different one. If your cluster is on-premises, you can use your own on-premises name server for your replication instance to resolve the cluster's host name. For information about setting up a name server for your replication instance, see [Using your own on-premises name server](CHAP_BestPractices.md#CHAP_BestPractices.Rte53DNSResolver). For more information about setting up a network, see [Setting up a network for a replication instance](CHAP_ReplicationInstance.VPC.md).

When using an Amazon MSK cluster, make sure that its security group allows access from your replication instance. For information about changing the security group for an Amazon MSK cluster, see [Changing an Amazon MSK cluster's security group](https://docs.aws.amazon.com/msk/latest/developerguide/change-security-group.html).

AWS Database Migration Service publishes records to a Kafka topic using JSON. During conversion, AWS DMS serializes each record from the source database into an attribute-value pair in JSON format.

To migrate your data from any supported data source to a target Kafka cluster, you use object mapping. With object mapping, you determine how to structure the data records in the target topic. You also define a partition key for each table, which Apache Kafka uses to group the data into its partitions. 

Currently, AWS DMS supports a single topic per task. For a single task with multiple tables, all messages go to a single topic. Each message includes a metadata section that identifies the target schema and table. AWS DMS versions 3.4.6 and higher support multitopic replication using object mapping. For more information, see [Multitopic replication using object mapping](#CHAP_Target.Kafka.MultiTopic).

**Apache Kafka endpoint settings**

You can specify connection details through endpoint settings in the AWS DMS console, or the `--kafka-settings` option in the CLI. The requirements for each setting follow:
+ `Broker` – Specify the locations of one or more brokers in your Kafka cluster in the form of a comma-separated list of each `broker-hostname:port`. An example is `"ec2-12-345-678-901.compute-1.amazonaws.com:2345,ec2-10-987-654-321.compute-1.amazonaws.com:9876"`. This setting can specify the locations of any or all brokers in the cluster. The cluster brokers all communicate to handle the partitioning of data records migrated to the topic.
+ `Topic` – (Optional) Specify the topic name with a maximum length of 255 letters and symbols. You can use period (.), underscore (_), and minus (-). Topic names with a period (.) or underscore (_) can collide in internal data structures. Use either one, but not both, of these symbols in the topic name. If you don't specify a topic name, AWS DMS uses `"kafka-default-topic"` as the migration topic.
**Note**  
To have AWS DMS create either a migration topic you specify or the default topic, set `auto.create.topics.enable = true` as part of your Kafka cluster configuration. For more information, see [Limitations when using Apache Kafka as a target for AWS Database Migration Service](#CHAP_Target.Kafka.Limitations).
+ `MessageFormat` – The output format for the records created on the endpoint. The message format is `JSON` (default) or `JSON_UNFORMATTED` (a single line with no tab).
+ `MessageMaxBytes` – The maximum size in bytes for records created on the endpoint. The default is 1,000,000.
**Note**  
You can only use the AWS CLI/SDK to change `MessageMaxBytes` to a non-default value. For example, to modify your existing Kafka endpoint and change `MessageMaxBytes`, use the following command.  

  ```
  aws dms modify-endpoint --endpoint-arn your-endpoint 
  --kafka-settings Broker="broker1-server:broker1-port,broker2-server:broker2-port,...",
  Topic=topic-name,MessageMaxBytes=integer-of-max-message-size-in-bytes
  ```
+ `IncludeTransactionDetails` – Provides detailed transaction information from the source database. This information includes a commit timestamp, a log position, and values for `transaction_id`, `previous_transaction_id`, and `transaction_record_id` (the record offset within a transaction). The default is `false`.
+ `IncludePartitionValue` – Shows the partition value within the Kafka message output, unless the partition type is `schema-table-type`. The default is `false`.
+ `PartitionIncludeSchemaTable` – Prefixes schema and table names to partition values, when the partition type is `primary-key-type`. Doing this increases data distribution among Kafka partitions. For example, suppose that a `SysBench` schema has thousands of tables and each table has only limited range for a primary key. In this case, the same primary key is sent from thousands of tables to the same partition, which causes throttling. The default is `false`.
+ `IncludeTableAlterOperations` – Includes any data definition language (DDL) operations that change the table in the control data, such as `rename-table`, `drop-table`, `add-column`, `drop-column`, and `rename-column`. The default is `false`. 
+ `IncludeControlDetails` – Shows detailed control information for table definition, column definition, and table and column changes in the Kafka message output. The default is `false`.
+ `IncludeNullAndEmpty` – Include NULL and empty columns in the target. The default is `false`.
+ `SecurityProtocol` – Sets a secure connection to a Kafka target endpoint using Transport Layer Security (TLS). Options include `ssl-authentication`, `ssl-encryption`, and `sasl-ssl`. Using `sasl-ssl` requires `SaslUsername` and `SaslPassword`.
+ `SslEndpointIdentificationAlgorithm` – Sets hostname verification for the certificate. This setting is supported in AWS DMS version 3.5.1 and later. Options include the following: 
  + `NONE`: Disable hostname verification of the broker in the client connection.
  + `HTTPS`: Enable hostname verification of the broker in the client connection.
+ `useLargeIntegerValue` – Use integer values of up to 18 digits instead of casting them as doubles. Available in AWS DMS versions 3.5.4 and later. The default is `false`.
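
When scripting endpoint creation or modification, it can help to build the Kafka settings as a dictionary and serialize them into the single JSON argument that `--kafka-settings` expects. All values below (broker addresses, topic name, credentials) are placeholders.

```python
import json

kafka_settings = {
    "Broker": "broker1-server:9092,broker2-server:9092",  # placeholder hosts
    "Topic": "dms-migration-topic",  # use period OR underscore, not both
    "MessageFormat": "JSON_UNFORMATTED",
    "MessageMaxBytes": 2000000,
    "IncludeTransactionDetails": True,
    "IncludePartitionValue": True,
    "SecurityProtocol": "sasl-ssl",
    "SaslUsername": "example-user",      # placeholder credentials
    "SaslPassword": "example-password",
}

# Pass this string as the value of --kafka-settings on the AWS CLI.
cli_arg = json.dumps(kafka_settings)
print(cli_arg)
```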

You can use settings to help increase the speed of your transfer. To do so, AWS DMS supports a multithreaded full load to an Apache Kafka target cluster. AWS DMS supports this multithreading with task settings that include the following:
+ `MaxFullLoadSubTasks` – Use this option to indicate the maximum number of source tables to load in parallel. AWS DMS loads each table into its corresponding Kafka target table using a dedicated subtask. The default is 8; the maximum value is 49.
+ `ParallelLoadThreads` – Use this option to specify the number of threads that AWS DMS uses to load each table into its Kafka target table. The maximum value for an Apache Kafka target is 32. You can ask to have this maximum limit increased.
+ `ParallelLoadBufferSize` – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the Kafka target. The default value is 50. The maximum value is 1,000. Use this setting with `ParallelLoadThreads`. `ParallelLoadBufferSize` is valid only when there is more than one thread.
+ `ParallelLoadQueuesPerThread` – Use this option to specify the number of queues each concurrent thread accesses to take data records out of queues and generate a batch load for the target. The default is 1. The maximum is 512.

You can improve the performance of change data capture (CDC) for Kafka endpoints by tuning task settings for parallel threads and bulk operations. To do this, you can specify the number of concurrent threads, queues per thread, and the number of records to store in a buffer using `ParallelApply*` task settings. For example, suppose you want to perform a CDC load and apply 128 threads in parallel. You also want to access 64 queues per thread, with 50 records stored per buffer. 

To promote CDC performance, AWS DMS supports these task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to a Kafka target endpoint. The default value is zero (0) and the maximum value is 32.
+ `ParallelApplyBufferSize` – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to a Kafka target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when `ParallelApplyThreads` specifies more than one thread. 
+ `ParallelApplyQueuesPerThread` – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for a Kafka endpoint during CDC. The default is 1. The maximum is 512.

When using `ParallelApply*` task settings, the `partition-key-type` default is the `primary-key` of the table, not `schema-name.table-name`.
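
One way to picture the `ParallelApply*` settings is as a hash-routed grid of queues: each change record is routed by its partition key, so all changes to the same key flow through the same queue and keep their commit order. The following toy model (the `queue_for` helper is hypothetical, not the DMS implementation) illustrates the idea with 4 threads and 2 queues per thread.

```python
from collections import defaultdict
import hashlib

def queue_for(primary_key: str, threads: int, queues_per_thread: int):
    """Route a record to a (thread, queue) slot by hashing its key,
    so changes to one key always share a queue and stay ordered."""
    digest = int(hashlib.md5(primary_key.encode("utf-8")).hexdigest(), 16)
    slot = digest % (threads * queues_per_thread)
    return slot // queues_per_thread, slot % queues_per_thread

batches = defaultdict(list)
for key in ["cust-1", "cust-2", "cust-1", "cust-3"]:
    batches[queue_for(key, 4, 2)].append(key)

# Both cust-1 changes land in the same batch, preserving their order.
```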

## Connecting to Kafka using Transport Layer Security (TLS)
<a name="CHAP_Target.Kafka.TLS"></a>

A Kafka cluster accepts secure connections using Transport Layer Security (TLS). With DMS, you can use any one of the following three security protocol options to secure a Kafka endpoint connection.

**SSL encryption (`server-encryption`)**  
Clients validate server identity through the server’s certificate. Then an encrypted connection is made between server and client.

**SSL authentication (`mutual-authentication`)**  
Server and client validate the identity with each other through their own certificates. Then an encrypted connection is made between server and client.

**SASL-SSL (`mutual-authentication`)**  
The Simple Authentication and Security Layer (SASL) method replaces the client’s certificate with a user name and password to validate a client identity. Specifically, you provide a user name and password that the server has registered so that the server can validate the identity of a client. Then an encrypted connection is made between server and client.

**Important**  
Apache Kafka and Amazon MSK accept resolved certificates. This is a known limitation of Kafka and Amazon MSK to be addressed. For more information, see [Apache Kafka issues, KAFKA-3700](https://issues.apache.org/jira/browse/KAFKA-3700).  
If you're using Amazon MSK, consider using access control lists (ACLs) as a workaround to this known limitation. For more information about using ACLs, see the [Apache Kafka ACLs](https://docs.aws.amazon.com//msk/latest/developerguide/msk-acls.html) section of the *Amazon Managed Streaming for Apache Kafka Developer Guide*.  
If you're using a self-managed Kafka cluster, see [Comment dated 21/Oct/18](https://issues.apache.org/jira/browse/KAFKA-3700?focusedCommentId=16658376) for information about configuring your cluster.

### Using SSL encryption with Amazon MSK or a self-managed Kafka cluster
<a name="CHAP_Target.Kafka.TLS.SSLencryption"></a>

You can use SSL encryption to secure an endpoint connection to Amazon MSK or a self-managed Kafka cluster. When you use the SSL encryption authentication method, clients validate a server's identity through the server’s certificate. Then an encrypted connection is made between server and client.

**To use SSL encryption to connect to Amazon MSK**
+ Set the security protocol endpoint setting (`SecurityProtocol`) using the `ssl-encryption` option when you create your target Kafka endpoint. 

  The JSON example following sets the security protocol as SSL encryption.

```
"KafkaSettings": {
    "SecurityProtocol": "ssl-encryption", 
}
```

**To use SSL encryption for a self-managed Kafka cluster**

1. If you're using a private Certification Authority (CA) in your on-premises Kafka cluster, upload your private CA cert and get an Amazon Resource Name (ARN). 

1. Set the security protocol endpoint setting (`SecurityProtocol`) using the `ssl-encryption` option when you create your target Kafka endpoint. The JSON example following sets the security protocol as `ssl-encryption`.

   ```
   "KafkaSettings": {
       "SecurityProtocol": "ssl-encryption", 
   }
   ```

1. If you're using a private CA, set `SslCaCertificateArn` to the ARN that you got in step 1.

### Using SSL authentication
<a name="CHAP_Target.Kafka.TLS.SSLauthentication"></a>

You can use SSL authentication to secure an endpoint connection to Amazon MSK or a self-managed Kafka cluster.

To enable client authentication and encryption using SSL authentication to connect to Amazon MSK, do the following:
+ Prepare a private key and public certificate for Kafka.
+ Upload certificates to the DMS certificate manager.
+ Create a Kafka target endpoint with corresponding certificate ARNs specified in Kafka endpoint settings.

**To prepare a private key and public certificate for Amazon MSK**

1. Create an EC2 instance and set up a client to use authentication as described in steps 1 through 9 in the [Client Authentication](https://docs.aws.amazon.com/msk/latest/developerguide/msk-authentication.html) section of *Amazon Managed Streaming for Apache Kafka Developer Guide*.

   After you complete those steps, you have a Certificate-ARN (the public certificate ARN saved in ACM), and a private key contained within a `kafka.client.keystore.jks` file.

1. Get the public certificate and copy the certificate to the `signed-certificate-from-acm.pem` file, using the command following:

   ```
   aws acm-pca get-certificate --certificate-authority-arn Private_CA_ARN --certificate-arn Certificate_ARN
   ```

   That command returns information similar to the following example:

   ```
   {"Certificate": "123", "CertificateChain": "456"}
   ```

   You then copy your equivalent of `"123"` to the `signed-certificate-from-acm.pem` file.

1. Get the private key by importing the `msk-rsa` key from `kafka.client.keystore.jks` to `keystore.p12`, as shown in the following example.

   ```
   keytool -importkeystore \
   -srckeystore kafka.client.keystore.jks \
   -destkeystore keystore.p12 \
   -deststoretype PKCS12 \
   -srcalias msk-rsa-client \
   -deststorepass test1234 \
   -destkeypass test1234
   ```

1. Use the following command to export `keystore.p12` into `.pem` format. 

   ```
   openssl pkcs12 -in keystore.p12 -out encrypted-private-client-key.pem -nocerts
   ```

   The **Enter PEM pass phrase** prompt appears. The passphrase that you enter is used to encrypt the private key.

1. Remove the bag attributes and key attributes from the `.pem` file so that the first line starts with the following string.

   ```
   -----BEGIN ENCRYPTED PRIVATE KEY-----
   ```
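
If you script this cleanup, the edit amounts to keeping only the lines from the `-----BEGIN` marker through the `-----END` marker. A small sketch (the `strip_bag_attributes` helper is illustrative):

```python
def strip_bag_attributes(pem_text: str) -> str:
    """Drop the bag/key attribute lines that openssl prepends, keeping
    only the -----BEGIN ... -----END block of the PEM file."""
    lines = pem_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("-----BEGIN"))
    end = next(i for i, line in enumerate(lines) if line.startswith("-----END"))
    return "\n".join(lines[start:end + 1]) + "\n"

raw = """Bag Attributes
    friendlyName: msk-rsa-client
Key Attributes: <No Attributes>
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIE...example-base64-payload...
-----END ENCRYPTED PRIVATE KEY-----
"""
cleaned = strip_bag_attributes(raw)
```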

**To upload a public certificate and private key to the DMS certificate manager and test the connection to Amazon MSK**

1. Upload to DMS certificate manager using the following command.

   ```
   aws dms import-certificate --certificate-identifier signed-cert --certificate-pem file://path to signed cert
   aws dms import-certificate --certificate-identifier private-key --certificate-pem file://path to private key
   ```

1. Create an Amazon MSK target endpoint and test connection to make sure that TLS authentication works.

   ```
   aws dms create-endpoint --endpoint-identifier $endpoint-identifier --engine-name kafka --endpoint-type target --kafka-settings 
   '{"Broker": "b-0.kafka260.aaaaa1.a99.kafka.us-east-1.amazonaws.com:0000", "SecurityProtocol":"ssl-authentication", 
   "SslClientCertificateArn": "arn:aws:dms:us-east-1:012346789012:cert:",
   "SslClientKeyArn": "arn:aws:dms:us-east-1:0123456789012:cert:","SslClientKeyPassword":"test1234"}'
   aws dms test-connection --replication-instance-arn=$rep_inst_arn --endpoint-arn=$kafka_tar_arn_msk
   ```

**Important**  
You can use SSL authentication to secure a connection to a self-managed Kafka cluster. In some cases, you might use a private Certification Authority (CA) in your on-premises Kafka cluster. If so, upload your CA chain, public certificate, and private key to the DMS certificate manager. Then, use the corresponding Amazon Resource Name (ARN) in your endpoint settings when you create your on-premises Kafka target endpoint.

**To prepare a private key and signed certificate for a self-managed Kafka cluster**

1. Generate a key pair as shown in the following example.

   ```
   keytool -genkey -keystore kafka.server.keystore.jks -validity 300 -storepass your-keystore-password 
   -keypass your-key-passphrase -dname "CN=your-cn-name" 
   -alias alias-of-key-pair -storetype pkcs12 -keyalg RSA
   ```

1. Generate a Certificate Sign Request (CSR). 

   ```
   keytool -keystore kafka.server.keystore.jks -certreq -file server-cert-sign-request-rsa -alias on-premise-rsa -storepass your-key-store-password 
   -keypass your-key-password
   ```

1. Use the CA in your cluster truststore to sign the CSR. If you don't have a CA, you can create your own private CA.

   ```
   openssl req -new -x509 -keyout ca-key -out ca-cert -days validate-days                            
   ```

1. Import `ca-cert` into the server truststore and keystore. If you don't have a truststore, use the following command to create the truststore and import `ca-cert` into it. 

   ```
   keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert
   keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert
   ```

1. Sign the certificate.

   ```
   openssl x509 -req -CA ca-cert -CAkey ca-key -in server-cert-sign-request-rsa -out signed-certificate.pem 
   -days validate-days -CAcreateserial -passin pass:ca-password
   ```

1. Import the signed certificate to the keystore.

   ```
   keytool -keystore kafka.server.keystore.jks -import -file signed-certificate.pem -alias on-premise-rsa -storepass your-keystore-password 
   -keypass your-key-password
   ```

1. Use the following command to import the `on-premise-rsa` key from `kafka.server.keystore.jks` to `keystore.p12`.

   ```
   keytool -importkeystore \
   -srckeystore kafka.server.keystore.jks \
   -destkeystore keystore.p12 \
   -deststoretype PKCS12 \
   -srcalias on-premise-rsa \
   -deststorepass your-truststore-password \
   -destkeypass your-key-password
   ```

1. Use the following command to export `keystore.p12` into `.pem` format.

   ```
   openssl pkcs12 -in keystore.p12 -out encrypted-private-server-key.pem -nocerts
   ```

1. Upload `encrypted-private-server-key.pem`, `signed-certificate.pem`, and `ca-cert` to the DMS certificate manager.

1. Create an endpoint by using the returned ARNs.

   ```
   aws dms create-endpoint --endpoint-identifier $endpoint-identifier --engine-name kafka --endpoint-type target --kafka-settings 
   '{"Broker": "b-0.kafka260.aaaaa1.a99.kafka.us-east-1.amazonaws.com:9092", "SecurityProtocol":"ssl-authentication", 
   "SslClientCertificateArn": "your-client-cert-arn","SslClientKeyArn": "your-client-key-arn","SslClientKeyPassword":"your-client-key-password", 
   "SslCaCertificateArn": "your-ca-certificate-arn"}'
                               
   aws dms test-connection --replication-instance-arn=$rep_inst_arn --endpoint-arn=$kafka_tar_arn_msk
   ```

### Using SASL-SSL authentication to connect to Amazon MSK
<a name="CHAP_Target.Kafka.TLS.SSL-SASL"></a>

The Simple Authentication and Security Layer (SASL) method uses a user name and password to validate a client identity, and makes an encrypted connection between server and client.

To use SASL, you first create a secure user name and password when you set up your Amazon MSK cluster. For a description how to set up a secure user name and password for an Amazon MSK cluster, see [Setting up SASL/SCRAM authentication for an Amazon MSK cluster](https://docs.aws.amazon.com/msk/latest/developerguide/msk-password.html#msk-password-tutorial) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

Then, when you create your Kafka target endpoint, set the security protocol endpoint setting (`SecurityProtocol`) using the `sasl-ssl` option. You also set `SaslUsername` and `SaslPassword` options. Make sure these are consistent with the secure user name and password that you created when you first set up your Amazon MSK cluster, as shown in the following JSON example.

```
"KafkaSettings": {
    "SecurityProtocol": "sasl-ssl",
    "SaslUsername":"Amazon MSK cluster secure user name",
    "SaslPassword":"Amazon MSK cluster secure password"                    
}
```

**Note**  
Currently, AWS DMS supports only public CA backed SASL-SSL. DMS does not support SASL-SSL for use with self-managed Kafka that is backed by private CA.
For SASL-SSL authentication, AWS DMS supports the SCRAM-SHA-512 mechanism by default. AWS DMS versions 3.5.0 and higher also support the Plain mechanism. To use the Plain mechanism, set the `SaslMechanism` parameter of the `KafkaSettings` API data type to `PLAIN`. The `PLAIN` mechanism is supported by self-managed Apache Kafka, but not by Amazon MSK.
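The following sketch shows a `KafkaSettings` payload that selects the Plain mechanism; the user name and password values are placeholders.

```python
import json

# Sketch of KafkaSettings with the SaslMechanism parameter set to PLAIN
# (supported in AWS DMS 3.5.0 and higher for self-managed Kafka, not MSK).
kafka_settings = {
    "SecurityProtocol": "sasl-ssl",
    "SaslMechanism": "PLAIN",
    "SaslUsername": "your-sasl-user-name",   # placeholder
    "SaslPassword": "your-sasl-password",    # placeholder
}
print(json.dumps(kafka_settings, indent=4))
```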

## Using a before image to view original values of CDC rows for Apache Kafka as a target
<a name="CHAP_Target.Kafka.BeforeImage"></a>

When writing CDC updates to a data-streaming target like Kafka, you can view a source database row's original values before they are changed by an update. To make this possible, AWS DMS populates a *before image* of update events based on data supplied by the source database engine. 

Different source database engines provide different amounts of information for a before image: 
+ Oracle provides updates to columns only if they change. 
+ PostgreSQL provides only data for columns that are part of the primary key (changed or not). If logical replication is in use and `REPLICA IDENTITY FULL` is set for the source table, the entire before and after information for the row is written to the WALs and is available to AWS DMS.
+ MySQL generally provides data for all columns (changed or not).

To enable before imaging to add original values from the source database to the AWS DMS output, use either the `BeforeImageSettings` task setting or the `add-before-image-columns` parameter. This parameter applies a column transformation rule. 

`BeforeImageSettings` adds a new JSON attribute to every update operation with values collected from the source database system, as shown following.

```
"BeforeImageSettings": {
    "EnableBeforeImage": boolean,
    "FieldName": string,  
    "ColumnFilter": pk-only (default) / non-lob / all (but only one)
}
```

**Note**  
Apply `BeforeImageSettings` to full load plus CDC tasks (which migrate existing data and replicate ongoing changes), or to CDC only tasks (which replicate data changes only). Don't apply `BeforeImageSettings` to tasks that are full load only.

For `BeforeImageSettings` options, the following applies:
+ Set the `EnableBeforeImage` option to `true` to enable before imaging. The default is `false`. 
+ Use the `FieldName` option to assign a name to the new JSON attribute. When `EnableBeforeImage` is `true`, `FieldName` is required and can't be empty.
+ The `ColumnFilter` option specifies a column to add by using before imaging. To add only columns that are part of the table's primary keys, use the default value, `pk-only`. To add only columns that are not of LOB type, use `non-lob`. To add any column that has a before image value, use `all`. 

  ```
  "BeforeImageSettings": {
      "EnableBeforeImage": true,
      "FieldName": "before-image",
      "ColumnFilter": "pk-only"
    }
  ```
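To illustrate the effect of these options, the following sketch (hypothetical code, not AWS DMS internals) shows how an update event might gain a before-image attribute when `EnableBeforeImage` is `true` and `ColumnFilter` is `pk-only`.

```python
# Illustrative sketch only -- not AWS DMS code. Shows the new JSON
# attribute that BeforeImageSettings adds to an update operation.
def add_before_image(event, before_row, pk_columns, field_name="before-image",
                     column_filter="pk-only"):
    if column_filter == "pk-only":
        image = {c: v for c, v in before_row.items() if c in pk_columns}
    elif column_filter == "non-lob":
        image = {c: v for c, v in before_row.items() if not isinstance(v, bytes)}
    else:  # "all"
        image = dict(before_row)
    event = dict(event)
    event[field_name] = image
    return event

# An UPDATE that changes the primary key emp_no from 1 to 3.
after = {"data": {"emp_no": 3, "name": "Smith"}}
before_row = {"emp_no": 1, "name": "Smith"}
updated = add_before_image(after, before_row, pk_columns={"emp_no"})
print(updated["before-image"])  # {'emp_no': 1}
```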

### Using a before image transformation rule
<a name="CHAP_Target.Kafka.BeforeImage.Transform-Rule"></a>

As an alternative to task settings, you can use the `add-before-image-columns` parameter, which applies a column transformation rule. With this parameter, you can enable before imaging during CDC on data streaming targets like Kafka.

By using `add-before-image-columns` in a transformation rule, you can apply more fine-grained control of the before image results. Transformation rules enable you to use an object locator that gives you control over tables selected for the rule. Also, you can chain transformation rules together, which allows different rules to be applied to different tables. You can then manipulate the columns produced by using other rules. 

**Note**  
Don't use the `add-before-image-columns` parameter together with the `BeforeImageSettings` task setting within the same task. Instead, use either the parameter or the setting, but not both, for a single task.

A `transformation` rule type with the `add-before-image-columns` parameter for a column must provide a `before-image-def` section. The following shows an example.

```
    {
      "rule-type": "transformation",
      …
      "rule-target": "column",
      "rule-action": "add-before-image-columns",
      "before-image-def":{
        "column-filter": one-of  (pk-only / non-lob / all),
        "column-prefix": string,
        "column-suffix": string,
      }
    }
```

The value of `column-prefix` is prepended to a column name, and the default value of `column-prefix` is `BI_`. The value of `column-suffix` is appended to the column name, and the default is empty. Don't set both `column-prefix` and `column-suffix` to empty strings.

Choose one value for `column-filter`. To add only columns that are part of table primary keys, choose `pk-only`. To add only columns that are not of LOB type, choose `non-lob`. To add any column that has a before-image value, choose `all`.

### Example for a before image transformation rule
<a name="CHAP_Target.Kafka.BeforeImage.Example"></a>

The transformation rule in the following example adds a new column called `BI_emp_no` in the target. So a statement like `UPDATE employees SET emp_no = 3 WHERE emp_no = 1;` populates the `BI_emp_no` field with 1. When you write CDC updates to Kafka targets, the `BI_emp_no` column makes it possible to tell which original row was updated.

```
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "%",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "2",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "%",
        "table-name": "employees"
      },
      "rule-action": "add-before-image-columns",
      "before-image-def": {
        "column-prefix": "BI_",
        "column-suffix": "",
        "column-filter": "pk-only"
      }
    }
  ]
}
```
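The effect of the rule above on a single update can be sketched as follows; this is an illustration of the output shape, not how DMS performs the transformation.

```python
# Sketch of the add-before-image-columns rule with a pk-only filter:
# a prefixed copy of each primary-key column carries the original value.
def apply_before_image_columns(after_row, before_row, pk_columns,
                               prefix="BI_", suffix=""):
    out = dict(after_row)
    for col in pk_columns:
        out[prefix + col + suffix] = before_row[col]
    return out

# UPDATE employees SET emp_no = 3 WHERE emp_no = 1;
row = apply_before_image_columns(
    after_row={"emp_no": 3},
    before_row={"emp_no": 1},
    pk_columns=["emp_no"],
)
print(row)  # {'emp_no': 3, 'BI_emp_no': 1}
```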

For information on using the `add-before-image-columns` rule action, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

## Limitations when using Apache Kafka as a target for AWS Database Migration Service
<a name="CHAP_Target.Kafka.Limitations"></a>

The following limitations apply when using Apache Kafka as a target:
+ AWS DMS Kafka target endpoints don't support IAM access control for Amazon Managed Streaming for Apache Kafka (Amazon MSK).
+ Full LOB mode is not supported.
+ Specify a Kafka configuration file for your cluster with properties that allow AWS DMS to automatically create new topics. Include the setting `auto.create.topics.enable = true`. If you are using Amazon MSK, you can specify the default configuration when you create your Kafka cluster, and then change the `auto.create.topics.enable` setting to `true`. For more information about the default configuration settings, see [The default Amazon MSK configuration](https://docs.aws.amazon.com/msk/latest/developerguide/msk-default-configuration.html) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*. If you need to modify an existing Kafka cluster created using Amazon MSK, run the AWS CLI command `aws kafka create-configuration` to update your Kafka configuration, as in the following example:

  ```
  $ aws kafka create-configuration --name "kafka-configuration" --kafka-versions "2.2.1" --server-properties file://~/kafka_configuration
  {
      "LatestRevision": {
          "Revision": 1,
          "CreationTime": "2019-09-06T14:39:37.708Z"
      },
      "CreationTime": "2019-09-06T14:39:37.708Z",
      "Name": "kafka-configuration",
      "Arn": "arn:aws:kafka:us-east-1:111122223333:configuration/kafka-configuration/7e008070-6a08-445f-9fe5-36ccf630ecfd-3"
  }
  ```

  Here, `file://~/kafka_configuration` points to the configuration file that you created with the required property settings.

  If you are using your own Kafka instance installed on Amazon EC2, modify the Kafka cluster configuration with the `auto.create.topics.enable = true` setting to allow AWS DMS to automatically create new topics, using the options provided with your instance.
+ AWS DMS publishes each update to a single record in the source database as one data record (message) in a given Kafka topic regardless of transactions.
+ AWS DMS supports the following four forms for partition keys:
  + `SchemaName.TableName`: A combination of the schema and table name.
  + `${AttributeName}`: The value of one of the fields in the JSON, or the primary key of the table in the source database.
  + `transaction-id`: The CDC transaction ID. All records within the same transaction go to the same partition.
  + `constant`: A fixed literal value for every record regardless of table or data. All records are sent to the same partition key value "constant", providing strict global ordering across all tables.

  ```
  {
      "rule-type": "object-mapping",
      "rule-id": "2",
      "rule-name": "TransactionIdPartitionKey",
      "rule-action": "map-record-to-document",
      "object-locator": {
          "schema-name": "onprem",
          "table-name": "it_system"
      },
      "mapping-parameters": {
          "partition-key-type": "transaction-id | constant | attribute-name | schema-table"
      }
  }
  ```
+ The `IncludeTransactionDetails` endpoint setting is only supported when the source endpoint is Oracle, SQL Server, PostgreSQL, or MySQL. For other source endpoint types, transaction details will not be included.
+ `BatchApply` is not supported for a Kafka endpoint. Using Batch Apply (for example, the `BatchApplyEnabled` target metadata task setting) for a Kafka target might result in loss of data.
+ AWS DMS does not support migrating values of `BigInt` data type with more than 16 digits. To work around this limitation, you can use the following transformation rule to convert the `BigInt` column to a string. For more information about transformation rules, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

  ```
  {
      "rule-type": "transformation",
      "rule-id": "id",
      "rule-name": "name",
      "rule-target": "column",
      "object-locator": {
          "schema-name": "valid object-mapping rule action",
          "table-name": "",
          "column-name": ""
      },
      "rule-action": "change-data-type",
      "data-type": {
          "type": "string",
          "length": 20
      }
  }
  ```
+ AWS DMS Kafka target endpoints don't support Amazon MSK Serverless.
+ When defining mapping rules for a table, you can't use both an object-mapping rule and a transformation rule. Set only one of the two.
+ AWS DMS supports SASL Authentication for Apache Kafka versions up to 3.8. If you are using Kafka 4.0 or higher, you can only connect without SASL authentication.
+ AWS DMS does not support source data containing embedded `'\0'` characters when using Kafka as a target endpoint. Data containing embedded `'\0'` characters will be truncated at the first `'\0'` character.
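The four partition-key forms listed above can be sketched as a selection function; this is an illustration of the documented behavior, not DMS internals.

```python
# Illustrative mapping of the four supported partition-key forms.
def partition_key(key_type, schema, table, record,
                  attribute=None, transaction_id=None):
    if key_type == "schema-table":
        return f"{schema}.{table}"
    if key_type == "attribute-name":
        # The value of one field in the JSON record.
        return str(record[attribute])
    if key_type == "transaction-id":
        # All records in the same transaction share a partition.
        return str(transaction_id)
    if key_type == "constant":
        # Every record gets the same key, so a single partition.
        return "constant"
    raise ValueError(f"unknown partition-key-type: {key_type}")

rec = {"id": 42, "name": "val"}
print(partition_key("schema-table", "onprem", "it_system", rec))
print(partition_key("attribute-name", "onprem", "it_system", rec, attribute="id"))
```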

## Using object mapping to migrate data to a Kafka topic
<a name="CHAP_Target.Kafka.ObjectMapping"></a>

AWS DMS uses table-mapping rules to map data from the source to the target Kafka topic. To map data to a target topic, you use a type of table-mapping rule called object mapping. You use object mapping to define how data records in the source map to the data records published to a Kafka topic. 

Kafka topics don't have a preset structure other than having a partition key.

**Note**  
You don't have to use object mapping. You can use regular table mapping for various transformations. However, the partition key type then follows these default behaviors:   
The primary key is used as the partition key for full load.
If no parallel-apply task settings are used, `schema.table` is used as the partition key for CDC.
If parallel-apply task settings are used, the primary key is used as the partition key for CDC.

To create an object-mapping rule, specify `rule-type` as `object-mapping`. This rule specifies what type of object mapping you want to use. 

The structure for the rule is as follows.

```
{
    "rules": [
        {
            "rule-type": "object-mapping",
            "rule-id": "id",
            "rule-name": "name",
            "rule-action": "valid object-mapping rule action",
            "object-locator": {
                "schema-name": "case-sensitive schema name",
                "table-name": ""
            }
        }
    ]
}
```

AWS DMS currently supports `map-record-to-record` and `map-record-to-document` as the only valid values for the `rule-action` parameter. These settings affect values that aren't excluded as part of the `exclude-columns` attribute list. The `map-record-to-record` and `map-record-to-document` values specify how AWS DMS handles these records by default. These values don't affect the attribute mappings in any way. 

Use `map-record-to-record` when migrating from a relational database to a Kafka topic. This rule type uses the `taskResourceId.schemaName.tableName` value from the relational database as the partition key in the Kafka topic and creates an attribute for each column in the source database. 

When using `map-record-to-record`, note the following:
+ This setting affects only columns that aren't excluded by the `exclude-columns` list.
+ For every such column, AWS DMS creates a corresponding attribute in the target topic.
+ AWS DMS creates this corresponding attribute regardless of whether the source column is used in an attribute mapping. 

One way to understand `map-record-to-record` is to see it in action. For this example, assume that you are starting with a relational database table row with the following structure and data.


| FirstName | LastName | StoreId | HomeAddress | HomePhone | WorkAddress | WorkPhone | DateofBirth | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
| Randy | Marsh | 5 | 221B Baker Street | 1234567890 | 31 Spooner Street, Quahog  | 9876543210 | 02/29/1988 | 

To migrate this information from a schema named `Test` to a Kafka topic, you create rules to map the data to the target topic. The following rule illustrates the mapping. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "DefaultMapToKafka",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customers"
            }
        }
    ]
}
```

Given a Kafka topic and a partition key (in this case, `taskResourceId.schemaName.tableName`), the following illustrates the resulting record format using our sample data in the Kafka target topic: 

```
  {
     "FirstName": "Randy",
     "LastName": "Marsh",
     "StoreId":  "5",
     "HomeAddress": "221B Baker Street",
     "HomePhone": "1234567890",
     "WorkAddress": "31 Spooner Street, Quahog",
     "WorkPhone": "9876543210",
     "DateOfBirth": "02/29/1988"
  }
```

**Topics**
+ [Restructuring data with attribute mapping](#CHAP_Target.Kafka.AttributeMapping)
+ [Multitopic replication using object mapping](#CHAP_Target.Kafka.MultiTopic)
+ [Message format for Apache Kafka](#CHAP_Target.Kafka.Messageformat)

### Restructuring data with attribute mapping
<a name="CHAP_Target.Kafka.AttributeMapping"></a>

You can restructure the data while you are migrating it to a Kafka topic using an attribute map. For example, you might want to combine several fields in the source into a single field in the target. The following attribute map illustrates how to restructure the data.

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "TransformToKafka",
            "rule-action": "map-record-to-record",
            "target-table-name": "CustomerData",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customers"
            },
            "mapping-parameters": {
                "partition-key-type": "attribute-name",
                "partition-key-name": "CustomerName",
                "exclude-columns": [
                    "firstname",
                    "lastname",
                    "homeaddress",
                    "homephone",
                    "workaddress",
                    "workphone"
                ],
                "attribute-mappings": [
                    {
                        "target-attribute-name": "CustomerName",
                        "attribute-type": "scalar",
                        "attribute-sub-type": "string",
                        "value": "${lastname}, ${firstname}"
                    },
                    {
                        "target-attribute-name": "ContactDetails",
                        "attribute-type": "document",
                        "attribute-sub-type": "json",
                        "value": {
                            "Home": {
                                "Address": "${homeaddress}",
                                "Phone": "${homephone}"
                            },
                            "Work": {
                                "Address": "${workaddress}",
                                "Phone": "${workphone}"
                            }
                        }
                    }
                ]
            }
        }
    ]
}
```
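The `${column-name}` placeholders in `attribute-mappings` resolve against the source row. A sketch of that substitution for the `CustomerName` mapping above, using Python's `string.Template` (which is not how DMS implements it, but uses the same `${...}` syntax):

```python
from string import Template

source_row = {"firstname": "Randy", "lastname": "Marsh"}

# "${lastname}, ${firstname}" from the CustomerName attribute mapping.
customer_name = Template("${lastname}, ${firstname}").substitute(source_row)
print(customer_name)  # Marsh, Randy
```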

To set a constant value for `partition-key`, specify `"partition-key-type": "constant"`. You might do this to force all the data to be stored in a single partition. The following mapping illustrates this approach. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "1",
            "rule-name": "TransformToKafka",
            "rule-action": "map-record-to-document",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customer"
            },
            "mapping-parameters": {
                "partition-key-type": "constant",
                "exclude-columns": [
                    "FirstName",
                    "LastName",
                    "HomeAddress",
                    "HomePhone",
                    "WorkAddress",
                    "WorkPhone"
                ],
                "attribute-mappings": [
                    {
                        "attribute-name": "CustomerName",
                        "value": "${FirstName},${LastName}"
                    },
                    {
                        "attribute-name": "ContactDetails",
                        "value": {
                            "Home": {
                                "Address": "${HomeAddress}",
                                "Phone": "${HomePhone}"
                            },
                            "Work": {
                                "Address": "${WorkAddress}",
                                "Phone": "${WorkPhone}"
                            }
                        }
                    },
                    {
                        "attribute-name": "DateOfBirth",
                        "value": "${DateOfBirth}"
                    }
                ]
            }
        }
    ]
}
```

**Note**  
The `partition-key` value for a control record that is for a specific table is `TaskId.SchemaName.TableName`. The `partition-key` value for a control record that is for a specific task is that record's `TaskId`. Specifying a `partition-key` value in the object mapping has no impact on the `partition-key` for a control record.  
 When `partition-key-type` is set to `attribute-name` in a table mapping rule, you must specify `partition-key-name`, which must reference either a column from the source table or a custom column defined in the mapping. Additionally, `attribute-mappings` must be provided to define how source columns map to the target Kafka topic.

### Multitopic replication using object mapping
<a name="CHAP_Target.Kafka.MultiTopic"></a>

By default, AWS DMS tasks migrate all source data to one of the following Kafka topics:
+ As specified in the **Topic** field of the AWS DMS target endpoint.
+ As specified by `kafka-default-topic` if the **Topic** field of the target endpoint isn't populated and the Kafka `auto.create.topics.enable` setting is set to `true`.

With AWS DMS engine versions 3.4.6 and higher, you can use the `kafka-target-topic` attribute to map each migrated source table to a separate topic. For example, the object mapping rules following migrate the source tables `Customer` and `Address` to the Kafka topics `customer_topic` and `address_topic`, respectively. At the same time, AWS DMS migrates all other source tables, including the `Bills` table in the `Test` schema, to the topic specified in the target endpoint.

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "MapToKafka1",
            "rule-action": "map-record-to-record",
            "kafka-target-topic": "customer_topic",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customer" 
            },
            "partition-key-type": "constant"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "3",
            "rule-name": "MapToKafka2",
            "rule-action": "map-record-to-record",
            "kafka-target-topic": "address_topic",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Address"
            },
            "partition-key-type": "constant"
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "4",
            "rule-name": "DefaultMapToKafka",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Bills"
            }
        }
    ]
}
```

By using Kafka multitopic replication, you can group and migrate source tables to separate Kafka topics using a single replication task.
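The routing behavior of the rules above can be sketched as a table-to-topic lookup with a default fallback; the default topic name here is a placeholder for whatever you set in the endpoint's **Topic** field.

```python
# Table-to-topic routing implied by the object-mapping rules above.
topic_map = {
    ("Test", "Customer"): "customer_topic",
    ("Test", "Address"): "address_topic",
}
DEFAULT_TOPIC = "endpoint-topic"  # placeholder for the endpoint's Topic field

def topic_for(schema, table):
    # Tables without a kafka-target-topic mapping fall back to the default.
    return topic_map.get((schema, table), DEFAULT_TOPIC)

print(topic_for("Test", "Customer"))  # customer_topic
print(topic_for("Test", "Bills"))     # endpoint-topic
```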

### Message format for Apache Kafka
<a name="CHAP_Target.Kafka.Messageformat"></a>

The JSON output is a set of key-value pairs. The metadata in each message includes the following fields. 

**RecordType**  
The record type can be either data or control. *Data records* represent the actual rows in the source. *Control records* are for important events in the stream, for example a restart of the task.

**Operation**  
For data records, the operation can be `load`, `insert`, `update`, or `delete`.  
For control records, the operation can be `create-table`, `rename-table`, `drop-table`, `change-columns`, `add-column`, `drop-column`, `rename-column`, or `column-type-change`.

**SchemaName**  
The source schema for the record. This field can be empty for a control record.

**TableName**  
The source table for the record. This field can be empty for a control record.

**Timestamp**  
The timestamp for when the JSON message was constructed. The field is in ISO 8601 format.

The following JSON message example illustrates a data type message with all additional metadata.

```
{ 
   "data":{ 
      "id":100000161,
      "fname":"val61s",
      "lname":"val61s",
      "REGION":"val61s"
   },
   "metadata":{ 
      "timestamp":"2019-10-31T22:53:59.721201Z",
      "record-type":"data",
      "operation":"insert",
      "partition-key-type":"primary-key",
      "partition-key-value":"sbtest.sbtest_x.100000161",
      "schema-name":"sbtest",
      "table-name":"sbtest_x",
      "transaction-id":9324410911751,
      "transaction-record-id":1,
      "prev-transaction-id":9324410910341,
      "prev-transaction-record-id":10,
      "commit-timestamp":"2019-10-31T22:53:55.000000Z",
      "stream-position":"mysql-bin-changelog.002171:36912271:0:36912333:9324410911751:mysql-bin-changelog.002171:36912209"
   }
}
```
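A consumer can distinguish data and control messages by inspecting `metadata`. A minimal parsing sketch for a message like the one above (payload abbreviated):

```python
import json

message = """{
   "data": {"id": 100000161, "fname": "val61s"},
   "metadata": {
      "record-type": "data",
      "operation": "insert",
      "schema-name": "sbtest",
      "table-name": "sbtest_x"
   }
}"""

parsed = json.loads(message)
meta = parsed["metadata"]
if meta["record-type"] == "data":
    # Route by source table; apply insert/update/delete handling here.
    print(meta["schema-name"], meta["table-name"], meta["operation"])
```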

The following JSON message example illustrates a control type message.

```
{ 
   "control":{ 
      "table-def":{ 
         "columns":{ 
            "id":{ 
               "type":"WSTRING",
               "length":512,
               "nullable":false
            },
            "fname":{ 
               "type":"WSTRING",
               "length":255,
               "nullable":true
            },
            "lname":{ 
               "type":"WSTRING",
               "length":255,
               "nullable":true
            },
            "REGION":{ 
               "type":"WSTRING",
               "length":1000,
               "nullable":true
            }
         },
         "primary-key":[ 
            "id"
         ],
         "collation-name":"latin1_swedish_ci"
      }
   },
   "metadata":{ 
      "timestamp":"2019-11-21T19:14:22.223792Z",
      "record-type":"control",
      "operation":"create-table",
      "partition-key-type":"task-id",
      "schema-name":"sbtest",
      "table-name":"sbtest_t1"
   }
}
```

# Using an Amazon OpenSearch Service cluster as a target for AWS Database Migration Service
<a name="CHAP_Target.Elasticsearch"></a>

You can use AWS DMS to migrate data to Amazon OpenSearch Service (OpenSearch Service). OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale an OpenSearch Service cluster. 

In OpenSearch Service, you work with indexes and documents. An *index* is a collection of documents, and a *document* is a JSON object containing scalar values, arrays, and other objects. OpenSearch provides a JSON-based query language, so that you can query data in an index and retrieve the corresponding documents.

When AWS DMS creates indexes for a target endpoint for OpenSearch Service, it creates one index for each table from the source endpoint. The cost for creating an OpenSearch Service index depends on several factors. These are the number of indexes created, the total amount of data in these indexes, and the small amount of metadata that OpenSearch stores for each document.

Configure your OpenSearch Service cluster with compute and storage resources that are appropriate for the scope of your migration. We recommend that you consider the following factors, depending on the replication task you want to use:
+ For a full data load, consider the total amount of data that you want to migrate, and also the speed of the transfer.
+ For replicating ongoing changes, consider the frequency of updates, and your end-to-end latency requirements.

Also, configure the index settings on your OpenSearch cluster, paying close attention to the document count.

**Multithreaded full load task settings**

To help increase the speed of the transfer, AWS DMS supports a multithreaded full load to an OpenSearch Service target cluster. AWS DMS supports this multithreading with task settings that include the following:
+ `MaxFullLoadSubTasks` – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding OpenSearch Service target index using a dedicated subtask. The default is 8; the maximum value is 49.
+ `ParallelLoadThreads` – Use this option to specify the number of threads that AWS DMS uses to load each table into its OpenSearch Service target index. The maximum value for an OpenSearch Service target is 32. You can ask to have this maximum limit increased.
**Note**  
If you don't change `ParallelLoadThreads` from its default (0), AWS DMS transfers a single record at a time. This approach puts undue load on your OpenSearch Service cluster. Make sure that you set this option to 1 or more.
+ `ParallelLoadBufferSize` – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the OpenSearch Service target. The default value is 50. The maximum value is 1,000. Use this setting with `ParallelLoadThreads`. `ParallelLoadBufferSize` is valid only when there is more than one thread.
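The relationship between `ParallelLoadThreads` and `ParallelLoadBufferSize` can be sketched as records being split into bounded buffers that the load threads drain; this is an illustration of the buffer semantics, not DMS internals.

```python
# Split a table's records into buffers of at most buffer_size records,
# mimicking the bounded buffer that parallel load threads drain.
def into_buffers(records, buffer_size=50):
    return [records[i:i + buffer_size]
            for i in range(0, len(records), buffer_size)]

buffers = into_buffers(list(range(120)), buffer_size=50)
print([len(b) for b in buffers])  # [50, 50, 20]
```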

For more information on how DMS loads an OpenSearch Service cluster using multithreading, see the AWS blog post [Scale Amazon OpenSearch Service for AWS Database Migration Service migrations](https://aws.amazon.com/blogs/database/scale-amazon-elasticsearch-service-for-aws-database-migration-service-migrations/). 

**Multithreaded CDC load task settings**

You can improve the performance of change data capture (CDC) for an OpenSearch Service target cluster using task settings to modify the behavior of the `PutRecords` API call. To do this, you can specify the number of concurrent threads, queues per thread, and the number of records to store in a buffer using `ParallelApply*` task settings. For example, suppose you want to perform a CDC load and apply 32 threads in parallel. You also want to access 64 queues per thread, with 50 records stored per buffer. 
**Note**  
Support for the use of `ParallelApply*` task settings during CDC to Amazon OpenSearch Service target endpoints is available in AWS DMS versions 3.4.0 and higher.

To promote CDC performance, AWS DMS supports these task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to an OpenSearch Service target endpoint. The default value is zero (0) and the maximum value is 32.
+ `ParallelApplyBufferSize` – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to an OpenSearch Service target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when `ParallelApplyThreads` specifies more than one thread. 
+ `ParallelApplyQueuesPerThread` – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for an OpenSearch Service endpoint during CDC.

When using `ParallelApply*` task settings, the `partition-key-type` default is the `primary-key` of the table, not `schema-name.table-name`.
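Because the partition key determines queue placement, records with the same key always land in the same queue, preserving per-key ordering. A sketch of that hash-based assignment (illustrative; actual DMS behavior may differ):

```python
import hashlib

def queue_for(partition_key, threads=32, queues_per_thread=64):
    # Stable hash so the same key always maps to the same queue.
    digest = hashlib.sha256(partition_key.encode()).hexdigest()
    total_queues = threads * queues_per_thread
    return int(digest, 16) % total_queues

# The same primary-key-based partition key always hits the same queue.
q1 = queue_for("sbtest.sbtest_x.100000161")
q2 = queue_for("sbtest.sbtest_x.100000161")
print(q1 == q2)  # True
```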

## Migrating from a relational database table to an OpenSearch Service index
<a name="CHAP_Target.Elasticsearch.RDBMS2Elasticsearch"></a>

AWS DMS supports migrating data to OpenSearch Service's scalar data types. When migrating from a relational database like Oracle or MySQL to OpenSearch Service, you might want to restructure how you store this data.

AWS DMS supports the following OpenSearch Service scalar data types: 
+ Boolean 
+ Date
+ Float
+ Int
+ String

AWS DMS converts data of type Date into type String. You can specify custom mapping to interpret these dates.

AWS DMS does not support migration of LOB data types.

## Prerequisites for using Amazon OpenSearch Service as a target for AWS Database Migration Service
<a name="CHAP_Target.Elasticsearch.Prerequisites"></a>

Before you begin work with an OpenSearch Service database as a target for AWS DMS, make sure that you create an AWS Identity and Access Management (IAM) role. This role should let AWS DMS access the OpenSearch Service indexes at the target endpoint. The minimum set of access permissions is shown in the following IAM policy.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "Service": "dms.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

The role that you use for the migration to OpenSearch Service must have the following permissions.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:ESHttpDelete",
        "es:ESHttpGet",
        "es:ESHttpHead",
        "es:ESHttpPost",
        "es:ESHttpPut"
      ],
      "Resource": "arn:aws:es:region:account-id:domain/domain-name/*"
    }
  ]
}
```

------

In the preceding policy, replace `region` with the AWS Region identifier, `account-id` with your AWS account ID, and `domain-name` with the name of your Amazon OpenSearch Service domain. An example is `arn:aws:es:us-west-2:123456789012:domain/my-es-domain`.

## Endpoint settings when using OpenSearch Service as a target for AWS DMS
<a name="CHAP_Target.Elasticsearch.Configuration"></a>

You can use endpoint settings to configure your OpenSearch Service target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--elasticsearch-settings '{"EndpointSetting": "value", ...}'` JSON syntax.
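For example, a `create-endpoint` call with OpenSearch Service settings might look like the following sketch. The endpoint identifier, role ARN, and domain URI are placeholders; on older AWS DMS versions the engine name may be `elasticsearch` rather than `opensearch`.

```
aws dms create-endpoint \
    --endpoint-identifier my-opensearch-target \
    --endpoint-type target \
    --engine-name opensearch \
    --elasticsearch-settings '{
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/my-dms-opensearch-role",
        "EndpointUri": "https://search-my-es-domain.us-west-2.es.amazonaws.com",
        "FullLoadErrorPercentage": 10,
        "ErrorRetryDuration": 300
    }'
```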

The following table shows the endpoint settings that you can use with OpenSearch Service as a target.


| Attribute name | Valid values | Default value and description | 
| --- | --- | --- | 
|  `FullLoadErrorPercentage`   |  A positive integer greater than 0 but no larger than 100.  |  10 – For a full load task, this attribute determines the threshold of errors allowed before the task fails. For example, suppose that there are 1,500 rows at the source endpoint and this parameter is set to 10. Then the task fails if AWS DMS encounters more than 150 errors (10 percent of the row count) when writing to the target endpoint.  | 
|   `ErrorRetryDuration`   |  A positive integer greater than 0.  |  300 – If an error occurs at the target endpoint, AWS DMS retries for this many seconds. Otherwise, the task fails.  | 
|  `UseNewMappingType`  | `true` or `false` |  `false` – To work with OpenSearch version 2.x, set this setting to `true`.  | 

## Limitations when using Amazon OpenSearch Service as a target for AWS Database Migration Service
<a name="CHAP_Target.Elasticsearch.Limitations"></a>

The following limitations apply when using Amazon OpenSearch Service as a target:
+ OpenSearch Service uses dynamic mapping (auto guess) to determine the data types to use for migrated data.
+ OpenSearch Service stores each document with a unique ID. The following is an example ID. 

  ```
  "_id": "D359F8B537F1888BC71FE20B3D79EAE6674BE7ACA9B645B0279C7015F6FF19FD"
  ```

  Each document ID is 64 bytes long, so anticipate this as a storage requirement. For example, if you migrate 100,000 rows from an AWS DMS source, the resulting OpenSearch Service index requires storage for an additional 6,400,000 bytes.
+ With OpenSearch Service, you can't make updates to the primary key attributes. This restriction is important when using ongoing replication with change data capture (CDC) because it can result in unwanted data in the target. In CDC mode, primary keys are mapped to SHA256 values, which are 32 bytes long. These are converted to human-readable 64-byte strings, and are used as OpenSearch Service document IDs.
+ If AWS DMS encounters any items that can't be migrated, it writes error messages to Amazon CloudWatch Logs. This behavior differs from that of other AWS DMS target endpoints, which write errors to an exceptions table.
+ AWS DMS doesn't support connecting to an OpenSearch Service cluster that has fine-grained access control enabled with a master user and password.
+ AWS DMS doesn't support Amazon OpenSearch Serverless as a target.
+ OpenSearch Service does not support writing data to pre-existing indexes.
+ The replication task setting `TargetTablePrepMode: TRUNCATE_BEFORE_LOAD` isn't supported for use with an OpenSearch Service target endpoint.
+ When migrating data to OpenSearch Service using AWS DMS, the source data must have a primary key or a unique identifier column. If the source data doesn't have a primary key or unique identifier, define one using the `define-primary-key` transformation rule.
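A sketch of such a `define-primary-key` transformation rule follows, assuming a hypothetical source table `employee` in schema `hr` whose `id` column uniquely identifies each row. Adjust the object locator and column list for your own schema.

```
{
    "rules": [
        {
            "rule-type": "transformation",
            "rule-id": "1",
            "rule-name": "1",
            "rule-target": "table",
            "object-locator": {
                "schema-name": "hr",
                "table-name": "employee"
            },
            "rule-action": "define-primary-key",
            "primary-key-def": {
                "name": "pk_employee",
                "origin": "primary-key",
                "columns": ["id"]
            }
        }
    ]
}
```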

## Target data types for Amazon OpenSearch Service
<a name="CHAP_Target.Elasticsearch.DataTypes"></a>

When AWS DMS migrates data from heterogeneous databases, the service maps data types from the source database to intermediate data types called AWS DMS data types. The service then maps the intermediate data types to the target data types. The following table shows each AWS DMS data type and the data type it maps to in OpenSearch Service.


| AWS DMS data type | OpenSearch Service data type | 
| --- | --- | 
|  Boolean  |  boolean  | 
|  Date  |  string  | 
|  Time  |  date  | 
|  Timestamp  |  date  | 
|  INT4  |  integer  | 
|  Real4  |  float  | 
|  UINT4  |  integer  | 

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).

# Using Amazon DocumentDB as a target for AWS Database Migration Service
<a name="CHAP_Target.DocumentDB"></a>

For information about the versions of Amazon DocumentDB (with MongoDB compatibility) that AWS DMS supports, see [Targets for AWS DMS](CHAP_Introduction.Targets.md). You can use AWS DMS to migrate data to Amazon DocumentDB (with MongoDB compatibility) from any of the source data engines that AWS DMS supports. The source engine can be on an AWS managed service such as Amazon RDS, Aurora, or Amazon S3. Or the engine can be a self-managed database, such as MongoDB running on Amazon EC2 or on-premises.

You can use AWS DMS to replicate source data to Amazon DocumentDB databases, collections, or documents. 

**Note**  
If your source endpoint is MongoDB or Amazon DocumentDB, run the migration in **Document mode**.

MongoDB stores data in a binary JSON format (BSON). AWS DMS supports all of the BSON data types that are supported by Amazon DocumentDB. For a list of these data types, see [Supported MongoDB APIs, operations, and data types](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html) in the *Amazon DocumentDB Developer Guide*.

If the source endpoint is a relational database, AWS DMS maps database objects to Amazon DocumentDB as follows:
+ A relational database, or database schema, maps to an Amazon DocumentDB *database*. 
+ Tables within a relational database map to *collections* in Amazon DocumentDB.
+ Records in a relational table map to *documents* in Amazon DocumentDB. Each document is constructed from data in the source record.

If the source endpoint is Amazon S3, then the resulting Amazon DocumentDB objects correspond to AWS DMS mapping rules for Amazon S3. For example, consider the following URI.

```
s3://amzn-s3-demo-bucket/hr/employee
```

In this case, AWS DMS maps the objects in `amzn-s3-demo-bucket` to Amazon DocumentDB as follows:
+ The top-level URI part (`hr`) maps to an Amazon DocumentDB database. 
+ The next URI part (`employee`) maps to an Amazon DocumentDB collection.
+ Each object in `employee` maps to a document in Amazon DocumentDB.

For more information on mapping rules for Amazon S3, see [Using Amazon S3 as a source for AWS DMS](CHAP_Source.S3.md).

**Amazon DocumentDB endpoint settings**

In AWS DMS versions 3.5.0 and higher, you can improve the performance of change data capture (CDC) for Amazon DocumentDB endpoints by tuning task settings for parallel threads and bulk operations. To do this, you can specify the number of concurrent threads, queues per thread, and the number of records to store in a buffer using `ParallelApply*` task settings. For example, suppose you want to perform a CDC load and apply 32 threads in parallel. You also want to access 64 queues per thread, with 50 records stored per buffer. 

To promote CDC performance, AWS DMS supports these task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to an Amazon DocumentDB target endpoint. The default value is zero (0) and the maximum value is 32.
+ `ParallelApplyBufferSize` – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to an Amazon DocumentDB target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when `ParallelApplyThreads` specifies more than one thread. 
+ `ParallelApplyQueuesPerThread` – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for an Amazon DocumentDB endpoint during CDC. The default is 1. The maximum is 512.
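As with other targets that support parallel apply, these settings go in the `TargetMetadata` section of the task settings JSON. The following is a sketch with illustrative values, not recommendations:

```
{
    "TargetMetadata": {
        "ParallelApplyThreads": 32,
        "ParallelApplyQueuesPerThread": 64,
        "ParallelApplyBufferSize": 50
    }
}
```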

**Note**  
 For Amazon DocumentDB targets, parallel CDC apply can cause duplicate key errors or stalled CDC apply for workloads that use secondary unique indexes or require strict ordering of changes. Use the default single-threaded CDC apply configuration for these workloads. 

For additional details on working with Amazon DocumentDB as a target for AWS DMS, see the following sections:

**Topics**
+ [Mapping data from a source to an Amazon DocumentDB target](#CHAP_Target.DocumentDB.data-mapping)
+ [Connecting to Amazon DocumentDB Elastic Clusters as a target](#CHAP_Target.DocumentDB.data-mapping.elastic-cluster-connect)
+ [Ongoing replication with Amazon DocumentDB as a target](#CHAP_Target.DocumentDB.data-mapping.ongoing-replication)
+ [Limitations to using Amazon DocumentDB as a target](#CHAP_Target.DocumentDB.limitations)
+ [Using endpoint settings with Amazon DocumentDB as a target](#CHAP_Target.DocumentDB.ECAs)
+ [Target data types for Amazon DocumentDB](#CHAP_Target.DocumentDB.datatypes)

**Note**  
For a step-by-step walkthrough of the migration process, see [Migrating from MongoDB to Amazon DocumentDB ](https://docs.aws.amazon.com/dms/latest/sbs/CHAP_MongoDB2DocumentDB.html) in the AWS Database Migration Service Step-by-Step Migration Guide.

## Mapping data from a source to an Amazon DocumentDB target
<a name="CHAP_Target.DocumentDB.data-mapping"></a>

AWS DMS reads records from the source endpoint, and constructs JSON documents based on the data it reads. For each JSON document, AWS DMS must determine an `_id` field to act as a unique identifier. It then writes the JSON document to an Amazon DocumentDB collection, using the `_id` field as a primary key.

### Source data that is a single column
<a name="CHAP_Target.DocumentDB.data-mapping.single-column"></a>

If the source data consists of a single column, the data must be of a string type. (Depending on the source engine, the actual data type might be VARCHAR, NVARCHAR, TEXT, LOB, CLOB, or similar.) AWS DMS assumes that the data is a valid JSON document, and replicates the data to Amazon DocumentDB as is.

If the resulting JSON document contains a field named `_id`, then that field is used as the unique `_id` in Amazon DocumentDB.

If the JSON doesn't contain an `_id` field, then Amazon DocumentDB generates an `_id` value automatically.

### Source data that is multiple columns
<a name="CHAP_Target.DocumentDB.data-mapping.multiple-columns"></a>

If the source data consists of multiple columns, then AWS DMS constructs a JSON document from all of these columns. To determine the `_id` field for the document, AWS DMS proceeds as follows:
+ If one of the columns is named `_id`, then the data in that column is used as the target `_id`.
+ If there is no `_id` column, but the source data has a primary key or a unique index, then AWS DMS uses that key or index value as the `_id` value. The data from the primary key or unique index also appears as explicit fields in the JSON document.
+ If there is no `_id` column, and no primary key or a unique index, then Amazon DocumentDB generates an `_id` value automatically.
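For example, a hypothetical source row with a primary key column `id` and no `_id` column might map to a document like the following sketch. The key value becomes the document's `_id`, and the key column also appears as a regular field.

```
Source row:

    id=101, first_name="John", last_name="Doe"

Resulting Amazon DocumentDB document:

{
    "_id": 101,
    "id": 101,
    "first_name": "John",
    "last_name": "Doe"
}
```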

### Coercing a data type at the target endpoint
<a name="CHAP_Target.DocumentDB.coercing-datatype"></a>

AWS DMS can modify data structures when it writes to an Amazon DocumentDB target endpoint. You can request these changes by renaming columns and tables at the source endpoint, or by providing transformation rules that are applied when a task is running.

#### Using a nested JSON document (`json_` prefix)
<a name="CHAP_Target.DocumentDB.coercing-datatype.json"></a>

To coerce a data type, you can prefix the source column name with `json_` (that is, `json_columnName`) either manually or using a transformation. In this case, the column is created as a nested JSON document within the target document, rather than as a string field.

For example, suppose that you want to migrate the following document from a MongoDB source endpoint.

```
{
    "_id": "1", 
    "FirstName": "John", 
    "LastName": "Doe",
    "ContactDetails": "{"Home": {"Address": "Boston","Phone": "1111111"},"Work": { "Address": "Boston", "Phone": "2222222222"}}"
}
```

If you don't coerce any of the source data types, the embedded `ContactDetails` document is migrated as a string.

```
{
    "_id": "1", 
    "FirstName": "John", 
    "LastName": "Doe",
    "ContactDetails": "{\"Home\": {\"Address\": \"Boston\",\"Phone\": \"1111111\"},\"Work\": { \"Address\": \"Boston\", \"Phone\": \"2222222222\"}}"
}
```

However, you can add a transformation rule to coerce `ContactDetails` to a JSON object. To coerce the data type to nested JSON, rename the column to `json_ContactDetails`, either by adding the `json_` prefix at the source manually or through a transformation rule such as the following:

```
{
    "rules": [
    {
    "rule-type": "transformation",
    "rule-id": "1",
    "rule-name": "1",
    "rule-target": "column",
    "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "ContactDetails"
     },
    "rule-action": "rename",
    "value": "json_ContactDetails",
    "old-value": null
    }
    ]
}
```

AWS DMS replicates the `ContactDetails` field as nested JSON, as follows. 

```
{
    "_id": "1",
    "FirstName": "John",
    "LastName": "Doe",
    "ContactDetails": {
        "Home": {
            "Address": "Boston",
            "Phone": "1111111111"
        },
        "Work": {
            "Address": "Boston",
            "Phone": "2222222222"
        }
    }
}
```

#### Using a JSON array (`array_` prefix)
<a name="CHAP_Target.DocumentDB.coercing-datatype.array"></a>

To coerce a data type, you can prefix a column name with `array_` (that is, `array_columnName`), either manually or using a transformation. In this case, AWS DMS considers the column as a JSON array, and creates it as such in the target document.

Suppose that you want to migrate the following document from a MongoDB source endpoint.

```
{
    "_id" : "1",
    "FirstName": "John",
    "LastName": "Doe", 
    "ContactAddresses": ["Boston", "New York"],             
    "ContactPhoneNumbers": ["1111111111", "2222222222"]
}
```

If you don't coerce any of the source data types, the embedded `ContactAddresses` and `ContactPhoneNumbers` lists are migrated as strings.

```
{
    "_id": "1",
    "FirstName": "John",
    "LastName": "Doe", 
    "ContactAddresses": "[\"Boston\", \"New York\"]",             
    "ContactPhoneNumbers": "[\"1111111111\", \"2222222222\"]" 
}
```

However, you can add transformation rules to coerce `ContactAddresses` and `ContactPhoneNumbers` to JSON arrays, as shown in the following table.


****  

| Original source column name | Renamed source column | 
| --- | --- | 
| `ContactAddresses` | `array_ContactAddresses` | 
| `ContactPhoneNumbers` | `array_ContactPhoneNumbers` | 

AWS DMS replicates `ContactAddresses` and `ContactPhoneNumbers` as follows.

```
{
    "_id": "1",
    "FirstName": "John",
    "LastName": "Doe",
    "ContactAddresses": [
        "Boston",
        "New York"
    ],
    "ContactPhoneNumbers": [
        "1111111111",
        "2222222222"
    ]
}
```

### Connecting to Amazon DocumentDB using TLS
<a name="CHAP_Target.DocumentDB.tls"></a>

By default, a newly created Amazon DocumentDB cluster accepts secure connections only using Transport Layer Security (TLS). When TLS is enabled, every connection to Amazon DocumentDB requires a public key.

You can retrieve the public key for Amazon DocumentDB by downloading the file `rds-combined-ca-bundle.pem` from an AWS hosted Amazon S3 bucket. For more information on downloading this file, see [Encrypting connections using TLS](https://docs.aws.amazon.com/documentdb/latest/developerguide/security.encryption.ssl.html) in the *Amazon DocumentDB Developer Guide*.

After you download this .pem file, you can import the public key that it contains into AWS DMS as described following.

#### AWS Management Console
<a name="CHAP_Target.DocumentDB.tls.con"></a>

**To import the public key (.pem) file**

1. Open the AWS DMS console at [https://console.aws.amazon.com/dms](https://console.aws.amazon.com/dms).

1. In the navigation pane, choose **Certificates**.

1. Choose **Import certificate** and do the following:
   + For **Certificate identifier**, enter a unique name for the certificate, for example `docdb-cert`.
   + For **Import file**, navigate to the location where you saved the .pem file.

   When the settings are as you want them, choose **Add new CA certificate**.

#### AWS CLI
<a name="CHAP_Target.DocumentDB.tls.cli"></a>

Use the `aws dms import-certificate` command, as shown in the following example.

```
aws dms import-certificate \
    --certificate-identifier docdb-cert \
    --certificate-pem file://./rds-combined-ca-bundle.pem
```

When you create an AWS DMS target endpoint, provide the certificate identifier (for example, `docdb-cert`). Also, set the SSL mode parameter to `verify-full`.
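Putting these together, a `create-endpoint` call for an Amazon DocumentDB target with TLS might look like the following sketch. The server name, credentials, database name, and certificate ARN are all placeholders.

```
aws dms create-endpoint \
    --endpoint-identifier docdb-target \
    --endpoint-type target \
    --engine-name docdb \
    --server-name my-cluster.cluster-example.us-west-2.docdb.amazonaws.com \
    --port 27017 \
    --username dbadmin \
    --password mypassword \
    --database-name mydb \
    --ssl-mode verify-full \
    --certificate-arn arn:aws:dms:us-west-2:123456789012:cert:docdb-cert
```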

## Connecting to Amazon DocumentDB Elastic Clusters as a target
<a name="CHAP_Target.DocumentDB.data-mapping.elastic-cluster-connect"></a>

In AWS DMS versions 3.4.7 and higher, you can create an Amazon DocumentDB target endpoint as an Elastic Cluster. If you create your target endpoint as an Elastic Cluster, you need to attach a new SSL certificate to your Amazon DocumentDB Elastic Cluster endpoint because your existing SSL certificate won't work.

**To attach a new SSL certificate to your Amazon DocumentDB Elastic Cluster endpoint**

1. In a browser, open [https://www.amazontrust.com/repository/SFSRootCAG2.pem](https://www.amazontrust.com/repository/SFSRootCAG2.pem) and save the contents to a `.pem` file with a unique file name, for example `SFSRootCAG2.pem`. This is the certificate file that you need to import in subsequent steps.

1. Create the Elastic Cluster endpoint and set the following options:

   1. Under **Endpoint Configuration**, choose **Add new CA certificate**.

   1. For **Certificate identifier**, enter **SFSRootCAG2.pem**.

   1. For **Import certificate file**, choose **Choose file**, then navigate to the `SFSRootCAG2.pem` file that you previously downloaded.

   1. Select and open the downloaded `SFSRootCAG2.pem` file.

   1. Choose **Import certificate**.

   1. From the **Choose a certificate** drop down, choose **SFSRootCAG2.pem**.

The new SSL certificate from the downloaded `SFSRootCAG2.pem` file is now attached to your Amazon DocumentDB Elastic Cluster endpoint.

## Ongoing replication with Amazon DocumentDB as a target
<a name="CHAP_Target.DocumentDB.data-mapping.ongoing-replication"></a>

If ongoing replication (change data capture, CDC) is enabled for Amazon DocumentDB as a target, AWS DMS versions 3.5.0 and higher provide a performance improvement of up to twenty times over prior releases. Where prior releases handle up to 250 records per second, AWS DMS now efficiently processes over 5,000 records per second. AWS DMS also ensures that documents in Amazon DocumentDB stay in sync with the source. When a source record is created or updated, AWS DMS must first determine which Amazon DocumentDB record is affected by doing the following:
+ If the source record has a column named `_id`, the value of that column determines the corresponding `_id` in the Amazon DocumentDB collection.
+ If there is no `_id` column, but the source data has a primary key or unique index, then AWS DMS uses that key or index value as the `_id` for the Amazon DocumentDB collection.
+ If the source record doesn't have an `_id` column, a primary key, or a unique index, then AWS DMS matches all of the source columns to the corresponding fields in the Amazon DocumentDB collection.

When a new source record is created, AWS DMS writes a corresponding document to Amazon DocumentDB. If an existing source record is updated, AWS DMS updates the corresponding fields in the target document in Amazon DocumentDB. Any fields that exist in the target document but not in the source record remain untouched.

When a source record is deleted, AWS DMS deletes the corresponding document from Amazon DocumentDB.

### Structural changes (DDL) at the source
<a name="CHAP_Target.DocumentDB.data-mapping.ongoing-replication.ddl"></a>

With ongoing replication, any changes to source data structures (such as tables, columns, and so on) are propagated to their counterparts in Amazon DocumentDB. In relational databases, these changes are initiated using data definition language (DDL) statements. You can see how AWS DMS propagates these changes to Amazon DocumentDB in the following table.


****  

| DDL at source | Effect at Amazon DocumentDB target | 
| --- | --- | 
| CREATE TABLE | Creates an empty collection. | 
| Statement that renames a table (RENAME TABLE, ALTER TABLE...RENAME, and similar) | Renames the collection. | 
| TRUNCATE TABLE | Removes all the documents from the collection, but only if HandleSourceTableTruncated is true. For more information, see [Task settings for change processing DDL handling](CHAP_Tasks.CustomizingTasks.TaskSettings.DDLHandling.md). | 
| DROP TABLE | Deletes the collection, but only if HandleSourceTableDropped is true. For more information, see [Task settings for change processing DDL handling](CHAP_Tasks.CustomizingTasks.TaskSettings.DDLHandling.md). | 
| Statement that adds a column to a table (ALTER TABLE...ADD and similar) | The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source, the new field is added to the target document. | 
| ALTER TABLE...RENAME COLUMN | The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source, the newly named field is added to the target document. | 
| ALTER TABLE...DROP COLUMN | The DDL statement is ignored, and a warning is issued. | 
| Statement that changes the column data type (ALTER COLUMN...MODIFY and similar) | The DDL statement is ignored, and a warning is issued. When the first INSERT is performed at the source with the new data type, the target document is created with a field of that new data type. | 

## Limitations to using Amazon DocumentDB as a target
<a name="CHAP_Target.DocumentDB.limitations"></a>

The following limitations apply when using Amazon DocumentDB as a target for AWS DMS:
+ In Amazon DocumentDB, collection names can't contain the dollar symbol ($). In addition, database names can't contain any Unicode characters.
+ AWS DMS doesn't support merging of multiple source tables into a single Amazon DocumentDB collection.
+ When AWS DMS processes changes from a source table that doesn't have a primary key, any LOB columns in that table are ignored.
+ If the **Change table** option is enabled and AWS DMS encounters a source column named `_id`, then that column appears as `__id` (two underscores) in the change table.
+ If you choose Oracle as a source endpoint, then the Oracle source must have full supplemental logging enabled. Otherwise, if there are columns at the source that weren't changed, then the data is loaded into Amazon DocumentDB as null values.
+ The replication task setting `TargetTablePrepMode: TRUNCATE_BEFORE_LOAD` isn't supported for use with an Amazon DocumentDB target endpoint.
+ MongoDB capped collections aren't supported in Amazon DocumentDB. However, AWS DMS automatically migrates such objects as uncapped collections on the target.
+ Parallel CDC apply to Amazon DocumentDB targets can cause duplicate key errors or stalled CDC apply for workloads that use secondary unique indexes or require strict ordering of changes. For such workloads, use the default single-threaded CDC apply configuration.

## Using endpoint settings with Amazon DocumentDB as a target
<a name="CHAP_Target.DocumentDB.ECAs"></a>

You can use endpoint settings to configure your Amazon DocumentDB target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--doc-db-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following table shows the endpoint settings that you can use with Amazon DocumentDB as a target.


| Attribute name | Valid values | Default value and description | 
| --- | --- | --- | 
|   `replicateShardCollections`   |  boolean `true` `false`  |  When `true`, this endpoint setting has the following effects and imposes the following limitations: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.DocumentDB.html)  | 
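For example, you might turn on shard-collection replication for an existing endpoint with a `modify-endpoint` call like the following sketch. The endpoint ARN is a placeholder, and the attribute name is shown in the API's casing.

```
aws dms modify-endpoint \
    --endpoint-arn arn:aws:dms:us-west-2:123456789012:endpoint:EXAMPLEENDPOINT \
    --doc-db-settings '{"ReplicateShardCollections": true}'
```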

## Target data types for Amazon DocumentDB
<a name="CHAP_Target.DocumentDB.datatypes"></a>

In the following table, you can find the Amazon DocumentDB target data types that are supported when using AWS DMS, and the default mapping from AWS DMS data types. For more information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md).


|  AWS DMS data type  |  Amazon DocumentDB data type  | 
| --- | --- | 
|  BOOLEAN  |  Boolean  | 
|  BYTES  |  Binary data  | 
|  DATE  | Date | 
|  TIME  | String (UTF8) | 
|  DATETIME  | Date | 
|  INT1  | 32-bit integer | 
|  INT2  |  32-bit integer  | 
|  INT4  | 32-bit integer | 
|  INT8  |  64-bit integer  | 
|  NUMERIC  | String (UTF8) | 
|  REAL4  |  Double  | 
|  REAL8  | Double | 
|  STRING  |  If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8).  | 
|  UINT1  | 32-bit integer | 
|  UINT2  | 32-bit integer | 
|  UINT4  | 64-bit integer | 
|  UINT8  |  String (UTF8)  | 
|  WSTRING  | If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8). | 
|  BLOB  | Binary | 
|  CLOB  | If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8). | 
|  NCLOB  | If the data is recognized as JSON, then AWS DMS migrates it to Amazon DocumentDB as a document. Otherwise, the data is mapped to String (UTF8). | 

# Using Amazon Neptune as a target for AWS Database Migration Service
<a name="CHAP_Target.Neptune"></a>

Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Neptune is a purpose-built, high-performance graph database engine. This engine is optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports the popular graph query languages Apache TinkerPop Gremlin and W3C's SPARQL. For more information on Amazon Neptune, see [What is Amazon Neptune?](https://docs.aws.amazon.com/neptune/latest/userguide/intro.html) in the *Amazon Neptune User Guide*. 

Without a graph database such as Neptune, you probably model highly connected data in a relational database. Because the data has potentially dynamic connections, applications that use such data sources have to model connected data queries in SQL. This approach requires you to write an extra layer to convert graph queries into SQL. Also, relational databases come with schema rigidity. Any changes in the schema to model changing connections require downtime and additional maintenance of the query conversion to support the new schema. Query performance is another big constraint to consider when designing your applications.

Graph databases can greatly simplify such situations. Schema freedom, a rich graph query layer (Gremlin or SPARQL), and indexes optimized for graph queries increase flexibility and performance. The Amazon Neptune graph database also has enterprise features such as encryption at rest, a secure authorization layer, default backups, Multi-AZ support, read replica support, and others.

Using AWS DMS, you can migrate relational data that models a highly connected graph from a source endpoint for any supported SQL database to a Neptune target endpoint.

For more details, see the following.

**Topics**
+ [Overview of migrating to Amazon Neptune as a target](#CHAP_Target.Neptune.MigrationOverview)
+ [Specifying endpoint settings for Amazon Neptune as a target](#CHAP_Target.Neptune.EndpointSettings)
+ [Creating an IAM service role for accessing Amazon Neptune as a target](#CHAP_Target.Neptune.ServiceRole)
+ [Specifying graph-mapping rules using Gremlin and R2RML for Amazon Neptune as a target](#CHAP_Target.Neptune.GraphMapping)
+ [Data types for Gremlin and R2RML migration to Amazon Neptune as a target](#CHAP_Target.Neptune.DataTypes)
+ [Limitations of using Amazon Neptune as a target](#CHAP_Target.Neptune.Limitations)

## Overview of migrating to Amazon Neptune as a target
<a name="CHAP_Target.Neptune.MigrationOverview"></a>

Before starting a migration to a Neptune target, create the following resources in your AWS account:
+ A Neptune cluster for the target endpoint. 
+ A SQL relational database supported by AWS DMS for the source endpoint. 
+ An Amazon S3 bucket for the target endpoint. Create this S3 bucket in the same AWS Region as your Neptune cluster. AWS DMS uses this S3 bucket as intermediate file storage for the target data that it bulk loads to the Neptune database. For more information on creating an S3 bucket, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) in the *Amazon Simple Storage Service User Guide.*
+ A virtual private cloud (VPC) endpoint for S3 in the same VPC as the Neptune cluster. 
+ An AWS Identity and Access Management (IAM) role that includes an IAM policy. This policy should grant the `GetObject`, `PutObject`, `DeleteObject`, and `ListBucket` permissions on the S3 bucket for your target endpoint. Both AWS DMS and Neptune assume this role to access the target S3 bucket and the Neptune database. For more information, see [Creating an IAM service role for accessing Amazon Neptune as a target](#CHAP_Target.Neptune.ServiceRole).

After you have these resources, setting up and starting a migration to a Neptune target is similar to any full load migration using the console or DMS API. However, a migration to a Neptune target requires some unique steps.

**To migrate an AWS DMS relational database to Neptune**

1. Create a replication instance as described in [Creating a replication instance](CHAP_ReplicationInstance.Creating.md).

1. Create and test a SQL relational database supported by AWS DMS for the source endpoint.

1. Create and test the target endpoint for your Neptune database. 

   To connect the target endpoint to the Neptune database, specify the server name for either the Neptune cluster endpoint or the Neptune writer instance endpoint. Also, specify the S3 bucket folder for AWS DMS to store its intermediate files for bulk load to the Neptune database. 

   During migration, AWS DMS stores all migrated target data in this S3 bucket folder up to a maximum file size that you specify. When this file storage reaches this maximum size, AWS DMS bulk loads the stored S3 data into the target database. It clears the folder to enable storage of any additional target data for subsequent loading to the target database. For more information on specifying these settings, see [Specifying endpoint settings for Amazon Neptune as a target](#CHAP_Target.Neptune.EndpointSettings).

1. Create a full-load replication task with the resources created in steps 1–3 and do the following: 

   1. Use task table mapping as usual to identify specific source schemas, tables, and views to migrate from your relational database using appropriate selection and transformation rules. For more information, see [Using table mapping to specify task settings](CHAP_Tasks.CustomizingTasks.TableMapping.md). 

   1. Specify target mappings by choosing one of the following to specify mapping rules from source tables and views to your Neptune target database graph:
      + Gremlin JSON – For information on using Gremlin JSON to load a Neptune database, see [Gremlin load data format](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html) in the *Amazon Neptune User Guide*.
      + SPARQL RDB to Resource Description Framework Mapping Language (R2RML) – For information on using SPARQL R2RML, see the W3C specification [R2RML: RDB to RDF mapping language](https://www.w3.org/TR/r2rml/).

   1. Do one of the following:
      + Using the AWS DMS console, specify graph-mapping options using **Graph mapping rules** on the **Create database migration task** page. 
      + Using the AWS DMS API, specify these options using the `TaskData` request parameter of the `CreateReplicationTask` API call. 

      For more information and examples using Gremlin JSON and SPARQL R2RML to specify graph-mapping rules, see [Specifying graph-mapping rules using Gremlin and R2RML for Amazon Neptune as a target](#CHAP_Target.Neptune.GraphMapping).

1. Start the replication for your migration task.
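The task-creation step above can be sketched programmatically. The following is a minimal Python sketch, not a definitive implementation, of assembling the `CreateReplicationTask` request parameters with Gremlin JSON graph-mapping rules passed in `TaskData`. You might pass the resulting dictionary to an AWS SDK client's `create_replication_task` call (for example, the boto3 DMS client); all identifiers and file paths here are hypothetical.

```python
import json

def build_neptune_task_params(task_id, source_arn, target_arn, instance_arn,
                              table_mappings_path, graph_mappings_path):
    """Assemble CreateReplicationTask parameters for a full-load migration to a
    Neptune target, with Gremlin JSON graph-mapping rules passed via TaskData."""
    with open(table_mappings_path) as f:
        table_mappings = f.read()
    with open(graph_mappings_path) as f:
        task_data = f.read()
    json.loads(task_data)  # fail fast if the Gremlin graph-mapping rules aren't valid JSON
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load",  # Neptune targets support full load only
        "TableMappings": table_mappings,
        "TaskData": task_data,
    }
```

Note that R2RML mappings are Turtle rather than JSON, so the JSON validation step applies only to Gremlin-format rules.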

## Specifying endpoint settings for Amazon Neptune as a target
<a name="CHAP_Target.Neptune.EndpointSettings"></a>

To create or modify a target endpoint, you can use the console or the `CreateEndpoint` or `ModifyEndpoint` API operations. 

For a Neptune target in the AWS DMS console, specify **Endpoint-specific settings** on the **Create endpoint** or **Modify endpoint** console page. For `CreateEndpoint` and `ModifyEndpoint`, specify request parameters for the `NeptuneSettings` option. The following example shows how to do this using the CLI. 

```
aws dms create-endpoint --endpoint-identifier my-neptune-target-endpoint
--endpoint-type target --engine-name neptune
--server-name my-neptune-db.cluster-cspckvklbvgf.us-east-1.neptune.amazonaws.com
--port 8192
--neptune-settings
     '{"ServiceAccessRoleArn":"arn:aws:iam::123456789012:role/myNeptuneRole",
       "S3BucketName":"amzn-s3-demo-bucket",
       "S3BucketFolder":"amzn-s3-demo-bucket-folder",
       "ErrorRetryDuration":57,
       "MaxFileSize":100,
       "MaxRetryCount": 10,
       "IAMAuthEnabled":false}'
```

Here, the CLI `--server-name` option specifies the server name for the Neptune cluster writer endpoint. Or you can specify the server name for a Neptune writer instance endpoint. 

The `--neptune-settings` option request parameters follow:
+ `ServiceAccessRoleArn` – (Required) The Amazon Resource Name (ARN) of the service role that you created for the Neptune target endpoint. For more information, see [Creating an IAM service role for accessing Amazon Neptune as a target](#CHAP_Target.Neptune.ServiceRole).
+ `S3BucketName` – (Required) The name of the S3 bucket where DMS can temporarily store migrated graph data in .csv files before bulk loading it to the Neptune target database. DMS maps the SQL source data to graph data before storing it in these .csv files.
+ `S3BucketFolder` – (Required) A folder path where you want DMS to store migrated graph data in the S3 bucket specified by `S3BucketName`.
+ `ErrorRetryDuration` – (Optional) The number of milliseconds for DMS to wait to retry a bulk load of migrated graph data to the Neptune target database before raising an error. The default is 250.
+ `MaxFileSize` – (Optional) The maximum size in KB of migrated graph data stored in a .csv file before DMS bulk loads the data to the Neptune target database. The default is 1,048,576 KB (1 GB). If the bulk load succeeds, DMS clears the S3 bucket folder, ready to store the next batch of migrated graph data.
+ `MaxRetryCount` – (Optional) The number of times for DMS to retry a bulk load of migrated graph data to the Neptune target database before raising an error. The default is 5.
+ `IAMAuthEnabled` – (Optional) If you want IAM authorization enabled for this endpoint, set this parameter to `true` and attach the appropriate IAM policy document to your service role specified by `ServiceAccessRoleArn`. The default is `false`.
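To illustrate these parameters, here is a minimal Python sketch that assembles a `NeptuneSettings` structure using the defaults documented above. It's a convenience wrapper for illustration, not part of any AWS SDK.

```python
def neptune_settings(service_access_role_arn, s3_bucket_name, s3_bucket_folder, *,
                     error_retry_duration=250,  # milliseconds (documented default)
                     max_file_size=1048576,     # KB, i.e. 1 GB (documented default)
                     max_retry_count=5,         # documented default
                     iam_auth_enabled=False):   # documented default
    """Build the NeptuneSettings request structure for the CreateEndpoint
    or ModifyEndpoint API operations, filling in the documented defaults."""
    return {
        "ServiceAccessRoleArn": service_access_role_arn,
        "S3BucketName": s3_bucket_name,
        "S3BucketFolder": s3_bucket_folder,
        "ErrorRetryDuration": error_retry_duration,
        "MaxFileSize": max_file_size,
        "MaxRetryCount": max_retry_count,
        "IAMAuthEnabled": iam_auth_enabled,
    }
```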

## Creating an IAM service role for accessing Amazon Neptune as a target
<a name="CHAP_Target.Neptune.ServiceRole"></a>

To access Neptune as a target, create a service role using IAM. Depending on your Neptune endpoint configuration, attach to this role some or all of the following IAM policy and trust documents. When you create the Neptune endpoint, you provide the ARN of this service role. Doing so enables AWS DMS and Amazon Neptune to assume permissions to access both Neptune and its associated Amazon S3 bucket.

If you set the `IAMAuthEnabled` parameter in `NeptuneSettings` to `true` in your Neptune endpoint configuration, attach an IAM policy like the following to your service role. If you set `IAMAuthEnabled` to `false`, you can ignore this policy.

```
// Policy to access Neptune

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "neptune-db:*",
            "Resource": "arn:aws:neptune-db:us-east-1:123456789012:cluster-CLG7H7FHK54AZGHEH6MNS55JKM/*"
        }
    ]
}
```

The preceding IAM policy allows full access to the Neptune target cluster specified by `Resource`.

Attach an IAM policy like the following to your service role. This policy allows DMS to temporarily store migrated graph data in the S3 bucket that you created for bulk loading to the Neptune target database.

```
//Policy to access S3 bucket

{
	"Version": "2012-10-17",
	"Statement": [{
			"Sid": "ListObjectsInBucket0",
			"Effect": "Allow",
			"Action": "s3:ListBucket",
			"Resource": [
				"arn:aws:s3:::amzn-s3-demo-bucket"
			]
		},
		{
			"Sid": "AllObjectActions",
			"Effect": "Allow",
			"Action": ["s3:GetObject",
				"s3:PutObject",
				"s3:DeleteObject"
			],

			"Resource": [
				"arn:aws:s3:::amzn-s3-demo-bucket/*"
			]
		},
		{
			"Sid": "ListObjectsInBucket1",
			"Effect": "Allow",
			"Action": "s3:ListBucket",
			"Resource": [
				"arn:aws:s3:::amzn-s3-demo-bucket",
				"arn:aws:s3:::amzn-s3-demo-bucket/"
			]
		}
	]
}
```

The preceding IAM policy allows your account to query the contents of the S3 bucket (`arn:aws:s3:::amzn-s3-demo-bucket`) created for your Neptune target. It also allows your account to fully operate on all objects in that bucket (`arn:aws:s3:::amzn-s3-demo-bucket/*`).

Edit the trust relationship of your service role to include the following trust policy, which allows both AWS DMS and the Amazon Neptune database service to assume the role.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "neptune",
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```


For information about specifying this service role for your Neptune target endpoint, see [Specifying endpoint settings for Amazon Neptune as a target](#CHAP_Target.Neptune.EndpointSettings).

## Specifying graph-mapping rules using Gremlin and R2RML for Amazon Neptune as a target
<a name="CHAP_Target.Neptune.GraphMapping"></a>

The graph-mapping rules that you create specify how data extracted from a SQL relational database source is loaded into a Neptune database cluster target. The format of these mapping rules differs depending on whether the rules are for loading property-graph data using Apache TinkerPop Gremlin or Resource Description Framework (RDF) data using R2RML. Following, you can find information about these formats and where to learn more.

You can specify these mapping rules when you create the migration task using either the console or DMS API. 

Using the console, specify these mapping rules using **Graph mapping rules** on the **Create database migration task** page. In **Graph mapping rules**, you can enter and edit the mapping rules directly using the editor provided. Or you can browse for a file that contains the mapping rules in the appropriate graph-mapping format. 

Using the API, specify these options using the `TaskData` request parameter of the `CreateReplicationTask` API call. Set `TaskData` to the path of a file containing the mapping rules in the appropriate graph-mapping format.

### Graph-mapping rules for generating property-graph data using Gremlin
<a name="CHAP_Target.Neptune.GraphMapping.Gremlin"></a>

To generate the property-graph data using Gremlin, specify a JSON object with a mapping rule for each graph entity to be generated from the source data. The format of this JSON is defined specifically for bulk loading Amazon Neptune. The following template shows what each rule in this object looks like.

```
{
    "rules": [
        {
            "rule_id": "(an identifier for this rule)",
            "rule_name": "(a name for this rule)",
            "table_name": "(the name of the table or view being loaded)",
            "vertex_definitions": [
                {
                    "vertex_id_template": "{col1}",
                    "vertex_label": "(the vertex to create)",
                    "vertex_definition_id": "(an identifier for this vertex)",
                    "vertex_properties": [
                        {
                            "property_name": "(name of the property)",
                            "property_value_template": "{col2} or text",
                            "property_value_type": "(data type of the property)"
                        }
                    ]
                }
            ]
        },
        {
            "rule_id": "(an identifier for this rule)",
            "rule_name": "(a name for this rule)",
            "table_name": "(the name of the table or view being loaded)",
            "edge_definitions": [
                {
                    "from_vertex": {
                        "vertex_id_template": "{col1}",
                        "vertex_definition_id": "(an identifier for the vertex referenced above)"
                    },
                    "to_vertex": {
                        "vertex_id_template": "{col3}",
                        "vertex_definition_id": "(an identifier for the vertex referenced above)"
                    },
                    "edge_id_template": {
                        "label": "(the edge label to add)",
                        "template": "{col1}_{col3}"
                    },
                    "edge_properties":[
                        {
                            "property_name": "(the property to add)",
                            "property_value_template": "{col4} or text",
                            "property_value_type": "(data type like String, int, double)"
                        }
                    ]
                }
            ]
        }
    ]
}
```

The presence of a vertex label implies that the vertex is being created here. Its absence implies that the vertex is created by a different source, and this definition is only adding vertex properties. Specify as many vertex and edge definitions as required to specify the mappings for your entire relational database source.

A sample rule for an `employee` table follows.

```
{
    "rules": [
        {
            "rule_id": "1",
            "rule_name": "vertex_mapping_rule_from_nodes",
            "table_name": "nodes",
            "vertex_definitions": [
                {
                    "vertex_id_template": "{emp_id}",
                    "vertex_label": "employee",
                    "vertex_definition_id": "1",
                    "vertex_properties": [
                        {
                            "property_name": "name",
                            "property_value_template": "{emp_name}",
                            "property_value_type": "String"
                        }
                    ]
                }
            ]
        },
        {
            "rule_id": "2",
            "rule_name": "edge_mapping_rule_from_emp",
            "table_name": "nodes",
            "edge_definitions": [
                {
                    "from_vertex": {
                        "vertex_id_template": "{emp_id}",
                        "vertex_definition_id": "1"
                    },
                    "to_vertex": {
                        "vertex_id_template": "{mgr_id}",
                        "vertex_definition_id": "1"
                    },
                    "edge_id_template": {
                        "label": "reportsTo",
                        "template": "{emp_id}_{mgr_id}"
                    },
                    "edge_properties":[
                        {
                            "property_name": "team",
                            "property_value_template": "{team}",
                            "property_value_type": "String"
                        }
                    ]
                }
            ]
        }
    ]
}
```

Here, the vertex and edge definitions map a reporting relationship from an `employee` node with an employee ID (`emp_id`) to the `employee` node of its manager (`mgr_id`).
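To see how such a rule expands, the following Python sketch applies a vertex definition to one source row using simple `{column}` template substitution. This is an illustration of the mapping format, not DMS's actual implementation.

```python
def apply_vertex_rule(rule, row):
    """Expand each vertex definition in a Gremlin graph-mapping rule
    against a single source row (illustrative template substitution)."""
    vertices = []
    for vdef in rule["vertex_definitions"]:
        vertex = {"~id": vdef["vertex_id_template"].format(**row)}
        if "vertex_label" in vdef:  # a label means the vertex is created here
            vertex["~label"] = vdef["vertex_label"]
        for prop in vdef.get("vertex_properties", []):
            vertex[prop["property_name"]] = prop["property_value_template"].format(**row)
        vertices.append(vertex)
    return vertices
```

For rule 1 in the preceding sample, a row with `emp_id` of `101` and `emp_name` of `Ana` would expand to a vertex with ID `101`, label `employee`, and a `name` property of `Ana`.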

For more information about creating graph-mapping rules using Gremlin JSON, see [Gremlin load data format](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html) in the *Amazon Neptune User Guide*.

### Graph-mapping rules for generating RDF/SPARQL data
<a name="CHAP_Target.Neptune.GraphMapping.R2RML"></a>

If you are loading RDF data to be queried using SPARQL, write the graph-mapping rules in R2RML. R2RML is a standard W3C language for mapping relational data to RDF. In an R2RML file, a *triples map* (for example, `<#TriplesMap1>` following) specifies a rule for translating each row of a logical table to zero or more RDF triples. A *subject map* (for example, any `rr:subjectMap` following) specifies a rule for generating the subjects of the RDF triples generated by a triples map. A *predicate-object map* (for example, any `rr:predicateObjectMap` following) is a function that creates one or more predicate-object pairs for each row of a logical table.

A simple example for a `nodes` table follows.

```
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns#>.

<#TriplesMap1>
    rr:logicalTable [ rr:tableName "nodes" ];
    rr:subjectMap [
        rr:template "http://data.example.com/employee/{id}";
        rr:class ex:Employee;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:name;
        rr:objectMap [ rr:column "label" ];
    ].
```

In the previous example, the mapping creates an `ex:Employee` node for each row of the `nodes` table, with a `name` property taken from the `label` column.

Another simple example for a `Student` table follows.

```
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

<#TriplesMap2>
    rr:logicalTable [ rr:tableName "Student" ];
    rr:subjectMap   [ rr:template "http://example.com/{ID}{Name}";
                      rr:class foaf:Person ];
    rr:predicateObjectMap [
        rr:predicate ex:id ;
        rr:objectMap  [ rr:column "ID";
                        rr:datatype xsd:integer ]
    ];
    rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap  [ rr:column "Name" ]
    ].
```

In the previous example, the mapping creates a `foaf:Person` node for each row of the `Student` table, with an ID and a name, using the friend-of-a-friend (FOAF) vocabulary.

For more information about creating graph-mapping rules using SPARQL R2RML, see the W3C specification [R2RML: RDB to RDF mapping language](https://www.w3.org/TR/r2rml/).

## Data types for Gremlin and R2RML migration to Amazon Neptune as a target
<a name="CHAP_Target.Neptune.DataTypes"></a>

AWS DMS performs data type mapping from your SQL source endpoint to your Neptune target in one of two ways. Which way you use depends on the graph mapping format that you're using to load the Neptune database: 
+ Apache TinkerPop Gremlin, using a JSON representation of the migration data.
+ W3C's SPARQL, using an R2RML representation of the migration data. 

For more information on these two graph mapping formats, see [Specifying graph-mapping rules using Gremlin and R2RML for Amazon Neptune as a target](#CHAP_Target.Neptune.GraphMapping).

Following, you can find descriptions of the data type mappings for each format.

### SQL source to Gremlin target data type mappings
<a name="CHAP_Target.Neptune.DataTypes.Gremlin"></a>

The following table shows the data type mappings from a SQL source to a Gremlin formatted target. 

AWS DMS maps any unlisted SQL source data type to a Gremlin `String`.



[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Neptune.html)

For more information on the Gremlin data types for loading Neptune, see [Gremlin data types](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html#bulk-load-tutorial-format-gremlin-datatypes) in the *Neptune User Guide*.

### SQL source to R2RML (RDF) target data type mappings
<a name="CHAP_Target.Neptune.DataTypes.R2RML"></a>

The following table shows the data type mappings from a SQL source to an R2RML formatted target.

All listed RDF data types are case-sensitive, except RDF literal. AWS DMS maps any unlisted SQL source data type to an RDF literal. 

An *RDF literal* is one of a variety of literal lexical forms and data types. For more information, see [RDF literals](https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-Literal) in the W3C specification *Resource Description Framework (RDF): Concepts and Abstract Syntax*.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Neptune.html)

For more information on the RDF data types for loading Neptune and their mappings to SQL source data types, see [Datatype conversions](https://www.w3.org/TR/r2rml/#datatype-conversions) in the W3C specification *R2RML: RDB to RDF Mapping Language*.

## Limitations of using Amazon Neptune as a target
<a name="CHAP_Target.Neptune.Limitations"></a>

The following limitations apply when using Neptune as a target:
+ AWS DMS currently supports full load tasks only for migration to a Neptune target. Change data capture (CDC) migration to a Neptune target isn't supported.
+ Make sure that your target Neptune database is manually cleared of all data before starting the migration task, as in the following examples.

  To drop all data (vertices and edges) within the graph, run the following Gremlin command.

  ```
  gremlin> g.V().drop().iterate()
  ```

  To drop vertices that have the label `'customer'`, run the following Gremlin command.

  ```
  gremlin> g.V().hasLabel('customer').drop()
  ```
**Note**  
It can take some time to drop a large dataset. You might want to iterate `drop()` with a limit, for example, `limit(1000)`.

  To drop edges that have the label `'rated'`, run the following Gremlin command.

  ```
  gremlin> g.E().hasLabel('rated').drop()
  ```
**Note**  
It can take some time to drop a large dataset. You might want to iterate `drop()` with a limit, for example `limit(1000)`.
+ The DMS API operation `DescribeTableStatistics` can return inaccurate results about a given table because of the nature of Neptune graph data structures.

  During migration, AWS DMS scans each source table and uses graph mapping to convert the source data into a Neptune graph. The converted data is first stored in the S3 bucket folder specified for the target endpoint. If the source is scanned and this intermediate S3 data is generated successfully, `DescribeTableStatistics` assumes that the data was successfully loaded into the Neptune target database. But this isn't always true. To verify that the data was loaded correctly for a given table, compare `count()` return values at both ends of the migration for that table. 

  In the following example, AWS DMS has loaded a `customer` table from the source database, which is assigned the label `'customer'` in the target Neptune database graph. You can make sure that this label is written to the target database. To do this, compare the number of `customer` rows available from the source database with the number of `'customer'` labeled rows loaded in the Neptune target database after the task completes.

  To get the number of customer rows available from the source database using SQL, run the following.

  ```
  select count(*) from customer;
  ```

  To get the number of `'customer'` labeled rows loaded into the target database graph using Gremlin, run the following.

  ```
  gremlin> g.V().hasLabel('customer').count()
  ```
+ Currently, if any single table fails to load, the whole task fails. Unlike in a relational database target, data in Neptune is highly connected, which makes it impossible in many cases to resume a task. If a task can't be resumed successfully because of this type of data load failure, create a new task to load the table that failed to load. Before running this new task, manually clear the partially loaded table from the Neptune target.
**Note**  
You can resume a task that fails migration to a Neptune target if the failure is recoverable (for example, a network transit error).
+ AWS DMS supports most standards for R2RML. However, AWS DMS doesn't support certain R2RML standards, including inverse expressions, joins, and views. A work-around for an R2RML view is to create a corresponding custom SQL view in the source database. In the migration task, use table mapping to choose the view as input. Then map the view to a table that is then consumed by R2RML to generate graph data.
+ When you migrate source data with unsupported SQL data types, the resulting target data can have a loss of precision. For more information, see [Data types for Gremlin and R2RML migration to Amazon Neptune as a target](#CHAP_Target.Neptune.DataTypes).
+ AWS DMS doesn't support migrating LOB data into a Neptune target.

# Using Redis OSS as a target for AWS Database Migration Service
<a name="CHAP_Target.Redis"></a>

Redis OSS is an open-source in-memory data structure store used as a database, cache, and message broker. Managing data in-memory can result in read or write operations taking less than a millisecond, and hundreds of millions of operations performed each second. As an in-memory data store, Redis OSS powers the most demanding applications requiring sub-millisecond response times.

Using AWS DMS, you can migrate data from any supported source database to a target Redis OSS data store with minimal downtime. For additional information about Redis OSS, see the [Redis OSS Documentation](https://redis.io/documentation).

In addition to on-premises Redis OSS, AWS Database Migration Service supports the following:
+ [Amazon ElastiCache (Redis OSS)](https://aws.amazon.com/elasticache/redis/) as a target data store. ElastiCache (Redis OSS) works with your Redis OSS clients and uses the open Redis OSS data format to store your data.
+ [Amazon MemoryDB](https://aws.amazon.com/memorydb/) as a target data store. MemoryDB is compatible with Redis OSS and enables you to build applications using all the Redis OSS data structures, APIs, and commands in use today.

For additional information about working with Redis OSS as a target for AWS DMS, see the following sections: 

**Topics**
+ [Prerequisites for using a Redis OSS cluster as a target for AWS DMS](#CHAP_Target.Redis.Prerequisites)
+ [Limitations when using Redis as a target for AWS Database Migration Service](#CHAP_Target.Redis.Limitations)
+ [Migrating data from a relational or non-relational database to a Redis OSS target](#CHAP_Target.Redis.Migrating)
+ [Specifying endpoint settings for Redis OSS as a target](#CHAP_Target.Redis.EndpointSettings)

## Prerequisites for using a Redis OSS cluster as a target for AWS DMS
<a name="CHAP_Target.Redis.Prerequisites"></a>

DMS supports an on-premises Redis OSS target in a standalone configuration, or as a Redis OSS cluster where data is automatically *sharded* across multiple nodes. Sharding is the process of separating data into smaller chunks called shards that are spread across multiple servers or nodes. In effect, a shard is a data partition that contains a subset of the total data set, and serves a slice of the overall workload.

Because Redis OSS is a key-value NoSQL data store, the Redis OSS key naming convention to use when your source is a relational database is **schema-name.table-name.primary-key**. In Redis OSS, the key and the value must not contain the special character `%`. Otherwise, DMS skips the record.
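As a sketch of this convention (an illustration, not DMS code), the following Python shows how such a key is composed from a relational source and when a record would be skipped:

```python
def redis_key(schema_name, table_name, primary_key):
    """Compose the Redis OSS key DMS uses when the source is relational:
    schema-name.table-name.primary-key."""
    return f"{schema_name}.{table_name}.{primary_key}"

def dms_skips_record(key, value):
    """DMS skips a record whose key or value contains the '%' character."""
    return "%" in key or "%" in value
```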

**Note**  
If you are using ElastiCache (Redis OSS) as a target, DMS supports *cluster mode enabled* configurations only. For more information about using ElastiCache (Redis OSS) version 6.x or higher to create a cluster mode enabled target data store, see [Getting started](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/GettingStarted.html) in the *Amazon ElastiCache (Redis OSS) User Guide*. 

Before you begin a database migration, launch your Redis OSS cluster with the following criteria.
+ Your cluster has one or more shards.
+ If you're using an ElastiCache (Redis OSS) target, ensure that your cluster doesn't use IAM role-based access control. Instead, use Redis OSS Auth to authenticate users.
+ Enable Multi-AZ (Availability Zones).
+ Ensure the cluster has sufficient memory available to fit the data to be migrated from your database. 
+ Make sure that your target Redis OSS cluster is clear of all data before starting the initial migration task.

You should determine your security requirements for the data migration prior to creating your cluster configuration. DMS supports migration to target replication groups regardless of their encryption configuration. But you can enable or disable encryption only when you create your cluster configuration.

## Limitations when using Redis as a target for AWS Database Migration Service
<a name="CHAP_Target.Redis.Limitations"></a>

The following limitations apply when using Redis OSS as a target:
+ Because Redis OSS is a key-value NoSQL data store, the Redis OSS key naming convention to use when your source is a relational database is `schema-name.table-name.primary-key`. 
+ In Redis OSS, the key and the value can't contain the special character `%`. Otherwise, DMS skips the record.
+ DMS doesn't migrate rows that contain the `%` character.
+ DMS doesn't migrate fields that contain the `%` character in the field name.
+ Full LOB mode isn't supported.
+ A private certificate authority (CA) isn't supported when using ElastiCache (Redis OSS) as a target.
+ AWS DMS doesn't support source data that contains embedded `'\0'` characters when using Redis OSS as a target endpoint. Such data is truncated at the first `'\0'` character.

## Migrating data from a relational or non-relational database to a Redis OSS target
<a name="CHAP_Target.Redis.Migrating"></a>

You can migrate data from any source SQL or NoSQL data store directly to a Redis OSS target. Setting up and starting a migration to a Redis OSS target is similar to any full load and change data capture migration using the DMS console or API. To perform a database migration to a Redis OSS target, you do the following.
+ Create a replication instance to perform all the processes for the migration. For more information, see [Creating a replication instance](CHAP_ReplicationInstance.Creating.md).
+ Specify a source endpoint. For more information, see [Creating source and target endpoints](CHAP_Endpoints.Creating.md).
+ Locate the DNS name and port number of your cluster.
+ Download a certificate bundle that you can use to verify SSL connections.
+ Specify a target endpoint, as described below.
+ Create a task or set of tasks to define what tables and replication processes you want to use. For more information, see [Creating a task](CHAP_Tasks.Creating.md).
+ Migrate data from your source database to your target cluster.

You begin a database migration in one of two ways:

1. You can choose the AWS DMS console and perform each step there.

1. You can use the AWS Command Line Interface (AWS CLI). For more information about using the CLI with AWS DMS, see [AWS CLI for AWS DMS](http://docs.aws.amazon.com/cli/latest/reference/dms/index.html).

**To locate the DNS name and port number of your cluster**
+ Use the following AWS CLI command, providing the name of your replication group for `--replication-group-id`.

  ```
  aws elasticache describe-replication-groups --replication-group-id myreplgroup
  ```

  Here, the output shows the DNS name in the `Address` attribute and the port number in the `Port` attribute of the primary node in the cluster. 

  ```
  ...
  "ReadEndpoint": {
      "Port": 6379,
      "Address": "myreplgroup-111.1abc1d.1111.uuu1.cache.example.com"
  }
  ...
  ```

  If you are using MemoryDB as your target, use the following AWS CLI command to find the endpoint address of your Redis OSS cluster. 

  ```
  aws memorydb describe-clusters --cluster-name mycluster
  ```

**To download a certificate bundle to verify SSL connections**
+ Enter the following `wget` command at the command line. Wget is a free GNU command-line utility for downloading files from the internet.

  ```
  wget https://s3.aws-api-domain/rds-downloads/rds-combined-ca-bundle.pem
  ```

  Here, `aws-api-domain` completes the Amazon S3 domain in your AWS Region that is required to access the specified S3 bucket and the rds-combined-ca-bundle.pem file that it provides.

**To create a target endpoint using the AWS DMS console**

This endpoint is for your Redis OSS target that is already running. 
+ On the console, choose **Endpoints** from the navigation pane and then choose **Create Endpoint**. The following table describes the settings.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Redis.html)

When you're finished providing all information for your endpoint, AWS DMS creates your Redis OSS target endpoint for use during database migration.

For information about creating a migration task and starting your database migration, see [Creating a task](CHAP_Tasks.Creating.md).

## Specifying endpoint settings for Redis OSS as a target
<a name="CHAP_Target.Redis.EndpointSettings"></a>

To create or modify a target endpoint, you can use the console or the `CreateEndpoint` or `ModifyEndpoint` API operations. 

For a Redis OSS target in the AWS DMS console, specify **Endpoint-specific settings** on the **Create endpoint** or **Modify endpoint** console page.

When using the `CreateEndpoint` and `ModifyEndpoint` API operations, specify request parameters for the `RedisSettings` option. The following example shows how to do this using the AWS CLI.

```
aws dms create-endpoint --endpoint-identifier my-redis-target
--endpoint-type target --engine-name redis --redis-settings 
'{"ServerName":"sample-test-sample.zz012zz.cluster.eee1.cache.bbbxxx.com","Port":6379,"AuthType":"auth-token", 
 "SslSecurityProtocol":"ssl-encryption", "AuthPassword":"notanactualpassword"}'

{
    "Endpoint": {
        "EndpointIdentifier": "my-redis-target",
        "EndpointType": "TARGET",
        "EngineName": "redis",
        "EngineDisplayName": "Redis",
        "TransferFiles": false,
        "ReceiveTransferredFiles": false,
        "Status": "active",
        "KmsKeyId": "arn:aws:kms:us-east-1:999999999999:key/x-b188188x",
        "EndpointArn": "arn:aws:dms:us-east-1:555555555555:endpoint:ABCDEFGHIJKLMONOPQRSTUVWXYZ",
        "SslMode": "none",
        "RedisSettings": {
            "ServerName": "sample-test-sample.zz012zz.cluster.eee1.cache.bbbxxx.com",
            "Port": 6379,
            "SslSecurityProtocol": "ssl-encryption",
            "AuthType": "auth-token"
        }
    }
}
```

The `--redis-settings` parameters follow:
+ `ServerName`–(Required) Of type `string`, specifies the Redis OSS cluster that data is migrated to. The cluster must be in the same VPC as your replication instance.
+ `Port`–(Required) Of type `number`, the port value used to access the endpoint.
+ `SslSecurityProtocol`–(Optional) Valid values include `plaintext` and `ssl-encryption`. The default is `ssl-encryption`. 

  The `plaintext` option doesn't provide Transport Layer Security (TLS) encryption for traffic between endpoint and database. 

  Use `ssl-encryption` to make an encrypted connection. `ssl-encryption` doesn’t require an SSL Certificate Authority (CA) ARN to verify a server’s certificate, but one can be identified optionally using the `SslCaCertificateArn` setting. If a certificate authority ARN isn't given, DMS uses the Amazon root CA.

  When using an on-premises Redis OSS target, you can use `SslCaCertificateArn` to import a public or private certificate authority (CA) certificate into DMS and provide that ARN for server authentication. A private CA isn't supported when using ElastiCache (Redis OSS) as a target.
+ `AuthType`–(Required) Indicates the type of authentication to perform when connecting to Redis OSS. Valid values include `none`, `auth-token`, and `auth-role`.

  The `auth-token` option requires that you provide `AuthPassword`, while the `auth-role` option requires both `AuthUserName` and `AuthPassword`.
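
As a local sanity check before calling `create-endpoint`, the rules above can be expressed in a short validation sketch. The setting names come from `RedisSettings`; the checks themselves are a reading of the rules in this section, not part of any DMS API:

```python
def validate_redis_settings(settings):
    """Check a RedisSettings dict against the parameter rules described above."""
    errors = []
    # ServerName, Port, and AuthType are required.
    for required in ("ServerName", "Port", "AuthType"):
        if required not in settings:
            errors.append(f"{required} is required")
    # auth-token needs AuthPassword; auth-role needs AuthUserName and AuthPassword.
    auth_type = settings.get("AuthType")
    if auth_type == "auth-token" and "AuthPassword" not in settings:
        errors.append("auth-token requires AuthPassword")
    if auth_type == "auth-role" and not (
        "AuthUserName" in settings and "AuthPassword" in settings
    ):
        errors.append("auth-role requires AuthUserName and AuthPassword")
    # SslSecurityProtocol defaults to ssl-encryption; only two values are valid.
    if settings.get("SslSecurityProtocol", "ssl-encryption") not in (
        "plaintext", "ssl-encryption",
    ):
        errors.append("SslSecurityProtocol must be plaintext or ssl-encryption")
    return errors

# Example: auth-token without a password fails the check.
print(validate_redis_settings(
    {"ServerName": "my-cluster", "Port": 6379, "AuthType": "auth-token"}))
```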

# Using Babelfish as a target for AWS Database Migration Service
<a name="CHAP_Target.Babelfish"></a>

You can migrate data from a Microsoft SQL Server source database to a Babelfish target using AWS Database Migration Service. 

Babelfish for Aurora PostgreSQL extends your Amazon Aurora PostgreSQL-Compatible Edition database with the ability to accept database connections from Microsoft SQL Server clients. Doing this allows applications originally built for SQL Server to work directly with Aurora PostgreSQL with few code changes compared to a traditional migration, and without changing database drivers. 

For information about versions of Babelfish that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md). Earlier versions of Babelfish on Aurora PostgreSQL require an upgrade before using the Babelfish endpoint.

**Note**  
The Aurora PostgreSQL target endpoint is the preferred way to migrate data to Babelfish. For more information, see [Using Babelfish for Aurora PostgreSQL as a target](CHAP_Target.PostgreSQL.md#CHAP_Target.PostgreSQL.Babelfish). 

For information about using Babelfish as a database endpoint, see [Babelfish for Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html) in the *Amazon Aurora User Guide for Aurora*.

## Prerequisites to using Babelfish as a target for AWS DMS
<a name="CHAP_Target.Babelfish.Prerequisites"></a>

You must create your tables before migrating data to make sure that AWS DMS uses the correct data types and table metadata. If you don't create your tables on the target before running the migration, AWS DMS may create the tables with incorrect data types and permissions. For example, AWS DMS creates a timestamp column as binary(8), which doesn't provide the expected timestamp/rowversion functionality.

**To prepare and create your tables prior to migration**

1. Run your create table DDL statements that include any unique constraints, primary keys, or default constraints. 

   Do not include foreign key constraints, or any DDL statements for objects like views, stored procedures, functions, or triggers. You can apply them after migrating your source database.

1. Identify any identity columns, computed columns, or columns containing rowversion or timestamp data types for your tables. Then, create the necessary transformation rules to handle known issues when running the migration task. For more information, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

1. Identify columns with data types that Babelfish doesn't support. Then, change the affected columns in the target table to use supported data types, or create a transformation rule that removes them during the migration task. For more information, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).

   The following table lists source data types not supported by Babelfish, and the corresponding recommended target data type to use.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Babelfish.html)

**To set Aurora capacity units (ACUs) level for your Aurora PostgreSQL Serverless V2 source database**

You can improve performance of your AWS DMS migration task prior to running it by setting the minimum ACU value.
+ From the **Serverless v2 capacity settings** window, set **Minimum ACUs** to **2**, or a reasonable level for your Aurora DB cluster.

  For additional information about setting Aurora capacity units, see [Choosing the Aurora Serverless v2 capacity range for an Aurora cluster](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2.setting-capacity.html) in the *Amazon Aurora User Guide*.

After running your AWS DMS migration task, you can reset the minimum value of your ACUs to a reasonable level for your Aurora PostgreSQL Serverless V2 source database.

## Security requirements when using Babelfish as a target for AWS Database Migration Service
<a name="CHAP_Target.Babelfish.Security"></a>

The following security requirements apply when using AWS DMS with a Babelfish target:
+ The administrator user name (the Admin user) that was used to create the database.
+ A PSQL login and user with sufficient SELECT, INSERT, UPDATE, DELETE, and REFERENCES permissions.

## User permissions for using Babelfish as a target for AWS DMS
<a name="CHAP_Target.Babelfish.Permissions"></a>

**Important**  
For security purposes, the user account used for the data migration must be a registered user in any Babelfish database that you use as a target.

Your Babelfish target endpoint requires minimum user permissions to run an AWS DMS migration.

**To create a login and a low-privileged Transact-SQL (T-SQL) user**

1. Create a login and password to use when connecting to the server.

   ```
   CREATE LOGIN dms_user WITH PASSWORD = 'password';
   GO
   ```

1. Create the virtual database for your Babelfish cluster.

   ```
   CREATE DATABASE my_database;
   GO
   ```

1. Create the T-SQL user for your target database.

   ```
   USE my_database
   GO
   CREATE USER dms_user FOR LOGIN dms_user;
   GO
   ```

1. For each table in your Babelfish database, GRANT permissions to the tables.

   ```
   GRANT SELECT, DELETE, INSERT, REFERENCES, UPDATE ON [dbo].[Categories] TO dms_user;  
   ```
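
If your database has many tables, you can generate the GRANT statements instead of writing each one by hand. The following Python sketch builds one statement per table for a hypothetical table list; run the resulting T-SQL against your Babelfish database:

```python
def grant_statements(tables, user="dms_user", schema="dbo"):
    """Emit one GRANT per table with the permissions DMS needs."""
    perms = "SELECT, DELETE, INSERT, REFERENCES, UPDATE"
    return [
        f"GRANT {perms} ON [{schema}].[{table}] TO {user};"
        for table in tables
    ]

# Hypothetical table names, for illustration only.
for stmt in grant_statements(["Categories", "Products", "Orders"]):
    print(stmt)
```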

## Limitations on using Babelfish as a target for AWS Database Migration Service
<a name="CHAP_Target.Babelfish.Limitations"></a>

The following limitations apply when using a Babelfish database as a target for AWS DMS:
+ Only the "**Do Nothing**" table preparation mode is supported.
+ The ROWVERSION data type requires a table mapping rule that removes the column name from the table during the migration task.
+ The `sql_variant` data type isn't supported.
+ Full LOB mode is supported. However, when using SQL Server as the source endpoint, you must set the SQL Server endpoint connection attribute `ForceFullLob=True` for LOBs to be migrated to the target endpoint.
+ Replication task settings have the following limitations:

  ```
  {
     "FullLoadSettings": {
        "TargetTablePrepMode": "DO_NOTHING",
        "CreatePkAfterFullLoad": false
     }
  }
  ```
+ TIME(7), DATETIME2(7), and DATETIMEOFFSET(7) data types in Babelfish limit the precision value for the seconds portion of the time to 6 digits. Consider using a precision value of 6 for your target table when using these data types. For Babelfish versions 2.2.0 and higher, when using TIME(7) and DATETIME2(7), the seventh digit of precision is always zero.
+ In `DO_NOTHING` mode, DMS checks to see if the table already exists. If the table doesn't exist in the target schema, DMS creates the table based on the source table definition and maps any user-defined data types to their base data type.
+ An AWS DMS migration task to a Babelfish target doesn't support tables that have columns using ROWVERSION or TIMESTAMP data types. You can use a table mapping rule that removes the column name from the table during the transfer process. In the following transformation rule example, a table named `Actor` in your source is transformed to remove all columns starting with the characters `col` from the `Actor` table in your target.

  ```
  {
   	"rules": [{
  		"rule-type": "selection",
  		"rule-id": "1",
  		"rule-name": "1",
  		"object-locator": {
  			"schema-name": "test",
  			"table-name": "%"
  		},
  		"rule-action": "include"
  	}, {
  		"rule-type": "transformation",
  		"rule-id": "2",
  		"rule-name": "2",
  		"rule-action": "remove-column",
  		"rule-target": "column",
  		"object-locator": {
  			"schema-name": "test",
  			"table-name": "Actor",
  			"column-name": "col%"
  		}
  	}]
   }
  ```
+ For tables with identity or computed columns, where the target tables use mixed case names like Categories, you must create a transformation rule action that converts the table names to lowercase for your DMS task. The following example shows how to create the transformation rule action, **Make lowercase** using the AWS DMS console. For more information, see [Transformation rules and actions](CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.md).  
![\[Babelfish transformation rule\]](http://docs.aws.amazon.com/dms/latest/userguide/images/datarep-babelfish-transform-1.png)
+ Prior to Babelfish version 2.2.0, DMS limited the number of columns that you could replicate to a Babelfish target endpoint to 20. With Babelfish 2.2.0, the limit increased to 100 columns, and with Babelfish versions 2.4.0 and higher the limit increases again. You can run the following query against your SQL Server database to determine which tables exceed the limit.

  ```
  USE myDB;
  GO
  DECLARE @Babelfish_version_string_limit INT = 8000; -- Use 380 for Babelfish versions before 2.2.0
  WITH bfendpoint
  AS (
  SELECT 
  	[TABLE_SCHEMA]
        ,[TABLE_NAME]
  	  , COUNT( [COLUMN_NAME] ) AS NumberColumns
  	  , ( SUM( LEN( [COLUMN_NAME] ) + 3)  
  		+ SUM( LEN( FORMAT(ORDINAL_POSITION, 'N0') ) + 3 )  
  	    + LEN( TABLE_SCHEMA ) + 3
  		+ 12 -- INSERT INTO string
  		+ 12)  AS InsertIntoCommandLength -- values string
        , CASE WHEN ( SUM( LEN( [COLUMN_NAME] ) + 3)  
  		+ SUM( LEN( FORMAT(ORDINAL_POSITION, 'N0') ) + 3 )  
  	    + LEN( TABLE_SCHEMA ) + 3
  		+ 12 -- INSERT INTO string
  		+ 12)  -- values string
  			>= @Babelfish_version_string_limit
  			THEN 1
  			ELSE 0
  		END AS IsTooLong
  FROM [INFORMATION_SCHEMA].[COLUMNS]
  GROUP BY [TABLE_SCHEMA], [TABLE_NAME]
  )
  SELECT * 
  FROM bfendpoint
  WHERE IsTooLong = 1
  ORDER BY TABLE_SCHEMA, InsertIntoCommandLength DESC, TABLE_NAME
  ;
  ```
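
The length estimate that the SQL above computes can also be sketched in Python for a single table, which makes the arithmetic easier to follow. The constants mirror the query (use 380 instead of 8000 for Babelfish versions before 2.2.0); this is a restatement for illustration, not a DMS utility:

```python
def insert_command_length(schema, columns):
    """Estimate the INSERT INTO command length, mirroring the SQL query above."""
    # Each column name plus quoting/separator overhead.
    name_part = sum(len(c) + 3 for c in columns)
    # Ordinal positions 1..N formatted with thousands separators, like FORMAT(..., 'N0').
    ordinal_part = sum(len(f"{i:,}") + 3 for i in range(1, len(columns) + 1))
    # Schema name plus the "INSERT INTO" and VALUES string overhead.
    return name_part + ordinal_part + len(schema) + 3 + 12 + 12

BABELFISH_STRING_LIMIT = 8000  # use 380 for Babelfish versions before 2.2.0

# Hypothetical 20-column table, for illustration.
cols = [f"col{i}" for i in range(1, 21)]
length = insert_command_length("dbo", cols)
print(length, length >= BABELFISH_STRING_LIMIT)
```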

## Target data types for Babelfish
<a name="CHAP_Target.Babelfish.DataTypes"></a>

The following table shows the Babelfish target data types that are supported when using AWS DMS and the default mapping from AWS DMS data types.

For additional information about AWS DMS data types, see [Data types for AWS Database Migration Service](CHAP_Reference.DataTypes.md). 


|  AWS DMS data type  |  Babelfish data type   | 
| --- | --- | 
|  BOOLEAN  |  TINYINT  | 
|  BYTES  |  VARBINARY(length)  | 
|  DATE  |  DATE  | 
|  TIME  |  TIME  | 
|  INT1  |  SMALLINT  | 
|  INT2  |  SMALLINT  | 
|  INT4  |  INT  | 
|  INT8  |  BIGINT  | 
|  NUMERIC   |  NUMERIC(p,s)  | 
|  REAL4  |  REAL  | 
|  REAL8  |  FLOAT  | 
|  STRING  |  If the column is a date or time column, then do the following:  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Babelfish.html) If the column is not a date or time column, use VARCHAR (length).  | 
|  UINT1  |  TINYINT  | 
|  UINT2  |  SMALLINT  | 
|  UINT4  |  INT  | 
|  UINT8  |  BIGINT  | 
|  WSTRING  |  NVARCHAR(length)  | 
|  BLOB  |  VARBINARY(max) To use this data type with DMS, you must enable the use of BLOBs for a specific task. DMS supports BLOB data types only in tables that include a primary key.  | 
|  CLOB  |  VARCHAR(max) To use this data type with DMS, you must enable the use of CLOBs for a specific task.  | 
|  NCLOB  |  NVARCHAR(max) To use this data type with DMS, you must enable the use of NCLOBs for a specific task. During CDC, DMS supports NCLOB data types only in tables that include a primary key.  | 

# Using Amazon Timestream as a target for AWS Database Migration Service
<a name="CHAP_Target.Timestream"></a>

You can use AWS Database Migration Service to migrate data from your source database to an Amazon Timestream target endpoint, with support for both full load and CDC data migrations.

Amazon Timestream is a fast, scalable, and serverless time series database service built for high-volume data ingestion. Time series data is a sequence of data points collected over a time interval, and is used for measuring events that change over time. It is used to collect, store, and analyze metrics from IoT applications, DevOps applications, and analytics applications. Once you have your data in Timestream, you can visualize and identify trends and patterns in your data in near real-time. For information about Amazon Timestream, see [What is Amazon Timestream?](https://docs.aws.amazon.com/timestream/latest/developerguide/what-is-timestream.html) in the *Amazon Timestream Developer Guide*.

**Topics**
+ [Prerequisites for using Amazon Timestream as a target for AWS Database Migration Service](#CHAP_Target.Timestream.Prerequisites)
+ [Multithreaded full load task settings](#CHAP_Target.Timestream.FLTaskSettings)
+ [Multithreaded CDC load task settings](#CHAP_Target.Timestream.CDCTaskSettings)
+ [Endpoint settings when using Timestream as a target for AWS DMS](#CHAP_Target.Timestream.ConnectionAttrib)
+ [Creating and modifying an Amazon Timestream target endpoint](#CHAP_Target.Timestream.CreateModifyEndpoint)
+ [Using object mapping to migrate data to a Timestream topic](#CHAP_Target.Timestream.ObjectMapping)
+ [Limitations when using Amazon Timestream as a target for AWS Database Migration Service](#CHAP_Target.Timestream.Limitations)

## Prerequisites for using Amazon Timestream as a target for AWS Database Migration Service
<a name="CHAP_Target.Timestream.Prerequisites"></a>

Before you set up Amazon Timestream as a target for AWS DMS, make sure that you create an IAM role. This role must allow AWS DMS to gain access to the data being migrated into Amazon Timestream. The minimum set of access permissions for the role that you use to migrate to Timestream is shown in the following IAM policy.


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDescribeEndpoints",
      "Effect": "Allow",
      "Action": [
        "timestream:DescribeEndpoints"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "timestream:ListTables",
        "timestream:DescribeDatabase"
      ],
      "Resource": "arn:aws:timestream:us-east-1:123456789012:database/DATABASE_NAME"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "timestream:DeleteTable",
        "timestream:WriteRecords",
        "timestream:UpdateTable",
        "timestream:CreateTable"
      ],
      "Resource": "arn:aws:timestream:us-east-1:123456789012:database/DATABASE_NAME/table/TABLE_NAME"
    }
  ]
}
```


If you intend to migrate all tables, use `*` for *TABLE\_NAME* in the example above.

Note the following about using Timestream as a target:
+ If you intend to ingest historical data with timestamps more than 1 year old, we recommend using AWS DMS to write the data to Amazon S3 in comma-separated values (CSV) format. Then, use Timestream's batch load to ingest the data into Timestream. For more information, see [Using batch load in Timestream](https://docs.aws.amazon.com/timestream/latest/developerguide/batch-load.html) in the [Amazon Timestream developer guide](https://docs.aws.amazon.com/timestream/latest/developerguide/what-is-timestream.html).
+ For full-load data migrations of data less than 1 year old, we recommend setting the memory store retention period of the Timestream table greater than or equal to the oldest timestamp. Then, once migration completes, edit the table's memory store retention to the desired value. For example, to migrate data with the oldest timestamp being 2 months old, do the following:
  + Set the Timestream target table's memory store retention to 2 months.
  + Start the data migration using AWS DMS.
  + Once the data migration completes, change the retention period of the target Timestream table to your desired value. 

   We recommend estimating the memory store cost prior to the migration, using the information on the following pages:
  + [Amazon Timestream pricing](https://aws.amazon.com/timestream/pricing)
  + [AWS pricing calculator](https://calculator.aws/#/addService) 
+ For CDC data migrations, we recommend setting the memory store retention period of the target table such that ingested data falls within the memory store retention bounds. For more information, see [ Writes Best Practices ](https://docs.aws.amazon.com/timestream/latest/developerguide/data-ingest.html) in the [Amazon Timestream developer guide](https://docs.aws.amazon.com/timestream/latest/developerguide/what-is-timestream.html).
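
The full-load recommendation above amounts to choosing a memory store retention at least as large as the age of your oldest record. A small sketch of that calculation (the dates are illustrative):

```python
from datetime import datetime, timezone

def required_memory_retention_hours(oldest_timestamp, now):
    """Return the minimum memory store retention, in whole hours, that keeps
    the oldest record inside the memory store during the migration."""
    delta = now - oldest_timestamp
    # Round up so the oldest record is still covered.
    return int(delta.total_seconds() // 3600) + 1

# Illustrative dates: the oldest record is roughly two months old.
now = datetime(2023, 6, 1, tzinfo=timezone.utc)
oldest = datetime(2023, 4, 1, tzinfo=timezone.utc)
print(required_memory_retention_hours(oldest, now))
```

After the migration completes, you would lower the table's retention back to its normal operating value, as described in the steps above.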

## Multithreaded full load task settings
<a name="CHAP_Target.Timestream.FLTaskSettings"></a>

To help increase the speed of data transfer, AWS DMS supports a multithreaded full load migration task to a Timestream target endpoint with these task settings:
+ `MaxFullLoadSubTasks` – Use this option to indicate the maximum number of source tables to load in parallel. DMS loads each table into its corresponding Amazon Timestream target table using a dedicated subtask. The default is 8; the maximum value is 49.
+ `ParallelLoadThreads` – Use this option to specify the number of threads that AWS DMS uses to load each table into its Amazon Timestream target table. The maximum value for a Timestream target is 32. You can ask to have this maximum limit increased.
+ `ParallelLoadBufferSize` – Use this option to specify the maximum number of records to store in the buffer that the parallel load threads use to load data to the Amazon Timestream target. The default value is 50. The maximum value is 1,000. Use this setting with `ParallelLoadThreads`. `ParallelLoadBufferSize` is valid only when there is more than one thread.
+ `ParallelLoadQueuesPerThread` – Use this option to specify the number of queues each concurrent thread accesses to take data records out of queues and generate a batch load for the target. The default is 1. However, for Amazon Timestream targets of various payload sizes, the valid range is 5–512 queues per thread.
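
Taken together, the limits above can be captured in a small check over your task settings before you start the task. The bounds are the ones listed here; this is a local sanity check, not part of the DMS API:

```python
def check_timestream_full_load_settings(settings):
    """Validate multithreaded full load settings against the documented maximums."""
    limits = {
        "MaxFullLoadSubTasks": 49,
        "ParallelLoadThreads": 32,
        "ParallelLoadBufferSize": 1000,
        "ParallelLoadQueuesPerThread": 512,
    }
    problems = []
    for name, maximum in limits.items():
        value = settings.get(name)
        if value is not None and not (1 <= value <= maximum):
            problems.append(f"{name}={value} outside 1..{maximum}")
    return problems

# Example task settings within the documented limits.
task_settings = {"MaxFullLoadSubTasks": 8, "ParallelLoadThreads": 16,
                 "ParallelLoadBufferSize": 50}
print(check_timestream_full_load_settings(task_settings))
```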

## Multithreaded CDC load task settings
<a name="CHAP_Target.Timestream.CDCTaskSettings"></a>

To promote CDC performance, AWS DMS supports these task settings:
+ `ParallelApplyThreads` – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to push data records to a Timestream target endpoint. The default value is 0 and the maximum value is 32.
+ `ParallelApplyBufferSize` – Specifies the maximum number of records to store in each buffer queue for concurrent threads to push to a Timestream target endpoint during a CDC load. The default value is 100 and the maximum value is 1,000. Use this option when `ParallelApplyThreads` specifies more than one thread. 
+ `ParallelApplyQueuesPerThread` – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for a Timestream endpoint during CDC. The default value is 1 and the maximum value is 512.

## Endpoint settings when using Timestream as a target for AWS DMS
<a name="CHAP_Target.Timestream.ConnectionAttrib"></a>

You can use endpoint settings to configure your Timestream target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--timestream-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following table shows the endpoint settings that you can use with Timestream as a target.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Timestream.html)

## Creating and modifying an Amazon Timestream target endpoint
<a name="CHAP_Target.Timestream.CreateModifyEndpoint"></a>

Once you have created an IAM role and established the minimum set of access permissions, you can create an Amazon Timestream target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--timestream-settings '{"EndpointSetting": "value", ...}'` JSON syntax.

The following examples show how to create and modify a Timestream target endpoint using the AWS CLI.

**Create Timestream target endpoint command**

```
aws dms create-endpoint --endpoint-identifier timestream-target-demo \
--endpoint-type target --engine-name timestream \
--service-access-role-arn arn:aws:iam::123456789012:role/my-role \
--timestream-settings '{
    "DatabaseName": "db_name",
    "MemoryDuration": 20,
    "MagneticDuration": 3,
    "CdcInsertsAndUpdates": true,
    "EnableMagneticStoreWrites": true
}'
```

**Modify Timestream target endpoint command**

```
aws dms modify-endpoint --endpoint-identifier timestream-target-demo \
--endpoint-type target --engine-name timestream \
--service-access-role-arn arn:aws:iam::123456789012:role/my-role \
--timestream-settings '{
    "MemoryDuration": 20,
    "MagneticDuration": 3
}'
```

## Using object mapping to migrate data to a Timestream topic
<a name="CHAP_Target.Timestream.ObjectMapping"></a>

AWS DMS uses table-mapping rules to map data from the source to the target Timestream topic. To map data to a target topic, you use a type of table-mapping rule called object mapping. You use object mapping to define how data records in the source map to the data records published to a Timestream topic. 

Timestream topics don't have a preset structure other than having a partition key.

**Note**  
You don't have to use object mapping. You can use regular table mapping for various transformations. However, the partition key type will follow these default behaviors:   
Primary Key is used as a partition key for Full Load.
If no parallel-apply task settings are used, `schema.table` is used as a partition key for CDC.
If parallel-apply task settings are used, Primary key is used as a partition key for CDC.
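
The default partition-key behavior in the note above can be summarized as a small selection function (the function and parameter names are ours, for illustration):

```python
def default_partition_key(phase, parallel_apply, primary_key, schema, table):
    """Mirror the default partition-key choices described in the note above."""
    if phase == "full-load":
        return primary_key
    # CDC phase: the primary key is used only when parallel-apply settings are on.
    if parallel_apply:
        return primary_key
    return f"{schema}.{table}"

print(default_partition_key("cdc", False, "id", "Test", "Customers"))
```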

To create an object-mapping rule, specify `rule-type` as `object-mapping`. This rule specifies what type of object mapping you want to use. The structure for the rule is as follows.

```
{
    "rules": [
        {
            "rule-type": "object-mapping",
            "rule-id": "id",
            "rule-name": "name",
            "rule-action": "valid object-mapping rule action",
            "object-locator": {
                "schema-name": "case-sensitive schema name",
                "table-name": ""
            }
        }
    ]
}
```

The following example shows an object-mapping rule that includes the Timestream `mapping-parameters`.

```
{
    "rules": [
        {
            "rule-type": "object-mapping",
            "rule-id": "1",
            "rule-name": "timestream-map",
            "rule-action": "map-record-to-record",
            "target-table-name": "tablename",
            "object-locator": {
                "schema-name": "",
                "table-name": ""
            },
            "mapping-parameters": {
                "timestream-dimensions": [
                    "column_name1",
                     "column_name2"
                ],
                "timestream-timestamp-name": "time_column_name",
                "timestream-multi-measure-name": "column_name1or2",
                "timestream-hash-measure-name":  true or false,
                "timestream-memory-duration": x,
                "timestream-magnetic-duration": y
            }
        }
    ]
}
```

AWS DMS currently supports `map-record-to-record` and `map-record-to-document` as the only valid values for the `rule-action` parameter. The `map-record-to-record` and `map-record-to-document` values specify what AWS DMS does by default to records that aren't excluded as part of the `exclude-columns` attribute list. These values don't affect the attribute mappings in any way. 

Use `map-record-to-record` when migrating from a relational database to a Timestream topic. This rule type uses the `taskResourceId.schemaName.tableName` value from the relational database as the partition key in the Timestream topic and creates an attribute for each column in the source database. When using `map-record-to-record`, for any column in the source table not listed in the `exclude-columns` attribute list, AWS DMS creates a corresponding attribute in the target topic. This corresponding attribute is created regardless of whether that source column is used in an attribute mapping. 

One way to understand `map-record-to-record` is to see it in action. For this example, assume that you are starting with a relational database table row with the following structure and data.


| FirstName | LastName | StoreId | HomeAddress | HomePhone | WorkAddress | WorkPhone | DateofBirth | 
| --- | --- | --- | --- | --- | --- | --- | --- | 
| Randy | Marsh | 5 | 221B Baker Street | 1234567890 | 31 Spooner Street, Quahog  | 9876543210 | 02/29/1988 | 

To migrate this information from a schema named `Test` to a Timestream topic, you create rules to map the data to the target topic. The following rule illustrates the mapping. 

```
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "rule-action": "include",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "%"
            }
        },
        {
            "rule-type": "object-mapping",
            "rule-id": "2",
            "rule-name": "DefaultMapToTimestream",
            "rule-action": "map-record-to-record",
            "object-locator": {
                "schema-name": "Test",
                "table-name": "Customers"
            }
        }
    ]
}
```

Given a Timestream topic and a partition key (in this case, `taskResourceId.schemaName.tableName`), the following illustrates the resulting record format using our sample data in the Timestream target topic: 

```
  {
     "FirstName": "Randy",
     "LastName": "Marsh",
     "StoreId":  "5",
     "HomeAddress": "221B Baker Street",
     "HomePhone": "1234567890",
     "WorkAddress": "31 Spooner Street, Quahog",
     "WorkPhone": "9876543210",
     "DateOfBirth": "02/29/1988"
  }
```
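
The mapping from the source row to the target record can be sketched as a plain column-to-attribute copy, which is essentially what `map-record-to-record` does for columns not listed in `exclude-columns` (the row data is the sample above):

```python
# The sample source row from the table above.
source_row = {
    "FirstName": "Randy", "LastName": "Marsh", "StoreId": "5",
    "HomeAddress": "221B Baker Street", "HomePhone": "1234567890",
    "WorkAddress": "31 Spooner Street, Quahog", "WorkPhone": "9876543210",
    "DateOfBirth": "02/29/1988",
}

def map_record_to_record(row, exclude_columns=()):
    """Copy every column not excluded into a target attribute of the same name."""
    return {col: val for col, val in row.items() if col not in exclude_columns}

target_record = map_record_to_record(source_row)
print(target_record == source_row)
```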

## Limitations when using Amazon Timestream as a target for AWS Database Migration Service
<a name="CHAP_Target.Timestream.Limitations"></a>

The following limitations apply when using Amazon Timestream as a target:
+ **Dimensions and Timestamps:** Timestream uses the dimensions and timestamps in the source data like a composite primary key, and doesn't allow you to upsert these values. If you change the timestamp or a dimension of a record in the source database, Timestream tries to create a new record. If the changed values match those of another existing record, AWS DMS updates that existing record instead of creating a new record or updating the previous corresponding record.
+ **DDL Commands:** The current release of AWS DMS only supports `CREATE TABLE` and `DROP TABLE` DDL commands.
+ **Record Limitations:** Timestream has limitations for records such as record size and measure size. For more information, see [Quotas](https://docs.aws.amazon.com/timestream/latest/developerguide/what-is-timestream.html) in the [Amazon Timestream Developer Guide](https://docs.aws.amazon.com/).
+ **Deleting Records and Null Values:** Timestream doesn't support deleting records. To support migrating records deleted from the source, AWS DMS clears the corresponding fields in the records in the Timestream target database. AWS DMS replaces the values in the fields of the corresponding target record with **0** for numeric fields, **null** for text fields, and **false** for boolean fields.
+ Timestream as a target doesn't support sources that aren't relational databases (RDBMS).
+ AWS DMS only supports Timestream as a target in the following AWS Regions:
  + US East (N. Virginia)
  + US East (Ohio)
  + US West (Oregon)
  + Europe (Ireland)
  + Europe (Frankfurt)
  + Asia Pacific (Sydney)
  + Asia Pacific (Tokyo)
+ Timestream as a target doesn't support setting `TargetTablePrepMode` to `TRUNCATE_BEFORE_LOAD`. We recommend using `DROP_AND_CREATE` for this setting.
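The `TargetTablePrepMode` setting mentioned above belongs to the full-load portion of the replication task settings. As a sketch, a task-settings JSON fragment that applies the recommended mode might look like the following (only the relevant setting is shown; the rest of the task settings document is omitted):

```
{
  "FullLoadSettings": {
    "TargetTablePrepMode": "DROP_AND_CREATE"
  }
}
```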

# Using Amazon RDS for Db2 and IBM Db2 LUW as a target for AWS DMS
<a name="CHAP_Target.DB2"></a>

You can migrate data to an Amazon RDS for Db2 instance or an on-premises Db2 database from a Db2 LUW database using AWS Database Migration Service (AWS DMS). 

For information about versions of Db2 LUW that AWS DMS supports as a target, see [Targets for AWS DMS](CHAP_Introduction.Targets.md).

You can use Secure Sockets Layer (SSL) to encrypt connections between your Db2 LUW endpoint and the replication instance. For more information about using SSL with a Db2 LUW endpoint, see [Using SSL with AWS Database Migration Service](CHAP_Security.SSL.md).

## Limitations when using Db2 LUW as a target for AWS DMS
<a name="CHAP_Target.DB2.Limitations"></a>

The following limitations apply when using a Db2 LUW database as a target for AWS DMS. For limitations on using Db2 LUW as a source, see [Limitations when using Db2 LUW as a source for AWS DMS](CHAP_Source.DB2.md#CHAP_Source.DB2.Limitations).
+ AWS DMS only supports Db2 LUW as a target when the source is either Db2 LUW or Db2 for z/OS.
+ Using Db2 LUW as a target doesn't support replications with full LOB mode.
+ Using Db2 LUW as a target doesn't support the XML data type in the full load phase. This is a limitation of the IBM dbload utility. For more information, see [The dbload utility](https://www.ibm.com/docs/en/informix-servers/14.10?topic=utilities-dbload-utility) in the *IBM Informix Servers* documentation.
+ AWS DMS truncates BLOB fields with values corresponding to the double quote character ("). This is a limitation of the IBM dbload utility. 
+ AWS DMS doesn't support the parallel full load option when migrating to a Db2 LUW target in DMS version 3.5.3. This option is available in DMS version 3.5.4 and later.

## Endpoint settings when using Db2 LUW as a target for AWS DMS
<a name="CHAP_Target.DB2.ConnectionAttrib"></a>

You can use endpoint settings to configure your Db2 LUW target database similar to using extra connection attributes. You specify the settings when you create the target endpoint using the AWS DMS console, or by using the `create-endpoint` command in the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/dms/index.html), with the `--ibm-db2-settings '{"EndpointSetting": "value", ...}'` JSON syntax.
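As an illustration, the following `create-endpoint` command sketches how you might pass these settings from the AWS CLI. The endpoint identifier, connection details, and setting values shown here are placeholders, not recommendations:

```
aws dms create-endpoint \
    --endpoint-identifier db2-luw-target \
    --endpoint-type target \
    --engine-name db2 \
    --server-name db2.example.com \
    --port 50000 \
    --database-name SAMPLEDB \
    --username admin \
    --password "your-password" \
    --ibm-db2-settings '{"LoadTimeout": 1200, "MaxFileSize": 512, "KeepCsvFiles": true}'
```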

The following table shows the endpoint settings that you can use with Db2 LUW as a target.


| Name | Description | 
| --- | --- | 
|  `KeepCsvFiles`  |  If true, AWS DMS saves the .csv files that were used to replicate data to the Db2 LUW target. DMS uses these files for analysis and troubleshooting.  | 
|  `LoadTimeout`  |  The amount of time (in seconds) before AWS DMS times out operations that DMS performs on the Db2 target. The default value is 1200 (20 minutes).  | 
|  `MaxFileSize`  |  Specifies the maximum size (in KB) of .csv files used to transfer data to Db2 LUW.  | 
|  `WriteBufferSize`  |  The size (in KB) of the in-memory file write buffer used when generating .csv files on the local disk on the DMS replication instance. The default value is 1024 (1 MB).  | 