

# Migrating to Amazon DocumentDB
<a name="docdb-migration"></a>

Amazon DocumentDB is a fully managed database service that is compatible with the MongoDB API. You can migrate your data to Amazon DocumentDB from MongoDB databases running on premises or on Amazon Elastic Compute Cloud (Amazon EC2) using the process detailed in this section.

**Topics**
+ [Quick start guide](migration-quick-start.md)
+ [Migration runbook](docdb-migration-runbook.md)
+ [Migration from Couchbase Server](migration-from-couchbase.md)

# Migrate to Amazon DocumentDB using AWS Database Migration Service (DMS): Quick Start Guide
<a name="migration-quick-start"></a>

**Topics**
+ [Prepare the DMS source](#migrate-qs-dma-source)
+ [Set up DMS](#migrate-qs-dms-setup)
+ [Enable DocumentDB compression](#migrate-qs-comp)
+ [Create a replication task](#migrate-qs-create)
+ [Monitor progress](#migrate-qs-monitor)
+ [Additional information](#migrate-qs-info)

## Prepare the DMS source
<a name="migrate-qs-dma-source"></a>

To support DMS change data capture (CDC), enable Amazon DocumentDB change streams (see [Enabling change streams](change_streams.md#change_streams-enabling)) or make sure the MongoDB oplog is available on your source.
+ The DMS source must retain all ongoing changes until DMS full load completes for all included collections.
+ DocumentDB change streams are time based. Make sure your `change_stream_log_retention_duration` setting is large enough to cover the time to complete the full load.
+ The MongoDB Oplog is a fixed size. Make sure it is sized to hold all operations during full load.
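
As a rough sizing aid, here is a back-of-envelope sketch (not a DMS feature; replace the inputs with values measured from your own workload) of the retention window and oplog size needed to cover a full load:

```python
# Back-of-envelope sizing for CDC retention during a DMS full load.
# All inputs are assumptions; measure them on your own workload.

def required_retention_hours(full_load_hours: float, safety_factor: float = 2.0) -> float:
    """Change stream/oplog retention must cover the whole full load, with headroom."""
    return full_load_hours * safety_factor

def required_oplog_gb(write_ops_per_sec: float, avg_op_bytes: float,
                      full_load_hours: float, safety_factor: float = 2.0) -> float:
    """Approximate oplog size needed to hold all writes made during the full load."""
    seconds = full_load_hours * 3600 * safety_factor
    return write_ops_per_sec * avg_op_bytes * seconds / 1024**3

# Example: 500 writes/sec, 1 KB average oplog entry, 8-hour full load
print(round(required_oplog_gb(500, 1024, 8), 1))  # 27.5
```

The safety factor leaves headroom for a slower-than-expected full load; err on the large side, since running out of retention forces a full reload.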

## Set up DMS
<a name="migrate-qs-dms-setup"></a>

Create a DMS replication instance, create the source and target endpoints, and test each endpoint.
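
If you prefer the CLI, endpoint connectivity can also be tested there once the replication instance and endpoints exist (the ARNs below are placeholders for your own):

```
aws dms test-connection \
  --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE \
  --endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE

# Poll the result until the connection status is "successful"
aws dms describe-connections \
  --filters Name=endpoint-arn,Values=arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE
```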

## Enable DocumentDB compression
<a name="migrate-qs-comp"></a>

Enable compression by attaching a custom parameter group to your DocumentDB cluster and setting the `default_collection_compression` parameter to `enabled`. See [Managing collection-level document compression](doc-compression.md) for more information.
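
As a sketch with the AWS CLI (the parameter group and cluster names are placeholders; depending on the parameter, `ApplyMethod` may need to be `pending-reboot`):

```
# Create a custom cluster parameter group
aws docdb create-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-docdb-params \
  --db-parameter-group-family docdb5.0 \
  --description "Custom parameters for migration"

# Enable collection compression by default
aws docdb modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-docdb-params \
  --parameters "ParameterName=default_collection_compression,ParameterValue=enabled,ApplyMethod=immediate"

# Attach the parameter group to the cluster
aws docdb modify-db-cluster \
  --db-cluster-identifier mydocdbcluster \
  --db-cluster-parameter-group-name my-docdb-params
```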

## Create a replication task
<a name="migrate-qs-create"></a>

1. In the DMS console, in the navigation pane, choose **Migrate or replicate**, then choose **Tasks**.

1. Choose **Create task**.

1. On the **Create task** page, in the **Task configuration** section:
   + Enter a unique and meaningful **Task identifier** (for example, "mongodb-docdb-replication").
   + Choose the source endpoint you created previously in the **Source database endpoint** drop-down menu.
   + Choose the target endpoint you created previously in the **Target database endpoint** drop-down menu.
   + For **Task type**, choose **Migrate and replicate**.

1. In the **Settings** section:
   + For **Task logs**, select the **Turn on CloudWatch logs** check box.
   + For **Editing mode** (at the top of the section), choose **JSON editor** and set the following attributes:
     + Set `ParallelApplyThreads` to 5 (under `TargetMetadata`). This enables roughly 1,000 insert/update/delete operations per second during CDC.
     + Set `MaxFullLoadSubTasks` to 16 (under `FullLoadSettings`). Consider increasing this depending on your instance size.
     + For large collections (over 100 GB), enable auto-partitioning (in the table mappings, under the `parallel-load` attribute):
       + `"type": "partitions-auto"`
       + `"number-of-partitions": 16`
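
Assembled in the JSON editor, the task settings above look like this fragment (only the attributes discussed here are shown; the editor displays many more, which you can leave at their defaults):

```
{
  "TargetMetadata": {
    "ParallelApplyThreads": 5
  },
  "FullLoadSettings": {
    "MaxFullLoadSubTasks": 16
  }
}
```

In the table mappings JSON, the `parallel-load` object with `"type": "partitions-auto"` and `"number-of-partitions": 16` goes on a `table-settings` rule for each large collection.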

## Monitor progress
<a name="migrate-qs-monitor"></a>

Use the AWS DMS console or create a custom dashboard ([dashboarder tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/monitoring/docdb-dashboarder)) to track migration. Focus on the following metrics:
+ **FullLoadThroughputBandwidthTarget** — Measures the network bandwidth (in KB/second) used by DMS when transferring data to the target database during the full load phase of migration.
+ **CDCLatencyTarget** — Measures the time delay (in seconds) between when a change occurs in the source database and when that change is applied to the target database.
+ **CDCThroughputRowsTarget** — Measures the number of rows per second that DMS is applying to the target database during the ongoing replication phase of migration.
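
These metrics can also be pulled with the AWS CLI for a custom dashboard or script. A sketch (the instance and task identifiers are hypothetical, and the GNU `date` syntax shown may differ on BSD/macOS):

```
# Average CDC latency to the target over the last hour, in 5-minute buckets
aws cloudwatch get-metric-statistics \
  --namespace "AWS/DMS" \
  --metric-name CDCLatencyTarget \
  --dimensions Name=ReplicationInstanceIdentifier,Value=my-dms-instance \
               Name=ReplicationTaskIdentifier,Value=mongodb-docdb-replication \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average
```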

## Additional information
<a name="migrate-qs-info"></a>

For more information about Amazon DocumentDB and AWS DMS, see:
+ [Amazon DocumentDB migration runbook](docdb-migration-runbook.md)
+ [Migrating from MongoDB to Amazon DocumentDB](https://docs.aws.amazon.com/dms/latest/sbs/chap-mongodb2documentdb.html)

# Amazon DocumentDB migration runbook
<a name="docdb-migration-runbook"></a>

This runbook provides a comprehensive guide for migrating a MongoDB database to Amazon DocumentDB using AWS Database Migration Service (DMS). It is designed to support database administrators, cloud engineers, and developers throughout the end-to-end migration journey—from initial discovery to post-migration validation.

Given the differences in implementation and supported features between MongoDB and Amazon DocumentDB, this runbook emphasizes a structured and systematic approach. It outlines essential pre-migration assessments, highlights compatibility considerations, and details the key tasks required to ensure a successful migration with minimal disruption.

The runbook is organized into the following topics:
+ **[Compatibility](#mig-runbook-compatibility)** — Understand the supported MongoDB features and data types in Amazon DocumentDB, and identify potential incompatibilities.
+ **[Workload discovery](#mig-runbook-workload)** — Analyze existing MongoDB workloads, including read/write patterns, data volumes, and performance baselines.
+ **[Index migration](#mig-runbook-index)** — Analyze strategies for extracting and transforming MongoDB indexes for optimal performance in Amazon DocumentDB.
+ **[User migration](#mig-runbook-user)** — Detail the approach for migrating database users, roles, and access controls to Amazon DocumentDB.
+ **[Data migration](#mig-runbook-data)** — Cover various methods for data migration using AWS DMS, including full load and change data capture (CDC).
+ **[Monitoring](#mig-runbook-monitoring)** — Detail various monitoring approaches when migrating using DMS or native tools.
+ **[Validation](#mig-runbook-validation)** — Provide procedures for data integrity checks, functional validation, and performance comparison post-migration.

By following the guidance in this runbook, teams can ensure a smooth, secure, and efficient transition to Amazon DocumentDB, while preserving application functionality and minimizing risk.

## Compatibility
<a name="mig-runbook-compatibility"></a>

**Topics**
+ [Core feature compatibility](#w2aac25b9c13c13)
+ [Amazon DocumentDB compatibility assessment tool](#w2aac25b9c13c15)

When migrating from MongoDB to Amazon DocumentDB, a thorough initial assessment and feature compatibility check is essential for a successful migration. This process begins with a comprehensive inventory of your MongoDB features, including aggregation pipeline operators, query patterns, indexes, and data models.

Since Amazon DocumentDB is compatible with the MongoDB 3.6, 4.0, 5.0, and 8.0 APIs, applications using newer MongoDB-specific features may require refactoring. Critical areas to evaluate include sharding mechanisms (Amazon DocumentDB uses a different approach), transaction implementations, change streams functionality, and index types (particularly sparse and partial indexes).

Performance characteristics also differ, with Amazon DocumentDB optimized for enterprise workloads with predictable performance. Testing should involve running representative workloads against both systems to identify query patterns that might need optimization.

Monitoring execution plans to detect potential performance gaps is important during the assessment phase. This helps create a clear migration roadmap, identifying necessary application changes and establishing realistic timelines for a smooth transition.

### Core feature compatibility
<a name="w2aac25b9c13c13"></a>



#### Comprehensive feature support
<a name="w2aac25b9c13c13b5"></a>
+ **CRUD operations** — Enjoy full support for all basic create, read, update, and delete operations, including bulk and query operators - providing seamless application compatibility.
+ **Rich indexing capabilities** — Leverage comprehensive support for single-field, compound, TTL, partial, sparse, and 2dsphere indexes to optimize your query performance, as well as text indexes (version 5.0) for text-based lookups.
+ **Enterprise-grade replication** — Benefit from a robust automatic failover mechanism with read replicas for superior high availability without operational overhead.
+ **Advanced backup solutions** — Rest easy with an automated backup system featuring point-in-time recovery (PITR) and on-demand manual snapshots for data protection.

#### Enhanced AWS-integrated features
<a name="w2aac25b9c13c13b7"></a>
+ **Streamlined aggregation** — Take advantage of the most commonly used aggregation stages (`$match`, `$group`, `$sort`, `$project`, etc.) with optimized performance for enterprise workloads.
+ **Transaction support** — Implement multi-document and multi-collection transactions, perfect for most business application needs.
+ **Real-time data tracking** — Enable change streams with a simple command and increase the change stream retention period through a simple parameter group setting for real-time data change monitoring.
+ **Location-based services** — Implement geospatial applications with support for `$geoNear` operator and 2dsphere indexes.
+ **Text search capabilities** — Utilize built-in text search functionality for content discovery needs.

#### Modern architecture advantages
<a name="w2aac25b9c13c13b9"></a>
+ **Cloud-native design** — Enjoy AWS-optimized architecture that replaces legacy features like MapReduce with more efficient aggregation pipeline operations.
+ **Enhanced security** — Benefit from AWS Identity and Access Management (IAM), SCRAM-SHA-1, SCRAM-SHA-256, X.509 certificate authentication, and password-based authentication.
+ **Predictable performance** — Experience consistent performance optimized specifically for enterprise workloads.

For a comprehensive overview of Amazon DocumentDB's capabilities, refer to the [Supported MongoDB APIs, operations, and data types in Amazon DocumentDB](mongo-apis.md) and [Functional differences: Amazon DocumentDB and MongoDB](functional-differences.md) to maximize your database's potential.

Amazon DocumentDB does not support all the indexes offered by MongoDB. We provide a free [index tool](https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/README.md) to check index compatibility. We recommend running the index tool to assess incompatibilities and plan workarounds accordingly.

### Amazon DocumentDB compatibility assessment tool
<a name="w2aac25b9c13c15"></a>

The [MongoDB to Amazon DocumentDB Compatibility Tool](https://github.com/awslabs/amazon-documentdb-tools/blob/master/compat-tool/README.md) is an open-source utility available on GitHub that helps evaluate MongoDB workload compatibility with Amazon DocumentDB by analyzing MongoDB logs or application source code.

**Key features**
+ Identifies MongoDB API usage patterns in your workload
+ Flags potential compatibility issues before migration
+ Generates detailed compatibility reports with recommendations
+ Available as a standalone utility that can be run locally

#### Assessment methods
<a name="w2aac25b9c13c15b9"></a>

**Log-based assessment**


+ Pros:
  + Captures actual runtime behavior and query patterns
  + Identifies real-world usage frequencies and performance characteristics
  + Detects dynamic queries that might not be visible in source code
  + No access to application source code required
+ Cons:
  + Requires access to MongoDB logs with profiling enabled
  + Only captures operations that occurred during the logging period
  + May miss infrequently used features or seasonal workloads

**Source code analysis**


+ Pros:
  + Comprehensive coverage of all potential MongoDB operations in the codebase
  + Can identify issues in rarely executed code paths
  + Detects client-side logic that might be affected by Amazon DocumentDB differences
  + No need to run the application to perform assessment
+ Cons:
  + May flag code that exists but is never executed in production
  + Requires access to complete application source code
  + Limited ability to analyze dynamically constructed queries

For best results, we recommend using both assessment methods when possible to get a complete picture of compatibility challenges before migration.

## Workload discovery
<a name="mig-runbook-workload"></a>

Migrating from MongoDB to Amazon DocumentDB requires a thorough understanding of the existing database workload. Workload discovery is the process of analyzing your database usage patterns, data structures, query performance, and operational dependencies to ensure a seamless transition with minimal disruption. This section outlines the key steps involved in workload discovery to facilitate an effective migration from MongoDB to Amazon DocumentDB.

**Topics**
+ [Assessing the existing MongoDB deployment](#w2aac25b9c15b7)
+ [Identifying data model differences](#w2aac25b9c15b9)
+ [Query and performance analysis](#w2aac25b9c15c11)
+ [Security and access control review](#w2aac25b9c15c13)
+ [Operational and monitoring considerations](#w2aac25b9c15c15)

### Assessing the existing MongoDB deployment
<a name="w2aac25b9c15b7"></a>

Before migration, it is crucial to evaluate the current MongoDB environment, including:
+ **Cluster architecture** — Identify the number of nodes, replica sets, and sharding configurations. When migrating from MongoDB to Amazon DocumentDB, understanding your MongoDB sharding configuration is important because Amazon DocumentDB does not support user-controlled sharding. Applications designed for a sharded MongoDB environment will need architectural changes, as Amazon DocumentDB uses a different scaling approach with its storage-based architecture. You'll need to adapt your data distribution strategy and possibly consolidate sharded collections when moving to Amazon DocumentDB.
+ **Storage and data volume** — Measure the total data size and index size of your cluster. Complement this with the [Oplog review tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/mongodb-oplog-review) to understand write patterns and data growth velocity. For more information about sizing your cluster, see [Instance sizing](best_practices.md#best_practices-instance_sizing). 
+ **Workload patterns** — Analyze read and write throughput, query execution frequency, and indexing efficiency.
+ **Operational dependencies** — Document all applications, services, and integrations relying on MongoDB.
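
As a starting point for the storage and data volume measurements above, you can pull sizes from the mongo shell (the collections iterated are whatever exists in your database):

```
// Data and index size for the current database, scaled to MB
db.stats(1024 * 1024)

// Per-collection breakdown
db.getCollectionNames().forEach(function (c) {
  var s = db[c].stats(1024 * 1024);
  print(c + ": data=" + s.size + " MB, indexes=" + s.totalIndexSize + " MB");
});
```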

### Identifying data model differences
<a name="w2aac25b9c15b9"></a>

Although Amazon DocumentDB is MongoDB-compatible, there are differences in supported features, such as:
+ **Transactions** — Amazon DocumentDB supports ACID transactions, but with some [limitations](transactions.md#transactions-limitations).
+ **Schema design** — Ensure that document structures, embedded documents, and references align with [Amazon DocumentDB’s best practices](https://d1.awsstatic.com/product-marketing/Data%20modeling%20with%20Amazon%20DocumentDB.pdf).

### Query and performance analysis
<a name="w2aac25b9c15c11"></a>

Understanding query behavior helps optimize migration and post-migration performance. Key areas to analyze include:
+ **Slow queries** — Identify queries with high execution time using MongoDB’s profiling tools.
+ **Query patterns** — Categorize common query types, including CRUD operations and aggregations.
+ **Index usage** — Assess whether indexes are effectively utilized or need optimization in Amazon DocumentDB. To assess index usage and optimize performance, use the `$indexStats` aggregation pipeline stage combined with the `explain()` method on your critical queries. Start by running `db.collection.aggregate([{$indexStats: {}}])` to identify which indexes are being used, then do a more detailed analysis by executing your most frequent queries with `explain()`.
+ **Concurrency & workload distribution** — Evaluate read and write ratios, connection pooling, and performance bottlenecks.
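
A sketch of that index-usage check in the mongo shell (the `orders` collection and `customerId` field are hypothetical stand-ins for your own):

```
// Which indexes are actually used, and how often
db.orders.aggregate([{ $indexStats: {} }])

// Inspect the plan and runtime statistics for a frequent query
db.orders.find({ customerId: 12345 }).explain("executionStats")
```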

### Security and access control review
<a name="w2aac25b9c15c13"></a>

**Authentication and authorization**
+ **MongoDB RBAC to Amazon DocumentDB IAM and RBAC** — Map MongoDB's role-based access control users and roles to AWS Identity and Access Management (IAM) policies and Amazon DocumentDB SCRAM authentication users.
+ **User migration strategy** — Plan for migrating database users, custom roles, and privileges to Amazon DocumentDB's supported authentication mechanisms.
+ **Privilege differences** — Identify MongoDB privileges without direct Amazon DocumentDB equivalents (for example, cluster administration roles).
+ **Application authentication** — Update connection strings and credential management for Amazon DocumentDB's password policies. You can use AWS Secrets Manager to store your credentials and rotate passwords.
+ **Service account management** — Establish processes for managing service account credentials in AWS Secrets Manager.
+ **Least privilege implementation** — Review and refine access controls to implement least privilege principles in the new environment.

**Encryption**

Ensure encryption at rest and in transit aligns with compliance requirements.

**Network configuration**

Plan for [Virtual Private Cloud (VPC)](https://docs.aws.amazon.com/vpc/latest/userguide/create-vpc.html) setup and security group rules.

### Operational and monitoring considerations
<a name="w2aac25b9c15c15"></a>

To maintain system reliability, workload discovery should also include:
+ **Backup and restore strategy** — Evaluate existing backup methods and Amazon DocumentDB’s backup capabilities.
+ **AWS Backup integration** — Leverage AWS Backup for centralized backup management across AWS services including Amazon DocumentDB.
+ **CloudWatch metrics** — Map MongoDB monitoring metrics to Amazon DocumentDB CloudWatch metrics for CPU, memory, connections, and storage.
+ **Performance Insights** — Implement Amazon DocumentDB Performance Insights to visualize database load and analyze performance issues with detailed query analytics.
+ **Profiler** — Configure Amazon DocumentDB profiler to capture slow-running operations (similar to MongoDB's profiler but with Amazon DocumentDB-specific settings).
  + Enable through parameter groups with appropriate thresholds.
  + Analyze profiler data to identify optimization opportunities.
+ **CloudWatch Events** — Set up event-driven monitoring for Amazon DocumentDB cluster events.
  + Configure notifications for backup events, maintenance windows, and failovers.
  + Integrate with Amazon SNS for alerting and AWS Lambda for automated responses.
+ **Audit logging** — Plan for audit logging configuration to track user activity and security-relevant events.
+ **Enhanced monitoring** — Enable enhanced monitoring for granular OS-level metrics at 1-second intervals.
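
As a sketch, enabling the profiler described above is a parameter group change (the parameter group name is hypothetical; `profiler_threshold_ms` sets the slow-operation threshold):

```
aws docdb modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-docdb-params \
  --parameters "ParameterName=profiler,ParameterValue=enabled,ApplyMethod=immediate" \
               "ParameterName=profiler_threshold_ms,ParameterValue=100,ApplyMethod=immediate"
```

Note that the cluster must also export profiler logs to CloudWatch Logs for the captured operations to be visible.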

## Index migration
<a name="mig-runbook-index"></a>

Migrating from MongoDB to Amazon DocumentDB involves transferring not just data but also indexes to maintain query performance and optimize database operations. This section outlines the detailed step-by-step process for migrating indexes from MongoDB to Amazon DocumentDB while ensuring compatibility and efficiency.

### Using the Amazon DocumentDB index tool
<a name="w2aac25b9c17b5"></a>

**Clone the [index tool](https://github.com/awslabs/amazon-documentdb-tools/blob/master/index-tool/README.md)**

```
# Clone the tools repository
git clone https://github.com/awslabs/amazon-documentdb-tools.git
cd amazon-documentdb-tools/index-tool
```

```
# Install required dependencies
pip install -r requirements.txt
```

**Export indexes from MongoDB (if migrating from MongoDB)**

```
python3 migrationtools/documentdb_index_tool.py --dump-indexes \
  --dir mongodb_index_export --uri 'mongodb://localhost:27017'
```

**Export indexes from Amazon DocumentDB (if migrating from Amazon DocumentDB)**

```
python3 migrationtools/documentdb_index_tool.py --dump-indexes \
  --dir docdb_index_export \
  --uri 'mongodb://user:password@mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&retryWrites=false'
```

**Import indexes**

```
python3 migrationtools/documentdb_index_tool.py --restore-indexes --skip-incompatible \
  --dir mongodb_index_export \
  --uri 'mongodb://user:password@mydocdb.cluster-cdtjj00yfi95.eu-west-2.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&retryWrites=false'
```

**Verify indexes**

```
python3 migrationtools/documentdb_index_tool.py --show-issues --dir mongodb_index_export
```

## User migration
<a name="mig-runbook-user"></a>

Migrating users from MongoDB to Amazon DocumentDB is essential for maintaining access control, authentication, and database security. This section outlines detailed steps to successfully migrate MongoDB users while preserving their roles and permissions using the Amazon DocumentDB export user tool.

### Using Amazon DocumentDB export users tool
<a name="w2aac25b9c19b5"></a>

The [export users tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/export-users) exports users and roles from MongoDB or Amazon DocumentDB to JavaScript files, which can then be used to recreate them in another cluster.

**Prerequisites**

```
# Clone the repository
git clone https://github.com/awslabs/amazon-documentdb-tools.git
cd amazon-documentdb-tools/migration/export-users
```

```
# Install required dependencies
pip install pymongo
```

**Step 1: Export users and roles**

```
# Export users and roles to JavaScript files
python3 docdbExportUsers.py \
  --users-file mongodb-users.js \
  --roles-file mongodb-roles.js \
  --uri "mongodb://admin:password@source-host:27017/"
```

**Step 2: Edit the Users File**

```
// Example of how to update the users.js file
// Find each user creation statement and add the password
db.getSiblingDB("admin").createUser({
  user: "appuser",
  // Add password here
  pwd: "newpassword",
  roles: [
    { role: "readWrite", db: "mydb" }
  ]
})
```

**Step 3: Restore Custom Roles to Amazon DocumentDB**

```
# Import roles first
mongo --ssl \
  --host target-host:27017 \
  --sslCAFile rds-combined-ca-bundle.pem \
  --username admin \
  --password password \
  mongodb-roles.js
```

**Step 4: Restore Users to Amazon DocumentDB**

```
# Import users after roles are created
mongo --ssl \
  --host target-host:27017 \
  --sslCAFile rds-combined-ca-bundle.pem \
  --username admin \
  --password password \
  mongodb-users.js
```

**Important notes**
+ Passwords are not exported for security reasons and must be manually added to the users.js file.
+ Roles must be imported before users to ensure proper role assignments.
+ The tool generates JavaScript files that can be directly executed with the mongo shell.
+ Custom roles and their privileges are preserved during migration.
+ This approach allows for review and modification of user permissions before importing.

This method provides a secure and flexible approach to migrating users and roles from MongoDB to Amazon DocumentDB while allowing for password resets during the migration process.

## Data migration
<a name="mig-runbook-data"></a>

**Topics**
+ [Online migration](#w2aac25b9c21b5)
+ [Offline migration](#w2aac25b9c21b7)
+ [Prerequisites](#w2aac25b9c21c11)
+ [Prepare an Amazon DocumentDB cluster](#w2aac25b9c21c13)
+ [Perform the data dump (mongodump)](#w2aac25b9c21c15)
+ [Transfer dump files to restoration environment](#w2aac25b9c21c17)
+ [Restore data to Amazon DocumentDB (mongorestore)](#w2aac25b9c21c19)

### Online migration
<a name="w2aac25b9c21b5"></a>

This section provides detailed steps to perform an online migration from MongoDB to Amazon DocumentDB using AWS DMS to enable minimal downtime and continuous replication. To begin, you set up an Amazon DocumentDB cluster as the target and ensure your MongoDB instance is properly configured as the source, typically requiring replica set mode for change data capture. Next, you create a DMS replication instance and define source and target endpoints with the necessary connection details. After validating the endpoints, you configure and start a migration task that can include full data load, ongoing replication, or both.

#### Configure target (Amazon DocumentDB)
<a name="w2aac25b9c21b5b5"></a>

**Note**  
If you have already provisioned an Amazon DocumentDB cluster to migrate to, you can skip this step.

**Create a custom parameter group**

See the AWS Management Console or AWS CLI procedures in [Creating Amazon DocumentDB cluster parameter groups](cluster_parameter_groups-create.md).

**Create an Amazon DocumentDB cluster**

**Note**  
While there are other procedures for creating an Amazon DocumentDB cluster in this guide, the steps in this section apply specifically to the task of migrating large amounts of data to a new cluster.

1. Sign in to the AWS Management Console, and open the Amazon DocumentDB console at [https://console.aws.amazon.com/docdb](https://console.aws.amazon.com/docdb).

1. In the navigation pane, choose **Clusters**.
**Tip**  
If you don't see the navigation pane on the left side of your screen, choose the menu icon (![\[Hamburger menu icon with three horizontal lines.\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/docdb-menu-icon.png)) in the upper-left corner of the page.

1. On the Amazon DocumentDB management console, under **Clusters**, choose **Create**.

1. On the **Create Amazon DocumentDB cluster** page, in the **Cluster type** section, choose **Instance-based cluster** (this is the default option).

1. In the **Cluster configuration** section:
   + For **Cluster identifier**, enter a unique name, such as **mydocdbcluster**. Note that the console converts all cluster names to lowercase, regardless of how they are entered.
   + For **Engine version**, choose 5.0.0.

1. In the **Cluster storage configuration** section, leave the **Amazon DocumentDB Standard** setting as is (this is the default option).

1. In the **Instance configuration** section:
   + For **DB instance class**, choose **Memory optimized classes (include r classes)** (this is default).
   + For **Instance class**, choose an instance class based on workload. For example:
     + db.r6g.large: for smaller workloads
     + db.r6g.4xlarge: for larger workloads

     As a best practice, we recommend choosing the largest instance class you can for best full-load throughput, and scaling down after the migration is complete.
   + For **Number of instances**, choose 1 instance. Choosing one instance helps minimize costs. We recommend that you scale to three instances for high availability after the full-load migration is complete.

1. In the **Authentication** section, enter a username for the primary user, and then choose **Self managed**. Enter a password, then confirm it.

1. In the **Network settings** section, choose a VPC and subnet group, and then configure the VPC security group. Make sure your Amazon DocumentDB security group allows inbound connection from the DMS instance’s security group by updating inbound rules.

1. In the **Encryption-at-rest** section, enable encryption (recommended) and choose or enter a KMS key.

1. In the **Backup** section, set the backup retention period (1-35 days).

1. Review your configuration and choose **Create cluster**.

   The deployment typically takes between 10 and 15 minutes.

#### Configure source
<a name="w2aac25b9c21b5b7"></a>

MongoDB and Amazon DocumentDB can both serve as migration sources, depending on your scenario:
+ **MongoDB as source** — Common when migrating from an on-premises or a self-managed MongoDB to an Amazon DocumentDB or other AWS database services. Requires running in replica set mode with an adequately sized oplog (make sure it is sized to hold all operations during Full Load) to support change data capture during migration.
+ **Amazon DocumentDB as source** — Typically used for cross-region replication, version upgrades, or migrating to other database services like MongoDB Atlas. Requires [Enabling change streams](change_streams.md#change_streams-enabling) by setting the `change_stream_log_retention_duration` parameter in the cluster parameter group to capture ongoing changes during migration. Make sure your `change_stream_log_retention_duration` setting is large enough to cover the time needed to complete the Full Load.

Before starting migration, configure your source to allow AWS DMS access.

Create a MongoDB user with proper permissions:

```
db.createUser({
  user: "dmsUser",
  pwd: "yourSecurePassword",
  roles: [{ role: "readAnyDatabase", db: "admin" }]
})
```

Configure network and authentication.

When configuring network connectivity for MongoDB to DMS migration:

**EC2-hosted MongoDB source**
+ Modify the EC2 security group to allow inbound traffic from the DMS replication instance security group.
+ Add a rule for TCP port 27017 (or your custom MongoDB port).
+ Use the DMS replication instance's security group ID as the source for precise access control.
+ Ensure the EC2 instance's subnet has a route to the DMS replication instance's subnet.

**On-premises MongoDB source**
+ Configure your firewall to allow inbound connections from the DMS replication instance's public IP addresses.
+ If using Direct Connect or a VPN, ensure proper routing between your network and the VPC containing the DMS instance.
+ Test connectivity using telnet or nc commands from the DMS subnet to your MongoDB server.
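
For example, from a host in the DMS replication instance's subnet (the hostname below is a placeholder):

```
# Verify that the MongoDB port is reachable
nc -zv mongodb.example.internal 27017
```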

**MongoDB Atlas source**
+ Add the DMS replication instance's IP addresses to the MongoDB Atlas IP allowlist.
+ Configure VPC peering between AWS VPC and MongoDB Atlas VPC if Atlas is running on AWS.
+ Set up AWS PrivateLink for private connectivity (Enterprise tier), if running on another cloud provider.
+ Create a dedicated user with appropriate read/write permissions.
+ Use a MongoDB Atlas connection string with SSL Mode set to "verify-full".
+ Ensure sufficient oplog size for migration duration.

**Amazon DocumentDB source**

Configure your source Amazon DocumentDB security group to allow inbound traffic from the DMS replication instance security group.

#### Create DMS replication instance
<a name="w2aac25b9c21b5b9"></a>

We recommend using [DMS Buddy](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/dms_buddy) to provision your DMS infrastructure, as it creates the migration infrastructure with recommended DMS settings and instance sizes.

If you prefer to configure manually, follow these steps:

1. Open the AWS DMS console and choose **Create replication instance**.

1. Enter replication instance details:
   + **Instance name**: Choose a unique name.
   + **Instance class**: Select based on workload. Example: dms.r5.large (small workloads), dms.r5.4xlarge (large workloads).
   + **Engine version**: 3.5.4
   + **Allocated storage**: The default is 100 GB (increase if needed). The right size depends on document size, updates per second, and full-load duration.
   + **Multi-AZ Deployment**: Enable for high availability, if needed.
   + Choose the same VPC as Amazon DocumentDB.
   + Ensure **Security groups** allow inbound traffic from source and Amazon DocumentDB.

1. Choose **Create replication instance** and wait for the status to become **Available**.

#### Create DMS endpoints
<a name="w2aac25b9c21b5c11"></a>

##### Create a source endpoint
<a name="w2aac25b9c21b5c11b3"></a>

**For a MongoDB source**

1. In the DMS console, in the navigation pane, choose **Migrate or replicate**, then choose **Endpoints**.

1. Choose **Create endpoint**.

1. On the **Create endpoint** page, choose **Source endpoint**.

1. In the **Endpoint configuration** section:
   + Enter a unique and meaningful **Endpoint identifier** (for example, "mongodb-source").
   + Choose **MongoDB** as the **Source engine**.
   + For **Access to endpoint database**, choose **Provide access information manually**.
   + For **Server name**, enter your *MongoDB server DNS name/IP address*.
   + For **Port**, enter **27017** (default MongoDB port).
   + For **Authentication mode**, choose the appropriate mode for your deployment (password or SSL); the default is AWS Secrets Manager.
   + If **Authentication mode** is **Password**, provide:
     + **Username** and **Password**: Enter MongoDB credentials.
     + **Database name**: Your source database name.
     + **Authentication mechanism**: SCRAM-SHA-1 (default) or appropriate mechanism

1. For **Metadata mode**, leave the default setting of **document**.

1. Additional connection attributes:
   + `authSource=admin` (if the authentication database is different)
   + `replicaSet=<your-replica-set-name>` (required for CDC)

**For an Amazon DocumentDB source**

1. In the DMS console, in the navigation pane, choose **Migrate or replicate**, then choose **Endpoints**.

1. Choose **Create endpoint**.

1. On the **Create endpoint** page, choose **Source endpoint**.

1. In the **Endpoint configuration** section:
   + Enter a unique and meaningful **Endpoint identifier** (for example, "docdb-source").
   + Choose **Amazon DocumentDB** as the **Source engine**.
   + For **Access to endpoint database**, choose **Provide access information manually**.
   + For **Server name**, enter your *source Amazon DocumentDB cluster endpoint*.
   + For **Port**, enter **27017** (default Amazon DocumentDB port).
   + For **SSL mode**, choose **verify-full** (recommended for Amazon DocumentDB).
   + For **CA Certificate**, choose the Amazon RDS root CA certificate.
   + For **Authentication mode**, choose the appropriate mode for your deployment (password or SSL); the default is AWS Secrets Manager.
   + If **Authentication mode** is **Password**, provide:
     + **Username** and **Password**: Enter Amazon DocumentDB credentials.
     + **Database name**: Your source database name.
     + **Authentication mechanism**: SCRAM-SHA-1 (default) or appropriate mechanism

1. For **Metadata mode**, leave the default setting of **document**.

##### Create a target endpoint (Amazon DocumentDB)
<a name="w2aac25b9c21b5c11b5"></a>

1. In the DMS console, in the navigation pane, choose **Migrate or replicate**, then choose **Endpoints**.

1. Choose **Create endpoint**.

1. On the **Create endpoint** page, choose **Target endpoint**.

1. In the **Endpoint configuration** section:
   + Enter a unique and meaningful **Endpoint identifier** (for example, "docdb-target").
   + Choose **Amazon DocumentDB** as the **Target engine**.
   + For **Access to endpoint database**, choose the method you want to use to authenticate access to the database:
     + If you choose **AWS Secrets Manager**, choose the secret where you store your Amazon DocumentDB credentials in the **Secret** field.
     + If you choose **Provide access information manually**: 
       + For **Server name**, enter your *target Amazon DocumentDB cluster endpoint*.
       + For **Port**, enter **27017** (default Amazon DocumentDB port).
       + For **SSL mode**, choose **verify-full** (recommended for Amazon DocumentDB).
       + For **CA Certificate**, download and specify the CA certificate bundle for SSL verification.
       + For **Authentication mode**, choose the appropriate mode for your deployment (password or SSL); the default is AWS Secrets Manager.
       + If **Authentication mode** is **Password**, provide:
         + **Username** and **Password**: Enter Amazon DocumentDB credentials.
         + **Database name**: Your target database name.
         + **Authentication mechanism**: SCRAM-SHA-1 (default) or appropriate mechanism

1. For **Metadata mode**, leave the default setting of **document**.
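After creating each endpoint, verify connectivity from the replication instance before building the task. One way to do this with the AWS CLI (the ARNs below are placeholders):

```
aws dms test-connection \
  --replication-instance-arn <replication-instance-arn> \
  --endpoint-arn <endpoint-arn>

aws dms describe-connections \
  --filters Name=endpoint-arn,Values=<endpoint-arn>
```

A status of `successful` in the `describe-connections` output confirms the endpoint is reachable.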

#### Create replication task
<a name="w2aac25b9c21b5c13"></a>

1. In the DMS console, in the navigation pane, choose **Migrate or replicate**, then choose **Tasks**.

1. Choose **Create task**.

1. On the **Create task** page, in the **Task configuration** section:
   + Enter a unique and meaningful **Task identifier** (for example, "mongodb-docdb-replication").
   + Choose the source endpoint you created previously in the **Source database endpoint** drop-down menu.
   + Choose the target endpoint you created previously in the **Target database endpoint** drop-down menu.
   + For **Task type**, choose **Migrate and replicate**.

1. In the **Settings** section:
   + For **Target table preparation mode**, leave the default setting.
   + For **Stop task after full load completes**, leave the default setting.
   + For **LOB column settings**, leave the **Limited LOB mode** setting as is.
   + For **Data validation**, leave the default setting of **Turn off**.
   + For **Task logs**, select the **Turn on CloudWatch logs** box.
   + For **Batch-optimized apply**, leave the default setting of unchecked (off).

1. Back at the top of the **Task settings** section, in **Editing mode**, choose **JSON editor** and set the following attributes:

   ```
   {
     "TargetMetadata": {
       "ParallelApplyThreads": 5
     },
     "FullLoadSettings": {
       "MaxFullLoadSubTasks": 16
     }
   }
   ```

1. In the **Table mappings** section, add a new selection rule:
   + For **Schema name**, enter the source database to migrate. Use % to match multiple databases.
   + For **Table name**, enter the source collection to migrate. Use % to match multiple collections.
   + For **Action**, leave the default setting of **Include**.

1. For large collections (over 100 GB), add a **Table settings** rule:
   + For **Schema name**, enter the source database to migrate. Use % to match multiple databases.
   + For **Table name**, enter the source collection to migrate. Use % to match multiple collections.
   + For **Number of partitions**, enter 16 (this should not exceed `MaxFullLoadSubTasks`).

1. In the **Premigration assessment** section, make sure it is turned off.
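The console rules above correspond to a table-mapping JSON along the lines of the following sketch; the database and collection names are placeholders, and the `parallel-load` values should be adapted to your workload:

```
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-source-db",
      "object-locator": {
        "schema-name": "<source_database>",
        "table-name": "%"
      },
      "rule-action": "include"
    },
    {
      "rule-type": "table-settings",
      "rule-id": "2",
      "rule-name": "partition-large-collection",
      "object-locator": {
        "schema-name": "<source_database>",
        "table-name": "<large_collection>"
      },
      "parallel-load": {
        "type": "partitions-auto",
        "number-of-partitions": 16
      }
    }
  ]
}
```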

### Offline migration
<a name="w2aac25b9c21b7"></a>

This section outlines the process to perform an offline migration from a self-managed MongoDB instance to Amazon DocumentDB using native MongoDB tools: `mongodump` and `mongorestore`.

### Prerequisites
<a name="w2aac25b9c21c11"></a>

**Source MongoDB requirements**
+ Access to the source MongoDB instance with appropriate permissions.
+ Install `mongodump`, if needed (it is installed as part of a standard MongoDB installation).
+ Make sure there is enough disk space for the dump files.

**Target Amazon DocumentDB requirements**
+ Make sure you have an Amazon DocumentDB cluster provisioned.
+ Ensure there is an EC2 instance in the same VPC as Amazon DocumentDB to facilitate the migration.
+ Network connectivity must be available between your source environment and Amazon DocumentDB.
+ **mongorestore** must be installed on the migration EC2 instance.
+ Appropriate IAM permissions must be configured to access Amazon DocumentDB.

**General requirements**
+ AWS CLI must be configured (if using AWS services for intermediate storage).
+ Sufficient bandwidth must be available for data transfer.
+ A downtime window should be approved (if a downtime window is not possible, consider a live migration approach instead).

### Prepare an Amazon DocumentDB cluster
<a name="w2aac25b9c21c13"></a>

Create an Amazon DocumentDB cluster in AWS:
+ Choose an appropriate instance size based on your workload.
+ Configure a VPC, subnets, and security groups.
+ Enable necessary parameters via parameter groups.
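As a scripted alternative to the console, a minimal cluster plus one instance can be provisioned with the AWS CLI; the identifiers, instance class, credentials, and security groups below are placeholders:

```
aws docdb create-db-cluster \
  --db-cluster-identifier migration-target \
  --engine docdb \
  --master-username <admin_user> \
  --master-user-password <admin_password> \
  --vpc-security-group-ids <security-group-id> \
  --db-cluster-parameter-group-name <parameter-group>

aws docdb create-db-instance \
  --db-instance-identifier migration-target-1 \
  --db-instance-class db.r6g.large \
  --engine docdb \
  --db-cluster-identifier migration-target
```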

### Perform the data dump (mongodump)
<a name="w2aac25b9c21c15"></a>

Choose one of the following options to create a dump file:
+ **Option 1: Basic**

  ```
  mongodump \
  --uri="mongodb://<source_user>:<source_password>@<source_host>:<source_port>/<database>" \
  --out=/path/to/dump
  ```
+ **Option 2: Better control and performance**

  ```
  # --gzip: compress output
  # --numParallelCollections=4: dump collections in parallel
  # --ssl: if using SSL
  # --authenticationDatabase=admin: if auth is required
  # --readPreference=secondaryPreferred: if replica set
  mongodump \
  --uri="mongodb://<source_user>:<source_password>@<source_host>:<source_port>" \
  --out=/path/to/dump \
  --gzip \
  --numParallelCollections=4 \
  --ssl \
  --authenticationDatabase=admin \
  --readPreference=secondaryPreferred
  ```
+ **Option 3: Large databases**

  ```
  # --db / --collection: only dump a specific database/collection
  # --query: filter documents
  # --archive + --gzip: write a single compressed archive
  mongodump \
  --host=<source_host> \
  --port=<source_port> \
  --username=<source_user> \
  --password=<source_password> \
  --db=<specific_db> \
  --collection=<specific_collection> \
  --query='{ "date": { "$gt": "2020-01-01" } }' \
  --archive=/path/to/archive.gz \
  --gzip \
  --ssl
  ```

### Transfer dump files to restoration environment
<a name="w2aac25b9c21c17"></a>

Choose an appropriate method based on your dump size:
+ **Small** — Directly copy to your migration machine (EC2 instance you created earlier):

  ```
  scp -r /path/to/dump user@migration-machine:/path/to/restore
  ```
+ **Medium** — Use Amazon S3 as intermediate storage:

  ```
  aws s3 cp --recursive /path/to/dump s3://your-bucket/mongodb-dump/
  ```
+ **Large** — For very large databases, consider AWS DataSync or a physical transfer.
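Whichever transfer method you use, it is prudent to verify file integrity after the copy. One sketch using checksums (the paths are placeholders):

```
# On the source machine, record a checksum for every dump file
find /path/to/dump -type f -exec sha256sum {} \; > dump.sha256

# On the migration machine, after the transfer, verify against the recorded list
sha256sum -c dump.sha256
```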

### Restore data to Amazon DocumentDB (mongorestore)
<a name="w2aac25b9c21c19"></a>

Before starting the restore process, create the indexes in Amazon DocumentDB. You can utilize the [Amazon DocumentDB Index tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/index-tool) to export and import indexes.

Choose one of the following options to restore data:
+ **Option 1: Basic restore**

  ```
  mongorestore \
  --uri="mongodb://<docdb_user>:<docdb_password>@<docdb_endpoint>:27017" \
  /path/to/dump
  ```
+ **Option 2: Better control and performance**

  ```
  # --sslCAFile: Amazon DocumentDB CA certificate
  # --gzip: if dumped with gzip
  # --numParallelCollections / --numInsertionWorkersPerCollection: parallel restore
  # --noIndexRestore: skip indexes because they are pre-created
  mongorestore \
  --uri="mongodb://<docdb_user>:<docdb_password>@<docdb_endpoint>:27017" \
  --ssl \
  --sslCAFile=/path/to/rds-combined-ca-bundle.pem \
  --gzip \
  --numParallelCollections=4 \
  --numInsertionWorkersPerCollection=4 \
  --noIndexRestore \
  /path/to/dump
  ```
+ **Option 3: Large databases or specific controls**

  ```
  # --archive + --gzip: read a single compressed archive
  # --nsInclude / --nsExclude: restore or exclude specific namespaces
  # --noIndexRestore: skip indexes because they are pre-created
  # --writeConcern: ensure write durability
  mongorestore \
  --host=<docdb_endpoint> \
  --port=27017 \
  --username=<docdb_user> \
  --password=<docdb_password> \
  --ssl \
  --sslCAFile=/path/to/rds-combined-ca-bundle.pem \
  --archive=/path/to/archive.gz \
  --gzip \
  --nsInclude="db1.*" \
  --nsExclude="db1.sensitive_data" \
  --noIndexRestore \
  --writeConcern="{w: 'majority'}"
  ```

## Monitoring
<a name="mig-runbook-monitoring"></a>

This section provides a detailed monitoring process to track the progress, performance, and health of an ongoing migration from:

**MongoDB** to **Amazon DocumentDB**

or

**Amazon DocumentDB** to **Amazon DocumentDB**

The monitoring steps apply regardless of the migration method (AWS DMS, mongodump/mongorestore, or other tools).

### AWS DMS Migration monitoring (if applicable)
<a name="w2aac25b9c23c13"></a>

Monitor the following key CloudWatch metrics:

**Full load phase metrics**
+ **FullLoadThroughputBandwidthTarget** — Network bandwidth (KB/second) during full load
+ **FullLoadThroughputRowsTarget** — Number of rows/documents loaded per second
+ **FullLoadThroughputTablesTarget** — Number of tables/collections completed per minute
+ **FullLoadProgressPercent** — Percentage of full load completed
+ **TablesLoaded** — Number of tables/collections successfully loaded
+ **TablesLoading** — Number of tables/collections currently loading
+ **TablesQueued** — Number of tables/collections waiting to be loaded
+ **TablesErrored** — Number of tables/collections that failed to load

**CDC phase metrics**
+ **CDCLatencyTarget** — Time delay (seconds) between source change and target application
+ **CDCLatencySource** — Time delay (seconds) between change in source and DMS reading it
+ **CDCThroughputRowsTarget** — Rows per second applied during ongoing replication
+ **CDCThroughputBandwidthTarget** — Network bandwidth (KB/second) during CDC
+ **CDCIncomingChanges** — Number of change events received from source
+ **CDCChangesMemoryTarget** — Memory used (MB) for storing changes on target side

**Resource metrics**
+ **CPUUtilization** — CPU usage of the replication instance
+ **FreeableMemory** — Available memory on the replication instance
+ **FreeStorageSpace** — Available storage on the replication instance
+ **NetworkTransmitThroughput** — Outgoing network throughput from the replication instance
+ **NetworkReceiveThroughput** — Incoming network throughput to the replication instance

**Error metrics**
+ **ErrorsCount** — Total number of errors during migration
+ **TableErrorsCount** — Number of table-specific errors
+ **RecordsErrorsCount** — Number of record-specific errors

Create CloudWatch alarms for critical metrics like `CDCLatencyTarget` and `CPUUtilization` to receive notifications if migration performance degrades.
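For instance, an alarm on replication lag can be created with the AWS CLI; the 60-second threshold, identifiers, and SNS topic below are placeholders to adapt to your migration:

```
aws cloudwatch put-metric-alarm \
  --alarm-name dms-cdc-latency-high \
  --namespace "AWS/DMS" \
  --metric-name CDCLatencyTarget \
  --dimensions Name=ReplicationInstanceIdentifier,Value=<instance-id> Name=ReplicationTaskIdentifier,Value=<task-resource-id> \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 60 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions <sns-topic-arn>
```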

#### DMS logs (CloudWatch logs)
<a name="w2aac25b9c23c13c23"></a>



1. Go to the Amazon CloudWatch Logs console.

1. Find and choose your log group. Its name will look similar to "dms-tasks-<replication-instance-name>".

1. Look for log streams that might contain error information:
   + Streams with "error" in the name
   + Streams with task IDs or endpoint names
   + The most recent log streams during the time of your migration

1. Within these streams, search for keywords like:
   + "error"
   + "exception"
   + "failed"
   + "warning"

#### DMS task status (using AWS CLI)
<a name="w2aac25b9c23c13c25"></a>



```
aws dms describe-replication-tasks \
  --filters Name=replication-task-id,Values=<task_id> \
  --query "ReplicationTasks[0].Status"
```

Expected status flow:

creating → ready → running → stopping → stopped (or failed)

#### Monitor using `docdb-dashboarder`
<a name="w2aac25b9c23c13c27"></a>

The `docdb-dashboarder` tool provides comprehensive monitoring for Amazon DocumentDB clusters by automatically generating CloudWatch dashboards with essential performance metrics. These dashboards display critical cluster-level metrics (replica lag, operation counters), instance-level metrics (CPU, memory, connections), and storage metrics (volume usage, backup storage).

For migration scenarios, the tool offers specialized dashboards that track migration progress with metrics like CDC replication lag and operation rates. The dashboards can monitor multiple clusters simultaneously and include support for NVMe-backed instances.

By visualizing these metrics, teams can proactively identify performance bottlenecks, optimize resource allocation, and ensure smooth operation of their Amazon DocumentDB deployments. The tool eliminates the need for manual dashboard creation while providing consistent monitoring across all environments. For setup instructions and advanced configuration options, refer to the [Amazon DocumentDB Dashboarder Tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/monitoring/docdb-dashboarder) GitHub repository.

## Validation
<a name="mig-runbook-validation"></a>

**Topics**
+ [

### Validation checklist
](#w2aac25b9c25c15)
+ [

### Schema and index validation
](#w2aac25b9c25c17)
+ [

### Data sampling and field-level validation
](#w2aac25b9c25c19)
+ [

### Validation using DataDiffer tool
](#w2aac25b9c25c21)

This section provides a detailed validation process to ensure data consistency, integrity, and application compatibility after migrating from:

**MongoDB** to **Amazon DocumentDB**

or

**Amazon DocumentDB** to **Amazon DocumentDB**

The validation steps apply regardless of the migration method (AWS DMS, mongodump/mongorestore, or other tools).

### Validation checklist
<a name="w2aac25b9c25c15"></a>

Verify that the number of documents in each collection matches between source and target:

**MongoDB source**

```
mongo --host <source_host> --port <port> --username <user> --password <password> \
  --eval "db.<collection>.count()"
```

**Amazon DocumentDB target**

```
mongo --host <target_host> --port <port> --username <user> --password <password> \
  --eval "db.<collection>.count()"
```
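To avoid checking one collection at a time, a small shell loop (a sketch; hosts, credentials, and database selection are placeholders to adapt) can compare counts for every collection in a database:

```
for coll in $(mongo --host <source_host> --quiet --eval "db.getCollectionNames().join(' ')"); do
  src=$(mongo --host <source_host> --quiet --eval "db.$coll.count()")
  tgt=$(mongo --host <target_host> --ssl --quiet --eval "db.$coll.count()")
  echo "$coll: source=$src target=$tgt $([ "$src" = "$tgt" ] && echo OK || echo MISMATCH)"
done
```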

### Schema and index validation
<a name="w2aac25b9c25c17"></a>

Ensure that:
+ all collections exist in the target.
+ indexes are correctly replicated.
+ schema definitions (if enforced) are identical.

**Check collections (source vs. target)**

```
mongo --host <source_host> --eval "db.getCollectionNames()"
mongo --host <target_host> --ssl --eval "db.getCollectionNames()"
```

**Check indexes (source vs. target)**

```
mongo --host <source_host> --eval "db.<collection>.getIndexes()"
mongo --host <target_host> --ssl --eval "db.<collection>.getIndexes()"
```

Compare the list of collections to ensure there are no missing or extra collections.

Verify indexes by checking index names, key definitions, unique constraints, and TTL indexes (if any).

**Check schema validation rules (if using schema validation in MongoDB)**

```
mongo --host <source_host> --eval "db.getCollectionInfos({name: '<collection>'})[0].options.validator"
mongo --host <target_host> --ssl --eval "db.getCollectionInfos({name: '<collection>'})[0].options.validator"
```

### Data sampling and field-level validation
<a name="w2aac25b9c25c19"></a>

You can randomly sample documents and compare fields between source and target.

**Manual sampling**

Fetch five random documents (source):

```
mongo --host <source_host> --eval "db.<collection>.aggregate([{ \$sample: { size: 5 } }])"
```

Fetch the same document IDs (target):

```
mongo --host <target_host> --ssl --eval "db.<collection>.find({ _id: { \$in: [<list_of_ids>] } })"
```

**Automatic sampling**

```
import pymongo

# Connect to source and target
source_client = pymongo.MongoClient("<source_uri>")
target_client = pymongo.MongoClient("<target_uri>", tls=True)
source_db = source_client["<db_name>"]
target_db = target_client["<db_name>"]

# Compare 100 random documents
for doc in source_db["<collection>"].aggregate([{"$sample": {"size": 100}}]):
    target_doc = target_db["<collection>"].find_one({"_id": doc["_id"]})
    if target_doc != doc:
        print(f"❌ Mismatch in _id: {doc['_id']}")
    else:
        print(f"✅ Match: {doc['_id']}")
```
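When the sampling script reports a mismatch, a small field-level diff can narrow down exactly which fields differ between the two copies of a document. The helper below is a standalone sketch, not part of any AWS tool, and the sample documents are hypothetical:

```python
def diff_docs(source_doc, target_doc, path=""):
    """Recursively compare two documents; return dotted paths of fields that differ."""
    diffs = []
    if isinstance(source_doc, dict) and isinstance(target_doc, dict):
        for key in sorted(set(source_doc) | set(target_doc)):
            child = f"{path}.{key}" if path else key
            if key not in source_doc or key not in target_doc:
                diffs.append(child)  # field missing on one side
            else:
                diffs.extend(diff_docs(source_doc[key], target_doc[key], child))
    elif source_doc != target_doc:
        diffs.append(path or "<root>")
    return diffs

# Hypothetical source and target copies of the same document
src = {"_id": 1, "name": "alice", "address": {"city": "Seattle", "zip": "98101"}}
tgt = {"_id": 1, "name": "alice", "address": {"city": "Seattle", "zip": "98109"}}
print(diff_docs(src, tgt))  # prints ['address.zip']
```

Because the comparison is recursive, nested subdocument differences are reported with their full dotted path, which makes it easy to spot type or truncation issues introduced during migration.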

### Validation using DataDiffer tool
<a name="w2aac25b9c25c21"></a>

The [DataDiffer tool](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/data-differ) provides a reliable way to compare data between source and target databases.

#### Prerequisites
<a name="w2aac25b9c25c21b5"></a>

The following prerequisites must be met before installing the DataDiffer tool:
+ Python 3.7 or later
+ PyMongo library
+ Network connectivity to both source MongoDB and target Amazon DocumentDB clusters

#### Setup and installation
<a name="w2aac25b9c25c21b7"></a>

**Clone the repository and navigate to the DataDiffer directory**

```
git clone https://github.com/awslabs/amazon-documentdb-tools.git
cd amazon-documentdb-tools/migration/data-differ
```

**Install required dependencies**

```
pip install -r requirements.txt
```

#### Running data validation
<a name="w2aac25b9c25c21b9"></a>

**Create a configuration file (e.g., config.json) with connection details**

```
{
  "source": {
    "uri": "mongodb://username:password@source-mongodb-host:27017/?replicaSet=rs0",
    "db": "your_database",
    "collection": "your_collection"
  },
  "target": {
    "uri": "mongodb://username:password@target-docdb-cluster.<region>.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0",
    "db": "your_database",
    "collection": "your_collection"
  },
  "options": {
    "batch_size": 1000,
    "threads": 4,
    "sample_size": 0,
    "verbose": true
  }
}
```

**Run the DataDiffer tool**

```
python differ.py --config config.json
```

**For large collections, use sampling to validate a subset of data**

```
python differ.py --config config.json --sample-size 10000
```

**To validate multiple collections, create separate configuration files or use the batch mode**

```
python differ.py --batch-config batch_config.json
```

#### Interpreting results
<a name="w2aac25b9c25c21c11"></a>

The tool will output:
+ Total documents in source and target
+ Number of matching documents
+ Number of missing documents
+ Number of documents with differences
+ Detailed report of differences (if any)

#### Best practices
<a name="w2aac25b9c25c21c13"></a>

The following are best practices when using the DataDiffer tool:
+ **Run in phases** — First validate document counts, then sample key documents, and finally run a full comparison, if needed.
+ **Check for schema differences** — Amazon DocumentDB has some limitations compared to MongoDB. The tool will highlight incompatible data types or structures.
+ **Validate during quiet periods** — Run validation when write operations are minimal to ensure consistency.
+ **Monitor resource usage** — The comparison process can be resource-intensive. Adjust batch size and thread count accordingly.
+ **Validate indexes** — After data validation, ensure all required indexes have been created on the target Amazon DocumentDB cluster.
+ **Document validation results** — Keep a record of validation results for each collection as part of your migration documentation.

# Migration from Couchbase Server
<a name="migration-from-couchbase"></a>

**Topics**
+ [

## Introduction
](#introduction)
+ [

## Comparison to Amazon DocumentDB
](#comparison-to-amazon-documentdb)
+ [

## Discovery
](#discovery)
+ [

## Planning
](#planning)
+ [

## Migration
](#migration)
+ [

## Validation
](#validation)

## Introduction
<a name="introduction"></a>

This guide presents the key points to consider when migrating from Couchbase Server to Amazon DocumentDB. It explains considerations for the discovery, planning, execution, and validation phases of your migration. It also explains how to perform offline and online migrations.

## Comparison to Amazon DocumentDB
<a name="comparison-to-amazon-documentdb"></a>


|  | **Couchbase Server** | **Amazon DocumentDB** | 
| --- | --- | --- | 
| Data Organization | In versions 7.0 and later, data is organized into buckets, scopes, and collections. In earlier versions, data is organized into buckets. | Data is organized into databases and collections. | 
| Compatibility | There are separate APIs for each service (e.g. data, index, search, etc.). Secondary lookups use SQL++ (formerly known as N1QL), a query language based on ANSI-standard SQL, so it is familiar to many developers. | Amazon DocumentDB is [compatible with the MongoDB API](compatibility.html). | 
| Architecture | Storage is attached to each cluster instance. You cannot scale compute independently of storage. | Amazon DocumentDB is designed for the cloud and to avoid the limitations of traditional database architectures. The [compute and storage layers are separated](db-clusters-understanding.html) in Amazon DocumentDB and the compute layer can be [scaled independently of storage](how-it-works.html). | 
| Add read capacity on demand | Clusters can be scaled out by adding instances. Since storage is attached to the instance where the service is running, the time it takes to scale out is dependent on the amount of data that needs to be moved to the new instance, or rebalanced. | You can achieve read scaling for your Amazon DocumentDB cluster by [creating up to 15 Amazon DocumentDB replicas](db-cluster-manage-performance.html#db-cluster-manage-scaling-reads) in the cluster. There is no impact to the storage layer. | 
| Recover quickly from node failure | Clusters have automatic failover capabilities but the time to get the cluster back to full strength is dependent on the amount of data that needs to be moved to the new instance. | Amazon DocumentDB can [fail over the primary](failover.html), typically within 30 seconds, and restore the cluster back to full strength in 8-10 minutes regardless of the amount of data in the cluster. | 
| Scale storage as data grows | For self-managed clusters storage and IOs do not scale automatically. | Amazon DocumentDB [storage and IOs scale automatically](db-cluster-manage-performance.html#db-cluster-manage-scaling-storage). | 
| Backup data without affecting performance | Backups are performed by the backup service and are not enabled by default. Since storage and compute are not separated there can be an impact to performance. | Amazon DocumentDB backups are enabled by default and cannot be turned off. Backups are handled by the storage layer, so they are zero-impact on the compute layer. Amazon DocumentDB supports [restoring from a cluster snapshot](backup_restore-restore_from_snapshot.html) and [restoring to a point in time](backup_restore-point_in_time_recovery.html). | 
| Data durability | There can be a maximum of 3 replica copies of data in a cluster for a total of 4 copies. Each instance where the data service is running will have the active copy and 1, 2, or 3 replica copies of the data. | Amazon DocumentDB maintains 6 copies of data no matter how many compute instances there are, with a write quorum of 4 of 6. Clients receive an acknowledgement after the storage layer has durably persisted 4 copies of the data. | 
| Consistency | Immediate consistency for K/V operations is supported. The Couchbase SDK routes K/V requests to the specific instance that contains the active copy of the data so once an update is acknowledged, the client is guaranteed to read that update. Replication of updates to other services (index, search, analytics, eventing) is eventually consistent. | Amazon DocumentDB replicas are eventually consistent. If immediate consistency reads are required, the client can read from the primary instance. | 
| Replication | Cross-Data Center Replication (XDCR) provides filtered, active-passive/active-active replication of data in many:many topologies. | [Amazon DocumentDB global clusters](global-clusters.html) provide active-passive replication in 1:many (up to 10) topologies. | 

## Discovery
<a name="discovery"></a>

Migrating to Amazon DocumentDB requires a thorough understanding of the existing database workload. Workload discovery is the process of analyzing your Couchbase cluster configuration and operational characteristics – data set, indexes, and workload – to help ensure a seamless transition with minimal disruption.

### Cluster configuration
<a name="cluster-configuration"></a>

Couchbase uses a service-centric architecture where each capability corresponds to a service. Execute the following command against your Couchbase cluster to determine which services are being used (see [Getting Information on Nodes](https://docs.couchbase.com/server/current/rest-api/rest-node-get-info.html)):

```
curl -v -u <administrator>:<password> \
  http://<ip-address-or-hostname>:<port>/pools/nodes | \
  jq '[.nodes[].services[]] | unique'
```

Sample output:

```
[
  "backup",
  "cbas",
  "eventing",
  "fts",
  "index",
  "kv",
  "n1ql"
]
```

Couchbase services include the following:

#### Data service (kv)
<a name="data-service-kv"></a>

The data service provides read/write access to data in memory and on disk.

Amazon DocumentDB supports K/V operations on JSON data via the [MongoDB API](java-crud-operations.html).

#### Query service (n1ql)
<a name="query-service-n1ql"></a>

The query service supports the querying of JSON data via SQL++.

Amazon DocumentDB supports the querying of JSON data via the MongoDB API.

#### Index service (index)
<a name="index-service-index"></a>

The index service creates and maintains indexes on data, enabling faster querying.

Amazon DocumentDB supports a default primary index and the creation of secondary indexes on JSON data via the MongoDB API.

#### Search service (fts)
<a name="search-service-fts"></a>

The search service supports the creation of indexes for full text search.

Amazon DocumentDB's native full text search feature allows you to [perform text search on large textual data sets using special purpose text indexes](text-search.html) via the MongoDB API. For advanced search use cases, [Amazon DocumentDB zero-ETL integration with Amazon OpenSearch Service](https://aws.amazon.com/blogs/big-data/amazon-documentdb-zero-etl-integration-with-amazon-opensearch-service-is-now-available/) provides advanced search capabilities, such as fuzzy search, cross-collection search and multilingual search, on Amazon DocumentDB data.

#### Analytics service (cbas)
<a name="analytics-service-cbas"></a>

The analytics service supports analyzing JSON data in near real-time.

Amazon DocumentDB supports ad-hoc queries on JSON data via the MongoDB API. You can also [run complex queries on your JSON data in Amazon DocumentDB using Apache Spark running on Amazon EMR](https://aws.amazon.com/blogs/database/run-complex-queries-on-massive-amounts-of-data-stored-on-your-amazon-documentdb-clusters-using-apache-spark-running-on-amazon-emr/).

#### Eventing service (eventing)
<a name="eventing-service-eventing"></a>

The eventing service executes user-defined business logic in response to data changes.

Amazon DocumentDB automates event-driven workloads by [invoking AWS Lambda functions each time that data changes within your Amazon DocumentDB cluster](https://docs.aws.amazon.com/lambda/latest/dg/with-documentdb-tutorial.html).

#### Backup service (backup)
<a name="backup-service-backup"></a>

The backup service schedules full and incremental data backups and merges of previous data backups.

Amazon DocumentDB continuously backs up your data to Amazon S3 with a retention period of 1–35 days so that you can quickly restore to any point within the backup retention period. Amazon DocumentDB also takes automatic snapshots of your data as part of this continuous backup process. You can also [manage backup and restore of Amazon DocumentDB with AWS Backup](https://aws.amazon.com/blogs/storage/manage-backup-and-restore-of-amazon-documentdb-with-aws-backup/).

### Operational characteristics
<a name="operational-characteristics"></a>

Use the [Discovery Tool for Couchbase](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/discovery-tool-for-couchbase) to get the following information about your data set, indexes, and workload. This information will help you size your Amazon DocumentDB cluster.

#### Data set
<a name="data-set"></a>

The tool retrieves the following bucket, scope, and collection information:

1. bucket name

1. bucket type

1. scope name

1. collection name

1. total size (bytes)

1. total items

1. item size (bytes)

#### Indexes
<a name="indexes"></a>

The tool retrieves the following index statistics and all index definitions for all buckets. Note that primary indexes are excluded since Amazon DocumentDB automatically creates a primary index for each collection.

1. bucket name

1. scope name

1. collection name

1. index name

1. index size (bytes)

#### Workload
<a name="workload"></a>

The tool retrieves K/V and SQL++ query metrics. K/V metric values are gathered at the bucket level and SQL++ metrics are gathered at the cluster level.

The tool command line options are as follows:

```
python3 discovery.py \
  --username <source cluster username> \
  --password <source cluster password> \
  --data_node <data node IP address or DNS name> \
  --admin_port <administration http REST port> \
  --kv_zoom <get bucket statistics for specified interval> \
  --tools_path <full path to Couchbase tools> \
  --index_metrics <gather index definitions and SQL++ metrics> \
  --indexer_port <indexer service http REST port> \
  --n1ql_start <start time for sampling> \
  --n1ql_step <sample interval over the sample period>
```

Here is an example command:

```
python3 discovery.py \
  --username username \
  --password ******** \
  --data_node "http://10.0.0.1" \
  --admin_port 8091 \
  --kv_zoom week \
  --tools_path "/opt/couchbase/bin" \
  --index_metrics true \
  --indexer_port 9102 \
  --n1ql_start -60000 \
  --n1ql_step 1000
```

K/V metric values will be based on samples every 10 minutes for the past week (see [HTTP method and URI](https://docs.couchbase.com/server/current/rest-api/rest-bucket-stats.html#http-method-and-uri)). SQL++ metric values will be based on samples every 1 second for the past 60 seconds (see [General Labels](https://docs.couchbase.com/server/current/rest-api/rest-statistics-single.html#general-labels)). The output of the command will be in the following files:

**collection-stats.csv** – bucket, scope, and collection information

```
bucket,bucket_type,scope_name,collection_name,total_size,total_items,document_size
beer-sample,membase,_default,_default,2796956,7303,383
gamesim-sample,membase,_default,_default,114275,586,196
pillowfight,membase,_default,_default,1901907769,1000006,1902
travel-sample,membase,inventory,airport,547914,1968,279
travel-sample,membase,inventory,airline,117261,187,628
travel-sample,membase,inventory,route,13402503,24024,558
travel-sample,membase,inventory,landmark,3072746,4495,684
travel-sample,membase,inventory,hotel,4086989,917,4457
...
```
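The per-collection sizes in **collection-stats.csv** can be rolled up to inform cluster sizing. As an illustrative sketch (the helper name and sample rows are hypothetical, though the rows follow the CSV format shown above), the following sums `total_size` per bucket:

```python
import csv
import io
from collections import defaultdict

# Sample rows in the collection-stats.csv format produced by the discovery tool.
CSV_TEXT = """bucket,bucket_type,scope_name,collection_name,total_size,total_items,document_size
beer-sample,membase,_default,_default,2796956,7303,383
travel-sample,membase,inventory,airport,547914,1968,279
travel-sample,membase,inventory,airline,117261,187,628
"""

def total_size_per_bucket(csv_file):
    """Sum total_size (bytes) across all collections in each bucket."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["bucket"]] += int(row["total_size"])
    return dict(totals)

print(total_size_per_bucket(io.StringIO(CSV_TEXT)))
# {'beer-sample': 2796956, 'travel-sample': 665175}
```

In practice you would open the real **collection-stats.csv** file instead of the inline sample.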

**index-stats.csv** – index names and sizes

```
bucket,scope,collection,index-name,index-size
beer-sample,_default,_default,beer_primary,468144
gamesim-sample,_default,_default,gamesim_primary,87081
travel-sample,inventory,airline,def_inventory_airline_primary,198290
travel-sample,inventory,airport,def_inventory_airport_airportname,513805
travel-sample,inventory,airport,def_inventory_airport_city,487289
travel-sample,inventory,airport,def_inventory_airport_faa,526343
travel-sample,inventory,airport,def_inventory_airport_primary,287475
travel-sample,inventory,hotel,def_inventory_hotel_city,497125
...
```

**kv-stats.csv** – get, set, and delete metrics for all buckets

```
bucket,gets,sets,deletes
beer-sample,0,0,0
gamesim-sample,0,0,0
pillowfight,369,521,194
travel-sample,0,0,0
```

**n1ql-stats.csv** – SQL++ select, delete, and insert metrics for the cluster

```
selects,deletes,inserts
0,132,87
```

**indexes-<bucket-name>.txt** – index definitions of all indexes in the bucket. Note that primary indexes are excluded since Amazon DocumentDB automatically creates a primary index for each collection.

```
CREATE INDEX `def_airportname` ON `travel-sample`(`airportname`)
CREATE INDEX `def_city` ON `travel-sample`(`city`)
CREATE INDEX `def_faa` ON `travel-sample`(`faa`)
CREATE INDEX `def_icao` ON `travel-sample`(`icao`)
CREATE INDEX `def_inventory_airport_city` ON `travel-sample`.`inventory`.`airport`(`city`)
CREATE INDEX `def_inventory_airport_faa` ON `travel-sample`.`inventory`.`airport`(`faa`)
CREATE INDEX `def_inventory_hotel_city` ON `travel-sample`.`inventory`.`hotel`(`city`)
CREATE INDEX `def_inventory_landmark_city` ON `travel-sample`.`inventory`.`landmark`(`city`)
CREATE INDEX `def_sourceairport` ON `travel-sample`(`sourceairport`)
...
```

## Planning
<a name="planning"></a>

In the planning phase, you determine your Amazon DocumentDB cluster requirements and how to map Couchbase buckets, scopes, and collections to Amazon DocumentDB databases and collections.

### Amazon DocumentDB cluster requirements
<a name="amazon-documentdb-cluster-requirements"></a>

Use the data gathered in the discovery phase to size your Amazon DocumentDB cluster. See [Instance sizing](best_practices.html#best_practices-instance_sizing) for more information about sizing your Amazon DocumentDB cluster.

### Mapping buckets, scopes, and collections to databases and collections
<a name="mapping-buckets-scopes-and-collections-to-databases-and-collections"></a>

Determine the databases and collections that will exist in your Amazon DocumentDB cluster(s). Consider the following options depending on how data is organized in your Couchbase cluster. These are not the only options, but they provide starting points for you to consider.
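Whichever option you choose, it helps to capture the mapping in one place so that import scripts and index creation use consistent names. The sketch below is one possible scheme, not the only one: it is a hypothetical helper that maps a bucket with only the default scope and collection (the 6.x case) to a single collection named after the bucket, and a 7.0+ scoped collection to `<scope>_<collection>`:

```python
def map_to_documentdb(bucket, scope="_default", collection="_default"):
    """Illustrative mapping from Couchbase (bucket, scope, collection) to an
    Amazon DocumentDB (database, collection) pair. The naming scheme here is
    an assumption; adjust it to match your own conventions."""
    if scope == "_default" and collection == "_default":
        # Couchbase 6.x or a 7.0+ bucket using only the defaults:
        # one database and one collection, both named after the bucket.
        return bucket, bucket
    # 7.0+ scoped collection: keep the bucket as the database name and
    # combine scope and collection into the collection name.
    return bucket, f"{scope}_{collection}"

print(map_to_documentdb("travel-sample"))
# ('travel-sample', 'travel-sample')
print(map_to_documentdb("travel-sample", "inventory", "hotel"))
# ('travel-sample', 'inventory_hotel')
```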

#### Couchbase Server 6.x or earlier
<a name="couchbase-6x-or-earlier"></a>

##### Couchbase buckets to Amazon DocumentDB collections
<a name="couchbase-buckets-to-amazon-documentdb-collections"></a>

Migrate each bucket to a different Amazon DocumentDB collection. In this scenario, the Couchbase document `id` value will be used as the Amazon DocumentDB `_id` value.

![\[Couchbase Server 6.x or earlier buckets to Amazon DocumentDB collections\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/buckets-to-collections.png)


#### Couchbase Server 7.0 or later
<a name="couchbase-70-or-later"></a>

##### Couchbase collections to Amazon DocumentDB collections
<a name="couchbase-collections-to-amazon-documentdb-collections"></a>

Migrate each collection to a different Amazon DocumentDB collection. In this scenario, the Couchbase document `id` value will be used as the Amazon DocumentDB `_id` value.

![\[Couchbase Server 7.0 or later collections to Amazon DocumentDB collections\]](http://docs.aws.amazon.com/documentdb/latest/developerguide/images/collections-to-collections.png)


## Migration
<a name="migration"></a>

### Index migration
<a name="index-migration"></a>

Migrating to Amazon DocumentDB involves transferring not just data but also indexes to maintain query performance and optimize database operations. This section outlines the detailed step-by-step process for migrating indexes to Amazon DocumentDB while ensuring compatibility and efficiency.

Use [Amazon Q](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/chat-with-q.html) to convert SQL++ `CREATE INDEX` statements to Amazon DocumentDB `createIndex()` commands.

1. Upload the **indexes-<bucket name>.txt** file(s) created by the Discovery Tool for Couchbase.

1. Enter the following prompt:

   `Convert the Couchbase CREATE INDEX statements to Amazon DocumentDB createIndex commands`

Amazon Q will generate equivalent Amazon DocumentDB `createIndex()` commands. Note that you may need to update the collection names based on how you [mapped the Couchbase buckets, scopes, and collections to Amazon DocumentDB collections](#mapping-buckets-scopes-and-collections-to-databases-and-collections).

For example:

**indexes-beer-sample.txt**

```
CREATE INDEX `beerType` ON `beer-sample`(`type`)
CREATE INDEX `code` ON `beer-sample`(`code`) WHERE (`type` = "brewery")
```

Example Amazon Q output (excerpt):

```
db.beerSample.createIndex(
  { "type": 1 },
  {
    "name": "beerType",
    "background": true
  }
)

db.beerSample.createIndex(
  { "code": 1 },
  {
    "name": "code",
    "background": true,
    "partialFilterExpression": { "type": "brewery" }
  }
)
```

For any indexes that Amazon Q is not able to convert, refer to [Managing Amazon DocumentDB indexes](managing-indexes.html) and [Indexes and index properties](mongo-apis.html#mongo-apis-index) for more information.
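The simplest statements in the **indexes-<bucket-name>.txt** files can also be translated mechanically. The sketch below is a minimal, hypothetical converter that handles only the single-field form `CREATE INDEX `name` ON `bucket`(`field`)` and returns arguments in the `(keys, options)` shape that pymongo's `create_index()` accepts; anything more complex (WHERE clauses, composite keys) is left for manual or Amazon Q conversion:

```python
import re

# Matches only the simplest form: CREATE INDEX `name` ON `bucket`(`field`)
SIMPLE_INDEX = re.compile(
    r"CREATE INDEX `(?P<name>[^`]+)` ON `(?P<bucket>[^`]+)`\(`(?P<field>[^`]+)`\)\s*$"
)

def to_create_index_args(statement):
    """Return (keys, options) for a simple single-field index, or None if
    the statement is more complex and needs manual conversion."""
    m = SIMPLE_INDEX.match(statement.strip())
    if not m:
        return None
    return [(m.group("field"), 1)], {"name": m.group("name")}

print(to_create_index_args("CREATE INDEX `def_city` ON `travel-sample`(`city`)"))
# ([('city', 1)], {'name': 'def_city'})
print(to_create_index_args('CREATE INDEX `code` ON `beer-sample`(`code`) WHERE (`type` = "brewery")'))
# None (WHERE clause: convert manually as a partialFilterExpression)
```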

### Refactor code to use the MongoDB APIs
<a name="refactor-code-to-use-the-mongodb-apis"></a>

Clients use the Couchbase SDKs to connect to Couchbase Server. Amazon DocumentDB clients use MongoDB drivers to connect to Amazon DocumentDB. All the languages supported by the Couchbase SDKs are also supported by MongoDB drivers. See [MongoDB Drivers](https://www.mongodb.com/docs/drivers/) for more information on the driver for your language.

Because the APIs are different between Couchbase Server and Amazon DocumentDB, you will need to refactor your code to use the appropriate MongoDB APIs. You can use [Amazon Q](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/chat-with-q.html) to convert the K/V API calls and SQL++ queries to the equivalent MongoDB APIs:

1. Upload the source code file(s).

1. Enter the following prompt:

   `Convert the Couchbase API code to Amazon DocumentDB API code`

Using the [Hello Couchbase](https://docs.couchbase.com/python-sdk/current/hello-world/start-using-sdk.html#hello-couchbase) Python code sample, Amazon Q generates the following (excerpt):

```
from datetime import timedelta
from pymongo import MongoClient

# Connection parameters
database_name = "travel-sample"

# Connect to Amazon DocumentDB cluster
client = MongoClient('<Amazon DocumentDB connection string>')

# Get reference to database and collection
db = client['travel-sample']
airline_collection = db['airline']

# upsert document function
def upsert_document(doc):
    print("\nUpsert Result: ")
    try:
        # key will equal: "airline_8091"
        key = doc["type"] + "_" + str(doc["id"])
        doc['_id'] = key # Amazon DocumentDB uses _id as primary key
        result = airline_collection.update_one(
            {'_id': key},
            {'$set': doc},
            upsert=True
        )
        print(f"Modified count: {result.modified_count}")
    except Exception as e:
        print(e)

# get document function
def get_airline_by_key(key):
    print("\nGet Result: ")
    try:
        result = airline_collection.find_one({'_id': key})
        print(result)
    except Exception as e:
        print(e)

# query for document by callsign
def lookup_by_callsign(cs):
    print("\nLookup Result: ")
    try:
        result = airline_collection.find(
            {'callsign': cs},
            {'name': 1, '_id': 0}
        )
        for doc in result:
            print(doc['name'])
    except Exception as e:
        print(e)

# Test document
airline = {
    "type": "airline",
    "id": 8091,
    "callsign": "CBS",
    "iata": None,
    "icao": None,
    "name": "Couchbase Airways",
}

upsert_document(airline)
get_airline_by_key("airline_8091")
lookup_by_callsign("CBS")
```

Refer to [Connecting programmatically to Amazon DocumentDB](connect_programmatically.html) for examples of connecting to Amazon DocumentDB in Python, Node.js, PHP, Go, Java, C#/.NET, R, and Ruby.

### Select the migration approach
<a name="select-the-migration-approach"></a>

When migrating data to Amazon DocumentDB, there are two options:

1. [offline migration](#offline-migration)

1. [online migration](#online-migration)

#### Offline migration
<a name="offline-migration"></a>

Consider an offline migration when:
+ **Downtime is acceptable:** Offline migration involves stopping write operations to the source database, exporting the data, and then importing it to Amazon DocumentDB. This process incurs downtime for your application. If your application or workload can tolerate this period of unavailability, offline migration is a viable option.
+ **Migrating smaller datasets or conducting proofs of concept:** For smaller datasets, the time required for the export and import process is relatively short, making offline migration a quick and simple method. It is also well-suited for development, testing, and proof-of-concept environments where downtime is less critical.
+ **Simplicity is a priority:** The offline method, using cbexport and mongoimport, is generally the most straightforward approach to migrate data. It avoids the complexities of change data capture (CDC) involved in online migration methods.
+ **No ongoing changes need to be replicated:** If the source database is not actively receiving changes during the migration, or if those changes are not critical to be captured and applied to the target during the migration process, then an offline approach is appropriate.

**Topics**
+ [Couchbase Server 6.x or earlier](#couchbase-6x-or-earlier-offline)
+ [Couchbase Server 7.0 or later](#couchbase-70-or-later-offline)

##### Couchbase Server 6.x or earlier
<a name="couchbase-6x-or-earlier-offline"></a>

##### Couchbase bucket to Amazon DocumentDB collection
<a name="couchbase-bucket-to-amazon-documentdb-collection-offline"></a>

Export data using [cbexport json](https://docs-archive.couchbase.com/server/6.6/tools/cbexport-json.html) to create a JSON dump of all data in the bucket. For the `--format` option you can use `lines` or `list`.

```
cbexport json \
  --cluster <source cluster endpoint> \
  --bucket <bucket name> \
  --format <lines | list> \
  --username <username> \
  --password <password> \
  --output export.json \
  --include-key _id
```

Import the data to an Amazon DocumentDB collection using [mongoimport](backup_restore-dump_restore_import_export_data.html#backup_restore-dump_restore_import_export_data-mongoimport) with the appropriate option to import the lines or list:

lines:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --file export.json
```

list:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --jsonArray \
  --file export.json
```

##### Couchbase Server 7.0 or later
<a name="couchbase-70-or-later-offline"></a>

To perform an offline migration, use the cbexport and mongoimport tools:

##### Couchbase bucket with default scope and default collection
<a name="couchbase-bucket-with-default-scope-and-default-collection-offline"></a>

Export data using [cbexport json](https://docs.couchbase.com/server/current/tools/cbexport-json.html) to create a JSON dump of all collections in the bucket. For the `--format` option you can use `lines` or `list`.

```
cbexport json \
  --cluster <source cluster endpoint> \
  --bucket <bucket name> \
  --format <lines | list> \
  --username <username> \
  --password <password> \
  --output export.json \
  --include-key _id
```

Import the data to an Amazon DocumentDB collection using [mongoimport](backup_restore-dump_restore_import_export_data.html#backup_restore-dump_restore_import_export_data-mongoimport) with the appropriate option to import the lines or list:

lines:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --file export.json
```

list:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --jsonArray \
  --file export.json
```

##### Couchbase collections to Amazon DocumentDB collections
<a name="couchbase-collections-to-amazon-documentdb-collections-offline"></a>

Export data using [cbexport json](https://docs.couchbase.com/server/current/tools/cbexport-json.html) to create a JSON dump of each collection. Use the `--include-data` option to export each collection. For the `--format` option you can use `lines` or `list`. Use the `--scope-field` and `--collection-field` options to store the name of the scope and collection in the specified fields in each JSON document.

```
cbexport json \
  --cluster <source cluster endpoint> \
  --bucket <bucket name> \
  --include-data <scope name>.<collection name> \
  --format <lines | list> \
  --username <username> \
  --password <password> \
  --output export.json \
  --include-key _id \
  --scope-field "_scope" \
  --collection-field "_collection"
```

Since cbexport added the `_scope` and `_collection` fields to every exported document, you can remove them from every document in the export file via search and replace, `sed`, or a similar method.
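For a `lines`-format export, one line per document, the cleanup can be sketched as follows (the helper name and sample document are hypothetical; for a `list`-format export you would load the whole file as a JSON array instead):

```python
import json

def strip_scope_fields(lines):
    """Drop the _scope/_collection fields cbexport added to each JSON line."""
    for line in lines:
        doc = json.loads(line)
        doc.pop("_scope", None)
        doc.pop("_collection", None)
        yield json.dumps(doc)

# Illustrative exported line; real input comes from export.json.
exported = ['{"_id": "airline_10", "name": "40-Mile Air", "_scope": "inventory", "_collection": "airline"}']
for cleaned in strip_scope_fields(exported):
    print(cleaned)
# {"_id": "airline_10", "name": "40-Mile Air"}
```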

Import the data for each collection to an Amazon DocumentDB collection using [mongoimport](backup_restore-dump_restore_import_export_data.html#backup_restore-dump_restore_import_export_data-mongoimport) with the appropriate option to import the lines or list:

lines:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --file export.json
```

list:

```
mongoimport \
  --db <database> \
  --collection <collection> \
  --uri "<Amazon DocumentDB cluster connection string>" \
  --jsonArray \
  --file export.json
```

#### Online migration
<a name="online-migration"></a>

Consider an online migration when you need to minimize downtime and ongoing changes need to be replicated to Amazon DocumentDB in near-real time.

See [How to perform a live migration from Couchbase to Amazon DocumentDB](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/migration-utility-for-couchbase) to learn how to perform a live migration to Amazon DocumentDB. The documentation walks you through deploying the solution and performing a live migration of a bucket to an Amazon DocumentDB cluster.

**Topics**
+ [Couchbase Server 6.x or earlier](#couchbase-6x-or-earlier-online)
+ [Couchbase Server 7.0 or later](#couchbase-70-or-later-online)

##### Couchbase Server 6.x or earlier
<a name="couchbase-6x-or-earlier-online"></a>

##### Couchbase bucket to Amazon DocumentDB collection
<a name="couchbase-bucket-to-amazon-documentdb-collection-online"></a>

The [migration utility for Couchbase](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/migration-utility-for-couchbase) is pre-configured to perform an online migration of a Couchbase bucket to an Amazon DocumentDB collection. Looking at the [sink connector](https://github.com/awslabs/amazon-documentdb-tools/blob/master/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml) configuration, the `document.id.strategy` parameter is configured to use the message key value as the `_id` field value (see [Sink Connector Id Strategy Properties](https://www.mongodb.com/docs/kafka-connector/current/sink-connector/configuration-properties/id-strategy/#std-label-sink-configuration-id-strategy)):

```
ConnectorConfiguration:
  document.id.strategy: 'com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy'
```

##### Couchbase Server 7.0 or later
<a name="couchbase-70-or-later-online"></a>

##### Couchbase bucket with default scope and default collection
<a name="couchbase-bucket-with-default-scope-and-default-collection-online"></a>

The [migration utility for Couchbase](https://github.com/awslabs/amazon-documentdb-tools/tree/master/migration/migration-utility-for-couchbase) is pre-configured to perform an online migration of a Couchbase bucket to an Amazon DocumentDB collection. Looking at the [sink connector](https://github.com/awslabs/amazon-documentdb-tools/blob/master/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml) configuration, the `document.id.strategy` parameter is configured to use the message key value as the `_id` field value (see [Sink Connector Id Strategy Properties](https://www.mongodb.com/docs/kafka-connector/current/sink-connector/configuration-properties/id-strategy/#std-label-sink-configuration-id-strategy)):

```
ConnectorConfiguration:
  document.id.strategy: 'com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy'
```

##### Couchbase collections to Amazon DocumentDB collections
<a name="couchbase-collections-to-amazon-documentdb-collections-online"></a>

Configure the [source connector](https://github.com/awslabs/amazon-documentdb-tools/blob/master/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml) to stream each Couchbase collection in each scope to a separate topic (see [Source Configuration Options](https://docs.couchbase.com/kafka-connector/current/source-configuration-options.html#couchbase.collections)). For example:

```
ConnectorConfiguration:
  # add couchbase.collections configuration
  couchbase.collections: '<scope 1>.<collection 1>, <scope 1>.<collection 2>, ...'
```

Configure the [sink connector](https://github.com/awslabs/amazon-documentdb-tools/blob/master/migration/migration-utility-for-couchbase/migration-utility-connectors.yaml) to stream from each topic to a separate Amazon DocumentDB collection (see [Sink Connector Configuration Properties](https://github.com/mongodb-labs/mongo-kafka/blob/master/docs/sink.md#sink-connector-configuration-properties)). For example:

```
ConnectorConfiguration:
  # remove collection configuration  
  #collection: 'test'
  
  # modify topics configuration
  topics: '<bucket>.<scope 1>.<collection 1>, <bucket>.<scope 1>.<collection 2>, ...'

  # add topic.override.%s.%s configurations for each topic 
  topic.override.<bucket>.<scope 1>.<collection 1>.collection: '<collection>'
  topic.override.<bucket>.<scope 1>.<collection 2>.collection: '<collection>'
```

## Validation
<a name="validation"></a>

This section provides a detailed validation process to verify data consistency and integrity after migrating to Amazon DocumentDB. The validation steps apply regardless of the migration method.

**Topics**
+ [Verify that all collections exist in the target](#validation-checklist-step-1)
+ [Verify document count between source and target clusters](#validation-checklist-step-2)
+ [Compare documents between source and target clusters](#validation-checklist-step-3)

### Verify that all collections exist in the target
<a name="validation-checklist-step-1"></a>

#### Couchbase source
<a name="source-verify-collections"></a>

option 1: query workbench

```
SELECT RAW `path`
  FROM system:keyspaces
  WHERE `bucket` = '<bucket>'
```

option 2: [cbq](https://docs.couchbase.com/server/current/cli/cbq-tool.html) tool

```
cbq \
  -e <source cluster endpoint> \
  -u <username> \
  -p <password> \
  -q "SELECT RAW `path`
       FROM system:keyspaces
       WHERE `bucket` = '<bucket>'"
```

#### Amazon DocumentDB target
<a name="target-verify-collections"></a>

mongosh (see [Connect to your Amazon DocumentDB cluster](connect-ec2-manual.html#manual-connect-ec2.connect-use)):

```
db = db.getSiblingDB('<database>')
db.getCollectionNames()
```

### Verify document count between source and target clusters
<a name="validation-checklist-step-2"></a>

#### Couchbase source
<a name="source-verify-document-count"></a>

##### Couchbase Server 6.x or earlier
<a name="source-verify-document-count-couchbase-6x-or-earlier"></a>

option 1: query workbench

```
SELECT COUNT(*)
FROM `<bucket>`
```

option 2: [cbq](https://docs.couchbase.com/server/current/cli/cbq-tool.html)

```
cbq \
  -e <source cluster endpoint> \
  -u <username> \
  -p <password> \
  -q "SELECT COUNT(*)
       FROM `<bucket>`"
```

##### Couchbase Server 7.0 or later
<a name="source-verify-document-count-couchbase-70-or-later"></a>

option 1: query workbench

```
SELECT COUNT(*)
FROM `<bucket>`.`<scope>`.`<collection>`
```

option 2: [cbq](https://docs.couchbase.com/server/current/cli/cbq-tool.html)

```
cbq \
  -e <source cluster endpoint> \
  -u <username> \
  -p <password> \
  -q "SELECT COUNT(*)
       FROM `<bucket>`.`<scope>`.`<collection>`"
```

#### Amazon DocumentDB target
<a name="target-verify-document-count"></a>

mongosh (see [Connect to your Amazon DocumentDB cluster](connect-ec2-manual.html#manual-connect-ec2.connect-use)):

```
db = db.getSiblingDB('<database>')
db.getCollection('<collection>').countDocuments()
```
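Once you have collected the counts from both sides, comparing them is a simple set operation. The sketch below is a hypothetical helper operating on already-collected `{collection: count}` dictionaries (gathering them from live clusters requires the queries above); it also flags collections missing on either side:

```python
def count_mismatches(source_counts, target_counts):
    """Return {collection: (source_count, target_count)} for every
    collection whose counts differ or that is missing on one side."""
    names = set(source_counts) | set(target_counts)
    return {
        name: (source_counts.get(name), target_counts.get(name))
        for name in names
        if source_counts.get(name) != target_counts.get(name)
    }

# Illustrative counts; real values come from the source and target queries.
source = {"airline": 187, "airport": 1968, "hotel": 917}
target = {"airline": 187, "airport": 1960}
print(count_mismatches(source, target))
```

An empty result means every collection's document count matches.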

### Compare documents between source and target clusters
<a name="validation-checklist-step-3"></a>

#### Couchbase source
<a name="source-compare-documents"></a>

##### Couchbase Server 6.x or earlier
<a name="source-compare-documents-couchbase-6x-or-earlier"></a>

option 1: query workbench

```
SELECT META().id as _id, *
FROM `<bucket>`
LIMIT 5
```

option 2: [cbq](https://docs.couchbase.com/server/current/cli/cbq-tool.html)

```
cbq \
  -e <source cluster endpoint> \
  -u <username> \
  -p <password> \
  -q "SELECT META().id as _id, *
       FROM `<bucket>`
       LIMIT 5"
```

##### Couchbase Server 7.0 or later
<a name="source-compare-documents-couchbase-70-or-later"></a>

option 1: query workbench

```
SELECT META().id as _id, *
FROM `<bucket>`.`<scope>`.`<collection>`
LIMIT 5
```

option 2: [cbq](https://docs.couchbase.com/server/current/cli/cbq-tool.html)

```
cbq \
  -e <source cluster endpoint> \
  -u <username> \
  -p <password> \
  -q "SELECT META().id as _id, *
       FROM `<bucket>`.`<scope>`.`<collection>`
       LIMIT 5"
```

#### Amazon DocumentDB target
<a name="target-compare-documents"></a>

mongosh (see [Connect to your Amazon DocumentDB cluster](connect-ec2-manual.html#manual-connect-ec2.connect-use)):

```
db = db.getSiblingDB('<database>')
db.getCollection('<collection>').find({
  _id: {
    $in: [
      <_id 1>, <_id 2>, <_id 3>, <_id 4>, <_id 5>
    ]
  }
})
```
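After retrieving the sampled documents from both clusters, pair them by `_id` and check for differences. The sketch below is a hypothetical helper operating on already-fetched document lists (the sample values are illustrative); real documents would come from the queries above:

```python
def compare_samples(source_docs, target_docs):
    """Pair sampled documents by _id and report any that are missing from
    the target or whose fields differ."""
    target_by_id = {d["_id"]: d for d in target_docs}
    problems = {}
    for doc in source_docs:
        other = target_by_id.get(doc["_id"])
        if other is None:
            problems[doc["_id"]] = "missing in target"
        elif other != doc:
            problems[doc["_id"]] = "differs"
    return problems

# Illustrative sample documents.
source_docs = [
    {"_id": "airline_10", "name": "40-Mile Air", "callsign": "MILE-AIR"},
    {"_id": "airline_109", "name": "Alaska Central Express"},
]
target_docs = [
    {"_id": "airline_10", "name": "40-Mile Air", "callsign": None},
]
print(compare_samples(source_docs, target_docs))
# {'airline_10': 'differs', 'airline_109': 'missing in target'}
```

An empty result for the sample indicates the documents migrated intact; repeat with larger or random samples for more confidence.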