

# Online migration to Amazon Keyspaces: strategies and best practices
<a name="migrating-online"></a>

If you need to maintain application availability during a migration from Apache Cassandra to Amazon Keyspaces, you can prepare a custom online migration strategy by implementing the key components discussed in this topic. By following these best practices for online migrations, you can ensure that application availability and read-after-write consistency are maintained during the entire migration process, minimizing the impact on your users.

When designing an online migration strategy from Apache Cassandra to Amazon Keyspaces, you need to consider the following key steps.

1. **Writing new data**
   + **ZDM Dual Write Proxy for Amazon Keyspaces Migration** – Use the ZDM Dual Write Proxy available on [GitHub](https://github.com/aws-samples/amazon-keyspaces-examples/blob/main/migration/online/zdm-proxy/README.md) to perform zero-downtime migration from Apache Cassandra to Amazon Keyspaces. The ZDM Proxy performs dual writes without the need to refactor existing applications and performs dual reads for query validation.
   + **Application dual writes** – You can implement dual writes in your application using existing Cassandra client libraries and drivers. Designate one database as the leader and the other as the follower. Write failures to the follower database are recorded in a [dead letter queue (DLQ)](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) for analysis.
   + **Messaging tier dual writes** – Alternatively, you can configure your existing messaging platform to send writes to both Cassandra and Amazon Keyspaces using an additional consumer. This creates eventually consistent views across both databases.

1. **Migrating historical data**
   + **Copy historical data** – You can migrate historical data from Cassandra to Amazon Keyspaces using AWS Glue or custom extract, transform, and load (ETL) scripts. Handle conflict resolution between dual writes and bulk loads using techniques like lightweight transactions or timestamps.
   + **Use Time to Live (TTL)** – For shorter data retention periods, you can use TTL in both Cassandra and Amazon Keyspaces to avoid uploading unnecessary historical data. As old data expires in Cassandra and new data is written via dual writes, Amazon Keyspaces eventually catches up.

1. **Validating data**
   + **Dual reads** – Implement dual reads from both the Cassandra (primary) and Amazon Keyspaces (secondary) databases, comparing results asynchronously. Differences are logged or sent to a DLQ.
   + **Sample reads** – Use AWS Lambda functions to periodically sample and compare data across both systems, logging any discrepancies to a DLQ.

1. **Migrating the application**
   + **Blue/green strategy** – Switch your application to treat Amazon Keyspaces as the primary and Cassandra as the secondary data store in a single step. Monitor performance and roll back if issues arise.
   + **Canary deployment** – Gradually roll out the migration to a subset of users first, incrementally increasing traffic to Amazon Keyspaces as the primary until fully migrated.

1. **Decommissioning Cassandra**

   Once your application is fully migrated to Amazon Keyspaces and data consistency is validated, you can plan to decommission your Cassandra cluster based on data retention policies.

By planning an online migration strategy with these components, you can transition smoothly to the fully managed Amazon Keyspaces service with minimal downtime or disruption. The following sections go into each component in more detail.

**Topics**
+ [Writing new data during an online migration](migration-online-dw.md)
+ [Uploading historical data during an online migration](migration-online-historical.md)
+ [Validating data consistency during an online migration](migration-online-validation.md)
+ [Migrating the application during an online migration](migration-online-app-migration.md)
+ [Decommissioning Cassandra after an online migration](migration-online-decommission.md)

# Writing new data during an online migration
<a name="migration-online-dw"></a>

The first step in an online migration plan is to ensure that any new data written by the application is stored in both databases: your existing Cassandra cluster and Amazon Keyspaces. The goal is to provide a consistent view across the two data stores. You can do this by applying all new writes to both databases. To implement dual writes, consider one of the following three options.
+ **ZDM Dual Write Proxy for Amazon Keyspaces Migration** – Using the ZDM Proxy for Amazon Keyspaces available on [GitHub](https://github.com/aws-samples/amazon-keyspaces-examples/blob/main/migration/online/zdm-proxy/README.md), you can migrate your Apache Cassandra workloads to Amazon Keyspaces without application downtime. This enhanced solution implements AWS best practices and extends the official ZDM Proxy capabilities.
  + Perform online migrations between Apache Cassandra and Amazon Keyspaces.
  + Write data to both source and target tables simultaneously without refactoring applications.
  + Validate queries through dual-read operations.

  The solution offers the following enhancements to work with AWS and Amazon Keyspaces.
  + **Container deployment** – Use a pre-configured Docker image from Amazon Elastic Container Registry (Amazon ECR) for VPC-accessible deployments.
  + **Infrastructure as code** – Deploy using AWS CloudFormation templates for automated setup on AWS Fargate.
  + **Amazon Keyspaces compatibility** – Access system tables with custom adaptations for Amazon Keyspaces.

  The solution runs on Amazon ECS with Fargate, providing serverless scalability based on your workload demands. A network load balancer distributes incoming application traffic across multiple Amazon ECS tasks for high availability.  
![\[Implementing the ZDM dual write proxy for migrating data from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-zdm.png)
+ **Application dual writes** – You can implement dual writes with minimal changes to your application code by leveraging the existing Cassandra client libraries and drivers. You can either implement dual writes in your existing application, or create a new layer in the architecture to handle dual writes. For more information and a customer case study that shows how dual writes were implemented in an existing application, see [Cassandra migration case study](https://aws.amazon.com/solutions/case-studies/intuit-apache-migration-case-study/).

  When implementing dual writes, you can designate one database as the leader and the other database as the follower. This allows you to keep writing to your original source, or leader, database without letting write failures to the follower, or destination, database disrupt the critical path of your application.

  Instead of retrying failed writes to the follower, you can use Amazon Simple Queue Service to record failed writes in a [dead letter queue (DLQ)](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html). The DLQ lets you analyze the failed writes to the follower and determine why processing did not succeed in the destination database.

  For a more sophisticated dual write implementation, you can follow AWS best practices for designing a sequence of local transactions using the [saga pattern](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/saga.html). A saga pattern ensures that if a transaction fails, the saga runs compensating transactions to revert the database changes made by the previous transactions.

  When using dual writes for an online migration, you can configure the dual writes following the saga pattern so that each write is a local transaction to ensure atomic operations across heterogeneous databases. For more information about designing distributed applications using recommended design patterns for the AWS Cloud, see [Cloud design patterns, architectures, and implementations](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/introduction).  
![\[Implementing dual writes at the application layer when migrating from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-dual-writes.png)
+ **Messaging tier dual writes** – Instead of implementing dual writes at the application layer, you can use your existing messaging tier to perform dual writes to Cassandra and Amazon Keyspaces. 

  To do this you can configure an additional consumer to your messaging platform to send writes to both data stores. This approach provides a simple low code strategy using the messaging tier to create two views across both databases that are eventually consistent. 
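The leader/follower flow described above can be sketched in a few lines of Python. This is a minimal simulation, not a production implementation: the two write functions stand in for calls made through a Cassandra driver session (leader) and an Amazon Keyspaces session (follower), and the in-process queue stands in for an Amazon SQS dead letter queue. All function and variable names here are illustrative, not part of any AWS API.

```python
import json
import queue

# Stand-in for the SQS dead letter queue that records failed follower writes.
dlq = queue.Queue()

def dual_write(leader_write, follower_write, record):
    # The leader write is on the critical path: failures propagate to the caller.
    leader_write(record)
    try:
        # The follower write must not disrupt the application if it fails.
        follower_write(record)
    except Exception as err:
        # Record the failed follower write for later analysis and replay.
        dlq.put(json.dumps({"record": record, "error": str(err)}))

# Example run: the follower rejects one of three records.
store_leader, store_follower = [], []

def write_leader(record):
    store_leader.append(record)

def write_follower(record):
    if record["id"] == 2:
        raise RuntimeError("follower unavailable")
    store_follower.append(record)

for rec in [{"id": 1}, {"id": 2}, {"id": 3}]:
    dual_write(write_leader, write_follower, rec)

# The leader holds all three writes; the follower is missing one,
# and that failure is captured in the DLQ instead of failing the app.
print(len(store_leader), len(store_follower), dlq.qsize())  # 3 2 1
```

The key design point is that only the leader write can raise an exception to the application; follower failures are converted into DLQ entries for reconciliation.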

# Uploading historical data during an online migration
<a name="migration-online-historical"></a>

After implementing dual writes to ensure that new data is written to both data stores in real time, the next step in the migration plan is to evaluate how much historical data you need to copy or bulk upload from Cassandra to Amazon Keyspaces. This ensures that both new data and historical data are available in the new Amazon Keyspaces database before you migrate the application.

Depending on your data retention requirements, for example how much historical data you need to preserve based on your organization's policies, you can consider one of the following two options.
+ **Bulk upload of historical data** – The migration of historical data from your existing Cassandra deployment to Amazon Keyspaces can be achieved through various techniques, for example using AWS Glue or custom scripts to extract, transform, and load (ETL) the data. For more information about using AWS Glue to upload historical data, see [Offline migration process: Apache Cassandra to Amazon Keyspaces](migrating-offline.md). 

  When planning the bulk upload of historical data, you need to consider how to resolve conflicts that can occur when new writes update the same data that is in the process of being uploaded. The bulk upload is eventually consistent, which means that the uploaded data reaches all nodes only after some delay.

  If a new write updates the same data while the bulk upload is in progress, you want to ensure that the newer write isn't overwritten by the historical data upload. To preserve the latest updates to your data even during the bulk import, you must add conflict resolution either into the bulk upload scripts or into the application logic for dual writes.

  For example, you can use [Lightweight transactions](functional-differences.md#functional-differences.light-transactions) (LWT) to perform compare-and-set operations. To do this, you can add an additional field to your data model that represents the time of modification or state.

  Additionally, Amazon Keyspaces supports the Cassandra `WRITETIME` timestamp function. You can use Amazon Keyspaces client-side timestamps to preserve source database timestamps and implement last-writer-wins conflict resolution. For more information, see [Client-side timestamps in Amazon Keyspaces](client-side-timestamps.md).
+ **Using Time to Live (TTL)** – For shorter data retention periods, for example 30, 60, or 90 days, you can use TTL in Cassandra and Amazon Keyspaces during the migration to avoid uploading unnecessary historical data to Amazon Keyspaces. TTL allows you to set a time period after which the data is automatically removed from the database.

  During the migration phase, instead of copying historical data to Amazon Keyspaces, you can configure the TTL settings to let the historical data expire automatically in the old system (Cassandra) while only applying the new writes to Amazon Keyspaces using the dual-write method. Over time and with old data continually expiring in the Cassandra cluster and new data written using the dual-write method, Amazon Keyspaces automatically catches up to contain the same data as Cassandra.

   This approach can significantly reduce the amount of data to be migrated, resulting in a more efficient and streamlined migration process. You can consider this approach when dealing with large datasets with varying data retention requirements. For more information about TTL, see [Expire data with Time to Live (TTL) for Amazon Keyspaces (for Apache Cassandra)](TTL.md).

  Consider the following example of a migration from Cassandra to Amazon Keyspaces using TTL data expiration. In this example we set TTL for both databases to 60 days and show how the migration process progresses over a period of 90 days. Both databases receive the same newly written data during this period using the dual writes method. We're going to look at three different phases of the migration, each phase is 30 days long. 

  How the migration process works for each phase is shown in the following images.   
![\[Using TTL to expire historical data when migrating from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-TTL.png)

  1. After the first 30 days, the Cassandra cluster and Amazon Keyspaces have been receiving new writes. The Cassandra cluster also contains historical data that has not yet reached 60 days of retention, which makes up 50% of the data in the cluster. 

     Data that is older than 60 days is automatically deleted in the Cassandra cluster using TTL. At this point, Amazon Keyspaces contains 50% of the data stored in the Cassandra cluster: the last 30 days of new writes without the older historical data.

  1. After 60 days, both the Cassandra cluster and Amazon Keyspaces contain the same data written in the last 60 days.

  1. Within 90 days, both Cassandra and Amazon Keyspaces contain the same data and are expiring data at the same rate. 

  This example illustrates how you can avoid the step of uploading historical data by using TTL with an expiration period of 60 days.
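The conflict-resolution idea from the bulk upload option above can be sketched as a last-writer-wins merge. This is a hedged simulation: the dictionary stands in for an Amazon Keyspaces table, and `write_time` stands in for the microsecond timestamp a Cassandra `WRITETIME` function would report. The function name `apply_if_newer` is illustrative, not an API.

```python
# Last-writer-wins conflict resolution between a bulk upload and
# concurrent dual writes. The bulk loader only applies a historical row
# if the table does not already hold a newer version of that key.
def apply_if_newer(table, key, value, write_time):
    existing = table.get(key)
    if existing is None or existing["write_time"] < write_time:
        table[key] = {"value": value, "write_time": write_time}
        return True
    # A newer dual write already landed; keep it and skip the historical row.
    return False

keyspaces_table = {}

# A dual write lands first with a recent timestamp.
apply_if_newer(keyspaces_table, "user#1", "new-address", write_time=2_000)

# The bulk loader then tries to apply the older, historical version of the row.
applied = apply_if_newer(keyspaces_table, "user#1", "old-address", write_time=1_000)

print(applied, keyspaces_table["user#1"]["value"])  # False new-address
```

In a real migration, the same comparison can be expressed server-side with an LWT condition on a modification-time column, or by writing with explicit client-side timestamps so the database itself resolves the conflict.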

# Validating data consistency during an online migration
<a name="migration-online-validation"></a>

The next step in the online migration process is data validation. At this point, dual writes are adding new data to your Amazon Keyspaces database, and you have completed the migration of historical data using either bulk upload or data expiration with TTL.

Now you can use the validation phase to confirm that both data stores in fact contain the same data and return the same read results. You can choose one of the following two options to validate that both of your databases contain identical data.
+ **Dual reads** – To validate that both the source and the destination database contain the same set of newly written and historical data, you can implement dual reads. To do so, you read from both your primary Cassandra database and your secondary Amazon Keyspaces database, similarly to the dual writes method, and compare the results asynchronously.

  The results from the primary database are returned to the client, and the results from the secondary database are used to validate against the primary result set. Differences found can be logged or sent to a [dead letter queue (DLQ)](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) for later reconciliation.

  In the following diagram, the application is performing a synchronous read from Cassandra, which is the primary data store, and an asynchronous read from Amazon Keyspaces, which is the secondary data store.  
![\[Using dual reads to validate data consistency during an online migration from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-dual-reads.png)
+ **Sample reads** – An alternative solution that doesn’t require application code changes is to use AWS Lambda functions to periodically and randomly sample data from both the source Cassandra cluster and the destination Amazon Keyspaces database. 

  These Lambda functions can be configured to run at regular intervals. The Lambda function retrieves a random subset of data from both the source and destination systems, and then performs a comparison of the sampled data. Any discrepancies or mismatches between the two datasets can be recorded and sent to a dedicated [dead letter queue (DLQ)](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html) for later reconciliation.

  This process is illustrated in the following diagram.  
![\[Using sample reads to validate data consistency during an online migration from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-sample-reads.png)
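The sample-read comparison that such a Lambda function performs can be sketched as follows. This is a simplified simulation under stated assumptions: the two dictionaries stand in for reads from the Cassandra cluster and the Amazon Keyspaces table, and mismatched keys are returned instead of being sent to an SQS DLQ. The function name `sample_and_compare` is hypothetical.

```python
import random

def sample_and_compare(source, destination, sample_size, seed=None):
    """Read a random sample of keys from the source and compare the
    values against the destination, returning the mismatched keys."""
    rng = random.Random(seed)
    keys = rng.sample(sorted(source), min(sample_size, len(source)))
    mismatches = []
    for key in keys:
        if destination.get(key) != source[key]:
            # In a real deployment this key/value pair would be sent
            # to the DLQ for later reconciliation.
            mismatches.append(key)
    return mismatches

cassandra_rows = {"a": 1, "b": 2, "c": 3}
keyspaces_rows = {"a": 1, "b": 99, "c": 3}  # row "b" has drifted

print(sample_and_compare(cassandra_rows, keyspaces_rows, sample_size=3))  # ['b']
```

Because the sampling is random, repeated runs at a modest sample size eventually cover the keyspace; the trade-off between sample size and schedule frequency determines how quickly drift is detected.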

# Migrating the application during an online migration
<a name="migration-online-app-migration"></a>

In the fourth phase of an online migration, you are migrating your application and transitioning to Amazon Keyspaces as the primary data store. This means that you switch your application to read and write directly from and to Amazon Keyspaces. To ensure minimal disruption to your users, this should be a well-planned and coordinated process. 

Two recommended solutions for application migration are available: the blue/green cutover strategy and the canary cutover strategy. The following sections outline these strategies in more detail.
+ **Blue/green strategy** – Using this approach, you switch your application to treat Amazon Keyspaces as the primary data store and Cassandra as the secondary data store in a single step. You can do this using an AWS AppConfig feature flag to control the selection of primary and secondary data stores across your application instances. For more information about feature flags, see [Creating a feature flag configuration profile in AWS AppConfig](https://docs.aws.amazon.com/appconfig/latest/userguide/appconfig-creating-configuration-and-profile-feature-flags.html).

  After making Amazon Keyspaces the primary data store, you monitor the application's behavior and performance, ensuring that Amazon Keyspaces meets your requirements and that the migration is successful.

  For example, if you implemented dual reads for your application, during the application migration phase you transition the primary reads from Cassandra to Amazon Keyspaces and the secondary reads from Amazon Keyspaces to Cassandra. After the transition, you continue to monitor and compare results as described in the [data validation](migration-online-validation.md) section to ensure consistency across both databases before decommissioning Cassandra.

  If you detect any issues, you can quickly roll back to the previous state by reverting to Cassandra as the primary data store. You only proceed to the decommissioning phase of the migration if Amazon Keyspaces is meeting all your needs as the primary data store.  
![\[Using the blue green strategy for migrating an application from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-switch.png)
+ **Canary strategy** – In this approach, you gradually roll out the migration to a subset of your users or traffic. Initially, a small percentage of your application's traffic, for example 5%, is routed to the version using Amazon Keyspaces as the primary data store, while the rest of the traffic continues to use Cassandra as the primary data store.

  This allows you to thoroughly test the migrated version with real-world traffic, monitor its performance and stability, and investigate potential issues. If you don't detect any issues, you can incrementally increase the percentage of traffic routed to Amazon Keyspaces until it becomes the primary data store for all users and traffic.

  This staged rollout minimizes the risk of widespread service disruptions and allows for a more controlled migration process. If any critical issues arise during the canary deployment, you can quickly roll back to the previous version using Cassandra as the primary data store for the affected traffic segment. You only proceed to the decommissioning phase of the migration after you have validated that Amazon Keyspaces handles 100% of your users and traffic as expected.

  The following diagram illustrates the individual steps of the canary strategy.  
![\[Using the canary strategy for migrating an application from Apache Cassandra to Amazon Keyspaces.\]](http://docs.aws.amazon.com/keyspaces/latest/devguide/images/migration/online-migration-canary.png)
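One common way to implement the canary routing described above is bucketing by a stable hash of the user ID, so that the same user always hits the same primary data store while the canary percentage grows. The following is a minimal sketch under that assumption; the function name `primary_store` and the bucket scheme are illustrative, not part of any AWS service.

```python
import hashlib

def primary_store(user_id, canary_percent):
    """Map a user into one of 100 stable buckets and route the lowest
    canary_percent buckets to Amazon Keyspaces as the primary store."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "keyspaces" if bucket < canary_percent else "cassandra"

# At a 5% canary, roughly 5% of users read and write Amazon Keyspaces first.
users = [f"user-{i}" for i in range(1000)]
share = sum(primary_store(u, 5) == "keyspaces" for u in users) / len(users)
print(f"{share:.1%} of users routed to Amazon Keyspaces")
```

Because the hash is deterministic, raising `canary_percent` from 5 to 25 to 100 only ever moves users toward Amazon Keyspaces and never flips a migrated user back, which keeps the rollout monotonic and the rollback path simple (lower the percentage again).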

# Decommissioning Cassandra after an online migration
<a name="migration-online-decommission"></a>

After the application migration is complete and your application is fully running on Amazon Keyspaces, and you have validated data consistency over a period of time, you can plan to decommission your Cassandra cluster. During this phase, you can evaluate whether the data remaining in your Cassandra cluster needs to be archived or can be deleted. This depends on your organization's policies for data handling and retention.

By following this strategy and considering the recommended best practices described in this topic when planning your online migration from Cassandra to Amazon Keyspaces, you can ensure a seamless transition to Amazon Keyspaces while maintaining read-after-write consistency and availability of your application.

Migrating from Apache Cassandra to Amazon Keyspaces can provide numerous benefits, including reduced operational overhead, automatic scaling, improved security, and a framework that helps you to reach your compliance goals. By planning an online migration strategy with dual writes, historical data upload, data validation, and a gradual rollout, you can ensure a smooth transition with minimal disruption to your application and its users.

Implementing the online migration strategy discussed in this topic allows you to validate the migration results, identify and address any issues, and ultimately decommission your existing Cassandra deployment in favor of the fully managed Amazon Keyspaces service. 