# Saga patterns
<a name="saga"></a>

A *saga* consists of a sequence of local transactions. Each local transaction in a saga updates the database and triggers the next local transaction. If a transaction fails, the saga runs compensating transactions to revert the database changes made by the previous transactions.

This sequence of local transactions helps achieve a business workflow by using continuation and compensation principles. The *continuation principle* decides the forward recovery of the workflow, whereas the *compensation principle* decides the backward recovery. If the update fails at any step in the transaction, the saga publishes an event for either continuation (to retry the transaction) or compensation (to go back to the previous data state). This helps ensure that data integrity is maintained and that data stays consistent across the data stores.

For example, when a user purchases a book from an online retailer, the process consists of a sequence of transactions—such as order creation, inventory update, payment, and shipping—that represents a business workflow. In order to complete this workflow, the distributed architecture issues a sequence of local transactions to create an order in the order database, update the inventory database, and update the payment database. When the process is successful, these transactions are invoked sequentially to complete the business workflow, as the following diagram shows. However, if any of these local transactions fails, the system should be able to decide on an appropriate next step—that is, either a forward recovery or a backward recovery.

![\[Business workflows for the saga pattern.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-1.png)


The following two scenarios help determine whether the next step is forward recovery or backward recovery:
+ Platform-level failure, where something goes wrong with the underlying infrastructure and causes the transaction to fail. In this case, the saga pattern can perform a forward recovery by retrying the local transaction and continuing the business process.
+ Application-level failure, where a business rule rejects the transaction (for example, the payment service fails because of an invalid payment). In this case, the saga pattern can perform a backward recovery by issuing compensatory transactions that restore the inventory and order databases to their previous state.

The saga pattern handles the business workflow and ensures that a desirable end state is reached through forward recovery. In case of failures, it reverts the local transactions by using backward recovery to avoid data consistency issues.
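
The choice between the two recovery paths can be sketched in a few lines of Python. This is a minimal illustration, not part of the pattern's reference implementation: the exception classes `TransientError` and `BusinessError`, the function `run_with_recovery`, and the retry limit are all hypothetical names and values chosen for this example.

```python
import time


class TransientError(Exception):
    """Platform-level failure (for example, a timeout); a retry may succeed."""


class BusinessError(Exception):
    """Application-level failure (for example, an invalid payment); a retry won't help."""


def run_with_recovery(step, max_attempts=3, delay_seconds=0.0):
    """Run one local transaction and report which recovery path applies.

    Returns "completed" when the step succeeds (possibly after retries),
    or "compensate" when the saga should start backward recovery.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            step()
            return "completed"
        except TransientError:
            # Forward recovery: retry the same local transaction.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
        except BusinessError:
            # Backward recovery: retrying cannot fix a business-rule failure.
            return "compensate"
    # Retries are exhausted, so fall back to backward recovery.
    return "compensate"


# Usage: a step that fails once with a platform error, then succeeds.
calls = {"count": 0}


def flaky_inventory_update():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("database connection reset")


def invalid_payment():
    raise BusinessError("card declined")


print(run_with_recovery(flaky_inventory_update))  # completed
print(run_with_recovery(invalid_payment))         # compensate
```

Falling back to compensation when retries are exhausted is one possible policy. A real system might instead park the message for manual review.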

The saga pattern has two variants: choreography and orchestration.

## Saga choreography
<a name="s-choreography"></a>

The saga choreography pattern depends on the events published by the microservices. The saga participants (microservices) subscribe to the events and act based on the event triggers. For example, the order service in the following diagram emits an `OrderPlaced` event. The inventory service subscribes to that event and updates the inventory when the `OrderPlaced` event is emitted. Similarly, the participant services act based on the context of the emitted event.

The saga choreography pattern is suitable when there are only a few participants in the saga, and you need a simple implementation with no single point of failure. When more participants are added, it becomes harder to track the dependencies between the participants by using this pattern.

![\[Saga choreography pattern\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-2.png)


For a detailed review, see the [Saga choreography](saga-choreography.md) section of this guide.

## Saga orchestration
<a name="s-orchestration"></a>

The saga orchestration pattern has a central coordinator called an *orchestrator*. The saga orchestrator manages and coordinates the entire transaction lifecycle. It is aware of the series of steps to be performed to complete the transaction. To run a step, it sends a message to the participant microservice to perform the operation. The participant microservice completes the operation and sends a message back to the orchestrator. Based on the message it receives, the orchestrator decides which microservice to run next in the transaction.

The saga orchestration pattern is suitable when there are many participants, and loose coupling is required between saga participants. The orchestrator encapsulates the workflow logic, so the participants don't need to know about one another and remain loosely coupled. However, the orchestrator can become a single point of failure because it controls the entire workflow.

![\[Saga orchestration pattern\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-3.png)


For a detailed review, see the [Saga orchestration](saga-orchestration.md) section of this guide.

# Saga choreography pattern
<a name="saga-choreography"></a>

## Intent
<a name="saga-choreography-intent"></a>

The saga choreography pattern helps preserve data integrity in distributed transactions that span multiple services by using event subscriptions. In a distributed transaction, multiple services can be called before a transaction is completed. When the services store data in different data stores, it can be challenging to maintain data consistency across these data stores.

## Motivation
<a name="saga-choreography-motivation"></a>

A *transaction* is a single unit of work that might involve multiple steps, where all steps are completely executed or no step is executed, resulting in a data store that retains its consistent state. The terms *atomicity, consistency, isolation, and durability (ACID)* define the properties of a transaction. Relational databases provide ACID transactions to maintain data consistency.

To maintain consistency in a transaction that spans multiple nodes or resources, relational databases use the two-phase commit (2PC) method. This consists of a *prepare phase* and a *commit phase*.
+ In the prepare phase, the coordinating process requests the transaction's participating processes (participants) to promise to either commit or roll back the transaction.
+ In the commit phase, the coordinating process requests the participants to commit the transaction. If the participants cannot agree to commit in the prepare phase, the transaction is rolled back.
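
The two phases can be illustrated with a small in-memory sketch. The `Participant` class and the `two_phase_commit` function are hypothetical and exist only for this example. A real coordinator also has to persist its decision and handle participants that crash between the phases, which this sketch leaves out.

```python
class Participant:
    """A hypothetical resource manager that can vote on and apply a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Vote: promise to commit (True) or refuse (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): everyone promised, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # At least one participant refused, so roll back everywhere.
    for p in participants:
        p.rollback()
    return False


orders = Participant("orders")
payments = Participant("payments", can_commit=False)
print(two_phase_commit([orders, payments]))  # False
print(orders.state, payments.state)          # rolled_back rolled_back
```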

In distributed systems that follow a database-per-service design pattern, the two-phase commit is not an option. This is because each transaction is distributed across various databases, and there is no single controller that can coordinate a process that's similar to the two-phase commit in relational data stores. In this case, one solution is to use the saga choreography pattern.

## Applicability
<a name="saga-choreography-applicability"></a>

Use the saga choreography pattern when:
+ Your system requires data integrity and consistency in distributed transactions that span multiple data stores.
+ The data store (for example, a NoSQL database) doesn't support 2PC for ACID transactions, you need to update multiple tables within a single transaction, and implementing 2PC within the application boundaries would be a complex task.
+ A central controlling process that manages the participant transactions might become a single point of failure.
+ The saga participants are independent services and need to be loosely coupled.
+ There is communication between bounded contexts in a business domain.

## Issues and considerations
<a name="saga-choreography-issues"></a>
+ **Complexity**: As the number of microservices increases, saga choreography can become difficult to manage because of the number of interactions between the microservices. Additionally, compensatory transactions and retries add complexity to the application code, which can result in maintenance overhead. Choreography is suitable when there are only a few participants in the saga, and you need a simple implementation with no single point of failure. When more participants are added, it becomes harder to track the dependencies between the participants by using this pattern.
+ **Resilient implementation**: In saga choreography, it's more difficult to implement timeouts, retries, and other resiliency patterns globally, compared with saga orchestration. These patterns must be implemented in each individual component instead of once at an orchestrator level.
+ **Cyclic dependencies**: The participants consume messages that are published by one another. This might result in cyclic dependencies, which can lead to code complexity, maintenance overhead, and possible deadlocks.
+ **Dual writes issue**: The microservice has to atomically update the database and publish an event. The failure of either operation might lead to an inconsistent state. One way to solve this is to use the [transactional outbox pattern](transactional-outbox.md).
+ **Preserving events**: The saga participants act based on the events published. It's important to save the events in the order they occur for audit, debugging, and replay purposes. You can use the [event sourcing pattern](event-sourcing.md) to persist the events in an event store in case a replay of the system state is required to restore data consistency. Event stores can also be used for auditing and troubleshooting purposes because they reflect every change in the system.
+ **Eventual consistency**: The sequential processing of local transactions results in eventual consistency, which can be a challenge in systems that require strong consistency. You can address this issue by setting your business teams' expectations for the consistency model, or by reassessing the use case and switching to a database that provides strong consistency.
+ **Idempotency**: Saga participants have to be idempotent to allow repeated execution in case of transient failures, such as unexpected crashes or events that the message broker delivers more than once.
+ **Transaction isolation**: The saga pattern lacks transaction isolation, which is one of the four properties in ACID transactions. The [degree of isolation](https://docs.aws.amazon.com/neptune/latest/userguide/transactions-isolation-levels.html) of a transaction determines how much other concurrent transactions can affect the data that the transaction operates on. Concurrent sagas that operate on the same data can read stale data. We recommend using semantic locking (an application-level flag that marks a record as being updated by an in-progress saga) to handle such scenarios.
+ **Observability**: Observability refers to detailed logging and tracing to troubleshoot issues in the implementation and execution of the saga. This becomes important when the number of saga participants increases, resulting in complexities in debugging. End-to-end monitoring and reporting are more difficult to achieve in saga choreography, compared with saga orchestration.
+ **Latency issues**: Compensatory transactions can add latency to the overall response time when the saga consists of several steps. If the transactions make synchronous calls, this can increase the latency further.
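
To make the dual writes issue from the list above concrete, the following sketch shows the core idea of the transactional outbox pattern by using Python's built-in `sqlite3` module. The table names, the `place_order` and `relay_outbox` functions, and the `publish` callback are assumptions made for this example. The point is that the business row and the event row are written in one local transaction, and a separate relay publishes the event afterward.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)


def place_order(order_id):
    """Write the order row and its event in one local ACID transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"orderId": order_id})),
        )


def relay_outbox(publish):
    """Publish pending events in insertion order, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))


sent = []
place_order("order-1")
relay_outbox(lambda event_type, payload: sent.append((event_type, payload)))
print(sent)  # [('OrderPlaced', {'orderId': 'order-1'})]
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run. This is one reason why the participants that consume the events need to be idempotent.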

## Implementation
<a name="saga-choreography-implementation"></a>

### High-level architecture
<a name="saga-choreography-high-level-arch"></a>

In the following architecture diagram, the saga choreography has three participants: the order service, the inventory service, and the payment service. Three steps are required to complete the transaction: T1, T2, and T3. Three compensatory transactions restore the data to the initial state: C1, C2, and C3.

![\[Saga choreography high-level architecture\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-choreography-1.png)

+ The order service runs a local transaction, T1, which atomically updates the database and publishes an `Order placed` message to the message broker.
+ The inventory service subscribes to the order service messages and receives the message that an order has been created.
+ The inventory service runs a local transaction, T2, which atomically updates the database and publishes an `Inventory updated` message to the message broker.
+ The payment service subscribes to the messages from the inventory service and receives the message that the inventory has been updated.
+ The payment service runs a local transaction, T3, which atomically updates the database with payment details and publishes a `Payment processed` message to the message broker.
+ If the payment fails, the payment service runs a compensatory transaction, C1, which atomically reverts the payment in the database and publishes a `Payment failed` message to the message broker.
+ The compensatory transactions C2 and C3 are run to restore data consistency.
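
The flow above can be simulated with a synchronous, in-memory stand-in for the message broker. The sketch is illustrative only: the `Broker` class, the `wire_services` and `run_saga` functions, and the shape of the `state` dictionary are hypothetical, and a real broker delivers messages asynchronously. The `Inventory reverted` message name is borrowed from the AWS implementation described later in this section.

```python
from collections import defaultdict


class Broker:
    """A synchronous in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # message types in the order they were published

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, order):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(order)


def wire_services(broker, state, payment_succeeds):
    """Subscribe each service to the messages it reacts to; return the entry point."""

    def place_order(order):  # T1
        state["orders"][order["id"]] = "PLACED"
        broker.publish("Order placed", order)

    def update_inventory(order):  # T2
        state["stock"] -= order["quantity"]
        broker.publish("Inventory updated", order)

    def process_payment(order):  # T3, or C1 when the payment fails
        if payment_succeeds:
            state["payments"][order["id"]] = "PAID"
            broker.publish("Payment processed", order)
        else:
            state["payments"].pop(order["id"], None)
            broker.publish("Payment failed", order)

    def revert_inventory(order):  # C2
        state["stock"] += order["quantity"]
        broker.publish("Inventory reverted", order)

    def remove_order(order):  # C3
        state["orders"].pop(order["id"], None)

    broker.subscribe("Order placed", update_inventory)
    broker.subscribe("Inventory updated", process_payment)
    broker.subscribe("Payment failed", revert_inventory)
    broker.subscribe("Inventory reverted", remove_order)
    return place_order


def run_saga(payment_succeeds):
    broker = Broker()
    state = {"orders": {}, "stock": 10, "payments": {}}
    place_order = wire_services(broker, state, payment_succeeds)
    place_order({"id": "order-1", "quantity": 2})
    return state, broker.log


final_state, messages = run_saga(payment_succeeds=False)
print(messages)
print(final_state)
```

No component in this sketch knows the whole workflow. Each service only knows which message it consumes and which message it publishes, which is what distinguishes choreography from orchestration.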

### Implementation using AWS services
<a name="saga-choreography-aws-services"></a>

You can implement the saga choreography pattern by using Amazon EventBridge. EventBridge uses events to connect application components. It processes events through event buses or pipes. An event bus is a router that receives [events](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-events.html) and delivers them to zero or more destinations, or *targets*. [Rules](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rules.html) associated with the event bus evaluate events as they arrive and send them to [targets](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-targets.html) for processing.

In the following architecture:
+ The microservices—order service, inventory service, and payment service—are implemented as Lambda functions.
+ There are three custom EventBridge buses: `Orders` event bus, `Inventory` event bus, and `Payment` event bus.
+ `Orders` rules, `Inventory` rules, and `Payment` rules match the events that are sent to the corresponding event bus and invoke the Lambda functions.

![\[Saga choreography architecture using AWS services\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-choreography-2.png)
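
To show how rules route events to targets, the following sketch implements a simplified matcher in Python. It covers only exact-value matching, where each pattern leaf is a list of allowed values. EventBridge event patterns support more operators than this, so treat the code as a mental model and not as the service's matching algorithm. The `matches` and `route` functions, the `com.example.orders` source, and the target names are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field in the pattern matches the event.

    Leaves in the pattern are lists of allowed values. Nested dicts
    describe nested event fields. Fields absent from the pattern are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(rules, event):
    """Return the names of the targets whose rule pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


# Hypothetical rules on the Orders event bus.
orders_rules = [
    ({"source": ["com.example.orders"], "detail-type": ["Order placed"]},
     "inventory-service"),
    ({"source": ["com.example.orders"], "detail": {"status": ["CANCELLED"]}},
     "audit-service"),
]

event = {
    "source": "com.example.orders",
    "detail-type": "Order placed",
    "detail": {"orderId": "order-1", "status": "PLACED"},
}
print(route(orders_rules, event))  # ['inventory-service']
```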


In a successful scenario, when an order is placed:

1. The order service processes the request and sends the event to the `Orders` event bus.

1. The `Orders` rules match the events and start the inventory service.

1. The inventory service updates the inventory and sends the event to the `Inventory` event bus.

1. The `Inventory` rules match the events and start the payment service.

1. The payment service processes the payment and sends the event to the `Payment` event bus.

1. The `Payment` rules match the events and send the `Payment processed` event notification to the listener.

   Alternatively, when there is an issue in order processing, the EventBridge rules start the compensatory transactions for reverting the data updates to maintain data consistency and integrity.

1. If the payment fails, the `Payment` rules process the event and start the inventory service. The inventory service runs compensatory transactions to revert the inventory.

1. When the inventory has been reverted, the inventory service sends the `Inventory reverted` event to the `Inventory` event bus. The `Inventory` rules process this event and start the order service, which runs the compensatory transaction to remove the order.

## Related content
<a name="saga-choreography-resources"></a>
+ [Saga orchestration pattern](saga-orchestration.md)
+ [Transactional outbox pattern](transactional-outbox.md)
+ [Retry with backoff pattern](retry-backoff.md)

# Saga orchestration pattern
<a name="saga-orchestration"></a>

## Intent
<a name="saga-orchestration-intent"></a>

The saga orchestration pattern uses a central coordinator (*orchestrator*) to help preserve data integrity in distributed transactions that span multiple services. In a distributed transaction, multiple services can be called before a transaction is completed. When the services store data in different data stores, it can be challenging to maintain data consistency across these data stores.

## Motivation
<a name="saga-orchestration-motivation"></a>

A *transaction* is a single unit of work that might involve multiple steps, where all steps are completely executed or no step is executed, resulting in a data store that retains its consistent state. The terms *atomicity, consistency, isolation, and durability (ACID)* define the properties of a transaction. Relational databases provide ACID transactions to maintain data consistency.

To maintain consistency in a transaction that spans multiple nodes or resources, relational databases use the two-phase commit (2PC) method. This consists of a *prepare phase* and a *commit phase*.
+ In the prepare phase, the coordinating process requests the transaction's participating processes (participants) to promise to either commit or roll back the transaction.
+ In the commit phase, the coordinating process requests the participants to commit the transaction. If the participants cannot agree to commit in the prepare phase, the transaction is rolled back.

In distributed systems that follow a database-per-service design pattern, the two-phase commit is not an option. This is because each transaction is distributed across various databases, and there is no single controller that can coordinate a process that's similar to the two-phase commit in relational data stores. In this case, one solution is to use the saga orchestration pattern.

## Applicability
<a name="saga-orchestration-applicability"></a>

Use the saga orchestration pattern when:
+ Your system requires data integrity and consistency in distributed transactions that span multiple data stores.
+ The data store doesn't support 2PC for ACID transactions, and implementing 2PC within the application boundaries is a complex task.
+ You have NoSQL databases, which do not provide ACID transactions, and you need to update multiple tables within a single transaction.

## Issues and considerations
<a name="saga-orchestration-issues"></a>
+ **Complexity**: Compensatory transactions and retries add complexities to the application code, which can result in maintenance overhead.
+ **Eventual consistency**: The sequential processing of local transactions results in eventual consistency, which can be a challenge in systems that require strong consistency. You can address this issue by setting your business teams' expectations for the consistency model or by switching to a data store that provides strong consistency.
+ **Idempotency**: Saga participants need to be idempotent to allow repeated execution in case of transient failures caused by unexpected crashes and orchestrator failures.
+ **Transaction isolation**: Saga lacks transaction isolation. Concurrent orchestration of transactions can lead to stale data. We recommend using semantic locking to handle such scenarios.
+ **Observability**: Observability refers to detailed logging and tracing to troubleshoot issues in the execution and orchestration process. This becomes important when the number of saga participants increases, resulting in complexities in debugging.
+ **Latency issues**: Compensatory transactions can add latency to the overall response time when the saga consists of several steps. Avoid synchronous calls in such cases.
+ **Single point of failure**: The orchestrator can become a single point of failure because it coordinates the entire transaction. In some cases, the saga choreography pattern is preferred because of this issue.
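
The idempotency requirement from the list above can be sketched as follows. The `InventoryService` class and the format of the idempotency key are assumptions made for this example. In a real service, the processed-key record and the stock update would need to be stored in the same local transaction.

```python
class InventoryService:
    """A participant that applies each request at most once."""

    def __init__(self, stock):
        self.stock = stock
        self.processed = {}  # idempotency key -> result of the first execution

    def reserve(self, idempotency_key, quantity):
        # A repeated key means a retry: return the stored result, change nothing.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        if quantity > self.stock:
            result = {"Status": "ERROR"}
        else:
            self.stock -= quantity
            result = {"Status": "INVENTORY_UPDATED"}
        self.processed[idempotency_key] = result
        return result


service = InventoryService(stock=5)
first = service.reserve("order-1:reserve", 2)
retry = service.reserve("order-1:reserve", 2)  # for example, the orchestrator retried
print(first, retry, service.stock)
```

Because the second call returns the stored result, a retry after a crash or timeout doesn't reserve the stock twice.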

## Implementation
<a name="saga-orchestration-implementation"></a>

### High-level architecture
<a name="saga-orchestration-implementation-high-level-arch"></a>

In the following architecture diagram, the saga orchestrator has three participants: the order service, the inventory service, and the payment service. Three steps are required to complete the transaction: T1, T2, and T3. The saga orchestrator is aware of the steps and runs them in the required order. When step T3 fails (payment failure), the orchestrator runs the compensatory transactions C1 and C2 to restore the data to the initial state.

![\[Saga orchestrator high-level architecture\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-orchestration-1.png)
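
The behavior in the diagram can be approximated by a small in-process orchestrator. This is a sketch under stated assumptions, not the implementation used in the sample solution: the `StepFailed` exception, the `run_saga` function, and the `db` dictionary are hypothetical, and a production orchestrator also has to persist its progress so that it can resume after a crash.

```python
class StepFailed(Exception):
    """Raised by a participant when its local transaction cannot complete."""


def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of the completed steps in reverse
    order and return ("FAILED", history). Otherwise return ("SUCCEEDED", history).
    """
    history = []
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except StepFailed:
            history.append(f"{name} failed")
            for done_name, undo in reversed(completed):
                undo()
                history.append(f"compensated {done_name}")
            return "FAILED", history
        history.append(f"{name} done")
        completed.append((name, compensation))
    return "SUCCEEDED", history


db = {"order": None, "stock": 10}


def place_order():  # T1
    db["order"] = "PLACED"


def remove_order():  # C1
    db["order"] = None


def update_inventory():  # T2
    db["stock"] -= 1


def revert_inventory():  # C2
    db["stock"] += 1


def make_payment():  # T3 fails in this run
    raise StepFailed("payment declined")


def revert_payment():  # not reached, because T3 never completed
    pass


status, history = run_saga([
    ("T1", place_order, remove_order),
    ("T2", update_inventory, revert_inventory),
    ("T3", make_payment, revert_payment),
])
print(status, history)
print(db)
```

Only the steps that completed are compensated, and they are undone in reverse order (C2, then C1), which returns the data to its initial state.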


You can use [AWS Step Functions](https://aws.amazon.com/step-functions/) to implement saga orchestration when the transaction is distributed across multiple databases.

### Implementation using AWS services
<a name="saga-orchestration-implementation-aws-services"></a>

The sample solution uses the standard workflow in Step Functions to implement the saga orchestration pattern.

![\[Implementing the saga workflow with Step Functions\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-orchestration-2.png)


When a customer calls the API, a Lambda function is invoked to perform any preprocessing. The function then starts the Step Functions workflow, which processes the distributed transaction. If preprocessing isn't required, you can [initiate the Step Functions workflow directly](https://serverlessland.com/patterns/apigw-sfn) from API Gateway without using the Lambda function.

The use of Step Functions mitigates the single point of failure issue, which is inherent in the implementation of the saga orchestration pattern. Step Functions has built-in fault tolerance and maintains service capacity across multiple Availability Zones in each AWS Region to protect applications against individual machine or data center failures. This helps ensure high availability for both the service itself and for the application workflow it operates.

#### The Step Functions workflow
<a name="saga-orchestration-implementation-sfn"></a>

The Step Functions state machine allows you to configure the decision-based control flow requirements for the pattern implementation. The Step Functions workflow calls the individual services for order placement, inventory update, and payment processing to complete the transaction and sends an event notification for further processing. The Step Functions workflow acts as the orchestrator to coordinate the transactions. If any step in the workflow returns an error, the orchestrator runs the compensatory transactions to ensure that data integrity is maintained across services.

The following diagram shows the steps that run inside the Step Functions workflow. The `Place Order`, `Update Inventory`, and `Make Payment` steps indicate the success path. The order is placed, the inventory is updated, and the payment is processed before a `Success` state is returned to the caller.

The `Revert Payment`, `Revert Inventory`, and `Remove Order` Lambda functions indicate the compensatory transactions that the orchestrator runs when any step in the workflow fails. If the workflow fails at the `Update Inventory` step, the orchestrator calls the `Revert Inventory` and `Remove Order` steps before returning a `Fail` state to the caller. These compensatory transactions ensure that data integrity is maintained. The inventory returns to its original level and the order is reverted.

![\[Saga Step Functions workflow\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/images/saga-orchestration-3.png)


### Sample code
<a name="saga-orchestration-implementation-sample-code"></a>

The following sample code shows how you can create a saga orchestrator by using Step Functions. To view the complete code, see the [GitHub repository](https://github.com/aws-samples/saga-orchestration-netcore-blog) for this example.

#### Task definitions
<a name="saga-orchestration-task"></a>

```
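// Fragment from a CDK stack constructor. It assumes these namespaces are imported:
//   using Amazon.CDK;
//   using Amazon.CDK.AWS.StepFunctions;
//   using Amazon.CDK.AWS.StepFunctions.Tasks;
// The *Lambda variables are Lambda function constructs defined earlier in the stack.

// Terminal states of the workflow.
var successState = new Succeed(this, "SuccessState");
var failState = new Fail(this, "Fail");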

var placeOrderTask = new LambdaInvoke(this, "Place Order", new LambdaInvokeProps
{
    LambdaFunction = placeOrderLambda,
    Comment = "Place Order",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
});

var updateInventoryTask = new LambdaInvoke(this,"Update Inventory", new LambdaInvokeProps
{
    LambdaFunction = updateInventoryLambda,
    Comment = "Update inventory",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
});

var makePaymentTask = new LambdaInvoke(this,"Make Payment", new LambdaInvokeProps
{
    LambdaFunction = makePaymentLambda,
    Comment = "Make Payment",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
});

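// Compensating tasks are chained in reverse order:
// Revert Payment -> Revert Inventory -> Remove Order -> Fail.
var removeOrderTask = new LambdaInvoke(this, "Remove Order", new LambdaInvokeProps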
{
    LambdaFunction = removeOrderLambda,
    Comment = "Remove Order",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
}).Next(failState);

var revertInventoryTask = new LambdaInvoke(this,"Revert Inventory", new LambdaInvokeProps
{
    LambdaFunction = revertInventoryLambda,
    Comment = "Revert inventory",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
}).Next(removeOrderTask);

var revertPaymentTask = new LambdaInvoke(this,"Revert Payment", new LambdaInvokeProps
{
    LambdaFunction = revertPaymentLambda,
    Comment = "Revert Payment",
    RetryOnServiceExceptions = false,
    PayloadResponseOnly = true
}).Next(revertInventoryTask);

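// Entered when Update Inventory returns ERROR: pause for 30 seconds,
// then start compensation at Revert Inventory.
var waitState = new Wait(this, "Wait state", new WaitProps
{
    Time = WaitTime.Duration(Duration.Seconds(30))
}).Next(revertInventoryTask);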
```

#### Step function and state machine definitions
<a name="saga-step"></a>

```
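// Success path with a Choice state after each task. Each Choice branches on the
// Status field of the task output. No default branch is defined, so an unexpected
// Status value fails the execution with States.NoChoiceMatched.
var stepDefinition = placeOrderTask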
                .Next(new Choice(this, "Is order placed")
                    .When(Condition.StringEquals("$.Status", "ORDER_PLACED"), updateInventoryTask
                        .Next(new Choice(this, "Is inventory updated")
                            .When(Condition.StringEquals("$.Status", "INVENTORY_UPDATED"),
                                makePaymentTask.Next(new Choice(this, "Is payment success")
                                    .When(Condition.StringEquals("$.Status", "PAYMENT_COMPLETED"), successState)
                                    .When(Condition.StringEquals("$.Status", "ERROR"), revertPaymentTask)))
                            .When(Condition.StringEquals("$.Status", "ERROR"), waitState)))
                    .When(Condition.StringEquals("$.Status", "ERROR"), failState));

var stateMachine = new StateMachine(this, "DistributedTransactionOrchestrator", new StateMachineProps {
    StateMachineName = "DistributedTransactionOrchestrator",
    StateMachineType = StateMachineType.STANDARD,
    Role = iamStepFunctionRole,
    TracingEnabled = true,
    Definition = stepDefinition
});
```
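
To check your understanding of the control flow, you can mirror the definition above as a plain transition table. The following Python sketch is a reading aid and not part of the sample repository. The state names and `Status` values are copied from the CDK code, the `Choice` states are folded into the table, and the `trace` function is hypothetical.

```python
# (task state, Status returned by its Lambda function) -> next state
TRANSITIONS = {
    ("Place Order", "ORDER_PLACED"): "Update Inventory",
    ("Place Order", "ERROR"): "Fail",
    ("Update Inventory", "INVENTORY_UPDATED"): "Make Payment",
    ("Update Inventory", "ERROR"): "Wait state",
    ("Make Payment", "PAYMENT_COMPLETED"): "SuccessState",
    ("Make Payment", "ERROR"): "Revert Payment",
}

# Unconditional .Next(...) links between the compensating states.
NEXT = {
    "Wait state": "Revert Inventory",
    "Revert Payment": "Revert Inventory",
    "Revert Inventory": "Remove Order",
    "Remove Order": "Fail",
}

TERMINAL = {"SuccessState", "Fail"}


def trace(statuses):
    """Return the states visited, given the Status that each task state returns."""
    state, path = "Place Order", ["Place Order"]
    while state not in TERMINAL:
        if state in NEXT:
            state = NEXT[state]
        else:
            state = TRANSITIONS[(state, statuses[state])]
        path.append(state)
    return path


# An inventory failure triggers the wait, then the compensating steps.
print(trace({"Place Order": "ORDER_PLACED", "Update Inventory": "ERROR"}))
```

Tracing the inventory failure visits `Wait state`, `Revert Inventory`, and `Remove Order` before `Fail`, which matches the compensation sequence described for the workflow.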

### GitHub repository
<a name="saga-orchestration-implementation-github-repo"></a>

For a complete implementation of the sample architecture for this pattern, see the GitHub repository at [https://github.com/aws-samples/saga-orchestration-netcore-blog](https://github.com/aws-samples/saga-orchestration-netcore-blog).

## Blog references
<a name="saga-orchestration-blog"></a>
+ [Building a serverless distributed application using Saga Orchestration pattern](https://aws.amazon.com/blogs/compute/building-a-serverless-distributed-application-using-a-saga-orchestration-pattern/)

## Related content
<a name="saga-orchestration-resources"></a>
+ [Saga choreography pattern](saga-choreography.md)
+ [Transactional outbox pattern](transactional-outbox.md)
