# Invoking durable Lambda functions
<a name="durable-invoking"></a>

Durable Lambda functions support the same invocation methods as standard Lambda functions. You can invoke durable functions synchronously, asynchronously, or through event source mappings. The invocation process is identical to standard functions, but durable functions provide additional capabilities for long-running executions and automatic state management.

## Invocation methods
<a name="durable-invoking-methods"></a>

**Synchronous invocation:** Invoke a durable function and wait for the response. Synchronous invocations are limited to 15 minutes (or less, depending on the configured function and execution timeouts). Use synchronous invocation when you need immediate results or when integrating with APIs and services that expect a response. You can use wait operations, which don't consume compute time while suspended, without disrupting the caller: the invocation still waits for the entire durable execution to complete. For idempotent execution starts, use the execution name parameter as described in [Idempotency](durable-execution-idempotency.md).

```
aws lambda invoke \
  --function-name my-durable-function:1 \
  --cli-binary-format raw-in-base64-out \
  --payload '{"orderId": "12345"}' \
  response.json
```

**Asynchronous invocation:** Queue an event for processing without waiting for a response. Lambda places the event in a queue and returns immediately. Asynchronous invocations support execution durations up to 1 year. Use asynchronous invocation for fire-and-forget scenarios or when processing can happen in the background. For idempotent execution starts, use the execution name parameter as described in [Idempotency](durable-execution-idempotency.md).

```
aws lambda invoke \
  --function-name my-durable-function:1 \
  --invocation-type Event \
  --cli-binary-format raw-in-base64-out \
  --payload '{"orderId": "12345"}' \
  response.json
```

**Event source mappings:** Configure Lambda to automatically invoke your durable function when records are available from stream or queue-based services like Amazon SQS, Kinesis, or DynamoDB. Event source mappings poll the event source and invoke your function with batches of records. For details about using event source mappings with durable functions, including execution duration limits, see [Event source mappings with durable functions](durable-invoking-esm.md).

For complete details about each invocation method, see [synchronous invocation](invocation-sync.md) and [asynchronous invocation](invocation-async.md).

**Note**  
Durable functions support dead-letter queues (DLQs) for error handling, but don't support Lambda destinations. Configure a DLQ to capture records from failed invocations.

## Qualified ARNs requirement
<a name="durable-invoking-qualified-arns"></a>

Durable functions require qualified identifiers for invocation. You must invoke durable functions using a version number, alias, or `$LATEST`. You can use either a full qualified ARN or a function name with version/alias suffix. You cannot use an unqualified identifier (without a version or alias suffix).

**Valid invocations:**

```
# Using full ARN with version number
arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1

# Using full ARN with alias
arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:prod

# Using full ARN with $LATEST
arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:$LATEST

# Using function name with version number
my-durable-function:1

# Using function name with alias
my-durable-function:prod
```

**Invalid invocations:**

```
# Unqualified ARN (not allowed)
arn:aws:lambda:us-east-1:123456789012:function:my-durable-function

# Unqualified function name (not allowed)
my-durable-function
```
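To catch unqualified identifiers before calling Lambda, you could validate them in your deployment or client code. The following is a minimal sketch, not an AWS-provided utility. The `is_qualified` helper is hypothetical, and it only recognizes the two forms shown above: a full function ARN and a function name with a suffix. Partial ARNs are not handled.

```python
def is_qualified(identifier: str) -> bool:
    """Return True if a Lambda function identifier carries a version or alias qualifier.

    Hypothetical helper; only handles these two forms:
      - full ARN: arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
      - function name: <name>[:<qualifier>]
    """
    parts = identifier.split(":")
    if parts[0] == "arn":
        # A full function ARN has 7 colon-separated fields; an 8th field is the qualifier
        return len(parts) == 8 and parts[7] != ""
    # A function name cannot contain ':', so a second field is the qualifier
    return len(parts) == 2 and parts[0] != "" and parts[1] != ""


for fn in ["my-durable-function:prod", "my-durable-function"]:
    print(fn, is_qualified(fn))
```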

This requirement ensures that durable executions remain consistent throughout their lifecycle. When a durable execution starts, it's pinned to the specific function version. If your function pauses and resumes hours or days later, Lambda invokes the same version that started the execution, ensuring code consistency across the entire workflow.

**Best practice**  
Use numbered versions or aliases for production durable functions rather than `$LATEST`. Numbered versions are immutable and support deterministic replay. Aliases add a stable reference that you can update to point to new versions without changing invocation code. When you update an alias, new executions use the new version, while in-progress executions continue with their original version. You can use `$LATEST` for prototyping or to shorten deployment times during development, with the understanding that executions might not replay correctly (or might fail) if the underlying code changes while they are running.

## Understanding execution lifecycle
<a name="durable-invoking-execution-lifecycle"></a>

When you invoke a durable function, Lambda creates a durable execution that can span multiple function invocations:

1. **Initial invocation:** Your invocation request creates a new durable execution. Lambda assigns a unique execution ID and starts processing.

1. **Execution and checkpointing:** As your function executes durable operations, the SDK creates checkpoints that track progress.

1. **Suspension (if needed):** If your function uses durable waits, such as `wait` or `waitForCallback`, or automatic step retries, Lambda suspends the execution and stops charging for compute time.

1. **Resumption:** When it's time to resume (including after retries), Lambda invokes your function again. The SDK replays the checkpoint log and continues from where execution paused.

1. **Completion:** When your function returns a final result or throws an unhandled error, the durable execution completes.
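The checkpoint-and-replay behavior in these steps can be pictured with a toy model. The following sketch is hypothetical and greatly simplified: it is not the durable execution SDK, and keying checkpoints by step name is an assumption of the toy. A step whose result is already in the checkpoint log returns that result instead of running again, so a resumed invocation only executes the remaining work.

```python
from typing import Any, Callable, Dict, List


class ToyReplayContext:
    """Hypothetical, simplified model of checkpoint-and-replay; not the durable execution SDK."""

    def __init__(self, checkpoint_log: Dict[str, Any]):
        self.log = checkpoint_log        # results persisted by earlier invocations
        self.executed: List[str] = []    # steps that actually ran during this invocation

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.log:
            # Replay: return the checkpointed result without re-running the step
            return self.log[name]
        result = fn()
        self.executed.append(name)
        self.log[name] = result          # checkpoint the new result
        return result


def workflow(ctx: ToyReplayContext) -> dict:
    order = ctx.step("validate", lambda: {"orderId": "12345", "valid": True})
    payment = ctx.step("charge", lambda: {"charged": order["valid"]})
    return ctx.step("ship", lambda: {"shipped": payment["charged"]})


# Pretend an earlier invocation completed "validate" and then suspended
log = {"validate": {"orderId": "12345", "valid": True}}
resumed = ToyReplayContext(log)
print(workflow(resumed))    # {'shipped': True}
print(resumed.executed)     # ['charge', 'ship']
```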

For synchronous invocations, the caller waits for the entire durable execution to complete, including any wait operations. If the execution exceeds the invocation timeout (15 minutes or less), the invocation times out. For asynchronous invocations, Lambda returns immediately and the execution continues independently. Use the durable execution APIs to track execution status and retrieve final results.

## Invoking from application code
<a name="durable-invoking-with-sdk"></a>

Use the AWS SDKs to invoke durable functions from your application code. The invocation process is identical to standard functions:

------
#### [ TypeScript ]

```
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const client = new LambdaClient({});

// Synchronous invocation
const response = await client.send(new InvokeCommand({
  FunctionName: 'arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
  Payload: JSON.stringify({ orderId: '12345' })
}));

const result = JSON.parse(Buffer.from(response.Payload!).toString());

// Asynchronous invocation
await client.send(new InvokeCommand({
  FunctionName: 'arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
  InvocationType: 'Event',
  Payload: JSON.stringify({ orderId: '12345' })
}));
```

------
#### [ Python ]

```
import boto3
import json

client = boto3.client('lambda')

# Synchronous invocation
response = client.invoke(
    FunctionName='arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
    Payload=json.dumps({'orderId': '12345'})
)

result = json.loads(response['Payload'].read())

# Asynchronous invocation
client.invoke(
    FunctionName='arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
    InvocationType='Event',
    Payload=json.dumps({'orderId': '12345'})
)
```

------

## Chained invocations
<a name="durable-invoking-chained"></a>

Durable functions can invoke other durable and non-durable functions using the `invoke` operation from `DurableContext`. This creates a chained invocation where the calling function waits (suspends) for the invoked function to complete:

------
#### [ TypeScript ]

```
import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
    // Invoke another durable function and wait for result
    const result = await context.invoke(
      'process-order',
      'arn:aws:lambda:us-east-1:123456789012:function:order-processor:1',
      { orderId: event.orderId }
    );
    
    return { statusCode: 200, body: JSON.stringify(result) };
  }
);
```

------
#### [ Python ]

```
from aws_durable_execution_sdk_python import durable_execution, DurableContext
import json

@durable_execution
def handler(event, context: DurableContext):
    # Invoke another durable function and wait for result
    result = context.invoke(
        'arn:aws:lambda:us-east-1:123456789012:function:order-processor:1',
        {'orderId': event['orderId']},
        name='process-order'
    )
    
    return {'statusCode': 200, 'body': json.dumps(result)}
```

------

Chained invocations create a checkpoint in the calling function. If the calling function is interrupted, it resumes from the checkpoint with the invoked function's result, without re-invoking the function.

**Note**  
Cross-account chained invocations are not supported. The invoked function must be in the same AWS account as the calling function.

# Event source mappings with durable functions
<a name="durable-invoking-esm"></a>

Durable functions work with all Lambda event source mappings. Configure event source mappings for durable functions the same way you configure them for standard functions. Event source mappings automatically poll event sources like Amazon SQS, Kinesis, and DynamoDB Streams, and invoke your function with batches of records.

Event source mappings are useful for durable functions that process streams or queues with complex, multi-step workflows. For example, you can create a durable function that processes Amazon SQS messages with retries, external API calls, and human approvals.

## How event source mappings invoke durable functions
<a name="durable-esm-invocation-behavior"></a>

Event source mappings invoke durable functions synchronously, waiting for the complete durable execution to finish before processing the next batch or marking records as processed. If the total durable execution time exceeds 15 minutes, the execution times out and fails. The event source mapping receives a timeout exception and handles it according to its retry configuration.

## 15-minute execution limit
<a name="durable-esm-duration-limit"></a>

When durable functions are invoked by event source mappings, the total durable execution duration cannot exceed 15 minutes. This limit applies to the entire durable execution from start to completion, not just individual function invocations.

This 15-minute limit is separate from the Lambda function timeout (also 15 minutes maximum). The function timeout controls how long each individual invocation can run, while the durable execution timeout controls the total elapsed time from execution start to completion.

**Example scenarios:**
+ **Valid:** A durable function processes an Amazon SQS message with two steps that take 2 minutes each, waits 5 minutes, and then completes a final 2-minute step. Total execution time: 11 minutes. This works because the total is under 15 minutes.
+ **Invalid:** A durable function processes an Amazon SQS message, completes initial processing in 2 minutes, then waits 20 minutes for an external callback before completing. Total execution time: 22 minutes. This exceeds the 15-minute limit and will fail.
+ **Invalid:** A durable function processes a Kinesis record with multiple wait operations totaling 30 minutes between steps. Even though each individual invocation completes quickly, the total execution time exceeds 15 minutes.
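Wait time counts toward the limit even though your function isn't running during a wait. A quick way to sanity-check a planned workflow is to add up its step and wait durations. This is a minimal sketch: `fits_esm_limit` is a hypothetical helper, and it assumes you can estimate each segment's duration in minutes.

```python
from typing import List, Tuple

# Total durable execution limit when an event source mapping invokes the function
ESM_DURABLE_LIMIT_MINUTES = 15


def fits_esm_limit(segment_minutes: List[float]) -> Tuple[float, bool]:
    """Sum step and wait durations (minutes) and check them against the 15-minute limit."""
    total = sum(segment_minutes)
    return total, total <= ESM_DURABLE_LIMIT_MINUTES


print(fits_esm_limit([2, 2, 5, 2]))  # three 2-minute steps and one 5-minute wait
print(fits_esm_limit([2, 20]))       # 2 minutes of processing, then a 20-minute callback wait
```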

**Important**  
Configure your durable execution timeout to 15 minutes or less when using event source mappings; otherwise, creating the event source mapping fails. If your workflow requires longer execution times, use the intermediary function pattern described below.

## Configuring event source mappings
<a name="durable-esm-configuration"></a>

Configure event source mappings for durable functions using the Lambda console, AWS CLI, or AWS SDKs. All standard event source mapping properties apply to durable functions:

```
aws lambda create-event-source-mapping \
  --function-name arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1 \
  --event-source-arn arn:aws:sqs:us-east-1:123456789012:my-queue \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 5
```
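If you manage infrastructure from Python instead of the CLI, the same mapping can be created with the boto3 Lambda client's `create_event_source_mapping` call. This is a sketch under assumptions: `create_durable_esm` is a hypothetical wrapper, the client is passed in so you can supply `boto3.client("lambda")` (or a stub in tests), and the qualifier check only counts the colons of a full function ARN.

```python
from typing import Any, Dict


def create_durable_esm(lambda_client: Any, function_arn: str, queue_arn: str) -> Dict[str, Any]:
    """Create an Amazon SQS event source mapping for a durable function.

    Hypothetical wrapper. Usage:
        import boto3
        create_durable_esm(boto3.client("lambda"),
                           "arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1",
                           "arn:aws:sqs:us-east-1:123456789012:my-queue")
    """
    # A qualified full function ARN has 7 colons (the last one precedes the version or alias)
    if function_arn.count(":") != 7:
        raise ValueError(f"expected a qualified function ARN, got {function_arn!r}")
    return lambda_client.create_event_source_mapping(
        FunctionName=function_arn,
        EventSourceArn=queue_arn,
        BatchSize=10,
        MaximumBatchingWindowInSeconds=5,
    )
```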

Remember to use a qualified ARN (with version number or alias) when configuring event source mappings for durable functions.

## Error handling with event source mappings
<a name="durable-esm-error-handling"></a>

Event source mappings provide built-in error handling that works with durable functions:
+ **Retry behavior:** If the initial invocation fails, the event source mapping retries according to its retry configuration. Configure maximum retry attempts and retry intervals based on your requirements.
+ **Dead-letter queues:** Configure a dead-letter queue to capture records that fail after all retries. This prevents message loss and enables manual inspection of failed records.
+ **Partial batch failures:** For Amazon SQS, Kinesis, and DynamoDB Streams, use partial batch failure reporting to process records individually and retry only the failed records.
+ **Bisect on error:** For Kinesis and DynamoDB Streams, enable bisect on error to split failed batches and isolate problematic records.
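For Amazon SQS, partial batch failure reporting works by returning the message IDs of failed records in a `batchItemFailures` list, with `ReportBatchItemFailures` enabled in the mapping's function response types. The sketch below shows only that response shape. `build_sqs_batch_response` and `process_record` are hypothetical names, and the sketch assumes that the durable execution's final result is the response the event source mapping evaluates. In a real durable function you would typically run each record's work inside steps. For Kinesis and DynamoDB Streams, the item identifier is the record's sequence number rather than a message ID.

```python
from typing import Any, Callable, Dict, List


def build_sqs_batch_response(
    records: List[Dict[str, Any]],
    process_record: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Process each SQS record and report only the failed ones (hypothetical helper)."""
    failures = []
    for record in records:
        try:
            process_record(record)
        except Exception:
            # Report the message ID so the event source mapping retries only this record
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```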

**Note**  
Durable functions support dead-letter queues (DLQs) for error handling, but don't support Lambda destinations. Configure a DLQ to capture records from failed invocations.

For complete information about event source mapping error handling, see [event source mappings](invocation-eventsourcemapping.md).

## Using an intermediary function for long-running workflows
<a name="durable-esm-intermediary-function"></a>

If your workflow requires more than 15 minutes to complete, use an intermediary standard Lambda function between the event source mapping and your durable function. The intermediary function receives events from the event source mapping and invokes the durable function asynchronously, removing the 15-minute execution limit.

This pattern decouples the event source mapping's synchronous invocation model from the durable function's long-running execution model. The event source mapping invokes the intermediary function, which quickly returns after starting the durable execution. The durable function then runs independently for as long as needed (up to 1 year).

### Architecture
<a name="durable-esm-intermediary-architecture"></a>

The intermediary function pattern uses three components:

1. **Event source mapping:** Polls the event source (Amazon SQS, Kinesis, DynamoDB Streams) and invokes the intermediary function synchronously with batches of records.

1. **Intermediary function:** A standard Lambda function that receives events from the event source mapping, validates and transforms the data if needed, and invokes the durable function asynchronously. This function completes quickly (typically under 1 second) and returns control to the event source mapping.

1. **Durable function:** Processes the event with complex, multi-step logic that can run for extended periods. Invoked asynchronously, so it's not constrained by the 15-minute limit.

### Implementation
<a name="durable-esm-intermediary-implementation"></a>

The intermediary function receives the entire event from the event source mapping and invokes the durable function asynchronously. Use the execution name parameter to ensure idempotent execution starts, preventing duplicate processing if the event source mapping retries:

------
#### [ TypeScript ]

```
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
import { SQSEvent } from 'aws-lambda';
import { createHash } from 'crypto';

const lambda = new LambdaClient({});

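export const handler = async (event: SQSEvent) => {
  // Derive a deterministic execution name from the batch's message IDs,
  // so a retried batch maps to the same name
  const executionName = createHash('sha256')
    .update(event.Records.map((record) => record.messageId).join(','))
    .digest('hex');

  // Invoke durable function asynchronously with execution name
  await lambda.send(new InvokeCommand({
    FunctionName: 'arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
    InvocationType: 'Event',
    Payload: JSON.stringify({
      executionName,
      event
    })
  }));
  
  return { statusCode: 200 };
};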
```

------
#### [ Python ]

```
import boto3
import json
import hashlib

lambda_client = boto3.client('lambda')

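def handler(event, context):
    # Derive a deterministic execution name from the batch's message IDs,
    # so a retried batch maps to the same name
    message_ids = ','.join(record['messageId'] for record in event['Records'])
    execution_name = hashlib.sha256(message_ids.encode('utf-8')).hexdigest()

    # Invoke durable function asynchronously with execution name
    lambda_client.invoke(
        FunctionName='arn:aws:lambda:us-east-1:123456789012:function:my-durable-function:1',
        InvocationType='Event',
        Payload=json.dumps({
            'executionName': execution_name,
            'event': event
        })
    )
    
    return {'statusCode': 200}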
```

------

For idempotency in the intermediary function itself, use [Powertools for AWS Lambda](https://docs.aws.amazon.com//powertools/) to prevent duplicate invocations of the durable function if the event source mapping retries the intermediary function.

The durable function receives the payload with the execution name and processes all records with long-running logic:

------
#### [ TypeScript ]

```
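import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';

// Placeholder helpers mirroring the Python example's stubs; replace with your own logic
const validateOrder = (order: any) => order;
const requestApproval = async (callbackId: string, order: any): Promise<void> => {};
const completeOrder = (order: any, approval: any) => order;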

export const handler = withDurableExecution(
  async (payload: any, context: DurableContext) => {
    const sqsEvent = payload.event;
    
    // Process each record with complex, multi-step logic
    const results = await context.map(
      sqsEvent.Records,
      async (ctx, record) => {
        const validated = await ctx.step('validate', async () => {
          return validateOrder(JSON.parse(record.body));
        });
        
        // Wait for external approval (could take hours or days)
        const approval = await ctx.waitForCallback(
          'approval',
          async (callbackId) => {
            await requestApproval(callbackId, validated);
          },
          { timeout: { hours: 48 } }
        );
        
        // Complete processing
        return await ctx.step('complete', async () => {
          return completeOrder(validated, approval);
        });
      }
    );
    
    return { statusCode: 200, processed: results.getResults().length };
  }
);
```

------
#### [ Python ]

```
from aws_durable_execution_sdk_python import durable_execution, DurableContext
from aws_durable_execution_sdk_python.config import Duration, WaitForCallbackConfig
from collections.abc import Sequence
import json

def validate_order(order_data: dict) -> dict:
    """Validate order data - always passes."""
    return order_data

def request_approval(callback_id: str, validated_order: dict) -> None:
    """Request approval for the order - always passes."""
    pass

def complete_order(validated_order: dict, approval_result: str) -> dict:
    """Complete the order processing - always passes."""
    return validated_order

@durable_execution
def lambda_handler(payload, context: DurableContext):
    sqs_event = payload['event']

    def process_record(
        ctx: DurableContext, 
        record: dict, 
        index: int, 
        items: Sequence[dict]
    ) -> dict:
        validated = ctx.step(
            lambda _: validate_order(json.loads(record['body'])),
            name=f'validate-{index}'
        )

        approval = ctx.wait_for_callback(
            submitter=lambda callback_id, wait_ctx: request_approval(callback_id, validated),
            name=f'approval-{index}',
            config=WaitForCallbackConfig(timeout=Duration.from_seconds(172800))
        )

        return ctx.step(
            lambda _: complete_order(validated, approval),
            name=f'complete-{index}'
        )

    results = context.map(
        inputs=sqs_event['Records'],
        func=process_record,
        name='process-records'
    )

    return {
        'statusCode': 200, 
        'started': results.started_count,
        'completed': results.success_count,
        'failed': results.failure_count,
        'total': results.total_count
    }
```

------

### Key considerations
<a name="durable-esm-intermediary-tradeoffs"></a>

This pattern removes the 15-minute execution limit by decoupling the event source mapping from the durable execution. The intermediary function returns immediately after starting the durable execution, allowing the event source mapping to continue processing. The durable function then runs independently for as long as needed.

The intermediary function succeeds when it invokes the durable function, not when the durable execution completes. If the durable execution fails later, the event source mapping won't retry because it already processed the batch successfully. Implement error handling in the durable function and configure dead-letter queues for failed executions.

Use the execution name parameter to ensure idempotent execution starts. If the event source mapping retries the intermediary function, the durable function won't start a duplicate execution because the execution name already exists.

## Supported event sources
<a name="durable-esm-supported-sources"></a>

Durable functions support all Lambda event sources that use event source mappings:
+ Amazon SQS queues (standard and FIFO)
+ Kinesis streams
+ DynamoDB Streams
+ Amazon Managed Streaming for Apache Kafka (Amazon MSK)
+ Self-managed Apache Kafka
+ Amazon MQ (ActiveMQ and RabbitMQ)
+ Amazon DocumentDB change streams

All event source types are subject to the 15-minute durable execution limit when invoking durable functions.

# Retries for Lambda durable functions
<a name="durable-execution-sdk-retries"></a>

Durable functions provide automatic retry capabilities that make your applications resilient to transient failures. The SDK handles retries at two levels: step retries for business logic failures and backend retries for infrastructure failures.

## Step retries
<a name="durable-step-retries"></a>

When an uncaught exception occurs within a step, the SDK automatically retries the step based on the configured retry strategy. Step retries are checkpointed operations that allow the SDK to suspend execution and resume later without losing progress.

### Step retry behavior
<a name="durable-step-retry-behavior"></a>

The following table describes how the SDK handles exceptions within steps:


| Scenario | What happens | Metering impact | 
| --- | --- | --- | 
| Exception in step with remaining retry attempts | The SDK creates a checkpoint for the retry and suspends the function. On the next invocation, the step retries with the configured backoff delay. | 1 operation + error payload size | 
| Exception in step with no remaining retry attempts | The step fails and throws an exception. If your handler code doesn't catch this exception, the entire execution fails. | 1 operation + error payload size | 

When a step needs to retry, the SDK checkpoints the retry state and exits the Lambda invocation if no other work is running. This allows the SDK to implement backoff delays without consuming compute resources. The function resumes automatically after the backoff period.

### Configuring step retry strategies
<a name="durable-step-retry-configuration"></a>

Configure retry strategies to control how steps handle failures. You can specify maximum attempts, backoff intervals, and conditions for retrying.

**Exponential backoff with max attempts:**

------
#### [ TypeScript ]

```
const result = await context.step('call-api', async () => {
  const response = await fetch('https://api.example.com/data');
  if (!response.ok) throw new Error(`API error: ${response.status}`);
  return await response.json();
}, {
  retryStrategy: (error, attemptCount) => {
    if (attemptCount >= 5) {
      return { shouldRetry: false };
    }
    // Exponential backoff: 2s, 4s, 8s, 16s (capped at 300s)
    const delay = Math.min(2 * Math.pow(2, attemptCount - 1), 300);
    return { shouldRetry: true, delay: { seconds: delay } };
  }
});
```

------
#### [ Python ]

```
def retry_strategy(error, attempt_count):
    if attempt_count >= 5:
        return RetryDecision(should_retry=False)
    # Exponential backoff: 2s, 4s, 8s, 16s (capped at 300s)
    delay = min(2 * (2 ** (attempt_count - 1)), 300)
    return RetryDecision(should_retry=True, delay=delay)

result = context.step(
    lambda _: call_external_api(),
    name='call-api',
    config=StepConfig(retry_strategy=retry_strategy)
)
```

------
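To see which delays a strategy actually produces, you can call it locally with successive attempt counts. The following sketch is hypothetical: `Decision` stands in for the SDK's `RetryDecision`, `delay_schedule` is an illustrative helper, and it assumes the SDK passes `attempt_count = 1` after the first failed attempt. Under that assumption, the exponential strategy above retries four times.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    """Stand-in for the SDK's retry decision type (hypothetical, for local experimentation)."""
    should_retry: bool
    delay: Optional[int] = None  # seconds


def exponential_strategy(error: Exception, attempt_count: int) -> Decision:
    # Same backoff logic as the exponential example above
    if attempt_count >= 5:
        return Decision(should_retry=False)
    return Decision(should_retry=True, delay=min(2 * (2 ** (attempt_count - 1)), 300))


def delay_schedule(strategy: Callable[[Exception, int], Decision], max_probe: int = 20) -> List[int]:
    """List the delays a strategy returns for attempt_count = 1, 2, ... until it stops retrying."""
    delays = []
    for attempt in range(1, max_probe + 1):
        decision = strategy(RuntimeError("simulated failure"), attempt)
        if not decision.should_retry:
            break
        delays.append(decision.delay)
    return delays


print(delay_schedule(exponential_strategy))  # [2, 4, 8, 16]
```

If the printed schedule doesn't match what you intended, adjust the attempt cutoff or the base delay before deploying.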

**Fixed interval backoff:**

------
#### [ TypeScript ]

```
const orders = await context.step('query-orders', async () => {
  return await queryDatabase(event.userId);
}, {
  retryStrategy: (error, attemptCount) => {
    if (attemptCount >= 3) {
      return { shouldRetry: false };
    }
    return { shouldRetry: true, delay: { seconds: 5 } };
  }
});
```

------
#### [ Python ]

```
def retry_strategy(error, attempt_count):
    if attempt_count >= 3:
        return RetryDecision(should_retry=False)
    return RetryDecision(should_retry=True, delay=5)

orders = context.step(
    lambda _: query_database(event['userId']),
    name='query-orders',
    config=StepConfig(retry_strategy=retry_strategy)
)
```

------

**Conditional retry (retry only specific errors):**

------
#### [ TypeScript ]

```
const result = await context.step('call-rate-limited-api', async () => {
  const response = await fetch('https://api.example.com/data');
  
  if (response.status === 429) throw new Error('RATE_LIMIT');
  if (response.status === 504) throw new Error('TIMEOUT');
  if (!response.ok) throw new Error(`API_ERROR_${response.status}`);
  
  return await response.json();
}, {
  retryStrategy: (error, attemptCount) => {
    // Only retry rate limits and timeouts
    const isRetryable = error.message === 'RATE_LIMIT' || error.message === 'TIMEOUT';
    
    if (!isRetryable || attemptCount >= 3) {
      return { shouldRetry: false };
    }
    
    // Exponential backoff: 1s, 2s (capped at 30s)
    const delay = Math.min(Math.pow(2, attemptCount - 1), 30);
    return { shouldRetry: true, delay: { seconds: delay } };
  }
});
```

------
#### [ Python ]

```
def retry_strategy(error, attempt_count):
    # Only retry rate limits and timeouts
    is_retryable = str(error) in ['RATE_LIMIT', 'TIMEOUT']
    
    if not is_retryable or attempt_count >= 3:
        return RetryDecision(should_retry=False)
    
    # Exponential backoff: 1s, 2s (capped at 30s)
    delay = min(2 ** (attempt_count - 1), 30)
    return RetryDecision(should_retry=True, delay=delay)

result = context.step(
    lambda _: call_rate_limited_api(),
    name='call-rate-limited-api',
    config=StepConfig(retry_strategy=retry_strategy)
)
```

------

**Disable retries:**

------
#### [ TypeScript ]

```
const isDuplicate = await context.step('check-duplicate', async () => {
  return await checkIfOrderExists(event.orderId);
}, {
  retryStrategy: () => ({ shouldRetry: false })
});
```

------
#### [ Python ]

```
is_duplicate = context.step(
    lambda _: check_if_order_exists(event['orderId']),
    name='check-duplicate',
    config=StepConfig(
        retry_strategy=lambda error, attempt: RetryDecision(should_retry=False)
    )
)
```

------

When the retry strategy returns `shouldRetry: false` (`should_retry=False` in Python), the step fails immediately without retries. Use this for operations that should not be retried, such as idempotency checks or operations with side effects that cannot be safely repeated.

## Exceptions outside steps
<a name="durable-handler-exceptions"></a>

When an uncaught exception occurs in your handler code but outside any step, the SDK marks the execution as failed. This ensures errors in your application logic are properly captured and reported.


| Scenario | What happens | Metering impact | 
| --- | --- | --- | 
| Exception in handler code outside any step | The SDK marks the execution as FAILED and returns the error. The exception is not automatically retried. | Error payload size | 

To enable automatic retry for error-prone code, wrap it in a step with a retry strategy. Steps provide automatic retry with configurable backoff, while code outside steps fails immediately.

## Invocation retries
<a name="durable-invocation-retries"></a>

Invocation-level retries depend on how the durable function is invoked. The following table describes how each invocation type affects invocation-level retries.


| Invocation type | What happens | 
| --- | --- | 
| Synchronous invocation |  Lambda does not automatically retry the invocation when an error occurs during durable function execution. Retries on invocation failures depend on the source of the synchronous invocation. For example, the AWS SDKs retry InternalFailure and ThrottlingException errors automatically by default.  | 
| Asynchronous invocation |  If a durable function execution fails (for example, it enters a FAILED, STOPPED, or TIMED_OUT status), Lambda does not retry the execution. This differs from standard Lambda functions, where Lambda retries the function on asynchronous invocation failures. The MaximumRetryAttempts setting for asynchronous invocations does not apply to durable executions. If you configure a dead-letter queue (DLQ) for the function, Lambda sends the triggering event to the DLQ.  | 
| ESM (Event Source Mapping) |  By default, Lambda retries the entire batch until it succeeds. For stream sources (DynamoDB and Kinesis), you can configure the maximum number of times that Lambda retries when your function returns an error. See [event source mappings batching](invocation-eventsourcemapping.md#invocation-eventsourcemapping-batching). For Amazon SQS event source mappings, you can limit retries by configuring a DLQ on the source Amazon SQS queue. See [configure Amazon SQS ESM](services-sqs-configure.md). Alternatively, you can configure a DLQ at the function level, and Lambda sends the failing triggering event to that DLQ. See [function DLQ](invocation-async-retain-records.md#invocation-dlq). To receive a record of events that failed all processing attempts, or events for successful processing attempts, you can configure destinations for the event source mapping. See [invocation async destinations](invocation-async-retain-records.md#invocation-async-destinations).  | 
| Direct Trigger |  This depends on the trigger. For example, Lambda invokes functions asynchronously for Amazon S3 event notifications. See [Process Amazon S3 event notifications with Lambda](with-s3.md). Lambda also invokes functions asynchronously for Amazon SNS notifications. See [Invoking Lambda functions with Amazon SNS notifications](with-sns.md). For these triggers, the retry behavior in the "Asynchronous invocation" row applies. If Amazon SNS can't reach Lambda or the message is rejected, Amazon SNS retries at increasing intervals over several hours. For details, see [Reliability](https://aws.amazon.com/sns/faqs/#Reliability) in the Amazon SNS FAQs. API Gateway invokes Lambda synchronously and returns the function's error response to the caller. See [invocation retries](invocation-retries.md). For this trigger, the retry behavior in the "Synchronous invocation" row applies. See [each direct trigger](invocation-eventsourcemapping.md#eventsourcemapping-trigger-difference) for more details.  | 

## Backend retries
<a name="durable-backend-retries"></a>

Backend retries occur when Lambda encounters infrastructure failures or runtime errors, or when the SDK cannot communicate with the durable execution service. Lambda automatically retries these failures so that your durable functions can recover from transient infrastructure issues.

### Backend retry scenarios
<a name="durable-backend-retry-scenarios"></a>

Lambda automatically retries your function when it encounters the following scenarios:
+ **Internal service errors** - When Lambda or the durable execution service returns a 5xx error, indicating a temporary service issue.
+ **Throttling** - When your function is throttled due to concurrency limits or service quotas.
+ **Timeouts** - When the SDK cannot reach the durable execution service within the timeout period.
+ **Sandbox initialization failures** - When Lambda cannot initialize the execution environment.
+ **Runtime errors** - When the Lambda runtime encounters errors outside your function code, such as out-of-memory errors or process crashes.
+ **Invalid checkpoint token errors** - When the checkpoint token is no longer valid, typically due to service-side state changes.

The following table describes how Lambda and the SDK handle these scenarios:


| Scenario | What happens | Metering impact | 
| --- | --- | --- | 
| Runtime error outside durable handler (OOM, timeout, crash) | Lambda automatically retries the invocation. The SDK replays from the last checkpoint, skipping completed steps. | Error payload size + 1 operation per retry | 
| Service error (5xx) or timeout when calling CheckpointDurableExecution / GetDurableExecutionState APIs | Lambda automatically retries the invocation. The SDK replays from the last checkpoint. | Error payload size + 1 operation per retry | 
| Throttling (429) or invalid checkpoint token when calling CheckpointDurableExecution / GetDurableExecutionState APIs | Lambda automatically retries the invocation with exponential backoff. The SDK replays from the last checkpoint. | Error payload size + 1 operation per retry | 
| Client error (4xx, except 429 and invalid token) when calling CheckpointDurableExecution / GetDurableExecutionState APIs | The SDK marks the execution as FAILED. No automatic retry occurs because the error indicates a permanent issue. | Error payload size | 

Backend retries use exponential backoff and continue until the function succeeds or the execution timeout is reached. During replay, the SDK returns checkpointed results for completed steps instead of re-executing them and continues from the last successful operation, so your function doesn't repeat completed work.
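The retry decisions in the preceding table can be summarized as a small classification function. This is a minimal, illustrative sketch: `backend_outcome` is a hypothetical helper, not part of the Durable Execution SDK, and it assumes that an invalid checkpoint token is reported as a flag alongside the HTTP status code and that a timeout has no status code at all.

```python
from typing import Optional


def backend_outcome(status_code: Optional[int], *, invalid_token: bool = False) -> str:
    """Hypothetical model of how Lambda and the SDK react to an error from
    CheckpointDurableExecution / GetDurableExecutionState.

    status_code=None models a timeout (the SDK got no response in time).
    Returns "RETRY" (Lambda retries the invocation and the SDK replays from
    the last checkpoint) or "FAILED" (the SDK marks the execution FAILED).
    """
    if status_code is None:
        return "RETRY"   # timeout calling the API
    if status_code == 429 or invalid_token:
        return "RETRY"   # throttling or invalid checkpoint token, retried with exponential backoff
    if 500 <= status_code <= 599:
        return "RETRY"   # service error
    if 400 <= status_code <= 499:
        return "FAILED"  # other client errors indicate a permanent issue
    raise ValueError(f"unexpected status code: {status_code}")
```

For example, `backend_outcome(503)` and `backend_outcome(429)` both return `"RETRY"`, while `backend_outcome(404)` returns `"FAILED"`.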

## Retry best practices
<a name="durable-retry-best-practices"></a>

Follow these best practices when configuring retry strategies:
+ **Configure explicit retry strategies** - Don't rely on default retry behavior in production. Configure explicit retry strategies with appropriate max attempts and backoff intervals for your use case.
+ **Use conditional retries** - Implement `shouldRetry` logic to retry only transient errors (rate limits, timeouts) and fail fast on permanent errors (validation failures, not found).
+ **Set appropriate max attempts** - Balance between resilience and execution time. Too many retries can delay failure detection, while too few can cause unnecessary failures.
+ **Use exponential backoff** - Exponential backoff reduces load on downstream services and increases the likelihood of recovery from transient failures.
+ **Wrap error-prone code in steps** - Code outside steps cannot be automatically retried. Wrap external API calls, database queries, and other error-prone operations in steps with retry strategies.
+ **Monitor retry metrics** - Track step retry operations and execution failures in Amazon CloudWatch to identify patterns and optimize retry strategies.
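The conditional retry, max attempts, and exponential backoff practices above can be combined in one strategy function. The following is a minimal sketch in plain Python under stated assumptions: `make_retry_strategy`, `RetryDecision`, `RateLimitError`, and `ValidationError` are hypothetical names chosen for illustration, and the retry strategy interface in the Durable Execution SDK may differ, so check the SDK reference before adapting it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Type


class RateLimitError(Exception):
    """Hypothetical transient error, such as HTTP 429 from a downstream API."""


class ValidationError(Exception):
    """Hypothetical permanent error; retrying will not help."""


@dataclass(frozen=True)
class RetryDecision:
    should_retry: bool
    delay_seconds: float = 0.0


def make_retry_strategy(
    max_attempts: int = 5,
    initial_delay_s: float = 1.0,
    backoff_rate: float = 2.0,
    max_delay_s: float = 60.0,
    retryable: Tuple[Type[BaseException], ...] = (RateLimitError, TimeoutError),
) -> Callable[[BaseException, int], RetryDecision]:
    """Build a strategy. `attempt` is the number of attempts made so far (1-based)."""

    def strategy(error: BaseException, attempt: int) -> RetryDecision:
        # Fail fast on permanent errors such as validation failures.
        if not isinstance(error, retryable):
            return RetryDecision(False)
        # Stop once the attempt budget is used up.
        if attempt >= max_attempts:
            return RetryDecision(False)
        # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay_s.
        delay = min(initial_delay_s * backoff_rate ** (attempt - 1), max_delay_s)
        return RetryDecision(True, delay)

    return strategy
```

With the defaults, a `RateLimitError` on the third attempt is retried after 4 seconds, while a `ValidationError` fails immediately. Capping the delay with `max_delay_s` keeps a long backoff sequence from stalling the execution.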

# Idempotency
<a name="durable-execution-idempotency"></a>

Durable functions provide built-in idempotency for execution starts through execution names. When you provide an execution name, Lambda uses it to prevent duplicate executions and enable safe retries of invocation requests. Steps have at-least-once execution semantics by default—during replay, the SDK returns checkpointed results without re-executing completed steps, but your business logic must be idempotent to handle potential retries before completion.

**Note**  
Lambda event source mappings (ESM) currently don't support idempotent execution starts. As a result, each invocation from an event source mapping, including retries, starts a new durable execution. To ensure idempotent execution with event source mappings, either implement idempotency logic in your function code, such as with [Powertools for AWS Lambda](https://docs.aws.amazon.com/powertools/), or use a regular Lambda function as a proxy (dispatcher) that invokes the durable function with an idempotency key (the execution name parameter).
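With the dispatcher approach, the key detail is deriving an execution name that stays the same when the event source redelivers a record. The following sketch assumes an Amazon SQS source, where a redelivered message keeps its `messageId`. The `invoke` callable is a hypothetical placeholder: wire it to the Lambda Invoke API with the execution name parameter (typically with the `Event` invocation type). The sketch also assumes that a `sqs-` prefix plus a 32-character hex digest satisfies the execution name constraints, which you should confirm in the API reference.

```python
import hashlib
from typing import Any, Callable, Dict, List

# Hypothetical signature: (function_name, execution_name, payload_json) -> None.
# Wire this to the Lambda Invoke API with its execution name parameter.
InvokeFn = Callable[[str, str, str], None]


def execution_name_for(record: Dict[str, Any]) -> str:
    """Derive a stable execution name from an SQS record.

    SQS keeps the same messageId when it redelivers a message, so a retried
    batch maps to the same name and Lambda can treat the start as idempotent.
    """
    key = f"{record['eventSourceARN']}:{record['messageId']}"
    return "sqs-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def dispatch(event: Dict[str, Any], invoke: InvokeFn, function_name: str) -> List[str]:
    """Dispatcher handler body: start one durable execution per SQS record."""
    names: List[str] = []
    for record in event["Records"]:
        name = execution_name_for(record)
        # Forward the body unchanged so a redelivery sends an identical payload.
        invoke(function_name, name, record["body"])
        names.append(name)
    return names
```

Because the dispatcher forwards the message body unchanged, a redelivered message produces both the same name and an identical payload, which the idempotency table treats as an idempotent start rather than a `DurableExecutionAlreadyExists` error. Including the queue ARN in the hash keeps names from different queues from colliding.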

## Execution names
<a name="durable-idempotency-execution-names"></a>

You can provide an execution name when invoking a durable function. The execution name acts as an idempotency key, allowing you to safely retry invocation requests without creating duplicate executions. If you don't provide a name, Lambda generates a unique execution ID automatically.

Execution names must be unique within your account and Region. When you invoke a function with an execution name that already exists, Lambda's behavior depends on the existing execution's state and on whether the payload matches.

## Idempotency behavior
<a name="durable-idempotency-behavior"></a>

The following table describes how Lambda handles invocation requests based on whether you provide an execution name, the existing execution state, and whether the payload matches:


| Scenario | Name provided? | Existing execution status | Payload identical? | Behavior | 
| --- | --- | --- | --- | --- | 
| 1 | No | N/A | N/A | New execution started: Lambda generates a unique execution ID and starts a new execution | 
| 2 | Yes | Never existed or retention expired | N/A | New execution started: Lambda starts a new execution with the provided name | 
| 3 | Yes | Running | Yes | Idempotent start: Lambda returns the existing execution information without starting a duplicate. For synchronous invocations, this acts as a reattach to the running execution | 
| 4 | Yes | Running | No | Error: Lambda returns DurableExecutionAlreadyExists error because an execution with this name is already running with a different payload | 
| 5 | Yes | Closed (succeeded, failed, stopped, or timed out) | Yes | Idempotent start: Lambda returns the existing execution information without starting a new execution. The closed execution result is returned | 
| 6 | Yes | Closed (succeeded, failed, stopped, or timed out) | No | Error: Lambda returns DurableExecutionAlreadyExists error because an execution with this name already completed with a different payload | 
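To make the table concrete, the following toy model reproduces its decision logic in memory. It is an illustrative sketch only: `ExecutionRegistry` is hypothetical and ignores concurrency and everything else the Lambda service actually manages, and `DurableExecutionAlreadyExists` here is a local exception named after the error in the table.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


class DurableExecutionAlreadyExists(Exception):
    """Local stand-in for the error Lambda returns in scenarios 4 and 6."""


@dataclass
class Execution:
    name: str
    payload: str
    status: str  # "RUNNING" or a closed status such as "SUCCEEDED"


class ExecutionRegistry:
    """Toy, in-memory model of the idempotency table (not the Lambda service)."""

    def __init__(self) -> None:
        self._executions: Dict[str, Execution] = {}

    def start(self, payload: str, name: Optional[str] = None) -> Execution:
        if name is None:
            # Scenario 1: no name, so generate a unique execution ID.
            name = str(uuid.uuid4())
        existing = self._executions.get(name)
        if existing is None:
            # Scenario 2: the name never existed (or its retention expired).
            execution = Execution(name, payload, "RUNNING")
            self._executions[name] = execution
            return execution
        if existing.payload == payload:
            # Scenarios 3 and 5: return the existing execution, running or closed.
            return existing
        # Scenarios 4 and 6: same name, different payload.
        raise DurableExecutionAlreadyExists(name)

    def close(self, name: str, status: str) -> None:
        # Move an execution to a closed status (succeeded, failed, stopped, timed out).
        self._executions[name].status = status

    def expire(self, name: str) -> None:
        # Model the end of the retention period: the name becomes reusable.
        del self._executions[name]
```

Scenarios 3 and 5 return the same execution whether it is still running or already closed, which is what lets a caller safely retry an invocation request.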

**Note**  
Scenarios 3 and 5 demonstrate idempotent behavior where Lambda safely handles duplicate invocation requests by returning existing execution information instead of creating duplicates.

## Step idempotency
<a name="durable-idempotency-steps"></a>

Steps have at-least-once execution semantics by default. When your function replays after a wait, callback, or failure, the SDK checks each step against the checkpoint log. For steps that already completed, the SDK returns the checkpointed result without re-executing the step logic. However, if a step fails or the function is interrupted before the step completes, the step may execute multiple times.

Your business logic wrapped in steps must be idempotent to handle potential retries. Use idempotency keys to ensure operations like payments or database writes execute only once, even if the step retries.

**Example: Using idempotency keys in steps**

------
#### [ TypeScript ]

```
import { withDurableExecution, DurableContext } from '@aws/durable-execution-sdk-js';
import { randomUUID } from 'crypto';

export const handler = withDurableExecution(
  async (event: any, context: DurableContext) => {
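    // Generate the idempotency key inside a step. The result is checkpointed,
    // so replays reuse the same key instead of generating a new one.
    const idempotencyKey = await context.step('generate-key', async () => {
      return randomUUID();
    });
    
    // Pass the key to the payment API (paymentAPI stands in for your payment
    // provider's client) so a retried step doesn't create a duplicate charge.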
    const payment = await context.step('process-payment', async () => {
      return paymentAPI.charge({
        amount: event.amount,
        idempotencyKey: idempotencyKey
      });
    });
    
    return { statusCode: 200, payment };
  }
);
```

------
#### [ Python ]

```
from aws_durable_execution_sdk_python import durable_execution, DurableContext
import uuid

@durable_execution
def handler(event, context: DurableContext):
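    # Generate the idempotency key inside a step. The result is checkpointed,
    # so replays reuse the same key instead of generating a new one.
    idempotency_key = context.step(
        lambda _: str(uuid.uuid4()),
        name='generate-key'
    )
    
    # Pass the key to the payment API (payment_api stands in for your payment
    # provider's client) so a retried step doesn't create a duplicate charge.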
    payment = context.step(
        lambda _: payment_api.charge(
            amount=event['amount'],
            idempotency_key=idempotency_key
        ),
        name='process-payment'
    )
    
    return {'statusCode': 200, 'payment': payment}
```

------

You can configure steps to use at-most-once execution semantics by setting the execution mode to `AT_MOST_ONCE_PER_RETRY`. This ensures the step executes at most once per retry attempt, but may not execute at all if the function is interrupted before the step completes.

The SDK enforces deterministic replay by validating that step names and order match the checkpoint log during replay. If your code attempts to execute steps in a different order or with different names, the SDK throws a `NonDeterministicExecutionError`.
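The determinism check can be pictured as a position-by-position comparison between the step names recorded in the checkpoint log and the step names the current invocation tries to run. The sketch below is a simplified model, not the SDK's implementation: `validate_replay` is hypothetical and `NonDeterministicExecutionError` is a local class named after the SDK error.

```python
from typing import List


class NonDeterministicExecutionError(Exception):
    """Local stand-in named after the SDK error described above."""


def validate_replay(checkpoint_log: List[str], replayed_steps: List[str]) -> None:
    """Check that replayed step names match the checkpoint log in order.

    checkpoint_log holds step names recorded by earlier invocations;
    replayed_steps holds the names the current invocation runs, in order.
    Steps beyond the end of the log are new work and are allowed.
    """
    for position, (recorded, attempted) in enumerate(zip(checkpoint_log, replayed_steps)):
        if recorded != attempted:
            raise NonDeterministicExecutionError(
                f"step {position}: expected {recorded!r}, got {attempted!r}"
            )
```

A common way to trip this check is to derive step names or branching decisions from values that change between invocations, such as the current time or a random number. Generating such values inside a step, as the `generate-key` step does in the example above, keeps replays deterministic.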

**How replay works with completed steps:**

1. First invocation: Function executes step A, creates checkpoint, then waits

1. Second invocation (after wait): Function replays from beginning, step A returns checkpointed result instantly without re-executing, then continues to step B

1. Third invocation (after another wait): Function replays from beginning, steps A and B return checkpointed results instantly, then continues to step C

This replay mechanism ensures that completed steps don't re-execute, but your business logic must still be idempotent to handle retries before completion.
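The three invocations above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the SDK: `run_invocation` is hypothetical, the checkpoint log is a plain dictionary, and, as in the numbered list, each step is followed by a wait, so an invocation ends right after it runs one new step.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], Any]]


def run_invocation(checkpoints: Dict[str, Any], steps: List[Step]) -> List[str]:
    """Run one invocation of a toy workflow; return the steps whose code ran.

    checkpoints persists between invocations and maps step name -> result.
    """
    executed: List[str] = []
    for name, fn in steps:
        if name in checkpoints:
            # Completed earlier: the SDK would return checkpoints[name] here
            # without calling fn again.
            continue
        # Not checkpointed yet: run the step code. If fn raises, nothing is
        # checkpointed, so the next invocation runs this step again.
        checkpoints[name] = fn()
        executed.append(name)
        # The wait that follows the step ends this invocation.
        break
    return executed
```

Calling `run_invocation` three times with the same `checkpoints` dictionary returns `['A']`, then `['B']`, then `['C']`, and each step's code runs once. If a step raises before its result is checkpointed, the next call runs that step again, which is the at-least-once behavior that makes idempotent step logic necessary.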