

# How Lambda works
<a name="concepts-basics"></a>

Lambda functions are the basic building blocks of Lambda applications. To write functions, it's essential to understand the core concepts and components that make up the Lambda programming model. This section guides you through the fundamental elements you need to know to start building serverless applications with Lambda.
+ **[Lambda functions and function handlers](#gettingstarted-concepts-function)** - A Lambda function is a small block of code that runs in response to events. Functions can be standard (up to 15 minutes) or [durable](durable-functions.md) (up to one year). Function handlers are the entry point for the event objects that your Lambda function code processes.
+ **[Lambda execution environment and runtimes](#gettingstarted-concepts-runtime)** - Lambda execution environments manage the resources required to run your function. For [durable functions](durable-functions.md), the execution environment includes automatic state management and checkpointing capabilities. Runtimes are the language-specific environments your functions run in.
+ **[Events and triggers](#gettingstarted-concepts-event)** - Other AWS services can invoke your functions in response to specific events. For durable functions, events can also trigger resumption of paused workflows.
+ **[Lambda permissions and roles](#gettingstarted-concepts-permissions)** - Control who can access your functions and what other AWS services your functions can interact with. Durable functions require additional permissions for state management and extended execution.

**Tip**  
If you want to start by understanding serverless development more generally, see [Understanding the difference between traditional and serverless development](https://docs.aws.amazon.com/serverless/latest/devguide/serverless-shift-mindset.html) in the *AWS Serverless Developer Guide*.

## Lambda functions and function handlers
<a name="gettingstarted-concepts-function"></a>

In Lambda, **functions** are the fundamental building blocks you use to create applications. A Lambda function is a piece of code that runs in response to events, such as a user clicking a button on a website or a file being uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. With durable functions, your code can pause execution between steps and maintain state automatically, making them ideal for long-running workflows like order processing or content moderation. You can think of a function as a kind of self-contained program with the following properties:
+ A function has one specific job or purpose
+ It runs only when needed, in response to specific events
+ It automatically stops running when finished

A Lambda **function handler** is the method in your function code that processes events. When a function runs in response to an event, Lambda runs the function handler. Data about the event that caused the function to run is passed directly to the handler. While the code in a Lambda function can contain more than one method or function, a Lambda function can have only one handler.

To create a Lambda function, you bundle your function code and its dependencies in a deployment package. Lambda supports two types of deployment package: [.zip file archives](configuration-function-zip.md) and [container images](images-create.md).
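To make the handler concept concrete, here is a minimal sketch of a Python handler. The `name` field is an assumption for illustration; the real event shape depends on what invokes the function.

```python
# Minimal Lambda handler sketch. Lambda deserializes the invocation
# event into a dict and passes it here along with a context object.
def lambda_handler(event, context):
    # "name" is a hypothetical field; real events depend on the trigger
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}
```

Whatever the handler returns is serialized and sent back to the invoker.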

## Lambda execution environment and runtimes
<a name="gettingstarted-concepts-runtime"></a>

Lambda functions run inside a secure, isolated *[execution environment](lambda-runtime-environment.md)* which Lambda manages for you. For [durable functions](durable-functions.md), the execution environment includes additional components for state management and workflow coordination. The execution environment manages the processes and resources that are needed to run your function. When a function is first invoked, Lambda creates a new execution environment for the function to run in. After the function has finished running, Lambda doesn't stop the execution environment right away; if the function is invoked again, Lambda can re-use the existing execution environment.

The Lambda execution environment also contains a *runtime*, a language-specific environment that relays event information and responses between Lambda and your function. Lambda provides a number of [managed runtimes](lambda-runtimes.md#runtimes-supported) for the most popular programming languages, or you can create your own.

For managed runtimes, Lambda automatically applies security updates and patches to functions using the runtime.

## Events and triggers
<a name="gettingstarted-concepts-event"></a>

You can invoke a Lambda function directly by using the Lambda console, the [AWS CLI](https://aws.amazon.com/cli/), or one of the [AWS Software Development Kits (SDKs)](https://aws.amazon.com/developer/tools/). In a production application, however, it's more usual for your function to be invoked by another AWS service in response to a particular event. For example, you might want a function to run whenever an item is added to an Amazon DynamoDB table.

To make your function respond to events, you set up a **trigger**. A trigger connects your function to an event source, and a function can have multiple triggers. When an event occurs, Lambda receives the event data as a JSON document and converts it into an object that your code can process. For example, you might define the following JSON format for your event. The Lambda runtime converts this JSON to an object before passing it to your function's handler.

**Example custom Lambda event**  

```
{
  "Location": "SEA",
  "WeatherData":{
    "TemperaturesF":{
      "MinTempF": 22,
      "MaxTempF": 78
    },
    "PressuresHPa":{
      "MinPressureHPa": 1015,
      "MaxPressureHPa": 1027
    }
  }
}
```
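A Python handler for the example event above might read the deserialized object like this (a sketch; the field handling is illustrative):

```python
# Sketch of a handler that processes the custom weather event shown
# above, after the runtime has converted the JSON into a dict.
def lambda_handler(event, context):
    temps = event["WeatherData"]["TemperaturesF"]
    return {
        "Location": event["Location"],
        "TempRangeF": temps["MaxTempF"] - temps["MinTempF"],
    }
```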

Stream and queue services like Amazon Kinesis or Amazon SQS use an [event source mapping](invocation-eventsourcemapping.md) instead of a standard trigger. Event source mappings poll the source for new data, batch records together, and then invoke your function with the batched events. For more information, see [How event source mappings differ from direct triggers](invocation-eventsourcemapping.md#eventsourcemapping-trigger-difference).
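For instance, a handler behind an Amazon SQS event source mapping receives a batch of messages under a `Records` key, with each message body delivered as a string. A minimal sketch (the per-record work is a placeholder):

```python
# Sketch of a handler for batched records delivered by an event source
# mapping. SQS puts each message's payload in record["body"] as a string.
def lambda_handler(event, context):
    results = []
    for record in event.get("Records", []):
        results.append(record["body"].upper())  # placeholder per-record work
    return {"batchSize": len(results), "items": results}
```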

To understand how a trigger works, start by completing the [Use an Amazon S3 trigger](with-s3-example.md) tutorial, or for a general overview of using triggers and instructions on creating a trigger using the Lambda console, see [Integrating other services](lambda-services.md).

## Lambda permissions and roles
<a name="gettingstarted-concepts-permissions"></a>

For Lambda, there are two main types of [permissions](permissions-granting-access.md) that you need to configure:
+ Permissions that your function needs to access other AWS services
+ Permissions that other users and AWS services need to access your function

The following sections describe both of these permission types and discuss best practices for applying least-privilege permissions.

### Permissions for functions to access other AWS resources
<a name="gettingstarted-concepts-permissions-role"></a>

Lambda functions often need to access other AWS resources and perform actions on them. For example, a function might read items from a DynamoDB table, store an object in an S3 bucket, or write to an Amazon SQS queue. To give functions the permissions they need to perform these actions, you use an *[execution role](lambda-intro-execution-role.md)*. 

A Lambda execution role is a special kind of AWS Identity and Access Management (IAM) [role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html): an identity that you create in your account whose permissions are defined in a *policy*.

Every Lambda function must have an execution role, and a single role can be used by more than one function. When a function is invoked, Lambda assumes the function's execution role and is granted permission to take the actions defined in the role's policy.

When you create a function in the Lambda console, Lambda automatically creates an execution role for your function. The role's policy gives your function basic permissions to write log outputs to Amazon CloudWatch Logs. To give your function permission to perform actions on other AWS resources, you need to edit the role to add the extra permissions. The easiest way to add permissions is to use an AWS [managed policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#aws-managed-policies). Managed policies are created and administered by AWS and provide permissions for many common use cases. For example, if your function performs CRUD operations on a DynamoDB table, you can add the [AmazonDynamoDBFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonDynamoDBFullAccess.html) policy to your role.

### Permissions for other users and resources to access your function
<a name="gettingstarted-concepts-permissions-resource-based"></a>

To grant other AWS services permission to access your Lambda function, you use a *[resource-based policy](access-control-resource-based.md)*. In IAM, resource-based policies are attached to a resource (in this case, your Lambda function) and define who can access the resource and what actions they are allowed to take.

For another AWS service to invoke your function through a trigger, your function's resource-based policy must grant that service permission to use the `lambda:InvokeFunction` action. If you create the trigger using the console, Lambda automatically adds this permission for you.

To grant permission to other AWS users to access your function, you can define this in your function's resource-based policy in exactly the same way as for another AWS service or resource. You can also use an *[identity-based policy](access-control-identity-based.md)* that's associated with the user. 

### Best practices for Lambda permissions
<a name="gettingstarted-concepts-permissions-best-practice"></a>

When you set permissions using IAM policies, [security best practice](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) is to grant only the permissions required to perform a task. This is known as the principle of *least privilege*. To get started granting permissions for your function, you might choose to use an AWS managed policy. Managed policies can be the quickest and easiest way to grant permissions to perform a task, but they might also include other permissions you don't need. As you move from early development through test and production, we recommend that you reduce permissions to only those needed by defining your own [customer-managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#customer-managed-policies).
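As a sketch of that progression, a customer-managed policy that replaces a broad managed policy might allow only the CRUD actions on a single DynamoDB table. The region, account ID, and table name below are placeholders:

```python
import json

# Least-privilege sketch: CRUD actions on one DynamoDB table only,
# instead of a broad managed policy. All ARN parts are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "dynamodb:GetItem",
            "dynamodb:PutItem",
            "dynamodb:UpdateItem",
            "dynamodb:DeleteItem",
        ],
        "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/my-table",
    }],
}
print(json.dumps(policy, indent=2))
```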

The same principle applies when granting permissions to access your function using a resource-based policy. For example, if you want to give permission to Amazon S3 to invoke your function, best practice is to limit access to individual buckets, or buckets in particular AWS accounts, rather than giving blanket permissions to the S3 service.
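Sketched as a resource-based policy statement, that scoping might look like the following: only the named bucket in the named account may invoke the function. The ARNs and account ID are placeholders.

```python
import json

# Sketch of a resource-based policy statement that limits S3 invocation
# to one bucket in one account; ARNs and the account ID are placeholders.
statement = {
    "Effect": "Allow",
    "Principal": {"Service": "s3.amazonaws.com"},
    "Action": "lambda:InvokeFunction",
    "Resource": "arn:aws:lambda:us-east-1:111122223333:function:my-function",
    "Condition": {
        "ArnLike": {"AWS:SourceArn": "arn:aws:s3:::amzn-s3-demo-bucket"},
        "StringEquals": {"AWS:SourceAccount": "111122223333"},
    },
}
print(json.dumps(statement, indent=2))
```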

# Running code with Lambda
<a name="concepts-how-lambda-runs-code"></a>

When you write a Lambda function, you are creating code that will run in a unique serverless environment. Understanding how Lambda actually runs your code involves two key aspects: the programming model that defines how your code interacts with Lambda, and the execution environment lifecycle that determines how Lambda manages your code's runtime environment.

## The Lambda programming model
<a name="concepts-progmodel-overview"></a>

The programming model functions as a common set of rules for how Lambda works with your code, regardless of whether you're writing in Python, Java, or any other supported language. The programming model includes your runtime and handler.

**For standard functions:**

1. Lambda receives an event.

1. Lambda uses the runtime to prepare the event in a format your code can use.

1. The runtime sends the formatted event to your handler.

1. Your handler processes the event using the code you've written.

**For durable functions:**

1. Lambda receives an event.

1. The runtime prepares both the event and the DurableContext.

1. Your handler can:
   + Process steps with automatic checkpointing
   + Pause execution without consuming resources
   + Resume from the last successful checkpoint
   + Maintain state between steps

Essential to this model is the *handler*, where Lambda sends events to be processed by your code. Think of it as the entry point to your code. When Lambda receives an event, it passes this event and some context information to your handler. The handler then runs your code to process these events - for example, it might read a file when it's uploaded to Amazon S3, analyze an image, or update a database. Once your code finishes processing an event, the handler is ready to process the next one.

## The Lambda execution model
<a name="concepts-exec-env-overview"></a>

While the programming model defines how Lambda interacts with your code, the *execution environment* is where Lambda actually runs your function: a secure, isolated compute space created specifically for your function.

**Each environment follows a lifecycle that varies between standard and durable functions:**

**Standard Functions (up to 15 minutes):**

1. **Initialization:** Environment setup and code loading

1. **Invocation:** Single execution of function code

1. **Shutdown:** Environment cleanup

**Durable Functions (up to 1 year):**

1. **Initialization:** Environment and durable state setup

1. **Invocation:** Multiple steps with automatic checkpointing

1. **Wait States:** Pause execution without resource consumption

1. **Resume:** Restart from last checkpoint

1. **Shutdown:** Cleanup of durable state

This environment handles important aspects of running your function. It provides your function with memory and a `/tmp` directory for temporary storage. **For Durable Functions, it also manages:**
+ Automatic state persistence between steps
+ Checkpoint storage and recovery
+ Wait state coordination
+ Progress tracking across long-running executions

# Understanding the Lambda programming model
<a name="foundation-progmodel"></a>

Lambda offers two programming models: standard functions that run up to 15 minutes, and Durable Functions that can run up to one year. While both share core concepts, Durable Functions add capabilities for long-running, stateful workflows.

Lambda provides a programming model that is common to all of the runtimes. The programming model defines the interface between your code and the Lambda system. You tell Lambda the entry point to your function by defining a *handler* in the function configuration. The runtime passes in objects to the handler that contain the invocation *event* and the *context*, such as the function name and request ID.

**For Durable Functions, the handler also receives a DurableContext object that provides:**
+ Checkpointing capabilities through step()
+ Wait state management through wait() and waitForCallback()
+ Automatic state persistence between invocations
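Because this guide doesn't show the actual DurableContext interface, the following is a purely hypothetical Python sketch of how a handler might use the `step()` and `wait()` methods named above, with a toy stand-in class; the real API may differ.

```python
class FakeDurableContext:
    """Toy stand-in for DurableContext, for illustration only."""

    def step(self, fn):
        # The real service would durably checkpoint fn's result.
        return fn()

    def wait(self, seconds):
        # The real service would pause here without consuming resources.
        pass

def handler(event, ctx):
    # Each step's result would survive a pause/resume cycle.
    order = ctx.step(lambda: {"orderId": event["orderId"]})
    ctx.wait(seconds=24 * 3600)  # hypothetical one-day durable pause
    return ctx.step(lambda: {**order, "status": "shipped"})
```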

When the handler finishes processing the first event, the runtime sends it another. For Durable Functions, the handler can pause execution between steps, and Lambda will automatically save and restore state when the function resumes. The function's class stays in memory, so clients and variables that are declared outside of the handler method in *initialization code* can be reused. To save processing time on subsequent events, create reusable resources like AWS SDK clients during initialization. Once initialized, each instance of your function can process thousands of requests.
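A minimal sketch of that reuse pattern follows. The expensive resource is simulated with a counter; in practice it might be an AWS SDK client created outside the handler.

```python
# Initialization code runs once per execution environment; every
# invocation handled by this instance reuses what it created.
INIT_COUNT = {"value": 0}

def create_client():
    INIT_COUNT["value"] += 1      # simulate an expensive setup step
    return {"connected": True}

CLIENT = create_client()          # runs during Init, not on every invoke

def lambda_handler(event, context):
    # Reuses CLIENT; create_client() is not called again here.
    return {"connected": CLIENT["connected"], "inits": INIT_COUNT["value"]}
```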

Your function also has access to local storage in the `/tmp` directory, a transient cache that can be used for multiple invocations. For more information, see [Execution environment](lambda-runtime-environment.md).

When [AWS X-Ray tracing](services-xray.md) is enabled, the runtime records separate subsegments for initialization and execution.

The runtime captures logging output from your function and sends it to Amazon CloudWatch Logs. In addition to logging your function's output, the runtime also logs entries when function invocation starts and ends. This includes a report log with the request ID, billed duration, initialization duration, and other details. If your function throws an error, the runtime returns that error to the invoker.
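For instance, the billed duration can be pulled out of a `REPORT` line (the format shown in the log examples later in this guide) with a small parser sketch:

```python
import re

# Parse the billed duration from a Lambda REPORT log line.
line = ("REPORT RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 "
        "Duration: 933.59 ms Billed Duration: 934 ms "
        "Memory Size: 128 MB Max Memory Used: 9 MB")
match = re.search(r"Billed Duration: ([\d.]+) ms", line)
billed_ms = float(match.group(1))
```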

**Note**  
Logging is subject to [CloudWatch Logs quotas](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html). Log data can be lost due to throttling or, in some cases, when an instance of your function is stopped.

**Key differences for Durable Functions:**
+ State is automatically persisted between steps
+ Functions can pause execution without consuming resources
+ Steps are automatically retried on failure
+ Progress is tracked through checkpoints

Lambda scales your function by running additional instances of it as demand increases, and by stopping instances as demand decreases. This model leads to variations in application architecture, such as:
+ Unless noted otherwise, incoming requests might be processed out of order or concurrently.
+ Do not rely on instances of your function being long-lived; instead, store your application's state elsewhere.
+ Use local storage and class-level objects to increase performance, but keep to a minimum the size of your deployment package and the amount of data that you transfer onto the execution environment.

For a hands-on introduction to the programming model in your preferred programming language, see the following chapters.
+ [Building Lambda functions with Node.js](lambda-nodejs.md)
+ [Building Lambda functions with Python](lambda-python.md)
+ [Building Lambda functions with Ruby](lambda-ruby.md)
+ [Building Lambda functions with Java](lambda-java.md)
+ [Building Lambda functions with Go](lambda-golang.md)
+ [Building Lambda functions with C\#](lambda-csharp.md)
+ [Building Lambda functions with PowerShell](lambda-powershell.md)

# Understanding the Lambda execution environment lifecycle
<a name="lambda-runtime-environment"></a>

Lambda execution environments support both standard functions (up to 15 minutes) and Durable Functions (up to one year). While both share the same basic lifecycle, Durable Functions add state management capabilities for long-running workflows.

 Lambda invokes your function in an execution environment, which provides a secure and isolated runtime environment. The execution environment manages the resources required to run your function. The execution environment also provides lifecycle support for the function's runtime and any [external extensions](lambda-extensions.md) associated with your function. 

**For Durable Functions, the execution environment includes additional components for:**
+ State persistence between steps
+ Checkpointing management
+ Wait state coordination
+ Progress tracking

**Lambda Managed Instances execution environment**  
If you are using [Lambda Managed Instances](lambda-managed-instances-execution-environment.md), the execution environment has important differences compared to Lambda (default) functions. Managed Instances support concurrent invocations, use a different lifecycle model, and run on customer-owned infrastructure. For detailed information about the Managed Instances execution environment, see [Understanding the Lambda Managed Instances execution environment](lambda-managed-instances-execution-environment.md).

The function's runtime communicates with Lambda using the [Runtime API](runtimes-api.md). Extensions communicate with Lambda using the [Extensions API](runtimes-extensions-api.md). Extensions can also receive log messages and other telemetry from the function by using the [Telemetry API](telemetry-api.md). 



![\[Architecture diagram of the execution environment.\]](http://docs.aws.amazon.com/lambda/latest/dg/images/telemetry-api-concept-diagram.png)


When you create your Lambda function, you specify configuration information, such as the amount of memory available and the maximum execution time allowed for your function. Lambda uses this information to set up the execution environment.

The function's runtime and each external extension are processes that run within the execution environment. Permissions, resources, credentials, and environment variables are shared between the function and the extensions.

**Topics**
+ [Lambda execution environment lifecycle](#runtimes-lifecycle)
+ [Cold starts and latency](#cold-start-latency)
+ [Reducing cold starts with Provisioned Concurrency](#cold-starts-pc)
+ [Optimizing static initialization](#static-initialization)

## Lambda execution environment lifecycle
<a name="runtimes-lifecycle"></a>

![\[Lambda lifecycle phases: Init, Invoke, Shutdown\]](http://docs.aws.amazon.com/lambda/latest/dg/images/Overview-Successful-Invokes.png)


Each phase starts with an event that Lambda sends to the runtime and to all registered extensions. The runtime and each extension indicate completion by sending a `Next` API request. Lambda freezes the execution environment when the runtime and each extension have completed and there are no pending events.

**The lifecycle phases for Durable Functions include:**
+ **Init:** Standard initialization plus durable state setup
+ **Invoke:** Can include multiple step executions with automatic checkpointing
+ **Wait:** Function can pause execution without consuming resources
+ **Resume:** Function restarts from last checkpoint
+ **Shutdown:** Cleanup of durable state and resources

**Topics**
+ [Init phase](#runtimes-lifecycle-ib)
+ [Failures during the Init phase](#runtimes-lifecycle-init-errors)
+ [Restore phase (Lambda SnapStart only)](#runtimes-lifecycle-restore)
+ [Invoke phase](#runtimes-lifecycle-invoke)
+ [Failures during the invoke phase](#runtimes-lifecycle-invoke-with-errors)
+ [Shutdown phase](#runtimes-lifecycle-shutdown)

### Init phase
<a name="runtimes-lifecycle-ib"></a>

In the `Init` phase, Lambda performs the following tasks:
+ Start all extensions (`Extension init`)
+ Bootstrap the runtime (`Runtime init`)
+ Run the function's static code (`Function init`)
+ Run any before-checkpoint [runtime hooks](snapstart-runtime-hooks.md) (Lambda SnapStart only)

The `Init` phase ends when the runtime and all extensions signal that they are ready by sending a `Next` API request. The `Init` phase is limited to 10 seconds. If these tasks do not complete within 10 seconds, Lambda retries the `Init` phase at the time of the first function invocation, using the configured function timeout.

When [Lambda SnapStart](snapstart.md) is activated, the `Init` phase happens when you publish a function version. Lambda saves a snapshot of the memory and disk state of the initialized execution environment, persists the encrypted snapshot, and caches it for low-latency access. If you have a before-checkpoint [runtime hook](snapstart-runtime-hooks.md), then the code runs at the end of `Init` phase.

**Note**  
The 10-second timeout doesn't apply to functions that are using provisioned concurrency, SnapStart, or Lambda Managed Instances. For provisioned concurrency, SnapStart, and Managed Instances functions, your initialization code can run for up to 15 minutes. The time limit is 130 seconds or the configured function timeout (maximum 900 seconds), whichever is higher.

When you use [provisioned concurrency](https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html), Lambda initializes the execution environment when you configure the provisioned concurrency settings for a function. Lambda also ensures that initialized execution environments are always available in advance of invocations. You may see gaps between your function's invocation and initialization phases. Depending on your function's runtime and memory configuration, you may also see variable latency on the first invocation on an initialized execution environment.

For functions using on-demand concurrency, Lambda may occasionally initialize execution environments ahead of invocation requests. When this happens, you may also observe a time gap between your function's initialization and invocation phases. We recommend that you don't take a dependency on this behavior.

### Failures during the Init phase
<a name="runtimes-lifecycle-init-errors"></a>

If a function crashes or times out during the `Init` phase, Lambda emits error information in the `INIT_REPORT` log.

**Example — INIT\_REPORT log for timeout**  

```
INIT_REPORT Init Duration: 1236.04 ms Phase: init Status: timeout
```

**Example — INIT\_REPORT log for extension failure**  

```
INIT_REPORT Init Duration: 1236.04 ms Phase: init Status: error Error Type: Extension.Crash
```

If the `Init` phase is successful, Lambda doesn't emit the `INIT_REPORT` log unless [SnapStart](snapstart.md) or [provisioned concurrency](provisioned-concurrency.md) is enabled. SnapStart and provisioned concurrency functions always emit `INIT_REPORT`. For more information, see [Monitoring for Lambda SnapStart](snapstart-monitoring.md).

### Restore phase (Lambda SnapStart only)
<a name="runtimes-lifecycle-restore"></a>

When you first invoke a [SnapStart](snapstart.md) function and as the function scales up, Lambda resumes new execution environments from the persisted snapshot instead of initializing the function from scratch. If you have an after-restore [runtime hook](snapstart-runtime-hooks.md), the code runs at the end of the `Restore` phase. You are charged for the duration of after-restore runtime hooks. The runtime must load and after-restore runtime hooks must complete within the timeout limit (10 seconds). Otherwise, you'll get a SnapStartTimeoutException. When the `Restore` phase completes, Lambda invokes the function handler (the [Invoke phase](#runtimes-lifecycle-invoke)).

#### Failures during the Restore phase
<a name="runtimes-lifecycle-restore-errors"></a>

If the `Restore` phase fails, Lambda emits error information in the `RESTORE_REPORT` log.

**Example — RESTORE\_REPORT log for timeout**  

```
RESTORE_REPORT Restore Duration: 1236.04 ms Status: timeout
```

**Example — RESTORE\_REPORT log for runtime hook failure**  

```
RESTORE_REPORT Restore Duration: 1236.04 ms Status: error Error Type: Runtime.ExitError
```

For more information about the `RESTORE_REPORT` log, see [Monitoring for Lambda SnapStart](snapstart-monitoring.md).

### Invoke phase
<a name="runtimes-lifecycle-invoke"></a>

When a Lambda function is invoked in response to a `Next` API request, Lambda sends an `Invoke` event to the runtime and to each extension.

The function's timeout setting limits the duration of the entire `Invoke` phase. For example, if you set the function timeout as 360 seconds, the function and all extensions need to complete within 360 seconds. Note that there is no independent post-invoke phase. The duration is the sum of all invocation time (runtime + extensions) and is not calculated until the function and all extensions have finished executing.

The invoke phase ends after the runtime and all extensions signal that they are done by sending a `Next` API request.

### Failures during the invoke phase
<a name="runtimes-lifecycle-invoke-with-errors"></a>

If the Lambda function crashes or times out during the `Invoke` phase, Lambda resets the execution environment. The following diagram illustrates Lambda execution environment behavior when there's an invoke failure:

![\[Execution environment example: Init, Invoke, Invoke with Error, Invoke, Shutdown\]](http://docs.aws.amazon.com/lambda/latest/dg/images/Overview-Invoke-with-Error.png)


In the previous diagram:
+ The first phase is the **INIT** phase, which runs without errors.
+ The second phase is the **INVOKE** phase, which runs without errors.
+ At some point, suppose your function runs into an invoke failure (common causes include function timeouts, runtime errors, memory exhaustion, VPC connectivity issues, permission errors, concurrency limits, and various configuration problems). For a complete list of possible invocation failures, see [Troubleshoot invocation issues in Lambda](troubleshooting-invocation.md). The third phase, labeled **INVOKE WITH ERROR**, illustrates this scenario. When this happens, the Lambda service performs a reset. The reset behaves like a `Shutdown` event. First, Lambda shuts down the runtime, then sends a `Shutdown` event to each registered external extension. The event includes the reason for the shutdown. If this environment is used for a new invocation, Lambda re-initializes the extension and runtime together with the next invocation.

  Note that the Lambda reset does not clear the `/tmp` directory content prior to the next init phase. This behavior is consistent with the regular shutdown phase.
**Note**  
AWS is currently implementing changes to the Lambda service. Due to these changes, you may see minor differences between the structure and content of system log messages and trace segments emitted by different Lambda functions in your AWS account.  
If your function's system log configuration is set to plain text, this change affects the log messages captured in CloudWatch Logs when your function experiences an invoke failure. The following examples show log outputs in both old and new formats.  
These changes will be implemented during the coming weeks, and all functions in all AWS Regions except the China and GovCloud regions will transition to use the new-format log messages and trace segments.

**Example CloudWatch Logs log output (runtime or extension crash) - old style**  

  ```
  START RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Version: $LATEST
  RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Error: Runtime exited without providing a reason
  Runtime.ExitError
  END RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1
  REPORT RequestId: c3252230-c73d-49f6-8844-968c01d1e2e1 Duration: 933.59 ms Billed Duration: 934 ms Memory Size: 128 MB Max Memory Used: 9 MB
  ```  
**Example CloudWatch Logs log output (function timeout) - old style**  

  ```
  START RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21 Version: $LATEST
  2024-03-04T17:22:38.033Z b70435cc-261c-4438-b9b6-efe4c8f04b21 Task timed out after 3.00 seconds
  END RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21
  REPORT RequestId: b70435cc-261c-4438-b9b6-efe4c8f04b21 Duration: 3004.92 ms Billed Duration: 3117 ms Memory Size: 128 MB Max Memory Used: 33 MB Init Duration: 111.23 ms
  ```

  The new format for CloudWatch logs includes an additional `Status` field in the `REPORT` line. In the case of a runtime or extension crash, the `REPORT` line also includes an `Error Type` field.

**Example CloudWatch Logs log output (runtime or extension crash) - new style**  

  ```
  START RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd Version: $LATEST
  END RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd
  REPORT RequestId: 5b866fb1-7154-4af6-8078-6ef6ca4c2ddd Duration: 133.61 ms Billed Duration: 214 ms Memory Size: 128 MB Max Memory Used: 31 MB Init Duration: 80.00 ms Status: error Error Type: Runtime.ExitError
  ```  
**Example CloudWatch Logs log output (function timeout) - new style**  

  ```
  START RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda Version: $LATEST
  END RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda
  REPORT RequestId: 527cb862-4f5e-49a9-9ae4-a7edc90f0fda Duration: 3016.78 ms Billed Duration: 3101 ms Memory Size: 128 MB Max Memory Used: 31 MB Init Duration: 84.00 ms Status: timeout
  ```
+ The fourth phase represents the **INVOKE** phase immediately following an invoke failure. Here, Lambda initializes the environment again by re-running the **INIT** phase. This is called a *suppressed init*. When suppressed inits occur, Lambda doesn't explicitly report an additional **INIT** phase in CloudWatch Logs. Instead, you may notice that the duration in the REPORT line includes an extra **INIT** duration in addition to the **INVOKE** duration. For example, suppose you see the following logs in CloudWatch:

  ```
  2022-12-20T01:00:00.000-08:00 START RequestId: XXX Version: $LATEST 
  2022-12-20T01:00:02.500-08:00 END RequestId: XXX 
  2022-12-20T01:00:02.500-08:00 REPORT RequestId: XXX Duration: 3022.91 ms 
  Billed Duration: 3000 ms Memory Size: 512 MB Max Memory Used: 157 MB
  ```

  In this example, the difference between the REPORT and START timestamps is 2.5 seconds. This doesn't match the reported duration of 3022.91 milliseconds, because it doesn't take into account the extra **INIT** (suppressed init) that Lambda performed. In this example, you can infer that the actual **INVOKE** phase took 2.5 seconds.

  For more insight into this behavior, you can use the [Accessing real-time telemetry data for extensions using the Telemetry API](telemetry-api.md). The Telemetry API emits `INIT_START`, `INIT_RUNTIME_DONE`, and `INIT_REPORT` events with `phase=invoke` whenever suppressed inits occur in between invoke phases.
+ The fifth phase represents the **SHUTDOWN** phase, which runs without errors.

### Shutdown phase
<a name="runtimes-lifecycle-shutdown"></a>

When Lambda is about to shut down the runtime, it sends a `Shutdown` event to each registered external extension. Extensions can use this time for final cleanup tasks. The `Shutdown` event is a response to a `Next` API request.

**Duration limit**: The maximum duration of the `Shutdown` phase depends on the configuration of registered extensions:
+ 0 ms – A function with no registered extensions
+ 500 ms – A function with a registered internal extension
+ 2,000 ms – A function with one or more registered external extensions

If the runtime or an extension does not respond to the `Shutdown` event within the limit, Lambda ends the process using a `SIGKILL` signal.

After the function and all extensions have completed, Lambda maintains the execution environment for some time in anticipation of another function invocation. However, Lambda terminates execution environments every few hours to allow for runtime updates and maintenance—even for functions that are invoked continuously. You should not assume that the execution environment will persist indefinitely. For more information, see [Implement statelessness in functions](concepts-application-design.md#statelessness-functions).

When the function is invoked again, Lambda thaws the environment for reuse. Reusing the execution environment has the following implications: 
+ Objects declared outside of the function's handler method remain initialized, providing additional optimization when the function is invoked again. For example, if your Lambda function establishes a database connection, instead of reestablishing the connection, the original connection is used in subsequent invocations. We recommend adding logic in your code to check if a connection exists before creating a new one.
+ Each execution environment provides between 512 MB and 10,240 MB, in 1-MB increments, of disk space in the `/tmp` directory. The directory content remains when the execution environment is frozen, providing a transient cache that can be used for multiple invocations. You can add extra code to check if the cache has the data that you stored. For more information on deployment size limits, see [Lambda quotas](gettingstarted-limits.md).
+ Background processes or callbacks that were initiated by your Lambda function and did not complete when the function ended resume if Lambda reuses the execution environment. Make sure that any background processes or callbacks in your code are complete before the code exits.
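A common way to take advantage of environment reuse is to cache a client or connection in module scope and check for it before creating a new one, as the first bullet above recommends. The following sketch illustrates the pattern; the `get_connection` helper and the placeholder connection object are hypothetical, standing in for a real database or SDK client.

```python
# Module scope runs once per execution environment (the INIT phase),
# so objects created here survive across warm invocations.
_connection = None

def get_connection():
    """Return the cached connection, creating it only on first use.

    Hypothetical helper: in a real function the placeholder below would
    be a database connection or an SDK client such as boto3.client('s3').
    """
    global _connection
    if _connection is None:
        _connection = object()  # placeholder for a real connection
    return _connection

def lambda_handler(event, context):
    conn = get_connection()  # created on cold starts, reused on warm starts
    return {"statusCode": 200}
```

Because the cached object lives in module scope, a warm invocation skips the expensive setup entirely.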

## Cold starts and latency
<a name="cold-start-latency"></a>

When Lambda receives a request to run a function via the Lambda API, the service first prepares an execution environment. During this initialization phase, the service downloads your code, starts the environment, and runs any initialization code outside of the main handler. Finally, Lambda runs the handler code.

![\[perf optimize figure 1\]](http://docs.aws.amazon.com/lambda/latest/dg/images/perf-optimize-figure-1.png)


In this diagram, the first two steps of downloading the code and setting up the environment are frequently referred to as a “cold start”. You are [charged for this time](https://aws.amazon.com/blogs/compute/aws-lambda-standardizes-billing-for-init-phase/), and it adds latency to your overall invocation duration.

After the invocation completes, the execution environment is frozen. To improve resource management and performance, Lambda retains the execution environment for a period of time. During this time, if another request arrives for the same function, Lambda can reuse the environment. This second request typically finishes more quickly, since the execution environment is already fully set up. This is called a “warm start”.

Cold starts typically occur in under 1% of invocations. The duration of a cold start varies from under 100 ms to over 1 second. Cold starts are more common in development and test functions than in production workloads, because development and test functions are usually invoked less frequently.

## Reducing cold starts with Provisioned Concurrency
<a name="cold-starts-pc"></a>

If you need predictable function start times for your workload, [provisioned concurrency](provisioned-concurrency.md) is the recommended solution to ensure the lowest possible latency. This feature pre-initializes execution environments, reducing cold starts.

For example, a function with a provisioned concurrency of 6 has 6 execution environments pre-warmed.

![\[perf optimize figure 4\]](http://docs.aws.amazon.com/lambda/latest/dg/images/perf-optimize-figure-4.png)


## Optimizing static initialization
<a name="static-initialization"></a>

Static initialization happens before the handler code starts running in a function. This is the initialization code that you provide, that is outside of the main handler. This code is often used to import libraries and dependencies, set up configurations, and initialize connections to other services.

The following Python example shows importing and configuring modules, and creating the Amazon S3 client, during the initialization phase, before the `lambda_handler` function runs during the invoke phase.

```
import os
import json
import cv2
import logging
import boto3

s3 = boto3.client('s3')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):

  # Handler logic...
```

The largest contributor to latency before function execution is initialization code. This code runs when a new execution environment is created for the first time. The initialization code is not run again if an invocation uses a warm execution environment. Factors that affect initialization code latency include:
+ The size of the function package, in terms of imported libraries and dependencies, and Lambda layers.
+ The amount of code and initialization work.
+ The performance of libraries and other services in setting up connections and other resources.

There are a number of steps that developers can take to optimize static initialization latency. If a function has many objects and connections, you may be able to rearchitect a single function into multiple, specialized functions. These are individually smaller and each has less initialization code.

It’s important that functions only import the libraries and dependencies that they need. For example, if you only use Amazon DynamoDB in the AWS SDK, you can require an individual service instead of the entire SDK. Compare the following three examples:

```
// Instead of const AWS = require('aws-sdk'), use:
const DynamoDB = require('aws-sdk/clients/dynamodb')

// Instead of const AWSXRay = require('aws-xray-sdk'), use:
const AWSXRay = require('aws-xray-sdk-core')

// Instead of const AWS = AWSXRay.captureAWS(require('aws-sdk')), use:
const dynamodb = new DynamoDB.DocumentClient()
AWSXRay.captureAWSClient(dynamodb.service)
```

Static initialization is also often the best place to open database connections to allow a function to reuse connections over multiple invocations to the same execution environment. However, you may have large numbers of objects that are only used in certain execution paths in your function. In this case, you can lazily load variables in the global scope to reduce the static initialization duration.
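The lazy-loading approach described above can be sketched as follows. This is an illustrative example, not from the guide: the `get_report_engine` helper and the `action` field are hypothetical, and the stdlib `csv` module stands in for a heavy dependency such as `numpy` or `cv2`.

```python
# Heavy, rarely used dependency: defer loading until a code path needs it.
_report_engine = None

def get_report_engine():
    """Load the expensive module only on the execution path that uses it."""
    global _report_engine
    if _report_engine is None:
        import csv  # stand-in for a heavy library; imported once, then cached
        _report_engine = csv
    return _report_engine

def lambda_handler(event, context):
    if event.get("action") == "report":
        engine = get_report_engine()  # pays the import cost only here
        return {"loaded": engine.__name__}
    return {"loaded": None}  # common path never pays the import cost
```

Invocations that never take the `report` path avoid the import cost entirely, shortening static initialization for the common case.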

Avoid global variables for context-specific information. If your function has a global variable that is used only for the lifetime of a single invocation and is reset for the next invocation, use a variable scope that is local to the handler. Not only does this prevent global variable leaks across invocations, it also improves the static initialization performance.

# Creating event-driven architectures with Lambda
<a name="concepts-event-driven-architectures"></a>

An event is anything that triggers a Lambda function to run. Events can trigger a Lambda function in two ways: through direct invocation (push) and event source mappings (pull).

Many AWS services can directly invoke your Lambda functions. These services *push* events to your Lambda function. Events that trigger functions can be almost anything: an HTTP request through API Gateway, a schedule managed by an EventBridge rule, an AWS IoT event, or an Amazon S3 event. With event source mapping, Lambda actively fetches (or *pulls*) events from a queue or stream. You configure Lambda to check for events from a supported service, and Lambda handles the polling and invocation of your function.

When passed to your function, events are structured in JSON format. The JSON structure varies depending on the service that generates it and the event type. While standard Lambda function invocations can last up to 15 minutes (or up to one year with [durable functions](durable-functions.md)), Lambda is best-suited for short invocations that last one second or less. This is particularly true of event-driven architectures, where each Lambda function is treated as a microservice responsible for performing a narrow set of specific instructions.
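As an illustration of the service-specific JSON shape, the following sketch shows a handler extracting the bucket and object key from an Amazon S3 put event. The trimmed-down `sample_event` below reflects the documented S3 notification structure; bucket and key names are illustrative.

```python
def lambda_handler(event, context):
    """Extract the bucket name and object key from an Amazon S3 put event.

    The nested structure matches the S3 event notification format;
    other services deliver differently shaped JSON.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    return {"bucket": bucket, "key": key}

# A trimmed-down example of the JSON Lambda passes for an S3 put event:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "amzn-s3-demo-bucket"},
                "object": {"key": "uploads/photo.jpg"}}}
    ]
}
```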

**Note**  
Event-driven architectures communicate across different systems using networks, which introduce variable latency. For workloads that require very low latency, such as real-time trading systems, this design might not be the best choice. However, for highly scalable and available workloads, or those with unpredictable traffic patterns, event-driven architectures can provide an effective way to meet these demands.

**Topics**
+ [Benefits of event-driven architectures](#event-driven-benefits)
+ [Trade-offs of event-driven architectures](#event-driven-tradeoffs)
+ [Anti-patterns in Lambda-based event-driven applications](#event-driven-anti-patterns)

## Benefits of event-driven architectures
<a name="event-driven-benefits"></a>

Lambda supports two methods of invocation in event-driven architectures:

1. Direct invocation (push method): AWS services trigger Lambda functions directly. For example:
   + Amazon S3 triggers a function when a file is uploaded
   + API Gateway triggers a function when it receives an HTTP request

1. Event source mapping (pull method): Lambda retrieves events and invokes functions. For example:
   + Lambda retrieves messages from an Amazon SQS queue and invokes a function
   + Lambda reads records from a DynamoDB stream and invokes a function

Both methods contribute to the benefits of event-driven architectures, as described below.

### Replacing polling and webhooks with events
<a name="polling-webhooks-events"></a>

Many traditional architectures use polling and webhook mechanisms to communicate state between different components. Polling can be highly inefficient for fetching updates since there is a lag between new data becoming available and synchronization with downstream services. Webhooks are not always supported by other microservices that you want to integrate with. They might also require custom authorization and authentication configurations. In both cases, these integration methods are challenging to scale on-demand without additional work by development teams.

![\[event driven architectures figure 7\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-7.png)


Both of these mechanisms can be replaced by events, which can be filtered, routed, and pushed downstream to consuming microservices. This approach can result in less bandwidth consumption, CPU utilization, and potentially lower cost. These architectures can also reduce complexity, since each functional unit is smaller and there is often less code.

![\[event driven architectures figure 8\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-8.png)


Event-driven architectures can also make it easier to design near-real-time systems, helping organizations move away from batch-based processing. Events are generated at the time when state in the application changes, so the custom code of a microservice should be designed to handle the processing of a single event. Since scaling is handled by the Lambda service, this architecture can handle significant increases in traffic without changing custom code. As events scale up, so does the compute layer that processes events.

### Reducing complexity
<a name="complexity"></a>

Microservices enable developers and architects to simplify complex workflows. For example, an ecommerce monolith can be broken down into order acceptance and payment processes with separate inventory, fulfillment and accounting services. What might be complex to manage and orchestrate in a monolith becomes a series of decoupled services that communicate asynchronously with events.

![\[event driven architectures figure 9\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-9.png)


This approach also makes it possible to assemble services that process data at different rates. In this case, an order acceptance microservice can store high volumes of incoming orders by buffering the messages in an Amazon SQS queue.

A payment processing service, which is typically slower due to the complexity of handling payments, can take a steady stream of messages from the Amazon SQS queue. It can orchestrate complex retry and error handling logic using AWS Step Functions, and coordinate active payment workflows for hundreds of thousands of orders.

**Alternative approach:** For orchestration using standard programming languages, you can use [Lambda durable functions](durable-functions.md). Durable functions let you write the order acceptance, payment processing, and notification logic in code with automatic checkpointing and retry. This approach works well when the workflow primarily involves Lambda functions and you prefer keeping orchestration logic in code.

### Improving scalability and extensibility
<a name="scalability-extensibility"></a>

Microservices generate events that are typically published to messaging services like Amazon SNS and Amazon SQS. These behave like an elastic buffer between microservices and help handle scaling when traffic increases. Services like Amazon EventBridge can then filter and route messages depending upon the content of the event, as defined in rules. As a result, event-based applications can be more scalable and offer greater redundancy than monolithic applications.

This system is also highly extensible, allowing other teams to extend features and add functionality without impacting the order processing and payment processing microservices. By publishing events using EventBridge, this application integrates with existing systems, such as the inventory microservice, but also enables any future application to integrate as an event consumer. Producers of events have no knowledge of event consumers, which can help simplify the microservice logic.
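Publishing an event to EventBridge from a Lambda function can be sketched as below. This is a hedged example: the event bus name, source, and detail-type values are illustrative placeholders, not names from this guide; the `boto3` import sits inside the publishing function so the pure `build_entry` helper can be exercised without AWS access.

```python
import json

def build_entry(detail, bus_name):
    """Build one EventBridge entry (pure helper, easy to test offline)."""
    # Producers only describe what happened; they have no knowledge of consumers.
    return {
        "Source": "com.example.orders",   # hypothetical source name
        "DetailType": "OrderPlaced",      # hypothetical detail type
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }

def publish_order_event(detail, bus_name="default"):
    """Publish an order event to EventBridge (sketch; requires AWS credentials)."""
    import boto3
    events = boto3.client("events")
    return events.put_events(Entries=[build_entry(detail, bus_name)])
```

Any future consumer can subscribe to `OrderPlaced` events with an EventBridge rule without changing this producer.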

## Trade-offs of event-driven architectures
<a name="event-driven-tradeoffs"></a>

### Variable latency
<a name="variable-latency"></a>

Unlike monolithic applications, which might process everything within the same memory space on a single device, event-driven applications communicate across networks. This design introduces variable latency. While it’s possible to engineer applications to minimize latency, monolithic applications can almost always be optimized for lower latency at the expense of scalability and availability.

Workloads that require consistent low-latency performance, such as high-frequency trading applications in banks or sub-millisecond robotics automation in warehouses, are not good candidates for event-driven architecture.

### Eventual consistency
<a name="eventual-consistency"></a>

An event represents a change in state, and with many events flowing through different services in an architecture at any given point of time, such workloads are often [eventually consistent](https://en.wikipedia.org/wiki/Eventual_consistency). This makes it more complex to process transactions, handle duplicates, or determine the exact overall state of a system.

Some workloads contain a combination of requirements that are eventually consistent (for example, total orders in the current hour) or strongly consistent (for example, current inventory). For workloads needing strong data consistency, there are architecture patterns to support this. For example:
+ DynamoDB can provide [strongly consistent reads](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html), sometimes at higher latency and consuming more throughput than the default mode. DynamoDB can also [support transactions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/transactions.html) to help maintain data consistency.
+ You can use Amazon RDS for features needing [ACID properties](https://en.wikipedia.org/wiki/ACID), though relational databases are generally less scalable than NoSQL databases like DynamoDB. [Amazon RDS Proxy](https://aws.amazon.com/rds/proxy/) can help manage connection pooling and scaling from ephemeral consumers like Lambda functions.

Event-based architectures are usually designed around individual events instead of large batches of data. Generally, workflows are designed to manage the steps of an individual event or execution flow instead of operating on multiple events simultaneously. In serverless, real-time event processing is preferred over batch processing: batches should be replaced with many smaller incremental updates. While this can make workloads more available and scalable, it also makes it more challenging for events to have awareness of other events.

### Returning values to callers
<a name="values-callers"></a>

In many cases, event-based applications are asynchronous. This means that caller services do not wait for requests from other services before continuing with other work. This is a fundamental characteristic of event-driven architectures that enables scalability and flexibility. This means that passing return values or the result of a workflow is more complex than in synchronous execution flows.

Most Lambda invocations in production systems are [asynchronous](invocation-async.md), responding to events from services like Amazon S3 or Amazon SQS. In these cases, the success or failure of processing an event is often more important than returning a value. Features such as [dead letter queues](https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html) (DLQs) in Lambda are provided to ensure you can identify and retry failed events, without needing to notify the caller.

### Debugging across services and functions
<a name="services-functions"></a>

Debugging event-driven systems is also different compared to a monolithic application. With different systems and services passing events, it's not possible to record and reproduce the exact state of multiple services when errors occur. Since each service and function invocation has separate log files, it can be more complicated to determine what happened to a specific event that caused an error.

There are three important requirements for building a successful debugging approach in event-driven systems. First, a robust logging system is critical, and this is provided across AWS services and embedded in Lambda functions by Amazon CloudWatch. Second, in these systems, it’s important to ensure that every event has a transaction identifier that is logged at each step throughout a transaction, to help when searching for logs.

Finally, it’s highly recommended to automate the parsing and analysis of logs by using a debugging and monitoring service like AWS X-Ray. This can consume logs across multiple Lambda invocations and services, making it much easier to pinpoint the root cause of issues. See [ Troubleshooting walkthrough](lambda-troubleshooting.md) for in-depth coverage of using X-Ray for troubleshooting.
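The transaction-identifier practice described above can be sketched as follows. The field name `transactionId` and the structured-JSON log format are illustrative choices, not requirements from this guide; the key point is that the same identifier appears in every log line for an event.

```python
import json
import logging
import uuid

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # Reuse the caller's transaction ID if present, otherwise mint one,
    # and include it in every log line so the event can be traced across
    # services when searching CloudWatch Logs.
    txn_id = event.get("transactionId") or str(uuid.uuid4())
    logger.info(json.dumps({"transactionId": txn_id, "step": "received"}))
    # ... business logic ...
    logger.info(json.dumps({"transactionId": txn_id, "step": "completed"}))
    return {"transactionId": txn_id}
```

Downstream functions that receive this event can log the same `transactionId`, making it possible to follow one transaction across multiple log groups.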

## Anti-patterns in Lambda-based event-driven applications
<a name="event-driven-anti-patterns"></a>

When building event-driven architectures with Lambda, avoid the following common anti-patterns. These patterns work but can increase costs and complexity.

### The Lambda monolith
<a name="monolith"></a>

In many applications migrated from traditional servers, such as Amazon EC2 instances or Elastic Beanstalk applications, developers “lift and shift” existing code. Frequently, this results in a single Lambda function that contains all of the application logic that is triggered for all events. For a basic web application, a monolithic Lambda function would handle all API Gateway routes and integrate with all necessary downstream resources.

![\[event driven architectures figure 13\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-13.png)


This approach has several drawbacks:
+  **Package size** – The Lambda function might be much larger because it contains all possible code for all paths, which makes it slower for the Lambda service to run.
+  **Hard to enforce least privilege** – The function’s [execution role](lambda-intro-execution-role.md) must allow permissions to all resources needed for all paths, making the permissions very broad. This is a security concern. Many paths in the functional monolith do not need all the permissions that have been granted.
+  **Harder to upgrade** – In a production system, any upgrades to the single function are more risky and could break the entire application. Upgrading a single path in the Lambda function is an upgrade to the entire function.
+  **Harder to maintain** – It’s more difficult to have multiple developers working on the service since it’s a monolithic code repository. It also increases the cognitive burden on developers and makes it harder to create appropriate test coverage for code.
+  **Harder to reuse code** – It's harder to separate reusable libraries from monoliths, making code reuse more difficult. As you develop and support more projects, this can make it harder to support the code and scale your team’s velocity.
+  **Harder to test** – As the lines of code increase, it becomes harder to unit test all the possible combinations of inputs and entry points in the code base. It’s generally easier to implement unit testing for smaller services with less code.

The preferred alternative is to break down the monolithic Lambda function into individual microservices, mapping a single Lambda function to a single, well-defined task. In this simple web application with a few API endpoints, the resulting microservice-based architecture can be based upon the API Gateway routes.

![\[event driven architectures figure 14\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-14.png)


### Recursive patterns that cause runaway Lambda functions
<a name="recursive-runaway"></a>

AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to AWS services. Generally, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Failing to manage this can result in infinite loops.

For example, a Lambda function writes an object to an Amazon S3 bucket, which in turn invokes the same Lambda function via a put event. The invocation causes a second object to be written to the bucket, which invokes the same Lambda function again:

![\[event driven architectures figure 15\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-15.png)


While the potential for infinite loops exists in most programming languages, this anti-pattern has the potential to consume more resources in serverless applications. Both Lambda and Amazon S3 automatically scale based upon traffic, so the loop can cause Lambda to scale to consume all available concurrency and Amazon S3 will continue to write objects and generate more events for Lambda.

This example uses S3, but the risk of recursive loops also exists in Amazon SNS, Amazon SQS, DynamoDB, and other services. You can use [recursive loop detection](invocation-recursion.md) to find and avoid this anti-pattern.
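In addition to the built-in recursive loop detection, a simple code-level guard can break the loop in the S3 example: write output objects under a dedicated prefix and skip events for that prefix. This is an illustrative sketch; the `processed/` prefix and `should_process` helper are hypothetical names.

```python
def should_process(key, output_prefix="processed/"):
    """Guard against S3-triggered recursion by skipping objects
    this function wrote itself (prefix name is illustrative)."""
    return not key.startswith(output_prefix)

def lambda_handler(event, context):
    handled = []
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        if not should_process(key):
            continue  # object was produced by this function; don't loop
        # ... process the object, then write the result under the output
        # prefix so the event it generates is filtered out next time ...
        handled.append("processed/" + key)
    return handled
```

A separate output bucket achieves the same effect and avoids the loop entirely.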

### Lambda functions calling Lambda functions
<a name="functions-calling-functions"></a>

Functions enable encapsulation and code re-use. Most programming languages support the concept of code synchronously calling functions within a code base. In this case, the caller waits until the function returns a response.

**Note**  
While Lambda functions directly calling other Lambda functions is generally an anti-pattern due to cost and complexity concerns, this doesn't apply to [durable functions](durable-functions.md), which are specifically designed to orchestrate multi-step workflows by invoking other functions.

When this happens on a traditional server or virtual instance, the operating system scheduler switches to other available work. Whether the CPU runs at 0% or 100% does not affect the overall cost of the application, since you are paying for the fixed cost of owning and operating a server.

This model often does not adapt well to serverless development. For example, consider a simple ecommerce application consisting of three Lambda functions that process an order:

![\[event driven architectures figure 16\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-16.png)


In this case, the *Create order* function calls the *Process payment* function, which in turn calls the *Create invoice* function. While this synchronous flow might work within a single application on a server, it introduces several avoidable problems in a distributed serverless architecture:
+  **Cost** – With Lambda, you pay for the duration of an invocation. In this example, while the *Create invoice* function runs, two other functions are also running in a wait state, shown in red on the diagram.
+  **Error handling** – In nested invocations, error handling can become much more complex. For example, an error in *Create invoice* might require the *Process payment* function to reverse the charge, or it might instead retry the *Create invoice* process.
+  **Tight coupling** – Processing a payment typically takes longer than creating an invoice. In this model, the availability of the entire workflow is limited by the slowest function.
+  **Scaling** – The [concurrency](lambda-concurrency.md) of all three functions must be equal. In a busy system, this uses more concurrency than would otherwise be needed.

In serverless applications, there are two common approaches to avoid this pattern. First, use an Amazon SQS queue between Lambda functions. If a downstream process is slower than an upstream process, the queue durably persists messages and decouples the two functions. In this example, the *Create order* function would publish a message to an Amazon SQS queue, and the *Process payment* function consumes messages from the queue.
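The queue-based decoupling described above can be sketched as two handlers: the producer publishes to Amazon SQS instead of invoking the consumer directly, and Lambda's SQS event source mapping delivers batches to the consumer. This is a hedged example: the queue URL is a placeholder and the message shape is illustrative; the `boto3` import sits inside the producer so the pure helpers can be tested offline.

```python
import json

def build_order_message(order):
    """Serialize an order for the queue (pure helper, easy to test)."""
    return json.dumps({"orderId": order["id"], "total": order["total"]})

def create_order_handler(event, context):
    """'Create order' publishes to SQS rather than calling
    'Process payment' directly (queue URL is a placeholder)."""
    import boto3
    sqs = boto3.client("sqs")
    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        MessageBody=build_order_message(event),
    )
    return {"status": "queued"}

def process_payment_handler(event, context):
    """Downstream consumer: the SQS event source mapping delivers
    batches of messages in event["Records"]."""
    return [json.loads(record["body"])["orderId"]
            for record in event["Records"]]
```

Because the queue durably persists messages, the payment function can drain it at its own pace without back-pressuring order creation.
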

The second approach is to use AWS Step Functions. For complex processes with multiple types of failure and retry logic, Step Functions can help reduce the amount of custom code needed to orchestrate the workflow. As a result, Step Functions orchestrates the work and robustly handles errors and retries, and the Lambda functions contain only business logic.

### Synchronous waiting within a single Lambda function
<a name="synchronous-waiting"></a>

Make sure that any potentially concurrent activities are not scheduled synchronously within a single Lambda function. For example, a Lambda function might write to an S3 bucket and then write to a DynamoDB table:

![\[event driven architectures figure 17\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-17.png)


In this design, wait times are compounded because the activities are sequential. In cases where the second task depends on the completion of the first task, you can reduce the total waiting time and the cost of execution by having two separate Lambda functions:

![\[event driven architectures figure 19\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-19.png)


In this design, the first Lambda function responds immediately after putting the object to the Amazon S3 bucket. The S3 service invokes the second Lambda function, which then writes data to the DynamoDB table. This approach minimizes the total wait time in the Lambda function executions.
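The second function in this design can be sketched as an S3-triggered handler that maps each event record to a DynamoDB item. This is an illustrative example: the `UploadsMetadata` table and its attribute names are hypothetical, and the `boto3` import sits inside the handler so the pure mapping helper can be tested offline.

```python
def build_item(record):
    """Map one S3 event record to a DynamoDB item (pure helper;
    attribute names are illustrative)."""
    return {
        "objectKey": record["s3"]["object"]["key"],
        "bucket": record["s3"]["bucket"]["name"],
    }

def lambda_handler(event, context):
    """Second function: invoked by the S3 put event, writes metadata
    to DynamoDB (table name is a placeholder)."""
    import boto3
    table = boto3.resource("dynamodb").Table("UploadsMetadata")
    for record in event["Records"]:
        table.put_item(Item=build_item(record))
    return {"written": len(event["Records"])}
```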

# Designing Lambda applications
<a name="concepts-application-design"></a>

A well-architected event-driven application uses a combination of AWS services and custom code to process and manage requests and data. This chapter focuses on Lambda-specific topics in application design. There are many important considerations for serverless architects when designing applications for busy production systems.

Many of the best practices that apply to software development and distributed systems also apply to serverless application development. The overall goal is to develop workloads that are:
+  **Reliable** – offering your end users a high level of availability. AWS serverless services are reliable because they are also designed for failure.
+  **Durable** – providing storage options that meet the durability needs of your workload.
+  **Secure** – following best practices and using the tools provided to secure access to workloads and limit the blast radius.
+  **Performant** – using computing resources efficiently and meeting the performance needs of your end users.
+  **Cost-efficient** – designing architectures that avoid unnecessary cost, can scale without overspending, and can be decommissioned without significant overhead.

The following design principles can help you build workloads that meet these goals. Not every principle may apply to every architecture, but they should guide you in general architecture decisions.

**Topics**
+ [Use services instead of custom code](#services-custom-code)
+ [Understand Lambda abstraction levels](#level-abstraction)
+ [Implement statelessness in functions](#statelessness-functions)
+ [Minimize coupling](#minimize-coupling)
+ [Build for on-demand data instead of batches](#on-demand-batches)
+ [Choose an orchestration option for complex workflows](#orchestration)
+ [Implement idempotency](#retries-failures)
+ [Use multiple AWS accounts for managing quotas](#multiple-accounts)

## Use services instead of custom code
<a name="services-custom-code"></a>

Serverless applications usually comprise several AWS services, integrated with custom code run in Lambda functions. While Lambda can be integrated with most AWS services, the services most commonly used in serverless applications are:


| Category | AWS service | 
| --- | --- | 
|  Compute  |  AWS Lambda  | 
|  Data storage  |  Amazon S3, Amazon DynamoDB, Amazon RDS  | 
|  API  |  Amazon API Gateway  | 
|  Application integration  |  Amazon EventBridge, Amazon SNS, Amazon SQS  | 
|  Orchestration  |  Lambda durable functions, AWS Step Functions  | 
|  Streaming data and analytics  |  Amazon Data Firehose  | 

**Note**  
Many serverless services provide replication and support for multiple Regions, including DynamoDB and Amazon S3. Lambda functions can be deployed in multiple Regions as part of a deployment pipeline, and API Gateway can be configured to support multi-Region deployments. See this [example architecture](https://d1.awsstatic.com/architecture-diagrams/ArchitectureDiagrams/serverless-architecture-for-global-applications-ra.pdf?did=wp_card&trk=wp_card) that shows how this can be achieved.

There are many well-established, common patterns in distributed architectures that you can build yourself or implement using AWS services. For most customers, there is little commercial value in investing time to develop these patterns from scratch. When your application needs one of these patterns, use the corresponding AWS service:


| Pattern | AWS service | 
| --- | --- | 
|  Queue  |  Amazon SQS  | 
|  Event bus  |  Amazon EventBridge  | 
|  Publish/subscribe (fan-out)  |  Amazon SNS  | 
|  Orchestration  |  Lambda durable functions, AWS Step Functions  | 
|  API  |  Amazon API Gateway  | 
|  Event streams  |  Amazon Kinesis  | 

These services are designed to integrate with Lambda, and you can use infrastructure as code (IaC) to create and discard resources in the services. You can use any of these services via the [AWS SDK](https://aws.amazon.com/tools/) without needing to install applications or configure servers. Becoming proficient in using these services from code in your Lambda functions is an important step toward producing well-designed serverless applications.

## Understand Lambda abstraction levels
<a name="level-abstraction"></a>

The Lambda service limits your access to the underlying operating systems, hypervisors, and hardware running your Lambda functions. The service continuously improves and changes its infrastructure to add features, reduce cost, and improve performance. Your code should assume no knowledge of how Lambda is architected and no affinity with any particular hardware.

Similarly, Lambda's integrations with other services are managed by AWS, with only a small number of configuration options exposed to you. For example, when API Gateway and Lambda interact, there is no concept of load balancing since it is entirely managed by the services. You also have no direct control over which [Availability Zones](https://aws.amazon.com/about-aws/global-infrastructure/regions_az/) the services use when invoking functions at any point in time, or how Lambda determines when to scale up or down the number of execution environments.

This abstraction helps you focus on the integration aspects of your application, the flow of data, and the business logic where your workload provides value to your end users. Allowing the services to manage the underlying mechanics helps you develop applications more quickly with less custom code to maintain.

## Implement statelessness in functions
<a name="statelessness-functions"></a>

For standard Lambda functions, you should assume that the environment exists only for a single invocation. The function should initialize any required state when it is first started. For example, your function may need to fetch data from a DynamoDB table. It should commit any permanent data changes to a durable store such as Amazon S3, DynamoDB, or Amazon SQS before exiting. It should not rely on existing data structures, temporary files, or any internal state maintained across multiple invocations.

When using durable functions, state is automatically preserved between invocations, eliminating the need to manually persist it to external storage. However, you should still follow stateless principles for any data not explicitly managed through the DurableContext.

To initialize database connections and libraries, or load state, you can take advantage of [static initialization](lambda-runtime-environment.md#static-initialization). Since execution environments are reused where possible to improve performance, you can amortize the time taken to initialize these resources over multiple invocations. However, you should not store any variables or data used in the function within this global scope.
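The pattern above can be sketched as follows. This is a minimal illustration, not AWS-provided code: the expensive client here is a hypothetical stand-in for a real database connection or SDK client, initialized once in the module ("static") scope and reused across warm invocations, while per-invocation data stays in local scope.

```python
import time

def _create_client():
    """Hypothetical expensive setup (e.g., opening a database connection)."""
    time.sleep(0.01)  # simulate slow initialization work
    return {"created_at": time.time()}

# Module scope: runs once per execution environment, not on every invocation,
# so the initialization cost is amortized across all invocations it serves.
CLIENT = _create_client()

def handler(event, context):
    # Reuse the shared client, but keep request data in local variables so
    # one invocation's state never leaks into the next.
    items_processed = len(event.get("records", []))
    return {
        "client_age_seconds": time.time() - CLIENT["created_at"],
        "processed": items_processed,
    }
```

Calling the handler repeatedly reuses `CLIENT` without paying the setup cost again, which is the behavior static initialization relies on.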

## Minimize coupling
<a name="minimize-coupling"></a>

Most architectures should prefer many, shorter functions over fewer, larger ones. The purpose of each function should be to handle the event passed into the function, with no knowledge or expectations of the overall workflow or volume of transactions. This makes the function agnostic to the source of the event with minimal coupling to other services.

Any global-scope constants that change infrequently should be implemented as environment variables to allow updates without deployments. Any secrets or sensitive information should be stored in [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html) or [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) and loaded by the function. Since these resources are account-specific, you can create build pipelines across multiple accounts. The pipelines load the appropriate secrets per environment, without exposing these to developers or requiring any code changes.
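As a sketch of the environment-variable approach, the following reads non-sensitive, slow-changing settings from the environment. The variable names (`ORDERS_TABLE_NAME`, `LOG_LEVEL`) are hypothetical, chosen for illustration; secrets would instead be fetched from Parameter Store or Secrets Manager at runtime, not read from environment variables.

```python
import os

def load_config(env=None):
    """Read non-sensitive settings from environment variables.

    Accepting a mapping makes the function testable; in a Lambda function
    you would call it with no arguments to read os.environ. The defaults
    shown are illustrative placeholders.
    """
    env = os.environ if env is None else env
    return {
        "table_name": env.get("ORDERS_TABLE_NAME", "orders-dev"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

Because the values come from the environment, an operator can repoint the function at a different table or change the log level without a code deployment.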

## Build for on-demand data instead of batches
<a name="on-demand-batches"></a>

Many traditional systems are designed to run periodically and process batches of transactions that have built up over time. For example, a banking application may run every hour to process ATM transactions into central ledgers. In Lambda-based applications, the custom processing should be triggered by every event, allowing the service to scale up concurrency as needed to provide near-real-time processing of transactions.

While standard Lambda functions are limited to 15 minutes of execution time, durable functions can run for up to one year, making them suitable for longer-running processing needs. However, you should still prefer event-driven processing over batch processing when possible.

While you can run [cron](https://en.wikipedia.org/wiki/Cron) tasks in serverless applications [by using scheduled expressions](https://docs.aws.amazon.com/eventbridge/latest/userguide/scheduled-events.html) for rules in Amazon EventBridge, these should be used sparingly, or as a last resort. In any scheduled task that processes a batch, there is the potential for the volume of transactions to grow beyond what can be processed within the 15-minute Lambda duration limit. If the limitations of external systems force you to use a scheduler, you should generally schedule for the shortest reasonable recurring time period.

For example, it’s not best practice to use a batch process that triggers a Lambda function to fetch a list of new Amazon S3 objects. This is because the service may receive more new objects between batches than can be processed within the 15-minute maximum duration of a standard Lambda function.

![\[event driven architectures figure 10\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-10.png)


Instead, Amazon S3 should invoke the Lambda function each time a new object is put into the bucket. This approach is significantly more scalable and works in near-real time.

![\[event driven architectures figure 11\]](http://docs.aws.amazon.com/lambda/latest/dg/images/event-driven-architectures-figure-11.png)
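A per-event handler for this design can be sketched as follows. The event shape matches the standard S3 notification format; the per-object work is a hypothetical placeholder. Note that S3 delivers object keys URL-encoded (for example, spaces become `+`), so the key is decoded before use.

```python
from urllib.parse import unquote_plus

def handler(event, context):
    """Process each new object as S3 delivers its notification.

    Invoked once per notification rather than on a schedule, so processing
    scales with the rate of object creation instead of batch size.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append((bucket, key))  # replace with real per-object work
    return processed
```

Because S3 invokes the function for each object, there is no backlog to drain and no risk of a batch outgrowing the function's duration limit.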


## Choose an orchestration option for complex workflows
<a name="orchestration"></a>

Workflows that involve branching logic, different types of failure models, and retry logic typically use an orchestrator to keep track of the state of the overall execution. Don't build ad-hoc orchestration in standard Lambda functions. This results in tight coupling, complex routing code, and no automatic state recovery.

Instead, use one of these purpose-built orchestration options:
+ **[Lambda durable functions](durable-functions.md):** Application-centric orchestration using standard programming languages with automatic checkpointing, built-in retry, and execution recovery. Ideal for developers who prefer keeping workflow logic in code alongside business logic within Lambda.
+ **[AWS Step Functions](with-step-functions.md):** Visual workflow orchestration with native integrations to over 220 AWS services. Ideal for multi-service coordination, zero-maintenance infrastructure, and visual workflow design.

For guidance on choosing between these options, see [Durable functions or Step Functions](durable-step-functions.md).

With [Step Functions](https://aws.amazon.com/step-functions/), you use state machines to manage orchestration. This extracts the error handling, routing, and branching logic from your code, replacing it with state machines declared using JSON. Apart from making workflows more robust and observable, you can also add versioning to workflows and make the state machine a codified resource that you can add to a code repository.

It’s common for simpler workflows in Lambda functions to become more complex over time. When operating a production serverless application, it’s important to identify when this is happening, so you can migrate this logic to a state machine or durable function.

## Implement idempotency
<a name="retries-failures"></a>

AWS serverless services, including Lambda, are fault-tolerant and designed to handle failures. For example, if a service invokes a Lambda function and there is a service disruption, Lambda invokes your function in a different Availability Zone. If your function throws an error, Lambda retries the invocation.

Since the same event may be received more than once, functions should be designed to be [idempotent](https://en.wikipedia.org/wiki/Idempotence). This means that receiving the same event multiple times does not change the result beyond the first time the event was received.

You can implement idempotency in Lambda functions by using a DynamoDB table to track recently processed identifiers and determine whether a transaction has already been handled. The DynamoDB table usually implements a [Time To Live (TTL)](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) value to expire items and limit the storage space used.
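The logic can be sketched as follows. To keep the example self-contained, an in-memory dictionary stands in for the DynamoDB table; a real implementation would use a conditional write (for example, `PutItem` with a condition expression) and a TTL attribute instead.

```python
import time

# In-memory stand-in for the DynamoDB idempotency table; maps an event
# identifier to the time its record should expire, mimicking TTL expiry.
_SEEN = {}
TTL_SECONDS = 3600

def process_once(event_id, action, now=None):
    """Run `action` only the first time `event_id` is seen within the TTL."""
    now = time.time() if now is None else now
    # Drop expired entries, as DynamoDB TTL would eventually do.
    for key in [k for k, expires in _SEEN.items() if expires <= now]:
        del _SEEN[key]
    if event_id in _SEEN:
        return None  # duplicate delivery: skip, result unchanged
    _SEEN[event_id] = now + TTL_SECONDS
    return action()
```

Receiving the same event identifier twice within the TTL runs the action only once, which is exactly the idempotency property described above.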

## Use multiple AWS accounts for managing quotas
<a name="multiple-accounts"></a>

Many [service quotas](gettingstarted-limits.md) in AWS are set at the account level. This means that as you add more workloads, you can quickly exhaust your limits.

An effective way to solve this issue is to use multiple AWS accounts, dedicating each workload to its own account. This prevents quotas from being shared with other workloads or non-production resources.

In addition, by using [AWS Organizations](https://aws.amazon.com/organizations/), you can centrally manage the billing, compliance, and security of these accounts. You can attach policies to groups of accounts to avoid custom scripts and manual processes.

One common approach is to provide each developer with an AWS account, and then use separate accounts for a beta deployment stage and production:

![\[application design figure 3\]](http://docs.aws.amazon.com/lambda/latest/dg/images/application-design-figure-3.png)


In this model, each developer has their own set of limits for the account, so their usage does not impact your production environment. This approach also allows developers to test Lambda functions locally on their development machines against live cloud resources in their individual accounts.