Lambda-based application design principles
A well-architected event-driven application uses a combination of AWS services and custom code to process and manage requests and data. This chapter focuses on Lambda-specific topics in application design. There are many important considerations for serverless architects when designing applications for busy production systems.
Many of the best practices that apply to software development and distributed systems also apply to serverless application development. The overall goal is to develop workloads that are:
- Reliable – offering your end users a high level of availability. AWS serverless services are reliable because they are also designed for failure.
- Durable – providing storage options that meet the durability needs of your workload.
- Secure – following best practices and using the tools provided to secure access to workloads and limit the blast radius.
- Performant – using computing resources efficiently and meeting the performance needs of your end users.
- Cost-efficient – designing architectures that avoid unnecessary cost, can scale without overspending, and can be decommissioned without significant overhead.
The following design principles can help you build workloads that meet these goals. Not every principle may apply to every architecture, but they should guide you in general architecture decisions.
Use services instead of custom code
Serverless applications usually comprise several AWS services, integrated with custom code run in Lambda functions. While Lambda can be integrated with most AWS services, the services most commonly used in serverless applications are:
| Category | AWS service |
|---|---|
| Compute | AWS Lambda |
| Data storage | Amazon S3, Amazon DynamoDB, Amazon RDS |
| API | Amazon API Gateway |
| Application integration | Amazon EventBridge, Amazon SNS, Amazon SQS |
| Orchestration | AWS Step Functions |
| Streaming data and analytics | Amazon Data Firehose |
There are many well-established, common patterns in distributed architectures that you can build yourself or implement using AWS services. For most customers, there is little commercial value in investing time to develop these patterns from scratch. When your application needs one of these patterns, use the corresponding AWS service:
| Pattern | AWS service |
|---|---|
| Queue | Amazon SQS |
| Event bus | Amazon EventBridge |
| Publish/subscribe (fan-out) | Amazon SNS |
| Orchestration | AWS Step Functions |
| API | Amazon API Gateway |
| Event streams | Amazon Kinesis |
These services are designed to integrate with Lambda, and you can use infrastructure as code (IaC) to create and discard resources in the services. You can use any of these services via the AWS SDK.
Understand Lambda abstraction levels
The Lambda service limits your access to the underlying operating systems, hypervisors, and hardware running your Lambda functions. The service continuously improves and changes infrastructure to add features, reduce cost and make the service more performant. Your code should assume no knowledge of how Lambda is architected and assume no hardware affinity.
Similarly, Lambda's integration with other AWS services is managed by AWS, with only a small number of configuration options exposed to you. For example, when API Gateway and Lambda interact, there is no concept of load balancing, since it is entirely managed by the services. You also have no direct control over which Availability Zones the services use.
This abstraction allows you to focus on the integration aspects of your application, the flow of data, and the business logic where your workload provides value to your end users. Allowing the services to manage the underlying mechanics helps you develop applications more quickly with less custom code to maintain.
Implement statelessness in functions
When building Lambda functions, you should assume that the execution environment exists only for a single invocation. The function should initialize any required state when it is first started. For example, your function may require fetching data from a DynamoDB table. It should commit any permanent data changes to a durable store such as Amazon S3, DynamoDB, or Amazon SQS before exiting. It should not rely on any existing data structures, temporary files, or internal state that would be maintained across multiple invocations.
To initialize database connections and libraries, or load state, you can take advantage of static initialization. Since execution environments are reused where possible to improve performance, you can amortize the time taken to initialize these resources over multiple invocations. However, you should not store any variables or data used in the function within this global scope.
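Static initialization can be sketched as follows. The lazy-caching pattern shown is a minimal illustration; the placeholder client stands in for a real resource such as a boto3 DynamoDB table, and the names are hypothetical.

```python
import os

# Module scope runs once per execution environment, not once per invocation,
# so expensive setup (SDK clients, connections) is amortized across reuses.
_client = None

def get_client():
    """Lazily create and cache a client for reuse across warm invocations."""
    global _client
    if _client is None:
        # In a real function this might be, for example:
        # _client = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
        _client = object()  # placeholder standing in for a real client
    return _client

def handler(event, context):
    client = get_client()
    # Per-invocation data stays local; nothing request-specific is cached globally.
    return {"reused_client": client is get_client()}
```

On a warm invocation, `get_client()` returns the already-created client instead of rebuilding it.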
Minimize coupling
Most architectures should prefer many, shorter functions over fewer, larger ones. The purpose of each function should be to handle the event passed into the function, with no knowledge or expectations of the overall workflow or volume of transactions. This makes the function agnostic to the source of the event with minimal coupling to other services.
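One common way to keep a function source-agnostic is to separate a thin handler from pure business logic. The sketch below uses hypothetical order-processing names; events from SQS or API Gateway would need their own small extraction step in the handler.

```python
def process_order(order: dict) -> dict:
    """Pure business logic: no knowledge of the event source or workflow."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"order_id": order["order_id"], "total": total}

def handler(event, context):
    """Thin adapter: extracts the payload and delegates to the logic function."""
    # Here the event is assumed to carry the order directly.
    return process_order(event)
```

Because `process_order` takes a plain dictionary, it can be unit tested and reused without any Lambda-specific scaffolding.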
Any global-scope constants that change infrequently should be implemented as environment variables to allow updates without deployments. Any secrets or sensitive information should be stored in AWS Systems Manager Parameter Store or AWS Secrets Manager.
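Reading configuration from environment variables might look like the sketch below. The variable names and defaults are illustrative, not prescribed by Lambda.

```python
import os

# Infrequently changing constants come from environment variables, so they
# can be updated in the function configuration without redeploying code.
TABLE_NAME = os.environ.get("TABLE_NAME", "orders")
MAX_RETRIES = int(os.environ.get("MAX_RETRIES", "3"))

def handler(event, context):
    # Secrets should not be kept in plaintext environment variables; fetch
    # them at initialization from Parameter Store or Secrets Manager, e.g.
    # boto3.client("ssm").get_parameter(Name=..., WithDecryption=True).
    return {"table": TABLE_NAME, "max_retries": MAX_RETRIES}
```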
Build for on-demand data instead of batches
Many traditional systems are designed to run periodically and process batches of transactions that have built up over time. For example, a banking application may run every hour to process ATM transactions into central ledgers. In Lambda-based applications, the custom processing should be triggered by every event, allowing the service to scale up concurrency as needed, to provide near-real time processing of transactions.
While you can run cron-style scheduled tasks in serverless applications by using scheduled rules in Amazon EventBridge, event-driven processing is usually a better fit for Lambda-based workloads.
For example, it’s not best practice to use a batch process that triggers a Lambda function to fetch a list of new Amazon S3 objects. This is because the service may receive more new objects in between batches than can be processed within the maximum 15-minute duration of a single Lambda invocation.
Instead, Amazon S3 should invoke the Lambda function each time a new object is put into the bucket. This approach is significantly more scalable and works in near-real time.
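A handler invoked by S3 notifications receives one or more records per event; a minimal sketch of parsing them is shown below. The processing step is a placeholder, but the event structure follows the standard S3 notification format.

```python
import urllib.parse

def handler(event, context):
    """Invoked by Amazon S3 per object notification, rather than in batches."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Real processing (e.g. reading and transforming the object) goes here.
        processed.append((bucket, key))
    return {"processed": processed}
```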
Consider AWS Step Functions for orchestration
Workflows that involve branching logic, different types of failure models, and retry logic typically use an orchestrator to keep track of the state of the overall execution. Avoid using Lambda functions for this purpose, since it results in tight coupling and complex code handling routing.
With AWS Step Functions, you can manage this orchestration in a state machine. This moves the error handling, retry, and branching logic out of your custom code and into a declarative workflow definition.
It’s common for simpler workflows in Lambda functions to become more complex over time. When operating a production serverless application, it’s important to identify when this is happening, so you can migrate this logic to a state machine.
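As a sketch of what such a state machine can express, the Amazon States Language definition below shows retry and branching handled declaratively rather than in function code. The workflow, state names, and ARN are hypothetical.

```json
{
  "Comment": "Hypothetical order workflow: retries and branching live in the state machine",
  "StartAt": "ProcessPayment",
  "States": {
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessPayment",
      "Retry": [
        { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0 }
      ],
      "Next": "CheckResult"
    },
    "CheckResult": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.status", "StringEquals": "APPROVED", "Next": "Fulfill" }
      ],
      "Default": "NotifyFailure"
    },
    "Fulfill": { "Type": "Succeed" },
    "NotifyFailure": { "Type": "Fail", "Error": "PaymentDeclined" }
  }
}
```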
Implement idempotency
AWS serverless services, including Lambda, are fault-tolerant and designed to handle failures. For example, if a service invokes a Lambda function and there is a service disruption, Lambda invokes your function in a different Availability Zone. If your function throws an error, Lambda retries the invocation.
Since the same event may be received more than once, functions should be designed to be idempotent, meaning that processing the same event multiple times produces the same result as processing it once.
You can implement idempotency in Lambda functions by using a DynamoDB table to track recently processed identifiers to determine if the transaction has already been handled previously. The DynamoDB table usually implements a Time To Live (TTL) value to expire items to limit the storage space used.
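The check-and-record pattern can be sketched as below. An in-memory dictionary stands in for the DynamoDB table to keep the example self-contained; in production this would be a conditional `PutItem` (with `attribute_not_exists` on the key) against a table whose items carry a TTL attribute. All names are illustrative.

```python
import time
from typing import Optional

# Stand-in for a DynamoDB table keyed on a transaction ID, where the stored
# value is the item's expiry time (what the TTL attribute would hold).
_seen = {}
TTL_SECONDS = 3600

def already_processed(transaction_id: str, now: Optional[float] = None) -> bool:
    """Return True if this ID was handled within the TTL window; else record it."""
    now = time.time() if now is None else now
    expires_at = _seen.get(transaction_id)
    if expires_at is not None and expires_at > now:
        return True
    _seen[transaction_id] = now + TTL_SECONDS
    return False

def handler(event, context):
    if already_processed(event["transaction_id"]):
        return {"status": "duplicate"}
    # ... perform the side effect exactly once ...
    return {"status": "processed"}
```

A retried delivery of the same event then short-circuits instead of repeating the side effect, while expired entries are treated as new transactions.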
Use multiple AWS accounts for managing quotas
Many service quotas in AWS are set at the account level. This means that as you add more workloads, you can quickly exhaust your limits.
An effective way to solve this issue is to use multiple AWS accounts, dedicating each workload to its own account. This prevents quotas from being shared with other workloads or non-production resources.
In addition, by using AWS Organizations, you can centrally manage the billing, compliance, and security of these accounts.
One common approach is to provide each developer with an AWS account, and then use separate accounts for a beta deployment stage and production.
In this model, each developer has their own set of limits for the account, so their usage does not impact your production environment. This approach also allows developers to test Lambda functions locally on their development machines against live cloud resources in their individual accounts.