Process events asynchronously with Amazon API Gateway and Amazon DynamoDB Streams - AWS Prescriptive Guidance

Process events asynchronously with Amazon API Gateway and Amazon DynamoDB Streams

Created by Andrea Meroni (AWS), Alessandro Trisolini (AWS), Nadim Majed (AWS), Mariem Kthiri (AWS), and Michael Wallner (AWS)

Code repository: Asynchronous Processing with API Gateway and DynamoDB Streams

Environment: PoC or pilot

Technologies: Serverless

AWS services: Amazon API Gateway; Amazon DynamoDB; Amazon DynamoDB Streams; AWS Lambda; Amazon SNS

Summary

Amazon API Gateway is a fully managed service that developers can use to create, publish, maintain, monitor, and secure APIs at any scale. It handles the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls.

An important service quota of API Gateway is the integration timeout: the maximum time that a backend service has to return a response before the REST API returns an error. The hard limit of 29 seconds is generally acceptable for synchronous workloads, but it poses a challenge for developers who want to use API Gateway with asynchronous workloads.

This pattern shows an example architecture for processing events asynchronously using API Gateway, Amazon DynamoDB Streams, and AWS Lambda. The architecture supports running parallel processing jobs with the same input parameters, and it uses a basic REST API as the interface. In this example, using Lambda as the backend limits the duration of jobs to 15 minutes. You can avoid this limit by using an alternative service to process incoming events (for example, AWS Fargate).

Projen is used together with the AWS Cloud Development Kit (AWS CDK) Toolkit, Docker, and Node.js to set up the local development environment and to deploy the example architecture to a target AWS account. Projen automatically sets up a Python virtual environment with pre-commit and the tools that are used for code quality assurance, security scanning, and unit testing. For more information, see the Tools section.

Prerequisites and limitations

Prerequisites

Limitations

  • To avoid throttling, the recommended maximum number of concurrent readers of a DynamoDB stream shard is two.

  • The maximum runtime of a job is limited by the maximum runtime for Lambda functions (15 minutes).

  • The maximum number of concurrent job requests is limited by the reserved concurrency of the Lambda functions.

Architecture

The following diagram shows the interaction of the jobs API with DynamoDB Streams and the event-processing and error-handling Lambda functions, with events stored in an Amazon EventBridge event archive.

Diagram of architecture and process, with steps listed after the diagram.

A typical workflow includes the following steps:

  1. You authenticate against AWS Identity and Access Management (IAM) and obtain security credentials.

  2. You send an HTTP POST request to the /jobs jobs API endpoint, specifying the job parameters in the request body.

  3. The jobs API returns to you an HTTP response that contains the job identifier.

  4. The jobs API puts the job parameters in the jobs_table Amazon DynamoDB table.

  5. The DynamoDB stream of the jobs_table DynamoDB table invokes the event-processing Lambda functions.

  6. The event-processing Lambda functions process the event and then put the job results in the jobs_table DynamoDB table. To help ensure consistent results, the event-processing functions implement an optimistic locking mechanism.

  7. You send an HTTP GET request to the /jobs/{jobId} jobs API endpoint, with the job identifier from step 3 as {jobId}.

  8. The jobs API queries the jobs_table DynamoDB table to retrieve the job results.

  9. The jobs API returns an HTTP response that contains the job results.

  10. If the event processing fails, the event-processing function's source mapping sends the event to the error-handling Amazon Simple Notification Service (Amazon SNS) topic.

  11. The error-handling SNS topic asynchronously pushes the event to the error-handling function.

  12. The error-handling function puts the job parameters in the jobs_table DynamoDB table.

    You can retrieve the job parameters by sending an HTTP GET request to the /jobs/{jobId} jobs API endpoint.

  13. If the error handling fails, the error-handling function sends the event to an Amazon EventBridge archive.

    You can replay the archived events by using EventBridge.

Tools

AWS services

  • AWS Cloud Development Kit (AWS CDK) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.

  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast, predictable, and scalable performance.

  • Amazon EventBridge is a serverless event bus service that helps you connect your applications with real-time data from a variety of sources, such as AWS Lambda functions, HTTP invocation endpoints using API destinations, or event buses in other AWS accounts.

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.

  • Amazon Simple Notification Service (Amazon SNS) helps you coordinate and manage the exchange of messages between publishers and clients, including web servers and email addresses.

Other tools

  • autopep8 automatically formats Python code based on the Python Enhancement Proposal (PEP) 8 style guide.

  • Bandit scans Python code to find common security issues.

  • Commitizen is a Git commit checker and CHANGELOG generator.

  • cfn-lint is an AWS CloudFormation linter.

  • Checkov is a static code-analysis tool that checks infrastructure as code (IaC) for security and compliance misconfigurations.

  • jq is a command-line tool for parsing JSON.

  • Postman is an API platform.

  • pre-commit is a Git hooks manager.

  • Projen is a project generator.

  • pytest is a Python framework for writing small, readable tests.

Code repository

The code for this example architecture is available in the GitHub Asynchronous Processing with API Gateway and DynamoDB Streams repository.

Best practices

  • This example architecture doesn't include monitoring of the deployed infrastructure. If your use case requires monitoring, evaluate adding CDK Monitoring Constructs or another monitoring solution.

  • This example architecture uses IAM permissions to control the access to the jobs API. Anyone authorized to assume the JobsAPIInvokeRole will be able to invoke the jobs API. As such, the access control mechanism is binary. If your use case requires a more complex authorization model, evaluate using a different access control mechanism.

  • When a user sends an HTTP POST request to the /jobs jobs API endpoint, the input data is validated at two different levels:

    • API Gateway performs the first request validation.

    • The event-processing function performs the second validation.

      No validation is performed when the user does an HTTP GET request to the /jobs/{jobId} jobs API endpoint. If your use case requires additional input validation and an increased level of security, evaluate using AWS WAF to protect your API.

  • To avoid throttling, the DynamoDB Streams documentation discourages reading from the same stream shard with more than two consumers. To scale out the number of consumers, we recommend using Amazon Kinesis Data Streams instead.

  • This example uses optimistic locking to help ensure consistent updates of items in the jobs_table DynamoDB table. Depending on your use case requirements, you might need a more robust locking mechanism, such as pessimistic locking.
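
The optimistic locking mentioned above (and in step 6 of the workflow) is typically implemented in DynamoDB as a conditional write on a version attribute. The sketch below shows one common way to build such an UpdateItem request; the table name and the attribute names are illustrative assumptions, not necessarily the repository's actual schema.

```python
def build_locked_update(job_id: str, results: str, expected_version: int) -> dict:
    """Build UpdateItem parameters that succeed only if the item still carries
    the version this writer originally read (optimistic locking)."""
    return {
        # Assumed table and attribute names, for illustration only.
        "TableName": "jobs_table",
        "Key": {"id": {"S": job_id}},
        "UpdateExpression": "SET #results = :results, #version = :new_version",
        # The write is rejected if another writer bumped the version first.
        "ConditionExpression": "#version = :expected_version",
        "ExpressionAttributeNames": {"#results": "job_results", "#version": "version"},
        "ExpressionAttributeValues": {
            ":results": {"S": results},
            ":expected_version": {"N": str(expected_version)},
            ":new_version": {"N": str(expected_version + 1)},
        },
    }

params = build_locked_update("job-42", "done", 3)
print(params["ConditionExpression"])  # prints #version = :expected_version
```

Passing these parameters to the boto3 DynamoDB client's update_item call raises ConditionalCheckFailedException when a concurrent writer updated the item first; the caller then rereads the item and retries with the new version.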

Epics

Task | Description | Skills required

Clone the repository.

To clone the repository locally, run the following command:

git clone https://github.com/aws-samples/asynchronous-event-processing-api-gateway-dynamodb-streams-cdk.git
DevOps engineer

Set up the project.

Change the directory to the repository root, and set up the Python virtual environment and all the tools by using Projen:

cd asynchronous-event-processing-api-gateway-dynamodb-streams-cdk
npx projen
DevOps engineer

Install pre-commit hooks.

To install pre-commit hooks, do the following:

  1. Activate the Python virtual environment:

    source .env/bin/activate
  2. Install the pre-commit hooks:

    pre-commit install
    pre-commit install --hook-type commit-msg
DevOps engineer
Task | Description | Skills required

Bootstrap AWS CDK.

To bootstrap AWS CDK in your AWS account, run the following command:

AWS_PROFILE=$YOUR_AWS_PROFILE npx projen bootstrap
AWS DevOps

Deploy the example architecture.

To deploy the example architecture in your AWS account, run the following command:

AWS_PROFILE=$YOUR_AWS_PROFILE npx projen deploy
AWS DevOps
Task | Description | Skills required

Install test prerequisites.

Install on your workstation the AWS Command Line Interface (AWS CLI), Postman, and jq.

Using Postman to test this example architecture is suggested but not mandatory. If you choose an alternative API testing tool, make sure that it supports AWS Signature Version 4 authentication. You can inspect the exposed API endpoints by exporting the REST API from API Gateway.

DevOps engineer
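
AWS Signature Version 4 signs every request with a key derived from the (temporary) secret access key. As a rough illustration of what an API testing tool must support, the following Python sketch shows only the documented signing-key derivation; the canonical-request and final-signature steps are omitted, and the credential values are placeholders.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key as documented by AWS: an HMAC chain over
    the date, the Region, the service, and the literal string 'aws4_request'."""
    k_date = hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = hmac_sha256(k_date, region)
    k_service = hmac_sha256(k_region, service)
    return hmac_sha256(k_service, "aws4_request")

# Placeholder credentials; in this pattern they come from the assume-role call.
# API Gateway REST API calls are signed for the execute-api service.
key = derive_signing_key("EXAMPLE_SECRET_KEY", "20250101", "eu-west-1", "execute-api")
print(key.hex())
```

The derived key is then used to HMAC the string to sign that is built from the canonical request, which produces the signature sent in the Authorization header.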

Assume the JobsAPIInvokeRole.

Assume the JobsAPIInvokeRole that was printed as output from the deploy command:

CREDENTIALS=$(AWS_PROFILE=$YOUR_AWS_PROFILE aws sts assume-role \
  --no-cli-pager \
  --role-arn $JOBS_API_INVOKE_ROLE_ARN \
  --role-session-name JobsAPIInvoke)
export AWS_ACCESS_KEY_ID=$(echo $CREDENTIALS | jq -r '.Credentials.AccessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo $CREDENTIALS | jq -r '.Credentials.SecretAccessKey')
export AWS_SESSION_TOKEN=$(echo $CREDENTIALS | jq -r '.Credentials.SessionToken')
AWS DevOps

Configure Postman.

  • To import the Postman collection that's included in the repository, follow the instructions in the Postman documentation.

  • Set the JobsAPI variables with the following values:

    • accessKey ‒ The value of the Credentials.AccessKeyId attribute from the assume-role command.

    • baseUrl ‒ The value of the JobsApiJobsAPIEndpoint output from the deploy command, without the trailing slash.

    • region ‒ The value of the AWS Region where you deployed the example architecture.

    • seconds ‒ The value of the input parameter for the example job. It must be a positive integer.

    • secretKey ‒ The value of the Credentials.SecretAccessKey attribute from the assume-role command.

    • sessionToken ‒ The value of the Credentials.SessionToken attribute from the assume-role command.

AWS DevOps

Test the example architecture.

To test the example architecture, send requests to the jobs API. For more information, see the Postman documentation.

DevOps engineer

Troubleshooting

Issue | Solution

Destruction and subsequent redeployment of the example architecture fails because the Amazon CloudWatch Logs log group /aws/apigateway/JobsAPIAccessLogs already exists.

  1. If necessary, export your log data to Amazon Simple Storage Service (Amazon S3).

  2. Delete the CloudWatch Logs log group /aws/apigateway/JobsAPIAccessLogs.

  3. Redeploy the example architecture.

Related resources