Frequently asked questions

Why is a function with reserved concurrency not scaling to meet incoming traffic?

Reserved concurrency on a Lambda function also acts as the maximum concurrency for that function. Raising the account-level soft limit on total concurrency does not affect this behavior. If you need a function with reserved concurrency to process more traffic, update the reserved concurrency value, which effectively increases the maximum throughput of your function.
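As a minimal sketch of updating the value programmatically, the helper below calls the `put_function_concurrency` operation on a boto3 Lambda client that you pass in. The helper name, the function name `orders-handler`, and the value 200 are hypothetical and only illustrate the call.

```python
def set_reserved_concurrency(lambda_client, function_name, new_limit):
    """Set reserved concurrency and return the value Lambda reports back.

    lambda_client is assumed to be a boto3 Lambda client, for example
    boto3.client("lambda"). It is passed in so the helper can be tested
    without AWS credentials.
    """
    if new_limit < 0:
        raise ValueError("reserved concurrency must be zero or greater")
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=new_limit,
    )
    return response["ReservedConcurrentExecutions"]

# Hypothetical usage:
#   import boto3
#   set_reserved_concurrency(boto3.client("lambda"), "orders-handler", 200)
```

The new value must still fit within the total concurrency available to your account, so a larger increase may also require raising the account limit.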

Why is a function with Provisioned Concurrency still experiencing cold starts?

A function using Provisioned Concurrency should not exhibit cold start behavior, because the execution environment is prepared ahead of invocation. However, Provisioned Concurrency must be applied to a specific version or alias of a function, not the $LATEST version. If you continue to see cold starts, make sure that you are invoking the version or alias that has Provisioned Concurrency configured. To measure cold starts as Lambda scales up, add X-Ray monitoring to your function.
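One way to avoid accidentally invoking $LATEST is to always pass an explicit qualifier. The sketch below wraps the boto3 `invoke` call and rejects a missing or `$LATEST` qualifier. The helper name and the alias name `live` are hypothetical.

```python
import json

def invoke_qualified(lambda_client, function_name, qualifier, payload):
    """Invoke a specific version or alias, never $LATEST.

    lambda_client is assumed to be a boto3 Lambda client. The qualifier
    should be the version number or alias that has Provisioned
    Concurrency configured.
    """
    if not qualifier or qualifier == "$LATEST":
        raise ValueError(
            "Provisioned Concurrency applies only to a version or alias; "
            "pass that qualifier explicitly"
        )
    return lambda_client.invoke(
        FunctionName=function_name,
        Qualifier=qualifier,
        Payload=json.dumps(payload).encode("utf-8"),
    )

# Hypothetical usage:
#   import boto3
#   invoke_qualified(boto3.client("lambda"), "orders-handler", "live", {"id": 1})
```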

What is the best runtime to use for my Lambda function?

The Lambda service is agnostic to your choice of runtime. For simple functions, interpreted languages like Python and Node.js tend to offer the fastest performance. For functions with more complex computation, compiled languages like Java are often slower to initialize but run quickly in the Lambda handler. Choice of runtime is also influenced by developer preference and language familiarity.

How do I make sure that the SDK version doesn’t change?

The embedded SDKs may change without notice as AWS releases new services and features. You can lock a version of the SDK by creating a Lambda layer with the specific version needed. The function then always uses the version in the layer, even if the version embedded in the service changes.
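After attaching a layer, you may want to confirm which SDK version the function actually loads. The sketch below reads the installed package metadata with the standard library and compares it to the version you pinned. It assumes the layer was built so that the package metadata is included alongside the SDK; the helper names and the example version string are hypothetical.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

def check_pinned_sdk(expected_version, package_name="boto3"):
    """Return (matches, found_version) for the pinned SDK package."""
    found = installed_version(package_name)
    return found == expected_version, found

# Hypothetical usage during function initialization:
#   matches, found = check_pinned_sdk("1.34.0")
#   if not matches:
#       print(f"Expected pinned SDK 1.34.0 but found {found}")
```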

How can I test that a Lambda-based application can scale to meet the expected traffic?

To ensure that your application scales as expected, use load testing in your development process to simulate the expected level of traffic. You can load test an application backend using open source tools like Artillery.
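To illustrate the idea behind a load test, the sketch below sends a fixed number of requests at a chosen level of concurrency and counts failures. It is not Artillery and is not a substitute for a dedicated load testing tool; it only uses the Python standard library. The `send_request` callable is an assumption: you supply a function that issues one request to your backend and raises an exception on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(send_request, total_requests, concurrency):
    """Call send_request total_requests times using `concurrency` worker threads."""

    def timed_call(_):
        # Time a single request and record whether it raised an exception.
        start = time.perf_counter()
        try:
            send_request()
            succeeded = True
        except Exception:
            succeeded = False
        return succeeded, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = [elapsed for _, elapsed in results]
    return {
        "requests": total_requests,
        "failures": sum(1 for succeeded, _ in results if not succeeded),
        "max_latency_seconds": max(latencies) if latencies else 0.0,
    }
```

Running the test against a non-production stage with traffic levels close to your expected peak helps reveal throttling and downstream bottlenecks before users encounter them.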

Which workloads are suitable for Provisioned Concurrency?

Provisioned Concurrency is designed to make functions available with double-digit millisecond response times. Generally, interactive workloads benefit the most from the feature. These are applications in which users initiate requests, such as web and mobile applications, and they are the most sensitive to latency. Asynchronous workloads, such as data processing pipelines, are often less latency sensitive and so do not usually need Provisioned Concurrency.

What is the difference between reserved concurrency and Provisioned Concurrency?

The unreserved concurrency pool is used by all on-demand Lambda functions. When a function is invoked, it draws from this pool. If unreserved concurrency reaches zero, new invocations for any Lambda function in an account will fail.

Reserved concurrency enables you to reserve a portion of this pool for a given function, to help ensure capacity is always available. It also caps the concurrency of the function, so it can be used to limit and smooth out traffic for some workloads.

Provisioned Concurrency configures execution environments to be available before invocation. It provides a way to virtually eliminate cold starts for latency-sensitive workloads. A function typically uses one of these two features but not both at the same time.
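As a minimal sketch of configuring Provisioned Concurrency programmatically, the helper below calls the `put_provisioned_concurrency_config` operation on a boto3 Lambda client that you pass in. The helper name, the function name `orders-handler`, the alias `live`, and the value 10 are hypothetical.

```python
def set_provisioned_concurrency(lambda_client, function_name, qualifier, environments):
    """Request prepared execution environments for a version or alias.

    lambda_client is assumed to be a boto3 Lambda client. The qualifier
    must be a published version or an alias, not $LATEST.
    """
    if not qualifier or qualifier == "$LATEST":
        raise ValueError("Provisioned Concurrency requires a version or alias")
    if environments < 1:
        raise ValueError("environments must be at least 1")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=environments,
    )

# Hypothetical usage:
#   import boto3
#   set_provisioned_concurrency(boto3.client("lambda"), "orders-handler", "live", 10)
```

The environments are not ready immediately after the call returns, so check the status of the configuration before relying on it for latency-sensitive traffic.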
