Monitoring for Lambda SnapStart

You can monitor your Lambda SnapStart functions using Amazon CloudWatch, AWS X-Ray, and the Lambda Telemetry API. For more information, see Accessing real-time telemetry data for extensions using the Telemetry API.

Note

The AWS_LAMBDA_LOG_GROUP_NAME and AWS_LAMBDA_LOG_STREAM_NAME environment variables are not available in Lambda SnapStart functions.
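
If your function code needs the log group or log stream name, it can read them from the invocation context object instead. The following is a minimal sketch that assumes a Java handler; the class name, input type, and return value are illustrative.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

// Hypothetical handler that reads the log group and log stream names from the
// invocation context instead of from the unavailable environment variables.
public class LoggingInfoHandler implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String input, Context context) {
        String logGroup = context.getLogGroupName();
        String logStream = context.getLogStreamName();

        context.getLogger().log("Log group: " + logGroup + ", log stream: " + logStream);
        return "ok";
    }
}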

Understanding logging and billing behavior with SnapStart

There are a few differences with the CloudWatch log stream format for SnapStart functions:

  • Initialization logs – When a new execution environment is created, the REPORT doesn't include the Init Duration field. That's because Lambda initializes SnapStart functions when you create a version instead of during function invocation. For SnapStart functions, the Init Duration field is in the INIT_REPORT record. This record shows duration details for the Init phase, including the duration of any beforeCheckpoint runtime hooks (see the runtime hook sketch after this list).

  • Invocation logs – When a new execution environment is created, the REPORT includes the Restore Duration and Billed Restore Duration fields:

    • Restore Duration: The time it takes for Lambda to restore a snapshot, load the runtime (JVM), and run any afterRestore runtime hooks. The process of restoring snapshots can include time spent on activities outside the microVM. This time is reported in Restore Duration.

    • Billed Restore Duration: The time it takes for Lambda to load the runtime (JVM) and run any afterRestore hooks.
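
The following is a minimal sketch of the runtime hooks referenced above, assuming a Java function that uses the org.crac package. The class name and the (empty) hook bodies are illustrative.

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

// Hypothetical resource whose hooks run around the SnapStart snapshot.
// Time spent in beforeCheckpoint is reported in INIT_REPORT; time spent in
// afterRestore is reported in Restore Duration.
public class SnapStartHooks implements Resource {

    public SnapStartHooks() {
        // Register this resource so that Lambda calls the hooks.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Runs during the Init phase, before Lambda takes the snapshot.
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Runs each time an execution environment is restored from the snapshot.
    }
}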

Note

As with all Lambda functions, duration charges apply to code that runs in the function handler. For SnapStart functions, duration charges also apply to initialization code that's declared outside of the handler, the time it takes for the runtime to load, and any code that runs in a runtime hook.

The cold start duration is the sum of Restore Duration and Duration.
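
For example, a REPORT record for an invocation that required a restore might look like the following (the values and the exact set of fields shown are illustrative):

REPORT RequestId: 11111111-2222-3333-4444-555555555555 Duration: 45.83 ms Billed Duration: 46 ms Memory Size: 1024 MB Max Memory Used: 130 MB Restore Duration: 420.50 ms Billed Restore Duration: 180 ms

In this example, the cold start duration is approximately 420.50 ms + 45.83 ms, or about 466 ms.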

The following example is a Lambda Insights query that returns the latency percentiles for SnapStart functions. For more information about Lambda Insights queries, see Example workflow using queries to troubleshoot a function.

filter @type = "REPORT" | parse @log /\d+:\/aws\/lambda\/(?<function>.*)/ | parse @message /Restore Duration: (?<restoreDuration>.*?) ms/ | stats count(*) as invocations, pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 50) as p50, pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 90) as p90, pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 99) as p99, pct(@duration+coalesce(@initDuration,0)+coalesce(restoreDuration,0), 99.9) as p99.9 group by function, (ispresent(@initDuration) or ispresent(restoreDuration)) as coldstart | sort by coldstart desc

X-Ray active tracing for SnapStart

You can use X-Ray to trace requests to Lambda SnapStart functions. There are a few differences with the X-Ray subsegments for SnapStart functions:

  • There is no Initialization subsegment for SnapStart functions.

  • The Restore subsegment shows the time it takes for Lambda to restore a snapshot, load the runtime (JVM), and run any afterRestore runtime hooks. The process of restoring snapshots can include time spent on activities outside the microVM. This time is reported in the Restore subsegment. You aren't charged for the time spent outside the microVM to restore a snapshot.
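
X-Ray tracing for a function is typically turned on by enabling active tracing. One way to do that is with the AWS CLI; in this sketch, the function name is a placeholder:

aws lambda update-function-configuration --function-name my-snapstart-function --tracing-config Mode=Active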

Telemetry API events for SnapStart

Lambda sends the following SnapStart events to the Telemetry API:

  • platform.restoreStart – Shows the time when the Restore phase started.

  • platform.restoreRuntimeDone – Shows whether the Restore phase was successful. Lambda sends this message when the runtime sends a restore/next runtime API request. There are three possible statuses: success, failure, and timeout.

  • platform.restoreReport – Shows how long the Restore phase lasted and how many milliseconds you were billed for during this phase.
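
As a rough sketch, a platform.restoreRuntimeDone event delivered to an extension might look like the following. The event type and status value come from the description above; the envelope fields (time, record) follow the general Telemetry API event shape, and any other details are assumptions, so consult the Telemetry API event schema for the authoritative format.

{
    "time": "2025-01-01T12:00:00.000Z",
    "type": "platform.restoreRuntimeDone",
    "record": {
        "status": "success"
    }
}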

Amazon API Gateway and function URL metrics

If you create a web API using API Gateway, then you can use the IntegrationLatency metric to measure end-to-end latency (the time between when API Gateway relays a request to the backend and when it receives a response from the backend).

If you're using a Lambda function URL, then you can use the UrlRequestLatency metric to measure end-to-end latency (the time between when the function URL receives a request and when the function URL returns a response).
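
For example, you can retrieve UrlRequestLatency with the AWS CLI. This is a sketch that assumes the metric is published in the AWS/Lambda namespace with a FunctionName dimension; the function name and time range are placeholders.

aws cloudwatch get-metric-statistics --namespace AWS/Lambda --metric-name UrlRequestLatency --dimensions Name=FunctionName,Value=my-snapstart-function --start-time 2025-01-01T00:00:00Z --end-time 2025-01-01T01:00:00Z --period 300 --statistics Average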