Metrics for multi-container endpoints with direct invocation - Amazon SageMaker

Metrics for multi-container endpoints with direct invocation

In addition to the endpoint metrics that are listed in Metrics for monitoring Amazon SageMaker with Amazon CloudWatch, SageMaker also provides per-container metrics.

Per-container metrics for multi-container endpoints with direct invocation are located in CloudWatch and categorized into two namespaces: AWS/SageMaker and aws/sagemaker/Endpoints. The AWS/SageMaker namespace includes invocation-related metrics, and the aws/sagemaker/Endpoints namespace includes memory and CPU utilization metrics.

The following table lists the per-container metrics for multi-container endpoints with direct invocation. All the metrics use the [EndpointName, VariantName, ContainerName] dimension, which filters metrics at a specific endpoint, for a specific variant and corresponding to a specific container. These metrics share the same metric names as in those for inference pipelines, but at a per-container level [EndpointName, VariantName, ContainerName].

Metric Name Description Dimension NameSpace
Invocations The number of InvokeEndpoint requests sent to a container inside an endpoint. To get the total number of requests sent to that container, use the Sum statistic. Units: None Valid statistics: Sum, Sample Count EndpointName, VariantName, ContainerName AWS/SageMaker
Invocation4XX Errors The number of InvokeEndpoint requests that the model returned a 4xx HTTP response code for on a specific container. For each 4xx response, SageMaker sends a 1. Units: None Valid statistics: Average, Sum EndpointName, VariantName, ContainerName AWS/SageMaker
Invocation5XX Errors The number of InvokeEndpoint requests that the model returned a 5xx HTTP response code for on a specific container. For each 5xx response, SageMaker sends a 1. Units: None Valid statistics: Average, Sum EndpointName, VariantName, ContainerName AWS/SageMaker
ContainerLatency The time it took for the target container to respond as viewed from SageMaker. ContainerLatency includes the time it took to send the request, to fetch the response from the model's container, and to complete inference in the container. Units: Microseconds Valid statistics: Average, Sum, Min, Max, Sample Count EndpointName, VariantName, ContainerName AWS/SageMaker
OverheadLatency The time added to the time taken to respond to a client request by SageMaker for overhead. OverheadLatency is measured from the time that SageMaker receives the request until it returns a response to the client, minus theModelLatency. Overhead latency can vary depending on request and response payload sizes, request frequency, and authentication or authorization of the request, among other factors. Units: Microseconds Valid statistics: Average, Sum, Min, Max, `Sample Count ` EndpointName, VariantName, ContainerName AWS/SageMaker
CPUUtilization The percentage of CPU units that are used by each container running on an instance. The value ranges from 0% to 100%, and is multiplied by the number of CPUs. For example, if there are four CPUs, CPUUtilization can range from 0% to 400%. For endpoints with direct invocation, the number of CPUUtilization metrics equals the number of containers in that endpoint. Units: Percent EndpointName, VariantName, ContainerName aws/sagemaker/Endpoints
MemoryUtilizaton The percentage of memory that is used by each container running on an instance. This value ranges from 0% to 100%. Similar as CPUUtilization, in endpoints with direct invocation, the number of MemoryUtilization metrics equals the number of containers in that endpoint. Units: Percent EndpointName, VariantName, ContainerName aws/sagemaker/Endpoints

All the metrics in the previous table are specific to multi-container endpoints with direct invocation. Besides these special per-container metrics, there are also metrics at the variant level with dimension [EndpointName, VariantName] for all the metrics in the table expect ContainerLatency.