Metrics for multi-container endpoints with direct invocation

In addition to the endpoint metrics that are listed in Metrics for monitoring Amazon SageMaker with Amazon CloudWatch, SageMaker also provides per-container metrics.

Per-container metrics for multi-container endpoints with direct invocation are located in CloudWatch and categorized into two namespaces: AWS/SageMaker and aws/sagemaker/Endpoints. The AWS/SageMaker namespace includes invocation-related metrics, and the aws/sagemaker/Endpoints namespace includes memory and CPU utilization metrics.
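The split between the two namespaces can be written as a simple lookup. This is a minimal sketch, not part of any SageMaker API. `namespace_for` is a hypothetical helper. The metric names are the CloudWatch spellings, without spaces, of the metrics listed in the table below. The sketch also assumes that CloudWatch spells the utilization namespace with a leading slash, as `/aws/sagemaker/Endpoints`.

```python
"""Look up which CloudWatch namespace holds a per-container metric.

A minimal sketch: `namespace_for` is a hypothetical helper, and the metric
names are the CloudWatch spellings of the per-container metrics.
"""

# Invocation-related metrics are published under AWS/SageMaker.
INVOCATION_METRICS = {
    "Invocations",
    "Invocation4XXErrors",
    "Invocation5XXErrors",
    "ContainerLatency",
    "OverheadLatency",
}

# CPU and memory utilization metrics are published under the endpoint
# namespace. Assumption: CloudWatch spells it with a leading slash.
UTILIZATION_METRICS = {"CPUUtilization", "MemoryUtilization"}


def namespace_for(metric_name: str) -> str:
    """Return the CloudWatch namespace for a per-container metric name."""
    if metric_name in INVOCATION_METRICS:
        return "AWS/SageMaker"
    if metric_name in UTILIZATION_METRICS:
        return "/aws/sagemaker/Endpoints"
    raise ValueError(f"not a per-container metric: {metric_name!r}")


if __name__ == "__main__":
    for name in ("Invocations", "CPUUtilization"):
        print(name, "->", namespace_for(name))
```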

The following table lists the per-container metrics for multi-container endpoints with direct invocation. All of the metrics use the [EndpointName, VariantName, ContainerName] dimension set. This set scopes each metric to one container, in one production variant, of one endpoint. The metric names are the same as those used for inference pipelines, but the values are reported per container.

Metric name	Description	Dimensions	Namespace
`Invocations`	The number of `InvokeEndpoint` requests sent to a container inside an endpoint. To get the total number of requests sent to that container, use the `Sum` statistic. Units: None. Valid statistics: `Sum`, `SampleCount`	`EndpointName`, `VariantName`, `ContainerName`	`AWS/SageMaker`
`Invocation4XXErrors`	The number of `InvokeEndpoint` requests for which the model in a specific container returned a `4xx` HTTP response code. For each `4xx` response, SageMaker sends a `1`. Units: None. Valid statistics: `Average`, `Sum`	`EndpointName`, `VariantName`, `ContainerName`	`AWS/SageMaker`
`Invocation5XXErrors`	The number of `InvokeEndpoint` requests for which the model in a specific container returned a `5xx` HTTP response code. For each `5xx` response, SageMaker sends a `1`. Units: None. Valid statistics: `Average`, `Sum`	`EndpointName`, `VariantName`, `ContainerName`	`AWS/SageMaker`
`ContainerLatency`	The time it took for the target container to respond, as viewed from SageMaker. `ContainerLatency` includes the time it took to send the request, to fetch the response from the model's container, and to complete inference in the container. Units: Microseconds. Valid statistics: `Average`, `Sum`, `Min`, `Max`, `SampleCount`	`EndpointName`, `VariantName`, `ContainerName`	`AWS/SageMaker`
`OverheadLatency`	The time that SageMaker adds as overhead to the time taken to respond to a client request. `OverheadLatency` is measured from the time that SageMaker receives the request until it returns a response to the client, minus the `ModelLatency`. Overhead latency can vary depending on request and response payload sizes, request frequency, and authentication or authorization of the request, among other factors. Units: Microseconds. Valid statistics: `Average`, `Sum`, `Min`, `Max`, `SampleCount`	`EndpointName`, `VariantName`, `ContainerName`	`AWS/SageMaker`
`CPUUtilization`	The percentage of CPU units used by each container running on an instance. The value ranges from 0% to 100% per CPU and is multiplied by the number of CPUs. For example, with four CPUs, `CPUUtilization` can range from 0% to 400%. For endpoints with direct invocation, the number of `CPUUtilization` metrics equals the number of containers in the endpoint. Units: Percent	`EndpointName`, `VariantName`, `ContainerName`	`aws/sagemaker/Endpoints`
`MemoryUtilization`	The percentage of memory used by each container running on an instance. The value ranges from 0% to 100%. As with `CPUUtilization`, for endpoints with direct invocation, the number of `MemoryUtilization` metrics equals the number of containers in the endpoint. Units: Percent	`EndpointName`, `VariantName`, `ContainerName`	`aws/sagemaker/Endpoints`
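Two quantities are commonly derived from these metrics. The first is `CPUUtilization` rescaled to a 0% to 100% range by dividing by the number of CPUs. The second is an error rate: the summed errors divided by the summed invocations over the same period. The sketch below assumes you have already retrieved the values from CloudWatch. Both function names are hypothetical. The error rate also assumes that every request that returned an error is counted in `Invocations`.

```python
"""Derive normalized CPU usage and an error rate from per-container metrics.

A minimal sketch: both function names are hypothetical, and the inputs are
assumed to be values already fetched from CloudWatch.
"""


def normalized_cpu_percent(cpu_utilization: float, num_cpus: int) -> float:
    """Rescale CPUUtilization (0% to num_cpus * 100%) to a 0% to 100% range."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return cpu_utilization / num_cpus


def error_rate(error_sum: float, invocation_sum: float) -> float:
    """Fraction of a container's requests that returned an error.

    error_sum is the Sum of Invocation4XXErrors (or Invocation5XXErrors) and
    invocation_sum is the Sum of Invocations over the same period.
    Assumption: errored requests are also counted in Invocations.
    """
    if invocation_sum == 0:
        # No traffic in the period, so report no errors.
        return 0.0
    return error_sum / invocation_sum


if __name__ == "__main__":
    # A four-CPU instance reporting 250% CPUUtilization.
    print(normalized_cpu_percent(250.0, 4))
    # 3 errors out of 1,200 requests.
    print(error_rate(3, 1200))
```

For example, a `CPUUtilization` of 250% on a four-CPU instance corresponds to 62.5% of the instance's total CPU capacity.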

All of the metrics in the previous table are specific to multi-container endpoints with direct invocation. In addition to these per-container metrics, every metric in the table except `ContainerLatency` is also published at the variant level, with the dimension set [EndpointName, VariantName].
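The container-level and variant-level dimension sets differ only in whether `ContainerName` is included. The following sketch builds the keyword arguments for the boto3 CloudWatch client's `get_metric_statistics` method at either level. The helper `metric_request` and the endpoint, variant, and container names are hypothetical placeholders. Note that the CloudWatch API spells the statistics `SampleCount`, `Average`, `Sum`, `Minimum`, and `Maximum`.

```python
"""Build CloudWatch GetMetricStatistics requests for SageMaker metrics at the
container level or the variant level.

A minimal sketch: `metric_request` is a hypothetical helper, and the endpoint,
variant, and container names are placeholders. The returned dict matches the
keyword arguments of boto3's CloudWatch client get_metric_statistics().
"""
from datetime import datetime, timedelta, timezone


def metric_request(namespace, metric_name, endpoint, variant,
                   container=None, period_s=300, hours=1,
                   statistics=("Sum",)):
    """Return keyword arguments for get_metric_statistics().

    With `container` set, the query uses the per-container dimension set
    [EndpointName, VariantName, ContainerName]; without it, the variant-level
    set [EndpointName, VariantName].
    """
    if container is None and metric_name == "ContainerLatency":
        # ContainerLatency is not published at the variant level.
        raise ValueError("ContainerLatency requires a ContainerName")
    dimensions = [
        {"Name": "EndpointName", "Value": endpoint},
        {"Name": "VariantName", "Value": variant},
    ]
    if container is not None:
        dimensions.append({"Name": "ContainerName", "Value": container})
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,
        "Statistics": list(statistics),
    }


if __name__ == "__main__":
    per_container = metric_request(
        "AWS/SageMaker", "Invocations",
        endpoint="my-endpoint", variant="AllTraffic", container="container-1",
    )
    per_variant = metric_request(
        "AWS/SageMaker", "Invocations",
        endpoint="my-endpoint", variant="AllTraffic",
    )
    print([d["Name"] for d in per_container["Dimensions"]])
    print([d["Name"] for d in per_variant["Dimensions"]])
    # Sending a request requires AWS credentials:
    #   import boto3
    #   stats = boto3.client("cloudwatch").get_metric_statistics(**per_container)
```

Returning a plain dictionary keeps the sketch testable without AWS credentials. To send the query, pass it to `get_metric_statistics(**request)`.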
