Metrics for multi-container endpoints with direct invocation
In addition to the endpoint metrics that are listed in Metrics for monitoring Amazon SageMaker with Amazon CloudWatch, SageMaker also provides per-container metrics.
Per-container metrics for multi-container endpoints with direct invocation are located in CloudWatch and categorized into two namespaces: AWS/SageMaker and aws/sagemaker/Endpoints. The AWS/SageMaker namespace includes invocation-related metrics, and the aws/sagemaker/Endpoints namespace includes memory and CPU utilization metrics.
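As a minimal sketch of the namespace split described above, the following Python helper maps a metric name to the namespace it is published under. The helper name `namespace_for` and the constant names are hypothetical, and the namespace strings are copied from this section as written; confirm their exact spelling in your CloudWatch console before relying on them.

```python
# Namespace strings are copied from the text of this section.
# Assumption: verify the exact spelling in your CloudWatch console.
INVOCATION_NAMESPACE = "AWS/SageMaker"
UTILIZATION_NAMESPACE = "aws/sagemaker/Endpoints"

# Invocation-related per-container metrics.
INVOCATION_METRICS = {
    "Invocations",
    "Invocation4XXErrors",
    "Invocation5XXErrors",
    "ContainerLatency",
    "OverheadLatency",
}

# Memory and CPU utilization per-container metrics.
UTILIZATION_METRICS = {"CPUUtilization", "MemoryUtilization"}


def namespace_for(metric_name: str) -> str:
    """Return the CloudWatch namespace for a per-container metric name."""
    if metric_name in INVOCATION_METRICS:
        return INVOCATION_NAMESPACE
    if metric_name in UTILIZATION_METRICS:
        return UTILIZATION_NAMESPACE
    raise ValueError(f"Unknown per-container metric: {metric_name!r}")
```

For example, `namespace_for("ContainerLatency")` returns the invocation namespace, while `namespace_for("MemoryUtilization")` returns the utilization namespace.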
The following table lists the per-container metrics for multi-container endpoints with direct invocation. All the metrics use the [EndpointName, VariantName, ContainerName] dimensions, which scope each metric to a specific container, for a specific variant, on a specific endpoint. These metrics share the same names as those for inference pipelines, but are reported at the per-container level [EndpointName, VariantName, ContainerName].
Metric Name | Description | Dimensions | Namespace |
Invocations | The number of InvokeEndpoint requests sent to a container inside an endpoint. To get the total number of requests sent to that container, use the Sum statistic. Units: None. Valid statistics: Sum, Sample Count | EndpointName, VariantName, ContainerName | AWS/SageMaker |
Invocation4XXErrors | The number of InvokeEndpoint requests for which the model returned a 4xx HTTP response code on a specific container. For each 4xx response, SageMaker sends a 1. Units: None. Valid statistics: Average, Sum | EndpointName, VariantName, ContainerName | AWS/SageMaker |
Invocation5XXErrors | The number of InvokeEndpoint requests for which the model returned a 5xx HTTP response code on a specific container. For each 5xx response, SageMaker sends a 1. Units: None. Valid statistics: Average, Sum | EndpointName, VariantName, ContainerName | AWS/SageMaker |
ContainerLatency | The time it took for the target container to respond as viewed from SageMaker. ContainerLatency includes the time it took to send the request, to fetch the response from the model's container, and to complete inference in the container. Units: Microseconds. Valid statistics: Average, Sum, Min, Max, Sample Count | EndpointName, VariantName, ContainerName | AWS/SageMaker |
OverheadLatency | The time that SageMaker overhead adds to the time taken to respond to a client request. OverheadLatency is measured from the time that SageMaker receives the request until it returns a response to the client, minus the ModelLatency. Overhead latency can vary depending on request and response payload sizes, request frequency, and authentication or authorization of the request, among other factors. Units: Microseconds. Valid statistics: Average, Sum, Min, Max, Sample Count | EndpointName, VariantName, ContainerName | AWS/SageMaker |
CPUUtilization | The percentage of CPU units that are used by each container running on an instance. The value ranges from 0% to 100%, and is multiplied by the number of CPUs. For example, if there are four CPUs, CPUUtilization can range from 0% to 400%. For endpoints with direct invocation, the number of CPUUtilization metrics equals the number of containers in that endpoint. Units: Percent | EndpointName, VariantName, ContainerName | aws/sagemaker/Endpoints |
MemoryUtilization | The percentage of memory that is used by each container running on an instance. This value ranges from 0% to 100%. As with CPUUtilization, in endpoints with direct invocation, the number of MemoryUtilization metrics equals the number of containers in that endpoint. Units: Percent | EndpointName, VariantName, ContainerName | aws/sagemaker/Endpoints |
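The units in the table can be awkward to read directly: CPUUtilization is scaled by the number of CPUs, and the latency metrics are reported in microseconds. The following Python sketch shows one way to convert such values after retrieving them. The function names are hypothetical, and the CPU count is assumed to be known from the instance type rather than reported by the metric itself.

```python
def per_cpu_utilization(cpu_utilization_percent: float, cpu_count: int) -> float:
    """Rescale a CPUUtilization value (0% to 100% x cpu_count) to a 0-100% range.

    Assumption: cpu_count is supplied by the caller, for example from the
    instance type's vCPU count.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be a positive integer")
    return cpu_utilization_percent / cpu_count


def microseconds_to_milliseconds(value_us: float) -> float:
    """Convert a latency value such as ContainerLatency from microseconds to milliseconds."""
    return value_us / 1000.0
```

For instance, a CPUUtilization reading of 300% on a four-CPU instance corresponds to 75% of the total CPU capacity, and a ContainerLatency of 2500 microseconds is 2.5 milliseconds.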
All the metrics in the previous table are specific to multi-container endpoints with direct invocation. Besides these per-container metrics, there are also variant-level metrics with the dimensions [EndpointName, VariantName] for all the metrics in the table except ContainerLatency.
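To illustrate the difference between the per-container and variant-level dimensions, here is a minimal Python sketch that builds the keyword arguments for a CloudWatch `GetMetricStatistics` request. The function name `build_metric_request`, its defaults, and the endpoint, variant, and container names in the usage note are hypothetical. The namespace is passed in by the caller, and ContainerLatency is rejected at the variant level because, per the text above, it is only available per container.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_metric_request(
    namespace: str,
    metric_name: str,
    endpoint_name: str,
    variant_name: str,
    container_name: Optional[str] = None,
    statistic: str = "Sum",
    period_seconds: int = 60,
    lookback: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> dict:
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    With container_name set, the request uses the per-container dimensions
    [EndpointName, VariantName, ContainerName]; without it, the
    variant-level dimensions [EndpointName, VariantName].
    """
    if container_name is None and metric_name == "ContainerLatency":
        raise ValueError("ContainerLatency has no variant-level metric")

    dimensions = [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ]
    if container_name is not None:
        dimensions.append({"Name": "ContainerName", "Value": container_name})

    end_time = now or datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end_time - lookback,
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": [statistic],
    }
```

Assuming boto3 is installed and credentials are configured, the result can be passed to CloudWatch as `boto3.client("cloudwatch").get_metric_statistics(**build_metric_request("AWS/SageMaker", "Invocations", "my-endpoint", "AllTraffic", container_name="container-a"))`. Omitting `container_name` requests the variant-level metric instead.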