Inference Pipeline Logs and Metrics

Monitoring is important for maintaining the reliability, availability, and performance of Amazon SageMaker resources. To monitor and troubleshoot inference pipeline performance, use Amazon CloudWatch logs and error messages. For information about the monitoring tools that SageMaker provides, see Tools for monitoring the AWS resources provisioned while using Amazon SageMaker.

Use Metrics to Monitor Multi-container Models

To monitor the multi-container models in Inference Pipelines, use Amazon CloudWatch. CloudWatch collects raw data and processes it into readable, near real-time metrics. SageMaker training jobs and endpoints write CloudWatch metrics and logs in the AWS/SageMaker namespace.

The following tables list the metrics and dimensions for the following:

  • Endpoint invocations

  • Training jobs, batch transform jobs, and endpoint instances

A dimension is a name/value pair that uniquely identifies a metric. You can assign up to 10 dimensions to a metric. For more information on monitoring with CloudWatch, see Metrics for monitoring Amazon SageMaker with Amazon CloudWatch.
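For example, the following is a minimal boto3 sketch that lists the metrics and dimensions CloudWatch has recorded in the AWS/SageMaker namespace for one endpoint. The endpoint name is a placeholder; substitute your own.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# List the metrics recorded in the AWS/SageMaker namespace for one endpoint.
# "my-pipeline-endpoint" is a placeholder endpoint name.
paginator = cloudwatch.get_paginator("list_metrics")
pages = paginator.paginate(
    Namespace="AWS/SageMaker",
    Dimensions=[{"Name": "EndpointName", "Value": "my-pipeline-endpoint"}],
)

for page in pages:
    for metric in page["Metrics"]:
        dims = ", ".join(f"{d['Name']}={d['Value']}" for d in metric["Dimensions"])
        print(f"{metric['MetricName']}: {dims}")
```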

Endpoint Invocation Metrics

The AWS/SageMaker namespace includes the following request metrics from calls to InvokeEndpoint.

Metrics are reported at 1-minute intervals.

Invocation4XXErrors

The number of InvokeEndpoint requests for which the model returned a 4xx HTTP response code. For each 4xx response, SageMaker sends a 1.

Units: None

Valid statistics: Average, Sum

Invocation5XXErrors

The number of InvokeEndpoint requests for which the model returned a 5xx HTTP response code. For each 5xx response, SageMaker sends a 1.

Units: None

Valid statistics: Average, Sum

Invocations

The number of InvokeEndpoint requests sent to a model endpoint.

To get the total number of requests sent to a model endpoint, use the Sum statistic.

Units: None

Valid statistics: Sum, Sample Count

InvocationsPerInstance

The number of endpoint invocations sent to a model, normalized by InstanceCount in each ProductionVariant. SageMaker sends 1/numberOfInstances as the value for each request, where numberOfInstances is the number of active instances for the ProductionVariant at the endpoint at the time of the request.

Units: None

Valid statistics: Sum

ModelLatency

The time the model or models took to respond. This includes the time it took to send the request, to fetch the response from the model container, and to complete the inference in the container. ModelLatency is the total time taken by all containers in an inference pipeline.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count

OverheadLatency

The time added to the time taken to respond to a client request by SageMaker for overhead. OverheadLatency is measured from the time that SageMaker receives the request until it returns a response to the client, minus the ModelLatency. Overhead latency can vary depending on request and response payload sizes, request frequency, and authentication or authorization of the request, among other factors.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count

ContainerLatency

The time it took for an Inference Pipelines container to respond as viewed from SageMaker. ContainerLatency includes the time it took to send the request, to fetch the response from the model's container, and to complete inference in the container.

Units: Microseconds

Valid statistics: Average, Sum, Min, Max, Sample Count
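As an illustration, the following sketch uses boto3 to pull the Sum of Invocations and the Average ModelLatency for the last hour. The endpoint and variant names are placeholders, not values from this guide.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Placeholder endpoint and variant names.
dimensions = [
    {"Name": "EndpointName", "Value": "my-pipeline-endpoint"},
    {"Name": "VariantName", "Value": "AllTraffic"},
]

invocations = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="Invocations",
    Dimensions=dimensions,
    StartTime=start,
    EndTime=end,
    Period=60,               # metrics are reported at 1-minute intervals
    Statistics=["Sum"],
)

latency = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=dimensions,
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Average"],  # ModelLatency is reported in microseconds
)

for point in sorted(invocations["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], "invocations:", point["Sum"])

for point in sorted(latency["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], "model latency (microseconds):", point["Average"])
```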

Dimensions for Endpoint Invocation Metrics

EndpointName, VariantName, ContainerName

Filters endpoint invocation metrics for a ProductionVariant at the specified endpoint and for the specified variant.

For an inference pipeline endpoint, CloudWatch lists per-container latency metrics in your account as Endpoint Container Metrics and Endpoint Variant Metrics in the SageMaker namespace, as follows. The ContainerLatency metric appears only for inference pipelines.

The CloudWatch dashboard for an inference pipeline.

For each endpoint and each container, latency metrics display names for the container, endpoint, variant, and metric.

The latency metrics for an endpoint.
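To read a single container's latency, you can add the ContainerName dimension to the query. The following is a sketch in which the endpoint, variant, and container names are placeholders; if you didn't name the containers, use container-1, container-2, and so on.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# All three dimension values are placeholders. ContainerName is the name given
# in the SageMaker model, or container-1, container-2, ... if none was provided.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ContainerLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-pipeline-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
        {"Name": "ContainerName", "Value": "container-1"},
    ],
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Average", "Maximum"],  # microseconds
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```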

Training Job, Batch Transform Job, and Endpoint Instance Metrics

The namespaces /aws/sagemaker/TrainingJobs, /aws/sagemaker/TransformJobs, and /aws/sagemaker/Endpoints include the following metrics for training jobs and endpoint instances.

Metrics are reported at 1-minute intervals.

CPUUtilization

The percentage of CPU units that are used by the containers running on an instance. The value ranges from 0% to 100%, and is multiplied by the number of CPUs. For example, if there are four CPUs, CPUUtilization can range from 0% to 400%.

For training jobs, CPUUtilization is the CPU utilization of the algorithm container running on the instance.

For batch transform jobs, CPUUtilization is the CPU utilization of the transform container running on the instance.

For multi-container models, CPUUtilization is the sum of CPU utilization by all containers running on the instance.

For endpoint variants, CPUUtilization is the sum of CPU utilization by all of the containers running on the instance.

Units: Percent

MemoryUtilization

The percentage of memory that is used by the containers running on an instance. This value ranges from 0% to 100%.

For training jobs, MemoryUtilization is the memory used by the algorithm container running on the instance.

For batch transform jobs, MemoryUtilization is the memory used by the transform container running on the instance.

For multi-container models, MemoryUtilization is the sum of memory used by all containers running on the instance.

For endpoint variants, MemoryUtilization is the sum of memory used by all of the containers running on the instance.

Units: Percent

GPUUtilization

The percentage of GPU units that are used by the containers running on an instance. GPUUtilization ranges from 0% to 100% and is multiplied by the number of GPUs. For example, if there are four GPUs, GPUUtilization can range from 0% to 400%.

For training jobs, GPUUtilization is the GPU utilization of the algorithm container running on the instance.

For batch transform jobs, GPUUtilization is the GPU utilization of the transform container running on the instance.

For multi-container models, GPUUtilization is the sum of the GPU utilization of all containers running on the instance.

For endpoint variants, GPUUtilization is the sum of the GPU utilization of all of the containers running on the instance.

Units: Percent

GPUMemoryUtilization

The percentage of GPU memory used by the containers running on an instance. GPUMemoryUtilization ranges from 0% to 100% and is multiplied by the number of GPUs. For example, if there are four GPUs, GPUMemoryUtilization can range from 0% to 400%.

For training jobs, GPUMemoryUtilization is the GPU memory used by the algorithm container running on the instance.

For batch transform jobs, GPUMemoryUtilization is the GPU memory used by the transform container running on the instance.

For multi-container models, GPUMemoryUtilization is the sum of the GPU memory used by all containers running on the instance.

For endpoint variants, GPUMemoryUtilization is the sum of the GPU memory used by all of the containers running on the instance.

Units: Percent

DiskUtilization

The percentage of disk space used by the containers running on an instance. DiskUtilization ranges from 0% to 100%. This metric is not supported for batch transform jobs.

For training jobs, DiskUtilization is the disk space used by the algorithm container running on the instance.

For endpoint variants, DiskUtilization is the sum of the disk space used by all of the provided containers running on the instance.

Units: Percent
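To see which of these per-instance metrics CloudWatch has recorded for your endpoints, one possible sketch is to list the metrics in the /aws/sagemaker/Endpoints namespace along with their Host dimension values (the Host dimension is described in the following table):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# List the instance-level metrics (CPUUtilization, MemoryUtilization, and so on)
# recorded for endpoint instances, together with their Host dimension values.
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="/aws/sagemaker/Endpoints"):
    for metric in page["Metrics"]:
        hosts = [d["Value"] for d in metric["Dimensions"] if d["Name"] == "Host"]
        print(metric["MetricName"], hosts)
```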

Dimensions for Training Job, Batch Transform Job, and Endpoint Instance Metrics

Host

For training jobs, Host has the format [training-job-name]/algo-[instance-number-in-cluster]. Use this dimension to filter instance metrics for the specified training job and instance. This dimension format is present only in the /aws/sagemaker/TrainingJobs namespace.

For batch transform jobs, Host has the format [transform-job-name]/[instance-id]. Use this dimension to filter instance metrics for the specified batch transform job and instance. This dimension format is present only in the /aws/sagemaker/TransformJobs namespace.

For endpoints, Host has the format [endpoint-name]/[production-variant-name]/[instance-id]. Use this dimension to filter instance metrics for the specified endpoint, variant, and instance. This dimension format is present only in the /aws/sagemaker/Endpoints namespace.
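For example, the following sketch filters CPUUtilization to a single endpoint instance by using the Host dimension. Every part of the Host value shown here (endpoint name, variant name, instance ID) is a placeholder.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Host follows [endpoint-name]/[production-variant-name]/[instance-id];
# the value below is a placeholder.
host = "my-pipeline-endpoint/AllTraffic/i-0123456789abcdef0"

response = cloudwatch.get_metric_statistics(
    Namespace="/aws/sagemaker/Endpoints",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "Host", "Value": host}],
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Average", "Maximum"],  # percent, summed across containers
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```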

To help you debug your training jobs, endpoints, and notebook instance lifecycle configurations, SageMaker also sends anything an algorithm container, a model container, or a notebook instance lifecycle configuration sends to stdout or stderr to Amazon CloudWatch Logs. You can use this information for debugging and to analyze progress.

Use Logs to Monitor an Inference Pipeline

The following table lists the log groups and log streams that SageMaker sends to Amazon CloudWatch.

A log stream is a sequence of log events that share the same source. Each separate source of logs into CloudWatch makes up a separate log stream. A log group is a group of log streams that share the same retention, monitoring, and access control settings.

Logs

Log Group Name: /aws/sagemaker/TrainingJobs

Log Stream Name: [training-job-name]/algo-[instance-number-in-cluster]-[epoch_timestamp]

Log Group Name: /aws/sagemaker/Endpoints/[EndpointName]

Log Stream Names:

  • [production-variant-name]/[instance-id]

  • [production-variant-name]/[instance-id]/[container-name provided in the SageMaker model] (for Inference Pipelines)

For Inference Pipelines logs, if you don't provide container names, CloudWatch uses container-1, container-2, and so on, in the order that the containers are provided in the model.

Log Group Name: /aws/sagemaker/NotebookInstances

Log Stream Name: [notebook-instance-name]/[LifecycleConfigHook]

Log Group Name: /aws/sagemaker/TransformJobs

Log Stream Names:

  • [transform-job-name]/[instance-id]-[epoch_timestamp]

  • [transform-job-name]/[instance-id]-[epoch_timestamp]/data-log

  • [transform-job-name]/[instance-id]-[epoch_timestamp]/[container-name provided in the SageMaker model] (for Inference Pipelines)

For Inference Pipelines logs, if you don't provide container names, CloudWatch uses container-1, container-2, and so on, in the order that the containers are provided in the model.

Note

SageMaker creates the /aws/sagemaker/NotebookInstances log group when you create a notebook instance with a lifecycle configuration. For more information, see Customization of a SageMaker notebook instance using an LCC script.
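To read these logs programmatically, one possible sketch is to list the log streams in an endpoint's log group and print recent events from the most recently active stream. The endpoint name in the log group is a placeholder; for an inference pipeline, expect one stream per variant, instance, and container combination.

```python
import boto3

logs = boto3.client("logs")

# The log group name follows /aws/sagemaker/Endpoints/[EndpointName];
# "my-pipeline-endpoint" is a placeholder.
log_group = "/aws/sagemaker/Endpoints/my-pipeline-endpoint"

# List log streams, most recently active first.
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=10,
)

for stream in streams["logStreams"]:
    print(stream["logStreamName"])

# Print the most recent events from the most recently active stream.
if streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=streams["logStreams"][0]["logStreamName"],
        limit=20,
        startFromHead=False,
    )
    for event in events["events"]:
        print(event["timestamp"], event["message"])
```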

For more information about SageMaker logging, see Log groups and streams that Amazon SageMaker sends to Amazon CloudWatch Logs.