
Supported frameworks and algorithms

The following table shows SageMaker AI machine learning frameworks and algorithms supported by Debugger.

SageMaker AI-supported frameworks and algorithms	Debugging output tensors
TensorFlow	AWS TensorFlow deep learning containers 1.15.4 or later
PyTorch	AWS PyTorch deep learning containers 1.5.0 or later
MXNet	AWS MXNet deep learning containers 1.6.0 or later
XGBoost	1.0-1, 1.2-1, 1.3-1
SageMaker AI generic estimator	Custom training containers (available for TensorFlow, PyTorch, MXNet, and XGBoost with manual hook registration)

Debugging output tensors – Track and debug model parameters, such as weights, gradients, biases, and scalar values of your training job. Supported frameworks and algorithms are Apache MXNet, TensorFlow, PyTorch, and XGBoost.
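As a quick illustration, the version requirements in the table can be written as a small lookup. This is only a sketch: the names `MIN_CONTAINER_VERSION`, `XGBOOST_VERSIONS`, `parse_version`, and `is_listed` are hypothetical helpers, not part of the SageMaker Python SDK or the sagemaker-debugger library, and the parser assumes purely numeric dotted version strings.

```python
# Minimum AWS deep learning container versions, copied from the table above.
# All names in this block are illustrative, not part of any SageMaker library.
MIN_CONTAINER_VERSION = {
    "tensorflow": (1, 15, 4),
    "pytorch": (1, 5, 0),
    "mxnet": (1, 6, 0),
}

# XGBoost is listed by exact version rather than by a minimum version.
XGBOOST_VERSIONS = {"1.0-1", "1.2-1", "1.3-1"}


def parse_version(version):
    """Turn a numeric dotted string such as '1.15.4' into a 3-part tuple."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad '1.5' to (1, 5, 0)
    return tuple(parts)


def is_listed(framework, version):
    """Return True if the framework and version pair appears in the table."""
    name = framework.lower()
    if name == "xgboost":
        return version in XGBOOST_VERSIONS
    if name not in MIN_CONTAINER_VERSION:
        return False
    return parse_version(version) >= MIN_CONTAINER_VERSION[name]
```

For example, `is_listed("PyTorch", "1.4.0")` is false because 1.4.0 is older than the 1.5.0 minimum, while `is_listed("XGBoost", "1.2-1")` is true.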

Important
For TensorFlow with Keras, SageMaker Debugger deprecates zero code change support for debugging models built with the tf.keras modules of TensorFlow 2.6 and later. This is due to breaking changes announced in the TensorFlow 2.6.0 release notes. For instructions on how to update your training script, see Adapt your TensorFlow training script.

Important
Starting with PyTorch v1.12.0, SageMaker Debugger deprecates zero code change support for debugging models. This is due to breaking changes that cause SageMaker Debugger to interfere with the torch.jit functionality. For instructions on how to update your training script, see Adapt your PyTorch training script.
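The two deprecation notes above can be summarized as a simple check. The following is a minimal sketch, assuming numeric dotted version strings; `zero_code_change_deprecated` and `parse_version` are hypothetical names used only for illustration and do not exist in SageMaker Debugger.

```python
def parse_version(version):
    """Turn a numeric dotted string such as '2.6.0' into a 3-part tuple."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.6' to (2, 6, 0)
    return tuple(parts)


def zero_code_change_deprecated(framework, version, uses_keras=False):
    """Return True if one of the deprecation notes above applies.

    True means the training script has to be adapted before Debugger
    can be used; False only means neither note covers this combination.
    """
    name = framework.lower()
    parsed = parse_version(version)
    if name == "tensorflow":
        # The TensorFlow note covers tf.keras models on TensorFlow 2.6 and later.
        return uses_keras and parsed >= (2, 6, 0)
    if name == "pytorch":
        # The PyTorch note covers v1.12.0 and later.
        return parsed >= (1, 12, 0)
    return False
```

A result of `True` points to the framework-specific instructions for adapting the training script; the check says nothing about whether a version is supported in the first place.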

If the framework or algorithm that you want to train and debug is not listed in the table, go to the AWS Discussion Forum and leave feedback on SageMaker Debugger.

AWS Regions

Amazon SageMaker Debugger is available in all Regions where Amazon SageMaker AI is in service, except the following Region:

Asia Pacific (Jakarta): ap-southeast-3

To find out whether Amazon SageMaker AI is in service in your AWS Region, see AWS Regional Services.

Use Debugger with Custom Training Containers

Bring your own training containers to SageMaker AI and gain insights into your training jobs using Debugger. Use its monitoring and debugging features to optimize your model on Amazon EC2 instances.

For more information about how to build your training container with the sagemaker-debugger client library, push it to the Amazon Elastic Container Registry (Amazon ECR), and monitor and debug, see Use Debugger with custom training containers.

Debugger Open-Source GitHub Repositories

Debugger APIs are provided through the SageMaker Python SDK and are designed to construct Debugger hook and rule configurations for the SageMaker AI CreateTrainingJob and DescribeTrainingJob API operations. The sagemaker-debugger client library provides tools to register hooks and to access the training data through its trial feature. It supports the machine learning frameworks TensorFlow, PyTorch, MXNet, and XGBoost on Python 3.6 and later.
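To make the hook and rule configurations more concrete, the sketch below assembles the Debugger-related part of a CreateTrainingJob request as a plain Python dictionary. It is an illustration under assumptions, not the SDK's implementation: `build_debugger_request_fields` is a hypothetical helper, the S3 path and rule evaluator image URI are placeholders that you would replace with values for your account and Region, and the field names should be checked against the CreateTrainingJob API reference before use.

```python
def build_debugger_request_fields(s3_output_path, rule_image_uri,
                                  collections, rule_name):
    """Build the Debugger fields of a CreateTrainingJob request (sketch).

    collections maps a tensor collection name to its parameters,
    for example {"gradients": {"save_interval": 500}}.
    """
    hook_config = {
        "S3OutputPath": s3_output_path,
        "CollectionConfigurations": [
            {
                "CollectionName": name,
                # The API expects parameter values as strings.
                "CollectionParameters": {k: str(v) for k, v in params.items()},
            }
            for name, params in collections.items()
        ],
    }
    rule_configurations = [
        {
            "RuleConfigurationName": rule_name,
            "RuleEvaluatorImage": rule_image_uri,
            # Assumption: built-in rules are selected with "rule_to_invoke".
            "RuleParameters": {"rule_to_invoke": rule_name},
        }
    ]
    return {
        "DebugHookConfig": hook_config,
        "DebugRuleConfigurations": rule_configurations,
    }


# Placeholder values; replace them with your own bucket and image URI.
fields = build_debugger_request_fields(
    s3_output_path="s3://amzn-s3-demo-bucket/debugger-output",
    rule_image_uri="<rule-evaluator-image-uri>",
    collections={"gradients": {"save_interval": 500}},
    rule_name="VanishingGradient",
)
```

The resulting dictionary could be merged into the other required CreateTrainingJob parameters. The SageMaker Python SDK builds an equivalent structure for you, so writing it by hand is mainly useful when you call the low-level API directly.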

For direct resources about the Debugger and sagemaker-debugger API operations, see the following links:

If you use the SDK for Java to conduct SageMaker training jobs and want to configure Debugger APIs, see the following references:
