Supported frameworks and algorithms
The following table shows SageMaker machine learning frameworks and algorithms supported by Debugger.
SageMaker-supported frameworks and algorithms | Debugging output tensors |
---|---|
TensorFlow | AWS TensorFlow deep learning containers |
PyTorch | AWS PyTorch deep learning containers |
MXNet | AWS MXNet deep learning containers |
XGBoost | 1.0-1, 1.2-1, 1.3-1 |
Custom training containers | Available for TensorFlow, PyTorch, MXNet, and XGBoost with manual hook registration |
Debugging output tensors: Track and debug model parameters, such as weights, gradients, biases, and scalar values, during your training job. Supported frameworks are Apache MXNet, TensorFlow, PyTorch, and XGBoost.
Important
For the TensorFlow framework with Keras, SageMaker Debugger deprecates zero-code-change support for debugging models built using the tf.keras modules of TensorFlow 2.6 and later. This is due to breaking changes announced in the TensorFlow 2.6.0 release note. For instructions on how to update your training script, see Adapt your TensorFlow training script.
Important
For PyTorch v1.12.0 and later, SageMaker Debugger deprecates zero-code-change support for debugging models. This is due to breaking changes that cause SageMaker Debugger to interfere with the torch.jit functionality. For instructions on how to update your training script, see Adapt your PyTorch training script.
If the framework or algorithm that you want to train and debug is not listed in the table, go to the AWS Discussion Forum.
AWS Regions
Amazon SageMaker Debugger is available in all AWS Regions where Amazon SageMaker is in service, except the following Region:
Asia Pacific (Jakarta): ap-southeast-3
To find out whether Amazon SageMaker is in service in your AWS Region, see AWS Regional Services.
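If you submit training jobs from a script, a pre-flight check can flag the excluded Region before a job is created. This is a minimal sketch; the constant and function names are hypothetical, and the exclusion set only mirrors the single Region listed above, so it needs to be kept in sync with the AWS documentation.

```python
# Hypothetical pre-flight check; the set mirrors the exclusion listed above.
DEBUGGER_EXCLUDED_REGIONS = frozenset({"ap-southeast-3"})  # Asia Pacific (Jakarta)

def debugger_excluded(region: str) -> bool:
    """Return True if Debugger is listed as unavailable in `region`.

    A False result does not prove that SageMaker itself runs in that Region;
    check the AWS Regional Services list for that.
    """
    return region.strip().lower() in DEBUGGER_EXCLUDED_REGIONS
```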
Use Debugger with Custom Training Containers
Bring your own training containers to SageMaker and use Debugger to gain insight into your training jobs. The monitoring and debugging features help you optimize your model on Amazon EC2 instances.
For more information about how to build your training container with the sagemaker-debugger client library, push it to Amazon Elastic Container Registry (Amazon ECR), and monitor and debug your training job, see Use Debugger with custom training containers.
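In a custom container, the hook is registered manually in the training script. The following minimal sketch assumes a PyTorch training script in an image that has the sagemaker-debugger (smdebug) library and PyTorch installed; the wrapper function name is hypothetical, while Hook.create_from_json_file, register_module, and register_loss are smdebug PyTorch calls.

```python
# Sketch of manual hook registration for PyTorch in a custom training container.
# Assumes smdebug and torch are installed; register_debugger_hook is a hypothetical name.

def register_debugger_hook(model, loss_fn):
    """Attach an smdebug hook to `model` and `loss_fn`.

    Returns the hook, or None when smdebug (or PyTorch) cannot be imported,
    so the same script can also run outside a Debugger-enabled container.
    """
    try:
        import smdebug.pytorch as smd
    except ImportError:
        return None
    # Build the hook from the JSON hook configuration file provided to the job.
    hook = smd.Hook.create_from_json_file()
    hook.register_module(model)  # save tensors such as weights and gradients
    hook.register_loss(loss_fn)  # save loss values
    return hook
```

You would call it once, after creating the model and loss criterion (for example, torch.nn.CrossEntropyLoss()) and before the training loop starts.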
Debugger Open-Source GitHub Repositories
Debugger APIs are provided through the SageMaker Python SDK and are designed to construct Debugger hook and rule configurations for the SageMaker CreateTrainingJob and DescribeTrainingJob API operations. The sagemaker-debugger client library provides tools to register hooks and to access training data through its trial feature. It supports the TensorFlow, PyTorch, MXNet, and XGBoost machine learning frameworks on Python 3.6 and later.
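The hook and rule configurations end up as the DebugHookConfig and DebugRuleConfigurations fields of a CreateTrainingJob request. The following minimal sketch builds those fields as plain dictionaries. The S3 path, the rule evaluator image URI (which is Region-specific), the save interval, and the function name are placeholders or assumptions; the SageMaker Python SDK normally generates these fields for you.

```python
# Sketch of the Debugger fields in a CreateTrainingJob request body.
# build_debugger_request_fields is a hypothetical name; values are placeholders.

def build_debugger_request_fields(s3_output_path: str, rule_image_uri: str) -> dict:
    """Return DebugHookConfig and DebugRuleConfigurations entries for a training job."""
    return {
        "DebugHookConfig": {
            "S3OutputPath": s3_output_path,  # where saved tensors are written
            "CollectionConfigurations": [
                # Parameter values are strings in the API; "100" is an example interval.
                {"CollectionName": "weights",
                 "CollectionParameters": {"save_interval": "100"}},
                {"CollectionName": "gradients",
                 "CollectionParameters": {"save_interval": "100"}},
            ],
        },
        "DebugRuleConfigurations": [
            {
                "RuleConfigurationName": "VanishingGradient",
                "RuleEvaluatorImage": rule_image_uri,  # Region-specific ECR image URI
                "RuleParameters": {"rule_to_invoke": "VanishingGradient"},
            }
        ],
    }
```

These fields can be merged into the other CreateTrainingJob parameters, for example as keyword arguments to the boto3 SageMaker client's create_training_job. The DescribeTrainingJob response echoes them and reports rule progress in DebugRuleEvaluationStatuses.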
For direct resources about Debugger and the sagemaker-debugger API operations, see the following links:
If you use the SDK for Java to run SageMaker training jobs and want to configure Debugger APIs, see the following references: