Amazon SageMaker Debugger


Use Amazon SageMaker Debugger to debug model output tensors from machine learning training jobs in real time and to detect convergence problems.

Amazon SageMaker Debugger features

A machine learning (ML) training job can have problems such as overfitting, saturated activation functions, and vanishing gradients, which can compromise model performance.

SageMaker Debugger provides tools to debug training jobs and resolve such problems to improve the performance of your model. Debugger can also send alerts when it finds training anomalies, take actions against the problems, and help you identify their root cause by visualizing the collected metrics and tensors.
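To make one of the problems above concrete, the following is a minimal sketch of the kind of check that could flag vanishing gradients. It is an illustration only, not Debugger's implementation: the function names, the input layout (a dict of gradient values per layer), and the threshold value are all assumptions chosen for this example.

```python
# Illustrative sketch only; this is NOT how SageMaker Debugger is implemented.
# All names and the default threshold below are hypothetical.

def mean_abs(values):
    """Return the mean absolute value of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("expected at least one gradient value")
    return sum(abs(v) for v in values) / len(values)


def find_vanishing_layers(gradients_by_layer, threshold=1e-7):
    """Return the names of layers whose mean absolute gradient is below threshold.

    gradients_by_layer maps a layer name to a flat list of gradient values
    captured at one training step (an assumed layout for this sketch).
    """
    return [
        name
        for name, grads in gradients_by_layer.items()
        if mean_abs(grads) < threshold
    ]


if __name__ == "__main__":
    sample = {
        "layer1": [1e-9, -2e-9, 5e-10],  # gradients have almost disappeared
        "layer2": [0.1, -0.2, 0.05],     # gradients of a healthy magnitude
    }
    print(find_vanishing_layers(sample))
```

A check like this only looks at a single step. A practical rule would usually evaluate gradients over many steps before reporting an issue.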

SageMaker Debugger supports the Apache MXNet, PyTorch, TensorFlow, and XGBoost frameworks. For more information about available frameworks and versions supported by SageMaker Debugger, see Supported frameworks and algorithms.

[Figure: Overview of how Amazon SageMaker Debugger works.]

The high-level Debugger workflow is as follows:

1. Modify your training script with the sagemaker-debugger Python SDK if needed.
2. Configure a SageMaker training job with SageMaker Debugger.
   - Configure using the SageMaker AI Estimator API (for Python SDK).
   - Configure using the SageMaker AI CreateTrainingJob request (for Boto3 or CLI).
   - Configure custom training containers with SageMaker Debugger.
3. Start a training job and monitor training issues in real time.
   - See List of Debugger built-in rules.
4. Get alerts and take prompt actions against the training issues.
   - Receive texts and emails, and stop training jobs when training issues are found, using Debugger built-in actions for rules.
   - Set up your own actions using Amazon CloudWatch Events and AWS Lambda.
5. Analyze the training issues in depth.
   - For debugging model output tensors, see Visualize Debugger Output Tensors in TensorBoard.
6. Fix the issues, consider the suggestions provided by Debugger, and repeat steps 1–5 until you optimize your model and achieve target accuracy.
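The configuration step and the custom-action step in the workflow above can be sketched in Python as follows. This is a hedged illustration, not a complete setup. The first function builds only the Debugger-related fields of a CreateTrainingJob request (`DebugHookConfig` and `DebugRuleConfigurations`). The second shows the decision an AWS Lambda function might make when it receives a training job state change event. The function names, the S3 path, the `save_interval` value, and the rule evaluator image URI are placeholders assumed for this example. The real image URI is Region-specific and must be looked up in the AWS documentation.

```python
# Sketch under stated assumptions: the function names are hypothetical, and the
# S3 path and rule evaluator image URI passed in are placeholders, not real values.

def build_debugger_config(s3_output_path, rule_evaluator_image):
    """Return the Debugger-related fields of a CreateTrainingJob request."""
    return {
        "DebugHookConfig": {
            "S3OutputPath": s3_output_path,
            "CollectionConfigurations": [
                {
                    "CollectionName": "gradients",
                    # Parameter values are passed as strings in the request.
                    "CollectionParameters": {"save_interval": "500"},
                },
            ],
        },
        "DebugRuleConfigurations": [
            {
                "RuleConfigurationName": "VanishingGradient",
                "RuleEvaluatorImage": rule_evaluator_image,
                "RuleParameters": {"rule_to_invoke": "VanishingGradient"},
            },
        ],
    }


def stop_job_if_issues_found(event, sagemaker_client):
    """Stop the training job named in the event if any Debugger rule found issues.

    sagemaker_client is expected to behave like boto3.client("sagemaker");
    it is injected so the logic can be exercised without AWS access.
    Returns True if a stop request was sent, otherwise False.
    """
    detail = event.get("detail", {})
    job_name = detail.get("TrainingJobName")
    statuses = detail.get("DebugRuleEvaluationStatuses") or []
    if any(s.get("RuleEvaluationStatus") == "IssuesFound" for s in statuses):
        sagemaker_client.stop_training_job(TrainingJobName=job_name)
        return True
    return False
```

In a real deployment, the dictionary returned by `build_debugger_config` would be merged into the other CreateTrainingJob parameters, and `stop_job_if_issues_found` would be called from a Lambda handler with a Boto3 SageMaker client.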

The SageMaker Debugger developer guide walks you through the following topics.

Topics
