

# Launch training jobs with Debugger using the SageMaker Python SDK
<a name="debugger-configuration-for-debugging"></a>

To configure a SageMaker AI estimator with SageMaker Debugger, use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) and specify Debugger-specific parameters. To fully utilize the debugging functionality, configure three parameters: `debugger_hook_config`, `tensorboard_output_config`, and `rules`.

**Important**  
Before constructing and running the estimator fit method to launch a training job, make sure that you adapt your training script following the instructions at [Adapting your training script to register a hook](debugger-modify-script.md).

## Constructing a SageMaker AI Estimator with Debugger-specific parameters
<a name="debugger-configuration-structure"></a>

The code examples in this section show how to construct a SageMaker AI estimator with the Debugger-specific parameters.

**Note**  
The following code examples are templates for constructing the SageMaker AI framework estimators and not directly executable. You need to proceed to the next sections and configure the Debugger-specific parameters.

------
#### [ PyTorch ]

```
# An example of constructing a SageMaker AI PyTorch estimator
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

session=boto3.session.Session()
region=session.region_name

debugger_hook_config=DebuggerHookConfig(...)
rules=[
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator=PyTorch(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.12.0",
    py_version="py37",
    
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

estimator.fit(wait=False)
```

------
#### [ TensorFlow ]

```
# An example of constructing a SageMaker AI TensorFlow estimator
import boto3
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, ProfilerRule, rule_configs

session=boto3.session.Session()
region=session.region_name

debugger_hook_config=DebuggerHookConfig(...)
rules=[
    Rule.sagemaker(rule_configs.built_in_rule()),
    ProfilerRule.sagemaker(rule_configs.built_in_rule())
]

estimator=TensorFlow(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",
    
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

estimator.fit(wait=False)
```

------
#### [ MXNet ]

```
# An example of constructing a SageMaker AI MXNet estimator
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

debugger_hook_config=DebuggerHookConfig(...)
rules=[
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator=MXNet(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.7.0",
    py_version="py37",
    
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

estimator.fit(wait=False)
```

------
#### [ XGBoost ]

```
# An example of constructing a SageMaker AI XGBoost estimator
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

debugger_hook_config=DebuggerHookConfig(...)
rules=[
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator=XGBoost(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.5-1",

    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

estimator.fit(wait=False)
```

------
#### [ Generic estimator ]

```
# An example of constructing a SageMaker AI generic estimator using the XGBoost algorithm base image
import boto3
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker import image_uris
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

debugger_hook_config=DebuggerHookConfig(...)
rules=[
    Rule.sagemaker(rule_configs.built_in_rule())
]

region=boto3.Session().region_name
xgboost_container=sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")

estimator=Estimator(
    role=sagemaker.get_execution_role(),
    image_uri=xgboost_container,
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

estimator.fit(wait=False)
```

------

Configure the following parameters to activate SageMaker Debugger:
+ `debugger_hook_config` (an object of [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig)) – Required to activate the hook that you register in your training script in [Adapting your training script to register a hook](debugger-modify-script.md), to configure the SageMaker training launcher (estimator) to collect output tensors from your training job, and to save the tensors to your secured S3 bucket or local machine. To learn how to configure the `debugger_hook_config` parameter, see [Configuring SageMaker Debugger to save tensors](debugger-configure-hook.md).
+ `rules` (a list of [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.Rule](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.Rule) objects) – Configure this parameter to activate the SageMaker Debugger built-in rules that you want to run in real time. The built-in rules provide logic that automatically debugs the training progress of your model and finds training issues by analyzing the output tensors saved in your secured S3 bucket. To learn how to configure the `rules` parameter, see [How to configure Debugger built-in rules](use-debugger-built-in-rules.md). To find a complete list of built-in rules for debugging output tensors, see [Debugger rule](debugger-built-in-rules.md#debugger-built-in-rules-Rule). If you want to create your own logic to detect any training issues, see [Creating custom rules using the Debugger client library](debugger-custom-rules.md).
**Note**  
The built-in rules are available only through SageMaker training instances. You cannot use them in local mode.
+ `tensorboard_output_config` (an object of [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.TensorBoardOutputConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.TensorBoardOutputConfig)) – Configure SageMaker Debugger to collect output tensors in the TensorBoard-compatible format and save them to the S3 output path specified in the `TensorBoardOutputConfig` object. To learn more, see [Visualize Amazon SageMaker Debugger output tensors in TensorBoard](debugger-enable-tensorboard-summaries.md).
**Note**  
The `tensorboard_output_config` must be configured with the `debugger_hook_config` parameter, which also requires you to adapt your training script by adding the `sagemaker-debugger` hook.

**Note**  
SageMaker Debugger securely saves output tensors in subfolders of your S3 bucket. For example, the format of the default S3 bucket URI in your account is `s3://amzn-s3-demo-bucket-sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/`. SageMaker Debugger creates two subfolders: `debug-output` and `rule-output`. If you add the `tensorboard_output_config` parameter, you'll also find a `tensorboard-output` folder.

See the following topics for detailed examples of how to configure the Debugger-specific parameters.

**Topics**
+ [Constructing a SageMaker AI Estimator with Debugger-specific parameters](#debugger-configuration-structure)
+ [Configuring SageMaker Debugger to save tensors](debugger-configure-hook.md)
+ [How to configure Debugger built-in rules](use-debugger-built-in-rules.md)
+ [Turn off Debugger](debugger-turn-off.md)
+ [Useful SageMaker AI estimator class methods for Debugger](debugger-estimator-classmethods.md)

# Configuring SageMaker Debugger to save tensors
<a name="debugger-configure-hook"></a>

*Tensors* are data collections of updated parameters from the backward and forward passes of each training iteration. SageMaker Debugger collects the output tensors to analyze the state of a training job. SageMaker Debugger's [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.CollectionConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.CollectionConfig) and [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig) API operations provide methods for grouping tensors into *collections* and saving them to a target S3 bucket. The following topics show how to use the `CollectionConfig` and `DebuggerHookConfig` API operations, followed by examples of how to use the Debugger hook to save, access, and visualize output tensors.

While constructing a SageMaker AI estimator, activate SageMaker Debugger by specifying the `debugger_hook_config` parameter. The following topics include examples of how to set up the `debugger_hook_config` using the `CollectionConfig` and `DebuggerHookConfig` API operations to pull tensors out of your training jobs and save them.

**Note**  
After it is properly configured and activated, SageMaker Debugger saves the output tensors in a default S3 bucket, unless otherwise specified. The format of the default S3 bucket URI is `s3://amzn-s3-demo-bucket-sagemaker-<region>-<12digit_account_id>/<training-job-name>/debug-output/`.

**Topics**
+ [Configure tensor collections using the `CollectionConfig` API](debugger-configure-tensor-collections.md)
+ [Configure the `DebuggerHookConfig` API to save tensors](debugger-configure-tensor-hook.md)
+ [Example notebooks and code samples to configure Debugger hook](debugger-save-tensors.md)

# Configure tensor collections using the `CollectionConfig` API
<a name="debugger-configure-tensor-collections"></a>

Use the `CollectionConfig` API operation to configure tensor collections. Debugger provides pre-built tensor collections that cover a variety of parameter regular expressions (regex) when you use Debugger-supported deep learning frameworks and machine learning algorithms. As shown in the following example code, add the built-in tensor collections you want to debug.

```
from sagemaker.debugger import CollectionConfig

collection_configs=[
    CollectionConfig(name="weights"),
    CollectionConfig(name="gradients")
]
```

The preceding collections set up the Debugger hook to save the tensors every 500 steps based on the default `"save_interval"` value.

For a full list of available Debugger built-in collections, see [Debugger Built-in Collections](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#collection).

If you want to customize the built-in collections, such as changing the save intervals and tensor regex, use the following `CollectionConfig` template to adjust parameters.

```
from sagemaker.debugger import CollectionConfig

collection_configs=[
    CollectionConfig(
        name="tensor_collection",
        parameters={
            "key_1": "value_1",
            "key_2": "value_2",
            ...
            "key_n": "value_n"
        }
    )
]
```

For more information about available parameter keys, see [CollectionConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.CollectionConfig) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). For example, the following code shows how to adjust the save intervals of the `losses` tensor collection for different phases of training: save the training loss every 100 steps during the training phase and the validation loss every 10 steps during the validation phase.

```
from sagemaker.debugger import CollectionConfig

collection_configs=[
    CollectionConfig(
        name="losses",
        parameters={
            "train.save_interval": "100",
            "eval.save_interval": "10"
        }
    )
]
```

**Tip**  
This tensor collection configuration object can be used for both [DebuggerHookConfig](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-configure-hook.html#debugger-configure-tensor-hook) and [Rule](https://docs.aws.amazon.com/sagemaker/latest/dg/use-debugger-built-in-rules.html#debugger-built-in-rules-configuration-param-change) API operations.

# Configure the `DebuggerHookConfig` API to save tensors
<a name="debugger-configure-tensor-hook"></a>

Use the [DebuggerHookConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig) API to create a `debugger_hook_config` object using the `collection_configs` object you created in the previous step.

```
from sagemaker.debugger import DebuggerHookConfig

debugger_hook_config=DebuggerHookConfig(
    collection_configs=collection_configs
)
```

Debugger saves the model training output tensors into the default S3 bucket. The format of the default S3 bucket URI is `s3://amzn-s3-demo-bucket-sagemaker-<region>-<12digit_account_id>/<training-job-name>/debug-output/`.

If you want to specify an exact S3 bucket URI, use the following code example:

```
from sagemaker.debugger import DebuggerHookConfig

debugger_hook_config=DebuggerHookConfig(
    s3_output_path="specify-uri",
    collection_configs=collection_configs
)
```

For more information, see [DebuggerHookConfig](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.DebuggerHookConfig) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

# Example notebooks and code samples to configure Debugger hook
<a name="debugger-save-tensors"></a>

The following sections provide notebooks and code examples of how to use the Debugger hook to save, access, and visualize output tensors.

**Topics**
+ [Tensor visualization example notebooks](#debugger-tensor-visualization-notebooks)
+ [Save tensors using Debugger built-in collections](#debugger-save-built-in-collections)
+ [Save tensors by modifying Debugger built-in collections](#debugger-save-modified-built-in-collections)
+ [Save tensors using Debugger custom collections](#debugger-save-custom-collections)

## Tensor visualization example notebooks
<a name="debugger-tensor-visualization-notebooks"></a>

The following notebook examples show advanced use of Amazon SageMaker Debugger for visualizing tensors. Debugger provides a transparent view into training deep learning models.
+ [Interactive Tensor Analysis in SageMaker Studio Notebook with MXNet](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/mnist_tensor_analysis)

  This notebook example shows how to visualize saved tensors using Amazon SageMaker Debugger. By visualizing the tensors, you can see how the tensor values change while training deep learning algorithms. This notebook includes a training job with a poorly configured neural network and uses Amazon SageMaker Debugger to aggregate and analyze tensors, including gradients, activation outputs, and weights. For example, the following plot shows the distribution of gradients of a convolutional layer that is suffering from a vanishing gradient problem.  
![\[A graph plotting the distribution of gradients.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/debugger/debugger-vanishing-gradient.gif)

  This notebook also illustrates how a good initial hyperparameter setting improves the training process by generating the same tensor distribution plots. 
+ [ Visualizing and Debugging Tensors from MXNet Model Training](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/mnist_tensor_plot)

  This notebook example shows how to save and visualize tensors from an MXNet Gluon model training job using Amazon SageMaker Debugger. In this example, Debugger is set up to save all tensors to an Amazon S3 bucket, and the notebook retrieves the ReLU activation outputs for visualization. The following figure shows a three-dimensional visualization of the ReLU activation outputs. The color scheme is set to blue to indicate values close to 0 and yellow to indicate values close to 1.  
![\[A visualization of the ReLU activation outputs\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/tensorplot.gif)

  In this notebook, the `TensorPlot` class imported from `tensor_plot.py` is designed to plot convolutional neural networks (CNNs) that take two-dimensional images for inputs. The `tensor_plot.py` script provided with the notebook retrieves tensors using Debugger and visualizes the CNN. You can run this notebook on SageMaker Studio to reproduce the tensor visualization and implement your own convolutional neural network model. 
+ [Real-time Tensor Analysis in a SageMaker Notebook with MXNet](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/mxnet_realtime_analysis)

  This example guides you through installing the required components for emitting tensors in an Amazon SageMaker training job and using the Debugger API operations to access those tensors while training is running. A Gluon CNN model is trained on the Fashion MNIST dataset. While the job is running, you can see how Debugger retrieves the activation outputs of the first convolutional layer from each of 100 batches and visualizes them. The example also shows how to visualize weights after the job is done.

## Save tensors using Debugger built-in collections
<a name="debugger-save-built-in-collections"></a>

You can configure built-in collections of tensors using the `CollectionConfig` API and save them using the `DebuggerHookConfig` API. The following example shows how to use the default settings of Debugger hook configurations to construct a SageMaker AI TensorFlow estimator. You can also use this configuration for MXNet, PyTorch, and XGBoost estimators.

**Note**  
In the following example code, the `s3_output_path` parameter for `DebuggerHookConfig` is optional. If you do not specify it, Debugger saves the tensors at `s3://<output_path>/debug-output/`, where the `<output_path>` is the default output path of SageMaker training jobs. For example:  

```
"s3://sagemaker-us-east-1-111122223333/sagemaker-debugger-training-YYYY-MM-DD-HH-MM-SS-123/debug-output"
```

```
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

# use Debugger CollectionConfig to call built-in collections
collection_configs=[
        CollectionConfig(name="weights"),
        CollectionConfig(name="gradients"),
        CollectionConfig(name="losses"),
        CollectionConfig(name="biases")
    ]

# configure Debugger hook
# set a target S3 bucket as you want
sagemaker_session=sagemaker.Session()
BUCKET_NAME=sagemaker_session.default_bucket()
LOCATION_IN_BUCKET='debugger-built-in-collections-hook'

hook_config=DebuggerHookConfig(
    s3_output_path='s3://{BUCKET_NAME}/{LOCATION_IN_BUCKET}'.
                    format(BUCKET_NAME=BUCKET_NAME, 
                           LOCATION_IN_BUCKET=LOCATION_IN_BUCKET),
    collection_configs=collection_configs
)

# construct a SageMaker TensorFlow estimator
sagemaker_estimator=TensorFlow(
    entry_point='directory/to/your_training_script.py',
    role=sagemaker.get_execution_role(),
    base_job_name='debugger-demo-job',
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",
    
    # debugger-specific hook argument below
    debugger_hook_config=hook_config
)

sagemaker_estimator.fit()
```

To see a list of Debugger built-in collections, see [Debugger Built-in Collections](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#collection).

## Save tensors by modifying Debugger built-in collections
<a name="debugger-save-modified-built-in-collections"></a>

You can modify the Debugger built-in collections using the `CollectionConfig` API operation. The following example shows how to tweak the built-in `losses` collection and construct a SageMaker AI TensorFlow estimator. You can also use this for MXNet, PyTorch, and XGBoost estimators.

```
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

# use Debugger CollectionConfig to call and modify built-in collections
collection_configs=[
    CollectionConfig(
                name="losses", 
                parameters={"save_interval": "50"})]

# configure Debugger hook
# set a target S3 bucket as you want
sagemaker_session=sagemaker.Session()
BUCKET_NAME=sagemaker_session.default_bucket()
LOCATION_IN_BUCKET='debugger-modified-collections-hook'

hook_config=DebuggerHookConfig(
    s3_output_path='s3://{BUCKET_NAME}/{LOCATION_IN_BUCKET}'.
                    format(BUCKET_NAME=BUCKET_NAME, 
                           LOCATION_IN_BUCKET=LOCATION_IN_BUCKET),
    collection_configs=collection_configs
)

# construct a SageMaker TensorFlow estimator
sagemaker_estimator=TensorFlow(
    entry_point='directory/to/your_training_script.py',
    role=sagemaker.get_execution_role(),
    base_job_name='debugger-demo-job',
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",
    
    # debugger-specific hook argument below
    debugger_hook_config=hook_config
)

sagemaker_estimator.fit()
```

For a full list of `CollectionConfig` parameters, see [ Debugger CollectionConfig API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#configuring-collection-using-sagemaker-python-sdk).

## Save tensors using Debugger custom collections
<a name="debugger-save-custom-collections"></a>

You can also save a reduced number of tensors instead of the full set of tensors (for example, if you want to reduce the amount of data saved in your Amazon S3 bucket). The following example shows how to customize the Debugger hook configuration to specify target tensors that you want to save. You can use this for TensorFlow, MXNet, PyTorch, and XGBoost estimators.

```
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

# use Debugger CollectionConfig to create a custom collection
collection_configs=[
        CollectionConfig(
            name="custom_activations_collection",
            parameters={
                "include_regex": "relu|tanh", # Required
                "reductions": "mean,variance,max,abs_mean,abs_variance,abs_max"
            })
    ]
    
# configure Debugger hook
# set a target S3 bucket as you want
sagemaker_session=sagemaker.Session()
BUCKET_NAME=sagemaker_session.default_bucket()
LOCATION_IN_BUCKET='debugger-custom-collections-hook'

hook_config=DebuggerHookConfig(
    s3_output_path='s3://{BUCKET_NAME}/{LOCATION_IN_BUCKET}'.
                    format(BUCKET_NAME=BUCKET_NAME, 
                           LOCATION_IN_BUCKET=LOCATION_IN_BUCKET),
    collection_configs=collection_configs
)

# construct a SageMaker TensorFlow estimator
sagemaker_estimator=TensorFlow(
    entry_point='directory/to/your_training_script.py',
    role=sagemaker.get_execution_role(),
    base_job_name='debugger-demo-job',
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",
    
    # debugger-specific hook argument below
    debugger_hook_config=hook_config
)

sagemaker_estimator.fit()
```

For a full list of `CollectionConfig` parameters, see [ Debugger CollectionConfig](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#configuring-collection-using-sagemaker-python-sdk).

# How to configure Debugger built-in rules
<a name="use-debugger-built-in-rules"></a>

In the following topics, you'll learn how to use the SageMaker Debugger built-in rules. Amazon SageMaker Debugger's built-in rules analyze tensors emitted during the training of a model. SageMaker Debugger offers the `Rule` API operation, which monitors training job progress and detects errors that can keep your model from training successfully. For example, the rules can detect whether gradients are getting too large or too small, whether a model is overfitting or overtraining, and whether a training job's loss is not decreasing or improving. To see a full list of available built-in rules, see [List of Debugger built-in rules](debugger-built-in-rules.md).

**Topics**
+ [Use Debugger built-in rules with the default parameter settings](debugger-built-in-rules-configuration.md)
+ [Use Debugger built-in rules with custom parameter values](debugger-built-in-rules-configuration-param-change.md)
+ [Example notebooks and code samples to configure Debugger rules](debugger-built-in-rules-example.md)

# Use Debugger built-in rules with the default parameter settings
<a name="debugger-built-in-rules-configuration"></a>

To specify Debugger built-in rules in an estimator, you need to configure a list object. The following example code shows the basic structure of listing the Debugger built-in rules:

```
from sagemaker.debugger import Rule, rule_configs

rules=[
    Rule.sagemaker(rule_configs.built_in_rule_name_1()),
    Rule.sagemaker(rule_configs.built_in_rule_name_2()),
    ...
    Rule.sagemaker(rule_configs.built_in_rule_name_n()),
    ... # You can also append more profiler rules in the ProfilerRule.sagemaker(rule_configs.*()) format.
]
```

For more information about default parameter values and descriptions of the built-in rules, see [List of Debugger built-in rules](debugger-built-in-rules.md).

To find the SageMaker Debugger API reference, see [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.sagemaker.debugger.rule_configs](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.sagemaker.debugger.rule_configs) and [https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.Rule](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html#sagemaker.debugger.Rule).

For example, to inspect the overall training performance and progress of your model, construct a SageMaker AI estimator with the following built-in rule configuration. 

```
from sagemaker.debugger import Rule, rule_configs

rules=[
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.stalled_training_rule())
]
```

When you start the training job, Debugger collects system resource utilization data every 500 milliseconds and the loss and accuracy values every 500 steps by default. Debugger analyzes the resource utilization to identify if your model is having bottleneck problems. The `loss_not_decreasing`, `overfit`, `overtraining`, and `stalled_training_rule` rules monitor whether your model is optimizing the loss function without running into those training issues. If the rules detect training anomalies, the rule evaluation status changes to `IssueFound`. You can set up automated actions, such as sending notifications about training issues and stopping training jobs, using Amazon CloudWatch Events and AWS Lambda. For more information, see [Action on Amazon SageMaker Debugger rules](debugger-action-on-rules.md).



# Use Debugger built-in rules with custom parameter values
<a name="debugger-built-in-rules-configuration-param-change"></a>

If you want to adjust the built-in rule parameter values and customize the tensor collection regex, configure the `base_config` and `rule_parameters` parameters for the `ProfilerRule.sagemaker` and `Rule.sagemaker` class methods. For the `Rule.sagemaker` class method, you can also customize tensor collections through the `collections_to_save` parameter. Instructions for using the `CollectionConfig` class are provided in [Configure tensor collections using the `CollectionConfig` API](debugger-configure-tensor-collections.md).

Use the following configuration template for built-in rules to customize parameter values. By changing the rule parameters as you want, you can adjust the sensitivity of the rules to be triggered. 
+ The `base_config` argument is where you call the built-in rule methods.
+ The `rule_parameters` argument is to adjust the default key values of the built-in rules listed in [List of Debugger built-in rules](debugger-built-in-rules.md).
+ The `collections_to_save` argument takes in a tensor configuration through the `CollectionConfig` API, which requires `name` and `parameters` arguments. 
  + To find available tensor collections for `name`, see [ Debugger Built-in Tensor Collections ](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#built-in-collections). 
  + For a full list of adjustable `parameters`, see [ Debugger CollectionConfig API](https://github.com/awslabs/sagemaker-debugger/blob/master/docs/api.md#configuring-collection-using-sagemaker-python-sdk).

For more information about the Debugger rule class, methods, and parameters, see [SageMaker AI Debugger Rule class](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

```
from sagemaker.debugger import Rule, ProfilerRule, rule_configs, CollectionConfig

rules=[
    Rule.sagemaker(
        base_config=rule_configs.built_in_rule_name(),
        rule_parameters={
                "key": "value"
        },
        collections_to_save=[ 
            CollectionConfig(
                name="tensor_collection_name", 
                parameters={
                    "key": "value"
                } 
            )
        ]
    )
]
```

The parameter descriptions and value customization examples are provided for each rule at [List of Debugger built-in rules](debugger-built-in-rules.md).

# Example notebooks and code samples to configure Debugger rules
<a name="debugger-built-in-rules-example"></a>

The following sections provide notebooks and code samples that show how to use Debugger rules to monitor SageMaker training jobs.

**Topics**
+ [Debugger built-in rules example notebooks](#debugger-built-in-rules-notebook-example)
+ [Debugger built-in rules example code](#debugger-deploy-built-in-rules)
+ [Use Debugger built-in rules with parameter modifications](#debugger-deploy-modified-built-in-rules)

## Debugger built-in rules example notebooks
<a name="debugger-built-in-rules-notebook-example"></a>

The following example notebooks show how to use Debugger built-in rules when running training jobs with Amazon SageMaker AI: 
+ [Using a SageMaker Debugger built-in rule with TensorFlow](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/tensorflow_builtin_rule)
+ [Using a SageMaker Debugger built-in rule with Managed Spot Training and MXNet](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/mxnet_spot_training)
+ [Using a SageMaker Debugger built-in rule with parameter modifications for a real-time training job analysis with XGBoost](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-debugger/xgboost_realtime_analysis)

While running the example notebooks in SageMaker Studio, you can find the training job trial on the **Studio Experiment List** tab. For example, as shown in the following screenshot, you can find and open a **Describe Trial Component** window for your current training job. On the Debugger tab, you can check whether the Debugger rules, `vanishing_gradient()` and `loss_not_decreasing()`, are monitoring the training session in parallel. For full instructions on how to find your training job trial components in the Studio UI, see [SageMaker Studio - View Experiments, Trials, and Trial Components](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tasks.html#studio-tasks-experiments).

![\[An image of running a training job with Debugger built-in rules activated in SageMaker Studio\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/debugger/debugger-built-in-rule-studio.png)


There are two ways to use the Debugger built-in rules in the SageMaker AI environment: deploy the built-in rules as they are prepared, or adjust their parameters as needed. The following topics show how to use the built-in rules with example code.

## Debugger built-in rules example code
<a name="debugger-deploy-built-in-rules"></a>

The following code sample shows how to set up the Debugger built-in rules using the `Rule.sagemaker` method. To specify the built-in rules that you want to run, call them through the `rule_configs` module. To find a full list of Debugger built-in rules and default parameter values, see [List of Debugger built-in rules](debugger-built-in-rules.md).

```
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import Rule, CollectionConfig, rule_configs

# Call the built-in rules that you want to use
built_in_rules=[
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.loss_not_decreasing())
]

# Construct a SageMaker AI estimator with the Debugger built-in rules
sagemaker_estimator=TensorFlow(
    entry_point='directory/to/your_training_script.py',
    role=sagemaker.get_execution_role(),
    base_job_name='debugger-built-in-rules-demo',
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",

    # Debugger-specific arguments below
    rules=built_in_rules
)
sagemaker_estimator.fit()
```

**Note**  
The Debugger built-in rules run in parallel with your training job. The maximum number of built-in rule containers for a training job is 20. 
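Because of that limit, the rule list can be validated before constructing the estimator. The following is a minimal sketch; `validate_rule_count` is a hypothetical helper, not part of the SageMaker SDK.

```python
# Hypothetical pre-flight check: a training job supports at most 20
# built-in rule containers, so validate the rule list size before
# passing it to the estimator's rules parameter.
MAX_RULE_CONTAINERS = 20

def validate_rule_count(rules):
    """Raise ValueError if more rules are configured than one training job allows."""
    if len(rules) > MAX_RULE_CONTAINERS:
        raise ValueError(
            f"{len(rules)} rules configured; a training job supports "
            f"at most {MAX_RULE_CONTAINERS} rule containers."
        )
    return rules
```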

For more information about the Debugger rule class, methods, and parameters, see the [SageMaker Debugger Rule class](https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). 

To find an example of how to adjust the Debugger rule parameters, see the following [Use Debugger built-in rules with parameter modifications](#debugger-deploy-modified-built-in-rules) section.

## Use Debugger built-in rules with parameter modifications
<a name="debugger-deploy-modified-built-in-rules"></a>

The following code example shows the structure of a built-in rule with adjusted parameters. In this example, the `stalled_training_rule` collects the `losses` tensor collection from a training job every 50 steps during training and every 10 steps during evaluation. If the training process stalls and does not collect tensor outputs for 120 seconds, the `stalled_training_rule` stops the training job.

```
import time

import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import Rule, CollectionConfig, rule_configs

# Call the built-in rule and modify the CollectionConfig parameters

base_job_name_prefix='smdebug-stalled-demo-' + str(int(time.time()))

built_in_rules_modified=[
    Rule.sagemaker(
        base_config=rule_configs.stalled_training_rule(),
        rule_parameters={
            'threshold': '120',
            'training_job_name_prefix': base_job_name_prefix,
            'stop_training_on_fire': 'True'
        },
        collections_to_save=[
            CollectionConfig(
                name="losses",
                parameters={
                    "train.save_interval": "50",
                    "eval.save_interval": "10"
                }
            )
        ]
    )
]

# Construct a SageMaker AI estimator with the modified Debugger built-in rule
sagemaker_estimator=TensorFlow(
    entry_point='directory/to/your_training_script.py',
    role=sagemaker.get_execution_role(),
    base_job_name=base_job_name_prefix,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",

    # Debugger-specific arguments below
    rules=built_in_rules_modified
)
sagemaker_estimator.fit()
```

For an advanced configuration of the Debugger built-in rules using the `CreateTrainingJob` API, see [Configure Debugger using SageMaker API](debugger-createtrainingjob-api.md).

# Turn off Debugger
<a name="debugger-turn-off"></a>

If you want to completely turn off Debugger, do one of the following:
+ Before starting a training job, do the following:

  To stop both monitoring and profiling, include the `disable_profiler` parameter in your estimator and set it to `True`.
**Warning**  
If you disable it, you won't be able to view the comprehensive Studio Debugger insights dashboard and the autogenerated profiling report.

  To stop debugging, set the `debugger_hook_config` parameter to `False`.
**Warning**  
If you disable it, you won't be able to collect output tensors and cannot debug your model parameters.

  ```
  estimator=Estimator(
      ...
      disable_profiler=True,
      debugger_hook_config=False
  )
  ```

  For more information about the Debugger-specific parameters, see [SageMaker AI Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).
+ While a training job is running, do the following:

  To disable both monitoring and profiling while your training job is running, use the following estimator method:

  ```
  estimator.disable_profiling()
  ```

  To disable framework profiling only and keep system monitoring, use the `update_profiler` method:

  ```
  estimator.update_profiler(disable_framework_metrics=True)
  ```

  For more information about the estimator extension methods, see the [estimator.disable\_profiling](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator.disable_profiling) and [estimator.update\_profiler](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator.update_profiler) methods in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) documentation.

# Useful SageMaker AI estimator class methods for Debugger
<a name="debugger-estimator-classmethods"></a>

The following estimator class methods are useful for accessing your SageMaker training job information and retrieving output paths of the training data collected by Debugger. You can call these methods after you initiate a training job with the `estimator.fit()` method.
+ To check the base S3 bucket URI of a SageMaker training job:

  ```
  estimator.output_path
  ```
+ To check the job name of a SageMaker training job:

  ```
  estimator.latest_training_job.job_name
  ```
+ To see a full `CreateTrainingJob` API operation configuration of a SageMaker training job:

  ```
  estimator.latest_training_job.describe()
  ```
+ To check a full list of the Debugger rules while a SageMaker training job is running:

  ```
  estimator.latest_training_job.rule_job_summary()
  ```
+ To check the S3 bucket URI where the model parameter data (output tensors) are saved:

  ```
  estimator.latest_job_debugger_artifacts_path()
  ```
+ To check the S3 bucket URI where the model performance data (system and framework metrics) are saved:

  ```
  estimator.latest_job_profiler_artifacts_path()
  ```
+ To check the Debugger rule configuration for debugging output tensors:

  ```
  estimator.debugger_rule_configs
  ```
+ To check the list of the Debugger rules for debugging while a SageMaker training job is running:

  ```
  estimator.debugger_rules
  ```
+ To check the Debugger rule configuration for monitoring and profiling system and framework metrics:

  ```
  estimator.profiler_rule_configs
  ```
+ To check the list of the Debugger rules for monitoring and profiling while a SageMaker training job is running:

  ```
  estimator.profiler_rules
  ```
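As an illustration of working with `rule_job_summary()`, the sketch below filters a summary for rules whose evaluation found issues. The sample payload is hypothetical; the field names follow the `DebugRuleEvaluationStatuses` entries in the `DescribeTrainingJob` API response.

```python
# A sketch of post-processing the output of
# estimator.latest_training_job.rule_job_summary(), which returns a list
# of dicts describing each rule evaluation. The sample payload below is
# illustrative, not output from a real training job.
def rules_with_issues(summary):
    """Return the names of rules whose evaluation status is IssuesFound."""
    return [
        item["RuleConfigurationName"]
        for item in summary
        if item.get("RuleEvaluationStatus") == "IssuesFound"
    ]

sample_summary = [
    {"RuleConfigurationName": "VanishingGradient",
     "RuleEvaluationStatus": "NoIssuesFound"},
    {"RuleConfigurationName": "LossNotDecreasing",
     "RuleEvaluationStatus": "IssuesFound"},
]
print(rules_with_issues(sample_summary))  # prints ['LossNotDecreasing']
```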

For more information about the SageMaker AI estimator class and its methods, see [Estimator API](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).