Launch training jobs with Debugger using the SageMaker Python SDK
To configure a SageMaker AI estimator with SageMaker Debugger, use the Amazon SageMaker Python SDK and specify the Debugger-specific parameters debugger_hook_config, tensorboard_output_config, and rules.
Important
Before you construct the estimator and call its fit method to launch a training job, make sure that you adapt your training script by following the instructions in Adapting your training script to register a hook.
Constructing a SageMaker AI Estimator with Debugger-specific parameters
The code examples in this section show how to construct a SageMaker AI estimator with the Debugger-specific parameters.
Note
The following code examples are templates for constructing the SageMaker AI framework estimators and not directly executable. You need to proceed to the next sections and configure the Debugger-specific parameters.
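As a hedged illustration of such a template, the following sketch passes all three Debugger-specific parameters to a generic estimator. It assumes that the SageMaker Python SDK v2 (the sagemaker package) is installed. The function name build_debugger_estimator, its arguments, the S3 prefixes, the instance type, and the choice of the vanishing_gradient built-in rule are placeholders chosen for illustration, not values prescribed by this guide.

```python
def build_debugger_estimator(role_arn, image_uri, bucket):
    """Return a generic Estimator configured with the Debugger parameters.

    role_arn, image_uri, and bucket are hypothetical placeholders that you
    replace with your own IAM role, training image, and S3 bucket name.
    """
    # Imported inside the function so that defining it does not require
    # the SageMaker Python SDK to be installed.
    from sagemaker.debugger import (
        DebuggerHookConfig,
        Rule,
        TensorBoardOutputConfig,
        rule_configs,
    )
    from sagemaker.estimator import Estimator

    # Where the hook saves output tensors (placeholder prefix).
    hook_config = DebuggerHookConfig(
        s3_output_path=f"s3://{bucket}/debugger"
    )

    # Where TensorBoard-compatible output is saved (placeholder prefix).
    tensorboard_config = TensorBoardOutputConfig(
        s3_output_path=f"s3://{bucket}/tensorboard"
    )

    # One built-in rule as an example; any built-in rule could be listed here.
    rules = [Rule.sagemaker(rule_configs.vanishing_gradient())]

    return Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
        debugger_hook_config=hook_config,
        tensorboard_output_config=tensorboard_config,
        rules=rules,
    )
```

Calling fit on the returned estimator would launch the training job. Constructing the estimator needs valid AWS credentials and a Region, so the sketch only defines the configuration and does not run it.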
Configure the following parameters to activate SageMaker Debugger:
- debugger_hook_config (a DebuggerHookConfig object) – Required to activate the hook that you added in Adapting your training script to register a hook, to configure the SageMaker training launcher (estimator) to collect output tensors from your training job, and to save the tensors to your secured S3 bucket or local machine. To learn how to configure the debugger_hook_config parameter, see Configuring SageMaker Debugger to save tensors.
- rules (a list of Rule objects) – Configure this parameter to activate the SageMaker Debugger built-in rules that you want to run in real time. The built-in rules contain logic that automatically monitors the training progress of your model and finds training issues by analyzing the output tensors saved in your secured S3 bucket. To learn how to configure the rules parameter, see How to configure Debugger built-in rules. To find a complete list of built-in rules for debugging output tensors, see Debugger rule. If you want to create your own logic to detect training issues, see Creating custom rules using the Debugger client library.
  Note
  The built-in rules are available only through SageMaker training instances. You cannot use them in local mode.
- tensorboard_output_config (a TensorBoardOutputConfig object) – Configure SageMaker Debugger to collect output tensors in the TensorBoard-compatible format and save them to the S3 output path specified in the TensorBoardOutputConfig object. To learn more, see Visualize Amazon SageMaker Debugger output tensors in TensorBoard.
  Note
  The tensorboard_output_config parameter must be configured together with the debugger_hook_config parameter, which also requires you to adapt your training script by adding the sagemaker-debugger hook.
Note
SageMaker Debugger securely saves output tensors in subfolders of your S3 bucket. For example, the format of the default S3 bucket URI in your account is s3://amzn-s3-demo-bucket-sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/. SageMaker Debugger creates two subfolders: debug-output and rule-output. If you add the tensorboard_output_config parameter, you'll also find a tensorboard-output folder.
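The folder layout described in this note can be sketched with a small helper that assembles the expected URIs. This is only an illustration of the naming pattern stated above: the function name debugger_output_uris and its arguments are hypothetical, and the helper does not call any AWS API or check that the folders exist.

```python
def debugger_output_uris(bucket, base_job_name, with_tensorboard=False):
    """Build the expected S3 URIs of the Debugger subfolders for one job.

    bucket and base_job_name are placeholders for your own values.
    """
    # Debugger creates these two subfolders for every job.
    subfolders = ["debug-output", "rule-output"]

    # This folder appears only when tensorboard_output_config is set.
    if with_tensorboard:
        subfolders.append("tensorboard-output")

    prefix = f"s3://{bucket}/{base_job_name}"
    return {name: f"{prefix}/{name}/" for name in subfolders}


# Example with placeholder names.
uris = debugger_output_uris("amzn-s3-demo-bucket", "my-job", with_tensorboard=True)
for name, uri in uris.items():
    print(name, uri)
```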
See the following topics to find more examples of how to configure the Debugger-specific parameters in detail.