本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
使用 Python 使用调试器启动训练作业 SageMaker SDK
要使用调试器配置 SageMaker 估算器,请使用 Amaz on Pyth SageMaker on SDK 并指定 SageMaker 调试器特定的参数。要充分利用调试功能,需要配置三个参数:debugger_hook_config
、tensorboard_output_config
和 rules
。
使用调试器特定的参数构造 SageMaker 估计器
本节中的代码示例展示了如何使用调试器特定的参数构造 SageMaker 估计器。
以下代码示例是用于构造 SageMaker 框架估算器的模板,不能直接执行。您需要继续完成下一个部分中的内容,配置 Debugger 特定的参数。
- PyTorch
-
# An example of constructing a SageMaker PyTorch estimator
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
session=boto3.session.Session()
region=session.region_name
debugger_hook_config
=DebuggerHookConfig(...)
rules
=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=PyTorch(
entry_point="directory/to/your_training_script.py
",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo
",
instance_count=1
,
instance_type="ml.p3.2xlarge
",
framework_version="1.12.0
",
py_version="py37
",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config
,
rules=rules
)
estimator.fit(wait=False)
- TensorFlow
-
# An example of constructing a SageMaker TensorFlow estimator
import boto3
import sagemaker
from sagemaker.tensorflow import TensorFlow
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
session=boto3.session.Session()
region=session.region_name
debugger_hook_config
=DebuggerHookConfig(...)
rules
=[
Rule.sagemaker(rule_configs.built_in_rule())
,
ProfilerRule.sagemaker(rule_configs.BuiltInRule())
]
estimator=TensorFlow(
entry_point="directory/to/your_training_script.py
",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo
",
instance_count=1
,
instance_type="ml.p3.2xlarge
",
framework_version="2.9.0
",
py_version="py39
",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config
,
rules=rules
)
estimator.fit(wait=False)
- MXNet
-
# An example of constructing a SageMaker MXNet estimator
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config
=DebuggerHookConfig(...)
rules
=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=MXNet(
entry_point="directory/to/your_training_script.py
",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo
",
instance_count=1
,
instance_type="ml.p3.2xlarge
",
framework_version="1.7.0
",
py_version="py37
",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config
,
rules=rules
)
estimator.fit(wait=False)
- XGBoost
-
# An example of constructing a SageMaker XGBoost estimator
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config
=DebuggerHookConfig(...)
rules
=[
Rule.sagemaker(rule_configs.built_in_rule())
]
estimator=XGBoost(
entry_point="directory/to/your_training_script.py
",
role=sagemaker.get_execution_role(),
base_job_name="debugger-demo
",
instance_count=1
,
instance_type="ml.p3.2xlarge
",
framework_version="1.5-1
",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config
,
rules=rules
)
estimator.fit(wait=False)
- Generic estimator
-
# An example of constructing a SageMaker generic estimator using the XGBoost algorithm base image
import boto3
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker import image_uris
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs
debugger_hook_config
=DebuggerHookConfig(...)
rules
=[
Rule.sagemaker(rule_configs.built_in_rule())
]
region=boto3.Session().region_name
xgboost_container=sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")
estimator=Estimator(
role=sagemaker.get_execution_role()
image_uri=xgboost_container,
base_job_name="debugger-demo
",
instance_count=1
,
instance_type="ml.m5.2xlarge
",
# Debugger-specific parameters
debugger_hook_config=debugger_hook_config
,
rules=rules
)
estimator.fit(wait=False)
配置以下参数以激活 SageMaker 调试器:
SageMaker 调试器将输出张量安全地保存在 S3 存储桶的子文件夹中。例如,您的账户中默认 S3 存储桶URI的格式为s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/
。 SageMaker 调试器创建了两个子文件夹:debug-output
、和。rule-output
如果您添加 tensorboard_output_config
参数,则还会找到 tensorboard-output
文件夹。
请参阅以下主题,查找更多详细说明如何配置 Debugger 特定参数的示例。