Learn how to activate SageMaker Training Compiler by using the SageMaker Python SDK or the SageMaker AI CreateTrainingJob API operation.

Run PyTorch Training Jobs with SageMaker Training Compiler

With SageMaker Training Compiler, you can run training jobs through any of the following Amazon SageMaker interfaces: Amazon SageMaker Studio Classic, Amazon SageMaker notebook instances, the AWS SDK for Python (Boto3), and the AWS Command Line Interface.

Topics

Use the SageMaker Python SDK
Use the SageMaker AI CreateTrainingJob API operation

Use the SageMaker Python SDK

SageMaker Training Compiler for PyTorch is available through the SageMaker AI PyTorch and HuggingFace framework estimator classes. To turn on SageMaker Training Compiler, add the compiler_config parameter to the SageMaker AI estimator: import the TrainingCompilerConfig class and pass an instance of it to the compiler_config parameter. The following code examples show the structure of SageMaker AI estimator classes with SageMaker Training Compiler turned on.

Tip

To get started with prebuilt models provided by PyTorch or Transformers, try using the batch sizes given in the reference table at Tested Models.

Note

Native PyTorch support is available in the SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK accordingly.
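The version notes on this page can be checked from Python. The following is a minimal sketch using only the standard library; the helpers `parse_version` and `meets_minimum` are hypothetical, compare only the numeric parts of a version string, and default to the v2.121.0 minimum stated in the note above.

```python
# Minimal sketch: check whether an installed SageMaker Python SDK version meets
# the minimum version required for native PyTorch support.
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.121.0' into (2, 121, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2.121.0") -> bool:
    """Return True if `installed` >= `minimum`, padding missing parts with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        print("The sagemaker package is not installed.")
    else:
        status = "OK" if meets_minimum(installed) else "upgrade with: pip install --upgrade sagemaker"
        print(f"sagemaker {installed}: {status}")
```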

Note

Starting with PyTorch v1.12.0, SageMaker Training Compiler containers for PyTorch are available. Note that the SageMaker Training Compiler containers for PyTorch are not prepackaged with Hugging Face Transformers. If you need to install the library in the container, add a requirements.txt file under the source directory when you submit your training job.

For PyTorch v1.11.0 and earlier, use the previous versions of the SageMaker Training Compiler containers for Hugging Face and PyTorch.
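Because the PyTorch containers don't include Hugging Face Transformers, a typical source directory holds the entry-point script next to a requirements.txt file. The following is a minimal sketch using only the standard library; `prepare_source_dir` is a hypothetical helper, and the package list is an assumption you should adapt to your own script.

```python
# Minimal sketch: lay out a source directory to pass as the estimator's
# source_dir argument, with the entry-point script and a requirements.txt file.
import tempfile
from pathlib import Path


def prepare_source_dir(root: Path, packages: list, entry_point: str = "train.py") -> Path:
    """Write root/requirements.txt (one package per line) and a placeholder entry point."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "requirements.txt").write_text("\n".join(packages) + "\n")
    script = root / entry_point
    if not script.exists():
        # Placeholder only; copy your real training script here instead.
        script.write_text("# training script goes here\n")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package list; pin a version that matches your training code.
        src = prepare_source_dir(Path(tmp) / "src", ["transformers"])
        print(sorted(p.name for p in src.iterdir()))
```

You would then pass the directory path as `source_dir` (for example, `source_dir=str(src)`) together with `entry_point='train.py'`.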

For a complete list of framework versions and the corresponding container information, see Supported Frameworks.

For information that fits your use case, see one of the following options.

PyTorch v1.12.0 and later

To compile and train a PyTorch model, configure a SageMaker AI PyTorch estimator with SageMaker Training Compiler as shown in the following code example.

Note

This native PyTorch support is available in the SageMaker AI Python SDK v2.120.0 and later. Make sure that you update the SageMaker AI Python SDK.


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# scale the learning rate linearly with the batch size
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='train.py',
    source_dir='path-to-source-dir',  # Optional. Directory with train.py and a requirements.txt; add it if you need to install additional packages.
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()
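The examples on this page scale the learning rate in proportion to the batch size (a linear scaling heuristic). The following minimal sketch factors that arithmetic into a helper; the function name `scale_learning_rate` is hypothetical, and linear scaling is a starting point to tune from, not a guarantee of convergence.

```python
# Minimal sketch of the learning-rate arithmetic in the estimator examples:
# scale the native learning rate in proportion to the (global) batch size.
def scale_learning_rate(learning_rate_native: float, batch_size_native: int,
                        batch_size: int, num_gpus: int = 1,
                        instance_count: int = 1) -> float:
    """Return learning_rate_native / batch_size_native * batch_size * num_gpus * instance_count."""
    if batch_size_native <= 0:
        raise ValueError("batch_size_native must be positive")
    return learning_rate_native / batch_size_native * batch_size * num_gpus * instance_count


if __name__ == "__main__":
    # Single-GPU values from the example above: batch size 12 -> 64.
    print(scale_learning_rate(5e-5, 12, 64))
    # Distributed values used later on this page: 16 -> 26 per GPU, 4 GPUs, 1 instance.
    print(scale_learning_rate(5e-5, 16, 26, num_gpus=4, instance_count=1))
```

The same helper covers both the single-GPU and the distributed examples; only `num_gpus` and `instance_count` change.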

Hugging Face Transformers with PyTorch v1.11.0 and before

To compile and train a transformer model with PyTorch, configure a SageMaker AI Hugging Face estimator with SageMaker Training Compiler as shown in the following code example.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# scale the learning rate linearly with the batch size
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='train.py',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

To prepare your training script, see the following pages.

For single-GPU training of a PyTorch model using the Hugging Face Transformers Trainer API
For single-GPU training of a PyTorch model without the Hugging Face Transformers Trainer API

For end-to-end examples, see the following notebooks:

PyTorch v1.12

For PyTorch v1.12 and later, you can run distributed training with SageMaker Training Compiler by adding the pytorchxla option to the distribution parameter of the SageMaker AI PyTorch estimator class.

Note

This native PyTorch support is available in the SageMaker AI Python SDK v2.121.0 and later. Make sure that you update the SageMaker AI Python SDK.


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set num_gpus to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=26

# scale the learning rate linearly with the global batch size
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='your_training_script.py',
    source_dir='path-to-source-dir',  # Optional. Directory with your training script and a requirements.txt; add it if you need to install additional packages.
    instance_count=instance_count,
    instance_type=instance_type,
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution={'pytorchxla': {'enabled': True}},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()
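In the distributed examples, the scaled learning rate depends on the global batch size, that is, the per-GPU batch size multiplied by the number of GPUs per instance and the number of instances. Worked through with the values in the code above (the symbols are introduced here for illustration: η for learning_rate, B for batch_size, G for num_gpus, N for instance_count):

```latex
\eta = \frac{\eta_{\text{native}}}{B_{\text{native}}} \cdot B \cdot G \cdot N
     = \frac{5 \times 10^{-5}}{16} \cdot 26 \cdot 4 \cdot 1
     = 3.25 \times 10^{-4}
```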

Tip

To prepare your training script, see PyTorch.

Transformers v4.21 with PyTorch v1.11

For PyTorch v1.11 and later, you can use SageMaker Training Compiler for distributed training by specifying the pytorchxla option in the distribution parameter.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set num_gpus to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=26

# scale the learning rate linearly with the global batch size
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='your_training_script.py',
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution={'pytorchxla': {'enabled': True}},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

Tip

To prepare your training script, see the following pages.

For distributed training of a PyTorch model using the Hugging Face Transformers Trainer API
For distributed training of a PyTorch model without the Hugging Face Transformers Trainer API

Transformers v4.17 with PyTorch v1.10.2 and before

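For supported PyTorch versions v1.10.2 and earlier, SageMaker Training Compiler requires an alternative mechanism for launching a distributed training job. To run distributed training, you must pass a SageMaker AI distributed training launcher script to the entry_point argument and pass your training script to the hyperparameters argument. The following code example shows how to configure a SageMaker AI Hugging Face estimator with these changes applied.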


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set num_gpus to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=26

# scale the learning rate linearly with the global batch size
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

training_script="your_training_script.py"

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate,
    "training_script": training_script     # Specify the file name of your training script.
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='distributed_training_launcher.py',    # Specify the distributed training launcher script.
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

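The launcher script should look like the following. It wraps your training script and configures the distributed training environment according to the size of the training instance you choose.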


#!/usr/bin/env python
# distributed_training_launcher.py

import subprocess
import sys

if __name__ == "__main__":
    # Forward the launcher's command-line arguments (SageMaker passes the
    # hyperparameters here, including training_script) to sm_dist.
    arguments_command = " ".join(sys.argv[1:])
    # The following line takes care of setting up inter-node communication
    # as well as managing intra-node workers for each GPU.
    subprocess.check_call("python -m torch_xla.distributed.sm_dist " + arguments_command, shell=True)
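The launcher above joins its arguments into a single shell string. The following is a minimal sketch of an alternative, assuming that SageMaker passes each hyperparameter to the entry point as a `--name value` pair: building the command as a list avoids shell quoting problems when a value contains spaces. The helper `build_launch_command` is hypothetical, and it uses `sys.executable` in place of the bare `python` command, assuming both refer to the same interpreter in the container.

```python
# Minimal sketch: build the sm_dist launch command as an argument list instead
# of a shell string, so that argument values are passed through unchanged.
import shlex
import sys


def build_launch_command(script_args: list) -> list:
    """Prefix the launcher's arguments with the torch_xla sm_dist module invocation."""
    return [sys.executable, "-m", "torch_xla.distributed.sm_dist", *script_args]


if __name__ == "__main__":
    # Hypothetical arguments, shaped like the hyperparameters in the example above.
    example_args = ["--n_gpus", "4", "--batch_size", "26",
                    "--training_script", "your_training_script.py"]
    # Print the command for inspection; in the launcher you would run
    # subprocess.check_call(build_launch_command(sys.argv[1:])) instead.
    print(shlex.join(build_launch_command(example_args)))
```

Passing a list to `subprocess.check_call` without `shell=True` hands each argument to the child process as-is.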

Tip

To prepare your training script, see the following pages.

For distributed training of a PyTorch model using the Hugging Face Transformers Trainer API
For distributed training of a PyTorch model without the Hugging Face Transformers Trainer API

Tip

For end-to-end examples, see the following notebooks:

The following list is the minimal set of parameters required to run a SageMaker training job with the compiler.

Note

When using the SageMaker AI Hugging Face estimator, you must specify the transformers_version, pytorch_version, hyperparameters, and compiler_config parameters to turn on SageMaker Training Compiler. You cannot use image_uri to manually specify the Training Compiler integrated Deep Learning Containers listed at Supported Frameworks.

entry_point (str): Required. Specify the file name of your training script.
Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 or earlier, specify the file name of a launcher script for this parameter. The launcher script should be prepared to wrap your training script and configure the distributed training environment. For more information, see the following example notebooks:
- Compile and train a GPT2 model using the Transformers Trainer API with the SST2 dataset for single-node multi-GPU training
- Compile and train a GPT2 model using the Transformers Trainer API with the SST2 dataset for multi-node multi-GPU training
source_dir (str): Optional. Add this if you need to install additional packages. To install packages, place a requirements.txt file under this directory.
instance_count (int): Required. Specify the number of instances.
instance_type (str): Required. Specify the instance type.
transformers_version (str): Required only when using the SageMaker AI Hugging Face estimator. Specify the Hugging Face Transformers library version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.
framework_version or pytorch_version (str): Required. Specify the PyTorch version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.

Note
When using the SageMaker AI Hugging Face estimator, you must specify both transformers_version and pytorch_version.
hyperparameters (dict): Optional. Specify hyperparameters for the training job, such as n_gpus, batch_size, and learning_rate. When you turn on SageMaker Training Compiler, try larger batch sizes and adjust the learning rate accordingly. For case studies of using the compiler and adjusting batch sizes to improve training speed, see Tested Models and SageMaker Training Compiler Example Notebooks and Blogs.

Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 or earlier, you must add the additional parameter "training_script" to specify your training script, as shown in the preceding code example.
compiler_config (TrainingCompilerConfig object): Required to turn on SageMaker Training Compiler. The TrainingCompilerConfig class takes the following parameters:
- enabled (bool): Optional. Specify True or False to turn SageMaker Training Compiler on or off. The default value is True.
- debug (bool): Optional. To receive more detailed training logs from your compiler-accelerated training jobs, set this to True. However, the additional logging might add overhead and slow down the compiled training job. The default value is False.
distribution (dict): Optional. To run a distributed training job with SageMaker Training Compiler, add distribution={'pytorchxla': {'enabled': True}}.
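The minimal parameter set above can be checked before you create an estimator. The following is a minimal sketch: `missing_parameters` is a hypothetical helper that inspects only the keys of a keyword-argument dictionary and does not call the SageMaker Python SDK.

```python
# Minimal sketch: check estimator keyword arguments against the minimal
# parameter set listed above. Only dictionary keys are inspected; nothing here
# calls the SageMaker Python SDK.
COMMON_REQUIRED = {"entry_point", "instance_count", "instance_type", "compiler_config"}
REQUIRED_BY_ESTIMATOR = {
    "pytorch": COMMON_REQUIRED | {"framework_version"},
    "huggingface": COMMON_REQUIRED | {"transformers_version", "pytorch_version"},
}


def missing_parameters(kwargs: dict, estimator: str) -> set:
    """Return the required keys that are absent for estimator 'pytorch' or 'huggingface'."""
    try:
        required = REQUIRED_BY_ESTIMATOR[estimator]
    except KeyError:
        raise ValueError(f"unknown estimator type: {estimator!r}") from None
    return required - set(kwargs)


if __name__ == "__main__":
    kwargs = {
        "entry_point": "train.py",
        "instance_count": 1,
        "instance_type": "ml.p3.2xlarge",
        "transformers_version": "4.21.1",
        # "pytorch_version" deliberately left out to show the check.
        "compiler_config": None,  # stands in for TrainingCompilerConfig()
    }
    print(missing_parameters(kwargs, "huggingface"))
```

An empty set means every required key is present; the check does not validate the values themselves.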

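Warning

Turning on SageMaker Debugger might affect the performance of SageMaker Training Compiler. We recommend that you turn off Debugger when running SageMaker Training Compiler so that performance is not affected. For more information, see Considerations. To turn off the Debugger functionalities, add the following two arguments to the estimator: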


disable_profiler=True,
debugger_hook_config=False

If the training job with the compiler is launched successfully, you receive the following logs during the job initialization phase:

With TrainingCompilerConfig(debug=False)


Found configuration for Training Compiler
Configuring SM Training Compiler...

With TrainingCompilerConfig(debug=True)


Found configuration for Training Compiler
Configuring SM Training Compiler...
Training Compiler set to debug mode
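To confirm from the job logs that the compiler was configured, you can search for the messages above. The following is a minimal sketch; the helper `compiler_status` is hypothetical, and fetching the log lines (for example, from Amazon CloudWatch Logs) is left out. It matches by substring so that timestamp prefixes on each line don't matter.

```python
# Minimal sketch: look for the Training Compiler initialization messages in a
# sequence of log lines. Matching is by substring, so timestamp prefixes are fine.
FOUND = "Found configuration for Training Compiler"
CONFIGURING = "Configuring SM Training Compiler..."
DEBUG_MODE = "Training Compiler set to debug mode"


def _seen(message: str, lines: list) -> bool:
    return any(message in line for line in lines)


def compiler_status(log_lines) -> dict:
    """Report whether the compiler was configured and whether debug mode is on."""
    lines = list(log_lines)
    return {
        "configured": _seen(FOUND, lines) and _seen(CONFIGURING, lines),
        "debug_mode": _seen(DEBUG_MODE, lines),
    }


if __name__ == "__main__":
    sample = [
        "Found configuration for Training Compiler",
        "Configuring SM Training Compiler...",
    ]
    print(compiler_status(sample))
```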

Use the SageMaker AI `CreateTrainingJob` API operation

SageMaker Training Compiler configuration options must be specified through the AlgorithmSpecification and HyperParameters fields in the request syntax for the CreateTrainingJob API operation.


"AlgorithmSpecification": {
    "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>"
},

"HyperParameters": {
    "sagemaker_training_compiler_enabled": "true",
    "sagemaker_training_compiler_debug_mode": "false",
    "sagemaker_pytorch_xla_multi_worker_enabled": "false"    // set to "true" for distributed training
}

For a complete list of deep learning container image URIs that have SageMaker Training Compiler implemented, see Supported Frameworks.
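The HyperParameters map in a CreateTrainingJob request takes string values, so the Boolean options are written as "true" or "false". The following minimal sketch builds the fragment above from Boolean flags; `compiler_hyperparameters` is a hypothetical helper, and a complete request also needs the other required CreateTrainingJob fields (such as TrainingJobName, RoleArn, OutputDataConfig, ResourceConfig, and StoppingCondition), which are omitted here.

```python
# Minimal sketch: build the CreateTrainingJob request fragment shown above from
# Boolean flags. HyperParameters values must be strings, so flags become "true"/"false".
import json


def _flag(value: bool) -> str:
    return "true" if value else "false"


def compiler_hyperparameters(enabled: bool = True, debug: bool = False,
                             distributed: bool = False) -> dict:
    """Return the SageMaker Training Compiler entries for the HyperParameters map."""
    return {
        "sagemaker_training_compiler_enabled": _flag(enabled),
        "sagemaker_training_compiler_debug_mode": _flag(debug),
        "sagemaker_pytorch_xla_multi_worker_enabled": _flag(distributed),
    }


if __name__ == "__main__":
    request_fragment = {
        "AlgorithmSpecification": {
            # Placeholder: replace with an image URI from Supported Frameworks.
            "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>",
        },
        "HyperParameters": compiler_hyperparameters(distributed=True),
    }
    print(json.dumps(request_fragment, indent=2))
```

With Boto3, a fragment like this would be merged into the full keyword arguments passed to `boto3.client("sagemaker").create_training_job(...)`.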
