Learn how to turn on SageMaker Training Compiler by using the SageMaker Python SDK or the SageMaker CreateTrainingJob API operation.

Run PyTorch Training Jobs with SageMaker Training Compiler

You can use any of the SageMaker interfaces to run a training job with SageMaker Training Compiler: Amazon SageMaker Studio Classic, Amazon SageMaker notebook instances, AWS SDK for Python (Boto3), and AWS Command Line Interface.

Using the SageMaker Python SDK

SageMaker Training Compiler for PyTorch is available through the SageMaker PyTorch and HuggingFace framework estimator classes. To turn on SageMaker Training Compiler, add the compiler_config parameter to the estimator. Import the TrainingCompilerConfig class and pass an instance of it to the compiler_config parameter. The following code examples show the structure of the SageMaker estimator classes with SageMaker Training Compiler turned on.

Tip

To get started with prebuilt models provided by PyTorch or Transformers, try the batch sizes provided in the reference table at Tested Models.

Note

Native PyTorch support is available in the SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK accordingly.

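Note

Starting with PyTorch v1.12.0, SageMaker Training Compiler containers for PyTorch are available. Note that the SageMaker Training Compiler containers for PyTorch are not prepackaged with Hugging Face Transformers. If you need to install the library in the container, add a requirements.txt file under the source directory when you submit a training job.

For PyTorch v1.11.0 and before, use the previous versions of the SageMaker Training Compiler containers for Hugging Face and PyTorch.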

For a complete list of framework versions and the corresponding container information, see Supported Frameworks.
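As a minimal sketch of the requirements.txt note above, the following helper writes a requirements.txt file into a directory that could then be passed as source_dir. The helper name write_requirements, the temporary directory, and the package list are hypothetical placeholders and are not part of the SageMaker Python SDK.

```python
from pathlib import Path
import tempfile


def write_requirements(source_dir, packages):
    """Write one package specifier per line to <source_dir>/requirements.txt."""
    source_path = Path(source_dir)
    source_path.mkdir(parents=True, exist_ok=True)
    requirements_file = source_path / "requirements.txt"
    requirements_file.write_text("\n".join(packages) + "\n")
    return requirements_file


# Hypothetical usage: a temporary directory stands in for your source_dir,
# and "transformers" stands in for whatever extra packages you need.
demo_dir = Path(tempfile.mkdtemp()) / "src"
demo_file = write_requirements(demo_dir, ["transformers"])
print(demo_file.read_text())
```

The training script would sit next to this file, so that the same directory serves as source_dir for the estimator.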

For information that fits your use case, see one of the following options.

PyTorch v1.12.0 and later

To compile and train a PyTorch model, configure a SageMaker PyTorch estimator with SageMaker Training Compiler as shown in the following code example.

Note

This native PyTorch support is available in the SageMaker Python SDK v2.120.0 and later. Make sure that you update the SageMaker Python SDK.


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='train.py',
    source_dir='path-to-requirements-file', # Optional. Add this if you need to install additional packages.
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()
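The learning rate adjustment in the example above follows a linear scaling rule. As a rough sketch, it can be wrapped in a small helper that also accepts the num_gpus and instance_count factors that the distributed examples on this page use. The function name scale_learning_rate is a hypothetical choice for illustration, and whether linear scaling suits a given model is an assumption you should validate.

```python
def scale_learning_rate(lr_native, batch_size_native, batch_size,
                        num_gpus=1, instance_count=1):
    """Linearly scale the learning rate with the increase in batch size."""
    if batch_size_native <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # Same arithmetic as the examples on this page:
    # lr_native / batch_size_native * batch_size (* num_gpus * instance_count)
    return lr_native / batch_size_native * batch_size * num_gpus * instance_count


# Single-GPU values from the example above.
single_gpu_lr = scale_learning_rate(5e-5, 12, 64)

# Values from the distributed examples (4 GPUs on 1 instance).
multi_gpu_lr = scale_learning_rate(5e-5, 16, 26, num_gpus=4, instance_count=1)

print(f"{single_gpu_lr:.3e}", f"{multi_gpu_lr:.3e}")
```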

Hugging Face Transformers with PyTorch v1.11.0 and before

To compile and train a transformer model with PyTorch, configure a SageMaker Hugging Face estimator with SageMaker Training Compiler as shown in the following code example.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='train.py',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

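To prepare your training script, see the following pages.

For single GPU training of a PyTorch model using the Hugging Face Transformers Trainer API
For single GPU training of a PyTorch model without the Hugging Face Transformers Trainer API

To find end-to-end examples, see the following notebooks.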

PyTorch v1.12

For PyTorch v1.12, you can run distributed training with SageMaker Training Compiler by adding the pytorchxla option to the distribution parameter of the SageMaker PyTorch estimator class.

Note

This native PyTorch support is available in the SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK.


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='your_training_script.py',
    source_dir='path-to-requirements-file', # Optional. Add this if you need to install additional packages.
    instance_count=instance_count,
    instance_type=instance_type,
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution={'pytorchxla': {'enabled': True}},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()

Tip

To prepare your training script, see PyTorch.

Transformers v4.21 with PyTorch v1.11

For PyTorch v1.11 and later, SageMaker Training Compiler is available for distributed training with the pytorchxla option specified in the distribution parameter.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='your_training_script.py',
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution={'pytorchxla': {'enabled': True}},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

Tip

To prepare your training script, see the following pages.

For distributed training of a PyTorch model using the Hugging Face Transformers Trainer API
For distributed training of a PyTorch model without the Hugging Face Transformers Trainer API

Transformers v4.17 with PyTorch v1.10.2 and before

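For the supported versions of PyTorch v1.10.2 and before, SageMaker Training Compiler requires an alternate mechanism for launching a distributed training job. To run distributed training, pass a SageMaker distributed training launcher script to the entry_point argument, and pass your training script to the hyperparameters argument. The following code example shows how to configure a SageMaker Hugging Face estimator with the required changes.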


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

training_script="your_training_script.py"

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate,
    "training_script": training_script     # Specify the file name of your training script.
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='distributed_training_launcher.py',    # Specify the distributed training launcher script.
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

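The launcher script should look like the following. It wraps your training script and configures the distributed training environment depending on the size of the training instance of your choice.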


# distributed_training_launcher.py

#!/bin/python

import subprocess
import sys

if __name__ == "__main__":
    arguments_command = " ".join([arg for arg in sys.argv[1:]])
    """
    The following line takes care of setting up an inter-node communication
    as well as managing intra-node workers for each GPU.
    """
    subprocess.check_call("python -m torch_xla.distributed.sm_dist " + arguments_command, shell=True)
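To see what the launcher ends up running, the following sketch rebuilds the command string from an argument list without executing it. It assumes that the hyperparameters reach the launcher as --key value command line arguments, and the helper name build_launch_command is hypothetical. Unlike the launcher above, this variant quotes each argument with shlex.quote, which only changes the result for arguments that contain spaces or shell metacharacters.

```python
import shlex


def build_launch_command(argv):
    """Return the shell command the launcher would run for the given argv."""
    # argv[0] is the launcher script itself; the rest is forwarded.
    forwarded = " ".join(shlex.quote(arg) for arg in argv[1:])
    return "python -m torch_xla.distributed.sm_dist " + forwarded


# Hypothetical argv, assuming hyperparameters arrive as --key value pairs.
example_argv = [
    "distributed_training_launcher.py",
    "--training_script", "your_training_script.py",
    "--batch_size", "26",
]
command = build_launch_command(example_argv)
print(command)
```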

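Tip

To prepare your training script, see the following pages.

For distributed training of a PyTorch model using the Hugging Face Transformers Trainer API
For distributed training of a PyTorch model without the Hugging Face Transformers Trainer API

Tip

To find end-to-end examples, see the following notebooks.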

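The following list is the minimal set of parameters required to run a SageMaker training job with the compiler.

Note

When using the SageMaker Hugging Face estimator, you must specify the transformers_version, pytorch_version, hyperparameters, and compiler_config parameters to enable SageMaker Training Compiler. You cannot use image_uri to manually specify the Training Compiler integrated Deep Learning Containers that are listed at Supported Frameworks.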

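entry_point (str) - Required. Specify the file name of your training script.
Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, specify the file name of a launcher script for this parameter. Prepare the launcher script so that it wraps your training script and configures the distributed training environment. For more information, see the following example notebooks.
- Compile and Train the GPT2 Model using the Transformers Trainer API with the SST2 Dataset for Single-Node Multi-GPU Training
- Compile and Train the GPT2 Model using the Transformers Trainer API with the SST2 Dataset for Multi-Node Multi-GPU Training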
source_dir (str) - Optional. Add this if you need to install additional packages. To install packages, you need to prepare a requirements.txt file under this directory.
instance_count (int) - Required. Specify the number of instances.
instance_type (str) - Required. Specify the instance type.
transformers_version (str) - Required only when using the SageMaker Hugging Face estimator. Specify the Hugging Face Transformers library version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.
framework_version or pytorch_version (str) - Required. Specify the PyTorch version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.

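Note
When using the SageMaker Hugging Face estimator, you must specify both transformers_version and pytorch_version.
hyperparameters (dict) - Optional. Specify hyperparameters for the training job, such as n_gpus, batch_size, and learning_rate. When you enable SageMaker Training Compiler, try larger batch sizes and adjust the learning rate accordingly. To find case studies of using the compiler and adjusted batch sizes to improve training speed, see Tested Models and SageMaker Training Compiler Example Notebooks and Blogs.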

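Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, you need to add an additional parameter, "training_script", to specify your training script, as shown in the preceding code example.
compiler_config (TrainingCompilerConfig object) - Required to activate SageMaker Training Compiler. Include this parameter to turn on SageMaker Training Compiler. The following are parameters for the TrainingCompilerConfig class.
- enabled (bool) - Optional. Specify True or False to turn SageMaker Training Compiler on or off. The default value is True.
- debug (bool) - Optional. To receive more detailed training logs from your compiler-accelerated training jobs, change it to True. However, the additional logging might add overhead and slow down the compiled training job. The default value is False.
distribution (dict) - Optional. To run a distributed training job with SageMaker Training Compiler, add distribution = { 'pytorchxla' : { 'enabled': True }}.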
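The parameter list above can be turned into a quick pre-flight check before you construct a Hugging Face estimator. The following is only a sketch: the function check_huggingface_kwargs and the tuple of required keys are hypothetical helpers written for this page, they treat hyperparameters as optional as in the list, and they do not call or replace any validation performed by the SageMaker Python SDK.

```python
REQUIRED_HUGGINGFACE_KEYS = (
    "entry_point",
    "instance_count",
    "instance_type",
    "transformers_version",
    "pytorch_version",
    "compiler_config",
)


def check_huggingface_kwargs(kwargs):
    """Return a list of problems found in a dict of estimator keyword arguments."""
    problems = [
        f"missing required parameter: {key}"
        for key in REQUIRED_HUGGINGFACE_KEYS
        if key not in kwargs
    ]
    # image_uri cannot be used to select a Training Compiler container.
    if "image_uri" in kwargs:
        problems.append("image_uri cannot be used with Training Compiler")
    return problems


example_kwargs = {
    "entry_point": "train.py",
    "instance_count": 1,
    "instance_type": "ml.p3.2xlarge",
    "transformers_version": "4.21.1",
    "pytorch_version": "1.11.0",
    "compiler_config": object(),  # stands in for TrainingCompilerConfig()
}
print(check_huggingface_kwargs(example_kwargs))
```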

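Warning

If you turn on SageMaker Debugger, it might impact the performance of SageMaker Training Compiler. We recommend that you turn off Debugger when running SageMaker Training Compiler to make sure there is no impact on performance. For more information, see Considerations. To turn the Debugger functionalities off, add the following two arguments to the estimator: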


disable_profiler=True,
debugger_hook_config=False

If the training job with the compiler is launched successfully, you receive the following logs during the job initialization phase:

With TrainingCompilerConfig(debug=False)


Found configuration for Training Compiler
Configuring SM Training Compiler...

With TrainingCompilerConfig(debug=True)


Found configuration for Training Compiler
Configuring SM Training Compiler...
Training Compiler set to debug mode
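If you collect the job logs as text, a simple substring check is enough to confirm which of the two cases above applies. The sketch below assumes the log lines are already available as a list of strings. The helper name compiler_status_from_logs is hypothetical, and how you retrieve the logs is outside its scope.

```python
def compiler_status_from_logs(log_lines):
    """Report whether the compiler and its debug mode appear in the log lines."""
    text = "\n".join(log_lines)
    return {
        "configured": "Configuring SM Training Compiler" in text,
        "debug": "Training Compiler set to debug mode" in text,
    }


# Sample lines copied from the debug=True case above.
sample_logs = [
    "Found configuration for Training Compiler",
    "Configuring SM Training Compiler...",
    "Training Compiler set to debug mode",
]
status = compiler_status_from_logs(sample_logs)
print(status)
```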

Using the SageMaker `CreateTrainingJob` API Operation

SageMaker Training Compiler configuration options must be specified through the AlgorithmSpecification and HyperParameters fields in the request syntax for the CreateTrainingJob API operation.


"AlgorithmSpecification": {
    "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>"
},

"HyperParameters": {
    "sagemaker_training_compiler_enabled": "true",
    "sagemaker_training_compiler_debug_mode": "false",
    "sagemaker_pytorch_xla_multi_worker_enabled": "false"    // set to "true" for distributed training
}
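As a sketch of how these two fields might be assembled in Python before they are merged into a full request, the helper below builds them from three inputs. The function name compiler_request_fields is hypothetical, and the image URI is left as the same placeholder used above. The other fields that a real CreateTrainingJob request requires (for example TrainingJobName, RoleArn, and the TrainingInputMode entry of AlgorithmSpecification) are deliberately omitted.

```python
import json


def compiler_request_fields(training_image, debug=False, distributed=False):
    """Build the two request fragments that carry Training Compiler settings."""
    def as_flag(value):
        # Hyperparameter values are strings in the request: "true" or "false".
        return "true" if value else "false"

    return {
        "AlgorithmSpecification": {"TrainingImage": training_image},
        "HyperParameters": {
            "sagemaker_training_compiler_enabled": "true",
            "sagemaker_training_compiler_debug_mode": as_flag(debug),
            "sagemaker_pytorch_xla_multi_worker_enabled": as_flag(distributed),
        },
    }


# Placeholder image URI; distributed=True mirrors the comment in the fragment above.
fields = compiler_request_fields(
    "<sagemaker-training-compiler-enabled-dlc-image>", distributed=True
)
print(json.dumps(fields, indent=4))
```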

To find a complete list of deep learning container image URIs that have SageMaker Training Compiler implemented, see Supported Frameworks.
