Activation checkpointing

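Activation checkpointing is a technique that reduces memory usage by clearing the activations of certain layers and recomputing them during the backward pass. In effect, it trades extra computation time for lower memory usage. If a module is checkpointed, only the initial inputs to the module and the final outputs from the module stay in memory at the end of the forward pass. PyTorch releases all intermediate tensors that are part of the computation inside that module during the forward pass. During the backward pass of a checkpointed module, PyTorch recomputes these tensors. By that point, the layers after the checkpointed module have already finished their backward pass, so the peak memory usage with checkpointing is lower.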
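
The memory argument above can be made concrete with a rough back-of-the-envelope model. The sketch below is an illustration only, not a measurement: the function name `peak_activation_units` and the unit sizes are hypothetical, all layers are assumed identical, and parameters, gradients, optimizer state, and allocator overhead are ignored.

```python
def peak_activation_units(num_layers, inner, boundary, checkpointed):
    """Estimate peak activation memory, in arbitrary units, for a stack of
    identical layers (a simplified model, not a profiler).

    inner:    size of the intermediate tensors one layer saves for backward
    boundary: size of one layer's output (which is the next layer's input)
    """
    if not checkpointed:
        # Every layer keeps its intermediates and its output until its own
        # backward pass runs, so everything is alive at the end of forward.
        return num_layers * (inner + boundary)
    # With every layer checkpointed, only the layer boundaries survive the
    # forward pass. During backward, one layer at a time recomputes its
    # intermediates, so at most one `inner` is alive at once.
    return num_layers * boundary + inner


print(peak_activation_units(24, inner=10, boundary=1, checkpointed=False))  # 264
print(peak_activation_units(24, inner=10, boundary=1, checkpointed=True))   # 34
```

The price of the lower peak is roughly one extra forward computation per checkpointed layer during the backward pass.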

SMP v2 supports the PyTorch activation checkpointing function, apply_activation_checkpointing. The following are examples of activation checkpointing for the Hugging Face GPT-NeoX model.

Checkpointing Transformer layers of the Hugging Face GPT-NeoX model


from transformers.models.gpt_neox import GPTNeoXLayer
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing
)
    
# check_fn receives a module as the arg, 
# and it needs to return whether the module is to be checkpointed
def is_transformer_layer(module):
    # GPTNeoXLayer is already imported at the top of this snippet
    return isinstance(module, GPTNeoXLayer)
    
apply_activation_checkpointing(model, check_fn=is_transformer_layer)

Checkpointing every other Transformer layer of the Hugging Face GPT-NeoX model


# check_fn receives a module as arg, 
# and it needs to return whether the module is to be checkpointed
# here we define that function based on global variable (transformer_layers)
from transformers.models.gpt_neox import GPTNeoXLayer
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing
)

transformer_layers = [
    m for m in model.modules() if isinstance(m, GPTNeoXLayer)
]

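def is_odd_transformer_layer(module):
    # Even 0-based indices correspond to the 1st, 3rd, 5th, ... layers.
    # Return False for any module that is not a transformer layer,
    # because list.index raises ValueError for items not in the list.
    return (
        isinstance(module, GPTNeoXLayer)
        and transformer_layers.index(module) % 2 == 0
    )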
    
apply_activation_checkpointing(model, check_fn=is_odd_transformer_layer)
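
The same selection can also be expressed without a global list. The following is a minimal sketch, assuming a hypothetical helper named `make_every_nth_check_fn` (it is not part of PyTorch or SMP v2). It precomputes the identities of the selected layers once, so each `check_fn` call is a set lookup. `FakeLayer` and `Other` are stand-in classes used only to make the sketch runnable without a model.

```python
def make_every_nth_check_fn(modules, layer_cls, n=2):
    """Build a check_fn that selects every n-th instance of layer_cls,
    starting with the first one. Hypothetical helper for illustration."""
    layers = [m for m in modules if isinstance(m, layer_cls)]
    # Compare by identity; the model keeps the modules alive, so ids stay valid.
    selected = {id(m) for m in layers[::n]}

    def check_fn(module):
        return id(module) in selected

    return check_fn


# Stand-in classes so the sketch runs without transformers or torch.
class FakeLayer:
    pass


class Other:
    pass


mods = [Other(), FakeLayer(), FakeLayer(), Other(), FakeLayer(), FakeLayer()]
check_fn = make_every_nth_check_fn(mods, FakeLayer, n=2)
picked = [check_fn(m) for m in mods]
print(picked)  # [False, True, False, False, True, False]

# With a real model, the intended usage would be:
# check_fn = make_every_nth_check_fn(model.modules(), GPTNeoXLayer, n=2)
# apply_activation_checkpointing(model, check_fn=check_fn)
```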

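Alternatively, PyTorch also provides the torch.utils.checkpoint module for checkpointing, which is used by a subset of Hugging Face Transformers models. This module also works with SMP v2. However, it requires access to the model definition so that you can add the checkpoint wrapper. Therefore, we recommend that you use the apply_activation_checkpointing method.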
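
To illustrate why torch.utils.checkpoint needs access to the model definition, here is a minimal sketch with a toy model. `Block` and `TinyModel` are hypothetical stand-ins, not Hugging Face or SMP v2 classes. The point is that the `checkpoint` call sits inside `forward()`, so the model source has to be edited.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical feed-forward block standing in for a transformer layer."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.net(x)


class TinyModel(nn.Module):
    def __init__(self, dim=16, num_layers=4, use_checkpoint=True):
        super().__init__()
        self.layers = nn.ModuleList(Block(dim) for _ in range(num_layers))
        self.use_checkpoint = use_checkpoint

    def forward(self, x):
        for layer in self.layers:
            if self.use_checkpoint and self.training:
                # The checkpoint call lives in the model's own forward(),
                # which is why this approach requires editing the model code.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x


torch.manual_seed(0)
model = TinyModel()
model.train()
inputs = torch.randn(2, 16, requires_grad=True)
model(inputs).sum().backward()  # intermediates are recomputed here
```

With apply_activation_checkpointing, the same effect is applied from outside the model, which is the reason it is the recommended route when the model code is not yours to change.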
