Support for Hugging Face Transformer Models

The tensor parallelism feature of the SageMaker model parallelism library provides out-of-the-box support for the following Hugging Face Transformer models:

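GPT-2, BERT, and RoBERTa (available in the SageMaker model parallelism library v1.7.0 and later)
GPT-J (available in the SageMaker model parallelism library v1.8.0 and later)
GPT-Neo (available in the SageMaker model parallelism library v1.10.0 and later)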
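
The version requirements in the list above can be checked programmatically before a training job is configured. The following is a minimal sketch, not part of the library: the model keys, the `MIN_SMP_VERSION` table, and both helper functions are hypothetical names introduced for illustration. Versions are compared as integer tuples because a plain string comparison would rank "1.8.0" above "1.10.0".

```python
# Minimum SageMaker model parallelism library versions, copied from the list above.
# The dictionary keys are illustrative labels, not identifiers used by the library.
MIN_SMP_VERSION = {
    "gpt2": (1, 7, 0),
    "bert": (1, 7, 0),
    "roberta": (1, 7, 0),
    "gptj": (1, 8, 0),
    "gptneo": (1, 10, 0),
}


def parse_version(text):
    """Turn a version string such as 'v1.10.0' or '1.10.0' into (1, 10, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_builtin_tp_support(model_key, installed_version):
    """Return True if the listed minimum version for model_key is satisfied."""
    minimum = MIN_SMP_VERSION.get(model_key)
    if minimum is None:
        # Not in the list: the model needs manual registration instead.
        return False
    return parse_version(installed_version) >= minimum
```

For example, `has_builtin_tp_support("gptneo", "1.8.0")` is false, while `has_builtin_tp_support("gptj", "v1.8.0")` is true.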

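Note

For any other Transformers models, you need to use the smdistributed.modelparallel.torch.tp_register_with_module() API to apply tensor parallelism.

Note

To use tensor parallelism for training Hugging Face Transformer models, make sure you use Hugging Face Deep Learning Containers for PyTorch that include the SageMaker model parallelism library v1.7.0 or later. For more information, see the SageMaker model parallelism library release notes.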

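Supported Models Out of the Box

For the Hugging Face Transformer models that the library supports out of the box, you don't need to manually implement hooks to translate Transformer APIs to smdistributed transformer layers. You can activate tensor parallelism by using the context manager smdistributed.modelparallel.torch.tensor_parallelism() and then wrapping the model with smdistributed.modelparallel.torch.DistributedModel(). You don't need to manually register hooks for tensor parallelism using the smp.tp_register API.

The state_dict translation functions between Hugging Face Transformers and smdistributed.modelparallel can be accessed as follows.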

smdistributed.modelparallel.torch.nn.huggingface.gpt2.translate_state_dict_to_hf_gpt2(state_dict, max_seq_len=None)
smdistributed.modelparallel.torch.nn.huggingface.gpt2.translate_hf_state_dict_to_smdistributed_gpt2(state_dict)
smdistributed.modelparallel.torch.nn.huggingface.bert.translate_state_dict_to_hf_bert(state_dict, max_seq_len=None)
smdistributed.modelparallel.torch.nn.huggingface.bert.translate_hf_state_dict_to_smdistributed_bert(state_dict)
smdistributed.modelparallel.torch.nn.huggingface.roberta.translate_state_dict_to_hf_roberta(state_dict, max_seq_len=None)
smdistributed.modelparallel.torch.nn.huggingface.roberta.translate_hf_state_dict_to_smdistributed_roberta(state_dict)
smdistributed.modelparallel.torch.nn.huggingface.gptj.translate_state_dict_to_hf_gptj(state_dict, max_seq_len=None) (available in the SageMaker model parallelism library v1.8.0 and later)
smdistributed.modelparallel.torch.nn.huggingface.gptj.translate_hf_gptj_state_dict_to_smdistributed_gptj (available in the SageMaker model parallelism library v1.8.0 and later)
smdistributed.modelparallel.torch.nn.huggingface.gptneo.translate_state_dict_to_hf_gptneo(state_dict, max_seq_len=None) (available in the SageMaker model parallelism library v1.10.0 and later)
smdistributed.modelparallel.torch.nn.huggingface.gptneo.translate_hf_state_dict_to_smdistributed_gptneo(state_dict) (available in the SageMaker model parallelism library v1.10.0 and later)
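
The function names listed above follow one pattern per model, except that the GPT-J function for the Hugging Face to smdistributed direction has a different name. The following is a minimal sketch of looking the functions up by a short model key; the `TRANSLATORS` table, `translator_paths`, and `resolve_translators` are hypothetical helpers for illustration, not part of the library. The actual import is deferred to `resolve_translators` because it is assumed to succeed only in an environment where smdistributed is installed.

```python
import importlib

_BASE = "smdistributed.modelparallel.torch.nn.huggingface"

# model key -> (smdistributed-to-HF name, HF-to-smdistributed name),
# copied from the list above. Note the GPT-J exception in the second name.
TRANSLATORS = {
    "gpt2": ("translate_state_dict_to_hf_gpt2",
             "translate_hf_state_dict_to_smdistributed_gpt2"),
    "bert": ("translate_state_dict_to_hf_bert",
             "translate_hf_state_dict_to_smdistributed_bert"),
    "roberta": ("translate_state_dict_to_hf_roberta",
                "translate_hf_state_dict_to_smdistributed_roberta"),
    "gptj": ("translate_state_dict_to_hf_gptj",
             "translate_hf_gptj_state_dict_to_smdistributed_gptj"),
    "gptneo": ("translate_state_dict_to_hf_gptneo",
               "translate_hf_state_dict_to_smdistributed_gptneo"),
}


def translator_paths(model_key):
    """Return the dotted paths (to_hf, from_hf) for a model key."""
    to_hf, from_hf = TRANSLATORS[model_key]
    module = f"{_BASE}.{model_key}"
    return f"{module}.{to_hf}", f"{module}.{from_hf}"


def resolve_translators(model_key):
    """Import the submodule and return both callables (requires smdistributed)."""
    to_hf, from_hf = TRANSLATORS[model_key]
    module = importlib.import_module(f"{_BASE}.{model_key}")
    return getattr(module, to_hf), getattr(module, from_hf)
```

Keeping the names in an explicit table rather than building them from a format string avoids silently producing a wrong name for GPT-J.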

Example usage of the GPT-2 translation function

Start by wrapping the model as shown in the following code.


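import smdistributed.modelparallel.torch as smp
from transformers import AutoModelForCausalLM

# Assumes smp.init() has already been called and that hf_gpt2_config is a
# GPT-2 configuration object defined elsewhere.
# Modules created inside this context use tensor parallelism.
with smp.tensor_parallelism():
    model = AutoModelForCausalLM.from_config(hf_gpt2_config)

# Wrap the model with the library's distributed model wrapper.
model = smp.DistributedModel(model)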

Given a state_dict from the DistributedModel object, you can use the translate_state_dict_to_hf_gpt2 function to load the weights into the original Hugging Face GPT-2 model, as shown in the following code.


import smdistributed.modelparallel.torch as smp
from smdistributed.modelparallel.torch.nn.huggingface.gpt2 import (
    translate_state_dict_to_hf_gpt2,
)

max_seq_len = 1024

# [... code block for training ...]

# dist_model is the smp.DistributedModel instance (named model in the
# wrapping snippet above).
if smp.rdp_rank() == 0:
    state_dict = dist_model.state_dict()
    hf_state_dict = translate_state_dict_to_hf_gpt2(state_dict, max_seq_len)

    # hf_state_dict can now be loaded into the original Hugging Face model
    # with load_state_dict(hf_state_dict)

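Example usage of the RoBERTa translation function

Similarly, given the state_dict of a supported Hugging Face model, you can use the corresponding translate_hf_state_dict_to_smdistributed function (translate_hf_state_dict_to_smdistributed_roberta for RoBERTa) to convert it to a format that smp.DistributedModel can read. This is useful in transfer learning use cases, where a pretrained model is loaded into an smp.DistributedModel for model-parallel fine-tuning: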


import smdistributed.modelparallel.torch as smp
from transformers import AutoModelForMaskedLM
from smdistributed.modelparallel.torch.nn.huggingface.roberta import (
    translate_hf_state_dict_to_smdistributed_roberta,
)

# roberta_config is assumed to be a RoBERTa configuration object defined elsewhere.
model = AutoModelForMaskedLM.from_config(roberta_config)
model = smp.DistributedModel(model)

pretrained_model = AutoModelForMaskedLM.from_pretrained("roberta-large")
translated_state_dict = translate_hf_state_dict_to_smdistributed_roberta(
    pretrained_model.state_dict()
)

# load the translated pretrained weights into the smp.DistributedModel
model.load_state_dict(translated_state_dict)

# start fine-tuning...
