Context parallelism
Context parallelism is a type of model parallelism that partitions the model activations along the sequence dimension. Unlike other sequence parallelism techniques, which partition only the LayerNorm and RMSNorm activations, context parallelism partitions the network inputs and all intermediate activations along the sequence dimension.
SMP v2 integrates with Transformer Engine for its context parallelism implementation.
Hugging Face Transformer models compatible with SMP context parallelism

SMP v2 currently offers context parallelism support for the following Hugging Face transformer models:

- GPT-NeoX
- Llama 2 and Llama 3
Configure context parallelism

Set the context_parallel_degree parameter to an integer value that evenly divides the number of GPUs in your cluster. For example, if you have an 8-GPU instance, use 2, 4, or 8 for context_parallel_degree. We recommend starting with a small context_parallel_degree value and gradually increasing it until the model fits in GPU memory with the required input sequence length.
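As a quick sanity check before launching a job, you can verify that a candidate degree evenly divides the GPU count. The following is a minimal sketch and not part of the SMP API; the world_size and context_parallel_degree values are placeholders for your own cluster settings.

# Minimal sketch (not part of the SMP API): validate that the chosen
# context parallel degree evenly divides the total number of GPUs.
world_size = 8                 # e.g., one instance with 8 GPUs
context_parallel_degree = 4    # candidate value to validate

assert world_size % context_parallel_degree == 0, (
    f"context_parallel_degree={context_parallel_degree} must evenly divide "
    f"the number of GPUs ({world_size})"
)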
The following code snippets show how to add the SMP initialization module torch.sagemaker.init() to your training script and set up the SMP configuration dictionary in JSON format for the training job launcher, following the two-step process introduced in Use the SageMaker AI model parallelism library v2. You don't need to make any changes to your PyTorch model or PyTorch FSDP configuration. For more information about the context_parallel_degree parameter, see SMP v2 core feature configuration parameters.
In your training script

As part of Step 1, initialize your script with torch.sagemaker.init() to activate SMP v2 and wrap your model with the torch.sagemaker.transform API.

Starting from SMP v2.6.0, you can use the argument cp_comm_type to determine which context parallelism implementation to use. The SMP library currently supports two implementations: p2p and all_gather. The p2p implementation uses peer-to-peer send-receive calls for key-value tensor accumulation during the attention computation and runs asynchronously, allowing overlap with compute. The all_gather implementation, instead, uses the AllGather collective operation and runs synchronously.
import torch.sagemaker as tsm
tsm.init()

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_config(..)
model = tsm.transform(model, cp_comm_type="p2p")
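If you prefer the synchronous implementation, pass the other supported value to the same call:

# Selects the synchronous AllGather-based context parallelism implementation
model = tsm.transform(model, cp_comm_type="all_gather")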
SMP configuration
As part of Step 2, add the following parameter to the SMP configuration dictionary for the SageMaker AI PyTorch estimator.
{
    ..., # other SMP config parameters
    "context_parallel_degree": 2
}
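For context, the following sketch shows one way this dictionary might be passed to the SageMaker AI PyTorch estimator. The estimator arguments (entry_point, instance settings, framework and Python versions, and the IAM role) are illustrative placeholders, not values prescribed by this section.

from sagemaker.pytorch import PyTorch

# Sketch only: all estimator arguments below are placeholders.
estimator = PyTorch(
    entry_point="train.py",            # your training script calling tsm.init()
    instance_type="ml.p4d.24xlarge",   # 8 GPUs per instance in this example
    instance_count=1,
    framework_version="2.2.0",
    py_version="py310",
    role="<your-iam-role>",
    distribution={
        "torch_distributed": {"enabled": True},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    # the SMP configuration dictionary from above
                    "context_parallel_degree": 2,
                },
            }
        },
    },
)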