Core features of the SageMaker model parallelism library v2
The Amazon SageMaker AI model parallelism library v2 (SMP v2) offers distribution strategies and memory-saving techniques such as sharded data parallelism, tensor parallelism, and checkpointing. These strategies and techniques help distribute large models across multiple devices while optimizing training speed and memory consumption. SMP v2 also provides the torch.sagemaker Python package, which lets you adapt your training script with only a few lines of code change.
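As a rough illustration of what "a few lines of code change" means in practice, the following sketch shows the shape of an adapted PyTorch training script. This is a skeleton, not a runnable program: torch.sagemaker is available only inside a SageMaker training job with SMP v2, and the model definition and training loop are placeholders.

```python
# Sketch only: torch.sagemaker exists inside SageMaker training jobs with SMP v2.
import torch.sagemaker as tsm

tsm.init()  # initialize the SMP v2 runtime from the SageMaker job configuration

model = ...                   # your existing PyTorch model definition
model = tsm.transform(model)  # apply the configured SMP parallelism strategies

# ... the rest of the training loop continues largely unchanged ...
```

The exact initialization arguments and supported model classes are described in the topics listed below.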
This guide follows the basic two-step flow introduced in Use the SageMaker model parallelism library v2. To dive deep into the core features of SMP v2 and how to use them, see the following topics.
Note
These core features are available in SMP v2.0.0 and later and the SageMaker Python SDK v2.200.0 and later, and work with PyTorch v2.0.1 and later. To check the versions of the packages, see Supported frameworks and AWS Regions.