Amazon SageMaker model parallelism library v1 examples
This page provides a list of blogs and Jupyter notebooks that present practical examples of implementing the SageMaker model parallelism (SMP) library v1 to run distributed training jobs on SageMaker.
Blogs and Case Studies
The following blog posts present case studies of using SMP v1.
- New performance improvements in the Amazon SageMaker model parallelism library, AWS Machine Learning Blog (December 16, 2022)
- Train gigantic models with near-linear scaling using sharded data parallelism on Amazon SageMaker, AWS Machine Learning Blog (October 31, 2022)
Example notebooks
Example notebooks are provided in the SageMaker examples GitHub repository under training/distributed_training/pytorch/model_parallel.
Note
Clone and run the example notebooks in the following SageMaker ML IDEs.
- SageMaker JupyterLab (available in Studio created after December 2023)
- SageMaker Code Editor (available in Studio created after December 2023)
- Studio Classic (available as an application in Studio created after December 2023)
git clone https://github.com/aws/amazon-sagemaker-examples.git
cd amazon-sagemaker-examples/training/distributed_training/pytorch/model_parallel
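After cloning, it can help to check which notebooks are present before opening one in an IDE. The following is a minimal sketch and not part of the SageMaker examples repository: `find_notebooks` is a hypothetical helper, and the path assumes the clone command above was run in the current working directory.

```python
from pathlib import Path


def find_notebooks(root):
    """Return sorted paths of .ipynb files under root, skipping checkpoint copies."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*.ipynb")
        # Jupyter keeps autosave copies in .ipynb_checkpoints; ignore them.
        if ".ipynb_checkpoints" not in p.parts
    )


if __name__ == "__main__":
    # Assumed location: the directory created by the git clone command above.
    example_dir = Path(
        "amazon-sagemaker-examples/training/distributed_training/pytorch/model_parallel"
    )
    for notebook in find_notebooks(example_dir):
        print(notebook.relative_to(example_dir))
```

If the directory does not exist, the script prints nothing, which usually means the clone was run from a different location.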
SMP v1 example notebooks for PyTorch
SMP v1 example notebooks for TensorFlow