SageMaker AI data parallelism library release notes

See the following release notes to track the latest updates for the SageMaker AI distributed data parallelism (SMDDP) library.

The SageMaker AI distributed data parallelism library v2.5.0

Date: October 17, 2024

New features

  • Added support for PyTorch v2.4.1 with CUDA v12.1.

Integration into Docker containers distributed by the SageMaker AI model parallelism (SMP) library

This version of the SMDDP library is integrated into the following Docker image, which is distributed with the SageMaker AI model parallelism library v2.6.0.

658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.4.1-gpu-py311-cu121

For Regions where the SMP Docker images are available, see AWS Regions.
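The angle-bracketed Region segment in the image URI above is a placeholder that you replace with an AWS Region code. The following is a minimal sketch of that substitution. The helper name `smp_image_uri` is hypothetical and not part of any SageMaker SDK, and the function only formats the string; it does not confirm that the image is published in the Region you pass.

```python
# Account ID, repository, and tag are copied from the image URI in this release note.
SMP_IMAGE_TEMPLATE = (
    "658645717510.dkr.ecr.{region}.amazonaws.com/"
    "smdistributed-modelparallel:2.4.1-gpu-py311-cu121"
)


def smp_image_uri(region: str) -> str:
    """Fill the Region placeholder in the SMP image URI (hypothetical helper).

    This only formats the string. It does not check that the image is
    actually available in the given Region.
    """
    if not region or not region.strip():
        raise ValueError("region must be a non-empty AWS Region code")
    return SMP_IMAGE_TEMPLATE.format(region=region.strip())


print(smp_image_uri("us-west-2"))
```

Check the Region against the AWS Regions list referenced above before using the resulting URI in a training job.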

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.4.1/cu121/2024-10-09/smdistributed_dataparallel-2.5.0-cp311-cp311-linux_x86_64.whl
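The wheel file name encodes the library version, the Python tag (`cp311`, that is, CPython 3.11), and the platform. The following sketch, which uses only the Python standard library, reads those fields from the URL above and compares the Python tag with an interpreter version before you attempt an install. The helper names `wheel_tags` and `matches_interpreter` are hypothetical, and the check covers the Python tag only; it does not verify the platform, CUDA, or PyTorch versions.

```python
import sys
from urllib.parse import urlparse

# URL copied from this release note.
WHEEL_URL = (
    "https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.4.1/cu121/"
    "2024-10-09/smdistributed_dataparallel-2.5.0-cp311-cp311-linux_x86_64.whl"
)


def wheel_tags(url: str) -> dict:
    """Split a wheel file name into its name, version, and tag fields."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform.
    # The last three fields are always the tags.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_interpreter(python_tag: str, version_info=None) -> bool:
    """Return True if a CPython tag such as 'cp311' matches the interpreter version."""
    major, minor = (version_info or sys.version_info)[:2]
    return python_tag == f"cp{major}{minor}"


tags = wheel_tags(WHEEL_URL)
print(tags)
print("Python tag matches this interpreter:", matches_interpreter(tags["python"]))
```

If the tag matches, you can pass the URL directly to `pip install`.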

The SageMaker AI distributed data parallelism library v2.3.0

Date: June 11, 2024

New features

  • Added support for PyTorch v2.3.0 with CUDA v12.1 and Python v3.11.

  • Added support for PyTorch Lightning v2.2.5. This is integrated into the SageMaker AI framework container for PyTorch v2.3.0.

  • Added instance type validation during import to prevent loading the SMDDP library on unsupported instance types. For a list of instance types compatible with the SMDDP library, see Supported frameworks, AWS Regions, and instances types.
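Because the instance type validation runs at import time, a training script that should also run on other instance types may want to guard the import. The following is a minimal sketch, not the library's own mechanism. It assumes the SMDDP v2 import path `smdistributed.dataparallel.torch.torch_smddp` and the `smddp` backend name used with PyTorch distributed. These notes do not state which exception the validation raises, so the sketch catches `Exception` broadly. The helper name `select_backend` and the NCCL fallback are illustrative choices.

```python
import importlib

# Assumed import path for the SMDDP v2 PyTorch backend; verify it against the
# developer guide for the library version you use.
SMDDP_MODULE = "smdistributed.dataparallel.torch.torch_smddp"


def select_backend(module_name: str = SMDDP_MODULE, fallback: str = "nccl") -> str:
    """Return 'smddp' if the module imports cleanly, otherwise the fallback backend.

    Hypothetical helper: the import fails when the library is not installed,
    and (per this release note) is expected to fail on unsupported instance types.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # exact exception type is not documented here
        print(f"SMDDP unavailable ({type(exc).__name__}: {exc}); using {fallback!r}")
        return fallback
    return "smddp"


backend = select_backend()
print("Selected backend:", backend)
# In a training script, pass the result to
# torch.distributed.init_process_group(backend=backend).
```

Falling back silently changes the collective communication implementation, so for performance-sensitive jobs it may be preferable to let the import error stop the job instead.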

Integration into SageMaker AI Framework Containers

This version of the SMDDP library is integrated into the following SageMaker AI Framework Container.

  • PyTorch v2.3.0

    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker

For a complete list of versions of the SMDDP library and the pre-built containers, see Supported frameworks, AWS Regions, and instances types.

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.3.0/cu121/2024-05-23/smdistributed_dataparallel-2.3.0-cp311-cp311-linux_x86_64.whl

Other changes

  • The SMDDP library v2.2.0 is integrated into the SageMaker AI framework container for PyTorch v2.2.0.

The SageMaker AI distributed data parallelism library v2.2.0

Date: March 4, 2024

New features

  • Added support for PyTorch v2.2.0 with CUDA v12.1.

Integration into Docker containers distributed by the SageMaker AI model parallelism (SMP) library

This version of the SMDDP library is integrated into the following Docker image, which is distributed with the SageMaker AI model parallelism library v2.2.0.

658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.2.0-gpu-py310-cu121

For Regions where the SMP Docker images are available, see AWS Regions.

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.2.0/cu121/2024-03-04/smdistributed_dataparallel-2.2.0-cp310-cp310-linux_x86_64.whl

The SageMaker AI distributed data parallelism library v2.1.0

Date: March 1, 2024

New features

  • Added support for PyTorch v2.1.0 with CUDA v12.1.

Bug fixes

Integration into SageMaker AI Framework Containers

This version of the SMDDP library passed benchmark testing and is integrated into the following SageMaker AI Framework Container.

  • PyTorch v2.1.0

    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker

Integration into Docker containers distributed by the SageMaker AI model parallelism (SMP) library

This version of the SMDDP library is integrated into the following Docker image, which is distributed with the SageMaker AI model parallelism library v2.1.0.

658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.1.2-gpu-py310-cu121

For Regions where the SMP Docker images are available, see AWS Regions.

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.1.0/cu121/2024-02-04/smdistributed_dataparallel-2.1.0-cp310-cp310-linux_x86_64.whl

The SageMaker AI distributed data parallelism library v2.0.1

Date: December 7, 2023

New features

Known issues

  • CPU memory usage gradually increases, indicating a CPU memory leak, while training with SMDDP AllReduce in DDP mode.

Integration into SageMaker AI Framework Containers

This version of the SMDDP library passed benchmark testing and is integrated into the following SageMaker AI Framework Container.

  • PyTorch v2.0.1

    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker

Binary file of this release

You can download or install the library using the following URL.

https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.0.1/cu118/2023-12-07/smdistributed_dataparallel-2.0.2-cp310-cp310-linux_x86_64.whl

Other changes

  • Starting from this release, documentation for the SMDDP library is fully available in this Amazon SageMaker AI Developer Guide. Because the complete developer guide for SMDDP v2 is now housed here, the additional reference for SMDDP v1.x in the SageMaker AI Python SDK documentation is no longer supported. If you still need the SMDDP v1.x documentation, see the snapshot of the documentation at SageMaker Python SDK v2.212.0 documentation.