Distributed training with the SageMaker AI distributed data parallelism library


The SageMaker AI distributed data parallelism (SMDDP) library is designed to be easy to use and to integrate seamlessly with PyTorch.

When you train a deep learning model with the SMDDP library on SageMaker AI, you can focus on writing your training script and training your model.

To get started, import the SMDDP library to use its collective operations optimized for AWS. The following topics provide instructions on what to add to your training script depending on which collective operation you want to optimize.
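As a minimal sketch of the import step, the following Python snippet tries to import the SMDDP PyTorch module and selects the `smddp` process-group backend when the import succeeds. The helper names `pick_backend` and `init_distributed` are hypothetical and used only for illustration. The fallback to `gloo` is an assumption added so the script can also run outside a SageMaker AI training container, and the script assumes the launcher sets the usual PyTorch rendezvous environment variables.

```python
import importlib

# Module path of the SMDDP PyTorch integration; importing it registers
# the "smddp" backend with torch.distributed.
SMDDP_MODULE = "smdistributed.dataparallel.torch.torch_smddp"


def pick_backend(fallback="gloo"):
    """Return "smddp" if the SMDDP module can be imported, else the fallback.

    The fallback is an assumption for running outside SageMaker AI.
    """
    try:
        importlib.import_module(SMDDP_MODULE)
    except ImportError:
        return fallback
    return "smddp"


def init_distributed(fallback="gloo"):
    """Initialize the default process group with the selected backend.

    Hypothetical helper: assumes the launcher has set the environment
    variables that torch.distributed needs (for example RANK, WORLD_SIZE).
    """
    import torch.distributed as dist  # imported lazily so pick_backend works without torch

    backend = pick_backend(fallback)
    dist.init_process_group(backend=backend)
    return backend


if __name__ == "__main__":
    print(pick_backend())
```

In a training script, `init_distributed()` would be called once at startup, before the model is wrapped for distributed training.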

Topics


Supported frameworks, AWS Regions, and instance types

Adapting your training script to use the SMDDP collective operations
