Amazon SageMaker Training Compiler Release Notes - Amazon SageMaker AI

Important

Amazon Web Services (AWS) announces that there will be no new releases or versions of SageMaker Training Compiler. You can continue to use SageMaker Training Compiler through the existing AWS Deep Learning Containers (DLCs) for SageMaker Training. Although the existing DLCs remain accessible, they no longer receive patches or updates from AWS, in accordance with the AWS Deep Learning Containers Framework Support Policy.

See the following release notes to track the latest updates for Amazon SageMaker Training Compiler.

SageMaker Training Compiler Release Notes: February 13, 2023

Currency Updates
  • Added support for PyTorch v1.13.1

Bug Fixes
  • Fixed a race condition on GPU that caused NaN loss in some models, such as vision transformer (ViT) models.

Other Changes
  • SageMaker Training Compiler improves performance by letting PyTorch/XLA automatically override the optimizers (such as SGD, Adam, and AdamW) in torch.optim or transformers.optimization with their syncfree versions in torch_xla.amp.syncfree (such as torch_xla.amp.syncfree.SGD, torch_xla.amp.syncfree.Adam, and torch_xla.amp.syncfree.AdamW). You don't need to change the lines in your training script where you define optimizers.
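
One way to picture the optimizer override in this release is as a lookup on dotted names. The following is a minimal sketch under that assumption, not the actual PyTorch/XLA mechanism: the table `SYNCFREE_OVERRIDES` and the helper `resolve_optimizer` are hypothetical names introduced for illustration, and the `transformers.optimization.AdamW` entry is an assumed example of an optimizer from that module.

```python
# Illustrative only: models the override as a lookup on dotted names.
# SYNCFREE_OVERRIDES and resolve_optimizer are hypothetical; they are not
# part of PyTorch/XLA or SageMaker Training Compiler.
SYNCFREE_OVERRIDES = {
    "torch.optim.SGD": "torch_xla.amp.syncfree.SGD",
    "torch.optim.Adam": "torch_xla.amp.syncfree.Adam",
    "torch.optim.AdamW": "torch_xla.amp.syncfree.AdamW",
    # Assumed entry: an AdamW defined in transformers.optimization.
    "transformers.optimization.AdamW": "torch_xla.amp.syncfree.AdamW",
}


def resolve_optimizer(requested: str) -> str:
    """Return the syncfree counterpart if one is listed, else the original."""
    return SYNCFREE_OVERRIDES.get(requested, requested)


if __name__ == "__main__":
    # One optimizer with a listed counterpart, one without.
    for name in ("torch.optim.AdamW", "torch.optim.RMSprop"):
        print(f"{name} -> {resolve_optimizer(name)}")
```

In this picture, a training script keeps a line such as `optimizer = torch.optim.AdamW(model.parameters())` as written, and optimizers without a listed counterpart are left untouched.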

Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: January 9, 2023

Breaking Changes

  • In TensorFlow v2.11.0 and later, tf.keras.optimizers.Optimizer points to a new optimizer. The old optimizers have moved to tf.keras.optimizers.legacy. Training jobs might fail because of this breaking change if you do either of the following:

    • Load checkpoints from an old optimizer. We recommend that you switch to the legacy optimizers.

    • Use TensorFlow v1. We recommend that you migrate to TensorFlow v2, or switch to the legacy optimizers if you need to continue using TensorFlow v1.

    For a more detailed list of the breaking changes caused by the optimizer changes, see the official TensorFlow v2.11.0 release notes in the TensorFlow GitHub repository.
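
The recommendation above can be sketched as a small decision helper. This is a rough illustration only: `optimizer_namespace` and its `needs_old_optimizer` argument are hypothetical names, and the helper returns the dotted namespace as a string rather than importing TensorFlow.

```python
# Sketch: choose the optimizer namespace around the TensorFlow 2.11 change.
# optimizer_namespace and needs_old_optimizer are hypothetical names.

def _major_minor(version: str) -> tuple:
    """Parse a version string such as '2.11.0' into (2, 11)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def optimizer_namespace(tf_version: str, needs_old_optimizer: bool) -> str:
    """Return the dotted namespace to take optimizers from.

    Set needs_old_optimizer to True when the job loads checkpoints written
    by an old optimizer or still relies on TensorFlow v1 code.
    """
    if _major_minor(tf_version) >= (2, 11) and needs_old_optimizer:
        return "tf.keras.optimizers.legacy"
    return "tf.keras.optimizers"


if __name__ == "__main__":
    print(optimizer_namespace("2.11.0", needs_old_optimizer=True))
    print(optimizer_namespace("2.10.0", needs_old_optimizer=True))
```

In a training script, choosing the legacy namespace corresponds to constructing, for example, `tf.keras.optimizers.legacy.Adam` in place of `tf.keras.optimizers.Adam`.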

Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: December 8, 2022

Bug Fixes

  • Fixed the seed for PyTorch training jobs, starting with PyTorch v1.12, so that model initialization does not differ across processes. See also PyTorch Reproducibility.

  • Fixed an issue that prevented PyTorch distributed training jobs on G4dn and G5 instances from defaulting to communication through PCIe.
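
To illustrate why the seed fix in this release matters, the sketch below uses Python's standard `random` module as a stand-in for framework weight initialization. The helpers `init_weights` and `replicas_match` are hypothetical names. In an actual PyTorch script, the analogous step would be a call such as `torch.manual_seed` with the same value in every process; how the compiler sets the seed internally is not shown here.

```python
import random


def init_weights(seed: int, n: int = 4) -> list:
    """Stand-in for model initialization: draw n weights from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def replicas_match(seeds: list) -> bool:
    """Return True if every simulated process initializes identical weights."""
    first = init_weights(seeds[0])
    return all(init_weights(seed) == first for seed in seeds[1:])


if __name__ == "__main__":
    # Same seed in every simulated process: initial weights agree.
    print(replicas_match([1234, 1234, 1234]))
    # Different seed per simulated process: initial weights diverge.
    print(replicas_match([1, 2, 3]))
```

When the simulated processes share a seed, their initial weights agree; with differing seeds they diverge, which is the kind of discrepancy the fix is meant to rule out.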

Known Issues

  • Improper use of PyTorch/XLA APIs in Hugging Face’s vision transformers might lead to convergence issues.

Other Changes

Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: October 4, 2022

Currency Updates
  • Added support for TensorFlow v2.10.0.

Other Changes
  • Added Hugging Face NLP models using the Transformers library to TensorFlow framework tests. To find the tested Transformer models, see Tested Models.

Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: September 1, 2022

Currency Updates
  • Added support for Hugging Face Transformers v4.21.1 with PyTorch v1.11.0.

Improvements
Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: June 14, 2022

New Features
Migration to AWS Deep Learning Containers

This release passed benchmark testing and is migrated to the following AWS Deep Learning Container:

SageMaker Training Compiler Release Notes: April 26, 2022

Improvements

SageMaker Training Compiler Release Notes: April 12, 2022

Currency Updates
  • Added support for Hugging Face Transformers v4.17.0 with TensorFlow v2.6.3 and PyTorch v1.10.2.

SageMaker Training Compiler Release Notes: February 21, 2022

Improvements
  • Completed benchmark tests and confirmed training speedups on the ml.g4dn instance types. To find a complete list of tested ml instances, see Supported Instance Types.

SageMaker Training Compiler Release Notes: December 1, 2021

New Features
  • Launched Amazon SageMaker Training Compiler at AWS re:Invent 2021.

Migration to AWS Deep Learning Containers