SageMaker HyperPod recipe adapter - Amazon SageMaker AI

SageMaker HyperPod recipe adapter

The SageMaker HyperPod recipe adapter is a training framework that you can use to manage the entire lifecycle of your training jobs. Use the adapter to distribute the pre-training or fine-tuning of your models across multiple machines. The adapter uses different parallelism techniques to distribute the training. It also handles saving and managing checkpoints. For more details, see Advanced Settings.

To use the recipe adapter, get it from the SageMaker HyperPod recipe adapter repository. The repository contains the following directories:

  1. src: This directory contains the implementation of large language model (LLM) training, including features such as model parallelism, mixed-precision training, and checkpoint management.

  2. examples: This directory provides examples that show how to create an entry point for training an LLM, and serves as a practical guide for users.
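
To give a feel for what checkpoint management involves, the following is a minimal sketch in plain Python. It is not the adapter's API, and the adapter handles this work for you. The function names (`save_checkpoint`, `latest_checkpoint`, `run_demo`), the JSON file layout, and the `keep_last` retention policy are all hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(ckpt_dir: Path, step: int, state: dict, keep_last: int = 2) -> Path:
    """Write `state` for `step`, then delete all but the newest `keep_last` files.

    Hypothetical helper for illustration; not part of the recipe adapter.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"step_{step:08d}.json"
    # Write to a temporary file first, then rename, so that a crash
    # mid-write never leaves a truncated checkpoint under the final name.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)
    # Zero-padded step numbers make lexicographic order match step order.
    checkpoints = sorted(ckpt_dir.glob("step_*.json"))
    for old in checkpoints[:-keep_last]:
        old.unlink()
    return path


def latest_checkpoint(ckpt_dir: Path):
    """Return the state stored in the newest checkpoint, or None if there is none."""
    checkpoints = sorted(ckpt_dir.glob("step_*.json"))
    if not checkpoints:
        return None
    return json.loads(checkpoints[-1].read_text())


def run_demo(total_steps: int = 10, save_every: int = 2, keep_last: int = 2):
    """Simulate a training loop that checkpoints every `save_every` steps."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        ckpt_dir = Path(tmp_dir)
        for step in range(1, total_steps + 1):
            state = {"step": step, "loss": 1.0 / step}  # stand-in for real training state
            if step % save_every == 0:
                save_checkpoint(ckpt_dir, step, state, keep_last=keep_last)
        names = sorted(p.name for p in ckpt_dir.glob("step_*.json"))
        return names, latest_checkpoint(ckpt_dir)


if __name__ == "__main__":
    names, latest = run_demo()
    print(names)
    print(latest)
```

A real distributed job also has to coordinate the save across workers and store model and optimizer shards, which is the kind of detail the adapter is described as managing on your behalf.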