SageMaker HyperPod recipes

Use Amazon SageMaker HyperPod recipes to get started with training and fine-tuning publicly available foundation models. To view the available recipes, see SageMaker HyperPod recipes.

The recipes are pre-configured training configurations for the following model families:

- Llama
- Mistral
- Mixtral

You can run recipes within SageMaker HyperPod or as SageMaker training jobs. The Amazon SageMaker HyperPod training adapter is the framework that runs the end-to-end training workflow. It is built on the NVIDIA NeMo framework and the Neuronx Distributed Training package, and it runs the recipe on your cluster. If you're familiar with NeMo, using the training adapter works the same way.

Figure: SageMaker HyperPod recipe workflow. A recipe feeds into the HyperPod recipe launcher, which runs it on a cluster (Slurm, Kubernetes, and so on) using the HyperPod training adapter.
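
For example, to run one of the recipes in the tables below as a SageMaker training job, you can pass the recipe name to the SageMaker Python SDK. The following is a minimal sketch, assuming an SDK version whose PyTorch estimator supports the training_recipe parameter; the recipe name, role ARN, and S3 URI are illustrative placeholders.

```python
# Minimal sketch: launch a predefined HyperPod recipe as a SageMaker training
# job. Assumes a SageMaker Python SDK version with recipe support; the recipe
# name, role ARN, and S3 URI below are illustrative placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    base_job_name="llama31-8b-pretrain",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.p5.48xlarge",
    instance_count=16,  # matches the 16-node Llama 3.1 8b (8192 sequence) recipe
    training_recipe="training/llama/hf_llama3_8b_seq8k_gpu_p5x16_pretrain",  # illustrative name
)

estimator.fit(inputs={"train": "s3://amzn-s3-demo-bucket/train"})  # placeholder URI
```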

You can also train your own model by defining your own custom recipe.
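
As a sketch of what launching a custom recipe might look like, the example below assumes the training_recipe parameter also accepts a local recipe file and that recipe_overrides merges onto the recipe's YAML keys; the file name and override keys are hypothetical.

```python
# Minimal sketch: run a custom recipe by passing a local recipe file and
# overriding individual keys. The file name and override keys are hypothetical;
# they must match your recipe's YAML structure.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    base_job_name="my-custom-recipe",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    training_recipe="./my_custom_recipe.yaml",  # hypothetical local recipe file
    recipe_overrides={
        "trainer": {"max_steps": 500},  # illustrative override key
    },
)
```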

The following tables outline the predefined recipes and launch scripts that SageMaker HyperPod currently supports.

Available pre-training models, recipes, and launch scripts

| Model | Size | Sequence length | Nodes | Instance | Accelerator | Recipe | Script |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.2 | 11b | 8192 | 4 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 90b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 1b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 3b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 16384 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 8192 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3 | 70b | 8192 | 16 | ml.trn1.32xlarge | AWS Trainium | link | link |
| Llama 3.1 | 8b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3 | 8b | 8192 | 4 | ml.trn1.32xlarge | AWS Trainium | link | link |
| Llama 3.1 | 8b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | N/A |
| Mistral | 7b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 16384 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 8192 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |

Available fine-tuning models, recipes, and launch scripts

| Model | Method | Size | Sequence length | Nodes | Instance | Accelerator | Recipe | Script |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 | QLoRA | 405b | 131072 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 405b | 16384 | 6 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | QLoRA | 405b | 16384 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 405b | 8192 | 6 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | QLoRA | 405b | 8192 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 70b | 16384 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 8192 | 10 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 70b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 8b | 16384 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 8b | 16384 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 8b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 8b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 8192 | 32 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | LoRA | 70b | 8192 | 20 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | SFT | 8b | 8192 | 4 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | LoRA | 8b | 8192 | 1 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3 | SFT | 8b | 8192 | 1 | ml.trn1.32xlarge | AWS Trainium | link | link |

To get started with a tutorial, see Tutorials.