SageMaker HyperPod recipes

Use Amazon SageMaker HyperPod recipes to get started with training and fine-tuning publicly available foundation models. To view the available recipes, see SageMaker HyperPod recipes.

The recipes are pre-configured training configurations for the following model families:

- Llama
- Mistral
- Mixtral

You can run recipes within SageMaker HyperPod or as SageMaker training jobs. The Amazon SageMaker HyperPod training adapter is the framework that runs the end-to-end training workflow. It is built on the NVIDIA NeMo framework and the Neuronx Distributed Training package, and it runs the recipe on your cluster. If you're familiar with NeMo, using the training adapter works the same way.

Figure: SageMaker HyperPod recipe workflow. A recipe feeds into the HyperPod recipe launcher, which runs it on a cluster (Slurm, Kubernetes, and so on) using the HyperPod training adapter.
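
For example, to run one of the recipes in the tables below as a SageMaker training job, you can pass the recipe name to the SageMaker Python SDK. The following is a minimal sketch, assuming an SDK version whose PyTorch estimator supports the training_recipe parameter; the recipe name, role ARN, and S3 URI are illustrative placeholders.

```python
# Minimal sketch: launch a predefined HyperPod recipe as a SageMaker training
# job. Assumes a SageMaker Python SDK version with recipe support; the recipe
# name, role ARN, and S3 URI below are illustrative placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    base_job_name="llama31-8b-pretrain",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.p5.48xlarge",
    instance_count=16,  # matches the 16-node Llama 3.1 8b (8192 sequence) recipe
    training_recipe="training/llama/hf_llama3_8b_seq8k_gpu_p5x16_pretrain",  # illustrative name
)

estimator.fit(inputs={"train": "s3://amzn-s3-demo-bucket/train"})  # placeholder URI
```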

You can also train your own model by defining your own custom recipe.
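
As a sketch of what launching a custom recipe might look like, the example below assumes the training_recipe parameter also accepts a local recipe file and that recipe_overrides merges onto the recipe's YAML keys; the file name and override keys are hypothetical.

```python
# Minimal sketch: run a custom recipe by passing a local recipe file and
# overriding individual keys. The file name and override keys are hypothetical;
# they must match your recipe's YAML structure.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    base_job_name="my-custom-recipe",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    training_recipe="./my_custom_recipe.yaml",  # hypothetical local recipe file
    recipe_overrides={
        "trainer": {"max_steps": 500},  # illustrative override key
    },
)
```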

The following tables outline the predefined recipes and launch scripts that SageMaker HyperPod currently supports.

Available pre-training models, recipes, and launch scripts

| Model | Size | Sequence length | Nodes | Instance | Accelerator | Recipe | Script |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.2 | 11b | 8192 | 4 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 90b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 1b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.2 | 3b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 16384 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 70b | 8192 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3 | 70b | 8192 | 16 | ml.trn1.32xlarge | AWS Trainium | link | link |
| Llama 3.1 | 8b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | 8b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3 | 8b | 8192 | 4 | ml.trn1.32xlarge | AWS Trainium | link | link |
| Llama 3.1 | 8b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | N/A |
| Mistral | 7b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mistral | 7b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 16384 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x22b | 8192 | 64 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 16384 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 8192 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Mixtral | 8x7b | 8192 | 32 | ml.p5.48xlarge | NVIDIA H100 | link | link |

Available fine-tuning models, recipes, and launch scripts

| Model | Method | Size | Sequence length | Nodes | Instance | Accelerator | Recipe | Script |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1 | QLoRA | 405b | 131072 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 405b | 16384 | 6 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | QLoRA | 405b | 16384 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 405b | 8192 | 6 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | QLoRA | 405b | 8192 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 16384 | 16 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 70b | 16384 | 2 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 8192 | 10 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 70b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 8b | 16384 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 8b | 16384 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 8b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | LoRA | 8b | 8192 | 1 | ml.p5.48xlarge | NVIDIA H100 | link | link |
| Llama 3.1 | SFT | 70b | 8192 | 32 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | LoRA | 70b | 8192 | 20 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | SFT | 8b | 8192 | 4 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3.1 | LoRA | 8b | 8192 | 1 | ml.p4d.24xlarge | NVIDIA A100 | link | link |
| Llama 3 | SFT | 8b | 8192 | 1 | ml.trn1.32xlarge | AWS Trainium | link | link |

To get started with a tutorial, see Tutorials.