Deploy foundation models and custom fine-tuned models

Whether you're deploying pre-trained open-weights or gated foundation models from Amazon SageMaker JumpStart, or your own custom or fine-tuned models stored in Amazon S3 or Amazon FSx, SageMaker HyperPod provides flexible, scalable infrastructure for production inference workloads.

	Deploy open-weights and gated foundation models from JumpStart	Deploy custom and fine-tuned models from Amazon S3 and Amazon FSx	Deploy models from local NVMe storage
Description	Deploy from a comprehensive catalog of pre-trained foundation models with automatic optimization and scaling policies tailored to each model family.	Bring your own custom and fine-tuned models and use SageMaker HyperPod's enterprise infrastructure for production-scale inference. Choose between cost-effective storage with Amazon S3 or a high-performance file system with Amazon FSx.	Load model weights from a node's local NVMe storage to eliminate network latency during pod startup. Useful for autoscaling events, scale-from-zero workloads, and latency-sensitive failovers.
Key benefits	One-click deployment through Amazon SageMaker Studio UI; Auto-scaling based on incoming requests automatically enabled; Pre-optimized containers and configurations for each model family; EULA handling for gated models	Support for multiple storage backends: Amazon S3, Amazon FSx; Flexible container and framework support; Custom scaling policies based on your model's characteristics	Reduced cold-start time by reading weights locally; No network dependency for model loading; Optional fallback to Amazon S3 when NVMe cache is missing; Custom Kubernetes volumes and initContainers
Deployment options	Amazon SageMaker Studio for visual deployment; kubectl for Kubernetes-native operations; Python SDK for programmatic integration; HyperPod CLI for command-line automation	kubectl for Kubernetes-native operations; Python SDK for programmatic integration; HyperPod CLI for command-line automation	kubectl for Kubernetes-native operations; Python SDK for programmatic integration; HyperPod CLI for command-line automation
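
The local NVMe column mentions an optional fallback to Amazon S3 when the NVMe cache is missing. The cache-first idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not HyperPod's actual implementation: the function names, the `.complete` marker file, and the `fetch_remote` callback (a stand-in for whatever downloads the weights from remote storage) are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical marker file: written only after a download finishes, so a
# partially filled cache directory is never mistaken for a usable one.
COMPLETE_MARKER = ".complete"


def resolve_model_dir(
    cache_dir: Path, fetch_remote: Callable[[Path], None]
) -> Tuple[Path, str]:
    """Return the directory to load weights from and the source that filled it."""
    if (cache_dir / COMPLETE_MARKER).exists():
        # Cache hit: weights are already on local disk, no network access needed.
        return cache_dir, "local"
    # Cache miss: populate the local directory from remote storage, then mark it.
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetch_remote(cache_dir)
    (cache_dir / COMPLETE_MARKER).touch()
    return cache_dir, "remote"


def demo() -> List[str]:
    """Resolve the same model twice and report where each load came from."""

    def fake_fetch(dest: Path) -> None:
        # Stand-in for a real download; writes a placeholder weights file.
        (dest / "weights.bin").write_bytes(b"\x00")

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "nvme" / "my-model"
        return [resolve_model_dir(cache, fake_fetch)[1] for _ in range(2)]


if __name__ == "__main__":
    print(demo())
```

In this sketch the first resolution pays the download cost and the second is served from local disk, which is the behavior that would shorten pod startup during autoscaling or failover.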

The following sections walk you through deploying models from Amazon SageMaker JumpStart, from Amazon S3 and Amazon FSx, and from local NVMe storage.
