Training plans utilization for Amazon SageMaker HyperPod clusters - Amazon SageMaker AI

To use SageMaker training plans with your Amazon SageMaker HyperPod cluster, specify the training plan at the instance group level when you create or update the cluster.

Note
  • The training plan must be in the Scheduled or Active status to be used by a HyperPod cluster.

  • Ensure the cluster configuration aligns with the Availability Zone (AZ) specified in your training plan.

    For VPC setup, resource location, and security group configuration, refer to Setting up SageMaker HyperPod with your Amazon VPC in the SageMaker HyperPod documentation.

    If setting up HyperPod with Amazon FSx for Lustre, learn about Region and AZ selection, review VPC configuration requirements, and understand AZ alignment best practices in (Optional) Setting up SageMaker HyperPod with Amazon FSx for Lustre.

  • You can select a plan for each of your instance groups. However, we do not recommend using a training plan for the primary instance group of a cluster, because primary nodes require continuous, stable resources, which does not align with the fixed duration and potentially discontinuous nature of training plan capacity.
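
The notes above can be sketched in Python as a small helper that attaches a training plan to selected instance groups before the cluster request is sent. This is a minimal illustration, not an official implementation: the helper name attach_training_plans, the group names, the instance types, and the ARN are hypothetical placeholders, and the plan statuses are assumed to have been looked up beforehand (for example, with the DescribeTrainingPlan API). The TrainingPlanArn and InstanceGroupName keys follow the instance group specification of the SageMaker CreateCluster API; other fields that a real request requires, such as the lifecycle configuration and execution role, are omitted for brevity.

```python
# Statuses in which a training plan can be used by a HyperPod cluster.
USABLE_STATUSES = {"Scheduled", "Active"}


def attach_training_plans(instance_groups, plan_arn_by_group, plan_status_by_arn):
    """Return copies of the instance group specs with TrainingPlanArn set.

    instance_groups:    list of dicts shaped like CreateCluster InstanceGroups entries
    plan_arn_by_group:  maps an instance group name to a training plan ARN
    plan_status_by_arn: maps a training plan ARN to its current status
                        (assumed to be fetched separately)
    """
    result = []
    for group in instance_groups:
        spec = dict(group)  # shallow copy so the caller's dict is not modified
        arn = plan_arn_by_group.get(group["InstanceGroupName"])
        if arn is not None:
            status = plan_status_by_arn.get(arn)
            if status not in USABLE_STATUSES:
                raise ValueError(
                    f"Training plan {arn} has status {status!r}; "
                    "it must be Scheduled or Active."
                )
            spec["TrainingPlanArn"] = arn
        result.append(spec)
    return result


# Placeholder ARN for illustration only.
PLAN_ARN = "arn:aws:sagemaker:us-west-2:111122223333:training-plan/example-plan"

# Hypothetical instance groups: the primary (controller) group keeps regular
# capacity, and only the worker group uses the training plan.
groups = [
    {"InstanceGroupName": "controller", "InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    {"InstanceGroupName": "workers", "InstanceType": "ml.p5.48xlarge", "InstanceCount": 4},
]

specs = attach_training_plans(
    groups,
    plan_arn_by_group={"workers": PLAN_ARN},
    plan_status_by_arn={PLAN_ARN: "Active"},
)
```

The resulting list could then be passed as the InstanceGroups argument of a create or update cluster request, together with the remaining required fields. Leaving the primary group without a plan reflects the recommendation above. This check does not cover Availability Zone alignment, which still needs to be verified against the cluster's VPC and subnet configuration.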