Create a SageMaker HyperPod cluster on training plans using the SageMaker API, or AWS CLI - Amazon SageMaker AI


To use SageMaker training plans for your Amazon SageMaker HyperPod cluster, specify the ARN of the training plan you want to use in the TrainingPlanArn parameter of the ClusterInstanceGroupSpecification when calling the CreateCluster API operation.
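If you know the training plan's name but not its ARN, you can look the ARN up with the describe-training-plan AWS CLI command. The following is a minimal sketch. The function name get_training_plan_arn is hypothetical, and the sketch assumes an AWS CLI version that includes the SageMaker training plan commands.

```shell
# Hypothetical helper: print the ARN of a training plan, given its name.
# Assumes an AWS CLI version that includes describe-training-plan.
get_training_plan_arn() {
    local plan_name="$1"
    aws sagemaker describe-training-plan \
        --training-plan-name "$plan_name" \
        --query 'TrainingPlanArn' \
        --output text
}
```

For example, TRAINING_PLAN_ARN="$(get_training_plan_arn my-training-plan)" stores the ARN in a shell variable, where my-training-plan is a placeholder for your plan's name.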

Make sure that the VpcConfig of your cluster configuration includes a subnet in the Availability Zone (AZ) designated by your training plan. You can retrieve a training plan's AvailabilityZone from the response of a DescribeTrainingPlan API call.
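To pick the right subnet, you can compare the plan's Availability Zone with the zone of each candidate subnet. The following is a minimal sketch under these assumptions: the plan's zone is reported under ReservedCapacitySummaries in the DescribeTrainingPlan response, the first reserved capacity's zone is representative of the plan, and the function names are hypothetical.

```shell
# Hypothetical helpers: find which candidate subnet lies in the training
# plan's Availability Zone. Assumes the zone is reported under
# ReservedCapacitySummaries in the DescribeTrainingPlan response.
plan_availability_zone() {
    aws sagemaker describe-training-plan \
        --training-plan-name "$1" \
        --query 'ReservedCapacitySummaries[0].AvailabilityZone' \
        --output text
}

subnet_availability_zone() {
    aws ec2 describe-subnets \
        --subnet-ids "$1" \
        --query 'Subnets[0].AvailabilityZone' \
        --output text
}

# Print the first subnet whose AZ matches the plan's AZ; return 1 if none do.
find_subnet_for_plan() {
    local plan_name="$1"
    shift
    local plan_az
    plan_az="$(plan_availability_zone "$plan_name")" || return 1
    local subnet
    for subnet in "$@"; do
        if [ "$(subnet_availability_zone "$subnet")" = "$plan_az" ]; then
            echo "$subnet"
            return 0
        fi
    done
    return 1
}
```

For example, find_subnet_for_plan my-training-plan subnet-aaa subnet-bbb prints whichever of the placeholder subnets is in the plan's zone, and that subnet ID is the one to include in the Subnets list of your VpcConfig.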

The following sample shows how to create a new SageMaker HyperPod cluster with the create-cluster AWS CLI command, assigning a training plan to an instance group in the --instance-groups parameter.

```shell
# Create a cluster. Replace the placeholder values (cluster name, S3 URI,
# account ID, role name, security group, subnet, and training plan ARN).
# The subnet must be in the Availability Zone of the training plan.
aws sagemaker create-cluster \
    --cluster-name cluster-name \
    --vpc-config '{"SecurityGroupIds": ["security_group_id"], "Subnets": ["subnet_id"]}' \
    --instance-groups '[
        {
            "InstanceCount": 1,
            "InstanceGroupName": "controller-nodes",
            "InstanceType": "ml.t3.xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1
        },
        {
            "InstanceCount": 2,
            "InstanceGroupName": "worker-nodes",
            "InstanceType": "ml.p4d.24xlarge",
            "LifeCycleConfig": {"SourceS3Uri": "source_s3_uri", "OnCreate": "on_create.sh"},
            "ExecutionRole": "arn:aws:iam::customer_account_id:role/execution_role",
            "ThreadsPerCore": 1,
            "TrainingPlanArn": "training_plan_arn"
        }
    ]'
```
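Inline JSON inside single quotes does not expand shell variables and is easy to break with a stray comma or brace. As an alternative, you can write the instance groups to a file from shell variables and pass the file to the AWS CLI. The following is a minimal sketch; the function name write_instance_groups and the variables SOURCE_S3_URI, EXECUTION_ROLE_ARN, and TRAINING_PLAN_ARN are hypothetical placeholders that you set yourself.

```shell
# Hypothetical sketch: write the instance groups to a JSON file from shell
# variables so that the values are substituted and quoted correctly.
# SOURCE_S3_URI, EXECUTION_ROLE_ARN, and TRAINING_PLAN_ARN must be set first.
write_instance_groups() {
    local out_file="$1"
    cat > "$out_file" <<EOF
[
    {
        "InstanceCount": 1,
        "InstanceGroupName": "controller-nodes",
        "InstanceType": "ml.t3.xlarge",
        "LifeCycleConfig": {"SourceS3Uri": "${SOURCE_S3_URI}", "OnCreate": "on_create.sh"},
        "ExecutionRole": "${EXECUTION_ROLE_ARN}",
        "ThreadsPerCore": 1
    },
    {
        "InstanceCount": 2,
        "InstanceGroupName": "worker-nodes",
        "InstanceType": "ml.p4d.24xlarge",
        "LifeCycleConfig": {"SourceS3Uri": "${SOURCE_S3_URI}", "OnCreate": "on_create.sh"},
        "ExecutionRole": "${EXECUTION_ROLE_ARN}",
        "ThreadsPerCore": 1,
        "TrainingPlanArn": "${TRAINING_PLAN_ARN}"
    }
]
EOF
}
```

After calling write_instance_groups instance-groups.json, you would pass the file to create-cluster with --instance-groups file://instance-groups.json, which uses the AWS CLI's standard file:// syntax for loading a parameter value from a local file.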

For information about how to create a HyperPod cluster using the AWS CLI, see create-cluster in the AWS CLI Command Reference.

After you create the cluster, you can verify that your instance group was assigned capacity from the training plan by calling the DescribeCluster API operation, for example with the describe-cluster AWS CLI command:

```shell
aws sagemaker describe-cluster --cluster-name cluster-name
```
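To check a single instance group instead of reading the full response, you can filter the describe-cluster output with a JMESPath --query expression. The following is a minimal sketch; the function name instance_group_uses_plan is hypothetical, and the sketch assumes that each entry in the InstanceGroups list of the DescribeCluster response reports a TrainingPlanArn field.

```shell
# Hypothetical helper: succeed only if the named instance group in the
# cluster reports the expected training plan ARN. Assumes DescribeCluster
# returns TrainingPlanArn for each entry in InstanceGroups.
instance_group_uses_plan() {
    local cluster_name="$1" group_name="$2" expected_arn="$3"
    local actual_arn
    actual_arn="$(aws sagemaker describe-cluster \
        --cluster-name "$cluster_name" \
        --query "InstanceGroups[?InstanceGroupName=='${group_name}'].TrainingPlanArn | [0]" \
        --output text)" || return 1
    [ "$actual_arn" = "$expected_arn" ]
}
```

For example, instance_group_uses_plan cluster-name worker-nodes "$TRAINING_PLAN_ARN" returns a zero exit status when the worker-nodes group reports that plan. Comparing the group's CurrentCount with its TargetCount in the same response is one way to see whether the instances have actually launched.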