ClusterInstanceGroupDetails - Amazon SageMaker

ClusterInstanceGroupDetails

Details of an instance group in a SageMaker HyperPod cluster.

Contents

CurrentCount

The number of instances that are currently in the instance group of a SageMaker HyperPod cluster.

Type: Integer

Valid Range: Minimum value of 0.

Required: No

ExecutionRole

The execution role for the instance group to assume.

Type: String

Length Constraints: Minimum length of 20. Maximum length of 2048.

Pattern: ^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$

Required: No
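
The length and pattern constraints above can be checked on the client side before a role ARN is used in a request. The following is a minimal sketch using only the Python standard library; the function name is_valid_execution_role is hypothetical and not part of any AWS SDK.

```python
import re

# Pattern and length limits copied from the ExecutionRole constraints above.
# re.ASCII restricts \d to the digits 0-9.
_ROLE_ARN_PATTERN = re.compile(
    r"^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$",
    re.ASCII,
)
_MIN_LEN, _MAX_LEN = 20, 2048


def is_valid_execution_role(arn: str) -> bool:
    """Return True if arn meets the documented length and pattern constraints.

    Hypothetical helper: it checks the string format only, not whether
    the role exists or can be assumed.
    """
    if not (_MIN_LEN <= len(arn) <= _MAX_LEN):
        return False
    return _ROLE_ARN_PATTERN.fullmatch(arn) is not None
```

A check like this catches malformed ARNs early, but the service remains the authority on whether a role is usable.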

InstanceGroupName

The name of the instance group of a SageMaker HyperPod cluster.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*$

Required: No
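
The pattern above requires the name to start and end with an alphanumeric character, with hyphens allowed only in between. A minimal sketch of that check, assuming a hypothetical helper named is_valid_instance_group_name:

```python
import re

# Pattern copied from the InstanceGroupName constraints above.
_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def is_valid_instance_group_name(name: str) -> bool:
    """Return True if name is 1 to 63 characters long and matches the pattern.

    Hypothetical helper, not part of any AWS SDK.
    """
    return 1 <= len(name) <= 63 and _NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern, names such as worker-group-1 are accepted, while names with a leading or trailing hyphen, or with underscores, are rejected.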

InstanceStorageConfigs

The additional storage configurations for the instances in the SageMaker HyperPod cluster instance group.

Type: Array of ClusterInstanceStorageConfig objects

Array Members: Maximum number of 1 item.

Required: No

InstanceType

The instance type of the instance group of a SageMaker HyperPod cluster.

Type: String

Valid Values: ml.p4d.24xlarge | ml.p4de.24xlarge | ml.p5.48xlarge | ml.trn1.32xlarge | ml.trn1n.32xlarge | ml.g5.xlarge | ml.g5.2xlarge | ml.g5.4xlarge | ml.g5.8xlarge | ml.g5.12xlarge | ml.g5.16xlarge | ml.g5.24xlarge | ml.g5.48xlarge | ml.c5.large | ml.c5.xlarge | ml.c5.2xlarge | ml.c5.4xlarge | ml.c5.9xlarge | ml.c5.12xlarge | ml.c5.18xlarge | ml.c5.24xlarge | ml.c5n.large | ml.c5n.2xlarge | ml.c5n.4xlarge | ml.c5n.9xlarge | ml.c5n.18xlarge | ml.m5.large | ml.m5.xlarge | ml.m5.2xlarge | ml.m5.4xlarge | ml.m5.8xlarge | ml.m5.12xlarge | ml.m5.16xlarge | ml.m5.24xlarge | ml.t3.medium | ml.t3.large | ml.t3.xlarge | ml.t3.2xlarge | ml.g6.xlarge | ml.g6.2xlarge | ml.g6.4xlarge | ml.g6.8xlarge | ml.g6.16xlarge | ml.g6.12xlarge | ml.g6.24xlarge | ml.g6.48xlarge | ml.gr6.4xlarge | ml.gr6.8xlarge | ml.g6e.xlarge | ml.g6e.2xlarge | ml.g6e.4xlarge | ml.g6e.8xlarge | ml.g6e.16xlarge | ml.g6e.12xlarge | ml.g6e.24xlarge | ml.g6e.48xlarge | ml.p5e.48xlarge | ml.p5en.48xlarge | ml.trn2.48xlarge | ml.c6i.large | ml.c6i.xlarge | ml.c6i.2xlarge | ml.c6i.4xlarge | ml.c6i.8xlarge | ml.c6i.12xlarge | ml.c6i.16xlarge | ml.c6i.24xlarge | ml.c6i.32xlarge | ml.m6i.large | ml.m6i.xlarge | ml.m6i.2xlarge | ml.m6i.4xlarge | ml.m6i.8xlarge | ml.m6i.12xlarge | ml.m6i.16xlarge | ml.m6i.24xlarge | ml.m6i.32xlarge | ml.r6i.large | ml.r6i.xlarge | ml.r6i.2xlarge | ml.r6i.4xlarge | ml.r6i.8xlarge | ml.r6i.12xlarge | ml.r6i.16xlarge | ml.r6i.24xlarge | ml.r6i.32xlarge

Required: No

LifeCycleConfig

Details of LifeCycle configuration for the instance group.

Type: ClusterLifeCycleConfig object

Required: No

OnStartDeepHealthChecks

A list of the deep health checks to perform when the cluster instance group is created or updated.

Type: Array of strings

Array Members: Minimum number of 1 item. Maximum number of 2 items.

Valid Values: InstanceStress | InstanceConnectivity

Required: No
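
The array and value constraints above can be sketched as a small validation routine. The function name validate_deep_health_checks and the wording of its messages are illustrative assumptions, not part of the API.

```python
# Allowed values copied from the OnStartDeepHealthChecks constraints above.
VALID_DEEP_HEALTH_CHECKS = {"InstanceStress", "InstanceConnectivity"}


def validate_deep_health_checks(checks):
    """Return a list of constraint violations; an empty list means none found.

    Hypothetical helper: it mirrors the documented array size (1 to 2 items)
    and the documented valid values.
    """
    problems = []
    if not 1 <= len(checks) <= 2:
        problems.append(f"expected 1 to 2 items, got {len(checks)}")
    unknown = [c for c in checks if c not in VALID_DEEP_HEALTH_CHECKS]
    if unknown:
        problems.append(f"unknown values: {unknown}")
    return problems
```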

OverrideVpcConfig

The customized Amazon VPC configuration at the instance group level that overrides the default Amazon VPC configuration of the SageMaker HyperPod cluster.

Type: VpcConfig object

Required: No

Status

The current status of the cluster instance group.

  • InService: The instance group is active and healthy.

  • Creating: The instance group is being provisioned.

  • Updating: The instance group is being updated.

  • Failed: The instance group has failed to provision or is no longer healthy.

  • Degraded: The instance group is degraded, meaning that some instances have failed to provision or are no longer healthy.

  • Deleting: The instance group is being deleted.

Type: String

Valid Values: InService | Creating | Updating | Failed | Degraded | SystemUpdating | Deleting

Required: No
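
The status descriptions above can be grouped into broad categories when monitoring a cluster. The following is a minimal sketch; the category labels and the function name summarize_status are assumptions made for illustration. SystemUpdating appears in the valid values but is not described above, so the sketch deliberately leaves it unclassified.

```python
# Groupings derived from the status descriptions above.
IN_PROGRESS_STATUSES = {"Creating", "Updating", "Deleting"}
UNHEALTHY_STATUSES = {"Failed", "Degraded"}


def summarize_status(status: str) -> str:
    """Map an instance group status to an illustrative category label."""
    if status == "InService":
        return "healthy"
    if status in UNHEALTHY_STATUSES:
        return "unhealthy"
    if status in IN_PROGRESS_STATUSES:
        return "in-progress"
    # Covers SystemUpdating (not described above) and any unrecognized value.
    return "unknown"
```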

TargetCount

The number of instances you specified to add to the instance group of a SageMaker HyperPod cluster.

Type: Integer

Valid Range: Minimum value of 0. Maximum value of 6758.

Required: No
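
Comparing TargetCount with CurrentCount gives a rough view of how many instances have yet to join the group. The following is a minimal sketch, assuming the instance group details are available as a plain dictionary keyed by the field names on this page; the function name instances_pending is hypothetical.

```python
from typing import Optional


def instances_pending(group: dict) -> Optional[int]:
    """Return TargetCount minus CurrentCount, floored at zero.

    Both fields are documented as not required, so None is returned
    when either one is missing from the dictionary.
    """
    current = group.get("CurrentCount")
    target = group.get("TargetCount")
    if current is None or target is None:
        return None
    return max(target - current, 0)
```

For example, a group with a TargetCount of 8 and a CurrentCount of 5 yields 3.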

ThreadsPerCore

The value you specified for ThreadsPerCore in CreateCluster to enable or disable multithreading. For instance types that support multithreading, you can specify 1 to disable multithreading and 2 to enable it. For more information, see the reference table of CPU cores and threads per CPU core per instance type in the Amazon Elastic Compute Cloud User Guide.

Type: Integer

Valid Range: Minimum value of 1. Maximum value of 2.

Required: No

TrainingPlanArn

The Amazon Resource Name (ARN) of the training plan associated with this cluster instance group.

For more information about how to reserve GPU capacity for your SageMaker HyperPod clusters using Amazon SageMaker Training Plan, see CreateTrainingPlan.

Type: String

Length Constraints: Minimum length of 50. Maximum length of 2048.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:training-plan/.*

Required: No
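
The ARN pattern above encodes the partition, Region, account ID, and training plan name. The following is a minimal sketch that extracts those parts with named groups; the function name parse_training_plan_arn and the group names are assumptions for illustration. The documented pattern has no anchors, so requiring a full-string match here is a stricter choice than the pattern itself.

```python
import re
from typing import Optional

# Pattern copied from the TrainingPlanArn constraints above, with named
# groups added for illustration.
_TRAINING_PLAN_ARN = re.compile(
    r"arn:(?P<partition>aws[a-z\-]*):sagemaker:(?P<region>[a-z0-9\-]*):"
    r"(?P<account>[0-9]{12}):training-plan/(?P<name>.*)"
)


def parse_training_plan_arn(arn: str) -> Optional[dict]:
    """Return the ARN components as a dict, or None if the ARN is invalid.

    Hypothetical helper: it applies the documented length limits
    (50 to 2048 characters) and requires the whole string to match.
    """
    if not 50 <= len(arn) <= 2048:
        return None
    match = _TRAINING_PLAN_ARN.fullmatch(arn)
    return match.groupdict() if match else None
```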

TrainingPlanStatus

The current status of the training plan associated with this cluster instance group.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Required: No
