
StartClusterHealthCheck - Amazon SageMaker

StartClusterHealthCheck

Starts deep health checks on a SageMaker HyperPod cluster. Use the DescribeClusterNode API to track the progress of the deep health checks. Unhealthy nodes are automatically rebooted or replaced. For details, see Resilience-related Kubernetes labels by SageMaker HyperPod.
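Here is a minimal Python sketch of starting deep health checks. It assumes a boto3 SageMaker client (`boto3.client("sagemaker")`), which would expose this operation as the snake_case method `start_cluster_health_check`. The helper name `start_deep_health_checks` is hypothetical, and so are the example check names `InstanceStress` and `InstanceConnectivity`. The client is passed in as a parameter, so you can exercise the function without AWS credentials.

```python
from typing import Any, Optional, Sequence


def start_deep_health_checks(
    client: Any,
    cluster_name: str,
    instance_group_name: str,
    checks: Sequence[str],
    instance_ids: Optional[Sequence[str]] = None,
) -> str:
    """Start deep health checks on one instance group and return the cluster ARN.

    Assumption: `client` is a boto3 SageMaker client, or any object that exposes
    a compatible `start_cluster_health_check(**kwargs)` method.
    """
    config = {
        "InstanceGroupName": instance_group_name,
        "DeepHealthChecks": list(checks),
    }
    # Optionally restrict the checks to specific EC2 instances in the group.
    if instance_ids:
        config["InstanceIds"] = list(instance_ids)

    response = client.start_cluster_health_check(
        ClusterName=cluster_name,
        DeepHealthCheckConfigurations=[config],
    )
    return response["ClusterArn"]
```

With boto3 installed and credentials configured, a call might look like `start_deep_health_checks(boto3.client("sagemaker"), "my-cluster", "worker-group", ["InstanceStress"])`. You would then track per-node progress with DescribeClusterNode.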

Request Syntax

{
  "ClusterName": "string",
  "DeepHealthCheckConfigurations": [
    {
      "DeepHealthChecks": [ "string" ],
      "InstanceGroupName": "string",
      "InstanceIds": [ "string" ]
    }
  ]
}
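This sketch uses only the Python standard library. It assembles a body with the same shape as the request syntax and serializes it to JSON. The function name `build_request_body` is hypothetical. The cluster name, instance group name, instance ID, and check names are placeholders and are not taken from this reference.

```python
import json


def build_request_body(cluster_name, configurations):
    """Serialize a body shaped like the StartClusterHealthCheck request syntax."""
    body = {
        "ClusterName": cluster_name,
        "DeepHealthCheckConfigurations": configurations,
    }
    return json.dumps(body, indent=2)


# Placeholder values for illustration only.
example = build_request_body(
    "my-hyperpod-cluster",
    [
        {
            "DeepHealthChecks": ["InstanceStress", "InstanceConnectivity"],
            "InstanceGroupName": "worker-group",
            "InstanceIds": ["i-0123456789abcdef0"],
        }
    ],
)
print(example)
```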

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

ClusterName

The string name or the Amazon Resource Name (ARN) of the SageMaker HyperPod cluster.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: (arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})

Required: Yes
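To catch malformed names before sending a request, you can check a `ClusterName` value locally against the documented length limit and pattern. This is a sketch that assumes the pattern must match the entire string, which is why it uses `re.fullmatch`. The function name is hypothetical.

```python
import re

# Pattern copied from the ClusterName parameter description: either a cluster ARN
# or a name of 1-63 alphanumeric characters with internal hyphens.
CLUSTER_NAME_PATTERN = re.compile(
    r"(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})"
    r"|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})"
)
MAX_CLUSTER_NAME_LENGTH = 256


def is_valid_cluster_name(value: str) -> bool:
    """Return True if value meets the documented length and pattern constraints."""
    if len(value) > MAX_CLUSTER_NAME_LENGTH:
        return False
    # fullmatch tries both alternatives and requires the whole string to match.
    return CLUSTER_NAME_PATTERN.fullmatch(value) is not None
```

The pattern cannot match an empty string, so the documented minimum length of 0 has no practical effect here.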

DeepHealthCheckConfigurations

A list of configurations containing instance group names, EC2 instance IDs, and deep health checks to perform.

Type: Array of InstanceGroupHealthCheckConfiguration objects

Array Members: Minimum number of 1 item. Maximum number of 99 items.

Required: Yes
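Here is a client-side guard for the array bounds on `DeepHealthCheckConfigurations` (1 to 99 items). The function name is hypothetical. The sketch checks only the item count. It does not check the fields inside each `InstanceGroupHealthCheckConfiguration` object.

```python
from typing import Any, Dict, List

MIN_CONFIGURATIONS = 1
MAX_CONFIGURATIONS = 99


def check_configuration_count(configurations: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the list violates the documented 1-99 item bound."""
    count = len(configurations)
    if not MIN_CONFIGURATIONS <= count <= MAX_CONFIGURATIONS:
        raise ValueError(
            "DeepHealthCheckConfigurations must contain between "
            f"{MIN_CONFIGURATIONS} and {MAX_CONFIGURATIONS} items, got {count}"
        )
```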

Response Syntax

{ "ClusterArn": "string" }

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

ClusterArn

The Amazon Resource Name (ARN) of the SageMaker HyperPod cluster on which the deep health checks were initiated.

Type: String

Length Constraints: Minimum length of 0. Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12}
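Because the returned `ClusterArn` follows a fixed pattern, you can split it into its components, for example to log the Region or the 12-character cluster ID. This is a minimal sketch. It uses the response pattern with named groups added, and the names `parse_cluster_arn` and `ClusterArnParts` are hypothetical.

```python
import re
from typing import NamedTuple

# The ClusterArn pattern from the response description, with named groups added.
CLUSTER_ARN_RE = re.compile(
    r"arn:(?P<partition>aws[a-z\-]*):sagemaker:(?P<region>[a-z0-9\-]*):"
    r"(?P<account>[0-9]{12}):cluster/(?P<cluster_id>[a-z0-9]{12})"
)


class ClusterArnParts(NamedTuple):
    partition: str
    region: str
    account: str
    cluster_id: str


def parse_cluster_arn(arn: str) -> ClusterArnParts:
    """Split a HyperPod cluster ARN into partition, Region, account, and cluster ID."""
    match = CLUSTER_ARN_RE.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a SageMaker HyperPod cluster ARN: {arn!r}")
    return ClusterArnParts(**match.groupdict())
```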

Errors

For information about the errors that are common to all actions, see Common Error Types.

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400
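boto3 exposes modeled service errors as exception classes on the client. This sketch assumes the `ResourceNotFound` error above surfaces as `client.exceptions.ResourceNotFound` on a SageMaker client and handles it by returning `None`. Both the function name and the choice to return `None` are hypothetical.

```python
from typing import Any, List, Optional


def try_start_health_check(client: Any, cluster_name: str, configurations: List[dict]) -> Optional[str]:
    """Return the cluster ARN, or None if the service reports ResourceNotFound.

    Assumption: `client` is a boto3 SageMaker client, on which this error is
    exposed as `client.exceptions.ResourceNotFound`.
    """
    try:
        response = client.start_cluster_health_check(
            ClusterName=cluster_name,
            DeepHealthCheckConfigurations=configurations,
        )
    except client.exceptions.ResourceNotFound:
        # HTTP 400 ResourceNotFound: the resource being accessed was not found.
        return None
    return response["ClusterArn"]
```

All other errors propagate unchanged, so the caller still sees the common errors listed in Common Error Types.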

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: