HyperPod tabs in Studio
In Amazon SageMaker Studio, you can open HyperPod clusters (under Compute) to view your list of clusters and then choose one to inspect. Each cluster displays information such as tasks, hardware metrics, settings, and metadata details. This visibility can help your team identify the right cluster for your pre-training or fine-tuning workloads. The following sections describe each type of information.
Tasks
Amazon SageMaker HyperPod provides a view of your cluster tasks. Tasks are operations or jobs that are sent to the cluster. These can be machine learning operations, like training, running experiments, or inference. The following section provides information on your HyperPod cluster tasks.
In Amazon SageMaker Studio, you can navigate to one of your clusters in HyperPod clusters (under Compute) and view the Tasks information on your cluster. If you are having any issues with viewing tasks, see Troubleshoot.
The task table includes:
Metrics
Amazon SageMaker HyperPod provides a view of your Slurm or Amazon EKS cluster utilization metrics. The following provides information on your HyperPod cluster metrics.
To view the following metrics, you need to install the Amazon CloudWatch Observability EKS add-on. For more information, see Install the Amazon CloudWatch Observability EKS add-on.
In Amazon SageMaker Studio, you can navigate to one of your clusters in HyperPod clusters (under Compute) and view the Metrics details on your cluster. Metrics provides a comprehensive view of cluster utilization metrics, including hardware, team, and task metrics. This includes compute availability and usage, team allocation and utilization, and task run and wait time information.
Settings
Amazon SageMaker HyperPod provides a view of your cluster settings. The following provides information on your HyperPod cluster settings.
In Amazon SageMaker Studio, you can navigate to one of your clusters in HyperPod clusters (under Compute) and view the Settings information on your cluster. The information includes the following:
- Instances details, including instance ID, status, instance type, and instance group
- Instance groups details, including instance group name, type, counts, and compute information
- Orchestration details, including the orchestrator, version, and certification authority
- Cluster resiliency details
- Security details, including subnets and security groups
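Some of the same settings can also be read outside Studio. The following is a minimal sketch, assuming the response shape of the SageMaker DescribeCluster API as exposed by boto3 (`describe_cluster`). The helper names `summarize_cluster_settings` and `fetch_cluster_settings` are hypothetical and not part of any AWS SDK, and the cluster name you pass in is your own.

```python
from typing import Any


def summarize_cluster_settings(cluster: dict[str, Any]) -> dict[str, Any]:
    """Condense a DescribeCluster-style response into a few Settings-like fields."""
    instance_groups = [
        {
            "name": group.get("InstanceGroupName"),
            "type": group.get("InstanceType"),
            "current": group.get("CurrentCount"),
            "target": group.get("TargetCount"),
        }
        for group in cluster.get("InstanceGroups", [])
    ]
    vpc = cluster.get("VpcConfig", {})
    # Assumption: an "Eks" entry under "Orchestrator" marks an EKS-orchestrated
    # cluster; if it is absent, this sketch does not guess the orchestrator.
    orchestrator = "EKS" if "Eks" in cluster.get("Orchestrator", {}) else "not reported"
    return {
        "status": cluster.get("ClusterStatus"),
        "orchestrator": orchestrator,
        "instance_groups": instance_groups,
        "subnets": vpc.get("Subnets", []),
        "security_groups": vpc.get("SecurityGroupIds", []),
    }


def fetch_cluster_settings(cluster_name: str) -> dict[str, Any]:
    """Call DescribeCluster (requires boto3 and AWS credentials) and summarize it."""
    import boto3  # imported lazily so the summarizer works without boto3 installed

    client = boto3.client("sagemaker")
    return summarize_cluster_settings(client.describe_cluster(ClusterName=cluster_name))
```

Keeping the summarizer separate from the API call makes it possible to check the field mapping against a saved response before running it with live credentials.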
Details
Amazon SageMaker HyperPod provides a view of your cluster metadata details. The following paragraph provides information on how to get your HyperPod cluster details.
In Amazon SageMaker Studio, you can navigate to one of your clusters in HyperPod clusters (under Compute) and view the Details on your cluster. This includes the tags, logs, and metadata.
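Cluster tags can also be retrieved programmatically. The following is a minimal sketch, assuming the SageMaker ListTags API (boto3 `list_tags`) accepts the ARN of your HyperPod cluster and returns tags as a list of `Key`/`Value` pairs. The helper names `tags_to_dict` and `fetch_cluster_tags` are hypothetical.

```python
from typing import Any


def tags_to_dict(tags: list[dict[str, str]]) -> dict[str, str]:
    """Convert a ListTags-style list of {"Key": ..., "Value": ...} into a plain dict."""
    return {tag["Key"]: tag.get("Value", "") for tag in tags}


def fetch_cluster_tags(cluster_arn: str) -> dict[str, str]:
    """Page through ListTags for a resource ARN (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so tags_to_dict works without boto3 installed

    client = boto3.client("sagemaker")
    collected: list[dict[str, str]] = []
    kwargs: dict[str, Any] = {"ResourceArn": cluster_arn}
    while True:
        page = client.list_tags(**kwargs)
        collected.extend(page.get("Tags", []))
        token = page.get("NextToken")
        if not token:
            break
        # Continue from where the previous page ended.
        kwargs["NextToken"] = token
    return tags_to_dict(collected)
```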