Amazon SageMaker Unified Studio is in preview release and is subject to change.
HyperPod clusters
Use Amazon SageMaker AI HyperPod to help you provision resilient compute clusters for running model training or fine-tuning workloads. Amazon SageMaker AI HyperPod integrates with Slurm or Amazon EKS for orchestration.
You can create HyperPod clusters using the Amazon SageMaker AI HyperPod console UI or SageMaker AI Studio. For more information, see Orchestrating SageMaker AI HyperPod clusters with Slurm or Orchestrating SageMaker AI HyperPod clusters with Amazon EKS in the Amazon SageMaker AI Developer Guide.
In Amazon SageMaker Unified Studio, you can launch machine learning workloads on Amazon SageMaker AI HyperPod clusters. You can also view details about the HyperPod clusters.
Connect to a HyperPod cluster
To use a HyperPod cluster in Amazon SageMaker Unified Studio, you create a connection to the cluster by following these steps:
- Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.
- From the Build drop-down menu, choose HyperPod. The Compute page displays the HyperPod clusters for your project.
- Choose Add compute.
- In the Add compute form, configure the following fields:
  - For Connection name, enter a name for this connection.
  - For HyperPod cluster name, enter the name of the HyperPod cluster.
  - For Access role ARN, enter the ARN of the IAM role that the project needs to assume.
  - For Account ID, enter the ID of the AWS account where the runtime role exists.
  - For AWS Region, enter the Region where the HyperPod cluster was created.
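Before filling in the form, it can help to sanity-check the values you plan to enter. The following is a minimal sketch, not part of Amazon SageMaker Unified Studio: the `HyperPodConnection` class, the `validate` function, and both regular expressions are hypothetical, and the check that the account ID matches the account in the role ARN is an assumption about how the fields relate.

```python
import re
from dataclasses import dataclass

# Assumed patterns; the Add compute form may accept or reject different values.
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/[\w+=,.@/-]+$"
)
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


@dataclass(frozen=True)
class HyperPodConnection:
    """Hypothetical container that mirrors the Add compute form fields."""

    connection_name: str
    cluster_name: str
    access_role_arn: str
    account_id: str
    region: str


def validate(conn: HyperPodConnection) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if not conn.connection_name.strip():
        problems.append("connection name is empty")
    if not conn.cluster_name.strip():
        problems.append("cluster name is empty")
    if not re.fullmatch(r"\d{12}", conn.account_id):
        problems.append("account ID must be 12 digits")
    match = ROLE_ARN_RE.match(conn.access_role_arn)
    if match is None:
        problems.append("access role ARN is not an IAM role ARN")
    elif match.group(2) != conn.account_id:
        # Assumption: the role lives in the account named in the Account ID field.
        problems.append("account ID does not match the account in the role ARN")
    if not REGION_RE.match(conn.region):
        problems.append("Region does not look like an AWS Region code")
    return problems
```

Calling `validate` on a filled-in `HyperPodConnection` returns an empty list when every field passes these format checks. It does not confirm that the cluster or role exists.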
View the HyperPod clusters
To view the HyperPod clusters in your project, follow these steps:
- Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.
- From the Build drop-down menu, choose HyperPods.

The portal opens the HyperPod clusters tab of the Compute page. The HyperPod clusters table provides a summary view of each cluster, including the ARN, status, and creation time.
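The same summary fields (ARN, status, and creation time) are also returned by the SageMaker `ListClusters` API, which can be useful for cross-checking what the table shows. The sketch below assumes that `boto3` is installed and that AWS credentials are configured. The function names `fetch_cluster_summaries` and `format_rows` are hypothetical, and this is not how the portal itself retrieves the data.

```python
from datetime import timezone


def fetch_cluster_summaries(region: str) -> list[dict]:
    """Call the SageMaker ListClusters API and follow NextToken pagination."""
    import boto3  # imported lazily so format_rows works without boto3 installed

    client = boto3.client("sagemaker", region_name=region)
    summaries: list[dict] = []
    kwargs: dict = {}
    while True:
        response = client.list_clusters(**kwargs)
        summaries.extend(response["ClusterSummaries"])
        token = response.get("NextToken")
        if not token:
            return summaries
        kwargs["NextToken"] = token


def format_rows(summaries: list[dict]) -> list[str]:
    """Render one tab-separated row per cluster: name, status, created (UTC), ARN."""
    rows = []
    for item in sorted(summaries, key=lambda s: s["ClusterName"]):
        created = item["CreationTime"].astimezone(timezone.utc)
        rows.append(
            "\t".join(
                [
                    item["ClusterName"],
                    item["ClusterStatus"],
                    created.strftime("%Y-%m-%d %H:%M"),
                    item["ClusterArn"],
                ]
            )
        )
    return rows
```

For example, `print("\n".join(format_rows(fetch_cluster_summaries("us-west-2"))))` lists the clusters that the caller's credentials can see in that Region. That set may differ from the clusters connected to your project.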
View details about a HyperPod cluster
To view the details page for a HyperPod cluster, choose the cluster in the HyperPod clusters table. The page displays tabs for tasks, metrics, settings, and metadata details.
For more information about HyperPod cluster details that you can view in Amazon SageMaker Unified Studio, see HyperPod tabs in Studio in the Amazon SageMaker AI Developer Guide.
HyperPod task governance
For Amazon EKS clusters, you can use HyperPod task governance to streamline the allocation and utilization of compute resources in the cluster.
HyperPod task governance provides a comprehensive dashboard view of your Amazon EKS cluster utilization metrics, including hardware, team, and task metrics.
For more information about the HyperPod dashboard view, see Dashboard in the Amazon SageMaker AI Developer Guide.
Open the HyperPod in JupyterLab
To open your HyperPod cluster in JupyterLab, follow these steps:
- From the cluster details page, choose Open in JupyterLab. The Starting space page opens and the space initialization starts.
- After the JupyterLab space is ready, the HyperPod sample notebook opens.

The HyperPod sample notebook shows the end-to-end flow of how to use the HyperPod cluster, including sample commands for:
- Connecting to the cluster
- Submitting jobs to the cluster
- Viewing job status or cluster status
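The commands in the sample notebook are not reproduced here. As a rough illustration of what the three kinds of commands can look like, the following sketch builds command lines for a Slurm-orchestrated cluster. It assumes an AWS Systems Manager session for the connection and the standard Slurm tools `sbatch`, `squeue`, and `sinfo`. All function names, the cluster ID, the instance group name, the instance ID, and the script name are hypothetical, and clusters orchestrated with Amazon EKS use different tooling.

```python
import shlex


def ssm_target(cluster_id: str, instance_group: str, instance_id: str) -> str:
    """Build the Systems Manager target string for a HyperPod cluster node."""
    return f"sagemaker-cluster:{cluster_id}_{instance_group}-{instance_id}"


def connect_command(cluster_id: str, instance_group: str, instance_id: str) -> list[str]:
    """Command that opens a shell session on a cluster node."""
    target = ssm_target(cluster_id, instance_group, instance_id)
    return ["aws", "ssm", "start-session", "--target", target]


def submit_command(script: str, nodes: int) -> list[str]:
    """Command that submits a batch script; --parsable prints only the job ID."""
    return ["sbatch", "--parsable", f"--nodes={nodes}", script]


def status_commands(job_id: str) -> list[list[str]]:
    """Commands that show the state of one job and of the cluster partitions."""
    return [["squeue", f"--jobs={job_id}"], ["sinfo"]]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(shlex.join(connect_command("aa11bbbbb222", "controller-group", "i-111222333444555aa")))
    print(shlex.join(submit_command("train.sbatch", nodes=2)))
    for command in status_commands("42"):
        print(shlex.join(command))
```

The functions only build the command lines; they do not run them. Run the Slurm commands from a shell on the cluster, for example inside the session that the first command opens.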