SageMaker HyperPod task governance is a management system that streamlines resource allocation and helps ensure efficient use of compute resources across teams and projects in your Amazon EKS clusters. It lets administrators set:
- Priority levels for various tasks
- Compute allocation for each team
- How each team lends and borrows idle compute
- Whether a team can preempt its own tasks
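To make these four settings concrete, here is a minimal, purely illustrative Python model of how they could interact. It is a sketch under stated assumptions, not the HyperPod API or its actual scheduling algorithm. `PRIORITY_CLASSES`, `TeamPolicy`, `idle_pool`, `max_usable`, and `should_preempt` are all hypothetical names, GPUs stand in for "compute," and the lending rule (a team can borrow up to its limit from capacity that other teams have left unused and agreed to lend) is an assumption chosen for clarity.

```python
from dataclasses import dataclass

# Hypothetical priority classes: a higher weight means more important work.
PRIORITY_CLASSES = {"inference": 100, "training": 50, "experimentation": 10}


@dataclass
class TeamPolicy:
    """Hypothetical per-team policy mirroring the settings listed above."""
    name: str
    allocated_gpus: int        # compute allocation for the team
    lends_idle: bool           # whether the team lends its unused allocation
    borrow_limit: int          # max idle GPUs the team may borrow from others
    preempts_own_tasks: bool   # whether higher-priority tasks may preempt the team's own lower-priority tasks


def idle_pool(teams, usage):
    """Sum the unused allocation of every team that agrees to lend."""
    return sum(
        max(t.allocated_gpus - usage.get(t.name, 0), 0)
        for t in teams
        if t.lends_idle
    )


def max_usable(team, teams, usage):
    """GPUs a team could use: its own allocation plus capped borrowing from other teams."""
    others = [t for t in teams if t.name != team.name]
    return team.allocated_gpus + min(team.borrow_limit, idle_pool(others, usage))


def should_preempt(team, running_priority, incoming_priority):
    """Assumed rule: preempt only within the team, and only for strictly higher priority."""
    return (
        team.preempts_own_tasks
        and PRIORITY_CLASSES[incoming_priority] > PRIORITY_CLASSES[running_priority]
    )


# Example: two hypothetical teams and their current GPU usage.
teams = [
    TeamPolicy("ml-research", allocated_gpus=16, lends_idle=True, borrow_limit=8, preempts_own_tasks=True),
    TeamPolicy("ml-prod", allocated_gpus=32, lends_idle=True, borrow_limit=0, preempts_own_tasks=False),
]
usage = {"ml-research": 16, "ml-prod": 20}

if __name__ == "__main__":
    # ml-prod leaves 12 GPUs idle; ml-research may borrow up to 8 of them.
    print(max_usable(teams[0], teams, usage))  # 24
    print(should_preempt(teams[0], "experimentation", "training"))  # True
```

Keeping the borrow cap per team, rather than letting one team drain the entire idle pool, is one simple way to model fairness between borrowers. A real deployment defines these policies through HyperPod's own configuration, not through code like this.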
HyperPod task governance also provides Amazon EKS cluster observability, with real-time visibility into cluster capacity: compute availability and usage, team allocation and utilization, and task run and wait times. This information supports informed decision-making and proactive resource management.
The following sections explain the key concepts of HyperPod task governance and describe how to set it up and use it for your Amazon EKS clusters.