Troubleshoot
The following section lists troubleshooting solutions for HyperPod in Studio.
Tasks tab
If you get the message "Custom Resource Definition (CRD) is not configured on the cluster" while in the Tasks tab.
- Grant the EKSAdminViewPolicy and ClusterAccessRole policies to your domain execution role. For information on how to add tags to your execution role, see Tag IAM roles.
  To learn how to attach policies to an IAM user or group, see Adding and removing IAM identity permissions.
If the tasks grid for Slurm metrics doesn’t stop loading in the Tasks tab.
- Ensure that RunAs is enabled in your AWS Session Manager preferences and that the role you are using has the SSMSessionRunAs tag attached.
  - To enable RunAs, navigate to the Preferences tab in the Systems Manager console.
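The tag requirement above can also be checked from the command line. The following is a minimal sketch, not a definitive procedure: the role name MyStudioExecutionRole and the OS user ubuntu are placeholder assumptions, and it assumes that the SSMSessionRunAs tag value is the operating system user that Session Manager should run sessions as. Substitute the role you actually use and a user that exists on your cluster nodes.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own role and OS user.
IAM_ROLE_NAME="${IAM_ROLE_NAME:-MyStudioExecutionRole}"
OS_USER="${OS_USER:-ubuntu}"

# Attach the SSMSessionRunAs tag to the role.
tag_role_for_run_as() {
  aws iam tag-role \
    --role-name "$IAM_ROLE_NAME" \
    --tags "Key=SSMSessionRunAs,Value=$OS_USER"
}

# List the tags on the role so you can confirm SSMSessionRunAs is present.
show_role_tags() {
  aws iam list-role-tags --role-name "$IAM_ROLE_NAME"
}
```

After sourcing the script, run tag_role_for_run_as once, then show_role_tags to confirm the tag. Both calls need IAM permissions on the role, so an administrator may have to run them.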
For restricted task view in Studio for EKS clusters:
- If your execution role doesn’t have permissions to list namespaces for EKS clusters.
- If users are experiencing issues with access for EKS clusters.
  - Verify RBAC is enabled by running the following kubectl command.
      kubectl api-versions | grep rbac
    This should return rbac.authorization.k8s.io/v1.
  - Check if the ClusterRole and ClusterRoleBinding exist by running the following commands.
      kubectl get clusterrole pods-events-crd-cluster-role
      kubectl get clusterrolebinding pods-events-crd-cluster-role-binding
  - Verify user group membership. Ensure the user is correctly assigned to the pods-events-crd-cluster-level group in your identity provider or IAM.
- If a user can't see any resources.
  - Verify group membership and ensure the ClusterRoleBinding is correctly applied.
- If users can see resources in all namespaces.
  - If namespace restriction is required, consider using Role and RoleBinding instead of ClusterRole and ClusterRoleBinding.
- If configuration appears correct, but permissions aren't applied.
  - Check if there are any NetworkPolicies or PodSecurityPolicies interfering with access.
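The namespace restriction and the group checks above can be sketched with kubectl. This is an illustrative sketch under stated assumptions, not the configuration HyperPod itself creates: the namespace team-a, the Role and RoleBinding names, the verbs and resources, and the user test-user are all hypothetical. Only the group name pods-events-crd-cluster-level comes from this section.

```shell
#!/usr/bin/env bash
# Hypothetical names (assumptions): replace with values from your cluster.
NAMESPACE="${NAMESPACE:-team-a}"
ROLE_NAME="${ROLE_NAME:-pods-events-ns-role}"
BINDING_NAME="${BINDING_NAME:-pods-events-ns-role-binding}"
GROUP_NAME="${GROUP_NAME:-pods-events-crd-cluster-level}"

# Create a namespace-scoped Role and bind it to the group.
# The verbs and resources here are assumptions chosen for illustration.
create_namespaced_access() {
  kubectl create role "$ROLE_NAME" \
    --verb=get,list,watch \
    --resource=pods,events \
    --namespace "$NAMESPACE" &&
  kubectl create rolebinding "$BINDING_NAME" \
    --role="$ROLE_NAME" \
    --group="$GROUP_NAME" \
    --namespace "$NAMESPACE"
}

# Ask the API server what a member of the group may do, using impersonation.
# "test-user" is a placeholder; --as-group must be combined with --as.
check_group_access() {
  # Access inside the restricted namespace.
  kubectl auth can-i list pods --namespace "$NAMESPACE" \
    --as=test-user --as-group="$GROUP_NAME"
  # Access across all namespaces.
  kubectl auth can-i list pods --all-namespaces \
    --as=test-user --as-group="$GROUP_NAME"
}
```

After sourcing the script, check_group_access prints one yes or no answer per check. If the second check still prints yes after you create the namespace-scoped binding, a ClusterRoleBinding for the same group is probably still in place. Running the impersonation checks requires that your own identity is allowed to impersonate users and groups.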
Metrics tab
If no Amazon CloudWatch metrics are displayed in the Metrics tab.
- The Metrics section of HyperPod cluster details uses CloudWatch to fetch the data. To see the metrics in this section, you need to have enabled Cluster observability. Contact your administrator to configure metrics.