AWS Identity and Access Management for SageMaker HyperPod
AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use SageMaker HyperPod resources. IAM is an AWS service that you can use at no additional charge.
Important
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see Provide permissions for tagging SageMaker AI resources.
AWS managed policies for Amazon SageMaker AI that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.
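To illustrate the tagging requirement, the following Python sketch builds a custom policy document in which the statement that allows resource-creating actions also allows sagemaker:AddTags. The helper name with_tagging and the choice of sagemaker:CreateDomain and sagemaker:CreateUserProfile as example create actions are assumptions made for illustration; substitute the actions your policy actually grants.

```python
import json


def with_tagging(create_actions, resource="*"):
    """Return a policy document allowing the create actions plus sagemaker:AddTags.

    Hypothetical helper: it only assembles JSON and does not call AWS.
    """
    actions = sorted(set(create_actions) | {"sagemaker:AddTags"})
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource}
        ],
    }


# Example create actions chosen for illustration only.
policy = with_tagging(["sagemaker:CreateDomain", "sagemaker:CreateUserProfile"])
print(json.dumps(policy, indent=2))
```

In practice you would likely narrow the Resource element rather than leave it as "*".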
Let's assume that there are two main layers of SageMaker HyperPod users: cluster admin users and data scientist users.
- Cluster admin users – Responsible for creating and managing SageMaker HyperPod clusters. This includes configuring the HyperPod clusters and managing user access to them.
  - Create and configure SageMaker HyperPod clusters with Slurm or Amazon EKS.
  - Create and configure IAM roles for data scientist users and HyperPod cluster resources.
  - For SageMaker HyperPod orchestration with Amazon EKS, create and configure EKS access entries, role-based access control (RBAC), and Pod Identity to fulfill data science use cases.
- Data scientist users – Focus on ML model training. They use the open-source orchestrator or the SageMaker HyperPod CLI to submit and manage training jobs.
  - Assume and use the IAM role provided by cluster admin users.
  - Interact with the open-source orchestrator CLIs supported by SageMaker HyperPod (Slurm or Kubernetes) or the SageMaker HyperPod CLI to check cluster capacity, connect to clusters, and submit workloads.
Set up IAM roles for cluster admins by attaching the permissions or policies needed to operate SageMaker HyperPod clusters. Cluster admins should also create IAM roles for SageMaker HyperPod resources to assume, so that those resources can run and communicate with necessary AWS resources such as Amazon S3, Amazon CloudWatch, and AWS Systems Manager (SSM). Finally, the AWS account admin or the cluster admins should grant scientists permissions to access the SageMaker HyperPod clusters and run ML workloads.
Depending on which orchestrator you choose, the permissions needed for cluster admins and scientists may vary. You can also control the scope of permissions for various actions in the roles by using each service's condition keys. Use the following Service Authorization References to add detailed scope for the services related to SageMaker HyperPod.
- Amazon Elastic Container Registry (for SageMaker HyperPod cluster orchestration with Amazon EKS)
- Amazon Elastic Kubernetes Service (for SageMaker HyperPod cluster orchestration with Amazon EKS)
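As a sketch of how a condition key narrows a statement, the following Python snippet builds an Allow statement that applies only to resources carrying a given tag. The helper name scoped_statement, the tag key team, and the tag value ml-research are hypothetical. It assumes the listed actions support the aws:ResourceTag condition key, which you should confirm in the relevant Service Authorization Reference.

```python
import json


def scoped_statement(actions, resource, tag_key, tag_value):
    """Build an Allow statement limited to resources tagged tag_key=tag_value.

    Hypothetical helper; assumes the actions support aws:ResourceTag/<key>.
    """
    return {
        "Effect": "Allow",
        "Action": list(actions),
        "Resource": resource,
        "Condition": {
            "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
        },
    }


statement = scoped_statement(
    ["sagemaker:DescribeCluster", "sagemaker:ListClusterNodes"],
    "arn:aws:sagemaker:us-west-2:111122223333:cluster/*",  # placeholder ARN
    "team",          # hypothetical tag key
    "ml-research",   # hypothetical tag value
)
print(json.dumps(statement, indent=2))
```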
IAM users for cluster admin
Cluster administrators (admins) operate and configure SageMaker HyperPod clusters, performing the tasks in SageMaker HyperPod operation. The following policy example includes the minimum set of permissions for cluster administrators to run the SageMaker HyperPod core APIs and manage SageMaker HyperPod clusters within your AWS account.
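A minimal sketch of such a policy follows, built in Python so that the account-specific values are explicit. The function name cluster_admin_policy, the role name, the Region, and the account ID are placeholders chosen for illustration. The grouping of actions is an assumption about what a cluster admin needs: permission to pass the cluster execution role, to create and list clusters, and to manage existing clusters. Verify the action list against the SageMaker Service Authorization Reference before using it.

```python
import json


def cluster_admin_policy(execution_role_arn, region, account_id):
    """Assemble an illustrative cluster admin policy document (no AWS calls)."""
    cluster_arn = f"arn:aws:sagemaker:{region}:{account_id}:cluster/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the admin hand the execution role to HyperPod.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": execution_role_arn,
            },
            {
                # Create and list calls are not tied to an existing cluster ARN.
                "Effect": "Allow",
                "Action": ["sagemaker:CreateCluster", "sagemaker:ListClusters"],
                "Resource": "*",
            },
            {
                # Management calls, scoped to clusters in this account and Region.
                "Effect": "Allow",
                "Action": [
                    "sagemaker:DeleteCluster",
                    "sagemaker:DescribeCluster",
                    "sagemaker:DescribeClusterNode",
                    "sagemaker:ListClusterNodes",
                    "sagemaker:UpdateCluster",
                    "sagemaker:UpdateClusterSoftware",
                    "sagemaker:BatchDeleteClusterNodes",
                ],
                "Resource": cluster_arn,
            },
        ],
    }


admin_policy = cluster_admin_policy(
    "arn:aws:iam::111122223333:role/ExampleHyperPodExecutionRole",  # placeholder
    "us-west-2",
    "111122223333",
)
print(json.dumps(admin_policy, indent=2))
```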
To grant permissions to access the SageMaker AI console, use the sample policy provided at Permissions required to use the Amazon SageMaker AI console.
To grant permissions to access the AWS Systems Manager console, use the sample policy provided at Using the AWS Systems Manager console in the AWS Systems Manager User Guide.
You might also consider attaching the AmazonSageMakerFullAccess policy to the role; however, note that the AmazonSageMakerFullAccess policy grants permissions to all SageMaker API calls, features, and resources.
For guidance on IAM users in general, see IAM users in the AWS Identity and Access Management User Guide.
IAM users for scientists
Scientists log in to and run ML workloads on SageMaker HyperPod cluster nodes provisioned by cluster admins. For scientists in your AWS account, you should grant the permission "ssm:StartSession" to run the SSM start-session command.
The following is a policy example for IAM users.
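A minimal sketch of such a policy, expressed in Python, is shown here. It grants ssm:StartSession as described above. Including ssm:TerminateSession so that users can end their own sessions, and using "*" as the resource, are assumptions made for illustration. The function name scientist_policy is hypothetical.

```python
import json


def scientist_policy(resource="*"):
    """Assemble an illustrative policy letting scientists open SSM sessions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ssm:StartSession",
                    # Assumption: also allow ending a session.
                    "ssm:TerminateSession",
                ],
                "Resource": resource,
            }
        ],
    }


user_policy = scientist_policy()
print(json.dumps(user_policy, indent=2))
```

Passing a narrower resource value to scientist_policy is one way to limit which targets a scientist can connect to.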
IAM role for SageMaker HyperPod
For SageMaker HyperPod clusters to run and communicate with necessary AWS resources, you need to create an IAM role for the HyperPod cluster to assume.
Start by attaching the AWS managed policy AmazonSageMakerHyperPodServiceRolePolicy to the role. With this AWS managed policy attached, SageMaker HyperPod cluster instance groups assume the role to communicate with Amazon CloudWatch, Amazon S3, and AWS Systems Manager Agent (SSM Agent). This managed policy is the minimum requirement for SageMaker HyperPod resources to run properly, so you must provide an IAM role with this policy to all instance groups.
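For the cluster to assume the role, the role also needs a trust policy. The following Python sketch builds one under the assumption that the role is assumed by the SageMaker service principal, sagemaker.amazonaws.com. The function name hyperpod_trust_policy is hypothetical, and the snippet only assembles JSON without creating any IAM resources.

```python
import json


def hyperpod_trust_policy():
    """Assemble an illustrative trust policy for a HyperPod execution role.

    Assumption: the SageMaker service principal assumes the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = hyperpod_trust_policy()
print(json.dumps(trust_policy, indent=2))
```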
Tip
Depending on how you want to design the level of permissions for multiple instance groups, you can also set up multiple IAM roles and attach them to different instance groups. When you set up access for scientists to specific SageMaker HyperPod cluster nodes through AWS Systems Manager, the nodes assume the role with the selective permissions you manually attached.
After you finish creating the IAM roles, note their names and ARNs. You use the roles when creating a SageMaker HyperPod cluster, granting the correct permissions required for each instance group to communicate with necessary AWS resources.
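To show where the role ARNs end up, the following Python sketch assembles a cluster creation request in which each instance group carries its own execution role. The field names follow the SageMaker CreateCluster request shape as understood here and should be checked against the API reference. The cluster name, instance types, S3 URI, script name, and role ARNs are placeholders, and the helper instance_group is hypothetical. The snippet builds the request only and does not call AWS.

```python
import json


def instance_group(name, instance_type, count, execution_role_arn,
                   lifecycle_s3_uri, on_create="on_create.sh"):
    """Build one instance group entry with its own execution role."""
    return {
        "InstanceGroupName": name,
        "InstanceType": instance_type,
        "InstanceCount": count,
        "LifeCycleConfig": {
            "SourceS3Uri": lifecycle_s3_uri,
            "OnCreate": on_create,
        },
        "ExecutionRole": execution_role_arn,
    }


# All values below are placeholders for illustration.
request = {
    "ClusterName": "example-cluster",
    "InstanceGroups": [
        instance_group(
            "controller", "ml.c5.xlarge", 1,
            "arn:aws:iam::111122223333:role/ExampleControllerRole",
            "s3://amzn-s3-demo-bucket/lifecycle/",
        ),
        instance_group(
            "workers", "ml.g5.8xlarge", 2,
            "arn:aws:iam::111122223333:role/ExampleWorkerRole",
            "s3://amzn-s3-demo-bucket/lifecycle/",
        ),
    ],
}
print(json.dumps(request, indent=2))
```

Using a separate role per instance group, as in this sketch, matches the multiple-role setup described in the tip above. A single shared role also works if all groups need the same permissions.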