Complete prerequisites for SageMaker HyperPod cluster observability - Amazon SageMaker AI

Complete prerequisites for SageMaker HyperPod cluster observability

Before you proceed with the steps to install metrics exporter packages on your HyperPod cluster, make sure that the following prerequisites are met.

Enable IAM Identity Center

To enable observability for your SageMaker HyperPod cluster, you must first enable IAM Identity Center. This is a prerequisite for deploying the AWS CloudFormation stack that sets up the Amazon Managed Grafana workspace and Amazon Managed Service for Prometheus. Both services also require IAM Identity Center for authentication and authorization, which secures user access to and management of the monitoring infrastructure.

For detailed guidance on enabling IAM Identity Center, see the Enabling IAM Identity Center section in the AWS IAM Identity Center User Guide.

After you enable IAM Identity Center, set up a user account that will serve as the administrative user throughout the following configuration procedures.
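
Before you move on, it can help to confirm that IAM Identity Center is enabled in the Region you plan to use. The following is a minimal sketch, not an official procedure. It assumes the AWS SDK for Python (Boto3) and its `sso-admin` client, whose `list_instances` call returns the Identity Center instances visible to the caller. The helper names `identity_center_instances` and `identity_center_enabled` are hypothetical and introduced only for illustration.

```python
def identity_center_instances(sso_admin_client):
    """Return the IAM Identity Center instances visible to the caller.

    `sso_admin_client` is expected to behave like boto3.client("sso-admin").
    Only the first page of results is read, which is enough to tell whether
    at least one instance exists.
    """
    response = sso_admin_client.list_instances()
    return response.get("Instances", [])


def identity_center_enabled(sso_admin_client):
    """Return True if at least one IAM Identity Center instance is visible."""
    return len(identity_center_instances(sso_admin_client)) > 0


# Example usage (requires boto3 and configured AWS credentials):
#
#   import boto3
#   client = boto3.client("sso-admin")
#   if not identity_center_enabled(client):
#       print("Enable IAM Identity Center before deploying the stack.")
```

The client is passed in as an argument rather than created inside the helper, so the check can be pointed at a specific Region or profile and exercised without AWS access.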

Create and deploy an AWS CloudFormation stack for SageMaker HyperPod observability

Create and deploy a CloudFormation stack for SageMaker HyperPod observability to monitor HyperPod cluster metrics in real time using Amazon Managed Service for Prometheus and Amazon Managed Grafana. IAM Identity Center must already be enabled before you deploy the stack.

Use the sample CloudFormation template cluster-observability.yaml, which helps you set up the Amazon VPC subnets, Amazon FSx for Lustre file systems, Amazon S3 buckets, and IAM roles required to create a HyperPod cluster observability stack.
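
As one way to deploy the template programmatically, the sketch below uses the Boto3 CloudFormation client. It makes several assumptions: cluster-observability.yaml has been downloaded locally, the stack name `hyperpod-observability` is a placeholder, and the template needs the `CAPABILITY_IAM` and `CAPABILITY_NAMED_IAM` acknowledgements because it creates IAM roles. Any template parameters are omitted; check the template itself for the ones it expects. The function names are hypothetical.

```python
def build_create_stack_args(stack_name, template_body):
    """Build the keyword arguments for CloudFormation create_stack.

    The Capabilities values are an assumption: templates that create IAM
    roles must be acknowledged with CAPABILITY_IAM or CAPABILITY_NAMED_IAM.
    """
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def deploy_observability_stack(cfn_client, stack_name, template_body):
    """Create the stack, wait until creation completes, and return its ID.

    `cfn_client` is expected to behave like boto3.client("cloudformation").
    """
    response = cfn_client.create_stack(
        **build_create_stack_args(stack_name, template_body)
    )
    # Block until CloudFormation reports CREATE_COMPLETE (raises on failure).
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]


# Example usage (requires boto3 and configured AWS credentials):
#
#   import boto3
#   with open("cluster-observability.yaml") as f:
#       body = f.read()
#   cfn = boto3.client("cloudformation")
#   print(deploy_observability_stack(cfn, "hyperpod-observability", body))
```

The same deployment can be done from the CloudFormation console or the AWS CLI; the sketch only shows the shape of the request.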