SageMaker HyperPod cluster management
The following topics discuss logging and managing SageMaker HyperPod clusters.
Logging SageMaker HyperPod events
All events and logs from SageMaker HyperPod are saved to Amazon CloudWatch under the log group name /aws/sagemaker/Clusters/[ClusterName]/[ClusterID]. Every call to the CreateCluster API creates a new log group. The following table lists all of the available log streams collected in each log group.
Log Group Name | Log Stream Name |
/aws/sagemaker/Clusters/[ClusterName]/[ClusterID] | LifecycleConfig/[instance-group-name]/[instance-id] |
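As a minimal sketch of how the naming scheme above might be used, the following shell functions build the log group name from a cluster name and cluster ID and list the lifecycle log streams in it. The function names are hypothetical helpers introduced for illustration. The sketch assumes that the AWS CLI is installed and that your credentials allow reading CloudWatch Logs.

```shell
#!/usr/bin/env bash
# Build the CloudWatch log group name for a HyperPod cluster.
# Arguments: cluster name, cluster ID (both supplied by the caller).
hyperpod_log_group() {
  printf '/aws/sagemaker/Clusters/%s/%s\n' "$1" "$2"
}

# List the names of the lifecycle log streams in that log group.
# Arguments: cluster name, cluster ID.
# Assumes the AWS CLI is installed and configured (not called at load time).
list_lifecycle_streams() {
  aws logs describe-log-streams \
    --log-group-name "$(hyperpod_log_group "$1" "$2")" \
    --log-stream-name-prefix "LifecycleConfig/" \
    --query 'logStreams[].logStreamName' \
    --output text
}
```

For example, list_lifecycle_streams my-cluster abc123example would query the log group /aws/sagemaker/Clusters/my-cluster/abc123example, where both values are placeholders for your own cluster name and ID.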
Logging SageMaker HyperPod at instance level
You can access the LifecycleScript logs published to CloudWatch during cluster instance configuration. Every instance within the created cluster generates a separate log stream, distinguishable by the LifecycleConfig/[instance-group-name]/[instance-id] format.
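To illustrate the per-instance stream format, the sketch below builds a stream name and reads that instance's lifecycle log with aws logs get-log-events. The function names are hypothetical, and the instance group name and instance ID in the usage note are placeholders. Note that get-log-events returns a single page of events, so a long log may need repeated calls.

```shell
#!/usr/bin/env bash
# Build the log stream name for one instance, following the
# LifecycleConfig/[instance-group-name]/[instance-id] format.
# Arguments: instance group name, instance ID.
lifecycle_stream_name() {
  printf 'LifecycleConfig/%s/%s\n' "$1" "$2"
}

# Print the lifecycle log messages for one instance, oldest first.
# Arguments: log group name, instance group name, instance ID.
# Assumes the AWS CLI is installed and configured (not called at load time).
show_instance_lifecycle_log() {
  aws logs get-log-events \
    --log-group-name "$1" \
    --log-stream-name "$(lifecycle_stream_name "$2" "$3")" \
    --start-from-head \
    --query 'events[].message' \
    --output text
}
```

For example, show_instance_lifecycle_log "/aws/sagemaker/Clusters/my-cluster/abc123example" worker-group i-0123456789abcdef0 would read the stream LifecycleConfig/worker-group/i-0123456789abcdef0.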
All logs that are written to /var/log/provision/provisioning.log are uploaded to the preceding CloudWatch stream. Sample LifecycleScripts at 1.architectures/5.sagemaker_hyperpods/LifecycleScripts/base-config redirect stdout and stderr to this location. If you use custom scripts, write their logs to /var/log/provision/provisioning.log so that they are available in CloudWatch.
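One way a custom lifecycle script could send its output to the provisioning log is sketched below. This is not the approach used by the sample scripts, only an illustration. The run_logged helper and the LOG_FILE override (useful for trying the script outside a cluster) are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Log path that is uploaded to CloudWatch, as described above.
# LOG_FILE may be overridden for local testing (an assumption of this sketch).
LOG_FILE="${LOG_FILE:-/var/log/provision/provisioning.log}"

# Run a command and append both its stdout and stderr to the log file.
# Returns the exit status of the command that was run.
run_logged() {
  mkdir -p "$(dirname "$LOG_FILE")"
  "$@" >>"$LOG_FILE" 2>&1
}
```

A custom script could then wrap each setup step, for example run_logged echo "starting custom setup", so that the step's output is appended to the file that is uploaded to CloudWatch.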
Tagging resources
The AWS tagging system helps you manage, identify, organize, search for, and filter resources. SageMaker HyperPod supports tagging, so you can manage clusters as AWS resources. When you create a cluster or edit an existing one, you can add or edit tags for the cluster. To learn more about tagging in general, see Tagging your AWS resources.
Using the SageMaker HyperPod console UI
When you create a new cluster or edit an existing cluster, you can add, remove, or edit tags.
Using the SageMaker HyperPod APIs
When you write a CreateCluster or UpdateCluster API request file in JSON format, edit the Tags section.
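As a rough illustration of what the Tags section can look like, the sketch below writes a partial request file in which Tags is a list of Key and Value pairs. The write_tagged_request helper, the cluster name, and the tag keys and values are all hypothetical. The file is deliberately incomplete: a real CreateCluster request also needs its other required fields, such as the instance group definitions.

```shell
#!/usr/bin/env bash
# Write a partial request file that shows only the Tags section.
# Argument: path of the file to write.
# The cluster name and tags below are placeholders, and the other
# required request fields are intentionally left out of this sketch.
write_tagged_request() {
  cat >"$1" <<'EOF'
{
  "ClusterName": "my-hyperpod-cluster",
  "Tags": [
    { "Key": "project", "Value": "llm-training" },
    { "Key": "owner", "Value": "ml-platform" }
  ]
}
EOF
}
```

After the remaining required fields are filled in, a file like this could be passed to the CLI with the general --cli-input-json option, for example aws sagemaker create-cluster --cli-input-json file://create_cluster.json.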
Using the AWS CLI tagging commands for SageMaker
To tag a cluster
Use aws sagemaker add-tags as follows.
aws sagemaker add-tags --resource-arn cluster_ARN --tags Key=string,Value=string
To untag a cluster
Use aws sagemaker delete-tags as follows.
aws sagemaker delete-tags --resource-arn cluster_ARN --tag-keys "tag_key"
To list tags for a resource
Use aws sagemaker list-tags as follows.
aws sagemaker list-tags --resource-arn cluster_ARN