Automatic node recovery


During cluster creation or update, cluster admin users can set the node (instance) recovery option at the cluster level to either Automatic (Recommended) or None. If the option is set to Automatic, SageMaker HyperPod reboots or replaces faulty nodes automatically.

Important

We recommend setting the Automatic option.

Automatic node recovery runs when issues are detected by the health-monitoring agent, basic health checks, or deep health checks. If the option is set to None, the health-monitoring agent still labels instances when a fault is detected, but it does not initiate any repair or recovery actions on the affected nodes. This option is not recommended.
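
As an illustration of the cluster-level setting described above, here is a minimal Python sketch that prepares request parameters with the recovery option filled in. It assumes the option is passed as a `NodeRecovery` field whose values are `"Automatic"` or `"None"`, as in the SageMaker `CreateCluster` and `UpdateCluster` API operations. The helper name `with_node_recovery` and the cluster name are hypothetical. The sketch only builds and validates the parameters; it does not call AWS.

```python
from typing import Any, Dict

# The two recovery options described on this page.
VALID_NODE_RECOVERY = ("Automatic", "None")


def with_node_recovery(request: Dict[str, Any], mode: str = "Automatic") -> Dict[str, Any]:
    """Return a copy of `request` with the cluster-level recovery option set.

    `with_node_recovery` is a hypothetical helper for illustration only.
    The "NodeRecovery" key is assumed to match the field name used by the
    CreateCluster / UpdateCluster API operations.
    """
    if mode not in VALID_NODE_RECOVERY:
        raise ValueError(
            f"NodeRecovery must be one of {VALID_NODE_RECOVERY}, got {mode!r}"
        )
    updated = dict(request)  # shallow copy, so the caller's dict is not mutated
    updated["NodeRecovery"] = mode
    return updated


if __name__ == "__main__":
    # "my-hyperpod-cluster" is a placeholder name, not a real cluster.
    params = with_node_recovery({"ClusterName": "my-hyperpod-cluster"})
    print(params)
    # With credentials configured, the parameters could then be passed to
    # boto3, for example (other fields the API requires are omitted here):
    #   import boto3
    #   boto3.client("sagemaker").update_cluster(**params)
```

The helper defaults to Automatic because that is the recommended option. Note that None is passed as the string `"None"`, not Python's `None`.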
