Slurm versions in AWS PCS

SchedMD continually enhances Slurm with new capabilities, optimizations, and security patches. SchedMD releases a new major version at regular intervals and plans to support up to 3 versions at any given time. AWS PCS initially supports Slurm 23.11. AWS PCS is designed to automatically update the Slurm controller with patch versions.

When SchedMD ends support for a particular major version, AWS PCS also ends support for that major version. AWS PCS sends advance notice if a Slurm major version is close to its end of life, to help customers know when to upgrade their clusters to a newer supported version.

We recommend you use the latest supported Slurm version to deploy your cluster, to access the most recent advancements and improvements.

Frequently asked questions about Slurm versions

How long does AWS PCS support a Slurm version?

AWS PCS follows the SchedMD support cycles for major versions. AWS PCS supports up to 3 major versions at any given time. After SchedMD releases a new major version, AWS PCS retires the oldest supported version. AWS PCS makes each new major version of Slurm available as soon as possible, but there might be a delay between the SchedMD release and its availability in AWS PCS.

When does AWS PCS notify me about the End of Support Life (EOSL) for Slurm versions?

AWS PCS notifies you multiple times, on a predetermined cadence, before the EOSL date.

What do I have to do when a Slurm version approaches EOSL?

You must update your Slurm versions before EOSL to help maintain a secure and supported environment.

How can I update my clusters to use a new major version of Slurm?

To update the Slurm version, you must create a new cluster. You must also upgrade to the equivalent AWS PCS software in your Amazon Machine Image (AMI) and use it to create the compute node groups for your new cluster.

How will my clusters get new Slurm patch version releases?

AWS PCS is designed to automatically apply patches to address Slurm Common Vulnerabilities and Exposures (CVEs). AWS PCS applies the patches to cluster controllers that run in internal service-owned accounts. To install patches on EC2 instances in your AWS account, update the AMI for your compute node groups and update the compute node groups to use the updated AMI. For more information, see Custom Amazon Machine Images (AMIs) for AWS PCS.

Note

Slurm controllers are unavailable while we update them. Running jobs aren't affected. Jobs submitted when the cluster's controller is unavailable are held until the controller is available.

What if I don’t update Slurm by the EOSL date?

AWS PCS is designed to stop clusters that run an unsupported Slurm version. To avoid this, before the EOSL date you must move the cluster controller to a supported Slurm major version and update the AWS PCS software installed on the compute node groups.

How many Slurm versions does AWS PCS support?

AWS PCS supports up to 3 major Slurm versions at any given time, including the current and 2 previous major versions.
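The window rule above can be sketched in a few lines of Python. This is only an illustration of "the current and 2 previous major versions"; the function names and the release list are assumptions for the example, and the authoritative list of supported versions is the one published by AWS PCS.

```python
def parse_major(version):
    """Turn a Slurm version string such as '24.05' or '24.05.3' into (24, 5)."""
    year, month = version.split(".")[:2]
    return int(year), int(month)


def supported_window(released, window=3):
    """Return the newest `window` major versions, ordered oldest to newest.

    `released` is any iterable of major version strings. The window size of 3
    mirrors the rule described above; it is not read from any AWS API.
    """
    ordered = sorted(set(released), key=parse_major)
    return ordered[-window:]


# Hypothetical input: a list of major versions that have been released.
released = ["24.05", "23.02", "24.11", "23.11"]
print(supported_window(released))  # ['23.11', '24.05', '24.11']
```

When a newer major version is added to the input list, the oldest entry drops out of the result, which matches the retirement behavior described earlier.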

What Slurm version updates should I apply?

We strongly recommend you use the same major version across all components in your cluster and install the latest patches as soon as they are released. The AMIs for your compute node groups must use a version of Slurm software compatible with the Slurm version of the cluster controller. The Slurm major version in your AMIs must be within 2 versions of the Slurm major version on the cluster controller. The Slurm version installed in the AMI and on the running EC2 instances in the cluster can’t be newer than the Slurm version on the cluster controller. To maintain support for your cluster, your AMIs must use a supported AWS PCS software version.
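As a minimal sketch of the two compatibility rules above (the AMI can't be newer than the controller, and it must be within 2 major versions of it), the following Python check compares major versions only. The function names and the ordered release list are assumptions made for the example; AWS PCS doesn't provide this function, and patch-level comparison is deliberately left out.

```python
def parse_major(version):
    """Turn '24.05' or '24.05.3' into the tuple (24, 5), ignoring the patch level."""
    year, month = version.split(".")[:2]
    return int(year), int(month)


def ami_is_compatible(controller, ami, releases, max_gap=2):
    """Check an AMI's Slurm major version against the controller's.

    `releases` is the list of major versions to count across (an assumption
    supplied by the caller). Raises ValueError if either version is not in it.
    """
    ordered = sorted({parse_major(v) for v in releases})
    controller_pos = ordered.index(parse_major(controller))
    ami_pos = ordered.index(parse_major(ami))
    gap = controller_pos - ami_pos  # negative means the AMI is newer
    return 0 <= gap <= max_gap


# Hypothetical release list used only for this illustration.
releases = ["23.02", "23.11", "24.05", "24.11"]
print(ami_is_compatible("24.11", "24.05", releases))  # True: one version behind
print(ami_is_compatible("24.11", "23.02", releases))  # False: three versions behind
print(ami_is_compatible("24.05", "24.11", releases))  # False: AMI newer than controller
```

A check like this only confirms that a combination is allowed. The recommendation above still stands: use the same major version across all components where possible.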

What if I update the Slurm major version but use older Slurm software in my AMI for compute node groups?

To use new Slurm functionality, you must update the AWS PCS software in your AMI to the same Slurm version as the cluster controller. For full AWS PCS support, all Slurm components must use supported versions. In summary:

  • We can provide full support when the cluster controller and all components (AWS PCS packages) in your AWS account use supported versions.

  • AWS PCS is designed to stop a cluster if the Slurm version of its controller reaches EOSL.

  • If the Slurm version of the components in your AWS account reaches EOSL, your cluster won’t be supported.

In what order should I update components in my cluster?

You must update the Slurm version of your cluster controller before you use an AMI with a newer Slurm version. You update a compute node group to use the AMI. AWS PCS uses the AMI to launch new EC2 instances in the compute node group. AWS PCS doesn’t update existing EC2 instances that have running jobs; AWS PCS is designed to terminate those instances after their jobs complete.

Does AWS PCS offer extended support for Slurm versions?

No, AWS PCS doesn't currently offer extended support. If extended support options become available, we will communicate detailed information about them, including any additional costs and the specific support coverage provided.