Multiple queue mode - AWS ParallelCluster

AWS ParallelCluster version 2.9.0 introduced multiple queue mode. Multiple queue mode is supported when scheduler is set to slurm and the queue_settings setting is defined. This mode allows different instance types to coexist in the compute nodes. The compute resources that contain the different instance types can scale up or down as needed. In multiple queue mode, up to five (5) queues are supported, and each [queue] section can refer to up to three (3) [compute_resource] sections. Each [queue] section corresponds to a partition in Slurm Workload Manager. For more information, see the Slurm guide for multiple queue mode and the Multiple queue mode tutorial.
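
As a rough illustration of these requirements, the following Python sketch parses a small example configuration and checks it against the limits described above. The section labels (q1, q2, q1-small, and so on) and the values are hypothetical. The compute_resource_settings setting, which this sketch assumes is how a [queue] section lists its [compute_resource] sections, is not described on this page. This is not how AWS ParallelCluster itself validates a configuration.

```python
import configparser

# Hypothetical example configuration; labels and values are illustrative only.
EXAMPLE_CONFIG = """
[cluster default]
scheduler = slurm
queue_settings = q1, q2

[queue q1]
compute_resource_settings = q1-small, q1-large

[queue q2]
compute_resource_settings = q2-gpu

[compute_resource q1-small]
instance_type = c5.xlarge
min_count = 1
max_count = 10

[compute_resource q1-large]
instance_type = c5.4xlarge
min_count = 0
max_count = 4

[compute_resource q2-gpu]
instance_type = g4dn.xlarge
min_count = 0
max_count = 2
"""

MAX_QUEUES = 5                       # limit stated in the text
MAX_COMPUTE_RESOURCES_PER_QUEUE = 3  # limit stated in the text


def split_list(value):
    """Split a comma-separated setting into a list of trimmed labels."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_queue_limits(config_text, cluster_label="default"):
    """Return a list of problems found; an empty list means none were found."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = parser[f"cluster {cluster_label}"]
    problems = []

    if cluster.get("scheduler") != "slurm":
        problems.append("scheduler must be set to slurm")

    queues = split_list(cluster.get("queue_settings", ""))
    if not queues:
        problems.append("queue_settings is not defined")
    if len(queues) > MAX_QUEUES:
        problems.append(f"more than {MAX_QUEUES} queues")

    for queue in queues:
        # Assumption: each [queue] section lists its compute resources here.
        resources = split_list(
            parser[f"queue {queue}"].get("compute_resource_settings", "")
        )
        if len(resources) > MAX_COMPUTE_RESOURCES_PER_QUEUE:
            problems.append(f"queue {queue} has more than "
                            f"{MAX_COMPUTE_RESOURCES_PER_QUEUE} compute resources")
    return problems


print(check_queue_limits(EXAMPLE_CONFIG))  # []
```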

Each [compute_resource] section in a queue must have a different instance type, and each [compute_resource] is further divided into static and dynamic nodes. Static nodes for each [compute_resource] are numbered from one (1) to the value of min_count. Dynamic nodes for each [compute_resource] are numbered from one (1) to (max_count - min_count). For example, if min_count is 2 and max_count is 10, the static nodes for that [compute_resource] are numbered one (1) and two (2), and the dynamic nodes are numbered from one (1) to eight (8). At any time, there can be between zero (0) and (max_count - min_count) dynamic nodes in a [compute_resource].
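
The numbering rule can be sketched in a few lines of Python. The function name node_numbers is hypothetical and used only for illustration. It returns the node numbers implied by min_count and max_count, and does not reflect how many dynamic nodes are actually running at a given moment.

```python
def node_numbers(min_count, max_count):
    """Return (static_numbers, dynamic_numbers) for one [compute_resource]."""
    if min_count < 0 or max_count < min_count:
        raise ValueError("expected 0 <= min_count <= max_count")
    # Static nodes are numbered 1..min_count.
    static = list(range(1, min_count + 1))
    # Dynamic nodes are numbered 1..(max_count - min_count).
    dynamic = list(range(1, max_count - min_count + 1))
    return static, dynamic


static, dynamic = node_numbers(min_count=2, max_count=10)
print(static)   # [1, 2]
print(dynamic)  # [1, 2, 3, 4, 5, 6, 7, 8]
```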

The instances that are launched into the compute fleet are dynamically assigned. To help manage this, hostnames are generated for each node. The format of the hostname is as follows:

$HOSTNAME=$QUEUE-$STATDYN-$INSTANCE_TYPE-$NODENUM

  • $QUEUE is the name of the queue. For example, if the section starts with [queue queue-name], then $QUEUE is queue-name.

  • $STATDYN is st for static nodes or dy for dynamic nodes.

  • $INSTANCE_TYPE is the instance type for the [compute_resource], from the instance_type setting.

  • $NODENUM is the number of the node. $NODENUM is between one (1) and the value of min_count for static nodes and between one (1) and (max_count - min_count) for dynamic nodes.
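
The hostname format can be illustrated with a short Python sketch. The function names are hypothetical. The sketch also assumes that the period in the instance type is dropped when it is placed in the hostname (for example, c5.xlarge becomes c5xlarge). That detail is not stated on this page, so verify it against the node names that your cluster reports.

```python
def node_hostname(queue, instance_type, node_num, static):
    """Build $QUEUE-$STATDYN-$INSTANCE_TYPE-$NODENUM for one node."""
    statdyn = "st" if static else "dy"
    # Assumption: the period in the instance type is removed in hostnames.
    type_label = instance_type.replace(".", "")
    return f"{queue}-{statdyn}-{type_label}-{node_num}"


def compute_resource_hostnames(queue, instance_type, min_count, max_count):
    """List every possible hostname for one [compute_resource]."""
    static = [node_hostname(queue, instance_type, n, True)
              for n in range(1, min_count + 1)]
    dynamic = [node_hostname(queue, instance_type, n, False)
               for n in range(1, max_count - min_count + 1)]
    return static + dynamic


for name in compute_resource_hostnames("queue-name", "c5.xlarge", 1, 3):
    print(name)
# queue-name-st-c5xlarge-1
# queue-name-dy-c5xlarge-1
# queue-name-dy-c5xlarge-2
```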

Both hostnames and fully-qualified domain names (FQDN) are created using Amazon Route 53 hosted zones. The FQDN is $HOSTNAME.$CLUSTERNAME.pcluster, where $CLUSTERNAME is the name of the [cluster] section used for the cluster.
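
As a minimal sketch of the FQDN pattern, the following Python helpers build an FQDN from a hostname and recover the hostname from it. The function names and the example hostname are hypothetical, and the cluster name default assumes a configuration whose section is [cluster default].

```python
def node_fqdn(hostname, cluster_name):
    """Build $HOSTNAME.$CLUSTERNAME.pcluster."""
    return f"{hostname}.{cluster_name}.pcluster"


def hostname_from_fqdn(fqdn, cluster_name):
    """Strip the .$CLUSTERNAME.pcluster suffix to recover the hostname."""
    suffix = f".{cluster_name}.pcluster"
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn!r} does not end with {suffix!r}")
    return fqdn[: -len(suffix)]


fqdn = node_fqdn("queue-name-dy-c5xlarge-1", "default")
print(fqdn)                                 # queue-name-dy-c5xlarge-1.default.pcluster
print(hostname_from_fqdn(fqdn, "default"))  # queue-name-dy-c5xlarge-1
```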

To convert your configuration to queue mode, use the pcluster-config convert command. It writes an updated configuration with a single [queue] section named [queue compute]. That queue contains a single [compute_resource] section named [compute_resource default]. The [queue compute] and [compute_resource default] sections have settings migrated from the specified [cluster] section.