Multiple queue mode
AWS ParallelCluster version 2.9.0 introduced multiple queue mode. Multiple queue mode is supported when scheduler is set to slurm and the queue_settings setting is defined. This mode lets different instance types coexist in the compute fleet, and the compute resources that contain those instance types can scale up or down as needed. In multiple queue mode, up to five (5) queues are supported, and each [queue] section can refer to up to three (3) [compute_resource] sections. Each [queue] section is a partition in Slurm Workload Manager. For more information, see Slurm guide for multiple queue mode and Multiple queue mode tutorial.
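The limits above can be checked before a configuration is written. Combined with the rule, stated for [compute_resource] sections, that each one in a queue uses a different instance type, they give the checks in the following Python sketch. validate_queues is a hypothetical helper, not part of AWS ParallelCluster. It takes a plain dictionary (queue name mapped to compute resource names and their instance types) instead of parsing a real configuration file.

```python
from collections import Counter
from typing import Dict, List

# Limits stated in this section for multiple queue mode.
MAX_QUEUES = 5
MAX_COMPUTE_RESOURCES_PER_QUEUE = 3


def validate_queues(queues: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in a queue layout.

    Hypothetical input shape (not the ParallelCluster file format):
    queue name -> {compute resource name -> instance type}.
    An empty list means no problems were found.
    """
    problems: List[str] = []
    if len(queues) > MAX_QUEUES:
        problems.append(
            f"{len(queues)} queues defined; at most {MAX_QUEUES} are supported"
        )
    for queue_name, resources in queues.items():
        if len(resources) > MAX_COMPUTE_RESOURCES_PER_QUEUE:
            problems.append(
                f"queue {queue_name!r} refers to {len(resources)} compute resources; "
                f"at most {MAX_COMPUTE_RESOURCES_PER_QUEUE} are supported"
            )
        # Every compute resource within one queue needs its own instance type.
        for instance_type, count in sorted(Counter(resources.values()).items()):
            if count > 1:
                problems.append(
                    f"queue {queue_name!r} uses instance type {instance_type!r} "
                    f"in {count} compute resources"
                )
    return problems


layout = {
    "ondemand": {"small": "c5.xlarge", "large": "c5.4xlarge"},
    "spot": {"a": "t3.micro", "b": "t3.micro"},
}
print(validate_queues(layout))
# ["queue 'spot' uses instance type 't3.micro' in 2 compute resources"]
```

Returning a list of messages, instead of raising on the first problem, reports every violation in one pass.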
Each [compute_resource] section in a queue must have a different instance type, and each [compute_resource] is further divided into static and dynamic nodes. Static nodes for each [compute_resource] are numbered from one (1) to the value of min_count. Dynamic nodes for each [compute_resource] are numbered from one (1) to (max_count - min_count). For example, if min_count is 2 and max_count is 10, the dynamic nodes for that [compute_resource] are numbered from one (1) to eight (8). At any time, a [compute_resource] can have between zero (0) and the maximum number of dynamic nodes.
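The numbering rules can be written as a short Python sketch. node_numbers is a hypothetical helper that returns the static and dynamic node-number ranges for one [compute_resource]. Rejecting a max_count smaller than min_count is an assumption of the sketch, because this section does not say how that case is handled.

```python
from typing import Tuple


def node_numbers(min_count: int, max_count: int) -> Tuple[range, range]:
    """Return (static, dynamic) node-number ranges for one [compute_resource].

    Static nodes: 1 .. min_count.
    Dynamic nodes: 1 .. (max_count - min_count).
    """
    # Assumption of this sketch: counts must satisfy 0 <= min_count <= max_count.
    if min_count < 0 or max_count < min_count:
        raise ValueError("expected 0 <= min_count <= max_count")
    static = range(1, min_count + 1)
    dynamic = range(1, max_count - min_count + 1)
    return static, dynamic


# The example from the text: min_count = 2, max_count = 10.
static, dynamic = node_numbers(min_count=2, max_count=10)
print(list(static))   # [1, 2]
print(list(dynamic))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because both ranges start at one (1), a node number alone does not identify a node. The static or dynamic designation is also needed.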
The instances that are launched into the compute fleet are dynamically assigned. To help manage this, hostnames are generated for each node. The format of the hostname is as follows:
$HOSTNAME=$QUEUE-$STATDYN-$INSTANCE_TYPE-$NODENUM
- $QUEUE is the name of the queue. For example, if the section starts [queue queue-name], then $QUEUE is “queue-name”.
- $STATDYN is st for static nodes or dy for dynamic nodes.
- $INSTANCE_TYPE is the instance type for the [compute_resource], from the instance_type setting.
- $NODENUM is the number of the node. $NODENUM is between one (1) and the value of min_count for static nodes and between one (1) and (max_count - min_count) for dynamic nodes.
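The hostname format can be sketched in Python. node_hostname is a hypothetical helper. It checks $NODENUM against the ranges given above and inserts the instance_type value unchanged. This section does not say whether characters such as the period in c5.xlarge are kept in the generated hostname, so compare against actual node names before relying on this output.

```python
def node_hostname(
    queue: str,
    is_static: bool,
    instance_type: str,
    node_num: int,
    min_count: int,
    max_count: int,
) -> str:
    """Build $QUEUE-$STATDYN-$INSTANCE_TYPE-$NODENUM for one node."""
    statdyn = "st" if is_static else "dy"
    # Static nodes: 1..min_count; dynamic nodes: 1..(max_count - min_count).
    upper = min_count if is_static else max_count - min_count
    if not 1 <= node_num <= upper:
        raise ValueError(
            f"node number {node_num} outside 1..{upper} for {statdyn} nodes"
        )
    # instance_type is used as-is; any character handling by ParallelCluster
    # is not modeled here.
    return f"{queue}-{statdyn}-{instance_type}-{node_num}"


print(node_hostname("queue-name", True, "c5.xlarge", 2, min_count=2, max_count=10))
# queue-name-st-c5.xlarge-2
print(node_hostname("queue-name", False, "c5.xlarge", 8, min_count=2, max_count=10))
# queue-name-dy-c5.xlarge-8
```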
Both hostnames and fully qualified domain names (FQDN) are created using Amazon Route 53 hosted zones. The FQDN is $HOSTNAME.$CLUSTERNAME.pcluster, where $CLUSTERNAME is the name of the [cluster] section used for the cluster.
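The FQDN rule can be sketched in Python. node_fqdn appends the cluster name and the pcluster suffix to a hostname, and split_fqdn reverses it. Both helpers are hypothetical, the cluster name mycluster is illustrative, and split_fqdn assumes the cluster name contains no period, which this section does not state.

```python
from typing import Tuple

PCLUSTER_SUFFIX = ".pcluster"


def node_fqdn(hostname: str, cluster_name: str) -> str:
    """Return $HOSTNAME.$CLUSTERNAME.pcluster."""
    return f"{hostname}.{cluster_name}{PCLUSTER_SUFFIX}"


def split_fqdn(fqdn: str) -> Tuple[str, str]:
    """Split an FQDN back into (hostname, cluster_name).

    Assumes the cluster name contains no '.', so the last label before
    the .pcluster suffix is taken as the cluster name.
    """
    if not fqdn.endswith(PCLUSTER_SUFFIX):
        raise ValueError(f"{fqdn!r} does not end with {PCLUSTER_SUFFIX!r}")
    stem = fqdn[: -len(PCLUSTER_SUFFIX)]
    if "." not in stem:
        raise ValueError(f"{fqdn!r} has no cluster name label")
    hostname, cluster_name = stem.rsplit(".", 1)
    return hostname, cluster_name


fqdn = node_fqdn("queue-name-dy-c5.xlarge-3", "mycluster")
print(fqdn)              # queue-name-dy-c5.xlarge-3.mycluster.pcluster
print(split_fqdn(fqdn))  # ('queue-name-dy-c5.xlarge-3', 'mycluster')
```

Splitting from the right with rsplit keeps any periods inside the hostname intact. Only the final label before the suffix is treated as the cluster name.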
To convert your configuration to multiple queue mode, use the pcluster-config convert command. It writes an updated configuration with a single [queue] section named [queue compute]. That queue contains a single [compute_resource] section named [compute_resource default]. The [queue compute] and [compute_resource default] sections have settings migrated from the specified [cluster] section.