Multiple instance type allocation with Slurm - AWS ParallelCluster

Starting with AWS ParallelCluster version 3.3.0, you can configure your cluster to allocate from a set of instance types defined for a compute resource. Allocation follows an Amazon EC2 Fleet lowest-price or capacity-optimized strategy.

All instance types in this set must have the same number of vCPUs or, if multithreading is disabled, the same number of cores. They must also have the same number of accelerators from the same manufacturer. If Efa / Enabled is set to true, every instance type in the set must support EFA. For more information and requirements, see Scheduling / SlurmQueues / AllocationStrategy and ComputeResources / Instances.
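The compatibility rules above can be checked mechanically before you write the cluster configuration. The following is a minimal sketch, not part of AWS ParallelCluster: `check_instance_set` and `record` are hypothetical helpers, and the records are hand-written samples shaped like entries of the `InstanceTypes` array that `aws ec2 describe-instance-types` returns, not live API output. The sketch inspects only `GpuInfo`, so other accelerator types are assumed to be out of scope.

```python
from typing import Any


def check_instance_set(
    instance_types: list[dict[str, Any]],
    multithreading_disabled: bool = False,
    efa_enabled: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the set looks compatible."""
    problems: list[str] = []

    # Compare cores when multithreading is disabled, vCPUs otherwise.
    cpu_key = "DefaultCores" if multithreading_disabled else "DefaultVCpus"
    cpu_counts = {it["VCpuInfo"][cpu_key] for it in instance_types}
    if len(cpu_counts) > 1:
        problems.append(f"{cpu_key} differs across the set: {sorted(cpu_counts)}")

    # Summarize GPUs as (manufacturer, count) pairs; no GpuInfo means no GPUs.
    def gpu_signature(it: dict[str, Any]) -> tuple:
        gpus = it.get("GpuInfo", {}).get("Gpus", [])
        return tuple(sorted((g["Manufacturer"], g["Count"]) for g in gpus))

    if len({gpu_signature(it) for it in instance_types}) > 1:
        problems.append("GPU count or manufacturer differs across the set")

    # With EFA enabled, every instance type must report EFA support.
    if efa_enabled:
        no_efa = [
            it["InstanceType"]
            for it in instance_types
            if not it.get("NetworkInfo", {}).get("EfaSupported", False)
        ]
        if no_efa:
            problems.append(f"EFA is not supported by: {', '.join(no_efa)}")

    return problems


def record(name: str, vcpus: int, cores: int, efa: bool = False) -> dict[str, Any]:
    """Build a hand-written sample record (hypothetical helper for this sketch)."""
    return {
        "InstanceType": name,
        "VCpuInfo": {"DefaultVCpus": vcpus, "DefaultCores": cores},
        "NetworkInfo": {"EfaSupported": efa},
    }


SAMPLE = [
    record("r6g.2xlarge", 8, 8),
    record("m6g.2xlarge", 8, 8),
    record("c6g.2xlarge", 8, 8),
]

print(check_instance_set(SAMPLE))
print(check_instance_set(SAMPLE + [record("m6g.12xlarge", 48, 48)]))
```

In practice you would feed the function the parsed JSON output of a describe-instance-types query rather than hand-written records. The authoritative validation is still the one that AWS ParallelCluster performs when it processes the configuration.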

You can set AllocationStrategy to lowest-price or capacity-optimized depending on your CapacityType configuration.

In Instances, you can configure a set of instance types.


Starting with AWS ParallelCluster version 3.7.0, EnableMemoryBasedScheduling can be enabled if you configure multiple instance types in Instances.

For AWS ParallelCluster versions 3.2.0 to 3.6.x, EnableMemoryBasedScheduling can't be enabled if you configure multiple instance types in Instances.
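The version rule in the two notes above can be expressed as a small check. This is an illustrative sketch only: `parse_version` and `memory_scheduling_with_multiple_types` are hypothetical names, and the function assumes a plain `major.minor.patch` version string.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'major.minor.patch' string into integers for numeric comparison."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def memory_scheduling_with_multiple_types(version: str) -> bool:
    """True if EnableMemoryBasedScheduling can be combined with multiple
    instance types in Instances, which the notes above allow from 3.7.0 on."""
    return parse_version(version) >= (3, 7, 0)


for v in ("3.6.1", "3.7.0", "3.10.0"):
    print(v, memory_scheduling_with_multiple_types(v))
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank "3.10.0" below "3.7.0".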

The following examples show how you can query instance types for vCPUs, EFA support, and architecture.

Query InstanceTypes with 96 vCPUs and x86_64 architecture.

$ aws ec2 describe-instance-types --region region-id \
    --filters "Name=vcpu-info.default-vcpus,Values=96" "Name=processor-info.supported-architecture,Values=x86_64" \
    --query "sort_by(InstanceTypes[*].{InstanceType:InstanceType,MemoryMiB:MemoryInfo.SizeInMiB,CurrentGeneration:CurrentGeneration,VCpus:VCpuInfo.DefaultVCpus,Cores:VCpuInfo.DefaultCores,Architecture:ProcessorInfo.SupportedArchitectures[0],MaxNetworkCards:NetworkInfo.MaximumNetworkCards,EfaSupported:NetworkInfo.EfaSupported,GpuCount:GpuInfo.Gpus[0].Count,GpuManufacturer:GpuInfo.Gpus[0].Manufacturer}, &InstanceType)" \
    --output table

Query InstanceTypes with 64 cores, EFA support, and arm64 architecture.

$ aws ec2 describe-instance-types --region region-id \
    --filters "Name=vcpu-info.default-cores,Values=64" "Name=processor-info.supported-architecture,Values=arm64" "Name=network-info.efa-supported,Values=true" \
    --query "sort_by(InstanceTypes[*].{InstanceType:InstanceType,MemoryMiB:MemoryInfo.SizeInMiB,CurrentGeneration:CurrentGeneration,VCpus:VCpuInfo.DefaultVCpus,Cores:VCpuInfo.DefaultCores,Architecture:ProcessorInfo.SupportedArchitectures[0],MaxNetworkCards:NetworkInfo.MaximumNetworkCards,EfaSupported:NetworkInfo.EfaSupported,GpuCount:GpuInfo.Gpus[0].Count,GpuManufacturer:GpuInfo.Gpus[0].Manufacturer}, &InstanceType)" \
    --output table

The following cluster configuration snippet shows how you can use the InstanceType and AllocationStrategy properties.

...
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue-1
      CapacityType: ONDEMAND
      AllocationStrategy: lowest-price
      ...
      ComputeResources:
        - Name: computeresource1
          Instances:
            - InstanceType: r6g.2xlarge
            - InstanceType: m6g.2xlarge
            - InstanceType: c6g.2xlarge
          MinCount: 0
          MaxCount: 500
        - Name: computeresource2
          Instances:
            - InstanceType: m6g.12xlarge
            - InstanceType: x2gd.12xlarge
          MinCount: 0
          MaxCount: 500
...