Creating a cluster with an external Slurmdbd accounting - AWS ParallelCluster

Learn how to configure and create a cluster with external Slurmdbd accounting. For more information, see Slurm accounting with AWS ParallelCluster.

When using the AWS ParallelCluster command line interface (CLI) or API, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see AWS services used by AWS ParallelCluster.

The AWS ParallelCluster UI is built on a serverless architecture and you can use it within the AWS Free Tier category for most cases. For more information, see AWS ParallelCluster UI costs.

In this tutorial, you use an AWS CloudFormation quick-create template to create the components needed to deploy a Slurmdbd instance in the same VPC as the cluster. The template creates a basic networking and security configuration for the connection between the cluster and the database.

Note

Starting with version 3.10.0, AWS ParallelCluster supports external Slurmdbd with the cluster configuration parameter SlurmSettings / ExternalSlurmdbd.

Note

The quick-create template serves as an example. This template doesn't cover all possible use cases. It's your responsibility to create an external Slurmdbd with the configuration and capacity appropriate for your production workloads.

Prerequisites:

Step 1: Create the Slurmdbd stack

In this tutorial, use a CloudFormation quick-create template (us-east-1) to create a Slurmdbd stack. The template requires the following inputs:

Networking
  • VPCId: The ID of the VPC in which to launch the Slurmdbd instance.

  • SubnetId: The ID of the subnet in which to launch the Slurmdbd instance.

  • PrivatePrefix: The CIDR prefix of the VPC.

  • PrivateIp: A secondary private IP to assign to the Slurmdbd instance.

Database connection
  • DBMSClientSG: The security group to attach to the Slurmdbd instance. This security group should allow connections between the database server and the Slurmdbd instance.

  • DBMSDatabaseName: The name of the database.

  • DBMSUsername: The username for the database.

  • DBMSPasswordSecretArn: The secret containing the password for the database.

  • DBMSUri: The URI of the database server.

Instance settings
  • InstanceType: The instance type to use for the Slurmdbd instance.

  • KeyName: The Amazon EC2 key pair to use for the Slurmdbd instance.

Slurmdbd settings
  • AMIID: The AMI for the Slurmdbd instance. The AMI should be a ParallelCluster AMI. The version of the ParallelCluster AMI determines the version of Slurmdbd.

  • MungeKeySecretArn: The secret containing the munge key to use for authenticating communications between Slurmdbd and clusters.

  • SlurmdbdPort: The port number that Slurmdbd uses.

  • EnableSlurmdbdSystemService: Enables Slurmdbd as a system service so that it runs when the instance launches.
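MungeKeySecretArn and DBMSPasswordSecretArn both refer to secrets that must already exist when the stack is created. As a minimal sketch, a munge key could be generated and stored as shown here. It assumes that the secret holds a base64-encoded key, which is the format AWS ParallelCluster documents for its MungeKeySecretArn cluster parameter. The 128-byte key size, the function names, and the secret name are illustrative choices, not values taken from the template.

```shell
# Generate a random munge key and print it as a single-line base64 string.
# Assumption: 128 random bytes (1024 bits) is an acceptable key size.
generate_munge_key() {
  head -c 128 /dev/urandom | base64 | tr -d '\n'
}

# Store a freshly generated key in AWS Secrets Manager and print the ARN
# of the new secret. "$1" is a secret name of your choosing (hypothetical).
create_munge_key_secret() {
  aws secretsmanager create-secret \
    --name "$1" \
    --secret-string "$(generate_munge_key)" \
    --query ARN --output text
}
```

For example, `create_munge_key_secret my-slurmdbd-munge-key` prints an ARN that can be supplied as MungeKeySecretArn. The cluster must later reference the same secret so that both sides authenticate with the same key.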

Warning

If the database was created by a different version of SlurmDB, do not use Slurmdbd as a system service.

If the database contains a large number of entries, the Slurm Database Daemon (SlurmDBD) may require tens of minutes to update the database and is unresponsive during that time.

Before upgrading SlurmDB, make a backup of the database. For more information, see the Slurm documentation.
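As an alternative to the console quick-create link, the inputs listed in this step could be supplied from the AWS CLI. The following is a sketch only: every parameter value is a placeholder, TEMPLATE_URL must be set to the template location (which is not reproduced here), and build_params, print_create_stack_command, and the stack name are names introduced for illustration. Depending on what the template creates, aws cloudformation create-stack may also need a --capabilities flag.

```shell
# Turn KEY=VALUE pairs into CloudFormation shorthand parameters, one per line.
# Values must not contain commas, spaces, or glob characters for this to work.
build_params() {
  for pair in "$@"; do
    printf 'ParameterKey=%s,ParameterValue=%s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Print (do not run) the create-stack command so it can be reviewed first.
# Usage: print_create_stack_command <stack-name> KEY=VALUE [KEY=VALUE ...]
print_create_stack_command() {
  stack_name="$1"
  shift
  # Word splitting of the build_params output is intended here:
  # each generated line becomes its own --parameters argument.
  echo aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "${TEMPLATE_URL:-<template-url>}" \
    --parameters $(build_params "$@")
}

# Example call with placeholder values only (replace each <...> yourself).
print_create_stack_command slurmdbd-stack \
  'VPCId=<vpc-id>' \
  'SubnetId=<subnet-id>' \
  'PrivatePrefix=<vpc-cidr-prefix>' \
  'PrivateIp=<secondary-private-ip>' \
  'DBMSClientSG=<security-group-id>' \
  'DBMSDatabaseName=<database-name>' \
  'DBMSUsername=<database-user>' \
  'DBMSPasswordSecretArn=<password-secret-arn>' \
  'DBMSUri=<database-uri>' \
  'InstanceType=<instance-type>' \
  'KeyName=<key-pair-name>' \
  'AMIID=<parallelcluster-ami-id>' \
  'MungeKeySecretArn=<munge-key-secret-arn>' \
  'SlurmdbdPort=<port>' \
  'EnableSlurmdbdSystemService=<see-template-for-allowed-values>'
```

Printing the command instead of executing it keeps the sketch safe to run as is; once the placeholders are filled in, the printed line can be copied into a shell.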

Step 2: Create a cluster with external Slurmdbd enabled

The provided AWS CloudFormation template generates an AWS CloudFormation stack with some defined outputs.

From the AWS Management Console, view the Outputs tab of the AWS CloudFormation stack to review the entities created. To enable Slurm accounting, some of these outputs must be used in the AWS ParallelCluster configuration file:

Additionally, from the Parameters tab in the AWS CloudFormation stack view:

Update the database parameters in your cluster configuration file with the output values. Use the pcluster CLI to create the cluster.

$ pcluster create-cluster -n cluster-3.x -c path/to/cluster-config.yaml
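The configuration update that precedes this command could be sketched as follows. The helper prints the Scheduling fragment that points a cluster at the external Slurmdbd, using values that you copy from the stack's Outputs and Parameters tabs. The Host and Port keys under ExternalSlurmdbd and the sibling MungeKeySecretArn key reflect the AWS ParallelCluster configuration schema for version 3.10.0 and later; verify them against the configuration reference for your version. render_slurm_settings is a hypothetical helper name, and the fragment omits everything else a working cluster configuration needs.

```shell
# Print the part of the cluster configuration that enables accounting
# through an external Slurmdbd.
# Arguments: <slurmdbd-host> <slurmdbd-port> <munge-key-secret-arn>
render_slurm_settings() {
  cat <<EOF
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    MungeKeySecretArn: $3
    ExternalSlurmdbd:
      Host: $1
      Port: $2
EOF
}
```

Calling `render_slurm_settings <slurmdbd-ip> <slurmdbd-port> <munge-key-secret-arn>` writes the fragment to standard output, where it can be compared with or merged into path/to/cluster-config.yaml by hand.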

After the cluster is created, you can start using Slurm accounting commands such as sacctmgr or sacct.
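A quick check from the head node might look like the following sketch. It uses only standard sacctmgr and sacct options; the function name and the chosen output columns are illustrative.

```shell
# Run on the cluster head node once the cluster is up.
accounting_smoke_test() {
  # List the clusters registered with the accounting database.
  # Stop here if Slurmdbd cannot be reached.
  sacctmgr --noheader --parsable2 list cluster format=Cluster || return 1
  # Show accounting records for jobs of all users.
  sacct --allusers --format=JobID,JobName,State,Elapsed
}
```

If the first command prints the cluster name, the head node can reach the Slurmdbd instance. The second command returns job records only after at least one job has run.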

Warning

Traffic between ParallelCluster and the external SlurmDB is not encrypted. It is recommended to run the cluster and the external SlurmDB in a trusted network.