Getting started with AWS Parallel Computing Service - AWS PCS

This tutorial shows you how to create a simple cluster that you can use to try AWS PCS. The following figure shows the design of the cluster.

An architecture diagram of the tutorial cluster: The 2 compute node groups are resources in your AWS account and connect to the Slurm cluster controller that runs in a service-owned AWS account. The EC2 instances in both compute node groups connect to shared storage in Amazon EFS and Amazon FSx for Lustre.

The tutorial cluster design has the following key components:

  • A VPC and subnets that meet AWS PCS networking requirements.

  • An Amazon EFS file system, which will be used as a shared home directory.

  • An Amazon FSx for Lustre file system, which provides a shared high-performance directory.

  • An AWS PCS cluster, which provides a Slurm controller.

  • 2 AWS PCS compute node groups.

    • The login node group, which provides shell-based interactive access to the system.

    • The compute-1 node group, which provides elastically scaling instances to run jobs.

  • 1 queue that sends jobs to EC2 instances in the compute-1 node group.

The cluster requires additional AWS resources, such as security groups, IAM roles, and EC2 launch templates, which aren't shown in the diagram.
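
As a rough illustration of how the components listed above map onto the AWS CLI, the following sketch prints the list commands that you could use to inspect a finished cluster, its compute node groups, and its queue. The cluster identifier and Region are hypothetical placeholders, and the run_or_print helper and RUN_AWS variable are assumptions made for this example, not part of AWS PCS. By default the script only prints the commands and doesn't call AWS.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; substitute the values for your own cluster.
cluster_id="my-tutorial-cluster"
region="us-east-1"

# Illustrative helper (not part of AWS PCS): print a command, and run it
# only when RUN_AWS=1 is set and the AWS CLI is installed.
run_or_print() {
  echo "+ $*"
  if [ "${RUN_AWS:-0}" = "1" ] && command -v aws >/dev/null 2>&1; then
    "$@"
  fi
}

# The AWS PCS cluster, which provides the Slurm controller.
run_or_print aws pcs list-clusters --region "$region"

# The compute node groups (login and compute-1 in this tutorial).
run_or_print aws pcs list-compute-node-groups \
  --cluster-identifier "$cluster_id" \
  --region "$region"

# The queue that sends jobs to the compute-1 node group.
run_or_print aws pcs list-queues \
  --cluster-identifier "$cluster_id" \
  --region "$region"
```

To run the commands against your account instead of printing them, set RUN_AWS=1 in your environment before you run the script.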

Note

We recommend that you complete the command line steps in this topic in a Bash shell. If you use a different shell, some parts of the commands, such as line continuation characters and the way variables are set and used, require adjustment. The quoting and escaping rules for your shell might also differ. For more information, see Quotation marks and literals with strings in the AWS CLI in the AWS Command Line Interface User Guide for Version 2.
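
The Bash conventions that this note refers to can be sketched as follows. This minimal example uses hypothetical variable names and values and makes no AWS calls. It shows variable assignment, a backslash line continuation, and the difference between double and single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical values used only for illustration.
region="us-east-1"                 # Bash assignment: no spaces around "="
cluster_name="my-tutorial-cluster"

# A trailing backslash continues the command on the next line.
message=$(printf '%s is in %s' \
  "$cluster_name" \
  "$region")
echo "$message"

# Double quotes expand variables; single quotes keep the text literal.
expanded="name=$cluster_name"
literal='name=$cluster_name'
echo "$expanded"
echo "$literal"
```

Other shells differ on each of these points. For example, PowerShell uses a backtick for line continuation and a different syntax for setting variables.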