

# AWS PCS Networking
<a name="working-with_networking"></a>

Your AWS PCS cluster is created in an Amazon VPC. This chapter includes the following topics about networking for your cluster’s scheduler and nodes.

Except for choosing a subnet to launch instances in, you must use EC2 launch templates to configure networking for AWS PCS compute node groups. For more information about launch templates, see [Using Amazon EC2 launch templates with AWS PCS](working-with_launch-templates.md). 

**Topics**
+ [AWS PCS VPC and subnet requirements and considerations](working-with_networking_vpc-requirements.md)
+ [Creating a VPC for your AWS PCS cluster](working-with_networking_create-vpc.md)
+ [Security groups in AWS PCS](working-with_networking_sg.md)
+ [Multiple network interfaces in AWS PCS](working-with_networking_multi-nic.md)
+ [Placement groups for EC2 instances in AWS PCS](working-with_networking_placement-groups.md)
+ [Using Elastic Fabric Adapter (EFA) with AWS PCS](working-with_networking_efa.md)

# AWS PCS VPC and subnet requirements and considerations
<a name="working-with_networking_vpc-requirements"></a>

When you create an AWS PCS cluster, you specify a VPC and a subnet in that VPC. This topic provides an overview of the AWS PCS-specific requirements and considerations for the VPC and subnets that you use with your cluster. If you don't have a VPC to use with AWS PCS, you can create one using an AWS-provided CloudFormation template. For more information about VPCs, see [Virtual private clouds (VPC)](https://docs.aws.amazon.com/vpc/latest/userguide/configure-your-vpc.html) in the *Amazon VPC User Guide*.

## VPC requirements and considerations
<a name="working-with_networking_vpc-requirements_vpc"></a>

When you create a cluster, the VPC that you specify must meet the following requirements and considerations:
+ The VPC must have a sufficient number of IP addresses available for the cluster, any nodes, and other cluster resources that you want to create. For more information, see [IP addressing for your VPCs and subnets](https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html#network-requirements-vpc) in the *Amazon VPC User Guide*.
+  If your cluster uses IPv6: 
  +  Associate an IPv6 CIDR block with your VPC. For more information, see [Create a VPC](https://docs.aws.amazon.com/vpc/latest/userguide/create-vpc.html) in the *Amazon VPC User Guide*. 
**Important**  
 Although you can configure your VPC with both IPv4 and IPv6, you can choose only one network type for your cluster. 
  +  Enable **auto-assign IPv6 address** for your subnets. 
  + For more information, see:
    +  [IPv6 on AWS](https://docs.aws.amazon.com/whitepapers/latest/ipv6-on-aws/IPv6-on-AWS.html) 
    +  [Understanding IPv6 addressing on AWS and designing a scalable addressing plan](https://aws.amazon.com/blogs/networking-and-content-delivery/understanding-ipv6-addressing-on-aws-and-designing-a-scalable-addressing-plan) 
+ The VPC must have DNS hostname and DNS resolution support enabled. Otherwise, nodes can't register with the cluster. For more information, see [DNS attributes for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html) in the *Amazon VPC User Guide*.
+ The VPC might require VPC endpoints using AWS PrivateLink to be able to contact the AWS PCS API. For more information, see [Connect your VPC to services using AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) in the *Amazon VPC User Guide*.
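As a quick capacity check for the first requirement above, you can compute how many instance-assignable addresses a CIDR block provides. This is a sketch using plain arithmetic with Python's standard `ipaddress` module, not an AWS API call; it relies on the fact that AWS reserves the first four and the last IPv4 address in every subnet.

```python
import ipaddress

# AWS reserves the first 4 and the last IPv4 address in each subnet (5 total).
AWS_RESERVED_PER_SUBNET = 5

def usable_addresses(cidr: str) -> int:
    """Number of IPv4 addresses in a subnet CIDR that instances can actually use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET

print(usable_addresses("10.3.0.0/20"))   # a /20 subnet: 4096 - 5 = 4091
```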

**Important**  
 AWS PCS doesn't support a VPC with dedicated instance tenancy. The VPC you use for AWS PCS must use `default` instance tenancy. You can change the instance tenancy for an existing VPC. For more information, see [Change the instance tenancy of a VPC](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/change-tenancy-vpc.html) in the *Amazon Elastic Compute Cloud User Guide*. 

## Subnet requirements and considerations
<a name="working-with_networking_vpc-requirements_subnet"></a>

When you create a Slurm cluster, AWS PCS creates an [Elastic Network Interface (ENI)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) in the subnet that you specify. This network interface enables communication between the scheduler controller and your VPC. The network interface also enables Slurm to communicate with the components deployed in your account. You can only specify the subnet for a cluster at creation time. 

### Subnet requirements for clusters
<a name="working-with_networking_vpc-requirements_subnet_clusters"></a>

The [subnet](https://docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html#subnet-types) that you specify when you create a cluster must meet the following requirements:
+ The subnet must have at least one IP address available for use by AWS PCS.
+  If your cluster uses IPv6, all of the subnets in your cluster must use IPv6. 

**Important**  
Compute node groups configured with AWS PCS sample AMIs and multiple network interfaces currently don't work if the subnets are configured to use only IPv6. Use dual-stack subnets (IPv4 and IPv6) or IPv4-only subnets instead. For more information, see [Using sample Amazon Machine Images (AMIs) with AWS PCS](working-with_ami_samples.md).
+ The subnet can't reside in AWS Outposts, AWS Wavelength, or an AWS Local Zone.
+ The subnet can be public or private. We recommend that you specify a private subnet, if possible. A public subnet has a route table that includes a route to an [internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html); a private subnet has a route table without such a route.
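The public/private distinction above can be checked programmatically. This is a sketch that classifies a subnet from the gateway targets in its route table, assuming the `igw-` ID prefix that EC2 uses for internet gateways:

```python
def is_public_subnet(route_targets: list[str]) -> bool:
    """A subnet is public if its route table includes a route to an internet gateway."""
    return any(target.startswith("igw-") for target in route_targets)

# Hypothetical route targets, as gateway IDs would appear in describe-route-tables output.
print(is_public_subnet(["local", "igw-0123456789abcdef0"]))  # public
print(is_public_subnet(["local", "nat-0123456789abcdef0"]))  # private (NAT only)
```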

### Subnet requirements for nodes
<a name="working-with_networking_vpc-requirements_subnet_nodes"></a>

You can deploy nodes and other cluster resources to the subnet you specify when you create your AWS PCS cluster, and to other subnets in the same VPC. 

Any subnet that you deploy nodes and cluster resources to must meet the following requirements:
+ You must ensure that the subnet has enough available IP addresses to deploy all the nodes and cluster resources.
+ If your cluster uses IPv4 and you plan to deploy nodes to a public subnet, that subnet must auto-assign IPv4 public addresses.
**Note**  
Instances in a public subnet must use a security group with inbound rules that permit traffic from public IP addresses. Unless you have specific source address restrictions, this means an IPv4 source address of 0.0.0.0/0 or an IPv6 source address of ::/0.
+ If the subnet where you deploy nodes is a private subnet and its route table doesn't include a route to a network address translation [(NAT) device](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat.html) (IPv4), add VPC endpoints using AWS PrivateLink to your VPC. VPC endpoints are needed for all of the AWS services that the nodes contact. At a minimum, an endpoint for AWS PCS is required so that nodes can call the `RegisterComputeNodeGroupInstance` API action. For more information, see [RegisterComputeNodeGroupInstance](https://docs.aws.amazon.com/pcs/latest/APIReference/API_RegisterComputeNodeGroupInstance.html) in the *AWS PCS API Reference*.
+ Whether a subnet is public or private doesn't affect AWS PCS, as long as the required endpoints are reachable from it.
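Interface endpoint service names follow the `com.amazonaws.<region>.<service>` pattern. The following minimal sketch builds an AWS PCS endpoint service name for a Region; the `pcs` service suffix is an assumption here, so confirm the exact name in the AWS PCS documentation for your Region.

```python
def pcs_endpoint_service_name(region: str) -> str:
    # "pcs" as the service suffix is assumed; verify against the AWS PCS docs.
    return f"com.amazonaws.{region}.pcs"

print(pcs_endpoint_service_name("us-east-1"))
```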

# Creating a VPC for your AWS PCS cluster
<a name="working-with_networking_create-vpc"></a>

You can create an Amazon Virtual Private Cloud (Amazon VPC) for your clusters within AWS Parallel Computing Service (AWS PCS).

Use Amazon VPC to launch VPC resources into a virtual network that you've defined. This virtual network closely resembles a traditional network that you might operate in your own data center. However, it comes with the benefits of using the scalable infrastructure of Amazon Web Services. We recommend that you have a thorough understanding of the Amazon VPC service before deploying production VPC clusters. For more information, see [What is Amazon VPC?](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html) in the *Amazon VPC User Guide*.

An AWS PCS cluster, its nodes, and supporting resources (such as file systems and directory services) are deployed within your Amazon VPC. If you want to use an existing Amazon VPC with AWS PCS, it must meet the requirements described in [AWS PCS VPC and subnet requirements and considerations](working-with_networking_vpc-requirements.md). This topic describes how to create a VPC that meets those requirements using an AWS-provided CloudFormation template. After you deploy the template, you can view the resources it created and their configuration.

## Prerequisites
<a name="working-with_networking_create-vpc_prereq"></a>

To create an Amazon VPC for PCS, you must have the necessary IAM permissions to create Amazon VPC resources. These resources are VPCs, subnets, security groups, route tables and routes, and internet and NAT gateways. For more information, see [Create a VPC with a public subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-policy-examples.html#vpc-public-subnet-iam) in the *Amazon VPC User Guide*. To review the full list for Amazon EC2, see [Actions, resources, and condition keys for Amazon EC2](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonec2.html) in the *Service Authorization Reference*.

## Create an Amazon VPC
<a name="working-with_networking_create-vpc_create"></a>

Create a VPC by copying and pasting the appropriate URL for the AWS Region where you will use AWS PCS into your browser. You can also download the CloudFormation template and upload it yourself to the [CloudFormation console](https://console.aws.amazon.com/cloudformation). 
+ **US East (N. Virginia) (us-east-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **US East (Ohio) (us-east-2)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **US West (Oregon) (us-west-2)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Asia Pacific (Mumbai) (ap-south-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=ap-south-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Asia Pacific (Singapore) (ap-southeast-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=ap-southeast-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Asia Pacific (Sydney) (ap-southeast-2)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=ap-southeast-2#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Asia Pacific (Tokyo) (ap-northeast-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=ap-northeast-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (Frankfurt) (eu-central-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-central-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (Ireland) (eu-west-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (London) (eu-west-2)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-west-2#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (Paris) (eu-west-3)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-west-3#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (Milan) (eu-south-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-south-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Europe (Stockholm) (eu-north-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=eu-north-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **AWS GovCloud (US-East) (us-gov-east-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=us-gov-east-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **AWS GovCloud (US-West) (us-gov-west-1)**

  ```
  https://console.aws.amazon.com/cloudformation/home?region=us-gov-west-1#/stacks/create/review?stackName=hpc-networking&templateURL=https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```
+ **Template only**

  ```
  https://aws-hpc-recipes.s3.us-east-1.amazonaws.com/main/recipes/net/hpc_large_scale/assets/main.yaml
  ```

**To create an Amazon VPC for PCS**

1. Open the template in the [CloudFormation console](https://console.aws.amazon.com/cloudformation).
**Note**  
 The parameter values in the following steps are pre-populated in the template, so you can leave them at their default values. 

1. Under **Provide a stack name**, then **Stack name**, enter `hpc-networking`.

1. Under **Parameters**, enter the following details:

   1. Under **VPC**, then **CidrBlock**, enter `10.3.0.0/16`

   1. Under **Subnets A**:

      1. Then **CidrPublicSubnetA**, enter `10.3.0.0/20`

      1. Then **CidrPrivateSubnetA**, enter `10.3.128.0/20`

   1. Under **Subnets B**:

      1. Then **CidrPublicSubnetB**, enter `10.3.16.0/20`

      1. Then **CidrPrivateSubnetB**, enter `10.3.144.0/20`

   1. Under **Subnets C**:

      1. For **ProvisionSubnetsC**, select `True`.
**Note**  
If you are creating a VPC in a Region that has fewer than three Availability Zones, this option is ignored even if it's set to `True`.

      1. Then **CidrPublicSubnetC**, enter `10.3.32.0/20`

      1. Then **CidrPrivateSubnetC**, enter `10.3.160.0/20`

1. Under **Capabilities**, check the box for **I acknowledge that AWS CloudFormation might create IAM resources**.

Monitor the status of the CloudFormation stack. When it reaches `CREATE_COMPLETE`, the VPC resources are ready for you to use.

**Note**  
To see all the resources the CloudFormation template created, open the [CloudFormation console](https://console.aws.amazon.com/cloudformation). Choose the `hpc-networking` stack and then choose the **Resources** tab.

# Security groups in AWS PCS
<a name="working-with_networking_sg"></a>

Security groups in Amazon EC2 act as virtual firewalls to control inbound and outbound traffic to instances. Use a launch template for an AWS PCS compute node group to add or remove security groups to its instances. If your launch template doesn't contain any network interfaces, use `SecurityGroupIds` to provide a list of security groups. If your launch template defines network interfaces, you must use the `Groups` parameter to assign security groups to each network interface. For more information about launch templates, see [Using Amazon EC2 launch templates with AWS PCS](working-with_launch-templates.md).

**Note**  
Changes to the security group configuration in the launch template affect only new instances launched after the compute node group is updated.

## Security group requirements and considerations
<a name="working-with_networking_sg-requirements"></a>

AWS PCS creates a cross-account [Elastic Network Interface (ENI)](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) in the subnet that you specify when creating a cluster. This gives the HPC scheduler, which runs in an account managed by AWS, a path to communicate with EC2 instances launched by AWS PCS. You must provide a security group for that ENI that allows two-way communication between the scheduler ENI and your cluster's EC2 instances. 

A straightforward way to accomplish this is to create a permissive self-referencing security group that permits TCP/IP traffic on all ports between all members of the group. You can attach this to both the cluster and to node group EC2 instances. 

### Example permissive security group configuration
<a name="working-with_networking_sg-requirements_permissive-security-config"></a>

------
#### [ IPv4 ]


| Rule type | Protocols | Ports | Source | Destination  | 
| --- | --- | --- | --- | --- | 
| Inbound | All | All | Self |  | 
| Outbound | All | All |  |  0.0.0.0/0  | 
| Outbound | All | All |  | Self | 

------
#### [ IPv6 ]


| Rule type | Protocols | Ports | Source | Destination  | 
| --- | --- | --- | --- | --- | 
| Inbound | All | All | Self |  | 
| Outbound | All | All |  |  ::/0  | 
| Outbound | All | All |  | Self | 

------

These rules allow all traffic to flow freely between the Slurm controller and the nodes, allow all outbound traffic to any destination, and enable [EFA traffic](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security).
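If you manage these rules through the EC2 API or an SDK rather than the console, the self-referencing rows above map to `IpPermissions` entries. A sketch with a placeholder security group ID:

```python
cluster_sg = "sg-0123456789abcdef0"  # placeholder ID for the shared cluster security group

# All protocols and ports, with the group itself as the source/destination (self-referencing).
self_referencing_rule = {
    "IpProtocol": "-1",
    "UserIdGroupPairs": [{"GroupId": cluster_sg}],
}

# All outbound IPv4 traffic to any destination.
all_outbound_ipv4 = {
    "IpProtocol": "-1",
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}
```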

### Example restrictive security group configuration
<a name="working-with_networking_sg-requirements_restrictive-security-config.title"></a>

You can also limit the open ports between the cluster and its compute nodes. For the Slurm scheduler, the security group attached to your cluster must allow the following ports:
+ 6817 – enable inbound connections to `slurmctld` from EC2 instances
+ 6818 – enable outbound connections from `slurmctld` to `slurmd` running on EC2 instances

The security group attached to your compute nodes must allow the following ports:
+ 6817 – enable outbound connections to `slurmctld` from EC2 instances.
+ 6818 – enable inbound and outbound connections to `slurmd` from `slurmctld` and from `slurmd` on node group instances 
+ 60001–63000 – inbound and outbound connections between node group instances to support `srun`
+ EFA traffic between node group instances. For more information, see [Prepare an EFA-enabled security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-security) in the *Amazon Elastic Compute Cloud User Guide*
+ Any other inter-node traffic required by your workload
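As a sketch of what the restrictive node rules above look like as EC2 `IpPermissions` entries (the ports come from the list above; the security group ID is a placeholder, and TCP is used for Slurm's RPC traffic):

```python
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818
SRUN_PORT_RANGE = (60001, 63000)

def tcp_rule(from_port: int, to_port: int, source_sg: str) -> dict:
    """One TCP IpPermissions entry allowing traffic from a security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": from_port,
        "ToPort": to_port,
        "UserIdGroupPairs": [{"GroupId": source_sg}],
    }

node_sg = "sg-0123456789abcdef0"  # placeholder: group shared by the cluster and nodes
node_inbound = [
    tcp_rule(SLURMD_PORT, SLURMD_PORT, node_sg),   # slurmd, from slurmctld and slurmd
    tcp_rule(*SRUN_PORT_RANGE, node_sg),           # srun port range between nodes
]
```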

# Multiple network interfaces in AWS PCS
<a name="working-with_networking_multi-nic"></a>

Some EC2 instances have multiple network cards. This allows them to provide higher network performance, including bandwidth capabilities above 100 Gbps and improved packet handling. For more information about instances with multiple network cards, see [Elastic network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#network-cards) in the *Amazon Elastic Compute Cloud User Guide*.

 Configure additional network cards for instances in an AWS PCS compute node group by adding network interfaces to its EC2 launch template. The following example launch template enables two network cards, such as those found on an `hpc7a.96xlarge` instance. Note the following details: 
+ The subnet for each network interface must be the same as you choose when configuring the AWS PCS compute node group that will use the launch template.
+ The primary network device, where routine network communication such as SSH and HTTPS traffic will occur, is established by setting a `DeviceIndex` of `0`. Other network interfaces have a `DeviceIndex` of `1`. There can only be one primary network interface—all other interfaces are secondary.
+ All network interfaces must have a unique `NetworkCardIndex`. A recommended practice is to number them sequentially as they are defined in the launch template.
+ Security groups for each network interface are set using `Groups`. In this example, an inbound SSH security group (`sg-SshSecurityGroupId`) is added to the primary network interface, as well as the security group enabling within-cluster communications (`sg-ClusterSecurityGroupId`). Finally, a security group allowing outbound connections to the internet (`sg-InternetOutboundSecurityGroupId`) is added to both primary and secondary interfaces.

```
{
    "NetworkInterfaces": [
        {
            "DeviceIndex": 0,
            "NetworkCardIndex": 0,
            "SubnetId": "subnet-SubnetId",
            "Groups": [
               "sg-SshSecurityGroupId",
               "sg-ClusterSecurityGroupId",
               "sg-InternetOutboundSecurityGroupId"
            ]
        },
        {
            "DeviceIndex": 1,
            "NetworkCardIndex": 1,
            "SubnetId": "subnet-SubnetId",
            "Groups": ["sg-InternetOutboundSecurityGroupId"]
        }
    ]
}
```
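A quick sanity check for a template like the one above: exactly one interface must have a `DeviceIndex` of `0`, and every interface needs a unique `NetworkCardIndex`. A minimal sketch:

```python
def check_network_interfaces(interfaces: list[dict]) -> None:
    """Validate the multi-NIC rules described above; raises ValueError on violation."""
    primaries = [nic for nic in interfaces if nic["DeviceIndex"] == 0]
    if len(primaries) != 1:
        raise ValueError("exactly one primary interface (DeviceIndex 0) is required")
    cards = [nic["NetworkCardIndex"] for nic in interfaces]
    if len(set(cards)) != len(cards):
        raise ValueError("each interface needs a unique NetworkCardIndex")

check_network_interfaces([
    {"DeviceIndex": 0, "NetworkCardIndex": 0},
    {"DeviceIndex": 1, "NetworkCardIndex": 1},
])
print("launch template network interfaces look valid")
```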

# Placement groups for EC2 instances in AWS PCS
<a name="working-with_networking_placement-groups"></a>

 You can use a **placement group** to influence the placement of EC2 instances to suit the needs of the workload that runs on them.

**Placement group types**
+  **Cluster** – Packs instances close together in an Availability Zone to optimize for low-latency communication. 
+  **Partition** – Spreads instances across logical partitions to help maximize resilience.
+  **Spread** – Strictly enforces that a small number of instances launch on distinct hardware, which can also help with resiliency.

For more information, see [Placement groups for your Amazon EC2 instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) in the *Amazon Elastic Compute Cloud User Guide*.

We recommend that you include a **cluster** placement group when you configure an AWS PCS compute node group to use Elastic Fabric Adapter (EFA).

**To create a cluster placement group that works with EFA**

1. Create a placement group with the type **cluster** for the compute node group.
   + Use the following AWS CLI command:

     ```
     aws ec2 create-placement-group --strategy cluster --group-name PLACEMENT-GROUP-NAME
     ```
   + You can also use a CloudFormation template to create a placement group. For more information, see [Working with CloudFormation templates](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-guide.html) in the *AWS CloudFormation User Guide*. Download the template from the following URL and upload it into the [CloudFormation console](https://console.aws.amazon.com/cloudformation).

     ```
     https://aws-hpc-recipes.s3.amazonaws.com/main/recipes/pcs/enable_efa/assets/efa-placement-group.yaml
     ```

1.  Include the placement group in the EC2 launch template for the AWS PCS compute node group.
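For step 2, the launch template data might include a `Placement` section like the following sketch. The group name placeholder matches the CLI command in step 1; EC2 launch templates also accept a `GroupId` here.

```
{
    "Placement": {
        "GroupName": "PLACEMENT-GROUP-NAME"
    }
}
```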

# Using Elastic Fabric Adapter (EFA) with AWS PCS
<a name="working-with_networking_efa"></a>

 Elastic Fabric Adapter (EFA) is a high-performance networking interface from AWS that you can attach to your EC2 instances to accelerate High Performance Computing (HPC) and machine learning applications. To enable EFA for applications running on an AWS PCS cluster, configure the compute node group instances to use EFA as follows. 

**Note**  
**Install EFA on an AWS PCS-compatible AMI** – The AMI used in the AWS PCS compute node group must have the EFA driver installed and loaded. For information on how to build a custom AMI with EFA software installed, see [Custom Amazon Machine Images (AMIs) for AWS PCS](working-with_ami_custom.md).

**Contents**
+ [Identify EFA-enabled EC2 instances](working-with_networking_efa_identify-instances.md)
+ [Create a security group to support EFA communications](working-with_networking_efa_create-sg.md)
+ [(Optional) Create a placement group](working-with_networking_efa_create-placement-group.md)
+ [Create or update an EC2 launch template](working-with_networking_efa_create-lt.md)
+ [Create or update compute node groups for EFA](working-with_networking_efa_create-cng.md)
+ [(Optional) Test EFA](working-with_networking_efa_test-efa.md)
+ [(Optional) Use a CloudFormation template to create an EFA-enabled launch template](working-with_networking_efa_create-lt-cfn.md)

# Identify EFA-enabled EC2 instances
<a name="working-with_networking_efa_identify-instances"></a>

To use EFA, all instance types that are allowed for an AWS PCS compute node group must support EFA, and must have the same number of vCPUs (and GPUs, if applicable). For a list of EFA-enabled instances, see [Elastic Fabric Adapter for HPC and ML workloads on Amazon EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types) in the *Amazon Elastic Compute Cloud User Guide*. You can also use the AWS CLI to view a list of instance types that support EFA. Replace *region-code* with the AWS Region where you use AWS PCS, such as `us-east-1`.

```
aws ec2 describe-instance-types \
   --region region-code \
   --filters Name=network-info.efa-supported,Values=true \
   --query "InstanceTypes[*].[InstanceType]" \
   --output text | sort
```

**Note**  
**Determine how many network interfaces are available** – Some EC2 instances have multiple network cards. This allows them to have multiple EFAs. For more information, see [Multiple network interfaces in AWS PCS](working-with_networking_multi-nic.md).

# Create a security group to support EFA communications
<a name="working-with_networking_efa_create-sg"></a>

------
#### [ AWS CLI ]

You can use the following AWS CLI command to create a security group that supports EFA. The command outputs a security group ID. Make the following replacements:
+ `region-code` – Specify the AWS Region where you use AWS PCS, such as `us-east-1`.
+ `vpc-id` – Specify the ID of the VPC that you use for AWS PCS.
+ `efa-group-name` – Provide your chosen name for the security group.

```
aws ec2 create-security-group \
    --group-name efa-group-name \
    --description "Security group to enable EFA traffic" \
    --vpc-id vpc-id \
    --region region-code
```

Use the following commands to attach inbound and outbound security group rules. Make the following replacement: 
+ `efa-secgroup-id` – Provide the ID of the EFA security group you just created. 

```
aws ec2 authorize-security-group-ingress \
    --group-id efa-secgroup-id \
    --protocol -1 \
    --source-group efa-secgroup-id
    
aws ec2 authorize-security-group-egress \
    --group-id efa-secgroup-id \
    --protocol -1 \
    --source-group efa-secgroup-id
```

------
#### [ CloudFormation template ]

You can use a CloudFormation template to create a security group that supports EFA. Download the template from the following URL, then upload it into the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation). 

```
https://aws-hpc-recipes.s3.amazonaws.com/main/recipes/pcs/enable_efa/assets/efa-sg.yaml
```

With the template open in the AWS CloudFormation console, enter the following options.
+ Under **Provide a stack name**
  + Under **Stack name**, enter a name such as `efa-sg-stack`.
+ Under **Parameters**
  + Under **SecurityGroupName**, enter a name such as `efa-sg`.
  + Under **VPC**, select the VPC where you will use AWS PCS.

Finish creating the CloudFormation stack and monitor its status. When it reaches `CREATE_COMPLETE` the EFA security group is ready for use. 

------

# (Optional) Create a placement group
<a name="working-with_networking_efa_create-placement-group"></a>

We recommend that you launch all instances that use EFA in a cluster placement group to minimize the physical distance between them. Create a placement group for each compute node group where you plan to use EFA. See [Placement groups for EC2 instances in AWS PCS](working-with_networking_placement-groups.md) to create a placement group for your compute node group. 

# Create or update an EC2 launch template
<a name="working-with_networking_efa_create-lt"></a>

EFA network interfaces are set up in the EC2 launch template for an AWS PCS compute node group. If there are multiple network cards, multiple EFAs can be configured. The EFA security group and the optional placement group are included in the launch template as well. 

Here is an example launch template for instances with two network cards, such as **hpc7a.96xlarge**. The instances will be launched in `subnet-SubnetID1` in cluster placement group `pg-PlacementGroupId1`.

 Security groups must be added specifically to each EFA interface. Every EFA needs the security group that enables EFA traffic (`sg-EfaSecGroupId`). Other security groups, especially ones that handle regular traffic like SSH or HTTPS, only need to be attached to the primary network interface (designated by a `DeviceIndex` of `0`). Launch templates where network interfaces are defined do not support setting security groups using the `SecurityGroupIds` parameter—you must set a value for `Groups` in each network interface that you configure. 

```
{
    "Placement": {
        "GroupId": "pg-PlacementGroupId1"
    },
    "NetworkInterfaces": [
        {
            "DeviceIndex": 0,
            "InterfaceType": "efa",
            "NetworkCardIndex": 0,
            "SubnetId": "subnet-SubnetId1",
            "Groups": [
                "sg-SecurityGroupId1",
                "sg-EfaSecGroupId"
            ]
        },
        {
            "DeviceIndex": 1,
            "InterfaceType": "efa",
            "NetworkCardIndex": 1,
            "SubnetId": "subnet-SubnetId1"
            "Groups": ["sg-EfaSecGroupId"]
        }
    ]
}
```

# Create or update compute node groups for EFA
<a name="working-with_networking_efa_create-cng"></a>

Your AWS PCS compute node groups must contain instances that have the same number of vCPUs, processor architecture, and EFA support. Configure the compute node group to use the AMI with the EFA software installed on it, and to use the launch template that configures EFA-enabled network interfaces. 

# (Optional) Test EFA
<a name="working-with_networking_efa_test-efa"></a>

You can verify EFA-enabled communication between two nodes in a compute node group by running the `fi_pingpong` program, which is included in the EFA software installation. If this test succeeds, EFA is likely configured properly.

To start, you need two running instances in the compute node group. If your compute node group uses static capacity, instances should already be available. For a compute node group that uses dynamic capacity, you can launch two nodes using the `salloc` command. Here is an example from a cluster with a dynamic node group named `hpc7g` associated with a queue named `all`.

```
% salloc --nodes 2 -p all
salloc: Granted job allocation 6
salloc: Waiting for resource configuration
... a few minutes pass ...
salloc: Nodes hpc7g-[1-2] are ready for job
```

Find the IP addresses of the two allocated nodes using `scontrol`. In the example that follows, the addresses are `10.3.140.69` for `hpc7g-1` and `10.3.132.211` for `hpc7g-2`.

```
% scontrol show nodes hpc7g-[1-2]
NodeName=hpc7g-1 Arch=aarch64 CoresPerSocket=1
   CPUAlloc=0 CPUEfctv=64 CPUTot=64 CPULoad=0.00
   AvailableFeatures=hpc7g
   ActiveFeatures=hpc7g
   Gres=(null)
   NodeAddr=10.3.140.69 NodeHostName=ip-10-3-140-69 Version=25.05.5
   OS=Linux 5.10.218-208.862.amzn2.aarch64 #1 SMP Tue Jun 4 16:52:10 UTC 2024
   RealMemory=124518 AllocMem=0 FreeMem=110763 Sockets=64 Boards=1
   State=IDLE+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=efa
   BootTime=2024-07-02T19:00:09 SlurmdStartTime=2024-07-08T19:33:25
   LastBusyTime=2024-07-08T19:33:25 ResumeAfterTime=None
   CfgTRES=cpu=64,mem=124518M,billing=64
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Maintain Minimum Number Of Instances [root@2024-07-02T18:59:00]
   InstanceId=i-04927897a9ce3c143 InstanceType=hpc7g.16xlarge

NodeName=hpc7g-2 Arch=aarch64 CoresPerSocket=1
   CPUAlloc=0 CPUEfctv=64 CPUTot=64 CPULoad=0.00
   AvailableFeatures=hpc7g
   ActiveFeatures=hpc7g
   Gres=(null)
   NodeAddr=10.3.132.211 NodeHostName=ip-10-3-132-211 Version=25.05.5
   OS=Linux 5.10.218-208.862.amzn2.aarch64 #1 SMP Tue Jun 4 16:52:10 UTC 2024
   RealMemory=124518 AllocMem=0 FreeMem=110759 Sockets=64 Boards=1
   State=IDLE+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=efa
   BootTime=2024-07-02T19:00:09 SlurmdStartTime=2024-07-08T19:33:25
   LastBusyTime=2024-07-08T19:33:25 ResumeAfterTime=None
   CfgTRES=cpu=64,mem=124518M,billing=64
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Maintain Minimum Number Of Instances [root@2024-07-02T18:59:00]
   InstanceId=i-0a2c82623cb1393a7 InstanceType=hpc7g.16xlarge
```

Connect to one of the nodes (in this example, `hpc7g-1`) using SSH (or SSM). Note that this is an internal IP address, so if you use SSH you might need to connect from one of your login nodes. Also be aware that the instance must be configured with an SSH key by way of the compute node group launch template.

```
% ssh ec2-user@10.3.140.69
```

 Now, launch `fi_pingpong` in server mode. 

```
/opt/amazon/efa/bin/fi_pingpong -p efa
```

In a separate terminal session, connect to the second instance (`hpc7g-2`).

```
% ssh ec2-user@10.3.132.211
```

 Run `fi_pingpong` in client mode, connecting to the server on `hpc7g-1`. You should see output that resembles the example below. 

```
% /opt/amazon/efa/bin/fi_pingpong -p efa 10.3.140.69

bytes   #sent   #ack     total       time     MB/sec    usec/xfer   Mxfers/sec
64      10      =10      1.2k        0.00s      3.08      20.75       0.05
256     10      =10      5k          0.00s     21.24      12.05       0.08
1k      10      =10      20k         0.00s     82.91      12.35       0.08
4k      10      =10      80k         0.00s    311.48      13.15       0.08
[error] util/pingpong.c:1876: fi_close (-22) fid 0
```

# (Optional) Use a CloudFormation template to create an EFA-enabled launch template
<a name="working-with_networking_efa_create-lt-cfn"></a>

Because setting up EFA involves several dependencies, AWS provides a CloudFormation template that you can use to create an EFA-enabled launch template for a compute node group. It supports instances with up to four network cards. To learn more about instances with multiple network cards, see [Elastic network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#network-cards) in the *Amazon Elastic Compute Cloud User Guide*.

Download the CloudFormation template from the following URL, then upload it to the CloudFormation console in the AWS Region where you use AWS PCS. 

```
https://aws-hpc-recipes.s3.amazonaws.com/main/recipes/pcs/enable_efa/assets/pcs-lt-efa.yaml
```

With the template open in the CloudFormation console, enter the following values. The template provides default values for some parameters; you can leave those unchanged.
+ Under **Provide a stack name**
  + Under **Stack name**, enter a descriptive name. We recommend incorporating the name you will choose for your AWS PCS compute node group, such as `NODEGROUPNAME-efa-lt`.
+ Under **Parameters**
  + Under **NumberOfNetworkCards**, choose the number of network cards in the instances that will be in your node group.
  + Under **VpcId**, choose the VPC where your AWS PCS cluster is deployed.
  + Under **NodeGroupSubnetId**, choose the subnet in your cluster VPC where EFA-enabled instances will be launched.
  + Under **PlacementGroupName**, leave the field blank to create a new cluster placement group for the node group. If you have an existing placement group you want to use, enter its name here.
  + Under **ClusterSecurityGroupId**, choose the security group you are using to allow access to other instances in the cluster and to the AWS PCS API. Many customers choose the default security group from their cluster VPC.
  + Under **SshSecurityGroupId**, provide the ID for a security group you are using to allow inbound SSH access to nodes in your cluster.
  + For **SshKeyName**, select the SSH keypair for access to nodes in your cluster.
  + For **LaunchTemplateName**, enter a descriptive name for the launch template such as `NODEGROUPNAME-efa-lt`. The name must be unique to your AWS account in the AWS Region where you will use AWS PCS.
+ Under **Capabilities**
  + Check the box for **I acknowledge that AWS CloudFormation might create IAM resources**.
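
If you prefer to script the console workflow above, the same stack can in principle be created with the CloudFormation `CreateStack` API. The sketch below builds the request as keyword arguments for the `boto3` CloudFormation client; all parameter values are placeholders, and the API call itself is commented out because it requires AWS credentials.

```python
# Sketch of a CloudFormation CreateStack request for the EFA launch
# template recipe. All parameter values below are placeholders.
template_url = (
    "https://aws-hpc-recipes.s3.amazonaws.com"
    "/main/recipes/pcs/enable_efa/assets/pcs-lt-efa.yaml"
)
stack_parameters = {
    "NumberOfNetworkCards": "2",
    "VpcId": "vpc-VpcId1",
    "NodeGroupSubnetId": "subnet-SubnetId1",
    # Leave PlacementGroupName blank to create a new cluster placement group
    "PlacementGroupName": "",
    "ClusterSecurityGroupId": "sg-ClusterSecGroupId",
    "SshSecurityGroupId": "sg-SshSecGroupId",
    "SshKeyName": "my-keypair",
    "LaunchTemplateName": "mygroup-efa-lt",
}
create_stack_request = {
    "StackName": "mygroup-efa-lt",
    "TemplateURL": template_url,
    "Parameters": [
        {"ParameterKey": k, "ParameterValue": v}
        for k, v in stack_parameters.items()
    ],
    # Equivalent to acknowledging IAM resource creation in the console
    "Capabilities": ["CAPABILITY_IAM"],
}
# import boto3
# boto3.client("cloudformation").create_stack(**create_stack_request)
```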

Monitor the status of the CloudFormation stack. When it reaches `CREATE_COMPLETE`, the launch template is ready to use. Use it with an AWS PCS compute node group, as described earlier in [Create or update compute node groups for EFA](working-with_networking_efa_create-cng.md).