Configuring EFA clients - FSx for Lustre

To access an FSx for Lustre file system using an EFA interface, you must install the Lustre EFA modules and configure EFA interfaces. EFA is currently supported on Lustre clients running Ubuntu 22 with a kernel version of 6.8 or higher. For steps to install the EFA driver, see Step 3: Install the EFA software in the Amazon EC2 User Guide.
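
Before installing anything, it can help to confirm that the client kernel meets the 6.8 minimum. The following is a minimal sketch, not part of the official procedure: the version_ge helper is a hypothetical name introduced here, and it uses the same sort -V comparison technique that the configuration script below applies to the EFA driver version.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Drop any suffix such as "-1021-aws" so that only the numeric part is compared.
kernel_version="$(uname -r | sed 's/[^0-9.].*//')"

if version_ge "$kernel_version" "6.8"; then
    echo "Kernel $kernel_version meets the 6.8 minimum"
else
    echo "Kernel $kernel_version is older than 6.8"
fi
```

The check only reports the result; it does not change the system.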

To configure your client instance on an EFA-enabled file system
  1. Connect to your Amazon EC2 instance.

  2. Copy the following script and save it as a file named configure-efa-fsx-lustre-client.sh.

    #!/bin/bash
    PATH=/sbin:/bin:/usr/sbin:/usr/bin

    echo "Started ${0} at $(date)"

    eth_intf="$(ip -br -4 a sh | grep $(hostname -i)/ | awk '{print $1}')"
    efa_version=$(modinfo efa | awk '/^version:/ {print $2}' | sed 's/[^0-9.]//g')
    min_efa_version="2.12.1"

    # Check the EFA driver version. Minimum v2.12.1 supported
    if [[ -z "$efa_version" ]]; then
        echo "Error: EFA driver not found"
        exit 1
    fi

    if [[ "$(printf '%s\n' "$min_efa_version" "$efa_version" | sort -V | head -n1)" != "$min_efa_version" ]]; then
        echo "Error: EFA driver version $efa_version does not meet the minimum requirement $min_efa_version"
        exit 1
    else
        echo "Using EFA driver version $efa_version"
    fi

    echo "Loading Lustre/EFA modules..."
    sudo /sbin/modprobe lnet
    sudo /sbin/modprobe kefalnd ipif_name="$eth_intf"
    sudo /sbin/modprobe ksocklnd
    sudo lnetctl lnet configure

    echo "Configuring TCP interface..."
    sudo lnetctl net del --net tcp 2> /dev/null
    sudo lnetctl net add --net tcp --if $eth_intf

    # For P5 instance type which supports 32 network cards,
    # by default add 8 EFA interfaces selecting every 4th device (1 per PCI bus)
    echo "Configuring EFA interface(s)..."
    instance_type="$(ec2-metadata --instance-type | awk '{ print $2 }')"
    num_efa_devices="$(ls -1 /sys/class/infiniband | wc -l)"
    echo "Found $num_efa_devices available EFA device(s)"

    if [[ "$instance_type" == "p5.48xlarge" || "$instance_type" == "p5e.48xlarge" ]]; then
        for intf in $(ls -1 /sys/class/infiniband | awk 'NR % 4 == 1'); do
            sudo lnetctl net add --net efa --if $intf --peer-credits 32
        done
    else
        # Other instances: Configure 2 EFA interfaces by default if the instance supports multiple network cards,
        # or 1 interface for single network card instances
        # Can be modified to add more interfaces if instance type supports it
        sudo lnetctl net add --net efa --if $(ls -1 /sys/class/infiniband | head -n1) --peer-credits 32
        if [[ $num_efa_devices -gt 1 ]]; then
            sudo lnetctl net add --net efa --if $(ls -1 /sys/class/infiniband | tail -n1) --peer-credits 32
        fi
    fi

    echo "Setting discovery and UDSP rule"
    sudo lnetctl set discovery 1
    sudo lnetctl udsp add --src efa --priority 0
    sudo /sbin/modprobe lustre

    sudo lnetctl net show
    echo "Added $(sudo lnetctl net show | grep -c '@efa') EFA interface(s)"
  3. Make the script executable, install the packages it depends on, and run the EFA configuration script.

    sudo chmod +x configure-efa-fsx-lustre-client.sh
    sudo apt-get install amazon-ec2-utils cron
    ./configure-efa-fsx-lustre-client.sh
  4. Use the following example commands to set up a cron job that automatically reconfigures EFA on client instances after they are rebooted. The @reboot line is a crontab entry to add in the editor that sudo crontab -e opens, not a shell command:

    sudo crontab -e
    @reboot /path/to/configure-efa-fsx-lustre-client.sh > /var/log/configure-efa-fsx-lustre-client-output.log
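
After saving the crontab, you may want to confirm that the reboot entry is actually present. The following sketch assumes the entry was added to the root crontab as in the procedure above; has_reboot_entry is a hypothetical helper name, and it only inspects crontab text passed on standard input.

```shell
#!/bin/bash
# Hypothetical helper: reads crontab text on stdin and succeeds if it contains
# an uncommented @reboot entry that runs the EFA configuration script.
has_reboot_entry() {
    grep -Eq '^[[:space:]]*@reboot[[:space:]].*configure-efa-fsx-lustre-client\.sh'
}

# Usage on a client instance (not executed here):
#   sudo crontab -l | has_reboot_entry && echo "reboot entry present"
```

Because the helper reads from standard input, it can also be used against a saved copy of a crontab file.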

Adding or removing EFA interfaces

Each FSx for Lustre file system has a maximum limit of 1024 EFA connections across all client instances.

The configure-efa-fsx-lustre-client.sh script automatically configures the number of Elastic Fabric Adapter (EFA) interfaces on an EC2 instance based on the instance type. For P5 instances (p5.48xlarge or p5e.48xlarge), it configures 8 EFA interfaces by default. For other instances with multiple network cards, it configures 2 EFA interfaces. For instances with a single network card, it configures 1 EFA interface. When a client instance connects to an FSx for Lustre file system, each EFA interface configured on the client instance counts against the 1024 EFA connection limit.
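
The selection rules described above can be sketched in isolation, without touching LNet. In this illustration, select_efa_devices is a hypothetical function and the names dev_1 through dev_32 are placeholders for the output of ls -1 /sys/class/infiniband; real device names differ.

```shell
#!/bin/bash
# Hypothetical helper: prints the devices that the script's default logic would pick.
# $1 = instance type; device names are read from stdin, one per line.
select_efa_devices() {
    local instance_type="$1"
    local devices
    devices="$(cat)"
    if [[ "$instance_type" == "p5.48xlarge" || "$instance_type" == "p5e.48xlarge" ]]; then
        # Every 4th device, starting with the first one.
        printf '%s\n' "$devices" | awk 'NR % 4 == 1'
    else
        # The first device, plus the last one when more than one device exists.
        printf '%s\n' "$devices" | head -n1
        if [[ "$(printf '%s\n' "$devices" | wc -l)" -gt 1 ]]; then
            printf '%s\n' "$devices" | tail -n1
        fi
    fi
}

# Placeholder device list standing in for `ls -1 /sys/class/infiniband`.
seq -f 'dev_%g' 1 32 | select_efa_devices p5.48xlarge
```

With 32 placeholder devices and a P5 instance type, the sketch prints 8 names (dev_1, dev_5, and so on), which matches the default of 8 EFA interfaces.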

Client instances with more EFA interfaces typically support higher levels of throughput per client instance compared to client instances with fewer EFA interfaces. As long as you do not exceed the EFA connection limit, you can modify the script to increase or decrease the number of EFA interfaces per instance to optimize per-client throughput performance for your workloads.
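
As a rough planning aid, the connection limit can be turned into a client count. This sketch assumes every client configures the same number of EFA interfaces, which may not hold for a mixed fleet; max_clients is a hypothetical helper.

```shell
#!/bin/bash
# Limit stated in this section: 1024 EFA connections per file system.
EFA_CONNECTION_LIMIT=1024

# Hypothetical helper: prints how many clients fit under the limit
# when each client configures $1 EFA interfaces.
max_clients() {
    local interfaces_per_client="$1"
    echo $(( EFA_CONNECTION_LIMIT / interfaces_per_client ))
}

for n in 1 2 8; do
    echo "$n interface(s) per client: up to $(max_clients "$n") clients"
done
```

For example, with the P5 default of 8 interfaces per client, the limit is reached at 128 clients.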

To add an EFA interface:

    sudo lnetctl net add --net efa --if device_name --peer-credits 32

Here, device_name is one of the devices listed by ls -1 /sys/class/infiniband.

To delete an EFA interface:

    sudo lnetctl net del --net efa --if device_name
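
When changing several interfaces at once, it can be safer to generate the commands first and review them before running anything. The following dry-run sketch only prints the lnetctl commands shown above; efa_net_commands is a hypothetical helper, and dev_a and dev_b are placeholder device names.

```shell
#!/bin/bash
# Hypothetical helper: prints (does not run) one lnetctl command per device.
# $1 = "add" or "del"; the remaining arguments are device names.
efa_net_commands() {
    local action="$1"
    shift
    local dev
    for dev in "$@"; do
        if [[ "$action" == "add" ]]; then
            echo "sudo lnetctl net add --net efa --if $dev --peer-credits 32"
        else
            echo "sudo lnetctl net del --net efa --if $dev"
        fi
    done
}

# Placeholder names; use names from `ls -1 /sys/class/infiniband` on a real client.
efa_net_commands add dev_a dev_b
```

Once the printed commands look right, they can be run individually on the client instance.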

To install the NVIDIA GPUDirect Storage driver on your client instance

To use GPUDirect Storage on FSx for Lustre, you must use an Amazon EC2 P5 client instance and NVIDIA GDS driver release 2.24.2 or higher.

  1. Clone the NVIDIA/gds-nvidia-fs repository, which is available on GitHub.

    git clone https://github.com/NVIDIA/gds-nvidia-fs.git
  2. After cloning the repository, use the following commands to build and load the driver:

    cd gds-nvidia-fs/src
    export NVFS_MAX_PEER_DEVS=128
    export NVFS_MAX_PCI_DEPTH=16
    sudo -E make
    sudo insmod nvidia-fs.ko
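
After loading the driver, you may want to check that the kernel module is present. The sketch below assumes the module appears in lsmod output as nvidia_fs (the kernel reports module names with underscores in place of hyphens); module_loaded is a hypothetical helper that reads lsmod-style text from standard input.

```shell
#!/bin/bash
# Hypothetical helper: reads lsmod-style output on stdin and succeeds
# if a module named exactly $1 appears in the first column.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on a client instance (not executed here):
#   lsmod | module_loaded nvidia_fs && echo "nvidia-fs driver is loaded"
```

The same helper can be pointed at other modules named in this procedure, such as lnet.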