Monitor an Elastic Fabric Adapter on Amazon EC2 - Amazon Elastic Compute Cloud

Monitor an Elastic Fabric Adapter on Amazon EC2

You can use the following features to monitor the performance of your Elastic Fabric Adapters.

EFA driver metrics for an Amazon EC2 instance

The Elastic Fabric Adapter (EFA) driver publishes multiple metrics from the instances that have EFA interfaces attached. You can use these metrics to troubleshoot application performance issues, choose the right cluster size for a workload, plan scaling activities proactively, and benchmark applications to determine whether they maximize the EFA performance available on an instance.

Available EFA driver metrics

The EFA driver publishes the following metrics to the instance in real time. They provide the cumulative number of errors and packets or bytes sent, received, or dropped by the attached EFA devices since instance launch or the last driver reset.

Metric Description
tx_bytes

The number of bytes transmitted.

Unit: bytes

rx_bytes

The number of bytes received.

Unit: bytes

tx_pkts

The number of packets transmitted.

Unit: count

rx_pkts

The number of packets received.

Unit: count

rx_drops

The number of packets that were received and then dropped.

Unit: count

send_bytes

The number of bytes sent using send operations.

Unit: bytes

recv_bytes

The number of bytes received from remote send operations.

Unit: bytes

send_wrs

The number of packets sent using send operations.

Unit: count

recv_wrs

The number of packets received from remote send operations.

Unit: count

rdma_write_wrs

The number of completed RDMA write operations.

Unit: count

rdma_read_wrs

The number of completed RDMA read operations.

Unit: count

rdma_write_bytes

The number of bytes written to other instances using RDMA write operations.

Unit: bytes

rdma_read_bytes

The number of bytes received using RDMA read operations.

Unit: bytes

rdma_write_wr_err

The number of RDMA write operations that had local or remote errors.

Unit: count

rdma_read_wr_err

The number of RDMA read operations that had local or remote errors.

Unit: count

rdma_read_resp_bytes

The number of bytes sent in response to RDMA read operations.

Unit: bytes

rdma_write_recv_bytes

The number of bytes written to this instance by other instances using RDMA write operations.

Unit: bytes
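Because every metric above is a running total, a throughput or drop rate comes from the difference between two readings taken a known interval apart. The following Python sketch shows that arithmetic. It assumes each reading is a dict that maps metric names to integer values (for example, values collected with the commands in the next section). The helper names are illustrative, and treating a counter that goes down as a driver reset is a heuristic, not documented driver behavior.

```python
from typing import Dict


def counter_deltas(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, int]:
    """Per-metric increase between two readings of cumulative EFA counters."""
    deltas: Dict[str, int] = {}
    for name, value in curr.items():
        before = prev.get(name, 0)  # a metric missing from prev counts as 0
        if value >= before:
            deltas[name] = value - before
        else:
            # Heuristic: the counter went down, so assume a driver reset happened
            # between the readings and count only what accumulated since then.
            deltas[name] = value
    return deltas


def rates_per_second(prev: Dict[str, int], curr: Dict[str, int],
                     interval_s: float) -> Dict[str, float]:
    """Average per-second rate of each counter over the sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {name: delta / interval_s
            for name, delta in counter_deltas(prev, curr).items()}


if __name__ == "__main__":
    before = {"tx_bytes": 1_000, "rx_drops": 5}
    after = {"tx_bytes": 21_000, "rx_drops": 2}  # rx_drops went down: treated as a reset
    print(rates_per_second(before, after, 10.0))  # {'tx_bytes': 2000.0, 'rx_drops': 0.2}
```

If a reset happens between readings, any increments recorded before the reset are lost, so shorter sampling intervals keep that error small.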

Retrieve EFA driver metrics for your instance

You can use the rdma command line tool to retrieve the metrics for all EFA interfaces attached to an instance, as follows:

$ rdma -p statistic show
link rdmap0s31/1
    tx_bytes 0
    tx_pkts 0
    rx_bytes 0
    rx_pkts 0
    rx_drops 0
    send_bytes 0
    send_wrs 0
    recv_bytes 0
    recv_wrs 0
    rdma_read_wrs 0
    rdma_read_bytes 0
    rdma_read_wr_err 0
    rdma_read_resp_bytes 0
    rdma_write_wrs 0
    rdma_write_bytes 0
    rdma_write_wr_err 0
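To feed these values into your own monitoring, you can parse the tool's output. The following Python sketch assumes the output consists of a `link <device>/<port>` token pair followed by counter name and value pairs, as in the example above. The function names are hypothetical, and `read_rdma_statistics` assumes the `rdma` binary is on the PATH of the instance.

```python
import subprocess
from typing import Dict, Optional


def parse_rdma_statistic(output: str) -> Dict[str, Dict[str, int]]:
    """Parse `rdma statistic show` text into {link: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    tokens = output.split()  # splitting on whitespace ignores line breaks and indentation
    current: Optional[str] = None
    i = 0
    while i < len(tokens):
        if tokens[i] == "link" and i + 1 < len(tokens):
            current = tokens[i + 1]  # device/port, for example "rdmap0s31/1"
            stats[current] = {}
            i += 2
        elif current is not None and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            stats[current][tokens[i]] = int(tokens[i + 1])
            i += 2
        else:
            i += 1  # skip any token that is not part of a name/value pair
    return stats


def read_rdma_statistics() -> Dict[str, Dict[str, int]]:
    """Run the rdma tool on the instance and parse its output."""
    result = subprocess.run(["rdma", "statistic", "show"],
                            capture_output=True, text=True, check=True)
    return parse_rdma_statistic(result.stdout)
```

Keying the result by link keeps counters from multiple EFA interfaces separate, which matters on instances with more than one EFA attached.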

Alternatively, you can retrieve the metrics for each EFA interface attached to an instance from its sysfs files using the following command, where device_number and port_number identify the EFA device and port.

$ more /sys/class/infiniband/device_number/ports/port_number/hw_counters/* | cat

For example:

$ more /sys/class/infiniband/rdmap0s31/ports/1/hw_counters/* | cat
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/lifespan
::::::::::::::
12
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_resp_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_wr_err
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_read_wrs
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_recv_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_wr_err
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rdma_write_wrs
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/recv_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/recv_wrs
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_drops
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/rx_pkts
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/send_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/send_wrs
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/tx_bytes
::::::::::::::
0
::::::::::::::
/sys/class/infiniband/rdmap0s31/ports/1/hw_counters/tx_pkts
::::::::::::::
0
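Reading the sysfs files directly avoids parsing command output. The following Python sketch reads every file in a device's hw_counters directory into a dict. It assumes the /sys/class/infiniband/device_number/ports/port_number/hw_counters layout shown above and that each file holds a single integer. The function names and the default port of 1 are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict, List

SYSFS_ROOT = "/sys/class/infiniband"


def list_devices(root: str = SYSFS_ROOT) -> List[str]:
    """Names of the RDMA devices (for example rdmap0s31) found under sysfs."""
    base = Path(root)
    return sorted(p.name for p in base.iterdir()) if base.is_dir() else []


def read_hw_counters(device: str, port: int = 1, root: str = SYSFS_ROOT) -> Dict[str, int]:
    """Read every hw_counters file for one device port into {name: value}."""
    counters_dir = Path(root) / device / "ports" / str(port) / "hw_counters"
    counters: Dict[str, int] = {}
    for path in sorted(counters_dir.iterdir()):
        if not path.is_file():
            continue
        try:
            counters[path.name] = int(path.read_text().strip())
        except (ValueError, OSError):
            continue  # skip files that are unreadable or do not hold a single integer
    return counters
```

For example, `{dev: read_hw_counters(dev) for dev in list_devices()}` collects one reading per device. The directory also contains `lifespan`, which is not one of the metrics listed earlier, so you may want to keep only the names you track.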

Amazon VPC flow logs

You can create an Amazon VPC flow log to capture information about the traffic going to and from an EFA. Flow log data can be published to Amazon CloudWatch Logs or Amazon S3. After you create a flow log, you can retrieve and view its data in the chosen destination. For more information, see VPC Flow Logs in the Amazon VPC User Guide.

You create a flow log for an EFA in the same way that you create a flow log for an elastic network interface. For more information, see Create a flow log in the Amazon VPC User Guide.
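If you create flow logs programmatically, the request for an EFA has the same shape as the request for any other network interface. The following Python sketch builds parameters for boto3's `create_flow_logs` call (the EC2 CreateFlowLogs API) with an Amazon S3 destination. The S3 destination, the helper names, and the example IDs are assumptions for illustration; a CloudWatch Logs destination needs different parameters, such as an IAM role that permits log delivery.

```python
from typing import Any, Dict, List


def efa_flow_log_request(eni_id: str, bucket_arn: str,
                         traffic_type: str = "ALL") -> Dict[str, Any]:
    """Build CreateFlowLogs parameters that deliver an EFA's flow log to S3."""
    if traffic_type not in ("ACCEPT", "REJECT", "ALL"):
        raise ValueError(f"unsupported traffic type: {traffic_type}")
    return {
        "ResourceType": "NetworkInterface",  # the EFA is targeted by its interface ID
        "ResourceIds": [eni_id],
        "TrafficType": traffic_type,
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
    }


def create_efa_flow_log(ec2_client: Any, eni_id: str, bucket_arn: str) -> List[str]:
    """Call create_flow_logs on a boto3 EC2 client and return the new flow log IDs."""
    response = ec2_client.create_flow_logs(**efa_flow_log_request(eni_id, bucket_arn))
    return response.get("FlowLogIds", [])
```

With boto3 installed, you would pass `boto3.client("ec2")` as `ec2_client`, together with your own interface ID and bucket ARN. Passing the client in, rather than creating it inside the helper, keeps the request logic testable without AWS credentials.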

In the flow log entries, EFA traffic is identified by the srcAddress and destAddress, which are both formatted as MAC addresses, as shown in the following example.

version accountId eniId srcAddress destAddress sourcePort destPort protocol packets bytes start end action log-status
2 3794735123 eni-10000001 01:23:45:67:89:ab 05:23:45:67:89:ab - - - 9 5689 1521232534 1524512343 ACCEPT OK
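To pick EFA records out of a flow log, you can check whether both addresses are MAC-formatted. The following Python sketch assumes records use the space-separated, 14-field default format shown in the example above, with `-` marking an empty field; the field names follow the example header, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Field order of the flow log format shown in the example above.
FIELDS = ["version", "accountId", "eniId", "srcAddress", "destAddress",
          "sourcePort", "destPort", "protocol", "packets", "bytes",
          "start", "end", "action", "log-status"]
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}$")


def parse_record(line: str) -> Dict[str, Optional[str]]:
    """Split one space-separated flow log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # "-" means the field has no value, as for ports and protocol in EFA records.
    return {name: (None if v == "-" else v) for name, v in zip(FIELDS, values)}


def is_efa_record(record: Dict[str, Optional[str]]) -> bool:
    """True when both addresses are MAC-formatted, which identifies EFA traffic."""
    src, dst = record["srcAddress"], record["destAddress"]
    return bool(src and dst and MAC_RE.match(src) and MAC_RE.match(dst))
```

Records from a flow log with a custom format have a different field list, so FIELDS would need to match that format instead.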

Amazon CloudWatch

If you are using EFA in an Amazon EKS cluster, you can monitor your EFAs using CloudWatch Container Insights. For more information, see Amazon EKS and Kubernetes Container Insights metrics in the Amazon CloudWatch User Guide.