Monitoring with Amazon CloudWatch - FSx for Lustre

You can monitor file systems using Amazon CloudWatch, which collects and processes raw data from Amazon FSx for Lustre into readable, near real-time metrics. These statistics are retained for 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing. By default, Amazon FSx for Lustre metric data is automatically sent to CloudWatch at 1-minute intervals. For more information about CloudWatch, see What Is Amazon CloudWatch? in the Amazon CloudWatch User Guide.

CloudWatch metrics are reported as raw Bytes. Bytes are not rounded to either a decimal or binary multiple of the unit.

File system metrics

FSx for Lustre publishes the following metrics into the FSx namespace in CloudWatch. For each metric, FSx for Lustre emits a data point per disk per minute. To view aggregate file system details, you can use the Sum statistic. Note that the file servers behind your FSx for Lustre file systems are spread across multiple disks.
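As a rough sketch of how the aggregate Sum could be requested, the following Python builds keyword arguments for the boto3 CloudWatch client's get_metric_statistics call. The helper name build_sum_query, the sample file system ID, and the AWS/FSx namespace string are assumptions made for illustration. The actual call appears only in a comment because it needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def build_sum_query(file_system_id, metric_name, end_time, minutes=60):
    """Build kwargs for CloudWatch get_metric_statistics (hypothetical helper)."""
    return {
        "Namespace": "AWS/FSx",  # assumed namespace string for FSx metrics
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": 60,  # one aggregated data point per minute
        "Statistics": ["Sum"],  # Sum aggregates the per-disk data points
    }


query = build_sum_query(
    "fs-01234567890123456",  # placeholder file system ID
    "DataReadBytes",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)

# With credentials configured, the query could be sent like this:
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_statistics(**query)
```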

Metric Description
DataReadBytes

The number of bytes for file system read operations.

The Sum statistic is the total number of bytes associated with read operations during the period. The Minimum statistic is the minimum number of bytes associated with read operations on a single disk. The Maximum statistic is the maximum number of bytes associated with read operations on a single disk. The Average statistic is the average number of bytes associated with read operations per disk. The SampleCount statistic is the number of disks.

To calculate the average throughput (bytes per second) for a period, divide the Sum statistic by the number of seconds in the period.

Units:

  • Bytes for Sum, Minimum, Maximum, and Average.

  • Count for SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount

DataWriteBytes

The number of bytes for file system write operations.

The Sum statistic is the total number of bytes associated with write operations during the period. The Minimum statistic is the minimum number of bytes associated with write operations on a single disk. The Maximum statistic is the maximum number of bytes associated with write operations on a single disk. The Average statistic is the average number of bytes associated with write operations per disk. The SampleCount statistic is the number of disks.

To calculate the average throughput (bytes per second) for a period, divide the Sum statistic by the number of seconds in the period.

Units:

  • Bytes for Sum, Minimum, Maximum, and Average.

  • Count for SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount
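The throughput calculation described for DataReadBytes and DataWriteBytes can be sketched as follows. The function name and the sample values are hypothetical; only the division of the Sum statistic by the period length comes from the descriptions above.

```python
def average_throughput(sum_bytes, period_seconds):
    """Return average throughput in bytes per second for one period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return sum_bytes / period_seconds


# Hypothetical Sum values for a single 60-second period.
read_bps = average_throughput(6_000_000, 60)
write_bps = average_throughput(3_000_000, 60)

print(f"read:  {read_bps:.0f} bytes/s")
print(f"write: {write_bps:.0f} bytes/s")
```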

DataReadOperations

The number of read operations.

The Sum statistic is the total number of read operations. The Minimum statistic is the minimum number of read operations on a single disk. The Maximum statistic is the maximum number of read operations on a single disk. The Average statistic is the average number of read operations per disk. The SampleCount statistic is the number of disks.

To calculate the average number of read operations (operations per second) for a period, divide the Sum statistic by the number of seconds in the period.

Units:

  • Count for Sum, Minimum, Maximum, Average, and SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount

DataWriteOperations

The number of write operations.

The Sum statistic is the total number of write operations. The Minimum statistic is the minimum number of write operations on a single disk. The Maximum statistic is the maximum number of write operations on a single disk. The Average statistic is the average number of write operations per disk. The SampleCount statistic is the number of disks.

To calculate the average number of write operations (operations per second) for a period, divide the Sum statistic by the number of seconds in the period.

Units:

  • Count for Sum, Minimum, Maximum, Average, and SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount

MetadataOperations

The number of metadata operations.

The Sum statistic is the count of metadata operations. The Minimum statistic is the minimum number of metadata operations per disk. The Maximum statistic is the maximum number of metadata operations per disk. The Average statistic is the average number of metadata operations per disk. The SampleCount statistic is the number of disks.

To calculate the average number of metadata operations (operations per second) for a period, divide the Sum statistic by the number of seconds in the period.

Units:

  • Count for Sum, Minimum, Maximum, Average, and SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount
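The operations-per-second calculation for DataReadOperations, DataWriteOperations, and MetadataOperations might look like the sketch below. The sample Sum values are made up, and adding the three rates into one combined figure is a choice made for this example rather than something the metric descriptions prescribe.

```python
def operations_per_second(sum_operations, period_seconds):
    """Return the average operations per second for one period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return sum_operations / period_seconds


# Hypothetical Sum values for the same 60-second period.
sums = {
    "DataReadOperations": 1200,
    "DataWriteOperations": 600,
    "MetadataOperations": 300,
}
period = 60

rates = {name: operations_per_second(total, period) for name, total in sums.items()}
total_iops = sum(rates.values())  # combined rate (an assumption of this sketch)

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} ops/s")
print(f"combined: {total_iops:.1f} ops/s")
```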

FreeDataStorageCapacity

The amount of available storage capacity.

The Sum statistic is the total number of bytes available in the file system. The Minimum statistic is the total number of bytes available in the fullest disk. The Maximum statistic is the total number of bytes available in the disk with the most remaining available storage. The Average statistic is the average number of bytes available per disk. The SampleCount statistic is the number of disks.

Units:

  • Bytes for Sum, Minimum, Maximum, and Average.

  • Count for SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount
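To illustrate how the statistics relate to the per-disk data points, the sketch below derives each one from a hypothetical list of free bytes per disk. The three values are invented for the example; in practice CloudWatch performs this aggregation.

```python
# Hypothetical free capacity, in bytes, reported by three disks in one minute.
free_bytes_per_disk = [400_000_000_000, 200_000_000_000, 300_000_000_000]

stats = {
    "Sum": sum(free_bytes_per_disk),          # free bytes in the whole file system
    "Minimum": min(free_bytes_per_disk),      # free bytes on the fullest disk
    "Maximum": max(free_bytes_per_disk),      # free bytes on the emptiest disk
    "Average": sum(free_bytes_per_disk) / len(free_bytes_per_disk),
    "SampleCount": len(free_bytes_per_disk),  # number of disks
}

for name, value in stats.items():
    print(f"{name}: {value}")
```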

LogicalDiskUsage

The amount of logical data stored (uncompressed).

The Sum statistic is the total number of logical bytes stored in the file system. The Minimum statistic is the smallest number of logical bytes stored in a disk in the file system. The Maximum statistic is the largest number of logical bytes stored in a disk in the file system. The Average statistic is the average number of logical bytes stored per disk. The SampleCount statistic is the number of disks.

Units:

  • Bytes for Sum, Minimum, Maximum, and Average.

  • Count for SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount

PhysicalDiskUsage

The amount of storage physically occupied by file system data (compressed).

The Sum statistic is the total number of bytes occupied in disks in the file system. The Minimum statistic is the total number of bytes occupied in the emptiest disk. The Maximum statistic is the total number of bytes occupied in the fullest disk. The Average statistic is the average number of bytes occupied per disk. The SampleCount statistic is the number of disks.

Units:

  • Bytes for Sum, Minimum, Maximum, and Average.

  • Count for SampleCount.

Valid statistics: Sum, Minimum, Maximum, Average, SampleCount
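Because LogicalDiskUsage reports uncompressed bytes and PhysicalDiskUsage reports compressed bytes, comparing their Sum statistics gives a rough view of how much space compression saves. The formula, function name, and sample values below are assumptions made for this sketch, not a calculation defined by the metric descriptions.

```python
def compression_savings(logical_bytes, physical_bytes):
    """Return the fraction of space saved by compression (0.0 means none)."""
    if logical_bytes <= 0:
        return 0.0  # nothing stored, so there is nothing to save
    return 1 - physical_bytes / logical_bytes


# Hypothetical Sum values taken at the same timestamp.
logical_sum = 1_000_000_000_000   # LogicalDiskUsage, uncompressed
physical_sum = 750_000_000_000    # PhysicalDiskUsage, compressed

savings = compression_savings(logical_sum, physical_sum)
print(f"space saved by compression: {savings:.0%}")
```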

File system metadata metrics

FSx for Lustre publishes the following file system metadata metrics into the FSx namespace in CloudWatch. These metrics use dimensions to enable more granular measurements of your metadata. All metadata metrics have the FileSystemId and StorageTargetId dimensions. File system metadata metrics are exposed only if your file system has a metadata configuration specified.

Metric Description
DiskReadOperations

The number of read operations for the file server accessing storage volumes. All traffic is considered in this metric, including background tasks. There is one metric emitted each minute for each of your file system's storage volumes.

The Sum statistic is the total number of read operations performed by the given storage volume over the specified period.

The Average statistic is the average number of read operations performed each minute by the given storage volume over the specified period.

The Minimum statistic is the lowest number of read operations performed each minute by the given storage volume over the specified period.

The Maximum statistic is the highest number of read operations performed each minute by the given storage volume over the specified period.

To calculate average metadata disk IOPS over the period, use the Average statistic and divide the result by 60 (seconds).

Units: Count

Valid statistics: Sum, Average, Minimum, and Maximum

DiskWriteOperations

The number of write operations for the file server accessing storage volumes. All traffic is considered in this metric, including background tasks. There is one metric emitted each minute for each of your file system's storage volumes.

The Sum statistic is the total number of write operations performed by the given storage volume over the specified period.

The Average statistic is the average number of write operations performed each minute by the given storage volume over the specified period.

To calculate average metadata disk IOPS over the period, use the Average statistic and divide the result by 60 (seconds).

Units: Count

Valid statistics: Sum and Average
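The metadata disk IOPS calculation for DiskReadOperations and DiskWriteOperations can be sketched as below. The sample Average values are hypothetical, and adding read and write IOPS into a total is a choice made for this example.

```python
SECONDS_PER_MINUTE = 60


def metadata_disk_iops(avg_read_ops_per_minute, avg_write_ops_per_minute):
    """Convert per-minute Average statistics into read, write, and total IOPS."""
    read_iops = avg_read_ops_per_minute / SECONDS_PER_MINUTE
    write_iops = avg_write_ops_per_minute / SECONDS_PER_MINUTE
    return read_iops, write_iops, read_iops + write_iops


# Hypothetical Average statistics for one storage volume.
read_iops, write_iops, total_iops = metadata_disk_iops(3000, 1500)
print(f"read: {read_iops:.1f}, write: {write_iops:.1f}, total: {total_iops:.1f} IOPS")
```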

FileCreateOperations

Total number of file create operations.

Unit: Count

FileOpenOperations

Total number of file open operations.

Unit: Count

FileDeleteOperations

Total number of file delete operations.

Unit: Count

StatOperations

Total number of stat operations.

Unit: Count

RenameOperations

Total number of directory renames, whether in-place directory renames or cross-directory renames.

Unit: Count

AutoImport and AutoExport metrics

FSx for Lustre publishes the following AutoImport (automatic import) and AutoExport (automatic export) metrics into the FSx namespace in CloudWatch. These metrics use dimensions to enable more granular measurements of your data. All AutoImport and AutoExport metrics have the FileSystemId and Publisher dimensions.

Metric Description

AgeOfOldestQueuedMessage

Dimension: AutoExport

The age, in seconds, of the oldest message waiting to be exported.

The Average statistic is the average age of the oldest message waiting to be exported. The Maximum statistic is the maximum number of seconds a message lived in the export queue. The Minimum statistic is the minimum number of seconds a message lived in the export queue. A value of zero indicates that no messages are waiting to be exported.

Units: Seconds

Valid statistics: Average, Minimum, Maximum

RepositoryRenameOperations

Dimension: AutoExport

The number of renames processed by the file system in response to a larger directory rename.

The Sum statistic is the total number of rename operations that result from a directory rename. The Average statistic is the average number of rename operations for the file system. The Maximum statistic is the maximum number of rename operations associated with a directory rename on the file system. The Minimum statistic is the minimum number of renames associated with a directory rename on the file system.

Units: Count

Valid statistics: Sum, Minimum, Maximum, Average

AgeOfOldestQueuedMessage

Dimension: AutoImport

The age, in seconds, of the oldest message waiting to be imported.

The Average statistic is the average age of the oldest message waiting to be imported. The Maximum statistic is the maximum number of seconds a message lived in the import queue. The Minimum statistic is the minimum number of seconds a message lived in the import queue. A value of zero indicates that no messages are waiting to be imported.

Units: Seconds

Valid statistics: Average, Minimum, Maximum

Amazon FSx for Lustre dimensions

Amazon FSx for Lustre metrics use the FSx namespace and are published with the FileSystemId dimension. You can find a file system's ID by using the describe-file-systems AWS CLI command. The ID takes the form fs-01234567890123456.

The StorageTargetId dimension is available in CloudWatch to denote which MDT (metadata target) published the file system metadata metrics. A StorageTargetId takes the form of MDTxxxx (for example, MDT0001).

The Publisher dimension is available in CloudWatch and AWS CLI for the AutoImport and AutoExport metrics to denote which service published the metrics.
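A minimal sketch of how the three dimensions could be assembled into the Name/Value list that CloudWatch queries expect is shown below. The helper name is hypothetical, and treating AutoImport and AutoExport as the only Publisher values is an assumption based on the metric listings above.

```python
def fsx_dimensions(file_system_id, storage_target_id=None, publisher=None):
    """Build a CloudWatch Dimensions list for FSx for Lustre metrics (hypothetical helper)."""
    dimensions = [{"Name": "FileSystemId", "Value": file_system_id}]
    if storage_target_id is not None:
        # Used by file system metadata metrics, for example "MDT0001".
        dimensions.append({"Name": "StorageTargetId", "Value": storage_target_id})
    if publisher is not None:
        # Assumed set of Publisher values.
        if publisher not in ("AutoImport", "AutoExport"):
            raise ValueError(f"unexpected publisher: {publisher}")
        dimensions.append({"Name": "Publisher", "Value": publisher})
    return dimensions


fs_id = "fs-01234567890123456"  # placeholder file system ID

file_system_dims = fsx_dimensions(fs_id)
metadata_dims = fsx_dimensions(fs_id, storage_target_id="MDT0001")
export_dims = fsx_dimensions(fs_id, publisher="AutoExport")
```

Each list can be passed as the Dimensions argument of a CloudWatch metric query for the matching metric family.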