Performance for Amazon FSx for OpenZFS

Amazon FSx for OpenZFS provides simple, high-performance file storage. In this section, we provide an overview of FSx for OpenZFS performance for all deployment types, and describe how your file system configuration impacts key performance dimensions. We also include some important tips and recommendations for maximizing the performance of your file system.

How FSx for OpenZFS file systems work

Each FSx for OpenZFS file system consists of the file server that clients communicate with and a set of disks attached to that file server. Each file server employs a fast, in-memory cache to enhance performance for the most frequently accessed data. In addition to the in-memory cache, Single-AZ 2 file systems provide a Non-Volatile Memory Express (NVMe) cache that can store up to 2.5 TB of frequently accessed data. FSx for OpenZFS uses the Adaptive Replacement Cache (ARC) and Level 2 ARC (L2ARC) built into the OpenZFS file system, which increase the portion of data accesses served from the in-memory and NVMe caches.

When a client accesses data that's stored in either the in-memory or NVMe caches, the file server doesn't need to read it from disk, and the data is served directly to the requesting client as network I/O. When a client accesses data that is not in either of these caches, it is read from disk as disk I/O and then served to the client as network I/O; data read from disk is also subject to the IOPS and bandwidth limits of the underlying disks.

FSx for OpenZFS file systems can serve network I/O about three times faster than disk I/O, which means that clients can drive greater throughput and IOPS with lower latencies for frequently accessed data in cache. The following diagram illustrates how data is accessed from an FSx for OpenZFS file system, with the NVMe cache applying to all Single-AZ 2 file systems.

Diagram showing how data is accessed in an FSx for OpenZFS file system.

File-based workloads are typically spiky, characterized by short, intense periods of high I/O with plenty of idle time between bursts. To support spiky workloads, in addition to the baseline speeds that a file system can sustain 24/7, Amazon FSx provides the capability to burst to higher speeds for periods of time for both network I/O and disk I/O operations. Amazon FSx uses a network I/O credit mechanism to allocate throughput and IOPS based on average utilization: file systems accrue credits when their throughput and IOPS usage is below their baseline limits, and spend these credits to burst above those limits when they perform I/O operations.

File system performance

File system performance is typically measured in latency, throughput, and I/O operations per second (IOPS). Amazon FSx for OpenZFS offers three deployment options: Multi-AZ (HA), Single-AZ (HA), and Single-AZ (non-HA). Both Single-AZ deployment types (non-HA and HA) are available as either Single-AZ 1 or Single-AZ 2, and Single-AZ 2 offers higher levels of performance than the maximum offered by Single-AZ 1. Each deployment option has a different performance profile. In this section, we document the performance you can expect for frequently accessed data served from the in-memory or NVMe caches and for data accessed from disk, for each deployment type. We also document the baseline performance your file system can sustain at all times, as well as the burst performance it can drive for short periods of time.

The specific level of performance a file system can provide is defined by its provisioned throughput capacity, which determines the size of the file server hosting the file system. Provisioned throughput capacity is equivalent to the baseline disk throughput supported by your file server. For data access from disks, your file system’s performance is also dependent on the number of provisioned SSD disk IOPS configured for the file system’s underlying disks.

The following sections provide details about the maximum levels of network throughput capacity, disk throughput capacity, and IOPS you can drive with each provisioned throughput capacity configuration. Note that the actual level of performance you can drive for your workload depends on a variety of factors. For more information, see Tips for maximizing performance.

Data access from cache

For read access directly from the in-memory ARC or NVMe L2ARC cache, performance is primarily defined by two components: the performance supported by the client-server network I/O connection, and the size of the cache. The following tables show the cached read performance of all Single-AZ 1, all Single-AZ 2, and Multi-AZ (HA) file systems, based on AWS Region.

Note

Single-AZ 1 (HA) and Single-AZ 2 (HA) file systems are only available in a certain subset of AWS Regions. For more information on which AWS Regions support Single-AZ 1 (HA) and Single-AZ 2 (HA) file systems, see Deployment type availability by AWS Region.

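Table 1

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | Maximum network throughput capacity, baseline (MB/s) | Maximum network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|
| 64 | 3 | 97 | 1,562 | Tens of thousands of IOPS |
| 128 | 11.2 | 195 | 1,562 | Tens of thousands of IOPS |
| 256 | 22.4 | 390 | 1,562 | Tens of thousands of IOPS |
| 512 | 44.8 | 781 | 1,562 | Hundreds of thousands of IOPS |
| 1,024 | 89.6 | 1,562 | | Hundreds of thousands of IOPS |
| 2,048 | 179.2 | 3,125 | | Hundreds of thousands of IOPS |
| 3,072 | 268.8 | 4,687 | | Hundreds of thousands of IOPS |
| 4,096 | 358.4 | 6,250 | | Up to 1 million IOPS |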

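Table 2

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | Maximum network throughput capacity, baseline (MB/s) | Maximum network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|
| 64 | 0.25 | 200 | 3,200 | Tens of thousands of IOPS |
| 128 | 1.0 | 400 | 3,200 | Tens of thousands of IOPS |
| 256 | 3.0 | 800 | 3,200 | Tens of thousands of IOPS |
| 512 | 11.2 | 1,600 | 3,200 | Hundreds of thousands of IOPS |
| 1,024 | 22.4 | 3,200 | | Hundreds of thousands of IOPS |
| 2,048 | 44.8 | 6,400 | | Hundreds of thousands of IOPS |
| 3,072 | 67.2 | 9,600 | | Hundreds of thousands of IOPS |
| 4,096 | 89.6 | 12,800 | | 1 million IOPS |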

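Table 3

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | NVMe L2ARC cache (GB) | Network throughput capacity, baseline (MB/s) | Network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|---|
| 160 | 3 | 40 | 375 | 3,125 | Tens of thousands of IOPS |
| 320 | 11.2 | 80 | 775 | 3,750 | Tens of thousands of IOPS |
| 640 | 22.4 | 160 | 1,550 | 5,000 | Hundreds of thousands of IOPS |
| 1,280 | 44.8 | 320 | 3,125 | 6,250 | Hundreds of thousands of IOPS |
| 2,560 | 89.6 | 640 | 6,250 | | Hundreds of thousands of IOPS |
| 3,840 | 134.4 | 960 | 9,375 | | Hundreds of thousands of IOPS |
| 5,120 | 179.2 | 1,280 | 12,500 | | 1+ million IOPS |
| 7,680 | 268.8 | 1,920 | 18,750 | | 1+ million IOPS |
| 10,240 | 358.4 | 2,560 | 21,000 | | 1+ million IOPS |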

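Table 4

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | Network throughput capacity, baseline (MB/s) | Network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|
| 160 | 11.2 | 195 | 1,562 | Tens of thousands of IOPS |
| 320 | 22.4 | 390 | 1,562 | Tens of thousands of IOPS |
| 640 | 44.8 | 781 | 1,562 | Hundreds of thousands of IOPS |
| 1,280 | 89.6 | 1,562 | | Hundreds of thousands of IOPS |
| 2,560 | 179.2 | 3,125 | | Hundreds of thousands of IOPS |
| 3,840 | 268.8 | 4,687 | | Hundreds of thousands of IOPS |
| 5,120 | 358.4 | 6,250 | | Up to 1 million IOPS |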

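Table 5

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | Network throughput capacity, baseline (MB/s) | Network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|
| 160 | 3 | 375 | 3,125 | Tens of thousands of IOPS |
| 320 | 11.2 | 775 | 3,750 | Tens of thousands of IOPS |
| 640 | 22.4 | 1,550 | 5,000 | Hundreds of thousands of IOPS |
| 1,280 | 44.8 | 3,125 | 6,250 | Hundreds of thousands of IOPS |
| 2,560 | 89.6 | 6,250 | | Hundreds of thousands of IOPS |
| 3,840 | 134.4 | 9,375 | | Hundreds of thousands of IOPS |
| 5,120 | 179.2 | 12,500 | | 1+ million IOPS |
| 7,680 | 268.8 | 18,750 | | 1+ million IOPS |
| 10,240 | 358.4 | 21,000 | | 1+ million IOPS |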

Note

For Multi-AZ file systems created in Canada (Central) and Asia Pacific (Mumbai) prior to July 9th, 2024, refer to Table 6 for performance details.

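Table 6

| Provisioned throughput capacity (MB/s) | In-memory cache (GB) | Network throughput capacity, baseline (MB/s) | Network throughput capacity, burst (MB/s) | Maximum network IOPS |
|---|---|---|---|---|
| 160 | 1.0 | 400 | 3,200 | Tens of thousands of IOPS |
| 320 | 3 | 800 | 3,200 | Tens of thousands of IOPS |
| 640 | 11.2 | 1,600 | 3,400 | Hundreds of thousands of IOPS |
| 1,280 | 22.4 | 3,200 | | Hundreds of thousands of IOPS |
| 2,560 | 44.8 | 6,400 | | Hundreds of thousands of IOPS |
| 3,840 | 67.2 | 9,600 | | Hundreds of thousands of IOPS |
| 5,120 | 89.6 | 12,800 | | 1+ million IOPS |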

Data access from disk

For read and write access from the disks attached to the file server, performance depends on the capacity of the server’s disk I/O connection. As with data accessed from cache, the performance of this connection is determined by the provisioned throughput capacity of the file system, which is equivalent to the baseline disk throughput capacity of your file server.

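Table 7

| Provisioned throughput capacity (MB/s) | Maximum disk throughput capacity, baseline (MB/s) | Maximum disk throughput capacity, burst (MB/s) | Maximum disk IOPS, baseline | Maximum disk IOPS, burst |
|---|---|---|---|---|
| 64 | 64 | 1,024 | 2,500 | 40,000 |
| 128 | 128 | 1,024 | 5,000 | 40,000 |
| 256 | 256 | 1,024 | 10,000 | 40,000 |
| 512 | 512 | 1,024 | 20,000 | 40,000 |
| 1,024 | 1,024 | | 40,000 | |
| 2,048 | 2,048 | | 80,000 | |
| 3,072 | 3,072 | | 120,000 | |
| 4,096 | 4,096 | | 160,000 | |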

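Table 8

| Provisioned throughput capacity (MB/s) | Maximum disk throughput capacity, baseline (MB/s) | Maximum disk throughput capacity, burst (MB/s) | Maximum disk IOPS, baseline | Maximum disk IOPS, burst |
|---|---|---|---|---|
| 160 | 160 | 3,125 | 6,250 | 100,000 |
| 320 | 320 | 3,125 | 12,500 | 100,000 |
| 640 | 640 | 3,125 | 25,000 | 100,000 |
| 1,280 | 1,280 | 3,125 | 50,000 | 100,000 |
| 2,560 | 2,560 | | 100,000 | |
| 3,840 | 3,840 | | 150,000 | |
| 5,120 | 5,120 | | 200,000 | |
| 7,680 | 7,680 | | 300,000 | |
| 10,240 | 10,240* | | 400,000 | |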

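Table 9

| Provisioned throughput capacity (MB/s) | Maximum disk throughput capacity, baseline (MB/s)* | Maximum disk throughput capacity, burst (MB/s)* | Maximum disk IOPS, baseline | Maximum disk IOPS, burst |
|---|---|---|---|---|
| 160 | 160 | 1,250 | 6,000 | 40,000 |
| 320 | 320 | 1,250 | 12,000 | 40,000 |
| 640 | 640 | 1,250 | 20,000 | 40,000 |
| 1,280 | 1,280 | | 40,000 | |
| 2,560 | 2,560 | | 80,000 | |
| 3,840 | 3,840 | | 120,000 | |
| 5,120 | 5,120 | | 160,000 | |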

Note

*Deployment hardware differences in these regions may cause disk throughput capacity to vary by up to 5% from the values shown in this table.

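Table 10

| Provisioned throughput capacity (MB/s) | Maximum disk throughput capacity, baseline (MB/s) | Maximum disk throughput capacity, burst (MB/s) | Maximum disk IOPS, baseline | Maximum disk IOPS, burst |
|---|---|---|---|---|
| 160 | 160 | 3,125 | 6,250 | 100,000 |
| 320 | 320 | 3,125 | 12,500 | 100,000 |
| 640 | 640 | 3,125 | 25,000 | 100,000 |
| 1,280 | 1,280 | 3,125 | 50,000 | 100,000 |
| 2,560 | 2,560 | | 100,000 | |
| 3,840 | 3,840 | | 150,000 | |
| 5,120 | 5,120 | | 200,000 | |
| 7,680 | 7,680 | | 300,000 | |
| 10,240 | 10,240* | | 400,000 | |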

Note

*If you have a Multi-AZ (HA) file system with a throughput capacity of 10,240 MB/s, performance will be limited to 7,500 MB/s for write traffic only. Otherwise, for read traffic on all Multi-AZ (HA) file systems, read and write traffic on all Single-AZ file systems, and all other throughput capacity levels, your file system will support the performance limits shown in the table.

Note

For Multi-AZ file systems created in Canada (Central) and Asia Pacific (Mumbai) prior to July 9th, 2024, refer to Table 11 for performance details.

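Table 11

| Provisioned throughput capacity (MB/s) | Maximum disk throughput capacity, baseline (MB/s)* | Maximum disk throughput capacity, burst (MB/s)* | Maximum disk IOPS, baseline | Maximum disk IOPS, burst |
|---|---|---|---|---|
| 160 | 160 | 1,187 | 5,000 | 40,000 |
| 320 | 320 | 1,187 | 10,000 | 40,000 |
| 640 | 640 | 1,187 | 20,000 | 40,000 |
| 1,280 | 1,280 | | 40,000 | |
| 2,560 | 2,560 | | 80,000 | |
| 3,840 | 3,840 | | 120,000 | |
| 5,120 | 5,120 | | 160,000 | |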

Note

*Deployment hardware differences in these regions may cause disk throughput capacity to vary by up to 5% from the values shown in this table.

The preceding tables show your file system’s throughput capacity for uncompressed data. However, because data compression reduces the amount of data that must be transferred as disk I/O, you can often deliver higher levels of throughput for compressed data. For example, if your data is compressed to half its original size (that is, a compression ratio of 2), you can drive up to twice the throughput that you could if the data were uncompressed. For more information, see Data compression.

SSD IOPS and performance

Data accessed from disk is also subject to the performance of the underlying disks, which is determined by the number of provisioned SSD IOPS configured on the file system. The maximum IOPS you can achieve is the lower of the maximum IOPS supported by your file server’s disk I/O connection and the maximum SSD IOPS supported by your disks. To drive the maximum performance supported by the server-disk connection, configure your file system’s provisioned SSD IOPS to match the maximum disk IOPS shown in the preceding tables.

If you select Automatic provisioned SSD IOPS, Amazon FSx provisions 3 IOPS per GB of storage capacity, up to the maximum for your file system, which is the highest IOPS level supported by the disk I/O connection documented in the preceding tables. If you select User-provisioned, you can configure any level of SSD IOPS from the minimum of 3 IOPS per GB of storage up to the maximum for your file system, as long as you don't exceed 1,000 IOPS per GiB*.

Note

*File systems in the following AWS Regions have a maximum IOPS-to-storage ratio of 50 IOPS per GiB: Africa (Cape Town), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Middle East (UAE), Middle East (Bahrain), Asia Pacific (Osaka), Europe (Milan), Europe (Paris), South America (São Paulo), Israel (Tel Aviv), Asia Pacific (Hong Kong), Asia Pacific (Seoul), Asia Pacific (Mumbai), Canada (Central), Europe (Stockholm), and Europe (London).
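The following shell sketch applies the SSD IOPS rules above. It is illustrative only: the function names are hypothetical, it takes storage capacity as a single number in GiB (glossing over the GB/GiB distinction in the text), and it assumes the user-provisioned minimum is capped at the file system maximum in the same way the automatic setting is. Take the maximum disk IOPS from the disk table for your deployment type, and pass 1000 or 50 as the IOPS-per-GiB ratio depending on your Region.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the provisioned SSD IOPS rules; not an AWS tool.
# Arguments: storage capacity in GiB and the file system's maximum disk IOPS.

# Automatic mode: 3 IOPS per GiB of storage, capped at the file system maximum.
auto_iops() {
  local capacity=$1 fs_max_iops=$2
  local iops=$(( capacity * 3 ))
  if [ "$iops" -gt "$fs_max_iops" ]; then
    iops=$fs_max_iops
  fi
  echo "$iops"
}

# User-provisioned mode: prints "min max" for the allowed range.
# max_ratio is 1000 IOPS per GiB in most Regions, or 50 in the Regions listed above.
user_iops_range() {
  local capacity=$1 fs_max_iops=$2 max_ratio=$3
  local min=$(( capacity * 3 ))
  local max=$(( capacity * max_ratio ))
  if [ "$max" -gt "$fs_max_iops" ]; then
    max=$fs_max_iops
  fi
  # Assumption: the 3 IOPS per GiB floor cannot exceed the file system maximum.
  if [ "$min" -gt "$max" ]; then
    min=$max
  fi
  echo "$min $max"
}

# Example: 1,024 GiB of storage on a file system with a 160,000 IOPS maximum.
auto_iops 1024 160000            # prints 3072
user_iops_range 1024 160000 50   # prints 3072 51200
```

In a Region with the 50 IOPS per GiB limit, the per-GiB ratio rather than the server limit is usually what bounds user-provisioned IOPS on smaller file systems, which is why the sketch applies both caps.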

The following graph illustrates the maximum IOPS for Single-AZ 1 (non-HA and HA), Single-AZ 2 (non-HA and HA), and Multi-AZ (HA) depending on storage capacity.

Chart showing provisioned IOPS.

Choosing a deployment type based on performance

Both Single-AZ (non-HA) and Single-AZ (HA) offer two tiers of performance: Single-AZ 1 and Single-AZ 2. Single-AZ 2 (HA) is recommended for most use cases, given the higher levels of both performance and availability that it provides. Single-AZ 2 file systems offer twice the performance scalability of Single-AZ 1, delivering up to 400,000 IOPS and 10 GB/s of throughput for both reads and writes to persistent SSD storage.

In addition, Single-AZ 2 file systems include a high-speed NVMe read cache of up to 2.5 TB that automatically caches your most recently accessed data, making that data accessible at millions of IOPS with latencies of a few hundred microseconds. Single-AZ 2 file systems are suitable for high-performance workloads such as media processing and rendering, financial analytics, and machine learning, as well as for read-heavy workloads with frequently accessed datasets.

Amazon FSx for OpenZFS also offers Multi-AZ (HA) file systems, which provide higher levels of availability and durability than Single-AZ file systems along with the same levels of performance as Single-AZ 2. For more information on Multi-AZ (HA) file systems and choosing between deployment types, see Availability and durability for Amazon FSx for OpenZFS.

Migrating between deployment types

There are several ways for you to migrate from a Single-AZ 1 (non-HA or HA) file system to a Single-AZ 2 (non-HA or HA) file system.

  • You can create a new Single-AZ 2 file system by restoring from a backup of your Single-AZ 1 file system.

  • You can create a new file system with the desired deployment type and use standard tools like rsync to copy your data over from the existing file system. For more information, see Migrating files to Amazon FSx for OpenZFS using rsync.

  • You can use on-demand replication to synchronize data on a Single-AZ 1 file system with a new Single-AZ 2 file system. For more information, see Working with on-demand data replication.

Tips for maximizing performance

FSx for OpenZFS file systems are designed to deliver the maximum performance of your file system across your clients in aggregate, whether you are supporting data access from a single client, or thousands of clients. The following sections provide some practical tips on how to maximize client performance.

Client considerations

Amazon EC2 instances

When launching the Amazon EC2 instances that will access your FSx for OpenZFS file system, make sure they have sufficient compute, memory, and network capacity to drive the throughput, IOPS, and latencies that your file system can deliver.

To determine your EC2 instance’s compute and memory capacity, see Instance types in the Amazon EC2 User Guide for Linux Instances. To determine its network capacity, see Amazon EC2 instance network bandwidth in the same guide. The performance characteristics of FSx for OpenZFS file systems don't depend on the use of Amazon EBS–optimized instances.

NFS nconnect

With FSx for OpenZFS, NFS clients can use the nconnect mount option to have multiple TCP connections (up to 16) associated with a single NFS mount. Such an NFS client multiplexes file operations onto multiple TCP connections (multi-flow) in a round-robin fashion to obtain improved performance beyond single TCP connection (single-flow) limits. For more information on single-flow limits, see Amazon EC2 instance network bandwidth in the Amazon EC2 User Guide for Linux Instances.

The following command demonstrates how to use the nconnect mount option to mount an FSx for OpenZFS volume with a maximum of 16 simultaneous connections:

sudo mount -t nfs -o nconnect=16 filesystem_dns_name:/vol_path /localpath

The nconnect mount option is supported for all NFS versions (v3, v4.0, v4.1, and v4.2). NFS nconnect is supported by default in Linux kernel versions 5.3 and later, including the latest Ubuntu 18.04 LTS. In addition, RHEL 8.3 supports nconnect through a backport into the 4.18.0-240.el8 kernel and later.
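As a rough pre-check before mounting with nconnect, the following sketch compares a kernel release string (for example, the output of uname -r) against the 5.3 threshold mentioned above. The function name is hypothetical, and it checks only the upstream version number: it reports no for vendor kernels that received a backport, such as the RHEL 8.3 4.18.0-240.el8 kernel, so treat a no as a prompt to check your distribution's documentation rather than a definitive answer.

```shell
#!/bin/sh
# Hypothetical helper: does a kernel release meet the upstream 5.3 requirement for nconnect?
# $1: kernel release string, for example the output of `uname -r`.
# Vendor backports (such as RHEL 8.3's 4.18.0-240.el8 kernel) are NOT detected.
supports_nconnect() {
  local release=$1
  local major=${release%%.*}       # text before the first dot
  local rest=${release#*.}         # text after the first dot
  local minor=${rest%%[!0-9]*}     # leading digits of the remainder
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 3 ]; }; then
    echo yes
  else
    echo no
  fi
}

supports_nconnect 5.15.0-1051-aws   # prints yes
supports_nconnect "$(uname -r)"     # result depends on the running kernel
```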

NFS v3

FSx for OpenZFS file systems flexibly support multiple versions of the NFS protocol (v3, v4.0, v4.1, v4.2). While more recent versions of NFS can better support simultaneous access from many clients (due to a more robust file-locking mechanism) and client-side caching, NFS v3 may still provide improved latency, throughput, and IOPS performance for performance-sensitive workloads. You can mount using NFS v3 from Linux, Windows, or macOS EC2 instances. For more information, see Step 2: Mount your file system from an Amazon EC2 instance.

The following example illustrates how to specify NFS v3 when mounting an FSx for OpenZFS volume:

sudo mount -t nfs -o nfsvers=3 fs-dns-name:/vol_path /local_path

NFS delegations

To improve the ability of NFS clients to cache data locally, NFS v4 introduced NFS delegations, or the ability of the server to delegate certain responsibilities to the client. If the client is granted a read delegation, it is assured that no other client has the ability to write to the file for the duration of the delegation, meaning that the client can read from its local copy instead of having to go back to the file server.

FSx for OpenZFS file systems support NFS v4 file read delegations. To take advantage of this capability, ensure your clients are mounting with NFS v4.0 or higher.

Request model

When you mount your file system, asynchronous writes are enabled by default (that is, -o async). With asynchronous writes, pending write operations are buffered on the client before they are written to your Amazon FSx file system, enabling lower latencies for these operations. A client that has enabled synchronous writes (that is, -o sync), or one that opens files using an option that bypasses the cache (for example, O_DIRECT), issues synchronous requests, which means that every operation incurs a round-trip between your client and the file server. We recommend using the default asynchronous write option to maximize client performance.

Other recommended mount options

To improve the performance of your file system, you can also configure the following options when mounting your file system:

  • rsize=1048576 – Sets the maximum number of bytes of data that the NFS client can receive for each network READ request to 1048576 bytes (1 MB). Due to lower memory capacity on file systems with 64 MB/s and 128 MB/s of provisioned throughput, these file systems will only accept a maximum rsize of 262144 and 524288 bytes, respectively.

  • wsize=1048576 – Sets the maximum number of bytes of data that the NFS client can send for each network WRITE request to 1048576 bytes (1 MB). Due to lower memory capacity on file systems with 64 MB/s and 128 MB/s of provisioned throughput, these file systems will only accept a maximum wsize of 262144 and 524288 bytes, respectively.

  • timeo=600 – Sets the time that the NFS client waits for a response before it retries an NFS request to 600 deciseconds (60 seconds).

  • _netdev – When present in /etc/fstab, prevents the client from attempting to mount the FSx for OpenZFS volume until the network has been enabled.

The following example uses sample values.

sudo mount -t nfs -o rsize=1048576,wsize=1048576,timeo=600 fs-01234567890abcdef1.fsx.us-east-1.amazonaws.com:/fsx/vol1 /fsx
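If you mount the volume through /etc/fstab, the same options can be combined with _netdev. The following sketch, with hypothetical function names, builds such an entry and lowers rsize and wsize for 64 MB/s and 128 MB/s file systems as described above. It assumes you pass the file system's provisioned throughput capacity in MB/s, and it only prints the line so that you can review it before changing /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helpers that build an /etc/fstab line with the recommended options.

# Largest rsize/wsize accepted for a given provisioned throughput (MB/s).
max_io_size() {
  case "$1" in
    64)  echo 262144 ;;
    128) echo 524288 ;;
    *)   echo 1048576 ;;
  esac
}

# $1: file system DNS name  $2: volume path  $3: local mount point
# $4: provisioned throughput capacity (MB/s)
fstab_entry() {
  local size
  size=$(max_io_size "$4")
  printf '%s:%s %s nfs rsize=%s,wsize=%s,timeo=600,_netdev 0 0\n' \
    "$1" "$2" "$3" "$size" "$size"
}

# Example with the sample values used above (prints one fstab line):
fstab_entry fs-01234567890abcdef1.fsx.us-east-1.amazonaws.com /fsx/vol1 /fsx 128
```

After reviewing the output, you could append it to /etc/fstab (for example, by piping it to sudo tee -a /etc/fstab) and then mount the volume with sudo mount /fsx.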

File system and volume configurations

Storage capacity utilization

As the amount of used storage space gets closer to the total available storage capacity, OpenZFS (like other file systems) spends more time finding suitable places to store new files and their metadata. This leads to higher latency for operations that modify files, which can negatively impact overall performance. To avoid this performance impact, we recommend keeping storage utilization below 80% of the total capacity. If needed, you can increase your storage capacity at any time, without disruption to your end users or applications. For more information, see Modifying SSD storage capacity and provisioned IOPS.
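The following sketch turns the 80% guideline into a quick check using a hypothetical check_utilization function. It takes used and total capacity in the same unit. On a client, df reports these values for the mounted volume as the client sees it, which may not match the file system's total storage capacity (for example, if the volume has a storage quota), so treat the result as an approximation.

```shell
#!/bin/sh
# Hypothetical helper: flag storage utilization at or above the 80% guideline.
# $1: used capacity  $2: total capacity (same unit, for example KiB or GiB)
check_utilization() {
  local used=$1 total=$2
  local pct=$(( used * 100 / total ))   # integer percentage, rounded down
  if [ "$pct" -ge 80 ]; then
    echo "${pct}% used: consider increasing storage capacity"
  else
    echo "${pct}% used: OK"
  fi
}

check_utilization 850 1000   # prints 85% used: consider increasing storage capacity

# On a client where the volume is mounted at /fsx, df -P prints size and used
# in 1K blocks in columns 2 and 3:
#   check_utilization $(df -Pk /fsx | awk 'NR==2 {print $3, $2}')
```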

Provisioned throughput capacity and in-memory cache

In addition to defining the throughput and IOPS that a file system can deliver, a file system's provisioned throughput capacity also determines the amount of in-memory cache on your file server. Increasing your file system's throughput capacity improves workload performance in two ways.

First, it increases the throughput and IOPS you can drive from disk (disk I/O) and from in-memory cache. Second, by increasing the amount of in-memory cache, you can store more data in your file server's in-memory cache, which drives higher cached performance for larger workloads.

Some request- or metadata-intensive workloads will also benefit from a larger file server in-memory cache. These types of workloads can generate and store a large volume of metadata in the in-memory cache. To ensure the size of your file server's in-memory cache is not a bottleneck for your file system performance, we recommend provisioning at least 128 MB/s of throughput capacity for these types of workloads.

NFS export options (sync and async)

On the file server side, the sync or async NFS export option can impact performance. (This is distinct from the similarly named option you use when mounting your FSx for OpenZFS volume on your client.) This option determines whether your file server acknowledges client I/O requests as complete when they are written to the file server’s in-memory cache (async), or only after they are committed to the file server’s persistent disks (sync). sync is the default option and is generally recommended for most workloads.

If you have performance-intensive workloads that can use an FSx for OpenZFS volume as temporary storage for shorter-term data processing or workloads that are resilient to data loss, you can use the async option to achieve substantially higher performance. Because an FSx for OpenZFS volume exported with the async option will acknowledge client writes before they are committed to durable disk storage, clients can write data to the file server at a significantly faster rate. However, this performance comes at the cost of losing data from acknowledged writes that have not yet been committed to the server's disks, in the event of a file server crash.

Data compression

For read-heavy workloads, compression can significantly improve the overall throughput performance of your file system because it reduces the amount of data that needs to be sent between the underlying storage and the file server. FSx for OpenZFS volumes support the following data compression algorithms.

  • Zstandard compression delivers very high levels of on-disk data compression, with higher read throughput but lower write throughput than LZ4 compression.

  • LZ4 compression delivers higher write throughput performance, but achieves lower levels of data compression than Zstandard compression.

With data compression, you can improve your read throughput on data accessed from disk up to the same levels you deliver for frequently accessed cached data. The specific improvement depends on how much compression reduces the size of your dataset. Your effective throughput is roughly equal to the product of your provisioned disk throughput and your compression ratio (defined as the size of the uncompressed data divided by the size of the compressed data). For example, on a file system with 4,096 MB/s of provisioned throughput capacity, common Zstandard compression ratios of 2-3x can increase your effective read throughput to roughly 8-12 GB/s.
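The estimate above can be sketched as a small shell function (the name effective_read_mbps is hypothetical). It multiplies provisioned disk throughput by the compression ratio, defined as uncompressed size divided by compressed size, and optionally caps the result at a network throughput figure that you choose from the cache tables, since compression improves disk reads only up to cached-data levels. Treat the result as a rough upper bound, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: rough effective read throughput (MB/s) for compressed data.
# $1: provisioned disk throughput (MB/s)
# $2: compression ratio (uncompressed size / compressed size), may be fractional
# $3: optional ceiling (MB/s), such as a network throughput value from the cache tables
effective_read_mbps() {
  awk -v disk="$1" -v ratio="$2" -v cap="${3:-0}" 'BEGIN {
    t = disk * ratio
    if (cap > 0 && t > cap) t = cap
    printf "%d\n", t
  }'
}

effective_read_mbps 4096 2       # prints 8192
effective_read_mbps 4096 3       # prints 12288
effective_read_mbps 160 3 375    # prints 375 (capped)
```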

You can change a volume's data compression to improve performance. Changing this property affects only newly-written data on the volume.

ZFS record size

The ZFS record size specifies a suggested block size for files in the volume. This property is designed solely for use with databases and other workloads that access files in fixed-size records. ZFS automatically tunes block sizes according to internal algorithms optimized for typical access patterns. When you create a volume, the default record size is 128 KiB. General purpose workflows perform well using the default record size, and we don't recommend changing it, as it may adversely affect performance.

For database workflows that create very large files but access them in small random chunks, specifying a record size greater than or equal to the record size of the database can result in significant performance gains. For databases that use a fixed disk block or record size for I/O, set the ZFS record size to match it. See Dataset record size in the OpenZFS documentation for more information.
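To pick a record size that matches a database's fixed I/O size, the following sketch (hypothetical function name) returns the smallest candidate record size that is greater than or equal to the database's page or block size. The list of candidates (powers of two from 4 KiB to 1,024 KiB) is an assumption; confirm the values your volume accepts in Managing Amazon FSx for OpenZFS volumes. For reference, PostgreSQL uses 8 KiB pages and MySQL InnoDB uses 16 KiB pages by default.

```shell
#!/bin/sh
# Hypothetical helper: smallest candidate ZFS record size (KiB) >= the database I/O size.
# Assumption: candidate sizes are 4, 8, 16, ..., 1024 KiB.
pick_record_size_kib() {
  local db_kib=$1 size
  for size in 4 8 16 32 64 128 256 512 1024; do
    if [ "$size" -ge "$db_kib" ]; then
      echo "$size"
      return 0
    fi
  done
  # Larger than every candidate: fall back to the largest assumed size.
  echo 1024
}

pick_record_size_kib 8    # PostgreSQL default page size; prints 8
pick_record_size_kib 16   # InnoDB default page size; prints 16

# The result could then be applied with, for example (volume ID is a placeholder):
#   aws fsx update-volume --volume-id fsvol-0123456789abcdef0 \
#     --open-zfs-configuration RecordSizeKiB=8
```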

Streaming workflows such as multimedia and video can benefit from setting a larger record size than the default value. For more information about setting the record size on a volume, see Managing Amazon FSx for OpenZFS volumes.

You can change a volume's record size to make performance improvements. Changing the volume record size affects only files created afterward; existing files are unaffected.

Monitoring performance

Every minute, FSx for OpenZFS emits usage metrics to Amazon CloudWatch. You can use these metrics to help identify opportunities to improve the performance your clients can drive from your file system.

You can investigate aggregate file system performance with the Sum statistic of each metric. For example, the Sum of the DataReadBytes statistic reports the total bytes read from the file system or volume during each period, and the Sum of the DataWriteBytes statistic reports the total bytes written; dividing either Sum by the period length in seconds gives the corresponding read or write throughput.
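Because each datapoint's Sum is a byte count for one period, dividing it by the period length gives average throughput. The following sketch (hypothetical function name) does that conversion, assuming 1 MB = 1,048,576 bytes to match the rsize and wsize definitions above; use 1,000,000 instead if you prefer decimal megabytes. The commented AWS CLI call shows one way to retrieve the Sum; the file system ID and times are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: average throughput in MB/s from a CloudWatch Sum over one period.
# $1: Sum of DataReadBytes or DataWriteBytes for the period
# $2: period length in seconds (for example, 60 for one-minute datapoints)
sum_to_mbps() {
  awk -v bytes="$1" -v period="$2" 'BEGIN { printf "%.1f\n", bytes / period / 1048576 }'
}

# One way to retrieve per-minute Sums (ID and times are placeholders):
#   aws cloudwatch get-metric-statistics --namespace AWS/FSx \
#     --metric-name DataReadBytes \
#     --dimensions Name=FileSystemId,Value=fs-01234567890abcdef1 \
#     --statistics Sum --period 60 \
#     --start-time 2024-07-01T00:00:00Z --end-time 2024-07-01T01:00:00Z

sum_to_mbps 6291456000 60   # prints 100.0
```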

For more information on monitoring your file system’s performance, see Monitoring with Amazon CloudWatch.