

# Using data repositories with Amazon File Cache
<a name="using-data-repositories"></a>

Amazon File Cache is a fully managed, high-speed cache on AWS that makes it easier to process file data, regardless of where the data is stored. Your cache serves as a temporary, high-performance storage location for data stored in on-premises file systems, AWS file systems, and Amazon Simple Storage Service (Amazon S3) buckets. Using this capability, you can make dispersed data sets available to file-based applications on AWS with a unified view and at high speeds—sub-millisecond latencies and high throughput.

**Topics**
+ [Overview of data repositories](overview-data-repo.md)
+ [Linking your cache to a data repository](create-linked-data-repo.md)
+ [Importing files from your data repository](importing-files.md)
+ [Exporting changes to the data repository](export-changed-data.md)
+ [Cache eviction](cache-eviction.md)

# Overview of data repositories
<a name="overview-data-repo"></a>

Amazon File Cache is well integrated with data repositories in Amazon S3 or on Network File System (NFS) file systems that support the NFSv3 protocol. This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets or NFS data repositories from applications that mount your cache. You can also run your compute-intensive workloads on Amazon EC2 instances in the AWS Cloud and export the results to your data repository after your workload is complete.

**Note**  
You can link your cache to either S3 or NFS data repositories, but you can't have a mix of linked S3 and NFS data repositories on a single cache.

When you use Amazon File Cache with multiple storage repositories, you can ingest and process large volumes of file data in a high-performance cache. At the same time, you can write results to your data repositories by using Hierarchical Storage Management (HSM) commands. With these features, you can restart your workload at any time using the latest data stored in your data repository.

By default, Amazon File Cache automatically loads data into the cache when it’s accessed for the first time (lazy load). You can optionally pre-load data into the cache before starting your workload. For more information, see [Lazy load](mdll-lazy-load.md).

You can also export files and their associated metadata (including POSIX metadata) in your cache to your data repository using HSM commands. When you use HSM commands, file data and metadata that were created or modified since the last such export are exported to the data repository. For more information, see [Exporting files using HSM commands](exporting-files-hsm.md).

**Important**  
If you have linked one or more caches to a data repository on Amazon S3, don't delete the Amazon S3 bucket until you have deleted all linked caches.

# POSIX metadata support for data repositories
<a name="posix-metadata-support"></a>

Amazon File Cache automatically transfers Portable Operating System Interface (POSIX) metadata for files, directories, and symbolic links (symlinks) when importing and exporting data to and from a linked Amazon S3 or NFS data repository. When you export changes in your cache to a linked data repository, Amazon File Cache also exports POSIX metadata changes along with data changes. Because of this metadata export, you can implement and maintain access controls between your cache and its linked data repositories.

Amazon File Cache imports only objects that have POSIX-compliant object keys, such as the following.

```
test/mydir/ 
test/
```

Amazon File Cache stores directories and symlinks as separate objects in the linked data repository. For example, on an S3 data repository, Amazon File Cache creates an S3 object for each directory with a key name that ends with a slash ("/"), as follows:
+ The S3 object key `test/mydir/` maps to the cache directory `test/mydir`.
+ The S3 object key `test/` maps to the cache directory `test`.

Amazon File Cache uses the following Amazon S3 schema for symlinks:
+ **S3 object key** – The path to the link, relative to the Amazon File Cache mount directory
+ **S3 object data** – The target path of the symlink
+ **S3 object metadata** – The metadata for the symlink
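
To see the file-type bits behind this schema, the following sketch (assuming GNU coreutils `stat` on Linux, where symlink permissions are always 777) computes the `st_mode`-style value for a symlink; the link and target names are hypothetical:

```shell
# A symlink's mode combines the symlink file-type bits (S_IFLNK,
# octal 0120000) with its permission mask (always 777 on Linux).
ln -sf target.txt /tmp/mylink        # hypothetical link -> target
# stat %f prints st_mode in hex; re-print it as octal with a leading 0.
printf '0%o\n' "0x$(stat -c '%f' /tmp/mylink)"   # prints 0120777
```

This `<octal file type><octal permission mask>` format is the same one used by the `x-amz-meta-file-permissions` metadata key.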

Amazon File Cache stores POSIX metadata, including ownership, permissions, and timestamps for Amazon File Cache files, directories, and symbolic links, in S3 objects as follows:
+ `Content-Type` – The HTTP entity header used to indicate the media type of the resource for web browsers.
+ `x-amz-meta-file-permissions` – The file type and permissions in the format `<octal file type><octal permission mask>`, consistent with `st_mode` in the [Linux stat(2) man page](https://man7.org/linux/man-pages/man2/lstat.2.html).
**Note**  
Amazon File Cache doesn't import or retain `setuid` information.
+ `x-amz-meta-file-owner` – The owner user ID (UID) expressed as an integer.
+ `x-amz-meta-file-group` – The group ID (GID) expressed as an integer.
+ `x-amz-meta-file-atime` – The last-accessed time in nanoseconds. Terminate the time value with `ns`; otherwise, Amazon File Cache interprets the value as milliseconds.
+ `x-amz-meta-file-mtime` – The last-modified time in nanoseconds. Terminate the time value with `ns`; otherwise, Amazon File Cache interprets the value as milliseconds.
+ `x-amz-meta-user-agent` – The user agent, ignored during Amazon File Cache import. During export, Amazon File Cache sets this value to `aws-fsx-lustre`.
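
As a sketch of how these metadata values can be produced from a local file (assuming GNU coreutils `stat` on Linux; the file path is hypothetical):

```shell
# Compute x-amz-meta-file-permissions and x-amz-meta-file-mtime values
# for a local file.
f=/tmp/metadata-demo.txt             # hypothetical file
touch "$f" && chmod 644 "$f"

# stat %f prints st_mode in hex; re-print it in octal with a leading 0
# to get <octal file type><octal permission mask>, e.g. 0100644 for a
# regular file (0100000) with rw-r--r-- (644).
perms=$(printf '0%o' "0x$(stat -c '%f' "$f")")

# stat %.9Y prints mtime with nanosecond precision; drop the decimal
# point and append "ns" so the value isn't read as milliseconds.
mtime="$(stat -c '%.9Y' "$f" | tr -d '.')ns"

echo "file-permissions=$perms file-mtime=$mtime"
```

Values produced this way could be passed in the `--metadata` map of an `aws s3api put-object` call, as shown in the walkthrough that follows.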

The default POSIX permission that Amazon File Cache assigns to a file is 755. This permission allows read and execute access for all users and write access for the owner of the file.
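
As a quick sanity check of what mode 755 grants (assuming GNU coreutils `stat`):

```shell
# 755 = rwxr-xr-x: owner read/write/execute; group and others
# read/execute only.
f=$(mktemp)
chmod 755 "$f"
stat -c '%a %A' "$f"                 # prints: 755 -rwxr-xr-x
rm -f "$f"
```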

**Note**  
Amazon File Cache doesn't retain any user-defined custom metadata on S3 objects.

# Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket
<a name="attach-s3-posix-permissions"></a>

The following procedure walks you through the process of uploading objects into Amazon S3 with POSIX permissions. By doing so, you can import the POSIX permissions when you create an Amazon File Cache that is linked to that S3 bucket.

**To upload objects with POSIX permissions to Amazon S3**

1. From your local machine, use the following example commands to create a test directory (`s3cptestdir`) and file (`s3cptest.txt`) to upload to the S3 bucket.

   ```
   $ mkdir s3cptestdir
   $ echo "S3cp metadata import test" >> s3cptestdir/s3cptest.txt
   $ ls -ld s3cptestdir/ s3cptestdir/s3cptest.txt
   drwxr-xr-x 3 500 500 96 Jan 8 11:29 s3cptestdir/
   -rw-r--r-- 1 500 500 26 Jan 8 11:29 s3cptestdir/s3cptest.txt
   ```

   The newly created file and directory have a file owner user ID (UID) and group ID (GID) of 500, and permissions as shown in the preceding example.

1. Call the Amazon S3 API to create the directory `s3cptestdir` with metadata permissions. You must specify the directory name with a trailing slash (`/`). For information about supported POSIX metadata, see [POSIX metadata support for data repositories](posix-metadata-support.md).

   Replace `bucket_name` with the actual name of your S3 bucket.

   ```
   $ aws s3api put-object --bucket bucket_name --key s3cptestdir/ --metadata '{"user-agent":"aws-fsx-lustre" , \
         "file-atime":"1595002920000000000ns" , "file-owner":"500" , "file-permissions":"0100664","file-group":"500" , \
         "file-mtime":"1595002920000000000ns"}'
   ```

1. Verify that the POSIX permissions are included in the S3 object metadata.

   ```
   $ aws s3api head-object --bucket bucket_name --key s3cptestdir/
   {
       "AcceptRanges": "bytes",
       "LastModified": "Fri, 08 Jan 2021 17:32:27 GMT",
       "ContentLength": 0,
       "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
       "VersionId": "bAlhCoWq7aIEjc3R6Myc6UOb8sHHtJkR",
       "ContentType": "binary/octet-stream",
       "Metadata": {
           "user-agent": "aws-fsx-lustre",
           "file-atime": "1595002920000000000ns",
           "file-owner": "500",
           "file-permissions": "0100664",
           "file-group": "500",
           "file-mtime": "1595002920000000000ns"
       }
   }
   ```

1. Upload the test file (created in step 1) from your computer to the S3 bucket with metadata permissions.

   ```
   $ aws s3 cp s3cptestdir/s3cptest.txt s3://bucket_name/s3cptestdir/s3cptest.txt \
         --metadata '{"user-agent":"aws-fsx-lustre" , "file-atime":"1595002920000000000ns" , \
         "file-owner":"500" , "file-permissions":"0100664","file-group":"500" , "file-mtime":"1595002920000000000ns"}'
   ```

1. Verify that the POSIX permissions are included in the S3 object metadata.

   ```
   $ aws s3api head-object --bucket bucket_name --key s3cptestdir/s3cptest.txt
   {
       "AcceptRanges": "bytes",
       "LastModified": "Fri, 08 Jan 2021 17:33:35 GMT",
       "ContentLength": 26,
       "ETag": "\"eb33f7e1f44a14a8e2f9475ae3fc45d3\"",
       "VersionId": "w9ztRoEhB832m8NC3a_JTlTyIx7Uzql6",
       "ContentType": "text/plain",
       "Metadata": {
           "user-agent": "aws-fsx-lustre",
           "file-atime": "1595002920000000000ns",
           "file-owner": "500",
           "file-permissions": "0100664",
           "file-group": "500",
           "file-mtime": "1595002920000000000ns"
       }
   }
   ```

1. Verify permissions on the Amazon File Cache linked to the S3 bucket.

   ```
   $ sudo lfs df -h /fsx
   UUID                       bytes        Used   Available Use% Mounted on
   3rnxfbmv-MDT0000_UUID       34.4G        6.1M       34.4G   0% /fsx[MDT:0]
   3rnxfbmv-OST0000_UUID        1.1T        4.5M        1.1T   0% /fsx[OST:0]
    
   filesystem_summary:         1.1T        4.5M        1.1T   0% /fsx
    
   $ cd /fsx/
   $ ls -ld s3cptestdir/
   drw-rw-r-- 2 500 500 25600 Jan  8 17:33 s3cptestdir/
   
   $ ls -ld s3cptestdir/s3cptest.txt
   -rw-rw-r-- 1 500 500 26 Jan 8 17:33 s3cptestdir/s3cptest.txt
   ```

Both the `s3cptestdir` directory and the `s3cptest.txt` file have POSIX permissions imported.

# Prerequisites for linking to on-premises NFS data repositories
<a name="nfs-filer-prereqs"></a>

Before you can link your cache to an on-premises NFS data store, verify that your resources and configurations meet the following requirements:
+ Your on-premises NFS file system must support NFSv3.
+ If you're using a domain name to link your NFS file system to Amazon File Cache, you must provide the IP address of a DNS server that Amazon File Cache can use to resolve the domain name of the on-premises NFSv3 file system. The DNS server can be located in the VPC where you plan to create the cache, or it can be on your on-premises network accessible from your VPC.
+ The DNS server and on-premises NFSv3 file system must use private IP addresses, as specified in RFC 1918:
  + 10.0.0.0-10.255.255.255 (10/8 prefix)
  + 172.16.0.0-172.31.255.255 (172.16/12 prefix)
  + 192.168.0.0-192.168.255.255 (192.168/16 prefix)
+ You must establish an AWS Direct Connect or VPN connection between your on-premises network and the Amazon VPC where your Amazon File Cache is located. For more information about AWS Direct Connect, see the [AWS Direct Connect User Guide](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html). For more information about setting up a VPN connection, see the [AWS Site-to-Site VPN User Guide](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html).
**Important**  
Use an AWS Site-to-Site VPN connection if you want to encrypt data as it transits between your Amazon VPC and your on-premises network. For more information, see [What is AWS Site-to-Site VPN?](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html)
+ Your on-premises firewall must allow traffic between the IP addresses in your Amazon VPC subnet CIDR range and the IP addresses of the DNS server and the on-premises NFSv3 file system. The following ports must be open for the daemons involved in sharing data over NFS:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/FileCacheGuide/nfs-filer-prereqs.html)

  You can use the following command to look up dynamic ports for your on-premises NFS servers:

  ```
  rpcinfo -p localhost
  ```
+ Your on-premises NFSv3 file system must be configured to allow access from IP addresses in the Amazon VPC where the cache is located.
+ The Amazon VPC security group used for your cache must be configured to allow outbound traffic to the IP addresses of the DNS server and the on-premises NFSv3 file system. Make sure to add outbound rules that allow port 53 over both UDP and TCP for DNS traffic, and that allow the TCP ports used by the on-premises NFSv3 file system for NFS. For more information, see [Controlling access using inbound and outbound rules](limit-access-security-groups.md#inbound-outbound-rules).
+ While Amazon File Cache supports most NFSv3 export policies, you must not use the NFS export option `all_squash`. This requirement ensures that Amazon File Cache has the necessary permissions to read and write files owned by all users on your NFSv3 file system.

# Linking your cache to a data repository
<a name="create-linked-data-repo"></a>

You can link your Amazon File Cache to data repositories in Amazon S3 or on NFS (Network File System) file systems that support the NFSv3 protocol. The NFS file systems can be on-premises or in-cloud file systems. You create the links when you create your cache.

A link between a directory on your cache and an Amazon S3 or NFS data repository is called a *data repository association (DRA)*. You can create a maximum of 8 data repository associations on an Amazon File Cache resource. Each DRA must have a unique Amazon File Cache directory and an S3 bucket or NFS file system associated with it.

**Note**  
An Amazon File Cache resource can link to either S3 or NFS data repositories, but not to both types at the same time. All the DRAs on the cache must link to the same data repository type (S3 or NFS).

By default, Amazon File Cache automatically loads data into the cache when it’s accessed for the first time (lazy load). You can optionally pre-load data into the cache before starting your workload.

**Note**  
You shouldn't modify the same file on both the data repository and the cache at the same time, otherwise the behavior is undefined.

# Creating a link to a data repository
<a name="create-linked-repo"></a>

The following procedure walks you through the process of creating a data repository association (DRA) while creating an Amazon File Cache resource, using the AWS Management Console. The DRA links the cache to an existing Amazon S3 bucket or NFS file system.

Keep the following in mind when working with DRAs.
+ You can link to a data repository only when you create the cache.
+ You can't update an existing DRA.
+ You can't delete an existing DRA. To remove a link to a data repository, delete the cache and create it again.
+ You can link your cache to either S3 data repositories or NFS data repositories, but not to both types in a single cache.

For information about using the AWS Command Line Interface (AWS CLI) to create a DRA while creating a cache, see [To create a cache (CLI)](managing-caches.md#create-file-system-cli). 

## To link an S3 bucket or NFS file system while creating a cache (console)
<a name="link-new-repo-console"></a>

1. Open the AWS Management Console at [https://console.aws.amazon.com/fsx/](https://console.aws.amazon.com/fsx/).

1. Follow the procedure for creating a new Amazon File Cache described in [Step 1: Create your cache](getting-started-step1.md).

1. In the **Data repository associations (DRAs)** section, the **Create a new data repository association** dialog box is displayed.  
![\[The Data Repository Associations configuration dialog, which is one of the dialogs to configure export and import links for an S3 or NFS data repository.\]](http://docs.aws.amazon.com/fsx/latest/FileCacheGuide/images/create-fs-dra.png)

   In the dialog box, provide information for the following fields.
   + **Repository type** – Choose the type of data repository to link to:
     + `NFS` – NFS file system that supports the NFSv3 protocol.
     + `S3` – Amazon S3 bucket
   + **Data repository path** – Enter a path in either an S3 or NFS data repository to associate with your cache.
     + For S3, the path can be an S3 bucket or prefix in the format `s3://myBucket/myPrefix/`. Amazon File Cache will append a trailing "/" to your data repository path if you don't provide one. For example, if you provide a data repository path of `s3://myBucket/myPrefix`, Amazon File Cache will interpret it as `s3://myBucket/myPrefix/`.
     + For NFS, the path to the NFS data repository can be in one of two formats:
       + If you're not using **Subdirectories**, the path is to an NFS Export directory (or one of its subdirectories) in the format `nfs://nfs-domain-name/exportpath`.
       + If you're using **Subdirectories**, the path is the domain name of the NFS file system in the format `nfs://filer-domain-name`, which indicates the root of the NFS Export subdirectories specified with the `NFS Exports` field.

     Two data repository associations can't have overlapping data repository paths. For example, if a data repository with path `s3://myBucket/myPrefix/` is linked to the cache, you can't create another data repository association with data repository path `s3://myBucket/myPrefix/mySubPrefix`.
   + **Subdirectories** – (NFS only) You can optionally provide a list of comma-delimited NFS export paths in the NFS data repository. When this field is provided, **Data repository path** can only contain the NFS domain name, indicating the root of the subdirectories.
   + **DNS server IP addresses** – (NFS only) If you provided the domain name of the NFS file system for **Data repository path**, you can specify up to two IPv4 addresses of DNS servers used to resolve the NFS file system domain name. The IP addresses can be either those of a DNS forwarder or resolver that you manage and run inside your VPC, or those of your on-premises DNS servers.
   + **Cache path** – Enter the name of a high-level directory (such as `/ns1`) or subdirectory (such as `/ns1/subdir`) within the Amazon File Cache that will be associated with the data repository. The leading forward slash in the path is required. Two data repository associations cannot have overlapping cache paths. The **Cache path** setting must be unique across all the data repository associations for the cache.
**Note**  
**Cache path** can only be set to root (/) on NFS DRAs when **Subdirectories** is specified. If you specify root (/) as the **Cache path**, you can create only one DRA on the cache.  
**Cache path** cannot be set to root (/) for an S3 DRA.

1. When you finish configuring the DRA, choose **Add**.

1. You can add another data repository association using the same steps. You can create a maximum of 8 data repository associations, which must all be of the same repository type.

1. When you finish adding DRAs, choose **Next**.

1. Continue with the Amazon File Cache creation wizard.

# Working with server-side encrypted Amazon S3 buckets
<a name="s3-server-side-encryption-support"></a>

Amazon File Cache supports Amazon Simple Storage Service (Amazon S3) buckets that use server-side encryption with S3 managed keys (SSE-S3) and server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS).

If you want Amazon File Cache to encrypt data when writing to your S3 bucket, you must set the default encryption on your S3 bucket to either SSE-S3 or SSE-KMS. For more information, see [Configuring default encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/default-bucket-encryption.html) in the *Amazon S3 User Guide*.

When writing files to your S3 bucket, Amazon File Cache follows the default encryption policy of your S3 bucket.

By default, Amazon File Cache supports S3 buckets encrypted using SSE-S3. If you want to link your Amazon File Cache to an S3 bucket encrypted using SSE-KMS encryption, you must add a statement to your customer managed key policy that allows Amazon File Cache to encrypt and decrypt objects in your S3 bucket using your AWS KMS key.

The following statement allows a specific Amazon File Cache resource to encrypt and decrypt objects for a specific S3 bucket, *bucket\_name*.

```
{
    "Sid": "Allow access through S3 for the FSx SLR to use the KMS key on the objects in the given S3 bucket",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::aws_account_id:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/AWSServiceRoleForFSxS3Access_file_cache_id"
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:CallerAccount": "aws_account_id",
            "kms:ViaService": "s3.bucket-region.amazonaws.com"
        },
        "StringLike": {
            "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket_name/*"
        }
    }
}
```

**Note**  
If you're using an AWS KMS customer managed key (CMK) to encrypt your S3 bucket with S3 Bucket Keys enabled, set the `EncryptionContext` to the bucket ARN, not the object ARN, as in this example:  

```
"StringLike": {
    "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket_name"
}
```

The following policy statement allows every Amazon File Cache in your account to link to a specific S3 bucket.

```
{
    "Sid": "Allow access through S3 for the FSx SLR to use the KMS key on the objects in the given S3 bucket",
    "Effect": "Allow",
    "Principal": {
        "AWS": "*"
    },
    "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "kms:CallerAccount": "aws_account_id",
            "kms:ViaService": "s3.bucket-region.amazonaws.com"
        },
        "StringLike": {
            "aws:userid": "*:FSx",
            "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::bucket_name/*"
        }
    }
}
```

## Accessing server-side encrypted Amazon S3 buckets in a different AWS account
<a name="s3-server-side-cross-account-support"></a>

After you create a cache linked to an encrypted Amazon S3 bucket, you must grant the `AWSServiceRoleForFSxS3Access_fc-01234567890` service-linked role (SLR) access to the AWS KMS key used to encrypt the S3 bucket before you read or write data from the linked S3 bucket. You can use an IAM role that already has permissions to the AWS KMS key.

**Note**  
This IAM role must be in the account that the Amazon File Cache was created in (the same account as the S3 SLR), not the account that owns the AWS KMS key and S3 bucket.

You use the IAM role to call the following AWS KMS API operation to create a grant for the S3 SLR so that the SLR gains permission to the S3 objects. To find the ARN associated with your SLR, search your IAM roles using your cache ID as the search string.

```
$ aws kms create-grant --region cache_account_region \
      --key-id arn:aws:kms:s3_bucket_account_region:s3_bucket_account:key/key_id \
      --grantee-principal arn:aws:iam::cache_account_id:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/AWSServiceRoleForFSxS3Access_file-cache-id \
      --operations "Decrypt" "Encrypt" "GenerateDataKey" "GenerateDataKeyWithoutPlaintext" "CreateGrant" "DescribeKey" "ReEncryptFrom" "ReEncryptTo"
```

For more information about service-linked roles, see [Using service-linked roles for Amazon FSx](using-service-linked-roles.md).

# Importing files from your data repository
<a name="importing-files"></a>

When you create an Amazon File Cache resource, you can create a data repository association (DRA) to link your cache to an Amazon S3 or NFS data repository. When your application accesses a file, Amazon File Cache transparently copies the file's content from your repository and loads it into the cache if it isn't already present.

You can also preload your whole cache or an entire directory within your cache. For more information, see [Preloading files into your cache](preload-file-contents-hsm.md).

This data movement is managed by Amazon File Cache and occurs transparently to your applications. Subsequent reads of these files are served directly out of Amazon File Cache with consistent sub-millisecond latencies. If you request the preloading of multiple files simultaneously, Amazon File Cache loads your files from your linked data repository in parallel. For more information, see [Lazy load](mdll-lazy-load.md).

Amazon File Cache *only* imports objects that have POSIX-compliant object keys, such as:

```
test/mydir/ 
test/
```

**Note**  
For a linked S3 bucket, Amazon File Cache doesn't support importing metadata for symbolic links (symlinks) from S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes. Metadata for S3 Glacier Flexible Retrieval objects that are not symlinks can be imported (that is, an inode is created on the cache with the correct metadata). However, to retrieve the data, you must restore the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive object first and then use an `hsm_restore` command to import the object. Importing file data directly from Amazon S3 objects in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class into Amazon File Cache is not supported.

# Lazy load
<a name="mdll-lazy-load"></a>

When you access data on a linked Amazon S3 or NFS data repository using the cache, Amazon File Cache automatically loads the metadata (the name, ownership, timestamps, and permissions) and file contents if they're not already present in the cache. The data in your data repositories appears as files and directories in the cache. 

Lazy load is triggered when you're in a DRA directory and you read or write data or metadata to a file. Amazon File Cache loads data into the cache from the linked data repositories if it's not already available. For example, lazy load is triggered when you open a file, stat a file, or make metadata updates to the file.

You can also trigger lazy load by using the `ls` command to list the contents of a DRA directory. If you're at the root of a directory hierarchy that includes several DRA directories, the `ls` command will use lazy load on all the DRA directories in the hierarchy. For example, if you’re at `/` in the directory tree, and your four DRAs are `/a`, `/b`, `/c`, and `/d`, then running a recursive `ls` command populates metadata for all DRAs. To run a recursive `ls` command, use the `-R` option shown in the examples below:

```
ls -R
ls -R /tmp/dir1
```

When you use the `ls` or `stat` commands, Amazon File Cache loads only file and directory metadata for the requested files; no file content is downloaded. The data from a file in the data repository is downloaded to your cache only when the file is read.

**Note**  
Amazon File Cache only loads a directory listing the first time `ls` is run on a directory. Subsequently, if new files are added or existing files are changed in the corresponding directory in the linked data repository, you can `stat` the file path to update the directory listing.

# Preloading files into your cache
<a name="preload-file-contents-hsm"></a>

If the data you're accessing doesn't already exist in the cache, Amazon File Cache copies the data from your Amazon S3 or NFS data repository into the cache as files are accessed. Because of this approach, the initial read or write to a file incurs a small amount of latency. If your application is sensitive to this latency, and you know which files or directories your application needs to access, you can optionally preload the contents of individual files or directories. You do so using the `hsm_restore` command.

You can use the `hsm_action` command (issued with the `lfs` user utility) to verify that the file's contents have finished loading into the cache. A return value of `NOOP` indicates that the file has successfully been loaded. Run the following commands from a compute instance with the cache mounted. Replace *path/to/file* with the path of the file you're preloading into your cache.

```
sudo lfs hsm_restore path/to/file
sudo lfs hsm_action path/to/file
```

You can preload your whole cache or an entire directory within your cache by using the following commands. (The trailing ampersand makes a command run as a background process.) If you request the preloading of multiple files simultaneously, Amazon File Cache loads your files from your linked data repository in parallel.

```
nohup find local/directory -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore &
```

**Note**  
If your linked data repository is larger than your cache, you can only load as much actual file data as will fit into the cache's remaining storage space. You'll receive an error if you attempt to access file data when there's no more storage remaining in the cache.

# Exporting changes to the data repository
<a name="export-changed-data"></a>

You can export data and metadata changes, including POSIX metadata, from Amazon File Cache to a linked Amazon S3 or NFS data repository. Associated POSIX metadata includes ownership, permissions, and timestamps. To export changes from the cache, use HSM commands. When you export a file or directory using HSM commands, your cache exports only data files and metadata that were created or modified since the last export. For more information, see [Exporting files using HSM commands](exporting-files-hsm.md).

**Important**  
For Amazon File Cache to export your data to your linked data repository, the data must be stored in a UTF-8 compatible format.

**Topics**
+ [Exporting files using HSM commands](exporting-files-hsm.md)

# Exporting files using HSM commands
<a name="exporting-files-hsm"></a>

To export an individual file to your data repository and verify the success of the export, run the following commands. A return value of `states: (0x00000009) exists archived` indicates that export was successful.

```
sudo lfs hsm_archive path/to/export/file
sudo lfs hsm_state path/to/export/file
```

**Note**  
You must run the HSM commands (such as `hsm_archive`) as the root user or using `sudo`.

To export changes on an entire cache or an entire directory in your cache, run the following commands. If you export multiple files simultaneously, Amazon File Cache exports your files to your data repository in parallel.

```
nohup find local/directory -type f -print0 | xargs -0 -n 1 sudo lfs hsm_archive &
```

To determine whether the export is complete, run the following command.

```
find path/to/export/file -type f -print0 | xargs -0 -n 1 -P 8 sudo lfs hsm_state | awk '!/\<archived\>/ || /\<dirty\>/' | wc -l
```

If the command returns with zero files remaining, the export is complete.

# Cache eviction
<a name="cache-eviction"></a>

Files can be evicted (released) from the cache to free up space for new files. Releasing a file retains the file listing and metadata, but removes the local copy of that file's contents. You can't release a file if it's in use or if it hasn't been exported to a linked data repository. There are two methods to release files:
+ Automatic cache eviction, which releases files automatically when the cache begins to fill up.
+ Manual release of files using HSM commands.

**Important**  
Both methods only release files that are in the archived state. You must first export the files to your linked data repository using HSM commands, as described in [Exporting files using HSM commands](exporting-files-hsm.md).

## Automatic cache eviction
<a name="auto-cache-eviction"></a>

Amazon File Cache automatically manages the cache storage capacity by releasing the least recently used files on your cache when the cache begins to fill up. Automatic cache eviction is enabled by default when you create a cache using the File Cache console, AWS CLI, or the AWS API.

## Releasing files using HSM commands
<a name="hsm-release-files"></a>

You can manually release individual files from your cache using the following commands:
+ To release one or more files from your cache if you are the file owner:

  ```
  lfs hsm_release file1 file2 ...
  ```
+ To release one or more files from your cache if you're not the file owner:

  ```
  sudo lfs hsm_release file1 file2 ...
  ```

The `hsm_release` command can only release regular files as defined by POSIX. You cannot release sockets, symbolic links, block devices, character devices, or named pipes. To identify whether a file is a regular file, use the `test -f` command on that file.
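
As a sketch of that check, `find -type f` matches only regular files (it doesn't follow symlinks), so it can safely drive a bulk release. The directory below is a local stand-in, and the commented command shows how the listing would be used on a client with the cache mounted (the mount path is hypothetical):

```shell
# Identify regular files -- the only file type hsm_release can release.
dir=$(mktemp -d)                     # stand-in for a cache directory
touch "$dir/regular.txt"
ln -s regular.txt "$dir/link"        # symlink: excluded by -type f
mkfifo "$dir/pipe"                   # named pipe: excluded by -type f

find "$dir" -type f                  # lists only regular.txt

# On a client with the cache mounted (hypothetical path):
#   find /mnt/cache/ns1 -type f -print0 | xargs -0 -n 1 sudo lfs hsm_release
```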