

# Overview of data repositories
<a name="overview-data-repo"></a>

Amazon File Cache integrates with data repositories in Amazon S3 and with Network File System (NFS) file systems that support the NFSv3 protocol. This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets or NFS data repositories from applications that mount your cache. You can also run your compute-intensive workloads on Amazon EC2 instances in the AWS Cloud and export the results to your data repository after your workload is complete.

**Note**  
You can link your cache to either S3 or NFS data repositories, but not to a mix of both types on a single cache.

When you use Amazon File Cache with multiple data repositories, you can ingest and process large volumes of file data in a high-performance cache. At the same time, you can write results to your data repositories by using HSM commands. With these features, you can restart your workload at any time using the latest data stored in your data repository.

By default, Amazon File Cache automatically loads data into the cache when it’s accessed for the first time (lazy load). You can optionally pre-load data into the cache before starting your workload. For more information, see [Lazy load](mdll-lazy-load.md).
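
Because Amazon File Cache is built on Lustre, one way to pre-load is with the Lustre `lfs hsm_restore` command. The following is a sketch only; the mount directory `/mnt/cache` is a placeholder for your cache's actual mount point:

```shell
# Recursively trigger a restore (pre-load) of every file under the cache
# mount point before starting a workload. /mnt/cache is a placeholder.
nohup find /mnt/cache -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore &
```

Running the restore under `nohup` in the background lets a large pre-load continue even if your shell session ends.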

You can also export files and their associated metadata (including POSIX metadata) in your cache to your data repository using HSM commands. When you use HSM commands, file data and metadata that were created or modified since the last such export are exported to the data repository. For more information, see [Exporting files using HSM commands](exporting-files-hsm.md).
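
For example, the Lustre `lfs hsm_archive` command drives an export of an individual file. The paths below are placeholders, assuming the cache is mounted at `/mnt/cache`:

```shell
# Export (archive) a file and its metadata to the linked data repository.
sudo lfs hsm_archive /mnt/cache/path/to/file

# Check the file's HSM state; it reports "archived" after the export completes.
sudo lfs hsm_state /mnt/cache/path/to/file
```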

**Important**  
If you have linked one or more caches to a data repository on Amazon S3, don't delete the Amazon S3 bucket until you have deleted all linked caches.

# POSIX metadata support for data repositories
<a name="posix-metadata-support"></a>

Amazon File Cache automatically transfers Portable Operating System Interface (POSIX) metadata for files, directories, and symbolic links (symlinks) when importing and exporting data to and from a linked Amazon S3 or NFS data repository. When you export changes in your cache to a linked data repository, Amazon File Cache also exports POSIX metadata changes along with data changes. Because of this metadata export, you can implement and maintain access controls between your cache and its linked data repositories.

Amazon File Cache imports only objects that have POSIX-compliant object keys, such as the following:

```
test/mydir/
test/
```

Amazon File Cache stores directories and symlinks as separate objects in the linked data repository. For example, for directories on an S3 data repository, Amazon File Cache creates an S3 object with a key name that ends with a slash ("/"), as follows:
+ The S3 object key `test/mydir/` maps to the cache directory `test/mydir`.
+ The S3 object key `test/` maps to the cache directory `test`.

For symlinks, Amazon File Cache uses the following Amazon S3 schema:
+ **S3 object key** – The path to the link, relative to the Amazon File Cache mount directory
+ **S3 object data** – The target path of the symlink
+ **S3 object metadata** – The metadata for the symlink
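
Following that schema, you could represent a symlink in S3 by uploading an object whose data is the target path. The following is a sketch only; `bucket_name`, the link key, the target name, and the `0120777` symlink mode metadata are illustrative assumptions, not values from this guide:

```shell
# Represent the symlink test/mylink -> mytarget as an S3 object:
# the key is the link path and the object data is the target path.
# 0120777 is the octal st_mode of a symlink (lrwxrwxrwx).
printf 'mytarget' > /tmp/linkbody
aws s3api put-object --bucket bucket_name --key test/mylink --body /tmp/linkbody \
    --metadata '{"user-agent":"aws-fsx-lustre","file-permissions":"0120777","file-owner":"500","file-group":"500"}'
```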

Amazon File Cache stores POSIX metadata, including ownership, permissions, and timestamps for Amazon File Cache files, directories, and symbolic links, in S3 objects as follows:
+ `Content-Type` – The HTTP entity header that indicates the media type of the resource to web browsers.
+ `x-amz-meta-file-permissions` – The file type and permissions in the format `<octal file type><octal permission mask>`, consistent with `st_mode` in the [Linux stat(2) man page](https://man7.org/linux/man-pages/man2/lstat.2.html).
**Note**  
Amazon File Cache doesn't import or retain `setuid` information.
+ `x-amz-meta-file-owner` – The owner user ID (UID) expressed as an integer.
+ `x-amz-meta-file-group` – The group ID (GID) expressed as an integer.
+ `x-amz-meta-file-atime` – The last-accessed time in nanoseconds. Terminate the time value with `ns`; otherwise, Amazon File Cache interprets the value as milliseconds.
+ `x-amz-meta-file-mtime` – The last-modified time in nanoseconds. Terminate the time value with `ns`; otherwise, Amazon File Cache interprets the value as milliseconds.
+ `x-amz-meta-user-agent` – The user agent, ignored during Amazon File Cache import. During export, Amazon File Cache sets this value to `aws-fsx-lustre`.

The default POSIX permission that Amazon File Cache assigns to a file is 755. This permission allows read and execute access for all users and write access for the owner of the file.
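
The mapping above can be produced directly from a file's `stat(2)` fields. The following sketch (GNU coreutils `stat` assumed; the bucket name in the final comment is a placeholder) derives the metadata values for a local file:

```shell
# Build the x-amz-meta-* values for a local file from its stat(2) fields.
mkdir -p s3cptestdir
echo "S3cp metadata import test" > s3cptestdir/s3cptest.txt
chmod 664 s3cptestdir/s3cptest.txt
f=s3cptestdir/s3cptest.txt

mode=$(printf '0%o' "0x$(stat -c '%f' "$f")")        # octal file type + permissions, e.g. 0100664
uid=$(stat -c '%u' "$f")                             # file-owner
gid=$(stat -c '%g' "$f")                             # file-group
atime="$(( $(stat -c '%X' "$f") * 1000000000 ))ns"   # nanoseconds; the 'ns' suffix is required
mtime="$(( $(stat -c '%Y' "$f") * 1000000000 ))ns"

metadata=$(printf '{"user-agent":"aws-fsx-lustre","file-permissions":"%s","file-owner":"%s","file-group":"%s","file-atime":"%s","file-mtime":"%s"}' \
    "$mode" "$uid" "$gid" "$atime" "$mtime")
echo "$metadata"
# Upload with: aws s3 cp "$f" "s3://bucket_name/$f" --metadata "$metadata"
```

Note that `stat -c '%f'` prints the raw mode in hexadecimal, so converting it to octal yields the `<octal file type><octal permission mask>` format that `x-amz-meta-file-permissions` expects.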

**Note**  
Amazon File Cache doesn't retain any user-defined custom metadata on S3 objects.

# Walkthrough: Attaching POSIX permissions when uploading objects into an Amazon S3 bucket
<a name="attach-s3-posix-permissions"></a>

The following procedure walks you through the process of uploading objects into Amazon S3 with POSIX permissions. By doing so, you can import the POSIX permissions when you create an Amazon File Cache that is linked to that S3 bucket.

**To upload objects with POSIX permissions to Amazon S3**

1. From your local computer, use the following commands to create a test directory (`s3cptestdir`) and file (`s3cptest.txt`) to upload to the S3 bucket.

   ```
   $ mkdir s3cptestdir
   $ echo "S3cp metadata import test" >> s3cptestdir/s3cptest.txt
   $ ls -ld s3cptestdir/ s3cptestdir/s3cptest.txt
   drwxr-xr-x 3 500 500 96 Jan 8 11:29 s3cptestdir/
   -rw-r--r-- 1 500 500 26 Jan 8 11:29 s3cptestdir/s3cptest.txt
   ```

   The newly created file and directory have a file owner user ID (UID) and group ID (GID) of 500, and permissions as shown in the preceding example.

1. Call the Amazon S3 API to create the directory `s3cptestdir` with metadata permissions. You must specify the directory name with a trailing slash (`/`). For information about supported POSIX metadata, see [POSIX metadata support for data repositories](posix-metadata-support.md).

   Replace `bucket_name` with the actual name of your S3 bucket.

   ```
   $ aws s3api put-object --bucket bucket_name --key s3cptestdir/ --metadata '{"user-agent":"aws-fsx-lustre" , \
         "file-atime":"1595002920000000000ns" , "file-owner":"500" , "file-permissions":"0100664","file-group":"500" , \
         "file-mtime":"1595002920000000000ns"}'
   ```

1. Verify that the POSIX permissions are tagged to S3 object metadata.

   ```
   $ aws s3api head-object --bucket bucket_name --key s3cptestdir/
   {
       "AcceptRanges": "bytes",
       "LastModified": "Fri, 08 Jan 2021 17:32:27 GMT",
       "ContentLength": 0,
       "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
       "VersionId": "bAlhCoWq7aIEjc3R6Myc6UOb8sHHtJkR",
       "ContentType": "binary/octet-stream",
       "Metadata": {
           "user-agent": "aws-fsx-lustre",
           "file-atime": "1595002920000000000ns",
           "file-owner": "500",
           "file-permissions": "0100664",
           "file-group": "500",
           "file-mtime": "1595002920000000000ns"
       }
   }
   ```

1. Upload the test file (created in step 1) from your computer to the S3 bucket with metadata permissions.

   ```
   $ aws s3 cp s3cptestdir/s3cptest.txt s3://bucket_name/s3cptestdir/s3cptest.txt \
         --metadata '{"user-agent":"aws-fsx-lustre" , "file-atime":"1595002920000000000ns" , \
         "file-owner":"500" , "file-permissions":"0100664","file-group":"500" , "file-mtime":"1595002920000000000ns"}'
   ```

1. Verify that the POSIX permissions are tagged to S3 object metadata.

   ```
   $ aws s3api head-object --bucket bucket_name --key s3cptestdir/s3cptest.txt
   {
       "AcceptRanges": "bytes",
       "LastModified": "Fri, 08 Jan 2021 17:33:35 GMT",
       "ContentLength": 26,
       "ETag": "\"eb33f7e1f44a14a8e2f9475ae3fc45d3\"",
       "VersionId": "w9ztRoEhB832m8NC3a_JTlTyIx7Uzql6",
       "ContentType": "text/plain",
       "Metadata": {
           "user-agent": "aws-fsx-lustre",
           "file-atime": "1595002920000000000ns",
           "file-owner": "500",
           "file-permissions": "0100664",
           "file-group": "500",
           "file-mtime": "1595002920000000000ns"
       }
   }
   ```

1. Verify permissions on the Amazon File Cache linked to the S3 bucket.

   ```
   $ sudo lfs df -h /fsx
   UUID                       bytes        Used   Available Use% Mounted on
   3rnxfbmv-MDT0000_UUID       34.4G        6.1M       34.4G   0% /fsx[MDT:0]
   3rnxfbmv-OST0000_UUID        1.1T        4.5M        1.1T   0% /fsx[OST:0]
    
   filesystem_summary:         1.1T        4.5M        1.1T   0% /fsx
    
   $ cd /fsx
   $ ls -ld s3cptestdir/
   drw-rw-r-- 2 500 500 25600 Jan  8 17:33 s3cptestdir/
   
   $ ls -ld s3cptestdir/s3cptest.txt
   -rw-rw-r-- 1 500 500 26 Jan 8 17:33 s3cptestdir/s3cptest.txt
   ```

Both the `s3cptestdir` directory and the `s3cptest.txt` file have their POSIX permissions imported from the S3 object metadata.

# Prerequisites for linking to on-premises NFS data repositories
<a name="nfs-filer-prereqs"></a>

Before you can link your cache to an on-premises NFS data repository, verify that your resources and configurations meet the following requirements:
+ Your on-premises NFS file system must support NFSv3.
+ If you're using a domain name to link your NFS file system to Amazon File Cache, you must provide the IP address of a DNS server that Amazon File Cache can use to resolve the domain name of the on-premises NFSv3 file system. The DNS server can be located in the VPC where you plan to create the cache, or it can be on your on-premises network accessible from your VPC.
+ The DNS server and on-premises NFSv3 file system must use private IP addresses, as specified in RFC 1918:
  + 10.0.0.0-10.255.255.255 (10/8 prefix)
  + 172.16.0.0-172.31.255.255 (172.16/12 prefix)
  + 192.168.0.0-192.168.255.255 (192.168/16 prefix)
+ You must establish an AWS Direct Connect or VPN connection between your on-premises network and the Amazon VPC where your Amazon File Cache is located. For more information about Direct Connect, see the [AWS Direct Connect User Guide](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html). For more information about setting up a VPN connection, see the [AWS Site-to-Site VPN User Guide](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html).
**Important**  
Use an AWS Site-to-Site VPN connection if you want to encrypt data as it transits between your Amazon VPC and your on-premises network. For more information, see [What is AWS Site-to-Site VPN?](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html)
+ Your on-premises firewall must allow traffic between IP addresses in your Amazon VPC subnet IP CIDR and the IP addresses of the DNS server and the on-premises NFSv3 file system. The following ports must be open for the daemons involved in sharing data via NFS:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/fsx/latest/FileCacheGuide/nfs-filer-prereqs.html)

  You can use the following command to look up dynamic ports for your on-premises NFS servers:

  ```
  rpcinfo -p localhost
  ```
+ Your on-premises NFSv3 file system must be configured to allow access from IP addresses in the Amazon VPC where the cache is located.
+ The Amazon VPC Security Group used for your cache must be configured to allow outbound traffic to the IP addresses of the DNS server and on-premises NFSv3 file system. Make sure to add outbound rules to allow port 53 for both UDP and TCP for DNS traffic, and to allow the TCP ports used by the on-premises NFSv3 file system for NFS. For more information, see [Controlling access using inbound and outbound rules](limit-access-security-groups.md#inbound-outbound-rules). 
+ Amazon File Cache supports NFSv3 file systems with most NFSv3 export policies, but you must not use the NFS export option `all_squash`. This restriction ensures that Amazon File Cache has the necessary permissions to read and write files owned by all users on your NFSv3 file system.
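
As a quick sanity check, you can script the RFC 1918 test from the private-address requirement above. The `is_rfc1918` helper below is hypothetical (not part of any AWS tooling) and only inspects the first two octets of a dotted-quad address:

```shell
# Return success (0) if the dotted-quad IPv4 address is in an RFC 1918
# private range: 10/8, 172.16/12, or 192.168/16.
is_rfc1918() {
    a=${1%%.*}; rest=${1#*.}; b=${rest%%.*}
    case "$a" in
        10)  return 0 ;;
        192) [ "$b" -eq 168 ] && return 0 ;;
        172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0 ;;
    esac
    return 1
}

is_rfc1918 10.0.1.5    && echo "10.0.1.5 is private"
is_rfc1918 203.0.113.7 || echo "203.0.113.7 is public"
```

You could run such a check against your DNS server and NFS server addresses before attempting to link the cache.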