

# Importing files from your data repository
<a name="importing-files"></a>

When you create an Amazon File Cache resource, you can create a data repository association (DRA) to link your cache to an Amazon S3 or NFS data repository. When your application accesses a file that doesn't already exist in the cache, Amazon File Cache transparently copies its content from your repository and loads it into the cache.

You can also preload your whole cache or an entire directory within your cache. For more information, see [Preloading files into your cache](preload-file-contents-hsm.md).

This data movement is managed by Amazon File Cache and occurs transparently to your applications. Subsequent reads of these files are served directly out of Amazon File Cache with consistent sub-millisecond latencies. If you request the preloading of multiple files simultaneously, Amazon File Cache loads your files from your linked data repository in parallel. For more information, see [Lazy load](mdll-lazy-load.md).

Amazon File Cache *only* imports objects that have POSIX-compliant object keys, such as:

```
test/mydir/
test/
```

**Note**  
For a linked S3 bucket, Amazon File Cache doesn't support importing metadata for symbolic links (symlinks) from S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes. Metadata for S3 Glacier Flexible Retrieval objects that are not symlinks can be imported (that is, an inode is created on the cache with the correct metadata). However, to retrieve the data, you must restore the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive object first and then use an `hsm_restore` command to import the object. Importing file data directly from Amazon S3 objects in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage class into Amazon File Cache is not supported.
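For example, a restore-then-import sequence might look like the following (the bucket name, object key, and cache path are hypothetical, and the S3 restore must finish before you run `hsm_restore`):

```
# Restore the archived object in S3 first (Days controls how long the
# restored copy remains available).
aws s3api restore-object \
    --bucket my-bucket \
    --key data/myfile \
    --restore-request '{"Days": 7}'
# After the S3 restore completes, import the object's data into the cache.
sudo lfs hsm_restore /mnt/cache/dra1/data/myfile
```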

# Lazy load
<a name="mdll-lazy-load"></a>

When you access data on a linked Amazon S3 or NFS data repository using the cache, Amazon File Cache automatically loads the metadata (the name, ownership, timestamps, and permissions) and file contents if they're not already present in the cache. The data in your data repositories appears as files and directories in the cache. 

Lazy load is triggered when you're in a DRA directory and you read or write data or metadata to a file. Amazon File Cache loads data into the cache from the linked data repositories if it's not already available. For example, lazy load is triggered when you open a file, stat a file, or make metadata updates to the file.

You can also trigger lazy load by using the `ls` command to list the contents of a DRA directory. If you're at the root of a directory hierarchy that includes several DRA directories, the `ls` command uses lazy load on all the DRA directories in the hierarchy. For example, if you're at `/` in the directory tree, and your four DRAs are `/a`, `/b`, `/c`, and `/d`, then running a recursive `ls` command populates metadata for all four DRAs. To run a recursive `ls` command, use the `-R` option shown in the examples below:

```
ls -R
ls -R /tmp/dir1
```

When you use the `ls` or `stat` commands, Amazon File Cache loads only file and directory metadata for the requested files; no file content is downloaded. The data from a file in the data repository is downloaded to your cache only when the file is read.

**Note**  
Amazon File Cache only loads a directory listing the first time `ls` is run on a directory. Subsequently, if new files are added or existing files are changed in the corresponding directory in the linked data repository, you can `stat` the file path to update the directory listing.

# Preloading files into your cache
<a name="preload-file-contents-hsm"></a>

If the data you're accessing doesn't already exist in the cache, Amazon File Cache copies it from your Amazon S3 or NFS data repository into the cache when your application first accesses it. Because of this approach, the initial read or write to a file incurs a small amount of latency. If your application is sensitive to this latency, and you know which files or directories your application needs to access, you can optionally preload the contents of individual files or directories using the `hsm_restore` command.

You can use the `hsm_action` command (issued with the `lfs` user utility) to verify that the file's contents have finished loading into the cache. A return value of `NOOP` indicates that the file has successfully been loaded. Run the following commands from a compute instance with the cache mounted. Replace *path/to/file* with the path of the file you're preloading into your cache.

```
sudo lfs hsm_restore path/to/file
sudo lfs hsm_action path/to/file
```
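If you script preloading, you can poll `hsm_action` until it reports `NOOP`. The helper below is a minimal sketch (the function name is ours, and it assumes the cache is mounted and `lfs` is on your PATH; prefix the `lfs` call with `sudo` if needed):

```
# Poll hsm_action once per second until the file's contents have
# finished loading into the cache (hsm_action reports NOOP).
wait_for_restore() {
    while ! lfs hsm_action "$1" | grep -q NOOP; do
        sleep 1
    done
}
# Usage: wait_for_restore path/to/file
```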

You can preload your whole cache or an entire directory within your cache by using the following commands. (The trailing ampersand makes a command run as a background process.) If you request the preloading of multiple files simultaneously, Amazon File Cache loads your files from your linked data repository in parallel.

```
nohup find local/directory -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore &
```
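Before running the pipeline against a large directory, you can dry-run the same `find`/`xargs` plumbing with `echo` standing in for `sudo lfs hsm_restore`. The scratch directory below is only for illustration; the `-print0`/`-0` pairing keeps filenames with spaces or newlines intact:

```
# Build a small scratch tree and print the files the pipeline would restore.
dir=$(mktemp -d)
touch "$dir/a.dat" "$dir/b c.dat"
find "$dir" -type f -print0 | xargs -0 -n 1 echo
```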

**Note**  
If your linked data repository is larger than your cache, you can only load as much actual file data as will fit into the cache's remaining storage space. You'll receive an error if you attempt to access file data when there's no more storage remaining in the cache.