# Call caching for HealthOmics runs
<a name="workflows-call-caching"></a>

AWS HealthOmics supports call caching, also known as resume, for private workflows. Call caching saves the outputs of completed workflow tasks after a run finishes. Subsequent runs can use the task outputs from the cache, rather than computing the task outputs again. Call caching reduces compute resource usage, which results in shorter run durations and compute cost savings.

You can access the cached task output files after the run completes. To perform advanced task debugging and troubleshooting, you can cache intermediate task files by specifying these files as task outputs in the workflow definition.

You can use call caching to save the completed task results from failed runs. The next run starts from the last successfully completed task, rather than computing the completed tasks again. 

If HealthOmics doesn't find a matching cache entry for a task, the run doesn't fail. HealthOmics recomputes the task and its dependent tasks. 

For information about troubleshooting call caching issues, see [Troubleshooting call caching issues](troubleshooting.md#workflow-cache-troubleshooting).

**Topics**
+ [How call caching works](how-run-cache.md)
+ [Creating a run cache](workflow-cache-create.md)
+ [Updating a run cache](workflow-cache-update.md)
+ [Deleting a run cache](workflow-cache-delete.md)
+ [Contents of a run cache](workflow-cache-contents.md)
+ [Engine-specific caching features](workflow-cache-per-engine.md)
+ [Using the run cache](workflow-cache-startrun.md)

# How call caching works
<a name="how-run-cache"></a>

To use call caching, you create a run cache and configure it to have an associated Amazon S3 location for the cached data. When you start a run, you specify the run cache. A run cache isn't dedicated to one workflow. Runs from multiple workflows can use the same cache.

During the export phase of a run, the system exports the completed task outputs to the Amazon S3 location. To export intermediate task files, declare these files as task outputs in the workflow definition. Call caching also internally saves metadata and creates unique hashes for each cache entry. 

For each task in a run, the workflow engine detects whether there is a matching cache entry for this task. If there is no matching cache entry, HealthOmics computes the task. If there is a matching cache entry, the engine retrieves the cached results.

To match cache entries, HealthOmics uses the hashing mechanism that's included in the native workflow engines. HealthOmics extends these existing hash implementations to account for HealthOmics variables, such as S3 eTags and ECR container digests.
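To make the matching idea concrete, the following Python sketch derives a cache key from a task's command, container digest, and input eTags, and then reuses a stored result on a hit. It is purely illustrative: the function names, the key layout, and the choice of SHA-256 are assumptions for this example, and the real hashing is internal to the workflow engines and HealthOmics.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(command: str, container_digest: str, input_etags: Dict[str, str]) -> str:
    """Hash the values that identify a task execution (illustrative layout only)."""
    payload = json.dumps(
        {"command": command, "container": container_digest, "inputs": input_etags},
        sort_keys=True,  # stable ordering so equal inputs always give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(cache: Dict[str, str], key: str, compute: Callable[[], str]) -> str:
    """Return the cached result on a hit; otherwise compute the task and store it."""
    if key in cache:
        return cache[key]
    result = compute()
    cache[key] = result
    return result
```

In this sketch, changing any input eTag or the container digest produces a different key, so the task is recomputed instead of reusing a stale entry.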

HealthOmics supports call caching for these workflow language versions: 
+ WDL versions 1.0, 1.1, and the development version
+ Nextflow version 23.10 and later
+ All CWL versions

**Note**  
HealthOmics doesn't support call caching for Ready2Run workflows.

**Topics**
+ [Shared responsibility model](#run-cache-srm)
+ [Caching requirements for tasks](#workflow-cache-task-prereqs)
+ [Run cache performance](#run-cache-performance)
+ [Cache data retention and invalidation events](#workflow-cache-data)

## Shared responsibility model
<a name="run-cache-srm"></a>

There is a shared responsibility between users and AWS to determine whether tasks and runs are good candidates for call caching. Call caching achieves the best outcomes when all tasks are idempotent (repeated executions of a task using the same inputs produce the same results). 

However, if a task includes non-deterministic elements (such as random number generation or system time), repeated executions of the task using the same inputs may result in different outputs. This can impact the effectiveness of call caching in the following ways:
+ If HealthOmics uses a cache entry (created by a previous run) that is not identical to the output that the task execution would produce for the current run, the run may yield different results than the same run with no caching.
+ HealthOmics may not find a matching cache entry for a task that should match, because of non-deterministic task outputs. If it doesn't find the valid cache entry, the run unnecessarily recomputes the task, which reduces the cost saving benefits of using call caching.

The following are known task behaviors that can cause non-deterministic results that affect call caching outcomes:
+ Using random number generators.
+ Dependence on the system time. 
+ Using concurrency (race-conditions can cause output variance). 
+ Fetching local or remote files beyond what is specified in the task input parameters.

For other scenarios that can cause non-deterministic behavior, see [Non-deterministic process inputs](https://www.nextflow.io/docs/latest/cache-and-resume.html#non-deterministic-process-inputs) on the Nextflow documentation site.

If you suspect that a task produces outputs that are non-deterministic, consider using workflow engine features to avoid caching specific tasks that are non-deterministic. For instructions on how to opt out of caching for individual tasks in each supported workflow language, see [Engine-specific caching features](workflow-cache-per-engine.md).
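One way to check whether a task is deterministic is to execute it twice with the same inputs and compare the output files. The following Python sketch assumes that you have copied the outputs of the two executions into two local directories; the function names are hypothetical and aren't part of HealthOmics.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_outputs(output_dir: Path) -> Dict[str, str]:
    """Map each output file (relative path) to the SHA-256 digest of its bytes."""
    digests = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(output_dir).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_outputs(first_dir: str, second_dir: str) -> List[str]:
    """List files that are missing from one execution or differ between the two."""
    first = digest_outputs(Path(first_dir))
    second = digest_outputs(Path(second_dir))
    names = set(first) | set(second)
    return sorted(name for name in names if first.get(name) != second.get(name))
```

An empty result suggests that the two executions produced identical outputs. Any listed file is a candidate for closer inspection before you rely on cached results for that task.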

We recommend that you thoroughly review your specific workflow and task requirements before enabling call caching in any environments in which ineffective call caching or different outputs than expected can present risk. For example, the potential limitations of call caching should be carefully considered in determining whether call caching is appropriate for clinical use cases.

## Caching requirements for tasks
<a name="workflow-cache-task-prereqs"></a>

HealthOmics caches task outputs for tasks that meet the following requirements:
+ The task must define a container. HealthOmics won't cache outputs for a task with no container.
+ The task must produce one or more outputs. You specify task outputs in the workflow definition.
+ The workflow definition must not use dynamic values. For example, if you pass a parameter to a task with a value that increments with every run, HealthOmics doesn't cache the task outputs. 

**Note**  
If multiple tasks in a run use the same container image, HealthOmics provides the same image version to all of these tasks. After HealthOmics pulls the image, it ignores any updates to the container image for the duration of the run. This approach provides a predictable and consistent experience and prevents potential issues that could arise from updates to the container image that are deployed mid-run.

## Run cache performance
<a name="run-cache-performance"></a>

When you turn on call caching for a run, you may notice the following impacts on run performance: 
+ During the first run, HealthOmics saves the cache data for tasks in the run. You may experience longer export times for this run, because call caching increases the amount of export data.
+ In subsequent runs, tasks that resume from the cache skip recomputation, which can reduce the number of processing steps and shorten your run time.
+ If you also declare intermediate files as task outputs, export times might be even longer, because HealthOmics exports this additional data.

## Cache data retention and invalidation events
<a name="workflow-cache-data"></a>

The main purpose of a run cache is to optimize computation of tasks in the run. If there is a valid matching cache entry for a task, HealthOmics uses the cache entry instead of recomputing the task. Otherwise, HealthOmics reverts to the default service behavior, which is to recompute the task and its dependent tasks. By using this approach, cache misses don't cause the run to fail. 

We recommend that you manage the run cache size. Over time, cache entries may no longer be valid because of workflow engine or HealthOmics service updates or because of changes you made in the run or the run tasks. The following sections provide additional details. 

**Topics**
+ [Manifest version updates and data freshness](#workflow-cache-data-versions)
+ [Run cache behavior](#run-cache-behavior)
+ [Control run cache size](#workflow-cache-manage)

### Manifest version updates and data freshness
<a name="workflow-cache-data-versions"></a>

Periodically, the HealthOmics service may introduce new features or workflow engine updates that invalidate some or all run cache entries. In this situation, your runs can experience a one-time cache miss. 

HealthOmics creates a [JSON manifest file](workflow-cache-contents.md) for each cache entry. For runs started after February 12th 2025, the manifest file includes a version parameter. If a service update invalidates any cache entries, HealthOmics increments the version number so that you can identify the legacy cache entries for removal. 

The following example shows a manifest file with the version set to 2:

```
{
     "arn": "arn:aws:omics:us-west-2:12345678901:runCache/0123456/cacheEntry/1234567-195f-3921-a1fa-ffffcef0a6a4",
     "s3uri": "s3://example/1234567-d0d1-e230-d599-10f1539f4a32/1348677/4795326/7e8c69b1-145f-3991-a1fa-ffffcef0a6a4",
     "taskArn": "arn:aws:omics:us-west-2:12345678901:task/4567891",
     "workDir": "/mnt/workflow/1234567-d0d1-e230-d599-10f1539f4a32/workdir/call-TxtFileCopyTask/5w6tn5feyga7noasjuecdeoqpkltrfo3/wxz2fuddlo6hc4uh5s2lreaayczduxdm",
     "files": [
         {
             "name": "output_txt_file",
             "path": "out/output_txt_file/outfile.txt",
             "etag": "ajdhyg9736b9654673b9fbb486753bc8"
         }
     ],
     "nextflowContext": {},
     "otherOutputs": {},
     "version": 2
}
```
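If you download manifest files from the cache bucket, you can inspect the version field programmatically. The following Python sketch is a minimal example under two assumptions that you should verify for your own cache: you know which manifest version is current, and you choose to treat a manifest without a `version` field (created before the field was introduced) as legacy. The function names are hypothetical.

```python
import json
from typing import Optional


def manifest_version(manifest_text: str) -> Optional[int]:
    """Return the manifest version, or None if the field is absent."""
    manifest = json.loads(manifest_text)
    return manifest.get("version")


def is_legacy(manifest_text: str, current_version: int) -> bool:
    """Flag entries older than current_version (assumption: no version means legacy)."""
    version = manifest_version(manifest_text)
    return version is None or version < current_version
```

For example, `is_legacy(text, current_version=2)` returns `False` for a manifest with version 2 and `True` for a manifest with version 1.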

For runs with cache entries that are no longer valid, rebuild the cache to create new valid entries. Perform the following steps for each run:

1. Start the run once with the cache behavior set to `CACHE_ALWAYS`. This run creates the new cache entries.

1. For subsequent runs, set the cache behavior to its former setting (`CACHE_ALWAYS` or `CACHE_ON_FAILURE`).

To clean up cache entries that are no longer valid, you can delete these cache entries from the cache Amazon S3 bucket. HealthOmics never reuses these cache entries. If you choose to retain entries that aren't valid, there is no impact on your runs.

**Note**  
Call caching saves task output data in the Amazon S3 location specified for the cache, which incurs charges to your AWS account.

### Run cache behavior
<a name="run-cache-behavior"></a>

You can set run cache behavior to save the task outputs for runs that fail (cache on failure) or for all runs (cache always). When you create a run cache, you set the default cache behavior for all runs that use this cache. You can override the default behavior when you start a run.

**Cache on failure** is useful if you're debugging a workflow that fails after several tasks completed successfully. The subsequent run resumes from the last successfully completed task if all the unique variables considered by the hash are identical to the prior run.

**Cache always** is useful if you're updating a task in a workflow that completes successfully. We recommend that you follow these steps:

1. Create a new run. Set the **Cache behavior** to **Cache always**, and start the run.

1. After the run completes, update the task in the workflow and start a new run with the behavior set to **Cache always**. This run processes the updated task and any subsequent tasks that have a dependency on the updated task. All other tasks use the cached results.

1. Repeat step 2 as required, until development is complete for the updated task.

1. Use the updated task as needed on future runs. Remember to switch subsequent runs to **Cache on failure** if you plan to use new or different inputs for these runs.

**Note**  
We recommend **Cache always** mode while using the same test data set, but not for a batch of runs. If you set this mode for a large batch of runs, the system can export large amounts of data to Amazon S3, resulting in increased export times and storage costs.

### Control run cache size
<a name="workflow-cache-manage"></a>

HealthOmics doesn't delete or auto-archive any run cache data or apply Amazon S3 clean-up rules for managing the cache data. We recommend that you perform regular cache clean-ups to save on Amazon S3 storage costs and to keep your run cache size manageable. You can delete files directly or set data retention/replication policies on the run cache bucket. 

For example, you can configure an Amazon S3 lifecycle policy to expire objects after 90 days, or you can manually clean up the cache data at the end of each development project.
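The following Python sketch builds a lifecycle configuration document with a single 90-day expiration rule. The function name, the rule ID, and the `run-cache/` prefix are assumptions for illustration; use the prefix that matches your own cache location.

```python
import json
from typing import Any, Dict


def expiration_lifecycle(prefix: str, days: int = 90) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-run-cache-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Print the document so that it can be saved to a file such as lifecycle.json.
    print(json.dumps(expiration_lifecycle("run-cache/"), indent=2))
```

You could save the output to a file and apply it with `aws s3api put-bucket-lifecycle-configuration --bucket amzn-s3-demo-bucket --lifecycle-configuration file://lifecycle.json`. This command replaces any lifecycle configuration that already exists on the bucket, so merge the rule with your existing rules first.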

The following information can help you manage cache data size:
+ You can view how much data is in the cache by checking Amazon S3. HealthOmics doesn't monitor or report on cache size.
+ If you delete a valid cache entry, the subsequent run doesn't fail. HealthOmics recomputes the task and its dependent tasks.
+ If you modify cache names or directory structures such that HealthOmics can’t find a matching entry for a task, HealthOmics recomputes the task.
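Because HealthOmics doesn't report cache size, one option is to total the object sizes from an Amazon S3 listing. The following Python sketch assumes that you saved the JSON output of `aws s3api list-objects-v2` for the cache prefix; the function names are hypothetical.

```python
from typing import Any, Dict


def total_cache_bytes(listing: Dict[str, Any]) -> int:
    """Sum the Size of every object in a list-objects-v2 JSON response."""
    return sum(obj.get("Size", 0) for obj in listing.get("Contents", []))


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"
```

For example, load the saved listing with `json.load` and pass it to `total_cache_bytes`, then format the result with `human_readable`.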

If you need to check whether a cache entry is still valid, check the cache manifest version number. For more information, see [Manifest version updates and data freshness](#workflow-cache-data-versions).

# Creating a run cache
<a name="workflow-cache-create"></a>

When you create a run cache, you specify an Amazon S3 location for the cache data. This data must be immediately accessible. Call caching doesn't retrieve objects archived in Amazon S3 Glacier storage classes (such as S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive).

If the Amazon S3 bucket for the cache data is owned by another AWS account, provide that account ID when you create the run cache.

## Creating a run cache using the console
<a name="workflow-cache-create-console"></a>

From the console, follow these steps to create a run cache.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Run caches**.

1. From the **Run caches** page, choose **Create run cache**.

1. In the **Run cache details** panel of the **Create run cache** page, configure these fields:

   1. Enter a name for the run cache.

   1. (Optional) Enter a description.

   1. Enter an S3 location for the cached output. Choose a bucket in the same Region as your workflow.

   1. (Optional) Enter the AWS account of the bucket owner to verify bucket ownership. If you don't enter a value, the default value is your account ID.

   1. Under **Cache behavior**, configure the default behavior (whether to cache outputs for failed runs or for all runs). When you start a run, you can optionally override the default behavior. 

1. (Optional) Associate one or more tags with the run cache.

1. Choose **Create run cache**. The console displays the new run cache in the **Run caches** table.

## Creating a run cache using the CLI
<a name="workflow-cache-create-api"></a>

Use the **create-run-cache** CLI command to create a run cache. The default cache behavior is `CACHE_ON_FAILURE`.

```
aws omics create-run-cache \
      --name "workflow 123 run cache" \
      --description "my run cache" \
      --cache-s3-location "s3://amzn-s3-demo-bucket" \
      --cache-behavior "CACHE_ALWAYS" \
      --cache-bucket-owner-id  "111122223333"
```

If the create is successful, you receive a response with the following fields.

```
{
  "arn": "string",
  "id": "string",
  "status": "ACTIVE",
  "tags": {}
}
```

# Updating a run cache
<a name="workflow-cache-update"></a>

You can change the cache name, description, tags, or cache behavior, but not the S3 location for the cache.

## Updating a run cache using the console
<a name="workflow-cache-update-console"></a>

From the console, follow these steps to update a run cache.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Run caches**.

1. From the **Run caches** table, choose the run cache to update, then choose **Edit**. 

1. In the **Run cache details** panel, you can update the run cache name, description, and cache behavior fields.

1. (Optional) Associate one or more new tags with the run cache, or remove existing tags.

1. Choose **Save run cache**.

## Updating a run cache using the CLI
<a name="workflow-cache-update-api"></a>

Use the **update-run-cache** CLI command to update a run cache.

```
aws omics update-run-cache \
      --name "workflow 123 run cache" \
      --id "run cache id" \
      --description "my run cache" \
      --cache-behavior "CACHE_ALWAYS"
```

If the update is successful, you receive a response with no data fields.

# Deleting a run cache
<a name="workflow-cache-delete"></a>

You can delete a run cache if no active runs are using it. If any runs are using the run cache, wait for the runs to complete or cancel them before you delete the cache.

Deleting a run cache removes the resource and its metadata, but doesn't delete the data in Amazon S3. After you delete the cache, you can't reattach it or use it for subsequent runs.

The cached data remains in Amazon S3 for your inspection. You can remove old cache data using standard S3 **Delete** operations. Alternatively, create an Amazon S3 lifecycle policy to expire cached data that you no longer use.

## Deleting a run cache using the console
<a name="workflow-cache-delete-console"></a>

From the console, follow these steps to delete a run cache.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Run caches**.

1. From the **Run caches** table, choose the run cache to delete.

1. From the **Run caches** table menu, choose **Delete**.

1. From the modal dialog, save the Amazon S3 cache data link for future reference, then confirm that you want to delete the run cache.

    You can use the Amazon S3 link to inspect the cached data, but you can't relink the data to another run cache. Delete the cache data when you've finished the inspection.

## Deleting a run cache using the CLI
<a name="workflow-cache-delete-api"></a>

Use the **delete-run-cache** CLI command to delete a run cache. 

```
aws omics delete-run-cache \
      --id "my cache id"
```

If the delete is successful, you receive a response with no data fields.

# Contents of a run cache
<a name="workflow-cache-contents"></a>

HealthOmics organizes your run cache with the following structure in your S3 bucket:

```
s3://{cache.S3location}/{cache.uuid}/{runID}/{taskID}/{cacheentry.uuid}/
```

The `cache.uuid` is the globally unique ID for the cache. The `cacheentry.uuid` is the globally unique ID for a cached task. HealthOmics assigns these UUIDs to caches and cached tasks.
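As a rough illustration of this layout, the following Python sketch splits a cache entry URI into its components. It assumes that the last four path segments are always the cache UUID, run ID, task ID, and cache entry UUID, and that everything before them is the cache S3 location. The class and function names are hypothetical.

```python
from typing import NamedTuple


class CacheEntryLocation(NamedTuple):
    cache_location: str  # the S3 location configured for the run cache
    cache_uuid: str
    run_id: str
    task_id: str
    entry_uuid: str


def parse_cache_entry_uri(uri: str) -> CacheEntryLocation:
    """Split a cache entry URI into the components of the cache layout."""
    if not uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    parts = uri[len("s3://"):].strip("/").split("/")
    if len(parts) < 5:
        raise ValueError("URI is too short to be a cache entry location")
    # The cache location can include a prefix, so take the components from the end.
    *location, cache_uuid, run_id, task_id, entry_uuid = parts
    return CacheEntryLocation(
        "s3://" + "/".join(location), cache_uuid, run_id, task_id, entry_uuid
    )
```

You can use the parsed run ID or cache UUID to group cache entries, for example when you decide which prefixes to clean up.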

For all workflow engines, the cache contains the following files: 
+ The **{cacheentryuuid}.json** file – HealthOmics creates this manifest file, which contains information about the cache, including a list of all items in the cache, and the [cache version](how-run-cache.md#workflow-cache-data-versions).
+ Task output files – Each task output consists of one or more files, as defined by the task. 

For a workflow that uses Nextflow, the Nextflow engine creates these additional files in the cache:
+ The **command.out** file – This file contains the task execution stdout contents.
+ The **.exitcode** file – This file contains the task exit code (an integer).

**Note**  
If you want to access intermediate task files in your run cache for advanced troubleshooting, declare these files as task outputs in the workflow definition.

# Engine-specific caching features
<a name="workflow-cache-per-engine"></a>

HealthOmics tries to provide a consistent implementation of call caching across workflow engines. There are some differences based on how each workflow engine handles specific cases:
+ Nextflow
  + Caching across different Nextflow versions is not guaranteed. If you run a task on one Nextflow version and then run the same task on a different Nextflow version, HealthOmics might consider the second run to be a cache miss.
  + You can turn off caching for individual tasks by using the cache **false** directive. For information about this directive, see [Processes](https://www.nextflow.io/docs/latest/process.html#process-cache) in the Nextflow documentation.
  + HealthOmics uses Nextflow lenient mode, but doesn't support deep caching mode. 
  + Caching evaluates each individual S3 object if you use a glob pattern in the S3 path to the inputs for a task. If you add a new object, HealthOmics recomputes only the tasks that use the new object.
  + HealthOmics doesn't cache task retries. This behavior is consistent with Nextflow’s default behavior.
+ WDL
  + HealthOmics supports the new “directory” type for inputs when you use the development version of the WDL workflow. For call caching, if any object in the directory changes, HealthOmics recomputes all tasks that input the directory.
  + HealthOmics supports task-level caching, but not workflow-level caching. 
  + You can disable caching for individual tasks by using the **volatile** attribute. For more information, see [Disable task-level caching with the volatile attribute](workflow-languages-wdl.md#workflow-wdl-volatile-attribute).
+ CWL
  + Constant outputs from tasks aren't explicitly visible from the manifests. HealthOmics caches constant outputs as intermediate files.
  + You can control caching for individual tasks by using the [WorkReuse](https://www.commonwl.org/v1.1/Workflow.html#WorkReuse) feature.

# Using the run cache
<a name="workflow-cache-startrun"></a>

By default, runs don't use a run cache. To use a cache for the run, you specify the run cache and the run cache behavior when you start the run.

After a run completes, you can use the console, CloudWatch Logs, or API operations to track cache hits or troubleshoot cache issues. For details, see [Tracking call caching information](#workflow-cache-track) and [Troubleshooting call caching issues](troubleshooting.md#workflow-cache-troubleshooting).

If one or more tasks in a run generate non-deterministic outputs, we strongly recommend that you either don't use call caching for the run or opt these specific tasks out of caching. For more information, see [Shared responsibility model](how-run-cache.md#run-cache-srm).


**Note**  
You provide an IAM service role when you start a run. To use call caching, the service role needs permission to access the run cache Amazon S3 location. For more information, see [Service roles for AWS HealthOmics](permissions-service.md).

You can use [Kiro CLI](https://docs.aws.amazon.com/kiro/latest/userguide/what-is.html) to analyze and manage your run cache data. For more information, see [Example prompts for Kiro CLI](getting-started.md#omics-kiro-prompts) and the [HealthOmics Agentic generative AI tutorial](https://github.com/aws-samples/aws-healthomics-tutorials/tree/main/generative-ai) on GitHub.

**Topics**
+ [Configuring a run with run cache using the console](#workflow-cache-startrun-console)
+ [Configuring a run with run cache using the CLI](#workflow-cache-startrun-api)
+ [Error cases for run caches](#workflow-cache-errors)
+ [Tracking call caching information](#workflow-cache-track)

## Configuring a run with run cache using the console
<a name="workflow-cache-startrun-console"></a>

From the console, you configure the run cache for a run when you start the run.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Runs**.

1. On the **Runs** page, choose the run to start.

1. Choose **Start run** and complete steps 1 and 2 of **Start run** as described in [Starting a run using the console](starting-a-run.md#starting-a-run-console). 

1. In step 3 of **Start run**, choose **Select an existing run cache**. 

1. Select the cache from the **Run cache ID** drop-down list. 

1. To override the default run cache behavior, choose the **Cache behavior** for the run. For more information, see [Run cache behavior](how-run-cache.md#run-cache-behavior).

1. Continue to step 4 of **Start run**.

## Configuring a run with run cache using the CLI
<a name="workflow-cache-startrun-api"></a>

To start a run that uses a run cache, add the `cache-id` parameter to the **start-run** CLI command. Optionally, use the `cache-behavior` parameter to override the default behavior that you configured for the run cache. The following example shows only the cache fields for the command:

```
aws omics start-run \
        ...  
      --cache-id "xxxxxx"    \
      --cache-behavior  CACHE_ALWAYS
```

If the operation is successful, you receive a response that includes identifying details for the new run, such as its ID and status.
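If you build the **start-run** command from a script, you might validate the cache options before you call the CLI. The following Python sketch is a hypothetical helper (not part of the AWS CLI or an AWS SDK) that returns only the cache-related arguments and accepts only the two cache behavior values described in this guide.

```python
from typing import List, Optional

VALID_BEHAVIORS = ("CACHE_ON_FAILURE", "CACHE_ALWAYS")


def cache_arguments(cache_id: str, cache_behavior: Optional[str] = None) -> List[str]:
    """Return the cache-related arguments for an `aws omics start-run` command."""
    if not cache_id:
        raise ValueError("cache_id is required to use a run cache")
    args = ["--cache-id", cache_id]
    if cache_behavior is not None:
        if cache_behavior not in VALID_BEHAVIORS:
            raise ValueError(f"cache_behavior must be one of {VALID_BEHAVIORS}")
        # Omit this flag to keep the default behavior configured on the run cache.
        args += ["--cache-behavior", cache_behavior]
    return args
```

You can append the returned list to the rest of your **start-run** arguments, for example when you invoke the CLI through `subprocess.run`.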

## Error cases for run caches
<a name="workflow-cache-errors"></a>

For the following scenarios, HealthOmics may not cache task outputs, even for a run with cache behavior set to **Cache always**.
+ If the run encounters an error before the first task completes successfully, there are no cache outputs to export.
+ If the export process fails, HealthOmics doesn't save the task outputs to the Amazon S3 cache location.
+ If the run fails due to a **filesystem out of space** error, call caching doesn't save any task outputs.
+ If you cancel a run, call caching doesn't save any task outputs.
+ If the run experiences a run timeout, call caching doesn't save any task outputs, even if you configured the run to use cache on failure.

## Tracking call caching information
<a name="workflow-cache-track"></a>

You can track call caching events (such as run cache hits) using the console, the CLI, or CloudWatch Logs.

**Topics**
+ [Track cache hits using the console](#workflow-cache-track-console)
+ [Track call caching using the CLI](#workflow-cache-track-cli)
+ [Track call caching using CloudWatch Logs](#workflow-cache-track-cwl)

### Track cache hits using the console
<a name="workflow-cache-track-console"></a>

In the run details page for a run, the **Run tasks** table displays **Cache hit** information for each task. The table also includes a link to the associated cache entry. Use the following procedure to view cache hit information for a run.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Runs**.

1. On the **Runs** page, choose the run to inspect.

1. On the run details page, choose the **Run tasks** tab to display the tasks table.

1. If a task has a cache hit, the **Cache hit** column contains a link to the run cache entry location in Amazon S3.

1. Choose the link to inspect the run cache entry.

### Track call caching using the CLI
<a name="workflow-cache-track-cli"></a>

Use the **get-run** CLI command to confirm whether the run used a call cache.

```
 aws omics get-run --id 1234567  
```

In the response, if the `cacheId` field is set, the run uses that cache.

Use the **list-run-tasks** CLI command to retrieve the cache data location for each cached task in the run.

```
 aws omics list-run-tasks --id 1234567  
```

In the response, if the `cacheHit` field for a task is true, the `cacheS3Uri` field provides the cache data location for that task.
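To summarize cache hits for a run, you can post-process the saved JSON output of **list-run-tasks**. The following Python sketch assumes that the response lists tasks under an `items` key and that each task has a `taskId` field; verify both against your own output. The function name is hypothetical.

```python
from typing import Any, Dict


def cached_task_locations(response: Dict[str, Any]) -> Dict[str, str]:
    """Map the ID of each task that had a cache hit to its cache data location."""
    locations = {}
    for task in response.get("items", []):  # assumed top-level key
        if task.get("cacheHit") and task.get("cacheS3Uri"):
            locations[task.get("taskId", "unknown")] = task["cacheS3Uri"]
    return locations
```

Tasks that were recomputed (cache misses) don't appear in the result, so comparing its length with the total number of tasks gives a quick view of how much of the run came from the cache.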

You can also use the **get-run-task** CLI command to retrieve the cache data location for a specific task:

```
 aws omics get-run-task --id 1234567 --task-id <task_id> 
```

### Track call caching using CloudWatch Logs
<a name="workflow-cache-track-cwl"></a>

HealthOmics creates cache activity logs in the `/aws/omics/WorkflowLog` CloudWatch log group. There is a log stream for each run cache: **runCache/<cache\_id>/<cache\_uuid>**.

For runs that use call caching, HealthOmics generates CloudWatch Logs entries for these events: 
+ creating a cache entry (CACHE\_ENTRY\_CREATED)
+ matching a cache entry (CACHE\_HIT)
+ failing to match a cache entry (CACHE\_MISS)

For more information about these logs, see [Logs in CloudWatch](monitoring-cloudwatch-logs.md#cloudwatch-logs).

Use the following CloudWatch Insights query on the `/aws/omics/WorkflowLog` log group to return the number of cache hits per run for this cache:

```
filter @logStream like 'runCache/<CACHE_ID>/'
 | fields @timestamp, @message
 | filter logMessage like 'CACHE_HIT'
 | parse "run: *," as run
 | stats count(*) as cacheHits by run
```

Use the following query to return the number of cache entries created by each run:

```
filter @logStream like 'runCache/<CACHE_ID>/'
 | fields @timestamp, @message
 | filter logMessage like 'CACHE_ENTRY_CREATED'
 | parse "run: *," as run
 | stats count(*) as cacheEntries by run
```