Using the run cache - AWS HealthOmics

Using the run cache

By default, runs don't use a run cache. To use a cache for the run, you specify the run cache and the run cache behavior when you start the run.

After a run completes, you can invoke the GetRunTask and ListRunTasks API operations and inspect the cacheHit field in the response.
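As a rough sketch of that inspection, the following Python summarizes cache hits from task records shaped like the items that the list operation returns. The helper names (summarize_cache_hits, fetch_run_tasks) are hypothetical, and the sketch assumes that each item carries a name and a boolean cacheHit field and that boto3 is installed and configured for the live call.

```python
from typing import Iterable


def summarize_cache_hits(tasks: Iterable[dict]) -> dict:
    """Split task names by whether the task reported a cache hit.

    Assumes each task dict has a "name" and an optional boolean "cacheHit";
    a missing "cacheHit" is treated as a miss.
    """
    hits, misses = [], []
    for task in tasks:
        name = task.get("name", "<unnamed>")
        if task.get("cacheHit"):
            hits.append(name)
        else:
            misses.append(name)
    return {"hits": hits, "misses": misses}


def fetch_run_tasks(run_id: str) -> list:
    """Collect all task items for a run (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the summary helper works without boto3

    client = boto3.client("omics")
    items = []
    for page in client.get_paginator("list_run_tasks").paginate(id=run_id):
        items.extend(page.get("items", []))
    return items
```

With credentials in place, summarize_cache_hits(fetch_run_tasks("xx")) would list which tasks were served from the cache, where "xx" stands in for a real run ID.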

Note

You provide an IAM service role when you start a run. To use call caching, the service role needs permission to access the run cache Amazon S3 location. For more information, see Service roles for AWS HealthOmics.
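One way to express that permission is an IAM policy statement scoped to the cache prefix. The sketch below builds such a policy document in Python. The function name, bucket, and prefix are hypothetical, and the action list (s3:GetObject, s3:PutObject, s3:ListBucket) is an assumption for illustration; confirm the exact permissions in Service roles for AWS HealthOmics.

```python
import json


def run_cache_s3_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting access to a run cache location.

    The actions listed here are an assumption, not a confirmed minimum set.
    """
    prefix = prefix.strip("/")
    object_arn = (
        f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read and write objects under the cache prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": object_arn,
            },
            {
                # List the bucket that holds the cache.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    print(json.dumps(run_cache_s3_policy("my-cache-bucket", "run-cache"), indent=2))
```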

Configuring a run with run cache using the console

From the console, you configure the run cache for a run when you start the run.

  1. Open the HealthOmics console at https://console.aws.amazon.com/omics/.

  2. In the left navigation pane, choose Runs.

  3. On the Runs page, choose the run to start.

  4. Choose Start run and complete steps 1 and 2 of Start run as described in Starting a run using the console.

  5. In step 3 of Start run, choose Select an existing run cache.

  6. Select the cache from the Run cache ID drop-down list.

  7. To override the default run cache behavior, choose the Cache behavior for the run. For more information, see Run cache behavior.

  8. Continue to step 4 of Start run.

Configuring a run with run cache using the CLI

To start a run that uses a run cache, add the cache-id parameter to the start-run CLI command. Optionally, use the cache-behavior parameter to override the default behavior that you configured for the run cache.

aws omics start-run \
    --run-id "xx" \
    --cache-id "xxxxxx" \
    --cache-behavior CACHE_ALWAYS

If the operation is successful, the response includes the ID, ARN, and status of the new run.
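The same request can be made from the AWS SDK for Python. The sketch below only assembles and validates the arguments; build_start_run_args is a hypothetical helper. It assumes that the boto3 start_run operation accepts runId, roleArn, cacheId, and cacheBehavior, and that CACHE_ALWAYS and CACHE_ON_FAILURE are the valid behavior values.

```python
from typing import Optional

# Assumed set of accepted cache behavior values.
VALID_CACHE_BEHAVIORS = {"CACHE_ALWAYS", "CACHE_ON_FAILURE"}


def build_start_run_args(
    run_id: str,
    role_arn: str,
    cache_id: str,
    cache_behavior: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a start_run call that uses a run cache.

    Omitting cache_behavior leaves the default configured on the run cache
    in effect.
    """
    if cache_behavior is not None and cache_behavior not in VALID_CACHE_BEHAVIORS:
        raise ValueError(f"unsupported cache behavior: {cache_behavior!r}")
    args = {"runId": run_id, "roleArn": role_arn, "cacheId": cache_id}
    if cache_behavior is not None:
        args["cacheBehavior"] = cache_behavior
    return args
```

With boto3 installed and credentials configured, the result could be passed along as boto3.client("omics").start_run(**args).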

Tracking your run cache hits

For runs that use call caching, HealthOmics creates a CloudWatch Logs entry when it:

  • creates a cache entry (CACHE_ENTRY_CREATED)

  • matches a cache entry (CACHE_HIT)

  • fails to match a cache entry (CACHE_MISS)

For more information about these logs, see Logs in CloudWatch.
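To get a quick tally of these events from exported log messages, a small counter is enough. The sketch below assumes only that each message contains one of the three event names as plain text; the sample messages are invented for illustration and do not reproduce the real log format.

```python
from collections import Counter
from typing import Iterable

CACHE_EVENTS = ("CACHE_ENTRY_CREATED", "CACHE_HIT", "CACHE_MISS")


def count_cache_events(messages: Iterable[str]) -> dict:
    """Count how many messages mention each run cache event name."""
    counts = Counter({event: 0 for event in CACHE_EVENTS})
    for message in messages:
        for event in CACHE_EVENTS:
            if event in message:
                counts[event] += 1
                break  # count each message at most once
    return dict(counts)


if __name__ == "__main__":
    # Invented messages; real log entries will look different.
    sample = [
        "task=align event=CACHE_MISS",
        "task=align event=CACHE_ENTRY_CREATED",
        "task=sort event=CACHE_HIT",
    ]
    print(count_cache_events(sample))
```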

Managing your run cache data

HealthOmics doesn't automatically manage run cache data in Amazon S3. This list provides information about managing cache data:

  • You can view how much data is in the cache by checking Amazon S3. HealthOmics doesn't monitor or report on cache size.

  • You can set Amazon S3 data retention/replication policies for the run cache by configuring them on the S3 bucket. HealthOmics doesn't auto-archive data or apply Amazon S3 clean-up rules for managing caches.

  • You can delete cache files directly from the Amazon S3 bucket.

  • A run doesn't fail if a task doesn't have a cache entry. HealthOmics re-computes the task and its dependent tasks.

  • If you modify cache names or directory structures such that HealthOmics can’t find a matching entry for a task, HealthOmics re-computes the task.
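Because cache size and retention are managed in Amazon S3, both can be scripted. The following sketch sums object sizes under a cache prefix and builds an expiration rule for it. The helper names, bucket, and prefix are hypothetical, the object records are assumed to have the shape returned by the S3 list_objects_v2 operation, and the rule is assumed to be applied with put_bucket_lifecycle_configuration.

```python
from typing import Iterable


def total_cache_bytes(objects: Iterable[dict]) -> int:
    """Sum the "Size" field of S3 object records (as in list_objects_v2)."""
    return sum(obj.get("Size", 0) for obj in objects)


def expiration_rule(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that expires objects under a prefix."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-run-cache-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def list_cache_objects(bucket: str, prefix: str) -> list:
    """List objects under the cache prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3")
    objects = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        objects.extend(page.get("Contents", []))
    return objects
```

Note that applying a lifecycle configuration replaces any configuration already on the bucket, so merge the rule with existing rules before applying it.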

Caching requirements for tasks

HealthOmics caches task outputs for tasks that meet the following requirements:

  • The task must define a container. HealthOmics won't cache outputs for a task with no container.

  • The task must produce one or more outputs. You specify task outputs in the workflow definition.

  • The workflow definition must not use dynamic values. For example, if you pass a parameter to a task with a value that increments with every run, HealthOmics doesn't cache the task outputs.

If you update the CPU, GPU, or memory allocation in a task definition, HealthOmics doesn't try to find a cache match for the task. Instead, it recomputes the task using the new values for the compute and memory resources. If the task completes successfully, HealthOmics adds these task outputs to the cache. This approach lets you test new resource allocations as part of the run optimization process.
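The effect of dynamic values and resource changes can be pictured with a toy cache key. The sketch below is not the HealthOmics matching algorithm, which this page does not describe. It is a generic illustration, with a hypothetical function and field names, of why a cache lookup misses when any part of a task's definition changes between runs.

```python
import hashlib
import json


def illustrative_cache_key(container: str, command: str, inputs: dict,
                           cpus: int, memory_gib: int) -> str:
    """Hash a task's definition into a key (illustration only)."""
    payload = json.dumps(
        {
            "container": container,
            "command": command,
            "inputs": inputs,
            "cpus": cpus,
            "memory_gib": memory_gib,
        },
        sort_keys=True,  # stable ordering so equal definitions hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    base = illustrative_cache_key("aligner:1.0", "align reads.fq", {"sample": "s1"}, 4, 16)
    # A value that increments with every run changes the key on every run.
    dynamic = illustrative_cache_key("aligner:1.0", "align reads.fq", {"sample": "s1", "attempt": 2}, 4, 16)
    # A new CPU allocation also changes the key in this toy model.
    resized = illustrative_cache_key("aligner:1.0", "align reads.fq", {"sample": "s1"}, 8, 16)
    print(base == dynamic, base == resized)
```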

Error cases for run caches

In the following scenarios, HealthOmics might not cache task outputs, even for a run with cache behavior set to Cache always.

  • If the run encounters an error before the first task completes successfully, there are no cache outputs to export.

  • If the export process fails, HealthOmics doesn't save the task outputs to the Amazon S3 cache location.

  • If the run fails due to a filesystem out of space error, call caching doesn't save any task outputs.

  • If you cancel a run, call caching doesn't save any task outputs.

  • If your run fails because of a run timeout, call caching doesn't save any task outputs.