

# Creating private workflows in HealthOmics
<a name="workflows-setup"></a>

*Private workflows* depend on a variety of resources that you create and configure before creating the workflow:
+ **Workflow definition file:** A workflow definition file written in WDL, Nextflow, or CWL. The workflow definition specifies the inputs and outputs for runs that use the workflow. It also includes specifications for the runs and run tasks for your workflow, including compute and memory requirements. The workflow definition file must be in `.zip` format. For more information, see [Workflow definition files](workflow-definition-files.md).
  + You can use [Kiro CLI](https://docs.aws.amazon.com/kiro/latest/userguide/what-is.html) to build and validate your workflow definition files in WDL, Nextflow, and CWL. For more information, see [Example prompts for Kiro CLI](getting-started.md#omics-kiro-prompts) and the [HealthOmics Agentic generative AI tutorial](https://github.com/aws-samples/aws-healthomics-tutorials/tree/main/generative-ai) on GitHub.
+ **(Optional) Parameter template file:** A parameter template file written in JSON. Create the file to define the run parameters, or HealthOmics generates the parameter template for you. For more information, see [Parameter template files for HealthOmics workflows](parameter-templates.md).
+ **Amazon ECR container images:** Create a private Amazon ECR repository for the workflow. Create container images in the private repository, or synchronize the contents of a supported upstream registry with your Amazon ECR private repository.
+ **(Optional) Sentieon licenses:** Request a Sentieon license to use the Sentieon software in private workflows.
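
To illustrate the parameter template format, the following is a minimal sketch with hypothetical parameter names. Each entry maps a run parameter to a description and an `optional` flag; see [Parameter template files for HealthOmics workflows](parameter-templates.md) for the full format:

```
{
    "sample_name": {
        "description": "Name of the input sample",
        "optional": false
    },
    "reference_fasta": {
        "description": "Reference genome in FASTA format",
        "optional": true
    }
}
```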

Optionally, you can run a linter on the workflow definition before or after you create the workflow. The [Workflow linters in HealthOmics](workflows-linter.md) topic describes the linters available in HealthOmics.

**Topics**
+ [HealthOmics workflow integration with Git-based repositories](workflows-git-integration.md)
+ [Workflow definition files in HealthOmics](workflow-definition-files.md)
+ [Parameter template files for HealthOmics workflows](parameter-templates.md)
+ [Container images for private workflows](workflows-ecr.md)
+ [HealthOmics Workflow README files](workflows-readme.md)
+ [Requesting Sentieon licenses for private workflows](private-workflows-subscribe.md)
+ [Workflow linters in HealthOmics](workflows-linter.md)
+ [HealthOmics workflow operations](creating-private-workflows.md)

# HealthOmics workflow integration with Git-based repositories
<a name="workflows-git-integration"></a>

When you create a workflow (or a workflow version), you provide a workflow definition to specify information about the workflow, runs, and tasks. HealthOmics can retrieve the workflow definition as a .zip archive (stored locally or in an Amazon S3 bucket), or from a supported Git-based repository.

The HealthOmics integration with Git-based repositories enables the following capabilities:
+ Direct workflow creation from public, private, and self-managed repositories.
+ Integration of workflow README files and parameter templates from repositories.
+ Support for GitHub, GitLab, and Bitbucket repositories.

By using a Git-based repository, you avoid the manual steps of downloading workflow definition files and input parameter template files, creating a .zip archive, and then staging the archive to S3. This simplifies workflow creation for scenarios such as the following examples:

1. You want to get started quickly using a common open source workflow, such as nf-core. HealthOmics automatically retrieves all workflow definition and input parameter template files from the nf-core repository on GitHub and uses these files to create your new workflow.

1. You are using a public workflow from GitHub, and some new updates become available. You can easily create a new HealthOmics workflow version using the updated workflow definition on GitHub as the source. Users of your workflow can choose between the original workflow or the new workflow version that you created.

1. Your team is building a proprietary pipeline that is not public. You keep your code in a private Git repository and use this workflow definition for your HealthOmics workflows. The team updates the workflow definition frequently as part of an iterative development lifecycle. You can create new workflow versions as needed from your private repository.

**Topics**
+ [Supported Git-based repositories](#workflows-git-supported)
+ [Configure connections to external code repositories](#workflows-git-connections)
+ [Accessing self-managed repositories](#workflows-git-self-managed)
+ [Quotas related to external code repositories](#workflows-git-quotas)
+ [Required IAM permissions](#workflows-git-permissions)

## Supported Git-based repositories
<a name="workflows-git-supported"></a>

HealthOmics supports public and private repositories for the following Git-based providers:
+ GitHub 
+ GitLab 
+ Bitbucket 

HealthOmics supports self-managed repositories for the following Git-based providers:
+ GitHubEnterpriseServer 
+ GitLabSelfManaged 

HealthOmics supports cross-account connections for GitHub, GitLab, and Bitbucket. Set up shared permissions through AWS Resource Access Manager. For an example, see [Shared connections](https://docs.aws.amazon.com/codepipeline/latest/userguide/connections-shared.html) in the *AWS CodePipeline User Guide*.

## Configure connections to external code repositories
<a name="workflows-git-connections"></a>

Connect your workflows to Git-based repositories using AWS CodeConnection. HealthOmics uses this connection to access your source code repositories.

**Note**  
The AWS CodeConnections service isn't available in the il-central-1 Region. For this Region, use the service in the us-east-1 Region to create workflows or workflow versions from a repository.

### Create a connection
<a name="workflows-git-connection-create"></a>

Before you can create connections, follow the instructions in [Setting up connections](https://docs.aws.amazon.com/dtconsole/latest/userguide/setting-up-connections.html) in the *Developer Console Tools User Guide*. 

To create a connection, follow the instructions in [Create a connection](https://docs.aws.amazon.com/dtconsole/latest/userguide/connections-create.html) in the *Developer Console Tools User Guide*. 

### Configure authorization for the connection
<a name="workflows-git-connection-authorize"></a>

You must authorize the connection using the provider's OAuth flow. Make sure that the connection status is `AVAILABLE` before you use it.

For examples, see the blog post [How To Create an AWS HealthOmics Workflows from Content in Git](https://repost.aws/articles/ARCEN7AjhaRSmteczRoc_QsA/how-to-create-an-aws-healthomics-workflows-from-content-in-git).

## Accessing self-managed repositories
<a name="workflows-git-self-managed"></a>

To set up a connection to a GitLab self-managed repository, use the Personal Access Token (PAT) of an admin user when you create the host. The subsequent connection creation uses OAuth with your account.

The following example sets up a connection to a GitLab self-managed repository:

1. Set up access to the Personal Access Token of an admin user.

   To set up a PAT in a GitLab self-managed repository, see [Personal access tokens](https://docs.gitlab.com/user/profile/personal_access_tokens/) in the *GitLab Docs*.

1. Create a host.

   1. Navigate to **CodePipeline** > **Settings** > **Connections**.

   1. Choose the **Hosts** tab and then choose **Create Host**.

   1. Configure the following fields:
      + Enter a name for the host.
      + For provider type, choose **GitLab Self Managed**.
      + Enter the **Host URL**.
      + If the host is defined in a VPC, enter the VPC information.

   1. Choose **Create Host**. The new host is in `PENDING` state.

   1. To complete the setup, choose **Set up Host**.

   1. Enter the Personal Access Token (PAT) of an Admin user, then choose **Continue**. 

1. Create the connection.

   1. On the **Connections** tab, choose **Create Connection**.

   1. For provider type, select **GitLab self-managed**.

   1. Under **Connection Settings**, enter a connection name and select the host URL that you previously created.

   1. If your GitLab self-managed instance is only accessible through a VPC, configure the VPC details.

   1. Choose **Update Pending Connection**. The modal window redirects you to the GitLab login page.

   1. Enter the username and password for your GitLab account and complete the authorization process.

   1. For first-time setup, choose **Authorize AWS Connector for GitLab Self Managed**.

## Quotas related to external code repositories
<a name="workflows-git-quotas"></a>

For HealthOmics integration with external code repositories, there is a maximum size for a repository, each repository file, and each README file. For details, see [HealthOmics workflow fixed size quotas](fixed-quotas.md#fixed-quotas-workflows).

## Required IAM permissions
<a name="workflows-git-permissions"></a>

Add the following actions to your identity-based IAM policy:

```
"codeconnections:CreateConnection",
"codeconnections:GetConnection",
"codeconnections:GetHost",
"codeconnections:ListConnections",
"codeconnections:UseConnection"
```

# Workflow definition files in HealthOmics
<a name="workflow-definition-files"></a>

You use a workflow definition to specify information about the workflow, runs, and the tasks in the runs. You create workflow definitions in one or more files using a workflow definition language. HealthOmics supports workflow definitions written in WDL, Nextflow, or CWL. 

HealthOmics supports the following choices for WDL workflow definitions: 
+ WDL – Provides a spec-conformant WDL engine. 
+ WDL lenient – Designed to handle workflows migrated from Cromwell. It supports custom Cromwell directives and some non-conformant logic. For details, see [Implicit type conversion in WDL lenient](workflow-languages-wdl.md#workflow-wdl-type-conversion).

For information about each of the workflow languages, see the language-specific detailed sections below.

You specify the following types of information in the workflow definition:
+ **Language version** – The language and version of the workflow definition.
+ **Compute and memory** – The compute and memory requirements for tasks in the workflow.
+ **Inputs** – Location of the inputs to the workflow tasks. For more information, see [HealthOmics run inputs](workflows-run-inputs.md).
+ **Outputs** – Location to save the outputs that the tasks generate.
+ **Task resources** – Compute and memory requirements for each task.
+ **Accelerators** – Any accelerator resources, such as GPUs, that the tasks require.

**Topics**
+ [HealthOmics workflow definition requirements](workflow-defn-requirements.md)
+ [Version support for HealthOmics workflow definition languages](workflows-lang-versions.md)
+ [Compute and memory requirements for HealthOmics tasks](memory-and-compute-tasks.md)
+ [Task outputs in a HealthOmics workflow definition](workflows-task-outputs.md)
+ [Task resources in a HealthOmics workflow definition](task-resources.md)
+ [Task accelerators in a HealthOmics workflow definition](task-accelerators.md)
+ [WDL workflow definition specifics](workflow-languages-wdl.md)
+ [Nextflow workflow definition specifics](workflow-definition-nextflow.md)
+ [CWL workflow definition specifics](workflow-languages-cwl.md)
+ [Example workflow definitions](workflow-definition-examples.md)

# HealthOmics workflow definition requirements
<a name="workflow-defn-requirements"></a>

The HealthOmics workflow definition files must meet the following requirements:
+ Tasks must define input/output parameters, Amazon ECR container repositories, and runtime specifications such as memory or CPU allocation.
+ Verify that your IAM roles have the required permissions:
  + Your workflow has access to input data from AWS resources, such as Amazon S3. 
  + Your workflow has access to external repository services when needed.
+ Declare the output files in the workflow definition. To copy intermediate run files to the output location, declare them as workflow outputs. 
+ The input and output locations must be in the same Region as the workflow. 
+ HealthOmics storage workflow inputs must be in `ACTIVE` status. HealthOmics doesn't import inputs in `ARCHIVED` status, and a run that references them fails. For information about Amazon S3 object inputs, see [HealthOmics run inputs](workflows-run-inputs.md).
+ Specifying the **main** location of the workflow is optional if your ZIP archive contains either a single workflow definition file or a file named `main`.
  + Example path: `workflow-definition/main-file.wdl`
+ Before you create a workflow from Amazon S3 or your local drive, create a zip archive of the workflow definition files and any dependencies, such as subworkflows.
+ We recommend that you declare Amazon ECR containers in the workflow as input parameters for validation of the Amazon ECR permissions. 

Additional Nextflow considerations:
+ **/bin**

  Nextflow workflow definitions can include a `/bin` folder with executable scripts. Tasks have read and execute access to this path. Tasks that rely on these scripts should use a container image built with the appropriate script interpreters. As a best practice, call the interpreter directly. For example:

  ```
  process my_bin_task {
     ...
     script:
        """
        python3 my_python_script.py
        """
  }
  ```
+ **includeConfig**

  Nextflow-based workflow definitions can include `nextflow.config` files that help to abstract parameter definitions or process resource profiles. To support developing and running Nextflow pipelines in multiple environments, put HealthOmics-specific settings in a configuration file that you add to the global configuration with the `includeConfig` directive. To maintain portability, configure the workflow to include the file only when it runs on HealthOmics, by using the following code:

  ```
  // at the end of the nextflow.config file
  if ("$AWS_WORKFLOW_RUN") {
      includeConfig 'conf/omics.config'
  }
  ```
+ **Reports**

  HealthOmics doesn't support engine-generated dag, trace, and execution reports. You can generate alternatives to the trace and execution reports using a combination of GetRun and GetRunTask API calls. 

Additional CWL considerations:
+ **Container image URI interpolation**

  HealthOmics allows the `dockerPull` property of the `DockerRequirement` to be an inline JavaScript expression. For example:

  ```
  requirements:
    DockerRequirement:
      dockerPull: "$(inputs.container_image)"
  ```

  This allows you to specify container image URIs as input parameters to the workflow.
+ **JavaScript expressions**

  JavaScript expressions must be `strict mode` compliant.
+ **Operation process**

  HealthOmics doesn't support CWL Operation processes.

# Version support for HealthOmics workflow definition languages
<a name="workflows-lang-versions"></a>

HealthOmics supports workflow definition files written in Nextflow, WDL, or CWL. The following sections provide information about HealthOmics version support for these languages.

**Topics**
+ [WDL version support](#workflows-lang-versions-WDL)
+ [CWL version support](#workflows-lang-versions-CWL)
+ [Nextflow version support](#workflows-lang-versions-nextflow)

## WDL version support
<a name="workflows-lang-versions-WDL"></a>

HealthOmics supports versions 1.0, 1.1, and the development version of the WDL specification.

Every WDL document must include a version statement to specify which version (major and minor) of the specification it adheres to. For more information about versions, see [WDL versioning](https://github.com/openwdl/wdl/blob/wdl-1.1/SPEC.md#versioning).

Versions 1.0 and 1.1 of the WDL specification do not support the `Directory` type. To use the `Directory` type for inputs or outputs, set the version to **development** in the first line of the file:

```
version development  # first line of .wdl file
     ... remainder of the file ...
```

## CWL version support
<a name="workflows-lang-versions-CWL"></a>

HealthOmics supports versions 1.0, 1.1, and 1.2 of the CWL language.

You can specify the language version in the CWL workflow definition file. For more information about CWL, see the [CWL user guide](https://github.com/common-workflow-language/user_guide).

## Nextflow version support
<a name="workflows-lang-versions-nextflow"></a>

HealthOmics supports four Nextflow stable versions. Nextflow typically releases a stable version every six months. HealthOmics doesn't support the monthly “edge” releases.

HealthOmics supports released features in each version, but not preview features.

### Supported versions
<a name="workflows-versions-nextflow-list"></a>

HealthOmics supports the following Nextflow versions:
+ Nextflow v22.04.0 DSL 1 and DSL 2
+ Nextflow v23.10.0 DSL 2 (default)
+ Nextflow v24.10.8 DSL 2
+ Nextflow v25.10.0 DSL 2

**Note**  
HealthOmics does not support strict syntax mode in Nextflow v25.10.0.

To migrate your workflow to the latest supported version (v25.10.0), follow the [Nextflow upgrade guide](https://www.nextflow.io/docs/latest/migrations/25-10.html).

There are some breaking changes when migrating to Nextflow v24 and v25. Follow the [Nextflow migration guide](https://www.nextflow.io/docs/latest/migrations/index.html).

### Detect and process Nextflow versions
<a name="workflows-versions-processing"></a>

HealthOmics detects the DSL version and Nextflow version that you specify. It automatically determines the best Nextflow version to run based on these inputs.

#### DSL version
<a name="workflows-versions-p1"></a>

HealthOmics detects the requested DSL version in your workflow definition file. For example, you can specify: `nextflow.enable.dsl=2`.

HealthOmics supports DSL 2 by default. It provides backwards compatibility with DSL 1, if specified in your workflow definition file.
+ If you specify DSL 1, HealthOmics runs Nextflow v22.04 DSL1 (the only supported version that runs DSL 1).
+ If you don't specify a DSL version, or if HealthOmics can’t parse the DSL information for any reason (such as syntax errors in your workflow definition file), HealthOmics defaults to DSL 2 and runs Nextflow v23.10.0.
+ To upgrade your workflow from DSL 1 to DSL 2 to take advantage of the latest Nextflow versions and software features, see [Migrating from DSL 1](https://nextflow.io/docs/latest/dsl1.html).

#### Nextflow versions
<a name="workflows-versions-p2"></a>

HealthOmics detects the requested Nextflow version in the Nextflow configuration file (nextflow.config), if you provide this file. We recommend that you add the `nextflowVersion` clause at the end of the file to avoid any unexpected overrides from included configs. For more information, see [Nextflow configuration](https://nextflow.io/docs/latest/config.html).

You can specify a Nextflow version or a range of versions using the following syntax:

```
   // exact match
   manifest.nextflowVersion = '1.2.3'   
            
   // 1.2 or later (excluding 2 and later)
   manifest.nextflowVersion = '1.2+'         
            
   // 1.2 or later
   manifest.nextflowVersion = '>=1.2'
            
   // any version in the range 1.2 to 1.5
   manifest.nextflowVersion = '>=1.2, <=1.5' 
            
   // use the "!" prefix to stop execution if the current version 
   // doesn't match the required version.
   manifest.nextflowVersion = '!>=1.2'
```

HealthOmics processes the Nextflow version information as follows: 
+ If you use **=** to specify an exact version that HealthOmics supports, HealthOmics uses that version. 
+ If you use the **!** prefix to specify an exact version or a range of versions that HealthOmics doesn't support, HealthOmics raises an exception and fails the run. Consider using this prefix if you want to be strict about version requests and fail quickly when a request includes unsupported versions.
+ If you specify a range of versions, HealthOmics uses the highest-preference version in that range. The preference order from highest to lowest is v23.10.0, v22.04.0, v24.10.8, and v25.10.0. For example:
  + If the range covers v23.10.0, v24.10.8, and v25.10.0, HealthOmics chooses v23.10.0.
  + If the range covers v24.10.8 and v25.10.0, HealthOmics chooses v24.10.8.
+ If there is no requested version, or if the requested versions aren't valid or can’t be parsed for any reason:
  + If you specified DSL 1, HealthOmics runs Nextflow v22.04.
  + Otherwise, HealthOmics runs Nextflow v23.10.0.
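
The selection rules above can be sketched as follows. This is an illustrative model based on the documented preference order, not HealthOmics code:

```python
# Illustrative model of the Nextflow version-preference rule:
# among the supported versions that match the requested range,
# HealthOmics picks the highest-preference version.
PREFERENCE = ["23.10.0", "22.04.0", "24.10.8", "25.10.0"]  # highest first

def choose_version(matching_versions):
    """Return the preferred supported version among those that match."""
    for version in PREFERENCE:
        if version in matching_versions:
            return version
    return "23.10.0"  # default when nothing matches or parsing fails

print(choose_version({"23.10.0", "24.10.8", "25.10.0"}))  # → 23.10.0
print(choose_version({"24.10.8", "25.10.0"}))             # → 24.10.8
```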

You can retrieve the following information about the Nextflow version that HealthOmics used for each run:
+ The run logs contain information about the actual Nextflow version that HealthOmics used for the run.
+ HealthOmics adds warnings in the run logs if there isn't a direct match with your requested version or if it needed to use a different version than you specified.
+ The response to the **GetRun** API operation includes a field (`engineVersion`) with the actual Nextflow version that HealthOmics used for the run. For example:

  ```
  "engineVersion":"22.04.0"
  ```

# Compute and memory requirements for HealthOmics tasks
<a name="memory-and-compute-tasks"></a>

HealthOmics runs your private workflow tasks in an omics instance. HealthOmics provides a variety of instance types to accommodate different types of tasks. Each instance type has a fixed memory and vCPU configuration (and fixed GPU configuration for accelerated computing instance types). The cost of using an omics instance varies depending on the instance type. For details, see the [HealthOmics Pricing](https://aws.amazon.com/healthomics/pricing/) page.

For tasks in a workflow, you specify the required memory and vCPUs in the workflow definition file. When a workflow task runs, HealthOmics allocates the smallest omics instance that accommodates the requested memory and vCPUs. For example, if a task needs 64 GiB of memory and 8 vCPUs, HealthOmics selects `omics.r.2xlarge`.

We recommend that you review the instance types and set your requested vCPUs and memory size to match the instance that best meets your needs. The task container uses the number of vCPUs and the memory size that you specify in your workflow definition file, even if the instance type has additional vCPUs and memory. 
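
For example, a WDL task that needs the 8 vCPUs and 64 GiB of memory mentioned above declares them in its `runtime` section (a fragment only, matching the runtime attributes used in the WDL examples later in this guide):

```
runtime {
    cpu: 8
    memory: "64 GiB"
}
```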

The following list contains additional information about vCPU and memory allocation:
+ Container resource allocations are hard limits. If a task runs out of memory or attempts to use additional vCPUs, the task generates an error log and exits.
+ If you don’t specify any compute or memory requirements, HealthOmics selects **omics.c.large** and defaults to a configuration with 1 vCPU and 1 GiB of memory.
+ The minimum configuration that you can request is 1 vCPU and 1 GiB of memory. 
+ If you specify vCPUs, memory, or GPUs that exceed the supported instance types, HealthOmics returns an error and the workflow fails validation.
+ If you specify fractional units, HealthOmics rounds up to the nearest integer.
+ HealthOmics reserves a small amount of memory (5%) for management and logging agents, so the full memory allocation might not always be available to the application in the task.
+ HealthOmics matches instance types to fit the compute and memory requirements that you specify, and may use a mix of hardware generations. For this reason, there can be some minor variances in task run times for the same task.
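
The allocation rule can be sketched as follows. This is an illustrative model derived from the instance tables in the following sections, not the actual HealthOmics scheduler:

```python
# Illustrative model: pick the smallest omics instance that accommodates
# a task's requested vCPUs and memory. Sizes and per-family memory ratios
# follow the instance tables below (c = 2, m = 4, r = 8 GiB per vCPU).
SIZES = [("large", 2), ("xlarge", 4), ("2xlarge", 8), ("4xlarge", 16),
         ("8xlarge", 32), ("12xlarge", 48), ("16xlarge", 64),
         ("24xlarge", 96), ("32xlarge", 128), ("48xlarge", 192)]
GIB_PER_VCPU = {"c": 2, "m": 4, "r": 8}

def smallest_instance(vcpus, memory_gib):
    """Return the omics instance with the fewest vCPUs (then the least
    memory) that satisfies both requests."""
    candidates = []
    for family, ratio in GIB_PER_VCPU.items():
        for size, size_vcpus in SIZES:
            if size_vcpus >= vcpus and size_vcpus * ratio >= memory_gib:
                candidates.append((size_vcpus, size_vcpus * ratio,
                                   f"omics.{family}.{size}"))
                break  # smallest fitting size within this family
    return min(candidates)[2]

print(smallest_instance(8, 64))  # → omics.r.2xlarge (the example above)
print(smallest_instance(1, 1))   # → omics.c.large (the default minimum)
```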

These topics provide details about the instance types that HealthOmics supports. 

**Topics**
+ [Standard instance types](#workflow-task-standard-instances)
+ [Compute-optimized instances](#workflow-task-compute-optimized-instances)
+ [Memory-optimized instances](#workflow-task-memory-optimized-instances)
+ [Accelerated-computing instances](#workflow-task-accelerated-computing-instances)

**Note**  
For standard, compute-optimized, and memory-optimized instances, increase the instance size if the task requires higher network throughput. Amazon EC2 instances with fewer than 16 vCPUs (sizes 4xlarge and smaller) can experience throughput bursting. For more information about Amazon EC2 instance throughput, see [Amazon EC2 available instance bandwidth](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html#available-instance-bandwidth).

## Standard instance types
<a name="workflow-task-standard-instances"></a>

For standard instance types, the configurations aim for a balance of compute power and memory. 

HealthOmics supports the 32xlarge and 48xlarge instances in these regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.m.large | 2 | 8 GiB | 
| omics.m.xlarge | 4 | 16 GiB | 
| omics.m.2xlarge | 8 | 32 GiB | 
| omics.m.4xlarge | 16 | 64 GiB | 
| omics.m.8xlarge | 32 | 128 GiB | 
| omics.m.12xlarge | 48 | 192 GiB | 
| omics.m.16xlarge | 64 | 256 GiB | 
| omics.m.24xlarge | 96 | 384 GiB | 
| omics.m.32xlarge | 128 | 512 GiB | 
| omics.m.48xlarge | 192 | 768 GiB | 

## Compute-optimized instances
<a name="workflow-task-compute-optimized-instances"></a>

For compute-optimized instance types, the configurations have more compute power and less memory.

HealthOmics supports the 32xlarge and 48xlarge instances in these regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.c.large | 2 | 4 GiB | 
| omics.c.xlarge | 4 | 8 GiB | 
| omics.c.2xlarge | 8 | 16 GiB | 
| omics.c.4xlarge | 16 | 32 GiB | 
| omics.c.8xlarge | 32 | 64 GiB | 
| omics.c.12xlarge | 48 | 96 GiB | 
| omics.c.16xlarge | 64 | 128 GiB | 
| omics.c.24xlarge | 96 | 192 GiB | 
| omics.c.32xlarge | 128 | 256 GiB | 
| omics.c.48xlarge | 192 | 384 GiB | 

## Memory-optimized instances
<a name="workflow-task-memory-optimized-instances"></a>

For memory-optimized instance types, the configurations have less compute power and more memory.

HealthOmics supports the 32xlarge and 48xlarge instances in these regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.r.large | 2 | 16 GiB | 
| omics.r.xlarge | 4 | 32 GiB | 
| omics.r.2xlarge | 8 | 64 GiB | 
| omics.r.4xlarge | 16 | 128 GiB | 
| omics.r.8xlarge | 32 | 256 GiB | 
| omics.r.12xlarge | 48 | 384 GiB | 
| omics.r.16xlarge | 64 | 512 GiB | 
| omics.r.24xlarge | 96 | 768 GiB | 
| omics.r.32xlarge | 128 | 1024 GiB | 
| omics.r.48xlarge | 192 | 1536 GiB | 

## Accelerated-computing instances
<a name="workflow-task-accelerated-computing-instances"></a>

You can optionally specify GPU resources for each task in a workflow, so that HealthOmics allocates an accelerated-computing instance for the task. For information on how to specify the GPU information in the workflow definition file, see [Task accelerators in a HealthOmics workflow definition](task-accelerators.md).

If you specify a task accelerator that supports multiple instance types, HealthOmics selects the instance type based on availability. If more than one instance type is available, HealthOmics gives preference to the lower-cost instance. The exception is the `nvidia-t4-a10g-l4` task accelerator, which gives preference to the latest generation instance available in your Region.

G4 instances aren't supported in the Israel (Tel Aviv) Region. G5 instances aren't supported in the Asia Pacific (Singapore) Region. 



**Topics**
+ [G6 and G6e instance types](#workflow-task-accelerated-accelerated-g6)
+ [G4 and G5 instances](#workflow-task-accelerated-accelerated-g45)

### G6 and G6e instance types
<a name="workflow-task-accelerated-accelerated-g6"></a>

HealthOmics supports the following G6 accelerated-computing instance configurations. All omics.g6 instances use Nvidia L4 GPUs.

HealthOmics supports the G6 and G6e instances in these regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g6.xlarge | 4 | 16 GiB | 1 | 24 GiB | 
| omics.g6.2xlarge | 8 | 32 GiB | 1 | 24 GiB | 
| omics.g6.4xlarge | 16 | 64 GiB | 1 | 24 GiB | 
| omics.g6.8xlarge | 32 | 128 GiB | 1 | 24 GiB | 
| omics.g6.12xlarge | 48 | 192 GiB | 4 | 96 GiB | 
| omics.g6.16xlarge | 64 | 256 GiB | 1 | 24 GiB | 
| omics.g6.24xlarge | 96 | 384 GiB | 4 | 96 GiB | 

All omics.g6e instances use Nvidia L40s GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g6e.xlarge | 4 | 32 GiB | 1 | 48 GiB | 
| omics.g6e.2xlarge | 8 | 64 GiB | 1 | 48 GiB | 
| omics.g6e.4xlarge | 16 | 128 GiB | 1 | 48 GiB | 
| omics.g6e.8xlarge | 32 | 256 GiB | 1 | 48 GiB | 
| omics.g6e.12xlarge | 48 | 384 GiB | 4 | 192 GiB | 
| omics.g6e.16xlarge | 64 | 512 GiB | 1 | 48 GiB | 
| omics.g6e.24xlarge | 96 | 768 GiB | 4 | 192 GiB | 

### G4 and G5 instances
<a name="workflow-task-accelerated-accelerated-g45"></a>

HealthOmics supports the following G4 and G5 accelerated-computing instance configurations. 

All omics.g5 instances use Nvidia A10G GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g5.xlarge | 4 | 16 GiB | 1 | 24 GiB | 
| omics.g5.2xlarge | 8 | 32 GiB | 1 | 24 GiB | 
| omics.g5.4xlarge | 16 | 64 GiB | 1 | 24 GiB | 
| omics.g5.8xlarge | 32 | 128 GiB | 1 | 24 GiB | 
| omics.g5.12xlarge | 48 | 192 GiB | 4 | 96 GiB | 
| omics.g5.16xlarge | 64 | 256 GiB | 1 | 24 GiB | 
| omics.g5.24xlarge | 96 | 384 GiB | 4 | 96 GiB | 

All omics.g4dn instances use Nvidia Tesla T4 GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g4dn.xlarge | 4 | 16 GiB | 1 | 16 GiB | 
| omics.g4dn.2xlarge | 8 | 32 GiB | 1 | 16 GiB | 
| omics.g4dn.4xlarge | 16 | 64 GiB | 1 | 16 GiB | 
| omics.g4dn.8xlarge | 32 | 128 GiB | 1 | 16 GiB | 
| omics.g4dn.12xlarge | 48 | 192 GiB | 4 | 64 GiB | 
| omics.g4dn.16xlarge | 64 | 256 GiB | 1 | 16 GiB | 

# Task outputs in a HealthOmics workflow definition
<a name="workflows-task-outputs"></a>

You specify task outputs in the workflow definition. By default, HealthOmics discards all intermediate task files when the workflow completes. To export an intermediate file, you define it as an output. 

If you use call caching, HealthOmics saves task outputs to the cache, including any intermediate files that you define as outputs.

The following topics include task definition examples for each of the workflow definition languages.

**Topics**
+ [Task outputs for WDL](#workflow-task-outputs-wdl)
+ [Task outputs for Nextflow](#workflow-task-outputs-nextflow)
+ [Task outputs for CWL](#workflow-task-outputs-cwl)

## Task outputs for WDL
<a name="workflow-task-outputs-wdl"></a>

For workflow definitions written in WDL, define your outputs in the top level workflow **outputs** section. 

**Topics**
+ [Task output for STDOUT](#task-outputs-wdl-stdout)
+ [Task output for STDERR](#task-outputs-wdl-stderr)
+ [Task output to a file](#task-outputs-wdl-file)
+ [Task output to an array of files](#task-outputs-wdl-files)

### Task output for STDOUT
<a name="task-outputs-wdl-stdout"></a>

This example creates a task named `SayHello` that echoes its input message to STDOUT. The WDL **stdout** function captures the STDOUT content (in this example, the input string **Hello, World!**), which the task returns as the **stdout_file** output. 

Because HealthOmics creates logs for all STDOUT content, the output also appears in CloudWatch Logs, along with other STDERR logging information for the task.

```
version 1.0
 workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File stdout_file = SayHello.stdout_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        echo "~{message}" 
        echo "Current date: $(date)"
        echo "This message was printed to STDOUT"
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File stdout_file = stdout()
    }
}
```

### Task output for STDERR
<a name="task-outputs-wdl-stderr"></a>

This example creates a task named `SayHello` that echoes the STDERR content to the task output file. The WDL **stderr** function captures the STDERR content (in this example, the input string **Hello, World!**) in the file **stderr_file**. 

Because HealthOmics creates logs for all STDERR content, the output will appear in CloudWatch Logs, along with other STDERR logging information for the task.

```
version 1.0
workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File stderr_file = SayHello.stderr_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        echo "~{message}" >&2
        echo "Current date: $(date)" >&2
        echo "This message was printed to STDERR" >&2
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File stderr_file = stderr()
    }
}
```

### Task output to a file
<a name="task-outputs-wdl-file"></a>

In this example, the `SayHello` task creates two files (message.txt and info.txt) and explicitly declares these files as the named outputs (**message_file** and **info_file**). 

```
version 1.0
workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File message_file = SayHello.message_file
        File info_file = SayHello.info_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        # Create message file
        echo "~{message}" > message.txt
        
        # Create info file with date and additional information
        echo "Current date: $(date)" > info.txt
        echo "This message was saved to a file" >> info.txt
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File message_file = "message.txt"
        File info_file = "info.txt"
    } 
}
```

### Task output to an array of files
<a name="task-outputs-wdl-files"></a>

In this example, the `GenerateGreetings` task generates an array of files as the task output. The task dynamically generates one greeting file for each member of the input array `names`. Because the file names are not known until runtime, the output definition uses the WDL glob() function to output all files that match the pattern `*_greeting.txt`. 

```
version 1.0
workflow HelloArray {
    input {
        Array[String] names = ["World", "Friend", "Developer"]
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call GenerateGreetings {
        input:
            names = names,
            container = ubuntu_container
    }

    output {
        Array[File] greeting_files = GenerateGreetings.greeting_files
    }
}

task GenerateGreetings {
    input {
        Array[String] names
        String container
    }

    command  <<<
        # Create a greeting file for each name
        for name in ~{sep=" " names}; do
            echo "Hello, $name!" > ${name}_greeting.txt
        done
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        Array[File] greeting_files = glob("*_greeting.txt")
    }
}
```

## Task outputs for Nextflow
<a name="workflow-task-outputs-nextflow"></a>

For workflow definitions written in Nextflow, define a **publishDir** directive to export task content to your output Amazon S3 bucket. Set the **publishDir** value to `/mnt/workflow/pubdir`. 

For HealthOmics to export files to Amazon S3, the files must be in this directory.
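
As a minimal sketch (the process name, script, and output file name are hypothetical), a task that exports a file might declare `publishDir` like this:

```
process my_task {
    publishDir '/mnt/workflow/pubdir'

    output:
    path 'result.txt'

    script:
    """
    echo "task complete" > result.txt
    """
}
```

Because `publishDir` copies the declared outputs into `/mnt/workflow/pubdir`, HealthOmics exports `result.txt` to the run's output Amazon S3 location.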

If a task produces a group of output files for use as inputs to a subsequent task, we recommend that you group these files in a directory and emit the directory as a task output. Enumerating each individual file can result in an I/O bottleneck in the underlying file system. For example:

```
process my_task {
      ...
      output:
      // recommended
      path "output-folder/", emit: output

      // not recommended
      // path "output-folder/**", emit: output
      ...
  }
```

## Task outputs for CWL
<a name="workflow-task-outputs-cwl"></a>

For workflow definitions written in CWL, you can specify the task outputs using `CommandLineTool` tasks. The following sections show examples of `CommandLineTool` tasks that define different types of outputs.

**Topics**
+ [Task output for STDOUT](#task-outputs-cwl-stdout)
+ [Task output for STDERR](#task-outputs-cwl-stderr)
+ [Task output to a file](#task-outputs-cwl-file)
+ [Task output to an array of files](#task-outputs-cwl-files)

### Task output for STDOUT
<a name="task-outputs-cwl-stdout"></a>

This example creates a `CommandLineTool` task that echoes the STDOUT content to a text output file named **output.txt**. For example, if you provide the following input, the resulting task output is **Hello World!** in the **output.txt** file.

```
{
    "message": "Hello World!"
}
```

The `outputs` directive specifies that the output name is **example_out** and its type is `stdout`. A downstream task that consumes the output of this task refers to it as `example_out`.

Because HealthOmics creates logs for all STDERR and STDOUT content, the output also appears in CloudWatch Logs, along with other STDERR logging information for the task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output for STDERR
<a name="task-outputs-cwl-stderr"></a>

This example creates a `CommandLineTool` task that echoes the STDERR content to a text output file named **stderr.txt**. The task modifies the `baseCommand` so that `echo` writes to STDERR (instead of STDOUT).

The `outputs` directive specifies that the output name is **stderr_out** and its type is `stderr`. 

Because HealthOmics creates logs for all STDERR and STDOUT content, the output will appear in CloudWatch Logs, along with other STDERR logging information for the task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [bash, -c]
stderr: stderr.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
      shellQuote: true
      valueFrom: "echo $(self) >&2"
outputs:
  stderr_out:
    type: stderr

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output to a file
<a name="task-outputs-cwl-file"></a>

This example creates a `CommandLineTool` task that creates a compressed tar archive from the input files. You provide the name of the archive as an input parameter (**archive_name**). 

The **outputs** directive specifies that the `archive_file` output type is `File`, and it uses a reference to the input parameter `archive_name` to bind to the output file.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [tar, cfz]
inputs:
  archive_name:
    type: string
    inputBinding:
      position: 1
  input_files:
    type: File[]
    inputBinding:
      position: 2
      
outputs:
  archive_file:
    type: File
    outputBinding:
      glob: "$(inputs.archive_name)"

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output to an array of files
<a name="task-outputs-cwl-files"></a>

In this example, the `CommandLineTool` task creates an array of files using the `touch` command. The command uses the strings in the `files-to-create` input parameter to name the files. The command outputs an array of files. The array includes any files in the working directory that match the `glob` pattern. This example uses a wildcard pattern (`*`) that matches all files.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: touch
inputs:
  files-to-create:
    type:
      type: array
      items: string
    inputBinding:
      position: 1
outputs:
  output-files:
    type:
      type: array
      items: File
    outputBinding:
      glob: "*"

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

# Task resources in a HealthOmics workflow definition
<a name="task-resources"></a>

In the workflow definition, define the following for each task:
+ The container image for the task. For more information, see [Container images for private workflows](workflows-ecr.md).
+ The number of CPUs and memory required for the task. For more information, see [Compute and memory requirements for HealthOmics tasks](memory-and-compute-tasks.md).

HealthOmics ignores any per-task storage specifications. HealthOmics provides run storage that all tasks in the run can access. For more information, see [Run storage types in HealthOmics workflows](workflows-run-types.md).

------
#### [ WDL ]

```
task my_task {
   runtime {
      container: "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
      cpu: 2
      memory: "4 GB"
   }
   ...
}
```

For a WDL workflow, HealthOmics attempts up to two retries for a task that fails because of service errors (API request returns a 5XX HTTP status code). For more information about task retries, see [Task Retries](monitoring-runs.md#run-status-task-retries).

You can opt out of the retry behavior by specifying the following configuration for the task in the WDL definition file:

```
runtime {
   preemptible: 0
}
```

------
#### [ Nextflow ]

```
process my_task {
   container "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
   cpus 2
   memory "4 GiB"
   ...
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: CommandLineTool
requirements:
    DockerRequirement:
        dockerPull: "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
    ResourceRequirement:
        coresMax: 2
        ramMax: 4000 # specified in mebibytes
```

------

# Task accelerators in a HealthOmics workflow definition
<a name="task-accelerators"></a>

In the workflow definition, you can optionally specify a GPU accelerator spec for a task. The following table shows the accelerator spec values that HealthOmics supports, along with the corresponding instance types:


| Accelerator spec | HealthOmics instance types | 
| --- | --- | 
| nvidia-tesla-t4 | G4 | 
| nvidia-tesla-t4-a10g | G4 and G5 | 
| nvidia-tesla-a10g | G5 | 
| nvidia-t4-a10g-l4 | G4, G5, and G6 | 
| nvidia-l4-a10g | G5 and G6 | 
| nvidia-l4 | G6 | 
| nvidia-l40s | G6e | 

If you specify an accelerator type that supports multiple instance types, HealthOmics selects the instance type based on available capacity. When more than one instance type is available, HealthOmics prefers the lower cost instance. The exception is the nvidia-t4-a10g-l4 accelerator, which prefers the latest generation instance available.

For details about the instance types, see [Accelerated-computing instances](memory-and-compute-tasks.md#workflow-task-accelerated-computing-instances).

In the following example, the workflow definition specifies `nvidia-l4` as the accelerator:

------
#### [ WDL ]

```
task my_task {
 runtime {
    ...
    acceleratorCount: 1
    acceleratorType: "nvidia-l4"
 }
 ...
}
```

------
#### [ Nextflow ]

```
process my_task {
 ...
 accelerator 1, type: "nvidia-l4"
 ...
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ...
  cwltool:CUDARequirement:
      cudaDeviceCountMin: 1
      cudaComputeCapability: "nvidia-l4"
      cudaVersionMin: "1.0"
```

------

# WDL workflow definition specifics
<a name="workflow-languages-wdl"></a>

The following topics provide details about types and directives available for WDL workflow definitions in HealthOmics.

**Topics**
+ [Implicit type conversion in WDL lenient](#workflow-wdl-type-conversion)
+ [Namespace definition in input.json](#workflow-wdl-namespace-defn)
+ [Primitive types in WDL](#workflow-wdl-primitive-types)
+ [Complex types in WDL](#workflow-wdl-complex-types)
+ [Directives in WDL](#workflow-wdl-directives)
+ [Task metadata in WDL](#workflow-wdl-task-metadata)
+ [WDL workflow definition example](#wdl-example)

## Implicit type conversion in WDL lenient
<a name="workflow-wdl-type-conversion"></a>

HealthOmics supports implicit type conversion in the input.json file and the workflow definition. To use implicit type casting, specify the workflow engine as WDL lenient when you create the workflow. WDL lenient is designed to handle workflows migrated from Cromwell. It supports custom Cromwell directives and some non-conformant logic.

WDL lenient supports type conversion for the following items in the list of WDL’s [limited exceptions](https://github.com/openwdl/wdl/blob/wdl-1.2/SPEC.md#-limited-exceptions):
+ Float to Int, where the coercion results in no loss of precision (such as 1.0 maps to 1).
+ String to Int/Float, where the coercion results in no loss of precision.
+ Map[W, X] to Array[Pair[Y, Z]], in the case where W is coercible to Y and X is coercible to Z.
+ Array[Pair[W, X]] to Map[Y, Z], in the case where W is coercible to Y and X is coercible to Z.
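
For example, under WDL lenient, a quoted numeric value in input.json can coerce to a WDL Int (the workflow and parameter names here are hypothetical):

```
version 1.0
workflow MyWorkflow {
    input {
        Int thread_count
    }
}
```

```
{
    "MyWorkflow.thread_count": "4"
}
```

Under strict WDL, the string "4" doesn't match the Int type; under WDL lenient, it coerces to the integer 4 because there is no loss of precision.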

To use implicit type casting, specify the workflow engine as WDL lenient when you create the workflow or workflow version.

In the console, the workflow engine parameter is named **Language**. In the API, the workflow engine parameter is named **engine**. For more information, see [Create a private workflow](create-private-workflow.md) or [Create a workflow version](workflows-version-create.md).

## Namespace definition in input.json
<a name="workflow-wdl-namespace-defn"></a>

HealthOmics supports fully qualified variables in input.json. For example, if you declare two input variables named number1 and number2 in workflow **SumWorkflow**:

```
workflow SumWorkflow {
  input {
    Int number1
    Int number2
  }
}
```

 You can use them as fully qualified variables in input.json: 

```
{
    "SumWorkflow.number1": 15,
    "SumWorkflow.number2": 27
}
```

## Primitive types in WDL
<a name="workflow-wdl-primitive-types"></a>

The following table shows how inputs in WDL map to the matching primitive types. HealthOmics provides limited support for type coercion, so we recommend that you set explicit types. 


**Primitive types**  

| WDL type | JSON type | Example WDL | Example JSON key and value | Notes | 
| --- | --- | --- | --- | --- | 
| Boolean | boolean | Boolean b | "b": true | The value must be lower case and unquoted. | 
| Int | integer | Int i | "i": 7 | Must be unquoted. | 
| Float | number | Float f | "f": 42.2 | Must be unquoted. | 
| String | string | String s | "s": "characters" | JSON strings that are a URI must be mapped to a WDL file to be imported. | 
| File | string | File f | "f": "s3://amzn-s3-demo-bucket1/path/to/file" | Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes are supported (such as file://, https://, and ftp://). The URI must specify an object; it can't be a directory, meaning it can't end with a `/`. | 
| Directory | string | Directory d | "d": "s3://bucket/path/" | The Directory type isn't included in WDL 1.0 or 1.1, so add `version development` to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends with a `/`. All contents of the directory are recursively copied to the workflow as a single download. The directory should contain only files related to the workflow. | 
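
As an illustration of these mappings (the workflow name, parameter names, and bucket are hypothetical), a WDL input block and the matching input.json might look like this:

```
version 1.0
workflow TypeDemo {
    input {
        Boolean skip_qc
        Int threads
        Float quality_cutoff
        String sample_name
        File reads
    }
}
```

```
{
    "TypeDemo.skip_qc": false,
    "TypeDemo.threads": 4,
    "TypeDemo.quality_cutoff": 30.0,
    "TypeDemo.sample_name": "NA12878",
    "TypeDemo.reads": "s3://amzn-s3-demo-bucket1/inputs/NA12878.fastq.gz"
}
```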

## Complex types in WDL
<a name="workflow-wdl-complex-types"></a>

The following table shows how inputs in WDL map to the matching complex JSON types. Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays.


**Complex types**  

| WDL type | JSON type | Example WDL | Example JSON key and value | Notes | 
| --- | --- | --- | --- | --- | 
| Array | array | Array[Int] nums | “nums": [1, 2, 3] | The members of the array must follow the format of the WDL array type. | 
| Pair | object | Pair[String, Int] str_to_i | "str_to_i": {"left": "0", "right": 1} | Each value of the pair must use the JSON format of its matching WDL type. | 
| Map | object | Map[Int, String] int_to_string | "int_to_string": { "2": "hello", "1": "goodbye" } | Each entry in the map must use the JSON format of its matching WDL type. | 
| Struct | object | <pre>struct SampleBamAndIndex { <br />  String sample_name <br />  File bam <br />  File bam_index <br />} SampleBamAndIndex b_and_i</pre>  |  <pre>"b_and_i": { <br />   "sample_name": "NA12878", <br />   "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam", <br />   "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai" <br />}           </pre>  | The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of the matching WDL type. | 
| Object | N/A | N/A | N/A | The WDL Object type is outdated and should be replaced by Struct in all cases. | 

## Directives in WDL
<a name="workflow-wdl-directives"></a>

HealthOmics supports the following directives in all WDL versions that HealthOmics supports.

### Configure GPU resources
<a name="workflow-wdl-directive-gpu"></a>

HealthOmics supports runtime attributes **acceleratorType** and **acceleratorCount** with all supported [GPU instances](https://docs.aws.amazon.com/omics/latest/dev/task-accelerators.html). HealthOmics also supports aliases named **gpuType** and **gpuCount**, which have the same functionality as their accelerator counterparts. If the WDL definition contains both directives, HealthOmics uses the accelerator values.

The following example shows how to use these directives:

```
runtime {
    gpuCount: 2
    gpuType: "nvidia-tesla-t4"
}
```

### Configure task retry for service errors
<a name="workflow-wdl-task-retry"></a>

HealthOmics supports up to two retries for a task that failed because of service errors (5XX HTTP status codes). You can configure the maximum number of retries (1 or 2) and you can opt out of retries for service errors. By default, HealthOmics attempts a maximum of two retries. 

The following example sets `preemptible` to opt out of retries for service errors:

```
runtime {
   preemptible: 0
}
```

For more information about task retries in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

### Configure task retry for out of memory
<a name="workflow-wdl-retries"></a>

HealthOmics supports retries for a task that failed because it ran out of memory (container exit code 137, 4XX HTTP status code). HealthOmics doubles the amount of memory for each retry attempt.

By default, HealthOmics doesn't retry for this type of failure. Use the `maxRetries` directive to specify the maximum number of retries.

The following example sets `maxRetries` to 3, so that HealthOmics attempts a maximum of four attempts to complete the task (the initial attempt plus three retries):

```
runtime {
    maxRetries: 3
}
```

**Note**  
Task retry for out of memory requires GNU findutils 4.2.3 or later. The default HealthOmics container image includes this package. If you specify a custom image in your WDL definition, make sure that the image includes GNU findutils 4.2.3 or later.

### Configure return codes
<a name="workflow-wdl-directive-returnCodes"></a>

The **returnCodes** attribute provides a mechanism to specify a return code, or a set of return codes, that indicates a successful execution of a task. The WDL engine honors the return codes that you specify in the **runtime** section of the WDL definition, and sets the task's status accordingly. 

```
runtime {
    returnCodes: 1
}
```

HealthOmics also supports an alias named **continueOnReturnCode**, which has the same capabilities as **returnCodes**. If you specify both attributes, HealthOmics uses the **returnCodes** value.
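
The WDL specification also allows **returnCodes** to accept a list of codes (or the wildcard `"*"` to accept any code). As a sketch based on that specification, the following treats exit codes 0 and 1 as success:

```
runtime {
    returnCodes: [0, 1]
}
```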

## Task metadata in WDL
<a name="workflow-wdl-task-metadata"></a>

HealthOmics supports the following metadata options for WDL tasks.

### Disable task-level caching with the volatile attribute
<a name="workflow-wdl-volatile-attribute"></a>

The **volatile** attribute allows you to disable call caching for specific tasks in your WDL workflow. When a task is marked as volatile, it will always execute and never use cached results, even when caching is enabled for the run.

Add the **volatile** attribute to the **meta** section of your task definition:

```
task my_volatile_task {
    meta {
        volatile: true
    }
    
    input {
        String input_file
    }
    
    command {
        echo "Processing ${input_file}" > output.txt
    }
    
    output {
        File result = "output.txt"
    }
}
```

## WDL workflow definition example
<a name="wdl-example"></a>

The following examples show private workflow definitions for converting from `CRAM` to `BAM` in WDL. The `CRAM` to `BAM` workflow defines two tasks and uses tools from the `genomes-in-the-cloud` container, which is shown in the example and is publicly available. 

The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions to your container before it starts the run.

```
{
   ...
   "gotc_docker":"<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710"
}
```

The following example shows how to specify which files to use in your run, when the files are in an Amazon S3 bucket.

```
{
    "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```

If you want to specify files from a sequence store, indicate that as shown in the following example, using the URI for the sequence store.

```
{
    "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```

You can then define your workflow in WDL as shown in the following example. 

```
version 1.0
workflow CramToBamFlow {
    input {
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
    }
    #Converts CRAM to SAM to BAM and makes BAI.
    call CramToBamTask{
         input:
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            input_cram = input_cram,
            sample_name = sample_name,
            docker_image = gotc_docker,
     }
     #Validates Bam.
     call ValidateSamFile{
        input:
           input_bam = CramToBamTask.outputBam,
           docker_image = gotc_docker,
     }
     #Outputs Bam, Bai, and validation report to the FireCloud data model.
     output {
         File outputBam = CramToBamTask.outputBam
         File outputBai = CramToBamTask.outputBai
         File validation_report = ValidateSamFile.report
      }
}
#Task definitions.
task CramToBamTask {
    input {
       # Command parameters
       File ref_fasta
       File ref_fasta_index
       File ref_dict
       File input_cram
       String sample_name
       # Runtime parameters
       String docker_image
    }
   #Calls samtools view to do the conversion.
   command {
       set -eo pipefail

       samtools view -h -T ~{ref_fasta} ~{input_cram} |
       samtools view -b -o ~{sample_name}.bam -
       samtools index -b ~{sample_name}.bam
       mv ~{sample_name}.bam.bai ~{sample_name}.bai
    }
    
    #Runtime attributes:
    runtime {
        docker: docker_image
    }

    #Outputs a BAM and BAI with the same sample name
     output {
         File outputBam = "~{sample_name}.bam"
         File outputBai = "~{sample_name}.bai"
    }
}

#Validates BAM output to ensure it wasn't corrupted during the file conversion.
task ValidateSamFile {
   input {
      File input_bam
      Int machine_mem_size = 4
      String docker_image
   }
   String output_name = basename(input_bam, ".bam") + ".validation_report"
   Int command_mem_size = machine_mem_size - 1
   command {
       java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
       ValidateSamFile \
       INPUT=~{input_bam} \
       OUTPUT=~{output_name} \
       MODE=SUMMARY \
       IS_BISULFITE_SEQUENCED=false
    }
    runtime {
    docker: docker_image
    }
   #A text file is generated that lists errors or warnings that apply.
    output {
        File report = "~{output_name}"
    }
}
```

# Nextflow workflow definition specifics
<a name="workflow-definition-nextflow"></a>

HealthOmics supports Nextflow DSL1 and DSL2. For details, see [Nextflow version support](workflows-lang-versions.md#workflows-lang-versions-nextflow).

Nextflow DSL2 is based on the Groovy programming language, so parameters are dynamic and type coercion is possible using the same rules as Groovy. Parameters and values supplied by the input JSON are available in the parameters (`params`) map of the workflow.
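
For example (the parameter name and default value here are hypothetical), a value supplied in the input JSON as `{"greeting": "Hello"}` is available through the `params` map:

```
// Default value; an entry for "greeting" in the input JSON overrides it
params.greeting = 'Hello'

workflow {
    println "${params.greeting} from params"
}
```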

**Topics**
+ [Use nf-schema and nf-validation plugins](#schema-and-validation-plugins-nextflow)
+ [Specify storage URIs](#storage-uris-nextflow)
+ [Nextflow directives](#workflow-nexflow-directives)
+ [Export workflow-level content](#exporting-workflow-content-nextflow)
+ [Export task content](#exporting-task-content-nextflow)

## Use nf-schema and nf-validation plugins
<a name="schema-and-validation-plugins-nextflow"></a>

**Note**  
Summary of HealthOmics support for plugins:  
v22.04 – no support for plugins
v23.10 – supports `nf-schema` and `nf-validation`
v24.10 – supports `nf-schema`
v25.10 – supports `nf-schema`, `nf-core-utils`, `nf-fgbio`, and `nf-prov`

HealthOmics provides the following support for Nextflow plugins:
+ For Nextflow v23.10, HealthOmics pre-installs the nf-validation@1.1.1 plugin. 
+ For Nextflow v23.10 and later, HealthOmics pre-installs the nf-schema@2.3.0 plugin.
+ You cannot retrieve additional plugins during a workflow run. HealthOmics ignores any other plugin versions that you specify in the `nextflow.config` file.
+ For Nextflow v24 and higher, `nf-schema` is the new version of the deprecated `nf-validation` plugin. For more information, see [ nf-schema](https://github.com/nextflow-io/nf-schema) in the Nextflow GitHub repository.

## Specify storage URIs
<a name="storage-uris-nextflow"></a>

When an Amazon S3 or HealthOmics URI is used to construct a Nextflow file or path object, it makes the matching object available to the workflow, as long as read access is granted. The use of prefixes or directories is allowed for Amazon S3 URIs. For examples, see [Amazon S3 input parameter formats](workflows-run-inputs.md#s3-run-input-formats). 

HealthOmics partially supports the use of glob patterns in Amazon S3 URIs or HealthOmics storage URIs. Use glob patterns in the workflow definition to create `path` or `file` channels. For the expected behavior and exact cases, see [Nextflow handling of glob patterns in Amazon S3 inputs](workflows-run-inputs.md#wd-nextflow-s3-formats).
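
As a sketch (the bucket and prefix are hypothetical), a glob pattern over an Amazon S3 prefix can create a channel of matching files:

```
// One channel element per object that matches the glob pattern
ch_fastq = Channel.fromPath("s3://amzn-s3-demo-bucket1/fastq/*.fastq.gz")
```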

## Nextflow directives
<a name="workflow-nexflow-directives"></a>

You configure Nextflow directives in the Nextflow config file or workflow definition. The following list shows the order of precedence that HealthOmics uses to apply configuration settings, from lowest to highest priority:

1. Global configuration in the config file.

1. Task section of the workflow definition.

1. Task-specific selectors in the config file.

**Topics**
+ [Task retry strategy using `errorStrategy`](#workflow-nextflow-errorStrategy)
+ [Task retry attempts using `maxRetries`](#workflow-nexflow-task-retry)
+ [Opt out of task retry using `omicsRetryOn5xx`](#workflow-nextflow-retry-5xx)
+ [Task duration using the `time` directive](#time-directive-nextflow)

### Task retry strategy using `errorStrategy`
<a name="workflow-nextflow-errorStrategy"></a>

Use the `errorStrategy` directive to define the strategy for task errors. By default, when a task returns with an error indication (a non-zero exit status), the task stops and HealthOmics terminates the entire run. If you set `errorStrategy` to `retry`, HealthOmics attempts one retry of the failed task. To increase the number of retries, see [Task retry attempts using `maxRetries`](#workflow-nexflow-task-retry).

```
process myTask {
    label 'my_label'
    errorStrategy 'retry'

    script:
    """
    your-command-here
    """
}
```

For information about how HealthOmics handles task retries during a run, see [Task Retries](monitoring-runs.md#run-status-task-retries).

### Task retry attempts using `maxRetries`
<a name="workflow-nexflow-task-retry"></a>

By default, HealthOmics doesn't retry a failed task. If you set `errorStrategy` to `retry`, HealthOmics attempts one retry. To increase the maximum number of retries, set `errorStrategy` to `retry` and configure the maximum number of retries using the `maxRetries` directive.

The following example sets the maximum number of retries to 3 in the global configuration.

```
process {
    errorStrategy = 'retry'
    maxRetries = 3
}
```

The following example shows how to set `maxRetries` in the task section of the workflow definition.

```
process myTask {
    label 'my_label'
    errorStrategy 'retry'
    maxRetries 3
    
    script:
    """
    your-command-here
    """
}
```

The following example shows how to specify task-specific configuration in the Nextflow config file, based on the name or label selectors.

```
process {
    withLabel: 'my_label' {
        errorStrategy = 'retry'
        maxRetries = 3
    }

    withName: 'myTask' {
        errorStrategy = 'retry'
        maxRetries = 3
    }
}
```

### Opt out of task retry using `omicsRetryOn5xx`
<a name="workflow-nextflow-retry-5xx"></a>

For Nextflow v23 and later, HealthOmics retries tasks that fail because of service errors (5XX HTTP status codes). By default, HealthOmics attempts up to two retries of a failed task.

You can configure `omicsRetryOn5xx` to opt out of task retry for service errors. For more information about task retry in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

The following example configures `omicsRetryOn5xx` in the global configuration to opt out of task retry.

```
process {
    omicsRetryOn5xx = false
}
```

The following example shows how to configure `omicsRetryOn5xx` in the task section of the workflow definition.

```
process myTask {
    label 'my_label'
    omicsRetryOn5xx false
    
    script:
    """
    your-command-here
    """
}
```

The following example shows how to set `omicsRetryOn5xx` as task-specific configuration in the Nextflow config file, based on the name or label selectors.

```
process {
    withLabel: 'my_label' {
        omicsRetryOn5xx = false
    }

    withName: 'myTask' {
        omicsRetryOn5xx = false
    }
}
```

### Task duration using the `time` directive
<a name="time-directive-nextflow"></a>

HealthOmics provides an adjustable quota (see [HealthOmics service quotas](service-quotas.md)) to specify the maximum duration for a run. For Nextflow v23 and later workflows, you can also specify maximum task durations using the Nextflow `time` directive.

During new workflow development, setting a maximum task duration helps you catch runaway and long-running tasks.

For more information about the Nextflow time directive, see [time directive](https://www.nextflow.io/docs/latest/reference/process.html#process-time) in the Nextflow reference.

HealthOmics provides the following support for the Nextflow time directive:

1. HealthOmics supports 1-minute granularity for the time directive. You can specify a value between 60 seconds and the maximum run duration.

1. If you specify a value less than 60 seconds, HealthOmics rounds it up to 60 seconds. For values above 60 seconds, HealthOmics rounds down to the nearest minute.

1. If the workflow supports retries for a task, HealthOmics retries the task if it times out.

1. If a task times out (or the last retry times out), HealthOmics cancels the task. This operation can take one to two minutes.

1. On task timeout, HealthOmics sets the run and task status to failed, and it cancels the other tasks in the run (tasks in Starting, Pending, or Running status). HealthOmics exports the outputs from tasks that completed before the timeout to your designated S3 output location.

1. Time that a task spends in pending status does not count toward the task duration.

1. If the run is part of a run group and the run group times out sooner than the task timer, the run and task transition to failed status.

Specify the timeout duration using one or more of the following units: `ms`, `s`, `m`, `h`, or `d`.
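The duration units and rounding rules above can be sketched as follows (Python for illustration only; `parse_duration` and `effective_timeout_seconds` are hypothetical helpers, not the actual HealthOmics implementation):

```python
import re

# Seconds per supported unit: ms, s, m, h, d
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text):
    """Parse a Nextflow-style duration string such as '1h30m' into seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)", text):
        total += float(value) * UNIT_SECONDS[unit]
    return total

def effective_timeout_seconds(text):
    """Apply the documented granularity: values under 60 seconds round up
    to 60; values above 60 seconds round down to the nearest whole minute."""
    seconds = parse_duration(text)
    if seconds < 60:
        return 60
    return int(seconds // 60) * 60

print(effective_timeout_seconds("1h30m"))   # 5400
print(effective_timeout_seconds("45s"))     # 60
print(effective_timeout_seconds("90s"))     # 60 (rounds down to 1 minute)
```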

The following example shows how to specify global configuration in the Nextflow config file. It sets a global timeout of 1 hour and 30 minutes.

```
process {
    time = '1h30m'
}
```

The following example shows how to specify a time directive in the task section of the workflow definition. This example sets a timeout of 3 days, 5 hours, and 4 minutes. This value takes precedence over the global value in the config file, but doesn't take precedence over a task-specific time directive for `my_label` in the config file.

```
process myTask {
    label 'my_label'
    time '3d5h4m'
        
    script:
    """
    your-command-here
    """
}
```

The following example shows how to specify task-specific time directives in the Nextflow config file, based on the name or label selectors. This example sets a global task timeout value of 30 minutes. It sets a value of 2 hours for task `myTask` and sets a value of 3 hours for tasks with label `my_label`. For tasks that match the selector, these values take precedence over the global value and the value in the workflow definition.

```
process {
    time = '30m'
    
    withLabel: 'my_label' {
        time = '3h'  
    }

    withName: 'myTask' {
        time = '2h'  
    }
}
```

## Export workflow-level content
<a name="exporting-workflow-content-nextflow"></a>

For Nextflow v25.10, you can export files produced outside of individual tasks, such as provenance reports or pipeline DAGs. To export these files, write them to `/mnt/workflow/output/`. HealthOmics exports files placed in this directory to the `output/` prefix in your run's Amazon S3 output location.

The following example shows how to configure the `nf-prov` plugin to write a provenance report to `/mnt/workflow/output/`.

```
prov {
    formats {
        bco {
            file = "/mnt/workflow/output/pipeline_info/manifest.bco.json"
        }
    }
}
```

You can also pass this path as a parameter in your run's input JSON. This approach is common with nf-core workflows that use `params.outdir`.

```
{
    "outdir": "/mnt/workflow/output/"
}
```

## Export task content
<a name="exporting-task-content-nextflow"></a>

For workflows written in Nextflow, define a **publishDir** directive to export task content to your output Amazon S3 bucket. As shown in the following example, set the **publishDir** value to `/mnt/workflow/pubdir`. To export files to Amazon S3, the files must be in this directory.

```
nextflow.enable.dsl=2

workflow {
    CramToBamTask(params.ref_fasta, params.ref_fasta_index, params.ref_dict, params.input_cram, params.sample_name)
    ValidateSamFile(CramToBamTask.out.outputBam)
}

process CramToBamTask {
    container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"

    publishDir "/mnt/workflow/pubdir"

    input:
        path ref_fasta
        path ref_fasta_index
        path ref_dict
        path input_cram
        val sample_name

    output:
        path "${sample_name}.bam", emit: outputBam
        path "${sample_name}.bai", emit: outputBai

    script:
    """
    set -eo pipefail

    samtools view -h -T $ref_fasta $input_cram |
    samtools view -b -o ${sample_name}.bam -
    samtools index -b ${sample_name}.bam
    mv ${sample_name}.bam.bai ${sample_name}.bai
    """
}

process ValidateSamFile {
    container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"

    publishDir "/mnt/workflow/pubdir"

    input:
        file input_bam

    output:
        path "validation_report"

    script:
    """
    java -Xmx3G -jar /usr/gitc/picard.jar \
    ValidateSamFile \
    INPUT=${input_bam} \
    OUTPUT=validation_report \
    MODE=SUMMARY \
    IS_BISULFITE_SEQUENCED=false
    """
}
```

For Nextflow v25.10, as an alternative to `publishDir`, you can use workflow outputs to export task content. The following example shows how to define a workflow `output` block that exports task results to Amazon S3.

```
process myTask {
    input:
    val data

    output:
    path 'result.txt'

    script:
    """
    echo ${data} > result.txt
    """
}

workflow {
    main:
    output_file = myTask('hello')

    publish:
    results = output_file
}

output {
    results {
        path '.'
    }
}
```

For more information about workflow outputs, see [Workflow outputs](https://www.nextflow.io/docs/latest/workflow.html#workflow-output-def) in the Nextflow documentation.

# CWL workflow definition specifics
<a name="workflow-languages-cwl"></a>

Workflows written in Common Workflow Language, or CWL, offer similar functionality to workflows written in WDL and Nextflow. You can use Amazon S3 or HealthOmics storage URIs as input parameters. 

If you define an input that uses `secondaryFiles` in a subworkflow, add the same definition in the main workflow.

HealthOmics workflows don't support CWL operations. To learn more about operations in CWL workflows, see the [CWL documentation](https://www.commonwl.org/user_guide/topics/operations.html).

Best practice is to define a separate CWL workflow for each container that you use. We recommend that you don't hardcode the dockerPull entry with a fixed Amazon ECR URI.

**Topics**
+ [Convert CWL workflows to use HealthOmics](#workflow-cwl-convert)
+ [Opt out of task retry using `omicsRetryOn5xx`](#workflow-cwl-retry-5xx)
+ [Loop a workflow step](#workflow-cwl-loop)
+ [Retry tasks with increased memory](#workflow-cwl-out-of-memory-retry)
+ [Examples](#workflow-cwl-examples)

## Convert CWL workflows to use HealthOmics
<a name="workflow-cwl-convert"></a>

To convert an existing CWL workflow definition to use HealthOmics, make the following changes:
+ Replace all Docker container URIs with Amazon ECR URIs.
+ Make sure that all the workflow files are declared in the main workflow as input, and all variables are explicitly defined.
+ Make sure that all JavaScript code is strict-mode compliant.

## Opt out of task retry using `omicsRetryOn5xx`
<a name="workflow-cwl-retry-5xx"></a>

HealthOmics retries tasks that fail because of service errors (5XX HTTP status codes). By default, HealthOmics attempts up to two retries of a failed task. For more information about task retry in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

To opt out of task retry for service errors, configure the `omicsRetryOn5xx` directive in the workflow definition. You can define this directive under requirements or hints. We recommend adding the directive as a hint for portability.

```
requirements:
  ResourceRequirement:
    omicsRetryOn5xx: false

hints:
  ResourceRequirement:
    omicsRetryOn5xx: false
```

Requirements override hints. If a task implementation provides a resource requirement in hints that is also provided by requirements in an enclosing workflow, the enclosing requirements take precedence.

If the same task requirement appears at different levels of the workflow, HealthOmics uses the most specific entry from `requirements` (or `hints`, if there are no entries in `requirements`). The following list shows the order of precedence that HealthOmics uses to apply configuration settings, from lowest to highest priority:
+ Workflow level
+ Step level
+ Task section of the workflow definition

The following example shows how to configure the `omicsRetryOn5xx` directive at different levels of the workflow. In this example, the workflow-level requirement overrides the workflow-level hint. The requirements configurations at the task and step levels override the hints configurations.

```
class: Workflow
# Workflow-level requirement and hint
requirements:
  ResourceRequirement:
    omicsRetryOn5xx: false

hints:
  ResourceRequirement:
    omicsRetryOn5xx: false  # The value in requirements overrides this value 

steps:
  task_step:
    # Step-level requirement
    requirements:
      ResourceRequirement:
        omicsRetryOn5xx: false
    # Step-level hint
    hints:
      ResourceRequirement:
        omicsRetryOn5xx: false
    run:
      class: CommandLineTool
      # Task-level requirement
      requirements:
        ResourceRequirement:
          omicsRetryOn5xx: false
      # Task-level hint
      hints:
        ResourceRequirement:
          omicsRetryOn5xx: false
```
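The precedence behavior described above can be modeled with a short sketch (Python for illustration only; a simplified resolver, not the actual HealthOmics logic):

```python
# Levels ordered from lowest to highest priority, per the list above.
LEVELS = ["workflow", "step", "task"]

def resolve_setting(name, requirements, hints):
    """requirements/hints: dicts mapping level -> {setting: value}.
    The most specific entry from requirements wins; hints are consulted
    only when no requirements entry exists at any level."""
    for source, label in ((requirements, "requirements"), (hints, "hints")):
        for level in reversed(LEVELS):   # task, then step, then workflow
            entry = source.get(level, {})
            if name in entry:
                return entry[name], level, label
    return None, None, None

# A workflow-level requirement beats a more specific task-level hint
value, level, source = resolve_setting(
    "omicsRetryOn5xx",
    requirements={"workflow": {"omicsRetryOn5xx": False}},
    hints={"task": {"omicsRetryOn5xx": True}},
)
print(value, level, source)   # False workflow requirements
```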

## Loop a workflow step
<a name="workflow-cwl-loop"></a>

HealthOmics supports looping a workflow step. You can use loops to run workflow steps repeatedly until a specified condition is met. This is useful for iterative processes where you need to repeat a task multiple times or until a certain result is achieved.

**Note:** Loop functionality requires CWL version 1.2 or later. Workflows using CWL versions earlier than 1.2 do not support loop operations.

To use loops in your CWL workflow, define a Loop requirement. The following example shows the loop requirement configuration:

```
requirements:
  - class: "http://commonwl.org/cwltool#Loop"
    loopWhen: $(inputs.counter < inputs.max)
    loop:
      counter:
        loopSource: result
        valueFrom: $(self)
    outputMethod: last
```

The `loopWhen` field controls when the loop terminates. In this example, the loop continues as long as the counter is less than the maximum value. The `loop` field defines how input parameters are updated between iterations. The `loopSource` specifies which output from the previous iteration feeds into the next iteration. The `outputMethod` field set to `last` returns only the final iteration's output.
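The loop semantics described above can be modeled with a small sketch (Python for illustration only; a simplified model of the cwltool Loop extension, not HealthOmics code):

```python
def run_loop_step(step, inputs, loop_when, update, output_method="last"):
    """Run `step` repeatedly while `loop_when(inputs)` is true, feeding
    each iteration's output back into the next iteration via `update`.
    With output_method='last', only the final iteration's output is kept."""
    outputs = []
    while loop_when(inputs):
        result = step(inputs)
        outputs.append(result)
        inputs = update(inputs, result)
    if output_method == "last":
        return outputs[-1] if outputs else None
    return outputs  # output_method == "all"

# Example: increment a counter until it reaches inputs["max"]
result = run_loop_step(
    step=lambda inp: {"result": inp["counter"] + 1},
    inputs={"counter": 0, "max": 3},
    loop_when=lambda inp: inp["counter"] < inp["max"],
    update=lambda inp, out: {**inp, "counter": out["result"]},
)
print(result)   # {'result': 3}
```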

## Retry tasks with increased memory
<a name="workflow-cwl-out-of-memory-retry"></a>

HealthOmics supports automatic retry of out-of-memory task failures. When a task exits with code 137 (out-of-memory), HealthOmics creates a new task with increased memory allocation based on the specified multiplier.

**Note**  
HealthOmics retries out-of-memory failures up to 3 times or until the memory allocation reaches 1536 GiB, whichever limit is reached first.

The following example shows how to configure out-of-memory retry:

```
hints:
  ResourceRequirement:
    ramMin: 4096
  http://arvados.org/cwl#OutOfMemoryRetry:
    memoryRetryMultiplier: 2.5
```

When a task fails due to out-of-memory, HealthOmics calculates the retry memory allocation using the formula: `previous_run_memory × memoryRetryMultiplier`. In the example above, if the task with 4096 MB of memory fails, the retry attempt uses 4096 × 2.5 = 10,240 MB of memory.

The `memoryRetryMultiplier` parameter controls how much additional memory to allocate for retry attempts:
+ **Default value:** If you don't specify a value, it defaults to `2` (doubles the memory)
+ **Valid range:** Must be a positive number greater than `1`. Invalid values result in a 4XX validation error
+ **Minimum effective value:** Values between `1` and `1.5` are automatically increased to `1.5` to ensure meaningful memory increases and prevent excessive retry attempts
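The retry calculation and multiplier rules above can be sketched as follows (Python for illustration only; `retry_memory_sequence` is a hypothetical helper under the documented rules, not a HealthOmics API):

```python
def retry_memory_sequence(initial_mib, multiplier=None, max_retries=3,
                          cap_mib=1536 * 1024):
    """Model the documented out-of-memory retry behavior: each retry
    multiplies the previous allocation, up to 3 retries or the
    1536 GiB ceiling, whichever comes first."""
    if multiplier is None:
        multiplier = 2.0                  # documented default: double the memory
    if multiplier <= 1:
        raise ValueError("memoryRetryMultiplier must be greater than 1")
    multiplier = max(multiplier, 1.5)     # minimum effective value is 1.5
    allocations = []
    memory = initial_mib
    for _ in range(max_retries):
        memory = memory * multiplier
        if memory > cap_mib:
            break
        allocations.append(memory)
    return allocations

# The example above: a 4096 MB task with multiplier 2.5 retries at 10,240 MB
print(retry_memory_sequence(4096, 2.5))   # [10240.0, 25600.0, 64000.0]
```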

## Examples
<a name="workflow-cwl-examples"></a>

The following is an example of a workflow written in CWL. 

```
cwlVersion: v1.2
class: Workflow

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]
  out_filename: string
  docker_image: string

outputs:
  copied_file:
    type: File
    outputSource: copy_step/copied_file

steps:
  copy_step:
    in:
      in_file: in_file
      out_filename: out_filename
      docker_image: docker_image
    out: [copied_file]
    run: copy.cwl
```

The following file defines the `copy.cwl` task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cp

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]
    inputBinding:
      position: 1
  out_filename:
    type: string
    inputBinding:
      position: 2
  docker_image:
    type: string

outputs:
  copied_file:
    type: File
    outputBinding:
      glob: $(inputs.out_filename)

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerPull: "$(inputs.docker_image)"
```

The following is an example of a workflow written in CWL with a GPU requirement.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: ["/bin/bash", "docm_haplotypeCaller.sh"]
$namespaces:
  cwltool: http://commonwl.org/cwltool#
requirements:
  cwltool:CUDARequirement:
    cudaDeviceCountMin: 1
    cudaComputeCapability: "nvidia-tesla-t4"
    cudaVersionMin: "1.0"
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
    - entryname: 'docm_haplotypeCaller.sh'
      entry: |
        nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv

inputs: []
outputs: []
```

# Example workflow definitions
<a name="workflow-definition-examples"></a>

The following example shows the same workflow definition in WDL, Nextflow, and CWL.

------
#### [ WDL ]

```
version 1.1

task my_task {
   runtime { ... }
   input {
       File input_file
       String name
       Int threshold
   }
   
   command <<<
   my_tool --name ~{name} --threshold ~{threshold} ~{input_file}
   >>>
   
   output {
       File results = "results.txt"
   }
}

workflow my_workflow {
   input {
       File input_file
       String name
       Int threshold = 50
   }
   
   call my_task {
       input:
          input_file = input_file,
          name = name,
          threshold = threshold
   }
   output {
       File results = my_task.results
   }
}
```

------
#### [ Nextflow ]

```
nextflow.enable.dsl = 2

params.input_file = null
params.name = null
params.threshold = 50

process my_task {
   // <directives>
   
   input:
     path input_file
     val name
     val threshold
   
   output:
     path 'results.txt', emit: results
   
   script:
     """
     my_tool --name ${name} --threshold ${threshold} ${input_file}
     """
     
   
}

workflow MY_WORKFLOW {
   my_task(
       params.input_file,
       params.name,
       params.threshold
   )
}

workflow {
   MY_WORKFLOW()
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: Workflow

requirements:
    InlineJavascriptRequirement: {}

inputs:
   input_file: File
   name: string
   threshold: int

outputs:
    result:
        type: ...
        outputSource: ...

steps:
    my_task:
        run:
            class: CommandLineTool
            baseCommand: my_tool
            requirements:
                ...
            inputs:
                name:
                    type: string
                    inputBinding:
                        prefix: "--name"
                threshold:
                    type: int
                    inputBinding:
                        prefix: "--threshold"
                input_file:
                    type: File
                    inputBinding: {}
            outputs:
                results:
                    type: File
                    outputBinding:
                        glob: results.txt
```

------

# Parameter template files for HealthOmics workflows
<a name="parameter-templates"></a>

Parameter templates define the input parameters for a workflow. You can define input parameters to make your workflow more flexible and versatile. For example, you can define a parameter for the Amazon S3 location of the reference genome files. Parameter templates can be provided through a Git-based repository service or your local drive. Users can then run the workflow using various data sets. 

You can create the parameter template for your workflow, or HealthOmics can generate the parameter template for you.

The parameter template is a JSON file. In the file, each input parameter is a named object that must match the name of the workflow input. When you start a run, if you don't provide values for all the required parameters, the run fails.

The input parameter object includes the following attributes:
+ **description** – This required attribute is a string that the console displays in the **Start run** page. This description is also retained as run metadata.
+ **optional** – This optional attribute indicates whether the input parameter is optional. If you don't specify the **optional** field, the input parameter is required.

The following example parameter template shows how to specify the input parameters.

```
{
  "myRequiredParameter1": {
     "description": "this parameter is required"
  },
  "myRequiredParameter2": {
     "description": "this parameter is also required",
     "optional": false
  },
  "myOptionalParameter": {
     "description": "this parameter is optional",
     "optional": true
  }
}
```
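The required/optional semantics above can be illustrated with a short sketch (Python for illustration only; `validate_run_inputs` is a hypothetical helper, not a HealthOmics API):

```python
def validate_run_inputs(template, run_inputs):
    """Check run inputs against a parameter template: every parameter not
    marked optional must be supplied, per the behavior described above.
    Returns the list of missing required parameter names."""
    missing = []
    for name, spec in template.items():
        required = not spec.get("optional", False)   # absent "optional" means required
        if required and name not in run_inputs:
            missing.append(name)
    return missing

template = {
    "myRequiredParameter1": {"description": "this parameter is required"},
    "myRequiredParameter2": {"description": "this parameter is also required",
                             "optional": False},
    "myOptionalParameter": {"description": "this parameter is optional",
                            "optional": True},
}
print(validate_run_inputs(template, {"myRequiredParameter1": "value"}))
# ['myRequiredParameter2']
```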

## Generating parameter templates
<a name="parameter-parsing"></a>

HealthOmics generates the parameter template by parsing the workflow definition to detect input parameters. If you provide a parameter template file for a workflow, the parameters in your file override the parameters detected in the workflow definition.

There are slight differences between the parsing logic of the CWL, WDL, and Nextflow engines, as described in the following sections. 

**Topics**
+ [Parameter detection for CWL](#parameter-parsing-cwl)
+ [Parameter detection for WDL](#parameter-parsing-wdl)
+ [Parameter detection for Nextflow](#parameter-parsing-nextflow)

### Parameter detection for CWL
<a name="parameter-parsing-cwl"></a>

In the CWL workflow engine, the parsing logic makes the following assumptions:
+ Any nullable supported types are marked as optional input parameters.
+ Any non-null supported types are marked as required input parameters.
+ Any parameters with default values are marked as optional input parameters.
+ Descriptions are extracted from the `label` field in the `main` workflow definition. If `label` is not specified, the description is blank (an empty string). 

The following tables show CWL interpolation examples. For each example, the parameter name is `x`. If the parameter is required, you must provide a value for the parameter. If the parameter is optional, you don't need to provide a value.

This table shows CWL interpolation examples for primitive types.


| Input | Example input/output | Required | 
| --- | --- | --- | 
|  <pre>x:               <br />  type: int</pre>  | 1 or 2 or ... | Yes | 
|  <pre>x:               <br />  type: int<br />  default: 2</pre>  | Default value is 2. Valid input is 1 or 2 or ... | No | 
|  <pre>x:               <br />  type: int?</pre>  | Valid input is None or 1 or 2 or ... | No | 
|  <pre>x:               <br />  type: int?<br />  default: 2</pre>  | Default value is 2. Valid input is None or 1 or 2 or ... | No | 

The following table shows CWL interpolation examples for complex types. A complex type is a collection of primitive types.


| Input | Example input/output | Required | 
| --- | --- | --- | 
|  <pre>x:               <br />  type: array<br />  items: int</pre>  | [] or [1,2,3]  | Yes | 
|  <pre>x:               <br />  type: array?<br />  items: int</pre>  | None or [] or [1,2,3]  | No | 
|  <pre>x:               <br />  type: array<br />  items: int?</pre>  |  [] or [None, 3, None]  | Yes | 
|  <pre>x:               <br />  type: array?<br />  items: int?</pre>  |  [None] or None or [1,2,3] or [None, 3] but not []  | No | 
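The CWL detection rules in this section can be summarized with a short sketch (Python for illustration only; a simplification that inspects only the trailing `?` nullability marker and the presence of a default, not a full CWL type parser):

```python
def cwl_param_required(cwl_type, has_default=False):
    """Per the rules above: nullable types (written with a trailing '?')
    and parameters with default values are optional; any other supported
    type is required. Only the outer type is inspected, so nullable
    items inside a non-nullable array don't make the array optional."""
    nullable = cwl_type.rstrip().endswith("?")
    return not (nullable or has_default)

print(cwl_param_required("int"))                    # True (required)
print(cwl_param_required("int", has_default=True))  # False (optional)
print(cwl_param_required("int?"))                   # False (optional)
print(cwl_param_required("array"))                  # True (required)
```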

### Parameter detection for WDL
<a name="parameter-parsing-wdl"></a>

In the WDL workflow engine, the parsing logic makes the following assumptions:
+ Any nullable supported types are marked as optional input parameters. 
+ For non-nullable supported types:
  + Any input variable with assignment of literals or expression are marked as optional parameters. For example:

    ```
    Int x = 2
    Float f0 = 1.0 + f1
    ```
  + If no values or expressions have been assigned to the input parameters, they are marked as required parameters. 
+ Descriptions are extracted from `parameter_meta` in the `main` workflow definition. If `parameter_meta` is not specified, the description will be blank (an empty string). For more information, see the WDL specification for [Parameter metadata](https://github.com/openwdl/wdl/blob/wdl-1.2/SPEC.md#metadata-sections).

The following tables show WDL interpolation examples. For each example, the parameter name is `x`. If the parameter is required, you must provide a value for the parameter. If the parameter is optional, you don't need to provide a value.

This table shows WDL interpolation examples for primitive types.


| Input | Example input/output | Required | 
| --- | --- | --- | 
| Int x | 1 or 2 or ... | Yes | 
| Int x = 2 | 2 | No | 
| Int x = 1 + 2 | 3 | No | 
| Int x = y + z | y + z | No | 
| Int? x | None or 1 or 2 or ... | No | 
| Int? x = 2 | None or 2 | No | 
| Int? x = 1 + 2 | None or 3 | No | 
| Int? x = y + z | None or y + z | No | 

The following table shows WDL interpolation examples for complex types. A complex type is a collection of primitive types. 


| Input | Example input/output | Required | 
| --- | --- | --- | 
| Array[Int] x | [1,2,3] or [] | Yes | 
| Array[Int]+ x | [1], but not [] | Yes | 
| Array[Int]? x | None or [] or [1,2,3] | No | 
| Array[Int?] x | [] or [None, 3, None] | Yes | 
| Array[Int?]+? x | [None] or None or [1,2,3] or [None, 3] but not [] | No | 
| Struct sample {String a, Int y} later in inputs: Sample mySample |  <pre>String a = mySample.a<br />Int y = mySample.y</pre>  | Yes | 
| Struct sample {String a, Int y} later in inputs: Sample? mySample |  <pre>if (defined(mySample)) { <br />  String a = mySample.a<br />  Int y = mySample.y<br />} </pre>  | No | 

### Parameter detection for Nextflow
<a name="parameter-parsing-nextflow"></a>

For Nextflow, HealthOmics generates the parameter template by parsing the `nextflow_schema.json` file. If the workflow definition doesn't include a schema file, HealthOmics parses the main workflow definition file.

**Topics**
+ [Parsing the schema file](#parameter-parsing-nextflow-schema)
+ [Parsing the main file](#parameter-parsing-nextflow-main)
+ [Nested parameters](#parameter-parsing-nextflow-nested)
+ [Examples of Nextflow interpolation](#parameter-parsing-nextflow-examples)

#### Parsing the schema file
<a name="parameter-parsing-nextflow-schema"></a>

For parsing to work correctly, make sure the schema file meets the following requirements:
+ The schema file is named `nextflow_schema.json` and is located in the same directory as the main workflow file.
+ The schema file is valid JSON as defined in either of the following schemas:
  + [json-schema.org/draft/2020-12/schema](https://json-schema.org/draft/2020-12/schema).
  + [json-schema.org/draft-07/schema](https://json-schema.org/draft-07/schema).

HealthOmics parses the `nextflow_schema.json` file to generate the parameter template:
+ Extracts all **properties** that are defined in the schema.
+ Includes the property **description** if available for the property.
+ Identifies whether each parameter is optional or required, based on the **required** field of the property.

The following example shows a definition file and the generated parameter file.

```
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "$defs": {
        "input_options": {
            "title": "Input options",
            "type": "object",
            "required": ["input_file"],
            "properties": {
                "input_file": {
                    "type": "string",
                    "format": "file-path",
                    "pattern": "^s3://[a-z0-9.-]{3,63}(?:/\\S*)?$",
                    "description": "description for input_file"
                },
                "input_num": {
                    "type": "integer",
                    "default": 42,
                    "description": "description for input_num"
                }
            }
        },
        "output_options": {
            "title": "Output options",
            "type": "object",
            "required": ["output_dir"],
            "properties": {
                "output_dir": {
                    "type": "string",
                    "format": "file-path",
                    "description": "description for output_dir",
                }
            }
        }
    },
    "properties": {
        "ungrouped_input_bool": {
            "type": "boolean",
            "default": true
        }
    },
    "required": ["ungrouped_input_bool"],
    "allOf": [
        { "$ref": "#/$defs/input_options" },
        { "$ref": "#/$defs/output_options" }
    ]
}
```

The generated parameter template:

```
{
    "input_file": {
        "description": "description for input_file",
        "optional": false
    },
    "input_num": {
        "description": "description for input_num",
        "optional": true
    },
    "output_dir": {
        "description": "description for output_dir",
        "optional": false
    },
    "ungrouped_input_bool": {
        "description": null,
        "optional": false
    }
}
```

#### Parsing the main file
<a name="parameter-parsing-nextflow-main"></a>

If the workflow definition doesn't include a `nextflow_schema.json` file, HealthOmics parses the main workflow definition file.

HealthOmics analyzes the `params` expressions found in the main workflow definition file and in the `nextflow.config` file. All `params` with default values are marked as optional.

For parsing to work correctly, note the following requirements:
+ HealthOmics parses only the main workflow definition file. To ensure all parameters are captured, we recommend that you wire all **params** through to any submodules and imported workflows.
+ The config file is optional. If you define one, name it `nextflow.config` and place it in the same directory as the main workflow definition file.

The following example shows a definition file and the generated parameter template.

```
params.input_file = "default.txt"
params.threads = 4
params.memory = "8GB"

workflow {
    if (params.version) {
        println "Using version: ${params.version}"
    }
}
```

The generated parameter template:

```
{
    "input_file": {
        "description": null,
        "optional": true
    },
    "threads": {
        "description": null,
        "optional": true
    },
    "memory": {
        "description": null,
        "optional": true
    },
    "version": {
        "description": null,
        "optional": false
    }
}
```

For default values that are defined in `nextflow.config`, HealthOmics collects `params` assignments and parameters declared within a `params {}` block, as shown in the following example. In assignment statements, `params` must appear on the left side of the statement.

```
params.alpha = "alpha"
params.beta = "beta"

params {
    gamma = "gamma"
    delta = "delta"
}

env {
   // ignored, as this assignment isn't in the params block
   VERSION = "TEST"  
}

// ignored, as params is not on the left side
interpolated_image = "${params.cli_image}"
```

The generated parameter template:

```
{
    // other params in your main workflow definition
    "alpha": {
        "description": null,
        "optional": true
    },
    "beta": {
        "description": null,
        "optional": true
    },
    "gamma": {
        "description": null,
        "optional": true
    },
    "delta": {
        "description": null,
        "optional": true
    }
}
```

#### Nested parameters
<a name="parameter-parsing-nextflow-nested"></a>

Both `nextflow_schema.json` and `nextflow.config` allow nested parameters. However, the HealthOmics parameter template requires only the top-level parameters. If your workflow uses a nested parameter, you must provide a JSON object as the input for that parameter.

##### Nested parameters in schema files
<a name="parameter-parsing-schema-nested"></a>

HealthOmics skips nested **params** when parsing a `nextflow_schema.json` file. For example, if you define the following `nextflow_schema.json` file:

```
{
    "properties": {
        "input": {
            "properties": {
                "input_file": { ... },
                "input_num": { ... }
            }
        },
        "input_bool": { ... }
    }
}
```

HealthOmics ignores `input_file` and `input_num` when it generates the parameter template:

```
{
    "input": {
        "description": null,
        "optional": true
    },
    "input_bool": {
        "description": null,
        "optional": true
    }
}
```

When you run this workflow, HealthOmics expects an `input.json` file similar to the following:

```
{
   "input": {
       "input_file": "s3://bucket/obj",
       "input_num": 2
   },
   "input_bool": false
}
```

##### Nested parameters in config files
<a name="parameter-parsing-config-nested"></a>

HealthOmics doesn't collect nested **params** in a `nextflow.config` file, and skips them during parsing. For example, if you define the following `nextflow.config` file:

```
params.alpha = "alpha"
params.nested.beta = "beta"

params {
    gamma = "gamma"
    group {
        delta = "delta"
    }
}
```

HealthOmics ignores `params.nested.beta` and `params.group.delta` when it generates the parameter template:

```
{
    "alpha": {
        "description": None,
        "optional": True
    },
    "gamma": {
        "description": None,
        "optional": True
    }
}
```
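
The flattening behavior described above can be sketched in a few lines of Python. This is a simplified illustration of how the parameter template is derived from `nextflow.config` params, not the actual service implementation; the helper name `template_from_config` is hypothetical.

```python
# Sketch: derive a parameter template from nextflow.config params.
# Nested groups (dict values) are skipped, matching the behavior
# described above. Illustrative only -- not the HealthOmics parser.

def template_from_config(params: dict) -> dict:
    """Keep top-level params; skip nested groups."""
    return {
        name: {"description": None, "optional": True}
        for name, value in params.items()
        if not isinstance(value, dict)  # nested groups are skipped
    }
```

For the config file above, `template_from_config({"alpha": "alpha", "gamma": "gamma", "group": {"delta": "delta"}})` keeps only `alpha` and `gamma`.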

#### Examples of Nextflow interpolation
<a name="parameter-parsing-nextflow-examples"></a>

The following table shows Nextflow interpolation examples for params in the main file.


| Parameters | Required | 
| --- | --- | 
| `params.input_file` | Yes | 
| `params.input_file = "s3://bucket/data.json"` | No | 
| `params.nested.input_file` | N/A | 
| `params.nested.input_file = "s3://bucket/data.json"` | N/A | 

The following table shows Nextflow interpolation examples for params in the `nextflow.config` file.


| Parameters | Required | 
| --- | --- | 
|  <pre>params.input_file = "s3://bucket/data.json"</pre>  | No | 
|  <pre>params {<br />   input_file = "s3://bucket/data.json"<br />}</pre>  | No | 
|  <pre>params {<br />   nested {<br />     input_file = "s3://bucket/data.json"    <br />   }<br />}</pre>  | N/A | 
|  <pre>input_file = params.input_file</pre>  | N/A | 

# Container images for private workflows
<a name="workflows-ecr"></a>

HealthOmics supports container images hosted in Amazon ECR private repositories. You can create container images and upload them to the private repository. You can also use your Amazon ECR private registry as a pull through cache to synchronize the contents of upstream registries.

Your Amazon ECR repository must reside in the same AWS Region as the account calling the service. A different AWS account can own the container image, as long as the source image repository provides appropriate permissions. For more information, see [Policies for cross-account Amazon ECR access](permissions-ecr.md#permissions-cross-account).

We recommend that you define your Amazon ECR container image URIs as parameters in your workflow so that access can be verified before the run begins. It also makes it easier to run a workflow in a new Region by changing the Region parameter.

**Note**  
HealthOmics doesn't support ARM containers and doesn't support access to public repositories.

For information about configuring IAM permissions for HealthOmics to access Amazon ECR, see [HealthOmics Resource permissions](permissions-resource.md).

**Topics**
+ [Synchronizing with third-party container registries](#ecr-pull-through)
+ [General considerations for Amazon ECR container images](#ecr-considerations)
+ [Environment variables for HealthOmics workflows](#ecr-env-vars)
+ [Using Java in Amazon ECR container images](#ecr-java-considerations)
+ [Add task inputs to an Amazon ECR container image](#ecr-tasks)

## Synchronizing with third-party container registries
<a name="ecr-pull-through"></a>

You can use Amazon ECR pull through cache rules to synchronize repositories in a supported upstream registry with your Amazon ECR private repositories. For more information, see [Sync an upstream registry](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache.html) in the *Amazon ECR User Guide*.

The pull through cache automatically creates the image repository in your private registry on the first pull through the rule, and it automatically keeps the cached image synchronized when the upstream image changes. 

HealthOmics supports pull through cache for the following upstream registries: 
+ Amazon ECR Public
+ Kubernetes container image registry
+ Quay
+ Docker Hub 
+ Microsoft Azure Container Registry
+ GitHub Container Registry 
+ GitLab Container Registry 

HealthOmics doesn't support pull through cache for an upstream Amazon ECR private repository.

Benefits of using Amazon ECR pull through cache include:

1. You avoid having to manually migrate container images to Amazon ECR or to synchronize updates from the third party repository. 

1. Workflows access the synchronized container images in your private repository, which is more reliable than downloading content at run time from a public registry.

1. Because Amazon ECR pull through caches use a predictable URI structure, the HealthOmics service can automatically map the Amazon ECR private URI with the upstream registry URI. You aren't required to update and replace URI values in the workflow definition.

**Topics**
+ [Configuring pull through cache](#ecr-pull-through-configure)
+ [Registry mappings](#ecr-pull-through-registry-mapping)
+ [Image mappings](#ecr-pull-through-mapping-format)

### Configuring pull through cache
<a name="ecr-pull-through-configure"></a>

Amazon ECR provides a registry for your AWS account in each Region. Make sure that you create the Amazon ECR configuration in the same Region where you plan to run the workflow.

The following sections describe the configuration tasks for pull through cache.

**Topics**
+ [Create a pull through cache rule](#create-ecr-ptc)
+ [Registry permissions for upstream registry](#reg-ecr-ptc)
+ [Repository creation templates](#repo-create-templates-ptc)
+ [Creating the workflow](#reg-mapping-ecr-ptc)

#### Create a pull through cache rule
<a name="create-ecr-ptc"></a>

Create an Amazon ECR pull through cache rule for each upstream registry that has images you want to cache. A rule specifies a mapping between an upstream registry and the Amazon ECR private repository. 

For an upstream registry that requires authentication, you provide your credentials using AWS Secrets Manager.

**Note**  
Don't change a pull through cache rule while an active run is using the private repository. The run could fail or, more critically, result in your pipeline using unexpected images.

For more information, see [Creating a pull through cache rule](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html) in the *Amazon Elastic Container Registry User Guide*.

##### Create a pull through cache rule using the console
<a name="create-ecr-ptc-console"></a>

To configure pull through cache, follow these steps using the Amazon ECR console:

1. Open the Amazon ECR console at https://console.aws.amazon.com/ecr.

1. From the left menu, under **Private registry**, expand **Features & Settings**, then choose **Pull through cache**.

1. From the **Pull through cache** page, choose **Add rule**.

1. In the **Upstream registry** panel, choose the upstream registry to sync with your private registry, then choose **Next**.

1. If the upstream registry requires authentication, the console opens a new page where you specify the Secrets Manager secret that contains your credentials. Choose **Next**.

1. Under **Specify namespaces**, in the **Cache namespace** panel, choose whether to create the private repositories using a specific repository prefix or with no prefix. If you choose to use a prefix, specify the prefix name in **Cache repository prefix**.

1. In the **Upstream namespace** panel, choose whether to pull from upstream repositories using a specific repository prefix or with no prefix. If you choose to use a prefix, specify the prefix name in **Upstream repository prefix**.

   The **Namespace example** panel shows an example pull request, upstream URL, and the URL of the cache repository that is created.

1. Choose **Next**.

1. Review the configuration and choose **Create** to create the rule.

For more information, see [ Create a pull through cache rule (AWS Management Console)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html#pull-through-cache-creating-rule-console).

##### Create a pull through cache rule using the CLI
<a name="create-ecr-ptc-cli"></a>

Use the Amazon ECR **create-pull-through-cache-rule** command to create a pull through cache rule. For upstream registries that require authentication, store the credentials in an AWS Secrets Manager secret.

The following sections provide examples for each supported upstream registry.

##### For Amazon ECR Public
<a name="ecr-ptc-cli-public-ecr"></a>

The following example creates a pull through cache rule for the Amazon ECR Public registry. It specifies a repository prefix of `ecr-public`, so each repository created through the pull through cache rule uses the naming scheme `ecr-public/upstream-repository-name`.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix ecr-public \
     --upstream-registry-url public.ecr.aws \
     --region us-east-1
```

##### For Kubernetes Container Registry
<a name="ecr-ptc-cli-kubernetes"></a>

The following example creates a pull through cache rule for the Kubernetes public registry. It specifies a repository prefix of `kubernetes`, so each repository created through the pull through cache rule uses the naming scheme `kubernetes/upstream-repository-name`.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix kubernetes \
     --upstream-registry-url registry.k8s.io \
     --region us-east-1
```

##### For Quay
<a name="ecr-ptc-cli-quay"></a>

The following example creates a pull through cache rule for the Quay public registry. It specifies a repository prefix of `quay`, so each repository created through the pull through cache rule uses the naming scheme `quay/upstream-repository-name`.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix quay \
     --upstream-registry-url quay.io \
     --region us-east-1
```

##### For Docker Hub
<a name="ecr-ptc-cli-docker-hub"></a>

The following example creates a pull through cache rule for the Docker Hub registry. It specifies a repository prefix of `docker-hub`, so each repository created through the pull through cache rule uses the naming scheme `docker-hub/upstream-repository-name`. You must specify the full Amazon Resource Name (ARN) of the secret containing your Docker Hub credentials.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix docker-hub \
     --upstream-registry-url registry-1.docker.io \
     --credential-arn arn:aws:secretsmanager:us-east-1:111122223333:secret:ecr-pullthroughcache/example1234 \
     --region us-east-1
```

##### For GitHub Container Registry
<a name="ecr-ptc-cli-public-github"></a>

The following example creates a pull through cache rule for the GitHub Container Registry. It specifies a repository prefix of `github`, so each repository created through the pull through cache rule uses the naming scheme `github/upstream-repository-name`. You must specify the full Amazon Resource Name (ARN) of the secret containing your GitHub Container Registry credentials.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix github \
     --upstream-registry-url ghcr.io \
     --credential-arn arn:aws:secretsmanager:us-east-1:111122223333:secret:ecr-pullthroughcache/example1234 \
     --region us-east-1
```

##### For Microsoft Azure Container Registry
<a name="ecr-ptc-cli-azure"></a>

The following example creates a pull through cache rule for the Microsoft Azure Container Registry. It specifies a repository prefix of `azure`, so each repository created through the pull through cache rule uses the naming scheme `azure/upstream-repository-name`. You must specify the full Amazon Resource Name (ARN) of the secret containing your Microsoft Azure Container Registry credentials.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix azure \
     --upstream-registry-url myregistry.azurecr.io \
     --credential-arn arn:aws:secretsmanager:us-east-1:111122223333:secret:ecr-pullthroughcache/example1234 \
     --region us-east-1
```

##### For GitLab Container Registry
<a name="ecr-ptc-cli-gitlab"></a>

The following example creates a pull through cache rule for the GitLab Container Registry. It specifies a repository prefix of `gitlab`, so each repository created through the pull through cache rule uses the naming scheme `gitlab/upstream-repository-name`. You must specify the full Amazon Resource Name (ARN) of the secret containing your GitLab Container Registry credentials.

```
aws ecr create-pull-through-cache-rule \
     --ecr-repository-prefix gitlab \
     --upstream-registry-url registry.gitlab.com \
     --credential-arn arn:aws:secretsmanager:us-east-1:111122223333:secret:ecr-pullthroughcache/example1234 \
     --region us-east-1
```

For more information, see [ Create a pull through cache rule (CLI)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html#pull-through-cache-creating-rule-cli) in the *Amazon ECR User Guide*.

You can use the **get-run-task** CLI command to retrieve information about the container image used for a specific task:

```
aws omics get-run-task --id 1234567 --task-id <task_id>
```

The output includes the following information about the container image:

```
 "imageDetails": {
    "image": "string",
    "imageDigest": "string",
    "sourceImage": "string", 
          ...
 }
```
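
If you script this check, you can pull the upstream source image out of the response with standard JSON parsing. The helper below is an illustration; the response string is truncated to the fields shown above, and the image values are examples.

```python
import json

# Parse an (illustrative, truncated) get-run-task response and return
# the upstream image that the cached task image was synchronized from.
def source_image(response_json: str) -> str:
    task = json.loads(response_json)
    return task["imageDetails"]["sourceImage"]

# Example response body with only the imageDetails fields shown above.
response = (
    '{"imageDetails": {'
    '"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/quay/samtools:1.17", '
    '"imageDigest": "sha256:abc123", '
    '"sourceImage": "quay.io/samtools:1.17"}}'
)
```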

#### Registry permissions for upstream registry
<a name="reg-ecr-ptc"></a>

Use registry permissions to allow HealthOmics to use the pull through cache and to pull the container images into the Amazon ECR private registry. Add an Amazon ECR Registry policy to the registry that provides the containers used in runs. 

The following policy grants permission for the HealthOmics service to create repositories with the specified pull through cache prefixes and to initiate upstream pulls into these repositories. 

1. From the Amazon ECR console, open the left menu. Under **Private registry**, expand **Registry permissions**, then choose **Generate statement**.

1. On the top right side, choose **JSON**. Enter a policy similar to the following:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "AllowPTCinRegPermissions",
               "Effect": "Allow",
               "Principal": {
                   "Service": "omics.amazonaws.com"
               },
               "Action": [
                   "ecr:CreateRepository",
                   "ecr:BatchImportUpstreamImage"
               ],
               "Resource": [
                   "arn:aws:ecr:us-east-1:123456789012:repository/ecr-public/*",
                   "arn:aws:ecr:us-east-1:123456789012:repository/docker-hub/*"
               ] 
           }
       ]
   }
   ```

------

#### Repository creation templates
<a name="repo-create-templates-ptc"></a>

To use pull through caching in HealthOmics, the Amazon ECR repository must have a repository creation template. The template defines configuration settings for when you or Amazon ECR create a private repository for an upstream registry. 

Each template contains a repository namespace prefix, which Amazon ECR uses to match new repositories to a specific template. Templates specify the configuration for all repository settings including resource-based access policies, tag immutability, encryption, and lifecycle policies.

For more information, see [Repository creation templates](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-creation-templates.html) in the *Amazon Elastic Container Registry User Guide*.

To create a repository creation template:

1. From the Amazon ECR console, open the left menu. Under **Private registry**, expand **Features and settings**, then choose **Repository creation templates**.

1. Choose **Create template**.

1. In **Template details**, choose **Pull through cache**.

1. Choose whether to apply this template to a specific prefix or to all repositories that don't match another template.

   If you choose **A specific prefix**, enter the namespace prefix value in **Prefix**. You specified this prefix when you created the pull through cache rule.

1. Choose **Next**.

1. On the **Add repository creation configuration** page, enter the **Repository permissions**. Use one of the sample policy statements, or enter one similar to the following example:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "PTCRepoCreationTemplate",
               "Effect": "Allow",
               "Principal": {
                   "Service": "omics.amazonaws.com"
               },
               "Action": [
                   "ecr:BatchGetImage",
                   "ecr:GetDownloadUrlForLayer"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Optionally, you can add repository settings such as lifecycle policy and tags. Amazon ECR applies these rules for all container images created for pull through cache that use the specified prefix.

1. Choose **Next**.

1. Review the configuration and choose **Next**.

#### Creating the workflow
<a name="reg-mapping-ecr-ptc"></a>

When you create a new workflow or workflow version, review the registry mappings and update them if required. For details, see [Create a private workflow](create-private-workflow.md).

### Registry mappings
<a name="ecr-pull-through-registry-mapping"></a>

You define registry mappings to map between prefixes in your private Amazon ECR registry and the upstream registry names.

For more information about Amazon ECR registry mappings, see [ Creating a pull through cache rule in Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html).

The following example shows registry mappings to Docker Hub, Quay, and Amazon ECR Public.

```
{
    "registryMappings": [
        {
            "upstreamRegistryUrl": "registry-1.docker.io",
            "ecrRepositoryPrefix": "docker-hub"
        },
        {
            "upstreamRegistryUrl": "quay.io",
            "ecrRepositoryPrefix": "quay"
        },
        {
            "upstreamRegistryUrl": "public.ecr.aws",
            "ecrRepositoryPrefix": "ecr-public"
        }
    ]
}
```
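
Because the cache URI structure is predictable, a mapping like the one above resolves an upstream image reference by prefix substitution. The following sketch illustrates the resolution; the account ID, Region, and helper name are example values, not service requirements.

```python
# Illustrative resolution of an upstream image URI to its pull through
# cache URI in a private registry. Account ID and Region are examples.
ECR_REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"

REGISTRY_MAPPINGS = {
    "registry-1.docker.io": "docker-hub",
    "quay.io": "quay",
    "public.ecr.aws": "ecr-public",
}

def to_private_uri(image: str) -> str:
    """Map upstream-registry/repo:tag to the cached private-registry URI."""
    registry, _, repository = image.partition("/")
    prefix = REGISTRY_MAPPINGS[registry]
    return f"{ECR_REGISTRY}/{prefix}/{repository}"
```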

### Image mappings
<a name="ecr-pull-through-mapping-format"></a>

You define image mappings to map between the image names used in your workflow definition and the image URIs in your private Amazon ECR registry.

You can use image mappings with registries that support pull through cache. You can also use image mappings with upstream registries where HealthOmics doesn't support pull through cache; in that case, you manually synchronize the upstream registry with your private repository. 

For more information about Amazon ECR image mappings, see [ Creating a pull through cache rule in Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/pull-through-cache-creating-rule.html).

The following example shows mappings from a public genomics image and the latest Ubuntu image to private Amazon ECR images.

```
{
    "imageMappings": [
        {
            "sourceImage": "public.ecr.aws/aws-genomics/broadinstitute/gatk:4.6.0.2",
            "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/broadinstitute/gatk:4.6.0.2"
        },
        {
            "sourceImage": "ubuntu:latest",
            "destinationImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/custom/ubuntu:latest",
        }
    ]
}
```

## General considerations for Amazon ECR container images
<a name="ecr-considerations"></a>
+ Architecture

  HealthOmics supports x86_64 containers. If your local machine is ARM-based, such as an Apple silicon Mac, use a command such as the following to build an x86_64 container image: 

  ```
  docker build --platform linux/amd64 -t my_tool:latest .
  ```
+ Entrypoint and shell

  HealthOmics workflow engines inject bash scripts as a command override to the container images used by workflow tasks. Therefore, build container images without a specified `ENTRYPOINT`, so that a bash shell is the default. 
+ Mounted paths

  A shared filesystem is mounted to container tasks at `/tmp`. Any data or tooling built into the container image at this location is overridden.

  The workflow definition is available to tasks through a read-only mount at `/mnt/workflow`.
+ Image size

  See [HealthOmics workflow fixed size quotas](fixed-quotas.md#fixed-quotas-workflows) for the maximum container image sizes.

## Environment variables for HealthOmics workflows
<a name="ecr-env-vars"></a>

HealthOmics provides environment variables that have information about the workflow running in the container. You can use the values of these variables in the logic of your workflow tasks.

All HealthOmics workflow variables start with the `AWS_WORKFLOW_` prefix. This prefix is a protected environment variable prefix. Don't use this prefix for your own variables in workflow containers. 

HealthOmics provides the following workflow environment variables:

**AWS_REGION**  
This variable is the AWS Region where the container is running.

**AWS_WORKFLOW_RUN**  
This variable is the ARN of the current run.

**AWS_WORKFLOW_RUN_ID**  
This variable is the run identifier of the current run.

**AWS_WORKFLOW_RUN_UUID**  
This variable is the run UUID of the current run.

**AWS_WORKFLOW_TASK**  
This variable is the ARN of the current task.

**AWS_WORKFLOW_TASK_ID**  
This variable is the task identifier of the current task.

**AWS_WORKFLOW_TASK_UUID**  
This variable is the task UUID of the current task.

The following example shows typical values for each environment variable:

```
AWS Region: us-east-1
Workflow Run: arn:aws:omics:us-east-1:123456789012:run/6470304
Workflow Run ID: 6470304
Workflow Run UUID: f4d9ed47-192e-760e-f3a8-13afedbd4937
Workflow Task: arn:aws:omics:us-east-1:123456789012:task/4192063
Workflow Task ID: 4192063
Workflow Task UUID: f0c9ed49-652c-4a38-7646-60ad835e0a2e
```
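
One common use for these variables is namespacing task outputs or log messages. The following Python sketch builds a per-task prefix from the environment; the `runs/.../tasks/...` layout is an example convention, not something the service requires.

```python
import os

# Build a per-task output prefix from the HealthOmics-provided
# environment variables. The "runs/.../tasks/..." layout is just an
# example convention, not a HealthOmics requirement.
def task_output_prefix(env=os.environ) -> str:
    run_id = env.get("AWS_WORKFLOW_RUN_ID", "unknown-run")
    task_id = env.get("AWS_WORKFLOW_TASK_ID", "unknown-task")
    return f"runs/{run_id}/tasks/{task_id}"
```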

## Using Java in Amazon ECR container images
<a name="ecr-java-considerations"></a>

If a workflow task uses a Java application such as GATK, consider the following memory requirements for the container:
+ Java applications use stack memory and heap memory. By default, the maximum heap memory is a percentage of the total available memory in the container. This default depends on the specific JVM distribution and JVM version, so consult the relevant documentation for your JVM or explicitly set the heap memory maximum using Java command line options (such as `-Xmx`). 
+ Don't set the maximum heap memory to be 100% of the container's memory allocation, because the JVM stack also requires memory. Memory is also required for the JVM garbage collector and any other operating system processes running in the container.
+ Some Java applications, such as GATK, can use native method invocations or other optimizations such as memory mapping files. These techniques require memory allocations that are performed “off heap”, which aren't controlled by the JVM maximum heap parameter. 

  If you know (or suspect) that your Java application allocates off-heap memory, make sure your task memory allocation includes the off-heap memory requirements.

  If these off-heap allocations cause the container to run out of memory, you typically won't see a Java **OutOfMemory** error, because the JVM doesn't control this memory. 
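
As a rough sizing aid, you can budget the heap as a fraction of the task's memory allocation and leave the rest for the JVM stack, garbage collector, off-heap allocations, and operating system processes. The 75% figure below is a rule of thumb, not an official recommendation:

```python
# Rule-of-thumb -Xmx sizing: reserve a fraction of container memory
# for non-heap use (JVM stack, GC, off-heap allocations, OS processes).
def max_heap_gib(container_gib: int, heap_fraction: float = 0.75) -> int:
    """Return a whole-GiB -Xmx value, always leaving at least 1 GiB free."""
    heap = int(container_gib * heap_fraction)
    return min(heap, container_gib - 1)
```

For a task with 16 GiB of memory, this suggests starting the application with roughly `java -Xmx12g`, then adjusting based on observed memory use.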

## Add task inputs to an Amazon ECR container image
<a name="ecr-tasks"></a>

Add all executables, libraries, and scripts needed to run a workflow task into the Amazon ECR image that's used to run the task. 

It's best practice to avoid using scripts, binaries, and libraries that are external to a task's container image. This is especially important when using `nf-core` workflows that use a `bin` directory as part of the workflow package. Although this directory is available to the workflow task, it's mounted as a read-only directory. Copy any required resources from this directory into the task's container image when you build the image, so that they're available at runtime. 

See [HealthOmics workflow fixed size quotas](fixed-quotas.md#fixed-quotas-workflows) for the maximum size of container image that HealthOmics supports.

# HealthOmics Workflow README files
<a name="workflows-readme"></a>

You can upload a README.md file containing instructions, diagrams, and essential information for your workflow. Each workflow version supports one README file, which you can update at any time.

**README requirements include:**
+ The README file must be in Markdown (.md) format
+ Maximum file size: 500 KiB

**Topics**
+ [Use an existing README](#workflows-add-readme)
+ [Rendering conditions](#workflows-rendering-readme)

## Use an existing README
<a name="workflows-add-readme"></a>

READMEs exported from Git repositories contain relative links that typically do not work outside the repository. HealthOmics Git integration automatically converts these to absolute links for proper rendering in the console, eliminating the need for manual URL updates. 

For READMEs imported from Amazon S3 or local drives, images and links must either use public URLs or have their relative paths updated for proper rendering.

**Note**  
Images must be publicly hosted to display in the HealthOmics console. Images stored in GitHub Enterprise Server or GitLab Self-Managed repositories cannot be rendered.

## Rendering conditions
<a name="workflows-rendering-readme"></a>

The HealthOmics console interpolates publicly accessible images and links using absolute paths. To render URLs from private repositories, the user must have access to the repository. For GitHub Enterprise Server or GitLab Self-Managed repositories, which use custom domains, HealthOmics cannot resolve relative links or render images stored in these private repositories.

The following table shows the markdown elements that are supported by the AWS console README view.


| Element | AWS console | 
| --- | --- | 
| Alerts | Yes, but without icons | 
| Badges | Yes | 
| Basic text formatting | Yes | 
| [Code blocks](https://www.markdownguide.org/basic-syntax/#code-blocks) | Yes, but does not have [syntax highlight](https://www.markdownguide.org/extended-syntax/#syntax-highlighting) and copy button functionality  | 
| Collapsible sections | Yes | 
| [Headings](https://www.markdownguide.org/basic-syntax/#headings) | Yes | 
| [Image formats](https://www.markdownguide.org/basic-syntax/#images-1) | Yes | 
| [Image (clickable)](https://www.markdownguide.org/basic-syntax/#linking-images) | Yes | 
| [Line breaks](https://www.markdownguide.org/basic-syntax/#line-breaks) | Yes | 
| Mermaid diagrams | Limited; you can open the graph, move the graph position, and copy the code | 
| Quotes | Yes | 
| [Subscript](https://www.markdownguide.org/extended-syntax/#subscript) and [superscript](https://www.markdownguide.org/extended-syntax/#superscript) | Yes | 
| [Tables](https://www.markdownguide.org/extended-syntax/#tables) | Yes, but does not support text alignment | 
| Text alignment | Yes | 

### Using image and link URLs
<a name="workflows-urls-readme"></a>

Depending on your source provider, structure your base URLs for pages and images in the following formats.
+ `{username}`: The username where the repository is hosted.
+ `{repo}`: The repository name.
+ `{ref}`: The source reference (branch, tag, or commit ID). 
+ `{path}`: The file path to the page or image in the repository. 


| Source provider | Page URL | Image URL | 
| --- | --- | --- | 
| GitHub | `https://github.com/{username}/{repo}/blob/{ref}/{path}` |  `https://github.com/{username}/{repo}/blob/{ref}/{path}?raw=true` `https://raw.githubusercontent.com/{username}/{repo}/{ref}/{path}`  | 
| GitLab | `https://gitlab.com/{username}/{repo}/-/blob/{ref}/{path}` | `https://gitlab.com/{username}/{repo}/-/raw/{ref}/{path}` | 
| Bitbucket | `https://bitbucket.org/{username}/{repo}/src/{ref}/{path}` | `https://bitbucket.org/{username}/{repo}/raw/{ref}/{path}` | 
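
If you rewrite relative image paths in a README programmatically, you can construct the raw-content form from the same components. This helper only covers the GitHub raw-content pattern from the table; the example values are placeholders.

```python
# Build the raw-content image URL for a file in a public GitHub repo,
# following the raw.githubusercontent.com pattern from the table above.
def github_raw_url(username: str, repo: str, ref: str, path: str) -> str:
    return f"https://raw.githubusercontent.com/{username}/{repo}/{ref}/{path}"
```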

GitHub, GitLab, and Bitbucket support both page and image URLs that link to a public repository. The following table shows each source provider’s support for rendering image and link URLs for private repositories.


| Source provider | Page URL | Image URL | 
| --- | --- | --- | 
| GitHub | Only with access to the repository | No | 
| GitLab | Only with access to the repository | No | 
| Bitbucket | Only with access to the repository | No | 

# Requesting Sentieon licenses for private workflows
<a name="private-workflows-subscribe"></a>

If your private workflow uses Sentieon software, you need a Sentieon license. Follow these steps to request and set up a license for the Sentieon software:
+ Request a Sentieon license 
  + Send an email to the Sentieon support group (support@sentieon.com) to request a software license.
    + Provide your AWS Canonical User ID in the email.
    + Find your AWS Canonical User ID by following [these instructions](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindCanonicalId).
+ Update your HealthOmics service role to grant it access to the Sentieon licensing server proxy and Sentieon Omics bucket in your Region. The following example grants access in `us-east-1`. If required, replace this text with your Region.

------
#### [ JSON ]

****  

  ```
  {
             "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "s3:GetObjectAcl",
                  "s3:GetObject"
              ],
              "Resource": [
                  "arn:aws:s3:::omics-ap-us-east-1/*",
                  "arn:aws:s3:::sentieon-omics-license-us-east-1/*"
              ]
          }
      ]
  }
  ```

------
+ Generate an AWS support case to get access to the Sentieon license server proxy. 
  + To create a support case, navigate to [support.console.aws.amazon.com](https://support.console.aws.amazon.com).
  + Provide your AWS account and Region in the support case. Your account is added to the allowlist for the licensing server proxy.
+ Build your private workflow using the Sentieon container and the Sentieon license script.
  + For additional instructions on using Sentieon tools inside private workflows, see [Sentieon-Amazon-Omics](https://github.com/Sentieon/sentieon-amazon-omics) on GitHub.
+ Sentieon software versions 202112.07 and higher support the HealthOmics licensing server proxy. To use Sentieon software versions earlier than 202112.07, contact Sentieon support.

# Workflow linters in HealthOmics
<a name="workflows-linter"></a>

After you create a workflow, we recommend that you run a linter on the workflow before you start the first run. The linter detects errors that can cause the run to fail. 

For WDL, HealthOmics automatically runs a linter when you create the workflow. The linter output is available in the `statusMessage` field of the **get-workflow** response. Use the following CLI command to retrieve the status output (use the workflow ID of the WDL workflow that you created): 

```
aws omics get-workflow \
   --id 123456 \
   --query 'statusMessage'
```

HealthOmics provides linters that you can run on the workflow definition before you create the workflow. Run these linters on existing pipelines that you're migrating to HealthOmics.
+ **WDL** – A public Amazon ECR image to run a [WDL linter](https://gallery.ecr.aws/aws-genomics/healthomics-linter).
+ **Nextflow** – A public Amazon ECR image to run [Linter rules for Nextflow]( https://gallery.ecr.aws/aws-genomics/linter-rules-for-nextflow). You can access the source code for this linter from [GitHub](https://github.com/awslabs/linter-rules-for-nextflow).
+ **CWL** – Not available

# HealthOmics workflow operations
<a name="creating-private-workflows"></a>

To create a private workflow, you need:
+  **Workflow definition file:** A workflow definition file written in WDL, Nextflow, or CWL. The workflow definition specifies the inputs and outputs for runs that use the workflow. It also includes specifications for the runs and run tasks for your workflow, including compute and memory requirements. The workflow definition file must be in `.zip` format. For more information, see [Workflow definition files](workflow-definition-files.md) in HealthOmics.
  + You can use [Kiro CLI](https://docs.aws.amazon.com/kiro/latest/userguide/what-is.html) to build and validate your workflow definition files in WDL, Nextflow, and CWL. For more information, see [Example prompts for Kiro CLI](getting-started.md#omics-kiro-prompts) and the [HealthOmics Agentic generative AI tutorial](https://github.com/aws-samples/aws-healthomics-tutorials/tree/main/generative-ai) on GitHub.
+  **(Optional) Parameter template file:** A parameter template file written in JSON. Create the file to define the run parameters, or HealthOmics generates the parameter template for you. For more information, see [Parameter template files for HealthOmics workflows](parameter-templates.md). 
+ **Amazon ECR container images:** Create private Amazon ECR repositories for each container used in the workflow. Create container images for the workflow and store them in a private repository, or synchronize the contents of a supported upstream registry with your ECR private repository. 
+  **(Optional) Sentieon licenses:** Request a Sentieon license to use the Sentieon software in private workflows.

For workflow definition files larger than 4 MiB (zipped), choose one of these options during workflow creation:
+ Upload to an Amazon Simple Storage Service (Amazon S3) folder and specify the location.
+ Upload to an external repository (max size 1 GiB) and specify the repository details.
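
As a rough illustration (this helper is hypothetical and not part of any AWS SDK or CLI), the size-based choice can be sketched as:

```python
# Hypothetical helper: pick a definition input source for CreateWorkflow
# based on the zipped-size limits described above.
def definition_upload_option(zipped_size_bytes: int) -> str:
    MIB = 1024 * 1024
    if zipped_size_bytes <= 4 * MIB:
        return "inline"  # small enough to pass directly with --definition-zip
    # Larger archives must come from Amazon S3 (--definition-uri) or from a
    # Git repository (--definition-repository, maximum 1 GiB).
    return "s3-or-repository"

print(definition_upload_option(2 * 1024 * 1024))  # -> inline
```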

After you create a workflow, you can update the following workflow information with the `UpdateWorkflow` operation:
+ Name
+ Description
+ Default storage type
+ Default storage capacity (with workflow ID)
+ README.md file

To change other information in the workflow, create a new workflow or workflow version.

Use workflow versioning to organize and structure your workflows. Versions also help you to manage the introduction of iterative workflow updates. For more information about versions, see [Create a workflow version](workflows-version-create.md).

**Topics**
+ [Create a private workflow](create-private-workflow.md)
+ [Update a private workflow](update-private-workflow.md)
+ [Delete a private workflow](delete-private-workflow.md)
+ [Verify the workflow status](using-get-workflow.md)
+ [Referencing genome files from a workflow definition](create-ref-files.md)

# Create a private workflow
<a name="create-private-workflow"></a>

Create a workflow using the HealthOmics console, AWS CLI commands, or one of the AWS SDKs.

**Note**  
Don’t include any personally identifiable information (PII) in workflow names. These names are visible in CloudWatch logs.

When you create a workflow, HealthOmics assigns a universally unique identifier (UUID) to the workflow. The UUID is unique across all workflows and workflow versions. For data provenance purposes, we recommend that you use the workflow UUID to uniquely identify workflows.

If your workflow tasks use any external tools (executables, libraries, or scripts), you build these tools into a container image. You have the following options for hosting the container image:
+ Host the container image in the ECR private registry. Prerequisites for this option:
  + Create an ECR private repository, or choose an existing repository.
  + Configure the ECR resource policy as described in [Amazon ECR permissions](permissions-ecr.md).
  + Upload your container image to the private repository. 
+ Synchronize the container image with the contents of a supported third-party registry. Prerequisites for this option:
  + In the ECR private registry, configure a pull through cache rule for each upstream registry. For more information, see [Image mappings](workflows-ecr.md#ecr-pull-through-mapping-format).
  + Configure the ECR resource policy as described in [Amazon ECR permissions](permissions-ecr.md).
  + Create repository creation templates. The template defines settings for when Amazon ECR creates the private repository for an upstream registry.
  + Create prefix mappings to remap container image references in the workflow definition to ECR cache namespaces.
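
As a sketch of what prefix remapping accomplishes (the mapping table, account ID, and Region below are placeholder assumptions, not defaults):

```python
# Illustrative only: show how a prefix mapping redirects an upstream image
# reference into an Amazon ECR pull through cache namespace. The registry
# prefixes, account ID, and Region are placeholders.
PREFIX_MAPPINGS = {
    "quay.io": "quay",                       # upstream registry -> ECR prefix
    "registry-1.docker.io": "docker-hub",
}

def remap_image(image_uri: str, account_id: str, region: str) -> str:
    registry, _, repository = image_uri.partition("/")
    prefix = PREFIX_MAPPINGS.get(registry)
    if prefix is None:
        return image_uri  # no mapping: the reference is used as written
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{prefix}/{repository}"

print(remap_image("quay.io/biocontainers/samtools:1.17", "123456789012", "us-west-2"))
```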

When you create a workflow, you provide a workflow definition that contains information about the workflow, runs, and tasks. HealthOmics can retrieve the workflow definition as a .zip archive stored locally or in an Amazon S3 bucket, or from a supported Git-based repository. 

**Topics**
+ [Creating a workflow using the console](#console-create-workflows)
+ [Creating a workflow using the CLI](#api-create-workflows)
+ [Creating a workflow using an SDK](#sdk-create-workflows)

## Creating a workflow using the console
<a name="console-create-workflows"></a>

**Steps to create a workflow**

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Private workflows**.

1. On the **Private workflows** page, choose **Create workflow**.

1. On the **Define workflow** page, provide the following information:

   1. **Workflow name**: A distinctive name for this workflow. We recommend setting workflow names to organize your runs in the AWS HealthOmics console and CloudWatch logs.

   1. **Description** (optional): A description of this workflow.

1. In the **Workflow definition** panel, provide the following information:

   1. **Workflow language** (optional): Select the specification language of the workflow. Otherwise, HealthOmics determines the language from the workflow definition.

   1. For **Workflow definition source**, choose to import the definition folder from a Git-based repository, an Amazon S3 location, or from a local drive.

      1. For **Import from a repository service**:
**Note**  
HealthOmics supports public and private repositories for GitHub, GitLab, Bitbucket, GitHub self-managed, and GitLab self-managed.

         1. Choose a **Connection** to connect your AWS resources to the external repository. To create a connection, see [Connect with external code repositories](setting-up-new.md#setting-up-omics-repository).
**Note**  
Customers in the Israel (Tel Aviv) Region (il-central-1) must create the connection in the US East (N. Virginia) Region (us-east-1) to create a workflow. 

         1. In **Full repository ID**, enter your repository ID as `user-name/repo-name`. Verify that you have access to the files in this repository.

         1. In **Source reference** (optional), enter a repository source reference (branch, tag, or commit ID). HealthOmics uses the default branch if no source reference is specified.

         1. In **Exclude file patterns**, enter the file patterns to exclude specific folders, files, or extensions. This helps manage data size when importing repository files. There is a maximum of 50 patterns, and the patterns must follow the [glob pattern syntax](https://fossil-scm.org/home/doc/tip/www/globs.md). For example: 

            1. `tests/`

            1. `*.jpeg`

            1. `large_data.zip`

      1. For **Select definition folder from S3**:

         1. Enter the Amazon S3 location that contains the zipped workflow definition folder. The Amazon S3 bucket must be in the same region as the workflow.

         1. If your account doesn't own the Amazon S3 bucket, enter the bucket owner's AWS account ID in the **S3 bucket owner's account ID**. This information is required so that HealthOmics can verify the bucket ownership.

      1. For **Select definition folder from a local source**:

         1. Enter the local drive location of the zipped workflow definition folder.

   1. **Main workflow definition file path** (optional): Enter the file path from the zipped workflow definition folder or repository to the `main` file. This parameter is not required if there is only one file in the workflow definition folder, or if the main file is named "main".

1. In the **README file** (optional) panel, select the **Source of the README file** and provide the following information:
   + For **Import from a repository service**, in **README file path**, enter the path to the README file within the repository.
   + For **Select file from S3**, in **README file in S3**, enter the Amazon S3 URI for the README file.
   + For **Select file from a local source**: in **README - optional**, choose **Choose file** to select the markdown (.md) file to upload.

1. In the **Default run storage configuration** panel, provide the default run storage type and capacity for runs that use this workflow:

   1. **Run storage type**: Choose whether to use static or dynamic storage as the default for the temporary run storage. The default is static storage.

   1. **Run storage capacity** (optional): For static run storage type, you can enter the default amount of run storage required for this workflow. The default value for this parameter is 1200 GiB. You can override these default values when you start a run.

1. **Tags** (optional): You can associate up to 50 tags with this workflow.

1. Choose **Next**.

1. On the **Add workflow parameters** (optional) page, select the **Parameter source**:

   1. For **Parse from workflow definition file**, HealthOmics will automatically parse the workflow parameters from the workflow definition file.

   1. For **Provide parameter template from Git repository**, use the path to the parameter template file from your repository.

   1. For **Select JSON file from local source**, upload a JSON file from a local source that specifies the parameters.

   1. For **Manually enter workflow parameters**, manually enter parameter names and descriptions.

1. In the **Parameter preview** panel, you can review or change the parameters for this workflow version. If you restore the JSON file, you lose any local changes that you made.

1. Choose **Next**.

1. On the **Container URI remapping** page, in the **Mapping rules** panel, you can define URI mapping rules for your workflow.

   For **Source of mapping file**, select one of the following options:
   + **None** – No mapping rules required.
   + **Select JSON file from S3** – Specify the S3 location for the mapping file. 
   + **Select JSON file from a local source** – Specify the mapping file location on your local device.
   + **Manually enter mappings** – Enter the registry mappings and image mappings in the **Mappings** panel.

1.  The console displays the **Mappings** panel. If you chose a mapping source file, the console displays the values from the file.

   1. In **Registry mappings**, you can edit the mappings or add mappings (maximum of 20 registry mappings).

      Each registry mapping contains the following fields:
      + **Upstream registry URL** – The URI of the upstream registry.
      + **ECR repository prefix** – The repository prefix to use in the Amazon ECR private repository.
      + (Optional) **Upstream repository prefix** – The prefix of the repository in the upstream registry.
      + (Optional) **ECR account ID** – Account ID of the account that owns the upstream container image.

   1. In **Image mappings**, you can edit the image mappings or add mappings (maximum of 100 image mappings).

      Each image mapping contains the following fields:
      + **Source image** – Specifies the URI of the source image in the upstream registry.
      + **Destination image** – Specifies the URI of the corresponding image in the private Amazon ECR registry.

1. Choose **Next**.

1. Review the workflow configuration, then choose **Create workflow**.

## Creating a workflow using the CLI
<a name="api-create-workflows"></a>

If your workflow files and the parameter template file are on your local machine, you can create a workflow using the following CLI command. 

```
aws omics create-workflow  \
  --name "my_workflow"   \
  --definition-zip fileb://my-definition.zip \
  --parameter-template file://my-parameter-template.json
```

The `create-workflow` operation returns the following response:

```
{
  "arn": "arn:aws:omics:us-west-2:....",
  "id": "1234567",
  "status": "CREATING",
  "tags": {
      "resourceArn": "arn:aws:omics:us-west-2:...."
  },
  "uuid": "64c9a39e-8302-cc45-0262-2ea7116d854f"
}
```
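
Because the workflow starts in the `CREATING` state, a caller typically polls until creation finishes. The following sketch shows the polling loop with an injected status function instead of a live `get_workflow` call; the timing values are arbitrary:

```python
import time

# Sketch of waiting for a new workflow to leave the CREATING state. The status
# function is injected so the loop runs without a live AWS call; in practice
# it would wrap omics.get_workflow(id=...)["status"] from boto3.
def wait_for_workflow(get_status, poll_seconds=30, max_polls=60):
    for _ in range(max_polls):
        status = get_status()
        if status in ("ACTIVE", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("workflow did not leave CREATING in time")

statuses = iter(["CREATING", "CREATING", "ACTIVE"])
print(wait_for_workflow(lambda: next(statuses), poll_seconds=0))  # -> ACTIVE
```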

### Optional parameters to use when creating a workflow
<a name="other-create-parameters"></a>

You can specify any of the optional parameters when you create a workflow. For syntax details, see [CreateWorkflow](https://docs.aws.amazon.com/omics/latest/api/API_CreateWorkflow.html) in the AWS HealthOmics API Reference. 

**Topics**
+ [Specify the workflow definition Amazon S3 location](#create-defn-uri-parameter)
+ [Use the workflow definition from a Git-based repository](#create-defn-uri-git)
+ [Specify a Readme file](#specify-readme-file)
+ [Specify the **main** definition file](#create-main-parameter)
+ [Specify the run storage type](#create-run-storage-parameter)
+ [Specify the GPU configuration](#create-accelerator-parameter)
+ [Configure pull through cache mapping parameters](#create-prefix-mapping-parameters)

#### Specify the workflow definition Amazon S3 location
<a name="create-defn-uri-parameter"></a>

If your workflow definition file is located in an Amazon S3 folder, specify the location using the `definition-uri` parameter, as shown in the following example. If your account does not own the Amazon S3 bucket, provide the owner's AWS account ID.

```
aws omics create-workflow  \
  --name Test  \
  --definition-uri s3://omics-bucket/workflow-definition/  \
  --owner-id  123456789012 
    ...
```

#### Use the workflow definition from a Git-based repository
<a name="create-defn-uri-git"></a>

To use the workflow definition from a supported Git-based repository, use the `definition-repository` parameter in your request. Don't provide any other definition source parameter, because the request fails if it includes more than one input source.

The `definition-repository` parameter contains the following fields:
+ **connectionArn** – ARN of the Code Connection that connects your AWS resources to the external repository.
+ **fullRepositoryId** – Enter the repository ID as `owner-name/repo-name`. Verify you have access to the files in this repository.
+ **sourceReference** (Optional) – Enter a repository reference type (BRANCH, TAG, or COMMIT) and a value.

  HealthOmics uses the latest commit on the default branch if you don't specify a source reference.
+ **excludeFilePatterns** (Optional) – Enter the file patterns to exclude specific folders, files, or extensions. This helps manage data size when importing repository files. Provide a maximum of 50 patterns. The patterns must follow the [glob pattern syntax](https://fossil-scm.org/home/doc/tip/www/globs.md). For example:
  + `tests/`
  + `*.jpeg`
  + `large_data.zip`
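
As a rough approximation of how exclude patterns filter the imported files (this uses Python's `fnmatch`, which can differ from the service's glob matcher in edge cases, and `tests/*` stands in for the `tests/` directory shorthand):

```python
from fnmatch import fnmatch

# Illustrative only: approximate exclude-pattern matching. The service follows
# the glob syntax linked above; this is a local approximation.
EXCLUDE_PATTERNS = ["tests/*", "*.jpeg", "large_data.zip"]

def is_excluded(path: str) -> bool:
    return any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)

files = ["main.wdl", "tests/unit.wdl", "img/cover.jpeg", "large_data.zip"]
print([f for f in files if not is_excluded(f)])  # -> ['main.wdl']
```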

When you specify the workflow definition from a Git-based repository, use `parameter-template-path` to specify the parameter template file. If you don’t provide this parameter, HealthOmics creates the workflow without a parameter template.

The following example shows the parameters related to content from a Git-based private repository: 

```
aws omics create-workflow \
    --name custom-variant \
    --description "Custom variant calling pipeline" \
    --engine "WDL" \
    --definition-repository '{
            "connectionArn": "arn:aws:codeconnections:us-east-1:123456789012:connection/abcd1234-5678-90ab-cdef-1234567890ab",
            "fullRepositoryId": "myorg/my-genomics-workflows",
            "sourceReference": {
                "type": "BRANCH",            
                "value": "main"        
            },        
            "excludeFilePatterns": ["tests/**", "*.log"]   
      }' \
    --main "workflows/variant-calling/main.wdl" \
    --parameter-template-path "parameters/variant-calling-params.json" \
    --readme-path "docs/variant-calling-README.md" \
    --storage-type "DYNAMIC"
```

For more examples, see the blog post [How To Create an AWS HealthOmics Workflows from Content in Git](https://repost.aws/articles/ARCEN7AjhaRSmteczRoc_QsA/how-to-create-an-aws-healthomics-workflows-from-content-in-git).

#### Specify a Readme file
<a name="specify-readme-file"></a>

You can specify the README file location using one of the following parameters:
+ **readme-markdown** – String input or a file on your local machine. 
+ **readme-uri** – The URI of a file stored on S3. 
+ **readme-path** – The path to the README file in the repository. 

Use `readme-path` only in conjunction with `definition-repository`. If you don't specify any README parameter, HealthOmics imports the root-level README.md file in the repository (if it exists).

The following examples show how to specify the README file location using `readme-path` and `readme-uri`.

```
# Using README from repository
aws omics create-workflow \
    --name "documented-workflow" \
    --definition-repository '...' \
    --readme-path "docs/workflow-guide.md"

# Using README from S3
aws omics create-workflow \
    --name "s3-readme-workflow" \
    --definition-repository '...' \
    --readme-uri "s3://my-bucket/workflow-docs/readme.md"
```

For more information, see [HealthOmics Workflow README files](workflows-readme.md).

#### Specify the **main** definition file
<a name="create-main-parameter"></a>

If you are including multiple workflow definition files, use the `main` parameter to specify the main definition file for your workflow.

```
aws omics create-workflow  \
  --name Test  \
  --main multi_workflow/workflow2.wdl  \
    ...
```

#### Specify the run storage type
<a name="create-run-storage-parameter"></a>

You can specify the default run storage type (DYNAMIC or STATIC) and run storage capacity (required for static storage). For more information about run storage types, see [Run storage types in HealthOmics workflows](workflows-run-types.md).

```
aws omics create-workflow  \
  --name my_workflow   \
  --definition-zip fileb://my-definition.zip \
  --parameter-template file://my-parameter-template.json   \
  --storage-type 'STATIC'  \
  --storage-capacity 1200  \
```

#### Specify the GPU configuration
<a name="create-accelerator-parameter"></a>

Use the `accelerators` parameter to create a workflow that runs on an accelerated-compute instance, as shown in the following example. You specify the GPU configuration in the workflow definition. For more information, see [Accelerated-computing instances](memory-and-compute-tasks.md#workflow-task-accelerated-computing-instances).

```
aws omics create-workflow --name "workflow name" \
   --definition-uri s3://amzn-s3-demo-bucket1/GPUWorkflow.zip \
   --accelerators GPU
```

#### Configure pull through cache mapping parameters
<a name="create-prefix-mapping-parameters"></a>

If you're using the Amazon ECR pull through cache mapping feature, you can override the default mappings. For more information about the container setup parameters, see [Container images for private workflows](workflows-ecr.md).

In the following example, file `mappings.json` contains this content:

```
{
    "registryMappings": [
        {
            "upstreamRegistryUrl": "registry-1.docker.io",
            "ecrRepositoryPrefix": "docker-hub"
        },
        {
            "upstreamRegistryUrl": "quay.io",
            "ecrRepositoryPrefix": "quay",
            "accountId": "123412341234"
        },
        {

            "upstreamRegistryUrl": "public.ecr.aws",
            "ecrRepositoryPrefix": "ecr-public"
        }
    ],
    "imageMappings": [{
            "sourceImage": "docker.io/library/ubuntu:latest",
            "destinationImage": "healthomics-docker-2/custom/ubuntu:latest",
            "accountId": "123412341234"
        },
        {
            "sourceImage": "nvcr.io/nvidia/k8s/dcgm-exporter",
            "destinationImage": "healthomics-nvidia/k8s/dcgm-exporter"
        }
    ]
}
```
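
Conceptually, each entry in `imageMappings` rewrites one source reference to its private destination. The following sketch illustrates that relationship only; HealthOmics performs the actual substitution:

```python
# Illustrative only: apply an imageMappings list like the one in mappings.json
# to rewrite a container reference. The mapping values are copied from the
# example file above.
image_mappings = [
    {"sourceImage": "docker.io/library/ubuntu:latest",
     "destinationImage": "healthomics-docker-2/custom/ubuntu:latest"},
    {"sourceImage": "nvcr.io/nvidia/k8s/dcgm-exporter",
     "destinationImage": "healthomics-nvidia/k8s/dcgm-exporter"},
]

def map_image(uri: str) -> str:
    for mapping in image_mappings:
        if mapping["sourceImage"] == uri:
            return mapping["destinationImage"]
    return uri  # unmapped images are pulled as written

print(map_image("docker.io/library/ubuntu:latest"))  # -> healthomics-docker-2/custom/ubuntu:latest
```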

Specify the mapping parameters in the create-workflow command:

```
aws omics create-workflow  \
     ...
--container-registry-map-file file://mappings.json
    ...
```

You can also specify the S3 location of the mapping parameters file:

```
aws omics create-workflow  \
     ...
--container-registry-map-uri s3://amzn-s3-demo-bucket1/test.zip
    ...
```

## Creating a workflow using an SDK
<a name="sdk-create-workflows"></a>

You can create a workflow using one of the SDKs. The following example shows how to create a workflow using the Python SDK.

```
import boto3

omics = boto3.client('omics')

with open('definition.zip', 'rb') as f:
   definition = f.read()

response = omics.create_workflow(
   name='my_workflow',
   definitionZip=definition,
   parameterTemplate={ ... }
)
```
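
The `parameterTemplate` value is elided in the example above. As a sketch, a template maps each run parameter name to a description and an `optional` flag; the parameter names below are hypothetical:

```python
# Hypothetical parameterTemplate value for create_workflow: each key is a run
# parameter name mapped to a description and whether the parameter is optional.
parameter_template = {
    "input_fasta": {"description": "Reference genome FASTA", "optional": False},
    "threads": {"description": "Worker thread count", "optional": True},
}

print(sorted(parameter_template))  # -> ['input_fasta', 'threads']
```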

# Update a private workflow
<a name="update-private-workflow"></a>

You can update a workflow using the HealthOmics console, AWS CLI commands, or one of the AWS SDKs.

**Note**  
Don’t include any personally identifiable information (PII) in workflow names. These names are visible in CloudWatch logs.

**Topics**
+ [Updating a workflow using the console](#console-update-workflows)
+ [Updating a workflow using the CLI](#api-update-workflows)
+ [Updating a workflow using an SDK](#sdk-update-workflows)

## Updating a workflow using the console
<a name="console-update-workflows"></a>

**Steps to update a workflow**

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Private workflows**.

1. On the **Private workflows** page, choose the workflow to update.

1. On the **Workflow** page:
   + If the workflow has versions, make sure that you select the **Default version**.
   + Choose **Edit selected** from the **Actions** list. 

1. On the **Edit workflow** page, you can change any of the following values:
   + **Workflow name**.
   + **Workflow description**.
   + The default **Run storage type** for the workflow.
   + The default **Run storage capacity** (if the run storage type is static storage). For more information about the default run storage configuration, see [Creating a workflow using the console](create-private-workflow.md#console-create-workflows).

1. Choose **Save changes** to apply the changes.

## Updating a workflow using the CLI
<a name="api-update-workflows"></a>

As shown in the following example, you can update the workflow name and description. You can also change the default run storage type (STATIC or DYNAMIC) and run storage capacity (for static storage type). For more information about run storage types, see [Run storage types in HealthOmics workflows](workflows-run-types.md).

```
aws omics update-workflow    \
  --id 1234567    \
  --name my_workflow      \
  --description "updated workflow"    \
  --storage-type 'STATIC'    \
  --storage-capacity 1200
```

You don't receive a response to the `update-workflow` request.

## Updating a workflow using an SDK
<a name="sdk-update-workflows"></a>

You can update a workflow using one of the SDKs.

The following example shows how to update a workflow using the Python SDK.

```
import boto3

omics = boto3.client('omics')

response = omics.update_workflow(
   id='1234567',
   name='my_workflow',
   description='updated workflow'
)
```

# Delete a private workflow
<a name="delete-private-workflow"></a>

When you no longer need a workflow, you can delete it using the HealthOmics console, AWS CLI commands, or one of the AWS SDKs. You can delete a workflow that meets the following criteria:
+ Its status is ACTIVE or FAILED.
+ It has no active shares. 
+ You've deleted all the workflow versions.

Deleting a workflow doesn't affect any ongoing runs that are using the workflow.

**Topics**
+ [Deleting a workflow using the console](#console-delete-workflows)
+ [Deleting a workflow using the CLI](#api-delete-workflows)
+ [Deleting a workflow using an SDK](#sdk-delete-workflows)

## Deleting a workflow using the console
<a name="console-delete-workflows"></a>

**To delete a workflow**

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1.  If required, open the left navigation pane (≡). Choose **Private workflows**.

1. On the **Private workflows** page, choose the workflow to delete.

1. On the **Workflow** page, choose **Delete selected** from the **Actions** list.

1. In the **Delete workflow** modal, enter "confirm" to confirm the deletion.

1. Choose **Delete**.

## Deleting a workflow using the CLI
<a name="api-delete-workflows"></a>

The following example shows how to use the AWS CLI to delete a workflow. To run the example, replace `workflow-id` with the ID of the workflow that you want to delete. 

```
aws omics delete-workflow \
  --id workflow-id
```

HealthOmics doesn't send a response to the `delete-workflow` request. 

## Deleting a workflow using an SDK
<a name="sdk-delete-workflows"></a>

You can delete a workflow using one of the SDKs.

The following example shows how to delete a workflow using the Python SDK.

```
import boto3

omics = boto3.client('omics')

response = omics.delete_workflow(
   id='1234567'
)
```

# Verify the workflow status
<a name="using-get-workflow"></a>

After you create your workflow, you can verify the status and view other details of the workflow using **get-workflow**, as shown.

```
aws omics get-workflow --id 1234567 
```

The response includes workflow details, including the status, as shown.

```
{
    "arn": "arn:aws:omics:us-west-2:....",
    "creationTime": "2022-07-06T00:27:05.542459",
    "id": "1234567",
    "engine": "WDL",
    "status": "ACTIVE",
    "type": "PRIVATE",
    "main": "workflow-crambam.wdl",
    "name": "workflow_name",
    "storageType": "STATIC",
    "storageCapacity": "1200",
    "uuid": "64c9a39e-8302-cc45-0262-2ea7116d854f"
}
```

You can start a run using this workflow after the status transitions to `ACTIVE`.

# Referencing genome files from a workflow definition
<a name="create-ref-files"></a>

A HealthOmics reference store object can be referred to with a URI like the following. Use your own `account ID`, `reference store ID`, and `reference ID` where indicated.

```
omics://account ID.storage.us-west-2.amazonaws.com/reference store id/reference/id
```

Some workflows require both the `SOURCE` and `INDEX` files for the reference genome. The previous URI is the default short form and resolves to the SOURCE file. To specify either file explicitly, use the long URI form, as follows.

```
omics://account ID.storage.us-west-2.amazonaws.com/reference store id/reference/id/source
omics://account ID.storage.us-west-2.amazonaws.com/reference store id/reference/id/index
```

Using a sequence read set follows a similar pattern, as shown.

```
aws omics create-workflow \
     --name "workflow name" \
     --main "sample workflow.wdl" \
     --definition-uri omics://account ID.storage.us-west-2.amazonaws.com/sequence_store_id/readSet/id \
     --parameter-template file://parameters_sample_description.json
```

Some read sets, such as those based on FASTQ, can contain paired reads. In the following examples, they're referred to as SOURCE1 and SOURCE2. Formats such as BAM and CRAM have only a SOURCE1 file. Some read sets contain INDEX files, such as `bai` or `crai` files. The preceding URI is the default short form and resolves to the SOURCE1 file. To specify the exact file or index, use the long URI form, as follows.

```
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/source1
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/source2
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/index
```

The following is an example of an input JSON file that uses two Omics Storage URIs.

```
{
   "input_fasta": "omics://123456789012.storage.us-west-2.amazonaws.com/<reference_store_id>/reference/<id>",
   "input_cram": "omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>"
}
```
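
A small sketch of how these URIs are composed (the helper, IDs, and Region are placeholders, not part of any AWS API):

```python
# Hypothetical helper for composing Omics Storage URIs like those in the input
# JSON above. Account ID, Region, store ID, and read set ID are placeholders.
def read_set_uri(account_id, region, store_id, read_set_id, part=""):
    base = (f"omics://{account_id}.storage.{region}.amazonaws.com/"
            f"{store_id}/readSet/{read_set_id}")
    # part selects the long form: "source1", "source2", or "index"
    return f"{base}/{part}" if part else base

print(read_set_uri("123456789012", "us-west-2", "1234567890", "9876543210", "source1"))
```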

Reference the input JSON file in the AWS CLI by adding `--inputs file://<input_file.json>` to your **start-run** request. 