

# Workflow definition files in HealthOmics
<a name="workflow-definition-files"></a>

You use a workflow definition to specify information about the workflow, runs, and the tasks in the runs. You create workflow definitions in one or more files using a workflow definition language. HealthOmics supports workflow definitions written in WDL, Nextflow, or CWL. 

HealthOmics supports the following choices for WDL workflow definitions: 
+ WDL – Provides a spec-conformant WDL engine. 
+ WDL lenient – Designed to handle workflows migrated from Cromwell. It supports custom Cromwell directives and some non-conformant logic. For details, see [Implicit type conversion in WDL lenient](workflow-languages-wdl.md#workflow-wdl-type-conversion).

For information about each of the workflow languages, see the language-specific detailed sections below.

You specify the following types of information in the workflow definition:
+ **Language version** – The language and version of the workflow definition.
+ **Compute and memory** – The compute and memory requirements for tasks in the workflow.
+ **Inputs** – Location of the inputs to the workflow tasks. For more information, see [HealthOmics run inputs](workflows-run-inputs.md).
+ **Outputs** – Location to save the outputs that the tasks generate.
+ **Task resources** – Compute and memory requirements for each task.
+ **Accelerators** – Other resources that the tasks require, such as GPU accelerators.

**Topics**
+ [HealthOmics workflow definition requirements](workflow-defn-requirements.md)
+ [Version support for HealthOmics workflow definition languages](workflows-lang-versions.md)
+ [Compute and memory requirements for HealthOmics tasks](memory-and-compute-tasks.md)
+ [Task outputs in a HealthOmics workflow definition](workflows-task-outputs.md)
+ [Task resources in a HealthOmics workflow definition](task-resources.md)
+ [Task accelerators in a HealthOmics workflow definition](task-accelerators.md)
+ [WDL workflow definition specifics](workflow-languages-wdl.md)
+ [Nextflow workflow definition specifics](workflow-definition-nextflow.md)
+ [CWL workflow definition specifics](workflow-languages-cwl.md)
+ [Example workflow definitions](workflow-definition-examples.md)

# HealthOmics workflow definition requirements
<a name="workflow-defn-requirements"></a>

The HealthOmics workflow definition files must meet the following requirements:
+ Tasks must define input/output parameters, Amazon ECR container repositories, and runtime specifications such as memory or CPU allocation.
+ Verify that your IAM roles have the required permissions:
  + Your workflow has access to input data from AWS resources, such as Amazon S3.
  + Your workflow has access to external repository services when needed.
+ Declare the output files in the workflow definition. To copy intermediate run files to the output location, declare them as workflow outputs. 
+ The input and output locations must be in the same Region as the workflow. 
+ HealthOmics storage workflow inputs must be in `ACTIVE` status. HealthOmics doesn't import inputs that have `ARCHIVED` status, so the workflow fails. For information about Amazon S3 object inputs, see [HealthOmics run inputs](workflows-run-inputs.md).
+ A **main** entry location for the workflow is optional if your ZIP archive contains either a single workflow definition file or a file named `main`.
  + Example path: `workflow-definition/main-file.wdl`
+ Before you create a workflow from Amazon S3 or your local drive, create a zip archive of the workflow definition files and any dependencies, such as subworkflows.
+ We recommend that you declare Amazon ECR containers in the workflow as input parameters for validation of the Amazon ECR permissions. 
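The bundling step in the requirements above can be sketched with Python's `zipfile` module. This is a minimal illustration, assuming hypothetical file names such as `main.wdl` and `tasks/align.wdl`; any archiving tool that preserves relative paths works equally well.

```python
import zipfile
from pathlib import Path

def bundle_workflow(source_dir: str, archive_path: str) -> list:
    """Zip all workflow definition files under source_dir into a
    ZIP archive suitable for creating a workflow, returning the
    archived file names (relative POSIX paths)."""
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the source directory, for example
                # "main.wdl" or "tasks/align.wdl" (hypothetical names).
                arcname = path.relative_to(source).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```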

Additional Nextflow considerations:
+ **/bin**

  Nextflow workflow definitions can include a `/bin` folder with executable scripts. Tasks have read and execute access to this path. Tasks that rely on these scripts should use a container built with the appropriate script interpreters. As a best practice, call the interpreter directly. For example:

  ```
  process my_bin_task {
     ...
     script:
        """
        python3 my_python_script.py
        """
  }
  ```
+ **includeConfig**

  Nextflow-based workflow definitions can include nextflow.config files that help to abstract parameter definitions or process resource profiles. To support development and execution of Nextflow pipelines on multiple environments, use a HealthOmics-specific configuration that you add to the global config using the includeConfig directive. To maintain portability, configure the workflow to include the file only when running on HealthOmics by using the following code:

  ```
  // at the end of the nextflow.config file
  if ("$AWS_WORKFLOW_RUN") {
      includeConfig 'conf/omics.config'
  }
  ```
+ **Reports**

  HealthOmics doesn't support the engine-generated DAG, trace, and execution reports. You can generate alternatives to the trace and execution reports by using a combination of GetRun and GetRunTask API calls.
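A trace-style summary like the one described above can be assembled from the API responses. The following sketch is a plain formatting function over response dictionaries; the field names (`taskId`, `name`, `status`, `cpus`, `memory`) are assumptions to verify against the HealthOmics API reference, and in practice the dictionaries would come from GetRun and ListRunTasks/GetRunTask calls.

```python
def trace_report(run: dict, tasks: list) -> str:
    """Build a tab-separated, trace-style report from API response
    dicts. Field names are assumed from the HealthOmics API shape;
    missing fields render as "-"."""
    header = "taskId\tname\tstatus\tcpus\tmemory"
    rows = [
        "{taskId}\t{name}\t{status}\t{cpus}\t{memory}".format(
            taskId=t.get("taskId", "-"),
            name=t.get("name", "-"),
            status=t.get("status", "-"),
            cpus=t.get("cpus", "-"),
            memory=t.get("memory", "-"),
        )
        for t in tasks
    ]
    title = "# run {id} ({status})".format(
        id=run.get("id", "-"), status=run.get("status", "-")
    )
    return "\n".join([title, header] + rows)
```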

Additional CWL considerations:
+ **Container image URI interpolation**

  HealthOmics allows the `dockerPull` property of the `DockerRequirement` to be an inline JavaScript expression. For example:

  ```
  requirements:
    DockerRequirement:
      dockerPull: "$(inputs.container_image)"
  ```

  This allows you to specify container image URIs as input parameters to the workflow.
+ **JavaScript expressions**

  JavaScript expressions must be `strict mode` compliant.
+ **Operation process**

  HealthOmics doesn't support CWL Operation processes.

# Version support for HealthOmics workflow definition languages
<a name="workflows-lang-versions"></a>

HealthOmics supports workflow definition files written in Nextflow, WDL, or CWL. The following sections provide information about HealthOmics version support for these languages.

**Topics**
+ [WDL version support](#workflows-lang-versions-WDL)
+ [CWL version support](#workflows-lang-versions-CWL)
+ [Nextflow version support](#workflows-lang-versions-nextflow)

## WDL version support
<a name="workflows-lang-versions-WDL"></a>

HealthOmics supports versions 1.0, 1.1, and the development version of the WDL specification.

Every WDL document must include a version statement to specify which version (major and minor) of the specification it adheres to. For more information about versions, see [WDL versioning](https://github.com/openwdl/wdl/blob/wdl-1.1/SPEC.md#versioning).

Versions 1.0 and 1.1 of the WDL specification do not support the `Directory` type. To use the `Directory` type for inputs or outputs, set the version to **development** in the first line of the file:

```
version development  # first line of .wdl file
     ... remainder of the file ...
```

## CWL version support
<a name="workflows-lang-versions-CWL"></a>

HealthOmics supports versions 1.0, 1.1, and 1.2 of the CWL language.

You can specify the language version in the CWL workflow definition file. For more information about CWL, see the [CWL user guide](https://github.com/common-workflow-language/user_guide).

## Nextflow version support
<a name="workflows-lang-versions-nextflow"></a>

HealthOmics supports four Nextflow stable versions. Nextflow typically releases a stable version every six months. HealthOmics doesn't support the monthly “edge” releases.

HealthOmics supports released features in each version, but not preview features.

### Supported versions
<a name="workflows-versions-nextflow-list"></a>

HealthOmics supports the following Nextflow versions:
+ Nextflow v22.04.01 DSL 1 and DSL 2
+ Nextflow v23.10.0 DSL 2 (default)
+ Nextflow v24.10.8 DSL 2
+ Nextflow v25.10.0 DSL 2

**Note**  
HealthOmics does not support strict syntax mode in Nextflow v25.10.0.

To migrate your workflow to the latest supported version (v25.10.0), follow the [Nextflow upgrade guide](https://www.nextflow.io/docs/latest/migrations/25-10.html).

There are some breaking changes when migrating to Nextflow v24 and v25. Follow the [Nextflow migration guide](https://www.nextflow.io/docs/latest/migrations/index.html).

### Detect and process Nextflow versions
<a name="workflows-versions-processing"></a>

HealthOmics detects the DSL version and Nextflow version that you specify. It automatically determines the best Nextflow version to run based on these inputs.

#### DSL version
<a name="workflows-versions-p1"></a>

HealthOmics detects the requested DSL version in your workflow definition file. For example, you can specify: `nextflow.enable.dsl=2`.

HealthOmics supports DSL 2 by default. It provides backwards compatibility with DSL 1, if specified in your workflow definition file.
+ If you specify DSL 1, HealthOmics runs Nextflow v22.04 DSL 1 (the only supported version that runs DSL 1).
+ If you don't specify a DSL version, or if HealthOmics can’t parse the DSL information for any reason (such as syntax errors in your workflow definition file), HealthOmics defaults to DSL 2 and runs Nextflow v23.10.0.
+ To upgrade your workflow from DSL 1 to DSL 2 to take advantage of the latest Nextflow versions and software features, see [Migrating from DSL 1](https://nextflow.io/docs/latest/dsl1.html).
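The defaulting behavior above can be illustrated with a short sketch. This is not the service's actual parser; it only mirrors the documented rule that an explicit `nextflow.enable.dsl` setting is honored and that anything absent or unparseable falls back to DSL 2.

```python
import re

def detect_dsl_version(definition_text: str) -> int:
    """Return the DSL version requested in a Nextflow definition,
    defaulting to 2 when the setting is absent or unparseable
    (illustration of the documented behavior only)."""
    match = re.search(r"nextflow\.enable\.dsl\s*=\s*(\d+)", definition_text)
    if match and match.group(1) in ("1", "2"):
        return int(match.group(1))
    return 2  # documented default
```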

#### Nextflow versions
<a name="workflows-versions-p2"></a>

HealthOmics detects the requested Nextflow version in the Nextflow configuration file (nextflow.config), if you provide this file. We recommend that you add the `nextflowVersion` clause at the end of the file to avoid any unexpected overrides from included configs. For more information, see [Nextflow configuration](https://nextflow.io/docs/latest/config.html).

You can specify a Nextflow version or a range of versions using the following syntax:

```
   // exact match
   manifest.nextflowVersion = '1.2.3'   
            
   // 1.2 or later (excluding 2 and later)
   manifest.nextflowVersion = '1.2+'         
            
   // 1.2 or later
   manifest.nextflowVersion = '>=1.2'
            
   // any version in the range 1.2 to 1.5
   manifest.nextflowVersion = '>=1.2, <=1.5' 
            
   // use the "!" prefix to stop execution if the current version 
   // doesn't match the required version.
   manifest.nextflowVersion = '!>=1.2'
```

HealthOmics processes the Nextflow version information as follows: 
+ If you use **=** to specify an exact version that HealthOmics supports, HealthOmics uses that version. 
+ If you use the **\!** prefix to specify an exact version or a range of versions that HealthOmics doesn't support, HealthOmics raises an exception and fails the run. Consider using this option if you want to be strict with version requests and fail quickly when the request includes unsupported versions.
+ If you specify a range of versions, HealthOmics uses the highest-preference version in that range. The preference order from highest to lowest is v23.10.0, v22.04.0, v24.10.8, and v25.10.0. For example:
  + If the range covers v23.10.0, v24.10.8, and v25.10.0, HealthOmics chooses v23.10.0.
  + If the range covers v24.10.8 and v25.10.0, HealthOmics chooses v24.10.8.
+ If there is no requested version, or if the requested versions aren't valid or can’t be parsed for any reason:
  + If you specified DSL 1, HealthOmics runs Nextflow v22.04.
  + Otherwise, HealthOmics runs Nextflow v23.10.0.
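The preference-order rule above can be sketched as follows. This is an illustration of the documented selection logic, not the service's implementation; `candidates` stands for the supported versions that satisfy a requested range.

```python
# Preference order from the rules above, highest preference first.
PREFERENCE = ["23.10.0", "22.04.0", "24.10.8", "25.10.0"]

def choose_engine_version(candidates: set) -> str:
    """Pick the version HealthOmics would use from the supported
    versions that satisfy a requested range (illustration of the
    documented rules only)."""
    for version in PREFERENCE:
        if version in candidates:
            return version
    raise ValueError("no supported Nextflow version satisfies the request")
```

For example, a range covering v24.10.8 and v25.10.0 resolves to v24.10.8, matching the second bullet above.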

You can retrieve the following information about the Nextflow version that HealthOmics used for each run:
+ The run logs contain information about the actual Nextflow version that HealthOmics used for the run.
+ HealthOmics adds warnings in the run logs if there isn't a direct match with your requested version or if it needed to use a different version than you specified.
+ The response to the **GetRun** API operation includes a field (`engineVersion`) with the actual Nextflow version that HealthOmics used for the run. For example:

  ```
  "engineVersion":"22.04.0"
  ```

# Compute and memory requirements for HealthOmics tasks
<a name="memory-and-compute-tasks"></a>

HealthOmics runs your private workflow tasks in an omics instance. HealthOmics provides a variety of instance types to accommodate different types of tasks. Each instance type has a fixed memory and vCPU configuration (and fixed GPU configuration for accelerated computing instance types). The cost of using an omics instance varies depending on the instance type. For details, see the [HealthOmics Pricing](https://aws.amazon.com/healthomics/pricing/) page.

For tasks in a workflow, you specify the required memory and vCPUs in the workflow definition file. When a workflow task runs, HealthOmics allocates the smallest omics instance that accommodates the requested memory and vCPUs. For example, if a task needs 64 GiB of memory and 8 vCPUs, HealthOmics selects `omics.r.2xlarge`.
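The allocation rule above can be sketched over a subset of the instance tables in this section. This is an illustration, not the service's scheduler; the table here lists only a few sizes from the standard (m), compute-optimized (c), and memory-optimized (r) families.

```python
# A subset of the instance tables in this section: (name, vCPUs, memory GiB).
INSTANCES = [
    ("omics.c.large", 2, 4), ("omics.m.large", 2, 8), ("omics.r.large", 2, 16),
    ("omics.c.xlarge", 4, 8), ("omics.m.xlarge", 4, 16), ("omics.r.xlarge", 4, 32),
    ("omics.c.2xlarge", 8, 16), ("omics.m.2xlarge", 8, 32), ("omics.r.2xlarge", 8, 64),
    ("omics.c.4xlarge", 16, 32), ("omics.m.4xlarge", 16, 64), ("omics.r.4xlarge", 16, 128),
]

def smallest_instance(cpus: int, memory_gib: int) -> str:
    """Return the smallest listed instance that accommodates the
    request (illustration of the documented allocation rule)."""
    fits = [i for i in INSTANCES if i[1] >= cpus and i[2] >= memory_gib]
    if not fits:
        raise ValueError("request exceeds the listed instance types")
    # "Smallest" here means fewest vCPUs, then least memory.
    return min(fits, key=lambda i: (i[1], i[2]))[0]
```

With this subset, a request for 8 vCPUs and 64 GiB resolves to `omics.r.2xlarge`, matching the example above.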

We recommend that you review the instance types and set your requested vCPUs and memory size to match the instance that best meets your needs. The task container uses the number of vCPUs and the memory size that you specify in your workflow definition file, even if the instance type has additional vCPUs and memory. 

The following list contains additional information about vCPU and memory allocation:
+ Container resource allocations are hard limits. If a task runs out of memory or attempts to use additional vCPUs, the task generates an error log and exits.
+ If you don’t specify any compute or memory requirements, HealthOmics defaults to a request of 1 vCPU and 1 GiB of memory and selects **omics.c.large** (the smallest instance that accommodates this request).
+ The minimum configuration that you can request is 1 vCPU and 1 GiB of memory. 
+ If you specify vCPU, memory, or GPU requirements that exceed the largest supported instance type, HealthOmics returns an error and the workflow fails validation.
+ If you specify fractional units, HealthOmics rounds up to the nearest integer.
+ HealthOmics reserves a small amount of memory (5%) for management and logging agents, so the full memory allocation might not always be available to the application in the task.
+ HealthOmics matches instance types to fit the compute and memory requirements that you specify, and may use a mix of hardware generations. For this reason, there can be some minor variances in task run times for the same task.
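The request-handling rules above can be illustrated with a short sketch. The 5% reservation is the approximate figure stated above, so treat the usable-memory estimate as a planning aid rather than an exact value.

```python
import math

def normalize_request(cpus: float, memory_gib: float) -> dict:
    """Apply the documented request rules: fractional values round
    up, and the minimum request is 1 vCPU and 1 GiB (illustration
    of the documented behavior only)."""
    cpus = max(1, math.ceil(cpus))
    memory = max(1, math.ceil(memory_gib))
    # HealthOmics reserves about 5% of memory for management and
    # logging agents, so plan for slightly less inside the task.
    usable = memory * 0.95
    return {"cpus": cpus, "memory_gib": memory, "approx_usable_gib": usable}
```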

These topics provide details about the instance types that HealthOmics supports. 

**Topics**
+ [Standard instance types](#workflow-task-standard-instances)
+ [Compute-optimized instances](#workflow-task-compute-optimized-instances)
+ [Memory-optimized instances](#workflow-task-memory-optimized-instances)
+ [Accelerated-computing instances](#workflow-task-accelerated-computing-instances)

**Note**  
For standard, compute-optimized, and memory-optimized instances, increase the instance size if the task requires higher network throughput. Amazon EC2 instances with fewer than 16 vCPUs (size 4xl and smaller) can experience throughput bursting. For more information about Amazon EC2 instance throughput, see [Amazon EC2 available instance bandwidth](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html#available-instance-bandwidth).

## Standard instance types
<a name="workflow-task-standard-instances"></a>

For standard instance types, the configurations aim for a balance of compute power and memory. 

HealthOmics supports the 32xlarge and 48xlarge instances in these Regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.m.large | 2 | 8 GiB | 
| omics.m.xlarge | 4 | 16 GiB | 
| omics.m.2xlarge | 8 | 32 GiB | 
| omics.m.4xlarge | 16 | 64 GiB | 
| omics.m.8xlarge | 32 | 128 GiB | 
| omics.m.12xlarge | 48 | 192 GiB | 
| omics.m.16xlarge | 64 | 256 GiB | 
| omics.m.24xlarge | 96 | 384 GiB | 
| omics.m.32xlarge | 128 | 512 GiB | 
| omics.m.48xlarge | 192 | 768 GiB | 

## Compute-optimized instances
<a name="workflow-task-compute-optimized-instances"></a>

For compute-optimized instance types, the configurations have more compute power and less memory.

HealthOmics supports the 32xlarge and 48xlarge instances in these Regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.c.large | 2 | 4 GiB | 
| omics.c.xlarge | 4 | 8 GiB | 
| omics.c.2xlarge | 8 | 16 GiB | 
| omics.c.4xlarge | 16 | 32 GiB | 
| omics.c.8xlarge | 32 | 64 GiB | 
| omics.c.12xlarge | 48 | 96 GiB | 
| omics.c.16xlarge | 64 | 128 GiB | 
| omics.c.24xlarge | 96 | 192 GiB | 
| omics.c.32xlarge | 128 | 256 GiB | 
| omics.c.48xlarge | 192 | 384 GiB | 

## Memory-optimized instances
<a name="workflow-task-memory-optimized-instances"></a>

For memory-optimized instance types, the configurations have less compute power and more memory.

HealthOmics supports the 32xlarge and 48xlarge instances in these Regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | 
| --- | --- | --- | 
| omics.r.large | 2 | 16 GiB | 
| omics.r.xlarge | 4 | 32 GiB | 
| omics.r.2xlarge | 8 | 64 GiB | 
| omics.r.4xlarge | 16 | 128 GiB | 
| omics.r.8xlarge | 32 | 256 GiB | 
| omics.r.12xlarge | 48 | 384 GiB | 
| omics.r.16xlarge | 64 | 512 GiB | 
| omics.r.24xlarge | 96 | 768 GiB | 
| omics.r.32xlarge | 128 | 1024 GiB | 
| omics.r.48xlarge | 192 | 1536 GiB | 

## Accelerated-computing instances
<a name="workflow-task-accelerated-computing-instances"></a>

You can optionally specify GPU resources for each task in a workflow, so that HealthOmics allocates an accelerated-computing instance for the task. For information on how to specify the GPU information in the workflow definition file, see [Task accelerators in a HealthOmics workflow definition](task-accelerators.md).

If you specify a task accelerator that supports multiple instance types, HealthOmics selects the instance type based on availability. If more than one instance type is available, HealthOmics gives preference to the lower-cost instance. The exception is the nvidia-t4-a10g-l4 task accelerator, which gives preference to the latest generation instance available in your Region.

G4 instances aren't supported in the Israel (Tel Aviv) Region. G5 instances aren't supported in the Asia Pacific (Singapore) Region.



**Topics**
+ [G6 and G6e instance types](#workflow-task-accelerated-accelerated-g6)
+ [G4 and G5 instances](#workflow-task-accelerated-accelerated-g45)

### G6 and G6e instance types
<a name="workflow-task-accelerated-accelerated-g6"></a>

HealthOmics supports the following G6 accelerated-computing instance configurations. All omics.g6 instances use Nvidia L4 GPUs.

HealthOmics supports the G6 and G6e instances in these regions: US West (Oregon) and US East (N. Virginia).


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g6.xlarge | 4 | 16 GiB | 1 | 24 GiB | 
| omics.g6.2xlarge | 8 | 32 GiB | 1 | 24 GiB | 
| omics.g6.4xlarge | 16 | 64 GiB | 1 | 24 GiB | 
| omics.g6.8xlarge | 32 | 128 GiB | 1 | 24 GiB | 
| omics.g6.12xlarge | 48 | 192 GiB | 4 | 96 GiB | 
| omics.g6.16xlarge | 64 | 256 GiB | 1 | 24 GiB | 
| omics.g6.24xlarge | 96 | 384 GiB | 4 | 96 GiB | 

All omics.g6e instances use Nvidia L40s GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g6e.xlarge | 4 | 32 GiB | 1 | 48 GiB | 
| omics.g6e.2xlarge | 8 | 64 GiB | 1 | 48 GiB | 
| omics.g6e.4xlarge | 16 | 128 GiB | 1 | 48 GiB | 
| omics.g6e.8xlarge | 32 | 256 GiB | 1 | 48 GiB | 
| omics.g6e.12xlarge | 48 | 384 GiB | 4 | 192 GiB | 
| omics.g6e.16xlarge | 64 | 512 GiB | 1 | 48 GiB | 
| omics.g6e.24xlarge | 96 | 768 GiB | 4 | 192 GiB | 

### G4 and G5 instances
<a name="workflow-task-accelerated-accelerated-g45"></a>

HealthOmics supports the following G4 and G5 accelerated-computing instance configurations. 

All omics.g5 instances use Nvidia Tesla A10G GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g5.xlarge | 4 | 16 GiB | 1 | 24 GiB | 
| omics.g5.2xlarge | 8 | 32 GiB | 1 | 24 GiB | 
| omics.g5.4xlarge | 16 | 64 GiB | 1 | 24 GiB | 
| omics.g5.8xlarge | 32 | 128 GiB | 1 | 24 GiB | 
| omics.g5.12xlarge | 48 | 192 GiB | 4 | 96 GiB | 
| omics.g5.16xlarge | 64 | 256 GiB | 1 | 24 GiB | 
| omics.g5.24xlarge | 96 | 384 GiB | 4 | 96 GiB | 

All omics.g4dn instances use Nvidia Tesla T4 GPUs.


| Instance | Number of vCPUs | Memory | Number of GPUs | GPU memory | 
| --- | --- | --- | --- | --- | 
| omics.g4dn.xlarge | 4 | 16 GiB | 1 | 16 GiB | 
| omics.g4dn.2xlarge | 8 | 32 GiB | 1 | 16 GiB | 
| omics.g4dn.4xlarge | 16 | 64 GiB | 1 | 16 GiB | 
| omics.g4dn.8xlarge | 32 | 128 GiB | 1 | 16 GiB | 
| omics.g4dn.12xlarge | 48 | 192 GiB | 4 | 64 GiB | 
| omics.g4dn.16xlarge | 64 | 256 GiB | 1 | 16 GiB | 

# Task outputs in a HealthOmics workflow definition
<a name="workflows-task-outputs"></a>

You specify task outputs in the workflow definition. By default, HealthOmics discards all intermediate task files when the workflow completes. To export an intermediate file, you define it as an output. 

If you use call caching, HealthOmics saves task outputs to the cache, including any intermediate files that you define as outputs.

The following topics include task definition examples for each of the workflow definition languages.

**Topics**
+ [Task outputs for WDL](#workflow-task-outputs-wdl)
+ [Task outputs for Nextflow](#workflow-task-outputs-nextflow)
+ [Task outputs for CWL](#workflow-task-outputs-cwl)

## Task outputs for WDL
<a name="workflow-task-outputs-wdl"></a>

For workflow definitions written in WDL, define your outputs in the top level workflow **outputs** section. 

**Topics**
+ [Task output for STDOUT](#task-outputs-wdl-stdout)
+ [Task output for STDERR](#task-outputs-wdl-stderr)
+ [Task output to a file](#task-outputs-wdl-file)
+ [Task output to an array of files](#task-outputs-wdl-files)

### Task output for STDOUT
<a name="task-outputs-wdl-stdout"></a>

This example creates a task named `SayHello` that echoes the STDOUT content to the task output file. The WDL **stdout** function captures the STDOUT content (in this example, the input string **Hello, World!**) in the file **stdout_file**. 

Because HealthOmics creates logs for all STDOUT content, the output also appears in CloudWatch Logs, along with other STDERR logging information for the task.

```
version 1.0
 workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File stdout_file = SayHello.stdout_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        echo "~{message}" 
        echo "Current date: $(date)"
        echo "This message was printed to STDOUT"
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File stdout_file = stdout()
    }
}
```

### Task output for STDERR
<a name="task-outputs-wdl-stderr"></a>

This example creates a task named `SayHello` that echoes the STDERR content to the task output file. The WDL **stderr** function captures the STDERR content (in this example, the input string **Hello, World!**) in the file **stderr_file**. 

Because HealthOmics creates logs for all STDERR content, the output will appear in CloudWatch Logs, along with other STDERR logging information for the task.

```
version 1.0
 workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File stderr_file = SayHello.stderr_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        echo "~{message}" >&2
        echo "Current date: $(date)" >&2
        echo "This message was printed to STDERR" >&2
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File stderr_file = stderr()
    }
}
```

### Task output to a file
<a name="task-outputs-wdl-file"></a>

In this example, the SayHello task creates two files (message.txt and info.txt) and explicitly declares these files as the named outputs (**message_file** and **info_file**). 

```
version 1.0
workflow HelloWorld {
    input {
        String message = "Hello, World!"
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call SayHello {
        input:
            message = message,
            container = ubuntu_container
    }

    output {
        File message_file = SayHello.message_file
        File info_file = SayHello.info_file
    }
}

task SayHello {
    input {
        String message
        String container
    }

    command <<<
        # Create message file
        echo "~{message}" > message.txt
        
        # Create info file with date and additional information
        echo "Current date: $(date)" > info.txt
        echo "This message was saved to a file" >> info.txt
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        File message_file = "message.txt"
        File info_file = "info.txt"
    } 
}
```

### Task output to an array of files
<a name="task-outputs-wdl-files"></a>

In this example, the `GenerateGreetings` task generates an array of files as the task output. The task dynamically generates one greeting file for each member of the input array `names`. Because the file names are not known until runtime, the output definition uses the WDL glob() function to output all files that match the pattern `*_greeting.txt`. 

```
version 1.0
 workflow HelloArray {
    input {
        Array[String] names = ["World", "Friend", "Developer"]
        String ubuntu_container = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04"
    }

    call GenerateGreetings {
        input:
            names = names,
            container = ubuntu_container
    }

    output {
        Array[File] greeting_files = GenerateGreetings.greeting_files
    }
}

task GenerateGreetings {
    input {
        Array[String] names
        String container
    }

    command  <<<
        # Create a greeting file for each name
        for name in ~{sep=" " names}; do
            echo "Hello, $name!" > ${name}_greeting.txt
        done
    >>>

    runtime {
        docker: container
        cpu: 1
        memory: "2 GB"
    }

    output {
        Array[File] greeting_files = glob("*_greeting.txt")
    }       
 }
```

## Task outputs for Nextflow
<a name="workflow-task-outputs-nextflow"></a>

For workflow definitions written in Nextflow, define a **publishDir** directive to export task content to your output Amazon S3 bucket. Set the **publishDir** value to `/mnt/workflow/pubdir`. 

For HealthOmics to export files to Amazon S3, the files must be in this directory.

If a task produces a group of output files for use as inputs to a subsequent task, we recommend that you group these files in a directory and emit the directory as a task output. Enumerating each individual file can result in an I/O bottleneck in the underlying file system. For example:

```
process my_task {
    ...
    output:
    // recommended
    path "output-folder/", emit: output

    // not recommended
    // path "output-folder/**", emit: output
    ...
}
```

## Task outputs for CWL
<a name="workflow-task-outputs-cwl"></a>

For workflow definitions written in CWL, you can specify the task outputs using `CommandLineTool` tasks. The following sections show examples of `CommandLineTool` tasks that define different types of outputs.

**Topics**
+ [Task output for STDOUT](#task-outputs-cwl-stdout)
+ [Task output for STDERR](#task-outputs-cwl-stderr)
+ [Task output to a file](#task-outputs-cwl-file)
+ [Task output to an array of files](#task-outputs-cwl-files)

### Task output for STDOUT
<a name="task-outputs-cwl-stdout"></a>

This example creates a `CommandLineTool` task that echoes the STDOUT content to a text output file named **output.txt**. For example, if you provide the following input, the resulting task output is **Hello World!** in the **output.txt** file.

```
{
    "message": "Hello World!"
}
```

The `outputs` directive specifies that the output name is **example_out** and its type is `stdout`. A downstream task that consumes the output of this task refers to it as `example_out`.

Because HealthOmics creates logs for all STDERR and STDOUT content, the output also appears in CloudWatch Logs, along with other STDERR logging information for the task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  example_out:
    type: stdout

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output for STDERR
<a name="task-outputs-cwl-stderr"></a>

This example creates a `CommandLineTool` task that echoes the STDERR content to a text output file named **stderr.txt**. The task modifies the `baseCommand` so that `echo` writes to STDERR (instead of STDOUT).

The `outputs` directive specifies that the output name is **stderr_out** and its type is `stderr`. 

Because HealthOmics creates logs for all STDERR and STDOUT content, the output will appear in CloudWatch Logs, along with other STDERR logging information for the task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [bash, -c]
stderr: stderr.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
      shellQuote: true
      valueFrom: "echo $(self) >&2"
outputs:
  stderr_out:
    type: stderr

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output to a file
<a name="task-outputs-cwl-file"></a>

This example creates a `CommandLineTool` task that creates a compressed tar archive from the input files. You provide the name of the archive as an input parameter (**archive_name**). 

The **outputs** directive specifies that the `archive_file` output type is `File`, and it uses a reference to the input parameter `archive_name` to bind to the output file.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [tar, cfz]
inputs:
  archive_name:
    type: string
    inputBinding:
      position: 1
  input_files:
    type: File[]
    inputBinding:
      position: 2
      
outputs:
  archive_file:
    type: File
    outputBinding:
      glob: "$(inputs.archive_name)"

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

### Task output to an array of files
<a name="task-outputs-cwl-files"></a>

In this example, the `CommandLineTool` task creates an array of files using the `touch` command. The command uses the strings in the `files-to-create` input parameter to name the files. The task outputs an array that includes any files in the working directory that match the `glob` pattern. This example uses a wildcard pattern (`*`) that matches all files.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: touch
inputs:
  files-to-create:
    type:
      type: array
      items: string
    inputBinding:
      position: 1
outputs:
  output-files:
    type:
      type: array
      items: File
    outputBinding:
      glob: "*"

requirements:
    DockerRequirement:
        dockerPull: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/ubuntu:20.04
    ResourceRequirement:
        ramMin: 2048
        coresMin: 1
```

# Task resources in a HealthOmics workflow definition
<a name="task-resources"></a>

In the workflow definition, define the following for each task:
+ The container image for task. For more information, see [Container images for private workflows](workflows-ecr.md).
+ The number of CPUs and memory required for the task. For more information, see [Compute and memory requirements for HealthOmics tasks](memory-and-compute-tasks.md).

HealthOmics ignores any per-task storage specifications. HealthOmics provides run storage that all tasks in the run can access. For more information, see [Run storage types in HealthOmics workflows](workflows-run-types.md).

------
#### [ WDL ]

```
task my_task {
   runtime {
      container: "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
      cpu: 2
      memory: "4 GB"
   }
   ...
}
```

For a WDL workflow, HealthOmics attempts up to two retries for a task that fails because of service errors (API request returns a 5XX HTTP status code). For more information about task retries, see [Task Retries](monitoring-runs.md#run-status-task-retries).

You can opt out of the retry behavior by specifying the following configuration for the task in the WDL definition file:

```
runtime {
   preemptible: 0
}
```

------
#### [ Nextflow ]

```
process my_task {
   container "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
   cpus 2
   memory "4 GiB"
   ...
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: CommandLineTool
requirements:
    DockerRequirement:
        dockerPull: "<aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<image-name>"
    ResourceRequirement:
        coresMax: 2
        ramMax: 4000 # specified in mebibytes
```

------

# Task accelerators in a HealthOmics workflow definition
<a name="task-accelerators"></a>

In the workflow definition, you can optionally specify a GPU accelerator spec for a task. HealthOmics supports the following accelerator spec values, along with the instance types that support them:


| Accelerator spec | HealthOmics instance types | 
| --- | --- | 
| nvidia-tesla-t4 | G4 | 
| nvidia-tesla-t4-a10g | G4 and G5 | 
| nvidia-tesla-a10g | G5 | 
| nvidia-t4-a10g-l4 | G4, G5, and G6 | 
| nvidia-l4-a10g | G5 and G6 | 
| nvidia-l4 | G6 | 
| nvidia-l40s | G6e | 

If you specify an accelerator type that supports multiple instance types, HealthOmics selects the instance type based on available capacity. If more than one instance type is available, HealthOmics gives preference to the lower-cost instance. The exception is the nvidia-t4-a10g-l4 accelerator, which gives preference to the latest generation instance available.
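
The selection behavior described above can be modeled with a short sketch. The following Python is illustrative only, not HealthOmics code: the candidate table comes from the documentation above, but the cost and generation orderings are assumptions made for the example.

```python
# Illustrative model of accelerator-to-instance selection.
# The candidate lists come from the table above; COST_ORDER and
# GENERATION_ORDER are hypothetical orderings for this sketch only.

CANDIDATES = {
    "nvidia-tesla-t4": ["G4"],
    "nvidia-tesla-t4-a10g": ["G4", "G5"],
    "nvidia-tesla-a10g": ["G5"],
    "nvidia-t4-a10g-l4": ["G4", "G5", "G6"],
    "nvidia-l4-a10g": ["G5", "G6"],
    "nvidia-l4": ["G6"],
    "nvidia-l40s": ["G6e"],
}

# Assumed orderings: lower index = lower cost / older generation.
COST_ORDER = ["G4", "G5", "G6", "G6e"]
GENERATION_ORDER = ["G4", "G5", "G6", "G6e"]

def select_instance(accelerator, available):
    """Pick an instance family for the accelerator from available capacity."""
    candidates = [c for c in CANDIDATES[accelerator] if c in available]
    if not candidates:
        return None
    if accelerator == "nvidia-t4-a10g-l4":
        # Exception: prefer the latest generation instance available.
        return max(candidates, key=GENERATION_ORDER.index)
    # Default: prefer the lower-cost instance.
    return min(candidates, key=COST_ORDER.index)
```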

For details about the instance types, see [Accelerated-computing instances](memory-and-compute-tasks.md#workflow-task-accelerated-computing-instances).

In the following example, the workflow definition specifies `nvidia-l4` as the accelerator:

------
#### [ WDL ]

```
task my_task {
 runtime {
    ...
    acceleratorCount: 1
    acceleratorType: "nvidia-l4"
 }
 ...
}
```

------
#### [ Nextflow ]

```
process my_task {
 ...
 accelerator 1, type: "nvidia-l4"
 ...
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: CommandLineTool
requirements:
  ...
  cwltool:CUDARequirement:
      cudaDeviceCountMin: 1
      cudaComputeCapability: "nvidia-l4"
      cudaVersionMin: "1.0"
```

------

# WDL workflow definition specifics
<a name="workflow-languages-wdl"></a>

The following topics provide details about types and directives available for WDL workflow definitions in HealthOmics.

**Topics**
+ [Implicit type conversion in WDL lenient](#workflow-wdl-type-conversion)
+ [Namespace definition in input.json](#workflow-wdl-namespace-defn)
+ [Primitive types in WDL](#workflow-wdl-primitive-types)
+ [Complex types in WDL](#workflow-wdl-complex-types)
+ [Directives in WDL](#workflow-wdl-directives)
+ [Task metadata in WDL](#workflow-wdl-task-metadata)
+ [WDL workflow definition example](#wdl-example)

## Implicit type conversion in WDL lenient
<a name="workflow-wdl-type-conversion"></a>

HealthOmics supports implicit type conversion in the input.json file and the workflow definition. To use implicit type casting, specify the workflow engine as WDL lenient when you create the workflow. WDL lenient is designed to handle workflows migrated from Cromwell. It supports customer Cromwell directives and some non-conformant logic.

WDL lenient supports type conversion for the following items in the list of WDL’s [limited exceptions](https://github.com/openwdl/wdl/blob/wdl-1.2/SPEC.md#-limited-exceptions):
+ Float to Int, where the coercion results in no loss of precision (such as 1.0 maps to 1).
+ String to Int/Float, where the coercion results in no loss of precision.
+ Map[W, X] to Array[Pair[Y, Z]], in the case where W is coercible to Y and X is coercible to Z.
+ Array[Pair[W, X]] to Map[Y, Z], in the case where W is coercible to Y and X is coercible to Z.
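
The coercions listed above behave like the following Python sketch. This is a hypothetical model of the rules, not the WDL lenient engine itself:

```python
# Illustrative models of the WDL lenient coercions listed above.

def float_to_int(f):
    """Float -> Int, only when the coercion loses no precision (1.0 -> 1)."""
    if f != int(f):
        raise ValueError("lossy coercion: %r" % f)
    return int(f)

def string_to_number(s):
    """String -> Int/Float, only when the text parses as a number."""
    try:
        return int(s)
    except ValueError:
        return float(s)  # raises ValueError if not numeric

def map_to_pairs(m):
    """Map[W, X] -> Array[Pair[Y, Z]], keeping insertion order."""
    return list(m.items())

def pairs_to_map(pairs):
    """Array[Pair[W, X]] -> Map[Y, Z]."""
    return dict(pairs)
```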

To use implicit type casting, specify the workflow engine as WDL lenient when you create the workflow or workflow version.

In the console, the workflow engine parameter is named **Language**. In the API, the workflow engine parameter is named **engine**. For more information, see [Create a private workflow](create-private-workflow.md) or [Create a workflow version](workflows-version-create.md).

## Namespace definition in input.json
<a name="workflow-wdl-namespace-defn"></a>

HealthOmics supports fully qualified variables in input.json. For example, if you declare two input variables named number1 and number2 in workflow **SumWorkflow**:

```
workflow SumWorkflow {
  input {
    Int number1
    Int number2
  }
}
```

 You can use them as fully qualified variables in input.json: 

```
{
    "SumWorkflow.number1": 15,
    "SumWorkflow.number2": 27
}
```

## Primitive types in WDL
<a name="workflow-wdl-primitive-types"></a>

The following table shows how inputs in WDL map to the matching primitive types. HealthOmics provides limited support for type coercion, so we recommend that you set explicit types. 


**Primitive types**  

| WDL type | JSON type | Example WDL | Example JSON key and value | Notes | 
| --- | --- | --- | --- | --- | 
| Boolean | boolean | Boolean b | "b": true | The value must be lower case and unquoted. | 
| Int | integer | Int i | "i": 7 | Must be unquoted. | 
| Float | number | Float f | "f": 42.2 | Must be unquoted. | 
| String | string | String s | "s": "characters" | JSON strings that are a URI must be mapped to a WDL file to be imported. | 
| File | string | File f | "f": "s3://amzn-s3-demo-bucket1/path/to/file" | Amazon S3 and HealthOmics storage URIs are imported as long as the IAM role provided for the workflow has read access to these objects. No other URI schemes are supported (such as file://, https://, and ftp://). The URI must specify an object; it can't be a directory, meaning it can't end with a '/'. | 
| Directory | string | Directory d | "d": "s3://bucket/path/" | The Directory type isn't included in WDL 1.0 or 1.1, so add version development to the header of the WDL file. The URI must be an Amazon S3 URI with a prefix that ends with a '/'. All contents of the directory are recursively copied to the workflow as a single download. The directory should only contain files related to the workflow. | 
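
The File and Directory URI rules in the table can be expressed as a small validation sketch. The helper below is hypothetical and checks only the constraints stated above:

```python
def validate_wdl_uri(uri, wdl_type):
    """Check a URI against the File/Directory rules in the table above.

    File: s3:// or omics:// scheme, and must reference an object
    (no trailing '/'). Directory: an Amazon S3 URI whose prefix ends
    with '/'. Illustrative sketch, not a HealthOmics API.
    """
    if wdl_type == "File":
        if not (uri.startswith("s3://") or uri.startswith("omics://")):
            return False  # no file://, https://, ftp://, and so on
        return not uri.endswith("/")
    if wdl_type == "Directory":
        return uri.startswith("s3://") and uri.endswith("/")
    raise ValueError("unsupported type: %r" % wdl_type)
```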

## Complex types in WDL
<a name="workflow-wdl-complex-types"></a>

The following table shows how inputs in WDL map to the matching complex JSON types. Complex types in WDL are data structures composed of primitive types. Data structures such as lists are converted to arrays.


**Complex types**  

| WDL type | JSON type | Example WDL | Example JSON key and value | Notes | 
| --- | --- | --- | --- | --- | 
| Array | array | Array[Int] nums | "nums": [1, 2, 3] | The members of the array must follow the format of the WDL array type. | 
| Pair | object | Pair[String, Int] str_to_i | "str_to_i": {"left": "0", "right": 1} | Each value of the pair must use the JSON format of its matching WDL type. | 
| Map | object | Map[Int, String] int_to_string | "int_to_string": { "2": "hello", "1": "goodbye" } | Each entry in the map must use the JSON format of its matching WDL type. | 
| Struct | object | <pre>struct SampleBamAndIndex { <br />  String sample_name <br />  File bam <br />  File bam_index <br />} SampleBamAndIndex b_and_i</pre>  |  <pre>"b_and_i": { <br />   "sample_name": "NA12878", <br />   "bam": "s3://amzn-s3-demo-bucket1/NA12878.bam", <br />   "bam_index": "s3://amzn-s3-demo-bucket1/NA12878.bam.bai" <br />}           </pre>  | The names of the struct members must exactly match the names of the JSON object keys. Each value must use the JSON format of the matching WDL type. | 
| Object | N/A | N/A | N/A | The WDL Object type is outdated and should be replaced by Struct in all cases. | 

## Directives in WDL
<a name="workflow-wdl-directives"></a>

HealthOmics supports the following directives in all WDL versions that HealthOmics supports.

### Configure GPU resources
<a name="workflow-wdl-directive-gpu"></a>

HealthOmics supports runtime attributes **acceleratorType** and **acceleratorCount** with all supported [GPU instances](https://docs.aws.amazon.com/omics/latest/dev/task-accelerators.html). HealthOmics also supports aliases named **gpuType** and **gpuCount**, which have the same functionality as their accelerator counterparts. If the WDL definition contains both directives, HealthOmics uses the accelerator values.

The following example shows how to use these directives:

```
runtime {
    gpuCount: 2
    gpuType: "nvidia-tesla-t4"
}
```

### Configure task retry for service errors
<a name="workflow-wdl-task-retry"></a>

HealthOmics supports up to two retries for a task that failed because of service errors (5XX HTTP status codes). You can configure the maximum number of retries (1 or 2) and you can opt out of retries for service errors. By default, HealthOmics attempts a maximum of two retries. 

The following example sets `preemptible` to opt out of retries for service errors:

```
{
  preemptible: 0 
}
```

For more information about task retries in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

### Configure task retry for out of memory
<a name="workflow-wdl-retries"></a>

HealthOmics supports retries for a task that failed because it ran out of memory (container exit code 137, 4XX HTTP status code). HealthOmics doubles the amount of memory for each retry attempt.

By default, HealthOmics doesn't retry for this type of failure. Use the `maxRetries` directive to specify the maximum number of retries.

The following example sets `maxRetries` to 3, so that HealthOmics makes a maximum of four attempts to complete the task (the initial attempt plus three retries):

```
runtime {
    maxRetries: 3
}
```
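
Because HealthOmics doubles the memory for each retry attempt, the per-attempt memory follows a simple progression. The following Python sketch (illustrative only, not HealthOmics code) models it for a task that starts at 4 GiB with `maxRetries: 3`:

```python
def memory_for_attempt(initial_gib, attempt):
    """Memory for a given attempt (1 = initial run), doubling on each retry.

    Illustrates the documented doubling behavior; not HealthOmics code.
    """
    return initial_gib * 2 ** (attempt - 1)

# With maxRetries: 3, attempts 1 through 4 receive 4, 8, 16, and 32 GiB.
```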

**Note**  
Task retry for out of memory requires GNU findutils 4.2.3 or later. The default HealthOmics container image includes this package. If you specify a custom image in your WDL definition, make sure that the image includes GNU findutils 4.2.3 or later.

### Configure return codes
<a name="workflow-wdl-directive-returnCodes"></a>

The **returnCodes** attribute provides a mechanism to specify a return code, or a set of return codes, that indicates successful execution of a task. The WDL engine honors the return codes that you specify in the **runtime** section of the WDL definition and sets the task status accordingly.

```
runtime {
    returnCodes: 1
}
```

HealthOmics also supports an alias named **continueOnReturnCode**, which has the same capabilities as **returnCodes**. If you specify both attributes, HealthOmics uses the **returnCodes** value.

## Task metadata in WDL
<a name="workflow-wdl-task-metadata"></a>

HealthOmics supports the following metadata options for WDL tasks.

### Disable task-level caching with the volatile attribute
<a name="workflow-wdl-volatile-attribute"></a>

The **volatile** attribute allows you to disable call caching for specific tasks in your WDL workflow. When a task is marked as volatile, it will always execute and never use cached results, even when caching is enabled for the run.

Add the **volatile** attribute to the **meta** section of your task definition:

```
task my_volatile_task {
    meta {
        volatile: true
    }
    
    input {
        String input_file
    }
    
    command {
        echo "Processing ${input_file}" > output.txt
    }
    
    output {
        File result = "output.txt"
    }
}
```

## WDL workflow definition example
<a name="wdl-example"></a>

The following examples show private workflow definitions for converting `CRAM` to `BAM` in WDL. The `CRAM` to `BAM` workflow defines two tasks and uses tools from the `genomes-in-the-cloud` container, which is shown in the example and is publicly available.

The following example shows how to include the Amazon ECR container as a parameter. This allows HealthOmics to verify the access permissions to your container before it starts the run.

```
{
   ...
   "gotc_docker":"<account_id>.dkr.ecr.<region>.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710"
}
```

The following example shows how to specify which files to use in your run, when the files are in an Amazon S3 bucket.

```
{
    "input_cram": "s3://amzn-s3-demo-bucket1/inputs/NA12878.cram",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```

If you want to specify files from a sequence store, indicate that as shown in the following example, using the URI for the sequence store.

```
{
    "input_cram": "omics://429915189008.storage.us-west-2.amazonaws.com/111122223333/readSet/4500843795/source1",
    "ref_dict": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://amzn-s3-demo-bucket1/inputs/Homo_sapiens_assembly38.fasta.fai",
    "sample_name": "NA12878"
}
```

You can then define your workflow in WDL as shown in the following example. 

```
version 1.0
workflow CramToBamFlow {
    input {
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        File input_cram
        String sample_name
        String gotc_docker = "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud:latest"
    }
    #Converts CRAM to SAM to BAM and makes BAI.
    call CramToBamTask{
         input:
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            input_cram = input_cram,
            sample_name = sample_name,
            docker_image = gotc_docker,
     }
     #Validates Bam.
     call ValidateSamFile{
        input:
           input_bam = CramToBamTask.outputBam,
           docker_image = gotc_docker,
     }
     #Outputs Bam, Bai, and validation report to the FireCloud data model.
     output {
         File outputBam = CramToBamTask.outputBam
         File outputBai = CramToBamTask.outputBai
         File validation_report = ValidateSamFile.report
      }
}
#Task definitions.
task CramToBamTask {
    input {
       # Command parameters
       File ref_fasta
       File ref_fasta_index
       File ref_dict
       File input_cram
       String sample_name
       # Runtime parameters
       String docker_image
    }
   #Calls samtools view to do the conversion.
   command {
       set -eo pipefail

       samtools view -h -T ~{ref_fasta} ~{input_cram} |
       samtools view -b -o ~{sample_name}.bam -
       samtools index -b ~{sample_name}.bam
       mv ~{sample_name}.bam.bai ~{sample_name}.bai
    }
    
    #Runtime attributes:
    runtime {
        docker: docker_image
    }

    #Outputs a BAM and BAI with the same sample name
     output {
         File outputBam = "~{sample_name}.bam"
         File outputBai = "~{sample_name}.bai"
    }
}

#Validates BAM output to ensure it wasn't corrupted during the file conversion.
task ValidateSamFile {
   input {
      File input_bam
      Int machine_mem_size = 4
      String docker_image
   }
   String output_name = basename(input_bam, ".bam") + ".validation_report"
   Int command_mem_size = machine_mem_size - 1
   command {
       java -Xmx~{command_mem_size}G -jar /usr/gitc/picard.jar \
       ValidateSamFile \
       INPUT=~{input_bam} \
       OUTPUT=~{output_name} \
       MODE=SUMMARY \
       IS_BISULFITE_SEQUENCED=false
    }
    runtime {
    docker: docker_image
    }
   #A text file is generated that lists errors or warnings that apply.
    output {
        File report = "~{output_name}"
    }
}
```

# Nextflow workflow definition specifics
<a name="workflow-definition-nextflow"></a>

HealthOmics supports Nextflow DSL1 and DSL2. For details, see [Nextflow version support](workflows-lang-versions.md#workflows-lang-versions-nextflow).

Nextflow DSL2 is based on the Groovy programming language, so parameters are dynamic and type coercion is possible using the same rules as Groovy. Parameters and values supplied by the input JSON are available in the parameters (`params`) map of the workflow.

**Topics**
+ [Use nf-schema and nf-validation plugins](#schema-and-validation-plugins-nextflow)
+ [Specify storage URIs](#storage-uris-nextflow)
+ [Nextflow directives](#workflow-nexflow-directives)
+ [Export workflow-level content](#exporting-workflow-content-nextflow)
+ [Export task content](#exporting-task-content-nextflow)

## Use nf-schema and nf-validation plugins
<a name="schema-and-validation-plugins-nextflow"></a>

**Note**  
Summary of HealthOmics support for plugins:  
v22.04 – no support for plugins
v23.10 – supports `nf-schema` and `nf-validation`
v24.10 – supports `nf-schema`
v25.10 – supports `nf-schema`, `nf-core-utils`, `nf-fgbio`, and `nf-prov`

HealthOmics provides the following support for Nextflow plugins:
+ For Nextflow v23.10, HealthOmics pre-installs the nf-validation@1.1.1 plugin. 
+ For Nextflow v23.10 and later, HealthOmics pre-installs the nf-schema@2.3.0 plugin.
+ You cannot retrieve additional plugins during a workflow run. HealthOmics ignores any other plugin versions that you specify in the `nextflow.config` file.
+ For Nextflow v24 and higher, `nf-schema` is the new version of the deprecated `nf-validation` plugin. For more information, see [ nf-schema](https://github.com/nextflow-io/nf-schema) in the Nextflow GitHub repository.

## Specify storage URIs
<a name="storage-uris-nextflow"></a>

When an Amazon S3 or HealthOmics URI is used to construct a Nextflow file or path object, it makes the matching object available to the workflow, as long as read access is granted. The use of prefixes or directories is allowed for Amazon S3 URIs. For examples, see [Amazon S3 input parameter formats](workflows-run-inputs.md#s3-run-input-formats). 

HealthOmics partially supports glob patterns in Amazon S3 URIs or HealthOmics storage URIs. Use glob patterns in the workflow definition to create `path` or `file` channels. For the expected behavior and exact cases, see [Nextflow handling of glob patterns in Amazon S3 inputs](workflows-run-inputs.md#wd-nextflow-s3-formats).

## Nextflow directives
<a name="workflow-nexflow-directives"></a>

You configure Nextflow directives in the Nextflow config file or workflow definition. The following list shows the order of precedence that HealthOmics uses to apply configuration settings, from lowest to highest priority:

1. Global configuration in the config file.

1. Task section of the workflow definition.

1. Task-specific selectors in the config file.
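
The precedence order above can be sketched as a simple resolver. This is an illustrative Python model of the documented ordering, not Nextflow or HealthOmics code:

```python
def resolve_directive(global_config=None, task_definition=None, task_selector=None):
    """Resolve a Nextflow directive value using the documented precedence.

    Lowest to highest priority: global configuration in the config file,
    the task section of the workflow definition, then task-specific
    selectors in the config file. Illustrative sketch only.
    """
    # Check sources from highest to lowest priority; first set value wins.
    for value in (task_selector, task_definition, global_config):
        if value is not None:
            return value
    return None
```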

**Topics**
+ [Task retry strategy using `errorStrategy`](#workflow-nextflow-errorStrategy)
+ [Task retry attempts using `maxRetries`](#workflow-nexflow-task-retry)
+ [Opt out of task retry using `omicsRetryOn5xx`](#workflow-nextflow-retry-5xx)
+ [Task duration using the `time` directive](#time-directive-nextflow)

### Task retry strategy using `errorStrategy`
<a name="workflow-nextflow-errorStrategy"></a>

Use the `errorStrategy` directive to define the strategy for task errors. By default, when a task returns with an error indication (a non-zero exit status), the task stops and HealthOmics terminates the entire run. If you set `errorStrategy` to `retry`, HealthOmics attempts one retry of the failed task. To increase the number of retries, see [Task retry attempts using `maxRetries`](#workflow-nexflow-task-retry).

```
process myTask {
    label 'my_label'
    errorStrategy 'retry'

    script:
    """
    your-command-here
    """
}
```

For information about how HealthOmics handles task retries during a run, see [Task Retries](monitoring-runs.md#run-status-task-retries).

### Task retry attempts using `maxRetries`
<a name="workflow-nexflow-task-retry"></a>

By default, HealthOmics doesn't retry a failed task (or attempts one retry if you set `errorStrategy` to `retry`). To increase the maximum number of retries, set `errorStrategy` to `retry` and configure the maximum number of retries using the `maxRetries` directive.

The following example sets the maximum number of retries to 3 in the global configuration.

```
process {
    errorStrategy = 'retry'
    maxRetries = 3
}
```

The following example shows how to set `maxRetries` in the task section of the workflow definition.

```
process myTask {
    label 'my_label'
    errorStrategy 'retry'
    maxRetries 3
    
    script:
    """
    your-command-here
    """
}
```

The following example shows how to specify task-specific configuration in the Nextflow config file, based on the name or label selectors.

```
process {
    withLabel: 'my_label' {
        errorStrategy = 'retry'
        maxRetries = 3
    }

    withName: 'myTask' {
        errorStrategy = 'retry'
        maxRetries = 3
    }
}
```

### Opt out of task retry using `omicsRetryOn5xx`
<a name="workflow-nextflow-retry-5xx"></a>

For Nextflow v23 and later, HealthOmics supports task retries if the task failed because of service errors (5XX HTTP status codes). By default, HealthOmics attempts up to two retries of a failed task. 

You can configure `omicsRetryOn5xx` to opt out of task retry for service errors. For more information about task retry in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

The following example configures `omicsRetryOn5xx` in the global configuration to opt out of task retry.

```
process {
    omicsRetryOn5xx = false
}
```

The following example shows how to configure `omicsRetryOn5xx` in the task section of the workflow definition.

```
process myTask {
    label 'my_label'
    omicsRetryOn5xx = false
    
    script:
    """
    your-command-here
    """
}
```

The following example shows how to set `omicsRetryOn5xx` as task-specific configuration in the Nextflow config file, based on the name or label selectors.

```
process {
    withLabel: 'my_label' {
        omicsRetryOn5xx = false
    }

    withName: 'myTask' {
        omicsRetryOn5xx = false
    }
}
```

### Task duration using the `time` directive
<a name="time-directive-nextflow"></a>

HealthOmics provides an adjustable quota (see [HealthOmics service quotas](service-quotas.md)) to specify the maximum duration for a run. For Nextflow v23 and later workflows, you can also specify maximum task durations using the Nextflow `time` directive.

During new workflow development, setting maximum task duration helps you catch runaway tasks and long-running tasks. 

For more information about the Nextflow time directive, see [time directive](https://www.nextflow.io/docs/latest/reference/process.html#process-time) in the Nextflow reference.

HealthOmics provides the following support for the Nextflow time directive:

1. HealthOmics supports 1 minute granularity for the time directive. You can specify a value between 60 seconds and the maximum run duration value.

1. If you enter a value less than 60, HealthOmics rounds it up to 60 seconds. For values above 60, HealthOmics rounds down to the nearest minute.

1. If the workflow supports retries for a task, HealthOmics retries the task if it times out.

1. If a task times out (or the last retry times out), HealthOmics cancels the task. This operation can take one to two minutes.

1. On task timeout, HealthOmics sets the run and task status to failed, and it cancels the other tasks in the run (for tasks in Starting, Pending, or Running status). HealthOmics exports the outputs from tasks that it completed before the timeout to your designated S3 output location. 

1. Time that a task spends in pending status does not count toward the task duration.

1. If the run is part of a run group and the run group times out sooner than the task timer, the run and task transition to failed status.

Specify the timeout duration using one or more of the following units: `ms`, `s`, `m`, `h`, or `d`.
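
The rounding rules above can be sketched as follows. This is an illustrative helper (not HealthOmics code) that assumes the requested duration is already converted to seconds:

```python
def effective_timeout_seconds(requested_seconds):
    """Apply the documented 1-minute granularity to a requested task timeout.

    Values under 60 seconds round up to 60; larger values round down
    to the nearest whole minute. Illustrative sketch only.
    """
    if requested_seconds < 60:
        return 60
    return (requested_seconds // 60) * 60
```

For example, a requested `time` of '1h30m' (5400 seconds) is already on a minute boundary and is used as-is, while 90 seconds rounds down to 60.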

The following example shows how to specify global configuration in the Nextflow config file. It sets a global timeout of 1 hour and 30 minutes.

```
process {
    time = '1h30m'
}
```

The following example shows how to specify a time directive in the task section of the workflow definition. This example sets a timeout of 3 days, 5 hours, and 4 minutes. This value takes precedence over the global value in the config file, but doesn't take precedence over a task-specific time directive for `my_label` in the config file.

```
process myTask {
    label 'my_label'
    time '3d5h4m'
        
    script:
    """
    your-command-here
    """
}
```

The following example shows how to specify task-specific time directives in the Nextflow config file, based on the name or label selectors. This example sets a global task timeout value of 30 minutes. It sets a value of 2 hours for task `myTask` and sets a value of 3 hours for tasks with label `my_label`. For tasks that match the selector, these values take precedence over the global value and the value in the workflow definition.

```
process {
    time = '30m'
    
    withLabel: 'my_label' {
        time = '3h'  
    }

    withName: 'myTask' {
        time = '2h'  
    }
}
```

## Export workflow-level content
<a name="exporting-workflow-content-nextflow"></a>

For Nextflow v25.10, you can export files produced outside of individual tasks, such as provenance reports or pipeline DAGs. To export these files, write them to `/mnt/workflow/output/`. HealthOmics exports files placed in this directory to the `output/` prefix in your run's Amazon S3 output location.

The following example shows how to configure the `nf-prov` plugin to write a provenance report to `/mnt/workflow/output/`.

```
prov {
    formats {
        bco {
            file = "/mnt/workflow/output/pipeline_info/manifest.bco.json"
        }
    }
}
```

You can also pass this path as a parameter in your run's input JSON. This approach is common with nf-core workflows that use `params.outdir`.

```
{
    "outdir": "/mnt/workflow/output/"
}
```

## Export task content
<a name="exporting-task-content-nextflow"></a>

For workflows written in Nextflow, define a **publishDir** directive to export task content to your output Amazon S3 bucket. As shown in the following example, set the **publishDir** value to `/mnt/workflow/pubdir`. To export files to Amazon S3, the files must be in this directory.

```
 nextflow.enable.dsl=2
              
  workflow {
    CramToBamTask(params.ref_fasta, params.ref_fasta_index, params.ref_dict, params.input_cram, params.sample_name)
    ValidateSamFile(CramToBamTask.out.outputBam)
  }
  
  process CramToBamTask {
    container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"
  
    publishDir "/mnt/workflow/pubdir"
  
    input:
        path ref_fasta
        path ref_fasta_index
        path ref_dict
        path input_cram
        val sample_name
  
    output:
        path "${sample_name}.bam", emit: outputBam
        path "${sample_name}.bai", emit: outputBai
  
    script:
    """
        set -eo pipefail
  
        samtools view -h -T $ref_fasta $input_cram |
        samtools view -b -o ${sample_name}.bam -
        samtools index -b ${sample_name}.bam
        mv ${sample_name}.bam.bai ${sample_name}.bai
    """
  }
  
  process ValidateSamFile {
    container "<account>.dkr.ecr.us-west-2.amazonaws.com/genomes-in-the-cloud"
  
    publishDir "/mnt/workflow/pubdir"
  
    input:
        file input_bam
  
    output:
        path "validation_report"
  
    script:
    """
        java -Xmx3G -jar /usr/gitc/picard.jar \
        ValidateSamFile \
        INPUT=${input_bam} \
        OUTPUT=validation_report \
        MODE=SUMMARY \
        IS_BISULFITE_SEQUENCED=false
    """
  }
```

For Nextflow v25.10, as an alternative to `publishDir`, you can use workflow outputs to export task content. The following example shows how to define a workflow `output` block that exports task results to Amazon S3.

```
process myTask {
    input:
    val data

    output:
    path 'result.txt'

    script:
    """
    echo ${data} > result.txt
    """
}

workflow {
    main:
    output_file = myTask('hello')

    publish:
    results = output_file
}

output {
    results {
        path '.'
    }
}
```

For more information about workflow outputs, see [Workflow outputs](https://www.nextflow.io/docs/latest/workflow.html#workflow-output-def) in the Nextflow documentation.

# CWL workflow definition specifics
<a name="workflow-languages-cwl"></a>

Workflows written in Common Workflow Language, or CWL, offer similar functionality to workflows written in WDL and Nextflow. You can use Amazon S3 or HealthOmics storage URIs as input parameters. 

If you define an input with a `secondaryFiles` field in a subworkflow, add the same definition in the main workflow.

HealthOmics workflows don't support operation processes. To learn more about operation processes in CWL workflows, see the [CWL documentation](https://www.commonwl.org/user_guide/topics/operations.html).

Best practice is to define a separate CWL workflow for each container that you use. We recommend that you don't hardcode the dockerPull entry with a fixed Amazon ECR URI.

**Topics**
+ [Convert CWL workflows to use HealthOmics](#workflow-cwl-convert)
+ [Opt out of task retry using `omicsRetryOn5xx`](#workflow-cwl-retry-5xx)
+ [Loop a workflow step](#workflow-cwl-loop)
+ [Retry tasks with increased memory](#workflow-cwl-out-of-memory-retry)
+ [Examples](#workflow-cwl-examples)

## Convert CWL workflows to use HealthOmics
<a name="workflow-cwl-convert"></a>

To convert an existing CWL workflow definition to use HealthOmics, make the following changes:
+ Replace all Docker container URIs with Amazon ECR URIs.
+ Make sure that all the workflow files are declared in the main workflow as input, and all variables are explicitly defined.
+ Make sure that all JavaScript code is strict-mode compliant.

## Opt out of task retry using `omicsRetryOn5xx`
<a name="workflow-cwl-retry-5xx"></a>

HealthOmics supports task retries if the task failed because of service errors (5XX HTTP status codes). By default, HealthOmics attempts up to two retries of a failed task. For more information about task retry in HealthOmics, see [Task Retries](monitoring-runs.md#run-status-task-retries).

To opt out of task retry for service errors, configure the `omicsRetryOn5xx` directive in the workflow definition. You can define this directive under `requirements` or `hints`. For portability, we recommend adding the directive as a hint.

```
requirements:
  ResourceRequirement:
    omicsRetryOn5xx: false

hints:
  ResourceRequirement:
    omicsRetryOn5xx: false
```

Requirements override hints. If a task implementation provides a resource requirement in `hints` that an enclosing workflow also provides in `requirements`, the enclosing requirement takes precedence.

If the same task requirement appears at different levels of the workflow, HealthOmics uses the most specific entry from `requirements` (or `hints`, if there are no entries in `requirements`). The following list shows the order of precedence that HealthOmics uses to apply configuration settings, from lowest to highest priority:
+ Workflow level
+ Step level
+ Task section of the workflow definition

The following example shows how to configure the `omicsRetryOn5xx` directive at different levels of the workflow. In this example, the workflow-level requirement overrides the workflow-level hint. At the step and task levels, the `requirements` configurations override the `hints` configurations.

```
class: Workflow
# Workflow-level requirement and hint
requirements:
  ResourceRequirement:
    omicsRetryOn5xx: false

hints:
  ResourceRequirement:
    omicsRetryOn5xx: false  # The value in requirements overrides this value 

steps:
  task_step:
    # Step-level requirement
    requirements:
      ResourceRequirement:
        omicsRetryOn5xx: false
    # Step-level hint
    hints:
      ResourceRequirement:
        omicsRetryOn5xx: false
    run:
      class: CommandLineTool
      # Task-level requirement
      requirements:
        ResourceRequirement:
          omicsRetryOn5xx: false
      # Task-level hint
      hints:
        ResourceRequirement:
          omicsRetryOn5xx: false
```

## Loop a workflow step
<a name="workflow-cwl-loop"></a>

HealthOmics supports looping a workflow step. You can use loops to run workflow steps repeatedly until a specified condition is met. This is useful for iterative processes where you need to repeat a task multiple times or until a certain result is achieved.

**Note:** Loop functionality requires CWL version 1.2 or later. Workflows using CWL versions earlier than 1.2 do not support loop operations.

To use loops in your CWL workflow, define a Loop requirement. The following example shows the loop requirement configuration:

```
requirements:
  - class: "http://commonwl.org/cwltool#Loop"
    loopWhen: $(inputs.counter < inputs.max)
    loop:
      counter:
        loopSource: result
        valueFrom: $(self)
    outputMethod: last
```

The `loopWhen` field controls when the loop terminates. In this example, the loop continues as long as the counter is less than the maximum value. The `loop` field defines how input parameters are updated between iterations. The `loopSource` specifies which output from the previous iteration feeds into the next iteration. The `outputMethod` field set to `last` returns only the final iteration's output.
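The fragment above shows only the step-level requirement. A fuller sketch of a looping workflow might look like the following (the `counter` and `max` inputs, the `result` output, and the `increment.cwl` tool are hypothetical names for illustration):

```
cwlVersion: v1.2
class: Workflow

inputs:
  counter: int
  max: int

outputs:
  final_count:
    type: int
    outputSource: loop_step/result

steps:
  loop_step:
    requirements:
      - class: "http://commonwl.org/cwltool#Loop"
        loopWhen: $(inputs.counter < inputs.max)
        loop:
          counter:
            loopSource: result
            valueFrom: $(self)
        outputMethod: last
    in:
      counter: counter
      max: max
    out: [result]
    run: increment.cwl   # hypothetical tool that outputs result = counter + 1
```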

## Retry tasks with increased memory
<a name="workflow-cwl-out-of-memory-retry"></a>

HealthOmics supports automatic retry of out-of-memory task failures. When a task exits with code 137 (out-of-memory), HealthOmics creates a new task with increased memory allocation based on the specified multiplier.

**Note**  
HealthOmics retries out-of-memory failures up to 3 times or until the memory allocation reaches 1536 GiB, whichever limit is reached first.

The following example shows how to configure out-of-memory retry:

```
hints:
  ResourceRequirement:
    ramMin: 4096
  http://arvados.org/cwl#OutOfMemoryRetry:
    memoryRetryMultiplier: 2.5
```

When a task fails due to out-of-memory, HealthOmics calculates the retry memory allocation using the formula: `previous_run_memory × memoryRetryMultiplier`. In the example above, if the task with 4096 MB of memory fails, the retry attempt uses 4096 × 2.5 = 10,240 MB of memory.

The `memoryRetryMultiplier` parameter controls how much additional memory to allocate for retry attempts:
+ **Default value:** If you don't specify a value, it defaults to `2` (doubles the memory)
+ **Valid range:** Must be a positive number greater than `1`. Invalid values result in a 4XX validation error
+ **Minimum effective value:** Values between `1` and `1.5` are automatically increased to `1.5` to ensure meaningful memory increases and prevent excessive retry attempts
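The rules above can be sketched as a small Python helper (a hypothetical illustration; HealthOmics applies these rules service-side, and the exact clamping behavior is inferred from this page):

```python
def retry_memory_mb(previous_mb, multiplier=2.0):
    """Estimate the memory allocation for an out-of-memory retry.

    Mirrors the documented rules:
    - multiplier defaults to 2 (doubles the memory)
    - multiplier must be greater than 1 (otherwise a 4XX validation error)
    - values between 1 and 1.5 are raised to 1.5
    The 3-retry and 1536 GiB service limits are not modeled here.
    """
    if multiplier <= 1:
        raise ValueError("memoryRetryMultiplier must be greater than 1")
    effective = max(multiplier, 1.5)
    return previous_mb * effective

# The example from the text: a 4096 MB task with a 2.5 multiplier
print(retry_memory_mb(4096, 2.5))  # 10240.0
```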

## Examples
<a name="workflow-cwl-examples"></a>

The following is an example of a workflow written in CWL. 

```
cwlVersion: v1.2
class: Workflow

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]
  out_filename: string
  docker_image: string

outputs:
  copied_file:
    type: File
    outputSource: copy_step/copied_file

steps:
  copy_step:
    in:
      in_file: in_file
      out_filename: out_filename
      docker_image: docker_image
    out: [copied_file]
    run: copy.cwl
```

The following file defines the `copy.cwl` task.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: cp

inputs:
  in_file:
    type: File
    secondaryFiles: [.fai]
    inputBinding:
      position: 1
  out_filename:
    type: string
    inputBinding:
      position: 2
  docker_image:
    type: string

outputs:
  copied_file:
    type: File
    outputBinding:
      glob: $(inputs.out_filename)

requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerPull: "$(inputs.docker_image)"
```
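A run of this workflow supplies values for the three declared inputs. The following parameters file is a hypothetical example (the bucket name, file names, and Amazon ECR URI are placeholders); note that the `.fai` secondary file must exist alongside `in_file`:

```
{
  "in_file": {
    "class": "File",
    "location": "s3://amzn-s3-demo-bucket/sample.fasta"
  },
  "out_filename": "sample-copy.fasta",
  "docker_image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-image:latest"
}
```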

The following is an example of a workflow written in CWL with a GPU requirement.

```
cwlVersion: v1.2
class: CommandLineTool
baseCommand: ["/bin/bash", "docm_haplotypeCaller.sh"]
$namespaces:
  cwltool: http://commonwl.org/cwltool#
requirements:
  cwltool:CUDARequirement:
    cudaDeviceCountMin: 1
    cudaComputeCapability: "nvidia-tesla-t4"
    cudaVersionMin: "1.0"
  InlineJavascriptRequirement: {}
  InitialWorkDirRequirement:
    listing:
      - entryname: 'docm_haplotypeCaller.sh'
        entry: |
          nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv

inputs: []
outputs: []
```

# Example workflow definitions
<a name="workflow-definition-examples"></a>

The following example shows the same workflow definition in WDL, Nextflow, and CWL.

------
#### [ WDL ]

```
version 1.1

task my_task {
   runtime { ... }
   input {
       File input_file
       String name
       Int threshold
   }
   
   command <<<
   my_tool --name ~{name} --threshold ~{threshold} ~{input_file}
   >>>
   
   output {
       File results = "results.txt"
   }
}

workflow my_workflow {
   input {
       File input_file
       String name
       Int threshold = 50
   }
   
   call my_task {
       input:
          input_file = input_file,
          name = name,
          threshold = threshold
   }
   output {
       File results = my_task.results
   }
}
```

------
#### [ Nextflow ]

```
nextflow.enable.dsl = 2

params.input_file = null
params.name = null
params.threshold = 50

process my_task {
   // <directives>
   
   input:
     path input_file
     val name
     val threshold
   
   output:
     path 'results.txt', emit: results
   
   script:
     """
     my_tool --name ${name} --threshold ${threshold} ${input_file}
     """
}

workflow MY_WORKFLOW {
   my_task(
       params.input_file,
       params.name,
       params.threshold
   )
}

workflow {
   MY_WORKFLOW()
}
```

------
#### [ CWL ]

```
cwlVersion: v1.2
class: Workflow

requirements:
    InlineJavascriptRequirement: {}

inputs:
    input_file: File
    name: string
    threshold: int

outputs:
    result:
        type: ...
        outputSource: ...

steps:
    my_task:
        run:
            class: CommandLineTool
            baseCommand: my_tool
            requirements:
                ...
            inputs:
                name:
                    type: string
                    inputBinding:
                        prefix: "--name"
                threshold:
                    type: int
                    inputBinding:
                        prefix: "--threshold"
                input_file:
                    type: File
                    inputBinding: {}
            outputs:
                results:
                    type: File
                    outputBinding:
                        glob: results.txt
```

------