HealthOmics run inputs

If the workflow definition specifies input files for the workflow or workflow tasks, HealthOmics stages the files to a scratch volume that's dedicated to the workflow run. These input files are read-only, which prevents tasks from modifying potential inputs to other tasks in the workflow. For directory imports, the directories are also read-only.

Many genomics applications assume that index files are co-located with the sequence files (such as a companion .bai index file for a .bam file). To include index files, specify them as task inputs in the workflow definition.

Managing input parameter size

You can specify up to 50 KB of input parameters for the workflow. You can use the following techniques to remain within this size constraint:

  • Use directory imports

    To specify a large number of input files, specify one parameter as the Amazon S3 location that contains all the files, rather than specifying a parameter for each file location. For more information, see the next topic (Amazon S3 input parameter formats).

  • Use a sample sheet

    A sample sheet is a CSV or TSV file with one column for the fastq.gz address (or two columns for paired-end reads) and additional columns for metadata such as sample names. You specify the sample sheet as a single run input parameter instead of specifying a parameter for each input file.

    Your workflow defines how your sample sheet maps to data structures in the workflow. Although you can write code to parse sample sheets in WDL and CWL, sample sheets are more common in Nextflow. For an example, see the sample sheet on the nf-core GitHub site.
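
As a rough illustration of why a sample sheet helps with the size constraint, the following Python sketch builds a paired-read sample sheet and compares the size of one-parameter-per-file input against a single sample sheet parameter. The bucket, the sample names, the column names (sample, fastq_1, fastq_2), and the helper functions are hypothetical. The sketch also assumes that the 50 KB limit can be approximated as 50 * 1024 bytes of UTF-8 encoded JSON, which this topic does not state.

```python
import csv
import io
import json

# Assumption: the 50 KB limit is approximated as 50 * 1024 bytes of
# UTF-8 encoded JSON. The exact accounting is not stated in this topic.
MAX_PARAMETER_BYTES = 50 * 1024


def build_sample_sheet(samples):
    """Return CSV text with a header row and one row per paired-read sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])  # hypothetical column names
    for name, read1, read2 in samples:
        writer.writerow([name, read1, read2])
    return buffer.getvalue()


def parameter_size(parameters):
    """Return the size in bytes of the parameters when encoded as JSON."""
    return len(json.dumps(parameters).encode("utf-8"))


# Hypothetical bucket and sample names, for illustration only.
samples = [
    (
        f"sample{i:04d}",
        f"s3://myfiles/runs/inputs/a/sample{i:04d}_R1.fastq.gz",
        f"s3://myfiles/runs/inputs/a/sample{i:04d}_R2.fastq.gz",
    )
    for i in range(1000)
]

# One parameter per input file: the size grows with the number of samples.
per_file_parameters = {
    f"{name}_{read}": uri
    for name, read1, read2 in samples
    for read, uri in (("R1", read1), ("R2", read2))
}

# One parameter that points at a sample sheet stored in Amazon S3.
sheet_parameters = {"sample_sheet": "s3://myfiles/runs/inputs/samplesheet.csv"}

sheet_text = build_sample_sheet(samples)
print(parameter_size(per_file_parameters), "bytes with one parameter per file")
print(parameter_size(sheet_parameters), "bytes with one sample sheet parameter")
```

In this sketch the sample sheet text would be uploaded to Amazon S3 separately, and only its location is passed as a run input parameter.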

Amazon S3 input parameter formats

For an input parameter that accepts an Amazon S3 location, the parameter can specify the location of one file or a whole directory of files. Using a directory has the following advantages:

  • Convenience – You specify the directory name as the parameter. You don't list each file name.

  • Compactness – The total size of the input parameters is limited to 50 KB. If you provide a long list of input file names, you can exceed this limit.

Amazon S3 is a flat object-storage system, so it doesn't support directories. You group files into a "directory" by giving each file the same object key prefix. For more information about Amazon S3 object key prefixes, see Organizing objects using prefixes.

HealthOmics interprets the input parameter value as follows:

  • If the Amazon S3 location doesn't end with a forward slash and doesn't use a glob pattern, HealthOmics expects the parameter value to be the key for one Amazon S3 object.

    For example, you can specify s3://myfiles/runs/inputs/a/file1.fastq to input file1.fastq.

  • If the Amazon S3 location ends with a forward slash, HealthOmics interprets the parameter value as an Amazon S3 prefix. It loads all the Amazon S3 objects with that prefix.

    For example, you can specify s3://myfiles/runs/inputs/a/ to load all objects whose keys start with this prefix.

  • For Nextflow, HealthOmics supports the glob pattern for Amazon S3 URIs in input parameters.

    For example, you can specify s3://myfiles/runs/inputs/a/*.gz to input all .gz files whose keys start with this prefix.
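
The three rules above can be summarized in a small Python sketch. This is an illustration of the documented behavior, not HealthOmics code. The function name, the InputKind enum, and the engine labels are hypothetical, and treating `*`, `?`, and `[` as the glob characters is an assumption (this topic only shows `*`).

```python
from enum import Enum


class InputKind(Enum):
    OBJECT = "single object"
    PREFIX = "prefix"
    GLOB = "glob pattern"


# Assumption: these characters mark a glob pattern. Only "*" appears in
# the examples in this topic.
GLOB_CHARS = set("*?[")


def classify_s3_input(uri, engine):
    """Classify an Amazon S3 input parameter value by the rules above.

    `engine` is an illustrative label: "WDL", "NEXTFLOW", or "CWL".
    """
    if not uri.startswith("s3://"):
        raise ValueError(f"not an Amazon S3 URI: {uri!r}")
    # Glob patterns are described for Nextflow only.
    if engine == "NEXTFLOW" and any(ch in GLOB_CHARS for ch in uri):
        return InputKind.GLOB
    # A trailing forward slash means the value is a prefix.
    if uri.endswith("/"):
        return InputKind.PREFIX
    # Otherwise the value is the key of one object.
    return InputKind.OBJECT


for value in (
    "s3://myfiles/runs/inputs/a/file1.fastq",
    "s3://myfiles/runs/inputs/a/",
    "s3://myfiles/runs/inputs/a/*.gz",
):
    print(value, "->", classify_s3_input(value, "NEXTFLOW").value)
```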

Language-specific handling of double-slash in Amazon S3 inputs

HealthOmics retains the native engine behavior for each workflow engine when handling double-slashes in Amazon S3 URIs, so that you don't need to make any changes to your workflows when you migrate them to HealthOmics. The following sections describe how each engine handles various scenarios.

WDL

If the input parameter includes a double-slash in the middle or at the end of the URI, the WDL engine retains the double-slash.

Input parameter                          Expected location
s3://myfiles/runs/inputs//file1.fastq    s3://myfiles/runs/inputs//file1.fastq
s3://myfiles/runs/inputs//               s3://myfiles/runs/inputs//

Nextflow

If the input parameter includes a double-slash in the middle of the URI, the Nextflow engine retains the double-slash. For a double-slash at the end of the URI, the Nextflow engine resolves it to a single slash.

Input parameter                          Expected location
s3://myfiles/runs/inputs//file1.fastq    s3://myfiles/runs/inputs//file1.fastq
s3://myfiles//runs/inputs//*.gz          s3://myfiles//runs/inputs//*.gz
s3://myfiles//runs/inputs//              s3://myfiles//runs/inputs/

CWL

If the input parameter includes a double-slash in the middle or at the end of the URI, the CWL engine retains the double-slash.

Input parameter                           Expected location
s3://myfiles//runs/inputs//file1.fastq    s3://myfiles//runs/inputs//file1.fastq
s3://myfiles//runs/inputs//               s3://myfiles//runs/inputs//
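
The per-engine behavior in the WDL, Nextflow, and CWL sections can be modeled with a short Python sketch, which may help when you predict where an engine will look for an input. The function name and the engine labels are hypothetical, and the sketch covers only the double-slash cases described in this topic (for example, it makes no claim about three or more consecutive slashes).

```python
SCHEME = "s3://"


def expected_location(uri, engine):
    """Return the location an engine is expected to resolve for an input URI.

    `engine` is an illustrative label: "WDL", "NEXTFLOW", or "CWL".
    """
    if not uri.startswith(SCHEME):
        raise ValueError(f"not an Amazon S3 URI: {uri!r}")
    path = uri[len(SCHEME):]
    # Nextflow resolves a double-slash at the end of the URI to a single
    # slash. Double-slashes in the middle of the URI are retained.
    if engine == "NEXTFLOW" and path.endswith("//"):
        path = path[:-1]
    # WDL and CWL retain double-slashes in the middle and at the end.
    return SCHEME + path


for engine in ("WDL", "NEXTFLOW", "CWL"):
    print(engine, expected_location("s3://myfiles//runs/inputs//", engine))
```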

Amazon S3 input archive states

HealthOmics can retrieve Amazon S3 objects that S3 delivers in real time. For objects that are in the following archived storage states, restore the objects to make them available to HealthOmics:

  • S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes.

  • Archive Access or Deep Archive Access tiers in S3 Intelligent-Tiering.

For information about restoring objects, see Restoring an archived object in the Amazon S3 User Guide.
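
Before you start a run, you might check whether an input object is in one of these archived states. The following Python sketch inspects a dictionary that is assumed to mirror the fields of an Amazon S3 HeadObject response (StorageClass, ArchiveStatus, and Restore). The function name is hypothetical, and the sketch makes no call to Amazon S3. In practice you would pass in the response from your own HeadObject request.

```python
# Storage classes and access tiers that this topic lists as archived.
ARCHIVED_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}
ARCHIVED_ACCESS_TIERS = {"ARCHIVE_ACCESS", "DEEP_ARCHIVE_ACCESS"}


def needs_restore(head):
    """Return True if the object must be restored before a run can read it.

    `head` is assumed to be shaped like an Amazon S3 HeadObject response.
    """
    archived = (
        head.get("StorageClass") in ARCHIVED_STORAGE_CLASSES
        or head.get("ArchiveStatus") in ARCHIVED_ACCESS_TIERS
    )
    if not archived:
        return False
    # A completed restore is reported as ongoing-request="false".
    # A missing value or ongoing-request="true" means the object is
    # not readable yet.
    return 'ongoing-request="false"' not in head.get("Restore", "")


# Hypothetical responses, for illustration only.
print(needs_restore({"StorageClass": "DEEP_ARCHIVE"}))
print(needs_restore({"StorageClass": "GLACIER", "Restore": 'ongoing-request="false"'}))
print(needs_restore({"StorageClass": "GLACIER_IR"}))
```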

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.