HealthOmics Storage

Use HealthOmics Storage to store, retrieve, organize, and share genomics data efficiently and at low cost. HealthOmics Storage understands the relationships between different data objects, so you can define which read sets originated from the same source data. This gives you data provenance.

Data that's stored in ACTIVE state is retrievable immediately. Data that hasn't been accessed for 30 days or more is stored in ARCHIVE state. To access archived data, you can reactivate it through the API operations or the console.
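As a rough illustration of reactivating archived data through the API, the following Python sketch finds archived read sets in a sequence store and requests one activation job for them. It assumes that `client` behaves like `boto3.client("omics")`, that `list_read_sets` returns a `readSets` list whose entries carry `id` and `status` fields, and that archived read sets report the status string `ARCHIVED`. The helper names are hypothetical, and only the first page of results is handled.

```python
from typing import Any, Dict, List


def archived_read_set_ids(read_sets: List[Dict[str, Any]]) -> List[str]:
    """Return the IDs of read sets whose status is ARCHIVED (assumed status string)."""
    return [rs["id"] for rs in read_sets if rs.get("status") == "ARCHIVED"]


def reactivate_archived(client: Any, sequence_store_id: str) -> List[str]:
    """Request activation of every archived read set found on the first result page.

    `client` is assumed to behave like boto3.client("omics").
    Returns the read set IDs that were submitted for activation.
    """
    page = client.list_read_sets(sequenceStoreId=sequence_store_id)
    ids = archived_read_set_ids(page.get("readSets", []))
    if ids:
        # One activation job can cover several read sets from the same store.
        client.start_read_set_activation_job(
            sequenceStoreId=sequence_store_id,
            sources=[{"readSetId": rs_id} for rs_id in ids],
        )
    return ids
```

With real credentials, a call might look like `reactivate_archived(boto3.client("omics"), "<sequence-store-id>")`. Activation is asynchronous, so the read sets become retrievable only after the job completes.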

With the HealthOmics Storage API operations, you can perform the following actions:

  • Create, manage, and delete sequence and reference stores

  • Create and manage read sets

  • Import, export, and work with read sets

  • Share and access read sets with collaborators through Amazon S3 URI access

  • Create, manage, and import references

  • Copy read sets to local file systems for analysis

  • Tag AWS resources such as sequence stores, read sets, and references

  • List and read files through Amazon S3 API operations by using the Amazon S3 URI
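To make the last action in the list above more concrete, here is a minimal Python sketch that lists object keys under an Amazon S3 URI. It assumes that `s3_client` behaves like `boto3.client("s3")` and that you already have the S3 URI for the read set. The URI in the usage note is a placeholder, not a real HealthOmics URI, and the helper names are hypothetical. Only the first page of results is returned.

```python
from typing import Any, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(s3_client: Any, uri: str) -> List[str]:
    """List object keys under the URI's prefix (first page only).

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    bucket, prefix = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, `list_keys(boto3.client("s3"), "s3://example-bucket/example/prefix/")` would return the keys under that prefix, provided that the caller has permission to list them.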

HealthOmics sequence stores are designed to preserve the content integrity of files. However, imported data files and exported files aren't guaranteed to be bitwise equivalent, because data is compressed during active and archive tiering.

During ingestion, HealthOmics generates an entity tag, or HealthOmics ETag, so that you can validate the content integrity of your data files. Sequencing portions are identified and captured as an ETag at the source level of a read set. The ETag calculation doesn't alter the actual file or genomic data. After a read set is created, the ETag shouldn't change throughout the lifecycle of the read set source. As a result, reimporting the same file produces the same ETag value.
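Because the same file should always yield the same ETag, you can compare ETags to check whether two read sets hold the same content. The following Python sketch is one way to do that. It assumes that each argument is a dictionary shaped like a boto3 `get_read_set_metadata` response, with an `etag` entry that contains `algorithm`, `source1`, and optionally `source2`. The function name and the sample values in the usage note are hypothetical.

```python
from typing import Any, Dict


def etags_match(meta_a: Dict[str, Any], meta_b: Dict[str, Any]) -> bool:
    """Return True when two read set metadata records carry identical ETags.

    Both the algorithm and every per-source value must agree. A record
    without an ETag never matches, because nothing can be validated.
    """
    etag_a = meta_a.get("etag") or {}
    etag_b = meta_b.get("etag") or {}
    if not etag_a or not etag_b:
        return False
    fields = ("algorithm", "source1", "source2")
    return all(etag_a.get(name) == etag_b.get(name) for name in fields)
```

For example, `etags_match({"etag": {"algorithm": "X", "source1": "abc"}}, {"etag": {"algorithm": "X", "source1": "abc"}})` evaluates to `True`, while changing either `source1` value makes it `False`.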