

# HealthOmics analytics
<a name="omics-analytics"></a>

**Important**  
AWS HealthOmics variant stores and annotation stores are no longer open to new customers. Existing customers can continue to use the service as normal. For more information, see [AWS HealthOmics variant store and annotation store availability change](variant-store-availability-change.md).

HealthOmics analytics supports the storage and analysis of genomic variants and annotations. Analytics provides two types of storage resources: variant stores and annotation stores. You use these resources to store, transform, and query genomic variant data and annotation data. After you import data into a data store, you can use Athena to perform advanced analytics on the data.

You can use the HealthOmics console or API to create and manage stores, import data, and share analytic store data with collaborators.

Variant stores support data in VCF formats, and annotation stores support TSV/CSV and GFF3 formats. Genomic coordinates are represented as zero-based, half-closed, half-open intervals: the start position is included and the end position is excluded. When your data is in the HealthOmics analytics data store, access to the VCF files is managed through AWS Lake Formation. You can then query the VCF files by using Amazon Athena. Queries must use Athena query engine version 3. To read more about Athena query engine versions, see the [Amazon Athena documentation](https://docs.aws.amazon.com/athena/latest/ug/engine-versions-changing.html).
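
To make the coordinate convention concrete, the following Python sketch converts a 1-based VCF position and reference allele into a zero-based interval whose start is included and whose end is excluded. The function name is hypothetical, and the sketch illustrates the general convention only. It isn't the conversion code that HealthOmics runs.

```python
from typing import Tuple


def vcf_to_half_open(pos: int, ref: str) -> Tuple[int, int]:
    """Return a zero-based (start, end) interval for a 1-based VCF POS and REF allele.

    The start is inclusive and the end is exclusive, so end - start == len(ref).
    """
    if pos < 1:
        raise ValueError("VCF POS values are 1-based and must be at least 1")
    if not ref:
        raise ValueError("REF allele must not be empty")
    start = pos - 1          # shift from 1-based to zero-based (inclusive)
    end = start + len(ref)   # exclusive end covers every reference base
    return start, end


if __name__ == "__main__":
    print(vcf_to_half_open(100, "A"))    # single-base variant at POS 100
    print(vcf_to_half_open(100, "ACG"))  # three reference bases starting at POS 100
```

With this convention, the length of an interval is always the end minus the start.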



**Topics**
+ [Creating HealthOmics variant stores](creating-variant-stores.md)
+ [Creating HealthOmics variant store import jobs](parsing-annotation-stores.md)
+ [Creating HealthOmics annotation stores](creating-and-managing-annotation-store.md)
+ [Creating import jobs for HealthOmics annotation stores](annotation-store-import-jobs.md)
+ [Creating HealthOmics annotation store versions](annotation-store-versioning.md)
+ [Deleting HealthOmics analytics stores](deleting-a-store-examples.md)
+ [Querying HealthOmics analytics data](analytics-query-data.md)
+ [Sharing HealthOmics analytics stores](cross-account-sharing.md)

# Creating HealthOmics variant stores
<a name="creating-variant-stores"></a>

The following topics describe how to create HealthOmics variant stores using the console and the API.

**Topics**
+ [Creating a variant store using the console](#gs-console-analytics)
+ [Creating a variant store using the API](#gs-api-analytics)

## Creating a variant store using the console
<a name="gs-console-analytics"></a>

You can create a variant store using the HealthOmics console.

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1. If required, open the left navigation pane (≡). Choose **Variant stores**.

1. On the **Variant stores** page, choose **Create variant store**.

1. On the **Create variant store** page, provide the following information:
   + **Variant store name** - A unique name for this store. 
   + **Description** (optional) - A description of this variant store.
   + **Reference genome** - The reference genome for this variant store.
   + **Data Encryption** - Choose whether you want data encryption to be owned and managed by AWS or by yourself. 
   + **Tags** (optional) - Provide up to 50 tags for this variant store.

1. Choose **Create variant store**.

## Creating a variant store using the API
<a name="gs-api-analytics"></a>

Use the HealthOmics `CreateVariantStore` API operation to create variant stores. You can also perform this operation with the AWS CLI.

To create a variant store, you provide a name for the store and the ARN of a reference genome in a reference store. The variant store is ready to ingest data when its status changes to ACTIVE.

The following example uses the AWS CLI to create a variant store.

```
aws omics create-variant-store --name myvariantstore \
    --reference referenceArn="arn:aws:omics:us-west-2:555555555555:referenceStore/123456789/reference/5987565360"
```

To confirm the creation of your variant store, you receive the following response.

```
{
    "creationTime": "2022-11-03T18:19:52.296368+00:00",
    "id": "45aeb91d5678",
    "name": "myvariantstore",
    "reference": {
        "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/123456789/reference/5987565360"
    },
    "status": "CREATING"
}
```

To learn more about a variant store, use the **get-variant-store** API.

```
aws omics get-variant-store --name myvariantstore
```

You receive the following response.

```
{
    "id": "45aeb91d5678",
    "reference": {
        "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/123456789/reference/5987565360"
    },
    "status": "ACTIVE",
    "storeArn": "arn:aws:omics:us-west-2:555555555555:variantStore/myvariantstore",
    "name": "myvariantstore",
    "creationTime": "2022-11-03T18:19:52.296368+00:00",
    "updateTime": "2022-11-03T18:30:56.272792+00:00",
    "tags": {},
    "storeSizeBytes": 0
}
```
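
Because a new store starts in the `CREATING` status, a script typically polls until the status is `ACTIVE` before it starts an import. The following Python sketch shows one way to do that. The helper name, the delay, the attempt limit, and the choice to treat `FAILED` and `DELETING` as terminal statuses are assumptions for illustration. The `get_store` argument is any callable that returns a response shaped like the preceding example, such as the `get_variant_store` method of a boto3 `omics` client.

```python
import time
from typing import Any, Callable, Dict


def wait_until_active(
    get_store: Callable[..., Dict[str, Any]],
    name: str,
    delay_seconds: float = 15.0,
    max_attempts: int = 40,
) -> Dict[str, Any]:
    """Poll get_store(name=...) until the store status is ACTIVE."""
    for _ in range(max_attempts):
        store = get_store(name=name)
        status = store["status"]
        if status == "ACTIVE":
            return store
        if status in ("FAILED", "DELETING"):
            # Assumed terminal statuses: polling further would not help.
            raise RuntimeError(f"Store {name!r} entered status {status}")
        time.sleep(delay_seconds)
    raise TimeoutError(f"Store {name!r} was not ACTIVE after {max_attempts} attempts")
```

Passing the lookup as a callable keeps the helper independent of any SDK, so the same function can poll variant stores or annotation stores.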

To view all variant stores associated with an account, use the **list-variant-stores** API.

```
aws omics list-variant-stores  
```

You receive a response that lists all variant stores, along with their IDs, statuses, and other details, as shown in the following example response.

```
{
    "variantStores": [
        {
            "id": "45aeb91d5678",
            "reference": {
                "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/5506874698"
            },
            "status": "ACTIVE",
            "storeArn": "arn:aws:omics:us-west-2:555555555555:variantStore/new_variant_store",
            "name": "variantstore",
            "creationTime": "2022-11-03T18:19:52.296368+00:00",
            "updateTime": "2022-11-03T18:30:56.272792+00:00",
            "statusMessage": "",
            "storeSizeBytes": 141526
        }
    ]
}
```

You can also filter the responses for the **list-variant-stores** API based on statuses or other criteria.

VCF files imported into analytics stores created on or after May 15, 2023 have defined schemas for Variant Effect Predictor (VEP) annotations. This makes it easier to query and parse imported VCF data. The change doesn't affect stores created before May 15, 2023, with one exception: if an API or CLI call for one of these stores includes the annotation fields parameter (`--annotation-fields` in the AWS CLI), the request fails.
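
A script that manages stores of different ages can check a store's creation date before it sends the annotation fields parameter. The following Python sketch compares the `creationTime` value from a store description against the May 15, 2023 cutoff. The function name is hypothetical, and treating the cutoff as midnight UTC is an assumption, because the text gives only a date.

```python
from datetime import datetime, timezone

# Cutoff date from the text above; interpreting it as midnight UTC is an assumption.
ANNOTATION_SCHEMA_CUTOFF = datetime(2023, 5, 15, tzinfo=timezone.utc)


def supports_annotation_fields(creation_time: str) -> bool:
    """Return True if a store's creationTime is on or after May 15, 2023."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    normalized = creation_time.replace("Z", "+00:00")
    created = datetime.fromisoformat(normalized)
    if created.tzinfo is None:
        # Assume UTC when the timestamp carries no offset.
        created = created.replace(tzinfo=timezone.utc)
    return created >= ANNOTATION_SCHEMA_CUTOFF
```

For example, a store whose `creationTime` is `2022-11-03T18:19:52.296368+00:00` predates the cutoff, so a caller would omit the parameter for that store.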

# Creating HealthOmics variant store import jobs
<a name="parsing-annotation-stores"></a>

The following example shows how to use the AWS CLI to create an import job for a variant store.

```
aws omics start-variant-import-job \
       --destination-name myvariantstore \
       --no-run-left-normalization \
       --role-arn arn:aws:iam::555555555555:role/roleName \
       --items source=s3://my-omics-bucket/sample.vcf.gz source=s3://my-omics-bucket/sample2.vcf.gz
```

The request parameters have the following JSON structure.

```
{
    "destinationName": "store_a",
    "roleArn": "....",
    "runLeftNormalization": false,
    "items": [
        {"source": "s3://my-omics-bucket/sample.vcf.gz"},
        {"source": "s3://my-omics-bucket/sample2.vcf.gz"}
    ]
}
```

For stores created after May 15, 2023, the following example shows how to add the `--annotation-fields` parameter. The annotation fields are defined with the import.

```
aws omics start-variant-import-job \
   --destination-name annotationparsingvariantstore \
   --role-arn arn:aws:iam::123456789012:role/<role_name> \
   --items source=s3://pathToS3/sample.vcf \
   --annotation-fields '{"VEP": "CSQ"}'
```

```
{
    "jobId": "981e2286-e954-4391-8a97-09aefc343861"
}
```

Use **get-variant-import-job** to check the status. 

```
aws omics get-variant-import-job --job-id 08279950-a9e3-4cc3-9a3c-a574f9c9e229      
```

You'll receive a JSON response that shows the status of your import job. VEP annotations in the VCF file are parsed from the INFO column, where they're stored as an ID/value pair. The default ID for [Ensembl Variant Effect Predictor](https://useast.ensembl.org/info/docs/tools/vep/index.html/#vcf) annotations in the INFO column is CSQ, but you can use the `--annotation-fields` parameter to indicate a custom ID used in the INFO column. Parsing is currently supported for VEP annotations.

For a store created before May 15, 2023 or for VCF files that don't include VEP annotation, the response doesn't include any annotation fields. 

```
{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::555555555555:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00"
}
```

The VEP annotations that are part of VCF files are stored in a predefined schema with the following structure. You can use the `extras` field to store any additional VEP fields that aren't included in the default schema.

```
annotations struct<
   vep: array<struct<
      allele:string,
      consequence: array<string>,
      impact:string,
      symbol:string,
      gene:string,
      `feature_type`: string, 
      feature: string,
      biotype: string,
      exon: struct<rank:string, total:string>,
      intron: struct<rank:string, total:string>,
      hgvsc: string,
      hgvsp: string,
      `cdna_position`: string,
      `cds_position`: string,
      `protein_position`: string,
      `amino_acids`: struct<reference:string, variant: string>,
      codons: struct<reference:string, variant: string>,
      `existing_variation`: array<string>,
      distance: string, 
      strand: string, 
      flags: array<string>,
      symbol_source: string,
      hgnc_id: string,
      `extras`: map<string, string> 
    >>
>
```

Parsing is performed on a best-effort basis. If a VEP entry doesn't follow the [VEP standard specifications](https://useast.ensembl.org/info/docs/tools/vep/vep_formats.html#vcf), it isn't parsed and the corresponding row in the array is empty.
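
To illustrate what parsing an ID/value pair from the INFO column involves, the following Python sketch splits a VEP annotation into one record per entry. It is a simplified illustration, not the parser that HealthOmics uses. The function name is hypothetical, the field names are supplied by the caller (VEP lists them in the VCF header description for the annotation ID), and treating a field-count mismatch as a malformed entry is an assumption.

```python
from typing import Dict, List, Sequence


def parse_vep_info(
    info: str, field_names: Sequence[str], annotation_id: str = "CSQ"
) -> List[Dict[str, str]]:
    """Split the VEP annotation in a VCF INFO string into one dict per entry."""
    for pair in info.split(";"):
        key, separator, value = pair.partition("=")
        if separator and key == annotation_id:
            rows = []
            for entry in value.split(","):      # one entry per transcript consequence
                values = entry.split("|")       # VEP separates fields with "|"
                if len(values) == len(field_names):
                    rows.append(dict(zip(field_names, values)))
                else:
                    rows.append({})             # best effort: empty row for a malformed entry
            return rows
    return []                                   # the INFO column has no VEP annotation
```

The `annotation_id` argument plays the same role as the value passed with `--annotation-fields`: it names the INFO key that holds the VEP annotation.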

For a variant store created on or after May 15, 2023, the response for **get-variant-import-job** includes the annotation fields, as shown.

```
aws omics get-variant-import-job --job-id 08279950-a9e3-4cc3-9a3c-a574f9c9e229      
```

You receive a JSON response that shows the status of your import job.

```
{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::123456789012:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00",
    "annotationFields": {"VEP": "CSQ"}
}
```

You can use **list-variant-import-jobs** to see all import jobs and their statuses.

```
aws omics list-variant-import-jobs --ids 7a1c67e3-b7f9-434d-817b-9c571fd63bea          
```

The response contains information as follows.

```
{
    "variantImportJobs": [
        {
            "creationTime": "2023-04-11T17:52:37.241958+00:00",
            "destinationName": "annotationparsingvariantstore",
            "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
            "roleArn": "arn:aws:iam::555555555555:role/roleName",
            "runLeftNormalization": false,
            "status": "COMPLETED",
            "updateTime": "2023-04-11T17:58:22.676043+00:00",
            "annotationFields": {"VEP": "CSQ"}
        }
    ]
}
```

If necessary, you can cancel an import job with the following command.

```
aws omics cancel-variant-import-job \
     --job-id edd7b8ce-xmpl-47e2-bc99-258cac95a508
```

# Creating HealthOmics annotation stores
<a name="creating-and-managing-annotation-store"></a>

An annotation store is a data store representing an annotation database, such as one from a TSV, VCF, or GFF file. If the same reference genome is specified, annotation stores are mapped to the same coordinate system as variant stores during an import. The following topics show how to use the HealthOmics console and AWS CLI to create and manage annotation stores. 

**Topics**
+ [Creating an annotation store using the console](#gs-console-create-annotation-store)
+ [Creating an annotation store using the API](#create-manage-annotation-store-api)

## Creating an annotation store using the console
<a name="gs-console-create-annotation-store"></a>

Use the following procedure to create annotation stores with the HealthOmics console.

**To create an annotation store**

1. Open the [HealthOmics console](https://console.aws.amazon.com/omics/).

1. If required, open the left navigation pane (≡). Choose **Annotation stores**.

1. On the **Annotation stores** page, choose **Create annotation store**.

1. On the **Create annotation store** page, provide the following information:
   + **Annotation store name** - A unique name for this store.
   + **Description** (optional) - A description of this annotation store.
   + **Data format and schema details** - Select the data file format and upload the schema definition for this store.
   + **Reference genome** - The reference genome for this annotation store.
   + **Data Encryption** - Choose whether you want data encryption to be owned and managed by AWS or by yourself. 
   + **Tags** (optional) - Provide up to 50 tags for this annotation store.

1. Choose **Create annotation store**.

## Creating an annotation store using the API
<a name="create-manage-annotation-store-api"></a>

The following example shows how to create an annotation store using the AWS CLI. For all AWS CLI and API operations, you must specify the format of your data. 

```
aws omics create-annotation-store --name my_annotation_store \
           --store-format GFF \
           --reference referenceArn="arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/5987565360" \
           --version-name my_version
```

You receive the following response to confirm the creation of your annotation store.

```
{
    "creationTime": "2022-08-24T20:34:19.229500Z",
    "id": "3b93cdef69d2",
    "name": "my_annotation_store",
    "reference": {
        "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/5987565360"
    },
    "status": "CREATING",
    "versionName": "my_version"
}
```

To learn more about an annotation store, use the **get-annotation-store** API.

```
aws omics get-annotation-store --name my_annotation_store
```

You receive the following response.

```
{
    "id": "eeb019ac79c2",
    "reference": {
        "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/5638433913/reference/5871590330"
    },
    "status": "ACTIVE",
    "storeArn": "arn:aws:omics:us-west-2:555555555555:annotationStore/gffstore",
    "name": "my_annotation_store",
    "creationTime": "2022-11-05T00:05:19.136131+00:00",
    "updateTime": "2022-11-05T00:10:36.944839+00:00",
    "tags": {},
    "storeFormat": "GFF",
    "statusMessage": "",
    "storeSizeBytes": 0,
    "numVersions": 1
}
```

To view all annotation stores associated with an account, use the **list-annotation-stores** API operation.

```
aws omics list-annotation-stores 
```

You receive a response that lists all annotation stores, along with their IDs, statuses, and other details, as shown in the following example response.

```
{
    "annotationStores": [
        {
            "id": "4d8f3eada259",
            "reference": {
                "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/5638433913/reference/5871590330"
            },
            "status": "CREATING",
            "name": "gffstore",
            "creationTime": "2022-09-27T17:30:52.182990+00:00",
            "updateTime": "2022-09-27T17:30:53.025362+00:00"
        }
    ]
}
```

You can also filter responses based on status or other criteria.

# Creating import jobs for HealthOmics annotation stores
<a name="annotation-store-import-jobs"></a>

**Topics**
+ [Creating an annotation import job using the API](#create-annotation-import-api)
+ [Additional parameters for TSV and VCF formats](#annotation-import-tsv-vcf)
+ [Creating TSV formatted annotation stores](#annotation-import-tsv-vcftsv-annotation-store-examples-tsv)
+ [Starting VCF formatted import jobs](#vcf-annotation-store-examples)

## Creating an annotation import job using the API
<a name="create-annotation-import-api"></a>

The following example shows how to use the AWS CLI to start an annotation import job.

```
aws omics start-annotation-import-job \
           --destination-name myannostore \
           --version-name myannostore \
           --role-arn arn:aws:iam::123456789012:role/roleName \
           --items source=s3://my-omics-bucket/sample.vcf.gz \
           --annotation-fields '{"VEP": "CSQ"}'
```

Annotation stores created before May 15, 2023 return an error message if the **annotation-fields** parameter is included. They don't return output for any API operations involved with annotation store import jobs.

You can then use the **get-annotation-import-job** API operation with the `job-id` parameter to learn more details about the annotation import job.

```
aws omics get-annotation-import-job --job-id 9e4198fb-fa85-446c-9301-9b823a1a8ba8         
```

You receive the following response, including the annotation fields.

```
{
    "creationTime": "2023-04-11T19:09:25.049767+00:00",
    "destinationName": "parsingannotationstore",
    "versionName": "parsingannotationstore",
    "id": "9e4198fb-fa85-446c-9301-9b823a1a8ba8",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://my-omics-bucket/sample.vep.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::555555555555:role/roleName",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T19:13:09.110130+00:00",
    "annotationFields": {"VEP": "CSQ"}
}
```

To view all annotation store import jobs, use **list-annotation-import-jobs**.

```
aws omics list-annotation-import-jobs --ids 9e4198fb-fa85-446c-9301-9b823a1a8ba8          
```

The response includes the details and statuses of your annotation store import jobs.

```
{
    "annotationImportJobs": [
        {
            "creationTime": "2023-04-11T19:09:25.049767+00:00",
            "destinationName": "parsingannotationstore",
            "versionName": "parsingannotationstore",
            "id": "9e4198fb-fa85-446c-9301-9b823a1a8ba8",
            "roleArn": "arn:aws:iam::555555555555:role/roleName",
            "runLeftNormalization": false,
            "status": "COMPLETED",
            "updateTime": "2023-04-11T19:13:09.110130+00:00",
            "annotationFields": {"VEP": "CSQ"}
        }
    ]
}
```

## Additional parameters for TSV and VCF formats
<a name="annotation-import-tsv-vcf"></a>

For TSV and VCF formats, there are additional parameters that inform the API of how to parse your input.

**Important**  
 CSV annotation data that's exported with query engines directly returns information from the dataset import. If the imported data contains formulas or commands, the file might be subject to CSV injection. Therefore, files exported with query engines can prompt security warnings. To avoid malicious activity, turn off links and macros when reading export files. 

The TSV parser also performs basic bioinformatics operations, like left normalization and standardization of genomic coordinates, for the format types that are listed in the following table.


| Format type | Description | 
| --- | --- | 
| Generic | Generic text file. No genomic information. | 
| CHR\_POS | Contains contig and 1-based position. The start position is POS - 1, and an end position is added that is the same as POS. | 
| CHR\_POS\_REF\_ALT | Contains contig, 1-based position, ref and alt allele information. | 
| CHR\_START\_END\_REF\_ALT\_ONE\_BASE | Contains contig, start, end, ref and alt allele information. Coordinates are 1-based. | 
| CHR\_START\_END\_ZERO\_BASE | Contains contig, start, and end positions. Coordinates are 0-based. | 
| CHR\_START\_END\_ONE\_BASE | Contains contig, start, and end positions. Coordinates are 1-based. | 
| CHR\_START\_END\_REF\_ALT\_ZERO\_BASE | Contains contig, start, end, ref and alt allele information. Coordinates are 0-based. | 
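
As a rough illustration of coordinate standardization, the following Python sketch maps three of the format types to a zero-based start and an exclusive end. Only the `CHR_POS` rule (start is POS - 1, end is POS) comes from the table. The function name is hypothetical, and the handling of the other two types rests on assumptions: that 1-based inputs are closed intervals, and that 0-based inputs already exclude the end position.

```python
from typing import Optional, Tuple


def standardize_coordinates(
    annotation_type: str, first: int, end: Optional[int] = None
) -> Tuple[int, int]:
    """Return a zero-based (start, end) pair with an exclusive end."""
    if annotation_type == "CHR_POS":
        # Rule from the table: start is POS - 1, end is the same as POS.
        return first - 1, first
    if end is None:
        raise ValueError(f"{annotation_type} requires both a start and an end")
    if annotation_type == "CHR_START_END_ONE_BASE":
        # Assumption: 1-based closed interval, so only the start shifts.
        return first - 1, end
    if annotation_type == "CHR_START_END_ZERO_BASE":
        # Assumption: already zero-based with an exclusive end.
        return first, end
    raise ValueError(f"Unsupported annotation type in this sketch: {annotation_type}")
```

Under these assumptions, a `CHR_POS` row at position 100 and a `CHR_START_END_ONE_BASE` row from 100 to 100 both describe the same single base.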

A request to start a TSV annotation import job looks like the following example.

```
aws omics start-annotation-import-job \
--destination-name tsv_anno_example \
--role-arn arn:aws:iam::555555555555:role/demoRole \
--items source=s3://demodata/genomic_data.bed.gz \
--format-options '{ "tsvOptions": {
        "readOptions": {
            "header": false,
            "sep": "\t"
        }
    }
}'
```
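
Hand-written JSON inside shell quotes is easy to get wrong, so a script might build the format options programmatically instead. The following Python sketch produces a JSON string with the `tsvOptions` and `readOptions` keys used in this topic. The function name and its defaults are hypothetical.

```python
import json
from typing import Optional


def tsv_format_options(header: bool, sep: str = "\t", comment: Optional[str] = None) -> str:
    """Build the JSON string for the format options of a TSV import job."""
    read_options = {"header": header, "sep": sep}
    if comment is not None:
        # Include the comment marker only when the file contains comment lines.
        read_options["comment"] = comment
    return json.dumps({"tsvOptions": {"readOptions": read_options}})


if __name__ == "__main__":
    # json.dumps escapes the tab character as \t, as in the preceding CLI example.
    print(tsv_format_options(header=False))
```

The resulting string can be passed as the value of `--format-options`.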

## Creating TSV formatted annotation stores
<a name="annotation-import-tsv-vcftsv-annotation-store-examples-tsv"></a>

The following example creates an annotation store using a tab-delimited file that contains a header, rows, and comments. The coordinates are `CHR_START_END_ONE_BASE`, and the file contains the HG19 gene map from [OMIM's Synopsis of the Human Gene Map](https://www.omim.org/downloads).

```
aws omics create-annotation-store --name mimgenemap \
  --store-format TSV \
  --reference=referenceArn=arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/2310864158 \
  --store-options=tsvStoreOptions='{
    annotationType=CHR_START_END_ONE_BASE,  
    formatToHeader={CHR=chromosome, START=genomic_position_start, END=genomic_position_end},
    schema=[
      {chromosome=STRING}, 
      {genomic_position_start=LONG}, 
      {genomic_position_end=LONG}, 
      {cyto_location=STRING}, 
      {computed_cyto_location=STRING}, 
      {mim_number=STRING}, 
      {gene_symbols=STRING}, 
      {gene_name=STRING}, 
      {approved_gene_name=STRING}, 
      {entrez_gene_id=STRING}, 
      {ensembl_gene_id=STRING}, 
      {comments=STRING}, 
      {phenotypes=STRING}, 
      {mouse_gene_symbol=STRING}]}'
```

You can import files with or without a header. To indicate that a file doesn't have a header, use `header=false` in the CLI request, as shown in the following import job example.

```
aws omics start-annotation-import-job \
   --role-arn arn:aws:iam::555555555555:role/demoRole \
   --items=source=s3://amzn-s3-demo-bucket/annotation-examples/hg38_genemap2.txt \
   --destination-name output-bucket \
   --format-options=tsvOptions='{readOptions={sep="\t",header=false,comment="#"}}'
```

The following example creates an annotation store for a BED file. A BED file is a simple tab-delimited file. In this example, the columns are chromosome, start, end, and region name. The coordinates are zero-based, and the data doesn't have a header.

```
aws omics create-annotation-store \
   --name cexbed --store-format TSV \
   --reference=referenceArn=arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/2310864158 \
   --store-options=tsvStoreOptions='{
   annotationType=CHR_START_END_ZERO_BASE,  
   formatToHeader={CHR=chromosome, START=start, END=end}, 
   schema=[{chromosome=STRING}, {start=LONG}, {end=LONG}, {name=STRING}]}'
```

You can then import the BED file into the annotation store by using the following CLI command.

```
aws omics start-annotation-import-job \
   --role-arn arn:aws:iam::555555555555:role/demoRole \
   --items=source=s3://amzn-s3-demo-bucket/TruSeq_Exome_TargetedRegions_v1.2.bed \
   --destination-name cexbed \
   --format-options=tsvOptions='{readOptions={sep="\t",header=false,comment="#"}}'
```

The following example creates an annotation store for a tab-delimited file that contains the first few columns of a VCF file, followed by columns with annotation information. The file contains genome positions with information on the chromosome, start, and reference and alternate alleles, and it has a header.

```
aws omics create-annotation-store --name gnomadchrx --store-format TSV \
--reference=referenceArn=arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/2310864158 \
--store-options=tsvStoreOptions='{
    annotationType=CHR_POS_REF_ALT, 
    formatToHeader={CHR=chromosome, POS=start, REF=ref, ALT=alt}, 
    schema=[
        {chromosome=STRING}, 
        {start=LONG}, 
        {ref=STRING}, 
        {alt=STRING}, 
        {filters=STRING}, 
        {ac_hom=STRING}, 
        {ac_het=STRING},
        {af_hom=STRING}, 
        {af_het=STRING}, 
        {an=STRING}, 
        {max_observed_heteroplasmy=STRING}]}'
```

You can then import the file into the annotation store by using the following CLI command.

```
aws omics start-annotation-import-job \
  --role-arn arn:aws:iam::555555555555:role/demoRole \
   --items=source=s3://amzn-s3-demo-bucket/gnomad.genomes.v3.1.sites.chrM.reduced_annotations.tsv \
   --destination-name gnomadchrx \
   --format-options=tsvOptions='{readOptions={sep="\t",header=true,comment="#"}}'
```

The following example shows how to create an annotation store for a mim2gene file. A mim2gene file provides the links between the genes in OMIM and another gene identifier. It's tab-delimited and contains comments.

```
aws omics create-annotation-store \
  --name mim2gene \
  --store-format TSV \
  --reference=referenceArn=arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/2310864158 \
  --store-options=tsvStoreOptions='
    {annotationType=GENERIC,      
    formatToHeader={}, 
    schema=[
        {mim_gene_id=STRING}, 
        {mim_type=STRING}, 
        {entrez_id=STRING}, 
        {hgnc=STRING}, 
        {ensembl=STRING}]}'
```

You can then import data into your store as follows.

```
aws omics start-annotation-import-job \
   --role-arn arn:aws:iam::555555555555:role/demoRole \
   --items=source=s3://amzn-s3-demo-bucket/annotation-examples/mim2gene.txt \
   --destination-name mim2gene \
   --format-options=tsvOptions='{readOptions={sep="\t",header=false,comment="#"}}'
```

## Starting VCF formatted import jobs
<a name="vcf-annotation-store-examples"></a>

For VCF files, there are two additional inputs, `ignoreQualField` and `ignoreFilterField`, that control whether the QUAL and FILTER fields are ignored or included, as shown.

```
aws omics start-annotation-import-job --destination-name annotation_example \
  --role-arn arn:aws:iam::555555555555:role/demoRole \
  --items source=s3://demodata/example.garvan.vcf \
  --format-options '{ "vcfOptions": {
    "ignoreQualField": false,
    "ignoreFilterField": false         
    }
   }'
```

You can also cancel an annotation store import, as shown. If the cancellation succeeds, you don't receive a response to this AWS CLI call. However, if the import job ID isn't found or the import job is completed, you receive an error message. 

```
aws omics cancel-annotation-import-job --job-id edd7b8ce-xmpl-47e2-bc99-258cac95a508
```

**Note**  
Your metadata import job history for **get-annotation-import-job**, **get-variant-import-job**, **list-annotation-import-jobs**, and **list-variant-import-jobs** is auto-deleted after two years. The variant and annotation data that's imported isn't auto-deleted and remains in your data stores.

# Creating HealthOmics annotation store versions
<a name="annotation-store-versioning"></a>

You can create new versions of annotation stores to collect different versions of your annotation databases. This helps you organize your annotation data, which is updated regularly.

To create a new version of an existing annotation store, use the **create-annotation-store-version** API as shown in the following example.

```
aws omics create-annotation-store-version \
     --name my_annotation_store \
     --version-name my_version
```

You receive the following response with the annotation store version ID, confirming that a new version of your annotation store has been created.

```
{
     "creationTime": "2023-07-21T17:15:49.251040+00:00",
     "id": "3b93cdef69d2",
     "name": "my_annotation_store",
     "reference": {
         "referenceArn": "arn:aws:omics:us-west-2:555555555555:referenceStore/6505293348/reference/5987565360"
     },
     "status": "CREATING",
     "versionName": "my_version"
}
```

To update the description of an annotation store version, use **update-annotation-store-version**, as shown in the following example.

```
aws omics update-annotation-store-version \
    --name my_annotation_store \
    --version-name my_version \
    --description "New Description"
```

You will receive the following response, confirming that the annotation store version has been updated.

```
{
     "storeId": "4934045d1c6d",
     "id": "2a3f4a44aa7b",
     "description": "New Description",
     "status": "ACTIVE",
     "name": "my_annotation_store",
     "versionName": "my_version",
     "creationTime": "2023-07-21T17:20:59.380043+00:00",
     "updateTime": "2023-07-21T17:26:17.892034+00:00"
}
```

To view the details of an annotation store version, use **get-annotation-store-version**.

```
aws omics get-annotation-store-version --name my_annotation_store --version-name my_version              
```

You will receive a response with the version name, status, and other details.

```
{
     "storeId": "4934045d1c6d",
     "id": "2a3f4a44aa7b",
     "status": "ACTIVE",
     "versionArn": "arn:aws:omics:us-west-2:555555555555:annotationStore/my_annotation_store/version/my_version",
     "name": "my_annotation_store",
     "versionName": "my_version",
     "creationTime": "2023-07-21T17:15:49.251040+00:00",
     "updateTime": "2023-07-21T17:15:56.434223+00:00",
     "statusMessage": "",
     "versionSizeBytes": 0
    }
```

To view all versions of an annotation store, you can use **list-annotation-store-versions**, as shown in the following example.

```
aws omics list-annotation-store-versions --name my_annotation_store
```

You receive a response with the following information.

```
{
  "annotationStoreVersions": [
    {
     "storeId": "4934045d1c6d",
     "id": "2a3f4a44aa7b",
     "status": "CREATING",
     "versionArn": "arn:aws:omics:us-west-2:555555555555:annotationStore/my_annotation_store/version/my_version_2",
     "name": "my_annotation_store",
     "versionName": "my_version_2",
     "creationTime": "2023-07-21T17:20:59.380043+00:00",
     "versionSizeBytes": 0
    },
    {
     "storeId": "4934045d1c6d",
     "id": "4934045d1c6d",
     "status": "ACTIVE",
     "versionArn": "arn:aws:omics:us-west-2:555555555555:annotationStore/my_annotation_store/version/my_version_1",
     "name": "my_annotation_store",
     "versionName": "my_version_1",
     "creationTime": "2023-07-21T17:15:49.251040+00:00",
     "updateTime": "2023-07-21T17:15:56.434223+00:00",
     "statusMessage": "",
     "versionSizeBytes": 0
    }
  ]
}
```
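
When a store has many versions, you might want only the names of the versions that are ready to query. The following sketch uses the global `--query` option (a JMESPath expression) to filter the list on the client side. The function name `list_active_versions` is hypothetical.

```shell
# Hypothetical helper: print the names of all ACTIVE versions of an annotation store.
# Usage: list_active_versions STORE_NAME
list_active_versions() {
    aws omics list-annotation-store-versions \
        --name "$1" \
        --query "annotationStoreVersions[?status=='ACTIVE'].versionName" \
        --output text
}
```

With the example response above, `list_active_versions my_annotation_store` would print only `my_version_1`, because `my_version_2` is still in the `CREATING` state.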

If you no longer need an annotation store version, use **delete-annotation-store-versions** to delete it, as shown in the following example.

```
aws omics delete-annotation-store-versions --name my_annotation_store --versions my_version  
```

If the store version is deleted without errors, you will receive the following response.

```
{
  "errors": []
}
```

If there are errors, you will receive a response with the details of the errors, as shown.

```
{
  "errors": [
    {
      "versionName": "my_version",
      "message": "Version with versionName: my_version was not found."
    }
  ]
}
```

If you try to delete an annotation store version that has an active import job, you will receive a response with an error, as shown.

```
{
  "errors": [
    {
      "versionName": "my_version",
      "message": "version has an inflight import running"
    }
  ]
}
```

In this case, you can force deletion of the annotation store version, as shown in the following example.

```
aws omics delete-annotation-store-versions --name my_annotation_store --versions my_version --force 
```
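
Because **delete-annotation-store-versions** reports failures in the `errors` array rather than through a non-zero exit code alone, a script should inspect that array. The following sketch counts the entries with the JMESPath expression `length(errors)` and passes `--force` only when the caller asks for it. The function name `delete_version` is hypothetical.

```shell
# Hypothetical helper: delete one annotation store version and check the "errors" array.
# Usage: delete_version STORE_NAME VERSION_NAME [--force]
delete_version() {
    store_name="$1"
    version_name="$2"
    force_flag="${3:-}"
    # $force_flag is intentionally unquoted so that an empty value adds no argument.
    error_count=$(aws omics delete-annotation-store-versions \
        --name "$store_name" \
        --versions "$version_name" \
        $force_flag \
        --query 'length(errors)' --output text) || return 1
    if [ "$error_count" -eq 0 ]; then
        echo "Deleted $version_name"
    else
        echo "Could not delete $version_name ($error_count errors reported)" >&2
        return 1
    fi
}
```

Forcing the deletion stays an explicit decision by the caller, for example `delete_version my_annotation_store my_version --force`.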

# Deleting HealthOmics analytics stores
<a name="deleting-a-store-examples"></a>

When you delete a variant or annotation store, the system also deletes all imported data in that store and any associated tags.

The following example shows how to delete a variant store using the AWS CLI. If the action is successful, the variant store status transitions to `DELETING`.

```
aws omics delete-variant-store --name <variant-store-name>
```

The following example shows how to delete an annotation store. If the action is successful, the annotation store status transitions to `DELETING`. Annotation stores can't be deleted if more than one version exists.

```
aws omics delete-annotation-store --name <annotation-store-name>
```
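
Because an annotation store can't be deleted while more than one version exists, a script can check the version count first. The following is a minimal sketch under that assumption. The function name is hypothetical, and the sketch identifies the store with `--name`, as the version commands in this guide do. Verify the parameter against your AWS CLI version.

```shell
# Hypothetical helper: delete an annotation store only when at most one version remains.
# Usage: delete_annotation_store_if_single_version STORE_NAME
delete_annotation_store_if_single_version() {
    store_name="$1"
    # JSON output yields a single number even if the CLI paginates the list.
    version_count=$(aws omics list-annotation-store-versions \
        --name "$store_name" \
        --query 'length(annotationStoreVersions)' --output json) || return 1
    if [ "$version_count" -gt 1 ]; then
        echo "$store_name still has $version_count versions; delete the extra versions first" >&2
        return 1
    fi
    aws omics delete-annotation-store --name "$store_name"
}
```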

# Querying HealthOmics analytics data
<a name="analytics-query-data"></a>

You can perform queries on your variant stores using AWS Lake Formation and Amazon Athena or Amazon EMR. Before you run any queries, complete the setup procedures (described in the following sections) for Lake Formation and Amazon Athena.

For information about Amazon EMR, see [Tutorial: Getting started with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html).

For variant stores created after Sept 26, 2024, HealthOmics partitions the store by sample ID. This partitioning means that HealthOmics uses the sample ID to optimize storing of the variant information. Queries that use sample information as filters will return results faster, as the query scans less data. 
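
To illustrate the benefit of partitioning, the following sketch builds a query that filters on the sample ID, so Athena can skip partitions for other samples. The table name `omicsvariants` and the `sampleid` column come from the query examples in this guide. The function name `build_sample_query` and the sample ID are hypothetical.

```shell
# Hypothetical helper: print an Athena query that filters the variant table by sample ID.
# Usage: build_sample_query SAMPLE_ID [ROW_LIMIT]
# The sample ID is inserted into the SQL text as-is, so pass only trusted values.
build_sample_query() {
    sample_id="$1"
    row_limit="${2:-10}"
    printf "SELECT * FROM omicsvariants WHERE sampleid = '%s' LIMIT %s\n" \
        "$sample_id" "$row_limit"
}
```

For example, `build_sample_query NA12878 5` prints a query that you can paste into the Athena query editor.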

HealthOmics uses sample IDs as partition file names. Before you ingest data, check whether the sample ID contains any protected health information (PHI). If it does, change the sample ID before you ingest the data. For more information about what content to include and not include in sample IDs, see the guidance on the AWS [HIPAA compliance](https://aws.amazon.com/compliance/hipaa-compliance) web page.

**Topics**
+ [Configuring Lake Formation to use HealthOmics](setting-up-lf.md)
+ [Configuring Athena for queries](analytics-setting-up-athena.md)
+ [Running queries on HealthOmics variant stores](analytics-run-queries.md)

# Configuring Lake Formation to use HealthOmics
<a name="setting-up-lf"></a>

Before you use Lake Formation to manage HealthOmics data stores, perform the following Lake Formation configuration procedures.

**Topics**
+ [Creating or verifying Lake Formation administrators](#create-lf-admins)
+ [Creating resource links using the Lake Formation console](#create-resource-links)
+ [Configuring permissions for AWS RAM resource shares](#configure-lf-permissions)

## Creating or verifying Lake Formation administrators
<a name="create-lf-admins"></a>

Before you can create a data lake in Lake Formation, you must define one or more administrators.

Administrators are users and roles with permissions to create resource links. You set up data lake administrators per account per region.

**Create an admin user in the Lake Formation console**

1. Open the AWS Lake Formation console: [Lake Formation console](https://console.aws.amazon.com//lakeformation)

1. If the console displays the **Welcome to Lake Formation** panel, choose **Get started**.

   Lake Formation adds you to the **Data lake administrators** table.

1. Otherwise, from the left menu, choose **Administrative roles and tasks**.

1. Add any additional administrators as required.

## Creating resource links using the Lake Formation console
<a name="create-resource-links"></a>

To make a shared resource that users can query, the default access controls must be disabled. To learn more about disabling default access controls, see [Changing the default security settings for your data lake](https://docs.aws.amazon.com/lake-formation/latest/dg/change-settings.html) in the Lake Formation documentation. You can create resource links individually or as a group, so that you can access data in Amazon Athena or other AWS services (such as Amazon EMR).

**Creating resource links in the AWS Lake Formation console and sharing them with HealthOmics Analytics users**

1. Open the AWS Lake Formation console: [Lake Formation console](https://console.aws.amazon.com//lakeformation)

1. In the primary navigation bar, choose **Databases**.

1. In the **Databases** table, select the desired database.

1. From the **Create** menu, choose **Resource link**.

1. Enter a **Resource link name**. If you plan to access the database from Athena, enter a name using only lowercase letters (up to 256 characters).

1. Choose **Create**.

1. The new resource link is now listed under **Databases**.

### Grant access to the shared resource using the Lake Formation console
<a name="create-resource-links"></a>

A Lake Formation database administrator can grant access to the shared resource using the following procedure.

1. Open the AWS Lake Formation console: [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com//lakeformation)

1. In the primary navigation bar, choose **Databases**.

1. On the **Databases** page, select the resource link you previously created.

1. From the **Actions** menu, choose **Grant on target**.

1. On the **Grant data permissions** page under **Principals**, choose **IAM users or roles**.

1. From the **IAM users or roles** drop-down menu, find the user to which you want to grant access.

1. Next, in the **LF-Tags or catalog resources** card, select the **Named data catalog resources** option.

1. From the **Tables-optional** drop-down menu, select **All Tables** or the table that you previously created.

1. In the **Table permissions** card, under **Table permissions** choose **Describe** and **Select**.

1. Next, choose **Grant**.

To view the Lake Formation permissions, choose **Data lake permissions** from the primary navigation pane. The table shows the available databases and resource links.

## Configuring permissions for AWS RAM resource shares
<a name="configure-lf-permissions"></a>

In the AWS Lake Formation console, view the permissions by choosing **Data lake permissions** in the primary navigation bar. On the **Data permissions** page, a table shows the **Resource types**, **Databases**, and **ARN** values that are related to a shared resource under **RAM Resource Share**. If you need to accept an AWS Resource Access Manager (AWS RAM) resource share, AWS Lake Formation notifies you in the console.

HealthOmics can implicitly accept the AWS RAM resource shares during store creation. To accept the AWS RAM resource share, the IAM user or role that calls the `CreateVariantStore` or `CreateAnnotationStore` API operations must allow the following actions:
+ `ram:GetResourceShareInvitations` - This action allows HealthOmics to find the invitations.
+ `ram:AcceptResourceShareInvitation` - This action allows HealthOmics to accept the invitation by using an FAS token.

Without these permissions, you see an authorization error during store creation.

Here is a sample policy that includes these actions. Add this policy to the IAM user or role that accepts the AWS RAM resource share.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "omics:*",
        "ram:AcceptResourceShareInvitation",
        "ram:GetResourceShareInvitations"
      ],
      "Resource": "*"
    }
  ]
}
```

------
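
One way to apply the sample policy is to save it to a file and attach it to a role as an inline policy with **aws iam put-role-policy**. The following is a minimal sketch. The function name, the policy name `HealthOmicsStoreCreation`, and the role and file names are hypothetical. For an IAM user, the equivalent command is **aws iam put-user-policy**.

```shell
# Hypothetical helper: attach a saved policy document to an IAM role as an inline policy.
# Usage: attach_store_policy ROLE_NAME POLICY_FILE
attach_store_policy() {
    role_name="$1"
    policy_file="$2"
    aws iam put-role-policy \
        --role-name "$role_name" \
        --policy-name "HealthOmicsStoreCreation" \
        --policy-document "file://$policy_file"
}
```

For example, `attach_store_policy my-omics-role policy.json` attaches the policy stored in `policy.json` in the current directory.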

# Configuring Athena for queries
<a name="analytics-setting-up-athena"></a>

You can use Athena to query variants and annotations. Before you run any queries, perform the following setup tasks:

**Topics**
+ [Configure a query results location using the Athena console](#configure-athena-query)
+ [Configure a workgroup with Athena engine v3](#configure-athena-workgroup)

## Configure a query results location using the Athena console
<a name="configure-athena-query"></a>

To configure a query results location, follow these steps.

1. Open the Athena console: [Athena console](https://console.aws.amazon.com//athena)

1. In the primary navigation bar, choose **Query editor**.

1. In the query editor, choose the **Settings** tab, then choose **Manage**.

1. Enter an S3 prefix of a location to save the query result.

## Configure a workgroup with Athena engine v3
<a name="configure-athena-workgroup"></a>

To configure a workgroup, follow these steps.

1. Open the Athena console: [Athena console](https://console.aws.amazon.com//athena)

1. In the primary navigation bar, choose **Workgroups**, then **Create workgroup**.

1. Enter a name for the workgroup.

1. Select **Athena SQL** as the type of engine.

1. Under **Upgrade query engine**, select **Manual**.

1. Under **Query engine version**, select **Athena version 3**.

1. Choose **Create workgroup**.
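
The console steps above can also be sketched with the AWS CLI. The following example creates a workgroup that is pinned to Athena engine version 3 and sets its query results location in the same call. The function name, workgroup name, and S3 location are hypothetical.

```shell
# Hypothetical helper: create an Athena workgroup pinned to engine version 3.
# Usage: create_v3_workgroup WORKGROUP_NAME S3_OUTPUT_LOCATION
create_v3_workgroup() {
    workgroup_name="$1"
    output_location="$2"
    # The configuration is passed as JSON because the engine version value contains spaces.
    aws athena create-work-group \
        --name "$workgroup_name" \
        --configuration "{\"EngineVersion\":{\"SelectedEngineVersion\":\"Athena engine version 3\"},\"ResultConfiguration\":{\"OutputLocation\":\"$output_location\"}}"
}
```

For example, `create_v3_workgroup omics-queries s3://amzn-s3-demo-bucket/athena-results/` creates the workgroup and stores query results under the given prefix.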

# Running queries on HealthOmics variant stores
<a name="analytics-run-queries"></a>

You can perform queries on your variant store using Amazon Athena. Note that genomic coordinates in variant and annotation stores are represented as zero-based, half-closed half-open intervals.
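
VCF files use one-based positions, so it can help to see how a VCF record maps to a zero-based, half-open interval. The following sketch assumes that the interval spans the reference allele, so that `start` is `POS - 1` and `end` is `start` plus the length of `REF`. This mapping is an assumption for illustration. Confirm it against your own imported data. The function name `vcf_to_interval` is hypothetical.

```shell
# Hypothetical helper: convert a one-based VCF position and reference allele
# to a zero-based, half-open interval. Prints "START END".
# Usage: vcf_to_interval POS REF
vcf_to_interval() {
    pos="$1"
    ref="$2"
    start=$((pos - 1))          # zero-based start is one less than the VCF position
    end=$((start + ${#ref}))    # exclusive end: start plus the reference allele length
    echo "$start $end"
}
```

For example, a single-base variant at VCF position 100 maps to the interval that starts at 99 and ends at 100, and a three-base reference allele at the same position ends at 102.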

## Run a simple query using the Athena console
<a name="run-queries-athena-simple"></a>

The following example shows how to run a simple query.

1. Open the Athena Query editor: [Athena Query editor](https://console.aws.amazon.com//athena)

1. Under **Workgroup**, select the workgroup that you created during setup.

1. Verify that **Data source** is **AwsDataCatalog**.

1. For **Database**, select the database resource link that you created during the Lake Formation setup.

1. Copy the following query into the **Query Editor** under the **Query 1** tab:

   ```
   SELECT * from omicsvariants limit 10
   ```

1. Choose **Run** to run the query. The console populates the results table with the first 10 rows of the **omicsvariants** table.
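
The same query can be run outside the console. The following sketch submits a query with **aws athena start-query-execution**, polls **get-query-execution** until the query finishes, and then prints the results. It assumes that the workgroup already has a query results location configured. The function name, the polling interval, and the attempt limit are hypothetical.

```shell
# Hypothetical helper: run an Athena query and print its results.
# Usage: run_athena_query QUERY_STRING WORKGROUP DATABASE [MAX_ATTEMPTS]
run_athena_query() {
    query_string="$1"
    workgroup="$2"
    database="$3"
    max_attempts="${4:-60}"
    query_id=$(aws athena start-query-execution \
        --query-string "$query_string" \
        --work-group "$workgroup" \
        --query-execution-context "Database=$database,Catalog=AwsDataCatalog" \
        --query 'QueryExecutionId' --output text) || return 1
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        state=$(aws athena get-query-execution \
            --query-execution-id "$query_id" \
            --query 'QueryExecution.Status.State' --output text) || return 1
        case "$state" in
            SUCCEEDED)
                aws athena get-query-results --query-execution-id "$query_id" --output text
                return
                ;;
            FAILED|CANCELLED)
                echo "Query $query_id ended in state $state" >&2
                return 1
                ;;
        esac
        sleep 2
        attempt=$((attempt + 1))
    done
    echo "Timed out waiting for query $query_id" >&2
    return 1
}
```

For example, `run_athena_query 'SELECT * FROM omicsvariants LIMIT 10' my_workgroup my_resource_link` runs the simple query against the database resource link that you created during the Lake Formation setup.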

## Run a complex query using the Athena console
<a name="run-queries-athena-complex"></a>

The following example shows how to run a complex query. To run this query, import `ClinVar` into the annotation store.

**Run a complex query**

1. Open the Athena Query editor: [Athena Query editor](https://console.aws.amazon.com//athena)

1. Under **Workgroup**, select the workgroup that you created during setup.

1. Verify that **Data source** is **AwsDataCatalog**.

1. For **Database**, select the database resource link that you created during the Lake Formation setup.

1. Choose the **+** at the top right to create a new query tab named **Query 2**.

1. Copy the following query into the **Query Editor** under the **Query 2** tab:

   ```
   SELECT variants.sampleid,
     variants.contigname,
     variants.start,
     variants."end",
     variants.referenceallele,
     variants.alternatealleles,
     variants.attributes AS variant_attributes,
     clinvar.attributes AS clinvar_attributes  
   FROM omicsvariants as variants 
   INNER JOIN omicsannotations as clinvar ON 
     variants.contigname=CONCAT('chr',clinvar.contigname) 
     AND variants.start=clinvar.start 
     AND variants."end"=clinvar."end" 
     AND variants.referenceallele=clinvar.referenceallele 
     AND variants.alternatealleles=clinvar.alternatealleles 
   WHERE clinvar.attributes['CLNSIG']='Likely_pathogenic'
   ```

1. Choose **Run** to start running the query. 

# Sharing HealthOmics analytics stores
<a name="cross-account-sharing"></a>

As the owner of a variant store or an annotation store, you can share the store with other AWS accounts. The owner can revoke access to the shared resource by deleting the share. 

As the subscriber to a shared store, you first accept the share. You can then define workflows that use the shared store. The data shows up as a table in both AWS Glue and Lake Formation.

When you no longer need access to the store, you delete the share.

See [Cross-account resource sharing in AWS HealthOmics](resource-sharing.md) for additional information about resource sharing. 

## Creating a store share
<a name="sharing-create"></a>

To create a store share, use the **create-share** API operation. The principal subscriber is the AWS account of the user who will subscribe to the share. The following example creates a share for a variant store. To share a store with more than one account, you create multiple shares of the same store.

```
aws omics create-share  \
        --resource-arn "arn:aws:omics:us-west-2:555555555555:variantStore/omics_dev_var_store" \
        --principal-subscriber "123456789012" \
        --name "my_Share-123"
```

If the share is created successfully, you receive a response with the share ID and status.

```
{
       "shareId": "495c21bedc889d07d0ab69d710a6841e-dd75ab7a1a9c384fa848b5bd8e5a7e0a",
       "name": "my_Share-123",
       "status": "PENDING"
}
```

The share remains in pending state until the subscriber accepts it using the accept-share API operation.
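
On the subscriber side, accepting the share and confirming its state can be sketched as follows. The example calls **accept-share** and then reads the status from **get-share**. The function name `accept_and_check_share` is hypothetical, and the share might still be activating when the status is first read.

```shell
# Hypothetical helper: accept a share as the subscriber and print its current status.
# Usage: accept_and_check_share SHARE_ID
accept_and_check_share() {
    share_id="$1"
    # Discard the accept-share response; the status is read from get-share below.
    aws omics accept-share --share-id "$share_id" >/dev/null || return 1
    aws omics get-share --share-id "$share_id" --query 'share.status' --output text
}
```

Run the function with the share ID from the **create-share** response.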