Creating HealthOmics variant store import jobs - AWS HealthOmics

The following example shows how to use the AWS CLI to create an import job for a variant store.

aws omics start-variant-import-job \
    --destination-name myvariantstore \
    --no-run-left-normalization \
    --role-arn arn:aws:iam::555555555555:role/roleName \
    --items source=s3://my-omics-bucket/sample.vcf.gz source=s3://my-omics-bucket/sample2.vcf.gz

{
    "destinationName": "myvariantstore",
    "roleArn": "....",
    "runLeftNormalization": false,
    "items": [
        {"source": "s3://my-omics-bucket/sample.vcf.gz"},
        {"source": "s3://my-omics-bucket/sample2.vcf.gz"}
    ]
}

For stores created after May 15, 2023, you can add the --annotation-fields parameter, as the following example shows. The annotation fields are defined when you start the import.

aws omics start-variant-import-job \
    --destination-name annotationparsingvariantstore \
    --role-arn arn:aws:iam::123456789012:role/<role_name> \
    --items source=s3://pathToS3/sample.vcf \
    --annotation-fields '{"VEP": "CSQ"}'

{
    "jobId": "981e2286-e954-4391-8a97-09aefc343861"
}
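The two commands above share one request shape, which can also be built from Python. The following is a minimal sketch, assuming the boto3 omics client and its start_variant_import_job operation. The helper names build_import_request and start_import are hypothetical, and the client is passed in so that the request can be inspected without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def build_import_request(
    destination_name: str,
    role_arn: str,
    sources: List[str],
    run_left_normalization: bool = False,
    annotation_fields: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for start_variant_import_job (hypothetical helper)."""
    if not sources:
        raise ValueError("at least one S3 source URI is required")
    for uri in sources:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    request: Dict[str, Any] = {
        "destinationName": destination_name,
        "roleArn": role_arn,
        "runLeftNormalization": run_left_normalization,
        # Each item is a dict with a single "source" key.
        "items": [{"source": uri} for uri in sources],
    }
    # Only send annotationFields when the caller asked for annotation parsing.
    if annotation_fields:
        request["annotationFields"] = annotation_fields
    return request


def start_import(client: Any, **kwargs: Any) -> str:
    """Start the job with an injected client and return the new job ID."""
    response = client.start_variant_import_job(**build_import_request(**kwargs))
    return response["jobId"]
```

With credentials configured, passing boto3.client("omics") as the client together with annotation_fields={"VEP": "CSQ"} would correspond to the second command.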

Use get-variant-import-job to check the status.

aws omics get-variant-import-job --job-id 7a1c67e3-b7f9-434d-817b-9c571fd63bea

VEP annotations in the VCF file are parsed from the INFO column, where they are stored as an ID/value pair. The default ID for Ensembl Variant Effect Predictor (VEP) annotations in the INFO column is CSQ, but you can use the --annotation-fields parameter to specify a custom ID. Parsing is currently supported for VEP annotations.

You'll receive a JSON response that shows the status of your import job. For a store created before May 15, 2023, or for VCF files that don't include VEP annotations, the response doesn't include any annotation fields.

{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::555555555555:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00"
}
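Because an import can take some time, the status check is often repeated until the job finishes. The following is a minimal polling sketch, assuming the boto3 omics client and its get_variant_import_job operation. The function name wait_for_import and the set of terminal statuses are assumptions for illustration, so confirm the status values against the service documentation.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; confirm against the service documentation.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED", "CANCELLED"}


def wait_for_import(
    client: Any,
    job_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_variant_import_job until the job reaches a terminal status."""
    for attempt in range(max_polls):
        job = client.get_variant_import_job(jobId=job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Skip the pause after the final attempt so the timeout is raised promptly.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

The sleep function is injectable so the loop can be exercised without waiting, and the returned dictionary is the last response, which includes the per-item jobStatus values.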

The VEP annotations in VCF files are stored in a predefined schema with the following structure. You can use the extras field to store any additional VEP fields that aren't included in the default schema.

annotations struct<
    vep: array<struct<
        allele: string,
        consequence: array<string>,
        impact: string,
        symbol: string,
        gene: string,
        `feature_type`: string,
        feature: string,
        biotype: string,
        exon: struct<rank: string, total: string>,
        intron: struct<rank: string, total: string>,
        hgvsc: string,
        hgvsp: string,
        `cdna_position`: string,
        `cds_position`: string,
        `protein_position`: string,
        `amino_acids`: struct<reference: string, variant: string>,
        codons: struct<reference: string, variant: string>,
        `existing_variation`: array<string>,
        distance: string,
        strand: string,
        flags: array<string>,
        symbol_source: string,
        hgnc_id: string,
        `extras`: map<string, string>
    >>
>

Parsing is performed on a best-effort basis. If a VEP entry doesn't follow the VEP standard specification, it isn't parsed, and its row in the array is empty.
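To make the ID/value lookup and the best-effort behavior concrete, the following sketch reads a VEP annotation out of a VCF INFO string. It is an illustration only and does not reproduce the service's parser or its full schema. The function names and the short field list in the usage note are hypothetical, and the sketch assumes the usual VEP layout of comma-separated entries with pipe-delimited fields.

```python
from typing import Dict, List, Optional


def info_value(info: str, annotation_id: str = "CSQ") -> Optional[str]:
    """Return the value stored under annotation_id in a VCF INFO string."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Flag-style INFO entries have no "=" and are skipped.
        if sep and key == annotation_id:
            return value
    return None


def parse_vep(
    info: str, fields: List[str], annotation_id: str = "CSQ"
) -> List[Dict[str, str]]:
    """Split each VEP entry into named fields.

    Best effort: an entry whose field count does not match `fields`
    yields an empty dict instead of raising an error.
    """
    raw = info_value(info, annotation_id)
    if raw is None:
        return []
    rows: List[Dict[str, str]] = []
    for entry in raw.split(","):
        values = entry.split("|")
        rows.append(dict(zip(fields, values)) if len(values) == len(fields) else {})
    return rows
```

For example, with fields ["allele", "consequence", "impact", "symbol"], the INFO string "DP=10;CSQ=A|missense_variant|MODERATE|BRCA1,T|bad" produces one populated row followed by one empty row. Passing a different annotation_id corresponds to supplying a custom ID with --annotation-fields.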

For a variant store created after May 15, 2023, the get-variant-import-job response includes the annotation fields, as shown.

aws omics get-variant-import-job --job-id 7a1c67e3-b7f9-434d-817b-9c571fd63bea

You receive a JSON response that shows the status of your import job.

{
    "creationTime": "2023-04-11T17:52:37.241958+00:00",
    "destinationName": "annotationparsingvariantstore",
    "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
    "items": [
        {
            "jobStatus": "COMPLETED",
            "source": "s3://amzn-s3-demo-bucket/NA12878.2k.garvan.vcf"
        }
    ],
    "roleArn": "arn:aws:iam::123456789012:role/<role_name>",
    "runLeftNormalization": false,
    "status": "COMPLETED",
    "updateTime": "2023-04-11T17:58:22.676043+00:00",
    "annotationFields": {"VEP": "CSQ"}
}
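Because annotationFields appears only for some jobs, code that reads the response should treat the key as optional. A small sketch follows. The helper name vep_annotation_id is hypothetical, and the inline JSON is a trimmed stand-in for a real response.

```python
import json
from typing import Any, Dict, Optional


def vep_annotation_id(job: Dict[str, Any]) -> Optional[str]:
    """Return the INFO ID used for VEP parsing, or None if the job has none."""
    # "or {}" also covers a response where the key is present but null.
    return (job.get("annotationFields") or {}).get("VEP")


# Trimmed example responses: one with annotation fields, one without.
with_fields = json.loads('{"status": "COMPLETED", "annotationFields": {"VEP": "CSQ"}}')
without_fields = json.loads('{"status": "COMPLETED"}')

print(vep_annotation_id(with_fields))     # prints CSQ
print(vep_annotation_id(without_fields))  # prints None
```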

You can use list-variant-import-jobs to see all import jobs and their statuses.

aws omics list-variant-import-jobs --ids 7a1c67e3-b7f9-434d-817b-9c571fd63bea

The response resembles the following.

{
    "variantImportJobs": [
        {
            "creationTime": "2023-04-11T17:52:37.241958+00:00",
            "destinationName": "annotationparsingvariantstore",
            "id": "7a1c67e3-b7f9-434d-817b-9c571fd63bea",
            "roleArn": "arn:aws:iam::555555555555:role/roleName",
            "runLeftNormalization": false,
            "status": "COMPLETED",
            "updateTime": "2023-04-11T17:58:22.676043+00:00",
            "annotationFields": {"VEP": "CSQ"}
        }
    ]
}
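When there are many jobs, the list may span several pages. The following sketch collects statuses across pages, assuming the boto3 omics client, its list_variant_import_jobs operation, and a nextToken field for pagination. The helper names iter_import_jobs and status_by_job are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_import_jobs(
    client: Any, ids: Optional[List[str]] = None
) -> Iterator[Dict[str, Any]]:
    """Yield every job summary, following nextToken across pages."""
    kwargs: Dict[str, Any] = {}
    if ids:
        kwargs["ids"] = ids
    while True:
        page = client.list_variant_import_jobs(**kwargs)
        yield from page.get("variantImportJobs", [])
        token = page.get("nextToken")
        if not token:
            return
        # Ask for the next page on the following iteration.
        kwargs["nextToken"] = token


def status_by_job(client: Any, ids: Optional[List[str]] = None) -> Dict[str, str]:
    """Map each job ID to its current status."""
    return {job["id"]: job["status"] for job in iter_import_jobs(client, ids)}
```

Calling status_by_job with no IDs summarizes every import job the client can see, while passing a list of IDs mirrors the --ids option used in the command above.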

If necessary, you can cancel an import job with the following command.

aws omics cancel-variant-import-job --job-id edd7b8ce-xmpl-47e2-bc99-258cac95a508
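A cancellation request makes sense only while a job is still running, so a script might check the status first. The following is a minimal sketch, assuming the boto3 omics client and its get_variant_import_job and cancel_variant_import_job operations. The function name cancel_if_running and the set of finished statuses are assumptions for illustration.

```python
from typing import Any

# Assumed statuses for jobs that can no longer be cancelled.
FINISHED_STATUSES = {"COMPLETED", "COMPLETED_WITH_FAILURES", "FAILED", "CANCELLED"}


def cancel_if_running(client: Any, job_id: str) -> bool:
    """Cancel the job unless it has already finished.

    Returns True if a cancel request was sent, False otherwise.
    """
    job = client.get_variant_import_job(jobId=job_id)
    if job["status"] in FINISHED_STATUSES:
        return False
    client.cancel_variant_import_job(jobId=job_id)
    return True
```

The check and the cancel call are separate requests, so a job can still finish between them. Treat the return value as "a request was sent", not as confirmation that the job was cancelled.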