Querying HealthOmics analytics data

You can query your variant stores using AWS Lake Formation together with Amazon Athena or Amazon EMR. Before you run any queries, complete the setup procedures for Lake Formation and Amazon Athena, which are described in the following sections.

For information about Amazon EMR, see Tutorial: Getting started with Amazon EMR.

For variant stores created after September 26, 2024, HealthOmics partitions the store by sample ID. This means that HealthOmics uses the sample ID to organize how variant information is stored. Queries that filter on sample information return results faster because they scan less data.
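As an illustration of a query that filters on sample information, the following minimal Python sketch builds a SQL string that restricts the scan to specific sample IDs. It is a sketch under stated assumptions, not a definitive implementation: the helper name `build_sample_query`, the database name `omics_db`, and the table name `my_variant_store` are hypothetical, and the filter column is assumed to be named `sampleid`. Check the schema of your own variant store table before you rely on that name.

```python
import re

# Only plain identifiers are accepted for the database and table names,
# because they are interpolated directly into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_sample_query(database, table, sample_ids, limit=100):
    """Return a SQL string that filters a variant store table by sample ID.

    Assumption: the table exposes the sample ID in a column named "sampleid".
    """
    if not sample_ids:
        raise ValueError("at least one sample ID is required")
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Escape embedded single quotes by doubling them (standard SQL quoting).
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in sample_ids)

    return (
        f'SELECT * FROM "{database}"."{table}" '
        f"WHERE sampleid IN ({quoted}) "
        f"LIMIT {int(limit)}"
    )


# Hypothetical database, table, and sample ID, used only for illustration.
print(build_sample_query("omics_db", "my_variant_store", ["NA12878"]))
```

You could paste the resulting string into the Athena query editor or submit it programmatically. Because the filter names the sample IDs explicitly, the query engine has the information it needs to skip data for other samples.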

HealthOmics uses sample IDs as partition file names. Before you ingest data, check whether any sample ID contains protected health information (PHI). If it does, change the sample ID before you ingest the data. For more information about what content to include and not include in sample IDs, see the guidance on the AWS HIPAA compliance web page.
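One possible way to change sample IDs before ingestion is to replace each one with a random, opaque identifier and keep the mapping in a separate, access-controlled location. The following Python sketch shows only that mapping step. The function name `pseudonymize_sample_ids` and the example IDs are hypothetical, the `S` prefix is an arbitrary choice, and this approach is an illustration rather than compliance guidance. Rewriting the sample names in your source files is a separate step that is not shown here.

```python
import uuid


def pseudonymize_sample_ids(original_ids):
    """Map each original sample ID to a random, opaque replacement.

    The replacement is derived from a random UUID, so it carries no
    information from the original ID. Repeated inputs reuse one mapping.
    """
    mapping = {}
    for original in original_ids:
        if original not in mapping:
            # "S" prefix is an arbitrary, hypothetical naming convention.
            mapping[original] = "S" + uuid.uuid4().hex
    return mapping


# Hypothetical sample IDs that embed identifying details.
mapping = pseudonymize_sample_ids(["jane-doe-1980-01-01", "john-roe-mrn-12345"])

# Print only the replacements; store the mapping itself somewhere restricted.
for replacement in mapping.values():
    print(replacement)
```

Random identifiers are used here instead of a hash of the original ID, because a hash of a short or guessable value can sometimes be reversed by trying candidate inputs.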