Querying HealthLake data with SQL
When you import your FHIR data into HealthLake data store, the nested JSON FHIR data simultaneously undergoes an ETL process and is stored in Apache Iceberg open table format in Amazon S3. Each FHIR resource type from your HealthLake data store is converted into a table, where it can be queried using Amazon Athena. The tables can be queried individually or as group using SQL-based queries. Because of the structure of data stores, your data is imported into Athena as multiple different data types. To learn more about creating SQL queries that can access these data types, see Query arrays with complex types and nested structures in the Amazon Athena User Guide.
Note
All examples in this topic use fictionalized data created using Synthea. To learn more about creating a data store preloaded with Synthea data, see Creating a HealthLake data store.
For each element in a resource type, the FHIR specification defines a cardinality. The
cardinality of an element defines the lower and upper bounds of how many times this element
can appear. When constructing a SQL query, you must take this into account. For example,
let's look at some elements in Resource type:
Patient
-
Element: Name The FHIR specification sets the cardinality as
0..*
.The element is captured as an array.
[{ id = null, extension = null, use = official, _use = null, text = null, _text = null, family = Wolf938, _family = null, given = [Noel608], _given = null, prefix = null, _prefix = null, suffix = null, _suffix = null, period = null }]
In Athena, to see how a resource type has been ingested, search for it under Tables and views. To access elements in this array, you can use dot notation. Here's a simple example that would access the values for
given
andfamily
.SELECT name[1].given as FirstName, name[1].family as LastName FROM Patient
-
Element: MaritalStatus The FHIR specification sets the cardinality as
0..1
.This element is captured as JSON.
{ id = null, extension = null, coding = [ { id = null, extension = null, system = http: //terminology.hl7.org/CodeSystem/v3-MaritalStatus, _system = null, version = null, _version = null, code = S, _code = null, display = Never Married, _display = null, userSelected = null, _userSelected = null } ], text = Never Married, _text = null }
In Athena, to see how a resource type has been ingested, search for it under Tables and views. To access key-value pairs in the JSON, you can use dot notation. Because it isn't an array, no array index is required. Here's a simple example that would access the value for
text
.SELECT maritalstatus.text as MaritalStatus FROM Patient
To learn more about accessing and searching JSON, see Querying JSON in the Athena User Guide.
Athena Data Manipulation Language (DML) query statements are based on Trino. Athena does not support all of Trino's features, and there are significant differences. To learn more, see DML queries, functions, and operators in the Amazon Athena User Guide.
Furthermore, Athena supports multiple data types that you may encounter when creating queries of your HealthLake data store. To learn more about data types in Athena, see Data types in Amazon Athena in the Amazon Athena User Guide.
To learn more about how SQL queries work in Athena, see SQL reference for Amazon Athena in the Amazon Athena User Guide.
Each tab shows examples of how to search on the specified resource types and associated elements using Athena.