Query your HealthLake data store using SQL
Note
After February 20, 2023, HealthLake data stores do not use integrated natural language processing (NLP) by default. If you are interested in turning on this feature on your data store, see How do I turn on HealthLake's integrated natural language processing feature? in the Troubleshooting chapter.
All examples in this topic use fictionalized data created using Synthea. To learn more about creating a data store preloaded with Synthea data, see Creating a data store in AWS HealthLake.
When you import your HealthLake data store into Athena, each resource type from your HealthLake data store is converted into a table. These tables can be queried individually or as group using SQL-based queries. Because of the structure of data stores, your data is imported into Athena as multiple different data types. To learn more about creating SQL queries that can access these data types, see Querying arrays with complex types and nested structures in the Amazon Athena User Guide.
For each element in a resource type, the FHIR specification defines a cardinality. The cardinality of an element defines the lower and upper bounds of how many times this element can appear. When constructing a SQL query, you must take this into account. For example, let's look at some elements in Resource type: Patient
-
Element: Name The FHIR specification sets the cardinality as
0..*
.The element is captured as an array.
[{ id = null, extension = null, use = official, _use = null, text = null, _text = null, family = Wolf938, _family = null, given = [Noel608], _given = null, prefix = null, _prefix = null, suffix = null, _suffix = null, period = null }]
In Athena, to see how a resource type has been ingested, search for it under Tables and views. To access elements in this array, you can use dot notation. Here's a simple example that would access the values for
given
andfamily
.SELECT name[1].given as FirstName, name[1].family as LastName FROM Patient
-
Element: MaritalStatus The FHIR specification sets the cardinality as
0..1
.This element is captured as JSON.
{ id = null, extension = null, coding = [ { id = null, extension = null, system = http: //terminology.hl7.org/CodeSystem/v3-MaritalStatus, _system = null, version = null, _version = null, code = S, _code = null, display = Never Married, _display = null, userSelected = null, _userSelected = null } ], text = Never Married, _text = null }
In Athena, to see how a resource type has been ingested, search for it under Tables and views. To access key-value pairs in the JSON, you can use dot notation. Because it isn't an array, no array index is required. Here's a simple example that would access the value for
text
.SELECT maritalstatus.text as MaritalStatus FROM Patient
To learn more about accessing and searching JSON, see Querying JSON in the Athena User Guide.
Athena Data Manipulation Language (DML) query statements are based on Trino. Athena does not support all of Trino's features, and there are significant differences. To learn more, see DML queries, functions, and operators in the Amazon Athena User Guide.
Furthermore, Athena supports multiple data types that you may encounter when creating queries of your HealthLake data store. To learn more about data types in Athena, see Data types in Amazon Athena in the Amazon Athena User Guide.
To learn more about how SQL queries work in Athena, see SQL reference for Amazon Athena in the Amazon Athena User Guide.
Each tab shows examples of how to search on the specified resource types and associated elements using Athena.