Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Extract JSON data from strings

Focus mode
Extract JSON data from strings - Amazon Athena

You may have source data containing JSON-encoded strings that you do not necessarily want to deserialize into a table in Athena. In this case, you can still run SQL operations on this data, using the JSON functions available in Presto.

Consider this JSON string as an example dataset.

{"name": "Susan Smith", "org": "engineering", "projects": [ {"name":"project1", "completed":false}, {"name":"project2", "completed":true} ] }

Examples: Extract properties

To extract the name and projects properties from the JSON string, use the json_extract function as in the following example. The json_extract function takes the column containing the JSON string, and searches it using a JSONPath-like expression with the dot . notation.

Note

JSONPath performs a simple tree traversal. It uses the $ sign to denote the root of the JSON document, followed by a period and an element nested directly under the root, such as $.name.

WITH dataset AS ( SELECT '{"name": "Susan Smith", "org": "engineering", "projects": [{"name":"project1", "completed":false}, {"name":"project2", "completed":true}]}' AS myblob ) SELECT json_extract(myblob, '$.name') AS name, json_extract(myblob, '$.projects') AS projects FROM dataset

The returned value is a JSON-encoded string, and not a native Athena data type.

+-----------------------------------------------------------------------------------------------+ | name | projects | +-----------------------------------------------------------------------------------------------+ | "Susan Smith" | [{"name":"project1","completed":false},{"name":"project2","completed":true}] | +-----------------------------------------------------------------------------------------------+

To extract the scalar value from the JSON string, use the json_extract_scalar(json, json_path) function. It is similar to json_extract, but returns a varchar string value instead of a JSON-encoded string. The value for the json_path parameter must be a scalar (a Boolean, number, or string).

Note

Do not use the json_extract_scalar function on arrays, maps, or structs.

WITH dataset AS ( SELECT '{"name": "Susan Smith", "org": "engineering", "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}' AS myblob ) SELECT json_extract_scalar(myblob, '$.name') AS name, json_extract_scalar(myblob, '$.projects') AS projects FROM dataset

This query returns:

+---------------------------+ | name | projects | +---------------------------+ | Susan Smith | | +---------------------------+

To obtain the first element of the projects property in the example array, use the json_array_get function and specify the index position.

WITH dataset AS ( SELECT '{"name": "Bob Smith", "org": "engineering", "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}' AS myblob ) SELECT json_array_get(json_extract(myblob, '$.projects'), 0) AS item FROM dataset

It returns the value at the specified index position in the JSON-encoded array.

+---------------------------------------+ | item | +---------------------------------------+ | {"name":"project1","completed":false} | +---------------------------------------+

To return an Athena string type, use the [] operator inside a JSONPath expression, then Use the json_extract_scalar function. For more information about [], see Access array elements.

WITH dataset AS ( SELECT '{"name": "Bob Smith", "org": "engineering", "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}' AS myblob ) SELECT json_extract_scalar(myblob, '$.projects[0].name') AS project_name FROM dataset

It returns this result:

+--------------+ | project_name | +--------------+ | project1 | +--------------+

On this page

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.