Amazon Ion Hive SerDe
You can use the Amazon Ion Hive SerDe to query data stored in Amazon Ion format.
Amazon Ion has binary and text formats that are interchangeable. This feature combines the ease of use of text with the efficiency of binary encoding.
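To query Amazon Ion data from Athena, you can use the Amazon Ion Hive SerDe, which serializes and deserializes Amazon Ion data. To generate data in Amazon Ion format, you can use CREATE TABLE AS SELECT (CTAS) or INSERT INTO queries to copy data from existing tables.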
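As a minimal sketch of the CTAS route, the following Python helper assembles a query string that copies an existing table into Amazon Ion format. The helper name, table names, and S3 location are hypothetical placeholders, and the use of the `format = 'ION'` CTAS table property is an assumption that should be checked against the current Athena CTAS reference.

```python
def ion_ctas_query(new_table, source_table, location):
    """Build a CTAS statement that writes the source table's rows as Ion.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return (
        f"CREATE TABLE {new_table}\n"
        # Assumption: Athena accepts 'ION' as a CTAS storage format.
        f"WITH (format = 'ION', external_location = '{location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


query = ion_ctas_query("orders_ion", "orders_parquet", "s3://amzn-s3-demo-bucket/orders_ion/")
print(query)
```

The resulting string can be pasted into the Athena query editor or submitted through whichever client you already use.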
Note
Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike other JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This feature is useful if you want to query JSON datasets that are in "pretty print" format or otherwise break up the fields in a row with newline characters.
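To illustrate why line boundaries matter, the following Python sketch reads consecutive pretty-printed JSON objects by parsing value by value rather than line by line. It uses only the standard `json` module and is an analogy for the behavior described in the note, not a description of how the Amazon Ion SerDe is implemented; the sample data is made up.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value in text, ignoring line boundaries."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace, including newlines, between values.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Two rows, each spread across several lines ("pretty print" format).
pretty = """
{
  "id": 1,
  "name": "alpha"
}
{
  "id": 2,
  "name": "beta"
}
"""

rows = list(iter_json_values(pretty))
print(len(rows))  # 2
```

A reader that assumes one row per line would instead try to parse `{` on its own and fail, which is the limitation the note contrasts against.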
For additional information and examples of querying Amazon Ion with Athena, see Analyze Amazon Ion datasets using Amazon Athena.
Serialization library name
The serialization library name for the Amazon Ion SerDe is com.amazon.ionhiveserde.IonHiveSerDe. For source code information, see Amazon Ion Hive SerDe.
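The sketch below shows where the serialization library name typically appears in a table definition. It builds a CREATE EXTERNAL TABLE statement as a Python string; the helper name, table name, columns, and S3 location are hypothetical, and the input and output format class names are not stated in this section, so treat them as assumptions to verify against the Athena documentation.

```python
SERDE_CLASS = "com.amazon.ionhiveserde.IonHiveSerDe"

# Assumption: these format class names are not given in the text above.
INPUT_FORMAT = "com.amazon.ionhiveserde.formats.IonInputFormat"
OUTPUT_FORMAT = "com.amazon.ionhiveserde.formats.IonOutputFormat"


def ion_table_ddl(table, columns, location):
    """Build DDL for an external table that is read with the Ion SerDe.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT SERDE '{SERDE_CLASS}'\n"
        f"STORED AS INPUTFORMAT '{INPUT_FORMAT}'\n"
        f"OUTPUTFORMAT '{OUTPUT_FORMAT}'\n"
        f"LOCATION '{location}'"
    )


ddl = ion_table_ddl(
    "my_ion_table",
    [("id", "bigint"), ("name", "string")],
    "s3://amzn-s3-demo-bucket/ion-data/",
)
print(ddl)
```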
Considerations and limitations
- Duplicated fields – Amazon Ion structs are ordered and support duplicated fields, while Hive's STRUCT<> and MAP<> do not. Thus, when you deserialize a duplicated field from an Amazon Ion struct, a single value is chosen nondeterministically, and the others are ignored.
- External symbol tables unsupported – Currently, Athena does not support external symbol tables or the following Amazon Ion Hive SerDe properties:
  - ion.catalog.class
  - ion.catalog.file
  - ion.catalog.url
  - ion.symbol_table_imports
- File extensions – Amazon Ion uses file extensions to determine which compression codec to use for deserializing Amazon Ion files. As such, compressed files must have the file extension that corresponds to the compression algorithm used. For example, if ZSTD is used, corresponding files should have the extension .zst.
- Homogeneous data – Amazon Ion has no restrictions on the data types that can be used for values in particular fields. For example, two different Amazon Ion documents might each have a field with the same name but a different data type. However, because Hive uses a schema, all values that you extract to a single Hive column must have the same data type.
- Map key type restrictions – When you serialize data from another format into Amazon Ion, ensure that the map key type is one of STRING, VARCHAR, or CHAR. Although Hive allows you to use any primitive data type as a map key, Amazon Ion symbols must be a string type.
- Union type – Athena does not currently support the Hive union type.
- Double data type – Amazon Ion does not currently support the double data type.
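Several of the considerations above can be checked before data reaches Athena. The following Python sketch is one possible pre-flight check, not part of the SerDe: every function name is hypothetical, the duplicate and mixed-type checks work on JSON input (a subset of Amazon Ion) using Python types as a rough stand-in for Ion types, and only the ZSTD and .zst pairing in the extension table comes from the text; the other entries are conventional extensions added as assumptions.

```python
import json
from collections import Counter

# Only ZSTD -> .zst is stated in the text; the other entries are assumptions.
CODEC_EXTENSIONS = {"ZSTD": ".zst", "GZIP": ".gz", "BZIP2": ".bz2"}

# Map key types the text lists as safe when serializing to Amazon Ion.
ALLOWED_MAP_KEY_TYPES = {"STRING", "VARCHAR", "CHAR"}


def has_expected_extension(filename, codec):
    """Return True if filename ends with the extension expected for codec."""
    return filename.lower().endswith(CODEC_EXTENSIONS[codec.upper()])


def is_allowed_map_key(hive_type):
    """Return True for STRING, VARCHAR(n), or CHAR(n) map key types."""
    base = hive_type.strip().upper().split("(", 1)[0]
    return base in ALLOWED_MAP_KEY_TYPES


def duplicated_fields(json_text):
    """Return field names that occur more than once within a single object."""
    found = set()

    def hook(pairs):
        counts = Counter(key for key, _ in pairs)
        found.update(key for key, n in counts.items() if n > 1)
        return dict(pairs)

    json.loads(json_text, object_pairs_hook=hook)
    return sorted(found)


def mixed_type_fields(records):
    """Return names of fields whose non-null values have more than one type."""
    seen = {}
    for record in records:
        for name, value in record.items():
            if value is not None:
                seen.setdefault(name, set()).add(type(value).__name__)
    return sorted(name for name, types in seen.items() if len(types) > 1)


records = [{"id": 1, "price": 9.5}, {"id": "2", "price": 3.0}]
print(mixed_type_fields(records))                   # ['id']
print(duplicated_fields('{"a": 1, "a": 2, "b": 3}'))  # ['a']
print(has_expected_extension("part-0.ion.zst", "ZSTD"))  # True
print(is_allowed_map_key("INT"))                    # False
```

Fields reported by `mixed_type_fields` would need to be cast or split before they can map to a single Hive column, and fields reported by `duplicated_fields` are the ones for which the value read back is not predictable.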