Amazon Ion SerDe property reference
This topic contains information about the SerDe properties for CREATE TABLE
statements in Athena. For more information and examples of Amazon Ion SerDe property usage,
see SerDe properties
How to specify Amazon Ion SerDe properties
To specify properties for the Amazon Ion Hive SerDe in your CREATE TABLE
statement, use the WITH SERDEPROPERTIES
clause. Because WITH
SERDEPROPERTIES
is a subfield of the ROW FORMAT SERDE
clause,
you must specify ROW FORMAT SERDE
and the Amazon Ion Hive SerDe class path
first, as the following syntax shows.
... ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe' WITH SERDEPROPERTIES ( '
property
' = 'value
', 'property
' = 'value
', ... )
Note that although the ROW FORMAT SERDE
clause is required if you want to
use WITH SERDEPROPERTIES
, you can use either STORED AS ION
or
the longer INPUTFORMAT
and OUTPUTFORMAT
syntax to specify the
Amazon Ion format.
Amazon Ion SerDe properties
Following are the Amazon Ion SerDe properties that can be used in CREATE
TABLE
statements in Athena.
- ion.encoding
-
Optional
Default:
BINARY
Values:
BINARY
,TEXT
This property declares whether new values added are serialized as Amazon Ion binary
or Amazon Ion text format. The following SerDe property example specifies Amazon Ion text format.
'ion.encoding' = 'TEXT'
- ion.fail_on_overflow
-
Optional
Default:
true
Values:
true
,false
Amazon Ion allows for arbitrarily large numerical types while Hive does not. By default, the SerDe fails if the Amazon Ion value does not fit the Hive column, but you can use the
fail_on_overflow
configuration option to let the value overflow instead of failing.This property can be set at either the table or column level. To specify it at the table level, specify
ion.fail_on_overflow
as in the following example. This sets the default behavior for all columns.'ion.fail_on_overflow' = 'true'
To control a specific column, specify the column name between
ion
andfail_on_overflow
, delimited by periods, as in the following example.'ion.
<column>
.fail_on_overflow' = 'false' - ion.path_extractor.case_sensitive
-
Optional
Default:
false
Values:
true
,false
Determines whether to treat Amazon Ion field names as case sensitive. When
false
, the SerDe ignores case parsing Amazon Ion field names.For example, suppose you have a Hive table schema that defines a field
alias
in lower case and an Amazon Ion document with both analias
field and anALIAS
field, as in the following example.-- Hive Table Schema alias: STRING -- Amazon Ion Document { 'ALIAS': 'value1'} { 'alias': 'value2'}
The following example shows SerDe properties and the resulting extracted table when case sensitivity is set to
false
:-- Serde properties 'ion.alias.path_extractor' = '(alias)' 'ion.path_extractor.case_sensitive' = 'false' --Extracted Table | alias | |----------| | "value1" | | "value2" |
The following example shows SerDe properties and the resulting extracted table when case sensitivity is set to
true
:-- Serde properties 'ion.alias.path_extractor' = '(alias)' 'ion.path_extractor.case_sensitive' = 'true' --Extracted Table | alias | |----------| | "value2" |
In the second case,
value1
for theALIAS
field is ignored when case sensitivity is set totrue
and the path extractor is specified asalias
. - ion.
<column>
.path_extractor -
Optional
Default: NA
Values: String with search path
Creates a path extractor with the specified search path for the given column. Path extractors map Amazon Ion fields to Hive columns. If no path extractors are specified, Athena dynamically creates path extractors at run time based on column names.
The following example path extractor maps the
example_ion_field
to theexample_hive_column
.'ion.example_hive_column.path_extractor' = '(example_ion_field)'
For more information about path extractors and search paths, see Use path extractors.
- ion.timestamp.serialization_offset
-
Optional
Default:
'Z'
Values:
OFFSET
, whereOFFSET
is represented as
. Example values:<signal>
hh:mm01:00
,+01:00
,-09:30
,Z
(UTC, same as 00:00)Unlike Apache Hive timestamps
, which have no built-in time zone and are stored as an offset from the UNIX epoch, Amazon Ion timestamps do have an offset. Use this property to specify the offset when you serialize to Amazon Ion. The following example adds an offset of one hour.
'ion.timestamp.serialization_offset' = '+01:00'
- ion.serialize_null
-
Optional
Default:
OMIT
Values:
OMIT
,UNTYPED
,TYPED
The Amazon Ion SerDe can be configured to either serialize or omit columns that have null values. You can choose to write out strongly typed nulls (
TYPED
) or untyped nulls (UNTYPED
). Strongly typed nulls are determined based on the default Amazon Ion to Hive type mapping.The following example specifies strongly typed nulls.
'ion.serialize_null'='TYPED'
- ion.ignore_malformed
-
Optional
Default:
false
Values:
true
,false
When
true
, ignores malformed entries or the whole file if the SerDe is unable to read it. For more information, see Ignore malformedin the documentation on GitHub. - ion.
<column>
.serialize_as -
Optional
Default: Default type for the column.
Values: String containing Amazon Ion type
Determines the Amazon Ion data type in which a value is serialized. Because Amazon Ion and Hive types do not always have a direct mapping, a few Hive types have multiple valid data types for serialization. To serialize data as a non-default data type, use this property. For more information about type mapping, see the Amazon Ion Type mapping
page on GitHub. By default, binary Hive columns are serialized as Amazon Ion blobs, but they can also be serialized as an Amazon Ion clob
(character large object). The following example serializes the column example_hive_binary_column
as a clob.'ion.example_hive_binary_column.serialize_as' = 'clob'