Work with timestamp data - Amazon Athena

Work with timestamp data

This section describes some considerations for working with timestamp data in Athena.

Note

The treatment of timestamps has changed somewhat between Athena engine version 2 and Athena engine version 3. For information about timestamp-related errors that can occur in Athena engine version 3 and suggested solutions, see Timestamp changes in the Athena engine version 3 reference.

Format for writing timestamp data to Amazon S3 objects

The format in which timestamp data should be written into Amazon S3 objects depends on both the column data type and the SerDe library that you use.

  • If you have a table column of type DATE, Athena expects the corresponding column or property of the data to be a string in the ISO format YYYY-MM-DD, or a built-in date type like those for Parquet or ORC.

  • If you have a table column of type TIME, Athena expects the corresponding column or property of the data to be a string in the ISO format HH:MM:SS, or a built-in time type like those for Parquet or ORC.

  • If you have a table column of type TIMESTAMP, Athena expects the corresponding column or property of the data to be a string in the format YYYY-MM-DD HH:MM:SS.SSS (note the space between the date and time), or a built-in time type like those for Parquet, ORC, or Ion. Note that Athena does not guarantee the behavior for timestamps that are invalid (for example, 0000-00-00 08:00:00.000).

    Note

    OpenCSVSerDe timestamps are an exception and must be encoded as millisecond resolution UNIX epochs.

Ensuring that time-partitioned data matches the timestamp field in a record

The producer of the data must make sure partition values align with the data within the partition. For example, if your data has a timestamp property and you use Firehose to load the data into Amazon S3, you must use dynamic partitioning because the default partitioning of Firehose is wall-clock-based.

Use string as the data type for partition keys

For performance reasons, it is preferable to use STRING as the data type for partition keys. Even though Athena recognizes partition values in the format YYYY-MM-DD as dates when you use the DATE type, this can lead to poor performance. For this reason, we recommend that you use the STRING data type for partition keys instead.

How to write queries for timestamp fields that are also time-partitioned

How you write queries for timestamp fields that are time-partitioned depends on the type of table that you want to query.

Hive tables

With the Hive tables most commonly used in Athena, the query engine has no knowledge of relationships between columns and partition keys. For this reason, you must always add predicates in your queries for both the column and the partition key.

For example, suppose you have an event_time column and an event_date partition key and want to query events between 23:00 and 03:00. In this case, you must include predicates in your query for both the column and the partition key, as in the following example.

WHERE event_time BETWEEN start_time AND end_time AND event_date BETWEEN start_time_date AND end_time_date

Iceberg tables

With Iceberg tables, you can use computed partition values, which simplifies your queries. For example, suppose your Iceberg table was created with a PARTITIONED BY clause like the following:

PARTITIONED BY (event_date month(event_time))

In this case, the query engine automatically prunes partitions based on the values of the event_time predicates. Because of this, your query only needs to specify a predicate for event_time, as in the following example.

WHERE event_time BETWEEN start_time AND end_time

For more information, see Create Iceberg tables.