Query Linux Foundation Delta Lake tables
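Athena can read Linux Foundation Delta Lake tables stored in Amazon S3 directly, without generating manifest files or running the MSCK REPAIR statement.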
The Delta Lake format stores the minimum and maximum values per column of each data file. The Athena implementation makes use of this information to enable file-skipping on predicates to eliminate unwanted files from consideration.
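
As a rough illustration of how per-file minimum and maximum values make file skipping possible, the following Python sketch keeps only the files whose value range could contain a match for a range predicate. The `FileStats` type, the file names, and the `price` column are hypothetical, and this is a simplified model rather than Athena's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file statistics for one column (hypothetical, simplified)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return the files whose [min, max] range overlaps [lower, upper].

    A file can be skipped when its range lies entirely outside the
    predicate's range, because no row in it can satisfy the predicate.
    """
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Hypothetical statistics for a 'price' column in three data files.
stats = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# Predicate: WHERE price BETWEEN 150 AND 180
selected = files_to_scan(stats, 150, 180)
print([f.path for f in selected])
```

In this sketch, only the file whose range overlaps the predicate is read; the other two are eliminated from consideration using the statistics alone.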
Considerations and limitations
Delta Lake support in Athena has the following considerations and limitations:
-
Tables with AWS Glue catalog only – Native Delta Lake support is available only for tables registered with AWS Glue. If you have a Delta Lake table that is registered with another metastore, you can still keep it and treat it as your primary metastore. Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables.
-
V3 engine only – Delta Lake queries are supported only on Athena engine version 3. You must ensure that the workgroup you create is configured to use Athena engine version 3.
-
No time travel support – There is no support for queries that use Delta Lake’s time travel capabilities.
-
Read only – Write DML statements like UPDATE, INSERT, or DELETE are not supported.
-
Lake Formation support – Lake Formation integration is available for Delta Lake tables with their schema in sync with AWS Glue. For more information, see Using AWS Lake Formation with Amazon Athena and Set up permissions for a Delta Lake table in the AWS Lake Formation Developer Guide.
-
Limited DDL support – The following DDL statements are supported: CREATE EXTERNAL TABLE, SHOW COLUMNS, SHOW TBLPROPERTIES, SHOW PARTITIONS, SHOW CREATE TABLE, and DESCRIBE. For information on using the CREATE EXTERNAL TABLE statement, see the Get started section.
-
Skipping S3 Glacier objects not supported – If objects in the Linux Foundation Delta Lake table are in an Amazon S3 Glacier storage class, setting the read_restored_glacier_objects table property to false has no effect.
For example, suppose you issue the following command:
ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
For Iceberg and Delta Lake tables, the command produces the error Unsupported table property key: read_restored_glacier_objects. For Hudi tables, the ALTER TABLE command does not produce an error, but Amazon S3 Glacier objects are still not skipped. Running SELECT queries after the ALTER TABLE command continues to return all objects.
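
The read-only and limited-DDL limitations listed above could be checked on the client side before a statement is submitted. The following Python sketch is one naive way to do that; the `classify_statement` helper and its category names are hypothetical, and the check is a simple prefix match, not a SQL parser.

```python
# Statement prefixes taken from the limitations listed above.
SUPPORTED_DDL = (
    "CREATE EXTERNAL TABLE",
    "SHOW COLUMNS",
    "SHOW TBLPROPERTIES",
    "SHOW PARTITIONS",
    "SHOW CREATE TABLE",
    "DESCRIBE",
)
UNSUPPORTED_DML = ("UPDATE", "INSERT", "DELETE")


def classify_statement(sql):
    """Classify a statement by its leading keywords (naive prefix match).

    Returns 'query', 'supported_ddl', 'unsupported_dml', or 'other'.
    """
    # Collapse whitespace and uppercase so that prefixes compare cleanly.
    normalized = " ".join(sql.strip().upper().split())
    if normalized.startswith(("SELECT", "WITH")):
        return "query"
    if normalized.startswith(SUPPORTED_DDL):
        return "supported_ddl"
    if normalized.startswith(UNSUPPORTED_DML):
        return "unsupported_dml"
    return "other"


for statement in ("SELECT * FROM t", "SHOW COLUMNS IN t", "DELETE FROM t"):
    print(statement, "->", classify_statement(statement))
```

Statements that fall into the 'other' category, such as ALTER TABLE, are not covered by the lists above, so the sketch makes no claim about them.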
Delta Lake versioning and Athena
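Athena does not use the versioning listed in the Delta Lake documentation. To determine whether your Delta Lake table is compatible with Athena, consider the following two characteristics: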
-
Reader version – Every Delta Lake table has a reader version. Currently, this is a number between 1 and 3. Queries that include a table with a reader version that Athena does not support will fail.
-
Table features – Every Delta Lake table also can declare a set of reader/writer features. Because Athena's support of Delta Lake is read-only, table writer feature compatibility does not apply. However, queries on tables with unsupported table reader features will fail.
The following table shows the Delta Lake reader versions and Delta Lake table reader features that Athena supports.
Query type | Supported reader versions | Supported reader features |
---|---|---|
DQL (SELECT statements) | <= 3 | Column mapping |
DDL | <= 1 | Not applicable. Reader features can be declared only on tables with a reader version of 2 or greater. |
-
For a list of Delta Lake table features, see Valid feature names in table features on GitHub.com.
-
For a list of Delta Lake features by protocol version, see Features by protocol version on GitHub.com.
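
The compatibility table above can be expressed as a small check. The following Python sketch assumes the table's protocol is available as a `protocol` action in JSON form, as written in the Delta Lake transaction log, with the `minReaderVersion` and `readerFeatures` fields. The `is_supported` helper and the sample values are hypothetical, and Athena's own validation may differ in detail.

```python
import json

# Limits taken from the compatibility table above.
MAX_READER_VERSION = {"DQL": 3, "DDL": 1}
# "columnMapping" is the Delta protocol's feature name for column mapping.
SUPPORTED_READER_FEATURES = {"DQL": {"columnMapping"}, "DDL": set()}


def is_supported(query_type, protocol_json):
    """Check one Delta log 'protocol' action against the table above."""
    protocol = json.loads(protocol_json)["protocol"]
    version = protocol["minReaderVersion"]
    # readerFeatures is absent on tables that do not declare table features.
    features = set(protocol.get("readerFeatures", []))
    if version > MAX_READER_VERSION[query_type]:
        return False
    # Every declared reader feature must be one that is supported.
    return features <= SUPPORTED_READER_FEATURES[query_type]


# Hypothetical protocol actions for three tables.
v1 = '{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}'
v3_column_mapping = (
    '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7, '
    '"readerFeatures": ["columnMapping"], "writerFeatures": ["columnMapping"]}}'
)
v3_deletion_vectors = (
    '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7, '
    '"readerFeatures": ["deletionVectors"], "writerFeatures": ["deletionVectors"]}}'
)

print(is_supported("DQL", v1))
print(is_supported("DDL", v3_column_mapping))
print(is_supported("DQL", v3_deletion_vectors))
```

Writer features are ignored on purpose, because Athena's Delta Lake support is read-only and writer feature compatibility does not apply.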
To create a Delta Lake table in Athena with a reader version greater than 1, see Synchronize Delta Lake metadata.