Synchronize Delta Lake metadata
If you use Athena to create your Delta Lake table, Athena synchronizes table metadata, including schema, partition columns, and table properties, to AWS Glue. Over time, this metadata can fall out of sync with the underlying table metadata in the transaction log. To keep your table up to date, you can choose one of the following options:
- Use the AWS Glue crawler for Delta Lake tables. For more information, see "Introducing native Delta Lake table support with AWS Glue crawlers" in the AWS Big Data Blog and "Scheduling an AWS Glue crawler" in the AWS Glue Developer Guide.
- Drop and recreate the table in Athena.
- Use the SDK, CLI, or AWS Glue console to manually update the schema in AWS Glue.
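The last option, updating the schema manually through the SDK, could look roughly like the following Python sketch. It assumes boto3 (the AWS SDK for Python) and its Glue client methods get_table and update_table. The helper names build_table_input and sync_glue_schema, and the short list of copied keys, are illustrative choices rather than anything prescribed by Athena or AWS Glue. The caller is assumed to supply column names and types that were already read from the transaction log.

```python
# Subset of keys accepted by the Glue TableInput structure. get_table returns
# extra read-only fields (for example CreateTime) that update_table rejects.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}


def build_table_input(table, columns):
    """Return a TableInput dict for `table` with its column list replaced.

    `columns` is a list of (name, glue_type) pairs supplied by the caller.
    """
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    # Copy the storage descriptor so the caller's dict is not mutated.
    storage = dict(table_input.get("StorageDescriptor", {}))
    storage["Columns"] = [{"Name": n, "Type": t} for n, t in columns]
    table_input["StorageDescriptor"] = storage
    return table_input


def sync_glue_schema(database, name, columns):
    """Overwrite the Glue column list for one table.

    Requires boto3 and AWS credentials; not exercised by the pure helper above.
    """
    import boto3  # imported lazily so build_table_input works without boto3

    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName=database, Name=name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_table_input(table, columns),
    )
```

A call such as sync_glue_schema("dbname", "tablename", [("id", "bigint"), ("name", "string")]) would replace the column list. The column names here are placeholders, and partition columns, which AWS Glue stores separately under PartitionKeys, are left untouched by this sketch.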
Note that the following features require the schema in AWS Glue to always match the schema in the transaction log:
- Lake Formation
- Views
- Row and column filters
If your workflow does not require any of this functionality, and you prefer not to maintain this compatibility, you can use CREATE TABLE DDL in Athena and then add the Amazon S3 path as a SerDe parameter in AWS Glue.
You can use the following procedure to create a Delta Lake table with the Athena and AWS Glue consoles.
To create a Delta Lake table using the Athena and AWS Glue consoles
- Open the Athena console at https://console.aws.amazon.com/athena/.
- In the Athena query editor, use the following DDL to create your Delta Lake table. When you use this method, the value for TBLPROPERTIES must be 'spark.sql.sources.provider' = 'delta' and not 'table_type' = 'delta'.

  This same schema (a single column named col of type array<string>) is inserted when you use Apache Spark (Athena for Apache Spark) or most other engines to create your table.

      CREATE EXTERNAL TABLE [db_name.]table_name(col array<string>)
      LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
      TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')

- Open the AWS Glue console at https://console.aws.amazon.com/glue/.
- In the navigation pane, choose Data Catalog, Tables.
- In the list of tables, choose the link for your table.
- On the page for the table, choose Actions, Edit table.
- In the Serde parameters section, add the key path with the value s3://amzn-s3-demo-bucket/your-folder/.
- Choose Save.
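As a rough programmatic counterpart to the console procedure, the following Python sketch renders the DDL statement and sets the path SerDe parameter on a table definition. Both helper names are hypothetical, the functions do no escaping or validation of their inputs, and the resulting dictionary is assumed to be passed by the caller to a Glue update call (for example, the boto3 Glue client's update_table).

```python
def delta_placeholder_ddl(table_name, location, db_name=None):
    """Render the CREATE EXTERNAL TABLE statement with the placeholder schema.

    No quoting or escaping is applied; inputs are assumed to be trusted.
    """
    qualified = f"{db_name}.{table_name}" if db_name else table_name
    return (
        f"CREATE EXTERNAL TABLE {qualified}(col array<string>) "
        f"LOCATION '{location}' "
        "TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')"
    )


def with_serde_path(table_input, location):
    """Return a copy of a Glue table definition with the `path` SerDe parameter set."""
    # Copy each nested level that changes so the caller's dict is not mutated.
    storage = dict(table_input.get("StorageDescriptor", {}))
    serde = dict(storage.get("SerdeInfo", {}))
    serde["Parameters"] = {**serde.get("Parameters", {}), "path": location}
    storage["SerdeInfo"] = serde
    return {**table_input, "StorageDescriptor": storage}
```

Returning a copy instead of editing in place keeps the original table definition available for comparison before anything is written back to AWS Glue.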
To create a Delta Lake table using the AWS CLI, enter a command like the following.
    aws glue create-table --database-name dbname \
      --table-input '{
        "Name": "tablename",
        "StorageDescriptor": {
          "Columns": [{"Name": "col", "Type": "array<string>"}],
          "Location": "s3://amzn-s3-demo-bucket/<prefix>/",
          "SerdeInfo": {"Parameters": {"serialization.format": "1", "path": "s3://amzn-s3-demo-bucket/<prefix>/"}}
        },
        "PartitionKeys": [],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"EXTERNAL": "TRUE", "spark.sql.sources.provider": "delta"}
      }'
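The same request can be sketched with the AWS SDK for Python. This assumes boto3 and its Glue client method create_table; the function names delta_table_input and create_delta_table are hypothetical, and the dictionary simply mirrors the JSON passed to --table-input in the CLI command.

```python
def delta_table_input(name, location):
    """Build the TableInput structure used by the CLI command, as a dict."""
    return {
        "Name": name,
        "StorageDescriptor": {
            # Placeholder schema: a single column named col of type array<string>.
            "Columns": [{"Name": "col", "Type": "array<string>"}],
            "Location": location,
            "SerdeInfo": {
                "Parameters": {"serialization.format": "1", "path": location}
            },
        },
        "PartitionKeys": [],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"EXTERNAL": "TRUE", "spark.sql.sources.provider": "delta"},
    }


def create_delta_table(database, name, location):
    """Register the table in AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so delta_table_input works without boto3

    glue = boto3.client("glue")
    glue.create_table(
        DatabaseName=database,
        TableInput=delta_table_input(name, location),
    )
```

Building the dictionary in a separate function makes it possible to inspect or serialize the table definition before sending it, and keeps the Location and path values from drifting apart.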