Populating and managing transactional tables
You can populate Iceberg, Hudi, and Delta Lake tables in the AWS Glue Data Catalog using the following methods:
-
AWS Glue crawler – AWS Glue crawlers can automatically discover and populate Iceberg, Hudi, and Delta Lake table metadata in the Data Catalog. For more information, see Using crawlers to populate the Data Catalog.
-
AWS Glue ETL Jobs – You can create ETL jobs to write data to Iceberg, Hudi, and Delta Lake tables and populate their metadata in the Data Catalog. For more information, see Using data lake frameworks with AWS Glue ETL jobs.
-
AWS Glue console, AWS Lake Formation console, AWS CLI, or API – You can use the AWS Glue console, the Lake Formation console, the AWS CLI, or the API to create and manage Iceberg table definitions in the Data Catalog.
Creating Apache Iceberg tables
You can create Apache Iceberg tables that use the Apache Parquet data format in the AWS Glue Data Catalog with data residing in Amazon S3. A table in the Data Catalog is the metadata definition that represents the data in a data store. By default, AWS Glue creates Iceberg v2 tables. For the difference between v1 and v2 tables, see Format version changes in the Apache Iceberg documentation.
You can use the AWS Glue console, the Lake Formation console, or the CreateTable operation in the AWS Glue API to create an Iceberg table in the Data Catalog. For more information, see CreateTable action (Python: create_table).
When you create an Iceberg table in the Data Catalog, you must specify the table format and metadata file path in Amazon S3 to be able to perform reads and writes.
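As a rough illustration of the CreateTable path, the following Python sketch builds the request for the boto3 Glue client's create_table call, passing the table format through OpenTableFormatInput and the Amazon S3 path through the storage descriptor. The database name, table name, bucket path, and column list are hypothetical placeholders, and the helper function names are invented for this example. The request is built with the standard library only; the actual call needs boto3 and valid AWS credentials.

```python
import json


def build_iceberg_create_table_request(database, table, location, columns, version="2"):
    """Build keyword arguments for the AWS Glue CreateTable operation.

    All argument values are caller-supplied placeholders; nothing here
    contacts AWS.
    """
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                # Column definitions as (name, type) pairs.
                "Columns": [{"Name": name, "Type": type_} for name, type_ in columns],
                # Amazon S3 path for the table (hypothetical in this sketch).
                "Location": location,
            },
        },
        # Declares the table format as Iceberg and selects the format version.
        "OpenTableFormatInput": {
            "IcebergInput": {"MetadataOperation": "CREATE", "Version": version}
        },
    }


def create_iceberg_table(request, region_name=None):
    """Send the request to AWS Glue. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so that building the request needs no SDK

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_table(**request)


# Hypothetical names, used only for illustration.
request = build_iceberg_create_table_request(
    database="example_db",
    table="example_iceberg_table",
    location="s3://amzn-s3-demo-bucket/example_iceberg_table/",
    columns=[("id", "int"), ("name", "string")],
)
print(json.dumps(request, indent=2))
```

Calling create_iceberg_table(request) would submit the request. Keeping the request builder separate from the call makes it possible to review the payload before anything is created in the Data Catalog.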
You can use Lake Formation to secure your Iceberg table using fine-grained access control permissions when you register the Amazon S3 data location with AWS Lake Formation. For source data in Amazon S3 and metadata that is not registered with Lake Formation, access is determined by IAM permissions policies for Amazon S3 and AWS Glue actions. For more information, see Managing permissions.
Note
The Data Catalog doesn’t support creating partitions or adding Iceberg table properties.
Prerequisites
To create Iceberg tables in the Data Catalog and set up Lake Formation data access permissions, complete the following requirements:
-
Permissions required to create Iceberg tables without the data registered with Lake Formation:
In addition to the permissions required to create a table in the Data Catalog, the table creator requires the following permissions:
-
s3:PutObject on resource arn:aws:s3:::{bucketName}
-
s3:GetObject on resource arn:aws:s3:::{bucketName}
-
s3:DeleteObject on resource arn:aws:s3:::{bucketName}
-
Permissions required to create Iceberg tables with data registered with Lake Formation:
To use Lake Formation to manage and secure the data in your data lake, register the Amazon S3 location that contains the table data with Lake Formation. Registration allows Lake Formation to vend credentials to AWS analytical services such as Athena, Redshift Spectrum, and Amazon EMR so that they can access the data. For more information on registering an Amazon S3 location, see Adding an Amazon S3 location to your data lake.
A principal who reads and writes the underlying data that is registered with Lake Formation requires the following permissions:
-
lakeformation:GetDataAccess
-
DATA_LOCATION_ACCESS
A principal who has data location permissions on a location also has location permissions on all child locations.
For more information on data location permissions, see Underlying data access control.
-
To enable compaction, the service needs to assume an IAM role that has permissions to update tables in the Data Catalog. For details, see Table optimization prerequisites.
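The permissions listed in these prerequisites can be sketched as plain Python data structures. The example below builds an IAM policy document with the three Amazon S3 actions, a second policy with lakeformation:GetDataAccess, and a request for the boto3 Lake Formation client's grant_permissions call that grants DATA_LOCATION_ACCESS. The function names, bucket name, and role ARN are hypothetical. The object-level resource (the bucket ARN followed by /*) is an assumption added alongside the bucket ARN given in the prerequisites, because S3 object actions are normally evaluated against object ARNs; adjust the resources to your own layout.

```python
import json


def build_table_creator_s3_policy(bucket_name):
    """IAM policy document with the S3 actions listed in the prerequisites."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                # Bucket ARN as written in the prerequisites, plus the
                # object-level ARN (an assumption of this sketch).
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


def build_get_data_access_policy():
    """IAM policy document that allows lakeformation:GetDataAccess."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lakeformation:GetDataAccess",
                "Resource": "*",
            }
        ],
    }


def build_data_location_grant(principal_arn, bucket_name):
    """Keyword arguments for the Lake Formation GrantPermissions operation."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": f"arn:aws:s3:::{bucket_name}"}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


# Hypothetical bucket and role, used only for illustration.
s3_policy = build_table_creator_s3_policy("amzn-s3-demo-bucket")
grant = build_data_location_grant(
    "arn:aws:iam::111122223333:role/ExampleTableCreator", "amzn-s3-demo-bucket"
)
print(json.dumps(s3_policy, indent=2))
print(json.dumps(grant, indent=2))
```

The grant dictionary could be passed to boto3.client("lakeformation").grant_permissions(**grant) once the Amazon S3 location is registered with Lake Formation. Nothing in the sketch contacts AWS by itself.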
Creating an Iceberg table
You can create Iceberg v1 and v2 tables using the AWS Glue console, the Lake Formation console, or the AWS Command Line Interface (AWS CLI), as documented on this page. You can also create Iceberg tables using the AWS Glue crawler. For more information, see Data Catalog and Crawlers in the AWS Glue Developer Guide.
To create an Iceberg table