You can use the AWS Glue console, the AWS CLI, or the AWS API to enable compaction for your Apache Iceberg tables in the AWS Glue Data Catalog. For new tables, you can choose Apache Iceberg as the table format and enable compaction when you create the table. Compaction is disabled by default for new tables.
To enable compaction
- Open the AWS Glue console at https://console.aws.amazon.com/glue/ and sign in as a data lake administrator, the table creator, or a user who has been granted the glue:UpdateTable and lakeformation:GetDataAccess permissions on the table.
- In the navigation pane, under Data Catalog, choose Tables.
- On the Tables page, choose the table in an open table format for which you want to enable compaction. Then, from the Actions menu, choose Optimization, and then choose Enable.
  You can also enable compaction from the Table details page: choose the Table optimization tab in the lower section of the page, and then choose Enable compaction.
  The Enable optimization option is also available when you create a new Iceberg table in the Data Catalog.
- On the Enable optimization page, choose Compaction under Optimization options.
- Select an IAM role from the drop-down list that has the permissions shown in the Table optimization prerequisites section.
  You can also choose the Create a new IAM role option to create a custom role with the permissions required to run compaction.
  To update an existing IAM role, follow these steps:
  - To update the permissions policy for the IAM role, in the IAM console, go to the IAM role that is used to run compaction.
  - In the Add permissions section, choose Create policy. In the browser window that opens, create a new policy to use with your role.
  - On the Create policy page, choose the JSON tab. Copy the JSON code shown in the Prerequisites section into the policy editor field.
- If your security policy configuration requires the Iceberg table optimizer to access Amazon S3 buckets from a specific virtual private cloud (VPC), create an AWS Glue network connection or use an existing one.
  If you don't already have an AWS Glue VPC connection, create one by following the steps in the Creating connections for connectors section, using the AWS Glue console or the AWS CLI/SDK.
- Choose Enable optimization.
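The console steps above can also be approximated through the AWS API. The following is a minimal sketch, not a definitive implementation: it assumes the boto3 Glue client's `create_table_optimizer` operation with a `TableOptimizerConfiguration` made up of `roleArn`, `enabled`, and an optional `vpcConfiguration`. The helper names `build_compaction_request` and `enable_compaction`, and every account ID, database, table, role, and connection name, are hypothetical placeholders. Verify the request shape against the current API reference before relying on it.

```python
from typing import Optional


def build_compaction_request(
    catalog_id: str,
    database_name: str,
    table_name: str,
    role_arn: str,
    glue_connection_name: Optional[str] = None,
) -> dict:
    """Build keyword arguments for glue.create_table_optimizer (assumed shape)."""
    config = {"roleArn": role_arn, "enabled": True}
    if glue_connection_name:
        # Only needed when the optimizer must reach Amazon S3 from a specific VPC.
        config["vpcConfiguration"] = {"glueConnectionName": glue_connection_name}
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database_name,
        "TableName": table_name,
        "Type": "compaction",
        "TableOptimizerConfiguration": config,
    }


def enable_compaction(**kwargs) -> None:
    """Send the request. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    glue = boto3.client("glue")
    glue.create_table_optimizer(**build_compaction_request(**kwargs))


if __name__ == "__main__":
    # Placeholder values for illustration only; nothing is sent to AWS here.
    request = build_compaction_request(
        catalog_id="111122223333",
        database_name="my_database",
        table_name="my_iceberg_table",
        role_arn="arn:aws:iam::111122223333:role/MyCompactionRole",
    )
    print(request)
```

Keeping the request builder separate from the call makes it possible to inspect or log the payload before anything is sent. The IAM role passed in `role_arn` still needs the permissions described in the Table optimization prerequisites section.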
After you enable compaction, the Table optimization tab shows the following compaction details (after approximately 15-20 minutes):
- Start time
  The time at which the compaction process started within Data Catalog. The value is a timestamp in UTC time.
- End time
  The time at which the compaction process ended in Data Catalog. The value is a timestamp in UTC time.
- Status
  The status of the compaction run. Values are success or fail.
- Files compacted
  Total number of files compacted.
- Bytes compacted
  Total number of bytes compacted.
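The same details can likely be read programmatically. The sketch below assumes the boto3 Glue client's `get_table_optimizer` operation and a response in which `TableOptimizer.lastRun` carries `startTimestamp`, `endTimestamp`, `eventType`, and a `metrics` map with `NumberOfFilesCompacted` and `NumberOfBytesCompacted`. These field names are an assumption about the response shape and may differ between API versions, so the code reads them defensively. The helper names `summarize_last_run` and `fetch_last_run` are hypothetical.

```python
def summarize_last_run(response: dict) -> dict:
    """Map an (assumed) GetTableOptimizer response onto the console fields."""
    last_run = response.get("TableOptimizer", {}).get("lastRun", {})
    metrics = last_run.get("metrics", {})
    return {
        "start_time": last_run.get("startTimestamp"),
        "end_time": last_run.get("endTimestamp"),
        "status": last_run.get("eventType"),
        "files_compacted": metrics.get("NumberOfFilesCompacted"),
        "bytes_compacted": metrics.get("NumberOfBytesCompacted"),
    }


def fetch_last_run(catalog_id: str, database_name: str, table_name: str) -> dict:
    """Call the API and summarize the result. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so summarize_last_run works without boto3

    glue = boto3.client("glue")
    response = glue.get_table_optimizer(
        CatalogId=catalog_id,
        DatabaseName=database_name,
        TableName=table_name,
        Type="compaction",
    )
    return summarize_last_run(response)
```

Any field that is absent from the response comes back as `None`, which is what you would expect to see before the first compaction run has been recorded.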