S3 Tables maintenance - Amazon Simple Storage Service

S3 Tables maintenance

S3 Tables offers maintenance operations to enhance the management and performance of your table. The following options are enabled by default for all tables. You can edit or disable these by specifying maintenance configuration files for your S3 table.

Editing this configuration requires the s3tables:GetTableMaintenanceConfiguration and s3tables:PutTableMaintenanceConfiguration permissions.

Compaction

Compaction combines smaller objects into fewer, larger objects to improve Iceberg query performance. While combining objects, compaction also applies the effects of row-level deletes in your table. Amazon S3 compacts tables to a target file size that is either chosen to suit your data access pattern or set to a value you specify. The compacted files are written as the most recent snapshot of your table. Compaction is enabled by default for all tables, with a default target file size of 512 MB.

Note

Compaction is supported only for Apache Parquet files.

You can only configure compaction at the table level. Compaction incurs an additional cost. For more information, see Amazon S3 pricing.

To configure the compaction target file size by using the AWS CLI

The following example changes the target file size to 256 MB using the PutTableMaintenanceConfiguration API.

aws s3tables put-table-maintenance-configuration \
    --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-bucket1 \
    --type icebergCompaction \
    --namespace mynamespace \
    --name testtable \
    --value='{"status":"enabled","settings":{"icebergCompaction":{"targetFileSizeMB":256}}}'

For more information, see put-table-maintenance-configuration in the AWS CLI Command Reference.
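Because the --value argument is nested JSON, it can be less error-prone to generate it than to type it by hand. The following is a minimal Python sketch under that assumption, not an AWS tool: the helper name compaction_value is hypothetical, and the JSON keys are the ones used in the CLI example in this section.

```python
import json


def compaction_value(target_file_size_mb, enabled=True):
    """Return the JSON string to pass as --value for an icebergCompaction configuration.

    Hypothetical helper; the keys mirror the CLI example in this section.
    """
    if target_file_size_mb <= 0:
        raise ValueError("target_file_size_mb must be positive")
    value = {
        "status": "enabled" if enabled else "disabled",
        # The settings object is keyed by the maintenance type.
        "settings": {"icebergCompaction": {"targetFileSizeMB": target_file_size_mb}},
    }
    # Compact separators produce the single-line form used on the command line.
    return json.dumps(value, separators=(",", ":"))


print(compaction_value(256))
```

The printed string can be pasted inside single quotes after --value in the command above.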

To disable compaction by using the AWS CLI

The following example will disable compaction using the PutTableMaintenanceConfiguration API.

aws s3tables put-table-maintenance-configuration \
    --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
    --type icebergCompaction \
    --namespace mynamespace \
    --name testtable \
    --value='{"status":"disabled","settings":{"icebergCompaction":{"targetFileSizeMB":512}}}'

For more information, see put-table-maintenance-configuration in the AWS CLI Command Reference.

Snapshot management

Snapshot management determines the number of active snapshots for your table based on two settings: MinimumSnapshots (1 by default) and MaximumSnapshotAge (120 hours by default). It expires and removes table snapshots according to these settings.
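To make the interaction of the two settings concrete, here is a small Python sketch of one plausible reading of the rule, following the usual Iceberg retention semantics: a snapshot is eligible for expiration only if it is older than MaximumSnapshotAge and is not among the MinimumSnapshots most recent snapshots. This is an assumption for illustration, not a description of how S3 implements snapshot management; the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_expire(snapshot_times, now, min_snapshots=1, max_age_hours=120):
    """Return the timestamps of snapshots eligible for expiration.

    Assumed rule (not confirmed by the source): a snapshot is eligible when it
    is older than max_age_hours AND is not one of the min_snapshots newest.
    """
    newest_first = sorted(snapshot_times, reverse=True)
    cutoff = now - timedelta(hours=max_age_hours)
    # Skip the newest min_snapshots entries; they are always retained.
    candidates = newest_first[min_snapshots:]
    return [t for t in candidates if t < cutoff]


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
# Four snapshots taken 1, 100, 130, and 200 hours ago.
times = [now - timedelta(hours=h) for h in (1, 100, 130, 200)]
expired = snapshots_to_expire(times, now)
expired_ages = [int((now - t).total_seconds() // 3600) for t in expired]
print(expired_ages)  # [130, 200]
```

With the defaults, the snapshots that are 130 and 200 hours old are eligible, while the 100-hour-old snapshot is still within the 120-hour window. Raising min_snapshots to 4 would retain all four regardless of age.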

When a snapshot expires, Amazon S3 marks any objects referenced only by that snapshot as noncurrent. These noncurrent objects are deleted after the number of days specified by the NoncurrentDays property in your unreferenced file removal policy.

Note

Deletion of noncurrent objects is permanent. Deleted objects cannot be recovered.

To view or recover objects that have been marked as noncurrent, you must contact AWS Support. For information about contacting AWS Support, see Contact AWS or the AWS Support Documentation.

Snapshot management determines which objects to delete from your table with reference only to that table. References to these objects from outside the table do not prevent snapshot management from deleting them.

Note

Snapshot management does not support retention values you configure as Iceberg table properties in the metadata.json file or through an ALTER TABLE SET TBLPROPERTIES SQL command, including branch or tag-based retention. Snapshot management is disabled when you configure a branch or tag-based retention policy, or configure a retention policy on the metadata.json file that is longer than the values configured through the PutTableMaintenanceConfiguration API. In these cases, S3 does not expire or remove snapshots, and you must manually delete snapshots or remove the properties from your Iceberg table to avoid storage charges.

You can only configure snapshot management at the table level. For more information, see Amazon S3 pricing.

To configure snapshot management by using the AWS CLI

The following example sets MinimumSnapshots to 10 and MaximumSnapshotAge to 2500 hours using the PutTableMaintenanceConfiguration API.

aws --region us-east-1 s3tables put-table-maintenance-configuration \
    --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
    --namespace my_namespace \
    --name my_table \
    --type icebergSnapshotManagement \
    --value '{"status":"enabled","settings":{"icebergSnapshotManagement":{"minSnapshotsToKeep":10,"maxSnapshotAgeHours":2500}}}'

For more information, see put-table-maintenance-configuration in the AWS CLI Command Reference.
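The --type argument and the key inside settings must name the same maintenance type, and it is easy for the two to drift apart when editing the command by hand. The Python sketch below shows one way to keep them in step by deriving both from a single parameter. The helper put_maintenance_command is hypothetical. It only assembles the argument list and does not contact AWS, and the flags and JSON keys are the ones shown in this section.

```python
import json
import shlex


def put_maintenance_command(table_bucket_arn, namespace, name,
                            config_type, settings, status="enabled"):
    """Build the argv list for `aws s3tables put-table-maintenance-configuration`.

    Hypothetical helper: config_type is used for both --type and the settings
    key, so the two cannot disagree.
    """
    value = {"status": status, "settings": {config_type: settings}}
    return [
        "aws", "s3tables", "put-table-maintenance-configuration",
        "--table-bucket-arn", table_bucket_arn,
        "--namespace", namespace,
        "--name", name,
        "--type", config_type,
        "--value", json.dumps(value, separators=(",", ":")),
    ]


cmd = put_maintenance_command(
    "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket",
    "my_namespace",
    "my_table",
    "icebergSnapshotManagement",
    {"minSnapshotsToKeep": 10, "maxSnapshotAgeHours": 2500},
)
# shlex.join quotes the JSON so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

If the AWS CLI is installed and configured, the list could be passed to subprocess.run(cmd, check=True) instead of being printed.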

Considerations and limitations

To learn more about additional considerations and limits for compaction and snapshot management, see Considerations and limitations for maintenance jobs.