S3 Tables maintenance
S3 Tables offers maintenance operations to enhance the management and performance of your tables. The following options are enabled by default for all tables. You can edit or disable them by specifying maintenance configuration files for your S3 table.
Editing this configuration requires the s3tables:GetTableMaintenanceConfiguration and s3tables:PutTableMaintenanceConfiguration permissions.
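As a rough illustration of the permissions above, the following sketch prints an IAM policy document that allows only these two actions. The function name maintenance_policy is hypothetical, and scoping the policy to every table in one table bucket (the bucket ARN followed by /table/*) is an assumption; adjust the resource to match your own setup.

```shell
# Hypothetical helper: print an IAM policy that grants the two permissions
# needed to read and edit table maintenance configurations.
# $1 is the table bucket ARN; the "/table/*" suffix is an assumed scope
# that covers every table in that bucket.
maintenance_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:GetTableMaintenanceConfiguration",
        "s3tables:PutTableMaintenanceConfiguration"
      ],
      "Resource": "$1/table/*"
    }
  ]
}
EOF
}

maintenance_policy "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket"
```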
Compaction
Compaction combines smaller objects into fewer, larger objects to improve Iceberg query performance. While combining objects, compaction also applies the effects of row-level deletes in your table. Amazon S3 compacts tables based on a target file size optimal for your data access pattern, or a value you specify. The compacted files are written as the most recent snapshot of your table. Compaction is enabled by default for all tables, with a default target file size of 512 MB.
Note
Compaction is supported only for Apache Parquet file types.
You can configure compaction only at the table level. Compaction incurs an additional cost. For more information, see the pricing information on the Amazon S3 pricing page.
To configure the compaction target file size by using the AWS CLI
The following example changes the target file size to 256 MB by using the PutTableMaintenanceConfiguration API.
    aws s3tables put-table-maintenance-configuration \
        --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-bucket1 \
        --type icebergCompaction \
        --namespace mynamespace \
        --name testtable \
        --value='{"status":"enabled","settings":{"icebergCompaction":{"targetFileSizeMB":256}}}'
For more information, see put-table-maintenance-configuration
in the AWS CLI Command Reference.
To disable compaction by using the AWS CLI
The following example disables compaction by using the PutTableMaintenanceConfiguration API.
    aws s3tables put-table-maintenance-configuration \
        --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
        --type icebergCompaction \
        --namespace mynamespace \
        --name testtable \
        --value='{"status":"disabled","settings":{"icebergCompaction":{"targetFileSizeMB":512}}}'
For more information, see put-table-maintenance-configuration in the AWS CLI Command Reference.
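To confirm that a change took effect, you can read the configuration back. The sketch below wraps the get-table-maintenance-configuration command in a hypothetical function named show_maintenance_config. The ARN, namespace, and table name in the commented call are the placeholder values used in the compaction examples, not real resources.

```shell
# Hypothetical wrapper: print the current maintenance configuration of one table.
# Arguments: table bucket ARN, namespace, table name.
show_maintenance_config() {
  aws s3tables get-table-maintenance-configuration \
    --table-bucket-arn "$1" \
    --namespace "$2" \
    --name "$3"
}

# Example call (requires AWS credentials and the
# s3tables:GetTableMaintenanceConfiguration permission):
# show_maintenance_config \
#   arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
#   mynamespace testtable
```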
Snapshot management
Snapshot management determines the number of active snapshots for your table. This is based on the MinimumSnapshots (1 by default) and MaximumSnapshotAge (120 hours by default) settings. Snapshot management expires and removes table snapshots based on these configurations.
When a snapshot expires, Amazon S3 marks any objects referenced only by that snapshot as noncurrent. These noncurrent objects are deleted after the number of days specified by the NoncurrentDays property in your unreferenced file removal policy.
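The way the two settings interact can be sketched as follows. This is only an illustration under an assumption: it uses Iceberg-style semantics in which a snapshot expires when it is older than MaximumSnapshotAge and is not among the newest MinimumSnapshots snapshots. It is not the service's actual implementation, and the function name expirable_snapshots and the sample ages are hypothetical.

```shell
# Illustration only (assumed semantics, not the Amazon S3 implementation).
# Usage: expirable_snapshots MIN_SNAPSHOTS MAX_AGE_HOURS AGE_HOURS...
# Snapshot ages must be listed newest first. Prints the ages that would expire.
expirable_snapshots() {
  min_keep=$1
  max_age=$2
  shift 2
  position=0
  for age in "$@"; do
    position=$((position + 1))
    # The newest MIN_SNAPSHOTS snapshots are kept regardless of their age.
    if [ "$position" -gt "$min_keep" ] && [ "$age" -gt "$max_age" ]; then
      echo "$age"
    fi
  done
}

# Defaults (1 snapshot, 120 hours) applied to snapshots aged 2, 100, 130, and 400 hours.
expirable_snapshots 1 120 2 100 130 400
```

With these sample values, the sketch prints 130 and 400: both snapshots are older than 120 hours, and neither is the single newest snapshot.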
Note
Deletion of noncurrent objects is permanent, and deleted objects cannot be recovered. To view or recover objects that have been marked as noncurrent, you must contact AWS Support. For information about contacting AWS Support, see Contact AWS.
Snapshot management determines which objects to delete from your table with reference only to that table. References to these objects from outside the table do not prevent snapshot management from deleting them.
Note
Snapshot management does not support retention values you configure as Iceberg table properties in the metadata.json file or through an ALTER TABLE SET TBLPROPERTIES SQL command, including branch or tag-based retention. Snapshot management is disabled when you configure a branch or tag-based retention policy, or configure a retention policy on the metadata.json file that is longer than the values configured through the PutTableMaintenanceConfiguration API. In these cases, S3 will not expire or remove snapshots, and you will need to manually delete snapshots or remove the properties from your Iceberg table to avoid storage charges.
You can configure snapshot management only at the table level. For more information, see the pricing information on the Amazon S3 pricing page.
To configure snapshot management by using the AWS CLI
The following example sets MinimumSnapshots to 10 and MaximumSnapshotAge to 2500 hours by using the PutTableMaintenanceConfiguration API.
    aws --region us-east-1 s3tables put-table-maintenance-configuration \
        --table-bucket-arn arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
        --namespace my_namespace \
        --name my_table \
        --type icebergSnapshotManagement \
        --value '{"status":"enabled","settings":{"icebergSnapshotManagement":{"minSnapshotsToKeep":10,"maxSnapshotAgeHours":2500}}}'
For more information, see put-table-maintenance-configuration in the AWS CLI Command Reference.
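Because snapshot management is configured per table, applying the same settings to several tables means repeating the call for each one. The following is a minimal sketch, assuming all the tables share one table bucket and one namespace. The function names snapshot_value and set_snapshot_management, and the table names in the commented call, are hypothetical.

```shell
# Print the --value JSON for the icebergSnapshotManagement maintenance type.
# Arguments: minimum snapshots to keep, maximum snapshot age in hours.
snapshot_value() {
  printf '{"status":"enabled","settings":{"icebergSnapshotManagement":{"minSnapshotsToKeep":%d,"maxSnapshotAgeHours":%d}}}' "$1" "$2"
}

# Hypothetical helper: apply the same snapshot settings to each listed table.
# Arguments: table bucket ARN, namespace, minimum snapshots,
#            maximum age in hours, then one or more table names.
set_snapshot_management() {
  bucket_arn=$1
  namespace=$2
  value=$(snapshot_value "$3" "$4")
  shift 4
  for table in "$@"; do
    # Stop at the first table that fails so the error is not overlooked.
    aws s3tables put-table-maintenance-configuration \
      --table-bucket-arn "$bucket_arn" \
      --namespace "$namespace" \
      --name "$table" \
      --type icebergSnapshotManagement \
      --value "$value" || return 1
  done
}

# Example call (requires AWS credentials and the
# s3tables:PutTableMaintenanceConfiguration permission):
# set_snapshot_management \
#   arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
#   my_namespace 10 2500 my_table my_other_table
```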
Considerations and limitations
To learn more about additional considerations and limits for compaction and snapshot management, see Considerations and limitations for maintenance jobs.