Accelerating data discovery with S3 Metadata - Amazon Simple Storage Service

Note

The S3 Metadata feature is in preview release for Amazon S3 and is subject to change.

Amazon S3 Metadata accelerates data discovery by automatically capturing metadata for the objects in your general purpose buckets and storing it in read-only, fully managed Apache Iceberg tables that you can query. These read-only tables are called metadata tables. As objects are added to, updated in, and removed from your general purpose buckets, S3 Metadata automatically refreshes the corresponding metadata tables to reflect the latest changes.

By default, S3 Metadata provides three types of metadata:

  • System-defined metadata, such as an object's creation time and storage class

  • Custom metadata, such as tags and user-defined metadata that was included during object upload

  • Event metadata, such as when an object is updated or deleted, and the AWS account that made the request

For details about what data is stored in metadata tables, see S3 Metadata tables schema.

With S3 Metadata, you can easily find, store, and query metadata for your S3 objects, so that you can quickly prepare data for use in business analytics, content retrieval, artificial intelligence and machine learning (AI/ML) model training, and more.

Metadata tables are stored in S3 table buckets, which provide storage that's optimized for tabular data. To query your metadata, you can integrate your table bucket with AWS Glue Data Catalog. After the integration is in place, you can directly query your metadata tables with query engines such as Amazon Athena, Amazon EMR, Amazon Redshift, Apache Spark, and Trino. You can also query your metadata tables with any other application that supports the Apache Iceberg format. To create dashboards from your metadata tables, use Amazon QuickSight.
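As an illustration of querying a metadata table through Amazon Athena, the following minimal sketch builds a SQL statement and submits it through a client object that is passed in (for example, one created with boto3.client("athena")). The column names (key, size, storage_class, record_type, record_timestamp), the 'CREATE' record type value, and the table, catalog, and database names are assumptions for illustration only. Check them against S3 Metadata tables schema and your own Glue Data Catalog integration before use.

```python
import re

# Only plain identifiers are accepted, so the values can be embedded in SQL safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_storage_class_query(table_name, storage_class, limit=100):
    """Build SQL that lists the newest created objects in one storage class.

    Column names and the 'CREATE' value are assumed, not confirmed here.
    """
    if not _IDENT.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    if not _IDENT.match(storage_class):
        raise ValueError(f"unsafe storage class: {storage_class!r}")
    return (
        "SELECT key, size, storage_class, record_timestamp "
        f'FROM "{table_name}" '
        f"WHERE record_type = 'CREATE' AND storage_class = '{storage_class}' "
        f"ORDER BY record_timestamp DESC LIMIT {int(limit)}"
    )


def start_query(athena_client, sql, catalog, database, output_location):
    """Submit the query with an Athena client and return the execution ID."""
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Catalog": catalog, "Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Passing the client in as a parameter keeps the sketch independent of any particular credentials or Region setup. The catalog and database values depend on how your table bucket was integrated with AWS Glue Data Catalog.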

For S3 Metadata pricing, see Amazon S3 Pricing.

How metadata tables work

Metadata tables are managed by Amazon S3, and can't be modified by any IAM principal outside of Amazon S3 itself. (You can, however, delete your metadata tables.) As a result, metadata tables are read-only, which helps ensure that they correctly reflect the contents of your bucket.

To keep your Apache Iceberg metadata tables performing at their best, Amazon S3 performs periodic maintenance activities on your tables, such as compaction and unreferenced file removal. These maintenance activities help to both minimize the cost of storing your metadata tables and optimize query performance. This table maintenance happens automatically, requiring no opt-in or ongoing management by you. However, if needed, you can configure these table maintenance activities. For more information, see Table bucket maintenance.

Note

S3 Metadata is designed to continuously append to the metadata table as you make changes to your general purpose bucket. Each update creates a snapshot, which is a new version of the metadata table. Because the metadata table is read-only, you can't delete records from it. You also can't use the snapshot expiration capability of S3 Tables to expire old snapshots of your metadata table.

To help minimize your costs, you can periodically delete your metadata table configuration and your metadata tables, and then recreate them. For more information, see Deleting metadata table configurations and Deleting metadata tables.
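Because records are only ever appended, a metadata table behaves like a change log rather than a current inventory. The following minimal sketch shows one way to reduce such a log to the set of objects that currently exist. The ChangeRecord type, its integer sequence field, and the record type values are hypothetical simplifications for illustration, not the actual table schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    key: str
    sequence: int      # hypothetical ordering value; larger means later
    record_type: str   # assumed values: "CREATE", "UPDATE_METADATA", "DELETE"
    size: int = 0


def current_objects(records):
    """Return the latest record per key, omitting keys whose last event is a delete."""
    latest = {}
    for rec in records:
        seen = latest.get(rec.key)
        if seen is None or rec.sequence > seen.sequence:
            latest[rec.key] = rec
    return {k: r for k, r in latest.items() if r.record_type != "DELETE"}


log = [
    ChangeRecord("a.txt", 1, "CREATE", 10),
    ChangeRecord("b.txt", 2, "CREATE", 20),
    ChangeRecord("a.txt", 3, "DELETE"),
    ChangeRecord("b.txt", 4, "UPDATE_METADATA", 20),
]
state = current_objects(log)
```

In this example, the log still holds all four records, while the derived state contains only b.txt. This mirrors why the table keeps growing even when objects are deleted from the bucket.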

To generate and store object metadata in an S3 managed metadata table, you create a metadata table configuration for your general purpose bucket. Amazon S3 is designed to continuously update the metadata table to reflect the latest changes to your data as long as the configuration is active on the bucket.

To create a metadata table configuration, you need the AWS Identity and Access Management (IAM) permissions to create and manage metadata tables. For more information, see Setting up permissions for configuring metadata tables. You must also create or specify an S3 table bucket to store your metadata table in. This table bucket must be in the same AWS Region and account as your general purpose bucket. For more information about creating table buckets, see Creating table buckets.
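The same-Region and same-account requirement can be checked before any request is sent. The following minimal sketch parses a table bucket ARN, compares it with the Region and account of the general purpose bucket, and returns request parameters. The helper name and all example values are hypothetical. The parameter layout follows the CreateBucketMetadataTableConfiguration request shape as I understand it, so verify it against the current API reference.

```python
def build_metadata_table_request(bucket, bucket_region, account_id,
                                 table_bucket_arn, table_name):
    """Validate the table bucket ARN and build request parameters.

    Expected ARN form: arn:<partition>:s3tables:<region>:<account-id>:bucket/<name>
    """
    parts = table_bucket_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3tables":
        raise ValueError(f"not an S3 table bucket ARN: {table_bucket_arn!r}")
    arn_region, arn_account = parts[3], parts[4]
    if arn_region != bucket_region:
        raise ValueError(
            f"table bucket is in {arn_region}, but the bucket is in {bucket_region}"
        )
    if arn_account != account_id:
        raise ValueError("table bucket belongs to a different AWS account")
    return {
        "Bucket": bucket,
        "MetadataTableConfiguration": {
            "S3TablesDestination": {
                "TableBucketArn": table_bucket_arn,
                "TableName": table_name,
            }
        },
    }


# Hypothetical values for illustration only.
request = build_metadata_table_request(
    bucket="amzn-s3-demo-bucket",
    bucket_region="us-east-1",
    account_id="111122223333",
    table_bucket_arn="arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket",
    table_name="my_metadata_table",
)
# With boto3, these parameters would be passed along the lines of:
#   s3_client.create_bucket_metadata_table_configuration(**request)
```

The check is purely local and doesn't replace the service-side validation. It only surfaces an obvious mismatch earlier.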

Note

S3 Metadata doesn't apply to any objects that already existed in your general purpose bucket before you created your metadata table configuration. In other words, S3 Metadata captures metadata only for change events (such as uploads, updates, and deletes) that happen after you create your metadata table configuration.

To monitor updates to your metadata table configuration, you can use AWS CloudTrail. For more information, see Amazon S3 bucket-level actions that are tracked by CloudTrail logging.
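As a rough sketch of this kind of monitoring, the following function looks up recent CloudTrail events by event name through a client that is passed in (for example, one created with boto3.client("cloudtrail")). The event name in the usage comment is an assumption that mirrors the API operation name. Confirm the exact names in the CloudTrail documentation for Amazon S3.

```python
def recent_config_events(cloudtrail_client, event_name, max_results=20):
    """Return (time, event name, user name) tuples for matching events."""
    response = cloudtrail_client.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        MaxResults=max_results,
    )
    return [
        (e.get("EventTime"), e.get("EventName"), e.get("Username"))
        for e in response.get("Events", [])
    ]


# Example call (event name assumed, not confirmed):
#   recent_config_events(client, "CreateBucketMetadataTableConfiguration")
```

This sketch reads only the first page of results. A longer history would need pagination with the NextToken value that the response returns.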