Example metadata table queries
Note
The S3 Metadata feature is in preview release for Amazon S3 and is subject to change.
The following examples show how you can get different types information from your S3 Metadata tables by using standard SQL queries.
Remember when using these examples:
-
The examples are written to work with Amazon Athena. You might have to modify the examples to work with a different query engine.
-
Make sure that you understand how to optimize your queries.
-
Replace
with the name of the S3 table bucket that's storing your metadata table.amzn-s3-demo-bucket
-
Replace
with the name of the metadata table that you're querying.my_metadata_table
-
For a full list of supported columns, see the S3 Metadata tables schema.
Finding objects by file extension
The following query returns objects with a specific file extension (.jpg
in
this case).
SELECT key FROM "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
" WHERE key LIKE '%.jpg' AND record_type = 'CREATE'
Listing object deletions
The following query returns object deletion events, including the AWS account ID or AWS service principal that made the request.
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, requester, source_ip_address, version_id FROM "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
" WHERE record_type = 'DELETE';
Listing AWS KMS encryption keys used by your objects
The following query returns the ARNs of the AWS Key Management Service (AWS KMS) keys encrypting your objects.
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
";
Listing objects that don't use KMS keys
The following query returns objects that aren't encrypted with AWS KMS keys.
SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
" WHERE encryption_status NOT IN ('SSE-KMS', 'DSSE-KMS') AND record_type = 'CREATE';
Viewing metadata provided by Amazon Bedrock
Some AWS services (such as Amazon Bedrock),
upload objects to Amazon S3. You can query the object metadata provided by these services. For
example, the following query includes the user_metadata
column to determine if
there are objects uploaded by Amazon Bedrock to a general purpose bucket.
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, user_metadata FROM "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
" WHERE record_type = 'CREATE' AND user_metadata['content-source'] = 'AmazonBedrock';
If Amazon Bedrock uploaded an object to your bucket, the user_metadata
column will
display the following metadata associated with the object in the query result:
user_metadata {content-additional-params -> requestid="CVK8FWYRW0M9JW65", signedContentSHA384="38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b", content-model-id -> bedrock-model-arn, content-source -> AmazonBedrock}
Understanding the current state of your objects
The following query can help you determine the current state of your objects. The query
identifies the most recent version of each object, filters out deleted objects, and marks
the latest version of each object based on sequence numbers. Results are ordered by the
bucket
, key
, and sequence_number
columns.
WITH records_of_interest as ( -- Start with a query that can narrow down the records of interest. SELECT * from "s3tablescatalog/
amzn-s3-demo-bucket
"."aws_s3_metadata"."my_metadata_table
" ), version_stacks as ( SELECT *, -- Introduce a column called 'next_sequence_number', which is the next larger -- sequence_number for the same key version_id in sorted order. LEAD(sequence_number, 1) over (partition by (bucket, key, coalesce(version_id, '')) order by sequence_number ASC) as next_sequence_number from records_of_interest ), -- Pick the 'tip' of each version stack triple: (bucket, key, version_id). -- The tip of the version stack is the row of that triple with the largest sequencer. -- Selecting only the tip filters out any row duplicates. -- This isn't typical, but some events can be delivered more than once to the table -- and include rows that might no longer exist in the bucket (since the -- table contains rows for both extant and extinct objects). -- In the next subquery, eliminate the rows that contain deleted objects. current_versions as ( SELECT * from version_stacks where next_sequence_number is NULL ), -- Eliminate the rows that are extinct from the bucket by filtering with -- record_type. An object version has been deleted from the bucket if its tip is -- record_type==DELETE. existing_current_versions as ( SELECT * from current_versions where not (record_type = 'DELETE' and is_delete_marker = FALSE) ), -- Optionally, to determine which of several object versions is the 'latest', -- you can compare their sequence numbers. A version_id is the latest if its -- tip's sequencer is the largest among all other tips in the same key. with_is_latest as ( SELECT *, -- Determine if the sequence_number of this row is the same as the largest sequencer for the key that still exists. sequence_number = (MAX(sequence_number) over (partition by (bucket, key))) as is_latest_version FROM existing_current_versions ) SELECT * from with_is_latest ORDER BY bucket, key, sequence_number;