

# Cataloging and analyzing your data with S3 Inventory
<a name="storage-inventory"></a>

You can use Amazon S3 Inventory to help manage your storage. For example, you can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs. You can also simplify and speed up business workflows and big data jobs by using Amazon S3 Inventory, which provides a scheduled alternative to the Amazon S3 synchronous `List` API operations. Amazon S3 Inventory does not use the `List` API operations to audit your objects and does not affect the request rate of your bucket.

Amazon S3 Inventory provides comma-separated values (CSV), [Apache optimized row columnar (ORC)](https://orc.apache.org/), or [Apache Parquet](https://parquet.apache.org/) output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or for objects with a shared prefix (that is, objects that have names that begin with a common string). If you set up a weekly inventory, a report is generated every Sunday (UTC) after the initial report. For information about Amazon S3 Inventory pricing, see [Amazon S3 pricing](https://aws.amazon.com/s3/pricing/).

You can configure multiple inventory lists for a bucket. When you're configuring an inventory list, you can specify the following: 
+ What object metadata to include in the inventory
+ Whether to list all object versions or only current versions
+ Where to store the inventory list file output
+ Whether to generate the inventory on a daily or weekly basis
+ Whether to encrypt the inventory list file

You can query Amazon S3 Inventory with standard SQL queries by using [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/what-is.html), [Amazon Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html), and other tools, such as [Presto](https://prestodb.io/), [Apache Hive](https://hive.apache.org/), and [Apache Spark](https://databricks.com/spark/about/). For more information about using Athena to query your inventory files, see [Querying Amazon S3 Inventory with Amazon Athena](storage-inventory-athena-query.md). 

**Note**  
It might take up to 48 hours for Amazon S3 to deliver the first inventory report.

**Note**  
After deleting an inventory configuration, Amazon S3 might still deliver one additional inventory report during a brief transition period while the system processes the deletion.

## Source and destination buckets
<a name="storage-inventory-buckets"></a>

The bucket that the inventory lists objects for is called the *source bucket*. The bucket where the inventory list file is stored is called the *destination bucket*. 

**Source bucket**

The inventory lists the objects that are stored in the source bucket. You can get an inventory list for an entire bucket, or you can filter the list by object key name prefix.

The source bucket:
+ Contains the objects that are listed in the inventory
+ Contains the configuration for the inventory

**Destination bucket**

Amazon S3 Inventory list files are written to the destination bucket. To group all the inventory list files in a common location in the destination bucket, you can specify a destination prefix in the inventory configuration.

The destination bucket:
+ Contains the inventory list files. 
+ Contains the manifest files that list all the inventory list files that are stored in the destination bucket. For more information, see [Inventory manifest](storage-inventory-location.md#storage-inventory-location-manifest).
+ Must have a bucket policy to give Amazon S3 permission to verify ownership of the bucket and permission to write files to the bucket. 
+ Must be in the same AWS Region as the source bucket.
+ Can be the same as the source bucket.
+ Can be owned by a different AWS account than the account that owns the source bucket.

## Amazon S3 Inventory list
<a name="storage-inventory-contents"></a>

An inventory list file contains a list of the objects in the source bucket and metadata for each object. An inventory list file is stored in the destination bucket with one of the following formats:
+ As a CSV file compressed with GZIP
+ As an Apache optimized row columnar (ORC) file compressed with ZLIB
+ As an Apache Parquet file compressed with Snappy

**Note**  
Objects in Amazon S3 Inventory reports aren't guaranteed to be in any particular order.

An inventory list file contains a list of the objects in the source bucket and metadata for each listed object. The following default fields are always included:
+ **Bucket name** – The name of the bucket that the inventory is for.
+ **ETag** – The entity tag (ETag) is a hash of the object. The ETag reflects changes only to the contents of an object, not to its metadata. The ETag can be an MD5 digest of the object data; whether it is depends on how the object was created and how it's encrypted. For more information, see [Object](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html) in the *Amazon Simple Storage Service API Reference*.
+ **Key name** – The object key name (or key) that uniquely identifies the object in the bucket. When you're using the CSV file format, the key name is URL-encoded and must be decoded before you can use it.
+ **Last modified date** – The object creation date or the last modified date, whichever is later.
+ **Size** – The object size in bytes, not including the size of incomplete multipart uploads, object metadata, and delete markers.
+ **Storage class** – The storage class that's used for storing the object. Set to `STANDARD`, `REDUCED_REDUNDANCY`, `STANDARD_IA`, `ONEZONE_IA`, `INTELLIGENT_TIERING`, `GLACIER`, `DEEP_ARCHIVE`, `OUTPOSTS`, `GLACIER_IR`, or `SNOW`. For more information, see [Understanding and managing Amazon S3 storage classes](storage-class-intro.md).
**Note**  
S3 Inventory does not support S3 Express One Zone.
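
As noted for the **Key name** field, CSV-format inventories URL-encode object keys. Decoding them needs only a standard URL-decoding routine; the following is a minimal Python sketch (the sample key is hypothetical):

```python
from urllib.parse import unquote

def decode_inventory_key(raw_key: str) -> str:
    """Decode a URL-encoded object key from a CSV inventory record."""
    return unquote(raw_key)

# A hypothetical key containing a space and a non-ASCII character:
print(decode_inventory_key("photos/2024/summer%20trip/caf%C3%A9.jpg"))
# prints: photos/2024/summer trip/café.jpg
```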

You can choose to include the following additional metadata fields in the report:
+ **Checksum algorithm** – Indicates the algorithm that's used to create the checksum for the object. For more information, see [Using supported checksum algorithms](checking-object-integrity-upload.md#using-additional-checksums).
+ **Encryption status** – The server-side encryption status, depending on the kind of encryption key that's used: server-side encryption with Amazon S3 managed keys (SSE-S3), server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS), dual-layer server-side encryption with AWS KMS keys (DSSE-KMS), or server-side encryption with customer-provided keys (SSE-C). Set to `SSE-S3`, `SSE-KMS`, `DSSE-KMS`, `SSE-C`, or `NOT-SSE`. A status of `NOT-SSE` means that the object is not encrypted with server-side encryption. For more information, see [Protecting data with encryption](UsingEncryption.md).
+ **S3 Intelligent-Tiering access tier** – Access tier (frequent or infrequent) of the object if it is stored in the S3 Intelligent-Tiering storage class. Set to `FREQUENT`, `INFREQUENT`, `ARCHIVE_INSTANT_ACCESS`, `ARCHIVE`, or `DEEP_ARCHIVE`. For more information, see [Storage class for automatically optimizing data with changing or unknown access patterns](storage-class-intro.md#sc-dynamic-data-access).
+ **S3 Object Lock retain until date** – The date until which the locked object cannot be deleted. For more information, see [Locking objects with Object Lock](object-lock.md).
+ **S3 Object Lock retention mode** – Set to `Governance` or `Compliance` for objects that are locked. For more information, see [Locking objects with Object Lock](object-lock.md).
+ **S3 Object Lock legal hold status** – Set to `On` if a legal hold has been applied to an object. Otherwise, it is set to `Off`. For more information, see [Locking objects with Object Lock](object-lock.md).
+ **Version ID** – The object version ID. When you enable versioning on a bucket, Amazon S3 assigns a version number to objects that are added to the bucket. For more information, see [Retaining multiple versions of objects with S3 Versioning](Versioning.md). (This field is not included if the list is configured only for the current version of the objects.)
+ **IsLatest** – Set to `True` if the object is the current version. (This field is not included if the list is configured only for the current version of the objects.)
+ **Delete marker** – Set to `True` if the object is a delete marker. For more information, see [Retaining multiple versions of objects with S3 Versioning](Versioning.md). (This field is automatically added to your report if you've configured the report to include all versions of your objects.)
+ **Multipart upload flag** – Set to `True` if the object was uploaded as a multipart upload. For more information, see [Uploading and copying objects using multipart upload in Amazon S3](mpuoverview.md).
+ **Object owner** – The canonical user ID of the owner of the object. For more information, see [Find the canonical user ID for your AWS account ](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindCanonicalId) in the *AWS Account Management Reference Guide*.
+ **Replication status** – Set to `PENDING`, `COMPLETED`, `FAILED`, or `REPLICA`. For more information, see [Getting replication status information](replication-status.md).
+ **S3 Bucket Key status** – Set to `ENABLED` or `DISABLED`. Indicates whether the object uses an S3 Bucket Key for SSE-KMS. For more information, see [Using Amazon S3 Bucket Keys](bucket-key.md).
+ **Object access control list** – An access control list (ACL) for each object that defines which AWS accounts or groups are granted access to this object and the type of access that is granted. The Object ACL field is defined in JSON format. An S3 Inventory report includes ACLs that are associated with objects in your source bucket, even when ACLs are disabled for the bucket. For more information, see [Working with the Object ACL field](objectacl.md) and [Access control list (ACL) overview](acl-overview.md).
**Note**  
The Object ACL field is defined in JSON format. An inventory report displays the value for the Object ACL field as a base64-encoded string.  
For example, suppose that you have the following Object ACL field in JSON format:  

  ```
  {
          "version": "2022-11-10",
          "status": "AVAILABLE",
          "grants": [{
              "canonicalId": "example-canonical-user-ID",
              "type": "CanonicalUser",
              "permission": "READ"
          }]
  }
  ```
The Object ACL field is encoded and shown as the following base64-encoded string:  

  ```
  eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3siY2Fub25pY2FsSWQiOiJleGFtcGxlLWNhbm9uaWNhbC11c2VyLUlEIiwidHlwZSI6IkNhbm9uaWNhbFVzZXIiLCJwZXJtaXNzaW9uIjoiUkVBRCJ9XX0=
  ```
To get the decoded value in JSON for the Object ACL field, you can query this field in Amazon Athena. For query examples, see [Querying Amazon S3 Inventory with Amazon Athena](storage-inventory-athena-query.md).
+ **Lifecycle Expiration Date** – Set to the lifecycle expiration timestamp of the object. This field is populated only if the object will be expired by an applicable lifecycle rule; otherwise, the field is empty. Objects with a `FAILED` replication status don't have an expiration date populated, because S3 Lifecycle prevents expiration and transition actions on these objects until replication has succeeded. For more information, see [Expiring objects](lifecycle-expire-general-considerations.md).

**Note**  
When an object reaches the end of its lifetime based on its lifecycle configuration, Amazon S3 queues the object for removal and removes it asynchronously. Therefore, there might be a delay between the expiration date and the date when Amazon S3 removes an object. The inventory report includes the objects that have expired but haven't been removed yet. For more information about expiration actions in S3 Lifecycle, see [Expiring objects](lifecycle-expire-general-considerations.md).
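The Object ACL decoding that Athena performs can also be done locally. The following is a minimal Python sketch using the base64-encoded string from the example above:

```python
import base64
import json

def decode_object_acl(encoded_acl: str) -> dict:
    """Decode the base64-encoded Object ACL field into a dictionary."""
    return json.loads(base64.b64decode(encoded_acl))

acl = decode_object_acl(
    "eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3siY2Fub25pY2FsSWQiOiJleGFtcGxlLWNhbm9uaWNhbC11c2VyLUlEIiwidHlwZSI6IkNhbm9uaWNhbFVzZXIiLCJwZXJtaXNzaW9uIjoiUkVBRCJ9XX0="
)
print(acl["grants"][0]["permission"])  # prints: READ
```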

The following example inventory report includes additional metadata fields and consists of four records.

```
amzn-s3-demo-bucket1    example-object-1    EXAMPLEDC8l.XJCENlF7LePaNIIvs001    TRUE        1500    2024-08-15T15:28:26.000Z    EXAMPLE21e1518b92f3d92773570f600    STANDARD    FALSE    COMPLETED    SSE-KMS    2025-01-25T15:28:26.000Z    COMPLIANCE    Off        ENABLED        eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3sicGVybWlzc2lvbiI6IkZVTExfQ09OVFJPTCIsInR5cGUiOiJDYW5vbmljYWxVc2VyIiwiY2Fub25pY2FsSWQiOiJFWEFNUExFNzY2ZThmNmIxMTVkOTNkNDFkZjJlYWM0MjBhYTRhNDY1ZDE3N2MxMzk4YmM2YTA4OGM3NmI3MDAwIn1dfQ==    EXAMPLE766e8f6b115d93d41df2eac420aa4a465d177c1398bc6a088c76b7000
amzn-s3-demo-bucket1    example-object-2    EXAMPLEDC8l.XJCENlF7LePaNIIvs002    TRUE        200    2024-08-21T15:28:26.000Z    EXAMPLE21e1518b92f3d92773570f601    INTELLIGENT_TIERING    FALSE    COMPLETED    SSE-KMS    2025-01-25T15:28:26.000Z    COMPLIANCE    Off    INFREQUENT    ENABLED    SHA-256    eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3sicGVybWlzc2lvbiI6IkZVTExfQ09OVFJPTCIsInR5cGUiOiJDYW5vbmljYWxVc2VyIiwiY2Fub25pY2FsSWQiOiJFWEFNUExFNzY2ZThmNmIxMTVkOTNkNDFkZjJlYWM0MjBhYTRhNDY1ZDE3N2MxMzk4YmM2YTA4OGM3NmI3MDAwIn1dfQ==    EXAMPLE766e8f6b115d93d41df2eac420aa4a465d177c1398bc6a088c76b7001
amzn-s3-demo-bucket1    example-object-3    EXAMPLEDC8l.XJCENlF7LePaNIIvs003    TRUE        12500    2023-01-15T15:28:30.000Z    EXAMPLE21e1518b92f3d92773570f602    STANDARD    FALSE    REPLICA    SSE-KMS    2025-01-25T15:28:26.000Z    GOVERNANCE    On        ENABLED        eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3sicGVybWlzc2lvbiI6IkZVTExfQ09OVFJPTCIsInR5cGUiOiJDYW5vbmljYWxVc2VyIiwiY2Fub25pY2FsSWQiOiJFWEFNUExFNzY2ZThmNmIxMTVkOTNkNDFkZjJlYWM0MjBhYTRhNDY1ZDE3N2MxMzk4YmM2YTA4OGM3NmI3MDAwIn1dfQ==    EXAMPLE766e8f6b115d93d41df2eac420aa4a465d177c1398bc6a088c76b7002
amzn-s3-demo-bucket1    example-object-4    EXAMPLEDC8l.XJCENlF7LePaNIIvs004    TRUE        100    2021-02-15T15:28:27.000Z    EXAMPLE21e1518b92f3d92773570f603    STANDARD    FALSE    COMPLETED    SSE-KMS    2025-01-25T15:28:26.000Z    COMPLIANCE    Off        ENABLED        eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3sicGVybWlzc2lvbiI6IkZVTExfQ09OVFJPTCIsInR5cGUiOiJDYW5vbmljYWxVc2VyIiwiY2Fub25pY2FsSWQiOiJFWEFNUExFNzY2ZThmNmIxMTVkOTNkNDFkZjJlYWM0MjBhYTRhNDY1ZDE3N2MxMzk4YmM2YTA4OGM3NmI3MDAwIn1dfQ==    EXAMPLE766e8f6b115d93d41df2eac420aa4a465d177c1398bc6a088c76b7003
```
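
Because CSV inventory list files are compressed with GZIP, reading one locally involves decompressing and then parsing the CSV. The following minimal Python sketch uses an in-memory stand-in for a downloaded list file; the records and column order are illustrative (your columns follow the fields chosen in the inventory configuration):

```python
import csv
import gzip
import io

# Stand-in for a downloaded inventory list file (hypothetical records):
sample = gzip.compress(
    b'"amzn-s3-demo-bucket1","example-object-1","1500"\r\n'
    b'"amzn-s3-demo-bucket1","example-object-2","200"\r\n'
)

# Decompress as text, then parse with the csv module.
with gzip.open(io.BytesIO(sample), mode="rt", newline="") as f:
    rows = [row for row in csv.reader(f)]

print(rows[0])  # prints: ['amzn-s3-demo-bucket1', 'example-object-1', '1500']
```

For a real report, you would open the downloaded `.csv.gz` file path directly with `gzip.open` instead of the in-memory buffer.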

We recommend that you create a lifecycle policy that deletes old inventory lists. For more information, see [Managing the lifecycle of objects](object-lifecycle-mgmt.md).
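One way to implement this recommendation is an S3 Lifecycle rule scoped to the inventory destination prefix. The following is a minimal sketch; the prefix and retention period are hypothetical, and the rule structure follows the S3 Lifecycle configuration format:

```python
# Hypothetical values: adjust the prefix and expiration to your setup.
lifecycle_config = {
    "Rules": [
        {
            "ID": "expire-old-inventory-lists",
            "Status": "Enabled",
            "Filter": {"Prefix": "inventory/"},  # destination prefix for inventory files
            "Expiration": {"Days": 30},          # delete lists older than 30 days
        }
    ]
}

# With a boto3 client, this configuration would be applied as:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="amzn-s3-demo-destination-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )
```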

The `s3:PutInventoryConfiguration` permission allows a user both to select all of the metadata fields that are listed earlier for each object when configuring an inventory list and to specify the destination bucket for storing the inventory. A user with read access to objects in the destination bucket can access all object metadata fields that are available in the inventory list. To restrict access to an inventory report, see [Grant permissions for S3 Inventory and S3 analytics](example-bucket-policies.md#example-bucket-policies-s3-inventory-1).

### Inventory consistency
<a name="storage-inventory-contents-consistency"></a>

An inventory list might not include all of your objects. The inventory list provides eventual consistency for `PUT` requests (of both new objects and overwrites) and for `DELETE` requests. Each inventory list for a bucket is a snapshot of bucket items. These lists are eventually consistent (that is, a list might not include recently added or deleted objects). 

To validate the state of an object before you act on it, we recommend that you perform a `HeadObject` REST API request to retrieve metadata for the object, or check the object's properties in the Amazon S3 console. You can also check object metadata with the AWS CLI or the AWS SDKs. For more information, see [HeadObject](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectHEAD.html) in the *Amazon Simple Storage Service API Reference*.
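
That validation step can be sketched against a boto3-style client. The helper below is illustrative, not part of the S3 API; it assumes a client whose `head_object` call behaves like boto3's (returns a metadata dict, and raises an exception if the object doesn't exist):

```python
def object_if_current(s3_client, bucket: str, key: str):
    """Return HeadObject metadata for the object, or None if it's gone.

    s3_client is expected to behave like a boto3 S3 client: head_object()
    returns a metadata dict and raises (botocore's ClientError in boto3)
    when the object no longer exists.
    """
    try:
        return s3_client.head_object(Bucket=bucket, Key=key)
    except Exception:
        return None
```

Before acting on an inventory record (for example, in a batch job), you could call a helper like this and skip the record when it returns `None`.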

For more information about working with Amazon S3 Inventory, see the following topics.

**Topics**
+ [Source and destination buckets](#storage-inventory-buckets)
+ [Amazon S3 Inventory list](#storage-inventory-contents)
+ [Configuring Amazon S3 Inventory](configure-inventory.md)
+ [Locating your inventory list](storage-inventory-location.md)
+ [Setting up Amazon S3 Event Notifications for inventory completion](storage-inventory-notification.md)
+ [Querying Amazon S3 Inventory with Amazon Athena](storage-inventory-athena-query.md)
+ [Converting empty version ID strings in Amazon S3 Inventory reports to null strings](inventory-configure-bops.md)
+ [Working with the Object ACL field](objectacl.md)

# Configuring Amazon S3 Inventory
<a name="configure-inventory"></a>

Amazon S3 Inventory provides a flat file list of your objects and metadata, on a schedule that you define. You can use S3 Inventory as a scheduled alternative to the Amazon S3 synchronous `List` API operation. S3 Inventory provides comma-separated values (CSV), [Apache optimized row columnar (ORC)](https://orc.apache.org/), or [Apache Parquet](https://parquet.apache.org/) output files that list your objects and their corresponding metadata. 

You can configure S3 Inventory to create inventory lists on a daily or weekly basis for an S3 bucket or for objects that share a prefix (objects that have names that begin with the same string). For more information, see [Cataloging and analyzing your data with S3 Inventory](storage-inventory.md).

This section describes how to configure an inventory, including details about the inventory source and destination buckets.

**Topics**
+ [Overview](#storage-inventory-setting-up)
+ [Creating a destination bucket policy](#configure-inventory-destination-bucket-policy)
+ [Granting Amazon S3 permission to use your customer managed key for encryption](#configure-inventory-kms-key-policy)
+ [Configuring inventory by using the S3 console](#configure-inventory-console)
+ [Using the REST API to work with S3 Inventory](#rest-api-inventory)

## Overview
<a name="storage-inventory-setting-up"></a>

Amazon S3 Inventory helps you manage your storage by creating lists of the objects in an S3 bucket on a defined schedule. You can configure multiple inventory lists for a bucket. The inventory lists are published to CSV, ORC, or Parquet files in a destination bucket. 

The easiest way to set up an inventory is by using the Amazon S3 console, but you can also use the Amazon S3 REST API, AWS Command Line Interface (AWS CLI), or AWS SDKs. The console performs the first step of the following procedure for you: adding a bucket policy to the destination bucket.

**To set up Amazon S3 Inventory for an S3 bucket**

1. **Add a bucket policy for the destination bucket.**

   You must create a bucket policy on the destination bucket that grants permissions to Amazon S3 to write objects to the bucket in the defined location. For an example policy, see [Grant permissions for S3 Inventory and S3 analytics](example-bucket-policies.md#example-bucket-policies-s3-inventory-1). 

1. **Configure an inventory to list the objects in a source bucket and publish the list to a destination bucket.**

   When you configure an inventory list for a source bucket, you specify the destination bucket where you want the list to be stored, and whether you want to generate the list daily or weekly. You can also configure whether to list all object versions or only current versions and what object metadata to include. 

   Some object metadata fields in S3 Inventory report configurations are optional, meaning that they're available by default but they can be restricted when you grant a user the `s3:PutInventoryConfiguration` permission. You can control whether users can include these optional metadata fields in their reports by using the `s3:InventoryAccessibleOptionalFields` condition key.

   For more information about the optional metadata fields available in S3 Inventory, see [PutBucketInventoryConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketInventoryConfiguration.html#API_PutBucketInventoryConfiguration_RequestBody) in the *Amazon Simple Storage Service API Reference*. For more information about restricting access to certain optional metadata fields in an inventory configuration, see [Control S3 Inventory report configuration creation](example-bucket-policies.md#example-bucket-policies-s3-inventory-2).

   You can specify that the inventory list file be encrypted by using server-side encryption with an Amazon S3 managed key (SSE-S3) or an AWS Key Management Service (AWS KMS) customer managed key (SSE-KMS). 
**Note**  
The AWS managed key (`aws/s3`) is not supported for SSE-KMS encryption with S3 Inventory. 

   For more information about SSE-S3 and SSE-KMS, see [Protecting data with server-side encryption](serv-side-encryption.md). If you plan to use SSE-KMS encryption, see Step 3.
   + For information about how to use the console to configure an inventory list, see [Configuring inventory by using the S3 console](#configure-inventory-console).
   + To use the Amazon S3 API to configure an inventory list, use the [PutBucketInventoryConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTInventoryConfig.html) REST API operation or the equivalent from the AWS CLI or AWS SDKs. 

1. **To encrypt the inventory list file with SSE-KMS, grant Amazon S3 permission to use the AWS KMS key.**

   You can configure encryption for the inventory list file by using the Amazon S3 console, Amazon S3 REST API, AWS CLI, or AWS SDKs. Whichever way you choose, you must grant Amazon S3 permission to use the customer managed key to encrypt the inventory file. You [grant Amazon S3 permission by modifying the key policy for the customer managed key](https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-inventory.html#configure-inventory-kms-key-policy) that you want to use to encrypt the inventory file. Make sure that you've provided a KMS key ARN in the S3 Inventory configuration or the destination bucket’s encryption settings. If no KMS key ARN has been specified and the default encryption settings are being used, you won’t be able to access your S3 Inventory report.

   The destination bucket that stores the inventory list file can be owned by a different AWS account than the account that owns the source bucket. If you use SSE-KMS encryption for the cross-account operations of Amazon S3 Inventory, we recommend that you use a fully qualified KMS key ARN when you configure S3 inventory. For more information, see [Using SSE-KMS encryption for cross-account operations](bucket-encryption.md#bucket-encryption-update-bucket-policy) and [https://docs.aws.amazon.com/AmazonS3/latest/API/API_ServerSideEncryptionByDefault.html](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ServerSideEncryptionByDefault.html) in the *Amazon Simple Storage Service API Reference*.
**Note**  
If you can’t access your S3 Inventory report, use the [GetBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketEncryption.html) API operation to check whether the destination bucket has default SSE-KMS encryption enabled. If no KMS key ARN has been specified and the default encryption settings are being used, you won’t be able to access your S3 Inventory report. To access S3 Inventory reports again, provide a KMS key ARN in either the S3 Inventory configuration or the destination bucket’s encryption settings.
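
The configuration from steps 2 and 3 is expressed as a single document in the REST API and SDKs. The following is a minimal boto3-style sketch; the bucket names, account ID, and field choices are placeholders, and the structure follows the `PutBucketInventoryConfiguration` request body:

```python
# Placeholder names and account ID throughout.
inventory_configuration = {
    "Id": "weekly-current-versions",
    "IsEnabled": True,
    "IncludedObjectVersions": "Current",  # or "All"
    "Schedule": {"Frequency": "Weekly"},  # or "Daily"
    "Filter": {"Prefix": "logs/"},        # optional key name prefix filter
    "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "EncryptionStatus"],
    "Destination": {
        "S3BucketDestination": {
            "AccountId": "123456789012",
            "Bucket": "arn:aws:s3:::amzn-s3-demo-destination-bucket",
            "Format": "CSV",              # or "ORC" / "Parquet"
            "Prefix": "inventory",
            # To encrypt the list files with SSE-KMS, add an Encryption block:
            # "Encryption": {"SSEKMS": {"KeyId": "arn:aws:kms:us-west-2:123456789012:key/EXAMPLE-KEY-ID"}},
        }
    },
}

# With a boto3 client, this configuration would be applied as:
# import boto3
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="amzn-s3-demo-source-bucket",
#     Id=inventory_configuration["Id"],
#     InventoryConfiguration=inventory_configuration,
# )
```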

## Creating a destination bucket policy
<a name="configure-inventory-destination-bucket-policy"></a>

If you create your inventory configuration through the Amazon S3 console, Amazon S3 automatically creates a bucket policy on the destination bucket that grants Amazon S3 write permission to the bucket. However, if you create your inventory configuration through the AWS CLI, AWS SDKs, or the Amazon S3 REST API, you must manually add a bucket policy on the destination bucket. The S3 Inventory destination bucket policy allows Amazon S3 to write data for the inventory reports to the bucket. 

The following is an example bucket policy. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InventoryExamplePolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "s3.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::DOC-EXAMPLE-DESTINATION-BUCKET/*"
            ],
            "Condition": {
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:s3:::DOC-EXAMPLE-SOURCE-BUCKET"
                },
                "StringEquals": {
                    "aws:SourceAccount": "source-account-id",
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
```


For more information, see [Grant permissions for S3 Inventory and S3 analytics](example-bucket-policies.md#example-bucket-policies-s3-inventory-1).

If an error occurs when you try to create the bucket policy, you are given instructions on how to fix it. For example, if you choose a destination bucket in another AWS account and don't have permissions to read and write to the bucket policy, you see an error message.

In this case, the destination bucket owner must add the bucket policy to the destination bucket. If the policy is not added to the destination bucket, you won't get an inventory report because Amazon S3 doesn't have permission to write to the destination bucket. If the source bucket is owned by a different account than that of the current user, the correct account ID of the source bucket owner must be substituted in the policy.

**Note**  
Ensure that there are no Deny statements added to the destination bucket policy that would prevent the delivery of inventory reports into this bucket. For more information, see [Why can't I generate an Amazon S3 Inventory Report? ](https://repost.aws/knowledge-center/s3-inventory-report).

## Granting Amazon S3 permission to use your customer managed key for encryption
<a name="configure-inventory-kms-key-policy"></a>

To grant Amazon S3 permission to use your AWS Key Management Service (AWS KMS) customer managed key for server-side encryption, you must use a key policy. To update your key policy so that you can use your customer managed key, use the following procedure.

**To grant Amazon S3 permissions to encrypt by using your customer managed key**

1. Using the AWS account that owns the customer managed key, sign in to the AWS Management Console.

1. Open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. In the left navigation pane, choose **Customer managed keys**.

1. Under **Customer managed keys**, choose the customer managed key that you want to use to encrypt your inventory files.

1. In the **Key policy** section, choose **Switch to policy view**.

1. To update the key policy, choose **Edit**.

1. On the **Edit key policy** page, add the following lines to the existing key policy. For `source-account-id` and `amzn-s3-demo-source-bucket`, supply the appropriate values for your use case.

   ```
   {
       "Sid": "Allow Amazon S3 use of the customer managed key",
       "Effect": "Allow",
       "Principal": {
           "Service": "s3.amazonaws.com"
       },
       "Action": [
           "kms:GenerateDataKey"
       ],
       "Resource": "*",
       "Condition":{
         "StringEquals":{
            "aws:SourceAccount":"source-account-id"
        },
         "ArnLike":{
           "aws:SourceArn": "arn:aws:s3:::amzn-s3-demo-source-bucket"
        }
      }
   }
   ```

1. Choose **Save changes**.

For more information about creating customer managed keys and using key policies, see the following links in the *AWS Key Management Service Developer Guide*:
+ [Managing keys](https://docs.aws.amazon.com/kms/latest/developerguide/getting-started.html)
+ [Key policies in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html)

**Note**  
Ensure that there are no Deny statements added to the destination bucket policy that would prevent the delivery of inventory reports into this bucket. For more information, see [Why can't I generate an Amazon S3 Inventory Report? ](https://repost.aws/knowledge-center/s3-inventory-report).

## Configuring inventory by using the S3 console
<a name="configure-inventory-console"></a>

Use these instructions to configure inventory by using the S3 console.
**Note**  
It might take up to 48 hours for Amazon S3 to deliver the first inventory report.

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In the left navigation pane, choose **General purpose buckets**.

1. In the buckets list, choose the name of the bucket that you want to configure Amazon S3 Inventory for.

1. Choose the **Management** tab.

1. Under **Inventory configurations**, choose **Create inventory configuration**.

1. For **Inventory configuration name**, enter a name.

1. For **Inventory scope**, do the following:
   + Enter an optional prefix.
   + Choose which object versions to include, either **Current versions only** or **Include all versions**.

1. Under **Report details**, choose the AWS account where you want to save the reports: **This account** or **A different account**.

1. Under **Destination**, choose the destination bucket where you want the inventory reports to be saved.

   The destination bucket must be in the same AWS Region as the bucket for which you are setting up the inventory. The destination bucket can be in a different AWS account. When specifying the destination bucket, you can also include an optional prefix to group your inventory reports together. 

   Under the **Destination** bucket field, you see the **Destination bucket permission** statement that is added to the destination bucket policy to allow Amazon S3 to place data in that bucket. For more information, see [Creating a destination bucket policy](#configure-inventory-destination-bucket-policy).

1. Under **Frequency**, choose how often the report is generated: **Daily** or **Weekly**.

1. For **Output format**, choose one of the following formats for the report:
   + **CSV** – If you plan to use this inventory report with S3 Batch Operations or if you want to analyze this report in another tool, such as Microsoft Excel, choose **CSV**.
   + **Apache ORC**
   + **Apache Parquet**

1. Under **Status**, choose **Enable** or **Disable**.

1. To configure server-side encryption, under **Inventory report encryption**, follow these steps:

   1. Under **Server-side encryption**, choose either **Do not specify an encryption key** or **Specify an encryption key** to encrypt data.
      + To keep the bucket settings for default server-side encryption of objects when storing them in Amazon S3, choose **Do not specify an encryption key**. As long as the bucket destination has S3 Bucket Keys enabled, the copy operation applies an S3 Bucket Key at the destination bucket.
**Note**  
If the bucket policy for the specified destination requires objects to be encrypted before storing them in Amazon S3, you must choose **Specify an encryption key**. Otherwise, copying objects to the destination will fail.
      + To encrypt objects before storing them in Amazon S3, choose **Specify an encryption key**.

   1. If you chose **Specify an encryption key**, under **Encryption type**, you must choose either **Amazon S3 managed key (SSE-S3)** or **AWS Key Management Service key (SSE-KMS)**.

      SSE-S3 uses 256-bit Advanced Encryption Standard (AES-256), one of the strongest block ciphers, to encrypt each object. SSE-KMS provides you with more control over your key. For more information about SSE-S3, see [Using server-side encryption with Amazon S3 managed keys (SSE-S3)](UsingServerSideEncryption.md). For more information about SSE-KMS, see [Using server-side encryption with AWS KMS keys (SSE-KMS)](UsingKMSEncryption.md).
**Note**  
To encrypt the inventory list file with SSE-KMS, you must grant Amazon S3 permission to use the customer managed key. For instructions, see [Grant Amazon S3 Permission to Encrypt Using Your KMS Keys](#configure-inventory-kms-key-policy).

   1. If you chose **AWS Key Management Service key (SSE-KMS)**, under **AWS KMS key**, you can specify your AWS KMS key through one of the following options.
**Note**  
If the destination bucket that stores the inventory list file is owned by a different AWS account, make sure that you use a fully qualified KMS key ARN to specify your KMS key.
      + To choose from a list of available KMS keys, choose **Choose from your AWS KMS keys**, and choose a symmetric encryption KMS key from the list of available keys. Make sure the KMS key is in the same Region as your bucket. 
**Note**  
Both the AWS managed key (`aws/s3`) and your customer managed keys appear in the list. However, the AWS managed key (`aws/s3`) is not supported for SSE-KMS encryption with S3 Inventory. 
      + To enter the KMS key ARN, choose **Enter AWS KMS key ARN**, and enter your KMS key ARN in the field that appears.
      + To create a new customer managed key in the AWS KMS console, choose **Create a KMS key**.

1. For **Additional metadata fields**, select one or more of the following to add to the inventory report:
   + **Size** – The object size in bytes, not including the size of incomplete multipart uploads, object metadata, and delete markers.
   + **Last modified date** – The object creation date or the last modified date, whichever is the latest.
   +  **Multipart upload** – Specifies that the object was uploaded as a multipart upload. For more information, see [Uploading and copying objects using multipart upload in Amazon S3](mpuoverview.md).
   + **Replication status** – The replication status of the object. For more information, see [Getting replication status information](replication-status.md).
   + **Encryption status** – The server-side encryption type that's used to encrypt the object. For more information, see [Protecting data with server-side encryption](serv-side-encryption.md).
   + **Bucket Key status** – Indicates whether a bucket-level key generated by AWS KMS applies to the object. For more information, see [Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys](bucket-key.md).
   + **Object access control list** – An access control list (ACL) for each object that defines which AWS accounts or groups are granted access to this object and the type of access that is granted. For more information about this field, see [Working with the Object ACL field](objectacl.md). For more information about ACLs, see [Access control list (ACL) overview](acl-overview.md). 
   + **Object owner** – The owner of the object.
   + **Storage class** – The storage class that's used for storing the object. 
   + **Intelligent-Tiering: Access tier** – Indicates the access tier (frequent or infrequent) of the object if it was stored in the S3 Intelligent-Tiering storage class. For more information, see [Storage class for automatically optimizing data with changing or unknown access patterns](storage-class-intro.md#sc-dynamic-data-access).
   + **ETag** – The entity tag (ETag) is a hash of the object. The ETag reflects changes only to the contents of an object, not to its metadata. The ETag might or might not be an MD5 digest of the object data. Whether it is depends on how the object was created and how it is encrypted. For more information, see [https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html) in the *Amazon Simple Storage Service API Reference*.
   + **Checksum algorithm** – Indicates the algorithm that is used to create the checksum for the object. For more information, see [Using supported checksum algorithms](checking-object-integrity-upload.md#using-additional-checksums).
   + **All Object Lock configurations** – The Object Lock status of the object, including the following settings: 
     + **Object Lock: Retention mode** – The level of protection applied to the object, either *Governance* or *Compliance*.
     + **Object Lock: Retain until date** – The date until which the locked object cannot be deleted.
     + **Object Lock: Legal hold status** – The legal hold status of the locked object. 

     For information about S3 Object Lock, see [How S3 Object Lock works](object-lock.md#object-lock-overview).
   + **Lifecycle Expiration Date** – The lifecycle expiration timestamp for objects in your inventory report. This field is populated only if the object is set to expire by an applicable lifecycle rule; otherwise, it's empty. For more information, see [Expiring objects](lifecycle-expire-general-considerations.md).

   For more information about the contents of an inventory report, see [Amazon S3 Inventory list](storage-inventory.md#storage-inventory-contents). 

   For more information about restricting access to certain optional metadata fields in an inventory configuration, see [Control S3 Inventory report configuration creation](example-bucket-policies.md#example-bucket-policies-s3-inventory-2).

1. Choose **Create**.

## Using the REST API to work with S3 Inventory
<a name="rest-api-inventory"></a>

The following are the REST operations that you can use to work with Amazon S3 Inventory.
+ [DeleteBucketInventoryConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketDELETEInventoryConfiguration.html)
+ [GetBucketInventoryConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGETInventoryConfig.html)
+ [ListBucketInventoryConfigurations](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketListInventoryConfigs.html)
+ [PutBucketInventoryConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTInventoryConfig.html)
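These operations are also available through the AWS SDKs. The following is a minimal sketch, in Python, of the configuration document that `PutBucketInventoryConfiguration` expects; the bucket names and the configuration ID `report1` are placeholders, and the commented boto3 call shows how you might apply it.

```python
# Placeholder names for illustration only: amzn-s3-demo-source-bucket,
# amzn-s3-demo-destination-bucket, and the configuration ID "report1".
inventory_configuration = {
    "Id": "report1",
    "IsEnabled": True,
    "IncludedObjectVersions": "Current",   # or "All"
    "Schedule": {"Frequency": "Daily"},    # or "Weekly"
    "Destination": {
        "S3BucketDestination": {
            "Bucket": "arn:aws:s3:::amzn-s3-demo-destination-bucket",
            "Format": "CSV",               # or "ORC" / "Parquet"
            "Prefix": "inventory",
        }
    },
    "OptionalFields": ["Size", "LastModifiedDate", "EncryptionStatus"],
}

# With boto3 installed and AWS credentials configured, you could apply it with:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_inventory_configuration(
#       Bucket="amzn-s3-demo-source-bucket",
#       Id=inventory_configuration["Id"],
#       InventoryConfiguration=inventory_configuration,
#   )
```

The same document shape is returned by `GetBucketInventoryConfiguration`, so you can read a configuration, modify a field, and put it back.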

# Locating your inventory list
<a name="storage-inventory-location"></a>

When an inventory list is published, the manifest files are published to the following location in the destination bucket.

```
destination-prefix/amzn-s3-demo-source-bucket/config-ID/YYYY-MM-DDTHH-MMZ/manifest.json
destination-prefix/amzn-s3-demo-source-bucket/config-ID/YYYY-MM-DDTHH-MMZ/manifest.checksum
destination-prefix/amzn-s3-demo-source-bucket/config-ID/hive/dt=YYYY-MM-DD-HH-MM/symlink.txt
```
+ `destination-prefix` is the object key name prefix that is optionally specified in the inventory configuration. You can use this prefix to group all the inventory list files in a common location within the destination bucket.
+ `amzn-s3-demo-source-bucket` is the source bucket that the inventory list is for. The source bucket name is added to prevent collisions when multiple inventory reports from different source buckets are sent to the same destination bucket.
+ `config-ID` is added to prevent collisions with multiple inventory reports from the same source bucket that are sent to the same destination bucket. The `config-ID` comes from the inventory report configuration, and is the name for the report that is defined during setup.
+ `YYYY-MM-DDTHH-MMZ` is the date and start time at which the inventory report generation process began scanning the bucket; for example, `2016-11-06T21-32Z`.
+ `manifest.json` is the manifest file. 
+ `manifest.checksum` is the MD5 hash of the content of the `manifest.json` file. 
+ `symlink.txt` is the Apache Hive-compatible manifest file. 
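As a quick illustration of this layout, the following sketch assembles the manifest key from its parts (the helper function and its name are illustrative, not part of any SDK):

```python
def manifest_key(destination_prefix, source_bucket, config_id, timestamp):
    """Assemble the destination key where S3 Inventory publishes manifest.json."""
    return f"{destination_prefix}/{source_bucket}/{config_id}/{timestamp}/manifest.json"

key = manifest_key("destination-prefix", "amzn-s3-demo-source-bucket",
                   "config-ID", "2016-11-06T21-32Z")
print(key)
# destination-prefix/amzn-s3-demo-source-bucket/config-ID/2016-11-06T21-32Z/manifest.json
```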

The inventory lists are published daily or weekly to the following location in the destination bucket.

```
destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name.csv.gz
...
destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name-1.csv.gz
```
+ `destination-prefix` is the object key name prefix that is optionally specified in the inventory configuration. You can use this prefix to group all the inventory list files in a common location in the destination bucket.
+ `amzn-s3-demo-source-bucket` is the source bucket that the inventory list is for. The source bucket name is added to prevent collisions when multiple inventory reports from different source buckets are sent to the same destination bucket.
+ `example-file-name.csv.gz` is one of the CSV inventory files. ORC inventory file names end with the extension `.orc`, and Parquet inventory file names end with the extension `.parquet`.

## Inventory manifest
<a name="storage-inventory-location-manifest"></a>

The manifest files `manifest.json` and `symlink.txt` describe where the inventory files are located. Whenever a new inventory list is delivered, it is accompanied by a new set of manifest files. These files might overwrite each other. In versioning-enabled buckets, Amazon S3 creates new versions of the manifest files. 

Each manifest contained in the `manifest.json` file provides metadata and other basic information about an inventory. This information includes the following:
+ The source bucket name
+ The destination bucket name
+ The version of the inventory
+ The creation timestamp, in Unix epoch milliseconds, recording the date and start time at which the inventory report generation process began scanning the bucket
+ The format and schema of the inventory files
+ A list of the inventory files that are in the destination bucket

Whenever a `manifest.json` file is written, it is accompanied by a `manifest.checksum` file that is the MD5 hash of the content of the `manifest.json` file.
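Because `manifest.checksum` holds the MD5 hash of `manifest.json`, you can verify a downloaded manifest before processing it. The following is a sketch, assuming the checksum file stores the hash as hex text; the helper function is illustrative and the manifest bytes are synthetic:

```python
import hashlib

def manifest_is_intact(manifest_bytes, checksum_text):
    """Compare the MD5 hex digest of manifest.json against manifest.checksum."""
    return hashlib.md5(manifest_bytes).hexdigest() == checksum_text.strip()

# Synthetic example: derive the checksum from the manifest bytes themselves.
manifest_bytes = b'{"sourceBucket": "amzn-s3-demo-source-bucket"}'
checksum_text = hashlib.md5(manifest_bytes).hexdigest() + "\n"
print(manifest_is_intact(manifest_bytes, checksum_text))   # True
```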

**Example Inventory manifest in a `manifest.json` file**  
The following examples show an inventory manifest in a `manifest.json` file for CSV, ORC, and Parquet-formatted inventories.  
The following is an example of a manifest in a `manifest.json` file for a CSV-formatted inventory.  

```
{
    "sourceBucket": "amzn-s3-demo-source-bucket",
    "destinationBucket": "arn:aws:s3:::example-inventory-destination-bucket",
    "version": "2016-11-30",
    "creationTimestamp" : "1514944800000",
    "fileFormat": "CSV",
    "fileSchema": "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, Size, LastModifiedDate, ETag, StorageClass, IsMultipartUploaded, ReplicationStatus, EncryptionStatus, ObjectLockRetainUntilDate, ObjectLockMode, ObjectLockLegalHoldStatus, IntelligentTieringAccessTier, BucketKeyStatus, ChecksumAlgorithm, ObjectAccessControlList, ObjectOwner",
    "files": [
        {
            "key": "Inventory/amzn-s3-demo-source-bucket/2016-11-06T21-32Z/files/939c6d46-85a9-4ba8-87bd-9db705a579ce.csv.gz",
            "size": 2147483647,
            "MD5checksum": "f11166069f1990abeb9c97ace9cdfabc"
        }
    ]
}
```
The following is an example of a manifest in a `manifest.json` file for an ORC-formatted inventory.  

```
{
    "sourceBucket": "amzn-s3-demo-source-bucket",
    "destinationBucket": "arn:aws:s3:::example-destination-bucket",
    "version": "2016-11-30",
    "creationTimestamp" : "1514944800000",
    "fileFormat": "ORC",
    "fileSchema": "struct<bucket:string,key:string,version_id:string,is_latest:boolean,is_delete_marker:boolean,size:bigint,last_modified_date:timestamp,e_tag:string,storage_class:string,is_multipart_uploaded:boolean,replication_status:string,encryption_status:string,object_lock_retain_until_date:timestamp,object_lock_mode:string,object_lock_legal_hold_status:string,intelligent_tiering_access_tier:string,bucket_key_status:string,checksum_algorithm:string,object_access_control_list:string,object_owner:string>",
    "files": [
        {
            "key": "inventory/amzn-s3-demo-source-bucket/data/d794c570-95bb-4271-9128-26023c8b4900.orc",
            "size": 56291,
            "MD5checksum": "5925f4e78e1695c2d020b9f6eexample"
        }
    ]
}
```
The following is an example of a manifest in a `manifest.json` file for a Parquet-formatted inventory.  

```
{
    "sourceBucket": "amzn-s3-demo-source-bucket",
    "destinationBucket": "arn:aws:s3:::example-destination-bucket",
    "version": "2016-11-30",
    "creationTimestamp" : "1514944800000",
    "fileFormat": "Parquet",
    "fileSchema": "message s3.inventory { required binary bucket (UTF8); required binary key (UTF8); optional binary version_id (UTF8); optional boolean is_latest; optional boolean is_delete_marker; optional int64 size; optional int64 last_modified_date (TIMESTAMP_MILLIS); optional binary e_tag (UTF8); optional binary storage_class (UTF8); optional boolean is_multipart_uploaded; optional binary replication_status (UTF8); optional binary encryption_status (UTF8); optional int64 object_lock_retain_until_date (TIMESTAMP_MILLIS); optional binary object_lock_mode (UTF8); optional binary object_lock_legal_hold_status (UTF8); optional binary intelligent_tiering_access_tier (UTF8); optional binary bucket_key_status (UTF8); optional binary checksum_algorithm (UTF8); optional binary object_access_control_list (UTF8); optional binary object_owner (UTF8);}",
    "files": [
        {
           "key": "inventory/amzn-s3-demo-source-bucket/data/d754c470-85bb-4255-9218-47023c8b4910.parquet",
            "size": 56291,
            "MD5checksum": "5825f2e18e1695c2d030b9f6eexample"
        }
    ]
}
```
The `symlink.txt` file is an Apache Hive-compatible manifest file that allows Hive to automatically discover inventory files and their associated data files. The Hive-compatible manifest works with the Hive-compatible services Athena and Amazon Redshift Spectrum. It also works with Hive-compatible applications, including [Presto](https://prestodb.io/), [Apache Hive](https://hive.apache.org/), [Apache Spark](https://databricks.com/spark/about/), and many others.  
The `symlink.txt` Apache Hive-compatible manifest file does not currently work with AWS Glue.  
Reading the `symlink.txt` file with [Apache Hive](https://hive.apache.org/) and [Apache Spark](https://databricks.com/spark/about/) is not supported for ORC-formatted and Parquet-formatted inventory files. 
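Because `symlink.txt` lists the locations of the inventory data files, one per line, a consumer that doesn't use a Hive-compatible engine can still read it directly. A minimal sketch (the helper function is illustrative, and the file contents shown are synthetic):

```python
def data_files_from_symlink(symlink_text):
    """Return the data-file locations listed in symlink.txt, one per line."""
    return [line.strip() for line in symlink_text.splitlines() if line.strip()]

symlink_text = (
    "s3://amzn-s3-demo-destination-bucket/prefix/amzn-s3-demo-source-bucket/"
    "config-ID/data/example-file-name.csv.gz\n"
)
print(data_files_from_symlink(symlink_text))
```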

# Setting up Amazon S3 Event Notifications for inventory completion
<a name="storage-inventory-notification"></a>

You can set up an Amazon S3 event notification to receive notice when the manifest checksum file is created, which indicates that an inventory list has been added to the destination bucket. The manifest is an up-to-date list of all the inventory lists at the destination location.

Amazon S3 can publish events to an Amazon Simple Notification Service (Amazon SNS) topic, an Amazon Simple Queue Service (Amazon SQS) queue, or an AWS Lambda function. For more information, see [Amazon S3 Event Notifications](EventNotifications.md).

The following notification configuration specifies that all `manifest.checksum` files newly added to the destination bucket are processed by the AWS Lambda function `cloud-function-list-write`.

```
<NotificationConfiguration>
  <CloudFunctionConfiguration>
      <Id>1</Id>
      <Filter>
          <S3Key>
              <FilterRule>
                  <Name>prefix</Name>
                  <Value>destination-prefix/source-bucket</Value>
              </FilterRule>
              <FilterRule>
                  <Name>suffix</Name>
                  <Value>checksum</Value>
              </FilterRule>
          </S3Key>
      </Filter>
      <CloudFunction>arn:aws:lambda:us-west-2:222233334444:function:cloud-function-list-write</CloudFunction>
      <Event>s3:ObjectCreated:*</Event>
  </CloudFunctionConfiguration>
</NotificationConfiguration>
```
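A Lambda function that receives this event can derive the location of the newly delivered `manifest.json` from the `manifest.checksum` key in the event record, because the two files are published side by side. The following is a sketch assuming the standard S3 event shape; the handler logic and the synthetic event are illustrative:

```python
def handler(event, context):
    """Sketch: locate manifest.json files announced by manifest.checksum events."""
    manifests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # manifest.json is delivered alongside manifest.checksum.
        manifests.append((bucket, key.rsplit("/", 1)[0] + "/manifest.json"))
    return manifests

# Synthetic invocation with a minimal event record:
event = {"Records": [{"s3": {
    "bucket": {"name": "amzn-s3-demo-destination-bucket"},
    "object": {"key": "prefix/source-bucket/config-ID/2016-11-06T21-32Z/manifest.checksum"},
}}]}
print(handler(event, None))
```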

For more information, see [Using AWS Lambda with Amazon S3](https://docs.aws.amazon.com/lambda/latest/dg/with-s3.html) in the *AWS Lambda Developer Guide*.

# Querying Amazon S3 Inventory with Amazon Athena
<a name="storage-inventory-athena-query"></a>

You can query Amazon S3 Inventory files with standard SQL queries by using Amazon Athena in all Regions where Athena is available. To check for AWS Region availability, see the [AWS Region Table](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/). 

Athena can query Amazon S3 Inventory files in [Apache optimized row columnar (ORC)](https://orc.apache.org/), [Apache Parquet](https://parquet.apache.org/), or comma-separated values (CSV) format. When you use Athena to query inventory files, we recommend that you use ORC-formatted or Parquet-formatted inventory files. The ORC and Parquet formats provide faster query performance and lower query costs. ORC and Parquet are self-describing, type-aware columnar file formats designed for [Apache Hadoop](http://hadoop.apache.org/). The columnar format lets the reader read, decompress, and process only the columns that are required for the current query. The ORC and Parquet formats for Amazon S3 Inventory are available in all AWS Regions.

**To use Athena to query Amazon S3 Inventory files**

1. Create an Athena table. For information about creating a table, see [Creating Tables in Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/creating-tables.html) in the *Amazon Athena User Guide*.

1. Create your query by using one of the following sample query templates, depending on whether you're querying an ORC-formatted, a Parquet-formatted, or a CSV-formatted inventory report. 
   + When you're using Athena to query an ORC-formatted inventory report, use the following sample query as a template.

     The following sample query includes all the optional fields in an ORC-formatted inventory report. 

     To use this sample query, do the following: 
     + Replace `your_table_name` with the name of the Athena table that you created.
     + Remove any optional fields that you did not choose for your inventory so that the query corresponds to the fields chosen for your inventory.
     + Replace the following bucket name and inventory location (the configuration ID) as appropriate for your configuration.

       `s3://amzn-s3-demo-bucket/config-ID/hive/`
     + Replace the `2022-01-01-00-00` date under `projection.dt.range` with the first day of the time range within which you partition the data in Athena. For more information, see [Partitioning data in Athena](https://docs.aws.amazon.com/athena/latest/ug/partitions.html).

     ```
     CREATE EXTERNAL TABLE your_table_name (
              bucket string,
              key string,
              version_id string,
              is_latest boolean,
              is_delete_marker boolean,
              size bigint,
              last_modified_date timestamp,
              e_tag string,
              storage_class string,
              is_multipart_uploaded boolean,
              replication_status string,
              encryption_status string,
              object_lock_retain_until_date bigint,
              object_lock_mode string,
              object_lock_legal_hold_status string,
              intelligent_tiering_access_tier string,
              bucket_key_status string,
              checksum_algorithm string,
              object_access_control_list string,
              object_owner string,
              lifecycle_expiration_date timestamp
     ) PARTITIONED BY (
             dt string
     )
     ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
       STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
       LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/'
       TBLPROPERTIES (
         "projection.enabled" = "true",
         "projection.dt.type" = "date",
         "projection.dt.format" = "yyyy-MM-dd-HH-mm",
         "projection.dt.range" = "2022-01-01-00-00,NOW",
         "projection.dt.interval" = "1",
         "projection.dt.interval.unit" = "HOURS"
       );
     ```
   + When you're using Athena to query a Parquet-formatted inventory report, use the sample query for an ORC-formatted report. However, use the following Parquet SerDe in place of the ORC SerDe in the `ROW FORMAT SERDE` statement.

     ```
     ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
     ```
   + When you're using Athena to query a CSV-formatted inventory report, use the following sample query as a template.

     The following sample query includes all the optional fields in a CSV-formatted inventory report. 

     To use this sample query, do the following: 
     + Replace `your_table_name` with the name of the Athena table that you created.
     + Remove any optional fields that you did not choose for your inventory so that the query corresponds to the fields chosen for your inventory.
     + Replace the following bucket name and inventory location (the configuration ID) as appropriate for your configuration. 

       `s3://amzn-s3-demo-bucket/config-ID/hive/`
     + Replace the `2022-01-01-00-00` date under `projection.dt.range` with the first day of the time range within which you partition the data in Athena. For more information, see [Partitioning data in Athena](https://docs.aws.amazon.com/athena/latest/ug/partitions.html).

     ```
     CREATE EXTERNAL TABLE your_table_name (
              bucket string,
              key string,
              version_id string,
              is_latest boolean,
              is_delete_marker boolean,
              size string,
              last_modified_date string,
              e_tag string,
              storage_class string,
              is_multipart_uploaded boolean,
              replication_status string,
              encryption_status string,
              object_lock_retain_until_date string,
              object_lock_mode string,
              object_lock_legal_hold_status string,
              intelligent_tiering_access_tier string,
              bucket_key_status string,
              checksum_algorithm string,
              object_access_control_list string,
              object_owner string,
              lifecycle_expiration_date string
     ) PARTITIONED BY (
             dt string
     )
     ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
       STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
       OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
       LOCATION 's3://amzn-s3-demo-bucket/config-ID/hive/'
       TBLPROPERTIES (
         "projection.enabled" = "true",
         "projection.dt.type" = "date",
         "projection.dt.format" = "yyyy-MM-dd-HH-mm",
         "projection.dt.range" = "2022-01-01-00-00,NOW",
         "projection.dt.interval" = "1",
         "projection.dt.interval.unit" = "HOURS"
       );
     ```

1. You can now run various queries on your inventory, as shown in the following examples. Replace each `user input placeholder` with your own information.

   ```
   -- Get a list of the latest inventory report dates available.
   SELECT DISTINCT dt FROM your_table_name ORDER BY 1 DESC limit 10;

   -- Get the encryption status for a provided report date.
   SELECT encryption_status, count(*) FROM your_table_name WHERE dt = 'YYYY-MM-DD-HH-MM' GROUP BY encryption_status;

   -- Get the encryption status for inventory report dates in the provided range.
   SELECT dt, encryption_status, count(*) FROM your_table_name 
   WHERE dt > 'YYYY-MM-DD-HH-MM' AND dt < 'YYYY-MM-DD-HH-MM' GROUP BY dt, encryption_status;
   ```

   When you configure S3 Inventory to add the Object Access Control List (Object ACL) field to an inventory report, the report displays the value for the Object ACL field as a base64-encoded string. To get the decoded value in JSON for the Object ACL field, you can query this field by using Athena. See the following query examples. For more information about the Object ACL field, see [Working with the Object ACL field](objectacl.md).

   ```
   -- Get the S3 keys that have Object ACL grants with public access.
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'uri') = 'http://acs.amazonaws.com/groups/global/AllUsers'
   ```

   ```
   -- Get the S3 keys that have Object ACL grantees in addition to the object owner.
   WITH grants AS 
       (SELECT key,
       from_utf8(from_base64(object_access_control_list)) AS object_access_control_list,
            object_owner,
            CAST(json_extract(from_utf8(from_base64(object_access_control_list)),
            '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))) AS grants_array
       FROM your_table_name)
   SELECT key,
          grant,
       object_owner
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE cardinality(grants_array) > 1 AND element_at(grant, 'canonicalId') != object_owner;
   ```

   ```
   -- Get the S3 keys with READ permission that is granted in the Object ACL. 
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'permission') = 'READ';
   ```

   ```
   -- Get the S3 keys that have Object ACL grants to a specific canonical user ID.
   WITH grants AS (
       SELECT key,
           CAST(
               json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants') AS ARRAY(MAP(VARCHAR, VARCHAR))
           ) AS grants_array
       FROM your_table_name
   )
   SELECT key,
          grants_array,
          grant
   FROM grants, UNNEST(grants_array) AS t(grant)
   WHERE element_at(grant, 'canonicalId') = 'user-canonical-id';
   ```

   ```
   -- Get the number of grantees on the Object ACL.
   SELECT key,
          from_utf8(from_base64(object_access_control_list)) AS object_access_control_list,
          json_array_length(json_extract(from_utf8(from_base64(object_access_control_list)), '$.grants')) AS grants_count
   FROM your_table_name;
   ```
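The same decoding that Athena performs with `from_base64` plus `json_extract` can be reproduced in a few lines if you want to post-process exported query results. The following is a sketch using a synthetic ACL document; the JSON shape shown is illustrative:

```python
import base64
import json

def decode_object_acl(encoded):
    """Decode a base64-encoded Object ACL inventory field into a dict."""
    return json.loads(base64.b64decode(encoded))

# Synthetic ACL document, base64-encoded the way the inventory field is.
acl = {
    "owner": {"canonicalId": "example-canonical-id"},
    "grants": [{"canonicalId": "example-canonical-id", "permission": "FULL_CONTROL"}],
}
encoded = base64.b64encode(json.dumps(acl).encode("utf-8")).decode("ascii")
decoded = decode_object_acl(encoded)
print(len(decoded["grants"]))   # 1
```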

For more information about using Athena, see the [Amazon Athena User Guide](https://docs.aws.amazon.com/athena/latest/ug/).

# Converting empty version ID strings in Amazon S3 Inventory reports to null strings
<a name="inventory-configure-bops"></a>

**Note**  
**The following procedure applies only to Amazon S3 Inventory reports that include all versions, and only if the "all versions" reports are used as manifests for S3 Batch Operations on buckets that have S3 Versioning enabled.** You are not required to convert strings for S3 Inventory reports that specify the current version only.

You can use S3 Inventory reports as manifests for S3 Batch Operations. However, when S3 Versioning is enabled on a bucket, S3 Inventory reports that include all versions mark any null-versioned objects with empty strings in the version ID field. When an inventory report includes all object version IDs, Batch Operations recognizes `null` strings as version IDs, but not empty strings. 

When an S3 Batch Operations job uses an "all versions" S3 Inventory report as a manifest, it fails all tasks on objects that have an empty string in the version ID field. To convert empty strings in the version ID field of the S3 Inventory report to `null` strings for Batch Operations, use the following procedure.
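The Athena-based procedure that follows automates this conversion at scale. For a small report, the same transformation can be sketched in a few lines of Python; the helper function and the sample rows are illustrative:

```python
import csv
import io

def fill_null_version_ids(csv_text):
    """Replace empty version IDs (third column) with the literal string 'null'."""
    rows = []
    for bucket, key, version_id in csv.reader(io.StringIO(csv_text)):
        rows.append([bucket, key, version_id if version_id else "null"])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

report = ("amzn-s3-demo-bucket,photo.jpg,\n"
          "amzn-s3-demo-bucket,doc.txt,3HL4kqtJvjVBH40Nrjfkd\n")
print(fill_null_version_ids(report))
```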

**Update an Amazon S3 Inventory report for use with Batch Operations**

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. Navigate to your S3 Inventory report. The inventory report is located in the destination bucket that you specified while configuring your inventory report. For more information about locating inventory reports, see [Locating your inventory list](storage-inventory-location.md).

   1. Choose the destination bucket.

   1. Choose the folder. The folder is named after the original source bucket.

   1. Choose the folder named after the inventory configuration.

   1. Select the check box next to the folder named **hive**. At the top of the page, choose **Copy S3 URI** to copy the S3 URI for the folder.

1. Open the Amazon Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home). 

1. In the query editor, choose **Settings**, then choose **Manage**. On the **Manage settings** page, for **Location of query result**, choose an S3 bucket to store your query results in.

1. In the query editor, create an Athena table to hold the data in the inventory report using the following command. Replace `table_name` with a name of your choosing, and in the `LOCATION` clause, insert the S3 URI that you copied earlier. Then choose **Run** to run the query.

   ```
   CREATE EXTERNAL TABLE table_name(bucket string, key string, version_id string)
   PARTITIONED BY (dt string)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
   STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
   LOCATION 'Copied S3 URI';
   ```

1. To clear the query editor, choose **Clear**. Then load the inventory report into the table using the following command. Replace `table_name` with the one that you chose in the prior step. Then choose **Run** to run the query.

   ```
   MSCK REPAIR TABLE table_name;
   ```

1. To clear the query editor, choose **Clear**. Run the following `SELECT` query to retrieve all entries in the original inventory report and replace any empty version IDs with `null` strings. Replace `table_name` with the one that you chose earlier, and replace `YYYY-MM-DD-HH-MM` in the `WHERE` clause with the date of the inventory report that you want this tool to run on. Then choose **Run** to run the query.

   ```
   SELECT bucket AS Bucket, key AS Key,
       CASE WHEN version_id = '' THEN 'null' ELSE version_id END AS VersionId
   FROM table_name
   WHERE dt = 'YYYY-MM-DD-HH-MM';
   ```

1. Return to the Amazon S3 console ([https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/)), and navigate to the S3 bucket that you chose for **Location of query result** earlier. Inside, there should be a series of folders ending with the date.

   For example, you should see a path like **s3://amzn-s3-demo-bucket/*query-result-location*/Unsaved/2021/10/07/**. This folder should contain `.csv` files with the results of the `SELECT` query that you ran. 

   Choose the CSV file with the latest modified date. Download this file to your local machine for the next step.

1. The generated CSV file contains a header row. To use this CSV file as input for an S3 Batch Operations job, you must remove the header row, because Batch Operations doesn't support header rows on CSV manifests. 

   To remove the header row, you can run one of the following commands on the file. Replace *`file.csv`* with the name of your CSV file. 

   **For macOS and Linux machines**, run the `tail` command in a Terminal window. 

   ```
   tail -n +2 file.csv > tmp.csv && mv tmp.csv file.csv 
   ```

   **For Windows machines**, run the following script in a Windows PowerShell window. Replace `File-location` with the path to your file, and `file.csv` with the file name.

   ```
   $ins = New-Object System.IO.StreamReader File-location\file.csv
   $outs = New-Object System.IO.StreamWriter File-location\temp.csv
   try {
       $skip = 0
       while ( !$ins.EndOfStream ) {
           $line = $ins.ReadLine();
           if ( $skip -ne 0 ) {
               $outs.WriteLine($line);
           } else {
               $skip = 1
           }
       }
   } finally {
       $outs.Close();
       $ins.Close();
   }
   Move-Item File-location\temp.csv File-location\file.csv -Force
   ```
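
   If Python is available, you can do the same step with one short, cross-platform script instead of the platform-specific commands above. This is a sketch; the `remove_header_row` helper name is illustrative, not part of any AWS tooling.

   ```python
   import csv
   import os

   def remove_header_row(path):
       """Rewrite the CSV file at 'path' in place, dropping its first (header) row."""
       tmp_path = path + ".tmp"
       with open(path, newline="") as src, open(tmp_path, "w", newline="") as dst:
           reader = csv.reader(src)
           next(reader, None)  # skip the header row
           csv.writer(dst).writerows(reader)  # copy the remaining rows unchanged
       os.replace(tmp_path, path)  # replace the original file with the stripped copy
   ```

   For example, `remove_header_row("file.csv")` strips the first row of `file.csv` and leaves the remaining rows unchanged.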

1. After removing the header row from the CSV file, you are ready to use it as a manifest in an S3 Batch Operations job. Upload the CSV file to an S3 bucket or location of your choosing, and then create a Batch Operations job using the CSV file as the manifest.

   For more information about creating a Batch Operations job, see [Creating an S3 Batch Operations job](batch-ops-create-job.md).

# Working with the Object ACL field
<a name="objectacl"></a>

An Amazon S3 Inventory report contains a list of the objects in the source bucket and metadata for each object. The Object ACL field is one of the metadata fields that are available in Amazon S3 Inventory: it contains the access control list (ACL) for each object. An object's ACL defines which AWS accounts or groups are granted access to the object and the type of access that is granted. For more information, see [Access control list (ACL) overview](acl-overview.md) and [Amazon S3 Inventory list](storage-inventory.md#storage-inventory-contents). 

 The Object ACL field in Amazon S3 Inventory reports is defined in JSON format. The JSON data includes the following fields: 
+ `version` – The version of the Object ACL field format in the inventory reports. It's in date format `yyyy-mm-dd`. 
+ `status` – Possible values are `AVAILABLE` or `UNAVAILABLE` to indicate whether an Object ACL is available for an object. When the status for the Object ACL is `UNAVAILABLE`, the value of the Object Owner field in the inventory report is also `UNAVAILABLE`.
+ `grants` – Grantee-permission pairs that list the permission status of each grantee that is granted by the Object ACL. The available values for a grantee are `CanonicalUser` and `Group`. For more information about grantees, see [Grantees in access control lists](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#specifying-grantee).

  For a grantee with the `Group` type, a grantee-permission pair includes the following attributes:
  + `uri` – A predefined Amazon S3 group.
  + `permission` – The ACL permissions that are granted on the object. For more information, see [ACL permissions on an object](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#permissions).
  + `type` – The type `Group`, which denotes that the grantee is a group.

  For a grantee with the `CanonicalUser` type, a grantee-permission pair includes the following attributes:
  + `canonicalId` – An obfuscated form of the AWS account ID. The canonical user ID for an AWS account is specific to that account. You can retrieve the canonical user ID for your account. For more information, see [Find the canonical user ID for your AWS account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#FindCanonicalId) in the *AWS Account Management Reference Guide*.
  + `permission` – The ACL permissions that are granted on the object. For more information, see [ACL permissions on an object](https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html#permissions).
  + `type` – The type `CanonicalUser`, which denotes that the grantee is an AWS account.

The following example shows possible values for the Object ACL field in JSON format: 

```
{
    "version": "2022-11-10",
    "status": "AVAILABLE",
    "grants": [{
        "uri": "http://acs.amazonaws.com/groups/global/AllUsers",
        "permission": "READ",
        "type": "Group"
    }, {
        "canonicalId": "example-canonical-id",
        "permission": "FULL_CONTROL",
        "type": "CanonicalUser"
    }]
}
```
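
Because the decoded field is ordinary JSON, you can also inspect the grants programmatically. The following Python sketch (the `public_permissions` helper is a hypothetical name, not an S3 API) returns the permissions that an Object ACL grants to the predefined `AllUsers` and `AuthenticatedUsers` groups:

```python
import json

# Predefined Amazon S3 groups that represent public or any-authenticated-user access
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_permissions(object_acl_json):
    """Return the permissions that a decoded Object ACL grants to public groups."""
    acl = json.loads(object_acl_json)
    if acl.get("status") != "AVAILABLE":
        return []  # no ACL information is available for this object
    return [
        grant["permission"]
        for grant in acl.get("grants", [])
        if grant.get("type") == "Group" and grant.get("uri") in PUBLIC_GROUP_URIS
    ]
```

Applied to the example above, `public_permissions` would return `["READ"]`, because the ACL grants `READ` to the `AllUsers` group.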

**Note**  
The Object ACL field is defined in JSON format. An inventory report displays the value for the Object ACL field as a base64-encoded string.  
For example, suppose that you have the following Object ACL field in JSON format:  

```
{
        "version": "2022-11-10",
        "status": "AVAILABLE",
        "grants": [{
            "canonicalId": "example-canonical-user-ID",
            "type": "CanonicalUser",
            "permission": "READ"
        }]
}
```
The Object ACL field is encoded and shown as the following base64-encoded string:  

```
eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3siY2Fub25pY2FsSWQiOiJleGFtcGxlLWNhbm9uaWNhbC11c2VyLUlEIiwidHlwZSI6IkNhbm9uaWNhbFVzZXIiLCJwZXJtaXNzaW9uIjoiUkVBRCJ9XX0=
```
To get the decoded value in JSON for the Object ACL field, you can query this field in Amazon Athena. For query examples, see [Querying Amazon S3 Inventory with Amazon Athena](storage-inventory-athena-query.md).
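If you want to decode the field outside of Athena, any base64 decoder works. For example, the following Python snippet decodes the string shown above and parses the JSON document it contains:

```python
import base64
import json

# The base64-encoded Object ACL value as it appears in the inventory report
encoded = "eyJ2ZXJzaW9uIjoiMjAyMi0xMS0xMCIsInN0YXR1cyI6IkFWQUlMQUJMRSIsImdyYW50cyI6W3siY2Fub25pY2FsSWQiOiJleGFtcGxlLWNhbm9uaWNhbC11c2VyLUlEIiwidHlwZSI6IkNhbm9uaWNhbFVzZXIiLCJwZXJtaXNzaW9uIjoiUkVBRCJ9XX0="

# Decode the string, then parse the resulting JSON
acl = json.loads(base64.b64decode(encoded))
print(acl["status"])  # prints "AVAILABLE"
```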