Locating your inventory list
When an inventory list is published, the manifest files are published to the following location in the destination bucket.
destination-prefix/amzn-s3-demo-source-bucket/config-ID/YYYY-MM-DDTHH-MMZ/manifest.json
destination-prefix/amzn-s3-demo-source-bucket/config-ID/YYYY-MM-DDTHH-MMZ/manifest.checksum
destination-prefix/amzn-s3-demo-source-bucket/config-ID/hive/dt=YYYY-MM-DD-HH-MM/symlink.txt
- destination-prefix is the object key name prefix that is optionally specified in the inventory configuration. You can use this prefix to group all the inventory list files in a common location within the destination bucket.
- amzn-s3-demo-source-bucket is the source bucket that the inventory list is for. The source bucket name is added to prevent collisions when multiple inventory reports from different source buckets are sent to the same destination bucket.
- config-ID is added to prevent collisions with multiple inventory reports from the same source bucket that are sent to the same destination bucket. The config-ID comes from the inventory report configuration, and is the name for the report that is defined during setup.
- YYYY-MM-DDTHH-MMZ is the timestamp that consists of the start time and the date when the inventory report generation process begins scanning the bucket; for example, 2016-11-06T21-32Z.
- manifest.json is the manifest file.
- manifest.checksum is the MD5 hash of the content of the manifest.json file.
- symlink.txt is the Apache Hive-compatible manifest file.
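As a minimal sketch of how the key layout above fits together, the following Python function composes the three manifest keys from their components. The function name `manifest_keys` and its parameters are hypothetical, chosen only for illustration. The sketch assumes that the timestamp is in UTC (suggested by the trailing Z) and that the Hive partition value uses the same instant as the manifest folder.

```python
from datetime import datetime, timezone


def manifest_keys(destination_prefix: str, source_bucket: str,
                  config_id: str, started: datetime) -> dict:
    """Compose the manifest object keys for one inventory delivery.

    Hypothetical helper: mirrors the documented key layout only.
    """
    base = f"{source_bucket}/{config_id}"
    # The destination prefix is optional in the inventory configuration.
    if destination_prefix:
        base = f"{destination_prefix.rstrip('/')}/{base}"

    stamp = started.strftime("%Y-%m-%dT%H-%MZ")    # YYYY-MM-DDTHH-MMZ
    hive_dt = started.strftime("%Y-%m-%d-%H-%M")   # YYYY-MM-DD-HH-MM (assumed same instant)

    return {
        "manifest.json": f"{base}/{stamp}/manifest.json",
        "manifest.checksum": f"{base}/{stamp}/manifest.checksum",
        "symlink.txt": f"{base}/hive/dt={hive_dt}/symlink.txt",
    }


keys = manifest_keys(
    "destination-prefix",
    "amzn-s3-demo-source-bucket",
    "config-ID",
    datetime(2016, 11, 6, 21, 32, tzinfo=timezone.utc),
)
for name, key in keys.items():
    print(name, "->", key)
```

Passing an empty string for the prefix yields keys that begin directly with the source bucket name.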
The inventory lists are published daily or weekly to the following location in the destination bucket.
destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name.csv.gz
...
destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name-1.csv.gz
- destination-prefix is the object key name prefix that is optionally specified in the inventory configuration. You can use this prefix to group all the inventory list files in a common location in the destination bucket.
- amzn-s3-demo-source-bucket is the source bucket that the inventory list is for. The source bucket name is added to prevent collisions when multiple inventory reports from different source buckets are sent to the same destination bucket.
- example-file-name.csv.gz is one of the CSV inventory files. ORC inventory names end with the file name extension .orc, and Parquet inventory names end with the file name extension .parquet.
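The file name extensions described above are enough to tell the inventory formats apart. The following Python sketch shows one way to do that; the helper name `inventory_format` is hypothetical, and the check relies only on the extensions named in this section.

```python
def inventory_format(key: str) -> str:
    """Infer the inventory file format from an object key's extension.

    Hypothetical helper based only on the documented extensions.
    """
    file_name = key.rsplit("/", 1)[-1]  # drop any key prefix
    if file_name.endswith(".csv.gz"):
        return "CSV"
    if file_name.endswith(".orc"):
        return "ORC"
    if file_name.endswith(".parquet"):
        return "Parquet"
    raise ValueError(f"Not a recognized inventory file name: {file_name}")


print(inventory_format(
    "destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name.csv.gz"
))
```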
Inventory manifest
The manifest files manifest.json and symlink.txt describe where the inventory files are located. Whenever a new inventory list is delivered, it is accompanied by a new set of manifest files. These files might overwrite each other. In versioning-enabled buckets, Amazon S3 creates new versions of the manifest files.
Each manifest contained in the manifest.json file provides metadata and other basic information about an inventory. This information includes the following:
- The source bucket name
- The destination bucket name
- The version of the inventory
- The creation timestamp in the epoch date format that consists of the start time and the date when the inventory report generation process begins scanning the bucket
- The format and schema of the inventory files
- A list of the inventory files that are in the destination bucket
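The following Python sketch shows one way to read these fields from a manifest.json file. The JSON field names used here (`sourceBucket`, `destinationBucket`, `version`, `creationTimestamp`, `fileFormat`, `fileSchema`, `files`, `key`, `size`) and all sample values are assumptions made for illustration, so compare them with a manifest from your own destination bucket. The sketch also assumes that the epoch timestamp is expressed in milliseconds and that a CSV schema is a comma-separated list of column names.

```python
import json
from datetime import datetime, timezone

# Illustrative manifest content only; field names and values are assumptions.
SAMPLE_MANIFEST = """
{
  "sourceBucket": "amzn-s3-demo-source-bucket",
  "destinationBucket": "arn:aws:s3:::example-inventory-destination-bucket",
  "version": "2016-11-30",
  "creationTimestamp": "1514944800000",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, Size",
  "files": [
    {"key": "destination-prefix/amzn-s3-demo-source-bucket/config-ID/data/example-file-name.csv.gz",
     "size": 1024}
  ]
}
"""


def summarize_manifest(manifest_text: str) -> dict:
    """Extract the basic inventory metadata from manifest.json content."""
    manifest = json.loads(manifest_text)

    # Assumption: the epoch value is in milliseconds.
    created = datetime.fromtimestamp(
        int(manifest["creationTimestamp"]) / 1000, tz=timezone.utc
    )

    return {
        "source_bucket": manifest["sourceBucket"],
        "destination_bucket": manifest["destinationBucket"],
        "version": manifest["version"],
        "created": created,
        "format": manifest["fileFormat"],
        # Assumption: comma-separated column names (CSV inventories only).
        "columns": [c.strip() for c in manifest["fileSchema"].split(",")],
        "file_keys": [f["key"] for f in manifest["files"]],
    }


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary["created"].isoformat(), summary["format"], summary["file_keys"])
```

The `file_keys` entries are the object keys of the inventory files to download from the destination bucket.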
Whenever a manifest.json file is written, it is accompanied by a manifest.checksum file that is the MD5 hash of the content of the manifest.json file.
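As a hedged sketch of how the checksum can be used, the following Python function compares the MD5 digest of downloaded manifest.json bytes with the content of manifest.checksum. The function name is hypothetical, and the sketch assumes that the checksum file holds the digest as hexadecimal text; it leaves downloading both objects to the caller.

```python
import hashlib


def manifest_matches_checksum(manifest_bytes: bytes, checksum_text: str) -> bool:
    """Return True if the MD5 of manifest.json matches manifest.checksum.

    Assumption: manifest.checksum stores the digest as hexadecimal text.
    """
    expected = checksum_text.strip().lower()
    actual = hashlib.md5(manifest_bytes).hexdigest()
    return actual == expected


# Stand-in content; in practice these come from the destination bucket.
manifest_bytes = b'{"sourceBucket": "amzn-s3-demo-source-bucket"}'
checksum_text = hashlib.md5(manifest_bytes).hexdigest() + "\n"

print(manifest_matches_checksum(manifest_bytes, checksum_text))
```

A mismatch suggests that the manifest was truncated or altered in transit, so it is safer to download it again before reading the inventory files it lists.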
Example Inventory manifest in a manifest.json file
The following examples show an inventory manifest in a manifest.json file for CSV, ORC, and Parquet-formatted inventories.
The symlink.txt file is an Apache Hive-compatible manifest file that allows Hive to automatically discover inventory files and their associated data files. The Hive-compatible manifest works with the Hive-compatible services Athena and Amazon Redshift Spectrum. It also works with Hive-compatible applications, including Presto.
Important
The symlink.txt Apache Hive-compatible manifest file does not currently work with AWS Glue.
Reading the symlink.txt file with Apache Hive