Include metadata in a data source to improve knowledge base query

When ingesting CSV (comma-separated values) files, you can have the knowledge base treat certain columns as content fields and others as metadata fields. Instead of maintaining hundreds or thousands of content/metadata file pairs, you can provide a single CSV file and a corresponding metadata.json file that tells the knowledge base how to treat each column in your CSV.

There are limits on document metadata fields/attributes per chunk. See Quotas for knowledge bases.

Before ingesting a CSV file, make sure:

Your CSV is in RFC4180 format and is UTF-8 encoded.
The first row of your CSV includes header information.
Metadata fields provided in your metadata.json are present as columns in your CSV.

You provide a fileName.csv.metadata.json file with the following format:


{
    "metadataAttributes": {
        "${attribute1}": "${value1}",
        "${attribute2}": "${value2}",
        ...
    },
    "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
            "contentFields": [
                {
                    "fieldName": "string"
                }
            ],
            "metadataFieldsSpecification": {
                "fieldsToInclude": [
                    {
                        "fieldName": "string"
                    }
                ],
                "fieldsToExclude": [
                    {
                        "fieldName": "string"
                    }
                ]
            }
        }
    }
}
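The checks listed before the template can be approximated locally before you upload anything. The following is a minimal sketch, not part of Amazon Bedrock: the function name check_csv_against_metadata is hypothetical, it verifies only UTF-8 decoding, the presence of a header row, and that every field named in the metadata file exists as a column, and it does not attempt full RFC4180 validation. Checking the content field against the header as well is an assumption added for convenience.

```python
import csv
import io
import json


def check_csv_against_metadata(csv_bytes: bytes, metadata_json: str) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed.

    Hypothetical helper for local pre-flight checks only.
    """
    try:
        text = csv_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return ["CSV is not valid UTF-8"]

    # The first row is expected to carry the column names.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if not header:
        return ["CSV has no header row"]
    columns = set(header)

    config = json.loads(metadata_json)
    record = config["documentStructureConfiguration"]["recordBasedStructureMetadata"]
    spec = record.get("metadataFieldsSpecification", {})
    referenced = (
        record.get("contentFields", [])
        + spec.get("fieldsToInclude", [])
        + spec.get("fieldsToExclude", [])
    )

    # Every field named in the metadata file should match a CSV column.
    problems = []
    for field in referenced:
        name = field["fieldName"]
        if name not in columns:
            problems.append(f"field {name!r} is not a column in the CSV")
    return problems
```

An empty return value means only that these three checks passed; the ingestion job itself may still reject the file for other reasons.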

The CSV file is parsed one row at a time, and the chunking strategy and vector embedding are applied to the content field. Amazon Bedrock knowledge bases currently support one content field. The content field is split into chunks, and the metadata fields (columns) that are associated with each chunk are treated as string values.

For example, say there's a CSV with a column 'Description' and a column 'Creation_Date'. The description field is the content field and the creation date is an associated metadata field. For each row in the CSV, the description text is split into chunks and converted into vector embeddings. The creation date value is treated as a string representation of the date and is associated with each chunk of the description.
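The 'Description' and 'Creation_Date' example could be expressed as a metadata file along the following lines. This is a sketch under stated assumptions: the file name products.csv and the helper sidecar_path are hypothetical, and the document-level metadataAttributes block and the fieldsToExclude list from the template are left out on the assumption that they are optional.

```python
import json
from pathlib import Path


def sidecar_path(csv_path: Path) -> Path:
    """Return the <fileName>.csv.metadata.json path that accompanies csv_path."""
    return csv_path.with_name(csv_path.name + ".metadata.json")


# 'Description' is the single content field; 'Creation_Date' is kept as metadata.
config = {
    "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
            "contentFields": [{"fieldName": "Description"}],
            "metadataFieldsSpecification": {
                "fieldsToInclude": [{"fieldName": "Creation_Date"}]
            },
        },
    }
}


def write_sidecar(csv_path: Path, config: dict) -> Path:
    """Write config next to the CSV and return the path of the new file."""
    path = sidecar_path(csv_path)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path
```

Calling write_sidecar(Path("products.csv"), config) would create products.csv.metadata.json in the same directory as the hypothetical CSV.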

If no inclusion/exclusion fields are provided, all columns are treated as metadata columns, except the content column. If only inclusion fields are provided, only the provided columns are treated as metadata. If only exclusion fields are provided, all columns except the exclusion columns are treated as metadata. If you provide the same fieldName in both fieldsToInclude and fieldsToExclude, Amazon Bedrock throws a validation exception. If there's a conflict between inclusion and exclusion, it results in a failure.
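The inclusion and exclusion rules above can be illustrated with a small function. This is a sketch of the rules as described in this section, not Amazon Bedrock's implementation: the name resolve_metadata_columns is hypothetical, a plain ValueError stands in for the service's validation exception, and applying the inclusion list when both lists are given without overlap is an assumption, since that case is not spelled out here.

```python
def resolve_metadata_columns(columns, content_field, include=None, exclude=None):
    """Return the columns that would be treated as metadata, in CSV order.

    Hypothetical illustration of the inclusion/exclusion rules.
    """
    include = list(include or [])
    exclude = list(exclude or [])

    # The same field in both lists is a conflict and is rejected.
    overlap = set(include) & set(exclude)
    if overlap:
        raise ValueError(f"fields in both include and exclude: {sorted(overlap)}")

    # The content column is never treated as metadata.
    candidates = [c for c in columns if c != content_field]

    if include:
        # Only the listed columns are metadata (assumed to win if both lists are given).
        return [c for c in candidates if c in include]
    if exclude:
        # Everything except the listed columns is metadata.
        return [c for c in candidates if c not in exclude]
    # Neither list given: every non-content column is metadata.
    return candidates
```

For a CSV with the columns Description, Creation_Date, Author, and Category, with Description as the content field, passing exclude=["Author"] leaves Creation_Date and Category as metadata.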

Blank rows in a CSV are skipped.
