Include metadata in a data source to improve knowledge base query

When ingesting CSV (comma-separated values) files, you can have the knowledge base treat certain columns as content fields and others as metadata fields. Instead of potentially maintaining hundreds or thousands of content/metadata file pairs, you can have a single CSV file and a corresponding metadata.json file that gives the knowledge base hints about how to treat each column in your CSV.

There are limits on document metadata fields/attributes per chunk. See Quotas for knowledge bases.

Before ingesting a CSV file, make sure:

  • Your CSV file conforms to RFC 4180 and is UTF-8 encoded.

  • The first row of your CSV includes header information.

  • Metadata fields provided in your metadata.json are present as columns in your CSV.

  • You provide a fileName.csv.metadata.json file with the following format:

    {
      "metadataAttributes": {
        "${attribute1}": "${value1}",
        "${attribute2}": "${value2}",
        ...
      },
      "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
          "contentFields": [
            { "fieldName": "string" }
          ],
          "metadataFieldsSpecification": {
            "fieldsToInclude": [
              { "fieldName": "string" }
            ],
            "fieldsToExclude": [
              { "fieldName": "string" }
            ]
          }
        }
      }
    }
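
The checklist and file format above can be prepared with a small script. The following Python sketch uses only the standard library to build a configuration shaped like the fileName.csv.metadata.json format, write it next to the CSV, and check some of the listed conditions: that the file decodes as UTF-8, that the first row is non-empty, and that every named field appears in that header row. The helper names (`build_metadata_config`, `write_metadata_file`, `check_csv`) are hypothetical, the sketch does not verify full RFC 4180 conformance, and omitting `metadataFieldsSpecification` when no fields are given is an assumption about what Amazon Bedrock accepts, not documented behavior.

```python
import csv
import io
import json
from pathlib import Path


def build_metadata_config(content_field, include=None, exclude=None, attributes=None):
    """Return a dict shaped like the fileName.csv.metadata.json format."""
    record = {"contentFields": [{"fieldName": content_field}]}
    spec = {}
    if include:
        spec["fieldsToInclude"] = [{"fieldName": name} for name in include]
    if exclude:
        spec["fieldsToExclude"] = [{"fieldName": name} for name in exclude]
    # Assumption: leaving out an empty metadataFieldsSpecification is accepted.
    if spec:
        record["metadataFieldsSpecification"] = spec
    return {
        "metadataAttributes": dict(attributes or {}),
        "documentStructureConfiguration": {
            "type": "RECORD_BASED_STRUCTURE_METADATA",
            "recordBasedStructureMetadata": record,
        },
    }


def write_metadata_file(csv_path, config):
    """Write the config to <CSV file name>.metadata.json beside the CSV."""
    out_path = Path(f"{csv_path}.metadata.json")
    out_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return out_path


def check_csv(csv_path, config):
    """Return a list of problems; an empty list means none were detected."""
    try:
        text = Path(csv_path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    # newline="" lets the csv module handle quoted fields containing line breaks.
    header = next(csv.reader(io.StringIO(text, newline="")), None)
    if not header:
        return ["first row is empty; expected a header row"]
    record = config["documentStructureConfiguration"]["recordBasedStructureMetadata"]
    spec = record.get("metadataFieldsSpecification", {})
    named = [f["fieldName"] for f in record["contentFields"]]
    for key in ("fieldsToInclude", "fieldsToExclude"):
        named += [f["fieldName"] for f in spec.get(key, [])]
    return [f"field not found in CSV header: {name}" for name in named if name not in header]
```

For a hypothetical products.csv, calling `build_metadata_config("Description", include=["Creation_Date"])`, checking the result with `check_csv("products.csv", config)`, and then calling `write_metadata_file("products.csv", config)` would produce products.csv.metadata.json alongside it.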

The CSV file is parsed one row at a time, and the chunking strategy and vector embedding are applied to the content field. Amazon Bedrock knowledge bases currently support one content field. The content field is split into chunks, and the metadata fields (columns) associated with each chunk are treated as string values.

For example, say a CSV has a column 'Description' and a column 'Creation_Date'. The description field is the content field, and the creation date is an associated metadata field. For each row in the CSV, the description text is split into chunks and converted into vector embeddings. The creation date value is treated as a string representation of the date and is associated with each chunk of the description.
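
The row-by-row flow described above can be illustrated with a short Python sketch built on the standard `csv` module, using the Description/Creation_Date example: each row's content field is split into chunks, and the row's metadata value is copied, unchanged as a string, onto every chunk. The fixed-size character chunker, the function names `chunk_text` and `rows_to_chunks`, and treating rows whose fields are all empty as blank are assumptions for illustration; the sketch does not reproduce Amazon Bedrock's chunking strategies and stops before the embedding step.

```python
import csv
import io


def chunk_text(text, max_chars):
    """Hypothetical fixed-size chunker; Bedrock applies your configured strategy instead."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def rows_to_chunks(csv_text, content_field, metadata_fields, max_chars=200):
    """Parse the CSV one row at a time and pair each content chunk with its row's metadata."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        # DictReader already skips empty lines; also skip rows whose fields are all empty.
        if not any(str(v).strip() for v in row.values() if v is not None):
            continue
        # Metadata values stay plain strings; dates are not parsed.
        metadata = {name: row.get(name) or "" for name in metadata_fields}
        for chunk in chunk_text(row.get(content_field) or "", max_chars):
            records.append({"content": chunk, "metadata": dict(metadata)})
    return records


csv_text = (
    "Description,Creation_Date\r\n"
    "A lightweight waterproof jacket for hiking,2024-03-01\r\n"
    "\r\n"
    ",\r\n"
)
# Prints three records, each carrying {'Creation_Date': '2024-03-01'}.
for record in rows_to_chunks(csv_text, "Description", ["Creation_Date"], max_chars=20):
    print(record)
```

The small `max_chars` value only makes the split visible; the metadata dictionary is copied per chunk so that every chunk carries its own string-valued creation date.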

If no inclusion or exclusion fields are provided, all columns except the content column are treated as metadata columns. If only inclusion fields are provided, only the listed columns are treated as metadata. If only exclusion fields are provided, all columns except the excluded columns are treated as metadata. If you provide the same fieldName in both fieldsToInclude and fieldsToExclude, Amazon Bedrock throws a validation exception; any conflict between inclusion and exclusion results in a failure.
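
The column-selection rules above can be expressed as a small function. This Python sketch is hypothetical: `resolve_metadata_columns` is not an Amazon Bedrock API, `ValueError` stands in for the validation exception, and the column names `Author` and `Internal_Id` are invented for the example. Two behaviors the text does not spell out are assumptions: the content column is never returned as metadata, and when both lists are given without overlap, inclusion is applied before exclusion.

```python
def resolve_metadata_columns(columns, content_field, include=None, exclude=None):
    """Return the CSV columns that become metadata, in CSV column order."""
    include = set(include or [])
    exclude = set(exclude or [])
    overlap = include & exclude
    if overlap:
        # Stand-in for the validation exception Amazon Bedrock throws.
        raise ValueError(f"fieldName in both fieldsToInclude and fieldsToExclude: {sorted(overlap)}")
    # Assumption: the content column is never treated as metadata.
    candidates = [c for c in columns if c != content_field]
    if include:
        # Only the listed columns are kept.
        candidates = [c for c in candidates if c in include]
    if exclude:
        # Every remaining column except the excluded ones is kept.
        candidates = [c for c in candidates if c not in exclude]
    return candidates


columns = ["Description", "Creation_Date", "Author", "Internal_Id"]
print(resolve_metadata_columns(columns, "Description"))
# ['Creation_Date', 'Author', 'Internal_Id']
print(resolve_metadata_columns(columns, "Description", include=["Author"]))
# ['Author']
print(resolve_metadata_columns(columns, "Description", exclude=["Internal_Id"]))
# ['Creation_Date', 'Author']
```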

Blank rows inside a CSV file are skipped.

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.