Ingest documents directly into a knowledge base - Amazon Bedrock

To ingest documents directly into a knowledge base, send an IngestKnowledgeBaseDocuments request with an Agents for Amazon Bedrock build-time endpoint. In the request, specify the ID of the knowledge base and the ID of the data source that it is connected to.

Note

If you specify a document identifier or S3 location that already exists in the knowledge base, the document will be overwritten with the new content.

Refer to the following table for restrictions on the methods that you can use to specify the documents to ingest:

Data source type | Document defined in-line | Document in Amazon S3 location
Amazon S3        | No                       | Yes
Custom           | Yes                      | Yes
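
The restrictions in the table above can be expressed as a small lookup, which may be handy for validating a request before sending it. This is a minimal sketch: the names ALLOWED_SOURCES and is_allowed are hypothetical, and the labels "S3", "CUSTOM", and "IN_LINE" are borrowed from the request formats shown later on this page.

```python
# Allowed ways to specify a document for each data source type,
# transcribed from the table above. The dictionary and function
# names are illustrative, not part of any AWS SDK.
ALLOWED_SOURCES = {
    "S3": {"S3"},                 # Amazon S3 data source: S3 location only
    "CUSTOM": {"IN_LINE", "S3"},  # Custom data source: in-line or S3 location
}


def is_allowed(data_source_type: str, source: str) -> bool:
    """Return True if documents for this data source type may come from `source`."""
    return source in ALLOWED_SOURCES.get(data_source_type, set())


print(is_allowed("CUSTOM", "IN_LINE"))
print(is_allowed("S3", "IN_LINE"))
```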

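The request body contains one field, documents, that maps to an array of KnowledgeBaseDocument objects. Each object represents the content and optional metadata of a document to add to the data source and ingest into the knowledge base. A KnowledgeBaseDocument object contains the following fields:

content: A DocumentContent object that defines the content of the document.
metadata (optional): A DocumentMetadata object that defines the metadata of the document.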

Note

The content and metadata for a document must be defined with the same method. For example, if you define the content inline, you must also define the metadata inline.
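
The rule in the note above could be checked on the client side before a request is sent. The following is a sketch only: the helper names are hypothetical, and the field names and values ("sourceType", "IN_LINE", "IN_LINE_ATTRIBUTE", "S3_LOCATION") are taken from the request formats shown later on this page.

```python
def content_method(content: dict) -> str:
    """Classify a DocumentContent dict as defined 'inline' or from 's3'."""
    if content["dataSourceType"] == "CUSTOM":
        source_type = content["custom"]["sourceType"]
        return "inline" if source_type == "IN_LINE" else "s3"
    # Per this page, S3 data sources accept content from S3 locations only.
    return "s3"


def metadata_method(metadata: dict) -> str:
    """Classify a DocumentMetadata dict as defined 'inline' or from 's3'."""
    return "inline" if metadata["type"] == "IN_LINE_ATTRIBUTE" else "s3"


def uses_same_method(document: dict) -> bool:
    """Return True if content and metadata are defined with the same method."""
    metadata = document.get("metadata")
    if metadata is None:  # metadata is optional, so there is nothing to compare
        return True
    return content_method(document["content"]) == metadata_method(metadata)
```

A document with inline content and S3 metadata would return False here and could be rejected locally before the service is called.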

The following sections describe how to ingest documents for each data source type.

Ingest a document into a knowledge base connected to a custom data source

If the dataSourceId you specify belongs to a custom data source, you can add content and metadata for each KnowledgeBaseDocument object in the documents array.

The content of a document added to a custom data source can be defined in the following ways:

You can define the following types of documents in-line:

If the document is text, the DocumentContent object should be in the following format:

{
  "custom": {
    "customDocumentIdentifier": {
      "id": "string"
    },
    "inlineContent": {
      "textContent": {
        "data": "string"
      },
      "type": "TEXT"
    },
    "sourceType": "IN_LINE"
  },
  "dataSourceType": "CUSTOM"
}

Include an ID for the document in the id field and the text of the document in the data field.
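
As an illustration, the text format above could be produced by a small helper. The function name inline_text_content is hypothetical; the structure it returns mirrors the format shown above.

```python
import json


def inline_text_content(document_id: str, text: str) -> dict:
    """Build a DocumentContent dict for a text document defined in-line."""
    return {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": document_id},
            "inlineContent": {
                "textContent": {"data": text},
                "type": "TEXT",
            },
            "sourceType": "IN_LINE",
        },
    }


# Print the resulting JSON for inspection.
print(json.dumps(inline_text_content("MyDocument", "Hello world!"), indent=2))
```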

If the document contains more than just text, convert it into a Base64-encoded string. The DocumentContent object should then be in the following format:

{
  "custom": {
    "customDocumentIdentifier": {
      "id": "string"
    },
    "inlineContent": {
      "byteContent": {
        "data": blob,
        "mimeType": "string"
      },
      "type": "BYTE"
    },
    "sourceType": "IN_LINE"
  },
  "dataSourceType": "CUSTOM"
}

Include an ID for the document in the id field, the Base64-encoded document in the data field, and the MIME type in the mimeType field.
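
The Base64 step might look like the following when the JSON body is assembled by hand. This is a sketch under assumptions: inline_byte_content is a hypothetical helper, and the sample bytes stand in for a real file. An AWS SDK may perform the Base64 encoding itself when given raw bytes, so check the SDK's documentation before encoding twice.

```python
import base64


def inline_byte_content(document_id: str, raw: bytes, mime_type: str) -> dict:
    """Build a DocumentContent dict for a binary document defined in-line."""
    # Base64-encode the raw bytes and decode to ASCII so the value is JSON-safe.
    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": document_id},
            "inlineContent": {
                "byteContent": {"data": encoded, "mimeType": mime_type},
                "type": "BYTE",
            },
            "sourceType": "IN_LINE",
        },
    }


# Placeholder bytes; in practice these would be read from the document file.
sample_bytes = b"%PDF-1.4 sample"
pdf_content = inline_byte_content("MyDocument", sample_bytes, "application/pdf")
```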

If you're ingesting a document from an S3 location, the DocumentContent object in the content field should be of the following form:

{
  "custom": {
    "customDocumentIdentifier": {
      "id": "string"
    },
    "s3Location": {
      "bucketOwnerAccountId": "string",
      "uri": "string"
    },
    "sourceType": "S3"
  },
  "dataSourceType": "CUSTOM"
}

Include an ID for the document in the id field, the owner of the S3 bucket that contains the document in the bucketOwnerAccountId field, and the S3 URI of the document in the uri field.

The metadata for a document can be defined in the following ways:

If you define the metadata inline, the DocumentMetadata object in the metadata field should be in the following format:

{
  "inlineAttributes": [
    {
      "key": "string",
      "value": {
        "stringValue": "string",
        "booleanValue": boolean,
        "numberValue": number,
        "stringListValue": [ "string" ],
        "type": "STRING" | "BOOLEAN" | "NUMBER" | "STRING_LIST"
      }
    }
  ],
  "type": "IN_LINE_ATTRIBUTE"
}

For each attribute that you add, define the key in the key field. Specify the data type of the value in the type field and include the field that corresponds to the data type. For example, if you include a string, the attribute would be in the following format:

{
  "key": "string",
  "value": {
    "stringValue": "string",
    "type": "STRING"
  }
}
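
One way to produce attributes in this format is to infer the type field from a native value. The sketch below assumes Python values map to the four types as shown; the helper names to_attribute and inline_metadata are hypothetical.

```python
def to_attribute(key: str, value) -> dict:
    """Convert a key and a Python value into one inline metadata attribute."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        typed = {"booleanValue": value, "type": "BOOLEAN"}
    elif isinstance(value, (int, float)):
        typed = {"numberValue": value, "type": "NUMBER"}
    elif isinstance(value, str):
        typed = {"stringValue": value, "type": "STRING"}
    elif isinstance(value, list) and all(isinstance(v, str) for v in value):
        typed = {"stringListValue": value, "type": "STRING_LIST"}
    else:
        raise TypeError(f"Unsupported metadata value for {key!r}: {value!r}")
    return {"key": key, "value": typed}


def inline_metadata(attributes: dict) -> dict:
    """Build a DocumentMetadata dict from a mapping of attribute names to values."""
    return {
        "inlineAttributes": [to_attribute(k, v) for k, v in attributes.items()],
        "type": "IN_LINE_ATTRIBUTE",
    }


metadata = inline_metadata({"genre": "pop", "year": 1988})
```

Checking bool before int matters here: without that ordering, True would be emitted as a NUMBER attribute.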

You can also ingest metadata from a file with the extension .metadata.json in an S3 location. For more information about the format of a metadata file, see the Document metadata fields section in Connect to Amazon S3 for your Amazon Bedrock knowledge base.

If the metadata is from an S3 file, the DocumentMetadata object in the metadata field should be in the following format:

{
  "s3Location": {
    "bucketOwnerAccountId": "string",
    "uri": "string"
  },
  "type": "S3_LOCATION"
}

Include the owner of the S3 bucket that contains the metadata file in the bucketOwnerAccountId field, and the S3 URI of the metadata file in the uri field.

Warning

If you define the content inline, you must also define the metadata inline.

Ingest a document into a knowledge base connected to an Amazon S3 data source

If the dataSourceId you specify belongs to an S3 data source, you can add content and metadata for each KnowledgeBaseDocument object in the documents array.

Note

For S3 data sources, you can only add content and metadata from an S3 location.

Define the content of a document to add to an S3 data source in a DocumentContent object in the following format:

{
  "dataSourceType": "S3",
  "s3": {
    "s3Location": {
      "uri": "string"
    }
  }
}

Include the S3 URI of the document in the uri field.

The metadata for a document added to an S3 data source can be defined in the following format:

{
  "s3Location": {
    "bucketOwnerAccountId": "string",
    "uri": "string"
  },
  "type": "S3_LOCATION"
}

Warning

Documents that you ingest directly into a knowledge base connected to an S3 data source aren't added to the S3 bucket itself. We recommend that you add these documents to the S3 data source as well so that they aren't removed or overwritten if you sync your data source.
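
The content and metadata formats for S3 data sources can be combined into one KnowledgeBaseDocument. The sketch below uses a hypothetical helper name and hypothetical object keys in the sample URIs, and it treats bucketOwnerAccountId as optional, which is an assumption rather than something this page states.

```python
from typing import Optional


def s3_data_source_document(
    document_uri: str,
    metadata_uri: Optional[str] = None,
    metadata_bucket_owner: Optional[str] = None,
) -> dict:
    """Build a KnowledgeBaseDocument dict for an Amazon S3 data source."""
    document = {
        "content": {
            "dataSourceType": "S3",
            "s3": {"s3Location": {"uri": document_uri}},
        }
    }
    if metadata_uri is not None:
        location = {"uri": metadata_uri}
        # Assumption: the bucket owner is only included when it is supplied.
        if metadata_bucket_owner is not None:
            location["bucketOwnerAccountId"] = metadata_bucket_owner
        document["metadata"] = {"s3Location": location, "type": "S3_LOCATION"}
    return document


# The object keys below are made up for illustration.
s3_document = s3_data_source_document(
    "s3://amzn-s3-demo-bucket/report.pdf",
    metadata_uri="s3://amzn-s3-demo-bucket/report.pdf.metadata.json",
    metadata_bucket_owner="111122223333",
)
```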

Example request bodies

The following examples show request bodies for different use cases with IngestKnowledgeBaseDocuments:

The following example shows the addition of one text document to a custom data source:

PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
  "documents": [
    {
      "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
          "customDocumentIdentifier": {
            "id": "MyDocument"
          },
          "inlineContent": {
            "textContent": {
              "data": "Hello world!"
            },
            "type": "TEXT"
          },
          "sourceType": "IN_LINE"
        }
      }
    }
  ]
}
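
The pieces of the request above (method, path, header, and JSON body) could be assembled programmatically as follows. This is a sketch: build_request is a hypothetical helper, and it does not sign the request. A real call to the service must also carry AWS authentication, which an AWS SDK normally handles.

```python
import json


def build_request(knowledge_base_id: str, data_source_id: str, documents: list) -> tuple:
    """Return (method, path, headers, body) for an ingestion request."""
    path = (
        f"/knowledgebases/{knowledge_base_id}"
        f"/datasources/{data_source_id}/documents"
    )
    headers = {"Content-type": "application/json"}
    body = json.dumps({"documents": documents})
    return "PUT", path, headers, body


# The same text document as in the example request above.
document = {
    "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": "MyDocument"},
            "inlineContent": {
                "textContent": {"data": "Hello world!"},
                "type": "TEXT",
            },
            "sourceType": "IN_LINE",
        },
    }
}

method, path, headers, body = build_request("KB12345678", "DS12345678", [document])
```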

The following example shows the addition of a PDF document to a custom data source:

PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
  "documents": [
    {
      "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
          "customDocumentIdentifier": {
            "id": "MyDocument"
          },
          "inlineContent": {
            "byteContent": {
              "data": "<Base64-encoded string>",
              "mimeType": "application/pdf"
            },
            "type": "BYTE"
          },
          "sourceType": "IN_LINE"
        }
      }
    }
  ]
}

The following example shows the addition of one text document to a custom data source from an S3 location:

PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
  "documents": [
    {
      "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
          "customDocumentIdentifier": {
            "id": "MyDocument"
          },
          "s3Location": {
            "uri": "amzn-s3-demo-bucket"
          },
          "sourceType": "S3"
        }
      }
    }
  ]
}

The following example shows the inline addition to a custom data source of a document alongside metadata containing two attributes:

PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
  "documents": [
    {
      "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
          "customDocumentIdentifier": {
            "id": "MyDocument"
          },
          "inlineContent": {
            "textContent": {
              "data": "Hello world!"
            },
            "type": "TEXT"
          },
          "sourceType": "IN_LINE"
        }
      },
      "metadata": {
        "inlineAttributes": [
          {
            "key": "genre",
            "value": {
              "stringValue": "pop",
              "type": "STRING"
            }
          },
          {
            "key": "year",
            "value": {
              "numberValue": 1988,
              "type": "NUMBER"
            }
          }
        ],
        "type": "IN_LINE_ATTRIBUTE"
      }
    }
  ]
}

The following example shows the addition of a document alongside metadata to an S3 data source. You can only include the metadata through S3:

PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
  "documents": [
    {
      "content": {
        "dataSourceType": "S3",
        "s3": {
          "s3Location": {
            "uri": "amzn-s3-demo-bucket"
          }
        }
      },
      "metadata": {
        "s3Location": {
          "bucketOwnerAccountId": "111122223333",
          "uri": "amzn-s3-demo-bucket"
        },
        "type": "S3_LOCATION"
      }
    }
  ]
}
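
Instead of sending the HTTP request by hand, the same documents array could be passed through an AWS SDK. The sketch below assumes that the boto3 "bedrock-agent" client exposes an ingest_knowledge_base_documents operation with knowledgeBaseId, dataSourceId, and documents parameters; verify this against the SDK documentation for your installed version. The wrapper function ingest is hypothetical.

```python
def ingest(knowledge_base_id: str, data_source_id: str, documents: list, client=None):
    """Send an IngestKnowledgeBaseDocuments request through an SDK client."""
    if client is None:
        # Imported lazily so this module loads even without boto3 installed.
        # Assumption: boto3 is installed and AWS credentials are configured.
        import boto3

        client = boto3.client("bedrock-agent")
    # Assumption: operation and parameter names mirror the REST API on this page.
    return client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documents=documents,
    )
```

Accepting an optional client makes the wrapper easy to exercise with a stub object in tests, without calling the service.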