Ingest documents directly into a knowledge base
To ingest documents directly into a knowledge base, send an IngestKnowledgeBaseDocuments request with an Agents for Amazon Bedrock build-time endpoint and specify the ID of the knowledge base and of the data source that it is connected to.
Note
If you specify a document identifier or S3 location that already exists in the knowledge base, the document will be overwritten with the new content.
Refer to the following table for restrictions on the methods that you can use to specify the documents to ingest:
Data source type | Document defined in-line | Document in Amazon S3 location |
---|---|---|
Amazon S3 | No | Yes |
Custom | Yes | Yes |
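The restrictions in the table can be encoded as a small lookup for checking a request before you send it. The following Python sketch is illustrative only: ALLOWED_METHODS and is_method_allowed are hypothetical names, not part of any AWS SDK.

```python
# Hypothetical helper that encodes the restrictions table above.
ALLOWED_METHODS = {
    "S3": {"S3"},                 # S3 data sources: S3 location only
    "CUSTOM": {"IN_LINE", "S3"},  # custom data sources: in-line or S3 location
}


def is_method_allowed(data_source_type: str, method: str) -> bool:
    """Return True if the definition method is allowed for the data source type."""
    return method in ALLOWED_METHODS.get(data_source_type, set())


print(is_method_allowed("S3", "IN_LINE"))      # False
print(is_method_allowed("CUSTOM", "IN_LINE"))  # True
```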
The request body contains one field, documents, that maps to an array of KnowledgeBaseDocument objects, each of which represents the content and optional metadata of a document to add to the data source and to ingest into the knowledge base. A KnowledgeBaseDocument object contains the following fields:
- content – Maps to a DocumentContent object containing information about the content of the document to add.
- metadata – (Optional) Maps to a DocumentMetadata object containing information about the metadata of the document to add. For more information about how to use metadata during retrieval, see the Metadata and filtering section in Configure and customize queries and response generation.
Note
The content and metadata for a document must be defined with the same method. For example, if you define the content inline, you must also define the metadata inline.
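The rule in the note can be checked on the client side before a request is sent. The following Python sketch assumes the field values that this page uses for content (sourceType of IN_LINE or S3) and metadata (type of IN_LINE_ATTRIBUTE or S3_LOCATION). The function names are hypothetical and not part of any AWS SDK.

```python
def content_method(content: dict) -> str:
    """Return "IN_LINE" or "S3" depending on how the content is defined."""
    if content.get("dataSourceType") == "CUSTOM":
        return content["custom"]["sourceType"]
    return "S3"  # S3 data sources accept content from an S3 location only


def metadata_method(metadata: dict) -> str:
    """Return "IN_LINE" or "S3" depending on how the metadata is defined."""
    return "IN_LINE" if metadata["type"] == "IN_LINE_ATTRIBUTE" else "S3"


def uses_same_method(document: dict) -> bool:
    """Return True if metadata is absent or defined with the same method as content."""
    metadata = document.get("metadata")
    if metadata is None:  # metadata is optional
        return True
    return content_method(document["content"]) == metadata_method(metadata)
```

A document with inline content and S3 metadata makes uses_same_method return False, so you can correct it before sending the request.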
The following sections describe how to ingest documents for each data source type.
Ingest a document into a knowledge base connected to a custom data source
If the dataSourceId you specify belongs to a custom data source, you can add content and metadata for each KnowledgeBaseDocument object in the documents array.
The content of a document added to a custom data source can be defined in the following ways:
You can define the following types of documents in-line:
If the document is text, the DocumentContent object should be in the following format:
{ "custom": { "customDocumentIdentifier": { "id": "string" }, "inlineContent": { "textContent": { "data": "string" }, "type": "TEXT" }, "sourceType": "IN_LINE" }, "dataSourceType": "CUSTOM" }
Include an ID for the document in the id field and the text of the document in the data field.
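As an illustration, the text format above can be produced by a small helper. This is a minimal Python sketch that only builds the JSON structure; text_document_content is a hypothetical name, not an SDK function.

```python
import json


def text_document_content(document_id: str, text: str) -> dict:
    """Build a DocumentContent object for an in-line text document."""
    return {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": document_id},
            "inlineContent": {
                "textContent": {"data": text},
                "type": "TEXT",
            },
            "sourceType": "IN_LINE",
        },
    }


# Example values; replace them with your own document ID and text.
print(json.dumps(text_document_content("MyDocument", "Hello world!"), indent=2))
```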
If the document contains more than just text, convert it into a Base64-encoded string. The DocumentContent object should then be in the following format:
{ "custom": { "customDocumentIdentifier": { "id": "string" }, "inlineContent": { "byteContent": { "data": blob, "mimeType": "string" }, "type": "BYTE" }, "sourceType": "IN_LINE" }, "dataSourceType": "CUSTOM" }
Include an ID for the document in the id field, the Base64-encoded document in the data field, and the MIME type in the mimeType field.
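The Base64 step can be sketched with the Python standard library. The helper below is hypothetical and assumes that you are assembling the JSON request body yourself. If you call the API through an SDK, it may handle the encoding for you, so check its documentation.

```python
import base64


def byte_document_content(document_id: str, raw: bytes, mime_type: str) -> dict:
    """Build a DocumentContent object for an in-line binary document."""
    # JSON cannot carry raw bytes, so encode them as a Base64 ASCII string.
    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": document_id},
            "inlineContent": {
                "byteContent": {"data": encoded, "mimeType": mime_type},
                "type": "BYTE",
            },
            "sourceType": "IN_LINE",
        },
    }


# Placeholder bytes stand in for the contents of a real PDF file.
content = byte_document_content("MyDocument", b"%PDF-1.4 sample", "application/pdf")
```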
If you're ingesting a document from an S3 location, the DocumentContent object in the content field should be of the following form:
{ "custom": { "customDocumentIdentifier": { "id": "string" }, "s3Location": { "bucketOwnerAccountId": "string", "uri": "string" }, "sourceType": "S3" }, "dataSourceType": "CUSTOM" }
Include an ID for the document in the id field, the account ID of the owner of the S3 bucket that contains the document in the bucketOwnerAccountId field, and the S3 URI of the document in the uri field.
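A minimal Python sketch of the S3 form above follows. The helper name is hypothetical, and the account ID and URI in the usage line are placeholders rather than real resources. The sourceType value follows the format shown on this page.

```python
def s3_custom_document_content(document_id: str, bucket_owner_account_id: str, uri: str) -> dict:
    """Build a DocumentContent object for a custom data source document stored in S3."""
    return {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": document_id},
            "s3Location": {
                "bucketOwnerAccountId": bucket_owner_account_id,
                "uri": uri,
            },
            "sourceType": "S3",  # value as given in the format on this page
        },
    }


# Placeholder account ID and object URI for illustration only.
content = s3_custom_document_content(
    "MyDocument", "111122223333", "s3://amzn-s3-demo-bucket/my-document.txt"
)
```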
The metadata for a document can be defined in the following ways:
If you define the metadata inline, the DocumentMetadata object in the metadata field should be in the following format:
{ "inlineAttributes": [ { "key": "string", "value": { "stringValue": "string", "booleanValue": boolean, "numberValue": number, "stringListValue": [ "string" ], "type": "STRING" | "BOOLEAN" | "NUMBER" | "STRING_LIST" } } ], "type": "IN_LINE_ATTRIBUTE" }
For each attribute that you add, define the key in the key field. Specify the data type of the value in the type field and include the field that corresponds to the data type. For example, if you include a string, the attribute would be in the following format:
{ "key": "string", "value": { "stringValue": "string", "type": "STRING" } }
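The pairing of each type with its value field can be sketched as a mapping from Python values to attribute objects. The helpers below are hypothetical names for illustration and only build the JSON structure described above.

```python
def metadata_attribute(key: str, value) -> dict:
    """Build one inline metadata attribute, choosing the value field from the Python type."""
    # Check bool before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        typed = {"booleanValue": value, "type": "BOOLEAN"}
    elif isinstance(value, (int, float)):
        typed = {"numberValue": value, "type": "NUMBER"}
    elif isinstance(value, str):
        typed = {"stringValue": value, "type": "STRING"}
    elif isinstance(value, list) and all(isinstance(v, str) for v in value):
        typed = {"stringListValue": value, "type": "STRING_LIST"}
    else:
        raise TypeError(f"Unsupported metadata value for key {key!r}: {value!r}")
    return {"key": key, "value": typed}


def inline_metadata(attributes: dict) -> dict:
    """Build a DocumentMetadata object from a dict of attribute names to values."""
    return {
        "inlineAttributes": [metadata_attribute(k, v) for k, v in attributes.items()],
        "type": "IN_LINE_ATTRIBUTE",
    }


metadata = inline_metadata({"genre": "pop", "year": 1988})
```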
You can also ingest metadata from a file with the extension .metadata.json in an S3 location. For more information about the format of a metadata file, see the Document metadata fields section in Connect to Amazon S3 for your Amazon Bedrock knowledge base.
If the metadata is from an S3 file, the DocumentMetadata object in the metadata field should be in the following format:
{ "s3Location": { "bucketOwnerAccountId": "string", "uri": "string" }, "type": "S3_LOCATION" }
Include the account ID of the owner of the S3 bucket that contains the metadata file in the bucketOwnerAccountId field, and the S3 URI of the metadata file in the uri field.
Warning
If you defined the content inline, you must define the metadata inline.
Ingest a document into a knowledge base connected to an Amazon S3 data source
If the dataSourceId you specify belongs to an S3 data source, you can add content and metadata for each KnowledgeBaseDocument object in the documents array.
Note
For S3 data sources, you can only add content and metadata from an S3 location.
Define the content of a document to ingest from an S3 location in a DocumentContent object in the following format:
{ "dataSourceType": "string", "s3": { "s3Location": { "uri": "string" } } }
Set dataSourceType to S3 and include the S3 URI of the document in the uri field.
The metadata for a document added to an S3 data source can be defined in the following format:
{ "s3Location": { "bucketOwnerAccountId": "string", "uri": "string" }, "type": "S3_LOCATION" }
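Putting the content and metadata formats for an S3 data source together, a complete KnowledgeBaseDocument object can be sketched as follows. The function name and the URIs in the usage line are hypothetical placeholders.

```python
from typing import Optional


def s3_data_source_document(
    document_uri: str,
    metadata_uri: Optional[str] = None,
    bucket_owner_account_id: Optional[str] = None,
) -> dict:
    """Build a KnowledgeBaseDocument for an S3 data source, with optional S3 metadata."""
    document = {
        "content": {
            "dataSourceType": "S3",
            "s3": {"s3Location": {"uri": document_uri}},
        }
    }
    if metadata_uri is not None:
        location = {"uri": metadata_uri}
        if bucket_owner_account_id is not None:
            location["bucketOwnerAccountId"] = bucket_owner_account_id
        # Metadata for an S3 data source can only come from an S3 location.
        document["metadata"] = {"s3Location": location, "type": "S3_LOCATION"}
    return document


# Placeholder URIs and account ID for illustration only.
doc = s3_data_source_document(
    "s3://amzn-s3-demo-bucket/report.pdf",
    metadata_uri="s3://amzn-s3-demo-bucket/report.pdf.metadata.json",
    bucket_owner_account_id="111122223333",
)
```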
Warning
Documents that you ingest directly into a knowledge base connected to an S3 data source aren't added to the S3 bucket itself. We recommend that you add these documents to the S3 data source as well so that they aren't removed or overwritten if you sync your data source.
Example request bodies
The following examples show request bodies for different use cases with IngestKnowledgeBaseDocuments:
The following example shows the addition of one text document to a custom data source:
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{ "documents": [ { "content": { "dataSourceType": "CUSTOM", "custom": { "customDocumentIdentifier": { "id": "MyDocument" }, "inlineContent": { "textContent": { "data": "Hello world!" }, "type": "TEXT" }, "sourceType": "IN_LINE" } } } ] }
The following example shows the addition of a PDF document to a custom data source:
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{ "documents": [ { "content": { "dataSourceType": "CUSTOM", "custom": { "customDocumentIdentifier": { "id": "MyDocument" }, "inlineContent": { "byteContent": { "data": "<Base64-encoded string>", "mimeType": "application/pdf" }, "type": "BYTE" }, "sourceType": "IN_LINE" } } } ] }
The following example shows the addition of one text document to a custom data source from an S3 location:
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{ "documents": [ { "content": { "dataSourceType": "CUSTOM", "custom": { "customDocumentIdentifier": { "id": "MyDocument" }, "s3Location": { "uri": "amzn-s3-demo-bucket" }, "sourceType": "S3" } } } ] }
The following example shows the inline addition of a document, alongside metadata containing two attributes, to a custom data source:
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{ "documents": [ { "content": { "dataSourceType": "CUSTOM", "custom": { "customDocumentIdentifier": { "id": "MyDocument" }, "inlineContent": { "textContent": { "data": "Hello world!" }, "type": "TEXT" }, "sourceType": "IN_LINE" } }, "metadata": { "inlineAttributes": [ { "key": "genre", "value": { "stringValue": "pop", "type": "STRING" } }, { "key": "year", "value": { "numberValue": 1988, "type": "NUMBER" } } ], "type": "IN_LINE_ATTRIBUTE" } } ] }
The following example shows the addition of a document alongside metadata to an S3 data source. You can only include the metadata through S3:
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{ "documents": [ { "content": { "dataSourceType": "S3", "s3": { "s3Location": { "uri": "amzn-s3-demo-bucket" } } }, "metadata": { "s3Location": { "bucketOwnerAccountId": "111122223333", "uri": "amzn-s3-demo-bucket" }, "type": "S3_LOCATION" } } ] }
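The request line and body in the examples above can also be assembled programmatically. The following Python sketch only builds the method, path, and JSON body; build_ingest_request is a hypothetical helper. Sending the request also requires a signed call to the build-time endpoint (or an SDK client), which is outside the scope of this sketch.

```python
import json


def build_ingest_request(knowledge_base_id: str, data_source_id: str, documents: list) -> tuple:
    """Return (method, path, headers, body) for an IngestKnowledgeBaseDocuments request."""
    path = f"/knowledgebases/{knowledge_base_id}/datasources/{data_source_id}/documents"
    headers = {"Content-type": "application/json"}
    body = json.dumps({"documents": documents})
    return "PUT", path, headers, body


# One in-line text document for a custom data source, as in the first example.
document = {
    "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": "MyDocument"},
            "inlineContent": {"textContent": {"data": "Hello world!"}, "type": "TEXT"},
            "sourceType": "IN_LINE",
        },
    }
}

method, path, headers, body = build_ingest_request("KB12345678", "DS12345678", [document])
print(method, path)
print(body)
```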