Ingest changes directly into a knowledge base
Amazon Bedrock Knowledge Bases offers the ability to add documents to or delete documents from your knowledge base in a single action, rather than having to modify your data source and sync the changes as separate steps. You can take advantage of this feature if your knowledge base is connected to one of the following types of data sources:
- Amazon S3
- Custom
With direct ingestion, you can add, update, or delete files in a knowledge base in a single action, and your knowledge base can access the documents without a sync. With the KnowledgeBaseDocuments API operations, you index the documents that you submit directly into the vector store set up for the knowledge base. You can also view the documents in your knowledge base with these operations, rather than navigating to the connected data source to view them.
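As an illustration of direct ingestion, the following minimal sketch builds the documents array for an S3 data source and submits it in one request. It assumes the boto3 bedrock-agent client and its ingest_knowledge_base_documents method; the request shape shown is an assumption to verify against the current API reference, and the helper names, IDs, and bucket URI are hypothetical placeholders.

```python
def build_s3_documents(uris):
    """Build the `documents` array for an S3-backed data source.

    The nested shape (content -> s3 -> s3Location -> uri) is an assumption
    based on the IngestKnowledgeBaseDocuments request syntax.
    """
    return [
        {"content": {"dataSourceType": "S3", "s3": {"s3Location": {"uri": uri}}}}
        for uri in uris
    ]


def ingest_s3_documents(client, knowledge_base_id, data_source_id, uris):
    """Index the given S3 objects into the knowledge base in a single request."""
    return client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documents=build_s3_documents(uris),
    )


# Hypothetical usage (requires AWS credentials, so it is left commented out):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   ingest_s3_documents(client, "KB_ID", "DS_ID",
#                       ["s3://amzn-s3-demo-bucket/report.pdf"])
```

Passing the client in as an argument keeps the payload builder usable, and testable, without AWS access.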
Differences from syncing a data source
Amazon Bedrock Knowledge Bases also offers a set of IngestionJob API operations for syncing your data source. When you submit a StartIngestionJob request, Amazon Bedrock Knowledge Bases scans each document in the connected data source and checks whether it has already been indexed into the vector store set up for the knowledge base. Any document that hasn't been indexed is then indexed into the vector store.
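For comparison with direct ingestion, a sync can be sketched as starting an ingestion job and polling it until it finishes. This sketch assumes the boto3 bedrock-agent methods start_ingestion_job and get_ingestion_job and the status values listed in the code; the function name, polling interval, and IDs are hypothetical.

```python
import time

# Statuses after which the job will not change again (assumed from the
# IngestionJob status values: STARTING, IN_PROGRESS, COMPLETE, FAILED,
# STOPPING, STOPPED).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(client, knowledge_base_id, data_source_id,
                     poll_seconds=10, sleep=time.sleep):
    """Start a sync and block until the ingestion job reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]

    return job["status"]
```

The sleep parameter is injectable only so that the loop can be exercised without real waiting; in normal use the default time.sleep applies.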
With IngestKnowledgeBaseDocuments, you submit an array of documents to be directly indexed into the vector store. Therefore, you skip the step of adding documents into the data source. See the following paragraphs to understand the use case for these two sets of API operations:
If you use a custom data source
You don't need to sync or use the IngestionJob operations. Documents that you add, modify, or delete with the KnowledgeBaseDocuments operations or in the AWS Management Console become part of both the custom data source and your knowledge base.
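For a custom data source, the document content can travel inside the request itself. The following sketch builds an inline text document and the matching identifier used to delete it later. The field names follow the IngestKnowledgeBaseDocuments and DeleteKnowledgeBaseDocuments request syntax as understood here and should be treated as assumptions; the helper names and document ID are hypothetical.

```python
def build_inline_text_document(document_id, text):
    """Build one entry of the `documents` array for a custom data source."""
    return {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                # Identifier you choose; used later to fetch or delete the document.
                "customDocumentIdentifier": {"id": document_id},
                "sourceType": "IN_LINE",
                "inlineContent": {
                    "type": "TEXT",
                    "textContent": {"data": text},
                },
            },
        }
    }


def build_custom_document_identifier(document_id):
    """Build one entry of `documentIdentifiers` for a delete request."""
    return {"dataSourceType": "CUSTOM", "custom": {"id": document_id}}


# Hypothetical usage with a boto3 "bedrock-agent" client (commented out
# because it needs AWS credentials):
#   client.ingest_knowledge_base_documents(
#       knowledgeBaseId="KB_ID", dataSourceId="DS_ID",
#       documents=[build_inline_text_document("faq-001", "Returns are free.")])
#   client.delete_knowledge_base_documents(
#       knowledgeBaseId="KB_ID", dataSourceId="DS_ID",
#       documentIdentifiers=[build_custom_document_identifier("faq-001")])
```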
If you use an Amazon S3 data source
You use the two sets of operations in different use cases:
- After connecting the knowledge base to the S3 data source for the first time, you must sync your data source in the AWS Management Console or by submitting a StartIngestionJob request.
- Index documents into the vector store set up for your knowledge base or remove the indexed documents in the following ways:
  - Add documents into your S3 location or delete documents from it. Then sync your data source in the AWS Management Console or submit a StartIngestionJob request in the API. For details about syncing and the StartIngestionJob operation, see Sync your data with your Amazon Bedrock knowledge base.
  - Ingest S3 documents into the knowledge base directly with an IngestKnowledgeBaseDocuments request. For details about directly ingesting documents, see Ingest documents directly into a knowledge base.

  Warning
  For S3 data sources, any changes that you index into the knowledge base directly in the AWS Management Console or with the KnowledgeBaseDocuments API operations aren't reflected in the S3 location. You can use these API operations to make changes to your knowledge base immediately available in a single step. However, you should follow up by making the same changes in your S3 location so that they aren't overwritten the next time you sync your data source in the AWS Management Console or with StartIngestionJob.

  Don't submit an IngestKnowledgeBaseDocuments and StartIngestionJob request at the same time.
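One way to respect the warning about not running a direct ingestion and a sync at the same time is to check for active ingestion jobs before submitting documents. The sketch below assumes the boto3 bedrock-agent methods list_ingestion_jobs (returning ingestionJobSummaries and an optional nextToken) and ingest_knowledge_base_documents; the function names and the set of statuses treated as active are assumptions.

```python
# Job statuses treated as "a sync is still running" (an assumption).
ACTIVE_STATUSES = {"STARTING", "IN_PROGRESS", "STOPPING"}


def has_active_sync(client, knowledge_base_id, data_source_id):
    """Return True if any ingestion job for the data source is still running."""
    kwargs = {"knowledgeBaseId": knowledge_base_id, "dataSourceId": data_source_id}
    while True:
        page = client.list_ingestion_jobs(**kwargs)
        summaries = page.get("ingestionJobSummaries", [])
        if any(job["status"] in ACTIVE_STATUSES for job in summaries):
            return True
        token = page.get("nextToken")
        if not token:
            return False
        kwargs["nextToken"] = token  # Continue with the next page of results.


def ingest_if_idle(client, knowledge_base_id, data_source_id, documents):
    """Submit documents directly only when no sync is in progress."""
    if has_active_sync(client, knowledge_base_id, data_source_id):
        raise RuntimeError("A sync is in progress; retry after it finishes.")
    return client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documents=documents,
    )
```

This check is not atomic: a sync could still start between the check and the ingestion request, so it reduces the chance of overlap rather than eliminating it. It also does not mirror the change into the S3 location, which remains a separate follow-up step.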
Select a topic to learn how to perform direct ingestion of the documents in your data sources: