Ingest changes directly into a knowledge base - Amazon Bedrock

Amazon Bedrock Knowledge Bases lets you add documents to, or delete documents from, your knowledge base in a single action, instead of modifying your data source and then syncing the changes as a separate step. You can use this feature if your knowledge base is connected to one of the following types of data sources:

  • Amazon S3

  • Custom

With direct ingestion, you can add, update, or delete files in a knowledge base in a single action, and the knowledge base can access the documents without a sync. The KnowledgeBaseDocuments API operations index the documents that you submit directly into the vector store set up for the knowledge base. You can also use these operations to view the documents in your knowledge base directly, rather than navigating to the connected data source to view them.
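As a rough illustration of viewing documents through the API, the following sketch pages through the documents of one data source. It assumes the AWS SDK for Python `bedrock-agent` client exposes `list_knowledge_base_documents` with `knowledgeBaseId`, `dataSourceId`, `maxResults`, and `nextToken` parameters and returns a `documentDetails` list; check these names against the current API reference. The helper names are hypothetical, and the client is passed in so the sketch can be exercised without AWS credentials.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_knowledge_base_documents(
    client: Any, knowledge_base_id: str, data_source_id: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield document detail records for one data source, following nextToken."""
    kwargs: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
        "maxResults": page_size,
    }
    while True:
        page = client.list_knowledge_base_documents(**kwargs)
        # Assumed response shape: {"documentDetails": [...], "nextToken": "..."}
        yield from page.get("documentDetails", [])
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def count_by_status(details: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Tally documents by their indexing status, as a quick health check."""
    counts: Dict[str, int] = {}
    for detail in details:
        status = detail.get("status", "UNKNOWN")
        counts[status] = counts.get(status, 0) + 1
    return counts
```

In practice you would pass `boto3.client("bedrock-agent")` as `client` and your own knowledge base and data source IDs.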

Differences from syncing a data source

Amazon Bedrock Knowledge Bases also offers a set of IngestionJob API operations for syncing your data source. When you submit a StartIngestionJob request, Amazon Bedrock Knowledge Bases scans each document in the connected data source and checks whether it has already been indexed into the vector store set up for the knowledge base. Any document that hasn't been indexed is then indexed into the vector store.
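A sync is asynchronous, so a caller typically starts the job and then polls its status. The sketch below assumes the `bedrock-agent` client methods `start_ingestion_job` and `get_ingestion_job`, a response of the form `{"ingestionJob": {"ingestionJobId": ..., "status": ...}}`, and that `COMPLETE`, `FAILED`, and `STOPPED` are the terminal statuses. The function name, polling interval, and timeout are hypothetical choices, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable

# Assumed terminal job statuses; other statuses mean the job is still running.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an ingestion job (a sync) and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )["ingestionJob"]
    job_id = job["ingestionJobId"]
    status = job["status"]
    waited = 0.0
    while status not in TERMINAL_STATUSES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"ingestion job {job_id} is still {status}")
        sleep(poll_seconds)
        waited += poll_seconds
        status = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]["status"]
    return status
```

Returning the final status, rather than raising on `FAILED`, leaves it to the caller to decide how to handle a failed sync.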

With IngestKnowledgeBaseDocuments, you submit an array of documents to be indexed directly into the vector store, which skips the step of adding documents to the data source. The following sections describe when to use each set of API operations, depending on your data source type:
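To make the "array of documents" concrete, the following sketch builds document entries and submits them in one IngestKnowledgeBaseDocuments call. The entry shapes (`dataSourceType`, `s3Location.uri` for S3 documents, and `customDocumentIdentifier` with `IN_LINE` text content for custom documents) and the `ingest_knowledge_base_documents` method name reflect our reading of the SDK and should be verified against the API reference. Helper names, IDs, and URIs are hypothetical. The document type in a request should match the type of the data source it targets.

```python
from typing import Any, Dict, List


def s3_document(uri: str) -> Dict[str, Any]:
    """Entry pointing at an object for an S3 data source (assumed shape)."""
    return {"content": {"dataSourceType": "S3", "s3": {"s3Location": {"uri": uri}}}}


def inline_text_document(doc_id: str, text: str) -> Dict[str, Any]:
    """Entry for a custom data source whose text is sent inline (assumed shape)."""
    return {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": doc_id},
                "sourceType": "IN_LINE",
                "inlineContent": {"type": "TEXT", "textContent": {"data": text}},
            },
        }
    }


def ingest_documents(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    documents: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Index all documents in one request, bypassing the data source."""
    return client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documents=documents,
    )
```

For example, `ingest_documents(client, kb_id, custom_ds_id, [inline_text_document("faq-1", "Returns are accepted within 30 days.")])` would index one inline document into a custom data source.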

If you use a custom data source

You don't need to sync or use the IngestionJob operations. Documents that you add, modify, or delete with the KnowledgeBaseDocuments operations or in the AWS Management Console become part of both the custom data source and your knowledge base.
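For a custom data source, removing a document is a single DeleteKnowledgeBaseDocuments call with no sync afterward. This sketch assumes the `bedrock-agent` method `delete_knowledge_base_documents` and a `documentIdentifiers` list of the form `{"dataSourceType": "CUSTOM", "custom": {"id": ...}}`; confirm both in the API reference. The helper names are hypothetical. Modifying a document would, under the same assumptions, mean ingesting a new version under the same custom document ID.

```python
from typing import Any, Dict, Iterable, List


def custom_document_ids(doc_ids: Iterable[str]) -> List[Dict[str, Any]]:
    """Build identifiers for documents in a custom data source (assumed shape)."""
    return [{"dataSourceType": "CUSTOM", "custom": {"id": doc_id}} for doc_id in doc_ids]


def delete_custom_documents(
    client: Any, knowledge_base_id: str, data_source_id: str, doc_ids: Iterable[str]
) -> Dict[str, Any]:
    """Remove documents from the custom data source and knowledge base in one call."""
    return client.delete_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documentIdentifiers=custom_document_ids(doc_ids),
    )
```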

If you use an Amazon S3 data source

The two sets of operations serve different purposes:

  • After connecting the knowledge base to the S3 data source for the first time, you must sync your data source in the AWS Management Console or by submitting a StartIngestionJob request.

  • To index documents into the vector store set up for your knowledge base, or to remove indexed documents, use either of the following methods:

    1. Add documents to your S3 location or delete documents from it. Then sync your data source in the AWS Management Console or submit a StartIngestionJob request with the API. For details about syncing and the StartIngestionJob operation, see Sync your data with your Amazon Bedrock knowledge base.

    2. Ingest S3 documents into the knowledge base directly with an IngestKnowledgeBaseDocuments request, or remove them directly with a DeleteKnowledgeBaseDocuments request. For details about directly ingesting documents, see Ingest documents directly into a knowledge base.

      Warning

      For S3 data sources, any changes that you index into the knowledge base directly in the AWS Management Console or with the KnowledgeBaseDocuments API operations aren't reflected in the S3 location. You can use these API operations to make changes to your knowledge base immediately available in a single step. However, you should follow up by making the same changes in your S3 location so that they aren't overwritten the next time you sync your data source in the AWS Management Console or with StartIngestionJob.

      Don't submit IngestKnowledgeBaseDocuments and StartIngestionJob requests at the same time.
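The warning implies two habits for S3 data sources: don't make direct changes while a sync is running, and mirror each direct change in the S3 location. The sketch below applies both to removing one document. It assumes the `bedrock-agent` method `list_ingestion_jobs` returns `ingestionJobSummaries` with a `status` field (paginated by `nextToken`), that S3 documents are identified as `{"dataSourceType": "S3", "s3": {"uri": ...}}`, and it uses the S3 client's `delete_object`. The function names and the set of "active" statuses are hypothetical choices.

```python
from typing import Any, Dict

# Statuses treated as "a sync is still running" (assumption).
ACTIVE_JOB_STATUSES = {"STARTING", "IN_PROGRESS", "STOPPING"}


def has_active_ingestion_job(agent_client: Any, kb_id: str, ds_id: str) -> bool:
    """Return True if any ingestion job for the data source is still running."""
    kwargs: Dict[str, Any] = {"knowledgeBaseId": kb_id, "dataSourceId": ds_id}
    while True:
        page = agent_client.list_ingestion_jobs(**kwargs)
        for job in page.get("ingestionJobSummaries", []):
            if job.get("status") in ACTIVE_JOB_STATUSES:
                return True
        token = page.get("nextToken")
        if not token:
            return False
        kwargs["nextToken"] = token


def remove_s3_document(
    agent_client: Any, s3_client: Any, kb_id: str, ds_id: str, bucket: str, key: str
) -> None:
    """Remove a document from the knowledge base now, then from the S3 location."""
    if has_active_ingestion_job(agent_client, kb_id, ds_id):
        # Avoid direct changes while a sync is in progress.
        raise RuntimeError("an ingestion job is running; retry later")
    agent_client.delete_knowledge_base_documents(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        documentIdentifiers=[{"dataSourceType": "S3", "s3": {"uri": f"s3://{bucket}/{key}"}}],
    )
    # Mirror the change so the next sync doesn't re-index the object.
    s3_client.delete_object(Bucket=bucket, Key=key)
```

The status check is check-then-act: a sync started by someone else right after the check can still overlap, so this reduces the risk rather than eliminating it.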

Select a topic to learn how to perform direct ingestion of the documents in your data sources: