After you create your knowledge base, you ingest or sync your data so that the data can be queried. Ingestion converts the raw data in your data source into vector embeddings, based on the vector embeddings model and configurations you specified.
Before you begin ingestion, check that your data source fulfills the following conditions:
-
You have configured the connection information for your data source. To configure a data source connector to crawl your data from your data source repository, see Supported data source connectors. You configure your data source as part of creating your knowledge base.
-
You have configured your chosen vector embeddings model and vector store. See supported vector embeddings models and vector stores for knowledge bases. You configure your vector embeddings as part of creating your knowledge base.
-
The files are in supported formats. For more information, see Support document formats.
-
The files don't exceed the Ingestion job file size specified in Amazon Bedrock endpoints and quotas in the AWS General Reference.
-
If your data source contains metadata files, check the following conditions to ensure that the metadata files aren't ignored:
-
Each .metadata.json file shares the same file name and extension as the source file that it's associated with (for example, report.pdf and report.pdf.metadata.json).
-
If the vector index for your knowledge base is in an Amazon OpenSearch Serverless vector store, check that the vector index is configured with the faiss engine. If the vector index is configured with the nmslib engine, you'll have to do one of the following:
-
Create a new knowledge base in the console and let Amazon Bedrock automatically create a vector index in Amazon OpenSearch Serverless for you.
-
Create another vector index in the vector store and select faiss as the Engine. Then create a new knowledge base and specify the new vector index.
-
If the vector index for your knowledge base is in an Amazon Aurora database cluster, we recommend that you use the custom metadata field to store all your metadata in a single column and create an index on this column. If you do not provide the custom metadata field, you must check that the table for your index contains a column for each metadata property in your metadata files before starting ingestion. For more information, see Prerequisites for using a vector store you created for a knowledge base.
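Some of the conditions above can be checked locally before you upload or sync. The following is a minimal sketch, assuming your source files are staged in a local directory. The function name `preflight_check` and the `max_bytes` parameter are hypothetical. Take the actual size limit from the Ingestion job file size quota rather than from this example.

```python
from pathlib import Path

METADATA_SUFFIX = ".metadata.json"


def preflight_check(root: Path, max_bytes: int) -> list[str]:
    """Return human-readable problems found under `root` (empty list if none).

    `max_bytes` is an assumption supplied by the caller; look up the real
    ingestion job file size quota for your account and Region.
    """
    problems = []
    files = [p for p in root.rglob("*") if p.is_file()]
    names = {p.relative_to(root).as_posix() for p in files}

    for path in sorted(files):
        rel = path.relative_to(root).as_posix()
        if rel.endswith(METADATA_SUFFIX):
            # A metadata file must sit next to a source file with the same
            # name and extension: report.pdf <-> report.pdf.metadata.json.
            source = rel[: -len(METADATA_SUFFIX)]
            if source not in names:
                problems.append(f"{rel}: no matching source file {source!r}")
        elif path.stat().st_size > max_bytes:
            problems.append(f"{rel}: exceeds size limit of {max_bytes} bytes")
    return problems
```

An empty list means that none of the checked conditions failed. The sketch does not verify file formats or vector store configuration.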
Each time you add, modify, or remove files from your data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock processes only the documents that were added, modified, or deleted since the last sync.
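To illustrate what incremental means here, the following sketch classifies documents as added, modified, or deleted by comparing content fingerprints from two points in time. It is a conceptual illustration only and is not a description of how Amazon Bedrock implements syncing internally. The names `fingerprint` and `diff_snapshots` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Hash document content so that changes can be detected cheaply."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare {document_name: fingerprint} maps taken at two syncs."""
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


last_sync = {"a.txt": fingerprint(b"v1"), "b.txt": fingerprint(b"v1")}
now = {"a.txt": fingerprint(b"v2"), "c.txt": fingerprint(b"v1")}
print(diff_snapshots(last_sync, now))
# {'added': ['c.txt'], 'modified': ['a.txt'], 'deleted': ['b.txt']}
```

Only the three listed documents would need processing. Unchanged documents are skipped.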
To learn how to ingest your data into your knowledge base and sync with your latest data, choose the tab for your preferred method, and then follow the steps:
To ingest your data into your knowledge base and sync with your latest data
-
Open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.
-
From the left navigation pane, select Knowledge base and choose your knowledge base.
-
In the Data source section, select Sync to begin data ingestion or to sync your latest data. To stop a sync that is in progress, select Stop. This option applies only while the data source is actively syncing. After stopping, you can select Sync again to ingest the rest of your data.
-
If data ingestion completes successfully, a green success banner appears.
Note
If you use a vector store other than Amazon Aurora (RDS), it can take a few minutes after syncing completes for the vector embeddings of the newly synced data to be reflected in your knowledge base and become available for querying.
-
You can choose a data source to view its Sync history. Select View warnings to see why a data ingestion job failed.
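The console steps above can also be approximated programmatically. The following is a minimal sketch, assuming a client that behaves like the boto3 `bedrock-agent` client and its `start_ingestion_job` and `get_ingestion_job` operations. The function name `sync_data_source`, the polling interval, and the set of terminal statuses are assumptions for illustration, so verify them against the current API reference.

```python
import time

# Statuses assumed to mean the job will not progress further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def sync_data_source(client, knowledge_base_id, data_source_id, poll_seconds=10.0):
    """Start an ingestion job and poll until it reaches a terminal status.

    `client` is expected to behave like boto3.client("bedrock-agent").
    Returns the final ingestion job description as a dict.
    """
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]

    while job["status"] not in TERMINAL_STATUSES:
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job


# Example usage (requires boto3 and AWS credentials; the IDs are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   job = sync_data_source(client, "KB_ID", "DS_ID")
#   print(job["status"], job.get("failureReasons"))
```

Passing the client in as a parameter keeps the function easy to exercise with a stub. As with the console, newly synced data may take a few minutes to become queryable after the job completes.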