Sync your data with your Amazon Bedrock knowledge base - Amazon Bedrock

After you create your knowledge base, you ingest or sync your data so that the data can be queried. Ingestion converts the raw data in your data source into vector embeddings. Before you begin ingestion, check that your data source fulfills the following conditions:

  • You have configured the connection information for your data source. To configure a data source connector to crawl your data from your data source repository, see Supported data source connectors.

  • The files are in supported formats. For more information, see Supported document formats.

  • The files don't exceed the maximum file size specified in Knowledge base quotas.

  • If your data source contains metadata files, check the following conditions to ensure that the metadata files aren't ignored:

    • Each .metadata.json file has the same file name, including the extension, as the source file that it's associated with. For example, a source file named report.pdf pairs with report.pdf.metadata.json.

    • If the vector index for your knowledge base is in an Amazon OpenSearch Serverless vector store, check that the vector index is configured with the faiss engine. If the vector index is configured with the nmslib engine, you'll have to do one of the following:

    • If the vector index for your knowledge base is in an Amazon Aurora database cluster, check that the table for your index contains a column for each metadata property in your metadata files before starting ingestion.
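The naming condition for metadata files can be checked before you start a sync. The following is a minimal sketch, not an official tool. It assumes the naming rule means that the metadata file name is the full source file name (with extension) followed by .metadata.json, and that you already have a list of object keys or file paths from your data source. The function name find_unmatched_metadata is hypothetical.

```python
from typing import Iterable, List

# Assumed naming convention: "<source file name with extension>.metadata.json"
METADATA_SUFFIX = ".metadata.json"


def find_unmatched_metadata(keys: Iterable[str]) -> List[str]:
    """Return metadata file keys that have no matching source file.

    A metadata file is considered matched when stripping the
    ".metadata.json" suffix yields a key that is also present.
    """
    all_keys = set(keys)
    unmatched = []
    for key in sorted(all_keys):
        if key.endswith(METADATA_SUFFIX):
            source_key = key[: -len(METADATA_SUFFIX)]
            if source_key not in all_keys:
                unmatched.append(key)
    return unmatched
```

Any key that this function returns is a metadata file that would likely be ignored during ingestion under the assumed convention, so it is worth renaming before you sync.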

Note

Each time you add, modify, or remove files from your data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes added, modified, or deleted documents since the last sync.

To learn how to sync your data source and ingest your data into your knowledge base, select the tab corresponding to your method of choice and follow the steps.

Console
To sync your data source and ingest your data
  1. Open the Amazon Bedrock console at https://console.aws.amazon.com/bedrock/.

  2. From the left navigation pane, select Knowledge base and choose your knowledge base.

  3. In the Data source section, select Sync to begin data ingestion.

  4. When data ingestion completes successfully, a green success banner appears.

    Note

    If you use a vector store other than Amazon Aurora (RDS), it can take a few minutes after data syncing completes for the vector embeddings of the newly synced data to be reflected in your knowledge base and become available for querying.

  5. You can choose a data source to view its Sync history. Select View warnings to see why a data ingestion job failed.

API

To sync your data source and ingest your data into your knowledge base, send a StartIngestionJob request with an Agents for Amazon Bedrock build-time endpoint. Specify the knowledgeBaseId and dataSourceId.
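As an illustration, the request might look like the following with the AWS SDK for Python. This is a sketch under assumptions: client is expected to be a boto3 client created with boto3.client("bedrock-agent"), the response is assumed to carry the job under the ingestionJob key, and the helper name start_sync is hypothetical.

```python
from typing import Any


def start_sync(client: Any, knowledge_base_id: str, data_source_id: str) -> str:
    """Start an ingestion job and return its ingestionJobId.

    `client` is assumed to be boto3.client("bedrock-agent").
    """
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    # Assumed response shape: {"ingestionJob": {"ingestionJobId": ..., ...}}
    return response["ingestionJob"]["ingestionJobId"]
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub, without AWS credentials.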

Use the ingestionJobId returned in the response in a GetIngestionJob request with an Agents for Amazon Bedrock build-time endpoint to track the status of the ingestion job. In addition, specify the knowledgeBaseId and dataSourceId.

  • When the ingestion job finishes, the status in the response is COMPLETE.

    Note

    If you use a vector store other than Amazon Aurora (RDS), it can take a few minutes after data ingestion completes for the vector embeddings of the newly ingested data to be available in the vector store for querying.

  • The statistics object in the response returns information about whether ingestion was successful or not for documents in the data source.
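Tracking the job status usually means polling GetIngestionJob until the job reaches a final state. The following is a minimal sketch, assuming client is boto3.client("bedrock-agent"). The text above names only COMPLETE; treating FAILED and STOPPED as additional final states is an assumption, as are the helper name wait_for_ingestion and the polling defaults.

```python
import time
from typing import Any, Dict

# COMPLETE comes from the text above; FAILED and STOPPED are assumed
# additional final states.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    ingestion_job_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll GetIngestionJob until the job reaches a final status.

    Returns the ingestionJob dict from the last response, which is
    assumed to include the `status` field and the `statistics` object.
    Raises TimeoutError if no final status is seen within max_polls.
    """
    for _ in range(max_polls):
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=ingestion_job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Ingestion job {ingestion_job_id} did not finish in time")
```

After the function returns, job.get("statistics") holds the per-document results described above. Even with a COMPLETE status, embeddings may take a few more minutes to become queryable in some vector stores.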

You can also see information for all ingestion jobs for a data source by sending a ListIngestionJobs request with an Agents for Amazon Bedrock build-time endpoint. Specify the dataSourceId and the knowledgeBaseId of the knowledge base that the data is being ingested to.

  • Filter for results by specifying a status to search for in the filters object.

  • Sort by the time that the job was started or the status of a job by specifying the sortBy object. You can sort in ascending or descending order.

  • Set the maximum number of results to return in a response in the maxResults field. If there are more results than the number you set, the response returns a nextToken that you can send in another ListIngestionJobs request to see the next batch of jobs.
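The filtering, sorting, and pagination options above can be combined in one loop. The following is a sketch under assumptions: client is boto3.client("bedrock-agent"), the filter and sort values ("STATUS", "EQ", "STARTED_AT", "DESCENDING") and the ingestionJobSummaries response key reflect my understanding of the API shape and should be checked against the API reference, and the helper name iter_ingestion_jobs is hypothetical.

```python
from typing import Any, Dict, Iterator, Optional


def iter_ingestion_jobs(
    client: Any,
    knowledge_base_id: str,
    data_source_id: str,
    status: Optional[str] = None,
    page_size: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield ingestion job summaries, following nextToken across pages."""
    request: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
        "maxResults": page_size,
        # Assumed sort shape: newest jobs first.
        "sortBy": {"attribute": "STARTED_AT", "order": "DESCENDING"},
    }
    if status is not None:
        # Assumed filter shape: match jobs with exactly this status.
        request["filters"] = [
            {"attribute": "STATUS", "operator": "EQ", "values": [status]}
        ]
    while True:
        response = client.list_ingestion_jobs(**request)
        yield from response.get("ingestionJobSummaries", [])
        token = response.get("nextToken")
        if not token:
            break
        # Send the token back to fetch the next batch of jobs.
        request["nextToken"] = token
```

For example, iterating over iter_ingestion_jobs(client, kb_id, ds_id, status="FAILED") would walk through every failed job for the data source, one page at a time.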