You use a vector store to hold the vector embeddings generated from your documents. If you prefer that Amazon Bedrock automatically create a vector index in Amazon OpenSearch Serverless for you, skip this prerequisite and proceed to Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases.
If you want to store binary vector embeddings instead of the standard floating-point (float32) vector embeddings, then you must use a vector store that supports binary vectors. Amazon OpenSearch Serverless is currently the only vector store that supports storing binary vectors.
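As a rough illustration of the difference between the two formats, the following sketch compares the raw payload size of one embedding stored as float32 values with one stored as a binary vector. It assumes 4 bytes per dimension for float32 and 1 bit per dimension for binary vectors. The function name `embedding_bytes` is hypothetical, and the figures ignore any index overhead that the vector store adds.

```python
def embedding_bytes(dimensions: int, binary: bool = False) -> int:
    """Return the raw payload size in bytes of a single embedding.

    Assumption: float32 uses 4 bytes per dimension, and a binary vector
    uses 1 bit per dimension, rounded up to whole bytes. Index structures
    and metadata stored alongside the vector are not counted.
    """
    if binary:
        return (dimensions + 7) // 8
    return dimensions * 4


if __name__ == "__main__":
    for dims in (1024, 1536):
        print(dims, embedding_bytes(dims), embedding_bytes(dims, binary=True))
```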
You can set up your own supported vector store to index the vector embeddings of your data. Create fields for the following data:
- A field for the vectors generated from the text in your data source by the embeddings model that you choose.
- A field for the text chunks extracted from the files in your data source.
- Fields for source files metadata that Amazon Bedrock manages.
- (If you use an Amazon Aurora database and want to set up filtering on metadata) Fields for metadata that you associate with your source files. If you plan to set up filtering in other vector stores, you don't have to set up these fields for filtering.
You can encrypt third-party vector stores with a KMS key. For more information, see Encryption of knowledge base resources.
The following steps create a vector index in Amazon OpenSearch Serverless.
- To configure permissions and create a vector search collection in Amazon OpenSearch Serverless in the AWS Management Console, follow steps 1 and 2 at Working with vector search collections in the Amazon OpenSearch Service Developer Guide. Note the following considerations while setting up your collection:
  - Give the collection a name and description of your choice.
  - To make your collection private, select Standard create for the Security section. Then, in the Network access settings section, select VPC as the Access type and choose a VPC endpoint. For more information about setting up a VPC endpoint for an Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink) in the Amazon OpenSearch Service Developer Guide.
- Once the collection is created, take note of the Collection ARN for when you create the knowledge base.
- In the left navigation pane, select Collections under Serverless. Then select your vector search collection.
- Select the Indexes tab. Then choose Create vector index.
- In the Vector index details section, enter a name for your index in the Vector index name field.
- In the Vector fields section, choose Add vector field. Amazon Bedrock stores the vector embeddings for your data source in this field. Provide the following configurations:
  - Vector field name – Provide a name for the field (for example, embeddings).
  - Engine – The vector engine used for search. Select faiss.
  - Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain:

    Model | Dimensions |
    ---|---|
    Titan G1 Embeddings - Text | 1,536 |
    Titan V2 Embeddings - Text | 1,024 |
    Cohere Embed English | 1,024 |
    Cohere Embed Multilingual | 1,024 |

  - Distance metric – The metric used to measure the similarity between vectors. We recommend using Euclidean.
- Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each field:

  Field description | Mapping field | Data type | Filterable |
  ---|---|---|---|
  Amazon Bedrock chunks the raw text from your data and stores the chunks in this field. | Name of your choice (for example, text) | String | True |
  Amazon Bedrock stores metadata related to your knowledge base in this field. | Name of your choice (for example, bedrock-metadata) | String | False |

- Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.
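If you create the index programmatically instead of through the console, the choices in the preceding steps correspond roughly to an OpenSearch k-NN index definition. The following is a minimal sketch, not a definitive mapping. It assumes that the faiss engine is paired with the hnsw method, that Euclidean distance corresponds to the l2 space type, and that a filterable String field maps to an indexed text field while a non-filterable one sets index to false. The helper `build_index_body` and the `MODEL_DIMENSIONS` labels are hypothetical names used only for illustration.

```python
import json

# Dimensions per embeddings model, as listed in the steps above.
# The keys are informal labels, not API model identifiers.
MODEL_DIMENSIONS = {
    "Titan G1 Embeddings - Text": 1536,
    "Titan V2 Embeddings - Text": 1024,
    "Cohere Embed English": 1024,
    "Cohere Embed Multilingual": 1024,
}


def build_index_body(model, vector_field="embeddings",
                     text_field="text", metadata_field="bedrock-metadata"):
    """Build an index definition mirroring the console choices (a sketch)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Vector field: faiss engine, Euclidean (l2) distance.
                # The hnsw method name is an assumption.
                vector_field: {
                    "type": "knn_vector",
                    "dimension": MODEL_DIMENSIONS[model],
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                    },
                },
                # Text chunks: String, Filterable = True (assumed mapping).
                text_field: {"type": "text"},
                # Bedrock-managed metadata: String, Filterable = False.
                metadata_field: {"type": "text", "index": False},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body("Titan V2 Embeddings - Text"), indent=2))
```

The sketch only builds the request body. Sending it to the collection endpoint requires a signed request and the data access permissions described in the OpenSearch Serverless documentation.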
After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.
Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description |
---|---|---|---|
Collection ARN | Collection ARN | collectionArn | The Amazon Resource Name (ARN) of the vector search collection. |
Vector index name | Vector index name | vectorIndexName | The name of the vector index. |
Vector field name | Vector field | vectorField | The name of the field in which to store vector embeddings for your data sources. |
Metadata management (first mapping field) | Text field | textField | The name of the field in which to store the raw text from your data sources. |
Metadata management (second mapping field) | Bedrock-managed metadata field | metadataField | The name of the field in which to store metadata that Amazon Bedrock manages. |
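To show how the values in the table fit together on the API side, the following sketch assembles them into a storage configuration that could be passed when creating a knowledge base. The key names come from the API column of the table. The nesting under opensearchServerlessConfiguration and fieldMapping, the OPENSEARCH_SERVERLESS type value, and the exact key casing are assumptions to verify against the Amazon Bedrock API reference. The helper `build_storage_configuration` is hypothetical, and the ARN and names in the usage section are placeholders.

```python
def build_storage_configuration(collection_arn, vector_index_name,
                                vector_field, text_field, metadata_field):
    """Assemble the noted values into a storage configuration (a sketch)."""
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            # Collection ARN noted after creating the collection.
            "collectionArn": collection_arn,
            # Name entered in the Vector index name field.
            "vectorIndexName": vector_index_name,
            "fieldMapping": {
                # Field that stores the vector embeddings.
                "vectorField": vector_field,
                # First metadata management mapping field (raw text chunks).
                "textField": text_field,
                # Second metadata management mapping field (Bedrock-managed).
                "metadataField": metadata_field,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    config = build_storage_configuration(
        collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/example",
        vector_index_name="my-vector-index",
        vector_field="embeddings",
        text_field="text",
        metadata_field="bedrock-metadata",
    )
    print(config)
```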
For more detailed documentation on setting up a vector store in Amazon OpenSearch Serverless, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide.