Prerequisites for your own vector store for a knowledge base
To store the vector embeddings that your documents are converted to, you use a vector store. If you prefer for Amazon Bedrock to automatically create a vector index in Amazon OpenSearch Serverless for you, skip this prerequisite and proceed to Create a knowledge base in Amazon Bedrock Knowledge Bases.
If you want to store binary vector embeddings instead of the standard floating-point (float32) vector embeddings, then you must use a vector store that supports binary vectors. Amazon OpenSearch Serverless is currently the only vector store that supports storing binary vectors.
You can set up your own supported vector store to index the vector embeddings representation of your data. You create fields for the following data:
- A field for the vectors generated from the text in your data source by the embeddings model that you choose.
- A field for the text chunks extracted from the files in your data source.
- Fields for metadata about your source files that Amazon Bedrock manages.
- (If you use an Amazon Aurora database and want to set up filtering on metadata) Fields for the metadata that you associate with your source files. If you plan to set up filtering in other vector stores, you don't need to set up these fields for filtering.
You can encrypt third-party vector stores with a KMS key. For more information, see Encryption of knowledge base resources.
Select the tab corresponding to the vector store service that you will use to create your vector index.
- Amazon OpenSearch Serverless
To configure permissions and create a vector search collection in Amazon OpenSearch Serverless in the AWS Management Console, follow steps 1 and 2 at Working with vector search collections in the Amazon OpenSearch Service Developer Guide. Note the following considerations while setting up your collection:
- Give the collection a name and description of your choice.
- To make your collection private, select Standard create in the Security section. Then, in the Network access settings section, select VPC as the Access type and choose a VPC endpoint. For more information about setting up a VPC endpoint for an Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink) in the Amazon OpenSearch Service Developer Guide.
- Once the collection is created, take note of the Collection ARN for when you create the knowledge base.
- In the left navigation pane, select Collections under Serverless. Then select your vector search collection.
- Select the Indexes tab. Then choose Create vector index.
- In the Vector index details section, enter a name for your index in the Vector index name field.
- In the Vector fields section, choose Add vector field. Amazon Bedrock stores the vector embeddings for your data source in this field. Provide the following configurations:
  - Vector field name – Provide a name for the field (for example, `embeddings`).
  - Engine – The vector engine used for search. Select faiss.
  - Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain:

    | Model | Dimensions |
    |---|---|
    | Titan G1 Embeddings - Text | 1,536 |
    | Titan V2 Embeddings - Text | 1,024 |
    | Cohere Embed English | 1,024 |
    | Cohere Embed Multilingual | 1,024 |

  - Distance metric – The metric used to measure the similarity between vectors. We recommend using Euclidean.
Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each field:

| Field description | Mapping field | Data type | Filterable |
|---|---|---|---|
| Amazon Bedrock chunks the raw text from your data and stores the chunks in this field. | Name of your choice (for example, `text`) | String | True |
| Amazon Bedrock stores metadata related to your knowledge base in this field. | Name of your choice (for example, `bedrock-metadata`) | String | False |
Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.
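The index that the console steps above produce can also be sketched as a request body. The following Python snippet is a minimal sketch, assuming the example field names `embeddings`, `text`, and `bedrock-metadata` and a 1,024-dimension model; the `knn_vector` mapping with the faiss engine is standard OpenSearch k-NN syntax, but verify it against the OpenSearch Serverless documentation for your collection before use.

```python
import json

# Example field names -- substitute the names you chose in the console.
VECTOR_FIELD = "embeddings"
TEXT_FIELD = "text"
METADATA_FIELD = "bedrock-metadata"

# Dimension must match your embeddings model (see the table above),
# e.g. 1024 for Titan V2 Embeddings - Text.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            VECTOR_FIELD: {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",   # Engine: faiss, as described above
                    "space_type": "l2",  # Euclidean distance metric
                },
            },
            TEXT_FIELD: {"type": "text"},                      # filterable: True
            METADATA_FIELD: {"type": "text", "index": False},  # filterable: False
        }
    },
}

print(json.dumps(index_body, indent=2))

# To create the index programmatically, you would send this body to your
# collection endpoint with the opensearch-py client, for example:
#   client.indices.create(index="my-vector-index", body=index_body)
```

The client call is left as a comment because it requires your collection endpoint and SigV4 credentials for the `aoss` service.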
After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.
| Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description |
|---|---|---|---|
| Collection ARN | Collection ARN | collectionARN | The Amazon Resource Name (ARN) of the vector search collection. |
| Vector index name | Vector index name | vectorIndexName | The name of the vector index. |
| Vector field name | Vector field | vectorField | The name of the field in which to store vector embeddings for your data sources. |
| Metadata management (first mapping field) | Text field | textField | The name of the field in which to store the raw text from your data sources. |
| Metadata management (second mapping field) | Bedrock-managed metadata field | metadataField | The name of the field in which to store metadata that Amazon Bedrock manages. |

For more detailed documentation on setting up a vector store in Amazon OpenSearch Serverless, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide.
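If you create the knowledge base through the API rather than the console, the values you took note of map into the `storageConfiguration` parameter. The following sketch shows that mapping under the assumption of example field names; all ARNs are placeholders.

```python
# A sketch of how the noted values map into the CreateKnowledgeBase API's
# storageConfiguration for Amazon OpenSearch Serverless.
# All ARNs and names below are illustrative placeholders.
storage_configuration = {
    "type": "OPENSEARCH_SERVERLESS",
    "opensearchServerlessConfiguration": {
        "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/EXAMPLE",
        "vectorIndexName": "my-vector-index",
        "fieldMapping": {
            "vectorField": "embeddings",         # vector field name
            "textField": "text",                 # first mapping field
            "metadataField": "bedrock-metadata", # second mapping field
        },
    },
}

# You would pass this dict as the storageConfiguration parameter of the
# create_knowledge_base call on boto3's bedrock-agent client.
```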
- Amazon Aurora (RDS)
Create an Amazon Aurora database (DB) cluster, schema, and table by following the steps at Using Aurora PostgreSQL as a knowledge base. When you create the table, configure it with the following columns and data types. You can use column names of your choice instead of the ones listed in the following table, but take note of the names you choose so that you can provide them during knowledge base setup.
| Column name | Data type | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description |
|---|---|---|---|---|
| id | UUID primary key | Primary key | primaryKeyField | Contains unique identifiers for each record. |
| embedding | Vector | Vector field | vectorField | Contains the vector embeddings of the data sources. |
| chunks | Text | Text field | textField | Contains the chunks of raw text from your data sources. |
| metadata | JSON | Bedrock-managed metadata field | metadataField | Contains metadata required to carry out source attribution and to enable data ingestion and querying. |
- (Optional) If you added metadata to your files for filtering, you must also create a column for each metadata attribute in your files and specify its data type (text, number, or boolean). For example, if the attribute `genre` exists in your data source, you would add a column named `genre` and specify `text` as the data type. During data ingestion, these columns are populated with the corresponding attribute values.
Configure an AWS Secrets Manager secret for your Aurora DB cluster by following the steps at Password management with Amazon Aurora and AWS Secrets Manager.
Take note of the following information after you create your DB cluster and set up the secret.
| Field in knowledge base setup (Console) | Field in knowledge base setup (API) | Description |
|---|---|---|
| Amazon Aurora DB Cluster ARN | resourceArn | The ARN of your DB cluster. |
| Database name | databaseName | The name of your database. |
| Table name | tableName | The name of the table in your DB cluster. |
| Secret ARN | credentialsSecretArn | The ARN of the AWS Secrets Manager secret for your DB cluster. |
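For API-based setup, the noted values and the column names you chose map into the `storageConfiguration` parameter. The following is a minimal sketch; the ARNs, database name, and table name are placeholders, and the field mapping uses the example column names from the table above.

```python
# A sketch of the CreateKnowledgeBase storageConfiguration for Aurora.
# All ARNs and names below are illustrative placeholders.
storage_configuration = {
    "type": "RDS",
    "rdsConfiguration": {
        "resourceArn": "arn:aws:rds:us-east-1:111122223333:cluster:example-cluster",
        "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
        "databaseName": "postgres",
        "tableName": "bedrock_knowledge_base",
        "fieldMapping": {
            "primaryKeyField": "id",     # UUID primary key column
            "vectorField": "embedding",  # vector column
            "textField": "chunks",       # raw text column
            "metadataField": "metadata", # Bedrock-managed metadata column
        },
    },
}
```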
- Pinecone
Note
If you use Pinecone, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.
For detailed documentation on setting up a vector store in Pinecone, see Pinecone as a knowledge base for Amazon Bedrock. While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:
- Connection string – The endpoint URL for your index management page.
- Namespace – (Optional) The namespace to be used to write new data to your database. For more information, see Using namespaces.
There are additional configurations that you must provide when creating a Pinecone index:
- Name – The name of the vector index. Choose any valid name. Later, when you create your knowledge base, enter the name you chose in the Vector index name field.
- Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain.

  | Model | Dimensions |
  |---|---|
  | Titan G1 Embeddings - Text | 1,536 |
  | Titan V2 Embeddings - Text | 1,024 |
  | Cohere Embed English | 1,024 |
  | Cohere Embed Multilingual | 1,024 |

- Distance metric – The metric used to measure the similarity between vectors. We recommend experimenting with different metrics for your use case, starting with cosine similarity.
To access your Pinecone index, you must provide your Pinecone API key to Amazon Bedrock through AWS Secrets Manager.
To set up a secret for your Pinecone configuration
- Follow the steps at Create an AWS Secrets Manager secret, setting the key as `apiKey` and the value as the API key to access your Pinecone index.
- To find your API key, open your Pinecone console and select API Keys.
- After you create the secret, take note of its ARN.
- Attach permissions to your service role to decrypt the secret by following the steps in Permissions to decrypt an AWS Secrets Manager secret for the vector store containing your knowledge base.
- Later, when you create your knowledge base, enter the secret ARN in the Credentials secret ARN field.
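The secret the steps above describe is a single-key JSON body, and the noted values map into the API's `storageConfiguration`. The following sketch shows both under the assumption of placeholder values; the connection string and ARNs are illustrative only.

```python
import json

# The Secrets Manager secret for Pinecone holds a single apiKey entry.
secret_payload = {"apiKey": "YOUR_PINECONE_API_KEY"}  # placeholder value

# Sketch of the corresponding CreateKnowledgeBase storageConfiguration.
# Note that Pinecone manages the vector field itself, so the fieldMapping
# only names the text and metadata fields.
storage_configuration = {
    "type": "PINECONE",
    "pineconeConfiguration": {
        "connectionString": "https://my-index-abc123.svc.example.pinecone.io",
        "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
        "namespace": "my-namespace",  # optional
        "fieldMapping": {
            "textField": "text",
            "metadataField": "metadata",
        },
    },
}

print(json.dumps(secret_payload))
```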
- Redis Enterprise Cloud
Note
If you use Redis Enterprise Cloud, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.
For detailed documentation on setting up a vector store in Redis Enterprise Cloud, see Integrating Redis Enterprise Cloud with Amazon Bedrock. While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:
- Endpoint URL – The public endpoint URL for your database.
- Vector index name – The name of the vector index for your database.
- Vector field – The name of the field in which the vector embeddings will be stored. Refer to the following table to determine how many dimensions the vector should contain.

  | Model | Dimensions |
  |---|---|
  | Titan G1 Embeddings - Text | 1,536 |
  | Titan V2 Embeddings - Text | 1,024 |
  | Cohere Embed English | 1,024 |
  | Cohere Embed Multilingual | 1,024 |

- Text field – The name of the field in which Amazon Bedrock stores the chunks of raw text.
- Bedrock-managed metadata field – The name of the field in which Amazon Bedrock stores metadata related to your knowledge base.
To access your Redis Enterprise Cloud cluster, you must provide your Redis Enterprise Cloud security configuration to Amazon Bedrock through AWS Secrets Manager.
To set up a secret for your Redis Enterprise Cloud configuration
- Enable TLS to use your database with Amazon Bedrock by following the steps at Transport Layer Security (TLS).
- Follow the steps at Create an AWS Secrets Manager secret. Set up the following keys in the secret with the appropriate values from your Redis Enterprise Cloud configuration:
  - `username` – The username to access your Redis Enterprise Cloud database. To find your username, look under the Security section of your database in the Redis Console.
  - `password` – The password to access your Redis Enterprise Cloud database. To find your password, look under the Security section of your database in the Redis Console.
  - `serverCertificate` – The content of the certificate from the Redis Cloud Certificate authority. Download the certificate from the Redis Admin Console by following the steps at Download certificates.
  - `clientPrivateKey` – The private key of the certificate from the Redis Cloud Certificate authority. Download the certificate from the Redis Admin Console by following the steps at Download certificates.
  - `clientCertificate` – The public key of the certificate from the Redis Cloud Certificate authority. Download the certificate from the Redis Admin Console by following the steps at Download certificates.
After you create the secret, take note of its ARN. Later, when you create your knowledge base, enter the ARN in the Credentials secret ARN field.
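The five-key secret described above can be sketched as a small helper that assembles the JSON body. The function name and the commented file paths are illustrative only; the certificate and key values are the contents of the files you downloaded from the Redis Admin Console.

```python
import json

# A sketch of the Secrets Manager secret body for Redis Enterprise Cloud.
# The helper name build_redis_secret is a hypothetical example.
def build_redis_secret(username: str, password: str,
                       server_cert: str, client_key: str,
                       client_cert: str) -> str:
    """Return the JSON secret string with the five required keys."""
    return json.dumps({
        "username": username,
        "password": password,
        "serverCertificate": server_cert,
        "clientPrivateKey": client_key,
        "clientCertificate": client_cert,
    })

# Example usage (placeholder credentials and file paths):
# secret_string = build_redis_secret(
#     "default", "example-password",
#     open("redis_ca.pem").read(),
#     open("client_key.pem").read(),
#     open("client_cert.pem").read(),
# )
```

You would store the resulting string as the secret value when following the Create an AWS Secrets Manager secret steps.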
- MongoDB Atlas
Note
If you use MongoDB Atlas, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.
For detailed documentation on setting up a vector store in MongoDB Atlas, see MongoDB Atlas as a knowledge base for Amazon Bedrock. When you set up the vector store, note the following information, which you will add when you create a knowledge base:
- Endpoint URL – The endpoint URL of your MongoDB Atlas cluster.
- Database name – The name of the database in your MongoDB Atlas cluster.
- Collection name – The name of the collection in your database.
- Credentials secret ARN – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the username and password for a database user in your MongoDB Atlas cluster.
- (Optional) Customer-managed KMS key for your Credentials secret ARN – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.
There are additional configurations for Field mapping that you must provide when creating a MongoDB Atlas index:
- Vector index name – The name of the MongoDB Atlas Vector Search index on your collection.
- Vector field name – The name of the field in which Amazon Bedrock should store vector embeddings.
- Text field name – The name of the field in which Amazon Bedrock should store the raw chunk text.
- Metadata field name – The name of the field in which Amazon Bedrock should store source attribution metadata.

(Optional) To have Amazon Bedrock connect to your MongoDB Atlas cluster over AWS PrivateLink, see RAG workflow with MongoDB Atlas using Amazon Bedrock.
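The connection details and field mapping above come together in the API's `storageConfiguration`. The following is a minimal sketch under the assumption of placeholder names and ARNs; the field-mapping values are example choices corresponding to the fields described above.

```python
# A sketch of the CreateKnowledgeBase storageConfiguration for a
# MongoDB Atlas vector store. All names and ARNs are placeholders.
storage_configuration = {
    "type": "MONGO_DB_ATLAS",
    "mongoDbAtlasConfiguration": {
        "endpoint": "my-cluster.example.mongodb.net",
        "databaseName": "bedrock_db",
        "collectionName": "bedrock_collection",
        "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
        "vectorIndexName": "my-vector-search-index",
        "fieldMapping": {
            "vectorField": "embedding",  # vector field name
            "textField": "text",         # text field name
            "metadataField": "metadata", # metadata field name
        },
    },
}
```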