Prerequisites for your own vector store for a knowledge base

To store the vector embeddings that your documents are converted to, you use a vector store. If you prefer for Amazon Bedrock to automatically create a vector index in Amazon OpenSearch Serverless for you, skip this prerequisite and proceed to Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases.

If you want to store binary vector embeddings instead of the standard floating-point (float32) vector embeddings, then you must use a vector store that supports binary vectors. Amazon OpenSearch Serverless is currently the only vector store that supports storing binary vectors.

You can set up your own supported vector store to index the vector embeddings representation of your data. You create fields for the following data:

A field for the vectors generated from the text in your data source by the embeddings model that you choose.
A field for the text chunks extracted from the files in your data source.
Fields for source files metadata that Amazon Bedrock manages.
(If you use an Amazon Aurora database and want to set up filtering on metadata) Fields for metadata that you associate with your source files. If you plan to set up filtering in other vector stores, you don't have to set up these fields for filtering.

You can encrypt third-party vector stores with a KMS key. For more information, see Encryption of knowledge base resources.

Follow the section below that corresponds to the vector store service that you will use to create your vector index.

Amazon OpenSearch Serverless

To configure permissions and create a vector search collection in Amazon OpenSearch Serverless in the AWS Management Console, follow steps 1 and 2 at Working with vector search collections in the Amazon OpenSearch Service Developer Guide. Note the following considerations while setting up your collection:
1. Give the collection a name and description of your choice.
2. To make your collection private, select Standard create for the Security section. Then, in the Network access settings section, select VPC as the Access type and choose a VPC endpoint. For more information about setting up a VPC endpoint for an Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink) in the Amazon OpenSearch Service Developer Guide.
Once the collection is created, take note of the Collection ARN for when you create the knowledge base.
In the left navigation pane, select Collections under Serverless. Then select your vector search collection.
Select the Indexes tab. Then choose Create vector index.
In the Vector index details section, enter a name for your index in the Vector index name field.

In the Vector fields section, choose Add vector field. Amazon Bedrock stores the vector embeddings for your data source in this field. Provide the following configurations:

Vector field name – Provide a name for the field (for example, embeddings).
Engine – The vector engine used for search. Select faiss.

Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain:

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Distance metric – The metric used to measure the similarity between vectors. We recommend using Euclidean for floating-point vector embeddings.

Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each field:

Field description	Mapping field	Data type	Filterable
Amazon Bedrock chunks the raw text from your data and stores the chunks in this field.	Name of your choice (for example, `text`)	String	True
Amazon Bedrock stores metadata related to your knowledge base in this field.	Name of your choice (for example, `bedrock-metadata`)	String	False

Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.
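The console settings above correspond roughly to an index definition like the one sketched below. This is a minimal illustration, not the exact body that the console submits. It assumes the OpenSearch k-NN mapping conventions (`knn_vector`, `method.engine`, `space_type`), with `l2` standing in for the Euclidean metric and `hnsw` chosen as the method name. The helper `build_index_body` and the `MODEL_DIMENSIONS` lookup are hypothetical names; the default field names are the examples used in this section.

```python
import json

# Dimensions per embeddings model, copied from the table above.
MODEL_DIMENSIONS = {
    "Titan G1 Embeddings - Text": 1536,
    "Titan V2 Embeddings - Text": 1024,
    "Cohere Embed English": 1024,
    "Cohere Embed Multilingual": 1024,
}


def build_index_body(model, vector_field="embeddings",
                     text_field="text", metadata_field="bedrock-metadata"):
    """Return a dict describing a vector index for the given model (hypothetical helper)."""
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"unknown embeddings model: {model}")
    return {
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": MODEL_DIMENSIONS[model],
                    # Assumption: HNSW with the faiss engine and Euclidean (l2) distance.
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                # Text chunks, marked filterable in the table above.
                text_field: {"type": "text"},
                # Bedrock-managed metadata, stored but not filterable.
                metadata_field: {"type": "text", "index": False},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body("Titan V2 Embeddings - Text"), indent=2))
```

Keeping the dimension lookup next to the mapping makes it harder to create an index whose dimension does not match the embeddings model.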

After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.

Field	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
Collection ARN	Collection ARN	collectionARN	The Amazon Resource Name (ARN) of the vector search collection.
Vector index name	Vector index name	vectorIndexName	The name of the vector index.
Vector field name	Vector field	vectorField	The name of the field in which to store vector embeddings for your data sources.
Metadata management (first mapping field)	Text field	textField	The name of the field in which to store the raw text from your data sources.
Metadata management (second mapping field)	Bedrock-managed metadata field	metadataField	The name of the field in which to store metadata that Amazon Bedrock manages.

For more detailed documentation on setting up a vector store in Amazon OpenSearch Serverless, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide.
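If you create the knowledge base through the API instead of the console, the values you noted might be assembled as in the following sketch. The nesting (`opensearchServerlessConfiguration`, `fieldMapping`), the `type` value, and the `collectionArn` spelling reflect one reading of the CreateKnowledgeBase request shape and should be checked against the API reference. The helper name and the ARN in the usage note are placeholders.

```python
def opensearch_storage_configuration(collection_arn, vector_index_name,
                                     vector_field, text_field, metadata_field):
    """Assemble a storage configuration dict from the noted values (hypothetical helper)."""
    values = {
        "collection_arn": collection_arn,
        "vector_index_name": vector_index_name,
        "vector_field": vector_field,
        "text_field": text_field,
        "metadata_field": metadata_field,
    }
    # Fail early if any value that should have been noted is empty.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError("missing values: " + ", ".join(missing))
    return {
        "type": "OPENSEARCH_SERVERLESS",  # assumed enum value
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": vector_index_name,
            "fieldMapping": {
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }
```

For example, `opensearch_storage_configuration("arn:aws:aoss:us-east-1:111122223333:collection/example", "my-index", "embeddings", "text", "bedrock-metadata")` returns a dict that could be passed as the storage configuration of a create request.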

Amazon Aurora (RDS)

Create an Amazon Aurora database (DB) cluster, schema, and table by following the steps at Using Aurora PostgreSQL as a knowledge base. When you create the table, configure it with the following columns and data types. You can use column names of your liking instead of the ones listed in the following table. Take note of the column names you choose so that you can provide them during knowledge base setup.

Column name	Data type	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
id	UUID primary key	Primary key	`primaryKeyField`	Contains unique identifiers for each record.
embedding	Vector	Vector field	`vectorField`	Contains the vector embeddings of the data sources.
chunks	Text	Text field	`textField`	Contains the chunks of raw text from your data sources.
metadata	JSON	Bedrock-managed metadata field	`metadataField`	Contains metadata required to carry out source attribution and to enable data ingestion and querying.

(Optional) If you added metadata to your files for filtering, you must also create a column for each metadata attribute in your files and specify the data type (text, number, or boolean). For example, if the attribute genre exists in your data source, you would add a column named genre and specify text as the data type. During data ingestion, these columns will be populated with the corresponding attribute values.
Configure an AWS Secrets Manager secret for your Aurora DB cluster by following the steps at Password management with Amazon Aurora and AWS Secrets Manager.
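The table layout described above can be sketched as a generated `CREATE TABLE` statement. This is an illustration under stated assumptions, not the exact statement from the linked guide. It assumes the pgvector extension provides the `vector(n)` type, and the mapping of the "number" metadata type to `double precision` is a guess. The helper name `build_create_table` is hypothetical, and the column names are the defaults from the table above.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Assumed mapping from the metadata types named above to PostgreSQL column types.
_METADATA_TYPES = {"text": "text", "number": "double precision", "boolean": "boolean"}


def _check(name):
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_create_table(table, dimensions, metadata_columns=None):
    """Return a CREATE TABLE statement for a knowledge base table (hypothetical helper)."""
    columns = [
        "id uuid PRIMARY KEY",
        f"embedding vector({int(dimensions)})",
        "chunks text",
        "metadata json",
    ]
    # One extra column per metadata attribute used for filtering.
    for name, kind in (metadata_columns or {}).items():
        if kind not in _METADATA_TYPES:
            raise ValueError(f"unsupported metadata type: {kind!r}")
        columns.append(f"{_check(name)} {_METADATA_TYPES[kind]}")
    qualified = ".".join(_check(part) for part in table.split("."))
    return f"CREATE TABLE {qualified} (\n  " + ",\n  ".join(columns) + "\n);"


if __name__ == "__main__":
    print(build_create_table("bedrock_kb", 1024, {"genre": "text"}))
```

Validating identifiers before formatting them into the statement avoids building SQL from unchecked strings.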

Take note of the following information after you create your DB cluster and set up the secret.

Field in knowledge base setup (Console)	Field in knowledge base setup (API)	Description
Amazon Aurora DB Cluster ARN	resourceArn	The ARN of your DB cluster.
Database name	databaseName	The name of your database.
Table name	tableName	The name of the table in your DB cluster.
Secret ARN	credentialsSecretArn	The ARN of the AWS Secrets Manager secret for your DB cluster.
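For API-based setup, the values in the two tables above might be combined as in the following sketch. The `rdsConfiguration` and `fieldMapping` nesting and the `type` value are assumptions about the CreateKnowledgeBase request shape and should be checked against the API reference. The helper name is hypothetical, and the default column names are the ones listed in the column table.

```python
def rds_storage_configuration(resource_arn, database_name, table_name,
                              credentials_secret_arn, field_mapping=None):
    """Assemble a storage configuration dict for an Aurora vector store (hypothetical helper)."""
    # Default column names from the table in this section; override any that you renamed.
    mapping = {
        "primaryKeyField": "id",
        "vectorField": "embedding",
        "textField": "chunks",
        "metadataField": "metadata",
    }
    mapping.update(field_mapping or {})
    return {
        "type": "RDS",  # assumed enum value
        "rdsConfiguration": {
            "resourceArn": resource_arn,
            "databaseName": database_name,
            "tableName": table_name,
            "credentialsSecretArn": credentials_secret_arn,
            "fieldMapping": mapping,
        },
    }
```

Passing `field_mapping={"textField": "body"}` overrides only the text column and leaves the other defaults in place.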

Neptune Analytics graphs (GraphRAG)

To create a graph and vector store in Neptune Analytics in the AWS Management Console, follow the steps described in Vector indexing in Neptune Analytics in the Neptune Analytics User Guide.

Note

To use Neptune GraphRAG, create an empty Neptune Analytics graph with a vector search index. The vector search index can only be created when the graph is created. When you create a Neptune Analytics graph in the console, you specify the index dimension under Vector search settings near the end of the process.

Note the following considerations while creating the graph:

Give the graph a name of your choice.
Under Data source, choose Create empty graph, and specify the number of m-NCUs to be allocated. Each m-NCU has around one GiB of memory capacity and corresponding compute and networking.

Note
The capacity of your graph can be modified later. We recommend that you start with the smallest instance and later choose a different instance, if needed.
You can leave the default availability settings, and under Network and Security, you must enable public access. Neptune Analytics graphs behind a VPC are not supported.

Under Vector search settings, choose Use vector dimension and specify the number of dimensions in each vector.

Note

The number of dimensions in each vector must match the vector dimensions in the embeddings model. Refer to the following table to determine how many dimensions the vector should contain:

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Leave all other settings to their default and create the graph.

Once the graph is created, click it to take note of the Resource ARN and Vector dimensions for when you create the knowledge base.

Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that's managed by Amazon Bedrock. The following table describes the fields and the values to specify for each field:

Field description	Mapping field	Data type	Filterable
Amazon Bedrock chunks the raw text from your data and stores the chunks in this field.	Name of your choice (for example, `text`)	String	True
Amazon Bedrock stores metadata related to your knowledge base in this field, such as the Amazon S3 location of the file that contains this text.	Name of your choice (for example, `bedrock-metadata`)	String	False

Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.

After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.

Field	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
Graph ARN	Neptune Analytics Graph ARN	graphARN	The Amazon Resource Name (ARN) of the Neptune Analytics graph.
Metadata management (first mapping field)	Text field name	textField	The name of the field in which to store the raw text from your data sources.
Metadata management (second mapping field)	Bedrock-managed metadata field	metadataField	The name of the field in which to store metadata that Amazon Bedrock manages.

Pinecone

Note

If you use Pinecone, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Pinecone, see Pinecone as a knowledge base for Amazon Bedrock.

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:

Endpoint URL – The endpoint URL for your index management page.
Name Space – (Optional) The namespace to be used to write new data to your database. For more information, see Using namespaces.

There are additional configurations that you must provide when creating a Pinecone index:

Name – The name of the vector index. Choose any valid name of your choice. Later, when you create your knowledge base, enter the name you choose in the Vector index name field.

Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain.

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Distance metric – The metric used to measure the similarity between vectors. We recommend starting with cosine similarity and then experimenting with other metrics for your use case.

To access your Pinecone index, you must provide your Pinecone API key to Amazon Bedrock through the AWS Secrets Manager.

To set up a secret for your Pinecone configuration

Follow the steps at Create an AWS Secrets Manager secret, setting the key as apiKey and the value as the API key to access your Pinecone index.
To find your API key, open your Pinecone console and select API Keys.
After you create the secret, take note of the ARN of the KMS key.
Attach permissions to your service role to decrypt the secret with the KMS key by following the steps in Permissions to decrypt an AWS Secrets Manager secret for the vector store containing your knowledge base.
Later, when you create your knowledge base, enter the ARN in the Credentials secret ARN field.
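The secret described in the steps above can also be created programmatically. The sketch below assumes a Secrets Manager client object that exposes `create_secret(Name=..., SecretString=...)`, as the boto3 `secretsmanager` client does. The function names and the secret name in the usage note are hypothetical.

```python
import json


def pinecone_secret_string(api_key):
    """Serialize the secret value with the single key `apiKey`."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return json.dumps({"apiKey": api_key})


def store_pinecone_secret(client, name, api_key):
    """Create the secret with a Secrets Manager client and return the secret's ARN."""
    response = client.create_secret(
        Name=name,
        SecretString=pinecone_secret_string(api_key),
    )
    return response["ARN"]
```

With boto3 installed and credentials configured, a call such as `store_pinecone_secret(boto3.client("secretsmanager"), "my-pinecone-secret", api_key)` would return the ARN of the new secret.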

Redis Enterprise Cloud

Note

If you use Redis Enterprise Cloud, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Redis Enterprise Cloud, see Integrating Redis Enterprise Cloud with Amazon Bedrock.

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:

Endpoint URL – The public endpoint URL for your database.
Vector index name – The name of the vector index for your database.

Vector field – The name of the field where the vector embeddings will be stored. Refer to the following table to determine how many dimensions the vector should contain.

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Text field – The name of the field where Amazon Bedrock stores the chunks of raw text.
Bedrock-managed metadata field – The name of the field where Amazon Bedrock stores metadata related to your knowledge base.

To access your Redis Enterprise Cloud cluster, you must provide your Redis Enterprise Cloud security configuration to Amazon Bedrock through the AWS Secrets Manager.

To set up a secret for your Redis Enterprise Cloud configuration

Enable TLS to use your database with Amazon Bedrock by following the steps at Transport Layer Security (TLS).
Follow the steps at Create an AWS Secrets Manager secret. Set up the following keys with the appropriate values from your Redis Enterprise Cloud configuration in the secret:
- username – The username to access your Redis Enterprise Cloud database. To find your username, look under the Security section of your database in the Redis Console.
- password – The password to access your Redis Enterprise Cloud database. To find your password, look under the Security section of your database in the Redis Console.
- serverCertificate – The content of the certificate from the Redis Cloud Certificate authority. Download the server certificate from the Redis Admin Console by following the steps at Download certificates.
- clientPrivateKey – The private key of the certificate from the Redis Cloud Certificate authority. Download the server certificate from the Redis Admin Console by following the steps at Download certificates.
- clientCertificate – The public key of the certificate from the Redis Cloud Certificate authority. Download the server certificate from the Redis Admin Console by following the steps at Download certificates.
After you create the secret, take note of its ARN. Later, when you create your knowledge base, enter the ARN in the Credentials secret ARN field.
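Because the secret must contain all five keys listed above, it can help to check the values before creating it. The following is a minimal sketch; the helper name `redis_secret_string` is hypothetical, and it only checks that each value is present, not that the certificates are valid.

```python
import json

# Keys that the secret must contain, as listed in the steps above.
REQUIRED_KEYS = (
    "username",
    "password",
    "serverCertificate",
    "clientPrivateKey",
    "clientCertificate",
)


def redis_secret_string(values):
    """Return the JSON secret value, or raise if a required key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError("missing secret keys: " + ", ".join(missing))
    # Keep only the expected keys so stray entries are not stored in the secret.
    return json.dumps({key: values[key] for key in REQUIRED_KEYS})
```

The returned string can be used as the secret value when you follow the Secrets Manager steps.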

MongoDB Atlas

Note

If you use MongoDB Atlas, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in MongoDB Atlas, see MongoDB Atlas as a knowledge base for Amazon Bedrock.

When you set up the vector store, take note of the following information, which you will provide when you create a knowledge base:

Endpoint URL – The endpoint URL of your MongoDB Atlas cluster.
Database name – The name of the database in your MongoDB Atlas cluster.
Collection name – The name of the collection in your database.
Credentials secret ARN – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the username and password for a database user in your MongoDB Atlas cluster.
(Optional) Customer-managed KMS key for your Credentials secret ARN – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.

There are additional configurations for Field mapping that you must provide when creating a MongoDB Atlas index:

Vector index name – The name of the MongoDB Atlas Vector Search Index on your collection.
Vector field name – The name of the field in which Amazon Bedrock should store vector embeddings.
Text field name – The name of the field in which Amazon Bedrock should store the raw chunk text.
Metadata field name – The name of the field in which Amazon Bedrock should store source attribution metadata.
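The connection values and field mappings listed above might be gathered for an API request as in the sketch below. The `mongoDbAtlasConfiguration` and `fieldMapping` nesting, the key names, and the `type` value are assumptions about the CreateKnowledgeBase request shape and should be checked against the API reference. The helper name is hypothetical.

```python
def mongodb_atlas_storage_configuration(endpoint, database_name, collection_name,
                                        credentials_secret_arn, vector_index_name,
                                        vector_field, text_field, metadata_field):
    """Assemble a storage configuration dict for MongoDB Atlas (hypothetical helper)."""
    return {
        "type": "MONGO_DB_ATLAS",  # assumed enum value
        "mongoDbAtlasConfiguration": {
            "endpoint": endpoint,
            "databaseName": database_name,
            "collectionName": collection_name,
            "credentialsSecretArn": credentials_secret_arn,
            "vectorIndexName": vector_index_name,
            "fieldMapping": {
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }
```

Keeping the mapping in one place makes it easier to confirm that the field names match the ones defined in the Atlas Vector Search index.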

(Optional) To have Amazon Bedrock connect to your MongoDB Atlas cluster over AWS PrivateLink, see RAG workflow with MongoDB Atlas using Amazon Bedrock.
