Prerequisites for using a vector store you created for a knowledge base

To store the vector embeddings that your documents are converted to, you use a vector store. If you prefer for Amazon Bedrock to automatically create a vector index in Amazon OpenSearch Serverless for you, skip this prerequisite and proceed to Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases.

If you want to store binary vector embeddings instead of the standard floating-point (float32) vector embeddings, then you must use a vector store that supports binary vectors.

Note

Amazon OpenSearch Serverless and Amazon OpenSearch Managed clusters are the only vector stores that support storing binary vectors.

You can set up your own supported vector store to index the vector embeddings representation of your data. You create fields for the following data:

A field for the vectors generated from the text in your data source by the embeddings model that you choose.
A field for the text chunks extracted from the files in your data source.
Fields for source files metadata that Amazon Bedrock manages.
(If you use an Amazon Aurora database and want to set up filtering on metadata) Fields for metadata that you associate with your source files. If you plan to set up filtering in other vector stores, you don't have to set up these fields for filtering.

You can encrypt third-party vector stores with a KMS key. For more information, see Encryption of knowledge base resources.

Select the tab corresponding to the vector store service that you will use to create your vector index.

Amazon OpenSearch Serverless

To configure permissions and create a vector search collection in Amazon OpenSearch Serverless in the AWS Management Console, follow steps 1 and 2 at Working with vector search collections in the Amazon OpenSearch Service Developer Guide. Note the following considerations while setting up your collection:
1. Give the collection a name and description of your choice.
2. To make your collection private, select Standard create for the Security section. Then, in the Network access settings section, select VPC as the Access type and choose a VPC endpoint. For more information about setting up a VPC endpoint for an Amazon OpenSearch Serverless collection, see Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink) in the Amazon OpenSearch Service Developer Guide.
Once the collection is created, take note of the Collection ARN for when you create the knowledge base.
In the left navigation pane, select Collections under Serverless. Then select your vector search collection.
Select the Indexes tab. Then choose Create vector index.
In the Vector index details section, enter a name for your index in the Vector index name field.

In the Vector fields section, choose Add vector field. Amazon Bedrock stores the vector embeddings for your data source in this field. Provide the following configurations:

Vector field name – Provide a name for the field (for example, embeddings).
Engine – The vector engine used for search. Select faiss.

Dimensions – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain:

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024, 512, and 256
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Distance metric – The metric used to measure the similarity between vectors. We recommend using Euclidean for floating-point vector embeddings.

Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each field:

Field description	Mapping field	Data type	Filterable
Amazon Bedrock chunks the raw text from your data and stores the chunks in this field.	Name of your choice (for example, `text`)	String	True
Amazon Bedrock stores metadata related to your knowledge base in this field.	Name of your choice (for example, `bedrock-metadata`)	String	False

Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.

After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.

Field	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
Collection ARN	Collection ARN	collectionARN	The Amazon Resource Name (ARN) of the vector search collection.
Vector index name	Vector index name	vectorIndexName	The name of the vector index.
Vector field name	Vector field	vectorField	The name of the field in which to store vector embeddings for your data sources.
Metadata management (first mapping field)	Text field	textField	The name of the field in which to store the raw text from your data sources.
Metadata management (second mapping field)	Bedrock-managed metadata field	metadataField	The name of the field in which to store metadata that Amazon Bedrock manages.
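
As a minimal sketch of how the values you noted fit together, the following Python builds a storage configuration dictionary of the kind passed when creating a knowledge base through the API. The function name and all argument values are hypothetical placeholders. The key names (`type`, `opensearchServerlessConfiguration`, `collectionArn`, `fieldMapping`) reflect our understanding of the request shape and are assumptions to confirm against the API reference, including the exact casing of the collection ARN key.

```python
def opensearch_serverless_storage_config(collection_arn, index_name,
                                         vector_field, text_field, metadata_field):
    """Assemble the values noted during index setup into one dictionary.

    The key names are assumptions about the CreateKnowledgeBase request shape.
    """
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": vector_field,      # Vector field name
                "textField": text_field,          # First metadata mapping field
                "metadataField": metadata_field,  # Second metadata mapping field
            },
        },
    }


# Placeholder values for illustration only.
config = opensearch_serverless_storage_config(
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/example-id",
    index_name="example-index",
    vector_field="embeddings",
    text_field="text",
    metadata_field="bedrock-metadata",
)
```

Keeping the three field names in one place makes it harder for the index definition and the knowledge base setup to drift apart.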

For more detailed documentation on setting up a vector store in Amazon OpenSearch Serverless, see Working with vector search collections in the Amazon OpenSearch Service Developer Guide.

Amazon OpenSearch Service Managed Clusters

Important

Before using any domain resources in OpenSearch Managed clusters, you need to configure certain IAM access permissions and policies. For more information, see Prerequisites and permissions required for using OpenSearch Managed Clusters with Amazon Bedrock Knowledge Bases.
If you encounter data ingestion failures, it might indicate insufficient OpenSearch domain capacity. To resolve this issue, increase your domain's capacity by provisioning higher IOPS and by increasing the throughput settings. For more information, see Operational best practices for Amazon OpenSearch Service.

To create a domain and vector index in an OpenSearch Managed cluster in the AWS Management Console, follow the steps described in Creating and managing OpenSearch Service domains in the Amazon OpenSearch Service Developer Guide.

Note the following considerations while setting up your domain:
1. Give the domain a name of your choice.
2. We recommend that you use the Easy create option to get started quickly with creating your domain.
  
  Note
  This option gives you a domain with low throughput. If you have larger workloads that require higher throughput, choose the Standard create option. You can start with the lowest capacity and adjust it later as needed.
3. For Network, you must choose Public access. OpenSearch domains that are behind a VPC are not supported for your Knowledge Base.
4. For Version, if you're using binary vector embeddings, Amazon Bedrock Knowledge Bases requires an Engine version of 2.16 or later. In addition, version 2.13 or later is required to create a k-NN index. For more information, see K-NN Search in the Amazon OpenSearch Service Developer Guide.
5. We recommend that you use the Dual-stack mode.
6. We recommend that you enable Fine-grained access control to protect the data in your domain and to further control the permissions that grant your knowledge base service role access to the OpenSearch domain so that it can make requests.
7. Leave all other settings to their default values and choose Create to create your domain.
Once the domain is created, click it to take note of the Domain ARN and Domain endpoint for when you create the knowledge base.

After you've created the domain, you can create a vector index by running the following commands on an OpenSearch dashboard or using curl commands. For more information, see the OpenSearch documentation.

When running the command:

Provide a name for the vector field (for example, embeddings).
Make sure that the vector engine used for search is faiss. The nmslib engine is not supported.

For the number of dimensions in the vector, refer to the following table to determine how many dimensions the vector should contain:

Note

The Titan V2 Embeddings - Text model supports multiple dimensions: 1,024, 512, or 256.

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024, 512, and 256
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024
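
The table above can be sketched as a small lookup that rejects a dimension the chosen model does not produce. This is an illustrative helper, not part of any AWS SDK: the function name is hypothetical, and the dictionary keys are the display names from the table rather than API model identifiers.

```python
# Supported vector dimensions per embeddings model, copied from the table above.
SUPPORTED_DIMENSIONS = {
    "Titan G1 Embeddings - Text": (1536,),
    "Titan V2 Embeddings - Text": (1024, 512, 256),
    "Cohere Embed English": (1024,),
    "Cohere Embed Multilingual": (1024,),
}


def check_dimension(model_name, dimension):
    """Return the dimension if the model supports it; otherwise raise an error."""
    allowed = SUPPORTED_DIMENSIONS.get(model_name)
    if allowed is None:
        raise KeyError(f"Unknown embeddings model: {model_name}")
    if dimension not in allowed:
        raise ValueError(
            f"{model_name} supports dimensions {allowed}, not {dimension}"
        )
    return dimension


# Example: Titan V2 accepts 512, so the value is returned unchanged.
dimension = check_dimension("Titan V2 Embeddings - Text", 512)
```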

You can add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each of them.

Field description	Mapping field
Amazon Bedrock chunks the raw text from your data and stores the chunks in this field.	Specified as an object, for example, `AMAZON_BEDROCK_TEXT_CHUNK`.
Amazon Bedrock stores metadata related to your knowledge base in this field.	Specified as an object, for example, `AMAZON_BEDROCK_METADATA`.


PUT /<index-name>
{
    "settings": {
        "index": {
            "knn": true
        }
    },
    "mappings": {
        "properties": {
            "<vector-name>": {
                "type": "knn_vector",
                "dimension": <embedding-dimension>,
                "data_type": "binary",          # Only needed for binary embeddings
                "space_type": "l2" | "hamming", # Use l2 for float embeddings and hamming for binary embeddings
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "parameters": {
                        "ef_construction": 128,
                        "m": 24
                    }
                }
            },

            "AMAZON_BEDROCK_METADATA": {
                "type": "text",
                "index": "false"
            },
            "AMAZON_BEDROCK_TEXT_CHUNK": {
                "type": "text",
                "index": "true"            
            }
        }
    }
}
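
Because the request above is a template with placeholders and an either/or choice for `space_type`, it can help to generate the concrete body programmatically. The following is a minimal sketch, assuming you send the resulting JSON yourself (for example, with curl or from the dashboard); the function name `build_index_body` is hypothetical, and the default field names are the examples from the table above.

```python
import json


def build_index_body(vector_field, dimension, binary=False,
                     text_field="AMAZON_BEDROCK_TEXT_CHUNK",
                     metadata_field="AMAZON_BEDROCK_METADATA"):
    """Build the index-creation body for float (l2) or binary (hamming) embeddings."""
    vector_mapping = {
        "type": "knn_vector",
        "dimension": dimension,
        # l2 for float embeddings, hamming for binary embeddings.
        "space_type": "hamming" if binary else "l2",
        "method": {
            "name": "hnsw",
            "engine": "faiss",
            "parameters": {"ef_construction": 128, "m": 24},
        },
    }
    if binary:
        # data_type is only needed for binary embeddings.
        vector_mapping["data_type"] = "binary"

    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: vector_mapping,
                metadata_field: {"type": "text", "index": "false"},
                text_field: {"type": "text", "index": "true"},
            }
        },
    }


# Serialize a float-embedding index body for a 1,024-dimension model.
body_json = json.dumps(build_index_body("embeddings", 1024), indent=2)
```

The serialized `body_json` string is what you would send as the body of the `PUT /<index-name>` request.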

Take note of the domain ARN and endpoint, and the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base.

After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.

Field	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
Domain ARN	Domain ARN	domainARN	The Amazon Resource Name (ARN) of the OpenSearch domain.
Domain endpoint	Domain endpoint	domainEndpoint	The endpoint to connect to the OpenSearch domain.
Vector index name	Vector index name	vectorIndexName	The name of the vector index.
Vector field name	Vector field	vectorField	The name of the field in which to store vector embeddings for your data sources.
Metadata management (first mapping field)	Text field	textField	The name of the field in which to store the raw text from your data sources.
Metadata management (second mapping field)	Bedrock-managed metadata field	metadataField	The name of the field in which to store metadata that Amazon Bedrock manages.

Amazon Aurora (RDS)

Create an Amazon Aurora database (DB) cluster, schema, and table by following the steps at Using Aurora PostgreSQL as a knowledge base. When you create the table, configure it with the following columns and data types. You can use column names of your liking instead of the ones listed in the following table. Take note of the column names you choose so that you can provide them during knowledge base setup.

Column name	Data type	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
id	UUID primary key	Primary key	`primaryKeyField`	Contains unique identifiers for each record.
embedding	Vector	Vector field	`vectorField`	Contains the vector embeddings of the data sources.
chunks	Text	Text field	`textField`	Contains the chunks of raw text from your data sources.
metadata	JSON	Bedrock-managed metadata field	`metadataField`	Contains metadata required to carry out source attribution and to enable data ingestion and querying.

(Optional) If you added metadata to your files for filtering, you must also create a column for each metadata attribute in your files and specify the data type (text, number, or boolean). For example, if the attribute genre exists in your data source, you would add a column named genre and specify text as the data type. During data ingestion, these columns are populated with the corresponding attribute values.
Configure an AWS Secrets Manager secret for your Aurora DB cluster by following the steps at Password management with Amazon Aurora and AWS Secrets Manager.

Take note of the following information after you create your DB cluster and set up the secret.

Field in knowledge base setup (Console)	Field in knowledge base setup (API)	Description
Amazon Aurora DB Cluster ARN	resourceArn	The ARN of your DB cluster.
Database name	databaseName	The name of your database.
Table name	tableName	The name of the table in your DB cluster.
Secret ARN	credentialsSecretArn	The ARN of the AWS Secrets Manager secret for your DB cluster.
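
To illustrate how the Aurora values map onto the API fields listed in the two tables above, here is a minimal sketch in Python. The function name is hypothetical and the ARNs and names in the example are placeholders. The field names (`resourceArn`, `databaseName`, `tableName`, `credentialsSecretArn`, `primaryKeyField`, `vectorField`, `textField`, `metadataField`) come from the tables; the wrapping keys `type`, `rdsConfiguration`, and `fieldMapping` are assumptions about the request shape to confirm against the API reference.

```python
def aurora_storage_config(cluster_arn, database_name, table_name, secret_arn,
                          primary_key="id", vector="embedding",
                          text="chunks", metadata="metadata"):
    """Collect the Aurora details noted above into one dictionary.

    Column-name defaults match the example column names in the table;
    pass your own names if you chose different ones.
    """
    return {
        "type": "RDS",  # assumed storage type value
        "rdsConfiguration": {
            "resourceArn": cluster_arn,
            "databaseName": database_name,
            "tableName": table_name,
            "credentialsSecretArn": secret_arn,
            "fieldMapping": {
                "primaryKeyField": primary_key,
                "vectorField": vector,
                "textField": text,
                "metadataField": metadata,
            },
        },
    }


# Placeholder values for illustration only.
rds_config = aurora_storage_config(
    cluster_arn="arn:aws:rds:us-east-1:111122223333:cluster:example-cluster",
    database_name="example_db",
    table_name="example_schema.example_table",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
)
```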

Neptune Analytics graphs (GraphRAG)

To create a graph and vector store in Neptune Analytics in the AWS Management Console, follow the steps described in Vector indexing in Neptune Analytics in the Neptune Analytics User Guide.

Note

To use Neptune GraphRAG, create an empty Neptune Analytics graph with a vector search index. The vector search index can only be created when the graph is created. When you create a Neptune Analytics graph in the console, you specify the index dimension under Vector search settings near the end of the process.

Note the following considerations while creating the graph:

Give the graph a name of your choice.
Under Data source, choose Create empty graph, and specify the number of m-NCUs to be allocated. Each m-NCU has around one GiB of memory capacity and corresponding compute and networking.

Note
The capacity of your graph can be modified later. We recommend that you start with the smallest instance and later choose a different instance, if needed.
You can leave the default availability settings, and under Network and Security, you must enable public access. Neptune Analytics graphs behind a VPC are not supported.

Under Vector search settings, choose Use vector dimension and specify the number of dimensions in each vector.

Note

The number of dimensions in each vector must match the vector dimensions in the embeddings model. Refer to the following table to determine how many dimensions the vector should contain:

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024, 512, and 256
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Leave all other settings to their default and create the graph.

Once the graph is created, click it to take note of the Resource ARN and Vector dimensions for when you create the knowledge base.

Expand the Metadata management section and add two fields to configure the vector index to store additional metadata that's managed by Amazon Bedrock. The following table describes the fields and the values to specify for each field:

Field description	Mapping field	Data type	Filterable
Amazon Bedrock chunks the raw text from your data and stores the chunks in this field.	Name of your choice (for example, `text`)	String	True
Amazon Bedrock stores metadata related to your knowledge base in this field, such as the Amazon S3 location of the file that contains this text.	Name of your choice (for example, `bedrock-metadata`)	String	False

Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose Create.

After the vector index is created, you can proceed to create your knowledge base. The following table summarizes where you will enter each piece of information that you took note of.

Field	Corresponding field in knowledge base setup (Console)	Corresponding field in knowledge base setup (API)	Description
Graph ARN	Neptune Analytics Graph ARN	graphARN	The Amazon Resource Name (ARN) of the Neptune Analytics graph.
Metadata management (first mapping field)	Text field name	textField	The name of the field in which to store the raw text from your data sources.
Metadata management (second mapping field)	Bedrock-managed metadata field	metadataField	The name of the field in which to store metadata that Amazon Bedrock manages.

Pinecone

Note

If you use Pinecone, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Pinecone, see Pinecone as a knowledge base for Amazon Bedrock.

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:

Endpoint URL – The endpoint URL for your index management page.
Credentials secret ARN – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the API key for your Pinecone index.
(Optional) Customer-managed KMS key for your Credentials secret ARN – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.
Namespace – (Optional) The namespace to be used to write new data to your database. For more information, see Using namespaces.

There are additional configurations that you must provide when creating a Pinecone index:

Text field name – The name of the field in which Amazon Bedrock stores the raw chunk text.
Metadata field name – The name of the field in which Amazon Bedrock stores source attribution metadata.

To access your Pinecone index, you must provide your Pinecone API key to Amazon Bedrock through the AWS Secrets Manager.

To set up a secret for your Pinecone configuration

Follow the steps at Create an AWS Secrets Manager secret, setting the key as apiKey and the value as the API key to access your Pinecone index.
To find your API key, open your Pinecone console and select API Keys.
After you create the secret, take note of the ARN of the secret and, if you encrypted the secret with a customer-managed KMS key, the ARN of the KMS key.
Attach permissions to your service role to decrypt the secret by following the steps in Permissions to decrypt an AWS Secrets Manager secret for the vector store containing your knowledge base.
Later, when you create your knowledge base, enter the ARN of the secret in the Credentials secret ARN field.
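
The secret described in the steps above is a single key-value pair. As a minimal sketch, the following Python produces the JSON string to store as the secret value; the function name and the example key are hypothetical, and only the key name `apiKey` comes from the steps above.

```python
import json


def pinecone_secret_string(api_key):
    """Return the JSON secret value with the key name `apiKey`."""
    if not api_key:
        raise ValueError("A Pinecone API key is required")
    return json.dumps({"apiKey": api_key})


# Placeholder key for illustration; never hard-code a real key in source files.
secret_value = pinecone_secret_string("example-pinecone-api-key")
# The string can then be stored as the secret value, for example through the
# console or through the SecretString parameter of the Secrets Manager
# CreateSecret operation.
```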

Redis Enterprise Cloud

Note

If you use Redis Enterprise Cloud, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Redis Enterprise Cloud, see Integrating Redis Enterprise Cloud with Amazon Bedrock.

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:

Endpoint URL – The public endpoint URL for your database.
Vector index name – The name of the vector index for your database.

Vector field – The name of the field where the vector embeddings will be stored. Refer to the following table to determine how many dimensions the vector should contain.

Model	Dimensions
Titan G1 Embeddings - Text	1,536
Titan V2 Embeddings - Text	1,024, 512, and 256
Cohere Embed English	1,024
Cohere Embed Multilingual	1,024

Text field – The name of the field where Amazon Bedrock stores the chunks of raw text.
Bedrock-managed metadata field – The name of the field where Amazon Bedrock stores metadata related to your knowledge base.

To access your Redis Enterprise Cloud cluster, you must provide your Redis Enterprise Cloud security configuration to Amazon Bedrock through the AWS Secrets Manager.

To set up a secret for your Redis Enterprise Cloud configuration

Enable TLS to use your database with Amazon Bedrock by following the steps at Transport Layer Security (TLS).
Follow the steps at Create an AWS Secrets Manager secret. Set up the following keys with the appropriate values from your Redis Enterprise Cloud configuration in the secret:
- username – The username to access your Redis Enterprise Cloud database. To find your username, look under the Security section of your database in the Redis Console.
- password – The password to access your Redis Enterprise Cloud database. To find your password, look under the Security section of your database in the Redis Console.
- serverCertificate – The content of the certificate from the Redis Cloud Certificate authority. Download the server certificate from the Redis Admin Console by following the steps at Download certificates.
- clientPrivateKey – The private key of the certificate from the Redis Cloud Certificate authority. Download the certificate from the Redis Admin Console by following the steps at Download certificates.
- clientCertificate – The public key of the certificate from the Redis Cloud Certificate authority. Download the certificate from the Redis Admin Console by following the steps at Download certificates.
After you create the secret, take note of its ARN. Later, when you create your knowledge base, enter the ARN in the Credentials secret ARN field.
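
Because the Redis Enterprise Cloud secret needs five keys, a small check before you store it can catch an omission early. The following is a minimal sketch; the function name and the example values are hypothetical, and the key names are the ones listed in the steps above.

```python
import json

# Key names required in the secret, as listed in the steps above.
REQUIRED_KEYS = (
    "username",
    "password",
    "serverCertificate",
    "clientPrivateKey",
    "clientCertificate",
)


def redis_secret_string(values):
    """Validate that every required key has a value, then return the JSON secret value."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError("Missing secret keys: " + ", ".join(missing))
    # Keep only the expected keys so that stray entries are not stored.
    return json.dumps({key: values[key] for key in REQUIRED_KEYS})


# Placeholder values for illustration only.
redis_secret = redis_secret_string({
    "username": "example-user",
    "password": "example-password",
    "serverCertificate": "-----BEGIN CERTIFICATE-----...",
    "clientPrivateKey": "-----BEGIN PRIVATE KEY-----...",
    "clientCertificate": "-----BEGIN CERTIFICATE-----...",
})
```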

MongoDB Atlas

Note

If you use MongoDB Atlas, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in MongoDB Atlas, see MongoDB Atlas as a knowledge base for Amazon Bedrock.

When you set up the vector store, take note of the following information, which you will provide when you create a knowledge base:

Endpoint URL – The endpoint URL of your MongoDB Atlas cluster.
Database name – The name of the database in your MongoDB Atlas cluster.
Collection name – The name of the collection in your database.
Credentials secret ARN – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the username and password for a database user in your MongoDB Atlas cluster.
(Optional) Customer-managed KMS key for your Credentials secret ARN – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.

There are additional configurations for Field mapping that you must provide when creating a MongoDB Atlas index:

Vector index name – The name of the MongoDB Atlas Vector Search Index on your collection.
Vector field name – The name of the field in which Amazon Bedrock stores vector embeddings.
Text field name – The name of the field in which Amazon Bedrock stores the raw chunk text.
Metadata field name – The name of the field in which Amazon Bedrock stores source attribution metadata.

(Optional) To have Amazon Bedrock connect to your MongoDB Atlas cluster over AWS PrivateLink, see RAG workflow with MongoDB Atlas using Amazon Bedrock.
