Using Aurora PostgreSQL as a Knowledge Base for Amazon Bedrock
You can use an Aurora PostgreSQL DB cluster as a Knowledge Base for Amazon Bedrock. For more information, see Create a vector store in Amazon Aurora. A Knowledge Base automatically takes unstructured text data stored in an Amazon S3 bucket, converts it to text chunks and vectors, and stores them in a PostgreSQL database. In generative AI applications, you can use Agents for Amazon Bedrock to query the data stored in the Knowledge Base and use the results of those queries to augment answers provided by foundation models. This workflow is called Retrieval Augmented Generation (RAG). For more information on RAG, see Retrieval Augmented Generation (RAG).
For detailed information about using Aurora PostgreSQL to build generative AI applications using RAG, see this blog post.
Prerequisites
Familiarize yourself with the following prerequisites for using an Aurora PostgreSQL cluster as a Knowledge Base for Amazon Bedrock. At a high level, you need to configure the following services for use with Amazon Bedrock:
An Amazon Aurora PostgreSQL DB cluster running one of the following versions:
16.1 and all higher versions
15.4 and higher versions
14.9 and higher versions
13.12 and higher versions
12.16 and higher versions
Note
You must enable the pgvector extension in your target database and use version 0.5.0 or higher. For more information, see pgvector v0.5.0 with HNSW indexing.
RDS Data API enabled for the DB cluster.
A user managed in AWS Secrets Manager. For more information, see Password management with Amazon Aurora and AWS Secrets Manager.
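If you check clusters in a script, the version list above can be encoded as a small lookup table. The following Python sketch is illustrative only. It assumes engine versions are dotted strings such as 15.4, and it reads "16.1 and all higher versions" as also covering later major versions. The names MINIMUM_VERSIONS and is_supported_engine_version are hypothetical.

```python
# Minimum supported version for each Aurora PostgreSQL major version,
# copied from the prerequisites list above.
MINIMUM_VERSIONS = {
    16: (16, 1),
    15: (15, 4),
    14: (14, 9),
    13: (13, 12),
    12: (12, 16),
}


def parse_version(version: str) -> tuple:
    """Turn a string such as '13.12' into (13, 12) so it compares numerically."""
    return tuple(int(part) for part in version.split("."))


def is_supported_engine_version(version: str) -> bool:
    """Return True if the engine version meets the prerequisites (sketch)."""
    parsed = parse_version(version)
    major = parsed[0]
    if major > max(MINIMUM_VERSIONS):
        # Assumption: "16.1 and all higher versions" includes later major versions.
        return True
    minimum = MINIMUM_VERSIONS.get(major)
    if minimum is None:
        # Major versions older than 12 are not listed, so treat them as unsupported.
        return False
    return parsed >= minimum


print(is_supported_engine_version("15.4"))  # True
print(is_supported_engine_version("13.9"))  # False
```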
Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock
Follow the steps in the following sections to prepare Aurora PostgreSQL for use as a Knowledge Base for Amazon Bedrock.
Creating and configuring Aurora PostgreSQL
To configure Amazon Bedrock with an Aurora PostgreSQL DB cluster, you must first create an Aurora PostgreSQL DB cluster and note the fields that you need to configure it with Amazon Bedrock. For more information about creating an Aurora PostgreSQL DB cluster, see Creating and connecting to an Aurora PostgreSQL DB cluster.
Enable RDS Data API when you create the Aurora PostgreSQL DB cluster. For more information on the supported versions, see Using RDS Data API.
Note the Amazon Resource Name (ARN) of your Aurora PostgreSQL DB cluster. You need it to configure the DB cluster for use with Amazon Bedrock. For more information, see Amazon Resource Names (ARNs).
Connecting to a database and installing pgvector
You can connect to Aurora PostgreSQL using any of the connection utilities. For more detailed information on these utilities, see Connecting to an Amazon Aurora PostgreSQL DB cluster. Alternatively, you can use the RDS console query editor to run the queries. You need an Aurora DB cluster with the RDS Data API enabled to use the query editor.
Log in to the database as your master user and set up pgvector. Use the following command if the extension is not installed:
CREATE EXTENSION IF NOT EXISTS vector;
Use pgvector 0.5.0 or higher, which supports HNSW indexing. For more information, see pgvector v0.5.0 with HNSW indexing.
Use the following command to check the installed version of pgvector:
SELECT extversion FROM pg_extension WHERE extname='vector';
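The extversion query returns the version as text, such as 0.5.1. Comparing those strings directly is unreliable, because "0.10.0" sorts before "0.5.0" character by character. The following minimal Python sketch compares the parts numerically instead. The helper names are hypothetical. The 0.6.0 threshold comes from the ef_construction recommendation later in this section.

```python
def parse_extversion(extversion: str) -> tuple:
    """Parse a pgvector version string such as '0.5.1' into (0, 5, 1)."""
    parts = [int(part) for part in extversion.split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.5' to (0, 5, 0)
    return tuple(parts)


def supports_hnsw(extversion: str) -> bool:
    """HNSW indexing needs pgvector 0.5.0 or higher."""
    return parse_extversion(extversion) >= (0, 5, 0)


def supports_parallel_hnsw_build(extversion: str) -> bool:
    """Parallel HNSW index builds need pgvector 0.6.0 or higher."""
    return parse_extversion(extversion) >= (0, 6, 0)


print("0.10.0" < "0.5.0")                     # True: plain string comparison gets this wrong
print(supports_hnsw("0.10.0"))                # True
print(supports_parallel_hnsw_build("0.5.1"))  # False
```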
Setting up database objects and privileges
Create a specific schema that Bedrock can use to query the data. Use the following command to create a schema:
CREATE SCHEMA bedrock_integration;
Create a new role that Bedrock can use to query the database. Use the following command to create a new role:
CREATE ROLE bedrock_user WITH PASSWORD 'password' LOGIN;
Note
Make a note of this password, because you need it later to create a Secrets Manager secret.
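If you prefer to generate the password instead of choosing one, the following sketch uses Python's secrets module. It draws only from letters and digits. The quote_literal helper doubles any single quotes, as PostgreSQL string literals require, assuming the default standard_conforming_strings=on. All helper names are hypothetical. A password sent in a CREATE ROLE statement may appear in server logs. The psql \password approach described next avoids sending it as plain SQL text.

```python
import secrets
import string

# Letters and digits only, so the password never contains a quote character.
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int = 32) -> str:
    """Return a random password drawn from ALPHABET using a secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal by doubling embedded single quotes.

    Assumes standard_conforming_strings is on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def create_role_sql(role: str, password: str) -> str:
    """Build the CREATE ROLE statement; role must be a plain, trusted identifier."""
    return f"CREATE ROLE {role} WITH PASSWORD {quote_literal(password)} LOGIN;"


password = generate_password()
print(create_role_sql("bedrock_user", password))
```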
If you are using the psql client, you can instead create the role and then set its password at an interactive prompt:
CREATE ROLE bedrock_user LOGIN;
\password bedrock_user
Grant the bedrock_user role permissions to manage the bedrock_integration schema. This lets the role create tables or indexes within the schema:
GRANT ALL ON SCHEMA bedrock_integration TO bedrock_user;
Log in as the bedrock_user role and create a table in the bedrock_integration schema:
CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json);
This command creates the bedrock_kb table in the bedrock_integration schema. Replace n in the vector(n) data type with the appropriate dimension for the embedding model you are using. Use the following recommendations to select the dimension:
For the Titan v2 model, use vector(1024), vector(512), or vector(256). To learn more, see Amazon Titan Embeddings Text.
For the Titan v1.2 model, use vector(1536). To learn more, see Amazon Titan Multimodal Embeddings G1.
For the Cohere Embed model, use vector(1024). To learn more, see Cohere Embed models.
For the Cohere Embed Multilingual v3 model, use vector(1024).
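To avoid a mismatch between the embedding model and the vector(n) column, you can capture the dimension recommendations above in a lookup table. In this Python sketch, the dictionary keys are hypothetical labels, not Amazon Bedrock model IDs. Defaulting to the first listed dimension is an arbitrary choice.

```python
from typing import Optional

# Dimensions recommended above. The keys are hypothetical labels, not model IDs.
MODEL_DIMENSIONS = {
    "titan-v2": (1024, 512, 256),
    "titan-v1.2": (1536,),
    "cohere-embed": (1024,),
    "cohere-embed-multilingual-v3": (1024,),
}


def create_table_sql(model: str, dimension: Optional[int] = None,
                     table: str = "bedrock_integration.bedrock_kb") -> str:
    """Return the CREATE TABLE statement with vector(n) filled in for the model."""
    allowed = MODEL_DIMENSIONS[model]
    if dimension is None:
        dimension = allowed[0]  # arbitrary default: the first listed dimension
    if dimension not in allowed:
        raise ValueError(f"{model} expects one of {allowed}, got {dimension}")
    return (
        f"CREATE TABLE {table} (id uuid PRIMARY KEY, "
        f"embedding vector({dimension}), chunks text, metadata json);"
    )


print(create_table_sql("titan-v2", 512))
```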
We recommend that you create an index with the cosine operator, which Amazon Bedrock can use to query the data:
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);
For pgvector 0.6.0 and higher, which support parallel index building, we recommend that you set ef_construction to 256 when you create the index:
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops) WITH (ef_construction=256);
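The vector_cosine_ops operator class indexes cosine distance. pgvector's <=> operator computes this distance as 1 minus the cosine similarity, so a query such as ORDER BY embedding <=> query_embedding LIMIT k can use the HNSW index (query_embedding is a placeholder). The following Python sketch uses hypothetical function names. It computes the same distance, plus the exact top-k ranking that the HNSW index approximates.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity, the value pgvector's <=> operator computes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero-length vector has no direction; this sketch raises ZeroDivisionError.
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: Sequence[float],
            rows: List[Tuple[str, Sequence[float]]],
            k: int = 2) -> List[Tuple[str, Sequence[float]]]:
    """Exact top-k rows by cosine distance, the ordering an HNSW index approximates.

    Roughly what ORDER BY embedding <=> query LIMIT k returns in SQL.
    """
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print([row_id for row_id, _ in nearest([1.0, 0.1], rows)])  # ['a', 'c']
```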
Create a secret in Secrets Manager
Secrets Manager lets you store your Aurora credentials so that they can be securely transmitted to applications. If you didn't choose the AWS Secrets Manager option when creating the Aurora PostgreSQL DB cluster, you can create a secret now. For more information about creating an AWS Secrets Manager database secret, see AWS Secrets Manager database secret.
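A Secrets Manager database secret stores the credentials as a JSON string. The following Python sketch builds that string using the key names of the Amazon RDS PostgreSQL secret structure: engine, host, username, password, dbname, and port. The function name is hypothetical, and the host value is a placeholder for your cluster endpoint. You could pass the result as SecretString to the create_secret call of the boto3 Secrets Manager client. Check the AWS Secrets Manager database secret documentation for the exact fields your setup needs.

```python
import json


def database_secret_string(username: str, password: str, host: str,
                           dbname: str = "postgres", port: int = 5432) -> str:
    """Return a JSON SecretString using the RDS PostgreSQL secret key names."""
    return json.dumps({
        "engine": "postgres",
        "host": host,          # placeholder: your cluster endpoint
        "username": username,  # for example, bedrock_user
        "password": password,  # the password you noted when creating the role
        "dbname": dbname,
        "port": port,
    })


secret_string = database_secret_string(
    "bedrock_user", "example-password", "cluster-endpoint.example.com")
print(secret_string)
```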
Creating a Knowledge Base in the Bedrock console
While preparing Aurora PostgreSQL as the vector store for a Knowledge Base, gather the following details, which you need to provide in the Amazon Bedrock console:
Amazon Aurora DB cluster ARN – The ARN of your DB cluster.
Secret ARN – The ARN of the AWS Secrets Manager secret for your DB cluster.
Database name – The name of your database. For example, you can use the default database postgres.
Table name – We recommend that you provide a schema-qualified name when you create the table, using a command similar to the following:
CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json);
This command creates the bedrock_kb table in the bedrock_integration schema.
When creating the table, make sure to configure it with the columns and data types listed in the following table. You can use your preferred column names instead of those listed. Note the names you choose, because you need them when you set up the Knowledge Base.
Column name | Data type | Description
id | UUID primary key | Contains unique identifiers for each record.
chunks | Text | Contains the chunks of raw text from your data sources.
embedding | Vector | Contains the vector embeddings of the data sources.
metadata | JSON | Contains metadata required to carry out source attribution and to enable data ingestion and querying.
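If you create the Knowledge Base programmatically instead of in the console, the same details go into an RDS storage configuration. In this Python sketch, the key names follow our understanding of the RDS storage configuration in the Agents for Amazon Bedrock CreateKnowledgeBase API. This is an assumption, so verify the names against the API reference before you pass the dictionary to a client such as the boto3 bedrock-agent create_knowledge_base call. The function name and the example ARNs are hypothetical.

```python
def rds_storage_configuration(cluster_arn: str, secret_arn: str,
                              database: str = "postgres",
                              table: str = "bedrock_integration.bedrock_kb",
                              primary_key_field: str = "id",
                              vector_field: str = "embedding",
                              text_field: str = "chunks",
                              metadata_field: str = "metadata") -> dict:
    """Collect the details listed above into an RDS storage configuration (sketch)."""
    return {
        "type": "RDS",
        "rdsConfiguration": {
            "resourceArn": cluster_arn,          # Amazon Aurora DB cluster ARN
            "credentialsSecretArn": secret_arn,  # Secret ARN
            "databaseName": database,
            "tableName": table,                  # schema-qualified table name
            "fieldMapping": {                    # the column names you chose
                "primaryKeyField": primary_key_field,
                "vectorField": vector_field,
                "textField": text_field,
                "metadataField": metadata_field,
            },
        },
    }


config = rds_storage_configuration(
    "arn:aws:rds:us-east-1:123456789012:cluster:example-cluster",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret")
```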
With these details, you can now create a Knowledge Base in the Amazon Bedrock console. For more detailed information on setting up a vector index and creating a Knowledge Base, see Create a vector store in Amazon Aurora.
After adding Aurora as the vector store for your Knowledge Base, you can ingest your data sources for searching and querying. For more information, see Ingest your data sources into the Knowledge Base.