Using Aurora PostgreSQL as a knowledge base for Amazon Bedrock

You can use an Aurora PostgreSQL DB cluster as a knowledge base for Amazon Bedrock. For more information, see Create a vector store in Amazon Aurora. A knowledge base automatically takes unstructured text data stored in an Amazon S3 bucket, converts it to text chunks and vector embeddings, and stores them in a PostgreSQL database. In your generative AI applications, you can use Agents for Amazon Bedrock to query the data stored in the knowledge base and use the query results to augment answers provided by foundation models. This workflow is called Retrieval Augmented Generation (RAG). For more information on RAG, see Retrieval Augmented Generation (RAG).

For detailed information about building generative AI applications with Aurora PostgreSQL and RAG, see this blog post.

Prerequisites

Familiarize yourself with the following prerequisites for using an Aurora PostgreSQL DB cluster as a knowledge base for Amazon Bedrock. At a high level, you need to configure the following services for use with Bedrock:

  • An Amazon Aurora PostgreSQL DB cluster running any of the following versions:

    • 16.1 and higher versions

    • 15.4 and higher versions

    • 14.9 and higher versions

    • 13.12 and higher versions

    • 12.16 and higher versions

    Note

    You must enable the pgvector extension in your target database and use version 0.5.0 or higher. For more information, see pgvector v0.5.0 with HNSW indexing.

  • RDS Data API

  • A user managed in AWS Secrets Manager. For more information, see Password management with Amazon Aurora and AWS Secrets Manager.

Preparing Aurora PostgreSQL to be used as a knowledge base for Amazon Bedrock

Follow the steps in the following sections to prepare an Aurora PostgreSQL DB cluster for use as a knowledge base for Amazon Bedrock.

Creating and configuring Aurora PostgreSQL

To configure Amazon Bedrock with an Aurora PostgreSQL DB cluster, you must first create the DB cluster and take note of the fields needed to configure it with Amazon Bedrock. For more information about creating an Aurora PostgreSQL DB cluster, see Creating and connecting to an Aurora PostgreSQL DB cluster.

  • Enable the RDS Data API while creating the Aurora PostgreSQL DB cluster; a quick way to confirm that it is working is sketched after this list. For more information on the versions supported, see Using RDS Data API.

  • Make sure to note down the Amazon Resource Name (ARN) of your Aurora PostgreSQL DB cluster. You'll need it to configure the DB cluster for use with Amazon Bedrock. For more information, see Amazon Resource Names (ARNs).
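
You can confirm that the RDS Data API is enabled by running a trivial query through it. The following is a minimal sketch using boto3 (the AWS SDK for Python); the cluster ARN, secret ARN, and database name are placeholders that you replace with your own values.

    import boto3

    # Placeholders: replace with your cluster ARN and Secrets Manager secret ARN.
    CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"
    SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret"

    client = boto3.client("rds-data")

    # A trivial query; this call fails if the Data API is not enabled.
    response = client.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database="postgres",
        sql="SELECT version();",
    )
    print(response["records"][0][0]["stringValue"])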

Connecting to a database and installing pgvector

You can connect to Aurora PostgreSQL using any of the standard connection utilities. For more information about these utilities, see Connecting to an Amazon Aurora PostgreSQL DB cluster. Alternatively, you can use the RDS console query editor to run the queries. To use the query editor, you need an Aurora DB cluster with the RDS Data API enabled.

  1. Log in to the database as your master user and set up pgvector. Use the following command if the extension is not already installed:

    CREATE EXTENSION IF NOT EXISTS vector;

    Use pgvector version 0.5.0 or higher, which supports HNSW indexing. For more information, see pgvector v0.5.0 with HNSW indexing.

  2. Use the following command to check the version of pgvector installed:

    SELECT extversion FROM pg_extension WHERE extname='vector';

Setting up database objects and privileges

  1. Create a specific schema that Bedrock can use to query the data. Use the following command to create a schema:

    CREATE SCHEMA bedrock_integration;
  2. Create a new role that Bedrock can use to query the database. Use the following command to create a new role:

    CREATE ROLE bedrock_user WITH PASSWORD 'password' LOGIN;
    Note

    Make a note of this password as you will need it later to create a Secrets Manager password.

    If you are using the psql client, use the following commands to create the role and set its password:

    CREATE ROLE bedrock_user LOGIN;
    \password bedrock_user
  3. Grant bedrock_user permission to manage the bedrock_integration schema. This provides the ability to create tables or indexes within the schema.

    GRANT ALL ON SCHEMA bedrock_integration to bedrock_user;
  4. Log in as bedrock_user and create a table in the bedrock_integration schema.

    CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json);

    This command creates the bedrock_kb table in the bedrock_integration schema.

    Replace n in the vector(n) data type with the dimension of the embedding model you are using. For example, Amazon Titan Embeddings G1 - Text produces 1536-dimensional vectors, and Cohere Embed (English or Multilingual) v3 produces 1024-dimensional vectors.

  5. We recommend that you create an index with the cosine operator, which Bedrock can use to query the data.

    CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);
  6. For pgvector 0.6.0 and higher versions, which use parallel index building, we recommend setting the value of ef_construction to 256. An optional way to verify the index is sketched after these steps.

    CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops) WITH (ef_construction=256);
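
To confirm that the HNSW index exists, you can query pg_indexes, either from your SQL client or through the Data API. The following optional sketch uses boto3 with placeholder ARNs.

    import boto3

    client = boto3.client("rds-data")

    # Placeholders: use your own cluster and secret ARNs.
    response = client.execute_statement(
        resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
        secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret",
        database="postgres",
        sql="SELECT indexname, indexdef FROM pg_indexes "
            "WHERE schemaname = 'bedrock_integration';",
    )
    for row in response["records"]:
        print(row[0]["stringValue"], "-", row[1]["stringValue"])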

Creating a secret in Secrets Manager

Secrets Manager lets you store your Aurora credentials so that they can be securely transmitted to applications. If you didn't choose the AWS Secrets Manager option when creating the Aurora PostgreSQL DB cluster, you can create a secret now. For more information about creating an AWS Secrets Manager database secret, see AWS Secrets Manager database secret.
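
If you prefer to script this step, the following is a minimal sketch using boto3 that stores the bedrock_user credentials as a secret. The secret name is a hypothetical placeholder, and the key layout is assumed here to be the standard username/password JSON used for database secrets.

    import json

    import boto3

    client = boto3.client("secretsmanager")

    # Placeholders: secret name, plus the password from the CREATE ROLE step.
    response = client.create_secret(
        Name="bedrock-aurora-credentials",  # hypothetical secret name
        SecretString=json.dumps({
            "username": "bedrock_user",
            "password": "password",
        }),
    )
    print(response["ARN"])  # note this Secret ARN for the Bedrock console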

Creating a knowledge base in the Bedrock console

While preparing Aurora PostgreSQL to be used as the vector store for a knowledge base, gather the following details, which you provide in the Amazon Bedrock console.

  • Amazon Aurora DB cluster ARN – The ARN of your DB cluster.

  • Secret ARN – The ARN of the AWS Secrets Manager secret that stores the credentials for your DB cluster.

  • Database name – The name of your database. For example, you can use the default database postgres.

  • Table name – We recommend that you provide a schema-qualified table name, matching the command you used to create the table earlier:

    CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json);

    This command creates the bedrock_kb table in the bedrock_integration schema, so the table name to provide is bedrock_integration.bedrock_kb.

  • When creating the table, make sure to configure it with the columns and data types shown in the following table. You can use your preferred column names instead of those listed; if you do, take note of the names you chose for reference during knowledge base setup.

    Column name | Data type        | Description
    id          | UUID primary key | Contains unique identifiers for each record.
    chunks      | Text             | Contains the chunks of raw text from your data sources.
    embedding   | Vector           | Contains the vector embeddings of the data sources.
    metadata    | JSON             | Contains metadata required to carry out source attribution and to enable data ingestion and querying.

With these details, you can now create a knowledge base in the Bedrock console. For more detailed information on setting up a vector index and creating a knowledge base, see Create a vector store in Amazon Aurora.
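
If you prefer to create the knowledge base programmatically instead of through the console, the following is a minimal sketch using the boto3 bedrock-agent client. The knowledge base name, IAM role ARN, embedding model, and resource ARNs are placeholder assumptions; the field mapping must match the column names you created earlier.

    import boto3

    client = boto3.client("bedrock-agent")

    # Placeholders: replace the ARNs, names, and embedding model with your values.
    response = client.create_knowledge_base(
        name="aurora-postgres-kb",
        roleArn="arn:aws:iam::123456789012:role/BedrockKnowledgeBaseRole",
        knowledgeBaseConfiguration={
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": (
                    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
                ),
            },
        },
        storageConfiguration={
            "type": "RDS",
            "rdsConfiguration": {
                "resourceArn": "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster",
                "credentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret",
                "databaseName": "postgres",
                "tableName": "bedrock_integration.bedrock_kb",
                "fieldMapping": {
                    "primaryKeyField": "id",
                    "vectorField": "embedding",
                    "textField": "chunks",
                    "metadataField": "metadata",
                },
            },
        },
    )
    print(response["knowledgeBase"]["knowledgeBaseId"])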

After adding Aurora as your knowledge base, you can now ingest your data sources for searching and querying. For more information, see Ingest your data sources into the knowledge base.
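
Ingestion can also be started programmatically once you have created an S3 data source for the knowledge base. The following is a brief sketch with placeholder IDs, assuming the knowledge base and data source already exist.

    import boto3

    client = boto3.client("bedrock-agent")

    # Placeholders: the IDs returned when you created the knowledge base
    # and its data source.
    response = client.start_ingestion_job(
        knowledgeBaseId="KBID12345",
        dataSourceId="DSID12345",
    )
    print(response["ingestionJob"]["status"])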