

# Using Aurora PostgreSQL as a Knowledge Base for Amazon Bedrock
<a name="AuroraPostgreSQL.VectorDB"></a>

You can use an Aurora PostgreSQL DB cluster as a Knowledge Base for Amazon Bedrock. For more information, see [Create a vector store in Amazon Aurora](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html). A Knowledge Base automatically takes unstructured text data stored in an Amazon S3 bucket, converts it to text chunks and vectors, and stores them in a PostgreSQL database. In generative AI applications, you can use Agents for Amazon Bedrock to query the data stored in the Knowledge Base and use the results of those queries to augment answers provided by foundation models. This workflow is called Retrieval Augmented Generation (RAG). For more information on RAG, see [Retrieval Augmented Generation (RAG)](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html).

For detailed information about using Aurora PostgreSQL to build generative AI applications using RAG, see this [blog post](https://aws.amazon.com/blogs/database/build-generative-ai-applications-with-amazon-aurora-and-knowledge-bases-for-amazon-bedrock/).

**Topics**
+ [Prerequisites](#AuroraPostgreSQL.VectorDB.Prereq)
+ [Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock](#AuroraPostgreSQL.VectorDB.PreparingKB)
+ [Creating a Knowledge Base in the Bedrock console](#AuroraPostgreSQL.VectorDB.CreatingKB)
+ [Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock](AuroraPostgreSQL.quickcreatekb.md)

## Prerequisites
<a name="AuroraPostgreSQL.VectorDB.Prereq"></a>

Familiarize yourself with the following prerequisites for using an Aurora PostgreSQL DB cluster as a Knowledge Base for Amazon Bedrock. At a high level, you need to configure the following services for use with Bedrock:
+ An Amazon Aurora PostgreSQL DB cluster running one of the following versions:
  + 16.1 and higher
  + 15.4 and higher
  + 14.9 and higher
  + 13.12 and higher
  + 12.16 and higher
**Note**  
You must enable the `pgvector` extension in your target database and use version 0.5.0 or higher. For more information, see [pgvector v0.5.0 with HNSW indexing](https://aws.amazon.com/about-aws/whats-new/2023/10/amazon-aurora-postgresql-pgvector-v0-5-0-hnsw-indexing/). 
+ RDS Data API
+ A user managed in AWS Secrets Manager. For more information, see [Password management with Amazon Aurora and AWS Secrets Manager](rds-secrets-manager.md).
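
Before configuring anything, you can confirm which `pgvector` version your cluster makes available. This query is a sketch to run after connecting to your database; it reads the standard `pg_available_extensions` catalog view and requires no extension to be installed yet:

```
SELECT name, default_version FROM pg_available_extensions WHERE name = 'vector';
```

If `default_version` reports 0.5.0 or higher, your cluster meets the `pgvector` prerequisite.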

## Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock
<a name="AuroraPostgreSQL.VectorDB.PreparingKB"></a>

Follow the steps in the following sections to prepare an Aurora PostgreSQL DB cluster for use as a Knowledge Base for Amazon Bedrock.

### Creating and configuring Aurora PostgreSQL
<a name="AuroraPostgreSQL.VectorDB.CreatingDBC"></a>

To configure Amazon Bedrock with an Aurora PostgreSQL DB cluster, you must first create the DB cluster and take note of the fields that are important for configuring it with Amazon Bedrock. For more information about creating an Aurora PostgreSQL DB cluster, see [Creating and connecting to an Aurora PostgreSQL DB cluster](CHAP_GettingStartedAurora.CreatingConnecting.AuroraPostgreSQL.md).
+ Enable the RDS Data API while creating the Aurora PostgreSQL DB cluster. For more information on the versions supported, see [Using the Amazon RDS Data API](data-api.md).
+ Make sure to note down the Amazon Resource Name (ARN) of your Aurora PostgreSQL DB cluster. You'll need it to configure the DB cluster for use with Amazon Bedrock. For more information, see [Amazon Resource Names (ARNs)](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_Tagging.ARN.html).

### Connecting to a database and installing pgvector
<a name="AuroraPostgreSQL.VectorDB.ConnectingDB"></a>

You can connect to Aurora PostgreSQL using any of the connection utilities. For more detailed information on these utilities, see [Connecting to an Amazon Aurora PostgreSQL DB cluster](Aurora.Connecting.md#Aurora.Connecting.AuroraPostgreSQL). Alternatively, you can use the RDS console query editor to run the queries. You need an Aurora DB cluster with the RDS Data API enabled to use the query editor.
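
If you prefer the RDS Data API to a direct database connection, you can run the same statements with the AWS CLI. The following is an illustrative sketch only; the ARNs are placeholders for your own cluster ARN and Secrets Manager secret ARN:

```
aws rds-data execute-statement \
    --resource-arn "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster" \
    --secret-arn "arn:aws:secretsmanager:us-east-1:123456789012:secret:my-secret" \
    --database postgres \
    --sql "SELECT 1"
```

Each SQL statement in the following steps can be submitted this way through the `--sql` parameter.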

1. Log in to the database with your master user and set up pgvector. Use the following command if the extension is not installed:

   ```
   CREATE EXTENSION IF NOT EXISTS vector;
   ```

   Use `pgvector` version 0.5.0 or higher, which supports HNSW indexing. For more information, see [pgvector v0.5.0 with HNSW indexing](https://aws.amazon.com/about-aws/whats-new/2023/10/amazon-aurora-postgresql-pgvector-v0-5-0-hnsw-indexing/).

1. Use the following command to check the version of `pgvector` installed:

   ```
   SELECT extversion FROM pg_extension WHERE extname='vector';
   ```
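
If the reported version is lower than 0.5.0, you can upgrade the extension in place rather than dropping and recreating it. This sketch updates `vector` to the latest version packaged with your Aurora PostgreSQL release:

```
ALTER EXTENSION vector UPDATE;
```

Rerun the `SELECT extversion` query afterward to confirm the new version.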

### Setting up database objects and privileges
<a name="AuroraPostgreSQL.VectorDB.SetupDBObjects"></a>

1. Create a specific schema that Bedrock can use to query the data. Use the following command to create a schema:

   ```
   CREATE SCHEMA bedrock_integration;
   ```

1. Create a new role that Bedrock can use to query the database. Use the following command to create a new role:

   ```
   CREATE ROLE bedrock_user WITH PASSWORD 'password' LOGIN;
   ```
**Note**  
Make a note of this password as you will need it later to create a Secrets Manager password.

   If you are using the `psql` client, use the following commands to create a new role and set its password. The `\password` meta-command prompts you for the password:

   ```
   CREATE ROLE bedrock_user LOGIN;
   \password bedrock_user
   ```

1. Grant the `bedrock_user` permissions to manage the `bedrock_integration` schema. This will provide the ability to create tables or indexes within the schema.

   ```
   GRANT ALL ON SCHEMA bedrock_integration to bedrock_user;
   ```

1. Log in as `bedrock_user` and create a table in the `bedrock_integration` schema.

   ```
   CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json, custom_metadata jsonb);
   ```

   This command creates the `bedrock_kb` table in the `bedrock_integration` schema, with an `embedding` column sized for the embedding model you choose.

   Replace `n` in the `vector(n)` data type with the appropriate dimension for the embedding model you are using. Use the following recommendations to help select your dimensions:
   + For the Titan v2 model, use `vector(1024)`, `vector(512)`, or `vector(256)`. To learn more, see [Amazon Titan Embeddings Text](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html).
   + For the Titan v1.2 model, use `vector(1536)`. To learn more, see [Amazon Titan Multimodal Embeddings G1](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-mm.html).
   + For the Cohere Embed model, use `vector(1024)`. To learn more, see [Cohere Embed models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed.html).
   + For the Cohere Embed Multilingual v3 model, use `vector(1024)`.

   The first four columns are mandatory. For metadata handling, Bedrock writes data from your metadata files to the `custom_metadata` column. We recommend creating this column if you plan to use metadata and filtering. If you don't create a `custom_metadata` column, add individual columns for each metadata attribute in your table before you begin ingestion. For more information, see [Configure and customize queries and response generation](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).

1. Follow these steps to create the required indexes that Bedrock uses to query your data:
   + Create an index with the cosine operator, which Bedrock can use to query the data.

     ```
     CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);
     ```
   + For `pgvector` 0.6.0 and higher versions, which support parallel index builds, we recommend setting `ef_construction` to 256:

     ```
     CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops) WITH (ef_construction=256);
     ```
   + Create an index which Bedrock can use to query the text data.

     ```
     CREATE INDEX ON bedrock_integration.bedrock_kb USING gin (to_tsvector('simple', chunks));
     ```
   + If you created a column for custom metadata, create an index which Bedrock can use to query the metadata.

     ```
     CREATE INDEX ON bedrock_integration.bedrock_kb USING gin (custom_metadata);
     ```
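
To get a feel for what the cosine operator class measures, you can evaluate pgvector's cosine distance operator (`<=>`) directly. In this sketch, which assumes the `vector` extension is installed, identical vectors have a distance of 0 and orthogonal vectors have a distance of 1:

```
SELECT '[1,0,0]'::vector <=> '[1,0,0]'::vector AS same_direction,  -- 0
       '[1,0,0]'::vector <=> '[0,1,0]'::vector AS orthogonal;      -- 1
```

The HNSW index created above lets similarity queries over this operator run efficiently at scale.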

### Create a secret in Secrets Manager
<a name="AuroraPostgreSQL.VectorDB.SecretManager"></a>

Secrets Manager lets you store your Aurora credentials so that they can be securely transmitted to applications. If you didn't choose the AWS Secrets Manager option when creating the Aurora PostgreSQL DB cluster, you can create a secret now. For more information about creating an AWS Secrets Manager database secret, see [AWS Secrets Manager database secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_database_secret.html).
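
If you create the secret yourself, it must contain the username and password of the database user you created for Bedrock. The following AWS CLI sketch is illustrative only; the secret name is a placeholder, and the password is the one you set for `bedrock_user`:

```
aws secretsmanager create-secret \
    --name bedrock-aurora-kb-secret \
    --secret-string '{"username": "bedrock_user", "password": "password"}'
```

Note the ARN returned by this command; you supply it as the secret ARN when creating the Knowledge Base.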

## Creating a Knowledge Base in the Bedrock console
<a name="AuroraPostgreSQL.VectorDB.CreatingKB"></a>

While preparing Aurora PostgreSQL to be used as the vector store for a Knowledge Base, gather the following details, which you must provide in the Amazon Bedrock console.
+ **Amazon Aurora DB cluster ARN** – The ARN of your DB cluster.
+ **Secret ARN** – The ARN of the AWS Secrets Manager secret that holds the credentials for your DB cluster.
+ **Database name** – The name of your database. For example, you can use the default database *postgres*.
+ **Table name** – We recommend providing a schema-qualified table name, created using a command similar to the following:

  ```
  CREATE TABLE bedrock_integration.bedrock_kb (id uuid PRIMARY KEY, embedding vector(n), chunks text, metadata json, custom_metadata jsonb);
  ```

  This command will create the `bedrock_kb` table in the `bedrock_integration` schema.
+ When creating the table, make sure to configure it with the columns and data types described earlier. You can use your preferred column names instead of the example names; just take a note of the names you chose for reference during the Knowledge Base setup.

With these details, you can now create a Knowledge Base in the Bedrock console. For more detailed information on setting up a vector index and creating a Knowledge Base, see [Create a vector store in Amazon Aurora](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup-rds.html) and [Create a knowledge base](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-create.html).

After adding Aurora as your Knowledge Base, you can now ingest your data sources for searching and querying. For more information, see [Ingest your data sources into the Knowledge Base](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ingest.html).
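
Under the hood, retrieval against this vector store is a nearest-neighbor query over the `embedding` column. The following is an illustrative sketch only; the three-dimensional literal stands in for a real query embedding, which must match the dimension of your `vector(n)` column:

```
SELECT id, chunks, metadata
FROM bedrock_integration.bedrock_kb
ORDER BY embedding <=> '[0.11, 0.52, 0.83]'::vector
LIMIT 5;
```

Bedrock issues equivalent queries for you through the RDS Data API whenever your Knowledge Base is queried.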

# Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock
<a name="AuroraPostgreSQL.quickcreatekb"></a>

 Amazon Bedrock's retrieval augmented generation (RAG) workflow relies on vector data stored in an Aurora PostgreSQL database to power content retrieval. Previously, setting up Aurora PostgreSQL as the vector data store for Bedrock Knowledge Bases was a multi-step process, requiring numerous manual actions across different user interfaces. This made it challenging for data scientists and developers to leverage Aurora for their Bedrock projects. 

 To improve the user experience, AWS has created a new CloudFormation-based quick create option that simplifies the setup process. With Aurora quick create, you can now provision a pre-configured Aurora PostgreSQL DB cluster as the vector store for your Amazon Bedrock Knowledge Bases with a single click. 

**Topics**
+ [Supported regions and Aurora PostgreSQL versions](#AuroraPostgreSQL.quickcreatekb.avail)
+ [Understanding the quick create process](#AuroraPostgreSQL.quickcreatekb.using)
+ [Benefits of using Aurora quick create](#AuroraPostgreSQL.quickcreatekb.adv)
+ [Limitations of Aurora quick create process](#AuroraPostgreSQL.quickcreatekb.limit)

## Supported regions and Aurora PostgreSQL versions
<a name="AuroraPostgreSQL.quickcreatekb.avail"></a>

The Aurora quick create option is available in all AWS Regions that support Amazon Bedrock Knowledge Bases. By default, it creates an Aurora PostgreSQL DB cluster running version 15.7. For more information about supported Regions, see [Supported models and regions for Amazon Bedrock Knowledge Bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html).

## Understanding the quick create process
<a name="AuroraPostgreSQL.quickcreatekb.using"></a>

The quick create process automatically provisions the following resources to set up an Amazon Aurora PostgreSQL database as the vector data store for your Bedrock Knowledge Base:

An Aurora PostgreSQL DB cluster is created in your account, configured with default settings:
+ ACUs (Aurora Capacity Units) are set from 0 to 16. This lets your vector store scale down to zero when not in use, saving on compute costs. The ACUs can be adjusted later in the Amazon RDS console.
+ A Hierarchical Navigable Small World (HNSW) index uses Euclidean distance as the similarity measure for the Bedrock vector embeddings stored in Aurora.
+ The DB instance is an Aurora Serverless v2 instance.
+ The cluster is associated with the default VPC and subnets, and has the RDS Data API enabled.
+ The cluster admin credentials are managed by AWS Secrets Manager.

Besides the default settings, the following are set up for you. As you go through the process, you'll see screens that explain the workflow.
+ Seeding the Aurora cluster with the necessary database objects:
  + Creating the pgvector extension, schema, role, and tables required for the Bedrock Knowledge Base.
  + Registering a limited-privilege database user for Bedrock to interact with the cluster.
+  A progress banner will be displayed throughout the resource provisioning process, allowing you to track the status of the following sub-events: 
  + Aurora cluster creation
  + Seeding the Aurora cluster
  + Knowledge Base creation

  The banner stays visible until the knowledge base is fully created, even if you navigate away from the page and return.
+ You can choose **View details** on the progress banner to see the status of each step. For more information about events during knowledge base creation, choose the CloudFormation link in the view details screen. Once the process is complete, your new Bedrock Knowledge Base will be ready to use.
+ The stack IDs for all the quick create resources can be found in the tags of the Bedrock Knowledge Base, should you need to reference them.

A Bedrock Knowledge Base is then created, configured with the newly provisioned Aurora cluster as its vector store.

## Benefits of using Aurora quick create
<a name="AuroraPostgreSQL.quickcreatekb.adv"></a>
+ The CloudFormation-based quick create process significantly reduces the time and complexity required to use Aurora as the vector store.
+ Aurora offers excellent performance, vector scalability, and cost benefits, with the ability to scale to zero compute charges when not in use.
+ The quick create process streamlines the end-to-end experience, allowing you to easily create and configure your Bedrock Knowledge Bases using Aurora.
+ Customers can build upon the CloudFormation template to customize the provisioning with their own configurations.

## Limitations of Aurora quick create process
<a name="AuroraPostgreSQL.quickcreatekb.limit"></a>
+ With the Aurora quick create option, the DB cluster is provisioned with default configurations, which may not meet your specific requirements or intended use case. Quick create does not offer options to modify these configurations during provisioning; they are set automatically to streamline the deployment experience. If you need to customize the Aurora DB cluster configuration, you can do so after the initial quick create deployment in the Amazon RDS console.
+ While the quick create flow simplifies the setup process, the time to create the Aurora DB cluster is still approximately 10 minutes, the same as a manual deployment. This is due to the time required to provision the Aurora infrastructure.
+ The quick create option is designed for experimentation and quick setup. The resources created through quick create may not be suitable for production use, and you won't be able to directly migrate them to a production environment in your VPC.