Create a knowledge base connected to a structured data store
Amazon Bedrock Knowledge Bases is able to access data stored in the following structured data stores through the Amazon Redshift query engine:
Amazon Redshift
AWS Glue Data Catalog (AWS Lake Formation)
To create a knowledge base and connect it to a structured data store, you do the following:
-
Select a query engine and choose the option that corresponds to your data warehouse:
-
Amazon Redshift Provisioned
-
Amazon Redshift Serverless
-
-
Select one of the following authentication methods to use the query engine:
-
IAM – Authenticate with IAM credentials.
-
Temporary credentials user name – Authenticate through a database user name (not available for Redshift Serverless).
-
AWS Secrets Manager – Authenticate through a secret.
-
-
Select Amazon Redshift Databases or AWS Glue Data Catalog as your data store.
-
(Optional) Set query configurations to improve the accuracy of SQL generation:
-
Maximum query time – The amount of time after which the query times out.
-
Descriptions – Provides metadata or supplementary information about tables or columns. You can include descriptions of the tables or columns, usage notes, or any additional attributes. The descriptions you add can improve SQL query generation by providing extra context and information about the structure of the tables or columns.
-
Inclusions and Exclusions– Specifies a set of tables or columns to be included or excluded for SQL generation. This field is crucial if you want to limit the scope of SQL queries to a defined subset of available tables or columns. This option can help optimize the generation process by reducing unnecessary table or column references.
If you specify inclusions, all other tables and columns are ignored. If you specify exclusions, the tables and columns you specify are ignored.
Note
Inclusions and exclusions aren't a substitute for guardrails and is only intended for improving model accuracy.
-
Curated queries – A set of predefined question and answer examples. Questions are written as natural language queries (NLQ) and answers are the corresponding SQL query. These examples help the SQL generation process by providing examples of the kinds of queries that should be generated. They serve as reference points to improve the accuracy and relevance of generative SQL outputs.
-
To connect to a structured data store, send a CreateKnowledgeBase request with an Agents for Amazon Bedrock build-time endpoint with the following general request body:
{ "name": "string", "roleArn": "string", "knowledgeBaseConfiguration": { "type": "SQL", "sqlKnowledgeBaseConfiguration": SqlKnowledgeBaseConfiguration }, "description": "string", "clientToken": "string", "tags": { "string": "string" } }
The following fields are required.
Field | Basic description |
---|---|
Name | A name for the knowledge base |
roleArn | A knowledge base service role with the proper permissions. You can use the console to automatically create a service role with the proper permissions. |
knowledgeBaseConfiguration | Contains configurations for the knowledge base. For a structured database, specify SQL as the type and include the sqlKnowledgeBaseConfiguration field. |
The following fields are optional.
Field | Use |
---|---|
description | To include a description for the knowledge base. |
clientToken | To ensure the API request completes only once. For more information, see Ensuring idempotency. |
tags | To associate tags with the flow. For more information, see Tagging Amazon Bedrock resources. |
The SQLKnowledgeBaseConfiguration
depends on the query engine that you use. For Amazon Redshift, specify the type
field as REDSHIFT
and include the redshiftConfiguration
field, which maps to a RedshiftConfiguration. For the RedshiftConfiguration, you configure the following fields:
You can configure the following types of query engine:
If your Amazon Redshift databases are provisioned on dedicated compute nodes, the value of the queryEngineConfiguration
field should be a RedshiftQueryEngineConfiguration in the following format:
{ "type": "PROVISIONED", "provisionedConfiguration": { "clusterIdentifier": "string", "authConfiguration": RedshiftProvisionedAuthConfiguration }, }
Specify the ID of the cluster in the clusterIdentifier
field. The RedshiftProvisionedAuthConfiguration depends on the type of authorization you're using. Select the tab that matches your authorization method:
If you're using Amazon Redshift Serverless, the value of the queryConfiguration
field should be a RedshiftQueryEngineConfiguration in the following format:
{ "type": "SERVERLESS", "serverlessConfiguration": { "workgroupArn": "string", "authConfiguration": } }
Specify the ARN of your workgroup in the workgroupArn
field. The RedshiftServerlessAuthConfiguration depends on the type of authorization you're using. Select the tab that matches your authorization method:
This field maps to an array containing a single RedshiftQueryEngineStorageConfiguration, whose format depends on where your data is stored.
If your data is stored in AWS Glue Data Catalog, the
RedshiftQueryEngineStorageConfiguration
should be in the
following format:
{ "type": "AWS_DATA_CATALOG", "awsDataCatalogConfiguration": { "tableNames": ["string"] } }
Add the name of each table that you want to connect your knowledge base to in the array that tableNames
maps to.
Note
Enter table names in the pattern described in Cross-database queries (${databaseName}.${tableName}
). You can include all tables by specifying ${databaseName.*}
.
If your data is stored in an Amazon Redshift database, the RedshiftQueryEngineStorageConfiguration
should be in the following format:
{ "type": "string", "redshiftConfiguration": { "databaseName": "string" } }
Specify the name of your Amazon Redshift database in the databaseName
field.
Note
Enter table names in the pattern described in Cross-database queries (${databaseName}.${tableName}
). You can include all tables by specifying ${databaseName.*}
.
If your database is mounted through Amazon SageMaker AI Lakehouse, the database name is in the format ${db}@${schema}
.
This field maps to the following QueryGenerationConfiguration that you can use to configure how your data is queried:
{ "executionTimeoutSeconds": number, "generationContext": { "tables": [ { "name": "string", "description": "string", "inclusion": "string", "columns": [ { "name": "string", "description": "string", "inclusion": "string" }, ... ] }, ... ], "curatedQueries": [ { "naturalLanguage": "string", "sql": "string" }, ... ] } }
If you want the query to time out, specify the timeout duration in seconds in the executionTimeoutSeconds
field.
The generationContext
field maps to a QueryGenerationContext object in
which you can configure as many of the following options as you need.
Important
If you include a generation context, the query engine makes a best effort attempt to apply it when generating SQL. The generation context is non-deterministic and is only intended for improving model accuracy. To ensure accuracy, verify the generated SQL queries.
For information about generation contexts that you can include, expand the following sections:
To improve the accuracy of SQL generation for querying the database, you can provide a description for the table or column that provides more context than a short table or column name. You can do the following:
-
To add a description for a table, include a QueryGenerationTable object in the
tables
array. In that object, specify the name of the table in thename
field and a description in thedescription
field, as in the following example:{ "name": "database.schema.tableA", "description": "Description for Table A" }
-
To add a description for a column, include a QueryGenerationTable object in the
tables
array. In that object, specify the name of the table in thename
field and include thecolumns
field, which maps to an array of QueryGenerationColumn. In aQueryGenerationColumn
object, include the name of the column in thename
field and a description in thedescription
field, as in the following example:{ "name": "database.schema.tableA.columnA", "columns": [ { "name": "Column A", "description": "Description for Column A" } ] }
-
You can add a description for both a table and a column in it, as in the following example:
{ "name": "database.schema.tableA", "description": "Description for Table A", "columns": [ { "name": "database.schema.tableA.columnA", "description": "Description for Column A" } ] }
Note
Enter table and column names in the pattern described in Cross-database queries. If your database is in AWS Glue Data Catalog, the format is
awsdatacatalog.gluedatabase.table
.
You can suggest tables or columns to include or exclude when generating SQL
by using the inclusion
field in the QueryGenerationTable and
QueryGenerationColumn objects. You can specify one of the following values in the
inclusion
field:
-
INCLUDE – Only the tables or columns that you specify are included as context when generating SQL.
-
EXCLUDE – The tables or columns that you specify are excluded as context when generating SQL.
You can specify whether to include or exclude tables or columns in the following ways:
-
To include or exclude a table, include a QueryGenerationTable object in the
tables
array. In that object, specify the name of the table in thename
field and whether to include or exclude it in theinclusion
field, as in the following example:{ "name": "database.schema.tableA", "inclusion": "EXCLUDE" }
The query engine doesn't add
Table A
in the additional context for generating SQL. -
To include or exclude a column, include a QueryGenerationTable object in the
tables
array. In that object, specify the name of the table in thename
field and include thecolumns
field, which maps to an array of QueryGenerationColumn. In aQueryGenerationColumn
object, include the name of the column in thename
field and whether to include or exclude it in theinclusion
field, as in the following example:{ "name": "database.schema.tableA", "columns": [ { "name": "database.schema.tableA.columnA", "inclusion": "EXCLUDE" } ] }
The SQL generation ignores
Column A
inTable A
in the context when generating SQL. -
You can combine tables and columns when specifying inclusions or exclusions, as in the following example:
{ "name": "database.schema.tableA", "inclusion": "INCLUDE", "columns": [ { "name": "database.schema.tableA.columnA", "inclusion": "EXCLUDE" } ] }
SQL generation includes
Table A
, but excludesColumn A
within it when adding context for generating SQL.
Important
Table and column exclusions aren't substitutes for guardrails. These table and column inclusions and exclusions are used as additional context for model to consider when generating SQL.
To improve a query engine's accuracy in converting user queries into
SQL queries, you can provide it examples in the curatedQueries
field in the QueryGenerationContext object, which maps to an array of
CuratedQuery objects. Each object contains the following fields:
-
naturalLanguage – An example of a query in natural language.
-
sql – The SQL query that corresponds to the natural language query.