

# Build a knowledge base by connecting to a data source
<a name="knowledge-base-build"></a>

Amazon Bedrock Knowledge Bases supports a variety of file types stored in data sources. To interpret the data from a data source, Amazon Bedrock Knowledge Bases converts it into vector embeddings, numerical representations of the data. These embeddings can be compared to the vector representation of a query to assess similarity and determine which sources to return during data retrieval.
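The similarity comparison described above is typically a cosine or Euclidean distance between vectors. A minimal illustration (the four-dimensional vectors here are toy values, not real model output; actual embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a query and two documents.
query = [0.1, 0.9, 0.0, 0.2]
doc_about_cats = [0.2, 0.8, 0.1, 0.3]
doc_about_tax = [0.9, 0.0, 0.7, 0.1]

# The document whose embedding is most similar to the query is returned first.
assert cosine_similarity(query, doc_about_cats) > cosine_similarity(query, doc_about_tax)
```
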

Connecting your knowledge base to a data source involves the following general steps:

1. Connect the knowledge base to a supported data source.

1. If your data source contains multimodal data, including images, audio, and video files, you must choose an appropriate processing approach and embedding model that supports multimodal content.
**Note**  
Multimodal data is only supported with Amazon S3 and custom data sources. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

1. Choose an embeddings model to convert the data in the data source into vector embeddings.

1. Choose a vector store to store the vector representation of your data.

1. Sync your data so it's converted to vector embeddings.

1. If you modify the data in the data source, you must resync the changes.
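The sync and resync steps above map to the `StartIngestionJob` API operation. A hedged sketch with the AWS SDK for Python (the knowledge base and data source IDs are placeholders, and the client call is commented out so the snippet runs offline):

```python
def build_sync_request(knowledge_base_id, data_source_id):
    """Parameters for StartIngestionJob, which syncs a data source so its
    contents are converted to vector embeddings. IDs here are placeholders."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
    }

params = build_sync_request("KB12345678", "DS12345678")

# With AWS credentials configured, the actual call would be:
# import boto3
# bedrock_agent = boto3.client("bedrock-agent")
# job = bedrock_agent.start_ingestion_job(**params)
# print(job["ingestionJob"]["status"])
```

Run the same call again after modifying the data in your data source to resync the changes.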

**Topics**
+ [Prerequisites for creating an Amazon Bedrock knowledge base with an unstructured data source](knowledge-base-prereq.md)
+ [Prerequisites and permissions required for using OpenSearch Managed Clusters with Amazon Bedrock Knowledge Bases](kb-osm-permissions-prereq.md)
+ [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md)
+ [Sync your data with your Amazon Bedrock knowledge base](kb-data-source-sync-ingest.md)
+ [Ingest changes directly into a knowledge base](kb-direct-ingestion.md)
+ [View data source information for your Amazon Bedrock knowledge base](kb-ds-info.md)
+ [Modify a data source for your Amazon Bedrock knowledge base](kb-ds-update.md)
+ [Delete a data source from your Amazon Bedrock knowledge base](kb-ds-delete.md)

# Prerequisites for creating an Amazon Bedrock knowledge base with an unstructured data source
<a name="knowledge-base-prereq"></a>

Amazon Bedrock knowledge bases require data and models to retrieve and generate responses, a vector store to store the vector representation of the data, and AWS Identity and Access Management permissions to access your data and perform actions.

Before you can create a knowledge base, you must fulfill the following prerequisites. For general permissions requirements, see [Set up permissions for a user or role to create and manage knowledge bases](knowledge-base-prereq-permissions-general.md).

1. Make sure your data is in a [supported data source connector](data-source-connectors.md).

1. (Optional) [Set up your own supported vector store](knowledge-base-setup.md). You can skip this step if you plan to use the AWS Management Console to automatically create a vector store for you.

1. (Optional) Create a custom AWS Identity and Access Management (IAM) [service role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-role) with the proper permissions by following the instructions at [Create a service role for Amazon Bedrock Knowledge Bases](kb-permissions.md). Alternatively, you can use the AWS Management Console to create a service role for you automatically.
**Note**  
If you're creating a knowledge base with Amazon OpenSearch Service (including Amazon OpenSearch Serverless), the service role requires additional permissions beyond those covered by the AWS-managed BedrockFullAccess policy. These include `aoss:CreateAccessPolicy`, `iam:CreateServiceLinkedRole`, and `iam:CreateRole` permissions.

1. (Optional) Set up extra security configurations by following the steps at [Encryption of knowledge base resources](encryption-kb.md).

1. (Optional) If you plan to use the [RetrieveAndGenerate](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html) API operation to generate responses based on information retrieved from your knowledge base, request access to the models that you'll use in the Regions that you'll use them in by following the steps at [Request access to models](model-access.md).
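At query time, the RetrieveAndGenerate request pairs a user query with a knowledge base ID and a model ARN. A hedged sketch (the ID and ARN below are placeholders; the client call is commented out so the snippet runs offline):

```python
def build_retrieve_and_generate_request(query, kb_id, model_arn):
    """Request body for the RetrieveAndGenerate API operation.
    kb_id and model_arn are placeholders for your own resources."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

request = build_retrieve_and_generate_request(
    "What is our refund policy?",
    "KB12345678",
    "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
)

# With AWS credentials configured:
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# response = client.retrieve_and_generate(**request)
# print(response["output"]["text"])
```
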

**Topics**
+ [Prerequisites for your Amazon Bedrock knowledge base data](knowledge-base-ds.md)
+ [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md)

# Prerequisites for your Amazon Bedrock knowledge base data
<a name="knowledge-base-ds"></a>

A data source contains files or content with information that can be retrieved when your knowledge base is queried. You must store your documents or content in at least one of the [supported data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/data-source-connectors.html).

## Supported document formats and limits for knowledge base data
<a name="kb-ds-supported-doc-formats-limits"></a>

When you connect to a [supported data source](https://docs.aws.amazon.com/bedrock/latest/userguide/data-source-connectors.html), the content is ingested into your knowledge base.

If you use Amazon S3 to store your files or your data source includes attached files, then you first must check that each source document file adheres to the following:
+ The source files are of the following supported formats:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html)
+ Each file size doesn't exceed the quota of 50 MB.
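A pre-upload check for these limits can be sketched as follows. The 50 MB quota comes from the text above; the extension list is an illustrative subset, not the full supported set:

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB quota per source file

# Illustrative subset of supported formats; see the documentation for the full list.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".html", ".doc", ".docx",
                        ".csv", ".xls", ".xlsx", ".pdf"}

def is_ingestible(filename, size_bytes):
    """Return True if a file passes the format and size checks."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_FILE_SIZE_BYTES

assert is_ingestible("report.pdf", 10 * 1024 * 1024)
assert not is_ingestible("huge.pdf", 51 * 1024 * 1024)   # over the 50 MB quota
assert not is_ingestible("video.mp4", 1024)              # unsupported format
```
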

If you use an Amazon S3 or custom data source, you can use multimodal data, including JPEG (.jpeg) or PNG (.png) images or files that contain tables, charts, diagrams, or other images.

**Note**  
The maximum file size for .jpeg and .png images is 3.75 MB.

# Prerequisites for using a vector store you created for a knowledge base
<a name="knowledge-base-setup"></a>

To store the vector embeddings that your documents are converted to, you use a vector store. Amazon Bedrock Knowledge Bases supports a quick-create flow for some of the vector stores, so if you prefer for Amazon Bedrock to automatically create a vector index for you in one of those vector stores, skip this prerequisite and proceed to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md).

If you want to store binary vector embeddings instead of the standard floating-point (float32) vector embeddings, then you must use a vector store that supports binary vectors.

**Note**  
Amazon OpenSearch Serverless and Amazon OpenSearch Managed clusters are the only vector stores that support storing binary vectors.

You can set up your own supported vector store to index the vector embeddings representation of your data. You create fields for the following data:
+ A field for the vectors generated from the text in your data source by the embeddings model that you choose.
+ A field for the text chunks extracted from the files in your data source.
+ Fields for source files metadata that Amazon Bedrock manages.
+ (If you use an Amazon Aurora database and want to set up [filtering on metadata](kb-test-config.md)) Fields for metadata that you associate with your source files. If you plan to set up filtering in other vector stores, you don't have to set up these fields for filtering.

You can encrypt third-party vector stores with a KMS key. For more information, see [Encryption of knowledge base resources](https://docs.aws.amazon.com/bedrock/latest/userguide/encryption-kb.html).

Select the tab corresponding to the vector store service that you will use to create your vector index.

**Note**  
Your choice of embeddings model and vector dimensions can affect the available vector store choices. If you aren't able to use your preferred vector store, choose a compatible embeddings model and vector dimensions instead.
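For illustration, commonly documented output dimensions for a few Amazon Bedrock embeddings models are sketched below. Treat these values as assumptions and verify them against the current documentation for your model and Region before creating your index:

```python
# Assumed/commonly documented vector dimensions per embeddings model;
# verify in the Amazon Bedrock console before creating your vector index.
EMBEDDING_DIMENSIONS = {
    "amazon.titan-embed-text-v1": [1536],
    "amazon.titan-embed-text-v2:0": [1024, 512, 256],  # configurable output size
    "cohere.embed-english-v3": [1024],
    "cohere.embed-multilingual-v3": [1024],
}

def index_dimension(model_id, requested=None):
    """Pick a vector index dimension compatible with the chosen model."""
    supported = EMBEDDING_DIMENSIONS[model_id]
    if requested is None:
        return supported[0]  # model default
    if requested not in supported:
        raise ValueError(f"{model_id} does not emit {requested}-dimensional vectors")
    return requested

assert index_dimension("amazon.titan-embed-text-v2:0", 512) == 512
```
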

------
#### [ Amazon OpenSearch Serverless ]

1. To configure permissions and create a vector search collection in Amazon OpenSearch Serverless in the AWS Management Console, follow steps 1 and 2 at [Working with vector search collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vector-search.html) in the Amazon OpenSearch Service Developer Guide. Note the following considerations while setting up your collection:

   1. Give the collection a name and description of your choice.

   1. To make your collection private, select **Standard create** for the **Security** section. Then, in the **Network access settings** section, select **VPC** as the **Access type** and choose a VPC endpoint. For more information about setting up a VPC endpoint for an Amazon OpenSearch Serverless collection, see [Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vpc.html) in the Amazon OpenSearch Service Developer Guide.

1. Once the collection is created, take note of the **Collection ARN** for when you create the knowledge base.

1. In the left navigation pane, select **Collections** under **Serverless**. Then select your vector search collection.

1. Select the **Indexes** tab. Then choose **Create vector index**.

1. In the **Vector index details** section, enter a name for your index in the **Vector index name** field.

1. In the **Vector fields** section, choose **Add vector field**. Amazon Bedrock stores the vector embeddings for your data source in this field. Provide the following configurations:
   + **Vector field name** – Provide a name for the field (for example, **embeddings**).
   + **Engine** – The vector engine used for search. Select **faiss**.
   + **Dimensions** – The number of dimensions in the vector. Refer to the following table to determine how many dimensions the vector should contain:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
   + **Distance metric** – The metric used to measure the similarity between vectors. We recommend using **Euclidean** for floating-point vector embeddings.

1. Expand the **Metadata management** section and add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each field:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)

1. Take note of the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base. Then choose **Create**.

After the vector index is created, you can proceed to [create your knowledge base](knowledge-base-create.md). The following table summarizes where you will enter each piece of information that you took note of.


| Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description | 
| --- | --- | --- | --- | 
| Collection ARN | Collection ARN | collectionARN | The Amazon Resource Name (ARN) of the vector search collection. | 
| Vector index name | Vector index name | vectorIndexName | The name of the vector index. | 
| Vector field name | Vector field | vectorField | The name of the field in which to store vector embeddings for your data sources. | 
| Metadata management (first mapping field) | Text field | textField | The name of the field in which to store the raw text from your data sources. | 
| Metadata management (second mapping field) | Bedrock-managed metadata field | metadataField | The name of the field in which to store metadata that Amazon Bedrock manages.  | 

For more detailed documentation on setting up a vector store in Amazon OpenSearch Serverless, see [Working with vector search collections](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vector-search.html) in the Amazon OpenSearch Service Developer Guide.

------
#### [ Amazon OpenSearch Service Managed Clusters ]

**Important**  
Before using any domain resources in OpenSearch Managed clusters, you need to configure certain IAM access permissions and policies. For more information, see [Prerequisites and permissions required for using OpenSearch Managed Clusters with Amazon Bedrock Knowledge Bases](kb-osm-permissions-prereq.md).
If you encounter data ingestion failures, it might indicate insufficient OpenSearch domain capacity. To resolve this issue, increase your domain's capacity by provisioning higher IOPS and by increasing the throughput settings. For more information, see [Operational best practices for Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html).

1. To create a domain and vector index in OpenSearch Cluster in the AWS Management Console, follow the steps described in [Creating and managing OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html) in the *Amazon OpenSearch Service Developer Guide*.

   Note the following considerations while setting up your domain:

   1. Give the domain a name of your choice.

   1. We recommend that you use the **Easy create** option to get started quickly with creating your domain.
**Note**  
This option gives you a domain with a low throughput. If you have larger workloads that require a higher throughput, choose the **Standard create** option. With either option, you can start with the lowest capacity and adjust it later as needed.

   1. For **Network**, you must choose **Public access**. OpenSearch domains that are behind a VPC aren't supported for knowledge bases.

   1. For **Version**, if you're using binary vector embeddings, Amazon Bedrock Knowledge Bases requires an engine version of 2.16 or later. In addition, version 2.13 or higher is required to create a k-NN index. For more information, see [k-NN search](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html) in the *Amazon OpenSearch Service Developer Guide*.

   1. We recommend that you use the **Dual-stack mode**.

   1. We recommend that you enable **Fine-grained access control** to protect the data in your domain and to further control the permissions that grant your knowledge base service role access to the OpenSearch domain.

   1. Leave all other settings at their default values and choose **Create** to create your domain.

1. Once the domain is created, click it to take note of the **Domain ARN** and **Domain endpoint** for when you create the knowledge base.

1. After you've created the domain, you can create a vector index by running the following commands on an OpenSearch dashboard or using curl commands. For more information, see the [OpenSearch documentation](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/).

   When running the command:
   + Provide a name for the vector field (for example, **embeddings**).
   + Make sure that the vector engine used for search is **faiss**. **nmslib** is not supported.
   + For the number of dimensions in the vector, refer to the following table to determine how many dimensions the vector should contain:
**Note**  
The Titan V2 Embeddings - Text model supports multiple dimensions; in addition to the default of 1,024, it can be 256 or 512.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
   + You can add two fields to configure the vector index to store additional metadata that a knowledge base can retrieve with vectors. The following table describes the fields and the values to specify for each of them.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)

   ```
   PUT /<index-name>
   {
       "settings": {
           "index": {
               "knn": true
           }
       },
       "mappings": {
           "properties": {
               "<vector-name>": {
                   "type": "knn_vector",
                   "dimension": <embedding-dimension>,
                   "data_type": "binary",          # Only needed for binary embeddings
                   "space_type": "l2" | "hamming", # Use l2 for float embeddings and hamming for binary embeddings
                   "method": {
                       "name": "hnsw",
                       "engine": "faiss",
                       "parameters": {
                           "ef_construction": 128,
                           "m": 24
                       }
                   }
               },
   
               "AMAZON_BEDROCK_METADATA": {
                   "type": "text",
                   "index": "false"
               },
               "AMAZON_BEDROCK_TEXT_CHUNK": {
                   "type": "text",
                   "index": "true"            
               }
           }
       }
   }
   ```

1. Take note of the domain ARN and endpoint, and the names you choose for the vector index name, vector field name, and metadata management mapping field names for when you create your knowledge base.

After the vector index is created, you can proceed to [create your knowledge base](knowledge-base-create.md). The following table summarizes where you will enter each piece of information that you took note of.


| Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description | 
| --- | --- | --- | --- | 
| Domain ARN | Domain ARN | domainARN | The Amazon Resource Name (ARN) of the OpenSearch domain. | 
| Domain endpoint | Domain endpoint | domainEndpoint | The endpoint to connect to the OpenSearch domain. | 
| Vector index name | Vector index name | vectorIndexName | The name of the vector index. | 
| Vector field name | Vector field | vectorField | The name of the field in which to store vector embeddings for your data sources. | 
| Metadata management (first mapping field) | Text field | textField | The name of the field in which to store the raw text from your data sources. | 
| Metadata management (second mapping field) | Bedrock-managed metadata field | metadataField | The name of the field in which to store metadata that Amazon Bedrock manages.  | 

------
#### [ Amazon S3 Vectors ]

Amazon S3 Vectors provides cost-effective vector storage in Amazon S3 that can be used to store and query vector data. It provides durable and elastic storage of large vector datasets with sub-second query performance. Amazon S3 Vectors is best suited for infrequent query workloads, and can help reduce costs when used in retrieval augmented generation (RAG) and semantic search applications.

Amazon S3 Vectors introduces S3 vector buckets, which contain vector indexes that you can query based on semantic meaning and similarity, without provisioning any infrastructure. Inside a vector bucket, you organize your vector data within vector indexes, and a vector bucket can have multiple vector indexes. For more information, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html) in the *Amazon S3 User Guide*.

**Note**  
You can create a knowledge base for Amazon S3 Vectors in all AWS Regions where both Amazon Bedrock and Amazon S3 Vectors are available. For information about regional availability of Amazon S3 Vectors, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html) in the *Amazon S3 User Guide*.
When using very high token counts with hierarchical chunking in Amazon Bedrock Knowledge Bases, you might exceed the maximum metadata size limits because parent-child chunk relationships and hierarchical context are stored as non-filterable metadata in Amazon S3 Vectors. For more information about metadata size limits per vector, see [Limitations and restrictions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html) in the *Amazon S3 User Guide*. For information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
<a name="metadata-support"></a>
**Metadata support**  
You can attach metadata as key-value pairs to each vector. By default, metadata is filterable and can be used in similarity search queries to filter by conditions such as dates, categories, or user preferences.

You can also configure metadata to be non-filterable when creating the vector index. Amazon S3 vector indexes support string, boolean, and number types.

When using Amazon S3 Vectors with Amazon Bedrock Knowledge Bases, you can attach up to 1 KB of custom metadata (including both filterable and non-filterable metadata) and up to 35 metadata keys per vector. For more information about metadata size limits per vector, see [Limitations and restrictions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-limitations.html) in the *Amazon S3 User Guide*.

If the metadata exceeds these limits, the ingestion job will throw an exception when populating the vector index. For more information, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html) in the *Amazon S3 User Guide*.
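These limits can be validated before ingestion. A sketch, assuming the 1 KB and 35-key limits quoted above; the size accounting here is a simplification that measures the JSON-serialized length:

```python
import json

MAX_METADATA_BYTES = 1024   # 1 KB of custom metadata per vector
MAX_METADATA_KEYS = 35      # metadata keys per vector

def check_vector_metadata(metadata):
    """Raise ValueError if metadata would exceed the S3 Vectors per-vector
    limits. Size accounting is approximate (serialized JSON length)."""
    if len(metadata) > MAX_METADATA_KEYS:
        raise ValueError(f"too many metadata keys: {len(metadata)}")
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata too large: {size} bytes")

check_vector_metadata({"category": "finance", "year": 2024})  # within limits
```
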

**Required permissions**  
Make sure that your IAM policy allows Amazon Bedrock to access your vector index in S3 vector bucket. For more information about the required permissions, see [Create a service role for Amazon Bedrock Knowledge Bases](kb-permissions.md).

**Create S3 vector bucket and index**  
To use Amazon S3 Vectors with your knowledge base, you need to create an S3 vector bucket and a vector index. You can create a vector bucket and index using the Amazon S3 console, AWS CLI, or AWS SDK. For detailed instructions, see [Create a vector index](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-index-create.html) in the *Amazon S3 User Guide*.

Note the following considerations when creating your vector bucket and index in the [Amazon S3 console](https://console.aws.amazon.com/s3/vector-buckets#).

1. When creating your S3 vector bucket, take note of the following considerations.
   + Provide a unique **Vector bucket name**.
   + (Optional) Amazon S3 will automatically encrypt the data using the default **Server-side encryption with Amazon S3 managed keys (SSE-S3)**. You can choose whether to use this default encryption, or the **Server-side encryption with AWS Key Management Service keys (SSE-KMS)** instead.
**Note**  
The encryption type can't be changed once the vector bucket has been created.

     For step-by-step instructions, see [Encryption with AWS KMS keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-bucket-encryption.html).

1. Once you've created the S3 vector bucket, take note of the **Amazon Resource Name (ARN)** of the vector bucket for when you create the knowledge base.

1. Choose the vector bucket that you created and then create a vector index. When creating the vector index, take note of the following considerations.
   + **Vector index name** – Provide a name for the field (for example, **embeddings**).
   + **Dimension** – The number of dimensions in the vector. The dimensions must be a value between 1 and 4096. Refer to the following table to determine how many dimensions the vector should contain based on your selection of the embeddings model:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
   + **Distance metric** – The metric used to measure the similarity between vectors. You can use **Cosine** or **Euclidean**.
**Note**  
Amazon S3 Vectors only supports floating-point embeddings. Binary embeddings are not supported.

1. Expand the **Additional settings** and provide any non-filterable metadata in the **Non-filterable metadata** field.

   You can configure a maximum of 10 non-filterable metadata keys. Choose **Add key** and then add `AMAZON_BEDROCK_TEXT` and `AMAZON_BEDROCK_METADATA` as keys.

1. Under **Encryption**, choose **Specify encryption type**. You have the option to **Use bucket settings for encryption** or override the encryption settings for the vector index. If you override the bucket-level settings, you have the option to specify encryption type for the vector index as **Server-side encryption with AWS Key Management Service keys (SSE-KMS)** or the default **Server-side encryption with Amazon S3 managed keys (SSE-S3)**. For more information about setting encryption configuration for vector indexes, see [Data protection and encryption in Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-data-encryption.html).

1. Under **Tags (Optional)**, you can add tags as key-value pairs to help track and organize vector index costs using AWS Billing and Cost Management. Enter a **Key** and a **Value**. To add another tag, choose **Add Tag**. You can enter up to 50 tags for a vector index. For more information, see [Using tags with Amazon S3 vector indexes](https://docs.aws.amazon.com/console/s3/vector-index-create-with-tag).

1. Create the vector index and take note of the **Amazon Resource Name (ARN)** of the vector index for when you create the knowledge base.
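The console steps above can also be sketched with the SDK. This is a hedged example: the bucket and index names are placeholders, the client call is commented out so the snippet runs offline, and you should check the current `s3vectors` API reference for exact parameter names:

```python
def build_create_index_request(bucket_name, index_name, dimension):
    """Assumed parameter shape for creating an S3 vector index for a
    knowledge base; names here are placeholders."""
    return {
        "vectorBucketName": bucket_name,
        "indexName": index_name,
        "dataType": "float32",       # S3 Vectors supports floating-point embeddings only
        "dimension": dimension,      # must match your embeddings model (1 to 4096)
        "distanceMetric": "cosine",  # or "euclidean"
        "metadataConfiguration": {
            # Non-filterable keys the knowledge base needs, per the steps above.
            "nonFilterableMetadataKeys": ["AMAZON_BEDROCK_TEXT",
                                          "AMAZON_BEDROCK_METADATA"],
        },
    }

params = build_create_index_request("my-vector-bucket", "embeddings", 1024)

# With AWS credentials configured:
# import boto3
# s3vectors = boto3.client("s3vectors")
# s3vectors.create_index(**params)
```
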

**Create knowledge base for S3 vector bucket**  
After you've gathered this information, you can proceed to [create your knowledge base](knowledge-base-create.md). When creating your knowledge base with an S3 vector bucket, you'll need to provide the ARNs of the vector bucket and the vector index. The vector index will store the embeddings that are generated from your data sources. The following table summarizes where you will enter each piece of information:


| Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description | 
| --- | --- | --- | --- | 
| Vector bucket ARN | S3 vector bucket ARN | vectorBucketArn | The Amazon Resource Name (ARN) of your S3 vector bucket. | 
| Vector index ARN | S3 vector index ARN | vectorIndexArn | The Amazon Resource Name (ARN) of the vector index for your S3 vector bucket. | 

------
#### [ Amazon Aurora (RDS) ]

1. Create an Amazon Aurora database (DB) cluster, schema, and table by following the steps at [Using Aurora PostgreSQL as a knowledge base](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html). When you create the table, configure it with the following columns and data types. You can use column names of your liking instead of the ones listed in the following table. Take note of the column names you choose so that you can provide them during knowledge base setup.

   You must provide these fields before creating the knowledge base. They cannot be updated after the knowledge base has been created.
**Important**  
The Aurora cluster must reside in the same AWS account as the one where the knowledge base is created for Amazon Bedrock.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)

1. You must create indexes on the vector and text columns for your embeddings and text fields. If you're using the custom metadata field, you must also create a GIN index on that column. GIN indexes can be used to efficiently search for key-value pairs in `jsonb` documents for metadata filtering. For more information, see [jsonb indexing](https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING) in the *PostgreSQL documentation*.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
**Note**  
For improving hybrid search accuracy and latency with English content, consider using the 'english' dictionary instead of 'simple':  

   ```
   CREATE INDEX ON bedrock_integration.bedrock_kb USING gin (to_tsvector('english', chunks));
   ```

1. (Optional) If you [added metadata to your files for filtering](kb-test-config.md), we recommend that you provide the column name in the custom metadata field to store all your metadata in a single column. During [data ingestion](kb-data-source-sync-ingest.md), this column will be populated with all the information in the metadata files from your data sources. If you choose to provide this field, you must create a GIN index on this column.
**Note**  
If you frequently use range filters over numerical metadata, then to optimize performance, create an index for the specific key. For example, if you use filters such as `"lessThan": { "key": "year", "value": 1989 }`, create an expression index on the `year` key. For more information, see [Indexes on expressions](https://www.postgresql.org/docs/current/indexes-expressional.html) in the *PostgreSQL documentation*.  

   ```
   CREATE INDEX ON your_table (((custom_metadata->>'year')::double precision));
   ```

   Alternatively, if you don't provide this field name, you can create a column for each metadata attribute in your files and specify the data type (text, number, or boolean). For example, if the attribute `genre` exists in your data source, you would add a column named `genre` and specify `text` as the data type. During [data ingestion](kb-data-source-sync-ingest.md), these separate columns will be populated with the corresponding attribute values.

1. Configure an AWS Secrets Manager secret for your Aurora DB cluster by following the steps at [Password management with Amazon Aurora and AWS Secrets Manager](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/rds-secrets-manager.html).

1. Take note of the following information after you create your DB cluster and set up the secret.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
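The table and index setup in the steps above can be sketched as DDL. This is a hedged example following the Aurora PostgreSQL guide's `bedrock_integration.bedrock_kb` naming: the column names and the 1024 dimension are illustrative, and must match your embeddings model and what you enter during knowledge base setup:

```python
# Hedged sketch: DDL mirroring the Aurora PostgreSQL guide's example schema.
# Column names and the 1024 dimension are illustrative assumptions.
CREATE_TABLE = """
CREATE TABLE bedrock_integration.bedrock_kb (
    id uuid PRIMARY KEY,
    embedding vector(1024),   -- vector field (pgvector extension required)
    chunks text,              -- text chunk field
    metadata json,            -- Bedrock-managed metadata field
    custom_metadata jsonb     -- optional, for metadata filtering
);
"""

CREATE_INDEXES = """
CREATE INDEX ON bedrock_integration.bedrock_kb
    USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON bedrock_integration.bedrock_kb
    USING gin (to_tsvector('simple', chunks));
CREATE INDEX ON bedrock_integration.bedrock_kb
    USING gin (custom_metadata);
"""

# Run these against your Aurora cluster with psql or a driver such as psycopg:
# cursor.execute(CREATE_TABLE)
# cursor.execute(CREATE_INDEXES)
```
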

------
#### [ Neptune Analytics graphs (GraphRAG) ]

1. To create a graph and vector store in Neptune Analytics in the AWS Management Console, follow the steps described in [Vector indexing in Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-index.html) in the *Neptune Analytics User Guide*.
**Note**  
To use Neptune GraphRAG, create an empty Neptune Analytics graph with a vector search index. The vector search index can only be created when the graph is created. When you create a [Neptune Analytics graph in the console](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/create-graph-using-console.html), you specify the index dimension under **Vector search settings** near the end of the process.

   Note the following considerations while creating the graph:

   1. Give the graph a name of your choice.

   1. Under **Data source**, choose **Create empty graph**, and specify the number of m-NCUs to be allocated. Each m-NCU has around one GiB of memory capacity and corresponding compute and networking.
**Note**  
The capacity of your graph can be modified later. We recommend that you start with the smallest instance and later choose a different instance, if needed.

   1. You can leave the default network connectivity settings. Amazon Bedrock will create a networking connection to the Neptune Analytics graph that you associate the knowledge base with. You do not have to configure public connectivity or private endpoints for your graph.

   1. Under **Vector search settings**, choose **Use vector dimension** and specify the number of dimensions in each vector.
**Note**  
The number of dimensions in each vector must match the vector dimensions in the embeddings model. Refer to the following table to determine how many dimensions the vector should contain:  
****    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)

   1. Leave all other settings to their default and create the graph.

1. Once the graph is created, choose it and take note of the **Resource ARN** and **Vector dimensions** for when you create the knowledge base. When choosing the embeddings model in Amazon Bedrock, make sure that you choose a model with the same dimensions as the **Vector dimensions** you configured on your Neptune Analytics graph.

After the vector index is created, you can proceed to [create your knowledge base](knowledge-base-create.md). The following table summarizes where you will enter each piece of information that you took note of.


| Field | Corresponding field in knowledge base setup (Console) | Corresponding field in knowledge base setup (API) | Description | 
| --- | --- | --- | --- | 
| Graph ARN | Neptune Analytics Graph ARN | graphARN | The Amazon Resource Name (ARN) of the Neptune Analytics graph. | 
| Metadata management (first mapping field) | Text field name | textField | The name of the field in which to store the raw text from your data sources. You can provide any value for this field, for example, text. | 
| Metadata management (second mapping field) | Bedrock-managed metadata field | metadataField | The name of the field in which to store metadata that Amazon Bedrock manages. You can provide any value for this field, for example, metadata. | 

------
#### [ Pinecone ]

**Note**  
If you use Pinecone, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Pinecone, see [Pinecone as a knowledge base for Amazon Bedrock](https://docs.pinecone.io/docs/amazon-bedrock).

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:
+ **Endpoint URL** – The endpoint URL for your index management page.
+ **Credentials secret ARN** – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the username and password for a database user.
+ **(Optional) Customer-managed KMS key for your Credentials secret ARN** – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.
+ **(Optional) Namespace** – The namespace to use when writing new data to your database. For more information, see [Using namespaces](https://docs.pinecone.io/docs/namespaces).

There are additional configurations that you must provide when creating a Pinecone index:
+ **Text field name** – The name of the field in which Amazon Bedrock should store the raw chunk text.
+ **Metadata field name** – The name of the field in which Amazon Bedrock should store source attribution metadata.

To access your Pinecone index, you must provide your Pinecone API key to Amazon Bedrock through AWS Secrets Manager.

**To set up a secret for your Pinecone configuration**

1. Follow the steps at [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html), setting the key as `apiKey` and the value as the API key to access your Pinecone index.

1. To find your API key, open your [Pinecone console](https://app.pinecone.io/) and select **API Keys**.

1. After you create the secret, take note of its ARN.

1. Attach permissions to your service role to decrypt the secret by following the steps in [Permissions to decrypt an AWS Secrets Manager secret for the vector store containing your knowledge base](encryption-kb.md#encryption-kb-3p).

1. Later, when you create your knowledge base, enter the ARN in the **Credentials secret ARN** field.
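The secret's value is a single key-value pair. As a sketch, with a placeholder in place of a real API key, the secret string might look like the following:

```json
{
  "apiKey": "YOUR_PINECONE_API_KEY"
}
```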

------
#### [ Redis Enterprise Cloud ]

**Note**  
If you use Redis Enterprise Cloud, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in Redis Enterprise Cloud, see [Integrating Redis Enterprise Cloud with Amazon Bedrock](https://docs.redis.com/latest/rc/cloud-integrations/aws-marketplace/aws-bedrock/).

While you set up the vector store, take note of the following information, which you will fill out when you create a knowledge base:
+ **Endpoint URL** – The public endpoint URL for your database.
+ **Vector index name** – The name of the vector index for your database.
+ **Vector field** – The name of the field where the vector embeddings will be stored. Refer to the following table to determine how many dimensions the vector should contain.  
****    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html)
+ **Text field** – The name of the field where Amazon Bedrock stores the chunks of raw text.
+ **Bedrock-managed metadata field** – The name of the field where Amazon Bedrock stores metadata related to your knowledge base.

To access your Redis Enterprise Cloud cluster, you must provide your Redis Enterprise Cloud security configuration to Amazon Bedrock through AWS Secrets Manager.

**To set up a secret for your Redis Enterprise Cloud configuration**

1. Enable TLS to use your database with Amazon Bedrock by following the steps at [Transport Layer Security (TLS)](https://docs.redis.com/latest/rc/security/database-security/tls-ssl/).

1. Follow the steps at [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html). Set up the following keys with the appropriate values from your Redis Enterprise Cloud configuration in the secret:
   + `username` – The username to access your Redis Enterprise Cloud database. To find your username, look under the **Security** section of your database in the [Redis Console](http://app.redislabs.com/).
   + `password` – The password to access your Redis Enterprise Cloud database. To find your password, look under the **Security** section of your database in the [Redis Console](http://app.redislabs.com/).
   + `serverCertificate` – The content of the certificate from the Redis Cloud Certificate authority. Download the server certificate from the Redis Admin Console by following the steps at [Download certificates](https://docs.redis.com/latest/rc/security/database-security/tls-ssl/#download-certificates).
   + `clientPrivateKey` – The private key of the certificate from the Redis Cloud Certificate authority. Download the certificates from the Redis Admin Console by following the steps at [Download certificates](https://docs.redis.com/latest/rc/security/database-security/tls-ssl/#download-certificates).
   + `clientCertificate` – The public key of the certificate from the Redis Cloud Certificate authority. Download the certificates from the Redis Admin Console by following the steps at [Download certificates](https://docs.redis.com/latest/rc/security/database-security/tls-ssl/#download-certificates).

1. After you create the secret, take note of its ARN. Later, when you create your knowledge base, enter the ARN in the **Credentials secret ARN** field.
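Combined, the secret's value is a JSON object with the five keys described above. The following is a hypothetical sketch, with placeholder values and certificate contents abbreviated:

```json
{
  "username": "default",
  "password": "examplePassword",
  "serverCertificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
  "clientPrivateKey": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
  "clientCertificate": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
}
```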

------
#### [ MongoDB Atlas ]

**Note**  
If you use MongoDB Atlas, you agree to authorize AWS to access the designated third-party source on your behalf in order to provide vector store services to you. You're responsible for complying with any third-party terms applicable to use and transfer of data from the third-party service.

For detailed documentation on setting up a vector store in MongoDB Atlas, see [Launch a Fully Managed RAG Workflow With MongoDB Atlas and Amazon Bedrock](https://www.mongodb.com/developer/products/atlas/rag-workflow-with-atlas-amazon-bedrock/).

When you set up the vector store, note the following information which you will add when you create a knowledge base:
+ **Endpoint URL** – The endpoint URL of your MongoDB Atlas cluster.
+ **Database name** – The name of the database in your MongoDB Atlas cluster.
+ **Collection name** – The name of the collection in your database.
+ **Credentials secret ARN** – The Amazon Resource Name (ARN) of the secret that you created in AWS Secrets Manager that contains the username and password for a database user in your MongoDB Atlas cluster. The secret must contain keys named `username` and `password`.
+ **(Optional) Customer-managed KMS key for your Credentials secret ARN** – If you encrypted your credentials secret, provide the KMS key so that Amazon Bedrock can decrypt it.
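The credentials secret for MongoDB Atlas must contain exactly the two keys named above. For example, with hypothetical values:

```json
{
  "username": "bedrock-kb-user",
  "password": "examplePassword"
}
```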

There are additional configurations for **Field mapping** that you must provide when creating a MongoDB Atlas index:
+ **Vector index name** – The name of the MongoDB Atlas Vector Search Index on your collection.
+ **Vector field name** – The name of the field in which Amazon Bedrock should store vector embeddings.
+ **Text field name** – The name of the field in which Amazon Bedrock should store the raw chunk text.
+ **Metadata field name** – The name of the field in which Amazon Bedrock should store source attribution metadata.
+ **(Optional) Text search index name** – The name of the MongoDB Atlas Search index on your collection.

**Important**  
If you plan to use metadata filtering with your MongoDB Atlas knowledge base, you must manually configure filters in your vector index. Metadata filtering doesn't work by default and requires additional setup in your MongoDB Atlas vector index configuration.

(Optional) To have Amazon Bedrock connect to your MongoDB Atlas cluster over AWS PrivateLink, see [RAG workflow with MongoDB Atlas using Amazon Bedrock](https://www.mongodb.com/developer/products/atlas/rag-workflow-with-atlas-amazon-bedrock/).

------

# Prerequisites and permissions required for using OpenSearch Managed Clusters with Amazon Bedrock Knowledge Bases
<a name="kb-osm-permissions-prereq"></a>

This section shows you how to configure permissions if you're creating your own vector database with Amazon OpenSearch Service Managed Clusters. This configuration must be performed before you create the knowledge base. The steps assume that you've already created a domain and vector index in Amazon OpenSearch Service. For more information, see [Creating and managing OpenSearch Service domains](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createupdatedomains.html) in the *Amazon OpenSearch Service developer guide*.

## Key considerations
<a name="kb-osm-permissions-prereq-considerations"></a>

Following are some key considerations for using Amazon Bedrock Knowledge Bases with Amazon OpenSearch Service Managed Clusters.
+ Before you use any domain resources in OpenSearch managed clusters, you must configure certain IAM access permissions and policies. If your domain has a restrictive access policy, you must grant the required IAM access and configure the resource-based policies before you perform the steps in this section. We also recommend that you configure fine-grained access control to scope down the permissions.
+ If you encounter failures when ingesting data for your knowledge base, your OpenSearch domain might have insufficient capacity to handle the speed of ingestion. To resolve this issue, increase your domain's capacity by provisioning higher IOPS (input/output operations per second) and increasing the throughput settings. Wait several minutes for the new capacity to be provisioned, and then retry the ingestion. Monitor performance during the retry to verify that the issue is resolved. If throttling persists, you might need to adjust capacity further. For more information, see [Operational best practices for Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html).

## Overview of permissions configuration
<a name="kb-osm-permissions-prereq-overview"></a>

For Knowledge Bases integration with managed clusters, you need to configure the following IAM access permissions and resource-based policies. We also recommend that you enable fine-grained access control so that you can scope permissions down to the property level.

The following steps provide a high-level overview for how to configure permissions.

1. 

**Create and use Knowledge base service role**

   Although you can provide your own custom role, we recommend that you let Amazon Bedrock Knowledge Bases create the knowledge base service role for you.

1. 

**Configure the resource-based policy**

   The OpenSearch domain supports resource-based policies that determine which principals can access and act on the domain. To use the domain with Knowledge Bases, make sure that the resource-based policy for your domain is configured correctly.

1. 

***(Strongly Recommended)* Provide role mapping for fine-grained access control**

   While fine-grained access control is optional, we recommend that you enable it so that you can scope permissions down to the property level.

## Configuring IAM policies
<a name="kb-osm-permissions-iam"></a>

Your domain's access policy must allow the roles in your account to perform the required OpenSearch API actions.

If your domain has a restrictive access policy, then it might need to be updated as follows:
+ It must grant access to the Amazon Bedrock service and allow the required HTTP actions: `GET`, `POST`, `PUT`, and `DELETE`.
+ It must also grant Amazon Bedrock permission to perform the `es:DescribeDomain` action on your domain. This allows Amazon Bedrock Knowledge Bases to perform the required validations when configuring a knowledge base.

## (Optional) Fine-grained access control
<a name="kb-osm-permissions-console-fgap"></a>

Fine-grained access control lets you scope permissions down to the property level. You can configure fine-grained access policies to grant the required read/write permissions to the service role created by Knowledge Bases.

To configure fine-grained access control and provide the role mapping:

1. Make sure that the OpenSearch domain that you created has fine-grained access control enabled.

1. Create an OpenSearch UI (Dashboards), if you haven't already. You will use it to configure the role mapping.

1. In your OpenSearch Dashboards, create an OpenSearch role and specify the vector index name, and the cluster and index permissions. To add the permissions, you must create permission groups and then add the required permissions that grant access to perform a set of operations including `delete`, `search`, `get`, and `index` for the role.

1. After you've added the required permissions, enter the ARN of your knowledge base service role as the OpenSearch backend role. This completes the mapping between your knowledge base service role and the OpenSearch role, which grants Amazon Bedrock Knowledge Bases permission to access the vector index in the OpenSearch domain and perform the required operations.

**Topics**
+ [Key considerations](#kb-osm-permissions-prereq-considerations)
+ [Overview of permissions configuration](#kb-osm-permissions-prereq-overview)
+ [Configuring IAM policies](#kb-osm-permissions-iam)
+ [(Optional) Fine-grained access control](#kb-osm-permissions-console-fgap)
+ [Configuring resource-based policies for OpenSearch Managed clusters](kb-osm-permissions-slr-rbp.md)
+ [Configuring OpenSearch permissions with fine-grained access control](kb-osm-permissions-console-fgap.md)

# Configuring resource-based policies for OpenSearch Managed clusters
<a name="kb-osm-permissions-slr-rbp"></a>

When creating your knowledge base, you can either create your own custom role or let Amazon Bedrock create one for you. How you configure the permissions depends on whether you're creating a new role or using an existing role. If you already have an IAM role, you must ensure that your domain's access policy doesn't prevent the roles in your account from performing the necessary OpenSearch API actions.

If you choose to let Amazon Bedrock Knowledge Bases create the IAM role for you, ensure that your domain's access policy allows the roles in your account to perform the required OpenSearch API actions. A restrictive access policy can prevent your role from performing these actions.
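For illustration, a restrictive domain access policy might allow only a single administrator principal, which would block all other roles, including the knowledge base service role. The account ID, user name, and domain name below are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/domain-admin"
      },
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/domainName/*"
    }
  ]
}
```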

In this case, you can either:
+ Create your knowledge base using an existing IAM role, and make sure that your OpenSearch domain's access policy grants that role access to perform the necessary operations.
+ Let Amazon Bedrock create a new role for you, and ensure that the domain's access policy allows the roles in your account to perform the necessary OpenSearch API actions.

The following sections show a sample IAM policy that grants the necessary permissions and how you can update the domain's access policy so that it grants permissions to perform the necessary OpenSearch API operations.

**Topics**
+ [Sample IAM identity-based and resource-based policies](#kb-osm-permissions-iam)
+ [Creating the Amazon Bedrock Knowledge Bases service role](#kb-osm-permissions-slr)
+ [Updating the resource-based policies](#kb-osm-permissions-console-rbp)

## Sample IAM identity-based and resource-based policies
<a name="kb-osm-permissions-iam"></a>

This section provides a sample identity-based policy and a resource-based policy that you can configure for your OpenSearch domain when integrating with Amazon Bedrock Knowledge Bases. You must grant Amazon Bedrock permission to perform these actions on the index that you provide for your knowledge base.


****  

| Action | Resource | Description | 
| --- | --- | --- | 
| es:ESHttpPost | arn:<partition>:es:<region>:<accountId>:domain/<domainName>/<indexName> | For inserting information into the index. | 
| es:ESHttpGet |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/kb-osm-permissions-slr-rbp.html)  | For searching information in the index. This action is configured at both the domain/index level and the domain/index/\* level. At the domain/index level, it can get high-level details about the index, such as the engine type. To retrieve details stored within the index, permissions are required at the domain/index/\* level. | 
| es:ESHttpHead |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/kb-osm-permissions-slr-rbp.html)  | For getting information from the index. This action is configured at both the domain/index level and the domain/index/\* level, in case information needs to be obtained at a higher level, such as whether a particular index exists. | 
| es:ESHttpDelete | arn:<partition>:es:<region>:<accountId>:domain/<domainName>/<indexName> | For deleting information from the index. | 
| es:DescribeDomain | arn:<partition>:es:<region>:<accountId>:domain/<domainName> | For performing validations on the domain, such as the engine version used. | 

### Sample identity-based policy
<a name="kb-osm-permissions-idpolicy"></a>

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "OpenSearchIndexAccess",
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpPost",
                "es:ESHttpPut",
                "es:ESHttpDelete"
            ],
            "Resource": [
                "arn:aws:es:us-east-1:123456789012:domain/domainName/indexName/*"
            ]
        },
        {
            "Sid": "OpenSearchIndexGetAccess",
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpHead"
            ],
            "Resource": [
                "arn:aws:es:us-east-1:123456789012:domain/domainName/indexName"
            ]
        },
        {
            "Sid": "OpenSearchDomainValidation",
            "Effect": "Allow",
            "Action": [
                "es:DescribeDomain"
            ],
            "Resource": [
                "arn:aws:es:us-east-1:123456789012:domain/domainName"
            ]
        }
    ]
}
```

------

### Sample resource-based policy
<a name="kb-osm-permissions-rbp"></a>

**Note**  
Make sure that the service role has been created before you reference it in the resource-based policy.
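The exact policy depends on your domain and role names, but a resource-based policy that grants the service role the required HTTP actions might look like the following sketch (the account ID, role name, and domain name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/service-role/knowledge-base-service-role"
      },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpHead",
        "es:ESHttpPost",
        "es:ESHttpPut",
        "es:ESHttpDelete"
      ],
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/domainName/*"
    }
  ]
}
```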

## Creating the Amazon Bedrock Knowledge Bases service role
<a name="kb-osm-permissions-slr"></a>

When you create the knowledge base, you can choose the option to create and use a new service role. This section walks you through creating the Amazon Bedrock Knowledge Bases service role. Mapping the resource-based policies and the fine-grained access policies to this role grants Amazon Bedrock the permissions to make requests to the OpenSearch domain.

**To specify the Amazon Bedrock Knowledge Bases service role:**

1. In the Amazon Bedrock console, go to [Knowledge Bases](https://console.aws.amazon.com/bedrock/home#/knowledge-bases).

1. Choose **Create** and then choose **Knowledge base with vector store**.

1. Choose **Create and use a new service role**. You can use the default name or provide a custom role name, and Amazon Bedrock will automatically create the knowledge base service role for you.

1. Continue going through the console to configure your data source and parsing and chunking strategies.

1. Choose an Embeddings model and then, under **Choose an existing vector store**, choose **Amazon OpenSearch Managed Cluster**.

**Important**  
Before you proceed to create the knowledge base, complete the following steps to configure the resource-based policies and fine-grained access policies. For detailed steps on creating the knowledge base, see [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md).

## Updating the resource-based policies
<a name="kb-osm-permissions-console-rbp"></a>

If your OpenSearch domain has a restrictive access policy, you can follow the instructions on this page to update the resource-based policy. These permissions allow Knowledge Bases to make use of the index that you provide, and to retrieve the OpenSearch domain definition to perform the required validation on the domain.

**To configure the resource-based policies from the AWS Management Console**

1. Go to the [Amazon OpenSearch Service console](https://console.aws.amazon.com/aos/home?region=us-east-1#opensearch/dashboard).

1. Go to the domain that you created, and then go to **Security Configurations**, where the resource-based policy is configured.

1. Edit the policy in the **JSON** tab and then update the policy similar to the [Sample resource-based policy](#kb-osm-permissions-rbp).

1. You can now go back to the Amazon Bedrock console and provide the details for your OpenSearch domain and index as described in [Knowledge base setup for Managed Clusters](knowledge-base-setup.md#knowledge-base-setup-osm).

# Configuring OpenSearch permissions with fine-grained access control
<a name="kb-osm-permissions-console-fgap"></a>

While optional, we strongly recommend that you enable fine-grained access control for your OpenSearch domain. Using fine-grained access control, you can use role-based access control, which enables you to create an OpenSearch role with specific permissions and map it to the Knowledge Base service role. The mapping grants your knowledge base the minimum required permissions that allows it to access and perform operations on the OpenSearch domain and index.

To configure and use fine-grained access control:

1. Make sure that the OpenSearch domain that you're using has fine-grained access control enabled.

1. For your domain that uses fine-grained access control, configure permissions with scoped-down policies in the form of an OpenSearch role.

1. For the domain that you create a role for, add a role mapping to the Knowledge Base Service role.

The following steps show how to configure your OpenSearch role and ensure the correct mapping between the OpenSearch role and the knowledge base service role.

**To create an OpenSearch role and configure permissions**  
After you have enabled fine-grained access control and configured Amazon Bedrock to connect to the OpenSearch Service, you can configure permissions using the OpenSearch Dashboards link for each OpenSearch domain.

**To configure permissions for a domain to allow access to Amazon Bedrock:**

1. Open the OpenSearch Dashboard for the OpenSearch domain that you want to work with. To find the link to Dashboards, go to the domain that you created in the OpenSearch Service console. For domains running OpenSearch, the URL is of the format, `domain-endpoint/_dashboards/`. For more information, see [Dashboards](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/dashboards.html) in the *Amazon OpenSearch Service developer guide*.

1. In the OpenSearch Dashboard, choose **Security** and then choose **Roles**.

1. Choose **Create role**.

1. Provide any name for the role, for example, **kb\_opensearch\_role**.

1. Under **Cluster permissions**, add the following permissions.
   + `indices:data/read/msearch`
   + `indices:data/write/bulk*`
   + `indices:data/read/mget*`

1. Under **Index permissions**, provide the name of the vector index. Choose **Create new permission group**, and then choose **Create new action group**. Add the following permissions to an action group, such as `KnowledgeBasesActionGroup`.
   + `indices:admin/get`
   + `indices:data/read/msearch`
   + `indices:data/read/search`
   + `indices:data/write/index`
   + `indices:data/write/update`
   + `indices:data/write/delete`
   + `indices:data/write/delete/byquery`
   + `indices:data/write/bulk*`
   + `indices:admin/mapping/put`
   + `indices:data/read/mget*`  
![\[The action groups to create in OpenSearch Dashboards for adding cluster and index permissions.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/kb/kb-test-os-action-groups.png)

1. Choose **Create** to create the OpenSearch role.

The following shows a sample OpenSearch role with the permissions added.

![\[A sample OpenSearch role in OpenSearch Dashboards with the permissions added.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/kb/kb-test-os-dashboards-permissions.png)
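If you prefer to script the role creation instead of using the Dashboards UI, the OpenSearch Security plugin's REST API accepts an equivalent role definition. The following sketch mirrors the permissions from the steps above; the index pattern is a placeholder, and the request would be sent as `PUT _plugins/_security/api/roles/<role-name>`:

```json
{
  "cluster_permissions": [
    "indices:data/read/msearch",
    "indices:data/write/bulk*",
    "indices:data/read/mget*"
  ],
  "index_permissions": [
    {
      "index_patterns": ["my-vector-index"],
      "allowed_actions": [
        "indices:admin/get",
        "indices:data/read/msearch",
        "indices:data/read/search",
        "indices:data/write/index",
        "indices:data/write/update",
        "indices:data/write/delete",
        "indices:data/write/delete/byquery",
        "indices:data/write/bulk*",
        "indices:admin/mapping/put",
        "indices:data/read/mget*"
      ]
    }
  ]
}
```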


**To create a role mapping to your Knowledge Base service role**

1. Identify the IAM role that will need to be mapped.
   + If you created your own custom IAM role, you can copy the role ARN for this role from the IAM console.
   + If you are allowing Knowledge Bases to create the role for you, you can make note of the role ARN when creating your knowledge base, and then copy this role ARN.

1. Open the OpenSearch Dashboard for the OpenSearch domain that you want to work with. The URL is of the format, `domain-endpoint/_dashboards/`.

1. Choose **Security** from the navigation pane.

1. Search for the role that you just created from the list, for example, **kb\_opensearch\_role**, and open it.

1. On the **Mapped Users** tab, choose **Manage mapping**.

1. In the **Backend roles** section, enter the ARN of the IAM service role for Knowledge Bases. Depending on whether you created your own custom role or let Knowledge Bases create the role for you, copy the role ARN from the IAM console or the Amazon Bedrock console, and then enter it for the **Backend roles** in the OpenSearch console. Following is an example.

   ```
   arn:aws:iam::<accountId>:role/service-role/<knowledge-base-service-role>
   ```

1. Choose **Map**.

   The Knowledge Base Service role can now connect to the OpenSearch role and perform the required operations on the domain and index.
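The same mapping can also be expressed through the OpenSearch Security plugin's REST API. As a sketch, with a placeholder role ARN, the request body for `PUT _plugins/_security/api/rolesmapping/<role-name>` would be:

```json
{
  "backend_roles": [
    "arn:aws:iam::123456789012:role/service-role/knowledge-base-service-role"
  ]
}
```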

# Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases
<a name="knowledge-base-create"></a>

When you create a knowledge base by connecting to a data source, you set up or specify the following:
+ General information that defines and identifies the knowledge base
+ The service role with permissions to the knowledge base.
+ Configurations for the knowledge base, including the embeddings model to use when converting data from the data source, storage configurations for the service in which to store the embeddings, and, optionally, an S3 location to store multimodal data.

**Note**  
You can’t create a knowledge base with a root user. Log in with an IAM user before starting these steps.

Expand the section that corresponds to your use case:

## Use the console
<a name="knowledge-base-create-console"></a>

**To set up a knowledge base**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, choose **Create** and then choose **Knowledge base with vector store**.

1. (Optional) Change the default name and provide a description for your knowledge base.

1. Choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other required AWS services. You can let Amazon Bedrock create the service role or choose to use your own [custom role that you created for Neptune Analytics](kb-permissions.md#kb-permissions-neptune).

1. Choose a data source to connect your knowledge base to.

1. (Optional) Add tags to your knowledge base. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

1. (Optional) Configure delivery of activity logs for your knowledge base.

1. Go to the next section and follow the steps at [Connect a data source to your knowledge base](data-source-connectors.md) to configure a data source.

1. In the **Embeddings model** section, do the following:

   1. Choose an embeddings model to convert your data into vector embeddings. For multimodal data (images, audio, and video), select a multimodal embedding model such as Amazon Titan Multimodal Embeddings G1 or Cohere Embed v3.
**Note**  
When using Amazon Titan Multimodal Embeddings G1, you must provide an S3 content bucket and can only use the default parser. This model is optimized for image search use cases. For comprehensive guidance on choosing between multimodal approaches, see [Build a knowledge base for multimodal content](kb-multimodal.md).

   1. (Optional) Expand the **Additional configurations** section to see the following configuration options (not all models support all configurations):
      + **Embeddings type** – Whether to convert the data to floating-point (float32) vector embeddings (more precise, but more costly) or binary vector embeddings (less precise, but less costly). To learn about which embeddings models support binary vectors, refer to [supported embeddings models](knowledge-base-supported.md).
      + **Vector dimensions** – Higher values improve accuracy but increase cost and latency.

1. In the **Vector database** section, do the following:

   1. Choose a vector store to store the vector embeddings that will be used for query. You have the following options:
      + **Quick create a new vector store** – Choose one of the available vector stores for Amazon Bedrock to create. You can also optionally configure AWS KMS key encryption for your vector store.
**Note**  
When using this option, Amazon Bedrock automatically handles the metadata placement for each vector store.
        + **Amazon OpenSearch Serverless** – Amazon Bedrock Knowledge Bases creates an Amazon OpenSearch Serverless vector search collection and index and configures it with the required fields for you.
        + **Amazon Aurora PostgreSQL Serverless** – Amazon Bedrock sets up an Amazon Aurora PostgreSQL Serverless vector store. This process takes unstructured text data from an Amazon S3 bucket, transforms it into text chunks and vectors, and then stores them in a PostgreSQL database. For more information, see [Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.quickcreatekb.html).
        + **Amazon Neptune Analytics** – Amazon Bedrock uses Retrieval Augmented Generation (RAG) techniques combined with graphs to enhance generative AI applications so that end users can get more accurate and comprehensive responses.
        + **Amazon S3 Vectors** – Amazon Bedrock Knowledge Bases creates an S3 vector bucket and a vector index that will store the embeddings generated from your data sources.

          You can create a knowledge base for Amazon S3 Vectors in all AWS Regions where both Amazon Bedrock and Amazon S3 Vectors are available. For region availability information, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html) in the *Amazon S3 User Guide*.
**Note**  
When using Amazon S3 Vectors with Amazon Bedrock Knowledge Bases, you can attach up to 1 KB of custom metadata (including both filterable and non-filterable metadata) and 35 metadata keys per vector. For detailed information about metadata limitations, see [Metadata support](knowledge-base-setup.md#metadata-support) in [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).
      + **Choose a vector store you have created** – Select a supported vector store and identify the vector field names and metadata field names in the vector index. For more information, see [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).
**Note**  
If your data source is a Confluence, Microsoft SharePoint, or Salesforce instance, the only supported vector store service is Amazon OpenSearch Serverless.

   1. (Optional) Expand the **Additional configurations** section and modify any relevant configurations.

1. If your data source contains images, specify an Amazon S3 URI in which to store the images that the parser will extract from the data in the **Multimodal storage destination**. The images can be returned during query. You can also optionally choose a customer managed key instead of the default AWS managed key to encrypt your data.
**Note**  
Multimodal data is only supported with Amazon S3 and custom data sources.
**Note**  
When using multimodal embedding models:  
+ Amazon Titan Multimodal Embeddings G1 requires an S3 content bucket and works best with image-only datasets using the default parser.
+ Cohere Embed v3 supports mixed text and image datasets and can be used with any parser configuration.
+ For image search use cases, avoid using Bedrock Data Automation (BDA) or foundation model parsers with Titan G1 due to token limitations.
+ The multimodal storage destination creates file copies for retrieval purposes, which can incur additional storage charges.

1. Choose **Next** and review the details of your knowledge base. You can edit any section before you create the knowledge base.
**Note**  
The time it takes to create the knowledge base depends on your specific configurations. When creation is complete, the status of the knowledge base changes to indicate that it is ready or available.  
Once your knowledge base is ready, sync your data source for the first time, and resync whenever you want to keep your content up to date. Select your knowledge base in the console and choose **Sync** within the data source overview section.
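The **Embeddings type** choice in the steps above (floating-point versus binary) trades precision for storage cost. As a rough, illustrative calculation (actual storage depends on the vector store and its index overhead):

```python
# Illustrative arithmetic only: approximate raw storage per vector for a
# 1,024-dimension embedding.
dims = 1024
float32_bytes = dims * 4   # float32 stores 4 bytes per component
binary_bytes = dims // 8   # binary embeddings pack 1 bit per component

print(float32_bytes, binary_bytes)
```

Binary embeddings cut raw vector storage by a factor of 32 in this example, at the cost of some retrieval precision.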

## Use the API
<a name="knowledge-base-create-api"></a>

To create a knowledge base, send a [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt).

**Note**  
If you prefer to let Amazon Bedrock create and manage a vector store for you, use the console. For more information, expand the **Use the console** section in this topic.

The following fields are required:



| Field | Basic description | 
| --- | --- | 
| name | A name for the knowledge base | 
| roleArn | The ARN of an [Amazon Bedrock Knowledge Bases service role](kb-permissions.md). | 
| knowledgeBaseConfiguration | Contains configurations for the knowledge base. See details below. | 
| storageConfiguration | (Only required if you're connecting to an unstructured data source.) Contains configurations for the vector store service that you choose. | 

The following fields are optional:



| Field | Use case | 
| --- | --- | 
| description | A description for the knowledge base. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 
| tags | To associate tags with the knowledge base. For more information, see [Tagging Amazon Bedrock resources](tagging.md). | 

In the `knowledgeBaseConfiguration` field, which maps to a [KnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseConfiguration.html) object, specify `VECTOR` in the `type` field and include a [VectorKnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorKnowledgeBaseConfiguration.html) object. In the object, include the following fields:
+ `embeddingModelArn` – The ARN of the embedding model to use.
+ `embeddingModelConfiguration` – Configurations for the embedding model. To see the possible values you can specify for each supported model, see [Supported models and Regions for Amazon Bedrock knowledge bases](knowledge-base-supported.md).
+ (If you plan to include multimodal data, which includes images, figures, charts, or tables, in your knowledge base) `supplementalDataStorageConfiguration` – Maps to a [SupplementalDataStorageLocation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_SupplementalDataStorageLocation.html) object, in which you specify the S3 location in which to store the extracted data. For more information, see [Parsing options for your data source](kb-advanced-parsing.md).

In the `storageConfiguration` field, which maps to a [StorageConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StorageConfiguration.html) object, specify the vector store that you plan to connect to in the `type` field and include the field that corresponds to that vector store. See each vector store configuration type at [StorageConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StorageConfiguration.html) for details about the information you need to provide.

The following shows an example request to create a knowledge base connected to an Amazon OpenSearch Serverless collection. The data from connected data sources will be converted into binary vector embeddings with Amazon Titan Text Embeddings V2, and multimodal data extracted by the parser will be stored in a bucket called *MyBucket*.

```
PUT /knowledgebases/ HTTP/1.1
Content-type: application/json

{
   "name": "MyKB",
   "description": "My knowledge base",
   "roleArn": "arn:aws:iam::111122223333:role/service-role/AmazonBedrockExecutionRoleForKnowledgeBase_123",
   "knowledgeBaseConfiguration": {
      "type": "VECTOR",
      "vectorKnowledgeBaseConfiguration": { 
         "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
         "embeddingModelConfiguration": { 
            "bedrockEmbeddingModelConfiguration": { 
               "dimensions": 1024,
               "embeddingDataType": "BINARY"
            }
         },
         "supplementalDataStorageConfiguration": { 
            "storageLocations": [ 
               { 
                  "s3Location": { 
                     "uri": "arn:aws:s3:::MyBucket"
                  },
                  "type": "S3"
               }
            ]
         }
      }
   },
   "storageConfiguration": { 
      "opensearchServerlessConfiguration": { 
         "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/abcdefghij1234567890",
         "fieldMapping": { 
            "metadataField": "metadata",
            "textField": "text",
            "vectorField": "vector"
         },
         "vectorIndexName": "MyVectorIndex"
      }
   }
}
```
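The same request expressed as parameters for the AWS SDK for Python (boto3) might look like the following sketch. The dictionary mirrors the JSON body above, using the same placeholder ARNs and names; with boto3 installed and credentials configured, you would pass it to the `bedrock-agent` client's `create_knowledge_base` method.

```python
# Sketch: parameters mirroring the example request above. Substitute your own
# resources for the placeholder ARNs, names, and index. To send the request:
#   boto3.client("bedrock-agent").create_knowledge_base(**params)
params = {
    "name": "MyKB",
    "description": "My knowledge base",
    "roleArn": "arn:aws:iam::111122223333:role/service-role/AmazonBedrockExecutionRoleForKnowledgeBase_123",
    "knowledgeBaseConfiguration": {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
            "embeddingModelConfiguration": {
                "bedrockEmbeddingModelConfiguration": {
                    "dimensions": 1024,
                    "embeddingDataType": "BINARY",
                }
            },
            "supplementalDataStorageConfiguration": {
                "storageLocations": [
                    {"s3Location": {"uri": "arn:aws:s3:::MyBucket"}, "type": "S3"}
                ]
            },
        },
    },
    "storageConfiguration": {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/abcdefghij1234567890",
            "vectorIndexName": "MyVectorIndex",
            "fieldMapping": {
                "metadataField": "metadata",
                "textField": "text",
                "vectorField": "vector",
            },
        },
    },
}
```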

**Topics**
+ [Connect a data source to your knowledge base](data-source-connectors.md)
+ [Customize ingestion for a data source](kb-data-source-customize-ingestion.md)
+ [Set up security configurations for your knowledge base](kb-create-security.md)

# Connect a data source to your knowledge base
<a name="data-source-connectors"></a>

After finishing the configurations for your knowledge base, you connect a supported data source to the knowledge base.

Amazon Bedrock Knowledge Bases supports connecting to unstructured data sources or to structured data stores through a query engine. Select a topic to learn how to connect to that type of data source:

**Multimodal content support**  
Multimodal content (images, audio, and video files) is only supported with Amazon S3 and custom data sources. Other data source types will skip multimodal files during ingestion. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

To learn how to connect to a data source using the Amazon Bedrock console, select the topic that corresponds to your data source type at the bottom of this page.

To connect to a data source using the Amazon Bedrock API, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt).

The following fields are required:



| Field | Basic description | 
| --- | --- | 
| knowledgeBaseId | The ID of the knowledge base. | 
| name | A name for the data source. | 
| dataSourceConfiguration | Specify the data source service or type in the type field and include the corresponding field. For more details about service-specific configurations, select the topic for the service from the topics at the bottom of this page. | 

The following fields are optional:



| Field | Use case | 
| --- | --- | 
| description | To provide a description for the data source. | 
| vectorIngestionConfiguration | Contains configurations for customizing the ingestion process. For more information, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md). | 
| dataDeletionPolicy | To specify whether to RETAIN the vector embeddings in the vector store or to DELETE them. | 
| serverSideEncryptionConfiguration | To encrypt transient data during data syncing with a customer managed key, specify its ARN in the kmsKeyArn field. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 

Select a topic to learn more about a service and configuring it.

**Topics**
+ [Connect to Amazon S3 for your knowledge base](s3-data-source-connector.md)
+ [Connect to Confluence for your knowledge base](confluence-data-source-connector.md)
+ [Connect to Microsoft SharePoint for your knowledge base](sharepoint-data-source-connector.md)
+ [Connect to Salesforce for your knowledge base](salesforce-data-source-connector.md)
+ [Crawl web pages for your knowledge base](webcrawl-data-source-connector.md)
+ [Connect your knowledge base to a custom data source](custom-data-source-connector.md)

# Connect to Amazon S3 for your knowledge base
<a name="s3-data-source-connector"></a>

Amazon S3 is an object storage service that stores data as objects within buckets. You can connect to your Amazon S3 bucket for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Multimodal content support**  
Amazon S3 data sources support multimodal content including images, audio, and video files. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

You can upload a small batch of files to an Amazon S3 bucket using the Amazon S3 console or API. You can alternatively use [AWS DataSync](https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html) to upload multiple files to S3 continuously, and transfer files on a schedule from on-premises, edge, other cloud, or AWS storage.

Currently, only general purpose S3 buckets are supported.

There are limits on the number of files that can be crawled and on the size (in MB) of each file. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-s3-connector)
+ [Prerequisites](#prerequisites-s3-connector)
+ [Connection configuration](#configuration-s3-connector)

## Supported features
<a name="supported-features-s3-connector"></a>
+ Document metadata fields
+ Inclusion prefixes
+ Incremental content syncs for added, updated, deleted content

## Prerequisites
<a name="prerequisites-s3-connector"></a>

**In Amazon S3, make sure you**:
+ Note the Amazon S3 bucket URI, Amazon Resource Name (ARN), and the AWS account ID for the owner of the bucket. You can find the URI and ARN in the properties section in the Amazon S3 console. Your bucket must be in the same Region as your Amazon Bedrock knowledge base. You must have permission to access the bucket.

**In your AWS account, make sure you**:
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, the IAM role with all the required permissions can be created for you as part of the steps for creating a knowledge base. After you have configured your data source and other configurations, the IAM role with all the required permissions is applied to your specific knowledge base.

## Connection configuration
<a name="configuration-s3-connector"></a>

To connect to your Amazon S3 bucket, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-s3-connector).

An example of a configuration for this data source is included in this section.

For more information about inclusion prefixes, document metadata fields, incremental syncing, and how they work, see the following sections:

### Document metadata fields
<a name="ds-s3-metadata-fields"></a>

You can include a separate file that specifies the document metadata fields/attributes for each file in your Amazon S3 data source and whether to include them in the embeddings when indexing the data source into the vector store. For example, you can create a file in the following format, name it *fileName.extension.metadata.json* and upload it to your S3 bucket.

```
{
  "metadataAttributes": {
    "company": {
      "value": {
        "type": "STRING",
        "stringValue": "BioPharm Innovations"
      },
      "includeForEmbedding": true
    },
    "created_date": {
      "value": {
        "type": "NUMBER",
        "numberValue": 20221205
      },
      "includeForEmbedding": true
    },
    "author": {
      "value": {
        "type": "STRING",
        "stringValue": "Lisa Thompson"
      },
      "includeForEmbedding": true
    },
    "origin": {
      "value": {
        "type": "STRING",
        "stringValue": "Overview"
      },
      "includeForEmbedding": true
    }
  }
}
```

The metadata file must use the same name as its associated source document file, with `.metadata.json` appended onto the end of the file name. The metadata file must be stored in the same folder or location as the source file in your Amazon S3 bucket. The file must not exceed the limit of 10 KB. For information on the supported attribute/field data types and the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).
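The naming rule and size limit above can be sketched as a small helper. This is illustrative code, not part of any AWS SDK; `metadata_key_for` and `check_metadata_size` are hypothetical names.

```python
import json

def metadata_key_for(source_key: str) -> str:
    """Return the S3 key of the sidecar metadata file for a source document.

    The metadata file keeps the full source file name (including its
    extension) and appends .metadata.json.
    """
    return source_key + ".metadata.json"

def check_metadata_size(metadata: dict) -> int:
    """Return the serialized size in bytes; raise if it exceeds the 10 KB limit."""
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > 10 * 1024:
        raise ValueError(f"metadata is {size} bytes; the limit is 10 KB")
    return size

# For example, documents/overview.pdf pairs with
# documents/overview.pdf.metadata.json in the same S3 location.
```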

### Inclusion prefixes
<a name="ds-s3-inclusion-exclusion"></a>

You can specify an inclusion prefix, which is an Amazon S3 path prefix, so that the data source connector crawls only a specific folder or file in your bucket instead of the entire bucket.

### Incremental syncing
<a name="ds-s3-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).
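Programmatic syncing with `StartIngestionJob` is typically paired with polling until the job reaches a terminal status. The following is a sketch, assuming boto3 (create the client with `boto3.client("bedrock-agent")` and pass it in); the helper name and its parameters are illustrative.

```python
import time

def sync_data_source(client, knowledge_base_id, data_source_id, poll_seconds=10):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )
    job_id = job["ingestionJob"]["ingestionJobId"]
    while True:
        status = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]["status"]
        if status in ("COMPLETE", "FAILED", "STOPPED"):
            return status
        time.sleep(poll_seconds)
```

Passing the client in (rather than creating it inside the helper) keeps the sketch easy to test and reuse across Regions or credential profiles.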

------
#### [ Console ]

**To connect an Amazon S3 bucket to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Amazon S3** as the data source.

1. Provide a name for the data source.

1. Specify whether the Amazon S3 bucket is in your current AWS account or another AWS account. Your bucket must be in the same Region as the knowledge base.

1. (Optional) If the Amazon S3 bucket is encrypted with a KMS key, include the key. For more information, see [Permissions to decrypt your AWS KMS key for your data sources in Amazon S3](encryption-kb.md#encryption-kb-ds).

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Amazon S3 for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "S3-connector" \
 --description "S3 data source connector for Amazon Bedrock to use content in S3" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://s3-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
                    
s3-bedrock-connector-configuration.json
{
    "s3Configuration": {
	    "bucketArn": "arn:aws:s3:::bucket-name",
	    "bucketOwnerAccountId": "000000000000",
	    "inclusionPrefixes": [
	        "documents/"
	    ]
    },
    "type": "S3"	
}
```

------

# Connect to Confluence for your knowledge base
<a name="confluence-data-source-connector"></a>

Atlassian Confluence is a collaborative work-management tool designed for sharing, storing, and working on project planning, software development, and product management. You can connect to your Confluence instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
The Confluence data source connector is in preview release and is subject to change.  
Confluence data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Amazon Bedrock supports connecting to Confluence Cloud instances. Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits on the number of files that can be crawled and on the size (in MB) of each file. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-confluence-connector)
+ [Prerequisites](#prerequisites-confluence-connector)
+ [Connection configuration](#configuration-confluence-connector)

## Supported features
<a name="supported-features-confluence-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ OAuth 2.0 authentication, authentication with Confluence API token

## Prerequisites
<a name="prerequisites-confluence-connector"></a>

**In Confluence, make sure you**:
+ Take note of your Confluence instance URL. For example, for Confluence Cloud, *https://example.atlassian.net*. The URL for Confluence Cloud must be the base URL, ending with *.atlassian.net*.
+ Configure basic authentication credentials containing a username (email of admin account) and password (Confluence API token) to allow Amazon Bedrock to connect to your Confluence Cloud instance. For information about how to create a Confluence API token, see [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/#Create-an-API-token) on the Atlassian website.
+ (Optional) Configure an OAuth 2.0 application with credentials of an app key, app secret, access token, and refresh token. For more information, see [OAuth 2.0 apps](https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/) on the Atlassian website.
+ Certain read permissions or scopes must be enabled for your OAuth 2.0 app to connect to Confluence.

  Confluence API:
  + offline_access
  + read:content:confluence – View detailed contents 
  + read:content-details:confluence – View content details 
  + read:space-details:confluence – View space details
  + read:audit-log:confluence – View audit records 
  + read:page:confluence – View pages 
  + read:attachment:confluence – View and download content attachments 
  + read:blogpost:confluence – View blogposts 
  + read:custom-content:confluence – View custom content 
  + read:comment:confluence – View comments 
  + read:template:confluence – View content templates 
  + read:label:confluence – View labels 
  + read:watcher:confluence – View content watchers 
  + read:relation:confluence – View entity relationships 
  + read:user:confluence – View user details 
  + read:configuration:confluence – View Confluence settings 
  + read:space:confluence – View space details 
  + read:space.property:confluence – View space properties 
  + read:user.property:confluence – View user properties 
  + read:space.setting:confluence – View space settings 
  + read:analytics.content:confluence – View analytics for content
  + read:content.property:confluence – View content properties
  + read:content.metadata:confluence – View content summaries 
  + read:inlinetask:confluence – View tasks 
  + read:task:confluence – View tasks 
  + read:whiteboard:confluence – View whiteboards 
  + read:app-data:confluence – Read app data 
  + read:folder:confluence – View folders
  + read:embed:confluence – View Smart Link data

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to include the key-values pairs that must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other configurations, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do not recommend that you re-use credentials and secrets across data sources.

## Connection configuration
<a name="configuration-confluence-connector"></a>

To connect to your Confluence instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-confluence-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how they work, see the following sections:

### Auto detection of main document fields
<a name="ds-confluence-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, then Amazon Bedrock could respond using sensitive information.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document field "epoch_modification_time", or the number of seconds that have passed since January 1, 1970, records when the document was last updated. You can filter on the most recent data, where "epoch_modification_time" is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).

### Inclusion/exclusion filters
<a name="ds-confluence-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out PDF files that contain "private" in the file name: `.*private.*\.pdf`
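As a quick check of how this pattern behaves, a small Python sketch (assuming the connector matches the pattern against the full attachment file name):

```python
import re

# Exclusion pattern from this section: any PDF with "private" in the file name.
exclusion = re.compile(r".*private.*\.pdf")

print(bool(exclusion.fullmatch("q3-private-report.pdf")))  # matches, so excluded
print(bool(exclusion.fullmatch("q3-public-report.pdf")))   # no match, so crawled
```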

You can apply inclusion/exclusion filters on the following content types:
+ `Space`: Unique space key
+ `Page`: Main page title
+ `Blog`: Main blog title
+ `Comment`: Comments that belong to a certain page or blog. Specify *Re: Page/Blog Title*
+ `Attachment`: Attachment file name with its extension

### Incremental syncing
<a name="ds-confluence-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.
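The sync call above takes only the knowledge base and data source identifiers. A hedged sketch of the request, with placeholder IDs (the commented-out SDK call assumes boto3 and valid AWS credentials):

```python
import json

# Placeholder identifiers for your knowledge base and its data source.
params = {
    "knowledgeBaseId": "your-knowledge-base-id",
    "dataSourceId": "your-data-source-id",
}

# With the AWS SDK for Python (boto3):
#   client = boto3.client("bedrock-agent")
#   response = client.start_ingestion_job(**params)
# Equivalent AWS CLI call:
#   aws bedrock-agent start-ingestion-job \
#       --knowledge-base-id your-knowledge-base-id \
#       --data-source-id your-data-source-id
print(json.dumps(params, indent=2))
```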

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-confluence-secret-auth-credentials"></a>

(If using basic authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `username`: *admin user email address of Atlassian account*
+ `password`: *Confluence API token*

(If using OAuth 2.0 authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `confluenceAppKey`: *app key*
+ `confluenceAppSecret`: *app secret*
+ `confluenceAccessToken`: *app access token*
+ `confluenceRefreshToken`: *app refresh token*
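For example, the OAuth 2.0 secret string can be assembled as a JSON object before storing it (all values below are placeholders for your actual credentials):

```python
import json

# Placeholder credentials; store the real values in AWS Secrets Manager,
# for example with: aws secretsmanager create-secret --secret-string "$SECRET"
secret_string = json.dumps({
    "confluenceAppKey": "your-app-key",
    "confluenceAppSecret": "your-app-secret",
    "confluenceAccessToken": "your-access-token",
    "confluenceRefreshToken": "your-refresh-token",
})
print(secret_string)
```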

**Note**  
The Confluence OAuth 2.0 **access** token has a default expiry time of 60 minutes. If this token expires while your data source is syncing (during a sync job), Amazon Bedrock uses the provided **refresh** token to regenerate the access token. This regeneration refreshes both the access and refresh tokens. To keep the tokens up to date from one sync job to the next, Amazon Bedrock requires write/put permissions for your secret credentials as part of your knowledge base IAM role.
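Because token regeneration rewrites the secret, the knowledge base role needs write access to the secret in addition to read access. A sketch of such an IAM statement, with a placeholder secret ARN (verify the exact required permissions in [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html)):

```
{
    "Effect": "Allow",
    "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue"
    ],
    "Resource": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-Confluence"
}
```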

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.

------
#### [ Console ]

**Connect a Confluence instance to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Confluence** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your Confluence instance URL. For example, for Confluence Cloud, *https://example.atlassian.net*. The URL for Confluence Cloud must be the base URL, ending with *.atlassian.net*.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your Confluence instance:
   + For basic authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the admin user email address of the Atlassian account as the username and a Confluence API token in place of a password. For information about how to create a Confluence API token, see [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/#Create-an-API-token) on the Atlassian website.
   + For OAuth 2.0 authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the Confluence app key, app secret, access token, and refresh token. For more information, see [OAuth 2.0 apps](https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/) on the Atlassian website.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose whether to use filters/regular expression patterns to include or exclude certain content. All standard content is crawled otherwise.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Confluence Cloud for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "Confluence Cloud/SaaS connector" \
 --description "Confluence Cloud/SaaS data source connector for Amazon Bedrock to use content in Confluence" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://confluence-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

**Contents of `confluence-bedrock-connector-configuration.json`**

```
{
    "confluenceConfiguration": {
        "sourceConfiguration": {
            "hostUrl": "https://example.atlassian.net",
            "hostType": "SAAS",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-Confluence"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "Attachment",
                            "inclusionFilters": [
                                ".*\\.pdf"
                            ],
                            "exclusionFilters": [
                                ".*private.*\\.pdf"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "CONFLUENCE"
}
```

------

# Connect to Microsoft SharePoint for your knowledge base
<a name="sharepoint-data-source-connector"></a>

Microsoft SharePoint is a collaborative web-based service for working on documents, web pages, web sites, lists, and more. You can connect to your SharePoint instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
SharePoint data source connector is in preview release and is subject to change.  
Microsoft SharePoint data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Amazon Bedrock supports connecting to SharePoint Online instances. Crawling OneNote documents is currently not supported. Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits to how many files and MB per file that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-sharepoint-connector)
+ [Prerequisites](#prerequisites-sharepoint-connector)
+ [Connection configuration](#configuration-sharepoint-connector)

## Supported features
<a name="supported-features-sharepoint-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ SharePoint App-Only authentication

## Prerequisites
<a name="prerequisites-sharepoint-connector"></a>

### SharePoint (Online)
<a name="prerequisites-sharepoint-connector-online"></a>

**In your SharePoint (Online), complete the following steps for using SharePoint App-Only authentication:**
+ Take note of your SharePoint Online site URL/URLs. For example, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.
+ Take note of the domain name of your SharePoint Online instance URL/URLs.
+ Copy your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal. For details, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).
**Note**  
For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website. 
+ Configure SharePoint App-Only credentials.
+ Copy the client ID and client secret value when granting permission to SharePoint App-Only. For more information, see [Granting access using SharePoint App-Only](https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs).
**Note**  
You do not need to set up any API permissions for SharePoint App-Only. However, you must configure APP permissions on the SharePoint side. For more information about the required APP permissions, see the Microsoft documentation on [Granting access using SharePoint App-Only](https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs).

### AWS account
<a name="prerequisites-sharepoint-connector-account"></a>

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to include the key-value pairs that must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other settings, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do not recommend that you re-use credentials and secrets across data sources.

## Connection configuration
<a name="configuration-sharepoint-connector"></a>

To connect to your SharePoint instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-sharepoint-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how these work, see the following:

### Auto detection of main document fields
<a name="ds-sharepoint-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, then Amazon Bedrock could respond using sensitive information.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document metadata field `epoch_modification_time` (the number of seconds that have passed since January 1, 1970) records when the document was last updated. You can filter on the most recent data, where `epoch_modification_time` is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).

### Inclusion/exclusion filters
<a name="ds-sharepoint-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out PDF files that contain "private" in the file name: `.*private.*\.pdf`

You can apply inclusion/exclusion filters on the following content types:
+ `Page`: Main page title
+ `Event`: Event name
+ `File`: File name with its extension for attachments and all document files

Crawling OneNote documents is currently not supported.

### Incremental syncing
<a name="ds-sharepoint-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-sharepoint-secret-auth-credentials"></a>

When using SharePoint App-Only authentication, your secret authentication credentials in AWS Secrets Manager must include these key-value pairs:
+ `clientId`: *client ID associated with your Microsoft Entra SharePoint application*
+ `clientSecret`: *client secret associated with your Microsoft Entra SharePoint application*
+ `sharePointClientId`: *client ID generated when registering your SharePoint app for App-Only authentication*
+ `sharePointClientSecret`: *client secret generated when registering your SharePoint app for App-Only authentication*
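As with the Confluence connector, the secret string is a JSON object. A minimal sketch with placeholder values for the four required keys:

```python
import json

# Placeholder values; store the real credentials in AWS Secrets Manager.
secret_string = json.dumps({
    "clientId": "your-entra-client-id",
    "clientSecret": "your-entra-client-secret",
    "sharePointClientId": "your-sharepoint-app-only-client-id",
    "sharePointClientSecret": "your-sharepoint-app-only-client-secret",
})
print(secret_string)
```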

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.

------
#### [ Console ]

**Connect a SharePoint instance to your knowledge base**<a name="connect-sharepoint-console"></a>

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **SharePoint** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your SharePoint site URL/URLs. For example, for SharePoint Online, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.

1. Provide the domain name of your SharePoint instance.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your SharePoint instance. For SharePoint App-Only authentication:

   1. Provide the tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory portal.

   1. Go to AWS Secrets Manager to add your secret credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the SharePoint client ID and SharePoint client secret generated when you registered the App-Only principal at the tenant level or the site level, and the Entra client ID and Entra client secret generated when you registered the app in Entra.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose whether to use filters/regular expression patterns to include or exclude certain content. All standard content is crawled otherwise.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to SharePoint Online for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "SharePoint Online connector" \
 --description "SharePoint Online data source connector for Amazon Bedrock to use content in SharePoint" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://sharepoint-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE"
```

**Contents of `sharepoint-bedrock-connector-configuration.json`**

```
{
    "sharePointConfiguration": {
        "sourceConfiguration": {
            "tenantId": "888d0b57-69f1-4fb8-957f-e1f0bedf64de",
            "hostType": "ONLINE",
            "domain": "yourdomain",
            "siteUrls": [
                "https://yourdomain.sharepoint.com/sites/mysite"
            ],
            "authType": "OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-SharePoint"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "File",
                            "inclusionFilters": [
                                ".*\\.pdf"
                            ],
                            "exclusionFilters": [
                                ".*private.*\\.pdf"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "SHAREPOINT"
}
```

------

**Important**  
OAuth 2.0 authentication is not recommended. We recommend that you use SharePoint App-Only authentication instead.

## Using OAuth 2.0
<a name="sharepoint-connector-oauth"></a>

Using OAuth 2.0, you can authenticate and authorize access to SharePoint resources for SharePoint connectors integrated with Knowledge Bases.

### Prerequisites
<a name="sharepoint-connector-oauth-prereq"></a>

**In SharePoint, for OAuth 2.0 authentication, make sure you**:
+ Take note of your SharePoint Online site URL/URLs. For example, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.
+ Take note of the domain name of your SharePoint Online instance URL/URLs.
+ Copy your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal or in your OAuth application.

  Take note of the username and password of the admin SharePoint account, and copy the client ID and client secret value when registering an application.
**Note**  
For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website. 
+ Certain read permissions are required to connect to SharePoint when you register an application.
  + SharePoint: AllSites.Read (Delegated) – Read items in all site collections
+ You might need to turn off **Security Defaults** in your Azure portal using an admin user. For more information on managing security default settings in the Azure portal, see [Microsoft documentation on how to enable/disable security defaults](https://learn.microsoft.com/en-us/microsoft-365/business-premium/m365bp-conditional-access?view=o365-worldwide&tabs=secdefaults#security-defaults-1).
+ You might need to turn off multi-factor authentication (MFA) in your SharePoint account, so that Amazon Bedrock is not blocked from crawling your SharePoint content.

To complete the prerequisites, make sure that you've completed the steps in [AWS account](#prerequisites-sharepoint-connector-account).

### Secret authentication credentials
<a name="sharepoint-secret-auth-credentials-oauth"></a>

For connection configuration with OAuth 2.0, you can perform the same steps for the auto detection of the main document fields, inclusion/exclusion filters, and incremental syncing as described in [Connection configuration](#configuration-sharepoint-connector).

**For OAuth 2.0 authentication, your secret authentication credentials in AWS Secrets Manager must include these key-value pairs**.
+ `username`: *SharePoint admin username*
+ `password`: *SharePoint admin password*
+ `clientId`: *OAuth app client ID*
+ `clientSecret`: *OAuth app client secret*

### Connect a SharePoint instance to your knowledge base
<a name="sharepoint-connector-oauth-using"></a>

To connect a SharePoint instance to your knowledge base when using OAuth2.0:
+ (Console) In the console, follow the same steps as described in [Connect a SharePoint instance to your knowledge base](https://docs.aws.amazon.com/bedrock/latest/userguide/sharepoint-data-source-connector.html#connect-sharepoint-console). When you provide the authentication information to connect to your SharePoint instance:
  + Provide the tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory portal.
  + Go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the SharePoint admin username and password, and your registered app client ID and client secret. For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website.
+ (API) The following is an example of using the `CreateDataSource` API to create your data source with your connection information for OAuth2.0.

  ```
  aws bedrock-agent create-data-source \
   --name "SharePoint Online connector" \
   --description "SharePoint Online data source connector for Amazon Bedrock to use content in SharePoint" \
   --knowledge-base-id "your-knowledge-base-id" \
   --data-source-configuration file://sharepoint-bedrock-connector-configuration.json \
   --data-deletion-policy "DELETE"
  ```

  **Contents of `sharepoint-bedrock-connector-configuration.json`**

  ```
  {
      "sharePointConfiguration": {
          "sourceConfiguration": {
              "tenantId": "888d0b57-69f1-4fb8-957f-e1f0bedf64de",
              "hostType": "ONLINE",
              "domain": "yourdomain",
              "siteUrls": [
                  "https://yourdomain.sharepoint.com/sites/mysite"
              ],
              "authType": "OAUTH2_CLIENT_CREDENTIALS",
              "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-SharePoint"
          },
          "crawlerConfiguration": {
              "filterConfiguration": {
                  "type": "PATTERN",
                  "patternObjectFilter": {
                      "filters": [
                          {
                              "objectType": "File",
                              "inclusionFilters": [
                                  ".*\\.pdf"
                              ],
                              "exclusionFilters": [
                                  ".*private.*\\.pdf"
                              ]
                          }
                      ]
                  }
              }
          }
      },
      "type": "SHAREPOINT"
  }
  ```

# Connect to Salesforce for your knowledge base
<a name="salesforce-data-source-connector"></a>

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can connect to your Salesforce instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
Salesforce data source connector is in preview release and is subject to change.  
Salesforce data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits to how many files and MB per file that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-salesforce-connector)
+ [Prerequisites](#prerequisites-salesforce-connector)
+ [Connection configuration](#configuration-salesforce-connector)

## Supported features
<a name="supported-features-salesforce-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ OAuth 2.0 authentication

## Prerequisites
<a name="prerequisites-salesforce-connector"></a>

**In Salesforce, make sure you**:
+ Take note of your Salesforce instance URL. For example, *https://company.salesforce.com/*. The instance must be running a Salesforce Connected App.
+ Create a Salesforce Connected App and configure client credentials. Then, for your selected app, copy the consumer key (client ID) and consumer secret (client secret) from the OAuth settings. For more information, see Salesforce documentation on [Create a Connected App](https://help.salesforce.com/s/articleView?id=sf.connected_app_create.htm&type=5) and [Configure a Connected App for the OAuth 2.0 Client Credentials](https://help.salesforce.com/s/articleView?id=sf.connected_app_client_credentials_setup.htm&type=5).
**Note**  
For Salesforce Connected Apps, under Client Credentials Flow, make sure you search and select the user’s name or alias for your client credentials in the “Run As” field.

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to include the key-value pairs that must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other configurations, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do not recommend that you reuse credentials and secrets across data sources.

## Connection configuration
<a name="configuration-salesforce-connector"></a>

To connect to your Salesforce instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-salesforce-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how these work, select the following:

### Auto detection of main document fields
<a name="ds-salesforce-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, then Amazon Bedrock could respond using sensitive information.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document field `epoch_modification_time`, the number of seconds that have passed since January 1, 1970, indicates when the document was last updated. You can filter on the most recent data, where `epoch_modification_time` is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).
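As an illustration, the epoch-based filter can be assembled in Python; this is a sketch under the assumption that you filter on a last-modified epoch field, and the threshold date is an arbitrary example:

```python
import json
from datetime import datetime, timezone

# Example: keep only documents modified after January 1, 2024,
# expressed as seconds since January 1, 1970 (the epoch).
threshold = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())

# A "greater than" retrieval filter on the last-modified epoch field.
retrieval_filter = {
    "greaterThan": {
        "key": "epoch_modification_time",
        "value": threshold,
    }
}

# This structure can be supplied as the filter element of a retrieval
# configuration when querying the knowledge base.
print(json.dumps(retrieval_filter))
```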

### Inclusion/exclusion filters
<a name="ds-salesforce-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out campaigns that contain "private" in the campaign name: `.*private.*`
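The precedence rule can be sketched in a few lines of Python; the patterns and names here are illustrative, not the connector's implementation:

```python
import re

# If both an inclusion and an exclusion pattern match, exclusion wins
# and the item is not crawled.
inclusion = re.compile(r".*campaign.*")
exclusion = re.compile(r".*private.*")

def should_crawl(name: str) -> bool:
    if exclusion.fullmatch(name):
        return False  # exclusion filter takes precedence
    return bool(inclusion.fullmatch(name))

print(should_crawl("summer_campaign"))          # matches inclusion only
print(should_crawl("private_summer_campaign"))  # matches both, so excluded
```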

You can apply inclusion/exclusion filters on the following content types:
+ `Account`: Account number/identifier
+ `Attachment`: Attachment file name with its extension
+ `Campaign`: Campaign name and associated identifiers
+ `ContentVersion`: Document version and associated identifiers
+ `Partner`: Partner information fields including associated identifiers
+ `Pricebook2`: Product/price list name
+ `Case`: Customer inquiry/issue number and other information fields including associated identifiers (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Contact`: Customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Contract`: Contract name and associated identifiers
+ `Document`: File name with its extension
+ `Idea`: Idea information fields and associated identifiers
+ `Lead`: Potential new customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Opportunity`: Pending sale/deal information fields and associated identifiers
+ `Product2`: Product information fields and associated identifiers
+ `Solution`: Solution name for a customer inquiry/issue and associated identifiers
+ `Task`: Task information fields and associated identifiers
+ `FeedItem`: Identifier of the chatter feed post
+ `FeedComment`: Identifier of the chatter feed post that the comments belong to
+ `Knowledge__kav`: Knowledge Article Title
+ `User`: User alias within your organization
+ `CollaborationGroup`: Chatter group name (unique)

### Incremental syncing
<a name="ds-salesforce-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-salesforce-secret-auth-credentials"></a>

(For OAuth 2.0 authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `consumerKey`: *app client ID*
+ `consumerSecret`: *app client secret*
+ `authenticationUrl`: *Salesforce instance URL or the URL to request the authentication token from*
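For example, the secret payload could be assembled as follows; this is a minimal sketch where the values are placeholders for your Connected App settings and the secret name is hypothetical:

```python
import json

# Key-value pairs required for OAuth 2.0 authentication (placeholder values).
secret_value = {
    "consumerKey": "your-connected-app-client-id",
    "consumerSecret": "your-connected-app-client-secret",
    "authenticationUrl": "https://company.salesforce.com/",
}

# With boto3, this payload could then be stored as:
#   boto3.client("secretsmanager").create_secret(
#       Name="AmazonBedrock-Salesforce",
#       SecretString=json.dumps(secret_value),
#   )
print(sorted(secret_value.keys()))
```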

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.

------
#### [ Console ]

**Connect a Salesforce instance to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Salesforce** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your Salesforce instance URL. For example, *https://company.salesforce.com/*. The instance must be running a Salesforce Connected App.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your Salesforce instance:

   1. For OAuth 2.0 authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the Salesforce Connected App consumer key (client ID), consumer secret (client secret), and the Salesforce instance URL or the URL to request the authentication token from. For more information, see Salesforce documentation on [Create a Connected App](https://help.salesforce.com/s/articleView?id=sf.connected_app_create.htm&type=5) and [Configure a Connected App for the OAuth 2.0 Client Credentials](https://help.salesforce.com/s/articleView?id=sf.connected_app_client_credentials_setup.htm&type=5).

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose whether to use filters/regular expression patterns to include or exclude certain content. All standard content is crawled otherwise.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Salesforce for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "Salesforce connector" \
 --description "Salesforce data source connector for Amazon Bedrock to use content in Salesforce" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://salesforce-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

The contents of `salesforce-bedrock-connector-configuration.json`:

```
{
    "salesforceConfiguration": {
        "sourceConfiguration": {
            "hostUrl": "https://company.salesforce.com/",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-Salesforce"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "Campaign",
                            "inclusionFilters": [
                                ".*public.*"
                            ],
                            "exclusionFilters": [
                                ".*private.*"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "SALESFORCE"
}
```

------

# Crawl web pages for your knowledge base
<a name="webcrawl-data-source-connector"></a>

The Amazon Bedrock provided Web Crawler connects to and crawls URLs you have selected for use in your Amazon Bedrock knowledge base. You can crawl website pages in accordance with your set scope or limits for your selected URLs. You can crawl website pages using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)). Currently, only the Amazon OpenSearch Serverless vector store is available to use with this data source.

**Note**  
The Web Crawler data source connector is in preview release and is subject to change.

When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl, and must respect robots.txt configurations.

The Web Crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html).

There are limits to the number of web page content items and the size (in MB) of each content item that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-webcrawl-connector)
+ [Prerequisites](#prerequisites-webcrawl-connector)
+ [Connection configuration](#configuration-webcrawl-connector)

## Supported features
<a name="supported-features-webcrawl-connector"></a>

The Web Crawler connects to and crawls HTML pages starting from the seed URL, traversing all child links under the same top primary domain and path. If any of the HTML pages reference supported documents, the Web Crawler fetches those documents, regardless of whether they are within the same top primary domain. You can modify the crawling behavior by changing the crawling configuration. For more information, see [Connection configuration](#configuration-webcrawl-connector).

With the Web Crawler, you can:
+ Select multiple source URLs to crawl and set the scope of URLs to crawl only the host or also include subdomains.
+ Crawl static web pages that are part of your source URLs.
+ Specify a custom user agent suffix to set rules for your own crawler.
+ Include or exclude certain URLs that match a filter pattern.
+ Respect standard robots.txt directives like 'Allow' and 'Disallow'.
+ Limit the scope of the URLs to crawl and optionally exclude URLs that match a filter pattern.
+ Limit the rate of crawling URLs and the maximum number of pages to crawl.
+ View the status of crawled URLs in Amazon CloudWatch.

## Prerequisites
<a name="prerequisites-webcrawl-connector"></a>

**To use the Web Crawler, make sure you:**
+ Check that you are authorized to crawl your source URLs.
+ Check that the robots.txt file corresponding to your source URLs doesn't block the URLs from being crawled. The Web Crawler adheres to the robots.txt standard: if no robots.txt file is found for the website, the crawler disallows crawling by default. The Web Crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html). You can also specify a custom user agent header suffix to set rules for your own crawler. For more information, see the Web Crawler URL access section in the [Connection configuration](#configuration-webcrawl-connector) instructions on this page.
+ [Enable CloudWatch Logs delivery](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-bases-logging.html) and follow examples of Web Crawler logs to view the status of your data ingestion job for ingesting web content, and if certain URLs cannot be retrieved.

**Note**  
When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl.

## Connection configuration
<a name="configuration-webcrawl-connector"></a>

For more information about sync scope for crawling URLs, inclusion/exclusion filters, URL access, incremental syncing, and how these work, select the following:

### Sync scope for crawling URLs
<a name="ds-sync-scope"></a>

You can limit the scope of the URLs to crawl based on each page URL's specific relationship to the seed URLs. For faster crawls, you can limit URLs to those with the same host and initial URL path as the seed URL. For broader crawls, you can choose to crawl URLs with the same host or within any subdomain of the seed URL.

You can choose from the following options:
+ Default: Limit crawling to web pages that belong to the same host and have the same initial URL path. For example, with a seed URL of "https://aws.amazon.com/bedrock/", only this path and web pages that extend from it are crawled, like "https://aws.amazon.com/bedrock/agents/". Sibling URLs like "https://aws.amazon.com/ec2/" are not crawled.
+ Host only: Limit crawling to web pages that belong to the same host. For example, with a seed URL of "https://aws.amazon.com/bedrock/", web pages under "https://aws.amazon.com" are also crawled, like "https://aws.amazon.com/ec2".
+ Subdomains: Include any web page that has the same primary domain as the seed URL. For example, with a seed URL of "https://aws.amazon.com/bedrock/", any web page whose host contains "amazon.com" is crawled, like "https://www.amazon.com".
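A rough Python sketch of the three scope options follows; the scope names mirror the console choices, and this is illustrative logic, not the crawler's actual implementation:

```python
from urllib.parse import urlparse

def in_scope(seed: str, candidate: str, scope: str) -> bool:
    """Decide whether a candidate URL falls inside the crawl scope."""
    s, c = urlparse(seed), urlparse(candidate)
    if scope == "DEFAULT":
        # Same host and the candidate path extends the seed path.
        return c.hostname == s.hostname and c.path.startswith(s.path)
    if scope == "HOST_ONLY":
        return c.hostname == s.hostname
    if scope == "SUBDOMAINS":
        # Same primary domain: compare the last two host labels.
        return c.hostname.split(".")[-2:] == s.hostname.split(".")[-2:]
    raise ValueError(scope)

seed = "https://aws.amazon.com/bedrock/"
print(in_scope(seed, "https://aws.amazon.com/bedrock/agents/", "DEFAULT"))  # in scope
print(in_scope(seed, "https://aws.amazon.com/ec2/", "DEFAULT"))             # out of scope
print(in_scope(seed, "https://aws.amazon.com/ec2/", "HOST_ONLY"))           # in scope
print(in_scope(seed, "https://www.amazon.com/", "SUBDOMAINS"))              # in scope
```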

**Note**  
Make sure you are not crawling a potentially excessive number of web pages. We don't recommend crawling large websites, such as wikipedia.org, without filters or scope limits, because doing so can take a very long time.  
[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type.

The Web Crawler supports static websites.

You can also limit the rate at which URLs are crawled to throttle the crawling speed. You set the maximum number of URLs crawled per host per minute. In addition, you can set the maximum total number of web pages to crawl (up to 25,000). Note that if the total number of web pages from your source URLs exceeds your set maximum, your data source sync/ingestion job fails.
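These two settings together imply a lower bound on sync time, which the following illustrative arithmetic sketches; real crawl time depends on many other factors:

```python
def min_crawl_minutes(pages_per_host: int, rate_per_host_per_minute: int) -> float:
    """Lower bound on crawl time for one host, given the throttle rate."""
    return pages_per_host / rate_per_host_per_minute

# Example: 25,000 pages on a single host at 50 URLs per minute.
print(min_crawl_minutes(25_000, 50))  # 500 minutes, i.e. over 8 hours
```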

### Inclusion/exclusion filters
<a name="ds-inclusion-exclusion"></a>

You can include or exclude certain URLs in accordance with your scope. [Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type. If you specify an inclusion and exclusion filter and both match a URL, the exclusion filter takes precedence and the web content isn’t crawled.

**Important**  
Problematic regular expression pattern filters that lead to [catastrophic backtracking](https://docs.aws.amazon.com/codeguru/detector-library/python/catastrophic-backtracking-regex/) and look ahead are rejected.

An example of a regular expression filter pattern to exclude URLs that end with ".pdf", such as PDF web page attachments: `.*\.pdf$`
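A pattern that excludes URLs ending in ".pdf" can be checked against a few sample URLs (hypothetical site) with Python's `re` module:

```python
import re

# Exclusion pattern for URLs that end with ".pdf".
pdf_filter = re.compile(r".*\.pdf$")

urls = [
    "https://example.com/docs/guide.pdf",
    "https://example.com/docs/guide.html",
]

# Keep only URLs that the exclusion pattern does not match.
print([u for u in urls if not pdf_filter.fullmatch(u)])
```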

### Web Crawler URL access
<a name="ds-webcrawl-identity-crawling"></a>

You can use the Web Crawler to crawl the pages of websites that you are authorized to crawl.

When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl.

The Web Crawler respects robots.txt in accordance with the [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)

You can specify certain user agent bots to either ‘Allow’ or ‘Disallow’ the user agent to crawl your source URLs. You can modify the robots.txt file of your website to control how the Web Crawler crawls your source URLs. The crawler first looks for `bedrockbot-UUID` rules and then for generic `bedrockbot` rules in the robots.txt file.

You can also add a user agent suffix that can be used to allowlist your crawler in bot protection systems. This suffix should not be added to the `robots.txt` file, so that no one can impersonate your user agent string. For example, to allow the Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directive:

```
User-agent: bedrockbot-UUID # Amazon Bedrock Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
```
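You can check directives like these with Python's standard `urllib.robotparser`; here "1234" stands in for the UUID assigned to your crawler:

```python
from urllib.robotparser import RobotFileParser

# robots.txt that allows only the Bedrock crawler and disallows all others.
robots_txt = """
User-agent: bedrockbot-1234
Allow: /

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# The Bedrock crawler may fetch any page; other bots may fetch none.
print(parser.can_fetch("bedrockbot-1234", "https://example.com/any/page"))  # True
print(parser.can_fetch("somebot", "https://example.com/any/page"))          # False
```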

### Incremental syncing
<a name="ds-incremental-sync"></a>

Each time the Web Crawler runs, it retrieves content for all URLs that are reachable from the source URLs and that match the scope and filters. For incremental syncs after the first sync of all content, Amazon Bedrock updates your knowledge base with new and modified content, and removes old content that is no longer present. Occasionally, the crawler might not be able to tell whether content was removed from the website; in this case, it errs on the side of preserving old content in your knowledge base.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

------
#### [ Console ]

**Connect a Web Crawler data source to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Web Crawler** as the data source.

1. Provide a name and optional description for the data source.

1. Provide the **Source URLs** that you want to crawl. You can add up to 9 additional URLs by selecting **Add Source URLs**. By providing a source URL, you confirm that you are authorized to crawl its domain.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. (Optional) Provide a user agent suffix for **bedrockbot-UUID-** that identifies the crawler or bot when it accesses a web server.

1. Configure the following in the **Sync scope** section:

   1. Select a **Website domain range** for crawling your source URLs:
      + Default: Limit crawling to web pages that belong to the same host and have the same initial URL path. For example, with a seed URL of "https://aws.amazon.com/bedrock/", only this path and web pages that extend from it are crawled, like "https://aws.amazon.com/bedrock/agents/". Sibling URLs like "https://aws.amazon.com/ec2/" are not crawled.
      + Host only: Limit crawling to web pages that belong to the same host. For example, with a seed URL of "https://aws.amazon.com/bedrock/", web pages under "https://aws.amazon.com" are also crawled, like "https://aws.amazon.com/ec2".
      + Subdomains: Include any web page that has the same primary domain as the seed URL. For example, with a seed URL of "https://aws.amazon.com/bedrock/", any web page whose host contains "amazon.com" is crawled, like "https://www.amazon.com".
**Note**  
Make sure you are not crawling a potentially excessive number of web pages. We don't recommend crawling large websites, such as wikipedia.org, without filters or scope limits, because doing so can take a very long time.  
[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope, as long as there's no exclusion pattern for the file type.

   1. Enter the **Maximum throttling of crawling speed**. Ingest between 1 and 300 URLs per host per minute. A higher crawling speed increases the load but takes less time.

   1. Enter a **Maximum pages for data source sync** between 1 and 25,000. This limits the number of web pages crawled from your source URLs. If the web pages exceed this number, the data source sync fails and no web pages are ingested.

   1. For **URL Regex** patterns (optional) you can add **Include patterns** or **Exclude patterns** by entering the regular expression pattern in the box. You can add up to 25 include and 25 exclude filter patterns by selecting **Add new pattern**. The include and exclude patterns are crawled in accordance with your scope. If there's a conflict, the exclude pattern takes precedence.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

To connect a knowledge base to a data source using WebCrawler, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt), specify `WEB` in the `type` field of the [DataSourceConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DataSourceConfiguration.html), and include the `webConfiguration` field. The following is an example of a configuration of Web Crawler for your Amazon Bedrock knowledge base.

```
{
    "webConfiguration": {
        "sourceConfiguration": {
            "urlConfiguration": {
                "seedUrls": [{
                    "url": "https://www.examplesite.com"
                }]
            }
        },
        "crawlerConfiguration": {
            "crawlerLimits": {
                "rateLimit": 50,
                "maxPages": 100
            },
            "scope": "HOST_ONLY",
            "inclusionFilters": [
                "https://www\\.examplesite\\.com/.*\\.html"
            ],
            "exclusionFilters": [
                "https://www\\.examplesite\\.com/contact-us\\.html"
            ],
            "userAgent": "CustomUserAgent"
        }
    },
    "type": "WEB"
}
```

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

------

# Connect your knowledge base to a custom data source
<a name="custom-data-source-connector"></a>

Instead of choosing a supported data source service, you can connect to a custom data source for the following advantages:
+ Flexibility and control over the data types that you want your knowledge base to have access to.
+ The ability to use the `KnowledgeBaseDocuments` API operations to directly ingest or delete documents without the need to sync changes.
+ The ability to view documents in your data source directly through the Amazon Bedrock console or API.
+ The ability to upload documents into the data source directly in the AWS Management Console or to add them inline.
+ The ability to add metadata directly to each document for when adding or updating a document in the data source. For more information on how to use metadata for filtering when retrieving information from a data source, see the **Metadata and filtering** tab in [Configure and customize queries and response generation](kb-test-config.md).

**Multimodal content support**  
Custom data sources support multimodal content, including images, audio, and video files up to 10 MB when Base64-encoded. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

To connect a knowledge base to a custom data source, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the `knowledgeBaseId` of the knowledge base to connect to, give a `name` to the data source, and specify the `type` field in the `dataSourceConfiguration` as `CUSTOM`. The following shows a minimal example to create this data source:

```
PUT /knowledgebases/KB12345678/datasources/ HTTP/1.1
Content-type: application/json

{
    "name": "MyCustomDataSource",
    "dataSourceConfiguration": {
        "type": "CUSTOM"
    }
}
```

You can include any of the following optional fields to configure the data source:


****  

| Field | Use case | 
| --- | --- | 
| description | To provide a description for the data source. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 
| serverSideEncryptionConfiguration | To specify a custom KMS key for transient data storage while converting your data into embeddings. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion). | 
| dataDeletionPolicy | To configure what happens to the vector embeddings for your data source in your vector store if you delete the data source. Specify RETAIN to keep the embeddings in the vector store, or DELETE (the default) to remove them. | 
| vectorIngestionConfiguration | To configure options for ingestion of the data source. See below for more information. | 

The `vectorIngestionConfiguration` field maps to a [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) object containing the following fields:
+ chunkingConfiguration – To configure the strategy to use for chunking the documents in the data source. For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
+ parsingConfiguration – To configure the strategy to use for parsing the data source. For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
+ customTransformationConfiguration – To customize how the data is transformed and to apply a Lambda function for greater customization. For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

After setting up your custom data source, you can add documents into it and directly ingest them into the knowledge base. Unlike other data sources, you don't need to sync a custom data source. To learn how to ingest documents directly, see [Ingest changes directly into a knowledge base](kb-direct-ingestion.md).
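As an illustrative sketch, the request body shown earlier can also be assembled programmatically, for example before passing its fields to an SDK call such as the boto3 `bedrock-agent` client's `create_data_source`. The following Python builds the minimal payload together with two of the optional fields from the table; the knowledge base ID, name, and description are placeholder values:

```python
import json

# Minimal CreateDataSource payload for a custom data source, mirroring the
# HTTP example earlier. The IDs, name, and description are placeholders.
payload = {
    "knowledgeBaseId": "KB12345678",
    "name": "MyCustomDataSource",
    "dataSourceConfiguration": {"type": "CUSTOM"},
    # Optional fields from the table above:
    "description": "Documents ingested directly through the KnowledgeBaseDocuments APIs",
    "dataDeletionPolicy": "RETAIN",  # keep vector embeddings if the data source is deleted
}

print(json.dumps(payload, indent=4))
```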

# Customize ingestion for a data source
<a name="kb-data-source-customize-ingestion"></a>

You can customize vector ingestion when connecting a data source in the AWS Management Console or by modifying the value of the `vectorIngestionConfiguration` field when sending a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request.

Select a topic to learn how to include configurations for customizing ingestion when connecting to a data source:

**Topics**
+ [Choose the tool to use for parsing](#kb-data-source-customize-parsing)
+ [Choose a chunking strategy](#kb-data-source-customize-chunking)
+ [Use a Lambda function during ingestion](#kb-data-source-customize-lambda)

## Choose the tool to use for parsing
<a name="kb-data-source-customize-parsing"></a>

You can customize how the documents in your data are parsed. To learn about options for parsing data in Amazon Bedrock Knowledge Bases, see [Parsing options for your data source](kb-advanced-parsing.md).

**Warning**  
You can't change the parsing strategy after connecting to the data source. To use a different parsing strategy, you can add a new data source.  
You can't add an S3 location to store multimodal data (including images, figures, charts, and tables) after you've created a knowledge base. If you want to include multimodal data and use a parser that supports it, you must create a new knowledge base.

The steps involved in choosing a parsing strategy depend on whether you use the AWS Management Console or the Amazon Bedrock API, and on the parsing method you choose. If you choose a parsing method that supports multimodal data, you must specify an S3 URI in which to store the multimodal data extracted from your documents. This data can be returned in knowledge base queries.
+ In the AWS Management Console, do the following:

  1. Select the parsing strategy when you connect to a data source while setting up a knowledge base or when you add a new data source to your existing knowledge base.

  1. (If you choose Amazon Bedrock Data Automation or a foundation model as your parsing strategy) Specify an S3 URI in which to store the multimodal data extracted from your documents in the **Multimodal storage destination** section when you select an embeddings model and configure your vector store. You can also optionally use a customer managed key to encrypt your S3 data at this step.
+ In the Amazon Bedrock API, do the following:

  1. (If you plan to use Amazon Bedrock Data Automation or a foundation model as your parsing strategy) Include a [SupplementalDataStorageLocation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_SupplementalDataStorageLocation.html) in the [VectorKnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorKnowledgeBaseConfiguration.html) of a [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html) request.

  1. Include a [ParsingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ParsingConfiguration.html) in the `parsingConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) in the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request.
**Note**  
If you omit this configuration, Amazon Bedrock Knowledge Bases uses the Amazon Bedrock default parser.

For more details about how to specify a parsing strategy in the API, expand the section that corresponds to the parsing strategy that you want to use:

### Amazon Bedrock default parser
<a name="w2aac32c10c23c15c17c11c13b1"></a>

To use the default parser, don't include a `parsingConfiguration` field within the `VectorIngestionConfiguration`.

### Amazon Bedrock Data Automation parser (preview)
<a name="w2aac32c10c23c15c17c11c13b3"></a>

To use the Amazon Bedrock Data Automation parser, specify `BEDROCK_DATA_AUTOMATION` in the `parsingStrategy` field of the `ParsingConfiguration` and include a [BedrockDataAutomationConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_BedrockDataAutomationConfiguration.html) in the `bedrockDataAutomationConfiguration` field, as in the following format:

```
{
    "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
    "bedrockDataAutomationConfiguration": {
        "parsingModality": "string"
    }
}
```

### Foundation model
<a name="w2aac32c10c23c15c17c11c13b5"></a>

To use a foundation model as a parser, specify `BEDROCK_FOUNDATION_MODEL` in the `parsingStrategy` field of the `ParsingConfiguration` and include a [BedrockFoundationModelConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_BedrockFoundationModelConfiguration.html) in the `bedrockFoundationModelConfiguration` field, as in the following format:

```
{
    "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
    "bedrockFoundationModelConfiguration": {
        "modelArn": "string",
        "parsingModality": "string",
        "parsingPrompt": {
            "parsingPromptText": "string"
        }
    }
}
```
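As a sketch of the format above, the following Python assembles a foundation-model `parsingConfiguration`. The model ARN and prompt text are hypothetical placeholders, and `MULTIMODAL` is assumed as the `parsingModality` value:

```python
# Illustrative sketch: assemble a parsingConfiguration for foundation-model
# parsing. The model ARN and prompt text below are hypothetical placeholders.
def foundation_model_parsing(model_arn, prompt_text, modality="MULTIMODAL"):
    return {
        "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
        "bedrockFoundationModelConfiguration": {
            "modelArn": model_arn,
            "parsingModality": modality,
            "parsingPrompt": {"parsingPromptText": prompt_text},
        },
    }

config = foundation_model_parsing(
    "arn:aws:bedrock:us-east-1::foundation-model/example-parser-model",  # hypothetical ARN
    "Transcribe the text content and describe any images in this document.",
)
```

This object goes in the `parsingConfiguration` field of the `VectorIngestionConfiguration` in your `CreateDataSource` request.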

## Choose a chunking strategy
<a name="kb-data-source-customize-chunking"></a>

You can customize how the documents in your data are chunked for storage and retrieval. To learn about options for chunking data in Amazon Bedrock Knowledge Bases, see [How content chunking works for knowledge bases](kb-chunking.md).

**Warning**  
You can't change the chunking strategy after connecting to the data source.

In the AWS Management Console, you choose the chunking strategy when connecting to a data source. With the Amazon Bedrock API, you include a [ChunkingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ChunkingConfiguration.html) in the `chunkingConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html).

**Note**  
If you omit this configuration, Amazon Bedrock splits your content into chunks of approximately 300 tokens, while preserving sentence boundaries.

Expand the section that corresponds to the chunking strategy that you want to use:

### No chunking
<a name="w2aac32c10c23c15c17c13c13b1"></a>

To treat each document in your data source as a single source chunk, specify `NONE` in the `chunkingStrategy` field of the `ChunkingConfiguration`, as in the following format:

```
{
    "chunkingStrategy": "NONE"
}
```

### Fixed-size chunking
<a name="w2aac32c10c23c15c17c13c13b3"></a>

To divide each document in your data source into chunks of approximately the same size, specify `FIXED_SIZE` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include a [FixedSizeChunkingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_FixedSizeChunkingConfiguration.html) in the `fixedSizeChunkingConfiguration` field, as in the following format:

```
{
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": number,
        "overlapPercentage": number
    }
}
```

### Hierarchical chunking
<a name="w2aac32c10c23c15c17c13c13b5"></a>

To divide each document in your data source into a hierarchy of chunks, where smaller child chunks at the second level are derived from larger parent chunks at the first level, specify `HIERARCHICAL` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include the `hierarchicalChunkingConfiguration` field, as in the following format:

```
{
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [{
            "maxTokens": number
        }],
        "overlapTokens": number
    }
}
```

### Semantic chunking
<a name="w2aac32c10c23c15c17c13c13b7"></a>

To divide each document in your data source into chunks that prioritize semantic meaning over syntactic structure, specify `SEMANTIC` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include the `semanticChunkingConfiguration` field, as in the following format:

```
{
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "breakpointPercentileThreshold": number,
        "bufferSize": number,
        "maxTokens": number
    }
}
```
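The chunking configurations above can be sketched as small Python helpers that return the corresponding `chunkingConfiguration` objects. The numeric values used here are illustrative, not recommendations:

```python
# Illustrative builders for the chunkingConfiguration shapes shown above.

def fixed_size(max_tokens, overlap_percentage):
    return {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {
            "maxTokens": max_tokens,
            "overlapPercentage": overlap_percentage,
        },
    }

def hierarchical(parent_tokens, child_tokens, overlap_tokens):
    return {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            # One entry per level: parent chunks first, then child chunks.
            "levelConfigurations": [
                {"maxTokens": parent_tokens},
                {"maxTokens": child_tokens},
            ],
            "overlapTokens": overlap_tokens,
        },
    }

def semantic(breakpoint_percentile_threshold, buffer_size, max_tokens):
    return {
        "chunkingStrategy": "SEMANTIC",
        "semanticChunkingConfiguration": {
            "breakpointPercentileThreshold": breakpoint_percentile_threshold,
            "bufferSize": buffer_size,
            "maxTokens": max_tokens,
        },
    }
```

For example, `fixed_size(300, 20)` reproduces behavior close to the default described earlier (chunks of approximately 300 tokens).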

## Use a Lambda function during ingestion
<a name="kb-data-source-customize-lambda"></a>

You can use a Lambda function to post-process how the source chunks from your data are written to the vector store, in the following ways:
+ Include chunking logic to provide a custom chunking strategy.
+ Include logic to specify chunk-level metadata.

To learn about writing a custom Lambda function for ingestion, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md). In the AWS Management Console, you choose the Lambda function when connecting to a data source. With the Amazon Bedrock API, you include a [CustomTransformationConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CustomTransformationConfiguration.html) in the `customTransformationConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) and specify the ARN of the Lambda function, as in the following format:

```
{
    "transformations": [{
        "transformationFunction": {
            "transformationLambdaConfiguration": {
                "lambdaArn": "string"
            }
        },
        "stepToApply": "POST_CHUNKING"
    }],
    "intermediateStorage": {
        "s3Location": {
            "uri": "string"
        }
    }
}
```

You also specify the S3 location in which to store the output after applying the Lambda function.

You can include the `chunkingConfiguration` field to apply the Lambda function after applying one of the chunking options that Amazon Bedrock offers.
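Putting these pieces together, the following sketch shows a complete `vectorIngestionConfiguration` that applies fixed-size chunking followed by a post-chunking Lambda transformation. The Lambda ARN and S3 URI are placeholder values:

```python
# Illustrative vectorIngestionConfiguration combining a built-in chunking
# strategy with a post-chunking Lambda transformation. The Lambda ARN and
# S3 URI are placeholders.
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
    },
    "customTransformationConfiguration": {
        "transformations": [{
            "transformationFunction": {
                "transformationLambdaConfiguration": {
                    "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:my-chunk-processor"
                }
            },
            "stepToApply": "POST_CHUNKING",
        }],
        # S3 location where the output of the Lambda function is stored.
        "intermediateStorage": {
            "s3Location": {"uri": "s3://amzn-s3-demo-bucket/intermediate/"}
        },
    },
}
```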

# Set up security configurations for your knowledge base
<a name="kb-create-security"></a>

After you've created a knowledge base, you might have to set up the following security configurations:

**Topics**
+ [Set up data access policies for your knowledge base](#kb-create-security-data)
+ [Set up network access policies for your Amazon OpenSearch Serverless knowledge base](#kb-create-security-network)

## Set up data access policies for your knowledge base
<a name="kb-create-security-data"></a>

If you're using a [custom role](kb-permissions.md), set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.

------
#### [ Amazon OpenSearch Serverless ]

To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:
+ Use the Amazon OpenSearch Service console by following the steps at [Creating data access policies (console)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-console) in the Amazon OpenSearch Service Developer Guide.
+ Use the AWS API by sending a [CreateAccessPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_CreateAccessPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). For an AWS CLI example, see [Creating data access policies (AWS CLI)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-cli).

Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:

```
[
    {
        "Description": "${data access policy description}",
        "Rules": [
          {
            "Resource": [
              "index/${collection_name}/*"
            ],
            "Permission": [
                "aoss:DescribeIndex",
                "aoss:ReadDocument",
                "aoss:WriteDocument"
            ],
            "ResourceType": "index"
          }
        ],
        "Principal": [
            "arn:aws:iam::${account-id}:role/${kb-service-role}"
        ]
    }
]
```
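As a sketch, the following Python fills in the placeholders of the policy above with hypothetical values and serializes it, since the `CreateAccessPolicy` operation expects the policy document as a JSON string. The collection name, account ID, and role name are placeholders:

```python
import json

# Hypothetical values; substitute your own collection and service role.
collection_name = "my-kb-collection"
kb_role_arn = "arn:aws:iam::123456789012:role/my-kb-service-role"

policy = [{
    "Description": "Knowledge base access to the vector index",
    "Rules": [{
        "Resource": [f"index/{collection_name}/*"],
        "Permission": [
            "aoss:DescribeIndex",
            "aoss:ReadDocument",
            "aoss:WriteDocument",
        ],
        "ResourceType": "index",
    }],
    "Principal": [kb_role_arn],
}]

# CreateAccessPolicy takes the policy document as a JSON string.
policy_json = json.dumps(policy)
```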

------
#### [ Pinecone, Redis Enterprise Cloud or MongoDB Atlas ]

To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:AssociateThirdPartyKnowledgeBase"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:${secret-id}"
            }
        }
    }]
}
```

------

------

## Set up network access policies for your Amazon OpenSearch Serverless knowledge base
<a name="kb-create-security-network"></a>

If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you [set up your Amazon OpenSearch Serverless vector collection](knowledge-base-setup.md) or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.

The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for a private Amazon OpenSearch Serverless collection:
+ For more information about setting up a VPC endpoint for a private Amazon OpenSearch Serverless collection, see [Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vpc.html).
+ For more information about network access policies in Amazon OpenSearch Serverless, see [Network access for Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-network.html).

To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/](https://console.aws.amazon.com/aos/).

1. From the left navigation pane, select **Collections**. Then choose your collection.

1. In the **Network** section, select the **Associated Policy**.

1. Choose **Edit**.

1. For **Select policy definition method**, do one of the following:
   + Leave **Select policy definition method** as **Visual editor** and configure the following settings in the **Rule 1** section:

     1. (Optional) In the **Rule name** field, enter a name for the network access rule.

     1. Under **Access collections from**, select **Private (recommended)**.

     1. Select **AWS service private access**. In the text box, enter **bedrock.amazonaws.com**.

     1. Unselect **Enable access to OpenSearch Dashboards**.
   + Choose **JSON** and paste the following policy in the **JSON editor**.

     ```
     [
         {                                        
             "AllowFromPublic": false,
             "Description":"${network access policy description}",
             "Rules":[
                 {
                     "ResourceType": "collection",
                     "Resource":[
                         "collection/${collection-id}"
                     ]
                 }
             ],
             "SourceServices":[
                 "bedrock.amazonaws.com"
             ]
         }
     ]
     ```

1. Choose **Update**.

------
#### [ API ]

To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:

1. Send a [GetSecurityPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_GetSecurityPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). Specify the `name` of the policy and specify the `type` as `network`. Note the `policyVersion` in the response.

1. Send an [UpdateSecurityPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_UpdateSecurityPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). Minimally, specify the `name` of the policy, the `type` as `network`, the `policyVersion` from the previous step, and the updated `policy` document, as in the following example:

   ```
   [
       {                                        
           "AllowFromPublic": false,
           "Description":"${network access policy description}",
           "Rules":[
               {
                   "ResourceType": "collection",
                   "Resource":[
                       "collection/${collection-id}"
                   ]
               }
           ],
           "SourceServices":[
               "bedrock.amazonaws.com"
           ]
       }
   ]
   ```

For an AWS CLI example, see [Creating network policies (AWS CLI)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-network.html#serverless-network-cli).
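As a sketch of this two-step flow, the following Python assembles the parameters for the `UpdateSecurityPolicy` call, reusing a `policyVersion` of the kind returned by `GetSecurityPolicy`. The policy name, version string, and collection ID are placeholders:

```python
import json

# Updated network policy allowing Amazon Bedrock as a source service.
# The collection ID is a placeholder.
network_policy = [{
    "AllowFromPublic": False,
    "Description": "Allow Amazon Bedrock to reach the private collection",
    "Rules": [{
        "ResourceType": "collection",
        "Resource": ["collection/abc123example"],
    }],
    "SourceServices": ["bedrock.amazonaws.com"],
}]

# Parameters for UpdateSecurityPolicy. The policy document is passed as a
# JSON string; the version value comes from the GetSecurityPolicy response.
update_params = {
    "name": "my-network-policy",
    "type": "network",
    "policyVersion": "policy-version-from-get-security-policy",  # placeholder
    "policy": json.dumps(network_policy),
}
```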

------

# Sync your data with your Amazon Bedrock knowledge base
<a name="kb-data-source-sync-ingest"></a>

After you create your knowledge base, you ingest or sync your data so that the data can be queried. Ingestion converts the raw data in your data source into vector embeddings, based on the vector embeddings model and configurations you specified.

Before you begin ingestion, check that your data source fulfills the following conditions:
+ You have configured the connection information for your data source. To configure a data source connector to crawl your data from your data source repository, see [Supported data source connectors](https://docs.aws.amazon.com/bedrock/latest/userguide/data-source-connectors.html). You configure your data source as part of creating your knowledge base.
+ You have configured your chosen vector embeddings model and vector store. See [supported vector embeddings models](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html) and [vector stores for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html). You configure your vector embeddings as part of creating your knowledge base.
+ The files are in supported formats. For more information, see [Supported document formats](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-supported-doc-formats-limits).
+ The files don't exceed the **Ingestion job file size** specified in [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html) in the AWS General Reference.
+ If your data source contains metadata files, check the following conditions to ensure that the metadata files aren't ignored:
  + Each `.metadata.json` file shares the same file name and extension as the source file that it's associated with.
  + If the vector index for your knowledge base is in an Amazon OpenSearch Serverless vector store, check that the vector index is configured with the `faiss` engine. If the vector index is configured with the `nmslib` engine, you'll have to do one of the following:
    + [Create a new knowledge base](knowledge-base-create.md) in the console and let Amazon Bedrock automatically create a vector index in Amazon OpenSearch Serverless for you.
    + [Create another vector index](knowledge-base-setup.md) in the vector store and select `faiss` as the **Engine**. Then [create a new knowledge base](knowledge-base-create.md) and specify the new vector index.
  + If the vector index for your knowledge base is in an Amazon Aurora database cluster, we recommend that you use the custom metadata field to store all your metadata in a single column and create an index on this column. If you do not provide the custom metadata field, you must check that the table for your index contains a column for each metadata property in your metadata files before starting ingestion. For more information, see [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).

Each time you add, modify, or remove files from your data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes added, modified, or deleted documents since the last sync.

## How a knowledge base handles resyncs
<a name="kb-data-source-sync-ingest-resync"></a>

When you sync a data source, Amazon Bedrock re-ingests documents to ensure accuracy and consistency. Re-ingestion includes parsing, chunking, generating embeddings, and indexing into the vector store.


**Sync scenarios**  

| Scenario | What happens | 
| --- | --- | 
| No changes detected | The document is skipped. | 
| Content or metadata changed | The document is re-ingested (re-parsed, re-chunked, re-embedded, and re-indexed). | 
| New document added | Only the new document is ingested. | 
| Document deleted | The document is removed from the vector store. | 

### Metadata-only optimization
<a name="kb-data-source-sync-ingest-metadata-optimization"></a>

In certain cases, Amazon Bedrock can update metadata without re-ingesting the document associated with that metadata file. This optimization retrieves existing vector embeddings from the vector store, merges the new metadata, and writes the updated embeddings back, which avoids calls to the embedding model.

This optimization applies only when all of the following conditions are met:
+ Only `.metadata.json` files are modified. No content files are changed.
+ The associated content files are not CSV files.
+ The data source does not use a custom transformation Lambda function.

### Re-ingestion behavior for CSV files
<a name="kb-data-source-sync-ingest-csv-reingestion"></a>

CSV files use the `documentStructureConfiguration` field in metadata to control which columns are indexed. Because Amazon Bedrock cannot determine whether this structural configuration changed without re-processing the file, CSV files are always re-ingested when their metadata files are updated.
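The resync rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior, not the service's implementation:

```python
def sync_action(content_changed, metadata_changed, is_csv=False, uses_custom_lambda=False):
    """Illustrative model of the resync rules described above."""
    if not content_changed and not metadata_changed:
        return "skip"                   # no changes detected
    if content_changed:
        return "re-ingest"              # re-parse, re-chunk, re-embed, re-index
    # Metadata-only change: the optimization applies unless the document is a
    # CSV file or the data source uses a custom transformation Lambda function.
    if is_csv or uses_custom_lambda:
        return "re-ingest"
    return "metadata-only update"
```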

To learn how to ingest your data into your knowledge base and sync with your latest data, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To ingest your data into your knowledge base and sync with your latest data**

1. Open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock/](https://console.aws.amazon.com/bedrock/).

1. From the left navigation pane, select **Knowledge base** and choose your knowledge base.

1. In the **Data source** section, select **Sync** to begin ingesting or syncing your latest data. To stop a data source that is currently syncing, select **Stop**; you can stop a sync only while it is in progress. You can select **Sync** again to ingest the rest of your data.

1. When data ingestion completes successfully, a green success banner appears.
**Note**  
After data syncing completes, it could take a few minutes for the vector embeddings of the newly synced data to be reflected in your knowledge base and become available for querying if you use a vector store other than Amazon Aurora (RDS).

1. You can choose a data source to view its **Sync history**. Select **View warnings** to see why a data ingestion job failed.

------
#### [ API ]

To ingest your data into your knowledge base and sync it with your latest data, send a [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the `knowledgeBaseId` and `dataSourceId`. You can also stop a data ingestion job that is currently running by sending a [StopIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StopIngestionJob.html) request. Specify the `dataSourceId`, `ingestionJobId`, and `knowledgeBaseId`. A data ingestion job must be running in order to stop it. You can send a `StartIngestionJob` request again to ingest the rest of your data when you're ready.

Use the `ingestionJobId` returned in the response in a [GetIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_GetIngestionJob.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) to track the status of the ingestion job. In addition, specify the `knowledgeBaseId` and `dataSourceId`.
+ When the ingestion job finishes, the `status` in the response is `COMPLETE`.
**Note**  
After data ingestion completes, it could take a few minutes for the vector embeddings of the newly ingested data to be available in the vector store for querying if you use a vector store other than Amazon Aurora (RDS).
+ The `statistics` object in the response returns information about whether ingestion was successful or not for documents in the data source.
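The status-tracking step above amounts to polling `GetIngestionJob` until the job reaches a terminal status. In the following self-contained sketch, the API call is replaced by a stub that returns canned responses; in practice you would call the operation through an SDK such as boto3:

```python
import time

# Statuses after which polling can stop.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}

def wait_for_ingestion(get_ingestion_job, poll_interval=0):
    """Poll until the ingestion job reaches a terminal status and return it."""
    while True:
        status = get_ingestion_job()["ingestionJob"]["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval)

# Stubbed responses simulating a job that finishes on the third poll.
responses = iter([
    {"ingestionJob": {"status": "STARTING"}},
    {"ingestionJob": {"status": "IN_PROGRESS"}},
    {"ingestionJob": {"status": "COMPLETE"}},
])
result = wait_for_ingestion(lambda: next(responses))
```

With a real client, `get_ingestion_job` would be a callable that sends the `GetIngestionJob` request with your `knowledgeBaseId`, `dataSourceId`, and `ingestionJobId`, and `poll_interval` would be a sensible number of seconds.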

You can also see information for all ingestion jobs for a data source by sending a [ListIngestionJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListIngestionJobs.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the `dataSourceId` and the `knowledgeBaseId` of the knowledge base into which the data is being ingested.
+ Filter for results by specifying a status to search for in the `filters` object.
+ Sort by the time that the job was started or the status of a job by specifying the `sortBy` object. You can sort in ascending or descending order.
+ Set the maximum number of results to return in a response in the `maxResults` field. If there are more results than the number you set, the response returns a `nextToken` that you can send in another [ListIngestionJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListIngestionJobs.html) request to see the next batch of jobs.

------

# Ingest changes directly into a knowledge base
<a name="kb-direct-ingestion"></a>

Amazon Bedrock Knowledge Bases allows you to modify your data source and sync the changes in one step. You can take advantage of this feature if your knowledge base is connected to one of the following types of data sources:
+ Amazon S3
+ Custom

With direct ingestion, you can add, update, or delete files in a knowledge base in a single action, and your knowledge base has access to the documents without the need to sync. Direct ingestion uses the `KnowledgeBaseDocuments` API operations to index the documents that you submit directly into the vector store set up for the knowledge base. You can also view the documents in your knowledge base directly with these operations, rather than needing to navigate to the connected data source to view them.

## Differences from syncing a data source
<a name="kb-direct-ingestion-sync-diff"></a>

Amazon Bedrock Knowledge Bases also offers a set of `IngestionJob` API operations that relate to [syncing your data source](kb-data-source-sync-ingest.md). When you sync your data source with a [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) request, Amazon Bedrock Knowledge Bases scans each document in the connected data source and verifies whether it has already been indexed into the vector store set up for the knowledge base. If it hasn't been, the document is indexed into the vector store.

With an [IngestKnowledgeBaseDocuments](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_IngestKnowledgeBaseDocuments.html) request, you submit an array of documents to be directly indexed into the vector store. Therefore, you skip the step of adding documents into the data source. See the following paragraphs to understand the use case for these two sets of API operations:

**If you use a custom data source**  
You don't need to sync or use the `IngestionJob` operations. Documents that you add, modify, or delete with the `KnowledgeBaseDocuments` operations or in the AWS Management Console become part of both the custom data source and your knowledge base.

**If you use an Amazon S3 data source**  
You use the two sets of operations in different use cases:
+ After connecting the knowledge base to the S3 data source for the first time, you must sync your data source in the AWS Management Console or by submitting a [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) request through the Amazon Bedrock API.
+ Index documents into the vector store set up for your knowledge base or remove the indexed documents in the following ways:

  1. Add documents into your S3 location or delete documents from it. Then sync your data source in the AWS Management Console or submit a `StartIngestionJob` request in the API. For details about syncing and the `StartIngestionJob` operation, see [Sync your data with your Amazon Bedrock knowledge base](kb-data-source-sync-ingest.md).

  1. Ingest S3 documents into the knowledge base directly with an `IngestKnowledgeBaseDocuments` request. For details about directly ingesting documents, see [Ingest documents directly into a knowledge base](kb-direct-ingestion-add.md).
**Warning**  
For S3 data sources, any changes that you index into the knowledge base directly in the AWS Management Console or with the `KnowledgeBaseDocuments` API operations aren't reflected in the S3 location. You can use these API operations to make changes to your knowledge base immediately available in a single step. However, you should follow up by making the same changes in your S3 location so that they aren't overwritten the next time you sync your data source in the AWS Management Console or with `StartIngestionJob`.  
Don't submit an `IngestKnowledgeBaseDocuments` and `StartIngestionJob` request at the same time.

Select a topic to learn how to perform direct ingestion of the documents in your data sources:

**Topics**
+ [Differences from syncing a data source](#kb-direct-ingestion-sync-diff)
+ [Prerequisites for direct ingestion](kb-direct-ingestion-prereq.md)
+ [Ingest documents directly into a knowledge base](kb-direct-ingestion-add.md)
+ [View information about documents in your data source](kb-direct-ingestion-view.md)
+ [Delete documents from a knowledge base directly](kb-direct-ingestion-delete.md)

# Prerequisites for direct ingestion
<a name="kb-direct-ingestion-prereq"></a>

To use direct ingestion, an IAM role must have permissions to use the `KnowledgeBaseDocuments` API operations. If your IAM role has the [AmazonBedrockFullAccess](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonBedrockFullAccess) AWS managed policy attached, you can skip this section.

The following policy can be attached to an IAM role to allow it to perform direct ingestion on the knowledge bases that you specify in the `Resource` field.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DirectIngestion",
            "Effect": "Allow",
            "Action": [
                "bedrock:StartIngestionJob",
                "bedrock:IngestKnowledgeBaseDocuments",
                "bedrock:GetKnowledgeBaseDocuments",
                "bedrock:ListKnowledgeBaseDocuments",
                "bedrock:DeleteKnowledgeBaseDocuments"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1:123456789012:knowledge-base/${KnowledgeBaseId}"
            ]
        }
    ]
}
```

------
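If you manage policies in code, the statement above can be rendered for a specific knowledge base. The following Python sketch fills in the Region, account ID, and knowledge base ID; the function name is ours, and attaching the resulting document to a role is left to your usual IAM tooling:

```python
import json

def direct_ingestion_policy(region, account_id, knowledge_base_id):
    """Build the direct-ingestion IAM policy document for one knowledge base."""
    kb_arn = f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{knowledge_base_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DirectIngestion",
                "Effect": "Allow",
                "Action": [
                    "bedrock:StartIngestionJob",
                    "bedrock:IngestKnowledgeBaseDocuments",
                    "bedrock:GetKnowledgeBaseDocuments",
                    "bedrock:ListKnowledgeBaseDocuments",
                    "bedrock:DeleteKnowledgeBaseDocuments",
                ],
                "Resource": [kb_arn],
            }
        ],
    }

policy = direct_ingestion_policy("us-east-1", "123456789012", "KB12345678")
print(json.dumps(policy, indent=4))
```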

To further restrict permissions, you can omit actions, or you can specify resources and condition keys by which to filter permissions. For more information about actions, resources, and condition keys, see the following topics in the *Service Authorization Reference*:
+ [Actions defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-actions-as-permissions) – Learn about actions, the resource types that you can scope them to in the `Resource` field, and the condition keys that you can filter permissions on in the `Condition` field.
+ [Resource types defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) – Learn about the resource types in Amazon Bedrock.
+ [Condition keys for Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-policy-keys) – Learn about the condition keys in Amazon Bedrock.

# Ingest documents directly into a knowledge base
<a name="kb-direct-ingestion-add"></a>

This topic describes how to ingest documents directly into a knowledge base. Restrictions apply for the types of documents that you can directly ingest depending on your data source. Refer to the following table for restrictions on the methods that you can use to specify the documents to ingest:



| Data source type | Document defined in-line | Document in Amazon S3 location | 
| --- | --- | --- | 
| Amazon S3 | ![\[Red circular icon with an X symbol, indicating cancellation or denial.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/icon-no.png)No | ![\[Green circular icon with a white checkmark symbol inside.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/icon-yes.png)Yes | 
| Custom | ![\[Green circular icon with a white checkmark symbol inside.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/icon-yes.png)Yes | ![\[Green circular icon with a white checkmark symbol inside.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/icon-yes.png)Yes | 

Expand the section that corresponds to your use case:

**Note**  
When you use the console, you can ingest up to 10 documents directly into your knowledge base. If you use the `IngestKnowledgeBaseDocuments` API instead, you can ingest up to 25 documents into your knowledge base. For more information about this quota, see the [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the *AWS General Reference guide*.

## Use the console
<a name="kb-direct-ingestion-add-console"></a>

To add or modify documents directly in the AWS Management Console, do the following:

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, select the knowledge base to ingest documents into.

1. In the **Data source** section, select the data source for which you want to add, modify, or delete documents.

1. In the **Documents** section, choose **Add documents**. Then, do one of the following:
   + To add or modify a document directly, select **Add documents directly**. Then, do the following:

     1. In the **Document identifier** field, specify a unique name for the document. If you specify a name that already exists in the data source, the document will be replaced.

     1. To upload a document, select **Upload**. To define a document inline, select **Add document inline**, choose a format, and enter the text of the document in the box.

     1. (Optional) To associate metadata with the document, select **Add metadata** and enter a key, type, and value.
   + To add or modify a document by specifying its S3 location, select **Add S3 documents**. Then, do the following:

     1. In the **Document identifier** field, specify a unique name for the document. If you specify a name that already exists in the data source, the document will be replaced.

     1. Specify whether the **S3 location** of the document is in your current AWS account or a different one. Then specify the S3 URI of the document.

     1. (Optional) To associate metadata with the document, choose a **Metadata source**. Specify the S3 URI of the metadata or select **Add metadata** and enter a key, type, and value.

1. To ingest the document and any associated metadata, choose **Add**.

## Use the API
<a name="kb-direct-ingestion-add-api"></a>

To ingest documents directly into a knowledge base using the Amazon Bedrock API, send an [IngestKnowledgeBaseDocuments](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_IngestKnowledgeBaseDocuments.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the ID of the knowledge base and of the data source that it's connected to.

**Note**  
If you specify a document identifier or S3 location that already exists in the knowledge base, the document will be overwritten with the new content.

The request body contains one field, `documents`, that maps to an array of [KnowledgeBaseDocument](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseDocument.html) objects, each of which represents the content and optional metadata of a document to add to the data source and to ingest into the knowledge base. A [KnowledgeBaseDocument](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseDocument.html) object contains the following fields:
+ `content` – Maps to a [DocumentContent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentContent.html) object containing information about the content of the document to add.
+ `metadata` – (Optional) Maps to a [DocumentMetadata](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentMetadata.html) object containing information about the metadata of the document to add. For more information about how to use metadata during retrieval, see the **Metadata and filtering** section in [Configure and customize queries and response generation](kb-test-config.md).
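To see how these fields fit together before the per-data-source details, the following Python sketch assembles a request body for a custom data source with inline text content and one inline metadata attribute. The boto3 client and method name (`bedrock-agent`, `ingest_knowledge_base_documents`) are our assumptions about the SDK and are left commented out so the sketch stays self-contained:

```python
# Assemble the body of an IngestKnowledgeBaseDocuments request for a custom
# data source: one document with inline text content and inline metadata.
document = {
    "content": {
        "dataSourceType": "CUSTOM",
        "custom": {
            "customDocumentIdentifier": {"id": "MyDocument"},
            "inlineContent": {
                "textContent": {"data": "Hello world!"},
                "type": "TEXT",
            },
            "sourceType": "IN_LINE",
        },
    },
    "metadata": {
        "inlineAttributes": [
            {"key": "genre", "value": {"stringValue": "pop", "type": "STRING"}}
        ],
        "type": "IN_LINE_ATTRIBUTE",
    },
}

request = {
    "knowledgeBaseId": "KB12345678",   # placeholder IDs
    "dataSourceId": "DS12345678",
    "documents": [document],
}

# With boto3 (not run here), this request would be sent roughly as:
# client = boto3.client("bedrock-agent")
# client.ingest_knowledge_base_documents(**request)
```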

Select a topic to learn how to ingest documents for different data source types or to see examples:

**Topics**
+ [Ingest a document into a knowledge base connected to a custom data source](#kb-direct-ingestion-add-custom)
+ [Ingest a document into a knowledge base connected to an Amazon S3 data source](#kb-direct-ingestion-add-s3)
+ [Example request bodies](#w2aac32c10c23c19c17c11b3c19)

### Ingest a document into a knowledge base connected to a custom data source
<a name="kb-direct-ingestion-add-custom"></a>

If the `dataSourceId` you specify belongs to a custom data source, you can add content and metadata for each [KnowledgeBaseDocument](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseDocument.html) object in the `documents` array.

The content of a document added to a custom data source can be defined in the following ways:

#### Define the document in-line
<a name="kb-direct-ingestion-add-custom-inline"></a>

You can define the following types of documents in-line:

------
#### [ Text ]

If the document is text, the [DocumentContent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentContent.html) object should be in the following format:

```
{ 
    "custom": { 
        "customDocumentIdentifier": { 
            "id": "string"
        },
        "inlineContent": { 
            "textContent": { 
                "data": "string"
            },
            "type": "TEXT"
        },
        "sourceType": "IN_LINE"
    },
    "dataSourceType": "CUSTOM"
}
```

Include an ID for the document in the `id` field and the text of the document in the `data` field.

------
#### [ Bytes ]

If the document contains more than text, convert it into a Base64-string. The [DocumentContent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentContent.html) object should then be in the following format:

```
{ 
    "custom": { 
        "customDocumentIdentifier": { 
            "id": "string"
        },
        "inlineContent": { 
            "byteContent": { 
                "data": blob,
                "mimeType": "string"
            },
            "type": "BYTE"
        },
        "sourceType": "IN_LINE"
    },
    "dataSourceType": "CUSTOM"
}
```

Include an ID for the document in the `id` field, the Base64-encoded document in the `data` field, and the MIME type in the `mimeType` field.

------
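Producing the Base64 string for `byteContent` is a standard encoding step: read the raw document bytes, Base64-encode them, and decode the result to an ASCII string. A minimal Python sketch follows; the PDF bytes are a stand-in for a real file read:

```python
import base64

# Encode raw document bytes (e.g. a PDF read from disk) as a Base64 string.
raw_bytes = b"%PDF-1.7 example bytes"   # stand-in for open("doc.pdf", "rb").read()
b64_data = base64.b64encode(raw_bytes).decode("ascii")

# The resulting inlineContent fragment for the DocumentContent object.
byte_content = {
    "byteContent": {
        "data": b64_data,
        "mimeType": "application/pdf",
    },
    "type": "BYTE",
}

# Round trip to confirm the encoding is lossless.
assert base64.b64decode(b64_data) == raw_bytes
```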

#### Ingest the document from S3
<a name="w2aac32c10c23c19c17c11b3c15b7b3"></a>

If you're ingesting a document from an S3 location, the [DocumentContent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentContent.html) object in the `content` field should be of the following form:

```
{ 
    "custom": { 
        "customDocumentIdentifier": { 
            "id": "string"
        },
        "s3Location": { 
            "bucketOwnerAccountId": "string",
            "uri": "string"
        },
        "sourceType": "S3"
    },
    "dataSourceType": "CUSTOM"
}
```

Include an ID for the document in the `id` field, the AWS account ID of the owner of the S3 bucket that contains the document in the `bucketOwnerAccountId` field, and the S3 URI of the document in the `uri` field.

The metadata for a document can be defined in the following ways:

#### Define the metadata in-line
<a name="w2aac32c10c23c19c17c11b3c15c11b1"></a>

If you define the metadata inline, the [DocumentMetadata](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentMetadata.html) object in the `metadata` field should be in the following format:

```
{ 
    "inlineAttributes": [ 
        { 
            "key": "string",
            "value": { 
                "stringValue": "string",
                "booleanValue": boolean,
                "numberValue": number,
                "stringListValue": [ "string" ],
                "type": "STRING" | "BOOLEAN" | "NUMBER" | "STRING_LIST"
            }
        }
    ],
    "type": "IN_LINE_ATTRIBUTE"
}
```

For each attribute that you add, define the key in the `key` field. Specify the data type of the value in the `type` field and include the field that corresponds to the data type. For example, if you include a string, the attribute would be in the following format:

```
{ 
    "key": "string",
    "value": { 
        "stringValue": "string",
        "type": "STRING"
    }
}
```
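As an illustration, a small helper (hypothetical, not part of any SDK) can map native Python values onto the attribute format above, choosing the `type` from the value's Python type:

```python
def to_inline_attribute(key, value):
    """Map a native Python value to an inlineAttributes entry (hypothetical helper)."""
    if isinstance(value, bool):   # check bool before int: bool is a subclass of int
        return {"key": key, "value": {"booleanValue": value, "type": "BOOLEAN"}}
    if isinstance(value, (int, float)):
        return {"key": key, "value": {"numberValue": value, "type": "NUMBER"}}
    if isinstance(value, list):
        return {"key": key, "value": {"stringListValue": value, "type": "STRING_LIST"}}
    return {"key": key, "value": {"stringValue": str(value), "type": "STRING"}}

metadata = {
    "inlineAttributes": [
        to_inline_attribute("genre", "pop"),
        to_inline_attribute("year", 1988),
    ],
    "type": "IN_LINE_ATTRIBUTE",
}
```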

#### Ingest the metadata from S3
<a name="w2aac32c10c23c19c17c11b3c15c11b3"></a>

You can also ingest metadata from a file with the extension `.metadata.json` in an S3 location. For more information about the format of a metadata file, see the **Document metadata fields** section in [Connect to Amazon S3 for your knowledge base](s3-data-source-connector.md).

If the metadata is from an S3 file, the [DocumentMetadata](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentMetadata.html) object in the `metadata` field should be in the following format:

```
{
    "s3Location": { 
        "bucketOwnerAccountId": "string",
        "uri": "string"
    },
    "type": "S3_LOCATION"
}
```

Include the AWS account ID of the owner of the S3 bucket that contains the metadata file in the `bucketOwnerAccountId` field, and the S3 URI of the metadata file in the `uri` field.

**Warning**  
If you defined the content inline, you must define the metadata inline.

### Ingest a document into a knowledge base connected to an Amazon S3 data source
<a name="kb-direct-ingestion-add-s3"></a>

If the `dataSourceId` you specify belongs to an S3 data source, you can add content and metadata for each [KnowledgeBaseDocument](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseDocument.html) object in the `documents` array.

**Note**  
For S3 data sources, you can add content and metadata only from an S3 location.

The content of a document to ingest from your S3 data source should be specified in a [DocumentContent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentContent.html) object in the following format:

```
{ 
    "dataSourceType": "S3",
    "s3": { 
        "s3Location": { 
            "uri": "string"
        }
    }
}
```

Include the S3 URI of the document in the `uri` field.

The metadata for a document added to an S3 data source can be defined in the following format:

```
{
    "s3Location": { 
        "bucketOwnerAccountId": "string",
        "uri": "string"
    },
    "type": "S3_LOCATION"
}
```

**Warning**  
Documents that you ingest directly into a knowledge base connected to an S3 data source aren't added to the S3 bucket itself. We recommend that you add these documents to the S3 data source as well so that they aren't removed or overwritten if you sync your data source.

### Example request bodies
<a name="w2aac32c10c23c19c17c11b3c19"></a>

Expand the following sections to see request bodies for different use cases with `IngestKnowledgeBaseDocuments`:

#### Add a custom text document to a custom data source and ingest it
<a name="w2aac32c10c23c19c17c11b3c19b5b1"></a>

The following example shows the addition of one text document to a custom data source:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
   "documents": [ 
      { 
         "content": { 
            "dataSourceType": "CUSTOM",
            "custom": { 
               "customDocumentIdentifier": { 
                  "id": "MyDocument"
               },
               "inlineContent": { 
                  "textContent": { 
                     "data": "Hello world!"
                  },
                  "type": "TEXT"
               },
               "sourceType": "IN_LINE"
            }
         }
     }
   ]
}
```

#### Add a Base64-encoded document to a custom data source and ingest it
<a name="w2aac32c10c23c19c17c11b3c19b5b3"></a>

The following example shows the addition of a PDF document to a custom data source:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
   "documents": [ 
      { 
         "content": { 
            "dataSourceType": "CUSTOM",
            "custom": { 
               "customDocumentIdentifier": { 
                  "id": "MyDocument"
               },
               "inlineContent": { 
                  "byteContent": { 
                     "data": "<Base64-encoded string>",
                     "mimeType": "application/pdf"
                  },
                  "type": "BYTE"
               },
               "sourceType": "IN_LINE"
            }
         }
     }
   ]
}
```

#### Add a document from an S3 location to a knowledge base connected to a custom data source and ingest it
<a name="w2aac32c10c23c19c17c11b3c19b5b5"></a>

The following example shows the addition of one text document to a custom data source from an S3 location:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
   "documents": [ 
      { 
         "content": { 
            "dataSourceType": "CUSTOM",
            "custom": { 
               "customDocumentIdentifier": { 
                  "id": "MyDocument"
               },
               "s3Location": {
                   "bucketOwnerAccountId": "111122223333",
                   "uri": "s3://amzn-s3-demo-bucket/MyDocument.txt"
               },
               "sourceType": "S3"
            }
         }
     }
   ]
}
```

#### Add an inline document to a knowledge base connected to a custom data source and include metadata inline
<a name="w2aac32c10c23c19c17c11b3c19b5b7"></a>

The following example shows the inline addition to a custom data source of a document alongside metadata containing two attributes:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
   "documents": [ 
      { 
         "content": { 
            "dataSourceType": "CUSTOM",
            "custom": { 
               "customDocumentIdentifier": { 
                  "id": "MyDocument"
               },
               "inlineContent": { 
                  "textContent": { 
                     "data": "Hello world!"
                  },
                  "type": "TEXT"
               },
               "sourceType": "IN_LINE"
            }
         },
         "metadata": {
            "inlineAttributes": [ 
               { 
                  "key": "genre",
                  "value": {
                     "stringValue": "pop",
                     "type": "STRING"
                  }
               },
               { 
                  "key": "year",
                  "value": { 
                     "numberValue": 1988,
                     "type": "NUMBER"
                  }
               }
            ],
            "type": "IN_LINE_ATTRIBUTE"
         }
     }
   ]
}
```

#### Add a document to a knowledge base connected to a S3 data source and include metadata for it
<a name="w2aac32c10c23c19c17c11b3c19b5b9"></a>

The following example shows the addition of a document alongside metadata to an S3 data source. You can include the metadata only through S3:

```
PUT /knowledgebases/KB12345678/datasources/DS12345678/documents HTTP/1.1
Content-type: application/json

{
    "documents": [ 
        { 
            "content": { 
                "dataSourceType": "S3",
                "s3": { 
                    "s3Location": {
                        "uri": "s3://amzn-s3-demo-bucket/MyDocument.txt"
                    }
                }
            },
            "metadata": {
                "s3Location": {
                    "bucketOwnerAccountId": "111122223333",
                    "uri": "s3://amzn-s3-demo-bucket/MyDocument.txt.metadata.json"
                },
                "type": "S3_LOCATION"
            }
        }
    ]
}
```

# View information about documents in your data source
<a name="kb-direct-ingestion-view"></a>

The following topics describe how to view documents in your data source. If your knowledge base is connected to an Amazon S3 data source, you can view the documents in the connected S3 bucket.

**Note**  
If you created a new knowledge base by connecting to an S3 data source, you must sync the data source before you can use these API operations on it.

Expand the method that corresponds to your use case:

## Use the console
<a name="kb-direct-ingestion-view-console"></a>

To view documents in your data source that have been ingested in the AWS Management Console, do the following:

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, select the knowledge base whose documents you want to view.

1. In the **Data source** section, select the data source whose documents you want to view.

1. The **Documents** section lists the documents in the data source. These documents have also been ingested into the knowledge base.

## Use the API
<a name="kb-direct-ingestion-view-api"></a>

With the Amazon Bedrock API, you can view a subset or all of the documents in your data source that have been ingested into the knowledge base. Select the topic that pertains to your use case.

**Topics**
+ [View information about a subset of documents in your knowledge base](#kb-direct-ingestion-get)
+ [View information about all documents in your knowledge base](#kb-direct-ingestion-list)

### View information about a subset of documents in your knowledge base
<a name="kb-direct-ingestion-get"></a>

To view information about specific documents in your data source, send a [GetKnowledgeBaseDocuments](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_GetKnowledgeBaseDocuments.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the IDs of the data source and the knowledge base it's connected to.

For each document that you want to get information for, add a [DocumentIdentifier](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentIdentifier.html) item in the `documentIdentifiers` array in one of the following formats:
+ If the data source is a custom one, specify the ID of the document in the `id` field:

  ```
  { 
      "custom": { 
          "id": "string"
      },
      "dataSourceType": "CUSTOM"
  }
  ```
+ If the data source is an Amazon S3 one, specify the S3 URI of the document in the `uri` field:

  ```
  {
      "dataSourceType": "S3",
      "s3": { 
          "uri": "string"
      }
  }
  ```

The response returns an array of items, each of which contains information about a document that you requested.
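A small helper (hypothetical, not part of any SDK) can produce either identifier shape so that callers don't hand-build the JSON:

```python
def document_identifier(data_source_type, value):
    """Build a DocumentIdentifier for a custom or S3 data source."""
    if data_source_type == "CUSTOM":
        return {"dataSourceType": "CUSTOM", "custom": {"id": value}}
    if data_source_type == "S3":
        return {"dataSourceType": "S3", "s3": {"uri": value}}
    raise ValueError(f"Unsupported data source type: {data_source_type}")

# One identifier of each type, ready for the documentIdentifiers array.
identifiers = [
    document_identifier("CUSTOM", "MyDocument"),
    document_identifier("S3", "s3://amzn-s3-demo-bucket/MyDocument.txt"),
]
```

The same identifier shapes are used when deleting documents with `DeleteKnowledgeBaseDocuments`.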

### View information about all documents in your knowledge base
<a name="kb-direct-ingestion-list"></a>

To view information about all documents in a data source, send a [ListKnowledgeBaseDocuments](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListKnowledgeBaseDocuments.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the IDs of the data source and the knowledge base it's connected to. You also have the following options:
+ Specify the `maxResults` to limit the number of results to return.
+ If the results don't fit into a response, a value is returned in the `nextToken` field of the response. You can use this value in the `nextToken` field of a subsequent request to get the next batch of results.
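The `nextToken` loop is the standard AWS pagination pattern. The sketch below runs against a stub client so it is self-contained; with boto3 you would pass the `bedrock-agent` client instead (the method name `list_knowledge_base_documents` and the `documentDetails` response field are assumptions to check against the API reference):

```python
def list_all_documents(client, knowledge_base_id, data_source_id, page_size=10):
    """Collect every document detail by following nextToken until it is absent."""
    documents, token = [], None
    while True:
        params = {
            "knowledgeBaseId": knowledge_base_id,
            "dataSourceId": data_source_id,
            "maxResults": page_size,
        }
        if token:
            params["nextToken"] = token
        response = client.list_knowledge_base_documents(**params)
        documents.extend(response["documentDetails"])
        token = response.get("nextToken")
        if not token:
            return documents

# Stand-in client that returns two pages of fake document details.
class StubClient:
    pages = [
        {"documentDetails": [{"identifier": {"custom": {"id": "Doc1"}}}], "nextToken": "page2"},
        {"documentDetails": [{"identifier": {"custom": {"id": "Doc2"}}}]},
    ]
    def __init__(self):
        self.calls = 0
    def list_knowledge_base_documents(self, **params):
        page = self.pages[self.calls]
        self.calls += 1
        return page

all_docs = list_all_documents(StubClient(), "KB12345678", "DS12345678")
```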

# Delete documents from a knowledge base directly
<a name="kb-direct-ingestion-delete"></a>

If you no longer need a document in your knowledge base, you can delete it directly. To learn how to delete documents from your data source and knowledge base, expand the section that corresponds to your use case:

## Use the console
<a name="kb-direct-ingestion-delete-console"></a>

To delete documents from your data source and knowledge base directly using the AWS Management Console, do the following:

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, select the knowledge base from which to delete documents.

1. In the **Data source** section, select the data source from which to delete documents.

1. In the **Documents** section, select a document to delete. Then choose **Delete document**. Review the message and confirm.

## Use the API
<a name="kb-direct-ingestion-delete-api"></a>

To delete specific documents from your data source through the Amazon Bedrock API, send a [DeleteKnowledgeBaseDocuments](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DeleteKnowledgeBaseDocuments.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the IDs of the data source and the knowledge base it's connected to.

For each document that you want to delete, add a [DocumentIdentifier](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DocumentIdentifier.html) item in the `documentIdentifiers` array in one of the following formats:
+ If the data source is a custom one, specify the ID of the document in the `id` field:

  ```
  { 
      "custom": { 
          "id": "string"
      },
      "dataSourceType": "CUSTOM"
  }
  ```
+ If the data source is an Amazon S3 one, specify the S3 URI of the document in the `uri` field:

  ```
  {
      "dataSourceType": "S3",
      "s3": { 
          "uri": "string"
      }
  }
  ```

**Warning**  
Documents that you delete directly from a knowledge base connected to an S3 data source aren't deleted from the S3 bucket itself. We recommend that you delete these documents from the S3 bucket, so that they aren't reintroduced if you sync your data source.

# View data source information for your Amazon Bedrock knowledge base
<a name="kb-ds-info"></a>

You can view information about a data source for your knowledge base, such as the settings and sync history.

To monitor your knowledge base, including any data sources for your knowledge base, see [Knowledge base logging using Amazon CloudWatch](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-bases-logging.html).

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To view information about a data source**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Data source** section, select the data source for which you want to view details.

1. The **Data source overview** contains details about the data source.

1. The **Sync history** contains details about when the data source was synced. To see reasons for why a sync event failed, select a sync event and choose **View warnings**.

------
#### [ API ]

To get information about a data source, send a [GetDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_GetDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the `dataSourceId` and the `knowledgeBaseId` of the knowledge base that it belongs to.

To list information about a knowledge base's data sources, send a [ListDataSources](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListDataSources.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt) and specify the ID of the knowledge base.
+ To set the maximum number of results to return in a response, use the `maxResults` field.
+ If there are more results than the number you set, the response returns a `nextToken`. You can use this value in another `ListDataSources` request to see the next batch of results.

To get information about a sync event for a data source, send a [GetIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_GetIngestionJob.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the `dataSourceId`, `knowledgeBaseId`, and `ingestionJobId`.

To list the sync history for a data source in a knowledge base, send a [ListIngestionJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListIngestionJobs.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the ID of the knowledge base and data source. You can include the following optional specifications:
+ Filter for results by specifying a status to search for in the `filters` object.
+ Sort by the time that the job was started or the status of a job by specifying the `sortBy` object. You can sort in ascending or descending order.
+ Set the maximum number of results to return in a response in the `maxResults` field. If there are more results than the number you set, the response returns a `nextToken` that you can send in another [ListIngestionJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ListIngestionJobs.html) request to see the next batch of jobs.
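
For example, the following Boto3 sketch (the helper name is illustrative) filters for failed sync jobs and sorts them by start time, newest first:

```python
def list_failed_syncs(client, knowledge_base_id, data_source_id, max_results=25):
    """List failed sync (ingestion) jobs for a data source, newest first."""
    response = client.list_ingestion_jobs(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        # Keep only jobs whose status is FAILED.
        filters=[{"attribute": "STATUS", "operator": "EQ", "values": ["FAILED"]}],
        # Sort by the time each job was started, most recent first.
        sortBy={"attribute": "STARTED_AT", "order": "DESCENDING"},
        maxResults=max_results,
    )
    return response["ingestionJobSummaries"]
```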

------

# Modify a data source for your Amazon Bedrock knowledge base
<a name="kb-ds-update"></a>

You can update a data source for your knowledge base, such as changing the data source configurations.

You can update a data source in the following ways:
+ Add, change, or remove files or content from the data source.
+ Change the data source configurations, or the KMS key to use for encrypting transient data during data ingestion. If you change the source or endpoint configuration details, you should update or create a new IAM role with the required access permissions and Secrets Manager secret (if applicable).
+ Set your data source deletion policy to either "Delete" or "Retain". With "Delete", all data from your data source that’s converted into vector embeddings is deleted when you delete a knowledge base or data source resource. With "Retain", that data is retained. Note that the **vector store itself is not deleted** if you delete a knowledge base or data source resource.

Each time you add, modify, or remove files from your data source, you must sync the data source so that it is re-indexed to the knowledge base. Syncing is incremental, so Amazon Bedrock only processes added, modified, or deleted documents since the last sync. Before you begin ingestion, check that your data source fulfills the following conditions:
+ The files are in supported formats. For more information, see [Supported document formats](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-supported-doc-formats-limits).
+ The files don't exceed the **Ingestion job file size** specified in [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html) in the AWS General Reference.
+ If your data source contains metadata files, check the following conditions to ensure that the metadata files aren't ignored:
  + Each `.metadata.json` file shares the same file name and extension as the source file that it's associated with.
  + If the vector index for your knowledge base is in an Amazon OpenSearch Serverless vector store, check that the vector index is configured with the `faiss` engine. If the vector index is configured with the `nmslib` engine, you'll have to do one of the following:
    + [Create a new knowledge base](knowledge-base-create.md) in the console and let Amazon Bedrock automatically create a vector index in Amazon OpenSearch Serverless for you.
    + [Create another vector index](knowledge-base-setup.md) in the vector store and select `faiss` as the **Engine**. Then [create a new knowledge base](knowledge-base-create.md) and specify the new vector index.
  + If the vector index for your knowledge base is in an Amazon Aurora database cluster, we recommend that you use the custom metadata field to store all your metadata in a single column and create an index on this column. If you do not provide the custom metadata field, you must check that the table for your index contains a column for each metadata property in your metadata files before starting ingestion. For more information, see [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).
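
Under the naming rule above, a source file such as `report.pdf` must be paired with `report.pdf.metadata.json`. The following sketch (hypothetical helper names, not part of any AWS SDK) checks a list of file names for metadata files whose source file is missing, since such files are ignored during sync:

```python
def expected_metadata_name(source_file_name):
    """Metadata file name for a source file: same file name and extension,
    plus the .metadata.json suffix (e.g. report.pdf -> report.pdf.metadata.json)."""
    return source_file_name + ".metadata.json"

def find_unpaired_metadata(file_names):
    """Return metadata files whose matching source file is missing."""
    names = set(file_names)
    unpaired = []
    for name in file_names:
        if name.endswith(".metadata.json"):
            # Strip the suffix to recover the source file name it should pair with.
            source = name[: -len(".metadata.json")]
            if source not in names:
                unpaired.append(name)
    return unpaired
```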

To learn how to update a data source, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To update a data source**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. Select the name of your knowledge base.

1. In the **Data source** section, select the radio button next to the data source that you want to edit and sync.

1. (Optional) Choose **Edit**, change your configurations, and select **Submit**. If you change the source or endpoint configuration details, you should update or create a new IAM role with the required access permissions and Secrets Manager secret (if applicable).
**Note**  
You can't change the chunking configurations. You must re-create the data source.

1. (Optional) Choose to edit your data source data deletion policy as part of the advanced settings:

   For data deletion policy settings, you can choose either:
   + Delete: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the **vector store itself is not deleted**, only the data. This flag is ignored if an AWS account is deleted.
   + Retain: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the **vector store itself is not deleted** if you delete a knowledge base or data source resource.

1. Choose **Sync**.

1. A green banner appears when the sync is complete and the **Status** becomes **Ready**.

------
#### [ API ]

**To update a data source**

1. (Optional) Send an [UpdateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_UpdateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt), modifying the fields that you want to change and respecifying the existing values for the fields that you don't. If you change the source or endpoint configuration details, you should update or create a new IAM role with the required access permissions and Secrets Manager secret (if applicable).
**Note**  
You can't change the `chunkingConfiguration`. Send the request with the existing `chunkingConfiguration`, or re-create the data source.

1. (Optional) Change the `dataDeletionPolicy` for your data source. With `DELETE`, all data from your data source that’s converted into vector embeddings is deleted when you delete a knowledge base or data source resource (this flag is ignored if an AWS account is deleted). With `RETAIN`, that data is retained. Note that the **vector store itself is not deleted** if you delete a knowledge base or data source resource.

1. Send a [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt), specifying the `dataSourceId` and the `knowledgeBaseId`.
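
The steps above can be sketched with Boto3 as follows. The helper name is illustrative; the sketch changes only the deletion policy and resends the existing values for everything else, including the existing chunking configuration.

```python
def update_and_resync(client, knowledge_base_id, data_source_id, data_deletion_policy):
    """Update a data source's deletion policy, then start a sync."""
    ids = {"knowledgeBaseId": knowledge_base_id, "dataSourceId": data_source_id}

    # Fetch the current configuration so unchanged fields can be resent.
    current = client.get_data_source(**ids)["dataSource"]

    # UpdateDataSource replaces the configuration, so respecify the existing
    # values for everything you don't want to modify (including the existing
    # chunking configuration, which can't be changed).
    update_kwargs = {
        **ids,
        "name": current["name"],
        "dataSourceConfiguration": current["dataSourceConfiguration"],
        "dataDeletionPolicy": data_deletion_policy,  # "DELETE" or "RETAIN"
    }
    if "vectorIngestionConfiguration" in current:
        update_kwargs["vectorIngestionConfiguration"] = current["vectorIngestionConfiguration"]
    client.update_data_source(**update_kwargs)

    # Re-index the data source so the changes reach the knowledge base.
    job = client.start_ingestion_job(**ids)["ingestionJob"]
    return job["ingestionJobId"]
```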

------

# Delete a data source from your Amazon Bedrock knowledge base
<a name="kb-ds-delete"></a>

You can delete or remove a data source that you no longer need or use for your knowledge base.

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To delete a data source**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. Select the name of your knowledge base.

1. In the **Data source** section, select the radio button next to the data source that you want to delete.

1. Choose **Delete**.

1. A green banner appears when the data source is successfully deleted.
**Note**  
Your data deletion policy for your data source is set to either "Delete" (deletes all data when you delete your data source, but **doesn't delete the vector store itself**) or "Retain" (retains all data when you delete your data source). If you delete a data source or knowledge base, the **vector store itself is not deleted**. If the data deletion policy is set to "Delete", deletion of the data source can fail because of issues with the configuration of, or access to, the vector store. If that happens, the data source enters the `DELETE_UNSUCCESSFUL` state, where you can see the reason why it could not be deleted.

------
#### [ API ]

To delete a data source from a knowledge base, send a [DeleteDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DeletDataSource.html) request, specifying the `dataSourceId` and `knowledgeBaseId`.

**Note**  
Your data deletion policy for your data source is set to either `DELETE` (deletes all data when you delete your data source, but **doesn't delete the vector store itself**) or `RETAIN` (retains all data when you delete your data source). If you delete a data source or knowledge base, the **vector store itself is not deleted**. If the data deletion policy is set to `DELETE`, deletion of the data source can fail because of issues with the configuration of, or access to, the vector store. If the data source status is `DELETE_UNSUCCESSFUL`, you can view the `failureReasons` to see why the data source could not be deleted.
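
For example, with Boto3 (the helper name is illustrative), you can delete the data source and, if deletion fails, read the `failureReasons` from the data source:

```python
def delete_and_check(client, knowledge_base_id, data_source_id):
    """Delete a data source and report why deletion failed, if it did."""
    ids = {"knowledgeBaseId": knowledge_base_id, "dataSourceId": data_source_id}
    status = client.delete_data_source(**ids)["status"]
    if status != "DELETE_UNSUCCESSFUL":
        return status, []
    # When deletion fails (for example, because the vector store is
    # unreachable), the failure reasons are on the data source itself.
    data_source = client.get_data_source(**ids)["dataSource"]
    return status, data_source.get("failureReasons", [])
```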

------