

# Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases
<a name="knowledge-base-create"></a>

When you create a knowledge base by connecting to a data source, you set up or specify the following:
+ General information that defines and identifies the knowledge base.
+ The service role with permissions to the knowledge base.
+ Configurations for the knowledge base, including the embeddings model to use when converting data from the data source, storage configurations for the service in which to store the embeddings, and, optionally, an S3 location to store multimodal data.

**Note**  
You can’t create a knowledge base with a root user. Sign in with an IAM identity that has the necessary permissions before starting these steps.

Expand the section that corresponds to your use case:

## Use the console
<a name="knowledge-base-create-console"></a>

**To set up a knowledge base**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, choose **Create** and then select the option to create a knowledge base with a vector store.

1. (Optional) Change the default name and provide a description for your knowledge base.

1. Choose an AWS Identity and Access Management (IAM) role that provides Amazon Bedrock permission to access other required AWS services. You can let Amazon Bedrock create the service role or choose to use your own [custom role that you created for Neptune Analytics](kb-permissions.md#kb-permissions-neptune).

1. Choose a data source to connect your knowledge base to.

1. (Optional) Add tags to your knowledge base. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

1. (Optional) Configure delivery of activity logs for your knowledge base to the services of your choice.

1. Go to the next section and follow the steps at [Connect a data source to your knowledge base](data-source-connectors.md) to configure a data source.

1. In the **Embeddings model** section, do the following:

   1. Choose an embeddings model to convert your data into vector embeddings. For multimodal data (images, audio, and video), select a multimodal embedding model such as Amazon Titan Multimodal Embeddings G1 or Cohere Embed v3.
**Note**  
When using Amazon Titan Multimodal Embeddings G1, you must provide an S3 content bucket and can only use the default parser. This model is optimized for image search use cases. For comprehensive guidance on choosing between multimodal approaches, see [Build a knowledge base for multimodal content](kb-multimodal.md).

   1. (Optional) Expand the **Additional configurations** section to see the following configuration options (not all models support all configurations):
      + **Embeddings type** – Whether to convert the data to floating-point (float32) vector embeddings (more precise, but more costly) or binary vector embeddings (less precise, but less costly). To learn about which embeddings models support binary vectors, refer to [supported embeddings models](knowledge-base-supported.md).
      + **Vector dimensions** – Higher values improve accuracy but increase cost and latency.

1. In the **Vector database** section, do the following:

   1. Choose a vector store to store the vector embeddings that are used for queries. You have the following options:
      + **Quick create a new vector store** – Choose one of the available vector stores for Amazon Bedrock to create. You can also optionally configure AWS KMS key encryption for your vector store.
**Note**  
When using this option, Amazon Bedrock automatically handles the metadata placement for each vector store.
        + **Amazon OpenSearch Serverless** – Amazon Bedrock Knowledge Bases creates an Amazon OpenSearch Serverless vector search collection and index and configures it with the required fields for you.
        + **Amazon Aurora PostgreSQL Serverless** – Amazon Bedrock sets up an Amazon Aurora PostgreSQL Serverless vector store. This process takes unstructured text data from an Amazon S3 bucket, transforms it into text chunks and vectors, and then stores them in a PostgreSQL database. For more information, see [Quick create an Aurora PostgreSQL Knowledge Base for Amazon Bedrock](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.quickcreatekb.html).
        + **Amazon Neptune Analytics** – Amazon Bedrock uses Retrieval Augmented Generation (RAG) techniques combined with graphs to enhance generative AI applications so that end users can get more accurate and comprehensive responses.
        + **Amazon S3 Vectors** – Amazon Bedrock Knowledge Bases creates an S3 vector bucket and a vector index that will store the embeddings generated from your data sources.

          You can create a knowledge base for Amazon S3 Vectors in all AWS Regions where both Amazon Bedrock and Amazon S3 Vectors are available. For region availability information, see [Amazon S3 Vectors](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html) in the *Amazon S3 User Guide*.
**Note**  
When using Amazon S3 Vectors with Amazon Bedrock Knowledge Bases, you can attach up to 1 KB of custom metadata (including both filterable and non-filterable metadata) and 35 metadata keys per vector. For detailed information about metadata limitations, see [Metadata support](knowledge-base-setup.md#metadata-support) in [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).
      + **Choose a vector store you have created** – Select a supported vector store and identify the vector field names and metadata field names in the vector index. For more information, see [Prerequisites for using a vector store you created for a knowledge base](knowledge-base-setup.md).
**Note**  
If your data source is a Confluence, Microsoft SharePoint, or Salesforce instance, the only supported vector store service is Amazon OpenSearch Serverless.

   1. (Optional) Expand the **Additional configurations** section and modify any relevant configurations.

1. If your data source contains images, specify an Amazon S3 URI in the **Multimodal storage destination** section in which to store the images that the parser extracts from your data. The images can be returned during query. You can also optionally choose a customer managed key instead of the default AWS managed key to encrypt your data.
**Note**  
Multimodal data is only supported with Amazon S3 and custom data sources.
**Note**  
When using multimodal embedding models:  
+ Amazon Titan Multimodal Embeddings G1 requires an S3 content bucket and works best with image-only datasets using the default parser.
+ Cohere Embed v3 supports mixed text and image datasets and can be used with any parser configuration.
+ For image search use cases, avoid using Bedrock Data Automation (BDA) or foundation model parsers with Titan G1 due to token limitations.
+ The multimodal storage destination creates file copies for retrieval purposes, which can incur additional storage charges.

1. Choose **Next** and review the details of your knowledge base. You can edit any section before creating your knowledge base.
**Note**  
The time it takes to create the knowledge base depends on your specific configurations. When creation completes, the status of the knowledge base changes to indicate that it's ready to use.  
Once your knowledge base is ready, sync your data source for the first time and whenever you want to keep your content up to date. Select your knowledge base in the console and choose **Sync** in the data source overview section.

## Use the API
<a name="knowledge-base-create-api"></a>

To create a knowledge base, send a [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt).

**Note**  
If you prefer to let Amazon Bedrock create and manage a vector store for you, use the console. For more information, expand the **Use the console** section in this topic.

The following fields are required:



| Field | Basic description | 
| --- | --- | 
| name | A name for the knowledge base | 
| roleArn | The ARN of an [Amazon Bedrock Knowledge Bases service role](kb-permissions.md). | 
| knowledgeBaseConfiguration | Contains configurations for the knowledge base. See details below. | 
| storageConfiguration | (Only required if you're connecting to an unstructured data source.) Contains configurations for the vector store service that you choose. | 

The following fields are optional:



| Field | Use case | 
| --- | --- | 
| description | A description for the knowledge base. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 
| tags | To associate tags with the knowledge base. For more information, see [Tagging Amazon Bedrock resources](tagging.md). | 

In the `knowledgeBaseConfiguration` field, which maps to a [KnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_KnowledgeBaseConfiguration.html) object, specify `VECTOR` in the `type` field and include a [VectorKnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorKnowledgeBaseConfiguration.html) object. In the object, include the following fields:
+ `embeddingModelArn` – The ARN of the embedding model to use.
+ `embeddingModelConfiguration` – Configurations for the embedding model. To see the possible values you can specify for each supported model, see [Supported models and Regions for Amazon Bedrock knowledge bases](knowledge-base-supported.md).
+ (If you plan to include multimodal data, which includes images, figures, charts, or tables, in your knowledge base) `supplementalDataStorageConfiguration` – Maps to a [SupplementalDataStorageLocation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_SupplementalDataStorageLocation.html) object, in which you specify the S3 location in which to store the extracted data. For more information, see [Parsing options for your data source](kb-advanced-parsing.md).
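
The `embeddingDataType` setting controls whether vectors are stored as float32 or binary. As a rough sketch of the storage difference (assuming float32 vectors store 4 bytes per dimension and binary vectors store 1 bit per dimension; this helper is illustrative and not part of any AWS SDK):

```python
def embedding_size_bytes(dimensions: int, data_type: str) -> int:
    """Approximate per-vector storage: float32 uses 4 bytes per
    dimension; binary uses 1 bit per dimension (packed into bytes)."""
    if data_type == "FLOAT32":
        return dimensions * 4
    if data_type == "BINARY":
        return dimensions // 8
    raise ValueError(f"Unknown embedding type: {data_type}")

# A 1,024-dimension vector, as in the example request below:
print(embedding_size_bytes(1024, "FLOAT32"))  # 4096
print(embedding_size_bytes(1024, "BINARY"))   # 128
```

This roughly 32x difference in vector size is the source of the cost and precision tradeoff between the two embeddings types.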

In the `storageConfiguration` field, which maps to a [StorageConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StorageConfiguration.html) object, specify the vector store that you plan to connect to in the `type` field and include the field that corresponds to that vector store. See each vector store configuration type at [StorageConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StorageConfiguration.html) for details about the information you need to provide.

The following shows an example request to create a knowledge base connected to an Amazon OpenSearch Serverless collection. The data from connected data sources will be converted into binary vector embeddings with Amazon Titan Text Embeddings V2, and multimodal data extracted by the parser will be stored in a bucket named *MyBucket*.

```
PUT /knowledgebases/ HTTP/1.1
Content-type: application/json

{
   "name": "MyKB",
   "description": "My knowledge base",
   "roleArn": "arn:aws:iam::111122223333:role/service-role/AmazonBedrockExecutionRoleForKnowledgeBase_123",
   "knowledgeBaseConfiguration": {
      "type": "VECTOR",
      "vectorKnowledgeBaseConfiguration": { 
         "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
         "embeddingModelConfiguration": { 
            "bedrockEmbeddingModelConfiguration": { 
               "dimensions": 1024,
               "embeddingDataType": "BINARY"
            }
         },
         "supplementalDataStorageConfiguration": { 
            "storageLocations": [ 
               { 
                  "s3Location": { 
                     "uri": "arn:aws:s3:::MyBucket"
                  },
                  "type": "S3"
               }
            ]
         }
      }
   },
   "storageConfiguration": { 
      "type": "OPENSEARCH_SERVERLESS",
      "opensearchServerlessConfiguration": { 
         "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/abcdefghij1234567890",
         "fieldMapping": { 
            "metadataField": "metadata",
            "textField": "text",
            "vectorField": "vector"
         },
         "vectorIndexName": "MyVectorIndex"
      }
   }
}
```
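
If you use an AWS SDK such as boto3 (the AWS SDK for Python), the same request can be assembled as a plain dictionary and passed to the `bedrock-agent` client's `create_knowledge_base` method. The following is a sketch; the builder function is hypothetical, and the ARNs and names are the placeholders from the example above:

```python
def build_create_kb_request(name, role_arn, model_arn, collection_arn,
                            index_name, supplemental_bucket_arn):
    """Assemble a CreateKnowledgeBase payload for an OpenSearch Serverless
    vector store with binary Titan Text Embeddings V2 vectors."""
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": model_arn,
                "embeddingModelConfiguration": {
                    "bedrockEmbeddingModelConfiguration": {
                        "dimensions": 1024,
                        "embeddingDataType": "BINARY",
                    }
                },
                "supplementalDataStorageConfiguration": {
                    "storageLocations": [
                        {"type": "S3",
                         "s3Location": {"uri": supplemental_bucket_arn}}
                    ]
                },
            },
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                "fieldMapping": {
                    "metadataField": "metadata",
                    "textField": "text",
                    "vectorField": "vector",
                },
            },
        },
    }

request = build_create_kb_request(
    name="MyKB",
    role_arn="arn:aws:iam::111122223333:role/service-role/AmazonBedrockExecutionRoleForKnowledgeBase_123",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/abcdefghij1234567890",
    index_name="MyVectorIndex",
    supplemental_bucket_arn="arn:aws:s3:::MyBucket",
)
# client = boto3.client("bedrock-agent")
# response = client.create_knowledge_base(**request)
```

Building the payload separately, as above, makes it easy to review or log the full configuration before sending the request.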

**Topics**
+ [Connect a data source to your knowledge base](data-source-connectors.md)
+ [Customize ingestion for a data source](kb-data-source-customize-ingestion.md)
+ [Set up security configurations for your knowledge base](kb-create-security.md)

# Connect a data source to your knowledge base
<a name="data-source-connectors"></a>

After finishing the configurations for your knowledge base, you connect a supported data source to the knowledge base.

Amazon Bedrock Knowledge Bases supports connecting to unstructured data sources or to structured data stores through a query engine. Select a topic to learn how to connect to that type of data source:

**Multimodal content support**  
Multimodal content (images, audio, and video files) is only supported with Amazon S3 and custom data sources. Other data source types will skip multimodal files during ingestion. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

To learn how to connect to a data source using the Amazon Bedrock console, select the topic that corresponds to your data source type at the bottom of this page.

To connect to a data source using the Amazon Bedrock API, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt).

The following fields are required:



| Field | Basic description | 
| --- | --- | 
| knowledgeBaseId | The ID of the knowledge base. | 
| name | A name for the data source. | 
| dataSourceConfiguration | Specify the data source service or type in the type field and include the corresponding field. For more details about service-specific configurations, select the topic for the service from the topics at the bottom of this page. | 

The following fields are optional:



| Field | Use case | 
| --- | --- | 
| description | To provide a description for the data source. | 
| vectorIngestionConfiguration | Contains configurations for customizing the ingestion process. For more information, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md). | 
| dataDeletionPolicy | To specify whether to RETAIN the vector embeddings in the vector store or to DELETE them. | 
| serverSideEncryptionConfiguration | To encrypt transient data during data syncing with a customer managed key, specify its ARN in the kmsKeyArn field. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 

Select a topic to learn more about a service and how to configure it.

**Topics**
+ [Connect to Amazon S3 for your knowledge base](s3-data-source-connector.md)
+ [Connect to Confluence for your knowledge base](confluence-data-source-connector.md)
+ [Connect to Microsoft SharePoint for your knowledge base](sharepoint-data-source-connector.md)
+ [Connect to Salesforce for your knowledge base](salesforce-data-source-connector.md)
+ [Crawl web pages for your knowledge base](webcrawl-data-source-connector.md)
+ [Connect your knowledge base to a custom data source](custom-data-source-connector.md)

# Connect to Amazon S3 for your knowledge base
<a name="s3-data-source-connector"></a>

Amazon S3 is an object storage service that stores data as objects within buckets. You can connect to your Amazon S3 bucket for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Multimodal content support**  
Amazon S3 data sources support multimodal content including images, audio, and video files. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

You can upload a small batch of files to an Amazon S3 bucket using the Amazon S3 console or API. You can alternatively use [AWS DataSync](https://docs.aws.amazon.com/datasync/latest/userguide/create-s3-location.html) to upload multiple files to S3 continuously, and transfer files on a schedule from on-premises, edge, other cloud, or AWS storage.

Currently, only S3 general purpose buckets are supported.

There are limits on the number of files that can be crawled and on the size (in MB) of each file. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-s3-connector)
+ [Prerequisites](#prerequisites-s3-connector)
+ [Connection configuration](#configuration-s3-connector)

## Supported features
<a name="supported-features-s3-connector"></a>
+ Document metadata fields
+ Inclusion prefixes
+ Incremental content syncs for added, updated, deleted content

## Prerequisites
<a name="prerequisites-s3-connector"></a>

**In Amazon S3, make sure you**:
+ Note the Amazon S3 bucket URI, Amazon Resource Name (ARN), and the AWS account ID for the owner of the bucket. You can find the URI and ARN in the properties section in the Amazon S3 console. Your bucket must be in the same Region as your Amazon Bedrock knowledge base. You must have permission to access the bucket.

**In your AWS account, make sure you**:
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, the IAM role with all the required permissions can be created for you as part of the steps for creating a knowledge base. After you have configured your data source and other configurations, the IAM role with all the required permissions is applied to your specific knowledge base.

## Connection configuration
<a name="configuration-s3-connector"></a>

To connect to your Amazon S3 bucket, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-s3-connector).

An example of a configuration for this data source is included in this section.

For more information about inclusion filters, document metadata fields, incremental syncing, and how these work, select the following:

### Document metadata fields
<a name="ds-s3-metadata-fields"></a>

You can include a separate file that specifies the document metadata fields/attributes for each file in your Amazon S3 data source and whether to include them in the embeddings when indexing the data source into the vector store. For example, you can create a file in the following format, name it *fileName.extension.metadata.json* and upload it to your S3 bucket.

```
{
  "metadataAttributes": {
    "company": {
      "value": {
        "type": "STRING",
        "stringValue": "BioPharm Innovations"
      },
      "includeForEmbedding": true
    },
    "created_date": {
      "value": {
        "type": "NUMBER",
        "numberValue": 20221205
      },
      "includeForEmbedding": true
    },
    "author": {
      "value": {
        "type": "STRING",
        "stringValue": "Lisa Thompson"
      },
      "includeForEmbedding": true
    },
    "origin": {
      "value": {
        "type": "STRING",
        "stringValue": "Overview"
      },
      "includeForEmbedding": true
    }
  }
}
```

The metadata file must use the same name as its associated source document file, with `.metadata.json` appended onto the end of the file name. The metadata file must be stored in the same folder or location as the source file in your Amazon S3 bucket. The file must not exceed the limit of 10 KB. For information on the supported attribute/field data types and the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).
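
As an illustration of the naming convention and size limit described above, the following sketch (the helper and file names are hypothetical) writes a metadata sidecar file next to a source document:

```python
import json
from pathlib import Path

def write_metadata_sidecar(document_path: str, attributes: dict) -> Path:
    """Write a sidecar file named <fileName.extension>.metadata.json in the
    same location as the source document, enforcing the 10 KB limit."""
    source = Path(document_path)
    sidecar = source.with_name(source.name + ".metadata.json")
    body = {
        "metadataAttributes": {
            key: {"value": value, "includeForEmbedding": True}
            for key, value in attributes.items()
        }
    }
    data = json.dumps(body, indent=2)
    if len(data.encode("utf-8")) > 10 * 1024:  # metadata files must not exceed 10 KB
        raise ValueError("Metadata file exceeds the 10 KB limit")
    sidecar.write_text(data)
    return sidecar

path = write_metadata_sidecar(
    "overview.pdf",
    {"company": {"type": "STRING", "stringValue": "BioPharm Innovations"}},
)
print(path.name)  # overview.pdf.metadata.json
```

You would then upload both the source document and its sidecar to the same folder in your Amazon S3 bucket before syncing the data source.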

### Inclusion prefixes
<a name="ds-s3-inclusion-exclusion"></a>

You can specify an inclusion prefix, which is an Amazon S3 path prefix. An inclusion prefix limits the data source connector to a specific file or folder instead of the entire bucket.

### Incremental syncing
<a name="ds-s3-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions for the knowledge base. This can include data with restricted data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

------
#### [ Console ]

**To connect an Amazon S3 bucket to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Amazon S3** as the data source.

1. Provide a name for the data source.

1. Specify whether the Amazon S3 bucket is in your current AWS account or another AWS account. Your bucket must be in the same Region as the knowledge base.

1. (Optional) If the Amazon S3 bucket is encrypted with a KMS key, include the key. For more information, see [Permissions to decrypt your AWS KMS key for your data sources in Amazon S3](encryption-kb.md#encryption-kb-ds).

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage.** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Amazon S3 for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "S3-connector" \
 --description "S3 data source connector for Amazon Bedrock to use content in S3" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://s3-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

The contents of *s3-bedrock-connector-configuration.json*:

```
{
    "s3Configuration": {
        "bucketArn": "arn:aws:s3:::bucket-name",
        "bucketOwnerAccountId": "000000000000",
        "inclusionPrefixes": [
            "documents/"
        ]
    },
    "type": "S3"
}
```
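
With boto3 (the AWS SDK for Python), the equivalent of the CLI call above can be sketched as follows. The builder function is hypothetical, and the bucket ARN, account ID, and knowledge base ID are placeholders:

```python
def build_s3_data_source_config(bucket_arn, owner_account_id, prefixes):
    """Assemble the dataSourceConfiguration for an S3 connector,
    mirroring the JSON file passed to the CLI above."""
    return {
        "type": "S3",
        "s3Configuration": {
            "bucketArn": bucket_arn,
            "bucketOwnerAccountId": owner_account_id,
            "inclusionPrefixes": prefixes,
        },
    }

config = build_s3_data_source_config(
    "arn:aws:s3:::bucket-name", "000000000000", ["documents/"])
# client = boto3.client("bedrock-agent")
# client.create_data_source(
#     knowledgeBaseId="your-knowledge-base-id",
#     name="S3-connector",
#     dataSourceConfiguration=config,
#     dataDeletionPolicy="DELETE",
# )
```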

------

# Connect to Confluence for your knowledge base
<a name="confluence-data-source-connector"></a>

Atlassian Confluence is a collaborative work-management tool designed for sharing, storing, and working on project planning, software development, and product management. You can connect to your Confluence instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
The Confluence data source connector is in preview release and is subject to change.  
Confluence data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Amazon Bedrock supports connecting to Confluence Cloud instances. Currently, only the Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits on the number of files that can be crawled and on the size (in MB) of each file. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-confluence-connector)
+ [Prerequisites](#prerequisites-confluence-connector)
+ [Connection configuration](#configuration-confluence-connector)

## Supported features
<a name="supported-features-confluence-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ OAuth 2.0 authentication, authentication with Confluence API token

## Prerequisites
<a name="prerequisites-confluence-connector"></a>

**In Confluence, make sure you**:
+ Take note of your Confluence instance URL. For example, for Confluence Cloud, *https://example.atlassian.net*. The URL for Confluence Cloud must be the base URL, ending with *.atlassian.net*.
+ Configure basic authentication credentials containing a username (email of admin account) and password (Confluence API token) to allow Amazon Bedrock to connect to your Confluence Cloud instance. For information about how to create a Confluence API token, see [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/#Create-an-API-token) on the Atlassian website.
+ (Optional) Configure an OAuth 2.0 application with credentials of an app key, app secret, access token, and refresh token. For more information, see [OAuth 2.0 apps](https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/) on the Atlassian website.
+ Certain read permissions or scopes must be enabled for your OAuth 2.0 app to connect to Confluence.

  Confluence API:
  + offline_access
  + read:content:confluence – View detailed contents 
  + read:content-details:confluence – View content details 
  + read:space-details:confluence – View space details
  + read:audit-log:confluence – View audit records 
  + read:page:confluence – View pages 
  + read:attachment:confluence – View and download content attachments 
  + read:blogpost:confluence – View blogposts 
  + read:custom-content:confluence – View custom content 
  + read:comment:confluence – View comments 
  + read:template:confluence – View content templates 
  + read:label:confluence – View labels 
  + read:watcher:confluence – View content watchers 
  + read:relation:confluence – View entity relationships 
  + read:user:confluence – View user details 
  + read:configuration:confluence – View Confluence settings 
  + read:space:confluence – View space details 
  + read:space.property:confluence – View space properties 
  + read:user.property:confluence – View user properties 
  + read:space.setting:confluence – View space settings 
  + read:analytics.content:confluence – View analytics for content
  + read:content.property:confluence – View content properties
  + read:content.metadata:confluence – View content summaries 
  + read:inlinetask:confluence – View tasks 
  + read:task:confluence – View tasks 
  + read:whiteboard:confluence – View whiteboards 
  + read:app-data:confluence – Read app data 
  + read:folder:confluence – View folders
  + read:embed:confluence – View Smart Link data

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to include the key-values pairs that must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other settings, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. For your security, grant only the minimum necessary access level. We do not recommend that you reuse credentials and secrets across data sources.

## Connection configuration
<a name="configuration-confluence-connector"></a>

To connect to your Confluence instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-confluence-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how these work, see the following sections:

### Auto detection of main document fields
<a name="ds-confluence-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, Amazon Bedrock could include that sensitive information in its responses.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document field `epoch_modification_time` holds the number of seconds that have passed since January 1, 1970 for when the document was last updated. You can filter on the most recent data, where `epoch_modification_time` is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).

### Inclusion/exclusion filters
<a name="ds-confluence-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out PDF files that contain "private" in the file name: `".*private.*\.pdf"`

You can apply inclusion/exclusion filters on the following content types:
+ `Space`: Unique space key
+ `Page`: Main page title
+ `Blog`: Main blog title
+ `Comment`: Comments that belong to a certain page or blog. Specify *Re: Page/Blog Title*
+ `Attachment`: Attachment file name with its extension
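How these filters combine can be sketched in a few lines of Python. This is an illustration of the precedence rule only (exclusion wins over inclusion), not the connector's actual matching code:

```python
import re

def is_crawled(name, inclusion_patterns, exclusion_patterns):
    """Sketch of filter precedence: exclusion filters take precedence."""
    if any(re.fullmatch(p, name) for p in exclusion_patterns):
        return False          # exclusion filter matched: document is skipped
    if not inclusion_patterns:
        return True           # no inclusion filter: all remaining content is crawled
    return any(re.fullmatch(p, name) for p in inclusion_patterns)

include = [r".*\.pdf"]            # crawl PDF attachments...
exclude = [r".*private.*\.pdf"]   # ...except those with "private" in the name

print(is_crawled("report.pdf", include, exclude))          # True
print(is_crawled("report-private.pdf", include, exclude))  # False
print(is_crawled("notes.txt", include, exclude))           # False
```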

### Incremental syncing
<a name="ds-confluence-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.
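For example, you could start an on-demand sync from the AWS CLI with the `start-ingestion-job` command (the identifiers shown are placeholders):

```
aws bedrock-agent start-ingestion-job \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-id "your-data-source-id"
```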

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-confluence-secret-auth-credentials"></a>

(If using basic authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `username`: *admin user email address of Atlassian account*
+ `password`: *Confluence API token*
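As a sketch, you could assemble and store such a secret from the command line. The names and values shown are placeholders, and the `aws secretsmanager create-secret` call is commented out so that you can review the payload first:

```shell
# Placeholder values -- substitute your own.
CONFLUENCE_USER="admin@example.com"   # Atlassian account admin user email
CONFLUENCE_TOKEN="your-api-token"     # Confluence API token

# Assemble the exact key-value pairs the connector expects for basic auth.
SECRET_JSON=$(python3 -c 'import json, sys; print(json.dumps({"username": sys.argv[1], "password": sys.argv[2]}))' "$CONFLUENCE_USER" "$CONFLUENCE_TOKEN")
echo "$SECRET_JSON"

# Store the secret and note the returned ARN for your data source configuration:
# aws secretsmanager create-secret \
#   --name AmazonBedrock-Confluence \
#   --secret-string "$SECRET_JSON" \
#   --query ARN --output text
```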

(If using OAuth 2.0 authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `confluenceAppKey`: *app key*
+ `confluenceAppSecret`: *app secret*
+ `confluenceAccessToken`: *app access token*
+ `confluenceRefreshToken`: *app refresh token*

**Note**  
The Confluence OAuth 2.0 **access** token has a default expiry time of 60 minutes. If this token expires while your data source is syncing (sync job), Amazon Bedrock uses the provided **refresh** token to regenerate it. This regeneration refreshes both the access and refresh tokens. To keep the tokens current from one sync job to the next, Amazon Bedrock requires write/put permissions for your secret credentials as part of your knowledge base IAM role.

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.

------
#### [ Console ]

**Connect a Confluence instance to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Confluence** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your Confluence instance URL. For example, for Confluence Cloud, *https://example.atlassian.net*. The URL for Confluence Cloud must be the base URL, ending with *.atlassian.net*.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your Confluence instance:
   + For basic authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the admin user email address of the Atlassian account as the username and a Confluence API token in place of a password. For information about how to create a Confluence API token, see [Manage API tokens for your Atlassian account](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/#Create-an-API-token) on the Atlassian website.
   + For OAuth 2.0 authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the Confluence app key, app secret, access token, and refresh token. For more information, see [OAuth 2.0 apps](https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/) on the Atlassian website.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose whether to use filters/regular expression patterns to include or exclude certain content. Otherwise, all standard content is crawled.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Confluence Cloud for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "Confluence Cloud/SaaS connector" \
 --description "Confluence Cloud/SaaS data source connector for Amazon Bedrock to use content in Confluence" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://confluence-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

**Contents of `confluence-bedrock-connector-configuration.json`**

```
{
    "confluenceConfiguration": {
        "sourceConfiguration": {
            "hostUrl": "https://example.atlassian.net",
            "hostType": "SAAS",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-Confluence"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "Attachment",
                            "inclusionFilters": [
                                ".*\\.pdf"
                            ],
                            "exclusionFilters": [
                                ".*private.*\\.pdf"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "CONFLUENCE"
}
```
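For intuition, fixed-size chunking with `maxTokens` of 100 and `overlapPercentage` of 10 behaves roughly like the following word-based sketch. Amazon Bedrock operates on model tokens, not words, so this only illustrates the windowing behavior:

```python
def fixed_size_chunks(words, max_tokens=100, overlap_percentage=10):
    """Word-based sketch of FIXED_SIZE chunking with overlap."""
    overlap = max_tokens * overlap_percentage // 100   # 10 words of overlap
    step = max_tokens - overlap                        # advance 90 words each chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_tokens])
        if start + max_tokens >= len(words):           # last window reached the end
            break
    return chunks

words = [f"w{i}" for i in range(250)]
chunks = fixed_size_chunks(words)
print(len(chunks))       # 3 chunks of at most 100 words
print(chunks[1][0])      # second chunk starts 90 words in: w90
```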

------

# Connect to Microsoft SharePoint for your knowledge base
<a name="sharepoint-data-source-connector"></a>

Microsoft SharePoint is a collaborative web-based service for working on documents, web pages, web sites, lists, and more. You can connect to your SharePoint instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
The SharePoint data source connector is in preview release and is subject to change.  
Microsoft SharePoint data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Amazon Bedrock supports connecting to SharePoint Online instances. Crawling OneNote documents is currently not supported. Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits on the number of files, and the size (MB) per file, that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-sharepoint-connector)
+ [Prerequisites](#prerequisites-sharepoint-connector)
+ [Connection configuration](#configuration-sharepoint-connector)

## Supported features
<a name="supported-features-sharepoint-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ SharePoint App-Only authentication

## Prerequisites
<a name="prerequisites-sharepoint-connector"></a>

### SharePoint (Online)
<a name="prerequisites-sharepoint-connector-online"></a>

**In your SharePoint (Online), complete the following steps for using SharePoint App-Only authentication:**
+ Take note of your SharePoint Online site URL/URLs. For example, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.
+ Take note of the domain name of your SharePoint Online instance URL/URLs.
+ Copy your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal. For details, see [Find your Microsoft 365 tenant ID](https://learn.microsoft.com/en-us/sharepoint/find-your-office-365-tenant-id).
**Note**  
For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website. 
+ Configure SharePoint App-Only credentials.
+ Copy the client ID and client secret value when granting permission to SharePoint App-Only. For more information, see [Granting access using SharePoint App-Only](https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs).
**Note**  
You do not need to set up any API permissions for SharePoint App-Only. However, you must configure app permissions on the SharePoint side. For more information about the required app permissions, see the Microsoft documentation on [Granting access using SharePoint App-Only](https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs).
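The site URL rules above can be expressed as a quick local sanity check. This is a heuristic illustration only; Amazon Bedrock performs its own validation when you configure the data source:

```python
def looks_like_sharepoint_site_url(url):
    """Heuristic check mirroring the stated site URL rules."""
    if not url.startswith("https://"):
        return False                      # must start with https
    if ".sharepoint.com" not in url:
        return False                      # must be a sharepoint.com URL
    if url.rstrip("/").endswith(".sharepoint.com"):
        return False                      # bare domain, not an actual site
    if url.lower().endswith(".aspx"):
        return False                      # a page, not the site itself
    return True

print(looks_like_sharepoint_site_url("https://yourdomain.sharepoint.com/sites/mysite"))            # True
print(looks_like_sharepoint_site_url("https://yourdomain.sharepoint.com/"))                        # False
print(looks_like_sharepoint_site_url("https://yourdomain.sharepoint.com/sites/mysite/home.aspx"))  # False
```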

### AWS account
<a name="prerequisites-sharepoint-connector-account"></a>

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to include the key-value pairs that must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other settings, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. For your security, grant only the minimum necessary access level. We do not recommend that you reuse credentials and secrets across data sources.

## Connection configuration
<a name="configuration-sharepoint-connector"></a>

To connect to your SharePoint instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-sharepoint-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how these work, see the following sections:

### Auto detection of main document fields
<a name="ds-sharepoint-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, Amazon Bedrock could include that sensitive information in its responses.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document field `epoch_modification_time` holds the number of seconds that have passed since January 1, 1970 for when the document was last updated. You can filter on the most recent data, where `epoch_modification_time` is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).

### Inclusion/exclusion filters
<a name="ds-sharepoint-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out PDF files that contain "private" in the file name: `".*private.*\.pdf"`

You can apply inclusion/exclusion filters on the following content types:
+ `Page`: Main page title
+ `Event`: Event name
+ `File`: File name with its extension for attachments and all document files

Crawling OneNote documents is currently not supported.

### Incremental syncing
<a name="ds-sharepoint-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-sharepoint-secret-auth-credentials"></a>

When using SharePoint App-Only authentication, your secret authentication credentials in AWS Secrets Manager must include these key-value pairs:
+ `clientId`: *client ID associated with your Microsoft Entra SharePoint application*
+ `clientSecret`: *client secret associated with your Microsoft Entra SharePoint application*
+ `sharePointClientId`: *client ID generated when registering your SharePoint app for App-Only authentication*
+ `sharePointClientSecret`: *client secret generated when registering your SharePoint app for App-Only authentication*
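Before connecting, you might sanity-check that a candidate secret payload contains all four keys. The helper below is hypothetical; only the key names come from the list above:

```python
import json

# The four key-value pairs required for SharePoint App-Only authentication.
REQUIRED_KEYS = {"clientId", "clientSecret", "sharePointClientId", "sharePointClientSecret"}

def missing_keys(secret_string):
    """Return required keys absent from a Secrets Manager SecretString (JSON)."""
    present = set(json.loads(secret_string))
    return sorted(REQUIRED_KEYS - present)

# Hypothetical secret payload with one key forgotten:
secret = json.dumps({
    "clientId": "entra-app-client-id",
    "clientSecret": "entra-app-client-secret",
    "sharePointClientId": "sharepoint-app-client-id",
})
print(missing_keys(secret))   # ['sharePointClientSecret']
```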

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.

------
#### [ Console ]

**Connect a SharePoint instance to your knowledge base**<a name="connect-sharepoint-console"></a>

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **SharePoint** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your SharePoint site URL/URLs. For example, for SharePoint Online, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.

1. Provide the domain name of your SharePoint instance.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your SharePoint instance. For SharePoint App-Only authentication:

   1. Provide the tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal.

   1. Go to AWS Secrets Manager to add your secret credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the SharePoint client ID and the SharePoint client secret generated when you registered the App-Only at the tenant level or the site level, and the Entra client ID and Entra client secret generated when you register the app in Entra.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose whether to use filters/regular expression patterns to include or exclude certain content. Otherwise, all standard content is crawled.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to SharePoint Online for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "SharePoint Online connector" \
 --description "SharePoint Online data source connector for Amazon Bedrock to use content in SharePoint" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://sharepoint-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE"
```

**Contents of `sharepoint-bedrock-connector-configuration.json`**

```
{
    "sharePointConfiguration": {
        "sourceConfiguration": {
            "tenantId": "888d0b57-69f1-4fb8-957f-e1f0bedf64de",
            "hostType": "ONLINE",
            "domain": "yourdomain",
            "siteUrls": [
                "https://yourdomain.sharepoint.com/sites/mysite"
            ],
            "authType": "OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-SharePoint"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "File",
                            "inclusionFilters": [
                                ".*\\.pdf"
                            ],
                            "exclusionFilters": [
                                ".*private.*\\.pdf"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "SHAREPOINT"
}
```

------

**Important**  
OAuth 2.0 authentication is not recommended. We recommend that you use SharePoint App-Only authentication instead.

## Using OAuth 2.0
<a name="sharepoint-connector-oauth"></a>

Using OAuth 2.0, you can authenticate and authorize access to SharePoint resources for SharePoint connectors integrated with Knowledge Bases.

### Prerequisites
<a name="sharepoint-connector-oauth-prereq"></a>

**In SharePoint, for OAuth 2.0 authentication, make sure you**:
+ Take note of your SharePoint Online site URL/URLs. For example, *https://yourdomain.sharepoint.com/sites/mysite*. Your URL must start with *https* and contain *sharepoint.com*. Your site URL must be the actual SharePoint site, not *sharepoint.com/* or *sites/mysite/home.aspx*.
+ Take note of the domain name of your SharePoint Online instance URL/URLs.
+ Copy your Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal or in your OAuth application.

  Take note of the username and password of the admin SharePoint account, and copy the client ID and client secret value when registering an application.
**Note**  
For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website. 
+ Certain read permissions are required to connect to SharePoint when you register an application.
  + SharePoint: AllSites.Read (Delegated) – Read items in all site collections
+ You might need to turn off **Security Defaults** in your Azure portal using an admin user. For more information on managing security default settings in the Azure portal, see [Microsoft documentation on how to enable/disable security defaults](https://learn.microsoft.com/en-us/microsoft-365/business-premium/m365bp-conditional-access?view=o365-worldwide&tabs=secdefaults#security-defaults-1).
+ You might need to turn off multi-factor authentication (MFA) in your SharePoint account, so that Amazon Bedrock is not blocked from crawling your SharePoint content.

To complete the prerequisites, make sure that you've completed the steps in [AWS account](#prerequisites-sharepoint-connector-account).

### Secret authentication credentials
<a name="sharepoint-secret-auth-credentials-oauth"></a>

For connection configuration for OAuth2.0, you can perform the same steps for the auto detection of the main document fields, inclusion/exclusion filters, and incremental syncing as described in [Connection configuration](#configuration-sharepoint-connector).

**For OAuth 2.0 authentication, your secret authentication credentials in AWS Secrets Manager must include these key-value pairs**:
+ `username`: *SharePoint admin username*
+ `password`: *SharePoint admin password*
+ `clientId`: *OAuth app client ID*
+ `clientSecret`: *OAuth app client secret*

### Connect a SharePoint instance to your knowledge base
<a name="sharepoint-connector-oauth-using"></a>

To connect a SharePoint instance to your knowledge base when using OAuth2.0:
+ (Console) In the console, follow the same steps as described in [Connect a SharePoint instance to your knowledge base](https://docs.aws.amazon.com/bedrock/latest/userguide/sharepoint-data-source-connector.html#connect-sharepoint-console). When you provide the authentication information to connect to your SharePoint instance:
  + Provide the tenant ID. You can find your tenant ID in the Properties of your Microsoft Entra portal.
  + Go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the SharePoint admin username and password, and your registered app client ID and client secret. For an example application, see [Register a client application in Microsoft Entra ID](https://learn.microsoft.com/en-us/azure/healthcare-apis/register-application) (formerly known as Azure Active Directory) on the Microsoft Learn website.
+ (API) The following is an example of using the `CreateDataSource` API to create your data source with your connection information for OAuth2.0.

  ```
  aws bedrock-agent create-data-source \
   --name "SharePoint Online connector" \
   --description "SharePoint Online data source connector for Amazon Bedrock to use content in SharePoint" \
   --knowledge-base-id "your-knowledge-base-id" \
   --data-source-configuration file://sharepoint-bedrock-connector-configuration.json \
   --data-deletion-policy "DELETE"
  ```

  **Contents of `sharepoint-bedrock-connector-configuration.json`**

  ```
  {
      "sharePointConfiguration": {
          "sourceConfiguration": {
              "tenantId": "888d0b57-69f1-4fb8-957f-e1f0bedf64de",
              "hostType": "ONLINE",
              "domain": "yourdomain",
              "siteUrls": [
                  "https://yourdomain.sharepoint.com/sites/mysite"
              ],
              "authType": "OAUTH2_CLIENT_CREDENTIALS",
              "credentialsSecretArn": "arn:aws:secretsmanager:your-region:your-account-id:secret:AmazonBedrock-SharePoint"
          },
          "crawlerConfiguration": {
              "filterConfiguration": {
                  "type": "PATTERN",
                  "patternObjectFilter": {
                      "filters": [
                          {
                              "objectType": "File",
                              "inclusionFilters": [
                                  ".*\\.pdf"
                              ],
                              "exclusionFilters": [
                                  ".*private.*\\.pdf"
                              ]
                          }
                      ]
                  }
              }
          }
      },
      "type": "SHAREPOINT"
  }
  ```

# Connect to Salesforce for your knowledge base
<a name="salesforce-data-source-connector"></a>

Salesforce is a customer relationship management (CRM) tool for managing support, sales, and marketing teams. You can connect to your Salesforce instance for your Amazon Bedrock knowledge base by using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)).

**Note**  
The Salesforce data source connector is in preview release and is subject to change.  
Salesforce data sources don't support multimodal data, such as tables, charts, diagrams, or other images.

Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

There are limits to how many files and MB per file that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-salesforce-connector)
+ [Prerequisites](#prerequisites-salesforce-connector)
+ [Connection configuration](#configuration-salesforce-connector)

## Supported features
<a name="supported-features-salesforce-connector"></a>
+ Auto detection of main document fields
+ Inclusion/exclusion content filters
+ Incremental content syncs for added, updated, deleted content
+ OAuth 2.0 authentication

## Prerequisites
<a name="prerequisites-salesforce-connector"></a>

**In Salesforce, make sure you**:
+ Take note of your Salesforce instance URL. For example, *https://company.salesforce.com/*. The instance must be running a Salesforce Connected App.
+ Create a Salesforce Connected App and configure client credentials. Then, for your selected app, copy the consumer key (client ID) and consumer secret (client secret) from the OAuth settings. For more information, see Salesforce documentation on [Create a Connected App](https://help.salesforce.com/s/articleView?id=sf.connected_app_create.htm&type=5) and [Configure a Connected App for the OAuth 2.0 Client Credentials](https://help.salesforce.com/s/articleView?id=sf.connected_app_client_credentials_setup.htm&type=5).
**Note**  
For Salesforce Connected Apps, under Client Credentials Flow, make sure you search and select the user’s name or alias for your client credentials in the “Run As” field.

**In your AWS account, make sure you**:
+ Store your authentication credentials in an [AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) and note the Amazon Resource Name (ARN) of the secret. Follow the **Connection configuration** instructions on this page to learn which key-value pairs must be included in your secret.
+ Include the necessary permissions to connect to your data source in your AWS Identity and Access Management (IAM) role/permissions policy for your knowledge base. For information on the required permissions for this data source to add to your knowledge base IAM role, see [Permissions to access data sources](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html#kb-permissions-access-ds).

**Note**  
If you use the console, you can go to AWS Secrets Manager to add your secret or use an existing secret as part of the data source configuration step. The IAM role with all the required permissions can be created for you as part of the console steps for creating a knowledge base. After you have configured your data source and other configurations, the IAM role with all the required permissions is applied to your specific knowledge base.  
We recommend that you regularly refresh or rotate your credentials and secret. Provide only the necessary access level for your own security. We do not recommend that you re-use credentials and secrets across data sources.

## Connection configuration
<a name="configuration-salesforce-connector"></a>

To connect to your Salesforce instance, you must provide the necessary configuration information so that Amazon Bedrock can access and crawl your data. You must also follow the [Prerequisites](#prerequisites-salesforce-connector).

An example of a configuration for this data source is included in this section.

For more information about auto detection of document fields, inclusion/exclusion filters, incremental syncing, secret authentication credentials, and how these work, see the following:

### Auto detection of main document fields
<a name="ds-salesforce-document-fields"></a>

The data source connector automatically detects and crawls all of the main metadata fields of your documents or content. For example, the data source connector can crawl the document body equivalent of your documents, the document title, the document creation or modification date, or other core fields that might apply to your documents.

**Important**  
If your content includes sensitive information, then Amazon Bedrock could respond using sensitive information.

You can apply filtering operators to metadata fields to help you further improve the relevancy of responses. For example, the document field "epoch_modification_time" (the number of seconds that have passed since January 1, 1970) records when the document was last updated. You can filter on the most recent data, where "epoch_modification_time" is *greater than* a certain number. For more information on the filtering operators you can apply to your metadata fields, see [Metadata and filtering](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html).
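As an illustrative sketch (not taken from this guide), a *greater than* filter on a hypothetical `epoch_modification_time` metadata field might be expressed in a retrieval configuration like the following; verify the exact field and operator names against the metadata filtering documentation:

```python
import json

# Hypothetical retrieval configuration: only return results whose source
# document was modified after January 1, 2024 (1704067200 epoch seconds).
# The field name "epoch_modification_time" and the "greaterThan" operator
# are assumptions to check against the metadata filtering reference.
retrieval_configuration = {
    "vectorSearchConfiguration": {
        "numberOfResults": 5,
        "filter": {
            "greaterThan": {
                "key": "epoch_modification_time",
                "value": 1704067200,
            }
        },
    }
}

print(json.dumps(retrieval_configuration, indent=2))
```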

### Inclusion/exclusion filters
<a name="ds-salesforce-inclusion-exclusion"></a>

You can include or exclude crawling certain content. For example, you can specify an exclusion prefix/regular expression pattern to skip crawling any file that contains “private” in the file name. You could also specify an inclusion prefix/regular expression pattern to include certain content entities or content types. If you specify an inclusion and exclusion filter and both match a document, the exclusion filter takes precedence and the document isn’t crawled.

An example of a regular expression pattern to exclude or filter out campaigns that contain "private" in the campaign name: `.*private.*`
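As a minimal sketch of how such patterns combine (assuming standard regular-expression matching against the campaign name), the exclusion filter takes precedence when both filters match:

```python
import re

# Hypothetical inclusion/exclusion patterns for Campaign names.
inclusion_filters = [re.compile(r".*public.*")]
exclusion_filters = [re.compile(r".*private.*")]

def is_crawled(campaign_name: str) -> bool:
    """Return True if the campaign would be crawled under these filters."""
    included = any(p.fullmatch(campaign_name) for p in inclusion_filters)
    excluded = any(p.fullmatch(campaign_name) for p in exclusion_filters)
    # When both filters match, exclusion wins and the item is not crawled.
    return included and not excluded

print(is_crawled("public-launch-campaign"))        # matches inclusion only
print(is_crawled("public-private-beta-campaign"))  # matches both; excluded
```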

You can apply inclusion/exclusion filters on the following content types:
+ `Account`: Account number/identifier
+ `Attachment`: Attachment file name with its extension
+ `Campaign`: Campaign name and associated identifiers
+ `ContentVersion`: Document version and associated identifiers
+ `Partner`: Partner information fields including associated identifiers
+ `Pricebook2`: Product/price list name
+ `Case`: Customer inquiry/issue number and other information fields including associated identifiers (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Contact`: Customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Contract`: Contract name and associated identifiers
+ `Document`: File name with its extension
+ `Idea`: Idea information fields and associated identifiers
+ `Lead`: Potential new customer information fields (please note: can contain personal information, which you can choose to exclude or filter out)
+ `Opportunity`: Pending sale/deal information fields and associated identifiers
+ `Product2`: Product information fields and associated identifiers
+ `Solution`: Solution name for a customer inquiry/issue and associated identifiers
+ `Task`: Task information fields and associated identifiers
+ `FeedItem`: Identifier of the chatter feed post
+ `FeedComment`: Identifier of the chatter feed post that the comments belong to
+ `Knowledge__kav`: Knowledge Article Title
+ `User`: User alias within your organization
+ `CollaborationGroup`: Chatter group name (unique)

### Incremental syncing
<a name="ds-salesforce-incremental-sync"></a>

The data source connector crawls new, modified, and deleted content each time your data source syncs with your knowledge base. Amazon Bedrock can use your data source’s mechanism for tracking content changes and crawl content that changed since the last sync. When you sync your data source with your knowledge base for the first time, all content is crawled by default.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

### Secret authentication credentials
<a name="ds-salesforce-secret-auth-credentials"></a>

(For OAuth 2.0 authentication) Your secret authentication credentials in AWS Secrets Manager should include these key-value pairs:
+ `consumerKey`: *app client ID*
+ `consumerSecret`: *app client secret*
+ `authenticationUrl`: *Salesforce instance URL or the URL to request the authentication token from*

**Note**  
Your secret in AWS Secrets Manager must be in the same AWS Region as your knowledge base.
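As a sketch of preparing such a secret (the values shown are placeholders), the secret string is a JSON object containing exactly these keys:

```python
import json

# Placeholder credentials; substitute your Connected App's consumer key and
# secret, and your Salesforce instance (or token) URL.
secret_value = {
    "consumerKey": "your-connected-app-client-id",
    "consumerSecret": "your-connected-app-client-secret",
    "authenticationUrl": "https://company.salesforce.com/",
}

secret_string = json.dumps(secret_value)

# The secret could then be stored with AWS Secrets Manager, for example:
#   boto3.client("secretsmanager", region_name="your-region").create_secret(
#       Name="AmazonBedrock-Salesforce", SecretString=secret_string)
print(secret_string)
```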

------
#### [ Console ]

**Connect a Salesforce instance to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Salesforce** as the data source.

1. Provide a name and optional description for the data source.

1. Provide your Salesforce instance URL. For example, *https://company.salesforce.com/*. The instance must be running a Salesforce Connected App.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. Provide the authentication information to connect to your Salesforce instance:

   1. For OAuth 2.0 authentication, go to AWS Secrets Manager to add your secret authentication credentials or use an existing Amazon Resource Name (ARN) for the secret you created. Your secret must contain the Salesforce Connected App consumer key (client ID), consumer secret (client secret), and the Salesforce instance URL or the URL to request the authentication token from. For more information, see Salesforce documentation on [Create a Connected App](https://help.salesforce.com/s/articleView?id=sf.connected_app_create.htm&type=5) and [Configure a Connected App for the OAuth 2.0 Client Credentials](https://help.salesforce.com/s/articleView?id=sf.connected_app_client_credentials_setup.htm&type=5).

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Choose to use filters/regular expressions patterns to include or exclude certain content. All standard content is crawled otherwise.

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

The following is an example of a configuration for connecting to Salesforce for your Amazon Bedrock knowledge base. You configure your data source using the API with the AWS CLI or supported SDK, such as Python. After you call [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html), you call [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) to create your data source with your connection information in `dataSourceConfiguration`.

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

**AWS Command Line Interface**

```
aws bedrock-agent create-data-source \
 --name "Salesforce connector" \
 --description "Salesforce data source connector for Amazon Bedrock to use content in Salesforce" \
 --knowledge-base-id "your-knowledge-base-id" \
 --data-source-configuration file://salesforce-bedrock-connector-configuration.json \
 --data-deletion-policy "DELETE" \
 --vector-ingestion-configuration '{"chunkingConfiguration":{"chunkingStrategy":"FIXED_SIZE","fixedSizeChunkingConfiguration":{"maxTokens":100,"overlapPercentage":10}}}'
```

**Contents of `salesforce-bedrock-connector-configuration.json`**

```
{
    "salesforceConfiguration": {
        "sourceConfiguration": {
            "hostUrl": "https://company.salesforce.com/",
            "authType": "OAUTH2_CLIENT_CREDENTIALS",
            "credentialsSecretArn": "arn:aws::secretsmanager:your-region:secret:AmazonBedrock-Salesforce"
        },
        "crawlerConfiguration": {
            "filterConfiguration": {
                "type": "PATTERN",
                "patternObjectFilter": {
                    "filters": [
                        {
                            "objectType": "Campaign",
                            "inclusionFilters": [
                                ".*public.*"
                            ],
                            "exclusionFilters": [
                                ".*private.*"
                            ]
                        }
                    ]
                }
            }
        }
    },
    "type": "SALESFORCE"
}
```

------

# Crawl web pages for your knowledge base
<a name="webcrawl-data-source-connector"></a>

The Amazon Bedrock provided Web Crawler connects to and crawls URLs you have selected for use in your Amazon Bedrock knowledge base. You can crawl website pages in accordance with your set scope or limits for your selected URLs. You can crawl website pages using either the [AWS Management Console for Amazon Bedrock](https://console.aws.amazon.com/bedrock/home) or the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) API (see Amazon Bedrock [supported SDKs and AWS CLI](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html)). Currently, only Amazon OpenSearch Serverless vector store is available to use with this data source.

**Note**  
The Web Crawler data source connector is in preview release and is subject to change.

When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl, and you must respect robots.txt configurations.

The Web Crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html).

There are limits to how many web page content items and MB per content item that can be crawled. See [Quotas for knowledge bases](https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html).

**Topics**
+ [Supported features](#supported-features-webcrawl-connector)
+ [Prerequisites](#prerequisites-webcrawl-connector)
+ [Connection configuration](#configuration-webcrawl-connector)

## Supported features
<a name="supported-features-webcrawl-connector"></a>

The Web Crawler connects to and crawls HTML pages starting from the seed URL, traversing all child links under the same top primary domain and path. If any of the HTML pages reference supported documents, the Web Crawler fetches these documents, regardless of whether they are within the same top primary domain. You can modify the crawling behavior by changing the crawling configuration. For more information, see [Connection configuration](#configuration-webcrawl-connector).

With the Web Crawler, you can:
+ Select multiple source URLs to crawl and set the scope of URLs to crawl only the host or also include subdomains.
+ Crawl static web pages that are part of your source URLs.
+ Specify a custom User-Agent suffix to set rules for your own crawler.
+ Include or exclude certain URLs that match a filter pattern.
+ Respect standard robots.txt directives like 'Allow' and 'Disallow'.
+ Limit the scope of the URLs to crawl and optionally exclude URLs that match a filter pattern.
+ Limit the rate of crawling URLs and the maximum number of pages to crawl.
+ View the status of crawled URLs in Amazon CloudWatch.

## Prerequisites
<a name="prerequisites-webcrawl-connector"></a>

**To use the Web Crawler, make sure you:**
+ Check that you are authorized to crawl your source URLs.
+ Check that the robots.txt file corresponding to your source URLs doesn't block the URLs from being crawled. The Web Crawler adheres to the robots.txt standard: it defaults to `disallow` if robots.txt is not found for the website. The Web Crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html). You can also specify a custom User-Agent header suffix to set rules for your own crawler. For more information, see Web Crawler URL access in the [Connection configuration](#configuration-webcrawl-connector) instructions on this page.
+ [Enable CloudWatch Logs delivery](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-bases-logging.html) and follow examples of Web Crawler logs to view the status of your data ingestion job for ingesting web content, and if certain URLs cannot be retrieved.

**Note**  
When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl.

## Connection configuration
<a name="configuration-webcrawl-connector"></a>

For more information about sync scope for crawling URLs, inclusion/exclusion filters, URL access, incremental syncing, and how these work, see the following:

### Sync scope for crawling URLs
<a name="ds-sync-scope"></a>

You can limit the scope of the URLs to crawl based on each page URL's specific relationship to the seed URLs. For faster crawls, you can limit URLs to those with the same host and initial URL path of the seed URL. For broader crawls, you can choose to crawl URLs with the same host or within any subdomain of the seed URL.

You can choose from the following options.
+ Default: Limit crawling to web pages that belong to the same host and with the same initial URL path. For example, with a seed URL of "https://aws.amazon.com/bedrock/" then only this path and web pages that extend from this path will be crawled, like "https://aws.amazon.com/bedrock/agents/". Sibling URLs like "https://aws.amazon.com/ec2/" are not crawled, for example.
+ Host only: Limit crawling to web pages that belong to the same host. For example, with a seed URL of "https://aws.amazon.com/bedrock/", then web pages with "https://aws.amazon.com" will also be crawled, like "https://aws.amazon.com/ec2".
+ Subdomains: Include crawling of any web page that has the same primary domain as the seed URL. For example, with a seed URL of "https://aws.amazon.com/bedrock/" then any web page that contains "amazon.com" (subdomain) will be crawled, like "https://www.amazon.com".

**Note**  
Make sure you are not crawling potentially excessive web pages. It's not recommended to crawl large websites, such as wikipedia.org, without filters or scope limits. Crawling large websites can take a very long time.  
[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope and if there's no exclusion pattern for the file type.
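The three scope options can be sketched as simple host and path comparisons. This is an illustrative approximation (the hypothetical `in_scope` helper is not part of any AWS SDK, and the service's actual matching logic may differ):

```python
from urllib.parse import urlparse

def in_scope(seed_url: str, candidate_url: str, scope: str = "DEFAULT") -> bool:
    """Approximate whether candidate_url falls within the crawl scope of seed_url."""
    seed, cand = urlparse(seed_url), urlparse(candidate_url)
    seed_host, cand_host = seed.hostname or "", cand.hostname or ""
    if scope == "DEFAULT":
        # Same host, and the candidate path extends the seed's initial path.
        return cand_host == seed_host and cand.path.startswith(seed.path)
    if scope == "HOST_ONLY":
        # Same host, any path.
        return cand_host == seed_host
    if scope == "SUBDOMAINS":
        # Same primary domain, including any subdomain of it.
        primary = ".".join(seed_host.split(".")[-2:])
        return cand_host == primary or cand_host.endswith("." + primary)
    raise ValueError(f"unknown scope: {scope}")

seed = "https://aws.amazon.com/bedrock/"
print(in_scope(seed, "https://aws.amazon.com/bedrock/agents/"))    # DEFAULT: in scope
print(in_scope(seed, "https://aws.amazon.com/ec2/"))               # DEFAULT: out of scope
print(in_scope(seed, "https://aws.amazon.com/ec2/", "HOST_ONLY"))  # same host: in scope
print(in_scope(seed, "https://www.amazon.com/", "SUBDOMAINS"))     # same primary domain
```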

The Web Crawler supports static websites.

You can also limit the rate of crawling URLs to control the throttling of crawling speed. You set the maximum number of URLs crawled per host per minute. In addition, you can also set the maximum number (up to 25,000) of total web pages to crawl. Note that if the total number of web pages from your source URLs exceeds your set maximum, then your data source sync/ingestion job will fail.

### Inclusion/exclusion filters
<a name="ds-inclusion-exclusion"></a>

You can include or exclude certain URLs in accordance with your scope. [Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope and if there's no exclusion pattern for the file type. If you specify an inclusion and exclusion filter and both match a URL, the exclusion filter takes precedence and the web content isn’t crawled.

**Important**  
Problematic regular expression pattern filters that lead to [catastrophic backtracking](https://docs.aws.amazon.com/codeguru/detector-library/python/catastrophic-backtracking-regex/) and look ahead are rejected.

An example of a regular expression filter pattern to exclude URLs that end with ".pdf", such as PDF web page attachments: `.*\.pdf$`
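Assuming standard regular-expression syntax, a pattern that excludes URLs ending in ".pdf" can be checked locally with Python's `re` module:

```python
import re

# Exclusion pattern for URLs that end with ".pdf".
pdf_exclusion = re.compile(r".*\.pdf$")

print(bool(pdf_exclusion.match("https://www.examplesite.com/docs/report.pdf")))   # excluded
print(bool(pdf_exclusion.match("https://www.examplesite.com/docs/report.html")))  # kept
```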

### Web Crawler URL access
<a name="ds-webcrawl-identity-crawling"></a>

You can use the Web Crawler to crawl the pages of websites that you are authorized to crawl.

When selecting websites to crawl, you must adhere to the [Amazon Acceptable Use Policy](https://aws.amazon.com/aup/) and all other Amazon terms. Remember that you must only use the Web Crawler to index your own web pages, or web pages that you have authorization to crawl.

The Web Crawler respects robots.txt in accordance with [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html).

You can specify 'Allow' or 'Disallow' directives for certain user agent bots to control whether they crawl your source URLs. You can modify the robots.txt file of your website to control how the Web Crawler crawls your source URLs. The crawler first looks for `bedrockbot-UUID` rules and then for generic `bedrockbot` rules in the robots.txt file.

You can also add a User-Agent suffix that can be used to allowlist your crawler in bot protection systems. The suffix does not need to be added to the `robots.txt` file; keeping it out of the file helps ensure that no one can impersonate your User-Agent string. For example, to allow the Web Crawler to crawl all website content and disallow crawling for any other robots, use the following directive:

```
User-agent: bedrockbot-UUID # Amazon Bedrock Web Crawler
Allow: / # allow access to all pages
User-agent: * # any (other) robot
Disallow: / # disallow access to any pages
```
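You can sanity-check directives like these locally with Python's standard `urllib.robotparser`; the `UUID` below is a placeholder for your crawler's actual suffix:

```python
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: bedrockbot-UUID
Allow: /
User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.modified()  # mark the rules as loaded so can_fetch() evaluates them
parser.parse(ROBOTS_TXT.splitlines())

# The suffixed Bedrock crawler is allowed; any other robot is disallowed.
print(parser.can_fetch("bedrockbot-UUID", "https://www.examplesite.com/page.html"))
print(parser.can_fetch("SomeOtherBot", "https://www.examplesite.com/page.html"))
```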

### Incremental syncing
<a name="ds-incremental-sync"></a>

Each time the Web Crawler runs, it retrieves content for all URLs that are reachable from the source URLs and that match the scope and filters. For incremental syncs after the first sync of all content, Amazon Bedrock updates your knowledge base with new and modified content, and removes old content that is no longer present. Occasionally, the crawler may not be able to tell whether content was removed from the website; in this case, it errs on the side of preserving old content in your knowledge base.

To sync your data source with your knowledge base, use the [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) API or select your knowledge base in the console and select **Sync** within the data source overview section.

**Important**  
All data that you sync from your data source becomes available to anyone with `bedrock:Retrieve` permissions to retrieve the data. This can also include any data with controlled data source permissions. For more information, see [Knowledge base permissions](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-permissions.html).

------
#### [ Console ]

**Connect a Web Crawler data source to your knowledge base**

1. Follow the steps at [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and choose **Web Crawler** as the data source.

1. Provide a name and optional description for the data source.

1. Provide the **Source URLs** that you want to crawl. You can add up to 9 additional URLs by selecting **Add Source URLs**. By providing a source URL, you are confirming that you are authorized to crawl its domain.

1. In the **Advanced settings** section, you can optionally configure the following:
   + **KMS key for transient data storage** – You can encrypt the transient data while converting your data into embeddings with the default AWS managed key or your own KMS key. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion).
   + **Data deletion policy** – You can delete the vector embeddings for your data source that are stored in the vector store by default, or choose to retain the vector store data.

1. (Optional) Provide a user agent suffix for **bedrockbot-UUID** that identifies the crawler or bot when it accesses a web server.

1. Configure the following in the **Sync scope** section:

   1. Select a **Website domain range** for crawling your source URLs:
      + Default: Limit crawling to web pages that belong to the same host and with the same initial URL path. For example, with a seed URL of "https://aws.amazon.com/bedrock/" then only this path and web pages that extend from this path will be crawled, like "https://aws.amazon.com/bedrock/agents/". Sibling URLs like "https://aws.amazon.com/ec2/" are not crawled, for example.
      + Host only: Limit crawling to web pages that belong to the same host. For example, with a seed URL of "https://aws.amazon.com/bedrock/", then web pages with "https://aws.amazon.com" will also be crawled, like "https://aws.amazon.com/ec2".
      + Subdomains: Include crawling of any web page that has the same primary domain as the seed URL. For example, with a seed URL of "https://aws.amazon.com/bedrock/" then any web page that contains "amazon.com" (subdomain) will be crawled, like "https://www.amazon.com".
**Note**  
Make sure you are not crawling potentially excessive web pages. It's not recommended to crawl large websites, such as wikipedia.org, without filters or scope limits. Crawling large websites can take a very long time.  
[Supported file types](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html) are crawled regardless of scope and if there's no exclusion pattern for the file type.

   1. Enter the **Maximum throttling of crawling speed**. Ingest between 1 and 300 URLs per host per minute. A higher crawling speed increases the load but takes less time.

   1. Enter the **Maximum pages for data source sync**, between 1 and 25,000. This limits the maximum number of web pages crawled from your source URLs. If the web pages exceed this number, the data source sync fails and no web pages are ingested.

   1. For **URL Regex** patterns (optional) you can add **Include patterns** or **Exclude patterns** by entering the regular expression pattern in the box. You can add up to 25 include and 25 exclude filter patterns by selecting **Add new pattern**. The include and exclude patterns are crawled in accordance with your scope. If there's a conflict, the exclude pattern takes precedence.

1. (Optional) In the **Content parsing and chunking** section, you can customize how to parse and chunk your data. Refer to the following resources to learn more about these customizations:
   + For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
   + For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
**Warning**  
You can't change the chunking strategy after connecting to the data source.
   + For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).

1. Continue to choose an embeddings model and vector store. To see the remaining steps, return to [Create a knowledge base by connecting to a data source in Amazon Bedrock Knowledge Bases](knowledge-base-create.md) and continue from the step after connecting your data source.

------
#### [ API ]

To connect a knowledge base to a data source using the Web Crawler, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt), specify `WEB` in the `type` field of the [DataSourceConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DataSourceConfiguration.html), and include the `webConfiguration` field. The following is an example of a Web Crawler configuration for your Amazon Bedrock knowledge base.

```
{
    "webConfiguration": {
        "sourceConfiguration": {
            "urlConfiguration": {
                "seedUrls": [{
                    "url": "https://www.examplesite.com"
                }]
            }
        },
        "crawlerConfiguration": {
            "crawlerLimits": {
                "rateLimit": 50,
                "maxPages": 100
            },
            "scope": "HOST_ONLY",
            "inclusionFilters": [
                "https://www\\.examplesite\\.com/.*\\.html"
            ],
            "exclusionFilters": [
                "https://www\\.examplesite\\.com/contact-us\\.html"
            ],
            "userAgent": "CustomUserAgent"
        }
    },
    "type": "WEB"
}
```
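
As a sketch of how you might assemble this payload programmatically (the helper function and its defaults are illustrative, not part of the Bedrock API), the following Python builds the same `WEB` data source configuration; you would pass the result as the `dataSourceConfiguration` of a CreateDataSource request, for example through an AWS SDK:

```python
import json

def build_web_crawler_config(seed_urls, scope="HOST_ONLY", rate_limit=50,
                             max_pages=100, include=None, exclude=None,
                             user_agent=None):
    """Assemble a WEB dataSourceConfiguration payload (illustrative helper).

    Field names mirror the API example above; the optional filter lists and
    user agent are included only when provided.
    """
    crawler = {
        "crawlerLimits": {"rateLimit": rate_limit, "maxPages": max_pages},
        "scope": scope,
    }
    if include:
        crawler["inclusionFilters"] = include
    if exclude:
        crawler["exclusionFilters"] = exclude
    if user_agent:
        crawler["userAgent"] = user_agent
    return {
        "type": "WEB",
        "webConfiguration": {
            "sourceConfiguration": {
                "urlConfiguration": {"seedUrls": [{"url": u} for u in seed_urls]}
            },
            "crawlerConfiguration": crawler,
        },
    }

config = build_web_crawler_config(
    ["https://www.examplesite.com"],
    include=[r"https://www\.examplesite\.com/.*\.html"],
    exclude=[r"https://www\.examplesite\.com/contact-us\.html"],
)
print(json.dumps(config, indent=2))
```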

To learn about customizations that you can apply to ingestion by including the optional `vectorIngestionConfiguration` field, see [Customize ingestion for a data source](kb-data-source-customize-ingestion.md).

------

# Connect your knowledge base to a custom data source
<a name="custom-data-source-connector"></a>

Instead of choosing a supported data source service, you can connect to a custom data source, which offers the following advantages:
+ Flexibility and control over the data types that you want your knowledge base to have access to.
+ The ability to use the `KnowledgeBaseDocuments` API operations to directly ingest or delete documents without the need to sync changes.
+ The ability to view documents in your data source directly through the Amazon Bedrock console or API.
+ The ability to upload documents into the data source directly in the AWS Management Console or to add them inline.
+ The ability to add metadata directly to each document when you add or update it in the data source. For more information on how to use metadata for filtering when retrieving information from a data source, see the **Metadata and filtering** tab in [Configure and customize queries and response generation](kb-test-config.md).

**Multimodal content support**  
Custom data sources support multimodal content, including images, audio, and video files up to 10 MB when base64 encoded. For comprehensive guidance on working with multimodal content, see [Build a knowledge base for multimodal content](kb-multimodal.md).

To connect a knowledge base to a custom data source, send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request with an [Agents for Amazon Bedrock build-time endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bra-bt). Specify the `knowledgeBaseId` of the knowledge base to connect to, give a `name` to the data source, and specify the `type` field in the `dataSourceConfiguration` as `CUSTOM`. The following shows a minimal example to create this data source:

```
PUT /knowledgebases/KB12345678/datasources/ HTTP/1.1
Content-type: application/json

{
    "name": "MyCustomDataSource",
    "dataSourceConfiguration": {
        "type": "CUSTOM"
    }
}
```

You can include any of the following optional fields to configure the data source:


| Field | Use case | 
| --- | --- | 
| description | To provide a description for the data source. | 
| clientToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 
| serverSideEncryptionConfiguration | To specify a custom KMS key for transient data storage while converting your data into embeddings. For more information, see [Encryption of transient data storage during data ingestion](encryption-kb.md#encryption-kb-ingestion) | 
| dataDeletionPolicy | To configure what happens to the vector embeddings for your data source in your vector store if you delete the data source. Specify RETAIN to keep the embeddings in the vector store, or DELETE (the default) to delete them. | 
| vectorIngestionConfiguration | To configure options for ingestion of the data source. For more information, see the field descriptions that follow this table. | 

The `vectorIngestionConfiguration` field maps to a [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) object containing the following fields:
+ chunkingConfiguration – To configure the strategy to use for chunking the documents in the data source. For more information about chunking strategies, see [How content chunking works for knowledge bases](kb-chunking.md).
+ parsingConfiguration – To configure the strategy to use for parsing the data source. For more information about parsing options, see [Parsing options for your data source](kb-advanced-parsing.md).
+ customTransformationConfiguration – To customize how the data is transformed and to apply a Lambda function for greater customization. For more information about how to customize chunking of your data and processing of your metadata with a Lambda function, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md).
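
Putting these pieces together, the following Python sketch assembles an illustrative request body for a `CUSTOM` data source that retains embeddings on deletion and uses fixed-size chunking. The name, description, and chunking values are placeholders, not recommendations; you would pass these fields in a CreateDataSource call, for example through an AWS SDK:

```python
import json

# Illustrative CreateDataSource request body for a CUSTOM data source.
# All values below are examples; the field names follow the API described above.
request_body = {
    "name": "MyCustomDataSource",
    "description": "Documents ingested directly via the KnowledgeBaseDocuments APIs",
    "dataSourceConfiguration": {"type": "CUSTOM"},
    "dataDeletionPolicy": "RETAIN",  # keep embeddings if the data source is deleted
    "vectorIngestionConfiguration": {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": 300,
                "overlapPercentage": 20,
            },
        }
    },
}
print(json.dumps(request_body, indent=2))
```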

After setting up your custom data source, you can add documents into it and directly ingest them into the knowledge base. Unlike other data sources, you don't need to sync a custom data source. To learn how to ingest documents directly, see [Ingest changes directly into a knowledge base](kb-direct-ingestion.md).

# Customize ingestion for a data source
<a name="kb-data-source-customize-ingestion"></a>

You can customize vector ingestion when connecting a data source in the AWS Management Console or by modifying the value of the `vectorIngestionConfiguration` field when sending a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request.

Select a topic to learn how to include configurations for customizing ingestion when connecting to a data source:

**Topics**
+ [Choose the tool to use for parsing](#kb-data-source-customize-parsing)
+ [Choose a chunking strategy](#kb-data-source-customize-chunking)
+ [Use a Lambda function during ingestion](#kb-data-source-customize-lambda)

## Choose the tool to use for parsing
<a name="kb-data-source-customize-parsing"></a>

You can customize how the documents in your data are parsed. To learn about options for parsing data in Amazon Bedrock Knowledge Bases, see [Parsing options for your data source](kb-advanced-parsing.md).

**Warning**  
You can't change the parsing strategy after connecting to the data source. To use a different parsing strategy, you can add a new data source.  
You can't add an S3 location to store multimodal data (including images, figures, charts, and tables) after you've created a knowledge base. If you want to include multimodal data and use a parser that supports it, you must create a new knowledge base.

The steps involved in choosing a parsing strategy depend on whether you use the AWS Management Console or the Amazon Bedrock API, and on the parsing method that you choose. If you choose a parsing method that supports multimodal data, you must specify an S3 URI in which to store the multimodal data extracted from your documents. This data can be returned in knowledge base queries.
+ In the AWS Management Console, do the following:

  1. Select the parsing strategy when you connect to a data source while setting up a knowledge base or when you add a new data source to your existing knowledge base.

  1. (If you choose Amazon Bedrock Data Automation or a foundation model as your parsing strategy) Specify an S3 URI in which to store the multimodal data extracted from your documents in the **Multimodal storage destination** section when you select an embeddings model and configure your vector store. You can also optionally use a customer managed key to encrypt your S3 data at this step.
+ In the Amazon Bedrock API, do the following:

  1. (If you plan to use Amazon Bedrock Data Automation or a foundation model as your parsing strategy) Include a [SupplementalDataStorageLocation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_SupplementalDataStorageLocation.html) in the [VectorKnowledgeBaseConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorKnowledgeBaseConfiguration.html) of a [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html) request.

  1. Include a [ParsingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ParsingConfiguration.html) in the `parsingConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) in the [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request.
**Note**  
If you omit this configuration, Amazon Bedrock Knowledge Bases uses the Amazon Bedrock default parser.

For more details about how to specify a parsing strategy in the API, expand the section that corresponds to the parsing strategy that you want to use:

### Amazon Bedrock default parser
<a name="w2aac28c10c23c15c17c11c13b1"></a>

To use the default parser, don't include a `parsingConfiguration` field within the `VectorIngestionConfiguration`.

### Amazon Bedrock Data Automation parser (preview)
<a name="w2aac28c10c23c15c17c11c13b3"></a>

To use the Amazon Bedrock Data Automation parser, specify `BEDROCK_DATA_AUTOMATION` in the `parsingStrategy` field of the `ParsingConfiguration` and include a [BedrockDataAutomationConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_BedrockDataAutomationConfiguration.html) in the `bedrockDataAutomationConfiguration` field, as in the following format:

```
{
    "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
    "bedrockDataAutomationConfiguration": {
        "parsingModality": "string"
    }
}
```

### Foundation model
<a name="w2aac28c10c23c15c17c11c13b5"></a>

To use a foundation model as a parser, specify `BEDROCK_FOUNDATION_MODEL` in the `parsingStrategy` field of the `ParsingConfiguration` and include a [BedrockFoundationModelConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_BedrockFoundationModelConfiguration.html) in the `bedrockFoundationModelConfiguration` field, as in the following format:

```
{
    "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
    "bedrockFoundationModelConfiguration": {
        "modelArn": "string",
        "parsingModality": "string",
        "parsingPrompt": {
            "parsingPromptText": "string"
        }
    }
}
```
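
As a filled-in illustration (the model ARN, parsing modality, and prompt below are placeholders; substitute a parsing-capable model available in your Region), such a configuration might be sketched in Python as:

```python
# Illustrative foundation-model parsing configuration. The ARN is a
# placeholder, not a real model identifier; the prompt is an example only.
parsing_configuration = {
    "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
    "bedrockFoundationModelConfiguration": {
        "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE-MODEL-ID",
        "parsingModality": "MULTIMODAL",
        "parsingPrompt": {
            "parsingPromptText": "Transcribe the text in this document, "
                                 "preserving tables and describing figures."
        },
    },
}
```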

## Choose a chunking strategy
<a name="kb-data-source-customize-chunking"></a>

You can customize how the documents in your data are chunked for storage and retrieval. To learn about options for chunking data in Amazon Bedrock Knowledge Bases, see [How content chunking works for knowledge bases](kb-chunking.md).

**Warning**  
You can't change the chunking strategy after connecting to the data source.

In the AWS Management Console you choose the chunking strategy when connecting to a data source. With the Amazon Bedrock API, you include a [ChunkingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_ChunkingConfiguration.html) in the `chunkingConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html).

**Note**  
If you omit this configuration, Amazon Bedrock splits your content into chunks of approximately 300 tokens, while preserving sentence boundaries.

Expand the section that corresponds to the chunking strategy that you want to use:

### No chunking
<a name="w2aac28c10c23c15c17c13c13b1"></a>

To treat each document in your data source as a single source chunk, specify `NONE` in the `chunkingStrategy` field of the `ChunkingConfiguration`, as in the following format:

```
{
    "chunkingStrategy": "NONE"
}
```

### Fixed-size chunking
<a name="w2aac28c10c23c15c17c13c13b3"></a>

To divide each document in your data source into chunks of approximately the same size, specify `FIXED_SIZE` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include a [FixedSizeChunkingConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_FixedSizeChunkingConfiguration.html) in the `fixedSizeChunkingConfiguration` field, as in the following format:

```
{
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": number,
        "overlapPercentage": number
    }
}
```
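
To build intuition for what these parameters do, here is a simplified Python illustration of fixed-size chunking with overlap. It mirrors the intent of `maxTokens` and `overlapPercentage`, not Amazon Bedrock's exact algorithm:

```python
def fixed_size_chunks(tokens, max_tokens, overlap_percentage):
    """Split a token list into chunks of at most max_tokens tokens, with
    consecutive chunks sharing roughly overlap_percentage percent of tokens.
    Simplified illustration only -- not Amazon Bedrock's implementation."""
    overlap = int(max_tokens * overlap_percentage / 100)
    step = max(1, max_tokens - overlap)  # advance by chunk size minus overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break
        start += step
    return chunks

tokens = [f"t{i}" for i in range(10)]
chunks = fixed_size_chunks(tokens, max_tokens=4, overlap_percentage=25)
# With 10 tokens, max_tokens=4, and 25% overlap (1 token), this produces
# three chunks: tokens 0-3, 3-6, and 6-9.
```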

### Hierarchical chunking
<a name="w2aac28c10c23c15c17c13c13b5"></a>

To divide each document in your data source into parent and child layers, where the child layer contains smaller chunks derived from the parent layer, specify `HIERARCHICAL` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include the `hierarchicalChunkingConfiguration` field with two entries in `levelConfigurations` (the first for parent chunks, the second for child chunks), as in the following format:

```
{
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
            {
                "maxTokens": number
            },
            {
                "maxTokens": number
            }
        ],
        "overlapTokens": number
    }
}
```
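
As an illustration (the token counts are example values, not recommendations), a configuration with larger parent chunks and smaller child chunks can be sketched in Python as:

```python
# Illustrative hierarchical chunking configuration: the first entry in
# levelConfigurations sets the parent chunk size, the second the child chunk
# size. All values below are examples only.
hierarchical_config = {
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
            {"maxTokens": 1500},  # parent chunks
            {"maxTokens": 300},   # child chunks
        ],
        "overlapTokens": 60,
    },
}
```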

### Semantic chunking
<a name="w2aac28c10c23c15c17c13c13b7"></a>

To divide each document in your data source into chunks that prioritize semantic meaning over syntactic structure, specify `SEMANTIC` in the `chunkingStrategy` field of the `ChunkingConfiguration` and include the `semanticChunkingConfiguration` field, as in the following format:

```
{
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "breakpointPercentileThreshold": number,
        "bufferSize": number,
        "maxTokens": number
    }
}
```
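
For example (illustrative values only, not recommendations), a semantic chunking configuration sketched in Python; a higher `breakpointPercentileThreshold` generally yields fewer, larger chunks:

```python
# Illustrative semantic chunking configuration. The values below are examples;
# bufferSize controls how many surrounding sentences are considered together
# when evaluating semantic similarity.
semantic_config = {
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "breakpointPercentileThreshold": 95,
        "bufferSize": 1,
        "maxTokens": 300,
    },
}
```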

## Use a Lambda function during ingestion
<a name="kb-data-source-customize-lambda"></a>

You can use a Lambda function to post-process the source chunks from your data before they're written to the vector store, in the following ways:
+ Include chunking logic to provide a custom chunking strategy.
+ Include logic to specify chunk-level metadata.

To learn about writing a custom Lambda function for ingestion, see [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md). In the AWS Management Console, you choose the Lambda function when connecting to a data source. With the Amazon Bedrock API, you include a [CustomTransformationConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CustomTransformationConfiguration.html) in the `customTransformationConfiguration` field of the [VectorIngestionConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_VectorIngestionConfiguration.html) and specify the ARN of the Lambda function, as in the following format:

```
{
    "transformations": [{
        "transformationFunction": {
            "transformationLambdaConfiguration": {
                "lambdaArn": "string"
            }
        },
        "stepToApply": "POST_CHUNKING"
    }],
    "intermediateStorage": {
        "s3Location": {
            "uri": "string"
        }
    }
}
```
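
The shape of such a Lambda function can be sketched as follows. This is a simplified illustration: the real event and response contract is defined in [Use a custom transformation Lambda function to define how your data is ingested](kb-custom-transformation.md), and the field names used here are placeholders:

```python
def lambda_handler(event, context):
    # Simplified post-chunking transform. The actual event that Amazon
    # Bedrock sends references content batches in the intermediate S3
    # location; the flat "chunks" structure here is a placeholder used to
    # show the idea of trimming chunk text and attaching chunk-level metadata.
    transformed = []
    for chunk in event.get("chunks", []):
        body = chunk["contentBody"].strip()
        chunk["contentBody"] = body
        chunk.setdefault("contentMetadata", {})["charCount"] = str(len(body))
        transformed.append(chunk)
    return {"chunks": transformed}
```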

You also specify the S3 location in which to store the output after applying the Lambda function.

You can include the `chunkingConfiguration` field to apply the Lambda function after applying one of the chunking options that Amazon Bedrock offers.

# Set up security configurations for your knowledge base
<a name="kb-create-security"></a>

After you've created a knowledge base, you might have to set up the following security configurations:

**Topics**
+ [Set up data access policies for your knowledge base](#kb-create-security-data)
+ [Set up network access policies for your Amazon OpenSearch Serverless knowledge base](#kb-create-security-network)

## Set up data access policies for your knowledge base
<a name="kb-create-security-data"></a>

If you're using a [custom role](kb-permissions.md), set up security configurations for your newly created knowledge base. If you let Amazon Bedrock create a service role for you, you can skip this step. Follow the steps in the tab corresponding to the database that you set up.

------
#### [ Amazon OpenSearch Serverless ]

To restrict access to the Amazon OpenSearch Serverless collection to the knowledge base service role, create a data access policy. You can do so in the following ways:
+ Use the Amazon OpenSearch Service console by following the steps at [Creating data access policies (console)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-console) in the Amazon OpenSearch Service Developer Guide.
+ Use the AWS API by sending a [CreateAccessPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_CreateAccessPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). For an AWS CLI example, see [Creating data access policies (AWS CLI)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-cli).

Use the following data access policy, specifying the Amazon OpenSearch Serverless collection and your service role:

```
[
    {
        "Description": "${data access policy description}",
        "Rules": [
          {
            "Resource": [
              "index/${collection_name}/*"
            ],
            "Permission": [
                "aoss:DescribeIndex",
                "aoss:ReadDocument",
                "aoss:WriteDocument"
            ],
            "ResourceType": "index"
          }
        ],
        "Principal": [
            "arn:aws:iam::${account-id}:role/${kb-service-role}"
        ]
    }
]
```
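
As a sketch using the AWS SDK for Python (the collection name, account ID, and role name below are placeholders), you can build the policy document locally and attach it with the OpenSearch Serverless `create_access_policy` operation, which accepts the policy as a JSON string:

```python
import json

# Placeholders: substitute your collection name, account ID, and service role.
collection_name = "my-kb-collection"
service_role_arn = "arn:aws:iam::123456789012:role/MyKbServiceRole"

data_access_policy = [{
    "Description": "Knowledge base access to the vector index",
    "Rules": [{
        "Resource": [f"index/{collection_name}/*"],
        "Permission": [
            "aoss:DescribeIndex",
            "aoss:ReadDocument",
            "aoss:WriteDocument",
        ],
        "ResourceType": "index",
    }],
    "Principal": [service_role_arn],
}]

policy_document = json.dumps(data_access_policy)
# Attach it with, for example:
# boto3.client("opensearchserverless").create_access_policy(
#     name="kb-data-access", type="data", policy=policy_document)
```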

------
#### [ Pinecone, Redis Enterprise Cloud or MongoDB Atlas ]

To integrate a Pinecone, Redis Enterprise Cloud, or MongoDB Atlas vector index, attach the following identity-based policy to your knowledge base service role to allow it to access the AWS Secrets Manager secret for the vector index.

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:AssociateThirdPartyKnowledgeBase"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "bedrock:ThirdPartyKnowledgeBaseCredentialsSecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:${secret-id}"
            }
        }
    }]
}
```

------

------

## Set up network access policies for your Amazon OpenSearch Serverless knowledge base
<a name="kb-create-security-network"></a>

If you use a private Amazon OpenSearch Serverless collection for your knowledge base, it can only be accessed through an AWS PrivateLink VPC endpoint. You can create a private Amazon OpenSearch Serverless collection when you [set up your Amazon OpenSearch Serverless vector collection](knowledge-base-setup.md) or you can make an existing Amazon OpenSearch Serverless collection (including one that the Amazon Bedrock console created for you) private when you configure its network access policy.

The following resources in the Amazon OpenSearch Service Developer Guide will help you understand the setup required for a private Amazon OpenSearch Serverless collection:
+ For more information about setting up a VPC endpoint for a private Amazon OpenSearch Serverless collection, see [Access Amazon OpenSearch Serverless using an interface endpoint (AWS PrivateLink)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-vpc.html).
+ For more information about network access policies in Amazon OpenSearch Serverless, see [Network access for Amazon OpenSearch Serverless](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-network.html).

To allow an Amazon Bedrock knowledge base to access a private Amazon OpenSearch Serverless collection, you must edit the network access policy for the Amazon OpenSearch Serverless collection to allow Amazon Bedrock as a source service. Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Open the Amazon OpenSearch Service console at [https://console.aws.amazon.com/aos/](https://console.aws.amazon.com/aos/).

1. From the left navigation pane, select **Collections**. Then choose your collection.

1. In the **Network** section, select the **Associated Policy**.

1. Choose **Edit**.

1. For **Select policy definition method**, do one of the following:
   + Leave **Select policy definition method** as **Visual editor** and configure the following settings in the **Rule 1** section:

     1. (Optional) In the **Rule name** field, enter a name for the network access rule.

     1. Under **Access collections from**, select **Private (recommended)**.

     1. Select **AWS service private access**. In the text box, enter **bedrock.amazonaws.com**.

     1. Unselect **Enable access to OpenSearch Dashboards.**
   + Choose **JSON** and paste the following policy in the **JSON editor**.

     ```
     [
         {                                        
             "AllowFromPublic": false,
             "Description":"${network access policy description}",
             "Rules":[
                 {
                     "ResourceType": "collection",
                     "Resource":[
                         "collection/${collection-id}"
                     ]
                 }
             ],
             "SourceServices":[
                 "bedrock.amazonaws.com"
             ]
         }
     ]
     ```

1. Choose **Update**.

------
#### [ API ]

To edit the network access policy for your Amazon OpenSearch Serverless collection, do the following:

1. Send a [GetSecurityPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_GetSecurityPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). Specify the `name` of the policy and specify the `type` as `network`. Note the `policyVersion` in the response.

1. Send an [UpdateSecurityPolicy](https://docs.aws.amazon.com/opensearch-service/latest/ServerlessAPIReference/API_UpdateSecurityPolicy.html) request with an [OpenSearch Serverless endpoint](https://docs.aws.amazon.com/general/latest/gr/opensearch-service.html#opensearch-service-regions). Minimally, specify the `name` of the policy, the `type` as `network`, the `policyVersion` that you noted in the previous step, and the following `policy` document:

   ```
   [
       {                                        
           "AllowFromPublic": false,
           "Description":"${network access policy description}",
           "Rules":[
               {
                   "ResourceType": "collection",
                   "Resource":[
                       "collection/${collection-id}"
                   ]
               }
           ],
           "SourceServices":[
               "bedrock.amazonaws.com"
           ]
       }
   ]
   ```
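
The two API steps above can be sketched in Python. The function takes a boto3 `opensearchserverless` client (an assumption of this sketch) so that the read-then-update sequence is explicit:

```python
import json

def allow_bedrock_access(client, policy_name, collection_id):
    """Read the current network policy to get its version, then update it so
    that Amazon Bedrock is allowed as a source service. `client` is assumed
    to be a boto3 "opensearchserverless" client."""
    current = client.get_security_policy(name=policy_name, type="network")
    version = current["securityPolicyDetail"]["policyVersion"]
    policy = [{
        "AllowFromPublic": False,
        "Description": "Allow Amazon Bedrock knowledge base access",
        "Rules": [{
            "ResourceType": "collection",
            "Resource": [f"collection/{collection_id}"],
        }],
        "SourceServices": ["bedrock.amazonaws.com"],
    }]
    return client.update_security_policy(
        name=policy_name,
        type="network",
        policyVersion=version,
        policy=json.dumps(policy),
    )
```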

For an AWS CLI example, see [Creating data access policies (AWS CLI)](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-data-access.html#serverless-data-access-cli).

------