

# Build a knowledge base for multimodal content
<a name="kb-multimodal"></a>

Amazon Bedrock Knowledge Bases supports multimodal content including images, audio, and video files. You can search using images as queries, retrieve visually similar content, and process multimedia files alongside traditional text documents. This capability enables you to extract insights from diverse data types—standalone images, audio recordings, and video files stored across your organization.

Amazon Bedrock Knowledge Bases enables you to index and retrieve information from text, visual, and audio content. Organizations can search product catalogs using images, find specific moments in training videos, and retrieve relevant segments from customer support call recordings.

**Regional availability**  
Multimodal processing approaches have different regional availability. For detailed information, see [Regional availability](kb-multimodal-choose-approach.md#kb-multimodal-processing-regions).

## Features and capabilities
<a name="kb-multimodal-features"></a>

Multimodal knowledge bases provide the following key capabilities:

**Image-based queries**  
Submit images as search queries to find visually similar content when using Nova Multimodal Embeddings. Supports product matching, visual similarity search, and image retrieval.

**Audio content retrieval**  
Search audio files using text queries. Retrieve specific segments from recordings with timestamp references. Audio transcription enables text-based search across spoken content including meetings, calls, and podcasts.

**Video segment extraction**  
Locate specific moments within video files using text queries. Retrieve video segments with precise timestamps.

**Cross-modal search**  
Search across different data types including text documents, images, audio, and video. Retrieve relevant content regardless of original format.

**Source references with timestamps**  
Retrieval results include references to original files with temporal metadata for audio and video. Enables precise navigation to relevant segments within multimedia content.

**Flexible processing options**  
Choose between native multimodal embeddings for visual similarity or text conversion for speech-based content. Configure processing approach based on content characteristics and application requirements.

## How it works
<a name="kb-multimodal-how-it-works"></a>

Multimodal knowledge bases process and retrieve content through a multi-stage pipeline that handles different data types appropriately:

**Ingestion and processing**

1. **Data source connection:** Connect your knowledge base to Amazon S3 buckets or custom data sources containing text documents, images, audio files, and video files.

1. **File type detection:** The system identifies each file type by its extension and routes it to the appropriate processing pipeline.

1. **Content processing:** Depending on your configuration, files are processed using one of two approaches:
   + **Nova Multimodal Embeddings:** Preserves native format for visual and audio similarity matching. Images, audio, and video are embedded directly without conversion to text.
   + **Bedrock Data Automation (BDA):** Converts multimedia to text representations. Audio is transcribed using Automatic Speech Recognition (ASR), video is processed to extract scene summaries and transcripts, and images undergo OCR and visual content extraction.

1. **Embedding generation:** Processed content is converted to vector embeddings using your selected embedding model. These embeddings capture semantic meaning and enable similarity-based retrieval.

1. **Vector storage:** Embeddings are stored in your configured vector database along with metadata including file references, timestamps (for audio and video), and content type information.

1. **Multimodal storage (optional):** If configured, original multimedia files are copied to a dedicated multimodal storage destination for reliable retrieval, ensuring availability even if source files are modified or deleted.
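The file-type detection and routing step above can be sketched as a simple dispatch on file extension. This is a hypothetical illustration of the routing logic; the actual service behavior is internal to Amazon Bedrock, and the extension sets shown here mirror the supported file types listed later in this guide.

```python
from pathlib import Path

# Hypothetical extension-to-pipeline routing, mirroring the file type
# detection step above. The real routing logic is internal to Bedrock.
PIPELINES = {
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp"},
    "audio": {".mp3", ".ogg", ".wav"},
    "video": {".mp4", ".mov", ".mkv", ".webm"},
}

def route_file(key: str) -> str:
    """Return the processing pipeline for an S3 object key."""
    ext = Path(key).suffix.lower()
    for pipeline, extensions in PIPELINES.items():
        if ext in extensions:
            return pipeline
    return "text"  # default pipeline for documents

print(route_file("catalog/shoe.jpg"))   # image
print(route_file("calls/support.wav"))  # audio
print(route_file("docs/manual.pdf"))    # text
```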

**Query and retrieval**

1. **Query processing:** User queries (text or image) are converted to embeddings using the same embedding model used during ingestion.

1. **Similarity search:** The query embedding is compared against stored embeddings in the vector database to identify the most relevant content.

1. **Result retrieval:** The system returns matching content with metadata including:
   + Source URI (original file location)
   + Timestamp metadata (for audio and video segments)
   + Content type and modality information

1. **Response generation (optional):** For `RetrieveAndGenerate` requests, retrieved content is passed to a foundation model to generate contextually relevant text responses. This is supported when using BDA processing or when the knowledge base contains text content.

**Important**  
The system returns references to complete files with timestamp metadata for audio and video content. Your application must extract and play specific segments based on the provided start and end timestamps. The AWS Management Console handles this automatically.
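Because the service returns whole-file references plus timestamps, your application needs a small amount of glue code to turn retrieval results into playable segments. The sketch below shows one way to do that; the metadata key names for segment start and end times are assumptions for illustration, so inspect your own responses to confirm the fields your knowledge base actually returns.

```python
# Sketch: pull playable segments out of a Retrieve response.
# The segment-time metadata keys below are ASSUMED for illustration;
# verify the actual field names in your own retrieval results.
def extract_segments(retrieval_results: list) -> list:
    segments = []
    for result in retrieval_results:
        metadata = result.get("metadata", {})
        uri = result.get("location", {}).get("s3Location", {}).get("uri")
        start = metadata.get("x-amz-bedrock-kb-segment-start-time")  # assumed key
        end = metadata.get("x-amz-bedrock-kb-segment-end-time")      # assumed key
        if uri and start is not None and end is not None:
            segments.append({"uri": uri, "start": float(start), "end": float(end)})
    return segments

sample = [{
    "location": {"s3Location": {"uri": "s3://media/call-01.wav"}},
    "metadata": {
        "x-amz-bedrock-kb-segment-start-time": "12.5",
        "x-amz-bedrock-kb-segment-end-time": "18.0",
    },
}]
print(extract_segments(sample))
```

Your player can then seek to `start` and stop at `end` for each returned segment, which is what the console does for you automatically.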

**Topics**
+ [Features and capabilities](#kb-multimodal-features)
+ [How it works](#kb-multimodal-how-it-works)
+ [Choosing your multimodal processing approach](kb-multimodal-choose-approach.md)
+ [Prerequisites for multimodal knowledge bases](kb-multimodal-prerequisites.md)
+ [Create a knowledge base for multimodal content](kb-multimodal-create.md)
+ [Adding data sources and starting ingestion](kb-multimodal-add-data-source-and-ingest.md)
+ [Testing and querying multimodal knowledge bases](kb-multimodal-test-and-query.md)
+ [Troubleshooting multimodal knowledge bases](kb-multimodal-troubleshooting.md)

# Choosing your multimodal processing approach
<a name="kb-multimodal-choose-approach"></a>

Amazon Bedrock Knowledge Bases offers two approaches for processing multimodal content: Nova Multimodal Embeddings for visual similarity searches, and Bedrock Data Automation (BDA) for text-based processing of multimedia content. You can also use a foundation model as your parser for image input, but not for audio or video.

This section describes using Nova Multimodal Embeddings and BDA as your processing approach for multimodal content. Each approach is optimized for different use cases and query patterns.

**Topics**
+ [Multimodal processing approach](#kb-multimodal-processing-approach)
+ [Regional availability](#kb-multimodal-processing-regions)
+ [Selection criteria by content type](#kb-multimodal-selection-guidance)
+ [Supported file types and data sources](#kb-multimodal-supported-files)
+ [Capabilities and limitations](#kb-multimodal-approach-details)

## Multimodal processing approach
<a name="kb-multimodal-processing-approach"></a>

The following table shows a comparison between Nova Multimodal Embeddings and BDA for processing multimodal content.


**Processing approach comparison**  

| Characteristic | Nova Multimodal Embeddings | Bedrock Data Automation (BDA) | 
| --- | --- | --- | 
| Processing method | Generates embeddings without intermediate text conversion | Converts multimedia to text, then creates embeddings | 
| Query types supported | Text queries or image queries | Text queries only | 
| Primary use cases | Visual similarity search, product matching, image discovery | Speech transcription, text-based search, content analysis | 
| RAG functionality | Limited to text content only | Full RetrieveAndGenerate support | 
| Storage requirements | Multimodal storage destination required | Multimodal storage destination optional. If not specified, BDA processes only text data; for non-text input, you must specify a multimodal storage destination. | 

## Regional availability
<a name="kb-multimodal-processing-regions"></a>


**Regional availability**  

| Nova Multimodal Embeddings | Bedrock Data Automation (BDA) | 
| --- | --- | 
| US East (N. Virginia) only |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/kb-multimodal-choose-approach.html)  | 

## Selection criteria by content type
<a name="kb-multimodal-selection-guidance"></a>

Use this decision matrix to choose the appropriate processing approach based on your content and use case requirements:

**Note**  
If you use the BDA parser with the Amazon Nova Multimodal Embeddings model, the embedding model acts as a text embeddings model because BDA converts multimedia content to text first. When working with multimodal content, choose the processing approach that matches your use case for best results.


**Processing approach recommendations by content type**  

| Content Type | Nova Multimodal Embeddings | Bedrock Data Automation (BDA) | 
| --- | --- | --- | 
| Product catalogs and images | Recommended - Enables visual similarity matching and image-based queries | Limited - Only extracts text through OCR | 
| Meeting recordings and calls | Cannot process speech content meaningfully | Recommended - Provides full speech transcription and searchable text | 
| Training and educational videos | Partial - Handles visual content but misses speech | Recommended - Captures both speech transcripts and visual descriptions | 
| Customer support recordings | Not recommended - Speech content cannot be processed effectively | Recommended - Creates complete searchable conversation transcripts | 
| Technical diagrams and charts | Recommended - Excellent for visual similarity and pattern matching | Limited - Extracts text labels but misses visual relationships | 

## Supported file types and data sources
<a name="kb-multimodal-supported-files"></a>

The supported file types depend on your chosen processing approach:


**Supported file types by processing approach**  

| File Type | Nova Multimodal Embeddings | Bedrock Data Automation (BDA) | 
| --- | --- | --- | 
| Images | .png, .jpg, .jpeg, .gif, .webp | .png, .jpg, .jpeg | 
| Audio | .mp3, .ogg, .wav | .amr, .flac, .m4a, .mp3, .ogg, .wav | 
| Video | .mp4, .mov, .mkv, .webm, .flv, .mpeg, .mpg, .wmv, .3gp | .mp4, .mov | 
| Documents | Processed as text | .pdf (plus text extraction from images) | 

**Supported data sources**  
Multimodal content is supported with the following data sources:
+ **Amazon S3:** Full support for all multimodal file types
+ **Custom data sources:** Support for inline content up to 10 MB (base64 encoded)

**Important**  
Multimodal retrieval is currently available only for Amazon S3 data sources. Other data sources (Confluence, SharePoint, Salesforce, Web Crawler) do not process multimodal files during ingestion. These files are skipped and will not be available for multimodal queries.

## Capabilities and limitations
<a name="kb-multimodal-approach-details"></a>

**Nova Multimodal Embeddings**  
**Key capabilities:**  
+ Native multimodal processing preserves original content format for optimal visual similarity matching
+ Image-based queries allow users to upload images and find visually similar content
+ Excellent performance for product catalogs, visual search, and content discovery applications
**Limitations:**  
+ Cannot effectively process speech or audio content: spoken information is not searchable
+ `RetrieveAndGenerate` and rerank functionality limited to text content only
+ Requires configuration of a dedicated multimodal storage destination

**Bedrock Data Automation (BDA)**  
**Key capabilities:**  
+ Comprehensive speech transcription using Automatic Speech Recognition (ASR) technology
+ Visual content analysis generates descriptive text for images and video scenes
+ Full `RetrieveAndGenerate` support enables complete RAG functionality across all content
+ Text-based search works consistently across all multimedia content types
**Limitations:**  
+ No support for image-based queries when used without Nova Multimodal Embeddings: all searches must use text input
+ Cannot perform visual similarity matching or image-to-image searches
+ Longer ingestion processing time due to content conversion requirements
+ Supports fewer multimedia file formats compared to Nova Multimodal Embeddings

**Speech content processing**  
Nova Multimodal Embeddings cannot effectively process speech content in audio or video files. If your multimedia content contains important spoken information that users need to search, choose the BDA approach to ensure full transcription and searchability.

# Prerequisites for multimodal knowledge bases
<a name="kb-multimodal-prerequisites"></a>

Amazon Bedrock multimodal knowledge bases require additional setup beyond standard knowledge bases to process images, audio, and video content. The specific prerequisites depend on your chosen processing approach and storage configuration.

Before you can create a multimodal knowledge base, you must fulfill the following prerequisites:

**Topics**
+ [Prerequisites](#kb-multimodal-prerequisites)
+ [Permissions for multimodal content](#kb-multimodal-prerequisites-permissions)

## Prerequisites
<a name="kb-multimodal-prerequisites"></a>

1. Make sure your data is in a [supported data source connector](data-source-connectors.md). Multimodal content is only supported with Amazon S3 and custom data sources.

1. (Optional) [Set up your own supported vector store](knowledge-base-setup.md). You can skip this step if you plan to use the AWS Management Console to automatically create a vector store for you.

1. Create a custom AWS Identity and Access Management (IAM) [service role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-role) with the proper permissions for multimodal processing. See [Permissions for multimodal content](#kb-multimodal-prerequisites-permissions) for details.
**Note**  
If you're using the console, Amazon Bedrock Knowledge Bases will automatically configure the permissions for you.

1. (Optional) Set up extra security configurations by following the steps at [Encryption of knowledge base resources](encryption-kb.md).

1. If you plan to use the [RetrieveAndGenerate](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html) API operation with BDA-processed content, request access to the models that you'll use in the Regions that you'll use them in by following the steps at [Access Amazon Bedrock foundation models](model-access.md).

## Permissions for multimodal content
<a name="kb-multimodal-prerequisites-permissions"></a>

Multimodal knowledge bases require additional permissions beyond standard knowledge base permissions. The specific permissions depend on your chosen processing approach and storage configuration.

You must configure the following permissions based on your multimodal processing approach:
+ **Nova Multimodal Embeddings permissions:** Required when using Nova Multimodal Embeddings for direct visual and audio similarity searches. Includes permissions for asynchronous model invocation and multimodal storage access.
+ **Bedrock Data Automation (BDA) permissions:** Required when using BDA to convert multimodal content to text representations. Includes permissions for data automation invocation and status monitoring.
+ **Customer-managed KMS key permissions:** Required when using customer-managed encryption keys with BDA processing. Includes permissions for key operations and grant creation.
+ **Multimodal storage permissions:** Required when configuring a multimodal storage destination. Includes standard S3 permissions for the storage bucket.

For detailed IAM policies and step-by-step permission configuration, see [Permissions for multimodal content](kb-permissions.md#kb-permissions-multimodal).

### Storage requirements
<a name="kb-multimodal-storage-requirements"></a>

**Nova Multimodal Embeddings**  
**Required:** You must configure a multimodal storage destination. This destination stores copies of your multimedia files for retrieval and ensures availability even if source files are modified or deleted.

**Bedrock Data Automation (BDA)**  
**Optional:** You can configure a multimodal storage bucket for additional reliability and to retrieve files at runtime. However, it's not required, because BDA converts content to text.  
If you select the BDA parser without configuring a multimodal storage bucket, only text parsing will be available. To use multimodal parsing capabilities with BDA (processing images, audio, and video), you must configure a multimodal storage destination.

**Multimodal storage destination configuration**  
When configuring your multimodal storage destination, consider the following:
+ **Use separate buckets (recommended):** Configure different Amazon S3 buckets for your data source and multimodal storage destination. This provides the simplest setup and avoids potential conflicts.
+ **If using the same bucket:** You must specify an inclusion prefix for your data source that limits which content is ingested. This prevents re-ingesting extracted media files.
+ **Avoid "aws/" prefix:** When using the same bucket for both data source and multimodal storage destination, do not use inclusion prefixes starting with "aws/" as this path is reserved for extracted media storage.
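The bucket rules above can be enforced with a small pre-flight check before you create the knowledge base. This is a hypothetical helper for illustration, not part of any AWS SDK: when the data source and multimodal storage share a bucket, an inclusion prefix is required and must not start with `aws/`.

```python
from typing import List, Optional

# Hypothetical pre-flight validation of the same-bucket guidance above.
def validate_storage_config(data_bucket: str, storage_bucket: str,
                            inclusion_prefix: Optional[str]) -> List[str]:
    """Return a list of configuration errors (empty list means valid)."""
    errors = []
    if data_bucket == storage_bucket:
        if not inclusion_prefix:
            errors.append("Same bucket requires an inclusion prefix on the data source.")
        elif inclusion_prefix.startswith("aws/"):
            errors.append('Inclusion prefix must not start with "aws/" '
                          "(reserved for extracted media storage).")
    return errors

print(validate_storage_config("media", "media", None))       # error: prefix required
print(validate_storage_config("media", "mm-store", None))    # separate buckets: valid
```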

# Create a knowledge base for multimodal content
<a name="kb-multimodal-create"></a>

You can create multimodal knowledge bases using either the console or API. Choose your approach based on your multimodal processing needs.

**Important**  
Multimodal support is only available when creating a knowledge base with unstructured data sources. Structured data sources do not support multimodal content processing.

------
#### [ Console ]

**To create a multimodal knowledge base from the console**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. In the left navigation pane, choose **Knowledge bases**.

1. In the **Knowledge bases** section, choose **Create**, and then choose **Knowledge Base with vector store**.

1. (Optional) Under **Knowledge Base details**, change the default name and provide a description for your knowledge base.

1. Under **IAM permissions**, choose an IAM role that provides Amazon Bedrock permissions to access other required AWS services. You can either have Amazon Bedrock create the service role for you, or you can choose to use your own custom role. For multimodal permissions, see [Permissions for multimodal content](kb-permissions.md#kb-permissions-multimodal).

1. Choose **Amazon S3** as your data source and choose **Next** to configure your data source.
**Note**  
You can add up to 5 Amazon S3 data sources during knowledge base creation. Additional data sources can be added after the knowledge base is created.

1. Provide the **S3 URI** of the bucket containing your multimodal content and configure an inclusion prefix if needed. The inclusion prefix is a folder path that can be used to limit what content gets ingested.

1. Under **Chunking and parsing configurations**, choose your parsing strategy:
   + **Bedrock default parser:** Recommended for text-only content processing. This parser processes common text formats while ignoring multimodal files. Supports text documents including Word, Excel, HTML, Markdown, TXT, and CSV files.
   + **Bedrock Data Automation (BDA):** Converts multimodal content to searchable text representations. Processes PDFs, images, audio, and video files to extract text, generate descriptions for visual content, and create transcriptions for audio and video content.
   + **Foundation model parser:** Provides advanced parsing capabilities for complex document structures. Processes PDFs, images, structured documents, tables, and visually rich content to extract text and generate descriptions for visual elements.

1. Choose **Next** and select your embedding model and multimodal processing approach. 
   + **Amazon Nova Multimodal Embeddings V1.0:** Choose **Amazon Nova embedding V1.0** for direct visual and audio similarity searches. Configure audio and video chunk duration (1-30 seconds, default 5 seconds) to control how content is segmented.
**Note**  
Audio and video chunking parameters are configured at the embedding model level, not at the data source level. A validation exception occurs if you provide this configuration for non-multimodal embedding models. Shorter chunks enable more precise content retrieval, while longer chunks preserve more semantic context.
**Important**  
Amazon Nova embedding v1.0 has limited support for searching speech content in audio/video data. If you need to support speech, use Bedrock Data Automation as a parser.
   + **Text embeddings with BDA:** Choose a text embedding model (such as Titan Text Embeddings v2) when using BDA processing. Text embedding models limit retrieval to text-only content, but you can enable multimodal retrieval by selecting either Amazon Bedrock Data Automation or Foundation Model as parsers.
**Note**  
If you use the BDA parser with Nova Multimodal Embeddings, Amazon Bedrock Knowledge Bases applies BDA parsing first. In this case, the embedding model does not generate native multimodal embeddings for images, audio, and video, because BDA converts them to text representations.

1. If using Nova Multimodal Embeddings, configure the **Multimodal storage destination** by specifying an Amazon S3 bucket where processed files will be stored for retrieval. Amazon Bedrock Knowledge Bases stores parsed images in a single Amazon S3 bucket, inside a `.bda` folder that it creates for easy access.
**Lifecycle policy recommendation**  
When using Nova Multimodal Embeddings, Amazon Bedrock stores transient data in your multimodal storage destination and attempts to delete it after processing is completed. We recommend applying a lifecycle policy on the transient data path to ensure proper cleanup. For detailed instructions, see [Managing transient data with Amazon S3 lifecycle policies](kb-multimodal-troubleshooting.md#kb-multimodal-lifecycle-policy).

1. In the **Vector database** section, choose your vector store method and configure the appropriate dimensions based on your selected embedding model.

1. Choose **Next** and review the details of your knowledge base configuration, then choose **Create knowledge base**.

------
#### [ CLI ]

**To create a multimodal knowledge base using the AWS CLI**
+ Create a knowledge base with Nova Multimodal Embeddings. Send a [CreateKnowledgeBase](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateKnowledgeBase.html) request:

  ```
  aws bedrock-agent create-knowledge-base \
  --cli-input-json file://kb-nova-mme.json
  ```

  Contents of `kb-nova-mme.json` (replace the placeholder values with your specific configuration):

  ```
  {
      "knowledgeBaseConfiguration": {
          "vectorKnowledgeBaseConfiguration": {
              "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-2-multimodal-embeddings-v1:0",
              "supplementalDataStorageConfiguration": {
                  "storageLocations": [
                      {
                          "type": "S3",
                          "s3Location": {
                              "uri": "s3://<multimodal-storage-bucket>/"
                          }
                      }
                  ]
              }
          },
          "type": "VECTOR"
      },
      "storageConfiguration": {
          "opensearchServerlessConfiguration": {
              "collectionArn": "arn:aws:aoss:us-east-1:<account-id>:collection/<collection-id>",
              "vectorIndexName": "<index-name>",
              "fieldMapping": {
                  "vectorField": "<vector-field>",
                  "textField": "<text-field>",
                  "metadataField": "<metadata-field>"
              }
          },
          "type": "OPENSEARCH_SERVERLESS"
      },
      "name": "<knowledge-base-name>",
      "roleArn": "arn:aws:iam::<account-id>:role/<kb-service-role>",
      "description": "Multimodal knowledge base with Nova Multimodal Embeddings"
  }
  ```

  Replace the following placeholders:
  + `<multimodal-storage-bucket>` - S3 bucket for storing multimodal files
  + `<account-id>` - Your AWS account ID
  + `<kb-service-role>` - IAM service role with permissions for multimodal processing
  + `<collection-id>` - OpenSearch Serverless collection ID
  + `<index-name>` - Vector index name in your OpenSearch collection (configured with appropriate dimensions for your chosen embedding model)
  + `<vector-field>` - Field name for storing embeddings
  + `<text-field>` - Field name for storing text content
  + `<metadata-field>` - Field name for storing metadata
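  For programmatic use, a similar request can be assembled for boto3's `bedrock-agent` client. The sketch below builds the request body as a plain dict; the ARNs, bucket name, and field mapping values are illustrative placeholders, and `roleArn` is included because the CreateKnowledgeBase API requires a service role.

  ```python
  # Sketch: build a CreateKnowledgeBase request body equivalent to the
  # JSON file above. All identifier values here are placeholders.
  def build_kb_request(name, role_arn, storage_uri, collection_arn, index_name):
      return {
          "name": name,
          "roleArn": role_arn,  # required by the CreateKnowledgeBase API
          "knowledgeBaseConfiguration": {
              "type": "VECTOR",
              "vectorKnowledgeBaseConfiguration": {
                  "embeddingModelArn": (
                      "arn:aws:bedrock:us-east-1::foundation-model/"
                      "amazon.nova-2-multimodal-embeddings-v1:0"
                  ),
                  "supplementalDataStorageConfiguration": {
                      "storageLocations": [
                          {"type": "S3", "s3Location": {"uri": storage_uri}}
                      ]
                  },
              },
          },
          "storageConfiguration": {
              "type": "OPENSEARCH_SERVERLESS",
              "opensearchServerlessConfiguration": {
                  "collectionArn": collection_arn,
                  "vectorIndexName": index_name,
                  "fieldMapping": {
                      "vectorField": "vector",
                      "textField": "text",
                      "metadataField": "metadata",
                  },
              },
          },
      }

  request = build_kb_request(
      "my-multimodal-kb",
      "arn:aws:iam::111122223333:role/service-role/my-kb-role",
      "s3://my-multimodal-storage/",
      "arn:aws:aoss:us-east-1:111122223333:collection/abc123",
      "my-index",
  )
  # To create the knowledge base (requires AWS credentials):
  # import boto3
  # boto3.client("bedrock-agent").create_knowledge_base(**request)
  ```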

------

# Adding data sources and starting ingestion
<a name="kb-multimodal-add-data-source-and-ingest"></a>

After creating your knowledge base, add data sources containing your multimodal content and start ingestion jobs to process and index the content.

**Data source deletion behavior**  
When you delete a data source whose deletion policy is set to RETAIN, the ingested content remains in the vector database and continues to be used for retrieval. The content is removed only if you explicitly sync the knowledge base after deleting the data source. Data sources with the default DELETE policy automatically remove their content from the vector database and supplemental storage during deletion. The RETAIN policy ensures that your knowledge base continues to function even if source files are modified or deleted, but be aware that deleted data sources with this policy may still contribute to search results.

## Add data sources
<a name="kb-multimodal-add-data-source"></a>

Add data sources containing your multimodal content to your knowledge base.

**Important**  
For BDA data sources: Only data sources created after the launch of audio/video support will process audio and video files. Existing BDA data sources created before this feature launch will continue to skip audio and video files. To enable audio/video processing for existing knowledge bases, create new data sources.

------
#### [ Console ]

**To add a data source from the console**

1. From your knowledge base details page, choose **Add data source**.

1. Choose **Amazon S3** as your data source type.

1. Provide a name and description for your data source.

1. Configure the Amazon S3 location containing your multimodal files by providing the bucket URI and any inclusion prefixes.

1. Under **Content parsing and chunking**, configure your parsing and chunking methods:
**Note**  
Text embedding models limit retrieval to text-only content, but you can enable multimodal retrieval via text by selecting either Amazon Bedrock Data Automation (for audio, video, and images) or Foundation Model as parsers (for images).

   Choose from three parsing strategies:
   + **Bedrock default parser:** Recommended for text-only parsing. This parser ignores multimodal content and is commonly used with multimodal embedding models.
   + **Bedrock Data Automation as parser:** Enables parsing and storing multimodal content as text, supporting PDFs, images, audio, and video files.
   + **Foundation model as parser:** Provides advanced parsing for images and structured documents, supporting PDFs, images, tables, and visually rich documents.

1. Choose **Add data source** to create the data source.

------
#### [ CLI ]

**To add a data source using the AWS CLI**
+ Create a data source for your multimodal content. Send a [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) request:

  ```
  aws bedrock-agent create-data-source \
  --knowledge-base-id <knowledge-base-id> \
  --cli-input-json file://ds-multimodal.json
  ```

  For Nova Multimodal Embeddings (no special parsing configuration needed), use this `ds-multimodal.json` content:

  ```
  {
      "dataSourceConfiguration": {
          "type": "S3",
          "s3Configuration": {
              "bucketArn": "arn:aws:s3:::<data-source-bucket>",
              "inclusionPrefixes": ["<folder-path>"]
          }
      },
      "name": "multimodal_data_source",
      "description": "Data source with multimodal content",
      "dataDeletionPolicy": "RETAIN"
  }
  ```

  For BDA parsing approach, use this configuration:

  ```
  {
      "dataSourceConfiguration": {
          "type": "S3",
          "s3Configuration": {
              "bucketArn": "arn:aws:s3:::<data-source-bucket>",
              "inclusionPrefixes": ["<folder-path>"]
          }
      },
      "name": "multimodal_data_source_bda",
      "description": "Data source with BDA multimodal parsing",
      "dataDeletionPolicy": "RETAIN",
      "vectorIngestionConfiguration": {
          "parsingConfiguration": {
              "bedrockDataAutomationConfiguration": {
                  "parsingModality": "MULTIMODAL"
              }
          }
      }
  }
  ```

------

## Start an ingestion job
<a name="kb-multimodal-start-ingestion"></a>

After adding your data sources, start an ingestion job to process and index your multimodal content.

------
#### [ Console ]

**To start ingestion from the console**

1. From your data source details page, choose **Sync**.

1. Monitor the sync status on the data source page. Ingestion may take several minutes depending on the size and number of your multimodal files.

1. Once sync completes successfully, your multimodal content is ready for querying.

------
#### [ CLI ]

**To start ingestion using the AWS CLI**

1. Start an ingestion job. Send a [StartIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_StartIngestionJob.html) request:

   ```
   aws bedrock-agent start-ingestion-job \
   --knowledge-base-id <knowledge-base-id> \
   --data-source-id <data-source-id>
   ```

   Replace the placeholders with:
   + `<knowledge-base-id>` - ID from knowledge base creation
   + `<data-source-id>` - ID from data source creation

1. Monitor the ingestion job status using [GetIngestionJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_GetIngestionJob.html).
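
   A simple polling loop over GetIngestionJob might look like the sketch below. The function takes any client exposing a compatible `get_ingestion_job` method, so in practice you would pass `boto3.client("bedrock-agent")`; the stub shown afterward only illustrates the loop.

   ```python
   import time

   # Sketch: poll an ingestion job until it reaches a terminal state.
   # `client` is expected to be boto3.client("bedrock-agent"); any object
   # with the same get_ingestion_job method works for local testing.
   def wait_for_ingestion(client, knowledge_base_id, data_source_id,
                          ingestion_job_id, poll_seconds=30):
       while True:
           job = client.get_ingestion_job(
               knowledgeBaseId=knowledge_base_id,
               dataSourceId=data_source_id,
               ingestionJobId=ingestion_job_id,
           )["ingestionJob"]
           if job["status"] in ("COMPLETE", "FAILED", "STOPPED"):
               return job["status"]
           time.sleep(poll_seconds)

   # Illustration with a stub client (no AWS calls are made):
   class StubClient:
       def __init__(self, statuses):
           self._statuses = iter(statuses)
       def get_ingestion_job(self, **kwargs):
           return {"ingestionJob": {"status": next(self._statuses)}}

   status = wait_for_ingestion(StubClient(["IN_PROGRESS", "COMPLETE"]),
                               "kb-id", "ds-id", "job-id", poll_seconds=0)
   print(status)  # COMPLETE
   ```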

------

## Resyncing after data source deletion
<a name="kb-multimodal-resync-after-deletion"></a>

If you delete a data source and want to remove its content from the knowledge base, you must explicitly resync the knowledge base:

**To remove deleted data source content**

1. Delete the data source using the console or the [DeleteDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_DeleteDataSource.html) API.

1. Start a new ingestion job on any remaining data sources to update the vector database and remove content from the deleted data source.

1. Verify that queries no longer return results from the deleted data source.

**Note**  
Without resyncing, content from deleted data sources will continue to appear in search results even though the data source no longer exists.

# Testing and querying multimodal knowledge bases
<a name="kb-multimodal-test-and-query"></a>

After ingesting your multimodal content, you can test and query your knowledge base using the console or API. The available query types depend on your chosen processing approach.

------
#### [ Console ]

**To test your knowledge base from the console**

1. From your knowledge base details page, scroll to the **Test knowledge base** section.

1. Choose your query type:
   + **Standard retrieval only:** Query and retrieve information from data sources in a single knowledge base.
   + **Retrieval and response generation:** Query a single knowledge base and generate responses based on the retrieved results using a foundation model.
**Note**  
If you have multimodal content, you must use the BDA parser for retrieval and response generation.

1. Configure additional options as needed:
   + **Source chunks:** Specify the maximum number of source chunks to return
   + **Search Type:** Select search type to customize querying strategy
   + **Metadata filters:** Apply filters to narrow search results
   + **Guardrails:** Select an existing guardrail or create a new one

1. Enter a text query or upload an image (Nova Multimodal Embeddings only) to search your multimodal content. Use the attachment button to upload images for visual similarity search.

1. Review the results, which include:
   + Retrieved content chunks with relevance scores
   + Source file references and timestamps (for audio/video)
   + Metadata including file types and processing information
   + For multimedia content, playback controls with automatic segment positioning based on retrieved timestamps

------
#### [ API ]

The following examples show how to use the Amazon Bedrock Agent Runtime API to query your multimodal knowledge base programmatically:

**Text query example**  
Search using text input:

```
aws bedrock-agent-runtime retrieve \
--knowledge-base-id <knowledge-base-id> \
--retrieval-query text="robot automation in manufacturing"
```
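
You can also cap the number of returned chunks (the console's **Source chunks** option) by adding a retrieval configuration. This sketch reuses the same query; the result count of 5 is illustrative:

```
aws bedrock-agent-runtime retrieve \
--knowledge-base-id <knowledge-base-id> \
--retrieval-query text="robot automation in manufacturing" \
--retrieval-configuration '{"vectorSearchConfiguration": {"numberOfResults": 5}}'
```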

**Image query example (Nova Multimodal Embeddings only)**  
Search using an uploaded image:

```
{
    "knowledgeBaseId": "<knowledge-base-id>",
    "retrievalQuery": {
        "imageQuery": {
            "inlineContent": {
                "mimeType": "image/jpeg",
                "data": "<base64-encoded-image>"
            }
        }
    }
}
```
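
One way to submit this request from the command line is to base64-encode the image and pass the JSON body with the AWS CLI's `--cli-input-json` option. The image file name below is a hypothetical example:

```
# Base64-encode the query image (GNU coreutils; on macOS use `base64 -i`)
IMAGE_DATA=$(base64 -w 0 product-photo.jpg)

cat > image-query.json <<EOF
{
    "knowledgeBaseId": "<knowledge-base-id>",
    "retrievalQuery": {
        "imageQuery": {
            "inlineContent": {
                "mimeType": "image/jpeg",
                "data": "${IMAGE_DATA}"
            }
        }
    }
}
EOF

aws bedrock-agent-runtime retrieve --cli-input-json file://image-query.json
```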

------

## Supported query types
<a name="kb-multimodal-query-types"></a>

**Text queries**  
Supported with both Nova Multimodal Embeddings and BDA approaches. Search using natural language text to find relevant content across all media types.

**Image queries**  
Only supported with Nova Multimodal Embeddings. Upload images to find visually similar content in your knowledge base.

## Understanding response metadata
<a name="kb-multimodal-response-metadata"></a>

Multimodal query responses include additional metadata for multimedia content:

**Source attribution**  
Original file location (`sourceUri`) and multimodal storage location (`supplementalUri`) for reliable access

**Temporal metadata**  
Start and end timestamps for audio and video segments, enabling precise navigation to relevant content

**Content type information**  
File format, processing method, and modality indicators to help applications handle different content types appropriately
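
Putting these together, an abbreviated `Retrieve` response for a video match might look like the following. The values are illustrative, and the exact set of metadata fields depends on your processing approach and configuration:

```
{
    "retrievalResults": [
        {
            "content": { "text": "" },
            "location": {
                "type": "S3",
                "s3Location": { "uri": "s3://source-bucket/path/to/file.mp4" }
            },
            "score": 0.82,
            "metadata": {
                "x-amz-bedrock-kb-chunk-start-time-in-millis": 95000,
                "x-amz-bedrock-kb-chunk-end-time-in-millis": 132500,
                "x-amz-bedrock-kb-source-file-modality": "VIDEO"
            }
        }
    ]
}
```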

**Vector database metadata structure**  
When multimodal content is processed and stored, the following metadata structure is used in the vector database:
+ **text field:** For multimedia files processed with Nova Multimodal Embeddings, this field contains an empty string since the content is embedded as native multimedia rather than text
+ **metadata field:** Contains structured information including source details and related content references:

  ```
  {
    "source": {
      "sourceType": "S3",
      "s3Location": {
        "uri": "s3://source-bucket/path/to/file.mp4"
      }
    },
    "relatedContent": [{
      "type": "S3",
      "s3Location": {
        "uri": "s3://multimodal-storage-bucket/processed/file.mp4"
      }
    }]
  }
  ```
+ **Auto-created fields:** Additional fields for filtering and identification:
  + `x-amz-bedrock-kb-source-uri`: Original source URI for filtering operations
  + `x-amz-bedrock-kb-data-source-id`: Data source identifier for tracking content origin
  + `x-amz-bedrock-kb-chunk-start-time-in-millis`: Start timestamp in milliseconds for audio and video segments
  + `x-amz-bedrock-kb-chunk-end-time-in-millis`: End timestamp in milliseconds for audio and video segments
  + `x-amz-bedrock-kb-source-file-mime-type`: MIME type of the source file
  + `x-amz-bedrock-kb-source-file-modality`: Modality of the source file (TEXT, IMAGE, AUDIO, VIDEO)
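
The auto-created fields can also serve as metadata filters at query time. For example, the following sketch restricts retrieval to audio content; the query text is illustrative:

```
aws bedrock-agent-runtime retrieve \
--knowledge-base-id <knowledge-base-id> \
--retrieval-query text="quarterly results discussion" \
--retrieval-configuration '{"vectorSearchConfiguration": {"filter": {"equals": {"key": "x-amz-bedrock-kb-source-file-modality", "value": "AUDIO"}}}}'
```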

**Important**  
Applications must use the provided timestamps to extract and play specific segments from audio and video files. The knowledge base returns references to complete files, not pre-segmented clips.
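
As a minimal sketch of that extraction step, the following converts the millisecond timestamps from retrieval metadata into the second-based `-ss`/`-to` flags that a tool such as ffmpeg expects. The timestamp values and file names are hypothetical, and the final command is echoed rather than executed:

```
# Hypothetical timestamps (milliseconds) from retrieval metadata
START_MS=95000
END_MS=132500

# ffmpeg takes seconds, so convert from milliseconds
START_S=$(awk "BEGIN {printf \"%.3f\", ${START_MS}/1000}")
END_S=$(awk "BEGIN {printf \"%.3f\", ${END_MS}/1000}")

# The knowledge base returns the full file; clip the segment locally
echo "ffmpeg -ss ${START_S} -to ${END_S} -i source.mp4 -c copy segment.mp4"
```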

# Troubleshooting multimodal knowledge bases
<a name="kb-multimodal-troubleshooting"></a>

This section provides guidance for resolving common issues encountered when working with multimodal knowledge bases. The troubleshooting information is organized by general limitations, common error scenarios with their causes and solutions, and performance optimization recommendations. Use this information to diagnose and resolve issues during setup, ingestion, or querying of your multimodal content.

## General limitations
<a name="kb-multimodal-general-limitations"></a>

Be aware of these current limitations when working with multimodal knowledge bases:
+ **File size limits:** Maximum 1.5 GB per video file, 1 GB per audio file (Nova Multimodal Embeddings), or 1.5 GB per file (BDA)
+ **Files per ingestion job:** Maximum 15,000 files per job (Nova Multimodal Embeddings) or 1,000 files per job (BDA)
+ **Query limits:** Maximum of one image per query
+ **Data source restrictions:** Only Amazon S3 and custom data sources support multimodal content
+ **BDA chunking limitations:** When using Bedrock Data Automation with fixed size chunking, overlap percentage settings are not applied to audio and video content
+ **BDA concurrent job limits:** Default limit of 20 concurrent BDA jobs. For large-scale processing, consider requesting a service quota increase
+ **Reranker model limitations:** Reranker models are not supported for multimodal content
+ **Summarization limitations:** Summarization of retrieval responses containing non-text content is not supported
+ **Query input limitations:** Input containing both text and image is not currently supported. You can use either text or image queries, but not both simultaneously.
+ **Guardrail image content filters:** When using image queries with a guardrail that has image content filters configured, the input image will be evaluated against the guardrail and may be blocked if it violates the configured filter thresholds
+ **Input and type mismatch:** Query input is assumed to be text when no type is specified. When querying with modalities other than text, you must specify the correct type

## Common errors and solutions
<a name="kb-multimodal-common-errors"></a>

If you encounter issues with your multimodal knowledge base, review these common scenarios:

**4xx error when using image queries**  
**Cause:** Attempting to use image queries with text-only embedding models or BDA-processed knowledge bases.  
**Solution:** Choose Amazon Nova Multimodal Embeddings when creating your knowledge base for image query support.

**RAG returns 4xx error with multimodal content**  
**Cause:** Using `RetrieveAndGenerate` with a knowledge base that contains only multimodal content and uses the Amazon Nova Multimodal Embeddings model.  
**Solution:** Use the BDA parser for RAG functionality, or ensure that your knowledge base contains text content.

**Multimodal storage destination required error**  
**Cause:** Using Nova Multimodal Embeddings without configuring a multimodal storage destination.  
**Solution:** Specify a multimodal storage destination when using Nova Multimodal Embeddings.

**Data source and multimodal storage use same S3 bucket**  
**Cause:** Configuring your data source and multimodal storage destination to use the same Amazon S3 bucket without proper inclusion prefixes.  
**Solution:** Either use separate buckets for data source and multimodal storage, or configure inclusion prefixes to prevent re-ingesting extracted media files.

**Inclusion prefix cannot start with "aws/"**  
**Cause:** Using an inclusion prefix that starts with "aws/" when your data source and multimodal storage destination share the same Amazon S3 bucket.  
**Solution:** Specify a different inclusion prefix. The "aws/" path is reserved for extracted media storage and cannot be used as an inclusion prefix to avoid re-ingesting processed content.

**BDA ingestion skips multimodal content**  
**Cause:** Knowledge base was created without a multimodal storage destination, then BDA data source was added with multimodal content.  
**Solution:** Re-create the knowledge base with a multimodal storage destination configured to enable BDA processing of audio, video, and image files.

**Knowledge base created without multimodal embedding model**  
**Cause:** Knowledge base was created with a text-only embedding model, limiting multimodal capabilities.  
**Solution:** Create a new knowledge base with Nova Multimodal Embeddings to enable native multimodal processing and image-based queries.

## Managing transient data with Amazon S3 lifecycle policies
<a name="kb-multimodal-lifecycle-policy"></a>

When using Nova Multimodal Embeddings, Amazon Bedrock stores transient data in your multimodal storage destination and attempts to delete it after processing completes. We recommend applying a lifecycle policy to the transient data path to ensure that any objects not deleted automatically still expire.

------
#### [ Console ]

**To create a lifecycle rule using the console**

1. Open the [Amazon S3 console](https://console.aws.amazon.com/s3).

1. Navigate to the multimodal storage destination you've configured for your knowledge base.

1. Choose the **Management** tab and select **Create lifecycle rule**.

1. For **Lifecycle rule name**, enter **Transient Data Deletion**.

1. Under **Filter type**, choose **Limit the scope of this rule using one or more filters**.

1. For **Prefix**, enter the transient data path for your knowledge base and data source.

   Replace the placeholder values in the following prefix with your actual identifiers:

   ```
   aws/bedrock/knowledge_bases/knowledge-base-id/data-source-id/transient_data
   ```
**Important**  
Do not apply lifecycle policies to the entire bucket or to the "aws/" prefix, as this will delete your multimodal content and cause retrieval failures. Only use the specific transient data path shown above.

1. Under **Lifecycle rule actions**, select **Expire current versions of objects**.

1. For **Days after object creation**, enter **1**.

1. Choose **Create rule**.

------
#### [ AWS CLI ]

**To create a lifecycle rule using the AWS CLI**

1. Create a JSON file named `lifecycle-policy.json` with the following content.

   Replace the placeholder values with your actual identifiers:
   + *knowledge-base-id* - Your knowledge base identifier
   + *data-source-id* - Your data source identifier

   ```
   {
       "Rules": [
           {
               "ID": "TransientDataDeletion",
               "Status": "Enabled",
               "Filter": {
                   "Prefix": "aws/bedrock/knowledge_bases/knowledge-base-id/data-source-id/transient_data"
               },
               "Expiration": {
                   "Days": 1
               }
           }
       ]
   }
   ```

1. Apply the lifecycle policy to your bucket. Replace *your-multimodal-storage-bucket* with your actual bucket name:

   ```
   aws s3api put-bucket-lifecycle-configuration \
       --bucket your-multimodal-storage-bucket \
       --lifecycle-configuration file://lifecycle-policy.json
   ```

1. Verify the lifecycle policy was applied:

   ```
   aws s3api get-bucket-lifecycle-configuration \
       --bucket your-multimodal-storage-bucket
   ```

------

For more information about Amazon S3 lifecycle policies, see [Managing the lifecycle of objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) in the *Amazon S3 User Guide*.

## Performance considerations
<a name="kb-multimodal-performance-considerations"></a>

For optimal performance with your multimodal knowledge base, consider these factors:
+ **Processing time:** BDA processing takes longer due to content conversion
+ **Query latency:** Image queries may have higher latency than text queries
+ **Chunking duration:** Longer audio/video chunk durations increase processing time but may improve accuracy