

# Multimodal support for Amazon Nova
<a name="modalities"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 multimodal documentation, visit [Multimodal understanding](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-multimodal-models.html).

Amazon Nova understanding models are multimodal understanding models, which means they support multimodal inputs such as images, videos, and documents and can reason over and answer questions based on the content provided. The Amazon Nova models are equipped with vision capabilities that enable them to comprehend and analyze images, documents, and videos, supporting multimodal understanding use cases.

The following sections outline guidelines for working with images, documents, and videos in Amazon Nova, including the preprocessing strategies employed, code examples, and relevant limitations to consider.

**Topics**
+ [Supported content type by modality](#modalities-content)
+ [Image understanding](modalities-image.md)
+ [Video understanding](modalities-video.md)
+ [Document understanding](modalities-document.md)
+ [Error handling](text-error-handing.md)

## Supported content type by modality
<a name="modalities-content"></a>

The following table details the file formats supported for each media file type and the accepted input methods.


| Media file type | File formats supported | Input method | Parsing strategy | 
| --- |--- |--- |--- |
| Image | PNG, JPG, JPEG, GIF, WebP | Base64, Amazon S3 URI | Image vision understanding | 
| Text document *(Converse API only)* | CSV, XLS, XLSX, HTML, TXT, MD, DOC | Bytes, Amazon S3 URI | Textual understanding from the document only | 
| Media document *(Converse API only)* | PDF, DOCX | Bytes, Amazon S3 URI | Text with interleaved image understanding | 
| Video | MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP | Base64, Amazon S3 URI | Video vision understanding | 

**Note**  
You can include up to five files from your computer or 1000 files from Amazon S3. Each file must be no more than 1 GB when uploaded from Amazon S3. The total size of the uploaded files cannot exceed 25 MB when uploading from your computer or 2 GB when uploading from Amazon S3.

Because 25 MB is the overall payload limit, ensure that you account for base64 encoding overhead, which inflates binary data by roughly one third. Also remember that libraries and frameworks often maintain conversation history, so previously passed media content can quickly add up. When using video, specifying an `s3Location` avoids most of these payload-size issues.
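As a rough rule, base64 encoding expands `n` bytes of binary data to `4 * ceil(n / 3)` characters. The following sketch shows how you might check whether a media file will fit under the 25 MB payload limit; the helper names are illustrative, and real requests also carry JSON structure and prompt text:

```
import math

PAYLOAD_LIMIT_BYTES = 25 * 1024 * 1024  # 25 MB overall request limit

def base64_encoded_size(raw_size_bytes: int) -> int:
    """Size of the base64 string produced from raw_size_bytes of binary data."""
    return 4 * math.ceil(raw_size_bytes / 3)

def fits_in_payload(raw_size_bytes: int) -> bool:
    # Leave headroom for the JSON envelope, prompts, and other fields.
    return base64_encoded_size(raw_size_bytes) < PAYLOAD_LIMIT_BYTES
```

For example, an 18 MB video encodes to roughly 24 MB of base64 text and barely fits, while a 20 MB video does not.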

**Note**  
Large videos and documents take time to process, regardless of input method. If the boto3 SDK times out while waiting for a response from Amazon Bedrock, ensure that you have an appropriate [read_timeout](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html) value set and have upgraded boto3 to at least version 1.38.

# Image understanding
<a name="modalities-image"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 image understanding guide, visit [Image understanding](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-multimodal-models.html#image-understanding).

Amazon Nova models allow you to include multiple images in the payload with a total payload limit of 25 MB. However, you can specify an Amazon S3 URI that contains your images for image understanding. This approach allows you to leverage the model for larger images and more images without being constrained by the 25 MB payload limitation. Amazon Nova models can analyze the passed images and answer questions, classify images, and summarize images based on your provided instructions.

## Image size information
<a name="modalities-image-resolution"></a>

To provide the best possible results, Amazon Nova automatically rescales input images up or down depending on their aspect ratio and original resolution. For each image, Amazon Nova first identifies the closest aspect ratio from 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 2:3, 2:4 and their transposes. Then the image is rescaled, while maintaining the closest aspect ratio, so that the shorter side is at least 896 px or the length of the shorter side of the original image, whichever is smaller. The maximum supported resolution is 8,000x8,000 pixels.
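The aspect-ratio matching step described above can be sketched as follows. This is an illustrative approximation of behavior that is internal to the service, not an exact reimplementation:

```
# Supported aspect ratios (width:height); transposes cover landscape variants.
_BASE_RATIOS = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
                (1, 7), (1, 8), (1, 9), (2, 3), (2, 4)]
SUPPORTED_RATIOS = _BASE_RATIOS + [(h, w) for (w, h) in _BASE_RATIOS if w != h]

def closest_aspect_ratio(width: int, height: int) -> tuple:
    """Return the supported aspect ratio closest to the image's own ratio."""
    actual = width / height
    return min(SUPPORTED_RATIOS, key=lambda r: abs(actual - r[0] / r[1]))
```

For an 800x400 image this selects 2:1 (the transpose of 1:2), matching the worked example in the token section below.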

## Bounding box detection
<a name="modalities-image-bounding"></a>

The Amazon Nova Lite and Amazon Nova Pro models are trained to precisely detect bounding boxes within images. This capability can be valuable when the objective is to obtain the coordinates of a specific object of interest. The bounding box detection functionality of the Amazon Nova model makes it a suitable candidate for image grounding tasks, thereby enabling enhanced understanding of screen shots. The Amazon Nova model outputs bounding boxes on a scale of [0, 1000), and after these coordinates are obtained, they can be resized based on the image dimensions as a post-processing step.
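For example, assuming the model returns a box as `(x1, y1, x2, y2)` on the [0, 1000) scale, the post-processing step might look like this (the function name and coordinate order are illustrative):

```
def scale_bounding_box(bbox, image_width, image_height):
    """Convert a bounding box on the model's [0, 1000) scale to pixel coordinates.

    bbox is assumed to be (x1, y1, x2, y2) on the model's output scale.
    """
    x1, y1, x2, y2 = bbox
    return (round(x1 * image_width / 1000),
            round(y1 * image_height / 1000),
            round(x2 * image_width / 1000),
            round(y2 * image_height / 1000))
```

Applied to an 800x400 image, a model box of `(0, 0, 500, 1000)` maps to the left half of the image in pixels.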

## Image to tokens conversion
<a name="modalities-image-tokens"></a>

As previously discussed, images are resized to maximize information extraction while maintaining the aspect ratio. The following table shows sample image dimensions and approximate token counts.


| Image resolution (H x W or W x H) | 900 x 450 | 900 x 900 | 1400 x 900 | 1.8K x 900 | 1.3K x 1.3K | 
| --- |--- |--- |--- |--- |--- |
| Estimated token count | ~800 | ~1,300 | ~1,800 | ~2,400 | ~2,600 | 

For example, consider an image that is 800x400 pixels. To maintain its 1:2 aspect ratio, the closest resolution is 900x450, so the approximate token count for this image is about 800 tokens.

# Image understanding limitations
<a name="modalities-image-limitations"></a>

Understand the following limitations for Amazon Nova:
+ **Multilingual Image Understanding:** The models have limited understanding of multilingual images and video frames and can struggle or hallucinate on similar tasks.
+ **People identification**: The Amazon Nova models do not support the capability to identify or name individuals in images, documents or videos. The models will refuse to perform such tasks.
+ **Spatial reasoning**: The Amazon Nova models have limited spatial reasoning capabilities. They may struggle with tasks that require precise localization or layout analysis.
+ **Small Text in Images/Videos**: If the text in the image or video is too small, consider increasing the relative size of the text by cropping the image to the relevant section while preserving necessary context.
+ **Counting**: The Amazon Nova models can provide approximate counts of objects in an image, but may not always be precisely accurate, especially when dealing with large numbers of small objects.
+ **Inappropriate content**: The Amazon Nova models will not process inappropriate or explicit images that violate the Acceptable Use Policy.
+ **Healthcare applications**: Due to the sensitive nature of these artifacts, even though Amazon Nova models can give general analysis on healthcare images or videos, we do not recommend that you interpret complex diagnostic scans. Amazon Nova responses should never be considered a substitute for professional medical advice.

# Image understanding examples
<a name="modalities-image-examples"></a>

The following example shows how to send an image prompt to an Amazon Nova model with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
import base64
import boto3
import json
# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "us.amazon.nova-lite-v1:0"
# Open the image you'd like to use and encode it as a Base64 string.
with open("media/sunset.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")
# Define your system prompt(s).
system_list = [
    {
        "text": "You are an expert artist. When the user provides you with an image, provide 3 potential art titles"
    }
]
# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": base64_string  # Base64-encoded string (Invoke API); use a raw binary array with the Converse API
                    },
                }
            },
            {
                "text": "Provide art titles for this image."
            }
        ],
    }
]
# Configure the inference parameters.
inf_params = {"maxTokens": 300, "topP": 0.1, "topK": 20, "temperature": 0.3}

native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}
# Invoke the model and extract the response body.
response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
```

For passing large image files or multiple image files, where the overall payload would exceed 25 MB, you can use Amazon S3. The following example demonstrates how to reference images stored in Amazon S3 with Amazon Nova:

```
import boto3
import json
import base64
# Create a Bedrock Runtime client
client = boto3.client("bedrock-runtime", 
                      region_name="us-east-1", 
                     )
PRO_MODEL_ID = "us.amazon.nova-pro-v1:0"
LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"
MICRO_MODEL_ID = "us.amazon.nova-micro-v1:0"
PREMIER_MODEL_ID = "us.amazon.nova-premier-v1:0"
messages = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "s3Location": {
                            # Replace with your S3 bucket URI and bucket owner account ID
                            "uri": "s3://demo-bucket/cat.png",
                            "bucketOwner": "123456789012"
                        }
                    },
                }
            },
            {"text": "Describe the following image"},
        ],
    }
]
inf_params = {"maxTokens": 300, "topP": 0.1, "temperature": 0.3}
model_response = client.converse(
    modelId=LITE_MODEL_ID, messages=messages, inferenceConfig=inf_params
)
print("\n[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(model_response["output"]["message"]["content"][0]["text"])
```

# Video understanding
<a name="modalities-video"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 video understanding guide, visit [Video understanding](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-multimodal-models.html#video-understanding).

The Amazon Nova models allow you to include a single video in the payload, which can be provided either in base64 format or through an Amazon S3 URI. When using the base64 method, the overall payload size must remain within 25 MB. However, you can specify an Amazon S3 URI for video understanding. This approach enables you to leverage the model for longer videos (up to 1 GB in size) without being constrained by the overall payload size limitation. Amazon Nova models can analyze the passed video and answer questions, classify a video, and summarize information in the video based on provided instructions.


| Media file type | File formats supported | Input method | 
| --- |--- |--- |
| Video | MP4, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP | Base64 *(recommended for payloads less than 25 MB)* | 
|  |  | Amazon S3 URI *(recommended for payloads greater than 25 MB, up to 2 GB; individual files must be 1 GB or smaller)* | 

There are no differences in the video input token count, regardless of whether the video is passed as base64 (as long as it fits within the size constraints) or via an Amazon S3 location.

Note that for the 3GP file format, the `format` field passed in the API request should be `three_gp`.

When using Amazon S3, ensure that you set the `Content-Type` metadata to the correct MIME type for the video.
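For example, you can set the `Content-Type` metadata at upload time with the boto3 S3 client's `upload_file` method. The sketch below guesses the MIME type from the file extension; the helper names are illustrative:

```
import mimetypes

def video_content_type(filename: str) -> str:
    """Guess the MIME type for a video file, e.g. 'video/mp4' for .mp4."""
    content_type, _ = mimetypes.guess_type(filename)
    return content_type or "application/octet-stream"

def upload_video(s3_client, bucket: str, key: str, filename: str) -> None:
    # ExtraArgs={"ContentType": ...} stores the object's Content-Type metadata.
    s3_client.upload_file(
        Filename=filename,
        Bucket=bucket,
        Key=key,
        ExtraArgs={"ContentType": video_content_type(filename)},
    )
```

You would call `upload_video(boto3.client("s3"), "demo-bucket", "videos/clip.mp4", "clip.mp4")` before referencing the object in a model request.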

## Video size information
<a name="modalities-video-size"></a>

Amazon Nova video understanding capabilities support multi-aspect-ratio input. All videos are resized with distortion (up or down, based on the input) to **672x672 square** dimensions before they are fed to the model. The model utilizes a dynamic sampling strategy based on the length of the video. For Amazon Nova Lite and Amazon Nova Pro, with videos less than or equal to 16 minutes in duration, a 1 frame per second (FPS) sampling rate is employed. For videos exceeding 16 minutes in length, the sampling rate decreases so that a consistent 960 frames are sampled, with the frame sampling rate varying accordingly. This approach provides more accurate scene-level video understanding for shorter videos compared to longer video content. We recommend that you keep the video length less than 1 hour for low-motion content, and less than 16 minutes for anything with higher motion. For Amazon Nova Premier, the 1 FPS sampling rate is applied up to a limit of 3,200 frames.

There should be no difference between analyzing a 4K version of a video and a Full HD version. Similarly, because the sampling rate is at most 1 FPS, a 60 FPS video should perform as well as a 30 FPS video. Because of the 1 GB limit on video size, using higher resolution and FPS than required is not beneficial and limits the video length that fits within that size. You might want to pre-process videos larger than 1 GB.


## Video tokens
<a name="modalities-video-tokens"></a>

The length of the video is the main factor impacting the number of tokens generated. To calculate the approximate cost, multiply the estimated number of video tokens by the per-token price of the specific model being utilized.

The following table provides some approximations of frame sampling and token utilization per video length for Amazon Nova Lite and Amazon Nova Pro:


| Video duration | 10 sec | 30 sec | 16 min | 20 min | 30 min | 45 min | 1 hr | 1.5 hr | 
| --- |--- |--- |--- |--- |--- |--- |--- |--- |
| Frames to sample | 10 | 30 | 960 | 960 | 960 | 960 | 960 | 960 | 
| Sample rate (FPS) | 1 | 1 | 1 | 0.755 | 0.5 | 0.35556 | 0.14 | 0.096 | 
| Estimated token count | 2,880 | 8,640 | 276,480 | 276,480 | 276,480 | 276,480 | 276,480 | 276,480 | 
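The figures above work out to roughly 288 tokens per sampled frame (for example, 2,880 tokens / 10 frames). The following estimator for Amazon Nova Lite and Pro is a sketch based on that inference from the table, not a documented formula:

```
TOKENS_PER_FRAME = 288    # inferred from the table: 2,880 tokens / 10 frames
MAX_SAMPLED_FRAMES = 960  # frame cap for Amazon Nova Lite and Pro

def estimate_video_tokens(duration_seconds: float) -> int:
    """Approximate input token count for a video of the given duration."""
    # 1 FPS sampling up to the frame cap; longer videos are sampled more
    # sparsely, but the total number of sampled frames stays at the cap.
    frames = min(int(duration_seconds), MAX_SAMPLED_FRAMES)
    return frames * TOKENS_PER_FRAME
```

A 30-second clip estimates to 8,640 tokens and anything at or beyond 16 minutes saturates at 276,480 tokens, matching the table.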

The following table provides some approximations of frame sampling and token utilization per video length for Amazon Nova Premier:


| Video duration | 10 sec | 30 sec | 16 min | 20 min | 30 min | 45 min | 1 hr | 1.5 hr | 
| --- |--- |--- |--- |--- |--- |--- |--- |--- |
| Frames to sample | 10 | 30 | 960 | 1200 | 1800 | 2700 |  |  | 
| Sample rate (FPS) | 1 | 1 | 1 | 1 | 1 | 1 |  |  | 
| Estimated token count | 2,880 | 8,640 | 276,480 | 345,600 | 518,400 | 777,600 |  |  | 

The following table provides some approximations of frame sampling and token utilization per video length for Amazon Nova Lite 1.5:


| Video duration | 10 sec | 30 sec | 16 min | 20 min | 30 min | 45 min | 1 hr | 1.5 hr | 
| --- |--- |--- |--- |--- |--- |--- |--- |--- |
| Frames to sample | 10 | 30 | 960 | 1200 | 1800 | 2700 |  |  | 
| Sample rate (FPS) | 1 | 1 | 1 | 1 | 1 | 1 |  |  | 
| Estimated token count | 2,880 | 8,640 | 276,480 | 345,600 | 518,400 | 777,600 |  |  | 

# Video understanding limitations
<a name="prompting-vision-limitations"></a>

The following are key model limitations, where model accuracy and performance might not be guaranteed.
+ **One video per request:** The model currently supports only one video per request. Note that some frameworks and libraries maintain conversation history, so a video added in a previous interaction might still be present in the context.
+ **No audio support:** The models are currently trained to process and understand video content solely based on the visual information in the video. They do not possess the capability to analyze or comprehend any audio components that are present in the video.
+ **Temporal causality:** The model has limited understanding of event causality across the progression of a video. Although it answers point-in-time questions well, it does not perform as well on questions that depend on understanding a sequence of events.
+ **Multilingual image understanding:** The models have limited understanding of multilingual images and video frames. They might struggle or hallucinate on similar tasks.
+ **People identification**: The Amazon Nova models do not support the capability to identify or name individuals in images, documents, or videos. The models will refuse to perform such tasks.
+ **Spatial reasoning**: The Amazon Nova models have limited spatial reasoning capabilities. They may struggle with tasks that require precise localization or layout analysis.
+ **Small text in images or videos**: If the text in the image or video is too small, consider increasing relative size of the text in the image by cropping to the relevant section while preserving necessary content.
+ **Counting**: The Amazon Nova models can provide approximate counts of objects in an image, but might not always be precisely accurate, especially when dealing with large numbers of small objects.
+ **Inappropriate content**: The Amazon Nova models will not process inappropriate or explicit images that violate the Acceptable Use Policy.
+ **Healthcare applications**: Due to the sensitive nature of these artifacts, even though Amazon Nova models can give general analysis on healthcare images or videos, we do not recommend that you interpret complex diagnostic scans. The response of Amazon Nova should never be considered a substitute for professional medical advice.

# Video understanding examples
<a name="modalities-video-examples"></a>

The following example shows how to send a video prompt to an Amazon Nova model with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
import base64
import boto3
import json
# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "us.amazon.nova-lite-v1:0"
# Open the video you'd like to use and encode it as a Base64 string.
with open("media/cooking-quesadilla.mp4", "rb") as video_file:
    binary_data = video_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")
# Define your system prompt(s).
system_list= [
    {
        "text": "You are an expert media analyst. When the user provides you with a video, provide 3 potential video titles"
    }
]
# Define a "user" message including both the video and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "bytes": base64_string  # Base64-encoded string (Invoke API); use a raw binary array with the Converse API
                    },
                }
            },
            {
                "text": "Provide video titles for this clip."
            },
        ],
    }
]
# Configure the inference parameters.
inf_params = {"maxTokens": 300, "topP": 0.1, "topK": 20, "temperature": 0.3}

native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}
# Invoke the model and extract the response body.
response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
```

The following example shows how to send a video using an Amazon S3 location to Amazon Nova with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

```
import base64
import boto3
import json
# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "us.amazon.nova-lite-v1:0"
# Define your system prompt(s).
system_list = [
    {
        "text": "You are an expert media analyst. When the user provides you with a video, provide 3 potential video titles"
    }
]
# Define a "user" message including both the video and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            "uri": "s3://my_bucket/my_video.mp4", 
                            "bucketOwner": "111122223333"
                        }
                    }
                }
            },
            {
                "text": "Provide video titles for this clip."
            }
        ]
    }
]
# Configure the inference parameters.
inf_params = {"maxTokens": 300, "topP": 0.1, "topK": 20, "temperature": 0.3}

native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}
# Invoke the model and extract the response body.
response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
```

# Document understanding
<a name="modalities-document"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 document understanding guide, visit [Document understanding](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-multimodal-models.html#document-understanding).

Amazon Nova's document understanding capability allows you to include entire documents (PDFs, Word files, spreadsheets, and so on) in your prompt and ask questions about their content. The Amazon Nova multimodal understanding models (Lite, Pro, and Premier) can interpret both the text and visual elements (such as charts or tables) within these documents. This enables use cases such as question answering, summarization, and analysis of lengthy reports or scanned documents. Key features include a large context window for long documents and the ability to handle multiple documents in one query.

Amazon Nova distinguishes between two types of document inputs:
+ **Text-based document types** (e.g. TXT, CSV, Markdown, HTML, DOC): These are processed primarily for their textual content. Nova will focus on understanding and extracting information from the text in these documents. 
+ **Media based document types** (e.g. PDF, DOCX): These files may contain complex layouts, images, charts, or embedded graphics. For media-based documents, Nova processes both the visual and textual elements. Nova employs vision-based understanding to interpret visual content—such as charts, tables, diagrams, or screenshots—alongside the document's text.

  JPEG2000 and JBIG2 aren't supported in PDF files in Amazon Nova.

Supported file formats include common document types: plain text and structured text files (CSV, TXT), spreadsheets (XLS/XLSX), HTML/Markdown, Word documents (DOC/DOCX), and PDF files. For images within documents, standard image formats (PNG, JPG, GIF, WebP) are handled, though PDFs containing certain image encodings (CMYK, SVG) are not supported.


**Document Size Limits and Usage Guidelines**  

| Constraint | Limit | 
| --- | --- | 
|  Maximum number of documents  |  Up to 5 documents per request (applies to both direct upload and Amazon S3)  | 
|  Text-based document size  |  Each text document (e.g., .txt, .csv, .md, .html, .doc) must be ≤ 4.5 MB  | 
|  Media-based document size  |  For .pdf and .docx files, there is no individual file size limit, but the overall payload limits apply: 25 MB for direct upload, or 2 GB total through Amazon S3 with each file no more than 1 GB  | 
|  Unsupported PDF content  |  PDFs containing CMYK color profiles or SVG images are not supported  | 

# Using Nova's Document Understanding via API
<a name="modalities-document-examples"></a>

To illustrate how to use Amazon Nova for document question answering or analysis, here's a simplified example in Python. We'll use the Amazon Bedrock API (via the boto3 SDK) to send a PDF document along with a question for the model to answer.

```
import boto3

# Initialize the Bedrock runtime client (adjust the region as needed)
client = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "us.amazon.nova-lite-v1:0"  # using the Nova Lite model in this example

# Read the document file (PDF) in binary mode
with open("my_document.pdf", "rb") as file:
    doc_bytes = file.read()

# Construct the conversation messages with document + question
messages = [
    {
        "role": "user",
        "content": [
            {
                "document": {
                    "format": "pdf",
                    "name": "Document1",  # neutral name for the document
                    "source": {
                        "bytes": doc_bytes  # embedding the PDF content directly
                    }
                }
            },
            {
                "text": "Here is a question about the document: ... (your question) ... ?"
            }
        ]
    }
]

# Set inference parameters (optional)
inf_params = {"maxTokens": 4000, "topP": 0.1, "temperature": 0.3}

# Invoke the model
response = client.converse(modelId=MODEL_ID, messages=messages, inferenceConfig=inf_params)

# Extract and print the answer
answer_text = response["output"]["message"]["content"][0]["text"]
print(answer_text)
```

If your input files are large (exceeding the 25 MB direct upload limit) or you have many files, you can store them in Amazon S3 and reference them. This avoids sending the raw bytes over the request. When using S3, ensure the Bedrock service has permission to access the bucket/object. For example, to reference a PDF in S3, your document source would use "s3Location" instead of "bytes", like so:

```
messages = [
    {
        "role": "user",
        "content": [
            {
                "document": {
                    "format": "pdf",
                    "name": "Report2023",
                    "source": {
                        "s3Location": {
                            "uri": "s3://your-bucket/path/to/document1.pdf",
                            "bucketOwner": "123456789012"
                        }
                    }
                }
            },
            {
                "text": "Summarize the key findings from the Q3 2023 report."
            }
        ]
    }
]
```

**Note**  
Document names can include only alphanumeric characters, hyphens, parentheses, and square brackets.  
The `name` field is vulnerable to prompt injections, because the model might inadvertently interpret it as instructions. Therefore, we recommend that you specify a neutral name.
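A simple client-side check against these naming rules might look like the following sketch (the helper name is illustrative, and the pattern may be stricter than what the service actually accepts):

```
import re

# Allowed: alphanumeric characters, hyphens, parentheses, and square brackets.
_DOC_NAME_PATTERN = re.compile(r"[A-Za-z0-9()\[\]-]+")

def is_valid_document_name(name: str) -> bool:
    """Check a document name against the documented character restrictions."""
    return _DOC_NAME_PATTERN.fullmatch(name) is not None
```

Validating names before building the request surfaces problems earlier than a `ValidationException` from the service would.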

# Error handling
<a name="text-error-handing"></a>

The way errors are communicated back to the client varies depending on the type of error that occurs. In this section, we focus only on the error conditions that are unique to the Amazon Nova model. The three primary types of errors you will want to handle in your application code are **input validation** errors, **Responsible AI (RAI) input deflection** errors, and **RAI output deflection** errors.

**Input validation:** Input validation errors occur when you use an unsupported value for an input parameter. For example, an out-of-bounds value for temperature, or incorrect format of the input `image`. All input validation errors are expressed as a **ValidationException** which contains a message string describing the cause of the problem.

**RAI input deflection** errors occur when any of the input text values or images are determined to violate the AWS Responsible AI policy. These errors are expressed as a **ValidationException** with one of the following messages:
+ **Input text** validation message: "This request has been blocked by our content filters. Please adjust your text prompt to submit a new request."
+ **Input image** validation message: "This request has been blocked by our content filters. Please adjust your input image to submit a new request."
+ **Input Video** validation message: "This request has been blocked by our content filters. Please adjust your input video to submit a new request."

RAI output deflection errors occur when output is generated but is determined to be misaligned with the AWS Responsible AI policy. When this occurs, an exception is not raised. Instead, a successful response is returned, and its structure contains an `error` field, which is a string with one of the following values:
+ **Output text** validation message: "The generated text has been blocked by our content filters."
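Putting this together, one way to structure the handling in application code is sketched below. The string matching on filter messages and the helper names are illustrative; your application may prefer to match more precisely:

```
def classify_validation_error(message: str) -> str:
    """Classify a ValidationException message from Amazon Nova.

    Messages mentioning content filters indicate an RAI input deflection;
    anything else is treated as an ordinary input validation error.
    """
    if "content filters" in message:
        return "rai_input_deflection"
    return "input_validation"

def extract_output_or_error(model_response: dict):
    """Return (text, error) from a successful InvokeModel response body.

    RAI output deflections arrive as a successful response whose body
    carries an 'error' string instead of usable content.
    """
    error = model_response.get("error")
    if error:
        return None, error
    text = model_response["output"]["message"]["content"][0]["text"]
    return text, None
```

In practice you would call `classify_validation_error` inside an exception handler for `ValidationException`, and `extract_output_or_error` on every successful response body before using the generated text.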