

# Image understanding
<a name="modalities-image"></a>

**Note**  
This documentation is for Amazon Nova Version 1. For the Amazon Nova 2 image understanding guide, visit [Image understanding](https://docs.aws.amazon.com/nova/latest/nova2-userguide/using-multimodal-models.html#image-understanding).

Amazon Nova models allow you to include multiple images in the payload, with a total payload limit of 25 MB. Alternatively, you can specify an Amazon S3 URI that contains your images for image understanding. This approach allows you to leverage the model for larger and more numerous images without being constrained by the 25 MB payload limitation. Amazon Nova models can analyze the passed images and answer questions, classify images, and summarize images based on your provided instructions.

## Image size information
<a name="modalities-image-resolution"></a>

To provide the best possible results, Amazon Nova automatically rescales input images up or down depending on their aspect ratio and original resolution. For each image, Amazon Nova first identifies the closest aspect ratio from 1:1, 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 2:3, 2:4, and their transposes. The image is then rescaled so that at least one side is greater than 896 px or the length of the shorter side of the original image, while maintaining the closest aspect ratio. The maximum supported resolution is 8,000 x 8,000 pixels.
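The aspect-ratio selection step can be sketched as follows. This is an illustrative approximation of the rule described above, not the actual Amazon Nova preprocessing code:

```python
# Supported aspect ratios: 1:1 through 1:9, 2:3, 2:4, and their transposes.
RATIOS = [(1, n) for n in range(1, 10)] + [(2, 3), (2, 4)]
RATIOS += [(b, a) for a, b in RATIOS if a != b]  # add transposes

def closest_aspect_ratio(width, height):
    """Return the supported (w, h) ratio closest to the image's own ratio."""
    target = width / height
    return min(RATIOS, key=lambda r: abs(r[0] / r[1] - target))

# A 900 x 450 image maps to the 2:1 ratio; a square image maps to 1:1.
print(closest_aspect_ratio(900, 450))  # (2, 1)
print(closest_aspect_ratio(800, 800))  # (1, 1)
```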

## Bounding box detection
<a name="modalities-image-bounding"></a>

The Amazon Nova Lite and Amazon Nova Pro models are trained to precisely detect bounding boxes within images. This capability is valuable when the objective is to obtain the coordinates of a specific object of interest. The bounding box detection functionality of the Amazon Nova models makes them suitable candidates for image grounding tasks, enabling enhanced understanding of screenshots. The Amazon Nova models output bounding boxes on a scale of [0, 1000), and after these coordinates are obtained, they can be resized based on the image dimensions as a post-processing step.
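Because the coordinates come back on a [0, 1000) scale, a small post-processing helper (a hypothetical sketch, not part of any AWS SDK) can map them to pixel coordinates for the original image:

```python
def to_pixel_bbox(bbox, image_width, image_height):
    """Rescale an (x1, y1, x2, y2) box from the model's [0, 1000) scale
    to pixel coordinates of the original image."""
    x1, y1, x2, y2 = bbox
    return (
        round(x1 / 1000 * image_width),
        round(y1 / 1000 * image_height),
        round(x2 / 1000 * image_width),
        round(y2 / 1000 * image_height),
    )

# For a 1920 x 1080 image, a model box of (100, 250, 500, 750) becomes:
print(to_pixel_bbox((100, 250, 500, 750), 1920, 1080))  # (192, 270, 960, 810)
```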

## Image to tokens conversion
<a name="modalities-image-tokens"></a>

As previously discussed, images are resized to maximize information extraction while still maintaining the aspect ratio. The following table shows sample image dimensions and their approximate token counts.


| Image resolution (HxW or WxH) | 900 x 450 | 900 x 900 | 1400 x 900 | 1.8K x 900 | 1.3K x 1.3K |
| --- | --- | --- | --- | --- | --- |
| Estimated token count | ~800 | ~1,300 | ~1,800 | ~2,400 | ~2,600 |

For example, consider an image that is 800x400 pixels. To maintain its 1:2 aspect ratio, the closest sample resolution is 900x450, so the approximate token count for this image is about 800 tokens.
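As a rough illustration, you could ballpark a token count by looking up the nearest sample resolution from the table above. The exact tokenization is internal to Amazon Nova, so treat this sketch as an approximation only:

```python
# Sample (width, height) -> approximate token count, taken from the table above.
SAMPLE_TOKENS = {
    (900, 450): 800,
    (900, 900): 1300,
    (1400, 900): 1800,
    (1800, 900): 2400,
    (1300, 1300): 2600,
}

def estimate_image_tokens(width, height):
    """Return the token count of the sample whose pixel count is closest."""
    pixels = width * height
    _, tokens = min(
        SAMPLE_TOKENS.items(),
        key=lambda item: abs(item[0][0] * item[0][1] - pixels),
    )
    return tokens

# An 800 x 400 image is closest to the 900 x 450 sample:
print(estimate_image_tokens(800, 400))  # 800
```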

# Image understanding limitations
<a name="modalities-image-limitations"></a>

Understand the following limitations for Amazon Nova:
+ **Multilingual Image Understanding:** The models have limited understanding of multilingual images and video frames and can struggle or hallucinate on similar tasks.
+ **People identification**: The Amazon Nova models do not support the capability to identify or name individuals in images, documents or videos. The models will refuse to perform such tasks.
+ **Spatial reasoning**: The Amazon Nova models have limited spatial reasoning capabilities. They may struggle with tasks that require precise localization or layout analysis.
+ **Small Text in Images/Videos**: If the text in the image or video is too small, consider increasing the relative size of the text by cropping the image to the relevant section while preserving necessary context.
+ **Counting**: The Amazon Nova models can provide approximate counts of objects in an image, but may not always be precisely accurate, especially when dealing with large numbers of small objects.
+ **Inappropriate content**: The Amazon Nova models will not process inappropriate or explicit images that violate the Acceptable Use Policy.
+ **Healthcare applications**: Although Amazon Nova models can provide general analysis of healthcare images or videos, due to the sensitive nature of these artifacts we do not recommend using them to interpret complex diagnostic scans. Amazon Nova responses should never be considered a substitute for professional medical advice.

# Image understanding examples
<a name="modalities-image-examples"></a>

The following example shows how to send an image prompt to an Amazon Nova model with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
import base64
import boto3
import json
# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
)

MODEL_ID = "us.amazon.nova-lite-v1:0"
# Open the image you'd like to use and encode it as a Base64 string.
with open("media/sunset.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")
# Define your system prompt(s).
system_list = [
    {
        "text": "You are an expert artist. When the user provides you with an image, provide 3 potential art titles"
    }
]
# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": base64_string  # Base64-encoded string (InvokeModel); the Converse API takes a binary array instead
                    },
                }
            },
            {
                "text": "Provide art titles for this image."
            }
        ],
    }
]
# Configure the inference parameters.
inf_params = {"maxTokens": 300, "topP": 0.1, "topK": 20, "temperature": 0.3}

native_request = {
    "schemaVersion": "messages-v1",
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}
# Invoke the model and extract the response body.
response = client.invoke_model(modelId=MODEL_ID, body=json.dumps(native_request))
model_response = json.loads(response["body"].read())
# Pretty print the response JSON.
print("[Full Response]")
print(json.dumps(model_response, indent=2))
# Print the text content for easy readability.
content_text = model_response["output"]["message"]["content"][0]["text"]
print("\n[Response Content Text]")
print(content_text)
```

To pass large image files, or multiple images whose overall payload exceeds 25 MB, you can use Amazon S3. The following example demonstrates how to pass images stored in Amazon S3 to Amazon Nova:

```
import boto3
import json
# Create a Bedrock Runtime client
client = boto3.client("bedrock-runtime", 
                      region_name="us-east-1", 
                     )
PRO_MODEL_ID = "us.amazon.nova-pro-v1:0"
LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"
MICRO_MODEL_ID = "us.amazon.nova-micro-v1:0"
PREMIER_MODEL_ID = "us.amazon.nova-premier-v1:0"
messages = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "s3Location": {
                            # Replace with your S3 object URI and the bucket owner's AWS account ID
                            "uri": "s3://demo-bucket/cat.png",
                            "bucketOwner": "123456789012"
                        }
                    },
                }
            },
            {"text": "Describe the following image"},
        ],
    }
]
inf_params = {"maxTokens": 300, "topP": 0.1, "temperature": 0.3}
model_response = client.converse(
    modelId=LITE_MODEL_ID, messages=messages, inferenceConfig=inf_params
)
print("\n[Full Response]")
print(json.dumps(model_response, indent=2))
print("\n[Response Content Text]")
print(model_response["output"]["message"]["content"][0]["text"])
```