

# Inference request parameters and response fields for foundation models
<a name="model-parameters"></a>

The topics in this section describe the request parameters and response fields for the models that Amazon Bedrock supplies. When you make inference calls to models with the model invocation ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)) API operations, you include request parameters depending on the model that you're using.

If you created a [custom model](custom-models.md), use the same inference parameters as the foundation model from which it was customized.

If you are [importing a customized model into Amazon Bedrock](model-customization-import-model.md), use the same inference parameters that are documented for the customized model that you're importing. Inference parameters that don't match the parameters documented for that model are ignored.

Before viewing model parameters for different models, familiarize yourself with model inference by reading [Submit prompts and generate responses with model inference](inference.md).

Refer to the following pages for more information about different models in Amazon Bedrock:
+ For a table of models and their IDs to use with the model invocation API operations, the Regions they're supported in, and the general features that they support, see [Supported foundation models in Amazon Bedrock](models-supported.md).
+ For a table of the Amazon Bedrock Regions that each model is supported in, see [Model support by AWS Region in Amazon Bedrock](models-regions.md).
+ For a table of the Amazon Bedrock features that each model supports, see [Model support by feature in Amazon Bedrock](models-features.md).
+ To check if the Converse API (`Converse` and `ConverseStream`) supports a specific model, see [Supported models and model features](conversation-inference-supported-models-features.md).
+ When you make inference calls to a model, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md).
+ For code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Select a topic to learn about models for that provider and their parameters.

**Topics**
+ [Amazon Nova models](model-parameters-nova.md)
+ [Amazon Titan models](model-parameters-titan.md)
+ [Anthropic Claude models](model-parameters-claude.md)
+ [AI21 Labs models](model-parameters-ai21.md)
+ [Cohere models](model-parameters-cohere.md)
+ [DeepSeek models](model-parameters-deepseek.md)
+ [Luma AI models](model-parameters-luma.md)
+ [Meta Llama models](model-parameters-meta.md)
+ [Mistral AI models](model-parameters-mistral.md)
+ [OpenAI models](model-parameters-openai.md)
+ [Stability AI models](model-parameters-stability-diffusion.md)
+ [TwelveLabs models](model-parameters-twelvelabs.md)
+ [Writer AI Palmyra models](model-parameters-writer-palmyra.md)

# Amazon Nova models
<a name="model-parameters-nova"></a>

Amazon Nova multimodal understanding models are available for inference through the Invoke API ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)) and the Converse API ([Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)). To create conversational applications, see [Carry out a conversation with the Converse API operations](conversation-inference.md). Both API methods (Invoke and Converse) follow a similar request pattern. For more information on the API schema and Python code examples, see [How to Invoke Amazon Nova Understanding Models](https://docs.aws.amazon.com/nova/latest/userguide/invoke.html).

**Important**  
The timeout period for inference calls to Amazon Nova is 60 minutes. By default, AWS SDK clients time out after 1 minute. We recommend that you increase the read timeout period of your AWS SDK client to at least 60 minutes. For example, in the AWS Python botocore SDK, change the value of the `read_timeout` field in [botocore.config](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#) to at least 3600.
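For example, with the AWS SDK for Python, you can raise the read timeout when you construct the runtime client (a minimal sketch of the configuration):

```python
import boto3
from botocore.config import Config

# Raise the read timeout to 60 minutes (3600 seconds) so that long-running
# Amazon Nova inference calls aren't cut off by the SDK's 1-minute default.
config = Config(read_timeout=3600)

bedrock_runtime = boto3.client(service_name="bedrock-runtime", config=config)
```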

The default inference parameters can be found in the [Complete request schema](https://docs.aws.amazon.com/nova/latest/userguide/complete-request-schema.html) section of the Amazon Nova User Guide.

To find the model ID for Amazon Nova models, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check if a feature is supported for Amazon Nova models, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Amazon Nova models support, see [Modality Support](https://docs.aws.amazon.com/nova/latest/userguide/modalities.html). To check which Amazon Bedrock features the Amazon Nova models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check the AWS Regions in which Amazon Nova models are available, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Amazon Nova models, you must include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Amazon Nova specific prompt information, see the [Amazon Nova prompt engineering guide](https://docs.aws.amazon.com/nova/latest/userguide/prompting.html).
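As a sketch of the request shape, the following builds a Converse-style payload for an Amazon Nova model as plain Python dictionaries (the model ID and parameter values are illustrative; check the supported-models table for the IDs available in your Region):

```python
import json

# Illustrative model ID; confirm the exact ID in the supported-models table.
model_id = "us.amazon.nova-lite-v1:0"

# A Converse-style request: a list of messages plus optional inference settings.
messages = [
    {
        "role": "user",
        "content": [{"text": "Summarize the benefits of streaming responses in two sentences."}],
    }
]
inference_config = {"maxTokens": 512, "temperature": 0.5, "topP": 0.9}

# With boto3 this would be sent as:
#   client.converse(modelId=model_id, messages=messages, inferenceConfig=inference_config)
print(json.dumps({"messages": messages, "inferenceConfig": inference_config}, indent=2))
```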

# Amazon Titan models
<a name="model-parameters-titan"></a>

This section describes the request parameters and response fields for Amazon Titan models. Use this information to make inference calls to Amazon Titan models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Amazon Titan models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Amazon Titan model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Amazon Titan models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which Amazon Bedrock features the Amazon Titan models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check the AWS Regions in which Amazon Titan models are available, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Amazon Titan models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). 

**Topics**
+ [Amazon Titan Text models](model-parameters-titan-text.md)
+ [Amazon Titan Image Generator G1 models](model-parameters-titan-image.md)
+ [Amazon Titan Embeddings G1 - Text](model-parameters-titan-embed-text.md)
+ [Amazon Titan Multimodal Embeddings G1](model-parameters-titan-embed-mm.md)

# Amazon Titan Text models
<a name="model-parameters-titan-text"></a>

The Amazon Titan Text models support the following inference parameters.

For more information on Titan Text prompt engineering guidelines, see [Titan Text Prompt Engineering Guidelines](https://d2eo22ngex1n9g.cloudfront.net/Documentation/User+Guides/Titan/Amazon+Titan+Text+Prompt+Engineering+Guidelines.pdf). 

For more information on Titan models, see [Overview of Amazon Titan models](titan-models.md).

**Topics**
+ [Request and response](#model-parameters-titan-request-response)
+ [Code examples](#inference-titan-code)

## Request and response
<a name="model-parameters-titan-request-response"></a>

The request body is passed in the `body` field of an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) request.

------
#### [ Request ]

```
{
    "inputText": string,
    "textGenerationConfig": {
        "temperature": float,  
        "topP": float,
        "maxTokenCount": int,
        "stopSequences": [string]
    }
}
```

The following parameters are required:
+ **inputText** – The prompt to provide the model for generating a response. To generate responses in a conversational style, submit the prompt by using the following format:

  ```
  "inputText": "User: <theUserPrompt>\nBot:"
  ```

  This format indicates to the model that it should respond on a new line after the user has provided a prompt.

The `textGenerationConfig` is optional. You can use it to configure the following [inference parameters](inference-parameters.md):
+ **temperature** – Use a lower value to decrease randomness in responses.
+ **topP** – Use a lower value to ignore less probable options and decrease the diversity of responses.
+ **maxTokenCount** – Specify the maximum number of tokens to generate in the response. Maximum token limits are strictly enforced.
+ **stopSequences** – Specify a character sequence to indicate where the model should stop.
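The parameters above can be assembled into a request body like the following sketch (parameter values are illustrative):

```python
import json

# Assemble the request body described above; values here are illustrative.
body = json.dumps({
    "inputText": "User: Explain what a token is in one sentence.\nBot:",
    "textGenerationConfig": {
        "temperature": 0.5,    # lower = less random
        "topP": 0.9,           # lower = less diverse
        "maxTokenCount": 512,  # hard cap on generated tokens
        "stopSequences": ["User:"]
    }
})

# `body` is what you pass in the body field of an InvokeModel request.
print(body)
```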

------
#### [ InvokeModel Response ]

```
{
    "inputTextTokenCount": int,
    "results": [{
        "tokenCount": int,
        "outputText": "\n<response>\n",
        "completionReason": "string"
    }]
}
```

The response body contains the following fields:
+ **inputTextTokenCount** – The number of tokens in the prompt.
+ **results** – An array of one item, an object containing the following fields:
  + **tokenCount** – The number of tokens in the response.
  + **outputText** – The text in the response.
  + **completionReason** – The reason the response finished being generated. The following reasons are possible:
    + FINISHED – The response was fully generated.
    + LENGTH – The response was truncated because of the response length you set.
    + STOP_CRITERIA_MET – The response was truncated because the stop criteria was reached.
    + RAG_QUERY_WHEN_RAG_DISABLED – The feature is disabled and cannot complete the query.
    + CONTENT_FILTERED – The contents were filtered or removed by the content filter applied.

------
#### [ InvokeModelWithResponseStream Response ]

Each chunk of text in the body of the response stream is in the following format. You must decode the `bytes` field (see [Submit a single prompt with InvokeModel](inference-invoke.md) for an example).

```
{
    "chunk": {
        "bytes": b'{
            "index": int,
            "inputTextTokenCount": int,
            "totalOutputTextTokenCount": int,
            "outputText": "<response-chunk>",
            "completionReason": "string"
        }'
    }
}
```
+ **index** – The index of the chunk in the streaming response.
+ **inputTextTokenCount** – The number of tokens in the prompt.
+ **totalOutputTextTokenCount** – The number of tokens in the response.
+ **outputText** – The text in the response.
+ **completionReason** – The reason the response finished being generated. The following reasons are possible.
  + FINISHED – The response was fully generated.
  + LENGTH – The response was truncated because of the response length you set.
  + STOP_CRITERIA_MET – The response was truncated because the stop criteria was reached.
  + RAG_QUERY_WHEN_RAG_DISABLED – The feature is disabled and cannot complete the query.
  + CONTENT_FILTERED – The contents were filtered or removed by the filter applied.
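With the Python SDK, each streamed event can be decoded as follows (a minimal sketch; the sample event below is hypothetical):

```python
import json

# A hypothetical streamed event in the shape described above.
event = {
    "chunk": {
        "bytes": b'{"index": 0, "inputTextTokenCount": 12, '
                 b'"totalOutputTextTokenCount": 5, '
                 b'"outputText": "Hello", "completionReason": "FINISHED"}'
    }
}

# Decode the bytes field into a Python dict.
payload = json.loads(event["chunk"]["bytes"].decode("utf-8"))
print(payload["outputText"], payload["completionReason"])
```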

------

## Code examples
<a name="inference-titan-code"></a>

The following example shows how to run inference with the Amazon Titan Text Premier model with the Python SDK.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to create a list of action items from a meeting transcript
with the Amazon Titan Text model (on demand).
"""
import json
import logging
import boto3

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Text models"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using Amazon Titan Text models on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (json): The response from the model.
    """

    logger.info(
        "Generating text with Amazon Titan Text model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Text generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated text with Amazon Titan Text model %s", model_id)

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Text model example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        # You can replace model_id with the ID of another Titan Text model:
        # amazon.titan-text-premier-v1:0, amazon.titan-text-express-v1, amazon.titan-text-lite-v1
        model_id = 'amazon.titan-text-premier-v1:0'

        prompt = """Meeting transcript: Miguel: Hi Brant, I want to discuss the workstream  
            for our new product launch Brant: Sure Miguel, is there anything in particular you want
            to discuss? Miguel: Yes, I want to talk about how users enter into the product.
            Brant: Ok, in that case let me add in Namita. Namita: Hey everyone 
            Brant: Hi Namita, Miguel wants to discuss how users enter into the product.
            Miguel: its too complicated and we should remove friction.  
            for example, why do I need to fill out additional forms?  
            I also find it difficult to find where to access the product
            when I first land on the landing page. Brant: I would also add that
            I think there are too many steps. Namita: Ok, I can work on the
            landing page to make the product more discoverable but brant
            can you work on the additonal forms? Brant: Yes but I would need 
            to work with James from another team as he needs to unblock the sign up workflow.
            Miguel can you document any other concerns so that I can discuss with James only once?
            Miguel: Sure.
            From the meeting transcript above, Create a list of action items for each person. """

        body = json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 3072,
                "stopSequences": [],
                "temperature": 0.7,
                "topP": 0.9
            }
        })

        response_body = generate_text(model_id, body)
        print(f"Input token count: {response_body['inputTextTokenCount']}")

        for result in response_body['results']:
            print(f"Token count: {result['tokenCount']}")
            print(f"Output text: {result['outputText']}")
            print(f"Completion reason: {result['completionReason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating text with the Amazon Titan Text Premier model {model_id}.")


if __name__ == "__main__":
    main()
```

The following example shows how to run inference with the Amazon Titan Text G1 - Express model with the Python SDK.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to create a list of action items from a meeting transcript
with the Amazon Titan Text G1 - Express model (on demand).
"""
import json
import logging
import boto3

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by the Amazon Titan Text G1 - Express model"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using the Amazon Titan Text G1 - Express model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (json): The response from the model.
    """

    logger.info(
        "Generating text with the Amazon Titan Text G1 - Express model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Text generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated text with the Amazon Titan Text G1 - Express model %s", model_id)

    return response_body


def main():
    """
    Entrypoint for the Amazon Titan Text G1 - Express model example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-text-express-v1'

        prompt = """Meeting transcript: Miguel: Hi Brant, I want to discuss the workstream  
            for our new product launch Brant: Sure Miguel, is there anything in particular you want
            to discuss? Miguel: Yes, I want to talk about how users enter into the product.
            Brant: Ok, in that case let me add in Namita. Namita: Hey everyone 
            Brant: Hi Namita, Miguel wants to discuss how users enter into the product.
            Miguel: its too complicated and we should remove friction.  
            for example, why do I need to fill out additional forms?  
            I also find it difficult to find where to access the product
            when I first land on the landing page. Brant: I would also add that
            I think there are too many steps. Namita: Ok, I can work on the
            landing page to make the product more discoverable but brant
            can you work on the additonal forms? Brant: Yes but I would need 
            to work with James from another team as he needs to unblock the sign up workflow.
            Miguel can you document any other concerns so that I can discuss with James only once?
            Miguel: Sure.
            From the meeting transcript above, Create a list of action items for each person. """

        body = json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 4096,
                "stopSequences": [],
                "temperature": 0,
                "topP": 1
            }
        })

        response_body = generate_text(model_id, body)
        print(f"Input token count: {response_body['inputTextTokenCount']}")

        for result in response_body['results']:
            print(f"Token count: {result['tokenCount']}")
            print(f"Output text: {result['outputText']}")
            print(f"Completion reason: {result['completionReason']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating text with the Amazon Titan Text G1 - Express model {model_id}.")


if __name__ == "__main__":
    main()
```

# Amazon Titan Image Generator G1 models
<a name="model-parameters-titan-image"></a>

The Amazon Titan Image Generator G1 V1 and Titan Image Generator G1 V2 models support the following inference parameters and model responses when carrying out model inference.

**Topics**
+ [Inference parameters](#model-parameters-titan-image-api)
+ [Examples](#model-parameters-titan-image-code-examples)

## Inference parameters
<a name="model-parameters-titan-image-api"></a>

When you make an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) call using the Amazon Titan Image Generator models, replace the `body` field of the request with the format that matches your use case. All tasks share an `imageGenerationConfig` object, but each task has a parameters object specific to that task. The following use cases are supported.



| taskType | Task parameters field | Type of task | Definition | 
| --- | --- | --- | --- | 
| TEXT_IMAGE | textToImageParams | Generation | Generate an image using a text prompt. | 
| TEXT_IMAGE | textToImageParams | Generation | (Image conditioning – V2 only) Provide an additional input conditioning image along with a text prompt to generate an image that follows the layout and composition of the conditioning image. | 
| INPAINTING | inPaintingParams | Editing | Modify an image by changing the inside of a *mask* to match the surrounding background. | 
| OUTPAINTING | outPaintingParams | Editing | Modify an image by seamlessly extending the region defined by the mask. | 
| IMAGE_VARIATION | imageVariationParams | Editing | Modify an image by producing variations of the original image. | 
| COLOR_GUIDED_GENERATION (V2 only) | colorGuidedGenerationParams | Generation | Provide a list of hex color codes along with a text prompt to generate an image that follows the color palette. | 
| BACKGROUND_REMOVAL (V2 only) | backgroundRemovalParams | Editing | Modify an image by identifying multiple objects and removing the background, outputting an image with a transparent background. | 

Editing tasks require an `image` field in the input. This field consists of a string that defines the pixels in the image. Each pixel is defined by 3 RGB channels, each of which ranges from 0 to 255 (for example, (255 255 0) would represent the color yellow). These channels are encoded in base64.

The image you use must be in JPEG or PNG format.
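For example, the bytes of a JPEG or PNG file can be base64-encoded for the `image` field as follows (a minimal sketch; real image bytes are replaced with placeholder bytes so the example is self-contained):

```python
import base64

# In practice you would read the bytes of a JPEG or PNG file, for example:
#   with open("input.png", "rb") as f:
#       image_bytes = f.read()
# Here the PNG file signature stands in for real image data.
image_bytes = b"\x89PNG\r\n\x1a\n"

# The API expects the file bytes encoded as a base64 string.
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
print(image_b64)
```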

If you carry out inpainting or outpainting, you also define a *mask*: one or more regions that define the parts of the image to be modified. You can define the mask in one of two ways.
+ `maskPrompt` – Write a text prompt to describe the part of the image to be masked.
+ `maskImage` – Input a base64-encoded string that defines the masked regions by marking each pixel in the input image as (0 0 0) or (255 255 255).
  + A pixel defined as (0 0 0) is a pixel inside the mask.
  + A pixel defined as (255 255 255) is a pixel outside the mask.

  You can use a photo editing tool to draw masks. You can then convert the output JPEG or PNG image to base64-encoding to input into this field. Otherwise, use the `maskPrompt` field instead to allow the model to infer the mask.

Select a tab to view API request bodies for different image generation use-cases and explanations of the fields.

------
#### [ Text-to-image generation (Request) ]

The text prompt to generate the image must be <= 512 characters. Resolutions must be <= 1,408 pixels on the longer side; see the table below for a full list of supported resolutions. negativeText (Optional) – A text prompt to define what not to include in the image. Must be <= 512 characters.

```
{
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "string",      
        "negativeText": "string"
    },
    "imageGenerationConfig": {
        "quality": "standard" | "premium",
        "numberOfImages": int,
        "height": int,
        "width": int,
        "cfgScale": float,
        "seed": int
    }
}
```

The `textToImageParams` fields are described below.
+ **text** (Required) – A text prompt to generate the image.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.
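Putting these fields together, a text-to-image request body might look like the following sketch (values are illustrative; consult the resolution table for supported height and width combinations):

```python
import json

# Illustrative TEXT_IMAGE request body; quality, size, and cfgScale values
# are examples, not defaults.
body = json.dumps({
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "a watercolor painting of a lighthouse at dawn",
        "negativeText": "people, boats"
    },
    "imageGenerationConfig": {
        "quality": "standard",
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0,
        "seed": 42
    }
})

# `body` is what you pass in the body field of an InvokeModel request.
print(body)
```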

------
#### [ Inpainting (Request) ]

text (Optional) – A text prompt to define what to change inside the mask. If you don't include this field, the model tries to replace the entire mask area with the background. Must be <= 512 characters. negativeText (Optional) – A text prompt to define what not to include in the image. Must be <= 512 characters. The input image and the input mask must each be <= 1,408 pixels on the longer side. The output size is the same as the input size.

```
{
    "taskType": "INPAINTING",
    "inPaintingParams": {
        "image": "base64-encoded string",                         
        "text": "string",
        "negativeText": "string",        
        "maskPrompt": "string",                      
        "maskImage": "base64-encoded string",   
        "returnMask": boolean # False by default                
    },                                                 
    "imageGenerationConfig": {
        "quality": "standard" | "premium",
        "numberOfImages": int,
        "height": int,
        "width": int,
        "cfgScale": float
    }
}
```

The `inPaintingParams` fields are described below. The *mask* defines the part of the image that you want to modify.
+ **image** (Required) – The JPEG or PNG image to modify, formatted as a string that specifies a sequence of pixels, each defined in RGB values and encoded in base64. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ You must define one of the following fields (but not both) to define the mask.
  + **maskPrompt** – A text prompt that defines the mask.
  + **maskImage** – A string that defines the mask by specifying a sequence of pixels that is the same size as the `image`. Each pixel is turned into an RGB value of (0 0 0) (a pixel inside the mask) or (255 255 255) (a pixel outside the mask). For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ **text** (Optional) – A text prompt to define what to change inside the mask. If you don't include this field, the model tries to replace the entire mask area with the background.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.
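Putting these fields together, an inpainting request body that uses `maskPrompt` might look like the following sketch (values are illustrative, and the base64 image string is a placeholder):

```python
import json

# Illustrative INPAINTING request body using maskPrompt to define the mask.
# "<base64-image>" stands in for a real base64-encoded JPEG or PNG string.
body = json.dumps({
    "taskType": "INPAINTING",
    "inPaintingParams": {
        "image": "<base64-image>",
        "text": "a bowl of oranges",
        "maskPrompt": "the bowl of apples on the table",
        "returnMask": False
    },
    "imageGenerationConfig": {
        "quality": "standard",
        "numberOfImages": 1,
        "cfgScale": 8.0
    }
})

# `body` is what you pass in the body field of an InvokeModel request.
print(body)
```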

------
#### [ Outpainting (Request) ]

text (Required) – A text prompt to define what to change outside the mask. Must be <= 512 characters. negativeText (Optional) – A text prompt to define what not to include in the image. Must be <= 512 characters. The input image and the input mask must each be <= 1,408 pixels on the longer side. The output size is the same as the input size. 

```
{
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "text": "string",
        "negativeText": "string",        
        "image": "base64-encoded string",                         
        "maskPrompt": "string",                      
        "maskImage": "base64-encoded string",    
        "returnMask": boolean, # False by default                                         
        "outPaintingMode": "DEFAULT | PRECISE"                 
    },                                                 
    "imageGenerationConfig": {
        "quality": "standard" | "premium",
        "numberOfImages": int,
        "height": int,
        "width": int,
        "cfgScale": float
    }
}
```

The `outPaintingParams` fields are defined below. The *mask* defines the region of the image that you don't want to modify. The generation seamlessly extends the region that you define.
+ **image** (Required) – The JPEG or PNG image to modify, formatted as a string that specifies a sequence of pixels, each defined in RGB values and encoded in base64. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ You must define one of the following fields (but not both) to define the mask.
  + **maskPrompt** – A text prompt that defines the mask.
  + **maskImage** – A string that defines the mask by specifying a sequence of pixels that is the same size as the `image`. Each pixel is turned into an RGB value of (0 0 0) (a pixel inside the mask) or (255 255 255) (a pixel outside the mask). For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ **text** (Required) – A text prompt to define what to change outside the mask.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.
+ **outPaintingMode** – Specifies whether to allow modification of the pixels inside the mask. The following values are possible.
  + DEFAULT – Use this option to allow modification of the image inside the mask in order to keep it consistent with the reconstructed background.
  + PRECISE – Use this option to prevent modification of the image inside the mask.
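
As an illustration of the `maskImage` format, the following sketch builds a black-and-white RGB mask and encodes it as a base64 string. It assumes the Pillow library is installed; the mask dimensions and box coordinates are hypothetical.

```python
import base64
import io

from PIL import Image


def make_mask(width, height, box):
    """Build an RGB mask: (0, 0, 0) inside box, (255, 255, 255) outside.

    box is a (left, top, right, bottom) tuple in pixel coordinates.
    """
    mask = Image.new("RGB", (width, height), (255, 255, 255))
    left, top, right, bottom = box
    inside = Image.new("RGB", (right - left, bottom - top), (0, 0, 0))
    mask.paste(inside, (left, top))
    return mask


def mask_to_base64(mask):
    """Encode the mask as a base64 PNG string for the maskImage field."""
    buffer = io.BytesIO()
    mask.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf8")


# Hypothetical 512 x 512 mask that preserves a central region.
mask = make_mask(512, 512, (128, 128, 384, 384))
input_mask_image = mask_to_base64(mask)
```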

------
#### [ Image variation (Request) ]

Image variation allows you to create variations of your original image based on the parameter values. The size limit for the input image is <= 1,408 pixels on the longer side. See the table below for a full list of resolutions. 
+ text (Optional) – A text prompt that can define what to preserve and what to change in the image. Must be <= 512 characters.
+ negativeText (Optional) – A text prompt to define what not to include in the image. Must be <= 512 characters.
+ similarityStrength (Optional) – Specifies how similar the generated image should be to the input image(s). Use a lower value to introduce more randomness in the generation. The accepted range is 0.2 to 1.0 (both inclusive); a default of 0.7 is used if this parameter is missing from the request.

```
{
     "taskType": "IMAGE_VARIATION",
     "imageVariationParams": {
         "text": "string",
         "negativeText": "string",
         "images": ["base64-encoded string"],
         "similarityStrength": 0.7,  # Range: 0.2 to 1.0
     },
     "imageGenerationConfig": {
         "quality": "standard" | "premium",
         "numberOfImages": int,
         "height": int,
         "width": int,
         "cfgScale": float
     }
}
```

The `imageVariationParams` fields are defined below.
+ **images** (Required) – A list of images for which to generate variations. You can include 1 to 5 images. An image is defined as a base64-encoded image string. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ **text** (Optional) – A text prompt that can define what to preserve and what to change in the image.
+ **similarityStrength** (Optional) – Specifies how similar the generated image should be to the input image(s). Range is 0.2 to 1.0, with lower values used to introduce more randomness.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.

------
#### [ Conditioned Image Generation (Request) V2 only ]

The conditioned image generation task type allows you to augment text-to-image generation by providing a "condition image" to achieve more fine-grained control over the resulting generated image. Two types of conditioning are supported:
+ Canny edge detection
+ Segmentation map

The text prompt to generate the image must be <= 512 characters, and resolutions must be <= 1,408 pixels on the longer side. negativeText (Optional) is a text prompt to define what not to include in the image and must be <= 512 characters. See the table below for a full list of resolutions.

```
{
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "string",      
        "negativeText": "string",
        "conditionImage": "base64-encoded string", # [OPTIONAL] base64 encoded image
        "controlMode": "string", # [OPTIONAL] CANNY_EDGE | SEGMENTATION. DEFAULT: CANNY_EDGE
        "controlStrength": float # [OPTIONAL] weight given to the condition image. DEFAULT: 0.7
    },
    "imageGenerationConfig": {
        "quality": "standard" | "premium",
        "numberOfImages": int,
        "height": int,
        "width": int,
        "cfgScale": float,
        "seed": int
    }
}
```
+ **text** (Required) – A text prompt to generate the image.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.
+ **conditionImage** (Optional-V2 only) – A single input conditioning image that guides the layout and composition of the generated image. An image is defined as a base64-encoded image string. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ **controlMode** (Optional-V2 only) – Specifies the type of conditioning mode to use. Two types of conditioning modes are supported: CANNY\_EDGE and SEGMENTATION. The default value is CANNY\_EDGE.
+ **controlStrength** (Optional-V2 only) – Specifies how similar the layout and composition of the generated image should be to the `conditionImage`. Range is 0 to 1.0, with lower values used to introduce more randomness. The default value is 0.7.

**Note**  
If `controlMode` or `controlStrength` is provided, then `conditionImage` must also be provided.

------
#### [ Color Guided Content (Request) V2 only ]

Provide a list of hex color codes along with a text prompt to generate an image that follows the color palette. A text prompt is required to generate the image and must be <= 512 characters. The maximum resolution is 1,408 pixels on the longer side. A list of 1 to 10 hex color codes is required to specify colors in the generated image. negativeText (Optional) is a text prompt to define what not to include in the image and must be <= 512 characters. referenceImage (Optional) is an additional reference image to guide the color palette in the generated image. The size limit for a user-uploaded RGB reference image is <= 1,408 pixels on the longer side. 

```
{
    "taskType": "COLOR_GUIDED_GENERATION",
    "colorGuidedGenerationParams": {
        "text": "string",      
        "negativeText": "string",
        "referenceImage" "base64-encoded string", # [OPTIONAL]
        "colors": ["string"] # list of color hex codes
    },
    "imageGenerationConfig": {
        "quality": "standard" | "premium",
        "numberOfImages": int,
        "height": int,
        "width": int,
        "cfgScale": float,
        "seed": int
    }
}
```

The `colorGuidedGenerationParams` fields are described below. Note that this task type is available in V2 only.
+ **text** (Required) – A text prompt to generate the image.
+ **colors** (Required) – A list of 1 to 10 hex color codes to specify colors in the generated image.
+ **negativeText** (Optional) – A text prompt to define what not to include in the image.
**Note**  
Don't use negative words in the `negativeText` prompt. For example, if you don't want to include mirrors in an image, enter **mirrors** in the `negativeText` prompt. Don't enter **no mirrors**.
+ **referenceImage** (Optional) – A single input reference image that guides the color palette of the generated image. An image is defined as a base64-encoded image string.
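
Because the `colors` field accepts a list of 1 to 10 hex color codes, it can be useful to validate the list before building a request. The following sketch is illustrative only; the `validate_colors` helper is hypothetical and not part of the Amazon Bedrock API.

```python
import re

# Matches a six-digit hex color code such as "#ff8080".
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def validate_colors(colors):
    """Raise ValueError unless colors is a list of 1 to 10 #RRGGBB hex codes."""
    if not 1 <= len(colors) <= 10:
        raise ValueError("colors must contain 1 to 10 hex color codes")
    for color in colors:
        if not HEX_COLOR.match(color):
            raise ValueError(f"invalid hex color code: {color}")
    return colors


validate_colors(["#ff8080", "#ffb280", "#ffe680"])  # returns the list unchanged
```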

------
#### [ Background Removal (Request) ]

The background removal task type automatically identifies multiple objects in the input image and removes the background. The output image has a transparent background. 

**Request format**

```
{
    "taskType": "BACKGROUND_REMOVAL",
    "backgroundRemovalParams": {
        "image": "base64-encoded string"
    }
}
```

**Response format**

```
{
  "images": [
    "base64-encoded string", 
    ...
  ],
  "error": "string" 
}
```

The `backgroundRemovalParams` field is described below.
+ **image** (Required) – The JPEG or PNG image to modify, formatted as a string that specifies a sequence of pixels, each defined in RGB values and encoded in base64.

------
#### [ Response body ]

```
{
  "images": [
    "base64-encoded string", 
    ...
  ],
  "error": "string" 
}
```

The response body is a streaming object that contains one of the following fields.
+ `images` – If the request is successful, it returns this field, a list of base64-encoded strings, each defining a generated image. Each image is formatted as a string that specifies a sequence of pixels, each defined in RGB values and encoded in base64. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#model-parameters-titan-image-code-examples).
+ `error` – If the request violates the content moderation policy in one of the following situations, a message is returned in this field.
  + If the input text, image, or mask image is flagged by the content moderation policy.
  + If at least one output image is flagged by the content moderation policy.
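
As a sketch of how a client might consume this response, the following helper decodes each base64-encoded image in a parsed response body and writes it to a file. The helper name and file-name pattern are hypothetical.

```python
import base64


def save_images(response_body, prefix="titan_image"):
    """Decode each base64 image in a parsed response body and write it to a PNG file.

    Returns the list of file names written. Raises RuntimeError if the
    response carries a content moderation error instead of images.
    """
    if response_body.get("error") is not None:
        raise RuntimeError(f"Image generation error: {response_body['error']}")
    file_names = []
    for index, base64_image in enumerate(response_body.get("images", [])):
        image_bytes = base64.b64decode(base64_image.encode("ascii"))
        file_name = f"{prefix}_{index}.png"
        with open(file_name, "wb") as image_file:
            image_file.write(image_bytes)
        file_names.append(file_name)
    return file_names
```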

------

The shared and optional `imageGenerationConfig` contains the following fields. If you don't include this object, the default configurations are used.
+ **quality** – The quality of the image. The default value is `standard`. For pricing details, see [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/).
+ **numberOfImages** (Optional) – The number of images to generate.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-image.html)
+ **cfgScale** (Optional) – Specifies how strongly the generated image should adhere to the prompt. Use a lower value to introduce more randomness in the generation.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-image.html)
+ The following parameters define the size that you want the output image to be. For more details about pricing by image size, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).
  + **height** (Optional) – The height of the image in pixels. The default value is 1408.
  + **width** (Optional) – The width of the image in pixels. The default value is 1408.

  The following sizes are permissible.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-image.html)
+ **seed** (Optional) – Use to control and reproduce results. Determines the initial noise setting. Use the same seed and the same settings as a previous run to allow inference to create a similar image.  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-image.html)

## Examples
<a name="model-parameters-titan-image-code-examples"></a>

The following examples show how to invoke the Amazon Titan Image Generator models with on-demand throughput in the Python SDK. Select a tab to view an example for each use case. Each example displays the image at the end.

------
#### [ Text-to-image generation ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an image from a text prompt with the Amazon Titan Image Generator G1 model (on demand).
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator G1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = 'amazon.titan-image-generator-v1'

    prompt = """A photograph of a cup of coffee from the side."""

    body = json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": prompt
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "height": 1024,
            "width": 1024,
            "cfgScale": 8.0,
            "seed": 0
        }
    })

    try:
        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Inpainting ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use inpainting to generate an image from a source image with 
the Amazon Titan Image Generator G1 model (on demand).
The example uses a mask prompt to specify the area to inpaint.
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator G1 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v1'

        # Read image from file and encode it as base64 string.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "INPAINTING",
            "inPaintingParams": {
                "text": "Modernize the windows of the house",
                "negativeText": "bad quality, low res",
                "image": input_image,
                "maskPrompt": "windows"
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 512,
                "width": 512,
                "cfgScale": 8.0
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Outpainting ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use outpainting to generate an image from a source image with 
the Amazon Titan Image Generator G1 model (on demand).
The example uses a mask image to outpaint the original image.
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator G1 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v1'

        # Read image and mask image from file and encode as base64 strings.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')
        with open("/path/to/mask_image", "rb") as mask_image_file:
            input_mask_image = base64.b64encode(
                mask_image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "OUTPAINTING",
            "outPaintingParams": {
                "text": "Draw a chocolate chip cookie",
                "negativeText": "bad quality, low res",
                "image": input_image,
                "maskImage": input_mask_image,
                "outPaintingMode": "DEFAULT"
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 512,
                "width": 512,
                "cfgScale": 8.0
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Image variation ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an image variation from a source image with the
Amazon Titan Image Generator G1 model (on demand).
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator G1"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator G1 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator G1 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator G1 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v1'

        # Read image from file and encode it as base64 string.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "IMAGE_VARIATION",
            "imageVariationParams": {
                "text": "Modernize the house, photo-realistic, 8k, hdr",
                "negativeText": "bad quality, low resolution, cartoon",
                "images": [input_image],
		"similarityStrength": 0.7,  # Range: 0.2 to 1.0
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 512,
                "width": 512,
                "cfgScale": 8.0
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Image conditioning (V2 only) ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate image conditioning from a source image with the
Amazon Titan Image Generator G1 V2 model (on demand).
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator V2"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator V2 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator V2 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator V2 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator V2 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v2:0'

        # Read image from file and encode it as base64 string.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {
                "text": "A robot playing soccer, anime cartoon style",
                "negativeText": "bad quality, low res",
                "conditionImage": input_image,
                "controlMode": "CANNY_EDGE"
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 512,
                "width": 512,
                "cfgScale": 8.0
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator V2 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Color guided content (V2 only) ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an image from a source image color palette with the
Amazon Titan Image Generator G1 V2 model (on demand).
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator V2"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator V2 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator V2 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator V2 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator V2 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v2:0'

        # Read image from file and encode it as base64 string.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "COLOR_GUIDED_GENERATION",
            "colorGuidedGenerationParams": {
                "text": "digital painting of a girl, dreamy and ethereal, pink eyes, peaceful expression, ornate frilly dress, fantasy, intricate, elegant, rainbow bubbles, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration",
                "negativeText": "bad quality, low res",
                "referenceImage": input_image,
                "colors": ["#ff8080", "#ffb280", "#ffe680", "#ffe680"]
            },
            "imageGenerationConfig": {
                "numberOfImages": 1,
                "height": 512,
                "width": 512,
                "cfgScale": 8.0
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator V2 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Background removal (V2 only) ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an image with background removal with the
Amazon Titan Image Generator G1 V2 model (on demand).
"""
import base64
import io
import json
import logging
import boto3
from PIL import Image

from botocore.exceptions import ClientError


class ImageError(Exception):
    "Custom exception for errors returned by Amazon Titan Image Generator V2"

    def __init__(self, message):
        self.message = message


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_image(model_id, body):
    """
    Generate an image using Amazon Titan Image Generator V2 model on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info(
        "Generating image with Amazon Titan Image Generator V2 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )
    response_body = json.loads(response.get("body").read())

    base64_image = response_body.get("images")[0]
    base64_bytes = base64_image.encode('ascii')
    image_bytes = base64.b64decode(base64_bytes)

    finish_reason = response_body.get("error")

    if finish_reason is not None:
        raise ImageError(f"Image generation error. Error is {finish_reason}")

    logger.info(
        "Successfully generated image with Amazon Titan Image Generator V2 model %s", model_id)

    return image_bytes


def main():
    """
    Entrypoint for Amazon Titan Image Generator V2 example.
    """
    try:
        logging.basicConfig(level=logging.INFO,
                            format="%(levelname)s: %(message)s")

        model_id = 'amazon.titan-image-generator-v2:0'

        # Read image from file and encode it as base64 string.
        with open("/path/to/image", "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode('utf8')

        body = json.dumps({
            "taskType": "BACKGROUND_REMOVAL",
            "backgroundRemovalParams": {
                "image": input_image,
            }
        })

        image_bytes = generate_image(model_id=model_id,
                                     body=body)
        image = Image.open(io.BytesIO(image_bytes))
        image.show()

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    except ImageError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(
            f"Finished generating image with Amazon Titan Image Generator V2 model {model_id}.")


if __name__ == "__main__":
    main()
```

------

# Amazon Titan Embeddings G1 - Text
<a name="model-parameters-titan-embed-text"></a>

Titan Embeddings G1 - Text does not support the use of inference parameters. The following sections detail the request and response formats and provide a code example.

**Topics**
+ [Request and response](#model-parameters-titan-embed-text-request-response)
+ [Example code](#api-inference-examples-titan-embed-text)

## Request and response
<a name="model-parameters-titan-embed-text-request-response"></a>

The request body is passed in the `body` field of an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) request. 

------
#### [ V2 Request ]

The `inputText` parameter is required. The `normalize`, `dimensions`, and `embeddingTypes` parameters are optional.
+ `inputText` – Enter text to convert to an embedding.
+ `normalize` – (optional) Flag indicating whether or not to normalize the output embedding. Defaults to true.
+ `dimensions` – (optional) The number of dimensions the output embedding should have. The following values are accepted: 1024 (default), 512, 256.
+ `embeddingTypes` – (optional) Accepts a list containing "float", "binary", or both. Defaults to `float`. 

```
{
    "inputText": string,
    "dimensions": int,
    "normalize": boolean,
    "embeddingTypes": list
}
```
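For example, you might construct a V2 request body as follows. This is a minimal sketch; the input text and parameter values are illustrative.

```
import json

# Build a Titan Text Embeddings V2 request body (values are illustrative).
body = json.dumps({
    "inputText": "What are the different services that you offer?",
    "dimensions": 512,
    "normalize": True,
    "embeddingTypes": ["float", "binary"]
})

# Pass this JSON string as the body of an InvokeModel request.
print(body)
```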

------
#### [ V2 Response ]

The fields are described below.
+ `embedding` – An array that represents the embedding vector of the input you provided. This is always of type `float`.
+ `inputTextTokenCount` – The number of tokens in the input.
+ `embeddingsByType` – A dictionary or map of the embedding lists. Depending on the input, it contains "float", "binary", or both.
  + Example: `"embeddingsByType": {"binary": [int,..], "float": [float,...]}`
  + This field always appears. Even if you don't specify `embeddingTypes` in your request, it still contains `float`. Example: `"embeddingsByType": {"float": [float,...]}`

```
{
    "embedding": [float, float, ...],
    "inputTextTokenCount": int,
    "embeddingsByType": {"binary": [int,..], "float": [float,...]}
}
```

------
#### [ G1 Request ]

The only available field is `inputText`, in which you can include text to convert into an embedding.

```
{
    "inputText": string
}
```

------
#### [ G1 Response ]

The `body` of the response contains the following fields.

```
{
    "embedding": [float, float, ...],
    "inputTextTokenCount": int
}
```

The fields are described below.
+ **embedding** – An array that represents the embedding vector of the input you provided.
+ **inputTextTokenCount** – The number of tokens in the input.

------

## Example code
<a name="api-inference-examples-titan-embed-text"></a>

The following examples show how to call the Amazon Titan Embeddings models to generate embeddings. Select the tab that corresponds to the model that you're using:

------
#### [ Amazon Titan Embeddings G1 - Text ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an embedding with the Amazon Titan Embeddings G1 - Text model (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embedding(model_id, body):
    """
    Generate an embedding with the vector representation of a text input using Amazon Titan Embeddings G1 - Text on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embedding created by the model and the number of input tokens.
    """

    logger.info("Generating an embedding with Amazon Titan Embeddings G1 - Text model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Embeddings G1 - Text example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.titan-embed-text-v1"
    input_text = "What are the different services that you offer?"


    # Create request body.
    body = json.dumps({
        "inputText": input_text,
    })


    try:

        response = generate_embedding(model_id, body)

        print(f"Generated an embedding: {response['embedding']}")
        print(f"Input Token count:  {response['inputTextTokenCount']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(f"Finished generating an embedding with Amazon Titan Embeddings G1 - Text model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Amazon Titan Text Embeddings V2 ]

When you use Titan Text Embeddings V2, the `embedding` field is not present in the response if `embeddingTypes` contains only `binary`. 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an embedding with the Amazon Titan Text Embeddings V2 Model
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embedding(model_id, body):
    """
    Generate an embedding with the vector representation of a text input using Amazon Titan Text Embeddings V2 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embedding created by the model and the number of input tokens.
    """

    logger.info("Generating an embedding with Amazon Titan Text Embeddings V2 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Embeddings V2 - Text example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.titan-embed-text-v2:0"
    input_text = "What are the different services that you offer?"


    # Create request body.
    body = json.dumps({
        "inputText": input_text,
        "embeddingTypes": ["binary"]
    })


    try:

        response = generate_embedding(model_id, body)

        print(f"Generated an embedding: {response['embeddingsByType']['binary']}") # returns binary embedding
        print(f"Input text: {input_text}")
        print(f"Input Token count:  {response['inputTextTokenCount']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(f"Finished generating an embedding with Amazon Titan Text Embeddings V2 model {model_id}.")


if __name__ == "__main__":
    main()
```

------

# Amazon Titan Multimodal Embeddings G1
<a name="model-parameters-titan-embed-mm"></a>

This section provides request and response body formats and code examples for using Amazon Titan Multimodal Embeddings G1.

**Topics**
+ [Request and response](#model-parameters-titan-embed-mm-request-response)
+ [Example code](#api-inference-examples-titan-embed-mm)

## Request and response
<a name="model-parameters-titan-embed-mm-request-response"></a>

The request body is passed in the `body` field of an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) request.

------
#### [ Request ]

The request body for Amazon Titan Multimodal Embeddings G1 includes the following fields.

```
{
    "inputText": string,
    "inputImage": base64-encoded string,
    "embeddingConfig": {
        "outputEmbeddingLength": 256 | 384 | 1024
    }
}
```

At least one of the following fields is required. Include both to generate an embeddings vector that averages the resulting text embeddings and image embeddings vectors.
+ **inputText** – Enter text to convert to embeddings.
+ **inputImage** – Encode the image that you want to convert to embeddings in base64 and enter the string in this field. For examples of how to encode an image into base64 and decode a base64-encoded string and transform it into an image, see the [code examples](#api-inference-examples-titan-embed-mm).

The following field is optional.
+ **embeddingConfig** – Contains an `outputEmbeddingLength` field, in which you specify one of the following lengths for the output embeddings vector.
  + 256
  + 384
  + 1024 (default)
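Putting these fields together, a combined text-and-image request body could be built as follows. This is a sketch; the placeholder bytes stand in for a real image file read with `open(..., "rb")`.

```
import base64
import json

# Placeholder bytes stand in for a real image file.
image_bytes = b"example-image-bytes"
input_image = base64.b64encode(image_bytes).decode("utf8")

body = json.dumps({
    "inputText": "A family eating dinner",
    "inputImage": input_image,
    "embeddingConfig": {
        "outputEmbeddingLength": 384
    }
})

# Pass this JSON string as the body of an InvokeModel request.
print(body)
```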

------
#### [ Response ]

The `body` of the response contains the following fields.

```
{
    "embedding": [float, float, ...],
    "inputTextTokenCount": int,
    "message": string
}
```

The fields are described below.
+ **embedding** – An array that represents the embeddings vector of the input you provided.
+ **inputTextTokenCount** – The number of tokens in the text input.
+ **message** – Specifies any errors that occur during generation.

------

## Example code
<a name="api-inference-examples-titan-embed-mm"></a>

The following examples show how to invoke the Amazon Titan Multimodal Embeddings G1 model with on-demand throughput in the Python SDK. Select a tab to view an example for each use case.

------
#### [ Text embeddings ]

This example shows how to call the Amazon Titan Multimodal Embeddings G1 model to generate text embeddings.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings from text with the Amazon Titan Multimodal Embeddings G1 model (on demand).
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError

class EmbedError(Exception):
    "Custom exception for errors returned by Amazon Titan Multimodal Embeddings G1"

    def __init__(self, message):
        self.message = message

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embeddings(model_id, body):
    """
    Generate a vector of embeddings for a text input using Amazon Titan Multimodal Embeddings G1 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embeddings that the model generated, token information, and the
        reason the model stopped generating embeddings.
    """

    logger.info("Generating embeddings with Amazon Titan Multimodal Embeddings G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    finish_reason = response_body.get("message")

    if finish_reason is not None:
        raise EmbedError(f"Embeddings generation error: {finish_reason}")

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Multimodal Embeddings G1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.titan-embed-image-v1"
    input_text = "What are the different services that you offer?"
    output_embedding_length = 256

    # Create request body.
    body = json.dumps({
        "inputText": input_text,
        "embeddingConfig": {
            "outputEmbeddingLength": output_embedding_length
        }
    })


    try:

        response = generate_embeddings(model_id, body)

        print(f"Generated text embeddings of length {output_embedding_length}: {response['embedding']}")
        print(f"Input text token count:  {response['inputTextTokenCount']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
        
    except EmbedError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(f"Finished generating text embeddings with Amazon Titan Multimodal Embeddings G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Image embeddings ]

This example shows how to call the Amazon Titan Multimodal Embeddings G1 model to generate image embeddings.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings from an image with the Amazon Titan Multimodal Embeddings G1 model (on demand).
"""

import base64
import json
import logging
import boto3

from botocore.exceptions import ClientError

class EmbedError(Exception):
    "Custom exception for errors returned by Amazon Titan Multimodal Embeddings G1"

    def __init__(self, message):
        self.message = message

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embeddings(model_id, body):
    """
    Generate a vector of embeddings for an image input using Amazon Titan Multimodal Embeddings G1 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embeddings that the model generated, token information, and the
        reason the model stopped generating embeddings.
    """

    logger.info("Generating embeddings with Amazon Titan Multimodal Embeddings G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    finish_reason = response_body.get("message")

    if finish_reason is not None:
        raise EmbedError(f"Embeddings generation error: {finish_reason}")

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Multimodal Embeddings G1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    # Read image from file and encode it as base64 string.
    with open("/path/to/image", "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')

    model_id = 'amazon.titan-embed-image-v1'
    output_embedding_length = 256

    # Create request body.
    body = json.dumps({
        "inputImage": input_image,
        "embeddingConfig": {
            "outputEmbeddingLength": output_embedding_length
        }
    })


    try:

        response = generate_embeddings(model_id, body)

        print(f"Generated image embeddings of length {output_embedding_length}: {response['embedding']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
        
    except EmbedError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(f"Finished generating image embeddings with Amazon Titan Multimodal Embeddings G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Text and image embeddings ]

This example shows how to call the Amazon Titan Multimodal Embeddings G1 model to generate embeddings from a combined text and image input. The resulting vector is the average of the generated text embeddings vector and the image embeddings vector.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings from an image and accompanying text with the Amazon Titan Multimodal Embeddings G1 model (on demand).
"""

import base64
import json
import logging
import boto3

from botocore.exceptions import ClientError

class EmbedError(Exception):
    "Custom exception for errors returned by Amazon Titan Multimodal Embeddings G1"

    def __init__(self, message):
        self.message = message

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embeddings(model_id, body):
    """
    Generate a vector of embeddings for a combined text and image input using Amazon Titan Multimodal Embeddings G1 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embeddings that the model generated, token information, and the
        reason the model stopped generating embeddings.
    """

    logger.info("Generating embeddings with Amazon Titan Multimodal Embeddings G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    finish_reason = response_body.get("message")

    if finish_reason is not None:
        raise EmbedError(f"Embeddings generation error: {finish_reason}")

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Multimodal Embeddings G1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.titan-embed-image-v1"
    input_text = "A family eating dinner"
    # Read image from file and encode it as base64 string.
    with open("/path/to/image", "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')
    output_embedding_length = 256

    # Create request body.
    body = json.dumps({
        "inputText": input_text,
        "inputImage": input_image,
        "embeddingConfig": {
            "outputEmbeddingLength": output_embedding_length
        }
    })


    try:

        response = generate_embeddings(model_id, body)

        print(f"Generated embeddings of length {output_embedding_length}: {response['embedding']}")
        print(f"Input text token count:  {response['inputTextTokenCount']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
        
    except EmbedError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(f"Finished generating embeddings with Amazon Titan Multimodal Embeddings G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------

# Anthropic Claude models
<a name="model-parameters-claude"></a>

This section describes the request parameters and response fields for Anthropic Claude models. Use this information to make inference calls to Anthropic Claude models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Anthropic Claude models. To use a model in an inference operation, you need the model ID. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check whether the Converse API supports a specific Anthropic Claude model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities, Amazon Bedrock features, and AWS Regions that Anthropic Claude models support, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Anthropic Claude models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Anthropic Claude specific prompt information, see the [Anthropic Claude prompt engineering guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview).

You can use Amazon Bedrock to send [Anthropic Claude Text Completions API](model-parameters-anthropic-claude-text-completion.md) or [Anthropic Claude Messages API](model-parameters-anthropic-claude-messages.md) inference requests.

Use the Messages API to create conversational applications, such as a virtual assistant or a coaching application. Use the Text Completions API for single-turn text generation applications, such as generating text for a blog post or summarizing text that a user supplies. 

Anthropic Claude models support the use of XML tags to structure and delineate your prompts. For example, you can surround examples in your prompt with an `<examples>` tag. Use descriptive tag names for optimal results. For more information, see [Use XML tags](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/use-xml-tags) in the [Anthropic user guide](https://docs.anthropic.com/en/docs/welcome).
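For instance, a prompt that wraps few-shot examples in an `<examples>` tag might be assembled as follows. This is a sketch; the example content is purely illustrative.

```
# Assemble a prompt that uses XML tags to delineate few-shot examples
# (the example content is illustrative).
examples = "\n".join([
    "Input: The package arrived broken. -> Sentiment: negative",
    "Input: Great service, thank you! -> Sentiment: positive",
])

prompt = (
    "Classify the sentiment of the user's message.\n"
    f"<examples>\n{examples}\n</examples>\n"
    "Input: The checkout page keeps timing out. -> Sentiment:"
)

print(prompt)
```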

Anthropic Claude models support the use of PDF document processing and citations. Citations provide references to information in the document used by the model in a response.

**Note**  
To use system prompts in inference calls, you must use Anthropic Claude versions that are 2.1 or greater.  
For information about creating system prompts, see [Giving Claude a role with a system prompt](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts) in the Anthropic Claude documentation.  
To avoid timeouts with Anthropic Claude version 2.1, we recommend limiting the input token count in the `prompt` field to 180K. We expect to address this timeout issue soon.

In the inference call, fill the `body` field with a JSON object that conforms to the type of call you want to make: [Anthropic Claude Text Completions API](model-parameters-anthropic-claude-text-completion.md) or [Anthropic Claude Messages API](model-parameters-anthropic-claude-messages.md). 
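As a sketch, the two request styles differ only in the JSON that you place in `body`. The snippet below builds one body of each kind; the field values are illustrative, and the `anthropic_version` string for the Messages API body is the one described in the Messages API topic.

```
import json

# Text Completions API body: a single alternating Human/Assistant prompt
# (values are illustrative).
text_completions_body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300
})

# Messages API body: a list of role-tagged messages (values are illustrative).
messages_body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [
        {"role": "user", "content": "Explain black holes to 8th graders."}
    ]
})

# Either string is passed as the body field of an InvokeModel request.
print(text_completions_body)
print(messages_body)
```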

**Topics**
+ [Anthropic Claude Text Completions API](model-parameters-anthropic-claude-text-completion.md)
+ [Anthropic Claude Messages API](model-parameters-anthropic-claude-messages.md)

# Anthropic Claude Text Completions API
<a name="model-parameters-anthropic-claude-text-completion"></a>

This section provides inference parameters and code examples for using Anthropic Claude models with the Text Completions API.

**Topics**
+ [Anthropic Claude Text Completions API overview](#model-parameters-anthropic-claude-text-completion-overview)
+ [Supported models](#claude-messages-supported-models)
+ [Request and Response](#model-parameters-anthropic-claude-text-completion-request-response)
+ [Code example](#api-inference-examples-claude-text-completion)

## Anthropic Claude Text Completions API overview
<a name="model-parameters-anthropic-claude-text-completion-overview"></a>

Use the Text Completions API for single-turn text generation from a user-supplied prompt. For example, you can use the Text Completions API to generate text for a blog post or to summarize text input from a user.

For information about creating prompts for Anthropic Claude models, see [Introduction to prompt design](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design). If you want to use your existing Text Completions prompts with the [Anthropic Claude Messages API](model-parameters-anthropic-claude-messages.md), see [Migrating from Text Completions](https://docs.anthropic.com/claude/reference/migrating-from-text-completions-to-messages).

## Supported models
<a name="claude-messages-supported-models"></a>

You can use the Text Completions API with the following Anthropic Claude models.
+ Anthropic Claude Instant v1.2
+ Anthropic Claude v2
+ Anthropic Claude v2.1 

## Request and Response
<a name="model-parameters-anthropic-claude-text-completion-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html). 

For more information, see [https://docs.anthropic.com/claude/reference/complete_post](https://docs.anthropic.com/claude/reference/complete_post) in the Anthropic Claude documentation.

------
#### [ Request ]

Anthropic Claude has the following inference parameters for a Text Completion inference call. 

```
{
    "prompt": "\n\nHuman:<prompt>\n\nAssistant:",
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "max_tokens_to_sample": int,
    "stop_sequences": [string]
}
```

The following are required parameters.
+  **prompt** – (Required) The prompt that you want Claude to complete. For proper response generation, you need to format your prompt using alternating `\n\nHuman:` and `\n\nAssistant:` conversational turns. For example:

  ```
  "\n\nHuman: {userQuestion}\n\nAssistant:"
  ```

  For more information, see [Prompt validation](https://docs.anthropic.com/claude/reference/prompt-validation) in the Anthropic Claude documentation. 
+  **max_tokens_to_sample** – (Required) The maximum number of tokens to generate before stopping. We recommend a limit of 4,000 tokens for optimal performance.

  Note that Anthropic Claude models might stop generating tokens before reaching the value of `max_tokens_to_sample`. Different Anthropic Claude models have different maximum values for this parameter. For more information, see [Model comparison](https://docs.anthropic.com/claude/docs/models-overview#model-comparison) in the Anthropic Claude documentation.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-text-completion.html)

The following are optional parameters.
+  **stop_sequences** – (Optional) Sequences that will cause the model to stop generating.

  Anthropic Claude models stop on `"\n\nHuman:"`, and may include additional built-in stop sequences in the future. Use the `stop_sequences` inference parameter to include additional strings that will signal the model to stop generating text.
+  **temperature** – (Optional) The amount of randomness injected into the response. Use a value closer to 0 for analytical / multiple choice, and a value closer to 1 for creative and generative tasks.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-text-completion.html)
+  **top_p** – (Optional) Use nucleus sampling.

  In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by `top_p`. You should alter either `temperature` or `top_p`, but not both.    
+  **top_k** – (Optional) Only sample from the top K options for each subsequent token.

  Use `top_k` to remove long-tail, low-probability responses.
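
Taken together, a Text Completion request body that sets these parameters might look like the following (the values shown are illustrative):

```
{
    "prompt": "\n\nHuman: Why is the sky blue?\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 250,
    "stop_sequences": ["\n\nHuman:"]
}
```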

------
#### [ Response ]

The Anthropic Claude model returns the following fields for a Text Completion inference call. 

```
{
    "completion": string,
    "stop_reason": string,
    "stop": string
}
```
+ **completion** – The resulting completion up to and excluding the stop sequences.
+ **stop_reason** – The reason why the model stopped generating the response.
  + **"stop_sequence"** – The model reached a stop sequence, either provided by you with the `stop_sequences` inference parameter, or a stop sequence built into the model.
  + **"max_tokens"** – The model exceeded `max_tokens_to_sample` or the model's maximum number of tokens.
+ **stop** – If you specify the `stop_sequences` inference parameter, `stop` contains the stop sequence that signaled the model to stop generating text. For example, `holes` in the following response.

  ```
  {
      "completion": " Here is a simple explanation of black ",
      "stop_reason": "stop_sequence",
      "stop": "holes"
  }
  ```

  If you don't specify `stop_sequences`, the value for `stop` is empty.

------

## Code example
<a name="api-inference-examples-claude-text-completion"></a>

The following examples show how to call the *Anthropic Claude V2* model with on-demand throughput. To use Anthropic Claude version 2.1, change the value of `modelId` to `anthropic.claude-v2:1`.

```
import boto3
import json
brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completion'))
```

The following example shows how to generate streaming text with Python using the prompt *write an essay for living on mars in 1000 words* and the Anthropic Claude V2 model:

```
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})
                   
response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2', 
    body=body
)
    
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
```
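
Each streamed chunk is a JSON document whose `completion` field contains a fragment of the generated text. As a sketch (using simulated chunk bytes rather than a live response), you could accumulate just the text like this:

```
import json

# Simulated chunk payloads, shaped like the 'bytes' value of each stream event.
chunks = [
    b'{"completion": "Living on Mars", "stop_reason": null}',
    b'{"completion": " would be challenging.", "stop_reason": "stop_sequence"}',
]

text = ''
for raw in chunks:
    part = json.loads(raw.decode())
    text += part.get('completion', '')

print(text)
```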

# Anthropic Claude Messages API
<a name="model-parameters-anthropic-claude-messages"></a>

This section provides inference parameters and code examples for using the Anthropic Claude Messages API.

**Topics**
+ [Anthropic Claude Messages API overview](#model-parameters-anthropic-claude-messages-overview)
+ [Tool use](model-parameters-anthropic-claude-messages-tool-use.md)
+ [Extended thinking](claude-messages-extended-thinking.md)
+ [Adaptive thinking](claude-messages-adaptive-thinking.md)
+ [Thinking encryption](claude-messages-thinking-encryption.md)
+ [Differences in thinking across model versions](claude-messages-thinking-differences.md)
+ [Compaction](claude-messages-compaction.md)
+ [Get validated JSON results from models](claude-messages-structured-outputs.md)
+ [Request and Response](model-parameters-anthropic-claude-messages-request-response.md)
+ [Code examples](api-inference-examples-claude-messages-code-examples.md)
+ [Supported models](claude-messages-supported-models.md)

## Anthropic Claude Messages API overview
<a name="model-parameters-anthropic-claude-messages-overview"></a>

You can use the Messages API to create chatbots or virtual assistant applications. The API manages the conversational exchanges between a user and an Anthropic Claude model (assistant).

**Note**  
This topic shows how to use the Anthropic Claude messages API with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). However, we recommend that you use the Converse API to implement messages in your application. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

Anthropic trains Claude models to operate on alternating user and assistant conversational turns. When you create a new message, you specify the prior conversational turns with the `messages` parameter. The model then generates the next message in the conversation.

Each input message must be an object with a `role` and `content`. You can specify a single user-role message, or you can include multiple user and assistant messages.

If you use the technique of prefilling Claude's response (supplying the beginning of the response in a final assistant-role message), Claude continues from where you left off. With this technique, Claude still returns a response with the assistant role.

If the final message uses the assistant role, the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.

Example with a single user message:

```
[{"role": "user", "content": "Hello, Claude"}]
```

Example with multiple conversational turns:

```
[
  {"role": "user", "content": "Hello there."},
  {"role": "assistant", "content": "Hi, I'm Claude. How can I help you?"},
  {"role": "user", "content": "Can you explain LLMs in plain English?"},
]
```

Example with a partially-filled response from Claude:

```
[
  {"role": "user", "content": "Please describe yourself using only JSON"},
  {"role": "assistant", "content": "Here is my JSON description:\n{"},
]
```
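
Because the model continues from the final assistant message, your code typically concatenates the prefill with the returned text to get the complete output. A minimal sketch, assuming the Messages API response shape shown later in this topic (the response text here is hypothetical):

```
prefill = "Here is my JSON description:\n{"

# Hypothetical response body; in practice this comes from the model.
response_body = {
    "role": "assistant",
    "content": [{"type": "text", "text": "\"name\": \"Claude\"}"}]
}

# Reassemble the full output from the prefill and the model's continuation.
full_text = prefill + response_body["content"][0]["text"]
print(full_text)
```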

Each input message's `content` can be either a single string or an array of content blocks, where each block has a specific type. Using a string is shorthand for an array of one content block of type `text`. The following input messages are equivalent:

```
{"role": "user", "content": "Hello, Claude"}
```

```
{"role": "user", "content": [{"type": "text", "text": "Hello, Claude"}]}
```

For information about creating prompts for Anthropic Claude models, see [Intro to prompting](https://docs.anthropic.com/claude/docs/intro-to-prompting) in the Anthropic Claude documentation. If you have existing [Text Completion](model-parameters-anthropic-claude-text-completion.md) prompts that you want to migrate to the messages API, see [Migrating from Text Completions](https://docs.anthropic.com/claude/reference/migrating-from-text-completions-to-messages).

**Important**  
The timeout period for inference calls to Anthropic Claude 3.7 Sonnet and Claude 4 models is 60 minutes. By default, AWS SDK clients timeout after 1 minute. We recommend that you increase the read timeout period of your AWS SDK client to at least 60 minutes. For example, in the AWS Python botocore SDK, change the value of the `read_timeout` field in [botocore.config](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html#) to at least 3600.

### System prompts
<a name="model-parameters-anthropic-claude-messages-system-prompts"></a>

You can also include a system prompt in the request. A system prompt lets you provide context and instructions to Anthropic Claude, such as specifying a particular goal or role. Specify a system prompt in the `system` field, as shown in the following example. 

```
"system": "You are Claude, an AI assistant created by Anthropic to be helpful,
                harmless, and honest. Your goal is to provide informative and substantive responses
                to queries while avoiding potential harms."
```

For more information, see [System prompts](https://docs.anthropic.com/en/docs/system-prompts) in the Anthropic documentation.

### Multimodal prompts
<a name="model-parameters-anthropic-claude-messages-multimodal-prompts"></a>

A multimodal prompt combines multiple modalities (images and text) in a single prompt. You specify the modalities in the `content` input field. The following example shows how you could ask Anthropic Claude to describe the content of a supplied image. For example code, see [Multimodal code examples](api-inference-examples-claude-messages-code-examples.md#api-inference-examples-claude-multimodal-code-example). 

```
{
    "anthropic_version": "bedrock-2023-05-31", 
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": "iVBORw..."
                    }
                },
                {
                    "type": "text",
                    "text": "What's in these images?"
                }
            ]
        }
    ]
}
```

Each image you include in a request counts towards your token usage. For more information, see [Image costs](https://docs.anthropic.com/claude/docs/vision#image-costs) in the Anthropic documentation.
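
To build the image `source` block, you base64-encode the raw image bytes and supply the result in the `data` field. A minimal sketch using the Python standard library (the placeholder bytes stand in for real image data read from a file):

```
import base64

# In practice, read the bytes from a file:
# with open("image.jpg", "rb") as f:
#     image_bytes = f.read()
image_bytes = b"\xff\xd8\xff\xe0"  # placeholder JPEG header bytes for illustration

encoded = base64.b64encode(image_bytes).decode("utf-8")

image_block = {
    "type": "image",
    "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": encoded,
    },
}
```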

# Tool use
<a name="model-parameters-anthropic-claude-messages-tool-use"></a>

**Warning**  
Several functions below are offered in beta as indicated. These features are made available to you as a "Beta Service" as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA.

With Anthropic Claude models, you can specify a tool that the model can use to answer a message. For example, you could specify a tool that gets the most popular song on a radio station. If the user passes the message *What's the most popular song on WZPZ?*, the model determines that the tool you specified can help answer the question. In its response, the model requests that you run the tool on its behalf. You then run the tool and pass the tool result to the model, which then generates a response for the original message. For more information, see [Tool use (function calling)](https://docs.anthropic.com/en/docs/tool-use) in the Anthropic Claude documentation.

**Tip**  
We recommend that you use the Converse API for integrating tool use into your application. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md). 

**Important**  
Claude Sonnet 4.5 now preserves intentional formatting in tool call string parameters. Previously, trailing newlines in string parameters were sometimes incorrectly stripped. This fix ensures that tools requiring precise formatting (like text editors) receive parameters exactly as intended. This is a behind-the-scenes improvement with no API changes required. However, tools with string parameters may now receive values with trailing newlines that were previously stripped.

**Note**  
Claude Sonnet 4.5 includes automatic optimizations to improve model performance. These optimizations may add small amounts of tokens to requests, but you are not billed for these system-added tokens.

You specify the tools that you want to make available to a model in the `tools` field. The following example is for a tool that gets the most popular songs on a radio station. 

```
[
    {
        "name": "top_song",
        "description": "Get the most popular song played on a radio station.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sign": {
                    "type": "string",
                    "description": "The call sign for the radio station for which you want the most popular song. Example calls signs are WZPZ and WKRP."
                }
            },
            "required": [
                "sign"
            ]
        }
    }
]
```

When the model needs a tool to generate a response to a message, it returns information about the requested tool, and the input to the tool, in the message `content` field. It also sets the stop reason for the response to `tool_use`.

```
{
    "id": "msg_bdrk_01USsY5m3XRUF4FCppHP8KBx",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 375,
        "output_tokens": 36
    },
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
            "name": "top_song",
            "input": {
                "sign": "WZPZ"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

In your code, you call the tool on the model's behalf. You then pass the tool result (`tool_result`) in a user message to the model.

```
{
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
            "content": "Elemental Hotel"
        }
    ]
}
```

In its response, the model uses the tool result to generate a response for the original message.

```
{
    "id": "msg_bdrk_012AaqvTiKuUSc6WadhUkDLP",
    "type": "message",
    "role": "assistant",
    "model": "claude-3-sonnet-20240229",
    "content": [
        {
            "type": "text",
            "text": "According to the tool, the most popular song played on radio station WZPZ is \"Elemental Hotel\"."
        }
    ],
    "stop_reason": "end_turn"
}
```
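
Putting the exchange together, your code checks the `stop_reason`, runs the requested tool locally, and builds the follow-up `tool_result` message. The following is a minimal sketch; the local `top_song` lookup is a hypothetical stand-in for your real tool implementation:

```
def top_song(sign):
    # Hypothetical local implementation of the tool.
    return {"WZPZ": "Elemental Hotel"}.get(sign, "Unknown")

def handle_response(response):
    """If the model requested a tool, run it and build the follow-up user message."""
    if response["stop_reason"] != "tool_use":
        return None
    block = next(b for b in response["content"] if b["type"] == "tool_use")
    result = top_song(**block["input"])
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": result,
            }
        ],
    }

# Example using the response shape shown earlier in this section.
model_response = {
    "stop_reason": "tool_use",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_bdrk_01SnXQc6YVWD8Dom5jz7KhHy",
            "name": "top_song",
            "input": {"sign": "WZPZ"},
        }
    ],
}
follow_up = handle_response(model_response)
print(follow_up["content"][0]["content"])
```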

## Fine-grained tool streaming
<a name="model-parameters-anthropic-claude-messages-fine-grained-tool-streaming"></a>

Fine-grained tool streaming is an Anthropic Claude model capability available with Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, and Claude Opus 4. With fine-grained tool streaming, Claude developers can stream tool use parameters without buffering or JSON validation, reducing the latency to begin receiving large parameters.

**Note**  
When using fine-grained tool streaming, you may potentially receive invalid or partial JSON inputs. Please make sure to account for these edge cases in your code.

To use this feature, add `fine-grained-tool-streaming-2025-05-14` to the `anthropic_beta` field of a tool use request.

The following example shows how to specify the fine-grained tool streaming beta value:

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 1024,
  "anthropic_beta": ["fine-grained-tool-streaming-2025-05-14"],
  "messages": [
    {
      "role": "user",
      "content": "Can you write a long poem and make a file called poem.txt?"
    }
  ],
  "tools": [
    {
      "name": "make_file",
      "description": "Write text to a file",
      "input_schema": {
        "type": "object",
        "properties": {
          "filename": {
            "type": "string",
            "description": "The filename to write text to"
          },
          "lines_of_text": {
            "type": "array",
            "description": "An array of lines of text to write to the file"
          }
        },
        "required": [
          "filename",
          "lines_of_text"
        ]
      }
    }
  ]
}
```

In this example, fine-grained tool streaming enables Claude to stream the lines of a long poem into the tool call `make_file` without buffering to validate if the `lines_of_text` parameter is valid JSON. This means you can see the parameter stream as it arrives, without having to wait for the entire parameter to buffer and validate.

With fine-grained tool streaming, tool use chunks start streaming faster, and are often longer and contain fewer word breaks. This is due to differences in chunking behavior.

For example, without fine-grained streaming (15s delay):

```
Chunk 1: '{"'
Chunk 2: 'query": "Ty'
Chunk 3: 'peScri'
Chunk 4: 'pt 5.0 5.1 '
Chunk 5: '5.2 5'
Chunk 6: '.3'
Chunk 8: ' new f'
Chunk 9: 'eatur'
...
```

With fine-grained streaming (3s delay):

```
Chunk 1: '{"query": "TypeScript 5.0 5.1 5.2 5.3'
Chunk 2: ' new features comparison'
```

**Note**  
Because fine-grained streaming sends parameters without buffering or JSON validation, there is no guarantee that the resulting stream will complete in a valid JSON string. Particularly, if the stop reason `max_tokens` is reached, the stream may end midway through a parameter and may be incomplete. You will generally have to write specific support to handle when `max_tokens` is reached.
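
In practice, this means your code should accumulate the streamed fragments and tolerate a parse failure at the end of the stream. A minimal sketch with simulated fragments:

```
import json

def parse_tool_input(fragments):
    """Join streamed fragments and parse them, reporting partial input gracefully."""
    buffer = "".join(fragments)
    try:
        return json.loads(buffer), True
    except json.JSONDecodeError:
        # The stream ended mid-parameter (for example, on max_tokens);
        # return the raw text so the caller can decide how to recover.
        return buffer, False

complete = ['{"query": "TypeScript 5.0 5.1 5.2 5.3', ' new features comparison"}']
parsed, ok = parse_tool_input(complete)
print(ok, parsed)

truncated = ['{"query": "TypeScript 5.0']
parsed, ok = parse_tool_input(truncated)
print(ok, parsed)
```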

## Computer use (Beta)
<a name="model-parameters-anthropic-claude-messages-computer-use"></a>

Computer use is an Anthropic Claude model capability (in beta) available with Claude 3.5 Sonnet v2, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. With computer use, Claude can help you automate tasks through basic GUI actions.

**Warning**  
Computer use feature is made available to you as a ‘Beta Service’ as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA. Please be aware that the Computer Use API poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the Computer Use API to interact with the Internet. To minimize risks, consider taking precautions such as:  
Operate computer use functionality in a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
Avoid giving the Computer Use API access to sensitive accounts or data, to prevent information theft.
Limit the Computer Use API's internet access to required domains, to reduce exposure to malicious content.
Keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service), to ensure proper oversight.
Any content that you enable Claude to see or access can potentially override instructions or cause Claude to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Claude from sensitive surfaces, is essential — including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, please inform end users of any relevant risks, and obtain their consent as appropriate. 

The computer use API offers several pre-defined computer use tools for you to use. You can then create a prompt with your request, such as “send an email to Ben with the notes from my last meeting”, and a screenshot (when required). The response contains a list of `tool_use` actions in JSON format (for example, `scroll_down`, `left_button_press`, `screenshot`). Your code runs the computer actions and provides Claude with screenshots showing the output (when requested).

Since the release of Claude 3.5 v2, the tools parameter has been updated to accept polymorphic tool types; a `tool.type` property was added to distinguish them. `type` is optional; if omitted, the tool is assumed to be a custom tool (previously the only tool type supported). To access computer use, you must use the `anthropic_beta` parameter, with a corresponding enum, whose value depends on the model version in use. See the following table for more information.

Only requests made with this parameter and enum can use the computer use tools. It can be specified as follows: `"anthropic_beta": ["computer-use-2025-01-24"]`.


| Model | Beta header | 
| --- | --- | 
|  Claude Opus 4.5 Claude Opus 4.1 Claude Opus 4 Claude Sonnet 4.5 Claude Haiku 4.5 Claude Sonnet 4 Claude 3.7 Sonnet  | computer-use-2025-01-24 | 
| Claude 3.5 Sonnet v2 | computer-use-2024-10-22 | 

For more information, see [Computer use (beta)](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) in the Anthropic documentation.

The following is an example response that assumes the request contained a screenshot of your desktop with a Firefox icon. 

```
{
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "model": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "content": [
        {
            "type": "text",
            "text": "I see the Firefox icon. Let me click on it and then navigate to a weather website."
        },
        {
            "type": "tool_use",
            "id": "toolu_123",
            "name": "computer",
            "input": {
                "action": "mouse_move",
                "coordinate": [
                    708,
                    736
                ]
            }
        },
        {
            "type": "tool_use",
            "id": "toolu_234",
            "name": "computer",
            "input": {
                "action": "left_click"
            }
        }
    ],
    "stop_reason": "tool_use",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 3391,
        "output_tokens": 132
    }
}
```

## Anthropic defined tools
<a name="model-parameters-anthropic-anthropic-defined-tools"></a>

Anthropic provides a set of tools that enable certain Claude models to effectively use computers. When you specify an Anthropic defined tool, the `description` and `tool_schema` fields are not necessary or allowed. Anthropic defined tools are defined by Anthropic, but you must explicitly evaluate the results of the tool and return the `tool_results` to Claude. As with any tool, the model does not automatically execute the tool. Each Anthropic defined tool has versions optimized for specific models, as shown in the following table:


| Model | Tool | Notes | 
| --- | --- | --- | 
|  Claude Opus 4.1 Claude Opus 4 Claude Sonnet 4.5 Claude Haiku 4.5 Claude Sonnet 4  |  <pre>{ <br />    "type": "text_editor_20250124", <br />    "name": "str_replace_based_edit_tool" <br />}</pre>  | Update to existing `str_replace_editor` tool | 
|  Claude 3.7 Sonnet  |  <pre>{ <br />    "type": "computer_20250124", <br />    "name": "computer" <br />}</pre>  |  Includes new actions for more precise control  | 
|  Claude 3.7 Sonnet  |  <pre>{ <br />    "type": "text_editor_20250124", <br />    "name": "str_replace_editor"<br />}</pre>  | Same capabilities as 20241022 version | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "bash_20250124", <br />    "name": "bash" <br />}</pre>  |  Same capabilities as 20241022 version  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "text_editor_20241022", <br />    "name": "str_replace_editor"<br />}</pre>  |  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "bash_20241022", <br />    "name": "bash"<br />}</pre>  |  | 
|  Claude 3.5 Sonnet v2  |  <pre>{ <br />    "type": "computer_20241022", <br />    "name": "computer"<br />}</pre>  |  | 

The `type` field identifies the tool and its parameters for validation purposes; the `name` field is the tool name exposed to the model.

If you want to prompt the model to use one of these tools, you can explicitly refer to the tool by its `name` field. The `name` field must be unique within the tool list; you cannot define a tool with the same `name` as an Anthropic defined tool in the same API call.

## Automatic tool call clearing (Beta)
<a name="model-parameters-anthropic-claude-automatic-tool-call-clearing"></a>

**Warning**  
Automatic tool call clearing is made available as a "Beta Service" as defined in the AWS Service Terms.

**Note**  
This feature is currently supported on Claude Sonnet 4/4.5, Claude Haiku 4.5, and Claude Opus 4/4.1/4.5.

Automatic tool call clearing is an Anthropic Claude model capability (in beta). With this feature, Claude can automatically clear old tool use results as you approach token limits, allowing for more efficient context management in multi-turn tool use scenarios. To use tool call clearing, add `context-management-2025-06-27` to the list of beta headers in the `anthropic_beta` request parameter. You also need to specify the `clear_tool_uses_20250919` strategy and choose from the following configuration options.

These are the available controls for the `clear_tool_uses_20250919` context management strategy. All are optional or have defaults:


| **Configuration Option** | **Description** | 
| --- | --- | 
|  `trigger` default: 100,000 input tokens  |  Defines when the context editing strategy activates. Once the prompt exceeds this threshold, clearing begins. You can specify this value in either `input_tokens` or `tool_uses`.  | 
|  `keep` default: 3 tool uses  |  Defines how many recent tool use/result pairs to keep after clearing occurs. The API removes the oldest tool interactions first, preserving the most recent ones. Helpful when the model needs access to recent tool interactions to continue the conversation effectively.  | 
|  `clear_at_least` (optional)  |  Ensures a minimum number of tokens are cleared each time the strategy activates. If the API can't clear at least the specified amount, the strategy will not be applied. This is useful for determining whether context clearing is worth breaking your prompt cache for.  | 
|  `exclude_tools` (optional)  |  List of tool names whose tool uses and results should never be cleared. Useful for preserving important context.  | 
|  `clear_tool_inputs` (optional, default False)  |  Controls whether the tool call parameters are cleared along with the tool results. By default, only the tool results are cleared while keeping Claude's original tool calls visible, so Claude can see what operations were performed even after the results are removed.  | 

**Note**  
Tool clearing will invalidate your cache if your prefixes contain your tools.

------
#### [ Request ]

```
response = client.beta.messages.create(
    betas=["context-management-2025-06-27"],
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Create a simple command line calculator app using Python"
        }
    ],
    tools=[
        {
            "type": "text_editor_20250728",
            "name": "str_replace_based_edit_tool",
            "max_characters": 10000
        },
        {
            "type": "web_search_20250305",
            "name": "web_search",
            "max_uses": 3
        }
    ],
    extra_body={
        "context_management": {
            "edits": [
                {
                    "type": "clear_tool_uses_20250919",
                    # The following parameters are OPTIONAL:
                    # Trigger clearing when this threshold is exceeded
                    "trigger": {
                        "type": "input_tokens",
                        "value": 30000
                    },
                    # Number of tool uses to keep after clearing
                    "keep": {
                        "type": "tool_uses",
                        "value": 3
                    },
                    # Clear at least this many tokens
                    "clear_at_least": {
                        "type": "input_tokens",
                        "value": 5000
                    },
                    # Exclude these tools' uses from being cleared
                    "exclude_tools": ["web_search"]
                }
            ]
        }
    }
)
```

------
#### [ Response ]

```
{
    "id": "msg_123",
    "type": "message",
    "role": "assistant",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_456",
            "name": "data_analyzer",
            "input": {
                "data": "sample data"
            }
        }
    ],
    "context_management": {
        "applied_edits": [
            {
                "type": "clear_tool_uses_20250919",
                "cleared_tool_uses": 8,  # Number of tool use/result pairs that were cleared
                "cleared_input_tokens": 50000  # Total number of input tokens removed from the prompt
            }
        ]
    },
    "stop_reason": "tool_use",
    "usage": {
        "input_tokens": 150,
        "output_tokens": 50
    }
}
```

------
#### [ Streaming Response ]

```
data: {"type": "message_start", "message": {"id": "msg_123", "type": "message", "role": "assistant"}}

data: {"type": "content_block_start", "index": 0, "content_block": {"type": "tool_use", "id": "toolu_456", "name": "data_analyzer", "input": {}}}

data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": "{\"data\": \"sample"}}

data: {"type": "content_block_delta", "index": 0, "delta": {"type": "input_json_delta", "partial_json": " data\"}"}}

data: {"type": "content_block_stop", "index": 0}

data: {"type": "message_delta", "delta": {"stop_reason": "tool_use"}}

data: {"type": "message_stop"}

{
  "type": "message_delta",
  "delta": {
    "stop_reason": "end_turn",
    "stop_sequence": null
  },
  "usage": {
    "output_tokens": 1024
  },
  "context_management": {
    "applied_edits": [...]
  }
}
```

------

**Note**  
Bedrock does not currently support `clear_tool_uses_20250919` context management on the CountTokens API.

## Memory Tool (Beta)
<a name="model-parameters-anthropic-claude-memory-tool"></a>

**Warning**  
Memory Tool is made available as a "Beta Service" as defined in the AWS Service Terms.

Claude Sonnet 4.5 includes a new memory tool that provides customers a way to manage memory across conversations. With this feature, customers can allow Claude to retrieve information outside the context window by providing access to a local directory. This capability is available as a beta feature. To use it, you must include the `context-management-2025-06-27` beta header.

Tool definition:

```
{
  "type": "memory_20250818",
  "name": "memory"
}
```

Example Request:

```
{
    "max_tokens": 2048,
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["context-management-2025-06-27"],
    "tools": [{
        "type": "memory_20250818",
        "name": "memory"
    }],
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Remember that my favorite color is blue and I work at Amazon?"}]
        }
    ]
}
```

Example Response:

```
{
    "id": "msg_vrtx_014mQ5ficCRB6PEa5k5sKqHd",
    "type": "message",
    "role": "assistant",
    "model": "claude-sonnet-4-20250514",
    "content": [
        {
            "type": "text",
            "text": "I'll start by checking your memory directory and then record this important information about you."
        },
        {
            "type": "tool_use",
            "id": "toolu_vrtx_01EU1UrCDigyPMRntr3VYvUB",
            "name": "memory",
            "input": {
                "command": "view",
                "path": "/memories"
            }
        }
    ],
    "stop_reason": "tool_use",
    "stop_sequence": null,
    "usage": {
        "input_tokens": 1403,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "output_tokens": 87
    },
    "context_management": {
        "applied_edits": []
    }
}
```
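The model drives the memory tool entirely through `tool_use` blocks like the `view` command above; your application executes each command against its local directory and returns the result in a `tool_result` block. The following is a minimal sketch of such a handler, assuming a `view` command that lists directories or reads files and a hypothetical `create` command for writing files (consult Anthropic's memory tool documentation for the actual command set):

```python
import os

def handle_memory_command(command_input, base_dir="/tmp/memories"):
    """Execute a memory tool command against a local directory (sketch)."""
    command = command_input.get("command")
    # Map the tool's virtual /memories path onto a local directory.
    rel = command_input.get("path", "/memories").removeprefix("/memories").lstrip("/")
    path = os.path.join(base_dir, rel) if rel else base_dir

    if command == "view":
        if not os.path.exists(path):
            return "Directory is empty."
        if os.path.isdir(path):
            entries = sorted(os.listdir(path))
            return "\n".join(entries) if entries else "Directory is empty."
        with open(path) as f:
            return f.read()
    if command == "create":  # hypothetical write command, for illustration only
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(command_input.get("file_text", ""))
        return f"Created {command_input.get('path')}"
    return f"Unsupported command: {command}"
```

The string the handler returns would be sent back as the `content` of a `tool_result` block with the matching `tool_use_id`.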

## Cost considerations for tool use
<a name="model-parameters-anthropic-claude-tool-use-cost"></a>

Tool use requests are priced based on the following factors:

1. The total number of input tokens sent to the model (including in the tools parameter).

1. The number of output tokens generated.

Tools are priced the same as all other Claude API requests, but do include additional tokens per request. The additional tokens from tool use come from the following:
+ The `tools` parameter in the API requests. For example, tool names, descriptions, and schemas.
+ Any `tool_use` content blocks in API requests and responses.
+ Any `tool_result` content blocks in API requests.

When you use tools, the Anthropic models automatically include a special system prompt that enables tool use. The number of tool use tokens required for each model is listed in the following table. This table excludes the additional tokens described previously. Note that this table assumes at least one tool is provided. If no tools are provided, then a tool choice of none uses 0 additional system prompt tokens.


| Model | Tool choice | Tool use system prompt token count | 
| --- | --- | --- | 
|  Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 Sonnet v2  | auto or none | 346 | 
|  Claude Opus 4.5, Claude Opus 4.1, Claude Opus 4, Claude Sonnet 4.5, Claude Haiku 4.5, Claude Sonnet 4, Claude 3.7 Sonnet, Claude 3.5 Sonnet v2  | any or tool | 313 | 
|  Claude 3.5 Sonnet  | auto or none | 294 | 
|  Claude 3.5 Sonnet  | any or tool | 261 | 
|  Claude 3 Opus  | auto or none | 530 | 
|  Claude 3 Opus  | any or tool | 281 | 
|  Claude 3 Sonnet  | auto or none | 159 | 
|  Claude 3 Sonnet  | any or tool | 235 | 
|  Claude 3 Haiku  | auto or none | 264 | 
|  Claude 3 Haiku  | any or tool | 340 | 
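For budgeting purposes, the table above can be folded into a simple lookup so you can estimate the fixed system prompt overhead before sending a request. The family groupings and key names in this sketch are illustrative simplifications; the token counts come from the table:

```python
# Tool use system prompt token overhead, taken from the table above.
# "claude-4-family" here groups the Claude 4.x / 3.7 Sonnet / 3.5 Sonnet v2 row.
TOOL_USE_OVERHEAD = {
    "claude-4-family": {"auto": 346, "none": 346, "any": 313, "tool": 313},
    "claude-3-5-sonnet": {"auto": 294, "none": 294, "any": 261, "tool": 261},
    "claude-3-opus": {"auto": 530, "none": 530, "any": 281, "tool": 281},
    "claude-3-sonnet": {"auto": 159, "none": 159, "any": 235, "tool": 235},
    "claude-3-haiku": {"auto": 264, "none": 264, "any": 340, "tool": 340},
}

def tool_use_overhead(model_family: str, tool_choice: str = "auto") -> int:
    """Return the extra system prompt tokens added when at least one tool is present."""
    return TOOL_USE_OVERHEAD[model_family][tool_choice]
```

Remember that this is only the fixed overhead; the tokens from tool definitions, `tool_use` blocks, and `tool_result` blocks are billed on top of it.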

## Tool search tool (beta)
<a name="model-parameters-anthropic-claude-tool-search-tool"></a>

Tool Search Tool allows Claude to work with hundreds or even thousands of tools without loading all their definitions into the context window upfront. Instead of declaring all tools immediately, you can mark them with `defer_loading: true`, and Claude finds and loads only the tools it needs through the tool search mechanism.

To access this feature you must use the beta header `tool-search-tool-2025-10-19`. Note that this feature is currently only available via the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) APIs.

Tool definition:

```
{
    "type": "tool_search_tool_regex",
    "name": "tool_search_tool_regex"
}
```

Request example:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": [
        "tool-search-tool-2025-10-19"
    ],
    "max_tokens": 4096,
    "tools": [{
            "type": "tool_search_tool_regex",
            "name": "tool_search_tool_regex"
        },
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            },
            "defer_loading": true
        },
        {
            "name": "search_files",
            "description": "Search through files in the workspace",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string"
                    },
                    "file_types": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        }
                    }
                },
                "required": ["query"]
            },
            "defer_loading": true
        }
    ],
    "messages": [{
        "role": "user",
        "content": "What's the weather in Seattle?"
    }]
}
```

Response example:

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I'll search for the appropriate tools to help with this task."
        },
        {
            "type": "server_tool_use",
            "id": "srvtoolu_01ABC123",
            "name": "tool_search_tool_regex",
            "input": {
                "pattern": "weather"
            }
        },
        {
            "type": "tool_search_tool_result",
            "tool_use_id": "srvtoolu_01ABC123",
            "content": {
                "type": "tool_search_tool_search_result",
                "tool_references": [{
                    "type": "tool_reference",
                    "tool_name": "get_weather"
                }]
            }
        },
        {
            "type": "text",
            "text": "Now I can check the weather."
        },
        {
            "type": "tool_use",
            "id": "toolu_01XYZ789",
            "name": "get_weather",
            "input": {
                "location": "Seattle",
                "unit": "fahrenheit"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

Streaming example:

```
# Event 1: content_block_start (with complete server_tool_use block)
{
    "type": "content_block_start",
    "index": 0,
    "content_block": {
        "type": "server_tool_use",
        "id": "srvtoolu_01ABC123",
        "name": "tool_search_tool_regex"
    }
}

# Event 2: content_block_delta (input JSON streamed)
{
    "type": "content_block_delta",
    "index": 0,
    "delta": {
        "type": "input_json_delta",
        "partial_json": "{\"regex\": \".*weather.*\"}"
    }
}

# Event 3: content_block_stop (tool_use complete)
{
    "type": "content_block_stop",
    "index": 0
}

# Event 4: content_block_start (complete result in a single chunk)
{
    "type": "content_block_start",
    "index": 1,
    "content_block": {
        "type": "tool_search_tool_result",
        "tool_use_id": "srvtoolu_01ABC123",
        "content": {
            "type": "tool_search_tool_search_result",
            "tool_references": [{
                "type": "tool_reference",
                "tool_name": "get_weather"
            }]
        }
    }
}

# Event 5: content_block_stop (result complete)
{
    "type": "content_block_stop",
    "index": 1
}
```

**Custom tool search tools**  
You can implement custom tool search tools (for example, using embeddings) by defining a tool that returns `tool_reference` blocks. The custom tool must have `defer_loading: false` while other tools should have `defer_loading: true`. When you define your own Tool Search Tool, it should return a tool result containing `tool_reference` content blocks that point to the tools you want Claude to use.

The expected response format for a customer-defined Tool Search Tool result:

```
{
    "type": "tool_result",
    "tool_use_id": "toolu_01ABC123",
    "content": [{
            "type": "tool_reference",
            "tool_name": "get_weather"
        },
        {
            "type": "tool_reference",
            "tool_name": "weather_forecast"
        }
    ]
}
```

The `tool_name` must match a tool defined in the request with `defer_loading: true`. Claude will then have access to those tools' full schemas.

**Custom search tools - Detailed example**  
You can implement custom tool search tools (for example, using embeddings or semantic search) by defining a tool that returns `tool_reference` blocks. This enables sophisticated tool discovery mechanisms beyond regex matching.

Request example with custom TST:

```
{
    "model": "claude-sonnet-4-5-20250929",
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["tool-search-tool-2025-10-19"],
    "max_tokens": 4096,
    "tools": [{
            "name": "semantic_tool_search",
            "description": "Search for available tools using semantic similarity. Returns the most relevant tools for the given query.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Natural language description of what kind of tool is needed"
                    },
                    "top_k": {
                        "type": "integer",
                        "description": "Number of tools to return (default: 5)"
                    }
                },
                "required": ["query"]
            },
            "defer_loading": false
        },
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            },
            "defer_loading": true
        },
        {
            "name": "search_flights",
            "description": "Search for available flights between locations",
            "input_schema": {
                "type": "object",
                "properties": {
                    "origin": {
                        "type": "string"
                    },
                    "destination": {
                        "type": "string"
                    },
                    "date": {
                        "type": "string"
                    }
                },
                "required": ["origin", "destination", "date"]
            },
            "defer_loading": true
        }
    ],
    "messages": [{
        "role": "user",
        "content": "What's the weather forecast in Seattle for the next 3 days?"
    }]
}
```

Claude's response (calling custom TST):

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I'll search for the appropriate tools to help with weather information."
        },
        {
            "type": "tool_use",
            "id": "toolu_01ABC123",
            "name": "semantic_tool_search",
            "input": {
                "query": "weather forecast multiple days",
                "top_k": 3
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

**Customer-provided tool result**  
After performing semantic search on the tool library, the customer returns matching tool references:

```
{
    "role": "user",
    "content": [{
        "type": "tool_search_tool_result",
        "tool_use_id": "toolu_01ABC123",
        "content": {
            "type": "tool_search_tool_search_result",
            "tool_references": [{
                "type": "tool_reference",
                "tool_name": "get_weather"
            }]
        }
    }]
}
```

Claude's follow-up (using the discovered tool):

```
{
    "role": "assistant",
    "content": [{
            "type": "text",
            "text": "I found the forecast tool. Let me get the weather forecast for Seattle."
        },
        {
            "type": "tool_use",
            "id": "toolu_01DEF456",
            "name": "get_weather",
            "input": {
                "location": "Seattle, WA"
            }
        }
    ],
    "stop_reason": "tool_use"
}
```

**Error handling**
+ Setting `defer_loading: true` for all tools (including the Tool Search Tool) throws a 400 error.
+ Passing a `tool_reference` without a corresponding tool definition throws a 400 error.
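Both error conditions can be caught client-side before invoking the model. A rough pre-flight validation sketch (the helper names are hypothetical):

```python
def validate_tool_search_request(tools):
    """Pre-flight checks mirroring the 400 errors described above (sketch)."""
    deferred = {t["name"] for t in tools if t.get("defer_loading")}
    loaded = [t["name"] for t in tools if not t.get("defer_loading")]
    errors = []
    # At least one tool (e.g. the tool search tool itself) must load eagerly.
    if tools and not loaded:
        errors.append("all tools have defer_loading: true; the search tool must be loaded")
    return errors, deferred

def validate_tool_references(references, deferred_names):
    """Every tool_reference must name a tool defined with defer_loading: true."""
    return [r["tool_name"] for r in references
            if r["tool_name"] not in deferred_names]
```

Running these checks before the request avoids a round trip that would end in a 400 response.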

## Tool use examples (beta)
<a name="model-parameters-anthropic-claude-tool-use-examples"></a>

Claude Opus 4.5 supports user-provided examples in tool definitions to improve Claude's tool use performance. You can provide examples as full function calls, formatted exactly as real LLM outputs would be, without translating them into another format. To use this feature, you must pass the beta header `tool-examples-2025-10-29`.

Tool definition example:

```
{
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    },
    "input_examples": [{
            "location": "San Francisco, CA",
            "unit": "fahrenheit"
        },
        {
            "location": "Tokyo, Japan",
            "unit": "celsius"
        },
        {
            "location": "New York, NY"
        }
    ]
}
```

**Validation rules**
+ Schema conformance: Each example in `input_examples` must be valid according to the tool's `input_schema`.
  + Required fields must be present in at least one example.
  + Field types must match the schema.
  + Enum values must be from the allowed set.
  + If validation fails, the API returns a 400 error with details about which example failed validation.
+ Array requirements: `input_examples` must be an array (can be empty).
  + Empty array `[]` is valid and equivalent to omitting the field.
  + Single example must still be wrapped in an array: `[{...}]`
  + Length limit: at most 20 examples per tool definition.
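These rules amount to validating each example against the tool's `input_schema` plus two structural checks. The following hand-rolled sketch covers only required fields, primitive types, and enums, not full JSON Schema validation, so treat it as an approximation of the server-side checks:

```python
def check_input_examples(tool):
    """Validate input_examples against a tool's input_schema (partial sketch)."""
    schema = tool["input_schema"]
    examples = tool.get("input_examples", [])
    if not isinstance(examples, list):
        return ["input_examples must be an array"]
    if len(examples) > 20:
        return ["at most 20 examples per tool definition"]
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "object": dict, "array": list}
    errors = []
    props = schema.get("properties", {})
    for i, ex in enumerate(examples):
        for field, value in ex.items():
            spec = props.get(field, {})
            expected = type_map.get(spec.get("type"))
            if expected and not isinstance(value, expected):
                errors.append(f"input_examples[{i}]: '{field}' has wrong type")
            if "enum" in spec and value not in spec["enum"]:
                errors.append(f"input_examples[{i}]: '{field}' not in enum")
    # Required fields must appear in at least one example.
    for req in schema.get("required", []):
        if examples and not any(req in ex for ex in examples):
            errors.append(f"required property '{req}' missing from all examples")
    return errors
```

An empty list means the examples would pass these local checks; any returned strings correspond to conditions the API reports as `invalid_request_error`.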

Error examples:

```
// Invalid: Example doesn't match schema (missing required field)
{
    "type": "invalid_request_error",
    "message": "Tool 'get_weather' input_examples[0] is invalid: Missing required property 'location'"
}

// Invalid: Example has wrong type for field
{
    "type": "invalid_request_error",
    "message": "Tool 'search_products' input_examples[1] is invalid: Property 'filters.price_range.min' must be a number, got string"
}

// Invalid: input_examples on server-side tool
{
    "type": "invalid_request_error",
    "message": "input_examples is not supported for server-side tool"
}
```

# Extended thinking
<a name="claude-messages-extended-thinking"></a>

Extended thinking gives Claude enhanced reasoning capabilities for complex tasks, while providing varying levels of transparency into its step-by-step thought process before it delivers its final answer. Whenever you enable Claude’s thinking mode, you will need to set a budget for the maximum number of tokens that Claude can use for its internal reasoning process.

The supported models are as follows:


| Model | Model ID | 
| --- | --- | 
| Claude Opus 4.5 | `anthropic.claude-opus-4-5-20251101-v1:0` | 
| Claude Opus 4 | `anthropic.claude-opus-4-20250514-v1:0` | 
| Claude Sonnet 4 | `anthropic.claude-sonnet-4-20250514-v1:0` | 
| Claude Sonnet 4.5 | `anthropic.claude-sonnet-4-5-20250929-v1:0` | 
| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | 
| Claude 3.7 Sonnet | `anthropic.claude-3-7-sonnet-20250219-v1:0` | 

**Note**  
API behavior differs between Claude 3.7 and Claude 4 models. For more information, see [Differences in thinking across model versions](claude-messages-thinking-differences.md).

**Topics**
+ [Best practices and considerations for extended thinking](#claude-messages-extended-thinking-bps)
+ [How extended thinking works](#claude-messages-how-extended-thinking-works)
+ [How to use extended thinking](#claude-messages-use-extended-thinking)
+ [Extended thinking with tool use](#claude-messages-extended-thinking-tool-use)
+ [Thinking block clearing (beta)](#claude-messages-thinking-block-clearing)
+ [Extended thinking with prompt caching](#claude-messages-extended-thinking-prompt-caching)
+ [Understanding thinking block caching behavior](#claude-messages-extended-thinking-caching-behavior)
+ [Max tokens and context window size with extended thinking](#claude-messages-extended-thinking-max-tokens)
+ [Extended thinking token cost considerations](#claude-messages-extended-thinking-cost)

## Best practices and considerations for extended thinking
<a name="claude-messages-extended-thinking-bps"></a>

Usage guidelines
+ **Task selection**: Use extended thinking for particularly complex tasks that benefit from step-by-step reasoning like math, coding, and analysis.
+ **Context handling**: You do not need to remove previous thinking blocks yourself. The Anthropic API automatically ignores thinking blocks from previous turns and they are not included when calculating context usage.
+ **Prompt engineering**: Review Anthropic's [extended thinking prompting tips](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/extended-thinking-tips) if you want to maximize Claude's thinking capabilities.

Performance considerations
+ **Response times**: Be prepared for potentially longer response times due to the additional processing required for the reasoning process. Factor in that generating thinking blocks might increase the overall response time.
+ **Streaming requirements**: Streaming is required when `max_tokens` is greater than 21,333. When streaming, be prepared to handle both `thinking` and `text` content blocks as they arrive.

Feature compatibility
+ Thinking isn't compatible with `temperature`, `top_p`, or `top_k` modifications or [forced tool use](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use#forcing-tool-use).
+ You cannot pre-fill responses when thinking is enabled.
+ Changes to the thinking budget invalidate cached prompt prefixes that include messages. However, cached system prompts and tool definitions will continue to work when thinking parameters change.

Working with thinking budgets
+ **Budget optimization**: The minimum budget is 1,024 tokens. We suggest starting at the minimum and increasing the thinking budget incrementally to find the optimal range for your use case. Higher token counts might allow for more comprehensive and nuanced reasoning, but there can also be diminishing returns depending on the task. The thinking budget is a target rather than a strict limit; actual token usage can vary based on the task.
+ **Experimentation**: The model might perform differently at different max thinking budget settings. Increasing the max thinking budget can make the model think better or harder, at the tradeoff of increased latency. For critical tasks, consider testing different budget settings to find the optimal balance between quality and performance.
+ **Large budgets**: For thinking budgets above 32K, we recommend using batch processing to avoid networking issues. Requests that push the model to think above 32K tokens cause long-running requests that might run into system timeouts and open connection limits. Note that `max_tokens` limits vary among Claude models. For more information, see [Max tokens and context window size with extended thinking](#claude-messages-extended-thinking-max-tokens).
+ **Token usage tracking**: Monitor thinking token usage to optimize costs and performance.
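The guidance above can be captured in a small pre-flight helper. The 1,024-token minimum, the 21,333-token streaming threshold, and the 32K batch recommendation come from this page; the helper itself and its field names are illustrative:

```python
MIN_THINKING_BUDGET = 1024
STREAMING_REQUIRED_ABOVE = 21_333  # max_tokens above this requires streaming
BATCH_RECOMMENDED_ABOVE = 32_000   # budgets above 32K: prefer batch processing

def plan_thinking_request(budget_tokens: int, max_tokens: int) -> dict:
    """Sanity-check a thinking budget and flag operational requirements."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens "
                         "(unless using interleaved thinking)")
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "max_tokens": max_tokens,
        "needs_streaming": max_tokens > STREAMING_REQUIRED_ABOVE,
        "prefer_batch": budget_tokens > BATCH_RECOMMENDED_ABOVE,
    }
```

The returned `thinking` object can be spliced directly into the request body shown in the examples below.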

## How extended thinking works
<a name="claude-messages-how-extended-thinking-works"></a>

When extended thinking is turned on, Claude creates `thinking` content blocks where it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting a final response. The API response will include `thinking` content blocks, followed by `text` content blocks.

Here’s an example of the default response format:

```
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text", 
      "text": "Based on my analysis..."
    }
  ]
}
```

For more information about the response format of extended thinking, see Anthropic’s Messages API [Request and Response](model-parameters-anthropic-claude-messages-request-response.md).

## How to use extended thinking
<a name="claude-messages-use-extended-thinking"></a>

To turn on extended thinking, add a `thinking` object, with the `type` parameter set to `enabled` and `budget_tokens` set to a specified token budget for extended thinking.

The `budget_tokens` parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. In Claude 4 models, this limit applies to full thinking tokens, and not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis for complex problems, although Claude might not use the entire budget allocated, especially at ranges above 32K.

The value of `budget_tokens` must be less than `max_tokens`. However, when using [Interleaved thinking (beta)](#claude-messages-extended-thinking-tool-use-interleaved) with tools, you can exceed this limit because the token limit becomes your entire context window (200K tokens).

### Summarized thinking
<a name="claude-messages-use-extended-thinking-summarized"></a>

With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude’s full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking, while preventing misuse.

Here are some important considerations for summarized thinking:
+ You’re charged for the full thinking tokens generated by the original request, not the summary tokens.
+ The billed output token count will not match the count of tokens you see in the response.
+ The prompt provided to the summarizer model is subject to change.
+ The first few lines of thinking output are more verbose, providing detailed reasoning that's particularly helpful for prompt engineering purposes.

**Note**  
Claude 3.7 Sonnet still returns the full thinking output.  
To access the full thinking output for Claude 4 models, contact your account team.

### Streaming thinking
<a name="claude-messages-use-extended-thinking-streaming"></a>

You can stream extended thinking responses using server-sent events (SSE). When streaming is enabled for extended thinking, you receive thinking content via `thinking_delta` events. Streamed events are not guaranteed to return at a constant rate. There can be delays between streaming events. For more documentation on streaming via the Messages API, see [Streaming messages](https://docs.anthropic.com/en/docs/build-with-claude/streaming).

Here’s how to handle streaming with thinking using **InvokeModelWithResponseStream**:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is 27 * 453?"
        }
    ]
}
```

Response:

```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}
```
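When you consume such a stream, a common pattern is to fold `content_block_start` and `content_block_delta` events into per-index buffers so thinking and text accumulate separately. This sketch operates on already-parsed event dicts and ignores transport and signature handling:

```python
def accumulate_stream(events):
    """Fold content_block_* events into completed thinking/text blocks."""
    blocks = {}  # index -> {"type": ..., "text": ...}
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            block = event["content_block"]
            blocks[event["index"]] = {"type": block["type"], "text": ""}
        elif kind == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                blocks[event["index"]]["text"] += delta["thinking"]
            elif delta["type"] == "text_delta":
                blocks[event["index"]]["text"] += delta["text"]
            # signature_delta and input_json_delta handling omitted for brevity
    return [blocks[i] for i in sorted(blocks)]
```

Applied to the response above, this would yield one `thinking` block followed by one `text` block, each with its deltas concatenated in order.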

**About streaming behavior with thinking**  
When using streaming with thinking enabled, you might notice that text sometimes arrives in larger chunks alternating with smaller, token-by-token delivery. This is expected behavior, especially for thinking content. The streaming system needs to process content in batches for optimal performance, which can result in this delivery pattern.

## Extended thinking with tool use
<a name="claude-messages-extended-thinking-tool-use"></a>

Extended thinking can be used alongside [Tool use](model-parameters-anthropic-claude-messages-tool-use.md), allowing Claude to reason through tool selection and results processing. When using extended thinking with tool use, be aware of the following limitations:
+ **Tool choice limitation**: Tool use with thinking only supports `tool_choice: auto` (the default) or `none`. It does not support `any` or forcing a specific tool, consistent with the forced tool use restriction noted earlier.
+ **Preserving thinking blocks**: During tool use, you must pass thinking blocks back to the API for the last assistant message. Include the complete unmodified block back to the API to maintain reasoning continuity.

Here is an example request that uses extended thinking with tools:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 10000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
    },
  "tools": [
  {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string"
        }
      },
      "required": [
        "location"
      ]
    }
  }
],
    "messages": [
        {
            "role": "user",
            "content": "What's the weather in Paris?"
        }
    ]
}
```

The first response is the following:

```
{
    "content": [
        {
            "type": "thinking",
            "thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`...",
            "signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxYsNrcs...."
        },
        {
            "type": "text",
            "text": "I can help you get the current weather information for Paris. Let me check that for you"
        },
        {
            "type": "tool_use",
            "id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
            "name": "get_weather",
            "input": {
                "location": "Paris"
            }
        }
    ]
}
```

Continuing the conversation with tool use will generate another response. Notice that the thinking block is passed back along with the `tool_use` block. If the thinking block is not passed in, an error occurs.

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 10000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4000
  },
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string"
          }
        },
        "required": [
          "location"
        ]
      }
    }
  ],
      "messages": [
        {
          "role": "user",
          "content": "What's the weather in Paris?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "thinking",
              "thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`…",
              "signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxY"
            },
            {
              "type": "tool_use",
              "id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
              "name": "get_weather",
              "input": {
                "location": "Paris"
              }
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "tool_use_id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
              "content": "Current temperature: 88°F"
            }
          ]
        }
      ]
    }
```

The API response will now only include text:

```
{
  "content": [
    {
      "type": "text",
      "text": "Currently in Paris, the temperature is 88°F (31°C)"
    }
  ]
}
```

### Preserve thinking blocks
<a name="claude-messages-extended-thinking-tool-use-thinking-blocks"></a>

During tool use, you must pass thinking blocks back to the API, and you must include the complete unmodified block back to the API. This is critical for maintaining the model’s reasoning flow and conversation integrity.

**Tip**  
While you can omit `thinking` blocks from prior `assistant` role turns, we suggest always passing back all thinking blocks to the API for any multi-turn conversation. The API will do the following:  
+ Automatically filter the provided thinking blocks
+ Use the relevant thinking blocks necessary to preserve the model's reasoning
+ Only bill for the input tokens for the blocks shown to Claude

When Claude invokes tools, it pauses construction of its response to await external information. When tool results are returned, Claude continues building that existing response. This necessitates preserving thinking blocks during tool use, for the following reasons:
+ **Reasoning continuity**: The thinking blocks capture Claude's step-by-step reasoning that led to tool requests. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
+ **Context maintenance**: While tool results appear as user messages in the API structure, they’re part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls.

**Important**  
When providing thinking blocks, the entire sequence of consecutive thinking blocks must match the outputs generated by the model during the original request; you cannot rearrange or modify the sequence of these blocks.
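Mechanically, preserving thinking blocks means the assistant turn you append before the tool result must carry the model's content blocks exactly as returned. A small helper for building the continuation request (a sketch; the message shapes follow the examples above):

```python
def continue_with_tool_result(messages, assistant_content, tool_use_id, result):
    """Append the unmodified assistant turn, then the tool result, for the next call."""
    return messages + [
        # Pass back the complete, unmodified content blocks (thinking included).
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }]},
    ]
```

Because `assistant_content` is reused verbatim, the sequence and contents of the thinking blocks match the model's original output, which is what the API requires.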

### Interleaved thinking (beta)
<a name="claude-messages-extended-thinking-tool-use-interleaved"></a>

**Warning**  
Interleaved thinking is made available to you as a ‘Beta Service’ as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA.

Claude 4 models support interleaved thinking, a feature that enables Claude to think between tool calls and run more sophisticated reasoning after receiving tool results. This allows for more complex agentic interactions where Claude can do the following:
+ Reason about the results of a tool call before deciding what to do next
+ Chain multiple tool calls with reasoning steps in between
+ Make more nuanced decisions based on intermediate results

To enable interleaved thinking, add the beta header `interleaved-thinking-2025-05-14` to your API request.

**Note**  
With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter because it represents the total budget across all thinking blocks within one assistant turn.
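For example, a request body with interleaved thinking enabled might look like the following. The field values are illustrative, and the `anthropic_beta` placement follows the other beta examples in this topic:

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "anthropic_beta": [
    "interleaved-thinking-2025-05-14"
  ],
  "max_tokens": 8000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 12000
  },
  "messages": [
    {
      "role": "user",
      "content": "Plan a three-step analysis and use tools between steps."
    }
  ]
}
```

Here `budget_tokens` (12000) exceeds `max_tokens` (8000), which is permitted only with interleaved thinking because the budget spans all thinking blocks within one assistant turn.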

## Thinking block clearing (beta)
<a name="claude-messages-thinking-block-clearing"></a>

**Warning**  
Thinking block clearing is made available as a "Beta Service" as defined in the AWS Service Terms.

**Note**  
This feature is currently supported on Claude Sonnet 4/4.5, Claude Haiku 4.5, and Claude Opus 4/4.1/4.5.

Thinking block clearing is an Anthropic Claude model capability (in beta). With this feature, Claude can automatically clear older thinking blocks from previous turns. To use thinking block clearing, add `context-management-2025-06-27` to the list of beta headers in the `anthropic_beta` request parameter. You also need to specify the `clear_thinking_20251015` strategy and choose from the following configuration options.

These are the available controls for the `clear_thinking_20251015` context management strategy. All are optional or have defaults:


| **Configuration Option** | **Description** | 
| --- | --- | 
|  `keep` default: 1 thinking turn  |  Defines how many recent assistant turns with thinking blocks to preserve. Use `{"type": "thinking_turns", "value": N}` where N must be > 0 to keep the last N turns, or `{"type": "all"}` to keep all thinking blocks.  | 

------
#### [ Request ]

```
{
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 10000,
      "anthropic_beta": [
        "context-management-2025-06-27"
      ],
      "thinking": {
        "type": "enabled",
        "budget_tokens": 4000
      },
      "tools": [
        {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "input_schema": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string"
              }
            },
            "required": [
              "location"
            ]
          }
        }
      ],
      "messages": [
        {
          "role": "user",
          "content": "What's the weather in Paris?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "thinking",
              "thinking": "The user is asking for the weather in Paris. I have access to a get_weather function that takes a location as a parameter. I have all the information I need to make this call - the location is \"Paris\".\n\nLet me call the get_weather function with \"Paris\" as the location.",
              "signature": "ErgDCkgIChABGAIqQC/Ccv8GC+5VfcMEiq78XmpU2Ef2cT+96pHKMedKcRNuPz1x0kFlo5HBpW0r1NcQFVQUPuj6PDmP7jdHY7GsrUwSDKNBMogjaM7wYkwfPhoMswjlmfF09JLjZfFlIjB03NkghGOxLbr3VCQHIY0lMaV9UBvt7ZwTpJKzlz+mulBysfvAmDfcnvdJ/6CZre4qnQJsTZaiXdEgASwPIc5jOExBguerrtYSWVC/oPjSi7KZM8PfhP/SPXupyLi8hwYxeqomqkeG7AQhD+3487ecerZJcpJSOSsf0I1OaMpmQEE/b7ehnvTV/A4nLhxIjP4msyIBW+dVwHNFRFlpJLBHUJvN99b4run6YmqBSf4y9TyNMfOr+FtfxedGE0HfJMBd4FHXmUFyW5y91jAHMWqwNxDgacaKkFCAMaqce5rm0ShOxXn1uwDUAS3jeRP26Pynihq8fw5DQwlqOpo7vvXtqb5jjiCmqfOe6un5xeIdhhbzWddhEk1Vmtg7I817pM4MZjVaeQN02drPs8QgDxihnP6ZooGhd6FCBP2X3Ymdlj5zMlbVHxmSkA4wcNtg4IAYAQ=="
            },
            {
              "type": "tool_use",
              "id": "toolu_bdrk_01U7emCvL5v5z5GT7PDr2vzc",
              "name": "get_weather",
              "input": {
                "location": "Paris"
              }
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "tool_use_id": "toolu_bdrk_01U7emCvL5v5z5GT7PDr2vzc",
              "content": "Current temperature: 88°F"
            }
          ]
        }
      ],
      "context_management": {
        "edits": [
          {
            "type": "clear_thinking_20251015",
            "keep": {
              "type": "thinking_turns",
              "value": 1
            }
          }
        ]
      }
    }
```

------
#### [ Response ]

```
{
      "model": "claude-haiku-4-5-20251001",
      "id": "msg_bdrk_01KyTbyFbdG2kzPwWMJY1kum",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "The current weather in Paris is **88°F** (approximately 31°C). It's quite warm! If you need more detailed information like humidity, wind conditions, or a forecast, please let me know."
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 736,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
        "cache_creation": {
          "ephemeral_5m_input_tokens": 0,
          "ephemeral_1h_input_tokens": 0
        },
        "output_tokens": 47
      },
      "context_management": {
        "applied_edits": [...]
      }
    }
```

------

## Extended thinking with prompt caching
<a name="claude-messages-extended-thinking-prompt-caching"></a>

[Prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) with thinking has several important considerations:

**Thinking block context removal**
+ Thinking blocks from previous turns are removed from context, which can affect cache breakpoints.
+ When continuing conversations with tool use, thinking blocks are cached and count as input tokens when read from cache. This creates a tradeoff where thinking blocks don't consume context window space visually, but they will still count towards your input token usage when cached.
+ If thinking is disabled, requests will fail if you pass thinking content in the current tool-use turn. In other contexts, thinking content passed to the API is simply ignored.

**Cache invalidation patterns**
+ Changes to thinking parameters (such as enabling, disabling, or altering the budget allocation) invalidate message cache breakpoints.
+ [Interleaved thinking (beta)](#claude-messages-extended-thinking-tool-use-interleaved) amplifies cache invalidation, as thinking blocks can occur between multiple tool calls.
+ System prompts and tools remain cached despite thinking parameter changes or block removal.

**Note**  
While thinking blocks are removed for caching and context calculations, they must be preserved when continuing conversations with tool use, especially with interleaved thinking.

## Understanding thinking block caching behavior
<a name="claude-messages-extended-thinking-caching-behavior"></a>

When using extended thinking with tool use, thinking blocks exhibit specific caching behavior that affects token counting. The following sequence demonstrates how this works.

1. Caching only occurs when you make a subsequent request that includes tool results.

1. When the subsequent request is made, the previous conversation history (including thinking blocks) can be cached.

1. These cached thinking blocks count as input tokens in your usage metrics when they are read from the cache.

1. When a non-tool-result user block is included, all previous thinking blocks are ignored and stripped from context.

Here is a detailed example flow:

Request 1:

```
User: "What's the weather in Paris?"
```

Response 1:

```
[thinking_block 1] + [tool_use block 1]
```

Request 2:

```
User: "What's the weather in Paris?",
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True]
```

Response 2:

```
[thinking_block 2] + [text block 2]
```

Request 2 writes a cache of the request content (not the response). The cache includes the original user message, the first thinking block, tool use block, and the tool result.

Request 3:

```
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [thinking_block_2] + [text block 2],
User: [Text response, cache=True]
```

Because a non-tool-result user block was included, all previous thinking blocks are ignored. This request will be processed the same as the following request:

Request 3 Alternate:

```
User: ["What's the weather in Paris?"]
Assistant: [tool_use block 1]
User: [tool_result_1, cache=True]
Assistant: [text block 2]
User: [Text response, cache=True]
```

This behavior is consistent whether using regular thinking or interleaved thinking.
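The stripping rule described above can be mimicked locally for illustration. This is a sketch of the behavior, not the API's actual implementation; the API performs this filtering server-side:

```python
# Sketch: once a later user turn contains a block that is not a
# tool_result, thinking blocks in earlier assistant turns are dropped
# from the effective context.

def effective_context(messages):
    """Return messages with prior thinking blocks stripped once a
    non-tool-result user block appears after them."""
    def has_plain_user_block(msg):
        if msg["role"] != "user":
            return False
        content = msg["content"]
        if isinstance(content, str):
            return True
        return any(b.get("type") != "tool_result" for b in content)

    stripped = []
    for i, msg in enumerate(messages):
        later_plain_user = any(has_plain_user_block(m) for m in messages[i + 1:])
        if (msg["role"] == "assistant" and later_plain_user
                and not isinstance(msg["content"], str)):
            content = [b for b in msg["content"] if b.get("type") != "thinking"]
            stripped.append({"role": "assistant", "content": content})
        else:
            stripped.append(msg)
    return stripped

convo = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "..."},
        {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "88°F"}]},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "..."},
        {"type": "text", "text": "It's 88°F."}]},
    {"role": "user", "content": "Thanks! What about tomorrow?"},
]
stripped = effective_context(convo)
```

Applied to the Request 3 flow above, both thinking blocks are removed because the final user text comes after them, yielding the Request 3 Alternate shape.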

## Max tokens and context window size with extended thinking
<a name="claude-messages-extended-thinking-max-tokens"></a>

In older Claude models (prior to Claude 3.7 Sonnet), if the sum of prompt tokens and `max_tokens` exceeded the model’s context window, the system would automatically adjust `max_tokens` to fit within the context limit. This meant you could set a large `max_tokens` value and the system would silently reduce it as needed. With Claude 3.7 and 4 models, `max_tokens` (which includes your thinking budget when thinking is enabled) is enforced as a strict limit. The system now returns a validation error if prompt tokens + `max_tokens` exceeds the context window size.
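Because the limit is now strict, you may want to validate requests client-side before sending them. This is a minimal sketch; the context window value is an assumed example, so check your model's actual limit:

```python
# Sketch: pre-validating token limits for Claude 3.7/4 models, where
# max_tokens (including the thinking budget) is strictly enforced.
CONTEXT_WINDOW = 200_000  # assumed example value

def validate_request(prompt_tokens: int, max_tokens: int) -> None:
    """Raise if the request would fail the API's strict validation."""
    if prompt_tokens + max_tokens > CONTEXT_WINDOW:
        raise ValueError(
            f"prompt_tokens ({prompt_tokens}) + max_tokens ({max_tokens}) "
            f"exceeds the context window ({CONTEXT_WINDOW})"
        )

validate_request(150_000, 32_000)  # fits: 182,000 <= 200,000
```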

### The context window with extended thinking
<a name="claude-messages-extended-thinking-max-tokens-calculate"></a>

When calculating context window usage with thinking enabled, there are some considerations to be aware of:
+ Thinking blocks from previous turns are removed and not counted towards your context window.
+ Current turn thinking counts towards your `max_tokens` limit for that turn.

The effective context window is calculated as follows:

`context window = (current input tokens - previous thinking tokens) + (thinking tokens + encrypted thinking tokens + text output tokens)`
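The following worked example applies this calculation with illustrative token counts (not real measurements):

```python
# Worked example of the effective context window calculation,
# with illustrative token counts.
current_input_tokens = 20_000     # full conversation history as sent
previous_thinking_tokens = 3_000  # stripped by the API, not counted
thinking_tokens = 4_000           # current-turn thinking output
encrypted_thinking_tokens = 500
text_output_tokens = 1_500

context_window_used = (
    (current_input_tokens - previous_thinking_tokens)
    + (thinking_tokens + encrypted_thinking_tokens + text_output_tokens)
)
print(context_window_used)  # 23000
```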

### Managing tokens with extended thinking and tool use
<a name="claude-messages-extended-thinking-max-tokens-manage-tool"></a>

When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results. The effective context window calculation for extended thinking with tool use becomes the following:

`context window = (current input tokens + previous thinking tokens + tool use tokens) + (thinking tokens + encrypted thinking tokens + text output tokens)`

### Managing tokens with extended thinking
<a name="claude-messages-extended-thinking-max-tokens-manage"></a>

Given the context window and `max_tokens` behavior with extended thinking Claude 3.7 and 4 models, you might need to perform one of the following actions:
+ More actively monitor and manage your token usage.
+ Adjust `max_tokens` values as your prompt length changes.
+ Be aware that previous thinking blocks don’t accumulate in your context window. This change has been made to provide more predictable and transparent behavior, especially as maximum token limits have increased significantly.

## Extended thinking token cost considerations
<a name="claude-messages-extended-thinking-cost"></a>

The thinking process incurs charges for the following:
+ Tokens used during thinking (output tokens)
+ Thinking blocks from the last assistant turn included in subsequent requests (input tokens)
+ Standard text output tokens

**Tip**  
When extended thinking is enabled, a specialized 28 or 29 token system prompt is automatically included to support this feature.

The `budget_tokens` parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. Larger budgets can improve response quality by enabling more thorough analysis for complex problems, although Claude may not use the entire budget allocated, especially at ranges above 32K.

With interleaved thinking, the `budget_tokens` can exceed the `max_tokens` parameter as it represents the total budget across all thinking blocks within one assistant turn.

When using summarized thinking, keep the following information in mind:
+ **Input tokens**: Tokens in your original request
+ **Output tokens (billed)**: The original thinking tokens that Claude generated internally
+ **Output tokens (visible)**: The summarized thinking tokens you see in the response
+ **No charge**: Tokens used to generate the summary
+ The `summary_status` field can indicate if token limits affected summarization
+ The billed output token count will not match the visible token count in the response. You are billed for the full thinking process, not the summary you see.

# Adaptive thinking
<a name="claude-messages-adaptive-thinking"></a>

Adaptive thinking is the recommended way to use [Extended thinking](claude-messages-extended-thinking.md) with Claude Opus 4.6. Instead of manually setting a thinking token budget, adaptive thinking lets Claude dynamically decide when and how much to think based on the complexity of each request. Adaptive thinking reliably drives better performance than extended thinking with a fixed `budget_tokens`, and we recommend moving to adaptive thinking to get the most intelligent responses from Claude Opus 4.6. No beta header is required.

The supported models are as follows:


| Model | Model ID | 
| --- | --- | 
| Claude Opus 4.6 | `anthropic.claude-opus-4-6-v1` | 
| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | 

**Note**  
`thinking.type: "enabled"` and `budget_tokens` are deprecated on Claude Opus 4.6 and will be removed in a future model release. Use `thinking.type: "adaptive"` with the effort parameter instead.  
Older models (Claude Sonnet 4.5, Claude Opus 4.5, etc.) do not support adaptive thinking and require `thinking.type: "enabled"` with `budget_tokens`.

## How adaptive thinking works
<a name="claude-messages-adaptive-thinking-how-it-works"></a>

In adaptive mode, Claude evaluates the complexity of each request and decides whether and how much to think. At the default effort level (`high`), Claude will almost always think. At lower effort levels, Claude may skip thinking for simpler problems.

Adaptive thinking also automatically enables [Interleaved thinking (beta)](claude-messages-extended-thinking.md#claude-messages-extended-thinking-tool-use-interleaved). This means Claude can think between tool calls, making it especially effective for agentic workflows.

Set `thinking.type` to `"adaptive"` in your API request:

------
#### [ CLI ]

```
aws bedrock-runtime invoke-model \
    --model-id "us.anthropic.claude-opus-4-6-v1" \
    --body '{
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "messages": [
            {
                "role": "user",
                "content": "Three players A, B, C play a game. Each has a jar with 100 balls numbered 1-100. Simultaneously, each draws one ball. A beats B if As number > Bs number (mod 100, treating 100 as 0 for comparison). Similarly for B vs C and C vs A. The overall winner is determined by majority of pairwise wins (ties broken randomly). Is there a mixed strategy Nash equilibrium where each player draws uniformly? If not, characterize the equilibrium."
            }
        ]
    }' \
    --cli-binary-format raw-in-base64-out \
    output.json && cat output.json | jq '.content[] | {type, thinking: .thinking[0:200], text}'
```

------
#### [ Python ]

```
import boto3
import json

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-2'
)

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 16000,
        "thinking": {
            "type": "adaptive"
        },
        "messages": [{
            "role": "user",
            "content": "Explain why the sum of two even numbers is always even."
        }]
    })
)

response_body = json.loads(response["body"].read())

for block in response_body["content"]:
    if block["type"] == "thinking":
        print(f"\nThinking: {block['thinking']}")
    elif block["type"] == "text":
        print(f"\nResponse: {block['text']}")
```

------
#### [ TypeScript ]

```
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            max_tokens: 16000,
            thinking: {
                type: "adaptive"
            },
            messages: [{
                role: "user",
                content: "Explain why the sum of two even numbers is always even."
            }]
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    for (const block of responseBody.content) {
        if (block.type === "thinking") {
            console.log(`\nThinking: ${block.thinking}`);
        } else if (block.type === "text") {
            console.log(`\nResponse: ${block.text}`);
        }
    }
}

main().catch(console.error);
```

------

## Adaptive thinking with the effort parameter
<a name="claude-messages-adaptive-thinking-effort"></a>

You can combine adaptive thinking with the effort parameter to guide how much thinking Claude does. The effort level acts as soft guidance for Claude's thinking allocation:


| Effort level | Thinking behavior | 
| --- | --- | 
| max | Claude always thinks with no constraints on thinking depth. Claude Opus 4.6 only — requests using max on other models will return an error. | 
| high (default) | Claude always thinks. Provides deep reasoning on complex tasks. | 
| medium | Claude uses moderate thinking. May skip thinking for very simple queries. | 
| low | Claude minimizes thinking. Skips thinking for simple tasks where speed matters most. | 

## Prompt caching
<a name="claude-messages-adaptive-thinking-prompt-caching"></a>

Consecutive requests using `adaptive` thinking preserve prompt cache breakpoints. However, switching between `adaptive` and `enabled`/`disabled` thinking modes breaks cache breakpoints for messages. System prompts and tool definitions remain cached regardless of mode changes.

## Tuning thinking behavior
<a name="claude-messages-adaptive-thinking-tuning"></a>

If Claude is thinking more or less often than you'd like, you can add guidance to your system prompt:

```
Extended thinking adds latency and should only be used when it
will meaningfully improve answer quality — typically for problems
that require multi-step reasoning. When in doubt, respond directly.
```

**Warning**  
Steering Claude to think less often may reduce quality on tasks that benefit from reasoning. Measure the impact on your specific workloads before deploying prompt-based tuning to production. Consider testing with lower effort levels first.

# Thinking encryption
<a name="claude-messages-thinking-encryption"></a>

Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when passed back to the API. When streaming responses, the signature is added via a `signature_delta` inside a `content_block_delta` event just before the `content_block_stop` event.

**Note**  
It is only strictly necessary to send back thinking blocks when using tools with extended thinking. Otherwise, you can omit thinking blocks from previous turns, or let the API strip them for you if you pass them back.  
If sending back thinking blocks, we recommend passing everything back as you received it for consistency and to avoid potential issues.

## Thinking redaction in Claude 3.7 Sonnet
<a name="claude-messages-thinking-encryption-redaction"></a>

**Note**  
The following information applies specifically to Claude 3.7 Sonnet. Claude 4 models handle thinking differently and do not produce redacted thinking blocks.

In Claude 3.7 Sonnet, the following applies:
+ Occasionally Claude’s internal reasoning will be flagged by our safety systems. When this occurs, we encrypt some or all of the thinking block and return it to you as a `redacted_thinking` block. `redacted_thinking` blocks are decrypted when passed back to the API, allowing Claude to continue its response without losing context.
+ `thinking` and `redacted_thinking` blocks are returned before the text blocks in the response.

When building customer-facing applications that use extended thinking with Claude 3.7 Sonnet, consider the following:
+ Be aware that redacted thinking blocks contain encrypted content that isn’t human-readable.
+ Consider providing a simple explanation like: “Some of Claude’s internal reasoning has been automatically encrypted for safety reasons. This doesn’t affect the quality of responses.”
+ If you display thinking blocks to users, you can filter out redacted blocks while preserving normal thinking blocks.
+ Be transparent that using extended thinking features may occasionally result in some reasoning being encrypted.
+ Implement appropriate error handling to gracefully manage redacted thinking without breaking your UI.

Here’s an example showing both normal and redacted thinking blocks:

```
{
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me analyze this step by step...",
            "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
        },
        {
            "type": "redacted_thinking",
            "data":"EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
        },
        {
            "type": "text",
            "text": "Based on my analysis..."
        }
    ]
}
```

**Tip**  
Seeing redacted thinking blocks in your output is expected behavior. The model can still use this redacted reasoning to inform its responses while maintaining safety guardrails.  
If you need to test redacted thinking handling in your application, you can use this special test string as your prompt: `ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB`

When passing `thinking` and `redacted_thinking` blocks back to the API in a multi-turn conversation, you must include the complete, unmodified block for the last assistant turn. This is critical for maintaining the model’s reasoning flow. We suggest always passing back all thinking blocks to the API. For more details, see [Preserve thinking blocks](claude-messages-extended-thinking.md#claude-messages-extended-thinking-tool-use-thinking-blocks) in Extended thinking with tool use.

The following example uses the **InvokeModelWithResponseStream** API to demonstrate the request and response structure when using thinking tokens with redactions.

When streaming is enabled, you’ll receive thinking content from `thinking_delta` events. Here’s how to handle streaming with thinking:

**Request**

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 24000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000
    },
    "messages": [
        {
            "role": "user",
            "content": "What is 27 * 453?"
        }
    ]
}
```

**Response**

```
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}
```

# Differences in thinking across model versions
<a name="claude-messages-thinking-differences"></a>

The Messages API handles thinking differently across Claude 3.7 Sonnet and Claude 4 models, primarily in redaction and summarization behavior. The following table summarizes those differences.


| Feature | Claude 3.7 Sonnet | Claude 4 Models | 
| --- | --- | --- | 
| Thinking output | Returns the full thinking output | Returns summarized thinking | 
| Redaction handling | Uses `redacted_thinking` blocks | Redacts and encrypts full thinking, returned in a `signature` field | 
| Interleaved thinking | Not supported | Supported with a beta header | 

# Compaction
<a name="claude-messages-compaction"></a>

**Tip**  
Server-side compaction is recommended for managing context in long-running conversations and agentic workflows as it handles context management automatically with minimal integration work.

**Note**  
Compaction is currently in beta. Include the beta header `compact-2026-01-12` in your API requests to use this feature. Compaction is currently not supported by the Converse API; however, it is supported with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html).

Compaction extends the effective context length for long-running conversations and tasks by automatically summarizing older context when approaching the context window limit. This is ideal for:
+ Chat-based, multi-turn conversations where you want users to use one chat for a long period of time
+ Task-oriented prompts that require a lot of follow-up work (often tool use) that may exceed the 200K context window

Compaction is supported on the following models:


| Model | Model ID | 
| --- | --- | 
| Claude Sonnet 4.6 | `anthropic.claude-sonnet-4-6` | 
| Claude Opus 4.6 | `anthropic.claude-opus-4-6-v1` | 

**Note**  
The top-level `input_tokens` and `output_tokens` in the `usage` field do not include compaction iteration usage, and reflect the sum of all non-compaction iterations. To calculate the total tokens consumed and billed for a request, sum across all entries in the `usage.iterations` array.  
If you previously relied on `usage.input_tokens` and `usage.output_tokens` for cost tracking or auditing, you will need to update your tracking logic to aggregate across `usage.iterations` when compaction is enabled. The `iterations` array is only present when a new compaction is triggered during the request. Re-applying a previous `compaction` block incurs no additional compaction cost, and the top-level usage fields remain accurate in that case.
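The aggregation described above can be sketched as follows. The shape of the `usage` payload is based on the description in this note, and the field names inside `iterations` entries are assumed to mirror the top-level fields:

```python
# Sketch: aggregating total billed tokens when compaction runs.
# Numbers are illustrative placeholders.

def total_tokens(usage):
    """Sum input/output tokens across compaction iterations if present,
    otherwise fall back to the top-level fields."""
    iterations = usage.get("iterations")
    if not iterations:
        return usage["input_tokens"], usage["output_tokens"]
    total_in = sum(it["input_tokens"] for it in iterations)
    total_out = sum(it["output_tokens"] for it in iterations)
    return total_in, total_out

usage = {
    "input_tokens": 1_200,   # non-compaction iterations only
    "output_tokens": 300,
    "iterations": [
        {"input_tokens": 150_000, "output_tokens": 2_000},  # compaction pass
        {"input_tokens": 1_200, "output_tokens": 300},      # final response
    ],
}
print(total_tokens(usage))  # (151200, 2300)
```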

## How compaction works
<a name="claude-messages-compaction-how-it-works"></a>

When compaction is enabled, Claude automatically summarizes your conversation when it approaches the configured token threshold. The API:

1. Detects when input tokens exceed your specified trigger threshold.

1. Generates a summary of the current conversation.

1. Creates a `compaction` block containing the summary.

1. Continues the response with the compacted context.

On subsequent requests, append the response to your messages. The API automatically drops all message blocks prior to the `compaction` block, continuing the conversation from the summary.
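The continuation behavior can be mirrored locally for illustration. This helper keeps the conversation from the turn containing the most recent `compaction` block onward; the API performs the drop automatically, so this is only a sketch of the rule:

```python
# Sketch: continuing a conversation from the latest compaction block.
# Message contents are illustrative placeholders.

def drop_before_compaction(messages):
    """Return messages starting at the turn containing the last compaction block."""
    last = 0
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if isinstance(content, list) and any(
            b.get("type") == "compaction" for b in content
        ):
            last = i
    return messages[last:]

messages = [
    {"role": "user", "content": "Help me build a website"},
    {"role": "assistant", "content": [
        {"type": "compaction", "content": "<summary>Building a site...</summary>"},
        {"type": "text", "text": "Next, let's set up the HTML."},
    ]},
    {"role": "user", "content": "Add a contact form"},
]
continued = drop_before_compaction(messages)
print(len(continued))  # 2
```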

## Basic usage
<a name="claude-messages-compaction-basic-usage"></a>

Enable compaction by adding the `compact_20260112` strategy to `context_management.edits` in your Messages API request.

------
#### [ CLI ]

```
aws bedrock-runtime invoke-model \
    --model-id "us.anthropic.claude-opus-4-6-v1" \
    --body '{
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [
            {
                "role": "user",
                "content": "Help me build a website"
            }
        ],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    }' \
    --cli-binary-format raw-in-base64-out \
    /tmp/response.json

echo "Response:"
cat /tmp/response.json | jq '.content[] | {type, text: .text[0:500]}'
```

------
#### [ Python ]

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112"
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Append the response (including any compaction block) to continue the conversation
messages.append({"role": "assistant", "content": response_body["content"]})

for block in response_body["content"]:
    if block.get("type") == "compaction":
        print(f"[COMPACTION]: {block['content'][:200]}...")
    elif block.get("type") == "text":
        print(f"[RESPONSE]: {block['text']}")
```

------
#### [ TypeScript ]

```
import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

async function main() {
    const client = new BedrockRuntimeClient({});

    const messages: Array<{role: string, content: string | object[]}> = [
        { role: "user", content: "Help me build a website" }
    ];

    const command = new InvokeModelCommand({
        modelId: "us.anthropic.claude-opus-4-6-v1",
        body: JSON.stringify({
            anthropic_version: "bedrock-2023-05-31",
            anthropic_beta: ["compact-2026-01-12"],
            max_tokens: 4096,
            messages,
            context_management: {
                edits: [
                    {
                        type: "compact_20260112"
                    }
                ]
            }
        })
    });

    const response = await client.send(command);
    const responseBody = JSON.parse(new TextDecoder().decode(response.body));

    // Append response to continue conversation
    messages.push({ role: "assistant", content: responseBody.content });

    for (const block of responseBody.content) {
        if (block.type === "compaction") {
            console.log(`[COMPACTION]: ${block.content.substring(0, 200)}...`);
        } else if (block.type === "text") {
            console.log(`[RESPONSE]: ${block.text}`);
        }
    }
}

main().catch(console.error);
```

------

## Parameters
<a name="claude-messages-compaction-parameters"></a>


| Parameter | Type | Default | Description | 
| --- | --- | --- | --- | 
| type | string | Required | Must be `compact_20260112` | 
| trigger | object | 150,000 tokens | When to trigger compaction. Must be at least 50,000 tokens. | 
| pause\_after\_compaction | boolean | false | Whether to pause after generating the compaction summary | 
| instructions | string | null | Custom summarization prompt. Completely replaces the default prompt when provided. | 

## Trigger configuration
<a name="claude-messages-compaction-trigger"></a>

Configure when compaction triggers using the `trigger` parameter:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "trigger": {
                        "type": "input_tokens",
                        "value": 100000
                    }
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])
```

## Custom summarization instructions
<a name="claude-messages-compaction-custom-instructions"></a>

By default, compaction uses the following summarization prompt:

```
You have written a partial transcript for the initial task above. Please write a summary of the transcript. The purpose of this summary is to provide continuity so you can continue to make progress towards solving the task in a future context, where the raw history above may not be accessible and will be replaced with this summary. Write down anything that would be helpful, including the state, next steps, learnings etc. You must wrap your summary in a <summary></summary> block.
```

You can provide custom instructions via the `instructions` parameter to replace this prompt entirely. Custom instructions don't supplement the default; they completely replace it:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "Help me build a website"}],
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "instructions": "Focus on preserving code snippets, variable names, and technical decisions."
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())
print(response_body["content"][-1]["text"])
```

## Pausing after compaction
<a name="claude-messages-compaction-pause"></a>

Use `pause_after_compaction` to pause the API after generating the compaction summary. This allows you to add additional content blocks (such as preserving recent messages or specific instruction-oriented messages) before the API continues with the response.

When enabled, the API returns a message with the `compaction` stop reason after generating the compaction block:

```
import boto3
import json

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

messages = [{"role": "user", "content": "Help me build a website"}]

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-opus-4-6-v1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "anthropic_beta": ["compact-2026-01-12"],
        "max_tokens": 4096,
        "messages": messages,
        "context_management": {
            "edits": [
                {
                    "type": "compact_20260112",
                    "pause_after_compaction": True
                }
            ]
        }
    })
)

response_body = json.loads(response["body"].read())

# Check if compaction triggered a pause
if response_body.get("stop_reason") == "compaction":
    # Response contains only the compaction block
    messages.append({"role": "assistant", "content": response_body["content"]})

    # Continue the request
    response = bedrock_runtime.invoke_model(
        modelId="us.anthropic.claude-opus-4-6-v1",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "anthropic_beta": ["compact-2026-01-12"],
            "max_tokens": 4096,
            "messages": messages,
            "context_management": {
                "edits": [{"type": "compact_20260112"}]
            }
        })
    )
    response_body = json.loads(response["body"].read())

print(response_body["content"][-1]["text"])
```

## Working with compaction blocks
<a name="claude-messages-compaction-blocks"></a>

When compaction is triggered, the API returns a `compaction` block at the start of the assistant response.

A long-running conversation may result in multiple compactions. The last compaction block reflects the final state of the prompt, replacing content prior to it with the generated summary.

```
{
  "content": [
    {
      "type": "compaction",
      "content": "Summary of the conversation: The user requested help building a web scraper..."
    },
    {
      "type": "text",
      "text": "Based on our conversation so far..."
    }
  ]
}
```

## Streaming
<a name="claude-messages-compaction-streaming"></a>

When streaming responses with compaction enabled, the compaction block streams differently from text blocks: you receive a `content_block_start` event when compaction begins, followed by a single `content_block_delta` that carries the complete summary content (there is no incremental streaming), and then a `content_block_stop` event.
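The event sequence described above can be sketched as follows. The sample event payloads and the `compaction_delta` delta shape are illustrative assumptions, not a documented wire format; only the start/single-delta/stop sequence is taken from the description above.

```python
# Hypothetical streaming events for a compaction turn (shapes are assumptions);
# real events come from invoke_model_with_response_stream.
events = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "compaction", "content": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "compaction_delta", "content": "Summary of the conversation so far..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "Based on our progress..."}},
    {"type": "content_block_stop", "index": 1},
]

compaction_summary = ""
response_text = ""
current_type = None
for event in events:
    if event["type"] == "content_block_start":
        # Track which block type the subsequent deltas belong to.
        current_type = event["content_block"]["type"]
    elif event["type"] == "content_block_delta":
        if current_type == "compaction":
            # The compaction summary arrives in one delta, not incrementally.
            compaction_summary += event["delta"]["content"]
        elif current_type == "text":
            response_text += event["delta"].get("text", "")
```

The same accumulator pattern works for real streams; only the source of `events` changes.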

## Prompt caching
<a name="claude-messages-compaction-prompt-caching"></a>

You may add a `cache_control` breakpoint on compaction blocks, which caches the full system prompt along with the summarized content. The original compacted content is ignored. Note that when compaction is triggered, it can result in a cache miss on the subsequent request.

```
{
    "role": "assistant",
    "content": [
        {
            "type": "compaction",
            "content": "[summary text]",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "Based on our conversation..."
        }
    ]
}
```

## Understanding usage
<a name="claude-messages-compaction-usage"></a>

Compaction requires an additional sampling step, which contributes to rate limits and billing. The API returns detailed usage information in the response:

```
{
  "usage": {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
      {
        "type": "compaction",
        "input_tokens": 180000,
        "output_tokens": 3500
      },
      {
        "type": "message",
        "input_tokens": 23000,
        "output_tokens": 1000
      }
    ]
  }
}
```

The `iterations` array shows usage for each sampling iteration. When compaction occurs, you'll see a `compaction` iteration followed by the main `message` iteration. The final iteration's token counts reflect the effective context size after compaction.
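As a sketch of how you might account for the extra sampling step, the following sums token counts across iterations from a usage payload like the one above. Treating the per-iteration sums as the billable totals is an assumption for illustration; consult your billing reports for authoritative numbers.

```python
# Sample usage payload matching the structure shown above.
usage = {
    "input_tokens": 45000,
    "output_tokens": 1234,
    "iterations": [
        {"type": "compaction", "input_tokens": 180000, "output_tokens": 3500},
        {"type": "message", "input_tokens": 23000, "output_tokens": 1000},
    ],
}

def summarize_usage(usage: dict) -> dict:
    """Total tokens across all sampling iterations, falling back to the
    top-level counts when no iterations array is present."""
    iterations = usage.get("iterations")
    if not iterations:
        return {"input_tokens": usage["input_tokens"],
                "output_tokens": usage["output_tokens"]}
    return {
        "input_tokens": sum(i["input_tokens"] for i in iterations),
        "output_tokens": sum(i["output_tokens"] for i in iterations),
    }

totals = summarize_usage(usage)
```

Here the compaction iteration dominates input tokens, which is why compaction contributes noticeably to rate limits.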

# Get validated JSON results from models
<a name="claude-messages-structured-outputs"></a>

You can use structured outputs with Claude Sonnet 4.5, Claude Haiku 4.5, Claude Opus 4.5, and Claude Opus 4.6. To learn more, see [Get validated JSON results from models](structured-output.md).

# Request and Response
<a name="model-parameters-anthropic-claude-messages-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).

**Note**  
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

**Warning**  
Claude Sonnet 4.5 and Claude Haiku 4.5 support specifying either the `temperature` or `top_p` parameter, but not both. This does not apply to any older models.

------
#### [ Request ]

Anthropic Claude has the following inference parameters for a messages inference call. 

```
{
    "anthropic_version": "bedrock-2023-05-31", 
    "anthropic_beta": ["computer-use-2024-10-22"],
    "max_tokens": int,
    "system": string,    
    "messages": [
        {
            "role": string,
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "content image bytes" } },
                { "type": "text", "text": "content text" }
            ]
        }
    ],
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "tools": [
        {
            "type": "custom",
            "name": string,
            "description": string,
            "input_schema": json
        },
        { 
            "type": "computer_20241022",  
            "name": "computer", 
            "display_height_px": int,
            "display_width_px": int,
            "display_number": int
        },
        { 
            "type": "bash_20241022", 
            "name": "bash"
        },
        { 
            "type": "text_editor_20241022",
            "name": "str_replace_editor"
        }
        
    ],
    "tool_choice": {
        "type": string,
        "name": string
    },
    "stop_sequences": [string]
}
```

The following are required parameters.
+  **anthropic\_version** – (Required) The anthropic version. The value must be `bedrock-2023-05-31`.
+ **max\_tokens** – (Required) The maximum number of tokens to generate before stopping.

  Note that Anthropic Claude models might stop generating tokens before reaching the value of `max_tokens`. Different Anthropic Claude models have different maximum values for this parameter. For more information, see [Model comparison](https://docs.anthropic.com/claude/docs/models-overview#model-comparison).
+ **messages** – (Required) The input messages.
  + **role** – The role of the conversation turn. Valid values are `user` and `assistant`.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
  + **content** – (required) The content of the conversation turn, as an array of objects. Each object contains a **type** field, in which you can specify one of the following values:
    + `text` – If you specify this type, you must include a **text** field and specify the text prompt as its value. If another object in the array is an image, this text prompt applies to the images.
    + `image` – If you specify this type, you must include a **source** field that maps to an object with the following fields:
      + **type** – (required) The encoding type for the image. You can specify `base64`. 
      + **media\_type** – (required) The type of the image. You can specify the following image formats. 
        + `image/jpeg`
        + `image/png`
        + `image/webp` 
        + `image/gif`
      + **data** – (required) The base64 encoded image bytes for the image. The maximum image size is 3.75MB. The maximum height and width of an image is 8000 pixels. 
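The image constraints above can be enforced before sending a request. The following helper is a minimal sketch that checks the media type and the 3.75 MB size limit (interpreted here as 3.75 MiB, which is an assumption) and builds the image content block; checking pixel dimensions would additionally require an image library such as Pillow.

```python
import base64

# Allowed media types and size limit taken from the parameter list above.
ALLOWED_MEDIA_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}
MAX_IMAGE_BYTES = int(3.75 * 1024 * 1024)  # 3.75 MB limit (MiB assumed)

def build_image_block(image_bytes: bytes, media_type: str) -> dict:
    """Validate constraints and build a base64 image content block."""
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type}")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("Image exceeds the 3.75 MB limit")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }
```

The returned dict slots directly into the `content` array of a user message.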

The following are optional parameters.
+  **system** – (Optional) The system prompt for the request.

  A system prompt is a way of providing context and instructions to Anthropic Claude, such as specifying a particular goal or role. For more information, see [System prompts](https://docs.anthropic.com/en/docs/system-prompts) in the Anthropic documentation. 
**Note**  
You can use system prompts with Anthropic Claude version 2.1 or higher.
+ **anthropic\_beta** – (Optional) The anthropic beta parameter is a list of strings of beta headers used to indicate opt-in to a particular set of beta features.
**Note**  
The 1 million token context length variant of Claude Sonnet 4 is available to you in select AWS Regions as a "Beta Service" as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA. Please see the [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) page for more information about the pricing for longer context requests. Separate service quotas apply (for more information, see **Service Quotas** in the AWS Management Console).

  Available beta headers include the following:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
+  **stop\_sequences** – (Optional) Custom text sequences that cause the model to stop generating. Anthropic Claude models normally stop when they have naturally completed their turn; in that case, the value of the `stop_reason` response field is `end_turn`. If you want the model to stop generating when it encounters custom strings of text, use the `stop_sequences` parameter. If the model encounters one of the custom text strings, the value of the `stop_reason` response field is `stop_sequence` and the value of `stop_sequence` contains the matched stop sequence.

  The maximum number of entries is 8191. 
+  **temperature** – (Optional) The amount of randomness injected into the response.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
+  **top\_p** – (Optional) Use nucleus sampling.

  In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches a particular probability specified by `top_p`. When adjusting sampling parameters, modify either `temperature` or `top_p`. Do not modify both at the same time.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
+  **top\_k** – (Optional) Only sample from the top K options for each subsequent token.

  Use `top_k` to remove long tail low probability responses.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
+  **tools** – (Optional) Definitions of tools that the model may use.
**Note**  
Requires an Anthropic Claude 3 model.

  If you include `tools` in your request, the model may return `tool_use` content blocks that represent the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return results back to the model using `tool_result` content blocks.

  You can pass the following tool types:

**Custom**  
Definition for a custom tool.
  + (optional) **type** – The type of the tool. If defined, use the value `custom`.
  + **name** – The name of the tool.
  + **description** – (optional, but strongly recommended) The description of the tool.
  + **input\_schema** – The JSON schema for the tool.

**Computer**  
Definition for the computer tool that you use with the computer use API.
  +  **type** – The value must be `computer_20241022`.
  + **name** – The value must be `computer`.
  + (Required) **display\_height\_px** – The height of the display being controlled by the model, in pixels.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
  + (Required) **display\_width\_px** – The width of the display being controlled by the model, in pixels.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)
  + (Optional) **display\_number** – The display number to control (only relevant for X11 environments). If specified, the tool will be provided a display number in the tool definition.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-request-response.html)

**bash**  
Definition for the bash tool that you use with the computer use API.
  + (optional) **type** – The value must be `bash_20241022`.
  + **name** – The value must be `bash`.

**text editor**  
Definition for the text editor tool that you use with the computer use API.
  + (optional) **type** – The value must be `text_editor_20241022`.
  + **name** – The value must be `str_replace_editor`.
+  **tool\_choice** – (Optional) Specifies how the model should use the provided tools. The model can use a specific tool, any available tool, or decide by itself.
**Note**  
Requires an Anthropic Claude 3 model.
  + **type** – The type of tool choice. Possible values are `any` (use any available tool), `auto` (the model decides), and `tool` (use the specified tool).
  + **name** – (Optional) The name of the tool to use. Required if you specify `tool` in the `type` field.
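Putting several of these optional parameters together, the following builds a request body that defines a custom tool, forces its use with `tool_choice`, and sets a stop sequence. The `get_weather` tool, its schema, and the prompt are hypothetical examples, not part of any API.

```python
import json

# Illustrative request body; get_weather and its schema are invented for this sketch.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What's the weather in Seattle?"}],
    "stop_sequences": ["</answer>"],
    "tools": [
        {
            "type": "custom",
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    # type "tool" forces the named tool; "auto" lets the model decide,
    # "any" lets it pick any available tool.
    "tool_choice": {"type": "tool", "name": "get_weather"},
}

request_json = json.dumps(body)
# Pass request_json as the `body` argument to invoke_model.
```

Because `tool_choice` is `{"type": "tool", ...}`, the `name` field is required, matching the parameter description above.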

------
#### [ Response ]

The Anthropic Claude model returns the following fields for a messages inference call. 

```
{
    "id": string,
    "model": string,
    "type" : "message",
    "role" : "assistant",
    "content": [
        {
            "type": string,
            "text": string,
            "image" :json,
            "id": string,
            "name":string,
            "input": json
        }
    ],
    "stop_reason": string,
    "stop_sequence": string,
    "usage": {
        "input_tokens": integer,
        "output_tokens": integer
    }
    
}
```

Example responses with new `stop_reason` values:

```
// Example with refusal
{
    "stop_reason": "refusal",
    "content": [
        {
            "type": "text",
            "text": "I can't help with that request."
        }
    ]
}

// Example with tool_use
{
    "stop_reason": "tool_use",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_123",
            "name": "calculator",
            "input": {"expression": "2+2"}
        }
    ]
}

// Example with model_context_window_exceeded (Claude Sonnet 4.5)
{
    "stop_reason": "model_context_window_exceeded",
    "content": [
        {
            "type": "text",
            "text": "The response was truncated due to context window limits..."
        }
    ]
}
```
+ **id** – The unique identifier for the response. The format and length of the ID might change over time.
+ **model** – The ID for the Anthropic Claude model that made the request.
+ **stop\_reason** – The reason why Anthropic Claude stopped generating the response.
  + **end\_turn** – The model reached a natural stopping point.
  + **max\_tokens** – The generated text exceeded the value of the `max_tokens` input field or exceeded the maximum number of tokens that the model supports.
  + **stop\_sequence** – The model generated one of the stop sequences that you specified in the `stop_sequences` input field. 
  + **refusal** – Claude refused to generate a response due to safety concerns.
  + **tool\_use** – Claude is calling a tool and expects you to run it.
  + **model\_context\_window\_exceeded** – The model stopped generating because it hit the context window limit. New with Claude Sonnet 4.5.
+ **stop\_sequence** – The stop sequence that ended the generation.
+ **type** – The type of response. The value is always `message`.
+ **role** – The conversational role of the generated message. The value is always `assistant`.
+ **content** – The content generated by the model. Returned as an array. There are three types of content: *text*, *tool\_use*, and *image*.
  + *text* – A text response.
    + **type** – The type of the content. This value is `text`. 
    + **text** – If the value of `type` is text, contains the text of the content. 
  + *tool use* – A request from the model to use a tool.
    + **type** – The type of the content. This value is `tool_use`.
    + **id** – The ID for the tool that the model is requesting use of.
    + **name** – Contains the name of the requested tool. 
    + **input** – The input parameters to pass to the tool.
  + *image* – An image in the content.
    + **type** – The type of the content. This value is `image`.
    + **source** – Contains the image. For more information, see [Multimodal prompts](model-parameters-anthropic-claude-messages.md#model-parameters-anthropic-claude-messages-multimodal-prompts).
+ **usage** – Container for the number of tokens that you supplied in the request and the number of tokens that the model generated in the response.
  + **input\_tokens** – The number of input tokens in the request.
  + **output\_tokens** – The number of tokens that the model generated in the response.
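The content types above can be handled with a simple dispatch over the `type` field of each block. This is a minimal sketch using the tool_use example response shown earlier; the field names follow the response schema documented above.

```python
def extract_blocks(response_body: dict):
    """Separate text and tool_use content blocks from a parsed response body."""
    texts, tool_calls = [], []
    for block in response_body.get("content", []):
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "tool_use":
            # Run the named tool with this input, then return a tool_result block.
            tool_calls.append({"id": block["id"],
                               "name": block["name"],
                               "input": block["input"]})
    return texts, tool_calls

# Sample parsed response mirroring the tool_use example above.
sample = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me calculate that."},
        {"type": "tool_use", "id": "toolu_123",
         "name": "calculator", "input": {"expression": "2+2"}},
    ],
}
texts, tool_calls = extract_blocks(sample)
```

When `stop_reason` is `tool_use`, you would run each returned tool call and send the results back in `tool_result` content blocks.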

------

## Effort parameter (beta)
<a name="model-parameters-anthropic-claude-effort-parameter"></a>

The `effort` parameter is an alternative to thinking token budgets for Claude Opus 4.5. This parameter tells Claude how liberally it should spend tokens to produce the best result, adjusting token usage across thinking, tool calls, and user communication. It can be used with or without extended thinking mode.

The effort parameter can be set to:
+ `high` (default) – Claude spends as many tokens as needed for the best result
+ `medium` – Balanced token usage
+ `low` – Conservative token usage

To use this feature you must pass the beta header `effort-2025-11-24`.

Request example:

```
{
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": [
        "effort-2025-11-24"
    ],
    "max_tokens": 4096,
    "output_config": {
        "effort": "medium"
    },
    "messages": [{
        "role": "user",
        "content": "Analyze this complex dataset and provide insights"
    }]
}
```

# Code examples
<a name="api-inference-examples-claude-messages-code-examples"></a>

The following code examples show how to use the messages API. 

**Topics**
+ [

## Messages code example
](#api-inference-examples-claude-messages-code-example)
+ [

## Multimodal code examples
](#api-inference-examples-claude-multimodal-code-example)

## Messages code example
<a name="api-inference-examples-claude-messages-code-example"></a>

This example shows how to send a single-turn user message, and a user message together with a prefilled assistant message, to an Anthropic Claude 3 Sonnet model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate a message with Anthropic Claude (on demand).
"""
import boto3
import json
import logging

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):

    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }  
    )  

    
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())
   
    return response_body


def main():
    """
    Entrypoint for Anthropic Claude message example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
        system_prompt = "Please respond only with emoji."
        max_tokens = 1000

        # Prompt with user turn only.
        user_message =  {"role": "user", "content": "Hello World"}
        messages = [user_message]

        response = generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens)
        print("User turn only.")
        print(json.dumps(response, indent=4))

        # Prompt with both user turn and prefilled assistant response.
        #Anthropic Claude continues by using the prefilled assistant text.
        assistant_message =  {"role": "assistant", "content": "<emoji>"}
        messages = [user_message, assistant_message]
        response = generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens)
        print("User turn and prefilled assistant response.")
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message=err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
            format(message))

if __name__ == "__main__":
    main()
```

## Multimodal code examples
<a name="api-inference-examples-claude-multimodal-code-example"></a>

The following examples show how to pass an image and prompt text in a multimodal message to an Anthropic Claude 3 Sonnet model.

**Topics**
+ [

### Multimodal prompt with InvokeModel
](#api-inference-examples-claude-multimodal-code-example-invoke-model)
+ [

### Streaming multimodal prompt with InvokeModelWithResponseStream
](#api-inference-examples-claude-multimodal-code-example-streaming)

### Multimodal prompt with InvokeModel
<a name="api-inference-examples-claude-multimodal-code-example-invoke-model"></a>

The following example shows how to send a multimodal prompt to Anthropic Claude 3 Sonnet with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to run a multimodal prompt with Anthropic Claude (on demand) and InvokeModel.
"""

import json
import logging
import base64
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    """
    Invokes a model with a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send to the model.
        max_tokens (int) : The maximum  number of tokens to generate.
    Returns:
        None.
    """



    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Anthropic Claude multimodal prompt example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
        max_tokens = 1000
        input_text = "What's in this image?"
        input_image = "/path/to/image" # Replace with actual path to image file
 
        # Read reference image from file and encode as base64 strings.
        image_ext = input_image.split(".")[-1]
        with open(input_image, "rb") as image_file:
            content_image = base64.b64encode(image_file.read()).decode('utf8')

        message = {
            "role": "user",
            "content": [
                {
                    "type": "image", 
                    "source": {
                        "type": "base64",
                        "media_type": f"image/{image_ext}", 
                        "data": content_image
                    }
                },
                {
                    "type": "text", 
                    "text": input_text
                }
            ]
        }

    
        messages = [message]

        response = run_multi_modal_prompt(
            bedrock_runtime, model_id, messages, max_tokens)
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))


if __name__ == "__main__":
    main()
```

### Streaming multimodal prompt with InvokeModelWithResponseStream
<a name="api-inference-examples-claude-multimodal-code-example-streaming"></a>

The following example shows how to stream the response from a multimodal prompt sent to Anthropic Claude 3 Sonnet with [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html). 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to stream the response from Anthropic Claude Sonnet (on demand) for a 
multimodal request.
"""

import json
import base64
import logging
import boto3

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def stream_multi_modal_prompt(bedrock_runtime, model_id, input_text, image, max_tokens):
    """
    Streams the response from a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        input_text (str) : The prompt text
        image (str) : The path to  an image that you want in the prompt.
        max_tokens (int) : The maximum  number of tokens to generate.
    Returns:
        None.
    """

    with open(image, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())

    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": input_text},
                    {"type": "image", "source": {"type": "base64",
                                                 "media_type": "image/jpeg", "data": encoded_string.decode('utf-8')}}
                ]
            }
        ]
    })

    response = bedrock_runtime.invoke_model_with_response_stream(
        body=body, modelId=model_id)

    for event in response.get("body"):
        chunk = json.loads(event["chunk"]["bytes"])

        if chunk['type'] == 'message_delta':
            print(f"\nStop reason: {chunk['delta']['stop_reason']}")
            print(f"Stop sequence: {chunk['delta']['stop_sequence']}")
            print(f"Output tokens: {chunk['usage']['output_tokens']}")

        if chunk['type'] == 'content_block_delta':
            if chunk['delta']['type'] == 'text_delta':
                print(chunk['delta']['text'], end="")


def main():
    """
    Entrypoint for Anthropic Claude Sonnet multimodal prompt example.
    """

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What can you tell me about this image?"
    image = "/path/to/image"
    max_tokens = 100

    try:

        bedrock_runtime = boto3.client('bedrock-runtime')

        stream_multi_modal_prompt(
            bedrock_runtime, model_id, input_text, image, max_tokens)

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))


if __name__ == "__main__":
    main()
```

# Supported models
<a name="claude-messages-supported-models"></a>

You can use the Messages API with the following Anthropic Claude models.
+ Anthropic Claude Opus 4.5
+ Anthropic Claude Opus 4.1
+ Anthropic Claude Opus 4 
+ Anthropic Claude Sonnet 4.5 
+ Anthropic Claude Haiku 4.5 
+ Anthropic Claude Sonnet 4 
+ Anthropic Claude 3.7 Sonnet 
+ Anthropic Claude 3.5 Sonnet v2 
+ Anthropic Claude 3.5 Sonnet 
+ Anthropic Claude 3 Opus 
+ Anthropic Claude 3 Sonnet 
+ Anthropic Claude 3 Haiku 
+ Anthropic Claude 2 v2.1 
+ Anthropic Claude 2 v2 
+ Anthropic Claude Instant v1.2

# AI21 Labs models
<a name="model-parameters-ai21"></a>

This section describes the request parameters and response fields for AI21 Labs models. Use this information to make inference calls to AI21 Labs models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call AI21 Labs models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific AI21 Labs model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that AI21 Labs models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which Amazon Bedrock features the AI21 Labs models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which AWS Regions that AI21 Labs models are available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with AI21 Labs models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For AI21 Labs specific prompt information, see the [AI21 Labs prompt engineering guide](https://docs.ai21.com/docs/prompt-engineering).

**Topics**
+ [AI21 Labs Jurassic-2 models](model-parameters-jurassic2.md)
+ [AI21 Labs Jamba models](model-parameters-jamba.md)

# AI21 Labs Jurassic-2 models
<a name="model-parameters-jurassic2"></a>

This section provides inference parameters and a code example for using AI21 Labs Jurassic-2 models.

**Topics**
+ [Inference parameters](#model-parameters-jurassic2-request-response)
+ [Code example](#api-inference-examples-a2i-jurassic)

## Inference parameters
<a name="model-parameters-jurassic2-request-response"></a>

The AI21 Labs Jurassic-2 models support the following inference parameters.

**Topics**
+ [Randomness and Diversity](#model-parameters-jurassic2-random)
+ [Length](#model-parameters-jurassic2-length)
+ [Repetitions](#model-parameters-jurassic2-reps)
+ [Model invocation request body field](#model-parameters-jurassic2-request-body)
+ [Model invocation response body field](#model-parameters-jurassic2-response-body)

### Randomness and Diversity
<a name="model-parameters-jurassic2-random"></a>

The AI21 Labs Jurassic-2 models support the following parameters to control randomness and diversity in the response.
+ **Temperature** (`temperature`) – Use a lower value to decrease randomness in the response.
+ **Top P** (`topP`) – Use a lower value to ignore less probable options.

### Length
<a name="model-parameters-jurassic2-length"></a>

The AI21 Labs Jurassic-2 models support the following parameters to control the length of the generated response.
+ **Max completion length** (`maxTokens`) – Specify the maximum number of tokens to use in the generated response.
+ **Stop sequences** (`stopSequences`) – Configure stop sequences that the model recognizes and after which it stops generating further tokens. In the Amazon Bedrock console, press the Enter key to insert a newline character in a stop sequence, and press the Tab key to finish entering a stop sequence.

### Repetitions
<a name="model-parameters-jurassic2-reps"></a>

The AI21 Labs Jurassic-2 models support the following parameters to control repetition in the generated response.
+ **Presence penalty** (`presencePenalty`) – Use a higher value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion.
+ **Count penalty** (`countPenalty`) – Use a higher value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion. Proportional to the number of appearances.
+ **Frequency penalty** (`frequencyPenalty`) – Use a high value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion. The value is proportional to the frequency of the token appearances (normalized to text length).
+ **Penalize special tokens** – Reduce the probability of repetition of special characters. The default values are `true`.
  + **Whitespaces** (`applyToWhitespaces`) – A `true` value applies the penalty to whitespaces and new lines.
  + **Punctuations** (`applyToPunctuation`) – A `true` value applies the penalty to punctuation.
  + **Numbers** (`applyToNumbers`) – A `true` value applies the penalty to numbers.
  + **Stop words** (`applyToStopwords`) – A `true` value applies the penalty to stop words.
  + **Emojis** (`applyToEmojis`) – A `true` value excludes emojis from the penalty.

### Model invocation request body field
<a name="model-parameters-jurassic2-request-body"></a>

When you make an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) call using an AI21 Labs model, fill the `body` field with a JSON object that conforms to the one below. Enter the prompt in the `prompt` field.

```
{
    "prompt": string,
    "temperature": float,
    "topP": float,
    "maxTokens": int,
    "stopSequences": [string],
    "countPenalty": {
        "scale": float
    },
    "presencePenalty": {
        "scale": float
    },
    "frequencyPenalty": {
        "scale": float
    }
}
```

To penalize special tokens, add those fields to any of the penalty objects. For example, you can modify the `countPenalty` field as follows.

```
"countPenalty": {
    "scale": float,
    "applyToWhitespaces": boolean,
    "applyToPunctuations": boolean,
    "applyToNumbers": boolean,
    "applyToStopwords": boolean,
    "applyToEmojis": boolean
}
```
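As a sketch of how these fields fit together, the following Python builds a complete Jurassic-2 request body with a count penalty applied to special tokens. The parameter values and the prompt are illustrative assumptions, not recommendations.

```python
import json

# Illustrative Jurassic-2 request body with a count penalty that also
# covers special tokens. All values here are example settings.
body = json.dumps({
    "prompt": "Write a short product description for a reusable water bottle.",
    "maxTokens": 200,
    "temperature": 0.7,
    "topP": 0.9,
    "stopSequences": ["##"],
    "countPenalty": {
        "scale": 0.5,
        "applyToWhitespaces": True,
        "applyToPunctuations": False,
        "applyToNumbers": True,
        "applyToStopwords": True,
        "applyToEmojis": True
    }
})

# The serialized string is then passed as the `body` argument of InvokeModel,
# for example: client.invoke_model(body=body, modelId="ai21.j2-mid-v1")
```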

The following table shows the minimum, maximum, and default values for the numerical parameters.


****  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-jurassic2.html)

### Model invocation response body field
<a name="model-parameters-jurassic2-response-body"></a>

For information about the format of the `body` field in the response, see [https://docs.ai21.com/reference/j2-complete-api-ref](https://docs.ai21.com/reference/j2-complete-api-ref).

**Note**  
Amazon Bedrock returns the response identifier (`id`) as an integer value.

## Code example
<a name="api-inference-examples-a2i-jurassic"></a>

This example shows how to call the *AI21 Labs Jurassic-2 Mid* model.

```
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    "prompt": "Translate to spanish: 'Amazon Bedrock is the easiest way to build and scale generative AI applications with base models (FMs)'.", 
    "maxTokens": 200,
    "temperature": 0.5,
    "topP": 0.5
})

modelId = 'ai21.j2-mid-v1'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(
    body=body, 
    modelId=modelId, 
    accept=accept, 
    contentType=contentType
)

response_body = json.loads(response.get('body').read())

# text
print(response_body.get('completions')[0].get('data').get('text'))
```

# AI21 Labs Jamba models
<a name="model-parameters-jamba"></a>

This section provides inference parameters and a code example for using AI21 Labs Jamba models.

**Topics**
+ [Required fields](#model-parameters-jamba-required-fields)
+ [Inference parameters](#model-parameters-jamba-request-response)
+ [Model invocation request body field](#model-parameters-jamba-request-body)
+ [Model invocation response body field](#model-parameters-jamba-response-body)
+ [Code example](#api-inference-examples-a2i-jamba)
+ [Code example for Jamba 1.5 Large](#api-inference-examples-a2i-jamba15-large)

## Required fields
<a name="model-parameters-jamba-required-fields"></a>

The AI21 Labs Jamba models support the following required fields:
+ **Messages** (`messages`) – The previous messages in this chat, from oldest (index 0) to newest. The list must contain at least one user or assistant message. Include both user inputs and system responses. The maximum total size for the list is about 256K tokens. Each message includes the following members:
  + **Role** (`role`) – The role of the message author. One of the following values:
    + **User** (`user`) – Input provided by the user. Any instructions given here that conflict with instructions given in the `system` prompt take precedence over the `system` prompt instructions.
    + **Assistant** (`assistant`) – Response generated by the model.
    + **System** (`system`) – Initial instructions provided to the system to give general guidance on the tone and voice of the generated message. An initial system message is optional but recommended to guide the tone of the chat. For example, "You are a helpful chatbot with a background in earth sciences and a charming French accent."
  + **Content** (`content`) – The content of the message.

## Inference parameters
<a name="model-parameters-jamba-request-response"></a>

The AI21 Labs Jamba models support the following inference parameters.

**Topics**
+ [Randomness and Diversity](#model-parameters-jamba-random)
+ [Length](#model-parameters-jamba-length)
+ [Repetitions](#model-parameters-jamba-reps)

### Randomness and Diversity
<a name="model-parameters-jamba-random"></a>

The AI21 Labs Jamba models support the following parameters to control randomness and diversity in the response.
+ **Temperature** (`temperature`) – How much variation to provide in each answer. Setting this value to 0 guarantees the same response to the same question every time. Setting a higher value encourages more variation. Modifies the distribution from which tokens are sampled. Default: 1.0, Range: 0.0 – 2.0
+ **Top P** (`top_p`) – Limit the pool of next tokens in each step to the top N percentile of possible tokens, where 1.0 means the pool of all possible tokens, and 0.01 means the pool of only the most likely next tokens.

### Length
<a name="model-parameters-jamba-length"></a>

The AI21 Labs Jamba models support the following parameters to control the length of the generated response.
+ **Max completion length** (`max_tokens`) – The maximum number of tokens to allow for each generated response message. Typically the best way to limit output length is by providing a length limit in the system prompt (for example, "limit your answers to three sentences"). Default: 4096, Range: 0 – 4096.
+ **Stop sequences** (`stop`) – End the message when the model generates one of these strings. The stop sequence is not included in the generated message. Each sequence can be up to 64K long, and can contain newlines as `\n` characters.

  Examples:
  + Single stop string with a word and a period: "monkeys."
  + Multiple stop strings and a newline: ["cat", "dog", " .", "####", "\n"]
+ **Number of responses** (`n`) – How many chat responses to generate. Note that `n` must be 1 for streaming responses. If `n` is set to larger than 1, setting `temperature=0` will always fail because all answers are guaranteed to be duplicates. Default: 1, Range: 1 – 16

### Repetitions
<a name="model-parameters-jamba-reps"></a>

The AI21 Labs Jamba models support the following parameters to control repetition in the generated response.
+ **Frequency Penalty** (`frequency_penalty`) – Reduce the frequency of repeated words within a single response message by increasing this number. This penalty gradually increases the more times a word appears during response generation. Setting this to 2.0 will produce a string with few, if any, repeated words.
+ **Presence Penalty** (`presence_penalty`) – Reduce the frequency of repeated words within a single message by increasing this number. Unlike frequency penalty, presence penalty is the same no matter how many times a word appears.
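As a sketch of how these inference parameters combine in a single request, the following Python builds a Jamba request body with the length, randomness, and repetition parameters described above. The values are example settings, not recommended defaults.

```python
import json

# Illustrative Jamba request body combining the inference parameters
# described above. All values are example settings.
body = {
    "messages": [
        {"role": "user", "content": "Suggest three names for a coffee shop."}
    ],
    "max_tokens": 256,            # cap on tokens per generated message
    "temperature": 0.8,           # higher values encourage more variation
    "top_p": 0.9,                 # limit sampling to the top 90th percentile
    "stop": ["###"],              # end generation at this string
    "n": 1,                       # must be 1 for streaming responses
    "frequency_penalty": 1.0,     # penalty grows with each repetition
    "presence_penalty": 0.5       # flat penalty once a word has appeared
}

# Serialize for the `body` field of an InvokeModel call.
request = json.dumps(body)
```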

## Model invocation request body field
<a name="model-parameters-jamba-request-body"></a>

When you make an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) call using an AI21 Labs model, fill the `body` field with a JSON object that conforms to the one below. Enter the prompt in the `prompt` field.

```
{
  "messages": [
    {
      "role":"system", // Non-printing contextual information for the model
      "content":"You are a helpful history teacher. You are kind and you respond with helpful content in a professional manner. Limit your answers to three sentences. Your listener is a high school student."
    },
    {
      "role":"user", // The question we want answered.
      "content":"Who was the first emperor of rome?"
    }
  ],
  "n":1 // Limit response to one answer
}
```

## Model invocation response body field
<a name="model-parameters-jamba-response-body"></a>

For information about the format of the `body` field in the response, see [https://docs.ai21.com/reference/jamba-instruct-api#response-details](https://docs.ai21.com/reference/jamba-instruct-api#response-details).

## Code example
<a name="api-inference-examples-a2i-jamba"></a>

This example shows how to call the *AI21 Labs Jamba-Instruct* model.

**`invoke_model`**

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1') 
response = bedrock.invoke_model( 
        modelId='ai21.jamba-instruct-v1:0', 
        body=json.dumps({
            'messages': [ 
                { 
                    'role': 'user', 
                    'content': 'which llm are you?' 
                } 
             ], 
         }) 
       ) 

print(json.dumps(json.loads(response['body'].read()), indent=4))
```

**converse**

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-east-1')
response = bedrock.converse( 
    modelId='ai21.jamba-instruct-v1:0', 
    messages=[ 
        { 
            'role': 'user', 
            'content': [ 
                { 
                    'text': 'which llm are you?' 
                } 
             ] 
          } 
     ] 
  ) 

print(response['output']['message']['content'][0]['text'])
```

## Code example for Jamba 1.5 Large
<a name="api-inference-examples-a2i-jamba15-large"></a>

This example shows how to call the *AI21 Labs Jamba 1.5 Large* model.

**`invoke_model`**

```
POST https://bedrock-runtime.us-east-1.amazonaws.com/model/ai21.jamba-1-5-large-v1:0/invoke HTTP/1.1
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful chatbot with a background in earth sciences and a charming French accent."
    },
    {
      "role": "user",
      "content": "What are the main causes of earthquakes?"
    }
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 0.9,
  "stop": ["###"],
  "n": 1
}
```
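The same request can be sketched with the AWS SDK for Python (Boto3). The helper below is an illustrative assumption: it presumes AWS credentials and a Region are configured and that model access is enabled, and the response layout follows AI21's chat format documented at the response-details link above.

```python
import json

# Build the same request body as the raw HTTP example above.
body = json.dumps({
    "messages": [
        {"role": "system",
         "content": "You are a helpful chatbot with a background in earth "
                    "sciences and a charming French accent."},
        {"role": "user", "content": "What are the main causes of earthquakes?"}
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["###"],
    "n": 1
})

def invoke_jamba_large(request_body):
    """Send the request with InvokeModel and return the parsed response body.

    Requires configured AWS credentials and model access in Amazon Bedrock.
    """
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="ai21.jamba-1-5-large-v1:0", body=request_body)
    return json.loads(response["body"].read())
```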

# Cohere models
<a name="model-parameters-cohere"></a>

This section describes the request parameters and response fields for Cohere models. Use this information to make inference calls to Cohere models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Cohere models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Cohere model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Cohere models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which Amazon Bedrock features the Cohere models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which AWS Regions that Cohere models are available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Cohere models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Cohere specific prompt information, see the [Cohere prompt engineering guide](https://txt.cohere.com/how-to-train-your-pet-llm-prompt-engineering).

**Topics**
+ [Cohere Command models](model-parameters-cohere-command.md)
+ [Cohere Embed and Cohere Embed v4 models](model-parameters-embed.md)
+ [Cohere Command R and Command R+ models](model-parameters-cohere-command-r-plus.md)

# Cohere Command models
<a name="model-parameters-cohere-command"></a>

You make inference requests to a Cohere Command model with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming). You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

**Topics**
+ [Request and Response](#model-parameters-cohere-command-request-response)
+ [Code example](#api-inference-examples-cohere-command)

## Request and Response
<a name="model-parameters-cohere-command-request-response"></a>

------
#### [ Request ]

The Cohere Command models have the following inference parameters. 

```
{
    "prompt": string,
    "temperature": float,
    "p": float,
    "k": float,
    "max_tokens": int,
    "stop_sequences": [string],
    "return_likelihoods": "GENERATION|ALL|NONE",
    "stream": boolean,
    "num_generations": int,
    "logit_bias": {token_id: bias},
    "truncate": "NONE|START|END"
}
```

The following are required parameters.
+ **prompt** – (Required) The input text that serves as the starting point for generating the response.

  The following are text per call and character limits.

**Texts per call**  
    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)

**Characters**  
    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)

The following are optional parameters.
+ **return_likelihoods** – Specify how and if the token likelihoods are returned with the response. You can specify the following options. 
  + `GENERATION` – Only return likelihoods for generated tokens.
  + `ALL` – Return likelihoods for all tokens.
  + `NONE` – (Default) Don't return any likelihoods.
+ **stream** – (Required to support streaming) Specify `true` to return the response piece-by-piece in real time and `false` to return the complete response after the process finishes.
+ **logit_bias** – Prevents the model from generating unwanted tokens or incentivizes the model to include desired tokens. The format is `{token_id: bias}` where bias is a float between -10 and 10. Tokens can be obtained from text using any tokenization service, such as Cohere’s Tokenize endpoint. For more information, see [Cohere documentation](https://docs.cohere.com/docs).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **num_generations** – The maximum number of generations that the model should return.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **truncate** – Specifies how the API handles inputs longer than the maximum token length. Use one of the following:
  + `NONE` – Returns an error when the input exceeds the maximum input token length. 
  + `START` – Discard the start of the input. 
  + `END` – (Default) Discards the end of the input.

  If you specify `START` or `END`, the model discards the input until the remaining input is exactly the maximum input token length for the model.
+ **temperature** – Use a lower value to decrease randomness in the response.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **p** – Top P. Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable. If both `p` and `k` are enabled, `p` acts after `k`.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **k** – Top K. Specify the number of token choices the model uses to generate the next token. If both `p` and `k` are enabled, `p` acts after `k`.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **max_tokens** – Specify the maximum number of tokens to use in the generated response.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command.html)
+ **stop_sequences** – Configure up to four sequences that the model recognizes. After a stop sequence, the model stops generating further tokens. The returned text doesn't contain the stop sequence.
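The full code example later in this section doesn't exercise the token-level options, so the following sketch shows a request body that uses `logit_bias` and `truncate`. The token ID `5678` is a made-up placeholder; real token IDs come from a tokenization service such as Cohere's Tokenize endpoint.

```python
import json

# Illustrative Cohere Command request body using the optional
# logit_bias and truncate parameters described above.
body = json.dumps({
    "prompt": "Write a tagline for a hiking club.",
    "max_tokens": 100,
    "temperature": 0.7,
    "logit_bias": {"5678": -5.0},   # discourage this (placeholder) token
    "truncate": "END",              # discard the end of over-length input
    "return_likelihoods": "GENERATION"
})

# Pass the serialized string as the `body` argument of InvokeModel,
# for example: client.invoke_model(body=body, modelId="cohere.command-text-v14")
```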

------
#### [ Response ]

The response has the following possible fields:

```
{
    "generations": [
        {
            "finish_reason": "COMPLETE | MAX_TOKENS | ERROR | ERROR_TOXIC",
            "id": string,
            "text": string,
            "likelihood" : float,
            "token_likelihoods" : [{"token" : string, "likelihood": float}],
            "is_finished" : true | false,
            "index" : integer
           
        }
    ],
    "id": string,
    "prompt": string
}
```
+ `generations` — A list of generated results along with the likelihoods for tokens requested. (Always returned). Each generation object in the list contains the following fields.
  + `id` — An identifier for the generation. (Always returned).
  + `likelihood` — The likelihood of the output. The value is the average of the token likelihoods in `token_likelihoods`. Returned if you specify the `return_likelihoods` input parameter.
  + `token_likelihoods` — An array of per token likelihoods. Returned if you specify the `return_likelihoods` input parameter.
  + `finish_reason` — The reason why the model finished generating tokens. `COMPLETE` – the model sent back a finished reply. `MAX_TOKENS` – the reply was cut off because the model reached the maximum number of tokens for its context length. `ERROR` – something went wrong when generating the reply. `ERROR_TOXIC` – the model generated a reply that was deemed toxic. `finish_reason` is returned only when `is_finished`=`true`. (Not always returned.) 
  + `is_finished` — A boolean field used only when `stream` is `true`, signifying whether or not there are additional tokens that will be generated as part of the streaming response. (Not always returned)
  + `text` — The generated text.
  + `index` — In a streaming response, use to determine which generation a given token belongs to. When only one response is streamed, all tokens belong to the same generation and index is not returned. `index` therefore is only returned in a streaming request with a value for `num_generations` that is larger than one.
+ `prompt` — The prompt from the input request (always returned).
+ `id` — An identifier for the request (always returned).

For more information, see [Generate](https://docs.cohere.com/reference/generate-1) in the Cohere documentation.

------

## Code example
<a name="api-inference-examples-cohere-command"></a>

This example shows how to call the *Cohere Command* model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text using a Cohere model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using a Cohere model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating text with Cohere model %s", model_id)

    accept = 'application/json'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated text with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = 'cohere.command-text-v14'
    prompt = """Summarize this dialogue: 
"Customer: Please connect me with a support agent.
AI: Hi there, how can I assist you today?
Customer: I forgot my password and lost access to the email affiliated to my account. Can you please help me?
AI: Yes of course. First I'll need to confirm your identity and then I can connect you with one of our support agents.
"""
    try:
        body = json.dumps({
            "prompt": prompt,
            "max_tokens": 200,
            "temperature": 0.6,
            "p": 1,
            "k": 0,
            "num_generations": 2,
            "return_likelihoods": "GENERATION"
        })
        response = generate_text(model_id=model_id,
                                 body=body)

        response_body = json.loads(response.get('body').read())
        generations = response_body.get('generations')

        for index, generation in enumerate(generations):

            print(f"Generation {index + 1}\n------------")
            print(f"Text:\n {generation['text']}\n")
            if 'likelihood' in generation:
                print(f"Likelihood:\n {generation['likelihood']}\n")
            
            print(f"Reason: {generation['finish_reason']}\n\n")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occured: " +
              format(message))
    else:
        print(f"Finished generating text with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

# Cohere Embed and Cohere Embed v4 models
<a name="model-parameters-embed"></a>

You make inference requests to an Embed model with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

**Note**  
Amazon Bedrock doesn't support streaming responses from Cohere Embed models.

**Topics**
+ [Cohere Embed v4](model-parameters-embed-v4.md)
+ [Cohere Embed v3](model-parameters-embed-v3.md)

# Cohere Embed v4
<a name="model-parameters-embed-v4"></a>

Cohere Embed v4 is a multimodal embedding model that supports both text and image inputs. It can process interleaved text and image content, making it ideal for document understanding, visual search, and multimodal retrieval applications. The model supports various embedding types including float, int8, uint8, binary, and ubinary formats, with configurable output dimensions from 256 to 1536.

The model ID for Cohere Embed v4 is `cohere.embed-v4`.

**Additional usage notes**  

+ **Context length:** Up to ~128K tokens supported; for RAG, smaller chunks often improve retrieval and cost.
+ **Image sizing:** Images > 2,458,624 pixels are downsampled to that size; images < 3,136 pixels are upsampled.
+ **Interleaved inputs:** Prefer `inputs[].content` for page-like multimodal content so text context (e.g., filename, entities) travels with the image.

**Topics**
+ [Request and Response](#model-parameters-embed-v4-request-response)
+ [Request and response for different input_types](#api-inference-examples-cohere-embed-v4)
+ [Code Examples](#code-examples-cohere-embed-v4)

## Request and Response
<a name="model-parameters-embed-v4-request-response"></a>

------
#### [ Request ]

Content type: application/json

```
{
  "input_type": "search_document | search_query | classification | clustering",
  "texts": ["..."],                      // optional; text-only
  "images": ["data:<mime>;base64,..."],  // optional; image-only
  "inputs": [
    { "content": [
        { "type": "text",      "text": "..." },
        { "type": "image_url", "image_url": {"url": "data:<mime>;base64,..."} }
      ]
    }
  ],                                     // optional; mixed (interleaved) text+image
  "embedding_types": ["float" | "int8" | "uint8" | "binary" | "ubinary"],
  "output_dimension": 256 | 512 | 1024 | 1536,
  "max_tokens": 128000,
  "truncate": "NONE | LEFT | RIGHT"
}
```

**Parameters**  

+ **input_type** (required) – Adds special tokens to distinguish use cases. Allowed: `search_document`, `search_query`, `classification`, `clustering`. For search/RAG, embed your corpus with `search_document` and queries with `search_query`.
+ **texts** (optional) – Array of strings to embed. Max 96 per call. If you use `texts`, don't send `images` in the same call.
+ **images** (optional) – Array of data-URI base64 images to embed. Max 96 per call. Don't send `texts` and `images` together. (Use `inputs` for interleaved.)
+ **inputs** (optional; mixed/fused modality) – A list where each item has a content list of parts. Each part is `{ "type": "text", "text": ... }` or `{ "type": "image_url", "image_url": {"url": "data:<mime>;base64,..."} }`. Send interleaved page-like content here (e.g., PDF page image + caption/metadata). Max 96 items.
+ **embedding_types** (optional) – One or more of: `float`, `int8`, `uint8`, `binary`, `ubinary`. If omitted, returns float embeddings.
+ **output_dimension** (optional) – Select vector length. Allowed: `256`, `512`, `1024`, `1536` (default `1536` if unspecified).
+ **max_tokens** (optional) – Truncation budget per input object. The model supports up to ~128,000 tokens; chunk smaller for RAG as appropriate.
+ **truncate** (optional) – How to handle over-length inputs: `LEFT` drops tokens from the start; `RIGHT` drops from the end; `NONE` returns an error if the input exceeds the limit.

**Limits & sizing**  

+ Images per request: up to 96. Each original image file must be in PNG, JPEG, WebP, or GIF format and can be up to 5 MB in size.
+ Request size cap: 20 MB total payload.
+ Maximum input tokens: 128,000. Image files are converted into tokens, and the total token count (text plus images) must stay under 128,000.
+ Images: max 2,458,624 pixels before downsampling; images smaller than 3,136 pixels are upsampled. Provide images as `data:<mime>;base64,...`.
+ Token accounting (per `inputs` item):
  + Tokens from an image input ≈ (image pixels ÷ 784) x 4
  + Tokens from an interleaved text and image input = (image pixels ÷ 784) x 4 + (text tokens)
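
The token-accounting formula above can be sketched as a quick budget check. The 784-pixel ratio comes from the formula in this list; the helper name is illustrative, not part of any SDK:

```python
def estimate_item_tokens(image_width: int, image_height: int, text_tokens: int = 0) -> int:
    """Approximate tokens for one `inputs` item using the documented
    formula: (image pixels / 784) * 4, plus any accompanying text tokens."""
    image_tokens = (image_width * image_height) // 784 * 4
    return image_tokens + text_tokens

# A 1568x1568 page image with a ~40-token caption:
total = estimate_item_tokens(1568, 1568, text_tokens=40)
assert total < 128_000  # stay under the model's input-token limit
```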

**Tip:** For PDFs, convert each page to an image and send via `inputs` along with page metadata (e.g., file name, entities) in adjacent text parts.
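
As one possible shape for that workflow, the sketch below assembles an `inputs` item from an already-rendered page image. The base64 string is a stand-in, and `build_page_input` is an illustrative helper, not part of any SDK:

```python
import json

def build_page_input(page_png_base64: str, file_name: str, caption: str) -> dict:
    """One interleaved `inputs` item: metadata text parts next to the page image."""
    return {
        "content": [
            {"type": "text", "text": f"File: {file_name}"},
            {"type": "text", "text": caption},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{page_png_base64}"}},
        ]
    }

body = json.dumps({
    "input_type": "search_document",
    "inputs": [build_page_input("iVBORw0KGgo...", "q3-report.pdf",
                                "Quarterly ARR growth chart")],
    "embedding_types": ["float"],
})
```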

------
#### [ Response ]

Content type: application/json

If you requested a single embedding type (e.g., only `float`):

```
{
  "id": "string",
  "embeddings": [[ /* length = output_dimension */ ]],
  "response_type": "embeddings_floats",
  "texts": ["..."],                    // present if text was provided
  "inputs": [ { "content": [ ... ] } ] // present if 'inputs' was used
```

If you requested multiple embedding types (e.g., `["float","int8"]`):

```
{
  "id": "string",
  "embeddings": {
    "float": [[ ... ]],
    "int8":  [[ ... ]]
  },
  "response_type": "embeddings_by_type",
  "texts": ["..."],     // when text used
  "inputs": [ { "content": [ ... ] } ] // when 'inputs' used
}
```
+ The number of returned vectors matches the length of your `texts` array or the number of `inputs` items.
+ Each vector's length equals `output_dimension` (default `1536`).
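
Because `embeddings` is a bare list for `embeddings_floats` but a dict for `embeddings_by_type`, client code that handles both shapes can normalize them first. A minimal sketch (the helper name is illustrative):

```python
def extract_vectors(response_body: dict) -> dict:
    """Normalize both response shapes to {embedding_type: [vectors]}."""
    embeddings = response_body["embeddings"]
    if response_body.get("response_type") == "embeddings_by_type":
        return embeddings          # already a dict keyed by embedding type
    return {"float": embeddings}   # embeddings_floats: a bare list of vectors
```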

------

## Request and response for different input types
<a name="api-inference-examples-cohere-embed-v4"></a>

**A) Interleaved page (image + caption) with compact int8 vectors**

**Request**  


```
{
  "input_type": "search_document",
  "inputs": [
    {
      "content": [
        { "type": "text", "text": "Quarterly ARR growth chart; outlier in Q3." },
        { "type": "image_url", "image_url": {"url": "data:image/png;base64,{{BASE64_PAGE_IMG}}"} }
      ]
    }
  ],
  "embedding_types": ["int8"],
  "output_dimension": 512,
  "truncate": "RIGHT",
  "max_tokens": 128000
}
```

**Response (truncated)**  


```
{
  "id": "836a33cc-61ec-4e65-afaf-c4628171a315",
  "embeddings": { "int8": [[ 7, -3, ... ]] },
  "response_type": "embeddings_by_type",
  "inputs": [
    { "content": [
      { "type": "text", "text": "Quarterly ARR growth chart; outlier in Q3." },
      { "type": "image_url", "image_url": {"url": "data:image/png;base64,{{...}}"} }
    ] }
  ]
}
```

**B) Text-only corpus indexing (default float, 1536-dim)**

**Request**  


```
{
  "input_type": "search_document",
  "texts": [
    "RAG system design patterns for insurance claims",
    "Actuarial loss triangles and reserving primer"
  ]
}
```

**Response (sample)**  


```
{
  "response_type": "embeddings_floats",
  "embeddings": [
    [0.0135, -0.0272, ...],   // length 1536
    [0.0047,  0.0189, ...]
  ]
}
```

## Code Examples
<a name="code-examples-cohere-embed-v4"></a>

------
#### [ Text input ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings using the Cohere Embed v4 model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text_embeddings(model_id, body, region_name):
    """
    Generate text embedding by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating text embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    model_id = 'cohere.embed-v4:0'
    text1 = "hello world"
    text2 = "this is a test"
    input_type = "search_document"
    embedding_types = ["float"]

    try:
        body = json.dumps({
            "texts": [
                text1,
                text2],
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_text_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

        print("Embeddings")
        embeddings = response_body.get('embeddings')
        for i, embedding_type in enumerate(embeddings):
            print(f"\t{embedding_type} Embeddings:")
            print(f"\t{embeddings[embedding_type]}")

        print("Texts")
        for i, text in enumerate(response_body.get('texts')):
            print(f"\tText {i}: {text}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))
    else:
        print(
            f"Finished generating text embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Mixed modalities ]

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate image embeddings using the Cohere Embed v4 model.
"""
import json
import logging
import boto3
import base64


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def get_base64_image_uri(image_file_path: str, image_mime_type: str):
    with open(image_file_path, "rb") as image_file:
        image_bytes = image_file.read()
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{image_mime_type};base64,{base64_image}"


def generate_embeddings(model_id, body, region_name):
    """
    Generate image embedding by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating image embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    image_file_path = "image.jpg"
    image_mime_type = "image/jpeg"
    text = "hello world"

    model_id = 'cohere.embed-v4:0'
    input_type = "search_document"
    image_base64_uri = get_base64_image_uri(image_file_path, image_mime_type)
    embedding_types = ["int8","float"]

    try:
        body = json.dumps({
            "inputs": [
                {
                  "content": [
                    { "type": "text", "text": text },
                    { "type": "image_url", "image_url": {"url": image_base64_uri} }
                  ]
                }
              ],
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

        print("Embeddings")
        embeddings = response_body.get('embeddings')
        for i, embedding_type in enumerate(embeddings):
            print(f"\t{embedding_type} Embeddings:")
            print(f"\t{embeddings[embedding_type]}")

        print("Inputs")
        for i, item in enumerate(response_body.get('inputs')):
            print(f"\tInput {i}: {item}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))
    else:
        print(
            f"Finished generating embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

------

# Cohere Embed v3
<a name="model-parameters-embed-v3"></a>

**Topics**
+ [

## Request and Response
](#model-parameters-embed-v3-request-response)
+ [

## Code example
](#api-inference-examples-cohere-embed-v3)

## Request and Response
<a name="model-parameters-embed-v3-request-response"></a>

------
#### [ Request ]

The Cohere Embed models have the following inference parameters. 

```
{
    "input_type": "search_document|search_query|classification|clustering|image",
    "texts":[string],
    "images": [image_base64_image_uri],
    "truncate": "NONE|START|END",
    "embedding_types": embedding_types
}
```

The following are required parameters.
+ **texts** – An array of strings for the model to embed. For optimal performance, we recommend reducing the length of each text to less than 512 tokens. 1 token is about 4 characters.

  The following are text per call and character limits.

**Texts per call**  
    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed-v3.html)

**Characters**  
    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed-v3.html)
+ **input\_type** – Prepends special tokens to differentiate each type from one another. You should not mix different types together, except when mixing types for search and retrieval. In this case, embed your corpus with the `search_document` type and embed queries with the `search_query` type. 
  + `search_document` – In search use cases, use `search_document` when you encode documents for embeddings that you store in a vector database.
  + `search_query` – Use `search_query` when querying your vector database to find relevant documents.
  + `classification` – Use `classification` when using embeddings as an input to a text classifier.
  + `clustering` – Use `clustering` to cluster the embeddings.
  + `image` – Use `image` when embedding an image that you pass in the `images` field.
+ **images** – An array of image data URIs for the model to embed. The maximum number of images per call is 1 (that is, the model supports only one image input). The image must be a valid data URI, in image/jpeg or image/png format, and no larger than 5 MB. Provide either `images` or `texts`, but not both.

The following are optional parameters:
+  **truncate** – Specifies how the API handles inputs longer than the maximum token length. Use one of the following:
  + `NONE` – (Default) Returns an error when the input exceeds the maximum input token length. 
  + `START` – Discards the start of the input. 
  + `END` – Discards the end of the input.

  If you specify `START` or `END`, the model discards the input until the remaining input is exactly the maximum input token length for the model.
+ **embedding\_types** – Specifies the types of embeddings you want to have returned. Optional and default is `None`, which returns the `Embed Floats` response type. Can be one or more of the following types:
  + `float` – Use this value to return the default float embeddings. 
  + `int8` – Use this value to return signed int8 embeddings. 
  + `uint8` – Use this value to return unsigned int8 embeddings. 
  + `binary` – Use this value to return signed binary embeddings. 
  + `ubinary` – Use this value to return unsigned binary embeddings. 

For more information, see [https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed) in the Cohere documentation.
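
For the search pattern described under `input_type` (corpus embedded as `search_document`, queries as `search_query`), documents are typically ranked by cosine similarity between the returned vectors. A self-contained sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indexes ordered best match first."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                  reverse=True)
```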

------
#### [ Response ]

The `body` response from a call to `InvokeModel` is the following:

```
{
    "embeddings": [
        [ /* array of 1024 floats */ ]
    ],
    "id": string,
    "response_type": "embeddings_floats",
    "texts": [string],
    "images": [image_description]
}
```

The `body` response has the following fields:
+ **id** – An identifier for the response. 
+ **response\_type** – The response type. This value is always `embeddings_floats`. 
+ **embeddings** – An array of embeddings, where each embedding is an array of floats with 1024 elements. The length of the `embeddings` array will be the same as the length of the original `texts` array. 
+ **texts** – An array containing the text entries for which embeddings were returned. 
+ **images** – An array of a description for each image input.

  An `image_description` object has the following form:

  ```
  {
      "width": long,
      "height": long,
      "format": string,
      "bit_depth": long
  }
  ```

  If an image was used as input, the `texts` response field is an empty array. The reverse is not true (that is, when `texts` is used, `images` is not present in the response).

For more information, see [https://docs.cohere.com/reference/embed](https://docs.cohere.com/reference/embed).

------

## Code example
<a name="api-inference-examples-cohere-embed-v3"></a>

This example shows how to call the *Cohere Embed English* model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text embeddings using the Cohere Embed English model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text_embeddings(model_id, body, region_name):
    """
    Generate text embedding by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating text embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    model_id = 'cohere.embed-english-v3'
    text1 = "hello world"
    text2 = "this is a test"
    input_type = "search_document"
    embedding_types = ["int8", "float"]

    try:
        body = json.dumps({
            "texts": [
                text1,
                text2],
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_text_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

        print("Embeddings")
        embeddings = response_body.get('embeddings')
        for i, embedding_type in enumerate(embeddings):
            print(f"\t{embedding_type} Embeddings:")
            print(f"\t{embeddings[embedding_type]}")

        print("Texts")
        for i, text in enumerate(response_body.get('texts')):
            print(f"\tText {i}: {text}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))
    else:
        print(
            f"Finished generating text embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

**Image Input**

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate image embeddings using the Cohere Embed English model.
"""
import json
import logging
import boto3
import base64


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

def get_base64_image_uri(image_file_path: str, image_mime_type: str):
    with open(image_file_path, "rb") as image_file:
        image_bytes = image_file.read()
        base64_image = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{image_mime_type};base64,{base64_image}"


def generate_image_embeddings(model_id, body, region_name):
    """
    Generate image embedding by using the Cohere Embed model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
        region_name (str): The AWS region to invoke the model on
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating image embeddings with the Cohere Embed model %s", model_id)

    accept = '*/*'
    content_type = 'application/json'

    bedrock = boto3.client(service_name='bedrock-runtime', region_name=region_name)

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id,
        accept=accept,
        contentType=content_type
    )

    logger.info("Successfully generated embeddings with Cohere model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere Embed example.
    """

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    
    region_name = 'us-east-1'

    image_file_path = "image.jpg"
    image_mime_type = "image/jpeg"

    model_id = 'cohere.embed-english-v3'
    input_type = "image"
    images = [get_base64_image_uri(image_file_path, image_mime_type)]
    embedding_types = ["int8", "float"]

    try:
        body = json.dumps({
            "images": images,
            "input_type": input_type,
            "embedding_types": embedding_types
        })
        
        response = generate_image_embeddings(model_id=model_id, body=body, region_name=region_name)

        response_body = json.loads(response.get('body').read())

        print(f"ID: {response_body.get('id')}")
        print(f"Response type: {response_body.get('response_type')}")

        print("Embeddings")
        embeddings = response_body.get('embeddings')
        for i, embedding_type in enumerate(embeddings):
            print(f"\t{embedding_type} Embeddings:")
            print(f"\t{embeddings[embedding_type]}")

        print("Texts")
        for i, text in enumerate(response_body.get('texts')):
            print(f"\tText {i}: {text}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))
    else:
        print(
            f"Finished generating image embeddings with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

# Cohere Command R and Command R\+ models
<a name="model-parameters-cohere-command-r-plus"></a>

You make inference requests to Cohere Command R and Cohere Command R\+ models with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming). You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

**Tip**  
For conversational applications, we recommend that you use the Converse API. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

**Topics**
+ [

## Request and Response
](#model-parameters-cohere-command-request-response)
+ [

## Code example
](#api-inference-examples-cohere-command-r)

## Request and Response
<a name="model-parameters-cohere-command-request-response"></a>

------
#### [ Request ]

The Cohere Command models have the following inference parameters. 

```
{
    "message": string,
    "chat_history": [
        {
            "role":"USER or CHATBOT",
            "message": string
        }
  
    ],
    "documents": [
        {"title": string, "snippet": string},
    ],
    "search_queries_only" : boolean,
    "preamble" : string,
    "max_tokens": int,
    "temperature": float,
    "p": float,
    "k": float,
    "prompt_truncation" : string,
    "frequency_penalty" : float,
    "presence_penalty" : float,
    "seed" : int,
    "return_prompt" : boolean,
    "tools" : [
        {
            "name": string,
            "description": string,
            "parameter_definitions": {
                "parameter name": {
                    "description": string,
                    "type": string,
                    "required": boolean
                }
            }
        }
    ],
    "tool_results" : [
        {
            "call": {
                "name": string,
                "parameters": {
                "parameter name": string
                }
            },
        "outputs": [
                {
                "text": string
                }
            ]
        }
    ],
    "stop_sequences": [string],
    "raw_prompting" : boolean

}
```

The following are required parameters.
+ **message** – (Required) Text input for the model to respond to.

The following are optional parameters.
+ **chat\_history** – A list of previous messages between the user and the model, meant to give the model conversational context for responding to the user's message. 

  The following are required fields.
  + `role` – The role for the message. Valid values are `USER` or `CHATBOT`.
  + `message` – Text contents of the message.

  The following is example JSON for the `chat_history` field

  ```
  "chat_history": [
  {"role": "USER", "message": "Who discovered gravity?"},
  {"role": "CHATBOT", "message": "The man who is widely credited with discovering gravity is Sir Isaac Newton"}
  ]
  ```
+ **documents** – A list of texts that the model can cite to generate a more accurate reply. Each document is a string-string dictionary. The resulting generation includes citations that reference some of these documents. We recommend that you keep the total word count of the strings in the dictionary to under 300 words. An `_excludes` field (array of strings) can be optionally supplied to omit some key-value pairs from being shown to the model. For more information, see the [Document Mode guide](https://docs.cohere.com/docs/retrieval-augmented-generation-rag#document-mode) in the Cohere documentation. 

  The following is example JSON for the `documents` field.

  ```
  "documents": [
  {"title": "Tall penguins", "snippet": "Emperor penguins are the tallest."},
  {"title": "Penguin habitats", "snippet": "Emperor penguins only live in Antarctica."}
  ]
  ```
+ **search\_queries\_only** – Defaults to `false`. When `true`, the response contains only a list of generated search queries; no search takes place, and no reply from the model to the user's `message` is generated. 
+ **preamble** – Overrides the default preamble for search query generation. Has no effect on tool use generations. 
+ **max\_tokens** – The maximum number of tokens the model should generate as part of the response. Note that setting a low value may result in incomplete generations. Setting `max_tokens` may result in incomplete or no generations when used with the `tools` or `documents` fields.
+ **temperature** – Use a lower value to decrease randomness in the response. Randomness can be further maximized by increasing the value of the `p` parameter.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command-r-plus.html)
+ **p** – Top P. Use a lower value to ignore less probable options.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command-r-plus.html)
+ **k** – Top K. Specify the number of token choices the model uses to generate the next token.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command-r-plus.html)
+ **prompt\_truncation** – Defaults to `OFF`. Dictates how the prompt is constructed. With `prompt_truncation` set to `AUTO_PRESERVE_ORDER`, some elements from `chat_history` and `documents` are dropped to construct a prompt that fits within the model's context length limit. During this process, the order of the documents and chat history is preserved. With `prompt_truncation` set to `OFF`, no elements are dropped. 
+ **frequency\_penalty** – Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command-r-plus.html)
+ **presence\_penalty** – Used to reduce repetitiveness of generated tokens. Similar to `frequency_penalty`, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-cohere-command-r-plus.html)
+ **seed** – If specified, the backend will make a best effort to sample tokens deterministically, such that repeated requests with the same seed and parameters should return the same result. However, determinism cannot be totally guaranteed.
+ **return\_prompt** – Specify `true` to return the full prompt that was sent to the model. The default value is `false`. In the response, the prompt is returned in the `prompt` field.
+ **tools** – A list of available tools (functions) that the model may suggest invoking before producing a text response. When `tools` is passed (without `tool_results`), the `text` field in the response will be `""` and the `tool_calls` field in the response will be populated with a list of tool calls that need to be made. If no calls need to be made, the `tool_calls` array will be empty. 

  For more information, see [Tool Use](https://docs.cohere.com/docs/tool-use) in the Cohere documentation.
**Tip**  
We recommend that you use the Converse API for integrating tool use into your application. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md). 

  The following is example JSON for the `tools` field.

  ```
  [
      {
          "name": "top_song",
          "description": "Get the most popular song played on a radio station.",
          "parameter_definitions": {
              "sign": {
                  "description": "The call sign for the radio station for which you want the most popular song. Example calls signs are WZPZ and WKRP.",
                  "type": "str",
                  "required": true
              }
          }
      }
  ]
  ```

  For more information, see [Single-Step Tool Use (Function Calling)](https://docs.cohere.com/docs/tool-use) in the Cohere documentation.
+ **tool\_results** – A list of results from invoking tools recommended by the model in the previous chat turn. Results are used to produce a text response and are referenced in citations. When using `tool_results`, `tools` must be passed as well. Each `tool_result` contains information about how it was invoked, as well as a list of outputs in the form of dictionaries. Cohere's unique fine-grained citation logic requires the output to be a list. In case the output is just one item, such as `{"status": 200}`, you should still wrap it inside a list. 

  For more information, see [Tool Use](https://docs.cohere.com/docs/tool-use) in the Cohere documentation.

  The following is example JSON for the `tool_results` field.

  ```
  [
      {
          "call": {
              "name": "top_song",
              "parameters": {
                  "sign": "WZPZ"
              }
          },
          "outputs": [
              {
                  "song": "Elemental Hotel"
              }
          ]
      }
  ]
  ```
+ **stop\_sequences** – A list of stop sequences. After a stop sequence is detected, the model stops generating further tokens.
+ **raw\_prompting** – Specify `true` to send the user's `message` to the model without any preprocessing. The default is `false`.
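
When the model returns tool calls, the follow-up request carries both the original `tools` and the matching `tool_results`. The sketch below only constructs that second request body and makes no API call; the function name is illustrative:

```python
import json

def build_tool_result_request(message, tools, tool_calls, outputs_per_call):
    """Pair each tool call from the previous response with its outputs.
    Outputs must always be a list of dictionaries, even for a single item."""
    tool_results = [
        {"call": {"name": call["name"], "parameters": call["parameters"]},
         "outputs": outputs}
        for call, outputs in zip(tool_calls, outputs_per_call)
    ]
    return json.dumps({
        "message": message,
        "tools": tools,  # tools must accompany tool_results
        "tool_results": tool_results,
    })
```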

------
#### [ Response ]

The response has the following possible fields:

```
{
    "response_id": string,
    "text": string,
    "generation_id": string,
    "citations": [
        {
          "start": int,
          "end": int,
          "text": "string",
          "document_ids": [
              "string"
          ]
        }
      ],    
    "finish_reason": string,
    "tool_calls": [
        {
            "name": string,
            "parameters": {
                "parameter name": string
            }
        }
    ],
    "meta": {
        "api_version": {
            "version": string
        },
        "billed_units": {
            "input_tokens": int,
            "output_tokens": int
        }
    }
}
```
+ **response\_id** — Unique identifier for the chat completion.
+ **text** — The model's response to the chat message input. 
+ **generation\_id** — Unique identifier for the chat completion, used with the Feedback endpoint on Cohere's platform. 
+ **citations** — An array of inline citations and associated metadata for the generated reply. Contains the following fields:
  + **start** — The index that the citation begins at, starting from 0.
  + **end** — The index that the citation ends after, starting from 0.
  + **text** — The text that the citation pertains to.
  + **document\_ids** — An array of document IDs that correspond to documents that are cited for the text.
+ **prompt** — The full prompt that was sent to the model. Specify the `return_prompt` field to return this field. 
+ **finish\$1reason** — The reason why the model stopped generating output. Can be any of the following: 
  + **complete** — The completion reached the end of generation token, ensure this is the finish reason for best performance.
  + **error\$1toxic** — The generation could not be completed due to our content filters.
  + **error\$1limit** — The generation could not be completed because the model’s context limit was reached.
  + **error** — The generation could not be completed due to an error.
  + **user\$1cancel** — The generation could not be completed because it was stopped by the user.
  + **max\$1tokens** — The generation could not be completed because the user specified a `max_tokens` limit in the request and this limit was reached. May not result in best performance.
+ **tool\$1calls** – A list of appropriate tools to calls. Only returned if you specify the `tools` input field.

  For more information, see [Tool Use](https://docs.cohere.com/docs/tool-use) in the Cohere documentation.
**Tip**  
We recommend that you use the Converse API for integrating tool use into your application. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md). 

  The following is example JSON for the `tool_calls` field.

  ```
  [
      {
          "name": "top_song",
          "parameters": {
              "sign": "WZPZ"
          }
      }
  ]
  ```
+ **meta** — API usage data (only exists for streaming). 
  + `api_version` — The API version. The version is in the `version` field.
  + `billed_units` — The billed units. Possible values are:
    + `input_tokens` — The number of input tokens that were billed.
    + `output_tokens` — The number of output tokens that were billed.
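Because the `meta` field is only returned when streaming, consuming a streamed Cohere Command R response means parsing each event's chunk bytes as JSON. The following is a minimal sketch; treating each chunk as a JSON object with an optional `text` fragment is an assumption based on the response shape above, not a documented per-chunk contract.

```python
import json


def extract_chunk_text(chunk_bytes):
    """Return the partial text carried by one streamed chunk, or None
    for events (such as the final metadata event) that have no text."""
    event = json.loads(chunk_bytes)
    return event.get("text")


def stream_chat(bedrock_runtime, model_id, body):
    """Yield text fragments from an InvokeModelWithResponseStream call.

    bedrock_runtime is assumed to be a boto3 'bedrock-runtime' client.
    """
    response = bedrock_runtime.invoke_model_with_response_stream(
        body=body, modelId=model_id)
    for event in response["body"]:
        text = extract_chunk_text(event["chunk"]["bytes"])
        if text is not None:
            yield text
```

You could then print the fragments as they arrive with `for fragment in stream_chat(client, model_id, body): print(fragment, end="")`.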

------

## Code example
<a name="api-inference-examples-cohere-command-r"></a>

This example shows how to call the *Cohere Command R* model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use the Cohere Command R model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using a Cohere Command R model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        dict: The response from the model.
    """

    logger.info("Generating text with Cohere model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id
    )

    logger.info(
        "Successfully generated text with Cohere Command R model %s", model_id)

    return response


def main():
    """
    Entrypoint for Cohere example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = 'cohere.command-r-v1:0'
    chat_history = [
        {"role": "USER", "message": "What is an interesting new role in AI if I don't have an ML background?"},
        {"role": "CHATBOT", "message": "You could explore being a prompt engineer!"}
    ]
    message = "What are some skills I should have?"

    try:
        body = json.dumps({
            "message": message,
            "chat_history": chat_history,
            "max_tokens": 2000,
            "temperature": 0.6,
            "p": 0.5,
            "k": 250
        })
        response = generate_text(model_id=model_id,
                                 body=body)

        response_body = json.loads(response.get('body').read())
        response_chat_history = response_body.get('chat_history')
        print('Chat history\n------------')
        for response_message in response_chat_history:
            if 'message' in response_message:
                print(f"Role: {response_message['role']}")
                print(f"Message: {response_message['message']}\n")
        print("Generated text\n--------------")
        print(f"Stop reason: {response_body['finish_reason']}")
        print(f"Response text: \n{response_body['text']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")
    else:
        print(f"Finished generating text with Cohere model {model_id}.")


if __name__ == "__main__":
    main()
```

# DeepSeek models
<a name="model-parameters-deepseek"></a>

DeepSeek’s R1 and V3.1 models are text-to-text models available for use for inferencing through the Invoke API ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)) and the Converse API ([Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)). 

When you make inference calls with DeepSeek’s models, you must include a prompt for the model. For general information about creating prompts for the DeepSeek models that Amazon Bedrock supports, see [DeepSeek prompt guide](https://api-docs.deepseek.com/guides/reasoning_model.html). 

**Note**  
You can't remove request access from the Amazon Titan, Amazon Nova, DeepSeek-R1, Mistral AI, Meta Llama 3 Instruct, and Meta Llama 4 models. You can prevent users from making inference calls to these models by using an IAM policy and specifying the model ID. For more information, see [Deny access for inference of foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-deny-inference).
For optimal response quality with DeepSeek-R1, limit the `max_tokens` parameter to 8,192 tokens or fewer. While the API accepts up to 32,768 tokens, response quality significantly degrades above 8,192 tokens. This aligns with the model's reasoning capabilities as described in the [inference reasoning guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-reasoning.html).

This section describes the request parameters and response fields for DeepSeek models. Use this information to make inference calls to DeepSeek models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) operation. This section also includes Python code examples that show how to call DeepSeek models.

To use a model in an inference operation, you need the model ID for the model. Since this model is invoked through cross-Region inference, you will need to use the [Inference profile ID](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html) as the model ID. For example, for the US, you will use `us.deepseek.r1-v1:0`.
+ Model Name: DeepSeek-R1
+ Text Model

For more information on how to use DeepSeek models with APIs, see [DeepSeek Models](https://deepseek.com/).

**DeepSeek Request and Response**

**Request body**

DeepSeek has the following inference parameters for a Text Completion inference call.

```
{
    "prompt": string,
    "temperature": float, 
    "top_p": float,
    "max_tokens": int,
    "stop": string array
}
```

**Fields:**
+ **prompt** – (string) Required. The text of the prompt to send to the model.
+ **temperature** – (float) Numerical value less than or equal to 1.
+ **top_p** – (float) Numerical value less than or equal to 1.
+ **max_tokens** – (int) Tokens used, minimum of 1 to a maximum of 8,192 tokens for optimal quality. While the API accepts up to 32,768 tokens, response quality significantly degrades above 8,192 tokens.
+ **stop** – (string array) Maximum of 10 items.

**Response body**

DeepSeek has the following response parameters for a Text Completion inference call. This example is a text completion from DeepSeek, and does not return a content reasoning block.

```
{
    "choices": [
        {
            "text": string,
            "stop_reason": string
        }
    ]
}
```

**Fields:**
+ **stop_reason** – (string) The reason why the response stopped generating text. Possible values are:
  + **stop** – The model has finished generating text for the input prompt.
  + **length** – The length of the tokens for the generated text exceeds the value of `max_tokens` in the call to `InvokeModel` (or `InvokeModelWithResponseStream`, if you are streaming output). The response is truncated to `max_tokens` tokens. Increase the value of `max_tokens` and try your request again.

**Example Code**

This example shows how to call the DeepSeek-R1 model.

```
# Use the API to send a text message to DeepSeek-R1.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the cross Region inference profile ID for DeepSeek-R1
model_id = "us.deepseek.r1-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in DeepSeek-R1's instruction format.
formatted_prompt = f"""
<｜begin▁of▁sentence｜><｜User｜>{prompt}<｜Assistant｜><think>\n
"""

body = json.dumps({
    "prompt": formatted_prompt,
    "max_tokens": 512,
    "temperature": 0.5,
    "top_p": 0.9,
})

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=body)

    # Read the response body.
    model_response = json.loads(response["body"].read())
    
    # Extract choices.
    choices = model_response["choices"]
    
    # Print choices.
    for index, choice in enumerate(choices):
        print(f"Choice {index + 1}\n----------")
        print(f"Text:\n{choice['text']}\n")
        print(f"Stop reason: {choice['stop_reason']}\n")
except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)
```

**Converse**

Request Body - Use this request body example to call the Converse API.

```
{
    "modelId": string, # us.deepseek.r1-v1:0
    "system": [
        {
            "text": string
        }
    ],
    "messages": [
        {
            "role": string,
            "content": [
                {
                    "text": string
                }
            ]
        }
    ],
    "inferenceConfig": {
        "temperature": float,
        "topP": float,
        "maxTokens": int,
        "stopSequences": string array
    },
    "guardrailConfig": { 
        "guardrailIdentifier":"string",
        "guardrailVersion": "string",
        "trace": "string"
    }
}
```

**Fields:**
+ **system** – (Optional) The system prompt for the request.
+ **messages** – (Required) The input messages.
  + **role** – The role of the conversation turn. Valid values are `user` and `assistant`.
  + **content** – (Required) The content of the conversation turn, as an array of objects. Each object contains a `type` field, in which you can specify one of the following values:
    + **text** – (Required) If you specify this type, you must include a text field and specify the text prompt as its value.
+ **inferenceConfig** 
  + **temperature** – (Optional) Values: minimum = 0. maximum = 1.
  + **topP** – (Optional) Values: minimum = 0. maximum = 1.
  + **maxTokens** – (Optional) The maximum number of tokens to generate before stopping. Values: minimum = 0. maximum = 32,768.
  + **stopSequences** – (Optional) Custom text sequences that cause the model to stop generating output. Maximum of 10 items.

Response Body - The Converse API returns the following response body.

```
{
    "message": {
        "role" : "assistant",
        "content": [
            {
                "text": string
            },
            {
                "reasoningContent": {
                    "reasoningText": string
                }
            }
        ]
    },
    "stopReason": string,
    "usage": {
        "inputTokens": int,
        "outputTokens": int,
        "totalTokens": int
    },
    "metrics": {
        "latencyMs": int
    }
}
```

**Fields:**
+ **message** – The response returned by the model.
  + **role** – The conversational role of the generated message. The value is always `assistant`.
  + **content** – The content generated by the model, which is returned as an array. There are two types of content:
    + **text** – The text content of the response.
    + **reasoningContent** – (Optional) The reasoning content from the model response.
      + **reasoningText** – The reasoning text from the model response.
+ **stopReason** – The reason why the model stopped generating the response.
  + **end_turn** – The model reached a natural stopping point in the turn.
  + **max_tokens** – The generated text exceeded the value of the `maxTokens` input field or exceeded the maximum number of tokens that the model supports.

Example Code - Here is an example of calling the Converse API with DeepSeek-R1.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use the Converse API with DeepSeek-R1 (on demand).
"""

import logging
import boto3

from botocore.client import Config
from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_conversation(bedrock_client,
                          model_id,
                          system_prompts,
                          messages):
    """
    Sends messages to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_prompts (JSON) : The system prompts for the model to use.
        messages (JSON) : The messages to send to the model.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Inference parameters to use.
    temperature = 0.5
    max_tokens = 4096

    # Base inference parameters to use.
    inference_config = {
        "temperature": temperature,
        "maxTokens": max_tokens,
    }

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
    )

    # Log token usage.
    token_usage = response['usage']
    logger.info("Input tokens: %s", token_usage['inputTokens'])
    logger.info("Output tokens: %s", token_usage['outputTokens'])
    logger.info("Total tokens: %s", token_usage['totalTokens'])
    logger.info("Stop reason: %s", response['stopReason'])

    return response

def main():
    """
    Entrypoint for DeepSeek-R1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "us.deepseek.r1-v1:0"

    # Setup the system prompts and messages to send to the model.
    system_prompts = [{"text": "You are an app that creates playlists for a radio station that plays rock and pop music. Only return song names and the artist."}]
    message_1 = {
        "role": "user",
        "content": [{"text": "Create a list of 3 pop songs."}]
    }
    message_2 = {
        "role": "user",
        "content": [{"text": "Make sure the songs are by artists from the United Kingdom."}]
    }
    messages = []

    try:
        # Configure timeout for long responses if needed
        custom_config = Config(connect_timeout=840, read_timeout=840)
        bedrock_client = boto3.client(service_name='bedrock-runtime', config=custom_config)

        # Start the conversation with the 1st message.
        messages.append(message_1)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        # Add the response message to the conversation.
        output_message = response['output']['message']
        
        # Remove reasoning content from the response
        output_contents = []
        for content in output_message["content"]:
            if content.get("reasoningContent"):
                continue
            else:
                output_contents.append(content)
        output_message["content"] = output_contents
        
        messages.append(output_message)

        # Continue the conversation with the 2nd message.
        messages.append(message_2)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        output_message = response['output']['message']
        messages.append(output_message)

        # Show the complete conversation.
        for message in messages:
            print(f"Role: {message['role']}")
            for content in message['content']:
                if content.get("text"):
                    print(f"Text: {content['text']}")
                if content.get("reasoningContent"):
                    reasoning_content = content['reasoningContent']
                    reasoning_text = reasoning_content.get('reasoningText', {})
                    print()
                    print(f"Reasoning Text: {reasoning_text.get('text')}")
            print()

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
```

# Luma AI models
<a name="model-parameters-luma"></a>

This section describes the request parameters and response fields for Luma AI models. Use this information to make inference calls to Luma AI models with the [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) operation. This section also includes Python code examples that shows how to call Luma AI models. To use a model in an inference operation, you need the model ID for the model. 
+ Model ID: luma.ray-v2:0
+ Model Name: Luma Ray 2
+ Text to Video Model

Luma AI models process model prompts asynchronously by using the Async APIs including [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html), [GetAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GetAsyncInvoke.html), and [ListAsyncInvokes](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ListAsyncInvokes.html).

Luma AI models process prompts using the following steps:
+ The user prompts the model using [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html).
+ Wait until the InvokeJob is finished. You can use `GetAsyncInvoke` or `ListAsyncInvokes` to check the job completion status.
+ The model output is placed in the specified output Amazon S3 bucket.

For more information about using the Luma AI models with the APIs, see [Video Generation](https://docs.lumalabs.ai/docs/video-generation).

The following is an example Luma AI inference call.

```
POST /async-invoke HTTP/1.1
Content-type: application/json
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "your input text here",
    "aspect_ratio": "16:9",
    "loop": false,
    "duration": "5s",
    "resolution": "720p"
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

**Fields**
+ **prompt** – (string) The content needed in the output video (1 <= length <= 5000 characters).
+ **aspect_ratio** – (enum) The aspect ratio of the output video ("1:1", "16:9", "9:16", "4:3", "3:4", "21:9", "9:21").
+ **loop** – (boolean) Whether to loop the output video.
+ **duration** – (enum) The duration of the output video ("5s", "9s").
+ **resolution** – (enum) The resolution of the output video ("540p", "720p").

The output MP4 file is stored in the Amazon S3 bucket that you specify in `outputDataConfig`.
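The invoke-and-poll flow above can be sketched with boto3 as follows. This is a minimal sketch, not a complete implementation: the bucket URI, prompt, and polling interval are placeholders, and `client` is assumed to be a `bedrock-runtime` client with access to the model.

```python
import time


def build_luma_request(prompt, s3_uri, **options):
    """Assemble StartAsyncInvoke arguments for Luma Ray 2. Extra keyword
    arguments (aspect_ratio, loop, duration, resolution) pass through
    to modelInput."""
    return {
        "modelId": "luma.ray-v2:0",
        "modelInput": {"prompt": prompt, **options},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": s3_uri}},
    }


def generate_video(client, prompt, s3_uri, poll_seconds=30):
    """Start the async job, then poll GetAsyncInvoke until it finishes."""
    job = client.start_async_invoke(**build_luma_request(prompt, s3_uri))
    arn = job["invocationArn"]
    while True:
        status = client.get_async_invoke(invocationArn=arn)["status"]
        if status != "InProgress":
            return status  # "Completed" or "Failed"
        time.sleep(poll_seconds)


# Example usage (requires model access and an existing bucket):
# client = boto3.client("bedrock-runtime")
# print(generate_video(client, "a car driving in snow", "s3://your-bucket-name"))
```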

## Text-to-Video Generation
<a name="luma-text-to-video"></a>

Generate videos from text prompts using the Luma Ray 2 model. The model supports various customization options including aspect ratio, duration, resolution, and looping.

**Basic Text-to-Video Request**

```
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "an old lady laughing underwater, wearing a scuba diving suit"
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

**Advanced Text-to-Video with Options**

```
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "an old lady laughing underwater, wearing a scuba diving suit",
    "aspect_ratio": "16:9",
    "loop": true,
    "duration": "5s",
    "resolution": "720p"
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

**Additional Text-to-Video Example**

Example with resolution and duration parameters.

```
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "a car",
    "resolution": "720p",
    "duration": "5s"
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

## Image-to-Video Generation
<a name="luma-image-to-video"></a>

Transform static images into dynamic videos by providing keyframes. You can specify start frames, end frames, or both to control the video generation process.

**Basic Image-to-Video with Start Frame**

```
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "A tiger walking in snow",
    "keyframes": {
      "frame0": {
        "type": "image",
        "source": {
          "type": "base64",
          "media_type": "image/jpeg",
          "data": "iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3"
        }
      }
    }
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

**Image-to-Video with Start and End Frames**

```
{
  "modelId": "luma.ray-v2:0",
  "modelInput": {
    "prompt": "A tiger walking in snow",
    "keyframes": {
      "frame0": {
        "type": "image",
        "source": {
          "type": "base64",
          "media_type": "image/jpeg",
          "data": "iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3"
        }
      },
      "frame1": {
        "type": "image",
        "source": {
          "type": "base64",
          "media_type": "image/jpeg",
          "data": "iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3"
        }
      }
    },
    "loop": false,
    "aspect_ratio": "16:9"
  },
  "outputDataConfig": {
    "s3OutputDataConfig": {
      "s3Uri": "s3://your-bucket-name"
    }
  }
}
```

**Additional Parameters for Image-to-Video**
+ **keyframes** – (object) Define start (`frame0`) and/or end (`frame1`) keyframes.
  + **frame0** – Starting keyframe image.
  + **frame1** – Ending keyframe image.
  + **type** – Must be `"image"`.
  + **source** – Image source.
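Building the base64 keyframe objects by hand is error-prone. The following hypothetical helpers assemble the `modelInput` shown in the examples above from raw image bytes:

```python
import base64


def keyframe_from_bytes(image_bytes, media_type="image/jpeg"):
    """Build one keyframe entry from raw image bytes."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("utf-8"),
        },
    }


def image_to_video_input(prompt, start_bytes, end_bytes=None):
    """Build a Luma Ray 2 modelInput with a start frame and an
    optional end frame."""
    keyframes = {"frame0": keyframe_from_bytes(start_bytes)}
    if end_bytes is not None:
        keyframes["frame1"] = keyframe_from_bytes(end_bytes)
    return {"prompt": prompt, "keyframes": keyframes}
```

To use it, read an image with `open(path, "rb").read()` and pass the bytes as `start_bytes`; the returned dictionary goes in the `modelInput` field of the `StartAsyncInvoke` request.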

## Troubleshooting
<a name="luma-troubleshooting"></a>

Common issues and solutions when working with Luma AI models:
+ **Job Status "Failed"** - Check that your S3 bucket has proper write permissions and the bucket exists in the same region as your Bedrock service.
+ **Image URL Access Errors** - Ensure image URLs are publicly accessible and use HTTPS. Images must be in supported formats (JPEG, PNG).
+ **Invalid Parameter Errors** - Verify aspect ratio values match supported options ("1:1", "16:9", "9:16", "4:3", "3:4", "21:9", "9:21") and duration is either "5s" or "9s".
+ **Timeout Issues** - Use `GetAsyncInvoke` to check job status rather than waiting synchronously. Video generation can take several minutes.
+ **Prompt Length Errors** - Keep prompts between 1-5000 characters. Longer prompts will be rejected.
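Several of these errors can be caught client-side before a job is submitted. The following is a small validation sketch based only on the constraints listed above:

```python
VALID_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "21:9", "9:21"}
VALID_DURATIONS = {"5s", "9s"}
VALID_RESOLUTIONS = {"540p", "720p"}


def validate_luma_input(model_input):
    """Return a list of problems with a Luma Ray 2 modelInput, based on
    the documented constraints; an empty list means it looks valid."""
    problems = []
    prompt = model_input.get("prompt", "")
    if not 1 <= len(prompt) <= 5000:
        problems.append("prompt must be 1-5000 characters")
    if model_input.get("aspect_ratio", "16:9") not in VALID_ASPECT_RATIOS:
        problems.append("unsupported aspect_ratio")
    if model_input.get("duration", "5s") not in VALID_DURATIONS:
        problems.append("duration must be '5s' or '9s'")
    if model_input.get("resolution", "720p") not in VALID_RESOLUTIONS:
        problems.append("resolution must be '540p' or '720p'")
    return problems
```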

## Performance Notes
<a name="luma-performance"></a>

Important considerations for Luma AI model performance and limitations:
+ **Processing Time** - Video generation typically takes 2-5 minutes for 5-second videos and 4-8 minutes for 9-second videos, depending on complexity.
+ **Image Requirements** - Input images should be high quality with minimum resolution of 512x512 pixels. Maximum supported image size is 4096x4096 pixels.
+ **Output Video Size** - Generated videos range from 5-50 MB depending on duration, resolution, and content complexity.
+ **Rate Limits** - Async API calls are subject to service quotas. Monitor your usage and request quota increases if needed.
+ **S3 Storage** - Ensure sufficient S3 storage capacity for output videos and consider lifecycle policies for cost optimization.

## Related Documentation
<a name="luma-cross-references"></a>

For additional information and related services:
+ **Amazon S3 Configuration** - [Creating S3 buckets](https://docs.aws.amazon.com/s3/latest/userguide/creating-buckets-s3.html) and [bucket policies](https://docs.aws.amazon.com/s3/latest/userguide/bucket-policies.html) for output storage.
+ **Async API Operations** - [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html), [GetAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GetAsyncInvoke.html), and [ListAsyncInvokes](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ListAsyncInvokes.html) API reference.
+ **Service Quotas** - [Quotas for Amazon Bedrock](quotas.md) for Bedrock service limits and quota increase requests.
+ **Video Processing Best Practices** - [Submit prompts and generate responses with model inference](inference.md) for general model inference guidance.
+ **Luma AI Documentation** - [Luma Labs Video Generation Documentation](https://docs.lumalabs.ai/docs/video-generation) for detailed model capabilities and advanced features.

# Meta Llama models
<a name="model-parameters-meta"></a>

This section describes the request parameters and response fields for Meta Llama models. Use this information to make inference calls to Meta Llama models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Meta Llama models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Meta Llama model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Meta Llama models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which Amazon Bedrock features the Meta Llama models support, see [Supported foundation models in Amazon Bedrock](models-supported.md). To check which AWS Regions that Meta Llama models are available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Meta Llama models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Meta Llama specific prompt information, see the [Meta Llama prompt engineering guide](https://ai.meta.com/llama/get-started/#prompting).

**Note**  
Llama 3.2 Instruct and Llama 3.3 Instruct models use geofencing. This means that these models cannot be used outside the AWS Regions available for these models listed in the Regions table.

This section provides information for using the following models from Meta.
+ Llama 3 Instruct
+ Llama 3.1 Instruct
+ Llama 3.2 Instruct
+ Llama 3.3 Instruct
+ Llama 4 Instruct

**Topics**
+ [Request and response](#model-parameters-meta-request-response)
+ [Example code](#api-inference-examples-meta-llama)

## Request and response
<a name="model-parameters-meta-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).

**Note**  
You can't use the [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) (streaming) operations with Llama 4 Instruct.

------
#### [ Request ]

The Llama 3 Instruct, Llama 3.1 Instruct, Llama 3.2 Instruct, and Llama 4 Instruct models have the following inference parameters: 

```
{
    "prompt": string,
    "temperature": float,
    "top_p": float,
    "max_gen_len": int
}
```

NOTE: Llama 3.2 and later models add `images` to the request structure, which is a list of strings. Example: `images: Optional[List[str]]` 
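As a sketch, a request body that includes images could be assembled as follows. Treating each `images` entry as a raw base64-encoded image string is an assumption based on the note above; check the model card for the exact prompt tokens your model expects.

```python
import base64
import json


def llama_vision_body(prompt, image_bytes_list, max_gen_len=512):
    """Build an InvokeModel body for Llama 3.2 and later, assuming the
    images field takes a list of base64-encoded image strings."""
    images = [base64.b64encode(b).decode("utf-8") for b in image_bytes_list]
    return json.dumps({
        "prompt": prompt,
        "images": images,
        "max_gen_len": max_gen_len,
    })
```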

The following are required parameters:
+  **prompt** – (Required) The prompt that you want to pass to the model. For optimal results, format the conversation with the following template.

  ```
  <|begin_of_text|><|start_header_id|>user<|end_header_id|>
  
  What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
  ```

  **Example template with system prompt**

  The following is an example prompt that includes a system prompt.

  ```
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>
  
  You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>
  
  What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
  ```

  **Multi-turn conversation example**

  The following is an example prompt of a multi-turn conversation.

  ```
  <|begin_of_text|><|start_header_id|>user<|end_header_id|>
  
  What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
  
  The capital of France is Paris!<|eot_id|><|start_header_id|>user<|end_header_id|>
  
  What is the weather like in Paris?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
  ```

  For more information, see [Meta Llama 3](https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3).
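The templates above can be assembled programmatically. The following helper is a sketch that builds the Llama 3 Instruct prompt format from a list of conversation turns:

```python
def format_llama3_prompt(messages, system=None):
    """Assemble the Llama 3 Instruct chat template.

    messages is a list of (role, text) pairs, with role one of
    "user" or "assistant". An optional system prompt comes first.
    """
    prompt = "<|begin_of_text|>"
    if system:
        prompt += ("<|start_header_id|>system<|end_header_id|>\n\n"
                   f"{system}<|eot_id|>")
    for role, text in messages:
        prompt += (f"<|start_header_id|>{role}<|end_header_id|>\n\n"
                   f"{text}<|eot_id|>")
    # Leave the prompt open for the assistant's reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>"
    return prompt
```

For example, `format_llama3_prompt([("user", "What can you help me with?")])` reproduces the first template shown above.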

The following are optional parameters:
+ **temperature** – Use a lower value to decrease randomness in the response.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html)
+ **top\$1p** – Use a lower value to ignore less probable options. Set to 0 or 1.0 to disable.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html)
+ **max\_gen\_len** – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds `max_gen_len`.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html)

------
#### [ Response ]

The Llama 3 Instruct models return the following fields for a text completion inference call. 

```
{
    "generation": "\n\n<response>",
    "prompt_token_count": int,
    "generation_token_count": int,
    "stop_reason" : string
}
```

More information about each field is provided below.
+ **generation** – The generated text.
+ **prompt\_token\_count** – The number of tokens in the prompt.
+ **generation\_token\_count** – The number of tokens in the generated text.
+ **stop\_reason** – The reason why the response stopped generating text. Possible values are:
  + **stop** – The model has finished generating text for the input prompt.
  + **length** – The length of the tokens for the generated text exceeds the value of `max_gen_len` in the call to `InvokeModel` (`InvokeModelWithResponseStream`, if you are streaming output). The response is truncated to `max_gen_len` tokens. Consider increasing the value of `max_gen_len` and trying again.
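
A minimal sketch of acting on these fields, for example retrying with a larger `max_gen_len` when the response is truncated. The response dict below is shaped like the fields documented above; the retry value is illustrative.

```python
def needs_longer_generation(model_response: dict) -> bool:
    """Return True if generation stopped because it hit max_gen_len."""
    return model_response.get("stop_reason") == "length"

# Example parsed response body, shaped like the fields above.
parsed = {
    "generation": "\n\nA partial answer that was cut off",
    "prompt_token_count": 12,
    "generation_token_count": 512,
    "stop_reason": "length",
}

new_max_gen_len = None
if needs_longer_generation(parsed):
    # Retry the InvokeModel call with a larger max_gen_len.
    new_max_gen_len = 1024
```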

------

## Example code
<a name="api-inference-examples-meta-llama"></a>

This example shows how to call the *Llama 3 Instruct* model.

```
# Use the native inference API to send a text message to Meta Llama 3.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID, e.g., Llama 3 70b Instruct.
model_id = "meta.llama3-70b-instruct-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Embed the prompt in Llama 3's instruction format.
formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Format the request payload using the model's native structure.
native_request = {
    "prompt": formatted_prompt,
    "max_gen_len": 512,
    "temperature": 0.5,
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["generation"]
print(response_text)
```

This example shows how to control the generation length using Llama 3 Instruct models. For detailed responses or summaries, adjust `max_gen_len` and include specific instructions in your prompt.

# Mistral AI models
<a name="model-parameters-mistral"></a>

This section describes the request parameters and response fields for Mistral AI models. Use this information to make inference calls to Mistral AI models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Mistral AI models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Mistral AI model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Mistral AI models support, the Amazon Bedrock features they support, and the AWS Regions they're available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Mistral AI models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Mistral AI specific prompt information, see the [Mistral AI prompt engineering guide](https://docs.mistral.ai/guides/prompting_capabilities/).

**Topics**
+ [

# Mistral AI text completion
](model-parameters-mistral-text-completion.md)
+ [

# Mistral AI chat completion
](model-parameters-mistral-chat-completion.md)
+ [

# Mistral AI Large (24.07) parameters and inference
](model-parameters-mistral-large-2407.md)
+ [

# Pixtral Large (25.02) parameters and inference
](model-parameters-mistral-pixtral-large.md)

# Mistral AI text completion
<a name="model-parameters-mistral-text-completion"></a>

The Mistral AI text completion API lets you generate text with a Mistral AI model.

You make inference requests to Mistral AI models with [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming). 

Mistral AI models are available under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0.txt). For more information about using Mistral AI models, see the [Mistral AI documentation](https://docs.mistral.ai/).

**Topics**
+ [

## Supported models
](#mistral--text-completion-supported-models)
+ [

## Request and Response
](#model-parameters-mistral-text-completion-request-response)
+ [

## Code example
](#api-inference-examples-mistral-text-completion)

## Supported models
<a name="mistral--text-completion-supported-models"></a>

You can use the following Mistral AI models.
+ Mistral 7B Instruct
+ Mixtral 8X7B Instruct
+ Mistral Large
+ Mistral Small

You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

## Request and Response
<a name="model-parameters-mistral-text-completion-request-response"></a>

------
#### [ Request ]

The Mistral AI models have the following inference parameters. 

```
{
    "prompt": string,
    "max_tokens" : int,
    "stop" : [string],    
    "temperature": float,
    "top_p": float,
    "top_k": int
}
```

The following are required parameters.
+  **prompt** – (Required) The prompt that you want to pass to the model, as shown in the following example. 

  ```
  <s>[INST] What is your favourite condiment? [/INST]
  ```

  The following example shows how to format a multi-turn prompt. 

  ```
  <s>[INST] What is your favourite condiment? [/INST]
  Well, I'm quite partial to a good squeeze of fresh lemon juice. 
  It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> 
  [INST] Do you have mayonnaise recipes? [/INST]
  ```

  Text for the user role is inside the `[INST]...[/INST]` tokens; text outside them is the assistant role. The beginning and ending of a string are represented by the `<s>` (beginning of string) and `</s>` (end of string) tokens. For information about sending a chat prompt in the correct format, see [Chat template](https://docs.mistral.ai/models/#chat-template) in the Mistral AI documentation. 
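
As an illustration, the template can be assembled programmatically. This hypothetical helper alternates user and assistant turns with the `[INST]`, `[/INST]`, and `</s>` tokens, leaving the final user turn open for the model to answer:

```python
def build_mistral_prompt(turns):
    """Build a Mistral text-completion prompt from (user, assistant)
    pairs; pass None as the assistant text for the final open turn."""
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Close the completed assistant turn with the end-of-string token.
            prompt += f"{assistant}</s>"
    return prompt

prompt = build_mistral_prompt([
    ("What is your favourite condiment?",
     "Well, I'm quite partial to a good squeeze of fresh lemon juice."),
    ("Do you have mayonnaise recipes?", None),
])
```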

The following are optional parameters.
+ **max\_tokens** – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds `max_tokens`.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-text-completion.html)
+ **stop** – A list of stop sequences that, if generated by the model, stop the model from generating further output.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-text-completion.html)
+ **temperature** – Controls the randomness of predictions made by the model. For more information, see [Influence response generation with inference parameters](inference-parameters.md).     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-text-completion.html)
+ **top\_p** – Controls the diversity of text that the model generates by setting the percentage of most-likely candidates that the model considers for the next token. For more information, see [Influence response generation with inference parameters](inference-parameters.md).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-text-completion.html)
+ **top\_k** – Controls the number of most-likely candidates that the model considers for the next token. For more information, see [Influence response generation with inference parameters](inference-parameters.md).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-text-completion.html)
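
For instance, a request body that combines these parameters with a stop sequence might be built as follows (the specific values are illustrative):

```python
import json

# Illustrative text-completion request body: generation halts early
# if the model emits the "[INST]" stop sequence.
body = json.dumps({
    "prompt": "<s>[INST] List three French cheeses. [/INST]",
    "max_tokens": 200,
    "stop": ["[INST]"],
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 50,
})
```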

------
#### [ Response ]

The `body` response from a call to `InvokeModel` is the following:

```
{
  "outputs": [
    {
        "text": string,
        "stop_reason": string
    }
  ]
}
```

The `body` response has the following fields:
+ **outputs** – A list of outputs from the model. Each output has the following fields.
  + **text** – The text that the model generated. 
  + **stop\_reason** – The reason why the response stopped generating text. Possible values are:
    + **stop** – The model has finished generating text for the input prompt. The model stops because it has no more content to generate or if the model generates one of the stop sequences that you define in the `stop` request parameter.
    + **length** – The length of the tokens for the generated text exceeds the value of `max_tokens` in the call to `InvokeModel` (`InvokeModelWithResponseStream`, if you are streaming output). The response is truncated to `max_tokens` tokens. 

------

## Code example
<a name="api-inference-examples-mistral-text-completion"></a>

This example shows how to call the Mistral 7B Instruct model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate text using a Mistral AI model.
"""
import json
import logging
import boto3


from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_text(model_id, body):
    """
    Generate text using a Mistral AI model.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        JSON: The response from the model.
    """

    logger.info("Generating text with Mistral AI model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    response = bedrock.invoke_model(
        body=body,
        modelId=model_id
    )

    logger.info("Successfully generated text with Mistral AI model %s", model_id)

    return response


def main():
    """
    Entrypoint for Mistral AI example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    try:
        model_id = 'mistral.mistral-7b-instruct-v0:2'

        prompt = """<s>[INST] In Bash, how do I list all text files in the current directory
          (excluding subdirectories) that have been modified in the last month? [/INST]"""

        body = json.dumps({
            "prompt": prompt,
            "max_tokens": 400,
            "temperature": 0.7,
            "top_p": 0.7,
            "top_k": 50
        })

        response = generate_text(model_id=model_id,
                                 body=body)

        response_body = json.loads(response.get('body').read())

        outputs = response_body.get('outputs')

        for index, output in enumerate(outputs):

            print(f"Output {index + 1}\n----------")
            print(f"Text:\n{output['text']}\n")
            print(f"Stop reason: {output['stop_reason']}\n")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))
    else:
        print(f"Finished generating text with Mistral AI model {model_id}.")


if __name__ == "__main__":
    main()
```

# Mistral AI chat completion
<a name="model-parameters-mistral-chat-completion"></a>

The Mistral AI chat completion API lets you create conversational applications.

**Tip**  
You can use the Mistral AI chat completion API with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). However, we recommend that you use the Converse API to implement messages in your application. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

Mistral AI models are available under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0.txt). For more information about using Mistral AI models, see the [Mistral AI documentation](https://docs.mistral.ai/).

**Topics**
+ [

## Supported models
](#mistral-supported-models-chat-completion)
+ [

## Request and Response
](#model-parameters-mistral-chat-completion-request-response)

## Supported models
<a name="mistral-supported-models-chat-completion"></a>

You can use following Mistral AI models.
+ Mistral Large

You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

## Request and Response
<a name="model-parameters-mistral-chat-completion-request-response"></a>

------
#### [ Request ]

The Mistral AI models have the following inference parameters. 

```
{
    "messages": [
        {
            "role": "system"|"user"|"assistant",
            "content": str
        },
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": str,
                    "function": {
                        "name": str,
                        "arguments": str
                    }
                }
            ]
        },
        {
            "role": "tool",
            "tool_call_id": str,
            "content": str
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": str,
                "description": str,
                "parameters": dict
            }
        }
    ],
    "tool_choice": "auto"|"any"|"none",
    "max_tokens": int,
    "top_p": float,
    "temperature": float
}
```

The following are required parameters.
+  **messages** – (Required) The messages that you want to pass to the model.
  + **role** – The role for the message. Valid values are:
    + **system** – Sets the behavior and context for the model in the conversation. 
    + **user** – The user message to send to the model.
    + **assistant** – The response from the model.
  + **content** – The content for the message.

  ```
  [
      {
          "role": "user",
          "content": "What is the most popular song on WZPZ?"
      }
  ]
  ```

  To pass a tool result, use JSON with the following fields.
  + **role** – The role for the message. The value must be `tool`. 
  + **tool\_call\_id** – The ID of the tool request. You get the ID from the `tool_calls` field in the response from the previous request. 
  + **content** – The result from the tool.

  The following example is the result from a tool that gets the most popular song on a radio station.

  ```
  {
      "role": "tool",
      "tool_call_id": "v6RMMiRlT7ygYkT4uULjtg",
      "content": "{\"song\": \"Elemental Hotel\", \"artist\": \"8 Storey Hike\"}"
  }
  ```

The following are optional parameters.
+  **tools** – Definitions of tools that the model may use.

  If you include `tools` in your request, the model may return a `tool_calls` field in the message that represents the model's use of those tools. You can then run those tools using the tool input generated by the model and then optionally return the results to the model in a `tool` role message.

  The following example is for a tool that gets the most popular songs on a radio station.

  ```
  [
      {
          "type": "function",
          "function": {
              "name": "top_song",
              "description": "Get the most popular song played on a radio station.",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "sign": {
                          "type": "string",
                          "description": "The call sign for the radio station for which you want the most popular song. Example calls signs are WZPZ and WKRP."
                      }
                  },
                  "required": [
                      "sign"
                  ]
              }
          }
      }
  ]
  ```
+  **tool\_choice** – Specifies how functions are called. If set to `none`, the model won't call a function and will generate a message instead. If set to `auto`, the model can choose to either generate a message or call a function. If set to `any`, the model is forced to call a function.
+ **max\_tokens** – Specify the maximum number of tokens to use in the generated response. The model truncates the response once the generated text exceeds `max_tokens`.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-chat-completion.html)
+ **temperature** – Controls the randomness of predictions made by the model. For more information, see [Influence response generation with inference parameters](inference-parameters.md).     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-chat-completion.html)
+ **top\_p** – Controls the diversity of text that the model generates by setting the percentage of most-likely candidates that the model considers for the next token. For more information, see [Influence response generation with inference parameters](inference-parameters.md).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-mistral-chat-completion.html)

------
#### [ Response ]

The `body` response from a call to `InvokeModel` is the following:

```
{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": str,
                "tool_calls": [...]
            },
            "stop_reason": "stop"|"length"|"tool_calls"
        }
    ]
}
```

The `body` response has the following fields:
+ **choices** – A list of outputs from the model. Each output has the following fields.
  + **index** – The index for the message. 
  + **message** – The message from the model. 
    + **role** – The role for the message. 
    + **content** – The content for the message. 
    + **tool\_calls** – If the value of `stop_reason` is `tool_calls`, this field contains a list of tool requests that the model wants you to run. 
      + **id** – The ID for the tool request. 
      + **function** – The function that the model is requesting. 
        + **name** – The name of the function. 
        + **arguments** – The arguments to pass to the tool. 

      The following is an example request for a tool that gets the top song on a radio station.

      ```
      [
          {
              "id": "v6RMMiRlT7ygYkT4uULjtg",
              "function": {
                  "name": "top_song",
                  "arguments": "{\"sign\": \"WZPZ\"}"
              }
          }
      ]
      ```
  + **stop\_reason** – The reason why the response stopped generating text. Possible values are:
    + **stop** – The model has finished generating text for the input prompt. The model stops because it has no more content to generate or if the model generates one of the stop sequences that you define in the `stop` request parameter.
    + **length** – The length of the tokens for the generated text exceeds the value of `max_tokens`. The response is truncated to `max_tokens` tokens. 
    + **tool\_calls** – The model is requesting that you run a tool.

------
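
Putting the response fields together, the following sketch handles a `tool_calls` stop reason on an already parsed response body and builds the `tool` role messages for the next request. The `top_song` implementation, its result, and the dispatch table are illustrative.

```python
import json

# Hypothetical local tool implementation; a real one would look up
# the station identified by `sign`.
def top_song(sign: str) -> str:
    return json.dumps({"song": "Elemental Hotel", "artist": "8 Storey Hike"})

available_tools = {"top_song": top_song}

# Example parsed `body`, shaped like the response documented above.
body = {
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [{
                "id": "v6RMMiRlT7ygYkT4uULjtg",
                "function": {"name": "top_song",
                             "arguments": "{\"sign\": \"WZPZ\"}"},
            }],
        },
        "stop_reason": "tool_calls",
    }]
}

choice = body["choices"][0]
tool_messages = []
if choice["stop_reason"] == "tool_calls":
    for call in choice["message"]["tool_calls"]:
        func = available_tools[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        # Append the result as a `tool` role message for the next request.
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": func(**args),
        })
```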

# Mistral AI Large (24.07) parameters and inference
<a name="model-parameters-mistral-large-2407"></a>

The Mistral AI chat completion API lets you create conversational applications. You can also use the Amazon Bedrock Converse API with this model. You can use tools to make function calls.

**Tip**  
You can use the Mistral AI chat completion API with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). However, we recommend that you use the Converse API to implement messages in your application. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

Mistral AI models are available under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0.txt). For more information about using Mistral AI models, see the [Mistral AI documentation](https://docs.mistral.ai/).

**Topics**
+ [

## Supported models
](#mistral-supported-models-chat-completion)
+ [

## Request and Response Examples
](#model-parameters-mistral-large-2407-request-response)

## Supported models
<a name="mistral-supported-models-chat-completion"></a>

You can use the following Mistral AI models with the code examples on this page.
+ Mistral Large 2 (24.07)

You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

## Request and Response Examples
<a name="model-parameters-mistral-large-2407-request-response"></a>

------
#### [ Request ]

Mistral AI Large (24.07) invoke model example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2') 
response = bedrock.invoke_model( 
        modelId='mistral.mistral-large-2407-v1:0', 
        body=json.dumps({
            'messages': [ 
                { 
                    'role': 'user', 
                    'content': 'which llm are you?' 
                } 
             ], 
         }) 
       ) 

print(json.dumps(json.loads(response['body'].read()), indent=4))
```

------
#### [ Converse ]

Mistral AI Large (24.07) converse example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2')
response = bedrock.converse( 
    modelId='mistral.mistral-large-2407-v1:0', 
    messages=[ 
        { 
            'role': 'user', 
            'content': [ 
                { 
                    'text': 'which llm are you?' 
                } 
             ] 
          } 
     ] 
  ) 

print(json.dumps(response['output']['message'], indent=4))
```

------
#### [ invoke\_model\_with\_response\_stream ]

Mistral AI Large (24.07) invoke\_model\_with\_response\_stream example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2')
response = bedrock.invoke_model_with_response_stream(
    body=json.dumps({
        "messages": [{"role": "user", "content": "What is the best French cheese?"}],
    }),
    modelId="mistral.mistral-large-2407-v1:0"
)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            print(chunk_obj)
```

------
#### [ converse\_stream ]

Mistral AI Large (24.07) converse\_stream example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2')
mistral_params = {
    "messages": [{
        "role": "user", "content": [{"text": "What is the best French cheese? "}]
    }],
    "modelId": "mistral.mistral-large-2407-v1:0",
}
response = bedrock.converse_stream(**mistral_params)
stream = response.get('stream')
if stream:
    for event in stream:

        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")

        if 'contentBlockDelta' in event:
            print(event['contentBlockDelta']['delta']['text'], end="")

        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")

        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print("\nToken usage ...")
                print(f"Input tokens: {metadata['usage']['inputTokens']}")
                print(f"Output tokens: {metadata['usage']['outputTokens']}")
                print(f"Total tokens: {metadata['usage']['totalTokens']}")
            if 'metrics' in metadata:
                print(f"Latency: {metadata['metrics']['latencyMs']} milliseconds")
```

------
#### [ JSON Output ]

Mistral AI Large (24.07) JSON output example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2')
mistral_params = {
        "body": json.dumps({
            "messages": [{"role": "user", "content": "What is the best French meal? Return the name and the ingredients in short JSON object."}]
        }),
        "modelId":"mistral.mistral-large-2407-v1:0",
    }
response = bedrock.invoke_model(**mistral_params)

body = response.get('body').read().decode('utf-8')
print(json.loads(body))
```

------
#### [ Tooling ]

Mistral AI Large (24.07) tools example. 

```
import functools
import json

import boto3
import pandas as pd

bedrock = boto3.client('bedrock-runtime', 'us-west-2')

data = {
    'transaction_id': ['T1001', 'T1002', 'T1003', 'T1004', 'T1005'],
    'customer_id': ['C001', 'C002', 'C003', 'C002', 'C001'],
    'payment_amount': [125.50, 89.99, 120.00, 54.30, 210.20],
    'payment_date': ['2021-10-05', '2021-10-06', '2021-10-07', '2021-10-05', '2021-10-08'],
    'payment_status': ['Paid', 'Unpaid', 'Paid', 'Paid', 'Pending']
}

# Create DataFrame
df = pd.DataFrame(data)


def retrieve_payment_status(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values: 
        return json.dumps({'status': df[df.transaction_id == transaction_id].payment_status.item()})
    return json.dumps({'error': 'transaction id not found.'})

def retrieve_payment_date(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values: 
        return json.dumps({'date': df[df.transaction_id == transaction_id].payment_date.item()})
    return json.dumps({'error': 'transaction id not found.'})

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_status",
            "description": "Get payment status of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_date",
            "description": "Get payment date of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    }
]

names_to_functions = {
    'retrieve_payment_status': functools.partial(retrieve_payment_status, df=df),
    'retrieve_payment_date': functools.partial(retrieve_payment_date, df=df)
}



test_tool_input = "What's the status of my transaction T1001?"
message = [{"role": "user", "content": test_tool_input}]


def invoke_bedrock_mistral_tool():
   
    mistral_params = {
        "body": json.dumps({
            "messages": message,
            "tools": tools           
        }),
        "modelId":"mistral.mistral-large-2407-v1:0",
    }
    response = bedrock.invoke_model(**mistral_params)
    body = response.get('body').read().decode('utf-8')
    body = json.loads(body)
    choices = body.get("choices")
    message.append(choices[0].get("message"))

    tool_call = choices[0].get("message").get("tool_calls")[0]
    function_name = tool_call.get("function").get("name")
    function_params = json.loads(tool_call.get("function").get("arguments"))
    print("\nfunction_name: ", function_name, "\nfunction_params: ", function_params)
    function_result = names_to_functions[function_name](**function_params)

    message.append({"role": "tool", "content": function_result, "tool_call_id":tool_call.get("id")})
   
    new_mistral_params = {
        "body": json.dumps({
                "messages": message,
                "tools": tools           
        }),
        "modelId":"mistral.mistral-large-2407-v1:0",
    }
    response = bedrock.invoke_model(**new_mistral_params)
    body = response.get('body').read().decode('utf-8')
    body = json.loads(body)
    print(body)
invoke_bedrock_mistral_tool()
```

------

# Pixtral Large (25.02) parameters and inference
<a name="model-parameters-mistral-pixtral-large"></a>

Pixtral Large 25.02 is a 124B parameter multimodal model that combines state-of-the-art image understanding with powerful text processing capabilities. AWS is the first cloud provider to deliver Pixtral Large (25.02) as a fully-managed, serverless model. This model delivers frontier-class performance when performing document analysis, chart interpretation, and natural image understanding tasks, while maintaining the advanced text capabilities of Mistral Large 2.

With a 128K context window, Pixtral Large 25.02 achieves best-in-class performance on key benchmarks including MathVista, DocVQA, and VQAv2. The model features comprehensive multilingual support across many languages and is trained on over 80 programming languages. Key capabilities include advanced mathematical reasoning, native function calling, JSON outputting, and robust context adherence for RAG applications.

The Mistral AI chat completion API lets you create conversational applications. You can also use the Amazon Bedrock Converse API with this model. You can use tools to make function calls.

**Tip**  
You can use the Mistral AI chat completion API with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). However, we recommend that you use the Converse API to implement messages in your application. The Converse API provides a unified set of parameters that work across all models that support messages. For more information, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

The Mistral AI Pixtral Large model is available under the [Mistral Research License](https://mistral.ai/licenses/MRL-0.1.md). For more information about using Mistral AI models, see the [Mistral AI documentation](https://docs.mistral.ai/).

**Topics**
+ [Supported models](#mistral-supported-models-chat-completion)
+ [Request and Response Examples](#model-parameters-pixtral-large-2502-request-response)

## Supported models
<a name="mistral-supported-models-chat-completion"></a>

You can use the following Mistral AI models with the code examples on this page.
+ Pixtral Large (25.02)

You need the model ID for the model that you want to use. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). 

## Request and Response Examples
<a name="model-parameters-pixtral-large-2502-request-response"></a>

------
#### [ Request ]

Pixtral Large (25.02) invoke model example.

```
import boto3
import json
import base64


input_image = "image.png"
with open(input_image, "rb") as f:
    image = f.read()

image_bytes = base64.b64encode(image).decode("utf-8")

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name="us-east-1")


request_body = {
    "messages" : [
        {
          "role" : "user",
          "content" : [
            {
              "text": "Describe this picture:",
              "type": "text"
            },
            {
              "type" : "image_url",
              "image_url" : {
                "url" : f"data:image/png;base64,{image_bytes}"
              }
            }
          ]
        }
      ],
      "max_tokens" : 10
    }

response = bedrock.invoke_model(
        modelId='us.mistral.pixtral-large-2502-v1:0',
        body=json.dumps(request_body)
       )


print(json.dumps(json.loads(response.get('body').read()), indent=4))
```

------
#### [ Converse ]

Pixtral Large (25.02) Converse example.

```
import boto3
import json
import base64

input_image = "image.png"
with open(input_image, "rb") as f:
    image_bytes = f.read()


bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name="us-east-1")

messages =[
    {
        "role" : "user",
        "content" : [
            {
              "text": "Describe this picture:"
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": image_bytes
                    }
                }
            }
        ]
    }
]

response = bedrock.converse(
        modelId='mistral.pixtral-large-2502-v1:0',
        messages=messages
       )

print(json.dumps(response.get('output'), indent=4))
```

------
#### [ invoke_model_with_response_stream ]

Pixtral Large (25.02) invoke_model_with_response_stream example.

```
import boto3
import json
import base64


input_image = "image.png"
with open(input_image, "rb") as f:
    image = f.read()

image_bytes = base64.b64encode(image).decode("utf-8")

bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name="us-east-1")


request_body = {
    "messages" : [
        {
          "role" : "user",
          "content" : [
            {
              "text": "Describe this picture:",
              "type": "text"
            },
            {
              "type" : "image_url",
              "image_url" : {
                "url" : f"data:image/png;base64,{image_bytes}"
              }
            }
          ]
        }
      ],
      "max_tokens" : 10
    }

response = bedrock.invoke_model_with_response_stream(
        modelId='us.mistral.pixtral-large-2502-v1:0',
        body=json.dumps(request_body)
       )

stream = response.get('body')
if stream:
    for event in stream:
        chunk=event.get('chunk')
        if chunk:
            chunk_obj=json.loads(chunk.get('bytes').decode())
            print(chunk_obj)
```

------
#### [ converse_stream ]

Pixtral Large (25.02) converse_stream example.

```
import boto3
import json
import base64

input_image = "image.png"
with open(input_image, "rb") as f:
    image_bytes = f.read()


bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name="us-east-1")

messages =[
    {
        "role" : "user",
        "content" : [
            {
              "text": "Describe this picture:"
            },
            {
                "image": {
                    "format": "png",
                    "source": {
                        "bytes": image_bytes
                    }
                }
            }
        ]
    }
]

response = bedrock.converse_stream(
        modelId='mistral.pixtral-large-2502-v1:0',
        messages=messages
       )

stream = response.get('stream')
if stream:
    for event in stream:
        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")

        if 'contentBlockDelta' in event:
            print(event['contentBlockDelta']['delta']['text'], end="")

        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")

        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print("\nToken usage ... ")
                print(f"Input tokens: {metadata['usage']['inputTokens']}")
                print(f"Output tokens: {metadata['usage']['outputTokens']}")
                print(f"Total tokens: {metadata['usage']['totalTokens']}")
            if 'metrics' in event['metadata']:
                print(
                    f"Latency: {metadata['metrics']['latencyMs']} milliseconds")
```

------
#### [ JSON Output ]

Pixtral Large (25.02) JSON output example. 

```
import boto3 
import json

bedrock = boto3.client('bedrock-runtime', 'us-west-2')
mistral_params = {
        "body": json.dumps({
            "messages": [{"role": "user", "content": "What is the best French meal? Return the name and the ingredients in short JSON object."}]
        }),
        "modelId":"us.mistral.pixtral-large-2502-v1:0",
    }
response = bedrock.invoke_model(**mistral_params)

body = response.get('body').read().decode('utf-8')
print(json.loads(body))
```

------
#### [ Tooling ]

Pixtral Large (25.02) tools example. 

```
import functools
import json

import boto3
import pandas as pd

bedrock = boto3.client('bedrock-runtime')

data = {
    'transaction_id': ['T1001', 'T1002', 'T1003', 'T1004', 'T1005'],
    'customer_id': ['C001', 'C002', 'C003', 'C002', 'C001'],
    'payment_amount': [125.50, 89.99, 120.00, 54.30, 210.20],
    'payment_date': ['2021-10-05', '2021-10-06', '2021-10-07', '2021-10-05', '2021-10-08'],
    'payment_status': ['Paid', 'Unpaid', 'Paid', 'Paid', 'Pending']
}

# Create DataFrame
df = pd.DataFrame(data)


def retrieve_payment_status(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values: 
        return json.dumps({'status': df[df.transaction_id == transaction_id].payment_status.item()})
    return json.dumps({'error': 'transaction id not found.'})

def retrieve_payment_date(df: pd.DataFrame, transaction_id: str) -> str:
    if transaction_id in df.transaction_id.values: 
        return json.dumps({'date': df[df.transaction_id == transaction_id].payment_date.item()})
    return json.dumps({'error': 'transaction id not found.'})

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_status",
            "description": "Get payment status of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "retrieve_payment_date",
            "description": "Get payment date of a transaction",
            "parameters": {
                "type": "object",
                "properties": {
                    "transaction_id": {
                        "type": "string",
                        "description": "The transaction id.",
                    }
                },
                "required": ["transaction_id"],
            },
        },
    }
]

names_to_functions = {
    'retrieve_payment_status': functools.partial(retrieve_payment_status, df=df),
    'retrieve_payment_date': functools.partial(retrieve_payment_date, df=df)
}



test_tool_input = "What's the status of my transaction T1001?"
message = [{"role": "user", "content": test_tool_input}]


def invoke_bedrock_mistral_tool():
   
    mistral_params = {
        "body": json.dumps({
            "messages": message,
            "tools": tools           
        }),
        "modelId":"us.mistral.pixtral-large-2502-v1:0",
    }
    response = bedrock.invoke_model(**mistral_params)
    body = response.get('body').read().decode('utf-8')
    body = json.loads(body)
    choices = body.get("choices")
    message.append(choices[0].get("message"))

    tool_call = choices[0].get("message").get("tool_calls")[0]
    function_name = tool_call.get("function").get("name")
    function_params = json.loads(tool_call.get("function").get("arguments"))
    print("\nfunction_name: ", function_name, "\nfunction_params: ", function_params)
    function_result = names_to_functions[function_name](**function_params)

    message.append({"role": "tool", "content": function_result, "tool_call_id":tool_call.get("id")})
   
    new_mistral_params = {
        "body": json.dumps({
                "messages": message,
                "tools": tools           
        }),
        "modelId":"us.mistral.pixtral-large-2502-v1:0",
    }
    response = bedrock.invoke_model(**new_mistral_params)
    body = response.get('body').read().decode('utf-8')
    body = json.loads(body)
    print(body)
invoke_bedrock_mistral_tool()
```

------

# OpenAI models
<a name="model-parameters-openai"></a>

OpenAI offers the following open-weight models:
+ [https://huggingface.co/openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) – A smaller model optimized for lower latency and local or specialized use cases.
+ [https://huggingface.co/openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) – A larger model optimized for production and general purpose or high-reasoning use cases.

The following table summarizes information about the models:


| Information | gpt-oss-20b | gpt-oss-120b | 
| --- | --- | --- | 
| Release date | August 5, 2025 | August 5, 2025 | 
| Model ID | openai.gpt-oss-20b-1:0 | openai.gpt-oss-120b-1:0 | 
| Product ID | N/A | N/A | 
| Input modalities supported | Text | Text | 
| Output modalities supported | Text | Text | 
| Context window | 128,000 | 128,000 | 

The OpenAI models support the following features:
+ [Model invocation](inference.md) with the following operations:
  + [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html)
  + [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html)
  + [OpenAI Chat completions API](inference-chat-completions.md)
+ [Batch inference](batch-inference.md) with [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html).
+ [Guardrails](guardrails.md) application through the use of headers in the model invocation operations.

**Topics**
+ [OpenAI request body](#model-parameters-openai-request)
+ [OpenAI response body](#model-parameters-openai-response)
+ [Example usage of OpenAI models](#model-parameters-openai-use)

## OpenAI request body
<a name="model-parameters-openai-request"></a>

For information about the parameters in the request body and their descriptions, see [Create chat completion](https://platform.openai.com/docs/api-reference/chat/create) in the OpenAI documentation.

Use the request body fields in the following ways:
+ In an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or OpenAI Chat Completions request, include the fields in the request body.
+ In a [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) request, do the following:
  + Map the `messages` as follows:
    + For each message whose role is `developer`, add the `content` to a [SystemContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_SystemContentBlock.html) in the `system` array.
    + For each message whose role is `user` or `assistant`, add the `content` to a [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html) in the `content` field and specify the `role` in the `role` field of a [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) in the `messages` array.
  + Map the values for the following fields to the corresponding fields in the `inferenceConfig` object:  
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-openai.html)
  + Include any other fields in the `additionalModelRequestFields` object.
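The mapping above can be sketched as a small helper function. This is an illustrative sketch only, not part of any SDK; the Converse-side field names follow the API reference linked above, and the `model` field is excluded because it is carried in the header rather than the body.

```
# Illustrative sketch: map an OpenAI Create chat completion request body
# to the Converse request shape described in the bullets above.

def chat_completion_to_converse(body: dict) -> dict:
    """Return keyword arguments for a Converse request."""
    system, messages = [], []
    for msg in body.get("messages", []):
        if msg["role"] == "developer":
            # developer messages become SystemContentBlock entries
            system.append({"text": msg["content"]})
        else:
            # user/assistant messages become Message entries with ContentBlocks
            messages.append({"role": msg["role"],
                             "content": [{"text": msg["content"]}]})

    inference_config = {}
    if "max_completion_tokens" in body:
        inference_config["maxTokens"] = body["max_completion_tokens"]
    if "temperature" in body:
        inference_config["temperature"] = body["temperature"]
    if "top_p" in body:
        inference_config["topP"] = body["top_p"]

    # Any remaining fields go in additionalModelRequestFields; model and
    # stream are handled by the header and the choice of API operation.
    handled = {"model", "messages", "stream",
               "max_completion_tokens", "temperature", "top_p"}
    extra = {k: v for k, v in body.items() if k not in handled}

    converse_kwargs = {"messages": messages, "system": system,
                       "inferenceConfig": inference_config}
    if extra:
        converse_kwargs["additionalModelRequestFields"] = extra
    return converse_kwargs

request = {
    "model": "openai.gpt-oss-20b-1:0",
    "messages": [
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    "max_completion_tokens": 150,
    "temperature": 0.7
}
print(chat_completion_to_converse(request))
```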

**Considerations when constructing the request body**
+ The OpenAI models support only text input and text output.
+ The value in the `model` field must match the one in the header. You can omit this field to let it be automatically populated with the same value as the header.
+ The value in the `stream` field must match the API operation that you use. You can omit this field to let it be automatically populated with the correct value.
  + If you use [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), the `stream` value must be `false`.

## OpenAI response body
<a name="model-parameters-openai-response"></a>

The response body for OpenAI models conforms to the chat completion object returned by OpenAI. For more information about the response fields, see [The chat completion object](https://platform.openai.com/docs/api-reference/chat/object) in the OpenAI documentation.

**Note**  
If you use `InvokeModel`, the model reasoning, surrounded by `<reasoning>` tags, precedes the text content of the response.
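If you want to handle the reasoning separately from the answer, one minimal way to split it out is sketched below, assuming the tags appear exactly as `<reasoning>...</reasoning>` at the start of the text content:

```
import re

# Illustrative sketch: separate the <reasoning> prefix that InvokeModel
# prepends to the text content from the answer itself.

def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no tags are present."""
    match = re.match(r"<reasoning>(.*?)</reasoning>\s*", text, re.DOTALL)
    if match:
        return match.group(1).strip(), text[match.end():]
    return "", text

reasoning, answer = split_reasoning(
    "<reasoning>The user asked for a greeting.</reasoning>Hello!")
print(reasoning)  # The user asked for a greeting.
print(answer)     # Hello!
```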

## Example usage of OpenAI models
<a name="model-parameters-openai-use"></a>

This section provides some examples of how to use the OpenAI models.

### Prerequisites
<a name="model-parameters-openai-use-prereq"></a>

Before trying out these examples, check that you've fulfilled the prerequisites:
+ **Authentication** – You can authenticate with either your AWS credentials or with an Amazon Bedrock API key.

  Set up your AWS credentials or generate an Amazon Bedrock API key to authenticate your request.
  + To learn about setting up your AWS credentials, see [Programmatic access with AWS security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds-programmatic-access.html).
  + To learn about Amazon Bedrock API keys and how to generate them, see the API keys section in the Build chapter.
**Note**  
If you use the OpenAI Chat completions API, you can only authenticate with an Amazon Bedrock API key.
+ **Endpoint** – Find the endpoint that corresponds to the AWS Region to use in [Amazon Bedrock Runtime endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). If you use an AWS SDK, you might only need to specify the region code and not the whole endpoint when you set up the client. You must use an endpoint associated with a Region supported by the model used in the example.
+ **Model access** – Request access to an OpenAI model. For more information, see [Manage model access using SDK and CLI](model-access.md#model-access-modify).
+ **(If the example uses an SDK) Install the SDK** – After installation, set up default credentials and a default AWS Region. If you don't set up default credentials or a Region, you'll have to explicitly specify them in the relevant code examples. For more information about standardized credential providers, see [AWS SDKs and Tools standardized credential providers](https://docs.aws.amazon.com/sdkref/latest/guide/standardized-credentials.html).
**Note**  
If you use the OpenAI SDK, you can only authenticate with an Amazon Bedrock API key and you must explicitly set the Amazon Bedrock endpoint.

Expand the section for the example that you want to see:

### OpenAI Create chat completion
<a name="model-parameters-openai-use-chat-completions"></a>

To see examples of using the OpenAI Create chat completion API, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

The following Python script calls the Create chat completion API with the OpenAI Python SDK:

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1", 
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

completion = client.chat.completions.create(
    model="openai.gpt-oss-20b-1:0",
    messages=[
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ]
)

print(completion.choices[0].message)
```

------
#### [ HTTP request using curl ]

You can run the following command in a terminal to call the Create chat completion API using curl:

```
curl -X POST https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK" \
   -d '{
    "model": "openai.gpt-oss-20b-1:0",
    "messages": [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ]
}'
```

------

### InvokeModel
<a name="model-parameters-openai-use-invoke"></a>

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Python ]

```
import boto3
import json

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime')

# Model ID
model_id = 'openai.gpt-oss-20b-1:0'

# Create the request body
native_request = {
  "model": model_id, # You can omit this field
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "assistant", 
      "content": "Hello! How can I help you today?"
    },
    {
      "role": "user",
      "content": "What is the weather like today?"
    }
  ],
  "max_completion_tokens": 150,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": False # You can omit this field
}

# Make the InvokeModel request
response = client.invoke_model(
    modelId=model_id,
    body=json.dumps(native_request)
)

# Parse and print the message for each choice in the chat completion
response_body = json.loads(response['body'].read().decode('utf-8'))

for choice in response_body['choices']:
    print(choice['message']['content'])
```

------

### Converse
<a name="model-parameters-openai-use-converse"></a>

When you use the unified [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) API, you need to map the fields in the OpenAI Create chat completion request to their corresponding fields in the Converse request body.

For example, compare the following chat completion request body to its corresponding Converse request body:

------
#### [ Create chat completion request body ]

```
{
  "model": "openai.gpt-oss-20b-1:0",
  "messages": [
    {
      "role": "developer",
      "content": "You are a helpful assistant."
    },
    {
      "role": "assistant", 
      "content": "Hello! How can I help you today?"
    },
    {
      "role": "user",
      "content": "What is the weather like today?"
    }
  ],
  "max_completion_tokens": 150,
  "temperature": 0.7
}
```

------
#### [ Converse request body ]

```
{
    "messages": [
        {
            "role": "assistant", 
            "content": [
                {
                    "text": "Hello! How can I help you today?"
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "text": "What is the weather like today?"
                }
            ]
        }
    ],
    "system": [
        {
            "text": "You are a helpful assistant."
        }
    ],
    "inferenceConfig": {
        "maxTokens": 150,
        "temperature": 0.7
    }
}
```

------

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Python ]

```
# Use the Converse API to send a text message to an OpenAI model.

import boto3
from botocore.exceptions import ClientError

# Initialize the Bedrock Runtime client
client = boto3.client("bedrock-runtime")

# Set the model ID
model_id = "openai.gpt-oss-20b-1:0"

# Set up messages and system message
messages = [
    {
        "role": "assistant", 
        "content": [
            {
                "text": "Hello! How can I help you today?"
            }
        ]
    },
    {
        "role": "user",
        "content": [
            {
                "text": "What is the weather like today?"
            }
        ]
    }
]

system = [
    {
        "text": "You are a helpful assistant."
    }
]

try:
    # Send the message to the model, using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=messages,
        system=system,
        inferenceConfig={
            "maxTokens": 150, 
            "temperature": 0.7, 
            "topP": 0.9
        },
    )

    # Extract and print the response text.
    for content_block in response["output"]["message"]["content"]:
        print(content_block)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)
```

------

### Guardrails with InvokeModel
<a name="model-parameters-openai-use-guardrails-invoke"></a>

Apply a guardrail when running model invocation by specifying the guardrail ID, version, and whether or not to enable the guardrail trace in the header of a model invocation request.

Choose the tab for your preferred method, and then follow the steps:

------
#### [ Python ]

```
import boto3
from botocore.exceptions import ClientError
import json

# Initiate the Amazon Bedrock Runtime client
bedrock_runtime = boto3.client("bedrock-runtime")

# Model ID
model_id = "openai.gpt-oss-20b-1:0"

# Replace with actual values from your guardrail
guardrail_id = "GR12345"
guardrail_version = "DRAFT"

# Create the request body
native_request = {
  "model": model_id, # You can omit this field
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "assistant", 
      "content": "Hello! How can I help you today?"
    },
    {
      "role": "user",
      "content": "What is the weather like today?"
    }
  ],
  "max_completion_tokens": 150,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": False # You can omit this field
}

try:
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps(native_request),
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        trace='ENABLED',
    )
    response_body = json.loads(response.get('body').read())
    print("Received response from InvokeModel API (Request Id: {})".format(response['ResponseMetadata']['RequestId']))
    print(json.dumps(response_body, indent=2))

except ClientError as err:
    print("RequestId = " + err.response['ResponseMetadata']['RequestId'])
    raise err
```

------

### Guardrails with OpenAI chat completions
<a name="model-parameters-openai-use-guardrails-chat-completions"></a>

To see examples of using guardrails with OpenAI chat completions, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
import openai
from openai import OpenAIError

# Endpoint for Amazon Bedrock Runtime
bedrock_endpoint = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

# Model ID
model_id = "openai.gpt-oss-20b-1:0"

# Replace with actual values
bedrock_api_key = "$AWS_BEARER_TOKEN_BEDROCK"
guardrail_id = "GR12345"
guardrail_version = "DRAFT"

client = openai.OpenAI(
    api_key=bedrock_api_key,
    base_url=bedrock_endpoint,
)

try:
    response = client.chat.completions.create(
        model=model_id,
        # Specify guardrail information in the header
        extra_headers={
            "X-Amzn-Bedrock-GuardrailIdentifier": guardrail_id,
            "X-Amzn-Bedrock-GuardrailVersion": guardrail_version,
            "X-Amzn-Bedrock-Trace": "ENABLED",
        },
        # Additional guardrail information can be specified in the body
        extra_body={
            "amazon-bedrock-guardrailConfig": {
                "tagSuffix": "xyz"  # Used for input tagging
            }
        },
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "assistant", 
                "content": "Hello! How can I help you today?"
            },
            {
                "role": "user",
                "content": "What is the weather like today?"
            }
        ]
    )

    request_id = response._request_id
    print(f"Request ID: {request_id}")
    print(response)
    
except OpenAIError as e:
    print(f"An error occurred: {e}")
    if hasattr(e, 'response') and e.response is not None:
        request_id = e.response.headers.get("x-request-id")
        print(f"Request ID: {request_id}")
```

------
#### [ OpenAI SDK (Java) ]

```
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.JsonValue;
import com.openai.core.http.HttpResponseFor;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

import java.util.Map;

// Endpoint for Amazon Bedrock Runtime
String bedrockEndpoint = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1";

// Model ID
String modelId = "openai.gpt-oss-20b-1:0";

// Replace with actual values
String bedrockApiKey = "$AWS_BEARER_TOKEN_BEDROCK";
String guardrailId = "GR12345";
String guardrailVersion = "DRAFT";

OpenAIClient client = OpenAIOkHttpClient.builder()
        .apiKey(bedrockApiKey)
        .baseUrl(bedrockEndpoint)
        .build();

ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
        .addUserMessage("What is the temperature in Seattle?")
        .model(modelId)
        // Specify additional headers for the guardrail
        .putAdditionalHeader("X-Amzn-Bedrock-GuardrailIdentifier", guardrailId)
        .putAdditionalHeader("X-Amzn-Bedrock-GuardrailVersion", guardrailVersion)
        // Specify additional body parameters for the guardrail
        .putAdditionalBodyProperty(
                "amazon-bedrock-guardrailConfig",
                JsonValue.from(Map.of("tagSuffix", JsonValue.of("xyz"))) // Allows input tagging
        )
        .build();
        
HttpResponseFor<ChatCompletion> rawChatCompletionResponse =
        client.chat().completions().withRawResponse().create(request);

final ChatCompletion chatCompletion = rawChatCompletionResponse.parse();

System.out.println(chatCompletion);
```

------

### Batch inference
<a name="model-parameters-openai-use-batch"></a>

[Batch inference](batch-inference.md) lets you run model inference asynchronously with multiple prompts. To run batch inference with an OpenAI model, you do the following:

1. Create a JSONL file and populate it with JSON objects, one per line, meeting at least the minimum number of records required for a batch inference job. Each `modelInput` object must conform to the format of the [OpenAI create chat completion](https://platform.openai.com/docs/api-reference/chat/create) request body. The following shows an example of the first two lines of a JSONL file containing request bodies for OpenAI.

   ```
   {
       "recordId": "RECORD1", 
       "modelInput": {
           "messages": [
               {
                   "role": "system", 
                   "content": "You are a helpful assistant."
               }, 
               {
                   "role": "user", 
                   "content": "Can you generate a question with a factual answer?"
               }
           ], 
           "max_completion_tokens": 1000
       }
   }
   {
       "recordId": "RECORD2", 
       "modelInput": {
           "messages": [
               {
                   "role": "system", 
                   "content": "You are a helpful assistant."
               }, 
               {
                   "role": "user", 
                   "content": "What is the weather like today?"
               }
           ], 
           "max_completion_tokens": 1000
       }
   }
   ...
   ```
**Note**  
The `model` field is optional because the batch inference service will insert it for you based on the header if you omit it.  
Check that your JSONL file conforms to the batch inference quotas as outlined in [Format and upload your batch inference data](batch-inference-data.md).

1. Upload the file to an Amazon S3 bucket.

1. Send a [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) with the S3 bucket from the previous step specified in the `inputDataConfig` field and the OpenAI model specified in the `modelId` field.

For an end-to-end code example, see [Code example for batch inference](batch-inference-example.md). Replace with the proper configurations for the OpenAI models.
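For illustration, the JSONL records in step 1 can be built with a short helper. The prompts and file name below are placeholders, not values from the batch inference service:

```
import json

# Illustrative helper: build batch inference records in the JSONL format
# shown in step 1, one JSON object per line.

def build_records(prompts):
    records = []
    for i, prompt in enumerate(prompts, start=1):
        records.append({
            "recordId": f"RECORD{i}",
            "modelInput": {
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ],
                "max_completion_tokens": 1000
            }
        })
    return records

prompts = ["Can you generate a question with a factual answer?",
           "What is the weather like today?"]
lines = [json.dumps(r) for r in build_records(prompts)]

# Write the JSONL file to upload to Amazon S3 in step 2
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```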

# Stability AI models
<a name="model-parameters-stability-diffusion"></a>

This section describes the request parameters and response fields for Stability AI models. Use this information to make inference calls to Stability AI models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) operation. This section also includes Python code examples that show how to call Stability AI models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Stability AI model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Stability AI models support, the Amazon Bedrock features that they support, and the AWS Regions that they're available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Stability AI models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Stability AI specific prompt information, see the [Stability AI prompt engineering guide](https://platform.stability.ai/docs/getting-started).

## Supported models and image services
<a name="supported-stability-models"></a>

Amazon Bedrock supports the following Stability AI models and image services.

**Note**  
Support for all other Stability AI models is being deprecated.


| Model | Use cases | Example | 
| --- | --- | --- | 
|  [Stable Image Ultra](model-parameters-diffusion-stable-ultra-text-image-request-response.md)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-stability-diffusion.html)  |  A luxury brand uses Stable Image Ultra to create stunning visuals of its latest collection for magazine spreads, ensuring a premium feel that matches its high standards.  | 
|  [Stable Diffusion 3.5 Large](model-parameters-diffusion-3-5-large.md)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-stability-diffusion.html)  |  A game development team uses SD3.5 Large to create detailed environmental textures and character concepts, accelerating their creative pipeline.  | 
|  [Stable Image Core](model-parameters-diffusion-stable-image-core-text-image-request-response.md)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-stability-diffusion.html)  |  An online retailer uses Stable Image Core to quickly generate product images for new arrivals, allowing it to list items faster and keep its catalog up-to-date.  | 
|  [Stability AI Image Services](stable-image-services.md)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-stability-diffusion.html)  |  A media company uses search and recolor, style transfer, and remove background to generate variations of images for an advertising campaign.  | 

**Topics**
+ [Supported models and image services](#supported-stability-models)
+ [Stable Image Ultra request and response](model-parameters-diffusion-stable-ultra-text-image-request-response.md)
+ [Stability.ai Stable Diffusion 3.5 Large](model-parameters-diffusion-3-5-large.md)
+ [Stable Image Core request and response](model-parameters-diffusion-stable-image-core-text-image-request-response.md)
+ [Stability AI Image Services](stable-image-services.md)

# Stable Image Ultra request and response
<a name="model-parameters-diffusion-stable-ultra-text-image-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) operation. 

**Model invocation request body field**

When you make an `InvokeModel` call using a Stable Image Ultra model, fill the `body` field with a JSON object that contains the following fields.
+ **prompt** – (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-stable-ultra-text-image-request-response.html)

**Model invocation responses body field**

When you make an `InvokeModel` call using a Stable Image Ultra model, the response looks like the following.

```
{
    "seeds": [2130420379],
    "finish_reasons": [null],
    "images": ["..."]
}
```

A response with a finish reason that is not `null` looks like the following:

```
{
    "finish_reasons": ["Filter reason: prompt"]
}
```
+ **seeds** – (string) List of seeds used to generate images for the model.
+ **finish_reasons** – Enum indicating whether the request was filtered. `null` indicates that the request was successful. Current possible values: `"Filter reason: prompt", "Filter reason: output image", "Filter reason: input image", "Inference error", null`.
+ **images** – A list of generated images in base64 string format.

For more information, see [https://platform.us.stability.ai/docs/api-reference#tag/v1generation](https://platform.us.stability.ai/docs/api-reference#tag/v1generation).
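Putting the response fields together, a caller typically checks `finish_reasons` before decoding the base64 image data. The following is an illustrative sketch, not part of any SDK; the response dictionaries mirror the examples above.

```
import base64

def decode_first_image(model_response):
    # Raise if the request was filtered; otherwise return the raw image bytes.
    finish_reason = model_response["finish_reasons"][0]
    if finish_reason is not None:
        raise RuntimeError(f"Image generation failed: {finish_reason}")
    return base64.b64decode(model_response["images"][0])

# Successful response: finish reason is null (None in Python).
ok = {"seeds": [2130420379], "finish_reasons": [None], "images": ["aGVsbG8="]}
print(decode_first_image(ok))  # b'hello'

# Filtered response: only finish_reasons is present.
filtered = {"finish_reasons": ["Filter reason: prompt"]}
# decode_first_image(filtered) raises RuntimeError
```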

------
#### [ Text to image ]

The Stability.ai Stable Image Ultra model has the following inference parameters for a text-to-image inference call. 
+ **prompt** – (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-stable-ultra-text-image-request-response.html)

**Optional fields**
+ **aspect_ratio** – (string) Controls the aspect ratio of the generated image. This parameter is only valid for text-to-image requests. Default 1:1. Enum: 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21.
+ **mode** – Set to text-to-image. Default: text-to-image. Enum: `text-to-image`.
+ **output_format** – Specifies the format of the output image. Supported formats: JPEG, PNG. Supported dimensions: height 640 to 1,536 px, width 640 to 1,536 px.
+ **seed** – (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range: 0 to 4294967295.
+ **negative_prompt** – Keywords of what you do not wish to see in the output image. Max: 10,000 characters.

```
import boto3
import json
import base64
import io
from PIL import Image

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    modelId='stability.sd3-ultra-v1:1',
    body=json.dumps({
        'prompt': 'A car made out of vegetables.'
    })
)
output_body = json.loads(response["body"].read().decode("utf-8"))
base64_output_image = output_body["images"][0]
image_data = base64.b64decode(base64_output_image)
image = Image.open(io.BytesIO(image_data))
image.save("image.png")
```

------
#### [ Image to image ]

The Stability.ai Stable Image Ultra model has the following inference parameters for an image-to-image inference call.
+ **prompt** – (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-stable-ultra-text-image-request-response.html)

**Optional fields**
+ **image** – (string) The Base64 image to use as the starting point for the generation. Supported formats: JPEG, PNG, WebP.
+ **strength** – (number) How much influence the image parameter has on the generated image. Images with lower strength values will look more like the original image. Range: 0.0 to 1.0. Default: 0.35.
+ **aspect_ratio** – (string) Controls the aspect ratio of the generated image. This parameter is only valid for text-to-image requests. Default 1:1. Enum: 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21.
+ **output_format** – Specifies the format of the output image. Supported formats: JPEG, PNG. Supported dimensions: height 640 to 1,536 px, width 640 to 1,536 px.
+ **seed** – (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range: 0 to 4294967295.
+ **negative_prompt** – Keywords of what you do not wish to see in the output image. Max: 10,000 characters.

```
import boto3
import json
import base64
import io
from PIL import Image

# Encode the starting image as base64
with open('input_image.jpg', 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    modelId='stability.sd3-ultra-v1:1',
    body=json.dumps({
        'prompt': 'A car made out of vegetables.',
        'image': image_base64,
        'strength': 0.35
    })
)
output_body = json.loads(response["body"].read().decode("utf-8"))
base64_output_image = output_body["images"][0]
image_data = base64.b64decode(base64_output_image)
image = Image.open(io.BytesIO(image_data))
image.save("image.png")
```

------

# Stability.ai Stable Diffusion 3.5 Large
<a name="model-parameters-diffusion-3-5-large"></a>

The Stable Diffusion 3.5 Large model uses 8 billion parameters and supports 1 megapixel resolution output for text-to-image and image-to-image generation.

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). 

**Model invocation request body field**

When you make an `InvokeModel` call using the Stable Diffusion 3.5 Large model, fill the `body` field with a JSON object that contains the following fields.
+ **prompt** – (string) Text description of the desired output image. Maximum 10,000 characters.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)

**Model invocation responses body field**

When you make an `InvokeModel` call using the Stable Diffusion 3.5 Large model, the response looks like the following.

```
{
    "seeds": [2130420379],
    "finish_reasons": [null],
    "images": ["..."]
}
```

A response with a finish reason that is not `null` looks like the following:

```
{
    "finish_reasons":["Filter reason: prompt"]
}
```
+ **seeds** – (string) List of seeds used to generate images for the model.
+ **finish_reasons** – Enum indicating whether the request was filtered. `null` indicates that the request was successful. Current possible values: `"Filter reason: prompt", "Filter reason: output image", "Filter reason: input image", "Inference error", null`.
+ **images** – A list of generated images in base64 string format.

------
#### [ Text to image ]

The Stability.ai Stable Diffusion 3.5 Large model has the following inference parameters for a text-to-image inference call.
+ **prompt** (string) – Text description of the desired output image. Maximum 10,000 characters.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)

**Optional parameters**
+ **aspect_ratio** (string) – Controls the aspect ratio of the generated image. Valid for text-to-image requests only. Enum: 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21. Default 1:1.
+ **mode** (string) (GenerationMode) - Default: text-to-image. Enum: image-to-image or text-to-image. Controls whether this is a text-to-image or image-to-image generation, which affects which parameters are required:
  + text-to-image requires only the prompt parameter.
  + image-to-image requires the prompt, image, and strength parameters.
+ **seed** (number) – Value to control randomness in generation. Range 0 to 4294967294. Default 0 (random seed).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **negative_prompt** (string) – Text describing elements to exclude from the output image. Maximum 10,000 characters.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **output_format** (string) – Output image format. Enum: jpeg, png, webp. Default png.

```
import boto3
import json
import base64
import io
from PIL import Image

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    modelId='stability.sd3-5-large-v1:0',
    body=json.dumps({
        'prompt': 'A car made out of vegetables.'
    })
)
output_body = json.loads(response["body"].read().decode("utf-8"))
image_data = base64.b64decode(output_body["images"][0])
Image.open(io.BytesIO(image_data)).save("image.png")
```

------
#### [ Image to image ]

The Stability.ai Stable Diffusion 3.5 Large model has the following inference parameters for an image-to-image inference call.
+ **prompt** (string) – Text description of the desired output image. Maximum 10,000 characters.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **image** (string) – Base64-encoded input image. Minimum 64 pixels per side. Supported formats: jpeg, png, webp.
+ **mode** (string) (GenerationMode) - Default: text-to-image. Enum: image-to-image or text-to-image. Controls whether this is a text-to-image or image-to-image generation, which affects which parameters are required:
  + text-to-image requires only the prompt parameter.
  + image-to-image requires the prompt, image, and strength parameters.
+ **strength** (number) – Controls influence of the input image on the output. Range 0 to 1. Value of 0 preserves the input image, value of 1 ignores the input image.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **seed** (number) – Value to control randomness in generation. Range 0 to 4294967294. Default 0 (random seed).    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **negative_prompt** (string) – Text describing elements to exclude from the output image. Maximum 10,000 characters.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-3-5-large.html)
+ **output_format** (string) – Output image format. Enum: jpeg, png, webp. Default png.

```
import boto3
import base64
import io
import json
from PIL import Image

# Load and encode image
with open('input_image.jpg', 'rb') as image_file:
    image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    modelId='stability.sd3-5-large-v1:0',
    body=json.dumps({
        'prompt': 'A car made out of vegetables.',
        'image': image_base64,
        'strength': 0.7
    })
)
output_body = json.loads(response["body"].read().decode("utf-8"))
image_data = base64.b64decode(output_body["images"][0])
Image.open(io.BytesIO(image_data)).save("image.png")
```

------

# Stable Image Core request and response
<a name="model-parameters-diffusion-stable-image-core-text-image-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). 

**Model invocation request body field**

When you make an `InvokeModel` call using the Stability AI Stable Image Core model, fill the `body` field with a JSON object like the following.

```
{
    "prompt": "Create an image of a panda"
}
```

**Model invocation responses body field**

When you make an `InvokeModel` call using the Stability AI Stable Image Core model, the response looks like the following.

```
{
    "seeds": [2130420379],
    "finish_reasons": [null],
    "images": ["..."]
}
```
+ **seeds** – (string) List of seeds used to generate images for the model.
+ **finish_reasons** – Enum indicating whether the request was filtered. `null` indicates that the request was successful. Current possible values: `"Filter reason: prompt", "Filter reason: output image", "Filter reason: input image", "Inference error", null`.
+ **images** – A list of generated images in base64 string format.

For more information, see [https://platform.us.stability.ai/docs/api-reference#tag/v1generation](https://platform.us.stability.ai/docs/api-reference#tag/v1generation).

------
#### [ Text to image ]

The Stable Image Core model has the following inference parameters for a text to image inference call. 

**text_prompts** (Required) – An array of text prompts to use for generation. Each element is a JSON object that contains a prompt and a weight for the prompt.
+ **prompt** – (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-diffusion-stable-image-core-text-image-request-response.html)

**Optional fields**
+ **aspect_ratio** – (string) Controls the aspect ratio of the generated image. This parameter is only valid for text-to-image requests. Default 1:1. Enum: 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21.
+ **output_format** – Specifies the format of the output image. Supported formats: JPEG, PNG. Supported dimensions: height 640 to 1,536 px, width 640 to 1,536 px.
+ **seed** – (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range: 0 to 4294967295.
+ **negative_prompt** – Keywords of what you do not wish to see in the output image. Max: 10,000 characters.

```
import boto3
import json
import base64
import io
from PIL import Image

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    modelId='stability.stable-image-core-v1:0',
    body=json.dumps({
        'prompt': 'A car made out of vegetables.'
    })
)
output_body = json.loads(response["body"].read().decode("utf-8"))
base64_output_image = output_body["images"][0]
image_data = base64.b64decode(base64_output_image)
image = Image.open(io.BytesIO(image_data))
image.save("image.png")
```

------

# Stability AI Image Services
<a name="stable-image-services"></a>

You can use Stability AI Image Services with Amazon Bedrock to access thirteen specialized image editing tools designed to accelerate professional creative workflows. With Stability AI Image Services you can generate images from a sketch, restructure and restyle an existing image, or remove and replace objects within an image.

This section describes how to make inference calls to Stability AI Image Services using the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) operation. This section also provides code examples in Python and examples of images before and after using Stability AI Image Services.

Stability AI Image Services are available in the following categories:
+ **Edit** ‐ AI-based image editing services, including inpainting with masks (generative fill) or with words. Includes tools for product placement and advertising, as well as basic tools such as background removal.
+ **Control** ‐ May take prompts, maps and other guides. These services leverage ControlNets and similar technologies built on Stable Diffusion models.

**Note**  
Subscribing to any edit or control Stability AI Image Service automatically enrolls you in all thirteen available Stability AI Image Services.

**Topics**
+ [Request and response](#model-parameters-stable-image-services-request-response)
+ [Upscale](#stable-image-services-upscale)
+ [Edit](#stable-image-services-edit)
+ [Control](#stable-image-services-control)

## Request and response
<a name="model-parameters-stable-image-services-request-response"></a>

The request body is passed in the `body` field of a request to [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html). 

**Model invocation request body field**

When you make an `InvokeModel` call using Stability AI Image Services, fill the `body` field with a JSON object like the following.

```
{
    "prompt": "Create an image of a panda"
}
```

**Model invocation responses body field**

When you make an `InvokeModel` call using Stability AI Image Services, the response looks like the following.

```
{
    "seeds": [2130420379],
    "finish_reasons": [null],
    "images": ["..."]
}
```
+ **seeds** – (string) List of seeds used to generate images for the model.
+ **finish_reasons** – Enum indicating whether the request was filtered. `null` indicates that the request was successful. Current possible values: `"Filter reason: prompt", "Filter reason: output image", "Filter reason: input image", "Inference error", null`.
+ **images** – A list of generated images in base64 string format.

For more information, see [https://platform.us.stability.ai/docs/api-reference#tag/v1generation](https://platform.us.stability.ai/docs/api-reference#tag/v1generation).

## Upscale
<a name="stable-image-services-upscale"></a>

The following section describes the upscale Stability AI Image Services.

### Creative Upscale
<a name="stable-image-services-5"></a>

Creative Upscale takes images between 64x64 pixels and 1 megapixel and upscales them to 4K resolution. This service can upscale images by 20 to 40 times while preserving, and often enhancing, quality. Creative Upscale works best on highly degraded images and is not suited for photos of 1 megapixel or larger, because it performs heavy reimagining of the image.

Creative Upscale has the following required parameters:
+ **prompt** ‐ What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a value. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example, *The sky was a crisp (blue:0.3) and (green:1.8)* would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10,000 characters.
+ **image** ‐ (string) The Base64 image to upscale. Every side of the image must be at least 64 pixels. Total pixel count must be between 4,096 and 1,048,576 pixels. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **creativity** ‐ (number) Indicates how creative the model should be when upscaling an image. Higher values result in more details being added to the image during upscaling. Range between 0.1 and 0.5. Default 0.3.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10,000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **style_preset** ‐ Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
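The (word:weight) syntax in the prompt description above can be assembled with a small helper. This is plain string formatting for illustration only; the helper name is ours, not part of any SDK.

```
def weighted(word, weight):
    # Format a word with a weight between 0 (de-emphasized) and 2 (emphasized).
    if not 0 <= weight <= 2:
        raise ValueError("weight must be between 0 and 2")
    return f"({word}:{weight})"

prompt = f"The sky was a crisp {weighted('blue', 0.3)} and {weighted('green', 1.8)}"
print(prompt)  # The sky was a crisp (blue:0.3) and (green:1.8)
```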

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-creative-upscale-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "This dreamlike digital art captures a vibrant, kaleidoscopic Big Ben in London",
        "creativity": 0.30
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-creative-upscale-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "This dreamlike digital art captures a vibrant, kaleidoscopic Big Ben in London",
        "creativity": 0.30
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Creative Upscale operation using the following prompt: *This dreamlike digital art captures a vibrant, kaleidoscopic bird in a lush rainforest*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-creative-upscale.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-creative-upscale.jpg)  | 

### Conservative Upscale
<a name="stable-image-services-6"></a>

Conservative Upscale takes images between 64x64 pixels and 1 megapixel and upscales them to 4K resolution. This service can upscale images by 20 to 40 times while preserving all aspects of the original image. Conservative Upscale minimizes alterations to the image and should not be used to reimagine an image.

Conservative Upscale has the following required parameters:
+ **prompt** ‐ What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a value. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example, *The sky was a crisp (blue:0.3) and (green:1.8)* would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10,000 characters.
+ **image** ‐ (string) The Base64 image to upscale. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **creativity** ‐ (number) Indicates how creative the model should be when upscaling an image. Higher values result in more details being added to the image during upscaling. Range between 0.1 and 0.5. Default 0.35.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10,000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-conservative-upscale-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "This dreamlike digital art captures a vibrant, kaleidoscopic Big Ben in London",
        "creativity": 0.30
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-conservative-upscale-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "This dreamlike digital art captures a vibrant, kaleidoscopic Big Ben in London",
        "creativity": 0.30
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Conservative Upscale operation using the following prompt: *photo of a giant chicken in a forest*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-conservative-upscale.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-conservative-upscale.jpg)  | 

### Fast Upscale
<a name="stable-image-services-7"></a>

Fast Upscale enhances image resolution by 4 times using predictive and generative AI. This lightweight and fast service is ideal for enhancing the quality of compressed images, making it suitable for social media posts and other applications.

Fast Upscale has the following required parameters:
+ **image** ‐ (string) The Base64 image to upscale. Width must be between 32 and 1,536 pixels. Height must be between 32 and 1,536 pixels. Total pixel count must be between 1,024 and 1,048,576 pixels. Supported formats: jpeg, png, webp.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
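The dimension limits above can be checked client-side before invoking the model; the following sketch (the function name is illustrative, not part of any API) rejects images outside the documented bounds:

```python
from PIL import Image

def validate_fast_upscale_input(path):
    """Verify an image meets the documented Fast Upscale size limits."""
    with Image.open(path) as img:
        width, height = img.size
    # Each side must be between 32 and 1,536 pixels.
    if not (32 <= width <= 1536 and 32 <= height <= 1536):
        raise ValueError(f"Each side must be 32-1,536 pixels; got {width}x{height}")
    # Total pixel count must be between 1,024 and 1,048,576.
    if not (1024 <= width * height <= 1048576):
        raise ValueError(f"Total pixel count must be 1,024-1,048,576; got {width * height}")
    return width, height
```

Validating locally avoids a round trip to the service for inputs that would be rejected anyway.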

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-fast-upscale-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-fast-upscale-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Fast Upscale operation.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-fast-upscale.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-fast-upscale.jpg)  | 

## Edit
<a name="stable-image-services-edit"></a>

The following section describes the edit Stability AI Image Services.

### Inpaint
<a name="stable-image-services-8"></a>

Inpaint intelligently modifies images by filling in or replacing specified areas with new content based on the content of a mask image.

Inpaint has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects leads to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 image to inpaint. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.
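The (word:weight) syntax described above can be assembled programmatically; a minimal sketch (the helper name is ours, not part of the API):

```python
def weighted(word, weight):
    """Wrap a word in the (word:weight) emphasis syntax used by the prompt parameter."""
    return f"({word}:{weight})"

# Emphasize green over blue, as in the example above.
prompt = f"The sky was a crisp {weighted('blue', 0.3)} and {weighted('green', 1.8)}"
```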

The following parameters are optional:
+ **style\_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
+ **negative\_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **mask** ‐ (string) Controls the strength of the inpainting process on a per-pixel basis, either via a second image (passed into this parameter) or via the alpha channel of the image parameter.
  + **Passing in a Mask** ‐ The image passed to this parameter should be a black and white image that represents, at any pixel, the strength of inpainting based on how dark or light the given pixel is. Completely black pixels represent no inpainting strength while completely white pixels represent maximum strength. In the event the mask is a different size than the image parameter, it will be automatically resized.
  + **Alpha Channel Support** ‐ If you don't provide an explicit mask, one will be derived from the alpha channel of the image parameter. Transparent pixels will be inpainted while opaque pixels will be preserved. In the event an image with an alpha channel is provided along with a mask, the mask will take precedence.
+ **grow\_mask** ‐ (number) Grows the edges of the mask outward in all directions by the specified number of pixels. The expanded area around the mask will be blurred, which can help smooth the transition between inpainted content and the original image. Range between 0 and 20. Default 5. Try this parameter if you notice seams or rough edges around the inpainted content. Note that excessive growth may obscure fine details in the mask and/or merge nearby masked regions.
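The mask semantics described above (black preserves pixels, white marks maximum inpainting strength) can be produced with Pillow; a sketch, where the canvas size and rectangle coordinates are placeholder values:

```python
import base64
import io

from PIL import Image, ImageDraw

def make_rect_mask(width, height, box):
    """Build a Base64 PNG mask: black preserves pixels, white marks the inpaint region."""
    mask = Image.new("L", (width, height), 0)      # fully black: no inpainting
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white box: maximum strength
    buffer = io.BytesIO()
    mask.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```

The returned string can be passed directly as the `mask` request parameter.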

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.png"
mask = "./content/mask.png"

region = "us-east-1"
model_id = "us.stability.stable-image-inpaint-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
    with open(mask, "rb") as mask_file:
        mask_base64 = base64.b64encode(mask_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "mask": mask_base64,
        "prompt": "artificer of time and space"
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png"
mask = "./content/mask.png"

region = "us-east-1"
model_id = "us.stability.stable-image-inpaint-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
    with open(mask, "rb") as mask_file:
        mask_base64 = base64.b64encode(mask_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "mask": mask_base64,
        "prompt": "artificer of time and space"
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of an Inpaint operation.


|  Input  |  Mask  |  Output  | 
| --- | --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-image-inpaint.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/mask-image-inpaint.png)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-image-inpaint.jpg)  | 

### Outpaint
<a name="stable-image-services-9"></a>

Outpaint inserts additional content into an image to expand it in any direction. Compared to other automated or manual attempts to extend an image, the Outpaint service minimizes signs that the original image has been edited.

Outpaint has the following required parameters:
+ **image** ‐ (string) The Base64 image to outpaint. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.
**Note**  
At least one outpaint direction (left, right, up, or down) must be supplied with a non-zero value. For the best results, consider the composition and content of your original image when choosing outpainting directions.

The following parameters are optional:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects leads to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **style\_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **creativity** ‐ (number) Indicates how creative the model should be when outpainting an image. Higher values will result in more creative content being added to the image during outpainting. Range between 0.1 and 1.0. Default 0.5.
+ **left** ‐ (integer) The number of pixels to outpaint on the left side of the image. At least one outpainting direction must be supplied with a non-zero value. Range 0 to 2000. Default 0.
+ **right** ‐ (integer) The number of pixels to outpaint on the right side of the image. At least one outpainting direction must be supplied with a non-zero value. Range 0 to 2000. Default 0.
+ **up** ‐ (integer) The number of pixels to outpaint on the top of the image. At least one outpainting direction must be supplied with a non-zero value. Range 0 to 2000. Default 0.
+ **down** ‐ (integer) The number of pixels to outpaint on the bottom of the image. At least one outpainting direction must be supplied with a non-zero value. Range 0 to 2000. Default 0.
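Assuming the output canvas grows by the requested padding in each direction, the expected output dimensions can be sketched as follows, including the non-zero-direction check from the note above (the helper is illustrative, not part of the API):

```python
def outpaint_output_size(width, height, left=0, right=0, up=0, down=0):
    """Validate outpaint directions and compute the expected output dimensions."""
    if not any((left, right, up, down)):
        raise ValueError("At least one outpaint direction must be non-zero.")
    for value in (left, right, up, down):
        if not 0 <= value <= 2000:
            raise ValueError("Each direction must be between 0 and 2000 pixels.")
    # The canvas expands horizontally by left + right and vertically by up + down.
    return width + left + right, height + up + down
```

For the example request below (left=512, right=512, up=200, down=100), a 1,024 x 768 input would be expected to produce a 2,048 x 1,068 output.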

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-outpaint-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "left": 512,
        "right": 512,
        "up": 200,
        "down": 100
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-outpaint-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "left": 512,
        "right": 512,
        "up": 200,
        "down": 100
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of an Outpaint operation.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-image-outpaint.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-image-outpaint.jpg)  | 

### Search and Recolor
<a name="stable-image-services-10"></a>

Search and Recolor allows you to change the color of a specific object in an image using a prompt. This service is a specific version of inpainting that does not require a mask. It will automatically segment the object and recolor it using the colors requested in the prompt.

Search and Recolor has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects leads to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 image to recolor. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.
+ **select\_prompt** ‐ (string) Short description of what to search for in the image. Maximum 10000 characters.

The following parameters are optional:
+ **style\_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
+ **negative\_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **grow\_mask** ‐ (number) Grows the edges of the mask outward in all directions by the specified number of pixels. The expanded area around the mask will be blurred, which can help smooth the transition between inpainted content and the original image. Range between 0 and 20. Default 5. Try this parameter if you notice seams or rough edges around the inpainted content. Note that excessive growth may obscure fine details in the mask and/or merge nearby masked regions.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-search-recolor-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "pink jacket",
        "select_prompt": "jacket"
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)

    base64_image_data = model_response["images"][0]
    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-search-recolor-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "pink jacket",
        "select_prompt": "jacket"
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Search and Recolor operation using the following prompt: *pink jacket*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-search-recolor.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-search-recolor.jpg)  | 

### Search and Replace
<a name="stable-image-services-11"></a>

Search and Replace allows you to identify an object to replace using a short search prompt written in plain language. The service automatically segments the object and replaces it with the content described in the prompt, without requiring a mask.

Search and Replace has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects leads to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 image to edit. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.
+ **search\_prompt** ‐ (string) Short description of what to search for and replace in the image. Maximum 10000 characters.

The following parameters are optional:
+ **style\_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
+ **negative\_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **grow\_mask** ‐ (number) Grows the edges of the mask outward in all directions by the specified number of pixels. The expanded area around the mask will be blurred, which can help smooth the transition between inpainted content and the original image. Range between 0 and 20. Default 5. Try this parameter if you notice seams or rough edges around the inpainted content. Note that excessive growth may obscure fine details in the mask and/or merge nearby masked regions.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-search-replace-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "jacket",
        "search_prompt": "sweater",
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-search-replace-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "jacket",
        "search_prompt": "sweater",
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")
    image_data = base64.b64decode(base64_image_data)

    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Search and Replace operation using the following prompt: *jacket*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-search-replace.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-search-replace.jpg)  | 

### Erase
<a name="stable-image-services-12"></a>

Erase allows you to remove unwanted elements using image masks, while intelligently maintaining background consistency.

Erase has the following required parameters:
+ **image** ‐ (string) The Base64 image to erase from. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output\_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **mask** ‐ (string) Controls the strength of the inpainting process on a per-pixel basis, either via a second image (passed into this parameter) or via the alpha channel of the image parameter.
  + **Passing in a Mask** ‐ The image passed to this parameter should be a black and white image that represents, at any pixel, the strength of inpainting based on how dark or light the given pixel is. Completely black pixels represent no inpainting strength while completely white pixels represent maximum strength. In the event the mask is a different size than the image parameter, it will be automatically resized.
  + **Alpha Channel Support** ‐ If you don't provide an explicit mask, one will be derived from the alpha channel of the image parameter. Transparent pixels will be inpainted while opaque pixels will be preserved. In the event an image with an alpha channel is provided along with a mask, the mask will take precedence.
+ **grow\_mask** ‐ (number) Grows the edges of the mask outward in all directions by the specified number of pixels. The expanded area around the mask will be blurred, which can help smooth the transition between inpainted content and the original image. Range between 0 and 20. Default 5. Try this parameter if you notice seams or rough edges around the inpainted content. Note that excessive growth may obscure fine details in the mask and/or merge nearby masked regions.

**Note**  
For optimal erase results, ensure your mask accurately defines the areas to be removed. If no explicit mask is provided, the service will use the alpha channel of the input image. The mask will take precedence if both are provided.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.png"
mask = "./content/mask.png"

region = "us-east-1"
model_id = "us.stability.stable-image-erase-object-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
    with open(mask, "rb") as mask_file:
        mask_base64 = base64.b64encode(mask_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "mask": mask_base64
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png" 
mask = "./content/mask.png"

region = "us-east-1"
model_id = "us.stability.stable-image-erase-object-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
    with open(mask, "rb") as mask_file:
        mask_base64 = base64.b64encode(mask_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "mask": mask_base64
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of an Erase operation.


|  Input  |  Mask  |  Output  | 
| --- | --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-erase-object.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/mask-erase-object.png)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-erase-object.jpg)  | 

### Remove Background
<a name="stable-image-services-13"></a>

Remove Background allows you to isolate subjects from the background with precision.

Remove Background has the following required parameters:
+ **image** ‐ (string) The Base64 image to remove the background from. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
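The size constraints above (every side at least 64 pixels, at most 9,437,184 total pixels, aspect ratio between 1:2.5 and 2.5:1) apply to the image inputs of all of the Stability AI Image Services described in this section. They can be checked client-side before encoding; the following helper is a hypothetical sketch, not part of the API:

```python
def validate_image_dimensions(width: int, height: int) -> None:
    """Check an image against the documented input constraints:
    every side at least 64 pixels, at most 9,437,184 total pixels,
    and an aspect ratio between 1:2.5 and 2.5:1."""
    if min(width, height) < 64:
        raise ValueError("Every side of the image must be at least 64 pixels.")
    if width * height > 9_437_184:
        raise ValueError("The total pixel count cannot exceed 9,437,184 pixels.")
    ratio = width / height
    if not (1 / 2.5 <= ratio <= 2.5):
        raise ValueError("Image aspect ratio must be between 1:2.5 and 2.5:1.")

# A 1024x1024 image satisfies all three constraints.
validate_image_dimensions(1024, 1024)
```

Running this check locally avoids a round trip to the service for inputs that would be rejected anyway.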

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-remove-background-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.png"

region = "us-east-1"
model_id = "us.stability.stable-image-remove-background-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")

    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Remove Background operation.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-remove-background.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-remove-background.jpg)  | 

## Control
<a name="stable-image-services-control"></a>

The following section describes the control Stability AI Image Services.

### Control Sketch
<a name="stable-image-services-1"></a>

Upgrade rough hand-drawn sketches to refined outputs with precise control. For non-sketch images, Control Sketch allows detailed manipulation of the final appearance by leveraging the contour lines and edges within the image.

Control Sketch has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 image of the sketch. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **control_strength** ‐ (number) How much influence, or control, the image has on the generation. Represented as a float between 0 and 1, where 0 is the least influence and 1 is the maximum. Default 0.7.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **style_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.
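The (word:weight) syntax described in the prompt parameter can be composed programmatically. The helper below is a hypothetical convenience, not part of the request schema; only the final prompt string is sent to the model:

```python
def weighted(word: str, weight: float) -> str:
    """Format a word with an explicit weight using the (word:weight) syntax.
    Values between 0 and 1.0 de-emphasize the word; values between 1.1 and 2
    emphasize it."""
    if not 0 <= weight <= 2:
        raise ValueError("weight should be between 0 and 2")
    return f"({word}:{weight})"

prompt = f"The sky was a crisp {weighted('blue', 0.3)} and {weighted('green', 1.8)}"
print(prompt)  # The sky was a crisp (blue:0.3) and (green:1.8)
```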

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-control-sketch-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "a house with background of mountains and river flowing nearby"
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-control-sketch-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "a house with background of mountains and river flowing nearby"
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Control Sketch call using the following prompt: *a house with background of mountains and river flowing nearby*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-control-sketch.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-control-sketch.jpg)  | 

### Control Structure
<a name="stable-image-services-2"></a>

Control Structure allows you to generate images while maintaining the structure of an input image. This is especially valuable for advanced content creation scenarios such as recreating scenes or rendering characters from models.

Control Structure has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 input image whose structure guides the generation. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **control_strength** ‐ (number) How much influence, or control, the image has on the generation. Represented as a float between 0 and 1, where 0 is the least influence and 1 is the maximum. Default 0.7.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **style_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-control-structure-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "surreal structure with motion generated sparks lighting the scene"

    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-control-structure-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "surreal structure with motion generated sparks lighting the scene"

    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Control Structure operation using the following prompt: *surreal structure with motion generated sparks lighting the scene*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-control-structure.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-control-structure.jpg)  | 

### Style Guide
<a name="stable-image-services-3"></a>

Style Guide allows you to extract stylistic elements from an input image and use it to guide the creation of an output image based on the prompt. The result is a new image in the same style as the input image.

Style Guide has the following required parameters:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue. Minimum 0 and maximum 10000 characters.
+ **image** ‐ (string) The Base64 image whose style you wish to extract. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **aspect_ratio** ‐ (string) Controls the aspect ratio of the generated image. This parameter is only valid for text-to-image requests. Enum: 16:9, 1:1, 21:9, 2:3, 3:2, 4:5, 5:4, 9:16, 9:21. Default 1:1.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **fidelity** ‐ (number) How closely the output image's style resembles the input image's style. Range 0 to 1. Default 0.5.
+ **style_preset** ‐ (string) Guides the image model towards a particular style. Enum: 3d-model, analog-film, anime, cinematic, comic-book, digital-art, enhance, fantasy-art, isometric, line-art, low-poly, modeling-compound, neon-punk, origami, photographic, pixel-art, tile-texture.

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-style-guide-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "wide shot of modern metropolis" 
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"

region = "us-east-1"
model_id = "us.stability.stable-image-style-guide-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    params = {
        "image": image_base64,
        "prompt": "wide shot of modern metropolis" 
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Style Guide call using the following prompt: *wide shot of modern metropolis*.


|  Input  |  Output  | 
| --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-style-guide.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-style-guide.jpg)  | 

### Style Transfer
<a name="stable-image-services-4"></a>

Style Transfer allows you to apply visual characteristics from reference style images to target images. While the Style Guide service extracts stylistic elements from an input image and uses them to guide the creation of an output image based on the prompt, Style Transfer specifically transforms existing content while preserving the original composition. This tool helps create consistent content across multiple assets.

Style Transfer has the following required parameters:
+ **init_image** ‐ (string) A Base64 image containing the subject you wish to restyle. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.
+ **style_image** ‐ (string) A Base64 image whose style you wish to apply to the init_image. Every side of the image must be at least 64 pixels. The total pixel count cannot exceed 9,437,184 pixels. Image aspect ratio must be between 1:2.5 and 2.5:1. Supported formats: jpeg, png, webp.

The following parameters are optional:
+ **prompt** ‐ (string) What you wish to see in the output image. A strong, descriptive prompt that clearly defines elements, colors, and subjects will lead to better results. To control the weight of a given word, use the format (word:weight), where word is the word you'd like to control the weight of and weight is a number. A value between 0 and 1.0 de-emphasizes the word, and a value between 1.1 and 2 emphasizes it. For example: The sky was a crisp (blue:0.3) and (green:1.8) would convey a sky that was blue and green, but more green than blue.
+ **negative_prompt** ‐ (string) A blurb of text describing what you do not wish to see in the output image. This is an advanced feature. Maximum 10000 characters.
+ **seed** ‐ (number) A specific value that is used to guide the 'randomness' of the generation. (Omit this parameter or pass 0 to use a random seed.) Range 0 to 4294967294. Default 0.
+ **output_format** ‐ (string) Dictates the content-type of the generated image. Enum: jpeg, png, webp. Default png.
+ **composition_fidelity** ‐ (number) How closely the output image's composition resembles the init_image's composition. Range between 0 and 1. Default 0.9.
+ **style_strength** ‐ (number) Sometimes referred to as denoising, this parameter controls how much influence the style_image parameter has on the generated image. A value of 0 would yield an image that is identical to the input. A value of 1 would be as if you passed in no image at all. Range between 0 and 1. Default 1.
+ **change_strength** ‐ (number) How much the original image should change. Range between 0.1 and 1. Default 0.9.
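Because init_image and style_image are easy to swap by mistake, it can help to assemble the request body with a small function. The helper below is a hypothetical sketch that uses only the field names documented above:

```python
import base64

def build_style_transfer_request(init_path, style_path, prompt=None,
                                 style_strength=1.0, change_strength=0.9):
    """Assemble a Style Transfer request body from two image files.
    Field names (init_image, style_image, style_strength, change_strength)
    follow the parameter list above; prompt is optional."""
    def encode(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    body = {
        "init_image": encode(init_path),    # the subject to restyle
        "style_image": encode(style_path),  # the style to apply
        "style_strength": style_strength,
        "change_strength": change_strength,
    }
    if prompt:
        body["prompt"] = prompt
    return body
```

The returned dictionary can be passed to `json.dumps` and used as the body of an `invoke_model` call, as in the examples below.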

------
#### [ API ]

```
import base64
import json
import requests
import io
import os
from PIL import Image

image = "./content/input.jpg"
style_image = "./content/style.jpg"

region = "us-east-1"
model_id = "us.stability.stable-style-transfer-v1:0"
url = f"https://bedrock-runtime.{region}.amazonaws.com/model/{model_id}/invoke"
api_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK") # https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api-keys.html
headers = {
    "Content-Type":"application/json",
    "Authorization":f"Bearer {api_key}"
}

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    with open(style_image, "rb") as style_image_file:
        style_image_base64 = base64.b64encode(style_image_file.read()).decode('utf-8')

    params = {
        "init_image": image_base64,
        "style_image": style_image_base64,
        "prompt": "statue"
    }
    response = requests.request("POST", url, json=params, headers=headers)
    response.raise_for_status()
    model_response = json.loads(response.text)
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------
#### [ Python ]

```
import boto3
import base64
import io
import json
from PIL import Image

image = "./content/input.jpg"
style_image = "./content/style.jpg"

region = "us-east-1"
model_id = "us.stability.stable-style-transfer-v1:0"

bedrock = boto3.client("bedrock-runtime", region_name=region)

try:
    with open(image, "rb") as image_file:
        image_base64 = base64.b64encode(image_file.read()).decode('utf-8')

    with open(style_image, "rb") as style_image_file:
        style_image_base64 = base64.b64encode(style_image_file.read()).decode('utf-8')

    params = {
        "init_image": image_base64,
        "style_image": style_image_base64,
        "prompt": "statue"
    }
    request = json.dumps(params)
    response = bedrock.invoke_model(modelId=model_id, body=request)
    model_response = json.loads(response["body"].read())
    base64_image_data = model_response["images"][0]

    if not base64_image_data:
        raise ValueError("No image data found in model response.")

    image_data = base64.b64decode(base64_image_data)
    image = Image.open(io.BytesIO(image_data))
    image.save("image.png")
    print("Successfully saved image.")

except Exception as e:
    print(e)
```

------

The following table shows the input and output images of a Style Transfer call.


|  Input  |  Style  |  Output  | 
| --- | --- | --- | 
|  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/input-style-transfer.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/style-style-transfer.jpg)  |  ![\[alt text not found\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/stable-image-services/output-style-transfer.jpg)  | 

# TwelveLabs models
<a name="model-parameters-twelvelabs"></a>

This section describes the request parameters and response fields for TwelveLabs models. Use this information to make inference calls to TwelveLabs models. The TwelveLabs Pegasus 1.2 model supports [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. The TwelveLabs Marengo Embed 2.7 and TwelveLabs Marengo Embed 3.0 models support [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) operations. This section also includes code examples that show how to call TwelveLabs models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md).

TwelveLabs is a leading provider of multimodal AI models specializing in video understanding and analysis. Their advanced models enable sophisticated video search, analysis, and content generation capabilities through state-of-the-art computer vision and natural language processing technologies.

Amazon Bedrock offers three TwelveLabs models:
+ TwelveLabs Pegasus 1.2 provides comprehensive video understanding and analysis.
+ TwelveLabs Marengo Embed 2.7 generates high-quality embeddings for video, text, audio, and image content.
+ TwelveLabs Marengo Embed 3.0 is the latest embedding model with enhanced performance and capabilities.

These models help you build applications that process, analyze, and derive insights from video data at scale.

**TwelveLabs Pegasus 1.2**

A multimodal model that provides comprehensive video understanding and analysis capabilities, including content recognition, scene detection, and contextual understanding. The model can analyze video content and generate textual descriptions, insights, and answers to questions about the video.

**TwelveLabs Marengo Embed 2.7**

A multimodal embedding model that generates high-quality vector representations of video, text, audio, and image content for similarity search, clustering, and other machine learning tasks. The model supports multiple input modalities and provides specialized embeddings optimized for different use cases.

**TwelveLabs Marengo Embed 3.0**

An enhanced multimodal embedding model that extends the capabilities of Marengo 2.7 with support for text and image interleaved input modality. This model generates high-quality vector representations of video, text, audio, image, and interleaved text-image content for similarity search, clustering, and other machine learning tasks.

**Topics**
+ [TwelveLabs Pegasus 1.2](model-parameters-pegasus.md)
+ [TwelveLabs Marengo Embed 2.7](model-parameters-marengo.md)
+ [TwelveLabs Marengo Embed 3.0](model-parameters-marengo-3.md)

# TwelveLabs Pegasus 1.2
<a name="model-parameters-pegasus"></a>

The TwelveLabs Pegasus 1.2 model provides comprehensive video understanding and analysis capabilities. It can analyze video content and generate textual descriptions, insights, and answers to questions about the video.

Use this information to make inference calls to the TwelveLabs Pegasus 1.2 model with the InvokeModel and InvokeModelWithResponseStream (streaming) operations.
+ Provider — TwelveLabs
+ Categories — Video understanding, content analysis
+ Model ID — `twelvelabs.pegasus-1-2-v1:0`
+ Input modality — Video
+ Output modality — Text
+ Max video size — 1 hour long video (< 2GB file size)

## TwelveLabs Pegasus 1.2 request parameters
<a name="model-parameters-pegasus-request"></a>

The following table describes the input parameters for the TwelveLabs Pegasus 1.2 model:


**TwelveLabs Pegasus 1.2 request parameters**  

| Field | Type | Required | Description | 
| --- | --- | --- | --- | 
| inputPrompt | string | Yes | Prompt to analyze the video. Max: 2000 tokens. | 
| temperature | double | No | Temperature for the model. Controls randomness in the output. Default: 0.2, Min: 0, Max: 1. | 
| responseFormat | object | No | Lets users specify the structured output format. Currently supports json_schema only. | 
| mediaSource | object | Yes | Describes the media source. Either base64String or s3Location must be provided. | 
| mediaSource.base64String | string | No | Base64 encoded byte string for the video. Max: 25MB. | 
| mediaSource.s3Location.uri | string | No | S3 URI where the video could be downloaded from. Max: 1 hour long video (< 2GB file size). | 
| mediaSource.s3Location.bucketOwner | string | No | AWS account ID of the bucket owner. | 
| maxOutputTokens | integer | No | The maximum number of tokens to generate. Max: 4096. | 

## TwelveLabs Pegasus 1.2 response fields
<a name="model-parameters-pegasus-response"></a>

The following table describes the output fields for the TwelveLabs Pegasus 1.2 model:


**TwelveLabs Pegasus 1.2 response fields**  

| Field | Type | Description | 
| --- | --- | --- | 
| message | string | Output message containing the model's analysis of the video. | 
| finishReason | string | Stop reason that describes why the output ended. Valid values: stop (the API returned the full completion without reaching any limits), length (the generation exceeded the maxOutputTokens limit). | 

## TwelveLabs Pegasus 1.2 request and response
<a name="model-parameters-pegasus-examples"></a>

The following examples show how to use the TwelveLabs Pegasus 1.2 model with different input sources.

------
#### [ Request ]

The following examples show request formats for the TwelveLabs Pegasus 1.2 model.

**Using base64 encoded video:**

```
{
  "inputPrompt": "tell me about the video",
  "mediaSource": {
      "base64String": "<BASE64 STRING OF VIDEO FILE>"
  },
  "temperature": 0
}
```

**Using S3 stored video:**

```
{
    "inputPrompt": "Tell me about this video",
    "mediaSource": {
        "s3Location": {
            "uri": "s3://path-to-video-object-in-s3",
            "bucketOwner": "bucket-owner-account-id"
        }
    },
    "temperature": 0
}
```

**Using structured output format:**

```
{
    "inputPrompt": "Analyze this video and provide a structured summary",
    "mediaSource": {
        "s3Location": {
            "uri": "s3://path-to-video-object-in-s3",
            "bucketOwner": "bucket-owner-account-id"
        }
    },
    "temperature": 0.2,
    "maxOutputTokens": 2048,
    "responseFormat": {
        "type": "json_schema",
        "json_schema": {
            "name": "video_analysis",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "key_scenes": {"type": "array", "items": {"type": "string"}},
                    "duration": {"type": "string"}
                },
                "required": ["summary", "key_scenes"]
            }
        }
    }
}
```

------
#### [ Response ]

The following examples show response formats from the TwelveLabs Pegasus 1.2 model.

**Standard response:**

```
{
  "message": "This video shows a person walking through a park during sunset. The scene includes trees, a walking path, and golden lighting from the setting sun. The person appears to be enjoying a peaceful evening stroll.",
  "finishReason": "stop"
}
```

**Response with structured output:**

```
{
  "message": "{\"summary\": \"A peaceful evening walk through a park at sunset\", \"key_scenes\": [\"Person entering the park\", \"Walking along tree-lined path\", \"Sunset lighting through trees\", \"Person sitting on bench\"], \"duration\": \"Approximately 2 minutes\"}",
  "finishReason": "stop"
}
```

**Response when max tokens reached:**

```
{
  "message": "This video contains multiple scenes showing various activities. The first scene shows...",
  "finishReason": "length"
}
```

------
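When you request structured output, the `message` field is itself a JSON string conforming to the schema you supplied in `responseFormat`, so a client typically parses it before use. The following sketch parses a truncated version of the sample response above:

```python
import json

# Sample structured-output response (from the example above, truncated)
response_body = {
    "message": "{\"summary\": \"A peaceful evening walk through a park at sunset\", "
               "\"key_scenes\": [\"Person entering the park\", \"Walking along tree-lined path\"], "
               "\"duration\": \"Approximately 2 minutes\"}",
    "finishReason": "stop"
}

# The message field is a JSON string, not a JSON object; parse it into a dict
analysis = json.loads(response_body["message"])

print(analysis["summary"])
print(analysis["key_scenes"])
```

Validating the parsed object against your schema before downstream use is a good practice, because generation can stop early (`"finishReason": "length"`) and leave the string incomplete.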

# TwelveLabs Marengo Embed 2.7
<a name="model-parameters-marengo"></a>

The TwelveLabs Marengo Embed 2.7 model generates embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning tasks.
+ Provider — TwelveLabs
+ Model ID — twelvelabs.marengo-embed-2-7-v1:0

The TwelveLabs Marengo Embed 2.7 model supports the Amazon Bedrock Runtime operations in the following table. 
+ For more information about use cases for different API methods, see [Learn about use cases for different model inference methods](inference-methods.md).
+ For more information about model types, see [How inference works in Amazon Bedrock](inference-how.md).
  + For a list of model IDs and to see the models and AWS Regions that TwelveLabs Marengo Embed 2.7 is supported in, search for the model in the table at [Supported foundation models in Amazon Bedrock](models-supported.md).
  + For a full list of inference profile IDs, see [Supported Regions and models for inference profiles](inference-profiles-support.md). The inference profile ID is based on the AWS Region.



| API operation | Supported model types | Input modalities | Output modalities | 
| --- | --- | --- | --- | 
|  InvokeModel  | [Inference profiles](inference-profiles-support.md) |  Text Image  |  Embedding  | 
| StartAsyncInvoke | [Base models](models-supported.md) |  Video Audio Image Text  |  Embedding  | 

**Note**  
Use `InvokeModel` to generate embeddings for a search query. Use `StartAsyncInvoke` to generate embeddings for assets at large scale.

The following quotas apply to the input:



| Input modality | Maximum | 
| --- | --- | 
| Text | 77 tokens | 
| Image | 5 MB | 
| Video (S3) | 2 GB | 
| Audio (S3) | 2 GB | 

**Note**  
If you define audio or video inline by using base64-encoding, make sure that the request body payload doesn't exceed the Amazon Bedrock 25 MB model invocation quota.

**Topics**
+ [TwelveLabs Marengo Embed 2.7 request parameters](#model-parameters-marengo-async-request)
+ [TwelveLabs Marengo Embed 2.7 response](#model-parameters-marengo-response)
+ [TwelveLabs Marengo Embed 2.7 code examples](#model-parameters-marengo-examples)

## TwelveLabs Marengo Embed 2.7 request parameters
<a name="model-parameters-marengo-async-request"></a>

When you make a request, the field in which the model-specific input is specified depends on the API operation:
+ [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) – In the request `body`.
+ [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) – In the `modelInput` field of the request body.

The format of the model input depends on the input modality:

------
#### [ Text ]

```
{
    "inputType": "text",
    "inputText": "string",
    "textTruncate": "string"
}
```

------
#### [ Inline image ]

```
{
     "inputType": "image",
     "mediaSource": {
          "base64String": "base64-encoded string"
     }
}
```

------
#### [ S3 image ]

```
{
    "inputType": "image",
    "mediaSource": {
        "s3Location": {
            "uri": "string",
            "bucketOwner": "string"
        }
    }
}
```

------
#### [ Inline video ]

```
{
    "inputType": "video",
    "mediaSource": {
        "base64String": "base64-encoded string"
    },
    "startSec": double,
    "lengthSec": double,
    "useFixedLengthSec": double,
    "embeddingOption": ["string"]
}
```

------
#### [ S3 video ]

```
{
    "inputType": "video",
    "mediaSource": {
        "s3Location": {
           "uri": "string",
           "bucketOwner": "string"
        }
    },
    "startSec": double,
    "lengthSec": double,
    "useFixedLengthSec": double,
    "minClipSec": int,
    "embeddingOption": ["string"]
}
```

------
#### [ Inline audio ]

```
{
    "inputType": "audio", 
    "mediaSource": { 
        "base64String": "base64-encoded string"
    },
    "startSec": double,
    "lengthSec": double,
    "useFixedLengthSec": double
}
```

------
#### [ S3 audio ]

```
{
    "inputType": "audio",
    "mediaSource": {
        "s3Location": {
           "uri": "string",
           "bucketOwner": "string"
        }
    },
    "startSec": double,
    "lengthSec": double,
    "useFixedLengthSec": double
}
```

------

Expand the following sections for details about the input parameters:

### inputType
<a name="model-parameters-marengo-inputType"></a>

Modality for the embedding.
+ **Type:** String
+ **Required:** Yes
+ **Valid values:** `video` | `text` | `audio` | `image`

### inputText
<a name="model-parameters-marengo-inputText"></a>

Text to be embedded.
+ **Type:** String
+ **Required:** Yes (for compatible input types)
+ **Compatible input types:** Text

### textTruncate
<a name="model-parameters-marengo-textTruncate"></a>

Specifies how the platform truncates text.
+ **Type:** String
+ **Required:** No
+ **Valid values:**
  + `end` – Truncates the end of the text.
  + `none` – Returns an error if the text exceeds the limit.
+ **Default value:** end
+ **Compatible input types:** Text

### mediaSource
<a name="model-parameters-marengo-mediaSource"></a>

Contains information about the media source.
+ **Type:** Object
+ **Required:** Yes (if compatible type)
+ **Compatible input types:** Image, Video, Audio

The format of the `mediaSource` object in the request body depends on whether the media is defined as a Base64-encoded string or as an S3 location.
+ **Base64-encoded string**

  ```
  {
      "mediaSource": {
          "base64String": "base64-encoded string"
      }
  }
  ```
  + `base64String` – The Base64-encoded string for the media.
+ **S3 location** – Specify the S3 URI and the bucket owner.

  ```
  {
      "s3Location": {
          "uri": "string",
          "bucketOwner": "string"
      }
  }
  ```
  + `uri` – The S3 URI containing the media.
  + `bucketOwner` – The AWS account ID of the S3 bucket owner.
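As an illustration, the following hypothetical helper (not part of any AWS SDK) builds either form of the `mediaSource` object, checking the Base64-encoded form against the 25 MB model invocation quota mentioned earlier:

```python
import base64

# Amazon Bedrock model invocation payload quota (25 MB)
MAX_INLINE_PAYLOAD_BYTES = 25 * 1024 * 1024

def build_media_source(file_bytes=None, s3_uri=None, bucket_owner=None):
    """Hypothetical helper: returns a mediaSource object in either the
    Base64-encoded or S3-location form described above."""
    if file_bytes is not None:
        encoded = base64.b64encode(file_bytes).decode("utf-8")
        # Base64 inflates size by about 4/3, so check the encoded form,
        # not the raw file size, against the quota
        if len(encoded) > MAX_INLINE_PAYLOAD_BYTES:
            raise ValueError("Media too large for inline input; use an S3 location instead")
        return {"base64String": encoded}
    return {"s3Location": {"uri": s3_uri, "bucketOwner": bucket_owner}}

inline = build_media_source(file_bytes=b"\x89PNG...")  # small inline payload
stored = build_media_source(s3_uri="s3://amzn-s3-demo-bucket/my_image.png",
                            bucket_owner="123456789012")
```

Because of the Base64 inflation, media files larger than roughly 18 MB cannot be sent inline and should use the S3 form.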

### embeddingOption
<a name="model-parameters-marengo-embeddingOption"></a>

Specifies which types of embeddings to retrieve.
+ **Type:** List
+ **Required:** No
+ **Valid values for list members:**
  + `visual-text` – Visual embeddings optimized for text search.
  + `visual-image` – Visual embeddings optimized for image search.
  + `audio` – Embeddings of the audio in the video.
+ **Default value:** ["visual-text", "visual-image", "audio"]
+ **Compatible input types:** Video, Audio

### startSec
<a name="model-parameters-marengo-startSec"></a>

The time point in seconds of the clip where processing should begin.
+ **Type:** Double
+ **Required:** No
+ **Minimum value:** 0
+ **Default value:** 0
+ **Compatible input types:** Video, Audio

### lengthSec
<a name="model-parameters-marengo-lengthSec"></a>

The time in seconds, counting from the `startSec` time point, after which processing should stop.
+ **Type:** Double
+ **Required:** No
+ **Valid values:** 0 - Duration of media
+ **Default value:** Duration of media
+ **Compatible input types:** Video, Audio

Example:
+ startSec: 5
+ lengthSec: 20
+ Result: The clip is processed from 0:05 to 0:25 (5 seconds + 20 seconds).

### useFixedLengthSec
<a name="model-parameters-marengo-useFixedLengthSec"></a>

The duration of each clip for which the model should generate an embedding.
+ **Type:** Double
+ **Required:** No
+ **Valid values:** 2 - 10
+ **Default value:** Depends on the type of media:
  + **Video:** Divided dynamically by shot boundary detection.
  + **Audio:** Divided evenly with segments as close to 10 seconds as possible.

    Examples:
    + A 50-second clip is divided into 5 10-second segments.
    + A 16-second clip is divided into 2 8-second segments.
+ **Compatible input types:** Video, Audio
+ **Notes:** Must be greater than or equal to `minClipSec`.
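The default audio behavior can be sketched as follows. This is an assumption based on the documented examples (50 seconds into five 10-second segments, 16 seconds into two 8-second segments), not the model's actual implementation:

```python
import math

def default_audio_segments(duration_sec, target_sec=10.0):
    # Divide the clip as evenly as possible into segments as close to
    # target_sec as possible, mirroring the documented examples
    count = math.ceil(duration_sec / target_sec)
    length = duration_sec / count
    return [(i * length, (i + 1) * length) for i in range(count)]

print(default_audio_segments(50))  # five 10-second segments
print(default_audio_segments(16))  # two 8-second segments
```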

### minClipSec
<a name="model-parameters-marengo-minClipSec"></a>

Sets a minimum value for each clip in seconds.
+ **Type:** int
+ **Required:** No
+ **Valid values:** 1 - 5
+ **Default value:** 4
+ **Compatible input types:** Video
+ **Notes:** Must be less than or equal to `useFixedLengthSec`.

## TwelveLabs Marengo Embed 2.7 response
<a name="model-parameters-marengo-response"></a>

The location of the output embeddings and associated metadata depends on the invocation method:
+ [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) – In the response body.
+ [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) – In the S3 bucket defined in the `s3OutputDataConfig`, after the asynchronous invocation job completes.

If there are multiple embeddings vectors, the output is a list of objects, each containing a vector and its associated metadata.

The format of the output embeddings vector is as follows:

```
{
    "embedding": [double],
    "embeddingOption": "visual-text" | "visual-image" | "audio",
    "startSec": double,
    "endSec": double
}
```

Expand the following sections for details about the response parameters:

### embedding
<a name="model-parameters-marengo-embedding"></a>

Embeddings vector representation of input.
+ **Type:** List of doubles

### embeddingOption
<a name="model-parameters-marengo-embeddingOption-response"></a>

The type of embeddings.
+ **Type:** String
+ **Possible values:**
  + `visual-text` – Visual embeddings optimized for text search.
  + `visual-image` – Visual embeddings optimized for image search.
  + `audio` – Embeddings of the audio in the video.
+ **Compatible input types:** Video

### startSec
<a name="model-parameters-marengo-startSec-response"></a>

The start offset of the clip.
+ **Type:** Double
+ **Compatible input types:** Video, Audio

### endSec
<a name="model-parameters-marengo-endSec"></a>

The end offset of the clip, in seconds.
+ **Type:** Double
+ **Compatible input types:** Video, Audio
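Because the output embedding is a plain list of doubles, downstream similarity search reduces to a vector comparison. The following sketch ranks stored clip embeddings against a query embedding using cosine similarity (toy 3-dimensional vectors for illustration; real Marengo Embed 2.7 vectors are 1024-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Embeddings are plain lists of doubles, so similarity search
    # reduces to a vector comparison
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors for illustration only
query = [0.1, 0.3, 0.5]
clips = {
    "clip-a": [0.1, 0.3, 0.5],
    "clip-b": [0.9, -0.2, 0.0],
}
best = max(clips, key=lambda name: cosine_similarity(query, clips[name]))
print(best)  # clip-a
```

At production scale you would typically store the vectors in a vector database rather than compare them in a loop.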

## TwelveLabs Marengo Embed 2.7 code examples
<a name="model-parameters-marengo-examples"></a>

This section shows how to use the TwelveLabs Marengo Embed 2.7 model with different input types using Python. The examples demonstrate how to define model-specific input and run model invocations.

**Note**  
InvokeModel supports text and image input only. For video and audio input, use StartAsyncInvoke.

Put your code together in the following steps:

**1. Define model-specific input**  
Define the model-specific input depending on your input type:

------
#### [ Text ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"
                            
model_input = {
  "inputType": "text",
  "inputText": "man walking a dog"
}
```

------
#### [ Inline image ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
   "inputType": "image",
   "mediaSource": {
      "base64String": "example-base64-image"
   }
}
```

------
#### [ S3 image ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
     "inputType": "image",
     "mediaSource": {
          "s3Location": {
               "uri": "s3://amzn-s3-demo-bucket/my_image.png",
               "bucketOwner": "123456789012"
          }
     }
}
```

------
#### [ Inline video ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
    "inputType": "video",
    "mediaSource": {
        "base64String": "base_64_encoded_string_of_video"
    },
    "startSec": 0,
    "lengthSec": 13,
    "useFixedLengthSec": 5,
    "embeddingOption": [
        "visual-text", 
        "audio"
    ]
}
```

------
#### [ S3 video ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
    "inputType": "video",
    "mediaSource": {
        "s3Location": {
            "uri": "s3://amzn-s3-demo-bucket/my-video.mp4",
            "bucketOwner": "123456789012"
        }
    },
    "startSec": 0,
    "lengthSec": 13,
    "useFixedLengthSec": 5,
    "embeddingOption": [
        "visual-text", 
        "audio"
    ]
}
```

------
#### [ Inline audio ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
    "inputType": "audio", 
    "mediaSource": { 
        "base64String": "base_64_encoded_string_of_audio"
    },
    "startSec": 0,
    "lengthSec": 13,
    "useFixedLengthSec": 10
}
```

------
#### [ S3 audio ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-2-7-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-2-7-v1:0"

model_input = {
    "inputType": "audio",
    "mediaSource": {  
        "s3Location": { 
            "uri": "s3://amzn-s3-demo-bucket/my-audio.wav", 
            "bucketOwner": "123456789012" 
        }
    },
    "startSec": 0,
    "lengthSec": 13,
    "useFixedLengthSec": 10
}
```

------

**2. Run model invocation using the model input**  
Then, add the code snippet that corresponds to your model invocation method of choice.

------
#### [ InvokeModel ]

```
# Run model invocation with InvokeModel
import boto3
import json

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime')

# Make the request
response = client.invoke_model(
    modelId=inference_profile_id,
    body=json.dumps(model_input)
)

# Print the response body
response_body = json.loads(response['body'].read().decode('utf-8'))

print(response_body)
```

------
#### [ StartAsyncInvoke ]

```
# Run model invocation asynchronously
import boto3
import json

# Initialize the Bedrock Runtime client.
client = boto3.client("bedrock-runtime")

try:
    # Start the asynchronous job
    invocation = client.start_async_invoke(
        modelId=model_id,
        modelInput=model_input,
        outputDataConfig={
            "s3OutputDataConfig": {
                "s3Uri": "s3://amzn-s3-demo-bucket"
            }
        }
    )

    # Print the response JSON
    print("Response:")
    print(json.dumps(invocation, indent=2, default=str))

except Exception as e:
    # Implement error handling here.
    message = e.response["Error"]["Message"]
    print(f"Error: {message}")
```

------
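`StartAsyncInvoke` returns immediately with an `invocationArn`; the embeddings appear at the S3 output location only after the job completes. The following sketch polls the job with `GetAsyncInvoke`. The status values shown are assumptions based on that API; check the API reference for your SDK version:

```python
import time

def parse_s3_uri(uri):
    # Split an S3 URI into (bucket, key) to locate the job output
    bucket, _, key = uri.removeprefix("s3://").partition("/")
    return bucket, key

def wait_for_async_invoke(client, invocation_arn, poll_sec=10):
    """Poll GetAsyncInvoke until the job leaves the InProgress state.
    The status string values are assumptions; verify them against the
    Amazon Bedrock API reference."""
    while True:
        job = client.get_async_invoke(invocationArn=invocation_arn)
        if job["status"] != "InProgress":
            return job
        time.sleep(poll_sec)

# Usage sketch (requires AWS credentials and a real invocation ARN):
# client = boto3.client("bedrock-runtime")
# job = wait_for_async_invoke(client, invocation["invocationArn"])
# bucket, key = parse_s3_uri(job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"])
print(parse_s3_uri("s3://amzn-s3-demo-bucket/output/result.json"))
```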

# TwelveLabs Marengo Embed 3.0
<a name="model-parameters-marengo-3"></a>

The TwelveLabs Marengo Embed 3.0 model generates enhanced embeddings from video, text, audio, or image inputs. This latest version offers improved performance and accuracy for similarity search, clustering, and other machine learning tasks.
+ Provider — TwelveLabs
+ Model ID — twelvelabs.marengo-embed-3-0-v1:0

Marengo Embed 3.0 delivers several key enhancements:
+ **Extended video processing capacity** – Process up to 4 hours of video and audio content. Files can be up to 6 GB, up from the 2 GB limit in Marengo Embed 2.7. This makes it ideal for analyzing full sporting events, extended training videos, and complete film productions.
+ **Enhanced sports analysis** – The model provides improved understanding of gameplay dynamics, player movements, and event detection.
+ **Global multilingual support** – Expanded language capabilities from 12 to 36 languages. This enables global organizations to build unified search and retrieval systems that work seamlessly across diverse regions and markets.
+ **Multimodal search precision** – Combine images and descriptive text in a single embedding request. This merges visual similarity with semantic understanding to deliver more accurate and contextually relevant search results.
+ **Reduced embedding dimension** – Reduced from 1024 to 512, which can help reduce storage costs.

The TwelveLabs Marengo Embed 3.0 model supports the Amazon Bedrock Runtime operations in the following table. 
+ For more information about use cases for different API methods, see [Learn about use cases for different model inference methods](inference-methods.md).
+ For more information about model types, see [How inference works in Amazon Bedrock](inference-how.md).
  + For a list of model IDs and to see the models and AWS Regions that TwelveLabs Marengo Embed 3.0 is supported in, search for the model in the table at [Supported foundation models in Amazon Bedrock](models-supported.md).
  + For a full list of inference profile IDs, see [Supported Regions and models for inference profiles](inference-profiles-support.md). The inference profile ID is based on the AWS Region.



| API operation | Supported model types | Input modalities | Output modalities | 
| --- | --- | --- | --- | 
|  InvokeModel  |  US East (N. Virginia) – [Base models](models-supported.md) and [Inference profiles](inference-profiles-support.md) Europe (Ireland) – [Inference profiles](inference-profiles-support.md) Asia Pacific (Seoul) – [Base models](models-supported.md)  |  Text Image **Note:** Text and image interleaved is also supported.  |  Embedding  | 
| StartAsyncInvoke | [Base models](models-supported.md) |  Video Audio Image Text **Note:** Text and image interleaved is also supported.  |  Embedding  | 

**Note**  
Use `InvokeModel` to generate embeddings for a search query. Use `StartAsyncInvoke` to generate embeddings for assets at large scale.

The following quotas apply to the input:



| Input modality | Maximum | 
| --- | --- | 
| Text | 500 tokens | 
| Image | 5 MB per image | 
| Video (S3) | 6 GB, 4 hour length | 
| Audio (S3) | 6 GB, 4 hour length | 

**Note**  
If you define audio or video inline by using base64-encoding, make sure that the request body payload doesn't exceed the Amazon Bedrock 25 MB model invocation quota.

**Topics**
+ [TwelveLabs Marengo Embed 3.0 request parameters](#model-parameters-marengo-3-async-request)
+ [TwelveLabs Marengo Embed 3.0 response](#model-parameters-marengo-3-response)
+ [TwelveLabs Marengo Embed 3.0 code examples](#model-parameters-marengo-3-examples)

## TwelveLabs Marengo Embed 3.0 request parameters
<a name="model-parameters-marengo-3-async-request"></a>

When you make a request, the field in which the model-specific input is specified depends on the API operation:
+ [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) – In the request `body`.
+ [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) – In the `modelInput` field of the request body.

The format of the model input depends on the input modality:

------
#### [ Text ]

```
{
    "inputType": "text",
    "text": {
        "inputText": "string"
    }
}
```

------
#### [ Image ]

```
{
  "inputType": "image",
  "image": {
    "mediaSource": {
      "base64String": "base64-encoded string", // base64String OR s3Location, exactly one
      "s3Location": {
        "uri": "s3://amzn-s3-demo-bucket/folder/dog.jpg",
        "bucketOwner": "123456789012"
      }
    }
  }
}
```

------
#### [ Text & image ]

```
{
  "inputType": "text_image",
  "text_image": {
    "inputText": "man walking a dog",
    "mediaSource": {
      "base64String": "base64-encoded string", // base64String OR s3Location, exactly one
      "s3Location": {
        "uri": "s3://amzn-s3-demo-bucket/folder/dog.jpg",
        "bucketOwner": "123456789012"
      }
    }
  }
}
```

------
#### [ Audio ]

```
{
  "inputType": "audio",
  "audio": {
    "mediaSource": {
      "base64String": "base64-encoded string", // base64String OR s3Location, exactly one
      "s3Location": {
        "uri": "s3://amzn-s3-demo-bucket/audio/a.wav",
        "bucketOwner": "123456789012"
      }
    },
    "startSec": 0,
    "endSec": 6,
    "segmentation": {
      "method": "fixed", 
      "fixed": {
        "durationSec": 6
      }
    },
    "embeddingOption": [
      "audio",
      "transcription"
    ], // optional, default=both
    "embeddingScope": [
      "clip",
      "asset"
    ] // optional, one or both
  }
}
```

------
#### [ Video ]

```
{
  "inputType": "video",
  "video": {
    "mediaSource": {
      "base64String": "base64-encoded string", // base64String OR s3Location, exactly one
      "s3Location": {
        "uri": "s3://amzn-s3-demo-bucket/video/clip.mp4",
        "bucketOwner": "123456789012"
      }
    },
    "startSec": 0,
    "endSec": 6,
    "segmentation": {
      "method": "dynamic", // dynamic OR fixed, exactly one
      "dynamic": {
        "minDurationSec": 4
      },
      "fixed": {
        "durationSec": 6
      }
    },
    "embeddingOption": [
      "visual",
      "audio", 
      "transcription"
    ], // optional, default=all
    "embeddingScope": [
      "clip",
      "asset"
    ] // optional, one or both
  },
  "inferenceId": "some inference id"
}
```

------

Expand the following sections for details about the input parameters:

### inputType
<a name="model-parameters-marengo-3-inputType"></a>

Modality for the embedding.
+ **Type:** String
+ **Required:** Yes
+ **Valid values:** `text` | `image` | `text_image` | `audio` | `video`

### inputText
<a name="model-parameters-marengo-3-inputText"></a>

Text to be embedded.
+ **Type:** String
+ **Required:** Yes (for compatible input types)
+ **Compatible input types:** Text

### mediaSource
<a name="model-parameters-marengo-3-mediaSource"></a>

Contains information about the media source.
+ **Type:** Object
+ **Required:** Yes (if compatible type)
+ **Compatible input types:** Image, Video, Audio

The format of the `mediaSource` object in the request body depends on whether the media is defined as a Base64-encoded string or as an S3 location.
+ **Base64-encoded string**

  ```
  {
      "mediaSource": {
          "base64String": "base64-encoded string"
      }
  }
  ```
  + `base64String` – The Base64-encoded string for the media.
+ **S3 location** – Specify the S3 URI and the bucket owner.

  ```
  {
      "s3Location": {
          "uri": "string",
          "bucketOwner": "string"
      }
  }
  ```
  + `uri` – The S3 URI containing the media.
  + `bucketOwner` – The AWS account ID of the S3 bucket owner.

### embeddingOption
<a name="model-parameters-marengo-3-embeddingOption"></a>

Specifies which types of embeddings to retrieve.
+ **Type:** List
+ **Required:** No
+ **Valid values for list members:**
  + `visual` – Visual embeddings from the video.
  + `audio` – Embeddings of the audio in the video.
  + `transcription` – Embeddings of the transcribed text.
+ **Default value:**
  + Video: ["visual", "audio", "transcription"]
  + Audio: ["audio", "transcription"]
+ **Compatible input types:** Video, Audio

### embeddingScope
<a name="model-parameters-marengo-3-embeddingScope"></a>

Specifies the scope of the embeddings to retrieve.
+ **Type:** List
+ **Required:** No
+ **Valid values for list members:**
  + `clip` – Returns embeddings for each clip.
  + `asset` – Returns embeddings for the entire asset.
+ **Compatible input types:** Video, Audio

### startSec
<a name="model-parameters-marengo-3-startSec"></a>

The time point in seconds of the clip where processing should begin.
+ **Type:** Double
+ **Required:** No
+ **Minimum value:** 0
+ **Default value:** 0
+ **Compatible input types:** Video, Audio

### endSec
<a name="model-parameters-marengo-3-endSec"></a>

The time point in seconds where processing should end.
+ **Type:** Double
+ **Required:** No
+ **Minimum value:** startSec + segment length
+ **Maximum value:** Duration of media
+ **Default value:** Duration of media
+ **Compatible input types:** Video, Audio

### segmentation
<a name="model-parameters-marengo-3-segmentation"></a>

Defines how the media is divided into segments for embedding generation.
+ **Type:** Object
+ **Required:** No
+ **Compatible input types:** Video, Audio

The segmentation object contains a `method` field and method-specific parameters:
+ `method` – The segmentation method to use. Valid values: `dynamic` | `fixed`
+ `dynamic` – For video, uses shot boundary detection to divide content dynamically. Contains:
  + `minDurationSec` – Minimum duration for each segment in seconds. Type: Integer. Range: 1-5. Default: 4.
+ `fixed` – Divides content into segments of equal duration. Contains:
  + `durationSec` – Duration of each segment in seconds. Type: Integer. Range: 1-10. Default: 6.

**Default behavior:**
+ Video: Uses dynamic segmentation with shot boundary detection.
+ Audio: Uses fixed segmentation. Content is divided as evenly as possible with segments close to 10 seconds.
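As a sketch, fixed segmentation over the `[startSec, endSec]` window can be modeled as equal-duration clips, with a possibly shorter final clip. The trailing-clip behavior here is an assumption for illustration, not a documented guarantee:

```python
def fixed_segments(start_sec, end_sec, duration_sec):
    # Cut the [start_sec, end_sec] window into duration_sec-long clips;
    # the final clip is truncated at end_sec if the window does not
    # divide evenly (an assumption, not documented behavior)
    clips, t = [], start_sec
    while t < end_sec:
        clips.append((t, min(t + duration_sec, end_sec)))
        t += duration_sec
    return clips

print(fixed_segments(10, 20, 5))  # [(10, 15), (15, 20)]
```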

### inferenceId
<a name="model-parameters-marengo-3-inferenceId"></a>

Unique identifier for the inference request.
+ **Type:** String
+ **Required:** No

## TwelveLabs Marengo Embed 3.0 response
<a name="model-parameters-marengo-3-response"></a>

The location of the output embeddings and associated metadata depends on the invocation method:
+ InvokeModel – In the response body.
+ StartAsyncInvoke – In the S3 bucket defined in `s3OutputDataConfig`, after the asynchronous invocation job completes.

If there are multiple embeddings vectors, the output is a list of objects, each containing a vector and its associated metadata.

The format of the output embeddings vector is as follows:

```
{
  "data": {
    "embedding": [
    0.111, 0.234, ...
    ],
    "embeddingOption": "visual" | "audio" | "transcription",
    "embeddingScope": "clip" | "asset",
    "startSec": 0,
    "endSec": 4.2
  }
}
```

The embeddings are returned as an array of floats.

For `StartAsyncInvoke`, the response to the initial request returns an `invocationArn` that you can use to get metadata about the asynchronous invocation, including its status and the S3 location where the results are written.
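An asynchronous job can return several output objects in the format shown above, one per embedding. The following sketch filters a hypothetical output list (values truncated for brevity) down to the per-clip visual embeddings, for example to build a search index:

```python
# Hypothetical output from an asynchronous Marengo Embed 3.0 job,
# following the response format shown above (values truncated)
output = [
    {"data": {"embedding": [0.11, 0.23], "embeddingOption": "visual",
              "embeddingScope": "clip", "startSec": 0, "endSec": 4.2}},
    {"data": {"embedding": [0.05, 0.44], "embeddingOption": "audio",
              "embeddingScope": "clip", "startSec": 0, "endSec": 4.2}},
    {"data": {"embedding": [0.31, 0.19], "embeddingOption": "visual",
              "embeddingScope": "asset", "startSec": 0, "endSec": 8.4}},
]

# Keep only per-clip visual embeddings
visual_clips = [
    item["data"] for item in output
    if item["data"]["embeddingOption"] == "visual"
    and item["data"]["embeddingScope"] == "clip"
]
print(len(visual_clips))  # 1
```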

Expand the following sections for details about the response parameters:

### embedding
<a name="model-parameters-marengo-3-embedding"></a>

Embeddings vector representation of input.
+ **Type:** List of doubles

### embeddingOption
<a name="model-parameters-marengo-3-embeddingOption-response"></a>

The type of embeddings.
+ **Type:** String
+ **Possible values:**
  + visual – Visual embeddings from the video.
  + audio – Embeddings of the audio in the video.
  + transcription – Embeddings of the transcribed text.
+ **Compatible input types:** Video, Audio

### embeddingScope
<a name="model-parameters-marengo-3-embeddingScope-response"></a>

The scope of the embedding.
+ **Type:** String
+ **Possible values:**
  + `clip` – Embeddings for an individual clip.
  + `asset` – Embeddings for the entire asset.
+ **Compatible input types:** Video, Audio

### startSec
<a name="model-parameters-marengo-3-startSec-response"></a>

The start offset of the clip.
+ **Type:** Double
+ **Compatible input types:** Video, Audio

### endSec
<a name="model-parameters-marengo-3-endSec-response"></a>

The end offset of the clip. Not applicable for text, image, or text and image interleaved embeddings.
+ **Type:** Double
+ **Compatible input types:** Video, Audio

## TwelveLabs Marengo Embed 3.0 code examples
<a name="model-parameters-marengo-3-examples"></a>

This section shows how to use the TwelveLabs Marengo Embed 3.0 model with different input types using Python. The examples demonstrate how to define model-specific input and run model invocations.

**Note**  
InvokeModel supports text, image, and text with image interleaved input. For video and audio input, use StartAsyncInvoke.

Put your code together in the following steps:

**1. Define model-specific input**  
Define the model-specific input depending on your input type:

------
#### [ Text ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-3-0-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-3-0-v1:0"

model_input = {
    "inputType": "text",
    "text": {
        "inputText": "man walking a dog"
    }
}
```

------
#### [ Image ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-3-0-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-3-0-v1:0"

model_input = {
    "inputType": "image",
    "image": {
        "mediaSource": {
            "s3Location": {
                "uri": "s3://amzn-s3-demo-bucket/my_image.png",
                "bucketOwner": "123456789012"
            }
        }
    }
}
```

------
#### [ Text & image ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-3-0-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-3-0-v1:0"

model_input = {
    "inputType": "text_image",
    "text_image": {
        "inputText": "man walking a dog",
        "mediaSource": {
            "s3Location": {
                "uri": "s3://amzn-s3-demo-bucket/my_image.jpg",
                "bucketOwner": "123456789012"
            }
        }
    }
}
```

------
#### [ Audio ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-3-0-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-3-0-v1:0"
 
model_input = {
    "inputType": "audio",
    "audio": {
        "mediaSource": {  
            "s3Location": { 
                "uri": "s3://amzn-s3-demo-bucket/my-audio.wav", 
                "bucketOwner": "123456789012" 
            }
        },
        "startSec": 0,
        "endSec": 5,
        "segmentation": {
            "method": "fixed",
            "fixed": {
                "durationSec": 5
            }
        },
        "embeddingScope": ["clip", "asset"],
        "embeddingOption": ["audio"]
    }
}
```

------
#### [ Video ]

```
# Create the model-specific input
model_id = "twelvelabs.marengo-embed-3-0-v1:0"
# Replace the us prefix depending on your region
inference_profile_id = "us.twelvelabs.marengo-embed-3-0-v1:0"
 
model_input = {
    "inputType": "video",
    "video": {
        "mediaSource": {
            "s3Location": {
                "uri": "s3://amzn-s3-demo-bucket/my-video.mp4",
                "bucketOwner": "123456789012"
            }
        },
        "startSec": 10,
        "endSec": 20,
        "segmentation": {
            "method": "fixed",
            "fixed": {
                "durationSec": 5
            }
        },
        "embeddingOption": [
            "visual", 
            "audio"
        ],
        "embeddingScope": [
            "clip",
            "asset"
        ]
    }
}
```

------

**2. Run model invocation using the model input**  
Then, add the code snippet that corresponds to your model invocation method of choice.

------
#### [ InvokeModel ]

```
# Run model invocation with InvokeModel
import boto3
import json

# Initialize the Bedrock Runtime client
client = boto3.client('bedrock-runtime')

# Make the request
response = client.invoke_model(
    modelId=inference_profile_id,
    body=json.dumps(model_input)
)

# Print the response body
response_body = json.loads(response['body'].read().decode('utf-8'))

print(response_body)
```

------
#### [ StartAsyncInvoke ]

```
# Run model invocation asynchronously
import boto3
import json

# Initialize the Bedrock Runtime client.
client = boto3.client("bedrock-runtime")

try:
    # Start the asynchronous job
    invocation = client.start_async_invoke(
        modelId=model_id,
        modelInput=model_input,
        outputDataConfig={
            "s3OutputDataConfig": {
                "s3Uri": "s3://amzn-s3-demo-bucket"
            }
        }
    )

    # Print the response JSON
    print("Response:")
    print(json.dumps(invocation, indent=2, default=str))

except Exception as e:
    # Implement error handling here.
    print(f"Error: {e}")
```

------
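An asynchronous job started with `StartAsyncInvoke` runs in the background, so you typically poll its status with the `GetAsyncInvoke` operation until it reaches a terminal state. The following is a minimal sketch, assuming you already captured `invocationArn` from the `start_async_invoke` response (the helper names and poll interval are illustrative, not part of the API):

```python
# Sketch (hedged): poll an asynchronous invocation until it finishes.
import time

# Per the GetAsyncInvoke API, a job is InProgress, Completed, or Failed.
TERMINAL_STATUSES = {"Completed", "Failed"}

def is_terminal(status: str) -> bool:
    """Return True once the async job has finished, successfully or not."""
    return status in TERMINAL_STATUSES

def wait_for_async_invoke(client, invocation_arn: str, poll_seconds: int = 10):
    """Block until the job reaches a terminal state; return the final response."""
    while True:
        response = client.get_async_invoke(invocationArn=invocation_arn)
        if is_terminal(response["status"]):
            return response
        time.sleep(poll_seconds)

# Usage (requires AWS credentials and a real invocation ARN):
# client = boto3.client("bedrock-runtime")
# final = wait_for_async_invoke(client, invocation["invocationArn"])
# print(final["status"])
```

When the job completes, the embeddings are written to the S3 location you supplied in `outputDataConfig`.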

# Writer AI Palmyra models
<a name="model-parameters-writer-palmyra"></a>

This section describes the request parameters and response fields for Writer AI models. Use this information to make inference calls to Writer AI models with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) (streaming) operations. This section also includes Python code examples that show how to call Writer AI models. To use a model in an inference operation, you need the model ID for the model. To get the model ID, see [Supported foundation models in Amazon Bedrock](models-supported.md). Some models also work with the [Converse API](conversation-inference.md). To check if the Converse API supports a specific Writer AI model, see [Supported models and model features](conversation-inference-supported-models-features.md). For more code examples, see [Code examples for Amazon Bedrock using AWS SDKs](service_code_examples.md).

Foundation models in Amazon Bedrock support input and output modalities, which vary from model to model. To check the modalities that Writer AI models support, the Amazon Bedrock features they work with, and the AWS Regions they're available in, see [Supported foundation models in Amazon Bedrock](models-supported.md).

When you make inference calls with Writer AI models, you include a prompt for the model. For general information about creating prompts for the models that Amazon Bedrock supports, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For Writer AI specific prompt information, see the [Writer AI prompt engineering guide]().

**Writer Palmyra X4**

Top-ranked on Stanford HELM, Writer Palmyra X4 achieves superior performance on complex tasks and agentic workflows. It combines a 128k token context window with a suite of enterprise-grade capabilities, including advanced reasoning, tool-calling, LLM delegation, built-in RAG, code generation, structured outputs, multi-modality, and multi-lingual support. Using enterprise-specific tools that extend the model's ability to take action, Palmyra X4 enables developers to build apps and agents that update systems, perform transactions, send emails, trigger workflows, and more.

**Writer Palmyra X5**

With a one million token context window, Writer Palmyra X5 marks the end of context constraints for app and agent development. Writer's newest model achieves superior performance on long context inference through expanded memory and processing power, enabling developers to build more complex, multi-step agentic workflows faster. Like Palmyra X4, Palmyra X5 includes a suite of enterprise-ready capabilities, including advanced reasoning, tool-calling, LLM delegation, built-in RAG, code generation, structured outputs, multi-modality, and multi-lingual support.

**Topics**
+ [Writer Palmyra X4](model-parameters-palmyra-x4.md)
+ [Writer Palmyra X5](model-parameters-palmyra-x5.md)

# Writer Palmyra X4
<a name="model-parameters-palmyra-x4"></a>

Writer Palmyra X4 is a model with a context window of up to 128,000 tokens. This model excels in processing and understanding complex tasks, making it ideal for workflow automation, coding tasks, and data analysis.
+ Provider — Writer
+ Categories — Text generation, code generation, rich text formatting
+ Last version — v1
+ Release date — April 28th, 2025
+ Model ID — `writer.palmyra-x4-v1:0`
+ Modality — Text
+ Max tokens — Input: 122,880 tokens, Output: 8,192 tokens
+ Language — English, Spanish, French, German, Chinese and multiple other languages
+ Deployment type — Serverless

## Palmyra X4 invocation request body field
<a name="model-parameters-palmyra-x4-request-body"></a>

When you make an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) call using a Writer model, fill the `body` field with a JSON object that conforms to the one below. Enter the prompt in the `text` field of the `content` object in a message.

```
{
"modelId": "writer.palmyra-x4-v1:0",
"contentType": "application/json",
"accept": "application/json",
"body": "{\"messages\":[{\"role\":\"user\",\"content\":{\"text\":\"Explain quantum computing in simple terms\"}}]}"
}
```

The following table describes the request parameters, including their types, defaults, and valid ranges.

| Parameter | Type | Default | Range/Validation | Description | 
| --- | --- | --- | --- | --- | 
| messages | array | Required | 1 or more items | Chat history messages | 
| temperature | float | 1.0 | 0.0 ≤ x ≤ 2.0 | Sampling temperature | 
| top\_p | float | 1.0 | 0.0 < x ≤ 1.0 | Nucleus sampling threshold | 
| max\_tokens | int | 16 | 1 ≤ x ≤ 8192 | Maximum tokens to generate | 
| min\_tokens | int | 0 | 0 ≤ x ≤ max\_tokens | Minimum tokens before stopping | 
| stop | array | [] | ≤ 4 entries | Stop sequences | 
| seed | int | null | Any integer | Random seed | 
| presence\_penalty | float | 0.0 | -2.0 ≤ x ≤ 2.0 | New token presence penalty | 
| frequency\_penalty | float | 0.0 | -2.0 ≤ x ≤ 2.0 | Token frequency penalty | 
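Out-of-range values are rejected by the service, so it can help to check them client-side before sending a request. The following is a minimal sketch (the helper name and error messages are assumptions, not part of any SDK) that enforces the ranges in the table above:

```python
# Sketch (hedged): client-side check of Palmyra numerical parameters
# against the documented ranges, before sending the request.
def validate_params(params: dict) -> dict:
    """Raise ValueError if a known parameter falls outside its documented range."""
    # Parameters with a simple closed interval.
    closed = {
        "temperature": (0.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (lo, hi) in closed.items():
        if name in params and not (lo <= params[name] <= hi):
            raise ValueError(f"{name}={params[name]} outside [{lo}, {hi}]")
    # top_p excludes 0.0; max_tokens is an integer range.
    if "top_p" in params and not (0.0 < params["top_p"] <= 1.0):
        raise ValueError("top_p must be in (0.0, 1.0]")
    if "max_tokens" in params and not (1 <= params["max_tokens"] <= 8192):
        raise ValueError("max_tokens must be in [1, 8192]")
    return params

validate_params({"temperature": 0.7, "top_p": 0.9, "max_tokens": 512})  # passes
```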

## Palmyra X4 invocation response body field
<a name="model-parameters-palmyra-x4-response-body"></a>

The response JSON for Writer Palmyra X4 uses the following format:

```
{
  "id": "chatcmpl-a689a6e150b048ca8814890d3d904d41",
  "object": "chat.completion",
  "created": 1745854231,
  "model": "writer.palmyra-x4-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Quantum computing harnesses quantum mechanics to process information in extraordinarily powerful ways. Unlike classical bits, which are 0 or 1, quantum bits (qubits) can exist in multiple states simultaneously through superposition. Qubits also entangle, allowing them to be interconnected in such a way that the state of one (whether it's 0 or 1) can depend on the state of another, no matter the distance between them. This combination of superposition and entanglement enables quantum computers to solve complex problems much faster than classical computers, particularly in areas like cryptography, optimization, and simulations of molecular structures. However, quantum computing is still in its early stages, facing challenges in stability and scalability.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "total_tokens": 186,
    "completion_tokens": 143,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
```
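The generated text lives at `choices[0].message.content`, and token counts live under `usage`. As a minimal sketch (the helper name is an assumption), pulling both out of a response body with this shape looks like:

```python
# Sketch (hedged): extract the generated text and token usage from a
# response body shaped like the example above.
def extract_completion(response_body: dict):
    """Return (text, completion_tokens) from a Palmyra chat.completion body."""
    choice = response_body["choices"][0]
    text = choice["message"]["content"]
    used = response_body["usage"]["completion_tokens"]
    return text, used

# Minimal body with the same structure as the documented response.
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello."}}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}
text, used = extract_completion(sample)
print(text, used)
```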

## Writer Palmyra X4 example code
<a name="model-parameters-palmyra-x4-example-code"></a>

Example code for Writer Palmyra X4:

```
import boto3
import json
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "writer.palmyra-x4-v1:0"

# Format the request payload using the model's native structure.
native_request = {
    "temperature": 1,
    "messages": [
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms.",
        }
    ],
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)
except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["choices"][0]["message"]["content"]
print(response_text)
```

# Writer Palmyra X5
<a name="model-parameters-palmyra-x5"></a>

Writer Palmyra X5 includes a suite of enterprise-ready capabilities, including advanced reasoning, tool-calling, LLM delegation, built-in RAG, code generation, structured outputs, multi-modality, and multi-lingual support.

The Writer Palmyra X5 model has the following attributes:
+ Provider — Writer
+ Categories — Text generation, code generation, rich text formatting
+ Last version — v1
+ Release date — April 28th, 2025
+ Model ID — `writer.palmyra-x5-v1:0`
+ Modality — Text
+ Max tokens — Input: 1,040,000 tokens, Output: 8,192 tokens
+ Language — English, Spanish, French, German, Chinese and multiple other languages
+ Deployment type — Serverless

## Palmyra X5 invocation request body field
<a name="model-parameters-palmyra-x5-request-body"></a>

When you make an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) call using a Writer model, fill the `body` field with a JSON object that conforms to the one below. Enter the prompt in the `text` field of the `content` object in a message.

```
{
"modelId": "writer.palmyra-x5-v1:0",
"contentType": "application/json",
"accept": "application/json",
"body": "{\"messages\":[{\"role\":\"user\",\"content\":{\"text\":\"Explain quantum computing in simple terms\"}}]}"
}
```

The following table describes the request parameters, including their types, defaults, and valid ranges.

| Parameter | Type | Default | Range/Validation | Description | 
| --- | --- | --- | --- | --- | 
| messages | array | Required | 1 or more items | Chat history messages | 
| temperature | float | 1.0 | 0.0 ≤ x ≤ 2.0 | Sampling temperature | 
| top\_p | float | 1.0 | 0.0 < x ≤ 1.0 | Nucleus sampling threshold | 
| max\_tokens | int | 16 | 1 ≤ x ≤ 8192 | Maximum tokens to generate | 
| min\_tokens | int | 0 | 0 ≤ x ≤ max\_tokens | Minimum tokens before stopping | 
| stop | array | [] | ≤ 4 entries | Stop sequences | 
| seed | int | null | Any integer | Random seed | 
| presence\_penalty | float | 0.0 | -2.0 ≤ x ≤ 2.0 | New token presence penalty | 
| frequency\_penalty | float | 0.0 | -2.0 ≤ x ≤ 2.0 | Token frequency penalty | 

## Palmyra X5 invocation response body field
<a name="model-parameters-palmyra-x5-response-body"></a>

The response JSON for Writer Palmyra X5 uses the following format:

```
{
  "id": "chatcmpl-a689a6e150b048ca8814890d3d904d41",
  "object": "chat.completion",
  "created": 1745854231,
  "model": "writer.palmyra-x5-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Quantum computing harnesses quantum mechanics to process information in extraordinarily powerful ways. Unlike classical bits, which are 0 or 1, quantum bits (qubits) can exist in multiple states simultaneously through superposition. Qubits also entangle, allowing them to be interconnected in such a way that the state of one (whether it's 0 or 1) can depend on the state of another, no matter the distance between them. This combination of superposition and entanglement enables quantum computers to solve complex problems much faster than classical computers, particularly in areas like cryptography, optimization, and simulations of molecular structures. However, quantum computing is still in its early stages, facing challenges in stability and scalability.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "total_tokens": 186,
    "completion_tokens": 143,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
```