

# Submit prompts and generate responses with model inference
<a name="inference"></a>

Inference refers to the process of generating an output from an input provided to a model.

Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to [Supported foundation models in Amazon Bedrock](models-supported.md).



| Output modality | Description | Example use cases | 
| --- | --- | --- | 
| Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting | 
| Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation | 
| Video | Provide text or reference images and generate a video | Video generation, image conversion to video | 
| Embeddings | Provide text, images, or both text and images and generate a vector of numeric values that represent the input. The output vector can be compared to other embeddings vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, [knowledge base creation](knowledge-base.md) | 

**Topics**
+ [Learn about use cases for different model inference methods](inference-methods.md)
+ [How inference works in Amazon Bedrock](inference-how.md)
+ [Influence response generation with inference parameters](inference-parameters.md)
+ [Supported Regions and models for running model inference](inference-supported.md)
+ [Prerequisites for running model inference](inference-prereq.md)
+ [Generate responses in the console using playgrounds](playgrounds.md)
+ [Enhance model responses with model reasoning](inference-reasoning.md)
+ [Optimize model inference for latency](latency-optimized-inference.md)
+ [Generate responses using OpenAI APIs](bedrock-mantle.md)
+ [Submit prompts and generate responses using the API](inference-api.md)
+ [Get validated JSON results from models](structured-output.md)
+ [Use a computer use tool to complete an Amazon Bedrock model response](computer-use.md)

# Learn about use cases for different model inference methods
<a name="inference-methods"></a>

You can directly run model inference in the following ways:



| Method | Use case | 
| --- | --- | 
| [Amazon Bedrock console playgrounds](playgrounds.md) | Run inference in a user-friendly graphical interface. Convenient for exploration. | 
| [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) | Implement conversational applications with a unified API for model input. | 
| [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) | Submit a single prompt and generate a response synchronously. Useful for generating responses in real time or for search queries. | 
| [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) | Submit a single prompt and generate a response asynchronously. Useful for generating responses at a large scale. | 
| [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) | Prepare a dataset of prompts and generate responses in batches. | 
| [Responses API](https://platform.openai.com/docs/api-reference/responses) | Use the Responses API for modern, agentic applications requiring built-in tool use (search, code interpreter), multimodal inputs, and stateful conversations. | 
| [Chat Completions](https://platform.openai.com/docs/api-reference/chat) | Use the Chat Completions API for lightweight, stateless, text-focused tasks where you need full control over chat history management and lower latency. | 

The following Amazon Bedrock features also use model inference as a step in a larger workflow:
+ [Model evaluation](evaluation.md) uses the model invocation process to evaluate the performance of different models after you submit a [CreateEvaluationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateEvaluationJob.html) request.
+ [Knowledge bases](knowledge-base.md) use model invocation when using the [RetrieveAndGenerate](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html) API to generate a response based on results retrieved from a knowledge base.
+ [Agents](agents.md) use model invocation to generate responses in various stages during an [InvokeAgent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_InvokeAgent.html) request.
+ [Flows](flows.md) include Amazon Bedrock resources, such as prompts, knowledge bases, and agents, which use model invocation.

After testing out different foundation models with different prompts and inference parameters, you can configure your application to call these APIs with your desired specifications.
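As a minimal sketch of such an application call, the following builds a single-turn Converse request for the AWS SDK for Python (Boto3). The model ID, Region, and prompt text are placeholder values, and the live call is commented out because it requires AWS credentials and model access:

```python
# A minimal sketch of a single-turn Converse request with the AWS SDK for
# Python (Boto3). The model ID, Region, and prompt are placeholder values.
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    "messages": [
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of a unified inference API."}],
        }
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5},
}

# Uncomment to send the request (requires AWS credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

Because Converse uses the same request shape for every supported model, switching models is typically a matter of changing the `modelId` value.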

# How inference works in Amazon Bedrock
<a name="inference-how"></a>

When you submit an input to a model, the model predicts a probable sequence of tokens that follows and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide the following inputs:
+ **Prompt** – An input provided to the model in order for it to generate a response. For information about writing prompts, see [Prompt engineering concepts](prompt-engineering-guidelines.md). For information about protecting against prompt injection attacks, see [Prompt injection security](prompt-injection.md).
+ **Model** – You make requests to a model to run inference on a prompt. The model that you choose also specifies a level of throughput, which defines the number and rate of input and output tokens that you can process. You can make requests to the following types of models:
  + **Base model** – A foundation model to run inference with. Requests are sent to a single AWS Region. For model IDs, see [Supported foundation models in Amazon Bedrock](models-supported.md). For more information about the foundation models that are available in Amazon Bedrock, see [Amazon Bedrock foundation model information](foundation-models-reference.md). 
  + **Inference profile** – A foundation model to run inference with. Requests can be routed to the model in multiple AWS Regions. For inference profile IDs, see [Supported Regions and models for inference profiles](inference-profiles-support.md).
**Note**  
Models differ in their base model and inference profile availability by Region and by API method. For more information, see [Supported foundation models in Amazon Bedrock](models-supported.md) and individual model pages in the [Foundation model reference](foundation-models-reference.md).
  + **Provisioned Throughput** – A foundation model for which you've purchased dedicated throughput. For more information, see [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md).
  + **Custom model** – A foundation model whose weights have been modified through model customization. For more information, see [Customize your model to improve its performance for your use case](custom-models.md).
+ **Inference parameters** – A set of values that can be adjusted to limit or influence the model response. For information about inference parameters, see [Influence response generation with inference parameters](inference-parameters.md) and [Inference request parameters and response fields for foundation models](model-parameters.md).
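Unlike Converse, the InvokeModel operation takes a model-specific request body. As a sketch only, the following builds a body in the Anthropic Claude Messages format; the field names follow that provider's schema, the model ID is a placeholder, and the live call is commented out:

```python
import json

# Sketch of an InvokeModel request body in the Anthropic Claude Messages
# format. Other providers use different body schemas; see the inference
# request parameter reference for the model you invoke.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "temperature": 0.5,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Write a haiku about rivers."}]}
    ],
})

# Uncomment to invoke (requires AWS credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
#     body=body,
# )
# print(json.loads(response["body"].read())["content"][0]["text"])
```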

## Invoking models in different AWS Regions
<a name="inference-how-regions"></a>

When you invoke a model, you choose the AWS Region in which to invoke it. The quotas for the frequency and size of the requests that you can make depend on the Region. You can find these quotas by searching for the following quotas at [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock):
+ On-demand model inference requests per minute for *Model*
+ On-demand InvokeModel tokens per minute for *Model*

You can also invoke an inference profile instead of the foundation model itself. An inference profile defines a model and one or more Regions to which the inference profile can route model invocation requests. By invoking an inference profile that includes multiple Regions, you can increase your throughput. For more information, see [Increase throughput with cross-Region inference](cross-region-inference.md). To see the quotas for the frequency and size of the requests that you can make with an inference profile, search for the following quotas at [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock):
+ Cross-Region InvokeModel requests per minute for *Model*
+ Cross-Region InvokeModel tokens per minute for *Model*
+ Global Cross-Region InvokeModel requests per minute for *Model*
+ Global Cross-Region InvokeModel tokens per minute for *Model*

Requests made to a Region may be served out of local zones that share the same parent Region. For example, requests made to US East (N. Virginia) (us-east-1) may be served out of any local zone associated with it, such as Atlanta, US (us-east-1-atl-2a).

The same principle applies when using cross-Region inference. For example, requests made to the US Anthropic Claude 3 Haiku inference profile may be served out of any local zone whose parent Region is in the US, such as Seattle, US (us-west-2-sea-1a). When new local zones are added to AWS, they will also be added to the corresponding cross-Region inference endpoint.

To see a list of local endpoints and the parent Regions they're associated with, see [AWS Local Zones Locations](https://aws.amazon.com/about-aws/global-infrastructure/localzones/locations/).

When you invoke a cross-Region inference profile in Amazon Bedrock, your request originates from a source Region and is automatically routed to one of the destination Regions defined in that profile, optimizing for performance. The destination Regions for a Global cross-Region inference profile include all commercial Regions.

The Global cross-Region inference profile for a specific model can change over time as AWS adds more commercial Regions where your requests can be processed. However, if an inference profile is tied to a geography (such as US, EU, or APAC), its destination Region list never changes. AWS might create new inference profiles that incorporate new Regions. To use these inference profiles, update the IDs in your configuration to the new ones.

**Note**  
The destination Regions in a cross-Region inference profile can include **opt-in Regions**, which are Regions that you must explicitly enable at the AWS account or organization level. To learn more, see [Enable or disable AWS Regions in your account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-regions.html). When you use a cross-Region inference profile, your inference request can be routed to any of the destination Regions in the profile, even if you didn't opt in to those Regions in your account.

Service Control Policies (SCPs) and AWS Identity and Access Management (IAM) policies work together to control where cross-Region inference is allowed. With SCPs, you can control which Regions Amazon Bedrock can use for inference, and with IAM policies, you can define which users or roles have permission to run inference. If any destination Region in a cross-Region inference profile is blocked in your SCPs, the request will fail even if other Regions remain allowed. To operate reliably with cross-Region inference, update your SCPs and IAM policies to allow all required Amazon Bedrock inference actions (for example, `bedrock:InvokeModel*` or `bedrock:CreateModelInvocationJob`) in all destination Regions included in your chosen inference profile. To learn more, see [Enabling Amazon Bedrock cross-Region inference in multi-account environments](https://aws.amazon.com/blogs/machine-learning/enable-amazon-bedrock-cross-region-inference-in-multi-account-environments/).
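As a hypothetical illustration only, one common SCP pattern is a deny statement keyed on the request Region; the Region list below is an example and would need to cover every destination Region in your chosen inference profile:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBedrockInvocationOutsideAllowedRegions",
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["us-east-1", "us-east-2", "us-west-2"]
                }
            }
        }
    ]
}
```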

# Influence response generation with inference parameters
<a name="inference-parameters"></a>

When running model inference, you can adjust inference parameters to influence the model response. Inference parameters can change the pool of possible outputs that the model considers during generation, or they can limit the final response.

Inference parameter default values and ranges depend on the model. To learn about inference parameters for different models, see [Inference request parameters and response fields for foundation models](model-parameters.md).

The following categories of parameters are commonly found across different models:

**Topics**
+ [

## Randomness and diversity
](#inference-randomness)
+ [

## Length
](#inference-length)

## Randomness and diversity
<a name="inference-randomness"></a>

For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response.
+ **Temperature** – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs.
  + Choose a lower value to influence the model to select higher-probability outputs.
  + Choose a higher value to influence the model to select lower-probability outputs.

  In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses.
+ **Top K** – The number of most-likely candidates that the model considers for the next token.
  + Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  + Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  For example, if you choose a value of 50 for Top K, the model selects from 50 of the most probable tokens that could be next in the sequence.
+ **Top P** – The percentage of most-likely candidates that the model considers for the next token.
  + Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  + Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  In technical terms, the model computes the cumulative probability distribution for the set of responses and considers only the top P% of the distribution.

  For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence.
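The temperature effect described above can be sketched as a temperature-scaled softmax over raw token scores; the scores (logits) here are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to a probability distribution, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.5)   # steeper: the top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: probabilities move closer together

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Lowering the temperature concentrates probability mass on the highest-scoring token; raising it shifts mass toward the lower-scoring tokens.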

The following table summarizes the effects of these parameters.



| Parameter | Effect of lower value | Effect of higher value | 
| --- | --- | --- | 
| Temperature | Increases likelihood of higher-probability tokens; decreases likelihood of lower-probability tokens | Increases likelihood of lower-probability tokens; decreases likelihood of higher-probability tokens | 
| Top K | Remove lower-probability tokens | Allow lower-probability tokens | 
| Top P | Remove lower-probability tokens | Allow lower-probability tokens | 

As an example to understand these parameters, consider the example prompt **I hear the hoof beats of**. Let's say that the model determines the following three words to be candidates for the next token. The model also assigns a probability to each word.

```
{
    "horses": 0.7,
    "zebras": 0.2,
    "unicorns": 0.1
}
```
+ If you set a high **temperature**, the probability distribution is flattened and the probabilities become less different, which would increase the probability of choosing "unicorns" and decrease the probability of choosing "horses".
+ If you set **Top K** as 2, the model only considers the top 2 most likely candidates: "horses" and "zebras."
+ If you set **Top P** as 0.7, the model only considers "horses" because it is the only candidate that lies in the top 70% of the probability distribution. If you set **Top P** as 0.9, the model considers "horses" and "zebras" because they are in the top 90% of the probability distribution.

## Length
<a name="inference-length"></a>

Foundation models typically support parameters that limit the length of the response. Examples of these parameters are provided below.
+ **Response length** – An exact value to specify the minimum or maximum number of tokens to return in the generated response.
+ **Penalties** – Specify the degree to which to penalize certain attributes of a response. Examples include the following.
  + The length of the response.
  + Repeated tokens in a response.
  + Frequency of tokens in a response.
  + Types of tokens in a response.
+ **Stop sequences** – Specify sequences of characters that stop the model from generating further tokens. If the model generates a stop sequence that you specify, it will stop generating after that sequence.
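In the Converse API, for example, the common length controls map to fields of the `inferenceConfig` object. The field names below follow the Converse request schema; the values, including the stop sequence, are illustrative only:

```python
# Illustrative length controls using the Converse API's inferenceConfig
# field names. The values here are examples, not recommendations.
inference_config = {
    "maxTokens": 200,                # maximum number of tokens in the response
    "stopSequences": ["</answer>"],  # stop generating when this sequence appears
    "temperature": 0.7,              # randomness control (see the previous section)
}

print(inference_config)
```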

# Supported Regions and models for running model inference
<a name="inference-supported"></a>

Model inference using foundation models is supported in all Regions and with all models supported by Amazon Bedrock. To see the Regions and models supported by Amazon Bedrock, refer to [Supported foundation models in Amazon Bedrock](models-supported.md).

You can also run model inference with Amazon Bedrock resources other than foundation models. Refer to the following pages to see Region and model availability for different resources:
+ [Supported Regions and models for inference profiles](inference-profiles-support.md)
+ [Supported Regions and models for Prompt management](prompt-management-supported.md)
**Note**  
[InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) only work on prompts from Prompt management whose configuration specifies an Anthropic Claude or Meta Llama model.
+ [Supported models and Regions for fine-tuning](custom-model-fine-tuning.md#custom-model-supported)
+ [Use Custom model import to import a customized open-source model into Amazon Bedrock](model-customization-import-model.md)
+ [Supported Regions and models for Amazon Bedrock Guardrails](guardrails-supported.md)

# Prerequisites for running model inference
<a name="inference-prereq"></a>

For a role to run model inference, you need to allow it to perform the model invocation API actions. If your role has the [AmazonBedrockFullAccess](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonBedrockFullAccess) AWS managed policy attached, you can skip this section. Otherwise, attach the following permissions to the role to allow it to use the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) actions with all supported resources in Amazon Bedrock:

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ModelInvocationPermissions",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
                "bedrock:GetInferenceProfile",
                "bedrock:ListInferenceProfiles",
                "bedrock:RenderPrompt",
                "bedrock:GetCustomModel",
                "bedrock:ListCustomModels",
                "bedrock:GetImportedModel",
                "bedrock:ListImportedModels",
                "bedrock:GetProvisionedModelThroughput",
                "bedrock:ListProvisionedModelThroughputs",
                "bedrock:GetGuardrail",
                "bedrock:ListGuardrails",
                "bedrock:ApplyGuardrail"
            ],
            "Resource": "*"
        }
    ]
}
```

------

To further restrict permissions, you can omit actions, or you can specify resources and condition keys by which to filter permissions. For more information about actions, resources, and condition keys, see the following topics in the *Service Authorization Reference*:
+ [Actions defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-actions-as-permissions) – Learn about actions, the resource types that you can scope them to in the `Resource` field, and the condition keys that you can filter permissions on in the `Condition` field.
+ [Resource types defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) – Learn about the resource types in Amazon Bedrock.
+ [Condition keys for Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-policy-keys) – Learn about the condition keys in Amazon Bedrock.

The following list summarizes whether you need an action, depending on your use case:
+ `bedrock:InvokeModel` – Required to carry out model invocation. Allows the role to call the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) API operations.
+ `bedrock:InvokeModelWithResponseStream` – Required to carry out model invocation and return streaming responses. Allows the role to call the [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) API operations.
+ The following actions allow a role to run inference with Amazon Bedrock resources other than foundation models:
  + `bedrock:GetInferenceProfile` – Required to run inference with an [inference profile](inference-profiles.md).
  + `bedrock:RenderPrompt` – Required to invoke a prompt from [Prompt management](prompt-management.md).
  + `bedrock:GetCustomModel` – Required to run inference with a [custom model](custom-models.md).
  + `bedrock:GetImportedModel` – Required to run inference with an [imported model](model-customization-import-model.md).
  + `bedrock:GetProvisionedModelThroughput` – Required to run inference with a [Provisioned Throughput](prov-throughput.md).
+ The following actions allow a role to see Amazon Bedrock resources other than foundation models in the Amazon Bedrock console and to select them:
  + `bedrock:ListInferenceProfiles` – Required to choose an [inference profile](inference-profiles.md) in the Amazon Bedrock console.
  + `bedrock:ListCustomModels` – Required to choose a [custom model](custom-models.md) in the Amazon Bedrock console.
  + `bedrock:ListImportedModels` – Required to choose an [imported model](model-customization-import-model.md) in the Amazon Bedrock console.
  + `bedrock:ListProvisionedModelThroughputs` – Required to choose a [Provisioned Throughput](prov-throughput.md) in the Amazon Bedrock console.
+ The following actions allow a role to access and apply guardrails from [Amazon Bedrock Guardrails](guardrails.md) during model invocation:
  + `bedrock:GetGuardrail` – Required to use a guardrail during model invocation.
  + `bedrock:ApplyGuardrail` – Required to apply a guardrail during model invocation.
  + `bedrock:ListGuardrails` – Required to choose a guardrail in the Amazon Bedrock console.
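For example, a policy scoped down to non-streaming invocation of a single base model might look like the following sketch; the Region and model ID in the ARN are placeholders:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeSingleModel",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
        }
    ]
}
```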

# Generate responses in the console using playgrounds
<a name="playgrounds"></a>

The Amazon Bedrock playgrounds are a tool in the AWS Management Console that provides a visual interface for experimenting with running inference on different models and with different configurations. You can use the playgrounds to test different models and values before you integrate them into your application.

Running a prompt in a playground is equivalent to making an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) request in the API.

Amazon Bedrock offers the following playgrounds for you to experiment with:
+ **Chat/text** – Submit text prompts and generate responses, or interact with speech. You can select one of the following modes:
  + **Chat** – Submit a text prompt or interact with speech. For text prompts, you can also include images or documents to supplement the prompt. Subsequent prompts that you submit will include your previous prompts as context, such that the sequence of prompts and responses resembles a conversation.
  + **Single prompt** – Submit a single text prompt and generate a response to it.
**Note**  
Speech-to-speech models such as Amazon Nova Sonic are only available in chat mode. Compare mode is not supported for speech-to-speech models.
+ **Image** – Submit a text prompt to generate an image. You can also submit an image prompt and specify whether to edit it or to generate variations of it.
+ **Multi-modal (preview)** – Submit text prompts and generate multi-modal content. It also supports chat and single prompt modes.

The following procedure describes how to submit a prompt in the playground, the options that you can adjust, and the actions that you can take after the model generates a response.

**To use a playground**

1. If you haven't already, request access to the models that you want to use. For more information, see [Access Amazon Bedrock foundation models](model-access.md).

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the navigation pane, under Test, choose **Playground**.

1. If you're in the **Chat/text** playground, select a **Mode**.

1. Choose **Select model** and select a provider, model, and throughput to use. For more information about increasing throughput, see [Increase throughput with cross-Region inference](cross-region-inference.md) and [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md).

1. Submit the following information to generate a response:
   + Prompt – One or more sentences of text that set up a scenario, question, or task for a model. For information about creating prompts, see [Prompt engineering concepts](prompt-engineering-guidelines.md).

     Some models (refer to [Supported models and model features](conversation-inference-supported-models-features.md)) allow you to include a file in the following ways:
     + Select the attachment icon and choose a file to upload.
     + Select the attachment icon and choose an Amazon S3 object to upload.
     + Drag a file onto the prompt.

     Include files to complement your prompt. You can refer to the file in the prompt text. For example, you could write **Summarize this document for me** or **Tell me what's in this image**. You can include the following types of files:
     + **Documents** – Add documents to complement the prompt. For a list of supported file types, see the `format` field in [DocumentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html).
**Warning**  
Document names are vulnerable to prompt injections, because the model might inadvertently interpret them as instructions. Therefore, we recommend that you specify a neutral name.
     + **Images** – Add images to complement the prompt, if the model supports multimodal image and text inputs. For a list of supported file types, see the `format` field in the [ImageBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ImageBlock.html).
     + **Videos** – Add videos to complement the prompt, if the model supports multimodal video and text inputs. For a list of supported file types, see the `format` field in the [VideoBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_VideoBlock.html).
**Note**  
Content restrictions vary by underlying API operation and model. For more information, see [API restrictions](inference-api-restrictions.md).
   + Configurations – Settings that you adjust to modify the model response. Configurations include the following:
     + Inference parameters – Values that affect or limit how the model generates the response. For more information, see [Influence response generation with inference parameters](inference-parameters.md). To see inference parameters for specific models, refer to [Inference request parameters and response fields for foundation models](model-parameters.md).
     + System prompts – Prompts that provide instructions or context to the model about the task that it should perform or the persona that it should adopt. For more information and a list of models that support system prompts, see [Carry out a conversation with the Converse API operations](conversation-inference.md).
     + Guardrails – Filters out harmful or unwanted content in prompts and model responses. For more information, see [Detect and filter harmful content by using Amazon Bedrock Guardrails](guardrails.md).

1. (Optional) If a model supports streaming, the default behavior is to stream the responses. You can turn off streaming by choosing the options icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/vertical-ellipsis.png)) and modifying the **Streaming preference** option.

1. (Optional) Some text generation models support comparative evaluation. You can compare responses from different models by doing the following:

   1. Turn on **Compare mode**.

   1. Choose **Select model** and select a provider, model, and throughput to use.

   1. Choose the configurations icon (![\[Three horizontal sliders with adjustable circular controls for settings or parameters.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/configurations.png)) to modify the configurations to use.

   1. To add more models to compare, choose the add icon on the right, select a model, and modify the configurations as necessary.

1. (Optional) If a model supports prompt caching, you can open the **Configurations** panel and turn on **Prompt caching** to enable caching of your input and model responses for reduced cost and latency. For more information, see [Prompt caching for faster model inference](prompt-caching.md).

1. To run the prompt, choose **Run**. Amazon Bedrock doesn't store any text, images, or documents that you provide. The data is only used to generate the response. 
**Note**  
If the response violates the content moderation policy, Amazon Bedrock doesn't display it. If you have turned on streaming, Amazon Bedrock clears the entire response if it generates content that violates the policy. For more details, navigate to the Amazon Bedrock console, select **Providers**, and read the text under the **Content limitations** section.

1. The model returns the response. If you're using the chat mode of the playground, you can submit a prompt to reply to the response and generate another response.

1. After generating a response, you have the following options:
   + To export the response as a JSON file, choose the options icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/vertical-ellipsis.png)) and select **Export as JSON**.
   + To view the API request that you made, choose the options icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/icons/vertical-ellipsis.png)) and select **View API request**.
   + In the chat mode of the playground, you can view metrics in the **Model metrics** section. The following model metrics are available:
     + **Latency** — The time it takes between when the request is received by Amazon Bedrock and when the response is returned (for non-streaming responses) or when the response stream is completed (for streaming responses).
     + **Input token count** — The number of tokens that are fed into the model as input during inference.
     + **Output token count** — The number of tokens generated in response to a prompt. Longer, more conversational, responses require more tokens.
     + **Cost** — The cost of processing the input and generating output tokens.

     To set metric criteria that you want the response to match, choose **Define metric criteria** and define conditions for the model to match. After you apply the criteria, the **Model metrics** section shows how many and which criteria were met by the response.

     If criteria are unmet, you can choose a different model, rewrite the prompt, or modify configurations and rerun the prompt.

# Enhance model responses with model reasoning
<a name="inference-reasoning"></a>

Some foundation models can perform model reasoning, in which they take a larger, complex task and break it down into smaller, simpler steps. This process is often referred to as chain-of-thought (CoT) reasoning. Chain-of-thought reasoning can often improve model accuracy by giving the model a chance to think before it responds. Model reasoning is most useful for tasks such as multi-step analysis, math problems, and complex reasoning tasks. 

For example, in tackling a mathematical word problem, the model can first identify the relevant variables, then construct equations based on the given information, and finally solve those equations to reach the solution. This strategy not only minimizes errors but also makes the reasoning process more transparent and easier to follow, thereby enhancing the quality of the foundation model's output.

Model reasoning is not necessary for all tasks and does come with additional overhead, including increased latency and output tokens. Simple tasks that don't need additional explanations are not good candidates for CoT reasoning.

Note that not all models allow you to configure the number of output tokens that are allocated for model reasoning.

Model reasoning is available for the following models.


| Foundation model | Model ID | Maximum token count | Reasoning configuration | 
| --- | --- | --- | --- | 
| Anthropic Claude Opus 4 | anthropic.claude-opus-4-20250514-v1:0 | Maximum of 32,768 tokens, which includes both output and reasoning tokens. | Reasoning can be enabled or disabled for this model using a configurable token budget. By default, reasoning is disabled. | 
| Anthropic Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 | Maximum of 65,536 tokens, which includes both output and reasoning tokens. | Reasoning can be enabled or disabled for this model using a configurable token budget. By default, reasoning is disabled. | 
| Anthropic Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | Maximum of 65,536 tokens, which includes both output and reasoning tokens. | Reasoning can be enabled or disabled for this model using a configurable token budget. By default, reasoning is disabled. | 
| DeepSeek DeepSeek-R1 | deepseek.r1-v1:0 | Maximum of 8,192 tokens, which includes both output and reasoning tokens. The number of thinking tokens can't be configured, and the maximum number of output tokens must not exceed 8,192. | Reasoning is always enabled for this model. The model doesn't support toggling the reasoning capability on and off. | 
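For the Anthropic Claude models above, reasoning is enabled by including a `thinking` configuration in the model-specific request body. The following sketch assembles such a body for an `InvokeModel` request; the field names follow Anthropic's published `thinking` parameter, but the budget and token values here are illustrative assumptions, not recommendations.

```python
import json

# Sketch: request body that enables extended thinking for a Claude model.
# The budget_tokens value is illustrative; max_tokens must exceed it because
# the output allowance includes both reasoning and response tokens.
def build_reasoning_request(prompt, budget_tokens=2000, max_tokens=4096):
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_reasoning_request("What is 27 * 453? Show your steps.")
print(json.dumps(body, indent=2))
```

Pass the serialized body to the `InvokeModel` operation as usual; omit the `thinking` field to leave reasoning disabled.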

# Optimize model inference for latency
<a name="latency-optimized-inference"></a>

**Note**  
The Latency Optimized Inference feature is in preview release for Amazon Bedrock and is subject to change.

Latency-optimized inference for foundation models in Amazon Bedrock delivers faster response times and improved responsiveness for AI applications. The optimized versions of [Amazon Nova Pro](https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html), [Anthropic's Claude 3.5 Haiku model](https://aws.amazon.com/bedrock/claude/), and [Meta's Llama 3.1 405B and 70B models](https://aws.amazon.com/bedrock/llama/) offer significantly reduced latency without compromising accuracy. 

Accessing the latency optimization capability requires no additional setup or model fine-tuning, so you can immediately enhance existing applications with faster response times. Set the `latency` parameter to `optimized` when calling the Amazon Bedrock runtime API. If you select `standard` as your invocation option, your request is served by standard inference. By default, all requests are routed through `standard`.

```
"performanceConfig" : {
    "latency" : "standard | optimized" 
}
```
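With the AWS SDK for Python (Boto3), this configuration maps to the `performanceConfig` argument of the Converse operation. The following sketch only assembles the request arguments; the model ID is illustrative, and the actual `client.converse(**kwargs)` call is left out because it requires valid credentials and a supported Region.

```python
# Sketch: Converse request arguments that opt in to latency-optimized
# inference. In a real application you would pass these to
# client.converse(**kwargs) on a "bedrock-runtime" Boto3 client.
kwargs = {
    "modelId": "anthropic.claude-3-5-haiku-20241022-v1:0",  # illustrative model ID
    "messages": [{"role": "user", "content": [{"text": "Hello"}]}],
    "performanceConfig": {"latency": "optimized"},
}
print(kwargs["performanceConfig"])
```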

Once you reach the usage quota for latency optimization for a model, Amazon Bedrock attempts to serve the request with standard latency. In such cases, the request is charged at standard latency rates. The latency configuration for a served request is visible in the API response and AWS CloudTrail logs. You can also view metrics for latency-optimized requests in Amazon CloudWatch logs under the model ID.

Latency-optimized inference is available for Meta's Llama 3.1 70B and 405B, as well as Anthropic's Claude 3.5 Haiku, in the US East (Ohio) and US West (Oregon) Regions via [cross-Region inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html).

Latency-optimized inference is available for Amazon Nova Pro in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions via [cross-Region inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html).

For more information about pricing, visit the [pricing page](https://aws.amazon.com/bedrock/pricing/).

**Note**  
Latency-optimized inference for Llama 3.1 405B currently supports requests with a total input and output token count of up to 11K. For requests with larger token counts, Amazon Bedrock falls back to standard mode.

The following table shows the inference profiles that support latency optimization:


| Provider | Model | Model ID | Cross-region inference profile support | 
| --- | --- | --- | --- | 
| Amazon | Nova Pro | amazon.nova-pro-v1:0 |  us-east-1 us-east-2  | 
| Anthropic | Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 |  us-east-2 us-west-2  | 
| Meta | Llama 3.1 405B Instruct | meta.llama3-1-405b-instruct-v1:0 |  us-east-2  | 
| Meta | Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 |  us-east-2 us-west-2  | 

For more information about inference profiles, see [Supported Regions and models for inference profiles](inference-profiles-support.md).

# Generate responses using OpenAI APIs
<a name="bedrock-mantle"></a>

Amazon Bedrock provides OpenAI compatible API endpoints for model inference, powered by Mantle, a distributed inference engine for large-scale machine learning model serving. These endpoints allow you to use familiar OpenAI SDKs and tools with Amazon Bedrock models, enabling you to migrate existing applications with minimal code changes—simply update your base URL and API key.

Key benefits include:
+ **Asynchronous inference** – Support for long-running inference workloads through the Responses API
+ **Stateful conversation management** – Automatically rebuild context without manually passing conversation history with each request
+ **Simplified tool use** – Streamlined integration for agentic workflows
+ **Flexible response modes** – Support for both streaming and non-streaming responses
+ **Easy migration** – Compatible with existing OpenAI SDK codebases

## Supported Regions and Endpoints
<a name="bedrock-mantle-supported"></a>

The OpenAI compatible API endpoints are available in the following AWS Regions:


| Region Name | Region | Endpoint | 
| --- | --- | --- | 
| US East (Ohio) | us-east-2 | bedrock-mantle.us-east-2.api.aws | 
| US East (N. Virginia) | us-east-1 | bedrock-mantle.us-east-1.api.aws | 
| US West (Oregon) | us-west-2 | bedrock-mantle.us-west-2.api.aws | 
| Asia Pacific (Jakarta) | ap-southeast-3 | bedrock-mantle.ap-southeast-3.api.aws | 
| Asia Pacific (Mumbai) | ap-south-1 | bedrock-mantle.ap-south-1.api.aws | 
| Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-mantle.ap-northeast-1.api.aws | 
| Europe (Frankfurt) | eu-central-1 | bedrock-mantle.eu-central-1.api.aws | 
| Europe (Ireland) | eu-west-1 | bedrock-mantle.eu-west-1.api.aws | 
| Europe (London) | eu-west-2 | bedrock-mantle.eu-west-2.api.aws | 
| Europe (Milan) | eu-south-1 | bedrock-mantle.eu-south-1.api.aws | 
| Europe (Stockholm) | eu-north-1 | bedrock-mantle.eu-north-1.api.aws | 
| South America (São Paulo) | sa-east-1 | bedrock-mantle.sa-east-1.api.aws | 

## Prerequisites
<a name="bedrock-mantle-prereq"></a>

Before using OpenAI APIs, ensure you have the following:
+ **Authentication** – You can authenticate using:
  + Amazon Bedrock API key (required for OpenAI SDK)
  + AWS credentials (supported for HTTP requests)
+ **OpenAI SDK** (optional) – Install the OpenAI Python SDK if using SDK-based requests.
+ **Environment variables** – Set the following environment variables:
  + `OPENAI_API_KEY` – Set to your Amazon Bedrock API key
  + `OPENAI_BASE_URL` – Set to the Amazon Bedrock endpoint for your Region (for example, `https://bedrock-mantle.us-east-1.api.aws/v1`)
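For example, in a POSIX shell you might set the two variables as follows. The key value is a placeholder, and the endpoint shown is the US East (N. Virginia) endpoint from the Regions table:

```shell
# Placeholder value; substitute your actual Amazon Bedrock API key.
export OPENAI_API_KEY="your-bedrock-api-key"
# Regional endpoint plus the /v1 path that OpenAI SDKs expect.
export OPENAI_BASE_URL="https://bedrock-mantle.us-east-1.api.aws/v1"
```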

## Models API
<a name="bedrock-mantle-models"></a>

The Models API allows you to discover available models in Amazon Bedrock powered by Mantle. Use this API to retrieve a list of models you can use with the Responses API and Chat Completions API. For complete API details, see the [OpenAI Models documentation](https://platform.openai.com/docs/api-reference/models).

### List available models
<a name="bedrock-mantle-models-list"></a>

To list available models, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# List all available models using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

from openai import OpenAI

client = OpenAI()

models = client.models.list()

for model in models.data:
    print(model.id)
```

------
#### [ HTTP request ]

Make a GET request to `/v1/models`:

```
# List all available models
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

curl -X GET $OPENAI_BASE_URL/models \
   -H "Authorization: Bearer $OPENAI_API_KEY"
```

------

## Responses API
<a name="bedrock-mantle-responses"></a>

The Responses API provides stateful conversation management with support for streaming, background processing, and multi-turn interactions. For complete API details, see the [OpenAI Responses documentation](https://platform.openai.com/docs/api-reference/responses).

### Basic request
<a name="bedrock-mantle-responses-create"></a>

To create a response, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Create a basic response using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "user", "content": "Hello! How can you help me today?"}
    ]
)

print(response)
```

------
#### [ HTTP request ]

Make a POST request to `/v1/responses`:

```
# Create a basic response
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

curl -X POST $OPENAI_BASE_URL/responses \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $OPENAI_API_KEY" \
   -d '{
    "model": "openai.gpt-oss-120b",
    "input": [
        {"role": "user", "content": "Hello! How can you help me today?"}
    ]
}'
```

------

### Stream responses
<a name="bedrock-mantle-responses-streaming"></a>

To receive response events incrementally, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Stream response events incrementally using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for event in stream:
    print(event)
```

------
#### [ HTTP request ]

Make a POST request to `/v1/responses` with `stream` set to `true`:

```
# Stream response events incrementally
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

curl -X POST $OPENAI_BASE_URL/responses \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $OPENAI_API_KEY" \
   -d '{
    "model": "openai.gpt-oss-120b",
    "input": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
}'
```

------

## Chat Completions API
<a name="bedrock-mantle-chat-completions"></a>

The Chat Completions API generates conversational responses. For complete API details, see the [OpenAI Chat Completions documentation](https://platform.openai.com/docs/api-reference/chat/create).

### Create a chat completion
<a name="bedrock-mantle-chat-completions-create"></a>

To create a chat completion, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

Configure the OpenAI client using environment variables:

```
# Create a chat completion using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="openai.gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(completion.choices[0].message)
```

------
#### [ HTTP request ]

Make a POST request to `/v1/chat/completions`:

```
# Create a chat completion
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

curl -X POST $OPENAI_BASE_URL/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $OPENAI_API_KEY" \
   -d '{
    "model": "openai.gpt-oss-120b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
}'
```

------

### Enable streaming
<a name="bedrock-mantle-chat-completions-streaming"></a>

To receive responses incrementally, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
# Stream chat completion responses incrementally using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="openai.gpt-oss-120b",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```

------
#### [ HTTP request ]

Make a POST request to `/v1/chat/completions` with `stream` set to `true`:

```
# Stream chat completion responses incrementally
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables

curl -X POST $OPENAI_BASE_URL/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $OPENAI_API_KEY" \
   -d '{
    "model": "openai.gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
}'
```

------

# Submit prompts and generate responses using the API
<a name="inference-api"></a>

Amazon Bedrock offers the following API operations for carrying out model inference:
+ [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) – Submit a prompt and generate a response. The request body is model-specific. To generate streaming responses, use [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html).
+ [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) – Submit a prompt and generate responses with a structure unified across all models. Model-specific request fields can be specified in the `additionalModelRequestFields` field. You can also include system prompts and previous conversation for context. To generate streaming responses, use [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html).
+ [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) – Submit a prompt and generate a response asynchronously that can be retrieved later. Used to generate videos.
+ [InvokeModelWithBidirectionalStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithBidirectionalStream.html) – Open a bidirectional stream to send input and receive output continuously in the same session. Used for speech-to-speech models.
+ OpenAI Chat completions API – Use the [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat/create) with models supported by Amazon Bedrock to generate a response.

**Note**  
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.
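To illustrate the unified Converse structure described above, the following sketch assembles the arguments for a `Converse` call, including a system prompt and a model-specific field passed through `additionalModelRequestFields`. The model ID and the `top_k` field are illustrative assumptions, and the actual `client.converse(**kwargs)` call is omitted.

```python
# Sketch: Converse request arguments using the unified schema. Pass these
# to client.converse(**kwargs) on a "bedrock-runtime" Boto3 client.
kwargs = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    "system": [{"text": "You are a concise assistant."}],
    "messages": [
        {"role": "user", "content": [{"text": "Summarize what inference means."}]}
    ],
    "inferenceConfig": {"maxTokens": 256, "temperature": 0.5},
    # Model-specific parameters go here instead of in the unified fields:
    "additionalModelRequestFields": {"top_k": 200},
}
print(sorted(kwargs))
```

Because the schema is the same for every model, switching models usually means changing only `modelId` and any `additionalModelRequestFields` entries.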

For model inference, you need to determine the following parameters:
+ Model ID – The ID or Amazon Resource Name (ARN) of the model or inference profile to use in the `modelId` field for inference. The following table describes how to find IDs for different types of resources:  
****    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/inference-api.html)
+ Request body – Contains the inference parameters for a model and other configurations. Each base model has its own inference parameters. The inference parameters for a custom or provisioned model depends on the base model from which it was created. For more information, see [Inference request parameters and response fields for foundation models](model-parameters.md).

Select a topic to learn how to use the model invocation APIs.

**Topics**
+ [

# Submit a single prompt with InvokeModel
](inference-invoke.md)
+ [

# Invoke a model with the OpenAI Chat Completions API
](inference-chat-completions.md)
+ [

# Carry out a conversation with the Converse API operations
](conversation-inference.md)
+ [

# API restrictions
](inference-api-restrictions.md)

# Submit a single prompt with InvokeModel
<a name="inference-invoke"></a>

You run inference on a single prompt by using the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) API operations and specifying a model. Amazon Bedrock models differ in whether they accept text, image, or video inputs and whether they can produce outputs of text, image, or embeddings. Some models can return the response in a stream. To check model support for input, output, and streaming support, do one of the following:
+ Check the value in the **Input modalities**, **Output modalities**, or **Streaming supported** columns for a model at [Supported foundation models in Amazon Bedrock](models-supported.md).
+ Send a [GetFoundationModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html) request with the model ID and check the values in the `inputModalities`, `outputModalities`, and `responseStreamingSupported` field.

Run model inference on a prompt by sending an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) request with an [Amazon Bedrock runtime endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt).

**Note**  
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

The following fields are required:


****  

| Field | Use case | 
| --- | --- | 
| modelId | To specify the model, inference profile, or prompt from Prompt management to use. To learn how to find this value, see [Submit prompts and generate responses using the API](inference-api.md). | 
| body | To specify the inference parameters for a model. To see inference parameters for different models, see [Inference request parameters and response fields for foundation models](model-parameters.md). If you specify a prompt from Prompt management in the modelId field, omit this field (if you include it, it will be ignored). | 

The following fields are optional:


****  

| Field | Use case | 
| --- | --- | 
| accept | To specify the media type for the request body. For more information, see Media Types on the [Swagger website](https://swagger.io/specification/). | 
| contentType | To specify the media type for the response body. For more information, see Media Types on the [Swagger website](https://swagger.io/specification/). | 
| performanceConfigLatency | To specify whether to optimize a model for latency. For more information, see [Optimize model inference for latency](latency-optimized-inference.md). | 
| guardrailIdentifier | To specify a guardrail to apply to the prompt and response. For more information, see [Test your guardrail](guardrails-test.md). | 
| guardrailVersion | To specify a guardrail to apply to the prompt and response. For more information, see [Test your guardrail](guardrails-test.md). | 
| trace | To specify whether to return the trace for the guardrail you specify. For more information, see [Test your guardrail](guardrails-test.md). | 
| serviceTier | To specify the service tier for a request. For more information, see [Service tiers for optimizing performance and cost](service-tiers-inference.md). | 

## Invoke model code examples
<a name="inference-example-invoke"></a>

This topic provides some basic examples for running inference using a single prompt with the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) API. For more examples with different models, visit the following resources:
+ Pick an example under the [Code examples for Amazon Bedrock Runtime using AWS SDKs](service_code_examples_bedrock-runtime.md) topic.
+ Visit the inference parameter reference for the desired model at [Inference request parameters and response fields for foundation models](model-parameters.md).

The following examples assume that you've set up programmatic access so that you automatically authenticate to the AWS CLI and the SDK for Python (Boto3) in a default AWS Region when you run these examples. For information on setting up programmatic access, see [Get started with the API](getting-started-api.md).

**Note**  
Review the following points before trying out the examples:  
You should test these examples in US East (N. Virginia) (us-east-1), which supports all the models used in the examples.
The `body` parameter can be large, so for some CLI examples, you'll be asked to create a JSON file and provide that file into the `--body` argument instead of specifying it in the command line.
For the image and video examples, you'll be asked to use your own image and video. The examples assume that your image file is named *image.png* and that your video file is named *video.mp4*.
You might have to convert images or videos into a base64-encoded string or upload them to an Amazon S3 location. In the examples, you'll have to replace the placeholders with the actual base64-encoded string or S3 location.
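A base64-encoded string can be produced with a few lines of Python. The following sketch encodes a local file for use in a request body; the demonstration writes a throwaway file rather than assuming *image.png* exists:

```python
import base64

# Sketch: base64-encode a local file (such as image.png) so it can be
# embedded in an InvokeModel request body.
def encode_file(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Demonstration with a throwaway file standing in for image.png:
with open("demo.bin", "wb") as f:
    f.write(b"\x89PNG\r\n")

encoded = encode_file("demo.bin")
print(encoded)  # paste a string like this where the examples expect base64 data
```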

Expand a section to try some basic code examples.

### Generate text with a text prompt
<a name="w2aac13c32c33c17c19c13b1"></a>

The following examples generate a text response to a text prompt using the Amazon Titan Text Premier model. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Run the following command in a terminal and find the generated response in a file called *invoke-model-output.txt*.

```
aws bedrock-runtime invoke-model \
    --model-id amazon.titan-text-premier-v1:0 \
    --body '{
        "inputText": "Describe the purpose of a 'hello world' program in one line.",
        "textGenerationConfig": {
            "maxTokenCount": 512,
            "temperature": 0.5
        }
    }' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
```

------
#### [ Python ]

Run the following Python code example to generate a text response:

```
# Use the native inference API to send a text message to Amazon Titan Text.

import boto3
import json

from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID, e.g., Titan Text Premier.
model_id = "amazon.titan-text-premier-v1:0"

# Define the prompt for the model.
prompt = "Describe the purpose of a 'hello world' program in one line."

# Format the request payload using the model's native structure.
native_request = {
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 512,
        "temperature": 0.5,
    },
}

# Convert the native request to JSON.
request = json.dumps(native_request)

try:
    # Invoke the model with the request.
    response = client.invoke_model(modelId=model_id, body=request)

except (ClientError, Exception) as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract and print the response text.
response_text = model_response["results"][0]["outputText"]
print(response_text)
```

------

### Generate text with a text prompt using service tier
<a name="w2aac13c32c33c17c19c13b3"></a>

The following examples generate a text response to a text prompt using the OpenAI GPT model with a service tier to prioritize the request. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Run the following command in a terminal and validate the service tier in the response.

```
aws bedrock-runtime invoke-model \
    --model-id openai.gpt-oss-120b-1:0 \
    --body '{
        "messages": [
            {
                "role": "user",
                "content": "Describe the purpose of a '\''hello world'\'' program in one line."
            }
        ],
        "max_tokens": 512,
        "temperature": 0.7
    }' \
    --content-type application/json \
    --accept application/json \
    --service-tier priority \
    --cli-binary-format raw-in-base64-out
```

------
#### [ Python ]

Run the following Python code example to generate a text response with service tier:

```
import boto3
import json

# Create a Bedrock Runtime client
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

# Define the model ID and request body
model_id = "openai.gpt-oss-120b-1:0"
body = json.dumps({
    "messages": [
        {
            "role": "user",
            "content": "Describe the purpose of a 'hello world' program in one line."
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7
})

# Make the request with service tier
response = bedrock_runtime.invoke_model(
    modelId=model_id,
    body=body,
    contentType="application/json",
    accept="application/json",
    serviceTier="priority"
)

# Parse and print the response
response_body = json.loads(response["body"].read())
print(response_body)
```

------

### Generate an image with a text prompt
<a name="w2aac13c32c33c17c19c13b5"></a>

The following code examples generate an image using a text prompt. The CLI example uses the Stable Diffusion XL 1.0 model, and the Python example uses the Amazon Titan Image Generator G1 V2 model. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Run the following command in a terminal and find the generated response in a file called *invoke-model-output.txt*. The bytes that represent the image can be found in the `base64` field in the response:

```
aws bedrock-runtime invoke-model \
    --model-id stability.stable-diffusion-xl-v1 \
    --body '{
        "text_prompts": [{"text": "A stylized picture of a cute old steampunk robot."}],
        "style_preset": "photographic",
        "seed": 0,
        "cfg_scale": 10,
        "steps": 30
    }' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
```

------
#### [ Python ]

Run the following Python code example to generate an image and find the resulting *titan\_1.png* image file in a folder called *output*.

```
# Use the native inference API to create an image with Amazon Titan Image Generator

import base64
import boto3
import json
import os
import random

# Create a Bedrock Runtime client in the AWS Region of your choice.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Set the model ID, e.g., Titan Image Generator G1.
model_id = "amazon.titan-image-generator-v2:0"

# Define the image generation prompt for the model.
prompt = "A stylized picture of a cute old steampunk robot."

# Generate a random seed.
seed = random.randint(0, 2147483647)

# Format the request payload using the model's native structure.
native_request = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": prompt},
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "quality": "standard",
        "cfgScale": 8.0,
        "height": 512,
        "width": 512,
        "seed": seed,
    },
}

# Convert the native request to JSON.
request = json.dumps(native_request)

# Invoke the model with the request.
response = client.invoke_model(modelId=model_id, body=request)

# Decode the response body.
model_response = json.loads(response["body"].read())

# Extract the image data.
base64_image_data = model_response["images"][0]

# Save the generated image to a local folder.
i, output_dir = 1, "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
while os.path.exists(os.path.join(output_dir, f"titan_{i}.png")):
    i += 1

image_data = base64.b64decode(base64_image_data)

image_path = os.path.join(output_dir, f"titan_{i}.png")
with open(image_path, "wb") as file:
    file.write(image_data)

print(f"The generated image has been saved to {image_path}")
```

------

### Generate embeddings from text
<a name="w2aac13c32c33c17c19c13b9"></a>

The following examples use the Amazon Titan Text Embeddings V2 model to generate binary embeddings for a text input. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Run the following command in a terminal and find the generated response in a file called *invoke-model-output.txt*. The resulting embeddings are in the `binary` field.

```
aws bedrock-runtime invoke-model \
    --model-id amazon.titan-embed-text-v2:0 \
    --body '{
        "inputText": "What are the different services that you offer?",
        "embeddingTypes": ["binary"]
    }' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
```

------
#### [ Python ]

Run the following Python code example to generate embeddings for the provided text:

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate an embedding with the Amazon Titan Text Embeddings V2 Model
"""

import json
import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embedding(model_id, body):
    """
    Generate an embedding with the vector representation of a text input using Amazon Titan Text Embeddings V2 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embedding created by the model and the number of input tokens.
    """

    logger.info("Generating an embedding with Amazon Titan Text Embeddings V2 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Embeddings V2 - Text example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.titan-embed-text-v2:0"
    input_text = "What are the different services that you offer?"


    # Create request body.
    body = json.dumps({
        "inputText": input_text,
        "embeddingTypes": ["binary"]
    })


    try:

        response = generate_embedding(model_id, body)

        print(f"Generated an embedding: {response['embeddingsByType']['binary']}") # returns binary embedding
        print(f"Input text: {input_text}")
        print(f"Input Token count:  {response['inputTextTokenCount']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))

    else:
        print(f"Finished generating an embedding with Amazon Titan Text Embeddings V2 model {model_id}.")


if __name__ == "__main__":
    main()
```
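Binary embeddings like the ones above are typically compared bit by bit downstream. The following sketch (illustrative only, not part of the Amazon Bedrock API) scores the similarity of two binary embedding vectors as the fraction of matching bits:

```python
def hamming_similarity(a, b):
    """Return the fraction of matching bits between two equal-length binary embeddings."""
    if len(a) != len(b):
        raise ValueError("Embeddings must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Toy 8-bit embeddings; real Titan binary embeddings are much longer.
print(hamming_similarity([1, 0, 1, 1, 0, 0, 1, 0],
                         [1, 1, 1, 1, 0, 0, 1, 0]))  # 0.875
```

A score of 1.0 means identical embeddings; values near 0.5 indicate unrelated inputs.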

------

### Generate embeddings from an image
<a name="w2aac13c32c33c17c19c13c11"></a>

The following examples use the Amazon Titan Multimodal Embeddings G1 model to generate embeddings for an image input. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Open a terminal and do the following:

1. Convert an image titled *image.png* in your current folder into a base64-encoded string and write it to a file titled *image.txt* by running the following command:

   ```
   base64 -i image.png -o image.txt
   ```

1. Create a JSON file called *image-input-embeddings-output.json* and paste the following JSON, replacing *${image-base64}* with the contents of the *image.txt* file (make sure there is no new line at the end of the string):

   ```
   {
       "inputImage": "${image-base64}",
       "embeddingConfig": {
           "outputEmbeddingLength": 256
       }
   }
   ```

1. Run the following command, specifying the *image-input-embeddings-output.json* file as the body.

   ```
   aws bedrock-runtime invoke-model \
       --model-id amazon.titan-embed-image-v1 \
       --body file://image-input-embeddings-output.json \
       --cli-binary-format raw-in-base64-out \
       invoke-model-output.txt
   ```

1. Find the resulting embeddings in the *invoke-model-output.txt* file.

------
#### [ Python ]

In the following Python script, replace */path/to/image* with the path to an actual image. Then run the script to generate embeddings:

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to generate embeddings from an image with the Amazon Titan Multimodal Embeddings G1 model (on demand).
"""

import base64
import json
import logging
import boto3

from botocore.exceptions import ClientError

class EmbedError(Exception):
    "Custom exception for errors returned by Amazon Titan Multimodal Embeddings G1"

    def __init__(self, message):
        self.message = message

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_embeddings(model_id, body):
    """
    Generate a vector of embeddings for an image input using Amazon Titan Multimodal Embeddings G1 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        response (JSON): The embeddings that the model generated, token information, and the
        reason the model stopped generating embeddings.
    """

    logger.info("Generating embeddings with Amazon Titan Multimodal Embeddings G1 model %s", model_id)

    bedrock = boto3.client(service_name='bedrock-runtime')

    accept = "application/json"
    content_type = "application/json"

    response = bedrock.invoke_model(
        body=body, modelId=model_id, accept=accept, contentType=content_type
    )

    response_body = json.loads(response.get('body').read())

    finish_reason = response_body.get("message")

    if finish_reason is not None:
        raise EmbedError(f"Embeddings generation error: {finish_reason}")

    return response_body


def main():
    """
    Entrypoint for Amazon Titan Multimodal Embeddings G1 example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    # Read image from file and encode it as base64 string.
    with open("/path/to/image", "rb") as image_file:
        input_image = base64.b64encode(image_file.read()).decode('utf8')

    model_id = 'amazon.titan-embed-image-v1'
    output_embedding_length = 256

    # Create request body.
    body = json.dumps({
        "inputImage": input_image,
        "embeddingConfig": {
            "outputEmbeddingLength": output_embedding_length
        }
    })


    try:

        response = generate_embeddings(model_id, body)

        print(f"Generated image embeddings of length {output_embedding_length}: {response['embedding']}")

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))
        
    except EmbedError as err:
        logger.error(err.message)
        print(err.message)

    else:
        print(f"Finished generating image embeddings with Amazon Titan Multimodal Embeddings G1 model {model_id}.")


if __name__ == "__main__":
    main()
```

------

### Generate a text response to an image with an accompanying text prompt
<a name="w2aac13c32c33c17c19c13c13"></a>

Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

The following example uses the Anthropic Claude 3 Haiku model to generate a response, given an image and a text prompt that asks the contents of the image. Open a terminal and do the following:

1. Convert an image titled *image.png* in your current folder into a base64-encoded string and write it to a file titled *image.txt* by running the following command:

   ```
   base64 -i image.png -o image.txt
   ```

1. Create a JSON file called *image-text-input.json* and paste the following JSON, replacing *${image-base64}* with the contents of the *image.txt* file (make sure there is no new line at the end of the string):

   ```
   {
       "anthropic_version": "bedrock-2023-05-31",
       "max_tokens": 1000,
       "messages": [
           {               
               "role": "user",
               "content": [
                   {
                       "type": "image",
                       "source": {
                           "type": "base64",
                           "media_type": "image/png", 
                           "data": "${image-base64}"
                       }
                   },
                   {
                       "type": "text",
                       "text": "What's in this image?"
                   }
               ]
           }
       ]
   }
   ```

1. Run the following command to generate a text output, based on the image and the accompanying text prompt, to a file called *invoke-model-output.txt*:

   ```
   aws bedrock-runtime invoke-model \
       --model-id anthropic.claude-3-haiku-20240307-v1:0 \
       --body file://image-text-input.json \
       --cli-binary-format raw-in-base64-out \
       invoke-model-output.txt
   ```

1. Find the output in the *invoke-model-output.txt* file in the current folder.

------
#### [ Python ]

In the following Python script, replace */path/to/image.png* with the actual path to the image before running the script:

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to run a multimodal prompt with Anthropic Claude (on demand) and InvokeModel.
"""

import json
import logging
import base64
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    """
    Invokes a model with a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send to the model.
        max_tokens (int) : The maximum number of tokens to generate.
    Returns:
        response_body (JSON): The response from the model.
    """



    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": messages
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Anthropic Claude multimodal prompt example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
        max_tokens = 1000
        input_text = "What's in this image?"
        input_image = "/path/to/image.png"  # Replace with actual path to image file
 
        # Read reference image from file and encode as base64 strings.
        image_ext = input_image.split(".")[-1]
        with open(input_image, "rb") as image_file:
            content_image = base64.b64encode(image_file.read()).decode('utf8')

        message = {
            "role": "user",
            "content": [
                {
                    "type": "image", 
                    "source": {
                        "type": "base64",
                        "media_type": f"image/{image_ext}", 
                        "data": content_image
                    }
                },
                {
                    "type": "text", 
                    "text": input_text
                }
            ]
        }

    
        messages = [message]

        response = run_multi_modal_prompt(
            bedrock_runtime, model_id, messages, max_tokens)
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " +
              format(message))


if __name__ == "__main__":
    main()
```

------

### Generate a text response to a video uploaded to Amazon S3 with an accompanying text prompt
<a name="w2aac13c32c33c17c19c13c15"></a>

The following examples show how to generate a response with the Amazon Nova Lite model, given a video you upload to an S3 bucket and an accompanying text prompt.

**Prerequisite:** Upload a video titled *video.mp4* to an Amazon S3 bucket in your account by following the steps at [Uploading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html#upload-objects-procedure) in the Amazon Simple Storage Service User Guide. Take note of the S3 URI of the video.

Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Open a terminal and run the following command, replacing *s3://amzn-s3-demo-bucket/video.mp4* with the actual S3 location of your video:

```
aws bedrock-runtime invoke-model \
    --model-id amazon.nova-lite-v1:0 \
    --body '{
        "messages": [          
            {               
                "role": "user",
                "content": [      
                    {                       
                        "video": {     
                            "format": "mp4",   
                            "source": {
                                "s3Location": {
                                    "uri": "s3://amzn-s3-demo-bucket/video.mp4"
                                }
                            }
                        }                                    
                    },
                    {
                        "text": "What happens in this video?"
                    }
                ]
            }                              
        ]                  
    }' \
    --cli-binary-format raw-in-base64-out \
    invoke-model-output.txt
```

Find the output in the *invoke-model-output.txt* file in the current folder.

------
#### [ Python ]

In the following Python script, replace *s3://amzn-s3-demo-bucket/video.mp4* with the actual S3 location of your video. Then run the script:

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to run a multimodal prompt with Nova Lite (on demand) and InvokeModel.
"""

import json
import logging
import base64
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    """
    Invokes a model with a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send to the model.
        max_tokens (int) : The maximum number of tokens to generate.
    Returns:
        response_body (JSON): The response from the model.
    """

    body = json.dumps(
        {
            "messages": messages,
            "inferenceConfig": {
                "maxTokens": max_tokens
            }
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Nova Lite video prompt example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = "amazon.nova-lite-v1:0"
        max_tokens = 1000
        input_video_s3_uri = "s3://amzn-s3-demo-bucket/video.mp4" # Replace with real S3 URI
        video_ext = input_video_s3_uri.split(".")[-1]
        input_text = "What happens in this video?"

        message = {
            "role": "user",
            "content": [
                {
                    "video": {
                        "format": video_ext,
                        "source": {
                            "s3Location": {
                                "uri": input_video_s3_uri
                            }
                        }
                    }
                },
                {
                    "text": input_text
                }
            ]
        }

    
        messages = [message]

        response = run_multi_modal_prompt(
            bedrock_runtime, model_id, messages, max_tokens)
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))


if __name__ == "__main__":
    main()
```

------

### Generate a text response to a video converted to a base64-encoded string with an accompanying text prompt
<a name="w2aac13c32c33c17c19c13c17"></a>

The following examples show how to generate a response with the Amazon Nova Lite model, given a video converted to a base64-encoded string and an accompanying text prompt. Choose the tab for your preferred method, and then follow the steps:

------
#### [ CLI ]

Do the following:

1. Convert a video titled *video.mp4* in your current folder into base64 by running the following command:

   ```
   base64 -i video.mp4 -o video.txt
   ```

1. Create a JSON file called *video-text-input.json* and paste the following JSON, replacing *${video-base64}* with the contents of the `video.txt` file (make sure there is no new line at the end):

   ```
   {
       "messages": [          
           {               
               "role": "user",
               "content": [      
                   {                       
                       "video": {     
                           "format": "mp4",   
                           "source": {
                                "bytes": "${video-base64}"
                           }
                       }                                    
                   },
                   {
                       "text": "What happens in this video?"
                   }
               ]
           }                              
       ]                  
   }
   ```

1. Run the following command to generate a text output based on the video and the accompanying text prompt to a file called *invoke-model-output.txt*:

   ```
   aws bedrock-runtime invoke-model \
       --model-id amazon.nova-lite-v1:0 \
       --body file://video-text-input.json \
       --cli-binary-format raw-in-base64-out \
       invoke-model-output.txt
   ```

1. Find the output in the *invoke-model-output.txt* file in the current folder.

------
#### [ Python ]

In the following Python script, replace */path/to/video.mp4* with the actual path to the video. Then run the script:

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to run a multimodal prompt with Nova Lite (on demand) and InvokeModel.
"""

import json
import logging
import base64
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def run_multi_modal_prompt(bedrock_runtime, model_id, messages, max_tokens):
    """
    Invokes a model with a multimodal prompt.
    Args:
        bedrock_runtime: The Amazon Bedrock boto3 client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send to the model.
        max_tokens (int) : The maximum number of tokens to generate.
    Returns:
        response_body (JSON): The response from the model.
    """

    body = json.dumps(
        {
            "messages": messages,
            "inferenceConfig": {
                "maxTokens": max_tokens
            }
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body


def main():
    """
    Entrypoint for Nova Lite video prompt example.
    """

    try:

        bedrock_runtime = boto3.client(service_name='bedrock-runtime')

        model_id = "amazon.nova-lite-v1:0"
        max_tokens = 1000
        input_video = "/path/to/video.mp4" # Replace with real path to video
        video_ext = input_video.split(".")[-1]
        input_text = "What happens in this video?"

        # Read reference video from file and encode as base64 string.
        with open(input_video, "rb") as video_file:
            content_video = base64.b64encode(video_file.read()).decode('utf8')

        message = {
            "role": "user",
            "content": [
                {
                    "video": {
                        "format": video_ext,
                        "source": {
                            "bytes": content_video
                        }
                    }
                },
                {
                    "text": input_text
                }
            ]
        }

    
        messages = [message]

        response = run_multi_modal_prompt(
            bedrock_runtime, model_id, messages, max_tokens)
        print(json.dumps(response, indent=4))

    except ClientError as err:
        message = err.response["Error"]["Message"]
        logger.error("A client error occurred: %s", message)
        print("A client error occurred: " + format(message))


if __name__ == "__main__":
    main()
```

------

## Invoke model with streaming code example
<a name="inference-examples-stream"></a>

**Note**  
The AWS CLI does not support streaming.

The following example shows how to use the [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) API to generate streaming text with Python using the prompt *write an essay for living on mars in 1000 words*.

```
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})
                   
response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2', 
    body=body
)
    
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
```
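Each streamed event wraps a JSON chunk, and for Anthropic Claude 2 the generated text arrives in the chunk's `completion` field (other models use different field names). The following sketch, assuming that chunk shape, uses a hypothetical helper to extract and join the text instead of printing raw JSON:

```python
import json

def extract_completion(event):
    """Return the text carried by one stream event, or '' if it has no chunk."""
    chunk = event.get("chunk")
    if not chunk:
        return ""
    payload = json.loads(chunk["bytes"].decode())
    return payload.get("completion", "")

# Mocked events in the shape that the response stream yields.
events = [
    {"chunk": {"bytes": json.dumps({"completion": "Living on Mars "}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": "would be hard."}).encode()}},
]
print("".join(extract_completion(e) for e in events))
# Living on Mars would be hard.
```

In the real streaming loop, you would call `extract_completion(event)` on each event instead of printing the decoded chunk.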

# Invoke a model with the OpenAI Chat Completions API
<a name="inference-chat-completions"></a>

You can run model inference using the [OpenAI Create chat completion API](https://platform.openai.com/docs/api-reference/chat/create) with Amazon Bedrock models.

You can call the Create chat completion API in the following ways:
+ Make an HTTP request with an Amazon Bedrock Runtime endpoint.
+ Use an OpenAI SDK request with an Amazon Bedrock Runtime endpoint.

Select a topic to learn more:

**Topics**
+ [

## Supported models and Regions for the OpenAI Chat Completions API
](#inference-chat-completions-supported)
+ [

## Prerequisites to use the Chat Completions API
](#inference-chat-completions-prereq)
+ [

## Create a chat completion
](#inference-chat-completions-create)
+ [

## Include a guardrail in a chat completion
](#inference-chat-completions-guardrails)

## Supported models and Regions for the OpenAI Chat Completions API
<a name="inference-chat-completions-supported"></a>

You can use the Create chat completion API with all OpenAI models supported in Amazon Bedrock and in the AWS Regions that support these models. For more information about supported models and regions, see [Supported foundation models in Amazon Bedrock](models-supported.md).

## Prerequisites to use the Chat Completions API
<a name="inference-chat-completions-prereq"></a>

To see prerequisites for using the Chat Completions API, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK ]
+ **Authentication** – The OpenAI SDK only supports authentication with an Amazon Bedrock API key. Generate an Amazon Bedrock API key to authenticate your request. To learn about Amazon Bedrock API keys and how to generate them, see the API keys section in the Build chapter.
+ **Endpoint** – Find the endpoint that corresponds to the AWS Region to use in [Amazon Bedrock Runtime endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). If you use an AWS SDK, you might only need to specify the Region code and not the whole endpoint when you set up the client.
+ **Install an OpenAI SDK** – For more information, see [Libraries](https://platform.openai.com/docs/libraries) in the OpenAI documentation.

------
#### [ HTTP request ]
+ **Authentication** – You can authenticate with either your AWS credentials or with an Amazon Bedrock API key.

  Set up your AWS credentials or generate an Amazon Bedrock API key to authenticate your request.
  + To learn about setting up your AWS credentials, see [Programmatic access with AWS security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds-programmatic-access.html).
  + To learn about Amazon Bedrock API keys and how to generate them, see the API keys section in the Build chapter.
+ **Endpoint** – Find the endpoint that corresponds to the AWS Region to use in [Amazon Bedrock Runtime endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). If you use an AWS SDK, you might only need to specify the Region code and not the whole endpoint when you set up the client.

------

## Create a chat completion
<a name="inference-chat-completions-create"></a>

Refer to the following resources in the OpenAI documentation for details about the Create chat completion API:
+ [Request body parameters](https://platform.openai.com/docs/api-reference/chat/create)
+ [Response body parameters](https://platform.openai.com/docs/api-reference/chat/object)

**Note**  
Amazon Bedrock currently doesn't support the other OpenAI Chat completion API operations.

To learn how to use the OpenAI Create chat completion API, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

To create a chat completion with the OpenAI SDK, do the following:

1. Import the OpenAI SDK and set up the client with the following fields:
   + `base_url` – Prefix the Amazon Bedrock Runtime endpoint to `/openai/v1`, as in the following format:

     ```
     https://${bedrock-runtime-endpoint}/openai/v1
     ```
   + `api_key` – Specify an Amazon Bedrock API key.
   + `default_headers` – If you need to include any headers, you can include them as key-value pairs in this object. You can alternatively specify headers in the `extra_headers` when making a specific API call.

1. Use the `chat.completions.create()` method with the client and minimally specify the `model` and `messages` in the request body.
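As a sketch, the base URL for a given Region code can be assembled like this (endpoint names follow the Amazon Bedrock Runtime endpoint table linked above):

```python
def openai_base_url(region):
    """Build the OpenAI-compatible base URL for an Amazon Bedrock Runtime Region."""
    return f"https://bedrock-runtime.{region}.amazonaws.com/openai/v1"

print(openai_base_url("us-west-2"))
# https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1
```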

The following example calls the Create chat completion API in `us-west-2`. Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

completion = client.chat.completions.create(
    model="openai.gpt-oss-20b-1:0",
    messages=[
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ]
)

print(completion.choices[0].message)
```

------
#### [ HTTP request ]

To create a chat completion with a direct HTTP request, do the following:

1. Specify the URL by prefixing the Amazon Bedrock Runtime endpoint to `/openai/v1/chat/completions`, as in the following format:

   ```
   https://${bedrock-runtime-endpoint}/openai/v1/chat/completions
   ```

1. Specify your AWS credentials or an Amazon Bedrock API key in the `Authorization` header.

1. In the request body, specify at least the `model` and the `messages`.

The following example uses curl to call the Create chat completion API in `us-west-2`. Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key:

```
curl -X POST https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK" \
   -d '{
    "model": "openai.gpt-oss-20b-1:0",
    "messages": [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ]
}'
```

------

## Include a guardrail in a chat completion
<a name="inference-chat-completions-guardrails"></a>

To include safeguards in model input and responses, apply a [guardrail](guardrails.md) when running model invocation by including the following [extra parameters](https://github.com/openai/openai-python#undocumented-request-params) as fields in the request body:
+ `extra_headers` – Maps to an object containing the following fields, which specify extra headers in the request:
  + `X-Amzn-Bedrock-GuardrailIdentifier` (required) – The ID of the guardrail.
  + `X-Amzn-Bedrock-GuardrailVersion` (required) – The version of the guardrail.
  + `X-Amzn-Bedrock-Trace` (optional) – Whether to enable the guardrail trace.
+ `extra_body` – Maps to an object. In that object, you can include the `amazon-bedrock-guardrailConfig` field, which maps to an object containing the following fields:
  + `tagSuffix` (optional) – Include this field for [input tagging](guardrails-tagging.md).

For more information about these parameters in Amazon Bedrock Guardrails, see [Test your guardrail](guardrails-test.md).

To see examples of using guardrails with OpenAI chat completions, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

```
import openai
from openai import OpenAIError

# Endpoint for Amazon Bedrock Runtime
bedrock_endpoint = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

# Model ID
model_id = "openai.gpt-oss-20b-1:0"

# Replace with actual values
bedrock_api_key = "$AWS_BEARER_TOKEN_BEDROCK"
guardrail_id = "GR12345"
guardrail_version = "DRAFT"

client = openai.OpenAI(
    api_key=bedrock_api_key,
    base_url=bedrock_endpoint,
)

try:
    response = client.chat.completions.create(
        model=model_id,
        # Specify guardrail information in the header
        extra_headers={
            "X-Amzn-Bedrock-GuardrailIdentifier": guardrail_id,
            "X-Amzn-Bedrock-GuardrailVersion": guardrail_version,
            "X-Amzn-Bedrock-Trace": "ENABLED",
        },
        # Additional guardrail information can be specified in the body
        extra_body={
            "amazon-bedrock-guardrailConfig": {
                "tagSuffix": "xyz"  # Used for input tagging
            }
        },
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "assistant", 
                "content": "Hello! How can I help you today?"
            },
            {
                "role": "user",
                "content": "What is the weather like today?"
            }
        ]
    )

    request_id = response._request_id
    print(f"Request ID: {request_id}")
    print(response)
    
except OpenAIError as e:
    print(f"An error occurred: {e}")
    if hasattr(e, 'response') and e.response is not None:
        request_id = e.response.headers.get("x-request-id")
        print(f"Request ID: {request_id}")
```

------
#### [ OpenAI SDK (Java) ]

```
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.core.JsonValue;
import com.openai.core.http.HttpResponseFor;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

import java.util.Map;

// Endpoint for Amazon Bedrock Runtime
String bedrockEndpoint = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1";

// Model ID
String modelId = "openai.gpt-oss-20b-1:0";

// Replace with actual values
String bedrockApiKey = "$AWS_BEARER_TOKEN_BEDROCK";
String guardrailId = "GR12345";
String guardrailVersion = "DRAFT";

OpenAIClient client = OpenAIOkHttpClient.builder()
        .apiKey(bedrockApiKey)
        .baseUrl(bedrockEndpoint)
        .build();

ChatCompletionCreateParams request = ChatCompletionCreateParams.builder()
        .addUserMessage("What is the temperature in Seattle?")
        .model(modelId)
        // Specify additional headers for the guardrail
        .putAdditionalHeader("X-Amzn-Bedrock-GuardrailIdentifier", guardrailId)
        .putAdditionalHeader("X-Amzn-Bedrock-GuardrailVersion", guardrailVersion)
        // Specify additional body parameters for the guardrail
        .putAdditionalBodyProperty(
                "amazon-bedrock-guardrailConfig",
                JsonValue.from(Map.of("tagSuffix", JsonValue.of("xyz"))) // Allows input tagging
        )
        .build();
        
HttpResponseFor<ChatCompletion> rawChatCompletionResponse =
        client.chat().completions().withRawResponse().create(request);

final ChatCompletion chatCompletion = rawChatCompletionResponse.parse();

System.out.println(chatCompletion);
```

------

# Carry out a conversation with the Converse API operations
<a name="conversation-inference"></a>

You can use the Amazon Bedrock Converse API to create conversational applications that send and receive messages to and from an Amazon Bedrock model. For example, you can create a chat bot that maintains a conversation over many turns and uses a persona or tone customization that is unique to your needs, such as a helpful technical support assistant.

To use the Converse API, you use the [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) (for streaming responses) operations to send messages to a model. It is possible to use the existing base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)) for conversation applications. However, we recommend using the Converse API because it provides a consistent API that works with all Amazon Bedrock models that support messages. This means you can write code once and use it with different models. If a model has unique inference parameters, the Converse API also lets you pass those parameters in a model-specific structure. 

You can use the Converse API to implement [tool use](tool-use.md) and [guardrails](guardrails-use-converse-api.md) in your applications. 

**Note**  
With Mistral AI and Meta models, the Converse API embeds your input in a model-specific prompt template that enables conversations. 
Restrictions apply to the following operations: `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream`. See [API restrictions](inference-api-restrictions.md) for details.

For code examples, see the following:
+ Python examples for this topic – [Converse API examples](conversation-inference-examples.md)
+ Various languages and models – [Code examples for Amazon Bedrock Runtime using AWS SDKs](service_code_examples_bedrock-runtime.md)
+ Java tutorial – [A Java developer's guide to Bedrock's new Converse API](https://community.aws/content/2hUiEkO83hpoGF5nm3FWrdfYvPt/amazon-bedrock-converse-api-java-developer-guide)
+ JavaScript tutorial – [A developer's guide to Bedrock's new Converse API](https://community.aws/content/2dtauBCeDa703x7fDS9Q30MJoBA/amazon-bedrock-converse-api-developer-guide)

**Topics**
+ [

# Supported models and model features
](conversation-inference-supported-models-features.md)
+ [

# Using the Converse API
](conversation-inference-call.md)
+ [

# Converse API examples
](conversation-inference-examples.md)

# Supported models and model features
<a name="conversation-inference-supported-models-features"></a>

The Converse API supports the following Amazon Bedrock models and model features. The Converse API doesn't support any embedding or image generation models.


| Model | Converse | ConverseStream | System prompts | Document chat | Vision | Tool use | Streaming tool use | Guardrails | Amazon S3 links for multimedia | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| AI21 Jamba-Instruct | Yes | Yes | Yes | No | No | No | No | No | No | 
| AI21 Labs Jurassic-2 (Text) | Limited. No chat support. | No | No | No | No | No | No | Yes | No | 
| AI21 Labs Jamba 1.5 Large | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | 
| AI21 Labs Jamba 1.5 Mini | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | 
| Amazon Nova Premier | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| Amazon Nova Pro | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| Amazon Nova Lite | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 
| Amazon Nova Micro | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No | 
| Amazon Titan models | Yes | Yes | No | Yes (except Titan Text Premier) | No | No | No | Yes | No | 
| Anthropic Claude 2.x and earlier models | Yes | Yes | Yes | Yes | No | No | No | Yes | No | 
| Anthropic Claude 3 models | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | 
| Anthropic Claude 3.5 Sonnet | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | 
| Anthropic Claude 3.5 Sonnet v2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | 
| Anthropic Claude 3.7 Sonnet | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | 
| Anthropic Claude 3.5 Haiku | Yes | Yes | Yes | Yes | No | Yes | Yes | No | No | 
| Anthropic Claude Sonnet 4 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Anthropic Claude Opus 4 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Anthropic Claude Sonnet 4.5 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Anthropic Claude Haiku 4.5 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Anthropic Claude Opus 4.1 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Claude Opus 4.5 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | 
| Cohere Command | Limited. No chat support. | Limited. No chat support. | No | Yes | No | No | No | Yes | No | 
| Cohere Command Light | Limited. No chat support. | Limited. No chat support. | No | No | No | No | No | Yes | No | 
| Cohere Command R and Command R+ | Yes | Yes | Yes | Yes | No | Yes | Yes | No | No | 
| DeepSeek-R1 | Yes | Yes | Yes | Yes | No | No | No | Yes | No | 
| Meta Llama 2 and Llama 3 | Yes | Yes | Yes | Yes | No | No | No | Yes | No | 
| Meta Llama 3.1 | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 
| Meta Llama 3.2 1b and Llama 3.2 3b | Yes | Yes | Yes | Yes | No | No | No | Yes | No | 
| Meta Llama 3.2 11b and Llama 3.2 90b | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | 
| Meta Llama 4 Maverick 17B and Llama 4.0 Scout 17B | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | No | 
| Mistral AI Instruct | Yes | Yes | No | Yes | No | No | No | Yes | No | 
| Mistral Large | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 
| Mistral Large 2 (24.07) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 
| Mistral Small | Yes | Yes | Yes | No | No | Yes | No | Yes | No | 
| Pixtral Large (25.02) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 
| Writer Palmyra X4 | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 
| Writer Palmyra X5 | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | 

For a table of the Regions that support each model, see [Model support by AWS Region in Amazon Bedrock](models-regions.md).

**Note**  
Cohere Command (Text) and AI21 Labs Jurassic-2 (Text) don't support chat with the Converse API. The models can only handle one user message at a time and can't maintain the history of a conversation. You get an error if you attempt to pass more than one message.

# Using the Converse API
<a name="conversation-inference-call"></a>

To use the Converse API, you call the `Converse` or `ConverseStream` operations to send messages to a model. To call `Converse`, you require permission for the `bedrock:InvokeModel` operation. To call `ConverseStream`, you require permission for the `bedrock:InvokeModelWithResponseStream` operation.

**Topics**
+ [

## Request
](#conversation-inference-call-request)
+ [

## Response
](#conversation-inference-call-response)

**Note**  
Restrictions apply to the following operations: InvokeModel, InvokeModelWithResponseStream, Converse, and ConverseStream. See [API restrictions](inference-api-restrictions.md) for details.
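For example, a minimal single-turn call can be sketched with the AWS SDK for Python (Boto3). The model ID and Region below are placeholders; substitute values you have access to:

```
# Minimal single-turn Converse sketch (model ID and Region are placeholders).

def build_request(user_text, model_id="amazon.nova-lite-v1:0"):
    """Build the keyword arguments for a single-turn Converse call."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user", "content": [{"text": user_text}]}
        ],
    }

def converse_once(user_text, region="us-west-2"):
    """Send one message and return the generated text.

    Requires AWS credentials with bedrock:InvokeModel permission.
    """
    import boto3  # AWS SDK for Python
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_request(user_text))
    return response["output"]["message"]["content"][0]["text"]
```

Calling `converse_once("Hello!")` returns the model's reply as a string, assuming your credentials and model access are configured.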

## Request
<a name="conversation-inference-call-request"></a>

When you make a [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) request with an [Amazon Bedrock runtime endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt), you can include the following fields:
+ **modelId** – A required parameter in the header that lets you specify the resource to use for inference.
+ The following fields let you customize the prompt:
  + **messages** – Use to specify the content and role of the prompts.
  + **system** – Use to specify system prompts, which define instructions or context for the model.
  + **inferenceConfig** – Use to specify inference parameters that are common to all models. Inference parameters influence the generation of the response.
  + **additionalModelRequestFields** – Use to specify inference parameters that are specific to the model that you run inference with.
  + **promptVariables** – (If you use a prompt from Prompt management) Use this field to define the variables in the prompt to fill in and the values with which to fill them.
+ The following fields let you customize how the response is returned:
  + **guardrailConfig** – Use this field to include a guardrail to apply to the entire prompt.
  + **toolConfig** – Use this field to include a tool to help a model generate responses.
  + **additionalModelResponseFieldPaths** – Use this field to specify fields to return as a JSON pointer object.
  + **serviceTier** – Use this field to specify the service tier for a particular request.
+ **requestMetadata** – Use this field to include metadata that can be filtered on when using invocation logs.

**Note**  
The following restrictions apply when you use a Prompt management prompt with `Converse` or `ConverseStream`:  
You can't include the `additionalModelRequestFields`, `inferenceConfig`, `system`, or `toolConfig` fields.
If you include the `messages` field, the messages are appended after the messages defined in the prompt.
If you include the `guardrailConfig` field, the guardrail is applied to the entire prompt. If you include `guardContent` blocks in the [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html) field, the guardrail will only be applied to those blocks.

Expand a section to learn more about a field in the `Converse` request body:

### messages
<a name="converse-messages"></a>

The `messages` field is an array of [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) objects, each of which defines a message between the user and the model. A `Message` object contains the following fields:
+ **role** – Defines whether the message is from the `user` (the prompt sent to the model) or `assistant` (the model response).
+ **content** – Defines the content in the prompt.
**Note**  
Amazon Bedrock doesn't store any text, images, or documents that you provide as content. The data is only used to generate the response.

You can maintain conversation context by including all the messages in the conversation in subsequent `Converse` requests and using the `role` field to specify whether the message is from the user or the model.
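As a sketch, the history can be accumulated in a plain list and passed back on each turn (the Boto3 call is shown only as a comment):

```
# Sketch: maintain conversation context by resending the full message history.

def append_turn(messages, role, text):
    """Append a user or assistant message and return the updated history."""
    messages.append({"role": role, "content": [{"text": text}]})
    return messages

history = []
append_turn(history, "user", "Create a list of 3 pop songs.")
append_turn(history, "assistant", "1. ...\n2. ...\n3. ...")
append_turn(history, "user", "Make sure they are all from the UK.")

# Pass the full history in the next request, for example:
# client.converse(modelId=model_id, messages=history)
```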

The `content` field maps to an array of [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html) objects. Within each [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html), you can specify one of the following fields (to see what models support what blocks, see [Supported models and model features](conversation-inference-supported-models-features.md)):

------
#### [ text ]

The `text` field maps to a string specifying the prompt. The `text` field is interpreted alongside other fields that are specified in the same [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html).

The following shows a [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only a text [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html):

```
{
    "role": "user",
    "content": [
        {
            "text": "string"
        }
    ]
}
```

------
#### [ image ]

The `image` field maps to an [ImageBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ImageBlock.html). Pass the raw bytes, encoded in base64, for an image in the `bytes` field. If you use an AWS SDK, you don't need to encode the bytes in base64.

If you exclude the `text` field, the model describes the image.

The following shows an example [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only an image [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html):

```
{
    "role": "user",
    "content": [
        {
            "image": {
                "format": "png",
                "source": {
                    "bytes": "image in bytes"
                }
            }
        }
    ]
}
```

You can also specify an Amazon S3 URI instead of passing the bytes directly in the request body. The following shows a sample `Message` object with a content array containing the source passed through an Amazon S3 URI.

```
{
    "role": "user",
    "content": [
        {
            "image": {
                "format": "png",
                "source": {
                    "s3Location": {
                        "uri": "s3://amzn-s3-demo-bucket/myImage",
                        "bucketOwner": "111122223333"
                    }
                }
            }
        }
    ]
}
```
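As a sketch, you can build an image `ContentBlock` from a local file in Python; with an AWS SDK you pass the raw bytes directly, without base64 encoding (the file path, format, and prompt here are illustrative):

```
# Sketch: build a user message containing an image ContentBlock from a local
# file. With an AWS SDK, pass raw bytes directly; base64 encoding isn't needed.

def image_message(path, prompt=None, fmt="png"):
    """Return a Message dict with an optional text block and an image block."""
    with open(path, "rb") as f:
        raw = f.read()
    content = []
    if prompt:
        content.append({"text": prompt})
    content.append({"image": {"format": fmt, "source": {"bytes": raw}}})
    return {"role": "user", "content": content}
```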

------
#### [ document ]

The `document` field maps to a [DocumentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html). If you include a `DocumentBlock`, check that your request conforms to the following restrictions:
+ In the `content` field of the [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object, you must also include a `text` field with a prompt related to the document.
+ Pass the raw bytes, encoded in base64, for the document in the `bytes` field. If you use an AWS SDK, you don't need to encode the document bytes in base64.
+ The `name` field can only contain the following characters:
  + Alphanumeric characters
  + Whitespace characters (no more than one in a row)
  + Hyphens
  + Parentheses
  + Square brackets
**Note**  
The `name` field is vulnerable to prompt injections, because the model might inadvertently interpret it as instructions. Therefore, we recommend that you specify a neutral name.

When using a document, you can enable the `citations` field, which provides document-specific citations in the API response. See the [DocumentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html) API for more details.

The following shows a sample [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only a document [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html) and a required accompanying text [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html).

```
{
    "role": "user",
    "content": [
        {
            "text": "string"
        },
        {
            "document": {
                "format": "pdf",
                "name": "MyDocument",
                "source": {
                    "bytes": "document in bytes"
                }
            }
        }
    ]
}
```

You can also specify an Amazon S3 URI instead of passing the bytes directly in the request body. The following shows a sample `Message` object with a content array containing the source passed through an Amazon S3 URI.

```
{
    "role": "user",
    "content": [
        {
            "text": "string"
        },
        {
            "document": {
                "format": "pdf",
                "name": "MyDocument",
                "source": {
                    "s3Location": {
                      "uri": "s3://amzn-s3-demo-bucket/myDocument",
                      "bucketOwner": "111122223333"
                    }
                }
            }
        }
    ]
}
```
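The restrictions above can be sketched in Python. Note that the text block with a prompt about the document is required, and the name should stay neutral (the path and names here are illustrative):

```
# Sketch: build a user message with the required text block and a document
# block. Keep the name neutral; the model may interpret it as instructions.

def document_message(path, prompt, name="MyDocument", fmt="pdf"):
    """Return a Message dict pairing a prompt with a document's raw bytes."""
    with open(path, "rb") as f:
        raw = f.read()
    return {
        "role": "user",
        "content": [
            {"text": prompt},
            {"document": {"format": fmt, "name": name, "source": {"bytes": raw}}},
        ],
    }
```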

------
#### [ video ]

The `video` field maps to a [VideoBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_VideoBlock.html) object. Pass the raw bytes in the `bytes` field, encoded in base64. If you use the AWS SDK, you don't need to encode the bytes in base64.

If you don't include the `text` field, the model will describe the video.

The following shows a sample [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only a video [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html).

```
{
    "role": "user",
    "content": [
        {
            "video": {
                "format": "mp4",
                "source": {
                    "bytes": "video in bytes"
                }
            }
        }
    ]
}
```

You can also specify an Amazon S3 URI instead of passing the bytes directly in the request body. The following shows a sample `Message` object with a content array containing the source passed through an Amazon S3 URI.

```
{
    "role": "user",
    "content": [
        {
            "video": {
                "format": "mp4",
                "source": {
                    "s3Location": {
                        "uri": "s3://amzn-s3-demo-bucket/myVideo",
                        "bucketOwner": "111122223333"
                    }
                }
            }
        }
    ]
}
```

**Note**  
The assumed role must have the `s3:GetObject` permission to the Amazon S3 URI. The `bucketOwner` field is optional but must be specified if the account making the request does not own the bucket the Amazon S3 URI is found in. For more information, see [Configure access to Amazon S3 buckets](s3-bucket-access.md).

------
#### [ cachePoint ]

You can use prompt caching by adding `cachePoint` blocks to a message alongside the content you want to cache. Prompt caching lets you cache the context of conversations to reduce cost and latency. For more information, see [Prompt caching for faster model inference](prompt-caching.md).

The following shows a sample [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing a document [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html) and a required accompanying text [ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html), as well as a **cachePoint** that adds both the document and text contents to the cache.

```
{
    "role": "user",
    "content": [
        {
            "text": "string"
        },
        {
            "document": {
                "format": "pdf",
                "name": "string",
                "source": {
                    "bytes": "document in bytes"
                }
            }
        },
        {
            "cachePoint": {
                "type": "default"
            }
        }
    ]
}
```

------
#### [ guardContent ]

The `guardContent` field maps to a [GuardrailConverseContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GuardrailConverseContentBlock.html) object. You can use this field to target an input to be evaluated by the guardrail defined in the `guardrailConfig` field. If you don't specify this field, the guardrail evaluates all messages in the request body. You can pass the following types of content in a `GuardBlock`:
+ **text** – The following shows an example [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only a text [GuardrailConverseContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GuardrailConverseContentBlock.html):

  ```
  {
      "role": "user",
      "content": [
          {
              "text": "Tell me what stocks to buy.",
              "qualifiers": [
                  "guard_content"
              ]
          }
      ]
  }
  ```

  You define the text to be evaluated and include any qualifiers to use for [contextual grounding](guardrails-contextual-grounding-check.md).
+ **image** – The following shows a [Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html) object with a `content` array containing only an image [GuardrailConverseContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GuardrailConverseContentBlock.html):

  ```
  {
      "role": "user",
      "content": [
          {
              "format": "png",
              "source": {
                  "bytes": "image in bytes"
              }
          }
      ]
  }
  ```

  You specify the format of the image and define the image in bytes.

For more information about using guardrails, see [Detect and filter harmful content by using Amazon Bedrock Guardrails](guardrails.md).

------
#### [ reasoningContent ]

The `reasoningContent` field maps to a [ReasoningContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ReasoningContentBlock.html). This block contains content regarding the reasoning that was carried out by the model to generate the response in the accompanying `ContentBlock`.

The following shows a `Message` object with a `content` array containing only a `ReasoningContentBlock` and an accompanying text `ContentBlock`.

```
{
    "role": "user",
    "content": [
        {
            "text": "string"
        },
        {
            "reasoningContent": {
                "reasoningText": {
                    "text": "string",
                    "signature": "string"
                }
                "redactedContent": "base64-encoded binary data object"
            }
        }
    ]
}
```

In the `ReasoningContentBlock`, the `reasoningText` field contains the reasoning that the model used to generate the accompanying content, and the `redactedContent` field contains any reasoning content that the model provider encrypted for trust and safety reasons.

Within the `reasoningText` field, the `text` field describes the reasoning. The `signature` field is a hash of all the messages in the conversation and is a safeguard against tampering with the reasoning used by the model. You must include the signature and all previous messages in subsequent `Converse` requests. If any of the messages are changed, the response throws an error.

------
#### [ toolUse ]

Contains information about a tool for the model to use. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md).

------
#### [ toolResult ]

Contains information about the result from the model using a tool. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md).

------

In the following `messages` example, the user asks for a list of three pop songs, and the model generates a list of songs. 

```
[
    {
        "role": "user",
        "content": [
            {
                "text": "Create a list of 3 pop songs."
            }
        ]
    },
    {
        "role": "assistant",
        "content": [
            {
                "text": "Here is a list of 3 pop songs by artists from the United Kingdom:\n\n1. \"As It Was\" by Harry Styles\n2. \"Easy On Me\" by Adele\n3. \"Unholy\" by Sam Smith and Kim Petras"
            }
        ]
    }
]
```

### system
<a name="converse-system"></a>

A system prompt is a type of prompt that provides instructions or context to the model about the task it should perform, or the persona it should adopt during the conversation. You can specify a list of system prompts for the request in the `system` ([SystemContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_SystemContentBlock.html)) field, as shown in the following example.

```
[
    {
        "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. "
    }
]
```

### inferenceConfig
<a name="converse-inference"></a>

The Converse API supports a base set of inference parameters that you set in the `inferenceConfig` field ([InferenceConfiguration](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html)). The base set of inference parameters are:
+ **maxTokens** – The maximum number of tokens to allow in the generated response. 
+ **stopSequences** – A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response. 
+ **temperature** – The likelihood of the model selecting higher-probability options while generating a response. 
+ **topP** – The percentage of most-likely candidates that the model considers for the next token.

For more information, see [Influence response generation with inference parameters](inference-parameters.md).

The following example JSON sets the `temperature` inference parameter. 

```
{"temperature": 0.5}
```

### additionalModelRequestFields
<a name="converse-additional-model-request-fields"></a>

If the model you are using has additional inference parameters, you can set those parameters by specifying them as JSON in the `additionalModelRequestFields` field. The following example JSON shows how to set `top_k`, which is available in Anthropic Claude models, but isn't a base inference parameter in the messages API. 

```
{"top_k": 200}
```
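Putting the pieces together, a request body might combine `system`, `inferenceConfig`, and `additionalModelRequestFields` as follows (the model ID is a placeholder, and `top_k` applies only to models that support it):

```
# Sketch: a Converse request combining a system prompt, base inference
# parameters, and a model-specific parameter (top_k, for Anthropic Claude).
request = {
    "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
    "system": [
        {"text": "You create playlists. Only return song names and artists."}
    ],
    "messages": [
        {"role": "user", "content": [{"text": "Create a list of 3 pop songs."}]}
    ],
    "inferenceConfig": {"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    "additionalModelRequestFields": {"top_k": 200},
}
# response = boto3.client("bedrock-runtime").converse(**request)
```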

### promptVariables
<a name="converse-prompt-variables"></a>

If you specify a prompt from [Prompt management](prompt-management.md) in the `modelId` as the resource to run inference on, use this field to fill in the prompt variables with actual values. The `promptVariables` field maps to a JSON object with keys that correspond to variables defined in the prompts and values to replace the variables with.

For example, let's say that you have a prompt that says **Make me a *{{genre}}* playlist consisting of the following number of songs: *{{number}}*.** The prompt's ID is `PROMPT12345` and its version is `1`. You could send the following `Converse` request to replace the variables:

```
POST /model/arn:aws:bedrock:us-east-1:111122223333:prompt/PROMPT12345:1/converse HTTP/1.1
Content-type: application/json

{
   "promptVariables": { 
      "genre" : "pop",
      "number": 3
   }
}
```

### guardrailConfig
<a name="converse-guardrail"></a>

You can apply a guardrail that you created with [Amazon Bedrock Guardrails](guardrails.md) by including this field. To apply the guardrail to a specific message in the conversation, include the message in a [GuardrailConverseContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_GuardrailConverseContentBlock.html). If you don't include any `GuardrailConverseContentBlock`s in the request body, the guardrail is applied to all the messages in the `messages` field. For an example, see [Include a guardrail with the Converse API](guardrails-use-converse-api.md).

### toolConfig
<a name="converse-tool"></a>

This field lets you define a tool for the model to use to help it generate a response. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md).

### additionalModelResponseFieldPaths
<a name="converse-additional-model-response-field-paths"></a>

You can specify the paths for additional model parameters in the `additionalModelResponseFieldPaths` field, as shown in the following example.

```
[ "/stop_sequence" ]
```

The API returns the additional fields that you request in the `additionalModelResponseFields` field. 

### requestMetadata
<a name="converse-request-metadata"></a>

This field maps to a JSON object. You can specify metadata keys and values that they map to within this object. You can use request metadata to help you filter model invocation logs.
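For example, the following request body fragment attaches metadata (the key names here are illustrative) that you could later filter on in your invocation logs:

```
{
    "requestMetadata": {
        "project": "playlist-app",
        "environment": "test"
    }
}
```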

### serviceTier
<a name="inference-service-tiers"></a>

This field maps to a JSON object. You can specify the service tier for a particular request.

The following example shows the `serviceTier` structure:

```
"serviceTier": {
  "type": "reserved" | "priority" | "default" | "flex"
}
```

For detailed information about service tiers, including pricing and performance characteristics, see [Service tiers for optimizing performance and cost](service-tiers-inference.md).

You can also optionally add cache checkpoints to the `system` or `tools` fields to use prompt caching, depending on which model you're using. For more information, see [Prompt caching for faster model inference](prompt-caching.md).

## Response
<a name="conversation-inference-call-response"></a>

The response you get from the Converse API depends on which operation you call, `Converse` or `ConverseStream`.

**Topics**
+ [

### Converse response
](#conversation-inference-call-response-converse)
+ [

### ConverseStream response
](#conversation-inference-call-response-converse-stream)

### Converse response
<a name="conversation-inference-call-response-converse"></a>

In the response from `Converse`, the `output` field ([ConverseOutput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseOutput.html)) contains the message ([Message](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Message.html)) that the model generates. The message content is in the `content` ([ContentBlock](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlock.html)) field and the role (`user` or `assistant`) that the message corresponds to is in the `role` field. 

If you used [prompt caching](prompt-caching.md), then in the `usage` field, `cacheReadInputTokensCount` and `cacheWriteInputTokensCount` tell you how many total tokens were read from the cache and written to the cache, respectively.

If you used [service tiers](#inference-service-tiers), the `serviceTier` field in the response tells you which service tier was used for the request.

The `metrics` field ([ConverseMetrics](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseMetrics.html)) includes metrics for the call. To determine why the model stopped generating content, check the `stopReason` field. You can get information about the tokens passed to the model in the request, and the tokens generated in the response, by checking the `usage` field ([TokenUsage](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html)). If you specified additional response fields in the request, the API returns them as JSON in the `additionalModelResponseFields` field. 

The following example shows the response from `Converse` when you pass the prompt discussed in [Request](#conversation-inference-call-request).

```
{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "Here is a list of 3 pop songs by artists from the United Kingdom:\n\n1. \"Wannabe\" by Spice Girls\n2. \"Bitter Sweet Symphony\" by The Verve \n3. \"Don't Look Back in Anger\" by Oasis"
                }
            ]
        }
    },
    "stopReason": "end_turn",
    "usage": {
        "inputTokens": 125,
        "outputTokens": 60,
        "totalTokens": 185
    },
    "metrics": {
        "latencyMs": 1175
    }
}
```

### ConverseStream response
<a name="conversation-inference-call-response-converse-stream"></a>

If you call `ConverseStream` to stream the response from a model, the stream is returned in the `stream` response field. The stream emits the following events, in order.

1. `messageStart` ([MessageStartEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_MessageStartEvent.html)). The start event for a message. Includes the role for the message.

1. `contentBlockStart` ([ContentBlockStartEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlockStartEvent.html)). A content block start event. Emitted for tool use only. 

1. `contentBlockDelta` ([ContentBlockDeltaEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlockDeltaEvent.html)). A content block delta event. Includes one of the following:
   + `text` – The partial text that the model generates.
   + `reasoningContent` – The partial reasoning that the model carried out to generate the response. In subsequent `Converse` requests, you must submit the returned `signature` in addition to all previous messages. If any of the messages are changed, the request returns an error.
   + `toolUse` – The partial input JSON object for tool use.

1. `contentBlockStop` ([ContentBlockStopEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ContentBlockStopEvent.html)). A content block stop event.

1. `messageStop` ([MessageStopEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_MessageStopEvent.html)). The stop event for the message. Includes the reason why the model stopped generating output. 

1. `metadata` ([ConverseStreamMetadataEvent](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStreamMetadataEvent.html)). Metadata for the request. The metadata includes the token usage in `usage` ([TokenUsage](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_TokenUsage.html)) and metrics for the call in `metrics` ([ConverseStreamMetrics](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStreamMetrics.html)).

ConverseStream streams a complete content block as a `ContentBlockStartEvent` event, one or more `ContentBlockDeltaEvent` events, and a `ContentBlockStopEvent` event. Use the `contentBlockIndex` field as an index to correlate the events that make up a content block.

The following example is a partial response from `ConverseStream`. 

```
{'messageStart': {'role': 'assistant'}}
{'contentBlockDelta': {'delta': {'text': ''}, 'contentBlockIndex': 0}}
{'contentBlockDelta': {'delta': {'text': ' Title'}, 'contentBlockIndex': 0}}
{'contentBlockDelta': {'delta': {'text': ':'}, 'contentBlockIndex': 0}}
.
.
.
{'contentBlockDelta': {'delta': {'text': ' The'}, 'contentBlockIndex': 0}}
{'messageStop': {'stopReason': 'max_tokens'}}
{'metadata': {'usage': {'inputTokens': 47, 'outputTokens': 20, 'totalTokens': 67}, 'metrics': {'latencyMs': 100.0}}}
```
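The correlation described above can be sketched in a few lines of Python. This minimal example assembles the streamed text deltas into complete content blocks, using `contentBlockIndex` to group the events; the sample `events` list is hypothetical:

```
from collections import defaultdict

def assemble_content_blocks(events):
    """Collect streamed text deltas into complete content blocks,
    keyed by contentBlockIndex."""
    blocks = defaultdict(str)
    for event in events:
        delta_event = event.get("contentBlockDelta")
        if delta_event and "text" in delta_event.get("delta", {}):
            blocks[delta_event["contentBlockIndex"]] += delta_event["delta"]["text"]
    # Return the completed blocks in index order.
    return [blocks[i] for i in sorted(blocks)]

# Hypothetical stream events, in the order they would be emitted.
events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "Hello"}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": ", world"}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
]
print(assemble_content_blocks(events))  # ['Hello, world']
```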

# Converse API examples
<a name="conversation-inference-examples"></a>

The following examples show you how to use the `Converse` and `ConverseStream` operations.

------
#### [ Text ]

This example shows how to call the `Converse` operation with the *Anthropic Claude 3 Sonnet* model. The example shows how to send the input text, inference parameters, and additional parameters that are unique to the model. The code starts a conversation by asking the model to create a list of songs. It then continues the conversation by asking that the songs be by artists from the United Kingdom.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use the Converse API with Anthropic Claude 3 Sonnet (on demand).
"""

import logging
import boto3

from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_conversation(bedrock_client,
                          model_id,
                          system_prompts,
                          messages):
    """
    Sends messages to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        system_prompts (JSON) : The system prompts for the model to use.
        messages (JSON) : The messages to send to the model.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Inference parameters to use.
    temperature = 0.5
    top_k = 200

    # Base inference parameters to use.
    inference_config = {"temperature": temperature}
    # Additional inference parameters to use.
    additional_model_fields = {"top_k": top_k}

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    # Log token usage.
    token_usage = response['usage']
    logger.info("Input tokens: %s", token_usage['inputTokens'])
    logger.info("Output tokens: %s", token_usage['outputTokens'])
    logger.info("Total tokens: %s", token_usage['totalTokens'])
    logger.info("Stop reason: %s", response['stopReason'])

    return response

def main():
    """
    Entrypoint for Anthropic Claude 3 Sonnet example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

    # Setup the system prompts and messages to send to the model.
    system_prompts = [{"text": "You are an app that creates playlists for a radio station that plays rock and pop music. Only return song names and the artist."}]
    message_1 = {
        "role": "user",
        "content": [{"text": "Create a list of 3 pop songs."}]
    }
    message_2 = {
        "role": "user",
        "content": [{"text": "Make sure the songs are by artists from the United Kingdom."}]
    }
    messages = []

    try:

        bedrock_client = boto3.client(service_name='bedrock-runtime')

        # Start the conversation with the 1st message.
        messages.append(message_1)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        # Add the response message to the conversation.
        output_message = response['output']['message']
        messages.append(output_message)

        # Continue the conversation with the 2nd message.
        messages.append(message_2)
        response = generate_conversation(
            bedrock_client, model_id, system_prompts, messages)

        output_message = response['output']['message']
        messages.append(output_message)

        # Show the complete conversation.
        for message in messages:
            print(f"Role: {message['role']}")
            for content in message['content']:
                print(f"Text: {content['text']}")
            print()

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Image ]

This example shows how to send an image as part of a message and request that the model describe the image. The example uses the `Converse` operation and the *Anthropic Claude 3 Sonnet* model. 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to send an image with the Converse API with an accompanying text prompt to Anthropic Claude 3 Sonnet (on demand).
"""

import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_conversation(bedrock_client,
                          model_id,
                          input_text,
                          input_image):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        input_text (str): The text prompt accompanying the image.
        input_image (str): The path to the input image.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Get image extension and read in image as bytes
    image_ext = input_image.split(".")[-1]
    with open(input_image, "rb") as f:
        image = f.read()

    message = {
        "role": "user",
        "content": [
            {
                "text": input_text
            },
            {
                "image": {
                    "format": image_ext,
                    "source": {
                        "bytes": image
                    }
                }
            }
        ]
    }

    messages = [message]

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages
    )

    return response


def main():
    """
    Entrypoint for Anthropic Claude 3 Sonnet example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What's in this image?"
    input_image = "path/to/image"

    try:

        bedrock_client = boto3.client(service_name="bedrock-runtime")

        response = generate_conversation(
            bedrock_client, model_id, input_text, input_image)

        output_message = response['output']['message']

        print(f"Role: {output_message['role']}")

        for content in output_message['content']:
            print(f"Text: {content['text']}")

        token_usage = response['usage']
        print(f"Input tokens:  {token_usage['inputTokens']}")
        print(f"Output tokens:  {token_usage['outputTokens']}")
        print(f"Total tokens:  {token_usage['totalTokens']}")
        print(f"Stop reason: {response['stopReason']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Document ]

This example shows how to send a document as part of a message and request that the model describe the contents of the document. The example uses the `Converse` operation and the *Anthropic Claude 3 Sonnet* model. 

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to send a document as part of a message to Anthropic Claude 3 Sonnet (on demand).
"""

import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_message(bedrock_client,
                     model_id,
                     input_text,
                     input_document_path):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        input_text (str): The input message.
        input_document_path (str): The path to the input document.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Get format from path and read the path
    input_document_format = input_document_path.split(".")[-1]
    with open(input_document_path, 'rb') as input_document_file:
        input_document = input_document_file.read()

    # Message to send.
    message = {
        "role": "user",
        "content": [
            {
                "text": input_text
            },
            {
                "document": {
                    "name": "MyDocument",
                    "format": input_document_format,
                    "source": {
                        "bytes": input_document
                    }
                }
            }
        ]
    }

    messages = [message]

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages
    )

    return response


def main():
    """
    Entrypoint for Anthropic Claude 3 Sonnet example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    input_text = "What's in this document?"
    input_document_path = "path/to/document"

    try:

        bedrock_client = boto3.client(service_name="bedrock-runtime")


        response = generate_message(
            bedrock_client, model_id, input_text, input_document_path)

        output_message = response['output']['message']

        print(f"Role: {output_message['role']}")

        for content in output_message['content']:
            print(f"Text: {content['text']}")

        token_usage = response['usage']
        print(f"Input tokens:  {token_usage['inputTokens']}")
        print(f"Output tokens:  {token_usage['outputTokens']}")
        print(f"Total tokens:  {token_usage['totalTokens']}")
        print(f"Stop reason: {response['stopReason']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Streaming ]

This example shows how to call the `ConverseStream` operation with the *Anthropic Claude 3 Sonnet* model. The example shows how to send the input text, inference parameters, and additional parameters that are unique to the model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to use the Converse API to stream a response from Anthropic Claude 3 Sonnet (on demand).
"""

import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def stream_conversation(bedrock_client,
                    model_id,
                    messages,
                    system_prompts,
                    inference_config,
                    additional_model_fields):
    """
    Sends messages to a model and streams the response.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        messages (JSON) : The messages to send.
        system_prompts (JSON) : The system prompts to send.
        inference_config (JSON) : The inference configuration to use.
        additional_model_fields (JSON) : Additional model fields to use.

    Returns:
        Nothing.

    """

    logger.info("Streaming messages with model %s", model_id)

    response = bedrock_client.converse_stream(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    stream = response.get('stream')
    if stream:
        for event in stream:

            if 'messageStart' in event:
                print(f"\nRole: {event['messageStart']['role']}")

            if 'contentBlockDelta' in event:
                print(event['contentBlockDelta']['delta']['text'], end="")

            if 'messageStop' in event:
                print(f"\nStop reason: {event['messageStop']['stopReason']}")

            if 'metadata' in event:
                metadata = event['metadata']
                if 'usage' in metadata:
                    print("\nToken usage")
                    print(f"Input tokens: {metadata['usage']['inputTokens']}")
                    print(
                        f"Output tokens: {metadata['usage']['outputTokens']}")
                    print(f"Total tokens: {metadata['usage']['totalTokens']}")
                if 'metrics' in metadata:
                    print(
                        f"Latency: {metadata['metrics']['latencyMs']} milliseconds")


def main():
    """
    Entrypoint for streaming message API response example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    system_prompt = """You are an app that creates playlists for a radio station
      that plays rock and pop music. Only return song names and the artist."""

    # Message to send to the model.
    input_text = "Create a list of 3 pop songs."

    message = {
        "role": "user",
        "content": [{"text": input_text}]
    }
    messages = [message]
    
    # System prompts.
    system_prompts = [{"text" : system_prompt}]

    # Inference parameters to use.
    temperature = 0.5
    top_k = 200
    # Base inference parameters.
    inference_config = {
        "temperature": temperature
    }
    # Additional model inference parameters.
    additional_model_fields = {"top_k": top_k}

    try:
        bedrock_client = boto3.client(service_name='bedrock-runtime')

        stream_conversation(bedrock_client, model_id, messages,
                        system_prompts, inference_config, additional_model_fields)

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished streaming messages with model {model_id}.")


if __name__ == "__main__":
    main()
```

------
#### [ Video ]

This example shows how to send a video as part of a message and request that the model describe the video. The example uses the `Converse` operation and the Amazon Nova Pro model.

```
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Shows how to send a video with the Converse API to Amazon Nova Pro (on demand).
"""

import logging
import boto3


from botocore.exceptions import ClientError


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def generate_conversation(bedrock_client,
                          model_id,
                          input_text,
                          input_video):
    """
    Sends a message to a model.
    Args:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        input_text (str): The input message.
        input_video (str): The path to the input video.

    Returns:
        response (JSON): The conversation that the model generated.

    """

    logger.info("Generating message with model %s", model_id)

    # Message to send.

    with open(input_video, "rb") as f:
        video = f.read()

    message = {
        "role": "user",
        "content": [
            {
                "text": input_text
            },
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "bytes": video
                    }
                }
            }
        ]
    }

    messages = [message]

    # Send the message.
    response = bedrock_client.converse(
        modelId=model_id,
        messages=messages
    )

    return response


def main():
    """
    Entrypoint for Amazon Nova Pro example.
    """

    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s: %(message)s")

    model_id = "amazon.nova-pro-v1:0"
    input_text = "What's in this video?"
    input_video = "path/to/video"

    try:

        bedrock_client = boto3.client(service_name="bedrock-runtime")

        response = generate_conversation(
            bedrock_client, model_id, input_text, input_video)

        output_message = response['output']['message']

        print(f"Role: {output_message['role']}")

        for content in output_message['content']:
            print(f"Text: {content['text']}")

        token_usage = response['usage']
        print(f"Input tokens:  {token_usage['inputTokens']}")
        print(f"Output tokens:  {token_usage['outputTokens']}")
        print(f"Total tokens:  {token_usage['totalTokens']}")
        print(f"Stop reason: {response['stopReason']}")

    except ClientError as err:
        message = err.response['Error']['Message']
        logger.error("A client error occurred: %s", message)
        print(f"A client error occurred: {message}")

    else:
        print(
            f"Finished generating text with model {model_id}.")


if __name__ == "__main__":
    main()
```

------

# API restrictions
<a name="inference-api-restrictions"></a>

The following restrictions apply to the `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream` operations. Some restrictions vary by operation or model as noted below:
+ When using these operations, you can only include images and documents if the `role` is `user`.
+ **Video generation:** Video generation is not supported with `InvokeModel` and `InvokeModelWithResponseStream`. Instead, you can use the [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_StartAsyncInvoke.html) operation. For an example, see [Use Amazon Nova Reel to generate a video from a text prompt](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_Scenario_AmazonNova_TextToVideo_section.html).
+ **Document support in request body:** Including documents in the request body is not supported when using `InvokeModel` and `InvokeModelWithResponseStream`. To include a document during inference, use the [Chat/text playground](playgrounds.md) in the AWS Management Console or use the `Converse` or `ConverseStream` operations.
+ **Document count and size:** You can include up to 5 documents per request. Each document can be no more than 4.5 MB in size. For Claude 4 and subsequent versions, the 4.5 MB document size restriction doesn't apply to `PDF` format. For Nova models, the 4.5 MB document size restriction doesn't apply to `PDF` and `DOCX` formats. These restrictions continue to apply in the Bedrock Console. Individual models may have additional content restrictions beyond those applied by Amazon Bedrock. For more information, see **Third-party model provider requirements**.
+ **Image count and size**: Amazon Bedrock doesn't impose restrictions on image count and size. However, individual models may have specific image requirements. For more information, see **Third-party model provider requirements**.
+ **Third-party model provider requirements:** Third-party model provider requirements apply when you use `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, and `ConverseStream` operations, and may result in an error if not met. If you use a third-party model through Amazon Bedrock (for example, Anthropic Claude), review the provider's user guide and API documentation to avoid unexpected errors. For example, the Anthropic Messages standard endpoint supports a maximum request size of 32 MB. Claude also has specific content requirements, such as a maximum of 100 `PDF` pages per request and a maximum image size of 8000x8000 px. For the latest information about Anthropic Claude Messages requests and responses, including request size and content requirements, see the following Anthropic Claude documentation: [Anthropic Claude API Overview](https://platform.claude.com/docs/en/api/overview), [Anthropic Claude Messages API Reference](https://docs.anthropic.com/claude/reference/messages_post), [Build with Claude: Vision](https://platform.claude.com/docs/en/build-with-claude/vision) and [Build with Claude: PDF Support](https://platform.claude.com/docs/en/build-with-claude/pdf-support).
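As an illustration, the document count and size restrictions above can be checked client-side before a request is sent. This is a simplified sketch that applies only the general 5-document and 4.5 MB limits; it deliberately ignores the model-specific `PDF` and `DOCX` exceptions described above:

```
MAX_DOCUMENTS = 5
MAX_DOCUMENT_BYTES = int(4.5 * 1024 * 1024)  # 4.5 MB

def check_documents(doc_sizes):
    """Return a list of restriction violations for a planned request.

    doc_sizes is a list of document sizes in bytes.
    """
    problems = []
    if len(doc_sizes) > MAX_DOCUMENTS:
        problems.append(
            f"{len(doc_sizes)} documents exceeds the limit of {MAX_DOCUMENTS}")
    for i, size in enumerate(doc_sizes):
        if size > MAX_DOCUMENT_BYTES:
            problems.append(
                f"document {i} is {size} bytes (limit {MAX_DOCUMENT_BYTES})")
    return problems

# An empty list means the request is within the general limits.
print(check_documents([1000, 2000]))
```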

**Tip**  
Claude requires PDF documents to be a maximum of 100 pages per request. If you have larger PDF documents, we recommend splitting them into multiple PDFs under 100 pages each or consolidating more text into fewer pages.
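The splitting suggested in this tip can be planned with a small helper. The sketch below only computes the page ranges for each chunk; actually writing the chunks out would require a PDF library such as `pypdf`, which isn't shown here:

```
def chunk_page_ranges(total_pages, max_pages=100):
    """Split page indices 0..total_pages-1 into (start, end) ranges
    (end exclusive), each covering at most max_pages pages."""
    return [(start, min(start + max_pages, total_pages))
            for start in range(0, total_pages, max_pages)]

# A 250-page PDF becomes three chunks of at most 100 pages each.
print(chunk_page_ranges(250))  # [(0, 100), (100, 200), (200, 250)]
```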

# Get validated JSON results from models
<a name="structured-output"></a>

Structured outputs is a capability in Amazon Bedrock that ensures model responses conform to user-defined JSON schemas and tool definitions, reducing the need for custom parsing and validation mechanisms in production AI deployments.

## Benefits
<a name="structured-output-benefits"></a>

Structured outputs addresses critical challenges in production AI applications:
+ **Ensured schema compliance** – Eliminates the errors and retry loops that come with prompt-based approaches
+ **Reduced development complexity** – Removes the need for custom parsing and validation logic
+ **Lower operational costs** – Reduces failed requests and retries
+ **Production reliability** – Enables confident deployment of AI applications requiring predictable, machine-readable outputs

## How it works
<a name="structured-output-how-it-works"></a>

Structured outputs constrains model responses to follow a specific schema, ensuring valid, parseable output for downstream processing. You can use structured outputs through two complementary mechanisms:

### JSON Schema output format
<a name="structured-output-json-schema"></a>

For the InvokeModel API with Anthropic Claude models, use the `output_config.format` request field. With open-weight models, use the `response_format` request field. For the Converse APIs, use the `outputConfig.textFormat` request field. The model's response will conform to the specified JSON schema.

### Strict tool use
<a name="structured-output-strict-tool-use"></a>

Add the `strict: true` flag to tool definitions to enable schema validation on tool names and inputs. The model's tool calls will then follow the defined tool input schema.
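As a rough sketch, a strict tool definition might look like the following. The tool name and schema are hypothetical, and the exact placement of the `strict` flag within the tool definition is an assumption here; confirm the field location in the API reference for your model:

```
"toolConfig": {
    "tools": [
        {
            "toolSpec": {
                "name": "extract_invoice",
                "strict": true,
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "invoice_id": {"type": "string"},
                            "total": {"type": "number"}
                        },
                        "required": ["invoice_id", "total"]
                    }
                }
            }
        }
    ]
}
```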

These mechanisms can be used independently or together in the same request. Refer to [Bedrock API documentation](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html) for more details.

### Request workflow
<a name="structured-output-request-workflow"></a>

The following describes how Amazon Bedrock processes requests with structured outputs:

1. **Initial request** – You include either a JSON schema via the `outputConfig.textFormat`, `output_config.format`, or `response_format` parameter or a tool definition with the `strict: true` flag in your inference request.

1. **Schema validation** – Amazon Bedrock validates the JSON schema format against the supported JSON Schema Draft 2020-12 subset. If the schema contains unsupported features, Amazon Bedrock returns a 400 error immediately.

1. **First-time compilation** – For new schemas, Amazon Bedrock compiles the grammar, which may take up to a few minutes.

1. **Caching** – Successfully compiled grammars are cached for 24 hours from first access. Cached grammars are encrypted with AWS-managed keys.

1. **Subsequent requests** – Identical schemas from the same account use cached grammars, resulting in inference latency comparable to standard requests with minimal overhead.

1. **Response** – You receive standard inference responses with strict schema compliance.
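Even with schema compliance enforced in the workflow above, downstream code still has to parse the JSON text it receives. The following is a minimal, hypothetical sketch of that step; the `title` and `artist` keys stand in for whatever fields your schema defines:

```
import json

REQUIRED_KEYS = {"title", "artist"}  # stand-ins for your schema's fields

def parse_structured_response(text):
    """Parse the JSON text content of a structured response and
    defensively verify that the expected keys are present."""
    data = json.loads(text)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing keys: {sorted(missing)}")
    return data

song = parse_structured_response('{"title": "Wannabe", "artist": "Spice Girls"}')
print(song["artist"])  # Spice Girls
```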

## Supported APIs or features
<a name="structured-output-supported-apis"></a>

You can use structured outputs across the following Amazon Bedrock features:

**Converse and ConverseStream APIs** – Use structured outputs with the Converse and ConverseStream APIs for conversational inference.

**InvokeModel and InvokeModelWithResponseStream APIs** – Use structured outputs with the InvokeModel and InvokeModelWithResponseStream APIs for single-turn inference.

**Cross-Region inference** – Use structured outputs within cross-Region inference without any additional setup.

**Batch inference** – Use structured outputs within batch inference without any additional setup.

**Note**  
Structured outputs is incompatible with citations for Anthropic models. If you enable citations while using structured outputs, the model will return a 400 error.

## Supported models
<a name="structured-output-supported-models"></a>

Structured outputs is generally available in all commercial AWS Regions for select Amazon Bedrock serverless models. For the list of supported models, refer to Model support by feature.

### View all supported models
<a name="w2aac13c32c35c11b5b1"></a>

Anthropic  
+ Claude Haiku 4.5 (`anthropic.claude-haiku-4-5-20251001-v1:0`)
+ Claude Sonnet 4.5 (`anthropic.claude-sonnet-4-5-20250929-v1:0`)
+ Claude Sonnet 4.6 (`anthropic.claude-sonnet-4-6`)
+ Claude Opus 4.5 (`anthropic.claude-opus-4-5-20251101-v1:0`)
+ Claude Opus 4.6 (`anthropic.claude-opus-4-6-v1`)

Qwen  
+ Qwen3 235B A22B 2507 (`qwen.qwen3-235b-a22b-2507-v1:0`)
+ Qwen3 32B (dense) (`qwen.qwen3-32b-v1:0`)
+ Qwen3-Coder-30B-A3B-Instruct (`qwen.qwen3-coder-30b-a3b-v1:0`)
+ Qwen3 Coder 480B A35B Instruct (`qwen.qwen3-coder-480b-a35b-v1:0`)
+ Qwen3 Next 80B A3B (`qwen.qwen3-next-80b-a3b`)
+ Qwen3 VL 235B A22B (`qwen.qwen3-vl-235b-a22b`)

OpenAI  
+ gpt-oss-120b (`openai.gpt-oss-120b-1:0`)
+ gpt-oss-20b (`openai.gpt-oss-20b-1:0`)
+ GPT OSS Safeguard 120B (`openai.gpt-oss-safeguard-120b`)
+ GPT OSS Safeguard 20B (`openai.gpt-oss-safeguard-20b`)

DeepSeek  
+ DeepSeek-V3.1 (`deepseek.v3-v1:0`)

Google  
+ Gemma 3 12B IT (`google.gemma-3-12b-it`)
+ Gemma 3 27B IT (`google.gemma-3-27b-it`)

MiniMax  
+ MiniMax M2 (`minimax.minimax-m2`)

Mistral AI  
+ Magistral Small 2509 (`mistral.magistral-small-2509`)
+ Ministral 3B (`mistral.ministral-3-3b-instruct`)
+ Ministral 3 8B (`mistral.ministral-3-8b-instruct`)
+ Ministral 14B 3.0 (`mistral.ministral-3-14b-instruct`)
+ Mistral Large 3 (`mistral.mistral-large-3-675b-instruct`)
+ Voxtral Mini 3B 2507 (`mistral.voxtral-mini-3b-2507`)
+ Voxtral Small 24B 2507 (`mistral.voxtral-small-24b-2507`)

Moonshot AI  
+ Kimi K2 Thinking (`moonshot.kimi-k2-thinking`)

NVIDIA  
+ NVIDIA Nemotron Nano 12B v2 VL BF16 (`nvidia.nemotron-nano-12b-v2`)
+ NVIDIA Nemotron Nano 9B v2 (`nvidia.nemotron-nano-9b-v2`)

## Example requests
<a name="structured-output-examples"></a>

### JSON Schema output format
<a name="structured-output-json-schema-examples"></a>

The following examples show how to use JSON Schema output format with structured outputs.

#### Converse API
<a name="json-schema-converse"></a>

##### View example
<a name="w2aac13c32c35c13b3b5b3b1"></a>

```
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "Given the following unstructured data, extract it into the provided structure."
        },
        {
          "text": "..."
        }
      ]
    }
  ],
  "outputConfig": {
    "textFormat": {
      "type": "json_schema",
      "structure": {
        "jsonSchema": {
          "schema": "{\"type\": \"object\", \"properties\": {\"title\": {\"type\": \"string\", \"description\": \"title\"}, \"summary\": {\"type\": \"string\", \"description\": \"summary\"}, \"next_steps\": {\"type\": \"string\", \"description\": \"next steps\"}}, \"required\": [\"title\", \"summary\", \"next_steps\"], \"additionalProperties\": false}",
          "name": "data_extraction",
          "description": "Extract structured data from unstructured text"
        }
      }
    }
  }
}
```
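
Note that in the Converse request above, the `jsonSchema.schema` field carries the schema as a JSON-encoded string, not as a nested object. In Python, for example, you might build the `outputConfig` by serializing a dict (a sketch; the field names follow the example above):

```python
import json

# The schema itself, as a normal Python dict.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "title"},
        "summary": {"type": "string", "description": "summary"},
        "next_steps": {"type": "string", "description": "next steps"},
    },
    "required": ["title", "summary", "next_steps"],
    "additionalProperties": False,
}

# The Converse API expects jsonSchema.schema as a string, so the dict
# is serialized with json.dumps before being placed in the request.
output_config = {
    "textFormat": {
        "type": "json_schema",
        "structure": {
            "jsonSchema": {
                "schema": json.dumps(schema),
                "name": "data_extraction",
                "description": "Extract structured data from unstructured text",
            }
        }
    }
}
```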

#### InvokeModel (Anthropic Claude)
<a name="json-schema-invokemodel-claude"></a>

##### View example
<a name="w2aac13c32c35c13b3b7b3b1"></a>

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Given the following unstructured data, extract it into the provided structure."
        },
        {
          "type": "text",
          "text": "..."
        }
      ]
    }
  ],
  "max_tokens": 3000,
  "temperature": 1.0,
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "description": "title"
          },
          "summary": {
            "type": "string",
            "description": "summary"
          },
          "next_steps": {
            "type": "string",
            "description": "next steps"
          }
        },
        "required": [
          "title",
          "summary",
          "next_steps"
        ],
        "additionalProperties": false
      }
    }
  }
}
```

#### InvokeModel (Open-weight models)
<a name="json-schema-invokemodel-openweight"></a>

##### View example
<a name="w2aac13c32c35c13b3b9b3b1"></a>

```
{
  "messages": [
    {
      "role": "user",
      "content": "Given the following unstructured data, extract it into the provided structure."
    },
    {
      "role": "user",
      "content": "..."
    }
  ],
  "inferenceConfig": {
    "maxTokens": 3000,
    "temperature": 1.0
  },
  "response_format": {
    "json_schema": {
      "name": "summarizer",
      "schema": {
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "description": "title"
          },
          "summary": {
            "type": "string",
            "description": "summary"
          },
          "next_steps": {
            "type": "string",
            "description": "next steps"
          }
        },
        "required": [
          "title",
          "summary",
          "next_steps"
        ],
        "additionalProperties": false
      }
    },
    "type": "json_schema"
  }
}
```
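
Because the response is guaranteed to conform to the schema, client code can parse the generated text directly with a standard JSON parser, without defensive checks for missing fields. A minimal Python sketch (the `model_text` string below is illustrative, not actual model output):

```python
import json

# Illustrative generated text; with structured outputs enabled, the
# model's output is guaranteed to match the requested schema.
model_text = (
    '{"title": "Q3 Review", '
    '"summary": "Revenue grew 12% quarter over quarter.", '
    '"next_steps": "Schedule a follow-up with the finance team."}'
)

# All required fields are guaranteed to be present.
result = json.loads(model_text)
print(result["title"])
print(result["next_steps"])
```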

### Strict tool use
<a name="structured-output-strict-tool-examples"></a>

The following examples show how to use the strict field with tool use.

#### Converse API
<a name="strict-tool-converse"></a>

##### View example
<a name="w2aac13c32c35c13b5b5b3b1"></a>

```
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "What's the weather like in New York?"
        }
      ]
    }
  ],
  "toolConfig": {
    "tools": [
      {
        "toolSpec": {
          "name": "get_weather",
          "description": "Get the current weather for a specified location",
          "strict": true,
          "inputSchema": {
            "json": {
              "type": "object",
              "properties": {
                "location": {
                  "type": "string",
                  "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                  "type": "string",
                  "enum": [
                    "fahrenheit",
                    "celsius"
                  ],
                  "description": "The temperature unit to use"
                }
              },
              "required": [
                "location",
                "unit"
              ]
            }
          }
        }
      }
    ]
  }
}
```

#### InvokeModel (Anthropic Claude)
<a name="strict-tool-invokemodel-claude"></a>

##### View example
<a name="w2aac13c32c35c13b5b7b3b1"></a>

```
{
  "anthropic_version": "bedrock-2023-05-31",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's the weather like in San Francisco?"
        }
      ]
    }
  ],
  "max_tokens": 3000,
  "temperature": 1.0,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a specified location",
      "strict": true,
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "fahrenheit",
              "celsius"
            ],
            "description": "The temperature unit to use"
          }
        },
        "required": [
          "location",
          "unit"
        ],
        "additionalProperties": false
      }
    }
  ]
}
```

#### InvokeModel (Open-weight models)
<a name="strict-tool-invokemodel-openweight"></a>

##### View example
<a name="w2aac13c32c35c13b5b9b3b1"></a>

```
{
  "messages": [
    {
      "role": "user",
      "content": "What's the weather like in San Francisco?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a specified location",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": [
                "fahrenheit",
                "celsius"
              ],
              "description": "The temperature unit to use"
            }
          },
          "required": [
            "location",
            "unit"
          ]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "max_tokens": 2000,
  "temperature": 1.0
}
```
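
Because `strict: true` guarantees that the model's tool inputs match the declared schema, a dispatcher can pass them straight to a handler without validating that required fields are present. A sketch (the `tool_use` block and the `get_weather` implementation below are illustrative):

```python
# Illustrative tool_use block as it might appear in a model response;
# with strict tool use, the input is guaranteed to match the schema.
tool_use = {
    "name": "get_weather",
    "input": {"location": "San Francisco, CA", "unit": "celsius"},
}

def get_weather(location: str, unit: str) -> str:
    # Placeholder implementation for the example.
    return f"18 degrees {unit} in {location}"

handlers = {"get_weather": get_weather}

# Both required fields are guaranteed present, so the input can be
# unpacked directly into the handler's keyword arguments.
result = handlers[tool_use["name"]](**tool_use["input"])
print(result)
```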

# Use a computer use tool to complete an Amazon Bedrock model response
<a name="computer-use"></a>

Computer use is an Anthropic Claude model capability (in beta) available with Anthropic Claude 3.7 Sonnet and Claude 3.5 Sonnet v2 only. With computer use, Claude can help you automate tasks through basic GUI actions.

**Warning**  
The computer use feature is made available to you as a 'Beta Service' as defined in the AWS Service Terms. It is subject to your Agreement with AWS and the AWS Service Terms, and the applicable model EULA. Be aware that the computer use API poses unique risks that are distinct from standard API features or chat interfaces. These risks are heightened when using the computer use API to interact with the internet. To minimize risks, consider taking precautions such as the following:  
+ Operate computer use functionality in a dedicated virtual machine or container with minimal privileges to prevent direct system attacks or accidents.
+ Avoid giving the computer use API access to sensitive accounts or data to prevent information theft.
+ Limit the computer use API's internet access to required domains to reduce exposure to malicious content.
+ Keep a human in the loop for sensitive tasks (such as making decisions that could have meaningful real-world consequences) and for anything requiring affirmative consent (such as accepting cookies, executing financial transactions, or agreeing to terms of service).

Any content that you enable Claude to see or access can potentially override instructions or cause Claude to make mistakes or perform unintended actions. Taking proper precautions, such as isolating Claude from sensitive surfaces, is essential, including to avoid risks related to prompt injection. Before enabling or requesting permissions necessary to enable computer use features in your own products, inform end users of any relevant risks, and obtain their consent as appropriate.

The computer use API offers several predefined computer use tools (`computer_20241022`, `bash_20241022`, and `text_editor_20241022`) for you to use. You can then create a prompt with your request, such as "send an email to Ben with the notes from my last meeting", and a screenshot (when required). The response contains a list of `tool_use` actions in JSON format (for example, `scroll_down`, `left_button_press`, `screenshot`). Your code runs the computer actions and provides Claude with screenshots showing the results (when requested).

The `tools` parameter has been updated to accept polymorphic tool types; a new `tool.type` property distinguishes them. `type` is optional; if omitted, the tool is assumed to be a custom tool (previously the only supported tool type). Additionally, a new parameter, `anthropic_beta`, has been added, with a corresponding enum value: `computer-use-2024-10-22`. Only requests made with this parameter and enum value can use the new computer use tools. Specify it as follows: `"anthropic_beta": ["computer-use-2024-10-22"]`.

To use computer use with Anthropic Claude 3.5 Sonnet v2, use the Converse API ([Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)). Specify the computer use specific fields in the `additionalModelRequestFields` field. For general information about calling the Converse API, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

You can also use the computer use tools with the base inference operations ([InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html)). To find the inference parameters that you pass in the request body, see the [Anthropic Claude Messages API](model-parameters-anthropic-claude-messages.md).

For more information, see [Computer use (beta)](https://docs.anthropic.com/en/docs/build-with-claude/computer-use) in the Anthropic documentation.

**Topics**
+ [

## Example code
](#computer-use-example-code)
+ [

## Example response
](#example-response)

## Example code
<a name="computer-use-example-code"></a>

The following code shows how to call the computer use API. The input is an image of the AWS console. 

```
import json

import boto3

bedrock = boto3.client('bedrock-runtime')

# Read the screenshot of the AWS console that accompanies the prompt.
with open('test_images/console.png', 'rb') as f:
    png = f.read()

response = bedrock.converse(
    modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
    messages=[
        {
            'role': 'user',
            'content': [
                {
                    'text': 'Go to the bedrock console'
                },
                {
                    'image': {
                        'format': 'png',
                        'source': {
                            'bytes': png
                        }
                    }
                }
            ]
        }
    ],
    # Computer use tools and the anthropic_beta flag are passed through
    # additionalModelRequestFields.
    additionalModelRequestFields={
        "tools": [
            {
                "type": "computer_20241022",
                "name": "computer",
                "display_height_px": 768,
                "display_width_px": 1024,
                "display_number": 0
            },
            {
                "type": "bash_20241022",
                "name": "bash"
            },
            {
                "type": "text_editor_20241022",
                "name": "str_replace_editor"
            }
        ],
        "anthropic_beta": ["computer-use-2024-10-22"]
    },
    toolConfig={
        'tools': [
            {
                'toolSpec': {
                    'name': 'get_weather',
                    'inputSchema': {
                        'json': {
                            'type': 'object'
                        }
                    }
                }
            }
        ]
    })

print(json.dumps(response, indent=4))
```

## Example response
<a name="example-response"></a>

The example code emits output similar to the following.

```
{
   "id": "msg_bdrk_01Ch8g9MF3A9FTrmeywrwfMZ",
   "type": "message",
   "role": "assistant",
   "content": [
        {
            "type": "text",
            "text": "I can see from the screenshot that we're already in the AWS Console. To go to the Amazon Bedrock console specifically, I'll click on the Amazon Bedrock service from the \"Recently Visited\" section."
        },
        {
            "type": "tool_use",
            "id": "toolu_bdrk_013sAzs1gsda9wLrfD8bhYQ3",
            "name": "computer",
            "input": {
                "action": "screenshot"
            }
        }
   ],
   "stop_reason": "tool_use",
   "stop_sequence": null,
   "usage": {
       "input_tokens": 3710,
       "output_tokens": 97
   }
}
```
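
Your application is responsible for executing the returned `tool_use` actions and sending the results back to Claude in a follow-up user message. The following sketch extracts the requested actions from a response shaped like the one above (the `response` dict is abbreviated from the example output):

```python
# Abbreviated response content, following the example output above.
response = {
    "content": [
        {"type": "text", "text": "I'll take a screenshot first."},
        {"type": "tool_use", "id": "toolu_example", "name": "computer",
         "input": {"action": "screenshot"}},
    ],
    "stop_reason": "tool_use",
}

# Collect the actions Claude asked for; your code must execute each one
# (for example, capture a screenshot) and return the results to the
# model in a follow-up message with role "user".
actions = [
    (block["id"], block["input"]["action"])
    for block in response["content"]
    if block["type"] == "tool_use"
]
print(actions)
```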