Using the Converse API

To use the Converse API, you call the Converse or ConverseStream operations to send messages to a model. To call Converse, you require permission for the bedrock:InvokeModel operation. To call ConverseStream, you require permission for the bedrock:InvokeModelWithResponseStream operation.
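
For example, a minimal call to Converse with the AWS SDK for Python (boto3) might look like the following sketch. The model ID is a placeholder; substitute any model that you have access to.

import boto3

# Create a Bedrock Runtime client. The Region is an assumption; use your own.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[
        {"role": "user", "content": [{"text": "Create a list of 3 pop songs."}]}
    ],
)

# The generated message is in the output field of the response.
print(response["output"]["message"]["content"][0]["text"])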

Request

You specify the model you want to use by setting the modelId field. For a list of model IDs that Amazon Bedrock supports, see Supported foundation models in Amazon Bedrock.

A conversation is a series of messages between the user and the model. You start a conversation by sending a message as a user (user role) to the model. The model, acting as an assistant (assistant role), then generates a response that it returns in a message. If desired, you can continue the conversation by sending further user role messages to the model. To maintain the conversation context, be sure to include any assistant role messages that you receive from the model in subsequent requests.

You provide the messages that you want to pass to a model in the messages field, which maps to an array of Message objects. Each Message contains the content for the message and the role that the message plays in the conversation.

Note

Amazon Bedrock doesn't store any text, images, or documents that you provide as content. The data is only used to generate the response. When using the Converse API, any document that you provide must be uncompressed, decoded, and less than 4.5 MB in size.

You add the content for the message in the content field, which maps to an array of ContentBlock objects. Within each ContentBlock, you can specify one of the following fields (to see which models support which modalities, see Supported models and model features):

text

The text field maps to a string specifying the prompt. The text field is interpreted alongside other fields that are specified in the same ContentBlock.

(Optional) For certain models, you can add cache checkpoints using cachePoint fields to use prompt caching. Prompt caching lets you cache the context of a conversation to reduce cost and latency. For more information, see Prompt caching for faster model inference.

Note

Amazon Bedrock prompt caching is currently only available to a select number of customers. To learn more about participating in the preview, see Amazon Bedrock prompt caching.

The following shows a Message object with a content array containing only a text ContentBlock:

{ "role": "user | assistant", "content": [ { "text": "string" } ] }

The following shows a Message object with a content array containing a text ContentBlock and an optional cachePoint field. The content in the text ContentBlock is added to the cache as a result.

{ "role": "user | assistant", "content": [ { "text": "string" }, { "cachePoint": { "type": "default" } } ] }
image

The image field maps to an ImageBlock. Pass the raw bytes, encoded in base64, for an image in the bytes field. If you use an AWS SDK, you don't need to encode the bytes in base64.

If you exclude the text field, the model describes the image.

(Optional) For certain models, you can add cache checkpoints using cachePoint fields to use prompt caching. Prompt caching lets you cache the context of a conversation to reduce cost and latency. For more information, see Prompt caching for faster model inference.

Note

Amazon Bedrock prompt caching is currently only available to a select number of customers. To learn more about participating in the preview, see Amazon Bedrock prompt caching.

The following shows a Message object with a content array containing only an image ContentBlock:

{ "role": "user", "content": [ { "image": { "format": "png | jpeg | gif | webp", "source": { "bytes": "image in bytes" } } } ] }

The following shows a Message object with a content array containing an image ContentBlock and an optional cachePoint field. The image content is added to the cache as a result.

{ "role": "user", "content": [ { "image": { "format": "png | jpeg | gif | webp", "source": { "bytes": "image in bytes" } } }, { "cachePoint": { "type": "default" } } ] }
document

The document field maps to a DocumentBlock. If you include a DocumentBlock, check that your request conforms to the following restrictions:

  • In the content field of the Message object, you must also include a text field with a prompt related to the document.

  • Pass the raw bytes, encoded in base64, for the document in the bytes field. If you use an AWS SDK, you don't need to encode the document bytes in base64.

  • The name field can only contain the following characters:

    • Alphanumeric characters

    • Whitespace characters (no more than one in a row)

    • Hyphens

    • Parentheses

    • Square brackets

    Note

    The name field is vulnerable to prompt injections, because the model might inadvertently interpret it as instructions. Therefore, we recommend that you specify a neutral name.

(Optional) For certain models, you can add cache checkpoints using cachePoint fields to use prompt caching. Prompt caching lets you cache the context of a conversation to reduce cost and latency. For more information, see Prompt caching for faster model inference.

Note

Amazon Bedrock prompt caching is currently only available to a select number of customers. To learn more about participating in the preview, see Amazon Bedrock prompt caching.

The following shows a Message object with a content array containing a document ContentBlock and the required accompanying text ContentBlock.

{ "role": "user", "content": [ { "text": "string" }, { "document": { "format": "pdf | csv | doc | docx | xls | xlsx | html | txt | md", "name": "string", "source": { "bytes": "document in bytes" } } } ] }

The following shows a Message object with a content array containing a document ContentBlock and a required accompanying text ContentBlock, as well as a cachePoint that adds both the document and text contents to the cache.

{ "role": "user", "content": [ { "text": "string" }, { "document": { "format": "pdf | csv | doc | docx | xls | xlsx | html | txt | md", "name": "string", "source": { "bytes": "document in bytes" } } }, { "cachePoint": { "type": "default" } } ] }
video

The video field maps to a VideoBlock. Pass the raw bytes in the bytes field, encoded in base64. If you use the AWS SDK, you don't need to encode the bytes in base64.

If you don't include the text field, the model will describe the video.

The following shows a Message object with a content array containing only a video ContentBlock.

{ "role": "user", "content": [ { "video": { "format": "mov | mkv | mp4 | webm | flv | mpeg | mpg | wmv | three_gp", "source": { "bytes": "video in bytes" } } } ] }

Note that for files with a .3gp extension, the format needs to be specified as three_gp.

You can also pass a video through an Amazon S3 URI instead of passing the bytes directly in the request body. The following shows a Message object with a content array containing only a video ContentBlock with the video source passed through an Amazon S3 URI.

{ "role": "user", "content": [ { "video": { "format": "mov | mkv | mp4 | webm | flv | mpeg | mpg | wmv | three_gp", "source": { "s3Location": { "uri": "s3 uri", "bucketOwner": "s3 uri bucket owner" } } } } ] }

The s3Location parameter is only supported in the US East (N. Virginia) region.

Note

The assumed role must have the s3:GetObject permission to the Amazon S3 URI. The bucketOwner field is optional but must be specified if the account making the request does not own the bucket the Amazon S3 URI is found in.
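
Putting this together, a Python (boto3) sketch that references a video in Amazon S3 might look like the following. The model ID, S3 URI, and account ID are placeholders; use a model that supports video, and note the US East (N. Virginia) requirement above.

import boto3

# s3Location is only supported in US East (N. Virginia).
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Describe this video."},
                {
                    "video": {
                        "format": "mp4",
                        "source": {
                            "s3Location": {
                                # Placeholder URI; the caller needs s3:GetObject on it.
                                "uri": "s3://amzn-s3-demo-bucket/video.mp4",
                                # Required only if another account owns the bucket.
                                "bucketOwner": "111122223333",
                            }
                        },
                    }
                },
            ],
        }
    ],
)
print(response["output"]["message"]["content"][0]["text"])
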

The other fields in ContentBlock are for tool use.

You specify the role in the role field. The role can be one of the following:

  • user — The human that is sending messages to the model.

  • assistant — The model that is sending messages back to the human user.

Note

The following restrictions pertain to the content field:

  • You can include up to 20 images. Each image's size, height, and width must be no more than 3.75 MB, 8,000 px, and 8,000 px, respectively.

  • You can include up to five documents. Each document's size must be no more than 4.5 MB.

  • You can only include images and documents if the role is user.

In the following messages example, the user asks for a list of three pop songs, and the model generates a list of songs.

[ { "role": "user", "content": [ { "text": "Create a list of 3 pop songs." } ] }, { "role": "assistant", "content": [ { "text": "Here is a list of 3 pop songs by artists from the United Kingdom:\n\n1. \"As It Was\" by Harry Styles\n2. \"Easy On Me\" by Adele\n3. \"Unholy\" by Sam Smith and Kim Petras" } ] } ]

A system prompt is a type of prompt that provides instructions or context to the model about the task it should perform, or the persona it should adopt during the conversation. You can specify a list of system prompts for the request in the system (SystemContentBlock) field, as shown in the following example.

[ { "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. " } ]

You can also optionally add cache checkpoints to the system or tools fields to use prompt caching, depending on which model you're using. For more information, see Prompt caching for faster model inference.

Note

Amazon Bedrock prompt caching is currently only available to a select number of customers. To learn more about participating in the preview, see Amazon Bedrock prompt caching.

Inference parameters

The Converse API supports a base set of inference parameters that you set in the inferenceConfig field (InferenceConfiguration). The base inference parameters are:

  • maxTokens – The maximum number of tokens to allow in the generated response.

  • stopSequences – A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.

  • temperature – The likelihood of the model selecting higher-probability options while generating a response.

  • topP – The percentage of most-likely candidates that the model considers for the next token.

For more information, see Influence response generation with inference parameters.

The following example JSON sets the temperature inference parameter.

{"temperature": 0.5}

If the model you are using has additional inference parameters, you can set those parameters by specifying them as JSON in the additionalModelRequestFields field. The following example JSON shows how to set top_k, which is available in Anthropic Claude models, but isn't a base inference parameter in the messages API.

{"top_k": 200}

You can specify the paths for additional model parameters in the additionalModelResponseFieldPaths field, as shown in the following example.

[ "/stop_sequence" ]

The API returns the additional fields that you request in the additionalModelResponseFields field.
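
Putting these together, a Python (boto3) sketch might pass all three fields in one call. The model ID is a placeholder, and top_k applies to Anthropic Claude models.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[
        {"role": "user", "content": [{"text": "Create a list of 3 pop songs."}]}
    ],
    # Base inference parameters go in inferenceConfig.
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
    # Model-specific parameters, such as Anthropic Claude's top_k, go here.
    additionalModelRequestFields={"top_k": 200},
    # Request extra response fields by path.
    additionalModelResponseFieldPaths=["/stop_sequence"],
)
print(response.get("additionalModelResponseFields"))
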

Response

The response you get from the Converse API depends on which operation you call, Converse or ConverseStream.

Converse response

In the response from Converse, the output field (ConverseOutput) contains the message (Message) that the model generates. The message content is in the content (ContentBlock) field and the role (user or assistant) that the message corresponds to is in the role field.

If you used prompt caching, then in the usage field, cacheReadInputTokensCount and cacheWriteInputTokensCount tell you how many total tokens were read from the cache and written to the cache, respectively.

The metrics field (ConverseMetrics) includes metrics for the call. To determine why the model stopped generating content, check the stopReason field. You can get information about the tokens passed to the model in the request, and the tokens generated in the response, by checking the usage field (TokenUsage). If you specified additional response fields in the request, the API returns them as JSON in the additionalModelResponseFields field.

The following example shows the response from Converse when you pass the prompt discussed in Request.

{ "output": { "message": { "role": "assistant", "content": [ { "text": "Here is a list of 3 pop songs by artists from the United Kingdom:\n\n1. \"Wannabe\" by Spice Girls\n2. \"Bitter Sweet Symphony\" by The Verve \n3. \"Don't Look Back in Anger\" by Oasis" } ] } }, "stopReason": "end_turn", "usage": { "inputTokens": 125, "outputTokens": 60, "totalTokens": 185 }, "metrics": { "latencyMs": 1175 } }

ConverseStream response

If you call ConverseStream to stream the response from a model, the stream is returned in the stream response field. The stream emits the following events in this order.

  1. messageStart (MessageStartEvent). The start event for a message. Includes the role for the message.

  2. contentBlockStart (ContentBlockStartEvent). A content block start event. Tool use only.

  3. contentBlockDelta (ContentBlockDeltaEvent). A content block delta event. Includes the partial text that the model generates or the partial input JSON for tool use.

  4. contentBlockStop (ContentBlockStopEvent). A content block stop event.

  5. messageStop (MessageStopEvent). The stop event for the message. Includes the reason why the model stopped generating output.

  6. metadata (ConverseStreamMetadataEvent). Metadata for the request. The metadata includes the token usage in usage (TokenUsage) and metrics for the call in metrics (ConverseStreamMetrics).

ConverseStream streams a complete content block as a ContentBlockStartEvent event, one or more ContentBlockDeltaEvent events, and a ContentBlockStopEvent event. Use the contentBlockIndex field as an index to correlate the events that make up a content block.
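
A Python (boto3) sketch that consumes the event stream might look like the following, with a placeholder model ID:

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[
        {"role": "user", "content": [{"text": "Create a list of 3 pop songs."}]}
    ],
)

# Iterate over the event stream, printing text deltas as they arrive.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"]["delta"]
        if "text" in delta:
            print(delta["text"], end="")
    elif "messageStop" in event:
        print("\nstopReason:", event["messageStop"]["stopReason"])
    elif "metadata" in event:
        print("usage:", event["metadata"]["usage"])
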

The following example is a partial response from ConverseStream.

{'messageStart': {'role': 'assistant'}}
{'contentBlockDelta': {'delta': {'text': ''}, 'contentBlockIndex': 0}}
{'contentBlockDelta': {'delta': {'text': ' Title'}, 'contentBlockIndex': 0}}
{'contentBlockDelta': {'delta': {'text': ':'}, 'contentBlockIndex': 0}}
. . .
{'contentBlockDelta': {'delta': {'text': ' The'}, 'contentBlockIndex': 0}}
{'messageStop': {'stopReason': 'max_tokens'}}
{'metadata': {'usage': {'inputTokens': 47, 'outputTokens': 20, 'totalTokens': 67}, 'metrics': {'latencyMs': 100.0}}}