Submit a single prompt with InvokeModel
Run inference on a model through the API by sending an InvokeModel or InvokeModelWithResponseStream request. To check whether a model supports streaming, send a GetFoundationModel or ListFoundationModels request and check the value in the responseStreamingSupported field.
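The following is a minimal sketch (not part of the original examples) of checking the responseStreamingSupported field with the boto3 Amazon Bedrock control-plane client; the model ID is only an illustration.

```python
import boto3

# Control-plane client ("bedrock"), not the runtime client ("bedrock-runtime").
bedrock = boto3.client(service_name='bedrock')

# Look up a single model and read whether it supports streaming responses.
model = bedrock.get_foundation_model(modelIdentifier='anthropic.claude-v2')
print(model['modelDetails'].get('responseStreamingSupported'))
```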
The following fields are required:
Field | Use case |
---|---|
modelId | To specify the model, inference profile, or prompt from Prompt management to use. To learn how to find this value, see Submit prompts and generate responses using the API. |
body | To specify the inference parameters for a model. To see inference parameters for different models, see Inference request parameters and response fields for foundation models. If you specify a prompt from Prompt management in the modelId field, omit this field (if you include it, it will be ignored). |
The following fields are optional:
Field | Use case |
---|---|
accept | To specify the media type for the response body. For more information, see Media Types on the Swagger website. |
contentType | To specify the media type for the request body. For more information, see Media Types on the Swagger website. |
explicitPromptCaching | To specify whether prompt caching is enabled or disabled. For more information, see Prompt caching for faster model inference. |
guardrailIdentifier | To specify a guardrail to apply to the prompt and response. For more information, see Test a guardrail. |
guardrailVersion | To specify a guardrail to apply to the prompt and response. For more information, see Test a guardrail. |
trace | To specify whether to return the trace for the guardrail you specify. For more information, see Test a guardrail. |
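The following sketch shows how these fields map onto a boto3 invoke_model call. The guardrail identifier and version are placeholder values, and the prompt body assumes the Anthropic Claude 2 text-completion format; substitute the values for your own model and guardrail.

```python
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

# Inference parameters for the model (Anthropic Claude text-completion format).
body = json.dumps({
    'prompt': '\n\nHuman: Summarize the benefits of solar power.\n\nAssistant:',
    'max_tokens_to_sample': 300
})

response = brt.invoke_model(
    modelId='anthropic.claude-v2',           # model, inference profile, or prompt ARN
    body=body,
    contentType='application/json',           # media type of the request body
    accept='application/json',                # media type to return in the response body
    guardrailIdentifier='your-guardrail-id',  # placeholder guardrail ID
    guardrailVersion='1',                     # placeholder guardrail version
    trace='ENABLED'                           # return the guardrail trace
)

print(json.loads(response['body'].read()))
```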
Invoke model code examples
The following examples show how to run inference with the InvokeModel API. For examples with different models, see the inference parameter reference for the desired model (Inference request parameters and response fields for foundation models).
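The following is a minimal non-streaming sketch with Python (boto3). It reuses the prompt and inference parameters from the streaming example below and assumes you have access to Anthropic Claude 2 in your Region.

```python
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

# Inference parameters in the Anthropic Claude text-completion format.
body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})

response = brt.invoke_model(
    modelId='anthropic.claude-v2',
    body=body
)

# The response body is a stream; read it fully and decode the JSON payload.
response_body = json.loads(response['body'].read())
print(response_body.get('completion'))
```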
Invoke model with streaming code example
Note
The AWS CLI does not support streaming.
The following example shows how to use the InvokeModelWithResponseStream API to generate streaming text with Python, using the prompt "write an essay for living on mars in 1000 words".
```python
import boto3
import json

brt = boto3.client(service_name='bedrock-runtime')

# Inference parameters in the Anthropic Claude text-completion format.
body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})

response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# Iterate over the event stream and print each chunk as it arrives.
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
```
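For Anthropic Claude text models, each decoded chunk is a JSON object whose completion field contains the next piece of generated text; concatenating the completion values in order reconstructs the full response.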