This section provides inference parameters and code examples for using Anthropic Claude models with the Text Completions API.
Anthropic Claude Text Completions API overview
Use the Text Completions API for single-turn text generation from a user-supplied prompt. For example, you can use the Text Completions API to generate text for a blog post or to summarize text input from a user.
For information about creating prompts for Anthropic Claude models, see Introduction to prompt design in the Anthropic Claude documentation.
Supported models
You can use the Text Completions API with the following Anthropic Claude models:

- Anthropic Claude Instant v1.2
- Anthropic Claude v2
- Anthropic Claude v2.1
Request and Response
The request body is passed in the body field of a request to InvokeModel or InvokeModelWithResponseStream. For more information, see https://docs.anthropic.com/claude/reference/complete_post in the Anthropic Claude documentation.

Anthropic Claude has the following inference parameters for a Text Completions inference call.
{
    "prompt": "\n\nHuman: <prompt>\n\nAssistant:",
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "max_tokens_to_sample": int,
    "stop_sequences": [string]
}
The following are required parameters.
- prompt – (Required) The prompt that you want Claude to complete. For proper response generation, you need to format your prompt using alternating \n\nHuman: and \n\nAssistant: conversational turns (see the sketch after this list). For example:

  "\n\nHuman: {userQuestion}\n\nAssistant:"

  For more information, see Prompt validation in the Anthropic Claude documentation.

- max_tokens_to_sample – (Required) The maximum number of tokens to generate before stopping. We recommend a limit of 4,000 tokens for optimal performance. Note that Anthropic Claude models might stop generating tokens before reaching the value of max_tokens_to_sample. Different Anthropic Claude models have different maximum values for this parameter. For more information, see Model comparison in the Anthropic Claude documentation.

  Default  Minimum  Maximum
  200      0        4096
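The \n\nHuman: and \n\nAssistant: turn format also extends to multi-turn conversations. The following is a minimal sketch of assembling a valid prompt string in Python; the build_prompt helper is illustrative, not part of any API.

import json

def build_prompt(turns, user_question):
    # turns: list of (human_text, assistant_text) pairs from earlier exchanges.
    # The prompt must alternate \n\nHuman: and \n\nAssistant: markers and end
    # with an open \n\nAssistant: turn for the model to complete.
    prompt = ""
    for human, assistant in turns:
        prompt += f"\n\nHuman: {human}\n\nAssistant: {assistant}"
    prompt += f"\n\nHuman: {user_question}\n\nAssistant:"
    return prompt

body = json.dumps({
    "prompt": build_prompt(
        [("What is a black hole?", "A black hole is a region of spacetime...")],
        "How do black holes form?"),
    "max_tokens_to_sample": 300
})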
The following are optional parameters.
- stop_sequences – (Optional) Sequences that cause the model to stop generating. Anthropic Claude models stop on "\n\nHuman:", and might include additional built-in stop sequences in the future. Use the stop_sequences inference parameter to include additional strings that signal the model to stop generating text (see the request sketch after this list).

- temperature – (Optional) The amount of randomness injected into the response. Use a value closer to 0 for analytical or multiple-choice tasks, and a value closer to 1 for creative and generative tasks.

  Default  Minimum  Maximum
  1        0        1

- top_p – (Optional) Use nucleus sampling. In nucleus sampling, Anthropic Claude computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by top_p. You should alter either temperature or top_p, but not both.

  Default  Minimum  Maximum
  1        0        1

- top_k – (Optional) Only sample from the top K options for each subsequent token. Use top_k to remove long-tail, low-probability responses.

  Default  Minimum  Maximum
  250      0        500
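The following is a minimal sketch of how the optional parameters fit into a request body; the values are illustrative, not recommendations. It sets top_k rather than top_p, because temperature and top_p should not be altered together.

import json

# Illustrative request body combining optional inference parameters.
body = json.dumps({
    "prompt": "\n\nHuman: Write a haiku about the ocean.\n\nAssistant:",
    "max_tokens_to_sample": 200,
    "temperature": 0.5,             # lean analytical rather than creative
    "top_k": 250,                   # default value, shown for illustration
    "stop_sequences": ["</haiku>"]  # illustrative custom stop string
})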
Code examples
These examples show how to call the Anthropic Claude v2 model with on-demand throughput. To use Anthropic Claude v2.1, change the value of modelId to anthropic.claude-v2:1.
import boto3
import json

# Create a Bedrock Runtime client.
brt = boto3.client(service_name='bedrock-runtime')

# Build the Text Completions request body.
body = json.dumps({
    "prompt": "\n\nHuman: explain black holes to 8th graders\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.1,
    "top_p": 0.9,
})

modelId = 'anthropic.claude-v2'
accept = 'application/json'
contentType = 'application/json'

response = brt.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

# Print the generated text.
print(response_body.get('completion'))
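The response body is a JSON object; the generated text is returned in its completion field, which the final line prints.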
The following example shows how to generate streaming text with Python using the prompt write an essay for living on mars in 1000 words and the Anthropic Claude v2 model:
import boto3
import json

# Create a Bedrock Runtime client.
brt = boto3.client(service_name='bedrock-runtime')

# Build the Text Completions request body.
body = json.dumps({
    'prompt': '\n\nHuman: write an essay for living on mars in 1000 words\n\nAssistant:',
    'max_tokens_to_sample': 4000
})

# Invoke the model with a streaming response.
response = brt.invoke_model_with_response_stream(
    modelId='anthropic.claude-v2',
    body=body
)

# Read and print each chunk as it arrives.
stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))
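Each decoded chunk is a JSON object. As a minimal variation, assuming each chunk carries the generated text in the same completion field as the non-streaming response, you can print just the text as it arrives:

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            data = json.loads(chunk.get('bytes').decode())
            # Print only the text delta for this chunk (assumed field name).
            print(data.get('completion', ''), end='')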