Amazon SageMaker Unified Studio is in preview release and is subject to change.
What is a prompt?
A prompt is input you send to a model in order for it to generate a response, in a process known as inference. For example, you could send the following prompt to a model.
What is Avebury stone circle?
When you send this prompt as a request to a model, you get a response similar to the following.
Avebury stone circle is a Neolithic monument located in Wiltshire, England. It consists of a massive circular bank and ditch, with a large outer circle of standing stones that originally numbered around 100.
The actual response that you get for a prompt depends on the model you use.
Some models support multimodal prompts, which can include text, images, or video; the modalities supported vary by model. For example, you could pass an image to a model and ask questions such as What's in this image?. To try sending prompt requests to a model, see chat playground.
Some models can generate images from text prompts and edit existing images according to changes that you request in the prompt. To try generating an image, use the image playground.
With Amazon Bedrock models, you can use inference parameters to influence the response from a model. For example, you can use the temperature inference parameter to make lower-probability responses less likely.
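As an illustration, the following sketch shows one way inference parameters such as temperature and Top P might be passed through the Amazon Bedrock Runtime `converse` API. The model ID and parameter values here are illustrative placeholders, not recommendations; substitute a model available in your account and Region.

```python
def build_converse_request(prompt, temperature=0.5, top_p=0.9, max_tokens=512):
    """Assemble keyword arguments for the bedrock-runtime converse() call.

    Inference parameters go in the inferenceConfig field; the model ID
    below is illustrative only.
    """
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # illustrative
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,
            "maxTokens": max_tokens,
            "topP": top_p,
        },
    }

# With AWS credentials configured, the request could be sent like this:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**build_converse_request("What is Avebury stone circle?"))
# print(response["output"]["message"]["content"][0]["text"])
```

Lowering `temperature` here nudges the model toward higher-probability outputs for the same prompt; raising it produces more varied responses.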
In Amazon Bedrock IDE, you can create a chat agent app or a flow app. Apps take prompts as inputs. For example, a chat agent app takes a prompt as input and the model generates a response. You can continue the chat by sending further prompt requests to the app. To create a chat agent app, see Build a chat agent app with Amazon Bedrock IDE. If you create a flow app, you can also create reusable prompts that you can customize for different use cases. For more information, see Reuse and share prompts.
Influence model responses with inference parameters
Inference parameters are values that you can adjust to limit or influence how a model generates a response to a prompt. For example, in the chat agent app you create in Build a chat agent app with Amazon Bedrock IDE, you can use inference parameters to adjust the randomness and diversity of the songs that the model generates for a playlist.
You can apply inference parameters to models you use in explore mode, chat agent apps, and flow apps.
Randomness and diversity
For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response.
- Temperature – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs.
  - Choose a lower value to influence the model to select higher-probability outputs.
  - Choose a higher value to influence the model to select lower-probability outputs.

  In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses.

- Top K – The number of most-likely candidates that the model considers for the next token.
  - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  For example, if you choose a value of 50 for Top K, the model selects from the 50 most probable tokens that could be next in the sequence.

- Top P – The percentage of most-likely candidates that the model considers for the next token.
  - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  In technical terms, the model computes the cumulative probability distribution for the set of candidates and considers only the top P% of the distribution.

  For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence.
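To make the temperature effect concrete, here is a small, self-contained sketch of temperature-scaled softmax, the standard way temperature is applied to a model's raw scores (logits) before sampling. The logit values are invented for illustration; real models produce logits over their entire vocabulary.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into a probability distribution.

    Dividing the logits by the temperature before the softmax steepens
    the distribution (T < 1, more deterministic) or flattens it
    (T > 1, more random).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.5)  # steeper: top token dominates
base = softmax_with_temperature(logits, 1.0)  # unmodified distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter: closer to uniform
```

With a low temperature the most likely token captures even more of the probability mass; with a high temperature the mass spreads toward the less likely tokens, matching the table below.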
The following table summarizes the effects of these parameters.
Parameter | Effect of lower value | Effect of higher value
---|---|---
Temperature | Increase likelihood of higher-probability tokens; decrease likelihood of lower-probability tokens | Increase likelihood of lower-probability tokens; decrease likelihood of higher-probability tokens
Top K | Remove lower-probability tokens | Allow lower-probability tokens
Top P | Remove lower-probability tokens | Allow lower-probability tokens
As an example to understand these parameters, consider the prompt "I hear the hoof beats of". Let's say that the model determines the following three words to be candidates for the next token, and assigns a probability to each word.
{ "horses": 0.7, "zebras": 0.2, "unicorns": 0.1 }
- If you set a high temperature, the probability distribution is flattened and the probabilities move closer together, which increases the probability of choosing "unicorns" and decreases the probability of choosing "horses".
- If you set Top K as 2, the model only considers the top 2 most likely candidates: "horses" and "zebras".
- If you set Top P as 0.7, the model only considers "horses" because it is the only candidate that lies in the top 70% of the probability distribution. If you set Top P as 0.9, the model considers "horses" and "zebras" because they lie in the top 90% of the probability distribution.
Prompt engineering guides
Amazon Bedrock IDE provides models from a variety of model providers. Each provider offers guidance on how best to create prompts for its models.
- Anthropic Claude model prompt guide: https://docs.anthropic.com/claude/docs
- Anthropic Claude prompt engineering resources: https://docs.anthropic.com/claude/docs/guide-to-anthropics-prompt-engineering-resources
- Cohere prompt guide: https://txt.cohere.com/how-to-train-your-pet-llm-prompt-engineering
- AI21 Labs Jurassic model prompt guide: https://docs.ai21.com/docs/prompt-engineering
- Meta Llama 2 prompt guide: https://ai.meta.com/llama/get-started/#prompting
- Stability documentation: https://platform.stability.ai/docs/getting-started
- Mistral AI prompt guide: https://docs.mistral.ai/guides/prompting_capabilities/
For general guidelines about creating prompts with Amazon Bedrock, see General guidelines for Amazon Bedrock LLM users.