Amazon Bedrock Studio is in preview release and is subject to change.
Influence model responses with inference parameters
Inference parameters are values that you can adjust to limit or influence how a model generates a response to a prompt. For example, in the chat app you create in Build a chat app with Amazon Bedrock Studio, you can use inference parameters to adjust the randomness and diversity of the songs that the model generates for a playlist.
You can apply inference parameters to models you use in explore mode, chat apps, and Prompt flows apps.
Randomness and diversity
For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response.
- Temperature – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs.
  - Choose a lower value to influence the model to select higher-probability outputs.
  - Choose a higher value to influence the model to select lower-probability outputs.

  In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses.
- Top K – The number of most-likely candidates that the model considers for the next token.
  - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  For example, if you choose a value of 50 for Top K, the model selects from the 50 most probable tokens that could be next in the sequence.
- Top P – The cumulative probability of the most-likely candidates that the model considers for the next token.
  - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.
  - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

  In technical terms, the model ranks the candidate tokens by probability, computes their cumulative probability, and considers only the most likely tokens whose cumulative probability reaches P.

  For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence.
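
To make the temperature description above concrete, the following is a minimal Python sketch, not the implementation that any particular foundation model uses. It assumes that the temperature divides the model's raw scores before they are normalized, which is equivalent to raising each probability to the power 1/T and renormalizing. The function name `apply_temperature` and the sample probabilities are illustrative only.

```python
def apply_temperature(probs, temperature):
    """Rescale a list of probabilities by a temperature greater than 0.

    Assumption: this mirrors dividing logits by the temperature before a
    softmax, that is p_i ** (1 / T) followed by renormalization.
    """
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


probs = [0.7, 0.2, 0.1]  # illustrative next-token probabilities

sharper = apply_temperature(probs, 0.5)  # lower temperature steepens
flatter = apply_temperature(probs, 2.0)  # higher temperature flattens

print([round(p, 3) for p in sharper])
print([round(p, 3) for p in flatter])
```

In this sketch, the most likely token gains probability at the lower temperature and loses probability at the higher one, which matches the steepening and flattening described for Temperature.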
The following table summarizes the effects of these parameters.
Parameter | Effect of lower value | Effect of higher value |
---|---|---|
Temperature | Increase likelihood of higher-probability tokens; decrease likelihood of lower-probability tokens | Increase likelihood of lower-probability tokens; decrease likelihood of higher-probability tokens |
Top K | Remove lower-probability tokens | Allow lower-probability tokens |
Top P | Remove lower-probability tokens | Allow lower-probability tokens |
To understand these parameters, consider the example prompt `I hear the hoof beats of "`. Let's say that the model determines the following three words to be candidates for the next token. The model also assigns a probability for each word.
{ "horses": 0.7, "zebras": 0.2, "unicorns": 0.1 }
- If you set a high temperature, the probability distribution is flattened and the probabilities move closer together, which would increase the probability of choosing "unicorns" and decrease the probability of choosing "horses".
- If you set Top K as 2, the model only considers the top 2 most likely candidates: "horses" and "zebras".
- If you set Top P as 0.7, the model only considers "horses" because it is the only candidate that lies in the top 70% of the probability distribution. If you set Top P as 0.9, the model considers "horses" and "zebras" because they lie in the top 90% of the probability distribution.
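
The Top K and Top P cases above can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not how any specific model is implemented: the helper names `top_k_filter` and `top_p_filter` are hypothetical, and renormalizing the surviving probabilities so that they sum to 1 is an assumption about what happens before sampling.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, then renormalize (assumed step)."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k])
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability
    reaches p, then renormalize (assumed step)."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        # Small tolerance so that 0.7 + 0.2 still counts as reaching 0.9
        # despite floating-point rounding.
        if cumulative >= p - 1e-9:
            break
    total = sum(kept.values())
    return {token: q / total for token, q in kept.items()}


candidates = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}

print(list(top_k_filter(candidates, 2)))    # ['horses', 'zebras']
print(list(top_p_filter(candidates, 0.7)))  # ['horses']
print(list(top_p_filter(candidates, 0.9)))  # ['horses', 'zebras']
```

The sketch only narrows the candidate pool. Drawing the next token from the filtered distribution, for example with `random.choices`, would be a separate step.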