Influence model responses with inference parameters - Amazon Bedrock Studio

Amazon Bedrock Studio has been renamed Amazon Bedrock IDE and is now available in Amazon SageMaker Unified Studio. Amazon Bedrock Studio remains available until February 28, 2025: you can access existing workspaces in this previous version through that date, but you cannot create new workspaces. To access the enhanced GA version with additional features and capabilities, create a new Amazon SageMaker Unified Studio domain. To learn about Amazon Bedrock IDE, see the documentation.


Inference parameters are values that you can adjust to limit or influence how a model generates a response to a prompt. For example, in the chat app you create in Build a chat app with Amazon Bedrock Studio, you can use inference parameters to adjust the randomness and diversity of the songs that the model generates for a playlist.

You can apply inference parameters to models you use in explore mode, chat apps, and Flows apps.
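When you move beyond the Studio UI, the same parameters can be set programmatically. The sketch below assembles an `inferenceConfig` in the shape the Bedrock Converse API expects (`temperature`, `topP`, `maxTokens`); the model ID and the `top_k` field name are illustrative assumptions, since Top K is provider-specific and is passed through `additionalModelRequestFields` rather than `inferenceConfig`.

```python
# Minimal sketch: assembling inference parameters for a Bedrock request.
# Field names follow the Converse API's inferenceConfig; the model ID and
# "top_k" key below are assumptions for illustration.

def build_inference_config(temperature: float, top_p: float, max_tokens: int) -> dict:
    """Assemble the inferenceConfig structure for a Converse API request."""
    return {"temperature": temperature, "topP": top_p, "maxTokens": max_tokens}

config = build_inference_config(temperature=0.2, top_p=0.8, max_tokens=512)

# With boto3 installed and AWS credentials configured, the request would
# look roughly like this (commented out so the sketch runs stand-alone):
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
#     messages=[{"role": "user",
#                "content": [{"text": "Suggest three songs for a road trip."}]}],
#     inferenceConfig=config,
#     additionalModelRequestFields={"top_k": 50},  # provider-specific Top K
# )
print(config)
```

Lower `temperature` and `topP` values here would steer the playlist toward safe, predictable suggestions; higher values make the output more varied.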

Randomness and diversity

For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response.

  • Temperature – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs.

    • Choose a lower value to influence the model to select higher-probability outputs.

    • Choose a higher value to influence the model to select lower-probability outputs.

    In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses.

  • Top K – The number of most-likely candidates that the model considers for the next token.

    • Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.

    • Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

    For example, if you choose a value of 50 for Top K, the model selects from 50 of the most probable tokens that could be next in the sequence.

  • Top P – The percentage of most-likely candidates that the model considers for the next token.

    • Choose a lower value to decrease the size of the pool and limit the options to more likely outputs.

    • Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs.

    In technical terms, the model computes the cumulative probability distribution for the set of responses and considers only the top P% of the distribution.

    For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence.
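The three parameters above can be mimicked in a few lines of plain Python. This is a sketch of what the model does internally when it reshapes the next-token distribution, not a Bedrock API call: temperature rescales the logits before the softmax, Top K keeps the k most likely tokens, and Top P keeps the smallest set of tokens whose cumulative probability reaches the threshold (each filtered set is renormalized).

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature: lower T steepens the distribution,
    higher T flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. A tiny tolerance guards against
    floating-point rounding at the cutoff."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cum += probs[i]
        if cum >= p - 1e-9:
            break
    total = sum(kept.values())
    return {i: q / total for i, q in kept.items()}
```

For instance, `apply_temperature([2.0, 1.0, 0.5], 0.5)` concentrates almost all of the probability mass on the first token, while a temperature of 2.0 spreads the mass more evenly across all three.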

The following table summarizes the effects of these parameters.

| Parameter   | Effect of lower value                                                                                   | Effect of higher value                                                                                  |
|-------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
| Temperature | Increases likelihood of higher-probability tokens; decreases likelihood of lower-probability tokens      | Increases likelihood of lower-probability tokens; decreases likelihood of higher-probability tokens      |
| Top K       | Removes lower-probability tokens                                                                          | Allows lower-probability tokens                                                                           |
| Top P       | Removes lower-probability tokens                                                                          | Allows lower-probability tokens                                                                           |

To understand how these parameters work, consider the example prompt "I hear the hoof beats of". Suppose the model determines the following three words to be candidates for the next token and assigns a probability to each.

{ "horses": 0.7, "zebras": 0.2, "unicorns": 0.1 }

  • If you set a high temperature, the probability distribution flattens and the probabilities move closer together, which increases the chance of the model choosing "unicorns" and decreases the chance of it choosing "horses".

  • If you set Top K as 2, the model only considers the top 2 most likely candidates: "horses" and "zebras."

  • If you set Top P as 0.7, the model only considers "horses" because it is the only candidate that lies in the top 70% of the probability distribution. If you set Top P as 0.9, the model considers "horses" and "zebras" as they lie in the top 90% of probability distribution.
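The Top K and Top P cutoffs described above can be checked by hand with a short script. This sketch applies both filters to the example distribution using only the standard library; it is a hand calculation, not a Bedrock call, and the small tolerance in the Top P loop guards against floating-point rounding at the cutoff.

```python
probs = {"horses": 0.7, "zebras": 0.2, "unicorns": 0.1}

# Top K = 2: keep the 2 most likely candidates.
top_k_2 = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:2])

def top_p(dist, p):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches p (with a tiny rounding tolerance)."""
    kept, cum = {}, 0.0
    for token, prob in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cum += prob
        if cum >= p - 1e-9:
            break
    return kept

print(top_k_2)            # {'horses': 0.7, 'zebras': 0.2}
print(top_p(probs, 0.7))  # {'horses': 0.7}
print(top_p(probs, 0.9))  # {'horses': 0.7, 'zebras': 0.2}
```

The output matches the walkthrough: Top K = 2 drops "unicorns", Top P = 0.7 keeps only "horses", and Top P = 0.9 keeps "horses" and "zebras".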