Submit prompts and generate responses with model inference
Inference refers to the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence. Given an input, the model predicts a probable sequence of tokens that follows and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide the following inputs (a minimal API sketch follows this list):
- Prompt – An input provided to the model in order for it to generate a response. For information about writing prompts, see Prompt engineering concepts. For information about protecting against prompt injection attacks, see Prompt injection security.
- Model – A foundation model or inference profile to run inference with. The model or inference profile that you choose also specifies a level of throughput, which defines the number and rate of input and output tokens that you can process. For more information about the foundation models that are available in Amazon Bedrock, see Amazon Bedrock foundation model information. For more information about inference profiles, see Set up a model invocation resource using inference profiles. For more information about throughput, see Increase throughput for resiliency and processing power.
- Inference parameters – A set of values that can be adjusted to limit or influence the model response. For information about inference parameters, see Influence response generation with inference parameters and Inference request parameters and response fields for foundation models.
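The following sketch shows how these three inputs map to a Converse API request using the AWS SDK for Python (Boto3). The model ID, prompt text, and parameter values are illustrative placeholders; check which models and parameters are available in your Region before using them.

```python
import boto3

# The Bedrock Runtime client handles model inference requests.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    # Model: the foundation model (or inference profile) to run inference with.
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    # Prompt: the input for the model to respond to.
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the benefits of managed model inference in three bullet points."}],
        }
    ],
    # Inference parameters: values that limit or influence the response.
    inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
)

print(response["output"]["message"]["content"][0]["text"])
```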
Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.
| Output modality | Description | Example use cases |
| --- | --- | --- |
| Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting |
| Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation |
| Embeddings | Provide text, images, or both text and images and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). See the sketch after this table. | Text and image search, query, categorization, recommendations, personalization, knowledge base creation |
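As a rough sketch of the embeddings modality, the example below invokes an embeddings model and compares two output vectors with cosine similarity. The model ID (amazon.titan-embed-text-v2:0) and the request and response body fields are assumptions for illustration; the exact fields are model-specific, as described in Inference request parameters and response fields for foundation models.

```python
import json
import math
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text):
    # Request/response fields assume an Amazon Titan text embeddings model;
    # other embeddings models expect different fields.
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = embed("How do I reset my password?")
v2 = embed("Steps to recover account access")
# Higher values indicate closer semantic similarity between the two inputs.
print(cosine_similarity(v1, v2))
```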
You can directly run model inference in the following ways:
- In the AWS Management Console, use any of the Amazon Bedrock playgrounds to run inference in a user-friendly graphical interface.
- Use the Converse or ConverseStream API to implement conversational applications.
- Use the InvokeModel or InvokeModelWithResponseStream API to submit a single prompt (see the sketch after this list).
- Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request.
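As a rough sketch of the single-prompt path, the example below submits one prompt with InvokeModel. Unlike Converse, the request and response bodies are model-specific; the shape shown assumes an Anthropic Claude messages-format model, and the model ID is a placeholder, so check Inference request parameters and response fields for foundation models for the model you use.

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# The body format below is specific to Anthropic Claude models on Amazon Bedrock;
# other model providers expect different request fields.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Write a haiku about inference."}]}
    ],
})

response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # Placeholder; use a model available in your Region.
    body=body,
    contentType="application/json",
    accept="application/json",
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```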
The following Amazon Bedrock features also use model inference as a step in a larger workflow:
- Model evaluation uses the model invocation process to evaluate the performance of different models after you submit a CreateEvaluationJob request.
- Knowledge bases use model invocation when you call the RetrieveAndGenerate API to generate a response based on results retrieved from a knowledge base (see the sketch after this list).
- Agents use model invocation to generate responses in various stages during an InvokeAgent request.
- Flows include Amazon Bedrock resources, such as prompts, knowledge bases, and agents, which use model invocation.
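As a hedged sketch of how a knowledge base uses model invocation, the example below calls RetrieveAndGenerate through the Agents for Amazon Bedrock runtime client. The knowledge base ID and model ARN are placeholders that you would replace with your own resources.

```python
import boto3

# RetrieveAndGenerate is served by the Agents for Amazon Bedrock runtime endpoint.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # Placeholder knowledge base ID.
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The generated answer is grounded in passages retrieved from the knowledge base.
print(response["output"]["text"])
```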
After testing out different foundation models with different prompts and inference parameters, you can configure your application to call these APIs with your desired specifications.
Topics
- Influence response generation with inference parameters
- Supported Regions and models for running model inference
- Prerequisites for running model inference
- Generate responses in the console using playgrounds
- Optimize model inference for latency
- Submit prompts and generate responses using the API
- Use a tool to complete an Amazon Bedrock model response
- Use a computer use tool to complete an Amazon Bedrock model response
- Prompt caching for faster model inference