Submit prompts and generate responses with model inference

Inference refers to the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence: given an input, the model predicts a probable sequence of tokens that follows and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide a prompt and, optionally, inference parameters that influence the response the model generates.

Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.

Output modality | Description | Example use cases
Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting
Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation
Embeddings | Provide text, images, or both text and images and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, knowledge base creation
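
The Embeddings row above lends itself to a short example. The following is a minimal sketch, assuming boto3 with configured credentials and access to the Amazon Titan Text Embeddings model (amazon.titan-embed-text-v1) in us-east-1; it uses the InvokeModel API described later in this section, and the prompts, Region, and similarity helper are illustrative.

```python
import json
import math

import boto3

# Bedrock runtime client; the Region and model access are assumptions for this sketch.
client = boto3.client("bedrock-runtime", region_name="us-east-1")


def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text using Titan Text Embeddings."""
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors; values closer to 1 indicate higher similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


v1 = embed("How do I reset my password?")
v2 = embed("I forgot my login credentials.")
print(f"Similarity: {cosine_similarity(v1, v2):.3f}")
```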

When you run inference, you specify the throughput to use, either by selecting it in the console or by supplying it in the modelId field of an API request. Throughput defines the number and rate of input and output tokens that you can process. For more information, see Increase throughput for resiliency and processing power.

You can run model inference in the following ways.

  • Use any of the Playgrounds to run inference in a user-friendly graphical interface.

  • Use the Converse API (Converse and ConverseStream) to implement conversational applications (a sketch follows this list).

  • Send an InvokeModel or InvokeModelWithResponseStream request (a sketch follows this list).

  • Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request (a sketch follows this list).

  • Other Amazon Bedrock features, such as agents, use model inference as a step in a larger orchestration. Refer to those features' sections for more details.
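
For the Converse API bullet above, the following is a minimal sketch of a two-turn conversation, assuming boto3 with configured credentials; the model ID, Region, prompts, and inference parameters are illustrative, and any model you have access to that supports Converse can be substituted.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
model_id = "anthropic.claude-3-haiku-20240307-v1:0"  # illustrative; use any Converse-capable model

# First turn: a single user message.
messages = [{"role": "user", "content": [{"text": "Name three uses of text embeddings."}]}]
response = client.converse(
    modelId=model_id,
    messages=messages,
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)
assistant_message = response["output"]["message"]
print(assistant_message["content"][0]["text"])

# Second turn: append the assistant's reply and the next user message,
# then call converse() again with the full history.
messages.append(assistant_message)
messages.append({"role": "user", "content": [{"text": "Which of those is most common?"}]})
response = client.converse(modelId=model_id, messages=messages)
print(response["output"]["message"]["content"][0]["text"])
```

ConverseStream accepts the same request shape and returns the response incrementally as an event stream, which suits chat interfaces that display text as it is generated.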
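
For the InvokeModel bullet, the following is a minimal sketch of a single request. Unlike Converse, the request and response bodies follow each model provider's own schema; the fields shown are for the Amazon Titan Text models, and the model ID, Region, and prompt are illustrative.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Provider-specific request body (Amazon Titan Text schema).
body = {
    "inputText": "Summarize the benefits of vector embeddings in two sentences.",
    "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
}

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    contentType="application/json",
    accept="application/json",
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```

InvokeModelWithResponseStream takes the same request and returns the output in chunks as it is generated.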
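
For the batch inference bullet, the following is a minimal sketch of starting a job with CreateModelInvocationJob. The bucket names, IAM role ARN, job name, and model ID are placeholders; the input object is a JSONL file in which each line contains a recordId and a model-specific modelInput payload, and the role must allow Amazon Bedrock to read and write the S3 locations.

```python
import boto3

# Batch inference uses the Amazon Bedrock control-plane client ("bedrock"),
# not the runtime client. All identifiers below are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_invocation_job(
    jobName="example-batch-job",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    roleArn="arn:aws:iam::111122223333:role/ExampleBedrockBatchRole",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/input/prompts.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"}
    },
)

# Track progress with GetModelInvocationJob using the returned job ARN.
print(response["jobArn"])
```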

You can run inference with base models, custom models, or provisioned models. To run inference on a custom model, first purchase Provisioned Throughput for it (for more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock).
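
As a sketch of invoking a provisioned or custom model, you can pass the Provisioned Throughput ARN in the modelId field instead of a base model ID; the rest of the request is unchanged. The ARN below is a placeholder, and the example assumes the underlying model supports the Converse API.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN; copy the real value from your Provisioned Throughput details.
provisioned_model_arn = (
    "arn:aws:bedrock:us-east-1:111122223333:provisioned-model/abc123example"
)

response = client.converse(
    modelId=provisioned_model_arn,
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```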

Use these methods to test foundation model responses with different prompts and inference parameters. Once you have sufficiently explored these methods, you can set up your application to run model inference by calling these APIs.

Select a topic to learn more about running model inference through that method. To learn more about using agents, see Automate tasks in your application using conversational agents.