
Inference using Responses API

Amazon Bedrock provides the OpenAI Responses API via the bedrock-mantle endpoint, powered by Mantle, a distributed inference engine for large-scale machine learning model serving. This endpoint allows you to use familiar OpenAI SDKs and tools with Amazon Bedrock models, enabling you to migrate existing applications with minimal code changes—simply update your base URL and API key.

Important

When using the OpenAI SDK with Amazon Bedrock, you must point it to the Amazon Bedrock endpoint, not the OpenAI endpoint. Set the following environment variables:

OPENAI_BASE_URL="https://bedrock-mantle.<your-region>.api.aws/v1"
OPENAI_API_KEY="<your Bedrock API key>"

Do not use your OpenAI API key or the OpenAI base URL (https://api.openai.com/v1). Those connect to OpenAI directly, not to Amazon Bedrock. To create an Amazon Bedrock API key, see API keys.

Key benefits include:

  • Asynchronous inference – Support for long-running inference workloads through the Responses API

  • Stateful conversation management – Automatically rebuild context without manually passing conversation history with each request (a sketch follows this list)

  • Simplified tool use – Streamlined integration for agentic workflows

  • Flexible response modes – Support for both streaming and non-streaming responses

  • Easy migration – Compatible with existing OpenAI SDK codebases
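
For example, stateful conversation management is typically driven by the previous_response_id request parameter, which tells the service to rebuild context from an earlier response. The following is a minimal sketch, assuming the bedrock-mantle endpoint honors this standard Responses API parameter and that the environment variables from the preceding note are set:

# Continue a conversation without resending history, using previous_response_id
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI

client = OpenAI()

# First turn: the service stores the conversation state
first = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "My name is Ada. Please remember it."}]
)

# Second turn: reference the previous response instead of resending history
second = client.responses.create(
    model="openai.gpt-oss-120b",
    previous_response_id=first.id,
    input=[{"role": "user", "content": "What is my name?"}]
)
print(second.output_text)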

Supported Regions and Endpoints

The bedrock-mantle endpoint is available in the following AWS Regions:

| Region Name | Region | Endpoint |
| --- | --- | --- |
| US East (Ohio) | us-east-2 | bedrock-mantle.us-east-2.api.aws |
| US East (N. Virginia) | us-east-1 | bedrock-mantle.us-east-1.api.aws |
| US West (Oregon) | us-west-2 | bedrock-mantle.us-west-2.api.aws |
| Asia Pacific (Jakarta) | ap-southeast-3 | bedrock-mantle.ap-southeast-3.api.aws |
| Asia Pacific (Mumbai) | ap-south-1 | bedrock-mantle.ap-south-1.api.aws |
| Asia Pacific (Sydney) | ap-southeast-2 | bedrock-mantle.ap-southeast-2.api.aws |
| Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-mantle.ap-northeast-1.api.aws |
| Europe (Frankfurt) | eu-central-1 | bedrock-mantle.eu-central-1.api.aws |
| Europe (Ireland) | eu-west-1 | bedrock-mantle.eu-west-1.api.aws |
| Europe (London) | eu-west-2 | bedrock-mantle.eu-west-2.api.aws |
| Europe (Milan) | eu-south-1 | bedrock-mantle.eu-south-1.api.aws |
| Europe (Stockholm) | eu-north-1 | bedrock-mantle.eu-north-1.api.aws |
| South America (São Paulo) | sa-east-1 | bedrock-mantle.sa-east-1.api.aws |

Prerequisites

Before using OpenAI APIs, ensure you have the following:

  • Authentication – You can authenticate using:

    • Amazon Bedrock API key (required for OpenAI SDK)

    • AWS credentials (supported for HTTP requests)

  • OpenAI SDK (optional) – Install the OpenAI Python SDK (for example, with pip install openai) if you plan to make SDK-based requests.

  • Environment variables – Set the following environment variables (or pass these values directly to the client, as shown in the sketch after this list):

    • OPENAI_API_KEY – Set to your Amazon Bedrock API key

    • OPENAI_BASE_URL – Set to the Amazon Bedrock endpoint for your region (for example, https://bedrock-mantle.us-east-1.api.aws/v1)
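
If you prefer not to use environment variables, the OpenAI Python SDK also accepts these values as constructor arguments. A minimal sketch (the endpoint and key below are placeholders; substitute your own Region's endpoint and your Amazon Bedrock API key):

# Configure the client explicitly instead of through environment variables
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-mantle.us-east-1.api.aws/v1",  # your Region's bedrock-mantle endpoint
    api_key="<your Bedrock API key>"  # your Amazon Bedrock API key
)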

Models API

The Models API allows you to discover the models that are available in Amazon Bedrock through the Mantle-powered endpoint. Use this API to retrieve the list of models you can use with the Responses API. For complete API details, see the OpenAI Models documentation.

List available models

To list available models, choose the tab for your preferred method, and then follow the steps:

OpenAI SDK (Python)
# List all available models using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI

client = OpenAI()
models = client.models.list()
for model in models.data:
    print(model.id)
HTTP request

Make a GET request to /v1/models:

# List all available models
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
curl -X GET $OPENAI_BASE_URL/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Responses API

The Responses API provides stateful conversation management with support for streaming, background processing, and multi-turn interactions. For complete API details, see the OpenAI Responses documentation.

Note

Not all models support the Responses API. For the list of supported models, see API compatibility.

Basic request

To create a response, choose the tab for your preferred method, and then follow the steps:

OpenAI SDK (Python)
# Create a basic response using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "user", "content": "Hello! How can you help me today?"}
    ]
)
print(response)
HTTP request

Make a POST request to /v1/responses:

# Create a basic response
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
curl -X POST $OPENAI_BASE_URL/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai.gpt-oss-120b",
    "input": [
      {"role": "user", "content": "Hello! How can you help me today?"}
    ]
  }'
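
The response object contains the full output structure, including metadata and content items. If you only need the generated text, the OpenAI Python SDK exposes an output_text convenience property that aggregates all text output; a short sketch:

# Print only the aggregated text output instead of the full response object
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "Hello! How can you help me today?"}]
)
print(response.output_text)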

Stream responses

To receive response events incrementally, choose the tab for your preferred method, and then follow the steps:

OpenAI SDK (Python)
# Stream response events incrementally using the OpenAI SDK
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
from openai import OpenAI

client = OpenAI()
stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for event in stream:
    print(event)
HTTP request

Make a POST request to /v1/responses with stream set to true:

# Stream response events incrementally
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
curl -X POST $OPENAI_BASE_URL/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "openai.gpt-oss-120b",
    "input": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  }'
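
For long-running workloads, the OpenAI Responses API also defines a background mode, in which the create call returns immediately and you poll for completion by response ID. The following is a minimal sketch, assuming the bedrock-mantle endpoint supports the standard background parameter and response retrieval:

# Run a long request in the background and poll until it finishes
# Assumes the endpoint supports the Responses API background mode
# Requires OPENAI_API_KEY and OPENAI_BASE_URL environment variables
import time

from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "Write a detailed report on renewable energy."}],
    background=True
)

# Poll while the response is still queued or running
while response.status in ("queued", "in_progress"):
    time.sleep(2)
    response = client.responses.retrieve(response.id)

print(response.status)
print(response.output_text)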