

# Invoke your imported model
<a name="invoke-imported-model"></a>

The model import job can take several minutes to complete after you send a [CreateModelImportJob](https://docs.aws.amazon.com//bedrock/latest/APIReference/API_CreateModelImportJob.html) request. You can check the status of your import job in the console or by calling the [GetModelImportJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetModelImportJob.html) operation and checking the `Status` field in the response. The import job is complete when the `Status` is **Complete**. 

After your imported model is available in Amazon Bedrock, you can use the model with on-demand throughput by sending [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) requests to make inference calls to the model. For more information, see [Submit a single prompt with InvokeModel](inference-invoke.md).

To interface with your imported model using the messages format, you can call the [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) operations. For more information, see [Using the Converse API](conversation-inference-call.md).

**Note**  
The Converse API is not supported for Qwen2.5, Qwen2-VL, Qwen2.5-VL, and GPT-OSS models.
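
For models that do support the Converse API, a minimal call might look like the following sketch (the model ARN is a placeholder, and the call requires AWS credentials and an active imported model):

```python
# Messages-format request for the Converse operation; a hedged sketch.
messages = [
    {"role": "user", "content": [{"text": "How is the rainbow formed?"}]}
]
inference_config = {"maxTokens": 100, "temperature": 0.5}

def converse_once(model_arn, region_name="us-east-1"):
    # Requires AWS credentials and an active imported model.
    import boto3
    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.converse(
        modelId=model_arn,
        messages=messages,
        inferenceConfig=inference_config,
    )
    # The assistant reply is returned as a list of content blocks.
    return response["output"]["message"]["content"][0]["text"]
```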

## Enhanced API Support: Multiple API Formats
<a name="enhanced-api-support"></a>

Starting November 17, 2025, Amazon Bedrock Custom Model Import supports comprehensive OpenAI-compatible API formats, providing flexibility in how you integrate and deploy your custom models. All models imported after November 11, 2025, will automatically benefit from these enhanced capabilities with no additional configuration required.

Custom Model Import now supports three API formats:
+ **BedrockCompletion (Text)** - Compatible with current Bedrock workflows
+ **OpenAICompletion (Text)** - OpenAI Completions Schema compatibility
+ **OpenAIChatCompletion (Text and Images)** - Full conversational Schema compatibility

These enhanced capabilities include structured outputs for enforcing JSON schemas and patterns, enhanced vision support with multi-image processing, log probabilities for model confidence insights, and tool calling capabilities for GPT-OSS models.

For detailed API reference documentation, see the official OpenAI documentation:
+ Completion: [OpenAI Completions API](https://platform.openai.com/docs/api-reference/completions)
+ ChatCompletion: [OpenAI Chat API](https://platform.openai.com/docs/api-reference/chat)

### API Format Examples
<a name="api-format-examples"></a>

The following examples demonstrate how to use each of the three supported API formats with your imported models.

------
#### [ BedrockCompletion ]

**BedrockCompletion** format is compatible with current Bedrock workflows and supports text-based inference requests.

Example request:

```
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

payload = {
    "prompt": "How is the rainbow formed?",
    "max_gen_len": 100,
    "temperature": 0.5
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "generation": " – A scientific explanation\nA rainbow is a beautiful natural phenomenon that occurs when sunlight passes through water droplets in the air. It is formed through a process called refraction, which is the bending of light as it passes from one medium to another.\nHere's a step-by-step explanation of how a rainbow is formed:\n1. Sunlight enters the Earth's atmosphere: The first step in forming a rainbow is for sunlight to enter the Earth's atmosphere. This sunlight is made up of a spectrum of",
    "prompt_token_count": 7,
    "generation_token_count": 100,
    "stop_reason": "length",
    "logprobs": null
}
```

BedrockCompletion supports structured outputs using the `response_format` parameter with the `json_object` and `json_schema` types.

------
#### [ OpenAICompletion ]

**OpenAICompletion** format provides OpenAI Completions Schema compatibility. To use this format, include the `max_tokens` parameter instead of `max_gen_len`.

Example request:

```
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

payload = {
    "prompt": "How is the rainbow formed?",
    "max_tokens": 100,
    "temperature": 0.5
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "id": "cmpl-b09d5810bd64428f8a853be71c31f912",
    "object": "text_completion",
    "created": 1763166682,
    "choices": [
        {
            "index": 0,
            "text": " The formation of a rainbow is a complex process that involves the interaction of sunlight with water droplets in the air. Here's a simplified explanation: 1. Sunlight enters the Earth's atmosphere and is refracted, or bent, as it passes through the air. 2. When sunlight encounters a water droplet, such as a cloud, mist, or fog, it is refracted again and split into its individual colors, a process known as dispersion. 3. The refracted and",
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "total_tokens": 107,
        "completion_tokens": 100
    }
}
```
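
A sketch of pulling the generated text and token counts out of a response body shaped like the example above (the helper name is illustrative):

```python
def extract_completion(response_body):
    # response_body is the parsed JSON from response['body'].read()
    choice = response_body["choices"][0]
    return {
        "text": choice["text"],
        "finish_reason": choice["finish_reason"],
        "completion_tokens": response_body["usage"]["completion_tokens"],
    }

# Minimal body mirroring the example response above
body = {
    "object": "text_completion",
    "choices": [
        {"index": 0, "text": " The formation of a rainbow...", "finish_reason": "length"}
    ],
    "usage": {"prompt_tokens": 7, "total_tokens": 107, "completion_tokens": 100},
}
result = extract_completion(body)
```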

OpenAICompletion supports full structured outputs capabilities including `json`, `regex`, `choice`, and `grammar` constraints using the `structured_outputs` parameter.

------
#### [ OpenAIChatCompletion ]

**OpenAIChatCompletion** format provides full conversational Schema compatibility and supports both text and image inputs.

Example request:

```
import json
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

payload = {
    "messages": [
        {
            "role": "user",
            "content": "How is the rainbow formed?"
        }
    ],
    "max_tokens": 100,
    "temperature": 0.5
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "id": "chatcmpl-1d84ce1d3d61418e8c6d1973f87173db",
    "object": "chat.completion",
    "created": 1763166683,
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A rainbow is a beautiful natural phenomenon that occurs when sunlight passes through water droplets in the air. The process of forming a rainbow involves several steps:\n\n1. **Sunlight**: The first requirement for a rainbow is sunlight. The sun should be shining brightly, but not directly overhead.\n2. **Water droplets**: The second requirement is water droplets in the air..."
            },
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 41,
        "completion_tokens": 100,
        "total_tokens": 141
    }
}
```

OpenAIChatCompletion supports structured outputs using both `response_format` and `structured_outputs` parameters. For vision capabilities, include images in the content array with base64-encoded image data.

**Note**  
To use the ChatCompletion format, the chat template must be part of `tokenizer_config.json`. Custom Model Import does not apply a default chat template to the request.
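
As an illustration, a minimal `chat_template` entry in `tokenizer_config.json` might look like the following (the template itself is a hypothetical example; use the template that matches your model's training format):

```
{
  "chat_template": "{% for message in messages %}{{ message['role'] }}: {{ message['content'] }}\n{% endfor %}"
}
```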

------

# Advanced API features for imported models
<a name="custom-model-import-advanced-features"></a>

This page provides detailed examples of advanced features available for models imported after November 11, 2025. These capabilities include structured outputs for controlled generation, enhanced vision support for multi-image processing, log probabilities for confidence insights, and tool calling for GPT-OSS models.

## Structured Outputs
<a name="structured-outputs"></a>

Structured outputs enable controlled generation following specific formats, schemas, or patterns. This feature ensures that the model's response adheres to predefined constraints, making it ideal for applications requiring consistent data formats, API integrations, or automated processing pipelines.

Structured outputs on Custom Model Import are supported via two parameters:
+ `response_format` - Supports `json_object` and `json_schema` types
+ `structured_outputs` - Supports `json`, `regex`, `choice`, and `grammar` types

**Note**  
When using structured outputs on Custom Model Import, customers should expect performance trade-offs due to constraint validation during generation. Simple constraints like `choice` and `json_object` have minimal impact, while complex constraints like `json_schema` and `grammar` can significantly increase latency and reduce throughput. For optimal performance, use simpler constraint types when possible and keep schemas flat rather than deeply nested.

The following examples demonstrate structured outputs support across different API formats. The Pydantic model definition is:

```
from pydantic import BaseModel
from enum import Enum

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType
```
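
For reference, `CarDescription.model_json_schema()` produces a standard JSON Schema roughly like the following (the exact output depends on your Pydantic version):

```
{
  "$defs": {
    "CarType": {
      "enum": ["sedan", "SUV", "Truck", "Coupe"],
      "title": "CarType",
      "type": "string"
    }
  },
  "properties": {
    "brand": {"title": "Brand", "type": "string"},
    "model": {"title": "Model", "type": "string"},
    "car_type": {"$ref": "#/$defs/CarType"}
  },
  "required": ["brand", "model", "car_type"],
  "title": "CarDescription",
  "type": "object"
}
```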

------
#### [ BedrockCompletion ]

BedrockCompletion supports structured outputs using the `response_format` parameter with `json_object` and `json_schema` types only.

**Example: JSON Schema**

```
payload = {
    "prompt": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
    "response_format": {
        "type": "json_schema",
        "json_schema": CarDescription.model_json_schema()
    }
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "generation": "{\n    \"brand\": \"Ferrari\",\n    \"model\": \"F40\",\n    \"car_type\": \"SUV\"\n  }",
    "prompt_token_count": 22,
    "generation_token_count": 30,
    "stop_reason": "stop",
    "logprobs": null
}
```
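
Because the schema is enforced during generation, the `generation` field is itself valid JSON and can be parsed directly; for example, using the response above:

```python
import json

# 'generation' from a structured-output response is a JSON string that
# conforms to the requested schema.
generation = "{\n    \"brand\": \"Ferrari\",\n    \"model\": \"F40\",\n    \"car_type\": \"SUV\"\n  }"
car = json.loads(generation)
```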

------
#### [ OpenAICompletion ]

OpenAICompletion supports both the `response_format` (`json_object`, `json_schema`) and `structured_outputs` (`json`, `regex`, `choice`, `grammar`) parameters. Use `max_tokens` instead of `max_gen_len` to route requests to OpenAICompletion.

**Example: Structured Outputs - Choice**

```
payload = {
    "prompt": "Classify the sentiment of this sentence. Amazon Bedrock CMI is Amazing!",
    "max_tokens": 10,
    "structured_outputs": {
        "choice": ["positive", "negative"]
    }
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "id": "cmpl-01f94c4652d24870bbb4d5418a01c384",
    "object": "text_completion",
    "choices": [
        {
            "index": 0,
            "text": "positive",
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 17,
        "completion_tokens": 4
    }
}
```

------
#### [ OpenAIChatCompletion ]

OpenAIChatCompletion supports both the `response_format` (`json_object`, `json_schema`) and `structured_outputs` (`json`, `regex`, `choice`, `grammar`) parameters.

**Example: Response Format - JSON Schema**

```
payload = {
    "messages": [
        {"role": "user", "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's"}
    ],
    "max_tokens": 100,
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "car-description",
            "schema": CarDescription.model_json_schema()
        }
    }
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "id": "chatcmpl-cae5a43b0a924b8eb434510cbf978a19",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "{\"brand\": \"Dodge\", \"model\": \"Viper\", \"car_type\": \"Coupe\"}"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 56,
        "completion_tokens": 23
    }
}
```

------

## Vision Support
<a name="vision-support"></a>

Vision capabilities let you process images alongside text inputs for complex visual analysis tasks. Custom Model Import now supports up to three images per request, up from the previous single-image limit.

**Supported API:** OpenAIChatCompletion only. All models imported after November 11, 2025, default to this API for vision capabilities.

**Image Requirements:**
+ Base64 encoding is required; image URLs will cause request failures
+ Maximum of 3 images per request
+ High-resolution images significantly increase processing time and memory usage

**Warning**  
High-resolution images significantly increase processing time and memory usage, and may cause timeout errors. Multiple high-resolution images compound the performance impact. For optimal performance, resize images appropriately and use lower detail levels when possible.
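
A small helper that base64-encodes raw image bytes into the data-URL form the API expects and enforces the three-image limit client-side (the helper name is illustrative):

```python
import base64

MAX_IMAGES = 3  # current per-request limit

def build_image_content(text, image_bytes_list, media_type="image/jpeg"):
    # Builds the mixed text + image content array for an
    # OpenAIChatCompletion user message.
    if len(image_bytes_list) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} images per request")
    content = [{"type": "text", "text": text}]
    for data in image_bytes_list:
        encoded = base64.b64encode(data).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{media_type};base64,{encoded}"},
        })
    return content

content = build_image_content("Spot the difference?", [b"\xff\xd8one", b"\xff\xd8two"])
```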

------
#### [ OpenAIChatCompletion ]

**Example: Multi-Image Processing**

```
import json
import boto3
import base64

client = boto3.client('bedrock-runtime', region_name='us-east-1')

# Load and encode images
with open('/path/to/car_image_1.jpg', 'rb') as f:
    image_data_1 = base64.b64encode(f.read()).decode('utf-8')

with open('/path/to/car_image_2.jpg', 'rb') as f:
    image_data_2 = base64.b64encode(f.read()).decode('utf-8')

payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant that can analyze images."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Spot the difference between the two images?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data_1}"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data_2}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300,
    "temperature": 0.5
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response:

```
{
    "id": "chatcmpl-ccae8a67e62f4014a9ffcbedfff96f44",
    "object": "chat.completion",
    "created": 1763167018,
    "model": "667387627229-g6vkuhd609s4",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "There are no differences between the two images provided. They appear to be identical.",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null,
            "token_ids": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 2795,
        "total_tokens": 2812,
        "completion_tokens": 17,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "prompt_token_ids": null,
    "kv_transfer_params": null
}
```

------

## Log Probabilities
<a name="log-probabilities"></a>

Log probabilities represent the likelihood of each token in a sequence, calculated as log(p), where p is the probability of a token at a given position conditioned on the tokens that precede it. Because log probs are additive, the sequence log probability equals the sum of the individual token log probs, which makes them useful for ranking generations by average per-token score. Custom Model Import always returns the raw logprob values for requested tokens.

Key applications include classification tasks where log probs enable custom confidence thresholds, retrieval Q&A systems that use confidence scores to reduce hallucinations, autocomplete suggestions based on token likelihood, and perplexity calculations for comparing model performance across prompts. Log probs also provide token-level analysis capabilities, allowing developers to examine alternative tokens the model considered.
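
Because log probs are additive, sequence-level scores fall out directly. The following sketch computes sequence probability and perplexity from per-token logprobs (the values are illustrative):

```python
import math

# Illustrative per-token log probabilities for one generation
token_logprobs = [-0.079, -0.202, -0.097, -0.818]

sequence_logprob = sum(token_logprobs)       # log P(sequence)
sequence_prob = math.exp(sequence_logprob)   # P(sequence)
avg_logprob = sequence_logprob / len(token_logprobs)
perplexity = math.exp(-avg_logprob)          # lower is better
```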

**Note**  
Logprobs are not cached. For a request that requires prompt logprobs, the system ignores the prefix cache and recomputes the prefill of the full prompt to generate the logprobs. This presents a performance trade-off when using logprobs.

Log probability support varies by API format:
+ BedrockCompletion - Output tokens only
+ OpenAICompletion - Prompt and output tokens
+ OpenAIChatCompletion - Prompt and output tokens

------
#### [ BedrockCompletion ]

BedrockCompletion supports output token logprobs only, returning the top-1 logprob for each output token.

```
payload = {
    "prompt": "How is the rainbow formed?",
    "max_gen_len": 10,
    "temperature": 0.5,
    "return_logprobs": True
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response (truncated):

```
{
    "generation": " A rainbow is formed when sunlight passes through water dro",
    "prompt_token_count": 7,
    "generation_token_count": 10,
    "stop_reason": "length",
    "logprobs": [
        {
            "362": -2.1413702964782715
        },
        {
            "48713": -0.8180374503135681
        },
        {
            "374": -0.09657637774944305
        },
        ...
    ]
}
```

------
#### [ OpenAIChatCompletion ]

OpenAIChatCompletion supports both prompt and output token logprobs. You can set `top_logprobs=N` and `prompt_logprobs=N`, where N is an integer; the response then includes log probabilities for the N most likely tokens at each position.

```
payload = {
    "messages": [
        {
            "role": "user",
            "content": "How is the rainbow formed?"
        }
    ],
    "max_tokens": 10,
    "temperature": 0.5,
    "logprobs": True,
    "top_logprobs": 1,
    "prompt_logprobs": 1
}

response = client.invoke_model(
    modelId='your-model-arn',
    body=json.dumps(payload),
    accept='application/json',
    contentType='application/json'
)

response_body = json.loads(response['body'].read())
```

Example response (truncated):

```
{
    "id": "chatcmpl-xxx",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A rainbow is formed..."
            },
            "logprobs": {
                "content": [
                    {
                        "token": "A",
                        "logprob": -0.07903262227773666,
                        "bytes": [65],
                        "top_logprobs": [
                            {
                                "token": "A",
                                "logprob": -0.07903262227773666,
                                "bytes": [65]
                            }
                        ]
                    },
                    {
                        "token": " rainbow",
                        "logprob": -0.20187227427959442,
                        "bytes": [32, 114, 97, 105, 110, 98, 111, 119],
                        "top_logprobs": [...]
                    },
                    ...
                ]
            },
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 41,
        "completion_tokens": 10,
        "total_tokens": 51
    }
}
```

------

You need the model ARN to make inference calls to your imported model. After the import job completes and the model is active, you can find the model ARN in the console or by sending a [ListImportedModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListImportedModels.html) request. 

When you invoke your imported model using `InvokeModel` or `InvokeModelWithResponseStream`, your request is typically served within 5 minutes; otherwise, you might get a `ModelNotReadyException`. To handle the exception, follow the steps in the Handling ModelNotReadyException section. 

## Frequently Asked Questions
<a name="api-format-faq"></a>

**Q: What API format should I use?**

A: For maximum compatibility with various SDKs, we recommend using OpenAICompletion or OpenAIChatCompletion formats as they provide OpenAI-compatible schemas that are widely supported across different tools and libraries.

**Q: Does GPT-OSS on Amazon Bedrock Custom Model Import support the Converse API?**

A: No. GPT-OSS based custom model import models do not support the Converse API or ConverseStream API. You must use the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) API with OpenAI-compatible schemas when working with GPT-OSS based custom models.

**Q: What models support tool calling?**

A: GPT-OSS based custom models support tool calling capabilities. Tool calling enables function calling for complex workflows.
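
A tool definition follows the `tools` array of the OpenAI chat schema; the following is a hedged sketch of a request payload (the function name and parameters are illustrative):

```python
# Tool-calling request payload in the OpenAI chat schema; send it with
# client.invoke_model(modelId='your-model-arn', body=json.dumps(payload), ...)
payload = {
    "messages": [
        {"role": "user", "content": "What's the weather in Seattle?"}
    ],
    "max_tokens": 200,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```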

**Q: What about models imported before November 11, 2025?**

A: Models imported before November 11, 2025, continue to work as is with their existing API formats and capabilities.

**Q: What about `generation_config.json` for OpenAI-based models?**

A: It is critical that you include the correct `generation_config.json` file when importing OpenAI-based models such as GPT-OSS. Use the updated configuration file (updated August 13, 2024) available at [https://huggingface.co/openai/gpt-oss-20b/blob/main/generation_config.json](https://huggingface.co/openai/gpt-oss-20b/blob/main/generation_config.json). The updated configuration includes three end-of-sequence token IDs (`[200002, 199999, 200012]`), whereas older versions included only two (`[200002, 199999]`). Using an outdated `generation_config.json` file will cause runtime errors during model invocation.

## Handling ModelNotReadyException
<a name="handle-model-not-ready-exception"></a>

Amazon Bedrock Custom Model Import optimizes hardware utilization by removing models that are not active. If you try to invoke a model that has been removed, you'll get a `ModelNotReadyException`. After the model is removed, the next invocation triggers Custom Model Import to start restoring the model. The restoration time depends on the on-demand fleet size and the model size.

If your `InvokeModel` or `InvokeModelWithResponseStream` request returns a `ModelNotReadyException`, follow these steps to handle the exception.

1. 

**Configure retries**

   By default, the request is automatically retried with exponential backoff. You can configure the maximum number of retries.

   The following example shows how to configure the retries. Replace *${region-name}*, *${model-arn}*, and *10* with your Region, model ARN, and maximum number of attempts.

   ```
   import json
   import boto3
   from botocore.config import Config


   REGION_NAME = '${region-name}'
   MODEL_ID = '${model-arn}'

   config = Config(
       retries={
           'total_max_attempts': 10,  # customizable
           'mode': 'standard'
       }
   )
   message = "Hello"


   session = boto3.session.Session()
   br_runtime = session.client(service_name='bedrock-runtime',
                               region_name=REGION_NAME,
                               config=config)

   try:
       invoke_response = br_runtime.invoke_model(modelId=MODEL_ID,
                                                 body=json.dumps({'prompt': message}),
                                                 accept="application/json",
                                                 contentType="application/json")
       invoke_response["body"] = json.loads(invoke_response["body"].read().decode("utf-8"))
       print(json.dumps(invoke_response, indent=4))
   except Exception as e:
       print(e)
       print(repr(e))
   ```

1. 

**Monitor response codes during retry attempts**

   Each retry attempt triggers the model restoration process. The restoration time depends on the availability of the on-demand fleet and the model size. Monitor the response codes while the restoration is in progress. 

   If the retries consistently fail, continue with the next steps.

1. 

**Verify model was successfully imported**

   You can verify that the model was successfully imported by checking the status of your import job in the console or by calling the [GetModelImportJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetModelImportJob.html) operation and checking the `Status` field in the response. The import job is successful when the `Status` is **Complete**. 

1. 

**Contact Support for further investigation**

   Open a ticket with Support. For more information, see [Creating support cases](https://docs.aws.amazon.com//awssupport/latest/user/case-management.html).

   Include relevant details such as model ID and timestamps in the support ticket.