

# Set up inference for a custom model
<a name="model-customization-use"></a>

After you create a custom model, you can set up inference using one of the following options:
+ **Purchase Provisioned Throughput** – Purchase Provisioned Throughput for your model to set up dedicated compute capacity with guaranteed throughput for consistent performance and lower latency. 

  For more information about Provisioned Throughput, see [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md). For more information about using custom models with Provisioned Throughput, see [Purchase Provisioned Throughput for a custom model](custom-model-use-pt.md).
+ **Deploy custom model for on-demand inference** – To set up on-demand inference, you deploy the model with a custom model deployment. After you deploy the model, you invoke it using the ARN for the custom model deployment. With on-demand inference, you only pay for what you use and you don't need to set up provisioned compute resources.

  For more information about deploying custom models for on-demand inference, see [Deploy a custom model for on-demand inference](deploy-custom-model-on-demand.md).

**Topics**
+ [Purchase Provisioned Throughput for a custom model](custom-model-use-pt.md)
+ [Deploy a custom model for on-demand inference](deploy-custom-model-on-demand.md)

# Purchase Provisioned Throughput for a custom model
<a name="custom-model-use-pt"></a>

To use a custom model with dedicated compute capacity and guaranteed throughput, you can purchase Provisioned Throughput for it. You can then use the resulting provisioned model for inference. For more information about Provisioned Throughput, see [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md).

------
#### [ Console ]

**To purchase Provisioned Throughput for a custom model**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the left navigation pane, choose **Custom models** under **Tune**.

1. In the **Models** tab, choose the radio button next to the model for which you want to purchase Provisioned Throughput, or choose the model name to navigate to its details page.

1. Select **Purchase Provisioned Throughput**.

1. Follow the remaining steps in [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md) to complete the purchase.

1. After you purchase Provisioned Throughput for your custom model, use the resulting provisioned model for inference as described in [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md).

When you carry out any operation that supports custom models, your custom model appears as an option in the model selection menu.

------
#### [ API ]

To purchase Provisioned Throughput for a custom model, send a [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) request (see the link for request and response formats and field details) to an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). Use the name or ARN of your custom model as the `modelId`. The response returns a `provisionedModelArn` that you can use as the `modelId` when making an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) request.
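
The request can be sketched with the SDK for Python (Boto3). The helper name, the placeholder model ID, and the single model unit below are illustrative assumptions; see the linked API reference for the full set of fields, such as commitment duration and tags.

```python
def purchase_provisioned_throughput(bedrock_client, model_id, name):
    """Purchase Provisioned Throughput for a custom model.

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls
        model_id: The name or ARN of the custom model
        name: A name for the provisioned model

    Returns:
        str: The ARN of the provisioned model
    """
    response = bedrock_client.create_provisioned_model_throughput(
        modelUnits=1,  # number of model units to purchase (illustrative)
        provisionedModelName=name,
        modelId=model_id,
    )
    return response["provisionedModelArn"]
```

You can then pass the returned ARN as the `modelId` in your inference requests.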

[See code examples](model-customization-code-samples.md)

------

# Deploy a custom model for on-demand inference
<a name="deploy-custom-model-on-demand"></a>

 After you create a custom model with a model customization job or import a SageMaker AI-trained custom Amazon Nova model, you can set up on-demand inference for the model. With on-demand inference, you only pay for what you use and you don't need to set up provisioned compute resources. 

To set up on-demand inference for a custom model, you deploy it with a custom model deployment. After you deploy your custom model, you use the deployment's Amazon Resource Name (ARN) as the `modelId` parameter when you submit prompts and generate responses with model inference.

 For information about on-demand inference pricing, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing). You can deploy a custom model for on-demand inference in the following Regions (for more information about Regions supported in Amazon Bedrock, see [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html)): 
+ US East (N. Virginia)
+ US West (Oregon)

## Prerequisites for deploying a custom model for on-demand inference
<a name="custom-model-inference-prerequisites"></a>

Before you can deploy a custom model for on-demand inference, make sure you meet the following requirements:
+ You must use the US East (N. Virginia) or US West (Oregon) Region.
+ You must customize the model on or after July 16, 2025. For supported models, see [Supported base models](#custom-model-inference-supported-models).
+ Your account must have permission to access the model that you are deploying. For more information about model customization access and security, see [Model customization access and security](custom-model-job-access-security.md).
+ If the model is encrypted with an AWS KMS key, you must have permission to use that key. For more information, see [Encryption of custom models](encryption-custom-job.md).

## Supported base models
<a name="custom-model-inference-supported-models"></a>

You can set up on-demand inference for the following base models:
+ Amazon Nova Lite
+ Amazon Nova Micro
+ Amazon Nova Pro
+ Meta Llama 3.3 70B Instruct

## Deploy a custom model
<a name="deploy-custom-model"></a>

You can deploy a custom model with the Amazon Bedrock console, AWS Command Line Interface, or AWS SDKs. For information about using the deployment for inference, see [Use a deployment for on-demand inference](#use-custom-model-on-demand).

------
#### [ Console ]

You deploy a custom model from the **Custom models** page as follows. You can also deploy a model, using the same fields, from the **Custom model on-demand** page. To find that page, choose **Custom model on-demand** under **Infer** in the navigation pane.

**To deploy a custom model**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the left navigation pane, choose **Custom models** under **Tune**.

1. In the **Models** tab, choose the radio button for the model you want to deploy.

1. Choose **Set up inference** and choose **Deploy for on-demand**.

1. In **Deployment details**, provide the following information:
   + **Deployment Name** (required) – Enter a unique name for your deployment.
   + **Description** (optional) – Enter a description for your deployment.
   + **Tags** (optional) – Add tags for cost allocation and resource management.

1. Choose **Create**. When the deployment's status is `Active`, your custom model is ready for on-demand inference. For more information about using the custom model, see [Use a deployment for on-demand inference](#use-custom-model-on-demand).

------
#### [ CLI ]

To deploy a custom model for on-demand inference using the AWS Command Line Interface, use the `create-custom-model-deployment` command with your custom model's Amazon Resource Name (ARN). This command uses the [CreateCustomModelDeployment](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateCustomModelDeployment.html) API operation. The response includes the deployment's ARN. When the deployment is active, you use this ARN as the `modelId` when making inference requests. For information about using the deployment for inference, see [Use a deployment for on-demand inference](#use-custom-model-on-demand).

```
aws bedrock create-custom-model-deployment \
--model-deployment-name "Unique name" \
--model-arn "Custom Model ARN" \
--description "Deployment description" \
--tags '[
    {
        "key": "Environment",
        "value": "Production"
    },
    {
        "key": "Team",
        "value": "ML-Engineering"
    },
    {
        "key": "Project",
        "value": "CustomerSupport"
    }
]' \
--client-request-token "unique-deployment-token" \
--region region
```

------
#### [ API ]

To deploy a custom model for on-demand inference, use the [CreateCustomModelDeployment](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateCustomModelDeployment.html) API operation with your custom model's Amazon Resource Name (ARN). The response includes the deployment's ARN. When the deployment is active, you use this ARN as the `modelId` when making inference requests. For information about using the deployment for inference, see [Use a deployment for on-demand inference](#use-custom-model-on-demand).

The following code shows how to use the SDK for Python (Boto3) to deploy a custom model.

```
import uuid

def create_custom_model_deployment(bedrock_client):
    """Create a custom model deployment

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls

    Returns:
        str: The ARN of the new custom model deployment

    Raises:
        Exception: If there is an error creating the deployment
    """

    try:
        response = bedrock_client.create_custom_model_deployment(
            modelDeploymentName="Unique deployment name",
            modelArn="Custom Model ARN",
            description="Deployment description",
            tags=[
                {'key': 'Environment', 'value': 'Production'},
                {'key': 'Team', 'value': 'ML-Engineering'},
                {'key': 'Project', 'value': 'CustomerSupport'}
            ],
            clientRequestToken=f"deployment-{uuid.uuid4()}"
        )

        deployment_arn = response['customModelDeploymentArn']
        print(f"Deployment created: {deployment_arn}")
        return deployment_arn

    except Exception as e:
        print(f"Error creating deployment: {str(e)}")
        raise
```
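
The deployment isn't ready for inference until its status is `Active`. As a sketch, you can poll for that with the [GetCustomModelDeployment](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetCustomModelDeployment.html) operation; the helper name and the assumed status values (`Creating`, `Active`, `Failed`) are illustrative, so check the API reference for the authoritative list.

```python
import time

def wait_for_deployment_active(bedrock_client, deployment_identifier, delay_seconds=30):
    """Poll a custom model deployment until it leaves the Creating state.

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls
        deployment_identifier: The ARN or name of the custom model deployment
        delay_seconds: Seconds to wait between polls

    Returns:
        str: The final deployment status (for example, Active or Failed)
    """
    while True:
        response = bedrock_client.get_custom_model_deployment(
            customModelDeploymentIdentifier=deployment_identifier
        )
        status = response["status"]
        if status != "Creating":
            return status
        time.sleep(delay_seconds)
```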

------

## Use a deployment for on-demand inference
<a name="use-custom-model-on-demand"></a>

After you deploy your custom model, you use the deployment's Amazon Resource Name (ARN) as the `modelId` parameter when you submit prompts and generate responses with model inference.

For information about making inference requests, see the following topics:
+ [Submit prompts and generate responses with model inference](inference.md)
+ [Prerequisites for running model inference](inference-prereq.md)
+ [Submit prompts and generate responses using the API](inference-api.md)
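
As a minimal sketch with the SDK for Python (Boto3), you can pass the deployment ARN as the `modelId` to the Amazon Bedrock Runtime `Converse` operation; the helper name, ARN, and prompt below are placeholders.

```python
def invoke_custom_model_deployment(bedrock_runtime_client, deployment_arn, prompt):
    """Send a prompt to a custom model deployment with the Converse API.

    Args:
        bedrock_runtime_client: A boto3 Amazon Bedrock Runtime client
        deployment_arn: The ARN of the custom model deployment, used as the model ID
        prompt: The user prompt to send

    Returns:
        str: The text of the model's response
    """
    response = bedrock_runtime_client.converse(
        modelId=deployment_arn,  # the deployment ARN stands in for a model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```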

## Delete a custom model deployment
<a name="delete-custom-model-deployment"></a>

After you are finished using your model for on-demand inference, you can delete the deployment. After you delete the deployment, you can't use it for on-demand inference, but deleting the deployment doesn't delete the underlying custom model.

You can delete a custom model deployment with the Amazon Bedrock console, AWS Command Line Interface, or AWS SDKs.

**Important**  
Deleting a custom model deployment is irreversible. Make sure you no longer need the deployment before proceeding with the deletion. If you need to use the custom model for on-demand inference again, you must create a new deployment.

------
#### [ Console ]

**To delete a custom model deployment**

1. In the navigation pane, under **Infer**, choose **Custom model on-demand**.

1. Choose the custom model deployment you want to delete.

1. Choose **Delete**.

1. In the confirmation dialog, enter the deployment name to confirm the deletion.

1. Choose **Delete** to confirm deletion.

------
#### [ CLI ]

To delete a custom model deployment using the AWS Command Line Interface, use the `delete-custom-model-deployment` command with your deployment identifier. This command uses the [DeleteCustomModelDeployment](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_DeleteCustomModelDeployment.html) API operation. 

```
aws bedrock delete-custom-model-deployment \
--custom-model-deployment-identifier "deployment-arn-or-name" \
--region region
```

------
#### [ API ]

To delete a custom model deployment programmatically, use the [DeleteCustomModelDeployment](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_DeleteCustomModelDeployment.html) API operation with the deployment's Amazon Resource Name (ARN) or name. The following code shows how to use the SDK for Python (Boto3) to delete a custom model deployment.

```
def delete_custom_model_deployment(bedrock_client):
    """Delete a custom model deployment

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls

    Returns:
        dict: The response from the delete operation

    Raises:
        Exception: If there is an error deleting the deployment
    """

    try:
        response = bedrock_client.delete_custom_model_deployment(
            customModelDeploymentIdentifier="Deployment identifier"
        )

        print("Deleting deployment...")
        return response

    except Exception as e:
        print(f"Error deleting deployment: {str(e)}")
        raise
```

------