

# Serverless endpoint creation
<a name="serverless-endpoints-create"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To create a serverless endpoint, you can use the Amazon SageMaker AI console, the APIs, or the AWS CLI. The process is similar to creating a [real-time endpoint](realtime-endpoints.md).

**Topics**
+ [Create a model](serverless-endpoints-create-model.md)
+ [Create an endpoint configuration](serverless-endpoints-create-config.md)
+ [Create an endpoint](serverless-endpoints-create-endpoint.md)

# Create a model
<a name="serverless-endpoints-create-model"></a>

To create your model, you must provide the location of your model artifacts and container image. You can also use a model version from [SageMaker Model Registry](model-registry.md). The examples in the following sections show you how to create a model using the [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API, Model Registry, and the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home).

## To create a model (using Model Registry)
<a name="serverless-endpoints-create-model-registry"></a>

[Model Registry](model-registry.md) is a feature of SageMaker AI that helps you catalog and manage versions of your model for use in ML pipelines. To use Model Registry with Serverless Inference, you must first register a model version in a Model Registry model group. To learn how to register a model in Model Registry, follow the procedures in [Create a Model Group](model-registry-model-group.md) and [Register a Model Version](model-registry-version.md).

The following example requires you to have the ARN of a registered model version and uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to call the [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API. For Serverless Inference, Model Registry is currently only supported by the AWS SDK for Python (Boto3). For the example, specify the following values:
+ For `model_name`, enter a name for the model.
+ For `sagemaker_role`, you can use the default SageMaker AI-created role or a customized SageMaker AI IAM role from Step 4 of the [Complete the prerequisites](serverless-endpoints-prerequisites.md) section.
+ For `ModelPackageName`, specify the ARN for your model version, which must be registered to a model group in Model Registry.

```
#Setup
import boto3
import sagemaker
region = boto3.Session().region_name
client = boto3.client("sagemaker", region_name=region)

#Role to give SageMaker AI permission to access AWS services.
sagemaker_role = sagemaker.get_execution_role()

#Specify a name for the model
model_name = "<name-for-model>"

#Specify a Model Registry model version
container_list = [
    {
        "ModelPackageName": "<model-version-arn>"
    }
]

#Create the model
response = client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role,
    Containers = container_list
)
```
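If you don't have the model version ARN at hand, you can retrieve it by listing the versions in your model group with the [ListModelPackages](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListModelPackages.html) API. The following sketch picks the most recently created version; the helper name `latest_model_package_arn` and the model group name are illustrative, not part of the SageMaker API.

```python
def latest_model_package_arn(sm_client, model_package_group_name):
    """Return the ARN of the most recently created model version in a model group."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    versions = response["ModelPackageSummaryList"]
    if not versions:
        raise ValueError(f"No model versions registered in {model_package_group_name}")
    return versions[0]["ModelPackageArn"]

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker")
#   model_version_arn = latest_model_package_arn(client, "<your-model-group>")
```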

## To create a model (using API)
<a name="serverless-endpoints-create-model-api"></a>

The following example uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to call the [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) API. Specify the following values:
+ For `sagemaker_role`, you can use the default SageMaker AI-created role or a customized SageMaker AI IAM role from Step 4 of the [Complete the prerequisites](serverless-endpoints-prerequisites.md) section.
+ For `model_url`, specify the Amazon S3 URI to your model.
+ For `container`, retrieve the container you want to use by its Amazon ECR path. This example uses a SageMaker AI-provided XGBoost container. If you have not selected a SageMaker AI container or brought your own, see Step 6 of the [Complete the prerequisites](serverless-endpoints-prerequisites.md) section for more information.
+ For `model_name`, enter a name for the model.

```
#Setup
import boto3
import sagemaker
region = boto3.Session().region_name
client = boto3.client("sagemaker", region_name=region)

#Role to give SageMaker AI permission to access AWS services.
sagemaker_role = sagemaker.get_execution_role()

#Get model from S3
model_url = "s3://amzn-s3-demo-bucket/models/model.tar.gz"

#Get container image (prebuilt example)
from sagemaker import image_uris
container = image_uris.retrieve("xgboost", region, "0.90-1")

#Create model
model_name = "<name-for-model>"

response = client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = sagemaker_role,
    Containers = [{
        "Image": container,
        "Mode": "SingleModel",
        "ModelDataUrl": model_url,
    }]
)
```

## To create a model (using the console)
<a name="serverless-endpoints-create-model-console"></a>

1. Sign in to the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home).

1. In the navigation tab, choose **Inference**.

1. Next, choose **Models**.

1. Choose **Create model**.

1. For **Model name**, enter a name for the model that is unique to your account and AWS Region.

1. For **IAM role**, either select an IAM role you have already created (see [Complete the prerequisites](serverless-endpoints-prerequisites.md)) or allow SageMaker AI to create one for you.

1. In **Container definition 1**, for **Container input options**, select **Provide model artifacts and input location**.

1. For **Provide model artifacts and inference image options**, select **Use a single model**.

1. For **Location of inference code image**, enter an Amazon ECR path to a container. The image must either be a SageMaker AI-provided first-party image (for example, TensorFlow or XGBoost) or an image that resides in an Amazon ECR repository within the same account in which you are creating the endpoint. If you do not have a container, go back to Step 6 of the [Complete the prerequisites](serverless-endpoints-prerequisites.md) section for more information.

1. For **Location of model artifacts**, enter the Amazon S3 URI to your ML model. For example, `s3://amzn-s3-demo-bucket/models/model.tar.gz`.

1. (Optional) For **Tags**, add key-value pairs to create metadata for your model.

1. Choose **Create model**.

# Create an endpoint configuration
<a name="serverless-endpoints-create-config"></a>

After you create a model, create an endpoint configuration. You can then deploy your model using the specifications in your endpoint configuration. In the configuration, you specify whether you want a real-time or serverless endpoint. To create a serverless endpoint configuration, you can use the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home), the [CreateEndpointConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) API, or the AWS CLI. The API and console approaches are outlined in the following sections.

## To create an endpoint configuration (using API)
<a name="serverless-endpoints-create-config-api"></a>

The following example uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to call the [CreateEndpointConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) API. Specify the following values:
+ For `EndpointConfigName`, choose a name for the endpoint configuration. The name should be unique within your account in a Region.
+ (Optional) For `KmsKeyId`, use the key ID, key ARN, alias name, or alias ARN for an AWS KMS key that you want to use. SageMaker AI uses this key to encrypt your Amazon ECR image.
+ For `ModelName`, use the name of the model you want to deploy. It should be the same model that you used in the [Create a model](serverless-endpoints-create-model.md) step.
+ For `ServerlessConfig`:
  + Set `MemorySizeInMB` to `2048`. This example uses a memory size of 2048 MB, but you can choose any of the following values: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.
  + Set `MaxConcurrency` to `20`. This example uses a maximum concurrency of 20. The maximum number of concurrent invocations you can set for a serverless endpoint is 200, and the minimum is 1.
  + (Optional) To use Provisioned Concurrency, set `ProvisionedConcurrency` to `10`. This example uses a Provisioned Concurrency of 10. The `ProvisionedConcurrency` value for a serverless endpoint must be less than or equal to the `MaxConcurrency` value. Omit this field to use an on-demand Serverless Inference endpoint. You can dynamically scale Provisioned Concurrency. For more information, see [Automatically scale Provisioned Concurrency for a serverless endpoint](serverless-endpoints-autoscale.md).

```
response = client.create_endpoint_config(
   EndpointConfigName="<your-endpoint-configuration>",
   KmsKeyId="arn:aws:kms:us-east-1:123456789012:key/143ef68f-76fd-45e3-abba-ed28fc8d3d5e",
   ProductionVariants=[
        {
            "ModelName": "<your-model-name>",
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,
                "MaxConcurrency": 20,
                "ProvisionedConcurrency": 10,
            }
        } 
    ]
)
```
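An out-of-range `ServerlessConfig` causes `CreateEndpointConfig` to fail with a validation error, so it can be useful to check the limits described above before making the call. The following sketch encodes those limits in plain Python; the helper name `validate_serverless_config` is illustrative and not part of the SageMaker API.

```python
VALID_MEMORY_SIZES_MB = (1024, 2048, 3072, 4096, 5120, 6144)

def validate_serverless_config(memory_size_in_mb, max_concurrency,
                               provisioned_concurrency=None):
    """Check ServerlessConfig values against the documented serverless limits."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(f"MemorySizeInMB must be one of {VALID_MEMORY_SIZES_MB}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("MaxConcurrency must be between 1 and 200")
    if provisioned_concurrency is not None:
        if not 1 <= provisioned_concurrency <= max_concurrency:
            raise ValueError("ProvisionedConcurrency must be between 1 and MaxConcurrency")

    config = {"MemorySizeInMB": memory_size_in_mb, "MaxConcurrency": max_concurrency}
    if provisioned_concurrency is not None:
        config["ProvisionedConcurrency"] = provisioned_concurrency
    return config

# Matches the ServerlessConfig in the example request above:
# validate_serverless_config(2048, 20, 10)
```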

## To create an endpoint configuration (using the console)
<a name="serverless-endpoints-create-config-console"></a>

1. Sign in to the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home).

1. In the navigation tab, choose **Inference**.

1. Next, choose **Endpoint configurations**.

1. Choose **Create endpoint configuration**.

1. For **Endpoint configuration name**, enter a name that is unique within your account in a Region.

1. For **Type of endpoint**, select **Serverless**.  
![\[Screenshot of the endpoint type option in the console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/serverless-endpoints-endpoint-config.png)

1. For **Production variants**, choose **Add model**.

1. Under **Add model**, select the model you want to use from the list of models and then choose **Save**.

1. After adding your model, under **Actions**, choose **Edit**.

1. For **Memory size**, choose the memory size you want in GB.  
![\[Screenshot of the memory size option in the console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/serverless-endpoints-endpoint-config-2.png)

1. For **Max Concurrency**, enter your desired maximum concurrent invocations for the endpoint. The maximum value you can enter is 200 and the minimum is 1.

1. (Optional) To use Provisioned Concurrency, enter the desired number of concurrent invocations in the **Provisioned Concurrency setting** field. The number of provisioned concurrent invocations must be less than or equal to the number of maximum concurrent invocations.

1. Choose **Save**.

1. (Optional) For **Tags**, enter key-value pairs if you want to create metadata for your endpoint configuration.

1. Choose **Create endpoint configuration**.

# Create an endpoint
<a name="serverless-endpoints-create-endpoint"></a>

To create a serverless endpoint, you can use the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home), the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API, or the AWS CLI. The API and console approaches are outlined in the following sections. Once you create your endpoint, it can take a few minutes for the endpoint to become available.

## To create an endpoint (using API)
<a name="serverless-endpoints-create-endpoint-api"></a>

The following example uses the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) to call the [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) API. Specify the following values:
+ For `EndpointName`, enter a name for the endpoint that is unique within a Region in your account.
+ For `EndpointConfigName`, use the name of the endpoint configuration that you created in the previous section.

```
response = client.create_endpoint(
    EndpointName="<your-endpoint-name>",
    EndpointConfigName="<your-endpoint-config>"
)
```
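The endpoint is ready to serve requests once its status is `InService`. You can poll the [DescribeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html) API until then; Boto3 also provides a built-in `endpoint_in_service` waiter. The helper below is a minimal polling sketch (the function name `wait_for_endpoint` is illustrative, not part of the SageMaker API):

```python
import time

def wait_for_endpoint(describe_fn, poll_seconds=15, max_polls=80):
    """Poll until the endpoint leaves the Creating state.

    describe_fn should return the DescribeEndpoint response, for example:
        lambda: client.describe_endpoint(EndpointName="<your-endpoint-name>")
    """
    for _ in range(max_polls):
        status = describe_fn()["EndpointStatus"]
        if status != "Creating":
            return status  # "InService" on success, "Failed" otherwise
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not finish creating in time")

# Equivalent with the built-in Boto3 waiter:
#   client.get_waiter("endpoint_in_service").wait(EndpointName="<your-endpoint-name>")
```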

## To create an endpoint (using the console)
<a name="serverless-endpoints-create-endpoint-console"></a>

1. Sign in to the [Amazon SageMaker AI console](https://console.aws.amazon.com/sagemaker/home).

1. In the navigation tab, choose **Inference**.

1. Next, choose **Endpoints**.

1. Choose **Create endpoint**.

1. For **Endpoint name**, enter a name that is unique within a Region in your account.

1. For **Attach endpoint configuration**, select **Use an existing endpoint configuration**.

1. For **Endpoint configuration**, select the name of the endpoint configuration you created in the previous section and then choose **Select endpoint configuration**.

1. (Optional) For **Tags**, enter key-value pairs if you want to create metadata for your endpoint.

1. Choose **Create endpoint**.  
![\[Screenshot of the create and configure endpoint page in the console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/serverless-endpoints-create.png)
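After the endpoint status shows as **InService**, you can send it a test request with the [InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html) API of the SageMaker Runtime client. The following is a minimal sketch assuming a CSV-serialized payload, which SageMaker AI-provided XGBoost containers accept; adjust `ContentType` and the payload format to match your container. The `csv_payload` helper is illustrative, not part of the SageMaker API.

```python
def csv_payload(rows):
    """Serialize feature rows into the text/csv body that InvokeEndpoint expects."""
    return "".join(",".join(str(v) for v in row) + "\n" for row in rows).encode("utf-8")

# Usage (requires AWS credentials and an InService endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       EndpointName="<your-endpoint-name>",
#       ContentType="text/csv",
#       Body=csv_payload([[1.0, 2.0, 3.0]]),  # example feature row
#   )
#   print(response["Body"].read())
```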