Custom Containers Contract for Multi-Model Endpoints
To handle multiple models, your container must support a set of APIs that enable Amazon SageMaker AI to communicate with the container for loading, listing, getting, and unloading models as required. The model_name is the key input parameter in this set of APIs. The customer container is expected to keep track of the loaded models using model_name as the mapping key. The model_name is an opaque identifier and is not necessarily the value of the TargetModel parameter passed into the InvokeEndpoint API. The original TargetModel value in the InvokeEndpoint request is passed to the container in the APIs as an X-Amzn-SageMaker-Target-Model header that can be used for logging purposes.
Note
Multi-model endpoints for GPU-backed instances are currently supported only with SageMaker AI's NVIDIA Triton Inference Server container. This container already implements the contract defined below. Customers can use this container directly with their multi-model GPU endpoints, without any additional work.
You can configure the following APIs on your containers for CPU-backed multi-model endpoints.
Load Model API
Instructs the container to load a particular model present in the url field of the body into the memory of the customer container and to keep track of it with the assigned model_name. After a model is loaded, the container should be ready to serve inference requests using this model_name.
POST /models HTTP/1.1
Content-Type: application/json
Accept: application/json

{
    "model_name" : "{model_name}",
    "url" : "/opt/ml/models/{model_name}/model"
}
Note
If model_name is already loaded, this API should return 409. Any time a model cannot be loaded due to lack of memory or any other resource, this API should return a 507 HTTP status code to SageMaker AI, which then initiates unloading unused models to reclaim resources.
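The status-code behavior described above can be sketched with a small in-memory registry keyed by model_name. This is only an illustration under stated assumptions: the ModelRegistry class, its load method, and the loader callback are hypothetical names, the 200 success code is assumed (the contract text does not state it), and treating MemoryError as the out-of-resources signal is a simplification.

```python
from typing import Any, Callable, Dict, Tuple


class ModelRegistry:
    """Hypothetical in-memory registry that tracks loaded models by model_name."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # Assumption: `loader` reads model artifacts from a local path.
        self._loader = loader
        self._models: Dict[str, Tuple[str, Any]] = {}

    def load(self, body: Dict[str, str]) -> int:
        """Handle the JSON body of POST /models and return an HTTP status code."""
        name, url = body["model_name"], body["url"]
        if name in self._models:
            return 409  # model_name is already loaded
        try:
            model = self._loader(url)
        except MemoryError:
            # Assumption: MemoryError stands in for any resource shortage.
            return 507
        self._models[name] = (url, model)
        return 200  # assumed success code
```

Returning 507 rather than a generic error matters because it is the signal that prompts SageMaker AI to start unloading unused models.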
List Model API
Returns the list of models loaded into the memory of the customer container.
GET /models HTTP/1.1
Accept: application/json

Response =
{
    "models": [
        {
            "modelName" : "{model_name}",
            "modelUrl" : "/opt/ml/models/{model_name}/model"
        },
        {
            "modelName" : "{model_name}",
            "modelUrl" : "/opt/ml/models/{model_name}/model"
        },
        ....
    ]
}
This API also supports pagination.
SageMaker AI can initially call the List Models API without providing a value for next_page_token. If a nextPageToken field is returned as part of the response, it is provided as the value for next_page_token in a subsequent List Models call. If a nextPageToken is not returned, there are no more models to return.
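The token exchange described above can be sketched as follows. The function name list_models, the page_size parameter, and the encoding of the token as a stringified offset are all assumptions made for illustration; the contract only requires that a returned nextPageToken be accepted back as next_page_token.

```python
from typing import Dict, List, Optional


def list_models(loaded: Dict[str, str],
                next_page_token: Optional[str] = None,
                page_size: int = 2) -> Dict[str, object]:
    """Return one page of loaded models in the List Models response shape.

    `loaded` maps model_name -> model URL. The token is a stringified
    offset into the sorted model names (an illustrative encoding only).
    """
    names: List[str] = sorted(loaded)
    start = int(next_page_token) if next_page_token else 0
    page = names[start:start + page_size]
    response: Dict[str, object] = {
        "models": [{"modelName": n, "modelUrl": loaded[n]} for n in page]
    }
    if start + page_size < len(names):
        # More models remain, so hand back a token for the next call.
        response["nextPageToken"] = str(start + page_size)
    return response
```

A caller would keep passing the returned nextPageToken back as next_page_token until a response arrives without that field.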
Get Model API
This is a simple read API on the model_name entity.
GET /models/{model_name} HTTP/1.1
Accept: application/json

Response =
{
    "modelName" : "{model_name}",
    "modelUrl" : "/opt/ml/models/{model_name}/model"
}
Note
If model_name is not loaded, this API should return 404.
Unload Model API
The SageMaker AI platform calls this API to instruct the customer container to unload a model from memory. The platform picks a candidate model to evict when it starts the process of loading a new model. The container should have reclaimed the resources provisioned to model_name by the time this API returns a response.
DELETE /models/{model_name}
Note
If model_name is not loaded, this API should return 404.
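The Get Model and Unload Model APIs share the same not-loaded behavior, which can be sketched as below. The function names get_model and unload_model, the plain dictionary used as the registry, and the 200 success code are assumptions for illustration and are not part of the contract text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: model_name -> model URL.
LoadedModels = Dict[str, str]


def get_model(loaded: LoadedModels,
              model_name: str) -> Tuple[int, Optional[Dict[str, str]]]:
    """Sketch of GET /models/{model_name}: returns (status, body)."""
    if model_name not in loaded:
        return 404, None
    return 200, {"modelName": model_name, "modelUrl": loaded[model_name]}


def unload_model(loaded: LoadedModels, model_name: str) -> int:
    """Sketch of DELETE /models/{model_name}: returns an HTTP status code."""
    if model_name not in loaded:
        return 404
    # Drop the registry entry. A real container would also release the
    # model's memory here, before the response is sent.
    del loaded[model_name]
    return 200
```

Releasing resources before responding reflects the requirement that they be reclaimed by the time the Unload Model API returns.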
Invoke Model API
Requests a prediction from the particular model_name supplied. The SageMaker AI Runtime InvokeEndpoint request supports X-Amzn-SageMaker-Target-Model as a new header that takes the relative path of the model specified for invocation. The SageMaker AI system constructs the absolute path of the model by combining the prefix that is provided as part of the CreateModel API call with the relative path of the model.
POST /models/{model_name}/invoke HTTP/1.1
Content-Type: ContentType
Accept: Accept
X-Amzn-SageMaker-Custom-Attributes: CustomAttributes
X-Amzn-SageMaker-Target-Model: [relativePath]/{artifactName}.tar.gz
Note
If model_name is not loaded, this API should return 404.
Additionally, on GPU instances, if InvokeEndpoint fails due to a lack of memory or other resources, this API should return a 507 HTTP status code to SageMaker AI, which then initiates unloading unused models to reclaim resources.
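A minimal sketch of the invoke dispatch, assuming each loaded model is exposed as a callable from request bytes to response bytes. The invoke function, the Predictor type, and the use of MemoryError as the resource-failure signal are hypothetical. The header lookup is case-sensitive here for brevity, whereas real HTTP header names are case-insensitive.

```python
import logging
from typing import Callable, Dict, Mapping, Tuple

logger = logging.getLogger(__name__)

# Hypothetical predictor type: raw request payload in, raw response out.
Predictor = Callable[[bytes], bytes]


def invoke(models: Dict[str, Predictor], model_name: str,
           headers: Mapping[str, str], payload: bytes) -> Tuple[int, bytes]:
    """Sketch of POST /models/{model_name}/invoke: returns (status, body)."""
    # The original TargetModel value is used only for logging; routing
    # relies on the opaque model_name from the request path.
    target = headers.get("X-Amzn-SageMaker-Target-Model", "<unknown>")
    logger.info("invoke model_name=%s target_model=%s", model_name, target)

    predictor = models.get(model_name)
    if predictor is None:
        return 404, b""  # model_name is not loaded
    try:
        return 200, predictor(payload)
    except MemoryError:
        # Assumption: MemoryError stands in for any resource shortage.
        return 507, b""
```

Keeping the lookup on model_name rather than on the header value follows the point that model_name is an opaque identifier and may differ from TargetModel.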