Create a multi-container endpoint by calling the CreateModel, CreateEndpointConfig, and CreateEndpoint APIs, as you would for any other endpoint. You can run these containers sequentially as an inference pipeline, or run each individual container by using direct invocation. Multi-container endpoints have the following requirements when you call create_model:
- Use the Containers parameter instead of PrimaryContainer, and include more than one container in the Containers parameter.
- The ContainerHostname parameter is required for each container in a multi-container endpoint with direct invocation.
- Set the Mode parameter of the InferenceExecutionConfig field to Direct for direct invocation of each container, or Serial to use containers as an inference pipeline. The default mode is Serial.
Note
A multi-container endpoint currently supports up to 15 containers.
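The requirements above can be checked locally before calling create_model. The following is a minimal sketch, not part of the SageMaker API: the function name validate_multi_container_request and its return format are hypothetical, and it checks only the rules listed in this section.

```python
MAX_CONTAINERS = 15  # limit stated in this section
VALID_MODES = {"Direct", "Serial"}


def validate_multi_container_request(containers, inference_execution_config=None):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # Mode defaults to Serial when InferenceExecutionConfig is omitted.
    mode = (inference_execution_config or {}).get("Mode", "Serial")
    if mode not in VALID_MODES:
        problems.append(f"Mode must be 'Direct' or 'Serial', got {mode!r}")
    if len(containers) < 2:
        problems.append("Containers must include more than one container")
    if len(containers) > MAX_CONTAINERS:
        problems.append(f"At most {MAX_CONTAINERS} containers are supported")
    if mode == "Direct":
        # Direct invocation needs a hostname on every container.
        for index, container in enumerate(containers):
            if not container.get("ContainerHostname"):
                problems.append(f"Container {index} is missing ContainerHostname")
    return problems


containers = [
    {
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage1:mytag",
        "ContainerHostname": "firstContainer",
    },
    {"Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage2:mytag"},
]
print(validate_multi_container_request(containers, {"Mode": "Direct"}))
```

Here the second container has no ContainerHostname, so the helper reports one problem for Direct mode. The same list passes when the mode is Serial.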
The following example creates a multi-container model for direct invocation.
- Create container elements and InferenceExecutionConfig with direct invocation.

    container1 = {
        'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage1:mytag',
        'ContainerHostname': 'firstContainer'
    }

    container2 = {
        'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/myimage2:mytag',
        'ContainerHostname': 'secondContainer'
    }

    inferenceExecutionConfig = {'Mode': 'Direct'}
- Create the model with the container elements and set the InferenceExecutionConfig field.

    import boto3

    sm_client = boto3.Session().client('sagemaker')

    # role must hold the ARN of the IAM execution role for the model;
    # it is not defined in this snippet.
    response = sm_client.create_model(
        ModelName='my-direct-mode-model-name',
        InferenceExecutionConfig=inferenceExecutionConfig,
        ExecutionRoleArn=role,
        Containers=[container1, container2]
    )
To create an endpoint, you would then call create_endpoint_config and create_endpoint, as you would for any other endpoint.
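The remaining calls for creating the endpoint might look like the following sketch. The wrapper function create_endpoint_for_model, the variant name, the instance type, and the endpoint and configuration names are assumptions chosen for illustration. Only create_endpoint_config and create_endpoint are boto3 SageMaker client methods.

```python
def create_endpoint_for_model(sm_client, model_name, endpoint_config_name,
                              endpoint_name, instance_type="ml.m5.xlarge",
                              initial_instance_count=1):
    """Hypothetical wrapper: create an endpoint config, then the endpoint."""
    # The endpoint configuration points a production variant at the model.
    sm_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # assumed variant name
            "ModelName": model_name,
            "InitialInstanceCount": initial_instance_count,
            "InstanceType": instance_type,  # assumed instance type
        }],
    )
    # The endpoint itself references the configuration by name.
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )


# Example usage (requires AWS credentials; names are placeholders):
# import boto3
# sm_client = boto3.Session().client('sagemaker')
# create_endpoint_for_model(sm_client, 'my-direct-mode-model-name',
#                           'my-endpoint-config', 'my-endpoint')
```

Passing the client in as an argument keeps the function independent of how the session is created, so the same code works with any configured boto3 SageMaker client.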