Supported algorithms, frameworks, and instances for multi-model endpoints

For information about the algorithms, frameworks, and instance types that you can use with multi-model endpoints, see the following sections.

Supported algorithms, frameworks, and instances for multi-model endpoints using CPU-backed instances

The inference containers for several SageMaker AI built-in algorithms and deep learning frameworks support multi-model endpoints.

To use any other framework or algorithm, use the SageMaker AI inference toolkit to build a container that supports multi-model endpoints. For information, see Build Your Own Container for SageMaker AI Multi-Model Endpoints.
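As a rough sketch of what the inference toolkit expects, the handler below follows the MMS-style contract used in the build-your-own-container guide: a module-level handle(data, context) entry point that lazily initializes the model. The file name, the pickle artifact name, and the CSV parsing are illustrative assumptions, not part of the toolkit's API.

```python
# model_handler.py - minimal sketch of an MMS-style handler for the
# SageMaker AI inference toolkit (sagemaker-inference). The artifact
# name (model.pkl) and CSV payload format are placeholder assumptions.
import os
import pickle


class ModelHandler:
    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # The server passes the directory of the model being loaded; on a
        # multi-model endpoint each loaded model gets its own handler instance.
        model_dir = context.system_properties.get("model_dir")
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)
        self.initialized = True

    def handle(self, data, context):
        # The server may deliver a batch of requests; return one response
        # per request, in order.
        responses = []
        for item in data:
            rows = [
                [float(x) for x in line.split(",")]
                for line in item["body"].decode("utf-8").splitlines()
            ]
            predictions = self.model.predict(rows)
            responses.append(",".join(str(p) for p in predictions))
        return responses


_service = ModelHandler()


def handle(data, context):
    # Module-level entry point that the multi-model server invokes.
    if not _service.initialized:
        _service.initialize(context)
    return _service.handle(data, context)
```

The container's entrypoint would then start the server with sagemaker_inference.model_server.start_model_server, pointing its handler_service argument at this module; the build-your-own-container guide covers the full Dockerfile.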

Multi-model endpoints support all of the CPU instance types.
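For example, deploying a multi-model endpoint on a CPU instance comes down to setting the container's Mode to MultiModel and pointing ModelDataUrl at an S3 prefix that holds the model archives. The image URI, S3 prefix, role ARN, and resource names below are placeholders.

```python
# Sketch: create and invoke a multi-model endpoint on a CPU instance
# with boto3. All names, ARNs, and URIs below are placeholders.
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

sm.create_model(
    ModelName="my-multi-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/my-mme-container:latest",
        "Mode": "MultiModel",  # this is what enables multi-model hosting
        "ModelDataUrl": "s3://my-bucket/models/",  # prefix holding model .tar.gz archives
    },
)

sm.create_endpoint_config(
    EndpointConfigName="my-mme-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-multi-model",
            "InstanceType": "ml.m5.xlarge",  # any CPU instance type
            "InitialInstanceCount": 1,
        }
    ],
)

sm.create_endpoint(EndpointName="my-mme", EndpointConfigName="my-mme-config")

# Once the endpoint is InService, route each request to a specific model
# archive (relative to ModelDataUrl) with TargetModel.
response = runtime.invoke_endpoint(
    EndpointName="my-mme",
    TargetModel="model-a.tar.gz",
    ContentType="text/csv",
    Body="1.0,2.0,3.0",
)
print(response["Body"].read())
```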

Supported algorithms, frameworks, and instances for multi-model endpoints using GPU-backed instances

Hosting multiple GPU-backed models on multi-model endpoints is supported through the SageMaker AI Triton Inference Server. Triton supports all major inference frameworks, such as NVIDIA® TensorRT™, PyTorch, MXNet, Python, ONNX, XGBoost, scikit-learn, RandomForest, OpenVINO, custom C++, and more.

To use any other framework or algorithm, you can use the Triton Python or C++ backend to write your model logic and serve any custom model. After the server is ready, you can deploy hundreds of deep learning models behind one endpoint.
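As a hedged sketch of the Python-backend route, the model below implements the interface Triton's Python backend expects: a TritonPythonModel class with initialize, execute, and finalize. The tensor names INPUT0/OUTPUT0 and the trivial compute are illustrative assumptions.

```python
# model.py - minimal sketch of a Triton Python-backend model. The tensor
# names and the doubling logic are placeholders; the TritonPythonModel
# class with initialize/execute/finalize is the required interface.
import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] is the model's config.pbtxt as a JSON string.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # Triton may batch several inference requests into one call;
        # return exactly one response per request, in order.
        responses = []
        for request in requests:
            input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            result = input0.as_numpy() * 2.0  # placeholder model logic
            output0 = pb_utils.Tensor("OUTPUT0", result)
            responses.append(pb_utils.InferenceResponse(output_tensors=[output0]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded; release resources here.
        pass
```

Each such model (the model.py plus its Triton config.pbtxt, in the Triton model-repository layout) is packaged as its own .tar.gz under the endpoint's S3 prefix, so the endpoint can load and evict models on demand.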

Multi-model endpoints support the following GPU instance types:

Instance family   Instance type      vCPUs   GiB of memory per vCPU   GPUs   GPU memory (GiB)
p2                ml.p2.xlarge       4       15.25                    1      12
p3                ml.p3.2xlarge      8       7.62                     1      16
g5                ml.g5.xlarge       4       4                        1      24
g5                ml.g5.2xlarge      8       4                        1      24
g5                ml.g5.4xlarge      16      4                        1      24
g5                ml.g5.8xlarge      32      4                        1      24
g5                ml.g5.16xlarge     64      4                        1      24
g4dn              ml.g4dn.xlarge     4       4                        1      16
g4dn              ml.g4dn.2xlarge    8       4                        1      16
g4dn              ml.g4dn.4xlarge    16      4                        1      16
g4dn              ml.g4dn.8xlarge    32      4                        1      16
g4dn              ml.g4dn.16xlarge   64      4                        1      16