
Advanced endpoint options for inference with Amazon SageMaker AI


With real-time inference, you can further optimize for performance and cost with the following advanced inference options:

  • Multi-model endpoints – Use this option if you have multiple models that use the same framework and can share a container. This option helps you optimize costs by improving endpoint utilization and reducing deployment overhead.

  • Multi-container endpoints – Use this option if you have multiple models that use different frameworks and require their own containers. You get many of the benefits of multi-model endpoints and can deploy a variety of frameworks and models.

  • Serial inference pipelines – Use this option if you want to host models with pre-processing and post-processing logic behind an endpoint. Inference pipelines are fully managed by SageMaker AI and provide lower latency because all of the containers are hosted on the same Amazon EC2 instance.
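At invocation time, the first two options differ in a single request parameter to the SageMaker Runtime `invoke_endpoint` API: multi-model endpoints route by `TargetModel` (the model artifact to load), while multi-container endpoints in direct invocation mode route by `TargetContainerHostname`. The sketch below builds the request parameters as plain dicts so the difference is visible; the endpoint, model, and container names are hypothetical placeholders, not values from this page.

```python
import json


def mme_request(endpoint_name, model_artifact, payload):
    """Request parameters for one model on a multi-model endpoint.

    TargetModel names the model artifact (relative to the endpoint's
    S3 model prefix) that should serve this request.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "TargetModel": model_artifact,  # e.g. "model-a.tar.gz" (hypothetical)
        "Body": json.dumps(payload),
    }


def mce_request(endpoint_name, container_hostname, payload):
    """Request parameters for a multi-container endpoint in direct
    invocation mode; TargetContainerHostname picks the container.
    """
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "TargetContainerHostname": container_hostname,
        "Body": json.dumps(payload),
    }


# In practice, these kwargs would be passed to boto3's SageMaker
# Runtime client, e.g.:
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**mme_request(...))
```

Keeping the request construction separate from the network call makes it easy to see that both options share one endpoint; only the routing parameter changes per request.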
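A serial inference pipeline is defined through the SageMaker `create_model` API: `Containers` is an ordered list, and each request flows through the containers in that order (for example, pre-process, then predict, then post-process) on the same instance. The helper below builds those parameters as a plain dict; the model name, role ARN, and image URIs are hypothetical placeholders.

```python
def pipeline_model_definition(model_name, role_arn, containers):
    """Parameters for SageMaker's create_model API describing a serial
    inference pipeline.

    `containers` is an ordered list of dicts with an "image" URI and an
    optional "model_data" S3 URL (a pure pre/post-processing container
    may have no model artifact).
    """
    container_defs = []
    for c in containers:
        definition = {"Image": c["image"]}
        if "model_data" in c:
            definition["ModelDataUrl"] = c["model_data"]
        container_defs.append(definition)
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": container_defs,  # order defines the pipeline
    }


# These kwargs would be passed to boto3's SageMaker client, e.g.:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**pipeline_model_definition(...))
```

Because the container order defines the execution order, the same images can be recombined into different pipelines without changing any container code.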

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.