MXNet Elastic Inference with SageMaker

With Amazon Elastic Inference, you can increase the throughput and decrease the latency of real-time inferences from deep learning models deployed as Amazon SageMaker hosted models, at a fraction of the cost of using a GPU instance for your endpoint.

For more information, see the Amazon SageMaker Elastic Inference documentation.
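
As a rough illustration of what attaching an accelerator to a hosted model can look like, the sketch below builds the request body for the SageMaker `CreateEndpointConfig` API, where the `AcceleratorType` field of a production variant names the Elastic Inference accelerator. The helper name `build_endpoint_config`, the model name, the `-ei-config` suffix, and the default instance and accelerator sizes are assumptions chosen for the example, not values taken from this page.

```python
def build_endpoint_config(model_name,
                          instance_type="ml.m5.xlarge",
                          accelerator_type="ml.eia2.medium"):
    """Return a CreateEndpointConfig request dict with an EI accelerator.

    The defaults are illustrative assumptions; pick sizes that match
    your model and the accelerator types available in your Region.
    """
    return {
        # Hypothetical naming scheme for the endpoint configuration.
        "EndpointConfigName": f"{model_name}-ei-config",
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                # CPU instance that hosts the model container.
                "InstanceType": instance_type,
                # Elastic Inference accelerator attached to that instance.
                "AcceleratorType": accelerator_type,
            }
        ],
    }


# "my-mxnet-model" is a placeholder for a model already created in SageMaker.
request = build_endpoint_config("my-mxnet-model")
```

With AWS credentials configured, a dict like this could be passed to `boto3.client("sagemaker").create_endpoint_config(**request)`. The sketch only constructs the request and does not call AWS.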
