MXNet Elastic Inference with SageMaker
With Amazon Elastic Inference, you can increase the throughput and decrease the latency of real-time inferences from deep learning models deployed as Amazon SageMaker hosted models, at a fraction of the cost of using a GPU instance for your endpoint.
For more information, see the Amazon SageMaker Elastic Inference documentation.
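As a rough illustration of attaching an accelerator to an endpoint, the sketch below assembles the arguments for a model's `deploy` call. It assumes the SageMaker Python SDK, where `deploy` accepts `initial_instance_count`, `instance_type`, and `accelerator_type`. The helper names `build_deploy_kwargs` and `deploy_with_elastic_inference`, and the default instance and accelerator sizes, are hypothetical choices made for this example and are not taken from the documentation.

```python
from typing import Any, Dict


def build_deploy_kwargs(instance_type: str = "ml.m4.xlarge",
                        accelerator_type: str = "ml.eia1.medium",
                        initial_instance_count: int = 1) -> Dict[str, Any]:
    """Assemble keyword arguments for a SageMaker model's deploy() call.

    The defaults are illustrative assumptions: a CPU host instance with a
    small Elastic Inference accelerator attached to it.
    """
    # Elastic Inference accelerator names start with "ml.eia" (for example
    # ml.eia1.medium or ml.eia2.large); reject anything else early.
    if not accelerator_type.startswith("ml.eia"):
        raise ValueError(
            f"{accelerator_type!r} does not look like an Elastic Inference accelerator type"
        )
    if initial_instance_count < 1:
        raise ValueError("initial_instance_count must be at least 1")
    return {
        "initial_instance_count": initial_instance_count,
        "instance_type": instance_type,
        "accelerator_type": accelerator_type,
    }


def deploy_with_elastic_inference(model: Any, **overrides: Any) -> Any:
    """Call model.deploy() with an accelerator attached.

    `model` is assumed to expose a SageMaker-style deploy() method,
    for example a sagemaker.mxnet.MXNetModel instance.
    """
    return model.deploy(**build_deploy_kwargs(**overrides))
```

With the SDK installed and AWS credentials configured, `model` would typically be an MXNet model object built from your own model artifact, IAM role, and entry-point script. Which framework versions and regions support Elastic Inference is best checked in the documentation referenced above.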