Deploy a Model

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model using Amazon SageMaker AI hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models onto ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances.

For Inferentia and Trainium instances, models need to be compiled specifically for those instances. Models compiled for other instance types are not guaranteed to work with Inferentia or Trainium instances.
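The Inferentia-specific compilation requirement is expressed through the compilation job's target device. The following is a minimal sketch of a `CreateCompilationJob` request shape that targets ml.inf1; all names, ARNs, and S3 URIs are placeholders, and the actual call (shown in a comment) requires AWS credentials:

```python
# Sketch: a Neo compilation job request targeting ml.inf1 (Inferentia).
# All names, the role ARN, and the S3 paths are placeholders.
# The real call would be:
#   boto3.client("sagemaker").create_compilation_job(**compilation_job)
compilation_job = {
    "CompilationJobName": "my-model-inf1",                  # placeholder name
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    "InputConfig": {
        "S3Uri": "s3://my-bucket/model.tar.gz",             # placeholder model artifact
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}',  # framework-specific input shape
        "Framework": "PYTORCH",
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://my-bucket/compiled/",
        # The target device must match the instance family you deploy to:
        # an ml.inf1 endpoint requires a model compiled for ml_inf1.
        "TargetDevice": "ml_inf1",
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 900},
}
```

A model compiled with a different `TargetDevice` (for example, `ml_c5`) would not be guaranteed to load on an ml.inf1 endpoint.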

When you deploy a compiled model, the endpoint must use the same instance type as the target you specified for compilation. Deploying the model creates a SageMaker AI endpoint that you can use to perform inference. You can deploy a Neo-compiled model using any of the following: the SageMaker AI SDK for Python, the AWS SDK for Python (Boto3), the AWS Command Line Interface, or the SageMaker AI console.
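The hosting workflow is the usual three-step sequence: create a model, create an endpoint configuration, then create the endpoint. Below is a minimal sketch of the Boto3-style request shapes; every name, ARN, and URI is a placeholder (the inference image URI would come from the Neo Inference Container Images list for your framework and Region), and the actual API calls appear only as comments:

```python
# Sketch: deploying a Neo-compiled model via the low-level request shapes
# (CreateModel -> CreateEndpointConfig -> CreateEndpoint).
# All names, the role ARN, the image URI, and the S3 URL are placeholders.

create_model_request = {
    "ModelName": "my-neo-model",
    "PrimaryContainer": {
        # Placeholder: select the real URI from Neo Inference Container Images.
        "Image": "<neo-inference-image-uri>",
        "ModelDataUrl": "s3://my-bucket/compiled/model-ml_c5.tar.gz",
    },
    "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerRole",
}

# The endpoint instance type must match the compilation target (ml.c5 here).
create_endpoint_config_request = {
    "EndpointConfigName": "my-neo-endpoint-config",
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-neo-model",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}

create_endpoint_request = {
    "EndpointName": "my-neo-endpoint",
    "EndpointConfigName": "my-neo-endpoint-config",
}

# With Boto3, the deployment itself would be:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_request)
#   sm.create_endpoint_config(**create_endpoint_config_request)
#   sm.create_endpoint(**create_endpoint_request)
```

The key constraint to notice is that `InstanceType` in the production variant belongs to the same instance family (ml.c5) as the compiled artifact referenced by `ModelDataUrl`.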

Note

If you deploy your model using the AWS CLI, the console, or Boto3, see Neo Inference Container Images to select the inference image URI for your primary container.