Compiled recommendations with Neo
In Inference Recommender, you can compile your model with Neo and get endpoint recommendations for your compiled model. SageMaker Neo is a service that can optimize your model for a target hardware platform (that is, a specific instance type or environment). Optimizing a model with Neo might improve the performance of your hosted model.
For Neo-supported frameworks and containers, Inference Recommender automatically suggests Neo-optimized recommendations. To be eligible for Neo compilation, your input must meet the following prerequisites:
- You are using a SageMaker AI owned DLC or XGBoost container.
- You are using a framework version supported by Neo. For the framework versions supported by Neo, see Cloud Instances in the SageMaker Neo documentation.
- Neo requires that you provide a correct input data shape for your model. You can specify this data shape as the `DataInputConfig` in the `InferenceSpecification` when you create a model package. For information about the correct data shapes for each framework, see Prepare Model for Compilation in the SageMaker Neo documentation.

  The following example shows how to specify the `DataInputConfig` field in the `InferenceSpecification`, where `data_input_configuration` is a variable that contains the data shape in dictionary format (for example, `{'input':[1,1024,1024,3]}`).

  ```python
  "InferenceSpecification": {
      "Containers": [
          {
              "Image": dlc_uri,
              "Framework": framework.upper(),
              "FrameworkVersion": framework_version,
              "NearestModelName": model_name,
              "ModelInput": {"DataInputConfig": data_input_configuration},
          }
      ],
      "SupportedContentTypes": input_mime_types,  # required, must be non-null
      "SupportedResponseMIMETypes": [],
      "SupportedRealtimeInferenceInstanceTypes": supported_realtime_inference_types,  # optional
  }
  ```
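As a rough illustration of the third prerequisite, the following sketch assembles an `InferenceSpecification` dictionary from plain Python values. The helper name `build_inference_specification` and all argument values are hypothetical. Serializing the shape dictionary to a JSON string is an assumption about what the `DataInputConfig` field expects, so check it against the API reference before relying on it.

```python
import json


def build_inference_specification(image_uri, framework, framework_version,
                                  model_name, input_shape, content_types,
                                  instance_types=None):
    """Return an InferenceSpecification dict that carries a DataInputConfig.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    # Assumption: the data shape is passed as a JSON string,
    # for example '{"input": [1, 1024, 1024, 3]}'.
    data_input_configuration = json.dumps(input_shape)

    spec = {
        "Containers": [
            {
                "Image": image_uri,
                "Framework": framework.upper(),
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
                "ModelInput": {"DataInputConfig": data_input_configuration},
            }
        ],
        "SupportedContentTypes": content_types,  # required, must be non-null
        "SupportedResponseMIMETypes": [],
    }
    # The list of real-time instance types is optional, so add it only if given.
    if instance_types:
        spec["SupportedRealtimeInferenceInstanceTypes"] = instance_types
    return spec


# Placeholder values only; substitute your own image URI, framework, and shape.
example_spec = build_inference_specification(
    image_uri="<your-dlc-image-uri>",
    framework="tensorflow",
    framework_version="<framework-version>",
    model_name="<nearest-model-name>",
    input_shape={"input": [1, 1024, 1024, 3]},
    content_types=["application/json"],
)
```

The resulting dictionary would be passed as the `InferenceSpecification` argument when you create the model package.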
If these conditions are met in your request, then Inference Recommender runs scenarios for both compiled and uncompiled versions of your model, giving you multiple recommendation combinations to choose from. You can compare the configurations for compiled and uncompiled versions of the same inference recommendation and determine which one best suits your use case. The recommendations are ranked by cost per inference.
To get the Neo compilation recommendations, you don’t have to do any additional configuration besides making sure that your input meets the preceding requirements. Inference Recommender automatically runs Neo compilation on your model if your input meets the requirements, and you receive a response that includes Neo recommendations.
If you run into errors during your Neo compilation, see Troubleshoot Neo Compilation Errors.
The following table is an example of a response you might get from an Inference Recommender job that includes recommendations for compiled models. If the `InferenceSpecificationName` field is `None`, then the recommendation is for an uncompiled model. The last row, in which the value for the `InferenceSpecificationName` field is `neo-00011122-2333-4445-5566-677788899900`, is for a model compiled with Neo. The value in the field is the name of the Neo job used to compile and optimize your model.
EndpointName | InstanceType | InitialInstanceCount | EnvironmentParameters | CostPerHour | CostPerInference | MaxInvocations | ModelLatency | InferenceSpecificationName |
---|---|---|---|---|---|---|---|---|
sm-epc-example-000111222 | ml.c5.9xlarge | 1 | [] | 1.836 | 9.15E-07 | 33456 | 7 | None |
sm-epc-example-111222333 | ml.c5.2xlarge | 1 | [] | 0.408 | 2.11E-07 | 32211 | 21 | None |
sm-epc-example-222333444 | ml.c5.xlarge | 1 | [] | 0.204 | 1.86E-07 | 18276 | 92 | None |
sm-epc-example-333444555 | ml.c5.xlarge | 1 | [] | 0.204 | 1.60E-07 | 21286 | 42 | neo-00011122-2333-4445-5566-677788899900 |
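To make the comparison concrete, the following sketch ranks the example rows by cost per inference and separates compiled from uncompiled entries. The flat dictionary layout simply mirrors the table columns and is an assumption for illustration; it is not the shape of the service response, and the helper names are hypothetical.

```python
# Rows copied from the example table; the flat layout is an assumption
# made for illustration, not the service's response schema.
recommendations = [
    {"EndpointName": "sm-epc-example-000111222", "InstanceType": "ml.c5.9xlarge",
     "CostPerHour": 1.836, "CostPerInference": 9.15e-07, "MaxInvocations": 33456,
     "ModelLatency": 7, "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-111222333", "InstanceType": "ml.c5.2xlarge",
     "CostPerHour": 0.408, "CostPerInference": 2.11e-07, "MaxInvocations": 32211,
     "ModelLatency": 21, "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-222333444", "InstanceType": "ml.c5.xlarge",
     "CostPerHour": 0.204, "CostPerInference": 1.86e-07, "MaxInvocations": 18276,
     "ModelLatency": 92, "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-333444555", "InstanceType": "ml.c5.xlarge",
     "CostPerHour": 0.204, "CostPerInference": 1.60e-07, "MaxInvocations": 21286,
     "ModelLatency": 42,
     "InferenceSpecificationName": "neo-00011122-2333-4445-5566-677788899900"},
]


def is_compiled(rec):
    """A row that names an inference specification refers to a Neo-compiled model."""
    return rec["InferenceSpecificationName"] is not None


def rank_by_cost_per_inference(recs):
    """Return the rows ordered from cheapest to most expensive per inference."""
    return sorted(recs, key=lambda rec: rec["CostPerInference"])


def compiled_vs_uncompiled(recs, instance_type):
    """Split the rows for one instance type into (uncompiled, compiled) lists."""
    rows = [rec for rec in recs if rec["InstanceType"] == instance_type]
    uncompiled = [rec for rec in rows if not is_compiled(rec)]
    compiled = [rec for rec in rows if is_compiled(rec)]
    return uncompiled, compiled


ranked = rank_by_cost_per_inference(recommendations)
for rec in ranked:
    kind = "compiled" if is_compiled(rec) else "uncompiled"
    print(f"{rec['EndpointName']}  {rec['InstanceType']:<14} {kind:<10} "
          f"{rec['CostPerInference']:.2e}")
```

In this sample data, the compiled `ml.c5.xlarge` row has the lowest cost per inference (1.60E-07) and a lower model latency than the uncompiled row on the same instance type (42 versus 92).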
Get started
The general steps for creating an Inference Recommender job that includes Neo-optimized recommendations are as follows:
- Prepare your ML model for compilation. For more information, see Prepare Model for Compilation in the Neo documentation.
- Package your model in a model archive (`.tar.gz` file).
- Create a sample payload archive.
- Register your model in SageMaker Model Registry.
- Create an Inference Recommender job.
- View the results of the Inference Recommender job and choose a configuration.
- Debug compilation failures, if any. For more information, see Troubleshoot Neo Compilation Errors.
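The packaging and job-creation steps above can be sketched roughly as follows. This is a minimal illustration, not the workflow from the example notebook: the helper names, file names, and job parameters are hypothetical, and the request only covers the basic case of a default job that points at a registered model package version.

```python
import tarfile
from pathlib import Path


def make_archive(archive_path, files):
    """Write the given files into a gzip-compressed tar archive.

    Each file is stored at the top level of the archive under its base name.
    Hypothetical helper; works for the model archive and the payload archive.
    """
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for file_path in files:
            file_path = Path(file_path)
            tar.add(str(file_path), arcname=file_path.name)
    return Path(archive_path)


def build_job_request(job_name, role_arn, model_package_version_arn):
    """Build keyword arguments for an Inference Recommender job request.

    Assumption: a "Default" job type with the model package version ARN as
    the only input is enough for your use case.
    """
    return {
        "JobName": job_name,
        "JobType": "Default",
        "RoleArn": role_arn,
        "InputConfig": {"ModelPackageVersionArn": model_package_version_arn},
    }


# Example usage (not executed here; requires boto3 and AWS credentials):
#
#   import boto3
#   make_archive("model.tar.gz", ["xgboost-model"])      # hypothetical file names
#   make_archive("payload.tar.gz", ["sample-payload.csv"])
#   request = build_job_request("<job-name>", "<role-arn>", "<model-package-arn>")
#   boto3.client("sagemaker").create_inference_recommendations_job(**request)
```

Keeping the request in a plain dictionary makes it easy to inspect before sending it. After the job finishes, you would retrieve the results and compare the compiled and uncompiled recommendations.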
For an example that demonstrates the previous workflow and how to get
Neo-optimized recommendations using XGBoost, see the following example notebook