Compiled recommendations with Neo

In Inference Recommender, you can compile your model with Neo and get endpoint recommendations for your compiled model. SageMaker Neo is a service that can optimize your model for a target hardware platform (that is, a specific instance type or environment). Optimizing a model with Neo might improve the performance of your hosted model.

For Neo-supported frameworks and containers, Inference Recommender automatically suggests Neo-optimized recommendations. To be eligible for Neo compilation, your input must meet the following prerequisites:

You are using a SageMaker AI owned DLC or XGBoost container.
You are using a framework version supported by Neo. For the framework versions supported by Neo, see Cloud Instances in the SageMaker Neo documentation.
You provide a correct input data shape for your model, which Neo requires. Specify this data shape as the DataInputConfig in the InferenceSpecification when you create a model package. For information about the correct data shapes for each framework, see Prepare Model for Compilation in the SageMaker Neo documentation.

The following example shows how to specify the DataInputConfig field in the InferenceSpecification, where data_input_configuration is a variable that contains the data shape in dictionary format (for example, {'input':[1,1024,1024,3]}).
```
"InferenceSpecification": {
        "Containers": [
            {
                "Image": dlc_uri,
                "Framework": framework.upper(),
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
                "ModelInput": {"DataInputConfig": data_input_configuration},
            }
        ],
        "SupportedContentTypes": input_mime_types,  # required, must be non-null
        "SupportedResponseMIMETypes": [],
        "SupportedRealtimeInferenceInstanceTypes": supported_realtime_inference_types,  # optional
    }
```
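
As a minimal sketch, the fragment above can be assembled by a small helper. The function name build_inference_specification, the placeholder values, and the application/json content type are assumptions for illustration only. The sketch also assumes that DataInputConfig is passed as a JSON string, so it serializes the shape dictionary with json.dumps. It only builds the dictionary and does not call SageMaker.

```python
import json


def build_inference_specification(dlc_uri, framework, framework_version,
                                  model_name, input_shape, content_types,
                                  instance_types=None):
    """Return an InferenceSpecification dict with DataInputConfig set.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    # Assumption: DataInputConfig is sent as a JSON string, so the shape
    # dictionary is serialized instead of being nested directly.
    data_input_configuration = json.dumps(input_shape)
    spec = {
        "Containers": [
            {
                "Image": dlc_uri,
                "Framework": framework.upper(),
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
                "ModelInput": {"DataInputConfig": data_input_configuration},
            }
        ],
        "SupportedContentTypes": list(content_types),  # required, non-null
        "SupportedResponseMIMETypes": [],
    }
    # The instance type list is optional, so add it only when provided.
    if instance_types:
        spec["SupportedRealtimeInferenceInstanceTypes"] = list(instance_types)
    return spec


# Placeholder values; replace them with your own image URI, version, and name.
spec = build_inference_specification(
    dlc_uri="<your-dlc-image-uri>",
    framework="tensorflow",
    framework_version="<neo-supported-version>",
    model_name="<nearest-model-name>",
    input_shape={"input": [1, 1024, 1024, 3]},
    content_types=["application/json"],
)
print(spec["Containers"][0]["ModelInput"])
```

The returned dictionary could then be passed as the InferenceSpecification value when you create the model package.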

If these conditions are met in your request, then Inference Recommender runs scenarios for both compiled and uncompiled versions of your model, giving you multiple recommendation combinations to choose from. You can compare the configurations for compiled and uncompiled versions of the same inference recommendation and determine which one best suits your use case. The recommendations are ranked by cost per inference.

To get the Neo compilation recommendations, you don’t have to do any additional configuration besides making sure that your input meets the preceding requirements. Inference Recommender automatically runs Neo compilation on your model if your input meets the requirements, and you receive a response that includes Neo recommendations.

If you run into errors during your Neo compilation, see Troubleshoot Neo Compilation Errors.

The following table is an example of a response you might get from an Inference Recommender job that includes recommendations for compiled models. If the InferenceSpecificationName field is None, then the recommendation is for an uncompiled model. The last row, in which the value for the InferenceSpecificationName field is neo-00011122-2333-4445-5566-677788899900, is for a model compiled with Neo. In that case, the value in the field is the name of the Neo job used to compile and optimize your model.

EndpointName	InstanceType	InitialInstanceCount	EnvironmentParameters	CostPerHour	CostPerInference	MaxInvocations	ModelLatency	InferenceSpecificationName
sm-epc-example-000111222	ml.c5.9xlarge	1	[]	1.836	9.15E-07	33456	7	None
sm-epc-example-111222333	ml.c5.2xlarge	1	[]	0.408	2.11E-07	32211	21	None
sm-epc-example-222333444	ml.c5.xlarge	1	[]	0.204	1.86E-07	18276	92	None
sm-epc-example-333444555	ml.c5.xlarge	1	[]	0.204	1.60E-07	21286	42	neo-00011122-2333-4445-5566-677788899900
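
The comparison between compiled and uncompiled recommendations can be sketched in Python. The rows below copy selected columns from the example table. Holding them as a list of dictionaries is an assumption made for illustration and is not the shape of the actual API response.

```python
# Selected columns from the example table above, one dict per row.
rows = [
    {"EndpointName": "sm-epc-example-000111222", "InstanceType": "ml.c5.9xlarge",
     "CostPerInference": 9.15e-07, "ModelLatency": 7,
     "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-111222333", "InstanceType": "ml.c5.2xlarge",
     "CostPerInference": 2.11e-07, "ModelLatency": 21,
     "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-222333444", "InstanceType": "ml.c5.xlarge",
     "CostPerInference": 1.86e-07, "ModelLatency": 92,
     "InferenceSpecificationName": None},
    {"EndpointName": "sm-epc-example-333444555", "InstanceType": "ml.c5.xlarge",
     "CostPerInference": 1.60e-07, "ModelLatency": 42,
     "InferenceSpecificationName": "neo-00011122-2333-4445-5566-677788899900"},
]


def is_compiled(row):
    """A row describes a Neo-compiled model when the name field is set."""
    return row["InferenceSpecificationName"] is not None


compiled = [r for r in rows if is_compiled(r)]
uncompiled = [r for r in rows if not is_compiled(r)]

# Pick the recommendation with the lowest cost per inference overall.
cheapest = min(rows, key=lambda r: r["CostPerInference"])

print(len(compiled), "compiled,", len(uncompiled), "uncompiled")
print("Lowest cost per inference:", cheapest["EndpointName"])
```

Depending on your use case, you might sort on ModelLatency instead of CostPerInference, or compare only rows that share an InstanceType.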

Get started

The general steps for creating an Inference Recommender job that includes Neo-optimized recommendations are as follows:

Prepare your ML model for compilation. For more information, see Prepare Model for Compilation in the Neo documentation.
Package your model in a model archive (.tar.gz file).
Create a sample payload archive.
Register your model in SageMaker Model Registry.
Create an Inference Recommender job.
View the results of the Inference Recommender job and choose a configuration.
Debug compilation failures, if any. For more information, see Troubleshoot Neo Compilation Errors.
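
Two of the steps above, packaging an archive and preparing the job request, can be sketched as follows. The helper names make_archive and build_job_request, the file name xgboost-model, and the placeholder ARNs are assumptions for illustration. The archive layout that Neo expects depends on your framework, so treat the single-file archive here as a stand-in. The sketch builds the request arguments but does not call SageMaker.

```python
import os
import tarfile
import tempfile


def make_archive(archive_path, file_paths):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            # Store each file at the archive root, without its directory.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def build_job_request(job_name, role_arn, model_package_version_arn):
    """Build keyword arguments for create_inference_recommendations_job."""
    return {
        "JobName": job_name,
        "JobType": "Default",
        "RoleArn": role_arn,
        "InputConfig": {"ModelPackageVersionArn": model_package_version_arn},
    }


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for a real trained model file.
    model_file = os.path.join(workdir, "xgboost-model")
    with open(model_file, "wb") as f:
        f.write(b"placeholder model bytes")

    archive = make_archive(os.path.join(workdir, "model.tar.gz"), [model_file])

    # Read the archive back to confirm what it contains.
    with tarfile.open(archive, "r:gz") as tar:
        archived_names = tar.getnames()

print(archived_names)

request = build_job_request(
    job_name="<your-job-name>",
    role_arn="<your-execution-role-arn>",
    model_package_version_arn="<your-model-package-version-arn>",
)

# With valid credentials and real values, the request could be submitted:
# import boto3
# boto3.client("sagemaker").create_inference_recommendations_job(**request)
```

The same make_archive helper could also be used for the sample payload archive, given the payload files instead of the model file.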

For an example that demonstrates the previous workflow and how to get Neo-optimized recommendations using XGBoost, see the following example notebook. For an example that shows how to get Neo-optimized recommendations using TensorFlow, see the following example notebook.
