Recommendation jobs with Amazon SageMaker Inference Recommender
Amazon SageMaker Inference Recommender can make two types of recommendations:
- Inference recommendations (Default job type) run a set of load tests on the recommended instance types. You can also load test a serverless endpoint. You only need to provide a model package Amazon Resource Name (ARN) to launch this type of recommendation job. Inference recommendation jobs complete within 45 minutes.
- Endpoint recommendations (Advanced job type) are based on a custom load test in which you select your desired ML instances or a serverless endpoint, provide a custom traffic pattern, and specify latency and throughput requirements based on your production needs. This job takes an average of 2 hours to complete, depending on the job duration you set and the total number of inference configurations tested.
Both types of recommendations use the same APIs to create, describe, and stop jobs.
The output is a list of instance configuration recommendations with associated
environment variables, cost, throughput, and latency metrics. Recommendation jobs also
provide an initial instance count, which you can use to configure an autoscaling policy.
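As a rough illustration of working with that output, the sketch below flattens a job description into one row per recommended configuration. It assumes the response is a dictionary shaped like the one returned by the boto3 call `describe_inference_recommendations_job`, with an `InferenceRecommendations` list whose items carry `EndpointConfiguration` and `Metrics` entries. The helper name `summarize_recommendations` and the output keys are hypothetical, and the field names should be checked against the current API reference.

```python
from typing import Any, Dict, List


def summarize_recommendations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a job description into one row per recommended configuration.

    Assumes `response` follows the shape of the boto3
    describe_inference_recommendations_job response (an assumption to verify).
    """
    rows: List[Dict[str, Any]] = []
    for rec in response.get("InferenceRecommendations", []):
        endpoint = rec.get("EndpointConfiguration", {})
        metrics = rec.get("Metrics", {})
        rows.append({
            "instance_type": endpoint.get("InstanceType"),
            # Starting point for an autoscaling policy.
            "initial_instance_count": endpoint.get("InitialInstanceCount"),
            "cost_per_hour": metrics.get("CostPerHour"),
            "max_invocations": metrics.get("MaxInvocations"),
            "model_latency": metrics.get("ModelLatency"),
        })
    # Cheapest configuration first; rows with no cost reported sort last.
    rows.sort(key=lambda r: (r["cost_per_hour"] is None, r["cost_per_hour"] or 0.0))
    return rows
```

Sorting by hourly cost is only one possible ordering; depending on your requirements, ordering by latency or throughput may be more useful.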
To differentiate between the two types of jobs, when you’re creating a job through either the SageMaker AI console or the APIs, specify Default to create preliminary endpoint recommendations and Advanced for custom load testing and endpoint recommendations.
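A minimal sketch of selecting the job type through the API follows. It assumes the boto3 SageMaker client and its `create_inference_recommendations_job` operation, which takes `JobName`, `JobType`, `RoleArn`, and `InputConfig`. The helper names `build_job_request` and `start_job` are hypothetical, and the request shown covers only the Default case; an Advanced job generally needs further `InputConfig` fields (such as a traffic pattern and endpoint configurations) that are not shown here.

```python
from typing import Any, Dict

VALID_JOB_TYPES = ("Default", "Advanced")


def build_job_request(job_name: str, role_arn: str, model_package_arn: str,
                      job_type: str = "Default") -> Dict[str, Any]:
    """Build keyword arguments for create_inference_recommendations_job."""
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"job_type must be one of {VALID_JOB_TYPES}, got {job_type!r}")
    return {
        "JobName": job_name,
        "JobType": job_type,
        "RoleArn": role_arn,
        # A Default job only needs the model package ARN. Advanced jobs
        # take additional InputConfig fields that this sketch omits.
        "InputConfig": {"ModelPackageVersionArn": model_package_arn},
    }


def start_job(client: Any, request: Dict[str, Any]) -> str:
    """Submit the job and return its ARN.

    `client` is assumed to be boto3.client("sagemaker") or an object
    exposing the same method.
    """
    response = client.create_inference_recommendations_job(**request)
    return response["JobArn"]
```

Passing the client in as an argument keeps the request-building logic testable without AWS credentials. In real use you would create the client with `boto3.client("sagemaker")` and supply your own job name, role ARN, and model package ARN.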
Note
You do not need to run both types of recommendation jobs in your own workflow. You can run either one independently of the other.
Inference Recommender can also provide a list of prospective instances: the top five instance types optimized for cost, throughput, and latency for your model deployment, along with a confidence score. You can choose these instances when deploying your model. Inference Recommender automatically benchmarks your model to produce the prospective instances. Because these are preliminary recommendations, we recommend that you run further instance recommendation jobs to get more accurate results. To view the prospective instances, go to your SageMaker AI model details page. For more information, see Get instant prospective instances.
Topics
- Get instant prospective instances
- Inference recommendations
- Get an inference recommendation for an existing endpoint
- Stop your inference recommendation
- Compiled recommendations with Neo
- Recommendation results
- Get autoscaling policy recommendations
- Run a custom load test
- Stop your load test
- Troubleshoot Inference Recommender errors