Automatic scaling of Amazon SageMaker AI models
Amazon SageMaker AI supports automatic scaling (auto scaling) for your hosted models. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload. When the workload increases, auto scaling brings more instances online. When the workload decreases, auto scaling removes unnecessary instances so that you don't pay for provisioned instances that you aren't using.
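The following is a minimal sketch of how a hosted model is typically put under auto scaling. It assumes the boto3 `application-autoscaling` client. It first registers an endpoint's production variant as a scalable target with an instance-count range (`register_scalable_target`). It then attaches a target tracking policy keyed on the `SageMakerVariantInvocationsPerInstance` metric (`put_scaling_policy`). The helper function names, the endpoint and variant names, the target value, and the cooldown values are all hypothetical and illustrative, not recommendations. The sketch also assumes at least one instance, because scaling to zero has its own requirements.

```python
from typing import Any, Dict

# Identifiers that Application Auto Scaling uses for SageMaker AI production variants.
SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def resource_id(endpoint_name: str, variant_name: str) -> str:
    """Build the resource ID that identifies one production variant of an endpoint."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_request(endpoint_name: str, variant_name: str,
                            min_capacity: int, max_capacity: int) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for register_scalable_target (allowed instance range)."""
    # Assumption for this sketch: keep at least one instance (no scale to zero).
    if not 1 <= min_capacity <= max_capacity:
        raise ValueError("require 1 <= min_capacity <= max_capacity")
    return {
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy_request(endpoint_name: str, variant_name: str,
                                   target_invocations_per_instance: float,
                                   policy_name: str = "invocations-target-tracking"
                                   ) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for put_scaling_policy (target tracking on invocations)."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Auto scaling adds or removes instances to keep this metric near the target.
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            # Illustrative cooldowns in seconds, not recommended values.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }


def apply_auto_scaling(endpoint_name: str, variant_name: str, min_capacity: int,
                       max_capacity: int, target: float) -> None:
    """Send both requests to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the request builders work without the AWS SDK

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **scalable_target_request(endpoint_name, variant_name, min_capacity, max_capacity))
    client.put_scaling_policy(
        **target_tracking_policy_request(endpoint_name, variant_name, target))
```

For example, `apply_auto_scaling("my-endpoint", "AllTraffic", 1, 4, 70.0)` would let a hypothetical endpoint's variant range between one and four instances. It would aim for about 70 invocations per instance. Building the request dictionaries separately from the API calls makes them easy to inspect before anything is sent to AWS.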
Topics
- Auto scaling policy overview
- Auto scaling prerequisites
- Configure model auto scaling with the console
- Register a model
- Define a scaling policy
- Apply a scaling policy
- Instructions for editing a scaling policy
- Temporarily turn off scaling policies
- Delete a scaling policy
- Check the status of a scaling activity by describing scaling activities
- Scale an endpoint to zero instances
- Load testing your auto scaling configuration
- Use AWS CloudFormation to create a scaling policy
- Update endpoints that use auto scaling
- Delete endpoints configured for auto scaling