MLPER-16: Establish an automated re-training framework
Monitor the data and the model predictions. Run analyses of model performance against defined metrics to identify errors due to data and concept drift. Automate model retraining to mitigate these errors, either at fixed scheduled intervals or when model variance reaches a defined threshold. Automated retraining can also be started once enough new data becomes available.
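The three triggers described above (drift beyond a threshold, a fixed schedule, and enough new data) can be combined into one decision function. The following is a minimal sketch, not an AWS API: the `RetrainPolicy` class, its default thresholds, and the `retrain_reasons` function are hypothetical names chosen for illustration, and the drift score is assumed to come from your own monitoring job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RetrainPolicy:
    """Hypothetical thresholds; tune them to your own workload."""
    drift_threshold: float = 0.2                    # maximum tolerated drift score
    max_model_age: timedelta = timedelta(days=30)   # fixed retraining interval
    min_new_samples: int = 10_000                   # what counts as "enough new data"


def retrain_reasons(
    drift_score: float,
    last_trained: datetime,
    new_samples: int,
    now: datetime,
    policy: Optional[RetrainPolicy] = None,
) -> List[str]:
    """Return the list of triggers that fired; an empty list means no retraining."""
    policy = policy or RetrainPolicy()
    reasons = []
    if drift_score > policy.drift_threshold:
        reasons.append("drift")       # monitored metric moved beyond the threshold
    if now - last_trained >= policy.max_model_age:
        reasons.append("schedule")    # model is older than the scheduled interval
    if new_samples >= policy.min_new_samples:
        reasons.append("new_data")    # enough new training data has accumulated
    return reasons
```

Returning the reasons rather than a single boolean makes it possible to record why a retraining run started, which helps when reviewing how often each trigger fires.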
Implementation plan
- Identify retraining opportunities - Monitor data statistics and ML inferences in production using Amazon SageMaker AI Model Monitor. If the data drifts beyond a defined threshold, start retraining. Retraining can also be initiated at defined scheduled intervals (to meet business requirements) or when additional training data is available. AWS supports mechanisms for automatically starting retraining when new data is PUT to an Amazon S3 bucket. Ensure model versioning is supported when incorporating additional data into your models. This enables re-creating an inadvertently deleted model artifact using the combined versions of the components used to create the versioned artifact.
- Use Amazon SageMaker AI Pipelines - Develop a retraining pipeline with Amazon SageMaker AI Pipelines, which orchestrates the workflow through step creation and management.
- Use AWS Step Functions - You can also use the AWS Step Functions Data Science SDK for Amazon SageMaker AI to automate training of a machine learning model. Define all the steps in the workflow and set up alerts to start the flow. To detect the presence of new training data in an S3 bucket, AWS CloudTrail combined with Amazon CloudWatch Events allows you to start an AWS Step Functions workflow that initiates retraining tasks in your training pipeline.
- Use third-party tools - Use third-party deployment orchestration tools, such as Jenkins, that integrate with AWS service APIs to automate model retraining when new data is available.
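As an illustration of starting retraining when new data is PUT to an S3 bucket, the sketch below shows an AWS Lambda-style handler that reads an S3 event notification and starts a SageMaker pipeline execution for each newly created object. It assumes the bucket is configured to send object-created notifications to the function. The pipeline name `retraining-pipeline` and the pipeline parameter `InputDataUrl` are hypothetical placeholders; `start_pipeline_execution` is the boto3 SageMaker client call used to start the run.

```python
import urllib.parse
from typing import Any, Dict, List, Optional

PIPELINE_NAME = "retraining-pipeline"  # hypothetical pipeline name
INPUT_PARAM = "InputDataUrl"           # hypothetical pipeline parameter name


def new_object_uris(event: Dict[str, Any]) -> List[str]:
    """Extract s3:// URIs for newly created objects from an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        # Skip anything that is not an object-creation event (for example, deletes).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def handler(
    event: Dict[str, Any],
    context: Any = None,
    sagemaker_client: Optional[Any] = None,
) -> List[str]:
    """Start one pipeline execution per new object and return the execution ARNs."""
    uris = new_object_uris(event)
    if not uris:
        return []
    if sagemaker_client is None:
        # Imported lazily so the parsing logic can be exercised without AWS access.
        import boto3
        sagemaker_client = boto3.client("sagemaker")
    arns = []
    for uri in uris:
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineParameters=[{"Name": INPUT_PARAM, "Value": uri}],
        )
        arns.append(response["PipelineExecutionArn"])
    return arns
```

Accepting the client as an optional argument is a design choice made here so that the handler can be tested with a stub. In practice you might batch several new objects into one execution rather than starting one run per file.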
Documents
Blogs
- Monitoring in-production ML models at large scale using Amazon SageMaker AI Model Monitor
- Automating model retraining and deployment using the AWS Step Functions Data Science SDK for Amazon SageMaker AI
- Automating complex deep learning model training using Amazon SageMaker AI Debugger and AWS Step Functions
- Build a CI/CD pipeline for deploying custom machine learning models using AWS services
- Create SageMaker AI Pipelines for training, consuming and monitoring your batch use cases