MLPER-16: Establish an automated re-training framework - Machine Learning Lens

MLPER-16: Establish an automated re-training framework

Monitor the data and the model predictions. Analyze model performance against defined metrics to identify errors caused by data drift and concept drift. Automate model retraining to mitigate these errors, either at fixed scheduled intervals or when model variance reaches a defined threshold. Automated retraining can also start once enough new data becomes available.
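The three triggers described above (schedule, threshold, new data) can be combined in a small decision function. The following is a minimal sketch, not an AWS API: `RetrainPolicy`, `retrain_reason`, and every threshold value are hypothetical, and the drift score is assumed to come from whatever monitoring you already run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetrainPolicy:
    """Hypothetical policy object; tune the values to your workload."""
    max_age: timedelta        # retrain at least this often
    drift_threshold: float    # retrain when the drift score exceeds this
    min_new_samples: int      # retrain once this many new samples arrive


def retrain_reason(
    policy: RetrainPolicy,
    last_trained: datetime,
    now: datetime,
    drift_score: float,
    new_samples: int,
) -> Optional[str]:
    """Return why retraining should start, or None if it should not."""
    # Drift is checked first because it signals degraded predictions.
    if drift_score > policy.drift_threshold:
        return "drift"
    if now - last_trained >= policy.max_age:
        return "schedule"
    if new_samples >= policy.min_new_samples:
        return "new_data"
    return None
```

Returning the reason rather than a plain boolean makes it easier to record why each retraining run was started.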

Implementation plan

  • Identify retraining opportunities - Monitor data statistics and ML inferences in production using Amazon SageMaker AI Model Monitor. If the data drifts beyond a defined threshold, start retraining. Retraining can also be initiated at scheduled intervals (to meet business requirements) or when additional training data becomes available. AWS supports mechanisms for automatically starting retraining when new data is PUT to an Amazon S3 bucket. Ensure that model versioning is supported when incorporating additional data into your models. Versioning makes it possible to re-create an inadvertently deleted model artifact from the versioned components that were used to build it.

  • Use Amazon SageMaker AI Pipelines - Develop the retraining pipeline with Amazon SageMaker AI Pipelines, which orchestrates the workflow by letting you create and manage its individual steps.

  • Use AWS Step Functions - You can also use the AWS Step Functions Data Science SDK for Amazon SageMaker AI to automate training of a machine learning model. Define all the steps in the workflow and set up alerts that start the flow. To detect new training data in an S3 bucket, combine AWS CloudTrail with Amazon CloudWatch Events (now Amazon EventBridge) to start an AWS Step Functions workflow that initiates the retraining tasks in your training pipeline.

  • Use third-party tools - Use third-party deployment orchestration tools, such as Jenkins, that integrate with AWS service APIs to automate model retraining when new data is available.
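As an illustration of starting retraining when new data lands in S3, the sketch below shows an AWS Lambda handler that starts a SageMaker pipeline execution through the boto3 `start_pipeline_execution` call. It assumes the function is invoked by an S3 event notification (a different wiring from the CloudTrail route mentioned above). The pipeline name, the `training/` prefix, and the `InputDataUrl` pipeline parameter are hypothetical and must match your own pipeline definition.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical pipeline name; set RETRAIN_PIPELINE_NAME in the Lambda config.
PIPELINE_NAME = os.environ.get("RETRAIN_PIPELINE_NAME", "example-retraining-pipeline")


def new_data_uris(event, prefix="training/"):
    """Extract s3:// URIs of new objects under `prefix` from an S3 event."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            uris.append(f"s3://{bucket}/{key}")
    return uris


def handler(event, context, sagemaker_client=None):
    """Start one pipeline execution per new training object."""
    uris = new_data_uris(event)
    if not uris:
        return []
    if sagemaker_client is None:
        import boto3  # imported lazily so the parsing logic is testable offline
        sagemaker_client = boto3.client("sagemaker")
    arns = []
    for uri in uris:
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            # "InputDataUrl" is an assumed parameter of the pipeline.
            PipelineParameters=[{"Name": "InputDataUrl", "Value": uri}],
        )
        arns.append(response["PipelineExecutionArn"])
    return arns
```

Passing the client as an optional argument is a design choice for this sketch: it lets the handler be exercised with a stub instead of real AWS credentials. In practice you may prefer to batch several new objects into a single execution rather than starting one per object.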
