
MLPER-16: Establish an automated re-training framework

Monitor the data and the model predictions. Analyze model performance against defined metrics to identify errors caused by data drift and concept drift. Automate model retraining to mitigate these errors, either at fixed scheduled intervals or when model variance reaches a defined threshold. Automated retraining can also start once enough new data becomes available.
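The three triggers described above (a drift threshold, a fixed schedule, and the arrival of enough new data) can be combined into a single decision function. The following is a minimal sketch, not an AWS API: the `RetrainPolicy` class, the `retrain_reasons` function, and all threshold values are hypothetical and chosen only for illustration. The drift score is assumed to come from whatever monitoring you already run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RetrainPolicy:
    # All values are hypothetical placeholders; tune them for your workload.
    max_drift: float = 0.2                    # drift score above which retraining starts
    max_age: timedelta = timedelta(days=30)   # fixed retraining interval
    min_new_rows: int = 10_000                # new rows needed to justify a run


def retrain_reasons(
    policy: RetrainPolicy,
    drift_score: float,
    last_trained: datetime,
    new_rows: int,
    now: datetime,
) -> List[str]:
    """Return every trigger that currently fires; an empty list means no retraining."""
    reasons = []
    if drift_score > policy.max_drift:
        reasons.append("drift")
    if now - last_trained >= policy.max_age:
        reasons.append("schedule")
    if new_rows >= policy.min_new_rows:
        reasons.append("new_data")
    return reasons
```

Returning the list of reasons, rather than a single boolean, lets the caller record why a retraining run was started.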

Implementation plan

Identify retraining opportunities - Monitor data statistics and ML inferences in production using Amazon SageMaker AI Model Monitor. If the data drifts beyond a defined threshold, start retraining. Retraining can also be initiated at defined scheduled intervals (to meet business requirements) or when additional training data is available. AWS supports mechanisms for automatically starting retraining when new data is PUT to an Amazon S3 bucket. Ensure that model versioning is supported when incorporating additional data into your models. Versioning makes it possible to re-create an inadvertently deleted model artifact from the recorded versions of the components that produced it.
Use Amazon SageMaker AI Pipelines - Build the retraining pipeline with Amazon SageMaker AI Pipelines, which orchestrates the workflow by creating and managing its steps.
Use AWS Step Functions - You can also use the AWS Step Functions Data Science SDK for Amazon SageMaker AI to automate training of a machine learning model. Define all the steps in the workflow and set up alerts to start the workflow. To detect new training data in an S3 bucket, combine AWS CloudTrail with Amazon CloudWatch Events to start an AWS Step Functions workflow that initiates the retraining tasks in your training pipeline.
Use third-party tools - Use third-party deployment orchestration tools, such as Jenkins, that integrate with AWS service APIs to automate model retraining when new data is available.
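As an illustration of starting retraining when new data is PUT to an S3 bucket, the sketch below shows one possible wiring. It assumes an AWS Lambda function subscribed to S3 object-created notifications that starts a SageMaker AI pipeline execution through the boto3 `start_pipeline_execution` call. The pipeline name `retraining-pipeline` and the pipeline parameter `InputDataUrl` are hypothetical. The source does not prescribe this particular mechanism.

```python
from urllib.parse import unquote_plus

PIPELINE_NAME = "retraining-pipeline"  # hypothetical pipeline name
INPUT_PARAM = "InputDataUrl"           # hypothetical pipeline parameter


def handler(event, context=None, sm_client=None):
    """Start one pipeline execution per new S3 object listed in the event."""
    if sm_client is None:
        # Imported lazily so the handler can be exercised with a stub client.
        import boto3
        sm_client = boto3.client("sagemaker")

    execution_arns = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = sm_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineParameters=[
                {"Name": INPUT_PARAM, "Value": f"s3://{bucket}/{key}"}
            ],
        )
        execution_arns.append(response["PipelineExecutionArn"])
    return execution_arns
```

Passing the client as an optional argument keeps the handler testable without AWS credentials. In practice you would likely add a filter on the key prefix so that only training data uploads start a run.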


