Best practices arranged by pillar
This section lists the best practices outlined in this paper, organized by the pillars of the AWS Well-Architected Framework.
Operational excellence pillar
MLOE-01: Develop the right skills with accountability and empowerment
MLOE-02: Discuss and agree on the level of model explainability
MLOE-08: Establish feedback loops across ML lifecycle phases
MLOE-13: Establish reliable packaging patterns to access approved public libraries
MLOE-16: Synchronize architecture and configuration, and check for skew across environments
Security pillar
Reliability pillar
Performance efficiency pillar
Cost optimization pillar
MLCOST-01: Define overall return on investment (ROI) and opportunity cost
MLCOST-02: Use managed services to reduce total cost of ownership (TCO)
MLCOST-03: Identify if machine learning is the right solution
MLCOST-04: Tradeoff analysis on custom versus pre-trained models
MLCOST-11: Select local training for small scale experiments
MLCOST-18: Use warm-start and checkpointing hyperparameter tuning
MLCOST-20: Set up a budget and use resource tagging to track costs
MLCOST-29: Monitor endpoint usage and right-size the instance fleet