
Amazon SageMaker Autopilot example notebooks

The following notebooks serve as practical, hands-on examples that address various use cases of Autopilot.

You can find all of Autopilot's notebooks in the autopilot directory of the SageMaker AI examples GitHub repository.

We recommend cloning the full Git repository within Studio Classic to access and run the notebooks directly. For information on how to clone a Git repository in Studio Classic, see Clone a Git Repository in SageMaker Studio Classic.

Serverless inference

By default, Autopilot allows you to deploy generated models to real-time inference endpoints. This notebook illustrates how to deploy Autopilot models trained with ENSEMBLING and HYPERPARAMETER OPTIMIZATION (HPO) modes to serverless endpoints instead. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies.
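
The sketch below outlines one way to do this with boto3, assuming a completed Autopilot job; it is not the notebook's exact code, and the job, model, and endpoint names, the role ARN, and the memory and concurrency settings are placeholders.

```python
# Minimal sketch: deploy the best candidate of a completed Autopilot job to a
# serverless endpoint with boto3. All names and the role ARN are hypothetical.
import boto3

sm = boto3.client("sagemaker")

# Fetch the inference containers produced for the best candidate.
job = sm.describe_auto_ml_job(AutoMLJobName="my-autopilot-job")
containers = job["BestCandidate"]["InferenceContainers"]

sm.create_model(
    ModelName="autopilot-serverless-model",
    Containers=containers,
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

sm.create_endpoint_config(
    EndpointConfigName="autopilot-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "autopilot-serverless-model",
        # Serverless endpoints scale on demand; no instance type is specified.
        "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
    }],
)

sm.create_endpoint(
    EndpointName="autopilot-serverless-endpoint",
    EndpointConfigName="autopilot-serverless-config",
)
```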

Custom feature selection

Autopilot inspects your dataset and runs a number of candidates to determine the optimal combination of data preprocessing steps, machine learning algorithms, and hyperparameters. You can then deploy the resulting model either to a real-time endpoint or for batch processing.

In some cases, you might want to have the flexibility to bring custom data processing code to Autopilot. For example, your datasets might contain a large number of independent variables, and you may wish to incorporate a custom feature selection step to remove irrelevant variables first. The resulting smaller dataset can then be used to launch an Autopilot job. Ultimately, you would also want to include both the custom processing code and models from Autopilot for real-time or batch processing.
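
As a rough illustration of such a preprocessing step (not the notebook's exact code), the following sketch uses scikit-learn's SelectKBest to shrink a hypothetical training CSV before handing it to Autopilot; the file name, the "target" column, the assumption of numeric features, and the choice of selector are all assumptions.

```python
# Minimal sketch of a custom feature-selection step run before launching an
# Autopilot job. Assumes a CSV with numeric features and a "target" column.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

df = pd.read_csv("train.csv")                     # hypothetical training data
X, y = df.drop(columns=["target"]), df["target"]

# Keep only the 20 features most associated with the target (illustrative).
selector = SelectKBest(score_func=f_classif, k=20)
selector.fit(X, y)
selected_columns = X.columns[selector.get_support()]

# Write the reduced dataset; this smaller file is what you pass to Autopilot.
reduced = pd.concat([df[selected_columns], y], axis=1)
reduced.to_csv("train_reduced.csv", index=False)
```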

Pipeline example

While Autopilot streamlines the process of building ML models, MLOps engineers are still responsible for creating, automating, and managing end-to-end ML workflows in production. SageMaker Pipelines can assist in automating various steps of the ML lifecycle, such as data preprocessing, model training, hyperparameter tuning, model evaluation, and deployment. This notebook serves as a demonstration of how to incorporate Autopilot into a SageMaker Pipelines end-to-end AutoML training workflow. To launch an Autopilot experiment within Pipelines, you must create a model-building workflow by writing custom integration code using Pipelines Lambda or Processing steps. For more information, refer to Move Amazon SageMaker Autopilot ML models from experimentation to production using Amazon SageMaker AI Pipelines.

Alternatively, when using Autopilot in Ensembling mode, you can refer to the notebook example that demonstrates how to use SageMaker Pipelines' native AutoML step. With Autopilot supported as a native step within Pipelines, you can now add an automated training step (AutoMLStep) to your pipeline and invoke an Autopilot experiment in Ensembling mode.
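
A minimal sketch of the native-step pattern is shown below; the S3 URI, target column name, role ARN, and pipeline name are placeholders, and the notebook's actual arguments may differ.

```python
# Minimal sketch: invoke Autopilot in ensembling mode through the native
# AutoMLStep in SageMaker Pipelines. All names and URIs are placeholders.
from sagemaker.automl.automl import AutoML, AutoMLInput
from sagemaker.workflow.automl_step import AutoMLStep
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

automl = AutoML(
    role=role,
    target_attribute_name="target",
    mode="ENSEMBLING",                 # the native step requires ensembling mode
    sagemaker_session=pipeline_session,
)

train_input = AutoMLInput(
    inputs="s3://my-bucket/autopilot/train.csv",
    target_attribute_name="target",
    channel_type="training",
)

# In a PipelineSession, fit() returns step arguments instead of starting a job.
step_automl = AutoMLStep(
    name="AutoMLTrainingStep",
    step_args=automl.fit(inputs=[train_input]),
)

pipeline = Pipeline(
    name="autopilot-pipeline",
    steps=[step_automl],
    sagemaker_session=pipeline_session,
)
```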

Direct marketing with Amazon SageMaker Autopilot

This notebook demonstrates how to use the Bank Marketing Data Set to predict whether a customer will enroll for a term deposit at a bank. You can use Autopilot on this dataset to get the most accurate ML pipeline by exploring options contained in various candidate pipelines. Autopilot generates each candidate in a two-step procedure. The first step performs automated feature engineering on the dataset. The second step trains and tunes an algorithm to produce a model. The notebook contains instructions on how to train the model and how to deploy the model to perform batch inference using the best candidate.
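
As a rough sketch of that flow with the SageMaker Python SDK (not the notebook's exact code), the following assumes the training and test CSVs are already in S3 and that the target column is named "y"; the bucket names, candidate count, and instance type are illustrative.

```python
# Minimal sketch: launch an Autopilot job, then run batch inference with the
# best candidate. S3 paths, the target column, and sizes are placeholders.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = sagemaker.get_execution_role()

automl = AutoML(
    role=role,
    target_attribute_name="y",          # term-deposit label in the bank marketing dataset
    max_candidates=10,
    sagemaker_session=session,
)
automl.fit(inputs="s3://my-bucket/bank-marketing/train.csv", wait=True)

# Package the best candidate as a model and run a batch transform job.
model = automl.create_model(name="bank-marketing-best-candidate")
transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
transformer.transform(
    data="s3://my-bucket/bank-marketing/test.csv",
    content_type="text/csv",
    split_type="Line",
)
```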

Customer Churn Prediction with Amazon SageMaker Autopilot

This notebook describes using machine learning for the automated identification of unhappy customers, also known as customer churn prediction. The example shows how to analyze a publicly available dataset and perform feature engineering on it. Next it shows how to tune a model by selecting the best performing pipeline along with the optimal hyperparameters for the training algorithm. Finally, it shows how to deploy the model to a hosted endpoint and how to evaluate its predictions against ground truth. However, ML models rarely give perfect predictions. That's why this notebook also shows how to incorporate the relative costs of prediction mistakes when determining the financial outcome of using ML.
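
To make the cost argument concrete, the short sketch below computes a cost-weighted outcome from a confusion matrix; the labels, predictions, and dollar figures are purely illustrative assumptions, not values from the notebook.

```python
# Minimal sketch of cost-aware evaluation: weight each kind of prediction
# mistake by an assumed business cost. All numbers are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])   # actual churn labels (illustrative)
y_pred = np.array([0, 1, 1, 0, 0, 1, 0, 1])   # model predictions (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Assumed costs: a retention incentive ($100) is sent to every predicted
# churner, and each missed churner costs $500 in lost revenue.
cost_of_incentive = 100
cost_of_missed_churn = 500

total_cost = (tp + fp) * cost_of_incentive + fn * cost_of_missed_churn
print(f"TN={tn} FP={fp} FN={fn} TP={tp}, estimated cost=${total_cost}")
```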

Top Candidates Customer Churn Prediction with Amazon SageMaker Autopilot and Batch Transform (Python SDK)

Like the previous example, this notebook addresses the automated identification of unhappy customers, also known as customer churn prediction. It demonstrates how to configure the model to return the inference probability, select the top N models, and run a batch transform job on a hold-out test set for evaluation.

Note

This notebook works with SageMaker Python SDK >= 1.65.1 released on 6/19/2020.
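
A minimal sketch of that workflow with the SageMaker Python SDK is shown below, assuming a completed Autopilot job; the job name, S3 paths, instance type, and the choice of N are placeholders, and the notebook's actual code may differ.

```python
# Minimal sketch: attach to a completed Autopilot job, ask the inference
# containers to return probabilities, pick the top N candidates, and evaluate
# each with a batch transform on a hold-out set. Names are placeholders.
from sagemaker.automl.automl import AutoML

automl = AutoML.attach(auto_ml_job_name="my-autopilot-job")

top_candidates = automl.list_candidates(
    sort_by="FinalObjectiveMetricValue",
    sort_order="Descending",
    max_results=3,
)

for candidate in top_candidates:
    model = automl.create_model(
        name=f"churn-{candidate['CandidateName']}",
        candidate=candidate,
        # Return the probability alongside the predicted label.
        inference_response_keys=["predicted_label", "probability"],
    )
    transformer = model.transformer(instance_count=1, instance_type="ml.m5.xlarge")
    transformer.transform(
        data="s3://my-bucket/churn/holdout.csv",
        content_type="text/csv",
        split_type="Line",
    )
```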

Bringing your own data processing code to Amazon SageMaker Autopilot

This notebook demonstrates how to incorporate and deploy custom data processing code when using Amazon SageMaker Autopilot. It adds a custom feature selection step that removes irrelevant variables before launching an Autopilot job. It then shows how to deploy both the custom processing code and the models generated by Autopilot on a real-time endpoint and, alternatively, for batch processing.
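
One hedged sketch of the deployment side, assuming the custom feature-selection code has already been packaged as its own SageMaker model container: chain that container in front of the containers Autopilot generated for its best candidate. The image URI, S3 paths, names, and role ARN are placeholders, and the notebook's approach may differ in detail.

```python
# Minimal sketch: build a single multi-container model that runs the custom
# feature-selection container first, then the Autopilot inference containers.
import boto3

sm = boto3.client("sagemaker")

job = sm.describe_auto_ml_job(AutoMLJobName="my-autopilot-job")
autopilot_containers = job["BestCandidate"]["InferenceContainers"]

# Hypothetical container that applies the custom feature-selection code.
feature_selection_container = {
    "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/feature-selection:latest",
    "ModelDataUrl": "s3://my-bucket/feature-selection/model.tar.gz",
}

# The resulting model can back a real-time endpoint or a batch transform job.
sm.create_model(
    ModelName="custom-processing-plus-autopilot",
    Containers=[feature_selection_container] + autopilot_containers,
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)
```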

More notebooks

You can find more notebooks illustrating other use cases, such as batch transform and time-series forecasting, in the root directory of the repository.