Schedule your ML workflows
With Amazon SageMaker AI you can manage your entire ML workflow as you create datasets, perform data transforms, build models from data, and deploy your models to endpoints for inference. If you perform any subset of your workflow steps periodically, you can run those steps on a schedule. For example, you might schedule a job in SageMaker Canvas to run a transform on new data every hour, or schedule a weekly job to monitor your deployed model for drift. You can specify a recurring schedule at almost any interval: every minute, hour, day, week, or month, or on a custom occurrence such as the 3rd Friday of every month at 3 PM.
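For instance, schedulers that accept cron expressions typically use the six-field EventBridge format. A minimal sketch of the 3rd-Friday example above, assuming EventBridge cron syntax (where FRI#3 denotes the third Friday of the month):

```python
# EventBridge cron fields: minutes hours day-of-month month day-of-week year
# 3:00 PM on the 3rd Friday of every month:
schedule_expression = "cron(0 15 ? * FRI#3 *)"
```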
The following scenarios summarize the options available to you depending on your use case.
Use case 1: Build and schedule your ML workflow in a no-code environment. For beginners or those new to SageMaker AI, you can use Amazon SageMaker Canvas to both build your ML workflow and create scheduled runs using the Canvas UI-based scheduler.
Use case 2: Build your workflow in a single Jupyter notebook and use a no-code scheduler. Experienced ML practitioners can use code to build their ML workflow in a Jupyter notebook and use the no-code scheduling option available with the Notebook Jobs widget. If your ML workflow consists of multiple Jupyter notebooks, you can use the scheduling feature in the Pipelines Python SDK described in use case 3.
Use case 3: Build and schedule your ML workflow using Pipelines. Advanced users can use the Amazon SageMaker Python SDK or Amazon EventBridge scheduling options available with Pipelines. You can build an ML workflow composed of steps that include operations across various SageMaker AI features and AWS services, such as Amazon EMR. For a code sketch of the SDK-based option, see the example after the comparison table.
Descriptor | Use case 1 | Use case 2 | Use case 3 |
---|---|---|---|
SageMaker AI feature | Amazon SageMaker Canvas data processing and ML workflow scheduling | Notebook Jobs schedule widget (UI) | Pipelines Python SDK scheduling options |
Description | With Amazon SageMaker Canvas, you can schedule automatic runs of data processing steps and, in a separate procedure, automatic dataset updates. You can also indirectly schedule your entire ML workflow by setting up a configuration that runs a batch prediction whenever a specific dataset is updated. For both automated data processing and dataset updates, SageMaker Canvas provides a basic form where you select a start date and time and a time interval between runs (or a cron expression if you schedule a data processing step). For more information about how to schedule data processing steps, see Create a schedule to automatically process new data. For more information about how to schedule dataset and batch prediction updates, see How to manage automations. | If you built your data processing and pipeline workflow in a single Jupyter notebook, you can use the Notebook Jobs widget to run your notebook on demand or on a schedule. The Notebook Jobs widget displays a basic form where you specify the compute type, run schedule, and optional custom settings. You define your run schedule by selecting a time-based interval or by entering a cron expression. The widget is installed automatically in Studio, or you can perform an additional installation to use this feature in your local JupyterLab environment. For more information about Notebook Jobs, see SageMaker Notebook Jobs. | You can use the scheduling features in the SageMaker Python SDK if you implemented your ML workflow with Pipelines. Your pipeline can include steps such as fine-tuning, data processing, and deployment. Pipelines supports two ways to schedule your pipeline: you can create an Amazon EventBridge rule or use the SageMaker Python SDK PipelineSchedule class. |
Optimized for | Provides a scheduling option for a SageMaker Canvas ML workflow | Provides a UI-based scheduling option for Jupyter notebook-based ML workflows | Provides a SageMaker SDK or EventBridge scheduling option for ML workflows |
Considerations | You can schedule your workflow with the Canvas no-code framework, but dataset updates and batch predictions can each handle up to 5 GB of data. | You can schedule one notebook per job using the UI-based scheduling form, but not multiple notebooks in the same job. To schedule multiple notebooks, use the Pipelines SDK code-based solution described in use case 3. | You can use the more advanced (SDK-based) scheduling capabilities provided by Pipelines, but you need to reference the API documentation to specify the correct options rather than selecting from a UI-based menu. |
Recommended environment | Amazon SageMaker Canvas | Studio, local JupyterLab environment | Studio, local JupyterLab environment, any code editor |
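As an illustration of the SDK option in use case 3, the following is a minimal sketch using the PipelineSchedule class from the SageMaker Python SDK. It assumes a pipeline named my-pipeline already exists and that the referenced IAM role (a placeholder ARN) is allowed to start pipeline executions; consult the SageMaker Python SDK documentation for the full set of parameters.

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.triggers import PipelineSchedule

# Reference an existing pipeline by name ("my-pipeline" is a placeholder).
pipeline = Pipeline(name="my-pipeline")

# Define a rate-based schedule. PipelineSchedule also accepts cron
# expressions, for example cron="0 15 ? * FRI#3 *" for 3:00 PM on the
# 3rd Friday of every month.
hourly_schedule = PipelineSchedule(
    name="my-hourly-schedule",  # placeholder schedule name
    rate=(60, "minutes"),       # run every hour
)

# Attach the schedule to the pipeline. The role is a placeholder and must
# be allowed to call sagemaker:StartPipelineExecution.
pipeline.put_triggers(
    triggers=[hourly_schedule],
    role_arn="arn:aws:iam::111122223333:role/pipeline-scheduler-role",
)
```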
Additional resources
SageMaker AI offers the following additional options for scheduling your workflows.
What is Amazon EventBridge Scheduler? The scheduling options discussed in this section are pre-built options available in SageMaker Canvas, Studio, and the SageMaker Python SDK. All of them extend the features of Amazon EventBridge, and you can also create your own custom scheduling solution with EventBridge.
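If you build your own solution on EventBridge Scheduler, a rough sketch with boto3 might look like the following. The schedule name, pipeline name, and role ARN are placeholders, and the target uses EventBridge Scheduler's universal target syntax to call the SageMaker StartPipelineExecution API.

```python
import boto3

scheduler = boto3.client("scheduler")

# Create a schedule that starts a SageMaker pipeline execution every 12 hours.
scheduler.create_schedule(
    Name="my-custom-pipeline-schedule",  # placeholder schedule name
    ScheduleExpression="rate(12 hours)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        # Universal target for the SageMaker StartPipelineExecution API.
        "Arn": "arn:aws:scheduler:::aws-sdk:sagemaker:startPipelineExecution",
        "RoleArn": "arn:aws:iam::111122223333:role/scheduler-role",  # placeholder
        "Input": '{"PipelineName": "my-pipeline"}',
    },
)
```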
Scheduled and event-based executions for Feature Processor pipelines. With Amazon SageMaker Feature Store Feature Processing, you can configure your Feature Processing pipelines to run on a schedule or in response to an event from another AWS service.
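A rough sketch of that option, assuming the sagemaker.feature_store.feature_processor interfaces (the S3 URI, feature group ARN, and pipeline name below are placeholders):

```python
from sagemaker.feature_store.feature_processor import (
    CSVDataSource,
    feature_processor,
    schedule,
    to_pipeline,
)

# A minimal Feature Processing function: read a CSV source and write the
# rows to an existing feature group (both locations are placeholders).
@feature_processor(
    inputs=[CSVDataSource("s3://amzn-s3-demo-bucket/raw-data/")],
    output="arn:aws:sagemaker:us-east-1:111122223333:feature-group/my-feature-group",
)
def transform(raw_df):
    # Transformation logic goes here; this sketch passes rows through as-is.
    return raw_df

# Promote the function to a SageMaker pipeline, then run it once a day.
to_pipeline(pipeline_name="my-feature-processing-pipeline", step=transform)
schedule(
    pipeline_name="my-feature-processing-pipeline",
    schedule_expression="rate(24 hours)",
)
```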