Amazon SageMaker Unified Studio is in preview release and is subject to change.

Create a workflow in Amazon SageMaker Unified Studio

Use workflows to orchestrate notebooks, querybooks, and more in your project repositories. With workflows, you can define a collection of tasks organized as a directed acyclic graph (DAG) that can run on a user-defined schedule.

Prerequisites

Before you can create a workflow, you must prepare the files that you want to run. The files should be saved in your JupyterLab space in a folder that you can easily locate later.

If you want to schedule a query to run, you must first save the querybook to the project and pull it into your JupyterLab space. The steps are as follows:

  1. In a project that uses the Data analytics and AI-ML model development project profile, create the query you want to run and save it to the project. For more information, see Create a query.

  2. Expand the Build menu in the top navigation, then choose JupyterLab to navigate to the JupyterLab IDE.

  3. Choose the Git icon in the left navigation.

  4. Choose the Pull latest changes icon to run a git pull, which brings the published querybook into your JupyterLab space.

  5. Note the location of the file in the JupyterLab file navigation. You will need that path later so you can add it to your workflow.
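
If the file is hard to spot in the file navigation after the pull, a short script run from a JupyterLab terminal or notebook can list candidate paths. The following is a minimal sketch, not part of Amazon SageMaker Unified Studio: the helper name `find_runnable_files` is hypothetical, and it assumes querybooks use the `.sqlnb` extension (as in the example path later on this page) and notebooks use `.ipynb`.

```python
from pathlib import Path

# Assumed extensions: querybooks (.sqlnb) and Jupyter notebooks (.ipynb).
RUNNABLE_SUFFIXES = {".sqlnb", ".ipynb"}


def find_runnable_files(root: str = ".") -> list[str]:
    """Return querybook and notebook paths under root, relative to root."""
    base = Path(root)
    matches = []
    for path in base.rglob("*"):
        relative = path.relative_to(base)
        # Skip anything inside hidden folders such as .git or .ipynb_checkpoints.
        if any(part.startswith(".") for part in relative.parts):
            continue
        if path.is_file() and path.suffix in RUNNABLE_SUFFIXES:
            matches.append(relative.as_posix())
    return sorted(matches)


if __name__ == "__main__":
    # Print one relative path per line, for example src/querybook.sqlnb.
    for relative_path in find_runnable_files("."):
        print(relative_path)
```

The paths are printed relative to the folder you scan, so run the script from the folder that your workflow path should be relative to.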

Create a workflow

To create a workflow, complete the following steps:

  1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials.

  2. Navigate to a project that was created with the Data analytics and AI-ML model development project profile. You can do this by using the center menu at the top of the page and choosing Browse all projects, then choosing the name of the project that you want to navigate to.

  3. In the Build menu, choose Workflows. This takes you to the Workflows page.

  4. Choose Create workflow in editor. This takes you to the Code page and opens a new Python file in the workflows/dags folder of the JupyterLab file navigation. The file is prepopulated with a workflow definition template.

  5. Update the file as desired to create your workflow.

    1. Set WORKFLOW_SCHEDULE to define when the workflow runs.

    2. Update NOTEBOOK_PATH to point to the querybook or JupyterLab notebook that you want to run. For example, 'src/querybook.sqlnb'.

    3. Set dag_id to an ID that you can recognize later.

    4. Add tags and parameters, if desired. For more information, see Params in the Apache Airflow documentation.
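
The values edited in step 5 can be sanity-checked before you save the file. The sketch below is illustrative only and does not reproduce the prepopulated template: `WORKFLOW_SCHEDULE`, `NOTEBOOK_PATH`, and `dag_id` are the names used above, while the sample values, the `check_settings` helper, and the list of schedule presets are assumptions. The checks are deliberately rough (a preset name or a five-field cron string, a `.sqlnb` or `.ipynb` path, and the character set that Apache Airflow accepts for DAG IDs).

```python
import re

# Sample values (hypothetical) for the settings named in step 5.
WORKFLOW_SCHEDULE = "@daily"            # a preset, or a cron string such as "0 6 * * *"
NOTEBOOK_PATH = "src/querybook.sqlnb"   # path noted in the JupyterLab file navigation
dag_id = "daily-sales-query"            # also becomes the workflow name

# A few common Airflow schedule presets; this list is not exhaustive.
COMMON_PRESETS = {"@once", "@hourly", "@daily", "@weekly", "@monthly", "@yearly"}


def check_settings(schedule: str, notebook_path: str, dag_id: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    # Rough check only: accept a known preset or anything with five cron fields.
    if schedule not in COMMON_PRESETS and len(schedule.split()) != 5:
        problems.append(f"schedule {schedule!r} is not a known preset or 5-field cron string")
    # Assumes querybooks end in .sqlnb and notebooks end in .ipynb.
    if not notebook_path.endswith((".sqlnb", ".ipynb")):
        problems.append(f"path {notebook_path!r} is not a querybook or notebook")
    # Airflow DAG IDs may contain only alphanumerics, dashes, dots, and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", dag_id):
        problems.append(f"dag_id {dag_id!r} contains unsupported characters")
    return problems


if __name__ == "__main__":
    for problem in check_settings(WORKFLOW_SCHEDULE, NOTEBOOK_PATH, dag_id):
        print("Problem:", problem)
```

An empty result does not guarantee that the workflow will run; it only catches obvious typos before the file reaches the scheduler.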

When you create a workflow, you are modifying the directed acyclic graph (DAG) within the Python file. A DAG defines a collection of tasks with their dependencies and relationships to show how they should run.

A DAG consists of the following:

  • A DAG definition. The DAG ID is also the name of the workflow.

  • Operators that describe how to run the DAG and the tasks to run.

  • Operator relationships that describe the order in which to run the tasks.

For more information about DAGs, see DAGs in the Apache Airflow documentation. You can configure a DAG to run on a schedule or run it manually.
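
To make the idea of tasks, dependencies, and run order concrete, the following sketch models a small DAG with the Python standard library instead of Apache Airflow. The task names are hypothetical, and `graphlib.TopologicalSorter` stands in for the scheduler only to show how dependencies determine ordering and why cycles are not allowed.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks that must finish first.
dependencies = {
    "run_querybook": set(),
    "run_notebook": {"run_querybook"},
    "publish_report": {"run_notebook"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(run_order(dependencies))
    # → ['run_querybook', 'run_notebook', 'publish_report']

    # A cycle means no valid order exists, which is why the graph must be acyclic.
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

In an Airflow DAG file, the same relationships are declared between operators rather than in a dictionary, but the ordering principle is the same.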

You can include multiple DAGs to create multiple workflows. When you have included the DAGs you want to use, save the file in the workflows/dags folder in JupyterLab. There might be a slight delay before the workflow appears on the Workflows page.