Creating a Visual ETL flow
To create a flow using Visual ETL in Amazon SageMaker Unified Studio:
-
Log in to Amazon SageMaker Unified Studio and select a project.
-
Open the "Build" dropdown menu and select "Visual ETL flows" to navigate to the Visual ETL tool.
-
Click "Create visual ETL flow" to open the Visual ETL editor.
If this is your first time using Visual ETL flows in Amazon SageMaker Unified Studio, you are asked to choose a default compute permission mode option based on your data access preference. For more information, see Configuring permission mode.
-
Name the flow when you begin authoring it.
From the dropdown menu next to the Run button, choose the compute permission mode option that supports the data you will be using in the flow.
Select project.spark.fineGrained to use a permission mode that supports fine-grained access control. This option configures your Visual ETL flow to work with data product subscriptions from Amazon SageMaker Catalog.
Select project.spark.compatibility to use a permission mode that is compatible with data managed using full-table access, meaning the compute engine can access all rows and columns in the data. This option configures your Visual ETL flow to work with data assets that you connect to from your project.
-
Select the "Add nodes" button, then choose a node from one of the three tabs: "Data sources", "Transforms", or "Data targets".
-
Drag a source component onto the canvas.
-
Click the node and edit its configuration to connect the component to your data source.
-
Add transformation components as needed, connecting them in the desired order.
-
Drag a data target onto the canvas and configure it to specify where the processed data should be stored.
-
Connect the components to create a complete flow.
-
Click the "Checklist" button to check for any configuration errors.
-
To make the flow accessible for all project members to view and edit, select "Save to project".
-
Select "Run" to execute the flow immediately, or run it on a schedule by following the instructions in Scheduling and running visual flows with workflows.
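The shape of a flow built with the steps above can be pictured as a small graph: source nodes feed transforms, which feed targets. The following Python sketch is only an illustration of that idea and of the kind of problems a configuration check might catch (a missing source or target, an unconnected node). It is not the Visual ETL editor's Checklist logic and not an Amazon SageMaker Unified Studio API; the names `Flow`, `add_node`, `connect`, and `checklist` are hypothetical. Only the two permission mode names are taken from the steps above.

```python
from dataclasses import dataclass, field

# Compute permission mode names as they appear in the steps above.
FINE_GRAINED = "project.spark.fineGrained"
COMPATIBILITY = "project.spark.compatibility"

NODE_KINDS = ("source", "transform", "target")


@dataclass
class Flow:
    """Hypothetical in-memory model of a flow; not a SageMaker API."""
    name: str
    permission_mode: str
    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: list = field(default_factory=list)  # (upstream id, downstream id)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def connect(self, upstream, downstream):
        for node_id in (upstream, downstream):
            if node_id not in self.nodes:
                raise ValueError(f"unknown node: {node_id}")
        self.edges.append((upstream, downstream))


def checklist(flow):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not flow.name:
        problems.append("flow has no name")
    if flow.permission_mode not in (FINE_GRAINED, COMPATIBILITY):
        problems.append(f"unknown permission mode: {flow.permission_mode}")

    kinds = set(flow.nodes.values())
    if "source" not in kinds:
        problems.append("flow has no data source")
    if "target" not in kinds:
        problems.append("flow has no data target")

    has_output = {upstream for upstream, _ in flow.edges}
    has_input = {downstream for _, downstream in flow.edges}
    for node_id, kind in flow.nodes.items():
        # Every node except a source needs something feeding it.
        if kind != "source" and node_id not in has_input:
            problems.append(f"{node_id}: no incoming connection")
        # Every node except a target needs somewhere to send its output.
        if kind != "target" and node_id not in has_output:
            problems.append(f"{node_id}: no outgoing connection")
    return problems


if __name__ == "__main__":
    flow = Flow("orders_cleanup", COMPATIBILITY)
    flow.add_node("orders_table", "source")
    flow.add_node("drop_nulls", "transform")
    flow.add_node("clean_orders", "target")
    flow.connect("orders_table", "drop_nulls")
    print(checklist(flow))  # the transform is not yet connected to the target
    flow.connect("drop_nulls", "clean_orders")
    print(checklist(flow))  # no problems once the flow is complete
```

In this sketch, the first call reports the unconnected transform and target, which mirrors why the procedure asks you to connect all components before checking the flow and running it.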