Pipelines overview - Amazon SageMaker AI

Pipelines overview

An Amazon SageMaker AI pipeline is a series of interconnected steps arranged in a directed acyclic graph (DAG). You define these steps using the drag-and-drop UI or the Pipelines SDK, or you can build your pipeline directly using the pipeline definition JSON schema. This JSON definition of the DAG describes the requirements of each step and the relationships between steps. The structure of a pipeline's DAG is determined by the data dependencies between steps. A data dependency is created when a property of one step's output is passed as an input to another step. The following image is an example of a pipeline DAG:

Figure: An example pipeline directed acyclic graph (DAG).

The example DAG includes the following steps:
  1. AbaloneProcess, an instance of the Processing step, runs a preprocessing script on the data used for training. For example, the script could fill in missing values, normalize numerical data, or split the data into training, validation, and test datasets.

  2. AbaloneTrain, an instance of the Training step, configures hyperparameters and trains a model from the preprocessed input data.

  3. AbaloneEval, another instance of the Processing step, evaluates the model's accuracy. This step shows an example of a data dependency: it uses the test dataset output of the AbaloneProcess step.

  4. AbaloneMSECond, an instance of the Condition step, checks in this example that the mean squared error (MSE) from the model evaluation is below a specified limit. If the model does not meet this criterion, the pipeline run stops.

  5. If the model meets the criterion, the pipeline run proceeds with the following steps:

    1. AbaloneRegisterModel, where SageMaker AI calls a RegisterModel step to register the model as a versioned model package within a model package group in the Amazon SageMaker Model Registry.

    2. AbaloneCreateModel, where SageMaker AI calls a CreateModel step to create the model in preparation for batch transform.

    3. AbaloneTransform, where SageMaker AI calls a Transform step to generate model predictions on a dataset you specify.
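The following plain-Python sketch shows how data dependencies determine the structure of a pipeline DAG. It is a minimal, hypothetical model and does not use the Pipelines SDK:

- Each step lists the (upstream step, output) pairs it consumes.
- Edges are derived only from those pairs.
- The standard library `graphlib.TopologicalSorter` returns one order in which the steps could run.

The step names come from the example DAG. The output names (`train`, `test`, `model`, `evaluation`) and AbaloneEval's use of the trained model are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical model of a pipeline (NOT the SageMaker Python SDK):
# each step maps to the (upstream_step, output_name) pairs it consumes.
# Output names and AbaloneEval's use of the model are assumptions.
STEPS = {
    "AbaloneProcess": [],
    "AbaloneTrain": [("AbaloneProcess", "train")],
    "AbaloneEval": [("AbaloneTrain", "model"), ("AbaloneProcess", "test")],
    "AbaloneMSECond": [("AbaloneEval", "evaluation")],
}


def build_dag(steps):
    """Map each step to the set of upstream steps whose outputs it consumes."""
    dag = {}
    for name, inputs in steps.items():
        unknown = sorted({src for src, _ in inputs if src not in steps})
        if unknown:
            raise ValueError(f"{name} consumes outputs of unknown steps: {unknown}")
        # An edge exists only because an upstream output is passed in here.
        dag[name] = {src for src, _ in inputs}
    return dag


def run_order(steps):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(build_dag(steps)).static_order())


if __name__ == "__main__":
    print(run_order(STEPS))
    # → ['AbaloneProcess', 'AbaloneTrain', 'AbaloneEval', 'AbaloneMSECond']
```

Because edges come only from consumed outputs, a step that takes no outputs from other steps (AbaloneProcess here) has no upstream edges and can start first. A dependency cycle makes `graphlib` raise `CycleError`, which reflects the requirement that a pipeline graph be acyclic.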
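The following sketch shows how AbaloneMSECond gates the rest of the run. It is a hypothetical illustration in plain Python, not the SageMaker Python SDK. It computes the mean squared error of some predictions and returns the steps that would run once the condition is evaluated. The threshold of 6.0 and the function names are assumptions; in a real pipeline you choose the limit yourself.

```python
# Hypothetical sketch of how a condition step such as AbaloneMSECond gates the
# rest of a run. This is plain Python, not the SageMaker Python SDK; the
# threshold value and function names are illustrative assumptions.
MSE_THRESHOLD = 6.0  # assumed limit; a real pipeline sets its own value


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def steps_after_condition(mse, threshold=MSE_THRESHOLD):
    """Return the steps that run once the evaluation MSE is known."""
    executed = ["AbaloneMSECond"]
    if not mse < threshold:
        # Criterion not met: the pipeline run stops at the condition step.
        return executed
    # Criterion met: the registration, model creation, and transform steps run.
    return executed + ["AbaloneRegisterModel", "AbaloneCreateModel", "AbaloneTransform"]


if __name__ == "__main__":
    mse = mean_squared_error([4.0, 5.0, 6.0], [4.0, 5.0, 8.0])
    print(round(mse, 3), steps_after_condition(mse))
```

The strict `<` comparison follows the "below a specified limit" wording. A pipeline could equally use a less-than-or-equal check.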

The following topics describe fundamental Pipelines concepts. For a tutorial describing the implementation of these concepts, see Pipelines actions.