Pass Data Between Steps - Amazon SageMaker AI

Pass Data Between Steps

When building pipelines with Amazon SageMaker Pipelines, you might need to pass data from one step to the next. For example, you might want to use the model artifacts generated by a training step as input to a model evaluation or deployment step. You can use this functionality to create interdependent pipeline steps and build your ML workflows.

When you need to retrieve information from the output of a pipeline step, you can use JsonGet. JsonGet helps you extract information from Amazon S3 or property files. The following sections explain methods you can use to extract step outputs with JsonGet.

Pass data between steps with Amazon S3

You can use JsonGet in a ConditionStep to fetch the JSON output directly from Amazon S3. The Amazon S3 URI can be a Std:Join function containing primitive strings, pipeline run variables, or pipeline parameters. The following example shows how you can use JsonGet in a ConditionStep:

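```python
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Example JSON file in an S3 bucket, generated by a processing_step:
# { "Output": [5, 10] }
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]",  # second element of the "Output" list
    ),
    right=6.0,
)
```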
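To make the `json_path` concrete, the following plain-Python sketch evaluates the same condition locally against the example JSON. It uses a hypothetical helper, `resolve_json_path`, that supports only dotted keys and `[index]` lookups. It is an illustration, not how the SageMaker Pipelines service evaluates `JsonGet`.

```python
import json
import re


# Hypothetical helper: resolves a small subset of JsonPath (dotted keys and
# [index] lookups). It is NOT the SageMaker implementation of JsonGet.
def resolve_json_path(document, path):
    value = document
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        value = value[key] if key else value[int(index)]
    return value


# The example JSON file that the processing step writes to Amazon S3.
report = json.loads('{"Output": [5, 10]}')

left = resolve_json_path(report, "Output[1]")  # second element of the list
condition_met = left <= 6.0                    # what ConditionLessThanOrEqualTo checks

print(left, condition_met)  # 10 False
```

Because `Output[1]` resolves to 10, which is greater than 6.0, the condition is false, so a `ConditionStep` that uses this condition would run its `else_steps`.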

If you use JsonGet with an Amazon S3 path in the condition step, you must explicitly add a dependency between the condition step and the step that generates the JSON output. In the following example, the condition step is created with a dependency on the processing step:

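```python
from sagemaker.workflow.condition_step import ConditionStep

# fail_step, register_model_step, and processing_step are assumed to be
# defined earlier in the pipeline.
cond_step = ConditionStep(
    name="<step-name>",
    conditions=[cond_lte],
    if_steps=[fail_step],
    else_steps=[register_model_step],
    depends_on=[processing_step],  # the JSON in S3 must exist before this step runs
)
```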
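The reason for `depends_on` can be pictured as a scheduling graph. The following sketch is a simplified model, not the service's actual scheduler: step names are hypothetical, and, in line with the requirement above, the S3-based `JsonGet` is modeled as creating no dependency edge on its own. Python's standard `graphlib.TopologicalSorter` reports which steps are ready to start first.

```python
from graphlib import TopologicalSorter


def first_ready_steps(dependencies):
    """Return the steps that have no predecessors and can start immediately."""
    sorter = TopologicalSorter(dependencies)  # maps step -> set of predecessors
    sorter.prepare()
    return set(sorter.get_ready())


# Hypothetical step names. The S3 URI in JsonGet is modeled as creating no
# edge to the processing step by itself.
without_depends_on = {
    "ProcessingStep": set(),
    "ConditionStep": set(),
}
# depends_on=[processing_step] adds the missing edge explicitly.
with_depends_on = {
    "ProcessingStep": set(),
    "ConditionStep": {"ProcessingStep"},
}

print(sorted(first_ready_steps(without_depends_on)))  # ['ConditionStep', 'ProcessingStep']
print(sorted(first_ready_steps(with_depends_on)))     # ['ProcessingStep']
```

Without the explicit edge, both steps are ready at once, so the condition step could try to read the JSON file before the processing step has written it.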

Pass data between steps with property files

Use property files to store information from the output of a processing step. This is particularly useful when you analyze the results of a processing step to decide which branch a condition step should run. The JsonGet function processes a property file and lets you use JsonPath notation to query the JSON property file. For more information on JsonPath notation, see the JsonPath repo.

To store a property file for later use, first create a PropertyFile instance in the following format. The path parameter is the name of the JSON file to which the property file is saved. The output_name must match the output_name of the ProcessingOutput that you define in your processing step; this is what links the property file to that ProcessingOutput.

```python
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>",
)
```
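The property file itself is ordinary JSON that your processing script writes. The following sketch shows that producer side. It assumes a ProcessingOutput with output_name "evaluation" whose source is a local directory such as `/opt/ml/processing/evaluation`, and a PropertyFile with `path="evaluation.json"`; these names, the `write_evaluation_report` function, and the example MSE value are hypothetical.

```python
import json
import os
import tempfile


def write_evaluation_report(output_dir, mse, file_name="evaluation.json"):
    """Write the JSON that a PropertyFile with path=file_name would index.

    output_dir stands in for the `source` directory of the ProcessingOutput
    whose output_name matches the PropertyFile's output_name.
    """
    os.makedirs(output_dir, exist_ok=True)
    report_path = os.path.join(output_dir, file_name)
    with open(report_path, "w") as f:
        json.dump({"mse": mse}, f)  # top-level "mse" key, queried with json_path="mse"
    return report_path


# Usage: write to a temporary directory instead of /opt/ml/processing/evaluation.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = write_evaluation_report(tmp_dir, mse=4.2)
    with open(report_path) as f:
        print(json.load(f))  # {'mse': 4.2}
```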

When you create your ProcessingStep instance, add the property_files parameter to list all of the property files that SageMaker Pipelines must index. This saves the property files for later use.

```python
property_files=[<property_file_instance>]
```
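Because the only link between a property file and a processing output is the shared output_name string, a mismatch breaks the association. The following sketch shows the matching rule as a check you might run before creating the ProcessingStep. `OutputSpec` and `PropertyFileSpec` are hypothetical stand-ins, not the SDK's ProcessingOutput and PropertyFile classes.

```python
from dataclasses import dataclass


# Stand-ins for illustration only; they are not the SageMaker SDK classes.
@dataclass
class OutputSpec:
    output_name: str


@dataclass
class PropertyFileSpec:
    name: str
    output_name: str
    path: str


def unmatched_property_files(outputs, property_files):
    """Return names of property files whose output_name matches no output."""
    output_names = {o.output_name for o in outputs}
    return [p.name for p in property_files if p.output_name not in output_names]


outputs = [OutputSpec("train"), OutputSpec("evaluation")]
property_files = [
    PropertyFileSpec("EvaluationReport", "evaluation", "evaluation.json"),
    PropertyFileSpec("TypoReport", "evalutaion", "evaluation.json"),  # misspelled on purpose
]

print(unmatched_property_files(outputs, property_files))  # ['TypoReport']
```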

To use your property file in a condition step, pass the property file instance to JsonGet in the condition that you give to your condition step, and use the json_path parameter to query the JSON file for the property you want, as shown in the following example.

```python
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=<property_file_instance>,
        json_path="mse",
    ),
    right=6.0,
)
```

For more in-depth examples, see Property File in the Amazon SageMaker Python SDK.