Pass Data Between Steps
When building pipelines with Amazon SageMaker Pipelines, you might need to pass data from one step to the next. For example, you might want to use the model artifacts generated by a training step as input to a model evaluation or deployment step. You can use this functionality to create interdependent pipeline steps and build your ML workflows.
When you need to retrieve information from the output of a pipeline step, you can use JsonGet. JsonGet helps you extract information from Amazon S3 or property files. The following sections explain methods you can use to extract step outputs with JsonGet.
Pass data between steps with Amazon S3
You can use JsonGet in a ConditionStep to fetch the JSON output directly from Amazon S3. The Amazon S3 URI can be a Std:Join function (Join in the SageMaker Python SDK) containing primitive strings, pipeline run variables, or pipeline parameters.
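To make the Std:Join form more concrete, here is a minimal local sketch that builds and resolves a Join-style expression. The dictionary shape `{"Std:Join": {"On": ..., "Values": [...]}}` mirrors how a Join appears in a pipeline definition, and `{"Get": ...}` stands for a reference the service fills in at run time. The helpers `make_join` and `resolve_join`, the parameter name `OutputBucket`, and the runtime values are hypothetical and exist only for illustration; this is not how SageMaker Pipelines resolves expressions internally.

```python
from typing import Any, Dict, List, Union

# A value inside a Join is either a literal string or a {"Get": "<reference>"} lookup.
Value = Union[str, Dict[str, str]]


def make_join(on: str, values: List[Value]) -> Dict[str, Any]:
    """Build a Std:Join expression in the shape used by pipeline definitions."""
    return {"Std:Join": {"On": on, "Values": values}}


def resolve_join(expr: Dict[str, Any], context: Dict[str, str]) -> str:
    """Resolve a Std:Join locally, looking up {"Get": ...} references in `context`.

    `context` is a hypothetical stand-in for the values that are substituted at
    run time (pipeline parameters, pipeline run variables, and so on).
    """
    join = expr["Std:Join"]
    parts = []
    for value in join["Values"]:
        if isinstance(value, dict):
            parts.append(str(context[value["Get"]]))
        else:
            parts.append(value)
    return join["On"].join(parts)


s3_uri_expr = make_join(
    "/",
    [
        "s3:/",  # joined with "/" this becomes "s3://"
        {"Get": "Parameters.OutputBucket"},        # pipeline parameter (hypothetical name)
        {"Get": "Execution.PipelineExecutionId"},  # pipeline run variable
        "evaluation.json",                         # primitive string
    ],
)

# Hypothetical run-time values.
runtime_values = {
    "Parameters.OutputBucket": "amzn-s3-demo-bucket",
    "Execution.PipelineExecutionId": "abc123",
}

print(resolve_join(s3_uri_expr, runtime_values))
```

In the SDK itself you would write `Join(on="/", values=[...])` from `sagemaker.workflow.functions` and pass it as `s3_uri` rather than building the dictionary by hand.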
The following example shows how you can use JsonGet in a ConditionStep:
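```python
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Example JSON file in the S3 bucket, generated by a processing step:
# { "Output": [5, 10] }
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]",  # second element of the "Output" array
    ),
    right=6.0,
)
```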
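As a worked check of what this condition evaluates to, the following plain-Python sketch applies the same lookup to the example JSON file. It only simulates the comparison locally; in a real pipeline, SageMaker Pipelines performs the evaluation at run time. JsonPath array indexes are zero-based, so `Output[1]` selects the second element.

```python
import json

# The example file written by the processing step.
raw = '{ "Output": [5, 10] }'
data = json.loads(raw)

# json_path="Output[1]": take key "Output", then the element at zero-based index 1.
left = data["Output"][1]
right = 6.0

# ConditionLessThanOrEqualTo is satisfied when left <= right.
condition_met = left <= right

print(left, condition_met)
```

Here the left side resolves to 10, so `10 <= 6.0` is false and a ConditionStep built with this condition would run its `else_steps`.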
If you use JsonGet with an Amazon S3 path in a condition step, you must explicitly add a dependency between the condition step and the step that generates the JSON output. In the following example, the condition step is created with a dependency on the processing step:
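```python
from sagemaker.workflow.condition_step import ConditionStep

# fail_step, register_model_step, and processing_step are defined elsewhere in the pipeline.
cond_step = ConditionStep(
    name="<step-name>",
    conditions=[cond_lte],
    if_steps=[fail_step],
    else_steps=[register_model_step],
    depends_on=[processing_step],  # the step that writes the JSON file to S3
)
```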
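The need for depends_on can be illustrated with a small scheduling sketch. A plain S3 URI string carries no reference to another step, so nothing in it tells the pipeline that the condition step must wait for the step that writes the JSON file. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to show which steps of a hypothetical two-step graph could start immediately with and without that edge. It models the idea only and is not how SageMaker Pipelines schedules steps internally.

```python
from graphlib import TopologicalSorter


def first_wave(graph):
    """Return the steps that have no unmet dependencies and could start immediately."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    return set(sorter.get_ready())


# Each key is a (hypothetical) step name; each value is the set of steps it waits for.
without_depends_on = {
    "ProcessingStep": set(),
    "ConditionStep": set(),  # the S3 URI string alone creates no dependency
}
with_depends_on = {
    "ProcessingStep": set(),
    "ConditionStep": {"ProcessingStep"},  # the edge that depends_on=[processing_step] adds
}

print(sorted(first_wave(without_depends_on)))  # both steps could start at once
print(sorted(first_wave(with_depends_on)))     # only the processing step can start
print(list(TopologicalSorter(with_depends_on).static_order()))
```

Without the explicit edge, the condition step could be scheduled before the JSON file exists in Amazon S3.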
Pass data between steps with property files
Use property files to store information from the output of a processing step. This is particularly useful when you analyze the results of a processing step to decide which branch a condition step should take. The JsonGet function processes a property file and lets you use JsonPath notation to query the property JSON file. For more information on JsonPath notation, see the JsonPath repo.
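To show what a JsonPath query selects, here is a minimal resolver for a small subset of the notation: dot-separated keys and zero-based `[n]` array indexes. The `resolve_path` helper and the sample property file contents are hypothetical; this is not the parser JsonGet uses, and it does not support wildcards, filters, the `$` root, or recursive descent.

```python
import json
import re
from typing import Any

# One path segment: a key name optionally followed by one or more [index] parts.
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def resolve_path(document: Any, path: str) -> Any:
    """Resolve a simple JsonPath-style query such as "a.b[0].c" against parsed JSON."""
    current = document
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"Unsupported path segment: {segment!r}")
        key, indexes = match.groups()
        current = current[key]
        # Apply each [n] index in order, for example "Output[1]".
        for index in re.findall(r"\[(\d+)\]", indexes):
            current = current[int(index)]
    return current


# Hypothetical contents of a property file written by an evaluation step.
property_file = json.loads(
    '{"mse": 4.2, "regression_metrics": {"mse": {"value": 4.2}}, "Output": [5, 10]}'
)

print(resolve_path(property_file, "mse"))
print(resolve_path(property_file, "regression_metrics.mse.value"))
print(resolve_path(property_file, "Output[1]"))
```

A flat key such as `mse` works when the value sits at the top level of the file; nested reports need a dotted path to reach the value.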
To store a property file for later use, you must first create a PropertyFile instance with the following format. The path parameter is the name of the JSON file to which the property file is saved. The output_name must match the output_name of the ProcessingOutput that you define in your processing step. This enables the property file to capture the ProcessingOutput in the step.
```python
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>",
)
```
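On the processing side, the script running inside the processing container writes the JSON file into the local directory that backs the ProcessingOutput, and PropertyFile's path names that file within the output. The sketch below assumes a hypothetical ProcessingOutput whose source directory is `/opt/ml/processing/evaluation`, a file named `evaluation.json`, and an illustrative `mse` value; outside a container it falls back to a temporary directory so it can run locally.

```python
import json
import os
import tempfile


def write_evaluation_report(output_dir: str, mse: float) -> str:
    """Write a small JSON report that a property file can later query with json_path="mse"."""
    os.makedirs(output_dir, exist_ok=True)
    # The file name must match PropertyFile(path="evaluation.json") in this hypothetical setup.
    report_path = os.path.join(output_dir, "evaluation.json")
    with open(report_path, "w") as f:
        json.dump({"mse": mse}, f)
    return report_path


# Inside a processing container this would be the ProcessingOutput source directory
# (hypothetical here); outside a container, use a temporary directory instead.
container_dir = "/opt/ml/processing/evaluation"
output_dir = container_dir if os.path.isdir("/opt/ml/processing") else tempfile.mkdtemp()

path = write_evaluation_report(output_dir, mse=4.2)  # 4.2 is an illustrative value
with open(path) as f:
    print(json.load(f))
```

In a real evaluation script the metric would be computed from model predictions rather than hard-coded.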
When you create your ProcessingStep instance, add the property_files parameter to list all of the property files that the Amazon SageMaker Pipelines service must index. This saves the property file for later use.
```python
property_files=[<property_file_instance>]
```
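Because each property file is tied to its ProcessingOutput only through output_name, a mismatch is easy to introduce. The following sketch models that matching rule with plain dataclasses and reports any property file whose output_name has no corresponding processing output. The `Output` and `PropFile` classes and the `unmatched_property_files` check are hypothetical stand-ins for illustration, not SageMaker SDK types or validation the SDK performs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Output:
    """Stand-in for a ProcessingOutput, keeping only the field that matters here."""
    output_name: str


@dataclass
class PropFile:
    """Stand-in for a PropertyFile."""
    name: str
    output_name: str
    path: str


def unmatched_property_files(outputs: List[Output], property_files: List[PropFile]) -> List[str]:
    """Return the names of property files whose output_name matches no processing output."""
    known = {o.output_name for o in outputs}
    return [p.name for p in property_files if p.output_name not in known]


outputs = [Output("train"), Output("evaluation")]
property_files = [
    PropFile(name="EvaluationReport", output_name="evaluation", path="evaluation.json"),
    PropFile(name="DriftReport", output_name="drift", path="drift.json"),  # no "drift" output
]

print(unmatched_property_files(outputs, property_files))
```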
To use your property file in a condition step, pass your property file instance as the property_file parameter of JsonGet in the condition that you give to your condition step, and use the json_path parameter to query the JSON file for the property you want, as shown in the following example:
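```python
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,  # the processing step that lists the property file
        property_file=<property_file_instance>,
        json_path="mse",  # top-level "mse" key in the JSON file
    ),
    right=6.0,
)
```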
For more in-depth examples, see Property File.