Pass Data Between Steps
When building pipelines with Amazon SageMaker Pipelines, you might need to pass data from one step to the next. For example, you might want to use the model artifacts generated by a training step as input to a model evaluation or deployment step. You can use this functionality to create interdependent pipeline steps and build your ML workflows.
When you need to retrieve information from the output of a pipeline step, you can use JsonGet. JsonGet helps you extract information from Amazon S3 or property files. The following sections explain methods you can use to extract step outputs with JsonGet.
Pass data between steps with Amazon S3
You can use JsonGet in a ConditionStep to fetch the JSON output directly from Amazon S3. The Amazon S3 URI can be a Std:Join function containing primitive strings, pipeline run variables, or pipeline parameters. The following example shows how you can use JsonGet in a ConditionStep:
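from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Example JSON file in the S3 bucket, generated by a processing step:
# {
#    "Output": [5, 10]
# }

# Compare the second element of "Output" against a threshold of 6.0.
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]"
    ),
    right=6.0
)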
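To make the json_path lookup concrete, the following plain-Python sketch resolves "Output[1]" against the example JSON above and applies the same less-than-or-equal comparison. It is an illustration only and not how SageMaker Pipelines evaluates JsonGet; the resolve_json_path helper is hypothetical and handles just dotted keys and bracketed list indexes.

```python
import json
import re


def resolve_json_path(document, json_path):
    """Walk a parsed JSON document with a simple path such as "Output[1]"."""
    current = document
    # Each match is either a key name (first group) or a list index (second group).
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", json_path):
        if key:
            current = current[key]
        else:
            current = current[int(index)]
    return current


# Same content as the example JSON file shown above.
example = json.loads('{"Output": [5, 10]}')

value = resolve_json_path(example, "Output[1]")  # second element of "Output"
condition_holds = value <= 6.0                   # mirrors ConditionLessThanOrEqualTo
print(value, condition_holds)
```

With the example file, the selected value is 10, so the condition does not hold and a condition step built on it would run its else_steps.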
If you are using JsonGet with an Amazon S3 path in the condition step, you must explicitly add a dependency between the condition step and the step generating the JSON output. In the following example, the condition step is created with a dependency on the processing step:
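from sagemaker.workflow.condition_step import ConditionStep

# fail_step, register_model_step, and processing_step are defined elsewhere in the pipeline.
cond_step = ConditionStep(
    name="<step-name>",
    conditions=[cond_lte],
    if_steps=[fail_step],
    else_steps=[register_model_step],
    depends_on=[processing_step],  # run only after the JSON output exists
)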
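The reason for the explicit dependency can be illustrated with a small ordering sketch: without a declared edge, nothing forces the step that writes the JSON file to finish before the condition is evaluated. The example below uses Python's standard graphlib module and hypothetical step names; it is a conceptual illustration and does not reflect how SageMaker Pipelines schedules steps internally.

```python
from graphlib import TopologicalSorter

# Map each step name to the set of steps it depends on.
# The names are placeholders for the steps in the example above.
dependencies = {
    "condition_step": {"processing_step"},  # equivalent of depends_on=[processing_step]
    "processing_step": set(),
}

# A topological order places every step after the steps it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Because the condition step lists the processing step as a dependency, any valid run order places the processing step first.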
Pass data between steps with property files
Use property files to store information from the output of a processing step. This is particularly useful when analyzing the results of a processing step to decide how a condition step should be executed. The JsonGet function processes a property file and enables you to use JsonPath notation to query the property JSON file. For more information on JsonPath notation, see the JsonPath repo.
To store a property file for later use, you must first create a PropertyFile instance with the following format. The path parameter is the name of the JSON file to which the property file is saved. The output_name must match the output_name of the ProcessingOutput that you define in your processing step. This enables the property file to capture the ProcessingOutput in the step.
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>"
)
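For the property file to be useful, the processing script has to write a JSON file whose name matches the path parameter, into the directory that backs the ProcessingOutput. The following is a minimal sketch of that write step under stated assumptions: the write_metrics_file helper, the evaluation.json file name, and the metric value are all hypothetical, and a temporary directory stands in for the container's local output directory. The top-level "mse" key is chosen to match the json_path="mse" query used in the condition example in this section.

```python
import json
import os
import tempfile


def write_metrics_file(output_dir, metrics, file_name="evaluation.json"):
    """Write metrics as a JSON file and return the full path of the file."""
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, file_name)
    with open(file_path, "w") as f:
        json.dump(metrics, f)
    return file_path


# Assumption: in a real processing job this would be the local directory
# configured as the source of the ProcessingOutput.
output_dir = tempfile.mkdtemp()

# Hypothetical metric value; the key must match the json_path queried later.
written_path = write_metrics_file(output_dir, {"mse": 4.2})

# Read the file back to confirm the structure a JsonPath query would see.
with open(written_path) as f:
    saved = json.load(f)
print(saved)
```

In this sketch, path in the PropertyFile would be set to the same file name that the script writes.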
When you create your ProcessingStep instance, add the property_files parameter to list all of the property files that the Amazon SageMaker Pipelines service must index. This saves the property file for later use.
property_files=[<property_file_instance>]
To use your property file in a condition step, add the property_file to the condition that you pass to your condition step, and use the json_path parameter to query the JSON file for your desired property, as shown in the following example.
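from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Query the "mse" property in the property file produced by step_eval.
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=<property_file_instance>,
        json_path="mse"
    ),
    right=6.0
)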
For more in-depth examples, see Property File.