Turn on step caching
To turn on step caching, you must add a CacheConfig property to the step definition. CacheConfig properties use the following format in the pipeline definition file:

{
    "CacheConfig": {
        "Enabled": false,
        "ExpireAfter": "<time>"
    }
}
The Enabled field indicates whether caching is turned on for the particular step. You can set the field to true, which tells SageMaker AI to try to find a previous run of the step with the same attributes. Or, you can set the field to false, which tells SageMaker AI to run the step every time the pipeline runs. ExpireAfter is a string in ISO 8601 duration format. The ExpireAfter duration can be a year, month, week, day, hour, or minute value. Each value consists of a number followed by a letter indicating the unit of duration. For example:
- "30d" = 30 days
- "5y" = 5 years
- "T16m" = 16 minutes
- "30dT5h" = 30 days and 5 hours.
The following discussion describes the procedure to turn on caching for new or pre-existing pipelines using the Amazon SageMaker Python SDK.
Turn on caching for new pipelines
For new pipelines, initialize a CacheConfig instance with enable_caching=True and provide it as an input to your pipeline step. The following example turns on caching with a 1-hour timeout period for a training step:
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig

cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")

estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)
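As a minimal follow-up sketch (the pipeline name and IAM role ARN below are placeholder values, not part of the example above), you can then add the cached step to a pipeline, register it, and start an execution. A later execution of the same step with unchanged attributes can reuse the cached result:

from sagemaker.workflow.pipeline import Pipeline

# Assemble the cached training step into a pipeline.
# "AbalonePipeline" and the role ARN are placeholders for your own values.
pipeline = Pipeline(
    name="AbalonePipeline",
    steps=[step_train],
)

# Create or update the pipeline in SageMaker, then start an execution.
pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerExecutionRole")
execution = pipeline.start()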
Turn on caching for pre-existing pipelines
To turn on caching for pre-existing, already-defined pipelines, turn on the enable_caching property for the step, and set expire_after to a timeout value. Lastly, update the pipeline with pipeline.upsert() or pipeline.update(), and run it again. The following code example turns on caching with a 1-hour timeout period for a training step:
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
from sagemaker.workflow.pipeline import Pipeline

cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")

estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)

# define pipeline
pipeline = Pipeline(
    steps=[step_train]
)

# additional step for existing pipelines
pipeline.update()

# or, call upsert() to update the pipeline
# pipeline.upsert()
Alternatively, update the cache config after you have already defined the (pre-existing) pipeline, allowing one continuous code run. The following code sample demonstrates this method:
# turn on caching with timeout period of one hour
pipeline.steps[0].cache_config.enable_caching = True
pipeline.steps[0].cache_config.expire_after = "PT1H"

# additional step for existing pipelines
pipeline.update()

# or, call upsert() to update the pipeline
# pipeline.upsert()
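The same in-place approach can be used to turn caching back off for that step before the next run. The following sketch assumes the pipeline object defined above:

# turn off caching for the first step
pipeline.steps[0].cache_config.enable_caching = False

# push the change to the existing pipeline
pipeline.update()

# or, call upsert() to update the pipeline
# pipeline.upsert()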
For more detailed code examples and a discussion about how Python SDK parameters affect caching, see Caching Configuration.