Resume training from a checkpoint - Amazon SageMaker

Resume training from a checkpoint

To resume a training job from a checkpoint, create and run a new estimator with the same checkpoint_s3_uri that you used in the Enable checkpointing section. When the new training job starts, the checkpoints stored at this S3 location are restored to checkpoint_local_path on each instance of the job. Ensure that the S3 bucket is in the same Region as the current SageMaker session.
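The following is a minimal sketch of what creating such an estimator might look like with the SageMaker Python SDK. The helper names (resume_checkpoint_kwargs, build_resumed_estimator), the instance type, and the choice of the generic Estimator class are assumptions made for illustration; a framework estimator accepts the same two checkpoint parameters. /opt/ml/checkpoints is the default local checkpoint path in the training container.

```python
from typing import Any, Dict

# Default directory in the training container that SageMaker syncs with S3.
DEFAULT_CHECKPOINT_LOCAL_PATH = "/opt/ml/checkpoints"


def resume_checkpoint_kwargs(
    checkpoint_s3_uri: str,
    checkpoint_local_path: str = DEFAULT_CHECKPOINT_LOCAL_PATH,
) -> Dict[str, Any]:
    """Build the checkpoint arguments for the new estimator (hypothetical helper).

    checkpoint_s3_uri must be the same URI the earlier job wrote its
    checkpoints to, otherwise there is nothing to restore.
    """
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {checkpoint_s3_uri!r}")
    return {
        "checkpoint_s3_uri": checkpoint_s3_uri,
        "checkpoint_local_path": checkpoint_local_path,
    }


def build_resumed_estimator(
    image_uri: str,
    role: str,
    checkpoint_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",  # assumed value, adjust as needed
    instance_count: int = 1,
):
    """Create an estimator that restores checkpoints when its job starts."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from sagemaker.estimator import Estimator

    return Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=instance_count,
        instance_type=instance_type,
        **resume_checkpoint_kwargs(checkpoint_s3_uri),
    )
```

Calling fit() on the returned estimator starts the new training job. The training script itself still has to look in checkpoint_local_path and load the most recent checkpoint it finds there.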

Architecture diagram of syncing checkpoints to resume training.