How SageMaker smart sifting works
The goal of SageMaker smart sifting is to sift through your training data during the training
process and feed only the more informative samples to the model. During typical training
with PyTorch, the PyTorch DataLoader iteratively sends batches of data to the training loop
and to accelerator devices (such as GPUs or Trainium chips).
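For reference, a typical PyTorch training loop looks like the following minimal sketch, where the DataLoader feeds batches to the model on the accelerator device. The model, dataset, and hyperparameters here are illustrative placeholders, not part of SageMaker smart sifting.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy dataset: 1,000 samples with 32 features and 10 classes.
dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

model = nn.Linear(32, 10).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    loss = loss_fn(model(inputs), targets)   # forward pass
    optimizer.zero_grad()
    loss.backward()                          # backward pass
    optimizer.step()
```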
The following diagram shows an overview of how the SageMaker smart sifting algorithm is designed.
In short, SageMaker smart sifting operates during training as data is loaded. The SageMaker smart sifting algorithm runs a loss calculation over each incoming batch and sifts out non-improving samples before the forward and backward pass of each iteration. The refined data batch is then used for the forward and backward pass.
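The following is a conceptual sketch of this idea, not the SageMaker smart sifting API: an extra no-gradient forward pass scores the per-sample loss of each batch, and a simple threshold (here the batch median, a hypothetical stand-in for the library's actual selection criterion) decides which samples proceed to the real forward and backward pass.

```python
import torch
from torch import nn

def sift_batch(model, per_sample_loss_fn, inputs, targets):
    """Return the subset of the batch kept for training (conceptual only)."""
    with torch.no_grad():  # scoring pass: forward only, no gradients
        per_sample_loss = per_sample_loss_fn(model(inputs), targets)
    # Keep high-loss ("more informative") samples; the median cutoff is
    # an illustrative placeholder, not the library's criterion.
    keep = per_sample_loss >= per_sample_loss.median()
    return inputs[keep], targets[keep]

model = nn.Linear(32, 10)
per_sample_loss_fn = nn.CrossEntropyLoss(reduction="none")  # per-sample losses
inputs, targets = torch.randn(64, 32), torch.randint(0, 10, (64,))

kept_inputs, kept_targets = sift_batch(model, per_sample_loss_fn, inputs, targets)
loss = nn.CrossEntropyLoss()(model(kept_inputs), kept_targets)  # forward pass
loss.backward()                                                 # backward pass
```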
Note
Smart sifting of data on SageMaker AI uses additional forward passes to analyze and filter your training data. In turn, there are fewer backward passes because less impactful data is excluded from your training job. Because of this, models that have long or expensive backward passes see the greatest efficiency gains when using smart sifting. Meanwhile, if your model's forward pass takes longer than its backward pass, the overhead could increase total training time. To measure the time spent by each pass, you can run a pilot training job and collect logs that record the time spent on each pass. Also consider using SageMaker Profiler, which provides profiling tools and a UI application. To learn more, see Amazon SageMaker Profiler.
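As one way to do such a pilot measurement, the following sketch times the forward and backward passes of a single iteration; the model and batch are placeholders. On a GPU, synchronizing before reading the clock ensures asynchronous kernels are included in the timing.

```python
import time
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(32, 10).to(device)
loss_fn = nn.CrossEntropyLoss()
inputs = torch.randn(64, 32, device=device)
targets = torch.randint(0, 10, (64,), device=device)

def sync():
    # Wait for queued GPU kernels so wall-clock timings are accurate.
    if device.type == "cuda":
        torch.cuda.synchronize()

sync(); start = time.perf_counter()
loss = loss_fn(model(inputs), targets)        # forward pass
sync(); forward_s = time.perf_counter() - start

start = time.perf_counter()
loss.backward()                               # backward pass
sync(); backward_s = time.perf_counter() - start

print(f"forward: {forward_s*1000:.2f} ms, backward: {backward_s*1000:.2f} ms")
```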
SageMaker smart sifting works for PyTorch-based training jobs with classic distributed data
parallelism, which creates model replicas on each GPU worker and performs
AllReduce. It works with PyTorch DDP and the SageMaker AI distributed data
parallel library.
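For context, a minimal PyTorch DDP setup looks like the following sketch, assuming the script is launched with torchrun (which sets the process-group environment variables). Each process holds a model replica, and gradients are averaged across replicas with AllReduce during backward().

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; torchrun provides LOCAL_RANK and the rendezvous info.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(32, 10).to(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])  # one replica per GPU worker

# Training proceeds as usual; backward() triggers AllReduce on gradients.
```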