Distributed Training with Amazon SageMaker AI RL

Amazon SageMaker AI RL supports multi-core and multi-instance distributed training. Depending on your use case, you can distribute training, environment rollouts, or both. For example, SageMaker AI RL supports the following distributed scenarios:

  • Single trainer instance and multiple rollout instances of the same instance type. For an example, see the Neural Network Compression example in the SageMaker AI examples repository.

  • Single trainer instance and multiple rollout instances, with different instance types for training and rollouts. For an example, see the AWS DeepRacer / AWS RoboMaker example in the SageMaker AI examples repository.

  • Single trainer instance that uses multiple cores for rollouts. This is useful if the simulation environment is lightweight and can run on a single thread. For an example, see the Roboschool example in the SageMaker AI examples repository.

  • Multiple instances for training and rollouts. For an example, see the Roboschool example in the SageMaker AI examples repository.
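
To make the multi-core rollout scenario more concrete, the following is a minimal sketch in plain Python. It does not use the SageMaker AI API. The toy random-walk environment and the names run_episode and collect_rollouts are hypothetical, chosen only for illustration. The idea is that when each episode is lightweight and single-threaded, one process per core can collect rollouts in parallel on a single instance.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def run_episode(seed, max_steps=100):
    """Roll out one episode of a toy 1-D random walk (hypothetical environment).

    Returns (steps_taken, total_reward). Each step costs 1.0; reaching
    position +5 or -5 ends the episode with a bonus of 10.0.
    """
    rng = random.Random(seed)  # per-episode RNG keeps rollouts reproducible
    position, total_reward = 0, 0.0
    for step in range(1, max_steps + 1):
        position += rng.choice((-1, 1))  # random action stands in for a policy
        total_reward -= 1.0
        if abs(position) >= 5:
            return step, total_reward + 10.0
    return max_steps, total_reward


def collect_rollouts(seeds, num_workers=1):
    """Collect one episode per seed, optionally spread across worker processes."""
    seeds = list(seeds)
    if num_workers <= 1:
        # Single-core path: run every episode in the current process.
        return [run_episode(seed) for seed in seeds]
    # Multi-core path: each single-threaded episode runs in its own process.
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(run_episode, seeds))
```

For example, calling collect_rollouts(range(8), num_workers=4) from a script's main entry point would spread eight episodes over four worker processes. Because each episode is seeded independently, the results should match the single-process path for the same seeds.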