Distributed Training with Amazon SageMaker AI RL
Amazon SageMaker AI RL supports multi-core and multi-instance distributed training. Depending on your use case, training and/or environment rollout can be distributed. For example, SageMaker AI RL works for the following distributed scenarios:
-
Single training instance and multiple rollout instances of the same instance type. For an example, see the Neural Network Compression example in the SageMaker AI examples repository
. -
Single trainer instance and multiple rollout instances, where different instance types for training and rollouts. For an example, see the AWS DeepRacer / AWS RoboMaker example in the SageMaker AI examples repository
. -
Single trainer instance that uses multiple cores for rollout. For an example, see the Roboschool example in the SageMaker AI examples repository
. This is useful if the simulation environment is light-weight and can run on a single thread. -
Multiple instances for training and rollouts. For an example, see the Roboschool example in the SageMaker AI examples repository
.