TensorFlow
Bring your own TensorFlow model to SageMaker AI, and run the training job with SageMaker Training Compiler.
TensorFlow Models
SageMaker Training Compiler automatically optimizes model training workloads that are built on top of the native TensorFlow API or the high-level Keras API.
Tip
For preprocessing your input dataset, ensure that you use a static input shape. Dynamic input shape can initiate recompilation of the model and might increase total training time.
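For example, one way to keep the input shape static is to resize inputs to a fixed size and drop the last partial batch. The following is a minimal sketch; the dataset, image size, and batch size are placeholders.

import tensorflow as tf

def preprocess(image, label):
    # Resize to a fixed spatial size so every example has the same shape.
    return tf.image.resize(image, [224, 224]), label

dataset = (
    raw_dataset                       # assumption: an existing tf.data.Dataset
    .map(preprocess)
    .batch(32, drop_remainder=True)   # drop_remainder keeps the batch shape static
)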
Using Keras (Recommended)
For the best compiler acceleration, we recommend using models that are subclasses of TensorFlow Keras (tf.keras.Model).
For single GPU training
There's no additional change you need to make in the training script.
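For example, an ordinary Keras training script works as-is. The following is a minimal sketch with a placeholder model and synthetic data.

import tensorflow as tf

# A standard Keras model trained with compile() and fit(); no compiler-specific
# changes are needed for single GPU training.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x = tf.random.normal([256, 10])
y = tf.random.normal([256, 1])
model.fit(x, y, batch_size=32, epochs=2)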
Without Keras
SageMaker Training Compiler does not support eager execution in TensorFlow. Accordingly, you should wrap your model and training loops with the TensorFlow function decorator (@tf.function) to leverage compiler acceleration.
SageMaker Training Compiler performs a graph-level optimization, and uses the decorator to make sure your TensorFlow functions are set to run in graph mode.
For single GPU training
TensorFlow 2.0 or later has eager execution on by default, so you should add the @tf.function decorator in front of every function that you use for constructing a TensorFlow model.
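For example, a custom training loop with the forward pass and the training step wrapped in @tf.function might look like the following. This is a minimal sketch with a toy linear model and synthetic data; the names and hyperparameters are placeholders.

import tensorflow as tf

# A toy linear model built from raw variables (no Keras layers).
w = tf.Variable(tf.random.normal([10, 1]))
b = tf.Variable(tf.zeros([1]))

@tf.function
def forward(x):
    return tf.matmul(x, w) + b

@tf.function
def train_step(x, y, lr=0.01):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(forward(x) - y))
    grad_w, grad_b = tape.gradient(loss, [w, b])
    w.assign_sub(lr * grad_w)
    b.assign_sub(lr * grad_b)
    return loss

# Synthetic data with a static input shape.
x = tf.random.normal([32, 10])
y = tf.random.normal([32, 1])
for _ in range(10):
    loss = train_step(x, y)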
TensorFlow Models with Hugging Face Transformers
To run TensorFlow models with Hugging Face Transformers, use the HuggingFace estimator with the SageMaker Training Compiler configuration class as shown in the previous topic at Run TensorFlow Training Jobs with SageMaker Training Compiler.
SageMaker Training Compiler automatically optimizes model training workloads that are built on top of the native TensorFlow API or the high-level Keras API, such as the TensorFlow transformer models.
Tip
When you create a tokenizer for an NLP model using Transformers in your training script, make sure that you use a static input tensor shape by specifying padding='max_length'. Do not use padding='longest' because padding to the longest sequence in the batch can change the tensor shape for each training batch. The dynamic input shape can initiate recompilation of the model and might increase total training time. For more information about padding options of the Transformers tokenizers, see Padding and truncation.
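For example, a tokenizer call that produces a static input shape might look like the following minimal sketch; the checkpoint name, input texts, and max_length are placeholders.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encodings = tokenizer(
    examples,                 # assumption: a list of raw text strings
    padding="max_length",     # pad every sequence to the same fixed length
    truncation=True,
    max_length=128,
    return_tensors="tf",
)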
Using Keras
For the best compiler acceleration, we recommend using models that are subclasses of TensorFlow Keras (tf.keras.Model).
For single GPU training
There's no additional change you need to make in the training script.
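For example, a Hugging Face Transformers TF model can be trained with the standard Keras compile() and fit() calls. The following is a minimal sketch; the checkpoint name, texts, and labels are placeholders, and it assumes a Transformers version whose TF models compute their internal loss when compile() receives no loss argument.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Static input shape through padding='max_length', as recommended above.
encodings = tokenizer(
    texts,                    # assumption: a list of raw text strings
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="tf",
)

model.compile(optimizer=tf.keras.optimizers.Adam(3e-5))  # internal loss is used
model.fit(dict(encodings), tf.constant(labels), batch_size=16, epochs=2)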
For distributed training
SageMaker Training Compiler acceleration works transparently for multi-GPU workloads when the model is constructed and trained using Keras APIs within the scope of tf.distribute.Strategy.scope().
- Choose the right distributed training strategy.
  - For single-node multi-GPU, use tf.distribute.MirroredStrategy to set the strategy.

    strategy = tf.distribute.MirroredStrategy()

  - For multi-node multi-GPU, add the following code to properly set the TensorFlow distributed training configuration before creating the strategy.

    import json
    import os

    def set_sm_dist_config():
        DEFAULT_PORT = '8890'
        DEFAULT_CONFIG_FILE = '/opt/ml/input/config/resourceconfig.json'
        with open(DEFAULT_CONFIG_FILE) as f:
            config = json.loads(f.read())
        current_host = config['current_host']
        tf_config = {
            'cluster': {
                'worker': []
            },
            'task': {'type': 'worker', 'index': -1}
        }
        for i, host in enumerate(config['hosts']):
            tf_config['cluster']['worker'].append("%s:%s" % (host, DEFAULT_PORT))
            if current_host == host:
                tf_config['task']['index'] = i
        os.environ['TF_CONFIG'] = json.dumps(tf_config)

    set_sm_dist_config()

    Then use tf.distribute.MultiWorkerMirroredStrategy to set the strategy.

    strategy = tf.distribute.MultiWorkerMirroredStrategy()

- Using the strategy of your choice, wrap the model (an expanded sketch follows this list).

  with strategy.scope():
      # create a model and do fit
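Expanding the last step above, a Keras model built and compiled inside the strategy scope might look like the following minimal sketch with a placeholder model and synthetic data.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # or MultiWorkerMirroredStrategy()

with strategy.scope():
    # Build and compile the model inside the strategy scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

x = tf.random.normal([256, 10])
y = tf.random.normal([256, 1])
model.fit(x, y, batch_size=32, epochs=2)      # fit() runs under the strategy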
Without Keras
If you want to bring custom models with custom training loops using TensorFlow without Keras, you should wrap the model and the training loop with the TensorFlow function decorator (@tf.function) to leverage compiler acceleration.
SageMaker Training Compiler performs a graph-level optimization, and uses the decorator to make sure your TensorFlow functions are set to run in graph mode.
For single GPU training
TensorFlow 2.0 or later has eager execution on by default, so you should add the @tf.function decorator in front of every function that you use for constructing a TensorFlow model.
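For example, with a Hugging Face Transformers TF model, the training step that runs the forward and backward pass can be wrapped with @tf.function. The following is a minimal sketch; the checkpoint name, batch tensors, and hyperparameters are placeholders.

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(3e-5)

@tf.function
def train_step(input_ids, attention_mask, labels):
    with tf.GradientTape() as tape:
        # Forward pass; .logits is the classification head output.
        logits = model(input_ids, attention_mask=attention_mask, training=True).logits
        loss = loss_fn(labels, logits)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss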
For distributed training
In addition to the changes needed for Using Keras for distributed training, you need to ensure that functions to be run on each GPU are annotated with @tf.function, while cross-GPU communication functions are not annotated. Example training code should look like the following:
# Annotated with @tf.function: the per-GPU forward and backward pass is compiled.
@tf.function()
def compiled_step(inputs, outputs):
    with tf.GradientTape() as tape:
        pred = model(inputs, training=True)
        total_loss = loss_object(outputs, pred) / args.batch_size
    gradients = tape.gradient(total_loss, model.trainable_variables)
    return total_loss, pred, gradients

# Not annotated: applies weight decay and the optimizer update, and updates metrics.
def train_step(inputs, outputs):
    total_loss, pred, gradients = compiled_step(inputs, outputs)
    if args.weight_decay > 0.:
        gradients = [g + v * args.weight_decay for g, v in zip(gradients, model.trainable_variables)]
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss.update_state(total_loss)
    train_accuracy.update_state(outputs, pred)

# Runs train_step on each replica under the distribution strategy.
@tf.function()
def train_step_dist(inputs, outputs):
    strategy.run(train_step, args=(inputs, outputs))
Note that these instructions apply to both single-node multi-GPU and multi-node multi-GPU training.
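For example, the functions above can be driven by a distributed dataset as in the following minimal sketch; strategy, train_dataset, and args are assumed to be defined elsewhere in the training script, and args.num_epochs is a placeholder.

dist_dataset = strategy.experimental_distribute_dataset(train_dataset)

for epoch in range(args.num_epochs):
    for inputs, outputs in dist_dataset:
        # train_step_dist runs train_step on every replica.
        train_step_dist(inputs, outputs)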