TensorBoard in Amazon SageMaker
Amazon SageMaker with TensorBoard is a capability of Amazon SageMaker that brings the TensorBoard
Note
This feature is for debugging the training of deep learning models using PyTorch or TensorFlow.
For data scientists
Training large models can surface problems that data scientists must debug and resolve to improve model convergence and stabilize gradient descent.
When you encounter model training issues, such as loss not converging, or vanishing or exploding weights and gradients, you need to access tensor data to dive deep and analyze the model parameters, scalars, and any custom metrics. Using SageMaker with TensorBoard, you can visualize model output tensors extracted from training jobs. As you experiment with different models, multiple training runs, and model hyperparameters, you can select multiple training jobs in TensorBoard and compare them in one place.
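As a rough illustration of the kind of check that logged scalar data makes possible, the sketch below classifies a sequence of per-step gradient norms as exploding, vanishing, or healthy. It is not part of SageMaker or TensorBoard: the function name `diagnose_gradient_norms` and both thresholds are assumptions chosen for illustration, and in practice the norms would come from values recorded during a training job.

```python
import math
from typing import Sequence


def diagnose_gradient_norms(
    norms: Sequence[float],
    vanish_below: float = 1e-7,   # assumed threshold; tune per model
    explode_above: float = 1e3,   # assumed threshold; tune per model
) -> str:
    """Return 'exploding', 'vanishing', or 'ok' for a run's gradient norms."""
    if not norms:
        raise ValueError("norms must contain at least one value")
    # NaN or infinite norms are treated as an explosion.
    if any(math.isnan(n) or math.isinf(n) for n in norms):
        return "exploding"
    # Any single step above the upper threshold counts as an explosion.
    if max(norms) > explode_above:
        return "exploding"
    # Report vanishing only if the most recent norm has collapsed.
    if norms[-1] < vanish_below:
        return "vanishing"
    return "ok"
```

A check like this only flags a symptom. Comparing the same scalar across several training jobs in TensorBoard is what helps locate the cause.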
For administrators
If you are an administrator of an AWS account or SageMaker domain, you can manage TensorBoard application users through the TensorBoard landing page in the SageMaker console or SageMaker domain. Each domain user can access their own TensorBoard application, subject to the permissions granted. Both domain administrators and domain users can create and delete the TensorBoard application, depending on their permission level.
Note
You cannot share the TensorBoard application for collaboration purposes because a SageMaker domain does not allow application sharing among users. Users can share the output tensors saved in an S3 bucket if they have access to the bucket.
Supported frameworks and AWS Regions
The TensorBoard application in SageMaker is available for the following machine learning frameworks and AWS Regions.
Frameworks
- PyTorch
- TensorFlow
- Hugging Face Transformers
AWS Regions
- US East (N. Virginia) (us-east-1)
- US East (Ohio) (us-east-2)
- US West (Oregon) (us-west-2)
- Europe (Frankfurt) (eu-central-1)
- Europe (Ireland) (eu-west-1)
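The Regions listed above can be encoded as a small lookup for pre-flight checks in scripts. This is a minimal sketch: the constant `SUPPORTED_REGIONS` and the helper `is_tensorboard_region` are hypothetical names, not part of any AWS SDK, and the set reflects only the Regions listed on this page, which may change over time.

```python
# Region codes copied from the list on this page; not fetched from AWS.
SUPPORTED_REGIONS = frozenset({
    "us-east-1",     # US East (N. Virginia)
    "us-east-2",     # US East (Ohio)
    "us-west-2",     # US West (Oregon)
    "eu-central-1",  # Europe (Frankfurt)
    "eu-west-1",     # Europe (Ireland)
})


def is_tensorboard_region(region: str) -> bool:
    """Return True if the Region code appears in the list above."""
    # Normalize whitespace and case before the membership test.
    return region.strip().lower() in SUPPORTED_REGIONS
```

For example, `is_tensorboard_region("eu-west-1")` returns `True`, while a Region not on the list, such as `ap-south-1`, returns `False`.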
Note
Amazon SageMaker with TensorBoard runs on an ml.r5.large instance and incurs charges after the SageMaker free tier or the free trial period of the feature. For more information, see Amazon SageMaker Pricing.