Define metrics and environment variables

A tuning job optimizes hyperparameters for the training jobs that it launches, using a metric to evaluate performance. This guide shows how to define metrics whether you train with a custom algorithm or with a built-in algorithm from Amazon SageMaker AI. It also shows how to specify environment variables during an automatic model tuning (AMT) job.

Define metrics

Amazon SageMaker AI hyperparameter tuning parses your machine learning algorithm's stdout and stderr streams to find metrics, such as loss or validation-accuracy. The metrics show how well the model is performing on the dataset.

The following sections describe how to use two types of algorithms for training: built-in and custom.

Use a built-in algorithm for training

If you use one of the SageMaker AI built-in algorithms, metrics are already defined for you. In addition, built-in algorithms automatically send metrics to hyperparameter tuning for optimization. These metrics are also written to Amazon CloudWatch logs. For more information, see Log Amazon SageMaker AI Events with Amazon CloudWatch.

As the objective metric for the tuning job, choose one of the metrics that the built-in algorithm emits. For a list of available metrics, see the model tuning section for the appropriate algorithm in Use Amazon SageMaker AI Built-in Algorithms or Pre-trained Models.

You can choose up to 40 metrics to monitor in your tuning job. Select one of those metrics to be the objective metric. The hyperparameter tuning job returns the training job that performed the best against the objective metric.

Note

Hyperparameter tuning automatically sends an additional hyperparameter, _tuning_objective_metric, to the training jobs that it launches to pass your objective metric for use during training.
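For example, the following sketch uses the SageMaker Python SDK (rather than the low-level API shown elsewhere on this page) to select validation:auc, a metric that the built-in XGBoost algorithm emits, as the tuning objective. The role ARN, container version, instance type, hyperparameter range, and S3 path are illustrative placeholders.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

session = sagemaker.Session()

# Built-in XGBoost container; the framework version is illustrative.
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, "1.7-1"
    ),
    role="arn:aws:iam::111122223333:role/ExampleSageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Select one of the metrics that the built-in algorithm emits as the objective.
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={"eta": ContinuousParameter(0.1, 0.5)},
    max_jobs=10,
    max_parallel_jobs=2,
)

# tuner.fit({"train": "s3://amzn-s3-demo-bucket/train"})  # placeholder S3 path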

Use a custom algorithm for training

This section shows how to define your own metrics when you use a custom algorithm for training. When doing so, make sure that your algorithm writes at least one metric to stderr or stdout. Hyperparameter tuning parses these streams to find algorithm metrics that show how well the model is performing on the dataset.
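For instance, a training script can emit its loss once per iteration in a stable, parseable format. The following minimal sketch writes a metric line to stdout; the loss computation is a placeholder for your own.

import sys

for iteration in range(3):
    loss = 1.0 / (iteration + 1)  # placeholder for a real loss value
    # Emit the metric in a consistent format that a regular expression can parse.
    print(f"Loss = {loss:.6f}; Iteration {iteration}")
    sys.stdout.flush()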

You can define custom metrics by specifying a name and a regular expression for each metric that your tuning job monitors. Then, pass these metric definitions to the CreateHyperParameterTuningJob API in the MetricDefinitions field of the AlgorithmSpecification inside the TrainingJobDefinition parameter.

The following shows sample output from a log written to stderr or stdout by a training algorithm.

GAN_loss=0.138318; Scaled_reg=2.654134; disc:[-0.017371,0.102429] real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587; Loss = 16.020744; Iteration 0 took 0.704s; Elapsed=0s

The following example defines four metrics. Each definition pairs a name with a regular expression (regex) that searches the sample log output and captures the metric's numeric value.

[ { "Name": "ganloss", "Regex": "GAN_loss=(.*?);", }, { "Name": "disc-combined", "Regex": "disc-combined=(.*?);", }, { "Name": "discloss", "Regex": "disc_train_loss=(.*?);", }, { "Name": "loss", "Regex": "Loss = (.*?);", }, ]

In regular expressions, parentheses ( ) group parts of the expression together and capture the matched text.

  • For the loss metric defined in the code example, the expression (.*?); captures every character between the literal text "Loss = " and the first semicolon (;) character.

  • The character . matches any single character.

  • The character * matches zero or more occurrences of the preceding character.

  • The character ? makes the match non-greedy, so the capture stops at the first instance of the ; character.

The loss metric defined in the code sample therefore captures the value 16.020744 from the sample output.
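You can verify this behavior locally with Python's re module by running the pattern against the sample log line shown earlier; the capture group returns the numeric value only.

import re

log_line = (
    "GAN_loss=0.138318; Scaled_reg=2.654134; disc:[-0.017371,0.102429] "
    "real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587; "
    "Loss = 16.020744; Iteration 0 took 0.704s; Elapsed=0s"
)

# The same pattern as the "loss" metric definition above.
match = re.search(r"Loss = (.*?);", log_line)
print(match.group(1))  # prints: 16.020744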

Choose one of the metrics that you define as the objective metric for the tuning job. If you are using the SageMaker API, specify the value of that metric's Name key as the MetricName in the HyperParameterTuningJobObjective field of the HyperParameterTuningJobConfig parameter that you send to the CreateHyperParameterTuningJob operation.
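Putting these pieces together, the following boto3 sketch shows where the metric definitions and the objective metric sit in a CreateHyperParameterTuningJob request. The request is trimmed to the fields discussed on this page, and every account-specific value (job name, image URI, role ARN, S3 path, parameter range) is an illustrative placeholder.

import boto3

smclient = boto3.client("sagemaker")

smclient.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="custom-algo-tuning",  # placeholder name
    HyperParameterTuningJobConfig={
        "Strategy": "Bayesian",
        # MetricName must match one of the Name values in MetricDefinitions.
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": "loss",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 10,
            "MaxParallelTrainingJobs": 2,
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1"}
            ]
        },
    },
    TrainingJobDefinition={
        "AlgorithmSpecification": {
            "TrainingImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
            "TrainingInputMode": "File",
            # The metric definitions from the example above.
            "MetricDefinitions": [
                {"Name": "ganloss", "Regex": "GAN_loss=(.*?);"},
                {"Name": "disc-combined", "Regex": "disc-combined=(.*?);"},
                {"Name": "discloss", "Regex": "disc_train_loss=(.*?);"},
                {"Name": "loss", "Regex": "Loss = (.*?);"},
            ],
        },
        "RoleArn": "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
        "OutputDataConfig": {"S3OutputPath": "s3://amzn-s3-demo-bucket/output"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    },
)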

Specify environment variables

SageMaker AI AMT optimizes hyperparameters within a tuning job to find the best parameters for model performance. You can use environment variables to configure the behavior of your tuning job. You can also use, inside your tuning job, environment variables that you used during training.

If you want to use an environment variable from your tuning job or specify a new environment variable, enter it as a string-to-string map in the Environment field of the SageMaker AI HyperParameterTrainingJobDefinition. Pass this training job definition to the CreateHyperParameterTuningJob API.

For example, the environment variable SM_LOG_LEVEL can be set to the following values to tailor the output from a Python container.

NOTSET=0
DEBUG=10
INFO=20
WARN=30
ERROR=40
CRITICAL=50

As an example, to set the log level to 10 to debug your container logs, set the environment variable inside the HyperParameterTrainingJobDefinition, as follows.

{ "HyperParameterTuningJobConfig": { ..., } "TrainingJobDefinition": { ..., "Environment" : [ { "SM_LOG_LEVEL": 10 } ], ..., }, ..., }