Define metrics and environment variables
A tuning job optimizes hyperparameters for the training jobs that it launches by using a metric to evaluate performance. This guide shows how to define metrics, whether you use a custom algorithm for training or a built-in algorithm from Amazon SageMaker AI. This guide also shows how to specify environment variables during an automatic model tuning (AMT) job.
Define metrics
Amazon SageMaker AI hyperparameter tuning parses your machine learning algorithm's stdout and stderr streams to find metrics, such as loss or validation-accuracy. These metrics show how well the model is performing on the dataset.
The following sections describe how to use two types of algorithms for training: built-in and custom.
Use a built-in algorithm for training
If you use one of the SageMaker AI built-in algorithms, metrics are already defined for you. In addition, built-in algorithms automatically send metrics to hyperparameter tuning for optimization. These metrics are also written to Amazon CloudWatch logs. For more information, see Log Amazon SageMaker AI Events with Amazon CloudWatch.
For the objective metric for the tuning job, choose one of the metrics that the built-in algorithm emits. For a list of available metrics, see the model tuning section for the appropriate algorithm in Use Amazon SageMaker AI Built-in Algorithms or Pre-trained Models.
You can choose up to 40 metrics to monitor in your tuning job. Select one of those metrics to be the objective metric. The hyperparameter tuning job returns the training job that performed the best against the objective metric.
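For illustration, the following minimal sketch shows how an objective metric for a built-in algorithm might look in the HyperParameterTuningJobConfig that you pass to CreateHyperParameterTuningJob. The metric name validation:rmse is an assumption for this example; it is one of the metrics emitted by the built-in XGBoost algorithm, and you should substitute a metric listed for the algorithm you use.

# A minimal sketch, assuming the built-in XGBoost algorithm, which emits
# validation:rmse among other metrics; substitute a metric listed for
# the algorithm you use.
hyperparameter_tuning_job_objective = {
    "Type": "Minimize",              # "Maximize" for accuracy-style metrics
    "MetricName": "validation:rmse",
}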
Note
Hyperparameter tuning automatically sends an additional hyperparameter, _tuning_objective_metric, to pass your objective metric to the tuning job for use during training.
Use a custom algorithm for training
This section shows how to define your own metrics so that you can use your own custom algorithm for training. When doing so, make sure that your algorithm writes at least one metric to stderr or stdout. Hyperparameter tuning parses these streams to find algorithm metrics that show how well the model is performing on the dataset.
You can define custom metrics by specifying a name and a regular expression for each metric that your tuning job monitors. Then, pass these metric definitions to the CreateHyperParameterTuningJob API in the MetricDefinitions field of AlgorithmSpecification, within the TrainingJobDefinition parameter.
The following shows sample output from a log written to stderr or stdout by a training algorithm.
GAN_loss=0.138318; Scaled_reg=2.654134; disc:[-0.017371,0.102429] real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587; Loss = 16.020744; Iteration 0 took 0.704s; Elapsed=0s
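A line like this typically comes from an ordinary print or logging call in the training script. As a minimal sketch in Python (the variable names and values here are hypothetical):

# Hypothetical metric values computed during a training step.
gan_loss, loss = 0.138318, 16.020744

# print writes to stdout, which hyperparameter tuning parses for metrics.
print(f"GAN_loss={gan_loss:f}; Loss = {loss:f};")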
The following example shows how to define four metrics. Each definition pairs a name with a regular expression (regex) that is used to search the sample log output and capture the metric's numeric value.
[ { "Name": "ganloss", "Regex": "GAN_loss=(.*?);", }, { "Name": "disc-combined", "Regex": "disc-combined=(.*?);", }, { "Name": "discloss", "Regex": "disc_train_loss=(.*?);", }, { "Name": "loss", "Regex": "Loss = (.*?);", }, ]
In regular expressions, parentheses () are used to group parts of the regular expression together.

- For the loss metric that is defined in the code example, the expression (.*?); captures any characters between the exact text "Loss = " and the first semicolon (;) character.
- The character . instructs the regular expression to match any character.
- The character * means to match zero or more characters.
- The character ? means to capture only until the first instance of the ; character.
The loss metric defined in the code sample matches Loss = 16.020744 in the sample output and captures the value 16.020744.
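Because hyperparameter tuning applies these expressions to your logs, it can be useful to test them locally before starting a tuning job. The following sketch uses Python's standard re module to apply the four metric definitions to the sample log line and print what each one captures:

import re

# The sample log line from earlier in this section.
sample = (
    "GAN_loss=0.138318; Scaled_reg=2.654134; disc:[-0.017371,0.102429] "
    "real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587; "
    "Loss = 16.020744; Iteration 0 took 0.704s; Elapsed=0s"
)

metric_definitions = [
    {"Name": "ganloss", "Regex": "GAN_loss=(.*?);"},
    {"Name": "disc-combined", "Regex": "disc-combined=(.*?);"},
    {"Name": "discloss", "Regex": "disc_train_loss=(.*?);"},
    {"Name": "loss", "Regex": "Loss = (.*?);"},
]

for definition in metric_definitions:
    match = re.search(definition["Regex"], sample)
    if match:
        # group(1) is the value captured by the parentheses.
        print(definition["Name"], "->", match.group(1))

# Output:
# ganloss -> 0.138318
# disc-combined -> 0.000000
# discloss -> 1.374587
# loss -> 16.020744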
Choose one of the metrics that you define as the objective metric for the tuning job. If you are using the SageMaker API, specify the value of the metric's Name key as the MetricName in the HyperParameterTuningJobObjective field of the HyperParameterTuningJobConfig parameter that you send to the CreateHyperParameterTuningJob operation.
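Putting these pieces together, the following sketch shows where the metric definitions and the objective metric sit in a CreateHyperParameterTuningJob request made with boto3. The job name, role ARN, image URI, S3 path, and parameter range are hypothetical placeholders for this example.

import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName="my-tuning-job",  # hypothetical name
    HyperParameterTuningJobConfig={
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": "loss",  # the Name of one defined metric
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 10,
            "MaxParallelTrainingJobs": 2,
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.00001", "MaxValue": "0.01"}
            ]
        },
    },
    TrainingJobDefinition={
        "AlgorithmSpecification": {
            # Hypothetical custom training image.
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algorithm:latest",
            "TrainingInputMode": "File",
            # The metric definitions from the earlier example.
            "MetricDefinitions": [
                {"Name": "ganloss", "Regex": "GAN_loss=(.*?);"},
                {"Name": "disc-combined", "Regex": "disc-combined=(.*?);"},
                {"Name": "discloss", "Regex": "disc_train_loss=(.*?);"},
                {"Name": "loss", "Regex": "Loss = (.*?);"},
            ],
        },
        "RoleArn": "arn:aws:iam::123456789012:role/MySageMakerRole",
        "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    },
)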
Specify environment variables
SageMaker AI AMT optimizes hyperparameters within a tuning job to find the parameters that give the best model performance. You can use environment variables to configure the behavior of your tuning job, and you can also use environment variables from training inside your tuning job.
If you want to use an environment variable from your tuning job or specify a new environment variable, input string key-value pairs in the Environment field of the SageMaker AI HyperParameterTrainingJobDefinition. Pass this training job definition to the CreateHyperParameterTuningJob API.
For example, the environment variable SM_LOG_LEVEL can be set to one of the following values to tailor the output from a Python container.
NOTSET=0
DEBUG=10
INFO=20
WARN=30
ERROR=40
CRITICAL=50
As an example, to set the log level to 10 to debug your container logs, set the environment variable inside the HyperParameterTrainingJobDefinition, as follows.
{ "HyperParameterTuningJobConfig": { ..., } "TrainingJobDefinition": { ..., "Environment" : [ { "SM_LOG_LEVEL": 10 } ], ..., }, ..., }