

# Automatic model tuning with SageMaker AI
<a name="automatic-model-tuning"></a>

Amazon SageMaker AI automatic model tuning (AMT), also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset. To do this, AMT uses the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that create a model that performs the best, as measured by a metric that you choose.

For example, suppose that you want to solve a *[binary classification](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#binary-classification-model)* problem on a marketing dataset. Your goal is to maximize the *[area under the curve (AUC)](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#AUC)* metric by training an [XGBoost algorithm with Amazon SageMaker AI](xgboost.md) model. You want to find which values of the `eta`, `alpha`, `min_child_weight`, and `max_depth` hyperparameters train the best model. Specify a range of values for each of these hyperparameters. Then, SageMaker AI hyperparameter tuning searches within those ranges to find the combination that produces the training job with the highest AUC. To conserve resources or meet a specific model quality expectation, you can set up completion criteria to stop tuning after the criteria have been met.

You can use SageMaker AI AMT with built-in algorithms, custom algorithms, or SageMaker AI pre-built containers for machine learning frameworks.

SageMaker AI AMT can use an Amazon EC2 Spot instance to optimize costs when running training jobs. For more information, see [Managed Spot Training in Amazon SageMaker AI](model-managed-spot-training.md).

Before you start using hyperparameter tuning, you should have a well-defined machine learning problem, including the following:
+ A dataset
+ An understanding of the type of algorithm that you need to train
+ A clear understanding of how you measure success

Prepare your dataset and algorithm so that they work in SageMaker AI and successfully run a training job at least once. For information about setting up and running a training job, see [Guide to getting set up with Amazon SageMaker AI](gs.md).
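As a sketch of how the earlier XGBoost/AUC example might be expressed against the `CreateHyperParameterTuningJob` API, the following shows an illustrative tuning job configuration. The hyperparameter ranges, job limits, and job name are placeholder values, not recommendations; a real request also needs a `TrainingJobDefinition` with your algorithm image, role, and data channels.

```python
# Illustrative HyperParameterTuningJobConfig for the marketing-dataset
# XGBoost example: maximize AUC while searching over four hyperparameters.
# All range bounds and limits below are placeholders.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:auc",  # emitted by the built-in XGBoost algorithm
    },
    "ResourceLimits": {"MaxNumberOfTrainingJobs": 20, "MaxParallelTrainingJobs": 3},
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.1", "MaxValue": "0.5", "ScalingType": "Auto"},
            {"Name": "alpha", "MinValue": "0", "MaxValue": "2", "ScalingType": "Auto"},
            {"Name": "min_child_weight", "MinValue": "1", "MaxValue": "10", "ScalingType": "Auto"},
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "1", "MaxValue": "10", "ScalingType": "Auto"}
        ],
    },
}

# With a training job definition in hand, the job would be started with
# boto3 (requires AWS credentials, so it is commented out here):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName="xgboost-auc-tuning",
#     HyperParameterTuningJobConfig=tuning_job_config,
#     TrainingJobDefinition=training_job_definition,
# )
```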

**Topics**
+ [Understand the hyperparameter tuning strategies available in Amazon SageMaker AI](automatic-model-tuning-how-it-works.md)
+ [Define metrics and environment variables](automatic-model-tuning-define-metrics-variables.md)
+ [Define Hyperparameter Ranges](automatic-model-tuning-define-ranges.md)
+ [Track and set completion criteria for your tuning job](automatic-model-tuning-progress.md)
+ [Tune Multiple Algorithms with Hyperparameter Optimization to Find the Best Model](multiple-algorithm-hpo.md)
+ [Example: Hyperparameter Tuning Job](automatic-model-tuning-ex.md)
+ [Stop Training Jobs Early](automatic-model-tuning-early-stopping.md)
+ [Run a Warm Start Hyperparameter Tuning Job](automatic-model-tuning-warm-start.md)
+ [Resource Limits for Automatic Model Tuning](automatic-model-tuning-limits.md)
+ [Best Practices for Hyperparameter Tuning](automatic-model-tuning-considerations.md)

# Understand the hyperparameter tuning strategies available in Amazon SageMaker AI
<a name="automatic-model-tuning-how-it-works"></a>

When you build complex machine learning systems like deep learning neural networks, exploring all of the possible combinations is impractical. Hyperparameter tuning can accelerate your productivity by trying many variations of a model. It looks for the best model automatically by focusing on the most promising combinations of hyperparameter values within the ranges that you specify. To get good results, you must choose the right ranges to explore. This page provides a brief explanation of the different hyperparameter tuning strategies that you can use with Amazon SageMaker AI.

Use the [API reference guide](https://docs.aws.amazon.com/sagemaker/latest/APIReference/Welcome.html?icmpid=docs_sagemaker_lp) to understand how to interact with hyperparameter tuning. You can use the tuning strategies described on this page with the [HyperParameterTuningJobConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html) and [HyperbandStrategyConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperbandStrategyConfig.html) APIs.

**Note**  
Because the algorithm itself is stochastic, the hyperparameter tuning model may fail to converge on the best answer. This can occur even if the best possible combination of values is within the ranges that you choose.

## Grid search
<a name="automatic-tuning-grid-search"></a>

When using grid search, hyperparameter tuning chooses combinations of values from the ranges of categorical values that you specify when you create the job. Only categorical parameters are supported with the grid search strategy. You don't need to specify `MaxNumberOfTrainingJobs`; the number of training jobs created by the tuning job is calculated automatically as the total number of distinct categorical combinations possible. If you do specify `MaxNumberOfTrainingJobs`, its value must equal that total.

## Random search
<a name="automatic-tuning-random-search"></a>

When using random search, hyperparameter tuning chooses a random combination of hyperparameter values in the ranges that you specify for each training job it launches. The choice of hyperparameter values doesn't depend on the results of previous training jobs. As a result, you can run the maximum number of concurrent training jobs without changing the performance of the tuning.

For an example notebook that uses random search, see the [Random search and hyperparameter scaling with SageMaker XGBoost and Automatic Model Tuning](https://github.com/aws/amazon-sagemaker-examples-community/blob/215215eb25b40eadaf126d055dbb718a245d7603/training/sagemaker-automatic-model-tuning/hpo_xgboost_random_log.ipynb) notebook.

## Bayesian optimization
<a name="automatic-tuning-bayesian-optimization"></a>

Bayesian optimization treats hyperparameter tuning like a *[regression](https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#regression)* problem. Given a set of input features (the hyperparameters), hyperparameter tuning optimizes a model for the metric that you choose. To solve a regression problem, hyperparameter tuning makes guesses about which hyperparameter combinations are likely to get the best results. It then runs training jobs to test these values. After testing a set of hyperparameter values, hyperparameter tuning uses regression to choose the next set of hyperparameter values to test.

Hyperparameter tuning uses an Amazon SageMaker AI implementation of Bayesian optimization.

When choosing the best hyperparameters for the next training job, hyperparameter tuning considers everything that it knows about this problem so far. Sometimes it chooses a combination of hyperparameter values close to the combination that resulted in the best previous training job to incrementally improve performance. This allows hyperparameter tuning to use the best known results. Other times, it chooses a set of hyperparameter values far removed from those it has tried. This allows it to explore the range of hyperparameter values to try to find new areas that are not yet well understood. The explore/exploit trade-off is common in many machine learning problems.

For more information about Bayesian optimization, see the following:

**Basic Topics on Bayesian Optimization**
+ [A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning](https://arxiv.org/abs/1012.2599)
+ [Practical Bayesian Optimization of Machine Learning Algorithms](https://arxiv.org/abs/1206.2944)
+ [Taking the Human Out of the Loop: A Review of Bayesian Optimization](https://ieeexplore.ieee.org/document/7352306?reload=true)

**Speeding up Bayesian Optimization**
+ [Google Vizier: A Service for Black-Box Optimization](https://dl.acm.org/doi/10.1145/3097983.3098043)
+ [Learning Curve Prediction with Bayesian Neural Networks](https://openreview.net/forum?id=S11KBYclx)
+ [Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves](https://dl.acm.org/doi/10.5555/2832581.2832731)

**Advanced Modeling and Transfer Learning**
+ [Scalable Hyperparameter Transfer Learning](https://papers.nips.cc/paper_files/paper/2018/hash/14c879f3f5d8ed93a09f6090d77c2cc3-Abstract.html)
+ [Bayesian Optimization with Tree-structured Dependencies](http://proceedings.mlr.press/v70/jenatton17a.html)
+ [Bayesian Optimization with Robust Bayesian Neural Networks](https://papers.nips.cc/paper_files/paper/2016/hash/291597a100aadd814d197af4f4bab3a7-Abstract.html)
+ [Scalable Bayesian Optimization Using Deep Neural Networks](http://proceedings.mlr.press/v37/snoek15.pdf)
+ [Input Warping for Bayesian Optimization of Non-stationary Functions](https://arxiv.org/abs/1402.0929)

## Hyperband
<a name="automatic-tuning-hyperband"></a>

Hyperband is a multi-fidelity based tuning strategy that dynamically reallocates resources. Hyperband uses both intermediate and final results of training jobs to re-allocate epochs to well-utilized hyperparameter configurations and automatically stops those that underperform. It also seamlessly scales to using many parallel training jobs. These features can significantly speed up hyperparameter tuning over random search and Bayesian optimization strategies.

Hyperband should only be used to tune iterative algorithms that publish results at different resource levels. For example, Hyperband can be used to tune a neural network for image classification that publishes accuracy metrics after every epoch.

For more information about Hyperband, see the following links:
+ [Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization](http://arxiv.org/pdf/1603.06560)
+ [Massively Parallel Hyperparameter Tuning](https://liamcli.com/assets/pdf/asha_arxiv.pdf)
+ [BOHB: Robust and Efficient Hyperparameter Optimization at Scale](http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf)
+ [Model-based Asynchronous Hyperparameter and Neural Architecture Search](https://openreview.net/pdf?id=a2rFihIU7i)

### Hyperband with early stopping
<a name="automatic-tuning-hyperband-early-stopping"></a>

Training jobs can be stopped early when they are unlikely to improve the objective metric of the hyperparameter tuning job. This can help reduce compute time and avoid overfitting your model. Hyperband uses an advanced internal mechanism to apply early stopping. The parameter `TrainingJobEarlyStoppingType` in the `HyperParameterTuningJobConfig` API must be set to `OFF` when using the Hyperband internal early stopping feature.
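The following is a minimal sketch of a tuning job configuration that selects the Hyperband strategy with its internal early stopping. The `MinResource` and `MaxResource` values are illustrative; they are expressed in the resource units your algorithm reports (for example, epochs).

```python
# Illustrative config fragment: Hyperband strategy with internal early stopping.
# MinResource/MaxResource values below are placeholders, not recommendations.
tuning_job_config = {
    "Strategy": "Hyperband",
    "StrategyConfig": {
        "HyperbandStrategyConfig": {
            "MinResource": 1,   # fewest epochs any configuration is given
            "MaxResource": 30,  # epochs allotted to the most promising configurations
        }
    },
    # Must be OFF so that Hyperband's internal early stopping is used
    # instead of the generic early stopping mechanism.
    "TrainingJobEarlyStoppingType": "OFF",
}
```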

**Note**  
Hyperparameter tuning might not improve your model. It is an advanced tool for building machine learning solutions. As such, it should be considered part of the scientific development process.

# Define metrics and environment variables
<a name="automatic-model-tuning-define-metrics-variables"></a>

A tuning job optimizes hyperparameters for training jobs that it launches by using a metric to evaluate performance. This guide shows how to define metrics so that you can use a custom algorithm for training, or use a built-in algorithm from Amazon SageMaker AI. This guide also shows how to specify environment variables during an Automatic model tuning (AMT) job.

## Define metrics
<a name="automatic-model-tuning-define-metrics"></a>

Amazon SageMaker AI hyperparameter tuning parses your machine learning algorithm's `stdout` and `stderr` streams to find metrics, such as loss or validation-accuracy. The metrics show how well the model is performing on the dataset. 

The following sections describe how to use two types of algorithms for training: built-in and custom.

### Use a built-in algorithm for training
<a name="automatic-model-tuning-define-metrics-builtin"></a>

If you use one of the [SageMaker AI built-in algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html), metrics are already defined for you. In addition, built-in algorithms automatically send metrics to hyperparameter tuning for optimization. These metrics are also written to Amazon CloudWatch logs. For more information, see [Log Amazon SageMaker AI Events with Amazon CloudWatch](https://docs.aws.amazon.com/sagemaker/latest/dg/logging-cloudwatch.html). 

For the objective metric for the tuning job, choose one of the metrics that the built-in algorithm emits. For a list of available metrics, see the model tuning section for the appropriate algorithm in [Use Amazon SageMaker AI Built-in Algorithms or Pre-trained Models](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).

You can choose up to 40 metrics to monitor in your [tuning job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterAlgorithmSpecification.html). Select one of those metrics to be the objective metric. The hyperparameter tuning job returns the [training job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html#sagemaker-DescribeHyperParameterTuningJob-response-BestTrainingJob) that performed the best against the objective metric.

**Note**  
Hyperparameter tuning automatically sends an additional hyperparameter `_tuning_objective_metric` to pass your objective metric to the tuning job for use during training.

### Use a custom algorithm for training
<a name="automatic-model-tuning-define-metrics-custom"></a>

This section shows how to define your own metrics to use your own custom algorithm for training. When doing so, make sure that your algorithm writes at least one metric to `stderr` or `stdout`. Hyperparameter tuning parses these streams to find algorithm metrics that show how well the model is performing on the dataset.

You can define custom metrics by specifying a name and regular expression for each metric that your tuning job monitors. Then, pass these metric definitions to the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API in the `MetricDefinitions` field of the `AlgorithmSpecification` within the `TrainingJobDefinition` parameter.

The following shows sample output from a log written to `stderr` or `stdout` by a training algorithm.

```
GAN_loss=0.138318;  Scaled_reg=2.654134; disc:[-0.017371,0.102429] real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587;  Loss = 16.020744;  Iteration 0 took 0.704s;  Elapsed=0s
```

The following metric definitions use regular expressions (regex) to search the sample log output and capture the numeric values of four different metrics.

```
[
    {
        "Name": "ganloss",
        "Regex": "GAN_loss=(.*?);",
    },
    {
        "Name": "disc-combined",
        "Regex": "disc-combined=(.*?);",
    },
    {
        "Name": "discloss",
        "Regex": "disc_train_loss=(.*?);",
    },
    {
        "Name": "loss",
        "Regex": "Loss = (.*?);",
    },
]
```

In regular expressions, parentheses `()` are used to group parts of the regular expression together.
+ For the `loss` metric that is defined in the code example, the group `(.*?)` captures every character between the exact text `"Loss = "` and the first semicolon (`;`) character.
+ The character `.` instructs the regular expression to match any character.
+ The character `*` means to match zero or more of the preceding character.
+ The character `?` makes the match non-greedy, so it captures only up to the first instance of the `;` character.

The `loss` metric defined in the code sample captures the value `16.020744` from the sample output.
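You can test these regular expressions locally against the sample log line before submitting a tuning job. The following is a quick local check with Python's `re` module, not part of the tuning API itself:

```python
import re

# The sample log line written to stdout/stderr by the training algorithm above.
log_line = (
    "GAN_loss=0.138318;  Scaled_reg=2.654134; disc:[-0.017371,0.102429] "
    "real 93.3% gen 0.0% disc-combined=0.000000; disc_train_loss=1.374587;  "
    "Loss = 16.020744;  Iteration 0 took 0.704s;  Elapsed=0s"
)

# The same name/regex pairs as in the metric definitions above.
metric_definitions = [
    {"Name": "ganloss", "Regex": "GAN_loss=(.*?);"},
    {"Name": "disc-combined", "Regex": "disc-combined=(.*?);"},
    {"Name": "discloss", "Regex": "disc_train_loss=(.*?);"},
    {"Name": "loss", "Regex": "Loss = (.*?);"},
]

# Apply each regex the way hyperparameter tuning would: take the first
# capture group of the first match found in the log stream.
captured = {
    m["Name"]: re.search(m["Regex"], log_line).group(1)
    for m in metric_definitions
}
print(captured)
# {'ganloss': '0.138318', 'disc-combined': '0.000000',
#  'discloss': '1.374587', 'loss': '16.020744'}
```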

Choose one of the metrics that you define as the objective metric for the tuning job. If you are using the SageMaker API, specify the value of the `name` key in the `HyperParameterTuningJobObjective` field of the `HyperParameterTuningJobConfig` parameter that you send to the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) operation.

## Specify environment variables
<a name="automatic-model-tuning-define-variables"></a>

SageMaker AI AMT optimizes hyperparameters within a tuning job to find the best parameters for model performance. You can use environment variables to configure your tuning job to change its behavior. You can also use environment variables that you used during training inside your tuning job.

If you want to use an environment variable from your tuning job or specify a new environment variable, input a string value for `Environment` within the SageMaker AI [HyperParameterTrainingJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html) API. Pass this training job definition to the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API.

For example, the environment variable `SM_LOG_LEVEL` can be set to the following values to tailor the output from a Python container.

```
NOTSET=0
DEBUG=10
INFO=20
WARN=30
ERROR=40
CRITICAL=50
```

As an example, to set the log level to `10` to debug your container logs, set the environment variable inside the [HyperParameterTrainingJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html), as follows. Note that `Environment` is a map of string keys to string values.

```
{
   "HyperParameterTuningJobConfig": {
      ...
   },
   "TrainingJobDefinition": {
      ...,
      "Environment": {
          "SM_LOG_LEVEL": "10"
      },
      ...
   },
   ...
}
```

# Define Hyperparameter Ranges
<a name="automatic-model-tuning-define-ranges"></a>

This guide shows how to use SageMaker APIs to define hyperparameter ranges. It also provides a list of hyperparameter scaling types that you can use.

Choosing hyperparameters and ranges significantly affects the performance of your tuning job. Hyperparameter tuning finds the best hyperparameter values for your model by searching over a [range](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-HyperParameterRanges) of values that you specify for each tunable hyperparameter. You can also specify up to 100 [static hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-StaticHyperParameters) that do not change over the course of the tuning job. You can use up to 100 hyperparameters in total (static plus tunable). For guidance on choosing hyperparameters and ranges, see [Best Practices for Hyperparameter Tuning](automatic-model-tuning-considerations.md). You can also use autotune to find optimal tuning job settings. For more information, see the following **Autotune** section.

**Note**  
SageMaker AI Automatic Model Tuning (AMT) may add additional hyperparameters that count toward the limit of 100 total hyperparameters. Currently, to pass your objective metric to the tuning job for use during training, SageMaker AI automatically adds `_tuning_objective_metric`.

## Static hyperparameters
<a name="automatic-model-tuning-define-ranges-static"></a>

Use static hyperparameters in the following cases:
+ You have background knowledge that guides you to select a constant value.
+ You don't want to explore a range of values for the hyperparameter.

For example, you can use AMT to tune your model using `param1` (a tunable parameter) and `param2` (a static parameter). To do so, use a search space for `param1` that lies between two values, and pass `param2` as a static hyperparameter, as follows.

```
param1: ["range_min","range_max"]
param2: "static_value"
```

Static hyperparameters have the following structure:

```
"StaticHyperParameters": {
    "objective" : "reg:squarederror",
    "dropout_rate": "0.3"
}
```

You can use the Amazon SageMaker API to specify key-value pairs in the [StaticHyperParameters](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-StaticHyperParameters) field of the `HyperParameterTrainingJobDefinition` parameter that you pass to the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) operation.

## Dynamic hyperparameters
<a name="automatic-model-tuning-define-ranges-dynamic"></a>

You can use the SageMaker API to define [hyperparameter ranges](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-HyperParameterRanges). Specify the names of hyperparameters and ranges of values in the `ParameterRanges` field of the `HyperParameterTuningJobConfig` parameter that you pass to the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) operation.

The `ParameterRanges` field has three subfields: categorical, integer, and continuous. You can define up to 30 total (categorical plus integer plus continuous) tunable hyperparameters to search over.

**Note**  
Each categorical hyperparameter can have at most 30 different values.

Dynamic hyperparameters have the following structure:

```
"ParameterRanges": {
    "CategoricalParameterRanges": [
        {
            "Name": "tree_method",
            "Values": ["auto", "exact", "approx", "hist"]
        }
    ],
    "ContinuousParameterRanges": [
        {
            "Name": "eta",
            "MaxValue" : "0.5",
            "MinValue": "0",
            "ScalingType": "Auto"
        }
    ],
    "IntegerParameterRanges": [
        {
            "Name": "max_depth",
            "MaxValue": "10",
            "MinValue": "1",
            "ScalingType": "Auto"
        }
    ]
}
```

If you create a tuning job with a `Grid` strategy, you can only specify categorical values. You don't need to provide the `MaxNumberOfTrainingJobs`. This value is inferred from the total number of configurations that can be produced from your categorical parameters. If specified, the value of `MaxNumberOfTrainingJobs` should be equal to the total number of distinct categorical combinations possible.
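The inferred job count for a `Grid` strategy can be computed locally as the product of the number of values in each categorical range. The hyperparameter names below are illustrative:

```python
from math import prod

# Hypothetical categorical ranges for a Grid-strategy tuning job.
categorical_ranges = [
    {"Name": "tree_method", "Values": ["auto", "exact", "approx", "hist"]},
    {"Name": "booster", "Values": ["gbtree", "dart"]},
]

# Grid search runs one training job per distinct combination, so the
# job count is the product of the sizes of the categorical ranges.
total_jobs = prod(len(r["Values"]) for r in categorical_ranges)
print(total_jobs)  # 4 * 2 = 8
```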

## Autotune
<a name="automatic-model-tuning-define-ranges-autotune"></a>

To save the time and resources that are otherwise spent searching for hyperparameter ranges, resource limits, or an objective metric, autotune can automatically guess optimal values for some tuning job fields. Use autotune to find optimal values for the following fields:
+ **[ParameterRanges](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html#sagemaker-Type-HyperParameterTuningJobConfig-ParameterRanges)** – The names and ranges of hyperparameters that a tuning job can optimize.
+ **[ResourceLimits](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceLimits.html) ** – The maximum resources to be used in a tuning job. These resources can include the maximum number of training jobs, maximum runtime of a tuning job, and the maximum number of training jobs that can be run at the same time.
+ **[TrainingJobEarlyStoppingType](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html#sagemaker-Type-HyperParameterTuningJobConfig-TrainingJobEarlyStoppingType)** – A flag that stops a training job if a job is not significantly improving against an objective metric. Defaults to enabled. For more information, see [Stop Training Jobs Early](automatic-model-tuning-early-stopping.md).
+ **[RetryStrategy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html#sagemaker-Type-HyperParameterTrainingJobDefinition-RetryStrategy)** – The number of times to retry a training job. Non-zero values for `RetryStrategy` can increase the likelihood that your job will complete successfully.
+ **[Strategy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html#sagemaker-Type-HyperParameterTuningJobConfig-Strategy)** – Specifies how hyperparameter tuning chooses the combinations of hyperparameter values to use for the training job that it launches.
+ **[ConvergenceDetected](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ConvergenceDetected.html)** – A flag to indicate that Automatic Model Tuning (AMT) has detected model convergence.

To use autotune, do the following:

1. Specify the hyperparameter and an example value in the `AutoParameters` field of the [ParameterRanges](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ParameterRanges.html) API.

1. Enable autotune.

AMT will determine if your hyperparameters and example values are eligible for autotune. Hyperparameters that can be used in autotune are automatically assigned to the appropriate parameter range type. Then, AMT uses `ValueHint` to select an optimal range for you. You can use the `DescribeHyperParameterTuningJob` API to view these ranges.

The following example shows you how to configure a tuning job that uses autotune. In the configuration example, the hyperparameter `max_depth` has a `ValueHint` with an example value of `4`.

```
config = {
    'Autotune': {'Mode': 'Enabled'},
    'HyperParameterTuningJobName': 'my-autotune-job',
    'HyperParameterTuningJobConfig': {
        'HyperParameterTuningJobObjective': {'Type': 'Minimize', 'MetricName': 'validation:rmse'},
        'ResourceLimits': {'MaxNumberOfTrainingJobs': 5, 'MaxParallelTrainingJobs': 1},
        'ParameterRanges': {
            'AutoParameters': [
                {'Name': 'max_depth', 'ValueHint': '4'}
            ]
        }
    },
    'TrainingJobDefinition': {
        ....
    }
}
```

Continuing the previous example, a tuning job is created after the previous configuration is included in a call to the `CreateHyperParameterTuningJob` API. Autotune then converts the `max_depth` hyperparameter in `AutoParameters` to an `IntegerParameterRanges` hyperparameter. The following response from the `DescribeHyperParameterTuningJob` API shows that the optimal `IntegerParameterRanges` for `max_depth` are between `2` and `8`.

```
{
    'HyperParameterTuningJobName': 'my-autotune-job',
    'HyperParameterTuningJobConfig': {
        'ParameterRanges': {
            'IntegerParameterRanges': [
                {'Name': 'max_depth', 'MinValue': '2', 'MaxValue': '8'},
            ],
        }
    },
    'TrainingJobDefinition': {
        ...
    },
    'Autotune': {'Mode': 'Enabled'}
    
}
```

## Hyperparameter scaling types
<a name="scaling-type"></a>

For integer and continuous hyperparameter ranges, you can choose the scale that you want hyperparameter tuning to use to search the range of values. To do so, specify a value for the `ScalingType` field of the hyperparameter range. You can choose from the following hyperparameter scaling types:

Auto  
SageMaker AI hyperparameter tuning chooses the best scale for the hyperparameter.

Linear  
Hyperparameter tuning searches the values in the hyperparameter range by using a linear scale. Typically, you choose this if the range of all values from the lowest to the highest is relatively small (within one order of magnitude). Uniformly searching values from the range provides a reasonable exploration of the entire range.

Logarithmic  
Hyperparameter tuning searches the values in the hyperparameter range by using a logarithmic scale.  
Logarithmic scaling works only for ranges that have values greater than 0.  
Choose logarithmic scaling when you're searching a range that spans several orders of magnitude.   
For example, if you're tuning a [linear learner](linear-learner.md) model, and you specify a range of values between .0001 and 1.0 for the `learning_rate` hyperparameter, consider the following: Searching uniformly on a logarithmic scale gives you a better sample of the entire range than searching on a linear scale would. This is because searching on a linear scale would, on average, devote 90 percent of your training budget to only the values between .1 and 1.0. As a result, that leaves only 10 percent of your training budget for the values between .0001 and .1.
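The budget argument above can be checked with a quick back-of-the-envelope calculation (illustrative only):

```python
from math import log10

# Uniform search over learning_rate in [0.0001, 1.0].
lo, hi = 0.0001, 1.0

# Linear scale: the fraction of the range that falls above 0.1.
linear_above_0_1 = (hi - 0.1) / (hi - lo)
print(f"{linear_above_0_1:.2%}")  # ~90% of linear-scale samples land in [0.1, 1.0]

# Logarithmic scale: [0.0001, 1.0] spans 4 decades, and [0.1, 1.0] is 1 of them.
log_above_0_1 = (log10(hi) - log10(0.1)) / (log10(hi) - log10(lo))
print(f"{log_above_0_1:.2%}")  # 25% of log-scale samples land in [0.1, 1.0]
```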

`ReverseLogarithmic`  
Hyperparameter tuning searches the values in the hyperparameter range by using a reverse logarithmic scale. Reverse logarithmic scaling is supported only for continuous hyperparameter ranges. It is not supported for integer hyperparameter ranges.  
Choose reverse logarithmic scaling when you are searching a range that is highly sensitive to small changes that are very close to 1.  
Reverse logarithmic scaling works only for ranges that are entirely within the range `0 <= x < 1.0`.

For an example notebook that uses hyperparameter scaling, see these [Amazon SageMaker AI hyperparameter examples on GitHub](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning).

# Track and set completion criteria for your tuning job
<a name="automatic-model-tuning-progress"></a>

You can use completion criteria to instruct Automatic model tuning (AMT) to stop your tuning job if certain conditions are met. With these conditions, you can set a minimum model performance or a maximum number of training jobs that don't improve when evaluated against the objective metric. You can also track the progress of your tuning job and decide to let it continue or to stop it manually. This guide shows you how to set completion criteria, check the progress of your tuning job, and stop it manually.

## Set completion criteria for your tuning job
<a name="automatic-model-tuning-progress-completion"></a>

During hyperparameter optimization, a tuning job launches several training jobs inside a loop. The tuning job does the following:
+ Checks your training jobs for completion and updates statistics accordingly.
+ Decides which combination of hyperparameters to evaluate next.

AMT will continuously check the training jobs that were launched from your tuning job to update statistics. These statistics include tuning job runtime and best training job. Then, AMT determines whether it should stop the job according to your completion criteria. You can also check these statistics and stop your job manually. For more information about stopping a job manually, see the [Stopping your tuning job manually](#automatic-model-tuning-progress-stop) section.

As an example, if your tuning job meets your objective, you can stop tuning early to conserve resources or ensure model quality. AMT checks your job performance against your completion criteria and stops the tuning job if any have been met. 

You can specify the following kinds of completion criteria:
+ `MaxNumberOfTrainingJobs` – The maximum number of training jobs to be run before tuning is stopped.
+ `MaxNumberOfTrainingJobsNotImproving` – The maximum number of training jobs allowed to run without improving performance against the objective metric of the current best training job. For example, if the best training job returned an objective metric with an accuracy of `90%` and `MaxNumberOfTrainingJobsNotImproving` is set to `10`, tuning stops after `10` training jobs fail to return an accuracy higher than `90%`.
+ `MaxRuntimeInSeconds` – The upper limit of wall clock time in seconds of how long a tuning job can run.
+ `TargetObjectiveMetricValue` – The value of the objective metric against which the tuning job is evaluated. Once this value is met, AMT stops the tuning job.
+ `CompleteOnConvergence` – A flag to stop tuning after an internal algorithm determines that the tuning job is unlikely to improve more than 1% over the objective metric from the best training job.
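
As a sketch of where these settings live in a [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) request, the following fragment uses the API field names; the numeric values are hypothetical:

```
# Fragment of HyperParameterTuningJobConfig; all values here are hypothetical.
tuning_job_config = {
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 50,
        "MaxParallelTrainingJobs": 5,
        "MaxRuntimeInSeconds": 3600,
    },
    "TuningJobCompletionCriteria": {
        # Stop as soon as any training job reaches this objective metric value.
        "TargetObjectiveMetricValue": 0.95,
        # Stop after 10 consecutive jobs fail to beat the best job so far.
        "BestObjectiveNotImproving": {
            "MaxNumberOfTrainingJobsNotImproving": 10
        },
        # Stop when AMT detects convergence.
        "ConvergenceDetected": {"CompleteOnConvergence": "Enabled"},
    },
}

print(sorted(tuning_job_config["TuningJobCompletionCriteria"]))
```

AMT stops the tuning job as soon as any one of the configured criteria is met.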

### Selecting completion criteria
<a name="automatic-model-tuning-progress-completion-how"></a>

You can choose one or more completion criteria to stop your hyperparameter tuning job after a condition has been met. The following instructions show you how to select completion criteria and how to decide which is most appropriate for your use case.
+ Use `MaxNumberOfTrainingJobs` in the [ResourceLimits](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceLimits.html) API to set an upper limit for the number of training jobs that can be run before your tuning job is stopped. Start with a large number and adjust it based on model performance against your tuning job objective. Most users input values of around `50` or more training jobs to find an optimal hyperparameter configuration. Users looking for higher levels of model performance will use `200` or more training jobs.
+ Use `MaxNumberOfTrainingJobsNotImproving` in the [BestObjectiveNotImproving](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_BestObjectiveNotImproving.html) API field to stop training if model performance fails to improve after a specified number of jobs. Model performance is evaluated against an objective function. After `MaxNumberOfTrainingJobsNotImproving` is met, AMT stops the tuning job. Tuning jobs tend to make the most progress at the beginning of the job; improving model performance against an objective function requires a larger number of training jobs toward the end of tuning. Select a value for `MaxNumberOfTrainingJobsNotImproving` by checking the performance of similar training jobs against your objective metric.
+ Use `MaxRuntimeInSeconds` in the [ResourceLimits](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceLimits.html) API to set an upper limit for the amount of wall clock time that the tuning job may take. Use this field to meet a deadline by which the tuning job must complete or to limit compute resources.

  To get an estimated total compute time in seconds for a tuning job, use the following formula:

  Estimated max compute time in seconds = `MaxRuntimeInSeconds` × `MaxParallelTrainingJobs` × `MaxInstancesPerTrainingJob`
**Note**  
The actual duration of a tuning job may deviate slightly from the value specified in this field.
+ Use `TargetObjectiveMetricValue` in the [TuningJobCompletionCriteria](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TuningJobCompletionCriteria.html) API to stop your tuning job. You stop the tuning job after any training job that is launched by the tuning job reaches this objective metric value. Use this field if your use case depends on reaching a specific performance level, rather than spending compute resources to find the best possible model.
+ Use `CompleteOnConvergence` in the [TuningJobCompletionCriteria](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TuningJobCompletionCriteria.html) API to stop a tuning job after AMT has detected that the tuning job has converged and is unlikely to make further significant progress. Use this field when it is not clear what values to use for any of the other completion criteria. AMT determines convergence based on an algorithm developed and tested on a wide range of diverse benchmarks. A tuning job is defined to have converged when none of the training jobs return significant improvement (1% or less). Improvement is measured against the objective metric returned by the highest-performing training job so far.
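
For example, with hypothetical limits of a 4-hour `MaxRuntimeInSeconds`, 4 parallel training jobs, and 2 instances per training job, the estimation formula for `MaxRuntimeInSeconds` above gives:

```
# Hypothetical limits for illustrating the estimation formula.
max_runtime_in_seconds = 4 * 60 * 60       # 14400 (4 hours)
max_parallel_training_jobs = 4
max_instances_per_training_job = 2

# Estimated max compute time = product of the three limits.
estimated_max_compute_seconds = (
    max_runtime_in_seconds
    * max_parallel_training_jobs
    * max_instances_per_training_job
)
print(estimated_max_compute_seconds)  # 115200
```

This is an upper bound on billable instance-seconds, not a prediction of wall clock time: the tuning job itself still finishes within `MaxRuntimeInSeconds`.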

### Combining different completion criteria
<a name="automatic-model-tuning-progress-completion-combine"></a>

You can also combine any of the different completion criteria in the same tuning job. AMT will stop the tuning job when any one of the completion criteria is met. For example, if you want to tune your model until it meets an objective metric, but don't want to keep tuning if your job has converged, use the following guidance.
+ Specify `TargetObjectiveMetricValue` in the [TuningJobCompletionCriteria](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TuningJobCompletionCriteria.html) API to set a target objective metrics value to reach.
+ Set [CompleteOnConvergence](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ConvergenceDetected.html) to `Enabled` to stop a tuning job if AMT has determined that model performance is unlikely to improve.

## Track tuning job progress
<a name="automatic-model-tuning-progress-track"></a>

You can use the `DescribeHyperParameterTuningJob` API to track the progress of your tuning job at any time while it is running. You don't have to specify completion criteria to obtain tracking information for your tuning job. Use the following fields to obtain statistics about your tuning job.
+ [BestTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html#sagemaker-DescribeHyperParameterTuningJob-response-BestTrainingJob) – An object that describes the best training job obtained so far, evaluated against your objective metric. Use this field to check your current model performance and the value of the objective metric of this best training job.
+ [ObjectiveStatusCounters](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html#sagemaker-DescribeHyperParameterTuningJob-response-ObjectiveStatusCounters) – An object that specifies the total number of training jobs completed in a tuning job. To estimate the average duration of a training job, use `ObjectiveStatusCounters` and the total runtime of the tuning job. You can use the average duration to estimate how much longer your tuning job will run.
+ `ConsumedResources` – The total resources, such as `RunTimeInSeconds`, consumed by your tuning job. Compare `ConsumedResources`, found in the [DescribeHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html) API, against `BestTrainingJob` in the same API. You can also compare `ConsumedResources` against the response from the [ListTrainingJobsForHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListTrainingJobsForHyperParameterTuningJob.html) API to assess if your tuning job is making satisfactory progress given the resources being consumed.
+ [TuningJobCompletionDetails](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobCompletionDetails.html) – Tuning job completion information that includes the following:
  + The timestamp of when convergence is detected if the job has converged.
  + The number of training jobs that have not improved model performance. Model performance is evaluated against the objective metric from the best training job.

  Use the tuning job completion details to assess how likely your tuning job is to improve model performance, measured against the best objective metric, if it continues running to completion.
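
Putting the fields above together, a small helper like the following estimates the average training-job duration. This is a sketch; the `response` dict is an illustrative stand-in for a real `DescribeHyperParameterTuningJob` response, not actual API output.

```
def estimate_avg_training_job_seconds(response):
    # Average duration so far = total tuning-job runtime divided by the
    # number of training jobs that have completed successfully.
    completed = response["ObjectiveStatusCounters"]["Succeeded"]
    runtime = response["ConsumedResources"]["RuntimeInSeconds"]
    return runtime / completed

# Illustrative response fragment with hypothetical numbers.
response = {
    "ObjectiveStatusCounters": {"Succeeded": 20, "Pending": 4, "Failed": 1},
    "ConsumedResources": {"RuntimeInSeconds": 36000},
}
print(estimate_avg_training_job_seconds(response))  # 1800.0
```

Multiplying the average by the number of training jobs still to run gives a rough estimate of the remaining tuning time.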

## Stopping your tuning job manually
<a name="automatic-model-tuning-progress-stop"></a>

You can determine whether you should let the tuning job run until it completes or stop it manually. To determine this, use the information returned by the `DescribeHyperParameterTuningJob` API, as shown in the previous [Track tuning job progress](#automatic-model-tuning-progress-track) section. As an example, if your model performance does not improve after several training jobs complete, you may choose to stop the tuning job. Model performance is evaluated against the best objective metric.

To stop the tuning job manually, use the [StopHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopHyperParameterTuningJob.html) API and provide the name of the tuning job to be stopped.
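
The following minimal sketch shows the call shape using boto3 (in practice, create the client with `boto3.client("sagemaker")`; here the client is injected, and the job name is hypothetical, so the example is easy to test without AWS credentials):

```
def stop_tuning_job(sagemaker_client, tuning_job_name):
    # StopHyperParameterTuningJob requires only the tuning job's name.
    sagemaker_client.stop_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name
    )

class _FakeClient:
    # Stand-in for boto3.client("sagemaker"); records calls for the example.
    def __init__(self):
        self.calls = []

    def stop_hyper_parameter_tuning_job(self, **kwargs):
        self.calls.append(kwargs)

client = _FakeClient()
stop_tuning_job(client, "my-tuning-job")
print(client.calls)  # [{'HyperParameterTuningJobName': 'my-tuning-job'}]
```

Training jobs that are already running when you stop the tuning job continue to their own stopping condition.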

# Tune Multiple Algorithms with Hyperparameter Optimization to Find the Best Model
<a name="multiple-algorithm-hpo"></a>

To create a new hyperparameter optimization (HPO) job with Amazon SageMaker AI that tunes multiple algorithms, you must provide job settings that apply to all of the algorithms to be tested and a training definition for each of these algorithms. You must also specify the resources you want to use for the tuning job.
+ The **job settings** to configure include warm starting, early stopping, and the tuning strategy. Warm starting and early stopping are available only when tuning a single algorithm.
+ The **training job definition** specifies the name, algorithm source, objective metric, and, when required, the ranges of values that configure the set of hyperparameter values for each training job. It configures the channels for data inputs, data output locations, and any checkpoint storage locations for each training job. The definition also configures the resources to deploy for each training job, including instance types and counts, managed spot training, and stopping conditions.
+ The **tuning job resources** to deploy, including the maximum number of training jobs that the hyperparameter tuning job can run concurrently and the maximum total number of training jobs that it can run.

## Get Started
<a name="multiple-algorithm-hpo-get-started"></a>

You can create a new hyperparameter tuning job, clone a job, or add or edit tags for a job from the console. You can also use the search feature to find jobs by their name, creation time, or status. Alternatively, you can create hyperparameter tuning jobs with the SageMaker AI API.
+ **In the console**: To create a new job, open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/), choose **Hyperparameter tuning jobs** from the **Training** menu, and then choose **Create hyperparameter tuning job**. Then follow the configuration steps to create a training job for each algorithm that you want to use. These steps are documented in the [Create a Hyperparameter Optimization Tuning Job for One or More Algorithms (Console)](multiple-algorithm-hpo-create-tuning-jobs.md) topic. 
**Note**  
When you start the configuration steps, note that the warm start and early stopping features are not available to use with multi-algorithm HPO. If you want to use these features, you can only tune a single algorithm at a time. 
+ **With the API**: For instructions on using the SageMaker AI API to create a hyperparameter tuning job, see [Example: Hyperparameter Tuning Job](automatic-model-tuning-ex.html). When you call [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) to tune multiple algorithms, you must provide a list of training definitions using [TrainingJobDefinitions](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html#sagemaker-CreateHyperParameterTuningJob-request-TrainingJobDefinitions) instead of specifying a single [TrainingJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html#sagemaker-CreateHyperParameterTuningJob-request-TrainingJobDefinition). Choose only one of these definition types depending on the number of algorithms that you are tuning. 

**Topics**
+ [Get Started](#multiple-algorithm-hpo-get-started)
+ [Create a Hyperparameter Optimization Tuning Job for One or More Algorithms (Console)](multiple-algorithm-hpo-create-tuning-jobs.md)
+ [Manage Hyperparameter Tuning and Training Jobs](multiple-algorithm-hpo-manage-tuning-jobs.md)

# Create a Hyperparameter Optimization Tuning Job for One or More Algorithms (Console)
<a name="multiple-algorithm-hpo-create-tuning-jobs"></a>

This guide shows you how to create a new hyperparameter optimization (HPO) tuning job for one or more algorithms. To create an HPO job, define the settings for the tuning job, and create training job definitions for each algorithm being tuned. Next, configure the resources for and create the tuning job. The following sections provide details about how to complete each step. We provide an example of how to tune multiple algorithms using the SageMaker AI SDK for Python client at the end of this guide.

## Components of a tuning job
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-settings"></a>

An HPO tuning job contains the following three components:
+ Tuning job settings
+ Training job definitions
+ Tuning job configuration

The way that these components are included in your HPO tuning job depends on whether your tuning job contains one or multiple training algorithms. The following guide describes each of the components and gives an example of both types of tuning jobs.

### Tuning job settings
<a name="multiple-algorithm-hpo-create-tuning-jobs-components-tuning-settings"></a>

Your tuning job settings are applied across all of the algorithms in the HPO tuning job. Warm start and early stopping are available only when you're tuning a single algorithm. After you define the job settings, you can create individual training definitions for each algorithm or variation that you want to tune. 

**Warm start**  
If you cloned this job, you can use the results from a previous tuning job to improve the performance of this new tuning job. This is the warm start feature, and it's only available when tuning a single algorithm. With the warm start option, you can choose up to five previous hyperparameter tuning jobs to use. Alternatively, you can use transfer learning to add additional data to the parent tuning job. When you select this option, you choose one previous tuning job as the parent. 

**Note**  
Warm start is compatible only with tuning jobs that were created after October 1, 2018. For more information, see [Run a warm start job](automatic-model-tuning-considerations.html).

**Early stopping**  
To reduce compute time and avoid overfitting your model, you can stop training jobs early. Early stopping is helpful when the training job is unlikely to improve the current best objective metric of the hyperparameter tuning job. Like warm start, this feature is only available when tuning a single algorithm. This is an automatic feature without configuration options, and it’s disabled by default. For more information about how early stopping works, the algorithms that support it, and how to use it with your own algorithms, see [Stop Training Jobs Early](automatic-model-tuning-early-stopping.html).

**Tuning strategy**  
Tuning strategy can be either random, Bayesian, or Hyperband. These selections specify how automatic tuning algorithms search specified hyperparameter ranges that are selected in a later step. Random search chooses random combinations of values from the specified ranges and can be run sequentially or in parallel. Bayesian optimization chooses values based on what is likely to get the best result according to the known history of previous selections. Hyperband uses a multi-fidelity strategy that dynamically allocates resources toward well-utilized jobs and automatically stops those that underperform. The new configuration that starts after stopping other configurations is chosen randomly.

Hyperband can be used only with iterative algorithms, that is, algorithms that run steps in iterations, such as [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) or [Random Cut Forest](https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html). Hyperband can't be used with non-iterative algorithms, such as decision trees or [k-Nearest Neighbors](https://docs.aws.amazon.com/sagemaker/latest/dg/k-nearest-neighbors.html). For more information about search strategies, see [How Hyperparameter Tuning Works](automatic-model-tuning-how-it-works.html).

**Note**  
Hyperband uses an advanced internal mechanism to apply early stopping. Therefore, when you use the Hyperband internal early stopping feature, the parameter `TrainingJobEarlyStoppingType` in the `HyperParameterTuningJobConfig` API must be set to `OFF`.
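
As a sketch, the relevant fields of the `HyperParameterTuningJobConfig` API shape would look like the following fragment (field names per the API reference; this is not a complete request):

```
# Fragment of HyperParameterTuningJobConfig for a Hyperband tuning job.
hyperband_config_fragment = {
    "Strategy": "Hyperband",
    # Hyperband applies its own internal early stopping, so the standard
    # early-stopping mechanism must be turned off.
    "TrainingJobEarlyStoppingType": "OFF",
}

print(hyperband_config_fragment["TrainingJobEarlyStoppingType"])  # OFF
```

For the random and Bayesian strategies, by contrast, `TrainingJobEarlyStoppingType` can be set to `Auto` to let AMT stop unpromising training jobs early.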

**Tags**  
To help you manage tuning jobs, you can enter tags as key-value pairs to assign metadata to them. Values in the key-value pair are not required; you can use a key without a value. To see the keys associated with a job, choose the **Tags** tab on the details page for the tuning job. For more information about using tags for tuning jobs, see [Manage Hyperparameter Tuning and Training Jobs](multiple-algorithm-hpo-manage-tuning-jobs.md).

### Training job definitions
<a name="multiple-algorithm-hpo-create-tuning-jobs-training-definitions"></a>

To create a training job definition, you must configure the algorithm and parameters, define the data input and output, and configure resources. Provide at least one [TrainingJobDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TrainingJobDefinition.html) for each HPO tuning job. Each training definition specifies the configuration for an algorithm.

To create several definitions for your training job, you can clone a job definition. Cloning a job can save time because it copies all of the job settings, including data channels and Amazon S3 storage locations for output artifacts. You can edit a cloned job to change what you need for your use case.

**Topics**
+ [Configure algorithm and parameters](#multiple-algorithm-hpo-algorithm-configuration)
+ [Define data input and output](#multiple-algorithm-hpo-data)
+ [Configure training job resources](#multiple-algorithm-hpo-training-job-definition-resources)
+ [Add or clone a training job](#multiple-algorithm-hpo-add-training-job)

#### Configure algorithm and parameters
<a name="multiple-algorithm-hpo-algorithm-configuration"></a>

 The following list describes what you need to configure the set of hyperparameter values for each training job. 
+ A name for your tuning job
+ Permission to access services
+ Parameters for any algorithm options
+ An objective metric
+ The range of hyperparameter values, when required

**Name**  
 Provide a unique name for the training definition. 

**Permissions**  
 Amazon SageMaker AI requires permissions to call other services on your behalf. Choose an AWS Identity and Access Management (IAM) role, or let AWS create a role with the `AmazonSageMakerFullAccess` IAM policy attached. 

**Optional security settings**  
 The network isolation setting prevents the container from making any outbound network calls. This is required for AWS Marketplace machine learning offerings. 

 You can also choose to use a virtual private cloud (VPC).

**Note**  
 Inter-container encryption is only available when you create a job definition from the API. 

**Algorithm options**  
You can choose built-in algorithms, your own algorithm, your own container with an algorithm, or you can subscribe to an algorithm from AWS Marketplace. 
+ If you choose a built-in algorithm, it has the Amazon Elastic Container Registry (Amazon ECR) image information pre-populated.
+ If you choose your own container, you must specify the Amazon ECR image information. You can select the input mode for the algorithm as either file or pipe.
+ If you plan to supply your data using a CSV file from Amazon S3, select the file input mode.

**Metrics**  
When you choose a built-in algorithm, metrics are provided for you. If you choose your own algorithm, you must define your metrics. You can define up to 20 metrics for your tuning job to monitor. You must choose one metric as the objective metric. For more information about how to define a metric for a tuning job, see [Define metrics](automatic-model-tuning-define-metrics-variables.md#automatic-model-tuning-define-metrics).

**Objective metric**  
To find the best training job, set an objective metric and whether to maximize or minimize it. After the training job is complete, you can view the tuning job detail page. The detail page provides a summary of the best training job that is found using this objective metric. 

**Hyperparameter configuration**  
When you choose a built-in algorithm, the default values for its hyperparameters are set for you, using ranges that are optimized for the algorithm that's being tuned. You can change these values as you see fit. For example, instead of a range, you can set a fixed value for a hyperparameter by setting the parameter’s type to **static**. Each algorithm has different required and optional parameters. For more information, see [Best Practices for Hyperparameter Tuning](automatic-model-tuning-considerations.html) and [Define Hyperparameter Ranges](automatic-model-tuning-define-ranges.html). 
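
As a sketch, at the API level a fixed value goes in `StaticHyperParameters` while tunable parameters go in `HyperParameterRanges` (XGBoost parameter names are shown; the bounds are hypothetical, and the API expects values as strings):

```
# Fragment of a TrainingJobDefinition; parameter bounds are hypothetical.
training_definition_fragment = {
    # "Static" values are passed unchanged to every training job.
    "StaticHyperParameters": {
        "objective": "binary:logistic",
        "num_round": "100",
    },
    # Tunable parameters are given as ranges for AMT to search.
    "HyperParameterRanges": {
        "ContinuousParameterRanges": [
            {"Name": "eta", "MinValue": "0.1", "MaxValue": "0.3"}
        ],
        "IntegerParameterRanges": [
            {"Name": "max_depth", "MinValue": "1", "MaxValue": "10"}
        ],
    },
}

print(sorted(training_definition_fragment["HyperParameterRanges"]))
```

Moving a parameter between the static set and the ranges is how you switch it between a fixed value and a tuned value.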

#### Define data input and output
<a name="multiple-algorithm-hpo-data"></a>

Each training job definition for a tuning job must configure the channels for data inputs, data output locations, and optionally, any checkpoint storage locations for each training job. 

**Input data configuration**  
Input data is defined by channels. Each channel has its own source location (Amazon S3 or Amazon Elastic File System), compression, and format options. You can define up to 20 channels of input sources. If the algorithm that you choose supports multiple input channels, you can specify those, too. For example, when you use the [XGBoost churn prediction notebook](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.html), you can add two channels: train and validation.

**Checkpoint configuration**  
Checkpoints are periodically generated during training. For the checkpoints to be saved, you must choose an Amazon S3 location. Checkpoints are used in metrics reporting, and are also used to resume managed spot training jobs. For more information, see [Checkpoints in Amazon SageMaker AI](model-checkpoints.md).

**Output data configuration**  
Define an Amazon S3 location for the artifacts of the training job to be stored. You have the option of adding encryption to the output using an AWS Key Management Service (AWS KMS) key. 

#### Configure training job resources
<a name="multiple-algorithm-hpo-training-job-definition-resources"></a>

Each training job definition for a tuning job must configure the resources to deploy, including instance types and counts, managed spot training, and stopping conditions.

**Resource configuration**  
Each training definition can have a different resource configuration. You choose the instance type and number of nodes. 

**Managed spot training**  
You can save compute costs for jobs if you have flexibility in start and end times by allowing SageMaker AI to use spare capacity to run jobs. For more information, see [Managed Spot Training in Amazon SageMaker AI](model-managed-spot-training.md).

**Stopping condition**  
The stopping condition specifies the maximum duration that's allowed for each training job. 

#### Add or clone a training job
<a name="multiple-algorithm-hpo-add-training-job"></a>

After you create a training job definition for a tuning job, you return to the **Training Job Definition(s)** panel. This panel is where you can create additional training job definitions to train additional algorithms. You can select **Add training job definition** and work through the steps to define a training job again. 

Alternatively, to replicate an existing training job definition and edit it for the new algorithm, choose **Clone** from the **Action** menu. The clone option can save time because it copies all of the job’s settings, including the data channels and Amazon S3 storage locations. For more information about cloning, see [Manage Hyperparameter Tuning and Training Jobs](multiple-algorithm-hpo-manage-tuning-jobs.md).

### Tuning job configuration
<a name="multiple-algorithm-hpo-resource-config"></a>

**Resource Limits**  
You can specify the maximum number of training jobs that a hyperparameter tuning job can run concurrently (10 at most). You can also specify the maximum total number of training jobs that the hyperparameter tuning job can run (500 at most). The number of parallel jobs should not exceed the number of nodes that you have requested across all of your training definitions. The total number of jobs can’t exceed the number of jobs that your definitions are expected to run.
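
As a quick sanity check on these limits, the following sketch uses the `ResourceLimits` field names with hypothetical numbers:

```
# Hypothetical resource limits, using the ResourceLimits API field names.
resource_limits = {
    "MaxNumberOfTrainingJobs": 100,
    "MaxParallelTrainingJobs": 8,
}

# Console limits described above: at most 10 parallel jobs, 500 total jobs.
assert 1 <= resource_limits["MaxParallelTrainingJobs"] <= 10
assert resource_limits["MaxNumberOfTrainingJobs"] <= 500
print("within limits")
```

Checking the limits before submitting the job avoids a validation error from the service.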

Review the job settings, the training job definitions, and the resource limits. Then select **Create hyperparameter tuning job**.

## HPO tuning job example
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example"></a>

To run a hyperparameter optimization (HPO) training job, first create a training job definition for each algorithm that's being tuned. Next, define the tuning job settings and configure the resources for the tuning job. Finally, run the tuning job.

If your HPO tuning job contains a single training algorithm, the SageMaker AI tuning function will call the `HyperparameterTuner` API directly and pass in your parameters. If your HPO tuning job contains multiple training algorithms, your tuning function will call the `create` function of the `HyperparameterTuner` API. The `create` function tells the API to expect a dictionary containing one or more estimators.

In the following section, code examples show how to tune a job containing either a single training algorithm or multiple algorithms using the SageMaker AI Python SDK.

### Create training job definitions
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example-train"></a>

When you create a tuning job that includes multiple training algorithms, your tuning job configuration will include the estimators and metrics and other parameters for your training jobs. Therefore, you need to create the training job definition first, and then configure your tuning job. 

The following code example shows how to retrieve two SageMaker AI containers for the built-in algorithms [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) and [Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html). If your tuning job contains only one training algorithm, omit one of the containers and one of the estimators.

```
import sagemaker
from sagemaker import image_uris

from sagemaker.estimator import Estimator

sess = sagemaker.Session()
region = sess.boto_region_name
role = sagemaker.get_execution_role()

bucket = sess.default_bucket()
prefix = "sagemaker/multi-algo-hpo"

# Define the training containers and initialize the estimators
xgb_container = image_uris.retrieve("xgboost", region, "latest")
ll_container = image_uris.retrieve("linear-learner", region, "latest")

xgb_estimator = Estimator(
    xgb_container,
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://{}/{}/xgb_output".format(bucket, prefix),
    sagemaker_session=sess,
)

ll_estimator = Estimator(
    ll_container,
    role,
    instance_count=1,
    instance_type="ml.c4.xlarge",
    output_path="s3://{}/{}/ll_output".format(bucket, prefix),
    sagemaker_session=sess,
)

# Set static hyperparameters
ll_estimator.set_hyperparameters(predictor_type="binary_classifier")
xgb_estimator.set_hyperparameters(
    eval_metric="auc",
    objective="binary:logistic",
    num_round=100,
    rate_drop=0.3,
    tweedie_variance_power=1.4,
)
```

Next, define your input data by specifying the training, validation, and testing datasets, as shown in the following code example. This example shows how to tune multiple training algorithms.

```
training_data = sagemaker.inputs.TrainingInput(
    s3_data="s3://{}/{}/train".format(bucket, prefix), content_type="csv"
)
validation_data = sagemaker.inputs.TrainingInput(
    s3_data="s3://{}/{}/validate".format(bucket, prefix), content_type="csv"
)
test_data = sagemaker.inputs.TrainingInput(
    s3_data="s3://{}/{}/test".format(bucket, prefix), content_type="csv"
)

train_inputs = {
    "estimator-1": {
        "train": training_data,
        "validation": validation_data,
        "test": test_data,
    },
    "estimator-2": {
        "train": training_data,
        "validation": validation_data,
        "test": test_data,
    },
}
```

If your tuning job contains only one training algorithm, your `train_inputs` should contain only one estimator.

You must upload the training, validation, and test datasets to your Amazon S3 bucket before you use them in an HPO tuning job.

### Define resources and settings for your tuning job
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example-resources"></a>

This section shows how to initialize a tuner, define resources, and specify job settings for your tuning job. If your tuning job contains multiple training algorithms, these settings are applied to all of the algorithms that are contained inside your tuning job. This section provides two code examples to define a tuner. The code examples show you how to optimize a single training algorithm followed by an example of how to tune multiple training algorithms.

#### Tune a single training algorithm
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example-resources-single"></a>

The following code example shows how to initialize a tuner and set hyperparameter ranges for one SageMaker AI built-in algorithm, XGBoost.

```
from sagemaker.tuner import HyperparameterTuner
from sagemaker.parameter import ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    "max_depth": IntegerParameter(1, 10),
    "eta": ContinuousParameter(0.1, 0.3),
}

objective_metric_name = "validation:accuracy"

tuner = HyperparameterTuner(
    xgb_estimator,
    objective_metric_name,
    hyperparameter_ranges,
    objective_type="Maximize",
    max_jobs=5,
    max_parallel_jobs=2,
)
```

#### Tune multiple training algorithms
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example-resources-multiple"></a>

Each training job requires different configurations, and these are specified using a dictionary. The following code example shows how to initialize a tuner with configurations for two SageMaker AI built-in algorithms, XGBoost and Linear Learner. The code example also shows how to set a tuning strategy and other job settings, such as the compute resources for the tuning job. The following code example uses `metric_definitions_dict`, which is optional.

```
from sagemaker.tuner import HyperparameterTuner
from sagemaker.parameter import ContinuousParameter, IntegerParameter

# Initialize your tuner
tuner = HyperparameterTuner.create(
    estimator_dict={
        "estimator-1": xgb_estimator,
        "estimator-2": ll_estimator,
    },
    objective_metric_name_dict={
        "estimator-1": "validation:auc",
        "estimator-2": "test:binary_classification_accuracy",
    },
    hyperparameter_ranges_dict={
        "estimator-1": {"eta": ContinuousParameter(0.1, 0.3)},
        "estimator-2": {"learning_rate": ContinuousParameter(0.1, 0.3)},
    },
    metric_definitions_dict={
        "estimator-1": [
            {"Name": "validation:auc", "Regex": "Overall test accuracy: (.*?);"}
        ],
        "estimator-2": [
            {
                "Name": "test:binary_classification_accuracy",
                "Regex": "Overall test accuracy: (.*?);",
            }
        ],
    },
    strategy="Bayesian",
    max_jobs=10,
    max_parallel_jobs=3,
)
```

### Run your HPO tuning job
<a name="multiple-algorithm-hpo-create-tuning-jobs-define-example-run"></a>

Now you can run your tuning job by passing your training inputs to the `fit` function of the `HyperparameterTuner` class. The following code example shows how to pass the `train_inputs` parameter, defined in a previous code example, to your tuner.

```
tuner.fit(inputs=train_inputs, include_cls_metadata={}, estimator_kwargs={})
```
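While `fit` runs (or after it returns), you can check progress by calling the `DescribeHyperParameterTuningJob` API. The following sketch parses the response shape documented in the API reference; the `tuning_job_progress` helper is illustrative, not part of the SDK:

```python
def tuning_job_progress(desc):
    """Summarize a DescribeHyperParameterTuningJob response
    (response shape per the SageMaker API reference)."""
    counters = desc["TrainingJobStatusCounters"]
    return {
        "status": desc["HyperParameterTuningJobStatus"],
        "completed": counters.get("Completed", 0),
        "in_progress": counters.get("InProgress", 0),
    }

# In a live session, obtain the response with:
# import boto3
# desc = boto3.Session().client("sagemaker").describe_hyper_parameter_tuning_job(
#     HyperParameterTuningJobName=tuner.latest_tuning_job.name
# )
# print(tuning_job_progress(desc))
```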

# Manage Hyperparameter Tuning and Training Jobs
<a name="multiple-algorithm-hpo-manage-tuning-jobs"></a>

A tuning job can contain many training jobs, and creating and managing these jobs and their definitions can become a complex and onerous task. SageMaker AI provides tools to help you manage these jobs. You can access the tuning jobs that you have run from the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/). To see the list, select **Hyperparameter tuning jobs** from the **Training** menu. This page is also where you start the procedure to create a new tuning job, by selecting **Create hyperparameter tuning job**. 

To see the training jobs run as part of a tuning job, select one of the hyperparameter tuning jobs from the list. The tabs on the tuning job page allow you to inspect the training jobs, their definitions, the tags and configuration used for the tuning job, and the best training job found during tuning. You can select the best training job or any of the other training jobs that belong to the tuning job to see all of their settings. From here you can create a model that uses the hyperparameter values found by a training job by selecting **Create Model**, or you can clone the training job by selecting **Clone**.

**Cloning**  
You can save time by cloning a training job that belongs to a hyperparameter tuning job. Cloning copies all of the job’s settings, including data channels and S3 storage locations for output artifacts. You can do this for training jobs you have already run from the tuning job page, as just described, or when you are creating additional training job definitions while creating a hyperparameter tuning job, as described in the [Add or clone a training job](multiple-algorithm-hpo-create-tuning-jobs.md#multiple-algorithm-hpo-add-training-job) step of that procedure. 

**Tagging**  
Automatic model tuning launches multiple training jobs within a single parent tuning job to discover the ideal set of model hyperparameters. You can add tags to the parent tuning job as described in the [Components of a tuning job](multiple-algorithm-hpo-create-tuning-jobs.md#multiple-algorithm-hpo-create-tuning-jobs-define-settings) section, and these tags are then propagated to the individual training jobs underneath it. You can use these tags for purposes such as cost allocation or access control. To add tags using the SageMaker SDK, use the [AddTags](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html) API. For more information about using tagging for AWS resources, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html).
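For example, tags can be attached with the `AddTags` API through Boto3. The following is a minimal sketch; the tag keys and values are placeholders, and the `tuning_job_arn` variable (the ARN of the parent tuning job) is assumed to be defined elsewhere:

```python
# Placeholder tag keys and values; replace with your own.
tags = [
    {"Key": "CostCenter", "Value": "ml-research"},
    {"Key": "Project", "Value": "term-deposit-campaign"},
]

# In a live session, attach the tags to the parent tuning job ARN:
# import boto3
# smclient = boto3.Session().client("sagemaker")
# smclient.add_tags(ResourceArn=tuning_job_arn, Tags=tags)
```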

# Example: Hyperparameter Tuning Job
<a name="automatic-model-tuning-ex"></a>

This example shows how to create a new notebook for configuring and launching a hyperparameter tuning job. The tuning job uses the [XGBoost algorithm with Amazon SageMaker AI](xgboost.md) to train a model to predict whether a customer will enroll for a term deposit at a bank after being contacted by phone.

You use the low-level SDK for Python (Boto3) to configure and launch the hyperparameter tuning job, and the AWS Management Console to monitor the status of hyperparameter tuning jobs. You can also use the Amazon SageMaker AI high-level [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) to configure, run, monitor, and analyze hyperparameter tuning jobs. For more information, see [https://github.com/aws/sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk).

## Prerequisites
<a name="automatic-model-tuning-ex-prereq"></a>

To run the code in this example, you need the following:
+ [An AWS account and an administrator user](gs-set-up.md)
+ An Amazon S3 bucket for storing your training dataset and the model artifacts created during training
+ [A running SageMaker AI notebook instance](gs-setup-working-env.md)

**Topics**
+ [Prerequisites](#automatic-model-tuning-ex-prereq)
+ [Create a Notebook Instance](automatic-model-tuning-ex-notebook.md)
+ [Get the Amazon SageMaker AI Boto 3 Client](automatic-model-tuning-ex-client.md)
+ [Get the SageMaker AI Execution Role](automatic-model-tuning-ex-role.md)
+ [Use an Amazon S3 bucket for input and output](automatic-model-tuning-ex-bucket.md)
+ [Download, Prepare, and Upload Training Data](automatic-model-tuning-ex-data.md)
+ [Configure and Launch a Hyperparameter Tuning Job](automatic-model-tuning-ex-tuning-job.md)
+ [Clean up](automatic-model-tuning-ex-cleanup.md)

# Create a Notebook Instance
<a name="automatic-model-tuning-ex-notebook"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

Create a Jupyter notebook that contains a pre-installed environment with the default Anaconda installation and Python 3. 

**To create a Jupyter notebook**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. Open a running notebook instance by choosing **Open** next to its name. The Jupyter notebook server page appears:

     
![\[Example Jupyter notebook server page.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/notebook-dashboard.png)

1. To create a notebook, choose **Files**, **New**, and **conda\_python3**.

1. Name the notebook.

## Next Step
<a name="automatic-model-tuning-ex-next-client"></a>

[Get the Amazon SageMaker AI Boto 3 Client](automatic-model-tuning-ex-client.md)

# Get the Amazon SageMaker AI Boto 3 Client
<a name="automatic-model-tuning-ex-client"></a>

Import the Amazon SageMaker Python SDK, the AWS SDK for Python (Boto3), and other Python libraries. In a new Jupyter notebook, paste the following code into the first cell:

```
import sagemaker
import boto3

import numpy as np                                # For performing matrix operations and numerical processing
import pandas as pd                               # For manipulating tabular data
from time import gmtime, strftime
import os

region = boto3.Session().region_name
smclient = boto3.Session().client('sagemaker')
```

The preceding code cell defines the `region` and `smclient` objects that you will use to call the built-in XGBoost algorithm and set up the SageMaker AI hyperparameter tuning job.

## Next Step
<a name="automatic-model-tuning-ex-next-role"></a>

[Get the SageMaker AI Execution Role](automatic-model-tuning-ex-role.md)

# Get the SageMaker AI Execution Role
<a name="automatic-model-tuning-ex-role"></a>

Get the execution role for the notebook instance. This is the IAM role that you created for your notebook instance.

To find the ARN of the IAM execution role attached to a notebook instance:

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left navigation pane, choose **Notebook**, then **Notebook instances**.

1. From the list of notebooks, select the notebook that you want to view.

1. The ARN is in the **Permissions and encryption** section.

Alternatively, [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) users can retrieve the ARN of the execution role attached to their user profile or a notebook instance by running the following code:

```
from sagemaker import get_execution_role

role = get_execution_role()
print(role)
```

For more information about using `get_execution_role` in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable), see [Session](https://sagemaker.readthedocs.io/en/stable/api/utility/session.html). For more information about roles, see [How to use SageMaker AI execution roles](sagemaker-roles.md).

## Next Step
<a name="automatic-model-tuning-ex-next-bucket"></a>

[Use an Amazon S3 bucket for input and output](automatic-model-tuning-ex-bucket.md)

# Use an Amazon S3 bucket for input and output
<a name="automatic-model-tuning-ex-bucket"></a>

Set up an S3 bucket to upload training datasets and save training output data for your hyperparameter tuning job.

**To use a default S3 bucket**

Use the following code to specify the default S3 bucket allocated for your SageMaker AI session. `prefix` is the path within the bucket where SageMaker AI stores the data for the current training job.

```
sess = sagemaker.Session()
bucket = sess.default_bucket() # Set a default S3 bucket
prefix = 'DEMO-automatic-model-tuning-xgboost-dm'
```

**To use a specific S3 bucket (Optional)**

If you want to use a specific S3 bucket, use the following code and replace the string with the exact name of your S3 bucket. The name of the bucket must contain **sagemaker** and be globally unique. The bucket must be in the same AWS Region as the notebook instance that you use for this example.

```
bucket = "sagemaker-your-preferred-s3-bucket"

sess = sagemaker.Session(
    default_bucket = bucket
)
```

**Note**  
The name of the bucket doesn't need to contain **sagemaker** if the IAM role that you use to run the hyperparameter tuning job has a policy that gives the `S3FullAccess` permission.

## Next Step
<a name="automatic-model-tuning-ex-next-data"></a>

[Download, Prepare, and Upload Training Data](automatic-model-tuning-ex-data.md)

# Download, Prepare, and Upload Training Data
<a name="automatic-model-tuning-ex-data"></a>

For this example, you use a training dataset of information about bank customers that includes the customer's job, marital status, and how they were contacted during the bank's direct marketing campaign. To use a dataset for a hyperparameter tuning job, you download it, transform the data, and then upload it to an Amazon S3 bucket.

For more information about the dataset and the data transformation that the example performs, see the *hpo\_xgboost\_direct\_marketing\_sagemaker\_APIs* notebook in the **Hyperparameter Tuning** section of the **SageMaker AI Examples** tab in your notebook instance.

## Download and Explore the Training Dataset
<a name="automatic-model-tuning-ex-data-download"></a>

To download and explore the dataset, run the following code in your notebook:

```
!wget -N https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
!unzip -o bank-additional.zip
data = pd.read_csv('./bank-additional/bank-additional-full.csv', sep=';')
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 5)         # Keep the output on one page
data
```

## Prepare and Upload Data
<a name="automatic-model-tuning-ex-data-transform"></a>

Before creating the hyperparameter tuning job, prepare the data and upload it to an S3 bucket where the hyperparameter tuning job can access it.

Run the following code in your notebook:

```
data['no_previous_contact'] = np.where(data['pdays'] == 999, 1, 0)                                 # Indicator variable to capture when pdays takes a value of 999
data['not_working'] = np.where(np.in1d(data['job'], ['student', 'retired', 'unemployed']), 1, 0)   # Indicator for individuals not actively employed
model_data = pd.get_dummies(data)                                                                  # Convert categorical variables to sets of indicators
model_data
model_data = model_data.drop(['duration', 'emp.var.rate', 'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed'], axis=1)

train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9*len(model_data))])

pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
pd.concat([validation_data['y_yes'], validation_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('validation.csv', index=False, header=False)
pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('test.csv', index=False, header=False)

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')
```

## Next Step
<a name="automatic-model-tuning-ex-next-tuning-job"></a>

[Configure and Launch a Hyperparameter Tuning Job](automatic-model-tuning-ex-tuning-job.md)

# Configure and Launch a Hyperparameter Tuning Job
<a name="automatic-model-tuning-ex-tuning-job"></a>


A hyperparameter is a high-level parameter that influences the learning process during model training. To get the best model predictions, you can optimize a hyperparameter configuration or set hyperparameter values. The process of finding an optimal configuration is called hyperparameter tuning. To configure and launch a hyperparameter tuning job, complete the steps in these guides.

**Topics**
+ [Settings for the hyperparameter tuning job](#automatic-model-tuning-ex-low-tuning-config)
+ [Configure the training jobs](#automatic-model-tuning-ex-low-training-def)
+ [Name and launch the hyperparameter tuning job](#automatic-model-tuning-ex-low-launch)
+ [Monitor the Progress of a Hyperparameter Tuning Job](automatic-model-tuning-monitor.md)
+ [View the Status of the Training Jobs](#automatic-model-tuning-monitor-training)
+ [View the Best Training Job](#automatic-model-tuning-best-training-job)

## Settings for the hyperparameter tuning job
<a name="automatic-model-tuning-ex-low-tuning-config"></a>

To specify settings for the hyperparameter tuning job, define a JSON object when you create the tuning job. Pass this JSON object as the value of the `HyperParameterTuningJobConfig` parameter to the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API.

In this JSON object, specify the following:
+ `HyperParameterTuningJobObjective` – The objective metric used to evaluate the performance of the training job launched by the hyperparameter tuning job.
+ `ParameterRanges` – The range of values that a tunable hyperparameter can use during optimization. For more information, see [Define Hyperparameter Ranges](automatic-model-tuning-define-ranges.md).
+ `RandomSeed` – A value used to initialize a pseudo-random number generator. Setting a random seed will allow the hyperparameter tuning search strategies to produce more consistent configurations for the same tuning job (optional).
+ `ResourceLimits` – The maximum number of training and parallel training jobs that the hyperparameter tuning job can use.

**Note**  
If you use your own algorithm for hyperparameter tuning, rather than a SageMaker AI [built-in algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html), you must define metrics for your algorithm. For more information, see [Define metrics](automatic-model-tuning-define-metrics-variables.md#automatic-model-tuning-define-metrics).

The following code example shows how to configure a hyperparameter tuning job using the built-in [XGBoost algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html). The code example shows how to define ranges for the `eta`, `alpha`, `min_child_weight`, and `max_depth` hyperparameters. For more information about these and other hyperparameters, see [XGBoost Parameters](https://xgboost.readthedocs.io/en/release_1.2.0/parameter.html).

In this code example, the objective metric for the hyperparameter tuning job finds the hyperparameter configuration that maximizes `validation:auc`. SageMaker AI built-in algorithms automatically write the objective metric to CloudWatch Logs. The following code example also shows how to set a `RandomSeed`. 

```
tuning_job_config = {
    "ParameterRanges": {
      "CategoricalParameterRanges": [],
      "ContinuousParameterRanges": [
        {
          "MaxValue": "1",
          "MinValue": "0",
          "Name": "eta"
        },
        {
          "MaxValue": "2",
          "MinValue": "0",
          "Name": "alpha"
        },
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "min_child_weight"
        }
      ],
      "IntegerParameterRanges": [
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "max_depth"
        }
      ]
    },
    "ResourceLimits": {
      "MaxNumberOfTrainingJobs": 20,
      "MaxParallelTrainingJobs": 3
    },
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
      "MetricName": "validation:auc",
      "Type": "Maximize"
    },
    "RandomSeed" : 123
  }
```

## Configure the training jobs
<a name="automatic-model-tuning-ex-low-training-def"></a>

The hyperparameter tuning job will launch training jobs to find an optimal configuration of hyperparameters. These training jobs should be configured using the SageMaker AI [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API. 

To configure the training jobs, define a JSON object and pass it as the value of the `TrainingJobDefinition` parameter inside `CreateHyperParameterTuningJob`.

In this JSON object, you can specify the following: 
+ `AlgorithmSpecification` – The [registry path](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html) of the Docker image containing the training algorithm and related metadata. To specify an algorithm, you can use your own [custom built algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html) inside a [Docker](https://docs.docker.com/get-started/overview/) container or a [SageMaker AI built-in algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html) (required).
+ `InputDataConfig` – The input configuration, including the `ChannelName`, `ContentType`, and data source for your training and test data (required).
+ `OutputDataConfig` – The storage location for the algorithm's output. Specify the S3 bucket where you want to store the output of the training jobs (required).
+ `RoleArn` – The [Amazon Resource Name](https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html) (ARN) of an AWS Identity and Access Management (IAM) role that SageMaker AI uses to perform tasks. Tasks include reading input data, downloading a Docker image, writing model artifacts to an S3 bucket, writing logs to Amazon CloudWatch Logs, and writing metrics to Amazon CloudWatch (required).
+ `StoppingCondition` – The maximum runtime in seconds that a training job can run before being stopped. This value should be greater than the time needed to train your model (required).
+ `MetricDefinitions` – The name and regular expression that defines any metrics that the training jobs emit. Define metrics only when you use a custom training algorithm. The example in the following code uses a built-in algorithm, which already has metrics defined. For information about defining metrics (optional), see [Define metrics](automatic-model-tuning-define-metrics-variables.md#automatic-model-tuning-define-metrics).
+ `TrainingImage` – The [Docker](https://docs.docker.com/get-started/overview/) container image that specifies the training algorithm (optional).
+ `StaticHyperParameters` – The name and values of hyperparameters that are not tuned in the tuning job (optional).

The following code example sets static values for the `eval_metric`, `num_round`, `objective`, `rate_drop`, and `tweedie_variance_power` parameters of the [XGBoost algorithm with Amazon SageMaker AI](xgboost.md) built-in algorithm.

------
#### [ SageMaker Python SDK v1 ]

```
from sagemaker.amazon.amazon_estimator import get_image_uri
training_image = get_image_uri(region, 'xgboost', repo_version='1.0-1')

s3_input_train = 's3://{}/{}/train'.format(bucket, prefix)
s3_input_validation ='s3://{}/{}/validation/'.format(bucket, prefix)

training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": training_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_train
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_validation
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": "s3://{}/{}/output".format(bucket,prefix)
    },
    "ResourceConfig": {
      "InstanceCount": 2,
      "InstanceType": "ml.c4.2xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "auc",
      "num_round": "100",
      "objective": "binary:logistic",
      "rate_drop": "0.3",
      "tweedie_variance_power": "1.4"
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 43200
    }
}
```

------
#### [ SageMaker Python SDK v2 ]

```
training_image = sagemaker.image_uris.retrieve('xgboost', region, '1.0-1')

s3_input_train = 's3://{}/{}/train'.format(bucket, prefix)
s3_input_validation ='s3://{}/{}/validation/'.format(bucket, prefix)

training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": training_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_train
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": s3_input_validation
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": "s3://{}/{}/output".format(bucket,prefix)
    },
    "ResourceConfig": {
      "InstanceCount": 2,
      "InstanceType": "ml.c4.2xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "auc",
      "num_round": "100",
      "objective": "binary:logistic",
      "rate_drop": "0.3",
      "tweedie_variance_power": "1.4"
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 43200
    }
}
```

------

## Name and launch the hyperparameter tuning job
<a name="automatic-model-tuning-ex-low-launch"></a>

After you configure the hyperparameter tuning job, you can launch it by calling the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API. The following code example uses the `tuning_job_config` and `training_job_definition` objects, defined in the previous two code examples, to create a hyperparameter tuning job.

```
tuning_job_name = "MyTuningJob"
smclient.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name,
    HyperParameterTuningJobConfig=tuning_job_config,
    TrainingJobDefinition=training_job_definition,
)
```
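The call returns before tuning completes, so you may want to poll the job until it reaches a terminal state. The following is a minimal polling sketch; the `is_terminal` helper and the 60-second interval are illustrative choices, and the live API calls (which use the `smclient` and `tuning_job_name` objects from the examples above) are shown as comments:

```python
import time

# Terminal statuses for a hyperparameter tuning job.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def is_terminal(status):
    """Return True when the tuning job has finished, successfully or not."""
    return status in TERMINAL_STATUSES

# In a live session, poll until the job reaches a terminal state:
# while True:
#     status = smclient.describe_hyper_parameter_tuning_job(
#         HyperParameterTuningJobName=tuning_job_name
#     )["HyperParameterTuningJobStatus"]
#     if is_terminal(status):
#         break
#     time.sleep(60)
```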

# Monitor the Progress of a Hyperparameter Tuning Job
<a name="automatic-model-tuning-monitor"></a>

To monitor the progress of a hyperparameter tuning job and the training jobs that it launches, use the Amazon SageMaker AI console.

**Topics**
+ [View the Status of the Hyperparameter Tuning Job](#automatic-model-tuning-monitor-tuning)

## View the Status of the Hyperparameter Tuning Job
<a name="automatic-model-tuning-monitor-tuning"></a>

**To view the status of the hyperparameter tuning job**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. Choose **Hyperparameter tuning jobs**.  
![\[Hyperparameter tuning job console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/console-tuning-jobs.png)

1. In the list of hyperparameter tuning jobs, check the status of the hyperparameter tuning job you launched. A tuning job can be:
   + `Completed`—The hyperparameter tuning job successfully completed.
   + `InProgress`—The hyperparameter tuning job is in progress. One or more training jobs are still running.
   + `Failed`—The hyperparameter tuning job failed.
   + `Stopped`—The hyperparameter tuning job was manually stopped before it completed. All training jobs that the hyperparameter tuning job launched are stopped.
   + `Stopping`—The hyperparameter tuning job is in the process of stopping.

## View the Status of the Training Jobs
<a name="automatic-model-tuning-monitor-training"></a>

**To view the status of the training jobs that the hyperparameter tuning job launched**

1. In the list of hyperparameter tuning jobs, choose the job that you launched.

1. Choose **Training jobs**.  
![\[Location of Training jobs in the hyperparameter tuning job console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/hyperparameter-training-jobs.png)

1. View the status of each training job. To see more details about a job, choose it in the list of training jobs. To view a summary of the status of all of the training jobs that the hyperparameter tuning job launched, see **Training job status counter**.

   A training job can be:
   + `Completed`—The training job successfully completed.
   + `InProgress`—The training job is in progress.
   + `Stopped`—The training job was manually stopped before it completed.
   + `Failed (Retryable)`—The training job failed, but can be retried. A failed training job can be retried only if it failed because an internal service error occurred.
   + `Failed (Non-retryable)`—The training job failed and can't be retried. A failed training job can't be retried when a client error occurs.
**Note**  
Hyperparameter tuning jobs can be stopped and the underlying resources [deleted](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex-cleanup.html), but the jobs themselves cannot be deleted.

## View the Best Training Job
<a name="automatic-model-tuning-best-training-job"></a>

A hyperparameter tuning job uses the objective metric that each training job returns to evaluate training jobs. While the hyperparameter tuning job is in progress, the best training job is the one that has returned the best objective metric so far. After the hyperparameter tuning job is complete, the best training job is the one that returned the best objective metric.

To view the best training job, choose **Best training job**.

![\[Location of Best training job in the hyperparameter tuning job console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/best-training-job.png)


To deploy the best training job as a model that you can host at a SageMaker AI endpoint, choose **Create model**.
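
Outside the console, the best training job is also available in the `BestTrainingJob` field of the `DescribeHyperParameterTuningJob` response. The excerpt below is a hypothetical sketch; the job name, metric, and hyperparameter values are placeholders.

```python
# Hypothetical excerpt of a DescribeHyperParameterTuningJob response.
describe_response = {
    "HyperParameterTuningJobStatus": "Completed",
    "BestTrainingJob": {
        "TrainingJobName": "mytuningjob-008-1a2b3c4d",
        "FinalHyperParameterTuningJobObjectiveMetric": {
            "MetricName": "validation:auc",
            "Value": 0.91,
        },
        "TunedHyperParameters": {"eta": "0.12", "max_depth": "6"},
    },
}

best = describe_response["BestTrainingJob"]
print(best["TrainingJobName"])
print(best["FinalHyperParameterTuningJobObjectiveMetric"]["Value"])  # 0.91
```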

### Next Step
<a name="automatic-model-tuning-ex-next-cleanup"></a>

[Clean up](automatic-model-tuning-ex-cleanup.md)

# Clean up
<a name="automatic-model-tuning-ex-cleanup"></a>

To avoid incurring unnecessary charges, when you are done with the example, use the AWS Management Console to delete the resources that you created for it. 

**Note**  
If you plan to explore other examples, you might want to keep some of these resources, such as your notebook instance, S3 bucket, and IAM role.

1. Open the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/) and delete the notebook instance. Stop the instance before deleting it.

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/) and delete the bucket that you created to store model artifacts and the training dataset. 

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/) and delete the IAM role. If you created permission policies, you can delete them, too.

1. Open the Amazon CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/) and delete all of the log groups that have names starting with `/aws/sagemaker/`.

# Stop Training Jobs Early
<a name="automatic-model-tuning-early-stopping"></a>

Stop the training jobs that a hyperparameter tuning job launches early when they are not improving significantly as measured by the objective metric. Stopping training jobs early can help reduce compute time and helps you avoid overfitting your model. To configure a hyperparameter tuning job to stop training jobs early, do one of the following:
+ If you are using the AWS SDK for Python (Boto3), set the `TrainingJobEarlyStoppingType` field of the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html) object that you use to configure the tuning job to `AUTO`.
+ If you are using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable), set the `early_stopping_type` parameter of the [HyperParameterTuner](https://sagemaker.readthedocs.io/en/stable/tuner.html) object to `Auto`.
+ In the Amazon SageMaker AI console, in the **Create hyperparameter tuning job** workflow, under **Early stopping**, choose **Auto**.
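
With Boto3, early stopping is a single field in the tuning job configuration. The following is a minimal sketch; the objective metric and resource limits shown are illustrative, not required values.

```python
# Minimal HyperParameterTuningJobConfig sketch with early stopping enabled.
tuning_job_config = {
    "Strategy": "Bayesian",
    "TrainingJobEarlyStoppingType": "Auto",  # the default is "Off"
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
}
print(tuning_job_config["TrainingJobEarlyStoppingType"])  # Auto
```

With the SageMaker Python SDK, the equivalent is passing `early_stopping_type='Auto'` when you construct the `HyperparameterTuner`.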

For a sample notebook that demonstrates how to use early stopping, see [https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter\$1tuning/image\$1classification\$1early\$1stopping/hpo\$1image\$1classification\$1early\$1stopping.ipynb](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/image_classification_early_stopping/hpo_image_classification_early_stopping.ipynb) or open the `hpo_image_classification_early_stopping.ipynb` notebook in the **Hyperparameter Tuning** section of the **SageMaker AI Examples** in a notebook instance.

## How Early Stopping Works
<a name="automatic-tuning-early-stop-how"></a>

When you enable early stopping for a hyperparameter tuning job, SageMaker AI evaluates each training job the hyperparameter tuning job launches as follows:
+ After each epoch of training, get the value of the objective metric.
+ Compute the running average of the objective metric for all previous training jobs up to the same epoch, and then compute the median of all of the running averages.
+ If the value of the objective metric for the current training job is worse (higher when minimizing or lower when maximizing the objective metric) than the median value of running averages of the objective metric for previous training jobs up to the same epoch, SageMaker AI stops the current training job.
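
The steps above amount to a median stopping rule. The following sketch illustrates the idea (it is not SageMaker AI's actual implementation); the metric histories are placeholder values.

```python
import statistics

def should_stop(current_value, previous_histories, epoch, maximize=True):
    """Median stopping rule sketch: stop if the current job's metric at
    `epoch` is worse than the median of the running averages of previous
    jobs' metrics up to the same epoch."""
    running_averages = [
        sum(history[: epoch + 1]) / (epoch + 1)
        for history in previous_histories
        if len(history) > epoch
    ]
    if not running_averages:
        return False  # nothing to compare against yet
    median = statistics.median(running_averages)
    return current_value < median if maximize else current_value > median

# Per-epoch validation accuracy of three earlier training jobs:
previous = [[0.60, 0.70, 0.80], [0.55, 0.65, 0.75], [0.50, 0.60, 0.70]]
# Running averages up to epoch 1 are 0.65, 0.60, 0.55; their median is 0.60.
print(should_stop(0.58, previous, epoch=1))  # True: 0.58 is worse than 0.60
print(should_stop(0.62, previous, epoch=1))  # False: 0.62 beats the median
```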

## Algorithms That Support Early Stopping
<a name="automatic-tuning-early-stopping-algos"></a>

To support early stopping, an algorithm must emit objective metrics for each epoch. The following built-in SageMaker AI algorithms support early stopping:
+ [LightGBM](lightgbm.md)
+ [CatBoost](catboost.md)
+ [AutoGluon-Tabular](autogluon-tabular.md)
+ [TabTransformer](tabtransformer.md)
+ [Linear Learner Algorithm](linear-learner.md)—Supported only if you use `objective_loss` as the objective metric.
+ [XGBoost algorithm with Amazon SageMaker AI](xgboost.md)
+ [Image Classification - MXNet](image-classification.md)
+ [Object Detection - MXNet](object-detection.md)
+ [Sequence-to-Sequence Algorithm](seq-2-seq.md)
+ [IP Insights](ip-insights.md)

**Note**  
This list of built-in algorithms that support early stopping is current as of December 13, 2018. Other built-in algorithms might support early stopping in the future. If an algorithm emits a metric that can be used as an objective metric for a hyperparameter tuning job (preferably a validation metric), then it supports early stopping.

To use early stopping with your own algorithm, you must write your algorithm so that it emits the value of the objective metric after each epoch. The following list shows how you can do that in different frameworks:

TensorFlow  
Use the `tf.keras.callbacks.ProgbarLogger` class. For information, see the [tf.keras.callbacks.ProgbarLogger API](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ProgbarLogger).

MXNet  
Use the `mxnet.callback.LogValidationMetricsCallback`. For information, see the [mxnet.callback APIs](https://mxnet.apache.org/versions/master/api/python/docs/api/legacy/callback/index.html).

Chainer  
Extend chainer by using the `extensions.Evaluator` class. For information, see the [chainer.training.extensions.Evaluator API](https://docs.chainer.org/en/v1.24.0/reference/extensions.html#evaluator).

PyTorch and Spark  
There is no high-level support. You must explicitly write your training code so that it computes objective metrics and writes them to logs after each epoch.
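
For frameworks without high-level support, a minimal sketch looks like the following: print one parseable metric line per epoch so that it can be matched by the metric regex in your algorithm's metric definitions. The metric name and values here are placeholders.

```python
def train(num_epochs=3):
    """Sketch of a training loop that emits the objective metric after
    each epoch; the accuracy values are placeholders."""
    history = []
    for epoch in range(num_epochs):
        # ... training and evaluation for this epoch would happen here ...
        validation_accuracy = 0.70 + 0.05 * epoch  # placeholder value
        # One parseable line per epoch, e.g. matched by a regex such as
        # "validation:accuracy=([0-9\\.]+)" in your metric definitions.
        print(f"epoch={epoch} validation:accuracy={validation_accuracy:.4f}")
        history.append(validation_accuracy)
    return history

history = train()
```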

# Run a Warm Start Hyperparameter Tuning Job
<a name="automatic-model-tuning-warm-start"></a>

Use warm start to start a hyperparameter tuning job using one or more previous tuning jobs as a starting point. The results of previous tuning jobs are used to inform which combinations of hyperparameters to search over in the new tuning job. Hyperparameter tuning uses either Bayesian or random search to choose combinations of hyperparameter values from ranges that you specify. For more information, see [Understand the hyperparameter tuning strategies available in Amazon SageMaker AI](automatic-model-tuning-how-it-works.md). Using information from previous hyperparameter tuning jobs can help increase the performance of the new hyperparameter tuning job by making the search for the best combination of hyperparameters more efficient.

**Note**  
Warm start tuning jobs typically take longer to start than standard hyperparameter tuning jobs, because the results from the parent jobs have to be loaded before the job can start. The increased time depends on the total number of training jobs launched by the parent jobs.

Reasons to consider warm start include the following:
+ To gradually increase the number of training jobs over several tuning jobs based on results after each iteration.
+ To tune a model using new data that you received.
+ To change hyperparameter ranges that you used in a previous tuning job, change static hyperparameters to tunable, or change tunable hyperparameters to static values.
+ To resume a previous hyperparameter tuning job that you stopped early or that stopped unexpectedly.

**Topics**
+ [Types of Warm Start Tuning Jobs](#tuning-warm-start-types)
+ [Warm Start Tuning Restrictions](#warm-start-tuning-restrictions)
+ [Warm Start Tuning Sample Notebook](#warm-start-tuning-sample-notebooks)
+ [Create a Warm Start Tuning Job](#warm-start-tuning-example)

## Types of Warm Start Tuning Jobs
<a name="tuning-warm-start-types"></a>

There are two different types of warm start tuning jobs:

`IDENTICAL_DATA_AND_ALGORITHM`  
The new hyperparameter tuning job uses the same input data and training image as the parent tuning jobs. You can change the hyperparameter ranges to search and the maximum number of training jobs that the hyperparameter tuning job launches. You can also change hyperparameters from tunable to static, and from static to tunable, but the total number of static plus tunable hyperparameters must remain the same as it is in all parent jobs. You cannot use a new version of the training algorithm, unless the changes in the new version do not affect the algorithm itself. For example, changes that improve logging or that add support for a different data format are allowed.  
Use identical data and algorithm when you use the same training data as you used in a previous hyperparameter tuning job, but you want to increase the total number of training jobs or change ranges or values of hyperparameters.  
When you run a warm start tuning job of type `IDENTICAL_DATA_AND_ALGORITHM`, there is an additional field in the response to [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeHyperParameterTuningJob.html) named `OverallBestTrainingJob`. The value of this field is the [TrainingJobSummary](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TrainingJobSummary.html) for the training job with the best objective metric value of all training jobs launched by this tuning job and all parent jobs specified for the warm start tuning job.

`TRANSFER_LEARNING`  
The new hyperparameter tuning job can include input data, hyperparameter ranges, maximum number of concurrent training jobs, and maximum number of training jobs that are different than those of its parent hyperparameter tuning jobs. You can also change hyperparameters from tunable to static, and from static to tunable, but the total number of static plus tunable hyperparameters must remain the same as it is in all parent jobs. The training algorithm image can also be a different version from the version used in the parent hyperparameter tuning job. When you use transfer learning, changes in the dataset or the algorithm that significantly affect the value of the objective metric might reduce the usefulness of using warm start tuning.

## Warm Start Tuning Restrictions
<a name="warm-start-tuning-restrictions"></a>

The following restrictions apply to all warm start tuning jobs:
+ A tuning job can have a maximum of 5 parent jobs, and all parent jobs must be in a terminal state (`Completed`, `Stopped`, or `Failed`) before you start the new tuning job.
+ The objective metric used in the new tuning job must be the same as the objective metric used in the parent jobs.
+ The total number of static plus tunable hyperparameters must remain the same between parent jobs and the new tuning job. Because of this, if you think you might want to use a hyperparameter as tunable in a future warm start tuning job, you should add it as a static hyperparameter when you create a tuning job.
+ The type of each hyperparameter (continuous, integer, categorical) must not change between parent jobs and the new tuning job.
+ The number of total changes from tunable hyperparameters in the parent jobs to static hyperparameters in the new tuning job, plus the number of changes in the values of static hyperparameters, cannot be more than 10. For example, if the parent job has a tunable categorical hyperparameter with the possible values `red` and `blue`, and you change that hyperparameter to static in the new tuning job, that counts as 2 changes against the allowed total of 10. If the same hyperparameter had a static value of `red` in the parent job, and you change the static value to `blue` in the new tuning job, it also counts as 2 changes.
+ Warm start tuning is not recursive. For example, if you create `MyTuningJob3` as a warm start tuning job with `MyTuningJob2` as a parent job, and `MyTuningJob2` is itself a warm start tuning job with a parent job `MyTuningJob1`, the information that was learned when running `MyTuningJob1` is not used for `MyTuningJob3`. If you want to use the information from `MyTuningJob1`, you must explicitly add it as a parent for `MyTuningJob3`.
+ The training jobs launched by every parent job in a warm start tuning job count against the 500 maximum training jobs for a tuning job.
+ Hyperparameter tuning jobs created before October 1, 2018 cannot be used as parent jobs for warm start tuning jobs.

## Warm Start Tuning Sample Notebook
<a name="warm-start-tuning-sample-notebooks"></a>

For a sample notebook that shows how to use warm start tuning, see [https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter\_tuning/image\_classification\_warmstart/hpo\_image\_classification\_warmstart.ipynb](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/image_classification_warmstart/hpo_image_classification_warmstart.ipynb).

## Create a Warm Start Tuning Job
<a name="warm-start-tuning-example"></a>

You can use either the low-level AWS SDK for Python (Boto 3) or the high-level SageMaker AI Python SDK to create a warm start tuning job.

**Topics**
+ [Create a Warm Start Tuning Job (Low-level SageMaker AI API for Python (Boto 3))](#warm-start-tuning-example-boto)
+ [Create a Warm Start Tuning Job (SageMaker AI Python SDK)](#warm-start-tuning-example-sdk)

### Create a Warm Start Tuning Job (Low-level SageMaker AI API for Python (Boto 3))
<a name="warm-start-tuning-example-boto"></a>

To use warm start tuning, you specify the values of a [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobWarmStartConfig.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobWarmStartConfig.html) object, and pass that as the `WarmStartConfig` field in a call to [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html).

The following code shows how to create a [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobWarmStartConfig.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobWarmStartConfig.html) object and pass it to [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) by using the low-level SageMaker AI API for Python (Boto 3).

Create the `HyperParameterTuningJobWarmStartConfig` object:

```
warm_start_config = {
    "ParentHyperParameterTuningJobs": [
        {"HyperParameterTuningJobName": "MyParentTuningJob"}
    ],
    "WarmStartType": "IdenticalDataAndAlgorithm"
}
```

Create the warm start tuning job:

```
import boto3

smclient = boto3.Session().client('sagemaker')
smclient.create_hyper_parameter_tuning_job(
    HyperParameterTuningJobName='MyWarmStartTuningJob',
    HyperParameterTuningJobConfig=tuning_job_config,  # See notebook for tuning configuration
    TrainingJobDefinition=training_job_definition,  # See notebook for job definition
    WarmStartConfig=warm_start_config)
```

### Create a Warm Start Tuning Job (SageMaker AI Python SDK)
<a name="warm-start-tuning-example-sdk"></a>

To use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) to run a warm start tuning job, you:
+ Specify the parent jobs and the warm start type by using a `WarmStartConfig` object.
+ Pass the `WarmStartConfig` object as the value of the `warm_start_config` argument of a [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/tuner.html) object.
+ Call the `fit` method of the `HyperparameterTuner` object.

For more information about using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable) for hyperparameter tuning, see [https://github.com/aws/sagemaker-python-sdk#sagemaker-automatic-model-tuning](https://github.com/aws/sagemaker-python-sdk#sagemaker-automatic-model-tuning).

This example uses an estimator that uses the [Image Classification - MXNet](image-classification.md) algorithm for training. The following code sets the hyperparameter ranges that the warm start tuning job searches within to find the best combination of values. For information about setting hyperparameter ranges, see [Define Hyperparameter Ranges](automatic-model-tuning-define-ranges.md).

```
hyperparameter_ranges = {'learning_rate': ContinuousParameter(0.0, 0.1),
                         'momentum': ContinuousParameter(0.0, 0.99)}
```

The following code configures the warm start tuning job by creating a `WarmStartConfig` object.

```
from sagemaker.tuner import WarmStartConfig, WarmStartTypes

parent_tuning_job_name = "MyParentTuningJob"
warm_start_config = WarmStartConfig(warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM, parents={parent_tuning_job_name})
```

Now set the values for static hyperparameters, which are hyperparameters that keep the same value for every training job that the warm start tuning job launches. In the following code, `imageclassification` is an estimator that was created previously.

```
imageclassification.set_hyperparameters(num_layers=18,
                                        image_shape='3,224,224',
                                        num_classes=257,
                                        num_training_samples=15420,
                                        mini_batch_size=128,
                                        epochs=30,
                                        optimizer='sgd',
                                        top_k='2',
                                        precision_dtype='float32',
                                        augmentation_type='crop')
```

Now create the `HyperparameterTuner` object and pass the `WarmStartConfig` object that you previously created as the `warm_start_config` argument.

```
tuner_warm_start = HyperparameterTuner(imageclassification,
                            'validation:accuracy',
                            hyperparameter_ranges,
                            objective_type='Maximize',
                            max_jobs=10,
                            max_parallel_jobs=2,
                            base_tuning_job_name='warmstart',
                            warm_start_config=warm_start_config)
```

Finally, call the `fit` method of the `HyperparameterTuner` object to launch the warm start tuning job.

```
tuner_warm_start.fit(
        {'train': s3_input_train, 'validation': s3_input_validation},
        include_cls_metadata=False)
```

# Resource Limits for Automatic Model Tuning
<a name="automatic-model-tuning-limits"></a>

SageMaker AI sets the following default limits for resources used by automatic model tuning:


| Resource | Regions | Default limits | Can be increased to | 
| --- | --- | --- | --- | 
|  Number of parallel (concurrent) hyperparameter tuning jobs  |  All  |  100  |  N/A  | 
|  Number of hyperparameters that can be searched \*  |  All  |  30  |  N/A  | 
|  Number of metrics defined per hyperparameter tuning job  |  All  |  20  |  N/A  | 
|  Number of parallel training jobs per hyperparameter tuning job  |  All  |  10  |  100  | 
|  [Bayesian optimization] Number of training jobs per hyperparameter tuning job  |  All  |  750  |  N/A  | 
|  [Random search] Number of training jobs per hyperparameter tuning job  |  All  |  750  |  10000  | 
|  [Hyperband] Number of training jobs per hyperparameter tuning job  |  All  |  750  |  N/A  | 
|  [Grid] Number of training jobs per hyperparameter tuning job, either specified explicitly or inferred from the search space  |  All  |  750  |  N/A  | 
|  Maximum run time for a hyperparameter tuning job  |  All  |  30 days  |  N/A  | 

\* Each categorical hyperparameter can have at most 30 different values.

## Resource limit example
<a name="automatic-model-tuning-limits-example"></a>

When you plan hyperparameter tuning jobs, you also have to take into account the limits on training resources. For information about the default resource limits for SageMaker AI training jobs, see [SageMaker AI Limits](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_sagemaker). Every training instance used concurrently across all of your hyperparameter tuning jobs counts against the total number of training instances allowed. For example, suppose that you run 10 concurrent hyperparameter tuning jobs, that each of those tuning jobs runs a total of 100 training jobs with 20 concurrent training jobs, and that each training job runs on one **ml.m4.xlarge** instance. The following limits apply: 
+ Number of concurrent hyperparameter tuning jobs: You don't need to increase the limit, because 10 tuning jobs is below the limit of 100.
+ Number of training jobs per hyperparameter tuning job: You don't need to increase the limit, because 100 training jobs is below the limit of 750.
+ Number of concurrent training jobs per hyperparameter tuning job: You need to request a limit increase to 20, because the default limit is 10.
+ SageMaker AI training **ml.m4.xlarge** instances: You need to request a limit increase to 200, because you have 10 hyperparameter tuning jobs, each of which is running 20 concurrent training jobs. The default limit is 20 instances.
+ SageMaker AI training total instance count: You need to request a limit increase to 200, because you have 10 hyperparameter tuning jobs, each of which is running 20 concurrent training jobs. The default limit is 20 instances.
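
The instance-count arithmetic behind the last two bullets can be sketched as:

```python
concurrent_tuning_jobs = 10
concurrent_training_jobs_per_tuning_job = 20

# Each training job in this example runs on one ml.m4.xlarge instance, so
# the peak number of concurrent instances is the product of the two values.
peak_instances = concurrent_tuning_jobs * concurrent_training_jobs_per_tuning_job
print(peak_instances)  # 200, so request an instance limit increase to 200
```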

**To request a quota increase:**

1. Open the [AWS Support Center](https://console.aws.amazon.com/support/home#/) page, sign in if necessary, and then choose **Create case**. 

1. On the **Create case** page, choose **Service limit increase**.

1. On the **Case details** panel, select **SageMaker AI Automatic Model Tuning [Hyperparameter Optimization]** for the **Limit type**.

1. On the **Requests** panel for **Request 1**, select the **Region**, the resource **Limit** to increase, and the **New Limit value** that you are requesting. Select **Add another request** if you have additional requests for quota increases.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/hpo/hpo-quotas-service-linit-increase-request.PNG)

1. In the **Case description** panel, provide a description of your use case.

1. In the **Contact options** panel, select your preferred **Contact methods** (**Web**, **Chat**, or **Phone**), and then choose **Submit**. 

# Best Practices for Hyperparameter Tuning
<a name="automatic-model-tuning-considerations"></a>

Hyperparameter optimization (HPO) is not a fully automated process. To improve optimization, follow these best practices for hyperparameter tuning.

**Topics**
+ [Choosing a tuning strategy](#automatic-model-tuning-strategy)
+ [Choosing the number of hyperparameters](#automatic-model-tuning-num-hyperparameters)
+ [Choosing hyperparameter ranges](#automatic-model-tuning-choosing-ranges)
+ [Using the correct scales for hyperparameters](#automatic-model-tuning-log-scales)
+ [Choosing the best number of parallel training jobs](#automatic-model-tuning-parallelism)
+ [Running training jobs on multiple instances](#automatic-model-tuning-distributed-metrics)
+ [Using a random seed to reproduce hyperparameter configurations](#automatic-model-tuning-random-seed)

## Choosing a tuning strategy
<a name="automatic-model-tuning-strategy"></a>

For large jobs, using the [Hyperband](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html#automatic-tuning-hyperband) tuning strategy can reduce computation time. Hyperband has an early stopping mechanism that stops under-performing jobs. Hyperband can also reallocate resources towards well-utilized hyperparameter configurations and run parallel jobs. For smaller training jobs with shorter runtimes, use either [random search](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html#automatic-tuning-random-search) or [Bayesian optimization](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html#automatic-tuning-bayesian-optimization). 

Use Bayesian optimization to make increasingly informed decisions about improving hyperparameter configurations in the next run. Bayesian optimization uses information gathered from prior runs to improve subsequent runs. Because of its sequential nature, Bayesian optimization cannot massively scale. 

Use random search to run a large number of parallel jobs. In random search, subsequent jobs do not depend on the results from prior jobs and can be run independently. Compared to other strategies, random search is able to run the largest number of parallel jobs. 

Use [grid search](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html#automatic-tuning-grid-search) to reproduce results of a tuning job, or if simplicity and transparency of the optimization algorithm are important. You can also use grid search to explore the entire hyperparameter search space evenly. Grid search methodically searches through every hyperparameter combination to find optimal hyperparameter values. Unlike grid search, Bayesian optimization, random search, and Hyperband all draw hyperparameters randomly from the search space. Because grid search analyzes every combination of hyperparameters, optimal hyperparameter values will be identical between tuning jobs that use the same hyperparameters. 
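
Because grid search enumerates the search space, the number of training jobs follows directly from the categorical ranges you define. The sketch below uses illustrative hyperparameter names and values:

```python
# Illustrative categorical search space; with grid search, the number of
# training jobs is the number of combinations it contains.
parameter_ranges = {
    "CategoricalParameterRanges": [
        {"Name": "max_depth", "Values": ["4", "6", "8"]},
        {"Name": "booster", "Values": ["gbtree", "dart"]},
    ]
}

num_combinations = 1
for hp_range in parameter_ranges["CategoricalParameterRanges"]:
    num_combinations *= len(hp_range["Values"])
print(num_combinations)  # 3 x 2 = 6 training jobs
```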

## Choosing the number of hyperparameters
<a name="automatic-model-tuning-num-hyperparameters"></a>

During optimization, the computational complexity of a hyperparameter tuning job depends on the following:
+ The number of hyperparameters
+ The range of values that Amazon SageMaker AI has to search

Although you can simultaneously specify up to 30 hyperparameters, limiting your search to a smaller number can reduce computation time. Reducing computation time allows SageMaker AI to converge more quickly to an optimal hyperparameter configuration.

## Choosing hyperparameter ranges
<a name="automatic-model-tuning-choosing-ranges"></a>

The range of values that you choose to search can adversely affect hyperparameter optimization. For example, a range that covers every possible hyperparameter value can lead to large compute times and a model that doesn't generalize well to unseen data. If you know that using a subset of the largest possible range is appropriate for your use case, consider limiting the range to that subset.

## Using the correct scales for hyperparameters
<a name="automatic-model-tuning-log-scales"></a>

During hyperparameter tuning, SageMaker AI attempts to infer whether your hyperparameters are log-scaled or linear-scaled. Initially, SageMaker AI assumes linear scaling for hyperparameters. If hyperparameters are log-scaled, choosing the correct scale will make your search more efficient. You can also select `Auto` for `ScalingType` in the [CreateHyperParameterTuningJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html) API if you want SageMaker AI to detect the scale for you.
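
In the API, the scale is set per hyperparameter range. The following is a minimal sketch with an illustrative learning-rate range; log scaling suits hyperparameters that span orders of magnitude.

```python
# Boto3-style continuous parameter range using a logarithmic scale. Use
# "Auto" instead if you want SageMaker AI to choose the scale for you.
parameter_ranges = {
    "ContinuousParameterRanges": [
        {
            "Name": "learning_rate",
            "MinValue": "1e-5",
            "MaxValue": "0.1",
            "ScalingType": "Logarithmic",
        }
    ]
}
print(parameter_ranges["ContinuousParameterRanges"][0]["ScalingType"])
```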

## Choosing the best number of parallel training jobs
<a name="automatic-model-tuning-parallelism"></a>

You can use the results of previous trials to improve the performance of subsequent trials. Choose the largest number of parallel jobs that still provides a meaningful incremental result and stays within your Region and account compute constraints. Use the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceLimits.html#MaxParallelTrainingJobs](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceLimits.html#MaxParallelTrainingJobs) field to limit the number of training jobs that a hyperparameter tuning job can launch in parallel. For more information, see [Running multiple HPO jobs in parallel on Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/running-multiple-hpo-jobs-in-parallel-on-amazon-sagemaker).
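
The tradeoff can be made concrete with illustrative numbers: with Bayesian optimization, each sequential round can learn from the rounds before it, so fewer parallel jobs trades wall-clock time for better-informed proposals.

```python
import math

max_total_training_jobs = 100
max_parallel_training_jobs = 10  # the ResourceLimits MaxParallelTrainingJobs field

# Number of sequential rounds the tuning job runs at this parallelism.
sequential_rounds = math.ceil(max_total_training_jobs / max_parallel_training_jobs)
print(sequential_rounds)  # 10 rounds of up to 10 parallel training jobs
```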

## Running training jobs on multiple instances
<a name="automatic-model-tuning-distributed-metrics"></a>

When a training job runs on multiple machines in distributed mode, each machine emits an objective metric. HPO can only use one of these emitted objective metrics to evaluate model performance. In distributed mode, HPO uses the objective metric that was reported by the last running job across all instances. 

## Using a random seed to reproduce hyperparameter configurations
<a name="automatic-model-tuning-random-seed"></a>

You can specify an integer as a random seed for hyperparameter tuning and use that seed during hyperparameter generation. Later, you can use the same seed to reproduce hyperparameter configurations that are consistent with your previous results. For random search and Hyperband strategies, using the same random seed can provide up to 100% reproducibility of the previous hyperparameter configuration for the same tuning job. For Bayesian strategy, using the same random seed will improve reproducibility for the same tuning job.
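
In the API, the seed is the `RandomSeed` field of the tuning job configuration. The sketch below uses illustrative values for the other fields:

```python
# Tuning job configuration sketch with a fixed random seed; reusing the
# same seed (with the same search space and strategy) reproduces
# hyperparameter configurations. Other field values are illustrative.
tuning_job_config = {
    "Strategy": "Random",
    "RandomSeed": 1234,
    "HyperParameterTuningJobObjective": {
        "Type": "Maximize",
        "MetricName": "validation:accuracy",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 50,
        "MaxParallelTrainingJobs": 5,
    },
}
print(tuning_job_config["RandomSeed"])  # 1234
```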