

# Image Classification - MXNet
<a name="image-classification"></a>

The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network that can be trained from scratch or, when a large number of training images is not available, trained using transfer learning.

The recommended input format for the Amazon SageMaker AI image classification algorithms is Apache MXNet [RecordIO](https://mxnet.apache.org/api/faq/recordio). However, you can also use raw images in .jpg or .png format. Refer to [this discussion](https://mxnet.apache.org/api/architecture/note_data_loading) for a broad overview of efficient data preparation and loading for machine learning systems. 

**Note**  
To maintain better interoperability with existing deep learning frameworks, the image classification algorithm uses the RecordIO format, which differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.

For more information on convolutional networks, see: 
+ [Deep residual learning for image recognition](https://arxiv.org/abs/1512.03385) Kaiming He, et al., 2016 IEEE Conference on Computer Vision and Pattern Recognition
+ [ImageNet image database](http://www.image-net.org/)
+ [Image classification with Gluon-CV and MXNet](https://gluon-cv.mxnet.io/build/examples_classification/index.html)

**Topics**
+ [Input/Output Interface for the Image Classification Algorithm](#IC-inputoutput)
+ [EC2 Instance Recommendation for the Image Classification Algorithm](#IC-instances)
+ [Image Classification Sample Notebooks](#IC-sample-notebooks)
+ [How Image Classification Works](IC-HowItWorks.md)
+ [Image Classification Hyperparameters](IC-Hyperparameter.md)
+ [Tune an Image Classification Model](IC-tuning.md)

## Input/Output Interface for the Image Classification Algorithm
<a name="IC-inputoutput"></a>

The SageMaker AI Image Classification algorithm supports both RecordIO (`application/x-recordio`) and image (`image/png`, `image/jpeg`, and `application/x-image`) content types for training in file mode, and supports the RecordIO (`application/x-recordio`) content type for training in pipe mode. However, you can also train in pipe mode using the image files (`image/png`, `image/jpeg`, and `application/x-image`), without creating RecordIO files, by using the augmented manifest format.

Distributed training is supported for file mode and pipe mode. When using the RecordIO content type in pipe mode, you must set the `S3DataDistributionType` of the `S3DataSource` to `FullyReplicated`. The algorithm supports a fully replicated model where your data is copied onto each machine.

The algorithm supports `image/png`, `image/jpeg`, and `application/x-image` for inference.

### Train with RecordIO Format
<a name="IC-recordio-training"></a>

If you use the RecordIO format for training, specify both `train` and `validation` channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify one RecordIO (`.rec`) file in the `train` channel and one RecordIO file in the `validation` channel. Set the content type for both channels to `application/x-recordio`.
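The two channels described above can be sketched as plain Python dictionaries shaped like the `InputDataConfig` entries of a `CreateTrainingJob` request. This is a minimal sketch, not a complete request: the bucket name and object keys are hypothetical, and the helper `recordio_channel` is introduced only for illustration.

```python
def recordio_channel(name, s3_uri):
    """Build one InputDataConfig entry for a RecordIO channel."""
    return {
        "ChannelName": name,
        "ContentType": "application/x-recordio",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                # Each training machine receives a full copy of the data.
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical S3 locations: one .rec file per channel.
input_data_config = [
    recordio_channel("train", "s3://amzn-s3-demo-bucket/train/train.rec"),
    recordio_channel("validation", "s3://amzn-s3-demo-bucket/validation/validation.rec"),
]
```

The resulting list would be passed as the `InputDataConfig` value alongside the other required fields of the request.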

### Train with Image Format
<a name="IC-image-training"></a>

If you use the Image format for training, specify `train`, `validation`, `train_lst`, and `validation_lst` channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify the individual image data (`.jpg` or `.png` files) for the `train` and `validation` channels. Specify one `.lst` file in each of the `train_lst` and `validation_lst` channels. Set the content type for all four channels to `application/x-image`.

**Note**  
SageMaker AI reads the training and validation data separately from different channels, so you must store the training and validation data in different folders.

A `.lst` file is a tab-separated file with three columns that contains a list of image files. The first column specifies the image index, the second column specifies the class label index for the image, and the third column specifies the relative path of the image file. The image index in the first column must be unique across all of the images. Class label indices must be numbered consecutively, starting with 0. For example, 0 for the cat class, 1 for the dog class, and so on for additional classes.

 The following is an example of a `.lst` file: 

```
5      1   your_image_directory/train_img_dog1.jpg
1000   0   your_image_directory/train_img_cat1.jpg
22     1   your_image_directory/train_img_dog2.jpg
```

For example, if your training images are stored in `s3://<your_bucket>/train/class_dog`, `s3://<your_bucket>/train/class_cat`, and so on, specify the path for your `train` channel as `s3://<your_bucket>/train`, which is the top-level directory for your data. In the `.lst` file, specify the relative path for an individual file named `train_image_dog1.jpg` in the `class_dog` class directory as `class_dog/train_image_dog1.jpg`. You can also store all your image files under one subdirectory inside the `train` directory. In that case, use that subdirectory for the relative path. For example, `s3://<your_bucket>/train/your_image_directory`. 
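As a rough illustration of the `.lst` layout, the following sketch generates tab-separated lines from a list of class directories. The function name `build_lst_lines` and the file names are hypothetical; the sketch assumes one subdirectory per class under the `train` prefix and assigns label indices in the order the classes are listed.

```python
def build_lst_lines(images_by_class):
    """Return .lst lines: image index, class label index, relative path.

    images_by_class is a list of (class_directory, [file names]) pairs.
    The position of each pair in the list becomes its class label index.
    """
    lines = []
    image_index = 0  # must be unique across all images
    for label, (class_dir, filenames) in enumerate(images_by_class):
        for filename in filenames:
            lines.append(f"{image_index}\t{label}\t{class_dir}/{filename}")
            image_index += 1
    return lines


# Hypothetical layout: s3://<your_bucket>/train/class_cat, .../class_dog
lst_lines = build_lst_lines([
    ("class_cat", ["train_image_cat1.jpg"]),
    ("class_dog", ["train_image_dog1.jpg", "train_image_dog2.jpg"]),
])
lst_text = "\n".join(lst_lines) + "\n"
```

Writing `lst_text` to a file and uploading it to the `train_lst` channel location would complete the picture; that step is omitted here.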

### Train with Augmented Manifest Image Format
<a name="IC-augmented-manifest-training"></a>

The augmented manifest format enables you to train in Pipe mode using image files without needing to create RecordIO files. Specify both `train` and `validation` channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. To use this format, generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file must be in [JSON Lines](http://jsonlines.org/) format, in which each line represents one sample. The images are specified using the `"source-ref"` tag, which points to the S3 location of the image. The annotations are provided under the `"AttributeNames"` parameter value as specified in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. A line can also contain additional metadata under the `metadata` tag, but the algorithm ignores it. In the following example, the `"AttributeNames"` are contained in the list of image and annotation references `["source-ref", "class"]`. The corresponding label value is `"0"` for the first image and `"1"` for the second image:

```
{"source-ref":"s3://image/filename1.jpg", "class":"0"}
{"source-ref":"s3://image/filename2.jpg", "class":"1", "class-metadata": {"class-name": "cat", "type" : "groundtruth/image-classification"}}
```

The order of `"AttributeNames"` in the input files matters when training the ImageClassification algorithm. It accepts piped data in a specific order, with `image` first, followed by `label`. So the "AttributeNames" in this example are provided with `"source-ref"` first, followed by `"class"`. When using the ImageClassification algorithm with Augmented Manifest, the value of the `RecordWrapperType` parameter must be `"RecordIO"`.
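The following sketch shows one way to write such a manifest and describe the matching channel. It assumes the `S3DataSource` fields `S3DataType` and `AttributeNames` and the channel fields `InputMode` and `RecordWrapperType` from the `CreateTrainingJob` API; the manifest location is hypothetical, and the image URIs are the placeholders from the example above.

```python
import json

# (image location, label) pairs taken from the example manifest above.
samples = [
    ("s3://image/filename1.jpg", "0"),
    ("s3://image/filename2.jpg", "1"),
]

# One JSON object per line, with the image reference first and the label second.
manifest_text = "".join(
    json.dumps({"source-ref": ref, "class": label}) + "\n"
    for ref, label in samples
)

train_channel = {
    "ChannelName": "train",
    "ContentType": "application/x-recordio",
    "InputMode": "Pipe",
    "RecordWrapperType": "RecordIO",  # required with augmented manifests
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "AugmentedManifestFile",
            # Hypothetical location of the uploaded manifest file.
            "S3Uri": "s3://amzn-s3-demo-bucket/train/train.manifest",
            "S3DataDistributionType": "FullyReplicated",
            # Order matters: image first, then label.
            "AttributeNames": ["source-ref", "class"],
        }
    },
}
```

A `validation` channel would be built the same way with its own manifest file.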

Multi-label training is also supported by specifying a JSON array of values. The `num_classes` hyperparameter must be set to match the total number of classes. There are two valid label formats: multi-hot and class-id. 

In the multi-hot format, each label is a multi-hot encoded vector of all classes, where each class takes the value of 0 or 1. In the following example, there are three classes. The first image is labeled with classes 0 and 2, while the second image is labeled with class 2 only: 

```
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[1, 0, 1]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[0, 0, 1]"}
```

In the class-id format, each label is a list of the class ids, from [0, `num_classes`), which apply to the data point. The previous example would instead look like this:

```
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[0, 2]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[2]"}
```
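The two label formats carry the same information, so converting between them is mechanical. The helpers below are a small illustrative sketch (the function names are not part of any SageMaker API):

```python
def multi_hot_to_class_ids(vector):
    """Return the indices whose multi-hot entry is 1."""
    return [index for index, flag in enumerate(vector) if flag == 1]


def class_ids_to_multi_hot(class_ids, num_classes):
    """Return a 0/1 vector of length num_classes for the given class ids."""
    ids = set(class_ids)
    if any(i < 0 or i >= num_classes for i in ids):
        raise ValueError("class ids must lie in [0, num_classes)")
    return [1 if index in ids else 0 for index in range(num_classes)]


# The first image in the examples above: classes 0 and 2 out of three classes.
first_image_ids = multi_hot_to_class_ids([1, 0, 1])
first_image_vector = class_ids_to_multi_hot([0, 2], num_classes=3)
```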

The multi-hot format is the default, but you can set it explicitly in the content type with the `label-format` parameter: `"application/x-recordio; label-format=multi-hot"`. The class-id format, which is the format output by Ground Truth, must be set explicitly: `"application/x-recordio; label-format=class-id"`.

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

### Incremental Training
<a name="IC-incremental-training"></a>

You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI image classification models can be seeded only with another built-in image classification model trained in SageMaker AI.

To use a pretrained model, in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, specify the `ChannelName` as "model" in the `InputDataConfig` parameter. Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the `num_layers`, `image_shape`, and `num_classes` input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.
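A minimal sketch of the model channel, together with a local sanity check that the architecture-defining hyperparameters match, might look like the following. The S3 location and the hyperparameter values are hypothetical, and `architecture_mismatches` is an illustrative helper rather than a SageMaker API.

```python
ARCHITECTURE_KEYS = ("num_layers", "image_shape", "num_classes")


def architecture_mismatches(pretrained_hyperparameters, new_hyperparameters):
    """Return the architecture-defining keys whose values differ."""
    return [
        key
        for key in ARCHITECTURE_KEYS
        if pretrained_hyperparameters.get(key) != new_hyperparameters.get(key)
    ]


model_channel = {
    "ChannelName": "model",
    "ContentType": "application/x-sagemaker-model",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            # Hypothetical location of the model.tar.gz from the earlier job.
            "S3Uri": "s3://amzn-s3-demo-bucket/output/model.tar.gz",
            "S3DataDistributionType": "FullyReplicated",
        }
    },
}

# Hypothetical hyperparameters of the earlier job and the new job.
previous_job = {"num_layers": "18", "image_shape": "3,224,224", "num_classes": "2"}
new_job = {"num_layers": "18", "image_shape": "3,224,224", "num_classes": "2", "epochs": "5"}
mismatches = architecture_mismatches(previous_job, new_job)
```

An empty `mismatches` list suggests the two jobs describe the same network; other hyperparameters, such as `epochs`, are free to differ.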

### Inference with the Image Classification Algorithm
<a name="IC-inference"></a>

The generated models can be hosted for inference and support encoded `.jpg` and `.png` image formats with the `image/png`, `image/jpeg`, and `application/x-image` content types. The input image is resized automatically. The output is the probability values for all classes encoded in JSON format, or in [JSON Lines text format](http://jsonlines.org/) for batch transform. The image classification model processes a single image per request and so outputs only one line in the JSON or JSON Lines format. The following is an example of a response in JSON Lines format:

```
accept: application/jsonlines

 {"prediction": [prob_0, prob_1, prob_2, prob_3, ...]}
```
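A response line of this shape can be decoded with the standard `json` module. The sketch below picks the most probable class; the probabilities are made-up values used only to illustrate the parsing, and `top_prediction` is a hypothetical helper name.

```python
import json


def top_prediction(response_line):
    """Return (class_index, probability) for the most probable class."""
    probabilities = json.loads(response_line)["prediction"]
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best_index, probabilities[best_index]


# Made-up response for a three-class model.
example_line = '{"prediction": [0.1, 0.7, 0.2]}'
class_index, probability = top_prediction(example_line)
```

The returned index refers to the class label index used during training, for example the second column of the `.lst` file.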

For more details on training and inference, see the image classification sample notebook instances referenced in the introduction.

## EC2 Instance Recommendation for the Image Classification Algorithm
<a name="IC-instances"></a>

For image classification, we support P2, P3, G4dn, and G5 instances. We recommend using GPU instances with more memory for training with large batch sizes. You can also run the algorithm on multi-GPU and multi-machine settings for distributed training. Both CPU (such as C4) and GPU (P2, P3, G4dn, or G5) instances can be used for inference.

## Image Classification Sample Notebooks
<a name="IC-sample-notebooks"></a>

For a sample notebook that uses the SageMaker AI image classification algorithm, see [Build and Register an MXNet Image Classification Model via SageMaker Pipelines](https://github.com/aws-samples/amazon-sagemaker-pipelines-mxnet-image-classification/blob/main/image-classification-sagemaker-pipelines.ipynb). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. The example image classification notebooks are located in the **Introduction to Amazon algorithms** section. To open a notebook, choose its **Use** tab and select **Create copy**.

# How Image Classification Works
<a name="IC-HowItWorks"></a>

The image classification algorithm takes an image as input and classifies it into one of the output categories. Deep learning has revolutionized the image classification domain and has achieved great performance. Various deep learning networks such as [ResNet](https://arxiv.org/abs/1512.03385), [DenseNet](https://arxiv.org/abs/1608.06993), [Inception](https://arxiv.org/pdf/1409.4842.pdf), and so on, have been developed to be highly accurate for image classification. At the same time, there have been efforts to collect labeled image data that are essential for training these networks. [ImageNet](https://www.image-net.org/) is one such large dataset that has more than 11 million images with about 11,000 categories. Once a network is trained with ImageNet data, it can then be used to generalize with other datasets as well, by simple re-adjustment or fine-tuning. In this transfer learning approach, a network is initialized with weights (in this example, trained on ImageNet), which can be later fine-tuned for an image classification task in a different dataset. 

Image classification in Amazon SageMaker AI can be run in two modes: full training and transfer learning. In full training mode, the network is initialized with random weights and trained on user data from scratch. In transfer learning mode, the network is initialized with pre-trained weights and just the top fully connected layer is initialized with random weights. Then, the whole network is fine-tuned with new data. In this mode, training can be achieved even with a smaller dataset. This is because the network is already trained and therefore can be used in cases without sufficient training data.
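The choice between the two modes comes down to the `use_pretrained_model` hyperparameter. The sketch below builds a string-valued hyperparameter map for either mode. It assumes a 50-layer network and the 224 x 224 input size that pretrained models require; both are illustrative choices, and the function name is hypothetical.

```python
def image_classification_hyperparameters(num_classes, num_training_samples, transfer_learning):
    """Build a hyperparameter map; training job hyperparameters are strings."""
    return {
        "num_classes": str(num_classes),
        "num_training_samples": str(num_training_samples),
        # 1 loads pretrained weights; 0 trains from random weights.
        "use_pretrained_model": "1" if transfer_learning else "0",
        # Assumed values: pretrained models accept only 224 x 224 input.
        "num_layers": "50",
        "image_shape": "3,224,224",
    }


transfer_hp = image_classification_hyperparameters(2, 1000, transfer_learning=True)
scratch_hp = image_classification_hyperparameters(2, 1000, transfer_learning=False)
```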

# Image Classification Hyperparameters
<a name="IC-Hyperparameter"></a>

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker AI built-in Image Classification algorithm. See [Tune an Image Classification Model](IC-tuning.md) for information on image classification hyperparameter tuning. 


| Parameter Name | Description | 
| --- | --- | 
| num\_classes | Number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. Besides multi-class classification, multi-label classification is supported too. Please refer to [Input/Output Interface for the Image Classification Algorithm](image-classification.md#IC-inputoutput) for details on how to work with multi-label classification with augmented manifest files.  **Required** Valid values: positive integer  | 
| num\_training\_samples | Number of training examples in the input dataset. If there is a mismatch between this value and the number of samples in the training set, then the behavior of the `lr_scheduler_step` parameter is undefined and distributed training accuracy might be affected. **Required** Valid values: positive integer  | 
| augmentation\_type |  Data augmentation type. The input images can be augmented in multiple ways as specified below. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html) **Optional**  Valid values: `crop`, `crop_color`, or `crop_color_transform`. Default value: no default value  | 
| beta\_1 | The beta1 for `adam`, that is the exponential decay rate for the first moment estimates. **Optional**  Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| beta\_2 | The beta2 for `adam`, that is the exponential decay rate for the second moment estimates. **Optional**  Valid values: float. Range in [0, 1]. Default value: 0.999 | 
| checkpoint\_frequency | Period to store model parameters (in number of epochs). Note that all checkpoint files are saved as part of the final model file "model.tar.gz" and uploaded to S3 to the specified model location. This increases the size of the model file proportionally to the number of checkpoints saved during training. **Optional** Valid values: positive integer no greater than `epochs`. Default value: no default value (Save checkpoint at the epoch that has the best validation accuracy) | 
| early\_stopping | `True` to use early stopping logic during training. `False` not to use it. **Optional** Valid values: `True` or `False` Default value: `False` | 
| early\_stopping\_min\_epochs | The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 10 | 
| early\_stopping\_patience | The number of epochs to wait before ending training if no improvement is made in the relevant metric. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 5 | 
| early\_stopping\_tolerance | Relative tolerance to measure an improvement in accuracy validation metric. If the ratio of the improvement in accuracy divided by the previous best accuracy is smaller than the `early_stopping_tolerance` value set, early stopping considers there is no improvement. It is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0 | 
| epochs | Number of training epochs. **Optional** Valid values: positive integer Default value: 30 | 
| eps | The epsilon for `adam` and `rmsprop`. It is usually set to a small value to avoid division by 0. **Optional** Valid values: float. Range in [0, 1]. Default value: 1e-8 | 
| gamma | The gamma for `rmsprop`, the decay factor for the moving average of the squared gradient. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| image\_shape | The input image dimensions, which is the same size as the input layer of the network. The format is defined as '`num_channels`, height, width'. The image dimension can take on any value as the network can handle varied dimensions of the input. However, there may be memory constraints if a larger image dimension is used. Pretrained models can use only a fixed 224 x 224 image size. Typical image dimensions for image classification are '3,224,224'. This is similar to the ImageNet dataset.  For training, if any input image is smaller than this parameter in any dimension, training fails. If an image is larger, a portion of the image is cropped, with the cropped area specified by this parameter. If hyperparameter `augmentation_type` is set, random crop is taken; otherwise, central crop is taken.  At inference, input images are resized to the `image_shape` that was used during training. Aspect ratio is not preserved, and images are not cropped. **Optional** Valid values: string Default value: '3,224,224' | 
| kv\_store |  Weight update synchronization mode during distributed training. The weight updates can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See distributed training in MXNet for more details. This parameter is not applicable to single machine training. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html) **Optional** Valid values: `dist_sync` or `dist_async` Default value: no default value  | 
| learning\_rate | Initial learning rate. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.1 | 
| lr\_scheduler\_factor | The ratio to reduce learning rate used in conjunction with the `lr_scheduler_step` parameter, defined as `lr_new` = `lr_old` \* `lr_scheduler_factor`. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.1 | 
| lr\_scheduler\_step | The epochs at which to reduce the learning rate. As explained in the `lr_scheduler_factor` parameter, the learning rate is reduced by `lr_scheduler_factor` at these epochs. For example, if the value is set to "10, 20", then the learning rate is reduced by `lr_scheduler_factor` after 10th epoch and again by `lr_scheduler_factor` after 20th epoch. The epochs are delimited by ",". **Optional** Valid values: string Default value: no default value | 
| mini\_batch\_size | The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/num\_gpu training samples. For the multi-machine training in dist\_sync mode, the actual batch size is `mini_batch_size`\*number of machines. See MXNet docs for more details. **Optional** Valid values: positive integer Default value: 32 | 
| momentum | The momentum for `sgd` and `nag`, ignored for other optimizers. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| multi\_label |  Flag to use for multi-label classification where each sample can be assigned multiple labels. Average accuracy across all classes is logged. **Optional** Valid values: 0 or 1 Default value: 0  | 
| num\_layers | Number of layers for the network. For data with large image size (for example, 224x224 - like ImageNet), we suggest selecting the number of layers from the set [18, 34, 50, 101, 152, 200]. For data with small image size (for example, 28x28 - like CIFAR), we suggest selecting the number of layers from the set [20, 32, 44, 56, 110]. The number of layers in each set is based on the ResNet paper. For transfer learning, the number of layers defines the architecture of base network and hence can only be selected from the set [18, 34, 50, 101, 152, 200]. **Optional** Valid values: positive integer in [18, 34, 50, 101, 152, 200] or [20, 32, 44, 56, 110] Default value: 152 | 
| optimizer | The optimizer type. For more details of the parameters for the optimizers, please refer to MXNet's API. **Optional** Valid values: One of `sgd`, `adam`, `rmsprop`, or `nag`. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html) Default value: `sgd` | 
| precision\_dtype | The precision of the weights used for training. The algorithm can use either single precision (`float32`) or half precision (`float16`) for the weights. Using half-precision for weights results in reduced memory consumption. **Optional** Valid values: `float32` or `float16` Default value: `float32` | 
| resize | The number of pixels in the shortest side of an image after resizing it for training. If the parameter is not set, then the training data is used without resizing. The parameter should be larger than both the width and height components of `image_shape` to prevent training failure. **Required** when using image content types **Optional** when using the RecordIO content type Valid values: positive integer Default value: no default value  | 
| top\_k | Reports the top-k accuracy during training. This parameter has to be greater than 1, since the top-1 training accuracy is the same as the regular training accuracy that has already been reported. **Optional** Valid values: positive integer larger than 1. Default value: no default value | 
| use\_pretrained\_model | Flag to use pre-trained model for training. If set to 1, then the pretrained model with the corresponding number of layers is loaded and used for training. Only the top FC layer is reinitialized with random weights. Otherwise, the network is trained from scratch. **Optional** Valid values: 0 or 1 Default value: 0 | 
| use\_weighted\_loss |  Flag to use weighted cross-entropy loss for multi-label classification (used only when `multi_label` = 1), where the weights are calculated based on the distribution of classes. **Optional** Valid values: 0 or 1 Default value: 0  | 
| weight\_decay | The coefficient weight decay for `sgd` and `nag`, ignored for other optimizers. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.0001 | 
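To make the interaction between `learning_rate`, `lr_scheduler_step`, and `lr_scheduler_factor` concrete, the following sketch computes the learning rate in effect at a given epoch. It is an illustration of the rule `lr_new` = `lr_old` \* `lr_scheduler_factor`, not the algorithm's internal code, and it assumes epochs are counted from 0 so that a step value of 10 takes effect once 10 epochs have completed.

```python
def learning_rate_at_epoch(epoch, learning_rate, lr_scheduler_step, lr_scheduler_factor):
    """Return the learning rate used during the given zero-based epoch."""
    # Parse the comma-delimited step string, for example "10, 20".
    steps = [int(step) for step in lr_scheduler_step.split(",") if step.strip()]
    # Count how many scheduled reductions have already happened.
    reductions = sum(1 for step in steps if epoch >= step)
    return learning_rate * lr_scheduler_factor ** reductions


# Default learning_rate and lr_scheduler_factor, with the "10, 20" example.
rates = [learning_rate_at_epoch(e, 0.1, "10, 20", 0.1) for e in (0, 10, 25)]
```

With these inputs the rate starts at 0.1, drops to roughly 0.01 from epoch 10, and to roughly 0.001 from epoch 20, up to floating-point rounding.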

# Tune an Image Classification Model
<a name="IC-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics Computed by the Image Classification Algorithm
<a name="IC-metrics"></a>

The image classification algorithm is a supervised algorithm. It reports an accuracy metric that is computed during training. When tuning the model, choose this metric as the objective metric.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:accuracy | The ratio of the number of correct predictions to the total number of predictions made. | Maximize | 

## Tunable Image Classification Hyperparameters
<a name="IC-tunable-hyperparameters"></a>

Tune an image classification model with the following hyperparameters. The hyperparameters that have the greatest impact on image classification objective metrics are: `mini_batch_size`, `learning_rate`, and `optimizer`. Tune the optimizer-related hyperparameters, such as `momentum`, `weight_decay`, `beta_1`, `beta_2`, `eps`, and `gamma`, based on the selected `optimizer`. For example, use `beta_1` and `beta_2` only when `adam` is the `optimizer`.

For more information about which hyperparameters are used in each optimizer, see [Image Classification Hyperparameters](IC-Hyperparameter.md).


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| beta\_1 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| beta\_2 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| eps | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 1.0 | 
| gamma | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 0.999 | 
| learning\_rate | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.5 | 
| mini\_batch\_size | IntegerParameterRanges | MinValue: 8, MaxValue: 512 | 
| momentum | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'rmsprop', 'nag'] | 
| weight\_decay | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
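The recommended ranges can be expressed as the `ParameterRanges` structure of a hyperparameter tuning job configuration. The sketch below is one possible selection, assuming the tuning API represents range bounds as strings. It restricts `optimizer` to `sgd` and `nag` so that `momentum` and `weight_decay` apply to every trial; the helper `continuous_range` is hypothetical.

```python
def continuous_range(name, min_value, max_value):
    """Build one ContinuousParameterRanges entry; bounds are sent as strings."""
    return {"Name": name, "MinValue": str(min_value), "MaxValue": str(max_value)}


parameter_ranges = {
    "ContinuousParameterRanges": [
        continuous_range("learning_rate", 1e-6, 0.5),
        continuous_range("momentum", 0.0, 0.999),
        continuous_range("weight_decay", 0.0, 0.999),
    ],
    "IntegerParameterRanges": [
        {"Name": "mini_batch_size", "MinValue": "8", "MaxValue": "512"},
    ],
    "CategoricalParameterRanges": [
        # Only optimizers that use momentum and weight_decay.
        {"Name": "optimizer", "Values": ["sgd", "nag"]},
    ],
}

# The objective metric reported by the algorithm.
tuning_objective = {"Type": "Maximize", "MetricName": "validation:accuracy"}
```

To tune `adam` or `rmsprop` instead, swap in `beta_1`, `beta_2`, `eps`, or `gamma` as appropriate for the chosen optimizer.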