# Semantic Segmentation Algorithm

The SageMaker AI semantic segmentation algorithm provides a fine-grained, pixel-level approach to developing computer vision applications. It tags every pixel in an image with a class label from a predefined set of classes. Tagging is fundamental for understanding scenes, which is critical to an increasing number of computer vision applications, such as self-driving vehicles, medical imaging diagnostics, and robot sensing.

For comparison, the SageMaker AI [Image Classification - MXNet](image-classification.md) algorithm is a supervised learning algorithm that analyzes only whole images, classifying them into one of multiple output categories. The [Object Detection - MXNet](object-detection.md) algorithm is a supervised learning algorithm that detects and classifies all instances of an object in an image. It indicates the location and scale of each object in the image with a rectangular bounding box.

Because the semantic segmentation algorithm classifies every pixel in an image, it also provides information about the shapes of the objects contained in the image. The segmentation output is represented as a grayscale image with the same shape as the input image, called a *segmentation mask*.

The SageMaker AI semantic segmentation algorithm is built using the [MXNet Gluon framework and the Gluon CV toolkit](https://github.com/dmlc/gluon-cv). It provides you with a choice of three built-in algorithms to train a deep neural network. You can use the [Fully-Convolutional Network (FCN) algorithm](https://arxiv.org/abs/1605.06211), [Pyramid Scene Parsing (PSP) algorithm](https://arxiv.org/abs/1612.01105), or [DeepLabV3](https://arxiv.org/abs/1706.05587).

Each of the three algorithms has two distinct components:
+ The *backbone* (or *encoder*): A network that produces reliable activation maps of features.
+ The *decoder*: A network that constructs the segmentation mask from the encoded activation maps.


You also have a choice of backbones for the FCN, PSP, and DeepLabV3 algorithms: [ResNet50 or ResNet101](https://arxiv.org/abs/1512.03385). These backbones include pretrained artifacts that were originally trained on the [ImageNet](http://www.image-net.org/) classification task. You can fine-tune these backbones for segmentation using your own data. Or, you can initialize and train these networks from scratch using only your own data. The decoders are never pretrained.

To deploy the trained model for inference, use the SageMaker AI hosting service. During inference, you can request the segmentation mask either as a PNG image or as a set of probabilities for each class for each pixel. You can use these masks as part of a larger pipeline that includes additional downstream image processing or other applications.

**Topics**
+ [Semantic Segmentation Sample Notebooks](#semantic-segmentation-sample-notebooks)
+ [Input/Output Interface for the Semantic Segmentation Algorithm](#semantic-segmentation-inputoutput)
+ [EC2 Instance Recommendation for the Semantic Segmentation Algorithm](#semantic-segmentation-instances)
+ [Semantic Segmentation Hyperparameters](segmentation-hyperparameters.md)
+ [Tuning a Semantic Segmentation Model](semantic-segmentation-tuning.md)

## Semantic Segmentation Sample Notebooks

For a sample Jupyter notebook that uses the SageMaker AI semantic segmentation algorithm to train a model and deploy it to perform inferences, see the [Semantic Segmentation Example](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/semantic_segmentation_pascalvoc/semantic_segmentation_pascalvoc.html). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). To see a list of all of the SageMaker AI samples, create and open a notebook instance, and choose the **SageMaker AI Examples** tab.
The example semantic segmentation notebooks are located under **Introduction to Amazon algorithms**. To open a notebook, choose its **Use** tab, and choose **Create copy**.

## Input/Output Interface for the Semantic Segmentation Algorithm

SageMaker AI semantic segmentation expects the customer's training dataset to be on [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/). Once trained, it produces the resulting model artifacts on Amazon S3. The input interface format for SageMaker AI semantic segmentation is similar to that of most standardized semantic segmentation benchmarking datasets. The dataset in Amazon S3 is expected to be presented in two channels, one for `train` and one for `validation`, using four directories: two for images and two for annotations. Annotations are expected to be uncompressed PNG images. The dataset might also have a label map that describes how the annotation mappings are established. If not, the algorithm uses a default. The algorithm also supports the augmented manifest image format (`application/x-image`) for training in Pipe input mode straight from Amazon S3. For inference, an endpoint accepts images with an `image/jpeg` content type.

### How Training Works

The training data is split into four directories: `train`, `train_annotation`, `validation`, and `validation_annotation`. There is a channel for each of these directories. The dataset is also expected to have one `label_map.json` file per channel for `train_annotation` and `validation_annotation`, respectively. If you don't provide these JSON files, SageMaker AI provides a default label map.
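
To make the one-channel-per-directory layout concrete, the following is a minimal sketch that builds one `InputDataConfig` entry for each of the four directories. The bucket name is a placeholder, and the content types (`image/jpeg` for images, `image/png` for annotations) and the `FullyReplicated` distribution are assumptions made for illustration rather than values stated in this section.

```python
# Hypothetical bucket; replace with the S3 location of your dataset.
BUCKET = "s3://bucket_name"

# Assumed content types: JPEG for image channels, PNG for annotation channels.
CHANNELS = {
    "train": "image/jpeg",
    "validation": "image/jpeg",
    "train_annotation": "image/png",
    "validation_annotation": "image/png",
}


def build_input_data_config(bucket, channels):
    """Return one InputDataConfig entry per channel directory."""
    return [
        {
            "ChannelName": name,
            "ContentType": content_type,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    # Each channel points at the directory of the same name.
                    "S3Uri": f"{bucket}/{name}",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
        for name, content_type in channels.items()
    ]


input_data_config = build_input_data_config(BUCKET, CHANNELS)
for channel in input_data_config:
    print(channel["ChannelName"], channel["DataSource"]["S3DataSource"]["S3Uri"])
```

The resulting list could be passed as the `InputDataConfig` field of a training job request. Keeping the channel name and the directory name identical makes the image-to-annotation pairing easy to audit.
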
The dataset specifying these files should look similar to the following example:

```
s3://bucket_name
    |
    |- train
    |      |
    |      | - 0000.jpg
    |      | - coffee.jpg
    |- validation
    |      |
    |      | - 00a0.jpg
    |      | - bananna.jpg
    |- train_annotation
    |      |
    |      | - 0000.png
    |      | - coffee.png
    |- validation_annotation
    |      |
    |      | - 00a0.png
    |      | - bananna.png
    |- label_map
           | - train_label_map.json
           | - validation_label_map.json
```

Every JPG image in the `train` and `validation` directories has a corresponding PNG label image with the same name in the `train_annotation` and `validation_annotation` directories. This naming convention helps the algorithm associate a label with its corresponding image during training. The `train`, `train_annotation`, `validation`, and `validation_annotation` channels are mandatory. The annotations are single-channel PNG images. The format works as long as the metadata (modes) in the image helps the algorithm read the annotation images into a single-channel 8-bit unsigned integer. For more information on our support for modes, see the [Python Image Library documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes). We recommend using the 8-bit pixel, palette-mapped `P` mode.

When you use these modes, each pixel in the encoded image is a simple 8-bit integer. To get from this encoding to a label, the algorithm uses one mapping file per channel, called the *label map*. The label map maps the values in the image to actual label indices. In the default label map, which is used if you don't provide one, the pixel value in an annotation matrix (image) directly indexes the label. These images can be grayscale PNG files or 8-bit indexed PNG files. The label map file for the unscaled default case is the following:

```
{
  "scale": "1"
}
```

To provide some contrast for viewing, some annotation software scales the label images by a constant amount.
To support this, the SageMaker AI semantic segmentation algorithm provides a rescaling option to scale down the values to actual label values. When scaling down doesn't convert the value to an integer, the algorithm rounds down to the greatest integer less than or equal to the scaled value. The following code shows how to set the scale value to rescale the label values:

```
{
  "scale": "3"
}
```

The following example shows how this `"scale"` value is used to rescale the `encoded_label` values of the input annotation image when they are mapped to the `mapped_label` values to be used in training. The label values in the input annotation image are 0, 3, and 6, with a scale of 3, so they are mapped to 0, 1, and 2 for training:

```
encoded_label = [0, 3, 6]
mapped_label = [0, 1, 2]
```

In some cases, you might need to specify a particular color mapping for each class. Use the `map` option in the label mapping, as shown in the following example of a `label_map` file:

```
{
    "map": {
        "0": 5,
        "1": 0,
        "2": 2
    }
}
```

The label mapping for this example is:

```
encoded_label = [0, 5, 2]
mapped_label = [1, 0, 2]
```

With label mappings, you can use different annotation systems and annotation software to obtain data without a lot of preprocessing. You can provide one label map per channel. The files for a label map in the `label_map` channel must follow the naming conventions for the four-directory structure. If you don't provide a label map, the algorithm assumes a scale of 1 (the default).

### Training with the Augmented Manifest Format

The augmented manifest format enables you to do training in Pipe mode using image files without needing to create RecordIO files. The augmented manifest file contains data objects and should be in [JSON Lines](http://jsonlines.org/) format, as described in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request.
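
Because JSON Lines stores one JSON object per line, a manifest can be produced with the standard `json` module. The following is a minimal sketch, not a definitive tool: each object pairs an image URI with an annotation URI, and the bucket, file names, and the `city-streets` attribute name are placeholders chosen for illustration. The helper names `manifest_line` and `build_manifest` are hypothetical.

```python
import json


def manifest_line(image_uri, annotation_uri, label_attribute="city-streets"):
    """Serialize one manifest entry as a single-line JSON object."""
    entry = {
        "source-ref": image_uri,
        # The annotation key is the attribute name with "-ref" appended.
        f"{label_attribute}-ref": annotation_uri,
    }
    return json.dumps(entry)


def build_manifest(pairs, label_attribute="city-streets"):
    """Join one entry per (image, annotation) pair, newline-terminated."""
    lines = [manifest_line(img, ann, label_attribute) for img, ann in pairs]
    return "\n".join(lines) + "\n"


# Placeholder URIs that reuse the example layout from earlier in this section.
pairs = [
    ("s3://bucket_name/train/0000.jpg", "s3://bucket_name/train_annotation/0000.png"),
    ("s3://bucket_name/train/coffee.jpg", "s3://bucket_name/train_annotation/coffee.png"),
]
manifest_text = build_manifest(pairs)
print(manifest_text, end="")
```

Writing `manifest_text` to a file and uploading it to Amazon S3 would give a manifest with one image and annotation pair per line.
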
Each line in the manifest is an entry containing the Amazon S3 URI for the image and the URI for the annotation image.

Each JSON object in the manifest file must contain a `source-ref` key. The `source-ref` key should contain the value of the Amazon S3 URI to the image. The labels are provided under the `AttributeNames` parameter value as specified in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Each object can also contain additional metadata under the metadata tag, but the algorithm ignores it. In the example below, the `AttributeNames` are contained in the list of image and annotation references `["source-ref", "city-streets-ref"]`. These names must have `-ref` appended to them. When using the Semantic Segmentation algorithm with Augmented Manifest, the value of the `RecordWrapperType` parameter must be `"RecordIO"` and the value of the `ContentType` parameter must be `application/x-recordio`.

```
{"source-ref": "S3 bucket location", "city-streets-ref": "S3 bucket location", "city-streets-metadata": {"job-name": "label-city-streets"}}
```

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

### Incremental Training

You can also seed the training of a new model with a model that you trained previously using SageMaker AI. This incremental training saves training time when you want to train a new model with the same or similar data. Currently, incremental training is supported only for models trained with the built-in SageMaker AI Semantic Segmentation algorithm.

To use your own pretrained model, specify the `ChannelName` as "model" in the `InputDataConfig` for the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request.
Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The `backbone`, `algorithm`, `crop_size`, and `num_classes` input parameters that define the network architecture must be consistently specified in the input hyperparameters of the new model and the pretrained model that you upload to the model channel. For the pretrained model file, you can use the compressed (.tar.gz) artifacts from SageMaker AI outputs. You can only use image formats for input data. For more information on incremental training and for instructions on how to use it, see [Use Incremental Training in Amazon SageMaker AI](incremental-training.md).

### Produce Inferences

To query a trained model that is deployed to an endpoint, you need to provide an image and an `AcceptType` that denotes the type of output required. The endpoint takes JPEG images with an `image/jpeg` content type. If you request an `AcceptType` of `image/png`, the algorithm outputs a PNG file with a segmentation mask in the same format as the labels themselves. If you request an `AcceptType` of `application/x-recordio-protobuf`, the algorithm returns class probabilities encoded in recordio-protobuf format. The latter format outputs a 3D tensor where the third dimension is the same size as the number of classes. This component denotes the probability of each class label for each pixel.

## EC2 Instance Recommendation for the Semantic Segmentation Algorithm

The SageMaker AI semantic segmentation algorithm only supports GPU instances for training, and we recommend using GPU instances with more memory for training with large batch sizes. The algorithm can be trained using P2, P3, G4dn, or G5 instances in single machine configurations.

For inference, you can use CPU instances (such as C5 and M5), GPU instances (such as P3 and G4dn), or both.
For information about the instance types that provide varying combinations of CPU, GPU, memory, and networking capacity for inference, see [Amazon SageMaker AI ML Instance Types](https://aws.amazon.com/sagemaker/pricing/instance-types/).

# Semantic Segmentation Hyperparameters

The following tables list the hyperparameters supported by the Amazon SageMaker AI semantic segmentation algorithm for network architecture, data inputs, and training. You specify Semantic Segmentation for training in the `AlgorithmName` of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request.

**Network Architecture Hyperparameters**

| Parameter Name | Description |
| --- | --- |
| backbone | The backbone to use for the algorithm's encoder component. **Optional** Valid values: `resnet-50`, `resnet-101` Default value: `resnet-50` |
| use\_pretrained\_model | Whether a pretrained model is to be used for the backbone. **Optional** Valid values: `True`, `False` Default value: `True` |
| algorithm | The algorithm to use for semantic segmentation. **Optional** Valid values: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/segmentation-hyperparameters.html) Default value: `fcn` |

**Data Hyperparameters**

| Parameter Name | Description |
| --- | --- |
| num\_classes | The number of classes to segment. **Required** Valid values: 2 ≤ positive integer ≤ 254 |
| num\_training\_samples | The number of samples in the training data. The algorithm uses this value to set up the learning rate scheduler. **Required** Valid values: positive integer |
| base\_size | Defines how images are rescaled before cropping. Images are rescaled such that the long side length is set to `base_size` multiplied by a random number from 0.5 to 2.0, and the short side is computed to preserve the aspect ratio. **Optional** Valid values: positive integer > 16 Default value: 520 |
| crop\_size | The image size for input during training. We randomly rescale the input image based on `base_size`, and then take a random square crop with side length equal to `crop_size`. The `crop_size` is automatically rounded up to a multiple of 8. **Optional** Valid values: positive integer > 16 Default value: 240 |

**Training Hyperparameters**

| Parameter Name | Description |
| --- | --- |
| early\_stopping | Whether to use early stopping logic during training. **Optional** Valid values: `True`, `False` Default value: `False` |
| early\_stopping\_min\_epochs | The minimum number of epochs that must be run. **Optional** Valid values: integer Default value: 5 |
| early\_stopping\_patience | The number of epochs that meet the tolerance for lower performance before the algorithm enforces an early stop. **Optional** Valid values: integer Default value: 4 |
| early\_stopping\_tolerance | If the relative improvement of the score of the training job, the mIOU, is smaller than this value, early stopping considers the epoch as not improved. This is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0 |
| epochs | The number of epochs with which to train. **Optional** Valid values: positive integer Default value: 10 |
| gamma1 | The decay factor for the moving average of the squared gradient for `rmsprop`. Used only for `rmsprop`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.9 |
| gamma2 | The momentum factor for `rmsprop`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.9 |
| learning\_rate | The initial learning rate. **Optional** Valid values: 0 < float ≤ 1 Default value: 0.001 |
| lr\_scheduler | The shape of the learning rate schedule that controls its decrease over time. **Optional** Valid values: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/segmentation-hyperparameters.html) Default value: `poly` |
| lr\_scheduler\_factor | If `lr_scheduler` is set to `step`, the ratio by which to reduce (multiply) the `learning_rate` after each of the epochs specified by `lr_scheduler_step`. Otherwise, ignored. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.1 |
| lr\_scheduler\_step | A comma-delimited list of the epochs after which the `learning_rate` is reduced (multiplied) by `lr_scheduler_factor`. For example, if the value is set to `"10, 20"`, then the `learning_rate` is reduced by `lr_scheduler_factor` after the 10th epoch and again by this factor after the 20th epoch. **Conditionally Required** if `lr_scheduler` is set to `step`. Otherwise, ignored. Valid values: string Default value: (No default, as the value is required when used.) |
| mini\_batch\_size | The batch size for training. Using a large `mini_batch_size` usually results in faster training, but it might cause you to run out of memory. Memory usage is affected by the values of the `mini_batch_size` and `image_shape` parameters, and the backbone architecture. **Optional** Valid values: positive integer Default value: 16 |
| momentum | The momentum for the `sgd` optimizer. When you use other optimizers, the semantic segmentation algorithm ignores this parameter. **Optional** Valid values: 0 < float ≤ 1 Default value: 0.9 |
| optimizer | The type of optimizer. For more information about an optimizer, choose the appropriate link: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/segmentation-hyperparameters.html) **Optional** Valid values: `adam`, `adagrad`, `nag`, `rmsprop`, `sgd` Default value: `sgd` |
| syncbn | If set to `True`, the batch normalization mean and variance are computed over all the samples processed across the GPUs. **Optional** Valid values: `True`, `False` Default value: `False` |
| validation\_mini\_batch\_size | The batch size for validation. A large `mini_batch_size` usually results in faster training, but it might cause you to run out of memory. Memory usage is affected by the values of the `mini_batch_size` and `image_shape` parameters, and the backbone architecture. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/segmentation-hyperparameters.html) **Optional** Valid values: positive integer Default value: 16 |
| weight\_decay | The weight decay coefficient for the `sgd` optimizer. When you use other optimizers, the algorithm ignores this parameter. **Optional** Valid values: 0 < float < 1 Default value: 0.0001 |

# Tuning a Semantic Segmentation Model

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the chosen hyperparameters to find the combination of values that results in the model that optimizes the objective metric.

## Metrics Computed by the Semantic Segmentation Algorithm

The semantic segmentation algorithm reports two validation metrics. When tuning hyperparameter values, choose one of these metrics as the objective.

| Metric Name | Description | Optimization Direction |
| --- | --- | --- |
| validation:mIOU | The area of the intersection of the predicted segmentation and the ground truth divided by the area of union between them for images in the validation set. Also known as the Jaccard Index. | Maximize |
| validation:pixel\_accuracy | The percentage of pixels that are correctly classified in images from the validation set. | Maximize |

## Tunable Semantic Segmentation Hyperparameters

You can tune the following hyperparameters for the semantic segmentation algorithm.

| Parameter Name | Parameter Type | Recommended Ranges |
| --- | --- | --- |
| learning\_rate | ContinuousParameterRange | MinValue: 1e-4, MaxValue: 1e-1 |
| mini\_batch\_size | IntegerParameterRanges | MinValue: 1, MaxValue: 128 |
| momentum | ContinuousParameterRange | MinValue: 0.9, MaxValue: 0.999 |
| optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'adadelta'] |
| weight\_decay | ContinuousParameterRange | MinValue: 1e-5, MaxValue: 1e-3 |
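
The recommended ranges can be written down as plain dictionaries before they are passed to a tuning job. The following is a sketch that assumes the `ParameterRanges` and objective shapes used by the SageMaker `CreateHyperParameterTuningJob` request, where minimum and maximum values are given as strings. The `check_ranges` helper is hypothetical and only performs a basic sanity check.

```python
# Ranges copied from the table of tunable hyperparameters.
# Assumption: the tuning API expects MinValue and MaxValue as strings.
parameter_ranges = {
    "ContinuousParameterRanges": [
        {"Name": "learning_rate", "MinValue": "0.0001", "MaxValue": "0.1"},
        {"Name": "momentum", "MinValue": "0.9", "MaxValue": "0.999"},
        {"Name": "weight_decay", "MinValue": "0.00001", "MaxValue": "0.001"},
    ],
    "IntegerParameterRanges": [
        {"Name": "mini_batch_size", "MinValue": "1", "MaxValue": "128"},
    ],
    "CategoricalParameterRanges": [
        {"Name": "optimizer", "Values": ["sgd", "adam", "adadelta"]},
    ],
}

# Either reported validation metric can be the objective; both are maximized.
tuning_objective = {"Type": "Maximize", "MetricName": "validation:mIOU"}


def check_ranges(ranges):
    """Return True if every numeric range has MinValue < MaxValue."""
    for key in ("ContinuousParameterRanges", "IntegerParameterRanges"):
        for item in ranges.get(key, []):
            if float(item["MinValue"]) >= float(item["MaxValue"]):
                raise ValueError(f"Empty range for {item['Name']}")
    return True


print(check_ranges(parameter_ranges))
```

The categorical values are copied from the table as written. Narrowing a range, for example capping `mini_batch_size` to what fits in GPU memory, only requires editing the corresponding entry.
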