

# Object Detection - MXNet
<a name="object-detection"></a>

The Amazon SageMaker AI Object Detection - MXNet algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. Each detected object is assigned to one of the classes in a specified collection, with a confidence score that it belongs to that class. Its location and scale in the image are indicated by a rectangular bounding box. The algorithm uses the [Single Shot multibox Detector (SSD)](https://arxiv.org/pdf/1512.02325.pdf) framework and supports two base networks: [VGG](https://arxiv.org/pdf/1409.1556.pdf) and [ResNet](https://arxiv.org/pdf/1603.05027.pdf). The network can be trained from scratch, or trained with models that have been pre-trained on the [ImageNet](http://www.image-net.org/) dataset.

**Topics**
+ [Input/Output Interface for the Object Detection Algorithm](#object-detection-inputoutput)
+ [EC2 Instance Recommendation for the Object Detection Algorithm](#object-detection-instances)
+ [Object Detection Sample Notebooks](#object-detection-sample-notebooks)
+ [How Object Detection Works](algo-object-detection-tech-notes.md)
+ [Object Detection Hyperparameters](object-detection-api-config.md)
+ [Tune an Object Detection Model](object-detection-tuning.md)
+ [Object Detection Request and Response Formats](object-detection-in-formats.md)

## Input/Output Interface for the Object Detection Algorithm
<a name="object-detection-inputoutput"></a>

The SageMaker AI Object Detection algorithm supports both RecordIO (`application/x-recordio`) and image (`image/png`, `image/jpeg`, and `application/x-image`) content types for training in file mode, and supports RecordIO (`application/x-recordio`) for training in pipe mode. You can also train in pipe mode using image files (`image/png`, `image/jpeg`, and `application/x-image`), without creating RecordIO files, by using the augmented manifest format. The recommended input format for the Amazon SageMaker AI object detection algorithm is [Apache MXNet RecordIO](https://mxnet.apache.org/api/architecture/note_data_loading), but you can also use raw images in .jpg or .png format. The algorithm supports only `application/x-image` for inference.

**Note**  
To maintain better interoperability with existing deep learning frameworks, the input format used by this algorithm differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.

See the [Object Detection Sample Notebooks](#object-detection-sample-notebooks) for more details on data formats.

### Train with the RecordIO Format
<a name="object-detection-recordio-training"></a>

If you use the RecordIO format for training, specify both train and validation channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to `application/x-recordio`. An example of how to generate a RecordIO file can be found in the object detection sample notebook. You can also use tools from [MXNet's GluonCV](https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html) to generate RecordIO files for popular datasets like the [PASCAL Visual Object Classes](http://host.robots.ox.ac.uk/pascal/VOC/) and [Common Objects in Context (COCO)](http://cocodataset.org/#home).
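As a rough illustration of the channel setup described above, the following Python sketch builds the two `InputDataConfig` entries as plain dictionaries. The helper name `recordio_channel`, the bucket name, and the .rec file names are hypothetical placeholders, and the `S3Prefix` and `FullyReplicated` settings are assumptions rather than requirements stated here.

```python
def recordio_channel(name, s3_uri, input_mode="File"):
    """Build one InputDataConfig entry that points at a RecordIO (.rec) file."""
    return {
        "ChannelName": name,
        "ContentType": "application/x-recordio",
        "InputMode": input_mode,  # "File" or "Pipe"
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",                     # assumption
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",  # assumption
            }
        },
    }


# Hypothetical S3 locations; replace with your own bucket and keys.
input_data_config = [
    recordio_channel("train", "s3://your_bucket/train/train.rec"),
    recordio_channel("validation", "s3://your_bucket/validation/val.rec"),
]
```

The resulting list could then be passed as the `InputDataConfig` value of a `CreateTrainingJob` request.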

### Train with the Image Format
<a name="object-detection-image-training"></a>

If you use the image format for training, specify `train`, `validation`, `train_annotation`, and `validation_annotation` channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify the individual image data (.jpg or .png) files for the train and validation channels. For annotation data, you can use the JSON format. Specify the corresponding .json files in the `train_annotation` and `validation_annotation` channels. Set the content type for all four channels to `image/png` or `image/jpeg` based on the image type. You can also use the content type `application/x-image` when your dataset contains both .jpg and .png images. The following is an example of a .json file.

```
{
   "file": "your_image_directory/sample_image1.jpg",
   "image_size": [
      {
         "width": 500,
         "height": 400,
         "depth": 3
      }
   ],
   "annotations": [
      {
         "class_id": 0,
         "left": 111,
         "top": 134,
         "width": 61,
         "height": 128
      },
      {
         "class_id": 0,
         "left": 161,
         "top": 250,
         "width": 79,
         "height": 143
      },
      {
         "class_id": 1,
         "left": 101,
         "top": 185,
         "width": 42,
         "height": 130
      }
   ],
   "categories": [
      {
         "class_id": 0,
         "name": "dog"
      },
      {
         "class_id": 1,
         "name": "cat"
      }
   ]
}
```

Each image needs a .json file for annotation, and the .json file should have the same name as the corresponding image. The name of the above .json file should be "sample\_image1.json". There are four properties in the annotation .json file. The property "file" specifies the relative path of the image file. For example, if your training images and corresponding .json files are stored in s3://*your\_bucket*/train/sample\_image and s3://*your\_bucket*/train\_annotation, specify the path for your train and train\_annotation channels as s3://*your\_bucket*/train and s3://*your\_bucket*/train\_annotation, respectively.

In the .json file, the relative path for an image named sample\_image1.jpg should be sample\_image/sample\_image1.jpg. The `"image_size"` property specifies the overall image dimensions. The SageMaker AI object detection algorithm currently only supports 3-channel images. The `"annotations"` property specifies the categories and bounding boxes for objects within the image. Each object is annotated by a `"class_id"` index and by four bounding box coordinates (`"left"`, `"top"`, `"width"`, `"height"`). The `"left"` (x-coordinate) and `"top"` (y-coordinate) values represent the upper-left corner of the bounding box. The `"width"` (x-direction) and `"height"` (y-direction) values represent the dimensions of the bounding box. The origin (0, 0) is the upper-left corner of the entire image. If you have multiple objects within one image, all the annotations should be included in a single .json file. The `"categories"` property stores the mapping between the class index and class name. The class indices should be numbered successively and the numbering should start with 0. The `"categories"` property is optional for the annotation .json file.
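The annotation layout described above can be sketched in Python as follows. This is a minimal illustration, not part of the algorithm: the helper names `make_annotation` and `annotation_filename` are hypothetical, and the input boxes are assumed to be given as pixel corner coordinates (xmin, ymin, xmax, ymax) that the helper converts to the `left`/`top`/`width`/`height` form.

```python
from pathlib import PurePosixPath


def make_annotation(image_rel_path, width, height, boxes, categories=None):
    """Build an annotation document for one image.

    boxes: iterable of (class_id, xmin, ymin, xmax, ymax) in pixels, with the
    origin (0, 0) at the upper-left corner of the image.
    categories: optional list of class names, indexed from 0.
    """
    annotations = []
    for class_id, xmin, ymin, xmax, ymax in boxes:
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(f"box {(xmin, ymin, xmax, ymax)} is outside the image")
        annotations.append({
            "class_id": class_id,
            "left": xmin,
            "top": ymin,
            "width": xmax - xmin,
            "height": ymax - ymin,
        })
    doc = {
        "file": image_rel_path,
        # Only 3-channel images are supported, so depth is fixed at 3.
        "image_size": [{"width": width, "height": height, "depth": 3}],
        "annotations": annotations,
    }
    if categories:  # the "categories" property is optional
        doc["categories"] = [
            {"class_id": i, "name": name} for i, name in enumerate(categories)
        ]
    return doc


def annotation_filename(image_rel_path):
    """The .json file shares the image's base name."""
    return PurePosixPath(image_rel_path).with_suffix(".json").name


example = make_annotation(
    "sample_image/sample_image1.jpg", 500, 400,
    boxes=[(0, 111, 134, 172, 262)],
    categories=["dog", "cat"],
)
example_name = annotation_filename("sample_image/sample_image1.jpg")
```

Serializing `example` with `json.dump` into a file named `example_name` would give one annotation file for the `train_annotation` channel.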

### Train with Augmented Manifest Image Format
<a name="object-detection-augmented-manifest-training"></a>

The augmented manifest format enables you to train in pipe mode using image files without needing to create RecordIO files. Specify both train and validation channels as values for the `InputDataConfig` parameter of the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. To use this format, generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file should be in [JSON Lines](http://jsonlines.org/) format, in which each line represents one sample. The images are specified using the `"source-ref"` tag, which points to the S3 location of the image. The annotations are provided under the `"AttributeNames"` parameter value as specified in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Each line can also contain additional metadata under the `metadata` tag, but the algorithm ignores it. In the following example, the `"AttributeNames"` are contained in the list `["source-ref", "bounding-box"]`:

```
{"source-ref": "s3://your_bucket/image1.jpg", "bounding-box":{"image_size":[{ "width": 500, "height": 400, "depth":3}], "annotations":[{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}, {"class_id": 5, "left": 161, "top": 250, "width": 80, "height": 50}]}, "bounding-box-metadata":{"class-map":{"0": "dog", "5": "horse"}, "type": "groundtruth/object-detection"}}
{"source-ref": "s3://your_bucket/image2.jpg", "bounding-box":{"image_size":[{ "width": 400, "height": 300, "depth":3}], "annotations":[{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}]}, "bounding-box-metadata":{"class-map":{"1": "cat"}, "type": "groundtruth/object-detection"}}
```

The order of `"AttributeNames"` in the input files matters when training the Object Detection algorithm. It accepts piped data in a specific order, with `image` first, followed by `annotations`. So the `"AttributeNames"` in this example are provided with `"source-ref"` first, followed by `"bounding-box"`. When using Object Detection with an augmented manifest, the `RecordWrapperType` parameter must be set to `"RecordIO"`.
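The following Python sketch shows one way to produce manifest lines in this shape and a channel fragment that keeps the attribute order. The helper name `manifest_line`, the bucket, and the manifest key are hypothetical, and the `FullyReplicated` setting is an assumption.

```python
import json

# Order matters: the image reference first, then the annotations.
ATTRIBUTE_NAMES = ["source-ref", "bounding-box"]


def manifest_line(image_s3_uri, width, height, annotations):
    """Serialize one sample as a single JSON Lines record."""
    record = {
        "source-ref": image_s3_uri,
        "bounding-box": {
            "image_size": [{"width": width, "height": height, "depth": 3}],
            "annotations": annotations,
        },
    }
    return json.dumps(record)


lines = [
    manifest_line(
        "s3://your_bucket/image1.jpg", 500, 400,
        [{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}],
    ),
    manifest_line(
        "s3://your_bucket/image2.jpg", 400, 300,
        [{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}],
    ),
]
manifest_text = "\n".join(lines) + "\n"

# Fragment of a training channel that consumes the manifest in pipe mode.
train_channel = {
    "ChannelName": "train",
    "InputMode": "Pipe",
    "RecordWrapperType": "RecordIO",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "AugmentedManifestFile",
            "S3Uri": "s3://your_bucket/train.manifest",   # hypothetical key
            "AttributeNames": ATTRIBUTE_NAMES,
            "S3DataDistributionType": "FullyReplicated",  # assumption
        }
    },
}
```

Writing `manifest_text` to a file and uploading it to the `S3Uri` location would complete the setup for the train channel; the validation channel follows the same pattern.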

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

### Incremental Training
<a name="object-detection-incremental-training"></a>

You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI object detection models can be seeded only with another built-in object detection model trained in SageMaker AI.

To use a pretrained model, in the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, specify the `ChannelName` as "model" in the `InputDataConfig` parameter. Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the `base_network` and `num_classes` input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.
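A small pre-flight check can catch mismatched architecture settings before a job is submitted. The Python sketch below is one possible approach; the helper names, the S3 path, and the `S3Prefix` and `FullyReplicated` settings are hypothetical or assumed. It compares values as strings and falls back to the documented default of `vgg-16` when `base_network` is omitted.

```python
ARCHITECTURE_KEYS = ("base_network", "num_classes")
DEFAULTS = {"base_network": "vgg-16"}  # documented default; num_classes is required


def architecture_mismatches(new_hp, pretrained_hp):
    """Return the architecture hyperparameters that differ between two jobs."""
    mismatched = []
    for key in ARCHITECTURE_KEYS:
        new_value = str(new_hp.get(key, DEFAULTS.get(key)))
        old_value = str(pretrained_hp.get(key, DEFAULTS.get(key)))
        if new_value != old_value:
            mismatched.append(key)
    return mismatched


def model_channel(model_s3_uri):
    """Build the InputDataConfig entry that seeds training with a prior model."""
    return {
        "ChannelName": "model",
        "ContentType": "application/x-sagemaker-model",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",                     # assumption
                "S3Uri": model_s3_uri,
                "S3DataDistributionType": "FullyReplicated",  # assumption
            }
        },
    }


# Hypothetical location of the .tar.gz artifacts from an earlier job.
channel = model_channel("s3://your_bucket/output/model.tar.gz")
ok = architecture_mismatches(
    {"num_classes": "20"}, {"base_network": "vgg-16", "num_classes": 20}
)
bad = architecture_mismatches(
    {"base_network": "resnet-50", "num_classes": "20"}, {"num_classes": "20"}
)
```

Here `ok` is empty because both jobs resolve to the same settings, while `bad` names the hyperparameter that would need to be aligned.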

For more information on incremental training and for instructions on how to use it, see [Use Incremental Training in Amazon SageMaker AI](incremental-training.md). 

## EC2 Instance Recommendation for the Object Detection Algorithm
<a name="object-detection-instances"></a>

The object detection algorithm supports P2, P3, G4dn, and G5 GPU instance families. We recommend using GPU instances with more memory for training with large batch sizes. You can run the object detection algorithm in multi-GPU and multi-machine settings for distributed training.

You can use both CPU (such as C5 and M5) and GPU (such as P3 and G4dn) instances for inference.

## Object Detection Sample Notebooks
<a name="object-detection-sample-notebooks"></a>

For a sample notebook that shows how to use the SageMaker AI Object Detection algorithm to train and host a model on the [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/datasets/cub_200_2011/) dataset using the Single Shot multibox Detector algorithm, see [Amazon SageMaker AI Object Detection for Bird Species](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/object_detection_birds/object_detection_birds.html). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. The object detection example notebook using the Object Detection algorithm is located in the **Introduction to Amazon Algorithms** section. To open a notebook, choose its **Use** tab and select **Create copy**.

For more information about the Amazon SageMaker AI Object Detection algorithm, see the following blog posts:
+ [Training the Amazon SageMaker AI object detection model and running it on AWS IoT Greengrass – Part 1 of 3: Preparing training data](https://aws.amazon.com/blogs/iot/sagemaker-object-detection-greengrass-part-1-of-3/)
+ [Training the Amazon SageMaker AI object detection model and running it on AWS IoT Greengrass – Part 2 of 3: Training a custom object detection model](https://aws.amazon.com/blogs/iot/sagemaker-object-detection-greengrass-part-2-of-3/)
+ [Training the Amazon SageMaker AI object detection model and running it on AWS IoT Greengrass – Part 3 of 3: Deploying to the edge](https://aws.amazon.com/blogs/iot/sagemaker-object-detection-greengrass-part-3-of-3/)

# How Object Detection Works
<a name="algo-object-detection-tech-notes"></a>

The object detection algorithm identifies and locates all instances of objects in an image from a known collection of object categories. The algorithm takes an image as input and outputs the category that each object belongs to, along with a confidence score that it belongs to the category. The algorithm also predicts the object's location and scale with a rectangular bounding box. Amazon SageMaker AI Object Detection uses the [Single Shot multibox Detector (SSD)](https://arxiv.org/pdf/1512.02325.pdf) algorithm, which takes a convolutional neural network (CNN) pretrained for a classification task as the base network. SSD uses the output of intermediate layers as features for detection.

Various CNNs such as [VGG](https://arxiv.org/pdf/1409.1556.pdf) and [ResNet](https://arxiv.org/pdf/1603.05027.pdf) have achieved great performance on the image classification task. Object detection in Amazon SageMaker AI supports both VGG-16 and ResNet-50 as a base network for SSD. The algorithm can be trained in full training mode or in transfer learning mode. In full training mode, the base network is initialized with random weights and then trained on user data. In transfer learning mode, the base network weights are loaded from pretrained models.

The object detection algorithm uses standard data augmentation operations, such as flip, rescale, and jitter, on the fly internally to help avoid overfitting.

# Object Detection Hyperparameters
<a name="object-detection-api-config"></a>

In the [`CreateTrainingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters that are used to help estimate the parameters of the model from a training dataset. The following table lists the hyperparameters provided by Amazon SageMaker AI for training the object detection algorithm. For more information about how object detection works, see [How Object Detection Works](algo-object-detection-tech-notes.md).


| Parameter Name | Description | 
| --- | --- | 
| num\_classes |  The number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. **Required** Valid values: positive integer  | 
| num\_training\_samples |  The number of training examples in the input dataset.  If there is a mismatch between this value and the number of samples in the training set, then the behavior of the `lr_scheduler_step` parameter will be undefined and distributed training accuracy may be affected.  **Required** Valid values: positive integer  | 
| base\_network |  The base network architecture to use. **Optional** Valid values: 'vgg-16' or 'resnet-50' Default value: 'vgg-16'  | 
| early\_stopping |  `True` to use early stopping logic during training. `False` not to use it. **Optional** Valid values: `True` or `False` Default value: `False`  | 
| early\_stopping\_min\_epochs |  The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 10  | 
| early\_stopping\_patience |  The number of epochs to wait before ending training if no improvement, as defined by the `early_stopping_tolerance` hyperparameter, is made in the relevant metric. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 5  | 
| early\_stopping\_tolerance |  The tolerance value that the relative improvement in `validation:mAP`, the mean average precision (mAP), is required to exceed to avoid early stopping. If the ratio of the change in the mAP divided by the previous best mAP is smaller than the `early_stopping_tolerance` value set, early stopping considers that there is no improvement. It is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0  | 
| image\_shape |  The image size for input images. We rescale the input image to a square image with this size. We recommend using 300 and 512 for better performance. **Optional** Valid values: positive integer ≥300 Default: 300  | 
| epochs |  The number of training epochs.  **Optional** Valid values: positive integer Default: 30  | 
| freeze\_layer\_pattern |  The regular expression (regex) for freezing layers in the base network. For example, if we set `freeze_layer_pattern` = `"^(conv1_\|conv2_).*"`, then any layers with a name that begins with `"conv1_"` or `"conv2_"` are frozen, which means that the weights for these layers are not updated during training. The layer names can be found in the network symbol files [vgg16-symbol.json](http://data.mxnet.io/models/imagenet/vgg/vgg16-symbol.json) and [resnet-50-symbol.json](http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-symbol.json). Freezing layers can reduce training time significantly in exchange for modest losses in accuracy. This technique is commonly used in transfer learning where the lower layers in the base network do not need to be retrained. **Optional** Valid values: string Default: No layers frozen.  | 
| kv\_store |  The weight update synchronization mode used for distributed training. The weights can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See the [Distributed Training](https://mxnet.apache.org/api/faq/distributed_training) MXNet tutorial for details.  This parameter is not applicable to single machine training.  **Optional** Valid values: `'dist_sync'` or `'dist_async'` [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html) Default: -  | 
| label\_width |  The force padding label width used to sync across training and validation data. For example, if one image in the data contains at most 10 objects, and each object's annotation is specified with 5 numbers, [class\_id, left, top, width, height], then the `label_width` should be no smaller than (10\*5 + header information length). The header information length is usually 2. We recommend using a slightly larger `label_width` for the training, such as 60 for this example. **Optional** Valid values: Positive integer large enough to accommodate the largest annotation information length in the data. Default: 350  | 
| learning\_rate |  The initial learning rate. **Optional** Valid values: float in (0, 1] Default: 0.001  | 
| lr\_scheduler\_factor |  The ratio to reduce learning rate. Used in conjunction with the `lr_scheduler_step` parameter defined as `lr_new` = `lr_old` \* `lr_scheduler_factor`. **Optional** Valid values: float in (0, 1) Default: 0.1  | 
| lr\_scheduler\_step |  The epochs at which to reduce the learning rate. The learning rate is reduced by `lr_scheduler_factor` at epochs listed in a comma-delimited string: "epoch1, epoch2, ...". For example, if the value is set to "10, 20" and the `lr_scheduler_factor` is set to 1/2, then the learning rate is halved after the 10th epoch and then halved again after the 20th epoch. **Optional** Valid values: string Default: empty string  | 
| mini\_batch\_size |  The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/`num_gpu` training samples. For multi-machine training in `dist_sync` mode, the actual batch size is `mini_batch_size` \* number of machines. A large `mini_batch_size` usually leads to faster training, but it may cause out-of-memory errors. The memory usage is related to `mini_batch_size`, `image_shape`, and `base_network` architecture. For example, on a single p3.2xlarge instance, the largest `mini_batch_size` without an out-of-memory error is 32 with the `base_network` set to "resnet-50" and an `image_shape` of 300. With the same instance, you can use 64 as the `mini_batch_size` with the base network `vgg-16` and an `image_shape` of 300. **Optional** Valid values: positive integer Default: 32  | 
| momentum |  The momentum for `sgd`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1] Default: 0.9  | 
| nms\_threshold |  The non-maximum suppression threshold. **Optional** Valid values: float in (0, 1] Default: 0.45  | 
| optimizer |  The optimizer types. For details on optimizer values, see [MXNet's API](https://mxnet.apache.org/api/python/docs/api/). **Optional** Valid values: ['sgd', 'adam', 'rmsprop', 'adadelta'] Default: 'sgd'  | 
| overlap\_threshold |  The evaluation overlap threshold. **Optional** Valid values: float in (0, 1] Default: 0.5  | 
| use\_pretrained\_model |  Indicates whether to use a pre-trained model for training. If set to 1, then the pre-trained model with corresponding architecture is loaded and used for training. Otherwise, the network is trained from scratch. **Optional** Valid values: 0 or 1 Default: 1  | 
| weight\_decay |  The weight decay coefficient for `sgd` and `rmsprop`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1) Default: 0.0005  | 
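
The arithmetic behind `label_width`, `lr_scheduler_step`, and `mini_batch_size` in the table above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that a step value of 10 means the reduction applies once 10 epochs have completed; the algorithm's exact epoch indexing is not specified here.

```python
def min_label_width(max_objects, values_per_object=5, header_len=2):
    """Smallest label_width that fits the most heavily annotated image."""
    return max_objects * values_per_object + header_len


def learning_rate_at(completed_epochs, base_lr=0.001, factor=0.1, steps=""):
    """Learning rate after a number of completed epochs.

    steps is the comma-delimited lr_scheduler_step string, e.g. "10, 20".
    Assumption: each listed epoch triggers one multiplication by `factor`
    once that many epochs have finished.
    """
    boundaries = [int(s) for s in steps.split(",") if s.strip()]
    lr = base_lr
    for boundary in boundaries:
        if completed_epochs >= boundary:
            lr *= factor
    return lr


def per_gpu_batch(mini_batch_size, num_gpus):
    """Samples handled by each GPU on a single multi-GPU machine."""
    return mini_batch_size // num_gpus


def effective_batch_dist_sync(mini_batch_size, num_machines):
    """Actual batch size for multi-machine training in dist_sync mode."""
    return mini_batch_size * num_machines


width = min_label_width(10)  # 10 objects * 5 numbers + header of 2
lr_early = learning_rate_at(9, factor=0.5, steps="10, 20")
lr_mid = learning_rate_at(10, factor=0.5, steps="10, 20")
lr_late = learning_rate_at(20, factor=0.5, steps="10, 20")
```

For the table's example, `width` comes out slightly below the suggested value of 60, which leaves some headroom for images with more annotations.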

# Tune an Object Detection Model
<a name="object-detection-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the chosen hyperparameters to find the combination of values that results in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics Computed by the Object Detection Algorithm
<a name="object-detection-metrics"></a>

The object detection algorithm reports on a single metric during training: `validation:mAP`. When tuning a model, choose this metric as the objective metric.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:mAP |  Mean Average Precision (mAP) computed on the validation set.  |  Maximize  | 



## Tunable Object Detection Hyperparameters
<a name="object-detection-tunable-hyperparameters"></a>

Tune the Amazon SageMaker AI object detection model with the following hyperparameters. The hyperparameters that have the greatest impact on the object detection objective metric are: `mini_batch_size`, `learning_rate`, and `optimizer`.


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| learning\_rate |  ContinuousParameterRange  |  MinValue: 1e-6, MaxValue: 0.5  | 
| mini\_batch\_size |  IntegerParameterRanges  |  MinValue: 8, MaxValue: 64  | 
| momentum |  ContinuousParameterRange  |  MinValue: 0.0, MaxValue: 0.999  | 
| optimizer |  CategoricalParameterRanges  |  ['sgd', 'adam', 'rmsprop', 'adadelta']  | 
| weight\_decay |  ContinuousParameterRange  |  MinValue: 0.0, MaxValue: 0.999  | 
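
As an illustration, the recommended ranges above could be written as the `ParameterRanges` and objective portions of a tuning job configuration. The Python sketch below only builds the dictionaries; the variable names are hypothetical, and the values are given as strings on the assumption that the tuning API expects string-typed bounds.

```python
# Ranges taken from the table above.
parameter_ranges = {
    "ContinuousParameterRanges": [
        {"Name": "learning_rate", "MinValue": "1e-6", "MaxValue": "0.5"},
        {"Name": "momentum", "MinValue": "0.0", "MaxValue": "0.999"},
        {"Name": "weight_decay", "MinValue": "0.0", "MaxValue": "0.999"},
    ],
    "IntegerParameterRanges": [
        {"Name": "mini_batch_size", "MinValue": "8", "MaxValue": "64"},
    ],
    "CategoricalParameterRanges": [
        {"Name": "optimizer", "Values": ["sgd", "adam", "rmsprop", "adadelta"]},
    ],
}

# The only metric the algorithm reports, to be maximized.
tuning_objective = {"Type": "Maximize", "MetricName": "validation:mAP"}

tuned_names = sorted(
    entry["Name"] for group in parameter_ranges.values() for entry in group
)
```

In practice you might tune only a subset, for example `learning_rate`, `mini_batch_size`, and `optimizer`, which are noted above as having the greatest impact on the objective metric.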

# Object Detection Request and Response Formats
<a name="object-detection-in-formats"></a>

The following page describes the inference request and response formats for the Amazon SageMaker AI Object Detection - MXNet model.

## Request Format
<a name="object-detection-json"></a>

Query a trained model by using the model's endpoint. The endpoint takes .jpg and .png image formats with `image/jpeg` and `image/png` content-types.
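A minimal Python sketch of preparing such a request is shown below. It only assembles the arguments; the helper name `build_invoke_kwargs` and the endpoint name are hypothetical, and treating `.jpeg` as equivalent to `.jpg` is an assumption. The keyword names are chosen to match the `invoke_endpoint` call of the boto3 SageMaker Runtime client, to which the dictionary could be passed.

```python
from pathlib import Path

# File extension to content type, per the formats listed above.
CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",  # assumption: same handling as .jpg
    ".png": "image/png",
}


def build_invoke_kwargs(endpoint_name, image_path, image_bytes):
    """Assemble the arguments for one inference request."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"unsupported image type: {suffix!r}")
    return {
        "EndpointName": endpoint_name,
        "ContentType": CONTENT_TYPES[suffix],
        "Body": image_bytes,
    }


# Placeholder bytes stand in for the contents of a real image file.
request_kwargs = build_invoke_kwargs(
    "my-object-detection-endpoint", "photos/bird.PNG", b"\x89PNG..."
)
```

With a real image, `image_bytes` would come from reading the file in binary mode.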

## Response Formats
<a name="object-detection-recordio"></a>

The response is the class index with a confidence score and bounding box coordinates for all objects within the image, encoded in JSON format. The following is an example of a response .json file:

```
{"prediction":[
  [4.0, 0.86419455409049988, 0.3088374733924866, 0.07030484080314636, 0.7110607028007507, 0.9345266819000244],
  [0.0, 0.73376623392105103, 0.5714187026023865, 0.40427327156066895, 0.827075183391571, 0.9712159633636475],
  [4.0, 0.32643985450267792, 0.3677481412887573, 0.034883320331573486, 0.6318609714508057, 0.5967587828636169],
  [8.0, 0.22552496790885925, 0.6152569651603699, 0.5722782611846924, 0.882301390171051, 0.8985623121261597],
  [3.0, 0.42260299175977707, 0.019305512309074402, 0.08386176824569702, 0.39093565940856934, 0.9574796557426453]
]}
```

Each row in this .json file contains an array that represents a detected object. Each of these object arrays consists of a list of six numbers. The first number is the predicted class label. The second number is the associated confidence score for the detection. The last four numbers represent the bounding box coordinates [xmin, ymin, xmax, ymax]. These output bounding box corner coordinates are normalized by the overall image size. Note that this encoding is different from the one used by the input .json format. For example, in the first entry of the detection result, 0.3088374733924866 is the left coordinate (x-coordinate of upper-left corner) of the bounding box as a ratio of the overall image width, 0.07030484080314636 is the top coordinate (y-coordinate of upper-left corner) of the bounding box as a ratio of the overall image height, 0.7110607028007507 is the right coordinate (x-coordinate of lower-right corner) of the bounding box as a ratio of the overall image width, and 0.9345266819000244 is the bottom coordinate (y-coordinate of lower-right corner) of the bounding box as a ratio of the overall image height.

To avoid unreliable detection results, you might want to filter out the detection results with low confidence scores. In the [object detection sample notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/object_detection_birds/object_detection_birds.ipynb), we provide examples of scripts that use a threshold to remove low confidence detections and to plot bounding boxes on the original images.
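As a rough sketch of that filtering step, the Python function below parses a response body, drops detections under a threshold, and converts the normalized corners back to pixel `left`/`top`/`width`/`height` values. The function name, the default threshold of 0.3, and the sample response are illustrative assumptions, not values taken from the sample notebook.

```python
import json


def filter_detections(response_body, image_width, image_height, threshold=0.3):
    """Keep confident detections and convert them to pixel boxes.

    Each prediction row is [class_id, score, xmin, ymin, xmax, ymax], with
    the corner coordinates given as ratios of the image width and height.
    """
    kept = []
    for class_id, score, xmin, ymin, xmax, ymax in json.loads(response_body)["prediction"]:
        if score < threshold:
            continue  # drop low-confidence detections
        kept.append({
            "class_id": int(class_id),
            "score": score,
            "left": round(xmin * image_width),
            "top": round(ymin * image_height),
            "width": round((xmax - xmin) * image_width),
            "height": round((ymax - ymin) * image_height),
        })
    return kept


# Made-up response for a 400x200 image.
sample_body = json.dumps({"prediction": [
    [4.0, 0.86, 0.25, 0.1, 0.75, 0.9],
    [0.0, 0.20, 0.50, 0.4, 0.80, 0.9],
]})
confident = filter_detections(sample_body, 400, 200, threshold=0.5)
```

Only the first detection survives the 0.5 threshold in this example. A suitable threshold depends on the model and the cost of false positives in your application.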

For batch transform, the response is in JSON format, and the format is identical to the JSON format described above. The detection results of each image are represented as a JSON file. For example:

```
{"prediction": [[label_id, confidence_score, xmin, ymin, xmax, ymax], [label_id, confidence_score, xmin, ymin, xmax, ymax]]}
```

For more details on training and inference, see the [Object Detection Sample Notebooks](object-detection.md#object-detection-sample-notebooks).

## OUTPUT: JSON Response Format
<a name="object-detection-output-json"></a>

accept: application/json;annotation=1

```
{
   "image_size": [
      {
         "width": 500,
         "height": 400,
         "depth": 3
      }
   ],
   "annotations": [
      {
         "class_id": 0,
         "score": 0.943,
         "left": 111,
         "top": 134,
         "width": 61,
         "height": 128
      },
      {
         "class_id": 0,
         "score": 0.0013,
         "left": 161,
         "top": 250,
         "width": 79,
         "height": 143
      },
      {
         "class_id": 1,
         "score": 0.0133,
         "left": 101,
         "top": 185,
         "width": 42,
         "height": 130
      }
   ]
}
```