Image Classification - MXNet
The Amazon SageMaker AI image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network that can be trained from scratch or trained using transfer learning when a large number of training images is not available.
The recommended input format for the Amazon SageMaker AI image classification algorithm is Apache MXNet RecordIO.
Note
To maintain better interoperability with existing deep learning frameworks, this format differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.
For more information on convolutional networks, see:
- Deep residual learning for image recognition, Kaiming He, et al., 2016 IEEE Conference on Computer Vision and Pattern Recognition
Input/Output Interface for the Image Classification Algorithm
The SageMaker AI Image Classification algorithm supports both RecordIO (application/x-recordio) and image (image/png, image/jpeg, and application/x-image) content types for training in file mode, and supports the RecordIO (application/x-recordio) content type for training in pipe mode. However, you can also train in pipe mode using the image files (image/png, image/jpeg, and application/x-image), without creating RecordIO files, by using the augmented manifest format.
Distributed training is supported for file mode and pipe mode. When using the RecordIO content type in pipe mode, you must set the S3DataDistributionType of the S3DataSource to FullyReplicated. The algorithm supports a fully replicated model where your data is copied onto each machine.
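To make the mode and content-type rules concrete, the following minimal Python sketch checks one channel written as a plain dict in the shape of a CreateTrainingJob InputDataConfig entry. The function name check_channel is hypothetical and not part of any SageMaker SDK. It assumes that augmented manifest channels, like RecordIO channels, declare the application/x-recordio content type in pipe mode.

```python
# Hypothetical validator for the content-type and distribution rules above.
RECORDIO = "application/x-recordio"
IMAGE_TYPES = {"image/png", "image/jpeg", "application/x-image"}


def check_channel(channel: dict) -> list:
    """Return a list of rule violations for one InputDataConfig-style channel."""
    problems = []
    mode = channel.get("InputMode", "File")
    # Ignore content-type parameters such as "; label-format=class-id".
    content_type = channel.get("ContentType", "").split(";")[0].strip()
    s3 = channel["DataSource"]["S3DataSource"]

    if mode == "File" and content_type not in IMAGE_TYPES | {RECORDIO}:
        problems.append(f"File mode does not accept {content_type!r}")
    if mode == "Pipe":
        # Assumption: image files in pipe mode arrive via an augmented
        # manifest, which also uses the RecordIO content type.
        if content_type != RECORDIO:
            problems.append(f"Pipe mode expects {RECORDIO!r}, got {content_type!r}")
        elif s3.get("S3DataDistributionType") != "FullyReplicated":
            problems.append("RecordIO in Pipe mode requires FullyReplicated")
    return problems
```

A channel that returns an empty list satisfies the rules in this section; anything else lists the reasons it does not.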
The algorithm supports image/png, image/jpeg, and application/x-image for inference.
Train with RecordIO Format
If you use the RecordIO format for training, specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to application/x-recordio.
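As a sketch, the two RecordIO channels can be written as Python dicts in the shape that the InputDataConfig parameter expects. The helper name recordio_channels and the S3 URIs are placeholders, and the other required CreateTrainingJob parameters are omitted.

```python
def recordio_channels(train_uri: str, validation_uri: str, input_mode: str = "File") -> list:
    """Build train and validation channels that each point at one .rec file."""
    channels = []
    for name, uri in (("train", train_uri), ("validation", validation_uri)):
        channels.append({
            "ChannelName": name,
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": uri,
                # Required for RecordIO in pipe mode; also valid in file mode.
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "application/x-recordio",
            "InputMode": input_mode,
        })
    return channels


channels = recordio_channels(
    "s3://amzn-s3-demo-bucket/rec/train.rec",
    "s3://amzn-s3-demo-bucket/rec/val.rec",
)
```

The resulting list could be passed as InputDataConfig to the boto3 SageMaker client's create_training_job call, together with the remaining required parameters.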
Train with Image Format
If you use the Image format for training, specify train, validation, train_lst, and validation_lst channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify the individual image data (.jpg or .png files) for the train and validation channels. Specify one .lst file in each of the train_lst and validation_lst channels. Set the content type for all four channels to application/x-image.
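The four channels can be sketched the same way. This minimal example assumes a hypothetical layout in which each channel reads from its own folder under one base URI, which also keeps training and validation data in separate folders; the helper name image_channels is illustrative only.

```python
def image_channels(base_uri: str) -> list:
    """Build the four channels for training on individual .jpg/.png files."""
    channels = []
    for name in ("train", "validation", "train_lst", "validation_lst"):
        channels.append({
            "ChannelName": name,
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                # Hypothetical layout: one folder per channel under base_uri.
                "S3Uri": f"{base_uri.rstrip('/')}/{name}/",
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "application/x-image",
            "InputMode": "File",
        })
    return channels


channels = image_channels("s3://amzn-s3-demo-bucket/data/")
```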
Note
SageMaker AI reads the training and validation data separately from different channels, so you must store the training and validation data in different folders.
A .lst file is a tab-separated file with three columns that contains a list of image files. The first column specifies the image index, the second column specifies the class label index for the image, and the third column specifies the relative path of the image file. The image index in the first column must be unique across all of the images. The class label indices are numbered successively, starting at 0: for example, 0 for the cat class, 1 for the dog class, and so on for additional classes.
The following is an example of a .lst file:
5 1 your_image_directory/train_img_dog1.jpg
1000 0 your_image_directory/train_img_cat1.jpg
22 1 your_image_directory/train_img_dog2.jpg
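The .lst rules above can be sketched in Python as follows. The function build_lst_lines is hypothetical; it assigns image indices in input order and numbers classes from 0 in sorted name order, which is one convenient choice among many (any unique image indices and successive class indices would do).

```python
def build_lst_lines(samples):
    """Return (class_to_index, lines) for an .lst file.

    samples: list of (relative_path, class_name) tuples.
    """
    # Number the classes successively from 0; sorting gives a stable mapping.
    class_names = sorted({class_name for _, class_name in samples})
    class_to_index = {name: i for i, name in enumerate(class_names)}
    lines = []
    for image_index, (rel_path, class_name) in enumerate(samples):
        # Columns: unique image index, class label index, relative path.
        lines.append(f"{image_index}\t{class_to_index[class_name]}\t{rel_path}")
    return class_to_index, lines


mapping, lines = build_lst_lines([
    ("class_dog/train_image_dog1.jpg", "dog"),
    ("class_cat/train_image_cat1.jpg", "cat"),
])
```

Writing the file is then a matter of joining the lines with newlines. Note that the columns are separated by tab characters, not spaces.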
For example, if your training images are stored in s3://<your_bucket>/train/class_dog, s3://<your_bucket>/train/class_cat, and so on, specify the path for your train channel as s3://<your_bucket>/train, which is the top-level directory for your data. In the .lst file, specify the relative path for an individual file named train_image_dog1.jpg in the class_dog class directory as class_dog/train_image_dog1.jpg. You can also store all your image files under one subdirectory inside the train directory, such as s3://<your_bucket>/train/your_image_directory. In that case, include that subdirectory in the relative path, as in the your_image_directory/train_img_dog1.jpg entry of the example .lst file.
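The relative path is simply the image's S3 URI with the channel path removed. A minimal sketch, using a hypothetical helper named relative_path and the placeholder bucket amzn-s3-demo-bucket:

```python
def relative_path(image_uri: str, channel_uri: str) -> str:
    """Return the .lst relative path of an image under a channel's S3 path."""
    prefix = channel_uri.rstrip("/") + "/"
    if not image_uri.startswith(prefix):
        # The image must live under the channel's top-level directory.
        raise ValueError(f"{image_uri} is not under the channel path {prefix}")
    return image_uri[len(prefix):]


rel = relative_path(
    "s3://amzn-s3-demo-bucket/train/class_dog/train_image_dog1.jpg",
    "s3://amzn-s3-demo-bucket/train",
)
# rel == "class_dog/train_image_dog1.jpg"
```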
Train with Augmented Manifest Image Format
The augmented manifest format enables you to train in Pipe mode using image files without creating RecordIO files. You need to specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. When using this format, you must generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file should be in JSON Lines format, in which each line represents one sample. The images are specified using the 'source-ref' tag, which points to the S3 location of the image. The annotations are provided under the "AttributeNames" parameter value as specified in the CreateTrainingJob request. A line can also contain additional metadata under the metadata tag, but the algorithm ignores it. In the following example, the "AttributeNames" are contained in the list of image and annotation references ["source-ref", "class"]. The corresponding label value is "0" for the first image and "1" for the second image:
{"source-ref":"s3://image/filename1.jpg", "class":"0"}
{"source-ref":"s3://image/filename2.jpg", "class":"1", "class-metadata": {"class-name": "cat", "type" : "groundtruth/image-classification"}}
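Lines like these can be generated with Python's standard json module. This is a minimal sketch: the helper name manifest_lines is hypothetical, and it writes only the "source-ref" and "class" keys used in the example, omitting optional metadata.

```python
import json


def manifest_lines(records):
    """records: iterable of (s3_uri, class_index) pairs -> JSON Lines strings."""
    lines = []
    for s3_uri, class_index in records:
        # Keys match the AttributeNames in the example; the label is a string.
        lines.append(json.dumps({"source-ref": s3_uri, "class": str(class_index)}))
    return lines


out = manifest_lines([("s3://image/filename1.jpg", 0), ("s3://image/filename2.jpg", 1)])
```

Each string is one line of the manifest file, so the file body is the strings joined with newlines.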
The order of "AttributeNames" in the input files matters when training the ImageClassification algorithm. It accepts piped data in a specific order, with the image first, followed by the label. So the "AttributeNames" in this example are provided with "source-ref" first, followed by "class". When using the ImageClassification algorithm with an augmented manifest, the value of the RecordWrapperType parameter must be "RecordIO".
Multi-label training is also supported by specifying a JSON array of values. The num_classes hyperparameter must be set to match the total number of classes. There are two valid label formats: multi-hot and class-id.
In the multi-hot format, each label is a multi-hot encoded vector of all classes, where each class takes the value of 0 or 1. In the following example, there are three classes. The first image is labeled with classes 0 and 2, while the second image is labeled with class 2 only:
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[1, 0, 1]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[0, 0, 1]"}
In the class-id format, each label is a list of the class IDs, from [0, num_classes), that apply to the data point. The previous example would instead look like this:
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[0, 2]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[2]"}
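The two label formats carry the same information, so converting between them is mechanical. The following sketch uses hypothetical helper names and works on the label strings exactly as they appear in the manifest examples; it rejects class IDs outside [0, num_classes).

```python
import json


def multi_hot_to_class_id(label: str) -> str:
    """Convert a multi-hot label such as "[1, 0, 1]" to class-id form."""
    vector = json.loads(label)
    return json.dumps([i for i, value in enumerate(vector) if value == 1])


def class_id_to_multi_hot(label: str, num_classes: int) -> str:
    """Convert a class-id label such as "[0, 2]" to multi-hot form."""
    ids = json.loads(label)
    if any(not 0 <= i < num_classes for i in ids):
        raise ValueError(f"class IDs must be in [0, {num_classes}): {ids}")
    vector = [0] * num_classes
    for i in ids:
        vector[i] = 1
    return json.dumps(vector)
```

For the three-class example above, multi_hot_to_class_id("[1, 0, 1]") yields "[0, 2]", and class_id_to_multi_hot("[2]", 3) yields "[0, 0, 1]".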
The multi-hot format is the default, but you can set it explicitly in the content type with the label-format parameter: "application/x-recordio; label-format=multi-hot". The class-id format, which is the format output by Ground Truth, must be set explicitly: "application/x-recordio; label-format=class-id".
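Putting the augmented manifest requirements together, a channel might be sketched as the following dict. The helper name augmented_manifest_channel and the manifest URI are hypothetical; the field names (S3DataType set to AugmentedManifestFile, AttributeNames, RecordWrapperType) follow the CreateTrainingJob request shape, and omitting label_format leaves the default multi-hot interpretation.

```python
def augmented_manifest_channel(name, manifest_uri,
                               attribute_names=("source-ref", "class"),
                               label_format=None):
    """Build a Pipe-mode channel that reads an augmented manifest file."""
    content_type = "application/x-recordio"
    if label_format is not None:
        if label_format not in ("multi-hot", "class-id"):
            raise ValueError(f"unknown label format: {label_format}")
        content_type += f"; label-format={label_format}"
    return {
        "ChannelName": name,
        "DataSource": {"S3DataSource": {
            "S3DataType": "AugmentedManifestFile",
            "S3Uri": manifest_uri,
            "S3DataDistributionType": "FullyReplicated",
            # Order matters: the image reference first, then the label.
            "AttributeNames": list(attribute_names),
        }},
        "ContentType": content_type,
        # Required when this algorithm reads an augmented manifest.
        "RecordWrapperType": "RecordIO",
        "InputMode": "Pipe",
    }


train = augmented_manifest_channel(
    "train", "s3://amzn-s3-demo-bucket/train.manifest", label_format="class-id")
```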
For more information on augmented manifest files, see Augmented Manifest Files for Training Jobs.
Incremental Training
You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI image classification models can be seeded only with another built-in image classification model trained in SageMaker AI.
To use a pretrained model, in the CreateTrainingJob request, specify the ChannelName as "model" in the InputDataConfig parameter. Set the ContentType for the model channel to application/x-sagemaker-model. The new model and the pretrained model that you upload to the model channel must use the same values for the num_layers, image_shape, and num_classes hyperparameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.
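A minimal sketch of the incremental-training setup: a model channel dict plus a check that the architecture-defining hyperparameters match. The helper names and the artifact URI are hypothetical, and the comparison is done on strings because CreateTrainingJob passes hyperparameters as strings.

```python
ARCHITECTURE_KEYS = ("num_layers", "image_shape", "num_classes")


def model_channel(model_artifact_uri: str) -> dict:
    """Channel that seeds training with a previous job's model.tar.gz."""
    return {
        "ChannelName": "model",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": model_artifact_uri,
            "S3DataDistributionType": "FullyReplicated",
        }},
        "ContentType": "application/x-sagemaker-model",
    }


def architecture_mismatches(pretrained_hps: dict, new_hps: dict) -> list:
    """List the architecture hyperparameters whose values differ."""
    return [k for k in ARCHITECTURE_KEYS
            if str(pretrained_hps.get(k)) != str(new_hps.get(k))]


old = {"num_layers": "18", "image_shape": "3,224,224", "num_classes": "2", "epochs": "10"}
new = {"num_layers": "18", "image_shape": "3,224,224", "num_classes": "2", "epochs": "5"}
```

Other hyperparameters, such as epochs here, can differ freely; only an empty mismatch list indicates a compatible seed model.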
Inference with the Image Classification Algorithm
The generated models can be hosted for inference and support encoded .jpg and .png images with the image/png, image/jpeg, and application/x-image content types. The input image is resized automatically. The output is the probability values for all classes, encoded in JSON format or in JSON Lines text format. The following is an example of a response in JSON Lines format:
accept: application/jsonlines
{"prediction": [prob_0, prob_1, prob_2, prob_3, ...]}
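To turn a JSON Lines response into a predicted class, take the index of the largest probability. In this sketch, the function top_class is hypothetical. Its input is assumed to be the raw response bytes, for example what you read from the Body of a sagemaker-runtime invoke_endpoint response requested with Accept set to application/jsonlines. It also assumes that one image is sent per request.

```python
import json


def top_class(response_body: bytes):
    """Return (class_index, probability) for the most likely class."""
    # Assumes one image per request, so only the first line is read.
    first_line = response_body.decode("utf-8").strip().splitlines()[0]
    probs = json.loads(first_line)["prediction"]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]
```

For a multi-label model, you would instead keep every class whose probability exceeds a threshold you choose, rather than only the maximum.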
For more details on training and inference, see the image classification sample notebook listed under Image Classification Sample Notebooks.
EC2 Instance Recommendation for the Image Classification Algorithm
For image classification training, we support P2, P3, G4dn, and G5 instances. We recommend using GPU instances with more memory for training with large batch sizes. You can also run the algorithm in multi-GPU and multi-machine settings for distributed training. Both CPU (such as C4) and GPU (P2, P3, G4dn, or G5) instances can be used for inference.
Image Classification Sample Notebooks
For a sample notebook that uses the SageMaker AI image classification algorithm, see Build and Register an MXNet Image Classification Model via SageMaker Pipelines.