Linear models are supervised learning algorithms used
for solving either classification or regression problems. For input, you give the model
labeled examples (x, y). x is
a high-dimensional vector and y is a numeric label. For binary
classification problems, the label must be either 0 or 1. For multiclass classification
problems, the labels must be from 0 to num_classes
- 1. For regression
problems, y is a real number. The algorithm learns a linear function,
or, for classification problems, a linear threshold function, and maps a vector
x to an approximation of the label y.
The Amazon SageMaker AI linear learner algorithm provides a solution for both classification and regression problems. With the SageMaker AI algorithm, you can simultaneously explore different training objectives and choose the best solution from a validation set. You can also explore a large number of models and choose the best. The best model optimizes either of the following:
-
Continuous objectives, such as mean square error, cross entropy loss, absolute error.
-
Discrete objectives suited for classification, such as F1 measure, precision, recall, or accuracy.
Compared with methods that provide a solution for only continuous objectives, the SageMaker AI linear learner algorithm provides a significant increase in speed over naive hyperparameter optimization techniques. It is also more convenient.
The linear learner algorithm requires a data matrix, with rows representing the
observations, and columns representing the dimensions of the features. It also requires an
additional column that contains the labels that match the data points. At a minimum,
Amazon SageMaker AI linear learner requires you to specify input and output data locations, and
objective type (classification or regression) as arguments. The feature dimension is also
required. For more information, see CreateTrainingJob
. You can specify additional parameters in the
HyperParameters
string map of the request body. These parameters control
the optimization procedure, or specifics of the objective function that you train on. For
example, the number of epochs, regularization, and loss type.
If you're using Managed Spot Training, the linear learner algorithm supports using checkpoints to take a snapshot of the state of the model.
Topics
Input/Output interface for the linear learner
algorithm
The Amazon SageMaker AI linear learner algorithm supports three data channels: train, validation
(optional), and test (optional). If you provide validation data, the
S3DataDistributionType
should be FullyReplicated
. The
algorithm logs validation loss at every epoch, and uses a sample of the validation data
to calibrate and select the best model. If you don't provide validation data, the
algorithm uses a sample of the training data to calibrate and select the model. If you
provide test data, the algorithm logs include the test score for the final model.
For training, the linear learner algorithm supports
both recordIO-wrapped protobuf
and CSV
formats. For the
application/x-recordio-protobuf
input type, only Float32 tensors are
supported. For the text/csv
input type, the first column is assumed to be
the label, which is the target variable for prediction. You can use either File mode or
Pipe mode to train linear learner models on data that is formatted as
recordIO-wrapped-protobuf
or as CSV
.
For inference, the linear learner algorithm supports
the application/json
, application/x-recordio-protobuf
, and
text/csv
formats. When you make predictions on new data, the format of
the response depends on the type of model. For
regression (predictor_type='regressor'
), the
score
is the prediction produced by the model. For classification (predictor_type='binary_classifier'
or
predictor_type='multiclass_classifier'
), the model returns a
score
and also a predicted_label
. The
predicted_label
is the class predicted by the model and the
score
measures the strength of that prediction.
-
For binary classification,
predicted_label
is0
or1
, andscore
is a single floating point number that indicates how strongly the algorithm believes that the label should be 1. -
For multiclass classification, the
predicted_class
will be an integer from0
tonum_classes-1
, andscore
will be a list of one floating point number per class.
To interpret the score
in classification problems, you have to consider
the loss function used. If the loss
hyperparameter value is
logistic
for binary classification or softmax_loss
for
multiclass classification, then the score
can be interpreted as the
probability of the corresponding class. These are the loss values used by the linear
learner when the loss
value is auto
default value. But if the
loss is set to hinge_loss
, then the score cannot be interpreted as a
probability. This is because hinge loss corresponds to a Support Vector Classifier,
which does not produce probability estimates.
For more information on input and output file formats, see Linear learner response formats. For more information on inference formats, and the Linear learner sample notebooks.
EC2 instance recommendation for the linear learner
algorithm
The linear learner algorithm supports both CPU and GPU instances for training and inference. For GPU, the linear learner algorithm supports P2, P3, G4dn, and G5 GPU families.
During testing, we have not found substantial evidence that multi-GPU instances are faster than single-GPU instances. Results can vary, depending on your specific use case.
Linear learner sample notebooks
The following table outlines a variety of sample notebooks that address different use cases of Amazon SageMaker AI linear learner algorithm.
Notebook Title | Description |
---|---|
Using the MNIST dataset, we train a binary classifier to predict a single digit. |
|
Using UCI's Covertype dataset, we demonstrate how to train a multiclass classifier. |
|
How to Build a Machine Learning (ML) Pipeline for Inference?
|
Using a Scikit-learn container, we demonstrate how to build an end-to-end ML pipeline. |
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see Amazon SageMaker Notebook Instances. After you have created a notebook instance and opened it, choose the SageMaker AI Examples tab to see a list of all of the SageMaker AI samples. The topic modeling example notebooks using the linear learning algorithm are located in the Introduction to Amazon algorithms section. To open a notebook, choose its Use tab and choose Create copy.