Built-in SageMaker Algorithms for Tabular Data
Amazon SageMaker provides built-in algorithms that are tailored to the analysis of tabular data. Tabular data refers to datasets organized in tables of rows (observations) and columns (features). The built-in SageMaker algorithms for tabular data can be used for either classification or regression problems.
- AutoGluon-Tabular—an open-source AutoML framework that succeeds by ensembling models and stacking them in multiple layers.
- CatBoost—an implementation of the gradient-boosted trees algorithm that introduces ordered boosting and an innovative algorithm for processing categorical features.
- Factorization Machines Algorithm—an extension of a linear model that is designed to economically capture interactions between features within high-dimensional sparse datasets (see the first sketch after this list).
- K-Nearest Neighbors (k-NN) Algorithm—a non-parametric method that uses the k nearest labeled points to assign a label to a new data point for classification, or to predict a target value as the average of the k nearest points' values for regression (see the second sketch after this list).
- LightGBM—an implementation of the gradient-boosted trees algorithm that adds two novel techniques for improved efficiency and scalability: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).
- Linear Learner Algorithm—learns a linear function for regression or a linear threshold function for classification.
- TabTransformer—a novel deep tabular data modeling architecture built on self-attention-based Transformers.
- XGBoost Algorithm with Amazon SageMaker—an implementation of the gradient-boosted trees algorithm that combines an ensemble of estimates from a set of simpler, weaker models.
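To make the factorized pairwise-interaction idea concrete, the following is a minimal NumPy sketch of the standard factorization machine scoring function: a bias, linear weights, and inner products of per-feature latent vectors, computed in linear time via the usual algebraic identity. This is an illustrative sketch of the general FM formulation, not SageMaker's implementation; the function name and toy values are hypothetical.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization machine score for one example.

    x  : (n,) feature vector (typically sparse or one-hot encoded in practice)
    w0 : global bias
    w  : (n,) linear weights
    V  : (n, k) latent factor vector for each feature
    """
    linear = w0 + w @ x
    # Pairwise interactions sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n*k)
    # using 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    xv = V.T @ x                    # (k,) sums of v_if * x_i
    xv_sq = (V ** 2).T @ (x ** 2)   # (k,) sums of (v_if * x_i)^2
    interactions = 0.5 * np.sum(xv ** 2 - xv_sq)
    return linear + interactions

# Toy usage: 4 features, k = 2 latent factors per feature.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 0.0])
print(fm_predict(x, w0=0.1, w=rng.normal(size=4), V=rng.normal(size=(4, 2))))
```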
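The k-NN prediction rule described above can also be sketched in a few lines: majority vote over the k nearest labels for classification, or the mean of the k nearest target values for regression. This illustrates the rule only and is not SageMaker's k-NN implementation; the function name and toy data are made up for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, task="classification"):
    """Predict from the k nearest training points: majority vote or mean."""
    # Euclidean distance from the query point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]      # indices of the k closest points
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Majority vote among the k nearest labels.
        values, counts = np.unique(neighbor_targets, return_counts=True)
        return values[np.argmax(counts)]
    # Regression: average of the k nearest target values.
    return neighbor_targets.mean()

# Toy usage with four labeled points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.0, 1.0]), k=3))  # -> 1
```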
| Algorithm name | Channel name | Training input mode | File type | Instance class | Parallelizable |
|---|---|---|---|---|---|
| AutoGluon-Tabular | training and (optionally) validation | File | CSV | CPU or GPU (single instance only) | No |
| CatBoost | training and (optionally) validation | File | CSV | CPU (single instance only) | No |
| Factorization Machines | train and (optionally) test | File or Pipe | recordIO-protobuf | CPU (GPU for dense data) | Yes |
| K-Nearest Neighbors (k-NN) | train and (optionally) test | File or Pipe | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances) | Yes |
| LightGBM | training and (optionally) validation | File | CSV | CPU (single instance only) | No |
| Linear Learner | train and (optionally) validation, test, or both | File or Pipe | recordIO-protobuf or CSV | CPU or GPU | Yes |
| TabTransformer | training and (optionally) validation | File | CSV | CPU or GPU (single instance only) | No |
| XGBoost (0.90-1, 0.90-2, 1.0-1, 1.2-1, 1.2-2) | train and (optionally) validation | File or Pipe | CSV, LibSVM, or Parquet | CPU (or GPU for 1.2-1) | Yes |
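As a hedged illustration of how the channel names, input modes, and file types in the table map onto a training job, the following sketch uses the SageMaker Python SDK to launch the built-in XGBoost algorithm with CSV data in File mode. The S3 paths, IAM role, instance type, and hyperparameter values are placeholders, not prescribed settings.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder IAM role

# Retrieve the built-in XGBoost container for one of the versions listed above.
image_uri = image_uris.retrieve(
    framework="xgboost", region=session.boto_region_name, version="1.2-2"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",  # CPU instance; see the instance class column
    output_path="s3://amzn-s3-demo-bucket/xgboost/output",  # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=100)

# XGBoost expects a "train" channel and (optionally) a "validation" channel;
# CSV is one of its supported file types, read here in File mode (the default).
estimator.fit(
    {
        "train": TrainingInput(
            "s3://amzn-s3-demo-bucket/xgboost/train/", content_type="text/csv"
        ),
        "validation": TrainingInput(
            "s3://amzn-s3-demo-bucket/xgboost/validation/", content_type="text/csv"
        ),
    }
)
```

The other algorithms follow the same pattern of named channels and content types, using the channel names and file types listed for them in the table.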