Offline store
The offline store is used for historical data when sub-second retrieval is not needed. It is typically used for data exploration, model training, and batch inference.
When you enable both the online and offline stores for a feature group, the two stores are kept in sync to avoid discrepancies between training and serving data. Note that a feature group whose online store uses the InMemory storage type does not currently support a corresponding offline store (there is no online-to-offline replication). For more information about ML model serving in Amazon SageMaker Feature Store, see Online store.
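The InMemory restriction can be checked before a create request is sent. The following is a minimal local sketch, assuming the request uses the OnlineStoreConfig keys `EnableOnlineStore` and `StorageType` from the SageMaker API, and assuming `StorageType` is treated as `Standard` when omitted. The function name `check_store_combination` is hypothetical, and the sketch does not call AWS.

```python
from typing import Any, Dict, List, Optional


def check_store_combination(
    online_store_config: Optional[Dict[str, Any]],
    offline_store_config: Optional[Dict[str, Any]],
) -> List[str]:
    """Return problems with the requested store combination (empty list if none).

    Hypothetical pre-flight check; it only inspects the dicts you pass in.
    """
    problems: List[str] = []
    online = online_store_config or {}
    online_enabled = bool(online.get("EnableOnlineStore"))
    # Assumption: an omitted StorageType means the Standard online store.
    storage_type = online.get("StorageType", "Standard")
    if online_enabled and storage_type == "InMemory" and offline_store_config is not None:
        problems.append(
            "An InMemory online store does not support a corresponding offline store"
        )
    return problems
```

An empty list means the combination passes this one check. It does not validate anything else in the request.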
The offline store supports the following TableFormat options. For information about the offline store contents, see OfflineStoreConfig in the Amazon SageMaker API Reference.
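The table format is selected through the `TableFormat` field of `OfflineStoreConfig`. The following minimal sketch builds that part of the request as a plain dict. It assumes only the `S3StorageConfig.S3Uri` and `TableFormat` keys, and it omits optional fields such as KMS keys and Data Catalog settings. The helper name `build_offline_store_config` and the bucket URI used below are hypothetical.

```python
from typing import Any, Dict

# TableFormat values described in this section; Glue is the default.
VALID_TABLE_FORMATS = ("Glue", "Iceberg")


def build_offline_store_config(s3_uri: str, table_format: str = "Glue") -> Dict[str, Any]:
    """Build an OfflineStoreConfig-shaped dict (hypothetical helper, no AWS calls)."""
    if table_format not in VALID_TABLE_FORMATS:
        raise ValueError(
            f"TableFormat must be one of {VALID_TABLE_FORMATS}, got {table_format!r}"
        )
    if not s3_uri.startswith("s3://"):
        raise ValueError("S3Uri must be an s3:// URI")
    return {
        "S3StorageConfig": {"S3Uri": s3_uri},
        "TableFormat": table_format,
    }
```

The resulting dict is intended to be passed as the `OfflineStoreConfig` argument of the boto3 SageMaker client's `create_feature_group` call, together with the other required parameters.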
Glue table format
The Glue format (default) is a standard Hive-style table format for AWS Glue. With AWS Glue, you can discover, prepare, move, and integrate data from multiple sources. AWS Glue also includes additional productivity and data ops tooling for authoring and running jobs and for implementing business workflows. For more information about AWS Glue, see What is AWS Glue?
Iceberg table format
The Iceberg format (recommended) is an open table format for very large analytic tables. With Iceberg, you can compact small data files into fewer, larger files within each partition, resulting in significantly faster queries. Compaction runs concurrently and does not affect ongoing read and write operations on the feature group. For more information about optimizing Iceberg tables, see the Amazon Athena and AWS Lake Formation user guides.
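One way to trigger compaction is through Amazon Athena. The following minimal sketch builds such a statement, assuming Athena's `OPTIMIZE ... REWRITE DATA USING BIN_PACK` syntax for Iceberg tables; confirm the exact syntax and options in the Athena user guide. The helper name `build_compaction_query` and the database and table names in the example are hypothetical. The identifier check is a deliberately conservative guard for unquoted names.

```python
import re

# Conservative check: letters, digits, and underscores only, so the names
# can be used unquoted in the statement.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def build_compaction_query(database: str, table: str) -> str:
    """Return an Athena OPTIMIZE statement that bin-packs small files in an Iceberg table."""
    for name in (database, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"OPTIMIZE {database}.{table} REWRITE DATA USING BIN_PACK"
```

The returned string can be submitted with the boto3 Athena client's `start_query_execution`, passing it as `QueryString`.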
Iceberg manages large collections of files as tables and supports modern analytical data lake operations. If you choose the Iceberg option when creating new feature groups, Amazon SageMaker Feature Store creates the Iceberg tables using the Parquet file format and registers the tables with the AWS Glue Data Catalog. For more information about Iceberg table formats, see Using Apache Iceberg tables.
Important
For feature groups that use the Iceberg table format, you must specify String as the feature type of the event time feature. If you specify any other type, feature group creation fails.
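Because a non-String event time type causes feature group creation to fail, the type can be checked locally first. The following minimal sketch assumes feature definitions shaped like the SageMaker API's `FeatureDefinitions` list (`FeatureName` and `FeatureType` keys). The function name `check_iceberg_event_time` and the feature names in the usage example are hypothetical.

```python
from typing import Dict, List


def check_iceberg_event_time(
    feature_definitions: List[Dict[str, str]],
    event_time_feature_name: str,
    table_format: str,
) -> None:
    """Raise ValueError if an Iceberg feature group's event time feature is not String."""
    types = {d["FeatureName"]: d["FeatureType"] for d in feature_definitions}
    if event_time_feature_name not in types:
        raise ValueError(f"event time feature {event_time_feature_name!r} is not defined")
    if table_format == "Iceberg" and types[event_time_feature_name] != "String":
        raise ValueError(
            "Iceberg table format requires a String event time feature, "
            f"got {types[event_time_feature_name]!r}"
        )


# Usage example with hypothetical feature names: passes silently.
check_iceberg_event_time(
    [
        {"FeatureName": "customer_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
    ],
    event_time_feature_name="event_time",
    table_format="Iceberg",
)
```

The check only covers the event time rule stated above. Other validation still happens on the service side.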