Delete records from your feature groups
You can use the Amazon SageMaker Feature Store API to delete records from your feature groups. A feature group is an object that contains your machine learning (ML) data, where the columns of your data are described by features and your data are contained in records. A record contains values for features that are associated with a specific record identifier.
There are two storage configurations for your feature groups: online store and offline store. The online store keeps only the record with the latest event time and is typically used for real-time lookup for ML inference. The offline store keeps all records, acts as a historical database, and is typically used for feature exploration, ML training, and batch inference.
For more information on Feature Store concepts, see Ingestion diagrams.
There are two ways to delete records from your feature groups, soft delete and hard delete, and the behavior differs depending on the storage configuration. The following topics describe how to soft and hard delete records from the online and offline stores and provide examples.
Delete records from the online store
You can soft or hard delete a record from the online store using the DeleteRecord API by using the DeletionMode request parameter to specify SoftDelete (default) or HardDelete. For more information on the DeleteRecord API, see DeleteRecord in the Amazon SageMaker API Reference.
With the online store:
- When you soft delete (default), the record is no longer retrievable by GetRecord or BatchGetRecord, and the feature column values are set to null, except for the RecordIdentifier and EventTime feature values.
- When you hard delete, the record is completely removed from the online store.
In both cases, Feature Store appends the deleted record marker to the OfflineStore. The deleted record marker is a record with the same RecordIdentifier as the original, but with the is_deleted value set to True, EventTime set to the delete input EventTime, and other feature values set to null.
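As an illustration of the deleted record marker described above, the following minimal sketch models the marker as a Python dictionary. The function name, the feature names (customer_id, event_time, age, zip_code), and the sample values are hypothetical. This only mirrors the description and is not how Feature Store builds the row internally.

```python
from typing import Any, Dict, Iterable, Optional


def deleted_record_marker(
    record_id_name: str,
    record_id_value: str,
    event_time_name: str,
    delete_event_time: str,
    other_feature_names: Iterable[str],
) -> Dict[str, Optional[Any]]:
    """Model the offline store row described above (illustrative only)."""
    # Every other feature value is null in the marker.
    marker: Dict[str, Optional[Any]] = {name: None for name in other_feature_names}
    # The record identifier is the same as the original record's.
    marker[record_id_name] = record_id_value
    # EventTime is the event time passed to the delete request.
    marker[event_time_name] = delete_event_time
    # is_deleted flags the row as a deleted record marker.
    marker["is_deleted"] = True
    return marker


# Hypothetical feature group with two ordinary features.
marker = deleted_record_marker(
    "customer_id", "c-123", "event_time", "2024-01-02T00:00:00Z", ["age", "zip_code"]
)
print(marker)
```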
Note that the EventTime specified in DeleteRecord should be set later than the EventTime of the existing record in the OnlineStore for that same RecordIdentifier. If it is not, the deletion does not occur:
- For SoftDelete, the existing (not deleted) record remains in the OnlineStore, though the delete record marker is still written to the OfflineStore.
- HardDelete returns EventTime: 400 ValidationException to indicate that the delete operation failed. No delete record marker is written to the OfflineStore.
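The event time rule above can be summarized as a small decision function. This is a sketch of the documented outcomes only. The function name and the returned keys are hypothetical, and it treats an equal event time as "not later" based on the wording above, which may not match how the service handles ties.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Union

Outcome = Dict[str, Union[bool, Optional[str]]]


def online_delete_outcome(
    existing_event_time: datetime,
    delete_event_time: datetime,
    deletion_mode: str = "SoftDelete",
) -> Outcome:
    """Return the outcome the text above describes for one delete request."""
    if deletion_mode not in ("SoftDelete", "HardDelete"):
        raise ValueError(f"unknown deletion mode: {deletion_mode}")
    if delete_event_time > existing_event_time:
        # Normal case: the record is deleted and a marker goes to the offline store.
        return {"online_record_deleted": True, "offline_marker_written": True, "error": None}
    if deletion_mode == "SoftDelete":
        # The record stays online, but the marker is still written offline.
        return {"online_record_deleted": False, "offline_marker_written": True, "error": None}
    # HardDelete fails and writes nothing to the offline store.
    return {
        "online_record_deleted": False,
        "offline_marker_written": False,
        "error": "400 ValidationException",
    }


existing = datetime(2024, 1, 1, tzinfo=timezone.utc)
earlier = datetime(2023, 12, 31, tzinfo=timezone.utc)
later = datetime(2024, 1, 2, tzinfo=timezone.utc)

print(online_delete_outcome(existing, later, "HardDelete"))
print(online_delete_outcome(existing, earlier, "SoftDelete"))
print(online_delete_outcome(existing, earlier, "HardDelete"))
```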
The following examples use the SDK for Python (Boto3) delete_record operation. They require the following values:
- Feature group name (feature-group-name)
- Record identifier value as a string (record-identifier-value)
- Deletion event time (deletion-event-time)
The deletion event time should be later than the event time of the record you wish to delete.
Online store soft delete example
For soft delete, you need to use the DeleteRecord API. You can either use the default DeletionMode or set the DeletionMode to SoftDelete.
    import boto3

    # Runtime client for Feature Store record operations
    client = boto3.client('sagemaker-featurestore-runtime')

    # Soft delete the record from the online store
    client.delete_record(
        FeatureGroupName='feature-group-name',
        RecordIdentifierValueAsString='record-identifier-value',
        EventTime='deletion-event-time',
        TargetStores=[
            'OnlineStore',
        ],
        DeletionMode='SoftDelete'
    )
Online store hard delete example
For hard delete, you need to use the DeleteRecord API and set the DeletionMode to HardDelete.
    import boto3

    # Runtime client for Feature Store record operations
    client = boto3.client('sagemaker-featurestore-runtime')

    # Hard delete the record from the online store
    client.delete_record(
        FeatureGroupName='feature-group-name',
        RecordIdentifierValueAsString='record-identifier-value',
        EventTime='deletion-event-time',
        TargetStores=[
            'OnlineStore',
        ],
        DeletionMode='HardDelete'
    )
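Both online store examples pass the same parameters and differ only in DeletionMode, so the request can be built in one place. The following is a minimal sketch with a hypothetical helper, build_delete_record_request. It assumes the event time is sent as an ISO 8601 UTC string, which should be adjusted to match the format of your feature group's event time feature.

```python
from datetime import datetime, timezone
from typing import Any, Dict

VALID_DELETION_MODES = ("SoftDelete", "HardDelete")


def build_delete_record_request(
    feature_group_name: str,
    record_identifier_value: str,
    event_time: datetime,
    deletion_mode: str = "SoftDelete",
) -> Dict[str, Any]:
    """Build keyword arguments for delete_record (hypothetical helper)."""
    if deletion_mode not in VALID_DELETION_MODES:
        raise ValueError(f"DeletionMode must be one of {VALID_DELETION_MODES}")
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return {
        "FeatureGroupName": feature_group_name,
        "RecordIdentifierValueAsString": record_identifier_value,
        # Assumption: ISO 8601 in UTC; match your event time feature's format.
        "EventTime": event_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "TargetStores": ["OnlineStore"],
        "DeletionMode": deletion_mode,
    }


request = build_delete_record_request(
    "feature-group-name",
    "record-identifier-value",
    datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    deletion_mode="HardDelete",
)
print(request)

# To send the request (requires boto3 and AWS credentials):
# import boto3
# boto3.client("sagemaker-featurestore-runtime").delete_record(**request)
```

Keeping the request construction separate from the client call makes it possible to validate the deletion mode and event time before any request is sent.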
Delete records from the offline store
With Amazon SageMaker Feature Store, you can soft and hard delete a record from an OfflineStore that uses the Iceberg table format. With the OfflineStore Iceberg table format:
- When you soft delete a record, the latest version of the Iceberg table file will not contain the record, but previous versions will still contain the record and can be accessed using time travel. For information on time travel, see Querying Iceberg table data and performing time travel in the Athena user guide.
- When you hard delete a record, you are removing previous versions of the Iceberg table that contain the record. In this case, you should specify which versions of the Iceberg table you wish to delete.
Obtain your Iceberg table name
To soft and hard delete from your OfflineStore Iceberg table, you will need to obtain your Iceberg table name, iceberg-table-name. The following instructions assume you have already used Feature Store to create a feature group using the offline store storage configuration using the Iceberg table format, with DisableGlueTableCreation = False (default). For more information on creating feature groups, see Get started with Amazon SageMaker Feature Store.
To obtain your iceberg-table-name, use the DescribeFeatureGroup API to obtain DataCatalogConfig. This contains the metadata of the Glue table that serves as the data catalog for the OfflineStore. The TableName within the DataCatalogConfig is your iceberg-table-name.
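The lookup above can be sketched in Python as follows. The helper name iceberg_table_name and the sample response values are hypothetical. The sketch assumes DataCatalogConfig is nested under OfflineStoreConfig in the DescribeFeatureGroup response, so check the response shape in the API reference before relying on it.

```python
from typing import Any, Dict


def iceberg_table_name(describe_response: Dict[str, Any]) -> str:
    """Extract TableName from a DescribeFeatureGroup response (sketch)."""
    offline_config = describe_response.get("OfflineStoreConfig")
    if offline_config is None:
        raise ValueError("The feature group has no offline store configuration")
    catalog_config = offline_config.get("DataCatalogConfig")
    if catalog_config is None:
        raise ValueError("No DataCatalogConfig found (was Glue table creation disabled?)")
    return catalog_config["TableName"]


# Hypothetical, trimmed response used to exercise the helper offline.
sample_response = {
    "FeatureGroupName": "feature-group-name",
    "OfflineStoreConfig": {
        "TableFormat": "Iceberg",
        "DataCatalogConfig": {
            "TableName": "feature_group_name_1700000000",
            "Catalog": "AwsDataCatalog",
            "Database": "sagemaker_featurestore",
        },
    },
}
print(iceberg_table_name(sample_response))

# Against the real service (requires boto3 and AWS credentials):
# import boto3
# response = boto3.client("sagemaker").describe_feature_group(
#     FeatureGroupName="feature-group-name"
# )
# print(iceberg_table_name(response))
```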
Amazon Athena offline store soft and hard delete example
The following instructions use Amazon Athena to soft delete then hard delete a record from the OfflineStore Iceberg table. This assumes that the record you intend to delete in your OfflineStore is a deleted record marker. For information on the deleted record marker in your OfflineStore, see Delete records from the online store.
- Obtain your Iceberg table name, iceberg-table-name. For information on how to obtain your Iceberg table name, see Obtain your Iceberg table name.
- Run the DELETE command to soft delete the records on the OfflineStore, such that the latest version (or snapshot) of the Iceberg table will not contain the records. The following example deletes the records where is_deleted is 'True' and the previous event-time versions of those records. You may add additional conditions based on other features to restrict the deletion. For more information on using DELETE with Athena, see DELETE in the Athena user guide.

      DELETE FROM iceberg-table-name
      WHERE record-id-feature-name IN (
          SELECT record-id-feature-name
          FROM iceberg-table-name
          WHERE is_deleted = 'True')

  The soft deleted records are still viewable on previous file versions by performing time travel. For information on performing time travel, see Querying Iceberg table data and performing time travel in the Athena user guide.
- Remove the record from previous versions of your Iceberg tables to hard delete the record from the OfflineStore:
  - Run the OPTIMIZE command to rewrite the data files into a more optimized layout, based on their size and number of associated delete files. For more information on optimizing Iceberg tables and the syntax, see Optimizing Iceberg tables in the Athena user guide.

        OPTIMIZE iceberg-table-name REWRITE DATA USING BIN_PACK

  - (Optional, only needs to run once) Run the ALTER TABLE command to set the Iceberg table properties that determine when previous file versions are hard deleted. This can be done by assigning values to the vacuum_min_snapshots_to_keep and vacuum_max_snapshot_age_seconds properties. For more information on altering your Iceberg table properties, see ALTER TABLE SET PROPERTIES in the Athena user guide. For more information on Iceberg table property key-value pairs, see Table properties in the Athena user guide.

        ALTER TABLE iceberg-table-name SET TBLPROPERTIES (
            'vacuum_min_snapshots_to_keep'='your-specified-value',
            'vacuum_max_snapshot_age_seconds'='your-specified-value'
        )

  - Run the VACUUM command to remove data files that are no longer needed for your Iceberg tables and are not referenced by the current version. The VACUUM command should run after the deleted record is no longer referenced in the current snapshot. For example, run it vacuum_max_snapshot_age_seconds after the deletion. For more information on VACUUM with Athena and the syntax, see VACUUM.

        VACUUM iceberg-table-name
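The Athena steps above can be collected into an ordered list of statements. The following is a minimal sketch with a hypothetical helper, athena_delete_statements, and hypothetical table and feature names. It assumes the table and feature names are plain identifiers (letters, digits, and underscores) and only builds the SQL text. Each statement would still need to be submitted to Athena separately, waiting for one to finish before running the next.

```python
import re
from typing import List, Optional

# Assumption: names are plain SQL identifiers, so no quoting is needed.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def athena_delete_statements(
    table: str,
    record_id_feature: str,
    min_snapshots_to_keep: Optional[int] = None,
    max_snapshot_age_seconds: Optional[int] = None,
) -> List[str]:
    """Return the soft delete and hard delete statements in run order."""
    table = _checked(table)
    column = _checked(record_id_feature)
    statements = [
        # Soft delete: drop marked records and their earlier versions.
        f"DELETE FROM {table} WHERE {column} IN "
        f"(SELECT {column} FROM {table} WHERE is_deleted = 'True')",
        # Rewrite data files so deleted rows leave the current layout.
        f"OPTIMIZE {table} REWRITE DATA USING BIN_PACK",
    ]
    properties = []
    if min_snapshots_to_keep is not None:
        properties.append(f"'vacuum_min_snapshots_to_keep'='{int(min_snapshots_to_keep)}'")
    if max_snapshot_age_seconds is not None:
        properties.append(f"'vacuum_max_snapshot_age_seconds'='{int(max_snapshot_age_seconds)}'")
    if properties:
        # Optional step: only needed once per table.
        statements.append(f"ALTER TABLE {table} SET TBLPROPERTIES ({', '.join(properties)})")
    # Hard delete: remove files no longer referenced by retained snapshots.
    statements.append(f"VACUUM {table}")
    return statements


for statement in athena_delete_statements(
    "feature_group_name_1700000000", "customer_id", 1, 3600
):
    print(statement)
```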
Apache Spark offline store soft and hard delete example
To soft and then hard delete a record from the OfflineStore Iceberg table using Apache Spark, you can follow the same instructions as in the Amazon Athena offline store soft and hard delete example above, but using Spark procedures. For a full list of procedures, see Spark Procedures in the Apache Iceberg documentation.
- When soft deleting from the OfflineStore: instead of using the DELETE command in Athena, use the DELETE FROM command in Apache Spark.
- To remove the record from previous versions of your Iceberg tables to hard delete the record from the OfflineStore:
  - When changing your Iceberg table configuration: instead of using the ALTER TABLE command from Athena, use the expire_snapshots procedure.
  - To remove no longer needed data files from your Iceberg tables: instead of using the VACUUM command in Athena, use the remove_orphan_files procedure.
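The Spark substitutions above can be sketched as Spark SQL statements built in Python. The helper spark_delete_statements, the catalog, database, table, and feature names, and the retention values are all hypothetical. The sketch assumes a SparkSession already configured with an Iceberg catalog, and it carries the is_deleted = 'True' filter over from the Athena example. The procedure arguments shown (table, older_than, retain_last) follow the Iceberg Spark procedures documentation as I understand it, so verify them against the version you run.

```python
from datetime import datetime
from typing import List


def spark_delete_statements(
    catalog: str,
    database: str,
    table: str,
    record_id_feature: str,
    expire_older_than: datetime,
    retain_last: int = 1,
) -> List[str]:
    """Return Spark SQL statements for the soft and hard delete (sketch)."""
    full_name = f"{catalog}.{database}.{table}"
    procedure_table = f"{database}.{table}"
    older_than = expire_older_than.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Soft delete: replaces the Athena DELETE command.
        f"DELETE FROM {full_name} WHERE {record_id_feature} IN "
        f"(SELECT {record_id_feature} FROM {full_name} WHERE is_deleted = 'True')",
        # Expire old snapshots: replaces the Athena ALTER TABLE step.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{procedure_table}', "
        f"older_than => TIMESTAMP '{older_than}', "
        f"retain_last => {int(retain_last)})",
        # Remove unreferenced files: replaces the Athena VACUUM command.
        f"CALL {catalog}.system.remove_orphan_files(table => '{procedure_table}')",
    ]


statements = spark_delete_statements(
    "my_catalog",
    "sagemaker_featurestore",
    "feature_group_name_1700000000",
    "customer_id",
    datetime(2024, 1, 2, 0, 0, 0),
)
for statement in statements:
    print(statement)

# With a configured SparkSession named spark (not created here):
# for statement in statements:
#     spark.sql(statement)
```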