

# Create, store, and share features with Feature Store
<a name="feature-store"></a>

The machine learning (ML) development process includes extracting raw data, transforming it into *features* (meaningful inputs for your ML model), and storing those features so they are available for data exploration, ML training, and ML inference. Amazon SageMaker Feature Store simplifies how you create, store, share, and manage features by providing purpose-built feature store options and reducing repetitive data processing and curation work.

Among other things, with Feature Store you can:
+ Simplify the processing, storing, retrieving, and sharing of features for ML development across accounts or within an organization.
+ Track your feature processing code development, apply your feature processor to the raw data, and ingest your features into Feature Store in a consistent way. This reduces training-serving skew, a common issue in ML where the difference between performance during training and serving can impact the accuracy of your ML model.
+ Store your features and associated metadata in feature groups, so features can be easily discovered and reused. Feature groups are mutable and can evolve their schema after creation.
+ Create feature groups that can be configured to include an online or offline store, or both, to manage your features and automate how features are stored for your ML tasks.
  + The online store retains only the latest records for your features. This is primarily designed for supporting real-time predictions that need low millisecond latency reads and high throughput writes.
  + The offline store keeps all records for your features as a historical database. This is primarily intended for data exploration, model training, and batch predictions.

The following diagram shows how you can use Feature Store as part of your ML pipeline. Once you read in your raw data, you can use Feature Store to transform the raw data into features and ingest them into your feature group. The features can be ingested via streaming or batches to the feature group's online and offline stores. The features can then be served for data exploration, model training, and real-time or batch inference.

![\[Where Feature Store fits in your machine learning pipeline.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-overview.png)


## How Feature Store works
<a name="how-feature-store-works"></a>

In Feature Store, features are stored in a collection called a *feature group*. You can visualize a feature group as a table in which each column is a feature, with a unique identifier for each row. In principle, a feature group is composed of features and values specific to each feature. A `Record` is a collection of values for features that correspond to a unique `RecordIdentifier`. Altogether, a `FeatureGroup` is a group of features defined in your `FeatureStore` to describe a `Record`.  

 You can use Feature Store in the following modes:  
+  **Online** – In online mode, features are read with low latency (milliseconds) reads and used for high throughput predictions. This mode requires a feature group to be stored in an online store.  
+  **Offline** – In offline mode, large streams of data are fed to an offline store, which can be used for training and batch inference. This mode requires a feature group to be stored in an offline store. The offline store uses your S3 bucket for storage and can also fetch data using Athena queries.  
+  **Online and Offline** – This includes both online and offline modes. 

You can ingest data into feature groups in Feature Store in two ways: streaming or in batches. When you ingest data through streaming, records are pushed to Feature Store by calling the synchronous `PutRecord` API. This API enables you to maintain the latest feature values in Feature Store and to push new feature values as soon as an update is detected.
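
For example, a streaming producer might push each updated record with a single `PutRecord` call. The following is a minimal sketch, not production code: the feature group name and feature values are hypothetical, and the actual service call is shown commented out so the shape of the request is the focus.

```python
import time

def to_put_record(features: dict) -> list:
    """Convert a plain dict of feature values into the Record structure
    that the PutRecord API expects (every value is passed as a string)."""
    return [{"FeatureName": name, "ValueAsString": str(value)}
            for name, value in features.items()]

record = to_put_record({
    "customer_id": 573291,          # record identifier
    "city": "Seattle",
    "EventTime": int(time.time()),  # required event time feature
})

# A hypothetical call through the Feature Store runtime client:
# import boto3
# featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
# featurestore_runtime.put_record(
#     FeatureGroupName="customers-feature-group",
#     Record=record,
# )
```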

Alternatively, Feature Store can process and ingest data in batches. For example, you can author features using Amazon SageMaker Data Wrangler and export a notebook from Data Wrangler. The notebook can be a SageMaker Processing job that ingests the features in batches to a Feature Store feature group. This mode allows for batch ingestion into the offline store. It also supports ingestion into the online store if the feature group is configured for both online and offline use.  

## Create feature groups
<a name="create-feature-groups"></a>

To ingest features into Feature Store, you must first define the feature group and the feature definitions (feature name and data type) for all features that belong to the feature group. After they are created, feature groups are mutable and can evolve their schema. Feature group names are unique within an AWS Region and AWS account. When creating a feature group, you can also create the metadata for the feature group. The metadata can contain a short description, storage configuration, features for identifying each record, and the event time. Furthermore, the metadata can include tags to store information such as the author, data source, version, and more. 
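
As a sketch of what a feature group definition with its metadata looks like, the following builds the request shape used by the `CreateFeatureGroup` API. All names, the bucket, and the role ARN are hypothetical, and the service call itself is commented out.

```python
# Hypothetical names throughout; a sketch of the CreateFeatureGroup request shape.
create_request = {
    "FeatureGroupName": "customers-feature-group",
    "RecordIdentifierFeatureName": "customer_id",  # feature identifying each record
    "EventTimeFeatureName": "EventTime",
    "FeatureDefinitions": [
        {"FeatureName": "customer_id", "FeatureType": "Integral"},
        {"FeatureName": "city", "FeatureType": "String"},
        {"FeatureName": "EventTime", "FeatureType": "Fractional"},
    ],
    # Storage configuration: an online store plus an offline store in S3.
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    "OfflineStoreConfig": {
        "S3StorageConfig": {"S3Uri": "s3://amzn-s3-demo-bucket/feature-store"}
    },
    # Metadata: short description and tags (author, data source, and so on).
    "Description": "Customer profile features",
    "Tags": [{"Key": "author", "Value": "data-team"},
             {"Key": "data-source", "Value": "crm-export"}],
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerFeatureStoreRole",
}
# import boto3
# boto3.client("sagemaker").create_feature_group(**create_request)
```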

**Important**  
`FeatureGroup` names or associated metadata such as description or tags should not contain any personal identifiable information (PII) or confidential information. 

## Find, discover, and share features
<a name="Find-discover-share-features"></a>

After you create a feature group in Feature Store, other authorized users of the feature store can share and discover it. Users can browse through a list of all feature groups in Feature Store or discover existing feature groups by searching by feature group name, description, record identifier name, creation date, and tags.  

## Real-time inference for features stored in the online store 
<a name="real-time-inference"></a>

With Feature Store, you can enrich your features stored in the online store in real time with data from a streaming source (clean stream data from another application) and serve the features with low millisecond latency for real-time inference.  

You can also perform joins across different `FeatureGroups` for real-time inference by querying two different `FeatureGroups` in the client application.  
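
A client-side join of this kind can be as simple as merging the feature lists returned by the two `GetRecord` calls. The sketch below uses hypothetical feature group contents and shows only the merge step; in practice each list would come from a `get_record` response.

```python
def merge_records(*records):
    """Flatten the feature lists returned by GetRecord for several
    feature groups into a single dict of model inputs."""
    merged = {}
    for record in records:
        for feature in record:
            merged[feature["FeatureName"]] = feature["ValueAsString"]
    return merged

# GetRecord-shaped responses from two hypothetical feature groups:
customer_record = [{"FeatureName": "customer_id", "ValueAsString": "573291"},
                   {"FeatureName": "city", "ValueAsString": "Seattle"}]
order_record = [{"FeatureName": "order_total", "ValueAsString": "42.50"}]

model_input = merge_records(customer_record, order_record)
# model_input → {'customer_id': '573291', 'city': 'Seattle', 'order_total': '42.50'}
```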

## Offline store for model training and batch inference
<a name="offline-store-for-model-training"></a>

Feature Store provides offline storage for feature values in your S3 bucket. Your data is stored in your S3 bucket using a prefixing scheme based on event time. The offline store is an append-only store, enabling Feature Store to maintain a historical record of all feature values. Data is stored in the offline store in Parquet format for optimized storage and query access.

You can query, explore, and visualize features using Data Wrangler from the console.  Feature Store supports combining data to produce, train, validate, and test data sets, and allows you to extract data at different points in time. 
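
For example, a point-in-time extraction of training data can be expressed as an Athena query against the offline store's table. The table name below is hypothetical; Feature Store generates it, so in practice you look it up from the feature group's metadata. The query execution is sketched with the SageMaker Python SDK's `athena_query` helper and is commented out.

```python
# Hypothetical Glue table name generated by Feature Store.
table_name = "customers_feature_group_26_10_30_00_1680000000"

# Extract feature values as they existed at a chosen point in time.
query_string = (
    f'SELECT customer_id, city, event_time FROM "{table_name}" '
    "WHERE event_time <= 1680000000.0"
)

# A sketch of running the query with the SageMaker Python SDK:
# query = customers_feature_group.athena_query()
# query.run(query_string=query_string,
#           output_location=f"s3://{s3_bucket_name}/query_results/")
# query.wait()
# training_df = query.as_dataframe()
```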

## Feature data ingestion
<a name="feature-data-ingestion"></a>

Feature generation pipelines can be created to process large batches (1 million rows of data or more) or small batches, and to write feature data to the offline or online store. Streaming sources such as Amazon Managed Streaming for Apache Kafka or Amazon Kinesis can also be used as data sources from which features are extracted and directly fed to the online store for training, inference, or feature creation.  

You can push records to Feature Store by calling the synchronous `PutRecord` API call. Since this is a synchronous API call, it allows small batches of updates to be pushed in a single API call. This enables you to maintain high freshness of the feature values and publish values as soon as an update is detected. These are also called *streaming features*. 

When feature data is ingested and updated, Feature Store stores historical data for all features in the offline store. You can also use Data Wrangler to process and engineer new features that can then be exported to a chosen S3 bucket to be accessed by Feature Store. For batch ingestion, you can configure a processing job to batch ingest your data into Feature Store, or you can pull feature values from your S3 bucket directly or query them using Athena.  

To remove a `Record` from your online store, use the [DeleteRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_DeleteRecord.html) API call. This also appends the deleted record to the offline store, preserving the historical record of the deletion.
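
A minimal sketch of the `DeleteRecord` request shape follows; the feature group name and identifier are hypothetical, and the service call itself is commented out.

```python
import time

delete_params = {
    "FeatureGroupName": "customers-feature-group",   # hypothetical name
    "RecordIdentifierValueAsString": "573291",
    # EventTime marks when the deletion occurred; it is written to the
    # offline store along with the deletion marker.
    "EventTime": str(int(time.time())),
}
# import boto3
# boto3.client("sagemaker-featurestore-runtime").delete_record(**delete_params)
```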

## Resilience in Feature Store
<a name="feature-store-resilience"></a>

Feature Store is distributed across multiple Availability Zones (AZs). An AZ is an isolated location within an AWS Region. If some AZs fail, Feature Store can use other AZs. For more information about AZs, see [Resilience in Amazon SageMaker AI](disaster-recovery-resiliency.md). 

# Get started with Amazon SageMaker Feature Store
<a name="feature-store-getting-started"></a>

The following topics give information about using Amazon SageMaker Feature Store. First, learn the Feature Store concepts. Then, learn how to manage permissions to use Feature Store, how to create and use feature groups from a Studio Classic, Jupyter, or JupyterLab notebook, how to use Feature Store through the console user interface, and how to delete feature groups using the console and the AWS SDK for Python (Boto3).

The instructions for using Feature Store through the console depend on whether you have enabled Studio or Studio Classic as your default experience. For information on accessing Studio Classic, see [Launch Amazon SageMaker Studio Classic Using the Amazon SageMaker AI Console](studio-launch.md#studio-launch-console).

**Topics**
+ [Feature Store concepts](feature-store-concepts.md)
+ [Adding policies to your IAM role](feature-store-adding-policies.md)
+ [Use Feature Store with SDK for Python (Boto3)](feature-store-create-feature-group.md)
+ [Using Amazon SageMaker Feature Store in the console](feature-store-use-with-studio.md)
+ [Delete a feature group](feature-store-delete-feature-group.md)

# Feature Store concepts
<a name="feature-store-concepts"></a>

We list common terms used in Amazon SageMaker Feature Store, followed by example diagrams to visualize a few concepts: 
+  **Feature Store**: Storage and data management layer for machine learning (ML) features. Serves as the single source of truth to store, retrieve, remove, track, share, discover, and control access to features. In the following example diagram, the Feature Store is a store for your feature groups, which contains your ML data, and provides additional services. 
+  **Online store**: Low latency, high availability store for a feature group that enables real-time lookup of records. The online store allows quick access to the latest record via the `GetRecord` API. 
+  **Offline store**: Stores historical data in your Amazon S3 bucket. The offline store is used when low (sub-second) latency reads are not needed. For example, the offline store can be used when you want to store and serve features for exploration, model training, and batch inference. 
+  **Feature group**: The main resource of Feature Store that contains the data and metadata used for training or predicting with a ML model. A feature group is a logical grouping of features used to describe records. In the following example diagram, a feature group contains your ML data. 
+  **Feature**: A property that is used as one of the inputs to train or predict using your ML model. In the Feature Store API a feature is an attribute of a record. In the following example diagram, a feature describes a column in your ML data table. 
+  **Feature definition**: Consists of a name and one of the data types: integral, string or fractional. A feature group contains a list of feature definitions. For more information on Feature Store data types, see [Data types](feature-store-quotas.md#feature-store-data-types). 
+  **Record**: Collection of values for features for a single record identifier. A combination of record identifier and event time values uniquely identify a record within a feature group. In the following example diagram, a record is a row in your ML data table. 
+  **Record identifier name**: The record identifier name is the name of the feature that identifies the records. It must refer to one of the names of a feature defined in the feature group's feature definitions. Each feature group is defined with a record identifier name. 
+  **Event time**: Timestamp that you provide corresponding to when the record event occurred. All records in a feature group must have a corresponding event time. The online store only contains the record corresponding to the latest event time, whereas the offline store contains all historic records. For more information on event time formats, see [Data types](feature-store-quotas.md#feature-store-data-types). 
+  **Ingestion**: Adding new records to a feature group. Ingestion is typically achieved via the `PutRecord` API. 

**Topics**
+ [Concepts overview diagram](#feature-store-concepts-overview)
+ [Ingestion diagrams](#feature-store-concepts-ingestion)

## Concepts overview diagram
<a name="feature-store-concepts-overview"></a>

The following example diagram conceptualizes a few Feature Store concepts: 

 ![\[An example representation of a feature group using an example table as reference.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-feature-group-components.png) 

The Feature Store contains your feature groups and a feature group contains your ML data. In the example diagram, the original feature group contains a data table that has three features (each describing a column) and two records (rows). 
+ A feature's definition describes the feature name and data type of the feature values that are associated with records. 
+ A record contains the feature values and is uniquely identified by its record identifier and must include the event time. 

## Ingestion diagrams
<a name="feature-store-concepts-ingestion"></a>

Ingestion is the action of adding a record or records to an existing feature group. The online and offline stores are updated differently for different storage use cases. 

**Ingestion to the online store example**

The online store acts as a real-time look-up of records and only keeps the most up-to-date records. Once a record is ingested into an existing online store, the updated online store will only keep the record with the latest event time.

In the following example diagram, the original online store contains a ML data table with one record. A record is ingested with the same record identifier name as the original record, and the ingested record has an earlier event time than the original record. As the updated online store only keeps the record with the latest event time, the updated online store contains the original record.

 ![\[An example showing how records are ingested in the online store.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-ingestion-online-store.png) 

**Ingestion to the offline store example**

The offline store acts as a historical look-up of records and keeps all records. After a new record is ingested into an existing offline store, the updated offline store will keep the new record. 

In the following example diagram, the original offline store contains a ML data table with one record. A record is ingested with the same record identifier name as the original record, and the ingested record has an event time earlier than the original record. As the updated offline store keeps all of the records, the updated offline store contains both records.

 ![\[An example showing how records are ingested in the offline store.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-ingestion-offline-store.png) 

# Adding policies to your IAM role
<a name="feature-store-adding-policies"></a>

To get started with Amazon SageMaker Feature Store, you must have a role and attach the required policy, `AmazonSageMakerFeatureStoreAccess`, to that role. The following is a walkthrough on how to view the policies attached to a role and how to add a policy to your role. For information on how to create a role, see [How to use SageMaker AI execution roles](sagemaker-roles.md). For information on how to get your execution role, see [Get your execution role](sagemaker-roles.md#sagemaker-roles-get-execution-role).

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane on the left of the IAM console, choose **Roles**.

1. In the search bar enter the role you are using for Amazon SageMaker Feature Store. 

   For examples on how to find your execution role ARN for a notebook within SageMaker AI, see [Get your execution role](sagemaker-roles.md#sagemaker-roles-get-execution-role). The role is at the end of the execution role ARN.

1. After you enter the role in the search bar, choose the role.

   Under **Permissions policies** you can view the policies attached to the role.

1. After you choose the role, choose **Add permissions**, then choose **Attach policies**.

1. In the search bar under **Other permissions policies** enter `AmazonSageMakerFeatureStoreAccess` and press enter. If the policy does not show, you may already have the policy attached, listed under your **Current permissions policies**.

1. After you press enter, select the **check box** next to the policy and then choose **Add permissions**.

1. After you have attached the policy to your role, the policy will appear under **Permissions policies** for your IAM role.

# Use Feature Store with SDK for Python (Boto3)
<a name="feature-store-create-feature-group"></a>

The feature group is the main Feature Store resource that contains your machine learning (ML) data and metadata stored in Amazon SageMaker Feature Store. A feature group is a logical grouping of features and records. A feature group’s definition is composed of configurations for its online and offline stores and a list of feature definitions that are used to describe the values of your records. The feature definitions must include a record identifier name and an event time name. For more information on feature store concepts, see [Feature Store concepts](feature-store-concepts.md).

Prior to using a feature store you typically load your dataset, run transformations, and set up your features for ingestion. This process has a lot of variation and is highly dependent on your data. The example code in the following topics refers to the [Introduction to Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/feature_store_introduction.html) and [Fraud Detection with Amazon SageMaker Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.html) example notebooks, respectively. Both use the AWS SDK for Python (Boto3). For more Feature Store examples and resources, see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

Feature Store supports the following feature types: `String`, `Fractional` (IEEE 64-bit floating point value), and `Integral` (`Int64`, a 64-bit signed integer value). The default type is `String`. This means that, if a column in your dataset is not of a `float` or `long` feature type, it defaults to `String` in your feature store.

You may use a schema to describe your data’s columns and data types. You pass this schema into `FeatureDefinitions`, a required parameter for a `FeatureGroup`. You can use the SDK for Python (Boto3), which has automatic data type detection when you use the `load_feature_definitions` function.

The default behavior when a new feature record is added with an already existing record ID is as follows. In the offline store, the new record will be appended. In the online store, if the event time of the new record is less than the existing event time then nothing will happen, but if the event time of the new record is greater than or equal to the existing event time, the record will be overwritten.
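
The online-store rule described above can be illustrated with a small, purely local simulation; this is not Feature Store code, just the overwrite logic expressed in Python.

```python
def online_store_put(store, record_id, event_time, values):
    """Apply the online store's overwrite rule: a new record replaces the
    stored one only when its event time is >= the stored event time."""
    existing = store.get(record_id)
    if existing is None or event_time >= existing["event_time"]:
        store[record_id] = {"event_time": event_time, "values": values}
    return store

store = {}
online_store_put(store, 573291, 100, {"city": "Seattle"})
online_store_put(store, 573291, 90, {"city": "Portland"})   # older: ignored
online_store_put(store, 573291, 100, {"city": "Tacoma"})    # equal: overwritten
# store[573291]["values"] → {'city': 'Tacoma'}
```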

When you create a new feature group you can choose one of the following table formats:
+ AWS Glue (Default)
+ Apache Iceberg

Ingesting data, especially when streaming, can result in a large number of small files deposited into the offline store. This can negatively impact query performance due to the higher number of file operations required. To avoid potential performance issues, use the Apache Iceberg table format when creating new feature groups. With Iceberg you can compact the small data files into fewer large files in the partition, resulting in significantly faster queries. This compaction operation is concurrent and does not affect ongoing read and write operations on the feature group. If you choose the Iceberg option when creating new feature groups, Amazon SageMaker Feature Store creates the Iceberg tables using the Parquet file format and registers the tables with the AWS Glue Data Catalog.

**Important**  
Note that for feature groups in Iceberg table format, you must specify `String` as the value for the event time. If you specify any other type, you can't create the feature group successfully.
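
As a hedged sketch of what changes in the `CreateFeatureGroup` request when you choose Iceberg: the offline store configuration sets the table format, and, per the note above, the event time feature is typed `String`. All names, the bucket, and the role ARN are hypothetical, and the service call is commented out.

```python
# Hypothetical names; a sketch of an Iceberg-format feature group request.
iceberg_request = {
    "FeatureGroupName": "orders-feature-group-iceberg",
    "RecordIdentifierFeatureName": "order_id",
    "EventTimeFeatureName": "EventTime",
    "FeatureDefinitions": [
        {"FeatureName": "order_id", "FeatureType": "String"},
        # For Iceberg table format, the event time must be a String feature.
        {"FeatureName": "EventTime", "FeatureType": "String"},
    ],
    "OfflineStoreConfig": {
        "TableFormat": "Iceberg",
        "S3StorageConfig": {"S3Uri": "s3://amzn-s3-demo-bucket/feature-store"},
    },
    "RoleArn": "arn:aws:iam::111122223333:role/SageMakerFeatureStoreRole",
}
# import boto3
# boto3.client("sagemaker").create_feature_group(**iceberg_request)
```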

The following topics provide example notebooks that demonstrate these workflows.

**Topics**
+ [Introduction to Feature Store example notebook](feature-store-introduction-notebook.md)
+ [Fraud detection with Feature Store example notebook](feature-store-fraud-detection-notebook.md)

# Introduction to Feature Store example notebook
<a name="feature-store-introduction-notebook"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

The example code on this page refers to the [Introduction to Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/feature_store_introduction.html) example notebook. We recommend that you run this notebook in Studio Classic, notebook instances, or JupyterLab because the code in this guide is conceptual and not fully functional if copied.

Use the following to clone the [aws/amazon-sagemaker-examples](https://github.com/aws/amazon-sagemaker-examples) GitHub repository, containing the example notebook:
+ **For Studio Classic**

  Launch Studio Classic. You can open Studio Classic if Studio or Studio Classic is enabled as your default experience. For instructions on how to open Studio Classic, see [Launch Amazon SageMaker Studio Classic Using the Amazon SageMaker AI Console](studio-launch.md#studio-launch-console).

  Clone the [aws/amazon-sagemaker-examples](https://github.com/aws/amazon-sagemaker-examples) GitHub repository to Studio Classic by following the steps in [Clone a Git Repository in Amazon SageMaker Studio Classic](studio-tasks-git.md).
+ **For Amazon SageMaker notebook instances**

  Launch a SageMaker notebook instance by following the instructions in [Access Notebook Instances](howitworks-access-ws.md).

Now that you have the SageMaker AI example notebooks, navigate to the `amazon-sagemaker-examples/sagemaker-featurestore` directory and open the [Introduction to Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/feature_store_introduction.html) example notebook.

## Step 1: Set up your SageMaker AI session
<a name="feature-store-setup"></a>

To start using Feature Store, create a SageMaker AI session. Then, set up the Amazon Simple Storage Service (Amazon S3) bucket that you want to use for your features. The Amazon S3 bucket is your offline store. The following code uses the SageMaker AI default bucket and adds a custom prefix to it.

**Note**  
The role that you use to run the notebook must have the following managed policies attached to it: `AmazonS3FullAccess` and `AmazonSageMakerFeatureStoreAccess`. For information about adding policies to your IAM role, see [Adding policies to your IAM role](feature-store-adding-policies.md).

```
# SageMaker Python SDK version 2.x is required
import sagemaker
import sys
```

```
import boto3
import pandas as pd
import numpy as np
import io
from sagemaker.session import Session
from sagemaker import get_execution_role

prefix = 'sagemaker-featurestore-introduction'
role = get_execution_role()

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
s3_bucket_name = sagemaker_session.default_bucket()
```

## Step 2: Inspect your data
<a name="feature-store-load-datasets"></a>

In this notebook example, we ingest synthetic data from the [GitHub repository](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-featurestore/data) that hosts the full notebook.

```
customer_data = pd.read_csv("data/feature_store_introduction_customer.csv")
orders_data = pd.read_csv("data/feature_store_introduction_orders.csv")

print(customer_data.head())
print(orders_data.head())
```

The following diagram illustrates the steps that data goes through before Feature Store ingests it. In this notebook, we illustrate the use case where you have data from multiple sources and want to store them independently in a Feature Store. Our example considers data from a data warehouse (customer data), and data from a real-time streaming service (order data).

![\[Feature group creation and data ingestion in Feature Store for this example notebook.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-intro-diagram.png)


## Step 3: Create feature groups
<a name="feature-store-set-up-feature-groups-introduction"></a>

We start by creating feature group names for `customer_data` and `orders_data`. Following this, we create two feature groups, one for `customer_data` and another for `orders_data`:

```
import time
from time import strftime, gmtime
customers_feature_group_name = 'customers-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
orders_feature_group_name = 'orders-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
```

Instantiate a `FeatureGroup` object for `customers_data` and `orders_data`:

```
from sagemaker.feature_store.feature_group import FeatureGroup

customers_feature_group = FeatureGroup(
    name=customers_feature_group_name, sagemaker_session=sagemaker_session
)
orders_feature_group = FeatureGroup(
    name=orders_feature_group_name, sagemaker_session=sagemaker_session
)
```

```
import time
current_time_sec = int(round(time.time()))
record_identifier_feature_name = "customer_id"
```

Append the `EventTime` feature to your data frames. This parameter is required and timestamps each data point:

```
customer_data["EventTime"] = pd.Series([current_time_sec]*len(customer_data), dtype="float64")
orders_data["EventTime"] = pd.Series([current_time_sec]*len(orders_data), dtype="float64")
```

Load feature definitions to your feature group:

```
customers_feature_group.load_feature_definitions(data_frame=customer_data)
orders_feature_group.load_feature_definitions(data_frame=orders_data)
```

The following calls `create` to create two feature groups, `customers_feature_group` and `orders_feature_group`, respectively:

```
customers_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True
)

orders_feature_group.create(
    s3_uri=f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_name=record_identifier_feature_name,
    event_time_feature_name="EventTime",
    role_arn=role,
    enable_online_store=True
)
```

To confirm that the feature groups were created, we display them by using the `DescribeFeatureGroup` and `ListFeatureGroups` APIs:

```
customers_feature_group.describe()
```

```
orders_feature_group.describe()
```

```
sagemaker_session.boto_session.client('sagemaker', region_name=region).list_feature_groups() # We use the boto client to list FeatureGroups
```

## Step 4: Ingest data into a feature group
<a name="feature-store-set-up-record-identifier-event-time"></a>

After the feature groups are created, we can put data into them. If you're using the SageMaker Python SDK, use the `ingest` API call. If you're using the AWS SDK for Python (Boto3), use the `PutRecord` API. It takes less than 1 minute to ingest data with either of these options. This example uses the SageMaker Python SDK, so it uses the `ingest` API call:

```
def check_feature_group_status(feature_group):
    status = feature_group.describe().get("FeatureGroupStatus")
    while status == "Creating":
        print("Waiting for Feature Group to be Created")
        time.sleep(5)
        status = feature_group.describe().get("FeatureGroupStatus")
    print(f"FeatureGroup {feature_group.name} successfully created.")

check_feature_group_status(customers_feature_group)
check_feature_group_status(orders_feature_group)
```

```
customers_feature_group.ingest(
    data_frame=customer_data, max_workers=3, wait=True
)
```

```
orders_feature_group.ingest(
    data_frame=orders_data, max_workers=3, wait=True
)
```

Using an arbitrary customer record ID, 573291, we use `get_record` to check that the data has been ingested into the feature group.

```
customer_id = 573291
sample_record = sagemaker_session.boto_session.client('sagemaker-featurestore-runtime', region_name=region).get_record(FeatureGroupName=customers_feature_group_name, RecordIdentifierValueAsString=str(customer_id))
```

```
print(sample_record)
```

The following demonstrates how to use `batch_get_record` to get a batch of records.

```
all_records = sagemaker_session.boto_session.client(
    "sagemaker-featurestore-runtime", region_name=region
).batch_get_record(
    Identifiers=[
        {
            "FeatureGroupName": customers_feature_group_name,
            "RecordIdentifiersValueAsString": ["573291", "109382", "828400", "124013"],
        },
        {
            "FeatureGroupName": orders_feature_group_name,
            "RecordIdentifiersValueAsString": ["573291", "109382", "828400", "124013"],
        },
    ]
)
```

```
print(all_records)
```

## Step 5: Clean up
<a name="feature-store-load-feature-definitions"></a>

Here we delete the feature groups that we created.

```
customers_feature_group.delete()
orders_feature_group.delete()
```

## Step 6: Next steps
<a name="feature-store-setup-create-feature-group"></a>

In this example notebook, you learned how to get started with Feature Store, create feature groups, and ingest data into them.

For an advanced example on how to use Feature Store for a fraud detection use case, see [Fraud Detection with Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.html).

## Step 7: Code examples for programmers
<a name="feature-store-working-with-feature-groups"></a>

In this notebook we used a variety of API calls. Most of them are accessible through the SageMaker Python SDK; however, some exist only in Boto3. You can invoke the SageMaker Python SDK API calls directly on your Feature Store objects, whereas to invoke Boto3 API calls, you must first access a Boto3 client through your Boto3 and SageMaker AI sessions: for example, `sagemaker_session.boto_session.client()`.

The following is a list of the API calls used in this notebook, grouped by whether they come from the SageMaker Python SDK or from Boto3, for your reference:

**SageMaker Python SDK API Calls**

```
describe()
ingest()
delete()
create()
load_feature_definitions()
```

**Boto3 API Calls**

```
list_feature_groups()
get_record()
batch_get_record()
```

# Fraud detection with Feature Store example notebook
<a name="feature-store-fraud-detection-notebook"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

The example code on this page refers to the example notebook: [Fraud Detection with Amazon SageMaker Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.html). We recommend that you run this notebook in Studio Classic, notebook instances, or JupyterLab because the code in this guide is conceptual and not fully functional if copied.

Use the following to clone the [aws/amazon-sagemaker-examples](https://github.com/aws/amazon-sagemaker-examples) GitHub repository, containing the example notebook.
+ **For Studio Classic**

  First, launch Studio Classic. You can open Studio Classic whether Studio or Studio Classic is enabled as your default experience. To open Studio Classic, see [Launch Amazon SageMaker Studio Classic Using the Amazon SageMaker AI Console](studio-launch.md#studio-launch-console).

  Clone the [aws/amazon-sagemaker-examples](https://github.com/aws/amazon-sagemaker-examples) GitHub repository to Studio Classic by following the steps in [Clone a Git Repository in Amazon SageMaker Studio Classic](studio-tasks-git.md).
+ **For Amazon SageMaker notebook instances**

  First, launch a SageMaker notebook instance by following the instructions in [Access Notebook Instances](howitworks-access-ws.md).

  Then, follow the instructions in [Add a Git repository to your Amazon SageMaker AI account](nbi-git-resource.md).

Now that you have the SageMaker AI example notebooks, navigate to the `amazon-sagemaker-examples/sagemaker-featurestore` directory and open the [Fraud Detection with Amazon SageMaker Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.html) example notebook.

## Step 1: Set up your Feature Store session
<a name="feature-store-setup"></a>

To start using Feature Store, create a SageMaker AI session, Boto3 session, and a Feature Store session. Also, set up the Amazon S3 bucket you want to use for your features. This is your offline store. The following code uses the SageMaker AI default bucket and adds a custom prefix to it.

**Note**  
The role that you use to run the notebook must have the following managed policies attached to it: `AmazonSageMakerFullAccess` and `AmazonSageMakerFeatureStoreAccess`. For information about adding policies to your IAM role, see [Adding policies to your IAM role](feature-store-adding-policies.md).

```
import boto3
import sagemaker
from sagemaker.session import Session

sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
boto_session = boto3.Session(region_name=region)
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker-featurestore'
offline_feature_store_bucket = 's3://{}/{}'.format(default_bucket, prefix)

sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)

feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime
)
```

## Step 2: Load datasets and partition data into feature groups
<a name="feature-store-load-datasets"></a>

Load your data into data frames for each of your features. You use these data frames after you set up the feature group. In the fraud detection example, you can see these steps in the following code.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import io

s3_client = boto3.client(service_name='s3', region_name=region)

fraud_detection_bucket_name = 'sagemaker-featurestore-fraud-detection'
identity_file_key = 'sampled_identity.csv'
transaction_file_key = 'sampled_transactions.csv'

identity_data_object = s3_client.get_object(Bucket=fraud_detection_bucket_name, Key=identity_file_key)
transaction_data_object = s3_client.get_object(Bucket=fraud_detection_bucket_name, Key=transaction_file_key)

identity_data = pd.read_csv(io.BytesIO(identity_data_object['Body'].read()))
transaction_data = pd.read_csv(io.BytesIO(transaction_data_object['Body'].read()))

identity_data = identity_data.round(5)
transaction_data = transaction_data.round(5)

identity_data = identity_data.fillna(0)
transaction_data = transaction_data.fillna(0)

# Feature transformations for this dataset are applied before ingestion into FeatureStore.
# One hot encode card4, card6
encoded_card_bank = pd.get_dummies(transaction_data['card4'], prefix = 'card_bank')
encoded_card_type = pd.get_dummies(transaction_data['card6'], prefix = 'card_type')

transformed_transaction_data = pd.concat([transaction_data, encoded_card_type, encoded_card_bank], axis=1)
transformed_transaction_data = transformed_transaction_data.rename(columns={"card_bank_american express": "card_bank_american_express"})
```

## Step 3: Set up feature groups
<a name="feature-store-set-up-feature-groups-fraud-detection"></a>

When you set up your feature groups, you need to give each feature group a unique name and set it up with the `FeatureGroup` class.

```
from sagemaker.feature_store.feature_group import FeatureGroup
feature_group_name = "some string for a name"
feature_group = FeatureGroup(name=feature_group_name, sagemaker_session=feature_store_session)
```

For example, in the fraud detection example, the two feature groups are `identity` and `transaction`. In the following code you can see how the names are customized with a timestamp, and then each group is set up by passing in the name and the session.

```
import time
from time import gmtime, strftime, sleep
from sagemaker.feature_store.feature_group import FeatureGroup

identity_feature_group_name = 'identity-feature-group-' + strftime('%d-%H-%M-%S', gmtime())
transaction_feature_group_name = 'transaction-feature-group-' + strftime('%d-%H-%M-%S', gmtime())

identity_feature_group = FeatureGroup(name=identity_feature_group_name, sagemaker_session=feature_store_session)
transaction_feature_group = FeatureGroup(name=transaction_feature_group_name, sagemaker_session=feature_store_session)
```

## Step 4: Set up record identifier and event time features
<a name="feature-store-set-up-record-identifier-event-time"></a>

In this step, you specify a record identifier feature name and an event time feature name. Each name maps to the column of the corresponding feature in your data. For example, in the fraud detection example, the column of interest is `TransactionID`. An `EventTime` feature can be appended to your data when no timestamp is available. In the following code, you can see how these variables are set, and then `EventTime` is appended to both data frames.

```
record_identifier_name = "TransactionID"
event_time_feature_name = "EventTime"
current_time_sec = int(round(time.time()))
identity_data[event_time_feature_name] = pd.Series([current_time_sec]*len(identity_data), dtype="float64")
transformed_transaction_data[event_time_feature_name] = pd.Series([current_time_sec]*len(transformed_transaction_data), dtype="float64")
```

## Step 5: Load feature definitions
<a name="feature-store-load-feature-definitions"></a>

You can now load the feature definitions by passing a data frame containing the feature data. In the following code for the fraud detection example, the identity features and transaction features are each loaded by using `load_feature_definitions`, which automatically detects the data type of each column of data. For developers using a schema rather than automatic detection, see the [Export Feature Groups from Data Wrangler](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-data-export.html#data-wrangler-data-export-feature-store) example for code that shows how to load the schema, map it, and add it as a `FeatureDefinition` that you can use to create the `FeatureGroup`. That example also covers an AWS SDK for Python (Boto3) implementation, which you can use instead of the SageMaker Python SDK.

```
identity_feature_group.load_feature_definitions(data_frame=identity_data); # output is suppressed
transaction_feature_group.load_feature_definitions(data_frame=transformed_transaction_data); # output is suppressed
```

## Step 6: Create a feature group
<a name="feature-store-setup-create-feature-group"></a>

In this step, you use the `create` function to create the feature group. The following code shows all of the available parameters. The online store is not created by default, so you must set `enable_online_store` to `True` if you want to enable it. The `s3_uri` parameter is the Amazon S3 location of your offline store.

```
# create a FeatureGroup
feature_group.create(
    description = "Some info about the feature group",
    feature_group_name = feature_group_name,
    record_identifier_name = record_identifier_name,
    event_time_feature_name = event_time_feature_name,
    feature_definitions = feature_definitions,
    role_arn = role,
    s3_uri = offline_feature_store_bucket,
    enable_online_store = True,
    online_store_kms_key_id = None,
    offline_store_kms_key_id = None,
    disable_glue_table_creation = False,
    data_catalog_config = None,
    tags = [{"Key": "tag1", "Value": "value1"}, {"Key": "tag2", "Value": "value2"}])
```

The following code from the fraud detection example shows a minimal `create` call for each of the two feature groups being created.

```
identity_feature_group.create(
    s3_uri=offline_feature_store_bucket,
    record_identifier_name=record_identifier_name,
    event_time_feature_name=event_time_feature_name,
    role_arn=role,
    enable_online_store=True
)

transaction_feature_group.create(
    s3_uri=offline_feature_store_bucket,
    record_identifier_name=record_identifier_name,
    event_time_feature_name=event_time_feature_name,
    role_arn=role,
    enable_online_store=True
)
```

When you create a feature group, it takes time to load the data, and you need to wait until the feature group is created before you can use it. You can check the status using the following method.

```
status = feature_group.describe().get("FeatureGroupStatus")
```

While the feature group is being created, you receive `Creating` as a response. When this step has finished successfully, the response is `Created`. Other possible statuses are `CreateFailed`, `Deleting`, or `DeleteFailed`.
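Rather than checking once, notebooks commonly poll the status until creation finishes. The following is a minimal sketch of such a helper (the function name and polling interval are our own choices, not part of the SDK):

```python
import time

def wait_for_feature_group_creation(feature_group, poll_seconds=5):
    """Poll the feature group until it leaves the Creating state."""
    status = feature_group.describe().get("FeatureGroupStatus")
    while status == "Creating":
        time.sleep(poll_seconds)
        status = feature_group.describe().get("FeatureGroupStatus")
    # Any terminal status other than Created indicates a problem.
    if status != "Created":
        raise RuntimeError(f"Feature group creation failed with status: {status}")
    return status
```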

## Step 7: Work with feature groups
<a name="feature-store-working-with-feature-groups"></a>

Now that you've set up your feature group, you can perform any of the following tasks:

**Topics**
+ [Describe a feature group](#feature-store-describe-feature-groups)
+ [List feature groups](#feature-store-list-feature-groups)
+ [Put records in a feature group](#feature-store-put-records-feature-group)
+ [Get records from a feature group](#feature-store-get-records-feature-group)
+ [Generate Hive DDL commands](#feature-store-generate-hive-ddl-commands-feature-group)
+ [Build a training dataset](#feature-store-build-training-dataset)
+ [Write and execute an Athena query](#feature-store-write-athena-query)
+ [Delete a feature group](#feature-store-delete-feature-group)

### Describe a feature group
<a name="feature-store-describe-feature-groups"></a>

You can retrieve information about your feature group with the `describe` function.

```
feature_group.describe()
```

### List feature groups
<a name="feature-store-list-feature-groups"></a>

You can list all of your feature groups with the `list_feature_groups` function.

```
sagemaker_client.list_feature_groups()
```
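`list_feature_groups` also accepts a `NameContains` filter and paginates its results with `NextToken`. The following sketch collects all matching feature group names (the helper function is illustrative, not part of the SDK):

```python
def list_matching_feature_groups(client, name_contains):
    """Return all feature group names containing a substring, across pages."""
    names, token = [], None
    while True:
        kwargs = {"NameContains": name_contains}
        if token:
            kwargs["NextToken"] = token
        response = client.list_feature_groups(**kwargs)
        names += [g["FeatureGroupName"] for g in response["FeatureGroupSummaries"]]
        token = response.get("NextToken")
        if not token:
            return names
```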

### Put records in a feature group
<a name="feature-store-put-records-feature-group"></a>

You can use the `ingest` function to load your feature data. You pass in a data frame of feature data, set the number of workers, and choose whether to wait for it to return. The following example demonstrates using the `ingest` function.

```
feature_group.ingest(
    data_frame=feature_data, max_workers=3, wait=True
)
```

For each feature group you have, run the `ingest` function on the feature data you want to load.
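That repeated call can be wrapped in a small loop. The following is a minimal sketch (the helper name is illustrative; it simply calls the SDK's `ingest` on each pair):

```python
def ingest_all(groups_and_frames, max_workers=3):
    """Ingest each data frame into its feature group and wait for completion."""
    for group, frame in groups_and_frames:
        group.ingest(data_frame=frame, max_workers=max_workers, wait=True)
```

For the fraud detection example, the pairs would be the identity and transaction feature groups with their respective data frames.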

### Get records from a feature group
<a name="feature-store-get-records-feature-group"></a>

You can use the `get_record` function to retrieve a specific record from a feature group by its record identifier. The following example uses a sample identifier to retrieve the record.

```
record_identifier_value = str(2990130)
featurestore_runtime.get_record(FeatureGroupName=transaction_feature_group_name, RecordIdentifierValueAsString=record_identifier_value)
```

An example response from the fraud detection example:

```
...
'Record': [{'FeatureName': 'TransactionID', 'ValueAsString': '2990130'},
  {'FeatureName': 'isFraud', 'ValueAsString': '0'},
  {'FeatureName': 'TransactionDT', 'ValueAsString': '152647'},
  {'FeatureName': 'TransactionAmt', 'ValueAsString': '75.0'},
  {'FeatureName': 'ProductCD', 'ValueAsString': 'H'},
  {'FeatureName': 'card1', 'ValueAsString': '4577'},
...
```
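Because every value comes back as a string in `ValueAsString`, it can be convenient to flatten the record into a dictionary and cast selected features back to numbers. A minimal sketch (the helper is ours, not part of the SDK):

```python
def record_to_dict(record, numeric_features=()):
    """Flatten a get_record 'Record' list; optionally cast named features to float."""
    row = {f["FeatureName"]: f["ValueAsString"] for f in record}
    for name in numeric_features:
        if name in row:
            row[name] = float(row[name])
    return row
```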

### Generate Hive DDL commands
<a name="feature-store-generate-hive-ddl-commands-feature-group"></a>

The SageMaker Python SDK’s `FeatureStore` class also provides functionality to generate Hive DDL commands. The schema of the table is generated based on the feature definitions: columns are named after the feature names, and data types are inferred from the feature types.

```
print(feature_group.as_hive_ddl())
```

Example output:

```
CREATE EXTERNAL TABLE IF NOT EXISTS sagemaker_featurestore.identity-feature-group-27-19-33-00 (
  TransactionID INT
  id_01 FLOAT
  id_02 FLOAT
  id_03 FLOAT
  id_04 FLOAT
 ...
```

### Build a training dataset
<a name="feature-store-build-training-dataset"></a>

Feature Store automatically builds an AWS Glue Data Catalog when you create feature groups, and you can turn this off if you want. The following describes how to create a single training dataset with feature values from both the identity and transaction feature groups created earlier in this topic, by running an Amazon Athena query that joins the data stored in the offline store of each feature group.

To start, create an Athena query using `athena_query()` for both the identity and transaction feature groups. The table name is the AWS Glue table that is autogenerated by Feature Store.

```
identity_query = identity_feature_group.athena_query()
transaction_query = transaction_feature_group.athena_query()

identity_table = identity_query.table_name
transaction_table = transaction_query.table_name
```

### Write and execute an Athena query
<a name="feature-store-write-athena-query"></a>

Write your query using SQL on these feature groups, then execute it with the `run()` method, specifying the Amazon S3 location where the dataset is to be saved.

```
# Athena query
query_string = (
    f'SELECT * FROM "{transaction_table}" '
    f'LEFT JOIN "{identity_table}" '
    f'ON "{transaction_table}".transactionid = "{identity_table}".transactionid'
)

# run Athena query. The output is loaded to a Pandas dataframe.
dataset = pd.DataFrame()
identity_query.run(query_string=query_string, output_location='s3://'+default_bucket+'/query_results/')
identity_query.wait()
dataset = identity_query.as_dataframe()
```

From here you can train a model using this data set and then perform inference.
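As one possible next step, you might shuffle and split the joined dataset before training. The following is a minimal pandas sketch (the function name and split ratio are illustrative choices, not part of the example notebook):

```python
import pandas as pd

def split_dataset(dataset, test_fraction=0.2, seed=42):
    """Shuffle a joined dataset and split it into train and test frames."""
    shuffled = dataset.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled.iloc[:cut], shuffled.iloc[cut:]
```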

### Delete a feature group
<a name="feature-store-delete-feature-group"></a>

You can delete a feature group with the `delete` function.

```
feature_group.delete()
```

The following code example is from the fraud detection example.

```
identity_feature_group.delete()
transaction_feature_group.delete()
```

For more information, see the [Delete a feature group API](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeleteFeatureGroup.html).

# Using Amazon SageMaker Feature Store in the console
<a name="feature-store-use-with-studio"></a>

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

You can use Amazon SageMaker Feature Store on the console to create, view, update, and monitor your feature groups. Monitoring in this guide includes viewing pipeline executions and lineage of your feature groups. This guide provides instructions on how to achieve these tasks from the console.

For Feature Store examples and resources using the Amazon SageMaker APIs and AWS SDK for Python (Boto3), see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

**Topics**
+ [Create a feature group from the console](#feature-store-create-feature-group-studio)
+ [View feature group details from the console](#feature-store-view-feature-group-detail-studio)
+ [Update a feature group from the console](#feature-store-update-feature-group-studio)
+ [View pipeline executions from the console](#feature-store-view-feature-processor-pipeline-executions-studio)
+ [View lineage from the console](#feature-store-view-feature-processor-pipeline-lineage-studio)

## Create a feature group from the console
<a name="feature-store-create-feature-group-studio"></a>

The create feature group process has four steps:

1. Enter feature group information.

1. Enter feature definitions.

1. Enter required features.

1. Enter feature group tags.



Consider which of the following options fits your use case:
+ Create an online store, an offline store, or both. For more information about the differences between online and offline stores, see [Feature Store concepts](feature-store-concepts.md).
+ Use a default AWS Key Management Service key or your own KMS key. The default key is [AWS KMS key (SSE-KMS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html). You can reduce AWS KMS request costs by configuring use of Amazon S3 Bucket Keys on the offline store Amazon S3 bucket. The Amazon S3 Bucket Key must be enabled before using the bucket for your feature groups. For more information about reducing the cost by using Amazon S3 Bucket Keys, see [Reducing the cost of SSE-KMS with Amazon S3 Bucket Keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html).

  You can use the same key for both online and offline stores, or have a unique key for each. For more information about AWS KMS, see [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).
+ If you create an offline store:
  + Decide if you want to create an Amazon S3 bucket or use an existing one. When using an existing one, you must know the Amazon S3 bucket URL or Amazon S3 bucket name and dataset directory name, if applicable.
  + Choose which Amazon Resource Name (ARN) to use to specify the IAM role. For more information about how to find your role and attached policies, see [Adding policies to your IAM role](feature-store-adding-policies.md).
  + Decide whether to use the AWS Glue (default) or Apache Iceberg table format. In most use cases, you use the Apache Iceberg table format. For more information about table formats, see [Use Feature Store with SDK for Python (Boto3)](feature-store-create-feature-group.md).

You can use the console to view the lineage of a feature group. The instructions for using Feature Store on the console vary depending on whether you enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### Create feature groups if Studio is your default experience (console)
<a name="feature-store-create-feature-group-studio-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** from the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. Choose **Create feature group**.

1. Under **Feature group details**, enter a feature group name.

1. (Optional) Enter a description of the feature group.

1. Under **Feature group storage configuration**, choose a storage configuration from the dropdown list. For information about storage configurations, see [Feature Store storage configurations](feature-store-storage-configurations.md).

1. If you have chosen to enable the online storage:

   1. If you *only* enable the online storage, you can choose a **Storage type** from the dropdown list. For information about online store storage types, see [Online store](feature-store-storage-configurations-online-store.md).

   1. (Optional) Apply **Time to Live (TTL)** by toggling the switch to **On** and specifying the **Time to Live duration** value and unit. This will update the default TTL duration for all records added to the feature group *after the feature group is created*. For more information about TTL, see [Time to live (TTL) duration for records](feature-store-time-to-live.md).

1. If you have chosen to enable the offline storage:

   1. Under **Amazon S3 bucket name**, enter a new bucket name or enter an existing bucket URL manually.

   1. From the **Table format** dropdown list, choose the table format. In most use cases, you should use the Apache Iceberg table format. For more information about table formats, see [Use Feature Store with SDK for Python (Boto3)](feature-store-create-feature-group.md).

   1. Under **IAM role ARN**, choose the IAM role ARN you want to attach to this feature group. For more information about how to find your role and attached policies, see [Adding policies to your IAM role](feature-store-adding-policies.md).

   1. If you have chosen to enable the offline storage with the AWS Glue (default) **Table format**, under **Data catalog**, you can choose one of the following two options:
      + **Use default values for your AWS Glue Data Catalog**.
      + Provide your existing Data Catalog name, table name, and database name to extend your existing AWS Glue Data Catalog. 

1. Under the **Online store encryption key** or **Offline store encryption key** dropdown list, choose one of the following options:
   + **Use AWS managed AWS KMS key (default)**
   + **Enter an AWS KMS key ARN** and enter your AWS KMS key ARN under **Offline store encryption key ARN**. For more information about AWS KMS, see [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).

1. If applicable, you will have the option to choose your throughput mode, which impacts how you are charged. Under **Throughput mode**, choose a mode from the dropdown list and input the read and write capacities when available. For information about throughput modes, like when the mode can be applied and capacity units, see [Throughput modes](feature-store-throughput-mode.md).

1. After you specify all of the required information, the **Continue** button becomes available. Choose **Continue**.

1. Under **Specify feature definitions**, you have two options for providing a schema for your features: a JSON editor, or a table editor.
   + JSON editor: In the **JSON** tab, enter or copy and paste your feature definitions in the JSON format.
   + Table editor: In the **Table** tab, enter the feature name and choose the corresponding data type for each feature in your feature group. Choose **+ Add feature definitions** to include more features. Be aware that you cannot remove feature definitions from your feature groups. However, you can add and update feature definitions after the feature group is created.

   There must be at least two features in a feature group that represent the record identifier and event time:
   + The record identifier **Feature type** can be a string, a fractional, or an integral.
   + The event time **Feature type** must be a string or a fractional. However, if you chose the Iceberg table format, the event time must be a string.
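For reference, the JSON editor expects definitions shaped like the `FeatureDefinitions` field of the `CreateFeatureGroup` API. A minimal illustrative sketch (the feature names here are examples only):

```json
[
  {"FeatureName": "TransactionID", "FeatureType": "Integral"},
  {"FeatureName": "TransactionAmt", "FeatureType": "Fractional"},
  {"FeatureName": "EventTime", "FeatureType": "String"}
]
```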

1. After all of the features are included, choose **Continue**.

1. Under **Select required features**, you must specify the record identifier and event time features. Do this by choosing the feature name under **Record identifier feature name** and **Event time feature name** dropdown lists, respectively.

1. After you choose the record identifier and event time features, choose **Continue**.

1. (Optional) To add tags for the feature group, choose **Add new tag**. Then enter a tag key and the corresponding value under **Key** and **Value**, respectively.

1. Choose **Continue**.

1. Under **Review feature group**, review the feature group information. To edit any step, choose the **Edit** button that corresponds to that step; this brings you to that step for editing. To return to the review page, choose **Continue** on each subsequent step.

1. After you finalize the setup for your feature group, choose **Create feature group**.

   If an issue occurs during setup, a pop-up alert message appears at the bottom of the page with tips for solving the issue. You can return to previous steps to fix the issues by choosing **Edit** for the step with conflicts.

   After the feature group has been successfully created, a green pop-up message appears at the bottom of the page. The new feature group also appears in your feature groups catalog.

### Create feature groups if Studio Classic is your default experience (console)
<a name="feature-store-create-feature-group-studio-how-to-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) on the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. Choose **Create feature group**.

1. Under **Feature group details**, enter a feature group name.

1. (Optional) Enter a description of the feature group.

1. Under **Feature group storage configuration**, choose a storage configuration from the dropdown list. For information about storage configurations, see [Feature Store storage configurations](feature-store-storage-configurations.md).

1. If you have chosen to enable the online storage:

   1. If you *only* enable the online storage, you may choose a **Storage type** from the dropdown list. For information about online store storage types, see [Online store](feature-store-storage-configurations-online-store.md).

   1. (Optional) Apply **Time to Live (TTL)** by toggling the switch to **On** and specifying the **Time to Live duration** value and unit. This will update the default TTL duration for all records added to the feature group *after the feature group is created*. For more information about TTL, see [Time to live (TTL) duration for records](feature-store-time-to-live.md).

1. If you have chosen to enable the offline storage:

   1. Under the **Amazon S3 bucket name**, enter a new bucket name or enter an existing bucket URL manually.

   1. From the **Table format** dropdown list, choose the table format. In most use cases, you should use the Apache Iceberg table format. For more information about table formats, see [Use Feature Store with SDK for Python (Boto3)](feature-store-create-feature-group.md).

   1. Under **IAM role ARN**, choose the IAM role ARN you want to attach to this feature group. For more information about how to find your role and attached policies, see [Adding policies to your IAM role](feature-store-adding-policies.md).

   1. If you have chosen to enable the offline storage with the AWS Glue (default) **Table format**, under **Data catalog**, you can choose one of the following two options:
      + **Use default values for your AWS Glue Data Catalog**.
      + Provide your existing Data Catalog name, table name, and database name to extend your existing AWS Glue Data Catalog. 

1. Under the **Online store encryption key** or **Offline store encryption key** dropdown list, choose one of the following options:
   + **Use AWS managed AWS KMS key (default)**
   + **Enter an AWS KMS key ARN** and enter your AWS KMS key ARN under **Offline store encryption key ARN**. For more information about AWS KMS, see [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html).

1. After you specify all of the required information, the **Continue** button becomes available. Choose **Continue**.

1. Under **Specify feature definitions**, you have two options for providing a schema for your features: a JSON editor, or a table editor.
   + JSON editor: In the **JSON** tab, enter or copy and paste your feature definitions in the JSON format.
   + Table editor: In the **Table** tab, enter the feature name and choose the corresponding data type for each feature in your feature group. Choose **+ Add feature definitions** to include more features. Be aware that you cannot remove feature definitions from your feature groups. However, you can add and update feature definitions after the feature group is created.

   There must be at least two features in a feature group that represent the record identifier and event time:
   + The record identifier **Feature type** can be a string, a fractional, or an integral.
   + The event time **Feature type** must be a string or a fractional. However, if you chose the Iceberg table format, the event time must be a string.

1. After all of the features are included, choose **Continue**.

1. Under **Select required features**, you must specify the record identifier and event time features. Do this by choosing the feature name under **Record identifier feature name** and **Event time feature name** dropdown lists, respectively.

1. After you choose the record identifier and event time features, choose **Continue**.

1. (Optional) To add tags for the feature group, choose **Add new tag**. Then enter a tag key and the corresponding value under **Key** and **Value**, respectively.

1. Choose **Continue**.

1. Under **Review feature group**, review the feature group information. To edit any step, choose the **Edit** button that corresponds to that step. This brings you to the corresponding step for editing. To get back to the review step, choose **Continue** until you arrive at the review page again.

1. After you finalize the setup for your feature group, choose **Create feature group**.

   If an issue occurs during setup, a pop-up alert message appears at the bottom of the page with tips for solving the issue. You can return to previous steps to fix the issues by choosing **Edit** for the step with conflicts.

   After the feature group has been successfully created, a green pop-up message appears at the bottom of the page. The new feature group also appears in your feature groups catalog.

## View feature group details from the console
<a name="feature-store-view-feature-group-detail-studio"></a>

You can view details of your feature groups after a feature group has successfully been created in the Feature Store.

You can use the console or the Amazon SageMaker Feature Store API to view your feature group details. The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### View feature group details if Studio is your default experience (console)
<a name="feature-store-view-feature-group-detail-studio-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Under the **Feature group catalog** tab, choose your feature group name from the list. This opens the feature group page.

1. On the **Features** tab, you can find a list of all of the features. Use the filter to refine your list. Choose a feature to view its details.

1. Under the **Details** tab and the **Information** subtab, you can review your feature group information. This includes **Latest execution**, **Offline storage settings**, **Online storage settings**, and more.

1. Under the **Details** tab and the **Tags** subtab, you can review your feature group tags. Choose **Add new tag** to add a new tag or **Remove** to remove a tag.

1. Under the **Pipeline Executions** tab, you can view the associated pipelines or pipeline executions for your feature group.

1. Under the **Lineage** tab, you can view the lineage of your feature group.

### View feature group details if Studio Classic is your default experience (console)
<a name="feature-store-view-feature-group-detail-studio-how-to-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Under the **Feature group catalog** tab, choose your feature group name from the list. This opens the feature group page.

1. On the **Features** tab, you can find a list of all of the features. Use the filter to refine your list. Choose a feature to view its details.

1. Under the **Details** tab and the **Information** subtab, you can review your feature group information, including **Latest execution**, **Offline storage settings**, **Online storage settings**, and more.

1. Under the **Details** tab and the **Tags** subtab, you can review your feature group tags. Choose **Add new tag** to add a new tag or **Remove** to remove a tag.

1. Under the **Pipeline Executions** tab, you can view the associated pipelines or pipeline executions for your feature group.

1. Under the **Lineage** tab, you can view the lineage of your feature group.

## Update a feature group from the console
<a name="feature-store-update-feature-group-studio"></a>

You can update your feature groups after a feature group has successfully been created in the Feature Store.

You can use the console or the Amazon SageMaker Feature Store API to update a feature group. The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### Update a feature group if Studio is your default experience (console)
<a name="feature-store-update-feature-group-studio-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Under the **Feature group catalog** tab, search for and choose your feature group name from the list. This opens the feature group page.

1. Choose **Update feature group**.

1. (Optional) If applicable, you can change your throughput mode, which impacts how you are charged. Under **Throughput mode**, choose a mode from the dropdown list and input the read and write capacities when available. For information about throughput modes, like when the mode can be applied and capacity units, see [Throughput modes](feature-store-throughput-mode.md).

1. (Optional) If your feature group uses the online store, you can update the default **Time to Live (TTL)**. If TTL hasn't been enabled for the feature group, toggle the switch button under **Time to Live (TTL)** to **On**. You can specify the TTL value and unit under **Time to Live duration**. This will update the default TTL duration for all records added to the feature group *after the feature group is updated*.

1. (Optional) You can add feature definitions to your feature group, but be aware that you cannot remove feature definitions from your feature groups. To add a feature definition, choose **Add feature definition**, then specify the new feature definition name under the **Name** column and select the feature type under the **Feature type** column.

1. Choose **Save changes**.

1. To confirm your changes, choose **Confirm**.

### Update a feature group if Studio Classic is your default experience (console)
<a name="feature-store-update-feature-group-studio-how-to-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Under the **Feature group catalog** tab, search for and choose your feature group name from the list. This opens the feature group page.

1. Choose **Update feature group**.

1. (Optional) If your feature group uses the online store, you can update the default **Time to Live (TTL)**. If TTL hasn't been enabled for the feature group, toggle the switch button under **Time to Live (TTL)** to **On**. You can specify the TTL value and unit under **Time to Live duration**. This will update the default TTL duration for all records added to the feature group *after the feature group is updated*.

1. (Optional) You can add feature definitions to your feature group, but be aware that you cannot remove feature definitions from your feature groups. To add a feature definition, choose **Add feature definition**, then specify the new feature definition name under the **Name** column and select the feature type under the **Feature type** column.

1. Choose **Save changes**.

1. To confirm your changes, choose **Confirm**.

## View pipeline executions from the console
<a name="feature-store-view-feature-processor-pipeline-executions-studio"></a>

You can view the latest pipeline execution information for a feature or feature group under **Pipeline executions**. You can also get links to pipelines, executions, code, and other useful execution information.

You can use the console to view your pipeline executions. The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### View pipeline executions if Studio is your default experience (console)
<a name="feature-store-view-feature-processor-pipeline-executions-studio-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Choose a feature group or feature to view their pipeline executions.

1. Choose the **Pipeline executions** tab.

1. Search for a pipeline from the **Select a pipeline** dropdown list.

1. You can view the links for the pipeline, execution, and code details. You can also view the execution owner, status, date, and duration.

### View pipeline executions if Studio Classic is your default experience (console)
<a name="feature-store-view-feature-processor-pipeline-executions-studio-how-to-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Choose a feature group or feature to view their pipeline executions.

1. Choose the **Pipeline executions** tab.

1. Search for a pipeline from the **Select a pipeline** dropdown list.

1. You can view the links for the pipeline, execution, and code details. You can also view the execution owner, status, date, and duration.

## View lineage from the console
<a name="feature-store-view-feature-processor-pipeline-lineage-studio"></a>

You can view the lineage of a feature group. The lineage includes the information about the execution code of your feature processing workflow, what data sources were used, and how they are ingested to the feature group or feature.

You can use the console to view the lineage of a feature group. The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### View lineage if Studio is your default experience (console)
<a name="feature-store-view-feature-processor-pipeline-lineage-studio-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** from the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Choose a feature group or feature to view its lineage details.

1. Choose the **Lineage** tab.

1. Choose a feature group or pipeline node to expand the node. This contains more information about a feature group or pipeline.

1. You can zoom in, zoom out, or recenter the lineage graph by using the buttons on the bottom left of the screen.

1. You can move through the lineage map by choosing and dragging the screen. To move through your lineage map using nodes as the focal point, press **Tab** or **Shift+Tab** to switch between nodes.

1. If applicable, you can navigate the lineage upstream (left, earlier) or downstream (right, most recent). Do this by choosing a node and then choosing **Query upstream lineage** or **Query downstream lineage**.

### View lineage if Studio Classic is your default experience (console)
<a name="feature-store-view-feature-processor-pipeline-lineage-studio-how-to-with-studio-classic"></a>

1. Open Studio Classic by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. Choose a feature group or feature to view its lineage details.

1. Choose the **Lineage** tab.

1. Choose a feature group or pipeline node to expand the node. This contains more information about a feature group or pipeline.

1. You can zoom in, zoom out, or recenter the lineage graph by using the buttons on the bottom left of the screen.

1. You can move through the lineage map by choosing and dragging the screen. To move through your lineage map using nodes as the focal point, press **Tab** or **Shift+Tab** to switch between nodes.

1. If applicable, you can navigate the lineage upstream (left, earlier) or downstream (right, most recent). Do this by choosing a node and then choosing **Query upstream lineage** or **Query downstream lineage**.

# Delete a feature group
<a name="feature-store-delete-feature-group"></a>

You can use the console or the Amazon SageMaker Feature Store API to delete your feature group. The instructions for using Feature Store through the console depend on whether you have enabled Studio or Studio Classic as your default experience. For more information about the differences between the two, or how to change your default, see [Amazon SageMaker Studio](studio-updated.md).

The following sections provide an overview on how to delete a feature group.

**Topics**
+ [Delete a feature group using the console](#feature-store-delete-feature-group-studio)
+ [Delete feature group example Python code](#feature-store-delete-feature-group-example)

## Delete a feature group using the console
<a name="feature-store-delete-feature-group-studio"></a>

This section shows two ways to delete a feature group in the console, depending on your default experience: Studio or Studio Classic.

### Delete feature group if Studio is your default experience (console)
<a name="feature-store-delete-feature-group-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. In the **Feature Group Catalog** tab, choose the feature group to delete under **Feature group name**.

1. Choose **Delete feature group**.

1. In the pop-up window, confirm deletion by entering **delete** in the field, then choose **Delete**.

### Delete feature group if Studio Classic is your default experience (console)
<a name="feature-store-delete-feature-group-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. In the left navigation pane, choose the **Home** icon (![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. (Optional) To view your feature groups, choose **My account**. To view shared feature groups, choose **Cross account**.

1. In the **Feature Group Catalog** tab, choose the feature group to delete under **Feature group name**.

1. Choose **Delete feature group**.

1. In the pop-up window, confirm deletion by typing **delete** in the field, then choose **Delete**.

## Delete feature group example Python code
<a name="feature-store-delete-feature-group-example"></a>

The following code uses the [DeleteFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeleteFeatureGroup.html) API operation to delete your feature group using the AWS SDK for Python (Boto3). It assumes that you've set up Feature Store and created a feature group. For more information about getting started, see [Introduction to Feature Store example notebook](feature-store-introduction-notebook.md).

```
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

sagemaker_session = sagemaker.Session()
fg_name = 'your-feature-group-name'

my_fg = FeatureGroup(name=fg_name, sagemaker_session=sagemaker_session)
my_fg.delete()
```

# Add features and records to a feature group
<a name="feature-store-update-feature-group"></a>

You can use the Amazon SageMaker Feature Store API or the console to update and describe your feature group as well as add features and records to your feature group. A feature group is an object that contains your data and a feature describes a column in the table. When you add a feature to the feature group you are effectively adding a column to the table. When you add a new record to the feature group you are filling in values for features associated with a specific record identifier. For more information on Feature Store concepts, see [Feature Store concepts](feature-store-concepts.md). 

After you successfully add features to a feature group, you cannot remove those features. The features that you have added do not add any data to your records. You can add new records to the feature group or overwrite them using the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) API. For examples on updating, describing, and putting records into a feature group, see [Example code](#feature-store-update-feature-group-example).

You can use the console to add features to a feature group. For more information on how to update your feature groups using the console, see [Update a feature group from the console](feature-store-use-with-studio.md#feature-store-update-feature-group-studio).

The following sections provide an overview of using Feature Store APIs to add features to a feature group followed by examples. With the API, you can also add or overwrite records after you've updated the feature group. 

**Topics**
+ [API](#feature-store-update-feature-group-api)
+ [Example code](#feature-store-update-feature-group-example)

## API
<a name="feature-store-update-feature-group-api"></a>

Use the [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) operation to add features to a feature group.

You can use the [DescribeFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html) operation to see if you've added the features successfully.

To add or overwrite records, use the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) operation.

To see the updates that you've made to a record, use the [GetRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_GetRecord.html) operation. To see the updates that you've made to multiple records, use the [BatchGetRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_BatchGetRecord.html) operation. It can take up to five minutes for the updates that you've made to appear.

You can use the example code in the following section to walk through adding features and records using the AWS SDK for Python (Boto3).

## Example code
<a name="feature-store-update-feature-group-example"></a>

The example code walks you through the following process: 

1. Adding features to the feature group

1. Verifying that you've added them successfully

1. Adding a record to the feature group

1. Verifying that you've added it successfully

### Step 1: Add features to a feature group
<a name="feature-store-update-feature-group-step-1"></a>

The following code uses the [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) operation to add new features to the feature group. It assumes that you've set up Feature Store and created a feature group. For more information about getting started, see [Introduction to Feature Store example notebook](feature-store-introduction-notebook.md).

```
import boto3

sagemaker_client = boto3.client("sagemaker")

# Replace with the name of your feature group.
feature_group_name = "your-feature-group-name"

sagemaker_client.update_feature_group(
    FeatureGroupName=feature_group_name,
    FeatureAdditions=[
        {"FeatureName": "new-feature-1", "FeatureType": "Integral"},
        {"FeatureName": "new-feature-2", "FeatureType": "Fractional"},
        {"FeatureName": "new-feature-3", "FeatureType": "String"}
    ]
)
```

The following code uses the [DescribeFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html) operation to check the status of the update. If the [LastUpdateStatus](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html#sagemaker-DescribeFeatureGroup-response-LastUpdateStatus) field is `Successful`, you've added the features successfully.

```
sagemaker_client.describe_feature_group(
    FeatureGroupName=feature_group_name
)
```

### Step 2: Add a new record to the feature group
<a name="feature-store-update-feature-group-step-2"></a>

The following code uses the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) operation to add records to the feature group that you've created.

```
import boto3

# Replace with the name of your feature group.
feature_group_name = "your-feature-group-name"
record_identifier_value = 'new_record'

sagemaker_featurestore_runtime_client = boto3.client("sagemaker-featurestore-runtime")

sagemaker_featurestore_runtime_client.put_record(
    FeatureGroupName=feature_group_name,
    Record=[
        {
            'FeatureName': "record-identifier-feature-name",
            'ValueAsString': record_identifier_value
        },
        {
            'FeatureName': "event-time-feature",
            'ValueAsString': "timestamp-that-feature-store-returns"
        },
        {
            'FeatureName': "new-feature-1", 
            'ValueAsString': "value-as-string"
        },
        {
            'FeatureName': "new-feature-2", 
            'ValueAsString': "value-as-string"
        },
        {
            'FeatureName': "new-feature-3", 
            'ValueAsString': "value-as-string"
        },
    ]
)
```

Use the [GetRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_GetRecord.html) operation to see which records in your feature group don't have data for the features that you've added. You can use the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) operation to overwrite the records that don't have data for the features that you've added.

# Delete records from your feature groups
<a name="feature-store-delete-records"></a>

You can use the Amazon SageMaker Feature Store API to delete records from your feature groups. A feature group is an object that contains your machine learning (ML) data, where the columns of your data are described by features and your data are contained in records. A record contains values for features that are associated with a specific record identifier. 

There are two storage configurations for your feature groups: online store and offline store. The online store only keeps the record with the latest event time and is typically used for real-time lookup for ML inference. The offline store keeps all records and acts as a historical database and is typically used for feature exploration, ML training, and batch inference.

For more information on Feature Store concepts, see [Ingestion diagrams](feature-store-concepts.md#feature-store-concepts-ingestion).

There are two ways to delete records from your feature groups, and the behavior is different depending on the storage configuration. In the following topics we will describe how to soft and hard delete records from the online and offline stores and provide examples.

**Topics**
+ [Delete records from the online store](#feature-store-delete-records-online-store)
+ [Delete records from the offline store](#feature-store-delete-records-offline-store)

## Delete records from the online store
<a name="feature-store-delete-records-online-store"></a>

You can soft delete or hard delete a record from the online store using the `DeleteRecord` API, with the `DeletionMode` request parameter set to `SoftDelete` (default) or `HardDelete`. For more information, see [DeleteRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_DeleteRecord.html) in the Amazon SageMaker API Reference.

With the online store:
+ When you soft delete (default), the record is no longer retrievable by `GetRecord` or `BatchGetRecord`, and the feature column values are set to `null`, except for the `RecordIdentifier` and `EventTime` feature values. 
+ When you hard delete, the record is completely removed from the online store. 

In both cases Feature Store appends the deleted record marker to the `OfflineStore`. The deleted record marker is a record with the same `RecordIdentifier` as the original, but with the `is_deleted` value set to `True`, `EventTime` set to the delete input `EventTime`, and other feature values set to `null`.

Note that the `EventTime` specified in `DeleteRecord` must be later than the `EventTime` of the existing record in the `OnlineStore` for that same `RecordIdentifier`. If it is not, the deletion does not occur:
+ For `SoftDelete`, the existing (not deleted) record remains in the `OnlineStore`, though the delete record marker is still written to the `OfflineStore`. 
+ For `HardDelete`, the operation returns a `400 ValidationException` to indicate that the delete operation failed. No delete record marker is written to the `OfflineStore`.

The following examples use the SDK for Python (Boto3) [delete_record](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-featurestore-runtime/client/delete_record.html#delete-record) operation to delete a record from a feature group. To delete a record from a feature group, you need:
+ Feature group name (`feature-group-name`)
+ Record identifier value as a string (`record-identifier-value`)
+ Deletion event time (`deletion-event-time`)

  The deletion event time should be later than the event time of the record you wish to delete.
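Feature Store accepts `EventTime` values as ISO-8601 strings in UTC (or as Unix epoch seconds). The following sketch produces a deletion event time from the current time, which satisfies the later-than requirement for any record written in the past:

```
from datetime import datetime, timezone

def deletion_event_time():
    """Current UTC time in the ISO-8601 format Feature Store accepts for
    EventTime, for example '2017-05-24T17:15:12Z'."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```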

### Online store soft delete example
<a name="feature-store-delete-records-online-store-soft-delete"></a>

For a soft delete, use the `DeleteRecord` API with the default `DeletionMode`, or set `DeletionMode` to `SoftDelete` explicitly. 

```
import boto3
client = boto3.client('sagemaker-featurestore-runtime')

client.delete_record(
    FeatureGroupName='feature-group-name',
    RecordIdentifierValueAsString='record-identifier-value',
    EventTime='deletion-event-time',
    TargetStores=[
        'OnlineStore',
    ],
    DeletionMode='SoftDelete'
)
```

### Online store hard delete example
<a name="feature-store-delete-records-online-store-hard-delete"></a>

For a hard delete, use the `DeleteRecord` API and set `DeletionMode` to `HardDelete`.

```
import boto3
client = boto3.client('sagemaker-featurestore-runtime')

client.delete_record(
    FeatureGroupName='feature-group-name',
    RecordIdentifierValueAsString='record-identifier-value',
    EventTime='deletion-event-time',
    TargetStores=[
        'OnlineStore',
    ],
    DeletionMode='HardDelete'
)
```

## Delete records from the offline store
<a name="feature-store-delete-records-offline-store"></a>

With Amazon SageMaker Feature Store you can soft delete or hard delete a record from the `OfflineStore` when it uses the Iceberg table format. With the `OfflineStore` Iceberg table format: 
+ When you soft delete a record, the latest version of the Iceberg table file no longer contains the record, but previous versions still contain it and can be accessed using time travel. For information on time travel, see [Querying Iceberg table data and performing time travel](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-table-data.html) in the Athena user guide.
+ When you hard delete a record, you remove the previous versions of the Iceberg table that contain the record. In this case, you specify which versions of the Iceberg table you wish to delete.

### Obtain your Iceberg table name
<a name="feature-store-delete-records-offline-store-get-iceberg-table-name"></a>

To soft and hard delete from your `OfflineStore` Iceberg table, you will need to obtain your Iceberg table name, `iceberg-table-name`. The following instructions assume that you have already used Feature Store to create a feature group with the offline store configured to use the Iceberg table format, with `DisableGlueTableCreation = False` (the default). For more information on creating feature groups, see [Get started with Amazon SageMaker Feature Store](feature-store-getting-started.md).

To obtain your `iceberg-table-name`, use the [DescribeFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html) API to obtain the [DataCatalogConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DataCatalogConfig.html). This contains the metadata of the AWS Glue table that serves as the data catalog for the `OfflineStore`. The `TableName` within the `DataCatalogConfig` is your `iceberg-table-name`.
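As a sketch, extracting the table name from the `DescribeFeatureGroup` response might look like the following (the response values shown are illustrative):

```python
def get_iceberg_table_name(describe_response: dict) -> str:
    # The TableName within DataCatalogConfig is the iceberg-table-name.
    return describe_response["OfflineStoreConfig"]["DataCatalogConfig"]["TableName"]

# Trimmed shape of a DescribeFeatureGroup response (illustrative values):
response = {
    "FeatureGroupName": "feature-group-name",
    "OfflineStoreConfig": {
        "DataCatalogConfig": {
            "TableName": "feature_group_name_1234567890",
            "Catalog": "AwsDataCatalog",
            "Database": "sagemaker_featurestore",
        }
    },
}
print(get_iceberg_table_name(response))  # feature_group_name_1234567890

# With live credentials you would fetch the response with:
# import boto3
# response = boto3.client("sagemaker").describe_feature_group(
#     FeatureGroupName="feature-group-name")
```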

### Amazon Athena offline store soft and hard delete example
<a name="feature-store-delete-records-offline-store-athena"></a>

The following instructions use Amazon Athena to soft delete and then hard delete a record from the `OfflineStore` Iceberg table. This assumes that the record you intend to delete from your `OfflineStore` has a deleted record marker. For information on the deleted record marker in your `OfflineStore`, see [Delete records from the online store](#feature-store-delete-records-online-store). 

1. Obtain your Iceberg table name, `iceberg-table-name`. For information on how to obtain your Iceberg table name, see [Obtain your Iceberg table name](#feature-store-delete-records-offline-store-get-iceberg-table-name). 

1. Run the `DELETE` command to soft delete the records in the `OfflineStore`, so that the latest version (or snapshot) of the Iceberg table does not contain the records. The following example deletes the records where `is_deleted` is `'True'`, as well as the previous event-time versions of those records. You may add additional conditions based on other features to restrict the deletion. For more information on using `DELETE` with Athena, see `DELETE` in the Athena user guide.

   ```
   DELETE FROM iceberg-table-name WHERE record-id-feature-name IN ( SELECT record-id-feature-name FROM iceberg-table-name WHERE is_deleted = 'True')
   ```

   The soft deleted records are still viewable on previous file versions by performing time travel. For information on performing time travel, see [Querying Iceberg table data and performing time travel](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-table-data.html) in the Athena user guide.

1. Remove the record from previous versions of your Iceberg tables to hard delete the record from `OfflineStore`:

   1. Run the `OPTIMIZE` command to rewrite the data files into a more optimized layout, based on their size and number of associated delete files. For more information on optimizing Iceberg tables and the syntax, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html) in the Athena user guide. 

      ```
      OPTIMIZE iceberg-table-name REWRITE DATA USING BIN_PACK
      ```

   1. (Optional, only needs to run once) Run the `ALTER TABLE` command to set the Iceberg table properties that determine when previous file versions are hard deleted, according to your specifications. You can do this by assigning values to the `vacuum_min_snapshots_to_keep` and `vacuum_max_snapshot_age_seconds` properties. For more information on altering your Iceberg table properties, see [ALTER TABLE SET PROPERTIES](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-managing-tables.html#querying-iceberg-alter-table-set-properties) in the Athena user guide. For more information on Iceberg table property key-value pairs, see [Table properties](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-table-properties) in the Athena user guide. 

      ```
      ALTER TABLE iceberg-table-name SET TBLPROPERTIES (
        'vacuum_min_snapshots_to_keep'='your-specified-value',
        'vacuum_max_snapshot_age_seconds'='your-specified-value'
      )
      ```

   1. Run the `VACUUM` command to remove data files that are no longer needed by your Iceberg tables and are not referenced by the current version. Run the `VACUUM` command after the deleted record is no longer referenced in the current snapshot, for example, `vacuum_max_snapshot_age_seconds` after the deletion. For more information on `VACUUM` with Athena and its syntax, see [VACUUM](https://docs.aws.amazon.com/athena/latest/ug/vacuum-statement.html).

      ```
      VACUUM iceberg-table-name
      ```
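The Athena statements in the steps above can also be submitted programmatically. The following is a minimal sketch that builds the soft-delete statement and (commented out) submits it through the Athena `StartQueryExecution` API; the table name, record-ID feature name, database name, and S3 output location are all assumptions you would replace with your own values:

```python
# Build the soft-delete statement from the Iceberg table and record-ID
# feature names (both illustrative).
def build_soft_delete_query(table: str, record_id_feature: str) -> str:
    return (
        f"DELETE FROM {table} WHERE {record_id_feature} IN "
        f"(SELECT {record_id_feature} FROM {table} WHERE is_deleted = 'True')"
    )

query = build_soft_delete_query("iceberg_table_name", "record_id")
print(query)

# With live credentials you could submit the statement through Athena:
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=query,
#     QueryExecutionContext={"Database": "sagemaker_featurestore"},  # assumption
#     ResultConfiguration={"OutputLocation": "s3://your-athena-results/"},  # assumption
# )
```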

### Apache Spark offline store soft and hard delete example
<a name="feature-store-delete-records-offline-store-spark"></a>

To soft and then hard delete a record from the `OfflineStore` Iceberg table using Apache Spark, you can follow the same instructions as in the [Amazon Athena offline store soft and hard delete example](#feature-store-delete-records-offline-store-athena) above, but using Spark procedures. For a full list of procedures, see [Spark Procedures](https://iceberg.apache.org/docs/1.3.1/spark-procedures/) in the Apache Iceberg documentation. 
+ When soft deleting from the `OfflineStore`: instead of using the `DELETE` command in Athena, use the [DELETE FROM](https://iceberg.apache.org/docs/latest/spark-writes/#delete-from) command in Apache Spark.
+ To remove the record from previous versions of your Iceberg tables and hard delete the record from the `OfflineStore`:
  + When changing your Iceberg table configuration: instead of using the `ALTER TABLE` command in Athena, use the [expire_snapshots](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#expire_snapshots) procedure.
  + To remove data files that are no longer needed from your Iceberg tables: instead of using the `VACUUM` command in Athena, use the [remove_orphan_files](https://iceberg.apache.org/docs/1.3.1/spark-procedures/#remove_orphan_files) procedure.

# Collection types
<a name="feature-store-collection-types"></a>

Collection types provide a way to organize and structure data for efficient retrieval and analysis. They are used in ML databases to define the schema of a dataset and its elements. In Amazon SageMaker Feature Store, the supported collection types include list, set, and vector. 

Collections are a grouping of elements in which each element within the collection must have the same feature type (`String`, `Integral`, or `Fractional`). For example, a collection can contain elements with all of the element feature types as `Fractional`, but a collection cannot contain elements with some feature types as `Fractional` and some feature types as `String`.

Only `InMemory` online store feature groups currently support collection types. The following list describes the collection type options.

**List**: An ordered collection of elements.
+ The length of the list is determined by how many elements are in the collection.
+ Example: You can have a list such as [‘a’, ‘b’, ‘a’], because the list preserves the order and can have repeat elements.

**Set**: An unordered collection of unique elements. 
+ The length of the set is determined by how many unique elements are in the collection.
+ Example: You cannot have a set such as [‘a’, ‘b’, ‘a’], because it contains a repeat element. The set will instead have the elements [‘a’, ‘b’], because the set only contains unique elements.

**Vector**: A specialized list that represents a fixed-size array of elements. The order of the elements holds significance, such that the positions of the elements represent certain properties of the data. 
+ The elements in the vector collection type *must* have the `Fractional` feature type. 
+ You may only have one vector collection type per online store `InMemory` tier feature group.
+ The dimension (number of elements in the vector) of the vector is predetermined by you and is specified using `VectorDimension`. The max dimension limit is 8192.
+ Example: You can have a vector such as [4.2, -6.3, 4.2], where the first, second, and third elements can represent the x, y, and z positions in physical space.

There are no limits on the length of the collections, as long as they don't exceed the maximum size of a record. For the maximum size of a record, see [Quotas, naming rules and data types](feature-store-quotas.md).
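As a sketch, the feature definitions for each collection type might be shaped like the following when creating an `InMemory` online store feature group. The feature names and the vector dimension are illustrative; the collection type is set through `CollectionType`, and the vector dimension through `CollectionConfig`:

```python
# Illustrative FeatureDefinitions for the CreateFeatureGroup API.
feature_definitions = [
    {"FeatureName": "record_id", "FeatureType": "String"},
    {"FeatureName": "event_time", "FeatureType": "String"},
    # List: ordered collection, repeat elements allowed.
    {"FeatureName": "recent_clicks", "FeatureType": "String", "CollectionType": "List"},
    # Set: unordered collection of unique elements.
    {"FeatureName": "categories", "FeatureType": "String", "CollectionType": "Set"},
    # Vector: fixed-size array; elements must be Fractional, and the
    # dimension is fixed up front via VectorConfig.
    {
        "FeatureName": "embedding",
        "FeatureType": "Fractional",
        "CollectionType": "Vector",
        "CollectionConfig": {"VectorConfig": {"Dimension": 128}},
    },
]

vector_defs = [d for d in feature_definitions if d.get("CollectionType") == "Vector"]
print(len(vector_defs))  # 1 (only one vector collection type per feature group)
```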

# Time to live (TTL) duration for records
<a name="feature-store-time-to-live"></a>

Amazon SageMaker Feature Store provides the option to hard delete records from the online store after a time duration is reached, using time to live (TTL) duration (`TtlDuration`). The record expires after the record’s `EventTime` plus the `TtlDuration` is reached, or `ExpiresAt` = `EventTime` + `TtlDuration`. The `TtlDuration` can be applied at the feature group level, where all records within the feature group have the `TtlDuration` by default, or at the individual record level. If `TtlDuration` is unspecified, the default value is `null` and the record remains in the online store until it is overwritten. 

A record deleted using `TtlDuration` is hard deleted, or completely removed from the online store, and the deleted record is added to the offline store. For more information on hard deletes and deletion modes, see [DeleteRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_DeleteRecord.html) in the Amazon SageMaker API Reference guide. When a record is hard deleted, it immediately becomes inaccessible through Feature Store APIs.

**Important**  
TTL typically deletes expired items within a few days. Depending on the size and activity level of a table, the actual delete operation of an expired item can vary. Because TTL is meant to be a background process, the nature of the capacity used to expire and delete items via TTL is variable (but free of charge). For more information on how items are deleted from a DynamoDB table, see [How it works: DynamoDB Time to Live (TTL)](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html).

`TtlDuration` must be a dictionary containing a `Unit` and a `Value`, where the `Unit` must be a string with values "Seconds", "Minutes", "Hours", "Days", or "Weeks" and `Value` must be an integer greater than or equal to 1. `TtlDuration` can be applied while using the `CreateFeatureGroup`, `UpdateFeatureGroup`, and `PutRecord` APIs. See the request and response syntax in the SDK for Python (Boto3) documentation for the [create_feature_group](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_feature_group.html#SageMaker.Client.create_feature_group), [update_feature_group](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/update_feature_group.html), and [put_record](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-featurestore-runtime/client/put_record.html#) APIs.
+ When `TtlDuration` is applied at a feature group level (using the `CreateFeatureGroup` or `UpdateFeatureGroup` APIs), the applied `TtlDuration` becomes the default `TtlDuration` for all records that are added to the feature group *from the point in time that the API is called*. When applying `TtlDuration` with the `UpdateFeatureGroup` API, this will *not* become the default `TtlDuration` for records that were created *before* the API is called.

  To remove the default `TtlDuration` from an existing feature group, use the `UpdateFeatureGroup` API and set the `TtlDuration` `Unit` and `Value` to `null`.
+ When `TtlDuration` is applied at a record level (for example, using `PutRecord` API), the `TtlDuration` duration applies to that record and is used instead of the feature group level default `TtlDuration`.
+ When `TtlDuration` is applied at a feature group level, it may take a few minutes for the `TtlDuration` to take effect.
+ If `TtlDuration` is used when there is no online store, you will receive a `Validation Exception (400)` error. 
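For the record-level case, a `TtlDuration` can be attached to a single `PutRecord` request; a minimal sketch (the feature group name, feature names, and values are illustrative):

```python
# Record-level TtlDuration for the PutRecord API: this record expires
# two days after its event time, overriding any feature-group default.
put_record_request = {
    "FeatureGroupName": "feature-group-name",
    "Record": [
        {"FeatureName": "record_id", "ValueAsString": "42"},
        {"FeatureName": "event_time", "ValueAsString": "2024-01-01T00:00:00Z"},
    ],
    "TtlDuration": {"Unit": "Days", "Value": 2},
}
print(put_record_request["TtlDuration"])

# With live credentials:
# import boto3
# boto3.client("sagemaker-featurestore-runtime").put_record(**put_record_request)
```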

The following example code shows how to apply `TtlDuration` while updating a feature group, such that records added to the feature group *after running the API* will, by default, expire four weeks after their event times.

```
import boto3

sagemaker_client = boto3.client("sagemaker")
feature_group_name = '<YOUR_FEATURE_GROUP_NAME>'

sagemaker_client.update_feature_group(
    FeatureGroupName=feature_group_name,
    OnlineStoreConfig={
        "TtlDuration": {
            "Unit": "Weeks",
            "Value": 4
        }
    }
)
```

You can use the `DescribeFeatureGroup` API to view the default `TtlDuration`. 

To view the expiration times, `ExpiresAt` (in UTC time, ISO-8601 format), while using the `GetRecord` or `BatchGetRecord` APIs, you must set `ExpirationTimeResponse` to `ENABLED`. See the request and response syntax in the SDK for Python (Boto3) documentation for the [describe_feature_group](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/describe_feature_group.html#describe-feature-group), [get_record](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-featurestore-runtime/client/get_record.html#get-record), and [batch_get_record](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-featurestore-runtime/client/batch_get_record.html#batch-get-record) APIs.
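As a sketch, a `GetRecord` request with expiration times enabled might be shaped as follows (the feature group name and record identifier are illustrative):

```python
# Illustrative GetRecord request; ExpirationTimeResponse must be ENABLED
# for the response to include the record's ExpiresAt time.
get_record_request = {
    "FeatureGroupName": "feature-group-name",
    "RecordIdentifierValueAsString": "42",
    "ExpirationTimeResponse": "ENABLED",
}
print(get_record_request["ExpirationTimeResponse"])

# With live credentials:
# import boto3
# response = boto3.client("sagemaker-featurestore-runtime").get_record(
#     **get_record_request)
# print(response.get("ExpiresAt"))  # UTC time, ISO-8601 format
```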

# Feature Store storage configurations
<a name="feature-store-storage-configurations"></a>

Amazon SageMaker Feature Store consists of an online store and an offline store. The online store enables real-time lookup of features for inference, while the offline store contains historical data for model training and batch inference. When creating a feature group, you have the option of enabling either the online store, offline store, or both. When you enable both, they sync to avoid discrepancies between training and serving data. For more information about the online and offline stores and other Feature Store concepts, see [Feature Store concepts](feature-store-concepts.md).

The following topics discuss online store storage types and offline store table formats. 

**Topics**
+ [Online store](feature-store-storage-configurations-online-store.md)
+ [Offline store](feature-store-storage-configurations-offline-store.md)
+ [Throughput modes](feature-store-throughput-mode.md)

# Online store
<a name="feature-store-storage-configurations-online-store"></a>

The online store is a low-latency, high-availability data store that provides real-time lookup of features. It is typically used for machine learning (ML) model serving. You can choose between the standard online store (`Standard`) and the in-memory tier online store (`InMemory`) at the point when you create a feature group. This way, you can select the storage type that best matches the read and write patterns of a particular application, while considering performance and cost. For more details about pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

The online store contains the following `StorageType` options. For more information about the online store contents, see [OnlineStoreConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OnlineStoreConfig.html). 

## Standard tier storage type
<a name="feature-store-storage-configurations-online-store-standard-tier"></a>

The `Standard` tier is a managed low-latency data store for online store feature groups. It provides fast data retrieval for ML model serving in your applications. `Standard` is the default storage type.

## In-memory tier storage type
<a name="feature-store-storage-configurations-online-store-in-memory-tier"></a>

The `InMemory` tier is a managed data store for online store feature groups that supports very low-latency retrieval. It provides large-scale real-time data retrieval for ML model serving used for high throughput applications. The `InMemory` tier is powered by Amazon ElastiCache (Redis OSS). For more information, see [What is Amazon ElastiCache (Redis OSS)?](https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/WhatIs.html).

The online store `InMemory` tier supports collection types, namely list, set, and vector. For more information about the `InMemory` collection types, see [Collection types](feature-store-collection-types.md).

Feature Store provides low-latency reads and writes to the online store. Application latency is made up of two primary components: infrastructure or network latency, and Feature Store API latency. Reducing network latency helps you get the lowest-latency reads and writes to Feature Store. You can reduce the network latency to Feature Store by deploying AWS PrivateLink to the Feature Store Runtime endpoint. With AWS PrivateLink, you can privately access all Feature Store Runtime API operations from your Amazon Virtual Private Cloud (VPC) in a scalable manner by using interface VPC endpoints. An AWS PrivateLink deployment with the `privateDNSEnabled` option set to true:
+ Keeps all Feature Store read/write traffic within your VPC.
+ Keeps traffic in the same Availability Zone (AZ) as the client that originated it. This avoids the “hops” between AZs, reducing the network latency.

Follow the steps in [Access an AWS service using an interface VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html) to set up AWS PrivateLink to Feature Store. The service name for Feature Store Runtime in AWS PrivateLink is `com.amazonaws.region.sagemaker.featurestore-runtime`.

The `InMemory` tier online store scales automatically based on storage usage and request volume. The automated scaling can take a few minutes to adapt to a rapidly changing usage pattern. During automated scaling:
+ Write operations to the feature group may receive throttling errors. You should retry your requests a few minutes later.
+ Read operations to the feature group may receive throttling errors. Standard retry strategies are suitable in this case.
+ Read operations may see elevated latency.
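A simple capped exponential backoff with jitter around your reads and writes is usually enough to ride out a scaling event. The following is a minimal sketch; the helper name, retry counts, and delays are illustrative choices, not service recommendations:

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.1, is_throttle=None):
    """Retry `operation` on throttling errors, sleeping with capped
    exponential backoff plus jitter between attempts."""
    if is_throttle is None:
        # By default, treat any exception type containing "Throttling" as retryable.
        is_throttle = lambda exc: "Throttling" in type(exc).__name__
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            time.sleep(min(2 ** attempt * base_delay, 5.0) * random.random())

# Example: an operation that throttles twice, then succeeds.
class ThrottlingException(Exception):
    pass

attempts = {"n": 0}
def flaky_put_record():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottlingException()
    return "ok"

result = call_with_backoff(flaky_put_record, base_delay=0.01)
print(result)  # ok
```

In production code, the boto3 client's built-in retry configuration can serve the same purpose.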

The default `InMemory` tier feature group maximum size is 50 GiB.

Note that the `InMemory` tier currently supports online-only feature groups, not combined online and offline feature groups, so there is no replication between the online and offline stores for the `InMemory` tier. Also, the `InMemory` tier does not currently support customer managed KMS keys.

# Offline store
<a name="feature-store-storage-configurations-offline-store"></a>

The offline store is used for historical data when sub-second retrieval is not needed. It is typically used for data exploration, model training, and batch inference. 

When you enable both the online and offline stores for your feature group, both stores sync to avoid discrepancies between training and serving data. Note that an online store feature group with the `InMemory` storage type enabled does not currently support a corresponding feature group in the offline store (no online-to-offline replication). For more information about ML model serving in Amazon SageMaker Feature Store, see [Online store](feature-store-storage-configurations-online-store.md).

The offline store contains the following `TableFormat` options. For information about the offline store contents, see [OfflineStoreConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OfflineStoreConfig.html) in the Amazon SageMaker API Reference.

## Glue table format
<a name="feature-store-storage-configurations-offline-store-glue-table-format"></a>

The `Glue` format (default) is a standard Hive type table format for AWS Glue. With AWS Glue, you can discover, prepare, move, and integrate data from multiple sources. It also includes additional productivity and data ops tooling for authoring, running jobs, and implementing business workflows. For more information about AWS Glue, see [What is AWS Glue?](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html).

## Iceberg table format
<a name="feature-store-storage-configurations-offline-store-iceberg-table-format"></a>

The `Iceberg` format (recommended) is an open table format for very large analytic tables. With `Iceberg`, you can compact the small data files into fewer large files in the partition, resulting in significantly faster queries. This compaction operation is concurrent and does not affect ongoing read and write operations on the feature group. For more information about optimizing Iceberg tables, see the [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-data-optimization.html) and [AWS Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/data-compaction.html) user guides.

`Iceberg` manages large collections of files as tables and supports modern analytical data lake operations. If you choose the `Iceberg` option when creating new feature groups, Amazon SageMaker Feature Store creates the `Iceberg` tables using Parquet file format, and registers the tables with the AWS Glue Data Catalog. For more information about `Iceberg` table formats, see [Using Apache Iceberg tables](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg.html). 

**Important**  
Note that for feature groups in `Iceberg` table format, you must specify `String` as the feature type for the event time. If you specify any other type, you can't create the feature group successfully.

# Throughput modes
<a name="feature-store-throughput-mode"></a>

Amazon SageMaker Feature Store provides two pricing models to choose from: on-demand (`On-demand`) and provisioned (`Provisioned`) throughput modes. `On-demand` works best for less predictable traffic, while `Provisioned` works best for consistent and predictable traffic. 

You have the option to switch between `On-demand` and `Provisioned` throughput modes for a given feature group, to accommodate periods in which application traffic patterns are changing or less predictable. You can only update your feature group throughput mode to `On-demand` once in a 24-hour period. The throughput mode can be updated programmatically using the [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) API or through the console UI. For more information about using the console, see [Using Amazon SageMaker Feature Store in the console](feature-store-use-with-studio.md).

You can use the `Provisioned` throughput mode with offline-only feature groups or feature groups with the `Standard` storage type. For other storage configurations, the `On-demand` throughput mode is used. For information about the online and offline storage configurations, see [Online store](feature-store-storage-configurations-online-store.md) and [Offline store](feature-store-storage-configurations-offline-store.md), respectively.

For more details about pricing, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

**Topics**
+ [On-demand throughput mode](#feature-store-throughput-mode-on-demand)
+ [Provisioned throughput mode](#feature-store-throughput-mode-provisioned)
+ [Throughput mode metrics](#feature-store-throughput-mode-metrics)
+ [Throughput mode limits](#feature-store-throughput-mode-limits)

## On-demand throughput mode
<a name="feature-store-throughput-mode-on-demand"></a>

The `On-demand` (default) throughput mode works best when you are using feature groups with an unknown workload or unpredictable application traffic, and you cannot forecast the capacity requirements.

The `On-demand` mode charges you for the reads and writes that your application performs on your feature groups. You do not need to specify how much read and write throughput you expect your application to perform because Feature Store instantly accommodates your workloads as they ramp up or down. You pay only for what you use, which is measured in `ReadRequestsUnits` and `WriteRequestsUnits`.

You can enable the `On-demand` throughput mode using the [CreateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateFeatureGroup.html) or [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) APIs or through the console UI. For more information about using the console UI, see [Using Amazon SageMaker Feature Store in the console](feature-store-use-with-studio.md).
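As a sketch, an `On-demand` feature group creation request might be shaped as follows. The feature group details are illustrative, and the throughput mode is set through the request's `ThroughputConfig`:

```python
# Illustrative CreateFeatureGroup request fragment with On-demand throughput.
create_feature_group_request = {
    "FeatureGroupName": "feature-group-name",
    "RecordIdentifierFeatureName": "record_id",
    "EventTimeFeatureName": "event_time",
    "FeatureDefinitions": [
        {"FeatureName": "record_id", "FeatureType": "String"},
        {"FeatureName": "event_time", "FeatureType": "String"},
    ],
    "OnlineStoreConfig": {"EnableOnlineStore": True},
    "ThroughputConfig": {"ThroughputMode": "OnDemand"},
}
print(create_feature_group_request["ThroughputConfig"])

# With live credentials:
# import boto3
# boto3.client("sagemaker").create_feature_group(**create_feature_group_request)
```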

**Important**  
You can only update your feature group throughput mode to `On-demand` once in a 24 hour period.

## Provisioned throughput mode
<a name="feature-store-throughput-mode-provisioned"></a>

The `Provisioned` throughput mode works best when you are using feature groups with predictable workloads and you can forecast the capacity requirements to control costs. This can make it more cost effective for certain workloads where you can anticipate throughput requirements in advance.

When you set a feature group to `Provisioned` mode, you specify capacity units which are the maximum amount of capacity that an application can consume from a feature group. If your application exceeds this `Provisioned` throughput capacity, it is subject to request throttling.

The following includes information about the read and write capacity units. 
+ Retrieving a single record of up to 4 KB using the `GetRecord` API consumes *at least* 1 RCU (read capacity unit); retrieving larger payloads consumes more. The total number of read capacity units required depends on the item size, including small per-record metadata added by the Feature Store service. 
+ A single write request with a payload of 1 KB using the `PutRecord` API consumes *at least* 1 WCU (write capacity unit), with fractional payloads rounded up to the nearest KB. It may consume more depending on the event time, the deletion status of the record, and the time to live (TTL) status. For more information about TTL, see [Time to live (TTL) duration for records](feature-store-time-to-live.md).
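The capacity arithmetic above can be sketched as follows. This is a lower-bound estimate only; as noted, actual consumption can be higher because of per-record metadata and other factors:

```python
import math

def min_read_capacity_units(record_size_kb: float) -> int:
    """Reads consume at least 1 RCU per 4 KB of record, rounded up."""
    return max(1, math.ceil(record_size_kb / 4))

def min_write_capacity_units(payload_size_kb: float) -> int:
    """Writes consume at least 1 WCU per 1 KB of payload,
    with fractional payloads rounded up to the nearest KB."""
    return max(1, math.ceil(payload_size_kb))

print(min_read_capacity_units(3.5))   # 1 (fits within one 4 KB read unit)
print(min_read_capacity_units(9.0))   # 3
print(min_write_capacity_units(0.4))  # 1 (fractional payloads round up)
print(min_write_capacity_units(2.2))  # 3
```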

**Important**  
When setting your capacity units, consider the following:  
+ You will be charged for the read and write capacities that you provision for your feature group, even if you do not fully utilize the `Provisioned` capacity.
+ If you set a read or write capacity too low, your requests may experience throttling.
+ In some cases, records may consume an extra capacity unit due to record-level metadata that the Feature Store service adds to enable various features.
+ Retrieving only a subset of features using the `GetRecord` or `BatchGetRecord` APIs still consumes RCUs corresponding to the entire record.
+ For write capacity, provision 2x the recent peak capacity to avoid throttling when performing backfills or bulk ingestion that may result in a large number of historical record writes, because writing historical records consumes additional write capacity.
+ Feature Store does not currently support auto scaling for `Provisioned` mode. 

You can enable the `Provisioned` throughput mode using the [CreateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateFeatureGroup.html) or [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) APIs or through the console UI. For more information about using the console UI, see [Using Amazon SageMaker Feature Store in the console](feature-store-use-with-studio.md).

The following describes how you can increase or decrease the RCU and WCU throughput for your feature groups when `Provisioned` mode is enabled. 

**Increasing provisioned throughput**

You can increase RCU or WCU as often as needed using the [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) API or the console UI. 

**Decreasing provisioned throughput**

You can decrease RCU and WCU (or both) for feature groups using [UpdateFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureGroup.html) API or the console UI. 

There is a default quota on the number of `Provisioned` capacity decreases you can perform on your feature group per day. A day is defined according to Coordinated Universal Time (UTC). On a given day, you can start by performing up to four decreases within one hour, as long as you have not performed any other decreases yet during that day. Subsequently, you can perform one additional decrease per hour as long as there were no decreases in the preceding hour. This effectively brings the maximum number of decreases in a day to 27 (4 decreases in the first hour, and 1 decrease for each of the subsequent 1-hour windows in a day).
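As a sketch, an `UpdateFeatureGroup` request that adjusts provisioned capacity might be shaped as follows (the capacity unit values are examples, not recommendations):

```python
# Illustrative UpdateFeatureGroup request adjusting Provisioned throughput.
update_request = {
    "FeatureGroupName": "feature-group-name",
    "ThroughputConfig": {
        "ThroughputMode": "Provisioned",
        "ProvisionedReadCapacityUnits": 100,
        "ProvisionedWriteCapacityUnits": 50,
    },
}
print(update_request["ThroughputConfig"])

# With live credentials:
# import boto3
# boto3.client("sagemaker").update_feature_group(**update_request)
```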

## Throughput mode metrics
<a name="feature-store-throughput-mode-metrics"></a>

A feature group in `On-demand` mode will emit `ConsumedReadRequestsUnits` and `ConsumedWriteRequestsUnits` metrics. A feature group in `Provisioned` mode will emit `ConsumedReadCapacityUnits` and `ConsumedWriteCapacityUnits` metrics. For more information about Feature Store metrics, see [Amazon SageMaker Feature Store metrics](monitoring-cloudwatch.md#cloudwatch-metrics-feature-store).

## Throughput mode limits
<a name="feature-store-throughput-mode-limits"></a>

Each AWS account has default service quotas or limits that are applied to help ensure availability and manage billing risks. For information about the default quotas and limits, see [Quotas, naming rules and data types](feature-store-quotas.md).

In some cases, these limits may be lower than what is stated in the documentation. If you need higher limits, you can submit a request for an increase. It's a good idea to do so before reaching current limits to avoid interruptions to your work. For information about service quotas and how to request a quota increase, see [AWS service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

# Data sources and ingestion
<a name="feature-store-ingest-data"></a>

Records are added to your feature groups through ingestion. Whether and where ingested records are retained depends on your feature group's storage configuration: whether it uses the offline store, the online store, or both. The offline store serves as a historical database and is typically used for data exploration, machine learning (ML) model training, and batch inference. The online store serves real-time lookups of records and is typically used for ML model serving. For more information on Feature Store concepts and ingestion, see [Feature Store concepts](feature-store-concepts.md).

There are multiple ways to bring your data into Amazon SageMaker Feature Store. Feature Store offers a single API call for data ingestion called `PutRecord` that enables you to ingest data in batches or from streaming sources. You can use Amazon SageMaker Data Wrangler to engineer features and then ingest your features into your Feature Store. You can also use Amazon EMR for batch data ingestion through a Spark connector.

The following topics describe these ingestion options in more detail.

**Topics**
+ [Stream ingestion](#feature-store-ingest-data-stream)
+ [Data Wrangler with Feature Store](#feature-store-data-wrangler-integration)
+ [Batch ingestion with Amazon SageMaker Feature Store Spark](batch-ingestion-spark-connector-setup.md)

## Stream ingestion
<a name="feature-store-ingest-data-stream"></a>

You can use streaming sources such as Kafka or Kinesis as a data source, where records are extracted from, and feed records directly to the online store for training, inference, or feature creation. Records can be ingested into your feature group with the synchronous `PutRecord` API call. Because the API call is synchronous, you can push small batches of updates in a single call. This enables you to maintain high freshness of the feature values and to publish values as soon as an update is detected. These are also called *streaming* features. 
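As a rough sketch, a single streaming update could be pushed with the synchronous `PutRecord` call. The feature group name and the feature names and values below are placeholders, and the `boto3` call itself is shown commented out because it requires AWS credentials.

```python
# Sketch of a synchronous PutRecord streaming update. Each feature is a
# FeatureName/ValueAsString pair; all names and values here are placeholders.
record = [
    {"FeatureName": "RecordIdentifier", "ValueAsString": "1"},
    {"FeatureName": "EventTime", "ValueAsString": "2021-03-02T12:20:12Z"},
    {"FeatureName": "product_views", "ValueAsString": "42"},
]

# With credentials configured, the record could be ingested as:
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(FeatureGroupName="my-feature-group", Record=record)

print(len(record))
```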

## Data Wrangler with Feature Store
<a name="feature-store-data-wrangler-integration"></a>

Data Wrangler is a feature of Studio Classic that provides an end-to-end solution to import, prepare, transform, featurize, and analyze data. Data Wrangler enables you to engineer your features and ingest them into your online or offline store feature groups.

The following instructions export a Jupyter notebook that contains all of the source code needed to create a Feature Store feature group that adds your features from Data Wrangler to an online or offline store.

The instructions for exporting your Data Wrangler data flow to Feature Store in the console vary depending on whether you enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### Export your Data Wrangler data flow to Feature Store if Studio is your default experience (console)
<a name="feature-store-ingest-data-wrangler-integration-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left panel to expand the dropdown list.

1. From the dropdown list, choose **Data Wrangler**.

1. If you have an instance of Amazon SageMaker Canvas already running, choose **Open Canvas**.

   If you don't have an instance of SageMaker Canvas running, choose **Run in Canvas**.

1. On the SageMaker Canvas console, choose **Data Wrangler** in the left navigation pane.

1. Choose **Data flows** to view your data flows.

1. Choose your data flow to expand the dropdown list.

1. Choose **Export data flow** to expand the dropdown list.

1. Choose **Save to SageMaker Feature Store (via JupyterLab Notebook)**.

1. Under **Export data flow as notebook**, choose one of the following options:
   + **Download a local copy** to download the data flow to your local machine.
   + **Export to S3 location** to save the data flow to an Amazon Simple Storage Service (Amazon S3) location. Enter the Amazon S3 location or choose **Browse** to find it.

1. Choose **Export**.

### Export your Data Wrangler data flow to Feature Store if Studio Classic is your default experience (console)
<a name="feature-store-ingest-data-wrangler-integration-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Data Wrangler**.

1. Choose your workflow.

1. Choose the **Export** tab.

1. Choose **Export Step**.

1. Choose **Feature Store**.

 After the feature group has been created, you can also select and join data across multiple feature groups to create new engineered features in Data Wrangler and then export your data set to an Amazon S3 bucket. 

For more information on how to export to Feature Store, see [Export to SageMaker AI Feature Store](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-data-export.html#data-wrangler-data-export-feature-store). 

# Batch ingestion with Amazon SageMaker Feature Store Spark
<a name="batch-ingestion-spark-connector-setup"></a>

Amazon SageMaker Feature Store Spark is a Spark connector that connects the Spark library to Feature Store. Feature Store Spark simplifies data ingestion from Spark `DataFrame`s to feature groups. Feature Store supports batch data ingestion with Spark, using your existing ETL pipeline, on Amazon EMR, in an AWS Glue interactive session (GIS), in an AWS Glue job, in an Amazon SageMaker Processing job, or in a SageMaker notebook.

Methods for installing and implementing batch data ingestion are provided for Python and Scala developers. Python developers can use the open-source `sagemaker-feature-store-pyspark` Python library for local development, installation on Amazon EMR, and for Jupyter Notebooks by following the instructions in the [Amazon SageMaker Feature Store Spark GitHub repository](https://github.com/aws/sagemaker-feature-store-spark). Scala developers can use the Feature Store Spark connector available in the [Amazon SageMaker Feature Store Spark SDK Maven central repository](https://mvnrepository.com/artifact/software.amazon.sagemaker.featurestore/sagemaker-feature-store-spark-sdk).

You can use the Spark connector to ingest data in the following ways, depending on whether the online store, the offline store, or both are enabled.

1. Ingest by default – If the online store is enabled, the Spark connector first ingests your dataframe into the online store using the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) API. Only the record with the largest event time remains in the online store. If the offline store is enabled, Feature Store ingests your dataframe into the offline store within 15 minutes. For more information about how the online and offline stores work, see [Feature Store concepts](feature-store-concepts.md).

   You can accomplish this by not specifying `target_stores` in the `.ingest_data(...)` method. 

1. Offline store direct ingestion – If the offline store is enabled, the Spark connector batch ingests your dataframe directly into the offline store. Ingesting the dataframe directly into the offline store doesn't update the online store.

   You can accomplish this by setting `target_stores=["OfflineStore"]` in the `.ingest_data(...)` method.

1. Online store only – If the online store is enabled, the Spark connector ingests your dataframe into the online store using the [PutRecord](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html) API. Ingesting the dataframe directly into the online store doesn't update the offline store. 

   You can accomplish this by setting `target_stores=["OnlineStore"]` in the `.ingest_data(...)` method.

For information about using the different ingestion methods, see [Example implementations](#batch-ingestion-spark-connector-example-implementations).

**Topics**
+ [Feature Store Spark installation](#batch-ingestion-spark-connector-installation)
+ [Retrieving the JAR for Feature Store Spark](#retrieve-jar-spark-connector)
+ [Example implementations](#batch-ingestion-spark-connector-example-implementations)

## Feature Store Spark installation
<a name="batch-ingestion-spark-connector-installation"></a>

 **Scala users** 

The Feature Store Spark SDK is available in the [Amazon SageMaker Feature Store Spark SDK Maven central repository](https://mvnrepository.com/artifact/software.amazon.sagemaker.featurestore/sagemaker-feature-store-spark-sdk) for Scala users.

**Requirements**
+ Spark >= 3.0.0 and <= 3.3.0
+ `iceberg-spark-runtime` >= 0.14.0
+ Scala >= 2.12.x  
+  Amazon EMR >= 6.1.0 (only if you are using Amazon EMR) 

 **Declare the dependency in POM.xml** 

The Feature Store Spark connector depends on the `iceberg-spark-runtime` library. If you're ingesting data into a feature group that you've auto-created with the Iceberg table format, you must add the corresponding version of the `iceberg-spark-runtime` library as a dependency. For example, if you're using Spark 3.1, you must declare the following in your project’s `POM.xml`: 

```
<dependency>
    <groupId>software.amazon.sagemaker.featurestore</groupId>
    <artifactId>sagemaker-feature-store-spark-sdk_2.12</artifactId>
    <version>1.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.1_2.12</artifactId>
    <version>0.14.0</version>
</dependency>
```

 **Python users** 

The Feature Store Spark SDK is available in the open-source [Amazon SageMaker Feature Store Spark GitHub repository](https://github.com/aws/sagemaker-feature-store-spark).

**Requirements**
+ Spark >= 3.0.0 and <= 3.3.0
+ Amazon EMR >= 6.1.0 (only if you are using Amazon EMR) 
+ Kernel = `conda_python3`

We recommend setting `$SPARK_HOME` to the directory where Spark is installed. During installation, Feature Store uploads the required JAR to `SPARK_HOME` so that the dependencies load automatically. This PySpark library requires Spark to start a JVM in order to work.

 **Local installation** 

For more information about the installation, enable verbose mode by appending `--verbose` to the following installation command. 

```
pip3 install sagemaker-feature-store-pyspark-3.1 --no-binary :all:
```

 **Installation on Amazon EMR** 

Create an Amazon EMR cluster with the release version 6.1.0 or later. Enable SSH to help you troubleshoot any issues.

You can do one of the following to install the library:
+ Create a custom step within Amazon EMR.
+ Connect to your cluster using SSH and install the library from there.

**Note**  
The following information uses Spark version 3.1, but you can specify any version that meets the requirements.

```
export SPARK_HOME=/usr/lib/spark
sudo -E pip3 install sagemaker-feature-store-pyspark-3.1 --no-binary :all: --verbose
```

**Note**  
If you want the dependent JARs to be installed automatically to `SPARK_HOME`, do not use the bootstrap step.

 **Installation on a SageMaker notebook instance** 

Install a version of PySpark that's compatible with the Spark connector using the following commands:

```
!pip3 install pyspark==3.1.1 
!pip3 install sagemaker-feature-store-pyspark-3.1 --no-binary :all:
```

If you're performing batch ingestion to the offline store, the dependencies aren't within the notebook instance environment, so you must add the required JARs to your Spark session, as in the following example.

```
from pyspark.sql import SparkSession
import feature_store_pyspark

extra_jars = ",".join(feature_store_pyspark.classpath_jars())

spark = SparkSession.builder \
    .config("spark.jars", extra_jars) \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.1,org.apache.hadoop:hadoop-common:3.2.1") \
    .getOrCreate()
```

 **Installation on notebooks with GIS** 

**Important**  
You must use AWS Glue Version 2.0 or later.

Use the following information to help you install the PySpark connector in an AWS Glue Interactive Session (GIS).

Amazon SageMaker Feature Store Spark requires a specific Spark connector JAR to be uploaded to your Amazon S3 bucket during the initialization of the session. For more information on uploading the required JAR to your S3 bucket, see [Retrieving the JAR for Feature Store Spark](#retrieve-jar-spark-connector).

After you’ve uploaded the JAR, you must provide the GIS sessions with the JAR using the following command. 

```
%extra_jars s3://<YOUR_BUCKET>/spark-connector-jars/sagemaker-feature-store-spark-sdk.jar
```

To install Feature Store Spark in the AWS Glue runtime, use the `%additional_python_modules` magic command within the GIS notebook. AWS Glue runs `pip install` on the modules that you’ve specified under `%additional_python_modules`.

```
%additional_python_modules sagemaker-feature-store-pyspark-3.1
```

Before you start the AWS Glue session, you must use both of the preceding magic commands.

 **Installation on an AWS Glue job** 

**Important**  
You must use AWS Glue Version 2.0 or later.

To install the Spark connector on an AWS Glue job, use the `--extra-jars` argument to provide the necessary JARs and the `--additional-python-modules` argument to install the Spark connector as job parameters when you create the AWS Glue job, as shown in the following example. For more information on uploading the required JAR to your S3 bucket, see [Retrieving the JAR for Feature Store Spark](#retrieve-jar-spark-connector).

```
glue_client = boto3.client('glue', region_name=region)
response = glue_client.create_job(
    Name=pipeline_id,
    Description='Feature Store Compute Job',
    Role=glue_role_arn,
    ExecutionProperty={'MaxConcurrentRuns': max_concurrent_run},
    Command={
        'Name': 'glueetl',
        'ScriptLocation': script_location_uri,
        'PythonVersion': '3'
    },
    DefaultArguments={
        '--TempDir': temp_dir_location_uri,
        '--additional-python-modules': 'sagemaker-feature-store-pyspark-3.1',
        '--extra-jars': "s3://<YOUR_BUCKET>/spark-connector-jars/sagemaker-feature-store-spark-sdk.jar",
        ...
    },
    MaxRetries=3,
    NumberOfWorkers=149,
    Timeout=2880,
    GlueVersion='3.0',
    WorkerType='G.2X'
)
```

 **Installation on an Amazon SageMaker Processing job** 

To use Feature Store Spark with Amazon SageMaker Processing jobs, bring your own image. For more information about bringing your own image, see [Custom Images in Amazon SageMaker Studio Classic](studio-byoi.md). Add the installation step to a Dockerfile. After you've pushed the Docker image to an Amazon ECR repository, you can use the `PySparkProcessor` to create the processing job. For more information about creating a processing job with the PySpark processor, see [Run a Processing Job with Apache Spark](use-spark-processing-container.md).

The following is an example of adding an installation step to the Dockerfile.

```
FROM <ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/sagemaker-spark-processing:3.1-cpu-py38-v1.0

RUN /usr/bin/python3 -m pip install sagemaker-feature-store-pyspark-3.1 --no-binary :all: --verbose
```

## Retrieving the JAR for Feature Store Spark
<a name="retrieve-jar-spark-connector"></a>

To retrieve the Feature Store Spark dependency JAR, you must install the Spark connector from the Python Package Index (PyPI) repository using `pip` in any Python environment with network access. A SageMaker Jupyter Notebook is an example of a Python environment with network access.

The following command installs the Spark connector.

```
!pip install sagemaker-feature-store-pyspark-3.1      
```

After you've installed Feature Store Spark, you can retrieve the JAR location and upload the JAR to Amazon S3.

The `feature-store-pyspark-dependency-jars` command provides the location of the necessary dependency JAR that Feature Store Spark added. You can use the command to retrieve the JAR and upload it to Amazon S3.

```
import boto3

jar_location = !feature-store-pyspark-dependency-jars
jar_location = jar_location[0]

s3_client = boto3.client("s3")
s3_client.upload_file(jar_location, "<YOUR_BUCKET>","spark-connector-jars/sagemaker-feature-store-spark-sdk.jar")
```

## Example implementations
<a name="batch-ingestion-spark-connector-example-implementations"></a>

------
#### [ Example Python script ]

 *FeatureStoreBatchIngestion.py* 

```
from pyspark.sql import SparkSession
from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager
import feature_store_pyspark

spark = SparkSession.builder \
                    .getOrCreate()

# Construct test DataFrame
columns = ["RecordIdentifier", "EventTime"]
data = [("1","2021-03-02T12:20:12Z"), ("2", "2021-03-02T12:20:13Z"), ("3", "2021-03-02T12:20:14Z")]

df = spark.createDataFrame(data).toDF(*columns)

# Initialize FeatureStoreManager with a role arn if your feature group is created by another account
feature_store_manager= FeatureStoreManager("arn:aws:iam::111122223333:role/role-arn")
 
# Load the feature definitions from input schema. The feature definitions can be used to create a feature group
feature_definitions = feature_store_manager.load_feature_definitions_from_schema(df)

feature_group_arn = "arn:aws:sagemaker:<AWS_REGION>:<ACCOUNT_ID>:feature-group/<YOUR_FEATURE_GROUP_NAME>"

# Ingest by default. The connector will leverage PutRecord API to ingest your data in stream
# https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html
feature_store_manager.ingest_data(input_data_frame=df, feature_group_arn=feature_group_arn)

# To select the target stores for ingestion, specify the target stores as a parameter
# If OnlineStore is selected, the connector will leverage PutRecord API to ingest your data in stream
feature_store_manager.ingest_data(input_data_frame=df, feature_group_arn=feature_group_arn, target_stores=["OfflineStore", "OnlineStore"])

# If only OfflineStore is selected, the connector will batch write the data to offline store directly
feature_store_manager.ingest_data(input_data_frame=df, feature_group_arn=feature_group_arn, target_stores=["OfflineStore"])

# Retrieve the records that failed to be ingested by the Spark connector
failed_records_df = feature_store_manager.get_failed_stream_ingestion_data_frame()
```

 **Submit a Spark job with example Python script** 

The PySpark version requires an extra dependent JAR to be imported, so extra steps are needed to run the Spark application. 

If you did not specify `SPARK_HOME` during installation, then you have to load the required JARs into the JVM when running `spark-submit`. `feature-store-pyspark-dependency-jars` is a Python script installed by the Spark library that automatically fetches the path to all JARs for you. 

```
spark-submit --jars `feature-store-pyspark-dependency-jars` FeatureStoreBatchIngestion.py
```

If you are running this application on Amazon EMR, we recommend that you run the application in client mode, so that you do not need to distribute the dependent JARs to the other task nodes. Add one more step in the Amazon EMR cluster with a Spark argument similar to the following:

```
spark-submit --deploy-mode client --master yarn s3://<PATH_TO_SCRIPT>/FeatureStoreBatchIngestion.py
```

------
#### [ Example Scala script ]

 *FeatureStoreBatchIngestion.scala* 

```
import software.amazon.sagemaker.featurestore.sparksdk.FeatureStoreManager
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object TestSparkApp {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession.builder().getOrCreate()

    // Construct test DataFrame
    val data = List(
      Row("1", "2021-07-01T12:20:12Z"),
      Row("2", "2021-07-02T12:20:13Z"),
      Row("3", "2021-07-03T12:20:14Z")
    )
    
    val schema = StructType(
      List(StructField("RecordIdentifier", StringType), StructField("EventTime", StringType))
    )

    val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
    
    // Initialize FeatureStoreManager with a role arn if your feature group is created by another account
    val featureStoreManager = new FeatureStoreManager("arn:aws:iam::111122223333:role/role-arn")
    
    // Load the feature definitions from input schema. The feature definitions can be used to create a feature group
    val featureDefinitions = featureStoreManager.loadFeatureDefinitionsFromSchema(df)

    val featureGroupArn = "arn:aws:sagemaker:<AWS_REGION>:<ACCOUNT_ID>:feature-group/<YOUR_FEATURE_GROUP_NAME>"
   
    // Ingest by default. The connector will leverage PutRecord API to ingest your data in stream
    // https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_feature_store_PutRecord.html
    featureStoreManager.ingestData(df, featureGroupArn)
    
    // To select the target stores for ingestion, specify the target stores as a parameter
    // If OnlineStore is selected, the connector will leverage PutRecord API to ingest your data in stream
    featureStoreManager.ingestData(df, featureGroupArn, List("OfflineStore", "OnlineStore"))
    
    // If only OfflineStore is selected, the connector will batch write the data to offline store directly
    featureStoreManager.ingestData(df, featureGroupArn, List("OfflineStore"))
    
    // To retrieve the records failed to be ingested by spark connector
    val failedRecordsDf = featureStoreManager.getFailedStreamIngestionDataFrame()
  }
}
```

 **Submit a Spark job** 

 **Scala** 

You can use Feature Store Spark as a normal dependency. No extra instructions are needed to run the application on any platform. 

------

# Feature Processing
<a name="feature-store-feature-processing"></a>

Amazon SageMaker Feature Store Feature Processing is a capability with which you can transform raw data into machine learning (ML) features. It provides a Feature Processor SDK with which you can transform and ingest data from batch data sources into your feature groups. With this capability, Feature Store takes care of the underlying infrastructure, including provisioning the compute environments and creating and maintaining pipelines to load and ingest data. This way you can focus on your feature processor definitions, which include a transformation function (for example, a count of product views or the mean of transaction values), sources (where to apply the transformation), and sinks (where to write the computed feature values).

A Feature Processor pipeline is a SageMaker AI pipeline, so you can also track scheduled Feature Processor pipelines with SageMaker AI lineage in the console. For more information on SageMaker AI lineage, see [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md). This includes tracking scheduled executions, visualizing lineage to trace features back to their data sources, and viewing shared feature processors in a single environment. For information on using Feature Store with the console, see [View pipeline executions from the console](feature-store-use-with-studio.md#feature-store-view-feature-processor-pipeline-executions-studio).

**Topics**
+ [Feature Store Feature Processor SDK](feature-store-feature-processor-sdk.md)
+ [Running Feature Store Feature Processor remotely](feature-store-feature-processor-execute-remotely.md)
+ [Creating and running Feature Store Feature Processor pipelines](feature-store-feature-processor-create-execute-pipeline.md)
+ [Scheduled and event based executions for Feature Processor pipelines](feature-store-feature-processor-schedule-pipeline.md)
+ [Monitor Amazon SageMaker Feature Store Feature Processor pipelines](feature-store-feature-processor-monitor-pipeline.md)
+ [IAM permissions and execution roles](feature-store-feature-processor-iam-permissions.md)
+ [Feature Processor restrictions, limits, and quotas](feature-store-feature-processor-quotas.md)
+ [Data sources](feature-store-feature-processor-data-sources.md)
+ [Example Feature Processing code for common use cases](feature-store-feature-processor-examples.md)

# Feature Store Feature Processor SDK
<a name="feature-store-feature-processor-sdk"></a>

Declare a Feature Store Feature Processor definition by decorating your transformation functions with the `@feature_processor` decorator. The SageMaker AI SDK for Python (Boto3) automatically loads data from the configured input data sources, applies the decorated transformation function, and then ingests the transformed data to a target feature group. Decorated transformation functions must conform to the expected signature of the `@feature_processor` decorator. For more information about the `@feature_processor` decorator, see [@feature\_processor Decorator](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#feature-processor-decorator) in the Amazon SageMaker Feature Store Read the Docs. 

With the `@feature_processor` decorator, your transformation function runs in a Spark runtime environment where the input arguments provided to your function and its return value are Spark DataFrames. The number of input parameters in your transformation function must match the number of inputs configured in the `@feature_processor` decorator. 

For more information on the `@feature_processor` decorator, see the [Feature Processor Feature Store SDK for Python (Boto3)](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/feature_store/feature_processor).

The following code shows basic examples of how to use the `@feature_processor` decorator. For more specific example use cases, see [Example Feature Processing code for common use cases](feature-store-feature-processor-examples.md).

The Feature Processor SDK is distributed as a SageMaker Python SDK extra and can be installed using the following command. 

```
pip install sagemaker[feature-processor]
```

In the following examples, `us-east-1` is the region of the resource, `111122223333` is the resource owner account ID, and `your-feature-group-name` is the feature group name.

The following is a basic feature processor definition, where the `@feature_processor` decorator configures a CSV input from Amazon S3 to be loaded and provided to your transformation function (for example, `transform`), and prepares it for ingestion to a feature group. The last line runs it.

```
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

CSV_DATA_SOURCE = CSVDataSource('s3://your-bucket/prefix-to-csv/')
OUTPUT_FG = 'arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name'

@feature_processor(inputs=[CSV_DATA_SOURCE], output=OUTPUT_FG)
def transform(csv_input_df):
   return csv_input_df
   
transform()
```

The `@feature_processor` parameters include:
+ `inputs` (List[str]): A list of data sources that are used in your Feature Store Feature Processor. If your data sources are feature groups or stored in Amazon S3, you may be able to use the feature processor data source definitions that Feature Store provides. For a full list of the data source definitions that Feature Store provides, see [Feature Processor Data Source](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#feature-processor-data-source) in the Amazon SageMaker Feature Store Read the Docs.
+ `output` (str): The ARN of the feature group to ingest the output of the decorated function.
+ `target_stores` (Optional[List[str]]): A list of stores (for example, `OnlineStore` or `OfflineStore`) to ingest to the output. If unspecified, data is ingested to all of the output feature group’s enabled stores.
+ `parameters` (Dict[str, Any]): A dictionary to be provided to your transformation function. 
+ `enable_ingestion` (bool): A flag to indicate whether the transformation function’s outputs are ingested to the output feature group. This flag is useful during the development phase. If unspecified, ingestion is enabled.

Optional wrapped function parameters (provided as arguments if present in the function signature) include:
+ `params` (Dict[str, Any]): The dictionary defined in the `@feature_processor` parameters. It also contains system configured parameters that can be referenced with the key `system`, such as the `scheduled_time` parameter.
+ `spark` (SparkSession): A reference to the SparkSession instance initialized for the Spark Application.

The following code is an example of using the `params` and `spark` parameters.

```
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

CSV_DATA_SOURCE = CSVDataSource('s3://your-bucket/prefix-to-csv/')
OUTPUT_FG = 'arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name' 

@feature_processor(inputs=[CSV_DATA_SOURCE], output=OUTPUT_FG)
def transform(csv_input_df, params, spark):
   
   scheduled_time = params['system']['scheduled_time']
   csv_input_df.createOrReplaceTempView('csv_input_df')
   return spark.sql(f'''
        SELECT *
        FROM csv_input_df
        WHERE date_add(event_time, 1) >= {scheduled_time}
   ''')
   
transform()
```

The `scheduled_time` system parameter (provided in the `params` argument to your function) is an important value for supporting retries of each execution. The value can help to uniquely identify the Feature Processor’s execution and can be used as a reference point for date-range-based inputs (for example, loading only the last 24 hours' worth of data), guaranteeing the input range independent of the code’s actual execution time. If the Feature Processor runs on a schedule (see [Scheduled and event based executions for Feature Processor pipelines](feature-store-feature-processor-schedule-pipeline.md)), its value is fixed to the time it is scheduled to run. The argument can be overridden during synchronous execution using the SDK’s execute API to support use cases such as data backfills or re-running a missed past execution. Its value is the current time if the Feature Processor runs any other way.
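A deterministic date-range input derived from `scheduled_time` could be sketched as follows; the timestamp string and its format are assumptions for illustration, and in a real Feature Processor the value would come from `params['system']['scheduled_time']`.

```python
from datetime import datetime, timedelta

# Derive a fixed 24-hour input window from the scheduled_time system
# parameter rather than the wall clock, so retries and backfills read
# the same data range. The timestamp format here is an assumption.
scheduled_time = "2023-05-01T00:00:00Z"   # e.g. params['system']['scheduled_time']
end = datetime.strptime(scheduled_time, "%Y-%m-%dT%H:%M:%SZ")
start = end - timedelta(hours=24)
print(start.isoformat(), end.isoformat())
```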

For information about authoring Spark code, see the [Spark SQL Programming Guide](https://spark.apache.org/docs/latest/sql-programming-guide.html).

For more code samples for common use-cases, see the [Example Feature Processing code for common use cases](feature-store-feature-processor-examples.md). 

Note that transformation functions decorated with `@feature_processor` do not return a value. To programmatically test your function, you can remove or monkey patch the `@feature_processor` decorator such that it acts as a pass-through to the wrapped function. For more details on the `@feature_processor` decorator, see [Amazon SageMaker Feature Store Python SDK](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_featurestore.html). 
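The monkey-patch approach described above can be illustrated with a self-contained sketch. The stand-in pass-through decorator below replaces the real `@feature_processor` so the wrapped logic can be called directly with plain test inputs; the decorator name, the toy transformation, and the list-of-dicts inputs are all illustrative, not SDK behavior.

```python
# Self-contained sketch of the "monkey patch the decorator" testing idea.
def passthrough_feature_processor(*decorator_args, **decorator_kwargs):
    def wrap(fn):
        return fn  # act as a pass-through: return the undecorated function
    return wrap

# In a real test, you might patch the SDK's decorator before importing
# the module under test, for example with unittest.mock.patch.

@passthrough_feature_processor(inputs=["placeholder"], output="placeholder-arn")
def transform(rows):
    # Toy transformation: keep rows with a non-empty record identifier.
    return [r for r in rows if r["RecordIdentifier"]]

result = transform([{"RecordIdentifier": "1"}, {"RecordIdentifier": ""}])
print(result)  # [{'RecordIdentifier': '1'}]
```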

# Running Feature Store Feature Processor remotely
<a name="feature-store-feature-processor-execute-remotely"></a>

To run your Feature Processors on large data sets that require hardware more powerful than what is locally available, you can decorate your code with the `@remote` decorator to run your local Python code as a single or multi-node distributed SageMaker training job. For more information on running your code as a SageMaker training job, see [Run your local code as a SageMaker training job](train-remote-decorator.md). 

The following is a usage example of the `@remote` decorator along with the `@feature_processor` decorator.

```
from sagemaker.remote_function.spark_config import SparkConfig
from sagemaker.remote_function import remote
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

CSV_DATA_SOURCE = CSVDataSource('s3://bucket/prefix-to-csv/')
OUTPUT_FG = 'arn:aws:sagemaker:us-east-1:123456789012:feature-group/feature-group'

@remote(
    spark_config=SparkConfig(), 
    instance_type="ml.m5.2xlarge",
    dependencies="/local/requirements.txt"
)
@feature_processor(
    inputs=[CSV_DATA_SOURCE], 
    output=OUTPUT_FG,
)
def transform(csv_input_df):
    return csv_input_df
   
transform()
```

The `spark_config` parameter indicates that the remote job runs as a Spark application. The `SparkConfig` instance can be used to configure Spark properties and provide additional dependencies to the Spark application, such as Python files, JARs, and other files.

For faster iterations when developing your feature processing code, you can specify the `keep_alive_period_in_seconds` argument in the `@remote` decorator to retain configured resources in a warm pool for subsequent training jobs. For more information on warm pools, see `[KeepAlivePeriodInSeconds](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ResourceConfig.html#sagemaker-Type-ResourceConfig-KeepAlivePeriodInSeconds)` in the API Reference guide.
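For example, the earlier `@remote` example could retain its resources in a warm pool as shown in the following configuration sketch. The 600-second value is illustrative; the data source and feature group ARN are the same placeholders used in the example above.

```python
from sagemaker.remote_function.spark_config import SparkConfig
from sagemaker.remote_function import remote
from sagemaker.feature_store.feature_processor import CSVDataSource, feature_processor

@remote(
    spark_config=SparkConfig(),
    instance_type="ml.m5.2xlarge",
    keep_alive_period_in_seconds=600,  # retain provisioned instances for 10 minutes
)
@feature_processor(
    inputs=[CSVDataSource('s3://bucket/prefix-to-csv/')],
    output='arn:aws:sagemaker:us-east-1:123456789012:feature-group/feature-group',
)
def transform(csv_input_df):
    return csv_input_df
```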

The following code is an example of a local `requirements.txt`:

```
sagemaker>=2.167.0
```

This installs the corresponding SageMaker SDK version in the remote job, which is required to run the method annotated with `@feature_processor`. 

# Creating and running Feature Store Feature Processor pipelines
<a name="feature-store-feature-processor-create-execute-pipeline"></a>

The Feature Processor SDK provides APIs to promote your Feature Processor Definitions into a fully managed SageMaker AI Pipeline. For more information on Pipelines, see [Pipelines overview](pipelines-overview.md). To convert your Feature Processor Definition into a SageMaker AI Pipeline, use the `to_pipeline` API with your Feature Processor definition. Executions of your Feature Processor Definition can be scheduled, operationally monitored with CloudWatch metrics, and integrated with EventBridge as event sources or subscribers. For more information about monitoring pipelines created with Pipelines, see [Monitor Amazon SageMaker Feature Store Feature Processor pipelines](feature-store-feature-processor-monitor-pipeline.md).

To view your Feature Processor pipelines, see [View pipeline executions from the console](feature-store-use-with-studio.md#feature-store-view-feature-processor-pipeline-executions-studio).

If your function is also decorated with the `@remote` decorator, its configuration is carried over to the Feature Processor pipeline. You can specify advanced configurations, such as compute instance type and count, runtime dependencies, and network and security settings, using the `@remote` decorator.

The following example uses the `to_pipeline` and `execute` APIs.

```
from sagemaker.feature_store.feature_processor import (
    execute, to_pipeline, describe, TransformationCode
)

pipeline_name="feature-processor-pipeline"
pipeline_arn = to_pipeline(
    pipeline_name=pipeline_name,
    step=transform,
    transformation_code=TransformationCode(s3_uri="s3://bucket/prefix"),
)

pipeline_execution_arn = execute(
    pipeline_name=pipeline_name
)
```

The `to_pipeline` API is semantically an upsert operation. It updates the pipeline if it already exists; otherwise, it creates a pipeline.

The `to_pipeline` API optionally accepts an Amazon S3 URI that references a file containing the Feature Processor definition. Associating this file with the Feature Processor pipeline tracks the transformation function and its versions in SageMaker AI machine learning lineage.

To retrieve a list of every Feature Processor pipeline in your account, use the `list_pipelines` API. A subsequent request to the `describe` API returns details about the Feature Processor pipeline, including, but not limited to, its Pipeline and schedule details.

The following example uses the `list_pipelines` and `describe` APIs.

```
from sagemaker.feature_store.feature_processor import list_pipelines, describe

feature_processor_pipelines = list_pipelines()

pipeline_description = describe(
    pipeline_name = feature_processor_pipelines[0]
)
```

# Scheduled and event based executions for Feature Processor pipelines
<a name="feature-store-feature-processor-schedule-pipeline"></a>

Amazon SageMaker Feature Store Feature Processing pipeline executions can be configured to start automatically and asynchronously based on a preconfigured schedule or as a result of another AWS service event. For example, you can schedule Feature Processing pipelines to execute on the first of every month or chain two pipelines together so that a target pipeline is executed automatically after a source pipeline execution completes.

**Topics**
+ [Schedule based executions](#feature-store-feature-processor-schedule-pipeline-schedule-based)
+ [Event based executions](#feature-store-feature-processor-schedule-pipeline-event-based)

## Schedule based executions
<a name="feature-store-feature-processor-schedule-pipeline-schedule-based"></a>

The Feature Processor SDK provides a [`schedule`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.schedule) API to run Feature Processor pipelines on a recurring basis with Amazon EventBridge Scheduler integration. The schedule can be specified with an `at`, `rate`, or `cron` expression using the [`ScheduleExpression`](https://docs.aws.amazon.com/scheduler/latest/APIReference/API_CreateSchedule.html#scheduler-CreateSchedule-request-ScheduleExpression) parameter, with the same expressions supported by Amazon EventBridge. The `schedule` API is semantically an upsert operation: it updates the schedule if it already exists; otherwise, it creates it. For more information on the EventBridge expressions and examples, see [Schedule types on EventBridge Scheduler](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html) in the EventBridge Scheduler User Guide.

The following examples use the Feature Processor [`schedule`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.schedule) API with the `at`, `rate`, and `cron` expressions.

```
from sagemaker.feature_store.feature_processor import schedule
pipeline_name='feature-processor-pipeline'

event_bridge_schedule_arn = schedule(
    pipeline_name=pipeline_name, 
    schedule_expression="at(2020-11-30T00:00:00)"
)

event_bridge_schedule_arn = schedule(
    pipeline_name=pipeline_name, 
    schedule_expression="rate(24 hours)"
)

event_bridge_schedule_arn = schedule(
    pipeline_name=pipeline_name, 
    schedule_expression="cron(0 0-23/1 ? * * 2023-2024)"
)
```

The default timezone for date and time inputs in the `schedule` API is UTC. For more information about EventBridge Scheduler schedule expressions, see [`ScheduleExpression`](https://docs.aws.amazon.com/scheduler/latest/APIReference/API_CreateSchedule.html#scheduler-CreateSchedule-request-ScheduleExpression) in the EventBridge Scheduler API Reference documentation.

Scheduled Feature Processor pipeline executions provide your transformation function with the scheduled execution time, to be used as an idempotency token or a fixed reference point for date range-based inputs. To disable (that is, pause) or re-enable a schedule, use the `state` parameter of the [`schedule`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.schedule) API with `'DISABLED'` or `'ENABLED'`, respectively.
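For example, an existing schedule can be paused as shown in the following sketch. Because `schedule` is an upsert, re-running the same call with `'ENABLED'` resumes it; the pipeline name and rate expression are placeholders from the examples above.

```python
from sagemaker.feature_store.feature_processor import schedule

# Pause the recurring schedule without deleting it.
schedule(
    pipeline_name="feature-processor-pipeline",
    schedule_expression="rate(24 hours)",
    state="DISABLED",
)
```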

For information about Feature Processor, see [Feature Processor SDK data sources](feature-store-feature-processor-data-sources-sdk.md). 

## Event based executions
<a name="feature-store-feature-processor-schedule-pipeline-event-based"></a>

A Feature Processing pipeline can be configured to automatically execute when an AWS event occurs. The Feature Processing SDK provides a [`put_trigger`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.put_trigger) function that accepts a list of source events and a target pipeline. The source events must be instances of [`FeatureProcessorPipelineEvent`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.FeatureProcessorPipelineEvent), which specifies a pipeline and its [execution status](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipelineExecution.html#sagemaker-DescribePipelineExecution-response-PipelineExecutionStatus) events. 

The `put_trigger` function configures an Amazon EventBridge rule and target to route events and allows you to specify an EventBridge event pattern to respond to any AWS event. For information on these concepts, see Amazon EventBridge [rules](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rules.html), [targets](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-targets.html), and [event patterns](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-patterns.html).

Triggers can be enabled or disabled. EventBridge starts a target pipeline execution using the role provided in the `role_arn` parameter of the `put_trigger` API. By default, the execution role is used if the SDK runs in an Amazon SageMaker Studio Classic or notebook environment. For information on how to get your execution role, see [Get your execution role](sagemaker-roles.md#sagemaker-roles-get-execution-role).

The following example sets up:
+ A SageMaker AI Pipeline using the `to_pipeline` API, that takes in your target pipeline name (`target-pipeline`) and your transformation function (`transform`). For information on your Feature Processor and transform function, see [Feature Processor SDK data sources](feature-store-feature-processor-data-sources-sdk.md).
+ A trigger using the `put_trigger` API, that takes in `FeatureProcessorPipelineEvent` for the event and your target pipeline name (`target-pipeline`). 

  The `FeatureProcessorPipelineEvent` defines the trigger for when the status of your source pipeline (`source-pipeline`) becomes `Succeeded`. For information on the Feature Processor Pipeline event function, see [`FeatureProcessorPipelineEvent`](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_processor.FeatureProcessorPipelineEvent) in the Feature Store Read the Docs. 

```
from sagemaker.feature_store.feature_processor import put_trigger, to_pipeline, FeatureProcessorPipelineEvent

to_pipeline(pipeline_name="target-pipeline", step=transform)

put_trigger(
    source_pipeline_events=[
        FeatureProcessorPipelineEvent(
            pipeline_name="source-pipeline",
            status=["Succeeded"]
        )
    ],
    target_pipeline="target-pipeline"
)
```

For an example of using event based triggers to create continuous executions and automatic retries for your Feature Processor pipeline, see [Continuous executions and automatic retries using event based triggers](feature-store-feature-processor-examples.md#feature-store-feature-processor-examples-continuous-execution-automatic-retries).

For an example of using event based triggers to create continuous *streaming* and automatic retries using event based triggers, see [Streaming custom data source examples](feature-store-feature-processor-data-sources-custom-examples.md#feature-store-feature-processor-data-sources-custom-examples-streaming). 

# Monitor Amazon SageMaker Feature Store Feature Processor pipelines
<a name="feature-store-feature-processor-monitor-pipeline"></a>

AWS provides monitoring tools to watch your Amazon SageMaker AI resources and applications in real time, report when something goes wrong, and take automatic actions when appropriate. Feature Store Feature Processor pipelines are Pipelines, so the standard monitoring mechanisms and integrations are available. Operational metrics such as execution failures can be monitored via Amazon CloudWatch metrics and Amazon EventBridge events. 

For more information on how to monitor and operationalize Feature Store Feature Processor, see the following resources:
+ [Monitoring AWS resources in Amazon SageMaker AI](monitoring-overview.md) - General guidance on monitoring and auditing activity for SageMaker AI resources.
+ [SageMaker pipelines metrics](monitoring-cloudwatch.md#cloudwatch-metrics-pipelines) - CloudWatch Metrics emitted by Pipelines.
+ [SageMaker pipeline execution state change](automating-sagemaker-with-eventbridge.md#eventbridge-pipeline) - EventBridge events emitted for Pipelines and executions.
+ [Troubleshooting Amazon SageMaker Pipelines](pipelines-troubleshooting.md) - General debugging and troubleshooting tips for Pipelines.

Feature Store Feature Processor execution logs can be found in Amazon CloudWatch Logs under the `/aws/sagemaker/TrainingJobs` log group, where you can find the execution log streams using lookup conventions. For executions created by directly invoking the `@feature_processor` decorated function, you can find logs in your local execution environment’s console. For `@remote` decorated executions, the CloudWatch Logs stream name contains the name of the function and the execution timestamp. For Feature Processor pipeline executions, the CloudWatch Logs stream for the step contains the `feature-processor` string and the pipeline execution ID.

Feature Store Feature Processor pipelines and recent execution statuses can be found in Amazon SageMaker Studio Classic for a given feature group in the Feature Store UI. Feature groups related to the Feature Processor pipelines as either inputs or outputs are displayed in the UI. In addition, the lineage view can provide context into upstream executions, such as data producing Feature Processor pipelines and data sources, for further debugging. For more information on using the lineage view using Studio Classic, see [View lineage from the console](feature-store-use-with-studio.md#feature-store-view-feature-processor-pipeline-lineage-studio).

# IAM permissions and execution roles
<a name="feature-store-feature-processor-iam-permissions"></a>

The Amazon SageMaker Python SDK requires permissions to interact with AWS services. For full Feature Processor functionality, attach the [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) and [AmazonEventBridgeSchedulerFullAccess](https://docs.aws.amazon.com/scheduler/latest/UserGuide/security_iam_id-based-policy-examples.html#security_iam_id-based-policies-managed-policies) AWS managed policies to your IAM role. For information on attaching policies to your IAM role, see [Adding policies to your IAM role](feature-store-adding-policies.md). See the following examples for details.

The trust policy of the role to which this policy is applied must allow the `scheduler.amazonaws.com`, `sagemaker.amazonaws.com`, and `glue.amazonaws.com` service principals.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "scheduler.amazonaws.com",
                    "sagemaker.amazonaws.com",
                    "glue.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

# Feature Processor restrictions, limits, and quotas
<a name="feature-store-feature-processor-quotas"></a>

Amazon SageMaker Feature Store Feature Processing relies on SageMaker AI machine learning (ML) lineage tracking. The Feature Store Feature Processor uses lineage contexts to represent and track Feature Processing Pipelines and Pipeline versions. Each Feature Store Feature Processor consumes at least two lineage contexts (one for the Feature Processing Pipeline and another for the version). If the input or output data source of a Feature Processing Pipeline changes, an additional lineage context is created. You can update SageMaker AI ML lineage limits by reaching out to AWS support for a limit increase. Default limits for resources used by Feature Store Feature Processor are as follows. For information on SageMaker AI ML lineage tracking, see [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md).

For more information on SageMaker AI quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

Lineage limits per Region
+ Contexts – 500 (soft limit)
+ Artifacts – 6,000 (soft limit)
+ Associations – 6,000 (soft limit)

Training Limits per Region
+ Longest run time for a training job – 432,000 seconds
+ Maximum number of instances per training job – 20
+ The maximum number of `CreateTrainingJob` requests that you can make, per second, in this account in the current Region – 1 TPS
+ Keep alive period for cluster reuse – 3,600 seconds

Maximum number of Pipelines and concurrent pipeline executions per Region
+ Maximum number of pipelines allowed per account – 500
+ Maximum number of concurrent pipeline executions allowed per account – 20
+ Time at which pipeline executions time out – 672 hours

# Data sources
<a name="feature-store-feature-processor-data-sources"></a>

Amazon SageMaker Feature Store Feature Processing supports multiple data sources. The Feature Processor SDK for Python (Boto3) provides constructs to load data from feature groups or objects stored in Amazon S3. In addition, you can author custom data sources to load data from other data sources. For information about Feature Store provided data sources, see [Feature Processor data source Feature Store Python SDK](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/feature_store/feature_processor/_data_source.py). 

**Topics**
+ [Feature Processor SDK data sources](feature-store-feature-processor-data-sources-sdk.md)
+ [Custom data sources](feature-store-feature-processor-data-sources-custom.md)
+ [Custom data source examples](feature-store-feature-processor-data-sources-custom-examples.md)

# Feature Processor SDK data sources
<a name="feature-store-feature-processor-data-sources-sdk"></a>

The Amazon SageMaker Feature Store Feature Processor SDK for Python (Boto3) provides constructs to load data from feature groups or objects stored in Amazon S3. For a full list of Feature Store provided data source definitions, see the [Feature Processor data source Feature Store Python SDK](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/feature_store/feature_processor/_data_source.py). 

For examples on how to use the Feature Store Python SDK data source definitions, see [Example Feature Processing code for common use cases](feature-store-feature-processor-examples.md).

## FeatureGroupDataSource
<a name="feature-store-feature-processor-data-sources-sdk-featuregroup"></a>

The `FeatureGroupDataSource` is used to specify a feature group as an input data source for a Feature Processor. Data can be loaded from an offline store feature group. Attempting to load your data from an online store feature group will result in a validation error. You can specify start and end offsets to limit the data that is loaded to a specific time range. For example, you can specify a start offset of ‘14 days' to load only the last two weeks of data, and you can additionally specify an end offset of '7 days' to limit the input to the previous week of data.
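The offset example above can be sketched as follows. The parameter names (`name`, `input_start_offset`, `input_end_offset`) follow the SDK's data source definitions, and the feature group ARNs are placeholders; verify the signature against the SDK version you use.

```python
from sagemaker.feature_store.feature_processor import (
    FeatureGroupDataSource, feature_processor
)

# Load only records between 14 days ago and 7 days ago (the previous week).
source = FeatureGroupDataSource(
    name="arn:aws:sagemaker:us-east-1:123456789012:feature-group/transactions",
    input_start_offset="14 days",
    input_end_offset="7 days",
)

@feature_processor(inputs=[source], output="feature-group-arn")
def transform(df):
    return df
```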

## Feature Store provided data source definitions
<a name="feature-store-feature-processor-data-sources-sdk-provided-sources"></a>

The Feature Store Python SDK contains data source definitions that can be used to specify various input data sources for a Feature Processor. These include CSV, Parquet, and Iceberg table sources. For a full list of Feature Store provided data source definitions, see the [Feature Processor data source Feature Store Python SDK](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/feature_store/feature_processor/_data_source.py). 

# Custom data sources
<a name="feature-store-feature-processor-data-sources-custom"></a>

This page describes how to create a custom data source class and provides some usage examples. With custom data sources, you can use the APIs provided by the SageMaker AI SDK for Python (Boto3) in the same way as if you were using Amazon SageMaker Feature Store provided data sources. 

To use a custom data source to transform and ingest data into a feature group using Feature Processing, extend the `PySparkDataSource` class with the following class members and function:
+ `data_source_name` (str): an arbitrary name for the data source. For example, Amazon Redshift, Snowflake, or a Glue Catalog ARN.
+ `data_source_unique_id` (str): a unique identifier that refers to the specific resource being accessed. For example, table name, DDB Table ARN, Amazon S3 prefix. All usage of the same `data_source_unique_id` in custom data sources will be associated to the same data source in the lineage view. Lineage includes information about the execution code of a feature processing workflow, what data sources were used, and how they are ingested into the feature group or feature. For information about viewing lineage of a feature group in **Studio**, see [View lineage from the console](feature-store-use-with-studio.md#feature-store-view-feature-processor-pipeline-lineage-studio).
+ `read_data` (func): a method used to connect with the feature processor. Returns a Spark data frame. For examples, see [Custom data source examples](feature-store-feature-processor-data-sources-custom-examples.md).

Both `data_source_name` and `data_source_unique_id` are used to uniquely identify your lineage entity. The following is an example for a custom data source class named `CustomDataSource`.

```
from sagemaker.feature_store.feature_processor import PySparkDataSource
from pyspark.sql import DataFrame

class CustomDataSource(PySparkDataSource):

    data_source_name = "custom-data-source-name"
    data_source_unique_id = "custom-data-source-id"

    def read_data(self, spark, params) -> DataFrame:
        # Your own code here to read data into a Spark DataFrame
        return dataframe
```

# Custom data source examples
<a name="feature-store-feature-processor-data-sources-custom-examples"></a>

This section provides examples of custom data source implementations for Feature Processors. For more information on custom data sources, see [Custom data sources](feature-store-feature-processor-data-sources-custom.md).

Security is a shared responsibility between AWS and our customers. AWS is responsible for protecting the infrastructure that runs the services in the AWS Cloud. Customers are responsible for all of their necessary security configuration and management tasks. For example, secrets such as access credentials to data stores should not be hard coded in your custom data sources. You can use AWS Secrets Manager to manage these credentials. For information about Secrets Manager, see [What is AWS Secrets Manager?](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html) in the AWS Secrets Manager User Guide. The following examples use Secrets Manager for your credentials.

**Topics**
+ [Amazon Redshift Clusters (JDBC) custom data source examples](#feature-store-feature-processor-data-sources-custom-examples-redshift)
+ [Snowflake custom data source examples](#feature-store-feature-processor-data-sources-custom-examples-snowflake)
+ [Databricks (JDBC) custom data source examples](#feature-store-feature-processor-data-sources-custom-examples-databricks)
+ [Streaming custom data source examples](#feature-store-feature-processor-data-sources-custom-examples-streaming)

## Amazon Redshift Clusters (JDBC) custom data source examples
<a name="feature-store-feature-processor-data-sources-custom-examples-redshift"></a>

Amazon Redshift offers a JDBC driver that can be used to read data with Spark. For information about how to download the Amazon Redshift JDBC driver, see [Download the Amazon Redshift JDBC driver, version 2.1](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html). 

To create the custom Amazon Redshift data source class, override the `read_data` method described in [Custom data sources](feature-store-feature-processor-data-sources-custom.md). 

To connect with an Amazon Redshift cluster you need your:
+ Amazon Redshift JDBC URL (`jdbc-url`)

  For information about obtaining your Amazon Redshift JDBC URL, see [Getting the JDBC URL](https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-obtain-url.html) in the Amazon Redshift Database Developer Guide.
+ Amazon Redshift user name (`redshift-user`) and password (`redshift-password`)

  For information about how to create and manage database users using the Amazon Redshift SQL commands, see [Users](https://docs.aws.amazon.com/redshift/latest/dg/r_Users.html) in the Amazon Redshift Database Developer Guide.
+ Amazon Redshift table name (`redshift-table-name`)

  For information about how to create a table with some examples, see [CREATE TABLE](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html) in the Amazon Redshift Database Developer Guide.
+ (Optional) If using Secrets Manager, you’ll need the secret name (`secret-redshift-account-info`) where you store your Amazon Redshift access username and password on Secrets Manager.

  For information about Secrets Manager, see [Find secrets in AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/manage_search-secret.html) in the AWS Secrets Manager User Guide. 
+ AWS Region (`your-region`)

  For information about obtaining your current session’s Region name using the SDK for Python (Boto3), see [`region_name`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.region_name) in the Boto3 documentation.

The following example demonstrates how to retrieve the JDBC URL, user name, and password from Secrets Manager and override the `read_data` method for your custom data source class, `RedshiftDataSource`.

```
from sagemaker.feature_store.feature_processor import PySparkDataSource
import json
import boto3


class RedshiftDataSource(PySparkDataSource):
    
    data_source_name = "Redshift"
    data_source_unique_id = "redshift-resource-arn"
    
    def read_data(self, spark, params):
        url = "jdbc-url?user=redshift-user&password=redshift-password"
        aws_iam_role_arn = "redshift-command-access-role"
        secret_name = "secret-redshift-account-info"
        region_name = "your-region"
        
        session = boto3.session.Session()
        sm_client = session.client(
            service_name='secretsmanager',
            region_name=region_name,
        )
        
        secrets = json.loads(sm_client.get_secret_value(SecretId=secret_name)["SecretString"])
        jdbc_url = url.replace("jdbc-url", secrets["jdbcurl"]).replace("redshift-user", secrets['username']).replace("redshift-password", secrets['password'])
        
        return spark.read \
             .format("jdbc") \
             .option("url", jdbc_url) \
             .option("driver", "com.amazon.redshift.Driver") \
             .option("dbtable", "redshift-table-name") \
             .option("tempdir", "s3a://your-bucket-name/your-bucket-prefix") \
             .option("aws_iam_role", aws_iam_role_arn) \
             .load()
```

The following example shows how to connect `RedshiftDataSource` to your `feature_processor` decorator.

```
from sagemaker.feature_store.feature_processor import feature_processor
    
@feature_processor(
    inputs=[RedshiftDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
    spark_config={"spark.jars.packages": "com.amazon.redshift:redshift-jdbc42:2.1.0.16"}
)
def transform(input_df):
    return input_df
```

To run the Feature Processor job remotely, you need to provide the JDBC driver by defining a `SparkConfig` and passing it to the `@remote` decorator.

```
from sagemaker.remote_function import remote
from sagemaker.remote_function.spark_config import SparkConfig
from sagemaker.feature_store.feature_processor import feature_processor

config = {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.jars.packages": "com.amazon.redshift:redshift-jdbc42:2.1.0.16"
    }
}

@remote(
    spark_config=SparkConfig(configuration=config),
    instance_type="ml.m5.2xlarge",
)
@feature_processor(
    inputs=[RedshiftDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
)
def transform(input_df):
    return input_df
```

## Snowflake custom data source examples
<a name="feature-store-feature-processor-data-sources-custom-examples-snowflake"></a>

Snowflake provides a Spark connector that can be used for your `feature_processor` decorator. For information about Snowflake connector for Spark, see [Snowflake Connector for Spark](https://docs.snowflake.com/en/user-guide/spark-connector) in the Snowflake documentation.

To create the custom Snowflake data source class, you will need to override the `read_data` method from the [Custom data sources](feature-store-feature-processor-data-sources-custom.md) and add the Spark connector packages to the Spark classpath. 

To connect with a Snowflake data source you need:
+ Snowflake URL (`sf-url`)

  For information about URLs for accessing Snowflake web interfaces, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier) in the Snowflake documentation.
+ Snowflake database (`sf-database`) 

  For information about obtaining the name of your database using Snowflake, see [`CURRENT_DATABASE`](https://docs.snowflake.com/en/sql-reference/functions/current_database) in the Snowflake documentation.
+ Snowflake database schema (`sf-schema`) 

  For information about obtaining the name of your schema using Snowflake, see [`CURRENT_SCHEMA`](https://docs.snowflake.com/en/sql-reference/functions/current_schema) in the Snowflake documentation.
+ Snowflake warehouse (`sf-warehouse`)

  For information about obtaining the name of your warehouse using Snowflake, see [`CURRENT_WAREHOUSE`](https://docs.snowflake.com/en/sql-reference/functions/current_warehouse) in the Snowflake documentation.
+ Snowflake table name (`sf-table-name`)
+ (Optional) If using Secrets Manager, you need the name of the secret (`secret-snowflake-account-info`) where you store your Snowflake username and password in Secrets Manager. 

  For information about Secrets Manager, see [Find secrets in AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/manage_search-secret.html) in the AWS Secrets Manager User Guide. 
+ AWS Region (`your-region`)

  For information about obtaining your current session’s region name using SDK for Python (Boto3), see [region\_name](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.region_name) in the Boto3 documentation.

The following example demonstrates how to retrieve the Snowflake user name and password from Secrets Manager and override the `read_data` function for your custom data source class `SnowflakeDataSource`.

```
from sagemaker.feature_store.feature_processor import PySparkDataSource
from sagemaker.feature_store.feature_processor import feature_processor
import json
import boto3


class SnowflakeDataSource(PySparkDataSource):
    
    sf_options = { 
        "sfUrl" : "sf-url",
        "sfDatabase" : "sf-database",
        "sfSchema" : "sf-schema",
        "sfWarehouse" : "sf-warehouse",
    }

    data_source_name = "Snowflake"
    data_source_unique_id = "sf-url"
    
    def read_data(self, spark, params):
        secret_name = "secret-snowflake-account-info"
        region_name = "your-region"

        session = boto3.session.Session()
        sm_client = session.client(
            service_name='secretsmanager',
            region_name=region_name,
        )
        
        secrets = json.loads(sm_client.get_secret_value(SecretId=secret_name)["SecretString"])
        self.sf_options["sfUser"] = secrets.get("username")
        self.sf_options["sfPassword"] = secrets.get("password")
        
        return spark.read.format("net.snowflake.spark.snowflake") \
                        .options(**self.sf_options) \
                        .option("dbtable", "sf-table-name") \
                        .load()
```

The following example shows how to connect `SnowflakeDataSource` to your `feature_processor` decorator.

```
from sagemaker.feature_store.feature_processor import feature_processor

@feature_processor(
    inputs=[SnowflakeDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
    spark_config={"spark.jars.packages": "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3"}
)
def transform(input_df):
    return input_df
```

To run the feature processor job remotely, you need to provide the packages by defining a `SparkConfig` and passing it to the `@remote` decorator. In the Maven coordinate `net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3` used in the following example, `2.12` is the Scala version, `2.12.0` is the version of the Snowflake Connector for Spark that you want to use, and `spark_3.3` is the Spark version. 
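
To make the layout of that coordinate concrete, the following sketch splits a `groupId:artifactId:version` Maven coordinate into its parts. `parse_spark_package` is a hypothetical helper for illustration only, not part of any SDK.

```python
def parse_spark_package(coordinate):
    """Split a Maven coordinate of the form groupId:artifactId:version."""
    group_id, artifact_id, version = coordinate.split(":")
    # For Spark connector artifacts, the artifact ID typically carries a
    # Scala version suffix (for example "_2.12"), and the version string
    # may encode the targeted Spark version (for example "-spark_3.3").
    scala_version = artifact_id.rsplit("_", 1)[1]
    return {
        "group_id": group_id,
        "artifact_id": artifact_id,
        "version": version,
        "scala_version": scala_version,
    }

parts = parse_spark_package("net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3")
```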

```
from sagemaker.remote_function import remote
from sagemaker.remote_function.spark_config import SparkConfig

config = {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.jars.packages": "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3"
    }
}

@remote(
    spark_config=SparkConfig(configuration=config),
    instance_type="ml.m5.2xlarge",
)
@feature_processor(
    inputs=[SnowflakeDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
)
def transform(input_df):
    return input_df
```

## Databricks (JDBC) custom data source examples
<a name="feature-store-feature-processor-data-sources-custom-examples-databricks"></a>

Spark can read data from Databricks by using the Databricks JDBC driver. For information about the Databricks JDBC driver, see [Configure the Databricks ODBC and JDBC drivers](https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html#configure-the-databricks-odbc-and-jdbc-drivers) in the Databricks documentation.

**Note**  
You can read data from any other database by including the corresponding JDBC driver in Spark classpath. For more information, see [JDBC To Other Databases](https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html) in the Spark SQL Guide.

To create the custom Databricks data source class, you will need to override the `read_data` method from the [Custom data sources](feature-store-feature-processor-data-sources-custom.md) and add the JDBC jar to the Spark classpath. 

To connect with a Databricks data source you need:
+ Databricks URL (`databricks-url`)

  For information about your Databricks URL, see [Building the connection URL for the Databricks driver](https://docs.databricks.com/en/integrations/jdbc-odbc-bi.html#building-the-connection-url-for-the-databricks-driver) in the Databricks documentation.
+ Databricks personal access token (`personal-access-token`)

  For information about your Databricks access token, see [Databricks personal access token authentication](https://docs.databricks.com/en/dev-tools/auth.html#pat) in the Databricks documentation.
+ Data catalog name (`db-catalog`) 

  For information about your Databricks catalog name, see [Catalog name](https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html#catalog-name) in the Databricks documentation.
+ Schema name (`db-schema`)

  For information about your Databricks schema name, see [Schema name](https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html#schema-name) in the Databricks documentation.
+ Table name (`db-table-name`)

  For information about your Databricks table name, see [Table name](https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html#table-name) in the Databricks documentation.
+ (Optional) If using Secrets Manager, you need the name of the secret (`secret-databricks-account-info`) where you store your Databricks username and password in Secrets Manager. 

  For information about Secrets Manager, see [Find secrets in AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/manage_search-secret.html) in the AWS Secrets Manager User Guide. 
+ AWS Region (`your-region`)

  For information about obtaining your current session’s region name using SDK for Python (Boto3), see [region\_name](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.region_name) in the Boto3 documentation.

The following example demonstrates how to retrieve the JDBC URL and personal access token from Secrets Manager and override the `read_data` method for your custom data source class, `DatabricksDataSource`.

```
from sagemaker.feature_store.feature_processor import PySparkDataSource
import json
import boto3


class DatabricksDataSource(PySparkDataSource):
    
    data_source_name = "Databricks"
    data_source_unique_id = "databricks-url"
    
    def read_data(self, spark, params):
        secret_name = "secret-databricks-account-info"
        region_name = "your-region"

        session = boto3.session.Session()
        sm_client = session.client(
            service_name='secretsmanager',
            region_name=region_name,
        )
        
        secrets = json.loads(sm_client.get_secret_value(SecretId=secret_name)["SecretString"])
        jdbc_url = secrets["jdbcurl"].replace("personal-access-token", secrets['pwd'])
         
        return spark.read.format("jdbc") \
                        .option("url", jdbc_url) \
                        .option("dbtable","`db-catalog`.`db-schema`.`db-table-name`") \
                        .option("driver", "com.simba.spark.jdbc.Driver") \
                        .load()
```

The following example shows how to add the JDBC driver jar, `jdbc-jar-file-name.jar`, that you've uploaded to Amazon S3 to the Spark classpath. For information about downloading the Spark JDBC driver (`jdbc-jar-file-name.jar`) from Databricks, see [Download JDBC Driver](https://www.databricks.com/spark/jdbc-drivers-download) on the Databricks website.

```
from sagemaker.feature_store.feature_processor import feature_processor
    
@feature_processor(
    inputs=[DatabricksDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
    spark_config={"spark.jars": "s3://your-bucket-name/your-bucket-prefix/jdbc-jar-file-name.jar"}
)
def transform(input_df):
    return input_df
```

To run the feature processor job remotely, you need to provide the jars by defining a `SparkConfig` and passing it to the `@remote` decorator.

```
from sagemaker.remote_function import remote
from sagemaker.remote_function.spark_config import SparkConfig

config = {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.jars": "s3://your-bucket-name/your-bucket-prefix/jdbc-jar-file-name.jar"
    }
}

@remote(
    spark_config=SparkConfig(configuration=config),
    instance_type="ml.m5.2xlarge",
)
@feature_processor(
    inputs=[DatabricksDataSource()],
    output="feature-group-arn",
    target_stores=["OfflineStore"],
)
def transform(input_df):
    return input_df
```

## Streaming custom data source examples
<a name="feature-store-feature-processor-data-sources-custom-examples-streaming"></a>

You can connect to streaming data sources like Amazon Kinesis, and author transforms with Spark Structured Streaming to read from streaming data sources. For information about the Kinesis connector, see [Kinesis Connector for Spark Structured Streaming](https://github.com/roncemer/spark-sql-kinesis) in GitHub. For information about Amazon Kinesis, see [What Is Amazon Kinesis Data Streams?](https://docs.aws.amazon.com/streams/latest/dev/introduction.html) in the Amazon Kinesis Developer Guide.

To create the custom Amazon Kinesis data source class, you will need to extend the `BaseDataSource` class and override the `read_data` method from [Custom data sources](feature-store-feature-processor-data-sources-custom.md).

To connect to an Amazon Kinesis data stream you need:
+ Kinesis ARN (`kinesis-resource-arn`) 

  For information on Kinesis data stream ARNs, see [Amazon Resource Names (ARNs) for Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/controlling-access.html#kinesis-using-iam-arn-format) in the Amazon Kinesis Developer Guide.
+ Kinesis data stream name (`kinesis-stream-name`)
+ AWS Region (`your-region`)

  For information about obtaining your current session’s region name using SDK for Python (Boto3), see [region\$1name](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html#boto3.session.Session.region_name) in the Boto3 documentation.

```
from sagemaker.feature_store.feature_processor import BaseDataSource
from sagemaker.feature_store.feature_processor import feature_processor

class KinesisDataSource(BaseDataSource):

    data_source_name = "Kinesis"
    data_source_unique_id = "kinesis-resource-arn"
    
    def read_data(self, spark, params): 
        return spark.readStream.format("kinesis") \
            .option("streamName", "kinesis-stream-name") \
            .option("awsUseInstanceProfile", "false") \
            .option("endpointUrl", "https://kinesis.your-region.amazonaws.com") \
            .load()
```

The following example demonstrates how to connect `KinesisDataSource` to your `feature_processor` decorator. 

```
from sagemaker.remote_function import remote
from sagemaker.remote_function.spark_config import SparkConfig
import feature_store_pyspark.FeatureStoreManager as fsm

def ingest_micro_batch_into_fg(input_df, epoch_id):
    feature_group_arn = "feature-group-arn"
    fsm.FeatureStoreManager().ingest_data(
        input_data_frame = input_df,
        feature_group_arn = feature_group_arn
    )

@remote(
    spark_config=SparkConfig(
        configuration={
            "Classification": "spark-defaults", 
            "Properties":{
                "spark.sql.streaming.schemaInference": "true",
                "spark.jars.packages": "com.roncemer.spark:spark-sql-kinesis_2.13:1.2.2_spark-3.2"
            }
        }
    ),
    instance_type="ml.m5.2xlarge",
    max_runtime_in_seconds=2419200 # 28 days
)
@feature_processor(
    inputs=[KinesisDataSource()],
    output="feature-group-arn"
)
def transform(input_df):    
    output_stream = (
        input_df.selectExpr("CAST(rand() AS STRING) as partitionKey", "CAST(data AS STRING)")
        .writeStream.foreachBatch(ingest_micro_batch_into_fg)
        .trigger(processingTime="1 minute")
        .option("checkpointLocation", "s3a://checkpoint-path")
        .start()
    )
    output_stream.awaitTermination()
```

The preceding example code uses a few Spark Structured Streaming options while streaming micro-batches into your feature group. For a full list of options, see the [Structured Streaming Programming Guide](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) in the Apache Spark documentation. 
+ The `foreachBatch` sink mode is a feature that allows you to apply operations and write logic on the output data of each micro-batch of a streaming query. 

  For information on `foreachBatch`, see [Using Foreach and ForeachBatch](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#using-foreach-and-foreachbatch) in the Apache Spark Structured Streaming Programming Guide. 
+ The `checkpointLocation` option periodically saves the state of the streaming application. The streaming log is saved in checkpoint location `s3a://checkpoint-path`.

  For information on the `checkpointLocation` option, see [Recovering from Failures with Checkpointing](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing) in the Apache Spark Structured Streaming Programming Guide. 
+ The `trigger` setting defines how often the micro-batch processing is triggered in a streaming application. In the example, the processing time trigger type is used with one-minute micro-batch intervals, specified by `trigger(processingTime="1 minute")`. To backfill from a stream source, you can use the available-now trigger type, specified by `trigger(availableNow=True)`.

  For a full list of `trigger` types, see [Triggers](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers) in the Apache Spark Structured Streaming Programming Guide.

**Continuous streaming and automatic retries using event based triggers**

The Feature Processor uses SageMaker Training as its compute infrastructure, which has a maximum runtime limit of 28 days. You can use event based triggers to extend your continuous streaming for a longer period of time and to recover from transient failures. For more information on schedule and event based executions, see [Scheduled and event based executions for Feature Processor pipelines](feature-store-feature-processor-schedule-pipeline.md).

The following is an example of setting up an event based trigger to keep the streaming Feature Processor pipeline running continuously. It uses the streaming transform function defined in the previous example. A target pipeline can be configured to be triggered when a `STOPPED` or `FAILED` event occurs for a source pipeline execution. Note that the same pipeline is used as both the source and the target so that it runs continuously.

```
import sagemaker.feature_store.feature_processor as fp
from sagemaker.feature_store.feature_processor import FeatureProcessorPipelineEvents
from sagemaker.feature_store.feature_processor import FeatureProcessorPipelineExecutionStatus

streaming_pipeline_name = "streaming-pipeline"
streaming_pipeline_arn = fp.to_pipeline(
    pipeline_name = streaming_pipeline_name,
    step = transform # defined in previous section
)

fp.put_trigger(
    source_pipeline_events=FeatureProcessorPipelineEvents(
        pipeline_name=streaming_pipeline_name, 
        pipeline_execution_status=[
            FeatureProcessorPipelineExecutionStatus.STOPPED,
            FeatureProcessorPipelineExecutionStatus.FAILED]
    ),
    target_pipeline=streaming_pipeline_name
)
```

# Example Feature Processing code for common use cases
<a name="feature-store-feature-processor-examples"></a>

The following examples provide sample Feature Processing code for common use cases. For a more detailed example notebook showcasing specific use cases, see [Amazon SageMaker Feature Store Feature Processing notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-featurestore/feature_store_feature_processor.ipynb).

In the following examples, `us-east-1` is the region of the resource, `111122223333` is the resource owner account ID, and `your-feature-group-name` is the feature group name.
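
These three values are composed into the feature group ARNs that appear in the `output` argument of the examples. The following sketch shows that composition; `feature_group_arn` is a hypothetical helper, not part of any SDK.

```python
# Hypothetical helper that composes the feature group ARN format used
# throughout the examples on this page from its three placeholder parts.
def feature_group_arn(region, account_id, feature_group_name):
    return f"arn:aws:sagemaker:{region}:{account_id}:feature-group/{feature_group_name}"

arn = feature_group_arn("us-east-1", "111122223333", "your-feature-group-name")
```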

The `transactions` data set used in the following examples has the following schema:

```
'FeatureDefinitions': [
  {'FeatureName': 'txn_id', 'FeatureType': 'String'},
  {'FeatureName': 'txn_time', 'FeatureType': 'String'},
  {'FeatureName': 'credit_card_num', 'FeatureType': 'String'},
  {'FeatureName': 'txn_amount', 'FeatureType': 'Fractional'}
]
```
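
As an illustrative sketch (not part of the Feature Store SDK), the following code checks that a record matches the feature definitions above, mapping the Feature Store types `String` and `Fractional` onto Python types. The helper names and type mapping here are assumptions for illustration.

```python
FEATURE_DEFINITIONS = [
    {"FeatureName": "txn_id", "FeatureType": "String"},
    {"FeatureName": "txn_time", "FeatureType": "String"},
    {"FeatureName": "credit_card_num", "FeatureType": "String"},
    {"FeatureName": "txn_amount", "FeatureType": "Fractional"},
]

# Illustrative mapping from Feature Store feature types to Python types.
PYTHON_TYPES = {"String": str, "Fractional": float, "Integral": int}

def record_matches_schema(record, definitions=FEATURE_DEFINITIONS):
    """Return True if every defined feature is present with the expected type."""
    return all(
        isinstance(record.get(d["FeatureName"]), PYTHON_TYPES[d["FeatureType"]])
        for d in definitions
    )

ok = record_matches_schema(
    {"txn_id": "t1", "txn_time": "2023-01-01T00:00:00Z",
     "credit_card_num": "4111", "txn_amount": 42.5}
)
```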

**Topics**
+ [Joining data from multiple data sources](#feature-store-feature-processor-examples-joining-multiple-sources)
+ [Sliding window aggregates](#feature-store-feature-processor-examples-sliding-window-aggregates)
+ [Tumbling window aggregates](#feature-store-feature-processor-examples-tumbling-window-aggregates)
+ [Promotion from the offline store to online store](#feature-store-feature-processor-examples-promotion-offline-to-online-store)
+ [Transformations with the Pandas library](#feature-store-feature-processor-examples-transforms-with-pandas-library)
+ [Continuous executions and automatic retries using event based triggers](#feature-store-feature-processor-examples-continuous-execution-automatic-retries)

## Joining data from multiple data sources
<a name="feature-store-feature-processor-examples-joining-multiple-sources"></a>

```
@feature_processor(
    inputs=[
        CSVDataSource('s3://bucket/customer'), 
        FeatureGroupDataSource('transactions')
    ],
    output='arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name'
)
def join(transactions_df, customer_df):
  '''Combine two data sources with an inner join on a common column'''

  return transactions_df.join(
    customer_df, transactions_df.customer_id == customer_df.customer_id, "inner"
  )
```

## Sliding window aggregates
<a name="feature-store-feature-processor-examples-sliding-window-aggregates"></a>

```
@feature_processor(
    inputs=[FeatureGroupDataSource('transactions')],
    output='arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name'
)
def sliding_window_aggregates(transactions_df):
    '''Aggregates over 1-week windows, across 1-day sliding windows.'''
    from pyspark.sql.functions import window, avg, count
    
    return (
        transactions_df
            .groupBy("credit_card_num", window("txn_time", "1 week", "1 day"))
            .agg(avg("txn_amount").alias("avg_week"), count("*").alias("count_week")) 
            .orderBy("window.start")
            .select("credit_card_num", "window.start", "avg_week", "count_week")
    )
```

## Tumbling window aggregates
<a name="feature-store-feature-processor-examples-tumbling-window-aggregates"></a>

```
@feature_processor(
    inputs=[FeatureGroupDataSource('transactions')],
    output='arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name'
)
def tumbling_window_aggregates(transactions_df, spark):
    '''Aggregates over 1-week tumbling windows, as a SQL query.'''

    transactions_df.createOrReplaceTempView('transactions')
    return spark.sql('''
        SELECT credit_card_num, window.start, AVG(txn_amount) AS avg, COUNT(*) AS count  
        FROM transactions
        GROUP BY credit_card_num, window(txn_time, "1 week")  
        ORDER BY window.start
    ''')
```

## Promotion from the offline store to online store
<a name="feature-store-feature-processor-examples-promotion-offline-to-online-store"></a>

```
@feature_processor(
    inputs=[FeatureGroupDataSource('transactions')],
    target_stores=['OnlineStore'],
    output='arn:aws:sagemaker:us-east-1:111122223333:feature-group/transactions'
)
def offline_to_online(transactions_df, spark):
    '''Move data from the offline store to the online store of the same feature group.'''

    transactions_df.createOrReplaceTempView('transactions')
    return spark.sql('''
        SELECT txn_id, txn_time, credit_card_num, txn_amount
        FROM
            (SELECT *,
            row_number()
            OVER
                (PARTITION BY txn_id
                ORDER BY txn_time DESC, Api_Invocation_Time DESC, write_time DESC)
            AS row_number
            FROM transactions)
        WHERE row_number = 1
    ''')
```

## Transformations with the Pandas library
<a name="feature-store-feature-processor-examples-transforms-with-pandas-library"></a>

```
@feature_processor(
    inputs=[FeatureGroupDataSource('transactions')],
    target_stores=['OnlineStore'],
    output='arn:aws:sagemaker:us-east-1:111122223333:feature-group/transactions'
)
def pandas(transactions_df):
    '''Author transformations using the Pandas interface.
    
    Requires PyArrow to be installed via pip.
    For more details: https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark
    '''
    import pyspark.pandas as ps
    
    # PySpark DF to Pandas-On-Spark DF (Distributed DF with Pandas interface).
    pandas_on_spark_df = transactions_df.pandas_api()
    # Pandas-On-Spark DF to Pandas DF (Single Machine Only).
    pandas_df = pandas_on_spark_df.to_pandas()
    
    # Reverse: Pandas DF to Pandas-On-Spark DF
    pandas_on_spark_df = ps.from_pandas(pandas_df)
    # Reverse: Pandas-On-Spark DF to PySpark DF
    spark_df = pandas_on_spark_df.to_spark()
    
    return spark_df
```

## Continuous executions and automatic retries using event based triggers
<a name="feature-store-feature-processor-examples-continuous-execution-automatic-retries"></a>

```
from sagemaker.feature_store.feature_processor import put_trigger, to_pipeline, FeatureProcessorPipelineEvents
from sagemaker.feature_store.feature_processor import FeatureProcessorPipelineExecutionStatus

streaming_pipeline_name = "target-pipeline"

to_pipeline(
    pipeline_name=streaming_pipeline_name,
    step=transform
)

put_trigger(
    source_pipeline_events=[
        FeatureProcessorPipelineEvents(
            pipeline_name=streaming_pipeline_name, 
            pipeline_execution_status=[
                FeatureProcessorPipelineExecutionStatus.STOPPED,
                FeatureProcessorPipelineExecutionStatus.FAILED]
        )
    ],
    target_pipeline=streaming_pipeline_name
)
```

# Find features in your feature groups
<a name="feature-store-search-metadata"></a>

With Amazon SageMaker Feature Store, you can search for the features that you created in your feature groups. You can search through all of your features without needing to select a feature group first. The search functionality helps you find the features that are relevant to your use case.

**Note**  
The feature groups where you're searching for features must be within your AWS Region and AWS account. For shared feature groups, the feature groups must be made discoverable to your AWS account. For more instructions on how to share the feature group catalog and grant discoverability, see [Share your feature group catalog](feature-store-cross-account-discoverability-share-feature-group-catalog.md).

If you're on a team, and teammates are looking for features to use in their models, they can search through the features in all of the feature groups.

You can add searchable parameters and descriptions to make your features more discoverable. For more information, see [Adding searchable metadata to your features](feature-store-add-metadata.md).

You can search for features using either the console or the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) API operation in SageMaker AI. The following table lists all of the searchable metadata and whether you can search for it in the console or with the API.



| Searchable metadata | API field name | Searchable in the console? | 
| --- | --- | --- | 
| All Parameters | AllParameters | Yes | 
| Creation time | CreationTime | Yes | 
| Description | Description | Yes | 
| Feature group name | FeatureGroupName | No | 
| Feature name | FeatureName | Yes | 
| Feature type | FeatureType | No | 
| Last modified time | LastModifiedTime | No | 
| Parameters | Parameters.key | Yes | 
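
The API field names in the table plug directly into the `Filters` of a `SearchExpression` (full query examples appear later on this page). The following minimal sketch builds such filters; `build_filters` is a hypothetical helper, and the field names come from the table above.

```python
def build_filters(**fields):
    """Turn keyword arguments into Search API filter dictionaries,
    using a Contains match for each field."""
    return [
        {"Name": name, "Operator": "Contains", "Value": value}
        for name, value in fields.items()
    ]

# Filter on two of the searchable fields from the table.
search_expression = {
    "Filters": build_filters(FeatureName="txn", Description="transaction")
}
```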

## How to search for your features
<a name="feature-store-search-metadata-how-to"></a>

The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience. Choose one of the following instructions based on your use case.

### Search for features if Studio is your default experience (console)
<a name="feature-store-search-metadata-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. Under the **Feature Catalog** tab, choose **My account** to view your feature groups. To view feature groups that others have made discoverable to you, choose **Cross account**. Under **Created by**, you can view the resource owner account ID.

1. You can search for your feature in the **Search** dropdown list:
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your features, specify time ranges, or clear (deselect) columns that you don't want to query.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. The discoverability permission alone won't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

### Search for features if Studio Classic is your default experience (console)
<a name="feature-store-search-metadata-how-to-with-studio-classic"></a>

Use the latest version of Amazon SageMaker Studio Classic so that you have the most recent version of the search functionality. For information about updating Studio Classic, see [Shut Down and Update Amazon SageMaker Studio Classic](studio-tasks-update-studio.md).

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. Under the **Feature Catalog** tab, choose **My account** to view your feature groups. To view feature groups that others have made discoverable to you, choose **Cross account**. Under **Created by**, you can view the resource owner account ID.

1. You can search for your feature in the **Search** dropdown list:
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your features, specify time ranges, or clear (deselect) columns that you don't want to query.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. The discoverability permission alone won't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

### Search for your features using SDK for Python (Boto3)
<a name="feature-store-search-metadata-how-to-with-sdk"></a>

The code in this section uses the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) operation in the AWS SDK for Python (Boto3) to run the search query to find features in your feature groups. For information about the other languages to submit a query, see [See Also](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html#API_Search_SeeAlso) in the *Amazon SageMaker API Reference*.

For more Feature Store examples and resources, see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

The following code shows different example search queries using the API:

```
# Return all features in your feature groups
sagemaker_client.search(
    Resource="FeatureMetadata",
)  

# Search for all features that belong to a feature group that contain the "ver" substring
sagemaker_client.search(
    Resource="FeatureMetadata",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
        ]
    }
)

# Search for all features that belong to a feature group that have the EXACT name "airport"
sagemaker_client.search(
    Resource="FeatureMetadata",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Equals',
                'Value': 'airport'
            },
        ]
    }
)

# Search for all features that belong to a feature group that contains the name "ver"
# AND have a name that contains "wha"
# AND have a parameter (key or value) that contains "hea"

sagemaker_client.search(
    Resource="FeatureMetadata",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'FeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'AllParameters', 
                'Operator': 'Contains',
                'Value': 'hea'
            },
        ]
    }
)  

# Search for all features that belong to a feature group with substring "ver" in its name
# OR features that have a name that contains "wha"
# OR features that have a parameter (key or value) that contains "hea"

sagemaker_client.search(
    Resource="FeatureMetadata",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'FeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'AllParameters', 
                'Operator': 'Contains',
                'Value': 'hea'
            },
        ],
        'Operator': 'Or' # note that this is explicitly set to "Or"; the default is "And"
    }
)              


# Search for all features that belong to a feature group with the substring "ver" in its name
# OR features that have a name that contains "wha"
# OR parameters with the value "Sage" for the "org" key
sagemaker_client.search(
    Resource="FeatureMetadata",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'FeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'Parameters.org', 
                'Operator': 'Contains',
                'Value': 'Sage'
            },
        ],
        'Operator': 'Or' # note that this is explicitly set to "Or"; the default is "And"
    }
)
```

# Find feature groups in your Feature Store
<a name="feature-store-search-feature-group-metadata"></a>

With Amazon SageMaker Feature Store, you can search for feature groups using either the console or the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) operation. Use the search functionality to quickly find the features and feature groups that are relevant to your use case and the models that you're creating.

**Note**  
The feature groups that you're searching for must be within your AWS Region and AWS account, or shared with and made discoverable to your AWS account. For more information about how to share the feature group catalog and grant discoverability, see [Share your feature group catalog](feature-store-cross-account-discoverability-share-feature-group-catalog.md).

You can search for feature groups using either Amazon SageMaker Studio Classic or the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) operation in the SageMaker API. The following table lists the searchable metadata and whether you can search for it in the console. Tags are searchable for your own feature groups, but not for feature groups that are made discoverable to you.

| Searchable metadata | API field name | Searchable in the console? | Searchable with cross account? | 
| --- | --- | --- | --- | 
| All Tags | AllTags | Yes | No | 
| Creation Failure Reason | FailureReason | No | No | 
| Creation Status | [FeatureGroupStatus](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_FeatureGroup.html) | Yes | Yes | 
| Creation time | CreationTime | Yes | Yes | 
| Description | Description | Yes | Yes | 
| Event Time Feature Name | EventTimeFeatureName | No | No | 
| Feature Definitions | [FeatureDefinitions](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_FeatureDefinition.html) | No | No | 
| Feature Group ARN | [FeatureGroupARN](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_FeatureGroup.html) | No | No | 
| Feature Group Name | [FeatureGroupName](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_FeatureGroup.html) | Yes | Yes | 
| Offline Store Configuration | [OfflineStoreConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OfflineStoreConfig.html) | No | No | 
| Offline Store Status | [OfflineStoreStatus](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OfflineStoreStatus.html) | Yes | Yes | 
| Last Update Status | [LastUpdateStatus](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_LastUpdateStatus.html) | No | No | 
| Record Identifier Feature Name | RecordIdentifierFeatureName | Yes | Yes | 
| Tags | Tags.key | Yes | No | 
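
The fields in the table can be combined in a single query. As a hedged sketch (not an official sample), the following builds a Search request that finds feature groups created after a cutoff date that have finished creating; the cutoff value is a placeholder.

```python
from datetime import datetime, timezone

# Hypothetical example: find feature groups created after a cutoff date
# that are in "Created" status. The field names (CreationTime,
# FeatureGroupStatus) come from the table above; the date is a placeholder.
cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)

search_request = {
    "Resource": "FeatureGroups",
    "SearchExpression": {
        "Filters": [
            {
                "Name": "CreationTime",
                "Operator": "GreaterThan",
                "Value": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),
            },
            {
                "Name": "FeatureGroupStatus",
                "Operator": "Equals",
                "Value": "Created",
            },
        ]
    },
}

# With a boto3 SageMaker client:
# response = sagemaker_client.search(**search_request)
```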

## How to find feature groups
<a name="feature-store-search-feature-group-metadata-how-to"></a>

You can use the console or the Amazon SageMaker Feature Store API to find your feature groups. The instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### Find feature groups if Studio is your default experience (console)
<a name="feature-store-search-feature-group-metadata-how-to-using-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. Under the **Feature Group Catalog** tab, choose **My account** to view your feature groups.

1. Under the **Feature Group Catalog** tab, choose **Cross account** to view feature groups that others made discoverable to you. Under **Created by**, you can view the resource owner account ID.

1. You can search for your feature groups in the **Search** dropdown list:
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your feature groups, you can specify time ranges, clear (deselect) columns that you don't want to query, choose stores to search, or search by status.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. The discoverability permission alone won't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

### Find feature groups if Studio Classic is your default experience (console)
<a name="feature-store-search-feature-group-metadata-how-to-with-studio-classic"></a>

Use the latest version of Amazon SageMaker Studio Classic to get the most recent version of the search functionality if you are accessing Feature Store through the Studio Classic application. For information about updating Studio Classic, see [Shut Down and Update Amazon SageMaker Studio Classic](studio-tasks-update-studio.md).

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. Choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)) in the left navigation pane.

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. Under the **Feature Group Catalog** tab, choose **My account** to view your feature groups.

1. Under the **Feature Group Catalog** tab, choose **Cross account** to view feature groups that others made discoverable to you. Under **Created by**, you can view the resource owner account ID.

1. You can search for your feature groups in the **Search** dropdown list:
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your feature groups more easily, you can specify time ranges, clear (deselect) columns that you don't want to query, choose stores to search, or search by status.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. The discoverability permission alone won't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

### Find feature groups using SDK for Python (Boto3)
<a name="feature-store-search-feature-group-metadata-how-to-with-sdk"></a>

The code in this section uses the [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) operation in the AWS SDK for Python (Boto3) to run search queries that find feature groups. For information about submitting a query in other languages, see [See Also](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html#API_Search_SeeAlso) in the *Amazon SageMaker API Reference*.

For more Feature Store examples and resources, see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

The following code shows different example search queries using the API:

```
# Return all feature groups
sagemaker_client.search(
    Resource="FeatureGroups",
)  

# Search for feature groups that are shared with your account
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'MyFeatureGroup'
            },
        ]
    },
    SortBy='Name',
    SortOrder='Ascending',
    MaxResults=50,
    CrossAccountFilterOption='CrossAccount'
)

# Search for all feature groups with a name that contains the "ver" substring
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
        ]
    }
)

# Search for all feature groups that have the EXACT name "airport"
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Equals',
                'Value': 'airport'
            },
        ]
    }
)

# Search for all feature groups whose name contains "ver"
# AND have a record identifier feature name that contains "wha"
# AND have a tag (key or value) that contains "hea"
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'RecordIdentifierFeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'AllTags', 
                'Operator': 'Contains',
                'Value': 'hea'
            },
        ]
    }
)  

# Search for all feature groups with the substring "ver" in their names
# OR feature groups that have a record identifier feature name that contains "wha"
# OR feature groups that have a tag (key or value) that contains "hea"
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'RecordIdentifierFeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'AllTags', 
                'Operator': 'Contains',
                'Value': 'hea'
            },
        ],
        'Operator': 'Or' # note that this is explicitly set to "Or"; the default is "And"
    }
)              


# Search for all feature groups with the substring "ver" in their names
# OR feature groups that have a record identifier feature name that contains "wha"
# OR tags with the value 'Sage' for the 'org' key
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'FeatureGroupName',
                'Operator': 'Contains',
                'Value': 'ver'
            },
            {
                'Name': 'RecordIdentifierFeatureName',
                'Operator': 'Contains',
                'Value': 'wha'
            },
            {
                'Name': 'Tags.org', 
                'Operator': 'Contains',
                'Value': 'Sage'
            },
        ],
        'Operator': 'Or' # note that this is explicitly set to "Or"; the default is "And"
    }
)

# Search for all offline only feature groups
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'OnlineStoreConfig.EnableOnlineStore',
                'Operator': 'NotEquals',
                'Value': 'true'
            },
            {
                'Name': 'OfflineStoreConfig.S3StorageConfig.S3Uri',
                'Operator': 'Exists'
            }
        ]
    }
)

# Search for all online only feature groups
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'OnlineStoreConfig.EnableOnlineStore',
                'Operator': 'Equals',
                'Value': 'true'
            },
            {
                'Name': 'OfflineStoreConfig.S3StorageConfig.S3Uri',
                'Operator': 'NotExists'
            }
        ]
    }
)

# Search for all feature groups that are BOTH online and offline
sagemaker_client.search(
    Resource="FeatureGroups",
    SearchExpression={
        'Filters': [
            {
                'Name': 'OnlineStoreConfig.EnableOnlineStore',
                'Operator': 'Equals',
                'Value': 'true'
            },
            {
                'Name': 'OfflineStoreConfig.S3StorageConfig.S3Uri',
                'Operator': 'Exists'
            }
        ]
    }
)
```

You can also create a resource share by calling the AWS RAM API with the SDK for Python (Boto3). The request signature is shown below. To call the AWS RAM API, your execution role must have the AWS RAM full access managed policy attached.

```
response = client.create_resource_share(
    name='string',
    resourceArns=[
        'string',
    ],
    principals=[
        'string',
    ],
    tags=[
        {
            'key': 'string',
            'value': 'string'
        },
    ],
    allowExternalPrincipals=True|False,
    clientToken='string',
    permissionArns=[
        'string',
    ]
)
```
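
As a hedged illustration of the signature above, the following sketch assembles a `create_resource_share` request. Every value — the share name, resource ARN, account ID, and tag — is a placeholder for illustration, not a real resource.

```python
# Placeholder values for illustration only; substitute your own resource
# ARNs and principal account IDs before calling the AWS RAM API.
share_request = {
    "name": "feature-group-catalog-share",
    "resourceArns": [
        "arn:aws:sagemaker:us-east-1:111122223333:feature-group/example-feature-group",
    ],
    "principals": ["444455556666"],  # the consumer AWS account ID
    "allowExternalPrincipals": False,
    "tags": [{"key": "team", "value": "featurestore"}],
}

# With a boto3 AWS RAM client:
# response = ram_client.create_resource_share(**share_request)
```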

# Adding searchable metadata to your features
<a name="feature-store-add-metadata"></a>

In Amazon SageMaker Feature Store, you can search through all of your features. To make your features more discoverable, you can add metadata to them. You can add the following types of metadata:
+ Description – A searchable description of the feature.
+ Parameters – Searchable key-value pairs.

The description can have up to 255 characters, and a feature can have up to 25 parameters. When you search by parameter, you must specify both its key and value.

To update the metadata of a feature, you can use either the console or the [UpdateFeatureMetadata](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureMetadata.html) operation.
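
You can check these limits client-side before calling the operation. The following helper is a sketch, not part of any SDK; it enforces the 255-character description limit and the 25-parameter cap described above.

```python
def validate_feature_metadata(description=None, parameters=None):
    """Raise ValueError if the metadata exceeds the Feature Store limits."""
    if description is not None and len(description) > 255:
        raise ValueError("Description must be 255 characters or fewer")
    if parameters is not None and len(parameters) > 25:
        raise ValueError("A feature can have at most 25 parameters")
    return True

# Example: validate before calling UpdateFeatureMetadata
validate_feature_metadata(
    description="Average purchases in the last 7 days",
    parameters=[{"Key": "team", "Value": "featurestore"}],
)
```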

## How to add searchable metadata to your features
<a name="feature-store-add-metadata-how-to"></a>

You can use the console or the Amazon SageMaker Feature Store API to add searchable metadata to your features. Instructions for using Feature Store through the console depend on whether you have enabled [Amazon SageMaker Studio](studio-updated.md) or [Amazon SageMaker Studio Classic](studio.md) as your default experience.

### Add searchable metadata to features if Studio is your default experience (console)
<a name="feature-store-add-metadata-how-to-with-studio-updated"></a>

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Data** in the left navigation pane to expand the dropdown list.

1. From the dropdown list, choose **Feature Store**.

1. To view your feature groups, under the **Feature Catalog** tab, choose **My account**.

1. Under the **Feature Catalog** tab, choose **Cross account** to view feature groups that others made discoverable to you. Under **Created by**, you can view the resource owner account ID of the feature group.

1. You can search for your feature in the **Search** dropdown list.
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your features more easily, you can specify time ranges or deselect columns that you don't want to query.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. Having the discoverability permission alone doesn't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

1. Choose your feature.

1. Choose **Edit metadata**.

1. In the **Description** field, add or update the description.

1. Under **Parameters**, specify a key-value pair for the parameter.

1. (Optional) Choose **Add new parameter** to add another parameter.

1. Choose **Save changes**.

1. Choose **Confirm**.

### Add searchable metadata to your features if Studio Classic is your default experience (console)
<a name="feature-store-add-metadata-how-to-with-studio-classic"></a>

1. Open the Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic Using the Amazon SageMaker AI Console](studio-launch.md#studio-launch-console).

1. In the left navigation pane, choose the **Home** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Data**.

1. From the dropdown list, choose **Feature Store**.

1. To view your feature groups, under the **Feature Catalog** tab, choose **My account**.

1. Under the **Feature Catalog** tab, choose **Cross account** to view feature groups that other accounts made discoverable to you. Under **Created by**, you can view the resource owner account ID of the feature group.

1. You can search for your feature in the **Search** dropdown list.
   + (Optional) To filter your search, choose the filter icon next to the **Search** dropdown list. You can use filters to specify parameters or date ranges in your search results. If you search for a parameter, specify both its key and value. To find your features more easily, you can specify time ranges or deselect columns that you don't want to query.
   + For shared resources, you can only edit feature group metadata or feature definitions if you have the proper access permission granted from the resource owner account. Having the discoverability permission alone doesn't allow you to edit metadata or feature definitions. For more information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

1. Choose your feature.

1. Choose **Edit metadata**.

1. In the **Description** field, add or update the description.

1. Under **Parameters**, specify a key-value pair for the parameter.

1. (Optional) Choose **Add new parameter** to add another parameter.

1. Choose **Save changes**.

1. Choose **Confirm**.

### Add searchable metadata to your features using SDK for Python (Boto3)
<a name="feature-store-add-metadata-how-to-with-sdk"></a>

The code in this section uses the [UpdateFeatureMetadata](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureMetadata.html) operation in the AWS SDK for Python (Boto3) to add searchable metadata to features in different scenarios. For information about submitting a request in other languages, see [See Also](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateFeatureMetadata.html#API_Search_SeeAlso) in the *Amazon SageMaker API Reference*.

For more Feature Store examples and resources, see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

------
#### [ Add a list of parameters to a feature ]

To add a list of parameters to a feature, specify values for the following fields:
+ `FeatureGroupName`
+ `FeatureName`
+ `Parameters`

The following example code uses the AWS SDK for Python (Boto3) to add two parameters.

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName="feature-group-name",
    FeatureName="feature-name",
    ParameterAdditions=[
        {"Key": "example-key-0", "Value": "example-value-0"},
        {"Key": "example-key-1", "Value": "example-value-1"},
    ]
)
```

------
#### [ Add a description to a feature ]

To add a description to a feature, specify values for the following fields:
+ `FeatureGroupName`
+ `FeatureName`
+ `Description`

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName="feature-group-name",
    FeatureName="feature-name",
    Description="description"
)
```

------
#### [ Remove parameters for a feature ]

To remove parameters from a feature, do the following.

Specify values for the following fields:
+ `FeatureGroupName`
+ `FeatureName`

Specify the keys for the parameters that you're removing under `ParameterRemovals`.

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName="feature-group-name",
    FeatureName="feature-name",
    ParameterRemovals=[
        {"Key": "example-key-0"},
        {"Key": "example-key-1"},
    ]
)
```

------
#### [ Remove the description for a feature ]

To remove the description for a feature, do the following.

Specify values for the following fields:
+ `FeatureGroupName`
+ `FeatureName`

Specify an empty string for `Description`.

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName="feature-group-name",
    FeatureName="feature-name",
    Description=""
)
```

------

#### Example code
<a name="feature-store-add-metadata-python-sdk-example"></a>

After you've updated the metadata for a feature, you can use the [DescribeFeatureMetadata](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureMetadata.html) operation to see the updates that you've made.

The following code goes through an example workflow using the AWS SDK for Python (Boto3). The example code does the following:

1. Sets up your SageMaker AI environment.

1. Creates a feature group.

1. Adds features to the group.

1. Adds metadata to the features.

For more Feature Store examples and resources, see [Amazon SageMaker Feature Store resources](feature-store-resources.md).

##### Step 1: Setup
<a name="feature-store-add-metadata-step-1"></a>

To start using Feature Store, create SageMaker AI, Boto3, and Feature Store sessions. Then set up the Amazon S3 bucket that you want to use for your features. This bucket is your offline store. The following code uses the SageMaker AI default bucket and adds a custom prefix to it.

**Note**  
The role that you use must have the following managed policies attached to it: `AmazonS3FullAccess` and `AmazonSageMakerFeatureStoreAccess`.

```
# SageMaker Python SDK version 2.x is required
%pip install 'sagemaker>=2.0.0'
import sagemaker
import sys
```

```
import boto3
import pandas as pd
import numpy as np
import io
from sagemaker.session import Session
from sagemaker import get_execution_role
from botocore.exceptions import ClientError


prefix = 'sagemaker-featurestore-introduction'
role = get_execution_role()

sagemaker_session = sagemaker.Session()
boto_session = sagemaker_session.boto_session
region = sagemaker_session.boto_region_name
s3_bucket_name = sagemaker_session.default_bucket()
sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
```

##### Step 2: Create a feature group and add features
<a name="feature-store-add-metadata-step-2"></a>

The following code is an example of creating a feature group with feature definitions.

```
feature_group_name = "test-for-feature-metadata"
feature_definitions = [
    {"FeatureName": "feature-1", "FeatureType": "String"},
    {"FeatureName": "feature-2", "FeatureType": "String"},
    {"FeatureName": "feature-3", "FeatureType": "String"},
    {"FeatureName": "feature-4", "FeatureType": "String"},
    {"FeatureName": "feature-5", "FeatureType": "String"}
]
try:
    sagemaker_client.create_feature_group(
        FeatureGroupName=feature_group_name,
        RecordIdentifierFeatureName="feature-1",
        EventTimeFeatureName="feature-2",
        FeatureDefinitions=feature_definitions,
        OnlineStoreConfig={"EnableOnlineStore": True}
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ResourceInUse":
        pass
    else:
        raise e
```

##### Step 3: Add metadata
<a name="feature-store-add-metadata-step-3"></a>

Before you add metadata, use the [DescribeFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html) operation to make sure that the status of the feature group is `Created`.

```
sagemaker_client.describe_feature_group(
    FeatureGroupName=feature_group_name
)
```
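
Because feature group creation is asynchronous, a small polling loop can wait for the `Created` status before you update metadata. The following is a sketch; `describe_fn` stands in for `sagemaker_client.describe_feature_group` so the loop's logic can be shown without calling a live endpoint.

```python
import time

def wait_until_created(describe_fn, feature_group_name, timeout_s=300, poll_s=5):
    """Poll until the feature group reaches "Created", or raise on failure/timeout."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = describe_fn(FeatureGroupName=feature_group_name)["FeatureGroupStatus"]
        if status == "Created":
            return status
        if status == "CreateFailed":
            raise RuntimeError(f"Feature group {feature_group_name} failed to create")
        time.sleep(poll_s)
    raise TimeoutError(f"Timed out waiting for feature group {feature_group_name}")

# Usage with the client created earlier:
# wait_until_created(sagemaker_client.describe_feature_group, feature_group_name)
```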

Add a description to the feature.

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName=feature_group_name,
    FeatureName="feature-1",
    Description="new description"
)
```

You can use the [DescribeFeatureMetadata](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureMetadata.html) operation to see if you successfully updated the description for the feature.

```
sagemaker_client.describe_feature_metadata(
    FeatureGroupName=feature_group_name,
    FeatureName="feature-1"
)
```

You can also use the `UpdateFeatureMetadata` operation to add parameters to the feature.

```
sagemaker_client.update_feature_metadata(
    FeatureGroupName=feature_group_name,
    FeatureName="feature-1",
    ParameterAdditions=[
        {"Key": "team", "Value": "featurestore"},
        {"Key": "org", "Value": "sagemaker"},
    ]
)
```

You can use the [DescribeFeatureMetadata](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureMetadata.html) operation again to see if you have successfully added the parameters.

```
sagemaker_client.describe_feature_metadata(
    FeatureGroupName=feature_group_name,
    FeatureName="feature-1"
)
```

# Create a dataset from your feature groups
<a name="feature-store-create-a-dataset"></a>

After a Feature Store feature group has been created in an offline store, you can use the following methods to get your data:
+ Using the Amazon SageMaker Python SDK
+ Running SQL queries in Amazon Athena

**Important**  
Feature Store requires your data to be registered in an AWS Glue Data Catalog. By default, Feature Store automatically builds an AWS Glue Data Catalog when you create a feature group.

After you've created feature groups for your offline store and populated them with data, you can create a dataset by running queries or by using the SDK to join data stored in the offline store from different feature groups. You can also join feature groups to a single pandas DataFrame. You can use Amazon Athena to write and run SQL queries.
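
For example, an Athena query against the offline store can be submitted with boto3's `start_query_execution`. This is a sketch: the database, table, and bucket names below are placeholders, and Feature Store registers the actual table name in the AWS Glue Data Catalog when you create the feature group.

```python
# Placeholder names: replace with the Glue database and table that Feature
# Store created for your feature group, and an S3 location you own.
query_string = """
SELECT *
FROM "sagemaker_featurestore"."example_feature_group_table"
LIMIT 100
"""

athena_request = {
    "QueryString": query_string,
    "QueryExecutionContext": {"Database": "sagemaker_featurestore"},
    "ResultConfiguration": {
        "OutputLocation": "s3://amzn-s3-demo-bucket/athena-results/"
    },
}

# With a boto3 Athena client:
# response = athena_client.start_query_execution(**athena_request)
```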

**Note**  
To make sure that your data is up to date, you can set up an AWS Glue crawler to run on a schedule.  
 To set up an AWS Glue crawler, specify an IAM role that the crawler uses to access the offline store’s Amazon S3 buckets. For more information, see [Create an IAM role](https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html).  
 For more information on how to use AWS Glue and Athena to build a training dataset for model training and inference, see [Use Feature Store with SDK for Python (Boto3)](feature-store-create-feature-group.md). 

## Using the Amazon SageMaker Python SDK to get your data from your feature groups
<a name="feature-store-dataset-python-sdk"></a>

You can use the [Feature Store APIs](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#dataset-builder) to create a dataset from your feature groups. Data scientists create ML datasets for training by retrieving ML feature data from one or more feature groups in the offline store. Use the `create_dataset()` function to create the dataset. You can use the SDK to do the following:
+ Create a dataset from multiple feature groups.
+ Create a dataset from the feature groups and a pandas data frame.

By default, Feature Store doesn't include records that you've deleted in the dataset. It also doesn't include duplicate records. A duplicate record has the same record identifier and event time value as another record.

Before you use the SDK to create a dataset, you must start a SageMaker AI session. Use the following code to start the session.

```
import boto3
from sagemaker.session import Session
from sagemaker.feature_store.feature_store import FeatureStore

region = boto3.Session().region_name
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client(
    service_name="sagemaker", region_name=region
)
featurestore_runtime = boto_session.client(
    service_name="sagemaker-featurestore-runtime", region_name=region
)

feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime,
)

feature_store = FeatureStore(feature_store_session)
```

The following code shows an example of creating a dataset from multiple feature groups. The snippet uses the example feature groups `base_fg_name`, `first_fg_name`, and `second_fg_name`, which may not exist or may not have the same schema within your Feature Store. We recommend replacing these with feature groups that exist within your Feature Store. For information on how to create a feature group, see [Step 3: Create feature groups](feature-store-introduction-notebook.md#feature-store-set-up-feature-groups-introduction).

```
from sagemaker.feature_store.feature_group import FeatureGroup

s3_bucket_name = "offline-store-sdk-test" 

base_fg_name = "base_fg_name"
base_fg = FeatureGroup(name=base_fg_name, sagemaker_session=feature_store_session)

first_fg_name = "first_fg_name"
first_fg = FeatureGroup(name=first_fg_name, sagemaker_session=feature_store_session)

second_fg_name = "second_fg_name"
second_fg = FeatureGroup(name=second_fg_name, sagemaker_session=feature_store_session)

feature_store = FeatureStore(feature_store_session)
builder = feature_store.create_dataset(
    base=base_fg,
    output_path=f"s3://{s3_bucket_name}",
).with_feature_group(first_fg
).with_feature_group(second_fg, "base_id", ["base_feature_1"])
```

The following code shows an example of creating a dataset from multiple feature groups and a pandas dataframe.

```
import pandas as pd

base_data = [[1, 187512346.0, 123, 128],
             [2, 187512347.0, 168, 258],
             [3, 187512348.0, 125, 184],
             [1, 187512349.0, 195, 206]]
base_data_df = pd.DataFrame(
    base_data, 
    columns=["base_id", "base_time", "base_feature_1", "base_feature_2"]
)

builder = feature_store.create_dataset(
    base=base_data_df, 
    event_time_identifier_feature_name='base_time', 
    record_identifier_feature_name='base_id',
    output_path=f"s3://{s3_bucket_name}"
).with_feature_group(first_fg
).with_feature_group(second_fg, "base_id", ["base_feature_1"])
```

The [Feature Store APIs](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#dataset-builder) provide helper methods for the `create_dataset` function. You can use them to do the following:
+ Create a dataset from multiple feature groups.
+ Create a dataset from multiple feature groups and a pandas dataframe.
+ Create a dataset from a single feature group and a pandas dataframe.
+ Create a dataset using a point-in-time accurate join, where joined records have event times no later than the event time of the corresponding base record.
+ Create a dataset with the duplicated records, instead of following the default behavior of the function.
+ Create a dataset with the deleted records, instead of following the default behavior of the function.
+ Create a dataset for time periods that you specify.
+ Save the dataset as a CSV file.
+ Save the dataset as a pandas dataframe.

The *base* feature group is an important concept for joins. The base feature group is the feature group that has other feature groups or the pandas dataframe joined to it. Each dataset has exactly one base.

You can add the following optional methods to the `create_dataset` function to configure how you create the dataset:
+ `with_feature_group` – Performs an inner join between the base feature group and another feature group using the record identifier and the target feature name in the base feature group. The following provides information about the parameters that you specify:
  + `feature_group` – The feature group that you're joining.
  + `target_feature_name_in_base` – The name of the feature in the base feature group that you're using as a key in the join. The record identifiers in the other feature groups are the other keys that Feature Store uses in the join.
  + `included_feature_names` – A list of strings representing the feature names of the base feature group. You can use the field to specify the features that you want to include in the dataset.
  + `feature_name_in_target` – Optional string representing the feature in the target feature group that will be compared to the target feature in the base feature group.
  + `join_comparator` – Optional `JoinComparatorEnum` representing the comparator used when joining the target feature in the base feature group and the feature in the target feature group. The value can be `GREATER_THAN`, `GREATER_THAN_OR_EQUAL_TO`, `LESS_THAN`, `LESS_THAN_OR_EQUAL_TO`, `NOT_EQUAL_TO`, or `EQUALS` (the default).
  + `join_type` – Optional `JoinTypeEnum` representing the type of join between the base and target feature groups. The value can be `LEFT_JOIN`, `RIGHT_JOIN`, `FULL_JOIN`, `CROSS_JOIN`, or `INNER_JOIN` (the default).
+ `with_event_time_range` – Creates a dataset using the event time range that you specify.
+ `as_of` – Creates a dataset up to a timestamp that you specify. For example, if you specify `datetime(2021, 11, 28, 23, 55, 59, 342380)` as the value, the method creates a dataset that includes records up to November 28th, 2021.
+ `point_in_time_accurate_join` – Creates a dataset where each record from the joined feature group or pandas dataframe has an event time value that is no later than the event time value of the corresponding base record. This prevents leaking future feature values into training data.
+ `include_duplicated_records` – Keeps duplicated values in the feature groups.
+ `include_deleted_records` – Keeps deleted values in the feature groups.
+ `with_number_of_recent_records_by_record_identifier` – An integer that you specify to determine how many of the most recent records for each record identifier appear in the dataset.
+ `with_number_of_records_from_query_results` – An integer that limits how many records from the query results appear in the dataset.
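To build intuition for `join_comparator` and `join_type`, the following is a minimal local pandas sketch of what a comparator-based join produces. It does not call the Feature Store service; all names and values are illustrative.

```
import pandas as pd

base = pd.DataFrame({"base_id": [1, 2], "base_feature_1": [10, 20]})
target = pd.DataFrame({"id": [1, 1, 2], "f": [5, 15, 25]})

# Inner join on the key, then apply a LESS_THAN_OR_EQUAL_TO comparator:
# keep target rows whose "f" value is <= the base row's "base_feature_1".
joined = base.merge(target, left_on="base_id", right_on="id", how="inner")
joined = joined[joined["f"] <= joined["base_feature_1"]]
```

Here only the target row with `f = 5` survives the comparator filter (5 ≤ 10); the rows with `f = 15` and `f = 25` exceed their base values and are dropped.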

After you've configured the dataset, you can specify the output using one of the following methods:
+ `to_csv_file` – Saves the dataset as a CSV file.
+ `to_dataframe` – Saves the dataset as a pandas dataframe.

You can limit how many records appear in the dataset. The following code retrieves the first five records from the query results and saves them as a CSV file.

```
fg1 = FeatureGroup("example-feature-group-1")
feature_store.create_dataset(
    base=fg1, 
    output_path="s3://example-S3-path"
).with_number_of_records_from_query_results(5).to_csv_file()
```

You can also retrieve data from a specific time period. You can use the following code to get data for a specific time range:

```
from datetime import datetime

fg1 = FeatureGroup("fg1")
feature_store.create_dataset(
    base=fg1, 
    output_path="s3://example-S3-path"
).with_event_time_range(
    datetime(2020, 11, 28, 23, 55, 59, 342380), 
    datetime(2021, 11, 28, 23, 55, 59, 342380)
).to_csv_file() #example time range specified in datetime functions
```

You might want to join multiple feature groups to a pandas dataframe, where the event time values of the feature group records are no later than the event time of the corresponding dataframe rows. Use the following code as a template to help you perform the join.

```
fg1 = FeatureGroup("fg1")
fg2 = FeatureGroup("fg2")
events = [['2020-02-01T08:30:00Z', 6, 1],
          ['2020-02-02T10:15:30Z', 5, 2],
          ['2020-02-03T13:20:59Z', 1, 3],
          ['2021-01-01T00:00:00Z', 1, 4]]
df = pd.DataFrame(events, columns=['event_time', 'customer-id', 'title-id']) 
feature_store.create_dataset(
    base=df, 
    event_time_identifier_feature_name='event_time', 
    record_identifier_feature_name='customer-id',
    output_path="s3://example-S3-path"
).with_feature_group(fg1, "customer-id"
).with_feature_group(fg2, "title-id"
).point_in_time_accurate_join(
).to_csv_file()
```

You can also retrieve data up to a specific point in time. The following code retrieves data up to the time specified by the timestamp in the `as_of` method.

```
fg1 = FeatureGroup("fg1")
feature_store.create_dataset(
    base=fg1, 
    output_path="s3://example-s3-file-path"
).as_of(datetime(2021, 11, 28, 23, 55, 59, 342380)
).to_csv_file() # example datetime values
```

## Sample Amazon Athena queries
<a name="feature-store-athena-sample-queries"></a>

You can write queries in Amazon Athena to create a dataset from your feature groups. You can also write queries that create a dataset from feature groups and a single pandas dataframe.

 **Interactive Exploration** 

 This query selects the first 1000 records.  

```
SELECT *
FROM <FeatureGroup.DataCatalogConfig.DatabaseName>.<FeatureGroup.DataCatalogConfig.TableName>
LIMIT 1000
```

 **Latest snapshot without duplicates** 

 This query selects the latest non-duplicate records. 

```
SELECT *
FROM
    (SELECT *,
         row_number()
        OVER (PARTITION BY <RecordIdentifierFeatureName>
    ORDER BY <EventTimeFeatureName> DESC, Api_Invocation_Time DESC, write_time DESC) AS row_num
    FROM <FeatureGroup.DataCatalogConfig.DatabaseName>.<FeatureGroup.DataCatalogConfig.TableName>)
WHERE row_num = 1;
```
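You can parameterize query templates like this in Python before submitting them to Athena. The following is a small string-templating sketch; the database, table, and column names are placeholders you would replace with your feature group's values.

```
# Template for the "latest snapshot without duplicates" query.
LATEST_SNAPSHOT_SQL = """
SELECT *
FROM
    (SELECT *,
         row_number()
        OVER (PARTITION BY {record_id}
    ORDER BY {event_time} DESC, Api_Invocation_Time DESC, write_time DESC) AS row_num
    FROM {database}.{table})
WHERE row_num = 1;
"""

query = LATEST_SNAPSHOT_SQL.format(
    record_id="customer_id",   # your RecordIdentifierFeatureName
    event_time="event_time",   # your EventTimeFeatureName
    database="my_feature_db",  # FeatureGroup.DataCatalogConfig.DatabaseName (placeholder)
    table="my_feature_table",  # FeatureGroup.DataCatalogConfig.TableName (placeholder)
)
```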

 **Latest snapshot without duplicates and deleted records in the offline store** 

 This query filters out any deleted records and selects non-duplicate records from the offline store.  

```
SELECT *
FROM
    (SELECT *,
         row_number()
        OVER (PARTITION BY <RecordIdentifierFeatureName>
    ORDER BY <EventTimeFeatureName> DESC, Api_Invocation_Time DESC, write_time DESC) AS row_num
    FROM <FeatureGroup.DataCatalogConfig.DatabaseName>.<FeatureGroup.DataCatalogConfig.TableName>)
WHERE row_num = 1 and 
NOT is_deleted;
```

 **Time Travel without duplicates and deleted records in the offline store** 

 This query filters out any deleted records and selects non-duplicate records from a particular point in time.

```
SELECT *
FROM
    (SELECT *,
         row_number()
        OVER (PARTITION BY <RecordIdentifierFeatureName>
    ORDER BY <EventTimeFeatureName> DESC, Api_Invocation_Time DESC, write_time DESC) AS row_num
    FROM <FeatureGroup.DataCatalogConfig.DatabaseName>.<FeatureGroup.DataCatalogConfig.TableName>
    where <EventTimeFeatureName> <= timestamp '<timestamp>')
    -- replace timestamp '<timestamp>' with just <timestamp>  if EventTimeFeature is of type fractional
WHERE row_num = 1 and
NOT is_deleted
```
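To run any of these queries programmatically, you can submit them with the Athena `StartQueryExecution` API through Boto3. The following is a hedged sketch; the database, table, and output bucket names are placeholders, and the actual service call is shown commented out.

```
# Placeholders: replace with your feature group's Data Catalog database/table
# and an S3 location that Athena can write results to.
params = {
    "QueryString": "SELECT * FROM my_db.my_table LIMIT 1000",
    "QueryExecutionContext": {"Database": "my_db"},
    "ResultConfiguration": {"OutputLocation": "s3://my-athena-results/"},
}

def submit(athena_client, params):
    """Submit the query and return its execution ID for polling."""
    return athena_client.start_query_execution(**params)["QueryExecutionId"]

# import boto3
# athena = boto3.client("athena")
# execution_id = submit(athena, params)
```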

# Cross account feature group discoverability and access
<a name="feature-store-cross-account"></a>

Data scientists and data engineers can benefit from exploring and accessing features that span multiple accounts, in order to promote data consistency, streamline collaboration, and reduce duplication of effort. 

With Amazon SageMaker Feature Store, you can share feature group resources across accounts. The resources that can be shared in Feature Store are feature group entities or the feature group catalog, where the feature group catalog contains all of the feature group entities on your account. The resource owner account shares resources with the resource consumer accounts. There are two distinct categories of permissions associated with sharing resources:
+ **Discoverability permission**: *Discoverability* means being able to see feature group names and metadata. When you share the feature group catalog and grant the discoverability permission, all feature group entities in the account that you share from (resource owner account) become discoverable by the accounts that you are sharing with (resource consumer account). For example, if you make the feature group catalog in the resource owner account discoverable to a resource consumer account, then principals of the resource consumer account can see all feature groups contained in the resource owner account. It means discoverability is “all or nothing” at the account level (regionalized). This permission is granted to resource consumer accounts by using the feature group catalog resource type.
+ **Access permissions**: When you grant an access permission, you do so at a feature group resource level (not at account level). This gives you more granular control over granting access to data. The type of access permissions that can be granted are: read-only, read-write, and admin. For example, you can select only certain feature groups from the resource owner account to be accessible by principals of the resource consumer account, depending on your business needs. This permission is granted to resource consumer accounts by using the feature group resource type and specifying feature group entities.

The distinction between discoverability and access is important to keep in mind when you set up cross account sharing. Also, the methods of sharing resources differ depending on whether you are sharing online or offline feature groups. For information about online and offline feature groups, see [Feature Store concepts](feature-store-concepts.md). In the following topics, you can learn how to apply discoverability and access permissions to your shared resources.

The following example diagram visualizes the feature group catalog resource versus a feature group resource entity. The feature group catalog contains *all* of your feature group entities and can be shared using the discoverability permission. When granted a discoverability permission, the resource consumer account can search and discover *all* feature group entities within the resource owner account. A feature group entity contains your machine learning data and can be shared using the access permission. When granted an access permission, the resource consumer account can access the feature group data, with access determined by the relevant access permission.

 ![\[Example showing how a resource owner account contains a feature group catalog, which contains feature groups.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/feature-store/feature-store-cross-account-resource-types.png) 

**Topics**
+ [Enabling cross account discoverability](feature-store-cross-account-discoverability.md)
+ [Enabling cross account access](feature-store-cross-account-access.md)

# Enabling cross account discoverability
<a name="feature-store-cross-account-discoverability"></a>

With AWS Resource Access Manager (AWS RAM) you can securely share the feature group catalog, which contains all of your feature group and feature resources, with other AWS accounts. This lets members of your team search and discover feature groups and features that span multiple accounts, promoting data consistency, streamlining collaboration, and reducing duplication of effort.

The resource owner account can share resources with other individual AWS accounts by granting permissions using AWS RAM. The resource consumer account is the AWS account with whom a resource is shared, limited by the permissions granted from the resource owner account. If you are an organization, you may want to take advantage of AWS Organizations, with which you can share resources with individual AWS accounts, with all accounts in your organization, or in an Organization Unit (OU), without having to apply permissions to each account. For instructional videos and more information about AWS RAM concepts and benefits, see [What is AWS Resource Access Manager?](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) in the AWS RAM User Guide.

This section covers how the resource owner account can choose the feature group catalog and grant discoverability privilege to resource consumer accounts, and then how the resource consumer accounts with the discoverability privilege can search for and discover the feature groups within the resource owner account. The discoverability permission does not grant access permissions (read-only, read-write, or admin). Access permissions are granted at a resource level and not at the account level. For information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

The following topics discuss how to share the feature group catalog and how to search for shared resources with discoverability permissions applied.

**Topics**
+ [Share your feature group catalog](feature-store-cross-account-discoverability-share-feature-group-catalog.md)
+ [Search discoverable resources](feature-store-cross-account-discoverability-use.md)

# Share your feature group catalog
<a name="feature-store-cross-account-discoverability-share-feature-group-catalog"></a>

The feature group catalog, `DefaultFeatureGroupCatalog`, contains *all* feature group entities owned by the resource owner account. The catalog can be shared by the resource owner account to grant discoverability to a single or multiple resource consumer accounts. This is done by creating a resource share in AWS Resource Access Manager (AWS RAM). A feature group is the main resource in Amazon SageMaker Feature Store and is composed of feature definitions and records that are managed by Feature Store. For more information about feature groups, see [Feature Store concepts](feature-store-concepts.md).

Discoverability means that the resource consumer accounts can search for the discoverable resources. The discoverable resources are viewed as if they were in their own account (excluding tags). When allowing the feature group catalog to be discoverable, the resource consumer accounts by default are not granted access permissions (read-only, read-write, or admin). Access permissions are granted at a resource level and not at the account level. For information about granting access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

To enable cross account discoverability, you must specify the SageMaker AI Resource Catalog and the feature group catalog while following the [AWS RAM Create a resource share](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html#getting-started-sharing-create) instructions in the AWS RAM User Guide. The following steps give the specifications for using the AWS RAM console.

1. **Specify resource share details**: 
   + Resource type: Choose **SageMaker AI Resource Catalogs**.
   + ARN: Choose the feature group catalog ARN with the format: `arn:aws:sagemaker:us-east-1:111122223333:sagemaker-catalog/DefaultFeatureGroupCatalog`

     *`us-east-1`* is the region of the resource and *`111122223333`* is the resource owner account ID.
   + Resource ID: Choose `DefaultFeatureGroupCatalog`.

1. **Associate managed permissions**: 
   + Managed permission: Choose `AWSRAMPermissionSageMakerCatalogResourceSearch`.

1. **Grant access to principals**:
   + Choose the principal types (AWS account, Organization, or Organizational unit) and enter the appropriate ID.

     If you are an organization, you may want to take advantage of AWS Organizations. With Organizations you can share resources with individual AWS accounts, all accounts in your organization, or with an Organization Unit (OU). This simplifies applying permissions, without having to apply permissions to each account. For more information about sharing your resources and granting permissions within AWS, see [Enable resource sharing within AWS Organizations](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html#getting-started-sharing-orgs) in the AWS Resource Access Manager Developer Guide.

1. **Review and create**: 
   + Review then choose **Create resource share**.
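If you script the share instead of using the console, the catalog ARN from step 1 can be assembled from the Region and account ID. A minimal sketch (the Region and account ID below are the example values from step 1):

```
def feature_group_catalog_arn(region: str, account_id: str) -> str:
    """Build the DefaultFeatureGroupCatalog ARN for a given Region/account."""
    return (
        f"arn:aws:sagemaker:{region}:{account_id}"
        ":sagemaker-catalog/DefaultFeatureGroupCatalog"
    )

arn = feature_group_catalog_arn("us-east-1", "111122223333")
```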

It may take a few minutes for the resource share and principal, or resource consumer account, associations to complete. Once the resource share and principal associations are set, the specified resource consumer accounts receive an invitation to join the resource share. The resource consumer accounts can view and accept the invitations by opening the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares) page in the AWS RAM console. For more information on accepting and viewing resources in AWS RAM, see [Access AWS resources shared with you](https://docs.aws.amazon.com/ram/latest/userguide/working-with-shared.html). Invitations are not sent in these cases:
+ If you are part of an organization in AWS Organizations and sharing in your organization is enabled. In this case principals in the organization automatically get access to the shared resources without invitations.
+ If you share with the AWS account that owns the resource, then the principals in that account automatically get access to the shared resources without invitations.

For more information about accepting and using a resource share, see [Search discoverable resources](feature-store-cross-account-discoverability-use.md).

## Share the feature group catalog using the AWS SDK for Python (Boto3)
<a name="feature-store-cross-account-discoverability-sagemaker-catalog-resource-type-python-sdk-example"></a>

You can use the AWS SDK for Python (Boto3) with the AWS RAM APIs to create a resource share. The following code is an example for a resource owner account ID *`111122223333`* in the region *us-east-1*. The resource owner creates a resource share named *`test-cross-account-catalog`* and shares the feature group catalog with the resource consumer account ID *`444455556666`*. To use the Python SDK for the AWS RAM APIs, attach the `AWSRAMPermissionSageMakerCatalogResourceSearch` policy to the execution role. See [AWS RAM APIs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ram/client/create_resource_share.html) for more details.

```
#Call list resource catalogs as a prerequisite for RAM share
sagemaker_client.list_resource_catalogs()

# Share DefaultFeatureGroupCatalog with other account
ram_client = boto3.client("ram")
response = ram_client.create_resource_share(
    name='test-cross-account-catalog', # Change to your custom resource share name
    resourceArns=[
        'arn:aws:sagemaker:us-east-1:111122223333:sagemaker-catalog/' + 'DefaultFeatureGroupCatalog', # Change 111122223333 to the resource owner account ID
    ],
    principals=[
        '444455556666', # Change 444455556666 to the resource consumer account ID
    ],
    permissionArns = ["arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch"] # AWSRAMPermissionSageMakerCatalogResourceSearch is the only policy allowed for SageMaker Catalog
)
```

Principals are actors in a security system. In a resource-based policy, the allowed principals are IAM users, IAM roles, the root account, or another AWS service.

# Search discoverable resources
<a name="feature-store-cross-account-discoverability-use"></a>

The resource owner account must grant permissions to resource consumer accounts to allow for discoverability or access (read-only, read-write, or admin) privileges with a shared resource. In the following sections, we provide instructions on how to accept an invitation to shared resources and examples showing how to search for discoverable feature groups.

**Accept an invitation to shared resources**

As the resource consumer account, you receive an invitation to join a resource share once the resource owner account has granted permission. To accept the invitation to any shared resources, open the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares) page in the AWS RAM console to view and respond to invitations. Invitations are not sent in these cases:
+ If you are part of an organization in AWS Organizations and sharing in your organization is enabled, then principals in the organization automatically get access to the shared resources without invitations.
+ If you share with the AWS account that owns the resource, then the principals in that account automatically get access to the shared resources without invitations.

For more information about accepting and using a resource share in AWS RAM, see [Respond to the resource share invitation](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html).
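Resource consumer accounts can also respond to invitations programmatically with the AWS RAM `GetResourceShareInvitations` and `AcceptResourceShareInvitation` APIs. The following is a hedged sketch; the filtering helper is plain Python, and the actual service calls are shown commented out.

```
def pending_invitation_arns(invitations):
    """Return ARNs of resource share invitations still awaiting a response."""
    return [
        inv["resourceShareInvitationArn"]
        for inv in invitations
        if inv.get("status") == "PENDING"
    ]

# import boto3
# ram = boto3.client("ram")
# response = ram.get_resource_share_invitations()
# for arn in pending_invitation_arns(response["resourceShareInvitations"]):
#     ram.accept_resource_share_invitation(resourceShareInvitationArn=arn)
```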

## Search discoverable feature groups example
<a name="feature-store-cross-account-discoverability-use-search"></a>

Once resources are shared with a resource consumer account with the discoverability permission applied, the resource consumer account can search for and discover the shared resources in Amazon SageMaker Feature Store using the console UI and the Feature Store SDK. Note that you cannot search on tags for cross account resources. The maximum number of feature group catalogs viewable is 1000. For more information about granting discoverability permissions, see [Enabling cross account discoverability](feature-store-cross-account-discoverability.md).

For details about viewing shared feature groups in the console, see [Find feature groups in your Feature Store](feature-store-search-feature-group-metadata.md).

In the following example, the resource consumer account uses SageMaker AI search to search for resources made discoverable to them when `CrossAccountFilterOption` is set to `"CrossAccount"`:

```
from sagemaker.session import Session

sagemaker_session = Session(boto_session=boto_session)

sagemaker_session.search(
    resource="FeatureGroup",
    search_expression={
        "Filters": [
            {
                "Name": "FeatureGroupName",
                "Value": "MyFeatureGroup",
                "Operator": "Contains",
            }
        ],
        "Operator": "And",
    },
    sort_by="Name",
    sort_order="Ascending",
    next_token="token",
    max_results=50,
    CrossAccountFilterOption="CrossAccount"
)
```

For more information about SageMaker AI search and the request parameters, see [Search](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) in the Amazon SageMaker API Reference.

# Enabling cross account access
<a name="feature-store-cross-account-access"></a>

The access permissions are read-only, read-write, and admin permissions. The permission name, description, and list of specific APIs available for each permission are listed in the following:
+ Read-only permission (`AWSRAMPermissionSageMakerFeatureGroupReadOnly`): The read privilege allows resource consumer accounts to read records in the shared feature groups and view details and metadata. 
  + `DescribeFeatureGroup`: Retrieves details about a feature group and its configuration
  + `DescribeFeatureMetadata`: Shows the metadata for a feature within a feature group
  + `BatchGetRecord`: Retrieves a batch of records from a feature group
  + `GetRecord`: Retrieves a record from a feature group
+ Read-write permission (`AWSRAMPermissionSagemakerFeatureGroupReadWrite`): The read-write privilege allows resource consumer accounts to write records to, and delete records from, the shared feature groups, in addition to read permissions.
  + `PutRecord`: Writes a record to a feature group
  + `DeleteRecord`: Removes a record from a feature group
  + APIs listed in `AWSRAMPermissionSageMakerFeatureGroupReadOnly`
+ Admin permission (`AWSRAMPermissionSagemakerFeatureGroupAdmin`): The admin privilege allows the resource consumer accounts to update the description and parameters of features within the shared feature groups and to update the configuration of the shared feature groups, in addition to the read-write permissions.
  + `DescribeFeatureMetadata`: Shows the metadata for a feature within a feature group
  + `UpdateFeatureGroup`: Updates a feature group configuration
  + `UpdateFeatureMetadata`: Updates description and parameters of a feature in the feature group
  + APIs listed in `AWSRAMPermissionSagemakerFeatureGroupReadWrite`
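As an illustration of a read-only cross-account call, a resource consumer can call `GetRecord` and refer to the shared feature group by its full ARN. This is a hedged sketch; the account ID, feature group name, and record identifier below are illustrative, and the actual service call is shown commented out.

```
# The ARN identifies the feature group in the resource owner account.
fg_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333"
    ":feature-group/my-shared-feature-group"
)

request = {
    "FeatureGroupName": fg_arn,
    "RecordIdentifierValueAsString": "customer-42",
}

# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime", region_name="us-east-1")
# record = runtime.get_record(**request)["Record"]
```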

In the following topics, you can learn how to share online and offline feature groups; the sharing methods differ between the two.

**Topics**
+ [Share online feature groups with AWS Resource Access Manager](feature-store-cross-account-access-online-store.md)
+ [Cross account offline store access](feature-store-cross-account-access-offline-store.md)

# Share online feature groups with AWS Resource Access Manager
<a name="feature-store-cross-account-access-online-store"></a>

With AWS Resource Access Manager (AWS RAM) you can securely share Amazon SageMaker Feature Store online feature groups with other AWS accounts. Members of your team can explore and access feature groups that span multiple accounts, promoting data consistency, streamlining collaboration, and reducing duplication of effort.

The resource owner account can share resources with other individual AWS accounts by granting permissions using AWS RAM. The resource consumer account is the AWS account with whom a resource is shared, limited by the permissions granted from the resource owner account. If you are an organization, you may want to take advantage of AWS Organizations, with which you can share resources with individual AWS accounts, with all accounts in your organization, or in an Organization Unit (OU), without having to apply permissions to each account. For instructional videos and more information about AWS RAM concepts and benefits, see [What is AWS Resource Access Manager?](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) in the AWS RAM User Guide.

Note that there is a soft maximum limit to the transactions per second (TPS) per API per AWS account. The maximum TPS limit applies to *all* transactions on the resources within the resource owner account, so transactions from the resource consumer accounts also count towards this maximum limit. For information about service quotas and how to request a quota increase, see [AWS service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).

This section covers how the resource owner account can choose feature groups and grant access privileges (read-only, read-write, and admin) to resource consumer accounts, and then how the resource consumer accounts with access privileges can use those feature groups. The access permissions do not allow for the resource consumer accounts to search and discover feature groups. To allow for resource consumer accounts to search and discover feature groups from the resource owner account, the resource owner account must grant discoverability permission to the resource consumer accounts, where all of the feature groups within the resource owner account are discoverable by the resource consumer accounts. For more information about granting the discoverability permission, see [Enabling cross account discoverability](feature-store-cross-account-discoverability.md).

The following topics show how to share Feature Store online store resources using the AWS RAM console. For information about sharing your resources and granting permissions within AWS using the AWS RAM console or AWS Command Line Interface (AWS CLI), see [Sharing your AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html).

**Topics**
+ [Share your feature group entities](feature-store-cross-account-access-online-store-share-feature-group.md)
+ [Use online store shared resources with access permissions](feature-store-cross-account-access-online-store-use.md)

# Share your feature group entities
<a name="feature-store-cross-account-access-online-store-share-feature-group"></a>

As the resource owner account you can use the feature group resource type for Amazon SageMaker Feature Store to share feature group entities, by creating a resource share in AWS Resource Access Manager (AWS RAM). 

Use the following instructions along with the [Sharing your AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html#getting-started-sharing-create) instructions in the AWS RAM User Guide.

When sharing the feature group resource type using the AWS RAM console, you need to make the following choices.

1. **Specify resource share details**: 
   + Resource type: Choose **SageMaker AI Feature Groups**.
   + ARN: Choose your feature group ARN with the format: `arn:aws:sagemaker:us-east-1:111122223333:feature-group/your-feature-group-name`.

     `us-east-1` is the region of the resource, `111122223333` is the resource owner account ID, and `your-feature-group-name` is the feature group you are sharing.
   + Resource ID: Choose the feature group, `your-feature-group-name`, to which you want to grant access permissions.

1. **Associate managed permissions**: 
   + Managed permission: Choose the access permission. For more information about access permissions, see [Enabling cross account access](feature-store-cross-account-access.md).

1. **Grant access to principals**:
   + Choose the principal type (AWS account, Organization, Organizational unit, IAM role, or IAM user) and enter the appropriate ID or ARN.

1. **Review and create**: 
   + Review then choose **Create resource share**.

Granting any access permission does not grant resource consumer accounts the discoverability permission, so the resource consumer accounts with access permissions cannot search and discover those feature groups. To allow for resource consumer accounts to search and discover feature groups from the resource owner account, the resource owner account must grant the discoverability permission to the resource consumer accounts, where *all* of the feature groups within the resource owner account are discoverable by the resource consumer accounts. For more information about granting the discoverability permission, see [Enabling cross account discoverability](feature-store-cross-account-discoverability.md).

If the resource consumer accounts are only granted access permissions, the feature group entities can still be viewed on AWS RAM. To view resources on AWS RAM, see [Access AWS resources shared with you](https://docs.aws.amazon.com/ram/latest/userguide/working-with-shared.html) in the AWS RAM User Guide.

It may take a few minutes for the associations between the resource share and the principals (the resource consumer accounts) to complete. Once the associations are set, the specified resource consumer accounts receive an invitation to join the resource share. The resource consumer accounts can view and accept the invitations by opening the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares) page in the AWS RAM console. Invitations are not sent in these cases:
+ If you are part of an organization in AWS Organizations and sharing in your organization is enabled, then principals in the organization automatically get access to the shared resources without invitations.
+ If you share with the AWS account that owns the resource, then the principals in that account automatically get access to the shared resources without invitations.

For more information about accepting and using a resource share in AWS RAM, see [Using shared AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html) in the AWS RAM User Guide.

## Share online store feature groups using the AWS SDK for Python (Boto3)
<a name="feature-store-cross-account-access-online-store-python-sdk-example"></a>

You can use the AWS SDK for Python (Boto3) with the AWS RAM APIs to create a resource share. In the following example, the resource owner account `111122223333` creates a resource share named `'test-cross-account-fg'` that shares the feature group named `'my-feature-group'` with the resource consumer account `444455556666`, granting the `AWSRAMPermissionSageMakerFeatureGroupReadOnly` permission. For more information about access permissions, see [Enabling cross account access](feature-store-cross-account-access.md). To use the SDK for Python (Boto3) with the AWS RAM APIs, your execution role must have the AWS RAM full access managed policy attached. For more details, see the [create\_resource\_share](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ram/client/create_resource_share.html) AWS RAM API.

```
import boto3

# Choose feature group name
feature_group_name = 'my-feature-group' # Change to your feature group name 

# Share 'my-feature-group' with other account
ram_client = boto3.client("ram")
response = ram_client.create_resource_share(
    name='test-cross-account-fg', # Change to your custom resource share name
    resourceArns=[
        'arn:aws:sagemaker:us-east-1:111122223333:feature-group/' + feature_group_name, # Change 111122223333 to the resource owner account ID
    ],
    principals=[
        '444455556666', # Change 444455556666 to the resource consumer account ID
    ],
    permissionArns = ["arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadOnly"]
)
```

Principals are actors in a security system. In a resource-based policy, the allowed principals are IAM users, IAM roles, the root account, or another AWS service.

# Use online store shared resources with access permissions
<a name="feature-store-cross-account-access-online-store-use"></a>

The resource owner account must grant permissions to resource consumer accounts to allow for discoverability, read-only, write, or admin privileges with a shared resource. In the following sections, we provide instructions on how to accept an invitation to access shared resources and provide examples showing how to view and interact with shared feature groups.

**Accept an invitation to access shared resources using AWS RAM**

As the resource consumer account, you will receive an invitation to join a resource share once the resource owner account has granted permission. To accept the invitation to any shared resources, open the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares) page in the AWS RAM console to view and respond to invitations. Invitations are not sent in these cases:
+ If you are part of an organization in AWS Organizations and sharing in your organization is enabled, then principals in the organization automatically get access to the shared resources without invitations.
+ If you share with the AWS account that owns the resource, then the principals in that account automatically get access to the shared resources without invitations.

For more information about accepting and using a resource share in AWS RAM, see [Using shared AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html) in the AWS RAM User Guide.
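
The invitation steps above can also be scripted. The following is a minimal sketch using the AWS RAM APIs `get_resource_share_invitations` and `accept_resource_share_invitation`, assuming the AWS SDK for Python (Boto3) is installed and the resource consumer account's credentials are configured; the helper names are hypothetical.

```
def pending_invitations(invitations):
    """Filter AWS RAM invitations down to those still awaiting a response."""
    return [i for i in invitations if i.get("status") == "PENDING"]


def accept_pending_resource_shares():
    """List and accept all pending resource share invitations in this account."""
    import boto3  # Deferred import so the helper above is usable on its own

    ram_client = boto3.client("ram")
    response = ram_client.get_resource_share_invitations()
    for invitation in pending_invitations(response["resourceShareInvitations"]):
        ram_client.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation["resourceShareInvitationArn"]
        )


# The filter helper on its own, with documentation-style data:
print(pending_invitations([
    {"status": "PENDING", "resourceShareName": "test-cross-account-fg"},
    {"status": "ACCEPTED", "resourceShareName": "other-share"},
]))
```

Note that no invitation is sent in the automatic-access cases listed above, so accepting is only needed for cross-account shares outside your organization.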

## View shared resources on the AWS RAM console
<a name="feature-store-cross-account-access-online-store-use-view"></a>

Granting an access permission does not grant resource consumer accounts the discoverability permission, so resource consumer accounts with access permissions cannot search for and discover those feature groups. To allow resource consumer accounts to search for and discover feature groups from the resource owner account, the resource owner account must grant the discoverability permission to the resource consumer accounts. When granted, all of the feature groups within the resource owner account become discoverable by the resource consumer accounts. For more information about granting the discoverability permission, see [Enabling cross account discoverability](feature-store-cross-account-discoverability.md).

To view the shared resources on the AWS RAM console, open the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares) page in the AWS RAM console. 

## Read and write actions with a shared feature group example
<a name="feature-store-cross-account-access-online-store-use-read-write-actions"></a>

Once your resource consumer account is granted the appropriate permissions by the resource owner account, you can perform actions on the shared resources using the Feature Store SDK. You can do this by providing the resource ARN as the `FeatureGroupName`. To obtain the feature group ARN, you can use the AWS SDK for Python (Boto3) [describe\_feature\_group](https://boto3.amazonaws.com/v1/documentation/api/1.26.98/reference/services/sagemaker/client/describe_feature_group.html#describe-feature-group) function or use the console UI. For information about using the console UI to view feature group details, see [View feature group details from the console](feature-store-use-with-studio.md#feature-store-view-feature-group-detail-studio).
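
As a sketch of the ARN lookup, the following assumes Boto3 is installed and credentials with the `sagemaker:DescribeFeatureGroup` permission are configured; the helper names are hypothetical.

```
def get_feature_group_arn(feature_group_name, region="us-east-1"):
    """Look up a feature group's ARN with DescribeFeatureGroup."""
    import boto3  # Deferred import so the parsing helper below works without boto3

    sagemaker_client = boto3.client("sagemaker", region_name=region)
    response = sagemaker_client.describe_feature_group(
        FeatureGroupName=feature_group_name
    )
    return response["FeatureGroupArn"]


def owner_account_from_arn(feature_group_arn):
    """Extract the resource owner account ID from a feature group ARN.

    ARN format: arn:aws:sagemaker:<region>:<account-id>:feature-group/<name>
    """
    return feature_group_arn.split(":")[4]


# Parsing a documentation-style ARN:
print(owner_account_from_arn(
    "arn:aws:sagemaker:us-east-1:111122223333:feature-group/test-fg"))
# 111122223333
```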

The following examples use `PutRecord` and `GetRecord` with a shared feature group entity. See the request and response syntax in the AWS SDK for Python (Boto3) documentation for [put\_record](https://boto3.amazonaws.com/v1/documentation/api/1.26.98/reference/services/sagemaker-featurestore-runtime/client/put_record.html#put-record) and [get\_record](https://boto3.amazonaws.com/v1/documentation/api/1.26.98/reference/services/sagemaker-featurestore-runtime/client/get_record.html#get-record).

```
import boto3

featurestore_runtime = boto3.client('sagemaker-featurestore-runtime')

# Put a record into the feature group named 'test-fg' owned by the resource owner account ID 111122223333
featurestore_runtime.put_record(
    FeatureGroupName="arn:aws:sagemaker:us-east-1:111122223333:feature-group/test-fg", 
    Record=[value.to_dict() for value in record] # You will need to define record prior to calling PutRecord
)
```

```
import boto3

featurestore_runtime = boto3.client('sagemaker-featurestore-runtime')

# Choose the record identifier
record_identifier_value = str(2990130)

# Get a record from the feature group named 'test-fg' owned by the resource owner account ID 111122223333
featurestore_runtime.get_record(
    FeatureGroupName="arn:aws:sagemaker:us-east-1:111122223333:feature-group/test-fg", 
    RecordIdentifierValueAsString=record_identifier_value
)
```

For more information about granting permissions to feature group entities, see [Share your feature group entities](feature-store-cross-account-access-online-store-share-feature-group.md).

# Cross account offline store access
<a name="feature-store-cross-account-access-offline-store"></a>

 Amazon SageMaker Feature Store allows users to create a feature group in one account (Account A) and configure it with an offline store using an Amazon S3 bucket in another account (Account B). You can set this up using the steps in the following section.

**Topics**
+ [Step 1: Set up the offline store access role in Account A](#feature-store-setup-step1)
+ [Step 2: Set up an offline store Amazon S3 bucket in Account B](#feature-store-setup-step2)
+ [Step 3: Set up an offline store AWS KMS encryption key in Account A](#feature-store-setup-step3)
+ [Step 4: Create a feature group in Account A](#feature-store-setup-step4)

## Step 1: Set up the offline store access role in Account A
<a name="feature-store-setup-step1"></a>

First, set up a role for Amazon SageMaker Feature Store to write the data into the offline store. The simplest way to accomplish this is to create a new role using the `AmazonSageMakerFeatureStoreAccess` policy or to use an existing role that already has the `AmazonSageMakerFeatureStoreAccess` policy attached. This document refers to the ARN of this role as `Account-A-Offline-Feature-Store-Role-ARN`. 

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetBucketAcl",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::*SageMaker*",
                "arn:aws:s3:::*Sagemaker*",
                "arn:aws:s3:::*sagemaker*"
            ]
        }
    ]
}
```

------

The preceding code snippet shows the `AmazonSageMakerFeatureStoreAccess` policy. The `Resource` section of the policy is scoped down by default to S3 buckets with names that contain `SageMaker`, `Sagemaker`, or `sagemaker`. This means the offline store Amazon S3 bucket being used must follow this naming convention. If this is not the case, or if you want to further scope down the resource, you can copy the policy, customize the `Resource` section to be `arn:aws:s3:::your-offline-store-bucket-name`, and attach it to the role. 

Additionally, this role must have AWS KMS permissions attached. At a minimum, it requires the `kms:GenerateDataKey` permission to be able to write to the offline store using your customer managed key. See Step 3 to learn why a customer managed key is needed for the cross-account scenario and how to set it up. The following example shows an inline policy: 

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey"
            ],
            "Resource": "arn:aws:kms:*:111122223333:key/*"
        }
    ]
}
```

------

The `Resource` section of this policy is scoped to any key in Account A. To further scope this down, after setting up the offline store KMS key in Step 3, return to this policy and replace it with the key ARN.

## Step 2: Set up an offline store Amazon S3 bucket in Account B
<a name="feature-store-setup-step2"></a>

Create an Amazon S3 bucket in Account B. If you are using the default `AmazonSageMakerFeatureStoreAccess` policy, the bucket name must include `SageMaker`, `Sagemaker`, or `sagemaker`. Edit the bucket policy as shown in the following example to allow Account A to read and write objects.

This document refers to the following example bucket policy as `Account-B-Offline-Feature-Store-Bucket`. 

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Sid": "S3CrossAccountBucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetBucketAcl"
            ],
            "Principal": {
                "AWS": [
                    "Account-A-Offline-Feature-Store-Role-ARN"
                ]
            },
            "Resource": [
                "arn:aws:s3:::offline-store-bucket-name/*",
                "arn:aws:s3:::offline-store-bucket-name"
            ]
        }
    ]
}
```

------

In the preceding policy, the principal is `Account-A-Offline-Feature-Store-Role-ARN`, which is the role created in Account A in Step 1 and provided to Amazon SageMaker Feature Store to write to the offline store. You can provide multiple ARN roles under `Principal`.

## Step 3: Set up an offline store AWS KMS encryption key in Account A
<a name="feature-store-setup-step3"></a>

Amazon SageMaker Feature Store ensures that server-side encryption is always enabled for Amazon S3 objects in the offline store. For cross account use cases, you must provide a customer managed key so that you are in control of who can write to the offline store (in this case, `Account-A-Offline-Feature-Store-Role-ARN` from Account A) and who can read from the offline store (in this case, identities from Account B). 

This document refers to the following example key policy as `Account-A-Offline-Feature-Store-KMS-Key-ARN`.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Id": "key-consolepolicy-3",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                "arn:aws:iam::111122223333:role/Administrator"
                ]
            },
            "Action": [
                "kms:Create*",
                "kms:Describe*",
                "kms:Enable*",
                "kms:List*",
                "kms:Put*",
                "kms:Update*",
                "kms:Revoke*",
                "kms:Disable*",
                "kms:Get*",
                "kms:Delete*",
                "kms:TagResource",
                "kms:UntagResource",
                "kms:ScheduleKeyDeletion",
                "kms:CancelKeyDeletion"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow Feature Store to get information about the customer managed key",
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": [
                "kms:Describe*",
                "kms:Get*",
                "kms:List*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "Account-A-Offline-Feature-Store-Role-ARN",
                    "arn:aws:iam::444455556666:root"
                ]
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:CreateGrant",
                "kms:RetireGrant",
                "kms:ReEncryptFrom",
                "kms:ReEncryptTo",
                "kms:GenerateDataKey",
                "kms:ListAliases",
                "kms:ListGrants"
            ],
            "Resource": "*"
        }
    ]
}
```

------

## Step 4: Create a feature group in Account A
<a name="feature-store-setup-step4"></a>

Next, create the feature group in Account A with an offline store Amazon S3 bucket in Account B. To do this, provide the following parameters: 
+ Provide `Account-A-Offline-Feature-Store-Role-ARN` as the `RoleArn`.
+ Provide `Account-A-Offline-Feature-Store-KMS-Key-ARN` for `OfflineStoreConfig.S3StorageConfig.KmsKeyId`.
+ Provide `Account-B-Offline-Feature-Store-Bucket` for `OfflineStoreConfig.S3StorageConfig.S3Uri`.
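
The steps above can be sketched with the AWS SDK for Python (Boto3). The feature definitions and record identifier below are hypothetical placeholders for your own schema; the ARNs and bucket name follow the placeholder names used in Steps 1 through 3.

```
def build_cross_account_feature_group_request(
    feature_group_name, role_arn, kms_key_arn, s3_uri
):
    """Assemble CreateFeatureGroup parameters for the cross-account setup."""
    return {
        "FeatureGroupName": feature_group_name,
        # Hypothetical schema; replace with your own features
        "RecordIdentifierFeatureName": "record_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "record_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
        ],
        "RoleArn": role_arn,  # Account-A-Offline-Feature-Store-Role-ARN (Step 1)
        "OfflineStoreConfig": {
            "S3StorageConfig": {
                "S3Uri": s3_uri,  # Account-B-Offline-Feature-Store-Bucket (Step 2)
                "KmsKeyId": kms_key_arn,  # Account-A-Offline-Feature-Store-KMS-Key-ARN (Step 3)
            }
        },
    }


request = build_cross_account_feature_group_request(
    feature_group_name="my-cross-account-fg",
    role_arn="arn:aws:iam::111122223333:role/Account-A-Offline-Feature-Store-Role",
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    s3_uri="s3://account-b-offline-feature-store-sagemaker",
)
# To create the feature group, run this in Account A (requires Boto3 and credentials):
# import boto3
# boto3.client("sagemaker").create_feature_group(**request)
```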

# Security and access control
<a name="feature-store-security"></a>

Amazon SageMaker Feature Store enables you to create two types of stores: an online store and an offline store. The online store is used for low-latency, real-time inference use cases, whereas the offline store is used for training and batch inference use cases. When you create a feature group for online or offline use, you can provide an AWS Key Management Service customer managed key to encrypt all your data at rest. If you do not provide an AWS KMS key, your data is encrypted on the server side using an AWS owned or AWS managed AWS KMS key. While creating a feature group, you can select the storage type and optionally provide an AWS KMS key for encrypting data. You can then call various data management APIs, such as `PutRecord`, `GetRecord`, and `DeleteRecord`.

Feature Store allows you to grant or deny access to individuals at the feature group-level and enables cross-account access to Feature Store. For example, you can set up developer accounts to access the offline store for model training and exploration that do not have write access to production accounts. You can set up production accounts to access both online and offline stores. Feature Store uses unique customer AWS KMS keys for offline and online store data at-rest encryption. Access control is enabled through both API and AWS KMS key access. You can also create feature group-level access control. 

 For more information about customer managed keys, see [customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys). For more information about AWS KMS, see [AWS KMS](https://aws.amazon.com/kms/). 

## Using AWS KMS permissions for Amazon SageMaker Feature Store
<a name="feature-store-kms-cmk-permissions"></a>

 Encryption at rest protects Feature Store under an AWS KMS key. By default, it uses an [AWS owned KMS key for the online store and an AWS managed KMS key for the offline store](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#aws-owned-cmk). Feature Store supports an option to encrypt your online or offline store under a [customer managed key](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk). You can select the customer managed key for Feature Store when you create your online or offline store, and the keys can be different for each store. 

 Feature Store supports only [symmetric customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-concepts.html#symmetric-cmks). You cannot use an [asymmetric customer managed key](https://docs.aws.amazon.com/kms/latest/developerguide/symm-asymm-concepts.html#asymmetric-cmks) to encrypt your data in your online or offline store. For help determining whether a customer managed key is symmetric or asymmetric, see [Identifying symmetric and asymmetric customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/find-symm-asymm.html).

When you use a customer managed key, you can take advantage of the following features: 
+  You create and manage the customer managed key, including setting the [key policies](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html), [IAM policies](https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html) and [grants](https://docs.aws.amazon.com/kms/latest/developerguide/grants.html) to control access to the customer managed key. You can [enable and disable](https://docs.aws.amazon.com/kms/latest/developerguide/enabling-keys.html) the customer managed key, enable and disable [automatic key rotation](https://docs.aws.amazon.com/kms/latest/developerguide/rotate-keys.html), and [delete the customer managed key](https://docs.aws.amazon.com/kms/latest/developerguide/deleting-keys.html) when it is no longer in use. 
+  You can use a customer managed key with [imported key material](https://docs.aws.amazon.com/kms/latest/developerguide/importing-keys.html) or a customer managed key in a [custom key store](https://docs.aws.amazon.com/kms/latest/developerguide/custom-key-store-overview.html) that you own and manage. 
+  You can audit the encryption and decryption of your online or offline store by examining the API calls to AWS KMS in [AWS CloudTrail logs](https://docs.aws.amazon.com/kms/latest/developerguide/services-dynamodb.html#dynamodb-cmk-trail). 

You do not pay a monthly fee for AWS owned KMS keys. Customer managed keys [incur a charge](https://aws.amazon.com/kms/pricing/) for each API call, and AWS Key Management Service quotas apply to each customer managed key.

## Authorizing use of a customer managed key for your online store
<a name="feature-store-authorizing-cmk-online-store"></a>

 If you use a [customer managed key ](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) to protect your online store, the policies on that customer managed key must give Feature Store permission to use it on your behalf. You have full control over the policies and grants on a customer managed key.

 Feature Store does not need additional authorization to use the default [AWS owned KMS key](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#master_keys) to protect your online or offline stores in your AWS account.

### Customer managed key policy
<a name="feature-store-customer-managed-cmk-policy"></a>

 When you select a [customer managed key](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) to protect your online store, Feature Store must have permission to use the customer managed key on behalf of the principal who makes the selection. That principal, a user or role, must have the permissions on the customer managed key that Feature Store requires. You can provide these permissions in a [key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html), an [IAM policy](https://docs.aws.amazon.com/kms/latest/developerguide/iam-policies.html), or a [grant](https://docs.aws.amazon.com/kms/latest/developerguide/grants.html). At a minimum, Feature Store requires the following permissions on a customer managed key: 
+ `kms:Encrypt`, `kms:Decrypt`, `kms:DescribeKey`, `kms:CreateGrant`, `kms:RetireGrant`, `kms:ReEncryptFrom`, `kms:ReEncryptTo`, `kms:GenerateDataKey`, `kms:ListAliases`, `kms:ListGrants`, `kms:RevokeGrant`

 For example, the following example key policy provides only the required permissions. The policy has the following effects: 
+  Allows Feature Store to use the customer managed key in cryptographic operations and create grants, but only when it is acting on behalf of principals in the account who have permission to use your Feature Store. If the principals specified in the policy statement don't have permission to use your Feature Store, the call fails, even when it comes from the Feature Store service. 
+  The [kms:ViaService](https://docs.aws.amazon.com/kms/latest/developerguide/policy-conditions.html#conditions-kms-via-service) condition key allows the permissions only when the request comes from Feature Store on behalf of the principals listed in the policy statement. These principals can't call these operations directly. The value for `kms:ViaService` should be `sagemaker.*.amazonaws.com`. 
**Note**  
 The `kms:ViaService` condition key can only be used for the online store customer managed AWS KMS key, and cannot be used for the offline store. If you add this special condition to your customer managed key, and use the same AWS KMS key for both the online and offline store, then it will fail the `CreateFeatureGroup` API operation. 
+  Gives the customer managed key administrators read-only access to the customer managed key and permission to revoke grants, including the grants that Feature Store uses to protect your data. 

 Before using an example key policy, replace the example principals with actual principals from your AWS account. 

------
#### [ JSON ]

****  

```
{"Id": "key-policy-feature-store",
   "Version":"2012-10-17",
   "Statement": [
     {"Sid" : "Allow access through Amazon SageMaker AI Feature Store for all principals in the account that are authorized to use Amazon SageMaker AI Feature Store",
       "Effect": "Allow",
       "Principal": {"AWS": "arn:aws:iam::111122223333:user/featurestore-user"},
       "Action": [
         "kms:Encrypt",
         "kms:Decrypt",
         "kms:DescribeKey",
         "kms:CreateGrant",
         "kms:RetireGrant",
         "kms:ReEncryptFrom",
         "kms:ReEncryptTo",
         "kms:GenerateDataKey",
         "kms:ListGrants"
       ],
       "Resource": "*",      
       "Condition": {"StringLike": {"kms:ViaService" : "sagemaker.*.amazonaws.com"
          }
       }
     },
     {"Sid" : "Allow listing aliases",
       "Effect": "Allow",
       "Principal": {"AWS": "arn:aws:iam::111122223333:user/featurestore-user"},
       "Action": "kms:ListAliases",
       "Resource": "*"
     },
     {"Sid":  "Allow administrators to view the customer managed key and revoke grants",
       "Effect": "Allow",
       "Principal": {"AWS": "arn:aws:iam::111122223333:role/featurestore-admin"
        },
       "Action": [
         "kms:Describe*",
         "kms:Get*",
         "kms:List*",
         "kms:RevokeGrant"
       ],
       "Resource": "*"
     },
     {"Sid": "Enable IAM User Permissions",
       "Effect": "Allow",
       "Principal": {"AWS": "arn:aws:iam::111122223333:root"
        },
        "Action": "kms:*",
        "Resource": "*"
     }
   ]
 }
```

------

## Using grants to authorize Feature Store
<a name="feature-store-using-grants-authorize"></a>

 In addition to key policies, Feature Store uses grants to set permissions on the customer managed key. To view the grants on a customer managed key in your account, use the [ListGrants](https://docs.aws.amazon.com/kms/latest/APIReference/API_ListGrants.html) operation. Feature Store does not need grants, or any additional permissions, to use the [AWS owned KMS key](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#aws-owned-cmk) to protect your online store. 

 Feature Store uses the grant permissions when it performs background system maintenance and continuous data protection tasks. 

 Each grant is specific to an online store. If the account includes multiple stores encrypted under the same customer managed key, there will be unique grants per `FeatureGroup` using the same customer managed key. 

 The key policy can also allow the account to [revoke the grant](https://docs.aws.amazon.com/kms/latest/APIReference/API_RevokeGrant.html) on the customer managed key. However, if you revoke the grant on an active encrypted online store, Feature Store won't be able to protect and maintain the store. 

## Monitoring Feature Store interaction with AWS KMS
<a name="feature-store-monitoring-kms-interaction"></a>

 If you use a [customer managed key](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk) to protect your online or offline store, you can use AWS CloudTrail logs to track the requests that Feature Store sends to AWS KMS on your behalf.

## Accessing data in your online store
<a name="feature-store-accessing-data-online-store"></a>

 The caller (either a user or role) of **all** data plane operations (`PutRecord`, `GetRecord`, `DeleteRecord`) must have the following permission on the customer managed key: 

```
"kms:Decrypt"
```
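
For example, a minimal identity-based policy granting this permission might look like the following sketch; the key ARN is a placeholder to replace with your customer managed key's ARN.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDecryptForFeatureStoreDataPlane",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
    ]
}
```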

## Authorizing use of a customer managed key for your offline store
<a name="feature-store-authorizing-use-cmk-offline-store"></a>

 The **RoleArn** that is passed as a parameter to `CreateFeatureGroup` must have the following permission on the offline store `KmsKeyId`: 

```
"kms:GenerateDataKey"
```

**Note**  
The key policy for the online store also works for the offline store, but only when the `kms:ViaService` condition is not specified. 

**Important**  
You can specify an AWS KMS encryption key to encrypt the Amazon S3 location used for your offline feature store when you create a feature group. If no AWS KMS encryption key is specified, all data at rest is encrypted by default using an AWS KMS key. By defining your [bucket-level key](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-key.html) for SSE, you can reduce AWS KMS request costs by up to 99 percent. 

# Logging Feature Store operations by using AWS CloudTrail
<a name="feature-store-logging-using-cloudtrail"></a>

Amazon SageMaker Feature Store is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Feature Store. CloudTrail captures all of the API calls for Feature Store listed on this page. The logged events include API calls from Feature Store resource management and data operations. When you create a trail, you activate continuous delivery of CloudTrail events from Feature Store to an Amazon S3 bucket. Using the information collected by CloudTrail, you can determine the request that was made to Feature Store, the IP address from which the request was made, who made the request, when it was made, and additional details.

To learn more about CloudTrail, see the [AWS CloudTrail User Guide](https://docs.aws.amazon.com/awscloudtrail/latest/userguide).

## Management events
<a name="feature-store-logging-using-cloudtrail-management-events"></a>

Management events capture operations performed on Feature Store resources in your AWS account. For example, the log generated from the management events provides visibility when a user creates or deletes a feature group. The following APIs log management events with Amazon SageMaker Feature Store.
+ `CreateFeatureGroup`
+ `DeleteFeatureGroup`
+ `DescribeFeatureGroup`
+ `UpdateFeatureGroup`

Amazon SageMaker API calls and management events are logged by default when you create the account, as described in [Logging Amazon SageMaker AI API calls using AWS CloudTrail](logging-using-cloudtrail.md). For more information, see [ Logging management events for trails](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-management-events-with-cloudtrail.html). 

## Data events
<a name="feature-store-logging-using-cloudtrail-data-events"></a>

Data events capture data plane operations performed using the Feature Store resources in your AWS account. For example, the log generated from the data events provides visibility if a user adds or deletes a record within a feature group. The following APIs log data events with Amazon SageMaker Feature Store. 
+ `BatchGetRecord`
+ `DeleteRecord`
+ `GetRecord`
+ `PutRecord`

Data events are *not* logged by CloudTrail trails by default. To activate logging of data events, turn on logging of data plane API activity in CloudTrail. For more information, see CloudTrail's [ Logging data events for trails](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html). 
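One way to turn on data event logging for Feature Store is with a CloudTrail advanced event selector that matches the `AWS::SageMaker::FeatureGroup` resource type (the type shown in the example event that follows). The following is a minimal sketch using the SDK for Python (Boto3); the trail name and selector name are placeholders:

```python
def feature_group_data_event_selectors():
    """Advanced event selector matching Feature Store data plane events."""
    return [
        {
            "Name": "Log Feature Store data events",  # placeholder selector name
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::SageMaker::FeatureGroup"]},
            ],
        }
    ]

def enable_data_event_logging(trail_name):
    """Attach the selector above to an existing trail."""
    import boto3  # imported lazily; running this requires AWS credentials

    ct = boto3.client("cloudtrail")
    return ct.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=feature_group_data_event_selectors(),
    )
```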

 The following is an example CloudTrail event for a `PutRecord` API call: 

```
{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "USERPRINCIPALID",
        "arn": "arn:aws:iam::123456789012:user/user",
        "accountId": "123456789012",
        "accessKeyId": "USERACCESSKEYID",
        "userName": "your-user-name"
    },
    "eventTime": "2023-01-01T01:00:00Z",
    "eventSource": "sagemaker.amazonaws.com",
    "eventName": "PutRecord",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userAgent": "your-user-agent",
    "requestParameters": {
        "featureGroupName": "your-feature-group-name"
    },
    "responseElements": null,
    "requestID": "request-id",
    "eventID": "event-id",
    "readOnly": false,
    "resources": [
        {
            "accountId": "123456789012",
            "type": "AWS::SageMaker::FeatureGroup",
            "ARN": "arn:aws:sagemaker:us-east-1:123456789012:feature-group/your-feature-group-name"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": false,
    "recipientAccountId": "123456789012",
    "eventCategory": "Data",
    "tlsDetails": {
        ...
    }
}
```

# Quotas, naming rules, and data types
<a name="feature-store-quotas"></a>

## Quota terminologies
<a name="feature-store-terminologies"></a>
+  Read Request Unit (RRU): A measure of read throughput. The number of RRUs consumed by a read request equals the ceiling of the read record's size divided into 4 KB chunks. The minimum RRU per request is 0. 
+  Write Request Unit (WRU): A measure of write throughput. The number of WRUs consumed by a write request equals the ceiling of the written record's size divided into 1 KB chunks. The minimum WRU per request is 1 (including delete operations). 
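The request-unit arithmetic above can be sketched as a pair of helper functions. The 4 KB and 1 KB chunk sizes come from the definitions above; the helpers themselves are illustrative and not part of any SDK:

```python
import math

def read_request_units(record_size_bytes):
    """RRUs consumed reading a record: one per started 4 KB chunk."""
    return math.ceil(record_size_bytes / (4 * 1024))

def write_request_units(record_size_bytes):
    """WRUs consumed writing a record: one per started 1 KB chunk.

    A delete carries no record payload but still consumes at least 1 WRU.
    """
    return max(1, math.ceil(record_size_bytes / 1024))
```

For example, reading a 5 KB record consumes 2 RRUs, while writing it consumes 5 WRUs.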

## Limits and quotas
<a name="feature-store-limits-quotas"></a>
**Note**  
Soft limits can be increased based on your needs.
+  **Maximum number of feature groups per AWS account:** Soft limit of 100.
+  **Maximum number of feature definitions per feature group:** 2500.
+  **Maximum number of RRU per record identifier:** 2400 RRU per second.
+  **Maximum number of WRU per record identifier:** 500 WRU per second.
+  **Maximum Read Capacity Units (RCU) that can be provisioned on a single feature group:** 40000 RCU.
+  **Maximum Write Capacity Units (WCU) that can be provisioned on a single feature group:** 40000 WCU.
+  **Maximum Read Capacity Units that can be provisioned across all feature groups in a Region:** 80000 RCU.
+  **Maximum Write Capacity Units that can be provisioned across all feature groups in a Region:** 80000 WCU.
+  **Maximum Transactions per second (TPS) per API per AWS account:** Soft limit of 10000 TPS per API excluding the `BatchGetRecord` API call, which has a soft limit of 500 TPS.
+  **Maximum size of a record:** 350KB.
+  **Maximum size of a record identifier:** 2KB. 
+  **Maximum size of a feature value:** 350KB.
+ **Maximum number of concurrent feature group creation workflows:** 4.
+ **BatchGetRecord API:** Can contain as many as 100 records and can query up to 100 feature groups. 

For information about service quotas and how to request a quota increase, see [AWS service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html).
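Because `BatchGetRecord` accepts at most 100 records per call, a client that reads more identifiers than that needs to split its requests. The following sketch assumes a single feature group and uses the `sagemaker-featurestore-runtime` client; the helper names are illustrative:

```python
def chunk(seq, size=100):
    """Split record identifiers into BatchGetRecord-sized chunks."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def batch_get_all(feature_group_name, record_ids):
    """Fetch every identifier in record_ids, 100 at a time."""
    import boto3  # imported lazily; running this requires AWS credentials

    runtime = boto3.client("sagemaker-featurestore-runtime")
    records = []
    for ids in chunk(record_ids):
        resp = runtime.batch_get_record(
            Identifiers=[{
                "FeatureGroupName": feature_group_name,
                "RecordIdentifiersValueAsString": ids,
            }]
        )
        records.extend(resp["Records"])
    return records
```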

## Naming rules
<a name="feature-store-naming-rules"></a>
+  **Reserved Words:** The following are reserved words and cannot be used as feature names in feature definitions: `is_deleted`, `write_time`, and `api_invocation_time`. 

## Data types
<a name="feature-store-data-types"></a>
+  **String Feature Type:** Strings are Unicode with UTF-8 binary encoding. The minimum length of a string can be zero; the maximum length is constrained by the maximum size of a record. 
+  **Fractional Feature Type:** Fractional feature values must conform to a double-precision floating point number as defined by the [IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754). 
+  **Integral Feature Type:** Feature Store supports integral values in the range of a 64-bit signed integer: a minimum value of -2^63 and a maximum value of 2^63 - 1. 
+  **Event Time Features:** All feature groups have an event time feature with nanosecond precision. Any event time with lower than nanosecond precision will lead to backwards incompatibility. The feature can have a feature type of either String or Fractional. 
  + A string event time is accepted in ISO-8601 format, in UTC time, conforming to the pattern(s): [yyyy-MM-dd'T'HH:mm:ssZ, yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ].
  + A fractional event time value is accepted as seconds from unix epoch. Event times must be in the range of [0000-01-01T00:00:00.000000000Z, 9999-12-31T23:59:59.999999999Z]. For feature groups in the `Iceberg` table format, you can only use String type for the event time.
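As an illustration of the two accepted event time representations, the following helpers format a timezone-aware `datetime` as an ISO-8601 UTC string and as fractional seconds from the Unix epoch (the helper names are illustrative):

```python
from datetime import datetime, timezone

def event_time_string(dt):
    """ISO-8601 UTC string accepted by a String-typed event time feature."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

def event_time_fractional(dt):
    """Seconds from the Unix epoch, for a Fractional-typed event time feature."""
    return dt.timestamp()
```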

# Amazon SageMaker Feature Store offline store data format
<a name="feature-store-offline"></a>

Amazon SageMaker Feature Store supports the AWS Glue and Apache Iceberg table formats for the offline store. You can choose the table format when you’re creating a new feature group. AWS Glue is the default format.

Amazon SageMaker Feature Store offline store data is stored in an Amazon S3 bucket within your account. When you call `PutRecord`, your data is buffered, batched, and written into Amazon S3 within 15 minutes. Feature Store supports only the Parquet file format when writing your data to your offline store, so data written to your offline store is retrieved from your Amazon S3 bucket in Parquet format. Each file can contain multiple `Record`s.

For the Iceberg format, Feature Store saves the table’s metadata in the same Amazon S3 bucket that you’re using to store the offline store data. You can find it under the `metadata` prefix.

Feature Store also exposes the [OfflineStoreConfig.S3StorageConfig.ResolvedOutputS3Uri](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_S3StorageConfig.html#sagemaker-Type-S3StorageConfig-ResolvedOutputS3Uri) field, which is returned by the [DescribeFeatureGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeFeatureGroup.html) API call. This is the S3 path under which the files for the specific feature group are written.
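A minimal sketch of reading that field from a `DescribeFeatureGroup` response (the helper names are illustrative):

```python
def resolved_output_uri(describe_response):
    """Extract the S3 path for a feature group's offline store files
    from a DescribeFeatureGroup response dictionary."""
    return describe_response["OfflineStoreConfig"]["S3StorageConfig"]["ResolvedOutputS3Uri"]

def offline_store_uri(feature_group_name):
    """Describe the feature group and return its resolved offline store path."""
    import boto3  # imported lazily; running this requires AWS credentials

    sm = boto3.client("sagemaker")
    return resolved_output_uri(
        sm.describe_feature_group(FeatureGroupName=feature_group_name)
    )
```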

The following additional fields are added by Feature Store to each record when it is persisted in the offline store: 
+  **api\_invocation\_time** – The timestamp when the service receives the `PutRecord` or `DeleteRecord` call. If you use managed ingestion (for example, Data Wrangler), this is the timestamp when the data was written into the offline store.
+  **write\_time** – The timestamp when the data was written into the offline store. It can be used for constructing time-travel related queries.
+  **is\_deleted** – `False` by default. If `DeleteRecord` is called, a new `Record` with the same `RecordIdentifierValue` is inserted into the offline store with `is_deleted` set to `True`.
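As an illustration of how these fields support snapshot-style queries, the following sketch reduces offline store rows (represented here as plain dictionaries keyed by a hypothetical `record_id` identifier) to the latest non-deleted record per identifier, using the `write_time` and `is_deleted` columns described above:

```python
def latest_snapshot(rows, id_key="record_id"):
    """Keep only the most recently written row per record identifier,
    then drop rows whose latest version is a deletion marker."""
    latest = {}
    for row in rows:
        rid = row[id_key]
        if rid not in latest or row["write_time"] > latest[rid]["write_time"]:
            latest[rid] = row
    return [r for r in latest.values() if not r["is_deleted"]]
```

In practice you would express the same reduction in your query engine of choice over the Parquet files; this in-memory version just shows the shape of the logic.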

## Amazon SageMaker Feature Store offline store URI structures
<a name="feature-store-offline-URI-structure"></a>

In the following examples, `amzn-s3-demo-bucket` is the Amazon S3 bucket within your account, `example-prefix` is your example prefix, `111122223333` is your account ID, `AWS Region` is your Region, and `feature-group-name` is the name of your feature group. 

**AWS Glue table format**

Records in the offline store stored using the AWS Glue table format are partitioned by event time into hourly partitions. You can’t configure the partitioning scheme. The following URI structure shows the organization of a Parquet file using the AWS Glue format:

```
s3://amzn-s3-demo-bucket/example-prefix/111122223333/sagemaker/AWS Region/offline-store/feature-group-name-feature-group-creation-time/data/year=year/month=month/day=day/hour=hour/timestamp_of_latest_event_time_in_file_16-random-alphanumeric-digits.parquet
```

The following example is the output location of a Parquet file for a file with `feature-group-name` as `customer-purchase-history-patterns`:

```
s3://amzn-s3-demo-bucket/example-prefix/111122223333/sagemaker/AWS Region/offline-store/customer-purchase-history-patterns-1593511200/data/year=2020/month=06/day=30/hour=00/20200630T064401Z_108934320012Az11.parquet
```
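The hourly partition component of the path above can be derived directly from an event timestamp; a minimal illustrative sketch:

```python
from datetime import datetime, timezone

def glue_partition_prefix(event_time):
    """Hourly partition path component used by the AWS Glue offline store layout."""
    return event_time.strftime("year=%Y/month=%m/day=%d/hour=%H")
```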

**Iceberg table format**

Records in the offline store stored in the Iceberg table format are partitioned by event time into daily partitions. You can’t configure the partitioning scheme. The following URI structure shows the organization of the data files saved in the Iceberg table format:

```
s3://amzn-s3-demo-bucket/example-prefix/111122223333/sagemaker/AWS Region/offline-store/feature-group-name-feature-group-creation-time/data/8-random-alphanumeric-digits/event-time-feature-name_trunc=event-time-year-event-time-month-event-time-day/timestamp-of-latest-event-time-in-file_16-random-alphanumeric-digits.parquet
```

The following example is the output location of a Parquet file for a file with `feature-group-name` as `customer-purchase-history-patterns`, and the `event-time-feature-name` is `EventTime`:

```
s3://amzn-s3-demo-bucket/example-prefix/111122223333/sagemaker/AWS Region/offline-store/customer-purchase-history-patterns-1593511200/data/0aec19ca/EventTime_trunc=2022-11-09/20221109T215231Z_yolTtpyuWbkaeGIl.parquet
```

The following example is the location of a metadata file for data files saved in the Iceberg table format.

```
s3://amzn-s3-demo-bucket/example-prefix/111122223333/sagemaker/AWS Region/offline-store/feature-group-name-feature-group-creation-time/metadata/
```

# Amazon SageMaker Feature Store resources
<a name="feature-store-resources"></a>

The following lists the available resources for Amazon SageMaker Feature Store users. For the Feature Store main page, see [Amazon SageMaker Feature Store](https://aws.amazon.com/sagemaker/feature-store/).

## Feature Store example notebooks and workshops
<a name="feature-store-sample-notebooks"></a>

To get started using Amazon SageMaker Feature Store, you can choose from a variety of example Jupyter notebooks in the following table. If this is your first time using Feature Store, try the Introduction to Feature Store notebook. To run any of these notebooks, you must attach the `AmazonSageMakerFeatureStoreAccess` policy to your IAM execution role. 

See [IAM Roles](https://console.aws.amazon.com/iam/home#/roles) to access your role and attach this policy. For a walkthrough on how to view the policies attached to a role and how to add a policy to your role, see [Adding policies to your IAM role](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-adding-policies.html). 

The following table lists a variety of resources to help you get started with Feature Store. It contains examples, instructions, and example notebooks that guide you through using Feature Store, from first-time use to specific use cases. The code in these resources uses the SageMaker AI SDK for Python (Boto3).


| **Page** | **Description** | 
| --- | --- | 
|  [Get started with Amazon SageMaker Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/) in Read the Docs.  |  A list of example notebooks to introduce you to Feature Store and its features to help you get started.   | 
|  [Amazon SageMaker Feature Store guide](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_featurestore.html) in Read the Docs.  |  A Feature Store guide on how to set up, create a feature group, load data into a feature group, and how to use Feature Store in general.  | 
|  [Amazon SageMaker Feature Store end-to-end workshop](https://github.com/aws-samples/amazon-sagemaker-feature-store-end-to-end-workshop) in the `aws-samples` Github repository  |  An end-to-end Feature Store workshop.  | 
|   [Feature Store example notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-featurestore) in the SageMaker AI example notebooks repository.  |  Specific use case example notebooks for Feature Store.  | 

## Feature Store Python SDK and API
<a name="feature-store-api-sdks"></a>

The Feature Store SDK for Python (Boto3) and API references, which you use to build applications with Feature Store, are listed in the following table.


| **Page** | **Description** | 
| --- | --- | 
|  [Feature Store APIs](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html) in the Amazon SageMaker Python SDK Read the Docs  |  The Feature Store APIs in Read the Docs.  | 
|  [Feature Store Python SDK](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/feature_store) in the Amazon SageMaker Python SDK Github repository  |  The Feature Store Python SDK Github repository.  | 
|   [Feature Store Runtime operations and data types](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-featurestore-runtime.html) in the SDK for Python (Boto3) documentation  |  Feature Store Runtime client that contains all data plane API operations and data types for Feature Store.  | 
|   [Amazon SageMaker Feature Store Runtime](https://docs.aws.amazon.com/sagemaker/latest/APIReference/Welcome.html#Welcome_Amazon_SageMaker_Feature_Store_Runtime) in the Amazon SageMaker API Reference  |  Feature group-level actions supported by Feature Store. If the API operation or data type you are looking for is not listed here, use the search in the guide.  | 
|  [Amazon SageMaker Feature Store Runtime](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Operations_Amazon_SageMaker_Feature_Store_Runtime.html) in the Amazon SageMaker API Reference  |  Record-level actions supported by Feature Store. If the API operation or data type you are looking for is not listed here, use the search in the guide.  | 