Amazon SageMaker Lakehouse key components - Amazon SageMaker Unified Studio

Amazon SageMaker Unified Studio is in preview release and is subject to change.

Amazon SageMaker Lakehouse key components

Amazon SageMaker Lakehouse has the following key components.

Catalog

A catalog is a logical container that organizes objects from a data store, such as schemas, tables, views, or materialized views (for example, from Amazon Redshift). You can create nested catalogs to mirror the hierarchical structure of your data sources within SageMaker Lakehouse.

There are two types of catalogs in Lakehouse: federated catalogs and managed catalogs. A federated catalog mounts an existing data source that you add to Lakehouse, which makes existing data in sources such as Amazon Redshift, Amazon DynamoDB, and Snowflake available in Lakehouse. A managed catalog is a new catalog that you create using Lakehouse. A managed catalog stores its data in Redshift Managed Storage (RMS) or Amazon S3.
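
As a rough illustration of the distinction above, the following Python sketch models the two catalog types as plain data. It is not an AWS API: `CatalogType`, `ManagedStorage`, and `CatalogSpec` are hypothetical names chosen for this example, and the validation rule (a federated catalog has a source, a managed catalog has a storage type) is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CatalogType(Enum):
    FEDERATED = "federated"  # mounts an existing data source
    MANAGED = "managed"      # new catalog created using Lakehouse


class ManagedStorage(Enum):
    RMS = "Redshift Managed Storage"
    S3 = "Amazon S3"


@dataclass(frozen=True)
class CatalogSpec:
    """Hypothetical description of a catalog; not an AWS API type."""
    name: str
    catalog_type: CatalogType
    source: Optional[str] = None               # federated only, e.g. "Amazon DynamoDB"
    storage: Optional[ManagedStorage] = None   # managed only

    def __post_init__(self) -> None:
        # Assumed rule: exactly one of source/storage is set, depending on type.
        is_federated = self.catalog_type is CatalogType.FEDERATED
        if is_federated and (self.source is None or self.storage is not None):
            raise ValueError("federated catalog: set source, leave storage unset")
        if not is_federated and (self.storage is None or self.source is not None):
            raise ValueError("managed catalog: set storage, leave source unset")


# Example values are illustrative only.
federated = CatalogSpec("sales", CatalogType.FEDERATED, source="Amazon Redshift")
managed = CatalogSpec("analytics", CatalogType.MANAGED, storage=ManagedStorage.S3)
```

Keeping the type and its backing store in one record makes the pairing explicit: a federated catalog points at something that already exists, while a managed catalog names where Lakehouse keeps the data.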

Figure: Catalog types in Amazon SageMaker Lakehouse

Database

A database organizes metadata tables within a catalog in Amazon SageMaker Lakehouse.

Table/View

Tables and views are database objects that define how to access and represent the underlying data. They specify details such as schema, partitions, storage location, storage format, and the SQL query required to access the data.
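
The containment relationship between catalogs, databases, and tables or views can be sketched as a small in-memory model. This is only a conceptual aid under stated assumptions: the class names, the field names, the `walk` helper, and the `catalog/nested.database.table` path notation are all hypothetical and do not reflect how Lakehouse stores or names metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Table:
    """A table or view: metadata describing how to reach the underlying data."""
    name: str
    columns: Dict[str, str]                       # column name -> type (the schema)
    partition_keys: List[str] = field(default_factory=list)
    storage_location: Optional[str] = None
    storage_format: Optional[str] = None
    view_sql: Optional[str] = None                # set only for views


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)
    children: Dict[str, "Catalog"] = field(default_factory=dict)  # nested catalogs

    def walk(self, prefix: str = "") -> Iterator[str]:
        """Yield an illustrative path for every table or view under this catalog."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        for db in self.databases.values():
            for table in db.tables.values():
                yield f"{path}.{db.name}.{table.name}"
        for child in self.children.values():
            yield from child.walk(path)


# Placeholder names and locations, for illustration only.
orders = Table(
    "orders",
    {"order_id": "bigint", "total": "decimal(10,2)"},
    partition_keys=["order_date"],
    storage_location="s3://example-bucket/orders/",
    storage_format="parquet",
)
large_orders = Table(
    "large_orders",
    {"order_id": "bigint"},
    view_sql="SELECT order_id FROM orders WHERE total > 100",
)
dev = Catalog("dev", databases={"public": Database("public", {"orders": orders, "large_orders": large_orders})})
root = Catalog("warehouse", children={"dev": dev})

paths = list(root.walk())
```

Here `paths` lists one entry per table or view, with the nested catalog appearing under its parent, which mirrors the catalog, database, table/view layering described above.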

The following diagram shows how catalogs, databases, and tables/views work in Lakehouse.

Figure: How catalogs, databases, and tables/views work in Amazon SageMaker Lakehouse

Storage

You can read and write data in Amazon S3 or Redshift Managed Storage (RMS), depending on the storage type you choose for storing data in the lakehouse.