How Amazon SageMaker Lakehouse works

Amazon SageMaker Lakehouse is accessible from Amazon SageMaker Unified Studio. It organizes data from various sources into logical containers called catalogs. Each catalog represents data from an existing source, such as an Amazon Redshift data warehouse, an Amazon S3 data lake, a database, or an enterprise application. You can also create new catalogs in the lakehouse to store data in Amazon S3 or Redshift Managed Storage (RMS).

You can access the data as Apache Iceberg tables and query it using any Iceberg-compatible engine, such as Apache Spark, Amazon Athena, or Amazon EMR. Additionally, these catalogs are mounted as databases in Amazon Redshift, allowing you to connect and analyze your lakehouse data using SQL tools.
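As an illustration of connecting an Iceberg-compatible engine, the following sketch builds the Spark catalog properties for an Iceberg REST catalog backed by the AWS Glue Data Catalog. It is a minimal example under stated assumptions, not a definitive configuration: the function name `iceberg_rest_catalog_conf`, the catalog name `lakehouse`, the Region, and the account ID are hypothetical, and the endpoint format and property names should be verified against the current AWS Glue and Apache Iceberg documentation before use.

```python
from typing import Dict


def iceberg_rest_catalog_conf(catalog_name: str, region: str, warehouse: str) -> Dict[str, str]:
    """Build Spark properties that register an Iceberg REST catalog.

    Assumptions (not taken from this guide): the REST endpoint follows the
    pattern https://glue.<region>.amazonaws.com/iceberg, and `warehouse`
    identifies the target catalog (for example, an AWS account ID).
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Use Iceberg's Spark catalog implementation in REST mode.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        f"{prefix}.warehouse": warehouse,
        # Sign REST requests with AWS Signature Version 4 for the Glue service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


# Hypothetical values for illustration only.
conf = iceberg_rest_catalog_conf("lakehouse", "us-east-1", "123456789012")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

You can pass each key-value pair to `SparkSession.builder.config(key, value)` when you create the Spark session. The session also needs the Iceberg Spark runtime and AWS integration libraries on its classpath, and the caller needs the appropriate AWS Lake Formation permissions.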

Amazon SageMaker Lakehouse is built on the AWS Glue Data Catalog and AWS Lake Formation in your AWS account. With Amazon SageMaker Lakehouse, you can access and query your existing data in Amazon Redshift data warehouses and store new data in RMS from any Apache Iceberg-compatible engine.
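Because the lakehouse is built on the AWS Glue Data Catalog, you can inspect its metadata through the AWS Glue API. The following sketch lists the table names in one database by using the `get_tables` paginator. The helper name `list_table_names` and the database name in the usage note are hypothetical, and the value to pass as `catalog_id` for a lakehouse catalog is an assumption that you should confirm in the AWS Glue documentation.

```python
from typing import Any, List, Optional


def list_table_names(glue_client: Any, database: str, catalog_id: Optional[str] = None) -> List[str]:
    """Return the names of all tables in a Data Catalog database.

    `glue_client` is expected to behave like boto3.client("glue").
    `catalog_id` is optional; when omitted, the Glue API uses the
    caller's default catalog.
    """
    kwargs = {"DatabaseName": database}
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id

    names: List[str] = []
    # The paginator follows NextToken so that large databases are fully listed.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(**kwargs):
        names.extend(table["Name"] for table in page.get("TableList", []))
    return names
```

For example, `list_table_names(boto3.client("glue"), "sales")` returns the tables in a hypothetical `sales` database that the caller is permitted to see. AWS Lake Formation permissions determine which tables appear in the result.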

The following diagram shows how Amazon SageMaker Lakehouse works.

