Using AWS Lake Formation with Amazon Redshift Spectrum
Amazon Redshift Spectrum lets you query and retrieve data in Amazon S3 data lakes without loading the data into Amazon Redshift cluster nodes.
Redshift Spectrum supports two ways to access an external AWS Glue Data Catalog that is enabled for Lake Formation:
Using a cluster-attached IAM role that has permissions on the Data Catalog
To create an IAM role, follow the steps in the following procedure.
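As a sketch of the IAM permissions such a cluster-attached role typically needs, the following Python builds an identity-based policy document that allows AWS Glue Data Catalog access plus `lakeformation:GetDataAccess`, the permission that lets Lake Formation vend temporary credentials for the underlying S3 data. The broad `glue:*` action, the `Resource: "*"` scope, and the statement ID are assumptions chosen for brevity, not a recommendation; narrow them for your environment. The role still needs Lake Formation grants on the databases and tables it queries, which an IAM policy alone does not provide.

```python
import json


def lake_formation_spectrum_policy() -> dict:
    """Return an IAM identity-based policy document as a Python dict.

    Assumption: a deliberately broad starting policy. glue:* covers Data Catalog
    metadata calls; lakeformation:GetDataAccess lets Lake Formation issue
    temporary credentials for the registered S3 locations.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RedshiftPolicyForLF",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "glue:*",
                    "lakeformation:GetDataAccess",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM console or a policy file.
    print(json.dumps(lake_formation_spectrum_policy(), indent=2))
```

Attach the resulting policy to the role, and associate the role with the cluster, before creating the external schema.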
Using a federated IAM identity configured to manage access to external AWS Glue Data Catalog resources
Redshift Spectrum supports querying Lake Formation tables using federated IAM identities. The federated identity can be an IAM user or an IAM role. For more information about IAM identity federation in Redshift Spectrum, see Using a federated identity to manage Amazon Redshift access to local resources and Redshift Spectrum external tables.
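For the federated option, a minimal sketch of the external schema definition follows. It assumes the `IAM_ROLE 'SESSION'` form of `CREATE EXTERNAL SCHEMA`, which tells Amazon Redshift to use the connected session's federated identity instead of a cluster-attached role. The Python only assembles the SQL text; you would run that text from a SQL client connected with the federated identity. The schema, database, and Region names in the usage line are placeholders.

```python
def federated_external_schema_sql(local_schema: str, glue_database: str, region: str) -> str:
    """Build CREATE EXTERNAL SCHEMA DDL that uses the caller's federated identity.

    IAM_ROLE 'SESSION' makes Amazon Redshift authorize Data Catalog and
    Lake Formation access with the session's federated IAM identity.
    """
    # Identifiers are not quoted or escaped here; pass only trusted names.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"REGION '{region}'\n"
        "IAM_ROLE 'SESSION';"
    )


if __name__ == "__main__":
    # spectrum_lf, lf_sales_db, and us-east-1 are placeholder values.
    print(federated_external_schema_sql("spectrum_lf", "lf_sales_db", "us-east-1"))
```

Because the schema carries no role ARN, each federated user sees only what Lake Formation grants to that user's identity.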
With the Lake Formation integration for Redshift Spectrum, after your data is registered with Lake Formation, you can define row-, column-, and cell-level access control permissions on its tables.
For more information, see Using Redshift Spectrum with AWS Lake Formation.
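Row-, column-, and cell-level permissions in Lake Formation are commonly expressed as data cells filters. The hypothetical helper below builds the `TableData` structure accepted by the Lake Formation `CreateDataCellsFilter` API (exposed in boto3 as `create_data_cells_filter`). It does not call AWS, so it runs without credentials; the account ID, database, table, filter name, expression, and columns in the usage lines are placeholders.

```python
def data_cells_filter_table_data(
    catalog_id: str,
    database: str,
    table: str,
    filter_name: str,
    row_expression: str,
    columns: list[str],
) -> dict:
    """Build the TableData argument for Lake Formation's CreateDataCellsFilter.

    Combining a row filter expression with an explicit column list restricts
    access at the cell level: grantees see only the listed columns of the
    rows that match the expression.
    """
    return {
        "TableCatalogId": catalog_id,  # AWS account ID that owns the Data Catalog
        "DatabaseName": database,
        "TableName": table,
        "Name": filter_name,
        "RowFilter": {"FilterExpression": row_expression},
        "ColumnNames": list(columns),
    }


if __name__ == "__main__":
    table_data = data_cells_filter_table_data(
        "111122223333", "lf_sales_db", "orders", "us_orders_only",
        "country = 'US'", ["order_id", "amount"],
    )
    print(table_data)
    # With credentials configured, the payload could be submitted with:
    #   boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```

After the filter exists, you grant SELECT on it to a principal in Lake Formation, and Redshift Spectrum queries by that principal return only the permitted cells.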
Redshift Spectrum supports read (SELECT) queries on tables in Lake Formation-managed external schemas.
For more information, see Creating external schemas for Redshift Spectrum.
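For the cluster-attached role option, a minimal sketch of the schema definition and a read-only query against it follows. It assumes the `IAM_ROLE '<role ARN>'` form of `CREATE EXTERNAL SCHEMA` with the Region left at the cluster default; the ARN check is a light, hypothetical sanity test rather than full validation, and the account ID, role, schema, database, and table names are placeholders. As with the other sketches, the Python only produces SQL text to run in a SQL client.

```python
def role_external_schema_sql(local_schema: str, glue_database: str, role_arn: str) -> str:
    """Build CREATE EXTERNAL SCHEMA DDL that uses a cluster-attached IAM role."""
    # Light sanity check only: a role ARN looks like arn:<partition>:iam::<account>:role/<name>.
    if not (role_arn.startswith("arn:") and ":role/" in role_arn):
        raise ValueError(f"expected an IAM role ARN, got {role_arn!r}")
    # Identifiers are not quoted or escaped here; pass only trusted names.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{role_arn}';"
    )


def sample_select_sql(local_schema: str, table: str, limit: int = 10) -> str:
    """Build a read-only query against a table in the external schema."""
    return f"SELECT * FROM {local_schema}.{table} LIMIT {int(limit)};"


if __name__ == "__main__":
    # Placeholder account ID, role, schema, database, and table names.
    arn = "arn:aws:iam::123456789012:role/mySpectrumRole"
    print(role_external_schema_sql("spectrum_lf", "lf_sales_db", arn))
    print(sample_select_sql("spectrum_lf", "orders", 5))
```

Any Lake Formation permissions granted to the role, including data cells filters, are applied to the rows and columns the query returns.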
Support for transactional table types
The following table lists the transactional table formats that Redshift Spectrum supports and the Lake Formation permissions that apply to each.
Table format | Description and allowed operations | Lake Formation permissions supported in Redshift Spectrum |
---|---|---|
Apache Hudi | A format used to simplify incremental data processing and data pipeline development. Hudi supports insert, delete, and upsert write operations; Redshift Spectrum supports querying Apache Hudi Copy on Write (CoW) tables. For more information, see Creating external tables for data managed in Apache Hudi. | Use Data filtering and cell-level security in Lake Formation to secure Hudi tables with table-, column-, row-, and cell-level permissions. |
Apache Iceberg | An open table format that manages large collections of files as tables and supports modern analytic data lake operations such as record-level insert, update, and delete, and time travel queries. For more information, see Using Apache Iceberg tables with Amazon Redshift. | Redshift Spectrum supports Apache Iceberg tables for querying. |
Linux Foundation Delta Lake | An open-source project that helps implement modern data lake architectures commonly built on Amazon S3 or Hadoop Distributed File System (HDFS). Redshift Spectrum supports querying Delta Lake tables. For more information, see Creating external tables for data managed in Delta Lake. | Table-, column-, row-, and cell-level permissions are supported. |
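To make the Hudi row concrete, here is a sketch that assembles an external table definition for an unpartitioned Apache Hudi Copy on Write table, using the Parquet SerDe with the `HoodieParquetInputFormat` input format. The helper name, column names, types, and S3 location are hypothetical. In a Lake Formation setup the table is often already defined in the Data Catalog by other tooling, in which case you only query it and this DDL is unnecessary; creating it from Redshift also requires permission to create tables in that database.

```python
def hudi_cow_external_table_sql(
    schema: str, table: str, columns: dict[str, str], s3_location: str
) -> str:
    """Build CREATE EXTERNAL TABLE DDL for an unpartitioned Hudi CoW table."""
    # Render "name type" pairs, one per line, in the order given.
    column_list = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {column_list}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n"
        "STORED AS\n"
        "INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # Placeholder schema, table, columns, and bucket.
    print(hudi_cow_external_table_sql(
        "spectrum_lf", "trips",
        {"trip_id": "bigint", "fare": "double precision"},
        "s3://example-bucket/hudi/trips/",
    ))
```

Pointing `LOCATION` at the Hudi table's base path lets the Hudi input format select the latest committed file versions instead of reading every Parquet file under the prefix.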