Using AWS Lake Formation with AWS Glue - AWS Lake Formation

Data engineers and DevOps professionals use AWS Glue extract, transform, and load (ETL) jobs with Apache Spark to transform their data sets in Amazon S3 and load the transformed data into data lakes and data warehouses for analytics, machine learning, and application development. When different teams access the same data set in Amazon S3, you must grant and restrict permissions based on each team's role.

AWS Lake Formation is built on AWS Glue, and the services interact in the following ways:

  • Lake Formation and AWS Glue share the same Data Catalog.

  • The following Lake Formation console features invoke the AWS Glue console:

  • The workflows generated when you use a Lake Formation blueprint are AWS Glue workflows. You can view and manage these workflows in both the Lake Formation console and the AWS Glue console.

  • Machine learning transforms are provided with Lake Formation and are built on AWS Glue API operations. You create and manage machine learning transforms on the AWS Glue console. For more information, see Machine Learning Transforms in the AWS Glue Developer Guide.
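Because the workflows that a blueprint generates are ordinary AWS Glue workflows, they can also be listed through the AWS Glue API. The following is a minimal sketch, not a definitive implementation. The function name `list_workflow_names` is hypothetical, and the sketch assumes it is given a client object that exposes the AWS Glue `ListWorkflows` operation (for example, the boto3 `glue` client).

```python
def list_workflow_names(glue_client):
    """Return the names of all AWS Glue workflows visible to the caller.

    glue_client is assumed to expose list_workflows(), as the boto3
    "glue" client does. Results are paginated with NextToken.
    """
    names = []
    kwargs = {}
    while True:
        page = glue_client.list_workflows(**kwargs)
        names.extend(page.get("Workflows", []))
        token = page.get("NextToken")
        if not token:
            return names
        # Request the next page on the following iteration.
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, the function could be called as `list_workflow_names(boto3.client("glue"))`. Workflows created by a Lake Formation blueprint would appear in the result alongside any workflows created directly in AWS Glue.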

You can use Lake Formation fine-grained access control to manage your existing Data Catalog resources and Amazon S3 data locations.
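As an illustration, a table-level grant on a Data Catalog table can be expressed as a request to the Lake Formation `GrantPermissions` operation. The sketch below only builds the request. The helper name `build_table_grant` and the role, database, and table names in the usage note are hypothetical placeholders, not values from this guide.

```python
def build_table_grant(principal_arn, database, table, permissions=("SELECT",)):
    """Build keyword arguments for a table-level GrantPermissions request.

    The resource is a whole table ("Table"), not a column subset
    ("TableWithColumns").
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }
```

Assuming boto3 and suitable credentials, the request could be sent with `boto3.client("lakeformation").grant_permissions(**build_table_grant("arn:aws:iam::111122223333:role/example-glue-role", "example_db", "example_table"))`.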

Note

AWS Glue ETL requires full access to the entire table when it fetches data from the underlying Amazon S3 location. An AWS Glue ETL job fails if you apply column-level permissions on a table.
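One way to spot grants that could cause such a failure is to inspect the entries returned by the Lake Formation `ListPermissions` operation (the `PrincipalResourcePermissions` list) and flag those that restrict columns. The following is a sketch under stated assumptions: the helper names are hypothetical, and it treats a `TableWithColumns` resource as column-restricted only when it names specific columns or excludes columns from a wildcard.

```python
def is_column_restricted(resource):
    """Return True if a Lake Formation resource limits access to some columns."""
    table_with_columns = resource.get("TableWithColumns")
    if table_with_columns is None:
        return False  # table-level or other resource type
    if table_with_columns.get("ColumnNames"):
        return True  # explicit column list
    wildcard = table_with_columns.get("ColumnWildcard", {})
    # A wildcard with no exclusions still covers every column.
    return bool(wildcard.get("ExcludedColumnNames"))


def column_level_grants(entries):
    """Filter permission entries down to those with column restrictions."""
    return [e for e in entries if is_column_restricted(e.get("Resource", {}))]
```

The `entries` argument is assumed to have the shape of the `PrincipalResourcePermissions` list in a `ListPermissions` response. Any entry returned for the role that runs the ETL job is a candidate for replacement with a table-level grant.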

Support for transactional table types

Applying Lake Formation permissions allows you to secure your transactional data in your Amazon S3-based data lakes. The following table lists the transactional table formats supported in AWS Glue and the Lake Formation permissions available for each. Lake Formation enforces these permissions for AWS Glue operations.

Supported table formats

Table format | Description and allowed operations | Lake Formation permissions supported in AWS Glue

Apache Hudi | An open table format used to simplify incremental data processing and data pipeline development. For examples, see Using the Hudi framework in AWS Glue. | Table-level permissions are available for Hudi tables. For more information, see Limitations.

Apache Iceberg | An open table format that manages large collections of files as tables. For examples, see Using the Iceberg framework in AWS Glue. | Table-level permissions are available for Iceberg tables. For more information, see Limitations.

Linux Foundation Delta Lake | Delta Lake is an open-source project that helps implement modern data lake architectures commonly built on Amazon S3 or Hadoop Distributed File System (HDFS). For examples, see Using the Delta Lake framework in AWS Glue. | Table-level permissions are available for Delta Lake tables. For more information, see Limitations.
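For context, an AWS Glue job typically declares which of these table formats it uses through the `--datalake-formats` job parameter, which accepts the values `hudi`, `iceberg`, and `delta`. The sketch below builds that parameter for a job's default arguments. The helper name `datalake_format_arguments` is hypothetical, and the sketch does not cover any format-specific Spark configuration a job might also need.

```python
SUPPORTED_FORMATS = ("hudi", "iceberg", "delta")


def datalake_format_arguments(formats):
    """Build the AWS Glue job argument that enables the given table formats."""
    normalized = [f.strip().lower() for f in formats]
    unknown = [f for f in normalized if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported table format(s): {unknown}")
    # Multiple formats are passed as one comma-separated value.
    return {"--datalake-formats": ",".join(normalized)}
```

The returned dictionary could be merged into the `DefaultArguments` of a job definition. Lake Formation permissions on the tables themselves are granted separately.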

Additional resources