Considerations for Amazon EMR with Lake Formation

Consider the following when using Amazon EMR with AWS Lake Formation.

  • Table-level access control is available on clusters with Amazon EMR releases 6.13 and higher.

  • Fine-grained access control at row, column, and cell level is available on clusters with Amazon EMR releases 6.15 and higher.

  • Users with access to a table can access all the properties of that table. If you have Lake Formation based access control on a table, review the table to make sure that the properties don't contain any sensitive data or information.

  • Amazon EMR clusters with Lake Formation don't support Spark's fallback to HDFS when Spark collects table statistics. On clusters without Lake Formation, this fallback ordinarily helps optimize query performance.

  • Operations that support access controls based on Lake Formation with non-governed Apache Spark tables include INSERT INTO and INSERT OVERWRITE.

  • Operations that support access controls based on Lake Formation with Apache Spark and Apache Hive include SELECT, DESCRIBE, SHOW DATABASE, SHOW TABLE, SHOW COLUMN, and SHOW PARTITION.

  • Amazon EMR doesn't support Lake Formation based access control for the following operations:

    • Writes to governed tables.

    • CREATE TABLE. ALTER TABLE is supported on Amazon EMR releases 6.10.0 and higher.

    • DML statements other than INSERT commands.

  • The same query can perform differently with and without Lake Formation based access control.

  • You can use Amazon EMR with Lake Formation only for Spark jobs.

  • Trusted identity propagation is not supported with a multi-catalog hierarchy in the AWS Glue Data Catalog. For more information, see Working with a multi-catalog hierarchy in AWS Glue Data Catalog.
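
As a rough illustration of the release thresholds and operation lists above, the following Python sketch checks whether a given release meets the stated minimum for an access-control level, and whether a SQL statement starts with one of the listed operations. The names MIN_RELEASE, LISTED_PREFIXES, parse_release, supports_level, and is_listed_operation are hypothetical helpers, not part of any AWS SDK, and the release-label format (for example, emr-6.15.0) is an assumption. The prefix matching is a simplification rather than a SQL parser, so treat the result as a quick pre-check only.

```python
import re
from typing import Tuple

# Minimum Amazon EMR release per access-control level, taken from the list above.
MIN_RELEASE = {
    "table": (6, 13),
    "fine_grained": (6, 15),  # row, column, and cell level
}

# Statement prefixes listed above as supporting Lake Formation based access control.
# INSERT applies to non-governed Apache Spark tables only.
LISTED_PREFIXES = (
    "SELECT", "DESCRIBE",
    "SHOW DATABASE", "SHOW TABLE", "SHOW COLUMN", "SHOW PARTITION",
    "INSERT INTO", "INSERT OVERWRITE",
)


def parse_release(label: str) -> Tuple[int, int]:
    """Extract (major, minor) from a label such as 'emr-6.15.0' (assumed format)."""
    match = re.search(r"(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def supports_level(label: str, level: str) -> bool:
    """Compare numerically, so that 6.9 sorts below 6.13."""
    return parse_release(label) >= MIN_RELEASE[level]


def is_listed_operation(sql: str) -> bool:
    """Rough check: does the statement start with one of the listed operations?"""
    normalized = " ".join(sql.split()).upper()
    return normalized.startswith(LISTED_PREFIXES)
```

Comparing releases as integer tuples instead of strings avoids ordering 6.9 above 6.13. Statements that begin with other keywords, such as a common table expression, are not recognized by this simple prefix check.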