Considerations for Amazon EMR with Lake Formation
Consider the following when using Amazon EMR with AWS Lake Formation.
- Table-level access control is available on clusters with Amazon EMR releases 6.13 and higher.
- Fine-grained access control at row, column, and cell level is available on clusters with Amazon EMR releases 6.15 and higher.
- Users with access to a table can access all of that table's properties. If a table uses Lake Formation based access control, review its properties to make sure they don't contain any sensitive data or information.
- Amazon EMR clusters with Lake Formation don't support Spark's fallback to HDFS when Spark collects table statistics. Spark ordinarily uses this fallback to help optimize query performance.
- Operations that support access controls based on Lake Formation with non-governed Apache Spark tables include INSERT INTO and INSERT OVERWRITE.
- Operations that support access controls based on Lake Formation with Apache Spark and Apache Hive include SELECT, DESCRIBE, SHOW DATABASE, SHOW TABLE, SHOW COLUMN, and SHOW PARTITION.
- Amazon EMR doesn't support access control for the following Lake Formation based operations:
  - Writes to governed tables
  - CREATE TABLE (Amazon EMR 6.10.0 and higher supports ALTER TABLE)
  - DML statements other than INSERT commands
- The same query performs differently with Lake Formation based access control than without it.
- You can only use Amazon EMR with Lake Formation for Spark jobs.
- Trusted identity propagation is not supported with the multi-catalog hierarchy in the AWS Glue Data Catalog. For more information, see Working with a multi-catalog hierarchy in AWS Glue Data Catalog.
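The release thresholds in the list above lend themselves to a pre-flight check before a job is submitted. The following Python sketch assumes release labels in the usual emr-X.Y.Z form. The function names are hypothetical, and the thresholds are simply the 6.13 and 6.15 values stated above, hard-coded rather than read from AWS.

```python
import re

# Minimum Amazon EMR releases stated in the considerations above (hard-coded assumption).
TABLE_LEVEL_MIN = (6, 13)
FINE_GRAINED_MIN = (6, 15)


def parse_release_label(label: str) -> tuple[int, int]:
    """Parse a release label such as 'emr-6.15.0' into (major, minor)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.\d+", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def lake_formation_access_levels(label: str) -> list[str]:
    """Return the Lake Formation access control levels available for a release."""
    version = parse_release_label(label)
    levels = []
    # Tuple comparison orders (major, minor) correctly, e.g. (7, 0) >= (6, 15).
    if version >= TABLE_LEVEL_MIN:
        levels.append("table")
    if version >= FINE_GRAINED_MIN:
        levels.extend(["row", "column", "cell"])
    return levels


if __name__ == "__main__":
    for label in ("emr-6.12.0", "emr-6.13.0", "emr-7.0.0"):
        print(label, lake_formation_access_levels(label))
```

Comparing (major, minor) tuples avoids the string-comparison trap where "6.9" would sort after "6.13".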
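To check a Spark SQL workload against the operation lists above, a script can classify each statement by its leading keywords. This is a rough, hypothetical sketch rather than an AWS tool. It matches only the start of each statement, so a query that begins with a WITH clause falls through to "not listed", and it cannot tell whether an INSERT targets a governed table. Spark SQL spells the SHOW commands in the plural (SHOW DATABASES, SHOW TABLES, and so on), so the patterns accept both forms.

```python
import re

# Patterns transcribed from the operation lists in this section (hypothetical helper).
UNSUPPORTED_PATTERNS = [r"CREATE\s+TABLE\b", r"(UPDATE|DELETE|MERGE)\b"]
READ_PATTERNS = [
    r"SELECT\b",
    r"DESCRIBE\b",
    r"SHOW\s+(DATABASES?|TABLES?|COLUMNS?|PARTITIONS?)\b",
]
WRITE_PATTERNS = [r"INSERT\s+INTO\b", r"INSERT\s+OVERWRITE\b"]


def classify_statement(sql: str) -> str:
    """Classify a Spark SQL statement against the operation lists in this section."""
    text = sql.strip().upper()
    # re.match anchors at the start, so only the leading keywords are inspected.
    for pattern in UNSUPPORTED_PATTERNS:
        if re.match(pattern, text):
            return "unsupported"
    for pattern in READ_PATTERNS:
        if re.match(pattern, text):
            return "read"
    for pattern in WRITE_PATTERNS:
        if re.match(pattern, text):
            return "write (non-governed tables only)"
    return "not listed"


if __name__ == "__main__":
    for stmt in ("SELECT * FROM sales.orders", "CREATE TABLE sales.t (id INT)"):
        print(stmt, "->", classify_statement(stmt))
```

Statements reported as "not listed", such as ALTER TABLE, need to be checked against the release notes for the Amazon EMR version in use.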
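Because anyone with access to a table can read all of its properties, reviewing those properties for sensitive data can be partly automated. The sketch below assumes a dictionary shaped like the Table element of an AWS Glue GetTable response and flags property keys that match a hypothetical list of sensitive-looking names. It checks keys only, not values, and the name list will produce both misses and false positives, so it supplements rather than replaces a manual review.

```python
import re

# Hypothetical key patterns; adjust them to your own naming conventions.
SENSITIVE_KEY_PATTERN = re.compile(
    r"password|passwd|secret|token|credential|access[._-]?key", re.IGNORECASE
)


def find_sensitive_properties(table: dict) -> list[str]:
    """Return dotted paths of table properties whose keys look sensitive.

    `table` is assumed to have the shape of the "Table" element returned by
    the AWS Glue GetTable API.
    """
    storage = table.get("StorageDescriptor", {})
    locations = {
        "Parameters": table.get("Parameters", {}),
        "StorageDescriptor.Parameters": storage.get("Parameters", {}),
        "StorageDescriptor.SerdeInfo.Parameters": storage.get("SerdeInfo", {}).get("Parameters", {}),
    }
    hits = []
    for prefix, params in locations.items():
        # Sort keys so the report is stable between runs.
        for key in sorted(params):
            if SENSITIVE_KEY_PATTERN.search(key):
                hits.append(f"{prefix}.{key}")
    return hits
```

With boto3, the input could come from boto3.client("glue").get_table(DatabaseName="sales", Name="orders")["Table"], where the database and table names are placeholders.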