Using Apache Iceberg with Amazon EMR on EKS

The Iceberg runtime JAR contains the Iceberg classes that Spark needs at runtime. The following procedure shows how to start a job run that uses the Iceberg Spark runtime.

To use Apache Iceberg with Amazon EMR on EKS applications

When you start a job run to submit a Spark job, include the Iceberg Spark runtime JAR file in the job driver:


--job-driver '{"sparkSubmitJobDriver" : {"sparkSubmitParameters" : "--jars local:///usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"}}'

In the application configuration, include the additional Iceberg settings:


--configuration-overrides '{
    "applicationConfiguration": [
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.sql.catalog.dev.warehouse": "s3://amzn-s3-demo-bucket/EXAMPLE-PREFIX/",
                "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
                "spark.sql.catalog.dev": "org.apache.iceberg.spark.SparkCatalog",
                "spark.sql.catalog.dev.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                "spark.sql.catalog.dev.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
            }
        }
    ]
}'
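JSON typed by hand inside a shell-quoted string is easy to get subtly wrong, for example through a missing brace or a stray space inside a key. The following is a minimal sketch, not part of the official procedure, that builds the two payloads as Python dictionaries and serializes them with the standard library. The function names are hypothetical; the keys and values are the ones shown in the steps above. It covers only the fragments shown in this section, so a real job run still needs the remaining job driver fields, such as the entry point.

```python
import json
import shlex

# Path of the Iceberg runtime JAR, as given in the procedure above.
ICEBERG_JAR = "local:///usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"
ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def build_job_driver(jar_path: str = ICEBERG_JAR) -> dict:
    """Return the --job-driver fragment that adds the Iceberg runtime JAR."""
    return {"sparkSubmitJobDriver": {"sparkSubmitParameters": f"--jars {jar_path}"}}


def build_configuration_overrides(warehouse: str, catalog: str = "dev") -> dict:
    """Return the --configuration-overrides payload for a Glue-backed Iceberg catalog.

    `catalog` is the Spark catalog name; "dev" matches the example above.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "applicationConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    f"{prefix}.warehouse": warehouse,
                    "spark.sql.extensions": ICEBERG_EXTENSIONS,
                    prefix: "org.apache.iceberg.spark.SparkCatalog",
                    f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                    f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
                },
            }
        ]
    }


if __name__ == "__main__":
    overrides = build_configuration_overrides("s3://amzn-s3-demo-bucket/EXAMPLE-PREFIX/")
    # shlex.quote wraps each JSON document so it can be pasted into a shell command.
    print("--job-driver", shlex.quote(json.dumps(build_job_driver())))
    print("--configuration-overrides", shlex.quote(json.dumps(overrides)))
```

Because `json.dumps` produces the text, the output is always well-formed JSON, and the catalog name only has to be changed in one place.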

To learn which Apache Iceberg version ships with each Amazon EMR release, see Iceberg release history.

Spark session configurations for catalog integration

Spark session configurations for Iceberg AWS Glue catalog integration

This sample shows how to integrate Iceberg with the AWS Glue Data Catalog. Replace the placeholders in angle brackets with your own values:


spark-sql \
  --conf spark.sql.catalog.rms=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rms.type=glue \
  --conf "spark.sql.catalog.rms.glue.id=<glue RMS catalog ID>" \
  --conf "spark.sql.catalog.rms.glue.account-id=<AWS account ID>" \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

The following shows a sample query:


SELECT * FROM rms.rmsdb.table1
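
Each `--conf` option must be passed as a single `key=value` argument with no spaces around the equals sign, otherwise the shell splits it into separate words. The following is a minimal sketch, under the assumption that the command is assembled from a script rather than typed by hand. The helper names are hypothetical; the property names are the ones used in the sample above.

```python
import shlex

ICEBERG_EXTENSIONS = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"


def glue_catalog_conf(catalog_id: str, account_id: str, name: str = "rms") -> dict:
    """Return the Spark properties for an Iceberg catalog backed by AWS Glue.

    `name` is the Spark catalog name; "rms" matches the sample above.
    """
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "glue",
        f"{prefix}.glue.id": catalog_id,
        f"{prefix}.glue.account-id": account_id,
        "spark.sql.extensions": ICEBERG_EXTENSIONS,
    }


def spark_sql_command(conf: dict) -> list:
    """Turn a property mapping into a spark-sql argument list."""
    argv = ["spark-sql"]
    for key, value in conf.items():
        # One argument per property, joined by "=" with no surrounding spaces.
        argv += ["--conf", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    conf = glue_catalog_conf("<glue RMS catalog ID>", "<AWS account ID>")
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    print(shlex.join(spark_sql_command(conf)))
```

The argument list can be passed directly to `subprocess.run` without going through a shell, which avoids quoting problems altogether.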

Spark session configurations for Iceberg REST AWS Glue catalog integration

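This sample shows how to integrate Iceberg with the AWS Glue Data Catalog through the Iceberg REST catalog interface. Replace the placeholders in angle brackets with your own values: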


spark-sql \
  --conf spark.sql.catalog.rms=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rms.type=rest \
  --conf "spark.sql.catalog.rms.warehouse=<glue RMS catalog ID>" \
  --conf "spark.sql.catalog.rms.uri=<glue endpoint URI>/iceberg" \
  --conf spark.sql.catalog.rms.rest.sigv4-enabled=true \
  --conf spark.sql.catalog.rms.rest.signing-name=glue \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

The following shows a sample query:


SELECT * FROM rms.rmsdb.table1

This configuration works only for Redshift Managed Storage (RMS). Fine-grained access control (FGAC) for Amazon S3 isn't supported.
