Connecting to Data Catalog from a standalone Spark application

You can connect to the AWS Glue Data Catalog from a standalone Spark application by using an Apache Iceberg connector.

  1. Create an IAM role for the Spark application.

  2. Connect to the AWS Glue Iceberg REST endpoint using the Iceberg connector.

    # Configure your application. Refer to https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html for best practices on configuring environment variables.
    export AWS_ACCESS_KEY_ID=$(aws configure get appUser.aws_access_key_id)
    export AWS_SECRET_ACCESS_KEY=$(aws configure get appUser.aws_secret_access_key)
    export AWS_SESSION_TOKEN=$(aws configure get appUser.aws_session_token)
    export AWS_REGION=us-east-1
    export REGION=us-east-1
    # Shell assignments take no spaces around "="; replace the placeholder with your 12-digit AWS account ID.
    export AWS_ACCOUNT_ID="{specify your aws account id here}"

    # The iceberg-spark-runtime artifact must match the Spark minor version (3.5 for Spark 3.5.3).
    # iceberg-aws-bundle supplies the AWS SDK classes that S3FileIO and SigV4 request signing need.
    ~/spark-3.5.3-bin-hadoop3/bin/spark-shell \
        --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.0,org.apache.iceberg:iceberg-aws-bundle:1.6.0 \
        --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
        --conf "spark.sql.defaultCatalog=spark_catalog" \
        --conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog" \
        --conf "spark.sql.catalog.spark_catalog.type=rest" \
        --conf "spark.sql.catalog.spark_catalog.uri=https://glue.us-east-1.amazonaws.com/iceberg" \
        --conf "spark.sql.catalog.spark_catalog.warehouse=${AWS_ACCOUNT_ID}" \
        --conf "spark.sql.catalog.spark_catalog.rest.sigv4-enabled=true" \
        --conf "spark.sql.catalog.spark_catalog.rest.signing-name=glue" \
        --conf "spark.sql.catalog.spark_catalog.rest.signing-region=us-east-1" \
        --conf "spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
        --conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"

  3. Create a database and table in the Data Catalog, then insert and query data.

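    // Replace bucket_name with the S3 bucket that should hold the table data.
    spark.sql("create database myicebergdb").show()
    spark.sql("""CREATE TABLE myicebergdb.mytbl (name string) USING iceberg location 's3://bucket_name/mytbl'""")
    spark.sql("insert into myicebergdb.mytbl values('demo')").show()
    // Read the inserted row back through the Glue Iceberg REST catalog.
    spark.sql("select * from myicebergdb.mytbl").show()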
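The spark-shell launch in step 2 silently depends on several environment variables. If one comes back empty (for example, because the `appUser` profile lacks a key) or the account ID placeholder was never replaced, the failure only surfaces later as a signing or catalog error. The following is a minimal pre-flight sketch, not part of the official procedure. The function names `require_env`, `is_account_id`, and `glue_iceberg_uri` are hypothetical helpers. The URI pattern in `glue_iceberg_uri` is an assumption generalized from the `us-east-1` endpoint shown in step 2.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers to run before the spark-shell command in step 2.

# require_env NAME...: report each NAME that is unset or empty on stderr;
# return non-zero if any are missing. Pass only literal variable names,
# because the indirect lookup below uses eval.
require_env() {
  missing=0
  for name in "$@"; do
    # POSIX-compatible indirect lookup (avoids the bash-only ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# is_account_id VALUE: succeed only if VALUE is exactly 12 digits,
# which catches an unreplaced placeholder such as "{specify your aws account id here}".
is_account_id() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# glue_iceberg_uri REGION: print the Glue Iceberg REST endpoint URI.
# Assumption: the us-east-1 pattern from step 2 holds for other Regions.
glue_iceberg_uri() {
  printf 'https://glue.%s.amazonaws.com/iceberg\n' "$1"
}
```

A possible usage, placed after the `export` lines: `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION AWS_ACCOUNT_ID && is_account_id "$AWS_ACCOUNT_ID" || exit 1`. The URI helper could then replace the hard-coded Region in the `uri` setting, as in `--conf "spark.sql.catalog.spark_catalog.uri=$(glue_iceberg_uri "$AWS_REGION")"`.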
           
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.