Launching a Spark application using the Amazon Redshift integration for Apache Spark - Amazon EMR

For Amazon EMR releases 6.4 through 6.9, you must use the --jars or --packages option to specify which of the following JAR files you want to use. The --jars option specifies dependencies that are stored locally, in HDFS, or on a server reachable over HTTP/S. For other file locations that the --jars option supports, see Advanced Dependency Management in the Spark documentation. The --packages option specifies dependencies by their Maven coordinates, which are resolved from the public Maven repository.

  • spark-redshift.jar

  • spark-avro.jar

  • RedshiftJDBC.jar

  • minimal-json.jar

Amazon EMR releases 6.10.0 and higher don't require the minimal-json.jar dependency and install the other dependencies on each cluster by default. The following examples show how to launch a Spark application with the Amazon Redshift integration for Apache Spark.
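The release rule above can be expressed as a small check. The following is a minimal sketch, not part of Amazon EMR: the helper names parse_release and needs_explicit_jars are hypothetical, and the sketch assumes that release labels look like emr-6.9.0 or 6.10. It does not cover releases earlier than 6.4, which this page does not describe.

```python
import re
from typing import Tuple


def parse_release(label: str) -> Tuple[int, int]:
    """Extract (major, minor) from a label such as 'emr-6.9.0' or '6.10'.

    The label format is an assumption made for this sketch.
    """
    match = re.search(r"(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def needs_explicit_jars(label: str) -> bool:
    """Return True for releases 6.4 through 6.9, which need --jars or --packages.

    Releases 6.10 and higher return False because the dependencies are
    installed on the cluster by default.
    """
    return (6, 4) <= parse_release(label) <= (6, 9)
```

Comparing the versions as integer tuples rather than strings matters here, because a plain string comparison would sort 6.10 before 6.9.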

Amazon EMR 6.10.0+

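The following example shows how to launch a Spark application that uses the spark-redshift connector on Amazon EMR releases 6.10 and higher. Because these releases install the dependencies by default, the command needs no --jars or --packages option.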

spark-submit my_script.py

Amazon EMR 6.4.0 - 6.9.x

To launch a Spark application with the spark-redshift connector on Amazon EMR releases 6.4 through 6.9, you must use the --jars or --packages option, as the following example shows. Note that the paths listed with the --jars option are the default paths for the JAR files.

spark-submit \
  --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,/usr/share/aws/redshift/spark-redshift/lib/spark-redshift.jar,/usr/share/aws/redshift/spark-redshift/lib/spark-avro.jar,/usr/share/aws/redshift/spark-redshift/lib/minimal-json.jar \
  my_script.py
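Because the --jars value is a single comma-separated list, it can be easier to assemble the command programmatically than to edit it by hand. The following is a minimal sketch under stated assumptions: the function spark_submit_command and the constant DEFAULT_JARS are hypothetical names, and the paths are the default JAR locations from the example above.

```python
import shlex

# Default JAR locations on Amazon EMR 6.4 through 6.9, taken from the example above.
DEFAULT_JARS = [
    "/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/spark-redshift.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/spark-avro.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/minimal-json.jar",
]


def spark_submit_command(script, jars=None):
    """Build the spark-submit argument list (hypothetical helper).

    Pass jars for releases 6.4 through 6.9; omit it for 6.10 and higher,
    where the dependencies are already installed on the cluster.
    """
    command = ["spark-submit"]
    if jars:
        # --jars takes one comma-separated value with no spaces.
        command += ["--jars", ",".join(jars)]
    command.append(script)
    return command


# Print the command as a single shell-quoted line.
print(shlex.join(spark_submit_command("my_script.py", DEFAULT_JARS)))
```

On a cluster node, the returned list could be passed to subprocess.run(command, check=True) instead of being printed.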

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.