Query Amazon Redshift with PySpark (via Glue InteractiveSession) - Amazon SageMaker Unified Studio

Amazon SageMaker Unified Studio is in preview release and is subject to change.


To query Amazon Redshift through an AWS Glue interactive session using PySpark, write and run the following code in a notebook cell:

%%pyspark project.spark
import sys

import boto3
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

# Resolve the Redshift connection parameters passed to the session
args = getResolvedOptions(
    sys.argv,
    ["redshift_url", "redshift_iam_role", "redshift_tempdir", "redshift_jdbc_iam_url"],
)

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

table_name = "database.table"

# Read the table through the community Redshift connector: rows are
# unloaded to the S3 tempdir as Parquet, then loaded into a DataFrame
rs_read_df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", args["redshift_jdbc_iam_url"])
    .option("aws_iam_role", args["redshift_iam_role"])
    .option("tempdir", args["redshift_tempdir"])
    .option("unload_s3_format", "PARQUET")
    .option("dbtable", table_name)
    .load()
)

rs_read_df.show(5)
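If you only need a subset of rows or columns, the spark-redshift community connector also accepts a query option in place of dbtable, so the SQL runs in Redshift before the unload. The following is a sketch that reuses the spark session and args from the cell above; the table and column names in the SQL are illustrative:

```python
%%pyspark project.spark
# Sketch: push a SQL query down to Redshift instead of reading a whole
# table. Assumes `spark` and `args` already exist from the previous cell;
# the query text below is a hypothetical example.
sample_query = "SELECT col_a, col_b FROM database.table WHERE col_a > 100"

rs_query_df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", args["redshift_jdbc_iam_url"])
    .option("aws_iam_role", args["redshift_iam_role"])
    .option("tempdir", args["redshift_tempdir"])
    .option("unload_s3_format", "PARQUET")
    .option("query", sample_query)  # "query" replaces "dbtable"
    .load()
)
rs_query_df.show(5)
```

Note that query and dbtable are mutually exclusive: set exactly one of them per read.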