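Reading and writing to and from Amazon Redshift
The following code examples use PySpark to read sample data from, and write it to, an Amazon Redshift database, first with the data source API and then with Spark SQL.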
- Data source API
Use PySpark to read sample data from, and write it to, an Amazon Redshift database with the data source API.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Reuse the existing SparkContext (or create one if none is running)
    sc = SparkContext.getOrCreate()
    sql_context = SQLContext(sc)

    # Replace the placeholder host, database, account ID, and role name with your own values
    url = "jdbc:redshift:iam://redshifthost:5439/database"
    aws_iam_role_arn = "arn:aws:iam::account-id:role/role-name"

    # Read the source table; tempdir is the Amazon S3 path used for temporary data
    df = sql_context.read \
        .format("io.github.spark_redshift_community.spark.redshift") \
        .option("url", url) \
        .option("dbtable", "table-name") \
        .option("tempdir", "s3://path/for/temp/data") \
        .option("aws_iam_role", aws_iam_role_arn) \
        .load()

    # Write the data to a second table; mode "error" fails if that table already exists
    df.write \
        .format("io.github.spark_redshift_community.spark.redshift") \
        .option("url", url) \
        .option("dbtable", "table-name-copy") \
        .option("tempdir", "s3://path/for/temp/data") \
        .option("aws_iam_role", aws_iam_role_arn) \
        .mode("error") \
        .save()
- SparkSQL
Use PySpark to read sample data from, and write it to, an Amazon Redshift database with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .enableHiveSupport() \
        .getOrCreate()

    # Replace the placeholder host, database, account ID, and role name with your own values
    url = "jdbc:redshift:iam://redshifthost:5439/database"
    aws_iam_role_arn = "arn:aws:iam::account-id:role/role-name"

    bucket = "s3://path/for/temp/data"
    table_name = "table-name"  # Redshift table name

    # Define a Spark table backed by the Redshift table
    s = f"""CREATE TABLE IF NOT EXISTS {table_name} (country string, data string)
        USING io.github.spark_redshift_community.spark.redshift
        OPTIONS (dbtable '{table_name}', tempdir '{bucket}', url '{url}', aws_iam_role '{aws_iam_role_arn}');
        """
    spark.sql(s)

    columns = ["country", "data"]
    data = [("test-country", "test-data")]
    df = spark.sparkContext.parallelize(data).toDF(columns)

    # Insert data into table
    df.write.insertInto(table_name, overwrite=False)

    # Read the table back and display it
    df = spark.sql(f"SELECT * FROM {table_name}")
    df.show()
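Both examples pass the same four connector settings (url, dbtable, tempdir, and aws_iam_role), once as reader and writer options and once inside a CREATE TABLE statement. The sketch below shows one way to keep those settings in a single place. The helper names redshift_options and create_table_sql and the constant REDSHIFT_FORMAT are hypothetical and are not part of the connector. The values are the same placeholders used above. The sketch only builds a dictionary and a SQL string, so it runs without a Spark session.

```python
# Hypothetical helpers for illustration; these names are not part of the connector.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(url, table, tempdir, iam_role_arn):
    """Return the four connector options used by both examples."""
    return {
        "url": url,
        "dbtable": table,
        "tempdir": tempdir,
        "aws_iam_role": iam_role_arn,
    }


def create_table_sql(table, columns, options):
    """Build a CREATE TABLE ... USING statement like the one in the Spark SQL example."""
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    option_list = ", ".join(f"{key} '{value}'" for key, value in options.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_list}) "
        f"USING {REDSHIFT_FORMAT} OPTIONS ({option_list})"
    )


# Placeholder values copied from the examples; replace them with your own.
options = redshift_options(
    url="jdbc:redshift:iam://redshifthost:5439/database",
    table="table-name",
    tempdir="s3://path/for/temp/data",
    iam_role_arn="arn:aws:iam::account-id:role/role-name",
)
statement = create_table_sql(
    "table-name", [("country", "string"), ("data", "string")], options
)
print(statement)
```

With a Spark session available, the same dictionary could be passed to the data source API as spark.read.format(REDSHIFT_FORMAT).options(**options).load(), and the statement could be run with spark.sql(statement).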