Prerequisite
A Salesforce sObject you would like to read from. You will need the object name, such as Account, Case, or Opportunity.
Example:
salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0"
    }
)
Partitioning queries
You can provide the additional Spark options PARTITION_FIELD, LOWER_BOUND, UPPER_BOUND, and NUM_PARTITIONS if you want to utilize concurrency in Spark. With these parameters, the original query is split into NUM_PARTITIONS sub-queries that Spark tasks can execute concurrently.
PARTITION_FIELD: the name of the field to be used to partition the query.
LOWER_BOUND: an inclusive lower bound value of the chosen partition field.
    For Date or Timestamp fields, the connector accepts the Spark timestamp format used in Spark SQL queries.
    Examples of valid values:
    "TIMESTAMP \"1707256978123\""
    "TIMESTAMP '2018-01-01 00:00:00.000 UTC'"
    "TIMESTAMP \"2018-01-01 00:00:00 Pacific/Tahiti\""
    "TIMESTAMP \"2018-01-01 00:00:00\""
    "TIMESTAMP \"-123456789\" Pacific/Tahiti"
    "TIMESTAMP \"1702600882\""
UPPER_BOUND: an exclusive upper bound value of the chosen partition field.
NUM_PARTITIONS: the number of partitions.
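To make the effect of these options more concrete, the following is a minimal sketch of how an inclusive lower bound and an exclusive upper bound could be divided into NUM_PARTITIONS contiguous sub-ranges. It assumes equal-width intervals; the connector's actual splitting logic is not described here and may differ. The function name split_range is hypothetical and is not part of AWS Glue.

```python
from datetime import datetime, timedelta, timezone


def split_range(lower, upper, num_partitions):
    """Split [lower, upper) into num_partitions contiguous half-open intervals.

    Illustration only: equal-width splitting is an assumption, not the
    documented behavior of the Salesforce connector.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    step = (upper - lower) / num_partitions
    # Interior boundaries are computed from the step; the final boundary is
    # pinned to `upper` so the last interval ends exactly at the upper bound.
    bounds = [lower + step * i for i in range(num_partitions)] + [upper]
    return list(zip(bounds[:-1], bounds[1:]))


# Each (start, end) pair would correspond to one sub-query covering
# start <= PARTITION_FIELD < end.
ranges = split_range(
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    datetime(2021, 1, 11, tzinfo=timezone.utc),
    10,
)
```

Because each interval is half-open, a record whose partition field value equals an interior boundary falls into exactly one sub-range.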
Example:
salesforce_read = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "Account",
        "API_VERSION": "v60.0",
        "PARTITION_FIELD": "SystemModstamp",
        "LOWER_BOUND": "TIMESTAMP '2021-01-01 00:00:00 Pacific/Tahiti'",
        "UPPER_BOUND": "TIMESTAMP '2023-01-10 00:00:00 Pacific/Tahiti'",
        "NUM_PARTITIONS": "10"
    }
)
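If the bounds are computed at run time rather than hard-coded, a small helper can assemble the partitioning options from datetime values. The sketch below is one possible approach; spark_timestamp_literal and partitioned_options are hypothetical names, not part of the AWS Glue API. It emits only the "TIMESTAMP 'yyyy-MM-dd HH:mm:ss [zone]'" form shown in the example above and passes the time zone name through unchanged.

```python
from datetime import datetime


def spark_timestamp_literal(dt, zone=None):
    """Format a datetime as a Spark SQL timestamp literal string."""
    text = dt.strftime("%Y-%m-%d %H:%M:%S")
    if zone:
        # The zone name is appended verbatim, for example "Pacific/Tahiti".
        text = f"{text} {zone}"
    return f"TIMESTAMP '{text}'"


def partitioned_options(connection_name, entity_name, api_version,
                        partition_field, lower, upper, num_partitions,
                        zone=None):
    """Build a connection_options dict with the four partitioning options."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
        "PARTITION_FIELD": partition_field,
        "LOWER_BOUND": spark_timestamp_literal(lower, zone),   # inclusive
        "UPPER_BOUND": spark_timestamp_literal(upper, zone),   # exclusive
        "NUM_PARTITIONS": str(num_partitions),
    }


options = partitioned_options(
    "connectionName", "Account", "v60.0", "SystemModstamp",
    datetime(2021, 1, 1), datetime(2023, 1, 10), 10,
    zone="Pacific/Tahiti",
)
```

The resulting dictionary can then be passed as connection_options to glueContext.create_dynamic_frame.from_options in a Glue job.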