Reading from SAP OData entities

Prerequisite

An SAP OData object you would like to read from. You will need the object/EntitySet name, for example, /sap/opu/odata/sap/API_SALES_ORDER_SRV/A_SalesOrder.

Example:

sapodata_read = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName"
    },
    transformation_ctx=key)
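
To sanity-check the read, you can inspect the resulting DynamicFrame before further processing. A minimal sketch, assuming a Glue job in which the read above has completed (the entity name itself is a placeholder):

# Print the schema inferred from the OData entity
sapodata_read.printSchema()

# Convert to a Spark DataFrame for ad hoc inspection
df = sapodata_read.toDF()
df.show(5)

# A record count helps verify the extraction
print(f"Rows read: {df.count()}")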

SAP OData entity and field details:

Entity                      Data type   Supported operators
Tables (dynamic entities)   String      =, !=, >, >=, <, <=, BETWEEN, LIKE
                            Integer     =, !=, >, >=, <, <=, BETWEEN, LIKE
                            Long        =, !=, >, >=, <, <=, BETWEEN, LIKE
                            Double      =, !=, >, >=, <, <=, BETWEEN, LIKE
                            Date        =, !=, >, >=, <, <=, BETWEEN, LIKE
                            DateTime    =, !=, >, >=, <, <=, BETWEEN, LIKE
                            Boolean     =, !=
                            Struct      =, !=, >, >=, <, <=, BETWEEN, LIKE
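
The operators above can also be expressed as ordinary Spark filters once the data is read. A minimal sketch, assuming the sapodata_read frame from the example above; the validStartDate, EmployeeName, and Salary field names are illustrative, and the filtering here happens in Spark rather than demonstrating any server-side pushdown:

from pyspark.sql import functions as F

df = sapodata_read.toDF()

# BETWEEN on a Date/DateTime-typed field (field name is illustrative)
recent = df.filter(F.col("validStartDate").between("2000-01-01", "2020-01-01"))

# LIKE on a String field combined with a numeric comparison
filtered = recent.filter(F.col("EmployeeName").like("A%") & (F.col("Salary") >= 50000))
filtered.show(5)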

Partitioning queries

Field-based partitioning:

You can provide the additional Spark options PARTITION_FIELD, LOWER_BOUND, UPPER_BOUND, and NUM_PARTITIONS if you want to use concurrency in Spark. With these parameters, the original query is split into NUM_PARTITIONS sub-queries that Spark tasks can execute concurrently. Integer, Date, and DateTime fields support field-based partitioning in the SAP OData connector.

  • PARTITION_FIELD: the name of the field to be used to partition the query.

  • LOWER_BOUND: an inclusive lower bound value of the chosen partition field.

    For the DateTime field, we accept the Spark timestamp format used in Spark SQL queries.

    Example of a valid value:

    "2000-01-01T00:00:00.000Z"
  • UPPER_BOUND: an exclusive upper bound value of the chosen partition field.

  • NUM_PARTITIONS: the number of partitions.

  • PARTITION_BY: the type of partitioning to be performed. Pass "FIELD" for field-based partitioning.

Example:

sapodata = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "/sap/opu/odata/sap/SEPM_HCM_SCENARIO_SRV/EmployeeSet",
        "PARTITION_FIELD": "validStartDate",
        "LOWER_BOUND": "2000-01-01T00:00:00.000Z",
        "UPPER_BOUND": "2020-01-01T00:00:00.000Z",
        "NUM_PARTITIONS": "10",
        "PARTITION_BY": "FIELD"
    },
    transformation_ctx=key)
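
To confirm that the requested parallelism took effect, you can check the partition count on the underlying RDD. A minimal sketch, assuming the partitioned read above succeeded:

df = sapodata.toDF()

# With NUM_PARTITIONS set to "10", the RDD should reflect the requested
# parallelism (the exact count can vary with Spark's planning)
print(f"Partitions: {df.rdd.getNumPartitions()}")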

Record-based partitioning:

The original query is split into NUM_PARTITIONS sub-queries that Spark tasks can execute concurrently.

Record-based partitioning is supported only for non-ODP entities, because pagination in ODP entities is handled through the next token/skip token.

  • PARTITION_BY: the type of partitioning to be performed. Pass "COUNT" for record-based partitioning.

Example:

sapodata = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "/sap/opu/odata/sap/SEPM_HCM_SCENARIO_SRV/EmployeeSet",
        "NUM_PARTITIONS": "10",
        "PARTITION_BY": "COUNT"
    },
    transformation_ctx=key)
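
After either partitioned read, the DynamicFrame can be written out like any other Glue source. A minimal sketch, assuming an S3 location you own (the bucket path is a placeholder):

glueContext.write_dynamic_frame.from_options(
    frame=sapodata,
    connection_type="s3",
    connection_options={"path": "s3://your-bucket/sapodata-output/"},
    format="parquet")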