PySpark extension types
The types that are used by the AWS Glue PySpark extensions.
DataType
The base class for the other AWS Glue types.
__init__(properties={})
-
properties
– Properties of the data type (optional).
typeName(cls)
Returns the type of the AWS Glue type class (that is, the class name with "Type" removed from the end).
-
cls
– An AWS Glue class instance derived from DataType.
jsonValue( )
Returns a JSON object that contains the data type and properties of the class:
{ "dataType": typeName, "properties": properties }
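The two methods above can be illustrated with a minimal sketch. `DataTypeSketch`, its `StringType` subclass, and the `nullable` property are hypothetical stand-ins written for this example, not the awsglue classes. The sketch follows the description literally, so details of the library's real output (for example, letter case) may differ.

```python
import json


class DataTypeSketch:
    """Hypothetical stand-in for DataType; not the awsglue implementation."""

    def __init__(self, properties=None):
        # Copy the properties so instances never share one mutable dict.
        self.properties = dict(properties or {})

    @classmethod
    def typeName(cls):
        # Class name with a trailing "Type" removed, as described above.
        name = cls.__name__
        return name[:-4] if name.endswith("Type") else name

    def jsonValue(self):
        # Same shape as the JSON object shown above.
        return {"dataType": self.typeName(), "properties": self.properties}


class StringType(DataTypeSketch):
    """Hypothetical subclass used only to exercise the sketch."""


# "nullable" is an illustrative property name, not one defined by the text.
value = StringType({"nullable": True}).jsonValue()
print(json.dumps(value))
```

Passing `None` as the default and copying the dictionary avoids the shared-state pitfall of a mutable default argument such as `properties={}`.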
AtomicType and simple derivatives
Inherits from and extends the DataType class, and serves as the base class for all the AWS Glue atomic data types.
fromJsonValue(cls, json_value)
Initializes a class instance with values from a JSON object.
-
cls
– An AWS Glue type class instance to initialize.
-
json_value
– The JSON object to load key-value pairs from.
The following types are simple derivatives of the AtomicType class:
BinaryType – Binary data.
BooleanType – Boolean values.
ByteType – A byte value.
DateType – A datetime value.
DoubleType – A floating-point double value.
IntegerType – An integer value.
LongType – A long integer value.
NullType – A null value.
ShortType – A short integer value.
StringType – A text string.
TimestampType – A timestamp value (typically in seconds from 1/1/1970).
UnknownType – A value of unidentified type.
DecimalType(AtomicType)
Inherits from and extends the AtomicType class to represent a decimal number (a number expressed in decimal digits, as opposed to binary base-2 numbers).
__init__(precision=10, scale=2, properties={})
-
precision
– The number of digits in the decimal number (optional; the default is 10).
-
scale
– The number of digits to the right of the decimal point (optional; the default is 2). -
properties
– The properties of the decimal number (optional).
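To make precision and scale concrete, here is a minimal sketch that uses Python's standard `decimal` module rather than the Glue class itself. The helper `fits` is a hypothetical name, and rounding to `scale` digits before counting is an assumption made for illustration; the text above does not say whether out-of-range values are rounded or rejected.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits(value, precision=10, scale=2):
    """Return True if `value` fits in `precision` total digits,
    `scale` of which sit to the right of the decimal point."""
    # Round to `scale` fractional digits (assumed behavior for this sketch).
    step = Decimal(10) ** -scale
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    # Count every digit of the rounded number, on both sides of the point.
    return len(rounded.as_tuple().digits) <= precision


print(fits("12345678.91"))   # 8 integer digits + 2 fractional digits
print(fits("123456789.10"))  # 9 integer digits + 2 fractional digits
```

With the defaults (precision 10, scale 2), at most eight digits remain for the integer part.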
EnumType(AtomicType)
Inherits from and extends the AtomicType class to represent an enumeration of valid options.
__init__(options)
-
options
– A list of the options being enumerated.
Collection types
ArrayType(DataType)
__init__(elementType=UnknownType(), properties={})
-
elementType
– The type of elements in the array (optional; the default is UnknownType).
-
properties
– Properties of the array (optional).
ChoiceType(DataType)
__init__(choices=[], properties={})
-
choices
– A list of possible choices (optional).
-
properties
– Properties of these choices (optional).
add(new_choice)
Adds a new choice to the list of possible choices.
-
new_choice
– The choice to add to the list of possible choices.
merge(new_choices)
Merges a list of new choices with the existing list of choices.
-
new_choices
– A list of new choices to merge with existing choices.
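The difference between add and merge can be sketched as follows. `ChoiceSketch` is a hypothetical stand-in, and plain strings stand in for type objects. Skipping choices that are already present during a merge is an assumption of this sketch; the description above does not say how duplicates are handled.

```python
class ChoiceSketch:
    """Hypothetical stand-in for ChoiceType; not the awsglue implementation."""

    def __init__(self, choices=None):
        self.choices = []
        for choice in choices or []:
            self.add(choice)

    def add(self, new_choice):
        # Add a single choice to the list of possible choices.
        self.choices.append(new_choice)

    def merge(self, new_choices):
        # Assumption: a choice that already exists is not added again.
        for choice in new_choices:
            if choice not in self.choices:
                self.add(choice)


# Strings are placeholders for type objects in this sketch.
column_type = ChoiceSketch(["int"])
column_type.add("string")
column_type.merge(["string", "double"])
print(column_type.choices)
```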
MapType(DataType)
__init__(valueType=UnknownType, properties={})
-
valueType
– The type of values in the map (optional; the default is UnknownType).
-
properties
– Properties of the map (optional).
Field(Object)
Creates a field object out of an object that derives from DataType.
__init__(name, dataType, properties={})
-
name
– The name to be assigned to the field.
-
dataType
– The object to create a field from.
-
properties
– Properties of the field (optional).
StructType(DataType)
Defines a data structure (struct).
__init__(fields=[], properties={})
-
fields
– A list of the fields (of type Field) to include in the structure (optional).
-
properties
– Properties of the structure (optional).
add(field)
-
field
– An object of type Field to add to the structure.
hasField(field)
Returns True if this structure has a field of the same name, or False if not.
-
field
– A field name, or an object of type Field whose name is used.
getField(field)
-
field
– A field name, or an object of type Field whose name is used. If the structure has a field of the same name, it is returned.
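The name-based lookup that hasField and getField describe can be sketched with plain Python classes. `FieldSketch` and `StructSketch` are hypothetical stand-ins, strings stand in for data types, and returning `None` for a missing field is an assumption, since the text only states what happens when the field exists.

```python
class FieldSketch:
    """Hypothetical stand-in for Field; not the awsglue implementation."""

    def __init__(self, name, dataType, properties=None):
        self.name = name
        self.dataType = dataType
        self.properties = dict(properties or {})


class StructSketch:
    """Hypothetical stand-in for StructType; not the awsglue implementation."""

    def __init__(self, fields=None):
        self.fields = []
        for field in fields or []:
            self.add(field)

    @staticmethod
    def _name_of(field):
        # Accept either a field name or a field object whose name is used.
        return field if isinstance(field, str) else field.name

    def add(self, field):
        self.fields.append(field)

    def getField(self, field):
        wanted = self._name_of(field)
        for candidate in self.fields:
            if candidate.name == wanted:
                return candidate
        return None  # Assumption: missing fields yield None.

    def hasField(self, field):
        return self.getField(field) is not None


# Strings are placeholders for data type objects in this sketch.
record = StructSketch([FieldSketch("id", "long")])
record.add(FieldSketch("title", "string"))
print(record.hasField("title"), record.hasField(FieldSketch("id", "int")))
```

Because only the name is compared, looking up `FieldSketch("id", "int")` finds the existing `id` field even though its data type differs.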
EntityType(DataType)
__init__(entity, base_type, properties)
This class is not yet implemented.
Other types
DataSource(object)
__init__(j_source, sql_ctx, name)
-
j_source
– The data source.
-
sql_ctx
– The SQL context.
-
name
– The data-source name.
setFormat(format, **options)
-
format
– The format to set for the data source.
-
options
– A collection of options to set for the data source. For more information about format options, see Data format options for inputs and outputs in AWS Glue for Spark.
getFrame()
Returns a DynamicFrame for the data source.
DataSink(object)
__init__(j_sink, sql_ctx)
-
j_sink
– The sink to create.
-
sql_ctx
– The SQL context for the data sink.
setFormat(format, **options)
-
format
– The format to set for the data sink.
-
options
– A collection of options to set for the data sink. For more information about format options, see Data format options for inputs and outputs in AWS Glue for Spark.
setAccumulableSize(size)
-
size
– The accumulable size to set, in bytes.
writeFrame(dynamic_frame, info="")
-
dynamic_frame
– The DynamicFrame to write.
-
info
– Information about the DynamicFrame (optional).
write(dynamic_frame_or_dfc, info="")
Writes a DynamicFrame or a DynamicFrameCollection.
-
dynamic_frame_or_dfc
– Either a DynamicFrame object or a DynamicFrameCollection object to be written.
-
info
– Information about the DynamicFrame or DynamicFrames to be written (optional).