Examples of custom visual scripts

The following examples perform equivalent transformations. Of the three, the second example (SparkSQL) is the cleanest and most efficient, followed by the pandas UDF, and finally the low-level mapping shown in the first example. The first example is a complete, simple transform that adds up two columns:

    from awsglue import DynamicFrame

    # You can have other auxiliary variables, functions or classes in this file;
    # they won't affect the runtime
    def record_sum(rec, col1, col2, resultCol):
        rec[resultCol] = rec[col1] + rec[col2]
        return rec


    # The number and names of arguments must match the definition in the JSON config file
    # (except self, which is the current DynamicFrame to transform).
    # If an argument is optional, you need to define a default value here
    # (resultCol in this example is an optional argument)
    def custom_add_columns(self, col1, col2, resultCol="result"):
        # The mapping will alter the column order, which could be important
        fields = [field.name for field in self.schema()]
        if resultCol not in fields:
            # If it's a new column, put it at the end
            fields.append(resultCol)
        return self.map(lambda record: record_sum(record, col1, col2, resultCol)).select_fields(paths=fields)


    # The name we assign on DynamicFrame must match the configured "functionName"
    DynamicFrame.custom_add_columns = custom_add_columns
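To see what the row-level mapping and the column-order handling do without a Glue runtime, the following is a minimal pure-Python sketch that models records as dictionaries. The helper `apply_add_columns` and the list-of-dicts input are hypothetical stand-ins for `DynamicFrame.map` and `select_fields`; they are not part of the AWS Glue API.

```python
def record_sum(rec, col1, col2, result_col):
    # Work on a copy so the caller's record is left untouched
    rec = dict(rec)
    rec[result_col] = rec[col1] + rec[col2]
    return rec


def apply_add_columns(records, schema_fields, col1, col2, result_col="result"):
    # Hypothetical stand-in for the transform: keep the original column order
    # and put the result column at the end only if it is new
    fields = list(schema_fields)
    if result_col not in fields:
        fields.append(result_col)
    mapped = [record_sum(r, col1, col2, result_col) for r in records]
    # Re-select the fields in the intended order, as select_fields does in the example
    return [{name: rec[name] for name in fields} for rec in mapped]


rows = [{"id": 1, "a": 2, "b": 3}, {"id": 2, "a": 10, "b": 5}]
out = apply_add_columns(rows, ["id", "a", "b"], "a", "b")
overwritten = apply_add_columns(rows, ["id", "a", "b"], "a", "b", result_col="a")
```

When the result column already exists, as in `overwritten`, its values are replaced in place and the column keeps its original position.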

The following example is an equivalent transform leveraging the SparkSQL API.

    from awsglue import DynamicFrame

    # The number and names of arguments must match the definition in the JSON config file
    # (except self, which is the current DynamicFrame to transform).
    # If an argument is optional, you need to define a default value here
    # (resultCol in this example is an optional argument)
    def custom_add_columns(self, col1, col2, resultCol="result"):
        df = self.toDF()
        return DynamicFrame.fromDF(
            df.withColumn(resultCol, df[col1] + df[col2]),  # This is the conversion logic
            self.glue_ctx,
            self.name,
        )


    # The name we assign on DynamicFrame must match the configured "functionName"
    DynamicFrame.custom_add_columns = custom_add_columns

The following example performs the same transformation with a pandas UDF, which is more efficient than using a plain UDF. For more information about writing pandas UDFs, see the Apache Spark SQL documentation.

    from awsglue import DynamicFrame
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    # The number and names of arguments must match the definition in the JSON config file
    # (except self, which is the current DynamicFrame to transform).
    # If an argument is optional, you need to define a default value here
    # (resultCol in this example is an optional argument)
    def custom_add_columns(self, col1, col2, resultCol="result"):
        @pandas_udf("integer")  # We need to declare the type of the result column
        def add_columns(value1: pd.Series, value2: pd.Series) -> pd.Series:
            return value1 + value2

        df = self.toDF()
        return DynamicFrame.fromDF(
            df.withColumn(resultCol, add_columns(col1, col2)),  # This is the conversion logic
            self.glue_ctx,
            self.name,
        )


    # The name we assign on DynamicFrame must match the configured "functionName"
    DynamicFrame.custom_add_columns = custom_add_columns
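The comments in all three examples stress that the function name and its arguments must match the JSON config file. A rough way to catch mismatches before uploading the script is sketched below using only the standard library. Only `functionName` appears in the examples above; the `parameters`, `name`, and `isOptional` keys, the `config` dictionary itself, and the `check_signature` helper are assumptions made for illustration, so adjust them to your actual config file.

```python
import inspect


def custom_add_columns(self, col1, col2, resultCol="result"):
    pass  # Body omitted; only the signature matters for this check


# Hypothetical config contents, loaded here as a plain dictionary
config = {
    "functionName": "custom_add_columns",
    "parameters": [
        {"name": "col1", "isOptional": False},
        {"name": "col2", "isOptional": False},
        {"name": "resultCol", "isOptional": True},
    ],
}


def check_signature(func, config):
    """Return a list of mismatches between a function and its config."""
    problems = []
    if func.__name__ != config["functionName"]:
        problems.append("function name does not match functionName")
    # Ignore self, which is the DynamicFrame being transformed
    params = {
        name: p
        for name, p in inspect.signature(func).parameters.items()
        if name != "self"
    }
    declared = {d["name"]: d for d in config["parameters"]}
    if set(params) != set(declared):
        problems.append("parameter names differ from the config")
    for name in sorted(set(params) & set(declared)):
        has_default = params[name].default is not inspect.Parameter.empty
        if declared[name].get("isOptional", False) and not has_default:
            problems.append(f"{name} is optional but has no default value")
    return problems


issues = check_signature(custom_add_columns, config)
```

An empty `issues` list means the name, the argument names, and the defaults for optional arguments line up under these assumptions.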
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.