

 Amazon Redshift will no longer support creating new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# User-defined functions in Amazon Redshift
<a name="user-defined-functions"></a>

You can create a custom scalar user-defined function (UDF) using either a SQL SELECT clause or a Python program. The new function is stored in the database and is available for any user with sufficient privileges to run. You run a custom scalar UDF in much the same way as you run existing Amazon Redshift functions. 

For Python UDFs, in addition to using the standard Python functionality, you can import your own custom Python modules. For more information, see [Python language support for UDFs](udf-python-language-support.md). Note that Python 3 isn't available for Python UDFs. To get Python 3 support for Amazon Redshift UDFs, use [Scalar Lambda UDFs](udf-creating-a-lambda-sql-udf.md) instead.

You can also create AWS Lambda UDFs that use custom functions defined in Lambda as part of your SQL queries. Lambda UDFs enable you to write complex UDFs and integrate with third-party components. They can also help you overcome some of the limitations of current Python and SQL UDFs. For example, they can help you access network and storage resources and write full-fledged SQL statements. You can create Lambda UDFs in any of the programming languages supported by Lambda, such as Java, Go, PowerShell, Node.js, C#, Python, and Ruby. Or you can use a custom runtime.

By default, all users can run UDFs. For more information about privileges, see [UDF security and permissions](udf-security-and-privileges.md).

**Topics**
+ [UDF security and permissions](udf-security-and-privileges.md)
+ [Preventing UDF naming conflicts](udf-naming-udfs.md)
+ [Scalar SQL UDFs](udf-creating-a-scalar-sql-udf.md)
+ [Scalar Python UDFs](udf-creating-a-scalar-udf.md)
+ [Scalar Lambda UDFs](udf-creating-a-lambda-sql-udf.md)
+ [Use case examples for user-defined functions (UDFs)](udf-example-uses.md)

# UDF security and permissions
<a name="udf-security-and-privileges"></a>

To create a UDF, you must have USAGE privilege on the language, either SQL or plpythonu (Python). By default, USAGE ON LANGUAGE SQL is granted to PUBLIC, but you must explicitly grant USAGE ON LANGUAGE PLPYTHONU to specific users or groups. 

To revoke usage for SQL, first revoke usage from PUBLIC. Then grant usage on SQL only to the specific users or groups permitted to create SQL UDFs. The following example revokes usage on SQL from PUBLIC. Then it grants usage to the user group `udf_devs`.

```
revoke usage on language sql from PUBLIC;
grant usage on language sql to group udf_devs;
```

To run a UDF, you must have permission to do so for each function. By default, permission to run new UDFs is granted to PUBLIC. To restrict usage, revoke this permission from PUBLIC for the function. Then grant the privilege to specific individuals or groups. 

The following example revokes execution on function `f_py_greater` from PUBLIC. Then it grants usage to the user group `udf_devs`.

```
revoke execute on function f_py_greater(a float, b float) from PUBLIC;
grant execute on function f_py_greater(a float, b float) to group udf_devs;
```

Superusers have all privileges by default. 

For more information, see [GRANT](r_GRANT.md) and [REVOKE](r_REVOKE.md).

# Preventing UDF naming conflicts
<a name="udf-naming-udfs"></a>

You can avoid potential conflicts and unexpected results by considering your UDF naming conventions before implementation. Because function names can be overloaded, they can collide with existing and future Amazon Redshift function names. This topic discusses overloading and presents a strategy for avoiding conflicts.

## Overloading function names
<a name="udf-naming-overloading-function-names"></a>

A function is identified by its name and *signature*, which is the number of input arguments and the data types of the arguments. Two functions in the same schema can have the same name if they have different signatures. In other words, the function names can be *overloaded*.

When you run a query, the query engine determines which function to call based on the number of arguments you provide and the data types of the arguments. You can use overloading to simulate functions with a variable number of arguments, up to the limit allowed by the [CREATE FUNCTION](r_CREATE_FUNCTION.md) command. 

## Avoiding conflict with built-in Amazon Redshift functions
<a name="udf-naming-preventing-udf-naming-conflicts"></a>

We recommend that you name all UDFs using the prefix `f_`. Amazon Redshift reserves the `f_` prefix exclusively for UDFs. By prefixing your UDF names with `f_`, you ensure that your UDF name won't conflict with any existing or future Amazon Redshift built-in SQL function names. For example, by naming a new UDF `f_sum`, you avoid conflict with the Amazon Redshift SUM function. Similarly, if you name a new function `f_fibonacci`, you avoid conflict if Amazon Redshift adds a function named FIBONACCI in a future release.

You can create a UDF with the same name and signature as an existing Amazon Redshift built-in SQL function without the function name being overloaded if the UDF and the built-in function exist in different schemas. Because built-in functions exist in the system catalog schema, `pg_catalog`, you can create a UDF with the same name in another schema, such as public or a user-defined schema. In some cases, you might call a function that is not explicitly qualified with a schema name. If so, Amazon Redshift searches the `pg_catalog` schema first by default. Thus, a built-in function runs before a new UDF with the same name.

You can change this behavior by setting the search path to place `pg_catalog` at the end. If you do so, your UDFs take precedence over built-in functions, but the practice can cause unexpected results. Adopting a unique naming strategy, such as using the reserved prefix `f_`, is a more reliable practice. For more information, see [SET](r_SET.md) and [search\_path](r_search_path.md).

# Scalar SQL UDFs
<a name="udf-creating-a-scalar-sql-udf"></a>

A scalar SQL UDF incorporates a SQL SELECT clause that runs when the function is called and returns a single value. The [CREATE FUNCTION](r_CREATE_FUNCTION.md) command defines the following parameters:
+ (Optional) Input arguments. Each argument must have a data type. 
+ One return data type.
+ One SQL SELECT clause. In the SELECT clause, refer to the input arguments using `$1`, `$2`, and so on, according to the order of the arguments in the function definition. 

The input and return data types can be any standard Amazon Redshift data type.

Don't include a FROM clause in your SELECT clause. Instead, include the FROM clause in the SQL statement that calls the SQL UDF. 

The SELECT clause can't include any of the following types of clauses:
+ FROM
+ INTO
+ WHERE
+ GROUP BY
+ ORDER BY
+ LIMIT

# Scalar SQL function example
<a name="udf-scalar-sql-function-example"></a>

The following example creates a function that compares two numbers and returns the larger value. For more information, see [CREATE FUNCTION](r_CREATE_FUNCTION.md).

```
create function f_sql_greater (float, float)
  returns float
stable
as $$
  select case when $1 > $2 then $1
    else $2
  end
$$ language sql;
```

The following query calls the new `f_sql_greater` function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

```
select f_sql_greater(commission, pricepaid*0.20) from sales;
```

# Scalar Python UDFs
<a name="udf-creating-a-scalar-udf"></a>

A scalar Python UDF incorporates a Python program that runs when the function is called and returns a single value. The [CREATE FUNCTION](r_CREATE_FUNCTION.md) command defines the following parameters:
+ (Optional) Input arguments. Each argument must have a name and a data type. 
+ One return data type.
+ One executable Python program.

The input and return data types for Python UDFs can be any of the following types:
+  SMALLINT 
+  INTEGER 
+  BIGINT 
+  DECIMAL 
+  REAL 
+  DOUBLE PRECISION 
+  BOOLEAN 
+  CHAR 
+  VARCHAR 
+  DATE 
+  TIMESTAMP 
+  ANYELEMENT 

The aliases for these types are also valid. For a full list of data types and their aliases, see [Data types](c_Supported_data_types.md).

When Python UDFs use the data type ANYELEMENT, Amazon Redshift automatically converts to a standard data type based on the arguments supplied at runtime. For more information, see [ANYELEMENT data type](udf-data-types.md#udf-anyelement-data-type).

When an Amazon Redshift query calls a scalar UDF, the following steps occur at runtime:

1. The function converts the input arguments to Python data types.

   For a mapping of Amazon Redshift data types to Python data types, see [Python UDF data types](udf-data-types.md).

1. The function runs the Python program, passing the converted input arguments.

1. The Python code returns a single value. The data type of the return value must correspond to the RETURNS data type specified by the function definition.

1. The function converts the Python return value to the specified Amazon Redshift data type, then returns that value to the query.
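The four steps above can be sketched in plain Python. This is only a conceptual illustration under invented names, not Amazon Redshift's actual implementation:

```python
from decimal import Decimal

def udf_body(a, b):
    # Step 2: the Python program itself (the body of f_py_greater).
    if a > b:
        return a
    return b

def run_scalar_udf(arg1, arg2):
    # Step 1: convert the Redshift FLOAT arguments to Python floats.
    a, b = float(arg1), float(arg2)
    # Step 2: run the Python program with the converted arguments.
    result = udf_body(a, b)
    # Step 3: the return value must correspond to the RETURNS data type.
    if not isinstance(result, float):
        raise TypeError("return value does not match RETURNS float")
    # Step 4: hand the value back as the Redshift FLOAT result.
    return result

print(run_scalar_udf(Decimal("1.5"), 2))   # 2.0
```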

**Note**  
Python 3 isn’t available for Python UDFs. To get Python 3 support for Amazon Redshift UDFs, use [Scalar Lambda UDFs](udf-creating-a-lambda-sql-udf.md) instead.

# Scalar Python UDF example
<a name="udf-scalar-function-example"></a>

The following example creates a function that compares two numbers and returns the larger value. Note that the indentation of the code between the double dollar signs (`$$`) is a Python requirement. For more information, see [CREATE FUNCTION](r_CREATE_FUNCTION.md).

```
create function f_py_greater (a float, b float)
  returns float
stable
as $$
  if a > b:
    return a
  return b
$$ language plpythonu;
```

The following query calls the new `f_py_greater` function to query the SALES table and return either COMMISSION or 20 percent of PRICEPAID, whichever is greater.

```
select f_py_greater (commission, pricepaid*0.20) from sales;
```

# Python UDF data types
<a name="udf-data-types"></a>

Python UDFs can use any standard Amazon Redshift data type for the input arguments and the function's return value. In addition to the standard data types, UDFs support the data type *ANYELEMENT*, which Amazon Redshift automatically converts to a standard data type based on the arguments supplied at runtime. Scalar UDFs can return a data type of ANYELEMENT. For more information, see [ANYELEMENT data type](#udf-anyelement-data-type).

During execution, Amazon Redshift converts the arguments from Amazon Redshift data types to Python data types for processing. It then converts the return value from the Python data type to the corresponding Amazon Redshift data type. For more information about Amazon Redshift data types, see [Data types](c_Supported_data_types.md).

The following table maps Amazon Redshift data types to Python data types.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/udf-data-types.html)

## ANYELEMENT data type
<a name="udf-anyelement-data-type"></a>

ANYELEMENT is a *polymorphic data type*. This means that if a function is declared using ANYELEMENT for an argument's data type, the function can accept any standard Amazon Redshift data type as input for that argument when the function is called. The ANYELEMENT argument is set to the data type actually passed to it when the function is called.

If a function uses multiple ANYELEMENT data types, they must all resolve to the same actual data type when the function is called. All ANYELEMENT argument data types are set to the actual data type of the first argument passed to an ANYELEMENT. For example, a function declared as `f_equal(anyelement, anyelement)` will take any two input values, so long as they are of the same data type.

If the return value of a function is declared as ANYELEMENT, at least one input argument must be ANYELEMENT. The actual data type for the return value is the same as the actual data type supplied for the ANYELEMENT input argument. 
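The resolution rule can be modeled conceptually in plain Python. The names below are invented for illustration; this is not Redshift's internal logic:

```python
def resolve_anyelement(*args):
    # All ANYELEMENT arguments resolve to the type of the first argument;
    # mixing actual types is an error, as with f_equal(anyelement, anyelement).
    actual = type(args[0])
    if any(type(a) is not actual for a in args[1:]):
        raise TypeError("ANYELEMENT arguments must share one data type")
    return actual

def f_equal(a, b):
    resolve_anyelement(a, b)   # both arguments must have the same type
    return a == b

print(f_equal(3, 3))   # True
```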

# Python language support for UDFs
<a name="udf-python-language-support"></a>

You can create a custom UDF based on the Python programming language. The [Python 2.7 standard library](https://docs.python.org/2/library/index.html) is available for use in UDFs, with the exception of the following modules:
+ ScrolledText
+ Tix
+ Tkinter
+ tk
+ turtle
+ smtpd

In addition to the Python Standard Library, the following modules are part of the Amazon Redshift implementation:
+ [numpy 1.8.2](http://www.numpy.org/)
+ [pandas 0.14.1](https://pandas.pydata.org/)
+ [python-dateutil 2.2](https://dateutil.readthedocs.org/en/latest/)
+ [pytz 2014.7](https://pypi.org/project/pytz/2014.7/)
+ [scipy 0.12.1](https://www.scipy.org/)
+ [six 1.3.0](https://pypi.org/project/six/1.3.0/)
+ [wsgiref 0.1.2](https://pypi.python.org/pypi/wsgiref)

You can also import your own custom Python modules and make them available for use in UDFs by executing a [CREATE LIBRARY](r_CREATE_LIBRARY.md) command. For more information, see [Example: Importing custom Python library modules](udf-importing-custom-python-library-modules.md).

**Important**  
Amazon Redshift blocks all network access and write access to the file system through UDFs.

**Note**  
Python 3 isn’t available for Python UDFs. To get Python 3 support for Amazon Redshift UDFs, use [Scalar Lambda UDFs](udf-creating-a-lambda-sql-udf.md) instead.

# Example: Importing custom Python library modules
<a name="udf-importing-custom-python-library-modules"></a>

You define scalar functions using Python language syntax. You can use the Python Standard Library modules and Amazon Redshift preinstalled modules. You can also create your own custom Python library modules and import the libraries into your clusters, or use existing libraries from Python or third parties. 

You cannot create a library that contains a module with the same name as a Python Standard Library module or an Amazon Redshift preinstalled Python module. If an existing user-installed library uses the same Python package as a library you create, you must drop the existing library before installing the new library. 

You must be a superuser or have `USAGE ON LANGUAGE plpythonu` privilege to install custom libraries; however, any user with sufficient privileges to create functions can use the installed libraries. You can query the [PG\_LIBRARY](r_PG_LIBRARY.md) system catalog to view information about the libraries installed on your cluster.

## Importing a custom Python module into your cluster
<a name="udf-import-custom-python-module-procedure"></a>

This section provides an example of importing a custom Python module into your cluster. To perform the steps in this section, you must have an Amazon S3 bucket, where you upload the library package. You then install the package in your cluster. For more information about creating buckets, go to [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/CreatingaBucket.html) in the *Amazon Simple Storage Service User Guide*.

In this example, let's suppose that you create UDFs to work with positions and distances in your data. Connect to your Amazon Redshift cluster from a SQL client tool, and run the following commands to create the functions. 

```
CREATE FUNCTION f_distance (x1 float, y1 float, x2 float, y2 float) RETURNS float IMMUTABLE as $$
    def distance(x1, y1, x2, y2):
        import math
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
 
    return distance(x1, y1, x2, y2)
$$ LANGUAGE plpythonu;
 
CREATE FUNCTION f_within_range (x1 float, y1 float, x2 float, y2 float) RETURNS bool IMMUTABLE as $$ 
    def distance(x1, y1, x2, y2):
        import math
        return math.sqrt((y2 - y1) ** 2 + (x2 - x1) ** 2)
 
    return distance(x1, y1, x2, y2) < 20
$$ LANGUAGE plpythonu;
```

Note that a few lines of code are duplicated in the previous functions. This duplication is necessary because a UDF cannot reference the contents of another UDF, and both functions require the same functionality. However, instead of duplicating code in multiple functions, you can create a custom library and configure your functions to use it. 

To do so, first create the library package by following these steps: 

1. Create a folder named **geometry**. This folder is the top level package of the library.

1. In the **geometry** folder, create a file named `__init__.py`. Note that the file name contains two double underscore characters. This file indicates to Python that the package can be initialized.

1. Also in the **geometry** folder, create a folder named **trig**. This folder is the subpackage of the library.

1. In the **trig** folder, create another file named `__init__.py` and a file named `line.py`. In this folder, `__init__.py` indicates to Python that the subpackage can be initialized and that `line.py` is the file that contains library code.

   Your folder and file structure should be the same as the following: 

   ```
   geometry/
      __init__.py
      trig/
         __init__.py
         line.py
   ```

    For more information about package structure, go to [ Modules](https://docs.python.org/2/tutorial/modules.html) in the Python tutorial on the Python website. 

1.  The following code contains a class and member functions for the library. Copy and paste it into `line.py`. 

   ```
   class LineSegment:
     def __init__(self, x1, y1, x2, y2):
       self.x1 = x1
       self.y1 = y1
       self.x2 = x2
       self.y2 = y2
     def angle(self):
       import math
       return math.atan2(self.y2 - self.y1, self.x2 - self.x1)
     def distance(self):
       import math
       return math.sqrt((self.y2 - self.y1) ** 2 + (self.x2 - self.x1) ** 2)
   ```
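Before packaging the module, you can sanity-check the class with plain Python. The class is repeated here so the snippet runs standalone:

```python
import math

# Copy of the LineSegment class from line.py, repeated so this check is self-contained.
class LineSegment:
    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
    def angle(self):
        return math.atan2(self.y2 - self.y1, self.x2 - self.x1)
    def distance(self):
        return math.sqrt((self.y2 - self.y1) ** 2 + (self.x2 - self.x1) ** 2)

seg = LineSegment(0, 0, 3, 4)   # a 3-4-5 right triangle
print(seg.distance())           # 5.0
```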

 After you have created the package, do the following to prepare the package and upload it to Amazon S3. 

1. Compress the contents of the **geometry** folder into a .zip file named **geometry.zip**. Do not include the **geometry** folder itself; only include the contents of the folder as shown following: 

   ```
   geometry.zip
      __init__.py
      trig/
         __init__.py
         line.py
   ```

1. Upload **geometry.zip** to your Amazon S3 bucket.
**Important**  
 If the Amazon S3 bucket does not reside in the same region as your Amazon Redshift cluster, you must use the REGION option to specify the region in which the data is located. For more information, see [CREATE LIBRARY](r_CREATE_LIBRARY.md).

1.  From your SQL client tool, run the following command to install the library. Replace *<bucket\_name>* with the name of your bucket, and replace *<access key id>* and *<secret key>* with an access key and secret access key from your AWS Identity and Access Management (IAM) user credentials. 

   ```
   CREATE LIBRARY geometry LANGUAGE plpythonu FROM 's3://<bucket_name>/geometry.zip' CREDENTIALS 'aws_access_key_id=<access key id>;aws_secret_access_key=<secret key>';
   ```
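The compression step described above can also be scripted. The following sketch uses Python's standard `zipfile` module to archive the *contents* of the package folder without the top-level folder itself; the helper name and scratch paths are illustrative, not part of any Redshift tooling:

```python
import os
import tempfile
import zipfile

def zip_package(src_dir, zip_path):
    # Archive every file relative to src_dir, so the top folder is omitted.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))

# Recreate the tutorial's folder structure in a scratch directory and zip it.
base = tempfile.mkdtemp()
pkg = os.path.join(base, "geometry")
os.makedirs(os.path.join(pkg, "trig"))
for rel in ("__init__.py", os.path.join("trig", "__init__.py"), os.path.join("trig", "line.py")):
    open(os.path.join(pkg, rel), "w").close()

zip_package(pkg, os.path.join(base, "geometry.zip"))
with zipfile.ZipFile(os.path.join(base, "geometry.zip")) as zf:
    print(sorted(zf.namelist()))   # ['__init__.py', 'trig/__init__.py', 'trig/line.py']
```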

 After you install the library in your cluster, you need to configure your functions to use the library. To do this, run the following commands. 

```
CREATE OR REPLACE FUNCTION f_distance (x1 float, y1 float, x2 float, y2 float) RETURNS float IMMUTABLE as $$ 
    from trig.line import LineSegment
 
    return LineSegment(x1, y1, x2, y2).distance()
$$ LANGUAGE plpythonu;
 
CREATE OR REPLACE FUNCTION f_within_range (x1 float, y1 float, x2 float, y2 float) RETURNS bool IMMUTABLE as $$ 
    from trig.line import LineSegment
 
    return LineSegment(x1, y1, x2, y2).distance() < 20
$$ LANGUAGE plpythonu;
```

In the preceding commands, the `from trig.line import LineSegment` statement eliminates the duplicated code from the original functions in this section. You can reuse the functionality provided by this library in multiple UDFs. Note that to import the module, you only need to specify the path to the subpackage and module name (`trig.line`). 

# Python UDF constraints
<a name="udf-constraints"></a>

Within the constraints listed in this topic, you can use UDFs anywhere you use the Amazon Redshift built-in scalar functions. For more information, see [SQL functions reference](c_SQL_functions.md).

Amazon Redshift Python UDFs have the following constraints:
+ Python UDFs cannot access the network or read or write to the file system.
+ The total size of user-installed Python libraries cannot exceed 100 MB.
+ Amazon Redshift can only run one Python UDF at a time for provisioned clusters using automatic workload management (WLM) and for serverless workgroups. If you try to run more than one UDF concurrently, Amazon Redshift queues the remaining Python UDFs to run in the workload management queues. SQL UDFs don’t have a concurrency limit when using automatic WLM. 
+  When using manual WLM for provisioned clusters, the number of Python UDFs that can run concurrently per cluster is limited to one-fourth of the cluster’s total concurrency level. For example, a provisioned cluster with a concurrency of 15 can run a maximum of three concurrent Python UDFs. 
+ When using Python UDFs, Amazon Redshift doesn't support the SUPER and HLLSKETCH data types.

# Logging errors and warnings in Python UDFs
<a name="udf-logging-messages"></a>

You can use the Python logging module to create user-defined error and warning messages in your UDFs. Following query execution, you can query the [SVL\_UDF\_LOG](r_SVL_UDF_LOG.md) system view to retrieve logged messages.

**Note**  
UDF logging consumes cluster resources and might affect system performance. We recommend implementing logging only for development and troubleshooting. 

During query execution, the log handler writes messages to the SVL\_UDF\_LOG system view, along with the corresponding function name, node, and slice. The log handler writes one row to SVL\_UDF\_LOG per message, per slice. Messages are truncated to 4096 bytes. The UDF log is limited to 500 rows per slice. When the log is full, the log handler discards older messages and adds a warning message to SVL\_UDF\_LOG.

**Note**  
The Amazon Redshift UDF log handler escapes newlines ( `\n` ), pipe ( `|` ) characters, and backslash ( `\` ) characters with a backslash ( `\` ) character.

By default, the UDF log level is set to WARNING. Messages with a log level of WARNING, ERROR, or CRITICAL are logged. Messages with the lower severities INFO, DEBUG, and NOTSET are ignored. To set the UDF log level, use the Python logger object. For example, the following sets the log level to INFO.

```
logger.setLevel(logging.INFO)
```

For more information about using the Python logging module, see [Logging facility for Python](https://docs.python.org/2.7/library/logging.html) in the Python documentation.
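The level-filtering behavior is standard Python logging and can be observed outside Redshift. In this standalone sketch, a custom handler captures messages into a list (the handler and logger names are invented for illustration):

```python
import logging

records = []

class ListHandler(logging.Handler):
    # Capture each formatted message into the records list.
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger("udf_demo")
logger.addHandler(ListHandler())
logger.setLevel(logging.WARNING)    # the default UDF log level

logger.info("ignored at WARNING")   # below the threshold, not recorded
logger.warning("recorded")

logger.setLevel(logging.INFO)       # lower the threshold, as in the example above
logger.info("now recorded")

print(records)                      # ['recorded', 'now recorded']
```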

The following example creates a function named `f_pyerror` that imports the Python logging module, instantiates the logger, and logs an informational message.

```
CREATE OR REPLACE FUNCTION f_pyerror() 
RETURNS INTEGER
VOLATILE AS
$$
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.info('Your info message here') 
return 0
$$ language plpythonu;
```

The following example queries SVL\_UDF\_LOG to view the message logged in the previous example.

```
select funcname, node, slice, trim(message) as message 
from svl_udf_log;

  funcname | node | slice |        message
-----------+------+-------+------------------------
 f_pyerror |    1 |     1 | Your info message here
```

# Scalar Lambda UDFs
<a name="udf-creating-a-lambda-sql-udf"></a>

Amazon Redshift can use custom functions defined in AWS Lambda as part of SQL queries. You can write scalar Lambda UDFs in any of the programming languages supported by Lambda, such as Java, Go, PowerShell, Node.js, C#, Python, and Ruby. Or you can use a custom runtime.

The [CREATE EXTERNAL FUNCTION](r_CREATE_EXTERNAL_FUNCTION.md) command defines the following parameters:
+ (Optional) A list of arguments with data type. 
+ One return data type.
+ One function name of the external function that is called by Amazon Redshift. 
+ One IAM role that the Amazon Redshift cluster is authorized to assume in order to call Lambda.
+ One Lambda function name that the Lambda UDF invokes.

For information about CREATE EXTERNAL FUNCTION, see [CREATE EXTERNAL FUNCTION](r_CREATE_EXTERNAL_FUNCTION.md).

The input and return data types for Lambda UDFs can be any of the following types:
+  SMALLINT 
+  INTEGER 
+  BIGINT 
+  DECIMAL 
+  REAL 
+  DOUBLE PRECISION 
+  CHAR 
+  VARCHAR 
+  BOOLEAN 
+  DATE 
+  TIMESTAMP 

The aliases for these types are also valid. For a full list of data types and their aliases, see [Data types](c_Supported_data_types.md).

Amazon Redshift ensures that the external function can send and receive batched arguments and results. 

Lambda UDFs are defined and managed in Lambda, and you can control the access privileges to invoke these UDFs in Amazon Redshift. You can invoke multiple Lambda functions in the same query or invoke the same function multiple times.

Use Lambda UDFs in any clauses of the SQL statements where scalar functions are supported. You can also use Lambda UDFs in any SQL statement such as SELECT, UPDATE, INSERT, or DELETE. 

**Note**  
Using Lambda UDFs can incur additional charges from the Lambda service. Whether it does so depends on factors such as the number of Lambda requests (UDF invocations) and the total duration of the Lambda program execution. However, there is no additional charge to use Lambda UDFs in Amazon Redshift. For information about AWS Lambda pricing, see [AWS Lambda Pricing](https://aws.amazon.com/lambda/pricing).  
The number of Lambda requests varies depending on the specific SQL statement clause where the Lambda UDF is used. For example, suppose the function is used in WHERE clauses such as the following.  
`SELECT a, b FROM t1 WHERE lambda_multiply(a, b) = 64;`  
`SELECT a, b FROM t1 WHERE a*b = lambda_multiply(2, 32);`  
In this case, Amazon Redshift invokes the Lambda function for each row evaluated in the first statement, but only once in the second.  
However, using a UDF in the projection part of the query might only invoke the Lambda function once for every qualified or aggregated row in the result set. 

## UDF security and permissions
<a name="udf-lambda-security"></a>

To create a Lambda UDF, make sure that you have USAGE privilege on the language EXFUNC. You must explicitly grant or revoke USAGE ON LANGUAGE EXFUNC for specific users, groups, or PUBLIC.

The following example grants usage on EXFUNC to PUBLIC.

```
grant usage on language exfunc to PUBLIC; 
```

The following example revokes usage on exfunc from PUBLIC and then grants usage to the user group `lambda_udf_devs`.

```
revoke usage on language exfunc from PUBLIC;
grant usage on language exfunc to group lambda_udf_devs;
```

To run a Lambda UDF, make sure that you have permission for each function called. By default, permission to run new Lambda UDFs is granted to PUBLIC. To restrict usage, revoke this permission from PUBLIC for the function. Then, grant the privilege to specific users or groups.

The following example revokes execution on the function `exfunc_sum` from PUBLIC. Then, it grants usage to the user group `lambda_udf_devs`.

```
revoke execute on function exfunc_sum(int, int) from PUBLIC;
grant execute on function exfunc_sum(int, int) to group lambda_udf_devs;
```

Superusers have all privileges by default.

For more information about granting and revoking privileges, see [GRANT](r_GRANT.md) and [REVOKE](r_REVOKE.md).

## Configuring the authorization parameter for Lambda UDFs
<a name="udf-lambda-authorization"></a>

The CREATE EXTERNAL FUNCTION command requires authorization to invoke Lambda functions in AWS Lambda. To provide authorization, specify an AWS Identity and Access Management (IAM) role when you run the CREATE EXTERNAL FUNCTION command. For more information about IAM roles, see [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) in the *IAM User Guide*.

If there is an existing IAM role with permissions to invoke Lambda functions attached to your cluster, you can substitute the role's Amazon Resource Name (ARN) in the IAM\_ROLE parameter of the command. The following sections describe the steps for using an IAM role in the CREATE EXTERNAL FUNCTION command.

### Creating an IAM role for Lambda
<a name="udf-lambda-create-iam"></a>

The IAM role requires permission to invoke Lambda functions. While creating the IAM role, provide the permission in one of the following ways:
+ Attach the `AWSLambdaRole` policy on the **Attach permissions policy** page while creating an IAM role. The `AWSLambdaRole` policy grants permission to invoke Lambda functions, which is the minimal requirement. For more information and other policies, see [Identity-based IAM policies for AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/access-control-identity-based.html) in the *AWS Lambda Developer Guide*.
+ Create your own custom policy to attach to your IAM role with the `lambda:InvokeFunction` permission on either all resources or a particular Lambda function with the ARN of that function. For more information on how to create a policy, see [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) in the *IAM User Guide*.

  The following example policy enables invoking Lambda on a particular Lambda function.


  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Sid": "Invoke",
              "Effect": "Allow",
              "Action": [
                  "lambda:InvokeFunction"
              ],
              "Resource": "arn:aws:lambda:us-west-2:123456789012:function:my-function"
          }
      ]
  }
  ```


  For more information on resources for Lambda functions, see [Resources and conditions for Lambda actions](https://docs.aws.amazon.com/lambda/latest/dg/lambda-api-permissions-ref.html) in the *AWS Lambda Developer Guide*.

  After creating your custom policy with the required permissions, you can attach your policy to the IAM role on the **Attach permissions policy** page while creating an IAM role.

For steps to create an IAM role, see [Authorizing Amazon Redshift to access other AWS services on your behalf](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html) in the *Amazon Redshift Management Guide*.

If you don't want to create a new IAM role, you can add the permissions mentioned previously to your existing IAM role.

### Associating an IAM role with the cluster
<a name="udf-lambda-associate-iam"></a>

Attach the IAM role to your cluster. You can add a role to a cluster or view the roles associated with a cluster by using the Amazon Redshift console, CLI, or API. For more information, see [Associating an IAM Role With a Cluster](https://docs.aws.amazon.com/redshift/latest/mgmt/copy-unload-iam-role.html) in the *Amazon Redshift Management Guide*.

### Including the IAM role in the command
<a name="udf-lambda-include-iam"></a>

Include the IAM role ARN in the CREATE EXTERNAL FUNCTION command. When you create an IAM role, IAM returns an Amazon Resource Name (ARN) for the role. To specify an IAM role, provide the role ARN with the `IAM_ROLE` parameter. The following shows the syntax for the `IAM_ROLE` parameter.

```
IAM_ROLE 'arn:aws:iam::aws-account-id:role/role-name'
```

To invoke Lambda functions that reside in other accounts within the same Region, see [Chaining IAM roles in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html#authorizing-redshift-service-chaining-roles) in the *Amazon Redshift Management Guide*.

## Using the JSON interface between Amazon Redshift and AWS Lambda
<a name="udf-lambda-json"></a>

Amazon Redshift uses a common JSON interface for all Lambda functions that it communicates with.

The following table lists the input fields that the designated Lambda function can expect in the JSON payload.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-lambda-sql-udf.html)

The order of the JSON array determines the order of batch processing. The Lambda function must process the arguments iteratively and produce exactly the same number of records. The following is an example of a payload.

```
{
  "request_id" : "23FF1F97-F28A-44AA-AB67-266ED976BF40",
  "cluster" : "arn:aws:redshift:xxxx",
  "user" : "adminuser",
  "database" : "db1",
  "external_function": "public.foo",
  "query_id" : 5678234,
  "num_records" : 4,
  "arguments" : [
     [ 1, 2 ],
     [ 3, null],
     null,
     [ 4, 6]
   ]
 }
```

The return output of the Lambda function contains the following fields.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-lambda-sql-udf.html)

The following is an example of the Lambda function output.

```
{
  "success": true,   // true indicates the call succeeded
  "error_msg" : "my function isn't working",  // shall only exist when success != true
  "num_records": 4,      // number of records in this payload
  "results" : [
     1,
     4,
     null,
     7
   ]
}
```
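As a sketch of this contract, the following is a minimal Lambda handler in Python that adds two integer arguments per record, mirroring the request and response payloads shown above. The addition logic is a hypothetical example, and the exact serialization of the response (a dictionary versus a JSON string) can depend on your Lambda runtime configuration.

```python
def lambda_handler(event, context):
    """Sketch of a Redshift Lambda UDF handler that adds two integer
    arguments per record. A null row or null argument yields a null result."""
    try:
        results = []
        for args in event["arguments"]:
            if args is None or any(a is None for a in args):
                results.append(None)
            else:
                results.append(args[0] + args[1])
        # The response must contain exactly num_records results,
        # in the same order as the input arguments.
        return {"success": True, "num_records": len(results), "results": results}
    except Exception as exc:
        return {"success": False, "error_msg": str(exc)}
```

Given the example payload above, this handler returns `{"success": true, "num_records": 4, "results": [3, null, null, 10]}`, preserving the one-to-one, in-order mapping between input rows and output results.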

When you call Lambda functions from SQL queries, Amazon Redshift secures the connection in the following ways:
+ Access is controlled with GRANT and REVOKE permissions. For more information about UDF security and permissions, see [UDF security and permissions](udf-security-and-privileges.md).
+ Amazon Redshift submits only the minimum set of data to the designated Lambda function.
+ Amazon Redshift calls the designated Lambda function only with the designated IAM role.

# Use case examples for user-defined functions (UDFs)
<a name="udf-example-uses"></a>

**Note**  
Starting November 1, 2025, Amazon Redshift will no longer support the creation of new Python UDFs. Existing Python UDFs will continue to function until June 30, 2026. Starting July 1, 2026, Amazon Redshift will no longer support Python UDFs. We recommend that you migrate your existing Python UDFs to Lambda UDFs before November 1, 2025. For information on creating and using Lambda UDFs, see [Scalar Lambda UDFs](udf-creating-a-lambda-sql-udf.md). For information on converting existing Python UDFs to Lambda UDFs, see the [ blog post ](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/).

You can use user-defined functions to solve business problems by integrating Amazon Redshift with other components. The following are some examples of how UDFs have been used:
+ [Accessing external components using Amazon Redshift Lambda UDFs](https://aws.amazon.com/blogs/big-data/accessing-external-components-using-amazon-redshift-lambda-udfs/) – describes how Amazon Redshift Lambda UDFs work and walks through creating a Lambda UDF.
+ [Translate and analyze text using SQL functions with Amazon Redshift, Amazon Translate, and Amazon Comprehend](https://aws.amazon.com/blogs/machine-learning/translate-and-analyze-text-using-sql-functions-with-amazon-redshift-amazon-translate-and-amazon-comprehend/) – provides prebuilt Amazon Redshift Lambda UDFs that you can install with a few clicks to translate, redact, and analyze text fields.
+ [Access Amazon Location Service from Amazon Redshift](https://aws.amazon.com/blogs/big-data/access-amazon-location-service-from-amazon-redshift/) – describes how to use Amazon Redshift Lambda UDFs to integrate with Amazon Location Service.
+ [Data Tokenization with Amazon Redshift and Protegrity](https://aws.amazon.com/blogs/apn/data-tokenization-with-amazon-redshift-and-protegrity/) – describes how to integrate Amazon Redshift Lambda UDFs with the Protegrity Serverless product.
+ [Amazon Redshift UDFs](https://github.com/aws-samples/amazon-redshift-udfs) – a collection of Amazon Redshift SQL, Lambda, and Python UDFs.