CREATE FUNCTION (External)

Applies to: Databricks Runtime

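Creates a temporary or permanent external function. Temporary functions are scoped to the session, whereas permanent functions are created in the persistent catalog and are available to all sessions. The resources specified in the USING clause are made available to all executors the first time the function is executed.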

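In addition to the SQL interface, Spark lets you create custom user-defined scalar and aggregate functions using the Scala, Python, and Java APIs. For more information, see External user-defined scalar functions (UDFs) and User-defined aggregate functions (UDAFs).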

Note

You cannot create external functions in Unity Catalog.

Syntax

CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]
    function_name AS class_name [ resource_locations ]

Parameters

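OR REPLACE

If specified, the resources for the function are reloaded. This is mainly useful for picking up changes made to the implementation of the function. This parameter is mutually exclusive with IF NOT EXISTS; the two cannot be specified together.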
TEMPORARY

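Indicates the scope of the function being created. When TEMPORARY is specified, the created function is valid and visible only in the current session. No persistent entry is made in the catalog for this kind of function.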
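IF NOT EXISTS

If specified, the function is created only if it does not already exist. If the specified function already exists, the statement succeeds without throwing an error and leaves the existing function unchanged. This parameter is mutually exclusive with OR REPLACE; the two cannot be specified together.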
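
As a minimal sketch of how IF NOT EXISTS behaves, the statements below assume the `SimpleUdf` class and `/tmp/SimpleUdf.jar` described in the Examples section. The function name `simple_udf_once` is hypothetical and used only for illustration.

```sql
-- Assumes /tmp/SimpleUdf.jar contains the class `SimpleUdf` (see Examples).
-- `simple_udf_once` is a hypothetical function name.

-- First run: creates the permanent function.
> CREATE FUNCTION IF NOT EXISTS simple_udf_once AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Second run: the function already exists, so the statement succeeds
-- without raising an error.
> CREATE FUNCTION IF NOT EXISTS simple_udf_once AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Not valid: OR REPLACE and IF NOT EXISTS cannot be combined.
-- CREATE OR REPLACE FUNCTION IF NOT EXISTS simple_udf_once AS 'SimpleUdf';
```

Use OR REPLACE instead when the goal is to pick up a changed implementation rather than to skip creation.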
function_name

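A name for the function. The function name can optionally be qualified with a schema name.

Functions created in hive_metastore can contain only alphanumeric ASCII characters and underscores.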
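
The following sketch illustrates a schema-qualified function name. It assumes the `default` schema and the `SimpleUdf` JAR from the Examples section; the function name `simple_udf_qualified` is hypothetical.

```sql
-- Assumes /tmp/SimpleUdf.jar contains the class `SimpleUdf` (see Examples).
-- `simple_udf_qualified` is a hypothetical name that uses only
-- alphanumeric ASCII characters and underscores.
> CREATE FUNCTION default.simple_udf_qualified AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Invoke the function through its qualified name.
> SELECT default.simple_udf_qualified(1) AS function_return_value;
```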
class_name

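The name of the class that provides the implementation for the function to be created. The implementing class should extend one of the following base classes:
- UDF or UDAF in the org.apache.hadoop.hive.ql.exec package.
- AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in the org.apache.hadoop.hive.ql.udf.generic package.
- UserDefinedAggregateFunction in the org.apache.spark.sql.expressions package.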
resource_locations

The list of resources that contain the implementation of the function along with its dependencies.

Syntax: USING { { (JAR | FILE | ARCHIVE) resource_uri } , ... }
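
To illustrate the comma-separated form of the USING clause, here is a sketch that attaches two resources to one function. The function name `simple_udf_with_file` and the file `/tmp/simple_udf.properties` are hypothetical; only `/tmp/SimpleUdf.jar` comes from the Examples section.

```sql
-- Assumes /tmp/SimpleUdf.jar contains the class `SimpleUdf` (see Examples).
-- `/tmp/simple_udf.properties` is a hypothetical dependency file, listed
-- here only to show how multiple resources are separated by commas.
> CREATE FUNCTION simple_udf_with_file AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar',
          FILE '/tmp/simple_udf.properties';
```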

Examples

-- 1. Create a simple UDF `SimpleUdf` that increments the supplied integral value by 10.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--    public class SimpleUdf extends UDF {
--      public int evaluate(int value) {
--        return value + 10;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdf.jar` in /tmp.

-- Create a table called `test` and insert two rows.
> CREATE TABLE test(c1 INT);
> INSERT INTO test VALUES (1), (2);

-- Create a permanent function called `simple_udf`.
> CREATE FUNCTION simple_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the function is in the registry.
> SHOW USER FUNCTIONS;
           function
 ------------------
 default.simple_udf

-- Invoke the function. Every selected value should be incremented by 10.
> SELECT simple_udf(c1) AS function_return_value FROM test;
 function_return_value
 ---------------------
                    11
                    12

-- Create a temporary function.
> CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'
    USING JAR '/tmp/SimpleUdf.jar';

-- Verify that the newly created temporary function is in the registry.
-- The temporary function does not have a qualified
-- schema associated with it.
> SHOW USER FUNCTIONS;
           function
 ------------------
 default.simple_udf
    simple_temp_udf

-- 1. Modify `SimpleUdf`'s implementation to increment the supplied integral value by 20.
--    import org.apache.hadoop.hive.ql.exec.UDF;
--    public class SimpleUdfR extends UDF {
--      public int evaluate(int value) {
--        return value + 20;
--      }
--    }
-- 2. Compile and place it in a JAR file called `SimpleUdfR.jar` in /tmp.

-- Replace the implementation of `simple_udf`.
> CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR'
    USING JAR '/tmp/SimpleUdfR.jar';

-- Invoke the function. Every selected value should be incremented by 20.
> SELECT simple_udf(c1) AS function_return_value FROM test;
 function_return_value
 ---------------------
                    21
                    22

Last updated on 2025-10-20
