Hive UDFs
This article shows how to create a Hive UDF, register it in Spark, and use it in a Spark SQL query.
Here is a Hive UDF that takes a long as an argument and returns its hexadecimal representation. A null input yields an empty string.
// The package must match the fully qualified class name used at registration.
package com.ardentex.spark.hiveudf

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.LongWritable

// This UDF takes a long integer and converts it to a hexadecimal string.
class ToHex extends UDF {
  def evaluate(n: LongWritable): String = {
    // Option(n) guards against a null argument (a SQL NULL in the column).
    Option(n)
      .map { num =>
        // Use Scala string interpolation. It's the easiest way, and it's
        // type-safe, unlike String.format().
        f"0x${num.get}%x"
      }
      .getOrElse("")
  }
}
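To see what the formatting logic produces without a Hive or Hadoop dependency, the body of evaluate can be mirrored in a plain function. This is a minimal sketch, not part of the original project: the object name ToHexSketch and the helper toHex are hypothetical, and java.lang.Long stands in for LongWritable so that null can still be passed.

```scala
object ToHexSketch {
  // Hypothetical helper: same Option/f-interpolator logic as ToHex.evaluate,
  // with java.lang.Long replacing LongWritable (an assumption for illustration).
  def toHex(n: java.lang.Long): String =
    Option(n)
      .map(num => f"0x${num.longValue}%x")
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    println(toHex(255L)) // 0xff
    println(toHex(0L))   // 0x0
    println(toHex(-1L))  // 0xffffffffffffffff (two's complement of a 64-bit long)
    println(toHex(null).isEmpty) // true: null maps to the empty string
  }
}
```

Note that negative values are rendered as their 64-bit two's-complement bit pattern rather than with a minus sign.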
Register the function:
spark.sql("CREATE TEMPORARY FUNCTION to_hex AS 'com.ardentex.spark.hiveudf.ToHex'")
Use your function like any other registered function:
spark.sql("SELECT first_name, to_hex(code) as hex_code FROM people")
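The registration and query steps can be put together in one small program. The following is a sketch under several assumptions: the compiled ToHex class is already on the driver and executor classpath, the session is built with Hive support enabled, and the people view with its two sample rows is made up here purely for illustration. The object name HiveUdfDemo is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveUdfDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: Hive support is enabled so Spark can wrap classes that
    // extend org.apache.hadoop.hive.ql.exec.UDF.
    val spark = SparkSession.builder()
      .appName("hive-udf-demo")
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the `people` table.
    // `code` is a Long, so Spark passes it to evaluate as a LongWritable.
    Seq(("Ada", 255L), ("Grace", 4096L))
      .toDF("first_name", "code")
      .createOrReplaceTempView("people")

    // Assumption: the JAR containing ToHex is already on the classpath.
    spark.sql("CREATE TEMPORARY FUNCTION to_hex AS 'com.ardentex.spark.hiveudf.ToHex'")
    spark.sql("SELECT first_name, to_hex(code) AS hex_code FROM people").show()

    spark.stop()
  }
}
```

A temporary function lives only for the duration of the session, so the CREATE TEMPORARY FUNCTION statement has to run again in each new session.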
You can find more examples and compilable code in the Sample Hive UDF project.