在映射数据流中生成表达式Build expressions in mapping data flow

适用于: Azure 数据工厂 Azure Synapse Analytics

在映射数据流中,许多转换属性都作为表达式输入。In mapping data flow, many transformation properties are entered as expressions. 这些表达式由列值、参数、函数、运算符和文本组成,它们在运行时的计算结果为 Spark 数据类型。These expressions are composed of column values, parameters, functions, operators, and literals that evaluate to a Spark data type at run time. 映射数据流具有专用体验,旨在协助构建这些称为“表达式生成器”的表达式。Mapping data flows has a dedicated experience aimed to aid you in building these expressions called the Expression Builder. 通过 IntelliSense 代码完成功能来实现突出显示、语法检查和自动完成,表达式生成器可简化数据流的生成。Utilizing IntelliSense code completion for highlighting, syntax checking, and autocompleting, the expression builder is designed to make building data flows easy. 本文介绍如何使用表达式生成器有效地生成业务逻辑。This article explains how to use the expression builder to effectively build your business logic.

表达式生成器Expression Builder

打开表达式生成器Open Expression Builder

有多个打开表达式生成器的入口点。There are multiple entry points to opening the expression builder. 这些都取决于数据流转换的具体上下文。These are all dependent on the specific context of the data flow transformation. 最常见的用例是转换(如派生列聚合转换),用户使用数据流表达式语言创建或更新列。The most common use case is in transformations like derived column and aggregate where users create or update columns using the data flow expression language. 可以通过在列的列表上方选择“打开表达式生成器”来打开表达式生成器。The expression builder can be opened by selecting Open expression builder above the list of columns. 你还可以单击列上下文,然后直接打开该表达式的表达式生成器。You can also click on a column context and open the expression builder directly to that expression.

打开表达式生成器派生Open Expression Builder derive

在某些转换(如筛选器)中,单击蓝色表达式文本框将打开表达式生成器。In some transformations like filter, clicking on a blue expression text box will open the expression builder.

蓝色表达式框Blue expression box

在使用匹配或分组依据条件引用列时,表达式可以从列中提取值。When you reference columns in a matching or group-by condition, an expression can extract values from columns. 若要创建表达式,请选择“计算列”。To create an expression, select Computed column.

计算列选项Computed column option

如果表达式或文本值是有效的输入,请选择“添加动态内容”以生成计算结果为文本值的表达式。In cases where an expression or a literal value are valid inputs, select Add dynamic content to build an expression that evaluates to a literal value.

添加动态内容选项Add dynamic content option

表达式元素Expression elements

在映射数据流中,表达式可以由列值、参数、函数、局部变量、运算符和文本组成。In mapping data flows, expressions can be composed of column values, parameters, functions, local variables, operators, and literals. 这些表达式的计算结果必须为 Spark 数据类型,如字符串、布尔值或整数。These expressions must evaluate to a Spark data type such as string, boolean, or integer.

表达式元素Expression elements

函数Functions

映射数据流具有可用于表达式的内置函数和运算符。Mapping data flows has built-in functions and operators that can be used in expressions. 有关可用函数的列表,请参阅映射数据流语言参考For a list of available functions, see the mapping data flow language reference.

寻址数组索引Address array indexes

在处理返回数组类型的列或函数时,请使用括号 ([]) 访问特定元素。When dealing with columns or functions that return array types, use brackets ([]) to access a specific element. 如果索引不存在,表达式的计算结果为 NULL。If the index doesn't exist, the expression evaluates into NULL.

表达式生成器数组Expression Builder array

重要

在映射数据流中,数组是从 1 开始的,这意味着第一个元素由第一个索引引用。In mapping data flows, arrays are one-based meaning the first element is referenced by index one. 例如,myArray[1] 将访问名为“myArray”的数组的第一个元素。For example, myArray[1] will access the first element of an array called 'myArray'.

输入架构Input schema

如果数据流在其任何源中使用定义的架构,则你可以在许多表达式中按名称引用列。If your data flow uses a defined schema in any of its sources, you can reference a column by name in many expressions. 如果使用架构偏差,则可以使用 byName()byNames() 函数显式引用列,或使用列模式进行匹配。If you are utilizing schema drift, you can reference columns explicitly using the byName() or byNames() functions or match using column patterns.

包含特殊字符的列名称Column names with special characters

如果列名称包含特殊字符或空格,请用大括号将名称括起来,以便在表达式中引用它们。When you have column names that include special characters or spaces, surround the name with curly braces to reference them in an expression.

{[dbo].this_is my complex name$$$}

参数Parameters

参数是在运行时从管道传递到数据流的值。Parameters are values that are passed into a data flow at run time from a pipeline. 若要引用参数,请单击“表达式元素”视图中的参数,或者在其名称前面用美元符号引用该参数。To reference a parameter, either click on the parameter from the Expression elements view or reference it with a dollar sign in front of its name. 例如,通过 $parameter1 可以引用名为 parameter1 的参数。For example, a parameter called parameter1 would be referenced by $parameter1. 若要了解详细信息,请参阅将映射数据流参数化To learn more, see parameterizing mapping data flows.

缓存查找Cached lookup

通过缓存查找,你可以对缓存接收器的输出进行内联查找。A cached lookup allows you to do an inline lookup of the output of a cached sink. 每个接收器上可以使用两个函数:lookup()outputs()There are two functions available to use on each sink, lookup() and outputs(). 引用这些函数的语法为 cacheSinkName#functionName()The syntax to reference these functions is cacheSinkName#functionName(). 有关详细信息,请参阅缓存接收器For more information, see cache sinks.

lookup() 以参数的形式接收当前转换中的匹配列,并返回一个与缓存接收器中键列匹配的行相等的复杂列。lookup() takes in the matching columns in the current transformation as parameters and returns a complex column equal to the row matching the key columns in the cache sink. 返回的复杂列包含缓存接收器中映射的每个列的子列。The complex column returned contains a subcolumn for each column mapped in the cache sink. 例如,如果你有一个错误代码缓存接收器 errorCodeCache,该接收器一个具有匹配代码的键列和一个名为 Message 的列。For example, if you had an error code cache sink errorCodeCache that had a key column matching on the code and a column called Message. 调用 errorCodeCache#lookup(errorCode).Message 将返回与传入的代码对应的消息。Calling errorCodeCache#lookup(errorCode).Message would return the message corresponding with the code passed in.

outputs() 不接收任何参数,并以一组复杂列的形式返回整个缓存接收器。outputs() takes no parameters and returns the entire cache sink as an array of complex columns. 如果在接收器中指定了键列,则无法调用此项,并且仅在缓存接收器中存在少量行时才应使用此项。This can't be called if key columns are specified in the sink and should only be used if there is a small number of rows in the cache sink. 常见的用例是追加递增键的最大值。A common use case is appending the max value of an incrementing key. 如果缓存的单个聚合行 CacheMaxKey 包含列 MaxKey,则可以通过调用 CacheMaxKey#outputs()[1].MaxKey 引用第一个值。If a cached single aggregated row CacheMaxKey contains a column MaxKey, you can reference the first value by calling CacheMaxKey#outputs()[1].MaxKey.

缓存查找Cached lookup

局部变量Locals

如果要跨多个列共享逻辑或想要划分逻辑,可以在派生列中创建一个局部变量。If you are sharing logic across multiple columns or want to compartmentalize your logic, you can create a local within a derived column. 若要引用局部变量,请单击“表达式元素”视图中的局部变量,或者在其名称前面用冒号引用它。To reference a local, either click on the local from the Expression elements view or reference it with a colon in front of its name. 例如,通过 :local1 可引用名为 local1 的局部变量。For example, a local called local1 would be referenced by :local1. 派生列文档中详细了解局部变量。Learn more about locals in the derived column documentation.

预览表达式结果Preview expression results

如果启用了调试模式,则可以交互方式使用调试群集预览表达式的计算结果。If debug mode is switched on, you can interactively use the debug cluster to preview what your expression evaluates to. 选择数据预览旁边的“刷新”以更新数据预览的结果。Select Refresh next to data preview to update the results of the data preview. 你可以看到给定输入列的每一行的输出。You can see the output of each row given the input columns.

进行中的预览In-progress preview

字符串内插String interpolation

创建使用表达式元素的长字符串时,请使用字符串内插来轻松生成复杂的字符串逻辑。When creating long strings that use expression elements, use string interpolation to easily build up complex string logic. 当参数包含在查询字符串中时,字符串内插可避免大量使用字符串串联。String interpolation avoids extensive use of string concatenation when parameters are included in query strings. 使用双引号将文本字符串文本与表达式括起来。Use double quotation marks to enclose literal string text together with expressions. 可以包括表达式函数、列和参数。You can include expression functions, columns, and parameters. 若要使用表达式语法,请用大括号将其括起来,To use expression syntax, enclose it in curly braces,

字符串内插的一些示例:Some examples of string interpolation:

  • "My favorite movie is {iif(instr(title,', The')>0,"The {split(title,', The')[1]}",title)}"

  • "select * from {$tablename} where orderyear > {$year}"

  • "Total cost with sales tax is {round(totalcost * 1.08,2)}"

  • "{:playerName} is a {:playerRating} player"

备注

在 SQL 源查询中使用字符串内插语法时,查询字符串必须位于一行上,且不带“/n”。When using string interpolation syntax in SQL source queries, the query string must be on one single line, without '/n'.

注释表达式Commenting expressions

使用单行和多行注释语法将注释添加到表达式。Add comments to your expressions by using single-line and multiline comment syntax.

以下示例是有效的注释:The following examples are valid comments:

  • /* This is my comment */

  • /* This is a

  • multi-line comment */

如果将注释放在表达式的顶部,则该注释将显示在转换文本框中以记录转换表达式。If you put a comment at the top of your expression, it appears in the transformation text box to document your transformation expressions.

转换文本框中的注释Comment in the transformation text box

正则表达式Regular expressions

许多表达式语言函数使用正则表达式语法。Many expression language functions use regular expression syntax. 使用正则表达式函数时,表达式生成器会尝试将反斜杠 (\) 解释为转义字符序列。When you use regular expression functions, Expression Builder tries to interpret a backslash (\) as an escape character sequence. 在正则表达式中使用反斜杠时,请将整个正则表达式括在反引号 (`) 中,或使用双反斜杠。When you use backslashes in your regular expression, either enclose the entire regex in backticks (`) or use a double backslash.

使用反引号的示例:An example that uses backticks:

regex_replace('100 and 200', `(\d+)`, 'digits')

使用双斜杠的示例:An example that uses double slashes:

regex_replace('100 and 200', '(\\d+)', 'digits')

键盘快捷方式Keyboard shortcuts

下面是表达式生成器中可用的快捷方式列表。Below are a list of shortcuts available in the expression builder. 创建表达式时,大多数 intellisense 快捷方式都可用。Most intellisense shortcuts are available when creating expressions.

  • Ctrl+K Ctrl+C:注释整行。Ctrl+K Ctrl+C: Comment entire line.
  • Ctrl+K Ctrl+U:取消注释。Ctrl+K Ctrl+U: Uncomment.
  • F1:提供编辑器帮助命令。F1: Provide editor help commands.
  • Alt+向下键:向下移动当前行。Alt+Down arrow key: Move down current line.
  • Alt+向上键:向上移动当前行。Alt+Up arrow key: Move up current line.
  • Ctrl+空格键:显示上下文帮助。Ctrl+Spacebar: Show context help.

常用表达式Commonly used expressions

转换为日期或时间戳Convert to dates or timestamps

若要在时间戳输出中包含字符串文本,请在 toString() 中包含转换。To include string literals in your timestamp output, wrap your conversion in toString().

toString(toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss'), 'MM/dd /yyyy\'T\'HH:mm:ss')

若要将毫秒从纪元转换为日期或时间戳,请使用 toTimestamp(<number of milliseconds>)To convert milliseconds from epoch to a date or timestamp, use toTimestamp(<number of milliseconds>). 如果时间以秒为单位,则乘以 1,000。If time is coming in seconds, multiply by 1,000.

toTimestamp(1574127407*1000l)

上一个表达式末尾的“l”指示转换为长类型作为内联语法。The trailing "l" at the end of the previous expression signifies conversion to a long type as inline syntax.

查找自纪元或 Unix 时间起的时间Find time from epoch or Unix Time

toLong( currentTimestamp() - toTimestamp('1970-01-01 00:00:00.000', 'yyyy-MM-dd HH:mm:ss.SSS') ) * 1000ltoLong( currentTimestamp() - toTimestamp('1970-01-01 00:00:00.000', 'yyyy-MM-dd HH:mm:ss.SSS') ) * 1000l

后续步骤Next steps

开始生成数据转换表达式Begin building data transformation expressions