String operators

Kusto offers a variety of query operators for searching string data types. This article describes how string terms are indexed, lists the string query operators, and gives tips for optimizing performance.

Understanding string terms

Kusto indexes all columns, including columns of type string. Multiple indexes are built for such columns, depending on the actual data. These indexes aren't directly exposed, but they are used by queries with string operators that have has as part of their name, such as has, !has, hasprefix, and !hasprefix. The semantics of these operators are dictated by the way the column is encoded. Instead of doing a "plain" substring match, these operators match terms.

What is a term?

By default, each string value is broken into maximal sequences of ASCII alphanumeric characters, and each of those sequences is made into a term. For example, in the following string, the terms are Kusto, WilliamGates3rd, and the following substrings: ad67d136, c1db, 4f9f, 88ef, d94f3b6b0b5a.

Kusto:  ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;;WilliamGates3rd

Kusto builds a term index consisting of all terms that are four characters or longer, and this index is used by has, !has, and so on. If the query looks for a term that is shorter than four characters, or uses a contains operator, Kusto reverts to scanning the values in the column when it can't determine a match from the index. Scanning is much slower than looking up the term in the term index.
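The difference between term matching and substring matching can be tried on the sample string above. The following is a minimal sketch, not an authoritative test: the column names whole_term, partial_term, and substring are made up for illustration, and because print evaluates a literal rather than a stored column, no term index is involved here.

```kusto
// Sketch: term matching (has) versus substring matching (contains)
// on the sample string from this section. Column names are illustrative.
print s = "Kusto:  ad67d136-c1db-4f9f-88ef-d94f3b6b0b5a;;WilliamGates3rd"
| extend
    whole_term   = s has "c1db",        // "c1db" is one of the terms listed above
    partial_term = s has "Gates",       // "Gates" is only part of the term WilliamGates3rd
    substring    = s contains "Gates"   // contains ignores term boundaries
```

Under the semantics described in this section, whole_term and substring should be true, while partial_term should be false because Gates is not a complete term.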

Operators on strings

Note

The following abbreviations are used in the table below:

  • RHS = right-hand side of the expression
  • LHS = left-hand side of the expression

| Operator | Description | Case-sensitive | Example (yields true) |
|---|---|---|---|
| == | Equals | Yes | "aBc" == "aBc" |
| != | Not equals | Yes | "abc" != "ABC" |
| =~ | Equals | No | "abc" =~ "ABC" |
| !~ | Not equals | No | "aBc" !~ "xyz" |
| has | RHS is a whole term in LHS | No | "North America" has "america" |
| !has | RHS isn't a whole term in LHS | No | "North America" !has "amer" |
| has_cs | RHS is a whole term in LHS | Yes | "North America" has_cs "America" |
| !has_cs | RHS isn't a whole term in LHS | Yes | "North America" !has_cs "amer" |
| hasprefix | RHS is a term prefix in LHS | No | "North America" hasprefix "ame" |
| !hasprefix | RHS isn't a term prefix in LHS | No | "North America" !hasprefix "mer" |
| hasprefix_cs | RHS is a term prefix in LHS | Yes | "North America" hasprefix_cs "Ame" |
| !hasprefix_cs | RHS isn't a term prefix in LHS | Yes | "North America" !hasprefix_cs "CA" |
| hassuffix | RHS is a term suffix in LHS | No | "North America" hassuffix "ica" |
| !hassuffix | RHS isn't a term suffix in LHS | No | "North America" !hassuffix "americ" |
| hassuffix_cs | RHS is a term suffix in LHS | Yes | "North America" hassuffix_cs "ica" |
| !hassuffix_cs | RHS isn't a term suffix in LHS | Yes | "North America" !hassuffix_cs "icA" |
| contains | RHS occurs as a subsequence of LHS | No | "FabriKam" contains "BRik" |
| !contains | RHS doesn't occur in LHS | No | "Fabrikam" !contains "xyz" |
| contains_cs | RHS occurs as a subsequence of LHS | Yes | "FabriKam" contains_cs "Kam" |
| !contains_cs | RHS doesn't occur in LHS | Yes | "Fabrikam" !contains_cs "Kam" |
| startswith | RHS is an initial subsequence of LHS | No | "Fabrikam" startswith "fab" |
| !startswith | RHS isn't an initial subsequence of LHS | No | "Fabrikam" !startswith "kam" |
| startswith_cs | RHS is an initial subsequence of LHS | Yes | "Fabrikam" startswith_cs "Fab" |
| !startswith_cs | RHS isn't an initial subsequence of LHS | Yes | "Fabrikam" !startswith_cs "fab" |
| endswith | RHS is a closing subsequence of LHS | No | "Fabrikam" endswith "Kam" |
| !endswith | RHS isn't a closing subsequence of LHS | No | "Fabrikam" !endswith "brik" |
| endswith_cs | RHS is a closing subsequence of LHS | Yes | "Fabrikam" endswith_cs "kam" |
| !endswith_cs | RHS isn't a closing subsequence of LHS | Yes | "Fabrikam" !endswith_cs "brik" |
| matches regex | LHS contains a match for RHS | Yes | "Fabrikam" matches regex "b.*k" |
| in | Equals one of the elements | Yes | "abc" in ("123", "345", "abc") |
| !in | Doesn't equal any of the elements | Yes | "bca" !in ("123", "345", "abc") |
| in~ | Equals one of the elements | No | "abc" in~ ("123", "345", "ABC") |
| !in~ | Doesn't equal any of the elements | No | "bca" !in~ ("123", "345", "ABC") |
| has_any | Same as has, but matches if any of the elements is a whole term in LHS | No | "North America" has_any ("south", "north") |
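
A few of the rows above can be checked directly with the print operator. This is a small sketch that reuses examples from the table; the output column names (equals_ci, has_term, and so on) are invented here purely for readability.

```kusto
// Sketch: evaluate a handful of the table's examples in one row.
// Each expression is taken from the "Example (yields true)" column.
print
    equals_ci    = "abc" =~ "ABC",
    has_term     = "North America" has "america",
    not_has_part = "North America" !has "amer",
    prefix_ci    = "North America" hasprefix "ame",
    suffix_cs    = "North America" hassuffix_cs "ica",
    contains_ci  = "FabriKam" contains "BRik",
    regex_match  = "Fabrikam" matches regex "b.*k",
    in_ci        = "abc" in~ ("123", "345", "ABC"),
    any_term     = "North America" has_any ("south", "north")
```

Every column in the resulting row is expected to be true, matching the table.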

Tip

All operators containing has search on indexed terms of four or more characters, and not on substring matches. A term is created by breaking up the string into sequences of ASCII alphanumeric characters. See Understanding string terms.

Performance tips

For better performance, when two operators do the same task, use the case-sensitive one. For example:

  • Instead of =~, use ==
  • Instead of in~, use in
  • Instead of contains, use contains_cs
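
As an illustration of the first substitution, the sketch below filters on an exact, correctly cased value. It assumes a small inline table that stands in for a real EventLog table; the rows are made up, and only the continent column name comes from this article.

```kusto
// Hypothetical inline table standing in for a real EventLog table.
let EventLog = datatable(continent: string)
[
    "North America",
    "South America",
    "Europe"
];
EventLog
| where continent == "North America"      // case-sensitive: preferred when the casing is known
// | where continent =~ "north america"   // case-insensitive alternative, selects the same row here
| count
```

On a table this small the difference is not measurable; the point is only which operator to reach for when the casing of the stored values is known.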

For faster results, if you're testing for the presence of a symbol or alphanumeric word that is bounded by non-alphanumeric characters, or by the start or end of a field, use has or in. has works faster than contains, startswith, or endswith.

For example, the first of these queries will run faster:

EventLog | where continent has "North" | count;
EventLog | where continent contains "nor" | count