Regular expression entity

A regular expression entity extracts an entity based on a regular expression pattern you provide.

A regular expression is best for raw utterance text. Matching ignores case and cultural variants. Regular expression matching is applied after spell-check alterations at the character level, not the token level. If the regular expression is too complex, for example because it uses many brackets, you can't add the expression to the model. The entity supports part, but not all, of the .NET Regex library.

The entity is a good fit when:

  • The data is consistently formatted, and any variation in the format is also consistent.
  • The regular expression does not need more than 2 levels of nesting.


Usage considerations

Regular expressions may match more than you expect. One example is matching number words such as one and two. The following regex matches the number one along with other numbers:

(plus )?(zero|one|two|three|four|five|six|seven|eight|nine)(\s+(zero|one|two|three|four|five|six|seven|eight|nine))*

This regex also matches any word that ends with one of these numbers, such as phone. To fix issues like this, make sure the regex takes word boundaries into account. The following regex adds word boundaries to this example:

\b(plus )?(zero|one|two|three|four|five|six|seven|eight|nine)(\s+(zero|one|two|three|four|five|six|seven|eight|nine))*\b
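The difference between the two patterns above can be checked locally. The following is a minimal sketch using Python's re module, which is not the .NET Regex library the service relies on, so treat it as an approximation of the matching behavior only. The helper name matched_text and the sample utterances are hypothetical, and re.IGNORECASE is used to imitate the case-insensitive matching described earlier.

```python
import re

# Shared alternation of number words, copied from the patterns above.
DIGITS = r"(zero|one|two|three|four|five|six|seven|eight|nine)"

# Pattern without word boundaries (matches inside longer words).
UNBOUNDED = re.compile(rf"(plus )?{DIGITS}(\s+{DIGITS})*", re.IGNORECASE)

# Pattern with \b word boundaries at both ends.
BOUNDED = re.compile(rf"\b(plus )?{DIGITS}(\s+{DIGITS})*\b", re.IGNORECASE)


def matched_text(pattern, utterance):
    """Return the first substring the pattern matches, or None (hypothetical helper)."""
    match = pattern.search(utterance)
    return match.group(0) if match else None


# "phone" ends with "one", so the unbounded pattern matches inside the word.
print(matched_text(UNBOUNDED, "my phone is broken"))
# With word boundaries, nothing in this utterance matches.
print(matched_text(BOUNDED, "my phone is broken"))
# A genuine sequence of number words still matches.
print(matched_text(BOUNDED, "call plus one two three"))
```

Running both patterns against a handful of realistic utterances before adding the entity is a cheap way to catch over-matching early.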

Example JSON

When kb[0-9]{6} is used as the regular expression entity definition, the following JSON response is an example of the regular expression entities returned for the query:

When was kb123456 published?

"entities": [
  {
    "entity": "kb123456",
    "type": "KB number",
    "startIndex": 9,
    "endIndex": 16
  }
]
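To make the index fields in this response concrete, here is a minimal sketch that reproduces the same structure locally with Python's re module. It is an illustration under stated assumptions, not the service's implementation: the function name extract_regex_entities is hypothetical, Python's regex engine stands in for the .NET one, and endIndex is treated as the inclusive index of the last matched character because that is what the example values (9 and 16 for an eight-character match) imply.

```python
import re


def extract_regex_entities(utterance, pattern, entity_type):
    """Build entity records shaped like the JSON above (hypothetical helper)."""
    entities = []
    # re.IGNORECASE imitates the case-insensitive matching described earlier.
    for match in re.finditer(pattern, utterance, flags=re.IGNORECASE):
        entities.append({
            "entity": match.group(0),
            "type": entity_type,
            "startIndex": match.start(),
            # match.end() is exclusive; subtract 1 for an inclusive end index.
            "endIndex": match.end() - 1,
        })
    return entities


result = extract_regex_entities(
    "When was kb123456 published?", r"kb[0-9]{6}", "KB number"
)
print(result)
```

The inclusive end index is the detail most likely to cause off-by-one errors when slicing the utterance: in Python, the matched text is utterance[startIndex:endIndex + 1].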

Next steps

Learn more about entities: