Add spell check to queries in Cognitive Search

Important

Spell correction is in public preview and is available only through the preview REST API. Preview features are offered as-is, under the Supplemental Terms of Use. During the initial preview launch, there is no charge for speller. For more information, see Availability and pricing.

You can improve recall by spell-correcting individual search query terms before they reach the search engine. The speller parameter is supported for all query types: simple, full, and the new semantic option, which is currently in public preview.

Prerequisites

  • An existing search index containing English content. Spell correction currently does not work with synonyms, so avoid using it on indexes that specify a synonym map in any field definition.

  • A search client for sending queries

    The search client must support preview REST APIs on the query request. You can use Postman, Visual Studio Code, or code that you've modified to make REST calls to the preview APIs.

  • A query request that uses spell correction, with "api-version=2020-06-30-Preview", "speller=lexicon", and "queryLanguage=en-us".

    The queryLanguage parameter is required for speller, and "en-us" is currently the only valid value.
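The request requirements above can be captured in a small helper so that speller is never sent without its required queryLanguage. This is a minimal sketch, not an official SDK: the function name build_spell_checked_query is hypothetical, and it only assembles the JSON body (api-version belongs in the request URL, not the body).

```python
SUPPORTED_QUERY_TYPES = ("simple", "full", "semantic")


def build_spell_checked_query(search: str, query_type: str = "simple", **extra) -> dict:
    """Return a request body with spell correction enabled (hypothetical helper).

    speller requires queryLanguage, and "en-us" is currently the only valid
    value, so both are always set together.
    """
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported queryType: {query_type!r}")
    body = {
        "search": search,
        "queryType": query_type,
        "speller": "lexicon",
        "queryLanguage": "en-us",
    }
    # Extra options such as select, count, or searchFields; these may override defaults.
    body.update(extra)
    # Reject an override that would pair speller with an unsupported language.
    if body["queryLanguage"] != "en-us":
        raise ValueError("speller currently requires queryLanguage='en-us'")
    return body


body = build_spell_checked_query(
    "famly acitvites",
    select="HotelId,HotelName,Description,Category,Tags",
    count=True,
)
print(body["speller"], body["queryLanguage"])  # lexicon en-us
```

Validating after applying the extra options means a caller cannot accidentally pass a different queryLanguage alongside speller.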

Note

The speller parameter is available on all tiers, in the same regions that provide semantic search. You do not need to sign up for access to this preview feature. For more information, see Availability and pricing.

The following example uses the built-in hotels-sample index to demonstrate spell correction on a simple free-form text query. Without spell correction, the query returns zero results. With correction, the query returns one result: Johnson's family-oriented resort.

POST https://[service name].search.azure.cn/indexes/hotels-sample-index/docs/search?api-version=2020-06-30-Preview
{
    "search": "famly acitvites",
    "speller": "lexicon",
    "queryLanguage": "en-us",
    "queryType": "simple",
    "select": "HotelId,HotelName,Description,Category,Tags",
    "count": true
}
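The same request can be issued from code using only the Python standard library. This sketch builds, but does not send, the POST request; the service name "my-service" and the key "<query-key>" are placeholders, and the function name make_search_request is hypothetical. It assumes the key is passed in the api-key header, as with other Cognitive Search REST calls.

```python
import json
import urllib.request

API_VERSION = "2020-06-30-Preview"


def make_search_request(service: str, index: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the preview Search Documents endpoint."""
    url = (
        f"https://{service}.search.azure.cn/indexes/{index}/docs/search"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


body = {
    "search": "famly acitvites",
    "speller": "lexicon",
    "queryLanguage": "en-us",
    "queryType": "simple",
    "select": "HotelId,HotelName,Description,Category,Tags",
    "count": True,
}
# "my-service" and "<query-key>" are placeholders; substitute your own values.
req = make_search_request("my-service", "hotels-sample-index", "<query-key>", body)

# Uncomment to send the request to a real service:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)
```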

Spell correction with full Lucene

Spell correction applies to individual query terms that undergo text analysis, which is why you can use the speller parameter with some Lucene queries but not others.

  • Incompatible query forms, which bypass text analysis: wildcard, regex, fuzzy
  • Compatible query forms: fielded search, proximity, term boosting
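To make the distinction concrete, the following sketch classifies a single full-Lucene clause by its surface syntax and reports whether speller would see it. It is a rough heuristic, not a Lucene parser: it ignores escapes, nested groups, and boolean operators, and the names classify_term and speller_applies are hypothetical.

```python
import re

# Query forms that bypass text analysis, so speller never sees their terms.
BYPASS = {"wildcard", "regex", "fuzzy"}


def classify_term(term: str) -> str:
    """Heuristically classify one full-Lucene clause (a sketch, not a parser)."""
    # Drop an optional "Field:" prefix; fielded search itself is compatible.
    m = re.match(r"^[A-Za-z_][A-Za-z0-9_]*:(.+)$", term)
    if m:
        term = m.group(1)
    if len(term) > 1 and term.startswith("/") and term.endswith("/"):
        return "regex"
    # A quoted phrase followed by ~N is a proximity search.
    if re.fullmatch(r'"[^"]*"~\d+', term):
        return "proximity"
    # A single unquoted term followed by ~ (optionally ~N) is a fuzzy search.
    if re.fullmatch(r'[^"\s~]+~\d*', term):
        return "fuzzy"
    if "*" in term or "?" in term:
        return "wildcard"
    if re.fullmatch(r".+\^\d+(\.\d+)?", term):
        return "boost"
    return "term"


def speller_applies(term: str) -> bool:
    return classify_term(term) not in BYPASS


for clause in ["Category:Suiite", "sui*", "/sui.e/", "suite~1", '"family resort"~3', "suite^2"]:
    print(clause, classify_term(clause), speller_applies(clause))
```

The ~ operator is the case that most often surprises: on a quoted phrase it means proximity (compatible), while on a single term it means fuzzy (incompatible).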

This example uses fielded search over the Category field, with full Lucene syntax and a misspelled query term. Because speller is included, the typo in "Suiite" is corrected and the query succeeds.

POST https://[service name].search.azure.cn/indexes/hotels-sample-index/docs/search?api-version=2020-06-30-Preview
{
    "search": "Category:(Resort and Spa) OR Category:Suiite",
    "queryType": "full",
    "speller": "lexicon",
    "queryLanguage": "en-us",
    "select": "Category",
    "count": true
}

Spell correction with semantic search

The following semantic query has a typo in every term except one ("great"). Spell correction is applied to the misspelled terms so that the query returns relevant results. To learn more, see Create a semantic query.

POST https://[service name].search.azure.cn/indexes/hotels-sample-index/docs/search?api-version=2020-06-30-Preview     
{
    "search": "hisotoric hotell wiht great restrant nad wiifi",
    "queryType": "semantic",
    "speller": "lexicon",
    "queryLanguage": "en-us",
    "searchFields": "HotelName,Tags,Description",
    "select": "HotelId,HotelName,Description,Category,Tags",
    "count": true
}

Language considerations

The queryLanguage parameter required for speller must be consistent with any language analyzers assigned to field definitions in the index schema.

  • queryLanguage determines which lexicons are used for spell check. It is also used as an input to the semantic ranking algorithm if you are using "queryType=semantic".

  • Language analyzers are used during indexing and query execution to find matching documents in the search index. An example of a field definition that uses a language analyzer is { "name": "Description", "type": "Edm.String", "analyzer": "en.microsoft" }.

For best results with speller, if queryLanguage is "en-us", any language analyzers must also be an English variant ("en.microsoft" or "en.lucene").

Note

Language-agnostic analyzers (such as keyword, simple, standard, stop, whitespace, or standardasciifolding.lucene) do not conflict with queryLanguage settings.
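Because the search engine does not enforce this consistency, you may want to check an index schema yourself before enabling speller. The sketch below flags fields whose analyzers are neither English variants nor one of the language-agnostic analyzers listed in the note. The function name fields_to_review and the abbreviated schema are hypothetical, and "standard.lucene" is included on the assumption that it is the full name of the "standard" analyzer mentioned above.

```python
# Analyzers from the note above that don't conflict with queryLanguage.
# "standard.lucene" is assumed to be the full name of "standard".
LANGUAGE_AGNOSTIC = {
    "keyword", "simple", "standard", "standard.lucene", "stop",
    "whitespace", "standardasciifolding.lucene",
}
ENGLISH_ANALYZERS = {"en.microsoft", "en.lucene"}
ANALYZER_KEYS = ("analyzer", "searchAnalyzer", "indexAnalyzer")


def fields_to_review(fields: list, query_language: str = "en-us") -> list:
    """Return names of fields whose analyzers may not match queryLanguage."""
    if query_language != "en-us":
        raise ValueError("speller currently supports only queryLanguage='en-us'")
    flagged = []
    for field in fields:
        for key in ANALYZER_KEYS:
            name = field.get(key)
            # No analyzer set means the default, which is language-agnostic.
            if name is None or name in LANGUAGE_AGNOSTIC or name in ENGLISH_ANALYZERS:
                continue
            # Other language analyzers (e.g. "fr.lucene") and custom analyzers
            # are flagged for manual review.
            flagged.append(field["name"])
            break
    return flagged


# Abbreviated, hypothetical field definitions.
schema_fields = [
    {"name": "HotelName", "type": "Edm.String"},
    {"name": "Description", "type": "Edm.String", "analyzer": "en.microsoft"},
    {"name": "Description_fr", "type": "Edm.String", "analyzer": "fr.lucene"},
    {"name": "Category", "type": "Edm.String", "analyzer": "keyword"},
]
print(fields_to_review(schema_fields))  # ['Description_fr']
```

Custom analyzers are flagged rather than accepted, since their language behavior cannot be inferred from the name alone.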

In a query request, the queryLanguage you set applies equally to speller, answers, and captions. You can't override it for individual parts.

While content in a search index can be composed in multiple languages, the query input is most likely in one. The search engine doesn't check queryLanguage, language analyzers, and the language of the content for compatibility, so be sure to scope queries accordingly to avoid producing incorrect results.
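One way to scope a spell-corrected query is to restrict searchFields to fields that use an English analyzer. This is a minimal sketch under stated assumptions: the function name scope_to_english_fields and the field list are hypothetical, and fields are treated as searchable unless marked otherwise.

```python
ENGLISH_ANALYZERS = {"en.microsoft", "en.lucene"}


def scope_to_english_fields(body: dict, fields: list) -> dict:
    """Return a copy of body whose searchFields lists only English-analyzed fields."""
    english = [
        f["name"]
        for f in fields
        # Assumes fields are searchable unless explicitly marked otherwise.
        if f.get("searchable", True) and f.get("analyzer") in ENGLISH_ANALYZERS
    ]
    if not english:
        raise ValueError("no searchable field uses an English analyzer")
    # Copy rather than mutate, so the caller's body is left unchanged.
    return {**body, "searchFields": ",".join(english)}


query = {"search": "hisotoric hotell", "speller": "lexicon", "queryLanguage": "en-us"}
# Hypothetical field definitions.
fields = [
    {"name": "HotelName", "analyzer": "en.lucene"},
    {"name": "Description", "analyzer": "en.microsoft"},
    {"name": "Description_fr", "analyzer": "fr.lucene"},
]
scoped = scope_to_english_fields(query, fields)
print(scoped["searchFields"])  # HotelName,Description
```

This strict version also excludes fields with language-agnostic analyzers; relax the filter if those fields hold English text you want to search.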

Next steps