Return a semantic answer in Azure Cognitive Search

Important

Semantic search is in public preview, available through the preview REST API only. Preview features are offered as-is, under Supplemental Terms of Use, and are not guaranteed to have the same implementation at general availability. These features are billable. For more information, see Availability and pricing.

When formulating a semantic query, you can optionally extract content from the top-matching documents that "answers" the query directly. One or more answers can be included in the response, which you can then render on a search page to improve the user experience of your app.

This article explains how to request a semantic answer, how to unpack the response, and which content characteristics are most conducive to producing high-quality answers.

Prerequisites

All prerequisites that apply to semantic queries also apply to answers, including service tier and region.

  • Query logic must include the semantic query parameters, plus the "answers" parameter. Required parameters are discussed in this article.

  • Query strings entered by the user must be formulated in language that has the characteristics of a question (what, where, when, how).

  • Search documents must contain text that has the characteristics of an answer, and that text must exist in one of the fields listed in "searchFields". For example, given the query "what is a hash table", if none of the searchFields contain passages that include "A hash table is ...", then an answer is unlikely to be returned.

What is a semantic answer?

A semantic answer is a substructure of a semantic query response. It consists of one or more verbatim passages from a search document, formulated as an answer to a query that looks like a question. For an answer to be returned, a search document must contain phrases or sentences that have the language characteristics of an answer, and the query itself must be posed as a question.

Cognitive Search uses a machine reading comprehension model to pick the best answer. The model produces a set of potential answers from the available content, and when it reaches a high enough confidence level, it proposes an answer.

Answers are returned as an independent, top-level object in the query response payload that you can choose to render on search pages, alongside search results. Structurally, each answer is an array element within the response, consisting of text, a document key, and a confidence score.

How to request semantic answers in a query

To return a semantic answer, the query must include the semantic parameters "queryType", "queryLanguage", and "searchFields", plus the "answers" parameter. Specifying the "answers" parameter does not guarantee that you will get an answer, but the request must include this parameter if answer processing is to be invoked at all.

The "searchFields" parameter is crucial to returning a high-quality answer, both in terms of content and order (see below).

{
    "search": "how do clouds form",
    "queryType": "semantic",
    "queryLanguage": "en-us",
    "searchFields": "title,locations,content",
    "answers": "extractive|count-3",
    "count": "true"
}
  • A query string must not be null and should be formulated as a question. In this preview, "queryType" and "queryLanguage" must be set exactly as shown in the example.

  • The "searchFields" parameter determines which string fields provide tokens to the extraction model. The same fields that produce captions also produce answers. For precise guidance on how to set this field so that it works for both captions and answers, see Set searchFields.

  • For "answers", the basic parameter construction is "answers": "extractive", where the default number of answers returned is one. You can increase the number of answers by adding a count, as shown in the example above, up to a maximum of five. Whether you need more than one answer depends on the user experience of your app and how you want to render results.
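The parameter rules above can be sketched as a small helper that assembles the request body. This is a minimal illustration, not part of the service or any SDK: the function name `build_semantic_query` and its arguments are hypothetical, and the only facts it encodes are the ones stated above (the fixed "queryType" and "queryLanguage" values, the "extractive|count-N" syntax, and the maximum of five answers).

```python
import json

MAX_ANSWERS = 5  # upper bound on answers, as stated in the text above


def build_semantic_query(search_text, search_fields, answer_count=1):
    """Assemble a request body for a semantic query that asks for answers.

    Hypothetical helper for illustration only.
    """
    # A null or match-all query cannot produce an answer.
    if not search_text or search_text.strip() in ("", "*"):
        raise ValueError("query string must not be empty; phrase it as a question")
    if not 1 <= answer_count <= MAX_ANSWERS:
        raise ValueError(f"answer_count must be between 1 and {MAX_ANSWERS}")

    # "extractive" alone returns one answer; append a count to ask for more.
    answers = "extractive" if answer_count == 1 else f"extractive|count-{answer_count}"

    return {
        "search": search_text,
        "queryType": "semantic",      # must be set exactly like this in the preview
        "queryLanguage": "en-us",     # must be set exactly like this in the preview
        "searchFields": ",".join(search_fields),  # order of fields is preserved
        "answers": answers,
    }


if __name__ == "__main__":
    body = build_semantic_query(
        "how do clouds form", ["title", "locations", "content"], answer_count=3
    )
    print(json.dumps(body, indent=4))
```

Validating the count on the client side is a design choice of this sketch; it simply surfaces the documented limit before the request is sent.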

Deconstruct an answer from the response

Answers are provided in the @search.answers array, which appears first in the response. If an answer is indeterminate, the response shows "@search.answers": []. When designing a search results page that includes answers, be sure to handle cases where answers are not found.

Within @search.answers, the "key" is the document key or ID of the match. Given a document key, you can use the Lookup Document API to retrieve any or all parts of the search document to include on the search page or a detail page.
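As a rough sketch of that follow-up step, the helper below builds a Lookup Document request URL from an answer's key. The function name, the service name "my-service", the index name "my-index", and the api-version string used in the usage lines are all assumptions for illustration; substitute the values for your own service and the preview API version you are targeting.

```python
from urllib.parse import quote


def lookup_document_url(service_name, index_name, key, api_version, select=None):
    """Build a Lookup Document URL for the given document key.

    Hypothetical helper; it only formats the URL and sends nothing.
    """
    # Percent-encode the index name and key so unusual characters stay valid.
    base = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{quote(index_name, safe='')}"
        f"/docs/{quote(str(key), safe='')}"
    )
    url = f"{base}?api-version={api_version}"
    if select:
        # Limit the returned fields to the ones the page actually renders.
        url += "&$select=" + quote(",".join(select), safe=",")
    return url


if __name__ == "__main__":
    # Placeholder service, index, and api-version values (assumptions).
    print(lookup_document_url("my-service", "my-index", "4123",
                              "2020-06-30-Preview", select=["title", "content"]))
```

The request itself still needs an api-key header, which is left out here because the sketch only covers URL construction.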

"text" and "highlights" provide identical content, the first as plain text and the second with highlight markup. By default, highlights are styled as <em>, which you can override using the existing highlightPreTag and highlightPostTag parameters. As noted elsewhere, the substance of an answer is verbatim content from a search document. The extraction model looks for characteristics of an answer to find the appropriate content, but does not compose new language in the response.

The "score" is a confidence score that reflects the strength of the answer. If there are multiple answers in the response, this score determines their order. Top answers and top captions can be derived from different search documents, where the top answer originates from one document and the top caption from another, but in general you will see the same documents in the top positions within each array.

Answers are followed by the "value" array, which always includes scores, captions, and any fields that are retrievable by default. If you specified the select parameter, the "value" array is limited to the fields that you specified. For more information about items in the response, see Create a semantic query.

Given the query "how do clouds form", the following answer is returned in the response:

{
    "@search.answers": [
        {
            "key": "4123",
            "text": "Sunlight heats the land all day, warming that moist air and causing it to rise high into the atmosphere until it cools and condenses into water droplets. Clouds generally form where air is ascending (over land in this case), but not where it is descending (over the river).",
            "highlights": "Sunlight heats the land all day, warming that moist air and causing it to rise high into the atmosphere until it cools and condenses into water droplets. Clouds generally form<em> where air is ascending</em> (over land in this case), but not where it is<em> descending</em> (over the river).",
            "score": 0.94639826
        }
    ],
    "value": [
        {
            "@search.score": 0.5479723,
            "@search.rerankerScore": 1.0321671911515296,
            "@search.captions": [
                {
                    "text": "Like all clouds, it forms when the air reaches its dew point—the temperature at which an air mass is cool enough for its water vapor to condense into liquid droplets. This false-color image shows valley fog, which is common in the Pacific Northwest of North America.",
                    "highlights": "Like all<em> clouds</em>, it<em> forms</em> when the air reaches its dew point—the temperature at which an air mass is cool enough for its water vapor to condense into liquid droplets. This false-color image shows valley<em> fog</em>, which is common in the Pacific Northwest of North America."
                }
            ],
            "title": "Earth Atmosphere",
            "content": "Fog is essentially a cloud lying on the ground. Like all clouds, it forms when the air reaches its dew point—the temperature at  \n\nwhich an air mass is cool enough for its water vapor to condense into liquid droplets.\n\nThis false-color image shows valley fog, which is common in the Pacific Northwest of North America. On clear winter nights, the \n\nground and overlying air cool off rapidly, especially at high elevations. Cold air is denser than warm air, and it sinks down into the \n\nvalleys. The moist air in the valleys gets chilled to its dew point, and fog forms. If undisturbed by winds, such fog may persist for \n\ndays. The Terra satellite captured this image of foggy valleys northeast of Vancouver in February 2010.\n\n\n",
            "locations": [
                "Pacific Northwest",
                "North America",
                "Vancouver"
            ]
        }
    ]
}
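One way to unpack a response shaped like the one above is sketched here. It assumes the response has already been parsed from JSON into a Python dict; the `SemanticAnswer` class and both function names are hypothetical, and the rendering format is only an example of handling the "no answer" case before touching the results.

```python
from dataclasses import dataclass


@dataclass
class SemanticAnswer:
    key: str          # document key of the match
    text: str         # verbatim passage, plain text
    highlights: str   # same passage with highlight tags
    score: float      # confidence score


def extract_answers(response):
    """Return answers from a parsed response, strongest first."""
    # A missing or empty array both mean that no answer was found.
    raw = response.get("@search.answers") or []
    answers = [
        SemanticAnswer(
            key=item["key"],
            text=item["text"],
            highlights=item.get("highlights", item["text"]),
            score=item["score"],
        )
        for item in raw
    ]
    return sorted(answers, key=lambda a: a.score, reverse=True)


def render_answer_box(response):
    """Return a display string for the top answer, or None if there is none."""
    answers = extract_answers(response)
    if not answers:
        return None  # caller falls back to showing search results only
    top = answers[0]
    return f"{top.highlights} (source document: {top.key}, confidence: {top.score:.2f})"
```

Returning None rather than an empty string lets the page decide explicitly whether to show an answer box at all.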

Tips for producing high-quality answers

For best results, return semantic answers on a document corpus that has the following characteristics:

  • "searchFields" must list fields that offer sufficient text in which an answer is likely to be found. Only verbatim text from a document can appear as an answer.

  • Query strings must not be null (search=*), and the string should have the characteristics of a question, as opposed to a keyword search (a sequential list of arbitrary terms or phrases). If the query string does not appear to be a question, answer processing is skipped, even if the request specifies "answers" as a query parameter.

  • Semantic extraction and summarization have limits on how many tokens per document can be analyzed in a timely fashion. In practical terms, if you have large documents that run into hundreds of pages, you should try to break the content up into smaller documents first.
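The last tip can be approached in many ways. The sketch below shows one simple option: splitting a long text on blank lines and packing paragraphs into chunks, each of which becomes its own search document. The 5,000-character default is an arbitrary assumption (the article does not state the actual token limit), and the function names and the "id", "title", and "content" field names are hypothetical.

```python
def split_into_chunks(text, max_chars=5000):
    """Pack paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so an answer passage is never split in two.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)  # current chunk is full, start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def to_search_documents(parent_key, title, text, max_chars=5000):
    """Turn one large document into several smaller ones with derived keys."""
    return [
        {"id": f"{parent_key}-{i}", "title": title, "content": chunk}
        for i, chunk in enumerate(split_into_chunks(text, max_chars))
    ]
```

Deriving each chunk's key from the parent key keeps it possible to trace an answer back to the original document.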

Next steps