Query types and composition in Azure Cognitive Search

In Azure Cognitive Search, a query is a full specification of a round-trip operation. The request carries parameters that give the engine execution instructions, as well as parameters that shape the response coming back. An unspecified query (search=*), with no match criteria and null or default parameters, executes against all searchable fields as a full text search operation and returns an unscored result set in arbitrary order.

The following example is a representative query constructed in the REST API. It targets the hotels demo index and includes common parameters so that you can get an idea of what a query looks like.

{
    "queryType": "simple",
    "search": "+New York +restaurant",
    "searchFields": "Description, Address/City, Tags",
    "select": "HotelId, HotelName, Description, Rating, Address/City, Tags",
    "top": 10,
    "count": true,
    "orderby": "Rating desc"
}
  • queryType sets the parser, which is either the default simple query parser (optimal for full text search) or the full Lucene query parser, used for advanced query constructs such as regular expressions, proximity search, and fuzzy and wildcard search.

  • search provides the match criteria, usually whole terms or phrases, but often accompanied by boolean operators. Single standalone terms are term queries. Quote-enclosed multi-part queries are phrase queries. Search can be undefined, as in search=*, but with no criteria to match on, the result set is composed of arbitrarily selected documents.

  • searchFields constrains query execution to specific fields. Any field that is attributed as searchable in the index schema is a candidate for this parameter.

Responses are also shaped by the parameters you include in the query:

  • select specifies which fields to return in the response. Only fields marked as retrievable in the index can be used in a select statement.

  • top returns the specified number of best-matching documents. In this example, only 10 hits are returned. You can use top and skip (not shown) to page the results.

  • count tells you how many documents in the entire index match overall, which can be more than the number returned.

  • orderby is used if you want to sort results by a value, such as a rating or location. Otherwise, the default is to use the relevance score to rank results.

In Azure Cognitive Search, query execution is always against one index, authenticated using an api-key provided in the request. In REST, the index is named in the request URL and the api-key is passed in a request header.

How to run this query

Before writing any code, you can use query tools to learn the syntax and experiment with different parameters. The quickest approach is the built-in portal tool, Search Explorer.

If you followed the quickstart that creates the hotels demo index, you can paste this query string into the explorer's search bar to run your first query:

search=+"New York" +restaurant&searchFields=Description, Address/City, Tags&$select=HotelId, HotelName, Description, Rating, Address/City, Tags&$top=10&$orderby=Rating desc&$count=true
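Search Explorer accepts the query string as typed, but when the same parameters are sent from code in a GET request they need to be percent-encoded. The following is a minimal sketch using only the Python standard library; the helper name build_query_string is an illustrative assumption, not part of any Azure SDK.

```python
from urllib.parse import urlencode, parse_qs, quote


def build_query_string(params):
    """Percent-encode search parameters for use in a GET request URL."""
    # quote_via=quote encodes spaces as %20 and a literal '+' as %2B, so the
    # '+' (required term) operator is not mistaken for an encoded space.
    return urlencode(params, quote_via=quote)


# Parameters copied from the example query string above.
params = {
    "search": '+"New York" +restaurant',
    "searchFields": "Description, Address/City, Tags",
    "$select": "HotelId, HotelName, Description, Rating, Address/City, Tags",
    "$top": 10,
    "$orderby": "Rating desc",
    "$count": "true",
}

query_string = build_query_string(params)
print(query_string)

# Decoding the string recovers the original search expression.
print(parse_qs(query_string)["search"][0])
```

The encoded string can then be appended after a "?" (together with the api-version parameter) to the docs URL of the index.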

How query operations are enabled by the index

Index design and query design are tightly coupled in Azure Cognitive Search. An essential fact to know up front is that the index schema, with attributes on each field, determines the kind of query you can build.

Index attributes on a field set the allowed operations: whether a field is searchable in the index, retrievable in results, sortable, filterable, and so forth. In the example query string, $orderby=Rating desc only works because the Rating field is marked as sortable in the index schema.

[Screenshot: Index definition for the hotel sample]

The above screenshot is a partial list of index attributes for the hotels sample. You can view the entire index schema in the portal. For more information about index attributes, see Create Index REST API.
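The coupling between field attributes and query parameters can be sketched as a small pre-flight check. This is an illustration only: the SCHEMA dictionary below is a hypothetical excerpt with invented attribute values (not the actual hotels index definition), the function names are assumptions, and the parsing handles plain field names only, not expressions such as geo.distance().

```python
# Hypothetical excerpt of an index schema: field name -> attributes.
# The attribute values are invented for illustration; check your own index.
SCHEMA = {
    "HotelName": {"searchable": True, "retrievable": True, "sortable": True},
    "Description": {"searchable": True, "retrievable": True, "sortable": False},
    "Rating": {"searchable": False, "retrievable": True, "sortable": True},
}

# The field attribute that each query parameter depends on.
REQUIRED_ATTRIBUTE = {
    "searchFields": "searchable",
    "select": "retrievable",
    "orderby": "sortable",
}


def field_names(param, value):
    """Split a comma-separated parameter value into plain field names."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    if param == "orderby":
        # Drop a trailing "asc" or "desc" from each clause.
        names = [name.split()[0] for name in names]
    return names


def find_problems(query, schema):
    """List the parameters that reference a field lacking the needed attribute."""
    problems = []
    for param, attribute in REQUIRED_ATTRIBUTE.items():
        for name in field_names(param, query.get(param, "")):
            if not schema.get(name, {}).get(attribute, False):
                problems.append(f"{param}: {name} is not {attribute}")
    return problems


ok_query = {"searchFields": "Description", "select": "HotelName, Rating",
            "orderby": "Rating desc"}
bad_query = {"searchFields": "Rating", "orderby": "Description"}

print(find_problems(ok_query, SCHEMA))
print(find_problems(bad_query, SCHEMA))
```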

Note

Some query functionality is enabled index-wide rather than on a per-field basis. These capabilities include synonym maps, custom analyzers, suggester constructs (for autocomplete and suggested queries), and scoring logic for ranking results.

Elements of a query request

Queries are always directed at a single index. You cannot join indexes or create custom or temporary data structures as a query target.

Required elements on a query request include the following components:

  • Service endpoint and index documents collection, expressed as a URL containing fixed and user-defined components: https://<your-service-name>.search.azure.cn/indexes/<your-index-name>/docs
  • api-version (REST only) is necessary because more than one version of the API is available at all times.
  • api-key, either a query or admin api-key, authenticates the request to your service.
  • queryType, either simple or full, which can be omitted if you are using the built-in default simple syntax.
  • search or filter provides the match criteria, which can be unspecified if you want to perform an empty search. Both query types are discussed in terms of the simple parser, but even advanced queries require the search parameter for passing complex query expressions.

All other search parameters are optional. For the full list of attributes, see Create Index (REST). For a closer look at how parameters are used during processing, see How full-text search works in Azure Cognitive Search.
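The required elements listed above can be assembled into a POST request as follows. This is a sketch under stated assumptions: the function name build_search_request, the placeholder service name, index name, and key, and the api-version value "2020-06-30" are all illustrative, so substitute the values that apply to your service. No network call is made; the function only prepares the pieces.

```python
import json


def build_search_request(service_name, index_name, api_key, body, api_version):
    """Return (url, headers, payload) for a POST search request."""
    # Endpoint and documents collection, plus the api-version parameter.
    url = (
        f"https://{service_name}.search.azure.cn/indexes/{index_name}"
        f"/docs/search?api-version={api_version}"
    )
    # The api-key (query or admin) travels in a request header.
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return url, headers, json.dumps(body)


url, headers, payload = build_search_request(
    service_name="my-service",      # placeholder
    index_name="hotels",            # placeholder
    api_key="<your-query-key>",     # placeholder
    body={"search": "restaurant", "top": 10, "count": True},
    api_version="2020-06-30",       # assumed version; use one your service supports
)
print(url)
print(payload)
```

The three returned values can be passed to any HTTP client. Because queryType is omitted from the body, the default simple parser applies.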

Choose APIs and tools

The following table lists the APIs and tool-based approaches for submitting queries.

Methodology | Description
Search explorer (portal) | Provides a search bar and options for index and api-version selections. Results are returned as JSON documents. Recommended for exploration, testing, and validation.
Postman or other REST tools | Web testing tools are an excellent choice for formulating REST calls. The REST API supports every possible operation in Azure Cognitive Search. The related article explains how to set up an HTTP request header and body for sending requests to Azure Cognitive Search.
SearchIndexClient (.NET) | Client that can be used to query an Azure Cognitive Search index.
Search Documents (REST API) | GET or POST methods on an index, using query parameters for additional input.

Choose a parser: simple | full

Azure Cognitive Search sits on top of Apache Lucene and gives you a choice between two query parsers for handling typical and specialized queries. Requests using the simple parser are formulated in the simple query syntax, which is the default because of its speed and effectiveness in free-form text queries. This syntax supports a number of common search operators, including the AND, OR, NOT, phrase, suffix, and precedence operators.

The full Lucene query syntax, enabled when you add queryType=full to the request, exposes the widely adopted and expressive query language developed as part of Apache Lucene. Full syntax extends the simple syntax. Any query you write for the simple syntax runs under the full Lucene parser.

The following examples illustrate the point: the same query with different queryType settings yields different results. In the first query, the ^3 after historic is treated as part of the search term. The top-ranked result for this query is "Marquis Plaza & Suites", which has ocean in its description.

queryType=simple&search=ocean historic^3&searchFields=Description, Tags&$select=HotelId, HotelName, Tags, Description&$count=true

The same query using the full Lucene parser interprets ^3 as an in-field term booster. Switching parsers changes the rank, with results containing the term historic moving to the top.

queryType=full&search=ocean historic^3&searchFields=Description, Tags&$select=HotelId, HotelName, Tags, Description&$count=true
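To make the difference concrete, here is a deliberately simplified toy model of how the two settings treat the ^3 suffix. It is not the service's parser and ignores analysis, operators, and phrases; the function names are assumptions made for illustration.

```python
def split_boost(term, query_type):
    """Toy model: return (text, boost) for one whitespace-delimited term."""
    if query_type == "full" and "^" in term:
        text, _, boost = term.rpartition("^")
        try:
            # Under the full syntax, the suffix is read as a boost factor.
            return text, float(boost)
        except ValueError:
            pass  # not a number, fall through and keep the term as written
    # Under the simple syntax, "^3" stays part of the term text.
    return term, 1.0


def interpret(search, query_type):
    """Apply split_boost to each term of a search string."""
    return [split_boost(term, query_type) for term in search.split()]


print(interpret("ocean historic^3", "simple"))
print(interpret("ocean historic^3", "full"))
```

In this model only the full setting attaches a weight of 3 to historic, which corresponds to the ranking change described above.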

Types of queries

Azure Cognitive Search supports a broad range of query types.

Query type | Usage | Examples and more information
Free-form text search | Search parameter and either parser | Full text search scans for one or more terms in all searchable fields in your index, and works the way you would expect a search engine like Bing to work. The example in the introduction is full text search. By default, full text search undergoes text analysis using the standard Lucene analyzer, which lower-cases all terms and removes stop words like "the". You can override the default with non-English analyzers or specialized language-agnostic analyzers that modify text analysis. An example is keyword, which treats the entire contents of a field as a single token. This is useful for data like zip codes, IDs, and some product names.
Filtered search | OData filter expression and either parser | Filter queries evaluate a boolean expression over all filterable fields in an index. Unlike search, a filter query matches the exact contents of a field, including case-sensitivity on string fields. Another difference is that filter queries are expressed in OData syntax. See: Filter expression example.
Geo-search | Edm.GeographyPoint type on the field, filter expression, and either parser | Coordinates stored in a field of type Edm.GeographyPoint are used for "find near me" or map-based search controls. See: Geo-search example.
Range search | Filter expression and simple parser | In Azure Cognitive Search, range queries are built using the filter parameter. See: Range filter example.
Fielded search | Search parameter and full parser | Build a composite query expression targeting a single field. See: Fielded search example.
Fuzzy search | Search parameter and full parser | Matches on terms having a similar construction or spelling. See: Fuzzy search example.
Proximity search | Search parameter and full parser | Finds terms that are near each other in a document. See: Proximity search example.
Term boosting | Search parameter and full parser | Ranks a document higher if it contains the boosted term, relative to others that don't. See: Term boosting example.
Regular expression search | Search parameter and full parser | Matches based on the contents of a regular expression. See: Regular expression example.
Wildcard or prefix search | Search parameter and full parser | Matches based on a prefix and asterisk (*) or single character (?). See: Wildcard search example.
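As a rough illustration of the table, the following sketch pairs several query types with a request body. The search and filter expressions use standard Lucene and OData syntax, but the specific terms and values are invented, and the Location field (of type Edm.GeographyPoint) is an assumption that may not match your index.

```python
import json

# Illustrative request bodies, one per query type. Field names other than
# Rating and HotelName, and all search terms and values, are assumptions.
EXAMPLES = {
    "filtered": {"search": "*", "filter": "Rating ge 4"},
    "geo": {
        "search": "*",
        "filter": "geo.distance(Location, "
                  "geography'POINT(-122.131577 47.678581)') le 5",
    },
    "range": {"search": "*", "filter": "Rating ge 3 and Rating lt 5"},
    "fielded": {"queryType": "full", "search": "HotelName:hotel"},
    "fuzzy": {"queryType": "full", "search": "seatle~"},
    "proximity": {"queryType": "full", "search": '"hotel airport"~5'},
    "boosting": {"queryType": "full", "search": "ocean historic^3"},
    "regex": {"queryType": "full", "search": "/[mh]otel/"},
    "wildcard": {"queryType": "full", "search": "beach*"},
}


def uses_full_parser(body):
    """True when the body opts in to the full Lucene syntax."""
    return body.get("queryType") == "full"


for name, body in EXAMPLES.items():
    parser = "full" if uses_full_parser(body) else "simple (default)"
    print(f"{name:10} {parser:17} {json.dumps(body)}")
```

The filter-based examples omit queryType because the filter expression is OData and does not depend on the choice of search parser.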

Manage search results

Query results are streamed as JSON documents in the REST API, although if you use .NET APIs, serialization is built in. You can shape results by setting parameters on the query, selecting specific fields for the response.

Parameters on the query can be used to structure the result set in the following ways:

  • Limiting or batching the number of documents in the results (50 by default)
  • Selecting fields to include in the results
  • Setting a sort order
  • Adding hit highlights to draw attention to matching terms in the body of the search results

Tips for unexpected results

Occasionally, the substance and not the structure of results is unexpected. When query outcomes are not what you expect to see, you can try these query modifications to see if results improve:

  • Change searchMode=any (default) to searchMode=all to require matches on all criteria instead of any of the criteria. This is especially true when boolean operators are included in the query.

  • Change the query technique if text or lexical analysis is necessary, but the query type precludes linguistic processing. In full text search, text or lexical analysis autocorrects for spelling errors, singular-plural word forms, and even irregular verbs or nouns. For some queries such as fuzzy or wildcard search, lexical analysis is not part of the query parsing pipeline. For some scenarios, regular expressions have been used as a workaround.
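The effect of the searchMode setting described in the first tip can be pictured with a toy matcher over sets of terms. This is a conceptual sketch only: the real setting also interacts with operators such as NOT and with text analysis, neither of which is modeled here, and the function name matches is an assumption.

```python
def matches(document_terms, query_terms, search_mode="any"):
    """Toy model: does a document satisfy the query under the given mode?"""
    hits = [term in document_terms for term in query_terms]
    # "all" requires every term to match; "any" requires at least one.
    return all(hits) if search_mode == "all" else any(hits)


document = {"historic", "hotel", "downtown"}
query = ["ocean", "historic"]

print(matches(document, query, "any"))
print(matches(document, query, "all"))
```

Here the document matches under any because it contains historic, but not under all because it lacks ocean, which is why switching to all tends to shrink the result set.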

Paging results

Azure Cognitive Search makes it easy to page search results. With the top and skip parameters, you can issue search requests that return the total set of search results in manageable, ordered subsets, which supports good search UI practices. When receiving these smaller subsets, you can also receive the count of documents in the total set of search results.

You can learn more about paging search results in the article How to page search results in Azure Cognitive Search.
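The relationship between a page number and the top and skip parameters is simple arithmetic. The helper below is a minimal sketch; the name page_params and the 1-based page numbering are assumptions made for illustration.

```python
def page_params(page_number, page_size):
    """Return the top, skip, and count parameters for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    return {
        "top": page_size,                        # documents per page
        "skip": (page_number - 1) * page_size,   # documents to pass over
        "count": True,                           # also request the total match count
    }


# Third page of ten results: skip the first twenty documents.
print(page_params(3, 10))
```

The returned dictionary can be merged into a request body alongside search and the other parameters.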

Ordering results

When receiving results for a search query, you can request that Azure Cognitive Search serve the results ordered by values in a specific field. By default, Azure Cognitive Search orders the search results based on the rank of each document's search score, which is derived from TF-IDF.

If you want Azure Cognitive Search to return your results ordered by a value other than the search score, you can use the orderby search parameter. You can specify the value of the orderby parameter to include field names and calls to the geo.distance() function for geospatial values. Each expression can be followed by asc to indicate that results are requested in ascending order, or desc to indicate that results are requested in descending order. The default is ascending order.
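An orderby value that combines a field name with a geo.distance() call can be assembled as a plain string. The sketch below assumes a geography field named Location, which is hypothetical, and the helper names order_by and distance_from are illustrative rather than part of any SDK.

```python
def distance_from(field, longitude, latitude):
    """Build a geo.distance() expression; the point is given as longitude latitude."""
    return f"geo.distance({field}, geography'POINT({longitude} {latitude})')"


def order_by(*clauses):
    """Join (expression, direction) pairs into an orderby value."""
    parts = []
    for expression, direction in clauses:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        parts.append(f"{expression} {direction}")
    return ", ".join(parts)


# Highest rating first, then nearest to the given point.
value = order_by(
    ("Rating", "desc"),
    (distance_from("Location", -122.12, 47.67), "asc"),
)
print(value)
```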

Hit highlighting

In Azure Cognitive Search, the highlight, highlightPreTag, and highlightPostTag parameters make it easy to emphasize the exact portion of search results that matches the search query. You can specify which searchable fields should have their matched text emphasized, as well as the exact string tags to append to the start and end of the matched text that Azure Cognitive Search returns.
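The sketch below shows how the three parameters might appear in a request body, followed by a local toy function that imitates the visible effect of the tags on a piece of text. The toy function is not how the service computes highlights; it only illustrates what the pre and post tags do. The function names and the sample sentence are invented for illustration.

```python
import re


def highlight_request(search, fields, pre_tag, post_tag):
    """Build a request body that asks for hit highlighting on the given fields."""
    return {
        "search": search,
        "highlight": fields,           # comma-separated searchable fields
        "highlightPreTag": pre_tag,    # inserted before each match
        "highlightPostTag": post_tag,  # inserted after each match
    }


def wrap_matches(text, term, pre_tag, post_tag):
    """Toy illustration: wrap whole-word, case-insensitive matches in the tags."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


body = highlight_request("restaurant", "Description", "<b>", "</b>")
print(body)
print(wrap_matches("A restaurant near Restaurant Row", "restaurant", "<b>", "</b>"))
```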

See also