Use the "full" Lucene search syntax (advanced queries in Azure Cognitive Search)

When constructing queries for Azure Cognitive Search, you can replace the default simple query parser with the more expansive Lucene Query Parser to formulate specialized and advanced query definitions.

The Lucene parser supports complex query constructs, such as field-scoped queries, fuzzy search, infix and suffix wildcard search, proximity search, term boosting, and regular expression search. The additional power comes with additional processing requirements, so you should expect a slightly longer execution time. In this article, you can step through examples demonstrating query operations available when using the full syntax.

Many of the specialized query constructions enabled through the full Lucene query syntax are not text-analyzed, which can be surprising if you expect stemming or lemmatization. Lexical analysis is only performed on complete terms (a term query or phrase query). Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on partial query terms is lowercasing.

Formulate requests in Postman

The following examples use a NYC Jobs search index consisting of jobs available based on a dataset provided by the City of New York OpenData initiative. This data should not be considered current or complete. The index is on a sandbox service provided by Microsoft, which means you do not need an Azure subscription or Azure Cognitive Search to try these queries.

What you do need is Postman or an equivalent tool for issuing HTTP GET requests. For more information, see Explore with REST clients.

Set the request header

  1. In the request header, set Content-Type to application/json.

  2. Add an api-key, and set it to this string: 252044BE3886FE4A8E3BAA4F595114BB. This is a query key for the sandbox search service hosting the NYC Jobs index.

After you specify the request header, you can reuse it for all of the queries in this article, swapping out only the search= string.

[Screenshot: Postman request header]

Set the request URL

The request is a GET command paired with a URL containing the Azure Cognitive Search endpoint and search string.

[Screenshot: Postman request header]

URL composition has the following elements:

  • https://azs-playground.search.azure.cn/ is a sandbox search service maintained by the Azure Cognitive Search development team.
  • indexes/nycjobs/ is the NYC Jobs index in the indexes collection of that service. Both the service name and index are required on the request.
  • docs is the documents collection containing all searchable content. The query api-key provided in the request header only works on read operations targeting the documents collection.
  • api-version=2020-06-30 sets the api-version, which is a required parameter on every request.
  • search=* is the query string, which in the initial query is an unspecified (null) search, returning the first 50 results (by default).
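The elements listed above can also be assembled in code. The following is a minimal Python sketch using only the standard library; the helper name build_request is hypothetical, and the request object is constructed but never sent, so no network access is assumed.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

# Values taken from this article; the helper itself is illustrative only.
SERVICE = "https://azs-playground.search.azure.cn"
INDEX = "nycjobs"
API_VERSION = "2020-06-30"
QUERY_KEY = "252044BE3886FE4A8E3BAA4F595114BB"


def build_request(search="*", extra=None):
    """Build (but do not send) a GET request against the docs collection."""
    params = {"api-version": API_VERSION, "search": search}
    params.update(extra or {})
    # Keep "$" and "*" readable; everything else is percent-encoded.
    query = urlencode(params, safe="$*", quote_via=quote)
    url = f"{SERVICE}/indexes/{INDEX}/docs?{query}"
    headers = {"Content-Type": "application/json", "api-key": QUERY_KEY}
    return Request(url, headers=headers, method="GET")


req = build_request()
print(req.get_method(), req.full_url)
```

To actually issue the query, the request object could be passed to urllib.request.urlopen, or the same URL and headers could be pasted into Postman.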

Send your first query

As a verification step, paste the following request into GET and click Send. Results are returned as verbose JSON documents. Entire documents are returned, which allows you to see all fields and all values.

Paste this URL into a REST client as a validation step and to view document structure.

https://azs-playground.search.azure.cn/indexes/nycjobs/docs?api-version=2020-06-30&search=*

The query string, search=*, is an unspecified search equivalent to null or empty search. It's the simplest search you can do.

Optionally, you can add $count=true to the URL to return a count of the documents matching the search criteria. On an empty search string, this is all the documents in the index (about 2800 in the case of NYC Jobs).

How to invoke full Lucene parsing

Add queryType=full to invoke the full query syntax, overriding the default simple query syntax.

All of the examples in this article specify the queryType=full search parameter, indicating that the full syntax is handled by the Lucene Query Parser.
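As an illustration, the sketch below adds queryType=full (and, optionally, $count=true) to the query string for the sandbox index. The function name full_syntax_url is an assumption made for this example, and nothing is sent over the network.

```python
from urllib.parse import urlencode, quote

BASE = "https://azs-playground.search.azure.cn/indexes/nycjobs/docs"


def full_syntax_url(search, count=False):
    """Return a query URL that asks for the full Lucene parser."""
    params = {
        "api-version": "2020-06-30",
        "queryType": "full",  # overrides the default simple syntax
        "search": search,
    }
    if count:
        params["$count"] = "true"
    return BASE + "?" + urlencode(params, safe="$*", quote_via=quote)


url = full_syntax_url("*", count=True)
print(url)
```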

Example 1: Query scoped to a list of fields

This first example is not Lucene-specific, but we lead with it to introduce the first fundamental query concept: field scope. This example scopes the entire query and the response to just a few specific fields. Knowing how to structure a readable JSON response is important when your tool is Postman or Search explorer.

For brevity, the query targets only the business_title field and specifies that only business titles are returned. The searchFields parameter restricts query execution to just the business_title field, and select specifies which fields are included in the response.

Search expression

Here is the same query with multiple fields in a comma-delimited list.

search=*&searchFields=business_title, posting_type&$select=business_title, posting_type

The spaces after the commas are optional.

When using the REST API from your application code, don't forget to URL-encode parameters like $select and searchFields.

Full URL
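One way to assemble and URL-encode the multi-field query from application code is sketched below in Python. The parameter choices (including $count=true and queryType=full) are assumptions based on the other URLs in this article rather than a copy of the original request.

```python
from urllib.parse import urlencode, quote

BASE = "https://azs-playground.search.azure.cn/indexes/nycjobs/docs"

params = {
    "api-version": "2020-06-30",
    "queryType": "full",
    "$count": "true",
    "search": "*",
    # Comma-delimited field lists; the space after the comma is optional.
    "searchFields": "business_title, posting_type",
    "$select": "business_title, posting_type",
}

# Commas and spaces in the values are percent-encoded ("%2C", "%20");
# "$" and "*" are left readable.
url = BASE + "?" + urlencode(params, safe="$*", quote_via=quote)
print(url)
```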


Response for this query should look similar to the following screenshot.

[Screenshot: Postman sample response]

You might have noticed the search score in the response. Uniform scores of 1 occur when there is no rank, either because the search was not full text search, or because no criteria were applied. For null search with no criteria, rows come back in arbitrary order. When you include actual search criteria, you will see search scores evolve into meaningful values.

Example 2: Fielded search

Full Lucene syntax supports scoping individual search expressions to a specific field. This example searches for business titles with the term senior in them, but not junior.

Search expression

$select=business_title&search=business_title:(senior NOT junior)

Here is the same query with multiple fields.

$select=business_title, posting_type&search=business_title:(senior NOT junior) AND posting_type:external

Full URL

https://azs-playground.search.azure.cn/indexes/nycjobs/docs?api-version=2020-06-30&queryType=full&$count=true&$select=business_title&search=business_title:(senior NOT junior)

[Screenshot: Postman sample response]

You can define a fielded search operation with the fieldName:searchExpression syntax, where the search expression can be a single word or a phrase, or a more complex expression in parentheses, optionally with Boolean operators. Some examples include the following:

  • business_title:(senior NOT junior)
  • state:("New York" OR "New Jersey")
  • business_title:(senior NOT junior) AND posting_type:external

Be sure to put multiple strings within quotation marks if you want both strings to be evaluated as a single entity, as in this case searching for two distinct locations in the state field. Also, ensure the operator is capitalized as you see with NOT and AND.

The field specified in fieldName:searchExpression must be a searchable field. See Create Index (Azure Cognitive Search REST API) for details on how index attributes are used in field definitions.

In the example above, we did not need to use the searchFields parameter because each part of the query has a field name explicitly specified. However, you can still use the searchFields parameter if you want to run a query where some parts are scoped to a specific field, and the rest could apply to several fields. For example, the query search=business_title:(senior NOT junior) AND external&searchFields=posting_type would match senior NOT junior only to the business_title field, while it would match "external" with the posting_type field. The field name provided in fieldName:searchExpression always takes precedence over the searchFields parameter, which is why in this example, we do not need to include business_title in the searchFields parameter.
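The mixed query described above can be composed from its parts. This is a small Python sketch; the helper name fielded is hypothetical, and the code only builds and decodes the query string without calling the service.

```python
from urllib.parse import urlencode, quote, parse_qs


def fielded(field, expression):
    """Scope a search expression to one field: field:(expression)."""
    return f"{field}:({expression})"


# "senior NOT junior" is pinned to business_title; the bare term
# "external" falls back to the fields listed in searchFields.
search = fielded("business_title", "senior NOT junior") + " AND external"
query = urlencode(
    {"queryType": "full", "search": search, "searchFields": "posting_type"},
    quote_via=quote,
)

print(search)
print(query)
print(parse_qs(query)["search"][0])  # decodes back to the readable form
```

The same helper covers the other listed expressions, for example fielded("state", '"New York" OR "New Jersey"').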

Example 3: Fuzzy search

Full Lucene syntax also supports fuzzy search, matching on terms that have a similar construction. To do a fuzzy search, append the tilde ~ symbol at the end of a single word with an optional parameter, a value between 0 and 2, that specifies the edit distance. For example, blue~ or blue~1 would return blue, blues, and glue.
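To make "edit distance" concrete, here is a short Python sketch of the classic Levenshtein distance (insertions, deletions, substitutions). It is an illustration only: the search service's own matching is an assumption here and may differ in detail, for example in how it counts transposed characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


for term in ("blue", "blues", "glue"):
    print(term, edit_distance("blue", term))
```

Under this measure, blues and glue are each one edit away from blue, which is why blue~1 can return them.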

Search expression

Phrases aren't supported directly, but you can specify a fuzzy match on component parts of a phrase.

searchFields=business_title&$select=business_title&search=business_title:asosiate~ AND comm~

Full URL

This query searches for jobs with the term "associate" (deliberately misspelled):

Fuzzy queries are not analyzed. Query types with incomplete terms (prefix query, wildcard query, regex query, fuzzy query) are added directly to the query tree, bypassing the analysis stage. The only transformation performed on incomplete query terms is lowercasing.

Example 4: Proximity search

Proximity searches are used to find terms that are near each other in a document. Insert a tilde "~" symbol at the end of a phrase followed by the number of words that create the proximity boundary. For example, "hotel airport"~5 will find the terms hotel and airport within 5 words of each other in a document.

Search expression

Full URL

In this query, search for jobs with the term "senior analyst" where the two words are separated by no more than one word:

Try it again, removing the words between the terms "senior analyst". Notice that 8 documents are returned for this query as opposed to 10 for the previous query.
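The two proximity queries can be generated as follows. This Python sketch is a reconstruction under assumptions: the helper name proximity_url and the exact parameter set are illustrative, with a distance of 1 for the first query and 0 for the second, as the text suggests.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

BASE = "https://azs-playground.search.azure.cn/indexes/nycjobs/docs"


def proximity_url(phrase, distance, field="business_title"):
    """Build a URL for a fielded proximity search: field:"phrase"~distance."""
    search = f'{field}:"{phrase}"~{distance}'
    params = {
        "api-version": "2020-06-30",
        "queryType": "full",
        "$count": "true",
        "$select": field,
        "search": search,
    }
    return BASE + "?" + urlencode(params, safe="$*", quote_via=quote)


within_one = proximity_url("senior analyst", 1)  # at most one word apart
adjacent = proximity_url("senior analyst", 0)    # no words in between
print(within_one)
print(adjacent)
```

The quotation marks around the phrase are encoded as %22 and the space as %20, which is how the phrase survives inside a URL.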


Example 5: Term boosting

Term boosting refers to ranking a document higher if it contains the boosted term, relative to documents that do not contain the term. To boost a term, use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching.

Full URLs

In this "before" query, search for jobs with the term computer analyst and notice there are no results with both words computer and analyst, yet computer jobs are at the top of the results.

In the "after" query, repeat the search, this time boosting results that contain the term analyst over results that contain the term computer when no result contains both words.

A more human readable version of the above query is search=business_title:computer analyst^2. For a workable query, ^2 is encoded as %5E2, which is harder to see.
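The encoding step can be reproduced with Python's standard library. This small sketch percent-encodes the readable expression and decodes it again; leaving ":" unencoded is a readability choice for this example, not a requirement stated in the article.

```python
from urllib.parse import quote, unquote

readable = "business_title:computer analyst^2"

# The space becomes %20 and the caret becomes %5E.
encoded = quote(readable, safe=":")
print(encoded)

# Decoding restores the human readable form.
print(unquote(encoded))
```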


Term boosting differs from scoring profiles in that scoring profiles boost certain fields, rather than specific terms. The following example helps illustrate the differences.

Consider a scoring profile that boosts matches in a certain field, such as genre in the musicstoreindex example. Term boosting could be used to further boost certain search terms higher than others. For example, "rock^2 electronic" will boost documents that contain the search terms in the genre field higher than other searchable fields in the index. Furthermore, documents that contain the search term "rock" will be ranked higher than documents that contain the other search term "electronic" as a result of the term boost value (2).

When setting the factor level, the higher the boost factor, the more relevant the term will be relative to other search terms. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, 0.2).

Example 6: Regex

A regular expression search finds a match based on the contents between forward slashes "/", as documented in the RegExp class.

Search expression

Full URL

In this query, search for jobs with either the term Senior or Junior: search=business_title:/(Sen|Jun)ior/.
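The pattern itself can be tried locally. The Python sketch below uses the re module as a stand-in, which is an assumption: Python's regex dialect is not the Lucene RegExp syntax, although this simple pattern reads the same in both. It also assumes indexed terms are lowercased single words and that a regex query must match a whole term. The sample titles are made up for illustration.

```python
import re

# The query pattern is lowercased, mirroring the only transformation
# applied to incomplete query terms.
pattern = re.compile("(Sen|Jun)ior".lower())

# Hypothetical titles, not taken from the NYC Jobs index.
titles = [
    "Senior Analyst",
    "Junior Developer",
    "Seniority Planner",
    "Project Manager",
]


def matches(title):
    """True if any lowercased word of the title matches the whole pattern."""
    return any(pattern.fullmatch(token) for token in title.lower().split())


matched = [t for t in titles if matches(t)]
print(matched)
```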




Regex queries are not analyzed. The only transformation performed on incomplete query terms is lowercasing.

Example 7: Wildcard search

You can use generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note the Lucene query parser supports the use of these symbols with a single term, and not a phrase.

Search expression

Full URL

In this query, search for jobs that contain the prefix 'prog', which would include business titles with the terms programming and programmer in them. You cannot use a * or ? symbol as the first character of a search.
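As a local illustration of the prefix query, the Python sketch below uses fnmatch from the standard library, where * and ? have the same multiple-character and single-character meaning. This is only an analogy under assumptions: it treats each lowercased word of a title as a term, and the sample titles are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern, title):
    """True if any lowercased word of the title matches the wildcard term."""
    if pattern[:1] in ("*", "?"):
        # Mirrors the rule that a search cannot start with * or ?.
        raise ValueError("pattern must not start with * or ?")
    pattern = pattern.lower()
    return any(fnmatchcase(token, pattern) for token in title.lower().split())


# Hypothetical titles, not taken from the NYC Jobs index.
for title in ("Computer Programmer", "Programming Lead", "Project Manager"):
    print(title, wildcard_match("prog*", title))
```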




Wildcard queries are not analyzed. The only transformation performed on incomplete query terms is lowercasing.

Next steps

Try specifying the Lucene Query Parser in your code. The following links explain how to set up search queries for both .NET and the REST API. The links use the default simple syntax, so you will need to apply what you learned from this article to specify the queryType.

Additional syntax reference, query architecture, and examples can be found in the following links: