Filters in Azure Cognitive Search

A filter provides criteria for selecting the documents used in an Azure Cognitive Search query. An unfiltered search includes all documents in the index. A filter scopes a search query to a subset of documents. For example, a filter could restrict full text search to products of a specific brand or color, at price points above a certain threshold.

Some search experiences impose filter requirements as part of the implementation, but you can use filters anytime you want to constrain search using value-based criteria (for example, scoping search to product type "books", category "non-fiction", published by "Simon & Schuster").

If instead your goal is targeted search on specific data structures (for example, scoping search to a customer-reviews field), there are alternative methods, described below.

When to use a filter

Filters are foundational to several search experiences, including "find near me", faceted navigation, and security filters that show only those documents a user is allowed to see. If you implement any one of these experiences, a filter is required. It's the filter attached to the search query that provides the geolocation coordinates, the facet category selected by the user, or the security ID of the requestor.

Example scenarios include the following:

  1. Use a filter to slice your index based on data values in the index. Given a schema with city, housing type, and amenities, you might create a filter to explicitly select documents that satisfy your criteria (in Seattle, condos, waterfront).

    Full text search with the same inputs often produces similar results, but a filter is more precise in that it requires an exact match of the filter term against content in your index.

  2. Use a filter if the search experience comes with a filter requirement:

    • Faceted navigation uses a filter to pass back the facet category selected by the user.
    • Geo-search uses a filter to pass coordinates of the current location in "find near me" apps.
    • Security filters pass security identifiers as filter criteria, where a match in the index serves as a proxy for access rights to the document.
  3. Use a filter if you want search criteria on a numeric field.

    Numeric fields are retrievable in the document and can appear in search results, but they are not individually searchable (that is, subject to full text search). If you need selection criteria based on numeric data, use a filter.

Alternative methods for reducing scope

If you want a narrowing effect in your search results, filters are not your only choice. These alternatives could be a better fit, depending on your objective:

  • The searchFields query parameter constrains search to specific fields. For example, if your index provides separate fields for English and Spanish descriptions, you can use searchFields to target which fields to use for full text search.

  • The $select parameter specifies which fields to include in a result set, effectively trimming the response before sending it to the calling application. This parameter does not refine the query or reduce the document collection, but if a smaller response is your goal, this parameter is an option to consider.

For more information about either parameter, see Search Documents > Request > Query parameters.
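As a minimal sketch of how these two parameters sit alongside the search string, the Python below assembles a query string by hand. The field names (`description_es`, `hotelName`) and the helper `build_query` are hypothetical, chosen only for illustration; they are not part of any SDK.

```python
from urllib.parse import quote, urlencode


def build_query(search, search_fields=None, select=None):
    """Build a query string that scopes full text search and trims the response.

    searchFields limits which fields are searched; $select limits which
    fields come back. Neither one removes documents from the candidate set.
    """
    params = {"search": search}
    if search_fields:
        params["searchFields"] = ",".join(search_fields)
    if select:
        params["$select"] = ",".join(select)
    # Keep "$", "," and "*" readable; everything else is percent-encoded.
    return urlencode(params, quote_via=quote, safe="$,*")


# Hypothetical index with a Spanish description field.
query = build_query("piscina", ["description_es"], ["hotelName", "description_es"])
print(query)
```

Note that neither parameter is a filter: every document in the index is still a candidate for the query.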

How filters are executed

At query time, a filter parser accepts criteria as input, converts the expression into atomic Boolean expressions represented as a tree, and then evaluates the filter tree over filterable fields in an index.

Filtering occurs in tandem with search, qualifying which documents to include in downstream processing for document retrieval and relevance scoring. When paired with a search string, the filter effectively reduces the recall set of the subsequent search operation. When used alone (for example, when the query string is empty, as in search=*), the filter criteria are the sole input.
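The relationship between the two stages can be pictured with a toy model. The sketch below is not how the service is implemented; it assumes an in-memory list of made-up documents and a naive term-count score, purely to show that the filter decides which documents are candidates and the search terms decide their order.

```python
# Made-up documents for illustration only.
docs = [
    {"id": "1", "city": "Seattle", "rate": 120.0, "description": "walking distance to theaters"},
    {"id": "2", "city": "Seattle", "rate": 450.0, "description": "theaters and museums nearby"},
    {"id": "3", "city": "Honolulu", "rate": 150.0, "description": "beachfront with theaters"},
]


def run_query(documents, predicate, terms):
    """Apply a Boolean filter first, then rank the survivors by term matches."""
    # Stage 1: the filter qualifies candidates. It is pass/fail, with no scoring.
    candidates = [d for d in documents if predicate(d)]
    if not terms:
        # Equivalent in spirit to search=*: the filter is the sole input.
        return [d["id"] for d in candidates]
    # Stage 2: a naive score, counting how many terms appear in the description.
    scored = []
    for d in candidates:
        words = set(d["description"].lower().split())
        score = sum(1 for t in terms if t.lower() in words)
        if score > 0:
            scored.append((score, d["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def in_seattle(d):
    return d["city"] == "Seattle"


def in_seattle_under_300(d):
    return d["city"] == "Seattle" and d["rate"] < 300


filtered_only = run_query(docs, in_seattle_under_300, [])
filtered_and_searched = run_query(docs, in_seattle, ["walking", "theaters"])
print(filtered_only, filtered_and_searched)
```

In the second call, the Honolulu document never reaches the scoring stage even though it mentions theaters, which is the recall-set reduction described above.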

Defining filters

Filters are OData expressions, articulated using a subset of OData V4 syntax supported in Azure Cognitive Search.

You can specify one filter for each search operation, but the filter itself can include multiple fields, multiple criteria, and, if you use an ismatch function, multiple full-text search expressions. In a multi-part filter expression, you can specify predicates in any order (subject to the rules of operator precedence). There is no appreciable gain in performance if you try to rearrange predicates in a particular sequence.

One of the limits on a filter expression is the maximum size of the request. The entire request, inclusive of the filter, can be a maximum of 16 MB for POST, or 8 KB for GET. There is also a limit on the number of clauses in your filter expression. A good rule of thumb is that if you have hundreds of clauses, you are at risk of running into the limit. We recommend designing your application so that it does not generate filters of unbounded size.
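One way an application might guard against oversized requests is to measure the filter before sending it. The sketch below is an approximation under stated assumptions: it uses the 8 KB and 16 MB figures quoted above, measures only the query string or JSON body (not headers or the rest of the URL), and does not check the clause-count limit. The function name `choose_method` is hypothetical.

```python
import json
from urllib.parse import quote

GET_LIMIT_BYTES = 8 * 1024            # 8 KB limit for GET, as stated above
POST_LIMIT_BYTES = 16 * 1024 * 1024   # 16 MB limit for POST, as stated above


def choose_method(search, filter_expr):
    """Pick GET when the encoded query is small enough, otherwise POST."""
    query = f"search={quote(search, safe='*')}&$filter={quote(filter_expr)}"
    if len(query.encode("utf-8")) <= GET_LIMIT_BYTES:
        return "GET"
    body = json.dumps({"search": search, "filter": filter_expr})
    if len(body.encode("utf-8")) <= POST_LIMIT_BYTES:
        return "POST"
    raise ValueError("filter is too large for a single request")


small = choose_method("*", "Rating ge 4")
# A generated chain of 1000 equality clauses, the kind of unbounded filter to avoid.
large = choose_method("*", " or ".join(f"HotelId eq '{i}'" for i in range(1000)))
print(small, large)
```

A filter like the second one fits in a POST body by size, but with this many clauses it is also at risk of hitting the clause limit, so a byte check alone is not sufficient.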

The following examples represent prototypical filter definitions in several APIs.

# Option 1: Use $filter for GET
GET https://[service name].search.azure.cn/indexes/hotels/docs?api-version=2020-06-30&search=*&$filter=Rooms/any(room: room/BaseRate lt 150.0)&$select=HotelId, HotelName, Rooms/Description, Rooms/BaseRate

# Option 2: Use filter for POST and pass it in the request body
POST https://[service name].search.azure.cn/indexes/hotels/docs/search?api-version=2020-06-30
{
    "search": "*",
    "filter": "Rooms/any(room: room/BaseRate lt 150.0)",
    "select": "HotelId, HotelName, Rooms/Description, Rooms/BaseRate"
}

    // Option 3: Set the Filter property of SearchParameters in the .NET SDK
    var parameters =
        new SearchParameters()
        {
            // Match hotels that have at least one room with a base rate under 150
            Filter = "Rooms/any(room: room/BaseRate lt 150.0)",
            Select = new[] { "HotelId", "HotelName", "Rooms/Description", "Rooms/BaseRate" }
        };

    // "*" is the empty query string, so the filter is the sole input
    var results = searchIndexClient.Documents.Search("*", parameters);
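For readers calling the REST API from a script, the following Python sketch prepares the same filter for both request styles. It only builds strings and sends nothing; the helper names are hypothetical. The point it illustrates is that a GET query string needs percent-encoding (spaces become %20), while a POST body carries the filter verbatim as a JSON string.

```python
import json
from urllib.parse import quote, urlencode

FILTER = "Rooms/any(room: room/BaseRate lt 150.0)"
SELECT = "HotelId, HotelName, Rooms/Description, Rooms/BaseRate"


def get_query_string(api_version="2020-06-30"):
    """Query string for the GET form; $filter and $select keep their '$' prefix."""
    params = {
        "api-version": api_version,
        "search": "*",
        "$filter": FILTER,
        "$select": SELECT,
    }
    # Leave OData punctuation readable and encode everything else, including spaces.
    return urlencode(params, quote_via=quote, safe="$*/(),:")


def post_body():
    """JSON body for the POST form; parameter names drop the '$' prefix."""
    return json.dumps({"search": "*", "filter": FILTER, "select": SELECT})


query_string = get_query_string()
body = post_body()
print(query_string)
print(body)
```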

Filter usage patterns

The following examples illustrate several usage patterns for filter scenarios. For more ideas, see OData expression syntax > Examples.

  • Standalone $filter, without a query string, is useful when the filter expression is able to fully qualify documents of interest. Without a query string, there is no lexical or linguistic analysis, no scoring, and no ranking. Notice the search string is just an asterisk, which means "match all documents".

    search=*&$filter=Rooms/any(room: room/BaseRate ge 60 and room/BaseRate lt 300) and Address/City eq 'Honolulu'
    
  • Combination of query string and $filter, where the filter creates the subset, and the query string provides the term inputs for full text search over the filtered subset. The addition of terms (walking distance theaters) introduces search scores in the results, where documents that best match the terms are ranked higher. Using a filter with a query string is the most common usage pattern.

    search=walking distance theaters&$filter=Rooms/any(room: room/BaseRate ge 60 and room/BaseRate lt 300) and Address/City eq 'Seattle'&$count=true
    
  • Compound queries, separated by "or", each with its own filter criteria (for example, 'beagles' in 'dog' or 'siamese' in 'cat'). Expressions combined with or are evaluated individually, with the union of documents matching each expression sent back in the response. This usage pattern is achieved through the search.ismatchscoring function. You can also use the non-scoring version, search.ismatch.

    # Match on hostels rated 4 or higher OR 5-star motels.
    $filter=search.ismatchscoring('hostel') and Rating ge 4 or search.ismatchscoring('motel') and Rating eq 5
    
    # Match on 'luxury' or 'high-end' in the description field OR on category exactly equal to 'Luxury'.
    $filter=search.ismatchscoring('luxury | high-end', 'Description') or Category eq 'Luxury'&$count=true
    

    It is also possible to combine full-text search via search.ismatchscoring with filters using and instead of or, but this is functionally equivalent to using the search and $filter parameters in a search request. For example, the following two queries produce the same result:

    $filter=search.ismatchscoring('pool') and Rating ge 4
    
    search=pool&$filter=Rating ge 4
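When an application assembles compound filters like the ones above from parts, explicit parentheses make the intended grouping visible instead of relying on operator precedence. The helpers below (`all_of`, `any_of`) are hypothetical string builders, offered as a sketch rather than an SDK feature.

```python
def all_of(*clauses):
    """Join clauses with 'and', wrapped in parentheses."""
    return "(" + " and ".join(clauses) + ")"


def any_of(*clauses):
    """Join clauses with 'or', wrapped in parentheses."""
    return "(" + " or ".join(clauses) + ")"


# Same intent as the hostel/motel example: each 'or' branch has its own criteria.
compound = any_of(
    all_of("search.ismatchscoring('hostel')", "Rating ge 4"),
    all_of("search.ismatchscoring('motel')", "Rating eq 5"),
)
print("$filter=" + compound)
```

The generated expression groups each search term with its own rating condition, which is how the unparenthesized version is read when and binds more tightly than or.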
    

Follow up with these articles for comprehensive guidance on specific use cases:

Field requirements for filtering

In the REST API, filterable is on by default for simple fields. Filterable fields increase index size; be sure to set "filterable": false for fields that you don't plan to actually use in a filter. For more information about settings for field definitions, see Create Index.

In the .NET SDK, filterable is off by default. You can make a field filterable by setting the IsFilterable property of the corresponding Field object to true. You can also do this declaratively by using the IsFilterable attribute. In the example below, the attribute is set on the BaseRate property of a model class that maps to the index definition.

    [IsFilterable, IsSortable, IsFacetable]
    public double? BaseRate { get; set; }
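On the REST side, the same intent is expressed per field in the index definition. The fragment below is a simplified, hypothetical definition written as a Python dictionary (the index name and the flat field layout are assumptions for illustration); it shows "filterable" switched off explicitly for a field that will only be searched.

```python
import json

# Hypothetical, flattened field list; a real hotels index may nest BaseRate under Rooms.
fields = [
    {"name": "HotelId", "type": "Edm.String", "key": True, "filterable": True},
    # Full text search only, so opt out of filtering to keep the index smaller.
    {"name": "Description", "type": "Edm.String", "searchable": True, "filterable": False},
    {"name": "BaseRate", "type": "Edm.Double", "filterable": True, "sortable": True, "facetable": True},
]

index_definition = {"name": "hotels-sketch", "fields": fields}
filterable_names = [f["name"] for f in fields if f.get("filterable")]

print(json.dumps(index_definition, indent=2))
print(filterable_names)
```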

Making an existing field filterable

You can't modify existing fields to make them filterable. Instead, you need to add a new field, or rebuild the index. For more information about rebuilding an index or repopulating fields, see How to rebuild an Azure Cognitive Search index.

Text filter fundamentals

Text filters match string fields against literal strings that you provide in the filter. Unlike full-text search, there is no lexical analysis or word-breaking for text filters, so comparisons are for exact matches only. For example, if a field f contains "sunny day", $filter=f eq 'Sunny' does not match, but $filter=f eq 'sunny day' does.

Text strings are case-sensitive. There is no lower-casing of upper-cased words: $filter=f eq 'Sunny day' will not find "sunny day".
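The difference can be mimicked in a few lines of Python. This is a toy comparison, not the service's implementation: `filter_eq` models the exact, case-sensitive comparison of a text filter, and `full_text_match` is a crude stand-in for analysis (lower-casing and splitting on whitespace).

```python
def filter_eq(field_value, literal):
    """Text filter semantics: whole-string, case-sensitive equality."""
    return field_value == literal


def full_text_match(field_value, term):
    """Crude stand-in for full text search: lower-case and split into words."""
    return term.lower() in field_value.lower().split()


f = "sunny day"
print(filter_eq(f, "Sunny"))        # partial string, wrong case
print(filter_eq(f, "sunny day"))    # exact match
print(filter_eq(f, "Sunny day"))    # differs only in case
print(full_text_match(f, "Sunny"))  # analysis makes this term match
```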

Approaches for filtering on text

| Approach | Description | When to use |
|---|---|---|
| search.in | A function that matches a field against a delimited list of strings. | Recommended for security filters and for any filters where many raw text values need to be matched with a string field. The search.in function is designed for speed and is much faster than explicitly comparing the field against each string using eq and or. |
| search.ismatch | A function that allows you to mix full-text search operations with strictly Boolean filter operations in the same filter expression. | Use search.ismatch (or its scoring equivalent, search.ismatchscoring) when you want multiple search-filter combinations in one request. You can also use it for a contains filter to filter on a partial string within a larger string. |
| $filter=field operator string | A user-defined expression composed of fields, operators, and values. | Use this when you want to find exact matches between a string field and a string value. |
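To make the first approach concrete, the sketch below builds a search.in expression from a list of values and, for contrast, the equivalent chain of eq and or clauses. The helper names and the Category values are hypothetical; the three-argument form with an explicit delimiter is used so that values containing spaces stay intact, and single quotes are doubled, which is how OData string literals escape them.

```python
def search_in(field, values, delimiter="|"):
    """Build search.in(field, 'v1|v2|...', '|') from a list of raw values."""
    for value in values:
        if delimiter in value:
            raise ValueError(f"value {value!r} contains the delimiter {delimiter!r}")
    # Escape single quotes for the OData string literal by doubling them.
    joined = delimiter.join(values).replace("'", "''")
    return f"search.in({field}, '{joined}', '{delimiter}')"


def eq_or_chain(field, values):
    """The slower alternative: one eq comparison per value, joined with or."""
    escaped = [v.replace("'", "''") for v in values]
    return " or ".join(f"{field} eq '{v}'" for v in escaped)


categories = ["Budget", "Luxury", "Resort and Spa"]
compact = search_in("Category", categories)
verbose = eq_or_chain("Category", categories)
print(compact)
print(verbose)
```

The search.in form stays a single clause no matter how many values are listed, which also helps with the clause limit mentioned earlier.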

Numeric filter fundamentals

Numeric fields are not searchable in the context of full text search. Only strings are subject to full text search. For example, if you enter 99.99 as a search term, you won't get back items priced at $99.99. Instead, you would see items that have the number 99 in string fields of the document. Thus, if you have numeric data, the assumption is that you will use them for filters, including ranges, facets, groups, and so forth.

Documents that contain numeric fields (price, size, SKU, ID) provide those values in search results if the field is marked retrievable. The point here is that full text search itself is not applicable to numeric field types.
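A range is the most common shape for a numeric filter. The following sketch composes one from optional bounds using the ge and lt comparison operators; the helper `numeric_range` and the field name `price` are hypothetical.

```python
def numeric_range(field, low=None, high=None):
    """Build 'field ge low and field lt high', omitting whichever bound is None."""
    clauses = []
    if low is not None:
        clauses.append(f"{field} ge {low}")
    if high is not None:
        clauses.append(f"{field} lt {high}")
    if not clauses:
        raise ValueError("at least one bound is required")
    return " and ".join(clauses)


both_bounds = numeric_range("price", 50, 100)
upper_only = numeric_range("price", high=99.99)
print("$filter=" + both_bounds)
print("$filter=" + upper_only)
```

Numeric literals are written without quotes, unlike the string literals used in text filters.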

Next steps

First, try Search explorer in the portal to submit queries with $filter parameters. The real-estate-sample index provides interesting results for the following filtered queries when you paste them into the search bar:

# Geo-filter returning documents within 5 kilometers of Redmond, Washington state
# Use $count=true to get a number of hits returned by the query
# Use $select to trim results, showing values for named fields only
# Use search=* for an empty query string. The filter is the sole input

search=*&$count=true&$select=description,city,postCode&$filter=geo.distance(location,geography'POINT(-122.121513 47.673988)') le 5

# Numeric filters use comparison like greater than (gt), less than (lt), not equal (ne)
# Include "and" to filter on multiple fields (baths and bed)
# Full text search is on John Leclerc, matching on John or Leclerc

search=John Leclerc&$count=true&$select=source,city,postCode,baths,beds&$filter=baths gt 3 and beds gt 4

# Text filters can also use comparison operators
# Wrap text in single quotes and use the correct case
# Full text search is on John Leclerc, matching on John or Leclerc

search=John Leclerc&$count=true&$select=source,city,postCode,baths,beds&$filter=city gt 'Seattle'
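The geo-filter in the first query above is easy to get wrong by hand because the point literal lists longitude before latitude. A small builder, sketched here with a hypothetical function name, keeps that ordering in one place and reproduces the expression used in that query.

```python
def geo_within(field, latitude, longitude, kilometers):
    """Build a geo.distance filter; POINT takes longitude first, then latitude."""
    point = f"geography'POINT({longitude} {latitude})'"
    return f"geo.distance({field},{point}) le {kilometers}"


# Coordinates from the Redmond example above.
redmond = geo_within("location", latitude=47.673988, longitude=-122.121513, kilometers=5)
print("search=*&$count=true&$filter=" + redmond)
```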

To work with more examples, see OData Filter Expression Syntax > Examples.

See also