Security filters for trimming results in Azure Cognitive Search

You can apply security filters to trim search results in Azure Cognitive Search based on user identity. This search experience generally requires comparing the identity of whoever requests the search against a field containing the principals who have permissions to the document. When a match is found, the user or principal (such as a group or role) has access to that document.

One way to achieve security filtering is through a long disjunction of equality expressions: for example, Id eq 'id1' or Id eq 'id2', and so forth. This approach is error-prone and difficult to maintain, and when the list contains hundreds or thousands of values, it slows down query response time by many seconds.

A simpler and faster approach is the search.in function. If you use search.in(Id, 'id1, id2, ...') instead of a chain of equality expressions, you can expect sub-second response times.
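
To make the difference between the two filter styles concrete, the following minimal Python sketch builds both expressions from the same list of identifiers. The helper names disjunction_filter and search_in_filter are hypothetical and only produce filter strings; they do not call the service.

```python
def disjunction_filter(field, ids):
    """Build the long form: field eq 'id1' or field eq 'id2' ..."""
    return " or ".join(f"{field} eq '{value}'" for value in ids)


def search_in_filter(field, ids):
    """Build the compact form: search.in(field, 'id1, id2, ...')."""
    joined = ", ".join(ids)
    return f"search.in({field}, '{joined}')"


ids = ["id1", "id2", "id3"]
print(disjunction_filter("Id", ids))
print(search_in_filter("Id", ids))
```

The first expression grows by one clause per identifier, while the second keeps a single function call with one delimited string, which is the form this article uses.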

This article shows you how to accomplish security filtering using the following steps:

  • Create a field that contains the principal identifiers
  • Push or update existing documents with the relevant principal identifiers
  • Issue a search request with a search.in filter

Note

The process of retrieving the principal identifiers is not covered in this document. You should get them from your identity service provider.

Prerequisites

This article assumes you have an Azure subscription, an Azure Cognitive Search service, and an Azure Cognitive Search index.

Create security field

Your documents must include a field specifying which groups have access. This information becomes the filter criteria against which documents are selected or rejected from the result set returned to the issuer. Let's assume that we have an index of secured files, and each file is accessible by a different set of users.

  1. Add a field group_ids (you can choose any name here) as a Collection(Edm.String). Make sure the field has its filterable attribute set to true so that search results can be filtered based on the access the user has. For example, if you set the group_ids field to ["group_id1", "group_id2"] for the document with file_name "secured_file_b", only users that belong to group "group_id1" or "group_id2" have read access to the file. Make sure the field's retrievable attribute is set to false so that it is not returned as part of the search response.
  2. Also add file_id and file_name fields for the sake of this example.
{
    "name": "securedfiles",  
    "fields": [
        {"name": "file_id", "type": "Edm.String", "key": true, "searchable": false, "sortable": false, "facetable": false},
        {"name": "file_name", "type": "Edm.String"},
        {"name": "group_ids", "type": "Collection(Edm.String)", "filterable": true, "retrievable": false}
    ]
}
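
Before creating the index, it can help to check that the security field carries the attributes described above. The sketch below is one possible local check in Python; the function name security_field_problems is hypothetical, and the check deliberately requires explicit attribute values, which may be stricter than what the service itself enforces.

```python
import json

# The index definition from this article, parsed from JSON.
index_definition = json.loads("""
{
    "name": "securedfiles",
    "fields": [
        {"name": "file_id", "type": "Edm.String", "key": true, "searchable": false, "sortable": false, "facetable": false},
        {"name": "file_name", "type": "Edm.String"},
        {"name": "group_ids", "type": "Collection(Edm.String)", "filterable": true, "retrievable": false}
    ]
}
""")


def security_field_problems(index_def, field_name):
    """Return a list of problems; an empty list means the field looks right."""
    fields = {f["name"]: f for f in index_def.get("fields", [])}
    field = fields.get(field_name)
    if field is None:
        return [f"field '{field_name}' is missing"]
    problems = []
    if field.get("type") != "Collection(Edm.String)":
        problems.append("type should be Collection(Edm.String)")
    if field.get("filterable") is not True:
        problems.append("filterable should be true")
    if field.get("retrievable") is not False:
        problems.append("retrievable should be false")
    return problems


print(security_field_problems(index_definition, "group_ids"))
```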

Pushing data into your index using the REST API

Issue an HTTP POST request to your index's URL endpoint. The body of the HTTP request is a JSON object containing the documents to be added:

POST https://[search service].search.azure.cn/indexes/securedfiles/docs/index?api-version=2019-05-06  
Content-Type: application/json
api-key: [admin key]

In the request body, specify the content of your documents:

{
    "value": [
        {
            "@search.action": "upload",
            "file_id": "1",
            "file_name": "secured_file_a",
            "group_ids": ["group_id1"]
        },
        {
            "@search.action": "upload",
            "file_id": "2",
            "file_name": "secured_file_b",
            "group_ids": ["group_id1", "group_id2"]
        },
        {
            "@search.action": "upload",
            "file_id": "3",
            "file_name": "secured_file_c",
            "group_ids": ["group_id5", "group_id6"]
        }
    ]
}

If you need to update an existing document with the list of groups, you can use the merge or mergeOrUpload action:

{
    "value": [
        {
            "@search.action": "mergeOrUpload",
            "file_id": "3",
            "group_ids": ["group_id7", "group_id8", "group_id9"]
        }
    ]
}
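
When group membership changes for many files, the request body can be generated rather than written by hand. The following Python sketch assumes a hypothetical helper, group_update_batch, that maps file IDs to their full group lists; it only builds the JSON body and leaves sending the request to the caller.

```python
import json


def group_update_batch(updates, action="mergeOrUpload"):
    """Build an indexing batch that sets group_ids for each file_id.

    updates maps file_id -> complete list of group identifiers.
    """
    if action not in ("merge", "mergeOrUpload"):
        raise ValueError("action must be 'merge' or 'mergeOrUpload'")
    return {
        "value": [
            {
                "@search.action": action,
                "file_id": file_id,
                "group_ids": list(groups),
            }
            for file_id, groups in updates.items()
        ]
    }


batch = group_update_batch({"3": ["group_id7", "group_id8", "group_id9"]})
print(json.dumps(batch, indent=4))
```

The helper takes the complete list for each file because a merge replaces the value of a collection field rather than appending to it, so sending only the newly added group would drop the existing ones.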

For full details on adding or updating documents, you can read Edit documents.

Apply the security filter

To trim documents based on group_ids access, issue a search query with a group_ids/any(g:search.in(g, 'group_id1, group_id2,...')) filter, where 'group_id1, group_id2,...' are the groups to which the search request issuer belongs. This filter matches all documents for which the group_ids field contains one of the given identifiers. For full details on searching documents using Azure Cognitive Search, you can read Search Documents. Note that this sample shows how to search documents using a POST request.

Issue the HTTP POST request:

POST https://[service name].search.azure.cn/indexes/securedfiles/docs/search?api-version=2019-05-06
Content-Type: application/json  
api-key: [admin or query key]

Specify the filter in the request body:

{
   "filter":"group_ids/any(g:search.in(g, 'group_id1, group_id2'))"  
}
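
In an application, the filter is usually assembled from the groups returned by the identity provider. The Python sketch below shows one way to do that with the standard library. The function names, the service URL "https://my-service.search.azure.cn", and the key placeholder are assumptions for illustration, and the request object is built but not sent.

```python
import json
import urllib.request


def build_group_filter(groups, field="group_ids"):
    """Build field/any(g:search.in(g, 'a, b, ...')) from a list of group IDs."""
    if not groups:
        # How to treat a user with no groups is an application decision;
        # this sketch refuses to build a filter for that case.
        raise ValueError("at least one group identifier is required")
    for group in groups:
        # Commas and spaces act as delimiters in this form of search.in,
        # so identifiers containing them would be split incorrectly.
        if "," in group or " " in group:
            raise ValueError(f"group id contains a delimiter: {group!r}")
    # Single quotes inside an OData string literal are escaped by doubling.
    joined = ", ".join(group.replace("'", "''") for group in groups)
    return f"{field}/any(g:search.in(g, '{joined}'))"


def build_search_request(service_url, index, api_key, groups, api_version="2019-05-06"):
    """Build (but do not send) the POST request for a security-trimmed search."""
    url = f"{service_url}/indexes/{index}/docs/search?api-version={api_version}"
    body = json.dumps({"filter": build_group_filter(groups)}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_search_request(
    "https://my-service.search.azure.cn",  # hypothetical service URL
    "securedfiles",
    "<query key>",  # placeholder, not a real key
    ["group_id1", "group_id2"],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Building the filter on the server side from a trusted identity source, rather than accepting it from the client, keeps callers from widening their own access.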

You should get back the documents where group_ids contains either "group_id1" or "group_id2". In other words, you get the documents to which the request issuer has read access.

{
    "value": [
        {
            "@search.score": 1.0,
            "file_id": "1",
            "file_name": "secured_file_a"
        },
        {
            "@search.score": 1.0,
            "file_id": "2",
            "file_name": "secured_file_b"
        }
    ]
}
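
The matching rule behind this result can be modeled locally, which is useful for reasoning about expected results or writing unit tests. The Python sketch below is a simplified model of the any/search.in filter applied to the three sample documents uploaded earlier; it is not how the service evaluates filters, and the function name visible_file_names is hypothetical.

```python
# The three sample documents uploaded earlier in this article.
documents = [
    {"file_id": "1", "file_name": "secured_file_a", "group_ids": ["group_id1"]},
    {"file_id": "2", "file_name": "secured_file_b", "group_ids": ["group_id1", "group_id2"]},
    {"file_id": "3", "file_name": "secured_file_c", "group_ids": ["group_id5", "group_id6"]},
]


def visible_file_names(docs, user_groups):
    """Return file names of documents sharing at least one group with the user."""
    allowed = set(user_groups)
    return [
        doc["file_name"]
        for doc in docs
        if any(group in allowed for group in doc["group_ids"])
    ]


print(visible_file_names(documents, ["group_id1", "group_id2"]))
# prints ['secured_file_a', 'secured_file_b']
```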

Conclusion

This is how you can filter results based on user identity using the Azure Cognitive Search search.in() function. You can use this function to pass in principal identifiers for the requesting user to match against the principal identifiers associated with each target document. When a search request is handled, the search.in function filters out search results for which none of the user's principals have read access. The principal identifiers can represent things like security groups, roles, or even the user's own identity.

See also