How to index Cosmos DB data using an indexer in Azure Cognitive Search

Important

The SQL API is generally available. Support for the MongoDB API, Gremlin API, and Cassandra API is currently in public preview. Preview functionality is provided without a service level agreement and is not recommended for production workloads. You can request access to the previews by filling out this form. The REST API version 2020-06-30-Preview provides preview features. There is currently limited portal support and no .NET SDK support.

Warning

Azure Cognitive Search supports only Cosmos DB collections whose indexing policy is set to Consistent. Indexing collections with a Lazy indexing policy is not recommended and may result in missing data. Collections with indexing disabled are not supported.

This article shows you how to configure an Azure Cosmos DB indexer to extract content and make it searchable in Azure Cognitive Search. This workflow creates an Azure Cognitive Search index and loads it with existing text extracted from Azure Cosmos DB.

Because terminology can be confusing, it's worth noting that Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations, unique to each service. Before you start Azure Cognitive Search indexing, your Azure Cosmos DB database must already exist and contain data.

The Cosmos DB indexer in Azure Cognitive Search can crawl Azure Cosmos DB items accessed through different protocols.

Note

If you'd like to see the Table API supported in Azure Cognitive Search, you can cast a vote for it on User Voice.

Use the portal

Note

The portal currently supports the SQL API and MongoDB API (preview).

The easiest method for indexing Azure Cosmos DB items is to use a wizard in the Azure portal. By sampling data and reading metadata on the container, the Import data wizard in Azure Cognitive Search can create a default index, map source fields to target index fields, and load the index in a single operation. Depending on the size and complexity of source data, you could have an operational full text search index in minutes.

We recommend using the same region or location for both Azure Cognitive Search and Azure Cosmos DB for lower latency and to avoid bandwidth charges.

1 - Prepare source data

You should have a Cosmos DB account, an Azure Cosmos DB database mapped to the SQL API, MongoDB API (preview), or Gremlin API (preview), and content in the database.

Make sure your Cosmos DB database contains data. The Import data wizard reads metadata and performs data sampling to infer an index schema, but it also loads data from Cosmos DB. If the data is missing, the wizard stops with this error: "Error detecting index schema from data source: Could not build a prototype index because datasource 'emptycollection' returned no data".

2 - Start Import data wizard

You can start the wizard from the command bar in the Azure Cognitive Search service page. If you're connecting to the Cosmos DB SQL API, you can instead click Add Azure Cognitive Search in the Settings section of your Cosmos DB account's left navigation pane.

Import data command in portal

3 - Set the data source

In the data source page, the source must be Cosmos DB, with the following specifications:

  • Name is the name of the data source object. Once created, you can choose it for other workloads.

  • Cosmos DB account should be the primary or secondary connection string from Cosmos DB, in the following format: AccountEndpoint=https://<Cosmos DB account name>.documents.azure.cn;AccountKey=<Cosmos DB auth key>;

    • For version 3.2 and version 3.6 MongoDB collections, use the following format for the Cosmos DB account in the Azure portal: AccountEndpoint=https://<Cosmos DB account name>.documents.azure.cn;AccountKey=<Cosmos DB auth key>;ApiKind=MongoDb
    • For Gremlin graphs and Cassandra tables, sign up for the gated indexer preview to get access to the preview and information about how to format the credentials.
  • Database is an existing database from the account.

  • Collection is a container of documents. Documents must exist in order for import to succeed.

  • Query can be blank if you want all documents; otherwise, you can enter a query that selects a document subset. Query is only available for the SQL API.

    Cosmos DB data source definition

4 - Skip the "Enrich content" page in the wizard

Adding cognitive skills (or enrichment) is not an import requirement. Unless you have a specific need to add AI enrichment to your indexing pipeline, you should skip this step.

To skip the step, click the blue "Next" and "Skip" buttons at the bottom of the page.

5 - Set index attributes

In the Index page, you should see a list of fields with a data type and a series of checkboxes for setting index attributes. The wizard can generate a fields list based on metadata and by sampling the source data.

You can bulk-select attributes by clicking the checkbox at the top of an attribute column. Choose Retrievable and Searchable for every field that should be returned to a client app and subject to full text search processing. You'll notice that integers are not full text or fuzzy searchable (numbers are evaluated verbatim and are often useful in filters).

Review the description of index attributes and language analyzers for more information.

Take a moment to review your selections. Once you run the wizard, physical data structures are created and you won't be able to edit these fields without dropping and recreating all objects.

Cosmos DB index definition

6 - Create indexer

When fully specified, the wizard creates three distinct objects in your search service. A data source object and an index object are saved as named resources in your Azure Cognitive Search service. The last step creates an indexer object. Naming the indexer allows it to exist as a standalone resource, which you can schedule and manage independently of the index and data source objects created in the same wizard sequence.

If you are not familiar with indexers, an indexer is a resource in Azure Cognitive Search that crawls an external data source for searchable content. The output of the Import data wizard is an indexer that crawls your Cosmos DB data source, extracts searchable content, and imports it into an index on Azure Cognitive Search.

The following screenshot shows the default indexer configuration. You can switch to Once if you want to run the indexer one time. Click Submit to run the wizard and create all objects. Indexing commences immediately.

Cosmos DB indexer definition

You can monitor data import in the portal pages. Progress notifications indicate indexing status and how many documents are uploaded.

When indexing is complete, you can use Search explorer to query your index.

Note

If you don't see the data you expect, you might need to set more attributes on more fields. Delete the index and indexer you just created, and step through the wizard again, modifying your selections for index attributes in step 5.

Use REST APIs

You can use the REST API to index Azure Cosmos DB data, following a three-part workflow common to all indexers in Azure Cognitive Search: create a data source, create an index, create an indexer. Data extraction from Cosmos DB occurs when you submit the Create Indexer request. After this request is finished, you will have a queryable index.

Note

To index data from the Cosmos DB Gremlin API or Cosmos DB Cassandra API, you must first request access to the gated previews by filling out this form. Once your request is processed, you will receive instructions for how to use the REST API version 2020-06-30-Preview to create the data source.

As mentioned earlier in this article, Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations. For Cosmos DB indexing, by default all documents are automatically indexed, except with the Cassandra API. If you turn off automatic indexing, documents can be accessed only through their self-links or by queries that use the document ID. Azure Cognitive Search indexing requires Cosmos DB automatic indexing to be turned on in the collection that will be indexed by Azure Cognitive Search. When signing up for the Cosmos DB Cassandra API indexer preview, you'll be given instructions on how to set up Cosmos DB indexing.

Warning

Azure Cosmos DB is the next generation of DocumentDB. Previously, with API version 2017-11-11, you could use the documentdb syntax. This meant that you could specify your data source type as cosmosdb or documentdb. Starting with API version 2019-05-06, both the Azure Cognitive Search APIs and the portal support only the cosmosdb syntax described in this article. This means that the data source type must be cosmosdb if you would like to connect to a Cosmos DB endpoint.

1 - Assemble inputs for the request

For each request, you must provide the service name and admin key for Azure Cognitive Search (in the request header), and the connection string for Cosmos DB (in the data source definition). You can use Postman to send HTTP requests to Azure Cognitive Search.

Copy the following three values into Notepad so that you can paste them into a request:

  • Azure Cognitive Search service name
  • Azure Cognitive Search admin key
  • Cosmos DB connection string

You can find these values in the portal:

  1. In the portal pages for Azure Cognitive Search, copy the search service URL from the Overview page.

  2. In the left navigation pane, click Keys and then copy either the primary or secondary key (they are equivalent).

  3. Switch to the portal pages for your Cosmos DB account. In the left navigation pane, under Settings, click Keys. This page provides a URI, two sets of connection strings, and two sets of keys. Copy one of the connection strings to Notepad.
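As an alternative to Postman, the requests in this section can be assembled from a script. The following is a minimal Python sketch using only the standard library. The helper name build_request and the placeholder values are hypothetical; the URL pattern and headers follow the requests shown in this article. The sketch builds the request without sending it.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"


def build_request(service_name, admin_key, resource, body):
    """Return an unsent POST request for a search service resource collection.

    `resource` is one of "datasources", "indexes", or "indexers".
    """
    url = (
        f"https://{service_name}.search.azure.cn/"
        f"{resource}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Placeholder values; substitute the ones copied from the portal.
request = build_request(
    "myservice", "my-admin-key", "datasources", {"name": "mycosmosdbdatasource"}
)
# Sending is deliberately left out: urllib.request.urlopen(request) would submit it.
```

Keeping the admin key out of source control (for example, reading it from an environment variable) is advisable in any real script.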

2 - Create a data source

A data source specifies the data to index, credentials, and policies for identifying changes in the data (such as modified or deleted documents inside your collection). The data source is defined as an independent resource so that it can be used by multiple indexers.

To create a data source, formulate a POST request:


    POST https://[service name].search.azure.cn/datasources?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
        "name": "mycosmosdbdatasource",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "AccountEndpoint=https://myCosmosDbEndpoint.documents.azure.cn;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
        },
        "container": { "name": "myCollection", "query": null },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }

The body of the request contains the data source definition, which should include the following fields:

  • name: Required. Choose any name to represent your data source object.

  • type: Required. Must be cosmosdb.

  • credentials: Required. Must be a Cosmos DB connection string.

    • For SQL collections, connection strings are in this format: AccountEndpoint=https://<Cosmos DB account name>.documents.azure.cn;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>
    • For version 3.2 and version 3.6 MongoDB collections, use the following format for the connection string: AccountEndpoint=https://<Cosmos DB account name>.documents.azure.cn;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>;ApiKind=MongoDb
    • For Gremlin graphs and Cassandra tables, sign up for the gated indexer preview to get access to the preview and information about how to format the credentials.
    • Avoid port numbers in the endpoint URL. If you include the port number, Azure Cognitive Search will be unable to index your Azure Cosmos DB database.
  • container: Contains the following elements:

    • name: Required. Specify the ID of the database collection to be indexed.
    • query: Optional. You can specify a query to flatten an arbitrary JSON document into a flat schema that Azure Cognitive Search can index. For the MongoDB API, Gremlin API, and Cassandra API, queries are not supported.
  • dataChangeDetectionPolicy: Recommended. See the Indexing changed documents section.

  • dataDeletionDetectionPolicy: Optional. See the Indexing deleted documents section.
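The field rules above can be sketched as two small Python helpers: one that assembles a connection string in the documented format (and rejects endpoints that carry a port number), and one that builds the data source definition. The function names and sample values are hypothetical, and the validation is only an illustration of the rules in the list, not the checks the service itself performs.

```python
from urllib.parse import urlparse


def build_connection_string(endpoint, auth_key, database, mongo=False):
    """Assemble a Cosmos DB connection string in the documented format."""
    if urlparse(endpoint).port is not None:
        # The article warns that endpoints with a port number cannot be indexed.
        raise ValueError("endpoint must not include a port number")
    parts = [
        f"AccountEndpoint={endpoint}",
        f"AccountKey={auth_key}",
        f"Database={database}",
    ]
    if mongo:
        parts.append("ApiKind=MongoDb")
    return ";".join(parts)


def build_data_source(name, connection_string, collection, query=None):
    """Build a data source definition with the recommended change detection policy."""
    return {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": collection, "query": query},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }


connection = build_connection_string(
    "https://myCosmosDbEndpoint.documents.azure.cn",
    "myCosmosDbAuthKey",
    "myCosmosDbDatabaseId",
)
data_source = build_data_source("mycosmosdbdatasource", connection, "myCollection")
```

The resulting dictionary matches the JSON body of the Create Data Source request shown earlier in this step.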

Using queries to shape indexed data

You can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed.

Warning

Custom queries are not supported for the MongoDB API, Gremlin API, and Cassandra API: the container.query parameter must be set to null or omitted. If you need to use a custom query, please let us know on User Voice.

Example document:

    {
        "userId": 10001,
        "contact": {
            "firstName": "andy",
            "lastName": "hoh"
        },
        "company": "microsoft",
        "tags": ["azure", "cosmosdb", "search"]
    }

Filter query:

    SELECT * FROM c WHERE c.company = "microsoft" and c._ts >= @HighWaterMark ORDER BY c._ts

Flattening query:

    SELECT c.id, c.userId, c.contact.firstName, c.contact.lastName, c.company, c._ts FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Projection query:

    SELECT VALUE { "id":c.id, "Name":c.contact.firstName, "Company":c.company, "_ts":c._ts } FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Array flattening query:

    SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @HighWaterMark ORDER BY c._ts
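To make the shape of each query's output concrete, the following Python sketch emulates the flattening, projection, and array flattening queries on the example document. It is plain Python, not the Cosmos DB query engine, and the id and _ts values are assumed for illustration because the example document does not show them.

```python
# Example document from above, plus assumed "id" and "_ts" values.
doc = {
    "id": "1",
    "userId": 10001,
    "contact": {"firstName": "andy", "lastName": "hoh"},
    "company": "microsoft",
    "tags": ["azure", "cosmosdb", "search"],
    "_ts": 1600000000,
}


def flatten(c):
    """Emulates the flattening query: nested contact fields move to the top level."""
    return {
        "id": c["id"],
        "userId": c["userId"],
        "firstName": c["contact"]["firstName"],
        "lastName": c["contact"]["lastName"],
        "company": c["company"],
        "_ts": c["_ts"],
    }


def project(c):
    """Emulates the projection query: selected properties are renamed."""
    return {
        "id": c["id"],
        "Name": c["contact"]["firstName"],
        "Company": c["company"],
        "_ts": c["_ts"],
    }


def flatten_tags(c):
    """Emulates the array flattening query: one result per element of tags."""
    return [
        {"id": c["id"], "userId": c["userId"], "tag": tag, "_ts": c["_ts"]}
        for tag in c["tags"]
    ]


flat = flatten(doc)
projected = project(doc)
per_tag = flatten_tags(doc)
```

Each emulated result keeps the _ts property, which the change detection policy relies on.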

3 - Create a target search index

Create a target Azure Cognitive Search index if you don't have one already. The following example creates an index with an ID and description field:

    POST https://[service name].search.azure.cn/indexes?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
       "name": "mysearchindex",
       "fields": [{
         "name": "id",
         "type": "Edm.String",
         "key": true,
         "searchable": false
       }, {
         "name": "description",
         "type": "Edm.String",
         "filterable": false,
         "sortable": false,
         "facetable": false,
         "suggestions": true
       }]
    }

Ensure that the schema of your target index is compatible with the schema of the source JSON documents or the output of your custom query projection.

Note

For partitioned collections, the default document key is Azure Cosmos DB's _rid property, which Azure Cognitive Search automatically renames to rid because field names cannot start with an underscore character. Also, Azure Cosmos DB _rid values contain characters that are invalid in Azure Cognitive Search keys. For this reason, the _rid values are Base64 encoded.

For MongoDB collections, Azure Cognitive Search automatically renames the _id property to id.

Mapping between JSON data types and Azure Cognitive Search data types

Each entry lists a JSON data type, followed by the compatible target index field types:

  • Bool: Edm.Boolean, Edm.String
  • Numbers that look like integers: Edm.Int32, Edm.Int64, Edm.String
  • Numbers that look like floating-points: Edm.Double, Edm.String
  • String: Edm.String
  • Arrays of primitive types, for example ["a", "b", "c"]: Collection(Edm.String)
  • Strings that look like dates: Edm.DateTimeOffset, Edm.String
  • GeoJSON objects, for example { "type": "Point", "coordinates": [long, lat] }: Edm.GeographyPoint
  • Other JSON objects: N/A
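The mapping above can be expressed as a lookup function, which may help when checking a sample document against a planned index schema. This is only a sketch of the table: the function names are hypothetical, the date test (a successful datetime.fromisoformat parse) is an assumption because the article does not say how the service recognizes date-like strings, and only string arrays are handled because that is the only array example given.

```python
from datetime import datetime


def looks_like_date(text):
    """Assumed heuristic: the string parses as an ISO 8601 date or date-time."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def compatible_types(value):
    """Return the target index field types the table lists for a JSON value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):
        return ["Edm.Int32", "Edm.Int64", "Edm.String"]
    if isinstance(value, float):
        return ["Edm.Double", "Edm.String"]
    if isinstance(value, str):
        if looks_like_date(value):
            return ["Edm.DateTimeOffset", "Edm.String"]
        return ["Edm.String"]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return ["Collection(Edm.String)"]
    if isinstance(value, dict) and value.get("type") == "Point" and "coordinates" in value:
        return ["Edm.GeographyPoint"]
    return []  # other JSON objects: no compatible field type (N/A)


tag_types = compatible_types(["azure", "cosmosdb", "search"])
```

For the example document used earlier, the nested contact object returns an empty list, which is why it needs a flattening or projection query before it can be indexed.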

4 - Configure and run the indexer

Once the index and data source have been created, you're ready to create the indexer:

    POST https://[service name].search.azure.cn/indexers?api-version=2020-06-30
    Content-Type: application/json
    api-key: [admin key]

    {
      "name" : "mycosmosdbindexer",
      "dataSourceName" : "mycosmosdbdatasource",
      "targetIndexName" : "mysearchindex",
      "schedule" : { "interval" : "PT2H" }
    }

This indexer runs every two hours (schedule interval is set to "PT2H"). To run an indexer every 30 minutes, set the interval to "PT30M". The shortest supported interval is 5 minutes. The schedule is optional: if omitted, an indexer runs only once when it's created. However, you can run an indexer on demand at any time.
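The interval strings are ISO 8601 durations. As a small illustration, the following Python sketch converts a timedelta into the schedule object used in the indexer definition and enforces the 5-minute minimum stated above. The helper name to_schedule is hypothetical, and restricting intervals to whole minutes is a simplification of this sketch, not a documented service rule.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)  # shortest supported interval per this article


def to_schedule(interval):
    """Convert a timedelta into an indexer schedule with an ISO 8601 interval."""
    if interval < MIN_INTERVAL:
        raise ValueError("the shortest supported interval is 5 minutes")
    total_minutes, seconds = divmod(int(interval.total_seconds()), 60)
    if seconds:
        raise ValueError("this sketch only handles whole minutes")
    hours, minutes = divmod(total_minutes, 60)
    duration = "PT"
    if hours:
        duration += f"{hours}H"
    if minutes:
        duration += f"{minutes}M"
    return {"interval": duration}


every_two_hours = to_schedule(timedelta(hours=2))
every_half_hour = to_schedule(timedelta(minutes=30))
```

The returned dictionary can be assigned to the "schedule" property of the indexer definition.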

For more details on the Create Indexer API, check out Create Indexer.

For more information about defining indexer schedules, see How to schedule indexers for Azure Cognitive Search.

Use .NET

The generally available .NET SDK has full parity with the generally available REST API. We recommend that you review the previous REST API section to learn concepts, workflow, and requirements. You can then refer to the .NET API reference documentation to implement a Cosmos DB indexer in managed code.

Indexing changed documents

The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the HighWaterMarkChangeDetectionPolicy using the _ts (timestamp) property provided by Azure Cosmos DB, which is specified as follows:

    {
        "@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName" : "_ts"
    }

Using this policy is highly recommended to ensure good indexer performance.

If you are using a custom query, make sure that the _ts property is projected by the query.

Incremental progress and custom queries

Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or the execution time limit, the indexer can pick up where it left off the next time it runs, instead of having to reindex the entire collection from scratch. This is especially important when indexing large collections.

To enable incremental progress when using a custom query, ensure that your query orders the results by the _ts column. This enables periodic check-pointing that Azure Cognitive Search uses to provide incremental progress in the presence of failures.
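The interplay between the _ts ordering and check-pointing can be illustrated with a small simulation. The sketch below is not how the service is implemented; it only mimics the documented query pattern (WHERE c._ts >= @HighWaterMark ORDER BY c._ts) in Python, with a hypothetical batch_size standing in for an interruption such as an execution time limit.

```python
def run_indexer_pass(documents, high_water_mark, batch_size):
    """Simulate one indexer run that stops after `batch_size` documents.

    Returns the documents processed and the new high-water mark (checkpoint).
    """
    changed = sorted(
        (d for d in documents if d["_ts"] >= high_water_mark),
        key=lambda d: d["_ts"],
    )
    batch = changed[:batch_size]
    # Because results are ordered by _ts, the last processed _ts is a safe checkpoint.
    new_mark = batch[-1]["_ts"] if batch else high_water_mark
    return batch, new_mark


documents = [
    {"id": "c", "_ts": 300},
    {"id": "a", "_ts": 100},
    {"id": "b", "_ts": 200},
]

first_batch, mark = run_indexer_pass(documents, high_water_mark=0, batch_size=2)
second_batch, mark_after = run_indexer_pass(documents, high_water_mark=mark, batch_size=2)
```

The second pass resumes from the checkpoint rather than from the start. The document at the checkpoint is read again because the comparison is >=. Without the ordering, the last processed _ts would say nothing about which documents were still unread.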

In some cases, even if your query contains an ORDER BY [collection alias]._ts clause, Azure Cognitive Search may not infer that the query is ordered by _ts. You can tell Azure Cognitive Search that results are ordered by using the assumeOrderByHighWaterMarkColumn configuration property. To specify this hint, create or update your indexer as follows:

    {
     ... other indexer definition properties
     "parameters" : {
            "configuration" : { "assumeOrderByHighWaterMarkColumn" : true } }
    } 
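When indexer definitions are kept in code, the hint can be merged into an existing definition without disturbing its other properties. The following Python sketch shows one way to do that; the helper name with_order_hint is hypothetical, and the sample definition reuses the names from the Create Indexer request earlier in this article.

```python
import copy


def with_order_hint(indexer):
    """Return a copy of an indexer definition with the ordering hint enabled."""
    updated = copy.deepcopy(indexer)  # leave the caller's definition untouched
    configuration = updated.setdefault("parameters", {}).setdefault("configuration", {})
    configuration["assumeOrderByHighWaterMarkColumn"] = True
    return updated


indexer = {
    "name": "mycosmosdbindexer",
    "dataSourceName": "mycosmosdbdatasource",
    "targetIndexName": "mysearchindex",
    "schedule": {"interval": "PT2H"},
}
hinted = with_order_hint(indexer)
```

The updated definition would then be submitted in a create or update indexer request.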

Indexing deleted documents

When documents are deleted from the collection, you normally want to delete them from the search index as well. The purpose of a data deletion detection policy is to efficiently identify deleted data items. Currently, the only supported policy is the Soft Delete policy (deletion is marked with a flag of some sort), which is specified as follows:

    {
        "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName" : "the property that specifies whether a document was deleted",
        "softDeleteMarkerValue" : "the value that identifies a document as deleted"
    }

If you are using a custom query, make sure that the property referenced by softDeleteColumnName is projected by the query.

The following example creates a data source with a soft-deletion policy:

    POST https://[service name].search.azure.cn/datasources?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
        "name": "mycosmosdbdatasource",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "AccountEndpoint=https://myCosmosDbEndpoint.documents.azure.cn;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
        },
        "container": { "name": "myCosmosDbCollectionId" },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        },
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "isDeleted",
            "softDeleteMarkerValue": "true"
        }
    }
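To show what the policy above expresses, the following Python sketch decides whether a document counts as deleted under a soft-delete policy. It is an illustration only: the function name is hypothetical, and comparing the JSON text of non-string values (so that a boolean true matches the marker "true") is an assumption about how the marker is evaluated, not documented behavior taken from this article.

```python
import json


def is_soft_deleted(document, policy):
    """Return True if the document carries the policy's deletion marker."""
    column = policy["softDeleteColumnName"]
    if column not in document:
        return False
    value = document[column]
    # Assumption: non-string values are compared by their JSON text, e.g. True -> "true".
    text = value if isinstance(value, str) else json.dumps(value)
    return text == policy["softDeleteMarkerValue"]


policy = {
    "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName": "isDeleted",
    "softDeleteMarkerValue": "true",
}
documents = [
    {"id": "1", "isDeleted": True},
    {"id": "2", "isDeleted": False},
    {"id": "3"},
]
deleted_ids = [d["id"] for d in documents if is_soft_deleted(d, policy)]
```

With this policy, only documents whose isDeleted property matches the marker would be removed from the search index; documents without the property are left alone.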

Next steps

Congratulations! You have learned how to integrate Azure Cosmos DB with Azure Cognitive Search using an indexer.