How to rebuild an index in Azure Cognitive Search

This article explains how to rebuild an Azure Cognitive Search index, the circumstances under which a rebuild is required, and recommendations for mitigating the impact of a rebuild on ongoing query requests.

A rebuild refers to dropping and recreating the physical data structures associated with an index, including all field-based inverted indexes. In Azure Cognitive Search, you cannot drop and recreate individual fields. To rebuild an index, all field storage must be deleted, recreated based on an existing or revised index schema, and then repopulated with data pushed to the index or pulled from external sources.

It's common to rebuild indexes during development when you are iterating over index design, but you might also need to rebuild a production-level index to accommodate structural changes, such as adding complex types or adding fields to suggesters.

"Rebuild" versus "refresh"

Rebuild should not be confused with refreshing the contents of an index with new, modified, or deleted documents. Refreshing a search corpus is almost a given in every search app, with some scenarios requiring up-to-the-minute updates (for example, when a search corpus needs to reflect inventory changes in an online sales app).

As long as you are not changing the structure of the index, you can refresh an index using the same techniques that you used to load the index initially:

  • For push-mode indexing, call Add, Update or Delete Documents to push the changes to an index.

  • For indexers, you can schedule indexer execution and use change tracking or timestamps to identify the delta. If updates must be reflected faster than a scheduler can manage, you can use push-mode indexing instead.
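As a rough illustration of the push-mode option above, the sketch below builds the JSON body that an Add, Update or Delete Documents request carries. The helper name `build_batch`, the key field name `id`, and the sample `stock` field are assumptions made for this example; the code only constructs the payload and sends nothing.

```python
import json

def build_batch(upserts, deleted_keys, key_field="id"):
    """Build the body for an Add, Update or Delete Documents request.

    key_field is an assumption: use the key field defined in your own index.
    """
    actions = []
    for doc in upserts:
        # mergeOrUpload updates a document if it exists and inserts it otherwise.
        actions.append({"@search.action": "mergeOrUpload", **doc})
    for key in deleted_keys:
        # A delete action only needs the document key.
        actions.append({"@search.action": "delete", key_field: key})
    return {"value": actions}

# Hypothetical inventory refresh: two stock updates and one removed item.
batch = build_batch(
    upserts=[{"id": "1", "stock": 12}, {"id": "2", "stock": 0}],
    deleted_keys=["3"],
)
body = json.dumps(batch)
```

The resulting `body` would be sent with a POST to the index's `docs/index` endpoint, authenticated with an admin API key.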

Rebuild conditions

Drop and recreate an index if any of the following conditions are true.

Condition | Description
Change a field definition | Revising a field name, data type, or specific index attributes (searchable, filterable, sortable, facetable) requires a full rebuild.
Assign an analyzer to a field | Analyzers are defined in an index and then assigned to fields. You can add a new analyzer definition to an index at any time, but you can only assign an analyzer when the field is created. This is true for both the analyzer and indexAnalyzer properties. The searchAnalyzer property is an exception (you can assign this property to an existing field).
Update or delete an analyzer definition in an index | You cannot delete or change an existing analyzer configuration (analyzer, tokenizer, token filter, or char filter) in the index unless you rebuild the entire index.
Add a field to a suggester | If a field already exists and you want to add it to a Suggesters construct, you must rebuild the index.
Delete a field | To physically remove all traces of a field, you have to rebuild the index. When an immediate rebuild is not practical, you can modify application code to disable access to the "deleted" field or use the $select query parameter to choose which fields are represented in the result set. Physically, the field definition and contents remain in the index until the next rebuild, when you apply a schema that omits the field in question.
Switch tiers | If you require more capacity, there is no in-place upgrade in the Azure portal. A new service must be created, and indexes must be built from scratch on the new service. To help automate this process, you can use the index-backup-restore sample code in the Azure Cognitive Search .NET sample repo. This app backs up your index to a series of JSON files, and then recreates the index in a search service you specify.

Update conditions

Many other modifications can be made without impacting existing physical structures. Specifically, the following changes do not require an index rebuild. For these changes, you can update the existing index definition in place.

  • Add a new field
  • Set the retrievable attribute on an existing field
  • Set a searchAnalyzer on an existing field
  • Add a new analyzer definition in an index
  • Add, update, or delete scoring profiles
  • Add, update, or delete CORS settings
  • Add, update, or delete synonymMaps

When you add a new field, existing indexed documents are given a null value for the new field. On a future data refresh, values from external source data replace the nulls added by Azure Cognitive Search. For more information on updating index content, see Add, Update or Delete Documents.
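The rebuild and update conditions can be approximated in code. The following is a minimal sketch, not a service feature: the function `rebuild_reasons` and the plain-dictionary schema shape are assumptions for illustration, and it covers only field and suggester changes (analyzer definition changes and tier switches are left out). A renamed field shows up as a deleted field plus a new one.

```python
# Attributes the article lists as changeable on an existing field without a rebuild.
UPDATABLE_ATTRS = {"retrievable", "searchAnalyzer"}

def suggester_fields(schema):
    """Collect every field name referenced by any suggester in the schema."""
    return {name for sg in schema.get("suggesters", []) for name in sg["sourceFields"]}

def rebuild_reasons(old_schema, new_schema):
    """Return reasons a rebuild is needed; an empty list means an update suffices."""
    reasons = []
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    new_fields = {f["name"]: f for f in new_schema["fields"]}

    for name, old in old_fields.items():
        new = new_fields.get(name)
        if new is None:
            reasons.append(f"field '{name}' deleted")
            continue
        changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
        blocking = changed - UPDATABLE_ATTRS
        if blocking:
            reasons.append(f"field '{name}' changed: {sorted(blocking)}")

    # Adding an already existing field to a suggester also forces a rebuild.
    added = (suggester_fields(new_schema) - suggester_fields(old_schema)) & set(old_fields)
    for name in sorted(added):
        reasons.append(f"existing field '{name}' added to a suggester")
    return reasons

old = {"fields": [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "city", "type": "Edm.String", "filterable": False},
]}
# Only a new field is added: no rebuild needed.
added_only = {"fields": old["fields"] + [{"name": "rating", "type": "Edm.Int32"}]}
# An existing field's filterable attribute is changed: rebuild needed.
changed_attr = {"fields": [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "city", "type": "Edm.String", "filterable": True},
]}
```

Here `rebuild_reasons(old, added_only)` returns an empty list, while `rebuild_reasons(old, changed_attr)` reports the change to `city`.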

How to rebuild an index

During development, the index schema changes frequently. You can plan for this by creating indexes that can be deleted, recreated, and reloaded quickly with a small representative data set.

For applications already in production, we recommend creating a new index that runs side by side with the existing index to avoid query downtime. Your application code provides redirection to the new index.
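One way to picture the redirection is a single place in application code that holds the name of the index queries should target. This is only a sketch under assumptions: the `IndexRouter` class and the index names `hotels-v1` and `hotels-v2` are hypothetical, and a real application might keep the name in configuration instead.

```python
class IndexRouter:
    """Holds the name of the index that queries should target (hypothetical helper)."""

    def __init__(self, active_index):
        self._active = active_index

    @property
    def active_index(self):
        return self._active

    def switch_to(self, new_index):
        """Point queries at new_index and return the previous name,
        so the old index can be dropped once it is no longer needed."""
        previous, self._active = self._active, new_index
        return previous

router = IndexRouter("hotels-v1")
# After hotels-v2 has been created and fully loaded alongside hotels-v1:
retired = router.switch_to("hotels-v2")
```

After the switch, `router.active_index` is `hotels-v2` and `retired` holds `hotels-v1`.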

Indexing does not run in the background; the service balances the additional indexing load against ongoing queries. During indexing, you can monitor query requests in the portal to ensure queries are completing in a timely manner.

  1. Determine whether a rebuild is required. If you are just adding fields, or changing some part of the index that is unrelated to fields, you might be able to update the definition without deleting, recreating, and fully reloading the index.

  2. Get an index definition in case you need it for future reference.

  3. Drop the existing index, assuming you are not running new and old indexes side by side.

    Any queries targeting that index are immediately dropped. Remember that deleting an index is irreversible, destroying physical storage for the fields collection and other constructs. Pause to think about the implications before dropping it.

  4. Create a revised index, where the body of the request includes changed or modified field definitions.

  5. Load the index with documents from an external source.
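Steps 2 through 5 map onto a short sequence of REST calls. The sketch below only assembles that sequence as data and sends nothing. The function name `rebuild_plan`, the service name, the sample schema, and the choice of API version `2020-06-30` are assumptions for illustration; each request would also need an admin API key header.

```python
import json

API_VERSION = "2020-06-30"  # assumption: substitute the version your service uses

def rebuild_plan(service, index_name, revised_schema, documents):
    """Return the (method, url, body) tuples for a drop-and-recreate rebuild."""
    base = f"https://{service}.search.windows.net/indexes/{index_name}"
    qs = f"?api-version={API_VERSION}"
    upload = {"value": [{"@search.action": "upload", **d} for d in documents]}
    return [
        ("GET", base + qs, None),                                 # step 2: save the current definition
        ("DELETE", base + qs, None),                              # step 3: irreversible drop
        ("PUT", base + qs, json.dumps(revised_schema)),           # step 4: create the revised index
        ("POST", base + "/docs/index" + qs, json.dumps(upload)),  # step 5: reload documents
    ]

# Hypothetical service, index, schema, and data.
schema = {"name": "hotels", "fields": [{"name": "id", "type": "Edm.String", "key": True}]}
plan = rebuild_plan("my-service", "hotels", schema, [{"id": "1"}])
for method, url, _ in plan:
    print(method, url)
```

Keeping the plan as data makes it easy to review the destructive DELETE step before anything is executed.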

When you create the index, physical storage is allocated for each field in the index schema, with an inverted index created for each searchable field. Fields that are not searchable can be used in filters or expressions, but do not have inverted indexes and are not full-text or fuzzy searchable. On an index rebuild, these inverted indexes are deleted and recreated based on the index schema you provide.

When you load the index, each field's inverted index is populated with all of the unique, tokenized words from each document, with a map to corresponding document IDs. For example, when indexing a hotels data set, an inverted index created for a City field might contain terms for Seattle, Portland, and so forth. Documents that include Seattle or Portland in the City field would have their document ID listed alongside the term. On any Add, Update or Delete operation, the terms and document ID list are updated accordingly.
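The idea of an inverted index can be shown with a toy model. This is a conceptual sketch only and says nothing about how the service stores its data: lowercasing and splitting on whitespace stand in for a real analyzer, and the function name and sample hotel documents are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(docs, field, key_field="id"):
    """Map each term found in `field` to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc in docs:
        value = doc.get(field) or ""
        # Stand-in for an analyzer: lowercase, then split on whitespace.
        for term in value.lower().split():
            index[term].add(doc[key_field])
    return index

hotels = [
    {"id": "1", "City": "Seattle"},
    {"id": "2", "City": "Portland"},
    {"id": "3", "City": "Seattle"},
]
city_index = build_inverted_index(hotels, "City")
print(sorted(city_index["seattle"]))   # ['1', '3']
print(sorted(city_index["portland"]))  # ['2']
```

Adding, updating, or deleting a document corresponds to adding or removing its ID under the affected terms.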

Note

If you have stringent SLA requirements, you might consider provisioning a new service specifically for this work, with development and indexing occurring in full isolation from a production index. A separate service runs on its own hardware, eliminating any possibility of resource contention. When development is complete, you would either leave the new index in place, redirecting queries to the new endpoint and index, or you would run finished code to publish a revised index on your original Azure Cognitive Search service. There is currently no mechanism for moving a ready-to-use index to another service.

Check for updates

You can begin querying an index as soon as the first document is loaded. If you know a document's ID, the Lookup Document REST API returns the specific document. For broader testing, you should wait until the index is fully loaded, and then use queries to verify the content you expect to see.

You can use Search Explorer or a Web testing tool like Postman to check for updated content.

If you added or renamed a field, use $select to return that field: search=*&$select=document-id,my-new-field,some-old-field&$count=true
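For scripted checks, the same query string and a document lookup can be assembled as URLs. The sketch below is illustrative only: the helper names, the service name `my-service`, the index name `hotels`, and API version `2020-06-30` are assumptions, and the code builds URLs without sending any request (a real call also needs an API key header).

```python
from urllib.parse import quote, urlencode

API_VERSION = "2020-06-30"  # assumption: substitute the version your service uses

def search_url(service, index_name, select):
    """URL for a search that returns only the selected fields plus a total count."""
    params = {
        "api-version": API_VERSION,
        "search": "*",
        "$select": ",".join(select),
        "$count": "true",
    }
    base = f"https://{service}.search.windows.net/indexes/{index_name}/docs"
    # Keep *, $ and , unescaped so the query string stays readable.
    return base + "?" + urlencode(params, safe="*$,")

def lookup_url(service, index_name, key):
    """URL for retrieving a single document by its key."""
    base = f"https://{service}.search.windows.net/indexes/{index_name}/docs"
    return f"{base}/{quote(key, safe='')}?api-version={API_VERSION}"

url = search_url("my-service", "hotels", ["document-id", "my-new-field", "some-old-field"])
print(url)
print(lookup_url("my-service", "hotels", "42"))
```

If the new field appears in the results with the expected values, the revised schema and the reload both took effect.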

See also