Azure Cognitive Search - frequently asked questions (FAQ)

Find answers to commonly asked questions about concepts, code, and scenarios related to Azure Cognitive Search.

Platform

How is Azure Cognitive Search different from full text search in my DBMS?

Azure Cognitive Search supports multiple data sources, linguistic analysis for many languages, custom analysis for interesting and unusual data inputs, search rank controls through scoring profiles, and user-experience features such as typeahead, hit highlighting, and faceted navigation. It also includes other features, such as synonyms and rich query syntax, but those are generally not differentiating features.

Can I pause the Azure Cognitive Search service and stop billing?

You cannot pause the service. Computational and storage resources are allocated for your exclusive use when the service is created. It's not possible to release and reclaim those resources on demand.

Indexing Operations

Can I move, back up, and restore indexes or index snapshots?

During the development phase, you may want to move your index between search services. For example, you may use a Basic or Free pricing tier to develop your index, and then move it to the Standard or higher tier for production use.

Or, you may want to back up an index snapshot to files that can be used to restore it later.

You can do all of these things with the index-backup-restore sample code in the Azure Cognitive Search .NET sample repo.

You can also get an index definition at any time using the Azure Cognitive Search REST API.
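
As a rough illustration of the REST call mentioned above, the following Python sketch builds (but does not send) a request for an index definition. The service name, index name, key placeholder, and the `2020-06-30` API version are assumptions chosen for the example, and `build_get_index_request` is a hypothetical helper, not part of any SDK.

```python
import urllib.parse
import urllib.request


def build_get_index_request(service_name, index_name, api_key, api_version="2020-06-30"):
    """Build, without sending, a GET request for one index definition."""
    # Assumed URL shape: https://<service>.search.windows.net/indexes/<index>
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{urllib.parse.quote(index_name)}"
        f"?api-version={api_version}"
    )
    # The key is sent in the api-key header; the value here is a placeholder.
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


request = build_get_index_request("my-service", "hotels", "<admin-api-key>")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` and saving the JSON response to a file would give you a copy of the definition that can be kept under source control.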

There is currently no built-in index extraction, snapshot, or backup-restore feature in the Azure portal. However, we are considering adding backup and restore functionality in a future release. If you want to show your support for this feature, cast a vote on User Voice.

Can I restore my index or service once it is deleted?

No. If you delete an Azure Cognitive Search index or service, it cannot be recovered. When you delete an Azure Cognitive Search service, all indexes in the service are deleted permanently. If you delete an Azure resource group that contains one or more Azure Cognitive Search services, all of those services are deleted permanently.

Resources such as indexes, indexers, data sources, and skillsets can only be recreated from code.

To recreate an index, you must re-index data from external sources. For this reason, it is recommended that you retain a master copy or backup of the original data in another data store, such as Azure SQL Database or Cosmos DB.

As an alternative, you can use the index-backup-restore sample code in the Azure Cognitive Search .NET sample repo to back up an index definition and index snapshot to a series of JSON files. Later, you can use the tool and files to restore the index, if needed.

Can I index from SQL Database replicas? (Applies to Azure SQL Database indexers)

There are no restrictions on the use of primary or secondary replicas as a data source when building an index from scratch. However, refreshing an index with incremental updates (based on changed records) requires the primary replica. This requirement comes from SQL Database, which guarantees change tracking on primary replicas only. If you try using secondary replicas for an index refresh workload, there is no guarantee that you get all of the data.

Search Operations

Can I search across multiple indexes?

No, this operation is not supported. Search is always scoped to a single index.

Can I restrict search index access by user identity?

You can implement security filters with the search.in() filter function. The filter composes well with identity management services such as Azure Active Directory (AAD) to trim search results based on defined user group membership.
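
A minimal sketch of how such a filter string might be assembled in Python is shown below. It assumes the index has a filterable collection field named `group_ids` (a hypothetical name) that lists the groups allowed to see each document; `build_security_filter` is an illustrative helper, not an SDK function.

```python
def build_security_filter(group_ids, field="group_ids"):
    """Return an OData filter keeping documents visible to any of the given groups."""
    if not group_ids:
        # An empty group list is ambiguous, so this sketch refuses to guess.
        raise ValueError("at least one group ID is required")
    for group_id in group_ids:
        # This sketch joins IDs with commas, so reject IDs that contain
        # a delimiter (comma or space) or a quote character.
        if any(ch in group_id for ch in ", '"):
            raise ValueError(f"unsupported character in group ID: {group_id!r}")
    joined = ",".join(group_ids)
    return f"{field}/any(g: search.in(g, '{joined}'))"


security_filter = build_security_filter(["group-a", "group-b"])
print(security_filter)
```

The resulting string would be sent as the filter of a query, with the group IDs taken from the signed-in user's group memberships.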

Why are there zero matches on terms I know to be valid?

The most common cause is not knowing that each query type supports different search behaviors and levels of linguistic analysis. Full text search, which is the predominant workload, includes a language analysis phase that breaks terms down to their root forms. This aspect of query parsing casts a broader net over possible matches, because the tokenized term matches a greater number of variants.

Wildcard, fuzzy, and regex queries, however, aren't analyzed like regular term or phrase queries, and can lead to poor recall if the query does not match the analyzed form of the word in the search index. For more information on query parsing and analysis, see query architecture.
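
The toy Python sketch below illustrates the mismatch described above. The `toy_analyze` function is a deliberately crude, hypothetical stand-in for a language analyzer (it lowercases and strips a trailing "s"); it is not how the service's analyzers work, but it shows why an unanalyzed wildcard pattern can miss a term that a regular query finds.

```python
import fnmatch


def toy_analyze(text):
    """Hypothetical analyzer: lowercase, split on whitespace, strip a trailing 's'."""
    tokens = []
    for word in text.lower().split():
        tokens.append(word[:-1] if word.endswith("s") and len(word) > 3 else word)
    return tokens


# The index stores the analyzed form: "Tours" is stored as "tour".
indexed_terms = set(toy_analyze("Guided Tours"))


def term_query(text):
    """Regular term query: the input is analyzed before matching."""
    return any(token in indexed_terms for token in toy_analyze(text))


def wildcard_query(pattern):
    """Wildcard query: the pattern is matched as-is against indexed terms."""
    return any(fnmatch.fnmatchcase(term, pattern) for term in indexed_terms)


print(term_query("Tours"))       # True: "Tours" is analyzed to "tour"
print(wildcard_query("tours*"))  # False: the index holds "tour", not "tours"
print(wildcard_query("tour*"))   # True: the pattern fits the analyzed form
```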

My wildcard searches are slow.

Most wildcard-style search queries, such as prefix, fuzzy, and regex queries, are rewritten internally with matching terms in the search index. Scanning the search index for those terms adds latency. Further, broad search queries, such as a*, are likely to be rewritten with many terms and can be very slow. For performant wildcard searches, consider defining a custom analyzer.
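
To make the cost of query rewriting concrete, here is a small Python sketch that expands a prefix against a made-up term dictionary. The term list and the `expand_prefix` helper are assumptions for illustration only; a real index holds far more terms, which is why a broad prefix can expand into a very large rewritten query.

```python
def expand_prefix(prefix, term_dictionary):
    """Return every indexed term that a prefix query would be rewritten to."""
    return sorted(term for term in term_dictionary if term.startswith(prefix))


# Made-up term dictionary standing in for the terms stored in an index.
terms = ["apple", "apply", "arch", "atlas", "tour", "tourmaline", "tours", "zebra"]

broad = expand_prefix("a", terms)       # many candidate terms
narrow = expand_prefix("tourm", terms)  # a single candidate term

print(len(broad), broad)
print(len(narrow), narrow)
```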

Why is the search rank a constant or equal score of 1.0 for every hit?

By default, search results are scored based on the statistical properties of matching terms, and ordered high to low in the result set. However, some query types (wildcard, prefix, regex) always contribute a constant score to the overall document score. This behavior is by design. Azure Cognitive Search imposes a constant score so that matches found through query expansion can be included in the results without affecting the ranking.

For example, suppose an input of "tour*" in a wildcard search produces matches on "tours", "tourettes", and "tourmaline". Given the nature of these results, there is no way to reasonably infer which terms are more valuable than others. For this reason, we ignore term frequencies when scoring results in queries of types wildcard, prefix, and regex. Search results based on a partial input are given a constant score to avoid bias towards potentially unexpected matches.
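
The contrast can be sketched with a toy scorer in Python. The documents and both scoring functions are hypothetical: a raw term count stands in for the statistical scoring used for regular queries, while every prefix match receives the same constant score.

```python
# Made-up documents, each already reduced to a list of tokens.
docs = {
    "doc1": ["tours", "tours", "tours"],
    "doc2": ["tourmaline"],
    "doc3": ["hotel"],
}


def frequency_scores(term, documents):
    """Toy stand-in for statistical scoring: score grows with term frequency."""
    return {
        doc_id: float(tokens.count(term))
        for doc_id, tokens in documents.items()
        if term in tokens
    }


def prefix_scores(prefix, documents, constant=1.0):
    """Prefix matches get the same constant score, whatever the frequency."""
    return {
        doc_id: constant
        for doc_id, tokens in documents.items()
        if any(token.startswith(prefix) for token in tokens)
    }


print(frequency_scores("tours", docs))
print(prefix_scores("tour", docs))
```

In this sketch the regular query ranks doc1 by how often the term occurs, whereas the prefix query treats doc1 and doc2 as equally relevant.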

Skillset Operations

Are there any tips or tricks to reduce cognitive services charges on ingestion?

It is understandable that you don't want to execute built-in skills or custom skills more than is absolutely necessary, especially if you have millions of documents to process. With that in mind, we have added "incremental enrichment" capabilities to skillset execution. In essence, you can provide a cache location (a blob storage connection string) that is used to store the output of "intermediate" enrichment steps. That allows the enrichment pipeline to apply only the enrichments that are necessary when you modify your skillset. This also saves indexing time, because the pipeline does less work.
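
The idea behind incremental enrichment can be sketched in Python with an in-memory dictionary standing in for the blob storage cache. Everything here is a simplified assumption: the skill names, the `version` field used to signal a modified skill, and the helpers are hypothetical, and the sketch ignores dependencies between skills that a real pipeline would have to track.

```python
import hashlib
import json

cache = {}   # stands in for the blob storage cache location
calls = []   # records which skills actually executed


def run_skill(skill, document):
    """Run one skill on one document, reusing a cached output when available."""
    key = hashlib.sha256(
        json.dumps([skill, document], sort_keys=True).encode("utf-8")
    ).hexdigest()
    if key not in cache:
        calls.append(skill["name"])  # a (billable) skill execution happens here
        cache[key] = f"{skill['name']}:v{skill['version']}({document})"
    return cache[key]


def run_skillset(skillset, documents):
    """Apply every skill to every document."""
    return [[run_skill(skill, doc) for skill in skillset] for doc in documents]


documents = ["doc-1", "doc-2"]
skillset = [{"name": "ocr", "version": 1}, {"name": "key_phrases", "version": 1}]

run_skillset(skillset, documents)
first_run_calls = len(calls)

skillset[1]["version"] = 2  # modify only the second skill
run_skillset(skillset, documents)
second_run_calls = len(calls) - first_run_calls

print(first_run_calls, second_run_calls)
```

On the second run only the modified skill executes again; the outputs of the unchanged skill come from the cache.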

Learn more about incremental enrichment.

Design patterns

Most customers choose dedicated fields over a collection when it comes to supporting different locales (languages) in the same index. Locale-specific fields make it possible to assign an appropriate analyzer, for example by assigning the Microsoft French Analyzer to a field containing French strings. They also simplify filtering. If you know that a query is initiated on an fr-fr page, you can limit search results to the French field. Or, you can create a scoring profile to give that field more relative weight. Azure Cognitive Search supports over 50 language analyzers to choose from.
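
As a sketch of the locale-specific field pattern, the Python snippet below describes two language fields and scopes a query to one of them. The field names (`description_en`, `description_fr`), the sample search text, and the `search_field_for_locale` helper are assumptions for illustration; `en.microsoft` and `fr.microsoft` are used here as the Microsoft English and French analyzer names.

```python
import json

# Hypothetical field definitions: one dedicated field per locale.
fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "description_en", "type": "Edm.String", "searchable": True,
     "analyzer": "en.microsoft"},
    {"name": "description_fr", "type": "Edm.String", "searchable": True,
     "analyzer": "fr.microsoft"},
]


def search_field_for_locale(locale):
    """Map a page locale such as 'fr-fr' to its dedicated description field."""
    language = locale.split("-")[0].lower()
    name = f"description_{language}"
    if name not in {field["name"] for field in fields}:
        raise ValueError(f"no field defined for locale {locale!r}")
    return name


# Query body scoped to the French field for a request from an fr-fr page.
query = {"search": "piscine", "searchFields": search_field_for_locale("fr-fr")}
print(json.dumps(query))
```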

Next steps

Is your question about a missing feature or functionality? Request the feature on the User Voice web site.

See also

StackOverflow: Azure Cognitive Search
How full text search works in Azure Cognitive Search
What is Azure Cognitive Search?