Scale for performance on Azure Cognitive Search

This article describes best practices for advanced scenarios that have sophisticated requirements for scalability and availability.

Start with baseline numbers

Before undertaking a larger deployment effort, make sure you know what a typical query load looks like. The following guidelines can help you arrive at baseline query numbers.

  1. Pick a target latency (the maximum amount of time) that a typical search request should take to complete.

  2. Create and test a real workload against your search service, using a realistic data set, to measure these latencies.

  3. Start with a low number of queries per second (QPS), then gradually increase the rate in the test until query latency exceeds the predefined target. This is an important benchmark to help you plan for scale as your application grows in usage.

  4. Wherever possible, reuse HTTP connections. If you are using the Azure Cognitive Search .NET SDK, this means you should reuse a single SearchClient instance; if you are using the REST API, you should reuse a single HttpClient.

  5. Vary the substance of query requests so that search occurs over different parts of your index. Variation is important because if you continually execute the same search requests, data caching will start to make performance look better than it would with a more disparate query set.

  6. Vary the structure of query requests so that you get different types of queries. Not every search query performs at the same level. For example, a document lookup or search suggestion is typically faster than a query with a significant number of facets and filters. The test mix should include various queries, in roughly the same ratios as you would expect in production.
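The ramp in step 3 and the varied query mix in steps 5 and 6 can be sketched in Python. This is a minimal illustration, not a load-testing tool: `find_max_qps` and `build_query_mix` are hypothetical helpers, and `measure_latency_ms` stands in for whatever your harness uses to drive traffic at a given rate and report a latency figure (for example, the 95th percentile). The query kinds and weights are placeholders for your own production ratios.

```python
import random
from typing import Callable, Dict, List, Optional


def find_max_qps(
    measure_latency_ms: Callable[[int], float],
    target_ms: float,
    start_qps: int = 5,
    step_qps: int = 5,
    max_qps: int = 1000,
) -> Optional[int]:
    """Ramp the query rate upward and return the highest QPS whose measured
    latency stays within target_ms, or None if even start_qps misses it."""
    best = None
    qps = start_qps
    while qps <= max_qps:
        if measure_latency_ms(qps) > target_ms:
            break  # latency exceeded the target; the previous rate is the ceiling
        best = qps
        qps += step_qps
    return best


def build_query_mix(
    n: int, weights: Dict[str, float], terms: List[str], seed: int = 0
) -> List[Dict[str, str]]:
    """Generate n query descriptors whose kinds follow production-like ratios
    and whose terms vary, so caching does not flatter the results."""
    rng = random.Random(seed)
    kinds = list(weights)
    chosen = rng.choices(kinds, weights=[weights[k] for k in kinds], k=n)
    return [{"kind": kind, "term": rng.choice(terms)} for kind in chosen]
```

For example, with a synthetic latency model of `20 + 2 * qps` milliseconds and a 100 ms target, `find_max_qps(lambda q: 20 + 2 * q, 100)` returns 40.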
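Step 4 is about connection reuse. The .NET `SearchClient` and `HttpClient` are not shown here; as a rough Python analogue using only the standard library, this sketch keeps one `http.client.HTTPSConnection` per host and sends queries to the Search Documents REST endpoint (`POST /indexes/{index}/docs/search`). The helper names are hypothetical. Note that `http.client` connections are not thread-safe, so a multithreaded load test would need one connection per thread or a pooling HTTP client.

```python
import http.client
import json
from functools import lru_cache


@lru_cache(maxsize=None)
def get_connection(host: str) -> http.client.HTTPSConnection:
    """Return one shared HTTPS connection per host.

    The constructor does not open a socket; the connection is established on
    the first request and then kept alive for later requests.
    """
    return http.client.HTTPSConnection(host, timeout=30)


def search(host: str, index: str, api_version: str, api_key: str, body: dict) -> dict:
    """POST a query to the Search Documents endpoint over the shared connection."""
    conn = get_connection(host)
    conn.request(
        "POST",
        f"/indexes/{index}/docs/search?api-version={api_version}",
        body=json.dumps(body),
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    resp = conn.getresponse()
    data = resp.read()  # drain the response fully so the connection can be reused
    if resp.status != 200:
        raise RuntimeError(f"search failed with HTTP {resp.status}")
    return json.loads(data)
```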

While creating these test workloads, keep in mind the following characteristics of Azure Cognitive Search:

  • It is possible to overload your service by pushing too many search queries at one time. When this happens, you will see HTTP 503 response codes. To avoid 503 errors during testing, start with various ranges of search requests to see the differences in latency as you add more search requests.

  • Azure Cognitive Search does not run indexing tasks in the background. If your service handles query and indexing workloads concurrently, account for this either by introducing indexing jobs into your query tests or by exploring options for running indexing jobs during off-peak hours.
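When a client does receive HTTP 503, a common pattern is to retry with exponential backoff and jitter. The sketch below is a generic illustration rather than documented Azure Cognitive Search behavior; `send` is a hypothetical callable that issues one request and returns its status code. During baseline testing, log each 503 as well, because retries can hide the point at which the service became overloaded, and finding that point is the purpose of the test.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() and retry while it returns 503, doubling the base delay
    each time and adding random jitter. Returns the last status code."""
    status = send()
    attempt = 1
    while status == 503 and attempt < max_attempts:
        delay = base_delay_s * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
        status = send()
        attempt += 1
    return status
```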

Scale for high query volume

A service is overburdened when queries take too long or when the service starts dropping requests. If this happens, you can address the problem in one of two ways:

  • Add replicas

    Each replica is a copy of your data, allowing the service to load balance requests against multiple copies. Azure Cognitive Search manages all load balancing and data replication, and you can change the number of replicas allocated to your service at any time. You can allocate up to 12 replicas in a Standard search service and 3 replicas in a Basic search service. Replicas can be adjusted either from the Azure portal or PowerShell.

  • Create a new service at a higher tier

    Azure Cognitive Search comes in a number of tiers, each offering a different level of performance. In some cases, you may have so many queries that the tier you are on cannot provide sufficient turnaround, even when replicas are maxed out. In this case, consider moving to a higher-performing tier, such as Standard S3, which is designed for scenarios with large numbers of documents and extremely high query workloads.
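Assuming query throughput grows roughly linearly with the number of replicas (an approximation you should confirm by testing), a rough replica estimate follows from the per-replica QPS measured during baseline testing. The helper below is hypothetical. It uses the replica limits stated above and reports whether the target fits in the tier, which signals when moving to a higher tier is worth considering.

```python
import math
from typing import Tuple

# Maximum replicas per service, as stated above.
MAX_REPLICAS = {"standard": 12, "basic": 3}


def replicas_needed(target_qps: float, qps_per_replica: float, tier: str) -> Tuple[int, bool]:
    """Estimate replicas for a target QPS, assuming roughly linear scaling.

    Returns (replicas to allocate, whether the target fits within the tier cap).
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    needed = max(1, math.ceil(target_qps / qps_per_replica))
    cap = MAX_REPLICAS[tier]
    return min(needed, cap), needed <= cap
```

For example, a 250 QPS target with 40 QPS per replica needs 7 replicas on Standard, but exceeds the Basic cap of 3.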

Scale for slow individual queries

Another reason for high latency is a single query taking too long to complete. In this case, adding replicas will not help. The following options might help:

  • Increase partitions

    A partition splits data across extra computing resources. Two partitions split data in half, three partitions split it into thirds, and so forth. One positive side effect is that slower queries sometimes perform faster due to parallel computing. We have noted this parallelization benefit on low-selectivity queries, such as queries that match many documents, or facets providing counts over a large number of documents. Because significant computation is required to score the relevance of documents or to count them, adding extra partitions helps such queries complete faster.

    A Standard search service can have a maximum of 12 partitions; a Basic search service is limited to 1 partition. Partitions can be adjusted either from the Azure portal or PowerShell.

  • Limit high-cardinality fields

    A high-cardinality field is a facetable or filterable field that has a significant number of unique values and, as a result, consumes significant resources when computing results. For example, setting a Product ID or Description field as facetable or filterable would count as high cardinality, because most values are unique from document to document. Wherever possible, limit the number of high-cardinality fields.

  • Increase search tier

    Moving up to a higher Azure Cognitive Search tier can be another way to improve the performance of slow queries. Each higher tier provides faster CPUs and more memory, both of which have a positive impact on query performance.
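The benefit of partitions for low-selectivity queries comes from fan-out: each partition scores its share of the documents, and the partial results are merged. The sketch below illustrates that scatter-gather shape with a toy term-count score. It is a conceptual model, not how Azure Cognitive Search is implemented internally, and all names are hypothetical. In CPython, threads will not actually speed up pure-Python scoring because of the global interpreter lock; the point is that merging each partition's top k yields the same global top k.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Doc = Tuple[str, str]    # (doc_id, text)
Hit = Tuple[float, str]  # (score, doc_id)
Scorer = Callable[[str, str], float]


def split_into_partitions(docs: List[Doc], n: int) -> List[List[Doc]]:
    """Distribute documents round-robin into n roughly equal partitions."""
    return [docs[i::n] for i in range(n)]


def top_k_in_partition(part: List[Doc], score: Scorer, query: str, k: int) -> List[Hit]:
    """Score every document in one partition and keep only its best k."""
    return heapq.nlargest(k, ((score(query, text), doc_id) for doc_id, text in part))


def scatter_gather_search(
    partitions: List[List[Doc]], score: Scorer, query: str, k: int
) -> List[Hit]:
    """Score all partitions concurrently, then merge their top-k lists."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: top_k_in_partition(p, score, query, k), partitions))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged)


def term_count_score(query: str, text: str) -> float:
    """Toy relevance score: number of distinct query terms found in the text."""
    words = set(text.lower().split())
    return float(sum(term in words for term in set(query.lower().split())))
```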
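To spot candidate high-cardinality fields before marking them facetable or filterable, you could compute, over a sample of documents, the ratio of distinct values to documents for each field. This is a hypothetical audit helper: the 0.5 threshold is an arbitrary placeholder, and it assumes scalar field values (collection fields would need their values flattened first).

```python
from typing import Dict, Iterable, List


def cardinality_report(docs: Iterable[Dict], fields: List[str]) -> Dict[str, float]:
    """Return distinct-value count divided by document count for each field."""
    docs = list(docs)
    if not docs:
        return {f: 0.0 for f in fields}
    return {f: len({d.get(f) for d in docs}) / len(docs) for f in fields}


def flag_high_cardinality(report: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """List fields whose distinct-value ratio exceeds the (hypothetical) threshold."""
    return sorted(f for f, ratio in report.items() if ratio > threshold)
```

With product-like sample data, a Product ID or Description field comes out near 1.0 and would be flagged, while a category field with a handful of values would not.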

Scale for availability

Replicas not only help reduce query latency but can also provide high availability. With a single replica, you should expect periodic downtime due to server reboots after software updates or other maintenance events. As a result, it is important to consider whether your application requires high availability of searches (queries) as well as writes (indexing events). Azure Cognitive Search offers SLA options on all paid search offerings with the following attributes:

  • Two replicas for high availability of read-only workloads (queries)

  • Three or more replicas for high availability of read-write workloads (queries and indexing)

For more details, see the Azure Cognitive Search Service Level Agreement.
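The replica thresholds above can be expressed as a small check. This only encodes the rule as stated in this article (paid tiers only, two replicas for queries, three for queries plus indexing); the function is hypothetical, and the SLA itself remains the authoritative source.

```python
from typing import Dict


def sla_coverage(replicas: int, paid_tier: bool = True) -> Dict[str, bool]:
    """Apply the replica rules stated above: the SLA applies only to paid tiers,
    read-only (query) workloads need 2+ replicas, read-write workloads need 3+."""
    return {
        "read_only": paid_tier and replicas >= 2,
        "read_write": paid_tier and replicas >= 3,
    }
```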

Because replicas are copies of your data, having multiple replicas allows Azure Cognitive Search to perform machine reboots and maintenance on one replica while query execution continues on the others. Conversely, if you take replicas away, you'll incur query performance degradation, unless those replicas were an under-utilized resource.

Scale for geo-distributed workloads and geo-redundancy

For geo-distributed workloads, users located far from the host data center will experience higher latency. One mitigation is to provision multiple search services in regions closer to these users.

Azure Cognitive Search does not currently provide an automated method of geo-replicating indexes across regions, but there are some techniques that make this process simple to implement and manage. These are outlined in the next few sections.

The goal of a geo-distributed set of search services is to have two or more indexes available in two or more regions, where each user is routed to the Azure Cognitive Search service that provides the lowest latency, as seen in this example:


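Traffic Manager's Performance routing method can route users to the lowest-latency endpoint at the DNS level. If you instead choose a regional service from your own code, one simple approach is to time a lightweight probe against each candidate and pick the fastest. The sketch below is hypothetical: the service names and probes are placeholders, and a single timing is noisy, so a real implementation would average several probes and refresh the choice periodically.

```python
import time
from typing import Callable, Dict


def pick_lowest_latency(
    probes: Dict[str, Callable[[], None]],
    clock: Callable[[], float] = time.perf_counter,
) -> str:
    """Time one probe per regional service and return the fastest reachable one."""
    timings: Dict[str, float] = {}
    for name, probe in probes.items():
        start = clock()
        try:
            probe()
        except OSError:
            continue  # unreachable services are skipped
        timings[name] = clock() - start
    if not timings:
        raise RuntimeError("no search service responded")
    return min(timings, key=timings.get)
```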
Keep data synchronized across multiple services

There are two options for keeping your distributed search services in sync: using the Azure Cognitive Search indexer, or using the Push API (also referred to as the Azure Cognitive Search REST API).

Use indexers for updating content on multiple services

If you are already using an indexer on one service, you can configure a second indexer on a second service to use the same data source object, pulling data from the same location. Each service in each region has its own indexer and target index (your search index is not shared, which means data is duplicated), but each indexer references the same data source.

Here is a high-level visual of what that architecture would look like.


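Each service needs its own indexer definition. The following sketch builds the same Create or Update Indexer request (`PUT https://{service}.search.windows.net/indexers/{name}?api-version=...`) for several services. `name`, `dataSourceName`, `targetIndexName`, and `schedule` are indexer properties, while the service names, object names, and the helper itself are hypothetical. It assumes that on every service a data source with that name (pointing at the same underlying store) and the target index already exist. A real request also needs an admin `api-key` header, which is omitted here.

```python
import json
from typing import Dict, List


def indexer_requests(services: List[str], indexer: Dict, api_version: str) -> List[Dict[str, str]]:
    """Build one Create or Update Indexer request per search service."""
    body = json.dumps(indexer)  # identical definition on every service
    return [
        {
            "method": "PUT",
            "url": (
                f"https://{service}.search.windows.net/indexers/{indexer['name']}"
                f"?api-version={api_version}"
            ),
            "body": body,
        }
        for service in services
    ]


# Hypothetical object names; schedule.interval is an ISO 8601 duration.
products_indexer = {
    "name": "products-indexer",
    "dataSourceName": "products-ds",
    "targetIndexName": "products",
    "schedule": {"interval": "PT2H"},
}
```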
Use REST APIs for pushing content updates on multiple services

If you are using the Azure Cognitive Search REST API to push content into your Azure Cognitive Search index, you can keep your various search services in sync by pushing changes to all of them whenever an update is required. In your code, make sure to handle cases where an update to one search service fails but succeeds for the others.
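One way to handle such partial failures is to send each batch to every service, record which services failed, and retry only those. In the sketch below, `upload` is a hypothetical callable that would send the batch to one service's Index Documents endpoint (`POST /indexes/{index}/docs/index`). Because that API can report per-document failures with HTTP 207, a real `upload` should raise in that case too, not only on transport errors.

```python
from typing import Callable, Dict, List

Uploader = Callable[[str, List[Dict]], None]


def push_to_all(services: List[str], batch: List[Dict], upload: Uploader) -> Dict[str, Exception]:
    """Send the same batch to every service; a failure on one does not stop the
    others. Returns the failures keyed by service name."""
    failures: Dict[str, Exception] = {}
    for service in services:
        try:
            upload(service, batch)
        except Exception as exc:  # record and continue with the next service
            failures[service] = exc
    return failures


def push_with_retry(
    services: List[str], batch: List[Dict], upload: Uploader, attempts: int = 3
) -> Dict[str, Exception]:
    """Retry only the services that failed; return whatever still fails."""
    remaining = list(services)
    failures: Dict[str, Exception] = {}
    for _ in range(attempts):
        failures = push_to_all(remaining, batch, upload)
        if not failures:
            break
        remaining = list(failures)
    return failures
```

Anything still returned by `push_with_retry` leaves the services out of sync, so it would need to be queued or alerted on rather than dropped.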

Leverage Azure Traffic Manager

Azure Traffic Manager allows you to route requests to multiple geo-located websites that are in turn backed by multiple search services. One advantage of Traffic Manager is that it can probe Azure Cognitive Search to ensure that it is available and route users to alternate search services in the event of downtime. In addition, if you are routing search requests through Azure Web Sites, Azure Traffic Manager allows you to load balance cases where the website is up but Azure Cognitive Search is not. Here is an example of an architecture that leverages Traffic Manager.


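Traffic Manager performs endpoint monitoring and failover itself; it is configured rather than coded. To illustrate the logic of its Priority routing method, the sketch below returns the first endpoint, in priority order, whose health probe passes. The endpoint names and probes are hypothetical. For the scenario above, a probe might call a website endpoint that in turn issues a test query to its search service, so that an outage in either one marks the endpoint unhealthy.

```python
from typing import Callable, List, Optional, Tuple


def choose_endpoint(endpoints: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Return the first endpoint, in priority order, whose health probe passes."""
    for name, is_healthy in endpoints:
        try:
            if is_healthy():
                return name
        except OSError:
            pass  # a probe that cannot connect counts as unhealthy
    return None
```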
Next steps

To learn more about the pricing tiers and the service limits for each one, see Service limits. See Plan for capacity to learn more about partition and replica combinations.