Design patterns for multitenant SaaS applications and Azure Cognitive Search

A multitenant application is one that provides the same services and capabilities to any number of tenants who cannot see or share the data of any other tenant. This document discusses tenant isolation strategies for multitenant applications built with Azure Cognitive Search.

Azure Cognitive Search concepts

As a search-as-a-service solution, Azure Cognitive Search allows developers to add rich search experiences to applications without managing any infrastructure or becoming an expert in information retrieval. Data is uploaded to the service and then stored in the cloud. Using simple requests to the Azure Cognitive Search API, the data can then be modified and searched.

Search services, indexes, fields, and documents

Before discussing design patterns, it is important to understand a few basic concepts.

When using Azure Cognitive Search, one subscribes to a search service. As data is uploaded to Azure Cognitive Search, it is stored in an index within the search service. There can be a number of indexes within a single service. To use the familiar concepts of databases, the search service can be likened to a database while the indexes within a service can be likened to tables within a database.

Each index within a search service has its own schema, which is defined by a number of customizable fields. Data is added to an Azure Cognitive Search index in the form of individual documents. Each document must be uploaded to a particular index and must fit that index's schema. When searching data using Azure Cognitive Search, the full-text search queries are issued against a particular index. To compare these concepts to those of a database, fields can be likened to columns in a table and documents can be likened to rows.
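To make the relationship between an index schema and its documents concrete, the following is a minimal sketch of a client-side check. The index name, the field names (`id`, `title`, `rating`), and the helper `fits_schema` are hypothetical and introduced only for illustration. The dictionary follows the general JSON shape of an index definition (a name plus a list of typed fields), and the check is a simplification, not the validation the service itself performs.

```python
# Hypothetical index definition: a name plus a list of typed fields.
# The field names are invented for this example.
index_definition = {
    "name": "hotels",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "rating", "type": "Edm.Int32", "filterable": True},
    ],
}

# Simplified mapping from two EDM types to Python types (assumption:
# only these two types are needed for the example).
PYTHON_TYPES = {"Edm.String": str, "Edm.Int32": int}


def fits_schema(document: dict, definition: dict) -> bool:
    """Return True if the document has the key field and every value
    belongs to a known field of the matching type."""
    fields = {f["name"]: f for f in definition["fields"]}
    key_fields = [f["name"] for f in definition["fields"] if f.get("key")]
    # Like a table row needs its primary key, a document needs its key field.
    if any(name not in document for name in key_fields):
        return False
    for name, value in document.items():
        field = fields.get(name)
        if field is None:
            return False  # field is not part of this index's schema
        if not isinstance(value, PYTHON_TYPES[field["type"]]):
            return False  # value has the wrong type for this field
    return True
```

In the database analogy, `index_definition` plays the role of a table definition, each entry in `fields` is a column, and each document passed to `fits_schema` is a candidate row.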

Scalability

Any Azure Cognitive Search service in the Standard pricing tier can scale in two dimensions: storage and availability.

  • Partitions can be added to increase the storage of a search service.
  • Replicas can be added to a service to increase the throughput of requests that a search service can handle.

Adding and removing partitions and replicas at will allows the capacity of the search service to grow with the amount of data and traffic the application demands. For a search service to achieve a read SLA, it requires two replicas. For a service to achieve a read-write SLA, it requires three replicas.

There are a few different pricing tiers in Azure Cognitive Search, and each tier has different limits and quotas. Some of these limits are at the service level, some are at the index level, and some are at the partition level.

| Limit | Basic | Standard1 | Standard2 | Standard3 | Standard3 HD |
| --- | --- | --- | --- | --- | --- |
| Maximum Replicas per Service | 3 | 12 | 12 | 12 | 12 |
| Maximum Partitions per Service | 1 | 12 | 12 | 12 | 3 |
| Maximum Search Units (Replicas*Partitions) per Service | 3 | 36 | 36 | 36 | 36 (max 3 partitions) |
| Maximum Storage per Service | 2 GB | 300 GB | 1.2 TB | 2.4 TB | 600 GB |
| Maximum Storage per Partition | 2 GB | 25 GB | 100 GB | 200 GB | 200 GB |
| Maximum Indexes per Service | 5 | 50 | 200 | 200 | 3000 (max 1000 indexes/partition) |
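The limits in the table are related by simple arithmetic: search units are replicas multiplied by partitions, and the maximum storage per service is the maximum partition count multiplied by the storage per partition. The sketch below encodes the table values and checks a proposed configuration against them. The `Tier` class and the function names are hypothetical, and the check covers only the limits listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_replicas: int
    max_partitions: int
    max_search_units: int
    storage_per_partition_gb: int


# Values copied from the limits table above.
TIERS = {
    "Basic": Tier(3, 1, 3, 2),
    "Standard1": Tier(12, 12, 36, 25),
    "Standard2": Tier(12, 12, 36, 100),
    "Standard3": Tier(12, 12, 36, 200),
    "Standard3 HD": Tier(12, 3, 36, 200),
}


def search_units(replicas: int, partitions: int) -> int:
    """Search units are the product of replicas and partitions."""
    return replicas * partitions


def is_valid(tier_name: str, replicas: int, partitions: int) -> bool:
    """Check a replica/partition combination against the table limits."""
    tier = TIERS[tier_name]
    return (
        1 <= replicas <= tier.max_replicas
        and 1 <= partitions <= tier.max_partitions
        and search_units(replicas, partitions) <= tier.max_search_units
    )


def max_storage_gb(tier_name: str) -> int:
    """Maximum storage per service: partitions times storage per partition."""
    tier = TIERS[tier_name]
    return tier.max_partitions * tier.storage_per_partition_gb
```

For example, `is_valid("Standard1", 12, 12)` is false because 144 search units exceed the limit of 36, while `is_valid("Standard1", 3, 12)` is true. `max_storage_gb("Standard3 HD")` gives 600, which matches the 600 GB entry in the table.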

S3 High Density

In Azure Cognitive Search's S3 pricing tier, there is an option for a High Density (HD) mode designed specifically for multitenant scenarios. In many cases, it is necessary to support a large number of smaller tenants under a single service to achieve the benefits of simplicity and cost efficiency.

S3 HD allows many small indexes to be packed under the management of a single search service by trading the ability to scale out indexes using partitions for the ability to host more indexes in a single service.

An S3 service is designed to host a fixed number of indexes (maximum 200) and allow each index to scale in size horizontally as new partitions are added to the service. Adding partitions to an S3 HD service increases the maximum number of indexes that the service can host. The ideal maximum size for an individual S3 HD index is around 50 - 80 GB, although the system imposes no hard size limit on each index.
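As a rough planning aid, the S3 HD index limits from the table (1000 indexes per partition, at most 3 partitions) can be turned into a partition estimate. This is a sketch under those stated numbers only; the function name `s3hd_partitions_needed` is hypothetical, and the estimate ignores storage and query load.

```python
import math

# Limits taken from the table above for Standard3 HD.
INDEXES_PER_PARTITION = 1000
MAX_PARTITIONS = 3


def s3hd_partitions_needed(index_count: int) -> int:
    """Return the smallest partition count that can host index_count indexes."""
    if index_count < 1:
        raise ValueError("index_count must be at least 1")
    partitions = math.ceil(index_count / INDEXES_PER_PARTITION)
    if partitions > MAX_PARTITIONS:
        raise ValueError("index_count exceeds what one S3 HD service can host")
    return partitions
```

With one index per tenant, 1001 tenants would need 2 partitions, and anything above 3000 tenants would require a second service.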

Considerations for multitenant applications

Multitenant applications must effectively distribute resources among the tenants while preserving some level of privacy between the various tenants. There are a few considerations when designing the architecture for such an application:

  • Tenant isolation: Application developers need to take appropriate measures to ensure that no tenants have unauthorized or unwanted access to the data of other tenants. Beyond the perspective of data privacy, tenant isolation strategies require effective management of shared resources and protection from noisy neighbors.
  • Cloud resource cost: As with any other application, software solutions must remain cost competitive as a component of a multitenant application.
  • Ease of operations: When developing a multitenant architecture, the impact on the application's operations and complexity is an important consideration. Azure Cognitive Search has a 99.9% SLA.
  • Global footprint: Multitenant applications may need to effectively serve tenants that are distributed across the globe.
  • Scalability: Application developers need to balance keeping application complexity sufficiently low against designing the application to scale with the number of tenants and the size of tenants' data and workload.

Azure Cognitive Search offers a few boundaries that can be used to isolate tenants' data and workload.

In a multitenant scenario, the application developer consumes one or more search services and divides their tenants among services, indexes, or both. Azure Cognitive Search has a few common patterns for modeling a multitenant scenario:

  1. Index per tenant: Each tenant has its own index within a search service, and the service is shared with other tenants.
  2. Service per tenant: Each tenant has its own dedicated Azure Cognitive Search service, offering the highest level of data and workload separation.
  3. Mix of both: Larger, more-active tenants are assigned dedicated services while smaller tenants are assigned individual indexes within shared services.

1. Index per tenant

[Figure: A portrayal of the index-per-tenant model]

In an index-per-tenant model, multiple tenants occupy a single Azure Cognitive Search service where each tenant has their own index.

Tenants achieve data isolation because all search requests and document operations are issued at an index level in Azure Cognitive Search. The application layer needs to be aware of which index belongs to which tenant so that it can direct each tenant's traffic to the proper index, while also managing resources at the service level across all tenants.
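The routing responsibility described above can be sketched as a function that maps a tenant to its index and builds the query URL for that index. Several things here are assumptions for illustration: the `tenant-` naming convention, the tenant ID format, the helper names, the public-cloud `search.windows.net` domain, and the `2020-06-30` API version. The sketch only builds a URL and sends no request.

```python
import re

API_VERSION = "2020-06-30"  # assumed REST API version for this example

# Assumption: tenant IDs are lowercase letters and digits separated by single
# dashes. Rejecting anything else (instead of normalizing it) avoids two
# different tenant IDs collapsing onto the same index name.
TENANT_ID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def index_name_for(tenant_id: str) -> str:
    """Map a tenant ID to its dedicated index name (hypothetical convention)."""
    if not TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"unsupported tenant id: {tenant_id!r}")
    return f"tenant-{tenant_id}"


def search_url(service_name: str, tenant_id: str) -> str:
    """Build the document search URL for the tenant's own index."""
    index = index_name_for(tenant_id)
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={API_VERSION}"
    )
```

The tenant ID should come from the authenticated session on the server side, never from a value the client can freely choose, because the index name is the only isolation boundary in this model.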

A key attribute of the index-per-tenant model is the ability for the application developer to oversubscribe the capacity of a search service among the application's tenants. If the tenants have an uneven distribution of workload, the optimal combination of tenants can be distributed across a search service's indexes to accommodate a number of highly active, resource-intensive tenants while simultaneously serving a long tail of less active tenants. The trade-off is that the model cannot handle situations where all tenants are highly active at the same time.

The index-per-tenant model provides the basis for a variable cost model, where an entire Azure Cognitive Search service is bought up-front and then subsequently filled with tenants. This allows for unused capacity to be designated for trials and free accounts.

For applications with a global footprint, the index-per-tenant model may not be the most efficient. If an application's tenants are distributed across the globe, a separate service may be necessary for each region, which may duplicate costs across each of them.

Azure Cognitive Search allows for the scale of both the individual indexes and the total number of indexes to grow. If an appropriate pricing tier is chosen, partitions and replicas can be added to the entire search service when an individual index within the service grows too large in terms of storage or traffic.

If the total number of indexes grows too large for a single service, another service has to be provisioned to accommodate the new tenants. If indexes have to be moved between search services as new services are added, the data has to be manually copied from one index to the other, because Azure Cognitive Search does not allow an index to be moved.
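Because an index cannot be moved once it is created, it helps to decide up front which service each new tenant lands on. The following sketch fills services in order and opens a new one only when the current one reaches its index limit. The function name and the idea of representing a service as a list of tenant IDs are hypothetical simplifications; a real application would also weigh storage and traffic.

```python
def assign_tenants(tenant_ids: list[str], max_indexes_per_service: int) -> list[list[str]]:
    """Group tenants into services, one index per tenant, filling each
    service up to its index limit before starting the next one."""
    if max_indexes_per_service < 1:
        raise ValueError("max_indexes_per_service must be at least 1")
    services: list[list[str]] = []
    for tenant_id in tenant_ids:
        # Provision a new (empty) service when none exists yet or the
        # most recent one has no room for another index.
        if not services or len(services[-1]) >= max_indexes_per_service:
            services.append([])
        services[-1].append(tenant_id)
    return services
```

Existing tenants keep their position when new tenants are appended, so adding a service never forces data to be copied between indexes.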

2. Service per tenant

[Figure: A portrayal of the service-per-tenant model]

In a service-per-tenant architecture, each tenant has its own search service.

In this model, the application achieves the maximum level of isolation for its tenants. Each service has dedicated storage and throughput for handling search requests, as well as separate API keys.

For applications where each tenant has a large footprint or the workload has little variability from tenant to tenant, the service-per-tenant model is an effective choice because resources are not shared across the various tenants' workloads.

The service-per-tenant model also offers the benefit of a predictable, fixed cost model. There is no up-front investment in an entire search service until there is a tenant to fill it. However, the cost per tenant is higher than in an index-per-tenant model.

The service-per-tenant model is an efficient choice for applications with a global footprint. With geographically distributed tenants, it is easy to place each tenant's service in the appropriate region.
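In a service-per-tenant design, the application needs a lookup from tenant to that tenant's service, region, and API key. The registry below is a minimal in-memory sketch. The class names, the `search.windows.net` domain, and the storage of keys in plain memory are assumptions for illustration; a production system would keep keys in a secret store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantService:
    service_name: str
    region: str
    api_key: str


class TenantServiceRegistry:
    """Maps each tenant to exactly one dedicated search service."""

    def __init__(self) -> None:
        self._services: dict[str, TenantService] = {}

    def register(self, tenant_id: str, service: TenantService) -> None:
        if tenant_id in self._services:
            raise ValueError(f"tenant already registered: {tenant_id}")
        self._services[tenant_id] = service

    def endpoint_for(self, tenant_id: str) -> str:
        # Raises KeyError for unknown tenants instead of falling back
        # to some other tenant's service.
        service = self._services[tenant_id]
        return f"https://{service.service_name}.search.windows.net"

    def headers_for(self, tenant_id: str) -> dict[str, str]:
        # Each tenant's requests carry only that tenant's own key.
        service = self._services[tenant_id]
        return {"api-key": service.api_key, "Content-Type": "application/json"}
```

Because every tenant has its own endpoint and key, a routing mistake in the application cannot be answered with another tenant's data unless the registry itself is wrong, which is the isolation benefit this model trades against its higher cost per tenant.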

The challenges in scaling this pattern arise when individual tenants outgrow their service. Azure Cognitive Search does not currently support upgrading the pricing tier of a search service, so all data would have to be manually copied to a new service.

3. Mixing both models

Another pattern for modeling multitenancy is mixing both index-per-tenant and service-per-tenant strategies.

By mixing the two patterns, an application's largest tenants can occupy dedicated services while the long tail of less active, smaller tenants can occupy indexes in a shared service. This model ensures that the largest tenants have consistently high performance from the service while helping to protect the smaller tenants from any noisy neighbors.

However, implementing this strategy relies on foresight in predicting which tenants will require a dedicated service versus an index in a shared service. Application complexity increases with the need to manage both of these multitenancy models.
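One way to make the placement decision explicit is a rule that compares a tenant's expected footprint against thresholds. The sketch below is purely illustrative: the function name, the two inputs, and the default threshold values are hypothetical and would need to be calibrated against real tenant data and the limits of the chosen pricing tier.

```python
def choose_placement(
    storage_gb: float,
    peak_qps: float,
    *,
    storage_threshold_gb: float = 20.0,  # hypothetical cut-off
    qps_threshold: float = 10.0,         # hypothetical cut-off
) -> str:
    """Return "dedicated-service" for large or busy tenants,
    otherwise "shared-index"."""
    if storage_gb >= storage_threshold_gb or peak_qps >= qps_threshold:
        return "dedicated-service"
    return "shared-index"
```

Keeping the rule in one function makes the forecast easy to revisit, which matters because a wrong prediction means manually copying a tenant's data to its new home.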

Achieving even finer granularity

The above design patterns to model multitenant scenarios in Azure Cognitive Search assume a uniform scope where each tenant is a whole instance of an application. However, applications can sometimes handle many smaller scopes.

If service-per-tenant and index-per-tenant models are not sufficiently small scopes, it is possible to model an index to achieve an even finer degree of granularity.

To have a single index behave differently for different client endpoints, a field can be added to the index that holds a designated value for each possible client. Each time a client calls Azure Cognitive Search to query or modify the index, the client application's code specifies the appropriate value for that field using Azure Cognitive Search's filter capability at query time.

This method can be used to achieve the functionality of separate user accounts, separate permission levels, and even completely separate applications.
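The filter-based approach can be sketched as a helper that always attaches a tenant filter to the search request body. The field name `tenantId` and the function names are hypothetical, and the field would need to be marked filterable in the index. The filter string uses OData syntax, in which a single quote inside a string literal is escaped by doubling it.

```python
from typing import Optional


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def tenant_search_body(
    tenant_id: str, search_text: str, extra_filter: Optional[str] = None
) -> dict:
    """Build a search request body that is always scoped to one tenant."""
    tenant_filter = f"tenantId eq {odata_string(tenant_id)}"
    if extra_filter:
        # Parenthesize both sides so the caller's filter cannot widen
        # the tenant restriction through operator precedence.
        tenant_filter = f"({tenant_filter}) and ({extra_filter})"
    return {"search": search_text, "filter": tenant_filter}
```

For example, `tenant_search_body("acme", "wifi", "rating ge 4")` produces the filter `(tenantId eq 'acme') and (rating ge 4)`. Since the filter is the only thing separating tenants in this design, it should be applied by trusted server-side code rather than left to each client.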

Note

Using the approach described above to configure a single index to serve multiple tenants affects the relevance of search results. Search relevance scores are computed at an index-level scope, not a tenant-level scope, so all tenants' data is incorporated in the relevance scores' underlying statistics such as term frequency.
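The effect described in the note can be illustrated with a textbook inverse document frequency (IDF) calculation. This is a simplified stand-in, not the scoring formula Azure Cognitive Search actually uses, and the tenant document sets are invented for the example.

```python
import math


def idf(term: str, documents: list[str]) -> float:
    """Textbook IDF: log(total documents / documents containing the term).
    Returns 0.0 when no document contains the term."""
    containing = sum(1 for doc in documents if term in doc.split())
    if containing == 0:
        return 0.0
    return math.log(len(documents) / containing)


# Invented data: tenant A writes mostly about routers, tenant B rarely does.
tenant_a = ["router firmware", "router setup", "router reset"]
tenant_b = ["garden hose", "garden soil", "router table"]

# Within tenant B's documents alone, "router" is a rare, distinctive term.
idf_tenant_b_only = idf("router", tenant_b)

# In one shared index, tenant A's documents make "router" look common,
# which lowers its weight for tenant B's queries as well.
idf_shared_index = idf("router", tenant_a + tenant_b)
```

Here `idf_tenant_b_only` is larger than `idf_shared_index`, so the same query from tenant B would rank results differently depending on which other tenants share the index.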

Next steps

Azure Cognitive Search is a compelling choice for many applications. When evaluating the various design patterns for multitenant applications, consider the various pricing tiers and the respective service limits to best tailor Azure Cognitive Search to fit application workloads and architectures of all sizes.

Any questions about Azure Cognitive Search and multitenant scenarios can be directed to azuresearch_contact@microsoft.com.