Design scalable and performant tables

Tip

The content in this article applies to the original Azure Table storage. However, there is now a premium offering for table storage: the Azure Cosmos DB Table API. This API offers throughput-optimized tables, global distribution, and automatic secondary indexes. There are some feature differences between the Table API in Azure Cosmos DB and Azure Table storage. For more information, and to try out the premium experience, see Azure Cosmos DB Table API.

To design scalable and performant tables, you must consider factors such as performance, scalability, and cost. If you have previously designed schemas for relational databases, these considerations are familiar. However, while there are some similarities between the Azure Table service storage model and relational models, there are also important differences. These differences typically lead to designs that may look counter-intuitive or wrong to someone familiar with relational databases, yet make sense for a NoSQL key/value store such as the Azure Table service. Many of these design differences reflect the fact that the Table service is designed to support cloud-scale applications that can contain billions of entities (rows, in relational database terminology) or datasets that must support high transaction volumes. Therefore, you must think differently about how you store your data and understand how the Table service works. A well-designed NoSQL data store can enable your solution to scale much further, and at a lower cost, than a solution that uses a relational database. This guide helps you with these topics.

About the Azure Table service

This section highlights some of the key features of the Table service that are especially relevant to designing for performance and scalability. If you're new to Azure Storage and the Table service, first read Introduction to Azure Storage and Get started with Azure Table Storage using .NET before reading the remainder of this article. Although the focus of this guide is on the Table service, it includes discussion of the Azure Queue and Blob services, and how you might use them with the Table service.

What is the Table service?

As you might expect from the name, the Table service uses a tabular format to store data. In the standard terminology, each row of the table represents an entity, and the columns store the various properties of that entity. Every entity has a pair of keys that uniquely identifies it, and a timestamp column that the Table service uses to track when the entity was last updated. The timestamp is applied automatically, and you cannot manually overwrite it with an arbitrary value. The Table service uses this last-modified timestamp (LMT) to manage optimistic concurrency.

Note

The Table service REST API operations also return an ETag value that the service derives from the LMT. This document uses the terms ETag and LMT interchangeably because they refer to the same underlying data.
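The optimistic concurrency check can be sketched with a toy in-memory table. This is a minimal illustration under stated assumptions, not the Table service or any SDK: `TinyTable`, `ConcurrencyError`, and the `if_match` parameter are hypothetical names, and the ETag here is a simple counter rather than a value derived from the timestamp.

```python
from datetime import datetime, timezone
import itertools


class ConcurrencyError(Exception):
    """Raised when the caller's ETag no longer matches the stored entity."""


class TinyTable:
    """Toy in-memory stand-in for a table (hypothetical, for illustration)."""

    def __init__(self):
        self._entities = {}                  # (PartitionKey, RowKey) -> entity
        self._version = itertools.count(1)   # assumption: counter-based ETag

    def _stamp(self, entity):
        # The store, not the caller, sets Timestamp and the ETag.
        entity["Timestamp"] = datetime.now(timezone.utc)
        entity["etag"] = f"v{next(self._version)}"
        return entity

    def insert(self, partition_key, row_key, **props):
        key = (partition_key, row_key)
        if key in self._entities:
            raise KeyError(f"entity {key} already exists")
        entity = {**props, "PartitionKey": partition_key, "RowKey": row_key}
        self._entities[key] = self._stamp(entity)
        return dict(entity)

    def get(self, partition_key, row_key):
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._entities[(partition_key, row_key)])

    def update(self, partition_key, row_key, if_match, **props):
        current = self._entities[(partition_key, row_key)]
        if current["etag"] != if_match:
            # Someone else updated the entity after the caller read it.
            raise ConcurrencyError("entity changed since it was read")
        current.update(props)
        self._stamp(current)                 # any caller-supplied Timestamp is replaced
        return dict(current)
```

Two clients that read the same entity both hold the same ETag. The first update succeeds and changes the ETag, so the second update fails and that client must re-read the entity before retrying.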

The following example shows a simple table design to store employee and department entities. Many of the examples shown later in this guide are based on this simple design.

PartitionKey | RowKey | Timestamp | Properties
Marketing | 00001 | 2014-08-22T00:50:32Z | FirstName: Don; LastName: Hall; Age: 34; Email: donh@contoso.com
Marketing | 00002 | 2014-08-22T00:50:34Z | FirstName: Jun; LastName: Cao; Age: 47; Email: junc@contoso.com
Marketing | Department | 2014-08-22T00:50:30Z | DepartmentName: Marketing; EmployeeCount: 153
Sales | 00010 | 2014-08-22T00:50:44Z | FirstName: Ken; LastName: Kwok; Age: 23; Email: kenk@contoso.com

So far, this data appears similar to a table in a relational database, with the key differences being the mandatory columns and the ability to store multiple entity types in the same table. Also, each of the user-defined properties, such as FirstName or Age, has a data type, such as integer or string, just like a column in a relational database. Unlike in a relational database, however, the schema-less nature of the Table service means that a property need not have the same data type on each entity. To store complex data types in a single property, you must use a serialized format such as JSON or XML. For more information about the Table service, such as supported data types, supported date ranges, naming rules, and size constraints, see Understanding the Table Service Data Model.
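As a rough illustration of these points, the sketch below models two entity types from the sample table as plain Python dictionaries and stores a nested value in a single string property by serializing it to JSON. The `Address` property, its contents, and the `pack_property` and `unpack_property` helpers are hypothetical and exist only for this example.

```python
import json

# Two entity types that could live in the same table: only the key
# properties are shared; the remaining properties differ per entity.
employee = {
    "PartitionKey": "Marketing", "RowKey": "00001",
    "FirstName": "Don", "LastName": "Hall", "Age": 34,
    "Email": "donh@contoso.com",
}
department = {
    "PartitionKey": "Marketing", "RowKey": "Department",
    "DepartmentName": "Marketing", "EmployeeCount": 153,
}


def pack_property(value):
    """Serialize a nested value into one string property (compact JSON)."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


def unpack_property(text):
    """Restore the nested value from its serialized string form."""
    return json.loads(text)


# Hypothetical complex value, stored as a single string property.
address = {"city": "Redmond", "phones": ["555-0100", "555-0101"]}
employee["Address"] = pack_property(address)
```

The serialized property is opaque to the store, so it can be read back with `unpack_property` but cannot be filtered on by its inner fields.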

Your choice of PartitionKey and RowKey is fundamental to good table design. Every entity stored in a table must have a unique combination of PartitionKey and RowKey. As with keys in a relational database table, the PartitionKey and RowKey values are indexed to create a clustered index that enables fast look-ups. However, the Table service does not create any secondary indexes, so PartitionKey and RowKey are the only indexed properties. Some of the patterns described in Table design patterns illustrate how you can work around this apparent limitation.
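To make the consequence of having a single index concrete, here is a minimal sketch that keeps entities sorted by (PartitionKey, RowKey) and compares a key look-up with a query on a non-key property. `SortedIndex` and its methods are hypothetical names; the sketch assumes plain string ordering of keys and says nothing about how the service actually stores its index.

```python
import bisect


class SortedIndex:
    """Entities kept in (PartitionKey, RowKey) order; a toy clustered index."""

    def __init__(self, entities):
        self._rows = sorted(entities, key=lambda e: (e["PartitionKey"], e["RowKey"]))
        self._keys = [(e["PartitionKey"], e["RowKey"]) for e in self._rows]

    def point_lookup(self, partition_key, row_key):
        # Binary search on the key pair: O(log n).
        i = bisect.bisect_left(self._keys, (partition_key, row_key))
        if i < len(self._keys) and self._keys[i] == (partition_key, row_key):
            return self._rows[i]
        return None

    def row_range(self, partition_key, row_from, row_to):
        # Contiguous slice inside one partition, inclusive on both ends.
        lo = bisect.bisect_left(self._keys, (partition_key, row_from))
        hi = bisect.bisect_right(self._keys, (partition_key, row_to))
        return self._rows[lo:hi]

    def scan_by_property(self, name, value):
        # No secondary index: every entity must be examined, O(n).
        return [e for e in self._rows if e.get(name) == value]


sample = SortedIndex([
    {"PartitionKey": "Sales", "RowKey": "00010", "LastName": "Kwok"},
    {"PartitionKey": "Marketing", "RowKey": "00002", "LastName": "Cao"},
    {"PartitionKey": "Marketing", "RowKey": "Department", "EmployeeCount": 153},
    {"PartitionKey": "Marketing", "RowKey": "00001", "LastName": "Hall"},
])
```

Because keys compare as strings, the zero-padded employee IDs in the sample table sort in numeric order, which is what makes the range query on RowKey useful.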

A table comprises one or more partitions, and many of the design decisions you make will be around choosing a suitable PartitionKey and RowKey to optimize your solution. A solution may consist of a single table that contains all your entities organized into partitions, but typically a solution has multiple tables. Tables help you logically organize your entities and manage access to the data using access control lists, and you can drop an entire table using a single storage operation.

Table partitions

The account name, table name, and PartitionKey together identify the partition within the storage service where the Table service stores the entity. As well as being part of the addressing scheme for entities, partitions define a scope for transactions (see Entity Group Transactions below), and form the basis of how the Table service scales. For more information on partitions, see Performance and scalability checklist for Table storage.

In the Table service, an individual node services one or more complete partitions, and the service scales by dynamically load-balancing partitions across nodes. If a node is under load, the Table service can split the range of partitions serviced by that node onto different nodes; when traffic subsides, the service can merge the partition ranges from quiet nodes back onto a single node.
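The splitting behavior can be pictured with a small sketch. This is a toy model, not the service's actual algorithm (which this article does not describe): `split_range`, the load figures, and the threshold are all assumptions made for illustration. The one property it is meant to show is that a range can be divided only between partitions, never inside one.

```python
def split_range(partition_loads, threshold):
    """Split one node's partition range in two if its total load is too high.

    partition_loads: list of (partition_key, requests_per_sec), sorted by key.
    Returns a list of one or two contiguous ranges.
    """
    total = sum(load for _, load in partition_loads)
    if total <= threshold or len(partition_loads) < 2:
        # Either the node is fine, or it serves a single partition,
        # which cannot be divided across nodes.
        return [partition_loads]

    running = 0
    # Find the first boundary where the left side carries at least half
    # the load; stop before the last partition so both sides are non-empty.
    for i, (_, load) in enumerate(partition_loads[:-1], start=1):
        running += load
        if running >= total / 2:
            return [partition_loads[:i], partition_loads[i:]]
    # The last partition dominates the load: isolate it on its own node.
    return [partition_loads[:-1], partition_loads[-1:]]
```

In this model, a node serving one very busy partition gains nothing from splitting, which is one reason a design that funnels most traffic into a single PartitionKey limits scalability.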

For more information about the internal details of the Table service, and in particular how the service manages partitions, see the paper Microsoft Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency.

Entity Group Transactions

In the Table service, Entity Group Transactions (EGTs) are the only built-in mechanism for performing atomic updates across multiple entities. EGTs are sometimes also referred to as batch transactions. EGTs can only operate on entities stored in the same partition (that is, entities that share the same partition key in a given table). So whenever you require atomic transactional behavior across multiple entities, you must ensure that those entities are in the same partition. This is often a reason for keeping multiple entity types in the same table (and partition) rather than using multiple tables for different entity types. A single EGT can operate on at most 100 entities. If you submit multiple concurrent EGTs for processing, it is important to ensure that those EGTs do not operate on entities that are common across EGTs; otherwise, processing can be delayed.
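The EGT constraints above can be sketched as a client-side planning step that groups pending operations into batches. This is a minimal sketch under stated assumptions: `plan_batches` and the `(action, entity)` tuple shape are hypothetical, no SDK is involved, and the 4 MiB payload limit listed under Capacity considerations is not checked.

```python
from collections import defaultdict

MAX_EGT_ENTITIES = 100  # per-transaction entity limit stated in this article


def plan_batches(operations):
    """Group (action, entity) pairs into batches that satisfy EGT rules.

    Each batch targets a single partition and holds at most 100 entities.
    An entity may appear only once overall, so no two batches share an
    entity and the batches can be submitted concurrently.
    """
    by_partition = defaultdict(list)
    seen = set()
    for action, entity in operations:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in seen:
            raise ValueError(f"entity {key} appears more than once")
        seen.add(key)
        by_partition[entity["PartitionKey"]].append((action, entity))

    batches = []
    for ops in by_partition.values():
        # Chunk each partition's operations into groups of at most 100.
        for start in range(0, len(ops), MAX_EGT_ENTITIES):
            batches.append(ops[start:start + MAX_EGT_ENTITIES])
    return batches
```

Each batch is atomic only on its own. If a partition's operations are split across several batches, or span several partitions, there is no atomicity across the resulting batches.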

EGTs also introduce a potential trade-off for you to evaluate in your design. That is, using more partitions increases the scalability of your application, because Azure has more opportunities for load-balancing requests across nodes. But using more partitions might limit the ability of your application to perform atomic transactions and maintain strong consistency for your data. Furthermore, there are specific scalability targets at the level of a partition that might limit the throughput of transactions you can expect for a single node. For more information about scalability targets for Azure standard storage accounts, see Scalability targets for standard storage accounts. For more information about scalability targets for the Table service, see Scalability and performance targets for Table storage.

Capacity considerations

The following table describes capacity, scalability, and performance targets for Table storage.

Resource | Target
Number of tables in an Azure storage account | Limited only by the capacity of the storage account
Number of partitions in a table | Limited only by the capacity of the storage account
Number of entities in a partition | Limited only by the capacity of the storage account
Maximum size of a single table | 500 TiB
Maximum size of a single entity, including all property values | 1 MiB
Maximum number of properties in a table entity | 255 (including the three system properties: PartitionKey, RowKey, and Timestamp)
Maximum total size of an individual property in an entity | Varies by property type. For more information, see Property Types in Understanding the Table Service Data Model.
Size of the PartitionKey | A string up to 1 KiB in size
Size of the RowKey | A string up to 1 KiB in size
Size of an entity group transaction | A transaction can include at most 100 entities, and the payload must be less than 4 MiB in size. An entity group transaction can include an update to an entity only once.
Maximum number of stored access policies per table | 5
Maximum request rate per storage account | 20,000 transactions per second, which assumes a 1-KiB entity size
Target throughput for a single table partition (1-KiB entities) | Up to 2,000 entities per second
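Taken together, the last two rows imply a simple piece of arithmetic: at 2,000 entities per second per partition, reaching the 20,000 transactions per second account limit requires load spread over at least 10 partitions. The sketch below works this out for an arbitrary target. It assumes one 1-KiB entity per transaction and perfectly even load across partitions, and `min_partitions_for` is a hypothetical helper, so treat the result as a lower bound rather than a sizing rule.

```python
import math

ACCOUNT_MAX_TPS = 20_000       # maximum request rate per storage account
PARTITION_TARGET_EPS = 2_000   # target throughput for a single partition


def min_partitions_for(target_entities_per_sec):
    """Lower bound on the number of partitions needed for a target rate.

    Assumes one 1-KiB entity per transaction and evenly distributed load.
    """
    if target_entities_per_sec > ACCOUNT_MAX_TPS:
        raise ValueError("target exceeds the per-account request rate")
    return max(1, math.ceil(target_entities_per_sec / PARTITION_TARGET_EPS))
```

For example, `min_partitions_for(20_000)` returns 10, while a target of 2,001 entities per second already needs 2 partitions.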

Cost considerations

Table storage is relatively inexpensive, but you should include cost estimates for both capacity usage and the quantity of transactions as part of your evaluation of any Table service solution. Storing denormalized or duplicate data increases capacity usage, but in many scenarios it is still a valid way to improve the performance or scalability of your solution. For more information about pricing, see Azure Storage Pricing.

Next steps