针对查询的设计Design for querying

表服务解决方案可能需要进行大量读取操作和/或大量写入操作。Table service solutions may be read intensive, write intensive, or a mix of the two. 本文重点介绍在将表服务设计为支持高效读取操作时需要牢记的事项。This article focuses on the things to bear in mind when you are designing your Table service to support read operations efficiently. 通常,支持高效读取操作的设计对于写入操作来说也是高效的。Typically, a design that supports read operations efficiently is also efficient for write operations. 但是,在进行支持写入操作的设计时,还需牢记一些额外注意事项,具体请参阅文章针对数据修改的设计However, there are additional considerations to bear in mind when designing to support write operations, discussed in the article Design for data modification.

将表服务解决方案设计为能够高效读取数据的良好起点是问“我的应用程序将需要执行哪些查询来从表服务中检索它所需的数据?”A good starting point for designing your Table service solution to enable you to read data efficiently is to ask "What queries will my application need to execute to retrieve the data it needs from the Table service?"

备注

使用表服务时,预先将设计做正确很重要,因为以后进行更改会很难并且开销会很高。With the Table service, it's important to get the design correct up front because it's difficult and expensive to change it later. 例如,在关系数据库中,通常只需通过向现有数据库添加索引就可以解决性能问题:而这不是表服务的选项。For example, in a relational database it's often possible to address performance issues simply by adding indexes to an existing database: this is not an option with the Table service.

本部分重点介绍在设计要查询的表时必须解决的关键问题。This section focuses on the key issues you must address when you design your tables for querying. 本部分中涵盖的主题包括:The topics covered in this section include:

所选的 PartitionKey 和 RowKey 如何影响查询性能How your choice of PartitionKey and RowKey impacts query performance

下述示例假定表服务要使用以下结构存储员工实体(为清楚起见,大多数示例将省略 Timestamp 属性):The following examples assume the table service is storing employee entities with the following structure (most of the examples omit the Timestamp property for clarity):

列名Column name Data typeData type
PartitionKey(部门名称)PartitionKey (Department Name) StringString
RowKey(员工 ID)RowKey (Employee Id) StringString
名字FirstName StringString
姓氏LastName StringString
年龄Age IntegerInteger
EmailAddressEmailAddress StringString

有关一些直接影响查询设计的主要 Azure 表服务功能,请参阅文章:Azure 表存储概述The article Azure Table storage overview describes some of the key features of the Azure Table service that have a direct influence on designing for query. 这些功能产生了以下设计表服务查询的通用准则。These result in the following general guidelines for designing Table service queries. 请注意,下述示例中所用的筛选器语法源自表服务 REST API,详细信息请参阅 Query Entities(查询实体)。Note that the filter syntax used in the examples below is from the Table service REST API, for more information see Query Entities.

  • 点查询是最高效的一种查找方式,可用于且建议用于大容量查找或要求最低延迟的查找。A Point Query is the most efficient lookup to use and is recommended to be used for high-volume lookups or lookups requiring lowest latency. 通过指定 PartitionKeyRowKey 值,此类查询可极为高效地利用索引查找单个实体。Such a query can use the indexes to locate an individual entity very efficiently by specifying both the PartitionKey and RowKey values. 例如:$filter=(PartitionKey eq 'Sales') and (RowKey eq '2')For example: $filter=(PartitionKey eq 'Sales') and (RowKey eq '2')
  • 其次是范围查询,它使用 PartitionKey并筛选一系列 RowKey 值,从而返回多个实体。Second best is a Range Query that uses the PartitionKey and filters on a range of RowKey values to return more than one entity. PartitionKey 值确定特定分区,RowKey 值确定该分区中的实体子集。The PartitionKey value identifies a specific partition, and the RowKey values identify a subset of the entities in that partition. 例如:$filter=PartitionKey eq 'Sales' and RowKey ge 'S' and RowKey lt 'T'For example: $filter=PartitionKey eq 'Sales' and RowKey ge 'S' and RowKey lt 'T'
  • 然后是分区扫描,它使用 PartitionKey 并根据另一个非键属性进行筛选,可能会返回多个实体。Third best is a Partition Scan that uses the PartitionKey and filters on another non-key property and that may return more than one entity. PartitionKey 值确定特定分区,而属性值将选择该分区中的实体子集。The PartitionKey value identifies a specific partition, and the property values select for a subset of the entities in that partition. 例如:$filter=PartitionKey eq 'Sales' and LastName eq 'Smith'For example: $filter=PartitionKey eq 'Sales' and LastName eq 'Smith'
  • 表扫描不包括 PartitionKey 且效率极低,因为它会依次搜索构成表的所有分区,查找所有匹配的实体。A Table Scan does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities. 无论筛选器是否使用 RowKey它都将执行表扫描。It will perform a table scan regardless of whether or not your filter uses the RowKey. 例如:$filter=LastName eq 'Jones'For example: $filter=LastName eq 'Jones'
  • 返回多个实体的查询将按 PartitionKeyRowKey 顺序返回实体。Queries that return multiple entities return them sorted in PartitionKey and RowKey order. 若要避免对客户端中的实体重新排序,请选择定义最常见排序顺序的 RowKeyTo avoid resorting the entities in the client, choose a RowKey that defines the most common sort order.

请注意,使用“or”指定基于 RowKey 值的筛选器将导致分区扫描,而不会视为范围查询。Note that using an "or" to specify a filter based on RowKey values results in a partition scan and is not treated as a range query. 因此,应避免使用筛选器 (如查询:$filter=PartitionKey eq 'Sales' and (RowKey eq '121' or RowKey eq '322')Therefore, you should avoid queries that use filters such as: $filter=PartitionKey eq 'Sales' and (RowKey eq '121' or RowKey eq '322')

有关使用存储客户端库执行高效查询的客户端代码的示例,请参阅:For examples of client-side code that use the Storage Client Library to execute efficient queries, see:

有关可以处理存储在同一个表中的多个实体类型的客户端代码的示例,请参阅:For examples of client-side code that can handle multiple entity types stored in the same table, see:

选择适当的 PartitionKeyChoosing an appropriate PartitionKey

所选的 PartitionKey 应权衡不同需求,决定是使用 EGT 确保一致性还是将实体分布到多个分区中来确保可伸缩的解决方案。Your choice of PartitionKey should balance the need to enable the use of EGTs (to ensure consistency) against the requirement to distribute your entities across multiple partitions (to ensure a scalable solution).

一种极端做法是,可以将所有实体都存储在单个分区,但这可能会限制解决方案的伸缩性,并且会使表服务无法实现请求的负载均衡。At one extreme, you could store all your entities in a single partition, but this may limit the scalability of your solution and would prevent the table service from being able to load-balance requests. 另一种极端做法是,可以每个分区存储一个实体,这样可以获得高伸缩性,并且使表服务能够对请求进行负载均衡,但无法使用实体组事务。At the other extreme, you could store one entity per partition, which would be highly scalable and which enables the table service to load-balance requests, but which would prevent you from using entity group transactions.

理想的 PartitionKey 可实现高效查询,具有足够多的分区来确保解决方案可伸缩。An ideal PartitionKey is one that enables you to use efficient queries and that has sufficient partitions to ensure your solution is scalable. 通常,会发现实体将具有合适的属性,可以将实体分布到足够多的分区。Typically, you will find that your entities will have a suitable property that distributes your entities across sufficient partitions.

备注

例如,在存储有关用户或员工的信息的系统中,UserID 可以是一个好的 PartitionKey。For example, in a system that stores information about users or employees, UserID may be a good PartitionKey. 可拥有多个使用某个给定 UserID 作为分区键的实体。You may have several entities that use a given UserID as the partition key. 存储有关用户数据的每个实体分组到单个分区,因此通过实体组事务可访问这些实体,同时仍具有高度可伸缩性。Each entity that stores data about a user is grouped into a single partition, and so these entities are accessible via entity group transactions, while still being highly scalable.

在选择与实体插入、更新和删除方式相关的 PartitionKey 时,还要注意一些额外的事项。There are additional considerations in your choice of PartitionKey that relate to how you will insert, update, and delete entities. 有关详细信息,请参阅针对数据修改设计表For more information, see Designing tables for data modification.

针对表服务优化查询Optimizing queries for the Table service

表服务将使用单个聚集索引中的 PartitionKeyRowKey 值自动编制实体的索引,因此使点查询成为了最高效的方式。The Table service automatically indexes your entities using the PartitionKey and RowKey values in a single clustered index, hence the reason that point queries are the most efficient to use. 但是,只有 PartitionKeyRowKey 的群集索引上具有索引。However, there are no indexes other than that on the clustered index on the PartitionKey and RowKey.

许多设计必须满足要求,才能允许根据多个条件查找实体。Many designs must meet requirements to enable lookup of entities based on multiple criteria. 例如,根据电子邮件、员工 ID 或姓氏查找员工实体。For example, locating employee entities based on email, employee id, or last name. 表设计模式中所述的模式解决了这些类型的要求,并介绍了相关方式来处理表服务不提供辅助索引的问题:The patterns described in Table Design Patterns address these types of requirement and describe ways of working around the fact that the Table service does not provide secondary indexes:

  • 内分区的第二索引模式 - 利用同一分区中的 RowKey 值存储每个实体的多个副本,实现快速、高效的查询,并借助不同的 RowKey 值替换排序顺序。Intra-partition secondary index pattern - Store multiple copies of each entity using different RowKey values (in the same partition) to enable fast and efficient lookups and alternate sort orders by using different RowKey values.
  • 内分区的第二索引模式 - 在单独分区/表格中利用不同 RowKey 值存储每个实体的多个副本,实现快速高效的查找,并借助 RowKey 值替换排序顺序 。Inter-partition secondary index pattern - Store multiple copies of each entity using different RowKey values in separate partitions or in separate tables to enable fast and efficient lookups and alternate sort orders by using different RowKey values.
  • 索引实体模式 - 维护索引实体,实现返回实体列表的高效搜索。Index Entities Pattern - Maintain index entities to enable efficient searches that return lists of entities.

对表服务中的数据进行排序Sorting data in the Table service

表服务依次按 PartitionKeyRowKey 以升序排序返回实体。The Table service returns entities sorted in ascending order based on PartitionKey and then by RowKey. 这些键是字符串值,以确保数字值正确排序,应将值转换为固定长度并使用零进行填充。These keys are string values and to ensure that numeric values sort correctly, you should convert them to a fixed length and pad them with zeroes. 例如,如果用作 RowKey 的员工 ID 值是个整数值,则应将员工 ID 123 转换为 00000123For example, if the employee id value you use as the RowKey is an integer value, you should convert employee id 123 to 00000123.

许多应用程序要求使用按不同顺序排序的数据:例如,按名称或按加入日期对员工进行排序。Many applications have requirements to use data sorted in different orders: for example, sorting employees by name, or by joining date. 以下模式解决了如何替换实体的排序顺序的问题:The following patterns address how to alternate sort orders for your entities:

  • 内分区的第二索引模式 - 利用同一分区中的 RowKey 值存储每个实体的多个副本,实现快速、高效的查询并借助不同的 RowKey 值替换排序顺序。Intra-partition secondary index pattern - Store multiple copies of each entity using different RowKey values (in the same partition) to enable fast and efficient lookups and alternate sort orders by using different RowKey values.
  • 内分区的第二索引模式 - 在单独分区/表格中利用不同 RowKey 值存储每个实体的多个副本,实现快速高效的查找,并借助 RowKey 值替换排序顺序。Inter-partition secondary index pattern - Store multiple copies of each entity using different RowKey values in separate partitions in separate tables to enable fast and efficient lookups and alternate sort orders by using different RowKey values.
  • 日志结尾模式 - 利用按日期和时间倒序方式排序的 RowKey 值,检索最近添加到分区中的 n 个实体。Log tail pattern - Retrieve the n entities most recently added to a partition by using a RowKey value that sorts in reverse date and time order.

后续步骤Next steps