Performance and scalability checklist for Blob storage

Azure has developed a number of proven practices for developing high-performance applications with Blob storage. This checklist identifies key practices that developers can follow to optimize performance. Keep these practices in mind while you are designing your application and throughout the development process.

Azure Storage has scalability and performance targets for capacity, transaction rate, and bandwidth. For more information about Azure Storage scalability targets, see Scalability and performance targets for standard storage accounts and Scalability and performance targets for Blob storage.

Checklist

This article organizes proven practices for performance into a checklist you can follow while developing your Blob storage application.

Done | Category | Design consideration
[ ] | Scalability targets | Can you design your application to use no more than the maximum number of storage accounts?
[ ] | Scalability targets | Are you avoiding getting close to the capacity and transaction limits?
[ ] | Scalability targets | Are a large number of clients accessing a single blob concurrently?
[ ] | Scalability targets | Is your application staying within the scalability targets for a single blob?
[ ] | Partitioning | Is your naming convention designed to enable better load balancing?
[ ] | Networking | Do client-side devices have sufficiently high bandwidth and low latency to achieve the performance needed?
[ ] | Networking | Do client-side devices have a high-quality network link?
[ ] | Networking | Is the client application in the same region as the storage account?
[ ] | Direct client access | Are you using shared access signatures (SAS) and cross-origin resource sharing (CORS) to enable direct access to Azure Storage?
[ ] | Caching | Is your application caching data that is frequently accessed and rarely changed?
[ ] | Caching | Is your application batching updates by caching them on the client and then uploading them in larger sets?
[ ] | .NET configuration | Are you using .NET Core 2.1 or later for optimum performance?
[ ] | .NET configuration | Have you configured your client to use a sufficient number of concurrent connections?
[ ] | .NET configuration | For .NET applications, have you configured .NET to use a sufficient number of threads?
[ ] | Parallelism | Have you ensured that parallelism is bounded appropriately so that you don't overload your client's capabilities or approach the scalability targets?
[ ] | Tools | Are you using the latest versions of Microsoft-provided client libraries and tools?
[ ] | Retries | Are you using a retry policy with an exponential backoff for throttling errors and timeouts?
[ ] | Retries | Is your application avoiding retries for non-retryable errors?
[ ] | Copying blobs | Are you copying blobs in the most efficient manner?
[ ] | Copying blobs | Are you using the latest version of AzCopy for bulk copy operations?
[ ] | Copying blobs | Are you using the Azure Data Box family for importing large volumes of data?
[ ] | Content distribution | Are you using a CDN for content distribution?
[ ] | Use metadata | Are you storing frequently used metadata about blobs in their metadata?
[ ] | Uploading quickly | When trying to upload one blob quickly, are you uploading blocks in parallel?
[ ] | Uploading quickly | When trying to upload many blobs quickly, are you uploading blobs in parallel?
[ ] | Blob type | Are you using page blobs or block blobs when appropriate?

Scalability targets

If your application approaches or exceeds any of the scalability targets, it may encounter increased transaction latencies or throttling. When Azure Storage throttles your application, the service begins to return 503 (Server busy) or 500 (Operation timeout) error codes. Staying within the scalability targets to avoid these errors is an important part of enhancing your application's performance.

For more information about scalability targets for Blob storage, see Azure Storage scalability and performance targets.

Maximum number of storage accounts

If you're approaching the maximum number of storage accounts permitted for a particular subscription/region combination, evaluate your scenario and determine whether any of the following conditions apply:

  • Are you using storage accounts to store unmanaged disks and adding those disks to your virtual machines (VMs)? For this scenario, Azure recommends using managed disks. Managed disks scale for you automatically, without the need to create and manage individual storage accounts. For more information, see Introduction to Azure managed disks.
  • Are you using one storage account per customer for the purpose of data isolation? For this scenario, Azure recommends using a blob container for each customer instead of an entire storage account. Azure Storage now allows you to assign Azure roles on a per-container basis. For more information, see Use the Azure portal to assign an Azure role for access to blob and queue data.
  • Are you using multiple storage accounts as shards to increase ingress, egress, I/O operations per second (IOPS), or capacity? In this scenario, Azure recommends that you take advantage of increased limits for storage accounts to reduce the number of storage accounts required for your workload, if possible. Contact Azure Support to request increased limits for your storage account.

Capacity and transaction targets

If your application is approaching the scalability targets for a single storage account, consider adopting one of the following approaches:

  • If your application hits the transaction target, consider using block blob storage accounts, which are optimized for high transaction rates and low, consistent latency. For more information, see Azure storage account overview.
  • Reconsider the workload that causes your application to approach or exceed the scalability target. Can you design it differently to use less bandwidth or capacity, or fewer transactions?
  • If your application must exceed one of the scalability targets, create multiple storage accounts and partition your application data across them. If you use this pattern, design your application so that you can add more storage accounts in the future for load balancing. Storage accounts themselves have no cost other than your usage in terms of data stored, transactions made, or data transferred.
  • If your application is approaching the bandwidth targets, consider compressing data on the client side to reduce the bandwidth required to send the data to Azure Storage. While compressing data may save bandwidth and improve network performance, it can also have negative effects on performance, so evaluate the impact of the additional client-side processing for compression and decompression. Keep in mind that storing compressed data can make troubleshooting more difficult, because it may be harder to view the data with standard tools.
  • If your application is approaching the scalability targets, make sure that you are using an exponential backoff for retries. It's best to avoid reaching the scalability targets in the first place by implementing the recommendations described in this article. When you do reach them, an exponential backoff keeps your application from retrying rapidly, which would make throttling worse. For more information, see the section titled Timeout and Server Busy errors.
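A minimal sketch of client-side compression, using GZipStream from System.IO.Compression. The ClientCompression class and its method names are hypothetical, and the upload and download calls are left out; measure the CPU cost against the bandwidth saved on your own data before adopting it.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: gzip a payload before upload and restore it after download.
public static class ClientCompression
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so the gzip footer is written.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Repetitive data such as logs or JSON usually compresses well; data that is already compressed (images, video, archives) usually does not, and compressing it only adds CPU cost.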

Multiple clients accessing a single blob concurrently

If you have a large number of clients accessing a single blob concurrently, you need to consider both the per-blob and the per-storage-account scalability targets. The exact number of clients that can access a single blob varies depending on factors such as the number of clients requesting the blob simultaneously, the size of the blob, and network conditions.

If the blob can be distributed through a CDN, as with images or videos served from a website, then you can use a CDN. For more information, see the section titled Content distribution.

In other scenarios, such as scientific simulations where the data is confidential, you have two options. The first is to stagger your workload's access so that the blob is accessed over a period of time rather than simultaneously. Alternatively, you can temporarily copy the blob to multiple storage accounts to increase the total IOPS per blob and across storage accounts. Results vary depending on your application's behavior, so be sure to test concurrency patterns during design.
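One way to stagger access is to size a time window from the expected number of requests and give each client its own start offset within it. The sketch below assumes a target of 500 requests per second for a single blob (the per-blob figure cited in this article) and that each client knows its own 0-based index; AccessStagger and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper: spread clients over a window so the average request
// rate against one blob stays at or below a target rate.
public static class AccessStagger
{
    // Shortest window, in seconds, that keeps the average rate at or below the target.
    public static double MinimumWindowSeconds(int clients, int requestsPerClient,
                                              double maxRequestsPerSecond = 500)
    {
        double totalRequests = (double)clients * requestsPerClient;
        return totalRequests / maxRequestsPerSecond;
    }

    // Client i (0-based) starts i/clients of the way into the window.
    public static TimeSpan StartOffset(int clientIndex, int clients, double windowSeconds)
    {
        if (clientIndex < 0 || clientIndex >= clients)
            throw new ArgumentOutOfRangeException(nameof(clientIndex));
        return TimeSpan.FromSeconds(windowSeconds * clientIndex / clients);
    }
}
```

For example, 1,000 clients that each read the blob twice need a window of at least 4 seconds at 500 requests per second, so client 500 would start 2 seconds in. This bounds only the average rate and doesn't account for retries, so leave some headroom.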

Bandwidth and operations per blob

A single blob supports up to 500 requests per second. If you have multiple clients that need to read the same blob and you might exceed this limit, consider using a block blob storage account. A block blob storage account provides a higher request rate, or I/O operations per second (IOPS).

You can also use a content delivery network (CDN) such as Azure CDN to distribute operations on the blob. For more information about Azure CDN, see Azure CDN overview.

Partitioning

Understanding how Azure Storage partitions your blob data is useful for enhancing performance. Azure Storage can serve data in a single partition more quickly than data that spans multiple partitions. By naming your blobs appropriately, you can improve the efficiency of read requests.

Blob storage uses a range-based partitioning scheme for scaling and load balancing. Each blob has a partition key composed of the full blob name (account+container+blob). The partition key is used to partition blob data into ranges. The ranges are then load-balanced across Blob storage.

Range-based partitioning means that naming conventions that use lexical ordering (for example, mypayroll, myperformance, myemployees, etc.) or timestamps (log20160101, log20160102, log20160103, etc.) are more likely to result in the partitions being co-located on the same partition server, until increased load requires that they be split into smaller ranges. Co-locating blobs on the same partition server enhances performance, so an important part of performance enhancement involves naming blobs in a way that organizes them most effectively.

For example, all blobs within a container can be served by a single server until the load on these blobs requires further rebalancing of the partition ranges. Similarly, a group of lightly loaded accounts with their names arranged in lexical order may be served by a single server until the load on one or all of these accounts requires them to be split across multiple partition servers.

Each load-balancing operation may affect the latency of storage calls while it is in progress. The service's ability to handle a sudden burst of traffic to a partition is limited by the scalability of a single partition server until the load-balancing operation kicks in and rebalances the partition key range.

You can follow some best practices to reduce the frequency of such operations.

  • If possible, use blob or block sizes greater than 4 MiB for standard storage accounts and greater than 256 KiB for premium storage accounts. Larger blob or block sizes automatically activate high-throughput block blobs, which provide high-performance ingest that is not affected by partition naming.

  • Examine the naming convention you use for accounts, containers, blobs, tables, and queues. Consider prefixing account, container, or blob names with a three-digit hash, using a hashing function that best suits your needs.

  • If you organize your data using timestamps or numerical identifiers, make sure that you are not using an append-only (or prepend-only) traffic pattern. These patterns are not suitable for a range-based partitioning system: they may send all traffic to a single partition and prevent the system from load balancing effectively.

    For example, if you have daily operations that use a blob with a timestamp such as yyyymmdd, then all traffic for that daily operation is directed to a single blob, which is served by a single partition server. Consider whether the per-blob and per-partition limits meet your needs, and consider breaking this operation into multiple blobs if needed. Similarly, if you store time series data in your tables, all traffic may be directed to the last part of the key namespace. If you are using numerical IDs, prefix the ID with a three-digit hash. If you are using timestamps, prefix the timestamp with the seconds value, for example, ssyyyymmdd. If your application routinely performs listing and querying operations, choose a hashing function that will limit your number of queries. In some cases, a random prefix may be sufficient.

  • For more information on the partitioning scheme used in Azure Storage, see Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency.
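The naming guidance above can be sketched as follows. This assumes SHA-256 as the hashing function and a dash as the separator; both are arbitrary choices, and BlobNaming is a hypothetical helper, not part of any Azure SDK. Note that .NET format strings use MM for the month and mm for minutes, so the ssyyyymmdd pattern is written as "ssyyyyMMdd".

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for the naming guidance above. String.GetHashCode is
// randomized per process on .NET Core, so a cryptographic hash is used to
// keep prefixes identical across runs and machines.
public static class BlobNaming
{
    // Three-digit prefix ("000" to "999") from the first bytes of a SHA-256 hash.
    public static string HashPrefix(string name)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(name));
            int value = (hash[0] << 16) | (hash[1] << 8) | hash[2];
            return (value % 1000).ToString("D3", CultureInfo.InvariantCulture);
        }
    }

    public static string WithHashPrefix(string name) => HashPrefix(name) + "-" + name;

    // Seconds first, then the date: "ssyyyyMMdd" (MM is month; mm would be minutes).
    public static string SecondsFirstTimestamp(DateTime timestamp) =>
        timestamp.ToString("ssyyyyMMdd", CultureInfo.InvariantCulture);
}
```

Because the prefix is derived from the name itself, a reader that knows the original name can recompute the full blob name without listing the container.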

Networking

The physical network constraints of the application may have a significant impact on performance. The following sections describe some of the limitations users may encounter.

Client network capability

Bandwidth and the quality of the network link play important roles in application performance, as described in the following sections.

Throughput

For bandwidth, the bottleneck is often the capabilities of the client. Larger Azure instances have NICs with greater capacity, so consider using a larger instance or more VMs if you need higher network limits than a single machine provides. If you are accessing Azure Storage from an on-premises application, the same rule applies: understand the network capabilities of the client device and the network connectivity to the Azure Storage location, and either improve them as needed or design your application to work within their capabilities.

As with any network usage, keep in mind that network conditions resulting in errors and packet loss will slow effective throughput. Using Wireshark or NetMon may help in diagnosing this issue.

Location

In any distributed environment, placing the client near the server delivers the best performance. For accessing Azure Storage with the lowest latency, the best location for your client is within the same Azure region. For example, if you have an Azure web app that uses Azure Storage, locate them both within a single region, such as China East or China North. Co-locating resources reduces both latency and cost, because bandwidth usage within a single region is free.

If client applications will access Azure Storage but are not hosted within Azure, such as mobile device apps or on-premises enterprise services, then locating the storage account in a region near those clients may reduce latency. If your clients are broadly distributed, consider using one storage account per region. This approach is easier to implement if the data the application stores is specific to individual users and does not require replicating data between storage accounts.

For broad distribution of blob content, use a content delivery network such as Azure CDN. For more information about Azure CDN, see Azure CDN.

SAS and CORS

Suppose that you need to authorize code such as JavaScript that is running in a user's web browser or in a mobile phone app to access data in Azure Storage. One approach is to build a service application that acts as a proxy. The user's device authenticates with the service, which in turn authorizes access to Azure Storage resources. In this way, you can avoid exposing your storage account keys on insecure devices. However, this approach places a significant overhead on the service application, because all of the data transferred between the user's device and Azure Storage must pass through the service application.

You can avoid using a service application as a proxy for Azure Storage by using shared access signatures (SAS). Using SAS, you can enable your user's device to make requests directly to Azure Storage by using a limited access token. For example, if a user wants to upload a photo to your application, your service application can generate a SAS and send it to the user's device. The SAS token can grant permission to write to an Azure Storage resource for a specified interval of time, after which the SAS token expires. For more information about SAS, see Grant limited access to Azure Storage resources using shared access signatures (SAS).

Typically, a web browser will not allow JavaScript in a page that is hosted by a website on one domain to perform certain operations, such as write operations, to another domain. Known as the same-origin policy, this policy prevents a malicious script on one page from obtaining access to data on another web page. However, the same-origin policy can be a limitation when building a solution in the cloud. Cross-origin resource sharing (CORS) is a browser feature that enables the target domain to tell the browser that it trusts requests originating in the source domain.

For example, suppose a web application running in Azure makes a request for a resource to an Azure Storage account. The web application is the source domain, and the storage account is the target domain. You can configure CORS for any of the Azure Storage services to tell the web browser that Azure Storage trusts requests from the source domain. For more information about CORS, see Cross-origin resource sharing (CORS) support for Azure Storage.

Both SAS and CORS can help you avoid unnecessary load on your web application.

Caching

Caching plays an important role in performance. The following sections discuss caching best practices.

Reading data

In general, reading data once is preferable to reading it twice. Consider the example of a web application that has retrieved a 50 MiB blob from Azure Storage to serve as content to a user. Ideally, the application caches the blob locally to disk and then retrieves the cached version for subsequent user requests.

One way to avoid retrieving a blob that hasn't been modified since it was cached is to qualify the GET operation with a conditional header for modification time. If the last modified time is after the time that the blob was cached, then the blob is retrieved and re-cached. Otherwise, the cached blob is retrieved for optimal performance.
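A minimal sketch of a conditional GET with HttpClient, assuming blobUri is a URL your application is allowed to read (for example, one carrying a SAS token). When the blob hasn't been modified since cachedAt, Blob storage answers 304 (Not Modified) without a body. ConditionalBlobGet is a hypothetical helper; the Azure Storage client libraries expose the same condition through their own request-condition types.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: reuse a cached copy of a blob unless it changed after cachedAt.
public static class ConditionalBlobGet
{
    public static HttpRequestMessage BuildRequest(Uri blobUri, DateTimeOffset cachedAt)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, blobUri);
        request.Headers.IfModifiedSince = cachedAt; // sent as the If-Modified-Since header
        return request;
    }

    // 304 Not Modified means the blob hasn't changed, so the cached copy is still valid.
    public static bool CacheIsFresh(HttpResponseMessage response) =>
        response.StatusCode == HttpStatusCode.NotModified;

    public static async Task<byte[]> GetAsync(HttpClient client, Uri blobUri,
                                              byte[] cached, DateTimeOffset cachedAt)
    {
        using (HttpRequestMessage request = BuildRequest(blobUri, cachedAt))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            if (CacheIsFresh(response))
                return cached;
            response.EnsureSuccessStatusCode();
            // The blob changed: the caller should re-cache these bytes with a new timestamp.
            return await response.Content.ReadAsByteArrayAsync();
        }
    }
}
```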

You may also decide to design your application to assume that the blob remains unchanged for a short period after retrieving it. In this case, the application does not need to check whether the blob was modified during that interval.
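A sketch of the time-based approach, assuming a fixed time-to-live for every blob and a caller-supplied download delegate (both hypothetical). The clock is injectable so the expiry logic can be tested; the class isn't thread-safe, so guard it with a lock or use ConcurrentDictionary if several threads share it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache that treats each blob as unchanged for a fixed time-to-live.
public sealed class TtlBlobCache
{
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;
    private readonly Dictionary<string, (byte[] Data, DateTimeOffset FetchedAt)> _entries =
        new Dictionary<string, (byte[] Data, DateTimeOffset FetchedAt)>();

    public TtlBlobCache(TimeSpan ttl, Func<DateTimeOffset> now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached bytes while they are younger than the TTL; otherwise
    // calls download (for example, a Blob storage GET) and caches the result.
    public byte[] Get(string blobName, Func<string, byte[]> download)
    {
        if (_entries.TryGetValue(blobName, out var entry) && _now() - entry.FetchedAt < _ttl)
            return entry.Data;

        byte[] data = download(blobName);
        _entries[blobName] = (data, _now());
        return data;
    }
}
```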

Configuration data, lookup data, and other data that is frequently used by the application are good candidates for caching.

For more information about using conditional headers, see Specifying conditional headers for Blob service operations.

Uploading data in batches

In some scenarios, you can aggregate data locally and then periodically upload it in a batch instead of uploading each piece of data immediately. For example, suppose a web application keeps a log file of activities. The application can either upload details of every activity to a table as it happens (which requires many storage operations), or it can save activity details to a local log file and then periodically upload all activity details as a delimited file to a blob. If each log entry is 1 KB in size, you can upload thousands of entries in a single transaction. A single Put Blob transaction supports uploading a blob of up to 64 MiB in size (newer service versions raise this limit). The application developer must design for the possibility of client device or upload failures. If the activity data needs to be downloaded for an interval of time rather than for a single activity, then using Blob storage is recommended over Table storage.
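The batching approach can be sketched as a buffer that hands off a newline-delimited payload once it reaches a size threshold. LogBatcher and the uploadBatch delegate are hypothetical stand-ins for your own upload code. At 1 KiB per entry, a 64 MiB batch holds 64 × 1,024 = 65,536 entries. Because the buffer is cleared only after the upload delegate returns, an upload that throws leaves the entries buffered for the next attempt.

```csharp
using System;
using System.Text;

// Hypothetical batcher: accumulate entries locally, upload them as one delimited payload.
public sealed class LogBatcher
{
    private readonly int _maxBatchBytes;
    private readonly Action<string> _uploadBatch;
    private readonly StringBuilder _buffer = new StringBuilder();
    private int _bufferedBytes;

    public LogBatcher(int maxBatchBytes, Action<string> uploadBatch)
    {
        _maxBatchBytes = maxBatchBytes;
        _uploadBatch = uploadBatch;
    }

    public void Add(string entry)
    {
        int entryBytes = Encoding.UTF8.GetByteCount(entry) + 1; // +1 for the '\n' delimiter
        // Flush first if this entry would push the batch past the size threshold.
        if (_bufferedBytes > 0 && _bufferedBytes + entryBytes > _maxBatchBytes)
            Flush();
        _buffer.Append(entry).Append('\n');
        _bufferedBytes += entryBytes;
    }

    public void Flush()
    {
        if (_bufferedBytes == 0) return;
        _uploadBatch(_buffer.ToString()); // if this throws, the buffer is kept
        _buffer.Clear();
        _bufferedBytes = 0;
    }
}
```

Call Flush on a timer and at shutdown so that a partially filled batch isn't lost.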

.NET configuration

If you are using the .NET Framework, you can use the quick configuration settings listed in this section to make significant performance improvements. If you are using other languages, check whether similar concepts apply in your chosen language.

Use .NET Core

Develop your Azure Storage applications with .NET Core 2.1 or later to take advantage of performance enhancements. Using .NET Core 3.x is recommended when possible.

For more information on performance improvements in .NET Core, see the following blog posts:

Increase default connection limit

In .NET, the following code increases the default connection limit (which is usually two in a client environment or ten in a server environment) to 100. Typically, you should set the value to approximately the number of threads used by your application. Set the connection limit before opening any connections.

using System.Net;
ServicePointManager.DefaultConnectionLimit = 100; // Or more; must run before any connection is opened.

For other programming languages, see the documentation to determine how to set the connection limit.

For more information, see the blog post Web Services: Concurrent Connections.
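ServicePointManager settings apply to the .NET Framework networking stack; on .NET Core 2.1 and later, HttpClient doesn't read them, and the equivalent cap is the MaxConnectionsPerServer property of SocketsHttpHandler (unlimited by default). A minimal sketch, with the StorageHttp helper name and the five-minute connection lifetime as assumptions. In the Azure SDK for .NET (v12), an HttpClient built this way can be passed to a client through HttpClientTransport in the client options.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper for .NET Core 2.1+: cap concurrent connections per server
// on the handler, because HttpClient ignores ServicePointManager there.
public static class StorageHttp
{
    public static SocketsHttpHandler CreateHandler(int maxConnectionsPerServer) =>
        new SocketsHttpHandler
        {
            // Default is int.MaxValue; match this to your degree of parallelism.
            MaxConnectionsPerServer = maxConnectionsPerServer,
            // Recycle pooled connections so DNS changes are eventually picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        };

    // Create once and reuse; creating an HttpClient per request can exhaust sockets.
    public static HttpClient CreateClient(int maxConnectionsPerServer) =>
        new HttpClient(CreateHandler(maxConnectionsPerServer));
}
```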

Increase minimum number of threads

If you are using synchronous calls together with asynchronous tasks, you may want to increase the number of threads in the thread pool:

using System.Threading;
ThreadPool.SetMinThreads(100, 100); // (worker threads, I/O completion threads); determine the right numbers for your application.

For more information, see the ThreadPool.SetMinThreads method.

Unbounded parallelism

While parallelism can be great for performance, be careful about using unbounded parallelism, meaning that no limit is enforced on the number of threads or parallel requests. Be sure to limit parallel requests to upload or download data, to access multiple partitions in the same storage account, or to access multiple items in the same partition. If parallelism is unbounded, your application can exceed the client device's capabilities or the storage account's scalability targets, resulting in longer latencies and throttling.
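A minimal sketch of bounded parallelism with SemaphoreSlim: every item gets a task, but at most maxParallelism operations run at once. BoundedParallel is a hypothetical helper, and operation stands in for your own upload or download call. On .NET 6 and later, Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism achieves the same effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: run operation for every item, with at most
// maxParallelism operations in flight at any moment.
public static class BoundedParallel
{
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int maxParallelism,
                                             Func<T, Task> operation)
    {
        if (maxParallelism < 1) throw new ArgumentOutOfRangeException(nameof(maxParallelism));
        using (var gate = new SemaphoreSlim(maxParallelism))
        {
            List<Task> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();        // wait for a free slot
                try { await operation(item); } // e.g. upload or download one blob
                finally { gate.Release(); }    // free the slot even on failure
            }).ToList();
            await Task.WhenAll(tasks);
        }
    }
}
```

This version creates one waiting task per item up front, which is fine for thousands of items; for very large inputs, feeding items from a bounded queue keeps memory use flat.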

Client libraries and tools

For best performance, always use the latest client libraries and tools provided by Microsoft. Azure Storage client libraries are available for a variety of languages. Azure Storage also supports PowerShell and Azure CLI. Microsoft actively develops these client libraries and tools with performance in mind, keeps them up to date with the latest service versions, and ensures that they handle many of the proven performance practices internally. For more information, see the Azure Storage reference documentation.

Handle service errors

Azure Storage returns an error when the service cannot process a request. Understanding the errors that Azure Storage may return in a given scenario is helpful for optimizing performance.

Timeout and Server Busy errors

Azure Storage may throttle your application if it approaches the scalability limits. In some cases, Azure Storage may be unable to handle a request due to some transient condition. In both cases, the service may return a 503 (Server Busy) or 500 (Timeout) error. These errors can also occur if the service is rebalancing data partitions to allow for higher throughput. The client application should typically retry the operation that causes one of these errors. However, if Azure Storage is throttling your application because it is exceeding scalability targets, or even if the service was unable to serve the request for some other reason, aggressive retries may make the problem worse. Using an exponential backoff retry policy is recommended, and the client libraries default to this behavior. For example, your application may retry after 2 seconds, then 4 seconds, then 10 seconds, then 30 seconds, and then give up completely. In this way, your application significantly reduces its load on the service, rather than exacerbating behavior that could lead to throttling.
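A sketch of an exponential backoff delay calculation for code that calls the REST API directly. It assumes a doubling delay capped at a maximum, plus random "full jitter" so that many clients don't retry in lockstep; the doubling schedule differs slightly from the 2, 4, 10, 30 second example above, and all values are illustrative. The Azure SDK for .NET (v12) already does this for you, configurable through the Retry property of its client options (Mode, MaxRetries, Delay, MaxDelay). Backoff is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: exponential backoff with full jitter.
public static class Backoff
{
    // attempt is 0 for the first retry. The ceiling doubles each attempt
    // (baseDelay * 2^attempt) until it reaches maxDelay; the actual delay is a
    // random value in [0, ceiling) so that clients don't retry in lockstep.
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        double exponential = baseDelay.Ticks * Math.Pow(2, Math.Min(attempt, 30));
        double ceilingTicks = Math.Min(maxDelay.Ticks, exponential);
        return TimeSpan.FromTicks((long)(random.NextDouble() * ceilingTicks));
    }
}
```

With a 2-second base and a 30-second cap, the ceilings for the first five retries are 2, 4, 8, 16, and 30 seconds. A caller would wait Backoff.Delay(attempt, ...) after each retryable failure and give up after a fixed number of attempts.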

Connectivity errors can be retried immediately, because they are not the result of throttling and are expected to be transient.

Non-retryable errors

The client libraries handle retries with an awareness of which errors can be retried and which cannot. However, if you are calling the Azure Storage REST API directly, there are some errors that you should not retry. For example, a 400 (Bad Request) error indicates that the client application sent a request that could not be processed because it was not in the expected form. Resending this request results in the same response every time, so there is no point in retrying it. If you are calling the Azure Storage REST API directly, be aware of potential errors and whether they should be retried.
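A simplified sketch of the retry decision for direct REST calls, based on the status code alone and assuming that only 500 and 503 responses are worth retrying. RetryClassifier is a hypothetical helper; the Azure client libraries apply their own, somewhat broader, rules.

```csharp
using System.Net;

// Hypothetical helper: decide from the status code whether resending the
// same request could succeed.
public static class RetryClassifier
{
    public static bool IsRetryable(HttpStatusCode status)
    {
        switch (status)
        {
            case HttpStatusCode.InternalServerError: // 500 (Operation timeout)
            case HttpStatusCode.ServiceUnavailable:  // 503 (Server busy)
                return true;
            default:
                // For example, 400 (Bad Request) means the request itself is malformed;
                // resending it unchanged produces the same error.
                return false;
        }
    }
}
```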

For more information on Azure Storage error codes, see Status and error codes.

Copying and moving blobs

Azure Storage provides a number of solutions for copying and moving blobs within a storage account, between storage accounts, and between on-premises systems and the cloud. This section describes some of these options in terms of their effects on performance. For information about efficiently transferring data to or from Blob storage, see Choose an Azure solution for data transfer.

Blob copy APIs

To copy blobs across storage accounts, use the Put Block From URL operation. This operation copies data synchronously from any URL source into a block blob. Using Put Block From URL can significantly reduce the bandwidth required when you are migrating data across storage accounts: because the copy operation takes place on the service side, you do not need to download and re-upload the data.
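The following sketch plans the service-side copy as one Put Block From URL request per byte range, using the REST API's `comp=block&blockid=…` query, `x-ms-copy-source` header, and `x-ms-source-range` header. It is a minimal sketch under stated assumptions: `dest_blob_url` is assumed to have no query string, the source URL is assumed to be readable by the service (for example, through a SAS token), and authorization and `x-ms-version` headers are omitted. `BlockRequest` and `plan_put_block_from_url` are hypothetical names. The staged blocks still have to be committed with a Put Block List call, which is not shown.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict, List
from urllib.parse import quote


@dataclass
class BlockRequest:
    """One Put Block From URL call: PUT <url> with these headers and no body."""
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


def plan_put_block_from_url(
    dest_blob_url: str, source_url: str, source_size: int, block_size: int
) -> List[BlockRequest]:
    """Split the source into byte ranges, one staged block per range."""
    plan = []
    for index, start in enumerate(range(0, source_size, block_size)):
        end = min(start + block_size, source_size) - 1  # range end is inclusive
        # Block IDs are Base64 strings that must all have the same length per blob.
        block_id = base64.b64encode(f"{index:06d}".encode()).decode()
        plan.append(
            BlockRequest(
                url=f"{dest_blob_url}?comp=block&blockid={quote(block_id, safe='')}",
                headers={
                    "x-ms-copy-source": source_url,
                    "x-ms-source-range": f"bytes={start}-{end}",
                    "Content-Length": "0",  # the data comes from the source, not the body
                },
            )
        )
    return plan
```

No blob data passes through the client here. Each request only tells the service which range to pull from the source.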

To copy data within the same storage account, use the Copy Blob operation. Copying data within the same storage account is typically completed quickly.

Use AzCopy

The AzCopy command-line utility is a simple and efficient option for bulk transfer of blobs to, from, and across storage accounts. AzCopy is optimized for this scenario and can achieve high transfer rates. AzCopy version 10 uses the Put Block From URL operation to copy blob data across storage accounts. For more information, see Copy or move data to Azure Storage by using AzCopy v10.

Use Azure Data Box

For importing large volumes of data into Blob storage, consider using the Azure Data Box family for offline transfers. Azure-supplied Data Box devices are a good choice for moving large amounts of data to Azure when you're limited by time, network availability, or costs. For more information, see the Azure Data Box documentation.
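Whether an offline transfer beats the network mostly comes down to arithmetic on data size and link speed. The following is a rough estimator for that comparison. The function name and the default 80% link utilization are illustrative assumptions, not Azure guidance, and sizes are in decimal terabytes.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(
    data_terabytes: float, link_gbps: float, utilization: float = 0.8
) -> float:
    """Rough days needed to push `data_terabytes` (decimal TB) over a `link_gbps` link.

    `utilization` is the assumed fraction of the link the copy actually gets.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / SECONDS_PER_DAY
```

At full utilization, 100 TB over a 1 Gbps link takes 8 × 10^5 seconds, or about 9.3 days. If that is longer than shipping a Data Box device, the offline option is worth considering.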

Content distribution

Sometimes an application needs to serve the same content (for example, a product demo video used on the home page of a website) to many users located in either the same region or multiple regions. In this scenario, use a content delivery network (CDN) such as Azure CDN to distribute blob content geographically. Unlike an Azure Storage account, which exists in a single region and cannot deliver content with low latency to other regions, Azure CDN uses servers in multiple data centers around the world. Additionally, a CDN can typically support much higher egress limits than a single storage account.

For more information about Azure CDN, see Azure CDN.

Use metadata

The Blob service supports HEAD requests, which can include blob properties or metadata. For example, if your application needs the Exif (Exchangeable image file format) data from a photo, it can retrieve the photo and extract it. To save bandwidth and improve performance, your application can instead store the Exif data in the blob's metadata when it uploads the photo. You can then retrieve the Exif data from the metadata using only a HEAD request. Retrieving only the metadata, not the full contents of the blob, saves significant bandwidth and reduces the processing time required to extract the Exif data. Keep in mind that each blob can store up to 8 KiB of metadata.
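On the wire, blob metadata travels as `x-ms-meta-<name>` headers, both when it is set on upload and when it is returned by a HEAD request. The following is a minimal sketch of both sides that uses plain dictionaries in place of real HTTP requests and responses. `to_metadata_headers` and `metadata_from_head` are hypothetical helpers, and the Exif field names are illustrative. The sketch assumes that names and values both count toward the 8 KiB limit, measured in UTF-8 bytes. Metadata names must be valid C# identifiers, which is why the sketch uses underscores.

```python
# The text states that each blob can store up to 8 KiB of metadata.
METADATA_LIMIT_BYTES = 8 * 1024
META_PREFIX = "x-ms-meta-"


def to_metadata_headers(metadata: dict) -> dict:
    """Build x-ms-meta-* request headers, rejecting metadata over the limit."""
    # Assumption: names and values both count toward the limit.
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in metadata.items())
    if size > METADATA_LIMIT_BYTES:
        raise ValueError(f"metadata is {size} bytes; limit is {METADATA_LIMIT_BYTES}")
    return {META_PREFIX + name: value for name, value in metadata.items()}


def metadata_from_head(response_headers: dict) -> dict:
    """Pull metadata out of HEAD response headers (header names are case-insensitive)."""
    return {
        name[len(META_PREFIX):]: value
        for name, value in response_headers.items()
        if name.lower().startswith(META_PREFIX)
    }
```

The HEAD response carries only headers, so the application gets the Exif fields without transferring the photo itself.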

Upload blobs quickly

To upload blobs quickly, first determine whether you will be uploading one blob or many. Use the following guidance to choose the correct method for your scenario.

Upload one large blob quickly

To upload a single large blob quickly, a client application can upload its blocks or pages in parallel, keeping in mind the scalability targets for individual blobs and for the storage account as a whole. The Azure Storage client libraries support uploading in parallel. For example, the .NET and Java client libraries expose settings that specify the number of concurrent requests permitted. Client libraries for other supported languages provide similar options.
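The following is a minimal sketch of what a parallel block upload does. It splits the data into blocks, stages them concurrently, and commits the block list in the original order once every block has landed. `stage_block` and `commit_block_list` are hypothetical callbacks standing in for the Put Block and Put Block List operations. `max_concurrency` plays the role of the client libraries' concurrency settings, and its default of 4 is an arbitrary assumption.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def upload_blocks_in_parallel(
    data: bytes,
    block_size: int,
    stage_block: Callable[[str, bytes], None],          # hypothetical Put Block wrapper
    commit_block_list: Callable[[List[str]], None],     # hypothetical Put Block List wrapper
    max_concurrency: int = 4,
) -> List[str]:
    """Stage every block concurrently, then commit them in their original order."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Equal-length Base64 block IDs; their order in the committed list defines byte order.
    block_ids = [base64.b64encode(f"{n:06d}".encode()).decode() for n in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Consuming the results waits for all uploads and re-raises any failure.
        list(pool.map(stage_block, block_ids, chunks))
    # Commit only after every block has been staged successfully.
    commit_block_list(block_ids)
    return block_ids
```

Blocks may finish in any order. The committed list, not the upload order, determines the final blob contents.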

Upload many blobs quickly

To upload many blobs quickly, upload the blobs in parallel. Uploading many blobs in parallel is faster than uploading one blob at a time with parallel block uploads, because it spreads the upload across multiple partitions of the storage service. AzCopy performs uploads in parallel by default and is recommended for this scenario. For more information, see Get started with AzCopy.
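If you write your own uploader instead of using AzCopy, the same idea is to run one upload per blob concurrently. The following sketch does this and collects failures instead of stopping at the first one. `upload_one` is a hypothetical callback that uploads a single file to its own blob. The default of 8 workers is an arbitrary assumption that should be tuned against the account's scalability targets.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def upload_many(
    paths: Iterable[str],
    upload_one: Callable[[str], None],  # hypothetical: uploads one file to its own blob
    max_concurrency: int = 8,
) -> Dict[str, BaseException]:
    """Upload each path concurrently; return a map of path -> exception for failures."""
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(upload_one, path): path for path in paths}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures
```

The returned map lets the caller retry only the blobs that failed.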

Choose the correct type of blob

Azure Storage supports block blobs, append blobs, and page blobs. For a given usage scenario, your choice of blob type affects the performance and scalability of your solution.

Block blobs are appropriate when you want to upload large amounts of data efficiently. For example, a client application that uploads photos or video to Blob storage would use block blobs.

Append blobs are similar to block blobs in that they are composed of blocks. When you modify an append blob, blocks are added only to the end of the blob. Append blobs are useful for scenarios such as logging, in which an application needs to add data to an existing blob.

Page blobs are appropriate if the application needs to perform random writes on the data. For example, Azure virtual machine disks are stored as page blobs. For more information, see Understanding block blobs, append blobs, and page blobs.

Next steps