Best practices for Azure Cache for Redis

By following these best practices, you can help maximize the performance and cost-effective use of your Azure Cache for Redis instance.

Configuration and concepts

  • Use Standard or Premium tier for production systems. The Basic tier is a single-node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are meant for simple dev/test scenarios because they have a shared CPU core and little memory, and are prone to "noisy neighbor" issues.

  • Remember that Redis is an in-memory data store. This article outlines some scenarios where data loss can occur.

  • Develop your system so that it can handle connection blips caused by patching and failover.

  • Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure. A sufficient reservation is especially important for write-heavy workloads or if you're storing larger values (100 KB or more) in Redis. Start with 10% of the size of your cache and increase this percentage if you have write-heavy loads.

  • Redis works best with smaller values, so consider splitting bigger data across multiple keys. This Redis discussion lists some considerations that you should weigh carefully. Read this article for an example of a problem that can be caused by large values.

  • Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. While you can connect from outside of Azure, it's not recommended, especially when using Redis as a cache. If you're using Redis as just a key/value store, latency may not be the primary concern.

  • Reuse connections. Creating new connections is expensive and increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure to close the old connections before you release them (even in managed memory languages like .NET or Java).

  • Use pipelining. Try to choose a Redis client that supports Redis pipelining, so that you make the most efficient use of the network and get the best throughput you can.

  • Configure your client library to use a connect timeout of at least 15 seconds, giving the system time to connect even under higher CPU conditions. A small connection timeout value doesn't guarantee that the connection is established in that time frame. If something goes wrong (high client CPU, high server CPU, and so on), a short connection timeout value will cause the connection attempt to fail. This behavior often makes a bad situation worse. Instead of helping, shorter timeouts aggravate the problem by forcing the system to restart the process of trying to reconnect, which can lead to a connect -> fail -> retry loop. We generally recommend that you leave your connection timeout at 15 seconds or higher. It's better to let your connection attempt succeed after 15 or 20 seconds than to have it fail quickly only to retry. Such a retry loop can cause your outage to last longer than if you let the system just take longer initially.


    This guidance is specific to the connection attempt and is not related to the time you're willing to wait for an operation like GET or SET to complete.

  • Avoid expensive operations - Some Redis operations, like the KEYS command, are very expensive and should be avoided. For more information, see some considerations around long-running commands.

  • Use TLS encryption - Azure Cache for Redis requires TLS encrypted communications by default. TLS versions 1.0, 1.1, and 1.2 are currently supported. However, TLS 1.0 and 1.1 are on a path to deprecation industry-wide, so use TLS 1.2 if at all possible. If your client library or tool doesn't support TLS, you can enable unencrypted connections through the Azure portal or management APIs. In cases where encrypted connections aren't possible, we recommend placing your cache and client application in a virtual network. For more information about which ports are used in the virtual network cache scenario, see this table.

  • Idle timeout - Azure Redis currently has a 10-minute idle timeout for connections, so the idle timeout on your side should be set to less than 10 minutes.
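
The advice above about splitting bigger data across multiple keys could be sketched roughly as follows. This is a minimal illustration, not a prescribed scheme: the key naming pattern (`:chunk:N`, `:chunks`), the 64 KB chunk size, and the helper names are all assumptions made for the example, and a plain dictionary stands in for the cache so the sketch runs without a Redis client.

```python
from typing import Dict

# Hypothetical chunk size; the right value depends on your workload.
CHUNK_SIZE = 64 * 1024


def split_into_chunks(key: str, value: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[str, bytes]:
    """Map one large value onto several smaller keys plus a chunk-count key."""
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    entries = {f"{key}:chunk:{n}": chunk for n, chunk in enumerate(chunks)}
    # Store the number of chunks so a reader knows how many keys to fetch.
    entries[f"{key}:chunks"] = str(len(chunks)).encode()
    return entries


def join_chunks(key: str, store: Dict[str, bytes]) -> bytes:
    """Reassemble the original value from its chunk keys."""
    count = int(store[f"{key}:chunks"])
    return b"".join(store[f"{key}:chunk:{n}"] for n in range(count))


# A dict stands in for the cache; with a real client each entry would be
# written and read with the client's own set/get calls.
store: Dict[str, bytes] = {}
payload = bytes(range(256)) * 1000  # 256,000 bytes, well above 100 KB
store.update(split_into_chunks("report:42", payload))
restored = join_chunks("report:42", store)
```

With a real cache, the chunks would ideally be written and read through a pipeline so that the extra keys don't cost extra round trips.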

Memory management

There are several things related to memory usage within your Redis server instance that you may want to consider. Here are a few:

  • Choose an eviction policy that works for your application. The default policy for Azure Redis is volatile-lru, which means that only keys that have a TTL value set are eligible for eviction. If no keys have a TTL value, the system won't evict any keys. If you want the system to allow any key to be evicted under memory pressure, consider the allkeys-lru policy.

  • Set an expiration value on your keys. An expiration removes keys proactively instead of waiting until there's memory pressure. When eviction does kick in because of memory pressure, it can cause additional load on your server. For more information, see the documentation for the EXPIRE and EXPIREAT commands.
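
To make the difference between the two eviction policies concrete, here is a small model of which keys each policy considers eligible. It is only a sketch: the function name and the sample keys are hypothetical, and it models eligibility alone, not the order in which Redis would actually pick keys to evict.

```python
from typing import Dict, List, Optional


def eviction_candidates(ttls: Dict[str, Optional[int]], policy: str) -> List[str]:
    """Return the keys a policy may evict; a TTL of None means no expiration is set."""
    if policy == "volatile-lru":
        # Only keys that have a TTL are eligible.
        return sorted(key for key, ttl in ttls.items() if ttl is not None)
    if policy == "allkeys-lru":
        # Every key is eligible, with or without a TTL.
        return sorted(ttls)
    raise ValueError(f"policy not modeled in this sketch: {policy}")


# Hypothetical keys: two with a TTL in seconds, one without.
keys = {"session:1": 300, "config:site": None, "cart:9": 60}
volatile = eviction_candidates(keys, "volatile-lru")
allkeys = eviction_candidates(keys, "allkeys-lru")
```

Under volatile-lru, `config:site` can never be evicted, so a cache filled mostly with keys like it has nothing to free when memory runs short.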

Client library specific guidance

When is it safe to retry?

Unfortunately, there's no easy answer. Each application needs to decide which operations can be retried and which can't. Each operation has different requirements and inter-key dependencies. Here are some things you might consider:

  • You can get client-side errors even though Redis successfully ran the command you asked it to run. For example:
    • Timeouts are a client-side concept. If the operation reached the server, the server will run the command even if the client gives up waiting.
    • When an error occurs on the socket connection, it's not possible to know whether the operation actually ran on the server. For example, the connection error can happen after the server processed the request but before the client receives the response.
  • How does my application react if I accidentally run the same operation twice? For instance, what if I increment an integer twice instead of once? Is my application writing to the same key from multiple places? What if my retry logic overwrites a value set by some other part of my app?
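
The following sketch illustrates the considerations above with a stand-in server whose first reply is lost after the command has already run. Everything here is hypothetical (the `FlakyServer` class, the `call_with_retry` helper, the key name); it is not a real client API, only a way to see why retrying an increment differs from retrying a plain write.

```python
class FlakyServer:
    """Stand-in for a server that runs commands but loses its first reply."""

    def __init__(self):
        self.data = {}
        self._lose_next_reply = True

    def _reply(self, value):
        # The command has already been applied when the reply is lost.
        if self._lose_next_reply:
            self._lose_next_reply = False
            raise TimeoutError("reply lost after the command ran")
        return value

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self._reply(self.data[key])

    def set(self, key, value):
        self.data[key] = value
        return self._reply("OK")


def call_with_retry(operation, attempts=2):
    """Retry blindly on timeout, as naive retry logic might."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise


# Retrying a non-idempotent command: the counter is incremented twice.
server_a = FlakyServer()
incr_result = call_with_retry(lambda: server_a.incr("visits"))

# Retrying an idempotent command: the final state is the same as one run.
server_b = FlakyServer()
set_result = call_with_retry(lambda: server_b.set("visits", 1))
```

Here the retried increment leaves the counter at 2 although the caller asked for one increment, while the retried write leaves the value at 1. Even the write is only safe if no other part of the application changes the same key between the two attempts.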

If you would like to test how your code works under error conditions, consider using the Reboot feature. Rebooting allows you to see how connection blips affect your application.

Performance testing

  • Start by using redis-benchmark.exe to get a feel for possible throughput and latency before writing your own performance tests. Redis-benchmark documentation can be found here. Note that redis-benchmark doesn't support TLS, so you'll have to enable the non-TLS port through the portal before you run the test. A Windows-compatible version of redis-benchmark.exe can be found here.

  • The client VM used for testing should be in the same region as your Redis cache instance.

  • We recommend using the Dv2 VM series for your client, as these VMs have better hardware and will give the best results.

  • Make sure the client VM you use has at least as much compute and bandwidth as the cache being tested.

  • Test under failover conditions on your cache. It's important to ensure that you don't performance test your cache only under steady-state conditions. Also test under failover conditions and measure the CPU / Server Load on your cache during that time. You can initiate a failover by rebooting the primary node. This lets you see how your application behaves in terms of throughput and latency during failover conditions (failovers happen during updates and can also happen during unplanned events). Ideally, you don't want to see CPU / Server Load peak at more than, say, 80% even during a failover, as that can affect performance.

  • Some cache sizes are hosted on VMs with 4 or more cores. This is useful for distributing the TLS encryption / decryption as well as TLS connection / disconnection workloads across multiple cores to bring down overall CPU usage on the cache VMs. See here for details around VM sizes and cores.

  • Enable VRSS on the client machine if you are on Windows. See here for details. Example PowerShell script:

    PowerShell -ExecutionPolicy Unrestricted Enable-NetAdapterRSS -Name (Get-NetAdapter).Name

  • Consider using Premium tier Redis instances. These cache sizes have better network latency and throughput because they run on better hardware for both CPU and network.


    Our observed performance results are published here for your reference. Also, be aware that SSL/TLS adds some overhead, so you may get different latencies and/or throughput if you're using transport encryption.

Redis-Benchmark examples

Pre-test setup: Prepare the cache instance with the data required for the latency and throughput testing commands listed below.

redis-benchmark -h yourcache.redis.cache.windows.net -a yourAccesskey -t SET -n 10 -d 1024

To test latency: Test GET requests using a 1k payload.

redis-benchmark -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -d 1024 -P 50 -c 4

To test throughput: Pipelined GET requests with a 1k payload.

redis-benchmark -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50