Troubleshoot data loss in Azure Cache for Redis

This article discusses how to diagnose actual or perceived data loss in Azure Cache for Redis.

Note

Several of the troubleshooting steps in this guide include instructions to run Redis commands and monitor various performance metrics. For more information and instructions, see the articles in the Additional information section.

Partial loss of keys

Azure Cache for Redis doesn't randomly delete keys after they've been stored in memory. However, it does remove keys in response to expiration or eviction policies and to explicit key-deletion commands. In addition, keys that have been written to the primary node in a Premium or Standard Azure Cache for Redis instance might not be available on a replica right away, because data is replicated from the primary to the replica in an asynchronous, non-blocking manner.

If you find that keys have disappeared from your cache, check the following possible causes:

Cause | Description
Key expiration | Keys are removed because of time-outs set on them.
Key eviction | Keys are removed under memory pressure.
Key deletion | Keys are removed by explicit delete commands.
Async replication | Keys are not available on a replica because of data-replication delays.

Key expiration

Azure Cache for Redis removes a key automatically if the key is assigned a time-out and that period has passed. For more information about Redis key expiration, see the EXPIRE command documentation. Time-out values can also be set by using the SET command (with its EX or PX option) and the SETEX command. Commands that overwrite a key's value, such as a plain SET, GETSET, and the *STORE commands, clear any existing time-out on that key.

To get stats on how many keys have expired, use the INFO command. The Stats section shows the total number of expired keys. The Keyspace section provides more information about the number of keys with time-outs and the average time-out value.

# Stats

expired_keys:46583

# Keyspace

db0:keys=3450,expires=2,avg_ttl=91861015336

You can also look at diagnostic metrics for your cache to see whether the time the key went missing correlates with a spike in expired keys. See the Appendix of Debugging Redis Keyspace Misses for information about using keyspace notifications or MONITOR to debug these types of issues.
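The INFO fields described in this section can also be read programmatically. The following is a minimal Python sketch, assuming you already have the raw INFO text from your client; the helper names `parse_info` and `parse_keyspace_entry` are hypothetical and not part of any Redis library. Note that Redis reports `avg_ttl` in milliseconds.

```python
def parse_info(text):
    """Parse raw INFO text into {section: {field: value}}; values stay strings."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section headers look like "# Stats".
            current = line.lstrip("#").strip()
            sections[current] = {}
        elif ":" in line and current is not None:
            field, value = line.split(":", 1)
            sections[current][field] = value
    return sections


def parse_keyspace_entry(value):
    """Turn 'keys=3450,expires=2,avg_ttl=...' into a dict of integers."""
    return {k: int(v) for k, v in (pair.split("=", 1) for pair in value.split(","))}


# Sample text copied from the INFO output shown in this article.
sample = """# Stats
expired_keys:46583

# Keyspace
db0:keys=3450,expires=2,avg_ttl=91861015336
"""

info = parse_info(sample)
expired_total = int(info["Stats"]["expired_keys"])
db0 = parse_keyspace_entry(info["Keyspace"]["db0"])
print(expired_total, db0["keys"], db0["expires"], db0["avg_ttl"])
```

Sampling `expired_total` at intervals and comparing the values is one way to see whether expirations spiked around the time a key went missing.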

Key eviction

Azure Cache for Redis requires memory space to store data. It purges keys to free up memory when necessary. When the used_memory or used_memory_rss values in the INFO command approach the configured maxmemory setting, Azure Cache for Redis starts evicting keys from memory based on the cache policy.

You can monitor the number of evicted keys by using the INFO command:

# Stats

evicted_keys:13224

You can also look at diagnostic metrics for your cache to see whether the time the key went missing correlates with a spike in evicted keys. See the Appendix of Debugging Redis Keyspace Misses for information about using keyspace notifications or MONITOR to debug these types of issues.
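To judge how close a cache is to eviction, the used_memory and maxmemory values from INFO can be compared directly. The sketch below is illustrative only: the function name `memory_pressure`, the 0.9 threshold, and the sample numbers are assumptions, not values recommended by Azure.

```python
def memory_pressure(info, threshold=0.9):
    """Return (ratio, is_near_limit) from INFO memory fields given as strings.

    `threshold` is a hypothetical warning level, not an Azure default.
    """
    used = int(info["used_memory"])
    limit = int(info["maxmemory"])
    if limit == 0:
        # In Redis, maxmemory 0 means no configured limit.
        return 0.0, False
    ratio = used / limit
    return ratio, ratio >= threshold


# Hypothetical sample values, in bytes.
sample = {"used_memory": "950", "maxmemory": "1000", "evicted_keys": "13224"}
ratio, near_limit = memory_pressure(sample)
print(ratio, near_limit)
```

A ratio that stays near the limit while evicted_keys keeps growing suggests that missing keys were evicted rather than deleted.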

Key deletion

Redis clients can issue the DEL or HDEL command to explicitly remove keys from Azure Cache for Redis. You can track the number of delete operations by using the INFO command. If DEL or HDEL commands have been called, they're listed in the Commandstats section.

# Commandstats

cmdstat_del:calls=2,usec=90,usec_per_call=45.00

cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00
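Each Commandstats line follows the same `cmdstat_<command>:field=value,...` layout, so it can be split mechanically. The following sketch assumes that layout; `parse_cmdstat` is a hypothetical helper name.

```python
def parse_cmdstat(line):
    """Split one Commandstats line into (command_name, stats_dict)."""
    name, _, fields = line.strip().partition(":")
    prefix = "cmdstat_"
    if name.startswith(prefix):
        name = name[len(prefix):]
    stats = {}
    for pair in fields.split(","):
        key, _, value = pair.partition("=")
        # usec_per_call is a decimal; the other fields here are integers.
        stats[key] = float(value) if "." in value else int(value)
    return name, stats


# Lines copied from the sample output in this article.
lines = [
    "cmdstat_del:calls=2,usec=90,usec_per_call=45.00",
    "cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00",
]
delete_calls = {name: stats["calls"] for name, stats in map(parse_cmdstat, lines)}
print(delete_calls)
```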

Async replication

Any Azure Cache for Redis instance in the Standard or Premium tier is configured with a primary node and at least one replica. Data is copied from the primary to a replica asynchronously by a background process. The redis.io website describes how Redis data replication works in general. In scenarios where clients write to Redis frequently, partial data loss can occur because this replication is not guaranteed to be instantaneous. For example, if the primary goes down after a client writes a key to it, but before the background process has a chance to send that key to the replica, the key is lost when the replica takes over as the new primary.
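In open-source Redis, the Replication section of INFO on the primary reports a `master_repl_offset` value and one `slaveN` entry per replica, each with its own `offset`. The difference approximates how many bytes of writes the replica has not yet received. The sketch below assumes those fields are available as raw strings; whether your Azure Cache for Redis instance exposes them in this form is an assumption, and the sample values and the name `replica_lag_bytes` are hypothetical.

```python
def replica_lag_bytes(replication):
    """Return {replica_field: bytes behind the primary} from raw INFO Replication fields."""
    primary_offset = int(replication["master_repl_offset"])
    lags = {}
    for field, value in replication.items():
        # Replica entries are named slave0, slave1, ... on the primary node.
        if field.startswith("slave") and field[5:].isdigit():
            attrs = dict(pair.split("=", 1) for pair in value.split(","))
            lags[field] = primary_offset - int(attrs["offset"])
    return lags


# Hypothetical values for illustration only.
sample = {
    "role": "master",
    "connected_slaves": "1",
    "slave0": "ip=10.0.0.5,port=6379,state=online,offset=4100,lag=1",
    "master_repl_offset": "4256",
}
lags = replica_lag_bytes(sample)
print(lags)
```

Any write that falls inside this gap at the moment of a failover is a candidate for the kind of loss described above.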

Major or complete loss of keys

If most or all keys have disappeared from your cache, check the following possible causes:

Cause | Description
Key flushing | Keys have been purged manually.
Incorrect database selection | Azure Cache for Redis is set to use a non-default database.
Redis instance failure | The Redis server is unavailable.

Key flushing

Clients can call the FLUSHDB command to remove all keys in a single database, or FLUSHALL to remove all keys from all databases in a Redis cache. To find out whether keys have been flushed, use the INFO command. The Commandstats section shows whether either FLUSH command has been called:

# Commandstats

cmdstat_flushall:calls=2,usec=112,usec_per_call=56.00

cmdstat_flushdb:calls=1,usec=110,usec_per_call=52.00
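Because the Commandstats counters are cumulative, a single reading shows only that a flush happened at some point. Comparing two readings narrows down when. The following sketch assumes you have already collected call counts into dictionaries at two points in time; `new_calls` and the sample counts are hypothetical.

```python
def new_calls(before, after, commands=("flushall", "flushdb")):
    """Return {command: additional calls} for watched commands whose count grew."""
    increases = {}
    for command in commands:
        delta = after.get(command, 0) - before.get(command, 0)
        if delta > 0:
            increases[command] = delta
    return increases


# Hypothetical call counts taken from Commandstats at two different times.
earlier = {"flushall": 2, "flushdb": 1, "del": 10}
later = {"flushall": 3, "flushdb": 1, "del": 25}
flushes = new_calls(earlier, later)
print(flushes)
```

A non-empty result means a flush command ran between the two readings.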

Incorrect database selection

Azure Cache for Redis uses the db0 database by default. If you switch to another database (for example, db1) and try to read keys from it, Azure Cache for Redis won't find them there. Every database is a logically separate unit and holds a different dataset. Use the SELECT command to switch to other available databases and look for the keys in each of them.
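Checking each database by hand can be tedious, so the lookup can be scripted. The sketch below takes a `connect(db)` factory so that it does not depend on a particular client library; `find_key`, `FakeClient`, the key name, and the default of 16 databases (the open-source Redis default; the number available on Azure depends on your cache) are all assumptions for illustration.

```python
def find_key(connect, key, db_count=16):
    """Return the database indexes in which `key` exists.

    `connect(db)` must return an object with an `exists(key)` method that
    returns a truthy value when the key is present.
    """
    found = []
    for db in range(db_count):
        client = connect(db)
        if client.exists(key):
            found.append(db)
    return found


class FakeClient:
    """In-memory stand-in for a Redis client, used only for this demonstration."""

    def __init__(self, keys):
        self._keys = keys

    def exists(self, key):
        return 1 if key in self._keys else 0


# Hypothetical data: the key lives in db1, not in the default db0.
fake_dbs = {0: set(), 1: {"session:42"}}
locations = find_key(lambda db: FakeClient(fake_dbs.get(db, set())), "session:42", db_count=4)
print(locations)

# With the redis-py package, the factory could look like this (placeholders in angle brackets):
# connect = lambda db: redis.Redis(host="<cache-name>.redis.cache.windows.net",
#                                  port=6380, password="<access-key>", ssl=True, db=db)
```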

Redis instance failure

Redis is an in-memory data store. Data is kept on the physical or virtual machines that host the Redis cache. An Azure Cache for Redis instance in the Basic tier runs on only a single virtual machine (VM). If that VM goes down, all data that you've stored in the cache is lost.

Caches in the Standard and Premium tiers offer much higher resiliency against data loss by using two VMs in a replicated configuration. When the primary node in such a cache fails, the replica node takes over to serve data automatically. These VMs are located in separate fault and update domains to minimize the chance of both becoming unavailable simultaneously. If a major datacenter outage happens, however, the VMs might still go down together. In these rare cases, your data is lost.

Consider using Redis data persistence and geo-replication to improve protection of your data against these infrastructure failures.

Additional information