Performance testing
Testing the performance of a Redis instance can be a complicated task. Performance varies with parameters such as the number of clients, the size of data values, and whether pipelining is being used. There can also be a tradeoff between optimizing for throughput and optimizing for latency.
Fortunately, several tools exist to make benchmarking Redis easier. Two of the most popular tools are redis-benchmark and memtier-benchmark. This article focuses on redis-benchmark.
How to use the redis-benchmark utility
Install open source Redis server on a client virtual machine (VM) that you can use for testing. The redis-benchmark utility is built into the open source Redis distribution. Follow the Redis documentation for instructions on how to install the open source image.
The client VM used for testing should be in the same region as your Azure Cache for Redis instance.
Make sure the client VM you use has at least as much compute and bandwidth as the cache instance being tested.
Configure your network isolation and firewall settings to ensure that the client VM is able to access your Azure Cache for Redis instance.
If you're using TLS/SSL on your cache instance, you need to add the --tls parameter to your redis-benchmark command or use a proxy like stunnel.
The redis-benchmark utility uses port 6379 by default. Use the -p parameter to override this setting. You need to use -p if you're using the SSL/TLS port (6380).
If you're using an Azure Cache for Redis instance that uses clustering, you need to add the --cluster parameter to your redis-benchmark command.
Launch redis-benchmark from the CLI or shell of the VM. For instructions on how to configure and run the tool, see the redis-benchmark documentation and the redis-benchmark examples sections.
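The flag rules in the steps above can be sketched as a small Python helper that assembles the argument list for redis-benchmark. This is only an illustration: the function name build_benchmark_args and its parameters are hypothetical, and the host name and access key are the same placeholders used elsewhere in this article. It uses only the flags described above (-h, -a, -p, --tls, --cluster).

```python
from typing import List, Sequence


def build_benchmark_args(
    host: str,
    access_key: str,
    tls: bool = False,
    cluster: bool = False,
    extra: Sequence[str] = (),
) -> List[str]:
    """Assemble a redis-benchmark argument list (hypothetical helper)."""
    args = ["redis-benchmark", "-h", host, "-a", access_key]
    if tls:
        # The SSL/TLS port is 6380, so the default port (6379) must be
        # overridden with -p whenever --tls is used.
        args += ["-p", "6380", "--tls"]
    if cluster:
        # Needed when the cache instance uses clustering.
        args.append("--cluster")
    # Remaining test options, for example -t GET -d 1024.
    args += list(extra)
    return args


if __name__ == "__main__":
    command = build_benchmark_args(
        "yourcache.redis.cache.chinacloudapi.cn",  # placeholder host
        "yourAccesskey",                           # placeholder key
        tls=True,
        extra=("-t", "GET", "-d", "1024"),
    )
    print(" ".join(command))
```

The resulting list could be passed to subprocess.run on the client VM. Building the list instead of a single string avoids shell quoting problems with the access key.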
Benchmarking recommendations
Don't test the performance of your cache only under steady state conditions. Test under failover conditions too, and measure the CPU/Server Load on your cache during that time. You can start a failover by rebooting the primary node. Testing this way shows you the throughput and latency your application sees while a failover is in progress. Failover can happen during updates or during an unplanned event. Ideally, CPU/Server Load shouldn't peak above roughly 80%, even during a failover, because higher load can affect performance.
Consider using Premium tier Azure Cache for Redis instances. These cache sizes have better network latency and throughput because they're running on better hardware.
Tiers based on open source Redis, such as Standard and Premium, are only able to utilize one vCPU for the Redis process per shard.
Using TLS/SSL decreases throughput performance, which can be seen clearly in the example benchmarking data in the following tables.
Even though a Redis server is single-threaded, scaling up tends to improve throughput performance. System processes can use the extra vCPUs instead of sharing the vCPU being used by the Redis process.
On the Premium tier, scaling out (clustering) is typically recommended before scaling up. Clustering allows the Redis server to use more vCPUs by sharding data. Throughput should increase roughly linearly when adding shards in this case.
On C0 and C1 Standard caches, while internal Defender scanning is running on the VMs, you might see short spikes in server load not caused by an increase in cache requests. You see higher latency for requests while internal Defender scans are run on these tiers a couple of times a day. Caches on the C0 and C1 tiers only have a single core to multitask, dividing the work of serving internal Defender scanning and Redis requests. You can reduce the effect by scaling to a higher tier offering with multiple CPU cores, such as C2.
The increased cache size on the higher tiers helps address any latency concerns. Also, at the C2 level, you have support for as many as 2,000 client connections.
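As a rough illustration of the 80% guideline mentioned in the failover recommendation, the following Python sketch flags server load samples that exceed a threshold. The function name, the sample format (timestamp, percent), and the sample values are all hypothetical. They aren't measurements, and this isn't an Azure API.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[str, float]  # (timestamp, server load in percent)


def load_peaks(samples: Iterable[Sample], threshold: float = 80.0) -> List[Sample]:
    """Return the samples whose server load is above the threshold."""
    return [(ts, pct) for ts, pct in samples if pct > threshold]


if __name__ == "__main__":
    # Made-up samples collected around a failover test.
    samples = [
        ("12:00:00", 42.0),
        ("12:00:30", 86.5),  # hypothetical spike during failover
        ("12:01:00", 61.0),
    ]
    for ts, pct in load_peaks(samples):
        print(f"{ts}: server load {pct}% exceeds the threshold")
```

If a check like this reports peaks during a failover test, that might be a sign to consider a larger cache size or more shards.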
Redis-benchmark examples
Pre-test setup: Prepare the cache instance with data required for the latency and throughput testing:
redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t SET -n 10 -d 1024
To test latency: Test GET requests using a 1k payload:
redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -d 1024 -P 50 -c 4
To test throughput: Pipelined GET requests with 1k payload:
redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50
To test throughput of a Basic, Standard, or Premium tier cache using TLS: Pipelined GET requests with 1k payload:
redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -p 6380 -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50 --tls
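To compare runs of the commands above, it can help to collect the results in a machine-readable form. The following Python sketch assumes you add the --csv option to redis-benchmark and that the first two columns of each row are the test name and the requests per second. The exact column layout varies between Redis versions, so treat this as an assumption to verify against your own output. The function name is hypothetical, and the sample text is made up for illustration. It isn't a measured result.

```python
import csv
import io
from typing import Dict


def parse_benchmark_csv(text: str) -> Dict[str, float]:
    """Map test name to requests per second from CSV-style output."""
    results: Dict[str, float] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            results[row[0]] = float(row[1])
        except ValueError:
            continue  # skip a header row or any non-numeric line
    return results


if __name__ == "__main__":
    # Illustrative text only, shaped like the assumed CSV layout.
    sample = '"test","rps"\n"GET","98765.43"\n'
    print(parse_benchmark_csv(sample))
```

Skipping rows whose second column isn't numeric lets the same parser handle output with or without a header line.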
Example performance benchmark data
The following tables show the maximum throughput values that were observed while testing various sizes of Standard and Premium caches. We used redis-benchmark from an IaaS Azure VM against the Azure Cache for Redis endpoint. The throughput numbers are only for GET commands. Typically, SET commands have a lower throughput. These numbers are optimized for throughput. Real-world throughput under acceptable latency conditions might be lower.
The following configuration was used to benchmark throughput for the Basic, Standard, and Premium tiers:
redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50
Caution
These values aren't guaranteed and there's no SLA for these numbers. We strongly recommend that you perform your own performance testing to determine the right cache size for your application. These numbers might change as we post newer results periodically.
Important
Microsoft periodically updates the underlying VM used in cache instances. This can change the performance characteristics from cache to cache and from region to region. The example benchmarking values on this page reflect older generation cache hardware in a single region. You may see better or different results in practice.
Standard tier
Instance | Size | vCPUs | Expected network bandwidth (Mbps) | GET requests per second without SSL (1-kB value size) | GET requests per second with SSL (1-kB value size) |
---|---|---|---|---|---|
C0 | 250 MB | Shared | 100 | 15,000 | 7,500 |
C1 | 1 GB | 1 | 500 | 38,000 | 20,720 |
C2 | 2.5 GB | 2 | 500 | 41,000 | 37,000 |
C3 | 6 GB | 4 | 1,000 | 100,000 | 90,000 |
C4 | 13 GB | 2 | 500 | 60,000 | 55,000 |
C5 | 26 GB | 4 | 1,000 | 102,000 | 93,000 |
C6 | 53 GB | 8 | 2,000 | 126,000 | 120,000 |
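To make the effect of TLS/SSL in the Standard tier table easier to see, the following Python sketch computes the relative drop in GET throughput for each size. The numbers are copied from the table above. The function name is hypothetical, and the percentages are simple arithmetic on those example values, not additional measurements.

```python
from typing import Dict, Tuple

# (GET requests per second without SSL, with SSL), from the Standard tier table.
STANDARD_TIER: Dict[str, Tuple[int, int]] = {
    "C0": (15_000, 7_500),
    "C1": (38_000, 20_720),
    "C2": (41_000, 37_000),
    "C3": (100_000, 90_000),
    "C4": (60_000, 55_000),
    "C5": (102_000, 93_000),
    "C6": (126_000, 120_000),
}


def tls_drop_percent(without_ssl: int, with_ssl: int) -> float:
    """Relative throughput decrease, in percent, when SSL is enabled."""
    return (without_ssl - with_ssl) / without_ssl * 100


if __name__ == "__main__":
    for size, (plain, tls) in STANDARD_TIER.items():
        print(f"{size}: {tls_drop_percent(plain, tls):.1f}% lower with SSL")
```

In these example values, the relative drop is largest on C0 and C1 (about 50% and 45%) and 10% or less on the larger sizes.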
Premium tier
Instance | Size | vCPUs | Expected network bandwidth (Mbps) | GET requests per second without SSL (1-kB value size) | GET requests per second with SSL (1-kB value size) |
---|---|---|---|---|---|
P1 | 6 GB | 2 | 1,500 | 180,000 | 172,000 |
P2 | 13 GB | 4 | 3,000 | 350,000 | 341,000 |
P3 | 26 GB | 4 | 3,000 | 350,000 | 341,000 |
P4 | 53 GB | 8 | 6,000 | 400,000 | 373,000 |
P5 | 120 GB | 32 | 6,000 | 400,000 | 373,000 |
Important
P5 instances in the China East and China North regions use 20 cores, not 32 cores.
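One way to read the tables is to convert a request rate into the network traffic that the values alone would generate. The following Python sketch is a rough estimate under stated assumptions: a 1-kB value is taken as 1,024 bytes, 1 Mbps is taken as 1,000,000 bits per second, and protocol and TLS overhead are ignored. The function name is hypothetical.

```python
def payload_mbps(requests_per_second: int, value_bytes: int = 1024) -> float:
    """Estimate value payload traffic in Mbps, ignoring protocol overhead."""
    bits_per_second = requests_per_second * value_bytes * 8
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # Example rows from the Premium tier table (GET without SSL).
    for size, rps, bandwidth_mbps in [("P1", 180_000, 1_500), ("P4", 400_000, 6_000)]:
        estimate = payload_mbps(rps)
        share = estimate / bandwidth_mbps * 100
        print(f"{size}: about {estimate:.0f} Mbps of payload, "
              f"{share:.0f}% of the expected bandwidth")
```

Under these assumptions, the P1 example value corresponds to roughly 1,475 Mbps of payload, close to its expected bandwidth of 1,500 Mbps, while the P4 example value corresponds to roughly 3,277 Mbps of an expected 6,000 Mbps. This might help you judge whether a given test is more likely limited by the network or by something else, but confirm it with your own measurements.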