Performance testing

Testing the performance of a Redis instance can be complicated. Performance varies with parameters such as the number of clients, the size of the data values, and whether pipelining is used. There can also be a tradeoff between optimizing for throughput and optimizing for latency.

Fortunately, several tools exist to make benchmarking Redis easier. Two of the most popular tools are redis-benchmark and memtier-benchmark. This article focuses on redis-benchmark.

How to use the redis-benchmark utility

  1. Install the open source Redis server on a client VM that you can use for testing. The redis-benchmark utility is built into the open source Redis distribution. Follow the Redis documentation for instructions on how to install the open source image.

  2. The client VM used for testing should be in the same region as your Azure Cache for Redis instance.

  3. Make sure the client VM you use has at least as much compute and bandwidth as the cache instance being tested.

  4. Configure your network isolation and firewall settings to ensure that the client VM is able to access your Azure Cache for Redis instance.

  5. If you're using TLS/SSL on your cache instance, you need to add the --tls parameter to your redis-benchmark command or use a proxy like stunnel.

  6. Redis-benchmark uses port 6379 by default. Use the -p parameter to override this setting. You need to use -p if you're connecting over TLS/SSL (port 6380).

  7. If you're using an Azure Cache for Redis instance that uses clustering, you need to add the --cluster parameter to your redis-benchmark command.

  8. Launch redis-benchmark from the CLI or shell of the VM. For instructions on how to configure and run the tool, see the redis-benchmark documentation and the redis-benchmark examples section.
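
The flag choices in steps 5 through 7 can be sketched as a small shell helper. This is only an illustration: build_benchmark_cmd is a hypothetical function name, not part of redis-benchmark, and it prints the command instead of running it so you can review it first. The host name and access key are the same placeholders used elsewhere in this article.

```shell
# Hypothetical helper: prints (does not run) a redis-benchmark command line.
build_benchmark_cmd() {
  # $1 = host, $2 = access key, $3 = "yes" to use TLS, $4 = "yes" for a clustered cache
  cmd="redis-benchmark -h $1 -a $2"
  if [ "$3" = "yes" ]; then
    # TLS connections use port 6380 instead of the default 6379.
    cmd="$cmd -p 6380 --tls"
  fi
  if [ "$4" = "yes" ]; then
    # Clustered caches need the --cluster parameter.
    cmd="$cmd --cluster"
  fi
  echo "$cmd"
}

# Example: a clustered cache reached over TLS.
build_benchmark_cmd yourcache.redis.cache.chinacloudapi.cn yourAccesskey yes yes
```

Append the test options you need, such as -t, -n, -d, -P, and -c, to the printed command before you run it.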

Benchmarking recommendations

  • Don't test your cache only under steady-state conditions. Test under failover conditions too, and measure the CPU/Server Load on your cache during that time. You can start a failover by rebooting the primary node. Testing this way shows the throughput and latency that your application sees during a failover, which can happen during updates or during an unplanned event. Ideally, CPU/Server Load shouldn't peak above about 80% even during a failover, because higher load can affect performance.

  • Consider using Premium tier Azure Cache for Redis instances. These cache sizes have better network latency and throughput because they're running on better hardware.

  • Tiers based on open source Redis, such as Standard and Premium, are only able to utilize one vCPU for the Redis process per shard.

  • Using TLS/SSL decreases throughput performance, which can be seen clearly in the example benchmarking data in the following tables.

  • Even though a Redis server is single-threaded, scaling up tends to improve throughput performance. System processes can use the extra vCPUs instead of sharing the vCPU being used by the Redis process.

  • On the Premium tier, scaling out (clustering) is typically recommended before scaling up. Clustering allows the Redis server to use more vCPUs by sharding data. Throughput should increase roughly linearly when adding shards in this case.
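
As a rough planning aid for the last recommendation, the linear-scaling expectation can be written as simple arithmetic. This is a sketch under the assumption that scaling is perfectly linear, which real workloads only approximate. estimate_cluster_rps is a hypothetical helper name, and the per-shard figure used in the example is the P1 value without TLS from the example benchmark data in this article.

```shell
# Rough estimate only: cluster throughput ~= single-shard throughput x shard count.
estimate_cluster_rps() {
  # $1 = measured single-shard requests per second, $2 = number of shards
  echo $(( $1 * $2 ))
}

# Example: three shards at the P1 figure of 180,000 GET requests per second.
estimate_cluster_rps 180000 3
```

Treat the result as an upper bound to compare against your own measurements, not as an expected value.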

Redis-benchmark examples

Pre-test setup: Prepare the cache instance with data required for the latency and throughput testing:

redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t SET -n 10 -d 1024

To test latency: Test GET requests using a 1k payload:

redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -d 1024 -P 50 -c 4

To test throughput: Pipelined GET requests with 1k payload:

redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50

To test throughput of a Basic, Standard, or Premium tier cache using TLS: Pipelined GET requests with 1k payload:

redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -p 6380 -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50 --tls
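
Because results depend on the number of clients, it can help to repeat the throughput test at several client counts. The following sketch prints one command per client count without running anything. print_client_sweep is a hypothetical helper, and the client counts in the example (4, 16, and 50) are arbitrary illustrations, not recommended values.

```shell
# Prints (does not run) one pipelined GET throughput command per client count.
print_client_sweep() {
  # $1 = host, $2 = access key, remaining arguments = client counts to try
  sweep_host="$1"
  sweep_key="$2"
  shift 2
  for clients in "$@"; do
    echo "redis-benchmark -h $sweep_host -a $sweep_key -t GET -n 1000000 -d 1024 -P 50 -c $clients"
  done
}

print_client_sweep yourcache.redis.cache.chinacloudapi.cn yourAccesskey 4 16 50
```

Run the printed commands one at a time so that the tests don't compete with each other for client VM resources.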

Example performance benchmark data

The following tables show the maximum throughput values that were observed while testing various sizes of Standard and Premium caches. We used redis-benchmark from an IaaS Azure VM against the Azure Cache for Redis endpoint. The throughput numbers are only for GET commands. Typically, SET commands have a lower throughput. These numbers are optimized for throughput. Real-world throughput under acceptable latency conditions may be lower.

The following configuration was used to benchmark throughput for the Basic, Standard, and Premium tiers:

redis-benchmark -h yourcache.redis.cache.chinacloudapi.cn -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50

Caution

These values aren't guaranteed, and there's no SLA for these numbers. We strongly recommend that you perform your own performance testing to determine the right cache size for your application. These numbers might change as we periodically post newer results.

Important

Microsoft periodically updates the underlying VM used in cache instances. This can change the performance characteristics from cache to cache and from region to region. The example benchmarking values on this page reflect older generation cache hardware in a single region. You may see better or different results in practice.

Standard tier

Instance | Size | vCPUs | Expected network bandwidth (Mbps) | GET requests per second without SSL (1-kB value size) | GET requests per second with SSL (1-kB value size)
C0 | 250 MB | Shared | 100 | 15,000 | 7,500
C1 | 1 GB | 1 | 500 | 38,000 | 20,720
C2 | 2.5 GB | 2 | 500 | 41,000 | 37,000
C3 | 6 GB | 4 | 1,000 | 100,000 | 90,000
C4 | 13 GB | 2 | 500 | 60,000 | 55,000
C5 | 26 GB | 4 | 1,000 | 102,000 | 93,000
C6 | 53 GB | 8 | 2,000 | 126,000 | 120,000
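
To put the cost of TLS/SSL in the Standard tier table into perspective, you can compute the relative drop in throughput for a given row. The sketch below uses awk for the floating-point arithmetic. tls_drop_percent is a hypothetical helper name, and the inputs are the C1 and C6 rows from the table above.

```shell
# Percentage of throughput lost when TLS/SSL is enabled.
tls_drop_percent() {
  # $1 = GET requests per second without TLS, $2 = GET requests per second with TLS
  awk -v plain="$1" -v tls="$2" 'BEGIN { printf "%.1f\n", (plain - tls) / plain * 100 }'
}

tls_drop_percent 38000 20720     # C1 row: prints 45.5
tls_drop_percent 126000 120000   # C6 row: prints 4.8
```

In this example data, the single-vCPU C1 loses far more throughput to TLS than the 8-vCPU C6. Your own results might differ, so run the same comparison on your measurements.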

Premium tier

Instance | Size | vCPUs | Expected network bandwidth (Mbps) | GET requests per second without SSL (1-kB value size) | GET requests per second with SSL (1-kB value size)
P1 | 6 GB | 2 | 1,500 | 180,000 | 172,000
P2 | 13 GB | 4 | 3,000 | 350,000 | 341,000
P3 | 26 GB | 4 | 3,000 | 350,000 | 341,000
P4 | 53 GB | 8 | 6,000 | 400,000 | 373,000
P5 | 120 GB | 32 | 6,000 | 400,000 | 373,000

Important

P5 instances in the China East and China North regions use 20 cores, not 32 cores.
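
When you read the Premium tier table, it can be useful to check how much network bandwidth the response payloads alone would consume at a given request rate. The following sketch makes several simplifying assumptions: each GET returns exactly 1,024 bytes (the -d 1024 setting used in the benchmark commands), 1 Mbps equals 1,000,000 bits per second, and protocol overhead and request traffic are ignored. payload_mbps is a hypothetical helper name.

```shell
# Approximate bandwidth used by response payloads, in Mbps.
payload_mbps() {
  # $1 = requests per second, $2 = payload size in bytes
  awk -v rps="$1" -v bytes="$2" 'BEGIN { printf "%.0f\n", rps * bytes * 8 / 1000000 }'
}

payload_mbps 180000 1024   # P1 row without TLS: prints 1475
payload_mbps 400000 1024   # P4 row without TLS: prints 3277
```

Under these assumptions, the P1 figure sits close to its expected bandwidth of 1,500 Mbps, which suggests that the network might be the limiting factor for that size. The P4 figure is well below 6,000 Mbps, so something other than bandwidth is more likely to limit it. Treat both as rough indications and confirm with your own tests.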

Next steps