How to choose between standard (manual) and autoscale provisioned throughput

APPLIES TO: NoSQL, MongoDB, Cassandra, Gremlin, Table

Azure Cosmos DB supports two types or offers of provisioned throughput: standard (manual) and autoscale. Both throughput types are suitable for mission-critical workloads that require high performance and scale, and are backed by the same Azure Cosmos DB SLAs on throughput, availability, latency, and consistency.

This article describes how to choose between standard (manual) and autoscale provisioned throughput for your workload.

Overview of provisioned throughput types

Before diving into the difference between standard (manual) and autoscale, it's important to first understand how provisioned throughput works in Azure Cosmos DB.

When you use provisioned throughput, you set the throughput, measured in request units per second (RU/s), that your workload requires. The service provisions the capacity needed to support that throughput. Database operations against the service, such as reads, writes, and queries, consume some amount of request units (RUs). Learn more about request units.

The following comparison shows, row by row, the high-level differences between standard (manual) and autoscale.

Best suited for
  • Standard (manual): Workloads with steady or predictable traffic.
  • Autoscale: Workloads with variable or unpredictable traffic. See use cases of autoscale.

How it works
  • Standard (manual): You provision a set amount of RU/s, T, that stays static over time unless you manually change it. Each second, you can use up to T RU/s of throughput. For example, if you set standard (manual) throughput to 400 RU/s, the throughput stays at 400 RU/s.
  • Autoscale: You set the highest, or maximum, RU/s (Tmax) that you don't want the system to exceed. The system automatically scales the throughput T such that 0.1 * Tmax <= T <= Tmax. For example, if you set an autoscale maximum of 4000 RU/s, the system scales between 400 and 4000 RU/s.

When to use it
  • Standard (manual): You want to manage your throughput capacity (RU/s) and scale it yourself. You have high, consistent utilization of provisioned RU/s. Of all hours in a month, if you set provisioned RU/s T and use the full amount for 66% of the hours or more, it's estimated you'll save with standard (manual) provisioned RU/s. This estimate is based on a comparison between setting T in standard (manual) and the same amount Tmax in autoscale.
  • Autoscale: You want Azure Cosmos DB to manage your throughput capacity (RU/s) and scale it based on usage. You have RU/s usage that is variable or hard to predict. Of all hours in a month, if you set autoscale max RU/s Tmax and use the full amount Tmax for 66% of the hours or less, it's estimated you'll save with autoscale. This estimate is based on a comparison between setting autoscale Tmax and the same amount T in standard (manual) throughput.

Billing model
  • Standard (manual): Billing is done on a per-hour basis for the RU/s provisioned, regardless of how many RUs were consumed. Example:
      • Provision 400 RU/s
      • Hour 1: no requests
      • Hour 2: 400 RU/s worth of requests
      • You're billed for 400 RU/s in each of hours 1 and 2 at the standard (manual) rate.
  • Autoscale: Billing is done on a per-hour basis for the highest RU/s the system scaled to in the hour. Example:
      • Provision autoscale max RU/s of 4000 RU/s (scales between 400 and 4000 RU/s)
      • Hour 1: system scaled up to a highest value of 3500 RU/s
      • Hour 2: system scaled down to the minimum of 400 RU/s (always 10% of Tmax), due to no usage
      • You're billed for 3500 RU/s in hour 1 and 400 RU/s in hour 2 at the autoscale provisioned throughput rate. The autoscale rate per RU/s is 1.5 times the standard (manual) rate.

What happens if you exceed provisioned RU/s
  • Standard (manual): The RU/s remain static at what is provisioned. Any requests that consume beyond the provisioned RU/s in a second are rate-limited, with a response that recommends a time to wait before retrying. You can manually increase or decrease the RU/s if needed.
  • Autoscale: The system scales the RU/s up to the autoscale max RU/s. Any requests that consume beyond the autoscale max RU/s in a second are rate-limited, with a response that recommends a time to wait before retrying.
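As a rough sketch of the billing model described above, the following Python snippet computes the RU/s billed in each hour under both throughput types, using the 400 RU/s and 4000 RU/s examples from the comparison. The function names and the "cost units" measure (billed RU/s-hours expressed at the standard rate) are illustrative assumptions, not part of any Azure SDK. Actual prices depend on your region and account configuration.

```python
# Illustrative only: models the billing rules stated in this article.
AUTOSCALE_RATE_MULTIPLIER = 1.5  # autoscale rate per RU/s relative to standard (manual)


def autoscale_range(t_max):
    """Return the (minimum, maximum) RU/s autoscale scales between.

    The minimum is 10% of Tmax. Integer division assumes t_max is a
    multiple of 10 (an assumption made to keep the arithmetic exact).
    """
    return (t_max // 10, t_max)


def manual_billed_ru(provisioned, hours):
    """Standard (manual): each hour is billed at the provisioned RU/s,
    regardless of how many RUs were consumed."""
    return [provisioned] * hours


def autoscale_billed_ru(t_max, hourly_peak_ru):
    """Autoscale: each hour is billed at the highest RU/s the system
    scaled to, which always lies between 10% of Tmax and Tmax."""
    low, high = autoscale_range(t_max)
    return [min(max(peak, low), high) for peak in hourly_peak_ru]


def cost_units(billed_ru, rate_multiplier=1.0):
    """Total cost in standard-rate RU/s-hours (a hypothetical unit)."""
    return sum(billed_ru) * rate_multiplier


# Standard (manual) example: 400 RU/s provisioned for two hours.
manual = manual_billed_ru(400, hours=2)
print(manual)              # [400, 400]
print(cost_units(manual))  # 800.0

# Autoscale example: Tmax = 4000, hour 1 peaks at 3500 RU/s, hour 2 has no usage.
autoscale = autoscale_billed_ru(4000, [3500, 0])
print(autoscale)           # [3500, 400]
print(cost_units(autoscale, AUTOSCALE_RATE_MULTIPLIER))  # 5850.0
```

The two examples provision different capacities, so the totals are not a like-for-like comparison. They only show how each hour's billed RU/s is determined.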

    Understand your traffic patterns

    New applications

    If you're building a new application and don't know your traffic pattern yet, you may want to start at the entry point RU/s (or minimum RU/s) to avoid over-provisioning in the beginning. Or, if you have a small application that doesn't need high scale, you may want to provision just the minimum entry point RU/s to optimize cost. For small applications with low expected traffic, you can also consider the serverless capacity mode.

    Whether you plan to use standard (manual) or autoscale, here's what you should consider:

    If you provision standard (manual) RU/s at the entry point of 400 RU/s, you won't be able to consume above 400 RU/s, unless you manually change the throughput. You'll be billed for 400 RU/s at the standard (manual) provisioned throughput rate, per hour.

    If you provision autoscale throughput with max RU/s of 4000 RU/s, the resource will scale between 400 and 4000 RU/s. Because the autoscale billing rate per RU/s is 1.5x the standard (manual) rate, for hours where the system has scaled down to the minimum of 400 RU/s, your bill will be higher than if you had provisioned 400 RU/s manually. However, with autoscale, if your application traffic spikes at any time, you can consume up to 4000 RU/s with no user action required. In general, you should weigh the benefit of being able to consume up to the max RU/s at any time against the 1.5x rate of autoscale.
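To make this trade-off concrete, here is a minimal sketch that compares the two entry-point options over a made-up series of hourly peak demand. The function name, the demand values, and the notion of a "rate-limited hour" (an hour whose peak demand exceeds the available RU/s) are simplifying assumptions for illustration. Costs are expressed in standard-rate RU/s-hours rather than real prices.

```python
def compare_entry_options(hourly_peak_demand, manual_ru=400,
                          autoscale_max=4000, autoscale_multiplier=1.5):
    """Compare standard (manual) at a fixed RU/s with autoscale at a max RU/s.

    hourly_peak_demand: peak RU/s the application would like to consume
    in each hour (hypothetical input).
    """
    autoscale_min = autoscale_max // 10  # autoscale floor is 10% of the max

    # Standard (manual): billed at the provisioned RU/s every hour.
    manual_cost = manual_ru * len(hourly_peak_demand)

    # Autoscale: billed at the highest RU/s scaled to in each hour,
    # between the floor and the max, at 1.5x the standard rate.
    autoscale_cost = sum(
        min(max(demand, autoscale_min), autoscale_max) * autoscale_multiplier
        for demand in hourly_peak_demand
    )

    return {
        "manual_cost": manual_cost,
        "manual_rate_limited_hours": sum(d > manual_ru for d in hourly_peak_demand),
        "autoscale_cost": autoscale_cost,
        "autoscale_rate_limited_hours": sum(d > autoscale_max for d in hourly_peak_demand),
    }


# Three quiet hours followed by a spike to 2500 RU/s (made-up values).
result = compare_entry_options([0, 0, 200, 2500])
print(result["manual_cost"])                # 1600
print(result["manual_rate_limited_hours"])  # 1
print(result["autoscale_cost"])             # 5550.0
print(result["autoscale_rate_limited_hours"])  # 0
```

In this made-up profile, standard (manual) is cheaper but cannot serve the spike without a manual change, while autoscale costs more in the idle hours (600 versus 400 units per hour) and absorbs the spike automatically.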

    Use the Azure Cosmos DB capacity calculator to estimate your throughput requirements.

    Existing applications

    If you have an existing application using standard (manual) provisioned throughput, you can use Azure Monitor metrics to determine if your traffic pattern is suitable for autoscale.

    First, find the normalized request unit consumption metric of your database or container.

    Next, determine how the normalized utilization varies over time. Find the highest normalized utilization for each hour. Then, calculate the average normalized utilization across all hours. If you see that your average utilization is less than 66%, consider enabling autoscale on your database or container. In contrast, if the average utilization is greater than 66%, it's recommended to remain on standard (manual) provisioned throughput.
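The rule of thumb above can be sketched in a few lines of Python. The function names and sample values are hypothetical, and the input is assumed to be the highest normalized utilization (as a percentage) for each hour. The 66% threshold is consistent with the 1.5x autoscale rate, since 1 / 1.5 is roughly 0.667. The text doesn't say what to do at exactly 66%, so this sketch arbitrarily treats that case as standard (manual).

```python
def average_hourly_max_utilization(hourly_max_percent):
    """Average the per-hour maximum normalized utilization (0-100)."""
    if not hourly_max_percent:
        raise ValueError("at least one hourly value is required")
    return sum(hourly_max_percent) / len(hourly_max_percent)


def recommend_throughput_type(hourly_max_percent, threshold_percent=66.0):
    """Suggest a throughput type from the average of hourly maximums.

    Below the threshold: autoscale. Otherwise: standard (manual).
    Treating exactly 66% as standard (manual) is an assumption.
    """
    average = average_hourly_max_utilization(hourly_max_percent)
    return "autoscale" if average < threshold_percent else "standard (manual)"


# Spiky workload: mostly idle with one busy hour (made-up values).
print(recommend_throughput_type([10, 20, 100, 30]))   # autoscale

# Steady workload with consistently high utilization (made-up values).
print(recommend_throughput_type([90, 100, 80, 70]))   # standard (manual)
```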

    Tip

    If your account is configured to use multi-region writes and has more than one region, the rate per 100 RU/s is the same for both manual and autoscale. This means that enabling autoscale incurs no additional cost regardless of utilization. As a result, it is always recommended to use autoscale with multi-region writes when you have more than one region, to take advantage of the savings from paying only for the RU/s your application scales to. If you have multi-region writes and one region, use the average utilization to determine if autoscale will result in cost savings.

    How to calculate average utilization

    Autoscale bills for the highest RU/s scaled to in an hour. When analyzing the normalized RU consumption over time, it is important to use the highest utilization per hour when calculating the average.

    To calculate the average of the highest utilization across all hours:

    1. Set the Aggregation on the Normalized RU Consumption metric to Max.
    2. Set the Time granularity to 1 hour.
    3. Navigate to Chart options.
    4. Select the bar chart option.
    5. Under Share, select the Download to Excel option. From the generated spreadsheet, calculate the average utilization across all hours.

    Screenshot showing how to view normalized RU consumption by hour: set the time granularity to 1 hour, edit the chart settings, select the bar chart option, and under Share, select Download to Excel to calculate the average across all hours.
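If you prefer to script the last step instead of calculating the average in the spreadsheet, the following sketch shows one way to do it. It assumes you have saved the downloaded data as CSV and that it contains one row per hour. The column names `timestamp` and `normalized_ru_percent` and the sample rows are hypothetical, so adjust them to match the headers in your actual export.

```python
import csv
import io


def average_from_csv(csv_text, value_column="normalized_ru_percent"):
    """Average the hourly max normalized RU consumption from CSV text.

    value_column is a hypothetical header name. Rows with an empty or
    missing value (for example, hours with no data) are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = (row.get(value_column) or "").strip()
        if cell:
            values.append(float(cell))
    if not values:
        raise ValueError("no utilization values found in the CSV data")
    return sum(values) / len(values)


# Made-up sample: three hours with data and one hour without.
sample = """timestamp,normalized_ru_percent
2024-01-01T00:00,10
2024-01-01T01:00,100
2024-01-01T02:00,25
2024-01-01T03:00,
"""

print(average_from_csv(sample))  # 45.0
```

With an average of 45%, this made-up workload falls below the 66% guideline and would be a candidate for autoscale.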

    Measure and monitor your usage

    Over time, after you've chosen the throughput type, you should monitor your application and make adjustments as needed.

    When using autoscale, use Azure Monitor to see the provisioned autoscale max RU/s (Autoscale Max Throughput) and the RU/s the system is currently scaled to (Provisioned Throughput).

    The following example shows a variable or unpredictable workload using autoscale. Note that when there isn't any traffic, the system scales the RU/s down to the minimum of 10% of the max RU/s. In this case, the minimum is 5,000 RU/s and the max is 50,000 RU/s.

    Screenshot of example workload using autoscale, with autoscale max RU/s of 50,000 RU/s and throughput ranging from 5000 - 50,000 RU/s.

    Migrate standard provisioned throughput to autoscale

    Users who want to migrate a large number of resources from standard provisioned throughput to autoscale can use an Azure CLI script that migrates every throughput resource in an Azure subscription to autoscale. For more details, see Convert to Autoscale.

    Next steps