Quotas and limits for Azure Speech

This article contains a quick reference and a detailed description of the quotas and limits for Azure Speech. The information applies to all pricing tiers of Azure Speech. This article also contains some best practices to avoid request throttling.

For the Free (F0) pricing tier, see the monthly allowances on the pricing page.

Reference for quotas and limits

The following sections provide a quick guide to the quotas and limits that apply to Azure Speech.

For information about adjustable quotas for Standard (S0) Azure Speech resources, see more explanations, best practices, and adjustment instructions later in this article. The quotas and limits for Free (F0) Azure Speech resources aren't adjustable.

Important

If you switch an AI Services resource for Azure Speech from the Free (F0) pricing tier to Standard (S0), the change of the corresponding quotas might take up to several hours.

Speech-to-text quotas and limits per resource

The following sections describe speech-to-text quotas and limits per Azure Speech resource. For information about adjustable quotas, see more explanations in this article.

Real-time speech to text and speech translation

You can use real-time speech to text with the Speech SDK or the Speech-to-text REST API for short audio.

These limits apply to concurrent real-time speech-to-text requests and speech translation requests combined. For example, if you have 60 concurrent speech-to-text requests and 40 concurrent speech translation requests, you reach the limit of 100 concurrent requests.

Quota Free (F0) Standard (S0)
Concurrent request limit for base model endpoint 1

This limit isn't adjustable.
100 (default value)

The rate is adjustable for Standard (S0) resources. See more explanations, best practices, and adjustment instructions later in this article.
Concurrent request limit for custom endpoint 1

This limit isn't adjustable.
100 (default value)

The rate is adjustable for Standard (S0) resources. See more explanations, best practices, and adjustment instructions later in this article.
Maximum audio length for real-time diarization Not applicable 240 minutes per file

Fast transcription

Quota Free (F0) Standard (S0)
Maximum audio input file size Not applicable < 300 MB
Maximum audio length Not applicable < 120 minutes per file
Maximum requests per minute Not applicable 600

Batch transcription

Quota Free (F0) Standard (S0)
Speech-to-text REST API limit Not available for F0 100 requests per 10 seconds (600 requests per minute)
Maximum file size for audio input Not applicable 1 GB
Maximum number of blobs per container Not applicable 10,000
Maximum number of files per transcription request (when you're using multiple content URLs as input) Not applicable 1,000
Maximum audio length for transcriptions with diarization enabled Not applicable 240 minutes per file

Model customization

The limits in this table apply per Azure Speech resource when you create a custom speech model.

Quota Free (F0) Standard (S0)
REST API limit 100 requests per 10 seconds (600 requests per minute) 100 requests per 10 seconds (600 requests per minute)
Maximum number of custom model deployments per Azure Speech resource 1 50
Maximum number of speech datasets 2 500
Maximum file size for the acoustic dataset for data import 2 GB 2 GB
Maximum file size for the language dataset for data import 200 MB 1.5 GB
Maximum file size for the pronunciation dataset for data import 1 KB 1 MB
Maximum text size when you're using the text parameter in the Models_Create API request 200 KB 500 KB

Text-to-speech quotas and limits per resource

The following sections describe text-to-speech quotas and limits per Azure Speech resource. For information about adjustable quotas, see more explanations later in this article.

Real-time text to speech

You can use real-time text to speech with the Speech SDK or the text-to-speech REST API. Unless otherwise specified, the limits aren't adjustable.

Quota Free (F0) Standard (S0)
Maximum number of transactions per time period for standard voices 20 transactions per 60 seconds

This limit isn't adjustable.
200 transactions per second (TPS) (default value)

The rate is adjustable up to 1,000 TPS for Standard (S0) resources. See more explanations, best practices, and adjustment instructions later in this article.
Maximum audio length produced per request 10 minutes 10 minutes
Maximum total number of distinct <voice> and <audio> tags in SSML 50 50
Maximum SSML message size per turn for WebSocket 64 KB 64 KB

Note

Most HTTP 429 errors with text-to-speech standard voice are caused by limited backend service capacity for a specific voice in the selected region, not by quota limits. Increasing your quota doesn't resolve these errors. For best results, use the voice in its native region or select a more popular voice in your current region.

Audio Content Creation tool

Quota Free (F0) Standard (S0)
File size (plain text in SSML)1 3,000 characters per file 20,000 characters per file
File size (lexicon file)2 30 KB per file 100 KB per file
Billable characters in SSML 15,000 characters per file 100,000 characters per file
Export to audio library 1 concurrent task Not applicable

1 The limit applies only to plain text in SSML and doesn't include tags.

2 The characters of the lexicon file aren't charged. Only the lexicon elements in SSML are counted as billable characters. To learn more, refer to Billable characters.

Detailed description, quota adjustment, and best practices

Some Azure Speech quotas are adjustable. This section provides more explanations, best practices, and adjustment instructions.

The following quotas are adjustable for Standard (S0) resources. The Free (F0) request limits aren't adjustable.

Before you request a quota increase (where applicable), check your current transactions per second (TPS) or tokens per minute (TPM) and ensure that you need to increase the quota.

Note

Batch transcription and batch synthesis are asynchronous processes. They process jobs one by one in a queue. So, increasing the quota doesn't improve transcription performance. For performance improvements, see Best practices for improving performance.

Azure Speech uses autoscaling technologies to bring the required computational resources in on-demand mode. At the same time, Azure Speech tries to keep your costs low by not maintaining an excessive amount of hardware capacity.

Let's look at an example. Suppose that your application receives response code 429, which indicates that there are too many requests. Your application receives this response even though your workload is within the limits defined in the earlier Reference for quotas and limits section. The most likely explanation is that Azure Speech is scaling up to your demand and didn't reach the required scale yet. So, Azure Speech doesn't immediately have enough resources to serve the request. In such cases, increasing the quota doesn't help. In most cases, Azure Speech will scale up soon and resolve the issue that's causing response code 429.

As a best practice, every implementation should gracefully handle 429 errors with retry logic to ensure best performance and to handle autoscaling. Consider implementing this best practice before you request additional quota, as described in the next section.

General best practices to mitigate throttling during autoscaling

To minimize issues related to throttling, it's a good idea to use the following techniques:

  • Implement retry logic in your application to handle 429 errors.

  • Avoid sharp changes in the workload. Increase the workload gradually.

    For example, let's say your application is using text to speech, and your current workload is 5 TPS. The next second, you increase the load to 20 TPS (that is, four times more). Azure Speech immediately starts scaling up to fulfill the new load, but it can't scale as needed within one second. Some of the requests get response code 429 (too many requests).

  • Test different patterns for load increases. For more information, see the workload pattern example in this article.

  • Create more Azure Speech resources in different regions, and distribute the workload among them. Creating multiple Azure Speech resources in the same region doesn't affect the performance, because the same backend cluster serves all resources.

Later sections describe specific cases of adjusting quotas.

Example of a best practice for workload patterns

Here's a general example of a good approach to take. It's meant only as a template that you can adjust as necessary for your own use.

Suppose that an Azure Speech resource has the concurrent request limit set to 300. Start the workload from 20 concurrent connections, and increase the load by 20 concurrent connections every 90 to 120 seconds. Control the Azure Speech responses, and implement the logic that falls back (reduces the load) if you get too many requests (response code 429). Then, retry the load increase in one minute. If it still doesn't work, try again in two minutes. Use a pattern of 1-2-4-4 minutes for the intervals.

Generally, it's a good idea to test the workload and the workload patterns before you go to production.

Voice Live: Increase the real-time speech-to-text concurrent request limit

Prepare the required information

For instructions on how to get the general resource information that you need, see Create and submit a quota increase request later in this article.

Note

The token per minute (TPM) limit is dependent on the new connections per minute (NCPM) limit. It increases automatically when the NCPM limit increases.

The formula is: TPM = NCPM * 4,000 tokens. For example: 30 NCPM * 4,000 tokens = 120,000 TPM.

Create the quota increase request

To create the request with the collected information, follow the steps in Create and submit a quota increase request later in this article.

Speech to text: Increase the real-time speech-to-text concurrent request limit

By default, the number of concurrent real-time speech-to-text and speech translation requests combined is limited to:

  • 100 per resource in the base model.
  • 100 per custom endpoint in the custom model.

For the Standard pricing tier, you can increase this amount. Before you submit the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.

Concurrent request limits for base and custom models need to be adjusted separately. An Azure Speech resource can be associated with many custom endpoints that host many custom model deployments. You must separately request any limit adjustments per custom endpoint.

Increasing the limit of concurrent requests doesn't directly affect your costs. Azure Speech uses a payment model that requires that you pay for only what you use. The limit defines how high Azure Speech can scale before it starts throttle your requests.

You can't see the existing value of the parameter for the concurrent request limit in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.

Prepare the required information

For instructions on how to get the general resource information that you need, see Create and submit a quota increase request later in this article.

To create an increase request for custom speech, you also need to provide a custom endpoint ID. To get this information for the custom speech endpoint:

  1. Go to the Speech Studio portal.
  2. Sign in if necessary, and then go to Custom speech.
  3. Select your project, and then go to Deployment.
  4. Select the required endpoint.
  5. Copy and save the Endpoint ID value.

Create the quota increase request

To create the request with the collected information, follow the steps in Create and submit a quota increase request later in this article.

Fast transcription: Increase maximum requests per minute

Prepare the required information

For instructions on how to get the general resource information that you need, see Create and submit a quota increase request later in this article.

To create an increase request for fast transcription, you also need to provide the average audio length per API request.

An example length is 5 minutes/request. Provide an estimate based on the workload that you want to process.

Create the quota increase request

To create the request with the collected information, follow the steps in Create and submit a quota increase request later in this article.

Text to speech: Increase the real-time TPS limit

For the Standard pricing tier, you can increase the real-time TPS limit. Before you submit the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.

Estimate your needs

  • Usage Under ¥10,000/month: Typically, 32 TPS is sufficient, assuming that your peak usage is within 10 times your average.
  • Default limit: 200 TPS is available by default. This limit exceeds most use cases.
Example scenario

For example, assume that you're building a call center in which:

  • The number of concurrent calls is 1,000.
  • You expect agents to speak half the time.
  • Average TTS response length is 5 seconds.

The required TPS is: 1,000 calls / (2×5 seconds) = 100 TPS.

A TPS increase request requires the following details:

  • Peak TPS
  • Average TPS
  • Average TTS request length (in characters)

With this data, you can estimate your monthly TTS usage by using this formula: monthly usage = average TPS × request length × 3600 × 24 × 30.

Multiply the result by the unit price of ¥15 per million characters to estimate the monthly cost.

Note

If your estimated usage significantly exceeds your budget, you might be overestimating your needs.

Cost considerations

Increasing the concurrent request limit does not directly affect your costs. You pay for only what you use. The limit simply defines how much Azure Speech can scale before throttling begins.

You can't see the existing value of the parameter for the concurrent request limit in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.

Prepare the required information

For instructions on how to get the general resource information that you need, see Create and submit a quota increase request later in this article.

To create an increase request for standard voice, you also need to provide the voice names that you're requesting the increase for. You can find a list of all voice names in Language and voice support for Azure Speech.

Create the quota increase request

To create the request with the collected information, follow the steps in Create and submit a quota increase request later in this article.

Create and submit a quota increase request

To get the resource information that you need for the quota increase request, follow these steps:

  1. Go to the Azure portal.

  2. Select the resource for which you want to increase the concurrency request limit.

  3. In the Resource Management group, select Properties.

  4. Copy and save the values of the following fields:

    • Subscription ID
    • Resource ID
    • Location (your endpoint region)