Use Speech service through a private endpoint

Azure Private Link lets you connect to services in Azure by using a private endpoint. A private endpoint is a private IP address that's accessible only within a specific virtual network and subnet.

This article explains how to set up and use Private Link and private endpoints with the Speech service. It also describes how to remove private endpoints later while continuing to use the Speech resource.

Note

Before you proceed, review how to use virtual networks with Azure AI services.

Setting up a Speech resource for the private endpoint scenarios requires performing the following tasks:

  1. Create a custom domain name
  2. Turn on private endpoints
  3. Adjust existing applications and solutions

Private endpoints and Virtual Network service endpoints

Azure provides private endpoints and Virtual Network service endpoints for traffic that tunnels via the private Azure backbone network. The purpose and underlying technologies of these endpoint types are similar. But there are differences between the two technologies. We recommend that you learn about the pros and cons of both before you design your network.

There are a few things to consider when you decide which technology to use:

  • Both technologies ensure that traffic between the virtual network and the Speech resource doesn't travel over the public internet.
  • A private endpoint provides a dedicated private IP address for your Speech resource. This IP address is accessible only within a specific virtual network and subnet. You have full control of the access to this IP address within your network infrastructure.
  • Virtual Network service endpoints don't provide a dedicated private IP address for the Speech resource. Instead, they encapsulate all packets sent to the Speech resource and deliver them directly over the Azure backbone network.
  • Both technologies support on-premises scenarios. By default, Azure service resources that are secured to virtual networks through Virtual Network service endpoints can't be reached from on-premises networks, but you can change that behavior.
  • Virtual Network service endpoints are often used to restrict the access for a Speech resource based on the virtual networks from which the traffic originates.
  • For Azure AI services, enabling the Virtual Network service endpoint forces the traffic for all Azure AI services resources to go through the private backbone network. That requires explicit network access configuration. (For more information, see Configure virtual networks and the Speech resource networking settings.) Private endpoints don't have this limitation and provide more flexibility for your network configuration. You can access one resource through the private backbone and another through the public internet by using the same subnet of the same virtual network.
  • Private endpoints incur extra costs. Virtual Network service endpoints are free.
  • Private endpoints require extra DNS configuration.
  • One Speech resource can work simultaneously with both private endpoints and Virtual Network service endpoints.

We recommend that you try both endpoint types before you make a decision about your production design.

This article describes how to use private endpoints with the Speech service. Usage of Virtual Network service endpoints is described in a separate article.

Create a custom domain name

Caution

A Speech resource with a custom domain name enabled interacts with the Speech service in a different way. You might have to adjust your application code for both scenarios: with a private endpoint and without a private endpoint.

Follow these steps to create a custom subdomain name for Azure AI services for your Speech resource.

Caution

When you turn on a custom domain name, the operation is not reversible. The only way to go back to the regional name is to create a new Speech resource.

If your Speech resource has a lot of associated custom models and projects created via Speech Studio, we strongly recommend trying the configuration with a test resource before you modify the resource used in production.

To create a custom domain name using the Azure portal, follow these steps:

  1. Go to the Azure portal and sign in to your Azure account.

  2. Select the required Speech resource.

  3. In the Resource Management group on the left pane, select Networking.

  4. On the Firewalls and virtual networks tab, select Generate Custom Domain Name. A new right panel appears with instructions to create a unique custom subdomain for your resource.

  5. In the Generate Custom Domain Name panel, enter a custom domain name. Your full custom domain will look like: https://{your custom name}.cognitiveservices.azure.cn.

    Remember that after you create a custom domain name, it cannot be changed.

    After you've entered your custom domain name, select Save.

  6. After the operation finishes, in the Resource management group, select Keys and Endpoint. Confirm that the new endpoint name of your resource starts this way: https://{your custom name}.cognitiveservices.azure.cn.

Turn on private endpoints

We recommend using the private DNS zone attached to the virtual network with the necessary updates for the private endpoints. You can create a private DNS zone during the provisioning process. If you're using your own DNS server, you might also need to change your DNS configuration.

Decide on a DNS strategy before you provision private endpoints for a production Speech resource. And test your DNS changes, especially if you use your own DNS server.

Use one of the following articles to create private endpoints. These articles use a web app as a sample resource to make available through private endpoints.

Use these parameters instead of the parameters in the article that you chose:

Setting               Value
Resource type         Microsoft.CognitiveServices/accounts
Resource              <your-speech-resource-name>
Target sub-resource   account

DNS for private endpoints: Review the general principles of DNS for private endpoints in Azure AI services resources. Then confirm that your DNS configuration is working correctly by performing the checks described in the following sections.

Resolve DNS from the virtual network

This check is required.

Follow these steps to test the custom DNS entry from your virtual network:

  1. Sign in to a virtual machine located in the virtual network to which you attached your private endpoint.

  2. Open a Windows command prompt or a Bash shell, run nslookup, and confirm that it successfully resolves your resource's custom domain name.

    C:\>nslookup my-private-link-speech.cognitiveservices.azure.cn
    Server:  UnKnown
    Address:  168.63.129.16
    
    Non-authoritative answer:
    Name:    my-private-link-speech.privatelink.cognitiveservices.azure.cn
    Address:  172.28.0.10
    Aliases:  my-private-link-speech.cognitiveservices.azure.cn
    
  3. Confirm that the IP address matches the IP address of your private endpoint.
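As a quick scripted variant of this check (a sketch only; the sample addresses come from the nslookup output above), Python's standard `ipaddress` module can tell whether a resolved address falls in a private range:

```python
import ipaddress

def is_private_address(ip: str) -> bool:
    """True if the address falls in a private range (RFC 1918 and similar)."""
    return ipaddress.ip_address(ip).is_private

# The address resolved inside the virtual network should be private:
print(is_private_address("172.28.0.10"))  # True: matches the private endpoint
print(is_private_address("13.69.67.71"))  # False: a public address
```

If the name resolves to a public address inside the virtual network, recheck the private DNS zone configuration.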

Resolve DNS from other networks

Perform this check only if you've turned on either the All networks option or the Selected Networks and Private Endpoints access option in the Networking section of your resource.

If you plan to access the resource by using only a private endpoint, you can skip this section.

  1. Sign in to a computer attached to a network allowed to access the resource.

  2. Open a Windows command prompt or Bash shell, run nslookup, and confirm that it successfully resolves your resource's custom domain name.

    C:\>nslookup my-private-link-speech.cognitiveservices.azure.cn
    Server:  UnKnown
    Address:  fe80::1
    
    Non-authoritative answer:
    Name:    vnetproxyv1-weu-prod.chinaeast2.chinacloudapp.cn
    Address:  13.69.67.71
    Aliases:  my-private-link-speech.cognitiveservices.azure.cn
              my-private-link-speech.privatelink.cognitiveservices.azure.cn
              chinaeast2.prod.vnet.cog.trafficmanager.cn
    

Note

The resolved IP address points to a virtual network proxy endpoint, which dispatches the network traffic to the private endpoint for the Speech resource. The behavior will be different for a resource with a custom domain name but without private endpoints. See this section for details.

Adjust an application to use a Speech resource with a private endpoint

A Speech resource with a custom domain interacts with the Speech service in a different way. This is true for a custom-domain-enabled Speech resource both with and without private endpoints. Information in this section applies to both scenarios.

Follow the instructions in this section to adjust existing applications and solutions to use a Speech resource with a custom domain name and a private endpoint turned on.

Such a resource uses a different way of interacting with the Speech service. This section explains how to use it with the Speech service REST APIs and the Speech SDK.

Note

A Speech resource without private endpoints that uses a custom domain name also has a special way of interacting with the Speech service. This way differs from the scenario of a Speech resource that uses a private endpoint. This is important to consider because you may decide to remove private endpoints later. See Adjust an application to use a Speech resource without private endpoints later in this article.

Speech resource with a custom domain name and a private endpoint: Usage with the REST APIs

We use my-private-link-speech.cognitiveservices.azure.cn as a sample Speech resource DNS name (custom domain) for this section.

Speech service has REST APIs for Speech to text and Text to speech. Consider the following information for the private-endpoint-enabled scenario.

Speech to text has two REST APIs. Each API serves a different purpose, uses different endpoints, and requires a different approach when you're using it in the private-endpoint-enabled scenario.

The Speech to text REST APIs are:

  • Speech to text REST API
  • Speech to text REST API for short audio

Usage of the Speech to text REST API for short audio and the Text to speech REST API in the private endpoint scenario is the same. It's equivalent to the Speech SDK case described later in this article.

Speech to text REST API uses a different set of endpoints, so it requires a different approach for the private-endpoint-enabled scenario.

The next subsections describe both cases.

Speech to text REST API

Usually, Speech resources use Azure AI services regional endpoints for communicating with the Speech to text REST API. These endpoints have the following naming format:

{region}.api.cognitive.azure.cn

This is a sample request URL:

https://chinaeast2.api.cognitive.azure.cn/speechtotext/v3.1/transcriptions

Note

See this article for Microsoft Azure operated by 21Vianet endpoints.

After you turn on a custom domain for a Speech resource (which is necessary for private endpoints), that resource will use the following DNS name pattern for the basic REST API endpoint:

{your custom name}.cognitiveservices.azure.cn

That means that in our example, the REST API endpoint name is:

my-private-link-speech.cognitiveservices.azure.cn

And the sample request URL needs to be converted to:

https://my-private-link-speech.cognitiveservices.azure.cn/speechtotext/v3.1/transcriptions

This URL should be reachable from the virtual network that the private endpoint is attached to, provided that DNS resolution is configured correctly.

After you turn on a custom domain name for a Speech resource, you typically replace the host name in all request URLs with the new custom domain host name. All other parts of the request (like the path /speechtotext/v3.1/transcriptions in the earlier example) remain the same.
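This host-name swap can be sketched with standard URL parsing (an illustration only; the resource name is the sample one used throughout this article):

```python
from urllib.parse import urlsplit, urlunsplit

def to_custom_domain_url(request_url: str, custom_host: str) -> str:
    """Replace the host name of a regional request URL with the custom
    domain host name; the path and query stay unchanged."""
    parts = urlsplit(request_url)
    return urlunsplit((parts.scheme, custom_host, parts.path, parts.query, parts.fragment))

url = "https://chinaeast2.api.cognitive.azure.cn/speechtotext/v3.1/transcriptions"
print(to_custom_domain_url(url, "my-private-link-speech.cognitiveservices.azure.cn"))
# https://my-private-link-speech.cognitiveservices.azure.cn/speechtotext/v3.1/transcriptions
```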

Tip

Some customers develop applications that use the region part of the regional endpoint's DNS name (for example, to send the request to the Speech resource deployed in the particular Azure region).

A custom domain for a Speech resource contains no information about the region where the resource is deployed. So the application logic described earlier will not work and needs to be altered.

Speech to text REST API for short audio and Text to speech REST API

The Speech to text REST API for short audio and the Text to speech REST API use two types of endpoints:

Note

See this article for Microsoft Azure operated by 21Vianet endpoints.

The detailed description of the special endpoints and how their URL should be transformed for a private-endpoint-enabled Speech resource is provided in this subsection about usage with the Speech SDK. The same principle described for the SDK applies for the Speech to text REST API for short audio and the Text to speech REST API.

Get familiar with the material in the subsection mentioned in the previous paragraph and see the following example. The example describes the Text to speech REST API. Usage of the Speech to text REST API for short audio is fully equivalent.

Note

When you're using the Speech to text REST API for short audio and Text to speech REST API in private endpoint scenarios, use a resource key passed through the Ocp-Apim-Subscription-Key header. (See details for Speech to text REST API for short audio and Text to speech REST API)

Using an authorization token and passing it to the special endpoint via the Authorization header works only if you've turned on the All networks access option in the Networking section of your Speech resource. In other cases, you get either a Forbidden or a BadRequest error when you try to obtain an authorization token.

Text to speech REST API usage example

We use China East 2 as a sample Azure region and my-private-link-speech.cognitiveservices.azure.cn as a sample Speech resource DNS name (custom domain). The custom domain name my-private-link-speech.cognitiveservices.azure.cn in our example belongs to the Speech resource created in the China East 2 region.

To get the list of the voices supported in the region, perform the following request:

https://chinaeast2.tts.speech.azure.cn/cognitiveservices/voices/list

See more details in the Text to speech REST API documentation.

For the private-endpoint-enabled Speech resource, the endpoint URL for the same operation needs to be modified. The same request looks like this:

https://my-private-link-speech.cognitiveservices.azure.cn/tts/cognitiveservices/voices/list

See a detailed explanation in the Construct endpoint URL subsection for the Speech SDK.

Speech resource with a custom domain name and a private endpoint: Usage with the Speech SDK

Using the Speech SDK with a custom domain name and private-endpoint-enabled Speech resources requires you to review and likely change your application code.

We use my-private-link-speech.cognitiveservices.azure.cn as a sample Speech resource DNS name (custom domain) for this section.

Construct endpoint URL

Usually in SDK scenarios (and in the Speech to text REST API for short audio and Text to speech REST API scenarios), Speech resources use dedicated regional endpoints for different service offerings. The DNS name format for these endpoints is:

{region}.{speech service offering}.speech.azure.cn

An example DNS name is:

chinaeast2.stt.speech.azure.cn

All possible values for the region (first element of the DNS name) are listed in Speech service supported regions. (See this article for Microsoft Azure operated by 21Vianet endpoints.) The following table presents the possible values for the Speech service offering (second element of the DNS name):

DNS name value   Speech service offering
commands         Custom Commands
convai           Meeting Transcription
s2s              Speech Translation
stt              Speech to text
tts              Text to speech
voice            Custom Voice

So the earlier example (chinaeast2.stt.speech.azure.cn) stands for a Speech to text endpoint in China East 2.

Private-endpoint-enabled endpoints communicate with Speech service via a special proxy. Because of that, you must change the endpoint connection URLs.

A "standard" endpoint URL looks like:

{region}.{speech service offering}.speech.azure.cn/{URL path}

A private endpoint URL looks like:

{your custom name}.cognitiveservices.azure.cn/{speech service offering}/{URL path}

Example 1. An application is communicating by using the following URL (speech recognition using the base model for US English in China East 2):

wss://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?language=en-US

To use it in the private-endpoint-enabled scenario when the custom domain name of the Speech resource is my-private-link-speech.cognitiveservices.azure.cn, you must modify the URL like this:

wss://my-private-link-speech.cognitiveservices.azure.cn/stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US

Notice the details:

  • The host name chinaeast2.stt.speech.azure.cn is replaced by the custom domain host name my-private-link-speech.cognitiveservices.azure.cn.
  • The second element of the original DNS name (stt) becomes the first element of the URL path and precedes the original path. So the original URL /speech/recognition/conversation/cognitiveservices/v1?language=en-US becomes /stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US.

Example 2. An application uses the following URL to synthesize speech in China East 2:

wss://chinaeast2.tts.speech.azure.cn/cognitiveservices/websocket/v1

The following equivalent URL uses a private endpoint, where the custom domain name of the Speech resource is my-private-link-speech.cognitiveservices.azure.cn:

wss://my-private-link-speech.cognitiveservices.azure.cn/tts/cognitiveservices/websocket/v1

The same principle as in Example 1 applies, but the key element this time is tts.
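The transformation shown in both examples can be sketched as a small helper (an illustration only, not part of the Speech SDK): the second DNS element of the regional host moves to the front of the URL path, and the host becomes the custom domain.

```python
from urllib.parse import urlsplit, urlunsplit

def to_private_endpoint_url(standard_url: str, custom_host: str) -> str:
    """Rewrite {region}.{offering}.speech.azure.cn/{path} as
    {custom_host}/{offering}/{path}. The offering element (stt, tts, ...)
    moves from the host name into the URL path."""
    parts = urlsplit(standard_url)
    offering = parts.netloc.split(".")[1]  # second DNS element, e.g. "stt"
    return urlunsplit((parts.scheme, custom_host,
                       f"/{offering}{parts.path}", parts.query, parts.fragment))

host = "my-private-link-speech.cognitiveservices.azure.cn"
print(to_private_endpoint_url(
    "wss://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?language=en-US",
    host))
# wss://my-private-link-speech.cognitiveservices.azure.cn/stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US
```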

Modifying applications

Follow these steps to modify your code:

  1. Determine the application endpoint URL:

    • Turn on logging for your application and run it to log activity.
    • In the log file, search for SPEECH-ConnectionUrl. In matching lines, the value parameter contains the full URL that your application used to reach the Speech service.

    Example:

    (114917): 41ms SPX_DBG_TRACE_VERBOSE:  property_bag_impl.cpp:138 ISpxPropertyBagImpl::LogPropertyAndValue: this=0x0000028FE4809D78; name='SPEECH-ConnectionUrl'; value='wss://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?traffictype=spx&language=en-US'
    

    So the URL that the application used in this example is:

    wss://chinaeast2.stt.speech.azure.cn/speech/recognition/conversation/cognitiveservices/v1?language=en-US
    
  2. Create a SpeechConfig instance by using a full endpoint URL:

    1. Modify the endpoint that you determined, as described in the earlier Construct endpoint URL section.

    2. Modify how you create the instance of SpeechConfig. Most likely, your application is using something like this:

      var config = SpeechConfig.FromSubscription(speechKey, azureRegion);
      

      This example doesn't work for a private-endpoint-enabled Speech resource because of the host name and URL changes that we described in the previous sections. If you try to run your existing application without any modifications by using the key of a private-endpoint-enabled resource, you get an authentication error (401).

      To make it work, modify how you instantiate the SpeechConfig class and use "from endpoint"/"with endpoint" initialization. Suppose we have the following two variables defined:

      • speechKey contains the key of the private-endpoint-enabled Speech resource.
      • endPoint contains the full modified endpoint URL (using the type required by the corresponding programming language). In our example, this variable should contain:
        wss://my-private-link-speech.cognitiveservices.azure.cn/stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US
        

      Create a SpeechConfig instance:

      // C#
      var config = SpeechConfig.FromEndpoint(endPoint, speechKey);
      
      // C++
      auto config = SpeechConfig::FromEndpoint(endPoint, speechKey);
      
      // Java
      SpeechConfig config = SpeechConfig.fromEndpoint(endPoint, speechKey);
      
      # Python
      import azure.cognitiveservices.speech as speechsdk
      config = speechsdk.SpeechConfig(endpoint=endPoint, subscription=speechKey)
      
      // Objective-C
      SPXSpeechConfiguration *config = [[SPXSpeechConfiguration alloc] initWithEndpoint:endPoint subscription:speechKey];
      
      // JavaScript
      import * as sdk from "microsoft.cognitiveservices.speech.sdk";
      config: sdk.SpeechConfig = sdk.SpeechConfig.fromEndpoint(new URL(endPoint), speechKey);
      

Tip

The query parameters specified in the endpoint URI are not changed, even if they're set by other APIs. For example, if the recognition language is defined in the URI as query parameter language=en-US, and is also set to ru-RU via the corresponding property, the language setting in the URI is used. The effective language is then en-US.

Parameters set in the endpoint URI always take precedence. Other APIs can override only parameters that are not specified in the endpoint URI.
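The precedence rule can be illustrated with a small sketch (plain Python using standard URL parsing; not the SDK's actual implementation): the URI's query parameters always win, and property values fill in only what the URI omits.

```python
from urllib.parse import parse_qs, urlsplit

def effective_parameters(endpoint_url: str, property_params: dict) -> dict:
    """Merge parameters: values in the endpoint URI override values set
    via properties; properties only fill in parameters the URI omits."""
    uri_params = {k: v[0] for k, v in parse_qs(urlsplit(endpoint_url).query).items()}
    return {**property_params, **uri_params}

endpoint = ("wss://my-private-link-speech.cognitiveservices.azure.cn"
            "/stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US")

# language is set to ru-RU via a property, but the URI says en-US:
print(effective_parameters(endpoint, {"language": "ru-RU"}))  # {'language': 'en-US'}
```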

After this modification, your application should work with the private-endpoint-enabled Speech resources. We're working on more seamless support of private endpoint scenarios.

Use of Speech Studio

Speech Studio is a web portal with tools for building and integrating the Azure AI Speech service into your applications. When you work in Speech Studio projects, network connections and API calls to the corresponding Speech resource are made on your behalf. Working with private endpoints, Virtual Network service endpoints, and other network security options can limit the availability of Speech Studio features. You normally use Speech Studio when working with features like custom speech and Audio Content Creation.

Reaching the Speech Studio web portal from a virtual network

To use Speech Studio from a virtual machine within an Azure Virtual network, you must allow outgoing connections to the required set of service tags for this virtual network. See details here.

Access to the Speech resource endpoint doesn't imply access to the Speech Studio web portal. Access to the Speech Studio web portal via private endpoints or Virtual Network service endpoints isn't supported.

Working with Speech Studio projects

This section describes working with the different kinds of Speech Studio projects under the different network security options of the Speech resource. It assumes that a web browser connection to Speech Studio is established. Speech resource network security settings are configured in the Azure portal:

  1. Go to the Azure portal and sign in to your Azure account.
  2. Select the Speech resource.
  3. In the Resource Management group in the left pane, select Networking > Firewalls and virtual networks.
  4. Select one option from All networks, Selected Networks and Private Endpoints, or Disabled.

Custom speech and Audio Content Creation

The following table describes the accessibility of custom speech and Audio Content Creation projects for each Networking > Firewalls and virtual networks security setting of the Speech resource.

Note

If you allow only private endpoints via the Networking > Private endpoint connections tab, then you can't use Speech Studio with the Speech resource. You can still use the Speech resource outside of Speech Studio.

Speech resource network security setting    Speech Studio project accessibility
All networks                                No restrictions
Selected Networks and Private Endpoints     Accessible from allowed public IP addresses
Disabled                                    Not accessible

If you select Selected Networks and Private Endpoints, you see a tab with Virtual networks and Firewall access configuration options. In the Firewall section, you must allow at least one public IP address and use this address for the browser connection with Speech Studio.

If you allow access only via a virtual network, you effectively don't allow access to the Speech resource through Speech Studio. You can still use the Speech resource outside of Speech Studio.

To use custom speech without relaxing network access restrictions on your production Speech resource, consider one of these workarounds.

  • Create another Speech resource for development that can be used on a public network. Prepare your custom model in Speech Studio on the development resource, and then copy the model to your production resource. See the Models_CopyTo REST request with Speech to text REST API.
  • Don't use Speech Studio for custom speech. Use the Speech to text REST API for all custom speech operations.

Adjust an application to use a Speech resource without private endpoints

In this article, we noted several times that enabling a custom domain for a Speech resource is irreversible. Such a resource uses a different way of communicating with the Speech service than resources that use regional endpoint names.

This section explains how to use a Speech resource that has a custom domain name but no private endpoints with the Speech service REST APIs and the Speech SDK. This might be a resource that was once used in a private endpoint scenario but has had its private endpoints deleted.

DNS configuration

Remember how a custom domain DNS name of the private-endpoint-enabled Speech resource is resolved from public networks. In this case, the IP address resolved points to a proxy endpoint for a virtual network. That endpoint is used for dispatching the network traffic to the private-endpoint-enabled Azure AI services resource.

However, when all private endpoints of the resource are removed (or immediately after you enable the custom domain name), the CNAME record of the Speech resource is reprovisioned. It then points to the IP address of the corresponding Azure AI services regional endpoint.

So the output of the nslookup command looks like this:

C:\>nslookup my-private-link-speech.cognitiveservices.azure.cn
Server:  UnKnown
Address:  fe80::1

Non-authoritative answer:
Name:    apimgmthskquihpkz6d90kmhvnabrx3ms3pdubscpdfk1tsx3a.chinacloudapp.cn
Address:  13.93.122.1
Aliases:  my-private-link-speech.cognitiveservices.azure.cn
          api.cognitive.azure.cn
          cognitiveweprod.trafficmanager.cn
          cognitiveweprod.azure-api.net
          apimgmttmdjylckcx6clmh2isu2wr38uqzm63s8n4ub2y3e6xs.trafficmanager.cn
          cognitiveweprod-chinaeast2-01.regional.azure-api.net

Compare it with the output from this section.
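As a quick sanity check (a sketch only; the sample addresses come from the nslookup outputs in this article), Python's standard `ipaddress` module can confirm that the name now resolves to a public address:

```python
import ipaddress

# Address resolved after private endpoints are removed (regional endpoint):
print(ipaddress.ip_address("13.93.122.1").is_private)   # False: public address

# Address resolved inside a virtual network with a private endpoint attached:
print(ipaddress.ip_address("172.28.0.10").is_private)   # True: private address
```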

Speech resource with a custom domain name and without private endpoints: Usage with the REST APIs

Speech to text REST API

Speech to text REST API usage is fully equivalent to the case of private-endpoint-enabled Speech resources.

Speech to text REST API for short audio and Text to speech REST API

In this case, usage of the Speech to text REST API for short audio and usage of the Text to speech REST API have no differences from the general case, with one exception. (See the following note.) You should use both APIs as described in the Speech to text REST API for short audio and Text to speech REST API documentation.

Note

When you're using the Speech to text REST API for short audio and Text to speech REST API in custom domain scenarios, use a Speech resource key passed through the Ocp-Apim-Subscription-Key header. (See details for Speech to text REST API for short audio and Text to speech REST API)

Using an authorization token and passing it to the special endpoint via the Authorization header works only if you've turned on the All networks access option in the Networking section of your Speech resource. In other cases, you get either a Forbidden or a BadRequest error when you try to obtain an authorization token.

Speech resource with a custom domain name and without private endpoints: Usage with the Speech SDK

Using the Speech SDK with custom-domain-enabled Speech resources without private endpoints is equivalent to the general case as described in the Speech SDK documentation.

If you modified your code to work with a private-endpoint-enabled Speech resource, consider the following.

In the section on private-endpoint-enabled Speech resources, we explained how to determine the endpoint URL, modify it, and make it work through "from endpoint"/"with endpoint" initialization of the SpeechConfig class instance.

However, if you try to run the same application after all private endpoints are removed (allowing some time for the corresponding DNS record to be reprovisioned), you get an internal service error (404). The reason is that the DNS record now points to the regional Azure AI services endpoint instead of the virtual network proxy, and URL paths like /stt/speech/recognition/conversation/cognitiveservices/v1?language=en-US aren't found there.

You need to roll back your application to the standard instantiation of SpeechConfig in the style of the following code:

var config = SpeechConfig.FromSubscription(speechKey, azureRegion);

Simultaneous use of private endpoints and Virtual Network service endpoints

You can use private endpoints and Virtual Network service endpoints to access the same Speech resource simultaneously. To enable this simultaneous use, select the Selected Networks and Private Endpoints option in the networking settings of the Speech resource in the Azure portal. Other options aren't supported for this scenario.

Pricing

For pricing details, see Azure Private Link pricing.

Learn more