Upload training and testing datasets for custom speech

You need audio or text data for testing the accuracy of speech recognition or training your custom models. For information about the data types supported for testing or training your model, see Training and testing datasets.

Upload datasets

To upload your own datasets in Speech Studio, follow these steps:

  1. Sign in to the Speech Studio.

  2. Select Custom speech > Your project name > Speech datasets > Upload data.

  3. Select the Training data or Testing data tab.

  4. Select a dataset type, and then select Next.

  5. Specify the dataset location, and then select Next. You can choose a local file or enter a remote location, such as an Azure Blob URL. If you select a remote location and don't use the trusted Azure services security mechanism, the remote location must be a URL that can be retrieved with a simple anonymous GET request, such as a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

    Note

    If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. Use the same techniques as for batch transcription and plain storage account URLs for your dataset files.

  6. Enter the dataset name and description, and then select Next.

  7. Review your settings, and then select Save and close.

After your dataset is uploaded, go to the Train custom models page to train a custom model.

With the Speech CLI and Speech to text REST API, unlike the Speech Studio, you don't choose whether a dataset is for testing or training at the time of upload. You specify how a dataset is used when you train a model or run a test.

Although you don't indicate whether the dataset is for testing or training, you must specify the dataset kind, which determines the type of dataset that's created. In some cases, a dataset kind is used only for testing or only for training, but you shouldn't take a dependency on that. The Speech CLI and REST API kind values correspond to the options in the Speech Studio as described in the following table:

CLI and API kind, and the corresponding Speech Studio options:

  • Acoustic
    Training data: Audio + human-labeled transcript
    Testing data: Transcript (automatic audio synthesis)
    Testing data: Audio + human-labeled transcript
  • AudioFiles
    Testing data: Audio
  • Language
    Training data: Plain text
  • LanguageMarkdown
    Training data: Structured text in markdown format
  • Pronunciation
    Training data: Pronunciation
  • OutputFormatting
    Training data: Output format
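If you script dataset creation, it can help to keep this mapping in code. The following sketch only restates the table above as a Python dictionary; the dictionary name is illustrative and not part of any SDK:

```python
# Map each CLI/REST `kind` value to the Speech Studio options it corresponds to.
# Contents are taken directly from the table in this article.
KIND_TO_STUDIO_OPTIONS = {
    "Acoustic": [
        "Training data: Audio + human-labeled transcript",
        "Testing data: Transcript (automatic audio synthesis)",
        "Testing data: Audio + human-labeled transcript",
    ],
    "AudioFiles": ["Testing data: Audio"],
    "Language": ["Training data: Plain text"],
    "LanguageMarkdown": ["Training data: Structured text in markdown format"],
    "Pronunciation": ["Training data: Pronunciation"],
    "OutputFormatting": ["Training data: Output format"],
}

# Acoustic is the only kind that maps to more than one Studio option.
print(sorted(kind for kind, opts in KIND_TO_STUDIO_OPTIONS.items() if len(opts) > 1))
# prints: ['Acoustic']
```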

Important

You don't use the Speech CLI or REST API to upload data files directly. First you store the training or testing dataset files at a URL that the Speech CLI or REST API can access. After you upload the data files, you can use the Speech CLI or REST API to create a dataset for custom speech testing or training.

To create a dataset and connect it to an existing project, use the spx csr dataset create command. Construct the request parameters according to the following instructions:

  • Set the project parameter to the ID of an existing project. This parameter is recommended so that you can also view and manage the dataset in Speech Studio. You can run the spx csr project list command to get available projects.

  • Set the required kind parameter. The possible values for the dataset kind are: Acoustic, AudioFiles, Language, LanguageMarkdown, and Pronunciation.

  • Set the required contentUrl parameter. This parameter is the location of the dataset. If you don't use the trusted Azure services security mechanism (see the following note), the contentUrl parameter must be a URL that can be retrieved with a simple anonymous GET request, such as a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

    Note

    If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. Use the same techniques as for batch transcription and plain storage account URLs for your dataset files.

  • Set the required language parameter. The dataset locale must match the locale of the project. The locale can't be changed later. The Speech CLI language parameter corresponds to the locale property in the JSON request and response.

  • Set the required name parameter. This parameter is the name that is displayed in the Speech Studio. The Speech CLI name parameter corresponds to the displayName property in the JSON request and response.

Here's an example Speech CLI command that creates a dataset and connects it to an existing project:

spx csr dataset create --api-version v3.2 --kind "Acoustic" --name "My Acoustic Dataset" --description "My Acoustic Dataset Description" --project YourProjectId --content YourContentUrl --language "en-US"

You should receive a response body in the following format:

{
  "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23",
  "kind": "Acoustic",
  "links": {
    "files": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23/files"
  },
  "project": {
    "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "properties": {
    "textNormalizationKind": "Default",
    "acceptedLineCount": 2,
    "rejectedLineCount": 0,
    "duration": "PT59S"
  },
  "lastActionDateTime": "2024-07-14T17:36:30Z",
  "status": "Succeeded",
  "createdDateTime": "2024-07-14T17:36:14Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description",
  "customProperties": {
    "PortalAPIVersion": "3"
  }
}

The top-level self property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete a dataset.
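For scripting, you can pull the dataset ID out of that URI; the ID is the final path segment. A minimal Python sketch using the example response above (the helper name is illustrative and not part of the Speech CLI or any SDK):

```python
import json

# Abridged example response body from `spx csr dataset create`.
response_body = '''
{
  "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23",
  "status": "Succeeded",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset"
}
'''

def dataset_id_from_self(self_uri: str) -> str:
    """Return the trailing dataset ID segment of a dataset's self URI."""
    return self_uri.rstrip("/").rsplit("/", 1)[-1]

dataset = json.loads(response_body)
print(dataset_id_from_self(dataset["self"]))
# prints: 23b6554d-21f9-4df1-89cb-f84510ac8d23
```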

For Speech CLI help with datasets, run the following command:

spx help csr dataset


To create a dataset and connect it to an existing project, use the Datasets_Create operation of the Speech to text REST API. Construct the request body according to the following instructions:

  • Set the project property to the URI of an existing project. This property is recommended so that you can also view and manage the dataset in Speech Studio. You can make a Projects_List request to get available projects.

  • Set the required kind property. The possible values for the dataset kind are: Acoustic, AudioFiles, Language, LanguageMarkdown, and Pronunciation.

  • Set the required contentUrl property. This property is the location of the dataset. If you don't use the trusted Azure services security mechanism (see the following note), the contentUrl property must be a URL that can be retrieved with a simple anonymous GET request, such as a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

    Note

    If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. Use the same techniques as for batch transcription and plain storage account URLs for your dataset files.

  • Set the required locale property. The dataset locale must match the locale of the project. The locale can't be changed later.

  • Set the required displayName property. This property is the name that is displayed in the Speech Studio.
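Collecting the bullets above, the request body can be assembled as in the following Python sketch. All values are placeholders taken from the examples in this article:

```python
import json

# Build the Datasets_Create request body described in the bullets above.
# Every value here is an illustrative placeholder.
request_body = {
    "kind": "Acoustic",                    # required dataset kind
    "displayName": "My Acoustic Dataset",  # required; the name shown in Speech Studio
    "description": "My Acoustic Dataset Description",
    "project": {                           # recommended; links the dataset to a project
        "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
    },
    "contentUrl": "https://contoso.com/mydatasetlocation",  # required; anonymously retrievable URL
    "locale": "en-US",                     # required; must match the project locale
}

# Serialize to the JSON payload you'd send in the POST request.
payload = json.dumps(request_body, indent=2)
print(payload)
```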

Make an HTTP POST request using the URI as shown in the following example. Replace YourSubscriptionKey with your Speech resource key, replace YourServiceRegion with your Speech resource region, and set the request body properties as previously described.

curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "kind": "Acoustic",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description",
  "project": {
    "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "locale": "en-US",
}'  "https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/v3.2/datasets"

You should receive a response body in the following format:

{
  "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23",
  "kind": "Acoustic",
  "links": {
    "files": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23/files"
  },
  "project": {
    "self": "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/projects/0198f569-cc11-4099-a0e8-9d55bc3d0c52"
  },
  "properties": {
    "textNormalizationKind": "Default",
    "acceptedLineCount": 2,
    "rejectedLineCount": 0,
    "duration": "PT59S"
  },
  "lastActionDateTime": "2024-07-14T17:36:30Z",
  "status": "Succeeded",
  "createdDateTime": "2024-07-14T17:36:14Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description",
  "customProperties": {
    "PortalAPIVersion": "3"
  }
}

The top-level self property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete the dataset.
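For example, a later GET request against the self URI carries the same Ocp-Apim-Subscription-Key header as the create request. The following Python standard-library sketch builds, but doesn't send, such a request; the key value is a placeholder:

```python
import urllib.request

# Dataset URI from the response body's top-level `self` property.
dataset_uri = "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.2/datasets/23b6554d-21f9-4df1-89cb-f84510ac8d23"

# Build the GET request; YourSubscriptionKey is a placeholder for your
# Speech resource key. Calling urllib.request.urlopen(request) would send it.
request = urllib.request.Request(
    dataset_uri,
    headers={"Ocp-Apim-Subscription-Key": "YourSubscriptionKey"},
    method="GET",
)
print(request.get_method(), request.full_url)
```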

Important

Connecting a dataset to a custom speech project isn't required to train and test a custom model by using the REST API or Speech CLI. But if the dataset isn't connected to any project, you can't select it for training or testing in the Speech Studio.

Next steps