Migrate code from v3.0 to v3.1 of the REST API
The Speech to text REST API is used for Batch transcription and custom speech. Changes from version 3.0 to 3.1 are described in the sections below.
Important
Speech to text REST API v3.2 is the latest version that's generally available. Preview versions 3.2-preview.1 and 3.2-preview.2 will be removed in September 2024. Speech to text REST API v3.1 will be retired on a date to be announced. Speech to text REST API v3.0 will be retired on April 1st, 2026.
Base path
You must update the base path in your code from /speechtotext/v3.0 to /speechtotext/v3.1. For example, to get base models in the chinanorth2 region, use https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.1/models/base instead of https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.0/models/base.
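The base path change can be applied mechanically to URLs that are stored in configuration or built in code. The following is a minimal sketch, not part of the API itself: the helper name to_v31 is hypothetical, and it only rewrites URLs whose path starts with the v3.0 base path.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_BASE = "/speechtotext/v3.0"
NEW_BASE = "/speechtotext/v3.1"


def to_v31(url: str) -> str:
    """Swap the v3.0 base path for v3.1; URLs with any other path pass through unchanged."""
    parts = urlsplit(url)
    path = parts.path
    # Only touch paths that are exactly the old base or sit underneath it.
    if path == OLD_BASE or path.startswith(OLD_BASE + "/"):
        path = NEW_BASE + path[len(OLD_BASE):]
    return urlunsplit(parts._replace(path=path))


old_url = "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.0/models/base"
new_url = to_v31(old_url)
```

Matching on the parsed path rather than on the raw string avoids rewriting a v3.0 fragment that happens to appear in a query string.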
Note these other changes:
- The /models/{id}/copyto operation (includes '/') in version 3.0 is replaced by the /models/{id}:copyto operation (includes ':') in version 3.1.
- The /webhooks/{id}/ping operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:ping operation (includes ':') in version 3.1.
- The /webhooks/{id}/test operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:test operation (includes ':') in version 3.1.
For more information, see Operation IDs later in this guide.
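The three path changes listed above follow one pattern: the final '/' before the action name becomes ':'. A minimal sketch of that rewrite is shown here, assuming paths are handled as plain strings; the function name to_v31_action is hypothetical.

```python
import re

# Matches only the three operations named in this guide, at the end of a path:
# /models/{id}/copyto, /webhooks/{id}/ping, /webhooks/{id}/test
_ACTION = re.compile(r"(/models/[^/]+)/(copyto)$|(/webhooks/[^/]+)/(ping|test)$")


def to_v31_action(path: str) -> str:
    """Replace the '/' before copyto, ping, or test with ':'; other paths are returned unchanged."""
    def repl(match: re.Match) -> str:
        resource = match.group(1) or match.group(3)
        action = match.group(2) or match.group(4)
        return f"{resource}:{action}"

    return _ACTION.sub(repl, path)
```

For example, to_v31_action("/speechtotext/v3.1/webhooks/42/ping") yields "/speechtotext/v3.1/webhooks/42:ping", while "/speechtotext/v3.1/models/base" is left as it is.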
Batch transcription
Note
Don't use Speech to text REST API v3.0 to retrieve a transcription created via Speech to text REST API v3.1. You'll see an error message such as the following: "The API version cannot be used to access this transcription. Please use API version v3.1 or higher."
In the Transcriptions_Create operation, the following three properties are added:
- The displayFormWordLevelTimestampsEnabled property can be used to enable the reporting of word-level timestamps on the display form of the transcription results. The results are returned in the displayWords property of the transcription file.
- The diarization property can be used to specify hints for the minimum and maximum number of speaker labels to generate when performing optional diarization (speaker separation). To use this property, you must also set the diarizationEnabled property to true. The v3.0 API supported diarization for two speakers only; with the v3.1 API, the service can generate speaker labels for more than two speakers. It's recommended to keep the number of speakers under 30 for better performance.
- The languageIdentification property can be used to specify settings for language identification on the input prior to transcription. Up to 10 candidate locales are supported for language identification. The returned transcription includes a new locale property for the recognized language or the locale that you provided.
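As an illustration of how the three properties might be combined, the sketch below builds a request body for Transcriptions_Create as a Python dictionary. The property names displayFormWordLevelTimestampsEnabled, diarizationEnabled, diarization, and languageIdentification come from this guide. The nested field names (speakers, minCount, maxCount, candidateLocales), the top-level fields (contentUrls, locale, displayName, properties), and the function name are assumptions that should be checked against the API reference.

```python
MAX_CANDIDATE_LOCALES = 10  # limit stated in this guide


def build_transcription_request(content_urls, locale, display_name,
                                candidate_locales=None,
                                min_speakers=None, max_speakers=None,
                                word_timestamps=False):
    """Assemble a request body; nested field names are assumptions, not confirmed here."""
    properties = {}

    if word_timestamps:
        properties["displayFormWordLevelTimestampsEnabled"] = True

    if min_speakers is not None or max_speakers is not None:
        if min_speakers is None or max_speakers is None:
            raise ValueError("provide both min_speakers and max_speakers")
        if not 1 <= min_speakers <= max_speakers:
            raise ValueError("expected 1 <= min_speakers <= max_speakers")
        # The diarization hints only apply when diarizationEnabled is true.
        properties["diarizationEnabled"] = True
        properties["diarization"] = {
            "speakers": {"minCount": min_speakers, "maxCount": max_speakers}  # assumed shape
        }

    if candidate_locales:
        if len(candidate_locales) > MAX_CANDIDATE_LOCALES:
            raise ValueError("at most 10 candidate locales are supported")
        properties["languageIdentification"] = {
            "candidateLocales": list(candidate_locales)  # assumed field name
        }

    return {
        "contentUrls": list(content_urls),  # assumed field name
        "locale": locale,
        "displayName": display_name,
        "properties": properties,
    }
```

The dictionary can be serialized with json.dumps and sent as the body of the create request. The checks fail early on the two limits this guide states: the candidate locale count and the pairing of diarization with diarizationEnabled.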
The filter property is added to the Transcriptions_List, Transcriptions_ListFiles, and Projects_ListTranscriptions operations. The filter expression can be used to select a subset of the available resources. You can filter by displayName, description, createdDateTime, lastActionDateTime, status, and locale. For example: filter=createdDateTime gt 2022-02-01T11:00:00Z
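Because a filter expression contains spaces and colons, it needs to be percent-encoded when it is placed in a query string. The sketch below shows one way to do that with the Python standard library. The /transcriptions path for Transcriptions_List and the helper name are assumptions; the filter expression is the example given above.

```python
from urllib.parse import quote, urlencode


def list_transcriptions_url(base: str, filter_expr: str) -> str:
    """Build a list URL with a percent-encoded filter query parameter."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"filter": filter_expr}, quote_via=quote)
    return f"{base}/transcriptions?{query}"  # path is an assumption


url = list_transcriptions_url(
    "https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.1",
    "createdDateTime gt 2022-02-01T11:00:00Z",
)
```

HTTP client libraries that accept a dictionary of query parameters usually perform this encoding themselves, in which case the raw expression can be passed as-is.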
If you use webhooks to receive notifications about transcription status, note that webhooks created via the v3.0 API can't receive notifications for v3.1 transcription requests. To receive notifications for v3.1 transcription requests, create a new webhook endpoint via the v3.1 API.
Custom speech
Datasets
The following operations are added for uploading and managing multiple data blocks for a dataset:
- Datasets_UploadBlock - Upload a block of data for the dataset. The maximum size of the block is 8 MiB.
- Datasets_GetBlocks - Get the list of uploaded blocks for this dataset.
- Datasets_CommitBlocks - Commit the block list to complete the upload of the dataset.
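Uploading in blocks means the client has to split the dataset into pieces of at most 8 MiB before calling Datasets_UploadBlock for each piece. The sketch below covers only that splitting step; it makes no HTTP calls, and the function name iter_blocks is hypothetical.

```python
from typing import Iterator

MAX_BLOCK_SIZE = 8 * 1024 * 1024  # 8 MiB, the maximum block size stated above


def iter_blocks(data: bytes, block_size: int = MAX_BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of data, each at most block_size bytes long."""
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size must be between 1 byte and 8 MiB")
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]
```

Each yielded block would be uploaded in turn, and the upload would then be completed with Datasets_CommitBlocks. For large files, reading from disk in block-sized chunks avoids holding the whole dataset in memory.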
To support model adaptation with structured text in markdown data, the Datasets_Create operation now supports the LanguageMarkdown data kind. For more information, see upload datasets.
Models
The Models_ListBaseModels and Models_GetBaseModel operations return information on the type of adaptation supported by each base model.
    "features": {
        "supportsAdaptationsWith": [
            "Acoustic",
            "Language",
            "LanguageMarkdown",
            "Pronunciation"
        ]
    }
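A client can read this list to decide which data kinds to offer for a given base model. The sketch below works on the features object shown above; where that object sits within the full response is not shown in this guide, so the function takes it directly. The function name is hypothetical.

```python
import json


def supports_adaptation(features: dict, kind: str) -> bool:
    """Return True if kind appears in the supportsAdaptationsWith list of a features object."""
    return kind in features.get("supportsAdaptationsWith", [])


# The fragment from this guide, wrapped in braces so that it parses as JSON.
sample = json.loads("""
{
  "features": {
    "supportsAdaptationsWith": ["Acoustic", "Language", "LanguageMarkdown", "Pronunciation"]
  }
}
""")

can_use_markdown = supports_adaptation(sample["features"], "LanguageMarkdown")
```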
The Models_Create operation has a new customModelWeightPercent property where you can specify the weight used when the Custom Language Model (trained from plain or structured text data) is combined with the Base Language Model. Valid values are integers between 1 and 100. The default value is currently 30.
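Since only integers from 1 to 100 are valid, a client-side check can catch a bad value before the request is sent. This is a small sketch of such a check, not service behavior; the function name is hypothetical.

```python
def validate_custom_model_weight(percent: int) -> int:
    """Return percent unchanged if it is an integer from 1 to 100; raise otherwise."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(percent, bool) or not isinstance(percent, int):
        raise TypeError("customModelWeightPercent must be an integer")
    if not 1 <= percent <= 100:
        raise ValueError("customModelWeightPercent must be between 1 and 100")
    return percent
```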
The filter property is added to the following operations:
- Datasets_List
- Datasets_ListFiles
- Endpoints_List
- Evaluations_List
- Evaluations_ListFiles
- Models_ListBaseModels
- Models_ListCustomModels
- Projects_List
- Projects_ListDatasets
- Projects_ListEndpoints
- Projects_ListEvaluations
- Projects_ListModels
The filter expression can be used to select a subset of the available resources. You can filter by displayName, description, createdDateTime, lastActionDateTime, status, locale, and kind. For example: filter=locale eq 'en-US'
Added the Models_ListFiles operation to get the files of the model identified by the given ID.
Added the Models_GetFile operation to get one specific file (identified with fileId) from a model (identified with ID). This lets you retrieve a ModelReport file that provides information on the data processed during training.
Operation IDs
You must update the base path in your code from /speechtotext/v3.0 to /speechtotext/v3.1. For example, to get base models in the chinanorth2 region, use https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.1/models/base instead of https://chinanorth2.api.cognitive.azure.cn/speechtotext/v3.0/models/base.
The name of each operationId in version 3.1 is prefixed with the object name. For example, the operationId for "Create Model" changed from CreateModel in version 3.0 to Models_Create in version 3.1.
The /models/{id}/copyto operation (includes '/') in version 3.0 is replaced by the /models/{id}:copyto operation (includes ':') in version 3.1.
The /webhooks/{id}/ping operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:ping operation (includes ':') in version 3.1.
The /webhooks/{id}/test operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:test operation (includes ':') in version 3.1.