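Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file with a specified locale. If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize latency.
- Replace YourSpeechResoureKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
Important
For the recommended keyless authentication with Microsoft Entra ID, replace --header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' with --header "Authorization: Bearer YourAccessToken". For more information about keyless authentication, see the role-based access control how-to guide.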
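The choice between the two headers can be sketched as a small Python helper. This is only an illustration: the function name build_auth_headers and its parameters are hypothetical, and the helper assembles the header dictionary without sending any request.

```python
from typing import Optional


def build_auth_headers(key: Optional[str] = None,
                       access_token: Optional[str] = None) -> dict:
    """Return the authentication header for a transcription request.

    Prefers a Microsoft Entra ID access token (keyless) when one is given
    and falls back to the Speech resource key otherwise.
    """
    if access_token:
        return {"Authorization": f"Bearer {access_token}"}
    if key:
        return {"Ocp-Apim-Subscription-Key": key}
    raise ValueError("Provide either a resource key or an access token")


# Placeholder values, exactly as in the curl example.
print(build_auth_headers(key="YourSpeechResoureKey"))
print(build_auth_headers(access_token="YourAccessToken"))
```

Preferring the token when both values are present mirrors the recommendation above; a caller that wants the opposite priority would swap the two checks.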
curl --location 'https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition={
"locales":["en-US"]}'
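Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property so that it matches the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. For more information about the supported locales, see Speech to text supported languages.
For more information about locales and other properties for the fast transcription API, see the "Request configuration options" section later in this guide.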
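For callers that build the request in code rather than with curl, the URL and the definition form field could be assembled as follows. This is a minimal sketch: the helper name build_transcribe_request is hypothetical, the URL pattern and API version are copied from the curl example, and nothing is sent over the network.

```python
import json


def build_transcribe_request(region: str, locales: list) -> tuple:
    """Return (url, form_fields) mirroring the curl example."""
    url = (
        f"https://{region}.api.cognitive.azure.cn"
        "/speechtotext/transcriptions:transcribe?api-version=2024-11-15"
    )
    # The definition part is a JSON document sent as a plain form field.
    definition = json.dumps({"locales": locales})
    return url, {"definition": definition}


url, fields = build_transcribe_request("YourServiceRegion", ["en-US"])
print(url)
print(fields["definition"])
```

The audio file itself would be attached as a second multipart part named audio by whichever HTTP client sends the request.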
The response includes durationMilliseconds, offsetMilliseconds, and more.
The combinedPhrases property contains the full transcription for all speakers.
{
"durationMilliseconds": 182439,
"combinedPhrases": [
{
"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And you're currently paying those loans off monthly, is that right? 
Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
}
],
"phrases": [
{
"offsetMilliseconds": 960,
"durationMilliseconds": 640,
"text": "Good afternoon.",
"words": [
{
"text": "Good",
"offsetMilliseconds": 960,
"durationMilliseconds": 240
},
{
"text": "afternoon.",
"offsetMilliseconds": 1200,
"durationMilliseconds": 400
}
],
"locale": "en-US",
"confidence": 0.93554276
},
{
"offsetMilliseconds": 1600,
"durationMilliseconds": 640,
"text": "This is Sam.",
"words": [
{
"text": "This",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
},
{
"text": "is",
"offsetMilliseconds": 1840,
"durationMilliseconds": 120
},
{
"text": "Sam.",
"offsetMilliseconds": 1960,
"durationMilliseconds": 280
}
],
"locale": "en-US",
"confidence": 0.93554276
},
{
"offsetMilliseconds": 2240,
"durationMilliseconds": 1040,
"text": "Thank you for calling Contoso.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 2240,
"durationMilliseconds": 200
},
{
"text": "you",
"offsetMilliseconds": 2440,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 2520,
"durationMilliseconds": 120
},
{
"text": "calling",
"offsetMilliseconds": 2640,
"durationMilliseconds": 200
},
{
"text": "Contoso.",
"offsetMilliseconds": 2840,
"durationMilliseconds": 440
}
],
"locale": "en-US",
"confidence": 0.93554276
},
{
"offsetMilliseconds": 3280,
"durationMilliseconds": 640,
"text": "How can I help?",
"words": [
{
"text": "How",
"offsetMilliseconds": 3280,
"durationMilliseconds": 120
},
{
"text": "can",
"offsetMilliseconds": 3440,
"durationMilliseconds": 120
},
{
"text": "I",
"offsetMilliseconds": 3560,
"durationMilliseconds": 40
},
{
"text": "help?",
"offsetMilliseconds": 3600,
"durationMilliseconds": 320
}
],
"locale": "en-US",
"confidence": 0.93554276
},
{
"offsetMilliseconds": 5040,
"durationMilliseconds": 400,
"text": "Hi there.",
"words": [
{
"text": "Hi",
"offsetMilliseconds": 5040,
"durationMilliseconds": 240
},
{
"text": "there.",
"offsetMilliseconds": 5280,
"durationMilliseconds": 160
}
],
"locale": "en-US",
"confidence": 0.93554276
},
{
"offsetMilliseconds": 5440,
"durationMilliseconds": 800,
"text": "My name is Mary.",
"words": [
{
"text": "My",
"offsetMilliseconds": 5440,
"durationMilliseconds": 80
},
{
"text": "name",
"offsetMilliseconds": 5520,
"durationMilliseconds": 120
},
{
"text": "is",
"offsetMilliseconds": 5640,
"durationMilliseconds": 80
},
{
"text": "Mary.",
"offsetMilliseconds": 5720,
"durationMilliseconds": 520
}
],
"locale": "en-US",
"confidence": 0.93554276
},
// More transcription results...
// Redacted for brevity
{
"offsetMilliseconds": 180320,
"durationMilliseconds": 680,
"text": "Thank you for your help.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 180320,
"durationMilliseconds": 160
},
{
"text": "you",
"offsetMilliseconds": 180480,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 180560,
"durationMilliseconds": 120
},
{
"text": "your",
"offsetMilliseconds": 180680,
"durationMilliseconds": 120
},
{
"text": "help.",
"offsetMilliseconds": 180800,
"durationMilliseconds": 200
}
],
"locale": "en-US",
"confidence": 0.92022026
},
{
"offsetMilliseconds": 181960,
"durationMilliseconds": 280,
"text": "Thank you.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 181960,
"durationMilliseconds": 200
},
{
"text": "you.",
"offsetMilliseconds": 182160,
"durationMilliseconds": 80
}
],
"locale": "en-US",
"confidence": 0.92022026
}
]
}
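The offsetMilliseconds and durationMilliseconds values of each phrase are enough to print a timestamped transcript. The following Python sketch assumes the response has already been parsed into a dictionary; the helper names format_offset and timestamped_phrases are illustrative and not part of the API.

```python
def format_offset(ms: int) -> str:
    """Convert a millisecond offset to MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def timestamped_phrases(response: dict) -> list:
    """Return one 'start - end  text' line per phrase in the response."""
    lines = []
    for phrase in response.get("phrases", []):
        start = phrase["offsetMilliseconds"]
        end = start + phrase["durationMilliseconds"]
        lines.append(
            f"{format_offset(start)} - {format_offset(end)}  {phrase['text']}"
        )
    return lines


# Two phrases copied from the sample response.
sample = {
    "phrases": [
        {"offsetMilliseconds": 960, "durationMilliseconds": 640,
         "text": "Good afternoon."},
        {"offsetMilliseconds": 180320, "durationMilliseconds": 680,
         "text": "Thank you for your help."},
    ]
}
for line in timestamped_phrases(sample):
    print(line)
```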
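Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file with language identification enabled. If you're not sure which locale the audio uses, you can specify multiple candidate locales. If you don't specify any locale, or if none of the specified locales is spoken in the audio file, the Speech service tries to identify the language.
- Replace YourSpeechResoureKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
Important
For the recommended keyless authentication with Microsoft Entra ID, replace --header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' with --header "Authorization: Bearer YourAccessToken". For more information about keyless authentication, see the role-based access control how-to guide.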
curl --location 'https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition={
"locales":["en-US","ja-JP"]}'
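Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property so that it matches the expected locales of the audio data to transcribe. In this example, the locales are set to en-US and ja-JP. The supported locales that you can specify are listed in All supported languages.
For more information about locales and other properties for the fast transcription API, see the "Request configuration options" section later in this guide.
The response includes durationMilliseconds, offsetMilliseconds, and more.
The combinedPhrases property contains the full transcription for all speakers.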
{
"durationMilliseconds": 185079,
"combinedPhrases": [
{
"text": "Hello, thank you for calling Contoso. Who am I speaking with today? Hi, my name is Mary Rondo. I'm trying to enroll myself with Contoso. Hi, Mary. Are you calling because you need health insurance? Yes. Yeah, I'm calling to sign up for insurance. Great. Uh If you can answer a few questions, we can get you signed up in a Jiffy. Okay. So what's your full name? uh So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. Got it. And what's the best callback number in case we get disconnected? I only have a cell phone, so I can give you that. Yep, that'll be fine. Sure. So it's 234-554 and then 9312. Got it. So to confirm, it's 234-554-9312. Yep, that's right. Excellent. Let's get some additional information for your application. Do you have a job? Uh Yes, I am self-employed. Okay, so then you have a social security number as well? Uh Yes, I do. Okay, and what is your social security number, please? Uh Sure, so it's 412-253-4931. 6789. Sorry, was that a 25 or a 225? You cut out for a bit. It's double two, so 412, then another two, then five. Thank you so much. And could I have your e-mail address, please? Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. That sounds good. Thank you. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Actually, so I have one more question. Yes, of course. I'm curious, will I be getting a physical card as proof of coverage? So the default is a digital membership card, but we can send you a physical card if you prefer. Uh Yes. 
Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? Uh Yeah. uh So it's 2660 Unit A on Maple Avenue, Southeast Lansing, and then zip code is 48823. Absolutely. I've made a note on your file. Awesome. Thanks so much. You're very welcome. Thank you for calling Contoso and have a great day."
}
],
"phrases": [
{
"offsetMilliseconds": 720,
"durationMilliseconds": 1600,
"text": "Hello, thank you for calling Contoso.",
"words": [
{
"text": "Hello,",
"offsetMilliseconds": 720,
"durationMilliseconds": 480
},
{
"text": "thank",
"offsetMilliseconds": 1200,
"durationMilliseconds": 200
},
{
"text": "you",
"offsetMilliseconds": 1400,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 1480,
"durationMilliseconds": 120
},
{
"text": "calling",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
},
{
"text": "Contoso.",
"offsetMilliseconds": 1840,
"durationMilliseconds": 480
}
],
"locale": "en-US",
"confidence": 0.93265927
},
{
"offsetMilliseconds": 2320,
"durationMilliseconds": 1120,
"text": "Who am I speaking with today?",
"words": [
{
"text": "Who",
"offsetMilliseconds": 2320,
"durationMilliseconds": 160
},
{
"text": "am",
"offsetMilliseconds": 2480,
"durationMilliseconds": 80
},
{
"text": "I",
"offsetMilliseconds": 2560,
"durationMilliseconds": 80
},
{
"text": "speaking",
"offsetMilliseconds": 2640,
"durationMilliseconds": 320
},
{
"text": "with",
"offsetMilliseconds": 2960,
"durationMilliseconds": 160
},
{
"text": "today?",
"offsetMilliseconds": 3120,
"durationMilliseconds": 320
}
],
"locale": "en-US",
"confidence": 0.93265927
},
{
"offsetMilliseconds": 4480,
"durationMilliseconds": 1600,
"text": "Hi, my name is Mary Rondo.",
"words": [
{
"text": "Hi,",
"offsetMilliseconds": 4480,
"durationMilliseconds": 400
},
{
"text": "my",
"offsetMilliseconds": 4880,
"durationMilliseconds": 120
},
{
"text": "name",
"offsetMilliseconds": 5000,
"durationMilliseconds": 120
},
{
"text": "is",
"offsetMilliseconds": 5120,
"durationMilliseconds": 160
},
{
"text": "Mary",
"offsetMilliseconds": 5280,
"durationMilliseconds": 240
},
{
"text": "Rondo.",
"offsetMilliseconds": 5520,
"durationMilliseconds": 560
}
],
"locale": "en-US",
"confidence": 0.93265927
},
{
"offsetMilliseconds": 6120,
"durationMilliseconds": 1800,
"text": "I'm trying to enroll myself with Contoso.",
"words": [
{
"text": "I'm",
"offsetMilliseconds": 6120,
"durationMilliseconds": 120
},
{
"text": "trying",
"offsetMilliseconds": 6240,
"durationMilliseconds": 200
},
{
"text": "to",
"offsetMilliseconds": 6440,
"durationMilliseconds": 80
},
{
"text": "enroll",
"offsetMilliseconds": 6520,
"durationMilliseconds": 200
},
{
"text": "myself",
"offsetMilliseconds": 6720,
"durationMilliseconds": 360
},
{
"text": "with",
"offsetMilliseconds": 7080,
"durationMilliseconds": 120
},
{
"text": "Contoso.",
"offsetMilliseconds": 7200,
"durationMilliseconds": 720
}
],
"locale": "en-US",
"confidence": 0.93265927
},
// More transcription results...
// Redacted for brevity
{
"offsetMilliseconds": 181520,
"durationMilliseconds": 720,
"text": "You're very welcome.",
"words": [
{
"text": "You're",
"offsetMilliseconds": 181520,
"durationMilliseconds": 160
},
{
"text": "very",
"offsetMilliseconds": 181680,
"durationMilliseconds": 200
},
{
"text": "welcome.",
"offsetMilliseconds": 181880,
"durationMilliseconds": 360
}
],
"locale": "en-US",
"confidence": 0.90571773
},
{
"offsetMilliseconds": 182320,
"durationMilliseconds": 1840,
"text": "Thank you for calling Contoso and have a great day.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 182320,
"durationMilliseconds": 200
},
{
"text": "you",
"offsetMilliseconds": 182520,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 182600,
"durationMilliseconds": 120
},
{
"text": "calling",
"offsetMilliseconds": 182720,
"durationMilliseconds": 280
},
{
"text": "Contoso",
"offsetMilliseconds": 183000,
"durationMilliseconds": 520
},
{
"text": "and",
"offsetMilliseconds": 183520,
"durationMilliseconds": 160
},
{
"text": "have",
"offsetMilliseconds": 183680,
"durationMilliseconds": 120
},
{
"text": "a",
"offsetMilliseconds": 183800,
"durationMilliseconds": 40
},
{
"text": "great",
"offsetMilliseconds": 183840,
"durationMilliseconds": 200
},
{
"text": "day.",
"offsetMilliseconds": 184040,
"durationMilliseconds": 120
}
],
"locale": "en-US",
"confidence": 0.90571773
}
]
}
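When several candidate locales are sent, the locale field of each phrase shows which one the service chose. One way to summarize that is to total the phrase durations per locale, as in this Python sketch. The helper name duration_by_locale is hypothetical, and the response is assumed to be parsed into a dictionary already.

```python
from collections import defaultdict


def duration_by_locale(response: dict) -> dict:
    """Sum phrase durations (in milliseconds) for each detected locale."""
    totals = defaultdict(int)
    for phrase in response.get("phrases", []):
        totals[phrase["locale"]] += phrase["durationMilliseconds"]
    return dict(totals)


# First two phrases of the sample response, reduced to the fields used here.
sample = {
    "phrases": [
        {"locale": "en-US", "durationMilliseconds": 1600},
        {"locale": "en-US", "durationMilliseconds": 1120},
    ]
}
totals = duration_by_locale(sample)
print(totals)
# The locale with the largest total is the dominant one in the audio.
print(max(totals, key=totals.get))
```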
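Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file with the latest multilingual speech transcription model. If your audio contains multilingual content that you want to transcribe continuously and accurately, you can use the latest multilingual speech transcription model without specifying locale codes.
- Replace YourSpeechResoureKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
Important
For the recommended keyless authentication with Microsoft Entra ID, replace --header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' with --header "Authorization: Bearer YourAccessToken". For more information about keyless authentication, see the role-based access control how-to guide.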
curl --location 'https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition={
"locales":[]}'
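Construct the form definition according to the following instructions:
- You can leave the locales property empty (as shown in the preceding example) or omit it.
- The locales supported for audio input with the current multilingual model include: de-DE, en-AU, en-CA, en-GB, en-IN, en-US, es-ES, es-MX, fr-CA, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, and zh-CN.
- Transcription results are distinguished at the language level and follow the "major locale of that language". For example, the output always uses the en-US locale code, whether the audio has a British English or an Indian English accent.
For more information about locales and other properties for the fast transcription API, see the "Request configuration options" section later in this guide.
The response includes durationMilliseconds, offsetMilliseconds, and more.
The combinedPhrases property contains the full transcription for all speakers.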
{
"durationMilliseconds": 57187,
"combinedPhrases": [
{
"text": "With custom speech,you can evaluate and improve the microsoft speech to text accuracy for your applications and products 现成的语音转文本,利用通用语言模型作为一个基本模型,使用microsoft自有数据进行训练,并反映常用的口语。此基础模型使用那些代表各常见领域的方言和发音进行了预先训练。 Quand vous effectuez une demande de reconnaissance vocale, le modèle de base le plus récent pour chaque langue prise en charge est utilisé par défaut. Le modèle de base fonctionne très bien dans la plupart des scénarios de reconnaissance vocale. A custom model can be used to augment the base model to improve recognition of domain specific vocabulary specified to the application by providing text data to train the model. It can also be used to improve recognition based for the specific audio conditions of the application by providing audio data with reference transcriptions."
}
],
"phrases": [
{
"offsetMilliseconds": 80,
"durationMilliseconds": 6960,
"text": "With custom speech,you can evaluate and improve the microsoft speech to text accuracy for your applications and products.",
"words": [
{
"text": "with",
"offsetMilliseconds": 80,
"durationMilliseconds": 160
},
{
"text": "custom",
"offsetMilliseconds": 240,
"durationMilliseconds": 480
},
{
"text": "speech",
"offsetMilliseconds": 720,
"durationMilliseconds": 360
},
{
"text": ",",
"offsetMilliseconds": 1080,
"durationMilliseconds": 10
},
{
"text": "you",
"offsetMilliseconds": 1200,
"durationMilliseconds": 240
},
{
"text": "can",
"offsetMilliseconds": 1440,
"durationMilliseconds": 160
},
{
"text": "evaluate",
"offsetMilliseconds": 1600,
"durationMilliseconds": 640
},
{
"text": "and",
"offsetMilliseconds": 2240,
"durationMilliseconds": 200
},
{
"text": "improve",
"offsetMilliseconds": 2440,
"durationMilliseconds": 280
},
{
"text": "the",
"offsetMilliseconds": 2720,
"durationMilliseconds": 160
},
{
"text": "microsoft",
"offsetMilliseconds": 2880,
"durationMilliseconds": 640
},
{
"text": "speech",
"offsetMilliseconds": 3520,
"durationMilliseconds": 320
},
{
"text": "to",
"offsetMilliseconds": 3840,
"durationMilliseconds": 200
},
{
"text": "text",
"offsetMilliseconds": 4040,
"durationMilliseconds": 360
},
{
"text": "accuracy",
"offsetMilliseconds": 4400,
"durationMilliseconds": 560
},
{
"text": "for",
"offsetMilliseconds": 4960,
"durationMilliseconds": 160
},
{
"text": "your",
"offsetMilliseconds": 5120,
"durationMilliseconds": 200
},
{
"text": "applications",
"offsetMilliseconds": 5320,
"durationMilliseconds": 760
},
{
"text": "and",
"offsetMilliseconds": 6080,
"durationMilliseconds": 200
},
{
"text": "products",
"offsetMilliseconds": 6280,
"durationMilliseconds": 680
}
],
"locale": "en-us",
"confidence": 0.9539559
},
{
"offsetMilliseconds": 8000,
"durationMilliseconds": 8600,
"text": "现成的语音转文本,利用通用语言模型作为一个基本模型,使用microsoft自有数据进行训练,并反映常用的口语。此基础模型使用那些代表各常见领域的方言和发音进行了预先训练。",
"words": [
{
"text": "现",
"offsetMilliseconds": 8000,
"durationMilliseconds": 40
},
{
"text": "成",
"offsetMilliseconds": 8040,
"durationMilliseconds": 40
},
{
"text": "的",
"offsetMilliseconds": 8160,
"durationMilliseconds": 40
},
{
"text": "语",
"offsetMilliseconds": 8200,
"durationMilliseconds": 40
},
{
"text": "音",
"offsetMilliseconds": 8240,
"durationMilliseconds": 40
},
{
"text": "转",
"offsetMilliseconds": 8280,
"durationMilliseconds": 40
},
{
"text": "文",
"offsetMilliseconds": 8320,
"durationMilliseconds": 40
},
{
"text": "本,",
"offsetMilliseconds": 8360,
"durationMilliseconds": 40
},
{
"text": "利",
"offsetMilliseconds": 8400,
"durationMilliseconds": 40
},
{
"text": "用",
"offsetMilliseconds": 8440,
"durationMilliseconds": 40
},
{
"text": "通",
"offsetMilliseconds": 8480,
"durationMilliseconds": 40
},
{
"text": "用",
"offsetMilliseconds": 8520,
"durationMilliseconds": 40
},
{
"text": "语",
"offsetMilliseconds": 8560,
"durationMilliseconds": 40
},
{
"text": "言",
"offsetMilliseconds": 8600,
"durationMilliseconds": 40
},
{
"text": "模",
"offsetMilliseconds": 8640,
"durationMilliseconds": 40
},
{
"text": "型",
"offsetMilliseconds": 8680,
"durationMilliseconds": 40
},
{
"text": "作",
"offsetMilliseconds": 8800,
"durationMilliseconds": 40
},
{
"text": "为",
"offsetMilliseconds": 8840,
"durationMilliseconds": 40
},
{
"text": "一",
"offsetMilliseconds": 9520,
"durationMilliseconds": 40
},
{
"text": "个",
"offsetMilliseconds": 9560,
"durationMilliseconds": 40
},
{
"text": "基",
"offsetMilliseconds": 9600,
"durationMilliseconds": 40
},
{
"text": "本",
"offsetMilliseconds": 9640,
"durationMilliseconds": 40
},
{
"text": "模",
"offsetMilliseconds": 9680,
"durationMilliseconds": 40
},
{
"text": "型,",
"offsetMilliseconds": 9720,
"durationMilliseconds": 40
},
{
"text": "使",
"offsetMilliseconds": 9760,
"durationMilliseconds": 40
},
{
"text": "用",
"offsetMilliseconds": 10080,
"durationMilliseconds": 320
},
{
"text": "microsoft",
"offsetMilliseconds": 10400,
"durationMilliseconds": 3600
},
{
"text": "自",
"offsetMilliseconds": 14000,
"durationMilliseconds": 40
},
{
"text": "有",
"offsetMilliseconds": 14040,
"durationMilliseconds": 40
},
{
"text": "数",
"offsetMilliseconds": 14160,
"durationMilliseconds": 40
},
{
"text": "据",
"offsetMilliseconds": 14200,
"durationMilliseconds": 40
},
{
"text": "进",
"offsetMilliseconds": 14320,
"durationMilliseconds": 40
},
{
"text": "行",
"offsetMilliseconds": 14360,
"durationMilliseconds": 40
},
{
"text": "训",
"offsetMilliseconds": 14400,
"durationMilliseconds": 40
},
{
"text": "练,",
"offsetMilliseconds": 14440,
"durationMilliseconds": 40
},
{
"text": "并",
"offsetMilliseconds": 14480,
"durationMilliseconds": 40
},
{
"text": "反",
"offsetMilliseconds": 14520,
"durationMilliseconds": 40
},
{
"text": "映",
"offsetMilliseconds": 14560,
"durationMilliseconds": 40
},
{
"text": "常",
"offsetMilliseconds": 14600,
"durationMilliseconds": 40
},
{
"text": "用",
"offsetMilliseconds": 14640,
"durationMilliseconds": 40
},
{
"text": "的",
"offsetMilliseconds": 14680,
"durationMilliseconds": 40
},
{
"text": "口",
"offsetMilliseconds": 14720,
"durationMilliseconds": 40
},
{
"text": "语",
"offsetMilliseconds": 14760,
"durationMilliseconds": 40
},
{
"text": "。",
"offsetMilliseconds": 14800,
"durationMilliseconds": 40
},
{
"text": "此",
"offsetMilliseconds": 14840,
"durationMilliseconds": 40
},
{
"text": "基",
"offsetMilliseconds": 14880,
"durationMilliseconds": 40
},
{
"text": "础",
"offsetMilliseconds": 14920,
"durationMilliseconds": 40
},
{
"text": "模",
"offsetMilliseconds": 14960,
"durationMilliseconds": 40
},
{
"text": "型",
"offsetMilliseconds": 15000,
"durationMilliseconds": 40
},
{
"text": "使",
"offsetMilliseconds": 15040,
"durationMilliseconds": 40
},
{
"text": "用",
"offsetMilliseconds": 15080,
"durationMilliseconds": 40
},
{
"text": "那",
"offsetMilliseconds": 15120,
"durationMilliseconds": 40
},
{
"text": "些",
"offsetMilliseconds": 15160,
"durationMilliseconds": 40
},
{
"text": "代",
"offsetMilliseconds": 15200,
"durationMilliseconds": 40
},
{
"text": "表",
"offsetMilliseconds": 15240,
"durationMilliseconds": 40
},
{
"text": "各",
"offsetMilliseconds": 15280,
"durationMilliseconds": 40
},
{
"text": "常",
"offsetMilliseconds": 15320,
"durationMilliseconds": 40
},
{
"text": "见",
"offsetMilliseconds": 15360,
"durationMilliseconds": 40
},
{
"text": "领",
"offsetMilliseconds": 15400,
"durationMilliseconds": 40
},
{
"text": "域",
"offsetMilliseconds": 15760,
"durationMilliseconds": 40
},
{
"text": "的",
"offsetMilliseconds": 15800,
"durationMilliseconds": 40
},
{
"text": "方",
"offsetMilliseconds": 15920,
"durationMilliseconds": 40
},
{
"text": "言",
"offsetMilliseconds": 15960,
"durationMilliseconds": 40
},
{
"text": "和",
"offsetMilliseconds": 16000,
"durationMilliseconds": 40
},
{
"text": "发",
"offsetMilliseconds": 16040,
"durationMilliseconds": 40
},
{
"text": "音",
"offsetMilliseconds": 16080,
"durationMilliseconds": 40
},
{
"text": "进",
"offsetMilliseconds": 16120,
"durationMilliseconds": 40
},
{
"text": "行",
"offsetMilliseconds": 16160,
"durationMilliseconds": 40
},
{
"text": "了",
"offsetMilliseconds": 16200,
"durationMilliseconds": 40
},
{
"text": "预",
"offsetMilliseconds": 16320,
"durationMilliseconds": 40
},
{
"text": "先",
"offsetMilliseconds": 16360,
"durationMilliseconds": 40
},
{
"text": "训",
"offsetMilliseconds": 16400,
"durationMilliseconds": 40
},
{
"text": "练",
"offsetMilliseconds": 16560,
"durationMilliseconds": 40
}
],
"locale": "zh-cn",
"confidence": 0.9241725
},
{
"offsetMilliseconds": 24320,
"durationMilliseconds": 6640,
"text": "Quand vous effectuez une demande de reconnaissance vocale, le modèle de base le plus récent pour chaque langue prise en charge est utilisé par défaut.",
"words": [
{
"text": "Quand",
"offsetMilliseconds": 24320,
"durationMilliseconds": 160
},
{
"text": "vous",
"offsetMilliseconds": 24480,
"durationMilliseconds": 80
},
// More transcription results...
// Redacted for brevity
{
"text": "scénarios",
"offsetMilliseconds": 34200,
"durationMilliseconds": 400
},
{
"text": "de",
"offsetMilliseconds": 34600,
"durationMilliseconds": 120
},
{
"text": "reconnaissance",
"offsetMilliseconds": 34720,
"durationMilliseconds": 640
},
{
"text": "vocale.",
"offsetMilliseconds": 35360,
"durationMilliseconds": 480
}
],
"locale": "fr-fr",
"confidence": 0.9308314
},
{
"offsetMilliseconds": 36720,
"durationMilliseconds": 10320,
"text": "A custom model can be used to augment the base model to improve recognition of domain specific vocabulary spécifique to the application by providing text data to train the model.",
"words": [
{
"text": "A",
"offsetMilliseconds": 36720,
"durationMilliseconds": 80
},
{
"text": "custom",
"offsetMilliseconds": 36880,
"durationMilliseconds": 400
},
{
"text": "model",
"offsetMilliseconds": 37280,
"durationMilliseconds": 480
},
// More transcription results...
// Redacted for brevity
{
"text": "with",
"offsetMilliseconds": 54720,
"durationMilliseconds": 200
},
{
"text": "reference",
"offsetMilliseconds": 54920,
"durationMilliseconds": 360
},
{
"text": "transcriptions.",
"offsetMilliseconds": 55280,
"durationMilliseconds": 1200
}
],
"locale": "en-us",
"confidence": 0.92155737
}
]
}
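Because each phrase in a multilingual response carries its own locale, consecutive phrases can be merged into per-language segments. The Python sketch below is one way to do that. The helper name language_segments is hypothetical, and locale codes are lowercased before comparison only because the sample response reports them in lowercase.

```python
from itertools import groupby


def language_segments(response: dict) -> list:
    """Merge consecutive phrases that share a locale into (locale, text) pairs."""
    segments = []
    phrases = response.get("phrases", [])
    for locale, group in groupby(phrases, key=lambda p: p["locale"].lower()):
        segments.append((locale, " ".join(p["text"] for p in group)))
    return segments


# Locales follow the sample response; the texts are shortened placeholders.
sample = {
    "phrases": [
        {"locale": "en-us", "text": "With custom speech ..."},
        {"locale": "zh-cn", "text": "现成的语音转文本 ..."},
        {"locale": "fr-fr", "text": "Quand vous effectuez ..."},
        {"locale": "fr-fr", "text": "Le modèle de base ..."},
        {"locale": "en-us", "text": "A custom model ..."},
    ]
}
for locale, text in language_segments(sample):
    print(locale, "|", text)
```

Grouping only adjacent phrases keeps the segments in spoken order, so a language that recurs later in the audio starts a new segment instead of being folded into the earlier one.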
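Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file with diarization enabled. Diarization distinguishes between the different speakers in a conversation. The Speech service reports which speaker was talking during each part of the transcribed speech.
- Replace YourSpeechResoureKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
Important
For the recommended keyless authentication with Microsoft Entra ID, replace --header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' with --header "Authorization: Bearer YourAccessToken". For more information about keyless authentication, see the role-based access control how-to guide.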
curl --location 'https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition={
"locales":["en-US"],
"diarization": {"maxSpeakers": 2,"enabled": true}}'
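Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property so that it matches the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. The locales that you can specify include: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
- Set the diarization property to recognize and separate multiple speakers in one audio channel. For example, specify "diarization": {"maxSpeakers": 2, "enabled": true}. The transcription then includes a speaker entry for each transcribed phrase.
For more information about locales, diarization, and other properties for the fast transcription API, see the "Request configuration options" section later in this guide.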
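If the definition is generated in code, the nested diarization object can be serialized with a standard JSON library. The following Python sketch uses a hypothetical helper named build_diarization_definition. It only produces the JSON string for the definition form field and applies no validation of its own.

```python
import json


def build_diarization_definition(locales: list, max_speakers: int) -> str:
    """Serialize a definition that enables diarization."""
    definition = {
        "locales": locales,
        # Python's True is written as JSON true by json.dumps.
        "diarization": {"maxSpeakers": max_speakers, "enabled": True},
    }
    return json.dumps(definition)


print(build_diarization_definition(["en-US"], 2))
```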
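The response includes durationMilliseconds, offsetMilliseconds, and more. In this example, diarization is enabled, so the response includes speaker information for each transcribed phrase.
The combinedPhrases property contains the full transcription for all speakers in a single channel.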
{
"durationMilliseconds": 182439,
"combinedPhrases": [
{
"channel": 0,
"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh. Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? Uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And. You're currently paying those loans off monthly, is that right? 
Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
}
],
"phrases": [
{
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 960,
"durationMilliseconds": 640,
"text": "Good afternoon.",
"words": [
{
"text": "Good",
"offsetMilliseconds": 960,
"durationMilliseconds": 240
},
{
"text": "afternoon.",
"offsetMilliseconds": 1200,
"durationMilliseconds": 400
}
],
"locale": "en-US",
"confidence": 0.93616915
},
{
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 1600,
"durationMilliseconds": 640,
"text": "This is Sam.",
"words": [
{
"text": "This",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
},
{
"text": "is",
"offsetMilliseconds": 1840,
"durationMilliseconds": 120
},
{
"text": "Sam.",
"offsetMilliseconds": 1960,
"durationMilliseconds": 280
}
],
"locale": "en-US",
"confidence": 0.93616915
},
{
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 2240,
"durationMilliseconds": 1040,
"text": "Thank you for calling Contoso.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 2240,
"durationMilliseconds": 200
},
{
"text": "you",
"offsetMilliseconds": 2440,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 2520,
"durationMilliseconds": 120
},
{
"text": "calling",
"offsetMilliseconds": 2640,
"durationMilliseconds": 200
},
{
"text": "Contoso.",
"offsetMilliseconds": 2840,
"durationMilliseconds": 440
}
],
"locale": "en-US",
"confidence": 0.93616915
},
{
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 3280,
"durationMilliseconds": 640,
"text": "How can I help?",
"words": [
{
"text": "How",
"offsetMilliseconds": 3280,
"durationMilliseconds": 120
},
{
"text": "can",
"offsetMilliseconds": 3440,
"durationMilliseconds": 120
},
{
"text": "I",
"offsetMilliseconds": 3560,
"durationMilliseconds": 40
},
{
"text": "help?",
"offsetMilliseconds": 3600,
"durationMilliseconds": 320
}
],
"locale": "en-US",
"confidence": 0.93616915
},
{
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 5040,
"durationMilliseconds": 400,
"text": "Hi there.",
"words": [
{
"text": "Hi",
"offsetMilliseconds": 5040,
"durationMilliseconds": 240
},
{
"text": "there.",
"offsetMilliseconds": 5280,
"durationMilliseconds": 160
}
],
"locale": "en-US",
"confidence": 0.93616915
},
{
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 5440,
"durationMilliseconds": 800,
"text": "My name is Mary.",
"words": [
{
"text": "My",
"offsetMilliseconds": 5440,
"durationMilliseconds": 80
},
{
"text": "name",
"offsetMilliseconds": 5520,
"durationMilliseconds": 120
},
{
"text": "is",
"offsetMilliseconds": 5640,
"durationMilliseconds": 80
},
{
"text": "Mary.",
"offsetMilliseconds": 5720,
"durationMilliseconds": 520
}
],
"locale": "en-US",
"confidence": 0.93616915
},
// More transcription results...
// Redacted for brevity
{
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 180320,
"durationMilliseconds": 680,
"text": "Thank you for your help.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 180320,
"durationMilliseconds": 160
},
{
"text": "you",
"offsetMilliseconds": 180480,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 180560,
"durationMilliseconds": 120
},
{
"text": "your",
"offsetMilliseconds": 180680,
"durationMilliseconds": 120
},
{
"text": "help.",
"offsetMilliseconds": 180800,
"durationMilliseconds": 200
}
],
"locale": "en-US",
"confidence": 0.9314801
},
{
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 181960,
"durationMilliseconds": 280,
"text": "Thank you.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 181960,
"durationMilliseconds": 200
},
{
"text": "you.",
"offsetMilliseconds": 182160,
"durationMilliseconds": 80
}
],
"locale": "en-US",
"confidence": 0.9314801
}
]
}
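The phrase objects in the response above carry a speaker value together with offsetMilliseconds and durationMilliseconds, so the conversation can be regrouped per speaker on the client. The following Python sketch is illustrative only: the helper names turns_by_speaker and speaking_time_ms are hypothetical, the sample dictionary holds just three phrases copied from the response, and it assumes every phrase includes a speaker field.

```python
from collections import defaultdict

# Three phrases copied from the sample response; a real response holds many more.
response = {
    "phrases": [
        {"channel": 0, "speaker": 1, "offsetMilliseconds": 3280,
         "durationMilliseconds": 640, "text": "How can I help?"},
        {"channel": 0, "speaker": 0, "offsetMilliseconds": 5040,
         "durationMilliseconds": 400, "text": "Hi there."},
        {"channel": 0, "speaker": 0, "offsetMilliseconds": 5440,
         "durationMilliseconds": 800, "text": "My name is Mary."},
    ]
}


def turns_by_speaker(resp):
    """Group phrase text by speaker, ordered by start offset (hypothetical helper)."""
    grouped = defaultdict(list)
    for phrase in sorted(resp["phrases"], key=lambda p: p["offsetMilliseconds"]):
        # Assumption: every phrase has a "speaker" key, as in the sample above.
        grouped[phrase["speaker"]].append(phrase["text"])
    return dict(grouped)


def speaking_time_ms(resp):
    """Sum durationMilliseconds per speaker (hypothetical helper)."""
    totals = defaultdict(int)
    for phrase in resp["phrases"]:
        totals[phrase["speaker"]] += phrase["durationMilliseconds"]
    return dict(totals)


turns = turns_by_speaker(response)
totals = speaking_time_ms(response)
for speaker in sorted(turns):
    print(f"speaker {speaker} ({totals[speaker]} ms): {' '.join(turns[speaker])}")
```

The // comment lines in the sample response mark elided content, so they must be removed before the text is parsed as JSON.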
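Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file that has one or two channels. Multi-channel transcription is useful for audio files with multiple channels, such as audio files with multiple speakers or audio files with background noise. By default, the fast transcription API merges all input channels into a single channel and then performs the transcription. If that isn't what you want, the channels can be transcribed independently without merging.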
- Replace YourSpeechResoureKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
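Important
For the recommended keyless authentication with Microsoft Entra ID, replace --header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' with --header "Authorization: Bearer YourAccessToken". For more information about keyless authentication, see the role-based access control how-to guide.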
curl --location 'https://YourServiceRegion.api.cognitive.azure.cn/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSpeechResoureKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition={
"locales":["en-US"],
"channels": [0,1]}'
Construct the form definition according to the following instructions:
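- Set the optional (but recommended) locales property so that it matches the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. The locales that you can specify are: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
- Set the channels property to specify the zero-based indices of the channels to transcribe separately. Up to two channels are supported unless diarization is enabled. In this example, channels 0 and 1 are specified.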
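For more information about locales, channels, and other properties of the fast transcription API, see the "Request configuration options" section later in this guide.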
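As a rough illustration of the instructions above, the definition form field can be assembled and sanity-checked before it is passed to curl. This is a minimal Python sketch, not part of the service or any SDK: the function name build_definition and the max_channels parameter are hypothetical, and the two-channel limit is applied here only as a client-side check that mirrors the no-diarization case described above.

```python
import json


def build_definition(locales=("en-US",), channels=None, max_channels=2):
    """Return a JSON string for the 'definition' form field (hypothetical helper).

    channels holds zero-based channel indices to transcribe separately;
    leave it as None to accept the default merged-channel behavior.
    """
    definition = {"locales": list(locales)}
    if channels is not None:
        channels = list(channels)
        if any(not isinstance(c, int) or c < 0 for c in channels):
            raise ValueError("channel indices must be non-negative integers")
        if len(set(channels)) != len(channels):
            raise ValueError("channel indices must be unique")
        # Assumption: diarization is off, so at most two channels are allowed.
        if len(channels) > max_channels:
            raise ValueError(f"at most {max_channels} channels are supported")
        definition["channels"] = channels
    return json.dumps(definition)


definition_json = build_definition(channels=[0, 1])
print(definition_json)
```

The printed string can be used as the value of the definition form field in the curl command.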
The response includes durationMilliseconds, offsetMilliseconds, and more. If the audio file has multiple channels, the channel property identifies the channel.
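The combinedPhrases property contains the full transcription of each audio channel separately. Look for "channel": 0,"text" and "channel": 1,"text" to identify the full transcription of each channel.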
{
"durationMilliseconds": 185079,
"combinedPhrases": [
{
"channel": 0,
"text": "Hello. Thank you for calling Contoso. Who am I speaking with today? Hi, Mary. Are you calling because you need health insurance? Great. If you can answer a few questions, we can get you signed up in the Jiffy. So what's your full name? Got it. And what's the best callback number in case we get disconnected? Yep, that'll be fine. Got it. So to confirm, it's 234-554-9312. Excellent. Let's get some additional information for your application. Do you have a job? OK, so then you have a Social Security number as well. OK, and what is your Social Security number please? Sorry, what was that, a 25 or a 225? You cut out for a bit. Alright, thank you so much. And could I have your e-mail address please? Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Uh Yes, of course. So the default is a digital membership card, but we can send you a physical card if you prefer. Uh, yeah. Absolutely. I've made a note on your file. You're very welcome. Thank you for calling Contoso and have a great day."
},
{
"channel": 1,
"text": "Hi, my name is Mary Rondo. I'm trying to enroll myself with Contuso. Yes, yeah, I'm calling to sign up for insurance. Okay. So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. I only have a cell phone so I can give you that. Sure, so it's 234-554 and then 9312. Yep, that's right. Uh Yes, I am self-employed. Yes, I do. Uh Sure, so it's 412256789. It's double two, so 412, then another two, then five. Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. That was quick. Thank you. Actually, so I have one more question. I'm curious, will I be getting a physical card as proof of coverage? uh Yes. Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? So it's 2660 Unit A on Maple Avenue SE, Lansing, and then zip code is 48823. Awesome. Thanks so much."
}
],
"phrases": [
{
"channel": 0,
"offsetMilliseconds": 720,
"durationMilliseconds": 480,
"text": "Hello.",
"words": [
{
"text": "Hello.",
"offsetMilliseconds": 720,
"durationMilliseconds": 480
}
],
"locale": "en-US",
"confidence": 0.9177142
},
{
"channel": 0,
"offsetMilliseconds": 1200,
"durationMilliseconds": 1120,
"text": "Thank you for calling Contoso.",
"words": [
{
"text": "Thank",
"offsetMilliseconds": 1200,
"durationMilliseconds": 200
},
{
"text": "you",
"offsetMilliseconds": 1400,
"durationMilliseconds": 80
},
{
"text": "for",
"offsetMilliseconds": 1480,
"durationMilliseconds": 120
},
{
"text": "calling",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
},
{
"text": "Contoso.",
"offsetMilliseconds": 1840,
"durationMilliseconds": 480
}
],
"locale": "en-US",
"confidence": 0.9177142
},
{
"channel": 0,
"offsetMilliseconds": 2320,
"durationMilliseconds": 1120,
"text": "Who am I speaking with today?",
"words": [
{
"text": "Who",
"offsetMilliseconds": 2320,
"durationMilliseconds": 160
},
{
"text": "am",
"offsetMilliseconds": 2480,
"durationMilliseconds": 80
},
{
"text": "I",
"offsetMilliseconds": 2560,
"durationMilliseconds": 80
},
{
"text": "speaking",
"offsetMilliseconds": 2640,
"durationMilliseconds": 320
},
{
"text": "with",
"offsetMilliseconds": 2960,
"durationMilliseconds": 160
},
{
"text": "today?",
"offsetMilliseconds": 3120,
"durationMilliseconds": 320
}
],
"locale": "en-US",
"confidence": 0.9177142
},
{
"channel": 0,
"offsetMilliseconds": 9520,
"durationMilliseconds": 400,
"text": "Hi, Mary.",
"words": [
{
"text": "Hi,",
"offsetMilliseconds": 9520,
"durationMilliseconds": 80
},
{
"text": "Mary.",
"offsetMilliseconds": 9600,
"durationMilliseconds": 320
}
],
"locale": "en-US",
"confidence": 0.9177142
},
// More transcription results...
// Redacted for brevity
{
"channel": 1,
"offsetMilliseconds": 4480,
"durationMilliseconds": 1600,
"text": "Hi, my name is Mary Rondo.",
"words": [
{
"text": "Hi,",
"offsetMilliseconds": 4480,
"durationMilliseconds": 400
},
{
"text": "my",
"offsetMilliseconds": 4880,
"durationMilliseconds": 120
},
{
"text": "name",
"offsetMilliseconds": 5000,
"durationMilliseconds": 120
},
{
"text": "is",
"offsetMilliseconds": 5120,
"durationMilliseconds": 160
},
{
"text": "Mary",
"offsetMilliseconds": 5280,
"durationMilliseconds": 240
},
{
"text": "Rondo.",
"offsetMilliseconds": 5520,
"durationMilliseconds": 560
}
],
"locale": "en-US",
"confidence": 0.8989456
},
{
"channel": 1,
"offsetMilliseconds": 6080,
"durationMilliseconds": 1920,
"text": "I'm trying to enroll myself with Contuso.",
"words": [
{
"text": "I'm",
"offsetMilliseconds": 6080,
"durationMilliseconds": 160
},
{
"text": "trying",
"offsetMilliseconds": 6240,
"durationMilliseconds": 200
},
{
"text": "to",
"offsetMilliseconds": 6440,
"durationMilliseconds": 80
},
{
"text": "enroll",
"offsetMilliseconds": 6520,
"durationMilliseconds": 200
},
{
"text": "myself",
"offsetMilliseconds": 6720,
"durationMilliseconds": 360
},
{
"text": "with",
"offsetMilliseconds": 7080,
"durationMilliseconds": 120
},
{
"text": "Contuso.",
"offsetMilliseconds": 7200,
"durationMilliseconds": 800
}
],
"locale": "en-US",
"confidence": 0.8989456
},
// More transcription results...
// Redacted for brevity
]
}
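In the response above, the phrases for channel 0 are listed before those for channel 1, so reading the conversation in order means sorting by offsetMilliseconds. The following Python sketch shows one possible way to do that on the client. The helper names transcript_by_channel and interleave_phrases are hypothetical, and the sample dictionary is a trimmed copy of the response (shortened combinedPhrases text, only the first few phrases, words arrays omitted).

```python
# Trimmed copy of the sample response; the // comment lines were removed
# because they are not valid JSON.
response = {
    "combinedPhrases": [
        {"channel": 0, "text": "Hello. Thank you for calling Contoso."},
        {"channel": 1, "text": "Hi, my name is Mary Rondo."},
    ],
    "phrases": [
        {"channel": 0, "offsetMilliseconds": 720,
         "durationMilliseconds": 480, "text": "Hello."},
        {"channel": 0, "offsetMilliseconds": 2320,
         "durationMilliseconds": 1120, "text": "Who am I speaking with today?"},
        {"channel": 0, "offsetMilliseconds": 9520,
         "durationMilliseconds": 400, "text": "Hi, Mary."},
        {"channel": 1, "offsetMilliseconds": 4480,
         "durationMilliseconds": 1600, "text": "Hi, my name is Mary Rondo."},
        {"channel": 1, "offsetMilliseconds": 6080,
         "durationMilliseconds": 1920, "text": "I'm trying to enroll myself with Contuso."},
    ],
}


def transcript_by_channel(resp):
    """Map each channel index to its full transcription from combinedPhrases."""
    return {item["channel"]: item["text"] for item in resp["combinedPhrases"]}


def interleave_phrases(resp):
    """Return (channel, text) pairs ordered by start offset across all channels."""
    ordered = sorted(
        resp["phrases"],
        key=lambda p: (p["offsetMilliseconds"], p["channel"]),
    )
    return [(p["channel"], p["text"]) for p in ordered]


full_text = transcript_by_channel(response)
conversation = interleave_phrases(response)
for channel, text in conversation:
    print(f"[channel {channel}] {text}")
```

Sorting on the (offset, channel) pair keeps the order deterministic when two channels start a phrase at the same millisecond.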