High-definition voices in Azure Speech

Azure Speech continues to advance text-to-speech technology with neural high-definition (HD) voices. These HD voices understand content, automatically detect emotions in input text, and adjust speaking tone in real time to match sentiment. They maintain consistent voice personas while delivering enhanced expressiveness, naturalness, and control.

HD voice overview

Azure Speech offers one advanced HD voice model currently:

Model	Voice Count	Key Characteristics	Best For
DragonHD	30+ fine-tuned voices	Professional quality, accurate pronunciation, multi-talker support	Enterprise applications requiring high-quality output

Key features of HD voices

The following table describes the key features of Azure Speech HD voices:

Key features	Description
Human-like speech generation	Neural text-to-speech HD voices generate highly natural and human-like speech. The model is trained on millions of hours of multilingual data, enabling it to accurately interpret input text and generate speech with the appropriate emotion, pace, and rhythm without manual adjustments.
Conversational	Neural text-to-speech HD voices replicate natural speech patterns, including spontaneous pauses and emphasis. When given conversational text, the model can reproduce common speech phenomena such as pauses and filler words. The generated voice sounds as if someone is conversing directly with you.
Prosody variations	Neural text-to-speech HD voices introduce slight variations in each output to enhance realism. These variations make the speech sound more natural, as human voices naturally exhibit variation.
High fidelity	The primary objective of neural text-to-speech HD voices is to generate high-fidelity audio. The synthetic speech produced by the system can closely mimic human speech in both quality and naturalness.

Comparison of Azure Speech HD voices to other Azure text to speech voices

How do Azure Speech HD voices compare to other Azure text to speech voices? Here's a detailed comparison:

Feature	Azure Speech HD voices	Azure Speech voices (not HD)
Region	See Speech service regions	Available in dozens of regions. See the Speech service regions.
Number of voices	More than 30	More than 500
Multilingual	Yes	Yes (applicable only to multilingual voices)
SSML support	Support for a subset of SSML elements.	Support for the full set of SSML in Azure Speech.
Development options	Speech SDK, Speech CLI, REST API	Speech SDK, Speech CLI, REST API
Deployment options	Cloud only	Cloud, embedded, hybrid, and containers.
Real-time or batch synthesis	Real-time only	Real-time and batch synthesis
Latency	Less than 300 ms	Less than 300 ms
Sample rate of synthesized audio	8, 16, 24, and 48 kHz	8, 16, 24, and 48 kHz
Speech output audio format	opus, mp3, pcm, truesilk	opus, mp3, pcm, truesilk

Supported Azure Speech HD voices

Azure Speech provides two sets of HD voices with different model architectures:

Dragon HD voices

Azure Speech HD voice values combine a voice persona name and a base model with its version, separated by a colon, as in en-US-Ava:DragonHDLatestNeural. The name before the colon, such as en-US-Ava, is the voice persona name and its original locale.

To make sure you use the latest version of the base model that Microsoft provides, use the LatestNeural version.

For example, for the persona en-US-Ava, you can specify:

en-US-Ava:DragonHDLatestNeural: Always uses the latest version of the DragonHD base model.
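The naming convention can be handled programmatically. The following is a minimal Python sketch, not part of any Azure SDK: the `HDVoiceName` type and the helper names `parse_hd_voice_name` and `latest_dragon_hd` are hypothetical, and the parsing assumes that every value looks like the `persona:model` entries in this article.

```python
from typing import NamedTuple


class HDVoiceName(NamedTuple):
    """Parts of an HD voice value (hypothetical helper type)."""
    persona: str  # for example "en-US-Ava"
    locale: str   # for example "en-US"
    model: str    # for example "DragonHDLatestNeural"


def parse_hd_voice_name(value: str) -> HDVoiceName:
    """Split a value such as 'en-US-Ava:DragonHDLatestNeural' into its parts."""
    persona, sep, model = value.partition(":")
    if not sep or not persona or not model:
        raise ValueError(f"expected 'persona:model', got {value!r}")
    # Assumption: the locale is everything before the last hyphen of the persona.
    locale, _, name = persona.rpartition("-")
    if not locale or not name:
        raise ValueError(f"persona {persona!r} has no locale prefix")
    return HDVoiceName(persona=persona, locale=locale, model=model)


def latest_dragon_hd(persona: str) -> str:
    """Build the voice value that tracks the latest DragonHD base model."""
    return f"{persona}:DragonHDLatestNeural"
```

For example, `parse_hd_voice_name(latest_dragon_hd("fil-PH-Angelo"))` yields the locale `fil-PH` and the model `DragonHDLatestNeural`.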

The following table lists the available DragonHD voices:

Voice Name	Gender	Status
`de-DE-Seraphina:DragonHDLatestNeural`	Female	GA
`de-DE-Florian:DragonHDLatestNeural`	Male	GA
`en-GB-Ada:DragonHDLatestNeural`	Female	GA
`en-GB-Ollie:DragonHDLatestNeural`	Male	GA
`en-GB-Ryan:DragonHDLatestNeural`	Male	Preview
`en-GB-Sonia:DragonHDLatestNeural`	Female	Preview
`en-US-Ava:DragonHDLatestNeural`	Female	GA
`en-US-Andrew:DragonHDLatestNeural`	Male	GA
`en-US-Adam:DragonHDLatestNeural`	Male	GA
`en-US-Alloy:DragonHDLatestNeural`	Male	GA
`en-US-Aria:DragonHDLatestNeural`	Female	GA
`en-US-Bree:DragonHDLatestNeural`	Female	GA
`en-US-Brian:DragonHDLatestNeural`	Male	GA
`en-US-Davis:DragonHDLatestNeural`	Male	GA
`en-US-Emma:DragonHDLatestNeural`	Female	GA
`en-US-Emma2:DragonHDLatestNeural`	Female	GA
`en-US-Jane:DragonHDLatestNeural`	Female	GA
`en-US-Jenny:DragonHDLatestNeural`	Female	GA
`en-US-Nova:DragonHDLatestNeural`	Female	GA
`en-US-Phoebe:DragonHDLatestNeural`	Female	GA
`en-US-Serena:DragonHDLatestNeural`	Female	GA
`en-US-Steffan:DragonHDLatestNeural`	Male	GA
`en-US-Andrew2:DragonHDLatestNeural`	Male	GA
`en-US-Andrew3:DragonHDLatestNeural`	Male	Preview
`en-US-Ava3:DragonHDLatestNeural`	Female	Preview
`en-US-Evelyn:DragonHDLatestNeural`	Female	Preview
`en-US-Jimmie:DragonHDLatestNeural`	Male	Preview
`en-US-Juno:DragonHDLatestNeural`	Male	Preview
`en-US-Mila:DragonHDLatestNeural`	Female	Preview
`en-US-Tessa:DragonHDLatestNeural`	Female	Preview
`en-US-Tiana:DragonHDLatestNeural`	Female	Preview
`en-US-Tyler:DragonHDLatestNeural`	Male	Preview
`en-US-Vance:DragonHDLatestNeural`	Male	Preview
`es-ES-Ximena:DragonHDLatestNeural`	Female	GA
`es-ES-Tristan:DragonHDLatestNeural`	Male	GA
`es-MX-Ximena:DragonHDLatestNeural`	Female	GA
`es-MX-Tristan:DragonHDLatestNeural`	Male	GA
`fil-PH-Angelo:DragonHDLatestNeural`	Male	Preview
`fil-PH-Blessica:DragonHDLatestNeural`	Female	Preview
`fr-CA-Sylvie:DragonHDLatestNeural`	Female	GA
`fr-CA-Thierry:DragonHDLatestNeural`	Male	GA
`fr-FR-Vivienne:DragonHDLatestNeural`	Female	GA
`fr-FR-Remy:DragonHDLatestNeural`	Male	GA
`id-ID-Ardi:DragonHDLatestNeural`	Male	Preview
`id-ID-Gadis:DragonHDLatestNeural`	Female	Preview
`it-IT-Isabella:DragonHDLatestNeural`	Female	GA
`it-IT-Alessio:DragonHDLatestNeural`	Male	GA
`ja-JP-Nanami:DragonHDLatestNeural`	Female	GA
`ja-JP-Masaru:DragonHDLatestNeural`	Male	GA
`ko-KR-SunHi:DragonHDLatestNeural`	Female	GA
`ko-KR-Hyunsu:DragonHDLatestNeural`	Male	GA
`ms-MY-Osman:DragonHDLatestNeural`	Male	Preview
`ms-MY-Yasmin:DragonHDLatestNeural`	Female	Preview
`pt-BR-Thalita:DragonHDLatestNeural`	Female	GA
`pt-BR-Macerio:DragonHDLatestNeural`	Male	GA
`zh-CN-Xiaochen:DragonHDLatestNeural`	Female	GA
`zh-CN-Yunfan:DragonHDLatestNeural`	Male	GA

The following styles and paralinguistic tags are supported in HD voices:

Type	Tag
Styles	`amazed`, `amused`, `angry`, `annoyed`, `anxious`, `appreciative`, `calm`, `cautious`, `concerned`, `confident`, `confused`, `curious`, `defeated`, `defensive`, `defiant`, `determined`, `disappointed`, `disgusted`, `doubtful`, `ecstatic`, `encouraging`, `excited`, `fast`, `fearful`, `frustrated`, `happy`, `hesitant`, `hurt`, `impatient`, `impressed`, `intrigued`, `joking`, `laughing`, `optimistic`, `painful`, `panicked`, `panting`, `pleading`, `proud`, `quiet`, `reassuring`, `reflective`, `relieved`, `remorseful`, `resigned`, `sad`, `sarcastic`, `secretive`, `serious`, `shocked`, `shouting`, `shy`, `skeptical`, `slow`, `struggling`, `surprised`, `suspicious`, `sympathetic`, `terrified`, `upset`, `urgent`, `whispering`
Paralinguistics	`laughter`, `coughing`, `throat_clearing`, `breathing`, `sighing`, `yawning`

Note

Styles and paralinguistics are available on all English content for all voices. Style results depend strongly on the input content: the model adapts how it applies a style based on the semantic meaning of the text. See the styles and paralinguistics SSML template.

Dragon HD Flash voices

HD Flash voices are optimized variants of selected DragonHD voices, currently supporting Chinese (zh-cn) and English (en-US) text. These voices deliver enhanced naturalness and are only available in chinanorth3 currently.

The following table lists all available HD Flash voices and supported styles.

Voice name	Supported styles
`zh-cn-Xiaoxiao:DragonHDFlashLatestNeural`	`angry`, `chat`, `cheerful`, `customer-service`, `excited`, `fearful`, `sad`, `voice-assistant`
`zh-cn-Xiaoxiao2:DragonHDFlashLatestNeural`	`affectionate`, `angry`, `anxious`, `cheerful`, `curious`, `disappointed`, `empathetic`, `encouraging`, `excited`, `fearful`, `guilty`, `lonely`, `poetry-reading`, `sad`, `sentimental`, `sorry`, `story`, `surprised`, `tired`, `whispering`
`zh-cn-Xiaochen:DragonHDFlashLatestNeural`	`cheerful`, `debating`, `empathetic`, `live-commercial`, `poetry-reading`, `sad`, `sorry`
`zh-cn-Xiaoyi:DragonHDFlashLatestNeural`	`angry`, `complaining`, `cute`, `gentle`, `nervous`, `sad`, `shy`, `strict`
`zh-cn-Xiaoyu:DragonHDFlashLatestNeural`	`angry`, `debating`, `cheerful`, `comforting`, `sad`, `sorry`
`zh-cn-Xiaohan:DragonHDFlashLatestNeural`	`affectionate`, `angry`, `cheerful`, `complaining`, `fearful`, `gentle`, `sad`, `shy`, `strict`
`zh-cn-Xiaoshuang:DragonHDFlashLatestNeural`	`chat`
`zh-cn-Xiaoyou:DragonHDFlashLatestNeural`	`chat`, `angry`, `cheerful`, `poetry-reading`, `sad`, `story`, `cute`
`zh-cn-Yunxi:DragonHDFlashLatestNeural`	`angry`, `chat`, `cheerful`, `complaining`, `depressed`, `fearful`, `news`, `sad`, `shy`, `strict`, `voice-assistant`
`zh-cn-Yunyi:DragonHDFlashLatestNeural`	`assassin`, `captain`, `cavalier`, `prince`, `game-narrator`, `geomancer`, `poet`
`zh-cn-Yunxiao:DragonHDFlashLatestNeural`	—
`zh-cn-Yunhan:DragonHDFlashLatestNeural`	`angry`, `cheerful`, `curious`, `empathetic`, `encouraging`, `excited`, `guilty`, `lonely`, `sad`, `serious`, `sorry`, `whispering`, `surprised`, `tired`
`zh-cn-Yunxia:DragonHDFlashLatestNeural`	`affectionate`, `angry`, `cheerful`, `comforting`, `encouraging`, `excited`, `fearful`, `sad`, `surprised`
`zh-cn-Yunye:DragonHDFlashLatestNeural`	—
`en-US-Tiana:DragonHDFlashLatestNeural`	—
`en-US-Tyler:DragonHDFlashLatestNeural`	—
`en-US-Jimmie:DragonHDFlashLatestNeural`	—

Note

HD Flash only supports text in zh-cn and en-US.

How to use Azure Speech HD voices

Use the same Speech SDK and REST APIs for HD voices as you do for non-HD voices.
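As an illustration of reusing the REST API, here is a hedged Python sketch that prepares (but doesn't send) a synthesis request. The endpoint pattern, header names, and the `riff-24khz-16bit-mono-pcm` output format are assumptions taken from the general text to speech REST API rather than from this article, and `build_tts_request` is a hypothetical helper. The region and key are supplied by the caller; sovereign clouds use a different host name.

```python
import urllib.request


def build_tts_request(region: str, key: str, ssml: str,
                      output_format: str = "riff-24khz-16bit-mono-pcm"
                      ) -> urllib.request.Request:
    """Prepare a POST request for the text to speech REST endpoint.

    Assumption: the public-cloud endpoint pattern and headers below match
    the standard text to speech REST API. Nothing is sent here.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,           # Speech resource key
        "Content-Type": "application/ssml+xml",     # request body is SSML
        "X-Microsoft-OutputFormat": output_format,  # requested audio format
        "User-Agent": "hd-voice-sketch",            # hypothetical app name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )
```

To synthesize audio, pass the returned object to `urllib.request.urlopen` and write the response bytes to a file. The only HD-specific part of the call is the voice name inside the SSML body.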

Consider these key points when using Azure Speech HD voices:

Voice locale: The locale in the voice name indicates its original language and region.
Base models:
- HD voices include a base model that understands the input text and predicts the speaking pattern accordingly. You can specify the desired model, such as DragonHDLatestNeural, based on the availability of each voice.
SSML usage: To reference a voice in SSML, use the voice persona name followed by a colon and the base model with its version, as in de-DE-Seraphina:DragonHDLatestNeural. The name before the colon, such as de-DE-Seraphina, is the voice persona name and its original locale. The base model is tracked by versions in subsequent updates.
Temperature parameter:
- The temperature value is a float from 0 to 1 that controls the randomness of the output. The default temperature is 1.0.
- Lower values reduce randomness and give more stable, predictable results. Higher values increase randomness and give more varied but less consistent output.

Here's an example of how to use Azure Speech HD voices in SSML:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
<voice name='en-US-Ava:DragonHDLatestNeural' parameters='temperature=0.8'>Here is a test</voice>
</speak>
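If you generate SSML in code, a small builder can validate the temperature before the request is sent. This Python sketch is only one possible approach: `hd_voice_ssml` is a hypothetical helper, and the 0 to 1 range check follows the description in this article rather than any service-side validation.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def hd_voice_ssml(text: str,
                  voice: str = "en-US-Ava:DragonHDLatestNeural",
                  temperature: Optional[float] = None,
                  lang: str = "en-US") -> str:
    """Return an SSML document for an HD voice, optionally with a temperature."""
    attrs = f"name={quoteattr(voice)}"
    if temperature is not None:
        # The article describes the valid range as 0 to 1.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        attrs += f' parameters="temperature={temperature}"'
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice {attrs}>{escape(text)}</voice>"  # escape &, <, > in the text
        "</speak>"
    )
```

Calling `hd_voice_ssml("Here is a test", temperature=0.8)` produces a document equivalent to the example above. Omitting `temperature` leaves out the `parameters` attribute, so the service default applies.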

Supported and unsupported SSML elements for Azure Speech HD voices

The Speech Synthesis Markup Language (SSML) with input text determines the structure, content, and other characteristics of the text to speech output. For example, you can use SSML to define a paragraph, a sentence, a break or a pause, or silence. You can wrap text with event tags such as bookmark or viseme that your application processes later.

The Azure Speech HD voices support different SSML elements depending on the model:

DragonHD voices: Support a subset of SSML elements (see table below)

For detailed information on the supported and unsupported SSML elements for Azure Speech HD voices, refer to the following table. For instructions on how to use SSML elements, refer to the Speech Synthesis Markup Language (SSML) documentation.

SSML element	Description	DragonHD
`<voice>`	Specifies the voice and optional effects (`eq_car` and `eq_telecomhp8k`).	Yes
`<mstts:express-as>`	Specifies speaking styles and roles.	No
`<mstts:ttsembedding>`	Specifies the `speakerProfileId` property for a personal voice.	No
`<lang xml:lang>`	Specifies the speaking language.	Yes
`<prosody>`	Adjusts pitch, contour, range, rate, and volume.	No
`<emphasis>`	Adds or removes word-level stress for the text.	No
`<audio>`	Embeds prerecorded audio into an SSML document.	No
`<mstts:audioduration>`	Specifies the duration of the output audio.	No
`<mstts:backgroundaudio>`	Adds background audio to your SSML documents or mixes an audio file with text to speech.	No
`<phoneme>`	Specifies phonetic pronunciation in SSML documents.	Yes
`<lexicon>`	Defines how multiple entities are read in SSML.	Yes (only supports alias)
`<say-as>`	Indicates the content type, such as number or date, of the element's text.	Yes
`<sub>`	Indicates that the alias attribute's text value should be pronounced instead of the element's enclosed text.	Yes
`<math>`	Uses the MathML as input text to properly pronounce mathematical notations in the output audio.	No
`<bookmark>`	Gets the offset of each marker in the audio stream.	No
`<break>`	Overrides the default behavior of breaks or pauses between words.	Yes
`<mstts:silence>`	Inserts pause before or after text, or between two adjacent sentences.	No
`<mstts:viseme>`	Defines the position of the face and mouth while a person is speaking.	No
`<p>`	Denotes paragraphs in SSML documents.	Yes
`<s>`	Denotes sentences in SSML documents.	Yes
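The table can be turned into a simple pre-flight check. The following Python sketch scans an SSML document for elements the table marks as "No" for DragonHD. It's an illustration under stated assumptions: the `unsupported_elements` helper is hypothetical, it matches on local element names and ignores namespaces, and it doesn't check the alias-only restriction on `<lexicon>`.

```python
import xml.etree.ElementTree as ET

# Local names of the elements marked "No" for DragonHD in the table above.
UNSUPPORTED_DRAGON_HD = {
    "express-as", "ttsembedding", "prosody", "emphasis", "audio",
    "audioduration", "backgroundaudio", "math", "bookmark",
    "silence", "viseme",
}


def unsupported_elements(ssml: str) -> list[str]:
    """Return the unsupported element names found in document order, no repeats."""
    root = ET.fromstring(ssml)
    found: list[str] = []
    for element in root.iter():
        # ElementTree tags look like "{namespace}local"; keep only the local part.
        local = element.tag.rsplit("}", 1)[-1]
        if local in UNSUPPORTED_DRAGON_HD and local not in found:
            found.append(local)
    return found
```

An empty list means that none of the listed elements appear. It doesn't guarantee that the service accepts the document.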

Parameter enhancePronunciation

The enhancePronunciation parameter enables enhanced pronunciation handling during speech synthesis. When set to true, the NeuralHD voices apply extra pronunciation optimizations to improve the clarity and correctness of spoken output, particularly for complex, ambiguous, or nonstandard text.

When you enable enhancePronunciation, the service prioritizes pronunciation accuracy by applying enhanced linguistic processing during synthesis. This improvement can help how the system reads:

Proper nouns, names, and uncommon words
Acronyms, abbreviations, and mixed-case text
Words with multiple possible pronunciations depending on context

This parameter complements existing pronunciation controls such as SSML-based pronunciation tags and lexicons, and doesn't replace them. The default value is false to preserve predictable, backward-compatible speech output. Enable it when you want the service to apply extra pronunciation optimizations for improved clarity and naturalness.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-Ava:DragonHDLatestNeural" parameters="enhancePronunciation=true">
    This is a pronunciation enhanced example for technical terms like
    Kubernetes, Azure OpenAI, and multilingual content such as 今、何か軽く摘めそうなものある？
  </voice>
</speak>
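Because results might vary by language, voice, and input text, it can help to synthesize the same text with and without the parameter and compare the audio. The Python sketch below builds both SSML variants. The `pronunciation_variants` helper is hypothetical, and the baseline simply omits the `parameters` attribute on the assumption that this is equivalent to the default value of false.

```python
from xml.sax.saxutils import escape, quoteattr

_TEMPLATE = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang={lang}>'
    "<voice name={voice}{params}>{text}</voice></speak>"
)


def pronunciation_variants(text: str,
                           voice: str = "en-US-Ava:DragonHDLatestNeural",
                           lang: str = "en-US") -> tuple[str, str]:
    """Return (baseline_ssml, enhanced_ssml) for the same input text."""
    common = {
        "lang": quoteattr(lang),
        "voice": quoteattr(voice),
        "text": escape(text),  # escape &, <, > so the SSML stays well-formed
    }
    baseline = _TEMPLATE.format(params="", **common)
    enhanced = _TEMPLATE.format(
        params=' parameters="enhancePronunciation=true"', **common
    )
    return baseline, enhanced
```

Send each variant through the same synthesis path and listen to both outputs before enabling the parameter for a given content type.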

Recommended use cases

Enable enhancePronunciation in scenarios with structured or technical domain-specific content.

Note

The parameter affects pronunciation handling only; it doesn't change voice selection, speaking style, or prosody controls. Results might vary depending on language, voice, and input text. For deterministic pronunciation control, SSML pronunciation elements remain the recommended approach.

Last updated on 2026-06-12
