What is Speech Studio?

Speech Studio is a set of UI-based tools for building and integrating features from the Azure AI Speech service in your applications. You create projects in Speech Studio by using a no-code approach, and then reference those assets in your applications by using the Speech SDK, the Speech CLI, or the REST APIs.

Speech Studio scenarios

Explore, try out, and view sample code for some common use cases.

  • Captioning: Choose a sample video clip to see captioning results processed in real time or offline. Learn how to synchronize captions with your input audio, apply profanity filters, get partial results, apply customizations, and identify spoken languages for multilingual scenarios. For more information, see the captioning quickstart.

  • Call Center: View a demonstration of how to use the Language and Speech services to analyze call center conversations. Transcribe calls in real time or process a batch of calls, redact personally identifiable information, and extract insights such as sentiment to help with your call center use case.

Speech Studio features

In Speech Studio, the following Speech service features are available as project types:

  • Real-time speech to text: Quickly test speech to text by dragging audio files into the tool, without writing any code. Speech Studio has a demo tool for seeing how speech to text works on your audio samples. To explore the full functionality, see What is speech to text.

  • Batch speech to text: Quickly test batch transcription capabilities to transcribe a large amount of audio in storage and receive results asynchronously. To learn more about batch speech to text, see Batch speech to text overview.

  • Custom speech: Create speech recognition models that are tailored to specific vocabulary sets and styles of speaking. In contrast to the base speech recognition model, Custom speech models become part of your unique competitive advantage because they're not publicly accessible. To get started with uploading sample audio to create a custom speech model, see Upload training and testing datasets.

  • Pronunciation assessment: Evaluate speech pronunciation and give speakers feedback on the accuracy and fluency of spoken audio. Speech Studio provides a sandbox for testing this feature quickly, without code. To use the feature with the Speech SDK in your applications, see the Pronunciation assessment article.

  • Speech Translation: Quickly test and translate speech into other languages of your choice with low latency. To explore the full functionality, see What is speech translation.

  • Voice Gallery: Build apps and services that speak naturally. Choose from a broad portfolio of languages, voices, and variants. Bring your scenarios to life with highly expressive and human-like neural voices.

  • Audio Content Creation: A no-code approach for text to speech synthesis. You can use the output audio as-is, or as a starting point for further customization. You can build highly natural audio content for various scenarios, such as audiobooks, news broadcasts, video narrations, and chat bots. For more information, see the Audio Content Creation documentation.
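
The batch speech to text feature listed above can also be reached from code through the REST APIs. The following is a minimal sketch, not a definitive implementation: it only builds the HTTP request that would create a batch transcription and does not send it. The API version (v3.1), the endpoint path, the request body fields, the helper name build_batch_transcription_request, and the placeholder region, key, and audio URL are all assumptions for illustration; check them against the current REST API reference for your Speech resource.

```python
import json
import urllib.request


def build_batch_transcription_request(region, key, content_urls,
                                      locale="en-US",
                                      display_name="Sample batch transcription"):
    """Build (but do not send) a request that creates a batch transcription.

    Hypothetical helper: the endpoint path, API version, and body fields
    below are assumptions and should be verified against the REST reference.
    """
    # Assumed endpoint shape for the speech to text REST API, version 3.1.
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.1/transcriptions")
    body = {
        "contentUrls": list(content_urls),  # audio files already in storage
        "locale": locale,                   # language of the audio
        "displayName": display_name,        # label for the transcription job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # Speech resource key
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own resource details.
    request = build_batch_transcription_request(
        region="YOUR_REGION",
        key="YOUR_SPEECH_KEY",
        content_urls=["https://example.com/audio/sample.wav"],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would submit the job. Because results arrive asynchronously, an application would then poll the transcription status before downloading the output.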

Next steps