Ingestion Client with Azure AI services
The Ingestion Client is a tool released by Microsoft on GitHub that helps you quickly deploy a call center transcription solution to Azure with a no-code approach.
Tip
You can use the tool and resulting solution in production to process a high volume of audio.
The Ingestion Client uses Azure AI Language, Azure AI Speech, Azure Storage, and Azure Functions.
Get started with the Ingestion Client
An Azure account and a multi-service Azure AI services resource are needed to run the Ingestion Client.
- An Azure subscription. If you don't have one, create a trial subscription.
- Create an Azure AI services resource in the Azure portal.
- Get the resource key and region. After your resource is deployed, select Go to resource to view and manage keys. For more information about Azure AI services resources, see this quickstart.
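Once you have the key and region, a common pattern is to keep them out of source code and read them from environment variables. The following is a minimal sketch of that pattern, not part of the Ingestion Client itself. The variable names `SPEECH_KEY` and `SPEECH_REGION`, the helper `load_speech_config`, and the regional host pattern are assumptions made for illustration.

```python
import os


def load_speech_config(env=os.environ):
    """Read the resource key and region and derive a regional API host.

    The environment variable names and the host pattern are assumptions
    for this sketch; adjust them to match your own deployment.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running.")
    return {
        "key": key,
        "region": region,
        # Assumed regional host pattern for the Speech REST API.
        "host": f"https://{region}.api.cognitive.microsoft.com",
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.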
Ingestion Client features
The Ingestion Client connects a dedicated Azure Storage account to custom Azure Functions, which pass transcription requests to the service in a serverless fashion. The resulting transcripts land in the dedicated Azure Storage container.
Important
Pricing varies depending on the mode of operation (batch vs. real-time) and on the Azure Functions SKU selected. By default, the tool creates a Premium Azure Functions SKU to handle large volumes. Visit the Pricing page for more information.
Internally, the tool uses Speech and Language services, and follows best practices to handle scale-up, retries and failover. The following schematic describes the resources and connections.
The following Speech service feature is used by the Ingestion Client:
- Batch speech to text: Transcribes large amounts of audio asynchronously, including speaker diarization. It's typically used in post-call analytics scenarios. Diarization is the process of recognizing and separating speakers in mono channel audio data.
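To make the batch request concrete, here is a minimal sketch of the JSON body that a batch transcription request with diarization might carry. It assumes the field names of the Speech to text REST API v3.1 (`contentUrls`, `locale`, `displayName`, `properties.diarizationEnabled`, `properties.wordLevelTimestampsEnabled`). The helper name `build_transcription_body` and the default display name are hypothetical, and the body that the Ingestion Client actually sends may differ.

```python
import json


def build_transcription_body(content_urls, locale="en-US",
                             display_name="call-center-batch"):
    """Build the JSON body for a batch transcription request.

    Field names follow the v3.1 REST API as assumed in the text above;
    the Ingestion Client's own request may include other properties.
    """
    return {
        "contentUrls": list(content_urls),  # one URL per audio recording
        "locale": locale,
        "displayName": display_name,
        "properties": {
            # Ask the service to separate speakers in mono audio.
            "diarizationEnabled": True,
            "wordLevelTimestampsEnabled": True,
        },
    }


if __name__ == "__main__":
    body = build_transcription_body(["https://example.com/audio/call-001.wav"])
    print(json.dumps(body, indent=2))
```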
Here are some Language service features that are used by the Ingestion Client:
- Personally Identifiable Information (PII) extraction and redaction: Identify, categorize, and redact sensitive information in conversation transcription.
- Sentiment analysis and opinion mining: Analyze transcriptions and associate positive, neutral, or negative sentiment at the utterance and conversation level.
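As an illustration of how transcribed utterances could be submitted to these two features, the sketch below builds (but doesn't send) a request for the Language service's `analyze-text` REST endpoint. The API version, the `kind` values `PiiEntityRecognition` and `SentimentAnalysis`, and the `opinionMining` parameter are assumptions based on the public REST API. The helper `build_language_request` is hypothetical, and the Ingestion Client may call the service differently.

```python
import json
import urllib.request

API_VERSION = "2022-05-01"  # assumed API version for this sketch


def build_language_request(endpoint, key, kind, utterances, language="en"):
    """Build an analyze-text request for PII or sentiment analysis.

    `kind` is expected to be "PiiEntityRecognition" or "SentimentAnalysis".
    Each utterance becomes one document with a 1-based string id.
    """
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(utterances, start=1)
    ]
    parameters = {"modelVersion": "latest"}
    if kind == "SentimentAnalysis":
        # Opinion mining is an option on the sentiment task.
        parameters["opinionMining"] = True
    body = {
        "kind": kind,
        "parameters": parameters,
        "analysisInput": {"documents": documents},
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` once a real endpoint and key are supplied. Building the request separately keeps the payload easy to inspect before anything is sent.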
Besides Azure AI services, these Azure products are used to complete the solution:
- Azure Storage: Used for storing telephony data and the transcripts that the Batch Transcription API returns. This storage account should use notifications, specifically for when new files are added. These notifications are used to trigger the transcription process.
- Azure Functions: Used for creating the shared access signature (SAS) URI for each recording, and triggering the HTTP POST request to start a transcription. Additionally, you use Azure Functions to create requests to retrieve and delete transcriptions by using the Batch Transcription API.
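The request lifecycle described above (start a transcription with an HTTP POST, then retrieve and delete it) can be sketched as three small request builders. This is an illustrative outline, not the Ingestion Client's code. It assumes the v3.1 path `/speechtotext/v3.1/transcriptions`, assumes the SAS URI for the recording has already been generated, and uses hypothetical helper names. Nothing is sent over the network.

```python
import json
import urllib.request

API_PATH = "/speechtotext/v3.1/transcriptions"  # assumed API version


def _request(method, url, key, body=None):
    """Create an HTTP request carrying the resource key header."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Ocp-Apim-Subscription-Key": key}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def start_transcription(region, key, recording_sas_uri, locale="en-US"):
    """POST request that starts a transcription for one recording."""
    url = f"https://{region}.api.cognitive.microsoft.com{API_PATH}"
    body = {
        "contentUrls": [recording_sas_uri],  # SAS URI grants read access
        "locale": locale,
        "displayName": "ingestion-sketch",  # hypothetical name
    }
    return _request("POST", url, key, body)


def get_transcription(transcription_url, key):
    """GET request that retrieves the status of a transcription."""
    return _request("GET", transcription_url, key)


def delete_transcription(transcription_url, key):
    """DELETE request that removes a finished transcription."""
    return _request("DELETE", transcription_url, key)
```

In this sketch, `transcription_url` stands for the URL that identifies the created transcription, which the service is assumed to return in the response to the POST request.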
Tool customization
The tool is built to show customers results quickly. You can customize the tool to your preferred SKUs and setup. The SKUs can be edited from the Azure portal and the code itself is available on GitHub.
Note
We suggest creating the resources in the same dedicated resource group to understand and track costs more easily.