Use glossaries with Document Translation
A glossary is a list of terms with definitions that you create for the Document Translation service to use during the translation process. Currently, the glossary feature supports one-to-one source-to-target language translation. Common use cases for glossaries include:
Context-specific terminology. Create a glossary that designates specific meanings for your unique context.
No translation. For example, you can prevent Document Translation from translating product brand names by using a glossary with the same source and target text.
Specified translations for ambiguous words. Choose a specific translation for polysemantic words.
Create, upload, and use a glossary file
Create your glossary file. Create a file in a supported format (preferably tab-separated values) that contains all the terms and phrases you want to use in your translation.
To check if your file format is supported, see Get supported glossary formats.
The following English-source glossary contains words that can have different meanings depending upon the context. The glossary provides the expected translation for each word in the file to help ensure accuracy.
For instance, when the word Bank appears in a financial document, it should be translated to reflect its financial meaning. If the word Bank appears in a geographical document, it may refer to a shore, reflecting its topographical meaning. Similarly, the word Crane can refer to either a bird or a machine.
Example glossary .tsv file: English-to-French
Bank	Banque
Card	Carte
Crane	Grue
Office	Office
Tiger	Tiger
US	United States
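A glossary file like the one above can also be generated programmatically. The following is a minimal Python sketch, assuming one source/target pair per line separated by a single tab. The names `GLOSSARY_TERMS`, `glossary_to_tsv`, and `write_glossary` are hypothetical helpers introduced for illustration and are not part of the Document Translation service.

```python
import csv
import io

# Term pairs copied from the English-to-French example glossary.
GLOSSARY_TERMS = [
    ("Bank", "Banque"),
    ("Card", "Carte"),
    ("Crane", "Grue"),
    ("Office", "Office"),   # same source and target: left untranslated
    ("Tiger", "Tiger"),     # same source and target: left untranslated
    ("US", "United States"),
]


def glossary_to_tsv(terms):
    """Return the term pairs as tab-separated text, one pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(terms)
    return buffer.getvalue()


def write_glossary(path, terms):
    """Write the glossary to a UTF-8 encoded .tsv file (hypothetical helper)."""
    # newline="" keeps the line endings produced by glossary_to_tsv unchanged.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(glossary_to_tsv(terms))
```

Calling `write_glossary("en-fr.tsv", GLOSSARY_TERMS)` would produce a local file that can then be uploaded in the next step. The file name is only an example.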
Upload your glossary to Azure storage. To complete this step, you need an Azure Blob Storage account with containers to store and organize your blob data within your storage account.
Specify your glossary in the translation request. Include the glossary URL, format, and version in your POST request:

    {
      "inputs": [
        {
          "source": {
            "sourceUrl": "https://my.blob.core.chinacloudapi.cn/source-en"
          },
          "targets": [
            {
              "targetUrl": "https://my.blob.core.chinacloudapi.cn/target-fr",
              "language": "fr",
              "glossaries": [
                {
                  "glossaryUrl": "https://my.blob.core.chinacloudapi.cn/glossaries/en-fr.tsv",
                  "format": "tsv"
                }
              ]
            }
          ]
        }
      ]
    }
Note
This example uses an enabled system-assigned managed identity with a Storage Blob Data Contributor role assignment for authorization. For more information, see Managed identities for Document Translation.
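If you assemble the request body in code rather than by hand, it might look like the following minimal Python sketch. It only builds the JSON payload shown in the request step and does not send it. The function name `build_translation_request` and its parameters are hypothetical, and the `version` key is assumed from the field list in the step text, since the sample request omits it.

```python
import json


def build_translation_request(source_url, target_url, language,
                              glossary_url, glossary_format,
                              glossary_version=None):
    """Build a request body with one source, one target, and one glossary."""
    glossary = {"glossaryUrl": glossary_url, "format": glossary_format}
    if glossary_version is not None:
        # Assumed key name; the sample request does not include a version.
        glossary["version"] = glossary_version
    return {
        "inputs": [
            {
                "source": {"sourceUrl": source_url},
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": language,
                        "glossaries": [glossary],
                    }
                ],
            }
        ]
    }


# URLs are the placeholder values from the sample request.
body = build_translation_request(
    source_url="https://my.blob.core.chinacloudapi.cn/source-en",
    target_url="https://my.blob.core.chinacloudapi.cn/target-fr",
    language="fr",
    glossary_url="https://my.blob.core.chinacloudapi.cn/glossaries/en-fr.tsv",
    glossary_format="tsv",
)
print(json.dumps(body, indent=2))
```

Keeping the glossary as a list entry under each target reflects the one-to-one source-to-target pairing described earlier: each target language carries its own glossary.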
Case sensitivity
By default, the Azure AI Translator service API is case-sensitive: it matches glossary terms in the source text based on case.
Partial sentence application. When your glossary is applied to part of a sentence, the Document Translation API checks whether the glossary term matches the case in the source text. If the casing doesn't match, the glossary isn't applied.
Complete sentence application. When your glossary is applied to a complete sentence, the service becomes case-insensitive. It matches the glossary term regardless of its case in the source text. This behavior produces correct results for use cases involving idioms and quotes.
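The two matching rules can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the service's implementation: the function `glossary_term_applies` is hypothetical, and it uses plain substring matching, which ignores tokenization and word boundaries.

```python
def glossary_term_applies(source_sentence, glossary_term):
    """Illustrative model of the case rules; not the service's actual logic."""
    sentence = source_sentence.strip()
    term = glossary_term.strip()
    if not term:
        return False
    # Complete sentence application: the term is the whole sentence,
    # so the comparison ignores case.
    if sentence.casefold() == term.casefold():
        return True
    # Partial sentence application: the term must appear with matching case.
    return term in sentence


print(glossary_term_applies("The Bank closed early.", "Bank"))  # True
print(glossary_term_applies("The bank closed early.", "Bank"))  # False
print(glossary_term_applies("BREAK A LEG", "Break a leg"))      # True
```

The third call shows why the complete-sentence rule helps with idioms and quotes: the whole sentence matches the glossary entry even though the casing differs.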
Next steps
Try the Document Translation how-to guide to asynchronously translate whole documents using a programming language of your choice.