Use glossaries with Document Translation

A glossary is a list of terms with definitions that you create for the Document Translation service to use during the translation process. Currently, the glossary feature supports one-to-one source-to-target language translation. Common use cases for glossaries include:

  • Context-specific terminology. Create a glossary that designates specific meanings for your unique context.

  • No translation. For example, you can restrict Document Translation from translating product name brands by using a glossary with the same source and target text.

  • Specified translations for ambiguous words. Choose a specific translation for poly​semantic words.

Create, upload, and use a glossary file

  1. Create your glossary file. Create a file in a supported format (preferably tab-separated values) that contains all the terms and phrases you want to use in your translation.

    To check if your file format is supported, see Get supported glossary formats.

    The following English-source glossary contains words that can have different meanings depending upon the context. The glossary provides the expected translation for each word in the file to help ensure accuracy.

    For instance, when the word Bank appears in a financial document, it should be translated to reflect its financial meaning. If the word Bank appears in a geographical document, it may refer to shore to reflect its topographical meaning. Similarly, the word Crane can refer to either a bird or machine.

    Example glossary .tsv file: English-to-French

       Bank     Banque
       Card     Carte
       Crane    Grue
       Office   Office
       Tiger    Tiger
       US       United States
    
  2. Upload your glossary to Azure storage. To complete this step, you need an Azure Blob Storage account with containers to store and organize your blob data within your storage account.

  3. Specify your glossary in the translation request. Include the glossary URL, format, and version in your POST request:

     {
         "inputs": [
             {
                 "source": {
                     "sourceUrl": "https://my.blob.core.chinacloudapi.cn/source-en"
                 },
                 "targets": [
                     {
                          "targetUrl": "https://my.blob.core.chinacloudapi.cn/target-fr",
                         "language": "fr",
                         "glossaries": [
                             {
                                 "glossaryUrl": "https://my.blob.core.chinacloudapi.cn/glossaries/en-fr.tsv",
                                 "format": "tsv"
                             }
                         ]
    
                     }
                 ]
             }
         ]
     }
    

    Note

    The example used an enabled system-assigned managed identity with a Storage Blob Data Contributor role assignment for authorization. For more information, see Managed identities for Document Translation.

Case sensitivity

By default, Azure AI Translator service API is case-sensitive, meaning that it matches terms in the source text based on case.

  • Partial sentence application. When your glossary is applied to part of a sentence, the Document Translation API checks whether the glossary term matches the case in the source text. If the casing doesn't match, the glossary isn't applied.

  • Complete sentence application. When your glossary is applied to a complete sentence, the service becomes case-insensitive. It matches the glossary term regardless of its case in the source text. This provision applies the correct results for use cases involving idioms and quotes.

Next steps

Try the Document Translation how-to guide to asynchronously translate whole documents using a programming language of your choice: