What is summarization?

Important

Conversation summarization is only available using:

  • REST API
  • Python
  • C#

Summarization is a feature of Azure AI Language that combines generative large language models with task-optimized encoder models to deliver summaries with higher quality, better cost efficiency, and lower latency. Use this article to learn more about this feature and how to use it in your applications.

Out of the box, the service provides summarization for three types of input: plain text, conversations, and native documents. Text summarization accepts only plain text blocks. Conversation summarization accepts conversational input, including various speech audio signals. Native document summarization accepts documents in their native formats, such as Word, PDF, or plain text. For more information, see Supported document formats.

Capabilities

This documentation contains the following article types:

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • How-to guides contain instructions for using the service in more specific or customized ways.

Native document summarization uses natural language processing techniques to generate a summary for native documents. A native document refers to the file format used to create the original document, such as Microsoft Word (docx) or Portable Document Format (pdf). Native document support eliminates the need for text preprocessing before using Azure AI Language resource capabilities. Currently, native document support is available for two types of summarization:

  • Extractive summarization: Produces a summary by extracting salient sentences from the document, together with the positioning information of those sentences.

    • Multiple extracted sentences: These sentences collectively convey the main idea of the document. They're original sentences extracted from the input document's content.
    • Rank score: The rank score indicates how relevant a sentence is to the main topic. Extractive summarization ranks the extracted sentences, and you can choose whether they're returned in the order they appear or according to their rank. For example, if you request a three-sentence summary, extractive summarization returns the three highest-scored sentences.
    • Positional information: The start position and length of extracted sentences.
  • Abstractive summarization: Generates a summary with concise, coherent sentences or words that aren't extracted verbatim from the original document.

    • Summary texts: Abstractive summarization returns a summary for each contextual input range. A long input can be segmented so multiple groups of summary texts can be returned with their contextual input range.
    • Contextual input range: The range within the input that was used to generate the summary text.
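
The rank score and positional information described above can be illustrated with a small, service-independent sketch. It assumes each extracted sentence arrives as a dictionary with `text`, `rankScore`, `offset`, and `length` keys, and that offsets are counted in Unicode code points. Those key names, the helper functions, and the sample data are assumptions for illustration only, so check them against the actual API output before relying on them.

```python
from typing import Dict, List


def top_sentences(sentences: List[Dict], count: int, sort_by: str = "Offset") -> List[Dict]:
    """Keep the `count` highest-ranked sentences, then order them.

    sort_by="Rank" orders them from highest to lowest rank score;
    sort_by="Offset" orders them as they appear in the input.
    """
    if sort_by not in ("Rank", "Offset"):
        raise ValueError("sort_by must be 'Rank' or 'Offset'")
    best = sorted(sentences, key=lambda s: s["rankScore"], reverse=True)[:count]
    if sort_by == "Offset":
        best = sorted(best, key=lambda s: s["offset"])
    return best


def source_span(document: str, offset: int, length: int) -> str:
    """Recover a span of the input from its start position and length.

    Assumes offsets are measured in Unicode code points, which is how
    Python indexes strings.
    """
    return document[offset:offset + length]


# Hypothetical input and scores, made up for this example.
document = "Alpha is first. Beta is second. Gamma is third. Delta is fourth."
scored = [
    ("Alpha is first.", 0.2),
    ("Beta is second.", 0.9),
    ("Gamma is third.", 0.5),
    ("Delta is fourth.", 0.7),
]
sentences = [
    {"text": t, "rankScore": r, "offset": document.index(t), "length": len(t)}
    for t, r in scored
]

by_rank = [s["text"] for s in top_sentences(sentences, 3, sort_by="Rank")]
by_offset = [s["text"] for s in top_sentences(sentences, 3, sort_by="Offset")]
print(by_rank)    # ['Beta is second.', 'Delta is fourth.', 'Gamma is third.']
print(by_offset)  # ['Beta is second.', 'Gamma is third.', 'Delta is fourth.']
```

The same `source_span` helper would apply to an abstractive summary's contextual input range, since that range is also described by a position within the input.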

Currently, Document Summarization supports the following native document formats:

File type | File extension | Description
Text | .txt | An unformatted text document.
Adobe PDF | .pdf | A Portable Document Format (PDF) document.
Microsoft Word | .docx | A Microsoft Word document file.

For more information, see Summarize native documents.

Get started with summarization

To use summarization, you submit your input for analysis and handle the API output in your application. Analysis is performed as-is, with no added customization to the model used on your data. You can use summarization through the following development option:

Development option | Description
REST API or client library (Azure SDK) | Integrate text summarization into your applications using the REST API, or the client library available in various languages. For more information, see the summarization quickstart.
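
As a rough illustration of the REST option, the following sketch only builds the URL and JSON body for an extractive summarization job; it doesn't send anything. The route, the API version, the task kind, and the parameter names reflect the publicly documented request shape as best understood here and should be treated as assumptions to verify against the current REST API reference. The function name and placeholder endpoint are hypothetical.

```python
import json
from typing import Dict, Tuple


def build_extractive_request(
    endpoint: str,
    text: str,
    sentence_count: int = 3,
    sort_by: str = "Rank",
    api_version: str = "2023-04-01",  # assumed; use the version your resource supports
) -> Tuple[str, Dict]:
    """Return the (url, body) pair for an extractive summarization job."""
    # Assumed route for asynchronous text analysis jobs.
    url = f"{endpoint.rstrip('/')}/language/analyze-text/jobs?api-version={api_version}"
    body = {
        "displayName": "Extractive summarization example",
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
        "tasks": [
            {
                "kind": "ExtractiveSummarization",
                "taskName": "extractive-task-1",
                # sortBy is assumed to accept "Rank" or "Offset".
                "parameters": {"sentenceCount": sentence_count, "sortBy": sort_by},
            }
        ],
    }
    return url, body


url, body = build_extractive_request(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder endpoint
    "Text to summarize goes here.",
)
print(url)
print(json.dumps(body, indent=2))
```

To actually submit the job, you would POST this body with an HTTP client of your choice and pass your resource key in a request header. Because the analysis runs as a job, the result is typically retrieved in a follow-up request rather than in the initial response.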

Input requirements and service limits

  • Summarization takes text for analysis. For more information, see Data and service limits in the how-to guide.
  • Summarization works with various written languages. For more information, see language support.

Reference documentation and code samples

As you use text summarization in your applications, see the following reference documentation and samples for Azure AI Language:

Development option / language | Reference documentation | Samples
C# | C# documentation | C# samples
Java | Java documentation | Java samples
JavaScript | JavaScript documentation | JavaScript samples
Python | Python documentation | Python samples