Document Intelligence add-on capabilities

This content applies to: v3.1 (GA)

Note

Add-on capabilities are available within all models except for the Business card model.

Document Intelligence supports modular add-on analysis capabilities. Use the add-on features to extend the results with additional information extracted from your documents. Some add-on features incur an extra cost. You can enable or disable these optional features depending on your document extraction scenario. To enable a feature, add its feature name to the features query string parameter. To enable more than one add-on feature on a request, provide a comma-separated list of features. The following add-on capabilities are available for 2023-07-31 (GA) and later releases.

| Add-on Capability | Add-On/Free | 2024-02-29-preview | 2023-07-31 (GA) | 2022-08-31 (GA) | v2.1 (GA) |
|---|---|---|---|---|---|
| Font property extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
| Formula extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
| High resolution extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
| Barcode extraction | Free | ✔️ | ✔️ | n/a | n/a |
| Language detection | Free | ✔️ | ✔️ | n/a | n/a |
| Key value pairs | Free | ✔️ | n/a | n/a | n/a |
| Query fields | Add-On* | ✔️ | n/a | n/a | n/a |

Add-On* - Query fields are priced differently than the other add-on features. See pricing for details.
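
As a minimal sketch of how the features query string parameter might be assembled for a request, the helper below joins several feature names into one comma-separated value. The function name build_analyze_url and the endpoint shown are hypothetical; the path and parameter names follow the REST examples in this article.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, model_id, api_version, features=()):
    """Return the analyze URL with an optional comma-separated features list."""
    base = f"{endpoint.rstrip('/')}/formrecognizer/documentModels/{model_id}:analyze"
    params = {"api-version": api_version}
    if features:
        # Multiple add-on features are sent as one comma-separated value.
        params["features"] = ",".join(features)
    # safe="," keeps the commas readable instead of percent-encoding them.
    return f"{base}?{urlencode(params, safe=',')}"


url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.cn",  # hypothetical endpoint
    "prebuilt-layout",
    "2023-07-31",
    ["ocrHighResolution", "formulas"],
)
print(url)
```

Omitting the features argument produces a URL with only the api-version parameter, which corresponds to a request with no add-on capability enabled.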

High resolution extraction

Recognizing small text in large-format documents, such as engineering drawings, is challenging. The text is often mixed with other graphical elements and varies in font, size, and orientation. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability (requested as ocrHighResolution in the features parameter). Enabling this add-on capability improves the quality of content extraction from A1/A2/A3 documents.

REST API

{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution

Formula extraction

The ocr.formula capability (requested as formulas in the features parameter) extracts all identified formulas, such as mathematical equations, in the formulas collection as a top-level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula and includes the formula type (inline or display) as kind, its LaTeX representation as value, and its polygon coordinates. Initially, formulas appear at the end of each page.

Note

The confidence score is hard-coded.

"content": ":formula:",
"pages": [
  {
    "pageNumber": 1,
    "formulas": [
      {
        "kind": "inline",
        "value": "\\frac { \\partial a } { \\partial b }",
        "polygon": [...],
        "span": {...},
        "confidence": 0.99
      },
      {
        "kind": "display",
        "value": "y = a \\times b + a \\times c",
        "polygon": [...],
        "span": {...},
        "confidence": 0.99
      }
    ]
  }
]

REST API

{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas
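
The following sketch shows one way the formulas collection might be read once the response JSON is loaded into a Python dictionary. The helper name collect_formulas is hypothetical, and the sample data is a trimmed copy of the response fragment in this section with polygon and span omitted.

```python
def collect_formulas(analyze_result):
    """Return (page_number, kind, latex) tuples for every detected formula."""
    found = []
    for page in analyze_result.get("pages", []):
        # Tolerate pages that carry no formulas list.
        for formula in page.get("formulas", []):
            found.append((page["pageNumber"], formula["kind"], formula["value"]))
    return found


# Trimmed sample shaped like the response fragment in this section.
sample = {
    "content": ":formula:",
    "pages": [
        {
            "pageNumber": 1,
            "formulas": [
                {"kind": "inline", "value": "\\frac { \\partial a } { \\partial b }", "confidence": 0.99},
                {"kind": "display", "value": "y = a \\times b + a \\times c", "confidence": 0.99},
            ],
        }
    ],
}

for page_number, kind, latex in collect_formulas(sample):
    print(f"page {page_number} [{kind}]: {latex}")
```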

Font property extraction

The ocr.font capability (requested as styleFont in the features parameter) extracts all font properties of the extracted text in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties: similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for the color of the text, and backgroundColor for the color of the text bounding box.

"content": "Foo bar",
"styles": [
    {
      "similarFontFamily": "Arial, sans-serif",
      "spans": [ { "offset": 0, "length": 3 } ],
      "confidence": 0.98
    },
    {
      "similarFontFamily": "Times New Roman, serif",
      "spans": [ { "offset": 4, "length": 3 } ],
      "confidence": 0.98
    },
    {
      "fontStyle": "italic",
      "spans": [ { "offset": 1, "length": 2 } ],
      "confidence": 0.98
    },
    {
      "fontWeight": "bold",
      "spans": [ { "offset": 2, "length": 3 } ],
      "confidence": 0.98
    },
    {
      "color": "#FF0000",
      "spans": [ { "offset": 4, "length": 2 } ],
      "confidence": 0.98
    },
    {
      "backgroundColor": "#00FF00",
      "spans": [ { "offset": 5, "length": 2 } ],
      "confidence": 0.98
    }
]

REST API

{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont
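
Because each style object refers to text only through offset and length, it can help to resolve the spans back to the content string. The sketch below does this for part of the sample in this section. The helper name styled_spans is hypothetical, and the code assumes that span offsets index Python string characters directly, which holds for plain ASCII content like this sample but may not hold for every string index type.

```python
def styled_spans(content, styles):
    """Yield (property, value, text) for each span of each style object."""
    for style in styles:
        # Everything except spans and confidence is treated as a font property.
        props = {k: v for k, v in style.items() if k not in ("spans", "confidence")}
        for span in style["spans"]:
            start = span["offset"]
            text = content[start:start + span["length"]]
            for name, value in props.items():
                yield name, value, text


content = "Foo bar"
styles = [
    {"similarFontFamily": "Arial, sans-serif", "spans": [{"offset": 0, "length": 3}], "confidence": 0.98},
    {"fontWeight": "bold", "spans": [{"offset": 2, "length": 3}], "confidence": 0.98},
    {"color": "#FF0000", "spans": [{"offset": 4, "length": 2}], "confidence": 0.98},
]

for name, value, text in styled_spans(content, styles):
    print(f"{name}={value!r} applies to {text!r}")
```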

Barcode property extraction

The ocr.barcode capability (requested as barcodes in the features parameter) extracts all identified barcodes in the barcodes collection as a top-level object under content. Inside the content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind, the embedded barcode content as value, and its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded as 1.

Supported barcode types

Barcode Type
QR Code
Code 39
Code 93
Code 128
UPC (UPC-A & UPC-E)
PDF417
EAN-8
EAN-13
Codabar
Databar
Databar Expanded
ITF
Data Matrix

REST API

{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes
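
As a rough sketch of consuming the barcodes collection, the helper below groups decoded values by kind across pages. The helper name barcodes_by_kind is hypothetical, and the sample response is made up for illustration, including the kind strings; check an actual response for the exact kind values the service returns.

```python
from collections import defaultdict


def barcodes_by_kind(analyze_result):
    """Group decoded barcode values by their kind across all pages."""
    grouped = defaultdict(list)
    for page in analyze_result.get("pages", []):
        # Tolerate pages that carry no barcodes list.
        for barcode in page.get("barcodes", []):
            grouped[barcode["kind"]].append(barcode["value"])
    return dict(grouped)


# Hypothetical response fragment; kind strings and values are illustrative only.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "barcodes": [
                {"kind": "QRCode", "value": "http://example.com", "confidence": 1},
                {"kind": "Code39", "value": "ABC-123", "confidence": 1},
            ],
        },
        {
            "pageNumber": 2,
            "barcodes": [{"kind": "QRCode", "value": "second", "confidence": 1}],
        },
    ]
}

print(barcodes_by_kind(sample))
```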

Language detection

Adding the languages feature to the analyze request returns the predicted primary language of each text line, along with its confidence, in the languages collection under analyzeResult.

"languages": [
    {
        "spans": [
            {
                "offset": 0,
                "length": 131
            }
        ],
        "locale": "en",
        "confidence": 0.7
    }
]

REST API

{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages
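
One possible use of the languages collection is to estimate which language dominates a document. The sketch below sums span lengths per locale and skips low-confidence predictions. The helper name characters_per_locale and the min_confidence threshold are hypothetical, and the second and third sample entries are invented to show aggregation and filtering.

```python
from collections import Counter


def characters_per_locale(languages, min_confidence=0.5):
    """Sum span lengths per locale, ignoring low-confidence predictions."""
    totals = Counter()
    for entry in languages:
        if entry["confidence"] < min_confidence:
            continue
        totals[entry["locale"]] += sum(span["length"] for span in entry["spans"])
    return totals


languages = [
    {"spans": [{"offset": 0, "length": 131}], "locale": "en", "confidence": 0.7},
    # Hypothetical extra entries to show aggregation and filtering.
    {"spans": [{"offset": 132, "length": 40}], "locale": "fr", "confidence": 0.9},
    {"spans": [{"offset": 173, "length": 12}], "locale": "en", "confidence": 0.3},
]

totals = characters_per_locale(languages)
print(totals.most_common(1)[0][0])
```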

Next steps

Learn more: Read model, Layout model

SDK samples: Python