Document Intelligence add-on capabilities
This content applies to: v3.1 (GA)
Note
Add-on capabilities are available within all models except for the Business card model.
Document Intelligence supports more sophisticated and modular analysis capabilities. Use the add-on features to extend the results with more features extracted from your documents. Some add-on features incur an extra cost. You can enable or disable these optional features depending on your document extraction scenario. To enable a feature, add the associated feature name to the features query string property. You can enable more than one add-on feature on a request by providing a comma-separated list of features. The following add-on capabilities are available for 2023-07-31 (GA) and later releases.
Add-on Capability | Add-On/Free | 2024-02-29-preview | 2023-07-31 (GA) | 2022-08-31 (GA) | v2.1 (GA) |
---|---|---|---|---|---|
Font property extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
Formula extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
High resolution extraction | Add-On | ✔️ | ✔️ | n/a | n/a |
Barcode extraction | Free | ✔️ | ✔️ | n/a | n/a |
Language detection | Free | ✔️ | ✔️ | n/a | n/a |
Key value pairs | Free | ✔️ | n/a | n/a | n/a |
Query fields | Add-On* | ✔️ | n/a | n/a | n/a |
Add-On* - Query fields are priced differently than the other add-on features. See pricing for details.
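As a minimal sketch of the comma-separated features list described above, the following Python snippet assembles an analyze URL with more than one add-on enabled. The helper name build_analyze_url and the example-resource endpoint are hypothetical placeholders, not part of the service; the path and the feature names follow the REST API lines in this article.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, model_id, api_version, features):
    """Return an analyze URL with add-on features as one comma-separated value.

    Hypothetical helper for illustration; it only builds a string and does
    not call the service.
    """
    query = {"api-version": api_version}
    if features:
        # Several add-ons go into a single "features" parameter, joined by commas.
        query["features"] = ",".join(features)
    # safe="," keeps the commas literal instead of percent-encoding them as %2C.
    return (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{model_id}:analyze?{urlencode(query, safe=',')}"
    )


url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.cn",  # placeholder endpoint
    "prebuilt-layout",
    "2023-07-31",
    ["barcodes", "languages"],
)
print(url)
```

Passing an empty list leaves the features parameter out entirely, which corresponds to a request with no add-ons enabled.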
High resolution extraction
Recognizing small text in large documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability. Enabling this add-on capability improves the quality of content extraction from A1/A2/A3 documents.
REST API
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution
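The request above can be prepared in Python roughly as follows. This is a sketch under assumptions that this article does not state: that the document is passed by URL in a JSON body under urlSource, and that the resource key travels in the Ocp-Apim-Subscription-Key header. The endpoint, key, and document URL are placeholders, and build_high_res_request is a hypothetical helper name. The snippet only builds the request object and sends nothing.

```python
import json
from urllib.request import Request


def build_high_res_request(endpoint, key, document_url):
    """Prepare (but do not send) a layout analysis request with high resolution OCR."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        "prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution"
    )
    # Assumption: the document is referenced by URL in a JSON request body.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: key-based authentication header for the resource.
            "Ocp-Apim-Subscription-Key": key,
        },
    )


req = build_high_res_request(
    "https://example-resource.cognitiveservices.azure.cn",  # placeholder endpoint
    "<your-key>",                                           # placeholder key
    "https://example.com/drawing.pdf",                      # placeholder document
)
print(req.get_method(), req.full_url)
```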
Formula extraction
The ocr.formula capability extracts all identified formulas, such as mathematical equations, in the formulas collection as a top-level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula that includes the formula type (kind) as inline or display, and its LaTeX representation as value along with its polygon coordinates. Initially, formulas appear at the end of each page.
Note
The confidence score is hard-coded.
"content": ":formula:",
"pages": [
{
"pageNumber": 1,
"formulas": [
{
"kind": "inline",
"value": "\\frac { \\partial a } { \\partial b }",
"polygon": [...],
"span": {...},
"confidence": 0.99
},
{
"kind": "display",
"value": "y = a \\times b + a \\times c",
"polygon": [...],
"span": {...},
"confidence": 0.99
}
]
}
]
REST API
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas
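To illustrate how the formulas collection might be consumed, here is a minimal Python sketch that walks a parsed response shaped like the sample above and returns each formula's page, kind, and LaTeX value. The function name collect_formulas is hypothetical, and the sample dictionary repeats the values from the sample response with the elided polygon and span fields left out.

```python
def collect_formulas(analyze_result, kind=None):
    """Return (page_number, kind, latex) tuples, optionally filtered by kind."""
    found = []
    for page in analyze_result.get("pages", []):
        # Pages without detected formulas may simply lack the "formulas" key.
        for formula in page.get("formulas", []):
            if kind is None or formula["kind"] == kind:
                found.append((page["pageNumber"], formula["kind"], formula["value"]))
    return found


# Sample data mirroring the response fragment in this section.
sample = {
    "content": ":formula:",
    "pages": [
        {
            "pageNumber": 1,
            "formulas": [
                {
                    "kind": "inline",
                    "value": "\\frac { \\partial a } { \\partial b }",
                    "confidence": 0.99,
                },
                {
                    "kind": "display",
                    "value": "y = a \\times b + a \\times c",
                    "confidence": 0.99,
                },
            ],
        }
    ],
}

for page_number, formula_kind, latex in collect_formulas(sample):
    print(page_number, formula_kind, latex)
```

Filtering with kind="display" keeps only standalone equations, which can be useful when inline fragments are too noisy for downstream rendering.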
Font property extraction
The ocr.font capability extracts all font properties of text extracted in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties such as similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for color of the text, and backgroundColor for color of the text bounding box.
"content": "Foo bar",
"styles": [
{
"similarFontFamily": "Arial, sans-serif",
"spans": [ { "offset": 0, "length": 3 } ],
"confidence": 0.98
},
{
"similarFontFamily": "Times New Roman, serif",
"spans": [ { "offset": 4, "length": 3 } ],
"confidence": 0.98
},
{
"fontStyle": "italic",
"spans": [ { "offset": 1, "length": 2 } ],
"confidence": 0.98
},
{
"fontWeight": "bold",
"spans": [ { "offset": 2, "length": 3 } ],
"confidence": 0.98
},
{
"color": "#FF0000",
"spans": [ { "offset": 4, "length": 2 } ],
"confidence": 0.98
},
{
"backgroundColor": "#00FF00",
"spans": [ { "offset": 5, "length": 2 } ],
"confidence": 0.98
}
]
REST API
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont
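Because each style object carries a single font property plus the spans it applies to, the styled text can be recovered by slicing content with each span's offset and length. The following Python sketch does that for the sample above. The helper name styled_text is hypothetical, and the sketch assumes span offsets index Python string characters directly, which holds for this ASCII sample but may not for other string index types.

```python
STYLE_KEYS = ("similarFontFamily", "fontStyle", "fontWeight", "color", "backgroundColor")


def styled_text(content, styles):
    """Return (property, value, text) for every span of every font style."""
    rows = []
    for style in styles:
        # Each style object is expected to carry exactly one font property.
        prop = next((key for key in STYLE_KEYS if key in style), None)
        if prop is None:
            continue  # Skip style objects that describe something else.
        for span in style["spans"]:
            start = span["offset"]
            rows.append((prop, style[prop], content[start:start + span["length"]]))
    return rows


content = "Foo bar"
styles = [
    {"similarFontFamily": "Arial, sans-serif", "spans": [{"offset": 0, "length": 3}], "confidence": 0.98},
    {"similarFontFamily": "Times New Roman, serif", "spans": [{"offset": 4, "length": 3}], "confidence": 0.98},
    {"fontStyle": "italic", "spans": [{"offset": 1, "length": 2}], "confidence": 0.98},
    {"fontWeight": "bold", "spans": [{"offset": 2, "length": 3}], "confidence": 0.98},
    {"color": "#FF0000", "spans": [{"offset": 4, "length": 2}], "confidence": 0.98},
    {"backgroundColor": "#00FF00", "spans": [{"offset": 5, "length": 2}], "confidence": 0.98},
]

for prop, value, text in styled_text(content, styles):
    print(f"{prop}={value!r} applies to {text!r}")
```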
Barcode property extraction
The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top-level object under content. Inside content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind and the embedded barcode content as value along with its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded as 1.
Supported barcode types
Barcode Type |
---|
QR Code |
Code 39 |
Code 93 |
Code 128 |
UPC (UPC-A & UPC-E) |
PDF417 |
EAN-8 |
EAN-13 |
Codabar |
Databar |
Databar Expanded |
ITF |
Data Matrix |
REST API
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes
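A response with barcodes might be processed along these lines. This article shows no barcode response sample, so the dictionary below is entirely illustrative: it assumes barcodes are listed per page in the same way as the formulas sample, and the kind strings QRCode and Code128 and both values are made up for the example. The helper name barcodes_by_kind is hypothetical.

```python
from collections import defaultdict


def barcodes_by_kind(analyze_result):
    """Group decoded barcode values by barcode kind across all pages."""
    grouped = defaultdict(list)
    for page in analyze_result.get("pages", []):
        # Assumption: barcodes sit under each page, like formulas do.
        for barcode in page.get("barcodes", []):
            grouped[barcode["kind"]].append(barcode["value"])
    return dict(grouped)


# Illustrative data only; not taken from a real service response.
sample = {
    "content": ":barcode:\n:barcode:",
    "pages": [
        {
            "pageNumber": 1,
            "barcodes": [
                {"kind": "QRCode", "value": "https://example.com", "confidence": 1},
                {"kind": "Code128", "value": "ABC-123", "confidence": 1},
            ],
        }
    ],
}

print(barcodes_by_kind(sample))
```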
Language detection
Adding the languages feature to the analyzeResult request predicts the detected primary language for each text line along with the confidence in the languages collection under analyzeResult.
"languages": [
{
"spans": [
{
"offset": 0,
"length": 131
}
],
"locale": "en",
"confidence": 0.7
}
]
REST API
{your-resource-endpoint}.cognitiveservices.azure.cn/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages
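One way to summarize the languages collection is to total the span lengths per locale while ignoring low-confidence predictions. The Python sketch below does this for the sample entry above. The helper name characters_per_locale and the confidence thresholds are hypothetical choices for illustration, not service recommendations.

```python
def characters_per_locale(languages, min_confidence=0.0):
    """Sum span lengths per locale, skipping entries below min_confidence."""
    totals = {}
    for entry in languages:
        if entry["confidence"] < min_confidence:
            continue
        # One entry may cover several spans of the content.
        length = sum(span["length"] for span in entry["spans"])
        totals[entry["locale"]] = totals.get(entry["locale"], 0) + length
    return totals


# The single entry from the sample response in this section.
languages = [
    {"spans": [{"offset": 0, "length": 131}], "locale": "en", "confidence": 0.7},
]

print(characters_per_locale(languages, min_confidence=0.5))
```

With the threshold raised above 0.7, the sample entry is filtered out and the result is an empty dictionary.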
Next steps
Learn more: Read model, Layout model
SDK samples: Python