Document Intelligence v4.0 migration

Important

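Document Intelligence REST API v2.1 reaches end of support on September 15, 2027.
Document Intelligence REST API 2022-08-31 (v3.0) reaches end of support on March 30, 2029.
To avoid production interruptions, use this migration guide to migrate to Azure Document Intelligence 2024-11-30 (v4.0) before these dates.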

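SDK migration guides

For guidance on updating your application code to use the v4.0 SDK, see the language-specific SDK migration guides in the GitHub repository. These guides explain how to update your code to call the new API methods and handle the updated response formats introduced in v4.0.

Migrate from v3.1 to v4.0

Preview APIs are deprecated periodically. If you're using a preview API version, update your application to target the GA API version. To migrate from a preview API version to the 2024-11-30 (GA) API version using the SDK, update to the current version of the language-specific SDK. Earlier v3.x API versions are retired on a published schedule; v3.0 retires on March 30, 2029.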

Important

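Custom extraction and classification models trained with a preview API version are tied to the lifecycle and base model of that preview API version. When the preview API version is retired, the custom models trained with it are retired too. As part of your API migration, retrain your models with the latest GA API version.

Analysis features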

Model ID	Text extraction	Paragraphs	Paragraph roles	Selection marks	Tables	Key-value pairs	Languages	Barcodes	Document analysis	Formulas*	StyleFont*	OCR high resolution*
prebuilt-read	✓	✓					O	O		O	O	O
prebuilt-layout	✓	✓	✓	✓	✓		O	O		O	O	O
prebuilt-document	✓	✓	✓	✓	✓	✓	O	O		O	O	O
prebuilt-businessCard	✓								✓
prebuilt-idDocument	✓						O	O	✓	O	O	O
prebuilt-invoice	✓			✓	✓	O	O	O	✓	O	O	O
prebuilt-receipt	✓						O	O	✓	O	O	O
prebuilt-healthInsuranceCard.us	✓						O	O	✓	O	O	O
prebuilt-tax.us.w2	✓			✓			O	O	✓	O	O	O
prebuilt-tax.us.1098	✓			✓			O	O	✓	O	O	O
prebuilt-tax.us.1098E	✓			✓			O	O	✓	O	O	O
prebuilt-tax.us.1098T	✓			✓			O	O	✓	O	O	O
prebuilt-contract	✓	✓	✓	✓			O	O	✓	O	O	O
{ customModelName }	✓	✓	✓	✓	✓		O	O	✓	O	O	O

✓ - Enabled; O - Optional; Formulas/StyleFont/OCR high resolution* - premium features incur additional costs.

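Migrate from v3.0

Important

The Azure Document Intelligence v3.0 API (2022-08-31) reaches end of support on March 30, 2029. To avoid production interruptions, use Azure Document Intelligence 2024-11-30 v4.0 for all new development, and migrate existing workloads to Azure Document Intelligence 2024-11-30 v4.0 before that date. If you're using v3.0, you can move directly to v4.0 by using the current version of the language-specific SDK and the 2024-11-30 REST API.

Compared with v3.0, Document Intelligence v3.1 introduces several new features:

Barcode extraction.
Add-on capabilities, including high resolution, formula, and font property extraction.
Custom classification model for document splitting and classification.
Language expansion and new field support in the invoice and receipt models.
New document type support in the ID document model.
New prebuilt health insurance card model.
Office/HTML file support in the prebuilt read model, which extracts words and paragraphs without bounding boxes. Embedded images are no longer supported. If add-on features are requested for Office/HTML files, an empty array is returned without an error.
Model expiration for custom extraction and classification models: new custom models are built on a large base model that is updated periodically for quality improvement. An expiration date is introduced for all custom models so that the corresponding base models can be retired. After a custom model expires, retrain it with the latest API version (base model).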

GET /documentModels/{customModelId}?api-version={apiVersion}
{
  "modelId": "{customModelId}",
  "description": "{customModelDescription}",
  "createdDateTime": "2023-09-24T12:54:35Z",
  "expirationDateTime": "2025-01-01T00:00:00Z",
  "apiVersion": "2023-07-31",
  "docTypes": { ... }
}
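As a rough sketch of how a client might act on the expiration date, the following Python snippet parses a Get Model response shaped like the sample above and reports how many days remain before the model expires. The helper name `days_until_expiration` is hypothetical, and the timestamp format (UTC with a trailing `Z`, no fractional seconds) is an assumption taken from the sample.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiration(model: dict, now: Optional[datetime] = None) -> Optional[int]:
    """Return whole days until the model expires, or None if no expiry is present.

    `model` is the parsed JSON body of a Get Model response.
    """
    expires = model.get("expirationDateTime")
    if expires is None:
        return None
    # Assumption: timestamps look like the sample, e.g. "2025-01-01T00:00:00Z".
    expires_at = datetime.strptime(expires, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires_at - now).days


sample = {
    "modelId": "my-model",  # hypothetical model ID
    "createdDateTime": "2023-09-24T12:54:35Z",
    "expirationDateTime": "2025-01-01T00:00:00Z",
    "apiVersion": "2023-07-31",
}
remaining = days_until_expiration(sample, now=datetime(2024, 12, 1, tzinfo=timezone.utc))
```

A negative value means the model has already expired and needs to be retrained.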

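Custom neural model build quota: the number of neural models that each subscription can build per region every month is limited. The result JSON is expanded to include the quota and usage information, as part of the resource information returned by GET /info, to help you understand your current usage.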

{
  "customDocumentModels": { ... },
  "customNeuralDocumentModelBuilds": {
    "used": 1,
    "quota": 10,
    "quotaResetDateTime": "2023-03-01T00:00:00Z"
  }
}
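One way to use the quota fields, sketched under the assumption that the response has the shape shown above: compute how many neural model builds are left before the reset date. The function name `remaining_neural_builds` is illustrative only.

```python
def remaining_neural_builds(info: dict) -> int:
    """Return how many custom neural model builds are left in the current period."""
    builds = info["customNeuralDocumentModelBuilds"]
    # Clamp at zero in case usage is ever reported above the quota.
    return max(builds["quota"] - builds["used"], 0)


info = {
    "customNeuralDocumentModelBuilds": {
        "used": 1,
        "quota": 10,
        "quotaResetDateTime": "2023-03-01T00:00:00Z",
    }
}
left = remaining_neural_builds(info)
```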

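The optional features query parameter of the analyze operation can selectively enable specific features. Some premium features can incur added billing. For details, see the Analysis features list.
Extracted currency field objects are extended to output a normalized currency code field when possible. Currently, currency fields can return an amount (for example, 123.45) and a currencySymbol (for example, $). This feature maps the currency symbol to a canonical ISO 4217 code (for example, USD). The model can optionally use global document content to disambiguate or infer the currency code.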

{
  "fields": {
    "Total": {
      "type": "currency",
      "content": "$123.45",
      "valueCurrency": {
        "amount": 123.45,
        "currencySymbol": "$",
        "currencyCode": "USD"
      },
      ...
    }
  }
}
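The following sketch shows one possible way to consume the extended currency object: prefer the normalized `currencyCode` when the service returns it and fall back to the symbol otherwise. The helper `format_currency` and its output format are assumptions made for illustration.

```python
def format_currency(field: dict) -> str:
    """Render a currency field, preferring the ISO 4217 code over the symbol."""
    value = field["valueCurrency"]
    amount = value["amount"]
    code = value.get("currencyCode")
    if code:
        return f"{amount:.2f} {code}"
    # The code is optional, so fall back to the extracted symbol if present.
    return f"{value.get('currencySymbol', '')}{amount:.2f}"


total = {
    "type": "currency",
    "content": "$123.45",
    "valueCurrency": {"amount": 123.45, "currencySymbol": "$", "currencyCode": "USD"},
}
with_code = format_currency(total)
without_code = format_currency(
    {"type": "currency", "valueCurrency": {"amount": 123.45, "currencySymbol": "$"}}
)
```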

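Besides the model quality improvements, we strongly recommend updating your application to use v3.1 to benefit from these new capabilities.

Migrate from v2.1 or v2.0

Document Intelligence v4.0 is the latest GA version, with the richest features, the broadest language and document type coverage, and improved model quality. For the features and capabilities available in v4.0, see the model overview.

Starting with v3.0, the Document Intelligence REST API is redesigned for better usability. In this section, learn the differences between Document Intelligence v2.0, v2.1, and v3.1 and how to move to the newer version of the API.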

Caution

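The REST API 2023-07-31 release includes a breaking change in the REST API analyze response JSON.
The boundingBox property is renamed to polygon in each instance.

Changes to the REST API endpoints

The v3.1 REST API combines the analyze operations for layout analysis, prebuilt models, and custom models into a single pair of operations by assigning documentModels and modelId to layout analysis (prebuilt-layout) and prebuilt models.

POST request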

https://{your-form-recognizer-endpoint}/formrecognizer/documentModels/{modelId}:analyze?api-version=2023-07-31

GET request

https://{your-form-recognizer-endpoint}/formrecognizer/documentModels/{modelId}/analyzeResults/{resultId}?api-version=2023-07-31

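Analyze operation

The request payload and call pattern remain unchanged.
The Analyze operation specifies the input document and content-specific configurations, and returns the analyze result URL through the Operation-Location header in the response.
Poll this Analyze Result URL with a GET request to check the status of the analyze operation (the minimum recommended interval between requests is 1 second).
Upon success, status is set to succeeded and analyzeResult is returned in the response body. If errors are encountered, status is set to failed and an error is returned.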
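The polling pattern described above can be sketched as follows. This is a minimal illustration rather than a full client: `fetch` is a hypothetical caller-supplied function that sends the GET request to the Operation-Location URL and returns the parsed JSON body, and the attempt limit is an arbitrary assumption.

```python
import time
from typing import Callable


def poll_analyze_result(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the analyze operation finishes and return its analyzeResult."""
    for _ in range(max_attempts):
        body = fetch()
        status = body.get("status")
        if status == "succeeded":
            return body["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"Analyze operation failed: {body.get('error')}")
        # Any other status is treated as still in progress.
        sleep(max(interval, 1.0))  # keep at least 1 second between requests
    raise TimeoutError(f"No final status after {max_attempts} attempts")


# Simulated responses stand in for real HTTP calls.
responses = iter([
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"modelId": "prebuilt-layout"}},
])
result = poll_analyze_result(lambda: next(responses), sleep=lambda seconds: None)
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.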

Model	v2.0	v2.1	v3.1
Request URL prefix	https://{your-form-recognizer-endpoint}/formrecognizer/v2.0	https://{your-form-recognizer-endpoint}/formrecognizer/v2.1	https://{your-form-recognizer-endpoint}/formrecognizer
General document	N/A	N/A	`/documentModels/prebuilt-document:analyze`
Layout	/layout/analyze	/layout/analyze	`/documentModels/prebuilt-layout:analyze`
Custom	/custom/models/{modelId}/analyze	/custom/{modelId}/analyze	`/documentModels/{modelId}:analyze`
Invoice	N/A	/prebuilt/invoice/analyze	`/documentModels/prebuilt-invoice:analyze`
Receipt	/prebuilt/receipt/analyze	/prebuilt/receipt/analyze	`/documentModels/prebuilt-receipt:analyze`
ID document	N/A	/prebuilt/idDocument/analyze	`/documentModels/prebuilt-idDocument:analyze`
Business card	N/A	/prebuilt/businessCard/analyze	`/documentModels/prebuilt-businessCard:analyze`
W-2	N/A	N/A	`/documentModels/prebuilt-tax.us.w2:analyze`
Health insurance card	N/A	N/A	`/documentModels/prebuilt-healthInsuranceCard.us:analyze`
Contract	N/A	N/A	`/documentModels/prebuilt-contract:analyze`
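To make the mapping in the table concrete, here is a small sketch that translates a v2.x analyze path into the unified v3.1 URL. The lookup table only covers the layout and prebuilt rows listed above; the function names and the sample endpoint are hypothetical.

```python
# v2.x analyze path -> v3.1 model ID, taken from the table above.
V2_PATH_TO_MODEL_ID = {
    "/layout/analyze": "prebuilt-layout",
    "/prebuilt/invoice/analyze": "prebuilt-invoice",
    "/prebuilt/receipt/analyze": "prebuilt-receipt",
    "/prebuilt/idDocument/analyze": "prebuilt-idDocument",
    "/prebuilt/businessCard/analyze": "prebuilt-businessCard",
}


def v3_analyze_url(endpoint: str, model_id: str, api_version: str = "2023-07-31") -> str:
    """Build the unified analyze URL for any prebuilt or custom model ID."""
    base = endpoint.rstrip("/")
    return f"{base}/formrecognizer/documentModels/{model_id}:analyze?api-version={api_version}"


def migrate_v2_path(endpoint: str, v2_path: str) -> str:
    """Translate a known v2.x analyze path to its v3.1 equivalent."""
    return v3_analyze_url(endpoint, V2_PATH_TO_MODEL_ID[v2_path])


# Hypothetical endpoint used purely for illustration.
url = migrate_v2_path("https://contoso.example/", "/layout/analyze")
```

For custom models, pass the model ID straight to `v3_analyze_url`, since the v3.1 path takes the same form as for prebuilt models.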

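Analyze request body

The content to be analyzed is provided in the request body. Either a URL or base64-encoded data can be used to construct the request.

To specify a publicly accessible web URL, set Content-Type to application/json and send the following JSON body: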

{
  "urlSource": "{urlPath}"
}

Base64 encoding is also supported in Document Intelligence v3.0:

{
  "base64Source": "{base64EncodedContent}"
}
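A minimal sketch of constructing either request body in Python, assuming the caller has a public URL or the raw file bytes. The helper name `build_analyze_body` is illustrative and not part of any SDK.

```python
import base64
import json
from typing import Optional


def build_analyze_body(url: Optional[str] = None, data: Optional[bytes] = None) -> str:
    """Return the JSON request body for exactly one of a URL or raw document bytes."""
    if (url is None) == (data is None):
        raise ValueError("Provide exactly one of url or data")
    if url is not None:
        return json.dumps({"urlSource": url})
    # base64Source carries the document content as a base64 string.
    return json.dumps({"base64Source": base64.b64encode(data).decode("ascii")})


url_body = build_analyze_body(url="https://contoso.example/invoice.pdf")  # hypothetical URL
bytes_body = build_analyze_body(data=b"hello")
```

Both bodies are sent with Content-Type set to application/json.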

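Additionally supported parameters

Parameters that continue to be supported:

pages: analyze only a specific subset of pages in the document. A list of page numbers, indexed from 1, to analyze. For example: "1-3,5,7-9"
locale: locale hint for text recognition and document analysis. The value can contain only the language code (for example, en or fr) or a BCP 47 language tag (for example, en-US).

Parameters no longer supported: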

includeTextDetails

The new response format is more compact, and the full output is always returned.
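To illustrate the pages format and how the remaining parameters fit into a request, the sketch below expands a pages string into explicit 1-indexed page numbers and builds a query string with pages and locale. Both helper names are hypothetical, and the validation rules are assumptions for illustration rather than the service's own checks.

```python
from typing import List, Optional
from urllib.parse import urlencode


def expand_pages(pages: str) -> List[int]:
    """Expand a pages value such as "1-3,5,7-9" into explicit page numbers."""
    result: List[int] = []
    for part in pages.split(","):
        start, _, end = part.strip().partition("-")
        first = int(start)
        last = int(end) if end else first
        if first < 1 or last < first:
            raise ValueError(f"Invalid page range: {part!r}")
        result.extend(range(first, last + 1))
    return result


def analyze_query(api_version: str, pages: Optional[str] = None, locale: Optional[str] = None) -> str:
    """Build the query string; includeTextDetails is intentionally not offered."""
    params = {"api-version": api_version}
    if pages:
        params["pages"] = pages
    if locale:
        params["locale"] = locale
    return urlencode(params)


page_numbers = expand_pages("1-3,5,7-9")
query = analyze_query("2023-07-31", pages="1-3,5,7-9", locale="en-US")
```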

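Changes to analyze result

The analyze response is refactored into the following top-level results, which support multi-page elements.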

pages
tables
keyValuePairs
entities
styles
documents

Note

The analyzeResult response changes include moving elements up from properties of pages to top-level properties within analyzeResult.

{
  // Basic analyze result metadata
  "apiVersion": "2022-08-31", // REST API version used
  "modelId": "prebuilt-invoice", // ModelId used
  "stringIndexType": "textElements", // Character unit used for string offsets and lengths:
                                     // textElements, unicodeCodePoint, utf16CodeUnit
  // Concatenated content in global reading order across pages.
  // Words are generally delimited by space, except CJK (Chinese, Japanese, Korean) characters.
  // Lines and selection marks are generally delimited by newline character.
  // Selection marks are represented in Markdown emoji syntax (:selected:, :unselected:).
  "content": "CONTOSO LTD.\nINVOICE\nContoso Headquarters...",
  "pages": [ // List of pages analyzed
    {
      // Basic page metadata
      "pageNumber": 1, // 1-indexed page number
      "angle": 0, // Orientation of content in clockwise direction (degree)
      "width": 0, // Page width
      "height": 0, // Page height
      "unit": "pixel", // Unit for width, height, and polygon coordinates
      "spans": [ // Parts of top-level content covered by page
        {
          "offset": 0, // Offset in content
          "length": 7 // Length in content
        }
      ],
      // List of words in page
      "words": [
        {
          "text": "CONTOSO", // Equivalent to $.content.Substring(span.offset, span.length)
          "polygon": [ ... ], // Position in page
          "confidence": 0.99, // Extraction confidence
          "span": { ... } // Part of top-level content covered by word
        }, ...
      ],
      // List of selectionMarks in page
      "selectionMarks": [
        {
          "state": "selected", // Selection state: selected, unselected
          "polygon": [ ... ], // Position in page
          "confidence": 0.95, // Extraction confidence
          "span": { ... } // Part of top-level content covered by selection mark
        }, ...
      ],
      // List of lines in page
      "lines": [
        {
          "content": "CONTOSO LTD.", // Concatenated content of line (may contain both words and selectionMarks)
          "polygon": [ ... ], // Position in page
          "spans": [ ... ] // Parts of top-level content covered by line
        }, ...
      ]
    }, ...
  ],
  // List of extracted tables
  "tables": [
    {
      "rowCount": 1, // Number of rows in table
      "columnCount": 1, // Number of columns in table
      "boundingRegions": [ // Polygons or bounding boxes potentially across pages covered by table
        {
          "pageNumber": 1, // 1-indexed page number
          "polygon": [ ... ] // Previously boundingBox, renamed to polygon in the 2022-08-31 API
        }
      ],
      "spans": [ ... ], // Parts of top-level content covered by table
      // List of cells in table
      "cells": [
        {
          "kind": "stub", // Cell kind: content (default), rowHeader, columnHeader, stub, description
          "rowIndex": 0, // 0-indexed row position of cell
          "columnIndex": 0, // 0-indexed column position of cell
          "rowSpan": 1, // Number of rows spanned by cell (default=1)
          "columnSpan": 1, // Number of columns spanned by cell (default=1)
          "content": "SALESPERSON", // Concatenated content of cell
          "boundingRegions": [ ... ], // Bounding regions covered by cell
          "spans": [ ... ] // Parts of top-level content covered by cell
        }, ...
      ]
    }, ...
  ],
  // List of extracted key-value pairs
  "keyValuePairs": [
    {
      "key": { // Extracted key
        "content": "INVOICE:", // Key content
        "boundingRegions": [ ... ], // Key bounding regions
        "spans": [ ... ] // Key spans
      },
      "value": { // Extracted value corresponding to key, if any
        "content": "INV-100", // Value content
        "boundingRegions": [ ... ], // Value bounding regions
        "spans": [ ... ] // Value spans
      },
      "confidence": 0.95 // Extraction confidence
    }, ...
  ],
  "styles": [
    {
      "isHandwritten": true, // Is content in this style handwritten?
      "spans": [ ... ], // Spans covered by this style
      "confidence": 0.95 // Detection confidence
    }, ...
  ],
  // List of extracted documents
  "documents": [
    {
      "docType": "prebuilt-invoice", // Classified document type (model dependent)
      "boundingRegions": [ ... ], // Document bounding regions
      "spans": [ ... ], // Document spans
      "confidence": 0.99, // Document splitting/classification confidence
      // List of extracted fields
      "fields": {
        "VendorName": { // Field name (docType dependent)
          "type": "string", // Field value type: string, number, array, object, ...
          "valueString": "CONTOSO LTD.", // Normalized field value
          "content": "CONTOSO LTD.", // Raw extracted field content
          "boundingRegions": [ ... ], // Field bounding regions
          "spans": [ ... ], // Field spans
          "confidence": 0.99 // Extraction confidence
        }, ...
      }
    }, ...
  ]
}
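The snippet below is a rough sketch of reading this structure: it resolves a span against the top-level content and collects the raw content of each extracted field. The helper names are hypothetical. Python strings are indexed by Unicode code point, so the slicing is only guaranteed to match when offsets are expressed in code points; with other stringIndexType values the offsets can differ for some text.

```python
def span_text(content: str, span: dict) -> str:
    """Return the part of the top-level content covered by a span."""
    start = span["offset"]
    return content[start:start + span["length"]]


def field_contents(result: dict) -> dict:
    """Map each field name to its raw extracted content across all documents."""
    fields = {}
    for document in result.get("documents", []):
        for name, field in document.get("fields", {}).items():
            fields[name] = field.get("content")
    return fields


# Trimmed-down result in the shape shown above (values are illustrative).
result = {
    "content": "CONTOSO LTD.\nINVOICE",
    "pages": [{"pageNumber": 1, "spans": [{"offset": 0, "length": 7}]}],
    "documents": [
        {
            "docType": "prebuilt-invoice",
            "fields": {
                "VendorName": {
                    "type": "string",
                    "valueString": "CONTOSO LTD.",
                    "content": "CONTOSO LTD.",
                }
            },
        }
    ],
}
first_span = span_text(result["content"], result["pages"][0]["spans"][0])
vendor_fields = field_contents(result)
```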

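Build or train model

The model object has three updates in the new API:

modelId is now a property that can be set on a model for a human-readable name.
modelName is renamed to description.
buildMode is a new property, with the value template for custom form models or neural for custom neural models.

Call the build operation to train a model. The request payload and call pattern remain unchanged. The build operation specifies the model and training dataset, and returns the result through the Operation-Location header in the response. Poll this model operation URL with a GET request to check the status of the build operation (the minimum recommended interval between requests is 1 second). Unlike v2.1, this URL isn't the resource location of the model. Instead, the model URL can be constructed from the given modelId, and can also be retrieved from the resourceLocation property in the response. Upon success, status is set to succeeded and the result contains the custom model information. If errors are encountered, status is set to failed and the error is returned.

The following code is a sample build request using a SAS token. Note the trailing slash when setting the prefix or folder path.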

POST https://{your-form-recognizer-endpoint}/formrecognizer/documentModels:build?api-version=2022-08-31

{
  "modelId": "{modelId}",
  "description": "Sample model",
  "buildMode": "template",
  "azureBlobSource": {
    "containerUrl": "https://{storageAccount}.blob.core.chinacloudapi.cn/{containerName}?{sasToken}",
    "prefix": "{folderName/}"
  }
}
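A small sketch of assembling this build payload in Python. It assumes, based on the sample above, that a folder prefix should end with a slash, and it restricts buildMode to the two values named in this section. The function name, model ID, and container URL are placeholders invented for illustration.

```python
def build_model_request(
    model_id: str,
    container_url: str,
    prefix: str = "",
    build_mode: str = "template",
    description: str = "",
) -> dict:
    """Return the JSON-serializable body for a documentModels:build request."""
    if build_mode not in ("template", "neural"):
        raise ValueError("build_mode must be 'template' or 'neural'")
    source = {"containerUrl": container_url}
    if prefix:
        # Assumption: folder prefixes carry a trailing slash, as in the sample request.
        source["prefix"] = prefix if prefix.endswith("/") else prefix + "/"
    body = {"modelId": model_id, "buildMode": build_mode, "azureBlobSource": source}
    if description:
        body["description"] = description
    return body


request_body = build_model_request(
    "my-model",  # hypothetical model ID
    "https://storage.example/container?sas-token",  # placeholder container URL
    prefix="training",
    description="Sample model",
)
```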

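Changes to compose model

Model compose is now limited to a single level of nesting. Composed models are now consistent with custom models, with the addition of the modelId and description properties.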

POST https://{your-form-recognizer-endpoint}/formrecognizer/documentModels:compose?api-version=2022-08-31
{
  "modelId": "{composedModelId}",
  "description": "{composedModelDescription}",
  "componentModels": [
    { "modelId": "{modelId1}" },
    { "modelId": "{modelId2}" }
  ]
}

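Changes to copy model

The call pattern for copy model remains unchanged:

Authorize the copy operation with the target resource by calling authorizeCopy. This is now a POST request.
Submit the authorization to the source resource to copy the model by calling copyTo.
Poll the returned operation to validate that the operation completed successfully.

The only changes to the copy model function are:

The HTTP action on authorizeCopy is now a POST request.
The authorization payload contains all the information needed to submit the copy request.

Authorize the copy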

POST https://{targetHost}/formrecognizer/documentModels:authorizeCopy?api-version=2022-08-31
{
  "modelId": "{targetModelId}",
  "description": "{targetModelDescription}"
}

Use the response body from the authorize action to construct the request for the copy.

POST https://{sourceHost}/formrecognizer/documentModels/{sourceModelId}:copyTo?api-version=2022-08-31
{
  "targetResourceId": "{targetResourceId}",
  "targetResourceRegion": "{targetResourceRegion}",
  "targetModelId": "{targetModelId}",
  "targetModelLocation": "https://{targetHost}/formrecognizer/documentModels/{targetModelId}",
  "accessToken": "{accessToken}",
  "expirationDateTime": "2021-08-02T03:56:11Z"
}
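The hand-off between the two calls can be sketched as follows: the authorizeCopy response body is forwarded unchanged as the copyTo request body. The function name is hypothetical, and the check for required keys is an assumption derived from the sample payload above.

```python
import json
from typing import Tuple

# Keys present in the sample copyTo payload above.
REQUIRED_KEYS = {
    "targetResourceId",
    "targetResourceRegion",
    "targetModelId",
    "targetModelLocation",
    "accessToken",
    "expirationDateTime",
}


def copy_to_request(
    source_host: str,
    source_model_id: str,
    authorization: dict,
    api_version: str = "2022-08-31",
) -> Tuple[str, str]:
    """Return the copyTo URL and JSON body built from an authorizeCopy response."""
    missing = REQUIRED_KEYS - authorization.keys()
    if missing:
        raise ValueError(f"Authorization is missing: {sorted(missing)}")
    url = (
        f"https://{source_host}/formrecognizer/documentModels/"
        f"{source_model_id}:copyTo?api-version={api_version}"
    )
    # The authorization payload is forwarded as is.
    return url, json.dumps(authorization)


authorization = {
    "targetResourceId": "resource-id",  # placeholder values for illustration
    "targetResourceRegion": "region",
    "targetModelId": "target-model",
    "targetModelLocation": "https://target.example/formrecognizer/documentModels/target-model",
    "accessToken": "token",
    "expirationDateTime": "2021-08-02T03:56:11Z",
}
copy_url, copy_body = copy_to_request("source.example", "source-model", authorization)
```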

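Changes to list models

List models is extended to return both prebuilt and custom models. All prebuilt model names start with prebuilt-. Only models with a status of succeeded are returned. To list models that failed or are in progress, see List Operations.

Sample list models request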

GET https://{your-form-recognizer-endpoint}/formrecognizer/documentModels?api-version=2022-08-31

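Changes to the get model operation

Because get model now includes prebuilt models, the get operation returns a docTypes dictionary. Each document type description includes a name, an optional description, a field schema, and an optional field confidence. The field schema describes the list of fields potentially returned with the document type.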

GET https://{your-form-recognizer-endpoint}/formrecognizer/documentModels/{modelId}?api-version=2022-08-31

New get info operation

The info operation on the service returns the custom model count and the custom model limit.

GET https://{your-form-recognizer-endpoint}/formrecognizer/info?api-version=2022-08-31

Sample response

{
  "customDocumentModels": {
    "count": 5,
    "limit": 100
  }
}

Last updated on 2026-07-15