Call the Read API

In this guide, you'll learn how to call the Read API to extract text from images. You'll learn the different ways you can configure the behavior of this API to meet your needs.

This guide assumes you have already created a Computer Vision resource and obtained a subscription key and endpoint URL. If you haven't, follow a quickstart to get started.

Submit data to the service

The Read API's Read call takes an image or PDF document as the input and extracts text asynchronously.

https://{endpoint}/vision/v3.2/read/analyze[?language][&pages][&readingOrder]

The call returns with a response header field called Operation-Location. The Operation-Location value is a URL that contains the Operation ID to be used in the next step.

Response header      Example value
Operation-Location   https://cognitiveservice/vision/v3.2/read/analyzeResults/49a36324-fc4b-4387-aa06-090cfbf0064f
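
The request and the Operation-Location header can be handled along these lines. This is a minimal sketch, not the service's official client: the helper names are hypothetical, `endpoint` is assumed to be a full base URL including the scheme, and the `Ocp-Apim-Subscription-Key` header and the `{"url": ...}` JSON body are assumptions that do not appear in this guide and should be checked against the Read API reference.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_url(endpoint, language=None, pages=None, reading_order=None):
    """Build the Read analyze URL, adding only the query parameters that were set."""
    params = {}
    if language:
        params["language"] = language
    if pages:
        params["pages"] = pages
    if reading_order:
        params["readingOrder"] = reading_order
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def extract_operation_id(operation_location):
    """Return the last path segment of the Operation-Location URL."""
    path = urllib.parse.urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def submit_image_url(endpoint, key, image_url, **options):
    """POST an image URL to Read and return the Operation-Location header value.

    Assumption: the key travels in the Ocp-Apim-Subscription-Key header and
    the image is referenced by a JSON body of the form {"url": ...}.
    """
    request = urllib.request.Request(
        build_analyze_url(endpoint, **options),
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]
```

Keeping the URL construction and the operation ID parsing separate from the network call makes both easy to check without a live resource.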

Note

Billing

The Computer Vision pricing page includes the pricing tier for Read. Each analyzed image or page is one transaction. If you call the operation with a PDF or TIFF document containing 100 pages, the Read operation counts it as 100 transactions and you are billed for 100 transactions. If you make 50 calls to the operation and each call submits a document with 100 pages, you are billed for 50 x 100 = 5000 transactions.
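
The transaction count described above is simply the total number of pages submitted across all calls. A small sketch of that arithmetic, using a hypothetical helper name:

```python
def billed_transactions(pages_per_call):
    """Total transactions: one per analyzed image or page, summed over all calls."""
    return sum(pages_per_call)


# One call with a 100-page document.
print(billed_transactions([100]))       # 100
# Fifty calls, each with a 100-page document.
print(billed_transactions([100] * 50))  # 5000
```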

Determine how to process the data

Language specification

The Read call has an optional request parameter for language. Read supports auto language identification and multilingual documents, so only provide a language code if you would like to force the document to be processed as that specific language.

Natural reading order output (Latin languages only)

Specify the order in which the text lines are output with the readingOrder query parameter. Use natural for a more human-friendly reading order output as shown in the following example. This feature is only supported for Latin languages.

Image: OCR reading order example

Select page(s) or page ranges for text extraction

For large multi-page documents, use the pages query parameter to specify page numbers or page ranges to extract text from only those pages. The following example shows a document with 10 pages, with text extracted for both cases: all pages (1-10) and selected pages (3-6).

Image: Selected pages output
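
To see which pages a given pages value selects, the value can be expanded locally before sending the request. This is a sketch only: the helper is hypothetical, and it assumes that ranges use a hyphen and that multiple entries are separated by commas, which this guide does not spell out.

```python
def expand_pages(pages):
    """Expand a pages value such as "1,3-6" into a sorted list of page numbers."""
    selected = set()
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid page range: {part}")
            selected.update(range(first, last + 1))
        else:
            selected.add(int(part))
    return sorted(selected)


print(expand_pages("3-6"))   # [3, 4, 5, 6]
print(expand_pages("1-10"))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```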

Get results from the service

The second step is to call the Get Read Results operation. This operation takes as input the operation ID that was created by the Read operation.

https://{endpoint}/vision/v3.2/read/analyzeResults/{operationId}

It returns a JSON response that contains a status field with the following possible values.

Value        Meaning
notStarted   The operation has not started.
running      The operation is being processed.
failed       The operation has failed.
succeeded    The operation has succeeded.

Call this operation repeatedly until it returns the succeeded value. Use an interval of 1 to 2 seconds between calls to avoid exceeding the requests per second (RPS) rate limit.
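
One way to structure the polling is sketched below. The function name, the `max_attempts` cap, and the choice to stop on failed as well as succeeded are assumptions made for the example, not behavior prescribed by the service. `fetch_result` stands for any callable that performs the Get Read Results request and returns the parsed JSON response.

```python
import time


def poll_read_result(fetch_result, interval_seconds=1.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_result until the status is terminal, pausing between calls.

    fetch_result: callable returning the parsed JSON response as a dict.
    Returns the final response, or raises TimeoutError after max_attempts.
    """
    for _ in range(max_attempts):
        result = fetch_result()
        # Stop on failed too, so a failed operation does not loop until timeout.
        if result.get("status") in ("succeeded", "failed"):
            return result
        sleep(interval_seconds)
    raise TimeoutError(f"operation not finished after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets the loop be exercised with a stub instead of real delays.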

Note

The free tier limits the request rate to 20 calls per minute. The paid tier allows 10 requests per second (RPS), which can be increased upon request. Note your Azure resource identifier and region, and open an Azure support ticket or contact your account team to request a higher requests per second (RPS) rate.

When the status field has the succeeded value, the JSON response contains the extracted text content from your image or document. The JSON response maintains the original line groupings of recognized words. It includes the extracted text lines and their bounding box coordinates. Each text line includes all extracted words with their coordinates and confidence scores.

Note

The data submitted to the Read operation is temporarily encrypted and stored at rest for a short duration, and then deleted. This lets your applications retrieve the extracted text as part of the service response.

Sample JSON output

See the following example of a successful JSON response:

{
  "status": "succeeded",
  "createdDateTime": "2021-02-04T06:32:08.2752706+00:00",
  "lastUpdatedDateTime": "2021-02-04T06:32:08.7706172+00:00",
  "analyzeResult": {
    "version": "3.2",
    "readResults": [
      {
        "page": 1,
        "angle": 2.1243,
        "width": 502,
        "height": 252,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [
              58,
              42,
              314,
              59,
              311,
              123,
              56,
              121
            ],
            "text": "Tabs vs",
            "appearance": {
              "style": {
                "name": "handwriting",
                "confidence": 0.96
              }
            },
            "words": [
              {
                "boundingBox": [
                  68,
                  44,
                  225,
                  59,
                  224,
                  122,
                  66,
                  123
                ],
                "text": "Tabs",
                "confidence": 0.933
              },
              {
                "boundingBox": [
                  241,
                  61,
                  314,
                  72,
                  314,
                  123,
                  239,
                  122
                ],
                "text": "vs",
                "confidence": 0.977
              }
            ]
          }
        ]
      }
    ]
  }
}
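
A response shaped like the sample above can be walked page by page and line by line. The sketch below uses hypothetical helper names and a trimmed copy of the sample (bounding boxes omitted) to collect the text of each line and to flag words below a chosen confidence threshold.

```python
def iter_lines(response):
    """Yield (page number, line dict) for every text line in the response."""
    for page in response.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            yield page.get("page"), line


def extract_text(response):
    """Return a list of (page number, line text) pairs."""
    return [(page_number, line["text"]) for page_number, line in iter_lines(response)]


def low_confidence_words(response, threshold):
    """Return the text of every word whose confidence is below threshold."""
    return [
        word["text"]
        for _, line in iter_lines(response)
        for word in line.get("words", [])
        if word["confidence"] < threshold
    ]


# Trimmed version of the sample response, without bounding boxes.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {
                        "text": "Tabs vs",
                        "words": [
                            {"text": "Tabs", "confidence": 0.933},
                            {"text": "vs", "confidence": 0.977},
                        ],
                    }
                ],
            }
        ]
    },
}

print(extract_text(sample))                # [(1, 'Tabs vs')]
print(low_confidence_words(sample, 0.95))  # ['Tabs']
```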

Handwritten classification for text lines (Latin languages only)

The response classifies whether each text line is in handwriting style, along with a confidence score. This feature is only supported for Latin languages. The following example shows the handwritten classification for the text in the image.

Image: OCR handwriting classification example
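
In the JSON response, the classification appears under each line's appearance.style object, with a name of handwriting and a confidence value. A minimal sketch of filtering for handwritten lines follows; the helper name and the 0.5 default threshold are assumptions chosen for illustration.

```python
def handwritten_lines(response, min_confidence=0.5):
    """Return (text, confidence) for lines classified as handwriting."""
    found = []
    for page in response.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            style = line.get("appearance", {}).get("style", {})
            is_handwriting = style.get("name") == "handwriting"
            if is_handwriting and style.get("confidence", 0.0) >= min_confidence:
                found.append((line["text"], style["confidence"]))
    return found


# A single-line response carrying only the fields this helper reads.
response = {
    "analyzeResult": {
        "readResults": [
            {
                "page": 1,
                "lines": [
                    {
                        "text": "Tabs vs",
                        "appearance": {
                            "style": {"name": "handwriting", "confidence": 0.96}
                        },
                    }
                ],
            }
        ]
    }
}

print(handwritten_lines(response))  # [('Tabs vs', 0.96)]
```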

Next steps

To try out the REST API, go to the Read API Reference.