快速入门:使用计算机视觉 REST API 和 Python 提取印刷体文本和手写文本Quickstart: Extract printed and handwritten text using the Computer Vision REST API and Python

本快速入门将使用计算机视觉 REST API 从图像中提取印刷体文本和手写文本。In this quickstart, you'll extract printed and handwritten text from an image using the Computer Vision REST API. 使用 ReadGet Read Result 方法,可以检测图像中的文本,并将识别的字符提取到计算机可读的字符流中。With the Read and Get Read Result methods, you can detect text in an image and extract recognized characters into a machine-readable character stream.

重要

Read 方法以异步方式运行。The Read method runs asynchronously. 此方法不返回成功响应正文中的任何信息。This method does not return any information in the body of a successful response. 相反,批量读取方法返回 Operation-Location 响应标头字段值中的 URI。Instead, the Batch Read method returns a URI in the value of the Operation-Location response header field. 然后就可以调用此 URI,它表示获取读取结果 API,同时检查状态并返回读取方法调用的结果。You can then call this URI, which represents the Get Read Result API, to both check the status and return the results of the Read method call.


可以在 MyBinder 上使用 Jupyter 笔记本以分步方式运行此快速入门。You can run this quickstart in a step-by step fashion using a Jupyter notebook on MyBinder. 要启动活页夹,请选择以下按钮:To launch Binder, select the following button:

启动活页夹按钮The launch Binder button

先决条件Prerequisites

  • Azure 订阅 - 创建试用订阅An Azure subscription - Create one for trial
  • PythonPython
  • 拥有 Azure 订阅后,在 Azure 门户中创建计算机视觉资源 ,获取密钥和终结点。Once you have your Azure subscription, create a Computer Vision resource in the Azure portal to get your key and endpoint. 部署后,单击“转到资源”。After it deploys, click Go to resource.
    • 需要从创建的资源获取密钥和终结点,以便将应用程序连接到计算机视觉服务。You will need the key and endpoint from the resource you create to connect your application to the Computer Vision service. 你稍后会在快速入门中将密钥和终结点粘贴到下方的代码中。You'll paste your key and endpoint into the code below later in the quickstart.
    • 可以使用免费定价层 (F0) 试用该服务,然后再升级到付费层进行生产。You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
  • 为密钥和终结点 URL 创建环境变量,分别将其命名为 COMPUTER_VISION_SUBSCRIPTION_KEYCOMPUTER_VISION_ENDPOINTCreate environment variables for the key and endpoint URL, named COMPUTER_VISION_SUBSCRIPTION_KEY and COMPUTER_VISION_ENDPOINT, respectively.

创建并运行示例Create and run the sample

要创建和运行示例,请执行以下步骤:To create and run the sample, do the following steps:

  1. 将以下代码复制到文本编辑器中。Copy the following code into a text editor.
  2. (可选)将 image_url 的值替换为要从中提取文本的另一图像的 URL。Optionally, replace the value of image_url with the URL of a different image from which you want to extract text.
  3. 将代码保存为以 .py 为扩展名的文件。Save the code as a file with an .py extension. 例如,get-text.pyFor example, get-text.py.
  4. 打开命令提示符窗口。Open a command prompt window.
  5. 在提示符处,使用 python 命令运行示例。At the prompt, use the python command to run the sample. 例如,python get-text.pyFor example, python get-text.py.
import json
import os
import sys
import requests
import time
# If you are using a Jupyter notebook, uncomment the following line.
# %matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from PIL import Image
from io import BytesIO

missing_env = False
# Add your Computer Vision subscription key and endpoint to your environment variables.
if 'COMPUTER_VISION_ENDPOINT' in os.environ:
    endpoint = os.environ['COMPUTER_VISION_ENDPOINT']
else:
    print("From Azure Cognitive Service, retrieve your endpoint and subscription key.")
    print("\nSet the COMPUTER_VISION_ENDPOINT environment variable, such as \"https://api.cognitive.azure.cn\".\n")
    missing_env = True

if 'COMPUTER_VISION_SUBSCRIPTION_KEY' in os.environ:
    subscription_key = os.environ['COMPUTER_VISION_SUBSCRIPTION_KEY']
else:
    print("From Azure Cognitive Service, retrieve your endpoint and subscription key.")
    print("\nSet the COMPUTER_VISION_SUBSCRIPTION_KEY environment variable, such as \"1234567890abcdef1234567890abcdef\".\n")
    missing_env = True

if missing_env:
    print("**Restart your shell or IDE for changes to take effect.**")
    sys.exit()

text_recognition_url = endpoint + "/vision/v3.0/read/analyze"

# Set image_url to the URL of an image that you want to recognize.
image_url = "https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/cognitive-services/Computer-vision/Images/readsample.jpg"

headers = {'Ocp-Apim-Subscription-Key': subscription_key}
data = {'url': image_url}
response = requests.post(
    text_recognition_url, headers=headers, json=data)
response.raise_for_status()

# Extracting text requires two API calls: One call to submit the
# image for processing, the other to retrieve the text found in the image.

# Holds the URI used to retrieve the recognized text.
operation_url = response.headers["Operation-Location"]

# The recognized text isn't immediately available, so poll to wait for completion.
analysis = {}
poll = True
while (poll):
    response_final = requests.get(
        response.headers["Operation-Location"], headers=headers)
    analysis = response_final.json()
    
    print(json.dumps(analysis, indent=4))

    time.sleep(1)
    if ("analyzeResult" in analysis):
        poll = False
    if ("status" in analysis and analysis['status'] == 'failed'):
        poll = False

polygons = []
if ("analyzeResult" in analysis):
    # Extract the recognized text, with bounding boxes.
    polygons = [(line["boundingBox"], line["text"])
                for line in analysis["analyzeResult"]["readResults"][0]["lines"]]

# Display the image and overlay it with the extracted text.
image = Image.open(BytesIO(requests.get(image_url).content))
ax = plt.imshow(image)
for polygon in polygons:
    vertices = [(polygon[0][i], polygon[0][i+1])
                for i in range(0, len(polygon[0]), 2)]
    text = polygon[1]
    patch = Polygon(vertices, closed=True, fill=False, linewidth=2, color='y')
    ax.axes.add_patch(patch)
    plt.text(vertices[0][0], vertices[0][1], text, fontsize=20, va="top")
plt.show()

检查响应Examine the response

成功的响应以 JSON 格式返回。A successful response is returned in JSON. 示例网页会在命令提示符窗口中分析和显示成功响应,如下例所示:The sample webpage parses and displays a successful response in the command prompt window, similar to the following example:

{
  "status": "succeeded",
  "createdDateTime": "2020-05-28T05:13:21Z",
  "lastUpdatedDateTime": "2020-05-28T05:13:22Z",
  "analyzeResult": {
    "version": "3.0.0",
    "readResults": [
      {
        "page": 1,
        "language": "en",
        "angle": 0.8551,
        "width": 2661,
        "height": 1901,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [
              67,
              646,
              2582,
              713,
              2580,
              876,
              67,
              821
            ],
            "text": "The quick brown fox jumps",
            "words": [
              {
                "boundingBox": [
                  143,
                  650,
                  435,
                  661,
                  436,
                  823,
                  144,
                  824
                ],
                "text": "The",
                "confidence": 0.958
              },
              {
                "boundingBox": [
                  540,
                  665,
                  926,
                  679,
                  926,
                  825,
                  541,
                  823
                ],
                "text": "quick",
                "confidence": 0.57
              },
              {
                "boundingBox": [
                  1125,
                  686,
                  1569,
                  700,
                  1569,
                  838,
                  1125,
                  828
                ],
                "text": "brown",
                "confidence": 0.799
              },
              {
                "boundingBox": [
                  1674,
                  703,
                  1966,
                  711,
                  1966,
                  851,
                  1674,
                  841
                ],
                "text": "fox",
                "confidence": 0.442
              },
              {
                "boundingBox": [
                  2083,
                  714,
                  2580,
                  725,
                  2579,
                  876,
                  2083,
                  855
                ],
                "text": "jumps",
                "confidence": 0.878
              }
            ]
          },
          {
            "boundingBox": [
              187,
              1062,
              485,
              1056,
              486,
              1120,
              189,
              1126
            ],
            "text": "over",
            "words": [
              {
                "boundingBox": [
                  190,
                  1064,
                  439,
                  1059,
                  441,
                  1122,
                  192,
                  1126
                ],
                "text": "over",
                "confidence": 0.37
              }
            ]
          },
          {
            "boundingBox": [
              664,
              1008,
              1973,
              1023,
              1969,
              1178,
              664,
              1154
            ],
            "text": "the lazy dog!",
            "words": [
              {
                "boundingBox": [
                  668,
                  1008,
                  923,
                  1015,
                  923,
                  1146,
                  669,
                  1117
                ],
                "text": "the",
                "confidence": 0.909
              },
              {
                "boundingBox": [
                  1107,
                  1018,
                  1447,
                  1023,
                  1445,
                  1178,
                  1107,
                  1162
                ],
                "text": "lazy",
                "confidence": 0.853
              },
              {
                "boundingBox": [
                  1639,
                  1024,
                  1974,
                  1023,
                  1971,
                  1170,
                  1636,
                  1178
                ],
                "text": "dog!",
                "confidence": 0.41
              }
            ]
          }
        ]
      }
    ]
  }
}

后续步骤Next steps

接下来,了解一款使用计算机视觉执行光学字符识别 (OCR) 功能的 Python 应用程序;创建智能裁剪缩略图;对图像中的视觉特征进行检测、分类、标记和说明。Next, explore a Python application that uses Computer Vision to perform optical character recognition (OCR); create smart-cropped thumbnails; and detect, categorize, tag, and describe visual features in images.