Azure Cognitive Services Computer Vision SDK for Python

2023-11-10

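The Computer Vision service gives developers access to advanced algorithms that process images and return information. Computer Vision algorithms analyze the content of an image in different ways, depending on the visual features you are interested in.

Analyze an image
Get subject domain list
Analyze an image by domain
Get text description of an image
Get text from an image
Generate thumbnail

For more information about this service, see "What is Computer Vision?".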


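Prerequisites

Python 3.6+
A Computer Vision key and its associated endpoint. You need these values when you create the ComputerVisionClient client object. Use the following method to obtain them.

If you have an Azure subscription

The easiest way to create a resource in your subscription is the following Azure CLI command. It creates a Cognitive Services key that can be used across many Cognitive Services. You need to choose an existing resource group name (for example, "my-cogserv-group") and a new Computer Vision resource name (for example, "my-computer-vision-resource").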

RES_REGION=chinanorth
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

az cognitiveservices account create \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --location $RES_REGION \
    --kind CognitiveServices \
    --sku S0 \
    --yes

Install the SDK

Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:

pip install azure-cognitiveservices-vision-computervision

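Authentication

Once you create your Computer Vision resource, you need its endpoint and one of its account keys to instantiate the ComputerVisionClient client object.

For example, set them as environment variables in a Bash terminal (the angle-bracket placeholders stand for your own values):

export ACCOUNT_ENDPOINT=<computervision-endpoint>
export ACCOUNT_KEY=<computervision-account-key>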

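For Azure subscription users, get credentials for key and endpoint

If you do not remember your endpoint and key, you can use the following method to find them. If you need to create a key and endpoint, use the method for Azure subscription holders.

Use the following Azure CLI snippet to populate two environment variables with the Computer Vision account's endpoint and one of its keys (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.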

RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>

export ACCOUNT_ENDPOINT=$(az cognitiveservices account show \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query endpoint \
    --output tsv)

export ACCOUNT_KEY=$(az cognitiveservices account keys list \
    --resource-group $RES_GROUP \
    --name $ACCT_NAME \
    --query key1 \
    --output tsv)

Create client

Get the endpoint and key from environment variables, then create the ComputerVisionClient client object.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

# Get endpoint and key from environment variables
import os
endpoint = os.environ['ACCOUNT_ENDPOINT']
key = os.environ['ACCOUNT_KEY']

# Set credentials
credentials = CognitiveServicesCredentials(key)

# Create client
client = ComputerVisionClient(endpoint, credentials)
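
The lookup above raises a bare KeyError if either variable is unset. The following is a minimal sketch of a friendlier lookup that uses only the standard library. The helper name read_vision_settings is hypothetical and is not part of the SDK.

```python
import os


def read_vision_settings(env=None):
    """Return (endpoint, key) from the environment, or raise a clear error.

    read_vision_settings is an illustrative helper, not an SDK function.
    """
    env = os.environ if env is None else env
    required = ("ACCOUNT_ENDPOINT", "ACCOUNT_KEY")
    # Treat empty strings the same as missing variables
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ACCOUNT_ENDPOINT"], env["ACCOUNT_KEY"]
```

The returned pair can then be passed on as in the example above: the key to CognitiveServicesCredentials and the endpoint to ComputerVisionClient.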

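Examples

Before using any of the following tasks, you need a ComputerVisionClient client object.

Analyze an image

You can analyze an image for certain features with analyze_image. Use the visual_features parameter to set the types of analysis to perform on the image. Common values are VisualFeatureTypes.tags and VisualFeatureTypes.description.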

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"

image_analysis = client.analyze_image(url, visual_features=[VisualFeatureTypes.tags])

for tag in image_analysis.tags:
    print(tag)
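
Printing a tag object shows its raw representation. Assuming each tag exposes name and confidence attributes, a sketch like the following filters and formats them. The helper format_tags and the 0.5 threshold are illustrative choices, and stand-in objects are used so the snippet runs without calling the service.

```python
from types import SimpleNamespace


def format_tags(tags, min_confidence=0.5):
    """Return 'name (NN.N%)' strings for tags at or above min_confidence, highest first."""
    kept = [t for t in tags if t.confidence >= min_confidence]
    kept.sort(key=lambda t: t.confidence, reverse=True)
    return ["{} ({:.1%})".format(t.name, t.confidence) for t in kept]


# Stand-in data shaped like image_analysis.tags; real values come from the service
sample_tags = [
    SimpleNamespace(name="building", confidence=0.91),
    SimpleNamespace(name="night", confidence=0.42),
    SimpleNamespace(name="city", confidence=0.987),
]
for line in format_tags(sample_tags):
    print(line)
```

With a real response, the call would be format_tags(image_analysis.tags).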

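Get subject domain list

Use list_models to review the subject domains used to analyze your image. These domain names are used when analyzing an image by domain. An example of a domain is landmarks.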

models = client.list_models()

for x in models.models_property:
    print(x)

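Analyze an image by domain

You can analyze an image by subject domain with analyze_image_by_domain. Get the list of supported subject domains in order to use the correct domain name.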

# type of prediction
domain = "landmarks"

# Public domain image of Eiffel tower
url = "https://images.pexels.com/photos/338515/pexels-photo-338515.jpeg"

# English language response
language = "en"

analysis = client.analyze_image_by_domain(domain, url, language)

for landmark in analysis.result["landmarks"]:
    print(landmark["name"])
    print(landmark["confidence"])
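
The loop above assumes at least one landmark was recognized. As a hedged sketch, the helper below (best_landmark, a hypothetical name) picks the most confident entry and returns None when the list is empty. It relies only on the dictionary shape used in the example above, and the sample data is made up.

```python
def best_landmark(result):
    """Return (name, confidence) of the most confident landmark, or None if there is none."""
    landmarks = result.get("landmarks") or []
    if not landmarks:
        return None
    top = max(landmarks, key=lambda item: item["confidence"])
    return top["name"], top["confidence"]


# Stand-in for analysis.result; the values are invented for illustration
sample_result = {
    "landmarks": [
        {"name": "Eiffel Tower", "confidence": 0.97},
        {"name": "Tokyo Tower", "confidence": 0.12},
    ]
}
print(best_landmark(sample_result))
print(best_landmark({"landmarks": []}))
```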

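Get text description of an image

You can use describe_image to get a language-based text description of an image. If you are doing text analysis for keywords associated with the image, request several descriptions by passing the maximum number of candidates (max_descriptions in the example below). Examples of text descriptions for the following image include "a train crossing a bridge over a body of water", "a large bridge over a body of water", and "a train crossing a bridge over a large body of water".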

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

analysis = client.describe_image(url, max_descriptions, language)

for caption in analysis.captions:
    print(caption.text)
    print(caption.confidence)

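Get text from image

You can get any handwritten or printed text from an image. This requires two calls to the SDK: batch_read_file and get_read_operation_result. The call to batch_read_file is asynchronous. In the results of the get_read_operation_result call, check whether the first call has completed with TextOperationStatusCodes before extracting the text data. The results include the text as well as the bounding box coordinates for the text.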

# import models
from azure.cognitiveservices.vision.computervision.models import TextOperationStatusCodes
import time

url = "https://azurecomcdn.azureedge.net/cvt-1979217d3d0d31c5c87cbd991bccfee2d184b55eeb4081200012bdaf6a65601a/images/shared/cognitive-services-demos/read-text/read-1-thumbnail.png"
raw = True
custom_headers = None
numberOfCharsInOperationId = 36

# Async SDK call
rawHttpResponse = client.batch_read_file(url, custom_headers,  raw)

# Get ID from returned headers
operationLocation = rawHttpResponse.headers["Operation-Location"]
idLocation = len(operationLocation) - numberOfCharsInOperationId
operationId = operationLocation[idLocation:]

# SDK call
while True:
    result = client.get_read_operation_result(operationId)
    if result.status not in ['NotStarted', 'Running']:
        break
    time.sleep(1)

# Get data
if result.status == TextOperationStatusCodes.succeeded:
    for textResult in result.recognition_results:
        for line in textResult.lines:
            print(line.text)
            print(line.bounding_box)
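
The example above slices a fixed number of characters off the Operation-Location URL and polls without an upper bound. The sketch below shows one possible alternative, assuming the operation ID is the final path segment of that URL (which the fixed-length slice also relies on). Both helper names and the timeout defaults are hypothetical.

```python
import time


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    return operation_location.rstrip("/").rsplit("/", 1)[-1]


def wait_for_read_result(get_result, operation_id, timeout=30.0, interval=1.0):
    """Poll get_result(operation_id) until the status leaves NotStarted/Running.

    Raises TimeoutError if the operation is still pending once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_result(operation_id)
        if result.status not in ("NotStarted", "Running"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "Read operation {} did not finish within {} seconds".format(operation_id, timeout)
            )
        time.sleep(interval)
```

With the client from the earlier sections, usage would look like wait_for_read_result(client.get_read_operation_result, operationId). Passing the method as a callable keeps the helper independent of the SDK.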

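Generate thumbnail

You can use generate_thumbnail to generate a thumbnail (JPG) of an image. The thumbnail does not need to have the same proportions as the original image.

Install Pillow to use this example:

pip install Pillow

Once Pillow is installed, use the package in the following code example to generate the thumbnail image.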

# Pillow package
from PIL import Image

# IO package to create local image
import io

width = 50
height = 50
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"

thumbnail = client.generate_thumbnail(width, height, url)

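# The thumbnail is streamed in chunks; join them before decoding,
# since a single chunk may not hold the whole image
image = Image.open(io.BytesIO(b"".join(thumbnail)))

image.save('thumbnail.jpg')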
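
If you only need the file on disk, Pillow is optional. The sketch below assumes generate_thumbnail yields the image as a sequence of byte chunks and writes them straight to a file. The helper save_chunks is a hypothetical name, and fake bytes stand in for the service response.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    total = 0
    with open(path, "wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
            total += len(chunk)
    return total


# Stand-in for the generate_thumbnail stream; real chunks come from the service
fake_stream = iter([b"\xff\xd8", b"fake-jpeg-body", b"\xff\xd9"])
target = os.path.join(tempfile.mkdtemp(), "thumbnail.jpg")
written = save_chunks(fake_stream, target)
print(written)
```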

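Troubleshooting

General

When you interact with the ComputerVisionClient client object using the Python SDK, errors are returned through the ComputerVisionErrorException class. Errors returned by the service correspond to the same HTTP status codes returned for REST API requests.

For example, if you try to analyze an image with an invalid key, a 401 error is returned. The following code snippet handles the error gracefully by catching the exception and displaying additional information about the error.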


from azure.cognitiveservices.vision.computervision.models import ComputerVisionErrorException

url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3

try:
    analysis = client.describe_image(url, max_descriptions, language)

    for caption in analysis.captions:
        print(caption.text)
        print(caption.confidence)
except ComputerVisionErrorException as e:
    # The HTTP status code is carried on the underlying response object
    if e.response.status_code == 401:
        print("Error unauthorized. Make sure your key and endpoint are correct.")
    else:
        raise

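Handle transient errors with retries

While working with the ComputerVisionClient client, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems such as network outages.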
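
One common way to cope with such failures is to retry with exponential backoff. The following is a minimal sketch under stated assumptions, not a feature of the SDK: call_with_retries, its parameters, and the jitter scheme are all illustrative, and the caller decides which errors count as transient.

```python
import random
import time


def call_with_retries(operation, is_transient, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); after a transient error, wait and try again.

    The wait grows as base_delay * 2**attempt, plus up to 50% random jitter.
    Non-transient errors and the final failed attempt are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            last_try = attempt == max_attempts - 1
            if last_try or not is_transient(error):
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With the client from the earlier sections, operation could be a lambda wrapping client.describe_image(...), and is_transient could inspect the exception's HTTP status (for example, treating 429 as retryable). Which status codes to retry is a design decision for your application.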

Next steps

Applying content tags to images
