The Computer Vision service provides developers with access to advanced algorithms for processing images and returning information. Computer Vision algorithms analyze the content of an image in different ways, depending on the visual features you're interested in.
- Analyze an image
- Get subject domain list
- Analyze an image by domain
- Get text description of an image
- Get handwritten text from image
- Generate thumbnail
For more information about this service, see What is Computer Vision?.
- Python 3.6+
- Computer Vision key and associated endpoint. You need these values when you create the instance of the ComputerVisionClient client object. Use one of the following methods to get these values.
The easiest way to create a resource in your subscription is to use the following Azure CLI command. It creates a Cognitive Services key that can be used across many Cognitive Services. Choose an existing resource group name, for example, "my-cogserv-group", and a new Computer Vision resource name, such as "my-computer-vision-resource".
RES_REGION=chinanorth
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>
az cognitiveservices account create \
--resource-group $RES_GROUP \
--name $ACCT_NAME \
--location $RES_REGION \
--kind CognitiveServices \
--sku S0 \
--yes
Install the Azure Cognitive Services Computer Vision SDK for Python package with pip:
pip install azure-cognitiveservices-vision-computervision
Once you create your Computer Vision resource, you need its endpoint and one of its account keys to instantiate the ComputerVisionClient client object.
For example, use the Bash terminal to set the environment variables:
ACCOUNT_ENDPOINT=<account-endpoint>
ACCOUNT_KEY=<account-key>
If you do not remember your endpoint and key, you can use the following method to find them. If you need to create a key and endpoint, you can use the method for Azure subscription holders.
Use the Azure CLI snippet below to populate two environment variables with the Computer Vision account endpoint and one of its keys (you can also find these values in the Azure portal). The snippet is formatted for the Bash shell.
RES_GROUP=<resourcegroup-name>
ACCT_NAME=<computervision-account-name>
export ACCOUNT_ENDPOINT=$(az cognitiveservices account show \
--resource-group $RES_GROUP \
--name $ACCT_NAME \
--query endpoint \
--output tsv)
export ACCOUNT_KEY=$(az cognitiveservices account keys list \
--resource-group $RES_GROUP \
--name $ACCT_NAME \
--query key1 \
--output tsv)
Get the endpoint and key from environment variables, then create the ComputerVisionClient client object.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
# Get endpoint and key from environment variables
import os
endpoint = os.environ['ACCOUNT_ENDPOINT']
key = os.environ['ACCOUNT_KEY']
# Set credentials
credentials = CognitiveServicesCredentials(key)
# Create client
client = ComputerVisionClient(endpoint, credentials)
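If either environment variable is unset, the code above fails with a bare KeyError. As a small sketch (the require_env helper below is illustrative, not part of the SDK), you can fail fast with a clearer message before constructing the client:

```python
import os

def require_env(name):
    """Return the value of an environment variable, failing fast with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it before creating the ComputerVisionClient."
        )
    return value

# Hypothetical usage, validating both values before constructing the client:
# endpoint = require_env('ACCOUNT_ENDPOINT')
# key = require_env('ACCOUNT_KEY')
```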
You need a ComputerVisionClient client object before using any of the following tasks.
You can analyze an image for certain features with analyze_image. Use the visual_features property to set the types of analysis to perform on the image. Common values are VisualFeatureTypes.tags and VisualFeatureTypes.description.
url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"
image_analysis = client.analyze_image(url,visual_features=[VisualFeatureTypes.tags])
for tag in image_analysis.tags:
print(tag)
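Each returned tag carries a confidence score, so you can filter out low-confidence tags before using them. A minimal sketch, using a namedtuple to stand in for the SDK's tag model (the name and confidence attribute names match what the SDK returns):

```python
from collections import namedtuple

# Stand-in for the SDK's ImageTag model (same attribute names)
Tag = namedtuple("Tag", ["name", "confidence"])

def confident_tags(tags, threshold=0.5):
    """Keep only tags whose confidence meets the threshold, highest first."""
    kept = [t for t in tags if t.confidence >= threshold]
    return sorted(kept, key=lambda t: t.confidence, reverse=True)

tags = [Tag("night", 0.98), Tag("street", 0.92), Tag("fog", 0.31)]
print([t.name for t in confident_tags(tags)])  # -> ['night', 'street']
```

The same pattern works directly on image_analysis.tags from the call above.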
Review the subject domains used to analyze your image with list_models. These domain names are used when analyzing an image by domain. An example of a domain is landmarks.
models = client.list_models()
for x in models.models_property:
    print(x)
You can analyze an image by subject domain with analyze_image_by_domain. Get the list of supported subject domains in order to use the correct domain name.
# type of prediction
domain = "landmarks"
# Public domain image of Eiffel tower
url = "https://images.pexels.com/photos/338515/pexels-photo-338515.jpeg"
# English language response
language = "en"
analysis = client.analyze_image_by_domain(domain, url, language)
for landmark in analysis.result["landmarks"]:
    print(landmark["name"])
    print(landmark["confidence"])
You can get a language-based text description of an image with describe_image. Request several descriptions with the max_descriptions parameter if you are doing text analysis for keywords associated with the image. Examples of a text description for the following image include "a train crossing a bridge over a body of water", "a large bridge over a body of water", and "a train crossing a bridge over a large body of water".
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3
analysis = client.describe_image(url, max_descriptions, language)
for caption in analysis.captions:
    print(caption.text)
    print(caption.confidence)
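When you request several descriptions, you often just want the one the service is most confident about. A minimal sketch, using a namedtuple to stand in for the SDK's caption model (the text and confidence attribute names match what the SDK returns):

```python
from collections import namedtuple

# Stand-in for the SDK's ImageCaption model (same attribute names)
Caption = namedtuple("Caption", ["text", "confidence"])

def best_caption(captions):
    """Return the caption with the highest confidence, or None if there are none."""
    return max(captions, key=lambda c: c.confidence, default=None)

captions = [
    Caption("a train crossing a bridge over a body of water", 0.91),
    Caption("a large bridge over a body of water", 0.63),
]
print(best_caption(captions).text)  # -> a train crossing a bridge over a body of water
```

The same helper works on analysis.captions from the describe_image call above.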
You can get any handwritten or printed text from an image. This requires two calls to the SDK: batch_read_file and get_read_operation_result. The call to batch_read_file is asynchronous. In the results of the get_read_operation_result call, check that the first call completed with TextOperationStatusCodes.succeeded before extracting the text data. The results include the text as well as the bounding box coordinates for the text.
# import models
from azure.cognitiveservices.vision.computervision.models import TextOperationStatusCodes
import time
url = "https://azurecomcdn.azureedge.net/cvt-1979217d3d0d31c5c87cbd991bccfee2d184b55eeb4081200012bdaf6a65601a/images/shared/cognitive-services-demos/read-text/read-1-thumbnail.png"
raw = True
custom_headers = None
numberOfCharsInOperationId = 36
# Async SDK call
rawHttpResponse = client.batch_read_file(url, custom_headers, raw)
# Get ID from returned headers
operationLocation = rawHttpResponse.headers["Operation-Location"]
idLocation = len(operationLocation) - numberOfCharsInOperationId
operationId = operationLocation[idLocation:]
# SDK call
while True:
    result = client.get_read_operation_result(operationId)
    if result.status not in ['NotStarted', 'Running']:
        break
    time.sleep(1)

# Get data
if result.status == TextOperationStatusCodes.succeeded:
    for textResult in result.recognition_results:
        for line in textResult.lines:
            print(line.text)
            print(line.bounding_box)
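The operation-ID extraction above relies on a hard-coded 36-character length. A slightly more robust sketch (this helper is illustrative, not part of the SDK) takes everything after the last slash of the Operation-Location URL instead:

```python
def operation_id_from_location(operation_location):
    """Extract the operation ID from an Operation-Location header value.

    The service returns a URL whose final path segment is the operation ID,
    so taking the text after the last '/' avoids hard-coding its length.
    """
    return operation_location.rstrip("/").rsplit("/", 1)[-1]

# Hypothetical header value for illustration
location = "https://example.api.endpoint/vision/textOperations/123e4567-e89b-12d3-a456-426614174000"
print(operation_id_from_location(location))  # -> 123e4567-e89b-12d3-a456-426614174000
```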
You can generate a thumbnail (JPG) of an image with generate_thumbnail. The thumbnail does not need to have the same proportions as the original image.
Install Pillow to use this example:
pip install Pillow
Once Pillow is installed, use the package in the following code example to generate the thumbnail image.
# Pillow package
from PIL import Image
# IO package to create local image
import io
width = 50
height = 50
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
thumbnail = client.generate_thumbnail(width, height, url)

# The SDK returns the thumbnail as a stream of byte chunks; join them before opening
image = Image.open(io.BytesIO(b"".join(thumbnail)))
image.save('thumbnail.jpg')
When you interact with the ComputerVisionClient client object using the Python SDK, the ComputerVisionErrorException class is used to return errors. Errors returned by the service correspond to the same HTTP status codes returned for REST API requests. For example, if you try to analyze an image with an invalid key, a 401 error is returned. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.
domain = "landmarks"
url = "http://www.public-domain-photos.com/free-stock-photos-4/travel/san-francisco/golden-gate-bridge-in-san-francisco.jpg"
language = "en"
max_descriptions = 3
from azure.cognitiveservices.vision.computervision.models import ComputerVisionErrorException

try:
    analysis = client.describe_image(url, max_descriptions, language)
    for caption in analysis.captions:
        print(caption.text)
        print(caption.confidence)
except ComputerVisionErrorException as e:
    if e.response.status_code == 401:
        print("Error unauthorized. Make sure your key and endpoint are correct.")
    else:
        raise
While working with the ComputerVisionClient client, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems like network outages.
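One common way to handle such transient failures is to retry the call with exponential backoff. A minimal sketch under that assumption (the helper below is illustrative, not part of the SDK), retrying a callable a few times with increasing delays between attempts:

```python
import time

def call_with_retries(func, attempts=3, initial_delay=1.0, retriable=(Exception,)):
    """Call func(), retrying on retriable exceptions with exponential backoff."""
    delay = initial_delay
    for attempt in range(attempts):
        try:
            return func()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            time.sleep(delay)
            delay *= 2

# Hypothetical usage with the client created earlier:
# analysis = call_with_retries(
#     lambda: client.analyze_image(url, visual_features=[VisualFeatureTypes.tags]))
```

In production you would typically narrow retriable to the exception types the service raises for rate limiting, rather than retrying every exception.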