What is Computer Vision?

Azure's Computer Vision service gives developers access to advanced algorithms that process images and return information. To analyze an image, you can either upload the image or specify an image URL. The image-processing algorithms can analyze content in several different ways, depending on the visual features you're interested in. For example, Computer Vision can determine whether an image contains adult or racy content, or find all of the human faces in an image.

You can use Computer Vision in your application either through a native SDK or by calling the REST API directly. This page gives a broad overview of what you can do with Computer Vision.
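As a rough illustration of calling the REST API directly, the following Python sketch builds (but does not send) an Analyze Image request for either an image URL or uploaded image bytes. The `/vision/v3.2/analyze` path, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header reflect the v3.2 REST API as we understand it; confirm them against the API reference for the version you target. The endpoint and key values are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumption: v3.2 REST path. Check the API reference for your version.
ANALYZE_PATH = "/vision/v3.2/analyze"


def build_analyze_request(endpoint, key, features, image_url=None, image_bytes=None):
    """Build (but do not send) an Analyze Image request.

    Exactly one of image_url or image_bytes must be given, mirroring the
    two input options: specify an image URL or upload the image itself.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")

    # e.g. ["Tags", "Objects"] -> visualFeatures=Tags%2CObjects
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    url = endpoint.rstrip("/") + ANALYZE_PATH + "?" + query

    headers = {"Ocp-Apim-Subscription-Key": key}
    if image_url is not None:
        # Image referenced by URL: send a small JSON body.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": image_url}).encode("utf-8")
    else:
        # Uploaded image: send the raw bytes.
        headers["Content-Type"] = "application/octet-stream"
        body = image_bytes

    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping request construction separate makes it easy to inspect or test without network access.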

Analyze images for insight

You can analyze images to detect and provide insights about their visual features and characteristics. All of the features in the following table are provided by the Analyze Image API.

Action | Description
Tag visual features | Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides hints to clarify the context of the tag. Tagging isn't limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on.
Detect objects | Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, a cat, and a person, the Detect operation lists those objects together with their coordinates in the image. You can use this functionality to further process the relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image.
Categorize an image | Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone or with our new tagging models. Currently, English is the only supported language for tagging and categorizing images.
Describe an image | Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision's algorithms generate various descriptions based on the objects identified in the image. Each description is evaluated and assigned a confidence score, and the list is returned ordered from highest to lowest confidence.
Detect faces | Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face. Computer Vision provides a subset of the Face service functionality; use the Face service for more detailed analysis, such as facial identification and pose detection.
Detect image types | Detect characteristics of an image, such as whether it is a line drawing or how likely it is to be clip art.
Detect domain-specific content | Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities to determine whether the people detected in the image are known celebrities.
Detect the color scheme | Analyze color usage within an image. Computer Vision can determine whether an image is black and white or color and, for color images, identify the dominant and accent colors.
Generate a thumbnail | Analyze the contents of an image to generate an appropriate thumbnail for that image. Computer Vision first generates a high-quality thumbnail and then analyzes the objects within the image to determine the area of interest. It then crops the image to fit the requirements of the area of interest. Depending on your needs, the generated thumbnail can use an aspect ratio that differs from that of the original image.
Get the area of interest | Analyze the contents of an image to return the coordinates of the area of interest. Instead of cropping the image and generating a thumbnail, Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired.
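The following sketch shows how a caller might post-process an Analyze Image response that requested the `Objects` and `Description` features: counting multiple instances of the same object, converting each bounding rectangle to corner coordinates, and picking the highest-confidence caption. The field names (`objects`, `rectangle`, `description.captions`) assume the v3.2 JSON response shape; any sample data you test it with should be treated as illustrative, not real service output.

```python
from collections import Counter


def to_corners(rect):
    """Convert a {x, y, w, h} rectangle into (left, top, right, bottom)."""
    return (rect["x"], rect["y"], rect["x"] + rect["w"], rect["y"] + rect["h"])


def summarize_analysis(result):
    """Summarize an Analyze Image response (field names assume v3.2)."""
    objects = result.get("objects", [])
    # Count instances per object label, e.g. two dogs -> {"dog": 2}.
    counts = Counter(obj["object"] for obj in objects)
    # Pair each label with its bounding box in corner form.
    boxes = [(obj["object"], to_corners(obj["rectangle"])) for obj in objects]
    captions = result.get("description", {}).get("captions", [])
    # Pick the highest-confidence caption explicitly rather than relying on order.
    best = max(captions, key=lambda c: c["confidence"], default=None)
    return {
        "counts": dict(counts),
        "boxes": boxes,
        "caption": best["text"] if best else None,
    }
```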

Extract text from images

You can use the Computer Vision Read API to extract printed and handwritten text from images into a machine-readable character stream. The Read API uses our latest models and works with text on a variety of surfaces and backgrounds, such as receipts, posters, business cards, letters, and whiteboards. Currently, English is the only supported language.
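The Read API works asynchronously: in the v3.2 REST API (an assumption here; earlier versions used different paths), submitting an image returns an `Operation-Location` URL that you poll until its `status` field is `succeeded` or `failed`. This sketch shows the polling loop and the line extraction, assuming the v3.2 `analyzeResult.readResults` shape. `fetch_status` is a hypothetical callable you supply to perform the GET request, which also keeps the logic testable offline.

```python
import time


def wait_for_read_result(fetch_status, poll_interval=1.0, max_attempts=30, sleep=time.sleep):
    """Poll a Read operation until it finishes.

    fetch_status is a hypothetical caller-supplied function that performs a
    GET on the Operation-Location URL and returns the parsed JSON body.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        # "notStarted" or "running": wait before polling again.
        sleep(poll_interval)
    raise TimeoutError("Read operation did not finish in time")


def extract_lines(result):
    """Collect recognized text lines, page by page (assumes v3.2 field names)."""
    pages = result["analyzeResult"]["readResults"]
    return [line["text"] for page in pages for line in page["lines"]]
```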

You can also use the optical character recognition (OCR) API to extract printed text in several languages. If needed, OCR corrects the rotation of the recognized text and provides the frame coordinates of each word. OCR supports 25 languages and automatically detects the language of the recognized text.
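For the OCR API, the response groups recognized text into regions, lines, and words, and each `boundingBox` is a comma-separated `left,top,width,height` string. The sketch below flattens that structure into (text, box) pairs in reading order. The field names follow the OCR response as we understand it; in the request, `language=unk` asks the service to detect the language and `detectOrientation=true` asks it to correct rotation. Verify both against the API reference for your version.

```python
def parse_box(box):
    """Turn an OCR boundingBox string "left,top,width,height" into ints."""
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def ocr_words(result):
    """Flatten an OCR response into (text, box) pairs, region by region."""
    words = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                words.append((word["text"], parse_box(word["boundingBox"])))
    return words
```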

Moderate content in images

You can use Computer Vision to detect adult and racy content in an image and return a confidence score for each. You can set the filter for adult and racy content detection on a sliding scale to suit your preferences.
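One way to implement that sliding scale on the client is to compare the returned scores against your own thresholds. This minimal sketch assumes the `adult` object from an Analyze Image response requested with `visualFeatures=Adult`, with `adultScore` and `racyScore` fields as in the v3.2 response; the 0.5 defaults are arbitrary placeholders, not recommended values.

```python
def moderate(adult_info, adult_threshold=0.5, racy_threshold=0.5):
    """Apply your own thresholds to the adult and racy confidence scores.

    adult_info is the "adult" object from an Analyze Image response
    (field names assume v3.2). Lower thresholds block more images;
    higher thresholds block fewer.
    """
    return {
        "block_adult": adult_info["adultScore"] >= adult_threshold,
        "block_racy": adult_info["racyScore"] >= racy_threshold,
    }
```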

Image requirements

Computer Vision can analyze images that meet the following requirements:

  • The image must be in JPEG, PNG, GIF, or BMP format
  • The file size of the image must be less than 4 megabytes (MB)
  • The dimensions of the image must be greater than 50 x 50 pixels
    • For OCR, the dimensions of the image must be between 50 x 50 and 4200 x 4200 pixels
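Checking these requirements locally before calling the service avoids a round trip for images that would be rejected. A minimal sketch follows. It assumes 4 MB means 4 * 1024 * 1024 bytes, reads "greater than 50 x 50" as both sides strictly greater than 50 pixels (applied to OCR as well), and treats the OCR maximum of 4200 pixels as inclusive. How you obtain the format and dimensions, for example with an imaging library, is left to the caller.

```python
ALLOWED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_BYTES = 4 * 1024 * 1024  # "less than 4 MB"; assumes binary megabytes
MIN_SIDE = 50                # both sides must be strictly greater than this
OCR_MAX_SIDE = 4200          # OCR upper bound, treated as inclusive


def check_image(fmt, size_bytes, width, height, for_ocr=False):
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append("unsupported format: " + fmt)
    if size_bytes >= MAX_BYTES:
        problems.append("file must be smaller than 4 MB")
    if width <= MIN_SIDE or height <= MIN_SIDE:
        problems.append("image must be larger than 50 x 50 pixels")
    if for_ocr and (width > OCR_MAX_SIDE or height > OCR_MAX_SIDE):
        problems.append("OCR images must be at most 4200 x 4200 pixels")
    return problems
```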

Data privacy and security

As with all of the Cognitive Services, developers using the Computer Vision service should be aware of Microsoft's policies on customer data. See the Cognitive Services page on the Microsoft Trust Center to learn more.

Next steps

Get started with Computer Vision by following a quickstart guide: