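Image Analysis cognitive skill
The Image Analysis skill extracts a rich set of visual features based on the image content. For example, you can generate a caption from an image, generate tags, or identify celebrities and landmarks. This article is the reference documentation for the Image Analysis skill. For usage instructions, see Extract text and information from images.
This skill uses the machine learning models provided by Azure AI Vision in Azure AI services. Image Analysis works on images that meet the following requirements: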
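- The image must be in JPEG, PNG, GIF, or BMP format
- The file size of the image must be less than 4 megabytes (MB)
- The dimensions of the image must be greater than 50 x 50 pixels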
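The requirements above can be checked before an image is sent for enrichment. The following is a minimal sketch, not part of the skill itself: the function name `meets_image_requirements` and the constants are hypothetical. It assumes that 4 MB means 4 × 1024 × 1024 bytes, that "greater than 50 x 50" applies to both width and height, and that the caller already knows the image format and pixel dimensions.

```python
# Hypothetical pre-check mirroring the image requirements listed above.
SUPPORTED_FORMATS = {"JPEG", "PNG", "GIF", "BMP"}
MAX_FILE_SIZE_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_DIMENSION_PIXELS = 50


def meets_image_requirements(image_format, size_bytes, width, height):
    """Return True if the image satisfies the format, size, and dimension limits."""
    if image_format.upper() not in SUPPORTED_FORMATS:
        return False
    if size_bytes >= MAX_FILE_SIZE_BYTES:  # file size must be less than 4 MB
        return False
    # Assumption: both dimensions must be greater than 50 pixels.
    return width > MIN_DIMENSION_PIXELS and height > MIN_DIMENSION_PIXELS
```

Images that fail a check like this could be skipped or resized before indexing, rather than surfacing as skill errors.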
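This skill is implemented using the AI Image Analysis API version 3.2. If your solution requires calling a newer version of that service API (such as version 4.0), consider implementing it through a Web API custom skill.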
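Note
This skill is bound to Azure AI services and requires a billable resource for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing Azure AI services pay-as-you-go price.
In addition, image extraction is billable by Azure AI Search.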
@odata.type
Microsoft.Skills.Vision.ImageAnalysisSkill
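Skill parameters
Parameters are case-sensitive.
Parameter name | Description |
---|---|
defaultLanguageCode | A string indicating the language to return. The service returns recognition results in the specified language. If this parameter isn't specified, the default value is "en". Supported languages include a subset of the generally available languages of Azure AI Vision. When a language is newly introduced with general availability status in the AI Vision service, there is an expected delay before it is fully integrated within this skill. |
visualFeatures | An array of strings indicating the visual feature types to return. Valid visual feature types include adult, brands, categories, description, faces, objects, and tags, as used in the sample skill definition in this article. Refer to the Azure AI Vision documentation for which visual features each defaultLanguageCode supports. |
details | An array of strings indicating which domain-specific details to return. Valid detail types include celebrities and landmarks. |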
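Skill inputs
Input name | Description |
---|---|
image | Complex type. Currently only works with the "/document/normalized_images" field, produced by the Azure blob indexer when imageAction is set to a value other than none. |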
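Skill outputs
Output name | Description |
---|---|
adult | Output is a single adult object of a complex type, consisting of boolean fields (isAdultContent, isGoryContent, isRacyContent) and double type scores (adultScore, goreScore, racyScore). |
brands | Output is an array of brand objects, where each object is a complex type consisting of name (string) and a confidence score (double). It also returns a rectangle with four bounding box coordinates (x, y, w, h, in pixels) indicating placement inside the image. For the rectangle, x and y are the top left. Bottom left is x, y+h. Top right is x+w, y. Bottom right is x+w, y+h. |
categories | Output is an array of category objects, where each category object is a complex type consisting of a name (string), score (double), and optional detail that contains celebrity or landmark details. See the category taxonomy for the full list of category names. A detail is a nested complex type. A celebrity detail consists of a name, confidence score, and face bounding box. A landmark detail consists of a name and confidence score. |
description | Output is a single description object of a complex type, consisting of a list of tags and a list of captions (an array whose items consist of text (string) and confidence (double)). |
faces | Complex type consisting of age, gender, and faceBoundingBox, which has four bounding box coordinates (in pixels) indicating placement inside the image. Coordinates are top, left, width, height. |
objects | Output is an array of visual feature objects. Each object is a complex type consisting of object (string), confidence (double), rectangle (with four bounding box coordinates indicating placement inside the image), and a parent that includes an object name and confidence. |
tags | Output is an array of imageTag objects, where a tag object is a complex type consisting of name (string), hint (string), and confidence (double). The addition of a hint is rare. It's only generated if a tag is ambiguous. For example, an image tagged as "curling" might have a hint of "sports" to better indicate its content. |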
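The corner arithmetic described for the brands rectangle can be sketched as follows. This is an illustrative helper only: the function name `rectangle_corners` and the returned key names are hypothetical, and the input values are taken from the brands entry in the sample output later in this article.

```python
def rectangle_corners(rectangle):
    """Return the four corners of an x/y/w/h rectangle as (x, y) tuples in pixels."""
    x, y = rectangle["x"], rectangle["y"]
    w, h = rectangle["w"], rectangle["h"]
    return {
        "top_left": (x, y),
        "top_right": (x + w, y),
        "bottom_left": (x, y + h),
        "bottom_right": (x + w, y + h),
    }


# Rectangle of the "Microsoft" brand in the sample output.
corners = rectangle_corners({"x": 20, "y": 97, "w": 62, "h": 52})
print(corners["bottom_right"])  # (82, 149)
```

The same arithmetic applies to the rectangle returned for each entry in objects, which uses the same x, y, w, h fields.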
Sample skill definition
{
"description": "Extract image analysis.",
"@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
"context": "/document/normalized_images/*",
"defaultLanguageCode": "en",
"visualFeatures": [
"adult",
"brands",
"categories",
"description",
"faces",
"objects",
"tags"
],
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "adult"
},
{
"name": "brands"
},
{
"name": "categories"
},
{
"name": "description"
},
{
"name": "faces"
},
{
"name": "objects"
},
{
"name": "tags"
}
]
}
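When only some visual features are needed, a definition like the one above can be generated rather than written by hand. The sketch below is one possible approach, not an official SDK call: the helper name `build_image_analysis_skill` is hypothetical, and it assumes, as in the sample, that each requested visual feature is exposed as an output of the same name.

```python
import json

ODATA_TYPE = "#Microsoft.Skills.Vision.ImageAnalysisSkill"
IMAGES_PATH = "/document/normalized_images/*"


def build_image_analysis_skill(visual_features, language="en"):
    """Build a skill definition with one output per requested visual feature."""
    return {
        "description": "Extract image analysis.",
        "@odata.type": ODATA_TYPE,
        "context": IMAGES_PATH,
        "defaultLanguageCode": language,
        "visualFeatures": list(visual_features),
        "inputs": [{"name": "image", "source": IMAGES_PATH}],
        # Assumption: output names mirror the visual feature names, as in the sample.
        "outputs": [{"name": feature} for feature in visual_features],
    }


skill = build_image_analysis_skill(["tags", "description"])
print(json.dumps(skill, indent=2))
```

The printed JSON can be placed in the skills array of a skillset definition.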
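Sample index
For single objects (such as adult and description), you can structure them in the index as a Collection(Edm.ComplexType) to return adult and description output for all of them. For more information on mapping outputs to index fields, see Flattening information from complex types.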
{
"fields": [
{
"name": "metadata_storage_name",
"type": "Edm.String",
"key": true,
"searchable": true,
"filterable": false,
"facetable": false,
"sortable": true
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false,
"sortable": true
},
{
"name": "content",
"type": "Edm.String",
"sortable": false,
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "adult",
"type": "Edm.ComplexType",
"fields": [
{
"name": "isAdultContent",
"type": "Edm.Boolean",
"searchable": false,
"filterable": true,
"facetable": true
},
{
"name": "isGoryContent",
"type": "Edm.Boolean",
"searchable": false,
"filterable": true,
"facetable": true
},
{
"name": "isRacyContent",
"type": "Edm.Boolean",
"searchable": false,
"filterable": true,
"facetable": true
},
{
"name": "adultScore",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "goreScore",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "racyScore",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
},
{
"name": "brands",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "rectangle",
"type": "Edm.ComplexType",
"fields": [
{
"name": "x",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "y",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "w",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "h",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
},
{
"name": "categories",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "score",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "detail",
"type": "Edm.ComplexType",
"fields": [
{
"name": "celebrities",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "faceBoundingBox",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "x",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "y",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
}
]
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
},
{
"name": "landmarks",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
}
]
},
{
"name": "description",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "tags",
"type": "Collection(Edm.String)",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "captions",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "text",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
},
{
"name": "faces",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "age",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "gender",
"type": "Edm.String",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "faceBoundingBox",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "top",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "left",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "width",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "height",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
},
{
"name": "objects",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "object",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "rectangle",
"type": "Edm.ComplexType",
"fields": [
{
"name": "x",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "y",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "w",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
},
{
"name": "h",
"type": "Edm.Int32",
"searchable": false,
"filterable": false,
"facetable": false
}
]
},
{
"name": "parent",
"type": "Edm.ComplexType",
"fields": [
{
"name": "object",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
},
{
"name": "tags",
"type": "Collection(Edm.ComplexType)",
"fields": [
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "hint",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"facetable": false
},
{
"name": "confidence",
"type": "Edm.Double",
"searchable": false,
"filterable": false,
"facetable": false
}
]
}
]
}
Sample output field mapping
The target field can be a complex field or a collection. The index definition specifies any subfields.
"outputFieldMappings": [
{
"sourceFieldName": "/document/normalized_images/*/adult",
"targetFieldName": "adult"
},
{
"sourceFieldName": "/document/normalized_images/*/brands/*",
"targetFieldName": "brands"
},
{
"sourceFieldName": "/document/normalized_images/*/categories/*",
"targetFieldName": "categories"
},
{
"sourceFieldName": "/document/normalized_images/*/description",
"targetFieldName": "description"
},
{
"sourceFieldName": "/document/normalized_images/*/faces/*",
"targetFieldName": "faces"
},
{
"sourceFieldName": "/document/normalized_images/*/objects/*",
"targetFieldName": "objects"
},
{
"sourceFieldName": "/document/normalized_images/*/tags/*",
"targetFieldName": "tags"
}
]
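The mapping above follows a pattern: outputs that are single objects (adult and description) map directly, while array outputs map with a trailing `/*`. A minimal sketch of that pattern, using a hypothetical helper name `build_output_field_mappings` and assuming the target field is named after the output:

```python
IMAGES_PATH = "/document/normalized_images/*"
# In the sample mapping, adult and description are single objects; the rest are arrays.
SINGLE_OBJECT_OUTPUTS = {"adult", "description"}


def build_output_field_mappings(output_names):
    """Return outputFieldMappings entries following the pattern of the sample mapping."""
    mappings = []
    for name in output_names:
        suffix = "" if name in SINGLE_OBJECT_OUTPUTS else "/*"
        mappings.append({
            "sourceFieldName": f"{IMAGES_PATH}/{name}{suffix}",
            "targetFieldName": name,  # assumption: index field is named after the output
        })
    return mappings


mappings = build_output_field_mappings(["adult", "brands"])
```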
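Variation on output field mappings (nested properties)
You can define output field mappings to lower-level properties, such as just celebrities or landmarks. In this case, make sure your index schema has a field to contain each detail specifically.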
"outputFieldMappings": [
{
"sourceFieldName": "/document/normalized_images/*/categories/detail/celebrities/*",
"targetFieldName": "celebrities"
},
{
"sourceFieldName": "/document/normalized_images/*/categories/detail/landmarks/*",
"targetFieldName": "landmarks"
}
]
Sample input
{
"values": [
{
"recordId": "1",
"data": {
"image": {
"data": "BASE64 ENCODED STRING OF A JPEG IMAGE",
"width": 500,
"height": 300,
"originalWidth": 5000,
"originalHeight": 3000,
"rotationFromOriginal": 90,
"contentOffset": 500,
"pageNumber": 2
}
}
}
]
}
Sample output
{
"values": [
{
"recordId": "1",
"data": {
"categories": [
{
"name": "abstract_",
"score": 0.00390625
},
{
"name": "people_",
"score": 0.83984375,
"detail": {
"celebrities": [
{
"name": "Satya Nadella",
"faceBoundingBox": [
{
"x": 273,
"y": 309
},
{
"x": 395,
"y": 309
},
{
"x": 395,
"y": 431
},
{
"x": 273,
"y": 431
}
],
"confidence": 0.999028444
}
],
"landmarks": [ ]
}
}
],
"adult": {
"isAdultContent": false,
"isRacyContent": false,
"isGoryContent": false,
"adultScore": 0.0934349000453949,
"racyScore": 0.068613491952419281,
"goreScore": 0.08928389008070282
},
"tags": [
{
"name": "person",
"confidence": 0.98979085683822632
},
{
"name": "man",
"confidence": 0.94493889808654785
},
{
"name": "outdoor",
"confidence": 0.938492476940155
},
{
"name": "window",
"confidence": 0.89513939619064331
}
],
"description": {
"tags": [
"person",
"man",
"outdoor",
"window",
"glasses"
],
"captions": [
{
"text": "Satya Nadella sitting on a bench",
"confidence": 0.48293603002174407
}
]
},
"faces": [
{
"age": 44,
"gender": "Male",
"faceBoundingBox": [
{
"x": 1601,
"y": 395
},
{
"x": 1653,
"y": 395
},
{
"x": 1653,
"y": 447
},
{
"x": 1601,
"y": 447
}
]
}
],
"objects": [
{
"rectangle": {
"x": 25,
"y": 43,
"w": 172,
"h": 140
},
"object": "person",
"confidence": 0.931
}
],
"brands":[
{
"name":"Microsoft",
"confidence": 0.903,
"rectangle":{
"x":20,
"y":97,
"w":62,
"h":52
}
}
]
}
}
]
}
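To show how a response shaped like the sample output might be consumed, the sketch below collects confident tags along with any celebrities and landmarks nested under each category's optional detail. The function name `summarize_record` and the 0.9 threshold are illustrative assumptions, and the record is a trimmed copy of the sample output.

```python
def summarize_record(record, min_confidence=0.9):
    """Collect confident tag names plus celebrity and landmark names from one record."""
    data = record["data"]
    tags = [t["name"] for t in data.get("tags", []) if t["confidence"] >= min_confidence]
    celebrities, landmarks = [], []
    for category in data.get("categories", []):
        detail = category.get("detail") or {}  # detail is optional on a category
        celebrities += [c["name"] for c in detail.get("celebrities", [])]
        landmarks += [m["name"] for m in detail.get("landmarks", [])]
    return {"tags": tags, "celebrities": celebrities, "landmarks": landmarks}


# Trimmed copy of the sample output record.
record = {
    "recordId": "1",
    "data": {
        "categories": [
            {"name": "abstract_", "score": 0.00390625},
            {
                "name": "people_",
                "score": 0.83984375,
                "detail": {
                    "celebrities": [{"name": "Satya Nadella", "confidence": 0.999028444}],
                    "landmarks": [],
                },
            },
        ],
        "tags": [
            {"name": "person", "confidence": 0.98979085683822632},
            {"name": "man", "confidence": 0.94493889808654785},
            {"name": "outdoor", "confidence": 0.938492476940155},
            {"name": "window", "confidence": 0.89513939619064331},
        ],
    },
}

summary = summarize_record(record)
print(summary)
# {'tags': ['person', 'man', 'outdoor'], 'celebrities': ['Satya Nadella'], 'landmarks': []}
```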
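Error cases
In the following error cases, no elements are extracted.
Error code | Description |
---|---|
NotSupportedLanguage | The language provided isn't supported. |
InvalidImageUrl | The image URL is badly formatted or not accessible. |
InvalidImageFormat | The input data isn't a valid image. |
InvalidImageSize | The input image is too large. |
NotSupportedVisualFeature | The specified feature type isn't valid. |
NotSupportedImage | Unsupported image, for example child pornography. |
InvalidDetails | Unsupported domain-specific model. |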
If you get an error similar to "One or more skills are invalid. Details: Error in skill #<num>: Outputs are not supported by skill: Landmarks", check the path. Both celebrities and landmarks are properties under detail.
"categories":[
{
"name":"building_",
"score":0.97265625,
"detail":{
"landmarks":[
{
"name":"Forbidden City",
"confidence":0.92013400793075562
}
]