OCR cognitive skill

The optical character recognition (OCR) skill recognizes printed and handwritten text in image files. This skill uses the machine learning models provided by Computer Vision API v3.0 in Cognitive Services. The OCR skill maps to the following functionality:

  • For English, Spanish, German, French, Italian, Portuguese, and Dutch, the new "Read" API is used.
  • For all other languages, the "OCR" API is used.
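
The mapping above can be sketched as a small lookup. This is only an illustration of the rule as stated: the helper name api_for_language and the set READ_API_LANGUAGES are hypothetical and are not part of any Azure SDK, and the sketch does not treat "unk" or a null language code specially.

```python
# Language codes that the list above says are served by the "Read" API
# (English, Spanish, German, French, Italian, Portuguese, Dutch).
READ_API_LANGUAGES = {"en", "es", "de", "fr", "it", "pt", "nl"}


def api_for_language(language_code: str) -> str:
    """Return which API the skill maps to for a language code.

    Hypothetical helper for illustration only; the service makes this
    choice internally.
    """
    return "Read" if language_code in READ_API_LANGUAGES else "OCR"


print(api_for_language("fr"))  # Read
print(api_for_language("ja"))  # OCR
```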

The OCR skill extracts text from image files. Supported file formats include:

  • .JPEG
  • .JPG
  • .PNG
  • .BMP
  • .GIF
  • .TIFF
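
Before sending files through an indexer, it can be handy to filter them against the list above. The following is a minimal sketch that assumes the extension is a reliable indicator of the format. The helper is_supported_image is a hypothetical name, and the comparison is case-insensitive by choice.

```python
from pathlib import Path

# Extensions taken from the supported-format list above, lowercased.
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp", ".gif", ".tiff"}


def is_supported_image(filename: str) -> bool:
    """Check a file name against the listed extensions (illustrative only)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("scan.TIFF"))   # True
print(is_supported_image("report.pdf"))  # False
```

Only the extensions that appear in the list are accepted, so a variant such as ".tif" is rejected by this sketch.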

Note

As you expand scope by increasing the frequency of processing, adding more documents, or adding more AI algorithms, you will need to attach a billable Cognitive Services resource. Charges accrue when calling APIs in Cognitive Services, and for image extraction as part of the document-cracking stage in Azure Cognitive Search. There are no charges for text extraction from documents.

Execution of built-in skills is charged at the existing Cognitive Services pay-as-you-go price. Image extraction pricing is described on the Azure Cognitive Search pricing page.

Skill parameters

Parameters are case-sensitive.

Parameter name    Description

detectOrientation    Enables autodetection of image orientation. Valid values: true / false.

defaultLanguageCode    Language code of the input text. Supported languages include:

  • zh-Hans (Chinese Simplified)
  • zh-Hant (Chinese Traditional)
  • cs (Czech)
  • da (Danish)
  • nl (Dutch)
  • en (English)
  • fi (Finnish)
  • fr (French)
  • de (German)
  • el (Greek)
  • hu (Hungarian)
  • it (Italian)
  • ja (Japanese)
  • ko (Korean)
  • nb (Norwegian)
  • pl (Polish)
  • pt (Portuguese)
  • ru (Russian)
  • es (Spanish)
  • sv (Swedish)
  • tr (Turkish)
  • ar (Arabic)
  • ro (Romanian)
  • sr-Cyrl (Serbian Cyrillic)
  • sr-Latn (Serbian Latin)
  • sk (Slovak)
  • unk (Unknown)

If the language code is unspecified or null, the language is set to English. If the language is explicitly set to "unk", the language is auto-detected.

lineEnding    The value to use between each detected line. Possible values: "Space", "CarriageReturn", "LineFeed". The default is "Space".

Previously, there was a parameter called "textExtractionAlgorithm" for specifying whether the skill should extract "printed" or "handwritten" text. This parameter is deprecated and no longer necessary, because the latest Read API algorithm can extract both types of text at once. If your skill definition already includes this parameter, you do not need to remove it, but it is no longer used: both types of text are extracted regardless of its value.
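
As a rough sketch of how the parameters above fit together, the following Python builds an OCR skill definition as a dictionary and rejects values that the parameter descriptions do not list. The function make_ocr_skill and its argument names are hypothetical. The validation is done client-side for illustration and is not something the service requires.

```python
import json

# Values copied from the parameter descriptions above.
LINE_ENDINGS = {"Space", "CarriageReturn", "LineFeed"}
LANGUAGE_CODES = {
    "zh-Hans", "zh-Hant", "cs", "da", "nl", "en", "fi", "fr", "de", "el",
    "hu", "it", "ja", "ko", "nb", "pl", "pt", "ru", "es", "sv", "tr", "ar",
    "ro", "sr-Cyrl", "sr-Latn", "sk", "unk",
}


def make_ocr_skill(default_language_code=None, detect_orientation=True,
                   line_ending="Space"):
    """Build an OCR skill definition (hypothetical helper, illustration only).

    default_language_code=None leaves the language unspecified (English per
    the text above); "unk" requests auto-detection.
    """
    if default_language_code is not None and default_language_code not in LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {default_language_code!r}")
    if line_ending not in LINE_ENDINGS:
        raise ValueError(f"unsupported lineEnding: {line_ending!r}")
    if not isinstance(detect_orientation, bool):
        raise TypeError("detectOrientation must be true or false")
    return {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "defaultLanguageCode": default_language_code,
        "detectOrientation": detect_orientation,
        "lineEnding": line_ending,
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text"}],
    }


# Auto-detect the language and separate detected lines with line feeds.
print(json.dumps(make_ocr_skill("unk", line_ending="LineFeed"), indent=2))
```

The deprecated textExtractionAlgorithm parameter is deliberately left out of the generated definition.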

Skill inputs

Input name    Description
image    Complex type. Currently works only with the "/document/normalized_images" field, which the Azure Blob indexer produces when imageAction is set to a value other than none. See the sample for more information.

Skill outputs

Output name    Description
text    Plain text extracted from the image.
layoutText    Complex type that describes the extracted text and the location where the text was found.

Sample definition

{
  "skills": [
    {
      "description": "Extracts text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": null,
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text",
          "targetName": "myText"
        },
        {
          "name": "layoutText",
          "targetName": "myLayoutText"
        }
      ]
    }
  ]
}
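
To reference these outputs from a later skill or an output field mapping, you need the path at which each one lands. The sketch below assumes that an output is placed under the skill's context using its targetName, or its name when no targetName is given, which matches how the merge sample on this page reads "/document/normalized_images/*/text". The helper output_paths is hypothetical.

```python
def output_paths(skill: dict) -> dict:
    """Map each output name to its assumed path in the enriched document."""
    context = skill["context"]
    return {
        output["name"]: f"{context}/{output.get('targetName', output['name'])}"
        for output in skill["outputs"]
    }


# Only the fields relevant to the path calculation are reproduced here.
sample_skill = {
    "context": "/document/normalized_images/*",
    "outputs": [
        {"name": "text", "targetName": "myText"},
        {"name": "layoutText", "targetName": "myLayoutText"},
    ],
}

for name, path in output_paths(sample_skill).items():
    print(name, "->", path)
# text -> /document/normalized_images/*/myText
# layoutText -> /document/normalized_images/*/myLayoutText
```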

Sample text and layoutText output

{
  "text": "Hello World. -John",
  "layoutText":
  {
    "language" : "en",
    "text" : "Hello World. -John",
    "lines" : [
      {
        "boundingBox":
        [ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
        "text":"Hello World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ],
    "words": [
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"Hello"
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"World."
      },
      {
        "boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
        "text":"-John"
      }
    ]
  }
}
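
One way to consume layoutText is to reduce each line's four-point boundingBox to a simple rectangle. The following sketch reuses the two lines from the sample output above. The function line_rectangles is a hypothetical helper, and it assumes each boundingBox is a list of points with "x" and "y" keys, as in the sample.

```python
def line_rectangles(layout_text: dict) -> list:
    """Return (text, left, top, width, height) for each detected line."""
    rectangles = []
    for line in layout_text["lines"]:
        xs = [point["x"] for point in line["boundingBox"]]
        ys = [point["y"] for point in line["boundingBox"]]
        rectangles.append(
            (line["text"], min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
        )
    return rectangles


# The "lines" portion of the sample output above.
layout_text = {
    "language": "en",
    "text": "Hello World. -John",
    "lines": [
        {
            "boundingBox": [{"x": 10, "y": 10}, {"x": 50, "y": 10},
                            {"x": 50, "y": 30}, {"x": 10, "y": 30}],
            "text": "Hello World.",
        },
        {
            "boundingBox": [{"x": 110, "y": 10}, {"x": 150, "y": 10},
                            {"x": 150, "y": 30}, {"x": 110, "y": 30}],
            "text": "-John",
        },
    ],
}

for rectangle in line_rectangles(layout_text):
    print(rectangle)
# ('Hello World.', 10, 10, 40, 20)
# ('-John', 110, 10, 40, 20)
```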

Sample: Merging text extracted from embedded images with the content of the document.

A common use case for Text Merger is merging the textual representation of images (text from an OCR skill, or the caption of an image) into the content field of a document.

The following example skillset creates a merged_text field. This field contains the textual content of your document and the OCRed text from each of the images embedded in that document.

Request Body Syntax

{
  "description": "Extract text from images and merge with content text to produce merged_text",
  "skills":
  [
    {
      "description": "Extract text (plain and structured) from image.",
      "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
      "context": "/document/normalized_images/*",
      "defaultLanguageCode": "en",
      "detectOrientation": true,
      "inputs": [
        {
          "name": "image",
          "source": "/document/normalized_images/*"
        }
      ],
      "outputs": [
        {
          "name": "text"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
      "description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
      "context": "/document",
      "insertPreTag": " ",
      "insertPostTag": " ",
      "inputs": [
        {
          "name":"text",
          "source": "/document/content"
        },
        {
          "name": "itemsToInsert", 
          "source": "/document/normalized_images/*/text"
        },
        {
          "name":"offsets", 
          "source": "/document/normalized_images/*/contentOffset"
        }
      ],
      "outputs": [
        {
          "name": "mergedText", 
          "targetName" : "merged_text"
        }
      ]
    }
  ]
}
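
To build intuition for what the merge step in the skillset above produces, here is a local simulation in Python. It is not the service's implementation. It assumes each offset is a character position in the content string at which the corresponding item is inserted, wrapped in the pre and post tags. The function merge_text and its arguments are hypothetical names.

```python
def merge_text(text, items_to_insert, offsets, pre_tag=" ", post_tag=" "):
    """Insert each item into text at its offset (illustrative simulation)."""
    if len(items_to_insert) != len(offsets):
        raise ValueError("items_to_insert and offsets must have the same length")
    pieces = []
    cursor = 0
    # Walk the offsets in ascending order; sorted() is stable, so items that
    # share an offset keep their original relative order.
    for offset, item in sorted(zip(offsets, items_to_insert), key=lambda pair: pair[0]):
        pieces.append(text[cursor:offset])
        pieces.append(pre_tag + item + post_tag)
        cursor = offset
    pieces.append(text[cursor:])
    return "".join(pieces)


content = "Intro. Conclusion."
ocr_texts = ["Hello World. -John"]
content_offsets = [7]  # position just after "Intro. "

# Square brackets are used as tags here only to make the insertion visible.
print(merge_text(content, ocr_texts, content_offsets, pre_tag="[", post_tag="]"))
# Intro. [Hello World. -John]Conclusion.
```

With the single-space tags used in the skillset above, the inserted OCR text is separated from the surrounding content by spaces instead of brackets.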

The above skillset example assumes that a normalized-images field exists. To generate this field, set the imageAction configuration in your indexer definition to generateNormalizedImages as shown below:

{
  //...rest of your indexer definition goes here ...
  "parameters": {
    "configuration": {
      "dataToExtract":"contentAndMetadata",
      "imageAction":"generateNormalizedImages"
    }
  }
}
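
If the indexer definition is assembled in code, the same configuration can be applied programmatically. The sketch below works on a plain dictionary and makes no service calls. The helper with_normalized_images and the indexer name "my-indexer" are hypothetical.

```python
import copy


def with_normalized_images(indexer: dict) -> dict:
    """Return a copy of an indexer definition with image normalization enabled."""
    updated = copy.deepcopy(indexer)
    configuration = updated.setdefault("parameters", {}).setdefault("configuration", {})
    configuration["dataToExtract"] = "contentAndMetadata"
    configuration["imageAction"] = "generateNormalizedImages"
    return updated


indexer = {"name": "my-indexer"}  # placeholder for the rest of the definition
print(with_normalized_images(indexer)["parameters"]["configuration"])
# {'dataToExtract': 'contentAndMetadata', 'imageAction': 'generateNormalizedImages'}
```

Returning a copy leaves the original definition untouched, which makes it easier to compare the two before submitting the updated one.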

See also