Attach a Cognitive Services resource to a skillset in Azure Cognitive Search

When configuring an enrichment pipeline in Azure Cognitive Search, you can enrich a limited number of documents free of charge. For larger and more frequent workloads, you should attach a billable Cognitive Services resource.

In this article, you'll learn how to attach a resource by assigning a key to a skillset that defines an enrichment pipeline.

Resources used during enrichment

Azure Cognitive Search has a dependency on Cognitive Services, including Computer Vision for image analysis and optical character recognition (OCR), Text Analytics for natural language processing, and other enrichments like Text Translation. In the context of enrichment in Azure Cognitive Search, these AI algorithms are wrapped inside a skill, placed in a skillset, and referenced by an indexer during indexing.

How billing works

  • Azure Cognitive Search uses the Cognitive Services resource key you provide on a skillset to bill for image and text enrichment. Execution of billable skills is at the Cognitive Services pay-in-advance price.

  • Image extraction is a billable Azure Cognitive Search operation that occurs when documents are cracked prior to enrichment. For image extraction pricing, see the Azure Cognitive Search pricing page.

  • Text extraction also occurs during the document cracking phase. It is not billable.

  • Skills that do not call Cognitive Services, including Conditional, Shaper, Text Merge, and Text Split skills, are not billable.
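
The billing rules above can be sketched as a small classifier over skill definitions. This is only an illustration, not part of the service: the helper name `split_by_billing` is hypothetical, and the `@odata.type` strings are assumed to correspond to the built-in Conditional, Shaper, Text Merge, and Text Split skills, plus the custom Web API skill. Any other skill type is treated as billable here, which is a simplification.

```python
# Skill types assumed NOT to call Cognitive Services (illustrative list).
NON_BILLABLE_TYPES = {
    "#Microsoft.Skills.Util.ConditionalSkill",
    "#Microsoft.Skills.Util.ShaperSkill",
    "#Microsoft.Skills.Text.MergeSkill",
    "#Microsoft.Skills.Text.SplitSkill",
    "#Microsoft.Skills.Custom.WebApiSkill",
}


def split_by_billing(skills):
    """Partition skill definitions into (billable, non_billable) lists.

    Hypothetical helper: anything outside NON_BILLABLE_TYPES is assumed
    to call Cognitive Services and is therefore counted as billable.
    """
    billable, non_billable = [], []
    for skill in skills:
        if skill.get("@odata.type") in NON_BILLABLE_TYPES:
            non_billable.append(skill)
        else:
            billable.append(skill)
    return billable, non_billable


skills = [
    {"@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill"},
    {"@odata.type": "#Microsoft.Skills.Text.SplitSkill"},
]
billable, non_billable = split_by_billing(skills)
print(len(billable), "billable;", len(non_billable), "not billable")
```

A check like this can help confirm, before an indexer runs, whether a skillset actually needs a billable resource for its skills.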

Same-region requirement

Azure Cognitive Search and Azure Cognitive Services must be in the same region. Otherwise, you will get this message at run time: "Provided key is not a valid CognitiveServices type key for the region of your search service."

There is no way to move a service across regions. If you get this error, you should create a new Cognitive Services resource in the same region as Azure Cognitive Search.

Note

Some built-in skills are based on non-regional Cognitive Services (for example, the Text Translation Skill). Using a non-regional skill means that your request might be serviced in a region other than the Azure Cognitive Search region. For more information on non-regional services, see the Cognitive Services product by region page.

Use Free resources

You can use a limited, free processing option to complete the AI enrichment tutorial and quickstart exercises.

Free (Limited enrichments) resources are restricted to 20 documents per day, per indexer. You can delete and recreate the indexer to reset the counter.

  1. Open the Import data wizard:

    Open the Import data wizard

  2. Choose a data source and continue to Add AI enrichment (Optional). For a step-by-step walkthrough of this wizard, see Create an index in the Azure portal.

  3. Expand Attach Cognitive Services and then select Free (Limited enrichments):

    Expanded Attach Cognitive Services section

  4. You can now continue on to the next steps, including Add cognitive skills.

Use billable resources

For workloads that create more than 20 enrichments per day, make sure to attach a billable Cognitive Services resource. We recommend that you always attach a billable Cognitive Services resource, even if you never intend to call Cognitive Services APIs. Attaching a resource overrides the daily limit.

You're charged only for skills that call the Cognitive Services APIs. You're not billed for custom skills, or skills like text merger, text splitter, and shaper, which aren't API-based.

  1. Open the Import data wizard, choose a data source, and continue to Add AI enrichment (Optional).

  2. Expand Attach Cognitive Services and then select Create new Cognitive Services resource. A new tab opens so that you can create the resource:

    Create a Cognitive Services resource

  3. In the Location list, select the region where your Azure Cognitive Search service is located. Make sure to use this region for performance reasons. Using this region also avoids outbound bandwidth charges across regions.

  4. In the Pricing tier list, select S0 to get the all-in-one collection of Cognitive Services features, including the Vision and Language features that back the built-in skills provided by Azure Cognitive Search.

    For the S0 tier, you can find rates for specific workloads on the Cognitive Services pricing page.

    • In the Select Offer list, make sure Cognitive Services is selected.
    • Under Language features, the rates for Text Analytics Standard apply to AI indexing.
    • Under Vision features, the rates for Computer Vision S1 apply.

  5. Select Create to provision the new Cognitive Services resource.

  6. Return to the previous tab, which contains the Import data wizard. Select Refresh to show the Cognitive Services resource, and then select the resource:

    Select the Cognitive Services resource

  7. Expand the Add cognitive skills section to select the specific cognitive skills that you want to run on your data. Complete the rest of the wizard.

Attach an existing skillset to a Cognitive Services resource

If you have an existing skillset, you can attach it to a new or different Cognitive Services resource.

  1. On the Service overview page, select Skillsets:

    Skillsets tab

  2. Select the name of the skillset, and then select an existing resource or create a new one. Select OK to confirm your changes.

    Skillset resource list

    Remember that the Free (Limited enrichments) option limits you to 20 documents daily, and that you can use Create new Cognitive Services resource to provision a new billable resource. If you create a new resource, select Refresh to refresh the list of Cognitive Services resources, and then select the resource.

Attach Cognitive Services programmatically

When you're defining the skillset programmatically, add a cognitiveServices section to the skillset. In that section, include the key of the Cognitive Services resource that you want to associate with the skillset. Remember that the resource must be in the same region as your Azure Cognitive Search resource. Also include @odata.type, and set it to #Microsoft.Azure.Search.CognitiveServicesByKey.

The following example shows this pattern. Notice the cognitiveServices section at the end of the definition.

PUT https://[servicename].search.azure.cn/skillsets/[skillset name]?api-version=2020-06-30
api-key: [admin key]
Content-Type: application/json
{
    "name": "skillset name",
    "skills": 
    [
      {
        "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
        "categories": [ "Organization" ],
        "defaultLanguageCode": "en",
        "inputs": [
          {
            "name": "text", "source": "/document/content"
          }
        ],
        "outputs": [
          {
            "name": "organizations", "targetName": "organizations"
          }
        ]
      }
    ],
    "cognitiveServices": {
        "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
        "description": "mycogsvcs",
        "key": "<your key goes here>"
    }
}
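
As a rough sketch, the same request could be assembled from Python with only the standard library. The function name `build_skillset_request` and all argument values below are hypothetical placeholders; the URL shape, headers, and cognitiveServices section simply mirror the REST example above. The code builds the request but does not send it.

```python
import json
import urllib.request


def build_skillset_request(service_name, skillset_name, admin_key, cog_key, skills):
    """Build (but do not send) a PUT request that attaches a Cognitive Services key.

    Hypothetical helper that mirrors the REST example; skillset_name is
    assumed to be URL-safe (no spaces or reserved characters).
    """
    url = (
        f"https://{service_name}.search.azure.cn/skillsets/"
        f"{skillset_name}?api-version=2020-06-30"
    )
    body = {
        "name": skillset_name,
        "skills": skills,
        # The section that associates the skillset with a billable resource.
        "cognitiveServices": {
            "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
            "description": "mycogsvcs",
            "key": cog_key,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": admin_key, "Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values for illustration only.
entity_skill = {
    "@odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
    "categories": ["Organization"],
    "defaultLanguageCode": "en",
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "organizations", "targetName": "organizations"}],
}
request = build_skillset_request(
    "my-service", "demo-skillset", "<admin key>", "<your key goes here>", [entity_skill]
)
print(request.get_method(), request.full_url)
```

To actually submit the definition, the request object could be passed to `urllib.request.urlopen(request)` once real service and key values are supplied.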

Example: Estimate costs

To estimate the costs associated with cognitive search indexing, start with an idea of what an average document looks like so you can run some numbers. For example, you might approximate:

  • 1,000 PDFs.
  • Six pages each.
  • One image per page (6,000 images).
  • 3,000 characters per page.

Assume a pipeline that consists of document cracking of each PDF, image and text extraction, optical character recognition (OCR) of images, and entity recognition of organizations.

The prices shown in this article are hypothetical. They're used to illustrate the estimation process. Your costs could be lower. For the actual prices of transactions, see Cognitive Services pricing.

  1. For document cracking with text and image content, text extraction is currently free. For 6,000 images, assume $1 for every 1,000 images extracted. That's a cost of $6.00 for this step.

  2. For OCR of 6,000 images in English, the OCR cognitive skill uses the best algorithm (DescribeText). Assuming a cost of $2.50 per 1,000 images to be analyzed, you would pay $15.00 for this step.

  3. For entity extraction, you'd have a total of three text records per page. Each record is 1,000 characters. Three text records per page multiplied by 6,000 pages equals 18,000 text records. Assuming $2.00 per 1,000 text records, this step would cost $36.00.

Putting it all together, you'd pay about $57.00 to ingest 1,000 PDF documents of this type with the described skillset.
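
The arithmetic above can be reproduced with a short script, which makes it easy to substitute your own document counts and rates. The function name `estimate_cost` and its parameters are hypothetical, and the default prices are the same made-up figures used in this example, not actual rates.

```python
def estimate_cost(pdfs, pages_per_pdf, chars_per_page,
                  image_price_per_1000=1.00,   # hypothetical image extraction rate
                  ocr_price_per_1000=2.50,     # hypothetical OCR rate
                  text_price_per_1000=2.00,    # hypothetical entity recognition rate
                  chars_per_record=1000):
    """Estimate enrichment cost, assuming one image per page."""
    pages = pdfs * pages_per_pdf
    images = pages  # one image per page
    # Ceiling division: a partially filled record still counts as one record.
    records_per_page = -(-chars_per_page // chars_per_record)
    text_records = pages * records_per_page

    costs = {
        "image_extraction": images / 1000 * image_price_per_1000,
        "ocr": images / 1000 * ocr_price_per_1000,
        "entity_recognition": text_records / 1000 * text_price_per_1000,
    }
    costs["total"] = sum(costs.values())
    return costs


costs = estimate_cost(pdfs=1000, pages_per_pdf=6, chars_per_page=3000)
for step, amount in costs.items():
    print(f"{step}: ${amount:.2f}")
```

With the inputs from this example, the three steps come to $6.00, $15.00, and $36.00, matching the $57.00 total.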

Next steps