Quickstart: Create an Azure Cognitive Search knowledge store in the Azure portal

Knowledge store is a feature of Azure Cognitive Search that persists output from a content processing pipeline for subsequent analysis or downstream processing.

A pipeline accepts unstructured text and image content, applies AI powered by Cognitive Services (such as OCR and natural language processing), and outputs new structures and information that didn't previously exist. One of the physical artifacts created by a pipeline is a knowledge store, which you can access through tools to analyze and explore content.

In this quickstart, you'll combine services and data in the Azure cloud to create a knowledge store. Once everything is in place, you'll run the Import data wizard in the portal to pull it all together. The end result is original text content plus AI-generated content that you can view in the portal (Storage Explorer).

Prerequisites

Before you begin, you must have the following:

Note

This quickstart also uses Azure Cognitive Services for the AI. Because the workload is so small, Cognitive Services is tapped behind the scenes for free processing of up to 20 transactions. This means that you can complete this exercise without having to create an additional Cognitive Services resource.
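The free allowance in the note can be sanity-checked against a CSV file before you run the wizard. The following is a minimal sketch, assuming (as this quickstart does) that each record counts as one transaction. The function names and the inline sample data are hypothetical and are not part of any Azure SDK.

```python
import csv
import io

# Limit quoted in this quickstart for the free Cognitive Services allowance.
FREE_DAILY_TRANSACTIONS = 20


def count_records(csv_text: str) -> int:
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row for row in reader if row]  # ignore completely blank lines
    return max(len(rows) - 1, 0)


def fits_free_allowance(record_count: int, limit: int = FREE_DAILY_TRANSACTIONS) -> bool:
    """Return True when the record count stays within the free allowance."""
    return record_count <= limit


# Hypothetical two-row sample; the quoted field contains a comma on purpose.
sample = 'name,reviews_text\nHotel A,Great stay\nHotel A,"Too noisy, but clean"\n'
records = count_records(sample)
print(records, fits_free_allowance(records))
```

Using the `csv` module rather than counting newlines matters here because review text can contain commas and line breaks inside quoted fields.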

Set up your data

In the following steps, set up a blob container in Azure Storage to store heterogeneous content files.

  1. Download HotelReviews_Free.csv. This data is hotel review data saved in a CSV file (originally from Kaggle.com) and contains 19 pieces of customer feedback about a single hotel.

  2. Create an Azure storage account or find an existing account under your current subscription. You'll use Azure Storage for both the raw content to be imported and the knowledge store that is the end result.

    • Choose the StorageV2 (general purpose V2) account type.
  3. Open the Blob service page and create a container named hotel-reviews.

  4. Click Upload.

    Screenshot: Upload the data

  5. Select the HotelReviews_Free.csv file you downloaded in the first step.

    Screenshot: Create the Azure Blob container

  6. Before you leave the Blob storage pages, use a link on the left navigation pane to open the Access Keys page. Get a connection string to retrieve data from Blob storage. A connection string looks similar to the following example: DefaultEndpointsProtocol=https;AccountName=<YOUR-ACCOUNT-NAME>;AccountKey=<YOUR-ACCOUNT-KEY>;EndpointSuffix=core.chinacloudapi.cn
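Before pasting the connection string into the wizard, it can help to confirm that it has the parts shown in the example above. The sketch below is an illustration only: the helper name is hypothetical, and the list of expected keys is simply taken from the example string, so other valid connection string forms may not pass this check.

```python
# Keys that appear in the example connection string in this quickstart.
EXPECTED_KEYS = ("DefaultEndpointsProtocol", "AccountName", "AccountKey", "EndpointSuffix")


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dictionary."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def missing_keys(parts: dict) -> list:
    """Return the expected keys that are absent or empty."""
    return [key for key in EXPECTED_KEYS if not parts.get(key)]


example = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=abc123==;EndpointSuffix=core.chinacloudapi.cn"
)
parsed = parse_connection_string(example)
print(parsed["AccountName"], missing_keys(parsed))
```

The account name and key in `example` are made-up placeholder values.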

You are now ready to move on to the Import data wizard.

Run the Import data wizard

  1. Sign in to the Azure portal with your Azure account.

  2. Find your search service and, on the Overview page, click Import data on the command bar to create a knowledge store in four steps.

    Screenshot: Import data command

Step 1: Create a data source

  1. In Connect to your data, choose Azure Blob storage, and then select the account and container you created.

  2. For Name, enter hotel-reviews-ds.

  3. For Parsing mode, select Delimited text, and then select the First Line Contains Header checkbox. Make sure the Delimiter character is a comma (,).

  4. In Connection String, paste in the connection string you copied from the Access Keys page in Azure Storage.

  5. In Containers, enter the name of the blob container holding the data.

    Your page should look similar to the following screenshot.

    Screenshot: Create a data source object

  6. Continue to the next page.
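The wizard settings in this step can also be pictured as a pair of JSON-style definitions. The sketch below is an assumption about how these choices might be expressed through the search service REST API, not something this quickstart states. The property names (`type`, `credentials`, `container`, `parsingMode`, `firstLineContainsHeaders`, `delimitedTextDelimiter`) should be checked against the REST reference before use, and the helper functions are hypothetical.

```python
import json


def build_data_source(name: str, connection_string: str, container: str) -> dict:
    """Describe a blob data source (assumed REST payload shape)."""
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def build_parsing_configuration(delimiter: str = ",") -> dict:
    """Mirror the wizard's Delimited text options (assumed property names)."""
    return {
        "parsingMode": "delimitedText",
        "firstLineContainsHeaders": True,
        "delimitedTextDelimiter": delimiter,
    }


data_source = build_data_source(
    "hotel-reviews-ds", "<YOUR-CONNECTION-STRING>", "hotel-reviews"
)
print(json.dumps(data_source, indent=2))
print(json.dumps(build_parsing_configuration(), indent=2))
```

As far as I understand the REST API, the parsing options belong to the indexer definition rather than the data source, even though the wizard collects them on this page.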

Step 2: Add cognitive skills

In this wizard step, you will create a skillset with cognitive skill enrichments. The source data consists of customer reviews in several languages. Skills that are relevant for this data set include key phrase extraction, sentiment detection, and text translation. In a later step, these enrichments will be "projected" into a knowledge store as Azure tables.

  1. Expand Attach Cognitive Services. Free (Limited enrichments) is selected by default. You can use this resource because the number of records in HotelReviews_Free.csv is 19 and this free resource allows up to 20 transactions a day.

  2. Expand Add enrichments.

  3. For Skillset name, enter hotel-reviews-ss.

  4. For Source data field, select reviews_text.

  5. For Enrichment granularity level, select Pages (5000 characters chunks).

  6. Select these cognitive skills:

    • Extract key phrases

    • Translate text

    • Detect sentiment

      Screenshot: Create a skillset

  7. Expand Save enrichments to knowledge store.

  8. Select these Azure table projections:

    • Documents
    • Pages
    • Key phrases
  9. Enter the storage account connection string that you saved in a previous step.

    Screenshot: Configure knowledge store

  10. Optionally, download a Power BI template. When you access the template from the wizard, the local .pbit file is adapted to reflect the shape of your data.

  11. Continue to the next page.
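The Pages (5000 characters chunks) granularity chosen in this step means that long review text is enriched in pieces rather than as one block. The sketch below shows the idea with a naive fixed-width split. It is an illustration only: this quickstart does not describe how the service actually splits text, and the real behavior may differ, for example by respecting sentence boundaries. The function name is hypothetical.

```python
def split_into_pages(text: str, max_chars: int = 5000) -> list:
    """Cut text into consecutive chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[start:start + max_chars] for start in range(0, len(text), max_chars)]


# A synthetic 12,000-character review yields three pages.
long_review = "a" * 12000
pages = split_into_pages(long_review)
print([len(page) for page in pages])
```

Each page would then be a separate input to the selected skills, which is why the Pages table projection can hold more rows than the Documents projection.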

Step 3: Configure the index

In this wizard step, you will configure an index for optional full-text search queries. The wizard will sample your data source to infer fields and data types. You only need to select the attributes for your desired behavior. For example, the Retrievable attribute allows the search service to return a field value, while the Searchable attribute enables full-text search on the field.

  1. For Index name, enter hotel-reviews-idx.

  2. For attributes, accept the default selections: Retrievable and Searchable for the new fields that the pipeline is creating.

    Your index should look similar to the following image. Because the list is long, not all fields are visible in the image.

    Screenshot: Configure an index

  3. Continue to the next page.
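To make the Retrievable and Searchable attributes concrete, here is a rough sketch of how a few index fields might look as JSON-style definitions. Only `reviews_text` comes from this quickstart. The `id` and `keyphrases` fields, the `make_field` helper, and the exact payload shape are assumptions for illustration, so the index the wizard generates will contain different and more numerous fields.

```python
def make_field(name, field_type="Edm.String", *, key=False, searchable=True, retrievable=True):
    """Build one field definition (assumed attribute names)."""
    return {
        "name": name,
        "type": field_type,
        "key": key,
        "searchable": searchable,    # participates in full-text search
        "retrievable": retrievable,  # can be returned in query results
    }


index = {
    "name": "hotel-reviews-idx",
    "fields": [
        make_field("id", key=True, searchable=False),        # hypothetical key field
        make_field("reviews_text"),                          # source field from the CSV
        make_field("keyphrases", "Collection(Edm.String)"),  # hypothetical enriched field
    ],
}

searchable_names = [field["name"] for field in index["fields"] if field["searchable"]]
print(searchable_names)
```

A field can be retrievable without being searchable, as with the key field here: it is returned in results but is not matched by full-text queries.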

Step 4: Configure the indexer

In this wizard step, you will configure an indexer that pulls together the data source, skillset, and index you defined in the previous wizard steps.

  1. For Name, enter hotel-reviews-idxr.

  2. For Schedule, keep the default Once.

  3. Click Submit to run the indexer. Data extraction, indexing, and application of cognitive skills all happen in this step.
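An indexer is essentially a small record that names the other three objects. The sketch below shows that relationship using the names entered in this walkthrough. The property names (`dataSourceName`, `targetIndexName`, `skillsetName`, `schedule`) reflect my understanding of the REST API and should be treated as assumptions, as should the idea that omitting `schedule` corresponds to the wizard's Once option. The helper function is hypothetical.

```python
from typing import Optional


def build_indexer(name: str, data_source: str, index: str, skillset: str,
                  schedule: Optional[dict] = None) -> dict:
    """Tie a data source, skillset, and index together (assumed payload shape)."""
    definition = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
    }
    if schedule is not None:
        # Only included for recurring runs; left out for a single run.
        definition["schedule"] = schedule
    return definition


indexer = build_indexer(
    "hotel-reviews-idxr", "hotel-reviews-ds", "hotel-reviews-idx", "hotel-reviews-ss"
)
print(indexer)
```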

Monitor status

Cognitive skill indexing takes longer to complete than typical text-based indexing. The wizard should open the Indexer list in the Overview page so that you can track progress. To navigate there yourself, go to the Overview page and click Indexers.

In the Azure portal, you can also monitor the Notifications activity log for a clickable Azure Cognitive Search notification status link. Execution may take several minutes to complete.
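If you prefer to track progress from a script instead of the portal, the usual pattern is to poll until the run reaches a final state or a timeout expires. The sketch below shows that pattern in a generic form. The `fetch_status` callable is a hypothetical stand-in for whatever call returns the indexer's latest run status, and the status strings are assumptions rather than values taken from this quickstart.

```python
import time
from typing import Callable

# Assumed names for final states; verify against the service you call.
TERMINAL_STATES = {"success", "transientFailure", "persistentFailure"}


def wait_for_indexer(fetch_status: Callable[[], str],
                     interval_s: float = 10.0,
                     timeout_s: float = 600.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status until a terminal state is returned or time runs out."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"Indexer still '{status}' after {waited:.0f} seconds")
        sleep(interval_s)
        waited += interval_s


# Simulated run: two in-progress polls, then success. No real waiting occurs
# because the sleep function is replaced with a no-op.
fake_statuses = iter(["inProgress", "inProgress", "success"])
result = wait_for_indexer(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

Passing `sleep` as a parameter keeps the loop testable without waiting, and the default ten-minute timeout leaves room for the several minutes that execution may take.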

Next steps

Now that you have enriched your data using Cognitive Services and projected the results into a knowledge store, you can use Storage Explorer or Power BI to explore your enriched data set.

You can view content in Storage Explorer, or take it a step further with Power BI to gain insights through visualization.

Tip

If you want to repeat this exercise or try a different AI enrichment walkthrough, delete the hotel-reviews-idxr indexer. Deleting the indexer resets the free daily transaction counter back to zero for Cognitive Services processing.
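The indexer can be deleted in the portal. For readers who script their cleanup, the sketch below builds (but does not send) an HTTP DELETE request for it. The URL pattern, the `api-key` header, and the `2020-06-30` API version are assumptions based on my understanding of the search service REST API, and the endpoint and key values are placeholders. Substitute the endpoint of your own search service, which varies by Azure cloud.

```python
import urllib.request
from urllib.parse import quote


def build_delete_indexer_request(endpoint: str, indexer_name: str, api_key: str,
                                 api_version: str = "2020-06-30") -> urllib.request.Request:
    """Prepare a DELETE request for an indexer without sending it."""
    url = (
        f"{endpoint.rstrip('/')}/indexers/"
        f"{quote(indexer_name, safe='')}?api-version={api_version}"
    )
    return urllib.request.Request(url, method="DELETE", headers={"api-key": api_key})


request = build_delete_indexer_request(
    "https://example-service.search.windows.net",  # placeholder endpoint
    "hotel-reviews-idxr",
    "<YOUR-ADMIN-KEY>",
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request).
```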