教程:将 Power BI 与文本分析认知服务集成Tutorial: Integrate Power BI with the Text Analytics Cognitive Service

Power BI Desktop 是免费的应用程序,可让你连接、转换和可视化你的数据。Power BI Desktop is a free application that lets you connect to, transform, and visualize your data. 文本分析服务是 Azure 认知服务的一部分,它提供自然语言处理。The Text Analytics service, part of Azure Cognitive Services, provides natural language processing. 给定原始的非结构化文本,它可以提取最重要的短语、分析情绪和确定已知实体(例如品牌)。Given raw unstructured text, it can extract the most important phrases, analyze sentiment, and identify well-known entities such as brands. 可以综合使用这些工具快速了解客户谈论的内容和客户的感受。Together, these tools can help you quickly see what your customers are talking about and how they feel about it.

本教程介绍以下操作:In this tutorial, you'll learn how to:

  • 使用 Power BI Desktop 导入和转换数据Use Power BI Desktop to import and transform data
  • 在 Power BI Desktop 中创建自定义函数Create a custom function in Power BI Desktop
  • 将 Power BI Desktop 与文本分析关键短语 API 集成Integrate Power BI Desktop with the Text Analytics Key Phrases API
  • 使用文本分析关键短语 API 从客户反馈中提取最重要的短语Use the Text Analytics Key Phrases API to extract the most important phrases from customer feedback
  • 从客户反馈创建词云Create a word cloud from customer feedback

先决条件Prerequisites

加载客户数据Load customer data

打开 Power BI Desktop 并加载你在先决条件中下载的逗号分隔值 (CSV) 文件 FabrikamComments.csv 即可开始。To get started, open Power BI Desktop and load the comma-separated value (CSV) file FabrikamComments.csv that you downloaded in Prerequisites. 此文件代表某个虚构小公司的支持论坛中一天的虚构活动。This file represents a day's worth of hypothetical activity in a fictional small company's support forum.

Note

Power BI 可以使用各种源(例如 SQL 数据库)提供的数据。Power BI can use data from a wide variety of sources, such as a SQL database. 请在 SQL Server 与 Power BI 的集成中了解详细信息。Learn more at SQL Server integration with Power BI.

在 Power BI Desktop 主窗口中,选择“开始”功能区。In the main Power BI Desktop window, select the Home ribbon. 在功能区的“外部数据”组中,打开“获取数据”下拉菜单,然后选择“文本/CSV”。In the External data group of the ribbon, open the Get Data drop-down menu and select Text/CSV.

[“获取数据”按钮]

此时会显示“打开”对话框。The Open dialog appears. 导航到“下载”文件夹,或你下载 FabrikamComments.csv 文件的文件夹。Navigate to your Downloads folder, or to the folder where you downloaded the FabrikamComments.csv file. 单击“FabrikamComments.csv”,然后单击“打开”按钮。Click FabrikamComments.csv, then the Open button. 此时会显示 CSV 导入对话框。The CSV import dialog appears.

[CSV 导入对话框]

CSV 导入对话框用于验证 Power BI Desktop 是否已正确检测到字符集、分隔符、标头行和列类型。The CSV import dialog lets you verify that Power BI Desktop has correctly detected the character set, delimiter, header rows, and column types. 这些信息全都正确,因此单击“加载”。This information is all correct, so click Load.

若要查看加载的数据,请单击 Power BI 工作区左侧的“数据视图”按钮。To see the loaded data, click the Data View button on the left edge of the Power BI workspace. 将会打开一个包含数据的表,就像在 Microsoft Excel 中一样。A table opens that contains the data, like in Microsoft Excel.

[已导入数据的初始视图]

准备数据Prepare the data

你可能需要先在 Power BI Desktop 中转换数据,然后它才可准备好由文本分析服务的关键短语 API 进行处理。You may need to transform your data in Power BI Desktop before it's ready to be processed by the Key Phrases API of the Text Analytics service.

示例数据包含一个 subject 列和一个 comment 列。The sample data contains a subject column and a comment column. 使用 Power BI Desktop 中的“合并列”功能,你可以从这两个列(而非仅仅 comment 列)的数据中提取关键短语。With the Merge Columns function in Power BI Desktop, you can extract key phrases from the data in both these columns, rather than just the comment column.

在 Power BI Desktop 中,选择“开始”功能区。In Power BI Desktop, select the Home ribbon. 在“外部数据”组中,单击“编辑查询”。In the External data group, click Edit Queries.

[“主页”功能区中的“外部数据”组]

选择窗口左侧“查询”列表中的 FabrikamComments(如果尚未选择)。Select FabrikamComments in the Queries list at the left side of the window if it isn't already selected.

现在请选择表中的 subjectcomment 列。Now select both the subject and comment columns in the table. 可能需要进行水平滚动才能看到这些列。You may need to scroll horizontally to see these columns. 首先单击 subject 列标题,然后在按住 Ctrl 键的同时单击 comment 列标题。First click the subject column header, then hold down the Control key and click the comment column header.

[选择要合并的字段]

选择“转换”功能区。Select the Transform ribbon. 在功能区的“文本列”组中,单击“合并列”。In the Text Columns group of the ribbon, click Merge Columns. 此时会显示“合并列”对话框。The Merge Columns dialog appears.

[使用“合并列”对话框合并字段]

在“合并列”对话框中,选择 Tab 作为分隔符,然后单击“确定”。In the Merge Columns dialog, choose Tab as the separator, then click OK.

也可考虑使用“删除空白”筛选器筛选掉空白消息,或者使用“洁净转换”删除无法打印的字符。You might also consider filtering out blank messages using the Remove Empty filter, or removing unprintable characters using the Clean transformation. 如果数据包含的一个列类似于示例文件中的 spamscore 列,则可使用“数字筛选器”跳过“垃圾”评论。If your data contains a column like the spamscore column in the sample file, you can skip "spam" comments using a Number Filter.

了解 APIUnderstand the API

对于每个 HTTP 请求,文本分析服务的关键短语 API 最多可以处理一千个文本文档。The Key Phrases API of the Text Analytics service can process up to a thousand text documents per HTTP request. Power BI 偏好一次处理一个记录,因此在本教程中,对 API 的每次调用将只包含一个文档。Power BI prefers to deal with records one at a time, so in this tutorial your calls to the API will include only a single document each. 对于每个要处理的文档,关键短语 API 要求提供以下字段。The Key Phrases API requires the following fields for each document being processed.

id 请求中此文档的唯一标识符。A unique identifier for this document within the request. 响应也包含此字段。The response also contains this field. 这样,在处理多个文档时,就可以轻松地将提取的关键短语与所在文档相关联。That way, if you process more than one document, you can easily associate the extracted key phrases with the document they came from. 在本教程中,由于每个请求只处理一个文档,你可以将 id 值硬编码为针对每个请求均相同。In this tutorial, because you're processing only one document per request, you can hard-code the value of id to be the same for each request.
text 要处理的文本。The text to be processed. 此字段的值来自于你在先前部分中创建的 Merged 列,其包含合并的主题行和评论文本。The value of this field comes from the Merged column you created in the previous section, which contains the combined subject line and comment text. 关键短语 API 要求该数据不得超出 5,120 个字符。The Key Phrases API requires this data be no longer than about 5,120 characters.
language 表示撰写文档时所采用的自然语言的代码。The code for the natural language the document is written in. 示例数据中的所有消息均为英语,因此你可以为此字段硬编码 en 值。All the messages in the sample data are in English, so you can hard-code the value en for this field.

创建自定义函数Create a custom function

现在你可以创建集成 Power BI 与文本分析的自定义函数了。Now you're ready to create the custom function that will integrate Power BI and Text Analytics. 此函数将要处理的文本作为参数接收。The function receives the text to be processed as a parameter. 它将数据转换为所需的 JSON 格式或从该格式进行转换,并对关键短语 API 发出 HTTP 请求。It converts data to and from the required JSON format and makes the HTTP request to the Key Phrases API. 然后该函数会分析来自 API 的响应,并返回包含所提取关键短语的逗号分隔列表的字符串。The function then parses the response from the API and returns a string that contains a comma-separated list of the extracted key phrases.

Note

Power BI Desktop 自定义函数是以 Power Query M 公式语言(简称“M”)编写的。Power BI Desktop custom functions are written in the Power Query M formula language, or just "M" for short. M 是基于 F# 的函数编程语言。M is a functional programming language based on F#. 不过,完成本教程不需要你是程序员;所需代码已在下面提供。You don't need to be a programmer to finish this tutorial, though; the required code is included below.

在 Power BI Desktop 中,确保你仍处于“查询编辑器”窗口。In Power BI Desktop, make sure you're still in the Query Editor window. 如果并非如此,请选择“开始”功能区,然后在“外部数据”组中,单击“编辑查询”。If you aren't, select the Home ribbon, and in the External data group, click Edit Queries.

现在,在“开始”功能区的“新建查询”组中,打开“新建源”下拉菜单并选择“空白查询”。Now, in the Home ribbon, in the New Query group, open the New Source drop-down menu and select Blank Query.

此时会在“查询”列表中显示新查询(初始名称为 Query1)。A new query, initially named Query1, appears in the Queries list. 双击此条目,将其命名为 KeyPhrasesDouble-click this entry and name it KeyPhrases.

现在,在“开始”功能区的“查询”组中,单击“高级编辑器”以打开高级编辑器窗口。Now, in the Home ribbon, in the Query group, click Advanced Editor to open the Advanced Editor window. 删除该窗口中的现有代码,将以下代码粘贴到其中。Delete the code that's already in that window and paste in the following code.

Note

以下示例假定文本分析 API 终结点以 https://chinaeast2.api.cognitive.azure.cn 开头。The examples below assume the Text Analytics API endpoint begins with https://chinaeast2.api.cognitive.azure.cn. 可以通过文本分析在 2 个不同的区域创建订阅。Text Analytics allows you to create a subscription in 2 different regions. 如果注册了另一区域的服务,请确保使用所选区域的终结点。If you signed up for the service in a different region, please make sure to use the endpoint for the region you selected. 你可以通过登录到 Azure 门户、选择文本分析订阅并选择“概述”页面来找到此终结点。You can find this endpoint by signing in to the Azure portal, selecting your Text Analytics subscription, and selecting the Overview page.

// Returns key phrases from the text in a comma-separated list
(text) => let
    apikey      = "YOUR_API_KEY_HERE",
    endpoint    = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/keyPhrases",
    jsontext    = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    jsonbody    = "{ documents: [ { language: ""en"", id: ""0"", text: " & jsontext & " } ] }",
    bytesbody   = Text.ToBinary(jsonbody),
    headers     = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp   = Web.Contents(endpoint, [Headers=headers, Content=bytesbody]),
    jsonresp    = Json.Document(bytesresp),
    keyphrases  = Text.Lower(Text.Combine(jsonresp[documents]{0}[keyPhrases], ", "))
in  keyphrases

YOUR_API_KEY_HERE 替换为你的文本分析访问密钥。Replace YOUR_API_KEY_HERE with your Text Analytics access key. 你还可以通过登录到 Azure 门户、选择文本分析订阅并选择“概述”页面来找到此密钥。You can also find this key by signing in to the Azure portal, selecting your Text Analytics subscription, and selecting the Overview page. 请务必保留密钥前后的引号。Be sure to leave the quotation marks before and after the key. 然后单击“完成”。Then click Done.

使用自定义函数Use the custom function

现在可以使用自定义函数从每个客户评论中提取关键短语并将其存储在表的新列中。Now you can use the custom function to extract the key phrases from each of the customer comments and store them in a new column in the table.

在 Power BI Desktop 的查询编辑器窗口中,切换回 FabrikamComments 查询。In Power BI Desktop, in the Query Editor window, switch back to the FabrikamComments query. 选择“添加列”功能区。Select the Add Column ribbon. 在“常规”组中,单击“调用自定义函数”。In the General group, click Invoke Custom Function.

[“调用自定义函数”按钮]

将出现“调用自定义函数”对话框。The Invoke Custom Function dialog appears. 在“新列名”中,输入 keyphrasesIn New column name, enter keyphrases. 在“函数查询”中,选择你创建的自定义函数 KeyPhrasesIn Function query, select the custom function you created, KeyPhrases.

对话框中将出现新字段“文本(可选)”。A new field appears in the dialog, text (optional). 此字段会询问我们希望使用哪一列来提供关键短语 API 的 text 参数的值。This field is asking which column we want to use to provide values for the text parameter of the Key Phrases API. (请记住,你已经硬编码了 languageid 参数的值。)从下拉菜单中选择 Merged此前通过合并主题和消息字段所创建的列)。(Remember that you already hard-coded the values for the language and id parameters.) Select Merged (the column you created previously by merging the subject and message fields) from the drop-down menu.

[调用自定义函数]

最后,单击“确定”。Finally, click OK.

如果一切准备就绪,Power BI 将针对表中的每一行调用自定义函数一次。If everything is ready, Power BI calls your custom function once for each row in the table. 它将查询发送到关键短语 API,并为表添加一个新列来存储结果。It sends the queries to the Key Phrases API and adds a new column to the table to store the results. 但在这些操作开始之前,可能需要指定身份验证和隐私设置。But before that happens, you may need to specify authentication and privacy settings.

身份验证和隐私Authentication and privacy

在关闭“调用自定义函数”对话框之后,可能会出现一个横幅,要求你指定如何连接到关键短语 API。After you close the Invoke Custom Function dialog, a banner may appear asking you to specify how to connect to the Key Phrases API.

[凭据横幅]

单击“编辑凭据”,确保选中对话框中的 Anonymous,然后单击“连接”。Click Edit Credentials, make sure Anonymous is selected in the dialog, then click Connect.

Note

你选择 Anonymous 是因为文本分析服务使用你的访问密钥对你进行身份验证,这样 Power BI 便不需要为 HTTP 请求自身提供凭据。You select Anonymous because the Text Analytics service authenticates you using your access key, so Power BI does not need to provide credentials for the HTTP request itself.

[将身份验证设置为匿名]

如果在选择匿名访问后仍看到“编辑凭据”横幅,则可能是因为忘记将文本分析访问密钥粘贴到 KeyPhrases 自定义函数的代码中。If you see the Edit Credentials banner even after choosing anonymous access, you may have forgotten to paste your Text Analytics access key into the code in the KeyPhrases custom function.

接下来可能会出现一个横幅,要求你提供有关数据源隐私的信息。Next, a banner may appear asking you to provide information about your data sources' privacy.

[隐私横幅]

对于对话框中的每个数据源,请单击“继续”,然后选择 PublicClick Continue and choose Public for each of the data sources in the dialog. 然后单击“保存”。Then click Save.

[设置数据源隐私]

创建词云Create the word cloud

处理完显示的横幅以后,请单击“主页”功能区中的“关闭并应用”,关闭查询编辑器。Once you have dealt with any banners that appear, click Close & Apply in the Home ribbon to close the Query Editor.

Power BI Desktop 需要时间来发出必需的 HTTP 请求。Power BI Desktop takes a moment to make the necessary HTTP requests. 对于表中的每一行,新的 keyphrases 列都包含关键短语 API 在文本中检测到的关键短语。For each row in the table, the new keyphrases column contains the key phrases detected in the text by the Key Phrases API.

现在将使用此列生成一个词云。Now you'll use this column to generate a word cloud. 首先,单击工作区左侧 Power BI Desktop 主窗口中的“报表”按钮。To get started, click the Report button in the main Power BI Desktop window, to the left of the workspace.

Note

为何使用提取的关键短语而不是每个评论的完整文本来生成词云?Why use extracted key phrases to generate a word cloud, rather than the full text of every comment? 关键短语提供的是客户评论中的重要词汇,而不仅仅是最常见词汇。The key phrases provide us with the important words from our customer comments, not just the most common words. 另外,生成的云中的单词大小调整不会因某个词在相对少数评论中的频繁使用而扭曲。Also, word sizing in the resulting cloud isn't skewed by the frequent use of a word in a relatively small number of comments.

如果尚未安装词云自定义视觉对象,请安装它。If you don't already have the Word Cloud custom visual installed, install it. 在工作区右侧的“可视化效果”面板中,单击三点形式的省略号 (...),然后选择“从存储导入”。In the Visualizations panel to the right of the workspace, click the three dots (...) and choose Import From Store. 然后搜索“云”并单击词云视觉对象旁边的“添加”按钮。Then search for "cloud" and click the Add button next the Word Cloud visual. Power BI 会安装词云视觉对象并会让你知道它已成功安装。Power BI installs the Word Cloud visual and lets you know that it installed successfully.

[添加自定义视觉对象]

首先,单击“可视化效果”面板中的“词云”图标。First, click the Word Cloud icon in the Visualizations panel.

[“可视化效果”面板中的“词云”图标]

此时会在工作区中显示新的报表。A new report appears in the workspace. keyphrases 字段从“字段”面板拖至“可视化效果”面板中的“类别”字段。Drag the keyphrases field from the Fields panel to the Category field in the Visualizations panel. 词云会显示在报表中。The word cloud appears inside the report.

现在,请切换到“可视化效果”面板的“格式”页面。Now switch to the Format page of the Visualizations panel. 在“非索引字”类别中启用“默认非索引字”,以便从云中消除短的常用词,例如“of”。In the Stop Words category, turn on Default Stop Words to eliminate short, common words like "of" from the cloud.

[激活默认非索引字]

在此面板中再往下稍微移动一点,禁用“旋转文本”和“标题”。Down a little further in this panel, turn off Rotate Text and Title.

[激活焦点模式]

单击报表中的“焦点模式”工具可以更好地查看词云。Click the Focus Mode tool in the report to get a better look at our word cloud. 此工具可将词云展开,使之填充整个工作区,如下所示。The tool expands the word cloud to fill the entire workspace, as shown below.

[一个词云]

更多文本分析服务More Text Analytics services

作为 Azure 提供的认知服务之一,文本分析服务还提供情绪分析和语言检测功能。The Text Analytics service, one of the Cognitive Services offered by Azure, also provides sentiment analysis and language detection. 语言检测尤其适用于客户反馈不完全是英语的情况。The language detection in particular is useful if your customer feedback isn't all in English.

这两个其他的 API 类似于关键短语 API。Both of these other APIs are similar to the Key Phrases API. 这意味着可以使用与你在本教程中所创建的近乎相同的自定义函数,将它们与 Power BI Desktop 集成。That means you can integrate them with Power BI Desktop using custom functions that are nearly identical to the one you created in this tutorial. 直接创建一个空白查询,然后将下面的相应代码粘贴到高级编辑器中,就像此前的操作那样。Just create a blank query and paste the appropriate code below into the Advanced Editor, as you did earlier. (请勿忘记访问密钥!)然后,像以前那样,使用函数向表添加新列。(Don't forget your access key!) Then, as before, use the function to add a new column to the table.

下面的情绪分析函数返回一个分数,指示文本中表达的情绪的正面程度。The Sentiment Analysis function below returns a score indicating how positive the sentiment expressed in the text is.

// Returns the sentiment score of the text, from 0.0 (least favorable) to 1.0 (most favorable)
(text) => let
    apikey      = "YOUR_API_KEY_HERE",
    endpoint    = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/sentiment",
    jsontext    = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    jsonbody    = "{ documents: [ { language: ""en"", id: ""0"", text: " & jsontext & " } ] }",
    bytesbody   = Text.ToBinary(jsonbody),
    headers     = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp   = Web.Contents(endpoint, [Headers=headers, Content=bytesbody]),
    jsonresp    = Json.Document(bytesresp),
    sentiment   = jsonresp[documents]{0}[score]
in  sentiment

下面是两个版本的语言检测函数。Here are two versions of a Language Detection function. 第一个返回 ISO 语言代码(例如,表示英语的 en),而第二个则返回“友好”名称(例如 English)。The first returns the ISO language code (for example, en for English), while the second returns the "friendly" name (for example, English). 可以看到,这两个版本仅正文的最后一行有差异。You may notice that only the last line of the body differs between the two versions.

// Returns the two-letter language code (for example, 'en' for English) of the text
(text) => let
    apikey      = "YOUR_API_KEY_HERE",
    endpoint    = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/languages",
    jsontext    = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    jsonbody    = "{ documents: [ { id: ""0"", text: " & jsontext & " } ] }",
    bytesbody   = Text.ToBinary(jsonbody),
    headers     = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp   = Web.Contents(endpoint, [Headers=headers, Content=bytesbody]),
    jsonresp    = Json.Document(bytesresp),
    language    = jsonresp[documents]{0}[detectedLanguages]{0}[iso6391Name]
in  language
// Returns the name (for example, 'English') of the language in which the text is written
(text) => let
    apikey      = "YOUR_API_KEY_HERE",
    endpoint    = "https://api.cognitive.azure.cn/text/analytics/v2.1/languages",
    jsontext    = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    jsonbody    = "{ documents: [ { id: ""0"", text: " & jsontext & " } ] }",
    bytesbody   = Text.ToBinary(jsonbody),
    headers     = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp   = Web.Contents(endpoint, [Headers=headers, Content=bytesbody]),
    jsonresp    = Json.Document(bytesresp),
    language    = jsonresp[documents]{0}[detectedLanguages]{0}[name]
in  language

最后,下面是此前已提供的关键短语函数的变体,它返回的短语是列表对象,而不是单个字符串(包含逗号分隔的短语)。Finally, here's a variant of the Key Phrases function already presented that returns the phrases as a list object, rather than as a single string of comma-separated phrases.

Note

返回单个字符串简化了词云示例。Returning a single string simplified our word cloud example. 而列表则是更灵活的格式,可以与 Power BI 中返回的短语配合使用。A list, on the other hand, is a more flexible format for working with the returned phrases in Power BI. 可以使用查询编辑器的“转换”功能区中的“结构化列”组来操作 Power BI Desktop 中的列表对象。You can manipulate list objects in Power BI Desktop using the Structured Column group in the Query Editor's Transform ribbon.

// Returns key phrases from the text as a list object
(text) => let
    apikey      = "YOUR_API_KEY_HERE",
    endpoint    = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/keyPhrases",
    jsontext    = Text.FromBinary(Json.FromValue(Text.Start(Text.Trim(text), 5000))),
    jsonbody    = "{ documents: [ { language: ""en"", id: ""0"", text: " & jsontext & " } ] }",
    bytesbody   = Text.ToBinary(jsonbody),
    headers     = [#"Ocp-Apim-Subscription-Key" = apikey],
    bytesresp   = Web.Contents(endpoint, [Headers=headers, Content=bytesbody]),
    jsonresp    = Json.Document(bytesresp),
    keyphrases  = jsonresp[documents]{0}[keyPhrases]
in  keyphrases

后续步骤Next steps

详细了解文本分析服务、Power Query M 公式语言或 Power BI。Learn more about the Text Analytics service, the Power Query M formula language, or Power BI.