使用 Power Query 将 Excel 连接到 Apache HadoopConnect Excel to Apache Hadoop by using Power Query

Microsoft 大数据解决方案的一个关键功能是,将 Microsoft 商业智能 (BI) 组件与 Azure HDInsight 中的 Apache Hadoop 群集相集成。One key feature of the Microsoft big-data solution is the integration of Microsoft business intelligence (BI) components with Apache Hadoop clusters in Azure HDInsight. 一个主要示例是能够使用 Microsoft Power Query for Excel 外接程序将 Excel 连接到包含与 Hadoop 群集关联的数据的 Azure 存储帐户。A primary example is the ability to connect Excel to the Azure Storage account that contains the data associated with your Hadoop cluster by using the Microsoft Power Query for Excel add-in. 本文将逐步说明如何设置和使用 Power Query 来查询与 HDInsight 管理的 Hadoop 群集关联的数据。This article walks you through how to set up and use Power Query to query data associated with a Hadoop cluster managed with HDInsight.

先决条件Prerequisites

  • HDInsight 中的 Apache Hadoop 群集。An Apache Hadoop cluster on HDInsight. 请参阅 Linux 上的 HDInsight 入门See Get Started with HDInsight on Linux.
  • 运行 Windows 10、Windows 7、Windows Server 2008 R2 或更高版本操作系统的工作站。A workstation that is running Windows 10, 7, Windows Server 2008 R2, or a later operating system.
  • Office 2016、Office 2013 Professional Plus、Office 365 ProPlus、Excel 2013 Standalone 或 Office 2010 Professional Plus。Office 2016, Office 2013 Professional Plus, Office 365 ProPlus, Excel 2013 Standalone, or Office 2010 Professional Plus.

安装 Microsoft Power QueryInstall Microsoft Power Query

Power Query 可以导入已输出的数据,或者导入已由在 HDInsight 群集上运行的 Hadoop 作业生成的数据。Power Query can import data that has been output or that has been generated by a Hadoop job running on an HDInsight cluster.

在 Excel 2016 中,Power Query 已集成到“获取并转换”部分下的数据功能区。In Excel 2016, Power Query has been integrated into the Data ribbon under the Get & Transform section. 对于较旧的 Excel 版本,请从 Microsoft 下载中心下载 Microsoft Power Query for Excel 并安装它。For older Excel versions, download Microsoft Power Query for Excel from the Microsoft Download Center and install it.

将 HDInsight 数据导入 ExcelImport HDInsight data into Excel

使用用于 Excel 的 Power Query 外接程序,可以轻松地将 HDInsight 群集中的数据导入到 Excel 中,并可以在 Excel 中使用 PowerPivot 和 Power Map 之类的 BI 工具对数据进行检查、分析和显示。The Power Query add-in for Excel makes it easy to import data from your HDInsight cluster into Excel, where BI tools such as PowerPivot and Power Map can be used to inspect, analyze, and present the data.

  1. 启动 Excel。Launch Excel.

  2. 创建一个新的空白工作簿。Create a new blank workbook.

  3. 基于 Excel 版本,执行以下步骤:Perform the following steps based on the Excel version:

    • Excel 2016Excel 2016

      • 选择 >“数据” > “获取数据” > “从 Azure” > “从 Azure HDInsight(HDFS)” 。Select > Data > Get Data > From Azure > From Azure HDInsight(HDFS).

        HDI.PowerQuery.SelectHdiSource.2016

    • Excel 2013/2010Excel 2013/2010

      • 选择“Power Query“ > “从 Azure” > “从 Microsoft Azure HDInsight” 。Select Power Query > From Azure > From Microsoft Azure HDInsight.

        HDI.PowerQuery.SelectHdiSource

        注意: 如果未看到“Power Query”菜单,请转到“文件” > “选项” > “加载项”,从页面底部的下拉“管理”框中选择“COM 加载项” 。Note: If you don't see the Power Query menu, go to File > Options > Add-ins, and select COM Add-ins from the drop-down Manage box at the bottom of the page. Select the Go... button and verify that the box for the Power Query for Excel add-in has been checked.

        注意: Power Query 还允许通过选择“来自其他源”从 HDFS 中导入数据 。Note: Power Query also allows you to import data from HDFS by selecting From Other Sources.

  4. 在“Azure HDInsight(HDFS)” 对话框的“帐户名称或 URL” 文本框中,输入与群集关联的 Azure Blob 存储帐户的名称。From the Azure HDInsight(HDFS) dialog, in the Account name or URL text box, enter the name of the Azure Blob storage account associated with your cluster. 然后选择“确定”。 Then select OK. 此帐户可以是默认存储帐户或链接的存储帐户。This account can be the default storage account or a linked storage account. 格式为 https://StorageAccountName.blob.core.windows.net/The format is https://StorageAccountName.blob.core.windows.net/.

  5. 对于“帐户密钥”,请输入 Blob 存储帐户的密钥,然后选择“连接” 。For Account Key, enter the key for the Blob storage account, and then select Connect. (只需要在首次访问此存储时输入帐户信息。)(You need to enter the account information only the first time you access this store.)

  6. 在“查询编辑器”左侧的“导航器”窗格中,双击与群集关联的 Blob 存储容器名称 。In the Navigator pane on the left of the Query Editor, double-click the Blob storage container name associated with your cluster. 默认情况下,该容器名称与群集名称相同。By default, the container name is the same name as the cluster name.

  7. 在“名称”列中找到“HiveSampleData.txt”(文件夹路径是“../hive/warehouse/hivesampletable/”),然后选择 HiveSampleData.txt 左侧的“二进制” 。Locate HiveSampleData.txt in the Name column (the folder path is ../hive/warehouse/hivesampletable/), and then select Binary on the left of HiveSampleData.txt. HiveSampleData.txt 随所有群集提供。HiveSampleData.txt comes with all the cluster. (可选)可使用自己的文件。Optionally, you can use your own file.

    HDI Excel Power Query 导入数据

  8. 可根据需要重命名列名称。If you want, you can rename the column names. 准备就绪后,选择“关闭并加载”。 When you're ready, select Close & Load. 数据已加载到工作簿:The data has been loaded to your workbook:

    HDI Excel Power Query 导入的表

后续步骤Next steps

在本文中,已了解如何使用 Power Query 将数据从 HDInsight 检索到 Excel 中。In this article, you learned how to use Power Query to retrieve data from HDInsight into Excel. 同样地,也可以将来自 HDInsight 的数据检索到 Azure SQL 数据库中。Similarly, you can retrieve data from HDInsight into Azure SQL Database. 也可以将数据上传到 HDInsight 中。It is also possible to upload data into HDInsight. 要了解更多信息,请参阅下列文章:To learn more, see the following articles: