Install third-party Apache Hadoop applications on Azure HDInsight

Learn how to install a third-party Apache Hadoop application on Azure HDInsight. For instructions on installing your own application, see Install custom HDInsight applications.

An HDInsight application is an application that users can install on an HDInsight cluster. These applications can be developed by Microsoft, by independent software vendors (ISVs), or by you.

The following list shows the published applications:

| Application | Cluster type(s) | Description |
| --- | --- | --- |
| AtScale Intelligence Platform | Hadoop | AtScale turns your HDInsight cluster into a scale-out OLAP server, allowing you to query billions of rows of data interactively using the BI tools you already know, own, and love, from Microsoft Excel, Power BI, and Tableau Software to QlikView. |
| CDAP for HDInsight | HBase | CDAP is the first unified integration platform for big data that accelerates time to value for Hadoop and enables IT to provide self-service data. Open source and extensible, CDAP removes barriers to innovation. Requirements: 4 region nodes, minimum size D3 v2. |
| Datameer | Hadoop | Datameer's self-service, scalable platform for preparing, exploring, and governing your data for analytics accelerates turning complex multisource data into valuable business-ready information, delivering faster, smarter insights at enterprise scale. |
| Dataiku DSS on HDInsight | Hadoop, Spark | Dataiku DSS is an enterprise data science platform that lets data scientists and data analysts collaborate to design and run new data products and services more efficiently, turning raw data into impactful predictions. |
| WANdisco Fusion HDI App | Hadoop, Spark, HBase, Storm, Kafka | Keeping data consistent in a distributed environment is a massive data operations challenge. WANdisco Fusion, an enterprise-class software platform, solves this problem by enabling unstructured data consistency across any environment. |
| H2O SparklingWater for HDInsight | Spark | H2O Sparkling Water supports the following distributed algorithms: GLM, Naïve Bayes, Distributed Random Forest, Gradient Boosting Machine, Deep Neural Networks, Deep Learning, K-means, PCA, Generalized Low Rank Models, Anomaly Detection, and Autoencoders. |
| Striim for Real-Time Data Integration to HDInsight | Hadoop, HBase, Storm, Spark, Kafka | Striim (pronounced "stream") is an end-to-end streaming data integration and intelligence platform, enabling continuous ingestion, processing, and analytics of disparate data streams. |
| Jumbune Enterprise-Accelerating BigData Analytics | Hadoop, Spark | At a high level, Jumbune assists enterprises by: 1. Accelerating the performance of Hive, Java, and Scala workloads based on the Tez, MapReduce, and Spark engines. 2. Proactively monitoring Hadoop clusters. 3. Establishing data quality management on the distributed file system. |
| Kyligence Enterprise | Hadoop, HBase, Spark | Powered by Apache Kylin, Kyligence Enterprise enables BI on big data. As an enterprise OLAP engine on Hadoop, Kyligence Enterprise empowers business analysts to architect BI on Hadoop with industry-standard data warehouse and BI methodology. |
| Starburst Presto for Azure HDInsight | Hadoop | Presto is a fast and scalable distributed SQL query engine. Architected for the separation of storage and compute, Presto is well suited to querying data in Azure Data Lake Storage, Azure Blob Storage, SQL and NoSQL databases, and other data sources. |
| StreamSets Data Collector for HDInsight Cloud | Hadoop, HBase, Spark, Kafka | StreamSets Data Collector is a lightweight, powerful engine that streams data in real time. Use Data Collector to route and process data in your data streams. It comes with a 30-day trial license. |
| Trifacta Wrangler Enterprise | Hadoop, Spark, HBase | Trifacta Wrangler Enterprise for HDInsight supports enterprise-wide data wrangling for any scale of data. The cost of running Trifacta on Azure is the Trifacta subscription cost plus the Azure infrastructure cost for the virtual machines. |
| Unifi Data Platform | Hadoop, HBase, Storm, Spark | The Unifi Data Platform is a seamlessly integrated suite of self-service data tools designed to empower the business user to tackle data challenges that drive incremental revenue, reduce costs, or reduce operational complexity. |
| Unraveldata APM | Spark | Unravel Data app for HDInsight Spark clusters. |
| Waterline AI-Driven Data Catalog | Spark | Waterline catalogs, organizes, and governs data using AI to auto-tag data with business terms. Waterline's business-literate catalog is a critical success component for self-service analytics, compliance and governance, and IT management initiatives. |

The instructions in this article use the Azure portal. You can also export the Azure Resource Manager template from the portal, or obtain a copy of the Resource Manager template from the vendor, and then deploy the template by using Azure PowerShell or the Azure Classic CLI. See Create Apache Hadoop clusters on HDInsight using Resource Manager templates.
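Before deploying an exported or vendor-supplied template, it can help to confirm which applications it actually declares. The following Python sketch is one way to do that, not an official tool. It assumes that applications appear in the template under the resource type `Microsoft.HDInsight/clusters/applications` and that `resources` is a JSON array. The helper `find_application_resources`, the application name `sample-app`, and the trimmed sample template are hypothetical and exist only for illustration; the sample is not deployable as written.

```python
import json

# Assumed resource type for HDInsight applications in a Resource Manager template.
APPLICATION_TYPE = "Microsoft.HDInsight/clusters/applications"


def find_application_resources(template):
    """Return the names of application resources declared in a template.

    Walks the top-level "resources" array and any nested "resources"
    arrays, because child resources may be declared inside their parent.
    """
    found = []

    def walk(resources, parent_type=""):
        for resource in resources:
            rtype = resource.get("type", "")
            # A nested child may use a short type such as "applications";
            # join it to the parent type before comparing.
            if "/" in rtype or not parent_type:
                full_type = rtype
            else:
                full_type = f"{parent_type}/{rtype}"
            if full_type.lower() == APPLICATION_TYPE.lower():
                found.append(resource.get("name", ""))
            walk(resource.get("resources", []), full_type)

    walk(template.get("resources", []))
    return found


# Hypothetical, heavily trimmed template used only to exercise the helper.
sample_template = json.loads("""
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.HDInsight/clusters/applications",
      "name": "[concat(parameters('clusterName'), '/sample-app')]",
      "properties": {}
    }
  ]
}
""")

print(find_application_resources(sample_template))
```

To check a real template, load the exported file with `json.load` and pass the result to the helper. An empty list suggests the template creates a cluster without any applications.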

Prerequisites

To install HDInsight applications on an existing HDInsight cluster, you must have an HDInsight cluster. To create one, see Create clusters. You can also install HDInsight applications when you create an HDInsight cluster.

Install applications to existing clusters

The following procedure shows you how to install HDInsight applications to an existing HDInsight cluster.

Install an HDInsight application

  1. Sign in to the Azure portal.

  2. From the left menu, navigate to All services > Analytics > HDInsight clusters.

  3. Select an HDInsight cluster from the list. If you don't have one, you must create one first. See Create clusters.

  4. Under the Settings category, select Applications. The main window shows a list of installed applications.

    Screenshot: HDInsight applications portal menu

  5. Select +Add from the menu to see a list of available applications. If +Add is greyed out, no applications are available for this version of the HDInsight cluster.

    Screenshot: HDInsight available applications

  6. Select one of the available applications, and then follow the instructions to accept the legal terms.

You can see the installation status in the portal notifications (select the bell icon at the top of the portal). After the application is installed, it appears in the Installed Apps list.

Install applications during cluster creation

You can install HDInsight applications when you create a cluster. During the process, HDInsight applications are installed after the cluster is created and is in the running state. To install applications during cluster creation using the Azure portal, select + Add application on the Configuration + pricing tab.

Screenshot: Azure portal cluster configuration applications

List installed HDInsight apps and properties

The portal shows a list of the installed HDInsight applications for a cluster, and the properties of each installed application.

List HDInsight applications and display properties

  1. Sign in to the Azure portal.

  2. From the left menu, navigate to All services > Analytics > HDInsight clusters.

  3. Select an HDInsight cluster from the list.

  4. Under the Settings category, select Applications. The main window shows a list of installed applications.

    Screenshot: HDInsight applications installed apps

  5. Select one of the installed applications to show its properties. The properties are:

    | Property | Description |
    | --- | --- |
    | App name | The application name. |
    | Status | The application status. |
    | Webpage | The URL of the web application that you have deployed to the edge node. The credentials are the same as the HTTP user credentials that you have configured for the cluster. |
    | SSH endpoint | You can use SSH to connect to the edge node. The SSH credentials are the same as the SSH user credentials that you have configured for the cluster. For information, see Use SSH with HDInsight. |
    | Description | The application description. |

  6. To delete an application, right-click the application, and then select Delete from the context menu.

Connect to the edge node

You can connect to the edge node using HTTP and SSH. The endpoint information is available in the portal. For information, see Use SSH with HDInsight.

The HTTP endpoint credentials are the HTTP user credentials that you have configured for the HDInsight cluster; the SSH endpoint credentials are the SSH credentials that you have configured for the HDInsight cluster.
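As a minimal sketch of how the two sets of credentials might be used from a script, the following Python example builds an SSH command line and an HTTP authorization header. It makes several assumptions: that the HTTP endpoint accepts HTTP Basic authentication, that SSH listens on port 22, and that you copy the real endpoint from the application's properties in the portal. The helper names, the user names, and the host `myapp.mycluster-ssh.example.net` are hypothetical.

```python
import base64
import shlex


def build_ssh_command(ssh_user, ssh_endpoint, port=22):
    """Build the argument list for an SSH connection to the edge node.

    The default port of 22 is an assumption; use the value shown in the portal.
    """
    return ["ssh", "-p", str(port), f"{ssh_user}@{ssh_endpoint}"]


def basic_auth_header(http_user, http_password):
    """Build an HTTP Basic Authorization header from the HTTP user credentials.

    Assumes the edge node web endpoint accepts HTTP Basic authentication.
    """
    raw = f"{http_user}:{http_password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical values; replace them with the endpoint and users for your cluster.
command = build_ssh_command("sshuser", "myapp.mycluster-ssh.example.net")
print(shlex.join(command))

headers = basic_auth_header("admin", "secret")
print(sorted(headers))
```

Returning the SSH command as a list rather than a single string lets you pass it directly to `subprocess.run` without shell quoting issues. Avoid hard-coding real passwords in scripts; read them from a prompt or a secret store instead.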

Troubleshoot

See Troubleshoot the installation.

Next steps