Create Linux-based clusters in HDInsight by using the Azure portal

The Azure portal is a web-based management tool for services and resources hosted in the Azure cloud. In this article, you learn how to create Linux-based Azure HDInsight clusters by using the portal.

Prerequisites

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

  • An Azure subscription. See Get Azure trial.
  • A modern web browser. The Azure portal uses HTML5 and JavaScript, and it might not function correctly in older web browsers.

Create clusters

The Azure portal exposes most of the cluster properties. By using Azure Resource Manager templates, you can hide many details. For more information, see Create Apache Hadoop clusters in HDInsight by using Resource Manager templates.

Note

The secure transfer required feature enforces all requests to your account through a secure connection. Only HDInsight cluster version 3.6 or newer supports this feature. For more information, see Create Apache Hadoop cluster with secure transfer storage accounts in Azure HDInsight.

  1. Sign in to the Azure portal.

  2. From the left menu, select "+ Create a resource".

  3. Under "Azure Marketplace", select "Data + Analytics".

  4. Under "Featured", select "HDInsight".

    Create a new cluster in the Azure portal

  5. On the "HDInsight" page, select "Custom (size, settings, apps)".

  6. Select "1 Basics". Then enter the following information.

    Configure basic settings

    • Enter the "Cluster Name". This name must be globally unique.

    • From the "Subscription" drop-down list, select the Azure subscription to use for the cluster.

    • Select "Cluster type". Then select the type of cluster you want to create. Examples include Hadoop and Apache Spark. The "Operating system" is Linux. Next, select a cluster type version. If you don't know which version to choose, use the default version. For more information, see HDInsight cluster versions.

      Important

      HDInsight clusters come in a variety of types. Each type corresponds to the workload or technology that the cluster is tuned for. There's no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster.

    • For "Cluster login username" and "Cluster login password", provide the username and password for the admin user.

    • Enter an "SSH Username". If you want the SSH password to be the same as the admin password you specified earlier, select the "Use same password as cluster login" check box. If not, provide either a "PASSWORD" or a "PUBLIC KEY" to authenticate the SSH user. We recommend using a public key. Choose "Select" at the bottom to save the credentials configuration.

      For more information, see Connect to HDInsight (Apache Hadoop) by using SSH.

    • For "Resource group", specify whether you want to create a new resource group or use an existing one.

    • Specify the data center location in which to create the cluster.

    • Select "Next" to move to the next page.

  7. From "2 Security + networking", you can connect your cluster to a virtual network by using the provided drop-down menu. If you want to place the cluster into a virtual network, select an Azure virtual network and the subnet. For information on using HDInsight with a virtual network, see Extend HDInsight capabilities by using an Azure Virtual Network. That article includes specific configuration requirements for the virtual network.

    Select "Next" to move to the next page.

  8. From "3 Storage", specify whether you want Azure Storage or Azure Data Lake Storage as your default storage. For more information, see the following table.

    Set storage settings

    Storage: Azure Storage blobs as the default storage
    Description:
    • For "Primary Storage type", select "Azure Storage". For "Selection method", choose "My subscriptions" if you want to specify a storage account that's part of your Azure subscription. Then select the storage account. Otherwise, select "Access key". Then provide the information for the storage account that you want to use from outside your Azure subscription.
    • For "Default container", choose the default container name suggested by the portal or specify your own.
    • If Azure Blob storage is your default storage, you can also select "Additional Storage Accounts" to specify additional storage accounts to associate with the cluster. For "Azure Storage Keys", select "Add a storage key". Then you can provide a storage account from your Azure subscriptions or from other subscriptions. Provide the storage account access key.
    • If Blob storage is your default storage, you can also select "Data Lake Storage access" to specify Azure Data Lake Storage as additional storage. For more information, see Quickstart: Set up clusters in HDInsight.

    Storage: External metastores
    Description: Optionally, specify a SQL database to save the Apache Hive and Apache Oozie metadata associated with the cluster. For "Select a SQL database for Hive", select a SQL database. Then provide the username and password for the database. Repeat these steps for Oozie metadata.

    Some considerations about using Azure SQL database for metastores are as follows:
    • The Azure SQL database that's used for the metastore must allow connectivity to other Azure services, including Azure HDInsight. On the right side of the Azure SQL database dashboard, select the server name. This server is the one that the SQL database instance runs on. After you're in server view, select "Configure". Then for "Azure Services", select "Yes". Then select "Save".
    • When you create a metastore, don't name a database with dashes or hyphens. These characters can cause the cluster creation process to fail.

    Warning

    Using an additional storage account in a different location than the HDInsight cluster isn't supported.

    Select "Next" to move to the next page.

  9. From "4 Applications (optional)", select any applications that you want. These applications can be developed by Microsoft, by independent software vendors (ISVs), or by you. For more information, see Install applications during cluster creation.

    Select "Next" to move to the next page.

  10. "5 Cluster size" displays information about the nodes that are used for this cluster. Set the number of worker nodes that you need for the cluster. The estimated cost of running the cluster is also shown.

    Specify node pricing tiers

    Important

    If you plan to use more than 32 worker nodes, select a head node size with at least eight cores and 14 GB of RAM. You can plan for this node count either when you create the cluster or when you scale the cluster after creation.

    For more information on node sizes and associated costs, see HDInsight pricing.

    Select "Next" to move to the next page.

  11. From "6 Script actions", you can customize a cluster to install custom components. Use this option if you want to run a custom script that customizes the cluster while it's being created. For more information about script actions, see Customize Linux-based HDInsight clusters by using script actions.

    Select "Next" to move to the next page.

  12. From "7 Summary", verify the information you entered earlier. Then select "Create".

    Confirm configurations

    Note

    It takes some time for the cluster to be created, usually around 20 minutes. Monitor "Notifications" to check on the provisioning process.

  13. After the creation process finishes, select "Go to Resource" from the "Deployment succeeded" notification. The cluster window provides the following information.

    Cluster interface

    The icons in the window are explained as follows:

    • The "Overview" tab provides all the essential information about the cluster. Examples are the name, the resource group it belongs to, the location, the operating system, and the URL for the cluster dashboard.
    • "Dashboard" directs you to the Ambari portal associated with the cluster.
    • "Secure Shell" provides the information needed to access the cluster by using SSH.
    • "Scale cluster" lets you increase the number of worker nodes associated with the cluster.
    • "Delete" deletes the HDInsight cluster.
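
Several constraints mentioned in the steps above can be checked before you start the wizard. The following is a minimal Python sketch, not part of the portal or of any Azure SDK. The ClusterPlan class, its field names, and the preflight_issues function are hypothetical names introduced here for illustration. The sketch encodes three rules from this article: the head node size for more than 32 worker nodes, the ban on dashes and hyphens in metastore database names, and the requirement that additional storage accounts share the cluster's location.

```python
from dataclasses import dataclass, field
from typing import List

# Characters treated as "dashes or hyphens": hyphen-minus, en dash, em dash.
# The exact set the service rejects is an assumption of this sketch.
DASH_CHARS = ("-", "\u2013", "\u2014")


@dataclass
class ClusterPlan:
    """Hypothetical container for values you intend to enter in the portal."""
    cluster_location: str
    worker_nodes: int
    head_node_cores: int
    head_node_ram_gb: float
    metastore_db_names: List[str] = field(default_factory=list)
    additional_storage_locations: List[str] = field(default_factory=list)


def preflight_issues(plan: ClusterPlan) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    issues: List[str] = []

    # More than 32 worker nodes calls for a head node with
    # at least 8 cores and 14 GB of RAM.
    if plan.worker_nodes > 32 and (
        plan.head_node_cores < 8 or plan.head_node_ram_gb < 14
    ):
        issues.append("head node too small for more than 32 worker nodes")

    # Metastore database names must not contain dashes or hyphens.
    for name in plan.metastore_db_names:
        if any(ch in name for ch in DASH_CHARS):
            issues.append(f"metastore database name contains a dash: {name}")

    # Additional storage accounts must be in the same location as the cluster.
    # This is a plain string comparison; location names are not normalized.
    for location in plan.additional_storage_locations:
        if location != plan.cluster_location:
            issues.append(f"additional storage account in another location: {location}")

    return issues


if __name__ == "__main__":
    plan = ClusterPlan(
        cluster_location="East US",
        worker_nodes=40,
        head_node_cores=4,
        head_node_ram_gb=14,
        metastore_db_names=["hive-meta"],
        additional_storage_locations=["West US"],
    )
    for issue in preflight_issues(plan):
        print(issue)
```

Returning a list of issues rather than raising on the first one lets you fix every problem in a single pass before you open the portal.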

Customize clusters

Delete the cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
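
To get a feel for what per-minute proration means for an idle cluster, here is a small Python sketch. The function name and the hourly rate in the example are made up for illustration. Actual rates depend on node sizes and counts, so take them from the HDInsight pricing page.

```python
def estimated_cluster_cost(hourly_rate: float, minutes_running: int) -> float:
    """Prorate an hourly rate over the minutes the cluster has existed.

    The charge accrues whether or not the cluster is doing any work,
    so minutes_running counts wall-clock time from creation to deletion.
    """
    if hourly_rate < 0 or minutes_running < 0:
        raise ValueError("hourly_rate and minutes_running must be non-negative")
    return hourly_rate * minutes_running / 60


if __name__ == "__main__":
    # Hypothetical rate of 3.00 per hour for the whole cluster,
    # left running and unused over a 48-hour weekend.
    weekend_minutes = 48 * 60
    print(estimated_cluster_cost(3.00, weekend_minutes))
```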

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

You've successfully created an HDInsight cluster. Now learn how to work with your cluster.

Apache Hadoop clusters

Apache HBase clusters

Apache Storm clusters

Apache Spark clusters