Create Linux-based clusters in HDInsight using Azure PowerShell

Azure PowerShell is a scripting environment that you can use to control and automate the deployment and management of your workloads in Azure. This document explains how to create a Linux-based HDInsight cluster by using Azure PowerShell, and includes an example script.

Note

Azure PowerShell is only available on Windows clients. If you are using a Linux, Unix, or Mac OS X client, see Create a Linux-based HDInsight cluster using Azure Classic CLI for information about using the classic CLI to create a cluster.

Prerequisites

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

You must have the following before starting this procedure:

Create cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.

To create an HDInsight cluster by using Azure PowerShell, you must complete the following procedures:

  • Create an Azure resource group
  • Create an Azure Storage account
  • Create an Azure Blob container
  • Create an HDInsight cluster

The following script demonstrates how to create a new cluster:

# Log in to your Azure subscription
# if there is no active subscription in this session
$sub = Get-AzureRmSubscription -ErrorAction SilentlyContinue
if(-not($sub))
{
    Add-AzureRmAccount -EnvironmentName AzureChinaCloud
}

# If you have multiple subscriptions, set the one to use
# $subscriptionID = "<subscription ID to use>"
# Select-AzureRmSubscription -SubscriptionId $subscriptionID

# Get user input/default values
$resourceGroupName = Read-Host -Prompt "Enter the resource group name"
$location = Read-Host -Prompt "Enter the Azure region to create resources in"

# Create the resource group
New-AzureRmResourceGroup -Name $resourceGroupName -Location $location

$defaultStorageAccountName = Read-Host -Prompt "Enter the name of the storage account"

# Create an Azure storage account and a storage context for it
New-AzureRmStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName `
    -Type Standard_LRS `
    -Location $location
$defaultStorageAccountKey = (Get-AzureRmStorageAccountKey `
                                -ResourceGroupName $resourceGroupName `
                                -Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzureStorageContext `
                                -StorageAccountName $defaultStorageAccountName `
                                -StorageAccountKey $defaultStorageAccountKey

# Get information for the HDInsight cluster
$clusterName = Read-Host -Prompt "Enter the name of the HDInsight cluster"
# Cluster login is used to secure HTTPS services hosted on the cluster
$httpCredential = Get-Credential -Message "Enter Cluster login credentials" -UserName "admin"
# SSH user is used to remotely connect to the cluster using SSH clients
$sshCredentials = Get-Credential -Message "Enter SSH user credentials"

# Default cluster size (number of worker nodes), version, type, and OS
$clusterSizeInNodes = 4
$clusterVersion = "3.5"
$clusterType = "Hadoop"
$clusterOS = "Linux"
# Set the storage container name to the cluster name
$defaultBlobContainerName = $clusterName

# Create a blob container. This holds the default data store for the cluster.
New-AzureStorageContainer `
    -Name $defaultBlobContainerName -Context $defaultStorageContext

# Create the HDInsight cluster
New-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $clusterName `
    -Location $location `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -ClusterType $clusterType `
    -OSType $clusterOS `
    -Version $clusterVersion `
    -HttpCredential $httpCredential `
    -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.chinacloudapi.cn" `
    -DefaultStorageAccountKey $defaultStorageAccountKey `
    -DefaultStorageContainer $defaultBlobContainerName `
    -SshCredential $sshCredentials

The values you specify for the cluster login are used to create the Hadoop user account for the cluster. Use this account to connect to services hosted on the cluster, such as web UIs or REST APIs.

The values you specify for the SSH user are used to create the SSH user for the cluster. Use this account to start a remote SSH session on the cluster and run jobs. For more information, see the Use SSH with HDInsight document.

Important

If you plan to use more than 32 worker nodes (either at cluster creation or by scaling the cluster after creation), you must also specify a head node size with at least 8 cores and 14 GB of RAM.

For more information on node sizes and associated costs, see HDInsight pricing.
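The head node requirement above can be sketched by collecting the cluster parameters in a hashtable and splatting them with an explicit head node size. This is a minimal sketch, not a recommended configuration: the worker node count of 40 and the Standard_D13_v2 size are illustrative assumptions, and the variables are assumed to still be in scope from the script above. Check HDInsight pricing for the sizes available in your region.

```powershell
# Assumes the variables from the creation script above are still in scope.
$largeClusterParams = @{
    ResourceGroupName         = $resourceGroupName
    ClusterName               = $clusterName
    Location                  = $location
    ClusterSizeInNodes        = 40                  # illustrative: more than 32 worker nodes
    ClusterType               = $clusterType
    OSType                    = $clusterOS
    Version                   = $clusterVersion
    HttpCredential            = $httpCredential
    SshCredential             = $sshCredentials
    DefaultStorageAccountName = "$defaultStorageAccountName.blob.core.chinacloudapi.cn"
    DefaultStorageAccountKey  = $defaultStorageAccountKey
    DefaultStorageContainer   = $defaultBlobContainerName
    HeadNodeSize              = "Standard_D13_v2"   # illustrative size with 8 cores
}

# Splat the hashtable so each key is passed as a named parameter
New-AzureRmHDInsightCluster @largeClusterParams
```

Splatting keeps the long parameter list readable and makes it easy to change a single value, such as the head node size, without editing a chain of backtick-continued lines.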

It can take up to 20 minutes to create a cluster.
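Once the creation command returns, you may want to confirm that the cluster is up. The following is a small sketch that assumes $resourceGroupName and $clusterName are still set from the script above; the selection of properties to display is an illustrative choice.

```powershell
# Assumes $resourceGroupName and $clusterName from the creation script.
$cluster = Get-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $clusterName

# Show the cluster state and the HTTPS endpoint used by the cluster login
$cluster | Format-List Name, ClusterState, ClusterType, HttpEndpoint
```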

Create cluster: Configuration object

You can also create an HDInsight configuration object by using the New-AzHDInsightClusterConfig cmdlet. You can then modify this configuration object to enable additional configuration options for your cluster. Finally, pass the object to the -Config parameter of the New-AzHDInsightCluster cmdlet to apply the configuration.

The following script creates a configuration object for a Hadoop cluster and uses it to attach an additional storage account. It reuses the variables defined in the previous script.

$additionalStorageAccountName = Read-Host -Prompt "Enter the name of the additional storage account"

# Create the additional storage account
New-AzureRmStorageAccount -ResourceGroupName $resourceGroupName `
    -StorageAccountName $additionalStorageAccountName `
    -Location $location `
    -Type Standard_LRS

# Get the additional storage account key
$additionalStorageAccountKey = (Get-AzureRmStorageAccountKey -Name $additionalStorageAccountName -ResourceGroupName $resourceGroupName)[0].Value

$config = New-AzureRmHDInsightClusterConfig -ClusterType Hadoop

# Add an additional storage account
Add-AzureRmHDInsightStorage -Config $config -StorageAccountName "$additionalStorageAccountName.blob.core.chinacloudapi.cn" -StorageAccountKey $additionalStorageAccountKey

# Create a new HDInsight cluster using -Config
New-AzureRmHDInsightCluster `
    -ClusterName $clusterName `
    -ResourceGroupName $resourceGroupName `
    -HttpCredential $httpCredential `
    -Location $location `
    -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.chinacloudapi.cn" `
    -DefaultStorageAccountKey $defaultStorageAccountKey `
    -DefaultStorageContainer $defaultBlobContainerName `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -OSType $clusterOS `
    -Version $clusterVersion `
    -SshCredential $sshCredentials `
    -Config $config

Warning

Using a storage account in a different location than the HDInsight cluster is not supported. When using this example, create the additional storage account in the same location as the cluster.
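One way to catch a location mismatch before creating the cluster is to compare the storage account's region with the cluster's region. The sketch below is an assumption-laden illustration: Test-SameAzureLocation is a hypothetical helper, not part of Azure PowerShell, and the variables are assumed to come from the scripts above. The helper ignores case and spaces because a region can be written either as a display name or as a compact name.

```powershell
# Hypothetical helper: compare two region names, ignoring case and spaces,
# so that "China North" and "chinanorth" are treated as the same region.
function Test-SameAzureLocation {
    param(
        [Parameter(Mandatory)] [string] $First,
        [Parameter(Mandatory)] [string] $Second
    )
    $normalize = { param($name) ($name -replace '\s', '').ToLowerInvariant() }
    return (& $normalize $First) -eq (& $normalize $Second)
}

# Assumes the variables from the scripts above are still in scope.
$additionalAccount = Get-AzureRmStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $additionalStorageAccountName

if (-not (Test-SameAzureLocation $additionalAccount.Location $location)) {
    Write-Warning "Storage account '$additionalStorageAccountName' is in $($additionalAccount.Location), not $location."
}
```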

Customize clusters

Delete the cluster

Warning

Billing for HDInsight clusters is prorated per minute, whether you use them or not. Be sure to delete your cluster after you finish using it. See how to delete an HDInsight cluster.
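As a minimal sketch of the cleanup step, the cluster created earlier can be removed with the matching Remove cmdlet. It assumes $resourceGroupName and $clusterName are still set from the creation script. The resource group removal is left commented out because it is an optional, destructive step.

```powershell
# Assumes $resourceGroupName and $clusterName from the creation script.
Remove-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $clusterName

# Optional: remove the whole resource group, including the storage account.
# This permanently deletes the data in the default container.
# Remove-AzureRmResourceGroup -Name $resourceGroupName
```

Deleting the cluster stops the per-minute billing for it, while the storage account and its data remain until you remove them separately.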

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

Now that you have successfully created an HDInsight cluster, use the following resources to learn how to work with your cluster.

Apache Hadoop clusters

Apache HBase clusters

Storm clusters

Apache Spark clusters