Create Hadoop clusters using the Azure REST API

Learn how to create an HDInsight cluster using an Azure Resource Manager template and the Azure REST API.

The Azure REST API allows you to perform management operations on services hosted in the Azure platform, including the creation of new resources such as HDInsight clusters.

Important

Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.

Note

The steps in this document use the curl (https://curl.haxx.se/) utility to communicate with the Azure REST API.

Create a template

Azure Resource Manager templates are JSON documents that describe a resource group and all resources in it (such as HDInsight). This template-based approach allows you to define the resources that you need for HDInsight in one template.

The following JSON document is a merger of the template and parameters files from https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-linux-ssh-password, which creates a Linux-based cluster using a password to secure the SSH user account.

{
    "properties": {
        "template": {
            "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
            "contentVersion": "1.0.0.0",
            "parameters": {
                "clusterType": {
                    "type": "string",
                    "allowedValues": ["hadoop",
                    "hbase",
                    "storm",
                    "spark"],
                    "metadata": {
                        "description": "The type of the HDInsight cluster to create."
                    }
                },
                "clusterName": {
                    "type": "string",
                    "metadata": {
                        "description": "The name of the HDInsight cluster to create."
                    }
                },
                "clusterLoginUserName": {
                    "type": "string",
                    "metadata": {
                        "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
                    }
                },
                "clusterLoginPassword": {
                    "type": "securestring",
                    "metadata": {
                        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
                    }
                },
                "sshUserName": {
                    "type": "string",
                    "metadata": {
                        "description": "These credentials can be used to remotely access the cluster."
                    }
                },
                "sshPassword": {
                    "type": "securestring",
                    "metadata": {
                        "description": "The password must be at least 10 characters in length and must contain at least one digit, one non-alphanumeric character, and one upper or lower case letter."
                    }
                },
                "clusterStorageAccountName": {
                    "type": "string",
                    "metadata": {
                        "description": "The name of the storage account to be created and be used as the cluster's storage."
                    }
                },
                "clusterWorkerNodeCount": {
                    "type": "int",
                    "defaultValue": 4,
                    "metadata": {
                        "description": "The number of nodes in the HDInsight cluster."
                    }
                }
            },
            "variables": {
                "defaultApiVersion": "2015-05-01-preview",
                "clusterApiVersion": "2015-03-01-preview"
            },
            "resources": [{
                "name": "[parameters('clusterStorageAccountName')]",
                "type": "Microsoft.Storage/storageAccounts",
                "location": "[resourceGroup().location]",
                "apiVersion": "[variables('defaultApiVersion')]",
                "dependsOn": [],
                "tags": {

                },
                "properties": {
                    "accountType": "Standard_LRS"
                }
            },
            {
                "name": "[parameters('clusterName')]",
                "type": "Microsoft.HDInsight/clusters",
                "location": "[resourceGroup().location]",
                "apiVersion": "[variables('clusterApiVersion')]",
                "dependsOn": ["[concat('Microsoft.Storage/storageAccounts/',parameters('clusterStorageAccountName'))]"],
                "tags": {

                },
                "properties": {
                    "clusterVersion": "3.6",
                    "osType": "Linux",
                    "clusterDefinition": {
                        "kind": "[parameters('clusterType')]",
                        "configurations": {
                            "gateway": {
                                "restAuthCredential.isEnabled": true,
                                "restAuthCredential.username": "[parameters('clusterLoginUserName')]",
                                "restAuthCredential.password": "[parameters('clusterLoginPassword')]"
                            }
                        }
                    },
                    "storageProfile": {
                        "storageaccounts": [{
                            "name": "[concat(parameters('clusterStorageAccountName'),'.blob.core.chinacloudapi.cn')]",
                            "isDefault": true,
                            "container": "[parameters('clusterName')]",
                            "key": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('clusterStorageAccountName')), variables('defaultApiVersion')).key1]"
                        }]
                    },
                    "computeProfile": {
                        "roles": [{
                            "name": "headnode",
                            "targetInstanceCount": "2",
                            "hardwareProfile": {
                                "vmSize": "Standard_D3"
                            },
                            "osProfile": {
                                "linuxOperatingSystemProfile": {
                                    "username": "[parameters('sshUserName')]",
                                    "password": "[parameters('sshPassword')]"
                                }
                            }
                        },
                        {
                            "name": "workernode",
                            "targetInstanceCount": "[parameters('clusterWorkerNodeCount')]",
                            "hardwareProfile": {
                                "vmSize": "Standard_D3"
                            },
                            "osProfile": {
                                "linuxOperatingSystemProfile": {
                                    "username": "[parameters('sshUserName')]",
                                    "password": "[parameters('sshPassword')]"
                                }
                            }
                        }]
                    }
                }
            }],
            "outputs": {
                "cluster": {
                    "type": "object",
                    "value": "[reference(resourceId('Microsoft.HDInsight/clusters',parameters('clusterName')))]"
                }
            }
        },
        "mode": "incremental",
        "Parameters": {
            "clusterName": {
                "value": "newclustername"
            },
            "clusterType": {
                "value": "hadoop"
            },
            "clusterStorageAccountName": {
                "value": "newstoragename"
            },
            "clusterLoginUserName": {
                "value": "admin"
            },
            "clusterLoginPassword": {
                "value": "changeme"
            },
            "sshUserName": {
                "value": "sshuser"
            },
            "sshPassword": {
                "value": "changeme"
            }
        }
    }
}

This example is used in the steps in this document. Replace the example values in the Parameters section with the values for your cluster.
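
The parameter descriptions in the template state that each password must be at least 10 characters long and contain at least one digit, one non-alphanumeric character, and one letter. The placeholder value changeme does not meet that rule, so it has to be replaced. The following is a minimal sketch of a local check you could run before submitting the template. The function name valid_cluster_password is hypothetical, and the check mirrors only the rule quoted in the template; the service may enforce additional rules.

```shell
# Hypothetical helper: returns 0 if the password satisfies the rule quoted in
# the template's parameter descriptions, non-zero otherwise.
valid_cluster_password() {
    pw=$1
    # At least 10 characters long.
    [ "${#pw}" -ge 10 ] || return 1
    # At least one digit.
    case "$pw" in *[0-9]*) ;; *) return 1 ;; esac
    # At least one upper or lower case letter.
    case "$pw" in *[A-Za-z]*) ;; *) return 1 ;; esac
    # At least one non-alphanumeric character.
    case "$pw" in *[!A-Za-z0-9]*) ;; *) return 1 ;; esac
    return 0
}

# Example usage:
#   valid_cluster_password "$PASSWORD" || echo "Password does not meet the template rule"
```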

Important

The template uses the default number of worker nodes (4) for an HDInsight cluster. If you plan to use more than 32 worker nodes, you must select a head node size with at least 8 cores and 14 GB of RAM.

For more information on node sizes and associated costs, see HDInsight pricing.

Log in to your Azure subscription

Note

Before you can use Azure CLI 2.0 in Azure China, run az cloud set -n AzureChinaCloud to change the cloud environment. To switch back to global Azure, run az cloud set -n AzureCloud.

Follow the steps documented in Get started with Azure CLI and connect to your subscription using the az login command.

Create a service principal

Note

These steps are an abridged version of the Create service principal with password section of the Use Azure CLI to create a service principal to access resources document. These steps create a service principal that is used to authenticate to the Azure REST API.

  1. From a command line, use the following command to list your Azure subscriptions.

    az account list --query '[].{Subscription_ID:id,Tenant_ID:tenantId,Name:name}'  --output table
    

    In the list, select the subscription that you want to use and note the Subscription_ID and Tenant_ID columns. Save these values.

  2. Use the following command to create an application in Azure Active Directory.

    az ad app create --display-name "exampleapp" --homepage "https://www.contoso.org" --identifier-uris "https://www.contoso.org/example" --password <Your password> --query 'appId'
    

    Replace the values for --display-name, --homepage, and --identifier-uris with your own values. Provide a password for the new Active Directory entry.

    Note

    The --homepage and --identifier-uris values don't need to reference an actual web page hosted on the internet. They must be unique URIs.

    The value returned from this command is the App ID for the new application. Save this value.

  3. Use the following command to create a service principal using the App ID.

    az ad sp create --id <App ID> --query 'objectId'
    

    The value returned from this command is the Object ID. Save this value.

  4. Assign the Owner role to the service principal using the Object ID value. Use the subscription ID you obtained earlier.

    az role assignment create --assignee <Object ID> --role Owner --scope /subscriptions/<Subscription ID>/
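
Rather than copying the subscription and tenant IDs from the table by hand, you could capture them in the shell variables used later in this document. The sketch below assumes that the subscription you want is the one the Azure CLI currently has selected; it reads that subscription with az account show instead of picking a row from az account list. The function names are hypothetical.

```shell
# Print the ID of the subscription the Azure CLI is currently using.
current_subscription_id() {
    az account show --query id --output tsv
}

# Print the tenant ID associated with that subscription.
current_tenant_id() {
    az account show --query tenantId --output tsv
}

# Example usage (requires a prior "az login"):
#   SUBSCRIPTIONID=$(current_subscription_id)
#   TENANTID=$(current_tenant_id)
```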
    

Get an authentication token

Use the following command to retrieve an authentication token:

curl -X "POST" "https://login.chinacloudapi.cn/$TENANTID/oauth2/token" \
-H "Cookie: flight-uxoptin=true; stsservicecookie=ests; x-ms-gateway-slice=productionb; stsservicecookie=ests" \
-H "Content-Type: application/x-www-form-urlencoded" \
--data-urlencode "client_id=$APPID" \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "client_secret=$PASSWORD" \
--data-urlencode "resource=https://management.chinacloudapi.cn/"

Set $TENANTID, $APPID, and $PASSWORD to the values obtained or used previously.

If this request is successful, you receive a 200 series response and the response body contains a JSON document.

The JSON document returned by this request contains an element named access_token. The value of access_token is used to authenticate requests to the REST API.

{
    "token_type":"Bearer",
    "expires_in":"3599",
    "expires_on":"1463409994",
    "not_before":"1463406094",
    "resource":"https://management.chinacloudapi.cn/","access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWoNBVGZNNXBPWWlKSE1iYTlnb0VLWSIsImtpZCI6Ik1uQ19WWmNBVGZNNXBPWWlKSE1iYTlnb0VLWSJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuYXp1cmUuY29tLyIsImlzcyI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0LzcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI2Ny8iLCJpYXQiOjE0NjM0MDYwOTQsIm5iZiI6MTQ2MzQwNjA5NCwiZXhwIjoxNDYzNDA5OTk5LCJhcHBpZCI6IjBlYzcyMzM0LTZkMDMtNDhmYi04OWU1LTU2NTJiODBiZDliYiIsImFwcGlkYWNyIjoiMSIsImlkcCI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0LzcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI0Ny8iLCJvaWQiOiJlNjgxZTZiMi1mZThkLTRkZGUtYjZiMS0xNjAyZDQyNWQzOWYiLCJzdWIiOiJlNjgxZTZiMi1mZThkLTRkZGUtYjZiMS0xNjAyZDQyNWQzOWYiLCJ0aWQiOiI3MmY5ODhiZi04NmYxLTQxYWYtOTFhYi0yZDdjZDAxMWRiNDciLCJ2ZXIiOiIxLjAifQ.nJVERbeDHLGHn7ZsbVGBJyHOu2PYhG5dji6F63gu8XN2Cvol3J1HO1uB4H3nCSt9DTu_jMHqAur_NNyobgNM21GojbEZAvd0I9NY0UDumBEvDZfMKneqp7a_cgAU7IYRcTPneSxbD6wo-8gIgfN9KDql98b0uEzixIVIWra2Q1bUUYETYqyaJNdS4RUmlJKNNpENllAyHQLv7hXnap1IuzP-f5CNIbbj9UgXxLiOtW5JhUAwWLZ3-WMhNRpUO2SIB7W7tQ0AbjXw3aUYr7el066J51z5tC1AK9UC-mD_fO_HUP6ZmPzu5gLA6DxkIIYP3grPnRVoUDltHQvwgONDOw"
}
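
To avoid copying the token by hand, the access_token value can be pulled out of the response and stored in $ACCESSTOKEN. The following is a rough sketch that uses only tr and sed; the function name extract_access_token is hypothetical, and the pattern assumes the token is a plain quoted string with no escaped quotes, as in the sample above. A JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: reads the token response (JSON) on standard input and
# prints the value of the "access_token" element.
extract_access_token() {
    # Join all lines, then capture the quoted value that follows "access_token".
    tr -d '\n' |
        sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example usage: pipe the output of the token request shown earlier
# (with the --silent option added to curl) into the helper:
#   ACCESSTOKEN=$(curl --silent -X "POST" ... | extract_access_token)
```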

Create a resource group

Use the following to create a resource group.

  • Set $SUBSCRIPTIONID to the subscription ID received while creating the service principal.
  • Set $ACCESSTOKEN to the access token received in the previous step.
  • Replace DATACENTERLOCATION with the data center in which to create the resource group and its resources, for example 'China East'.
  • Set $RESOURCEGROUPNAME to the name you wish to use for this group:

curl -X "PUT" "https://management.chinacloudapi.cn/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME?api-version=2015-01-01" \
    -H "Authorization: Bearer $ACCESSTOKEN" \
    -H "Content-Type: application/json" \
    -d $'{
"location": "DATACENTERLOCATION"
}'

If this request is successful, you receive a 200 series response and the response body contains a JSON document containing information about the group. The "provisioningState" element contains a value of "Succeeded".
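
If you prefer not to edit DATACENTERLOCATION inside the command each time, the request body can be generated from a value. This is a minimal sketch; the function name resource_group_body is hypothetical, and it assumes the location name contains no double quotes or backslashes, since no JSON escaping is performed.

```shell
# Hypothetical helper: print the JSON request body for the resource group,
# using the location passed as the first argument.
resource_group_body() {
    printf '{"location": "%s"}' "$1"
}

# Example usage with the request shown above:
#   curl -X "PUT" "https://management.chinacloudapi.cn/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME?api-version=2015-01-01" \
#       -H "Authorization: Bearer $ACCESSTOKEN" \
#       -H "Content-Type: application/json" \
#       -d "$(resource_group_body 'China East')"
```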

Create a deployment

Use the following command to deploy the template to the resource group.

  • Set $DEPLOYMENTNAME to the name you wish to use for this deployment.

curl -X "PUT" "https://management.chinacloudapi.cn/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01" \
-H "Authorization: Bearer $ACCESSTOKEN" \
-H "Content-Type: application/json" \
-d "{set your body string to the template and parameters}"

Note

If you saved the template to a file, you can use the following option instead of -d "{ template and parameters}":

--data-binary "@/path/to/file.json"

If this request is successful, you receive a 200 series response and the response body contains a JSON document containing information about the deployment operation.

Important

The deployment has been submitted, but has not completed. It can take several minutes, usually around 15, for the deployment to complete.
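
The create request and the later status request use the same deployment URL, so it can be convenient to build that URL in one place. The sketch below does this and wraps the file-based form of the PUT request. The function names deployment_url and submit_deployment are hypothetical; the URL, headers, and --data-binary option are the ones used in this document.

```shell
# Hypothetical helper: print the deployment URL for a subscription ID ($1),
# resource group name ($2), and deployment name ($3).
deployment_url() {
    printf 'https://management.chinacloudapi.cn/subscriptions/%s/resourcegroups/%s/providers/microsoft.resources/deployments/%s?api-version=2015-01-01' "$1" "$2" "$3"
}

# Hypothetical helper: submit the template and parameters stored in the file
# given as the first argument. Expects $SUBSCRIPTIONID, $RESOURCEGROUPNAME,
# $DEPLOYMENTNAME, and $ACCESSTOKEN to be set as described in this document.
submit_deployment() {
    curl -X "PUT" "$(deployment_url "$SUBSCRIPTIONID" "$RESOURCEGROUPNAME" "$DEPLOYMENTNAME")" \
        -H "Authorization: Bearer $ACCESSTOKEN" \
        -H "Content-Type: application/json" \
        --data-binary "@$1"
}

# Example usage:
#   submit_deployment /path/to/file.json
```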

Check the status of a deployment

To check the status of the deployment, use the following command:

curl -X "GET" "https://management.chinacloudapi.cn/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01" \
-H "Authorization: Bearer $ACCESSTOKEN" \
-H "Content-Type: application/json"

This command returns a JSON document containing information about the deployment operation. The "provisioningState" element contains the status of the deployment. If this element contains a value of "Succeeded", then the deployment has completed successfully.
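
Because the deployment takes a while, you may want to repeat the status request until it reaches a final state. The following is a sketch of such a polling loop. The function names are hypothetical, the loop treats Succeeded, Failed, and Canceled as final states, and the text-based extraction assumes the deployment's own "provisioningState" is the first one that appears in the response; a JSON-aware tool would be more reliable.

```shell
# Hypothetical helper: read JSON on standard input and print the value of the
# first "provisioningState" element found.
extract_provisioning_state() {
    grep -o '"provisioningState"[[:space:]]*:[[:space:]]*"[^"]*"' |
        head -n 1 |
        sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}

# Hypothetical helper: poll until the deployment reaches a final state.
#   $1: name of a command or function that prints the deployment JSON
#   $2: seconds to wait between attempts (default 30)
#   $3: maximum number of attempts (default 60)
wait_for_deployment() {
    fetch=$1
    delay=${2:-30}
    attempts=${3:-60}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        state=$("$fetch" | extract_provisioning_state)
        case "$state" in
            Succeeded) echo "$state"; return 0 ;;
            Failed|Canceled) echo "$state"; return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "TimedOut"
    return 2
}

# Example usage: wrap the status request shown above in a function and poll it.
#   get_deployment() {
#       curl --silent -X "GET" "https://management.chinacloudapi.cn/subscriptions/$SUBSCRIPTIONID/resourcegroups/$RESOURCEGROUPNAME/providers/microsoft.resources/deployments/$DEPLOYMENTNAME?api-version=2015-01-01" \
#           -H "Authorization: Bearer $ACCESSTOKEN" \
#           -H "Content-Type: application/json"
#   }
#   wait_for_deployment get_deployment 30 60
```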

Troubleshoot

If you run into issues with creating HDInsight clusters, see access control requirements.

Next steps

Now that you have successfully created an HDInsight cluster, use the following to learn how to work with your cluster.

Hadoop clusters

HBase clusters

Storm clusters

[Deploy and monitor topologies with Storm on HDInsight](storm/apache-storm-deploy-monitor-topology-linux.md)