Customize HDInsight clusters using Bootstrap

Bootstrap scripts let you install and configure components in Azure HDInsight programmatically.

You can set configuration file settings in three ways when you create an HDInsight cluster:

  • Use Azure PowerShell
  • Use the .NET SDK
  • Use an Azure Resource Manager template

For example, using these programmatic methods, you can configure options in these files:

  • clusterIdentity.xml
  • core-site.xml
  • gateway.xml
  • hbase-env.xml
  • hbase-site.xml
  • hdfs-site.xml
  • hive-env.xml
  • hive-site.xml
  • mapred-site.xml
  • oozie-site.xml
  • oozie-env.xml
  • storm-site.xml
  • tez-site.xml
  • webhcat-site.xml
  • yarn-site.xml
  • server.properties (kafka-broker configuration)

For information on installing additional components on an HDInsight cluster at creation time, see Customize HDInsight clusters using Script Action (Linux).

Prerequisites

  • If you use PowerShell, you need the Az module.

Use Azure PowerShell

The following PowerShell code customizes an Apache Hive configuration:

Important

The parameter Spark2Defaults may need to be used with Add-AzHDInsightConfigValue. You can pass empty values to the parameter, as shown in the code example below.

# hive-site.xml configuration
$hiveConfigValues = @{ "hive.metastore.client.socket.timeout"="90s" }

$config = New-AzHDInsightClusterConfig `
    | Set-AzHDInsightDefaultStorage `
        -StorageAccountName "$defaultStorageAccountName.blob.core.chinacloudapi.cn" `
        -StorageAccountKey $defaultStorageAccountKey `
    | Add-AzHDInsightConfigValue `
        -HiveSite $hiveConfigValues `
        -Spark2Defaults @{}

New-AzHDInsightCluster `
    -ResourceGroupName $existingResourceGroupName `
    -ClusterName $clusterName `
    -Location $location `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -ClusterType Hadoop `
    -OSType Linux `
    -Version "3.6" `
    -HttpCredential $httpCredential `
    -Config $config 

A complete working PowerShell script can be found in the Appendix.

To verify the change:

  1. Navigate to https://CLUSTERNAME.azurehdinsight.cn/, where CLUSTERNAME is the name of your cluster.
  2. From the left menu, navigate to Hive > Configs > Advanced.
  3. Expand Advanced hive-site.
  4. Locate hive.metastore.client.socket.timeout and confirm that the value is 90s.
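
The same check can also be scripted against the Ambari REST API instead of the web UI. The following is a minimal sketch, not a definitive procedure: it assumes the cluster login (HTTP) account and the same azurehdinsight.cn endpoint as in the steps above, `$clusterName` is a placeholder you must set, and the response property paths follow the usual Ambari response shape, which may differ on your cluster version.

```powershell
# Assumption: $clusterName holds the name of an existing cluster.
$clusterName = "<CLUSTERNAME>"
$creds = Get-Credential -Message "Enter the cluster login (HTTP) credentials"
$baseUri = "https://$clusterName.azurehdinsight.cn/api/v1/clusters/$clusterName"

# Ask Ambari which hive-site configuration version (tag) is currently applied.
# $($baseUri) is wrapped so that the "?" is not parsed as part of the variable name.
$desired = Invoke-RestMethod -Uri "$($baseUri)?fields=Clusters/desired_configs" -Credential $creds
$tag = $desired.Clusters.desired_configs.'hive-site'.tag

# Fetch that configuration version and read the customized property.
$hiveSite = Invoke-RestMethod -Uri "$baseUri/configurations?type=hive-site&tag=$tag" -Credential $creds
$hiveSite.items[0].properties.'hive.metastore.client.socket.timeout'
```

If the customization took effect, the last line should print the value that was set when the cluster was created.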

Some more samples of customizing other configuration files:

# hdfs-site.xml configuration
$HdfsConfigValues = @{ "dfs.blocksize"="64m" } #default is 128MB in HDI 3.0 and 256MB in HDI 2.1

# core-site.xml configuration
$CoreConfigValues = @{ "ipc.client.connect.max.retries"="60" } #default 50

# mapred-site.xml configuration
$MapRedConfigValues = @{ "mapreduce.task.timeout"="1200000" } #default 600000

# oozie-site.xml configuration
$OozieConfigValues = @{ "oozie.service.coord.normal.default.timeout"="150" }  # default 120
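
Each of these hashtables is passed to Add-AzHDInsightConfigValue through the parameter for its configuration file. The sketch below collects them for splatting. It assumes the `-Hdfs`, `-Core`, `-MapRed`, and `-OozieSite` parameter names of the Az.HDInsight module (check `Get-Help Add-AzHDInsightConfigValue` for your installed version) and a `$config` object built with New-AzHDInsightClusterConfig as in the earlier example.

```powershell
# The sample values from above, repeated so this block is self-contained.
$HdfsConfigValues   = @{ "dfs.blocksize" = "64m" }
$CoreConfigValues   = @{ "ipc.client.connect.max.retries" = "60" }
$MapRedConfigValues = @{ "mapreduce.task.timeout" = "1200000" }
$OozieConfigValues  = @{ "oozie.service.coord.normal.default.timeout" = "150" }

# Map each Add-AzHDInsightConfigValue parameter name (assumed) to its hashtable.
$configParams = @{
    Hdfs      = $HdfsConfigValues
    Core      = $CoreConfigValues
    MapRed    = $MapRedConfigValues
    OozieSite = $OozieConfigValues
}

# Requires the Az.HDInsight module and an existing $config object:
# $config = $config | Add-AzHDInsightConfigValue @configParams
```

Splatting keeps the call short when several files are customized at once; the same values can equally be passed as individual named parameters.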

Use .NET SDK

See Azure HDInsight SDK for .NET.

Use Resource Manager template

You can use bootstrap in a Resource Manager template:

"configurations": {
    "hive-site": {
        "hive.metastore.client.connect.retry.delay": "5",
        "hive.execution.engine": "mr",
        "hive.security.authorization.manager": "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider"
    }
}
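
A template containing this fragment can be deployed from PowerShell. The following is a minimal sketch, assuming a resource group that already exists and the hypothetical local file names `azuredeploy.json` and `azuredeploy.parameters.json`. In such a template, the `configurations` object would typically sit under the cluster resource's `properties.clusterDefinition`.

```powershell
# Hypothetical names: replace with your own resource group and template files.
$resourceGroupName = "<RESOURCE GROUP NAME>"

# Deploy the template; the hive-site settings are applied as the cluster is created.
New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile "./azuredeploy.json" `
    -TemplateParameterFile "./azuredeploy.parameters.json"
```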

See also

Appendix: PowerShell sample

This PowerShell script creates an HDInsight cluster and customizes a Hive setting. Be sure to enter values for $nameToken, $httpPassword, and $sshPassword.

####################################
# Set these variables
####################################
#region - used for creating Azure service names
$nameToken = "<ENTER AN ALIAS>" 
#endregion

#region - cluster user accounts
$httpUserName = "admin"  #HDInsight cluster username
$httpPassword = '<ENTER A PASSWORD>' 

$sshUserName = "sshuser" #HDInsight ssh user name
$sshPassword = '<ENTER A PASSWORD>' 
#endregion

####################################
# Service names and variables
####################################
#region - service names
$namePrefix = $nameToken.ToLower() + (Get-Date -Format "MMdd")

$resourceGroupName = $namePrefix + "rg"
$hdinsightClusterName = $namePrefix + "hdi"
$defaultStorageAccountName = $namePrefix + "store"
$defaultBlobContainerName = $hdinsightClusterName

$location = "China East"
#endregion


####################################
# Connect to Azure
####################################
#region - Connect to Azure subscription
Write-Host "`nConnecting to your Azure subscription ..." -ForegroundColor Green
$sub = Get-AzSubscription -ErrorAction SilentlyContinue
if(-not($sub))
{
    Connect-AzAccount
}

# If you have multiple subscriptions, set the one to use
# Select-AzSubscription -SubscriptionId "<SUBSCRIPTIONID>"

#endregion

#region - Create an HDInsight cluster
####################################
# Create dependent components
####################################
Write-Host "Creating a resource group ..." -ForegroundColor Green
New-AzResourceGroup `
    -Name  $resourceGroupName `
    -Location $location

Write-Host "Creating the default storage account and default blob container ..."  -ForegroundColor Green
New-AzStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName `
    -Location $location `
    -SkuName Standard_LRS `
    -Kind StorageV2 `
    -EnableHttpsTrafficOnly 1

# Note: Storage account kind BlobStorage cannot be used as primary storage.

$defaultStorageAccountKey = (Get-AzStorageAccountKey `
                                -ResourceGroupName $resourceGroupName `
                                -Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzStorageContext `
                                -StorageAccountName $defaultStorageAccountName `
                                -StorageAccountKey $defaultStorageAccountKey
New-AzStorageContainer `
    -Name $defaultBlobContainerName `
    -Context $defaultStorageContext #use the cluster name as the container name

####################################
# Create a configuration object
####################################
$hiveConfigValues = @{"hive.metastore.client.socket.timeout"="90s"}

$config = New-AzHDInsightClusterConfig `
    | Set-AzHDInsightDefaultStorage `
        -StorageAccountName "$defaultStorageAccountName.blob.core.chinacloudapi.cn" `
        -StorageAccountKey $defaultStorageAccountKey `
    | Add-AzHDInsightConfigValue `
        -HiveSite $hiveConfigValues `
        -Spark2Defaults @{}

####################################
# Create an HDInsight cluster
####################################
$httpPW = ConvertTo-SecureString -String $httpPassword -AsPlainText -Force
$httpCredential = New-Object System.Management.Automation.PSCredential($httpUserName,$httpPW)

$sshPW = ConvertTo-SecureString -String $sshPassword -AsPlainText -Force
$sshCredential = New-Object System.Management.Automation.PSCredential($sshUserName,$sshPW)

New-AzHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $hdinsightClusterName `
    -Location $location `
    -ClusterSizeInNodes 1 `
    -ClusterType Hadoop `
    -OSType Linux `
    -Version "3.6" `
    -HttpCredential $httpCredential `
    -SshCredential $sshCredential `
    -Config $config

####################################
# Verify the cluster
####################################
Get-AzHDInsightCluster `
    -ClusterName $hdinsightClusterName

#endregion