Customize Azure HDInsight clusters by using script actions

Azure HDInsight provides a configuration method called script actions that invokes custom scripts to customize the cluster. These scripts are used to install additional components and change configuration settings. Script actions can be used during or after cluster creation.

Important

The ability to use script actions on an already running cluster is only available for Linux-based HDInsight clusters.

Linux is the only operating system used on HDInsight version 3.4 or later. For more information, see HDInsight Windows retirement.

Permissions

For a domain-joined HDInsight cluster, two Apache Ambari permissions are required when you use script actions with the cluster:

  • AMBARI.RUN_CUSTOM_COMMAND. The Ambari Administrator role has this permission by default.
  • CLUSTER.RUN_CUSTOM_COMMAND. Both the HDInsight Cluster Administrator and Ambari Administrator have this permission by default.

Access control

If you aren't the administrator or owner of your Azure subscription, your account must have at least Contributor access to the resource group that contains the HDInsight cluster.

If you create an HDInsight cluster, someone with at least Contributor access to the Azure subscription must have previously registered the provider for HDInsight. Provider registration happens when a user with Contributor access to the subscription creates a resource for the first time on the subscription. It can also be done without creating a resource if you register a provider by using REST.

Get more information on working with access management:

Understand script actions

A script action is a Bash script that runs on the nodes in an HDInsight cluster. Characteristics and features of script actions are as follows:

  • Must be stored on a URI that's accessible from the HDInsight cluster. The following are possible storage locations:

    • An Azure Data Lake Storage account that's accessible by the HDInsight cluster. For information on using Azure Data Lake Storage with HDInsight, see Quickstart: Set up clusters in HDInsight.

    • A blob in an Azure Storage account that's either the primary or additional storage account for the HDInsight cluster. HDInsight is granted access to both of these types of storage accounts during cluster creation.

    • A public file-sharing service. Examples are Azure Blob, GitHub, and OneDrive.

      For example URIs, see Example script action scripts.

      Warning

      HDInsight supports only blobs in Azure Storage accounts with a standard performance tier.

  • Can be restricted to run on only certain node types. Examples are head nodes or worker nodes.

  • Can be persisted or ad hoc.

    Persisted scripts are used to customize new worker nodes added to the cluster through scaling operations. A persisted script might also apply changes to another node type, such as a head node, when scaling operations occur.

    Important

    Persisted script actions must have a unique name.

    Ad hoc scripts aren't persisted. They aren't applied to worker nodes added to the cluster after the script has run. You can later promote an ad hoc script to a persisted script or demote a persisted script to an ad hoc script.

    Important

    Script actions used during cluster creation are automatically persisted.

    Scripts that fail aren't persisted, even if you specifically indicate that they should be.

  • Can accept parameters that are used by the script during execution.

  • Run with root-level privileges on the cluster nodes.

  • Can be used through the Azure portal, Azure PowerShell, the Azure classic CLI, or the HDInsight .NET SDK.
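As an illustration of the parameter and root-privilege characteristics listed above, the following is a minimal sketch of how a script action might read its parameters. The function names, the parameter order, and the default values are hypothetical and are not part of any HDInsight-provided script.

```bash
#!/usr/bin/env bash
# Hypothetical script action skeleton: names and defaults are illustrative only.

# Echo the effective settings derived from the positional parameters
# that were supplied to the script action.
parse_params() {
    local component="${1:-example-component}"   # first parameter (assumed meaning)
    local version="${2:-latest}"                # second parameter (assumed meaning)
    echo "component=${component} version=${version}"
}

# Succeed only when the script runs as root, which is how script actions
# are described as running on cluster nodes.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

# Print the settings for whatever parameters the script was started with.
parse_params "$@"
```

A script built on this skeleton could call is_root early and exit with a clear message when it is tested outside a cluster without root privileges.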

The cluster keeps a history of all scripts that have been run. The history helps when you need to find the ID of a script for promotion or demotion operations.

Important

There's no automatic way to undo the changes made by a script action. Either manually reverse the changes or provide a script that reverses them.
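One way to make a change reversible is to keep a backup of every file that a script modifies. The following is only a sketch under that assumption: the function names and the .scriptaction.bak suffix are hypothetical, and a real script would need a matching reversal for each kind of change it makes, such as installed packages or restarted services.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: back up a file before editing it, and restore it later.

# Copy the file to a backup, then append one line to the original.
backup_and_append() {
    local file="$1" line="$2"
    cp -p "${file}" "${file}.scriptaction.bak"
    echo "${line}" >> "${file}"
}

# Put the backup back in place, which discards the appended line.
restore_backup() {
    local file="$1"
    mv "${file}.scriptaction.bak" "${file}"
}
```

The first function would belong to the script action that applies the change, and the second to a separate script action that reverses it.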

Script action in the cluster creation process

Script actions used during cluster creation are slightly different from script actions run on an existing cluster:

  • The script is automatically persisted.

  • A failure in the script can cause the cluster creation process to fail.

The following diagram illustrates when a script action runs during the creation process:

Diagram: HDInsight cluster customization and stages during cluster creation

The script runs while HDInsight is being configured. The script runs in parallel on all the specified nodes in the cluster. It runs with root privileges on the nodes.

Note

You can perform operations like stopping and starting services, including Apache Hadoop-related services. If you stop services, make sure that the Ambari service and other Hadoop-related services are running again before the script finishes. These services are required to successfully determine the health and state of the cluster while it's being created.

During cluster creation, you can use many script actions at once. These scripts are invoked in the order in which they were specified.

Important

Script actions must finish within 60 minutes, or they time out. During cluster provisioning, the script runs concurrently with other setup and configuration processes. Competition for resources such as CPU time or network bandwidth might cause the script to take longer to finish than it does in your development environment.

To minimize the time it takes to run the script, avoid tasks like downloading and compiling applications from the source. Precompile applications and store the binary in Azure Storage.
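To stay within the 60-minute limit, a script can give each long-running step its own time budget so that a slow download fails early instead of consuming the whole window. The sketch below assumes the GNU coreutils timeout command is available on the cluster nodes. The function name and the budget values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one step of a script action under a time budget.

# Usage: run_with_budget SECONDS COMMAND [ARGS...]
# Returns the command's exit status; GNU timeout reports 124 on a timeout.
run_with_budget() {
    local seconds="$1"
    shift
    local status=0
    timeout "${seconds}" "$@" || status=$?
    if [ "${status}" -eq 124 ]; then
        echo "step timed out after ${seconds}s: $*" >&2
    fi
    return "${status}"
}

# Example (placeholder URL, not a real location):
# run_with_budget 300 curl -fsSL -o /tmp/app.tar.gz "https://<storage-account>/<container>/app.tar.gz"
```

Downloading a precompiled binary under a budget like this keeps the slow, unpredictable part of the script bounded.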

Script action on a running cluster

A failure in a script run on an already running cluster doesn't automatically cause the cluster to change to a failed state. After a script finishes, the cluster should return to a running state.

Important

Even if the cluster has a running state, the failed script might have broken things. For example, a script might delete files needed by the cluster.

Script actions run with root privileges. Make sure that you understand what a script does before you apply it to your cluster.

When you apply a script to a cluster, the cluster state changes from Running to Accepted. Then it changes to HDInsight configuration and, finally, back to Running for successful scripts. The script status is logged in the script action history. This information tells you whether the script succeeded or failed. For example, the Get-AzHDInsightScriptActionHistory PowerShell cmdlet shows the status of a script. It returns information similar to the following text:

ScriptExecutionId : 635918532516474303
StartTime         : 8/14/2017 7:40:55 PM
EndTime           : 8/14/2017 7:41:05 PM
Status            : Succeeded

Important

If you change the password of the cluster user (admin) after the cluster is created, script actions run against this cluster might fail. If you have any persisted script actions that target worker nodes, these scripts might fail when you scale the cluster.

Example script action scripts

Script action scripts can be used through the following utilities:

  • The Azure portal
  • Azure PowerShell
  • The Azure classic CLI
  • The HDInsight .NET SDK

HDInsight provides scripts to install the following components on HDInsight clusters:

Name | Script
Add an Azure Storage account | https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh. See Add additional storage accounts to HDInsight.
Install Hue | https://hdiconfigactions.blob.core.windows.net/linuxhueconfigactionv02/install-hue-uber-v02.sh. See Install and use Hue on HDInsight Hadoop clusters.
Install Presto | https://raw.githubusercontent.com/hdinsight/presto-hdinsight/master/installpresto.sh. See Install and use Presto on Hadoop-based HDInsight clusters.
Install Giraph | https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh. See Install Apache Giraph on HDInsight Hadoop clusters.
Preload Hive libraries | https://hdiconfigactions.blob.core.windows.net/linuxsetupcustomhivelibsv01/setup-customhivelibs-v01.sh. See Add custom Apache Hive libraries when creating your HDInsight cluster.

Use a script action during cluster creation

This section explains the different ways you can use script actions when you create an HDInsight cluster.

Use a script action during cluster creation from the Azure portal

  1. Start to create a cluster as described in Set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more. During cluster creation, you arrive at a Cluster summary page. From the Cluster summary page, select the edit link for Advanced settings.

    Screenshot: Advanced settings link

  2. From the Advanced settings section, select Script actions. From the Script actions section, select + Submit new.

    Screenshot: Submit a new script action

  3. Use the Select a script entry to select a premade script. To use a custom script, select Custom. Then provide the Name and Bash script URI for your script.

    Screenshot: Add a script in the Select a script form

    The following table describes the elements on the form:

    Property | Value
    Select a script | To use your own script, select Custom. Otherwise, select one of the provided scripts.
    Name | Specify a name for the script action.
    Bash script URI | Specify the URI of the script.
    Head/Worker/ZooKeeper | Specify the nodes on which the script is run: Head, Worker, or ZooKeeper.
    Parameters | Specify the parameters, if required by the script.

    Use the Persist this script action entry to make sure that the script is applied during scaling operations.

  4. Select Create to save the script. Then you can use + Submit new to add another script.

    Screenshot: Multiple script actions

    When you're done adding scripts, select the Select button and then the Next button to return to the Cluster summary section.

  5. To create the cluster, select Create from the Cluster summary section.

Use a script action from Azure Resource Manager templates

Script actions can be used with Azure Resource Manager templates. For an example, see Create HDInsight Linux Cluster and run a script action.

In this example, the script action is added by using the following code:

"scriptActions": [
    {
        "name": "setenvironmentvariable",
        "uri": "[parameters('scriptActionUri')]",
        "parameters": "headnode"
    }
]

Get more information on how to deploy a template:

Use a script action during cluster creation from Azure PowerShell

In this section, you use the Add-AzHDInsightScriptAction cmdlet to invoke scripts to customize a cluster. Before you start, make sure you install and configure Azure PowerShell. To use these PowerShell commands, you need the Az module.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

The following script shows how to apply a script action when you create a cluster by using PowerShell:

    # Sign in to your Azure subscription
    # Is there an active Azure subscription?
    $sub = Get-AzSubscription -ErrorAction SilentlyContinue
    if(-not($sub))
    {
        Connect-AzAccount
    }

    # If you have multiple subscriptions, set the one to use
    # $subscriptionID = "<subscription ID to use>"
    # Select-AzSubscription -SubscriptionId $subscriptionID

    # Get user input/default values
    $resourceGroupName = Read-Host -Prompt "Enter the resource group name"
    $location = Read-Host -Prompt "Enter the Azure region to create resources in"

    # Create the resource group
    New-AzResourceGroup -Name $resourceGroupName -Location $location

    $defaultStorageAccountName = Read-Host -Prompt "Enter the name of the storage account"

    # Create an Azure storage account and container
    New-AzStorageAccount `
        -ResourceGroupName $resourceGroupName `
        -Name $defaultStorageAccountName `
        -Type Standard_LRS `
        -Location $location
    $defaultStorageAccountKey = (Get-AzStorageAccountKey `
                                    -ResourceGroupName $resourceGroupName `
                                    -Name $defaultStorageAccountName)[0].Value
    $defaultStorageContext = New-AzStorageContext `
                                    -StorageAccountName $defaultStorageAccountName `
                                    -StorageAccountKey $defaultStorageAccountKey

    # Get information for the HDInsight cluster
    $clusterName = Read-Host -Prompt "Enter the name of the HDInsight cluster"
    # Cluster login is used to secure HTTPS services hosted on the cluster
    $httpCredential = Get-Credential -Message "Enter Cluster login credentials" -UserName "admin"
    # SSH user is used to remotely connect to the cluster using SSH clients
    $sshCredential = Get-Credential -Message "Enter SSH user credentials"

    # Default cluster size (# of worker nodes), version, type, and OS
    $clusterSizeInNodes = "4"
    $clusterVersion = "3.5"
    $clusterType = "Hadoop"
    $clusterOS = "Linux"
    # Set the storage container name to the cluster name
    $defaultBlobContainerName = $clusterName

    # Create a blob container. This holds the default data store for the cluster.
    New-AzStorageContainer `
        -Name $defaultBlobContainerName -Context $defaultStorageContext

    # Create an HDInsight configuration object
    $config = New-AzHDInsightClusterConfig
    # Add the script action
    $scriptActionUri="https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh"
    # Add for the head nodes
    $config = Add-AzHDInsightScriptAction `
        -Config $config `
        -Name "Install Giraph" `
        -NodeType HeadNode `
        -Uri $scriptActionUri
    # Continue adding the script action for any other node types
    # that it must run on.
    $config = Add-AzHDInsightScriptAction `
        -Config $config `
        -Name "Install Giraph" `
        -NodeType WorkerNode `
        -Uri $scriptActionUri

    # Create the cluster using the configuration object.
    # The container is the one created above ($defaultBlobContainerName).
    New-AzHDInsightCluster `
        -Config $config `
        -ResourceGroupName $resourceGroupName `
        -ClusterName $clusterName `
        -Location $location `
        -ClusterSizeInNodes $clusterSizeInNodes `
        -ClusterType $clusterType `
        -OSType $clusterOS `
        -Version $clusterVersion `
        -HttpCredential $httpCredential `
        -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
        -DefaultStorageAccountKey $defaultStorageAccountKey `
        -DefaultStorageContainer $defaultBlobContainerName `
        -SshCredential $sshCredential

It can take several minutes before the cluster is created.

Use a script action during cluster creation from the HDInsight .NET SDK

The HDInsight .NET SDK provides client libraries that make it easier to work with HDInsight from a .NET application. For a code sample, see Create Linux-based clusters in HDInsight by using the .NET SDK.

Apply a script action to a running cluster

This section explains how to apply script actions to a running cluster.

Apply a script action to a running cluster from the Azure portal

Go to the Azure portal:

  1. From the left menu, select All services.

  2. Under Data + Analytics, select HDInsight clusters.

  3. Select your cluster from the list, which opens the default view.

  4. From the default view, under Settings, select Script actions.

  5. From the top of the Script actions page, select + Submit new.

    Screenshot: Add a script to a running cluster

  6. Use the Select a script entry to select a premade script. To use a custom script, select Custom. Then provide the Name and Bash script URI for your script.

    Screenshot: Add a script in the Select a script form

    The following table describes the elements on the form:

    Property | Value
    Select a script | To use your own script, select Custom. Otherwise, select a provided script.
    Name | Specify a name for the script action.
    Bash script URI | Specify the URI of the script.
    Head/Worker/ZooKeeper | Specify the nodes on which the script is run: Head, Worker, or ZooKeeper.
    Parameters | Specify the parameters, if required by the script.

    Use the Persist this script action entry to make sure the script is applied during scaling operations.

  7. Finally, select the Create button to apply the script to the cluster.

Apply a script action to a running cluster from Azure PowerShell

To use these PowerShell commands, you need the Az module.

The following example shows how to apply a script action to a running cluster:

# Get information for the HDInsight cluster
$clusterName = Read-Host -Prompt "Enter the name of the HDInsight cluster"
$scriptActionName = Read-Host -Prompt "Enter the name of the script action"
$scriptActionUri = Read-Host -Prompt "Enter the URI of the script action"
# The node types that the script action is applied to
$nodeTypes = "headnode", "workernode"

# Apply the script and mark as persistent
Submit-AzHDInsightScriptAction -ClusterName $clusterName `
    -Name $scriptActionName `
    -Uri $scriptActionUri `
    -NodeTypes $nodeTypes `
    -PersistOnSuccess

After the operation finishes, you receive information similar to the following text:

OperationState  : Succeeded
ErrorMessage    :
Name            : Giraph
Uri             : https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh
Parameters      :
NodeTypes       : {HeadNode, WorkerNode}

Apply a script action to a running cluster from the Azure CLI

Before you start, make sure you install and configure the Azure classic CLI. For more information, see Install the Azure classic CLI.

Important

This article contains content that requires the Azure classic CLI. The current release of the Azure CLI doesn't support the features outlined in this article, so the classic CLI is required.

The classic CLI can be installed side by side with the modern Azure CLI, but we recommend using the Azure CLI for all new scripts and deployments where feature support is available. To install the classic CLI, see Install the Azure classic CLI. To install the current CLI, see Install Azure CLI.

  1. Switch to Azure Resource Manager mode:

    azure config mode arm
    
  2. Authenticate to your Azure subscription:

        azure login -e AzureChinaCloud
    
  3. Apply a script action to a running cluster:

    azure hdinsight script-action create <clustername> -g <resourcegroupname> -n <scriptname> -u <scriptURI> -t <nodetypes>
    

    If you omit parameters for this command, you're prompted for them. If the script you specify with -u accepts parameters, you can specify them by using the -p parameter.

    Valid node types are headnode, workernode, and zookeeper. If the script should be applied to several node types, specify the types separated by a semicolon (;). For example, -t headnode;workernode.

    To persist the script, add --persistOnSuccess. You can also persist the script later by using azure hdinsight script-action persisted set.

    After the job finishes, you get output like the following text:

     info:    Executing command hdinsight script-action create
     + Executing Script Action on HDInsight cluster
     data:    Operation Info
     data:    ---------------
     data:    Operation status:
     data:    Operation ID:  b707b10e-e633-45c0-baa9-8aed3d348c13
     info:    hdinsight script-action create command OK
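Because the node types are passed as one semicolon-separated value, it can help to validate that value before calling the CLI. The following sketch checks a value against the three node types named above. The function name is hypothetical and isn't part of the Azure classic CLI. An unquoted semicolon ends a command in Bash and most other shells, so quote the value when you pass it on the command line, for example -t "headnode;workernode".

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a semicolon-separated list of node types.

# Returns 0 when every entry is headnode, workernode, or zookeeper.
validate_node_types() {
    local spec="$1" type
    local -a types
    IFS=';' read -r -a types <<< "${spec}"
    # An empty value contains no node types at all.
    [ "${#types[@]}" -gt 0 ] || return 1
    for type in "${types[@]}"; do
        case "${type}" in
            headnode|workernode|zookeeper) ;;
            *) echo "invalid node type: ${type}" >&2; return 1 ;;
        esac
    done
}
```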
    

Apply a script action to a running cluster by using REST API

See Cluster REST API in Azure HDInsight.

Apply a script action to a running cluster from the HDInsight .NET SDK

For an example of using the .NET SDK to apply scripts to a cluster, see Apply a Script Action against a running Linux-based HDInsight cluster.

View history and promote and demote script actions

The Azure portal

  1. Sign in to the Azure portal.

  2. From the left menu, select All services.

  3. Under Data + Analytics, select HDInsight clusters.

  4. Select your cluster from the list, which opens the default view.

  5. From the default view, under Settings, select Script actions.

  6. A history of scripts for this cluster appears in the Script actions section. This information includes a list of persisted scripts. The following screenshot shows that the Solr script has been run on this cluster. The screenshot doesn't show any persisted scripts.

    Screenshot: Script actions

  7. Select a script from the history to display the Properties section for this script. From the top of the screen, you can rerun the script or promote it.

    Screenshot: Script actions, properties

  8. You can also select the ellipsis (...) to the right of entries in the Script actions section to perform actions.

    Screenshot: Script actions, ellipsis

Azure PowerShell

Cmdlet | Function
Get-AzHDInsightPersistedScriptAction | Retrieve information on persisted script actions.
Get-AzHDInsightScriptActionHistory | Retrieve a history of script actions applied to the cluster or details for a specific script.
Set-AzHDInsightPersistedScriptAction | Promote an ad hoc script action to a persisted script action.
Remove-AzHDInsightPersistedScriptAction | Demote a persisted script action to an ad hoc action.

Important

Remove-AzHDInsightPersistedScriptAction doesn't undo the actions performed by a script. This cmdlet only removes the persisted flag.

The following example script demonstrates using the cmdlets to promote and then demote a script.

# Get a history of scripts
Get-AzHDInsightScriptActionHistory -ClusterName mycluster

# From the list, we want to get information on a specific script
Get-AzHDInsightScriptActionHistory -ClusterName mycluster `
    -ScriptExecutionId 635920937765978529

# Promote this to a persisted script
# Note: the script must have a unique name to be promoted.
# If the name is not unique, you receive an error.
Set-AzHDInsightPersistedScriptAction -ClusterName mycluster `
    -ScriptExecutionId 635920937765978529

# Demote the script back to ad hoc
# Note that demotion uses the unique script name instead of
# execution ID.
Remove-AzHDInsightPersistedScriptAction -ClusterName mycluster `
    -Name "Install Giraph"

The Azure classic CLI

Command | Function
azure hdinsight script-action persisted list <clustername> | Retrieve a list of persisted script actions.
azure hdinsight script-action persisted show <clustername> <scriptname> | Retrieve information on a specific persisted script action.
azure hdinsight script-action history list <clustername> | Retrieve a history of script actions applied to the cluster.
azure hdinsight script-action history show <clustername> <scriptname> | Retrieve information on a specific script action.
azure hdinsight script-action persisted set <clustername> <scriptexecutionid> | Promote an ad hoc script action to a persisted script action.
azure hdinsight script-action persisted delete <clustername> <scriptname> | Demote a persisted script action to an ad hoc action.

Important

azure hdinsight script-action persisted delete doesn't undo the actions performed by a script. This command only removes the persisted flag.

The HDInsight .NET SDK

For an example of using the .NET SDK to retrieve script history from a cluster, or to promote or demote scripts, see Apply a Script Action against a running Linux-based HDInsight cluster.

Note

This example also demonstrates how to install an HDInsight application by using the .NET SDK.

Support for open-source software used on HDInsight clusters

The Azure HDInsight service uses an ecosystem of open-source technologies formed around Apache Hadoop. Azure provides a general level of support for open-source technologies. For more information, see the Support Scope section of Azure Support FAQs. The HDInsight service provides an additional level of support for built-in components.

Two types of open-source components are available in the HDInsight service:

Warning

Components provided with the HDInsight cluster are fully supported. Azure Support helps to isolate and resolve issues related to these components.

Custom components receive commercially reasonable support to help you further troubleshoot the issue. Azure Support might be able to resolve the issue, or they might ask you to engage available channels for the open-source technologies where deep expertise for that technology is found. Many community sites can be used. Examples are the MSDN forum for HDInsight and Azure CSDN. Apache projects also have project sites on http://apache.org. An example is Hadoop.

The HDInsight service provides several ways to use custom components. The same level of support applies, no matter how a component is used or installed on the cluster. The following list describes the most common ways that custom components are used on HDInsight clusters:

  1. Job submission. Hadoop or other types of jobs that execute or use custom components can be submitted to the cluster.

  2. Cluster customization. During cluster creation, you can specify additional settings and custom components that are installed on the cluster nodes.

  3. Samples. For popular custom components, Microsoft and others might provide samples of how these components can be used on HDInsight clusters. These samples are provided without support.

Troubleshooting

You can use the Ambari web UI to view information logged by script actions. If the script fails during cluster creation, the logs are also available in the default storage account associated with the cluster. This section describes how to retrieve the logs by using both options.

The Apache Ambari web UI

  1. In your browser, go to https://CLUSTERNAME.azurehdinsight.cn. Replace CLUSTERNAME with the name of your HDInsight cluster.

    When prompted, enter the admin account name (admin) and the password for the cluster. You might have to reenter the admin credentials in a web form.

  2. From the bar at the top of the page, select the ops entry. A list displays the current and previous operations done on the cluster through Ambari.

    Ambari web UI bar with ops selected

  3. Find the entries that have run_customscriptaction in the Operations column. These entries are created when the script actions run.

    Screenshot of operations

    To view the STDOUT and STDERR output, select the run_customscriptaction entry and drill down through the links. This output is generated when the script runs and might have useful information.
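If you copy the ops list into a script for review, the filtering in step 3 can be sketched as follows. This is a minimal illustration, not Ambari's API: the record shape (a dict with `name` and `status` keys), the status strings, and the sample entries are all assumptions made for the example.

```python
# Hypothetical record shape: each operation is a dict with "name" and "status".
# This only mirrors the columns of the ops list; it is not Ambari's API.
SCRIPT_ACTION_MARKER = "run_customscriptaction"


def find_script_action_entries(operations):
    """Return the operations whose name contains run_customscriptaction."""
    return [op for op in operations if SCRIPT_ACTION_MARKER in op.get("name", "")]


# Made-up sample data for illustration only.
sample_operations = [
    {"name": "Restart all components for HDFS", "status": "COMPLETED"},
    {"name": "run_customscriptaction", "status": "FAILED"},
    {"name": "run_customscriptaction", "status": "COMPLETED"},
]

script_entries = find_script_action_entries(sample_operations)
failed = [op for op in script_entries if op["status"] == "FAILED"]
print(len(script_entries), "script action entries,", len(failed), "failed")
```

The failed entries are the ones whose STDOUT and STDERR output is worth drilling into first.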

Access logs from the default storage account

If cluster creation fails because of a script error, the logs are kept in the cluster storage account.

  • The storage logs are available at \STORAGE_ACCOUNT_NAME\DEFAULT_CONTAINER_NAME\custom-scriptaction-logs\CLUSTER_NAME\DATE.

    Screenshot of operations

    Under this directory, the logs are organized separately for the headnode, worker node, and zookeeper node. See the following examples:

    • Headnode - <uniqueidentifier>AmbariDb-hn0-<generated_value>.chinacloudapp.cn

    • Worker node - <uniqueidentifier>AmbariDb-wn0-<generated_value>.chinacloudapp.cn

    • Zookeeper node - <uniqueidentifier>AmbariDb-zk0-<generated_value>.chinacloudapp.cn

  • All stdout and stderr output of the corresponding host is uploaded to the storage account. There's one output-*.txt file and one errors-*.txt file for each script action. The output-*.txt file contains information about the URI of the script that was run on the host. The following text is an example of this information:

      'Start downloading script locally: ', u'https://hdiconfigactions.blob.core.windows.net/linuxrconfigactionv01/r-installer-v01.sh'
    
  • It's possible to repeatedly create a script action cluster with the same name. In that case, you can distinguish the relevant logs based on the DATE folder name. For example, the folder structure for a cluster named mycluster that was created on different dates appears similar to the following log entries:

    \STORAGE_ACCOUNT_NAME\DEFAULT_CONTAINER_NAME\custom-scriptaction-logs\mycluster\2015-10-04
    \STORAGE_ACCOUNT_NAME\DEFAULT_CONTAINER_NAME\custom-scriptaction-logs\mycluster\2015-10-05

  • If you create a script action cluster with the same name on the same day, you can use the unique prefix to identify the relevant log files.

  • If you create a cluster near midnight (12:00 AM), the log files might span two days. In that case, you see two different date folders for the same cluster.

  • Uploading log files to the default container can take up to five minutes, especially for large clusters. So if you want to access the logs, don't delete the cluster immediately after a script action fails.
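The folder layout described in this list can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper names (`log_folder`, `candidate_log_folders`, `node_type`) are hypothetical, the placeholder account and container names come from the path template above, and the sample host folder name is made up to match the `AmbariDb-hn0-` pattern.

```python
import re
from datetime import date, timedelta

LOG_ROOT = "custom-scriptaction-logs"


def log_folder(storage_account, container, cluster_name, day):
    """Build the log folder path in the backslash layout shown above."""
    parts = ["", storage_account, container, LOG_ROOT, cluster_name, day.isoformat()]
    return "\\".join(parts)


def candidate_log_folders(storage_account, container, cluster_name, created_on):
    """Return the folders for the creation day and the next day.

    Logs for a cluster created near midnight can span two date folders.
    """
    days = [created_on, created_on + timedelta(days=1)]
    return [log_folder(storage_account, container, cluster_name, d) for d in days]


# hn = headnode, wn = worker node, zk = zookeeper node
NODE_TYPES = {"hn": "headnode", "wn": "worker node", "zk": "zookeeper node"}
_HOST_PATTERN = re.compile(r"AmbariDb-(hn|wn|zk)\d+-")


def node_type(host_folder):
    """Classify a host folder name by its hn/wn/zk marker, or return None."""
    match = _HOST_PATTERN.search(host_folder)
    return NODE_TYPES[match.group(1)] if match else None


folders = candidate_log_folders(
    "STORAGE_ACCOUNT_NAME", "DEFAULT_CONTAINER_NAME", "mycluster", date(2015, 10, 4)
)
for folder in folders:
    print(folder)

# Made-up host folder name that follows the pattern in the examples above.
print(node_type("abc123AmbariDb-wn0-xyz789.chinacloudapp.cn"))
```

Checking both date folders covers the near-midnight case without having to know exactly when the script ran.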

Ambari watchdog

Warning

Don't change the password for the Ambari watchdog (hdinsightwatchdog) on your Linux-based HDInsight cluster. Changing the password for this account breaks the ability to run new script actions on the HDInsight cluster.

Can't import name BlobService

Symptoms. The script action fails. Text similar to the following error displays when you view the operation in Ambari:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/custom_actions/scripts/run_customscriptaction.py", line 21, in <module>
    from azure.storage.blob import BlobService
ImportError: cannot import name BlobService

Cause. This error occurs if you upgrade the Python Azure Storage client that's included with the HDInsight cluster. HDInsight expects Azure Storage client 0.20.0.

Resolution. To resolve this error, manually connect to each cluster node by using ssh. Run the following command to reinstall the correct storage client version:

sudo pip install azure-storage==0.20.0

For information on connecting to the cluster with SSH, see Connect to HDInsight (Apache Hadoop) by using SSH.
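Before you reinstall, you can check which version of the storage client a node has. The following sketch compares the installed version against the expected 0.20.0. It assumes a Python 3.8 or later interpreter, because it relies on `importlib.metadata`, which the Python on your cluster nodes might not provide. The function names are hypothetical.

```python
EXPECTED_VERSION = "0.20.0"  # version that HDInsight expects, per the text above


def needs_reinstall(installed_version, expected=EXPECTED_VERSION):
    """Return True when the installed version differs from the expected one.

    A value of None (package missing or version unknown) also returns True.
    """
    return installed_version != expected


def installed_storage_client_version():
    """Look up the installed azure-storage version, or return None.

    Assumes Python 3.8+; on older interpreters importlib.metadata is missing
    and this function returns None.
    """
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version("azure-storage")
    except PackageNotFoundError:
        return None


current = installed_storage_client_version()
if needs_reinstall(current):
    print("azure-storage is", current, "but", EXPECTED_VERSION, "is expected")
else:
    print("azure-storage", EXPECTED_VERSION, "is installed")
```

If the check reports a mismatch, the pip command shown earlier restores the expected version.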

History doesn't show the scripts used during cluster creation

If your cluster was created before March 15, 2016, you might not see an entry in script action history. Resizing the cluster causes the scripts to appear in script action history.

There are two exceptions:

  • Your cluster was created before September 1, 2015. This date is when script actions were introduced. Any cluster created before this date couldn't have used script actions for cluster creation.

  • You used multiple script actions during cluster creation, and you either used the same name for multiple scripts or used the same name and the same URI with different parameters for multiple scripts. In these cases, you get the following error:

    No new script actions can be run on this cluster because of conflicting script names in existing scripts. Script names provided at cluster creation must be all unique. Existing scripts are run on resize.
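The uniqueness rule in this error message can be checked before you create a cluster. The following is a minimal sketch, assuming each script action is described by a dict with `name`, `uri`, and `parameters` keys. That shape, the function name, and the sample names are hypothetical; only the r-installer URI comes from the log example earlier in this section.

```python
from collections import Counter


def conflicting_script_names(script_actions):
    """Return the sorted script names that appear more than once."""
    counts = Counter(action["name"] for action in script_actions)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical script actions: the same name and URI reused with
# different parameters, which triggers the error described above.
R_INSTALLER = (
    "https://hdiconfigactions.blob.core.windows.net/"
    "linuxrconfigactionv01/r-installer-v01.sh"
)
planned_actions = [
    {"name": "install-r", "uri": R_INSTALLER, "parameters": ""},
    {"name": "install-r", "uri": R_INSTALLER, "parameters": "--verbose"},
    {"name": "tune-config", "uri": "https://example.com/tune.sh", "parameters": ""},
]

conflicts = conflicting_script_names(planned_actions)
if conflicts:
    print("Rename these script actions so every name is unique:", conflicts)
```

Giving each invocation its own name, even when the URI is the same, avoids the conflict.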

Next steps