Manage Apache Hadoop clusters in HDInsight by using the .NET SDK

Learn how to manage HDInsight clusters by using the HDInsight .NET SDK.

Prerequisites

Before you begin this article, you must have the following:

Connect to Azure HDInsight

You need the following NuGet packages:

Install-Package Microsoft.Rest.ClientRuntime.Azure.Authentication -Pre
Install-Package Microsoft.Azure.Management.ResourceManager -Pre
Install-Package Microsoft.Azure.Management.HDInsight

The following code sample shows how to connect to Azure so that you can administer the HDInsight clusters under your Azure subscription.

using System;
using Microsoft.Azure;
using Microsoft.Azure.Management.HDInsight;
using Microsoft.Azure.Management.HDInsight.Models;
using Microsoft.Azure.Management.ResourceManager;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;
using Microsoft.Rest.Azure.Authentication;

    namespace HDInsightManagement
    {
        class Program
        {
            private static HDInsightManagementClient _hdiManagementClient;
            // Replace with your AAD tenant ID if necessary
            private const string TenantId = UserTokenProvider.CommonTenantId; 
            private const string SubscriptionId = "<Your Azure Subscription ID>";
            // This is the GUID for the PowerShell client. Used for interactive logins in this example.
            private const string ClientId = "1950a258-227b-4e31-a9cf-717495945fc2";
            private static Uri BaseUri = new Uri("https://management.chinacloudapi.cn/");

            static void Main(string[] args)
            {
                // Authenticate and get a token
                var authToken = Authenticate(TenantId, ClientId, SubscriptionId);
                // Flag subscription for HDInsight, if it isn't already.
                EnableHDInsight(authToken);
                // Get an HDInsight management client
                _hdiManagementClient = new HDInsightManagementClient(authToken, BaseUri);

                // insert code here

                System.Console.WriteLine("Press ENTER to continue");
                System.Console.ReadLine();
            }

            /// <summary>
            /// Authenticate to an Azure subscription and retrieve an authentication token
            /// </summary>
            static TokenCloudCredentials Authenticate(string TenantId, string ClientId, string SubscriptionId)
            {
                var authContext = new AuthenticationContext("https://login.chinacloudapi.cn/" + TenantId);
                var tokenAuthResult = authContext.AcquireToken("https://management.core.chinacloudapi.cn/", 
                    ClientId, 
                    new Uri("urn:ietf:wg:oauth:2.0:oob"), 
                    PromptBehavior.Always, 
                    UserIdentifier.AnyUser);
                return new TokenCloudCredentials(SubscriptionId, tokenAuthResult.AccessToken);
            }
            /// <summary>
            /// Marks your subscription as one that can use HDInsight, if it has not already been marked as such.
            /// </summary>
            /// <remarks>This is essentially a one-time action; if you have already done something with HDInsight
            /// on your subscription, then this isn't needed at all and will do nothing.</remarks>
            /// <param name="authToken">An authentication token for your Azure subscription</param>
            static void EnableHDInsight(TokenCloudCredentials authToken)
            {
                // Create a client for the Resource manager and set the subscription ID
                var resourceManagementClient = new ResourceManagementClient(BaseUri, new TokenCredentials(authToken.Token));
                resourceManagementClient.SubscriptionId = SubscriptionId;
                // Register the HDInsight provider
                var rpResult = resourceManagementClient.Providers.Register("Microsoft.HDInsight");
            }
        }
    }

When you run this program, you see a prompt. If you don't want to see the prompt, see Create non-interactive authentication .NET HDInsight applications.

List clusters

The following code snippet lists clusters and some of their properties:

var results = _hdiManagementClient.Clusters.List();
// Each item is a cluster object, not just a name
foreach (var cluster in results.Clusters) {
    Console.WriteLine("Cluster Name: " + cluster.Name);
    Console.WriteLine("\t Cluster type: " + cluster.Properties.ClusterDefinition.ClusterType);
    Console.WriteLine("\t Cluster location: " + cluster.Location);
    Console.WriteLine("\t Cluster version: " + cluster.Properties.ClusterVersion);
}

Delete clusters

Use the following code snippet to delete a cluster synchronously or asynchronously:

_hdiManagementClient.Clusters.Delete("<Resource Group Name>", "<Cluster Name>");
_hdiManagementClient.Clusters.DeleteAsync("<Resource Group Name>", "<Cluster Name>");

Scale clusters

The cluster scaling feature lets you change the number of worker nodes used by a cluster that is running in Azure HDInsight without having to re-create the cluster.

Note

Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of the version of your cluster, check the Properties page. See List and show clusters.

Changing the number of data nodes affects each type of cluster supported by HDInsight differently:

  • Apache Hadoop

    You can seamlessly increase the number of worker nodes in a running Hadoop cluster without affecting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are handled gracefully so that the cluster is always left in a functional state.

    When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.

  • Apache HBase

    You can seamlessly add nodes to or remove nodes from your HBase cluster while it is running. Region servers are automatically balanced within a few minutes of completing the scaling operation. However, you can also manually balance the region servers by logging in to the head node of the cluster and running the following commands from a command prompt window:

    >pushd %HBASE_HOME%\bin
    >hbase shell
    >balancer
    
  • Apache Storm

    You can seamlessly add data nodes to or remove data nodes from your Storm cluster while it is running. After the scaling operation completes successfully, however, you need to rebalance the topology.

    Rebalancing can be accomplished in two ways:

    • Storm web UI

    • Command-line interface (CLI) tool

      For more details, refer to the Apache Storm documentation.

      The Storm web UI is available on the HDInsight cluster:

      [Image: HDInsight Storm scale rebalance]

      Here is an example of how to use the CLI command to rebalance the Storm topology:

      ## Reconfigure the topology "mytopology" to use 5 worker processes,
      ## the spout "blue-spout" to use 3 executors, and
      ## the bolt "yellow-bolt" to use 10 executors
      $ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
      

The following code snippet shows how to resize a cluster synchronously or asynchronously:

_hdiManagementClient.Clusters.Resize("<Resource Group Name>", "<Cluster Name>", <New Size>);   
_hdiManagementClient.Clusters.ResizeAsync("<Resource Group Name>", "<Cluster Name>", <New Size>);   

Grant/revoke access

HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):

  • ODBC
  • JDBC
  • Apache Ambari
  • Apache Oozie
  • Apache Templeton

By default, access to these services is granted. You can revoke or grant the access. To revoke:

var httpParams = new HttpSettingsParameters
{
    HttpUserEnabled = false,
    HttpUsername = "admin",
    HttpPassword = "*******",
};
_hdiManagementClient.Clusters.ConfigureHttpSettings("<Resource Group Name>", "<Cluster Name>", httpParams);

To grant:

var httpParams = new HttpSettingsParameters
{
    HttpUserEnabled = true,
    HttpUsername = "admin",
    HttpPassword = "*******",
};
_hdiManagementClient.Clusters.ConfigureHttpSettings("<Resource Group Name>", "<Cluster Name>", httpParams);

Note

Granting or revoking access resets the cluster user name and password.

This can also be done via the portal. See Manage Apache Hadoop clusters in HDInsight by using the Azure portal.

Update HTTP user credentials

The procedure is the same as for granting or revoking HTTP access. If the cluster has already been granted HTTP access, you must first revoke it, and then grant access again with the new HTTP user credentials.
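The revoke-then-grant order described here can be sketched as a small helper. This is a minimal illustration, not part of the HDInsight SDK: the class HttpCredentialUpdater, the method UpdateCredentials, and the configure delegate are hypothetical names. The delegate stands in for a call to _hdiManagementClient.Clusters.ConfigureHttpSettings with an HttpSettingsParameters object, and passing the new credentials on the revoke call as well is an assumption made to keep the sketch short.

```csharp
using System;

public static class HttpCredentialUpdater
{
    // Hypothetical helper. 'configure' stands in for a wrapper around
    // _hdiManagementClient.Clusters.ConfigureHttpSettings(resourceGroup, clusterName, httpParams).
    // Its arguments are (HttpUserEnabled, HttpUsername, HttpPassword).
    public static void UpdateCredentials(
        Action<bool, string, string> configure,
        string userName,
        string newPassword)
    {
        if (configure == null) throw new ArgumentNullException(nameof(configure));
        if (string.IsNullOrEmpty(userName)) throw new ArgumentException("User name is required.", nameof(userName));
        if (string.IsNullOrEmpty(newPassword)) throw new ArgumentException("Password is required.", nameof(newPassword));

        // Step 1: revoke HTTP access (HttpUserEnabled = false).
        // Assumption: the same credentials are sent on the revoke call.
        configure(false, userName, newPassword);

        // Step 2: grant HTTP access again with the new credentials (HttpUserEnabled = true).
        configure(true, userName, newPassword);
    }
}
```

In a real program, the delegate would build the HttpSettingsParameters object from its three arguments and pass it to ConfigureHttpSettings together with the resource group and cluster names.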

Find the default storage account

The following code snippet demonstrates how to get the default storage account name and the default storage account key for a cluster.

var results = _hdiManagementClient.Clusters.GetClusterConfigurations("<Resource Group Name>", "<Cluster Name>", "core-site");
foreach (var key in results.Configuration.Keys)
{
    Console.WriteLine(String.Format("{0} => {1}", key, results.Configuration[key]));
}
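The snippet above prints every core-site setting. As a hedged sketch of how the two values of interest might be picked out, the helper below assumes the cluster uses Azure Blob storage, so that fs.defaultFS has the form wasb://<container>@<account host> and the account key is stored under fs.azure.account.key.<account host>. The class and method names are hypothetical, and the sketch assumes the configuration can be passed as an IDictionary<string, string>.

```csharp
using System;
using System.Collections.Generic;

public static class DefaultStorage
{
    // Returns the storage host from fs.defaultFS,
    // for example "myaccount.blob.core.chinacloudapi.cn", or null if it cannot be found.
    public static string GetDefaultStorageHost(IDictionary<string, string> coreSite)
    {
        string defaultFs;
        if (!coreSite.TryGetValue("fs.defaultFS", out defaultFs) || string.IsNullOrEmpty(defaultFs))
        {
            return null;
        }

        // Assumed form: wasb://<container>@<account host>[/path]
        int at = defaultFs.IndexOf('@');
        if (at < 0)
        {
            return null;
        }

        string host = defaultFs.Substring(at + 1);
        int slash = host.IndexOf('/');
        return slash >= 0 ? host.Substring(0, slash) : host;
    }

    // Returns the value stored under fs.azure.account.key.<account host>, or null if absent.
    public static string GetDefaultStorageKey(IDictionary<string, string> coreSite)
    {
        string host = GetDefaultStorageHost(coreSite);
        if (host == null)
        {
            return null;
        }

        string key;
        return coreSite.TryGetValue("fs.azure.account.key." + host, out key) ? key : null;
    }
}
```

Both methods return null rather than throwing when a setting is missing, so the caller can decide how to handle clusters that use a different storage type.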

Submit jobs

To submit MapReduce jobs

See Run MapReduce samples in HDInsight.

To submit Apache Hive jobs

See Run Apache Hive queries using .NET SDK.

To submit Apache Sqoop jobs

See Use Apache Sqoop with HDInsight.

To submit Apache Oozie jobs

See Use Apache Oozie with Hadoop to define and run a workflow in HDInsight.

Upload data to Azure Blob storage

See Upload data to HDInsight.

See also