Run Apache Sqoop jobs by using the .NET SDK for Apache Hadoop in HDInsight

Learn how to use the Azure HDInsight .NET SDK to run Apache Sqoop jobs in HDInsight to import and export data between an HDInsight cluster and an Azure SQL database or SQL Server database.

Note

Although you can use the procedures in this article with either a Windows-based or Linux-based HDInsight cluster, they work only from a Windows client. To choose another method, use the tab selector at the top of this article.

Prerequisites

Before you begin this tutorial, you must have the following item:

Use Sqoop on HDInsight clusters with the .NET SDK

The HDInsight .NET SDK provides .NET client libraries that make it easier to work with HDInsight clusters from .NET. In this section, you create a C# console application that exports the hivesampletable to the Azure SQL Database table you created earlier in this tutorial.

Submit a Sqoop job

  1. Create a C# console application in Visual Studio.

  2. From the Visual Studio Package Manager Console, import the package by running the following NuGet command:

     Install-Package Microsoft.Azure.Management.HDInsight.Job
    
  3. Use the following code in the Program.cs file:

     using System.Collections.Generic;
     using Microsoft.Azure.Management.HDInsight.Job;
     using Microsoft.Azure.Management.HDInsight.Job.Models;
     using Hyak.Common;
    
     namespace SubmitHDInsightJobDotNet
     {
         class Program
         {
             private static HDInsightJobManagementClient _hdiJobManagementClient;
    
             private const string ExistingClusterName = "<Your HDInsight Cluster Name>";
             private const string ExistingClusterUri = ExistingClusterName + ".azurehdinsight.cn";
             private const string ExistingClusterUsername = "<Cluster Username>";
             private const string ExistingClusterPassword = "<Cluster User Password>";
    
             static void Main(string[] args)
             {
                 System.Console.WriteLine("The application is running ...");
    
                 var clusterCredentials = new BasicAuthenticationCloudCredentials { Username = ExistingClusterUsername, Password = ExistingClusterPassword };
                 _hdiJobManagementClient = new HDInsightJobManagementClient(ExistingClusterUri, clusterCredentials);
    
                 SubmitSqoopJob();
    
                 System.Console.WriteLine("Press ENTER to continue ...");
                 System.Console.ReadLine();
             }
    
             private static void SubmitSqoopJob()
             {
                 var sqlDatabaseServerName = "<SQLDatabaseServerName>";
                 var sqlDatabaseLogin = "<SQLDatabaseLogin>";
                 var sqlDatabaseLoginPassword = "<SQLDatabaseLoginPassword>";
                 var sqlDatabaseDatabaseName = "<DatabaseName>";
    
                 var tableName = "<TableName>";
                 var exportDir = "/tutorials/usesqoop/data";
    
                 // Connection string for using Azure SQL Database.
                 // Comment if using SQL Server
                 var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ".database.chinacloudapi.cn;user=" + sqlDatabaseLogin + "@" + sqlDatabaseServerName + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
                 // Connection string for using SQL Server.
                 // Uncomment if using SQL Server
                 //var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ";user=" + sqlDatabaseLogin + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
    
                 var parameters = new SqoopJobSubmissionParameters
                 {
                     Files = new List<string> { "/user/oozie/share/lib/sqoop/sqljdbc41.jar" }, // This line is required for Linux-based clusters.
                     Command = "export --connect " + connectionString + " --table " + tableName + "_mobile --export-dir " + exportDir + "_mobile --fields-terminated-by \\t -m 1"
                 };
    
                 System.Console.WriteLine("Submitting the Sqoop job to the cluster...");
                 var response = _hdiJobManagementClient.JobManagement.SubmitSqoopJob(parameters);
                 System.Console.WriteLine("Validating that the response is as expected...");
                 System.Console.WriteLine("Response status code is " + response.StatusCode);
                 System.Console.WriteLine("Validating the response object...");
                 System.Console.WriteLine("JobId is " + response.JobSubmissionJsonResponse.Id);
             }
         }
     }
    
  4. To run the program, press the F5 key.
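
SubmitSqoopJob returns as soon as the cluster accepts the job; it does not wait for the export to finish. If you want the application to block until the job completes, you can poll the job status by its ID. The following helper is a minimal sketch (not part of the original sample) and assumes the GetJob/JobComplete pattern used in other HDInsight .NET SDK job samples:

             private static void WaitForJobCompletion(JobSubmissionResponse response)
             {
                 // Poll the job status by ID until the cluster reports it as complete.
                 string jobId = response.JobSubmissionJsonResponse.Id;
                 JobGetResponse jobDetail = _hdiJobManagementClient.JobManagement.GetJob(jobId);
                 while (!jobDetail.JobDetail.Status.JobComplete)
                 {
                     System.Threading.Thread.Sleep(10000); // Wait 10 seconds between polls.
                     jobDetail = _hdiJobManagementClient.JobManagement.GetJob(jobId);
                 }
                 System.Console.WriteLine("Job " + jobId + " completed.");
             }

Call WaitForJobCompletion(response) at the end of SubmitSqoopJob, after the job is submitted, if you want the application to wait for the export to finish before exiting.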

Limitations

Linux-based HDInsight has the following limitations:

  • Bulk export: The Sqoop connector that's used to export data to Microsoft SQL Server or Azure SQL Database does not currently support bulk inserts.

  • Batching: When you use the -batch switch, Sqoop performs multiple inserts instead of batching the insert operations.
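
Because the inserts are not batched, export throughput depends largely on how many map tasks run in parallel. The sample above uses -m 1 (a single map task). As a purely illustrative variation, not a workaround given in this article, you could raise the mapper count in the parameters built in SubmitSqoopJob and measure whether it helps for your target database:

                 var parameters = new SqoopJobSubmissionParameters
                 {
                     Files = new List<string> { "/user/oozie/share/lib/sqoop/sqljdbc41.jar" }, // Required for Linux-based clusters.
                     // Hypothetical variation: 4 parallel map tasks instead of 1. Whether this
                     // improves throughput depends on the capacity of the target database;
                     // it is not a substitute for true batch inserts.
                     Command = "export --connect " + connectionString + " --table " + tableName + "_mobile --export-dir " + exportDir + "_mobile --fields-terminated-by \\t -m 4"
                 };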

Next steps

Now you have learned how to use Sqoop. To learn more, see: