使用 HDInsight 中用于 Apache Hadoop 的 .NET SDK 运行 Apache Sqoop 作业Run Apache Sqoop jobs by using .NET SDK for Apache Hadoop in HDInsight

了解如何使用 Azure HDInsight .NET SDK 运行 HDInsight 中的 Apache Sqoop 作业,以在 HDInsight 群集和 Azure SQL 数据库或 SQL Server 数据库之间进行导入和导出。Learn how to use the Azure HDInsight .NET SDK to run Apache Sqoop jobs in HDInsight to import and export between an HDInsight cluster and an Azure SQL database or SQL Server database.

必备条件Prerequisites

通过 .NET SDK 使用 HDInsight 群集上的 SqoopUse Sqoop on HDInsight clusters with the .NET SDK

HDInsight .NET SDK 提供 .NET 客户端库,以便可轻易从 .NET 中使用 HDInsight 群集。The HDInsight .NET SDK provides .NET client libraries, so that it's easier to work with HDInsight clusters from .NET. 本部分创建一个 C# 控制台应用程序,以便将 hivesampletable 导出到先决条件中创建的 Azure SQL 数据库表。In this section, you create a C# console application to export the hivesampletable to the Azure SQL Database table that you created from the prerequisites.

设置Set up

  1. 启动 Visual Studio 并创建 C# 控制台应用程序。Start Visual Studio and create a C# console application.

  2. 导航到“工具” > “NuGet 包管理器” > “包管理器控制台”,然后运行以下命令 :Navigate to Tools > NuGet Package Manager > Package Manager Console and run the following command:

    Install-Package Microsoft.Azure.Management.HDInsight.Job
    

Sqoop 导出Sqoop export

从 Hive 到 SQL Server。From Hive to SQL Server. 此示例将数据从 Hive hivesampletable 表导出到 SQL 数据库中的 mobiledata 表。This example exports data from the Hive hivesampletable table to the mobiledata table in SQL Database.

  1. 在 Program.cs 文件中使用以下代码。Use the following code in the Program.cs file. 编辑代码以设置 ExistingClusterNameExistingClusterPassword 的值。Edit the code to set the values for ExistingClusterName, and ExistingClusterPassword.

    using Microsoft.Azure.Management.HDInsight.Job;
    using Microsoft.Azure.Management.HDInsight.Job.Models;
    using Hyak.Common;
    
    namespace SubmitHDInsightJobDotNet
    {
        class Program
        {
            private static HDInsightJobManagementClient _hdiJobManagementClient;
    
            private const string ExistingClusterName = "<Your HDInsight Cluster Name>";
            private const string ExistingClusterPassword = "<Cluster User Password>";
            private const string ExistingClusterUri = ExistingClusterName + ".azurehdinsight.cn";
            private const string ExistingClusterUsername = "admin";
    
            static void Main(string[] args)
            {
                System.Console.WriteLine("The application is running ...");
    
                var clusterCredentials = new BasicAuthenticationCloudCredentials { Username = ExistingClusterUsername, Password = ExistingClusterPassword };
                _hdiJobManagementClient = new HDInsightJobManagementClient(ExistingClusterUri, clusterCredentials);
    
                SubmitSqoopJob();
    
                System.Console.WriteLine("Press ENTER to continue ...");
                System.Console.ReadLine();
            }
    
            private static void SubmitSqoopJob()
            {
                var sqlDatabaseServerName = ExistingClusterName + "dbserver";
                var sqlDatabaseLogin = "sqluser";
                var sqlDatabaseLoginPassword = ExistingClusterPassword;
                var sqlDatabaseDatabaseName = ExistingClusterName + "db";
    
                // Connection string for using Azure SQL Database; Comment if using SQL Server
                var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ".database.chinacloudapi.cn;user=" + sqlDatabaseLogin + "@" + sqlDatabaseServerName + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
    
                // Connection string for using SQL Server; Uncomment if using SQL Server
                // var connectionString = "jdbc:sqlserver://" + sqlDatabaseServerName + ";user=" + sqlDatabaseLogin + ";password=" + sqlDatabaseLoginPassword + ";database=" + sqlDatabaseDatabaseName;
    
                //sqoop start
                var tableName = "mobiledata";
    
                var parameters = new SqoopJobSubmissionParameters
                {
                     Command = "export --connect " + connectionString + " --table " + tableName + " --hcatalog-table hivesampletable"
                };
                //sqoop end
    
                System.Console.WriteLine("Submitting the Sqoop job to the cluster...");
                var response = _hdiJobManagementClient.JobManagement.SubmitSqoopJob(parameters);
                System.Console.WriteLine("Validating that the response is as expected...");
                System.Console.WriteLine("Response status code is " + response.StatusCode);
                System.Console.WriteLine("Validating the response object...");
                System.Console.WriteLine("JobId is " + response.JobSubmissionJsonResponse.Id);
            }
        }
    }
    
  2. 若要运行程序,请选择 F5 键 。To run the program, select the F5 key.

Sqoop 导入Sqoop import

从 SQL Server 到 Azure 存储。From SQL Server to Azure Storage. 此示例的前提是已执行上述导出。This example is dependent on the above export having been performed. 此示例将数据从 SQL 数据库中的 mobiledata 表导入群集的默认存储帐户上的 wasb:///tutorials/usesqoop/importeddata 目录。This example imports data from the mobiledata table in SQL Database to the wasb:///tutorials/usesqoop/importeddata directory on the cluster's default Storage Account.

  1. //sqoop start //sqoop end 块中的以上代码替换为以下代码:Replace the code above in the //sqoop start //sqoop end block with the following code:

    var tableName = "mobiledata";
    var exportDir = "/tutorials/usesqoop/importeddata";
    
    var parameters = new SqoopJobSubmissionParameters
    {
        Command = "import --connect " + connectionString + " --table " + tableName + " --target-dir " +  exportDir + " --fields-terminated-by \\t --lines-terminated-by \\n -m 1"
    };
    
  2. 若要运行程序,请选择 F5 键 。To run the program, select the F5 key.

限制Limitations

基于 Linux 的 HDInsight 存在以下限制:Linux-based HDInsight presents the following limitations:

  • 批量导出:用于将数据导出到 Microsoft SQL Server 或 Azure SQL 数据库的 Sqoop 连接器目前不支持批量插入。Bulk export: The Sqoop connector that's used to export data to Microsoft SQL Server or Azure SQL Database does not currently support bulk inserts.

  • 批处理:如果使用 -batch 开关,Sqoop 将执行多次插入而不是批处理插入操作。Batching: By using the -batch switch, Sqoop performs multiple inserts instead of batching the insert operations.

后续步骤Next steps

现在你已了解如何使用 Sqoop。Now you have learned how to use Sqoop. 若要了解更多信息,请参阅以下文章:To learn more, see: