教程:使用 ODBC 和 PowerShell 查询 Apache HiveTutorial: Query Apache Hive with ODBC and PowerShell

使用 Microsoft ODBC 驱动程序可以灵活地与不同类型的数据源(包括 Apache Hive)进行交互。Microsoft ODBC drivers provide a flexible way to interact with different kinds of data sources, including Apache Hive. 可以采用 PowerShell 等脚本语言编写代码,以使用 ODBC 驱动程序打开与 Hive 群集的连接,传递所选的查询,并显示相应的结果。You can write code in scripting languages like PowerShell that use the ODBC drivers to open a connection to your Hive cluster, pass a query of your choosing, and display the results.

在本教程中,你将执行以下任务:In this tutorial, you'll do the following tasks:

  • 下载并安装 Microsoft Hive ODBC 驱动程序Download and install the Microsoft Hive ODBC driver
  • 创建链接到群集的 Apache Hive ODBC 数据源Create an Apache Hive ODBC data source linked to your cluster
  • 使用 PowerShell 从群集查询示例信息Query sample information from your cluster using PowerShell

如果没有 Azure 订阅,请在开始前创建一个试用订阅If you don’t have an Azure subscription, create a trail account before you begin.

先决条件Prerequisites

要阅读本教程,必须具备以下项:Before you begin this tutorial, you must have the following items:

  • HDInsight 上的一个交互式查询群集。An Interactive Query cluster on HDInsight. 要创建此群集,请参阅 Azure HDInsight 入门To create one, see Get started with Azure HDInsight. 选择“交互式查询”作为群集类型。Select Interactive Query as the cluster type.

安装 Microsoft Hive ODBC 驱动程序Install Microsoft Hive ODBC driver

下载并安装 Microsoft Hive ODBC 驱动程序Download and install the Microsoft Hive ODBC Driver.

创建 Apache Hive ODBC 数据源Create Apache Hive ODBC data source

下列步骤演示如何创建 Apache Hive ODBC 数据源。The following steps show you how to create an Apache Hive ODBC data source.

  1. 从 Windows 中导航到“开始” > “Windows 管理工具” > “ODBC 数据源(32 位)/(64 位) 。From Windows, navigate to Start > Windows Administrative Tools > ODBC Data Sources (32-bit)/(64-bit). 此时将打开“ODBC 数据源管理员” 窗口。An ODBC Data Source Administrator window opens.

    ODBC 数据源管理器OBDC data source administrator

  2. 在”用户 DSN”选项卡中,选择“添加”,打开“创建新数据源”窗口 。From the User DSN tab, select Add to open the Create New Data Source window.

  3. 选择“Microsoft Hive ODBC 驱动程序”,然后选择“完成”,打开“Microsoft Hive ODBC 驱动程序 DSN 设置”窗口 。Select Microsoft Hive ODBC Driver, and then select Finish to open the Microsoft Hive ODBC Driver DSN Setup window.

  4. 键入或选择以下值:Type or select the following values:

    属性Property 说明Description
    数据源名称Data Source Name 为数据源提供名称Give a name to your data source
    主机Host(s) 输入 CLUSTERNAME.azurehdinsight.cnEnter CLUSTERNAME.azurehdinsight.cn. 例如: myHDICluster.azurehdinsight.cnFor example, myHDICluster.azurehdinsight.cn
    端口Port 使用 443Use 443.
    数据库Database 使用“默认” 。Use default.
    机制Mechanism 选择“Windows Azure HDInsight 服务” Select Windows Azure HDInsight Service
    用户名User Name 输入 HDInsight 群集 HTTP 用户的用户名。Enter HDInsight cluster HTTP user username. 默认的用户名为 adminThe default username is admin.
    密码Password 输入 HDInsight 群集用户的密码。Enter HDInsight cluster user password. 选中复选框“保存密码(加密)” 。Select the checkbox Save Password (Encrypted).
  5. 可选:选择“高级选项” 。Optional: Select Advanced Options.

    参数Parameter 说明Description
    使用本机查询Use Native Query 当此项处于选中状态时,ODBC 驱动程序不会尝试将 TSQL 转换为 HiveQL。When it's selected, the ODBC driver does NOT try to convert TSQL into HiveQL. 仅当你完全确定要提交纯 HiveQL 语句时才使用此选项。Use this option only if you're 100% sure that you're submitting pure HiveQL statements. 连接 SQL Server 或 Azure SQL 数据库时,应将此项保留为未选中状态。When connecting to SQL Server or Azure SQL Database, you should leave it unchecked.
    每块提取的行数Rows fetched per block 提取大量记录时,可能需要调整此参数以确保最佳性能。When fetching a large number of records, tuning this parameter may be required to ensure optimal performances.
    默认字符串列长度、二进制列长度、十进制列小数位数Default string column length, Binary column length, Decimal column scale 数据类型长度和精度可能会影响返回数据的方式。The data type lengths and precisions may affect how data is returned. 由于精度损失和截断,这些语句可能会导致返回不正确的信息。They cause incorrect information to be returned because of loss of precision and truncation.

    高级 DSN 配置选项Advanced DSN configuration options

  6. 选择“测试”以测试数据源 。Select Test to test the data source. 如果数据源配置正确,测试结果会显示“成功” 。When the data source is configured correctly, the test result shows SUCCESS.

  7. 选择“确定”以关闭“测试”窗口 。Select OK to close the Test window.

  8. 选择“确定”以关闭“Microsoft Hive ODBC 驱动程序 DSN 设置”窗口 。Select OK to close the Microsoft Hive ODBC Driver DSN Setup window.

  9. 选择“确定”以关闭“ODBC 数据源管理器”窗口 。Select OK to close the ODBC Data Source Administrator window.

使用 PowerShell 查询数据Query data with PowerShell

以下 PowerShell 脚本是一个使用 ODBC 查询 Hive 群集的函数。The following PowerShell script is a function that ODBC to query a Hive cluster.

function Get-ODBC-Data {

   param(
   [string]$query=$(throw 'query is required.'),
   [string]$dsn,  
   [PSCredential] $cred = (Get-Credential)  
   )

   $conn = New-Object System.Data.Odbc.OdbcConnection
   $uname = $cred.UserName

   $pswd = (New-Object System.Net.NetworkCredential -ArgumentList "", $cred.Password).Password
   $conn.ConnectionString = "DSN=$dsn;Uid=$uname;Pwd=$pswd;"
   $conn.open()
   $cmd = New-object System.Data.Odbc.OdbcCommand($query,$conn)

   $ds = New-Object system.Data.DataSet

   (New-Object system.Data.odbc.odbcDataAdapter($cmd)).fill($ds) #| out-null
   $conn.close()
   $ds.Tables
}

以下代码片段使用上述函数对你在本教程开头创建的交互式查询群集执行查询。The following code snippet uses the function above to execute a query on the Interactive Query cluster that you created at the beginning of the tutorial. DATASOURCENAME 替换为已在“Microsoft Hive ODBC 驱动程序 DSN 设置”屏幕上指定的“数据源名称” 。Replace DATASOURCENAME with the Data Source Name that you specified on the Microsoft Hive ODBC Driver DSN Setup screen. 在系统提示输入凭据时,输入创建群集时在“群集登录用户名”和“群集登录密码”下输入的用户名和密码 。When prompted for credentials, enter the username and password that you entered under Cluster login username and Cluster login password when you created the cluster.


$dsn = "DATASOURCENAME"

$query = "select count(distinct clientid) AS total_clients from hivesampletable"

Get-ODBC-Data -query $query -dsn $dsn

清理资源Clean up resources

如果不再需要资源组、HDInsight 群集和存储帐户,请将其删除。When no longer needed, delete the resource group, HDInsight cluster, and storage account. 为此,请选择已创建群集的资源组,然后单击“删除” 。To do so, select the resource group where the cluster was created and click Delete.

后续步骤Next steps

本教程介绍了如何使用 Microsoft Hive ODBC 驱动程序和 PowerShell 从 Azure HDInsight 交互式查询群集检索数据。In this tutorial, you learned how to use the Microsoft Hive ODBC driver and PowerShell to retrieve data from your Azure HDInsight Interactive Query cluster.