Connect to HDInsight (Apache Hadoop) using SSH

Learn how to use Secure Shell (SSH) to securely connect to Apache Hadoop on Azure HDInsight. For information on connecting through a virtual network, see Azure HDInsight virtual network architecture and Extend Azure HDInsight using an Azure Virtual Network.

The following table contains the address and port information needed when connecting to HDInsight using an SSH client:

Address                                               Port   Connects to...
<clustername>-ssh.azurehdinsight.cn                   22     Primary headnode
<clustername>-ssh.azurehdinsight.cn                   23     Secondary headnode
<clustername>-ed-ssh.azurehdinsight.cn                22     Edge node (ML Services on HDInsight)
<edgenodename>.<clustername>-ssh.azurehdinsight.cn    22     Edge node (any other cluster type, if an edge node exists)

Replace <clustername> with the name of your cluster. Replace <edgenodename> with the name of the edge node.
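The address and port rules can be captured in a small shell function. This is a minimal sketch, not part of HDInsight: the function name hdi_ssh_target and the role names are hypothetical, and the ports are 22 and 23 as used in the connection examples in this article.

```shell
# Hypothetical helper: print "<host> <port>" for a node role.
# Roles: primary, secondary, ml-edge, edge (edge also needs an edge node name).
hdi_ssh_target() {
  hst_cluster="$1"
  hst_role="$2"
  hst_edge="${3:-}"
  case "$hst_role" in
    primary)   echo "${hst_cluster}-ssh.azurehdinsight.cn 22" ;;
    secondary) echo "${hst_cluster}-ssh.azurehdinsight.cn 23" ;;
    ml-edge)   echo "${hst_cluster}-ed-ssh.azurehdinsight.cn 22" ;;
    edge)
      if [ -z "$hst_edge" ]; then
        echo "edge role requires an edge node name" >&2
        return 1
      fi
      echo "${hst_edge}.${hst_cluster}-ssh.azurehdinsight.cn 22"
      ;;
    *)
      echo "unknown role: $hst_role" >&2
      return 1
      ;;
  esac
}

# Example: hdi_ssh_target mycluster edge myedge
```

With a cluster named mycluster, the secondary role yields mycluster-ssh.azurehdinsight.cn and port 23.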

If your cluster contains an edge node, we recommend that you always connect to the edge node using SSH. The head nodes host services that are critical to the health of Hadoop, whereas the edge node runs only what you put on it. For more information on using edge nodes, see Use edge nodes in HDInsight.

Tip

When you first connect to HDInsight, your SSH client may display a warning that the authenticity of the host can't be established. When prompted, select 'yes' to add the host to your SSH client's trusted server list.

If you have previously connected to a server with the same name, you may receive a warning that the stored host key does not match the host key of the server. Consult the documentation for your SSH client on how to remove the existing entry for the server name.
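If your client is OpenSSH, one way to remove a stale entry is the -R option of ssh-keygen. The following is a sketch under that assumption; the function name forget_host_key is hypothetical, and other SSH clients store host keys differently.

```shell
# Hypothetical helper for OpenSSH clients: remove the stored host key for a server name.
# Usage: forget_host_key <hostname> [known_hosts file]
# Entries for non-default ports are stored as "[hostname]:port", for example
# "[mycluster-ssh.azurehdinsight.cn]:23" for the secondary head node.
forget_host_key() {
  fhk_host="$1"
  fhk_file="${2:-$HOME/.ssh/known_hosts}"
  # Nothing to remove if no host keys have been stored yet.
  [ -f "$fhk_file" ] || return 0
  ssh-keygen -R "$fhk_host" -f "$fhk_file"
}

# Example: forget_host_key mycluster-ssh.azurehdinsight.cn
```

The next connection then prompts you to accept the new host key, as described in the first paragraph of this tip.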

SSH clients

Linux, Unix, and macOS systems provide the ssh and scp commands. The ssh client is commonly used to create a remote command-line session with a Linux or Unix-based system. The scp client is used to securely copy files between your client and the remote system.

Microsoft Windows does not install any SSH clients by default. The ssh and scp clients are available for Windows through separately installed packages.

There are also several graphical SSH clients, such as PuTTY and MobaXterm. While these clients can be used to connect to HDInsight, the process of connecting is different from using the ssh utility. For more information, see the documentation of the graphical client you are using.

Authentication: SSH Keys

SSH keys use public-key cryptography to authenticate SSH sessions. SSH keys are more secure than passwords, and provide an easy way to secure access to your Hadoop cluster.

If your SSH account is secured using a key, the client must provide the matching private key when you connect:

  • Most clients can be configured to use a default key. For example, the ssh client looks for a private key at ~/.ssh/id_rsa on Linux and Unix environments.

  • You can specify the path to a private key. With the ssh client, the -i parameter specifies the path to the private key. For example, ssh -i ~/.ssh/id_rsa sshuser@myedge.mycluster-ssh.azurehdinsight.cn.

  • If you have multiple private keys for use with different servers, consider using a utility such as ssh-agent (https://en.wikipedia.org/wiki/Ssh-agent). The ssh-agent utility can be used to automatically select the key to use when establishing an SSH session.
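With the OpenSSH client, another way to tie a specific private key to one server is a Host entry in ~/.ssh/config. The sketch below only prints such an entry; the function name print_identity_stanza and the key path are hypothetical, while IdentityFile and IdentitiesOnly are standard OpenSSH client options.

```shell
# Hypothetical helper: print an OpenSSH client config entry that pins a private key to a host.
# Usage: print_identity_stanza <hostname> <private key path>
print_identity_stanza() {
  # IdentitiesOnly makes ssh offer only the listed key to this host.
  printf 'Host %s\n  IdentityFile %s\n  IdentitiesOnly yes\n' "$1" "$2"
}

# Example: append an entry for an edge node to the client configuration.
# print_identity_stanza myedge.mycluster-ssh.azurehdinsight.cn ~/.ssh/hdi_key >> ~/.ssh/config
```

Once the entry is in place, ssh sshuser@myedge.mycluster-ssh.azurehdinsight.cn picks up the key without the -i parameter.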

Important

If you secure your private key with a passphrase, you must enter the passphrase when using the key. Utilities such as ssh-agent can cache the passphrase for your convenience.

Create an SSH key pair

Use the ssh-keygen command to create public and private key files. The following command generates a 2048-bit RSA key pair that can be used with HDInsight:

ssh-keygen -t rsa -b 2048

You are prompted for information during the key creation process, such as where the keys are stored and whether to use a passphrase. After the process completes, two files are created: a public key and a private key.

  • The public key is used to create an HDInsight cluster. The public key has an extension of .pub.

  • The private key is used to authenticate your client to the HDInsight cluster.

Important

You can secure your keys using a passphrase. A passphrase is effectively a password on your private key. Even if someone obtains your private key, they must have the passphrase to use the key.
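If you prefer to choose the key file location up front instead of answering the prompt, the -f option of ssh-keygen sets it. The following sketch assumes the OpenSSH ssh-keygen tool; the function name create_hdi_keypair and the file name hdi_rsa are hypothetical.

```shell
# Hypothetical helper: create a 2048-bit RSA key pair at a chosen path.
# Usage: create_hdi_keypair <private key path> [extra ssh-keygen options]
# The private key is written to <path> and the public key to <path>.pub.
# ssh-keygen still prompts for a passphrase unless you pass -N yourself.
create_hdi_keypair() {
  if [ "$#" -lt 1 ]; then
    echo "usage: create_hdi_keypair <private key path> [ssh-keygen options]" >&2
    return 1
  fi
  chk_path="$1"
  shift
  ssh-keygen -t rsa -b 2048 -f "$chk_path" "$@"
}

# Example: create_hdi_keypair ~/.ssh/hdi_rsa
```

The resulting ~/.ssh/hdi_rsa.pub file is the one whose contents you supply when creating the cluster.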

Create HDInsight using the public key

Creation method              How to use the public key
Azure portal                 Uncheck "Use same password as cluster login", and then select "Public Key" as the SSH authentication type. Finally, select the public key file or paste the text contents of the file in the "SSH public key" field. (Image: SSH public key dialog in HDInsight cluster creation)
Azure PowerShell             Use the -SshPublicKey parameter of the New-AzHdinsightCluster cmdlet and pass the contents of the public key as a string.
Azure CLI                    Use the --ssh-public-key parameter of the az hdinsight create command and pass the contents of the public key as a string.
Resource Manager Template    For an example of using SSH keys with a template, see Deploy HDInsight on Linux with SSH key. The publicKeys element in the azuredeploy.json file is used to pass the keys to Azure when creating the cluster.
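Several of these methods expect the contents of the public key as a string. A minimal sketch for reading the file into a shell variable follows; the function name read_public_key is hypothetical, and the check assumes an RSA key created as shown earlier, whose public key line starts with ssh-rsa.

```shell
# Hypothetical helper: print the public key as a single-line string.
# Usage: read_public_key <public key file>
read_public_key() {
  rpk_file="$1"
  if [ ! -r "$rpk_file" ]; then
    echo "cannot read $rpk_file" >&2
    return 1
  fi
  # A .pub file holds the key on one line: type, base64 data, optional comment.
  rpk_key=$(head -n 1 "$rpk_file")
  case "$rpk_key" in
    "ssh-rsa "*) printf '%s\n' "$rpk_key" ;;
    *)
      echo "$rpk_file does not look like an RSA public key" >&2
      return 1
      ;;
  esac
}

# Example: SSH_PUBLIC_KEY=$(read_public_key ~/.ssh/id_rsa.pub)
```

You can then pass "$SSH_PUBLIC_KEY" (quoted, so it stays one argument) as the public key value of the creation tool you use.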

Authentication: Password

SSH accounts can be secured using a password. When you connect to HDInsight using SSH, you are prompted to enter the password.

Warning

Microsoft does not recommend using password authentication for SSH. Passwords can be guessed and are vulnerable to brute force attacks. Instead, we recommend that you use SSH keys for authentication.

Important

The SSH account password expires 70 days after the HDInsight cluster is created. If your password expires, you can change it using the information in the Manage HDInsight document.

Create HDInsight using a password

Creation method              How to specify the password
Azure portal                 By default, the SSH user account has the same password as the cluster login account. To use a different password, uncheck "Use same password as cluster login", and then enter the password in the "SSH password" field. (Image: SSH password dialog in HDInsight cluster creation)
Azure PowerShell             Use the -SshCredential parameter of the New-AzHdinsightCluster cmdlet and pass a PSCredential object that contains the SSH user account name and password.
Azure CLI                    Use the --ssh-password parameter of the az hdinsight create command and provide the password value.
Resource Manager Template    For an example of using a password with a template, see Deploy HDInsight on Linux with SSH password. The linuxOperatingSystemProfile element in the azuredeploy.json file is used to pass the SSH account name and password to Azure when creating the cluster.

Change the SSH password

For information on changing the SSH user account password, see the Change passwords section of the Manage HDInsight document.

Connect to nodes

The head nodes and edge node (if there is one) can be accessed over the internet on ports 22 and 23.

  • When connecting to the head nodes, use port 22 to connect to the primary head node and port 23 to connect to the secondary head node. The fully qualified domain name to use is clustername-ssh.azurehdinsight.cn, where clustername is the name of your cluster.

    # Connect to primary head node
    # port not specified since 22 is the default
    ssh sshuser@clustername-ssh.azurehdinsight.cn
    
    # Connect to secondary head node
    ssh -p 23 sshuser@clustername-ssh.azurehdinsight.cn
    
  • When connecting to the edge node, use port 22. The fully qualified domain name is edgenodename.clustername-ssh.azurehdinsight.cn, where edgenodename is the name you provided when creating the edge node and clustername is the name of the cluster.

    # Connect to edge node
    ssh sshuser@edgenodename.clustername-ssh.azurehdinsight.cn
    

Important

The previous examples assume that you are using password authentication, or that key-based authentication occurs automatically. If you use an SSH key pair for authentication and the key is not used automatically, use the -i parameter to specify the private key. For example, ssh -i ~/.ssh/mykey sshuser@clustername-ssh.azurehdinsight.cn.

Once connected, the prompt changes to indicate the SSH user name and the node you are connected to. For example, when connected to the primary head node as sshuser, the prompt is sshuser@hn0-clustername:~$.
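The port and key options can be combined in one command. The following sketch wraps them in a function; the name connect_head_node and the SSH_CMD override (used only to substitute another program for ssh) are hypothetical, and sshuser is the example account name used in this article.

```shell
# Hypothetical wrapper: connect to a head node, optionally with an explicit private key.
# Usage: connect_head_node <clustername> <primary|secondary> [private key path]
connect_head_node() {
  chn_host="$1-ssh.azurehdinsight.cn"
  case "$2" in
    primary)   chn_port=22 ;;
    secondary) chn_port=23 ;;
    *)
      echo "expected primary or secondary" >&2
      return 1
      ;;
  esac
  if [ -n "${3:-}" ]; then
    # -i selects the private key, -p selects the port.
    "${SSH_CMD:-ssh}" -i "$3" -p "$chn_port" "sshuser@$chn_host"
  else
    "${SSH_CMD:-ssh}" -p "$chn_port" "sshuser@$chn_host"
  fi
}

# Example: connect_head_node mycluster secondary ~/.ssh/mykey
```

The example is equivalent to running ssh -i ~/.ssh/mykey -p 23 sshuser@mycluster-ssh.azurehdinsight.cn.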

Connect to worker and Apache Zookeeper nodes

The worker nodes and Zookeeper nodes are not directly accessible from the internet. They can be accessed from the cluster head nodes or edge nodes. The following are the general steps to connect to other nodes:

  1. Use SSH to connect to a head or edge node:

     ssh sshuser@myedge.mycluster-ssh.azurehdinsight.cn
    
  2. From the SSH connection to the head or edge node, use the ssh command to connect to a worker node in the cluster:

     ssh sshuser@wn0-myhdi
    

    To retrieve a list of the node names, see the Manage HDInsight by using the Apache Ambari REST API document.

If the SSH account is secured using a password, enter the password when connecting.

If the SSH account is secured using SSH keys, make sure that SSH agent forwarding is enabled on the client.
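With the OpenSSH client, the two steps above can be combined into one command by enabling agent forwarding for a single session with -A. This is a sketch rather than a required procedure; the function name ssh_to_worker and the SSH_CMD override are hypothetical, and the node names come from the examples above.

```shell
# Hypothetical helper: hop through a head or edge node to a worker node.
# Usage: ssh_to_worker <head or edge node address> <worker node name>
ssh_to_worker() {
  # -A forwards the local ssh-agent for this session only.
  # -t allocates a terminal so the nested ssh session is interactive.
  "${SSH_CMD:-ssh}" -A -t "sshuser@$1" ssh "sshuser@$2"
}

# Example: ssh_to_worker myedge.mycluster-ssh.azurehdinsight.cn wn0-myhdi
```

Forward your agent only to nodes you trust, because anyone with sufficient access on the intermediate node can use the forwarded agent while the session is open.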

Note

Another way to directly access all nodes in the cluster is to install HDInsight into an Azure Virtual Network. Then, you can join your remote machine to the same virtual network and directly access all nodes in the cluster.

For more information, see Use a virtual network with HDInsight.

Configure SSH agent forwarding

Important

The following steps assume a Linux or UNIX-based system, and also work with Bash on Windows 10. If these steps do not work for your system, you may need to consult the documentation for your SSH client.

  1. Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config at a command line.

  2. Add the following text to the config file.

     Host <edgenodename>.<clustername>-ssh.azurehdinsight.cn
       ForwardAgent yes
    

    Replace the Host information with the address of the node you connect to using SSH. The previous example uses the edge node. This entry configures SSH agent forwarding for the specified node.

  3. Test SSH agent forwarding by using the following command from the terminal:

     echo "$SSH_AUTH_SOCK"
    

    This command returns information similar to the following text:

     /tmp/ssh-rfSUL1ldCldQ/agent.1792
    

    If nothing is returned, then ssh-agent is not running. For more information, see the agent startup scripts information at Using ssh-agent with ssh (http://mah.everybody.org/docs/ssh) or consult your SSH client documentation.

  4. Once you have verified that ssh-agent is running, use the following command to add your SSH private key to the agent:

     ssh-add ~/.ssh/id_rsa
    

    If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.

  5. Connect to the cluster edge node or head nodes using SSH. Then use the ssh command to connect to a worker or Zookeeper node. The connection is established using the forwarded key.
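Steps 1 through 3 can also be scripted. The following is a minimal sketch, assuming the OpenSSH client and a POSIX shell; the function names ensure_forward_agent and agent_has_socket are hypothetical.

```shell
# Hypothetical helper: add a ForwardAgent entry for a host unless one already exists.
# Usage: ensure_forward_agent <hostname> [config file]
# If the file already has a "Host <hostname>" line, it is left unchanged, so an
# existing entry without ForwardAgent must still be edited by hand.
ensure_forward_agent() {
  efa_host="$1"
  efa_config="${2:-$HOME/.ssh/config}"
  touch "$efa_config" || return 1
  if grep -qxF "Host $efa_host" "$efa_config"; then
    return 0
  fi
  printf '\nHost %s\n  ForwardAgent yes\n' "$efa_host" >> "$efa_config"
}

# Hypothetical helper: succeed only if SSH_AUTH_SOCK points at an existing socket.
agent_has_socket() {
  [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

# Example:
# ensure_forward_agent myedge.mycluster-ssh.azurehdinsight.cn
# agent_has_socket || echo "ssh-agent does not appear to be running"
```

Running ensure_forward_agent a second time for the same host leaves the file unchanged, so the function is safe to call from a setup script.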

Copy files

The scp utility can be used to copy files to and from individual nodes in the cluster. For example, the following command copies the test.txt file from the local system to the primary head node:

scp test.txt sshuser@clustername-ssh.azurehdinsight.cn:

Since no path is specified after the :, the file is placed in the sshuser home directory.

The following example copies the test.txt file from the sshuser home directory on the primary head node to the local system:

scp sshuser@clustername-ssh.azurehdinsight.cn:test.txt .
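To copy to the secondary head node, the port must be given explicitly. Note that scp uses an uppercase -P for the port, unlike the lowercase -p of ssh. The sketch below assumes the OpenSSH scp client; the function name copy_to_head_node and the SCP_CMD override are hypothetical.

```shell
# Hypothetical helper: copy a local file to the sshuser home directory on a head node.
# Usage: copy_to_head_node <local file> <clustername> [primary|secondary]
copy_to_head_node() {
  ctn_file="$1"
  ctn_cluster="$2"
  case "${3:-primary}" in
    primary)   ctn_port=22 ;;
    secondary) ctn_port=23 ;;
    *)
      echo "expected primary or secondary" >&2
      return 1
      ;;
  esac
  # scp takes the port as -P (uppercase); the trailing ":" means the remote home directory.
  "${SCP_CMD:-scp}" -P "$ctn_port" "$ctn_file" "sshuser@${ctn_cluster}-ssh.azurehdinsight.cn:"
}

# Example: copy_to_head_node test.txt clustername secondary
```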

Important

scp can only access the file system of individual nodes within the cluster. It cannot be used to access data in the HDFS-compatible storage for the cluster.

Use scp when you need to upload a resource for use from an SSH session. For example, upload a Python script and then run the script from an SSH session.
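The upload-then-run pattern might look like the following sketch. The function name upload_and_run and the SCP_CMD and SSH_CMD overrides are hypothetical, and the sketch assumes that python3 is on the PATH of the node and that the script file name contains no spaces.

```shell
# Hypothetical helper: upload a script to a node and run it there.
# Usage: upload_and_run <local script> <node address>
upload_and_run() {
  uar_script="$1"
  uar_host="$2"
  # The script lands in the sshuser home directory under its base name.
  uar_name=$(basename "$uar_script")
  "${SCP_CMD:-scp}" "$uar_script" "sshuser@$uar_host:" || return 1
  # Assumption: python3 is available on the node.
  "${SSH_CMD:-ssh}" "sshuser@$uar_host" python3 "$uar_name"
}

# Example: upload_and_run ./job.py myedge.mycluster-ssh.azurehdinsight.cn
```

The script runs on the node it was copied to and sees only that node's local file system, in line with the restriction described above.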

For information on directly loading data into the HDFS-compatible storage, see the following documents:

Next steps