Using the HDFS CLI with Data Lake Storage Gen2

You can access and manage the data in your storage account by using a command-line interface, just as you would with a Hadoop Distributed File System (HDFS). This article provides some examples to help you get started.

HDInsight provides access to the distributed container that is locally attached to the compute nodes. You can access this container by using the shell that directly interacts with HDFS and the other file systems that Hadoop supports.

For more information on the HDFS CLI, see the official documentation and the HDFS Permissions Guide.

Use the HDFS CLI with an HDInsight Hadoop cluster on Linux

First, establish remote access to the services. If you choose SSH, the sample PowerShell code would look like this:

#Connect to the cluster via SSH.
ssh sshuser@clustername-ssh.azurehdinsight.cn
#Execute basic HDFS commands. Display the hierarchy.
hdfs dfs -ls /
#Create a sample directory.
hdfs dfs -mkdir /samplefolder

You can find the connection string in the "SSH + Cluster login" section of the HDInsight cluster blade in the Azure portal. SSH credentials were specified at the time of cluster creation.

Important

HDInsight cluster billing starts after a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use. To learn how to delete a cluster, see our article on the topic. However, data stored in a storage account with Data Lake Storage Gen2 enabled persists even after an HDInsight cluster is deleted.

Create a container

hdfs dfs -D "fs.azure.createRemoteFileSystemDuringInitialization=true" -ls abfs://<container-name>@<storage-account-name>.dfs.core.chinacloudapi.cn/

  • Replace the <container-name> placeholder with the name that you want to give your container.

  • Replace the <storage-account-name> placeholder with the name of your storage account.
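
For example, with a hypothetical container named my-file-system and a storage account named mystorageaccount (the same sample names used in the examples below), the command might look like this:

hdfs dfs -D "fs.azure.createRemoteFileSystemDuringInitialization=true" -ls abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/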

Get a list of files or directories

hdfs dfs -ls <path>

Replace the <path> placeholder with the URI of the container or container folder.

For example: hdfs dfs -ls abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/my-directory-name

Create a directory

hdfs dfs -mkdir [-p] <path>

Replace the <path> placeholder with the root container name or a folder within your container.

For example: hdfs dfs -mkdir abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/

Delete a file or directory

hdfs dfs -rm <path>

Replace the <path> placeholder with the URI of the file or folder that you want to delete.

For example: hdfs dfs -rm abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/my-directory-name/my-file-name

Display the Access Control Lists (ACLs) of files and directories

hdfs dfs -getfacl [-R] <path>

Example:

hdfs dfs -getfacl -R /dir

See getfacl.

Set the ACLs of files and directories

hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]

Example:

hdfs dfs -setfacl -m user:hadoop:rw- /file

See setfacl.

Change the owner of files

hdfs dfs -chown [-R] <new_owner>:<users_group> <URI>
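
For example, to recursively change the owner and group of a directory (the owner, group, and path names here are hypothetical):

hdfs dfs -chown -R new-owner:new-group abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/my-directory-name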

See chown.

Change group association of files

hdfs dfs -chgrp [-R] <group> <URI>
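
For example, to recursively assign a new group to a directory (the group and path names here are hypothetical):

hdfs dfs -chgrp -R new-group abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/my-directory-name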

See chgrp.

Change the permissions of files

hdfs dfs -chmod [-R] <mode> <URI>
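
For example, to recursively set permissions on a directory (the path is hypothetical; mode 744 gives the owner full access and everyone else read-only access):

hdfs dfs -chmod -R 744 abfs://my-file-system@mystorageaccount.dfs.core.chinacloudapi.cn/my-directory-name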

See chmod.

You can view the complete list of commands on the Apache Hadoop 2.4.1 File System Shell Guide website.

Next steps