Use Python to manage directories, files, and ACLs in Azure Data Lake Storage Gen2

This article shows you how to use Python to create and manage directories, files, and permissions in storage accounts that have hierarchical namespace (HNS) enabled.

Package (Python Package Index) | Samples | API reference

Prerequisites

  • An Azure subscription. See Get Azure trial.
  • A storage account that has hierarchical namespace (HNS) enabled. Follow these instructions to create one.

Set up your project

Install the Azure Data Lake Storage client library for Python by using pip.

pip install azure-storage-file-datalake

Add these import statements to the top of your code file.

import os, uuid, sys
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core import MatchConditions
from azure.storage.filedatalake import ContentSettings

Connect to the account

To use the snippets in this article, you'll need to create a DataLakeServiceClient instance that represents the storage account.

Connect by using an account key

This is the easiest way to connect to an account.

This example creates a DataLakeServiceClient instance by using an account key.

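def initialize_storage_account(storage_account_name, storage_account_key):

    try:
        global service_client

        # The account key is passed directly as the credential.
        service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.chinacloudapi.cn".format(
            "https", storage_account_name), credential=storage_account_key)

    except Exception as e:
        print(e)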
  • Replace the storage_account_name placeholder value with the name of your storage account.

  • Replace the storage_account_key placeholder value with your storage account access key.
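The account URL in the example above follows the pattern https://<account-name>.dfs.core.chinacloudapi.cn. As a minimal sketch, assuming you want to catch a mistyped account name before any network call, the hypothetical helper dfs_account_url below builds that URL and checks the storage account naming rule (3 to 24 characters, lowercase letters and digits only).

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def dfs_account_url(account_name: str,
                    endpoint_suffix: str = "dfs.core.chinacloudapi.cn") -> str:
    """Return the Data Lake (dfs) endpoint URL for a storage account.

    Hypothetical helper: the default suffix matches the Azure China endpoint
    used by the examples in this article.
    """
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.{endpoint_suffix}"
```

The result could then be passed as DataLakeServiceClient(account_url=dfs_account_url(storage_account_name), credential=storage_account_key).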

Connect by using Azure Active Directory (AD)

You can use the Azure identity client library for Python to authenticate your application with Azure AD. Install it by using pip install azure-identity.

This example creates a DataLakeServiceClient instance by using a client ID, a client secret, and a tenant ID. To get these values, see Acquire a token from Azure AD for authorizing requests from a client application.

from azure.identity import ClientSecretCredential

def initialize_storage_account_ad(storage_account_name, client_id, client_secret, tenant_id):
    
    try:  
        global service_client

        credential = ClientSecretCredential(tenant_id, client_id, client_secret)

        service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.chinacloudapi.cn".format(
            "https", storage_account_name), credential=credential)
    
    except Exception as e:
        print(e)

Note

For more examples, see the Azure identity client library for Python documentation.
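Because the account URLs in this article point at the Azure China cloud (chinacloudapi.cn), the credential generally has to request tokens from the Azure China authority rather than the global one. A minimal sketch, assuming the service principal values are stored in the environment variables AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET (the names that azure-identity's EnvironmentCredential also reads); load_service_principal and make_china_credential are hypothetical helpers.

```python
import os
from typing import Mapping, Optional, Tuple

_SP_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def load_service_principal(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str, str]:
    """Return (tenant_id, client_id, client_secret) read from the environment.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in _SP_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in _SP_VARS)


def make_china_credential(env: Optional[Mapping[str, str]] = None):
    """Build a ClientSecretCredential that authenticates against Azure China."""
    # Imported here so load_service_principal works without azure-identity installed.
    from azure.identity import AzureAuthorityHosts, ClientSecretCredential

    tenant_id, client_id, client_secret = load_service_principal(env)
    return ClientSecretCredential(
        tenant_id,
        client_id,
        client_secret,
        authority=AzureAuthorityHosts.AZURE_CHINA,
    )
```

The returned credential can be passed as the credential argument of DataLakeServiceClient, in the same way as the ClientSecretCredential in the example above.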

Create a container

A container acts as a file system for your files. You can create one by calling the DataLakeServiceClient.create_file_system method.

This example creates a container named my-file-system.

def create_file_system():
    try:
        global file_system_client

        file_system_client = service_client.create_file_system(file_system="my-file-system")
    
    except Exception as e:
        print(e) 
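The service rejects container names that break the naming rules: 3 to 63 characters; only lowercase letters, digits, and hyphens; starting and ending with a letter or digit; and no consecutive hyphens. A minimal sketch of a client-side check, assuming you want to reject bad names before calling create_file_system; is_valid_container_name is a hypothetical helper, not part of the SDK.

```python
import re

# First character is a letter or digit; every later character is a letter or
# digit, or a hyphen that is immediately followed by a letter or digit.
# That also forbids a trailing hyphen and "--". Total length: 3 to 63.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def is_valid_container_name(name: str) -> bool:
    """Return True if name satisfies the container (file system) naming rules."""
    return _CONTAINER_NAME.fullmatch(name) is not None
```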

Create a directory

Create a directory reference by calling the FileSystemClient.create_directory method.

This example adds a directory named my-directory to a container.

def create_directory():
    try:
        file_system_client.create_directory("my-directory")
    
    except Exception as e:
        print(e)

Rename or move a directory

Rename or move a directory by calling the DataLakeDirectoryClient.rename_directory method. Pass the new path of the directory as a parameter, prefixed with the container name.

This example renames a directory named my-directory to my-directory-renamed.

def rename_directory():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")

        new_dir_name = "my-directory-renamed"
        # The destination has the form "{file system}/{new path}". It is passed
        # positionally; current releases name this parameter new_name.
        directory_client.rename_directory(directory_client.file_system_name + '/' + new_dir_name)

    except Exception as e:
        print(e)
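Because the destination passed to rename_directory starts with the container (file system) name, moving a directory is the same call with a destination under a different parent. A minimal sketch that builds that string from parts, assuming you want stray slashes handled; rename_target is a hypothetical helper.

```python
def rename_target(file_system_name: str, *path_parts: str) -> str:
    """Join a file system name and path segments into '{file system}/{path}'.

    Leading and trailing slashes on each part are stripped and empty parts are
    skipped, so 'archive/' and '/my-directory' join cleanly.
    """
    file_system = file_system_name.strip("/")
    segments = [part.strip("/") for part in path_parts if part.strip("/")]
    if not file_system or not segments:
        raise ValueError("need a file system name and at least one path segment")
    return "/".join([file_system, *segments])
```

For example, directory_client.rename_directory(rename_target(directory_client.file_system_name, "archive", "my-directory")) would move my-directory under a directory named archive, assuming archive already exists in the same container.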

Delete a directory

Delete a directory by calling the DataLakeDirectoryClient.delete_directory method.

This example deletes a directory named my-directory.

def delete_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")

        directory_client.delete_directory()
    except Exception as e:
        print(e)

Upload a file to a directory

First, create a file reference in the target directory by calling the DataLakeDirectoryClient.create_file method, which returns a DataLakeFileClient instance. Upload the file's contents by calling the DataLakeFileClient.append_data method. Make sure to complete the upload by calling the DataLakeFileClient.flush_data method.

This example uploads a text file to a directory named my-directory.

def upload_file_to_directory():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        file_client = directory_client.create_file("uploaded-file.txt")

        # Read the local file; the with block closes it even if an error occurs.
        with open("C:\\file-to-upload.txt", 'rb') as local_file:
            file_contents = local_file.read()

        # Stage the bytes at offset 0, then commit them with flush_data.
        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))

        file_client.flush_data(len(file_contents))

    except Exception as e:
        print(e)

Tip

If your file is large, your code will have to make multiple calls to the DataLakeFileClient.append_data method. Consider using the DataLakeFileClient.upload_data method instead. That way, you can upload the entire file in a single call.
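If you do stay with append_data for a large file, each call stages one chunk at the offset where the previous chunk ended, and a single flush_data call with the total length commits everything. A minimal sketch, assuming a 4 MiB chunk size chosen only for illustration; iter_chunks and upload_in_chunks are hypothetical helpers, while append_data and flush_data are the DataLakeFileClient methods used in the example above.

```python
from typing import BinaryIO, Callable, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; an illustrative value, not an SDK default


def iter_chunks(read: Callable[[int], bytes],
                chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs by calling read(chunk_size) until it returns b''."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        chunk = read(chunk_size)
        if not chunk:
            return
        yield offset, chunk
        offset += len(chunk)


def upload_in_chunks(file_client, stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Append a stream to a file that already exists, then commit it.

    file_client is expected to be a DataLakeFileClient for an existing file,
    such as the one returned by DataLakeDirectoryClient.create_file.
    Returns the number of bytes committed.
    """
    total = 0
    for offset, chunk in iter_chunks(stream.read, chunk_size):
        # Each chunk is staged right after the previous one.
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        total = offset + len(chunk)
    # flush_data commits everything staged so far; its argument is the final length.
    file_client.flush_data(total)
    return total
```

With the file_client from the example above, upload_in_chunks(file_client, local_file) could replace the single append_data and flush_data pair, with local_file opened in 'rb' mode.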

Upload a large file to a directory

Use the DataLakeFileClient.upload_data method to upload large files without having to make multiple calls to the DataLakeFileClient.append_data method.

def upload_file_to_directory_bulk():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
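        file_client = directory_client.get_file_client("uploaded-file.txt")

        # upload_data also accepts an open file object, so your code doesn't have
        # to read the whole file into memory first. The with block closes the file.
        with open("C:\\file-to-upload.txt", 'rb') as local_file:
            file_client.upload_data(local_file, overwrite=True)

    except Exception as e:
        print(e)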
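upload_data also accepts a content_settings keyword argument, which is where the ContentSettings class imported at the top of this article comes in. A minimal sketch, assuming you want the stored file's Content-Type to follow its file extension; guess_content_type and upload_with_content_type are hypothetical helpers built on the standard mimetypes module.

```python
import mimetypes

DEFAULT_CONTENT_TYPE = "application/octet-stream"


def guess_content_type(path: str) -> str:
    """Guess a MIME type from the file name, falling back to a generic binary type."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or DEFAULT_CONTENT_TYPE


def upload_with_content_type(file_client, local_path: str) -> None:
    """Upload local_path and record its guessed Content-Type on the remote file."""
    # Imported here so guess_content_type can be used without the SDK installed.
    from azure.storage.filedatalake import ContentSettings

    settings = ContentSettings(content_type=guess_content_type(local_path))
    with open(local_path, "rb") as local_file:
        file_client.upload_data(local_file, overwrite=True, content_settings=settings)
```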

Download from a directory

Create a DataLakeFileClient instance that represents the file that you want to download. Call the DataLakeFileClient.download_file method, read the file's bytes by calling readall on the object it returns, and then write those bytes to a local file opened for writing.

def download_file_from_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        file_client = directory_client.get_file_client("uploaded-file.txt")

        download = file_client.download_file()

        downloaded_bytes = download.readall()

        # Open the local file only after the download succeeds; the with block closes it.
        with open("C:\\file-to-download.txt", 'wb') as local_file:
            local_file.write(downloaded_bytes)

    except Exception as e:
        print(e)
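readall loads the whole file into memory, which is fine for small files. For larger ones, a sketch that writes each piece as it arrives, assuming the chunks() iterator on the object returned by download_file (StorageStreamDownloader; check the API reference for your library version); write_chunks and download_to_path are hypothetical helpers.

```python
from typing import Callable, Iterable


def write_chunks(chunks: Iterable[bytes], write: Callable[[bytes], object]) -> int:
    """Pass each chunk to write() in order and return the total number of bytes."""
    total = 0
    for chunk in chunks:
        write(chunk)
        total += len(chunk)
    return total


def download_to_path(file_client, local_path: str) -> int:
    """Stream a Data Lake file to local_path chunk by chunk; return bytes written."""
    downloader = file_client.download_file()
    with open(local_path, "wb") as local_file:
        # chunks() yields the content piece by piece instead of all at once.
        return write_chunks(downloader.chunks(), local_file.write)
```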

List directory contents

List directory contents by calling the FileSystemClient.get_paths method, and then enumerating through the results.

This example prints the path of each subdirectory and file that is located in a directory named my-directory. Because get_paths is recursive by default, items in nested subdirectories are included.

def list_directory_contents():
    try:
        
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        paths = file_system_client.get_paths(path="my-directory")

        for path in paths:
            print(path.name)

    except Exception as e:
        print(e)
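Each item that get_paths returns is a PathProperties object with, among other attributes, name and is_directory. A minimal sketch that separates directories from files, assuming the results are first turned into (name, is_directory) pairs; split_paths and list_directory_split are hypothetical helpers. Passing recursive=False to get_paths limits the listing to the immediate children of the directory.

```python
from typing import Iterable, List, Tuple


def split_paths(entries: Iterable[Tuple[str, bool]]) -> Tuple[List[str], List[str]]:
    """Split (name, is_directory) pairs into sorted directory and file name lists."""
    directories: List[str] = []
    files: List[str] = []
    for name, is_directory in entries:
        (directories if is_directory else files).append(name)
    return sorted(directories), sorted(files)


def list_directory_split(file_system_client, directory: str, recursive: bool = False):
    """List a directory and return (directories, files)."""
    paths = file_system_client.get_paths(path=directory, recursive=recursive)
    return split_paths((p.name, p.is_directory) for p in paths)
```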

Manage access control lists (ACLs)

You can get, set, and update access permissions of directories and files.

Note

If you're using Azure Active Directory (Azure AD) to authorize access, then make sure that your security principal has been assigned the Storage Blob Data Owner role. To learn more about how ACL permissions are applied and the effects of changing them, see Access control in Azure Data Lake Storage Gen2.

Manage directory ACLs

Get the access control list (ACL) of a directory by calling the DataLakeDirectoryClient.get_access_control method and set the ACL by calling the DataLakeDirectoryClient.set_access_control method.

Note

If your application authorizes access by using Azure Active Directory (Azure AD), then make sure that the security principal that your application uses to authorize access has been assigned the Storage Blob Data Owner role. To learn more about how ACL permissions are applied and the effects of changing them, see Access control in Azure Data Lake Storage Gen2.

This example gets and sets the ACL of a directory named my-directory. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permissions.

def manage_directory_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_dir_permissions = "rwxr-xrw-"
        
        directory_client.set_access_control(permissions=new_dir_permissions)
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
    
    except Exception as e:
        print(e)
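The permission string has three groups of three characters (owning user, owning group, others), and each group maps to one octal digit with r = 4, w = 2, x = 1, so rwxr-xrw- is 7, 5, 6. The permissions parameter of set_access_control is documented as accepting either this symbolic form or 4-digit octal notation (0756 here). A minimal sketch of the conversion, assuming plain rwx strings without the sticky bit; symbolic_to_octal is a hypothetical helper.

```python
_BITS = {"r": 4, "w": 2, "x": 1}


def symbolic_to_octal(permissions: str) -> str:
    """Convert a 9-character POSIX permission string such as 'rwxr-xrw-' to '0756'."""
    if len(permissions) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-xrw-'")
    digits = []
    for start in range(0, 9, 3):
        group = permissions[start:start + 3]
        value = 0
        for expected, char in zip("rwx", group):
            if char == expected:
                value += _BITS[char]
            elif char != "-":
                # Only r, w, x in their fixed positions (or '-') are handled here.
                raise ValueError(f"unexpected character {char!r} in {group!r}")
        digits.append(str(value))
    return "0" + "".join(digits)
```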

You can also get and set the ACL of the root directory of a container. To get the root directory, call the FileSystemClient._get_root_directory_client method.

Manage file permissions

Get the access control list (ACL) of a file by calling the DataLakeFileClient.get_access_control method and set the ACL by calling the DataLakeFileClient.set_access_control method.


This example gets and sets the ACL of a file named uploaded-file.txt in the my-directory directory. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permissions.

def manage_file_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        file_client = directory_client.get_file_client("uploaded-file.txt")

        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_file_permissions = "rwxr-xrw-"
        
        file_client.set_access_control(permissions=new_file_permissions)
        
        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])

    except Exception as e:
        print(e)
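Besides permissions, the dictionary returned by get_access_control has an acl entry: a comma-separated list such as user::rwx,group::r-x,other::rw-, where named entries carry an object ID between the colons and default entries (on directories) start with default:. A minimal sketch that parses that string and writes it back, assuming the [default:]type:qualifier:permissions layout just described; AclEntry, parse_acl, and format_acl are hypothetical names.

```python
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    is_default: bool   # True for 'default:' entries (inherited by new children)
    kind: str          # 'user', 'group', 'mask', or 'other'
    qualifier: str     # object ID for named entries, '' for the owning user/group
    permissions: str   # e.g. 'r-x'


def parse_acl(acl: str) -> List[AclEntry]:
    """Parse an ACL string like 'user::rwx,group::r-x,other::rw-'."""
    entries = []
    for raw in filter(None, (part.strip() for part in acl.split(","))):
        parts = raw.split(":")
        is_default = parts[0] == "default"
        if is_default:
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, qualifier, permissions = parts
        entries.append(AclEntry(is_default, kind, qualifier, permissions))
    return entries


def format_acl(entries: List[AclEntry]) -> str:
    """Inverse of parse_acl: join entries back into a comma-separated ACL string."""
    return ",".join(
        ("default:" if e.is_default else "") + f"{e.kind}:{e.qualifier}:{e.permissions}"
        for e in entries
    )
```

With the file_client from the example above, parse_acl(file_client.get_access_control()['acl']) lists the entries, and an edited list could be written back with file_client.set_access_control(acl=format_acl(entries)).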

Set an ACL recursively

You can add, update, and remove ACLs recursively on the existing child items of a parent directory without having to make these changes individually for each child item. For more information, see Set access control lists (ACLs) recursively for Azure Data Lake Storage Gen2.
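As a rough sketch of a recursive call, assuming a library version that provides DataLakeDirectoryClient.set_access_control_recursive (the linked article covers options such as batching and continuation): the helpers below build one ACL string and apply it to a directory and every item under it, then summarize the counters in the returned result. build_acl and apply_acl_recursively are hypothetical names. Setting an ACL replaces the whole ACL on each item, so the string should include the user, group, and other entries.

```python
from typing import Mapping, Optional


def build_acl(owner: str, group: str, other: str,
              named_users: Optional[Mapping[str, str]] = None) -> str:
    """Build an ACL string such as 'user::rwx,group::r-x,other::---'.

    named_users maps an Azure AD object ID to its permissions, e.g. {'<object-id>': 'r-x'}.
    """
    entries = [f"user::{owner}"]
    entries += [f"user:{oid}:{perms}" for oid, perms in (named_users or {}).items()]
    entries += [f"group::{group}", f"other::{other}"]
    return ",".join(entries)


def apply_acl_recursively(directory_client, acl: str) -> str:
    """Set acl on a directory and all of its children; return a short summary."""
    # "Set" semantics: the whole ACL is replaced on every item that is touched.
    result = directory_client.set_access_control_recursive(acl=acl)
    counters = result.counters
    return (f"{counters.directories_successful} directories, "
            f"{counters.files_successful} files updated; "
            f"{counters.failure_count} failures")
```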

See also