Use Python to manage directories, files, and ACLs in Azure Data Lake Storage Gen2

This article shows you how to use Python to create and manage directories, files, and permissions in storage accounts that have hierarchical namespace (HNS) enabled.

Package (Python Package Index) | Samples | API reference | Give Feedback

Prerequisites

Set up your project

Install the Azure Data Lake Storage client library for Python by using pip.

pip install azure-storage-file-datalake

Add these import statements to the top of your code file.

import os, uuid, sys
from azure.core import MatchConditions
# ClientSecretCredential requires the azure-identity package.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import ContentSettings, DataLakeServiceClient

Connect to the account

To use the snippets in this article, you'll need to create a DataLakeServiceClient instance that represents the storage account.

Connect by using an account key

This is the easiest way to connect to an account.

This example creates a DataLakeServiceClient instance by using an account key.

def initialize_storage_account(storage_account_name, storage_account_key):

    try:
        global service_client

        service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.chinacloudapi.cn".format(
            "https", storage_account_name), credential=storage_account_key)

    except Exception as e:
        print(e)

  • Replace the storage_account_name placeholder value with the name of your storage account.

  • Replace the storage_account_key placeholder value with your storage account access key.
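To keep the account key out of source code, the two placeholder values can be read from environment variables. The following is a minimal sketch, not part of the client library: the helper names and the variable names AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY are hypothetical, and the endpoint suffix is simply the one used in the example above.

```python
import os

# Endpoint suffix taken from the example in this article.
DFS_ENDPOINT_SUFFIX = "dfs.core.chinacloudapi.cn"


def build_account_url(storage_account_name, suffix=DFS_ENDPOINT_SUFFIX):
    """Return the Data Lake endpoint URL for a storage account name."""
    if not storage_account_name:
        raise ValueError("storage_account_name must not be empty")
    return "https://{}.{}".format(storage_account_name, suffix)


def read_account_settings(environ=os.environ):
    """Read the account name and key from hypothetical environment variables."""
    name = environ.get("AZURE_STORAGE_ACCOUNT_NAME")
    key = environ.get("AZURE_STORAGE_ACCOUNT_KEY")
    if not name or not key:
        raise RuntimeError(
            "Set AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY")
    return name, key
```

The URL and key returned by these helpers can then be passed as the account_url and credential arguments of DataLakeServiceClient.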

Connect by using Azure Active Directory (AD)

You can use the Azure identity client library for Python (the azure-identity package) to authenticate your application with Azure AD.

This example creates a DataLakeServiceClient instance by using a client ID, a client secret, and a tenant ID. To get these values, see Acquire a token from Azure AD for authorizing requests from a client application.

def initialize_storage_account_ad(storage_account_name, client_id, client_secret, tenant_id):
    
    try:  
        global service_client

        credential = ClientSecretCredential(tenant_id, client_id, client_secret)

        service_client = DataLakeServiceClient(account_url="{}://{}.dfs.core.chinacloudapi.cn".format(
            "https", storage_account_name), credential=credential)
    
    except Exception as e:
        print(e)

Note

For more examples, see the Azure identity client library for Python documentation.

Create a container

A container acts as a file system for your files. You can create one by calling the DataLakeServiceClient.create_file_system method.

This example creates a container named my-file-system.

def create_file_system():
    try:
        global file_system_client

        file_system_client = service_client.create_file_system(file_system="my-file-system")
    
    except Exception as e:
        print(e) 

Create a directory

Create a directory reference by calling the FileSystemClient.create_directory method.

This example adds a directory named my-directory to a container.

def create_directory():
    try:
        file_system_client.create_directory("my-directory")
    
    except Exception as e:
        print(e)

Rename or move a directory

Rename or move a directory by calling the DataLakeDirectoryClient.rename_directory method. Pass the path of the desired directory as a parameter.

This example renames the directory my-directory to my-directory-renamed.

def rename_directory():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")

        # The destination has the form "<container>/<new directory path>".
        new_dir_name = "my-directory-renamed"
        directory_client.rename_directory(new_name=directory_client.file_system_name + '/' + new_dir_name)

    except Exception as e:
        print(e)
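The example above builds the destination from the container name, a slash, and the new directory path. The following sketch makes that convention explicit and also covers a move under a different parent path. The helper name build_rename_destination is hypothetical and not part of the client library, and the helper only formats the string; it does not check whether the destination is valid on the service.

```python
def build_rename_destination(file_system_name, new_path):
    """Join a container name and a new directory path as 'container/path'."""
    # Strip stray slashes so the two parts join with exactly one separator.
    container = file_system_name.strip("/")
    path = new_path.strip("/")
    if not container or not path:
        raise ValueError("both the container name and the new path are required")
    return container + "/" + path
```

For example, build_rename_destination("my-file-system", "archive/my-directory") yields a destination that places the directory under archive instead of only renaming it.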

Delete a directory

Delete a directory by calling the DataLakeDirectoryClient.delete_directory method.

This example deletes a directory named my-directory.

def delete_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")

        directory_client.delete_directory()
    except Exception as e:
        print(e)

Manage directory permissions

Get the access control list (ACL) of a directory by calling the DataLakeDirectoryClient.get_access_control method and set the ACL by calling the DataLakeDirectoryClient.set_access_control method.

Note

If your application authorizes access by using Azure Active Directory (Azure AD), then make sure that the security principal that your application uses to authorize access has been assigned the Storage Blob Data Owner role. To learn more about how ACL permissions are applied and the effects of changing them, see Access control in Azure Data Lake Storage Gen2.

This example gets and sets the ACL of a directory named my-directory. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permissions.

def manage_directory_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_dir_permissions = "rwxr-xrw-"
        
        directory_client.set_access_control(permissions=new_dir_permissions)
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
    
    except Exception as e:
        print(e)

You can also get and set the ACL of the root directory of a container. To get the root directory, call the FileSystemClient._get_root_directory_client method.
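The permission string used in this section, rwxr-xrw-, packs three triplets: owning user, owning group, and all others. The following sketch decodes such a string into one octal digit per class so that a value can be checked before it is passed to set_access_control. The helper names are hypothetical, and the sketch accepts only the characters r, w, x, and -; it rejects anything else.

```python
def describe_permissions(permissions):
    """Decode a 9-character string such as 'rwxr-xrw-' into a digit per class."""
    if len(permissions) != 9:
        raise ValueError("expected 9 characters, for example 'rwxr-xrw-'")
    result = {}
    for index, who in enumerate(("owning user", "owning group", "other")):
        triplet = permissions[index * 3:index * 3 + 3]
        value = 0
        # Read counts 4, write counts 2, execute counts 1.
        for flag, expected, weight in zip(triplet, "rwx", (4, 2, 1)):
            if flag == expected:
                value += weight
            elif flag != "-":
                raise ValueError("unexpected character: " + flag)
        result[who] = value
    return result


def to_octal(permissions):
    """Return the three digits as a string, in user, group, other order."""
    return "".join(str(v) for v in describe_permissions(permissions).values())
```

With the string from the example, the owning user gets 7 (read, write, execute), the owning group gets 5 (read, execute), and all others get 6 (read, write).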

Upload a file to a directory

First, create a file reference in the target directory by creating an instance of the DataLakeFileClient class. Upload a file by calling the DataLakeFileClient.append_data method. Make sure to complete the upload by calling the DataLakeFileClient.flush_data method.

This example uploads a text file to a directory named my-directory.

def upload_file_to_directory():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        file_client = directory_client.create_file("uploaded-file.txt")

        # Read as bytes so that len() matches the number of bytes uploaded.
        with open("C:\\file-to-upload.txt", 'rb') as local_file:
            file_contents = local_file.read()

        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))

        file_client.flush_data(len(file_contents))

    except Exception as e:
        print(e)

Tip

If your file size is large, your code will have to make multiple calls to the DataLakeFileClient.append_data method. Consider using the DataLakeFileClient.upload_data method instead. That way, you can upload the entire file in a single call.
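For reference, the multiple-call pattern that the tip mentions could look like the following sketch. It relies only on the append_data and flush_data calls shown earlier in this article; the helper name upload_in_chunks and the 4 MiB default chunk size are assumptions chosen for illustration, not values prescribed by the library.

```python
def upload_in_chunks(file_client, stream, chunk_size=4 * 1024 * 1024):
    """Append a binary stream chunk by chunk, then flush once at the end."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each chunk is appended at the offset where the previous one ended.
        file_client.append_data(data=chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # flush_data receives the total number of bytes appended.
    file_client.flush_data(offset)
    return offset
```

The stream argument is expected to be a file opened in binary mode, and file_client a DataLakeFileClient for a file that was already created, for example with create_file.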

Upload a large file to a directory

Use the DataLakeFileClient.upload_data method to upload large files without having to make multiple calls to the DataLakeFileClient.append_data method.

def upload_file_to_directory_bulk():
    try:

        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        file_client = directory_client.get_file_client("uploaded-file.txt")

        # Open in binary mode; the with statement closes the file afterwards.
        with open("C:\\file-to-upload.txt", 'rb') as local_file:
            file_contents = local_file.read()

        file_client.upload_data(file_contents, overwrite=True)

    except Exception as e:
        print(e)

Manage file permissions

Get the access control list (ACL) of a file by calling the DataLakeFileClient.get_access_control method and set the ACL by calling the DataLakeFileClient.set_access_control method.

Note

If your application authorizes access by using Azure Active Directory (Azure AD), then make sure that the security principal that your application uses to authorize access has been assigned the Storage Blob Data Owner role. To learn more about how ACL permissions are applied and the effects of changing them, see Access control in Azure Data Lake Storage Gen2.

This example gets and sets the ACL of a file named uploaded-file.txt in the my-directory directory. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permissions.

def manage_file_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        file_client = directory_client.get_file_client("uploaded-file.txt")

        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_file_permissions = "rwxr-xrw-"
        
        file_client.set_access_control(permissions=new_file_permissions)
        
        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])

    except Exception as e:
        print(e)

Download from a directory

Open a local file for writing. Then, create a DataLakeFileClient instance that represents the file that you want to download. Call the DataLakeFileClient.download_file method to read bytes from the file, and then write those bytes to the local file.

def download_file_from_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        # The with statement closes the local file even if the download fails.
        with open("C:\\file-to-download.txt", 'wb') as local_file:

            file_client = directory_client.get_file_client("uploaded-file.txt")

            download = file_client.download_file()

            downloaded_bytes = download.readall()

            local_file.write(downloaded_bytes)

    except Exception as e:
        print(e)
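The example above reads the whole file into memory with readall before writing it. As an alternative sketch, the object returned by download_file can write directly into an open stream through its readinto method. The helper name download_to_path is hypothetical, and the sketch leaves error handling to the caller.

```python
def download_to_path(file_client, local_path):
    """Download the file behind file_client and write it to local_path."""
    download = file_client.download_file()
    # Open in binary mode because the service returns raw bytes.
    with open(local_path, "wb") as local_file:
        download.readinto(local_file)
```

Here file_client is expected to be a DataLakeFileClient, such as the one returned by directory_client.get_file_client("uploaded-file.txt").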

List directory contents

List directory contents by calling the FileSystemClient.get_paths method, and then enumerating through the results.

This example prints the path of each subdirectory and file that is located in a directory named my-directory.

def list_directory_contents():
    try:
        
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        paths = file_system_client.get_paths(path="my-directory")

        for path in paths:
            print(path.name + '\n')

    except Exception as e:
        print(e)
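Each item returned by get_paths carries a name and an is_directory attribute, so the listing can be split into subdirectories and files. The following is a minimal sketch; the helper name split_paths is hypothetical, and it accepts any iterable of objects that expose those two attributes.

```python
def split_paths(paths):
    """Separate listed paths into directory names and file names."""
    directories, files = [], []
    for path in paths:
        # is_directory is true for subdirectories and false for files.
        if path.is_directory:
            directories.append(path.name)
        else:
            files.append(path.name)
    return directories, files
```

A call such as split_paths(file_system_client.get_paths(path="my-directory")) would return the two lists for the directory used in the example above.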

See also