Upload large amounts of random data in parallel to Azure Storage

12/11/2023

This tutorial is part two of a series. It shows how to deploy an application that uploads large amounts of random data to an Azure storage account.

In part two of the series, you learn how to:

Configure the connection string
Build the application
Run the application
Validate the number of connections

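Azure Blob storage provides a scalable service for storing your data. To get the best performance from your application, it helps to understand how Blob storage works. It is also important to know the limits of Azure Blob storage. To learn more about these limits, see Scalability and performance targets for Blob storage.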

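Partition naming is another potentially important factor when you design a high-performance application that uses blobs.

- For block sizes greater than or equal to 4 MiB, high-throughput block blobs are used, and partition naming doesn't affect performance.
- For block sizes smaller than 4 MiB, Azure Storage uses a range-based partitioning scheme to scale and load balance. As a result, files with similar naming conventions or prefixes go to the same partition. This logic also includes the name of the container that the files are uploaded to.

This tutorial uses files that have GUIDs as names and randomly generated content. These files are then uploaded to five different containers with random names.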
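The following Python sketch shows why GUID names help under range-based partitioning. It compares the common prefix shared by sequentially named keys with the prefix shared by GUID-based keys.

It rests on a simplified model, which is an assumption: the partition key is treated as just `container/blob`. The real key also includes the account name, and Azure doesn't expose where its range boundaries fall. `partition_key` and `shared_prefix` are hypothetical helpers, not Azure SDK functions.

```python
import os
import uuid


def partition_key(container: str, blob: str) -> str:
    # Simplified model (assumption): treat "container/blob" as the key that
    # range partitioning sorts on. The real key also includes the account name.
    return f"{container}/{blob}"


def shared_prefix(keys: list[str]) -> str:
    # os.path.commonprefix compares the strings character by character.
    return os.path.commonprefix(keys)


# Date-stamped, sequential names in one container: every key shares a long
# prefix, so the keys sort next to each other in a single key range.
sequential = [partition_key("uploads", f"2023-12-11-file{i:04d}.pdf") for i in range(1000)]

# GUID container and blob names, as in this tutorial: keys begin with random
# hex digits, so they spread across the key space.
containers = [str(uuid.uuid4()) for _ in range(5)]
random_keys = [partition_key(containers[i % 5], f"{uuid.uuid4()}.pdf") for i in range(1000)]

print(shared_prefix(sequential))  # uploads/2023-12-11-file0
print(len(shared_prefix(random_keys)))
```

The second value is usually 0. Five random GUIDs rarely share even a first character, so the random keys have almost no common prefix to cluster on.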

Prerequisites

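To complete this tutorial, you must first complete the previous Storage tutorial: Create a virtual machine and storage account for a scalable application.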

Remote into your virtual machine

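On your local machine, use the following command to create a remote desktop session with the virtual machine. Replace the IP address with the publicIPAddress of your virtual machine. When prompted, enter the credentials you used when you created the virtual machine.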

mstsc /v:<publicIpAddress>

Configure the connection string

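In the Azure portal, navigate to your storage account. Under Settings in your storage account, select Access keys, and copy the connection string from the primary or secondary key. Next, log in to the virtual machine you created in the previous tutorial. Open a Command Prompt as an administrator and run the setx command with the /m switch, which saves the environment variable as a machine setting. The environment variable isn't available until you reopen the Command Prompt. Replace <storageConnectionString> in the following sample: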

setx storageconnectionstring "<storageConnectionString>" /m
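A connection string is a semicolon-separated list of `Key=Value` settings. The Azure SDK client libraries accept the whole string and parse it for you.

The following Python sketch only illustrates the format and shows how a program might read the variable that setx saved. `parse_connection_string` and `load_from_environment` are hypothetical helpers. The account values in the example are placeholders.

```python
import os


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split a storage connection string into its Key=Value settings."""
    settings: dict[str, str] = {}
    for part in conn_str.split(";"):
        if not part:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys often end in "=" padding.
        key, _, value = part.partition("=")
        settings[key] = value
    return settings


def load_from_environment(name: str = "storageconnectionstring") -> dict[str, str]:
    # Same variable name as the setx command above. Windows treats environment
    # variable names case-insensitively; other platforms may not.
    conn_str = os.environ.get(name)
    if conn_str is None:
        raise RuntimeError(f"'{name}' is not set; reopen the Command Prompt after running setx.")
    return parse_connection_string(conn_str)


if __name__ == "__main__":
    # Placeholder values, not a real account.
    sample = ("DefaultEndpointsProtocol=https;AccountName=myaccount;"
              "AccountKey=bXlrZXk=;EndpointSuffix=core.windows.net")
    print(parse_connection_string(sample)["AccountKey"])  # bXlrZXk=
```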

When you're finished, open another Command Prompt, navigate to D:\git\storage-dotnet-perf-scale-app, and type dotnet build to build the application.

Run the application

Navigate to D:\git\storage-dotnet-perf-scale-app.

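Type dotnet run to run the application. The first time you run dotnet, it populates your local package cache to speed up package restore and to enable offline access. This first run can take up to a minute, and it happens only once.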

dotnet run

The application creates five randomly named containers and starts uploading the files in the staging directory to the storage account.

The following example shows the UploadFilesAsync method:

// Namespaces used: System.Diagnostics, Azure, Azure.Storage,
// Azure.Storage.Blobs, Azure.Storage.Blobs.Models
private static async Task UploadFilesAsync()
{
    // Create five randomly named containers to store the uploaded files.
    BlobContainerClient[] containers = await GetRandomContainersAsync();

    // Path to the directory to upload
    string uploadPath = Path.Combine(Directory.GetCurrentDirectory(), "upload");

    // Start a timer to measure how long it takes to upload all the files.
    Stopwatch timer = Stopwatch.StartNew();

    try
    {
        Console.WriteLine($"Iterating in directory: {uploadPath}");
        int count = 0;

        // Enumerate the directory once and reuse the result.
        string[] files = Directory.GetFiles(uploadPath);
        Console.WriteLine($"Found {files.Length} file(s)");

        // Specify the StorageTransferOptions
        BlobUploadOptions options = new BlobUploadOptions
        {
            TransferOptions = new StorageTransferOptions
            {
                // Set the maximum number of workers that
                // may be used in a parallel transfer.
                MaximumConcurrency = 8,

                // Set the maximum length of a transfer to 50 MiB.
                MaximumTransferSize = 50 * 1024 * 1024
            }
        };

        // Create a queue of tasks that will each upload one file.
        var tasks = new Queue<Task<Response<BlobContentInfo>>>();

        // Iterate through the files
        foreach (string filePath in files)
        {
            // Distribute the files round-robin across the containers.
            BlobContainerClient container = containers[count % containers.Length];
            string fileName = Path.GetFileName(filePath);
            Console.WriteLine($"Uploading {fileName} to container {container.Name}");
            BlobClient blob = container.GetBlobClient(fileName);

            // Start the upload and add its task to the queue.
            tasks.Enqueue(blob.UploadAsync(filePath, options));
            count++;
        }

        // Wait for all the uploads, which are already running concurrently, to finish.
        await Task.WhenAll(tasks);

        timer.Stop();
        Console.WriteLine($"Uploaded {count} files in {timer.Elapsed.TotalSeconds} seconds");
    }
    catch (RequestFailedException ex)
    {
        Console.WriteLine($"Azure request failed: {ex.Message}");
    }
    catch (DirectoryNotFoundException ex)
    {
        Console.WriteLine($"Error parsing files in the directory: {ex.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
    }
}
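UploadFilesAsync does two things:

- It starts every upload before awaiting any of them, assigning files to containers round-robin.
- It then waits for all the uploads together.

The following Python asyncio sketch mirrors that control flow so you can see it without an Azure account. `simulated_upload` is a hypothetical stand-in for BlobClient.UploadAsync and performs no network I/O. The tracker records how many uploads are in flight at once.

```python
import asyncio
from collections import Counter


class UploadTracker:
    """Records how many simulated uploads are in flight at the same time."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0
        self.per_container: Counter[str] = Counter()


async def simulated_upload(tracker: UploadTracker, container: str, file_name: str) -> str:
    # Hypothetical stand-in for BlobClient.UploadAsync; no network I/O happens.
    tracker.in_flight += 1
    tracker.peak = max(tracker.peak, tracker.in_flight)
    await asyncio.sleep(0.01)  # pretend the transfer takes some time
    tracker.in_flight -= 1
    tracker.per_container[container] += 1
    return file_name


async def upload_files(file_names: list[str], containers: list[str]):
    tracker = UploadTracker()
    tasks = []
    for count, file_name in enumerate(file_names):
        # Round-robin, like containers[count % containers.Length] in the C# code.
        container = containers[count % len(containers)]
        # create_task schedules the upload right away, like calling UploadAsync.
        tasks.append(asyncio.create_task(simulated_upload(tracker, container, file_name)))
    # Counterpart of Task.WhenAll: wait for every started upload to finish.
    uploaded = await asyncio.gather(*tasks)
    return uploaded, tracker


if __name__ == "__main__":
    files = [f"file{i}.pdf" for i in range(10)]
    uploaded, tracker = asyncio.run(upload_files(files, ["c0", "c1", "c2", "c3", "c4"]))
    print(len(uploaded), tracker.peak, dict(tracker.per_container))
```

Note that the C# method puts no cap on how many files are uploaded at once. MaximumConcurrency limits the workers used within a single blob's transfer, not the number of files in flight.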

The following example shows truncated output from the application running on a Windows system.

Created container 2dbb45f4-099e-49eb-880c-5b02ebac135e
Created container 0d784365-3bdf-4ef2-b2b2-c17b6480792b
Created container 42ac67f2-a316-49c9-8fdb-860fb32845d7
Created container f0357772-cb04-45c3-b6ad-ff9b7a5ee467
Created container 92480da9-f695-4a42-abe8-fb35e71eb887
Iterating in directory: C:\git\myapp\upload
Found 5 file(s)
Uploading 1d596d16-f6de-4c4c-8058-50ebd8141e4d.pdf to container 2dbb45f4-099e-49eb-880c-5b02ebac135e
Uploading 242ff392-78be-41fb-b9d4-aee8152a6279.pdf to container 0d784365-3bdf-4ef2-b2b2-c17b6480792b
Uploading 38d4d7e2-acb4-4efc-ba39-f9611d0d55ef.pdf to container 42ac67f2-a316-49c9-8fdb-860fb32845d7
Uploading 45930d63-b0d0-425f-a766-cda27ff00d32.pdf to container f0357772-cb04-45c3-b6ad-ff9b7a5ee467
Uploading 5129b385-5781-43be-8bac-e2fbb7d2bd82.pdf to container 92480da9-f695-4a42-abe8-fb35e71eb887
Uploaded 5 files in 16.9552163 seconds
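The output reports only the total elapsed time. The following Python sketch uses a simplified model to estimate how the TransferOptions values in UploadFilesAsync (MaximumTransferSize = 50 MiB, MaximumConcurrency = 8) divide a single file.

The model is an assumption, not a description of the SDK's exact algorithm. It treats every upload as a set of staged blocks no larger than MaximumTransferSize, with at most MaximumConcurrency blocks in flight. It ignores the SDK's single-request path for small files. `plan_upload` is a hypothetical helper.

```python
import math

MIB = 1024 * 1024


def plan_upload(file_size: int, max_transfer_size: int = 50 * MIB, max_concurrency: int = 8) -> tuple[int, int]:
    """Simplified model: return (number of blocks, rounds of parallel block uploads)."""
    # A file is split into blocks no larger than max_transfer_size (at least one block).
    blocks = max(1, math.ceil(file_size / max_transfer_size))
    # Up to max_concurrency blocks are uploaded per round.
    rounds = math.ceil(blocks / max_concurrency)
    return blocks, rounds


# A 1 GiB file with the tutorial's settings: 21 blocks in 3 rounds.
print(plan_upload(1024 * MIB))  # (21, 3)
```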

Validate the connections

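While the files are uploading, you can verify the number of concurrent connections to your storage account. Open a console window and type netstat -a | find /c "blob:https". This command shows the number of connections that are currently open. As the following example shows, 800 connections were open while the random files were being uploaded to the storage account. This value changes throughout the upload. Uploading in parallel blocks significantly reduces the time required to transfer the content.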

C:\>netstat -a | find /c "blob:https"
800

C:\>
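The `find /c` command counts the lines of netstat output that contain the given string. The following Python sketch does the same thing. It can be useful when you want to sample the count repeatedly during an upload and log it rather than read it by eye.

`count_matching_lines` and `count_blob_connections` are hypothetical helpers. The second one assumes a Windows machine where `netstat -a` is on the PATH, as in this tutorial.

```python
import subprocess


def count_matching_lines(text: str, needle: str = "blob:https") -> int:
    """Count lines that contain needle, like find /c "blob:https" (case-sensitive)."""
    return sum(1 for line in text.splitlines() if needle in line)


def count_blob_connections() -> int:
    # Assumes Windows, where `netstat -a` lists connections with resolved host names.
    result = subprocess.run(["netstat", "-a"], capture_output=True, text=True, check=True)
    return count_matching_lines(result.stdout)
```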

Next steps

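In part two of the series, you learned how to upload large amounts of random data to a storage account in parallel, including how to:

Configure the connection string
Build the application
Run the application
Validate the number of connections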

Advance to part three of the series to download large amounts of data from a storage account.

Download large amounts of random data from Azure Storage