Upload large amounts of random data in parallel to Azure storage

This tutorial is part two of a series. It shows you how to deploy an application that uploads a large amount of random data to an Azure storage account.

In part two of the series, you learn how to:

  • Configure the connection string
  • Build the application
  • Run the application
  • Validate the number of connections

Azure Blob Storage provides a scalable service for storing your data. To make your application as performant as possible, it helps to understand how blob storage works. Knowing the limits for Azure blobs is also important. To learn more about these limits, see: Scalability and performance targets for Blob storage.

Partition naming is another potentially important factor when designing a high-performance application that uses blobs. For block sizes greater than or equal to 4 MiB, high-throughput block blobs are used, and partition naming does not affect performance. For block sizes less than 4 MiB, Azure Storage uses a range-based partitioning scheme to scale and load balance. With this scheme, files with similar naming conventions or prefixes go to the same partition. The partitioning logic also takes into account the name of the container that the files are uploaded to. In this tutorial, you use files that have GUIDs for names and randomly generated content. The files are then uploaded to five different containers with random names.
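To make the naming scheme concrete, here is a minimal sketch of generating GUID-based names. It is not the sample application's code: the class `NamingSketch` and the helper `NewRandomNames` are hypothetical names chosen for illustration. The sketch groups the generated names by their first character to show that random GUIDs do not share a common prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NamingSketch
{
    // Hypothetical helper: returns `count` GUID-based names
    // (lowercase hex digits and hyphens, 36 characters each).
    public static List<string> NewRandomNames(int count)
    {
        var names = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            names.Add(Guid.NewGuid().ToString());
        }
        return names;
    }

    public static void Main()
    {
        List<string> names = NewRandomNames(1000);

        // Group by first character to see how the prefixes are distributed.
        foreach (var group in names.GroupBy(n => n[0]).OrderBy(g => g.Key))
        {
            Console.WriteLine($"{group.Key}: {group.Count()}");
        }
    }
}
```

Each run prints a different distribution, because the names are random. A sequential scheme such as file0001, file0002 would instead put every name under the same leading characters.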


To complete this tutorial, you must have completed the previous Storage tutorial: Create a virtual machine and storage account for a scalable application.

Remote into your virtual machine

Use the following command on your local machine to create a remote desktop session with the virtual machine. Replace <publicIpAddress> with the public IP address of your virtual machine. When prompted, enter the credentials you used when creating the virtual machine.

mstsc /v:<publicIpAddress>

Configure the connection string

In the Azure portal, navigate to your storage account. Under Settings, select Access keys. Copy the connection string from the primary or secondary key. Log in to the virtual machine you created in the previous tutorial. Open a Command Prompt as an administrator and run the setx command with the /m switch, which saves the value as a machine-level environment variable. The environment variable is not available until you reload the Command Prompt. Replace <storageConnectionString> in the following sample:

setx storageconnectionstring "<storageConnectionString>" /m

When finished, open another Command Prompt, navigate to D:\git\storage-dotnet-perf-scale-app, and type dotnet build to build the application.
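As a rough illustration of how a .NET program can pick up the variable saved with setx, the following sketch reads `storageconnectionstring` from the environment and reports whether it is present. The class `ConnectionStringSketch` and the method `TryGetConnectionString` are hypothetical names, and the sketch deliberately prints only the length of the value so that the account key is not written to the console.

```csharp
using System;

public static class ConnectionStringSketch
{
    // Same variable name as in the setx command.
    public const string VariableName = "storageconnectionstring";

    // Hypothetical helper: returns true when the variable holds a non-empty value.
    public static bool TryGetConnectionString(out string connectionString)
    {
        connectionString = Environment.GetEnvironmentVariable(VariableName);
        return !string.IsNullOrWhiteSpace(connectionString);
    }

    public static void Main()
    {
        if (TryGetConnectionString(out string connectionString))
        {
            Console.WriteLine($"Found {VariableName} ({connectionString.Length} characters).");
        }
        else
        {
            Console.WriteLine($"{VariableName} is not set. Run setx, then open a new Command Prompt.");
        }
    }
}
```

If the sketch reports that the variable is not set right after running setx, the most likely cause is that the Command Prompt was opened before the variable was saved.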

Run the application

Navigate to D:\git\storage-dotnet-perf-scale-app.

Type dotnet run to run the application. The first time you run dotnet, it populates your local package cache to improve restore speed and enable offline access. This step takes up to a minute and happens only once.

dotnet run

The application creates five randomly named containers and begins uploading the files in the staging directory to the storage account.

The UploadFilesAsync method is shown in the following example:

// Namespaces used: System, System.Collections.Generic, System.Diagnostics, System.IO,
// System.Threading.Tasks, Azure, Azure.Storage, Azure.Storage.Blobs, Azure.Storage.Blobs.Models
private static async Task UploadFilesAsync()
{
    // Create five randomly named containers to store the uploaded files.
    BlobContainerClient[] containers = await GetRandomContainersAsync();

    // Path to the directory to upload
    string uploadPath = Path.Combine(Directory.GetCurrentDirectory(), "upload");

    // Start a timer to measure how long it takes to upload all the files.
    Stopwatch timer = Stopwatch.StartNew();

    try
    {
        Console.WriteLine($"Iterating in directory: {uploadPath}");
        int count = 0;

        Console.WriteLine($"Found {Directory.GetFiles(uploadPath).Length} file(s)");

        // Specify the StorageTransferOptions
        BlobUploadOptions options = new BlobUploadOptions
        {
            TransferOptions = new StorageTransferOptions
            {
                // Set the maximum number of workers that
                // may be used in a parallel transfer.
                MaximumConcurrency = 8,

                // Set the maximum length of a transfer to 50 MiB.
                MaximumTransferSize = 50 * 1024 * 1024
            }
        };

        // Create a queue of tasks that will each upload one file.
        var tasks = new Queue<Task<Response<BlobContentInfo>>>();

        // Iterate through the files
        foreach (string filePath in Directory.GetFiles(uploadPath))
        {
            // Spread the files across the five containers in round-robin order.
            BlobContainerClient container = containers[count % 5];
            string fileName = Path.GetFileName(filePath);
            Console.WriteLine($"Uploading {fileName} to container {container.Name}");
            BlobClient blob = container.GetBlobClient(fileName);

            // Add the upload task to the queue
            tasks.Enqueue(blob.UploadAsync(filePath, options));
            count++;
        }

        // Wait for all the upload tasks to complete.
        await Task.WhenAll(tasks);

        timer.Stop();
        Console.WriteLine($"Uploaded {count} files in {timer.Elapsed.TotalSeconds} seconds");
    }
    catch (RequestFailedException ex)
    {
        Console.WriteLine($"Azure request failed: {ex.Message}");
    }
    catch (DirectoryNotFoundException ex)
    {
        Console.WriteLine($"Error parsing files in the directory: {ex.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
    }
}
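The two ideas at the core of this method, round-robin assignment of files to containers and starting every task before awaiting them together, can be tried without a storage account. The following sketch uses only the .NET base class library: `SimulateUploadAsync` is a hypothetical stand-in for the real upload call, and the container and file names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RoundRobinSketch
{
    // Stand-in for an upload: waits briefly, then reports where the file "went".
    private static async Task<string> SimulateUploadAsync(string fileName, string container)
    {
        await Task.Delay(100);
        return $"{fileName} -> {container}";
    }

    public static async Task<string[]> UploadAllAsync(
        IReadOnlyList<string> files, IReadOnlyList<string> containers)
    {
        var tasks = new List<Task<string>>();
        int count = 0;

        foreach (string file in files)
        {
            // Pick the next container in round-robin order.
            string container = containers[count % containers.Count];

            // Calling the async method starts the work; the task is stored, not awaited.
            tasks.Add(SimulateUploadAsync(file, container));
            count++;
        }

        // All simulated uploads are already in flight; wait for every one to finish.
        // Results come back in the same order as the tasks were added.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        string[] containers = { "container-0", "container-1", "container-2" };
        string[] files = { "a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf" };

        foreach (string line in await UploadAllAsync(files, containers))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the five simulated uploads overlap, the whole run takes roughly as long as one delay rather than five, which is the same effect the tutorial relies on for real uploads.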

The following example shows truncated output from the application running on a Windows system.

Created container 2dbb45f4-099e-49eb-880c-5b02ebac135e
Created container 0d784365-3bdf-4ef2-b2b2-c17b6480792b
Created container 42ac67f2-a316-49c9-8fdb-860fb32845d7
Created container f0357772-cb04-45c3-b6ad-ff9b7a5ee467
Created container 92480da9-f695-4a42-abe8-fb35e71eb887
Iterating in directory: C:\git\myapp\upload
Found 5 file(s)
Uploading 1d596d16-f6de-4c4c-8058-50ebd8141e4d.pdf to container 2dbb45f4-099e-49eb-880c-5b02ebac135e
Uploading 242ff392-78be-41fb-b9d4-aee8152a6279.pdf to container 0d784365-3bdf-4ef2-b2b2-c17b6480792b
Uploading 38d4d7e2-acb4-4efc-ba39-f9611d0d55ef.pdf to container 42ac67f2-a316-49c9-8fdb-860fb32845d7
Uploading 45930d63-b0d0-425f-a766-cda27ff00d32.pdf to container f0357772-cb04-45c3-b6ad-ff9b7a5ee467
Uploading 5129b385-5781-43be-8bac-e2fbb7d2bd82.pdf to container 92480da9-f695-4a42-abe8-fb35e71eb887
Uploaded 5 files in 16.9552163 seconds

Validate the connections

While the files are being uploaded, you can verify the number of concurrent connections to your storage account. Open a console window and type netstat -a | find /c "blob:https". This command shows the number of connections that are currently open. As you can see from the following example, 800 connections were open when uploading the random files to the storage account. This value changes throughout the upload. Uploading in parallel block chunks greatly reduces the time required to transfer the contents.

C:\>netstat -a | find /c "blob:https"
800
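A similar check can be approximated from .NET code. The sketch below is an assumption-laden alternative, not an equivalent of the netstat command: it counts established TCP connections whose remote port is 443, regardless of the remote host, whereas the netstat filter matches lines containing "blob:https". The class `ConnectionCountSketch` and the method `CountEstablished` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class ConnectionCountSketch
{
    // Counts connections in the Established state to the given remote port.
    public static int CountEstablished(TcpConnectionInformation[] connections, int remotePort)
    {
        return connections.Count(c =>
            c.State == TcpState.Established && c.RemoteEndPoint.Port == remotePort);
    }

    public static void Main()
    {
        // Snapshot of the machine's active TCP connections.
        TcpConnectionInformation[] connections =
            IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();

        // 443 is the HTTPS port; other HTTPS traffic on the machine is counted too.
        Console.WriteLine(CountEstablished(connections, 443));
    }
}
```

On a virtual machine that is doing little besides the upload, the two numbers should be of similar magnitude, but they are not expected to match exactly.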


Next steps

In part two of the series, you learned how to upload large amounts of random data to a storage account in parallel, including how to:

  • Configure the connection string
  • Build the application
  • Run the application
  • Validate the number of connections

Advance to part three of the series to download large amounts of data from a storage account.