Download large amounts of random data from Azure storage

This tutorial is part three of a series. It shows you how to download large amounts of data from Azure storage.

In part three of the series, you learn how to:

  • Update the application
  • Run the application
  • Validate the number of connections

Prerequisites

To complete this tutorial, you must have completed the previous Storage tutorial: Upload large amounts of random data in parallel to Azure storage.

Remote into your virtual machine

To create a remote desktop session with the virtual machine, use the following command on your local machine. Replace the IP address with the publicIPAddress of your virtual machine. When prompted, enter the credentials you used when creating the virtual machine.

mstsc /v:<publicIpAddress>

Update the application

In the previous tutorial, you only uploaded files to the storage account. Open D:\git\storage-dotnet-perf-scale-app\Program.cs in a text editor and replace the Main method with the following sample. This example comments out the upload task and uncomments both the download task and the task that deletes the content in the storage account when the run completes.

public static void Main(string[] args)
{
    Console.WriteLine("Azure Blob storage performance and scalability sample");
    // Set threading and default connection limit to 100 to ensure multiple threads and connections can be opened.
    // This is in addition to parallelism with the storage client library that is defined in the functions below.
    ThreadPool.SetMinThreads(100, 4);
    ServicePointManager.DefaultConnectionLimit = 100; // (Or More)

    bool exception = false;
    try
    {
        // The upload task from the previous tutorial is commented out for this part of the series.
        // UploadFilesAsync().GetAwaiter().GetResult();

        // Download the files from the storage account, as described in the tutorial at
        // https://docs.azure.cn/storage/blobs/storage-blob-scalable-app-download-files.
        DownloadFilesAsync().GetAwaiter().GetResult();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        exception = true;
    }
    finally
    {
        // The following function deletes the containers and all files contained in them.
        // It runs only if the download completed without an exception.
        if (!exception)
        {
            DeleteExistingContainersAsync().GetAwaiter().GetResult();
        }
        Console.WriteLine("Press any key to exit the application");
        Console.ReadKey();
    }
}

After the application has been updated, you need to build it again. Open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app. Rebuild the application by running dotnet build, as shown in the following example:

dotnet build

Run the application

Now that the application has been rebuilt, run it with the updated code. If one is not already open, open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app.

Type dotnet run to run the application.

dotnet run

The application reads the containers located in the storage account specified in the storageconnectionstring. It iterates through the blobs in each container 10 at a time using the ListBlobsSegmentedAsync method and downloads them to the local machine using the DownloadToFileAsync method. The following table shows the BlobRequestOptions that are defined for each blob as it is downloaded.

Property                       Value    Description
DisableContentMD5Validation    true     Disables checking the MD5 hash of the downloaded content. Disabling MD5 validation produces a faster transfer, but it does not confirm the validity or integrity of the files being transferred.
StoreBlobContentMD5            false    Determines whether an MD5 hash is calculated and stored.
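
Because MD5 validation is disabled in this sample, the integrity of a downloaded file is not checked during the transfer. If that matters for your workload, one option is to check a file after it has been downloaded. The following is a minimal sketch and is not part of the sample application: the class and method names (Md5Check, ComputeBase64Md5, Matches) are hypothetical, and it assumes you have obtained the expected hash separately as a Base64 string, the encoding used for Content-MD5 values.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not part of the sample application.
public static class Md5Check
{
    // Computes the MD5 hash of a stream and returns it as a Base64 string.
    public static string ComputeBase64Md5(Stream content)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(content));
        }
    }

    // Returns true when the MD5 hash of the file at 'path' equals the expected Base64 value.
    // 'expectedBase64Md5' is assumed to come from a source you trust.
    public static bool Matches(string path, string expectedBase64Md5)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return string.Equals(ComputeBase64Md5(file), expectedBase64Md5, StringComparison.Ordinal);
        }
    }
}
```

A call such as Md5Check.Matches(localPath, expectedHash) could run after each download finishes. Hashing every file adds CPU and disk reads, which is the cost the sample avoids by disabling validation.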

The DownloadFilesAsync task is shown in the following example:

private static async Task DownloadFilesAsync()
{
    CloudBlobClient blobClient = GetCloudBlobClient();

    // Define the BlobRequestOptions for the download. MD5 hash validation is disabled in this example to improve download speed.
    BlobRequestOptions options = new BlobRequestOptions
    {
        DisableContentMD5Validation = true,
        StoreBlobContentMD5 = false
    };

    // Retrieve the list of containers in the storage account.  Create a directory and configure variables for use later.
    BlobContinuationToken continuationToken = null;
    List<CloudBlobContainer> containers = new List<CloudBlobContainer>();
    do
    {
        var listingResult = await blobClient.ListContainersSegmentedAsync(continuationToken);
        continuationToken = listingResult.ContinuationToken;
        containers.AddRange(listingResult.Results);
    }
    while (continuationToken != null);

    var directory = Directory.CreateDirectory("download");
    BlobResultSegment resultSegment = null;
    Stopwatch time = Stopwatch.StartNew();

    // Download the blobs
    try
    {
        List<Task> tasks = new List<Task>();
        int max_outstanding = 100;
        int completed_count = 0;

        // Create a SemaphoreSlim that limits how many downloads can be in progress at the same time.
        SemaphoreSlim sem = new SemaphoreSlim(max_outstanding, max_outstanding);

        // Iterate through the containers
        foreach (CloudBlobContainer container in containers)
        {
            do
            {
                // Return the blobs from the container lazily 10 at a time.
                resultSegment = await container.ListBlobsSegmentedAsync(null, true, BlobListingDetails.All, 10, continuationToken, null, null);
                continuationToken = resultSegment.ContinuationToken;
                {
                    foreach (var blobItem in resultSegment.Results)
                    {

                        if (((CloudBlob)blobItem).Properties.BlobType == BlobType.BlockBlob)
                        {
                            // Get the blob and add a task to download the blob asynchronously from the storage account.
                            CloudBlockBlob blockBlob = container.GetBlockBlobReference(((CloudBlockBlob)blobItem).Name);
                            Console.WriteLine("Downloading {0} from container {1}", blockBlob.Name, container.Name);
                            await sem.WaitAsync();
                            tasks.Add(blockBlob.DownloadToFileAsync(directory.FullName + "\\" + blockBlob.Name, FileMode.Create, null, options, null).ContinueWith((t) =>
                            {
                                sem.Release();
                                Interlocked.Increment(ref completed_count);
                            }));

                        }
                    }
                }
            }
            while (continuationToken != null);
        }

        // Creates an asynchronous task that completes when all the downloads complete.
        await Task.WhenAll(tasks);
    }
    catch (Exception e)
    {
        Console.WriteLine("\nError encountered during transfer: {0}", e.Message);
    }

    time.Stop();
    Console.WriteLine("Download has been completed in {0} seconds. Press any key to continue", time.Elapsed.TotalSeconds.ToString());
    Console.ReadLine();
}
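
The throttling in DownloadFilesAsync can be easier to follow in isolation. The sketch below is a simplified illustration, not the sample's code: ThrottleSketch, RunAsync, and SimulatedTransferAsync are hypothetical names, and Task.Delay stands in for DownloadToFileAsync. It waits on a SemaphoreSlim before starting each transfer, releases the slot when the transfer ends, and reports the highest number of transfers observed in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration only; not part of the sample application.
public static class ThrottleSketch
{
    // Runs 'count' simulated transfers with at most 'maxOutstanding' in flight
    // and returns the highest number that were observed running at once.
    public static async Task<int> RunAsync(int count, int maxOutstanding)
    {
        SemaphoreSlim sem = new SemaphoreSlim(maxOutstanding, maxOutstanding);
        List<Task> tasks = new List<Task>();
        object gate = new object();
        int running = 0;
        int peak = 0;

        for (int i = 0; i < count; i++)
        {
            // Wait for a free slot before starting the next transfer.
            await sem.WaitAsync();
            tasks.Add(SimulatedTransferAsync());
        }

        // Completes when every started transfer has finished.
        await Task.WhenAll(tasks);
        return peak;

        async Task SimulatedTransferAsync()
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            try
            {
                await Task.Delay(20); // Stands in for DownloadToFileAsync.
            }
            finally
            {
                lock (gate) { running--; }
                sem.Release(); // Free the slot even if the transfer fails.
            }
        }
    }
}
```

For example, ThrottleSketch.RunAsync(10, 3).GetAwaiter().GetResult() should never report a peak above 3. The sketch releases the semaphore in a finally block, whereas the sample releases it in a ContinueWith continuation; both return the slot when a transfer ends.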

Validate the connections

While the files are being downloaded, you can verify the number of concurrent connections to your storage account. Open a Command Prompt and type netstat -a | find /c "blob:https". This command uses netstat to count the connections that are currently open. The following example shows output similar to what you see when running the tutorial yourself. As you can see from the example, over 280 connections were open when downloading the random files from the storage account.

C:\>netstat -a | find /c "blob:https"
289

C:\>
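
If you would rather check connections from code, the following sketch shows one possible approach. It is not part of the sample application, and ConnectionCount and CountEstablished are hypothetical names. It is also only an approximation of the netstat command above: it counts established TCP connections to a given remote port (443 for HTTPS) across the whole machine, rather than filtering by host name, and it ignores connections in other states.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

// Hypothetical helper for illustration only; not part of the sample application.
public static class ConnectionCount
{
    // Counts established TCP connections whose remote port equals 'remotePort'.
    public static int CountEstablished(int remotePort)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.Established && c.RemoteEndPoint.Port == remotePort);
    }
}
```

Calling Console.WriteLine(ConnectionCount.CountEstablished(443)) while the download is running prints the current count. Other applications that use HTTPS on the same machine are included in that number.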

Next steps

In part three of the series, you learned about downloading large amounts of random data from a storage account, such as how to:

  • Run the application
  • Validate the number of connections

Advance to part four of the series to verify throughput and latency metrics in the portal.