List Azure Storage resources in C++

Listing operations are key to many development scenarios with Azure Storage. This article describes how to enumerate objects in Azure Storage as efficiently as possible by using the listing APIs provided in the Azure Storage Client Library for C++.

Note

This guide targets the Azure Storage Client Library for C++ version 2.x, which is available via NuGet or GitHub.

The Storage Client Library provides a variety of methods to list or query objects in Azure Storage. This article addresses the following scenarios:

  • List containers in an account
  • List blobs in a container or virtual blob directory
  • List queues in an account
  • List tables in an account
  • Query entities in a table

Each of these methods is shown with different overloads for different scenarios.

Asynchronous versus synchronous

Because the Storage Client Library for C++ is built on top of the C++ REST library, it inherently supports asynchronous operations through pplx::task. For example:

pplx::task<list_blob_item_segment> list_blobs_segmented_async(const continuation_token& token) const;

Synchronous operations wrap the corresponding asynchronous operations:

list_blob_item_segment list_blobs_segmented(const continuation_token& token) const
{
    return list_blobs_segmented_async(token).get();
}

If you are working with multithreaded applications or services, we recommend that you use the async APIs directly instead of creating a thread to call the sync APIs, which significantly degrades performance.

Segmented listing

The scale of cloud storage requires segmented listing. For example, you can have over a million blobs in an Azure blob container or over a billion entities in an Azure Table. These are not theoretical numbers, but real customer usage.

It is therefore impractical to list all objects in a single response. Instead, you can list objects using paging. Each of the listing APIs has a segmented overload.

The response for a segmented listing operation includes:

  • A _segment object (for example, list_blob_item_segment), which contains the set of results returned for a single call to the listing API.
  • A continuation_token, which is passed to the next call to get the next page of results. When there are no more results to return, the continuation token is empty.

For example, a typical call to list all blobs in a container may look like the following code snippet. The code is available in our samples:

// List blobs in the blob container
azure::storage::continuation_token token;
do
{
    azure::storage::list_blob_item_segment segment = container.list_blobs_segmented(token);
    for (auto it = segment.results().cbegin(); it != segment.results().cend(); ++it)
    {
        if (it->is_blob())
        {
            process_blob(it->as_blob());
        }
        else
        {
            process_directory(it->as_directory());
        }
    }

    token = segment.continuation_token();
}
while (!token.empty());
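To make the token handshake in that loop concrete, here is a minimal sketch that simulates the service side with an in-memory vector. Everything in it (mock_token, mock_segment, list_segmented, count_requests) is a hypothetical stand-in written for illustration, not part of the Storage Client Library, and the real continuation token is opaque rather than an index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the library's token and segment types.
struct mock_token
{
    std::size_t next_index = 0;            // where the next page starts
    bool empty() const { return next_index == 0; }
};

struct mock_segment
{
    std::vector<std::string> results;      // one page of names
    mock_token continuation_token;         // empty when nothing is left
};

// Simulated service: returns at most page_size names starting at the token.
mock_segment list_segmented(const std::vector<std::string>& store,
                            const mock_token& token, std::size_t page_size)
{
    assert(page_size > 0);
    mock_segment segment;
    const std::size_t begin = std::min(token.next_index, store.size());
    const std::size_t end = std::min(begin + page_size, store.size());
    segment.results.assign(store.begin() + begin, store.begin() + end);
    if (end < store.size())
    {
        segment.continuation_token.next_index = end;  // more pages remain
    }
    return segment;
}

// Client loop with the same shape as the snippet above; counts round trips.
std::size_t count_requests(const std::vector<std::string>& store, std::size_t page_size)
{
    std::size_t requests = 0;
    mock_token token;
    do
    {
        mock_segment segment = list_segmented(store, token, page_size);
        ++requests;
        token = segment.continuation_token;
    } while (!token.empty());
    return requests;
}
```

Because the loop is a do/while, it always issues at least one request, even when the store is empty, and it stops only when the returned token is empty.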

Note that the number of results returned in a page can be controlled by the max_results parameter in the overload of each API, for example:

list_blob_item_segment list_blobs_segmented(const utility::string_t& prefix, bool use_flat_blob_listing,
    blob_listing_details::values includes, int max_results, const continuation_token& token,
    const blob_request_options& options, operation_context context)

If you do not specify the max_results parameter, a single page returns up to the default maximum of 5000 results.

Also note that a query against Azure Table storage may return no records, or fewer records than the value of the max_results parameter that you specified, even if the continuation token is not empty. One reason might be that the query could not complete in five seconds. As long as the continuation token is not empty, the query should continue, and your code should not make assumptions about the size of a segment's results.
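The following sketch illustrates why the loop condition must depend on the token and not on the page size. It replays a scripted sequence of pages, one of which is empty while more data is still pending. The mock_page type and both functions are hypothetical, and the has_more flag stands in for a non-empty continuation token.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical page type: results plus a flag standing in for a non-empty
// continuation token.
struct mock_page
{
    std::vector<int> results;
    bool has_more;
};

// Correct loop: stops only when the "token" is empty, never because a page
// happened to be empty. Returns the total number of records seen.
std::size_t drain(const std::vector<mock_page>& scripted_pages)
{
    std::size_t total = 0;
    std::size_t index = 0;
    bool more = !scripted_pages.empty();
    while (more)
    {
        const mock_page& page = scripted_pages[index++];
        total += page.results.size();          // may legitimately be zero
        more = page.has_more && index < scripted_pages.size();
    }
    return total;
}

// Incorrect variant for contrast: treats an empty page as the end.
std::size_t drain_until_empty_page(const std::vector<mock_page>& scripted_pages)
{
    std::size_t total = 0;
    for (const mock_page& page : scripted_pages)
    {
        if (page.results.empty())
        {
            break;                             // bug: later records are lost
        }
        total += page.results.size();
    }
    return total;
}
```

With a script of two records, then an empty page, then one record, the first function sees all three records while the second stops after two.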

The recommended coding pattern for most scenarios is segmented listing, which makes the progress of listing or querying explicit and shows how the service responds to each request. Particularly for C++ applications or services, lower-level control of the listing progress may help control memory and performance.

Greedy listing

Earlier versions of the Storage Client Library for C++ (versions 0.5.0 Preview and earlier) included non-segmented listing APIs for tables and queues, as in the following example:

std::vector<cloud_table> list_tables(const utility::string_t& prefix) const;
std::vector<table_entity> execute_query(const table_query& query) const;
std::vector<cloud_queue> list_queues() const;

These methods were implemented as wrappers of segmented APIs. For each response of segmented listing, the code appended the results to a vector and returned all results only after the full container had been scanned.

This approach might work when the storage account or table contains a small number of objects. However, as the number of objects grows, the memory required could increase without limit, because all results remained in memory. A single listing operation could take a very long time, during which the caller had no information about its progress.
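As a rough sketch of the wrapper shape just described, the following code loops over every page and accumulates the results in one vector. It is an illustration under stated assumptions, not the removed library code: mock_page, page_source, list_all, and demo_fetch are all hypothetical names, and an offset of 0 is used here to mean "no more pages".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical page type; next_offset == 0 means "no more pages".
struct mock_page
{
    std::vector<std::string> results;
    std::size_t next_offset;
};

// A page source maps an offset to the page that starts there.
using page_source = std::function<mock_page(std::size_t)>;

// Greedy wrapper: fetches every page and keeps everything in one vector,
// so memory grows with the total number of objects and the caller sees
// nothing until the last page has arrived.
std::vector<std::string> list_all(const page_source& fetch)
{
    std::vector<std::string> all;
    std::size_t offset = 0;
    do
    {
        mock_page page = fetch(offset);
        all.insert(all.end(), page.results.begin(), page.results.end());
        offset = page.next_offset;
    } while (offset != 0);
    return all;
}

// Demo source: serves three names, two per page.
mock_page demo_fetch(std::size_t offset)
{
    static const std::vector<std::string> names{"queue-a", "queue-b", "queue-c"};
    mock_page page;
    page.next_offset = 0;
    for (std::size_t i = offset; i < names.size() && i < offset + 2; ++i)
    {
        page.results.push_back(names[i]);
    }
    if (offset + 2 < names.size())
    {
        page.next_offset = offset + 2;
    }
    return page;
}
```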

These greedy listing APIs do not exist in the SDKs for C#, Java, or the JavaScript Node.js environment. To avoid the potential issues of using these greedy APIs, we removed them in version 0.6.0 Preview.

If your code is calling these greedy APIs:

std::vector<azure::storage::table_entity> entities = table.execute_query(query);
for (auto it = entities.cbegin(); it != entities.cend(); ++it)
{
    process_entity(*it);
}

Then you should modify your code to use the segmented listing APIs:

azure::storage::continuation_token token;
do
{
    azure::storage::table_query_segment segment = table.execute_query_segmented(query, token);
    for (auto it = segment.results().cbegin(); it != segment.results().cend(); ++it)
    {
        process_entity(*it);
    }

    token = segment.continuation_token();
} while (!token.empty());

By specifying the max_results parameter of the segment, you can balance the number of requests against memory usage to meet the performance requirements of your application.
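The trade-off can be estimated with simple arithmetic. The sketch below is a rough model only, assuming every page except possibly the last is full; as noted earlier, Table storage can return partial or empty pages, so the request count it gives is a lower bound. The listing_cost type and estimate_cost function are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rough cost model for one full listing pass (an illustration, not a
// measurement): every page is assumed full except possibly the last.
struct listing_cost
{
    std::uint64_t requests;        // round trips to the service
    std::uint64_t peak_resident;   // most results held in memory at once
};

listing_cost estimate_cost(std::uint64_t total_objects, std::uint64_t max_results)
{
    assert(max_results > 0);
    listing_cost cost;
    // Ceiling division; even an empty listing costs one request.
    cost.requests = std::max<std::uint64_t>(
        1, (total_objects + max_results - 1) / max_results);
    cost.peak_resident = std::min(total_objects, max_results);
    return cost;
}
```

Under this model, listing one million objects takes 200 requests with pages of 5000 and 10000 requests with pages of 100, while the number of results held at once drops from 5000 to 100.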

Additionally, if you use the segmented listing APIs but then store all of the data in a local collection in a "greedy" style, you face the same memory growth at scale. We strongly recommend that you refactor such code so that it processes each segment as it arrives, or otherwise bounds the size of the local collection.
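One way to avoid a greedy local collection is to hand each result to a handler as its page arrives, so that only the current page is alive at any time. The sketch below simulates this against an in-memory vector; stream_listing is a hypothetical helper, and the slice it copies stands in for the results of a single segmented call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Streams a simulated listing page by page. `handle` sees each name once,
// and only the current page is alive at any time. Returns the size of the
// largest page that was held in memory.
template <typename Handler>
std::size_t stream_listing(const std::vector<std::string>& store,
                           std::size_t page_size, Handler handle)
{
    assert(page_size > 0);
    std::size_t peak_resident = 0;
    for (std::size_t offset = 0; offset < store.size(); offset += page_size)
    {
        const std::size_t end = std::min(offset + page_size, store.size());
        // One page: stands in for the results of a single segmented call.
        const std::vector<std::string> page(store.begin() + offset,
                                            store.begin() + end);
        peak_resident = std::max(peak_resident, page.size());
        for (const std::string& name : page)
        {
            handle(name);
        }
    }
    return peak_resident;
}
```

A handler that only updates a counter or a running aggregate keeps memory proportional to the page size, regardless of how many objects the listing covers.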

Lazy listing

Although greedy listing raises potential issues, it is convenient when there are not too many objects in the container.

If you also use the C# or Oracle Java SDKs, you are probably familiar with the Enumerable programming model, which offers lazy-style listing: the data at a given offset is fetched only when it is needed. In C++, the iterator-based template provides a similar approach.

A typical lazy listing API, using list_blobs as an example, looks like this:

list_blob_item_iterator list_blobs() const;

A typical code snippet that uses the lazy listing pattern might look like this:

// List blobs in the blob container
azure::storage::list_blob_item_iterator end_of_results;
for (auto it = container.list_blobs(); it != end_of_results; ++it)
{
    if (it->is_blob())
    {
        process_blob(it->as_blob());
    }
    else
    {
        process_directory(it->as_directory());
    }
}

Note that lazy listing is only available in synchronous mode.

Compared with greedy listing, lazy listing fetches data only when necessary. Under the covers, it fetches data from Azure Storage only when the iterator advances into the next segment. Memory usage therefore stays bounded, and the operation is fast.
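The on-demand fetching can be sketched with a small cursor class over a paged source. This is not the library's list_blob_item_iterator, only an illustration of the idea: lazy_cursor and its members are hypothetical, and an in-memory vector stands in for the remote service. The cursor keeps a reference to that vector, so the vector must outlive the cursor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical lazy cursor over a paged source.
class lazy_cursor
{
public:
    lazy_cursor(const std::vector<std::string>& store, std::size_t page_size)
        : store_(store), page_size_(page_size)
    {
        assert(page_size > 0);
        fetch();                      // load the first page up front
    }

    bool done() const
    {
        return position_ == page_.size() && next_offset_ >= store_.size();
    }

    const std::string& current() const { return page_[position_]; }

    void advance()
    {
        ++position_;
        if (position_ == page_.size() && next_offset_ < store_.size())
        {
            fetch();                  // only now is the next page requested
        }
    }

    std::size_t fetch_count() const { return fetch_count_; }

private:
    void fetch()
    {
        const std::size_t end = std::min(next_offset_ + page_size_, store_.size());
        page_.assign(store_.begin() + next_offset_, store_.begin() + end);
        next_offset_ = end;
        position_ = 0;
        ++fetch_count_;
    }

    const std::vector<std::string>& store_;   // stands in for the remote service
    std::size_t page_size_;
    std::vector<std::string> page_;           // the only results held in memory
    std::size_t position_ = 0;
    std::size_t next_offset_ = 0;
    std::size_t fetch_count_ = 0;
};
```

Iterating with `for (; !cursor.done(); cursor.advance())` visits every item while holding at most one page, and fetch_count() shows that a new request happens only when the cursor crosses a page boundary.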

Lazy listing APIs are included in version 2.2.0 of the Storage Client Library for C++.

Conclusion

In this article, we discussed different overloads of the listing APIs for various objects in the Storage Client Library for C++. To summarize:

  • Async APIs are strongly recommended in multithreaded scenarios.
  • Segmented listing is recommended for most scenarios.
  • Lazy listing is provided in the library as a convenient wrapper for synchronous scenarios.
  • Greedy listing is not recommended and has been removed from the library.

Next steps

For more information about Azure Storage and the Client Library for C++, see the following resources.