Working with large Azure resource data sets

Azure Resource Graph is designed for working with and getting information about resources in your Azure environment. Resource Graph makes getting this data fast, even when querying thousands of records, and it has several options for working with these large data sets.

For guidance on working with queries at a high frequency, see Guidance for throttled requests.

Data set result size

By default, Resource Graph limits any query to returning only 100 records. This control protects both the user and the service from unintentional queries that would result in large data sets. Such queries most often occur while a customer is experimenting with queries to find and filter resources in the way that suits their particular needs. This control is different from using the top or limit Azure Data Explorer language operators to limit the results.

Note

When using First, it's recommended to order the results by at least one column with asc or desc. Without sorting, the results returned are random and not repeatable.

The default limit can be overridden through all methods of interacting with Resource Graph. The following examples show how to change the data set size limit to 200:

az graph query -q "Resources | project name | order by name asc" --first 200 --output table
Search-AzGraph -Query "Resources | project name | order by name asc" -First 200

In the REST API, the control is $top and is part of QueryRequestOptions.

The most restrictive control wins. For example, if your query uses the top or limit operators and would result in more records than First, the maximum number of records returned is equal to First. Likewise, if top or limit is smaller than First, the record set returned is the smaller value configured by top or limit.
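The "most restrictive wins" rule amounts to taking a minimum. The following is a minimal sketch in plain Python that only models the rule and doesn't call Resource Graph. The function name expected_record_count and its parameters are hypothetical, and the sketch ignores the paging requirement for First values above 1000.

```python
from typing import Optional


def expected_record_count(matching: int, first: int = 100,
                          top_or_limit: Optional[int] = None) -> int:
    """Model how many records come back when First and top/limit both apply.

    matching: records that match the query before any limit
    first: the First control (100 by default)
    top_or_limit: value of a top or limit operator in the query, if any
    """
    caps = [matching, first]
    if top_or_limit is not None:
        caps.append(top_or_limit)
    # The most restrictive control wins, so the smallest cap decides.
    return min(caps)


print(expected_record_count(matching=5000, first=200, top_or_limit=500))  # 200
print(expected_record_count(matching=5000, first=200, top_or_limit=50))   # 50
```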

First currently has a maximum allowed value of 5000, which it achieves by paging results 1000 records at a time.

Important

When First is configured to be greater than 1000 records, the query must project the id field in order for pagination to work. If it's missing from the query, the response won't get paged and the results are limited to 1000 records.

Skipping records

The next option for working with large data sets is the Skip control. This control allows your query to skip the defined number of records before returning the results. Skip is useful for queries that sort results in a meaningful way where the intent is to get at records somewhere in the middle of the result set. If the results needed are at the end of the returned data set, it's more efficient to use a different sort configuration and retrieve the results from the top of the data set instead.

Note

When using Skip, it's recommended to order the results by at least one column with asc or desc. Without sorting, the results returned are random and not repeatable.

The following examples show how to skip the first 10 records a query would result in, instead starting the returned result set with the 11th record:

az graph query -q "Resources | project name | order by name asc" --skip 10 --output table
Search-AzGraph -Query "Resources | project name | order by name asc" -Skip 10

In the REST API, the control is $skip and is part of QueryRequestOptions.
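As a rough illustration, a REST request body that sets both controls might be assembled as follows. This sketch assumes the body carries subscriptions, query, and options fields, and the subscription ID is a placeholder; check the REST API reference for the exact schema before relying on it.

```python
import json

# Placeholder subscription ID; replace with a real one.
subscription_id = "11111111-1111-1111-1111-111111111111"

request_body = {
    "subscriptions": [subscription_id],
    "query": "Resources | project name | order by name asc",
    "options": {
        "$top": 200,   # same role as --first / -First
        "$skip": 10,   # same role as --skip / -Skip
    },
}

# Serialize the body the way it would be sent in the request.
print(json.dumps(request_body, indent=2))
```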

Paging results

When it's necessary to break a result set into smaller sets of records for processing, or because a result set would exceed the maximum allowed value of 1000 returned records, use paging. The REST API QueryResponse provides values to indicate that a result set has been broken up: resultTruncated and $skipToken. resultTruncated is a boolean value that informs the consumer if there are additional records not returned in the response. This condition can also be identified when the count property is less than the totalRecords property. totalRecords defines how many records match the query.

resultTruncated is true when paging is disabled or not possible because the query has no id column, or when fewer resources are available than the query requests. When resultTruncated is true, the $skipToken property isn't set.
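Both signals can be checked together on a parsed response. The helper below is a sketch, not part of any SDK: has_more_records is a hypothetical name, and the response is assumed to be a plain dictionary with the count, totalRecords, and resultTruncated properties described above.

```python
def has_more_records(response: dict) -> bool:
    """Return True when the response holds fewer records than match the query."""
    truncated = response.get("resultTruncated", False)
    # The sample responses later in this article show the flag as the
    # string "true", so accept both a real boolean and its string form.
    if isinstance(truncated, str):
        truncated = truncated.lower() == "true"
    return bool(truncated) or response["count"] < response["totalRecords"]


sample = {"totalRecords": 47, "count": 1, "resultTruncated": "true"}
print(has_more_records(sample))  # True
```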

The following examples show how to skip the first 3000 records and return the next 1000 records with Azure CLI and Azure PowerShell:

az graph query -q "Resources | project id, name | order by id asc" --first 1000 --skip 3000
Search-AzGraph -Query "Resources | project id, name | order by id asc" -First 1000 -Skip 3000

Important

The query must project the id field in order for pagination to work. If it's missing from the query, the response won't include the $skipToken.

For an example, see Next page query in the REST API docs.
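The general shape of a $skipToken loop can be sketched without a live service. In the Python example below, run_query is a hypothetical callable that stands in for a real request, and fake_run_query is an in-memory substitute used only to make the sketch runnable. It assumes that the token from one response is sent back with the next request and that a missing token means the last page was reached.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(run_query: Callable[[Optional[str]], dict]) -> Iterator[list]:
    """Yield each page of records until the response carries no $skipToken.

    run_query is a hypothetical callable: it takes the skip token from the
    previous response (None for the first page) and returns a response dict.
    """
    skip_token = None
    while True:
        response = run_query(skip_token)
        yield response["data"]
        skip_token = response.get("$skipToken")
        if not skip_token:
            break


def fake_run_query(skip_token: Optional[str]) -> dict:
    """In-memory stand-in for a real request; serves 5 records, 2 per page."""
    records = [{"id": f"/fake/resource/{n}"} for n in range(5)]
    start = int(skip_token) if skip_token else 0
    end = start + 2
    page = records[start:end]
    response = {"totalRecords": len(records), "count": len(page), "data": page}
    if end < len(records):
        # Real tokens are opaque strings; an offset is used here for simplicity.
        response["$skipToken"] = str(end)
    return response


pages = list(iterate_pages(fake_run_query))
print([len(page) for page in pages])  # [2, 2, 1]
```

Treating the token as opaque keeps the loop independent of how the service encodes its position in the result set.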

Formatting results

Results of a Resource Graph query are provided in two formats, Table and ObjectArray. The format is configured with the resultFormat parameter as part of the request options. The Table format is the default value for resultFormat.

Results from Azure CLI are provided in JSON by default. Results in Azure PowerShell are a PSCustomObject by default, but they can quickly be converted to JSON using the ConvertTo-Json cmdlet. For other SDKs, the query results can be configured to output the ObjectArray format.

Format - Table

The default format, Table, returns results in a JSON format designed to highlight the column design and row values of the properties returned by the query. This format closely resembles data as defined in a structured table or spreadsheet, with the columns identified first and then each row representing data aligned to those columns.

Here is a sample of a query result with the Table formatting:

{
    "totalRecords": 47,
    "count": 1,
    "data": {
        "columns": [{
                "name": "name",
                "type": "string"
            },
            {
                "name": "type",
                "type": "string"
            },
            {
                "name": "location",
                "type": "string"
            },
            {
                "name": "subscriptionId",
                "type": "string"
            }
        ],
        "rows": [
            [
                "veryscaryvm2-nsg",
                "microsoft.network/networksecuritygroups",
                "chinaeast",
                "11111111-1111-1111-1111-111111111111"
            ]
        ]
    },
    "facets": [],
    "resultTruncated": "true"
}
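Because every row lists its values in column order, Table data can be turned into key/value records by pairing each row with the column names. A minimal Python sketch, using the data portion of the sample above; table_to_records is a hypothetical helper name:

```python
def table_to_records(data: dict) -> list:
    """Pair each row's values with the column names, in column order."""
    names = [column["name"] for column in data["columns"]]
    return [dict(zip(names, row)) for row in data["rows"]]


table_data = {
    "columns": [
        {"name": "name", "type": "string"},
        {"name": "type", "type": "string"},
        {"name": "location", "type": "string"},
        {"name": "subscriptionId", "type": "string"},
    ],
    "rows": [[
        "veryscaryvm2-nsg",
        "microsoft.network/networksecuritygroups",
        "chinaeast",
        "11111111-1111-1111-1111-111111111111",
    ]],
}

records = table_to_records(table_data)
print(records[0]["location"])  # chinaeast
```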

Format - ObjectArray

The ObjectArray format also returns results in a JSON format. However, this design aligns to the key/value pair relationship common in JSON, where the column and the row data are matched in array groups.

Here is a sample of a query result with the ObjectArray formatting:

{
    "totalRecords": 47,
    "count": 1,
    "data": [{
        "name": "veryscaryvm2-nsg",
        "type": "microsoft.network/networksecuritygroups",
        "location": "chinaeast",
        "subscriptionId": "11111111-1111-1111-1111-111111111111"
    }],
    "facets": [],
    "resultTruncated": "true"
}

Here are some examples of setting resultFormat to use the ObjectArray format:

var requestOptions = new QueryRequestOptions(resultFormat: ResultFormat.ObjectArray);
var request = new QueryRequest(subscriptions, "Resources | limit 1", options: requestOptions);

request_options = QueryRequestOptions(
    result_format=ResultFormat.object_array
)
request = QueryRequest(query="Resources | limit 1", subscriptions=subs_list, options=request_options)
response = client.resources(request)

Next steps