How to use the execution profile step to evaluate your Gremlin queries

This article provides an overview of how to use the execution profile step for Azure Cosmos DB Gremlin API graph databases. This step provides information for troubleshooting and query optimization, and it is compatible with any Gremlin query that can be executed against a Cosmos DB Gremlin API account.

To use this step, append the executionProfile() function call at the end of your Gremlin query. Your Gremlin query will be executed, and the operation will return a JSON response object with the query execution profile.

For example:

    // Basic traversal
    g.V('mary').out()

    // Basic traversal with execution profile call
    g.V('mary').out().executionProfile()

After calling the executionProfile() step, the response will be a JSON object that includes the executed Gremlin step, the total time it took, and an array of the Cosmos DB runtime operators that the statement resulted in.
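When profiling is toggled from client code, the step can be appended programmatically. The following Python sketch is a minimal illustration; the helper name `with_execution_profile` is hypothetical and not part of any Cosmos DB or Gremlin library. It only builds the query string, which can then be submitted through whichever Gremlin client you already use.

```python
def with_execution_profile(query: str) -> str:
    """Return the Gremlin query with .executionProfile() appended exactly once.

    Hypothetical helper for illustration; it only manipulates the query text.
    """
    suffix = ".executionProfile()"
    trimmed = query.strip()
    if trimmed.endswith(suffix):
        # The query is already profiled, so leave it unchanged.
        return trimmed
    return trimmed + suffix


if __name__ == "__main__":
    print(with_execution_profile("g.V('mary').out()"))
```

Keep in mind that a profiled query is still executed, so the same caution applies as for the unprofiled query.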

Note

This implementation of the execution profile is not defined in the Apache TinkerPop specification. It is specific to the Azure Cosmos DB Gremlin API implementation.

Response example

The following is an annotated example of the output that will be returned:

Note

This example is annotated with comments that explain the general structure of the response. An actual executionProfile response won't contain any comments.

[
  {
    // The Gremlin statement that was executed.
    "gremlin": "g.V('mary').out().executionProfile()",

    // Amount of time in milliseconds that the entire operation took.
    "totalTime": 28,

    // An array containing metrics for each of the steps that were executed. 
    // Each Gremlin step will translate to one or more of these steps.
    // This list is sorted in order of execution.
    "metrics": [
      {
        // This operation obtains a set of Vertex objects.
        // The metrics include: time, percentTime of total execution time, resultCount, 
        // and, for each store operation, fanoutFactor, count, size (in bytes) and time.
        "name": "GetVertices",
        "time": 24,
        "annotations": {
          "percentTime": 85.71
        },
        "counts": {
          "resultCount": 2
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 2,
            "size": 696,
            "time": 0.4
          }
        ]
      },
      {
        // This operation obtains a set of Edge objects. 
        // Depending on the query, these might be directly adjacent to a set of vertices, 
        // or separate, in the case of an E() query.
        //
        // The metrics include: time, percentTime of total execution time, resultCount, 
        // and, for each store operation, fanoutFactor, count, size (in bytes) and time.
        "name": "GetEdges",
        "time": 4,
        "annotations": {
          "percentTime": 14.29
        },
        "counts": {
          "resultCount": 1
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 1,
            "size": 419,
            "time": 0.67
          }
        ]
      },
      {
        // This operation obtains the vertices that a set of edges point at.
        // The metrics include: time, percentTime of total execution time and resultCount.
        "name": "GetNeighborVertices",
        "time": 0,
        "annotations": {
          "percentTime": 0
        },
        "counts": {
          "resultCount": 1
        }
      },
      {
        // This operation represents the serialization and preparation for a result from 
        // the preceding graph operations. The metrics include: time, percentTime of total 
        // execution time and resultCount.
        "name": "ProjectOperator",
        "time": 0,
        "annotations": {
          "percentTime": 0
        },
        "counts": {
          "resultCount": 1
        }
      }
    ]
  }
]

Note

The executionProfile step will execute the Gremlin query. This includes the addV or addE steps, which will create the objects and commit the changes specified in the query. As a result, the Request Units generated by the Gremlin query will also be charged.

Execution profile response objects

The response of an executionProfile() function will yield a hierarchy of JSON objects with the following structure:

  • Gremlin operation object: Represents the entire Gremlin operation that was executed. Contains the following properties:

    • gremlin: The explicit Gremlin statement that was executed.
    • totalTime: The time, in milliseconds, that the execution of the step took.
    • metrics: An array that contains each of the Cosmos DB runtime operators that were executed to fulfill the query. This list is sorted in order of execution.
  • Cosmos DB runtime operators: Represent each of the components of the entire Gremlin operation. This list is sorted in order of execution. Each object contains the following properties:

    • name: Name of the operator. This is the type of step that was evaluated and executed. Read more in the table below.
    • time: Amount of time, in milliseconds, that a given operator took.
    • annotations: Contains additional information, specific to the operator that was executed.
    • annotations.percentTime: Percentage of the total time that it took to execute the specific operator.
    • counts: Number of objects that were returned from the storage layer by this operator. This is contained in the counts.resultCount scalar value within.
    • storeOps: Represents a storage operation that can span one or multiple partitions.
    • storeOps.fanoutFactor: Represents the number of partitions that this specific storage operation accessed.
    • storeOps.count: Represents the number of results that this storage operation returned.
    • storeOps.size: Represents the size in bytes of the result of a given storage operation.

| Cosmos DB Gremlin Runtime Operator | Description |
| --- | --- |
| GetVertices | This step obtains a predicated set of objects from the persistence layer. |
| GetEdges | This step obtains the edges that are adjacent to a set of vertices. This step can result in one or many storage operations. |
| GetNeighborVertices | This step obtains the vertices that are connected to a set of edges. The edges contain the partition keys and IDs of both their source and target vertices. |
| Coalesce | This step accounts for the evaluation of two operations whenever the coalesce() Gremlin step is executed. |
| CartesianProductOperator | This step computes a Cartesian product between two datasets. It is usually executed whenever the predicates to() or from() are used. |
| ConstantSourceOperator | This step computes an expression to produce a constant value as a result. |
| ProjectOperator | This step prepares and serializes a response using the result of preceding operations. |
| ProjectAggregation | This step prepares and serializes a response for an aggregate operation. |

Note

This list will continue to be updated as new operators are added.
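To make the response structure more concrete, the following Python sketch flattens a profile into one row per runtime operator. It assumes only the property names described above; the function name `summarize_profile` and the row keys are hypothetical, and the sample is an abridged copy of the response example with the comments removed.

```python
import json

# Abridged sample: two operators from the annotated response example.
SAMPLE_RESPONSE = """
[
  {
    "gremlin": "g.V('mary').out().executionProfile()",
    "totalTime": 28,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 24,
        "annotations": {"percentTime": 85.71},
        "counts": {"resultCount": 2},
        "storeOps": [{"fanoutFactor": 1, "count": 2, "size": 696, "time": 0.4}]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": {"percentTime": 0},
        "counts": {"resultCount": 1}
      }
    ]
  }
]
"""


def summarize_profile(response_json: str) -> list:
    """Return one summary dict per runtime operator, in execution order."""
    rows = []
    for operation in json.loads(response_json):
        for metric in operation.get("metrics", []):
            # storeOps is absent for operators that do not touch storage.
            store_ops = metric.get("storeOps", [])
            rows.append({
                "name": metric["name"],
                "time_ms": metric["time"],
                "percent_time": metric.get("annotations", {}).get("percentTime"),
                "result_count": metric.get("counts", {}).get("resultCount"),
                "max_fanout": max((op["fanoutFactor"] for op in store_ops), default=None),
                "store_bytes": sum(op["size"] for op in store_ops),
            })
    return rows


if __name__ == "__main__":
    for row in summarize_profile(SAMPLE_RESPONSE):
        print(row)
```

Operators without a storeOps array are reported with `max_fanout` set to `None` and `store_bytes` set to 0, which keeps them distinguishable from operators that did access storage.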

Examples on how to analyze an execution profile response

The following are examples of common optimization opportunities that can be spotted using the execution profile response:

  • Blind fan-out query.
  • Unfiltered query.

Blind fan-out query patterns

Assume the following execution profile response from a partitioned graph:

[
  {
    "gremlin": "g.V('tt0093640').executionProfile()",
    "totalTime": 46,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 46,
        "annotations": {
          "percentTime": 100
        },
        "counts": {
          "resultCount": 1
        },
        "storeOps": [
          {
            "fanoutFactor": 5,
            "count": 1,
            "size": 589,
            "time": 75.61
          }
        ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": {
          "percentTime": 0
        },
        "counts": {
          "resultCount": 1
        }
      }
    ]
  }
]

The following conclusions can be drawn from it:

  • The query is a single ID lookup, since the Gremlin statement follows the pattern g.V('id').
  • Judging from the time metric, the latency of this query seems to be high, since it's [more than 10ms for a single point-read operation](/cosmos-db/introduction#guaranteed-low-latency-at-99th-percentile-around China).
  • If we look into the storeOps object, we can see that the fanoutFactor is 5, which means that 5 partitions were accessed by this operation.

From this analysis, we can determine that this query accesses more partitions than necessary. This can be addressed by specifying the partition key in the query as a predicate, which leads to lower latency and lower cost per query. Learn more about graph partitioning. A more optimal query would be g.V('tt0093640').has('partitionKey', 't1001').
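The same check can be automated. The following Python sketch scans a parsed profile for store operations that touched more than a given number of partitions. The function name `find_fanout_operators` and the default threshold of one partition are assumptions made for illustration; a fan-out above one can be legitimate for queries that are meant to span partitions.

```python
def find_fanout_operators(profile: list, max_partitions: int = 1) -> list:
    """Return (operator name, fanoutFactor) pairs that exceed max_partitions."""
    findings = []
    for operation in profile:
        for metric in operation.get("metrics", []):
            for store_op in metric.get("storeOps", []):
                if store_op.get("fanoutFactor", 0) > max_partitions:
                    findings.append((metric["name"], store_op["fanoutFactor"]))
    return findings


# Parsed form of the blind fan-out response shown above.
blind_fanout_profile = [
    {
        "gremlin": "g.V('tt0093640').executionProfile()",
        "totalTime": 46,
        "metrics": [
            {
                "name": "GetVertices",
                "time": 46,
                "annotations": {"percentTime": 100},
                "counts": {"resultCount": 1},
                "storeOps": [
                    {"fanoutFactor": 5, "count": 1, "size": 589, "time": 75.61}
                ],
            },
            {
                "name": "ProjectOperator",
                "time": 0,
                "annotations": {"percentTime": 0},
                "counts": {"resultCount": 1},
            },
        ],
    }
]

if __name__ == "__main__":
    for name, fanout in find_fanout_operators(blind_fanout_profile):
        print(f"{name} accessed {fanout} partitions")
```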

Unfiltered query patterns

Compare the following two execution profile responses. For simplicity, these examples use a graph with a single partition.

This first query retrieves all vertices with the label tweet and then obtains their neighboring vertices:

[
  {
    "gremlin": "g.V().hasLabel('tweet').out().executionProfile()",
    "totalTime": 42,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 31,
        "annotations": {
          "percentTime": 73.81
        },
        "counts": {
          "resultCount": 30
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 13,
            "size": 6819,
            "time": 1.02
          }
        ]
      },
      {
        "name": "GetEdges",
        "time": 6,
        "annotations": {
          "percentTime": 14.29
        },
        "counts": {
          "resultCount": 18
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 20,
            "size": 7950,
            "time": 1.98
          }
        ]
      },
      {
        "name": "GetNeighborVertices",
        "time": 5,
        "annotations": {
          "percentTime": 11.9
        },
        "counts": {
          "resultCount": 20
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 4,
            "size": 1070,
            "time": 1.19
          }
        ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": {
          "percentTime": 0
        },
        "counts": {
          "resultCount": 20
        }
      }
    ]
  }
]

Notice the profile of the same query, but now with an additional filter, has('lang', 'en'), applied before exploring the adjacent vertices:

[
  {
    "gremlin": "g.V().hasLabel('tweet').has('lang', 'en').out().executionProfile()",
    "totalTime": 14,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 14,
        "annotations": {
          "percentTime": 58.33
        },
        "counts": {
          "resultCount": 11
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 11,
            "size": 4807,
            "time": 1.27
          }
        ]
      },
      {
        "name": "GetEdges",
        "time": 5,
        "annotations": {
          "percentTime": 20.83
        },
        "counts": {
          "resultCount": 18
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 18,
            "size": 7159,
            "time": 1.7
          }
        ]
      },
      {
        "name": "GetNeighborVertices",
        "time": 5,
        "annotations": {
          "percentTime": 20.83
        },
        "counts": {
          "resultCount": 18
        },
        "storeOps": [
          {
            "fanoutFactor": 1,
            "count": 4,
            "size": 1070,
            "time": 1.01
          }
        ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": {
          "percentTime": 0
        },
        "counts": {
          "resultCount": 18
        }
      }
    ]
  }
]

These two queries reached the same result; however, the first one will require more Request Units, since it needed to iterate a larger initial dataset before querying the adjacent items. We can see indicators of this behavior when comparing the following parameters from both responses:

  • The metrics[0].time value is higher in the first response, which indicates that this single step took longer to resolve.
  • The metrics[0].counts.resultCount value is higher as well in the first response, which indicates that the initial working dataset was larger.
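The comparison of the two indicators above can be expressed as a short Python sketch. The function names are hypothetical, and the profiles are abridged to the first operator of each response shown earlier.

```python
def first_operator_stats(profile: list) -> dict:
    """Extract time and resultCount of the first runtime operator."""
    first = profile[0]["metrics"][0]
    return {
        "name": first["name"],
        "time": first["time"],
        "resultCount": first["counts"]["resultCount"],
    }


def compare_first_operator(unfiltered: list, filtered: list) -> dict:
    """Return how much larger the unfiltered query's first step was."""
    a = first_operator_stats(unfiltered)
    b = first_operator_stats(filtered)
    return {
        "time_delta_ms": a["time"] - b["time"],
        "result_count_delta": a["resultCount"] - b["resultCount"],
    }


# Abridged profiles: only metrics[0] of each response is kept.
unfiltered_profile = [{
    "gremlin": "g.V().hasLabel('tweet').out().executionProfile()",
    "metrics": [{"name": "GetVertices", "time": 31, "counts": {"resultCount": 30}}],
}]
filtered_profile = [{
    "gremlin": "g.V().hasLabel('tweet').has('lang', 'en').out().executionProfile()",
    "metrics": [{"name": "GetVertices", "time": 14, "counts": {"resultCount": 11}}],
}]

if __name__ == "__main__":
    print(compare_first_operator(unfiltered_profile, filtered_profile))
```

Positive deltas indicate that the unfiltered query spent more time in its first step and worked on a larger initial dataset.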

Next steps