Query graph data in Azure Cosmos DB for Apache Gremlin

2025-09-12

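Azure Cosmos DB for Apache Gremlin supports the Gremlin TinkerPop syntax for queries. This guide walks through common queries that you can run against the service. You can run the queries in this guide from the Gremlin console or your preferred Gremlin driver.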

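Prerequisites

An Azure subscription
- If you don't have an Azure subscription, create a trial account before you begin.

An Azure Cosmos DB for Apache Gremlin account

Access to sample data for testing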

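Count vertices in the graph

Count the total number of product vertices in the graph. This operation is useful for understanding the size of the product catalog or for validating a data load.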

g.V().hasLabel('product').count()

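Count vertices with a specific label in the graph

Count the total number of vertices in the graph that carry a specific label. In this example, the label is product.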

g.V().hasLabel('product').count()

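Filter products by label and property

Retrieve products that match a specific label and property value. This query narrows the results to a subset of interest, such as products with a price greater than $800.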

g.V().hasLabel('product').has('price', gt(800))

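Project specific properties from products

Return only selected properties from the matching products. This query reduces the amount of data returned and focuses on the relevant fields, such as the product name.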

g.V().hasLabel('product').values('name')

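Find related products by traversing the graph. For example, find all products that a specific product replaces by following its outgoing 'replaces' edges and then moving to the connected product vertices.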

g.V(['gear-surf-surfboards', 'bbbbbbbb-1111-2222-3333-cccccccccccc']).outE('replaces').inV().hasLabel('product')

Use this query to find products that are two hops away in the replacement chain:

g.V(['gear-surf-surfboards', 'bbbbbbbb-1111-2222-3333-cccccccccccc']).outE('replaces').inV().hasLabel('product').outE('replaces').inV().hasLabel('product')
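To make the difference between the one-hop and two-hop queries concrete, the following is a minimal in-memory sketch in Python. It is not how the service evaluates a traversal; the product IDs and the REPLACES mapping are hypothetical and exist only to show which vertices each hop reaches.

```python
# Hypothetical 'replaces' edges: key = product, value = products it replaces.
REPLACES = {
    "board-v3": ["board-v2"],
    "board-v2": ["board-v1"],
    "board-v1": [],
}


def out(edges, start_ids):
    """Follow outgoing edges once from every start vertex (like one outE().inV() pair)."""
    reached = []
    for vertex_id in start_ids:
        reached.extend(edges.get(vertex_id, []))
    return reached


one_hop = out(REPLACES, ["board-v3"])  # products directly replaced
two_hop = out(REPLACES, one_hop)       # products replaced by those products

print(one_hop)  # ['board-v2']
print(two_hop)  # ['board-v1']
```

Each extra outE('replaces').inV() pair in the Gremlin query corresponds to one more call to out() in this sketch.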

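Analyze query execution with the execution profile

Use the executionProfile() step to analyze the performance and execution details of a Gremlin query. This step returns a JSON object that contains metrics for each step in the query, which helps with troubleshooting and optimization.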

g.V(['gear-surf-surfboards', 'bbbbbbbb-1111-2222-3333-cccccccccccc']).out().executionProfile()

[
  {
    "gremlin": "g.V('mary').out().executionProfile()",
    "totalTime": 28,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 24,
        "annotations": { "percentTime": 85.71 },
        "counts": { "resultCount": 2 },
        "storeOps": [ { "fanoutFactor": 1, "count": 2, "size": 696, "time": 0.4 } ]
      },
      {
        "name": "GetEdges",
        "time": 4,
        "annotations": { "percentTime": 14.29 },
        "counts": { "resultCount": 1 },
        "storeOps": [ { "fanoutFactor": 1, "count": 1, "size": 419, "time": 0.67 } ]
      },
      {
        "name": "GetNeighborVertices",
        "time": 0,
        "annotations": { "percentTime": 0 },
        "counts": { "resultCount": 1 }
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": { "percentTime": 0 },
        "counts": { "resultCount": 1 }
      }
    ]
  }
]

For more information about the executionProfile() step, see the execution profile reference.
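Once a profile has been saved as JSON, it can be summarized with a few lines of code. The following Python sketch assumes the response shape shown in the sample output above (an array whose first element has a metrics list). The helper names summarize and slowest_step are illustrative and are not part of any SDK.

```python
import json

# Metrics copied from the sample output above, trimmed to the fields used here.
PROFILE_JSON = """
[
  {
    "totalTime": 28,
    "metrics": [
      {"name": "GetVertices", "time": 24,
       "annotations": {"percentTime": 85.71}, "counts": {"resultCount": 2}},
      {"name": "GetEdges", "time": 4,
       "annotations": {"percentTime": 14.29}, "counts": {"resultCount": 1}},
      {"name": "GetNeighborVertices", "time": 0,
       "annotations": {"percentTime": 0}, "counts": {"resultCount": 1}},
      {"name": "ProjectOperator", "time": 0,
       "annotations": {"percentTime": 0}, "counts": {"resultCount": 1}}
    ]
  }
]
"""


def summarize(profile):
    """Return one (name, time, percentTime, resultCount) tuple per metric."""
    return [
        (m["name"], m["time"], m["annotations"]["percentTime"], m["counts"]["resultCount"])
        for m in profile["metrics"]
    ]


def slowest_step(profile):
    """Return the (name, time) pair of the metric with the largest time."""
    top = max(profile["metrics"], key=lambda m: m["time"])
    return top["name"], top["time"]


# The response is a JSON array; this sketch reads its first element.
profile = json.loads(PROFILE_JSON)[0]
for name, step_time, pct, results in summarize(profile):
    print(f"{name:<20} time={step_time:<4} percentTime={pct:<6} resultCount={results}")
print("Slowest step:", slowest_step(profile))
```

For the sample above, the slowest step is GetVertices, which matches its 85.71 percentTime annotation.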

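Tip

The executionProfile() step executes the Gremlin query. This includes any addV or addE steps, so the changes specified in the query are created and committed. The request units consumed by the Gremlin query are also charged.

Blind fan-out occurs when a query accesses more partitions than necessary, usually because a partition key predicate is missing. This antipattern can increase latency and cost. The execution profile helps identify it by showing a high fanoutFactor value.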

g.V(['gear-surf-surfboards', 'aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb']).executionProfile()

[
  {
    "gremlin": "g.V('tt0093640').executionProfile()",
    "totalTime": 46,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 46,
        "annotations": { "percentTime": 100 },
        "counts": { "resultCount": 1 },
        "storeOps": [ { "fanoutFactor": 5, "count": 1, "size": 589, "time": 75.61 } ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": { "percentTime": 0 },
        "counts": { "resultCount": 1 }
      }
    ]
  }
]

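Optimize fan-out queries

A high fanoutFactor (such as 5) indicates that the query accessed multiple partitions. To optimize, include the partition key in the query predicate: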

g.V(['gear-surf-surfboards', 'aaaaaaaa-0000-1111-2222-bbbbbbbbbbbb'])
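Checking profiles for fan-out by eye gets tedious across many queries. The following Python sketch scans a parsed profile for storeOps entries whose fanoutFactor exceeds a threshold. The find_fanout helper and its default threshold of 1 are assumptions made for illustration, not a documented rule. The sample data is taken from the fan-out profile shown earlier in this section.

```python
def find_fanout(profile, threshold=1):
    """Return (metric name, fanoutFactor) pairs whose fanoutFactor exceeds threshold."""
    hits = []
    for metric in profile.get("metrics", []):
        # Some metrics (for example ProjectOperator in the samples) have no storeOps.
        for op in metric.get("storeOps", []):
            if op.get("fanoutFactor", 0) > threshold:
                hits.append((metric["name"], op["fanoutFactor"]))
    return hits


# Fan-out profile from this section, trimmed to the fields used here.
fanout_profile = {
    "totalTime": 46,
    "metrics": [
        {
            "name": "GetVertices",
            "time": 46,
            "storeOps": [{"fanoutFactor": 5, "count": 1, "size": 589, "time": 75.61}],
        },
        {"name": "ProjectOperator", "time": 0},
    ],
}

print(find_fanout(fanout_profile))  # [('GetVertices', 5)]
```

An empty list means that no storage operation in the profile went beyond the chosen threshold.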

Unfiltered query pattern

Unfiltered queries can process a large initial dataset, which increases cost and latency.

g.V().hasLabel('product').out().executionProfile()

[
  {
    "gremlin": "g.V().hasLabel('tweet').out().executionProfile()",
    "totalTime": 42,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 31,
        "annotations": { "percentTime": 73.81 },
        "counts": { "resultCount": 30 },
        "storeOps": [ { "fanoutFactor": 1, "count": 13, "size": 6819, "time": 1.02 } ]
      },
      {
        "name": "GetEdges",
        "time": 6,
        "annotations": { "percentTime": 14.29 },
        "counts": { "resultCount": 18 },
        "storeOps": [ { "fanoutFactor": 1, "count": 20, "size": 7950, "time": 1.98 } ]
      },
      {
        "name": "GetNeighborVertices",
        "time": 5,
        "annotations": { "percentTime": 11.9 },
        "counts": { "resultCount": 20 },
        "storeOps": [ { "fanoutFactor": 1, "count": 4, "size": 1070, "time": 1.19 } ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": { "percentTime": 0 },
        "counts": { "resultCount": 20 }
      }
    ]
  }
]

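Filtered query pattern

Adding a filter before the traversal can reduce the working set and improve performance. The execution profile shows the effect of the filter: the filtered query processes fewer vertices, which lowers latency and cost.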

g.V().hasLabel('product').has('clearance', true).out().executionProfile()

[
  {
    "gremlin": "g.V().hasLabel('tweet').has('lang', 'en').out().executionProfile()",
    "totalTime": 14,
    "metrics": [
      {
        "name": "GetVertices",
        "time": 14,
        "annotations": { "percentTime": 58.33 },
        "counts": { "resultCount": 11 },
        "storeOps": [ { "fanoutFactor": 1, "count": 11, "size": 4807, "time": 1.27 } ]
      },
      {
        "name": "GetEdges",
        "time": 5,
        "annotations": { "percentTime": 20.83 },
        "counts": { "resultCount": 18 },
        "storeOps": [ { "fanoutFactor": 1, "count": 18, "size": 7159, "time": 1.7 } ]
      },
      {
        "name": "GetNeighborVertices",
        "time": 5,
        "annotations": { "percentTime": 20.83 },
        "counts": { "resultCount": 18 },
        "storeOps": [ { "fanoutFactor": 1, "count": 4, "size": 1070, "time": 1.01 } ]
      },
      {
        "name": "ProjectOperator",
        "time": 0,
        "annotations": { "percentTime": 0 },
        "counts": { "resultCount": 18 }
      }
    ]
  }
]
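The two profiles can also be compared numerically. The following Python sketch totals the storeOps count and size values for the unfiltered and filtered sample outputs in this section. The store_totals helper is hypothetical, and the inputs are trimmed copies of the two profiles that keep only totalTime and storeOps.

```python
def store_totals(profile):
    """Sum storeOps 'count' and 'size' across all metrics of one profile."""
    count = 0
    size = 0
    for metric in profile["metrics"]:
        for op in metric.get("storeOps", []):  # metrics without storeOps add nothing
            count += op["count"]
            size += op["size"]
    return {"totalTime": profile["totalTime"], "storeOpCount": count, "storeOpSize": size}


# Values copied from the unfiltered sample output.
unfiltered = {
    "totalTime": 42,
    "metrics": [
        {"name": "GetVertices", "storeOps": [{"count": 13, "size": 6819}]},
        {"name": "GetEdges", "storeOps": [{"count": 20, "size": 7950}]},
        {"name": "GetNeighborVertices", "storeOps": [{"count": 4, "size": 1070}]},
        {"name": "ProjectOperator"},
    ],
}

# Values copied from the filtered sample output.
filtered = {
    "totalTime": 14,
    "metrics": [
        {"name": "GetVertices", "storeOps": [{"count": 11, "size": 4807}]},
        {"name": "GetEdges", "storeOps": [{"count": 18, "size": 7159}]},
        {"name": "GetNeighborVertices", "storeOps": [{"count": 4, "size": 1070}]},
        {"name": "ProjectOperator"},
    ],
}

print(store_totals(unfiltered))
print(store_totals(filtered))
```

With these samples, the filtered query reports lower totals for both count and size, which is consistent with its smaller working set.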
