Azure Cosmos DB Gremlin graph support and compatibility with TinkerPop features

Applies to: Gremlin API

Azure Cosmos DB supports Apache TinkerPop's graph traversal language, known as Gremlin. You can use the Gremlin language to create graph entities (vertices and edges), modify properties within those entities, perform queries and traversals, and delete entities.
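
As a rough illustration of those four kinds of operation, the Python sketch below collects one Gremlin query string for each. The label, IDs, property names, and the `pk` partition key property are hypothetical and not taken from this article; the strings are only printed here, and in practice they would be sent to the service through a Gremlin driver.

```python
# Hypothetical Gremlin queries, kept as plain text strings.
# The label, ids, property names and the 'pk' partition key are assumptions.
QUERIES = {
    "create_vertex": (
        "g.addV('person').property('id', 'thomas')"
        ".property('firstName', 'Thomas').property('pk', 'pk1')"
    ),
    "create_edge": "g.V('thomas').addE('knows').to(g.V('mary'))",
    "update_property": "g.V('thomas').property('age', 45)",
    "query": "g.V().hasLabel('person').has('firstName', 'Thomas')",
    "delete": "g.V('thomas').drop()",
}

for name, query in QUERIES.items():
    print(f"{name}: {query}")
```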

The Azure Cosmos DB Graph engine closely follows the Apache TinkerPop traversal steps specification, but there are implementation differences that are specific to Azure Cosmos DB. This article provides a quick walkthrough of Gremlin and enumerates the Gremlin features that the Gremlin API supports.

Compatible client libraries

The following table shows popular Gremlin drivers that you can use against Azure Cosmos DB:

| Download | Source | Getting Started | Supported connector version |
| --- | --- | --- | --- |
| .NET | Gremlin.NET on GitHub | Create Graph using .NET | 3.4.6 |
| Java | Gremlin JavaDoc | Create Graph using Java | 3.2.0+ |
| Node.js | Gremlin-JavaScript on GitHub | Create Graph using Node.js | 3.3.4+ |
| Python | Gremlin-Python on GitHub | Create Graph using Python | 3.2.7 |
| PHP | Gremlin-PHP on GitHub | Create Graph using PHP | 3.1.0 |
| Go Lang | Go Lang | | This library is built by external contributors. The Azure Cosmos DB team doesn't offer any support or maintain the library. |
| Gremlin console | TinkerPop docs | Create Graph using Gremlin Console | 3.2.0+ |

Supported Graph Objects

TinkerPop is a standard that covers a wide range of graph technologies. Therefore, it has standard terminology to describe what features are provided by a graph provider. Azure Cosmos DB provides a persistent, high-concurrency, writable graph database that can be partitioned across multiple servers or clusters.

The following table lists the TinkerPop features that are implemented by Azure Cosmos DB:

| Category | Azure Cosmos DB implementation | Notes |
| --- | --- | --- |
| Graph features | Provides Persistence and ConcurrentAccess. Designed to support Transactions | Computer methods can be implemented via the Spark connector. |
| Variable features | Supports Boolean, Integer, Byte, Double, Float, Long, String | Supports primitive types, is compatible with complex types via data model |
| Vertex features | Supports RemoveVertices, MetaProperties, AddVertices, MultiProperties, StringIds, UserSuppliedIds, AddProperty, RemoveProperty | Supports creating, modifying, and deleting vertices |
| Vertex property features | StringIds, UserSuppliedIds, AddProperty, RemoveProperty, BooleanValues, ByteValues, DoubleValues, FloatValues, IntegerValues, LongValues, StringValues | Supports creating, modifying, and deleting vertex properties |
| Edge features | AddEdges, RemoveEdges, StringIds, UserSuppliedIds, AddProperty, RemoveProperty | Supports creating, modifying, and deleting edges |
| Edge property features | Properties, BooleanValues, ByteValues, DoubleValues, FloatValues, IntegerValues, LongValues, StringValues | Supports creating, modifying, and deleting edge properties |

Gremlin wire format

Azure Cosmos DB uses the JSON format when returning results from Gremlin operations. Azure Cosmos DB currently supports the JSON format. For example, the following snippet shows a JSON representation of a vertex returned to the client from Azure Cosmos DB:

  {
    "id": "a7111ba7-0ea1-43c9-b6b2-efc5e3aea4c0",
    "label": "person",
    "type": "vertex",
    "outE": {
      "knows": [
        {
          "id": "3ee53a60-c561-4c5e-9a9f-9c7924bc9aef",
          "inV": "04779300-1c8e-489d-9493-50fd1325a658"
        },
        {
          "id": "21984248-ee9e-43a8-a7f6-30642bc14609",
          "inV": "a8e3e741-2ef7-4c01-b7c8-199f8e43e3bc"
        }
      ]
    },
    "properties": {
      "firstName": [
        {
          "value": "Thomas"
        }
      ],
      "lastName": [
        {
          "value": "Andersen"
        }
      ],
      "age": [
        {
          "value": 45
        }
      ]
    }
  }

The properties used by the JSON format for vertices are described below:

| Property | Description |
| --- | --- |
| `id` | The ID for the vertex. Must be unique (in combination with the value of `_partition` if applicable). If no value is provided, it will be automatically supplied with a GUID |
| `label` | The label of the vertex. This property is used to describe the entity type. |
| `type` | Used to distinguish vertices from non-graph documents |
| `properties` | Bag of user-defined properties associated with the vertex. Each property can have multiple values. |
| `_partition` | The partition key of the vertex. Used for graph partitioning. |
| `outE` | This property contains a list of out edges from a vertex. Storing the adjacency information with the vertex allows for fast execution of traversals. Edges are grouped based on their labels. |

Each edge contains the following information to help with navigation to other parts of the graph.

| Property | Description |
| --- | --- |
| `id` | The ID for the edge. Must be unique (in combination with the value of `_partition` if applicable) |
| `label` | The label of the edge. This property is optional, and used to describe the relationship type. |
| `inV` | This property contains a list of in vertices for an edge. Storing the adjacency information with the edge allows for fast execution of traversals. Vertices are grouped based on their labels. |
| `properties` | Bag of user-defined properties associated with the edge. Each property can have multiple values. |

Each property can store multiple values within an array.

| Property | Description |
| --- | --- |
| `value` | The value of the property |
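
To make the layout described above concrete, here is a minimal Python sketch that reads a trimmed copy of the sample vertex and pulls out its property values and outgoing edge targets. The helper names `property_values` and `out_neighbors` are hypothetical and exist only for this illustration; only the standard `json` module is used.

```python
import json

# A trimmed copy of the sample vertex shown earlier in this section.
VERTEX_JSON = """
{
  "id": "a7111ba7-0ea1-43c9-b6b2-efc5e3aea4c0",
  "label": "person",
  "type": "vertex",
  "outE": {
    "knows": [
      {"id": "3ee53a60-c561-4c5e-9a9f-9c7924bc9aef",
       "inV": "04779300-1c8e-489d-9493-50fd1325a658"}
    ]
  },
  "properties": {
    "firstName": [{"value": "Thomas"}],
    "age": [{"value": 45}]
  }
}
"""


def property_values(vertex):
    """Map each property name to the list of its stored values."""
    return {
        name: [entry["value"] for entry in entries]
        for name, entries in vertex.get("properties", {}).items()
    }


def out_neighbors(vertex, label):
    """Return the ids of vertices reached by outgoing edges with this label."""
    return [edge["inV"] for edge in vertex.get("outE", {}).get(label, [])]


vertex = json.loads(VERTEX_JSON)
print(property_values(vertex))
print(out_neighbors(vertex, "knows"))
```

Because every property is stored as an array of `value` entries, the sketch returns a list per property name even when only one value is present.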

Gremlin steps

Now let's look at the Gremlin steps supported by Azure Cosmos DB. For a complete reference on Gremlin, see the TinkerPop reference.

| Step | Description | TinkerPop 3.2 Documentation |
| --- | --- | --- |
| `addE` | Adds an edge between two vertices | addE step |
| `addV` | Adds a vertex to the graph | addV step |
| `and` | Ensures that all the traversals return a value | and step |
| `as` | A step modulator to assign a variable to the output of a step | as step |
| `by` | A step modulator used with `group` and `order` | by step |
| `coalesce` | Returns the first traversal that returns a result | coalesce step |
| `constant` | Returns a constant value. Used with `coalesce` | constant step |
| `count` | Returns the count from the traversal | count step |
| `dedup` | Returns the values with the duplicates removed | dedup step |
| `drop` | Drops the values (vertex/edge) | drop step |
| `executionProfile` | Creates a description of all operations generated by the executed Gremlin step | executionProfile step |
| `fold` | Acts as a barrier that computes the aggregate of results | fold step |
| `group` | Groups the values based on the labels specified | group step |
| `has` | Used to filter properties, vertices, and edges. Supports `hasLabel`, `hasId`, `hasNot`, and `has` variants. | has step |
| `inject` | Injects values into a stream | inject step |
| `is` | Used to perform a filter using a boolean expression | is step |
| `limit` | Used to limit the number of items in the traversal | limit step |
| `local` | Local wraps a section of a traversal, similar to a subquery | local step |
| `not` | Used to produce the negation of a filter | not step |
| `optional` | Returns the result of the specified traversal if it yields a result; otherwise it returns the calling element | optional step |
| `or` | Ensures at least one of the traversals returns a value | or step |
| `order` | Returns results in the specified sort order | order step |
| `path` | Returns the full path of the traversal | path step |
| `project` | Projects the properties as a Map | project step |
| `properties` | Returns the properties for the specified labels | properties step |
| `range` | Filters to the specified range of values | range step |
| `repeat` | Repeats the step for the specified number of times. Used for looping | repeat step |
| `sample` | Used to sample results from the traversal | sample step |
| `select` | Used to project results from the traversal | select step |
| `store` | Used for non-blocking aggregates from the traversal | store step |
| `TextP.startingWith(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property with the beginning of a given string | TextP predicates |
| `TextP.endingWith(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property with the ending of a given string | TextP predicates |
| `TextP.containing(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property with the contents of a given string | TextP predicates |
| `TextP.notStartingWith(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property that doesn't start with a given string | TextP predicates |
| `TextP.notEndingWith(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property that doesn't end with a given string | TextP predicates |
| `TextP.notContaining(string)` | String filtering function. This function is used as a predicate for the `has()` step to match a property that doesn't contain a given string | TextP predicates |
| `tree` | Aggregates paths from a vertex into a tree | tree step |
| `unfold` | Unrolls an iterator as a step | unfold step |
| `union` | Merges results from multiple traversals | union step |
| `V` | Includes the steps necessary for traversals between vertices and edges: `V`, `E`, `out`, `in`, `both`, `outE`, `inE`, `bothE`, `outV`, `inV`, `bothV`, and `otherV` | vertex steps |
| `where` | Used to filter results from the traversal. Supports `eq`, `neq`, `lt`, `lte`, `gt`, `gte`, and `between` operators | where step |
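
The six `TextP` predicates listed above are easier to keep apart with a small local model. The Python sketch below mirrors their meaning with ordinary string operations; the `TEXT_PREDICATES` table and the `matches` helper are hypothetical names for this illustration and are not part of any driver or of the service.

```python
# Local model of the six TextP predicates using Python string operations.
# This only mirrors their meaning; it is not a driver or service API.
TEXT_PREDICATES = {
    "startingWith": lambda value, s: value.startswith(s),
    "endingWith": lambda value, s: value.endswith(s),
    "containing": lambda value, s: s in value,
    "notStartingWith": lambda value, s: not value.startswith(s),
    "notEndingWith": lambda value, s: not value.endswith(s),
    "notContaining": lambda value, s: s not in value,
}


def matches(predicate, property_value, argument):
    """Evaluate one TextP-style predicate against a string property value."""
    return TEXT_PREDICATES[predicate](property_value, argument)


# Roughly what has('lastName', TextP.startingWith('And')) tests per vertex.
print(matches("startingWith", "Andersen", "And"))   # True
print(matches("notContaining", "Andersen", "xyz"))  # True
```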

The write-optimized engine provided by Azure Cosmos DB supports automatic indexing of all properties within vertices and edges by default. Therefore, queries with filters, range queries, sorting, or aggregates on any property are processed from the index, and served efficiently. For more information on how indexing works in Azure Cosmos DB, see our paper on schema-agnostic indexing.

Behavior differences

  • The Azure Cosmos DB Graph engine runs breadth-first traversal while TinkerPop Gremlin is depth-first. This behavior achieves better performance in a horizontally scalable system like Cosmos DB.
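
The practical effect of this difference is the order in which vertices are visited. The Python sketch below shows both orders on a hypothetical five-vertex graph; it illustrates the two strategies in general and makes no claim about how either engine is implemented internally.

```python
from collections import deque

# Hypothetical toy graph: vertex -> outgoing neighbors, in insertion order.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def breadth_first(graph, start):
    """Visit vertices level by level."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def depth_first(graph, start):
    """Follow each branch to its end before backtracking."""
    order, seen, stack = [], set(), [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        order.append(vertex)
        # Push neighbors in reverse so the first neighbor is explored first.
        stack.extend(reversed(graph[vertex]))
    return order


print(breadth_first(GRAPH, "a"))  # ['a', 'b', 'c', 'd', 'e']
print(depth_first(GRAPH, "a"))    # ['a', 'b', 'd', 'c', 'e']
```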

Unsupported features

  • Gremlin Bytecode is a programming language agnostic specification for graph traversals. Cosmos DB Graph doesn't support it yet. Use `GremlinClient.SubmitAsync()` and pass the traversal as a text string.

  • `property(set, 'xyz', 1)` set cardinality isn't supported today. Use `property(list, 'xyz', 1)` instead. To learn more, see Vertex properties with TinkerPop.

  • The `match()` step isn't currently available. This step provides declarative querying capabilities.

  • Objects as properties on vertices or edges aren't supported. Properties can only be primitive types or arrays.

  • Sorting by array properties, `order().by(<array property>)`, isn't supported. Sorting is supported only by primitive types.

  • Non-primitive JSON types aren't supported. Use `string`, `number`, or `true`/`false` types. `null` values aren't supported.

  • The GraphSONv3 serializer isn't currently supported. Use GraphSONv2 Serializer, Reader, and Writer classes in the connection configuration. The results returned by the Azure Cosmos DB Gremlin API don't have the same format as the GraphSON format.

  • Lambda expressions and functions aren't currently supported. This includes the `.map{<expression>}`, the `.by{<expression>}`, and the `.filter{<expression>}` functions. To learn more, and to learn how to rewrite them using Gremlin steps, see A Note on Lambdas.

  • Transactions aren't supported because of the distributed nature of the system. Configure an appropriate consistency model on the Gremlin account to "read your own writes" and use optimistic concurrency to resolve conflicting writes.
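
The restrictions on property values in this list (no objects, no `null`, only primitives or arrays) can be checked on the client before a write is attempted. The Python sketch below is one hypothetical way to do that; the function names are invented for this illustration, and the rule that array elements must themselves be primitives is an assumption rather than something this article states.

```python
# Value types treated as supported primitives in this sketch.
PRIMITIVES = (str, int, float, bool)


def is_supported_property_value(value):
    """Return True for a primitive, or for an array containing only primitives."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        # Assumption: array elements must themselves be primitives.
        return all(isinstance(item, PRIMITIVES) for item in value)
    # None (JSON null) and dict (JSON object) fall through to here.
    return False


def unsupported_properties(properties):
    """List the names of properties whose values would be rejected."""
    return [
        name for name, value in properties.items()
        if not is_supported_property_value(value)
    ]


candidate = {
    "firstName": "Thomas",
    "age": 45,
    "nicknames": ["Tom", "Tommy"],
    "address": {"city": "Seattle"},  # object: not supported
    "middleName": None,              # null: not supported
}
print(unsupported_properties(candidate))  # ['address', 'middleName']
```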

Known limitations

  • Index utilization for Gremlin queries with mid-traversal `.V()` steps: Currently, only the first `.V()` call of a traversal will make use of the index to resolve any filters or predicates attached to it. Subsequent calls will not consult the index, which might increase the latency and cost of the query.

Assuming default indexing, a typical read Gremlin query that starts with the `.V()` step would use parameters in its attached filtering steps, such as `.has()` or `.where()` to optimize the cost and performance of the query. For example:

```java
g.V().has('category', 'A')
```

However, when more than one `.V()` step is included in the Gremlin query, the resolution of the data for the query might not be optimal. Take the following query as an example:

```java
g.V().has('category', 'A').as('a').V().has('category', 'B').as('b').select('a', 'b')
```

This query will return two groups of vertices based on their property called `category`. In this case, only the first call, `g.V().has('category', 'A')` will make use of the index to resolve the vertices based on the values of their properties.

A workaround for this query is to use subtraversal steps such as `.map()` and `.union()`, as shown below:

```java
// Query workaround using .map()
g.V().has('category', 'A').as('a').map(__.V().has('category', 'B')).as('b').select('a','b')

// Query workaround using .union()
g.V().has('category', 'A').fold().union(unfold(), __.V().has('category', 'B'))
```

You can review the performance of the queries by using the [Gremlin `executionProfile()` step](graph-execution-profile.md).
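
Queries affected by this limitation can also be spotted before they run. The Python sketch below is a simple text-based heuristic, not a Gremlin parser: it counts `.V(` calls chained on the main traversal and ignores `__.V(` inside an anonymous subtraversal, as used in the workarounds above. The function names are hypothetical, and string literals that happen to contain `.V(` would be miscounted.

```python
import re

# Matches a chained ".V(" call, but not "__.V(" inside an anonymous sub-traversal.
CHAINED_V = re.compile(r"(?<!__)\.V\(")


def count_chained_v_steps(query):
    """Count .V() calls that are chained directly on a traversal."""
    return len(CHAINED_V.findall(query))


def has_mid_traversal_v(query):
    """True when a second chained .V() follows the initial one."""
    return count_chained_v_steps(query) > 1


slow = "g.V().has('category', 'A').as('a').V().has('category', 'B').as('b').select('a', 'b')"
workaround = "g.V().has('category', 'A').as('a').map(__.V().has('category', 'B')).as('b').select('a','b')"

print(has_mid_traversal_v(slow))        # True
print(has_mid_traversal_v(workaround))  # False
```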

Next steps