Introduction to the Gremlin API in Azure Cosmos DB

APPLIES TO: Gremlin API

Azure Cosmos DB is the multi-region distributed, multi-model database service from Azure for mission-critical applications. It supports document, key-value, graph, and column-family data models. Through the Gremlin API, Azure Cosmos DB provides a graph database service on a fully managed database platform designed for any scale.

[Figure: Azure Cosmos DB graph architecture]

This article provides an overview of the Azure Cosmos DB Gremlin API and explains how to use it to store massive graphs with billions of vertices and edges. You can query these graphs with millisecond latency and easily evolve the graph structure. The Azure Cosmos DB Gremlin API is built on Apache TinkerPop, a graph computing framework, and uses the Gremlin query language.

The Azure Cosmos DB Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure. The result is a unique, flexible solution to the most common data problems associated with inflexible, relational approaches.

Features of the Azure Cosmos DB Gremlin API

Azure Cosmos DB is a fully managed graph database. It offers multi-region distribution, elastic scaling of storage and throughput, automatic indexing and query, tunable consistency levels, and support for the TinkerPop standard.

The Azure Cosmos DB Gremlin API offers the following differentiated features:

  • Elastically scalable throughput and storage

    Real-world graphs need to scale beyond the capacity of a single server. Azure Cosmos DB supports horizontally scalable graph databases whose storage and provisioned throughput are virtually unlimited. As a graph database grows, its data is automatically distributed by using graph partitioning.

  • Multi-region replication

    Azure Cosmos DB can automatically replicate your graph data to any Azure region in China. Multi-region replication simplifies the development of applications that need multi-region access to data. Azure Cosmos DB also minimizes read and write latency anywhere in China. It provides an automatic regional failover mechanism that keeps your application running in the rare case of a service interruption in a region.

  • Fast queries and traversals with the most widely adopted graph query standard

    Store heterogeneous vertices and edges and query them through familiar Gremlin syntax. Gremlin is an imperative, functional query language that provides a rich interface for implementing common graph algorithms.

    Azure Cosmos DB enables rich real-time queries and traversals without requiring schema hints, secondary indexes, or views. Learn more in Query graphs by using Gremlin.

  • Fully managed graph database

    Azure Cosmos DB eliminates the need to manage database and machine resources. Most existing graph database platforms are bound by the limitations of their infrastructure. They often require a high degree of maintenance to keep them running.

    As a fully managed service, Azure Cosmos DB removes the need to manage virtual machines, update runtime software, manage sharding or replication, or deal with complex data-tier upgrades. Every graph is automatically backed up and protected against regional failures. Developers can therefore focus on delivering application value instead of operating and managing their graph databases.

  • Automatic indexing

    By default, Azure Cosmos DB automatically indexes all the properties of the nodes (also called vertices) and edges in a graph. It doesn't require a schema or the creation of secondary indexes. Learn more about indexing in Azure Cosmos DB.

  • Compatibility with Apache TinkerPop

    Azure Cosmos DB supports the open-source Apache TinkerPop standard. The TinkerPop standard has a broad ecosystem of applications and libraries that can be easily integrated with the Azure Cosmos DB Gremlin API.

  • Tunable consistency levels

    Azure Cosmos DB provides five well-defined consistency levels for queries and read operations: strong, bounded staleness, session, consistent prefix, and eventual. These granular levels let you make sound tradeoffs among consistency, availability, and latency for your application. Learn more in Tunable data consistency levels in Azure Cosmos DB.
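The elastic scaling feature described above relies on partitioning. Each vertex is assigned to a partition based on the value of a partition key property. The Python sketch below illustrates only that idea. The partition count, the MD5 hash, and the `partition_for` and `distribute` helpers are hypothetical, and they do not reflect how Azure Cosmos DB hashes keys internally. The point is that vertices sharing a partition key value always land on the same partition.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List

# Hypothetical partition count; Azure Cosmos DB manages physical partitions itself.
PARTITION_COUNT = 4


def partition_for(partition_key_value: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a partition key value to a partition index with a stable hash.

    MD5 is used purely as a deterministic illustration; it is an assumption,
    not the hash function Azure Cosmos DB uses.
    """
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def distribute(vertices: List[dict], key_property: str) -> Dict[int, List[str]]:
    """Group vertex ids by the partition their partition key hashes to."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for vertex in vertices:
        partitions[partition_for(vertex[key_property])].append(vertex["id"])
    return dict(partitions)


# Hypothetical vertices partitioned on a "city" property.
people = [
    {"id": "thomas.1", "city": "Beijing"},
    {"id": "robin.1", "city": "Shanghai"},
    {"id": "ben.1", "city": "Beijing"},
]
print(distribute(people, "city"))  # thomas.1 and ben.1 always share a partition
```

Choosing a partition key that spreads vertices evenly matters, because every vertex with the same key value ends up in the same partition.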

Scenarios that use the Gremlin API

Here are some scenarios where the graph support in Azure Cosmos DB can be useful:

  • Social networks/Customer 365

    Combine data about your customers and their interactions with other people. This lets you develop personalized experiences, predict customer behavior, or connect people with others who have similar interests. Azure Cosmos DB can be used to manage social networks and to track customer preferences and data.

  • Recommendation engines

    This scenario is common in the retail industry. By combining information about products, users, and user interactions, such as purchasing, browsing, or rating an item, you can build customized recommendations. The low latency, elastic scale, and native graph support of Azure Cosmos DB make it ideal for these scenarios.

  • Geospatial

    Many applications in telecommunications, logistics, and travel planning need to find a point of interest within an area. Others need the shortest or optimal route between two locations. Azure Cosmos DB is a natural fit for these problems.

  • Internet of Things

    Model the network and connections between IoT devices as a graph to build a better understanding of the state of your devices and assets. You can also learn how changes in one part of the network might affect another part.
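For the geospatial scenario, the "shortest route" in an unweighted graph is the path with the fewest hops. The Python sketch below runs a breadth-first search over a hypothetical adjacency map of locations. It illustrates the kind of traversal involved and is not Azure Cosmos DB or Gremlin API code. Finding an optimal route by distance or travel time would instead need edge weights and a weighted algorithm such as Dijkstra's.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence


def shortest_route(adjacency: Dict[str, Sequence[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return a route with the fewest hops from start to goal, or None if unreachable.

    Breadth-first search visits locations in order of hop count, so the first
    time it reaches goal, the recorded path is a shortest one.
    """
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}  # predecessor links
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already reached by an equal or shorter route
            previous[neighbor] = current
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical one-way road segments between locations A to E.
roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_route(roads, "A", "E"))  # ['A', 'B', 'D', 'E']
```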

Introduction to graph databases

Real-world data is naturally connected. Traditional data modeling focuses on defining entities separately and computing their relationships at runtime. While this model has its advantages, highly connected data can be hard to manage under its constraints.

A graph database instead persists relationships in the storage layer, which makes graph retrieval operations highly efficient. The Azure Cosmos DB Gremlin API supports the property graph model.
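The following deliberately simplified Python sketch contrasts the two approaches:

* In the relational style, the `knows` relationship lives in a separate table that is searched at query time.
* In the graph style, each vertex keeps its outgoing edges, so following a relationship is a direct lookup.

The data and function names are hypothetical. Real relational engines use indexes rather than full scans to compute joins.

```python
from typing import Dict, List, Tuple

# Relational style: relationships are rows in their own table and are
# resolved at query time by searching that table.
knows_rows: List[Tuple[str, str]] = [
    ("thomas.1", "robin.1"),
    ("thomas.1", "ben.1"),
    ("robin.1", "ben.1"),
]


def friends_by_join(person_id: str) -> List[str]:
    """Compute the relationship at runtime by scanning the rows."""
    return [dst for src, dst in knows_rows if src == person_id]


# Graph style: each vertex stores its outgoing "knows" edges directly,
# so traversing a relationship is a single lookup.
adjacency: Dict[str, List[str]] = {
    "thomas.1": ["robin.1", "ben.1"],
    "robin.1": ["ben.1"],
    "ben.1": [],
}


def friends_by_adjacency(person_id: str) -> List[str]:
    """Follow the persisted relationship without searching other rows."""
    return adjacency.get(person_id, [])


print(friends_by_join("thomas.1"), friends_by_adjacency("thomas.1"))
```

Both functions return the same answer. The difference is that the graph style pays the cost of linking entities at write time rather than on every read.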

Property graph objects

A property graph is a structure composed of vertices and edges. Both can have an arbitrary number of key-value pairs as properties.

  • Vertices/nodes - Vertices denote discrete entities, such as a person, a place, or an event.

  • Edges/relationships - Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, and have recently been at a location.

  • Properties - Properties express information about vertices and edges. Vertices and edges can have any number of properties, which can be used to describe and filter objects in a query. For example, a vertex might have a name and an age, and an edge might have a timestamp and/or a weight.

  • Label - A label is a name or identifier for a vertex or an edge. Labels group vertices or edges so that every member of a group has the same label. For example, a graph can have multiple vertices with the label "person".
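The four concepts above can be sketched in Python as plain data classes. The class and field names (`Vertex`, `Edge`, `out_v`, `in_v`) are hypothetical and don't reflect how Azure Cosmos DB stores graph data. The sketch only shows that vertices and edges both carry an id, a label, and an open-ended set of key-value properties.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: str
    label: str  # groups vertices, e.g. "person"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str  # the kind of relationship, e.g. "interested"
    out_v: str  # id of the vertex the edge starts from
    in_v: str   # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample objects; vertices with the same label need not share properties.
thomas = Vertex("thomas.1", "person", {"firstName": "Thomas", "age": 44})
ben = Vertex("ben.1", "person", {"firstName": "Ben"})
football = Vertex("football.1", "interest", {"name": "Football"})
interested = Edge("e1", "interested", ben.id, football.id, {"weight": 0.8})

vertices: List[Vertex] = [thomas, ben, football]
# Filtering by label, similar in spirit to g.V().hasLabel('person').
people = [v.id for v in vertices if v.label == "person"]
print(people)  # ['thomas.1', 'ben.1']
```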

Graph databases are often classified as NoSQL or non-relational databases because they don't depend on a schema or a constrained data model. This lack of schema allows connected structures to be modeled and stored naturally and efficiently.

Graph database by example

Let's use a sample graph to understand how queries can be expressed in Gremlin. The following figure shows a business application that manages data about users, interests, and devices in the form of a graph.


This graph has the following vertex types (also called "labels" in Gremlin):

  • People: The graph has three people: Robin, Thomas, and Ben
  • Interests: Their interests; in this example, the game of football
  • Devices: The devices that people use
  • Operating systems: The operating systems that the devices run on
  • Places: The places from which the devices are accessed

We represent the relationships between these entities with the following edge types:

  • Knows: For example, "Thomas knows Robin"
  • Interested: Represents the interests of the people in our graph. For example, "Ben is interested in Football"
  • RunsOS: For example, "Laptop runs the Windows OS"
  • Uses: Represents which device a person uses. For example, "Robin uses a Motorola phone with serial number 77"
  • Located: Represents the location from which the devices are accessed

The Gremlin Console is an interactive terminal, provided by Apache TinkerPop, for working with graph data. To learn more, see the quickstart doc on how to use the Gremlin console. You can also perform these operations by using Gremlin drivers on the platform of your choice (Java, Node.js, Python, or .NET). The following examples show how to run queries against this graph data by using the Gremlin Console.

First, let's look at CRUD operations. The following Gremlin statement inserts the "Thomas" vertex into the graph:

:> g.addV('person').property('id', 'thomas.1').property('firstName', 'Thomas').property('lastName', 'Andersen').property('age', 44)

Next, the following Gremlin statement inserts a "knows" edge between Thomas and Robin:

:> g.V('thomas.1').addE('knows').to(g.V('robin.1'))
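To make the semantics of these two statements concrete, here is an in-memory Python analogue. The `add_vertex` and `add_edge` helpers and the dictionary layout are hypothetical stand-ins, not the Gremlin API or a Cosmos DB driver. The `knows` edge statement assumes that a `robin.1` vertex already exists, so the sketch inserts one first.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-ins for a graph's vertex and edge stores.
vertices: Dict[str, Dict[str, Any]] = {}
edges: List[Dict[str, str]] = []


def add_vertex(label: str, vertex_id: str, **properties: Any) -> Dict[str, Any]:
    """Rough analogue of g.addV(label).property('id', vertex_id).property(...)."""
    if vertex_id in vertices:
        raise ValueError(f"vertex {vertex_id!r} already exists")
    vertex = {"id": vertex_id, "label": label, **properties}
    vertices[vertex_id] = vertex
    return vertex


def add_edge(label: str, out_id: str, in_id: str) -> Dict[str, str]:
    """Rough analogue of g.V(out_id).addE(label).to(g.V(in_id)).

    This sketch refuses to create an edge unless both endpoints exist.
    """
    if out_id not in vertices or in_id not in vertices:
        raise KeyError("both endpoints must exist before adding an edge")
    edge = {"label": label, "outV": out_id, "inV": in_id}
    edges.append(edge)
    return edge


add_vertex("person", "thomas.1", firstName="Thomas", lastName="Andersen", age=44)
add_vertex("person", "robin.1", firstName="Robin")  # hypothetical: Robin's insert isn't shown above
add_edge("knows", "thomas.1", "robin.1")
print(edges)  # [{'label': 'knows', 'outV': 'thomas.1', 'inV': 'robin.1'}]
```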

The following query returns the "person" vertices in descending order of their first names:

:> g.V().hasLabel('person').order().by('firstName', decr)

Graphs shine when you need to answer questions like "What operating systems do friends of Thomas use?" You can run this Gremlin traversal to get that information from the graph:

:> g.V('thomas.1').out('knows').out('uses').out('runsos').group().by('name').by(count())
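The following Python sketch mimics what this traversal computes, using hypothetical sample data. The figure's actual devices and operating systems are not reproduced here.

* Each `out()` step follows edges with a given label and keeps duplicates, because Gremlin emits one traverser per path.
* The final step counts traversers per operating system name, as `group().by('name').by(count())` does.

The function names are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sample edges as (out vertex, edge label, in vertex).
EDGES: List[Tuple[str, str, str]] = [
    ("thomas.1", "knows", "robin.1"),
    ("thomas.1", "knows", "ben.1"),
    ("robin.1", "uses", "phone.77"),
    ("ben.1", "uses", "laptop.1"),
    ("ben.1", "uses", "phone.12"),
    ("phone.77", "runsos", "android.1"),
    ("phone.12", "runsos", "android.1"),
    ("laptop.1", "runsos", "windows.1"),
]
# Hypothetical "name" property of each operating system vertex.
NAMES: Dict[str, str] = {"android.1": "Android", "windows.1": "Windows"}


def out(frontier: List[str], label: str) -> List[str]:
    """Follow outgoing edges with the given label, like Gremlin's out(label).

    Duplicates are kept: two devices running the same OS yield two traversers.
    """
    return [dst for v in frontier for src, lbl, dst in EDGES if src == v and lbl == label]


def os_of_friends(person_id: str) -> Dict[str, int]:
    friends = out([person_id], "knows")
    devices = out(friends, "uses")
    systems = out(devices, "runsos")
    # group().by('name').by(count()): count traversers per OS name.
    return dict(Counter(NAMES[v] for v in systems))


print(os_of_friends("thomas.1"))  # {'Android': 2, 'Windows': 1}
```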

Next steps

To learn more about graph support in Azure Cosmos DB, see: