Understanding the differences between NoSQL and relational databases

This article enumerates some of the key benefits of NoSQL databases over relational databases. It also discusses some of the challenges of working with NoSQL. For an in-depth look at the different data stores that exist, see our article on choosing the right data store.

High throughput

One of the most obvious challenges when maintaining a relational database system is that most relational engines apply locks and latches to enforce strict ACID semantics. This approach helps ensure a consistent data state within the database. However, it carries heavy trade-offs in concurrency, latency, and availability. Because of these fundamental architectural restrictions, high transactional volumes can force you to shard data manually, and implementing manual sharding can be a time-consuming and painful exercise.
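To make the cost of manual sharding concrete, here is a minimal sketch that assumes application-side routing by a stable hash of the shard key over a fixed list of hypothetical database names (`orders_db_0` to `orders_db_3`). It shows one reason resharding is painful: with simple modulo hashing, adding a single shard changes the home of most existing keys, all of which would have to be migrated.

```python
import hashlib
from typing import Iterable, List

# Hypothetical shard names; in a real deployment each would be a separate database server.
SHARDS: List[str] = ["orders_db_0", "orders_db_1", "orders_db_2", "orders_db_3"]


def shard_for(customer_id: str, shards: List[str] = SHARDS) -> str:
    """Route a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def moved_keys(keys: Iterable[str], old_shards: List[str], new_shards: List[str]) -> int:
    """Count keys whose shard changes when the shard list is resized."""
    return sum(shard_for(k, old_shards) != shard_for(k, new_shards) for k in keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(1000)]
    print(shard_for("customer-42"))
    bigger = SHARDS + ["orders_db_4"]
    # Going from 4 to 5 shards relocates roughly four out of five keys.
    print(moved_keys(keys, SHARDS, bigger), "of", len(keys), "keys must be migrated")
```

Schemes such as consistent hashing reduce the number of keys that move, but the application still owns the routing and migration logic, which is the maintenance burden described above.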

In these scenarios, distributed databases can offer a more scalable solution. However, maintaining them can still be costly and time-consuming. Administrators may have to do extra work to ensure that the distributed nature of the system is transparent, and they may also have to account for the "disconnected" nature of the database.

Azure Cosmos DB simplifies these challenges by being deployed across all Azure China regions. Partition ranges can be subdivided dynamically, so the database grows seamlessly with the application while maintaining high availability. Fine-grained multi-tenancy and tightly controlled, cloud-native resource governance enable strong latency guarantees and predictable performance. Partitioning is fully managed, so administrators do not need to write code or manage partitions.
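The idea of dynamically subdividing partition ranges can be illustrated with a toy model. This sketch assumes that keys are hashed into a 32-bit space, that each partition owns one contiguous hash range, and that a partition splits in half once it holds more than a fixed number of items. It illustrates the concept only and does not reflect how Azure Cosmos DB actually decides when or where to split.

```python
import bisect
import hashlib
from typing import List, Tuple

KEY_SPACE = 2**32  # size of the hashed key space in this toy model


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 32-bit position in the key space."""
    return int.from_bytes(hashlib.sha256(partition_key.encode("utf-8")).digest()[:4], "big")


class RangePartitioner:
    """Toy model: each partition owns a contiguous hash range and splits when it gets too full."""

    def __init__(self, max_items_per_partition: int):
        self.max_items = max_items_per_partition
        self.starts: List[int] = [0]                     # sorted start of each range
        self.items: List[List[Tuple[int, str]]] = [[]]   # (hash, key) pairs per range

    def partition_for(self, partition_key: str) -> int:
        """Index of the range that currently owns this key."""
        return bisect.bisect_right(self.starts, hash_key(partition_key)) - 1

    def insert(self, partition_key: str) -> None:
        i = self.partition_for(partition_key)
        self.items[i].append((hash_key(partition_key), partition_key))
        self._split_if_full(i)

    def _split_if_full(self, i: int) -> None:
        start = self.starts[i]
        end = self.starts[i + 1] if i + 1 < len(self.starts) else KEY_SPACE
        if len(self.items[i]) <= self.max_items or end - start < 2:
            return
        mid = (start + end) // 2
        old = self.items[i]
        self.items[i] = [it for it in old if it[0] < mid]
        self.starts.insert(i + 1, mid)
        self.items.insert(i + 1, [it for it in old if it[0] >= mid])
        # Check the right half first so index i stays valid, then the left half.
        self._split_if_full(i + 1)
        self._split_if_full(i)


if __name__ == "__main__":
    p = RangePartitioner(max_items_per_partition=100)
    for n in range(1000):
        p.insert(f"user-{n}")
    print(len(p.starts), "partitions, sizes:", [len(x) for x in p.items])
```

Callers only ever use `insert` and `partition_for`. The split happens behind that interface, which mirrors the point that partition management does not require application code.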

If your transactional volumes are reaching extreme levels, such as many thousands of transactions per second, consider a distributed NoSQL database. For maximum efficiency, ease of maintenance, and reduced total cost of ownership, consider Azure Cosmos DB.


Hierarchical data

Many use cases involve database transactions that contain numerous parent-child relationships. These relationships can grow significantly over time and prove difficult to manage. Forms of hierarchical databases did emerge during the 1980s, but they did not become popular because they used storage inefficiently. They also lost traction as Ted Codd's relational model became the de facto standard used by virtually all mainstream database management systems.

Today, however, document-style databases have grown significantly in popularity. These databases might be considered a reinvention of the hierarchical database paradigm, now freed from concerns about the cost of storing data on disk. As a result, maintaining many complex parent-child entity relationships in a relational database could now be considered an anti-pattern compared to modern document-oriented approaches.

The emergence of object-oriented design, and the impedance mismatch that arises when combining it with relational models, also highlights an anti-pattern in relational databases for certain use cases. Hidden but often significant maintenance costs can arise as a result. Although ORM approaches have evolved to partly mitigate this, document-oriented databases fit object-oriented approaches much more naturally. With this approach, developers are not tied to ORM drivers or to bespoke, language-specific object-oriented database engines. If your data contains many parent-child relationships and deep levels of hierarchy, consider a NoSQL document database such as the Azure Cosmos DB SQL API.
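As a minimal sketch of why documents fit object-oriented code well, the following assumes a hypothetical order with nested line items, the kind of parent-child data a relational schema would split into separate order and order-detail tables joined by a key. Here the object tree converts directly to and from a single JSON document with no ORM mapping layer. The class and field names are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    id: str
    customer: str
    lines: List[OrderLine] = field(default_factory=list)  # children embedded in the parent


def to_json(order: Order) -> str:
    # asdict recurses into nested dataclasses, so the object tree maps 1:1 to a JSON document.
    return json.dumps(asdict(order))


def from_json(text: str) -> Order:
    raw = json.loads(text)
    return Order(id=raw["id"], customer=raw["customer"],
                 lines=[OrderLine(**line) for line in raw["lines"]])


order = Order(id="1", customer="Contoso",
              lines=[OrderLine("Widget", 2), OrderLine("Gadget", 1)])
stored = to_json(order)          # one document holds the whole hierarchy
print(stored)
```

Reading the order back is a single lookup of one document, whereas the relational equivalent needs a join and code that reassembles the rows into objects.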


Complex networks and relationships

Ironically, given their name, relational databases are a less than optimal solution for modeling deep and complex relationships. The reason is that relationships between entities do not actually exist as stored objects in a relational database. They must be computed at runtime, and complex relationships require Cartesian joins before they can be mapped with queries. As a result, operations become exponentially more expensive to compute as relationships increase. In some cases, a relational database that attempts to manage such entities becomes unusable.

Various forms of "network" databases emerged around the same time as relational databases, but, as with hierarchical databases, these systems struggled to gain popularity. Adoption was slow because of a lack of use cases at the time and storage inefficiencies. Today, graph database engines could be considered a re-emergence of the network database paradigm. The key benefit of these systems is that relationships are stored as "first-class citizens" within the database. Thus, each relationship can be traversed in constant time, rather than the cost growing with each new join or cross product.
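The following sketch illustrates what storing relationships as first-class data buys you. It assumes a small, hypothetical social graph in which each vertex keeps its outgoing edges directly (an adjacency list), so every hop of a breadth-first traversal is a direct lookup rather than a join computed at query time. This is a plain-Python illustration, not Gremlin or the Azure Cosmos DB graph engine.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical graph: each vertex stores its outgoing edges directly.
edges: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; each hop follows stored edges instead of computing a join."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:   # walk back through recorded parents
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(edges, "alice", "frank"))
```

The work per hop depends only on the edges of the vertex being expanded, not on the total number of vertices or relationships, which is the property relational joins lack.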

If you maintain a complex network of relationships in your database, consider a graph database such as the Azure Cosmos DB Gremlin API to manage this data.


Azure Cosmos DB is a multi-model database service that offers an API projection for all the major NoSQL model types: column-family, document, graph, and key-value. The Gremlin (graph) and SQL (Core) document API layers are fully interoperable, which makes it possible to switch between models at the programmability level. A graph store can be queried both through complex network traversals and through transactions modeled as document records in the same store.

Fluid schema

Another characteristic of relational databases is that schemas must be defined at design time. This benefits referential integrity and data conformity. However, it can also become restrictive as the application grows. Responding to schema changes across logically separate models that share the same table or database definition can become complex over time. Such use cases often benefit from devolving the schema to the application, which manages it on a per-record basis. This requires the database to be "schema agnostic" and to allow records to be "self-describing" in terms of the data they contain.

If you manage data whose structure changes frequently, particularly if transactions can come from external sources where it is difficult to enforce conformity across the database, consider a more schema-agnostic approach using a managed NoSQL database service such as Azure Cosmos DB.
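Here is a minimal sketch of what "self-describing" records with schema handled by the application can look like. It assumes hypothetical event records from external sources that differ in shape and carry an optional `schemaVersion` field. The store accepts each record as-is, and the application interprets fields by record type and version, tolerating fields that are missing.

```python
import json
from typing import Any, Dict, List

# Hypothetical events arriving from external sources, each with its own shape.
incoming: List[str] = [
    '{"type": "click", "userId": "u1", "page": "/home"}',
    '{"type": "purchase", "userId": "u2", "amount": 19.99, "currency": "USD"}',
    '{"type": "click", "userId": "u3", "page": "/cart", "referrer": "ad-42", "schemaVersion": 2}',
]


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Application-side schema handling: read fields by record type and tolerate missing ones."""
    version = record.get("schemaVersion", 1)   # older records carry no version field
    if record["type"] == "click":
        return {"user": record["userId"], "page": record["page"],
                "referrer": record.get("referrer") if version >= 2 else None}
    if record["type"] == "purchase":
        return {"user": record["userId"],
                "total": f'{record["amount"]:.2f} {record["currency"]}'}
    return {"user": record.get("userId"), "raw": record}  # unknown types are kept, not rejected


store = [json.loads(line) for line in incoming]   # no table definition to migrate
views = [normalize(r) for r in store]
print(views)
```

Adding a new field or record type changes only the application's `normalize` logic; nothing needs an `ALTER TABLE`-style migration before the new records can be stored.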

Microservices

The microservices pattern has grown significantly in recent years. It has its roots in service-oriented architecture. The de facto standard for data transmission in modern microservices architectures is JSON, which also happens to be the storage format of the vast majority of document-oriented NoSQL databases. This makes NoSQL document stores a much more seamless fit for both persistence and synchronization (using event sourcing patterns) across complex microservice implementations. More traditional relational databases can be much more complex to maintain in these architectures because of the greater amount of transformation required for both state and synchronization across APIs. Azure Cosmos DB in particular has a number of features that make it an even more seamless fit for JSON-based microservices architectures than many other NoSQL databases:

  • a choice of pure JSON data types
  • a JavaScript engine and query API built into the database
  • a state-of-the-art change feed that clients can subscribe to in order to be notified of modifications to a container
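The subscription pattern behind a change feed can be sketched with an in-memory stand-in. This toy container is hypothetical and is not the Azure Cosmos DB SDK. It records every upsert in an ordered log, and a consumer (for example, a microservice that maintains a search index) reads changes after a continuation token, then stores the new token for its next poll. One simplification to note: this toy returns every version of an item, whereas the real change feed in its default mode may surface only the latest change to an item.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyContainer:
    """In-memory stand-in for a container that records every upsert in an ordered log."""
    items: Dict[str, dict] = field(default_factory=dict)
    log: List[Tuple[int, dict]] = field(default_factory=list)   # (sequence number, item copy)

    def upsert(self, item: dict) -> None:
        self.items[item["id"]] = item
        self.log.append((len(self.log) + 1, dict(item)))

    def read_changes(self, continuation: int = 0) -> Tuple[List[dict], int]:
        """Return changes after `continuation`, plus the token to pass on the next call."""
        changes = [doc for seq, doc in self.log if seq > continuation]
        return changes, len(self.log)


container = ToyContainer()
container.upsert({"id": "1", "status": "new"})
batch, token = container.read_changes()            # consumer's first poll
container.upsert({"id": "1", "status": "shipped"})
batch, token = container.read_changes(token)       # only the change since the last poll
print(batch, token)
```

Because the consumer keeps its own continuation token, several independent services can follow the same container at their own pace, which is what makes a change feed useful for event-sourcing-style synchronization between microservices.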

Some challenges with NoSQL databases

Although implementing NoSQL databases has some clear advantages, there are also some challenges to take into consideration. These challenges may not arise to the same degree when working with the relational model:

  • transactions with many relations pointing to the same entity
  • transactions that require strong consistency across the entire data set

For the first challenge, the rule of thumb in NoSQL databases is generally denormalization, which produces more efficient reads in a distributed system. However, this approach brings some design challenges. Consider, for example, a product that is related to one category and multiple tags.


A best-practice approach in a NoSQL document database is to denormalize the category name and tag names directly into the product document. However, keeping categories, tags, and products in sync then adds maintenance complexity, because the data is duplicated across multiple product records rather than being a simple update in a one-to-many relationship with a join to retrieve the data.
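The trade-off can be sketched with a few hypothetical product documents that embed their category and tag names. Reading a product is a single lookup with no join. Renaming a tag, however, means finding and rewriting every product that embeds it, where a normalized relational schema would update one row in a tags table. The field names here are illustrative assumptions.

```python
from typing import Any, Dict, List

# Hypothetical product documents with category and tag names copied in (denormalized).
products: List[Dict[str, Any]] = [
    {"id": "p1", "name": "Trail shoe", "categoryId": "c1", "categoryName": "Footwear",
     "tags": [{"id": "t1", "name": "Outdoor"}, {"id": "t2", "name": "Running"}]},
    {"id": "p2", "name": "Road shoe", "categoryId": "c1", "categoryName": "Footwear",
     "tags": [{"id": "t2", "name": "Running"}]},
    {"id": "p3", "name": "Tent", "categoryId": "c2", "categoryName": "Camping",
     "tags": [{"id": "t1", "name": "Outdoor"}]},
]


def read_product(product_id: str) -> Dict[str, Any]:
    # One read returns everything needed to display the product: no join.
    return next(p for p in products if p["id"] == product_id)


def rename_tag(tag_id: str, new_name: str) -> int:
    """Propagate a tag rename to every product that embeds it; return documents rewritten."""
    rewritten = 0
    for product in products:
        touched = False
        for tag in product["tags"]:
            if tag["id"] == tag_id:
                tag["name"] = new_name
                touched = True
        rewritten += int(touched)
    return rewritten


print(rename_tag("t1", "Outdoors"), "product documents had to be rewritten")
```

The number of documents a rename touches grows with the number of products that reference the tag, which is the maintenance cost the paragraph above describes.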

The benefit is that reads are more efficient in the denormalized record, and they become increasingly efficient as the number of conceptually joined entities increases. However, just as read efficiency grows with the number of joined entities in a denormalized record, so does the maintenance complexity of keeping those entities in sync. One way of mitigating this trade-off is to create a hybrid data model.
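One possible shape of a hybrid data model is sketched below, under the assumption that tag names rarely change and are read with every product, while some category details (here, a hypothetical `promoBanner`) change often. The stable data is embedded, and the volatile data is referenced by ID and fetched with a second lookup. Which fields to embed and which to reference is a per-application decision, so treat this split as illustrative only.

```python
from typing import Any, Dict

# Hypothetical hybrid model: stable, frequently read data is embedded; volatile data is referenced.
categories: Dict[str, Dict[str, Any]] = {
    "c1": {"id": "c1", "name": "Footwear", "promoBanner": "Spring sale"},
}

products: Dict[str, Dict[str, Any]] = {
    "p1": {"id": "p1", "name": "Trail shoe", "categoryId": "c1",
           "tagNames": ["Outdoor", "Running"]},   # embedded: rarely renamed
}


def read_product_view(product_id: str) -> Dict[str, Any]:
    product = products[product_id]
    category = categories[product["categoryId"]]   # second lookup instead of embedding
    return {**product, "categoryName": category["name"], "promo": category["promoBanner"]}


def update_promo(category_id: str, banner: str) -> None:
    # The volatile field lives in one document, so an update touches exactly one record.
    categories[category_id]["promoBanner"] = banner


update_promo("c1", "Summer sale")
print(read_product_view("p1"))
```

The model pays one extra read per product view in exchange for single-record updates of the data that changes most often.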

While NoSQL databases offer more flexibility to deal with these trade-offs, that flexibility can also produce more design decisions. Consult our article on how to model and partition data on Azure Cosmos DB using a real-world example, which includes an approach for keeping denormalized user data in sync when users sit not only in different partitions but also in different containers.

Strong consistency is rarely required across the entire data set. However, where it is necessary, it can be a challenge in distributed databases. To ensure strong consistency, data must be synchronized across all replicas and regions before clients are allowed to read it, which can increase read latency.
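The latency cost described above can be shown with a toy timing model. It assumes three hypothetical replicas with fixed replication lags. An eventual read is served immediately by whichever replica the client reaches and may return a stale value. A strong read waits until every write issued before it has reached all replicas. This is a simplified illustration of the trade-off, not how Azure Cosmos DB implements its consistency levels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    lag_ms: int   # time for a write to reach this replica in this toy model


REPLICAS = [Replica("local", 0), Replica("region-b", 40), Replica("region-c", 120)]


class ToyStore:
    def __init__(self, replicas: List[Replica]):
        self.replicas = replicas
        self.writes: List[Tuple[int, str]] = []   # (write time in ms, value)

    def write(self, value: str, at_ms: int) -> None:
        self.writes.append((at_ms, value))

    def _visible(self, replica: Replica, at_ms: int) -> Optional[str]:
        """Latest value that has reached `replica` by time `at_ms`."""
        seen = [v for t, v in self.writes if t + replica.lag_ms <= at_ms]
        return seen[-1] if seen else None

    def read_eventual(self, replica_name: str, at_ms: int) -> Tuple[Optional[str], int]:
        replica = next(r for r in self.replicas if r.name == replica_name)
        return self._visible(replica, at_ms), at_ms   # no waiting; possibly stale

    def read_strong(self, at_ms: int) -> Tuple[Optional[str], int]:
        # Wait until every earlier write has reached every replica, then return the latest value.
        max_lag = max(r.lag_ms for r in self.replicas)
        earlier = [(t, v) for t, v in self.writes if t <= at_ms]
        ready_at = max([at_ms] + [t + max_lag for t, _ in earlier])
        return (earlier[-1][1] if earlier else None), ready_at


store = ToyStore(REPLICAS)
store.write("v1", at_ms=0)
store.write("v2", at_ms=200)
print(store.read_eventual("region-c", at_ms=250))   # stale value, answered at once
print(store.read_strong(at_ms=250))                 # latest value, answered after waiting
```

In this model the strong read at 250 ms completes at 320 ms, because the write at 200 ms needs 120 ms to reach the slowest replica. That wait is the added read latency the paragraph refers to.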

Again, Azure Cosmos DB offers more flexibility than relational databases for the trade-offs involved here, but for small-scale implementations this flexibility may add more design considerations. For more detail on this topic, consult our article on consistency, availability, and performance trade-offs.

Next steps

Learn how to manage your Azure Cosmos account and other concepts: