Data modeling in Azure Cosmos DB

While schema-free databases, like Azure Cosmos DB, make it super easy to store and query unstructured and semi-structured data, you should spend some time thinking about your data model to get the most out of the service in terms of performance, scalability, and cost.

How is data going to be stored? How is your application going to retrieve and query data? Is your application read-heavy or write-heavy?

After reading this article, you will be able to answer the following questions:

  • What is data modeling and why should I care?
  • How is modeling data in Azure Cosmos DB different from modeling it in a relational database?
  • How do I express data relationships in a non-relational database?
  • When do I embed data and when do I link to data?

Embedding data

When you start modeling data in Azure Cosmos DB, try to treat your entities as self-contained items represented as JSON documents.

For comparison, let's first see how we might model data in a relational database. The following example shows how a person might be stored in a relational database.

Figure: Relational database model

When working with relational databases, the strategy is to normalize all your data. Normalizing your data typically involves taking an entity, such as a person, and breaking it down into discrete components. In the example above, a person can have multiple contact detail records as well as multiple address records. Contact details can be broken down further by extracting common fields like a type. The same applies to addresses: each record can be of type Home or Business.

The guiding premise when normalizing data is to avoid storing redundant data on each record and instead refer to data. In this example, to read a person with all their contact details and addresses, you need to use JOINs to compose (or denormalize) your data back together at run time.

SELECT p.FirstName, p.LastName, a.City, cd.Detail
FROM Person p
JOIN ContactDetail cd ON cd.PersonId = p.Id
JOIN ContactDetailType cdt ON cdt.Id = cd.TypeId
JOIN Address a ON a.PersonId = p.Id

Updating a single person with their contact details and addresses requires write operations across many individual tables.

Now let's take a look at how we would model the same data as a self-contained entity in Azure Cosmos DB.

{
    "id": "1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "addresses": [
        {
            "line1": "100 Some Street",
            "line2": "Unit 1",
            "city": "Seattle",
            "state": "WA",
            "zip": 98012
        }
    ],
    "contactDetails": [
        {"email": "thomas@andersen.com"},
        {"phone": "+1 555 555-5555", "extension": 5555}
    ]
}

Using the approach above, we have denormalized the person record by embedding all the information related to this person, such as their contact details and addresses, into a single JSON document. In addition, because we're not confined to a fixed schema, we have the flexibility to do things like having contact details of entirely different shapes.

Retrieving a complete person record from the database is now a single read operation against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a single write operation against a single item.
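To make the contrast concrete, here is a minimal Python sketch, assuming in-memory lists stand in for the Person, ContactDetail, and Address tables (the row and field names are hypothetical). It folds one person's related rows into a single document once, so reading the whole person afterwards is one lookup by id rather than a multi-table JOIN.

```python
# Hypothetical normalized rows, standing in for the relational tables above.
persons = [{"Id": 1, "FirstName": "Thomas", "LastName": "Andersen"}]
contact_details = [
    {"PersonId": 1, "Type": "email", "Detail": "thomas@andersen.com"},
    {"PersonId": 1, "Type": "phone", "Detail": "+1 555 555-5555"},
]
addresses = [
    {"PersonId": 1, "Line1": "100 Some Street", "City": "Seattle", "State": "WA", "Zip": "98012"},
]


def to_person_document(person: dict) -> dict:
    """Fold one person's related rows into a single self-contained document."""
    pid = person["Id"]
    return {
        "id": str(pid),  # Azure Cosmos DB item ids are strings
        "firstName": person["FirstName"],
        "lastName": person["LastName"],
        "addresses": [
            {"line1": a["Line1"], "city": a["City"], "state": a["State"], "zip": a["Zip"]}
            for a in addresses
            if a["PersonId"] == pid
        ],
        # Each contact detail keeps its own shape, keyed by its type.
        "contactDetails": [
            {c["Type"]: c["Detail"]} for c in contact_details if c["PersonId"] == pid
        ],
    }


# Keyed by id: reading a complete person is now a single lookup.
store = {doc["id"]: doc for doc in map(to_person_document, persons)}
print(store["1"]["contactDetails"])
```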

By denormalizing data, your application may need to issue fewer queries and updates to complete common operations.

When to embed

In general, use embedded data models when:

  • There are contained relationships between entities.
  • There are one-to-few relationships between entities.
  • There is embedded data that changes infrequently.
  • There is embedded data that will not grow without bound.
  • There is embedded data that is frequently queried together.

Note

Typically, denormalized data models provide better read performance.

When not to embed

While the rule of thumb in Azure Cosmos DB is to denormalize everything and embed all data into a single item, this can lead to some situations that should be avoided.

Consider this JSON snippet.

{
    "id": "1",
    "name": "What's new in the coolest Cloud",
    "summary": "A blog post by someone real famous",
    "comments": [
        {"id": 1, "author": "anon", "comment": "something useful, I'm sure"},
        {"id": 2, "author": "bob", "comment": "wisdom from the interwebs"},
        …
        {"id": 100001, "author": "jane", "comment": "and on we go ..."},
        …
        {"id": 1000000001, "author": "angry", "comment": "blah angry blah angry"},
        …
        {"id": ∞ + 1, "author": "bored", "comment": "oh man, will this ever end?"},
    ]
}

This might be what a post entity with embedded comments would look like if we were modeling a typical blog or CMS system. The problem with this example is that the comments array is unbounded, meaning that there is no (practical) limit to the number of comments any single post can have. This may become a problem because the size of the item could grow without limit, while Azure Cosmos DB caps the size of a single item (2 MB for the API for NoSQL).

As the size of the item grows, the ability to transmit the data over the wire, and to read and update the item at scale, will be impacted.

In this case, it would be better to consider the following data model.

Post item:
{
    "id": "1",
    "name": "What's new in the coolest Cloud",
    "summary": "A blog post by someone real famous",
    "recentComments": [
        {"id": 1, "author": "anon", "comment": "something useful, I'm sure"},
        {"id": 2, "author": "bob", "comment": "wisdom from the interwebs"},
        {"id": 3, "author": "jane", "comment": "....."}
    ]
}

Comment items:
{
    "postId": "1",
    "comments": [
        {"id": 4, "author": "anon", "comment": "more goodness"},
        {"id": 5, "author": "bob", "comment": "tails from the field"},
        ...
        {"id": 99, "author": "angry", "comment": "blah angry blah angry"}
    ]
},
{
    "postId": "1",
    "comments": [
        {"id": 100, "author": "anon", "comment": "yet more"},
        ...
        {"id": 199, "author": "bored", "comment": "will this ever end?"}
    ]
}

This model has the three most recent comments embedded in the post item, in an array of bounded size. The other comments are grouped into batches of 100 comments and stored as separate items. The batch size of 100 was chosen because our fictitious application allows the user to load 100 comments at a time.
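A minimal sketch of that split, assuming the comments arrive sorted newest first. The `id` scheme for the overflow items (`<postId>-comments-<n>`) is a hypothetical choice: every Azure Cosmos DB item needs an `id`, which the comment items above leave out.

```python
def split_comments(post_id: str, comments: list, recent: int = 3, bucket_size: int = 100):
    """Return (recentComments, overflow comment items) for one post.

    Assumes `comments` is sorted newest first.
    """
    recent_comments = comments[:recent]  # embedded in the post item
    older = comments[recent:]
    buckets = []
    for n, start in enumerate(range(0, len(older), bucket_size)):
        buckets.append({
            "id": f"{post_id}-comments-{n}",  # hypothetical id scheme
            "postId": post_id,
            "comments": older[start:start + bucket_size],
        })
    return recent_comments, buckets


# 250 hypothetical comments, newest (highest id) first.
comments = [{"id": i, "author": "anon", "comment": f"comment {i}"} for i in range(250, 0, -1)]
recent, buckets = split_comments("1", comments)
print(len(recent), [len(b["comments"]) for b in buckets])  # 3 [100, 100, 47]
```

Because every overflow item carries `postId`, loading the next page of 100 comments is one read of one item.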

Another case where embedding data is not a good idea is when the embedded data is used often across items and changes frequently.

Consider this JSON snippet.

{
    "id": "1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "holdings": [
        {
            "numberHeld": 100,
            "stock": { "symbol": "zaza", "open": 1, "high": 2, "low": 0.5 }
        },
        {
            "numberHeld": 50,
            "stock": { "symbol": "xcxc", "open": 89, "high": 93.24, "low": 88.87 }
        }
    ]
}

This could represent a person's stock portfolio. We have chosen to embed the stock information into each portfolio document. In an environment where related data changes frequently, like a stock trading application, embedding data that changes frequently means that you are constantly updating each portfolio document every time a stock is traded.

Stock zaza may be traded many hundreds of times in a single day, and thousands of users could have zaza in their portfolio. With a data model like the one above, we would have to update many thousands of portfolio documents many times every day, leading to a system that won't scale well.
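The arithmetic behind "won't scale well" can be sketched as follows. The figures are hypothetical; the function simply multiplies the number of trades by the number of documents each trade has to rewrite, comparing the embedded model with storing the stock once.

```python
def daily_document_writes(trades_per_day: int, holders: int, embedded: bool) -> int:
    """Count document writes needed to keep one stock's data current for a day.

    Embedded model: every trade rewrites every portfolio that embeds the stock.
    Stock stored once: every trade rewrites only the single stock document.
    """
    return trades_per_day * (holders if embedded else 1)


# Hypothetical figures: 500 trades of zaza in a day, 5,000 portfolios holding it.
print(daily_document_writes(500, 5_000, embedded=True))   # 2500000
print(daily_document_writes(500, 5_000, embedded=False))  # 500
```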

Referencing data

Embedding data works nicely for many cases, but there are scenarios where denormalizing your data causes more problems than it is worth. So what do we do now?

Relational databases are not the only place where you can create relationships between entities. In a document database, you can have information in one document that relates to data in other documents. We do not recommend building systems in Azure Cosmos DB, or any other document database, that would be better suited to a relational database, but simple relationships are fine and can be useful.

In the JSON below, we use the stock portfolio example from earlier, but this time we refer to the stock item in the portfolio instead of embedding it. This way, when the stock item changes frequently throughout the day, the only document that needs to be updated is the single stock document.

Person document:
{
    "id": "1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "holdings": [
        { "numberHeld": 100, "stockId": "1" },
        { "numberHeld": 50, "stockId": "2" }
    ]
}

Stock documents:
{
    "id": "1",
    "symbol": "zaza",
    "open": 1,
    "high": 2,
    "low": 0.5,
    "vol": 11970000,
    "mkt-cap": 42000000,
    "pe": 5.89
},
{
    "id": "2",
    "symbol": "xcxc",
    "open": 89,
    "high": 93.24,
    "low": 88.87,
    "vol": 2970200,
    "mkt-cap": 1005000,
    "pe": 75.82
}

An immediate downside to this approach appears if your application is required to show information about each stock held when displaying a person's portfolio: in this case you would need to make multiple trips to the database to load the information for each stock document. Here we've made a decision to improve the efficiency of write operations, which happen frequently throughout the day, at the cost of read operations, which potentially have less impact on the performance of this particular system.
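The cost difference can be sketched with a hypothetical in-memory store that counts round trips; `CountingStore`, `read`, and `read_many` are illustrative names, not part of any Azure Cosmos DB SDK. Reading each referenced stock individually costs one trip per holding. One common mitigation, assuming the referenced ids are known up front, is to fetch them in a single query with `IN`.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a container that counts round trips."""

    def __init__(self, docs):
        self.docs = {d["id"]: d for d in docs}
        self.round_trips = 0

    def read(self, doc_id):
        # One point read per call.
        self.round_trips += 1
        return self.docs[doc_id]

    def read_many(self, doc_ids):
        # One query per call, e.g. SELECT * FROM c WHERE c.id IN ("1", "2")
        self.round_trips += 1
        return [self.docs[i] for i in doc_ids if i in self.docs]


stocks = [{"id": "1", "symbol": "zaza"}, {"id": "2", "symbol": "xcxc"}]
holdings = [{"numberHeld": 100, "stockId": "1"}, {"numberHeld": 50, "stockId": "2"}]

# Naive approach: one round trip per holding.
naive = CountingStore(stocks)
naive_symbols = [naive.read(h["stockId"])["symbol"] for h in holdings]

# Batched approach: one round trip for all holdings.
batched = CountingStore(stocks)
batched_symbols = [d["symbol"] for d in batched.read_many([h["stockId"] for h in holdings])]

print(naive.round_trips, batched.round_trips)  # 2 1
```

Batching narrows the gap but does not remove it: the portfolio still needs one read for the person plus at least one for the stocks.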

Note

Normalized data models can require more round trips to the server.

What about foreign keys?

Because there is currently no concept of a constraint, foreign key or otherwise, any inter-document relationships that you have in documents are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data a document refers to actually exists, you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
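A minimal application-side check might look like the following; `missing_references` is a hypothetical helper, and the set of existing stock ids would in practice come from a query. Checking and then writing from the application is not atomic: another client could delete a stock between the two steps.

```python
def missing_references(person: dict, stock_ids: set) -> list:
    """Return the stockIds in a portfolio that point at no existing stock document."""
    # Azure Cosmos DB does not enforce this relationship, so the check lives in the app.
    return [h["stockId"] for h in person["holdings"] if h["stockId"] not in stock_ids]


existing_stock_ids = {"1", "2"}  # hypothetical: would come from a query in practice
portfolio = {
    "id": "1",
    "holdings": [
        {"numberHeld": 100, "stockId": "1"},
        {"numberHeld": 10, "stockId": "9"},  # dangling reference
    ],
}
print(missing_references(portfolio, existing_stock_ids))  # ['9']
```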

When to reference

In general, use normalized data models when:

  • Representing one-to-many relationships.
  • Representing many-to-many relationships.
  • Related data changes frequently.
  • Referenced data could be unbounded.
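The two checklists can be condensed into a rough rule of thumb. This is a hypothetical sketch: the `Relationship` fields paraphrase the bullets in both lists, `suggest` is not an Azure Cosmos DB API, and real decisions also depend on your access patterns.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical summary of how child data relates to its parent."""
    bounded: bool        # one-to-few: the child collection will not grow without bound
    many_to_many: bool   # the same child data is shared across many parents
    changes_often: bool  # the child data is updated frequently
    read_together: bool  # the child data is usually queried with the parent


def suggest(rel: Relationship) -> str:
    """Return "embed" or "reference" following the two checklists."""
    # Any "when to reference" condition outweighs the embed conditions.
    if not rel.bounded or rel.many_to_many or rel.changes_often:
        return "reference"
    # Bounded, rarely changing, and read together: a good embedding candidate.
    return "embed" if rel.read_together else "reference"


# Addresses of a person, then comments on a blog post:
print(suggest(Relationship(bounded=True, many_to_many=False, changes_often=False, read_together=True)))   # embed
print(suggest(Relationship(bounded=False, many_to_many=False, changes_often=False, read_together=True)))  # reference
```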

Note

Typically, normalizing provides better write performance.

Where do I put the relationship?

The growth of the relationship will help determine in which document to store the reference.

Consider the JSON below, which models publishers and books.

Publisher document:
{
    "id": "mspress",
    "name": "Microsoft Press",
    "books": [ "1", "2", "3", ..., "100", ..., "1000" ]
}

Book documents:
{"id": "1", "name": "Azure Cosmos DB 101" }
{"id": "2", "name": "Azure Cosmos DB for RDBMS Users" }
{"id": "3", "name": "Taking over China one JSON doc at a time" }
...
{"id": "100", "name": "Learn about Azure Cosmos DB" }
...
{"id": "1000", "name": "Deep Dive into Azure Cosmos DB" }

If the number of books per publisher is small with limited growth, then storing the book reference inside the publisher document may be useful. However, if the number of books per publisher is unbounded, this data model leads to mutable, growing arrays, as in the example publisher document above.

Switching things around a bit results in a model that still represents the same data but avoids these large mutable collections.

Publisher document:
{
    "id": "mspress",
    "name": "Microsoft Press"
}

Book documents:
{"id": "1","name": "Azure Cosmos DB 101", "pub-id": "mspress"}
{"id": "2","name": "Azure Cosmos DB for RDBMS Users", "pub-id": "mspress"}
{"id": "3","name": "Taking over China one JSON doc at a time", "pub-id": "mspress"}
...
{"id": "100","name": "Learn about Azure Cosmos DB", "pub-id": "mspress"}
...
{"id": "1000","name": "Deep Dive into Azure Cosmos DB", "pub-id": "mspress"}

In the above example, we have dropped the unbounded collection on the publisher document. Instead, we just have a reference to the publisher on each book document.
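With the reference on the book side, listing a publisher's books becomes a query on the books rather than a read of a growing array. A minimal in-memory sketch, with a hypothetical extra book from another publisher; the comment shows the intended Azure Cosmos DB SQL filter, which uses bracket notation because `pub-id` contains a hyphen.

```python
books = [
    {"id": "1", "name": "Azure Cosmos DB 101", "pub-id": "mspress"},
    {"id": "2", "name": "Azure Cosmos DB for RDBMS Users", "pub-id": "mspress"},
    {"id": "x1", "name": "Hypothetical Other Title", "pub-id": "otherpress"},
]


def books_by_publisher(books: list, publisher_id: str) -> list:
    """Return the ids of the books that reference the given publisher."""
    # Intended SQL: SELECT c.id FROM c WHERE c["pub-id"] = @publisherId
    return [b["id"] for b in books if b.get("pub-id") == publisher_id]


print(books_by_publisher(books, "mspress"))  # ['1', '2']
```

Publishing a new book now writes only the new book document; the publisher document never changes.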

How do I model many:many relationships?

In a relational database, many:many relationships are often modeled with join tables, which just join records from other tables together.

Figure: Join tables

You might be tempted to replicate the same thing using documents and produce a data model that looks similar to the following.

Author documents:
{"id": "a1", "name": "Thomas Andersen" }
{"id": "a2", "name": "William Wakefield" }

Book documents:
{"id": "b1", "name": "Azure Cosmos DB 101" }
{"id": "b2", "name": "Azure Cosmos DB for RDBMS Users" }
{"id": "b3", "name": "Taking over China one JSON doc at a time" }
{"id": "b4", "name": "Learn about Azure Cosmos DB" }
{"id": "b5", "name": "Deep Dive into Azure Cosmos DB" }

Joining documents:
{"authorId": "a1", "bookId": "b1" }
{"authorId": "a2", "bookId": "b1" }
{"authorId": "a1", "bookId": "b2" }
{"authorId": "a1", "bookId": "b3" }

This would work. However, loading either an author with their books, or a book with its authors, would always require at least two additional queries against the database: one query to the joining document, and then another query to fetch the actual document being joined.

If all this join table is doing is gluing together two pieces of data, then why not drop it completely? Consider the following.

Author documents:
{"id": "a1", "name": "Thomas Andersen", "books": ["b1", "b2", "b3"]}
{"id": "a2", "name": "William Wakefield", "books": ["b1", "b4"]}

Book documents:
{"id": "b1", "name": "Azure Cosmos DB 101", "authors": ["a1", "a2"]}
{"id": "b2", "name": "Azure Cosmos DB for RDBMS Users", "authors": ["a1"]}
{"id": "b3", "name": "Learn about Azure Cosmos DB", "authors": ["a1"]}
{"id": "b4", "name": "Deep Dive into Azure Cosmos DB", "authors": ["a2"]}

Now, if I have an author, I immediately know which books they have written; conversely, if I have a book document loaded, I know the IDs of its author(s). This saves the intermediary query against the join table, reducing the number of server round trips your application has to make.
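The trade-off is that each new authorship must be recorded on both sides. A minimal sketch, assuming in-memory dicts keyed by id; `link` is a hypothetical helper. In a real container these are two separate document writes, so they are only atomic if both documents share a logical partition and are written in one transaction.

```python
authors = {
    "a1": {"id": "a1", "name": "Thomas Andersen", "books": ["b1", "b2", "b3"]},
    "a2": {"id": "a2", "name": "William Wakefield", "books": ["b1", "b4"]},
}
books = {
    "b1": {"id": "b1", "authors": ["a1", "a2"]},
    "b2": {"id": "b2", "authors": ["a1"]},
    "b3": {"id": "b3", "authors": ["a1"]},
    "b4": {"id": "b4", "authors": ["a2"]},
}


def link(author_id: str, book_id: str) -> None:
    """Record an authorship on both documents so either one can be read on its own."""
    author, book = authors[author_id], books[book_id]
    if book_id not in author["books"]:  # keep the arrays free of duplicates
        author["books"].append(book_id)
    if author_id not in book["authors"]:
        book["authors"].append(author_id)


link("a2", "b2")
print(authors["a2"]["books"], books["b2"]["authors"])
```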

Hybrid data models

We've now looked at embedding (or denormalizing) and referencing (or normalizing) data. As we have seen, each approach has its upsides and its compromises.

It doesn't always have to be one or the other; don't be scared to mix things up a little.

Based on your application's specific usage patterns and workloads, there may be cases where mixing embedded and referenced data makes sense and leads to simpler application logic with fewer server round trips, while still maintaining a good level of performance.

Consider the following JSON.

Author documents:
{
    "id": "a1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "countOfBooks": 3,
    "books": ["b1", "b2", "b3"],
    "images": [
        {"thumbnail": "https://....png"},
        {"profile": "https://....png"},
        {"large": "https://....png"}
    ]
},
{
    "id": "a2",
    "firstName": "William",
    "lastName": "Wakefield",
    "countOfBooks": 1,
    "books": ["b1"],
    "images": [
        {"thumbnail": "https://....png"}
    ]
}

Book documents:
{
    "id": "b1",
    "name": "Azure Cosmos DB 101",
    "authors": [
        {"id": "a1", "name": "Thomas Andersen", "thumbnailUrl": "https://....png"},
        {"id": "a2", "name": "William Wakefield", "thumbnailUrl": "https://....png"}
    ]
},
{
    "id": "b2",
    "name": "Azure Cosmos DB for RDBMS Users",
    "authors": [
        {"id": "a1", "name": "Thomas Andersen", "thumbnailUrl": "https://....png"}
    ]
}

Here we've (mostly) followed the embedded model, where data from other entities is embedded in the top-level document, but other data is referenced.

If you look at the array of authors in the book document, you can see a few interesting fields. There is an id field that we use to refer back to an author document, standard practice in a normalized model, but we also have name and thumbnailUrl. We could have stuck with id and left the application to get any additional information it needed from the respective author document using the "link", but because our application displays the author's name and a thumbnail picture with every book displayed, we can save a round trip to the server per book in a list by denormalizing some data from the author.

Of course, if the author's name changed or they wanted to update their photo, we'd have to go and update every book they ever published. But for our application, based on the assumption that authors don't change their names often, this is an acceptable design decision.
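That fan-out update can be sketched as below, assuming the book documents are already loaded in memory; `rename_author` is a hypothetical helper. In a real container each changed book is a separate write, which is the cost this design accepts in exchange for cheaper reads.

```python
def rename_author(books: list, author_id: str, new_name: str) -> int:
    """Rewrite the denormalized author name in every book; return how many entries changed."""
    changed = 0
    for book in books:
        for entry in book["authors"]:
            # Skip entries that already carry the new name so repeat runs are no-ops.
            if entry["id"] == author_id and entry["name"] != new_name:
                entry["name"] = new_name
                changed += 1
    return changed


books = [
    {"id": "b1", "authors": [{"id": "a1", "name": "Thomas Andersen"},
                             {"id": "a2", "name": "William Wakefield"}]},
    {"id": "b2", "authors": [{"id": "a1", "name": "Thomas Andersen"}]},
]
changed = rename_author(books, "a1", "Tom Andersen")  # hypothetical new name
print(changed)  # 2
```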

The example also includes pre-calculated aggregate values to save expensive processing on a read operation: some of the data embedded in the author document is calculated rather than entered directly. Every time a new book is published, a book document is created and the countOfBooks field is set to a calculated value based on the number of book documents that exist for that author. This optimization is good in read-heavy systems, where we can afford to do computations on writes in order to optimize reads.
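A minimal sketch of computing `countOfBooks` on the write path, assuming in-memory dicts and a hypothetical `publish_book` helper. It recomputes the count from the book documents, as the paragraph describes, but provides none of the transactional guarantees a real implementation would need.

```python
authors = {
    "a1": {"id": "a1", "countOfBooks": 1, "books": ["b1"]},
    "a2": {"id": "a2", "countOfBooks": 1, "books": ["b1"]},
}
books = [{"id": "b1", "authors": [{"id": "a1"}, {"id": "a2"}]}]


def publish_book(authors: dict, books: list, book: dict) -> None:
    """Insert a book and refresh the precomputed countOfBooks of each of its authors."""
    books.append(book)
    for ref in book["authors"]:
        author = authors[ref["id"]]
        author["books"].append(book["id"])
        # Recompute from the book documents so the stored count cannot drift.
        author["countOfBooks"] = sum(
            1 for b in books if any(r["id"] == author["id"] for r in b["authors"])
        )


publish_book(authors, books, {"id": "b4", "authors": [{"id": "a2"}]})
print(authors["a2"]["countOfBooks"], authors["a1"]["countOfBooks"])  # 2 1
```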

The ability to have a model with pre-calculated fields is made possible because Azure Cosmos DB supports multi-document transactions, scoped to a single logical partition. Many NoSQL stores cannot do transactions across documents and therefore advocate design decisions, such as "always embed everything", because of this limitation. With Azure Cosmos DB, you can use server-side triggers or stored procedures that insert books and update authors all within one ACID transaction, provided those documents share a partition key. You don't have to embed everything into one document just to be sure that your data remains consistent.

Distinguishing between different document types

In some scenarios, you may want to mix different document types in the same collection; this is usually the case when you want multiple related documents to sit in the same partition. For example, you could put both books and book reviews in the same collection and partition it by bookId. In such a situation, you usually want to add a field to your documents that identifies their type in order to differentiate them.

Book documents:
{
    "id": "b1",
    "name": "Azure Cosmos DB 101",
    "bookId": "b1",
    "type": "book"
}

Review documents:
{
    "id": "r1",
    "content": "This book is awesome",
    "bookId": "b1",
    "type": "review"
},
{
    "id": "r2",
    "content": "Best book ever!",
    "bookId": "b1",
    "type": "review"
}
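With the type field in place, the application can pick out one kind of document from a shared partition. A minimal in-memory sketch (the third review, for book b2, is hypothetical); the comment shows a single-partition query of the same shape.

```python
items = [
    {"id": "b1", "name": "Azure Cosmos DB 101", "bookId": "b1", "type": "book"},
    {"id": "r1", "content": "This book is awesome", "bookId": "b1", "type": "review"},
    {"id": "r2", "content": "Best book ever!", "bookId": "b1", "type": "review"},
    {"id": "r3", "content": "Hypothetical review", "bookId": "b2", "type": "review"},
]


def items_of_type(items: list, book_id: str, item_type: str) -> list:
    """Return the ids of one document type within a single bookId partition."""
    # Query of the same shape: SELECT * FROM c WHERE c.bookId = @bookId AND c["type"] = @type
    return [i["id"] for i in items if i["bookId"] == book_id and i["type"] == item_type]


print(items_of_type(items, "b1", "review"))  # ['r1', 'r2']
```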

Next steps

The biggest takeaway from this article is that data modeling in a schema-free world is as important as ever.

Just as there is no single way to represent a piece of data on a screen, there is no single way to model your data. You need to understand your application and how it will produce, consume, and process the data. Then, by applying some of the guidelines presented here, you can set about creating a model that addresses the immediate needs of your application. When your application needs to change, you can use the flexibility of a schema-free database to embrace that change and evolve your data model easily.

To learn more about Azure Cosmos DB, refer to the service's documentation page.

To understand how to shard your data across multiple partitions, refer to Partitioning Data in Azure Cosmos DB.

To learn how to model and partition data on Azure Cosmos DB using a real-world example, refer to Data Modeling and Partitioning - a Real-World Example.