Azure Cosmos DB 中的联接Joins in Azure Cosmos DB

在关系数据库中,跨表联接是设计规范化架构的逻辑定理。In a relational database, joins across tables are the logical corollary to designing normalized schemas. 相比之下,SQL API 使用无架构项的反规范化数据模型,这在逻辑上等效于自联接。 In contrast, the SQL API uses the denormalized data model of schema-free items, which is the logical equivalent of a self-join.

内联会导致加入联接的集产生完整叉积。Inner joins result in a complete cross product of the sets participating in the join. N 向联接的结果是获得一组 N-元素元组,其中元组中的每个值与参与联接的别名集相关联,并且可以通过引用其他子句中的这些别名来访问。The result of an N-way join is a set of N-element tuples, where each value in the tuple is associated with the aliased set participating in the join and can be accessed by referencing that alias in other clauses.

语法Syntax

该语言支持语法 <from_source1> JOIN <from_source2> JOIN ... JOIN <from_sourceN>The language supports the syntax <from_source1> JOIN <from_source2> JOIN ... JOIN <from_sourceN>. 此查询返回一组包含 N 值的元组。This query returns a set of tuples with N values. 每个元组拥有通过对它们相应的集遍历所有容器别名所产生的值。Each tuple has values produced by iterating all container aliases over their respective sets.

请看以下 FROM 子句:<from_source1> JOIN <from_source2> JOIN ... JOIN <from_sourceN>Let's look at the following FROM clause: <from_source1> JOIN <from_source2> JOIN ... JOIN <from_sourceN>

让每个源定义 input_alias1, input_alias2, …, input_aliasNLet each source define input_alias1, input_alias2, …, input_aliasN. 此 FROM 子句将返回一组 N 元组(带有 N 个值的元组)。This FROM clause returns a set of N-tuples (tuple with N values). 每个元组拥有通过对它们相应的集遍历所有容器别名所产生的值。Each tuple has values produced by iterating all container aliases over their respective sets.

示例 1 - 2 个源 Example 1 - 2 sources

  • <from_source1> 的范围为容器,并表示集 {A, B, C}。Let <from_source1> be container-scoped and represent set {A, B, C}.

  • <from_source2> 作为文档范围的引用 input_alias1,表示的集如下所示:Let <from_source2> be document-scoped referencing input_alias1 and represent sets:

    input_alias1 = A,,表示集 {1, 2}{1, 2} for input_alias1 = A,

    input_alias1 = B,,表示集 {3}{3} for input_alias1 = B,

    input_alias1 = C,,表示集 {4, 5}{4, 5} for input_alias1 = C,

  • FROM 子句 <from_source1> JOIN <from_source2> 将产生以下元组:The FROM clause <from_source1> JOIN <from_source2> will result in the following tuples:

    (input_alias1, input_alias2):(input_alias1, input_alias2):

    (A, 1), (A, 2), (B, 3), (C, 4), (C, 5)

示例 2 - 3 个源 Example 2 - 3 sources

  • <from_source1> 的范围为容器,并表示集 {A, B, C}。Let <from_source1> be container-scoped and represent set {A, B, C}.

  • <from_source2> 作为文档范围的引用 input_alias1,表示的集如下所示:Let <from_source2> be document-scoped referencing input_alias1 and represent sets:

    input_alias1 = A,,表示集 {1, 2}{1, 2} for input_alias1 = A,

    input_alias1 = B,,表示集 {3}{3} for input_alias1 = B,

    input_alias1 = C,,表示集 {4, 5}{4, 5} for input_alias1 = C,

  • <from_source3> 作为文档范围的引用 input_alias2,表示的集如下所示:Let <from_source3> be document-scoped referencing input_alias2 and represent sets:

    input_alias2 = 1,,表示集 {100, 200}{100, 200} for input_alias2 = 1,

    input_alias2 = 3,,表示集 {300}{300} for input_alias2 = 3,

  • FROM 子句 <from_source1> JOIN <from_source2> JOIN <from_source3> 将产生以下元组:The FROM clause <from_source1> JOIN <from_source2> JOIN <from_source3> will result in the following tuples:

    (input_alias1, input_alias2, input_alias3):(input_alias1, input_alias2, input_alias3):

    (A, 1, 100), (A, 1, 200), (B, 3, 300)(A, 1, 100), (A, 1, 200), (B, 3, 300)

    备注

    input_alias1input_alias2 的其他值缺少元组,<from_source3> 不对其返回任何值。Lack of tuples for other values of input_alias1, input_alias2, for which the <from_source3> did not return any values.

示例 3 - 3 个源 Example 3 - 3 sources

  • 让 <from_source1> 的范围为容器,并表示集 {A, B, C}。Let <from_source1> be container-scoped and represent set {A, B, C}.

  • <from_source1> 的范围为容器,并表示集 {A, B, C}。Let <from_source1> be container-scoped and represent set {A, B, C}.

  • <from_source2> 作为引用 input_alias1 的文档作用域,并表示集:Let <from_source2> be document-scoped referencing input_alias1 and represent sets:

    input_alias1 = A,,表示集 {1, 2}{1, 2} for input_alias1 = A,

    input_alias1 = B,,表示集 {3}{3} for input_alias1 = B,

    input_alias1 = C,,表示集 {4, 5}{4, 5} for input_alias1 = C,

  • <from_source3> 的范围限定为 input_alias1,表示的集如下所示:Let <from_source3> be scoped to input_alias1 and represent sets:

    input_alias2 = A,,表示集 {100, 200}{100, 200} for input_alias2 = A,

    input_alias2 = C,,表示集 {300}{300} for input_alias2 = C,

  • FROM 子句 <from_source1> JOIN <from_source2> JOIN <from_source3> 将产生以下元组:The FROM clause <from_source1> JOIN <from_source2> JOIN <from_source3> will result in the following tuples:

    (input_alias1, input_alias2, input_alias3):(input_alias1, input_alias2, input_alias3):

    (A, 1, 100), (A, 1, 200), (A, 2, 100), (A, 2, 200), (C, 4, 300) , (C, 5, 300)(A, 1, 100), (A, 1, 200), (A, 2, 100), (A, 2, 200), (C, 4, 300) , (C, 5, 300)

    备注

    这会导致 <from_source2><from_source3> 产生叉积,因为两者的范围都为 <from_source1>This resulted in cross product between <from_source2> and <from_source3> because both are scoped to the same <from_source1>. 这会产生 4 (2x2) 个具有 A 值的元组、0 个具有 B (1x0) 值的元组、和 2 (2x1) 个具有 C 值的元组。This resulted in 4 (2x2) tuples having value A, 0 tuples having value B (1x0) and 2 (2x1) tuples having value C.

示例Examples

下面的示例演示了 JOIN 子句是如何工作的。The following examples show how the JOIN clause works. 在运行这些示例之前,请上传示例系列数据Before you run these examples, upload the sample family data. 在以下示例中,由于源中每个项和空集的叉积为空,因此结果为空:In the following example, the result is empty, since the cross product of each item from source and an empty set is empty:

    SELECT f.id
    FROM Families f
    JOIN f.NonExistent

结果为:The result is:

    [{
    }]

在以下示例中,联接是两个 JSON 对象、项根 id 和子根 children 之间的叉积。In the following example, the join is a cross product between two JSON objects, the item root id and the children subroot. children 是数组这一事实在联接中不起作用,因为查询处理的是作为 children 数组的单一根。The fact that children is an array isn't effective in the join, because it deals with a single root that is the children array. 由于每个带有数组的项的叉积仅生成一个项,因此结果仅包含两个结果。The result contains only two results, because the cross product of each item with the array yields exactly only one item.

    SELECT f.id
    FROM Families f
    JOIN f.children

结果有:The results are:

    [
      {
        "id": "AndersenFamily"
      },
      {
        "id": "WakefieldFamily"
      }
    ]

下面的示例演示了更传统的联接:The following example shows a more conventional join:

    SELECT f.id
    FROM Families f
    JOIN c IN f.children

结果有:The results are:

    [
      {
        "id": "AndersenFamily"
      },
      {
        "id": "WakefieldFamily"
      },
      {
        "id": "WakefieldFamily"
      }
    ]

JOIN 子句的 FROM 源是一个迭代器。The FROM source of the JOIN clause is an iterator. 因此,以上示例中的流程为:So, the flow in the preceding example is:

  1. 展开数组中的每个子元素 cExpand each child element c in the array.
  2. 应用包含项 f 的根的叉积,该项包含已在第一个步骤中平展的每个子元素 cApply a cross product with the root of the item f with each child element c that the first step flattened.
  3. 最后,单独投影根对象 f id 属性。Finally, project the root object f id property alone.

第一个项 (AndersenFamily) 仅包含一个 children 元素,因此结果集仅包含单个对象。The first item, AndersenFamily, contains only one children element, so the result set contains only a single object. 第二个项 WakefieldFamily 包含两个 children,因此,叉积为每个 children 元素生成一个对象,共两个对象。The second item, WakefieldFamily, contains two children, so the cross product produces two objects, one for each children element. 这两个项中的根字段会是相同的,正如在叉积中所预期的一样。The root fields in both these items are the same, just as you would expect in a cross product.

JOIN 子句真正实用的地方是通过以其他方式难以投影的形式基于叉积生成元组。The real utility of the JOIN clause is to form tuples from the cross product in a shape that's otherwise difficult to project. 以下示例对元组组合进行筛选,让用户选择元组在整体上满足的条件。The example below filters on the combination of a tuple that lets the user choose a condition satisfied by the tuples overall.

    SELECT 
        f.id AS familyName,
        c.givenName AS childGivenName,
        c.firstName AS childFirstName,
        p.givenName AS petName
    FROM Families f
    JOIN c IN f.children
    JOIN p IN c.pets

结果有:The results are:

    [
      {
        "familyName": "AndersenFamily",
        "childFirstName": "Henriette Thaulow",
        "petName": "Fluffy"
      },
      {
        "familyName": "WakefieldFamily",
        "childGivenName": "Jesse",
        "petName": "Goofy"
      }, 
      {
       "familyName": "WakefieldFamily",
       "childGivenName": "Jesse",
       "petName": "Shadow"
      }
    ]

以下示例对前一个示例做了延伸,将会执行双重联接。The following extension of the preceding example performs a double join. 可将叉积视为下面所示的伪代码:You could view the cross product as the following pseudo-code:

    for-each(Family f in Families)
    {
        for-each(Child c in f.children)
        {
            for-each(Pet p in c.pets)
            {
                return (Tuple(f.id AS familyName,
                  c.givenName AS childGivenName,
                  c.firstName AS childFirstName,
                  p.givenName AS petName));
            }
        }
    }

AndersenFamily 中有一个孩子拥有一只宠物,因此叉积从此家庭生成了一行 (1*1*1)。AndersenFamily has one child who has one pet, so the cross product yields one row (1*1*1) from this family. WakefieldFamily 中有两个孩子,其中只有一个孩子拥有宠物,但这个孩子拥有两只宠物。WakefieldFamily has two children, only one of whom has pets, but that child has two pets. 叉积对此家庭生成了 1*1*2 = 2 行。The cross product for this family yields 1*1*2 = 2 rows.

以下示例根据 pet 进行了额外的筛选,这排除了宠物名称不是 Shadow 的所有元组。In the next example, there is an additional filter on pet, which excludes all the tuples where the pet name is not Shadow. 可以基于数组生成元组,根据元组的任意元素进行筛选以及投影元素的任何组合。You can build tuples from arrays, filter on any of the elements of the tuple, and project any combination of the elements.

    SELECT 
        f.id AS familyName,
        c.givenName AS childGivenName,
        c.firstName AS childFirstName,
        p.givenName AS petName
    FROM Families f
    JOIN c IN f.children
    JOIN p IN c.pets
    WHERE p.givenName = "Shadow"

结果有:The results are:

    [
      {
       "familyName": "WakefieldFamily",
       "childGivenName": "Jesse",
       "petName": "Shadow"
      }
    ]

后续步骤Next steps