Synapse SQL 池中的 Group By 选项Group by options in Synapse SQL pool

本文介绍在 SQL 池中实现 Group By 选项的技巧。In this article, you'll find tips for implementing group by options in SQL pool.

GROUP BY 的作用是什么?What does GROUP BY do?

GROUP BY T-SQL 子句用于将数据聚合成摘要行集。The GROUP BY T-SQL clause aggregates data to a summary set of rows. GROUP BY 具有 SQL 池不支持的一些选项。GROUP BY has some options that SQL pool doesn't support. 这些选项有解决方法,如下所示:These options have workarounds, which are as follows:

  • 带 ROLLUP 的 GROUP BYGROUP BY with ROLLUP
  • GROUPING SETSGROUPING SETS
  • 带 CUBE 的 GROUP BYGROUP BY with CUBE

Rollup 和 grouping sets 选项Rollup and grouping sets options

此处最简单的选项是使用 UNION ALL 来执行汇总,而不是依赖显式语法。The simplest option here is to use UNION ALL to perform the rollup rather than relying on the explicit syntax. 结果完全相同。The result is exactly the same.

下面的示例使用了具有 ROLLUP 选项的 GROUP BY 语句:The following example using the GROUP BY statement with the ROLLUP option:

SELECT [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
,      SUM(SalesAmount)             AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t       ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY ROLLUP (
                        [SalesTerritoryCountry]
                ,       [SalesTerritoryRegion]
                )
;

通过使用 ROLLUP,前面的示例请求以下聚合:By using ROLLUP, the preceding example requests the following aggregations:

  • 国家/地区和区域Country and Region
  • 国家/地区Country
  • 总计Grand Total

若要替换 ROLLUP 并返回相同的结果,可以使用 UNION ALL 并显式指定所需的聚合:To replace ROLLUP and return the same results, you can use UNION ALL and explicitly specify the required aggregations:

SELECT [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY
       [SalesTerritoryCountry]
,      [SalesTerritoryRegion]
UNION ALL
SELECT [SalesTerritoryCountry]
,      NULL
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey
GROUP BY
       [SalesTerritoryCountry]
UNION ALL
SELECT NULL
,      NULL
,      SUM(SalesAmount) AS TotalSalesAmount
FROM  dbo.factInternetSales s
JOIN  dbo.DimSalesTerritory t     ON s.SalesTerritoryKey       = t.SalesTerritoryKey;

若要替换 GROUPING SETS,示例原则也适用。To replace GROUPING SETS, the sample principle applies. 只需要为希望查看的聚合级别创建 UNION ALL 部分。You only need to create UNION ALL sections for the aggregation levels you want to see.

Cube 选项Cube options

可以使用 UNION ALL 方法创建 GROUP BY WITH CUBE。It's possible to create a GROUP BY WITH CUBE using the UNION ALL approach. 问题在于,代码可能很快就会变得庞大且失控。The problem is that the code can quickly become cumbersome and unwieldy. 若要缓解此问题,可以使用这种更高级的方法。To mitigate this issue, you can use this more advanced approach.

使用上面的示例,第一步是定义“cube”,它定义我们想要创建的所有聚合级别。Using the previous example, the first step is to define the 'cube' that defines all the levels of aggregation that we want to create.

记下两个派生表的交叉联接,因为这会生成所有级别。Take note of the CROSS JOIN of the two derived tables since this generates all the levels for us. 其余的代码用于设置格式:The rest of the code is there for formatting:

CREATE TABLE #Cube
WITH
(   DISTRIBUTION = ROUND_ROBIN
,   LOCATION = USER_DB
)
AS
WITH GrpCube AS
(SELECT    CAST(ISNULL(Country,'NULL')+','+ISNULL(Region,'NULL') AS NVARCHAR(50)) as 'Cols'
,          CAST(ISNULL(Country+',','')+ISNULL(Region,'') AS NVARCHAR(50))  as 'GroupBy'
,          ROW_NUMBER() OVER (ORDER BY Country) as 'Seq'
FROM       ( SELECT 'SalesTerritoryCountry' as Country
             UNION ALL
             SELECT NULL
           ) c
CROSS JOIN ( SELECT 'SalesTerritoryRegion' as Region
             UNION ALL
             SELECT NULL
           ) r
)
SELECT Cols
,      CASE WHEN SUBSTRING(GroupBy,LEN(GroupBy),1) = ','
            THEN SUBSTRING(GroupBy,1,LEN(GroupBy)-1)
            ELSE GroupBy
       END AS GroupBy  --Remove Trailing Comma
,Seq
FROM GrpCube;

下图显示了 CTAS 的结果:The following image shows the results of the CTAS:

按多维数据集分组

第二步是指定用于存储临时结果的目标表:The second step is to specify a target table for storing interim results:

DECLARE
 @SQL NVARCHAR(4000)
,@Columns NVARCHAR(4000)
,@GroupBy NVARCHAR(4000)
,@i INT = 1
,@nbr INT = 0
;
CREATE TABLE #Results
(
 [SalesTerritoryCountry] NVARCHAR(50)
,[SalesTerritoryRegion]  NVARCHAR(50)
,[TotalSalesAmount]      MONEY
)
WITH
(   DISTRIBUTION = ROUND_ROBIN
,   LOCATION = USER_DB
)
;

第三步是是循环访问执行聚合的列 cube。The third step is to loop over our cube of columns performing the aggregation. 此查询将针对 #Cube 临时表中的每一行运行一次。The query will run once for every row in the #Cube temporary table. 结果存储在 #Results 临时表中:The results are stored in the #Results temp table:

SET @nbr =(SELECT MAX(Seq) FROM #Cube);

WHILE @i<=@nbr
BEGIN
    SET @Columns = (SELECT Cols    FROM #Cube where seq = @i);
    SET @GroupBy = (SELECT GroupBy FROM #Cube where seq = @i);

    SET @SQL ='INSERT INTO #Results
              SELECT '+@Columns+'
              ,      SUM(SalesAmount) AS TotalSalesAmount
              FROM  dbo.factInternetSales s
              JOIN  dbo.DimSalesTerritory t  
              ON s.SalesTerritoryKey = t.SalesTerritoryKey
              '+CASE WHEN @GroupBy <>''
                     THEN 'GROUP BY '+@GroupBy ELSE '' END

    EXEC sp_executesql @SQL;
    SET @i +=1;
END

最后,可以通过从 #Results 临时表进行读取来返回结果:Lastly, you can return the results by reading from the #Results temporary table:

SELECT *
FROM #Results
ORDER BY 1,2,3
;

将代码拆分成不同的部分并生成循环构造以后,代码就会更易于管理和维护。By breaking up the code into sections and generating a looping construct, the code becomes more manageable and maintainable.

后续步骤Next steps

有关更多开发技巧,请参阅 开发概述For more development tips, see development overview.