数据仓库单位 (DWU) 和计算数据仓库单位 (cDWU)Data Warehouse Units (DWUs) and compute Data Warehouse Units (cDWUs)

针对选择理想数目的数据仓库单位(DWU、cDWU)来优化价格和性能以及如何更改单位数提供了建议。Recommendations on choosing the ideal number of data warehouse units (DWUs, cDWUs) to optimize price and performance, and how to change the number of units.

什么是数据仓库单位?What are Data Warehouse Units?

借助 SQL 数据仓库,将 CPU、内存和 IO 捆绑到称为数据仓库单位 (DWU) 的计算规模单位中。With SQL Data Warehouse CPU, memory, and IO are bundled into units of compute scale called Data Warehouse Units (DWUs). DWU 表示抽象、规范化的计算资源和性能度量值。A DWU represents an abstract, normalized measure of compute resources and performance. 通过更改服务级别,可更改分配给系统的 DWU 数,这反过来又会调整系统的性能和成本。By changing your service level you alter the number of DWUs that are allocated to the system, which in turn adjusts the performance, and the cost, of your system.

增加数据仓库单位数,则需支付的性能费用更高。To pay for higher performance, you can increase the number of data warehouse units. 减少数据仓库单位数,则需支付的性能费用更低。To pay for less performance, reduce data warehouse units. 存储和计算成本分别计费,因此更改数据仓库单位数不会影响存储成本。Storage and compute costs are billed separately, so changing data warehouse units does not affect storage costs.

数据仓库单位性能基于这些数据仓库负载指标:Performance for data warehouse units is based on these data warehouse workload metrics:

  • 标准数据仓库查询扫描大量行并执行复杂聚合的速度有多快?How fast can a standard data warehousing query scan a large number of rows and then perform a complex aggregation? 这是一种 I/O 和 CPU 密集型操作。This operation is I/O and CPU intensive.
  • 数据仓库从 Azure 存储 Blob 引入数据的速度有多快?How fast can the data warehouse ingest data from Azure Storage Blobs? 这是一种网络和 CPU 密集型操作。This operation is network and CPU intensive.
  • CREATE TABLE AS SELECT T-SQL 命令复制表的速度有多快?How fast can the CREATE TABLE AS SELECT T-SQL command copy a table? 此操作涉及从存储读取数据、将数据分配到设备的节点上,以及重新将数据写入到存储。This operation involves reading data from storage, distributing it across the nodes of the appliance and writing to storage again. 这是一种 CPU、IO 和网络密集型操作。This operation is CPU, IO, and network intensive.

增加 DWU:Increasing DWUs:

  • 以线性方式更改系统对扫描、聚合和 CTAS 语句的性能Linearly changes performance of the system for scans, aggregations, and CTAS statements
  • 增加 PolyBase 加载操作的读取器和编写器数量Increases the number of readers and writers for PolyBase load operations
  • 增加并发查询和并发槽的最大数量。Increases the maximum number of concurrent queries and concurrency slots.

服务级别目标Service Level Objective

服务级别目标 (SLO) 是确定数据仓库的成本和性能级别的可伸缩性设置。The Service Level Objective (SLO) is the scalability setting that determines the cost and performance level of your data warehouse. 第 2 代服务级别以计算数据仓库单位 (cDWU) 计量,例如 DW2000c。The service levels for Gen2 are measured in compute data warehouse units (cDWU), for example DW2000c. 第 1 代服务级别以 DWU 计量,例如 DW2000。Gen1 service levels are measured in DWUs, for example DW2000.

在 T-SQL 中,SERVICE_OBJECTIVE 设置确定了数据仓库的服务级别和性能层。In T-SQL the SERVICE_OBJECTIVE setting determines the service level and the performance tier for your data warehouse.

--Gen1
CREATE DATABASE myElasticSQLDW
WITH
(    SERVICE_OBJECTIVE = 'DW1000'
)
;

--Gen2
CREATE DATABASE myComputeSQLDW
WITH
(    SERVICE_OBJECTIVE = 'DW1000c'
)
;

性能层和数据仓库单位Performance Tiers and Data Warehouse Units

每个性能层用于其数据仓库单位测量的单位都略有不同。Each performance tier uses a slightly different unit of measure for their data warehouse units. 当规模单位直接转换为计费时,这种差异会反映在发票上。This difference is reflected on the invoice as the unit of scale directly translates to billing.

  • 第 1 代数据仓库以数据仓库单位计量 (DWU)。Gen1 data warehouses are measured in Data Warehouse Units (DWUs).
  • 第 2 代数据仓库以计算数据仓库单位计量 (cDWU)。Gen2 data warehouses are measured in compute Data Warehouse Units (cDWUs).

DWU 和 cDWU 都支持增加或减少计算,以及在无需使用数据仓库时暂停计算。Both DWUs and cDWUs support scaling compute up or down, and pausing compute when you don't need to use the data warehouse. 这些操作均可按需进行。These operations are all on-demand. 第 2 代还会在计算节点上使用基于本地磁盘的缓存以提高性能。Gen2 uses a local disk-based cache on the compute nodes to improve performance. 缩放或暂停系统时,缓存将失效,因此在达到最佳性能前,缓存需要预热一段时间。When you scale or pause the system, the cache is invalidated and so a period of cache warming is required before optimal performance is achieved.

增加数据库单位时,将以线性方式增加计算资源。As you increase data warehouse units, you are linearly increasing computing resources. 第 2 代可提供最佳查询性能和最大规模。Gen2 provides the best query performance and highest scale. 这些系统最大限度利用缓存。These systems make the most use of the cache.

容量限制Capacity limits

每个 SQL Server(例如 myserver.database.chinacloudapi.cn)都有一个数据库事务单位 (DTU) 配额,允许使用特定的数据仓库单位数。Each SQL server (for example, myserver.database.chinacloudapi.cn) has a Database Transaction Unit (DTU) quota that allows a specific number of data warehouse units. 有关详细信息,请参阅工作负荷管理容量限制For more information, see the workload management capacity limits.

我需要多少个数据仓库单位?How many data warehouse units do I need?

合适的数据仓库单位数很大程度上取决于工作负荷及已加载到系统的数据量。The ideal number of data warehouse units depends very much on your workload and the amount of data you have loaded into the system.

查找最适合工作负荷的 DWU 的步骤:Steps for finding the best DWU for your workload:

  1. 首先选择一个较小的 DWU。Begin by selecting a smaller DWU.
  2. 在测试数据加载到系统中时,监视应用程序性能,将所选 DWU 数目与观测到的性能变化进行比较。Monitor your application performance as you test data loads into the system, observing the number of DWUs selected compared to the performance you observe.
  3. 确认峰值活动周期的其他要求。Identify any additional requirements for periodic periods of peak activity. 如果工作负荷在活动中显示出重要的峰值和谷值,则有充分理由进行频繁缩放。If the workload shows significant peaks and troughs in activity and there is a good reason to scale frequently.

SQL 数据仓库是一个向外扩展系统,可预配大量计算和查询大量数据。SQL Data Warehouse is a scale-out system that can provision vast amounts of compute and query sizeable quantities of data. 要查看其真正的缩放功能(尤其是针对较大的 DWU),建议在缩放的同时对数据集进行缩放,确保可向 CPU 提供足够的数据。To see its true capabilities for scaling, especially at larger DWUs, we recommend scaling the data set as you scale to ensure that you have enough data to feed the CPUs. 对于规模测试,建议至少使用 1 TB。For scale testing, we recommend using at least 1 TB.

Note

此外,如果可以计算节点之间拆分工作,与多个并行化只会增加查询性能。Query performance only increases with more parallelization if the work can be split between compute nodes. 如果发现该缩放未更改性能,建议优化表设计和/或查询。If you find that scaling is not changing your performance, you may need to tune your table design and/or your queries. 有关查询优化指南,请参阅管理用户查询For query tuning guidance, see Manage user queries.

权限Permissions

更改数据仓库单位需要 ALTER DATABASE 中所述的权限。Changing the data warehouse units requires the permissions described in ALTER DATABASE.

Azure 资源的内置角色(如 SQL DB 参与者和 SQL Server 参与者)可以更改 DWU 设置。Built-in roles for Azure resources such as SQL DB Contributor and SQL Server Contributor can change DWU settings.

查看当前的 DWU 设置View current DWU settings

查看当前的 DWU 设置:To view the current DWU setting:

  1. 在 Visual Studio 中打开“SQL Server 对象资源管理器”。Open SQL Server Object Explorer in Visual Studio.
  2. 连接到与逻辑 SQL 数据库服务器关联的 master 数据库。Connect to the master database associated with the logical SQL Database server.
  3. 从 sys.database_service_objectives 动态管理视图中选择。Select from the sys.database_service_objectives dynamic management view. 以下是示例:Here is an example:
SELECT  db.name [Database]
,       ds.edition [Edition]
,       ds.service_objective [Service Objective]
FROM    sys.database_service_objectives   AS ds
JOIN    sys.databases                     AS db ON ds.database_id = db.database_id
;

更改数据仓库单位Change data warehouse units

Azure 门户Azure portal

更改 DWU 或 cDWU:To change DWUs or cDWUs:

  1. 打开 Azure 门户,打开数据库,并单击“缩放”。Open the Azure portal, open your database, and click Scale.

  2. 在“缩放”下,向左或向右移动滑块,以更改 DWU 设置。Under Scale, move the slider left or right to change the DWU setting.

  3. 单击“保存” 。Click Save. 此时会显示确认消息。A confirmation message appears. 单击“是”确认,或者单击“否”取消。Click yes to confirm or no to cancel.

PowerShellPowerShell

Note

本文进行了更新,以便使用新的 Azure PowerShell Az 模块。This article has been updated to use the new Azure PowerShell Az module. 你仍然可以使用 AzureRM 模块,至少在 2020 年 12 月之前,它将继续接收 bug 修补程序。You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. 若要详细了解新的 Az 模块和 AzureRM 兼容性,请参阅新 Azure Powershell Az 模块简介To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. 有关 Az 模块安装说明,请参阅安装 Azure PowerShellFor Az module installation instructions, see Install Azure PowerShell.

若要更改 DWU 或 cDWU,请使用 Set-AzSqlDatabase PowerShell cmdlet。To change the DWUs or cDWUs, use the Set-AzSqlDatabase PowerShell cmdlet. 以下示例将托管在服务器 MyServer 上的数据库 MySQLDW 的服务级别目标设置为 DW1000。The following example sets the service level objective to DW1000 for the database MySQLDW that is hosted on server MyServer.

Set-AzSqlDatabase -DatabaseName "MySQLDW" -ServerName "MyServer" -RequestedServiceObjectiveName "DW1000"

有关详细信息,请参阅适用于 SQL 数据仓库的 PowerShell cmdletFor more information, see PowerShell cmdlets for SQL Data Warehouse

T-SQLT-SQL

使用 T-SQL 可查看当前的 DWU 或 cDWU 设置、更改设置和检查进度。With T-SQL you can view the current DWU or cDWU settings, change the settings, and check the progress.

更改 DWU 或 cDWU:To change the DWUs or cDWUs:

  1. 连接到与逻辑 SQL 数据库服务器关联的 master 数据库。Connect to the master database associated with your logical SQL Database server.
  2. 使用 ALTER DATABASE TSQL 语句。Use the ALTER DATABASE TSQL statement. 以下示例将数据库 MySQLDW 的服务级别目标设置为 DW1000。The following example sets the service level objective to DW1000 for the database MySQLDW.
ALTER DATABASE MySQLDW
MODIFY (SERVICE_OBJECTIVE = 'DW1000')
;

REST APIREST APIs

若要更改 DWU,请使用创建或更新数据库 REST APITo change the DWUs, use the Create or Update Database REST API. 以下示例将托管在服务器 MyServer 上的数据库 MySQLDW 的服务级别目标设置为 DW1000。The following example sets the service level objective to DW1000 for the database MySQLDW which is hosted on server MyServer. 该服务器位于名为 ResourceGroup1 的 Azure 资源组中。The server is in an Azure resource group named ResourceGroup1.

PUT https://management.chinacloudapi.cn/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}?api-version=2014-04-01-preview HTTP/1.1
Content-Type: application/json; charset=UTF-8

{
    "properties": {
        "requestedServiceObjectiveName": DW1000
    }
}

有关更多 REST API 示例,请参阅适用于 SQL 数据仓库的 REST APIFor more REST API examples, see REST APIs for SQL Data Warehouse.

检查 DWU 更改的状态Check status of DWU changes

DWU 更改可能需要几分钟才能完成。DWU changes may take several minutes to complete. 如果要自动缩放,建议实现逻辑以确保某些操作在继续另一项操作前已完成。If you are scaling automatically, consider implementing logic to ensure that certain operations have been completed before proceeding with another action.

检查通过不同终结点的数据库状态可允许你正确实现自动化。Checking the database state through various endpoints allows you to correctly implement automation. 门户在完成操作和数据库当前状态时提供通知,但不允许以编程方式检查状态。The portal provides notification upon completion of an operation and the databases current state but does not allow for programmatic checking of state.

不能使用 Azure 门户检查向外扩展操作的数据库状态。You cannot check the database state for scale-out operations with the Azure portal.

检查 DWU 更改的状态:To check the status of DWU changes:

  1. 连接到与逻辑 SQL 数据库服务器关联的 master 数据库。Connect to the master database associated with your logical SQL Database server.
  2. 提交以下查询以检查数据库状态。Submit the following query to check database state.
SELECT    *
FROM      sys.databases
;
  1. 提交以下查询以检查操作状态Submit the following query to check status of operation
SELECT    *
FROM      sys.dm_operation_status
WHERE     resource_type_desc = 'Database'
AND       major_resource_id = 'MySQLDW'
;

此 DMV 返回针对 SQL 数据仓库的各种管理操作的相关信息,例如操作和操作状态(IN_PROGRESS 或 COMPLETED)。This DMV returns information about various management operations on your SQL Data Warehouse such as the operation and the state of the operation, which is either be IN_PROGRESS or COMPLETED.

缩放工作流The scaling workflow

启动缩放操作时,系统首先终止所有打开的会话,回退所有打开的事务以确保状态一致。When you initiate a scale operation, the system first kills all open sessions, rolling back any open transactions to ensure a consistent state. 对于缩放操作,缩放仅在此事务回退完成后才会发生。For scale operations, scaling only occurs after this transactional rollback has completed.

  • 对于增加操作,系统会预配额外计算,然后重新附加到存储层。For a scale-up operation, the system provisions the additional compute and then reattaches to the storage layer.
  • 对于减少操作,不需要的节点会从存储中分离出来并重新附加到剩余节点。For a scale-down operation, the unneeded nodes detach from the storage and reattach to the remaining nodes.

后续步骤Next steps

若要了解有关如何管理性能的详细信息,请参阅用于工作负荷管理的资源类内存和并发限制To learn more about managing performance, see Resource classes for workload management and Memory and concurrency limits.