SQL warehouse sizing, scaling, and queuing behavior
This article explains the cluster sizing, queuing, and autoscaling behavior of SQL warehouses.
Cluster sizes for pro and classic SQL warehouses
The table in this section maps SQL warehouse cluster sizes to Azure Databricks cluster driver size and worker counts. The driver size only applies to pro and classic SQL warehouses.
Cluster size | Instance type for driver (applies only to pro and classic SQL warehouses) | Worker count |
---|---|---|
2X-Small | Standard_E8ds_v4 | 1 x Standard_E8ds_v4 |
X-Small | Standard_E8ds_v4 | 2 x Standard_E8ds_v4 |
Small | Standard_E16ds_v4 | 4 x Standard_E8ds_v4 |
Medium | Standard_E32ds_v4 | 8 x Standard_E8ds_v4 |
Large | Standard_E32ds_v4 | 16 x Standard_E8ds_v4 |
X-Large | Standard_E64ds_v4 | 32 x Standard_E8ds_v4 |
2X-Large | Standard_E64ds_v4 | 64 x Standard_E8ds_v4 |
3X-Large | Standard_E64ds_v4 | 128 x Standard_E8ds_v4 |
4X-Large | Standard_E64ds_v4 | 256 x Standard_E8ds_v4 |
The instance size of all workers is Standard_E8ds_v4.
Each driver and worker has eight 128 GB Standard LRS managed disks attached. Attached disks are charged hourly.
Required Azure vCPU quota for classic and pro SQL warehouses
To start a classic or pro SQL warehouse, you must have adequate Azure vCPU quota for Standard_E8ds_v4 instances in your Azure account. Use the following guidelines to determine the required vCPU quota:
- If you have only one or two SQL warehouses, ensure that you have 8 Azure vCPU available for each core in the cluster. This ensures that you have adequate Azure vCPU to account for the re-provisioning of your warehouse that happens roughly every 24 hours. If your SQL warehouses use auto-scaling or multi-cluster load balancing, you may need to increase the multiplier.
- As the number of SQL warehouses increases, allow for between 4 and 8 Azure vCPU for each core in the cluster. Databricks recommends starting with a larger number and monitoring for stability.
- Azure vCPUs used by SQL warehouses is in addition to Azure vCPUs used by clusters used by Data Science & Engineering or by non-Databricks workloads.
To request additional Azure vCPU quota, see Azure support request.
Note
The information in this table can vary based on product or region availability and workspace type.
Queueing and autoscaling for pro and classic SQL warehouses
Azure Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size. Azure Databricks recommends a cluster for every 10 concurrent queries. The maximum number of queries in a queue for all SQL warehouse types is 1000.
Azure Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes.
- If less than 2 minutes, don't upscale.
- If 2 to 6 minutes, add 1 cluster.
- If 6 to 12 minutes, add 2 clusters.
- If 12 to 22 minutes, add 3 clusters.
Otherwise, Azure Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.
In addition, a warehouse is always upscaled if a query waits for 5 minutes in the queue.
If the load is low for 15 minutes, Azure Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Azure Databricks keeps 3 clusters.
Query queuing for pro and classic SQL warehouses
Azure Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the STARTING
state. The maximum number of queries in a queue for all SQL warehouse types is 1000.
Metadata queries (for example, DESCRIBE <table>
) and state modifying queries (for example SET
) are never queued, unless the warehouse is in the STARTING
state.