Troubleshoot query acceleration over external delta tables

Applies to: ✅ Azure Data Explorer

The query acceleration policy enables accelerating queries over external delta tables by caching delta table metadata and data files. The policy defines which date ranges (number of days back and hot windows) are accelerated so that queries over those ranges can run faster.

The query acceleration feature consists of the following components:

  • A background job that maintains a local snapshot (catalog) of the delta table metadata.
  • A background job that caches delta table data files.
  • Query-time enhancements that utilize the catalog and the cached data.

To understand why things aren't working as expected, it's important to identify which of these components isn't functioning properly.

This article helps you troubleshoot scenarios where:

  • Query over an accelerated external delta table returns stale data, or
  • Query over an accelerated external delta table is slower than expected

Prerequisites

  1. Ensure query acceleration is enabled on the external table by running the following command:

    .show external table <ETName> policy query_acceleration
    | project isnotnull(Policy) and todynamic(Policy).IsEnabled
    

    If this command returns false, enable the query acceleration policy using the .alter query acceleration policy command.

  2. Ensure the delta table complies with the Delta protocol.

    The query acceleration feature assumes a delta table that complies with the Delta protocol. Manual operations executed directly on the delta table (for example, editing transaction logs or parquet files) aren't supported and may result in unexpected behavior.

    If such operations have been executed on the delta table, recreate the external table and re-enable the query acceleration policy.

Query is returning stale data

This is a data freshness issue: query results don't reflect the latest data from the underlying delta table.

Query acceleration refreshes the accelerated data periodically, so that results are no older than the configured MaxAge value in the policy. By design, queries over accelerated external tables may return data that lags behind the latest delta table version by up to MaxAge. Set MaxAge to the maximum data staleness that is acceptable at query time.

You can control the effective MaxAge in two ways:

Query isn't running fast enough

This is a performance issue: query is slower than expected, and acceleration doesn't appear to improve performance.

There are a few reasons why this could happen:

Check if catalog is unusable

Query acceleration uses a local catalog for the external table containing a snapshot of the delta table metadata. If this catalog hasn't been updated within the configured MaxAge (see the query acceleration policy's MaxAge property), it's considered unusable and isn't used at query time. In that case, queries fall back to reading the remote delta table directly, which can be significantly slower.

Fetch the current state of the catalog using the following command:

.show external table [ETName] details
| extend MinimumUpdateTime = now() - totimespan(todynamic(QueryAccelerationPolicy).MaxAge)
| project IsCatalogUnusable = MinimumUpdateTime > todatetime(todynamic(QueryAccelerationState).LastUpdatedDateTime)

IsCatalogUnusable == true indicates the catalog is stale and query acceleration won't be used.

Troubleshoot unusable catalogs

To understand why a catalog is unusable, first check if the query acceleration state is healthy and resolve unhealthy reasons as needed.

Run:

.show external table [ETName] details
| project state = todynamic(QueryAccelerationState)
| project IsHealthy = state.IsHealthy, UnhealthyReason = state.NotHealthyReason

Query acceleration policy was enabled recently

When the query acceleration policy is enabled for the first time, building the initial catalog needs to complete before it can be used in queries. During this period, the LastUpdatedDateTime value is empty:

.show external table [ETName] details
| project todynamic(QueryAccelerationState).LastUpdatedDateTime

If LastUpdatedDateTime is empty, allow some time for the first update to complete. This usually takes up to several minutes. Subsequent updates are expected to be significantly faster.

Query acceleration unhealthy state - understanding and mitigating

When an external table's query acceleration is unhealthy, you can retrieve the unhealthy reason using the following command:

.show external table [ETName] details
| project todynamic(QueryAccelerationState).NotHealthyReason

Use the following table to understand and mitigate common unhealthy states.

Note

To re-enable an external table's query acceleration policy, run the following commands:

.execute database script <|
.alter-merge external table [ETName] policy query_acceleration @'{"IsEnabled":false}'
.alter-merge external table [ETName] policy query_acceleration @'{"IsEnabled":true}'
Unhealthy reason Example NotHealthyReason Action
External table access is forbidden InaccessibleDeltaTable: Access to Delta table is forbidden Verify the connection string of the external table, including the authentication method, and that the permissions on the underlying storage are correct.
External table connection string doesn't point to a valid delta table DeltaTableNotFound: Delta table does not exist Alter the external table’s connection string so it targets a valid Delta table location. Make sure the path points to the table’s root folder - not the _delta_log directory.
Delta table column mapping mode has changed ColumnMappingModeChange: Column mapping mode has changed. Previous: 'None', New: 'Name' Recreate the external table and re-enable the query acceleration policy.
Delta table column mappings have changed NewColumnMapping: New column mapping was introduced. Column 'Col1' is now mapped to 'Col2' Recreate the external table and re-enable the query acceleration policy so that column mappings are aligned with the delta table.
Delta table column type has changed ColumnTypeMismatch: Column 'Col1' type has changed. Previous delta type: 'long', New type: 'string'. Respective external table type: long Recreate (or alter) the external table so that its schema is aligned with the delta table column types, and then re-enable query acceleration.
Hot datetime column not found HotDateTimeColumn 'Col1' does not exist as a datetime column in the Delta table schema Alter the query acceleration policy to include a valid HotDateTimeColumn (a column of type datetime in the delta table), or leave the property empty if not required.
Delta table has one of the following unsupported features for query acceleration
• Column mapping mode 'Name'
• AddFile transactions referencing files with absolute path
• deletion vectors with absolute path
Unsupported feature: Column mapping of type 'Id' Recreate the delta table with a supported configuration (for example, using Name column mapping type), and then re-enable query acceleration.
Managed identity error Managed identity must be specified for external tables with impersonation authentication. Ensure that the query acceleration policy contains a valid managed identity that has:
• Appropriate permissions on the Delta table
• The AutomatedFlows usage type in the cluster or database managed identity policy.

Check if query is over nonaccelerated data

To fully benefit from query acceleration, queries must be executed over accelerated data. Non-accelerated data is read directly from the remote delta table, which may result in significant latency.

Use the following command and filter on a time frame that includes the relevant query:

.show queries
| where StartedOn > ago(1h)
| extend ExternalDataStats = OverallQueryStats.input_dataset_statistics.external_data

If ExternalDataStats.iterated_artifacts or ExternalDataStats.downloaded_items are greater than 0, it means data was read from the remote delta table (non-accelerated path). The following section helps you understand why.

Troubleshoot queries over nonaccelerated data

There are two main reasons why a query might read non-accelerated data:

  • The query-time filter isn't fully within the query acceleration hot period or hot windows.
  • The data within the policy hot period isn't fully cached.

Query filter isn't fully within the hot period or hot windows

Run the following command to view the hot caching properties and make sure the query filters match them:

.show external table [ETName] policy query_acceleration
| project Policy = todynamic(Policy)
| project Policy.Hot, Policy.HotWindows

Ensure your query's time filter is fully contained within the configured Hot period or the defined HotWindows.

If it's a one-time query, policy change isn't recommended. However, if you anticipate running multiple queries over the same time range that lies outside the configured Hot period or defined HotWindows and require improved performance, alter the policy by:

  • Increasing the hot period, and/or
  • Adding additional hot windows that match your query patterns.

Data within the hot period isn't fully cached

Use the following command to check the acceleration progress:

.show external table [ETName] details
| project state = todynamic(QueryAccelerationState)
| project state.CompletionPercentage, state.PendingDataFilesCount

Understanding and mitigating data acceleration issues

Unaccelerated data (CompletionPercentage < 100) can stem from several issues.

Data is currently being accelerated

Data acceleration might take time, especially when:

  • A query acceleration policy has recently been enabled, or
  • A significant amount of data was recently added to the delta table, or
  • The delta table has undergone an optimization operation such as OPTIMIZE that results in many deleted and recreated files.

Frequently running OPTIMIZE or MERGE operations on the source delta table that cause large-scale rewrites of data files can negatively affect acceleration performance because data files are repeatedly rewritten, and need to be accelerated.

Data files aren't eligible for acceleration

Parquet data files larger than 1 GB won't be cached.

If your delta table includes many large files, consider adjusting your data generation or optimization strategy to produce smaller parquet files. If this requires recreating the Delta table, make sure you recreate the external table and re-enable query acceleration policy.

Insufficient cluster capacity or resources

Query acceleration operations are restricted by the cluster's available query acceleration capacity.

Run the following command to view the remaining capacity:

.show capacity
| where Resource == 'QueryAcceleration'
| project Remaining
  • If Remaining == 0 consistently and CompletionPercentage isn't increasing, consider:

    • Scaling the cluster out or up to provide more resources.
    • Increasing the QueryAcceleration capacity by altering the capacity policy. NOTE: altering the capacity policy may have adverse effects on other operations. Alter the policy as a last resort at your own discretion.