Distribution Advisor in Azure Synapse SQL

Applies to: Azure Synapse Analytics dedicated SQL pools (formerly SQL DW)

In Azure Synapse SQL, each table is distributed using the strategy chosen by the customer (Round Robin, Hash Distributed, Replicated). The chosen distribution strategy can affect query performance substantially.

The Distribution Advisor (DA) feature of Azure Synapse SQL analyzes customer queries and recommends the best distribution strategies for tables to improve query performance. Queries to be considered by the advisor can be provided by the customer or pulled from historic queries available in DMV.

Note

Distribution Advisor is currently in preview for Azure Synapse Analytics. Preview features are meant for testing only and should not be used on production instances or production data. As a preview feature, Distribution Advisor is subject to undergo changes in behavior or functionality. Please also keep a copy of your test data if the data is important. Distribution Advisor does not support Multi-Column distributed tables.

Prerequisites

  • Execute the T-SQL statement SELECT @@version to ensure that your Azure Synapse Analytics dedicated SQL pool is version 10.0.15669 or higher. If your version is lower, a new version should automatically reach your provisioned dedicated SQL pools during their maintenance cycle.

  • Ensure that statistics are available and up-to-date before running the advisor. For more details, Manage table statistics, CREATE STATISTICS, and UPDATE STATISTICS articles for more details on statistics.

  • Enable the Azure Synapse distribution advisor for the current session with the SET RECOMMENDATIONS T-SQL command.

Analyze workload and generate distribution recommendations

The follow tutorial explains the sample use case for using the Distribution Advisor feature to analyze customer queries and recommend the best distribution strategies.

Distribution Advisor only analyzes queries run on user tables.

1. Create Distribution Advisor stored procedures

To run the advisor easily, create two new stored procedures in the database. Run the CreateDistributionAdvisor_PublicPreview script available for download from GitHub:

Command Description
dbo.write_dist_recommendation Defines queries that DA will analyze on. You can provide queries manually, or read from up to 100 past queries from the actual workloads in sys.dm_pdw_exec_requests.
dbo.read_dist_recommendation Runs the advisor and generates recommendations.

Here is an example of how you could run the advisor.

2a. Run the advisor on past workload in DMV

Run the following commands to read up to the last 100 queries in the workload for analysis and distribution recommendations:

EXEC dbo.write_dist_recommendation <Number of Queries max 100>, NULL
go
EXEC dbo.read_dist_recommendation;
go

To see which queries were analyzed by DA, run the e2e_queries_used_for_recommendations.sql script available for download from GitHub.

2b. Run the advisor on selected queries

The first parameter in dbo.write_dist_recommendation should be set to 0, and the second parameter is a semi-colon separated list of up to 100 queries that DA will analyze. In the below example, we want to see the distribution recommendation for two statements separated by semicolons, select count (*) from t1; and select * from t1 join t2 on t1.a1 = t2.a1;.

EXEC dbo.write_dist_recommendation 0, 'select count (*) from t1; select * from t1 join t2 on t1.a1 = t2.a1;'
go
EXEC dbo.read_dist_recommendation;
go

3. View recommendations

The dbo.read_dist_recommendation system stored procedure will return recommendations in the following format when execution is completed:

Column name Description
Table_name The table that DA analyzed. One line per table regardless of change in recommendation.
Current_Distribution Current table distribution strategy.
Recommended_Distribution Recommended distribution. This can be the same as Current_Distribution if there is no change recommended.
Distribution_Change_Command A CTAS T-SQL command to implement the recommendation.

4. Implement the advice

  • Run the CTAS command provided by Distribution Advisor to create new tables with the recommended distribution strategy.
  • Modify queries to run on new tables.
  • Execute queries on old and new tables to compare for performance improvements.

Troubleshooting

This section contains common troubleshooting scenarios and common mistakes that you may encounter.

1. Stale state from a previous run of the advisor

1a. Symptom:

You see this error message upon running the advisor:

Msg 110813, Level 16, State 1, Line 1
Calling GetLastScalarResult() before executing scalar subquery.
1b. Mitigation:
  • Verify that you are using single quotes '' to run the advisor on select queries.
  • Start a new session in SSMS and run the advisor.

2. Errors during running the advisor

2a. Symptom:

The 'result' pane shows CommandToInvokeAdvisorString below but does not show the RecommendationOutput below.

For example, you see only the Command_to_Invoke_Distribution_Advisor resultset.

Screenshot of the output of a T-SQL result showing the Command_to_Invoke_Distribution_Advisor.

But not the second resultset containing the table change T-SQL commands:

Screenshot of the output of a T-SQL result showing the Command_to_Invoke_Distribution_Advisor with a second resultset containing table change T-SQL commands.

2b. Mitigation:
  • Check the output of CommandToInvokeAdvisorString above.

  • Remove queries that may not be valid anymore which may have been added here from either the hand-selected queries or from the DMV by editing WHERE clause in: Queries Considered by DA.

3. Error during post-processing of recommendation output

3a. Symptom:

You see the following error message.

Invalid length parameter passed to the LEFT or SUBSTRING function.
3b. Mitigation:

Ensure that you have the most up to date version of the stored procedure from GitHub:

Next steps