What's new in Azure Synapse Analytics?
This page is continuously updated with a review of what's new in Azure Synapse Analytics, including the features that are currently in preview.
For older updates, review past Azure Synapse Analytics Blog posts or previous updates in Azure Synapse Analytics.
Features currently in preview
The following table lists the features of Azure Synapse Analytics that are currently in preview. Preview features are sorted alphabetically.
Feature | Learn more |
---|---|
Apache Spark Delta Lake tables in serverless SQL pools | The ability for serverless SQL pools to access Delta Lake tables created in Spark databases is in preview. For more information, see Azure Synapse Analytics shared metadata tables. |
Apache Spark elastic pool storage | Azure Synapse Analytics Spark pools now support elastic pool storage in preview. Elastic pool storage allows the Spark engine to monitor worker node temporary storage and attach more disks if needed. No action is required, and you should see fewer job failures as a result. For more information, see Azure Synapse Analytics Spark elastic pool storage. |
Apache Spark R language support | Built-in R support for Apache Spark is now in preview. |
Browse ADLS Gen2 folders in the Azure Synapse Analytics workspace | You can now browse an Azure Data Lake Storage Gen2 (ADLS Gen2) container or folder in your Azure Synapse Analytics workspace in Synapse Studio. To learn more, see Browse an ADLS Gen2 folder with ACLs in Azure Synapse Analytics. |
Capture changed data from Cosmos DB analytical store | Azure Cosmos DB analytical store now supports change data capture (CDC) for Azure Cosmos DB API for NoSQL and Azure Cosmos DB API for MongoDB. For more information, see Capture Changed Data from your Cosmos DB analytical store and DevBlog: Change Data Capture (CDC) with Azure Cosmos DB analytical store. |
Distribution Advisor | The Distribution Advisor is a new preview feature in Azure Synapse dedicated SQL pools Gen2 that analyzes queries and recommends the best distribution strategies for tables to improve query performance. For more information, see Distribution Advisor in Azure Synapse SQL. |
Reject options for delimited text files | Reject options for CREATE EXTERNAL TABLE on delimited files are in preview. |
Spark Advisor for Azure Synapse Notebook | The Spark Advisor for Azure Synapse Notebook analyzes code run by Spark and displays real-time advice for Notebooks. The Spark advisor offers recommendations for code optimization based on built-in common patterns, performs error analysis, and locates the root cause of failures. |
Time-To-Live in managed virtual network (VNet) | Reserve compute for the time-to-live (TTL) period in a managed virtual network, saving time and improving efficiency. For more information on this preview, see Announcing public preview of Time-To-Live (TTL) in managed virtual network. |
User-Assigned managed identities | Now you can use user-assigned managed identities in linked services for authentication in Synapse Pipelines and Dataflows. To learn more, see Credentials in Azure Data Factory and Azure Synapse. |
Generally available features
The following table lists the features of Azure Synapse Analytics that have transitioned from preview to general availability (GA) within the last 12 months.
Month | Feature | Learn more |
---|---|---|
April 2023 | Apache Spark Optimized Write | Optimize Write is a Delta Lake on Azure Synapse feature that reduces the number of files written by Apache Spark 3 (3.1 and 3.2) and aims to increase the individual file size of the written data. |
February 2023 | UTF-8 and Japanese collations support for dedicated SQL pools | Both UTF-8 support and Japanese collations are now generally available for dedicated SQL pools. |
February 2023 | Azure Synapse Runtime for Apache Spark 3.3 | The Azure Synapse Runtime for Apache Spark 3.3 is now generally available. Based on our testing using the 1TB TPC-H industry benchmark, you're likely to see up to 77% increased performance. |
December 2022 | SSIS IR Express virtual network injection | Both the standard and express methods to inject your SSIS Integration Runtime (IR) into a VNet are generally available now. For more information, see General Availability of Express Virtual Network injection for SSIS in Azure Data Factory. |
November 2022 | Azure Synapse Link for SQL | Azure Synapse Link for SQL is now generally available for both SQL Server 2022 and Azure SQL Database. The Azure Synapse Link for SQL feature provides low- and no-code, near real-time data replication from your SQL-based operational stores into Azure Synapse Analytics. Provide BI reporting on operational data in near real-time, with minimal impact on your operational store. To learn more, visit What is Azure Synapse Link for SQL? |
October 2022 | SAP CDC connector GA | The data connector for SAP Change Data Capture (CDC) is now GA. For more information, see Announcing Public Preview of the SAP CDC solution in Azure Data Factory and Azure Synapse Analytics and SAP CDC solution in Azure Data Factory. |
September 2022 | MERGE T-SQL syntax | MERGE T-SQL syntax has been a highly requested addition to the Synapse T-SQL library. As in SQL Server, the MERGE syntax encapsulates INSERTs/UPDATEs/DELETEs into a single high-performance statement. Available in dedicated SQL pools in version 10.0.17829 and above. For more, see the MERGE T-SQL announcement blog. |
July 2022 | Apache Spark™ 3.2 for Synapse Analytics | Apache Spark™ 3.2 for Synapse Analytics is now generally available. Review the official release notes and migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, read Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2. For highlights of what got better in Spark 3.2, see the Azure Synapse Analytics July Update 2022. |
July 2022 | Apache Spark in Azure Synapse Intelligent Cache feature | Intelligent Cache for Spark automatically stores each read within the allocated cache storage space, detecting underlying file changes and refreshing the files to provide the most recent data. To learn more, see how to Enable/Disable the cache for your Apache Spark pool. |
June 2022 | Map Data tool | The Map Data tool is a guided process to help you create ETL mappings and mapping data flows from your source data to Synapse without writing code. To learn more about the Map Data tool, read Map Data in Azure Synapse Analytics. |
June 2022 | User Defined Functions | User defined functions (UDFs) are now generally available. To learn more, read User defined functions in mapping data flows. |
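
The MERGE T-SQL syntax row in the table above describes folding INSERTs, UPDATEs, and DELETEs into a single statement. The following is a rough illustration of those semantics only, written in plain Python rather than T-SQL. The function name `merge`, the `is_deleted` flag, and the sample rows are hypothetical and are not part of any Synapse API.

```python
def merge(target, source, delete_flag="is_deleted"):
    """Apply source rows to target (both dicts keyed by id) in one pass.

    Mirrors the three branches a MERGE statement can express:
      matched and flagged for delete -> DELETE
      matched                        -> UPDATE
      not matched by target          -> INSERT
    Returns a count of each action taken.
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, row in source.items():
        # Strip the hypothetical control flag before storing the row.
        clean = {k: v for k, v in row.items() if k != delete_flag}
        if key in target:
            if row.get(delete_flag):
                del target[key]
                counts["deleted"] += 1
            else:
                target[key] = clean
                counts["updated"] += 1
        elif not row.get(delete_flag):
            target[key] = clean
            counts["inserted"] += 1
    return counts


target = {1: {"name": "Ana"}, 2: {"name": "Ben"}}
source = {
    1: {"name": "Ana M."},                    # matched: update
    2: {"name": "Ben", "is_deleted": True},   # matched and flagged: delete
    3: {"name": "Cy"},                        # not matched: insert
}
result = merge(target, source)
```

In a dedicated SQL pool the same decision is written declaratively with WHEN MATCHED and WHEN NOT MATCHED clauses; see the MERGE T-SQL announcement blog referenced in the table for the actual syntax.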
Apache Spark for Azure Synapse Analytics
This section summarizes recent new features and capabilities of Apache Spark for Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
April 2023 | Delta Lake - Low Shuffle Merge | Low Shuffle Merge optimization for Delta tables is now available in Apache Spark 3.2 pools. You can now update a Delta table with advanced conditions using the Delta Lake MERGE command. |
March 2023 | Library management new ability: in-line installation | %pip and %conda are now available in Apache Spark for Synapse! %pip and %conda are commands that can be used on Notebooks to install Python packages. For more information, see Manage session-scoped Python packages through %pip and %conda commands. |
January 2023 | Spark Advisor for Azure Synapse Notebook | The Spark Advisor for Azure Synapse Notebook analyzes code run by Spark and displays real-time advice for Notebooks. The Spark advisor offers recommendations for code optimization based on built-in common patterns, performs error analysis, and locates the root cause of failures. |
January 2023 | Improve Spark pool utilization with Synapse Genie | The Synapse Genie Framework improves Spark pool utilization by executing multiple Synapse notebooks on the same Spark pool instance. Read more about this metadata-driven utility written in Python. |
September 2022 | New informative Livy error codes | More precise error codes describe the cause of failure and replace the previous generic error codes. Previously, all errors in failing Spark jobs surfaced with the generic error code LIVY_JOB_STATE_DEAD. |
September 2022 | New query optimization techniques in Apache Spark for Azure Synapse Analytics | Read the findings from Microsoft's work to gain considerable performance benefits across the board on the reference TPC-DS workload as well as a significant reduction in query plan generation time. |
August 2022 | Apache Spark elastic pool storage | Azure Synapse Analytics Spark pools now support elastic pool storage in preview. Elastic pool storage allows the Spark engine to monitor worker node temporary storage and attach additional disks if needed. No action is required, and you should see fewer job failures as a result. For more information, see Blog: Azure Synapse Analytics Spark elastic pool storage is available for public preview. |
August 2022 | Apache Spark Optimized Write | Optimize Write is a Delta Lake on Synapse preview feature that reduces the number of files written by Apache Spark 3 (3.1 and 3.2) and aims to increase individual file size of the written data. To learn more, see The need for optimize write on Apache Spark. |
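
The Optimize Write rows above describe the goal of writing fewer, larger files. As a conceptual sketch only, the idea can be pictured as grouping many small partition outputs into files close to a target size. This is not the Synapse or Delta Lake implementation; the function name `plan_output_files` and the 128 MB target are assumptions chosen for illustration.

```python
def plan_output_files(partition_sizes_mb, target_file_mb=128):
    """Greedily group small partition outputs into files near a target size.

    partition_sizes_mb: sizes of the pieces each task would otherwise
    write as its own file. Returns the sizes of the planned output files.
    """
    files, current = [], 0
    for size in partition_sizes_mb:
        # Close the current file if adding this piece would overshoot.
        if current and current + size > target_file_mb:
            files.append(current)
            current = 0
        current += size
    if current:
        files.append(current)
    return files


small_outputs = [8] * 40           # 40 tasks, 8 MB each, would be 40 files
planned = plan_output_files(small_outputs)
```

Here 40 small pieces collapse into three planned files with the same total size, which is the kind of reduction in file count the feature aims for.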
Data integration
This section summarizes recent new features and capabilities of Azure Synapse Analytics data integration. Learn how to Load data into Azure Synapse Analytics using Azure Data Factory (ADF) or a Synapse pipeline.
Month | Feature | Learn more |
---|---|---|
April 2023 | Capture changed data from Cosmos DB analytical store (Public Preview) | Azure Cosmos DB analytical store now supports change data capture (CDC) for Azure Cosmos DB API for NoSQL and Azure Cosmos DB API for MongoDB. For more information, see Capture Changed Data from your Cosmos DB analytical store and DevBlog: Change Data Capture (CDC) with Azure Cosmos DB analytical store. |
March 2023 | Deep dive: Synapse pipelines storage event trigger security | This Customer Success Engineering blog post is a deep dive into Azure Synapse pipelines storage event trigger security. ADF and Synapse Pipelines offer a feature that allows pipeline execution to be triggered based on various events, such as storage blob creation or deletion. This can be used by customers to implement event-driven pipeline orchestration. |
January 2023 | SQL CDC incremental extract now supports numeric columns | Enabling incremental extract from SQL Server CDC in dataflows allows you to only process rows that have changed since the last time that pipeline was executed. Supported incremental column types now include date/time and numeric columns. |
December 2022 | Express virtual network injection | Both the standard and express methods to inject your SSIS Integration Runtime (IR) into a VNet are generally available now. For more information, see General Availability of Express Virtual Network injection for SSIS in Azure Data Factory. |
October 2022 | SAP CDC connector GA | The data connector for SAP Change Data Capture (CDC) is now GA. For more information, see Announcing Public Preview of the SAP CDC solution in Azure Data Factory and Azure Synapse Analytics and SAP CDC solution in Azure Data Factory. |
September 2022 | Gantt chart view | You can now view your activity runs with a Gantt chart in Azure Data Factory Integration Runtime monitoring. |
September 2022 | Monitoring improvements | We've released a new bundle of improvements to the monitoring experience based on community feedback. |
September 2022 | Maximum column optimization in mapping dataflow | For delimited text data sources such as CSVs, a new maximum columns setting allows you to set the maximum number of columns. |
September 2022 | NUMBER to integer conversion in Oracle data source connector | A new property, convertDecimalToInteger, converts the Oracle NUMBER type to a corresponding integer type in the source. For more information, see the Oracle source connector. |
September 2022 | Support for sending a body with HTTP request DELETE method in Web activity | New support for sending a body (optional) when using the DELETE method in Web activity. For more information, see the available Type properties for the Web activity. |
August 2022 | Mapping data flows now support visual Cast transformation | You can use the cast transformation to easily modify the data types of individual columns in a data flow. |
August 2022 | Default activity timeout changed to 12 hours | The default activity timeout is now 12 hours. |
August 2022 | Pipeline expression builder ease-of-use enhancements | We've updated our expression builder UI to make pipeline designing easier. |
August 2022 | New UI for mapping dataflow inline dataset types | We've updated our data flow source UI to make it easier to find your inline dataset type. |
July 2022 | Time-To-Live in managed virtual network (VNet) | Reserve compute for the time-to-live (TTL) period in a managed virtual network, saving time and improving efficiency. For more information on this preview, see Announcing public preview of Time-To-Live (TTL) in managed virtual network. |
June 2022 | SAP CDC connector preview | A new data connector for SAP Change Data Capture (CDC) is now available in preview. For more information, see Announcing Public Preview of the SAP CDC solution in Azure Data Factory and Azure Synapse Analytics and SAP CDC solution in Azure Data Factory. |
June 2022 | Fuzzy join option in Join Transformation | Fuzzy matching with a similarity threshold score slider has been added to the Join transformation in mapping data flows. |
June 2022 | Map Data tool GA | We're excited to announce that the Map Data tool is now Generally Available. The Map Data tool is a guided process to help you create ETL mappings and mapping data flows from your source data to Synapse without writing code. |
June 2022 | Rerun pipeline with new parameters | You can now change pipeline parameters when rerunning a pipeline from the Monitoring page without having to return to the pipeline editor. To learn more, read Rerun pipelines and activities. |
June 2022 | User Defined Functions GA | User defined functions (UDFs) in mapping data flows are now generally available (GA). |
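
The fuzzy join row above mentions matching with a similarity threshold score. To show what a threshold-based join means in general, here is a minimal sketch using Python's standard `difflib`. It does not reproduce the matching algorithm used by mapping data flows; the function name `fuzzy_join`, the 0.8 threshold, and the sample rows are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_join(left, right, key, threshold=0.8):
    """Pair left and right rows whose key text is similar enough.

    Similarity is difflib's ratio in [0, 1]; pairs scoring below the
    threshold are dropped, as an exact join would drop non-equal keys.
    """
    matches = []
    for left_row in left:
        for right_row in right:
            score = SequenceMatcher(
                None, left_row[key].lower(), right_row[key].lower()
            ).ratio()
            if score >= threshold:
                matches.append((left_row, right_row, score))
    return matches


customers = [{"name": "Contoso Ltd"}, {"name": "Fabrikam Inc"}]
vendors = [{"name": "Contoso Ltd."}, {"name": "Northwind Traders"}]
pairs = fuzzy_join(customers, vendors, key="name")
```

Raising the threshold makes the join stricter and closer to an exact match; lowering it admits more near matches at the cost of more false positives.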
Developer experience
This section summarizes recent new quality of life and feature improvements for developers in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
December 2022 | MSSparkUtils is the Swiss Army knife inside Synapse Spark | MSSparkUtils (Microsoft Spark utilities) is a built-in package that helps you easily perform common tasks, including sharing results between notebooks. |
July 2022 | Synapse Notebooks compatibility with IPython | The official kernel for Jupyter notebooks is IPython and it's now supported in Synapse Notebooks. For more information, see Synapse Notebooks is now fully compatible with IPython. |
July 2022 | Mssparkutils now has session.stop() method | A new API, mssparkutils.session.stop(), has been added to the mssparkutils package. This feature is handy when multiple sessions are running against the same Spark pool. The new API is available for Scala and Python. To learn more, see Stop an interactive session. |
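
As a small sketch of how the mssparkutils.session.stop() call mentioned above might be used at the end of a notebook, the helper below stops the session only when the Synapse utilities can be imported. The `notebookutils` import path is an assumption about the Synapse notebook environment, and the helper name `stop_session_if_in_synapse` is hypothetical.

```python
def stop_session_if_in_synapse():
    """Stop the current interactive Spark session when running in Synapse.

    Returns True if a stop was requested, or False when the Synapse-only
    package cannot be imported (for example, on a local machine).
    """
    try:
        # Assumption: this is the import path available in Synapse notebooks.
        from notebookutils import mssparkutils
    except ImportError:
        return False
    mssparkutils.session.stop()
    return True


stopped = stop_session_if_in_synapse()
```

Releasing the session explicitly frees pool resources for the other sessions that share the same Spark pool.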
Machine Learning
This section summarizes recent new features and improvements to machine learning models in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
November 2022 | R Support (preview) | Azure Synapse Analytics now provides built-in R support for Apache Spark, currently in preview. For an example, install an R library from CRAN and CRAN snapshots. |
August 2022 | SynapseML v0.10.0 | New release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. Learn more about the latest additions to SynapseML and get started with SynapseML. |
August 2022 | .NET support | SynapseML v0.10 adds full support for .NET languages like C# and F#. For a .NET SynapseML example, see .NET Example with LightGBMClassifier. |
August 2022 | Azure OpenAI Service support | SynapseML now allows users to tap into 175-billion-parameter language models (GPT-3) from OpenAI that can generate and complete text and code at near-human parity. For more information, see Azure OpenAI for Big Data. |
August 2022 | MLflow platform support | SynapseML models now integrate with MLflow with full support for saving, loading, deployment, and autologging. |
August 2022 | SynapseML in Binder | Spark can be intimidating for first-time users. With Binder, you can explore and experiment with SynapseML with no setup, installation, infrastructure, or Azure account required. |
Samples and guidance
This section summarizes new guidance and sample project resources for Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
March 2023 | Create a Data Solution on Azure Synapse Analytics with Snapshot Serengeti | This is a four-part series on building an end-to-end data analytics and machine learning solution on Azure Synapse Analytics. The dataset used in this solution is the Snapshot Serengeti dataset, which consists of a large-scale collection of camera trap images. |
March 2023 | Introduction to Kusto Query Language (KQL) | This Customer Success Engineering blog post provides an introduction to Kusto Query Language (KQL), a powerful query language for analyzing large volumes of structured, semi-structured, and unstructured (free text) data. |
March 2023 | Creating a custom disaster recovery plan for your Synapse workspace | A multi-part blog series on creating a disaster recovery plan for your Synapse workspace. |
March 2023 | Azure Synapse connectivity: public endpoints, private endpoints, managed VNet and managed private endpoints | A three-part expert-written blog series on Azure Synapse connectivity for the various networking options, including inbound dedicated pool public endpoint connectivity, Azure Synapse private endpoints, and managed VNet and managed private endpoints. |
February 2023 | Historical monitoring dashboards for Azure Synapse dedicated SQL pools | A walkthrough of the steps to enable historical monitoring using Azure Monitor Workbook templates on top of Azure Metrics and Azure Log Analytics. |
January 2023 | Read Data Lake with Synapse Serverless pools | A two-part guide on how to use OPENROWSET to query a path within the lake or use an external table to query a path within the lake. |
January 2023 | Structured streaming in Synapse Spark | A detailed example of streaming IoT temperature data from IoT devices into Synapse Spark. |
January 2023 | Create DNS alias for dedicated SQL pool in Synapse workspace for disaster recovery | A custom DNS alias for dedicated SQL pools (formerly SQL DW) can redirect client programs during a disaster. |
December 2022 | Azure Synapse - Data Lake vs. Delta Lake vs. Data Lakehouse | Read a new Success Engineering blog post demystifying the terms Data Lake, Delta Lake, and Data Lakehouse. |
November 2022 | How Data Exfiltration Protection (DEP) impacts Azure Synapse Analytics Pipelines | Data Exfiltration Protection (DEP) is a feature that enables additional restrictions on the ability of Azure Synapse Analytics to connect to other services. |
November 2022 | Getting started with REST APIs for Azure Synapse Analytics - Apache Spark Pool | We provide instructions on how to set up and use Synapse REST endpoints and describe the Apache Spark Pool operations supported by REST APIs. |
November 2022 | Synapse Spark Delta Time Travel | Delta Lake time travel enables point-in-time query snapshots and can even roll back erroneous updates. |
September 2022 | What is the difference between Synapse dedicated SQL pool (formerly SQL DW) and Serverless SQL pool? | Understand dedicated vs serverless pools and their concurrency. Read more at basic concepts of dedicated SQL pools and serverless SQL pools. |
September 2022 | Reading Delta Lake in dedicated SQL Pool | Sample script to import Delta Lake files directly into the dedicated SQL Pool and support features like time-travel. For an explanation, see Reading Delta Lake in dedicated SQL Pool. |
September 2022 | Azure Synapse Customer Success Engineering blog series | The new Azure Synapse Customer Success Engineering blog series launches with a detailed introduction to Building the Lakehouse - Implementing a Data Lake Strategy with Azure Synapse. |
June 2022 | Azure Orbital analytics with Synapse Analytics | We now offer an Azure Orbital analytics sample solution showing an end-to-end implementation of extracting, loading, transforming, and analyzing spaceborne data by using geospatial libraries and AI models with Azure Synapse Analytics. The sample solution also demonstrates how to integrate geospatial-specific Azure AI services models, AI models from partners, and bring-your-own-data models. |
June 2022 | Azure Synapse success by design | The Azure Synapse proof of concept playbook provides a guide to scope, design, execute, and evaluate a proof of concept for SQL or Spark workloads. |
Security
This section summarizes recent new security features and settings in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
December 2022 | How Data Exfiltration Protection (DEP) impacts Azure Synapse Analytics Pipelines | Data Exfiltration Protection (DEP) is a feature that enables additional restrictions on the ability of Azure Synapse Analytics to connect to other services. |
August 2022 | Execute Azure Synapse Spark Notebooks with system-assigned managed identity | You can now execute Spark Notebooks with the system-assigned managed identity (or workspace managed identity) by enabling Run as managed identity from the Configure session menu. With this feature, you are able to validate that your notebook works as expected when using the system-assigned managed identity, before using the notebook in a pipeline. For more information, see Managed identity for Azure Synapse. |
Azure Synapse Link
Azure Synapse Link is an automated system for replicating data from SQL Server or Azure SQL Database, Azure Cosmos DB, or Dataverse into Azure Synapse Analytics. This section summarizes recent news about the Azure Synapse Link feature.
Month | Feature | Learn more |
---|---|---|
November 2022 | Azure Synapse Link for SQL | Azure Synapse Link for SQL is now generally available for both SQL Server 2022 and Azure SQL Database. The Azure Synapse Link for SQL feature provides low- and no-code, near real-time data replication from your SQL-based operational stores into Azure Synapse Analytics. Provide BI reporting on operational data in near real-time, with minimal impact on your operational store. For more information, see What is Azure Synapse Link for SQL? |
July 2022 | Batch mode | Decide between cost and latency in Azure Synapse Link for SQL by selecting continuous or batch mode to replicate your data. Batch mode allows you to save even more on costs by only paying for ingestion service during the batch loads instead of it being continuously on. You can select between 20 and 60 minutes for batch processing. |
Synapse SQL
This section summarizes recent improvements and features in SQL pools in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
June 2023 | Updated diagnostic settings fields | Nine fields have been added to the dedicated SQL pool diagnostic settings logs. |
March 2023 | Create alerts for your Azure Synapse dedicated SQL pool | This Customer Success Engineering blog post provides steps to configure alerts for your Azure Synapse dedicated SQL pool and provides recommended alerts to get you started. |
March 2023 | Performance Tuning Synapse Dedicated Pools - Understanding the Query Lifecycle | This Customer Success Engineering blog post is a deep dive into Understanding Query Lifecycle to Maximize Performance. |
March 2023 | GREATEST and LEAST T-SQL syntax support | GREATEST and LEAST functions are now available in both serverless and dedicated SQL pools. These scalar-valued functions return the maximum and minimum value, respectively, out of a list of one or more expressions. |
February 2023 | UTF-8 and Japanese collations support for dedicated SQL pools | Both UTF-8 support and Japanese collations are now generally available for dedicated SQL pools. |
September 2022 | Auto-statistics for OPENROWSET in CSV datasets | Serverless SQL pool will automatically create statistics for CSV datasets when needed to ensure an optimal query execution plan for OPENROWSET queries. |
September 2022 | MERGE T-SQL syntax | T-SQL MERGE syntax has been a highly requested addition to the Synapse T-SQL library. MERGE encapsulates INSERTs/UPDATEs/DELETEs into a single statement. Available in dedicated SQL pools in version 10.0.17829 and above. For more, see the MERGE T-SQL announcement blog. |
August 2022 | Apache Spark Delta Lake tables in serverless SQL pools | The ability for serverless SQL pools to access Delta Lake tables created in Spark databases is in preview. For more information, see Azure Synapse Analytics shared metadata tables. |
August 2022 | Multi-column distribution in dedicated SQL pools | You can now Hash Distribute tables on multiple columns for a more even distribution of the base table, reducing data skew over time and improving query performance. For more information on opting-in to the preview, see CREATE TABLE distribution options or CREATE TABLE AS SELECT distribution options. |
August 2022 | Distribution Advisor | The Distribution Advisor is a new preview feature in Azure Synapse dedicated SQL pools Gen2 that analyzes queries and recommends the best distribution strategies for tables to improve query performance. For more information, see Distribution Advisor in Azure Synapse SQL. |
August 2022 | Add SQL objects and users in Lake databases | New capabilities announced for lake databases in serverless SQL pools: create schemas, views, procedures, and inline table-valued functions. You can also create database users from your Azure Active Directory domain and assign them to the db_datareader role. For more information, see Access lake databases using serverless SQL pool in Azure Synapse Analytics and Create and use native external tables using SQL pools in Azure Synapse Analytics. |
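
The GREATEST and LEAST row above describes scalar functions that pick the maximum or minimum from a list of expressions. The sketch below mimics that behavior in plain Python to show how a row-wise comparison differs from an aggregate such as MAX over a column. It assumes that NULL arguments are ignored unless every argument is NULL, which is how the T-SQL reference describes these functions. The function names and sample row are illustrative only.

```python
def greatest(*values):
    """Largest non-None argument, or None when every argument is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def least(*values):
    """Smallest non-None argument, or None when every argument is None."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


# One row with three columns; None stands in for SQL NULL.
row = {"q1": 120, "q2": None, "q3": 95}
best = greatest(row["q1"], row["q2"], row["q3"])
worst = least(row["q1"], row["q2"], row["q3"])
```

The comparison runs across the columns of a single row, so each row in a query gets its own result without needing a CASE expression or an unpivot step.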
Learn more
For older updates, review past Azure Synapse Analytics Blog posts or previous updates in Azure Synapse Analytics.
- Get started with Azure Synapse Analytics
- Introduction to Azure Synapse Analytics
- Realize Integrated Analytical Solutions with Azure Synapse Analytics
- Data integration at scale with Azure Data Factory or Azure Synapse Pipeline
- Microsoft Training Learning Paths for Azure Synapse
- Azure Synapse Analytics in Microsoft Q&A