What's New in Azure Synapse Analytics Archive
This article describes updates to Azure Synapse Analytics from previous months. For the most recent month's release, check out Azure Synapse Analytics latest updates. Each update links to the Azure Synapse Analytics blog and an article that provides more information.
Generally available features
The following table lists features of Azure Synapse Analytics that have transitioned from preview to general availability (GA).
Month | Feature | Learn more |
---|---|---|
July 2022 | Apache Spark™ 3.2 for Synapse Analytics | Apache Spark™ 3.2 for Synapse Analytics is now generally available. Review the official release notes and migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, read Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2. Read the highlights of what got better in Spark 3.2 in the Azure Synapse Analytics July Update 2022. |
July 2022 | Apache Spark in Azure Synapse Intelligent Cache feature | Intelligent Cache for Spark automatically stores each read within the allocated cache storage space, detecting underlying file changes and refreshing the files to provide the most recent data. To learn more, see how to Enable/Disable the cache for your Apache Spark pool. |
June 2022 | Map Data tool | The Map Data tool is a guided process to help you create ETL mappings and mapping data flows from your source data to Synapse without writing code. To learn more about the Map Data tool, read Map Data in Azure Synapse Analytics. |
June 2022 | User Defined Functions | User defined functions (UDFs) are now generally available. To learn more, read User defined functions in mapping data flows. |
April 2022 | Cross-subscription restore for Azure Synapse SQL | With the PowerShell Az.Sql module 3.8 update, the Restore-AzSqlDatabase cmdlet can be used for cross-subscription restore of dedicated SQL pools. To learn more, see Blog: Restore a dedicated SQL pool (formerly SQL DW) to a different subscription. This feature is now generally available for dedicated SQL pools (formerly SQL DW) and dedicated SQL pools in a Synapse workspace. What's the difference? |
April 2022 | Database Designer | The database designer allows users to visually create databases within Synapse Studio without writing a single line of code. For more information, see Announcing General Availability of Database Designer. Read more about lake databases and learn How to modify an existing lake database using the database designer. |
April 2022 | Synapse Monitoring Operator RBAC role | The Synapse Monitoring Operator RBAC (role-based access control) role allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications. For more information, review the Synapse RBAC Roles. |
March 2022 | Flowlets | Flowlets help you design portions of new data flow logic, or extract portions of an existing data flow, and save them as separate artifacts inside your Synapse workspace. Then, you can reuse these Flowlets inside other data flows. To learn more, review the Flowlets GA announcement blog post and read Flowlets in mapping data flow. |
March 2022 | Change Feed connectors | Changed data capture (CDC) feed data flow source transformations for Azure Cosmos DB, Azure Blob Storage, ADLS Gen2, and Common Data Model (CDM) are now generally available. By simply checking a box, you can tell ADF to manage a checkpoint automatically for you and only read the latest rows that were updated or inserted since the last pipeline run. To learn more, review the Change Feed connectors GA preview blog post and read Copy and transform data in Azure Data Lake Storage Gen2 using Azure Data Factory or Azure Synapse Analytics. |
March 2022 | Column level encryption for dedicated SQL pools | Column level encryption is now generally available for use on new and existing Azure SQL logical servers with Azure Synapse dedicated SQL pools and dedicated SQL pools in Azure Synapse workspaces. SQL Server Data Tools (SSDT) support for column level encryption for the dedicated SQL pools is available starting with the 17.2 Preview 2 build of Visual Studio 2022. |
March 2022 | Synapse Spark Common Data Model (CDM) connector | The CDM format reader/writer enables a Spark program to read and write CDM entities in a CDM folder via Spark dataframes. To learn more, see how the CDM connector supports reading, writing data, examples, & known issues. |
November 2021 | PREDICT | The T-SQL PREDICT syntax is now generally available for dedicated SQL pools. Get started with the Machine learning model scoring wizard for dedicated SQL pools. |
October 2021 | Synapse RBAC Roles | Synapse role-based access control (RBAC) roles are now generally available. Learn more about Synapse RBAC roles and Azure Synapse role-based access control (RBAC) using PowerShell. |
Apache Spark for Azure Synapse Analytics
This section is an archive of features and capabilities of Apache Spark for Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
May 2022 | Azure Synapse dedicated SQL pool connector for Apache Spark now available in Python | Previously, the Azure Synapse Dedicated SQL Pool Connector for Apache Spark was only available using Scala. Now, the dedicated SQL pool connector for Apache Spark can be used with Python on Spark 3. |
May 2022 | Manage Azure Synapse Apache Spark configuration | With the new Apache Spark configurations feature, you can create a standalone Spark configuration artifact with auto-suggestions and built-in validation rules. The Spark configuration artifact allows you to share your Spark configuration within and across Azure Synapse workspaces. You can also easily associate your Spark configuration with a Spark pool, a Notebook, and a Spark job definition for reuse and minimize the need to copy the Spark configuration in multiple places. |
April 2022 | Apache Spark 3.2 for Synapse Analytics | Apache Spark 3.2 for Synapse Analytics is now available in preview. Review the official Spark 3.2 release notes and migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, read Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2. |
April 2022 | Parameterization for Spark job definition | You can now assign parameters dynamically based on variables, metadata, or specifying Pipeline specific parameters for the Spark job definition activity. For more details, read Transform data using Apache Spark job definition. |
April 2022 | Apache Spark notebook snapshot | You can access a snapshot of the Notebook when there's a Pipeline Notebook run failure or when there's a long-running Notebook job. To learn more, read Transform data by running a Synapse notebook and Introduction to Microsoft Spark utilities. |
March 2022 | Synapse Spark Common Data Model (CDM) connector | The CDM format reader/writer enables a Spark program to read and write CDM entities in a CDM folder via Spark dataframes. To learn more, see how the CDM connector supports reading, writing data, examples, & known issues. |
March 2022 | Performance optimization for Synapse Spark dedicated SQL pool connector | New improvements to the Azure Synapse Dedicated SQL Pool Connector for Apache Spark reduce data movement and leverage COPY INTO. Performance tests indicated at least ~5x improvement over the previous version. No action is required from the user to leverage these enhancements. For more information, see Blog: Synapse Spark Dedicated SQL Pool (DW) Connector: Performance Improvements. |
March 2022 | Support for all Spark Dataframe SaveMode choices | The Azure Synapse Dedicated SQL Pool Connector for Apache Spark now supports all four Spark Dataframe SaveMode choices: Append, Overwrite, ErrorIfExists, Ignore. For more information on Spark SaveMode, read the official Apache Spark documentation. |
March 2022 | Apache Spark in Azure Synapse Analytics Intelligent Cache feature | Intelligent Cache for Spark automatically stores each read within the allocated cache storage space, detecting underlying file changes and refreshing the files to provide the most recent data. To learn more on this preview feature, see how to Enable/Disable the cache for your Apache Spark pool or see the blog post. |
Data integration
This section is an archive of features and capabilities of Azure Synapse Analytics data integration. Learn how to Load data into Azure Synapse Analytics using Azure Data Factory (ADF) or a Synapse pipeline.
Month | Feature | Learn more |
---|---|---|
June 2022 | SAP CDC connector preview | A new data connector for SAP Change Data Capture (CDC) is now available in preview. For more information, see Announcing Public Preview of the SAP CDC solution in Azure Data Factory and Azure Synapse Analytics and SAP CDC solution in Azure Data Factory. |
June 2022 | Fuzzy join option in Join Transformation | Fuzzy matching with a similarity threshold score slider has been added to the Join transformation in Mapping Data Flows. |
June 2022 | Map Data tool GA | We're excited to announce that the Map Data tool is now Generally Available. The Map Data tool is a guided process to help you create ETL mappings and mapping data flows from your source data to Synapse without writing code. |
June 2022 | Rerun pipeline with new parameters | You can now change pipeline parameters when rerunning a pipeline from the Monitoring page without having to return to the pipeline editor. To learn more, read Rerun pipelines and activities. |
June 2022 | User Defined Functions GA | User defined functions (UDFs) in mapping data flows are now generally available (GA). |
May 2022 | Export pipeline monitoring as a CSV | The ability to export pipeline monitoring to CSV and other monitoring improvements have been introduced to ADF. |
May 2022 | Automatic incremental source data loading from PostgreSQL and MySQL | Automatic incremental source data loading from PostgreSQL and MySQL to Synapse SQL and Azure Database is now natively available in ADF. |
May 2022 | Assert transformation error handling | Error handling has now been added to sinks following an assert transformation in mapping data flow. You can now choose whether to output the failed rows to the selected sink or to a separate file. |
May 2022 | Mapping data flows projection editing | In mapping data flows, you can now update source projection column names and column types. |
April 2022 | Dataverse connector for Synapse Data Flows | Dataverse is now a source and sink connector to Synapse Data Flows. You can Copy and transform data from Dynamics 365 (Microsoft Dataverse) or Dynamics CRM using Azure Data Factory or Azure Synapse Analytics. |
April 2022 | Configurable Synapse Pipelines Web activity response timeout | With the response timeout property httpRequestTimeout, you can define a timeout for the HTTP request up to 10 minutes. Web activities work exceptionally well with APIs that follow the asynchronous request-reply pattern, a suggested approach for building scalable web APIs/services. |
March 2022 | sFTP connector for Synapse data flows | A native sftp connector in Synapse data flows is supported to read and write data from sFTP using the visual low-code data flows interface in Synapse. To learn more, see Copy and transform data in SFTP server using Azure Data Factory or Azure Synapse Analytics. |
March 2022 | Data flow improvements to Data Preview | Review features added to the Data Preview and debug improvements in Mapping Data Flows. |
March 2022 | Pipeline script activity | You can now Transform data by using the Script activity to invoke SQL commands to perform both DDL and DML. |
December 2021 | Custom partitions for Synapse link for Azure Cosmos DB | Improve query execution times for your Spark queries, by creating custom partitions based on fields frequently used in your queries. To learn more, see Custom partitioning in Azure Synapse Link for Azure Cosmos DB (Preview). |
Database Designer
This section is an archive of features and capabilities of the database designer.
Month | Feature | Learn more |
---|---|---|
April 2022 | Database Designer | The database designer allows users to visually create databases within Synapse Studio without writing a single line of code. For more information, see Announcing General Availability of Database Designer. Read more about lake databases and learn How to modify an existing lake database using the database designer. |
April 2022 | Clone lake database | In Synapse Studio, you can now clone a database using the action menu available on the lake database. To learn more, read How-to: Clone a lake database. |
April 2022 | Use wildcards to specify custom folder hierarchies | Lake databases sit on top of data that is in the lake and this data can live in nested folders that don't fit into clean partition patterns. You can now use wildcards to specify custom folder hierarchies. To learn more, read How-to: Modify a datalake. |
Developer experience
This section is an archive of quality of life and feature improvements for developers in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
May 2022 | Updated Azure Synapse Analyzer Report | Learn about the new features in version 2.0 of the Synapse Analyzer report. |
April 2022 | Azure Synapse Analyzer Report | The Azure Synapse Analyzer Report helps you identify common issues that may be present in your database that can lead to performance issues. |
April 2022 | Reference unpublished notebooks | Now, when using %run notebooks, you can enable 'unpublished notebook reference', which will allow you to reference unpublished notebooks. When enabled, notebook run will fetch the current contents in the notebook web cache, meaning the changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode). |
March 2022 | Code cells with exception to show standard output | Now in Synapse notebooks, both standard output and exception messages are shown when a code statement fails for Python and Scala languages. For examples, see Synapse notebooks: Code cells with exception to show standard output. |
March 2022 | Partial output is available for running notebook code cells | Now in Synapse notebooks, you can see anything you write (with println commands, for example) as the cell executes, instead of waiting until it ends. For examples, see Synapse notebooks: Partial output is available for running notebook code cells. |
March 2022 | Dynamically control your Spark session configuration with pipeline parameters | Now in Synapse notebooks, you can use pipeline parameters to configure the session with the notebook %%configure magic. For examples, see Synapse notebooks: Dynamically control your Spark session configuration with pipeline parameters. |
March 2022 | Reuse and manage notebook sessions | Now in Synapse notebooks, it's easy to reuse an active session conveniently without having to start a new one and to see and manage your active sessions in the Active sessions list. To view your sessions, select the 3 dots in the notebook and select Manage sessions. For examples, see Synapse notebooks: Reuse and manage notebook sessions. |
March 2022 | Support for Python logging | Now in Synapse notebooks, anything written through the Python logging module is captured, in addition to the driver logs. For examples, see Synapse notebooks: Support for Python logging. |
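As a minimal sketch of the Python logging support listed above, the following uses only the standard `logging` module. The logger name `synapse_notebook_demo` and the `load_rows` helper are hypothetical, and the in-memory handler is attached only so the sketch is self-contained outside a notebook. How Synapse surfaces these records is not shown here.

```python
import io
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("synapse_notebook_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's output confined to our own handler

# Write records to an in-memory stream so the example runs anywhere.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)


def load_rows(rows):
    """Hypothetical helper that logs its progress through the logging module."""
    logger.info("loading %d rows", len(rows))
    if not rows:
        logger.warning("no rows to load")
    return len(rows)


count = load_rows([1, 2, 3])
captured = stream.getvalue()
print(captured, end="")
```

In a notebook cell you would typically only need the `logging.getLogger(...)` call and the log statements. The explicit handler above is an assumption made to keep the example independent of the notebook environment.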
Machine Learning
This section is an archive of features and improvements to machine learning models in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
November 2021 | PREDICT | The T-SQL PREDICT syntax is now generally available for dedicated SQL pools. Get started with the Machine learning model scoring wizard for dedicated SQL pools. |
Samples and guidance
This section is an archive of guidance and sample project resources for Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
June 2022 | Azure Orbital analytics with Synapse Analytics | We now offer an Azure Orbital analytics sample solution showing an end-to-end implementation of extracting, loading, transforming, and analyzing spaceborne data by using geospatial libraries and AI models with Azure Synapse Analytics. The sample solution also demonstrates how to integrate geospatial-specific Azure AI services models, AI models from partners, and bring-your-own-data models. |
June 2022 | Azure Synapse success by design | The Azure Synapse proof of concept playbook provides a guide to scope, design, execute, and evaluate a proof of concept for SQL or Spark workloads. |
Security
This section is an archive of security features and settings in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
April 2022 | Synapse Monitoring Operator RBAC role | The Synapse Monitoring Operator role-based access control (RBAC) role allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications. For more information, review the Synapse RBAC Roles. |
March 2022 | Enforce minimal TLS version | You can now raise or lower the minimum TLS version for dedicated SQL pools in Synapse workspaces. To learn more, see Azure SQL connectivity settings. The workspace managed SQL API can be used to modify the minimum TLS settings. |
March 2022 | Azure Synapse Analytics now supports Azure Active Directory (Azure AD) only authentication | You can now use Azure Active Directory authentication to centrally manage access to all Azure Synapse resources, including SQL pools. You can disable local authentication upon creation or after a workspace is created through the Azure portal. |
December 2021 | User-Assigned managed identities | Now you can use user-assigned managed identities in linked services for authentication in Synapse Pipelines and Dataflows. To learn more, see Credentials in Azure Data Factory and Azure Synapse. |
December 2021 | Browse ADLS Gen2 folders in the Azure Synapse Analytics workspace | You can now browse and secure an Azure Data Lake Storage Gen2 (ADLS Gen2) container or folder in your Azure Synapse Analytics workspace by connecting to a specific container or folder in Synapse Studio. |
December 2021 | TLS 1.2 enforced for new Synapse Workspaces | Starting in December 2021, a requirement for TLS 1.2 has been implemented for new Synapse Workspaces only. |
Azure Synapse Link
Azure Synapse Link is an automated system for replicating data from SQL Server or Azure SQL Database, Azure Cosmos DB, or Dataverse into Azure Synapse Analytics. This section is an archive of news about the Azure Synapse Link feature.
Month | Feature | Learn more |
---|---|---|
May 2022 | Azure Synapse Link for SQL preview | Azure Synapse Link for SQL is in preview for both SQL Server 2022 and Azure SQL Database. The Azure Synapse Link feature provides low- and no-code, near real-time data replication from your SQL-based operational stores into Azure Synapse Analytics. Provide BI reporting on operational data in near real-time, with minimal impact on your operational store. The Azure Synapse Link for SQL preview has been announced. For more information, see Blog: Azure Synapse Link for SQL Deep Dive. |
Synapse SQL
This section is an archive of improvements and features in SQL pools in Azure Synapse Analytics.
Month | Feature | Learn more |
---|---|---|
June 2022 | Result set size limit increase | The maximum size of query result sets in serverless SQL pools has been increased from 200 GB to 400 GB. |
May 2022 | Automatic character column length calculation for serverless SQL pools | It's no longer necessary to define character column lengths for serverless SQL pools in the data lake. You can get optimal query performance without having to define the schema, because the serverless SQL pool will use automatically calculated average column lengths and cardinality estimation. |
April 2022 | Cross-subscription restore for Azure Synapse SQL GA | With the PowerShell Az.Sql module 3.8 update, the Restore-AzSqlDatabase cmdlet can be used for cross-subscription restore of dedicated SQL pools. To learn more, see Restore a dedicated SQL pool to a different subscription. This feature is now generally available for dedicated SQL pools (formerly SQL DW) and dedicated SQL pools in a Synapse workspace. What's the difference? |
April 2022 | Recover SQL pool from dropped server or workspace | With the PowerShell Restore cmdlets in Az.Sql and Az.Synapse modules, you can now restore from a deleted server or workspace without filing a support ticket. For more information, see Restore a dedicated SQL pool from a deleted Azure Synapse workspace or Restore a standalone dedicated SQL pool (formerly SQL DW) from a deleted server, depending on your scenario. |
March 2022 | Column level encryption for dedicated SQL pools | Column level encryption is now generally available for use on new and existing Azure SQL logical servers with Azure Synapse dedicated SQL pools and dedicated SQL pools in Azure Synapse workspaces. SQL Server Data Tools (SSDT) support for column level encryption for the dedicated SQL pools is available starting with the 17.2 Preview 2 build of Visual Studio 2022. |
March 2022 | Parallel execution for CETAS | Better performance for CREATE EXTERNAL TABLE AS SELECT (CETAS) and subsequent SELECT statements is now made possible by the use of parallel execution plans. For examples, see Better performance for CETAS and subsequent SELECTs. |
Previous monthly updates in Azure Synapse Analytics
What follows is the previous format of monthly news updates for Synapse Analytics.
June 2022 update
General
Azure Orbital analytics with Synapse Analytics - We now offer an Azure Orbital analytics sample solution showing an end-to-end implementation of extracting, loading, transforming, and analyzing spaceborne data by using geospatial libraries and AI models with Azure Synapse Analytics. The sample solution also demonstrates how to integrate geospatial-specific Azure AI services models, AI models from partners, and bring-your-own-data models.
Azure Synapse success by design - Project success is no accident and requires careful planning and execution. The Synapse Analytics' Success by Design playbooks are now available. The Azure Synapse proof of concept playbook provides a guide to scope, design, execute, and evaluate a proof of concept for SQL or Spark workloads. These guides contain best practices from the most challenging and complex solution implementations incorporating Azure Synapse. To learn more about the Azure Synapse proof of concept playbook, read Success by Design.
SQL
Result set size limit increase - We know that you turn to Azure Synapse Analytics to work with large amounts of data. With that in mind, the maximum size of query result sets in Serverless SQL pools has been increased from 200 GB to 400 GB. This limit is shared between concurrent queries. To learn more about this size limit increase and other constraints, read Self-help for serverless SQL pool.
Data integration
Fuzzy Join option in Join Transformation - Fuzzy matching with a sliding similarity score option has been added to the Join transformation in Mapping Data Flows. You can create inner and outer joins on data values that are similar rather than exact matches. Previously, you would have had to use an exact match. The sliding scale value goes from 60% to 100%, making it easy to adjust the similarity threshold of the match. To learn more about fuzzy joins, read Join transformation in mapping data flow.
Map Data [Generally Available] - We're excited to announce that the Map Data tool is now Generally Available. The Map Data tool is a guided process to help you create ETL mappings and mapping data flows from your source data to Synapse without writing code. To learn more about Map Data, read Map Data in Azure Synapse Analytics.
Rerun pipeline with new parameters - You can now change pipeline parameters when rerunning a pipeline from the Monitoring page without having to return to the pipeline editor. After running a pipeline with new parameters, you can easily monitor the new run against the old ones without having to toggle between pages. To learn more about rerunning pipelines with new parameters, read Rerun pipelines and activities.
User Defined Functions [Generally Available] - We're excited to announce that user defined functions (UDFs) are now Generally Available. With user-defined functions, you can create customized expressions that can be reused across multiple mapping data flows. You no longer have to use the same string manipulation, math calculations, or other complex logic several times. User-defined functions will be grouped in libraries to help developers group common sets of functions. To learn more about user defined functions, read User defined functions in mapping data flows.
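To illustrate the idea behind the fuzzy join threshold described in this section, here is a rough sketch of joining on similar rather than identical values. It is not the algorithm used by Mapping Data Flows, which this article does not describe. The scoring uses Python's `difflib.SequenceMatcher` as a stand-in, and the `similarity` and `fuzzy_inner_join` functions and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_inner_join(left, right, threshold=80.0):
    """Pair every left/right value whose similarity meets the threshold.

    The 60-100 range mirrors the slider range mentioned in the text;
    the scoring function itself is an assumption for illustration.
    """
    if not 60.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 60 and 100")
    return [(l, r) for l in left for r in right if similarity(l, r) >= threshold]


customers = ["Contoso Ltd", "Fabrikam Inc"]
vendors = ["Contoso Ltd.", "Fabrikam Incorporated", "Northwind"]

exact = fuzzy_inner_join(customers, vendors, threshold=100.0)   # exact matches only
medium = fuzzy_inner_join(customers, vendors, threshold=80.0)   # near-identical values
loose = fuzzy_inner_join(customers, vendors, threshold=60.0)    # most permissive setting
print(exact, medium, loose, sep="\n")
```

Lowering the threshold admits more candidate pairs, so a permissive setting is worth reviewing against a sample of the joined output before relying on it.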
May 2022 update
The following updates are new to Azure Synapse Analytics this month.
SQL
- Automatic character column length calculation - It's no longer necessary to define character column lengths! Serverless SQL pools let you query files in the data lake without knowing the schema upfront. The best practice was to specify the lengths of character columns to get optimal performance. Not anymore! With this new feature, you can get optimal query performance without having to define the schema. The serverless SQL pool will calculate the average column length for each inferred character column or character column defined as larger than 100 bytes. The schema will stay the same, while the serverless SQL pool will use the calculated average column lengths internally. It will also automatically calculate the cardinality estimation in case there was no previously created statistic.
Apache Spark for Synapse
Azure Synapse Dedicated SQL Pool Connector for Apache Spark Now Available in Python - Previously, the Azure Synapse Dedicated SQL Pool connector was only available using Scala. Now, it can be used with Python on Spark 3. The only difference between the Scala and Python implementations is the optional Scala callback handle, which allows you to receive post-write metrics.
The following are now supported in Python on Spark 3:
- Read using Azure Active Directory (AD) Authentication or Basic Authentication
- Write to Internal Table using Azure AD Authentication or Basic Authentication
- Write to External Table using Azure AD Authentication or Basic Authentication
To learn more about the connector in Python, read Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
Manage Azure Synapse Apache Spark configuration - Apache Spark configuration management is always a challenging task because Spark has hundreds of properties. It is also challenging for you to know the optimal value for Spark configurations. With the new Spark configuration management feature, you can create a standalone Spark configuration artifact with auto-suggestions and built-in validation rules. The Spark configuration artifact allows you to share your Spark configuration within and across Azure Synapse workspaces. You can also easily associate your Spark configuration with a Spark pool, a Notebook, and a Spark job definition for reuse and minimize the need to copy the Spark configuration in multiple places. To learn more about the new Spark configuration management feature, read Manage Apache Spark configuration.
Data Integration
Export pipeline monitoring as a CSV - The ability to export pipeline monitoring to CSV has been added after receiving many community requests for the feature. Simply filter the Pipeline runs screen to the data you want and select Export to CSV. To learn more about exporting pipeline monitoring and other monitoring improvements, read Azure Data Factory monitoring improvements.
Incremental data loading made easy for Synapse and Azure Database for PostgreSQL and MySQL - In a data integration solution, incrementally loading data after an initial full data load is a widely used scenario. Automatic incremental source data loading is now natively available for Synapse SQL and Azure Database for PostgreSQL and MySQL. Users can "enable incremental extract" and only inserted or updated rows will be read by the pipeline. To learn more about incremental data loading, read Incrementally copy data from a source data store to a destination data store.
User-Defined Functions for Mapping Data Flows [Public Preview] - We hear you that you can find yourself doing the same string manipulation, math calculations, or other complex logic several times. Now, with the new user-defined function feature, you can create customized expressions that can be reused across multiple mapping data flows. User-defined functions will be grouped in libraries to help developers group common sets of functions. Once you've created a data flow library, you can add in your user-defined functions. You can even add in multiple arguments to make your function more reusable. To learn more about user-defined functions, read User defined functions in mapping data flows.
Assert Error Handling - Error handling has now been added to sinks following an assert transformation. Assert transformations enable you to build custom rules for data quality and data validation. You can now choose whether to output the failed rows to the selected sink or to a separate file. To learn more about error handling, read Assert data transformation in mapping data flow.
Mapping data flows projection editing - New UI updates have been made to source projection editing in mapping data flows. You can now update source projection column names and column types. To learn more about source projection editing, read Source transformation in mapping data flow.
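The incremental loading item in this section says that only inserted or updated rows are read after the initial full load. One common way to get that behavior is a watermark on a last-modified column, sketched below. This article does not say how the built-in feature tracks changes, so the watermark approach, the `modified` column, and the `incremental_extract` function are all assumptions for illustration.

```python
from datetime import datetime


def incremental_extract(rows, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    changed = [r for r in rows if r["modified"] > last_watermark]
    # If nothing changed, keep the previous watermark.
    new_watermark = max((r["modified"] for r in changed), default=last_watermark)
    return changed, new_watermark


# Hypothetical source table with a last-modified column.
source = [
    {"id": 1, "modified": datetime(2022, 5, 1)},
    {"id": 2, "modified": datetime(2022, 5, 10)},
]

# Initial full load: start from the earliest possible watermark.
first_batch, watermark = incremental_extract(source, datetime.min)

# Between runs, one row is updated and one row is inserted.
source[0]["modified"] = datetime(2022, 5, 15)
source.append({"id": 3, "modified": datetime(2022, 5, 20)})

# The next run reads only the changed rows.
second_batch, watermark = incremental_extract(source, watermark)
print([r["id"] for r in first_batch], [r["id"] for r in second_batch])
```

The watermark would need to be persisted between pipeline runs. In this sketch it is simply carried in a variable.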
Azure Synapse Link
Azure Synapse Link for SQL Server - At Microsoft Build 2022, we announced the Public Preview availability of Azure Synapse Link for SQL, for both SQL Server 2022 and Azure SQL Database. Data-driven, quality insights are critical for companies to stay competitive. The speed to achieve those insights can make all the difference. The costly and time-consuming nature of traditional ETL and ELT pipelines is no longer enough. With this release, you can now take advantage of low- and no-code, near real-time data replication from your SQL-based operational stores into Azure Synapse Analytics. This makes it easier to run BI reporting on operational data in near real-time, with minimal impact on your operational store. To learn more, read Announcing the Public Preview of Azure Synapse Link for SQL.
April 2022 update
The following updates are new to Azure Synapse Analytics this month.
SQL
Cross-subscription restore for Azure Synapse SQL is now generally available. Previously, it took many undocumented steps to restore a dedicated SQL pool to another subscription. Now, with the PowerShell Az.Sql module 3.8 update, the Restore-AzSqlDatabase cmdlet can be used for cross-subscription restore. To learn more, see Restore a dedicated SQL pool (formerly SQL DW) to a different subscription.
It is now possible to recover a SQL pool from a dropped server or workspace. With the PowerShell Restore cmdlets in Az.Sql and Az.Synapse modules, you can now restore from a deleted server or workspace without filing a support ticket. For more information, read Synapse workspace SQL pools or standalone SQL pools (formerly SQL DW), depending on your scenario.
Synapse database designer
We've added the option to clone a lake database. This unlocks additional opportunities to manage new versions of databases or support schemas that evolve in discrete steps. You can quickly clone a database using the action menu available on the lake database. To learn more, read How-to: Clone a lake database.
You can now use wildcards to specify custom folder hierarchies. Lake databases sit on top of data that is in the lake and this data can live in nested folders that don't fit into clean partition patterns. Previously, querying lake databases required that your data exists in a simple directory structure that you could browse using the folder icon without the ability to manually specify directory structure or use wildcard characters. To learn more, read How-to: Modify a datalake.
Apache Spark for Synapse
We are excited to announce the preview availability of Apache Spark™ 3.2 on Synapse Analytics. This new version incorporates user-requested enhancements and resolves 1,700+ Jira tickets. Please review the official release notes for the complete list of fixes and features and review the migration guidelines between Spark 3.1 and 3.2 to assess potential changes to your applications. For more details, read Apache Spark version support and Azure Synapse Runtime for Apache Spark 3.2.
Assigning parameters dynamically based on variables, metadata, or specifying Pipeline specific parameters has been one of your top feature requests. Now, with the release of parameterization for the Spark job definition activity, you can do just that. For more details, read Transform data using Apache Spark job definition.
We often receive customer requests to access the snapshot of the Notebook when there is a Pipeline Notebook run failure or there is a long-running Notebook job. With the release of the Synapse Notebook snapshot feature, you can now view the snapshot of the Notebook activity run with the original Notebook code, the cell output, and the input parameters. You can also access the snapshot of the referenced Notebook from the referencing Notebook cell output if you refer to other Notebooks through Spark utils. To learn more, read Transform data by running a Synapse notebook and Introduction to Microsoft Spark utilities.
Security
- The Synapse Monitoring Operator RBAC role is now generally available. Since the GA of Synapse, customers have asked for a fine-grained RBAC (role-based access control) role that allows a user persona to monitor the execution of Synapse Pipelines and Spark applications without having the ability to run or cancel the execution of these applications. Now, customers can assign the Synapse Monitoring Operator role to such monitoring personas. This allows organizations to stay compliant while having flexibility in the delegation of tasks to individuals or teams. Learn more by reading Synapse RBAC Roles.
Data integration
Azure has added Dataverse as a source and sink connector to Synapse Data Flows so that you can now build low-code data transformation ETL jobs in Synapse directly accessing your Dataverse environment. For more details on how to use this new connector, read Mapping data flow properties.
We heard from you that a 1-minute timeout for Web activity was not long enough, especially in cases of synchronous APIs. Now, with the response timeout property 'httpRequestTimeout', you can define a timeout for the HTTP request of up to 10 minutes. Learn more by reading Web activity response timeout improvements.
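As a rough illustration of the Web activity timeout described above, the following Python sketch builds an activity definition as a plain dictionary and validates the timeout before setting `httpRequestTimeout`. The 1-minute and 10-minute bounds come from the text. The `hh:mm:ss` value format, the placement under `typeProperties`, the activity name, the URL, and the helper names are assumptions made for illustration, so check the Web activity documentation for the exact schema.

```python
import json
from datetime import timedelta

# Bounds taken from the text: the earlier 1-minute timeout and the new 10-minute maximum.
MIN_TIMEOUT = timedelta(minutes=1)
MAX_TIMEOUT = timedelta(minutes=10)


def format_timeout(timeout: timedelta) -> str:
    """Return the timeout as hh:mm:ss (assumed format), rejecting out-of-range values."""
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError("timeout must be between 1 and 10 minutes")
    total_seconds = int(timeout.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_web_activity(name: str, url: str, timeout: timedelta) -> dict:
    """Build a hypothetical Web activity definition with a response timeout."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": url,
            "method": "GET",
            "httpRequestTimeout": format_timeout(timeout),
        },
    }


# Hypothetical activity name and URL, used only for illustration.
activity = build_web_activity(
    "CallSlowApi", "https://example.com/api/report", timedelta(minutes=5)
)
print(json.dumps(activity, indent=2))
```

Validating the value before it reaches the pipeline definition keeps an out-of-range timeout from surfacing only at run time.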
Developer experience
- Previously, if you wanted to reference a notebook in another notebook, you could only reference published or committed content. Now, when using %run notebooks, you can enable 'unpublished notebook reference' which will allow you to reference unpublished notebooks. When enabled, notebook run will fetch the current contents in the notebook web cache, meaning the changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode). To learn more, read Reference unpublished notebook.
Mar 2022 update
The following updates are new to Azure Synapse Analytics this month.
Developer Experience
Code cells in Synapse notebooks that result in an exception will now show standard output along with the exception message. This feature is supported for Python and Scala languages. To learn more, see the example output when a code statement fails.
Synapse notebooks now support partial output when running code cells. To learn more, see the examples at this blog post.
You can now dynamically control Spark session configuration for the notebook activity with pipeline parameters. To learn more, see the variable explorer feature of Synapse notebooks.
You can now reuse and manage notebook sessions without having to start a new one. You can easily connect a selected notebook to an active session in the list started from another notebook. You can detach a session from a notebook, stop the session, and monitor it. To learn more, see how to manage your active notebook sessions.
Synapse notebooks now capture anything written through the Python logging module, in addition to the driver logs. To learn more, see support for Python logging.
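For the Python logging item above, a minimal sketch of what a notebook cell might look like follows. It uses only the standard `logging` module. The logger name, the message format, and the `load_rows` helper are hypothetical, and how Synapse surfaces the captured records is not shown here.

```python
import logging
import sys

# Hypothetical logger name; any name works with the standard logging module.
logger = logging.getLogger("synapse_notebook_demo")
logger.setLevel(logging.INFO)

# Send records to standard output with a timestamped format.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)


def load_rows(rows):
    """Log progress and problems while filtering out rows without an id."""
    logger.info("Received %d rows", len(rows))
    valid = [row for row in rows if row.get("id") is not None]
    dropped = len(rows) - len(valid)
    if dropped:
        logger.warning("Dropped %d rows without an id", dropped)
    return valid


valid_rows = load_rows([{"id": 1}, {"id": None}, {"id": 3}])
```

Using a named logger with explicit levels, rather than bare `print` calls, makes it easier to filter messages by severity afterward.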
SQL
Column Level Encryption for Azure Synapse dedicated SQL Pools is now Generally Available. With column level encryption, you can use different protection keys for each column, with each key having its own access permissions. The data in CLE-enforced columns is encrypted on disk and remains encrypted in memory until the DECRYPTBYKEY function is used to decrypt it. To learn more, see how to encrypt a data column.
Serverless SQL pools now support better performance for CETAS (Create External Table as Select) and subsequent SELECT queries. The performance improvements include a parallel execution plan, which results in faster CETAS execution and multiple output files. To learn more, see CETAS with Synapse SQL article and the blog post.
Apache Spark for Synapse
Synapse Spark Common Data Model (CDM) Connector is now Generally Available. The CDM format reader/writer enables a Spark program to read and write CDM entities in a CDM folder via Spark dataframes. To learn more, see how the CDM connector supports reading, writing data, examples, & known issues.
Synapse Spark Dedicated SQL Pool (DW) Connector now supports improved performance. The new architecture eliminates redundant data movement and uses COPY-INTO instead of PolyBase. You can authenticate through SQL basic authentication or opt into the Azure Active Directory (Azure AD) based authentication method. The connector now delivers roughly a 5x performance improvement over the previous version. To learn more, see Azure Synapse Dedicated SQL Pool Connector for Apache Spark.
Synapse Spark Dedicated SQL Pool (DW) Connector now supports all Spark Dataframe SaveMode choices. It supports Append, Overwrite, ErrorIfExists, and Ignore modes. The Append and Overwrite are critical for managing data ingestion at scale. To learn more, see DataFrame write SaveMode support
Accelerate Spark execution speed using the new Intelligent Cache feature. This feature is currently in public preview. Intelligent Cache automatically stores each read within the allocated cache storage space, detecting underlying file changes and refreshing the files to provide the most recent data. To learn more, see how to Enable/Disable the cache for your Apache Spark pool or see the blog post
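The four SaveMode choices listed above for the Dedicated SQL Pool connector (Append, Overwrite, ErrorIfExists, and Ignore) differ only in what happens when the target table already exists. The following plain-Python model sketches those semantics against an in-memory dictionary. It is not the connector's API: `write_table`, the `tables` dictionary, and the sample rows are hypothetical stand-ins.

```python
from enum import Enum


class SaveMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"


def write_table(tables: dict, name: str, rows: list, mode: SaveMode) -> None:
    """Apply SaveMode-style semantics to an in-memory dict of tables."""
    exists = name in tables
    if not exists or mode is SaveMode.OVERWRITE:
        tables[name] = list(rows)      # create the table, or replace its contents
    elif mode is SaveMode.APPEND:
        tables[name].extend(rows)      # keep existing rows and add the new ones
    elif mode is SaveMode.ERROR_IF_EXISTS:
        raise ValueError(f"table {name!r} already exists")
    elif mode is SaveMode.IGNORE:
        pass                           # leave the existing table untouched


tables = {}
write_table(tables, "sales", [1, 2], SaveMode.ERROR_IF_EXISTS)  # table is new, so it is created
write_table(tables, "sales", [3], SaveMode.APPEND)
after_append = list(tables["sales"])
write_table(tables, "sales", [9], SaveMode.IGNORE)
after_ignore = list(tables["sales"])
write_table(tables, "sales", [7], SaveMode.OVERWRITE)
print(after_append, after_ignore, tables["sales"])
```

In this model, Append and Overwrite are the only modes that change an existing table, which is why they matter most for repeated ingestion runs.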
Security
Azure Synapse Analytics now supports Azure Active Directory (Azure AD) authentication. You can turn on Azure AD authentication during the workspace creation or after the workspace is created. To learn more, see how to use Azure AD authentication with Synapse SQL.
An API is now available to raise or lower the minimum TLS version for the workspace managed SQL server used by dedicated SQL pools. To learn more, see how to update the minimum TLS setting or read the blog post for more details.
Data Integration
Flowlets and CDC Connectors are now Generally Available. Flowlets in Synapse Data Flows allow for reusable and composable ETL logic. To learn more, see Flowlets in mapping data flow or see the blog post.
SFTP connector for Synapse data flows. You can read and write data in SFTP while transforming it with the visual low-code data flows interface in Synapse. To learn more, see source transformation.
Data flow improvements to Data Preview. To learn more, see Data Preview and debug improvements in Mapping Data Flows
Pipeline script activity. The Script Activity enables data engineers to build powerful data integration pipelines that can read from and write to Synapse databases, and other database types. To learn more, see Transform data by using the Script activity in Azure Data Factory or Synapse Analytics
Feb 2022 update
The following updates are new to Azure Synapse Analytics this month.
SQL
Serverless SQL Pools now support more consistent query execution times. Learn how Serverless SQL pools automatically detect spikes in read latency and support consistent query execution time.
The OPENJSON function makes it easy to get array element indexes. To learn more, see how the OPENJSON function in a serverless SQL pool allows you to parse nested arrays and return one row for each JSON array element with the index of each element.
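To make the "one row per array element, with its index" behavior concrete, here is a plain-Python analogue. It is not T-SQL and does not call OPENJSON. It assumes zero-based indexes, and the document shape, the field names, and the `array_elements_with_index` helper are hypothetical.

```python
import json


def array_elements_with_index(document: str, array_field: str):
    """Return (index, element) pairs for the JSON array stored under array_field."""
    parsed = json.loads(document)
    return list(enumerate(parsed[array_field]))


# Hypothetical document with a nested array of order items.
doc = '{"order": 17, "items": [{"sku": "A1"}, {"sku": "B2"}, {"sku": "C3"}]}'

rows = array_elements_with_index(doc, "items")
for index, element in rows:
    print(index, element["sku"])
```

Keeping the index alongside each element preserves the original array order, which would otherwise be lost once the elements become independent rows.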
Data integration
Upserting data is now supported by the copy activity. See how you can natively load data into a temporary table and then merge that data into a sink table with upsert.
Transform Dynamics Data Visually in Synapse Data Flows. Learn more on how to use a Dynamics dataset or an inline dataset as source and sink types to transform data at scale.
Connect to your SQL sources in data flows using Always Encrypted. To learn more, see how to securely connect to your SQL databases from Synapse data flows using Always Encrypted.
Capture descriptions from asserts in Data Flows. To learn more, see how to define your own dynamic descriptive messages in the assert data flow transformation at the row or column level.
Easily define schemas for complex type fields. To learn more, see how you can have the engine automatically detect the schema of an embedded complex field inside a string column.
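The upsert support in the copy activity noted above is described as loading data into a temporary table and then merging it into the sink table. The sketch below shows that two-step pattern using Python's built-in `sqlite3` module as a stand-in database. The table names, columns, and sample rows are hypothetical, and this is not how the copy activity itself is configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sink (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO sink VALUES (1, 'old-alpha'), (2, 'beta');
    """
)

# Step 1: load the incoming batch into the temporary (staging) table.
incoming = [(1, "alpha"), (3, "gamma")]
conn.executemany("INSERT INTO staging VALUES (?, ?)", incoming)

# Step 2a: update sink rows whose key also appears in staging.
conn.execute(
    """
    UPDATE sink
    SET name = (SELECT s.name FROM staging AS s WHERE s.id = sink.id)
    WHERE id IN (SELECT id FROM staging)
    """
)
# Step 2b: insert staging rows whose key is not yet in the sink.
conn.execute(
    """
    INSERT INTO sink (id, name)
    SELECT id, name FROM staging
    WHERE id NOT IN (SELECT id FROM sink)
    """
)
conn.commit()

rows = conn.execute("SELECT id, name FROM sink ORDER BY id").fetchall()
print(rows)
```

Row 1 is updated, row 2 is left alone, and row 3 is inserted, so rerunning the same batch leaves the sink unchanged.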
Jan 2022 update
The following updates are new to Azure Synapse Analytics this month.
Machine Learning
Improvements to the Synapse Machine Learning library v0.9.5 (previously called MMLSpark). This release simplifies the creation of massively scalable machine learning pipelines with Apache Spark. To learn more, read the blog post about the new capabilities in this release or see the full release notes
Security
The Azure Synapse Analytics security overview - A whitepaper that covers the five layers of security. The security layers include authentication, access control, data protection, network security, and threat protection. Understand each security feature in detail to implement an industry-standard security baseline and protect your data on the cloud.
TLS 1.2 is now required for newly created Synapse Workspaces. To learn more, see how TLS 1.2 provides enhanced security using this article or the blog post. Sign-in attempts to a newly created Synapse workspace from connections using TLS versions lower than 1.2 will fail.
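On the client side, one way to avoid failed sign-ins caused by older protocol versions is to refuse anything below TLS 1.2 when building a connection. The following is a minimal sketch using Python's standard `ssl` module. It shows a generic client setting, not a Synapse-specific API, and is only an assumption about how a custom client might be configured.

```python
import ssl

# Start from the library's secure defaults (certificate and hostname checks enabled).
context = ssl.create_default_context()

# Refuse to negotiate any protocol version older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)
```

The resulting context can be passed to standard-library clients that accept one, for example `http.client.HTTPSConnection(host, context=context)`.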
Data Integration
Data quality validation rules using Assert transformation - You can now easily add data quality, data validation, and schema validation to your Synapse ETL jobs by using Assert transformation in Synapse data flows. To learn more, see the Assert transformation in mapping data flow article or the blog post.
Native data flow connector for Dynamics - Synapse data flows can now read and write data directly to Dynamics through the new data flow Dynamics connector. Learn more on how to Create data sets in data flows to read, transform, aggregate, join, etc. using this article or the blog post. You can then write the data back into Dynamics using the built-in Synapse Spark compute.
IntelliSense and auto-complete added to pipeline expressions - IntelliSense makes creating and editing expressions easy. To learn more, see how to check your expression syntax, find functions, and add code to your pipelines.
Synapse SQL
COPY schema discovery for complex data ingestion. To learn more, see the blog post or how GitHub leveraged this functionality in Introducing Automatic Schema Discovery with auto table creation for complex datatypes.
Serverless SQL pools now support the HASHBYTES function. HASHBYTES is a T-SQL function, which hashes values. Learn how to use hash values in distributing data using this article or the blog post.
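As a loose analogue of hashing values and using the hash to spread data, the Python sketch below uses `hashlib.sha256` in place of the T-SQL function and maps each value to a bucket. The bucket count, the sample keys, and the helper names are hypothetical, and this does not describe how Synapse distributes data internally.

```python
import hashlib


def hash_value(value: str) -> bytes:
    """Return the SHA-256 digest of the UTF-8 encoded value."""
    return hashlib.sha256(value.encode("utf-8")).digest()


def bucket_for(value: str, bucket_count: int) -> int:
    """Map a value to a bucket using the first 8 bytes of its hash."""
    prefix = int.from_bytes(hash_value(value)[:8], byteorder="big")
    return prefix % bucket_count


# Hypothetical keys and bucket count, used only for illustration.
for key in ["customer-001", "customer-002", "customer-003"]:
    print(key, hash_value(key).hex()[:16], bucket_for(key, 4))
```

Because the hash is deterministic, the same key always lands in the same bucket, which is the property that makes hash values usable for distributing data.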
December 2021 update
The following updates are new to Azure Synapse Analytics this month.
Apache Spark for Synapse
- Mount remote storage to a Synapse Spark pool blog article
- Natively read & write data in ADLS with Pandas blog article
- Dynamic allocation of executors for Spark blog article
Machine Learning
- The Synapse Machine Learning library blog article
- Getting started with state-of-the-art pre-built intelligent models blog article
- Building responsible AI systems with the Synapse ML library blog article
- PREDICT is now GA for Synapse Dedicated SQL pools blog article
- Simple & scalable scoring with PREDICT and MLFlow for Apache Spark for Synapse blog article
- Retail AI solutions blog article
Security
- User-Assigned managed identities now supported in Synapse Pipelines in preview blog article
- Browse ADLS Gen2 folders in an Azure Synapse Analytics workspace in preview blog article
Data Integration
- Pipeline Fail activity blog article
- Mapping Data Flow gets new native connectors blog article
- More notebook export formats: HTML, Python, and LaTeX blog
- Three new chart types in notebook view: box plot, histogram, and pivot table blog
- Reconnect to lost notebook session blog
Integrate
- Azure Synapse Link for Dataverse blog article
- Custom partitions for Azure Synapse Link for Azure Cosmos DB in preview blog article
- Map data tool (Public Preview), a no-code guided ETL experience blog article
- Quick reuse of spark cluster blog article
- External Call transformation blog article
- Flowlets (Public Preview) blog article
November 2021 update
The following updates are new to Azure Synapse Analytics this month.
Work with Databases and Data Lakes
- Introducing Lake databases (formerly known as Spark databases) blog article
- Lake database designer now available in preview blog article
SQL
- Delta Lake support for serverless SQL is generally available blog article
- Query multiple file paths using OPENROWSET in serverless SQL blog article
- Serverless SQL queries can now return up to 200 GB of results blog article
- Handling invalid rows with OPENROWSET in serverless SQL blog article
Apache Spark for Synapse
- Mount remote storage to a Synapse Spark pool blog article
- Natively read & write data in ADLS with Pandas blog article
- Dynamic allocation of executors for Spark blog article
Machine Learning
- The Synapse Machine Learning library blog article
- Getting started with state-of-the-art pre-built intelligent models blog article
- Building responsible AI systems with the Synapse ML library blog article
- PREDICT is now GA for Synapse Dedicated SQL pools blog article
- Simple & scalable scoring with PREDICT and MLFlow for Apache Spark for Synapse blog article
- Retail AI solutions blog article
Security
- User-Assigned managed identities now supported in Synapse Pipelines in preview blog article
- Browse ADLS Gen2 folders in an Azure Synapse Analytics workspace in preview blog article
Data Integration
Azure Synapse Link
- Azure Synapse Link for Dataverse blog article
- Custom partitions for Azure Synapse Link for Azure Cosmos DB in preview blog article
October 2021 update
The following updates are new to Azure Synapse Analytics this month.
Apache Spark for Synapse
- Spark performance optimizations blog
Security
- All Synapse RBAC roles are now generally available for use in production blog article
- Apply User-Assigned Managed Identities for Double Encryption blog article
- Synapse Administrators now have elevated access to dedicated SQL pools blog article
Integrate
- Use Stringify in data flows to easily transform complex data types to strings blog article
- Control Spark session time-to-live (TTL) in data flows blog article
Developer Experience
- Enhanced Markdown editing in Synapse notebooks preview blog article
- Pandas dataframes automatically render as nicely formatted HTML tables blog article
- Use IPython widgets in Synapse Notebooks blog article
- Mssparkutils runtime context now available for Python and Scala blog article