How to get lineage from Azure Synapse Analytics into Microsoft Purview
This document explains how to connect an Azure Synapse workspace to a Microsoft Purview account to track data lineage and ingest data sources. It also describes which activities are covered and which lineage capabilities are supported.
Once Azure Synapse Analytics is connected to Microsoft Purview, each run of a supported pipeline activity automatically ingests metadata about the activity, its source data, and its output data into the Microsoft Purview Data Map.
If a data source has already been scanned and exists in the data map, the ingestion process adds the lineage information from Azure Synapse Analytics to that existing source. If the source or output doesn't exist in the data map and is supported by Azure Synapse Analytics lineage, Microsoft Purview automatically adds its metadata from Synapse Analytics to the data map under the default domain root collection.
This can be an excellent way to monitor your data estate as users move and transform information using Azure Synapse Analytics.
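The ingestion rule above can be sketched as a small model. This is an illustration only, not the Microsoft Purview API: the `DataMap` class, the `ingest_run` method, the `"root"` label, and the asset names are all hypothetical, chosen just to show how existing assets keep their place while new ones land under the root collection.

```python
from dataclasses import dataclass, field

# Hypothetical label standing in for the default domain root collection.
ROOT_COLLECTION = "root"


@dataclass
class DataMap:
    """Toy stand-in for the Microsoft Purview Data Map (not a real API)."""

    assets: dict = field(default_factory=dict)   # asset name -> collection
    lineage: list = field(default_factory=list)  # (source, activity, output)

    def ingest_run(self, source, output, activity, supported=True):
        """Apply the ingestion rule to one pipeline activity run."""
        if not supported:
            # Unsupported stores: nothing is recorded for this run.
            return False
        for asset in (source, output):
            # Assets that were already scanned keep their collection;
            # unknown assets are added under the root collection.
            self.assets.setdefault(asset, ROOT_COLLECTION)
        self.lineage.append((source, activity, output))
        return True


# "sales.csv" was scanned earlier into a hypothetical "finance" collection.
data_map = DataMap(assets={"sales.csv": "finance"})
data_map.ingest_run("sales.csv", "dbo.Sales", "CopySalesToSql")
print(data_map.assets)
```

In this sketch the already-scanned source stays in its collection and only gains lineage, while the previously unknown output is created under the root collection.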
Supported Azure Synapse capabilities
Currently, Microsoft Purview captures runtime lineage from the following Azure Synapse pipeline activities: copy activity and data flow.
Important
Microsoft Purview drops lineage if the source or destination uses an unsupported data storage system.
Copy activity support
Data store | Supported |
---|---|
Azure Blob Storage | Yes |
Azure Cognitive Search | Yes |
Azure Cosmos DB for NoSQL * | Yes |
Azure Cosmos DB for MongoDB * | Yes |
Azure Data Explorer * | Yes |
Azure Data Lake Storage Gen1 | Yes |
Azure Data Lake Storage Gen2 | Yes |
Azure Database for MariaDB * | Yes |
Azure Database for MySQL * | Yes |
Azure Database for PostgreSQL * | Yes |
Azure Files | Yes |
Azure SQL Database * | Yes |
Azure SQL Managed Instance * | Yes |
Azure Synapse Analytics * | Yes |
Azure Dedicated SQL pool (formerly SQL DW) * | Yes |
Azure Table Storage | Yes |
Hive * | Yes |
SQL Server * | Yes |
* Microsoft Purview currently doesn't support query or stored procedure for lineage or scanning. Lineage is limited to table and view sources only.
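As a rough pre-flight check, the table and footnote above can be encoded as a lookup. This is a sketch under stated assumptions: the function name `copy_lineage_expected` and the `source_uses_query` flag are hypothetical, the store names are copied from the table, and the actual decision is made by the service, not by code like this.

```python
# Store name -> True if the table marks it with "*" (table and view sources only).
COPY_LINEAGE_STORES = {
    "Azure Blob Storage": False,
    "Azure Cognitive Search": False,
    "Azure Cosmos DB for NoSQL": True,
    "Azure Cosmos DB for MongoDB": True,
    "Azure Data Explorer": True,
    "Azure Data Lake Storage Gen1": False,
    "Azure Data Lake Storage Gen2": False,
    "Azure Database for MariaDB": True,
    "Azure Database for MySQL": True,
    "Azure Database for PostgreSQL": True,
    "Azure Files": False,
    "Azure SQL Database": True,
    "Azure SQL Managed Instance": True,
    "Azure Synapse Analytics": True,
    "Azure Dedicated SQL pool (formerly SQL DW)": True,
    "Azure Table Storage": False,
    "Hive": True,
    "SQL Server": True,
}


def copy_lineage_expected(source, sink, source_uses_query=False):
    """Return True if the table above suggests lineage would be captured."""
    # Lineage is dropped if either end uses an unsupported data store.
    if source not in COPY_LINEAGE_STORES or sink not in COPY_LINEAGE_STORES:
        return False
    # Starred stores: queries and stored procedures are not supported.
    if source_uses_query and COPY_LINEAGE_STORES[source]:
        return False
    return True


print(copy_lineage_expected("Azure Blob Storage", "Azure SQL Database"))
```

A lookup keyed on the store name keeps the check in step with the table: adding or removing a row is a one-line change.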
If you use a self-hosted integration runtime, note the minimum version with lineage support for:
- Any use case: version 5.9.7885.3 or later
- Copying data into Azure Synapse Analytics via COPY command or PolyBase: version 5.10 or later
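The version thresholds above can be checked by comparing dotted versions numerically rather than as strings. The sketch below assumes the installed version is available as a dotted string; the helper names `parse_version` and `shir_supports_lineage` are hypothetical and not part of any Microsoft tooling.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.9.7885.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions quoted in the list above.
MIN_ANY_USE_CASE = parse_version("5.9.7885.3")
MIN_COPY_COMMAND_OR_POLYBASE = parse_version("5.10")


def shir_supports_lineage(installed, uses_copy_command_or_polybase=False):
    """Check an installed runtime version against the relevant minimum."""
    if uses_copy_command_or_polybase:
        required = MIN_COPY_COMMAND_OR_POLYBASE
    else:
        required = MIN_ANY_USE_CASE
    # Tuples compare element by element, so 5.10 correctly ranks above 5.9.
    return parse_version(installed) >= required


print(shir_supports_lineage("5.9.7885.3"))
print(shir_supports_lineage("5.9.7885.3", uses_copy_command_or_polybase=True))
```

Comparing integer tuples avoids the string-comparison trap in which "5.10" would sort before "5.9".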
Limitations on copy activity lineage
Lineage is not yet supported for copy activities that use any of the following features:
- Copying data into Azure Data Lake Storage Gen1 using Binary format.
- The compression setting for Binary, delimited text, Excel, JSON, and XML files.
- Source partition options for Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, and SAP Table.
- Copying data to a file-based sink with the max rows per file setting.
- Column-level lineage is not currently supported by copy activity when the source or sink is a resource set.
In addition to lineage, the data asset schema (shown in the Asset -> Schema tab) is reported for the following connectors:
- CSV and Parquet files on Azure Blob, Azure Files, ADLS Gen1, and ADLS Gen2
- Azure Data Explorer, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server
Data Flow support
Data store | Supported |
---|---|
Azure Blob Storage | Yes |
Azure Cosmos DB for NoSQL * | Yes |
Azure Data Lake Storage Gen1 | Yes |
Azure Data Lake Storage Gen2 | Yes |
Azure Database for MySQL * | Yes |
Azure Database for PostgreSQL * | Yes |
Azure SQL Database * | Yes |
Azure SQL Managed Instance * | Yes |
Azure Synapse Analytics * | Yes |
Azure Dedicated SQL pool (formerly SQL DW) * | Yes |
* Microsoft Purview currently doesn't support query or stored procedure for lineage or scanning. Lineage is limited to table and view sources only.
Limitations on data flow lineage
- Data flow lineage may generate a folder-level resource set without visibility into the files involved.
- Column-level lineage is not currently supported when the source or sink is a resource set.
- For the data flow activity, Microsoft Purview shows only the source and sink involved. Detailed lineage for data flow transformations isn't supported yet.
- Lineage is not supported when flowlets are part of the data flow.
- Microsoft Purview doesn't currently support lineage reporting for Synapse tables (LakeHouse DB/Workspace DB).
Monitor the Azure Synapse Analytics links
In the Microsoft Purview governance portal, you can monitor the Azure Synapse Analytics links.