Microsoft Purview - Build custom lineage with the REST APIs

This article provides the steps to create data lineage entries using the REST API in the Microsoft Purview Data Catalog. In scenarios where the automatically generated lineage in Microsoft Purview is incomplete or missing, lineage can be custom built either manually in the Microsoft Purview portal or using the REST APIs. This article focuses on the REST APIs, which can overcome the known limitations of manual lineage and provide more options.

Background

The ability to show the lineage between datasets is one of Microsoft Purview's important platform features. Systems like Data Factory, Data Share, and Power BI capture the lineage of data as it moves. In certain situations, Microsoft Purview's automatically generated lineage is incomplete or missing for practical visualization and enterprise reporting purposes. In those scenarios, custom lineage reporting is supported by Apache Atlas hooks and the REST API.

Using the REST APIs to build custom lineage allows you to overcome some of manual lineage's limitations as described in the following articles:

The rest of this article explains the use of Microsoft Purview REST APIs to build and report custom lineage on Microsoft Purview.

Prerequisites

Scenarios

There are two use cases in which building custom lineage becomes necessary:

A. Create new entities and link them with lineage

B. Link existing entities or lineage to another existing entity or lineage

As an example, lineage needs to be reported between entities A & B, but A & B don't exist currently.

To create the entities A & B, invoke the Microsoft Purview REST API: Entity - Bulk Create Or Update - REST API

POST https://{accountname}.purview.azure.cn/datamap/api/atlas/v2/entity/bulk?api-version=2023-09-01
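import json
import requests

# The "......." runs below are elisions in this sample; substitute a complete entity payload before running.
sample_entity_json = '{"entity": {"status": "ACTIVE","version": 0,"name": "ENTITY_A"}.......{"entity": ........}}'
# Send a POST request with the JSON containing the entities to be created.
# `headers` must already hold the authorization headers for your account.
CreateOrUpdateEntitiesUrl = 'https://<purview_account_name>.purview.azure.cn/datamap/api/atlas/v2/entity/bulk?api-version=2023-09-01'
EntitiesResponse = requests.post(CreateOrUpdateEntitiesUrl, json=json.loads(sample_entity_json), headers=headers)
entitiesRes = EntitiesResponse.json()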
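
The snippets in this article pass a `headers` variable that the text doesn't define. As a minimal sketch, assuming an access token for the Microsoft Purview account has already been acquired by other means, the headers might be built as follows. The names `build_headers` and `access_token` are hypothetical and aren't part of any Microsoft Purview SDK.

```python
def build_headers(access_token: str) -> dict:
    """Return HTTP headers for a JSON request authorized with a bearer token.

    Assumption: `access_token` is a valid OAuth 2.0 access token obtained
    elsewhere; acquiring the token is outside the scope of this sketch.
    """
    if not access_token:
        raise ValueError("access_token must be a non-empty string")
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


# Example usage with a placeholder token value.
headers = build_headers("<access_token>")
```

The resulting dictionary can be passed as the `headers` argument of `requests.post`.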

API Response "201 Created" indicates entities are successfully created and their respective GUIDs are contained in the output JSON.
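
To reuse the returned GUIDs in the lineage step, they can be read from the parsed response. The following is a sketch only: it assumes the response body follows the Apache Atlas v2 mutation response shape, with a `mutatedEntities` map of operation names to entity headers and a `guidAssignments` map. The helper name `extract_guids` is hypothetical; verify the field names against the actual response from your account.

```python
def extract_guids(response_body: dict) -> list:
    """Collect the distinct GUIDs found in a bulk create-or-update response body."""
    guids = []
    # Assumed shape: {"mutatedEntities": {"CREATE": [{"guid": "..."}], "UPDATE": [...]}}
    for entity_headers in response_body.get("mutatedEntities", {}).values():
        for entity_header in entity_headers:
            guid = entity_header.get("guid")
            if guid and guid not in guids:
                guids.append(guid)
    # Assumed shape: {"guidAssignments": {"<placeholder guid>": "<assigned guid>"}}
    for guid in response_body.get("guidAssignments", {}).values():
        if guid not in guids:
            guids.append(guid)
    return guids


# Example usage with an illustrative response body (not real service output).
sample_response = {
    "mutatedEntities": {"CREATE": [{"guid": "guid-a", "typeName": "example_type"}]},
    "guidAssignments": {"-1": "guid-a", "-2": "guid-b"},
}
created_guids = extract_guids(sample_response)
```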

Now that the entities A & B are created, move to step B to link the entities in the lineage chain using the same REST API.

  • If the number of entities to link is small enough that the work isn't time or resource intensive (for example, fewer than 20-30 entities), you can connect the lineage manually in the Microsoft Purview portal. Follow the manual lineage user guide for steps to manually create lineage connections.
  • If you have a large number of lineage connections to make, if the process needs to be automated, or if manual lineage in the Microsoft Purview portal isn't feasible, proceed with the API process of linking and building custom lineage.

Custom Lineage JSON Payload:

Execute the POST /entity/bulk Entity - Bulk Create Or Update - REST API using the payload as illustrated:

POST https://{accountname}.purview.azure.cn/datamap/api/atlas/v2/entity/bulk?api-version=2023-09-01
sample_entity_json = '''{
  "entities": [
    {
      "status": "ACTIVE",
      "version": 1,
      "typeName": "Process",
      "attributes": {
        "inputs": [
          {
            "guid": "24558fd8-9cdc-47de-9310-56a58108bab0"
          },
          {
            "guid": "27163581-9aca-212a-782a-213612639abc"
          }
        ],
        "outputs": [
          {
            "guid": "e33c694a-2c4f-4cae-8c27-06f6f6f60000"
          }
        ],
        "qualifiedName": "cassandra://query",
        "name": "query"
      }
    }
  ]
}'''

# This snippet sends a POST request whose JSON lists two GUIDs as inputs and one GUID as output. This creates lineage with two directional inputs and one directional output.
# Note: using the same API and SDK code, you can create lineage with any number of inputs, any number of processes in between, any number of typedefs, and any number of outputs.
# The API/SDK method is the most flexible and versatile method of creating lineage.
 
CreateLineageEntitiesUrl = 'https://<purview_account_name>.purview.azure.cn/datamap/api/atlas/v2/entity/bulk?api-version=2023-09-01'
EntitiesResponse = requests.post(CreateLineageEntitiesUrl, json=json.loads(sample_entity_json), headers=headers)
entitiesRes = EntitiesResponse.json()

This JSON payload creates the custom lineage. It works for existing assets whose GUIDs are supplied in the "inputs" and "outputs" arrays. For example, "guid": "24558fd8-9cdc-47de-9310-56a58108bab0" and "guid": "27163581-9aca-212a-782a-213612639abc" refer to the directional inputs of the lineage, and "guid": "e33c694a-2c4f-4cae-8c27-06f6f6f60000" refers to the directional output of the lineage, which was an existing asset automatically scanned by Microsoft Purview. The request creates the lineage between the input assets and the output asset.
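
When many lineage links need to be created, writing the JSON by hand becomes error prone. As a rough sketch, the payload can be assembled from lists of GUIDs instead. The helper names `build_process_entity` and `build_lineage_payload` are hypothetical; the field names and values mirror the sample payload in this article.

```python
import json


def build_process_entity(name, qualified_name, input_guids, output_guids):
    """Build one Process entity that links input assets to output assets."""
    return {
        "status": "ACTIVE",
        "version": 1,
        "typeName": "Process",
        "attributes": {
            # Each input and output is referenced by the GUID of an existing asset.
            "inputs": [{"guid": guid} for guid in input_guids],
            "outputs": [{"guid": guid} for guid in output_guids],
            "qualifiedName": qualified_name,
            "name": name,
        },
    }


def build_lineage_payload(process_entities):
    """Wrap Process entities in the body expected by POST /entity/bulk."""
    return {"entities": list(process_entities)}


# Example usage: the same two-input, one-output lineage as the sample payload.
payload = build_lineage_payload([
    build_process_entity(
        name="query",
        qualified_name="cassandra://query",
        input_guids=[
            "24558fd8-9cdc-47de-9310-56a58108bab0",
            "27163581-9aca-212a-782a-213612639abc",
        ],
        output_guids=["e33c694a-2c4f-4cae-8c27-06f6f6f60000"],
    )
])
sample_entity_json = json.dumps(payload, indent=2)
```

Building the payload as a Python dictionary and serializing it with `json.dumps` avoids malformed JSON, such as duplicate keys within one object, that is easy to introduce when editing a string by hand.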

Note

If the assets don't already exist, run the Bulk Create Or Update entity API first to create those entities before creating the lineage relationship. The bulk entity creation process using the POST /entity/bulk API and a Python code snippet is described in step A.

Scenario Outcomes

API Response "201 Created" indicates successful lineage graph linkage creation and the created GUIDs are contained in the output JSON. Lineage appears in the Microsoft Purview portal:

  • Scenario A: Custom built lineage from assets created via API:

    Screenshot showing Scenario A: Custom Built Lineage from assets created via API.

  • Scenario B: Custom built lineage from pre-existing assets linked via API:

    Note

    If the lineage is created from pre-existing entities, observe that the pre-existing lineage graph remains intact and the new linkage is created and displayed additionally.

    Screenshot showing Scenario B: Custom Built Lineage from pre-existing assets linked via API.