Type definitions and how to create custom types
This tutorial will explain what type definitions are, how to create custom types, and how to initialize assets of custom types in Microsoft Purview.
In this tutorial, you'll learn:
- How Microsoft Purview uses the type system from Apache Atlas
- How to create a new custom type
- How to create relationships between custom types
- How to initialize new entities of custom types
Prerequisites
For this tutorial you'll need:
An Azure account with an active subscription. If you don't have one, you can create an account for trial.
An active Microsoft Purview (formerly Azure Purview) account. If you don't have one, see the quickstart for creating a Microsoft Purview account.
A bearer token to your Microsoft Purview account. To establish a bearer token and to call any APIs, see the documentation about how to authenticate APIs for Microsoft Purview.
Your Apache Atlas endpoint of your Microsoft Purview account. It will be one of these two endpoints, depending on your environment:
- If you're using the classic Microsoft Purview governance portal:
https://{{ACCOUNTNAME}}.purview.azure.cn/catalog
- If you're using the new Microsoft Purview portal:
https://api.purview-service.microsoft.com/catalog
- If you're using the classic Microsoft Purview governance portal:
Note
Before moving to the hands-on part of the tutorial, the first four sections will explain what a System Type is and how it is used in Microsoft Purview. All the REST API calls described further will use the bearer token and the endpoint which are described in the prerequisites.
To skip directly to the steps, use these links:
What are asset and type in Microsoft Purview?
An asset is a metadata element that describes a digital or physical resource. The digital or physical resources that are expected to be cataloged as assets include:
- Data sources such as databases, files, and data feed.
- Analytical models and processes.
- Business policies and terms.
- Infrastructure like the server.
Microsoft Purview provides users a flexible type system to expand the definition of the asset to include new kinds of resources as they become relevant. Microsoft Purview relies on the Type System from Apache Atlas. All metadata objects (assets) managed by Microsoft Purview are modeled using type definitions. Understanding the Type System is fundamental to create new custom types in Microsoft Purview.
Essentially, a Type can be seen as a Class from Object Oriented Programming (OOP):
- It defines the properties that represent that type.
- Each type is uniquely identified by its name.
- A type can inherit from a supertType. This is an equivalent concept as inheritance from OOP. A type that extends a superType will inherit the attributes of the superType.
You can see all type definitions in your Microsoft Purview account by sending a GET
request to the All Type Definitions endpoint:
GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedefs
Apache Atlas has few predefined system types that are commonly used as supertypes.
For example:
Referenceable: This type represents all entities that can be searched for using a unique attribute called qualifiedName.
Asset: This type extends from Referenceable and has other attributes such as: name, description and owner.
DataSet: This type extends Referenceable and Asset. Conceptually, it can be used to represent a type that stores data. Types that extend DataSet can be expected to have a Schema. For example, a SQL table.
Lineage: Lineage information helps one understand the origin of data and the transformations it may have gone through before arriving in a file or table. Lineage is calculated through DataSet and Process: DataSets (input of process) impact some other DataSets (output of process) through Process.
Example of a Type definition
To better understand the Type system, let's look at an example and see how an Azure SQL Table is defined.
You can get the complete type definition by sending a GET
request to the Type Definition endpoint:
GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedef/name/{name}
Tip
The {name} property tells which definition you are interested in. In this case, you should use azure_sql_table.
Below you can see a simplified JSON result:
{
"category": "ENTITY",
"guid": "7d92a449-f7e8-812f-5fc8-ca6127ba90bd",
"name": "azure_sql_table",
"description": "azure_sql_table",
"typeVersion": "1.0",
"serviceType": "Azure SQL Database",
"options": {
"schemaElementsAttribute": "columns",
},
"attributeDefs": [
{ "name": "principalId", ...},
{ "name": "objectType", ...},
{ "name": "createTime", ...},
{ "name": "modifiedTime", ... }
],
"superTypes": [
"DataSet",
"Purview_Table",
"Table"
],
"subTypes": [],
"relationshipAttributeDefs": [
{
"name": "dbSchema",
"typeName": "azure_sql_schema",
"isOptional": false,
"cardinality": "SINGLE",
"relationshipTypeName": "azure_sql_schema_tables",
},
{
"name": "columns",
"typeName": "array<azure_sql_column>",
"isOptional": true,
"cardinality": "SET",
"relationshipTypeName": "azure_sql_table_columns",
},
]
}
Based on the JSON type definition, let's look at some properties:
Category field describes in what category your type is. The list of categories supported by Apache Atlas can be found here.
ServiceType field is useful when browsing assets by source type in Microsoft Purview. The service type will be an entry point to find all assets that belong to the same service type - as defined on their type definition. In the below screenshot of Purview UI, the user limits the result to be the entities specified with Azure SQL Database in serviceType:
Note
Azure SQL Database is defined with the same serviceType as Azure SQL Table.
SuperTypes describes the "parent" types you want to "inherit" from.
schemaElementsAttributes from options influences what appears in the Schema tab of your asset in Microsoft Purview.
Below you can see an example of how the Schema tab looks like for an asset of type Azure SQL Table:
relationshipAttributeDefs are calculated through the relationship type definitions. In our JSON, we can see that schemaElementsAttributes points to the relationship attribute called columns - which is one of elements from relationshipAttributeDefs array, as shown below:
... "relationshipAttributeDefs": [ ... { "name": "columns", "typeName": "array<azure_sql_column>", "isOptional": true, "cardinality": "SET", "relationshipTypeName": "azure_sql_table_columns", }, ]
Each relationship has its own definition. The name of the definition is found in relationshipTypeName attribute. In this case, it's azure_sql_table_columns.
- The cardinality of this relationship attribute is set to *SET, which suggests that it holds a list of related assets.
- The related asset is of type azure_sql_column, as visible in the typeName attribute.
In other words, the columns relationship attribute relates the Azure SQL Table to a list of Azure SQL Columns that show up in the Schema tab.
Example of a relationship Type definition
Each relationship consists of two ends, called endDef1 and endDef2.
In the previous example, azure_sql_table_columns was the name of the relationship that characterizes a table (endDef1) and its columns (endDef2).
For the full definition, you can do make a GET
request to the following endpoint using azure_sql_table_columns as the name:
GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedef/name/azure_sql_table_columns
Below you can see a simplified JSON result:
{
"category": "RELATIONSHIP",
"guid": "c80d0027-8f29-6855-6395-d243b37d8a93",
"name": "azure_sql_table_columns",
"description": "azure_sql_table_columns",
"serviceType": "Azure SQL Database",
"relationshipCategory": "COMPOSITION",
"endDef1": {
"type": "azure_sql_table",
"name": "columns",
"isContainer": true,
"cardinality": "SET",
},
"endDef2": {
"type": "azure_sql_column",
"name": "table",
"isContainer": false,
"cardinality": "SINGLE",
}
}
name is the name of the relationship definition. The value, in this case azure_sql_table_columns is used in the relationshipTypeName attribute of the entity that has this relationship, as you can see it referenced in the json.
relationshipCategory is the category of the relationship and it can be either COMPOSITION, AGGREGATION or ASSOCIATION as described here.
enDef1 is the first end of the definition and contains the attributes:
type is the type of the entity that this relationship expects as end1.
name is the attribute that will appear on this entity's relationship attribute.
cardinality is either SINGLE, SET or LIST.
isContainer is a boolean and applies to containment relationship category. When set to true in one end, it indicates that this end is the container of the other end. Therefore:
- Only the Composition or Aggregation category relationships can and should have in one end isContainer set to true.
- Association category relationship shouldn't have isContainer property set to true in any end.
endDef2 is the second end of the definition and describes, similarly to endDef1, the properties of the second part of the relationship.
Schema tab
What is Schema in Microsoft Purview?
Schema is an important concept that reflects how data is stored and organized in the data store. It reflects the structure of the data and the data restrictions of the elements that construct the structure.
Elements on the same schema can be classified differently (due to their content). Also, different transformation (lineage) can happen to only a subset of elements. Due to these aspects, Purview can model schema and schema elements as entities, hence schema is usually a relationship attribute to the data asset entity. Examples of schema elements are: columns of a table, json properties of json schema, xml elements of xml schema etc.
There are two types of schemas:
Intrinsic Schema - Some systems are intrinsic to schema. For example, when you create a SQL Table, the system requires you to define the columns that construct the table; in this sense, schema of a table is reflected by its columns.
For data store with predefined schema, Purview uses the corresponding relationship between the data asset and the schema elements to reflect the schema. This relationship attribute is specified by the keyword schemaElementsAttribute in options property of the entity type definition.
Non Intrinsic Schema - Some systems don't enforce such schema restrictions, but users can use it to store structural data by applying some schema protocols to the data. For example, Azure Blobs store binary data and don't care about the data in the binary stream. Therefore, it's unaware of any schema, but the user can serialize their data with schema protocols like json before storing it in the blob. In this sense, schema is maintained by some extra protocols and corresponding validation enforced by the user.
For data store without inherent schema, schema model is independent of this data store. For such cases, Purview defines an interface for schema and a relationship between DataSet and schema, called dataset_attached_schemas - this extends any entity type that inherits from DataSet to have an attachedSchema relationship attribute to link to their schema representation.
Example of Schema tab
The Azure SQL Table example from above has an intrinsic schema. The information that shows up in the Schema tab of the Azure SQL Table comes from the Azure SQL Column themselves.
Selecting one column item, we would see the following:
The question is, how did Microsoft Purview select the data_tye property from the column and showed it in the Schema tab of the table?
You can get the type definition of an Azure SQL Column by making a GET
request to the endpoint:
GET https://{{ENDPOINT}}/catalog/api/atlas/v2/types/typedef/name/{name}
Note
{name} in this case is: azure_sql_column
Here's a simplified JSON result:
{
"category": "ENTITY",
"guid": "58034a18-fc2c-df30-e474-75803c3a8957",
"name": "azure_sql_column",
"description": "azure_sql_column",
"serviceType": "Azure SQL Database",
"options": {
"schemaAttributes": "[\"data_type\"]"
},
"attributeDefs":
[
{
"name": "data_type",
"typeName": "string",
"isOptional": false,
"cardinality": "SINGLE",
"valuesMinCount": 1,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false
},
...
]
...
}
Note
serviceType is Azure SQL Database, the same as for the table
- schemaAttributes is set to data_type, which is one of the attributes of this type.
Azure SQL Table used schemaElementAttribute to point to a relationship consisting of a list of Azure SQL Columns. The type definition of a column has schemaAttributes defined.
In this way, the Schema tab in the table displays the attribute(s) listed in the schemaAttributes of the related assets.
Create custom type definitions
Why?
First, why would someone like to create a custom type definition?
There can be cases where there's no built-in type that corresponds to the structure of the metadata you want to import in Microsoft Purview.
In such a case, a new type definition has to be defined.
Note
The usage of built-in types should be favored over the creation of custom types, whenever possible.
Now that we gained an understanding of type definitions in general, let us create custom type definitions.
Scenario
In this tutorial, we would like to model a 1:n relationship between two types, called custom_type_parent and custom_type_child.
A custom_type_child should reference one parent, whereas a custom_type_parent can reference a list of children.
They should be linked together through a 1:n relationship.
Tip
Here you can find few tips when creating a new custom type.
Create definitions
- Create the custom_type_parent type definition by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/types/typedefs
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/types/typedefs
With the body:
{
"entityDefs":
[
{
"category": "ENTITY",
"version": 1,
"name": "custom_type_parent",
"description": "Sample custom type of a parent object",
"typeVersion": "1.0",
"serviceType": "Sample-Custom-Types",
"superTypes": [
"DataSet"
],
"subTypes": [],
"options":{
"schemaElementsAttribute": "columns"
}
}
]
}
- Create the custom_type_child type definition by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/types/typedefs
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/types/typedefs
With the body:
{
"entityDefs":
[
{
"category": "ENTITY",
"version": 1,
"name": "custom_type_child",
"description": "Sample custom type of a CHILD object",
"typeVersion": "1.0",
"serviceType": "Sample-Custom-Types",
"superTypes": [
"DataSet"
],
"subTypes": [],
"options":{
"schemaAttributes": "data_type"
}
}
]
}
- Create a custom type relationship definition by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/types/typedefs
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/types/typedefs
With the body:
{
"relationshipDefs": [
{
"category": "RELATIONSHIP",
"endDef1" : {
"cardinality" : "SET",
"isContainer" : true,
"name" : "Children",
"type" : "custom_type_parent"
},
"endDef2" : {
"cardinality" : "SINGLE",
"isContainer" : false,
"name" : "Parent",
"type" : "custom_type_child"
},
"relationshipCategory" : "COMPOSITION",
"serviceType": "Sample-Custom-Types",
"name": "custom_parent_child_relationship"
}
]
}
Initialize assets of custom types
- Initialize a new asset of type custom_type_parent by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/entity
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/entity
With the body:
{
"entity": {
"typeName":"custom_type_parent",
"status": "ACTIVE",
"version": 1,
"attributes":{
"name": "First_parent_object",
"description": "This is the first asset of type custom_type_parent",
"qualifiedName": "custom//custom_type_parent:First_parent_object"
}
}
}
Save the guid as you'll need it later.
- Initialize a new asset of type custom_type_child by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/entity
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/entity
With the body:
{
"entity": {
"typeName":"custom_type_child",
"status": "ACTIVE",
"version": 1,
"attributes":{
"name": "First_child_object",
"description": "This is the first asset of type custom_type_child",
"qualifiedName": "custom//custom_type_child:First_child_object"
}
}
}
Save the guid as you'll need it later.
- Initialize a new relationship of type custom_parent_child_relationship by making a
POST
request to one of the two following endpoints:
Classic Microsoft Purview governance portal:
POST https://{{ENDPOINT}}.purview.azure.cn/catalog/api/atlas/v2/relationship/
New Microsoft Purview portal:
POST https://api.purview-service.microsoft.com/catalog/api/atlas/v2/relationship/
With the following body:
Note
The guid in end1 must be replaced with the the guid of the object created at step 6.1 The guid in end2 must be replaced with the guid of the object created at step 6.2
{
"typeName": "custom_parent_child_relationship",
"end1": {
"guid": "...",
"typeName": "custom_type_parent"
},
"end2": {
"guid": "...",
"typeName": "custom_type_child"
}
}
View the assets in Microsoft Purview
Go to Data Catalog in Microsoft Purview.
Select Browse.
Select By source type.
Select Sample-Custom-Types.
Select the First_parent_object:
Select the Properties tab:
You can see the First_child_object being linked there.
Select the First_child_object:
Select the Properties tab:
You can see the Parent object being linked there.
Similarly, you can select the Related tab and will see the relationship between the two objects:
You can create multiple children by initializing a new child asset and inititialzing a relationship
Note
The qualifiedName is unique per asset, therefore, the second child should be called differently, such as: custom//custom_type_child:Second_child_object
Tip
If you delete the First_parent_object you will notice that the children will also be removed, due to the COMPOSITION relationship that we chose in the definition.
Limitations
There are several known limitations when working with custom types that will be enhanced in future, such as:
- Relationship tab looks different compared to built-in types
- Custom types have no icons
- Hierarchy isn't supported