Microsoft Purview domains and collections architectures and best practices
At the core of Microsoft Purview unified governance solutions, the data map is a service that keeps an up-to-date map of assets and their metadata across your data estate. To hydrate the data map, you need to register and scan your data sources. In an organization, there might be thousands of sources of data managed and governed by either centralized or decentralized business units, teams, and environments. To manage this, you can use domains and collections in Microsoft Purview.
Note
The best practices recommendations in this article applies to Microsoft Purview accounts that are using a tenant-level account (https://purview.microsoft.com).
Domains
In Microsoft Purview, domains are foundational elements of the Data Map and represent a top-level hierarchy within a Microsoft Purview account. They enable the segregation of responsibilities, effective organization, and management of data governance within the organization specially when there are subsidiaries or business units that operate independently, but they share a common Entra ID tenant. By using domains, organizations can achieve several capabilities, including:
- Organization: Domains help logically group resources such as data sources, assets, scans, and security related resources that belong to a business unit or region.
- Delegation: Domains are a hierarchy above collections, allowing Microsoft Purview admins to delegate specific administrative tasks to subsets of components within Microsoft Purview data governance for specific business units or suborganization.
- Security: By isolating objects within domains, administrators can implement targeted security measures and control access more effectively. For example, resources such as connections, credentials and policies can be specific and visible for a certain domain.
- Lifecycle Management: Domains facilitate the separation of development, test, QA, preproduction, and production resources within the same tenant.
- Isolation of Resources: Domains help isolate resources due to regional, legal, or regulatory requirements.
Collections
Collections in Microsoft Purview support organizational or suborganizational mapping of metadata. By using collections, you can manage and maintain data sources, scans, and assets within a business unit in a hierarchy instead of a flat structure. Collections allow you to build a custom hierarchical model of your data landscape based on how your organization plans to use Microsoft Purview to govern your data.
A collection also provides a security boundary for your metadata in the data map. Access to collections, data sources, and metadata is set up and maintained based on the collections hierarchy in Microsoft Purview, following a least-privilege model:
- Users have the minimum amount of access they need to do their jobs.
- Users don't have access to sensitive data that they don't need.
Understanding the Relationship
- Domains are more strategic and policy-centric, while collections are more operational and access-centric. For example, in a large healthcare organization with several segments such as Hospitals, Clinics, Research, and Administration, all under the same Microsoft Entra ID tenant, domains, and collections can be defined as follows:
- Domains: The organization creates domains for each segment. These domains are strategic and policy-centric, meaning they define high-level governance policies, compliance requirements, and data management strategies for each segment. For instance, the Hospitals domain might have policies related to patient data privacy and healthcare regulations, while the Research domain might focus on data sharing agreements and ethical guidelines for clinical trials. Each domain can have their own sets of credentials, scan rulesets, policies, and connections as well as collections, data sources, scans, and assets that aren't visible to users and admins in other domains.
- Collections: Within the Hospitals domain, there are several operational tasks that need to be managed. The organization creates collections for different operational units such as Emergency Services, Inpatient Care, and Outpatient Services. These collections are more operational and access-centric, meaning they organize data sources, assets, and scan specific to each operational unit. Access to these collections is controlled based on the roles and responsibilities of the users within the Hospitals segment. For example, only emergency department staff might have access to the Emergency Services collection, while inpatient care managers have access to the Inpatient Care collection.
- Collections can exist within domains, inheriting the governance policies set at the domain level.
- In Microsoft Purview data governance, domains and collections have distinct functions. An account can have one default domain and up to four custom domains. Each domain can have its own collection hierarchy.
- A user member of Purview Administrators role can create and manage domains and delegate access to each business unit to manage their own domains by granting them access as Purview Domain Manager role.
Define a hierarchy
Design recommendations
Start designing your domains and collections architecture based on your organization's legal, security requirements considering data management, and governance structure of your organization. Review the recommended archetypes in this article.
Consider security and access management as part of your design decision-making process when you build domains and collections in Microsoft Purview.
Start with default domain and build collection hierarchy inside the default domain. Use additional domains if you have any of the following requirements:
You need to build prod and non-prod environments under the same tenant.
You have multiple regions and need to logically separate resources and segregate responsibilities across these regions.
Your organization has multiple companies or business unit under the same tenant, and need to separate resources and segregate management and responsibilities.
Each domain or collection has a name attribute and a friendly name attribute. If you use the Microsoft Purview governance portal to deploy a domain or collection, the system automatically assigns a random six-letter name to avoid duplication.
Currently, a domain or collection name can contain up to 36 characters and a collection friendly name can have up to 100 characters.
When you can, avoid duplicating your organizational structure into a deeply nested collection hierarchy. If you can't avoid doing so, be sure to use different names for every collection in the hierarchy to make the collections easy to distinguish.
Automate deployment of domains and collections by using the API if you're planning to deploy domains and collections and role assignments in bulk.
Use a dedicated service principal name (SPN) to run operations on Data Map for managing domains, collections and role assignments by using the API. Using an SPN reduces the number of users who have elevated rights and follows least-privilege guidelines.
Design considerations
Domains are available only to Microsoft Purview accounts using a tenant-level account (https://purview.microsoft.com).
Consider that currently a Microsoft Purview account can have up to four domains in addition to the default domain. As part of consolidating your current Microsoft Purview accounts, the content of existing data maps including collections, data sources, assets and scans are migrated to a new domain.
Create new domain if you're planning to onboard a new organization in your tenant that they have a different legal requirement.
The following resources are deployed at tenant level and visible across all domains:
- Typedefs
- Managed attributes
- Glossary terms
- Classifications and classification rules
- Metamodel
- Integration runtimes
- Workflows
Domains provide separation of the following resources:
- Credentials
- Security connections
- Custom scan rule sets
- Advanced resource sets and pattern rules
- Policies
- ADF connections
- Collections and all resources that can be scoped to a collection
Collections provide separation of the following resources:
- Data sources
- Scans
- Assets
Each Microsoft Purview account is created with a default domain. The default domain name is the same as your Microsoft Purview account name. The default domain can't be removed, however, you can change the default domain's friendly name.
A collection can have as many child collections as needed. But each collection can have only one domain and a parent collection.
A collections hierarchy in a Microsoft Purview can support as many as 256 collections, with a maximum of eight levels of depth. This doesn't include the root collection.
By design, you can't register data sources multiple times in a single Microsoft Purview account. This architecture helps to avoid the risk of assigning different levels of access control to a single data source. If multiple teams consume the metadata of a single data source, you can register and manage the data source in a parent collection. You can then create corresponding scans under each subcollection so that relevant assets appear under each child collection.
Lineage connections and artifacts are attached to the default domain even if the data sources are registered at lower-level collections.
When you run a new scan, by default, the scan is deployed in the same collection as the data source. You can optionally select a different subcollection to run the scan. As a result, the assets belong under the subcollection.
You can delete a domain if it's empty.
You can delete a collection if it does not have any assets, associated scans, data sources or child collections.
Moving data sources across collections is allowed if the user is granted the Data Source Admin role for the source and destination collections.
Moving assets across collections is allowed if the user is granted the Data Curator role for the source and destination collections.
To perform move and rename operations on a collection, review the following recommendations and considerations:
To rename a collection, you must be member of collection admins role.
To move a collection, you must be member of collection admins role on the source and destination collections.
Define an authorization model
Microsoft Purview contains roles in Microsoft Defender for Office 365 as well as roles that exist inside Microsoft Purview data plane. After you deploy a Microsoft Purview account, a default domain is automatically created and the creator of the Microsoft Purview account becomes part of Purview Administrators role. For more information about permissions for the Microsoft Purview Data Map and Data Catalog, see the roles and permissions documentation.
Design recommendations
Consider implementing emergency access or a break-glass strategy for your tenant, so you can recover access to Microsoft Purview default domain when needed, to avoid Microsoft Purview account-level lockouts. Document the process for using emergency accounts.
Minimize the number of Purview Administrators, domain admins and collection admins. Assign a maximum of three domain admin users at the default domain, including the SPN and your break-glass accounts. Assign your Collection Admin roles to the top-level collection or to subcollections instead.
Assign roles to groups instead of individual users to reduce administrative overhead and errors in managing individual roles.
Assign the service principal at the root collection for automation purposes.
To increase security, enable Microsoft Entra Conditional Access with multifactor authentication for purview administrators, domain admins and collection admins, data source admins, and data curators. Make sure emergency accounts are excluded from the Conditional Access policy.
Design considerations
Microsoft Purview access management has moved into data plane. Azure Resource Manager roles aren't used anymore, so you should use Microsoft Purview to assign roles.
In Microsoft Purview, you can assign roles to users, security groups, and service principals (including managed identities) from Microsoft Entra ID on the same Microsoft Entra tenant where the Microsoft Purview account is deployed.
You must first add guest accounts to your Microsoft Entra tenant as B2B users before you can assign Microsoft Purview roles to external users.
By default, domain admins also obtain data source admins, data reader and data curator roles so they have access to read or modify assets.
By default, Global Administrator is added as collection admins on the default domain.
By default, all role assignments are automatically inherited by all child collections. But you can enable Restrict inherited permissions on any collection except for the root collection. Restrict inherited permissions removes the inherited roles from all parent collections, except for the collection admin role.
For Azure Data Factory connection: to connect to Azure Data Factory, you have to be a collection admin on the default domain.
If you need to connect to Azure Data Factory for lineage, grant the Data Curator role to the data factory's managed identity at your Microsoft Purview root collection level. When you connect Data Factory to Microsoft Purview in the authoring UI, Data Factory tries to add these role assignments automatically. If you have the collection admin role on the Microsoft Purview default domain, this operation works.
Domains and collections archetypes
You can deploy your Microsoft Purview domains and collections based on centralized, decentralized, or hybrid data management and governance models. Base this decision on your business, legal and security requirements.
Example 1: Single organization with a single environment and shared legal requirements
This structure is suitable for organizations that:
- Are based in a single geographic location and operate under same legal requirements.
- Have a centralized data management and governance team where the next level of data management falls into departments, teams, or projects.
The hierarchy consists of these verticals:
Domains:
- Default domain: Contoso
Collections under default domain:
- Departments (a delegated collection for each department)
- Teams or projects (further segregation based on projects)
There's no need for more domains, since there are no specific business or legal requirements to add more.
Organization-level shared data sources are registered and scanned in the Hub collection.
The department-level shared data sources are registered and scanned in the department collections.
Each data source is registered and scanned in its corresponding collection. So assets also appear in the same collection.
Example 2: Single multi-region organization with centralized management
This scenario is useful for organizations:
- That have presence in multiple regions.
- Where the data governance team is centralized or decentralized in each region.
- Where data management teams are distributed in each geographic location, and there's also a centralized federated management.
- Teams that need to manage their own data sources and resources
The domain and collection hierarchy consists of these verticals:
Domains:
- Default domain: FourthCoffee
Collections under default domain:
- Geographic locations (top-level collections based on geographic locations where data sources and data owners are located)
- Departments (a delegated collection for each department)
- Teams or projects (further segregation based on projects)
In this scenario, each region has a collection of its own under the default domain in the Microsoft Purview account. Data sources are registered and scanned in the corresponding collections in their own geographic locations. So assets also appear in the collection hierarchy for the region.
If you have centralized data management and governance teams, you can grant them access from the default domain. When you do, they gain oversight for the entire data estate in the data map. Optionally, the centralized team can register and scan any shared data sources. The centralized team can also manage security resources such as credentials and integration runtimes.
Region-based data management and governance teams can obtain access from their corresponding collections.
The department-level shared data sources are registered and scanned in the department collections.
Example 3: Single organization with multiple environments
This scenario can be useful if you have single tenant for all types of prod and non-prod environments and you need to isolate resources as much as possible. Data scientists and data engineers who can transform data to make it more meaningful, can manage Raw and Refine zones. They can then move the data into Produce or Curated zones in the corresponding environments.
The domain and collection hierarchy consists of these verticals:
Domains:
- Default domain: Fabrikam production
- Custom domain 1: Dev and Test
- Custom domain 2: QA
Collections under each domain could follow any of these verticals:
- Departments, Teams or projects (further segregation based on projects)
- Data transformation stages (Raw, Enriched, Produce/Curated, Development, etc.)
Data scientists and data engineers can have the Data Curators role on their corresponding zones so they can curate metadata. Data Reader access to the curated zone can be granted to entire data personas and business users.
Example 4: Multiple organizations or companies, using the same Entra ID tenant with decentralized management
This option can be used for scenarios where multiple companies share the same Entra ID tenant and each organization need to organize metadata and manage their own resources
Note
If you previously had multiple Microsoft Purview accounts in your tenant, the first account you select to migrate becomes default domain and you can upgrade the other accounts into separate domains.
The domain and collection hierarchy consists of these verticals:
Domains:
- Default domain: parent company or organization such as Contoso
- Custom domain 1: FourthCoffee
- Custom domain 2: Fabrikam
Collections under each domain could follow any of these verticals:
- Departments, teams or projects (further segregation based on projects)
- Data transformation stages (Raw, Enriched, Produce/Curated, Development, etc.)
- Regions within an organization
Each organization has a domain of its own with their own collections hierarchy in the Microsoft Purview account. Security resources are managed inside each domain and data sources are registered and scanned in the corresponding domains. Assets are added to the subcollection hierarchy for the specific domain.
If you have centralized data management and governance organization, that can be the default domain, so they can manage shared resources such as integration run-times, managed attributes, etc.
Organizational data management and governance teams can obtain access from their corresponding collections at a lower level, depending in centralized or decentralized management in each domain.
Note
A shared non-production domain can be created and used by multiple organizations, each org having their own top level collection in the non-prod domain.
Access management options
If you want to implement data democratization across an entire organization, use one domain and assign the Data Reader role at the default domain to data management, governance, and business users. Assign Data Source Admin and Data Curator roles at the subcollection levels to the corresponding data management and governance teams.
If you need to restrict access to metadata search and discovery in your organization, assign Data Reader and Data Curator roles at the specific collection level. For example, you could restrict US employees so they can read data only at the US collection level and not in the LATAM collection.
Create additional domains only if you need to, such as when separating prod and non-pro environments, upgrading multiple accounts into one unified account or have multiple companies inside the same tenant who have different security requirements.
You can apply a combination of these scenarios in your Microsoft Purview data map by using domains and collections.
Assign the domain admin role to the centralized data security and management team at the default collection. Delegate further domains or collection management of additional domains and lower-level collections to corresponding teams.