Data Discovery & Classification

Applies to: Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics

Data Discovery & Classification is built into Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. It provides basic capabilities for discovering, classifying, labeling, and reporting the sensitive data in your databases.

Your most sensitive data might include business, financial, healthcare, or personal information. It can serve as infrastructure for:

  • Helping to meet standards for data privacy and requirements for regulatory compliance.
  • Various security scenarios, such as monitoring (auditing) access to sensitive data.
  • Controlling access to and hardening the security of databases that contain highly sensitive data.

Note

For information about SQL Server on-premises, see SQL Data Discovery & Classification.

What is Data Discovery & Classification?

Data Discovery & Classification currently supports the following capabilities:

  • Discovery and recommendations: The classification engine scans your database and identifies columns that contain potentially sensitive data. It then provides you with an easy way to review and apply recommended classification via the Azure portal.

  • Labeling: You can apply sensitivity-classification labels persistently to columns by using new metadata attributes that have been added to the SQL Server database engine. This metadata can then be used for sensitivity-based auditing scenarios.

  • Query result-set sensitivity: The sensitivity of a query result set is calculated in real time for auditing purposes.

  • Visibility: You can view the database-classification state in a detailed dashboard in the Azure portal. Also, you can download a report in Excel format to use for compliance and auditing purposes and other needs.

Discover, classify, and label sensitive columns

This section describes the steps for:

  • Discovering, classifying, and labeling columns that contain sensitive data in your database.
  • Viewing the current classification state of your database and exporting reports.

The classification includes two metadata attributes:

  • Labels: The main classification attributes, used to define the sensitivity level of the data stored in the column.
  • Information types: Attributes that provide more granular information about the type of data stored in the column.

SQL Information Protection policy

Data Discovery & Classification comes with a built-in set of sensitivity labels and information types with discovery logic which is native to the SQL logical server. You can continue using the protection labels available in the default policy file, or you can customize this taxonomy. You can define a set and ranking of classification constructs specifically for your environment.

Define and customize your classification taxonomy

You define and customize of your classification taxonomy in one central place for your entire Azure organization. That location is in Microsoft Defender for Cloud, as part of your security policy. Only someone with administrative rights on the organization's root management group can do this task.

As part of policy management, you can define custom labels, rank them, and associate them with a selected set of information types. You can also add your own custom information types and configure them with string patterns. The patterns are added to the discovery logic for identifying this type of data in your databases.

After the organization-wide policy has been defined, you can continue classifying individual databases by using your customized policy.

Classify database

Note

The below example uses Azure SQL Database, but you should select the appropriate product that you want to configure Data Discovery & Classification.

  1. Go to the Azure portal.

  2. Go to Data Discovery & Classification under the Security heading in your Azure SQL Database pane. The Overview tab includes a summary of the current classification state of the database. The summary includes a detailed list of all classified columns, which you can also filter to show only specific schema parts, information types, and labels. If you haven't classified any columns yet, skip to step 4.

    Overview

  3. To download a report in Excel format, select Export in the top menu of the pane.

  4. To begin classifying your data, select the Classification tab on the Data Discovery & Classification page.

    The classification engine scans your database for columns containing potentially sensitive data and provides a list of recommended column classifications.

  5. View and apply classification recommendations:

    • To view the list of recommended column classifications, select the recommendations panel at the bottom of the pane.

    • To accept a recommendation for a specific column, select the check box in the left column of the relevant row. To mark all recommendations as accepted, select the leftmost check box in the recommendations table header.

    • To apply the selected recommendations, select Accept selected recommendations.

    Recommendations for classification

  6. You can also classify columns manually, as an alternative or in addition to the recommendation-based classification:

    1. Select Add classification in the top menu of the pane.

    2. In the context window that opens, select the schema, table, and column that you want to classify, and the information type and sensitivity label.

    3. Select Add classification at the bottom of the context window.

    Manually add classification

  7. To complete your classification and persistently label (tag) the database columns with the new classification metadata, select Save in the Classification page.

Audit access to sensitive data

An important aspect of the classification is the ability to monitor access to sensitive data. Azure SQL Auditing has been enhanced to include a new field in the audit log called data_sensitivity_information. This field logs the sensitivity classifications (labels) of the data that was returned by a query. Here's an example:

Audit log

These are the activities that are actually auditable with sensitivity information:

  • ALTER TABLE ... DROP COLUMN
  • BULK INSERT
  • SELECT
  • DELETE
  • INSERT
  • MERGE
  • UPDATE
  • UPDATETEXT
  • WRITETEXT
  • DROP TABLE
  • BACKUP
  • DBCC CloneDatabase
  • SELECT INTO
  • INSERT INTO EXEC
  • TRUNCATE TABLE
  • DBCC SHOW_STATISTICS
  • sys.dm_db_stats_histogram

Use sys.fn_get_audit_file to return information from an audit file stored in an Azure Storage account.

Permissions

These built-in roles can read the data classification of a database:

  • Owner
  • Reader
  • Contributor
  • SQL Security Manager
  • User Access Administrator

These are the required actions to read the data classification of a database are:

  • Microsoft.Sql/servers/databases/currentSensitivityLabels/*
  • Microsoft.Sql/servers/databases/recommendedSensitivityLabels/*
  • Microsoft.Sql/servers/databases/schemas/tables/columns/sensitivityLabels/*

These built-in roles can modify the data classification of a database:

  • Owner
  • Contributor
  • SQL Security Manager

This is the required action to modify the data classification of a database are:

  • Microsoft.Sql/servers/databases/schemas/tables/columns/sensitivityLabels/*

Learn more about role-based permissions in Azure RBAC.

Manage classifications

You can use T-SQL, a REST API, or PowerShell to manage classifications.

Use T-SQL

You can use T-SQL to add or remove column classifications, and to retrieve all classifications for the entire database.

Note

When you use T-SQL to manage labels, there's no validation that labels that you add to a column exist in the organization's information-protection policy (the set of labels that appear in the portal recommendations). So, it's up to you to validate this.

For information about using T-SQL for classifications, see the following references:

Use PowerShell cmdlets

Manage classifications and recommendations for Azure SQL Database and Azure SQL Managed Instance using PowerShell.

PowerShell cmdlets for Azure SQL Database

PowerShell cmdlets for Azure SQL Managed Instance

Use the REST API

You can use the REST API to programmatically manage classifications and recommendations. The published REST API supports the following operations:

  • Create Or Update: Creates or updates the sensitivity label of the specified column.
  • Delete: Deletes the sensitivity label of the specified column.
  • Disable Recommendation: Disables sensitivity recommendations on the specified column.
  • Enable Recommendation: Enables sensitivity recommendations on the specified column. (Recommendations are enabled by default on all columns.)
  • Get: Gets the sensitivity label of the specified column.
  • List Current By Database: Gets the current sensitivity labels of the specified database.
  • List Recommended By Database: Gets the recommended sensitivity labels of the specified database.

Retrieve classifications metadata using SQL drivers

You can use the following SQL drivers to retrieve classification metadata:

Next steps

  • Consider configuring Azure SQL Auditing for monitoring and auditing access to your classified sensitive data.