Discover data
Azure Databricks provides a suite of tools and products that simplify the discovery of data assets that are accessible through the Databricks Data Intelligence Platform. This article provides an opinionated overview of how you can discover and preview data that has already been configured for access in your workspace.
- To connect to data sources, see Connect to data sources.
Topics in this section focus on exploring data objects and data files. If you're looking for information about working with assets such as notebooks, SQL queries, libraries, and models, see Navigate the workspace.
If you're looking for guidance on generating summary statistics for datasets or other tasks associated with exploratory data analysis (EDA), see Exploratory data analysis on Azure Databricks: Tools and techniques.
How can you discover data assets?
Data discovery tools on Azure Databricks fall into the following general categories:
- Keyword search.
- Catalog exploration using the UI.
- Programmatic listing and metadata exploration.
Data discovery tools are optimized for data governed by Unity Catalog. Data assets that have not been registered as Unity Catalog objects might not be discoverable using some of these approaches.
Find data using the UI
Catalog Explorer provides tools for exploring and governing data assets. To open Catalog Explorer, click Catalog in the workspace sidebar. See What is Catalog Explorer?.
Notebooks and the SQL query editor also provide a catalog navigator for exploring database objects. Click the Catalog icon in these interfaces to expand or collapse the catalog navigator without leaving your code editor.
Once you've discovered a dataset of interest, you can use the Insights tab to learn how the data is being used in your workspace. See View frequent queries and users of a table.
Explore data programmatically
You can use the SHOW command on all database objects to discover assets registered to Unity Catalog. Use the LIST command, the %fs magic command, or Databricks Utilities to list files.
See Explore storage and find data files and Explore database objects.
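As a minimal sketch of the commands above, the following Python builds SHOW and LIST statements that you might run from a notebook. The catalog main, the schema default, and the volume path are hypothetical placeholders, and the helper functions are illustrative names, not part of any Databricks API. The sketch assumes simple identifiers that need no quoting. The run_all helper only prints the statements unless you pass it a Spark session.

```python
def show_statements(catalog: str, schema: str) -> list[str]:
    """Build SHOW statements that drill down from catalogs to tables and volumes."""
    return [
        "SHOW CATALOGS",
        f"SHOW SCHEMAS IN {catalog}",
        f"SHOW TABLES IN {catalog}.{schema}",
        f"SHOW VOLUMES IN {catalog}.{schema}",
    ]


def list_statement(path: str) -> str:
    """Build a LIST statement for a storage or volume path."""
    return f"LIST '{path}'"


def run_all(statements, spark=None):
    """Print each statement; also execute it when a Spark session is supplied."""
    for stmt in statements:
        print(stmt)
        if spark is not None:
            spark.sql(stmt).show(truncate=False)


# Hypothetical names: replace with a catalog, schema, and path you can access.
statements = show_statements("main", "default")
statements.append(list_statement("/Volumes/main/default/my_volume/"))

# No Spark session is passed here, so the statements are only printed.
run_all(statements)
```

In a Databricks notebook, where a spark session is predefined, calling run_all(statements, spark) would execute each statement. For file listings, dbutils.fs.ls(path) is the Databricks Utilities equivalent of the LIST statement.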
Review data comments
You can review comments to learn about the contents of datasets available in your lakehouse. Comments can be set on data objects including catalogs, schemas, tables, and columns. You can view comments in Catalog Explorer or using the DESCRIBE command for an object.
You can also add comments to tables and other database objects using Markdown, which is rendered in Catalog Explorer. See Add comments to data and AI assets.
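To illustrate viewing comments with DESCRIBE, here is a hedged sketch that builds DESCRIBE statements for a catalog, a schema, and a table. The object names (main, main.default, main.default.orders) are hypothetical, and describe_statement is an illustrative helper rather than a Databricks API. The statements are printed here; in a notebook you could pass each one to spark.sql to see the comment fields in the output.

```python
VALID_KINDS = ("CATALOG", "SCHEMA", "TABLE")


def describe_statement(kind: str, name: str, extended: bool = False) -> str:
    """Build a DESCRIBE statement for a catalog, schema, or table."""
    kind = kind.upper()
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    extended_kw = " EXTENDED" if extended else ""
    return f"DESCRIBE {kind}{extended_kw} {name}"


# Hypothetical object names: replace with objects you have permission to see.
targets = [
    ("catalog", "main"),
    ("schema", "main.default"),
    ("table", "main.default.orders"),
]

for kind, name in targets:
    # For a table, the DESCRIBE output includes a comment column per table column.
    print(describe_statement(kind, name))
```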
Search for tables in your lakehouse
You can use the search bar in Azure Databricks to find tables registered to Unity Catalog. You can either perform a keyword search or use semantic search to find datasets or columns that relate to your search query. Search only returns results for tables that you have permission to see. Search reviews table names, column names, table comments, and column comments. See Search for workspace objects.