Connect to dbt Core
This article explains what dbt is, how to install dbt Core, and how to connect it to Azure Databricks. A hosted version of dbt, called dbt Cloud, is also available. For more information, see Connect to dbt Cloud.
What is dbt?
dbt (data build tool) is a development environment for transforming data by writing select statements. dbt turns these select statements into tables and views. dbt compiles your code into raw SQL and then runs that code on the specified database in Azure Databricks. dbt supports collaborative coding patterns and best practices, including version control, documentation, and modularity.
dbt does not extract or load data. dbt focuses on the transformation step only, using a "transform after load" architecture. dbt assumes that you already have a copy of your data in your database.
dbt Core enables you to write dbt code in the IDE of your choice on your local development machine and then run dbt from the command line. dbt Core includes the dbt Command Line Interface (CLI). The dbt CLI is free to use and open source.
dbt Core (and dbt Cloud) can use hosted git repositories. For more information, see Creating a dbt project and Using an existing project on the dbt website.
Installation requirements
Before you install dbt Core, you must install Python and a utility for creating Python virtual environments on your local development machine.
You also need one of the following to authenticate:
- (Recommended) dbt Core enabled as an OAuth application in your account. This is enabled by default.
- A personal access token
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
Step 1: Install the dbt Databricks adapter
Databricks recommends using a Python virtual environment because it isolates package versions and code dependencies from those in other environments. This helps reduce unexpected package version mismatches and code dependency collisions.
Databricks recommends version 1.8.0 or greater of the dbt-databricks package.
Important
If your local development machine uses any of the following operating systems, you must complete additional steps first: CentOS, macOS, Ubuntu, Debian, and Windows. See the "Does my operating system have prerequisites" section of Use pip to install dbt on the dbt Labs website.
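The installation commands themselves do not appear in this section. The following is a minimal sketch of one way to do it, assuming python3 with the built-in venv module and pip are available. The .venv directory name, the variable names, and the install_dbt_databricks function name are illustrative choices, not part of dbt or Databricks tooling.

```shell
#!/usr/bin/env bash
# Sketch only: install dbt Core and the dbt Databricks adapter into a virtual environment.
# Assumptions: python3 (with the venv module) is on PATH; ".venv" is an arbitrary name.

MIN_ADAPTER_VERSION="1.8.0"  # minimum dbt-databricks version recommended above
VENV_DIR=".venv"             # hypothetical location for the virtual environment

install_dbt_databricks() {
  python3 -m venv "$VENV_DIR" || return 1          # create the isolated environment
  . "$VENV_DIR/bin/activate" || return 1           # activate it in the current shell
  python -m pip install --upgrade pip || return 1  # refresh pip inside the environment
  # Install dbt Core explicitly alongside the adapter, at or above the recommended minimum.
  python -m pip install dbt-core "dbt-databricks>=${MIN_ADAPTER_VERSION}" || return 1
  dbt --version                                    # list the installed core and adapter versions
}
```

Calling install_dbt_databricks from the directory where you plan to work leaves the environment activated in the current shell. On Windows, the activation script is under .venv\Scripts rather than .venv/bin.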
Step 2: Create a dbt project and specify and test connection settings
In this step, you create a dbt project (a collection of related directories and files required to use dbt). You then configure your connection profiles, which contain connection settings for an Azure Databricks compute, a SQL warehouse, or both. To increase security, dbt projects and profiles are stored in separate locations by default.
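As a rough illustration of that separation: the project lives in its own directory, while dbt Core by default looks for profiles.yml in a .dbt directory under your home directory unless the DBT_PROFILES_DIR environment variable points elsewhere. The helper below is a simplified sketch. The profiles_dir function name is hypothetical, and dbt's real lookup also considers the --profiles-dir flag and the current working directory.

```shell
#!/usr/bin/env bash
# Sketch only: print the directory where profiles.yml is expected to live.
# Simplification: checks DBT_PROFILES_DIR, then falls back to ~/.dbt.

profiles_dir() {
  # Use the override when it is set and non-empty, otherwise the default location.
  printf '%s\n' "${DBT_PROFILES_DIR:-$HOME/.dbt}"
}
```

For example, ls "$(profiles_dir)/profiles.yml" checks whether a profile file exists there. The dbt debug --config-dir command remains the authoritative way to see the location dbt actually uses.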
With the virtual environment still activated, run the dbt init command with the project name. This example procedure creates a project named my_dbt_demo.
dbt init my_dbt_demo
When you are prompted to choose a databricks or spark database, enter the number that corresponds to databricks.
When prompted for a host value, do the following:
- For a compute, enter the Server Hostname value from the Advanced Options, JDBC/ODBC tab for your Azure Databricks compute.
- For a SQL warehouse, enter the Server Hostname value from the Connection Details tab for your SQL warehouse.
When prompted for an http_path value, do the following:
- For a compute, enter the HTTP Path value from the Advanced Options, JDBC/ODBC tab for your Azure Databricks compute.
- For a SQL warehouse, enter the HTTP Path value from the Connection Details tab for your SQL warehouse.
To choose an authentication type, enter the number that corresponds with use oauth (recommended) or use access token.
If you chose use access token for your authentication type, enter the value of your Azure Databricks personal access token.
Note
As a security best practice, when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
When prompted for the desired Unity Catalog option value, enter the number that corresponds with use Unity Catalog or not use Unity Catalog.
If you chose to use Unity Catalog, enter the desired value for catalog when prompted.
Enter the desired values for schema and threads when prompted.
dbt writes your entries to a profiles.yml file. The location of this file is listed in the output of the dbt init command. You can also list this location later by running the dbt debug --config-dir command. You can open this file now to examine and verify its contents.
If you chose use oauth for your authentication type, add your machine-to-machine (M2M) or user-to-machine (U2M) authentication profile to profiles.yml.
Databricks does not recommend specifying secrets in profiles.yml directly. Instead, set the client ID and client secret as environment variables.
Confirm the connection details by running the dbt debug command on the my_dbt_demo directory.
If you chose use oauth for your authentication type, you're prompted to sign in with your identity provider.
Important
Before you begin, verify that your compute or SQL warehouse is running.
Run the following commands:
cd my_dbt_demo
dbt debug
You should see output similar to the following:
...
Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]
Required dependencies:
  - git [OK found]
Connection:
  ...
  Connection test: OK connection ok
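For the OAuth machine-to-machine case mentioned in this procedure, the following sketch shows one way to keep the client ID and client secret out of profiles.yml by reading them through dbt's env_var function. The environment variable names, the angle-bracket placeholders, the threads value, and the write_oauth_profile helper are all assumptions for illustration. The profile keys are believed to match the dbt-databricks adapter's OAuth settings, but verify them against the adapter documentation before relying on them.

```shell
#!/usr/bin/env bash
# Sketch only: write an example OAuth (M2M) profile that reads secrets from the environment.
# All angle-bracket values are placeholders; the variable names are hypothetical.

write_oauth_profile() {
  # $1: path of the file to write. Use a scratch path, then merge into profiles.yml by hand.
  # The quoted 'EOF' keeps the shell from expanding anything inside the YAML.
  cat > "$1" <<'EOF'
my_dbt_demo:
  target: dev
  outputs:
    dev:
      type: databricks
      host: <server-hostname>   # Server Hostname value
      http_path: <http-path>    # HTTP Path value
      catalog: <catalog>        # only when using Unity Catalog
      schema: <schema>
      threads: 1                # example value
      auth_type: oauth
      client_id: "{{ env_var('DATABRICKS_CLIENT_ID') }}"
      client_secret: "{{ env_var('DATABRICKS_CLIENT_SECRET') }}"
EOF
}

# Set the credentials in the shell that runs dbt, not in the file:
#   export DATABRICKS_CLIENT_ID="<client-id>"
#   export DATABRICKS_CLIENT_SECRET="<client-secret>"
```

Writing to a scratch file first, for example write_oauth_profile /tmp/profile-example.yml, avoids overwriting the profiles.yml that dbt init generated. After merging the entry and exporting the two variables, dbt debug can be rerun to confirm the connection.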
Next steps
- Create, run, and test dbt Core models locally. See the dbt Core tutorial.