Important
This documentation has been retired and might not be updated.
Databricks recommends that instead of dbx sync, you use Databricks CLI version 0.205 or above, which includes functionality similar to dbx sync through the databricks sync command.
The Databricks extension for Visual Studio Code also includes functionality similar to dbx sync integrated into the Visual Studio Code IDE. Note that dbx sync can synchronize file changes from a local development machine to DBFS, workspace locations, and Databricks Git folders in your Azure Databricks workspaces. The Databricks extension for Visual Studio Code supports synchronizing file changes only to workspace user (/Users) files and Databricks Git folders (/Repos).
Note
This article covers dbx by Databricks Labs, which is provided as-is and is not supported by Databricks through customer technical support channels. Questions and feature requests can be communicated through the Issues page of the databrickslabs/dbx repo on GitHub.
You can perform real-time synchronization of changes to files on your local development machine with their corresponding files in your Azure Databricks workspaces by using dbx by Databricks Labs. These workspace files can be in DBFS or in Databricks Git folders.
Real-time file synchronization with dbx (also known as dbx sync) is useful in rapid code development scenarios. For example, you can use a local integrated development environment (IDE) for productivity features such as syntax highlighting, smart code completion, code linting, and testing and debugging. You can then go immediately to your workspace and run your updated code.
You can use dbx sync by itself, with automated jobs, or with an IDE.
dbx sync development workflows
There are two development workflows for dbx sync, one with DBFS and another with Databricks Git folders.
The typical development workflow with dbx sync and DBFS is:
1. Identify a local directory that contains the files you want to synchronize to DBFS.
2. Identify the path in DBFS that you want your local directory to synchronize with (or let dbx sync create a default DBFS path for you).
3. Run dbx sync dbfs to synchronize your local directory to the DBFS path. dbx sync begins watching your local directory for any file changes.
4. Make changes to files in your local directory as needed. dbx sync applies those changes to the corresponding files in the DBFS path in real time.
The typical development workflow with dbx sync and Databricks Git folders is:
1. Create a repository with a Git provider that Databricks Git folders supports, if you do not have a repository available already.
2. Clone your repo into your Azure Databricks workspace.
3. Clone your repo to your local development machine.
4. Run dbx sync repo to associate your local cloned repo with your workspace cloned repo. dbx sync begins watching your local directory for any file changes.
5. Make changes to files in your local cloned repo as needed. dbx sync applies those changes to the corresponding files in Databricks Git folders in real time.
6. Periodically push updated files from the cloned repo in your workspace to your Git provider, so that the repo stays up to date with your Git provider.
Important
dbx sync only performs one-way, real-time synchronization of file changes from your local development machine to your remote workspace. Therefore, Databricks does not recommend that you initiate changes in your Azure Databricks workspace to files that are monitored by dbx sync. If you must make such workspace-initiated file changes, then you must also do the following:
- For file changes in DBFS, make the corresponding changes to the local files manually.
- For file changes in Databricks Git folders, push the file changes from your workspace to your Git provider. Then, on your local development machine, pull those file changes from your Git provider.
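The one-way behavior described above can be illustrated with a small local sketch. This is not how dbx sync is implemented; it is a conceptual stand-in that uses two local directories in place of your machine and your workspace, and the function names (`snapshot`, `push_changes`) are hypothetical. It shows why an edit made on the target side is never carried back to the source and is overwritten the next time the source file changes.

```python
import shutil
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file (relative to root) to its (modification time, size)."""
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            info = path.stat()
            result[path.relative_to(root)] = (info.st_mtime_ns, info.st_size)
    return result


def push_changes(source: Path, target: Path, previous: dict) -> dict:
    """Copy new or modified files from source to target, never the reverse.

    `previous` is the snapshot returned by the last call (or {} on the
    first call). The target directory is never read, so edits made there
    are invisible to this function.
    """
    current = snapshot(source)
    for rel, fingerprint in current.items():
        if previous.get(rel) != fingerprint:
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source / rel, dest)
    return current
```

Calling `push_changes` repeatedly, each time passing in the snapshot it last returned, mimics a watcher loop. For simplicity the sketch handles only created and modified files.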
Requirements
If you want to use dbx sync with Databricks Git folders, the following is suggested, though not required, for your Azure Databricks workspace:
- A clone of your repository with your Git provider.
On your local development machine, you must have the following installed:
- Python version 3.8 or above. To check whether Python is installed, and to check your installed Python version, run python --version in your terminal or PowerShell.

      python --version

  Note
  Some installations of python may require you to use python3 instead of python. If so, substitute python with python3 throughout this article.

- pip. To check whether pip is installed, and to check your installed pip version, run pip --version or python -m pip --version.

      pip --version
      # Or...
      python -m pip --version

  Note
  Some installations of pip may require you to use pip3 instead of pip. If so, substitute pip with pip3 throughout this article.

- dbx version 0.8.0 or above. To check whether dbx is installed, and to check your installed dbx version, run dbx --version. To install dbx from the Python Package Index (PyPI), run pip install dbx or python -m pip install dbx. (dbx includes dbx sync.)

      # Check whether dbx is installed, and check its version.
      dbx --version
      # Install dbx.
      pip install dbx
      # Or...
      python -m pip install dbx

  Note
  For more information about dbx, see dbx by Databricks Labs and the dbx documentation.

- The Databricks CLI version 0.18 or below, set up with authentication. The legacy Databricks CLI (Databricks CLI version 0.17) is automatically installed when you install dbx. This authentication can be set up on your local development machine in one or both of the following locations:
  - Within the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables (starting with legacy Databricks CLI version 0.8.0).
  - In an Azure Databricks configuration profile within your .databrickscfg file.

  dbx looks for authentication credentials in these two locations, in that order. dbx uses only the first set of matching credentials that it finds.

  Note
  If you use a .databrickscfg file, dbx sync looks in this file for a configuration profile named DEFAULT by default. To specify a different profile, use the --profile option when you run the dbx sync command, later in this article. dbx does not support the use of a .netrc file for authentication.

- If you want to use dbx sync with Databricks Git folders, a local clone of your repository with your Git provider, while not required, is suggested. To perform a local clone, consult your Git provider's documentation.
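Before running dbx sync, it can help to confirm which of the two credential locations described above would be picked up first. The following is a minimal sketch, not part of dbx. The function name `find_credentials_source` is hypothetical, and the sketch assumes a token-based profile that stores its values under the `host` and `token` keys.

```python
import configparser
from pathlib import Path
from typing import Mapping, Optional


def find_credentials_source(
    env: Mapping[str, str],
    cfg_path: Path,
    profile: str = "DEFAULT",
) -> Optional[str]:
    """Return "environment", "profile", or None, checking in that order."""
    # First location: the two environment variables.
    if env.get("DATABRICKS_HOST") and env.get("DATABRICKS_TOKEN"):
        return "environment"

    # Second location: a configuration profile in the .databrickscfg file.
    if cfg_path.is_file():
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(cfg_path)
        # configparser exposes [DEFAULT] through defaults(), not has_section().
        if profile == "DEFAULT":
            values = parser.defaults()
        elif parser.has_section(profile):
            values = parser[profile]
        else:
            values = {}
        # Assumption: a token-based profile with "host" and "token" keys.
        if values.get("host") and values.get("token"):
            return "profile"

    return None
```

A typical call would be `find_credentials_source(os.environ, Path.home() / ".databrickscfg")` after `import os`. A result of `None` suggests that neither location is set up yet.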
Use DBFS with dbx sync
1. From the terminal or PowerShell on your local development machine, change to the directory that contains the files you want to synchronize to DBFS in your Azure Databricks workspace.

2. Run the dbx sync command to synchronize your local directory to DBFS in your workspace, as follows. (Do not forget the dot (.) at the end, which represents your current directory.)

       dbx sync dbfs --source .

   Tip
   To specify a different source directory, replace the dot (.) with a different path.

   Note
   If the error Error: No such command 'sync' appears, your installation of dbx is likely out of date. To fix this, run pip install --upgrade dbx==<version> or python -m pip install --upgrade dbx==<version>, where <version> is the latest version of dbx. This version number can be found on the PyPI webpage for dbx.

       pip install --upgrade dbx==<version>
       # Or...
       python -m pip install --upgrade dbx==<version>

   dbx sync begins synchronizing files in your current local directory with files in the following DBFS path in your workspace. dbx sync confirms this by printing Target base path followed by the DBFS path, for example:

       /tmp/users/<your-Databricks-username>/<local-directory-name>

   Tip
   To specify a different username or DBFS path, specify the --user and --dest options, respectively, when you run dbx sync.

3. Make changes to your local files, as needed.

   Important
   You must keep your terminal or PowerShell open for dbx sync to continue synchronizing. If you close your terminal or PowerShell, dbx sync stops watching for file changes and stops synchronizing. To resume file change synchronization, repeat this procedure from the beginning.

4. As needed, verify your file changes in the preceding path in DBFS in your workspace.
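If you start dbx sync from a script, the command line from the procedure above can be assembled programmatically. The following is a sketch under stated assumptions: the helper names `build_dbfs_sync_command` and `default_target_path` are hypothetical, only the options named in this article (--source, --user, --dest, --profile) are used, and the default path simply follows the example format shown above rather than anything verified against dbx itself.

```python
from pathlib import Path
from typing import List, Optional


def build_dbfs_sync_command(
    source: str = ".",
    user: Optional[str] = None,
    dest: Optional[str] = None,
    profile: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `dbx sync dbfs`."""
    command = ["dbx", "sync", "dbfs", "--source", source]
    if user:
        command += ["--user", user]
    if dest:
        command += ["--dest", dest]
    if profile:
        command += ["--profile", profile]
    return command


def default_target_path(username: str, source: str = ".") -> str:
    """Follow the example format /tmp/users/<username>/<local-directory-name>."""
    local_directory_name = Path(source).resolve().name
    return f"/tmp/users/{username}/{local_directory_name}"
```

The resulting list can be passed to `subprocess.run(command, check=True)`. That call blocks for as long as dbx sync keeps watching, which matches the requirement to keep the terminal session open.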
Use Databricks Git folders with dbx sync
1. From the terminal or PowerShell on your local development machine, change to the root directory that contains the clone of the repository with your Git provider.

2. In your Azure Databricks workspace, identify the name of the Databricks Git folder that you want to synchronize your local cloned repo to. You can find this repo name by clicking Git folders in your workspace's sidebar.

3. On your local development machine, run the dbx sync command to synchronize your local cloned repository to the Databricks Git folders in your workspace as follows, replacing <your-repo-name> with the name of your repo in Databricks Git folders. (Do not forget the dot (.) at the end, which represents your current directory.)

       dbx sync repo -d <your-repo-name> --source .

   Tip
   To specify a different source directory, replace the dot (.) with a different path.

   Note
   If the error Error: No such command 'sync' appears, your installation of dbx is likely out of date. To fix this, run pip install --upgrade dbx==<version> or python -m pip install --upgrade dbx==<version>, where <version> is the latest version of dbx. This version number can be found on the PyPI webpage for dbx.

       pip install --upgrade dbx==<version>
       # Or...
       python -m pip install --upgrade dbx==<version>

   dbx sync begins synchronizing files in your local cloned repository with files in Databricks Git folders in your workspace. dbx sync confirms this by printing Target base path followed by the Databricks Git folders path, for example:

       /Repos/<your-Databricks-username>/<your-repo-name>

   Tip
   To specify a different username or repo name, specify the --user and --dest-repo options, respectively, when you run dbx sync.

4. Make changes to your local files, as needed.

   Important
   You must keep your terminal or PowerShell open for dbx sync to continue synchronizing. If you close your terminal or PowerShell, dbx sync stops watching for file changes and stops synchronizing. To resume file change synchronization, repeat this procedure from the beginning.

5. As needed, verify your file changes in Databricks Git folders in your workspace.
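Because this procedure expects you to start from the root of your local clone, a script that launches dbx sync might first check for that. The following sketch uses hypothetical helper names (`build_repo_sync_command`, `expected_git_folder_path`) and assumes that the presence of a `.git` entry is a good enough sign of a clone root. The command uses the same `-d` and `--source` arguments as shown in this article, and the path helper follows the example format above.

```python
from pathlib import Path
from typing import List


def build_repo_sync_command(repo_name: str, source: str = ".") -> List[str]:
    """Assemble `dbx sync repo` arguments after a basic clone-root check."""
    root = Path(source)
    # Assumption: a clone root contains a ".git" directory (or file).
    if not (root / ".git").exists():
        raise ValueError(f"{root.resolve()} does not look like the root of a Git clone")
    return ["dbx", "sync", "repo", "-d", repo_name, "--source", source]


def expected_git_folder_path(username: str, repo_name: str) -> str:
    """Follow the example format /Repos/<username>/<repo-name>."""
    return f"/Repos/{username}/{repo_name}"
```

Comparing the value from `expected_git_folder_path` with the Target base path that dbx sync prints is a quick way to confirm that changes are going to the Git folder you intended.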
Additional resources
- dbx documentation
- dbx sync documentation
- databrickslabs/dbx repository on GitHub
- dbx limitations