Run and debug notebook cells with Databricks Connect using the Databricks extension for Visual Studio Code
You can run and debug notebooks, one cell at a time or all cells at once, and see their results in the Visual Studio Code UI by using the Databricks Connect integration in the Databricks extension for Visual Studio Code. Code runs locally, except for code involving DataFrame operations, which runs on the cluster in the remote Azure Databricks workspace; run responses are then sent back to the local caller. All code is debugged locally, while all Spark code continues to run on the cluster in the remote Azure Databricks workspace. The core Spark engine code cannot be debugged directly from the client.
Note
This feature works with Databricks Runtime 13.3 and above.
To enable the Databricks Connect integration for notebooks in the Databricks extension for Visual Studio Code, you must install Databricks Connect in the Databricks extension for Visual Studio Code. See Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.
Run Python notebook cells
For notebooks with filenames that have a .py extension, when you open the notebook in the Visual Studio Code IDE, each cell displays Run Cell, Run Above, and Debug Cell buttons. As you run a cell, its results are shown in a separate tab in the IDE. As you debug, the cell being debugged displays Continue, Stop, and Step Over buttons. As you debug a cell, you can use Visual Studio Code debugging features such as watching variables' states and viewing the call stack and debug console.
For notebooks with filenames that have a .ipynb extension, when you open the notebook in the Visual Studio Code IDE, the notebook and its cells contain additional features. See Running cells and Work with code cells in the Notebook Editor.
For more information about notebook formats for filenames with the .py and .ipynb extensions, see Export and import Databricks notebooks.
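This section does not say how the extension finds cell boundaries in a .py notebook. As a minimal sketch, assuming the Databricks source format conventions (a first line of "# Databricks notebook source", cells separated by "# COMMAND ----------", and magic lines prefixed with "# MAGIC"), the following splits such a file into cell texts. The split_cells helper is hypothetical and is not part of the extension.

```python
# Conventions assumed from the Databricks source format for .py notebooks.
HEADER = "# Databricks notebook source"
CELL_DELIMITER = "# COMMAND ----------"
MAGIC_PREFIX = "# MAGIC"


def split_cells(source):
    """Hypothetical helper: split a source-format .py notebook into cell texts."""
    lines = source.splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]  # the header line is not part of any cell
    cells = []
    current = []
    for line in lines:
        if line.strip() == CELL_DELIMITER:
            # A delimiter closes the current cell and starts a new one.
            cells.append("\n".join(current).strip("\n"))
            current = []
            continue
        if line == MAGIC_PREFIX or line.startswith(MAGIC_PREFIX + " "):
            # "# MAGIC %sql" becomes "%sql"; a bare "# MAGIC" line becomes blank.
            line = line[len(MAGIC_PREFIX) + 1:]
        current.append(line)
    cells.append("\n".join(current).strip("\n"))
    return [cell for cell in cells if cell]  # drop empty cells


sample = (
    "# Databricks notebook source\n"
    'print("hello")\n'
    "\n"
    "# COMMAND ----------\n"
    "\n"
    "# MAGIC %md\n"
    "# MAGIC ## Results\n"
)
print(split_cells(sample))  # ['print("hello")', '%md\n## Results']
```

Each returned string corresponds to one cell that would get its own Run Cell and Debug Cell buttons; magic cells come back starting with their magic, such as %md.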
Run Python Jupyter notebook cells
To run or debug a Python Jupyter notebook (.ipynb):
1. In your project, open the Python Jupyter notebook that you want to run or debug. Make sure the Python file is in Jupyter notebook format and has the extension .ipynb.
   Tip: You can create a new Python Jupyter notebook by running the >Create: New Jupyter Notebook command from within the Command Palette.
2. Click Run All Cells to run all cells without debugging, Execute Cell to run an individual cell without debugging, or Run by Line to run an individual cell line by line with limited debugging, with variable values displayed in the Jupyter panel (View > Open View > Jupyter).
3. For full debugging within an individual cell, set breakpoints, and then click Debug Cell in the menu next to the cell's Run button.
4. After you click any of these options, you might be prompted to install missing Python Jupyter notebook package dependencies. Click to install them.
For more information, see Jupyter Notebooks in VS Code.
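The steps above expect a file that is in Jupyter notebook format. As a minimal sketch, assuming the nbformat 4 JSON layout, the following writes such a file using only the Python standard library; minimal_notebook and example.ipynb are hypothetical names. Creating the notebook from the Command Palette is usually simpler; the sketch only shows what the file contains.

```python
import json
import tempfile
from pathlib import Path


def minimal_notebook(code_cells):
    """Hypothetical helper: build an nbformat 4 notebook with one code cell per string."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": source,  # nbformat 4 accepts a single string here
            }
            for source in code_cells
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# These cells use the preconfigured spark and display globals, so they only
# run after the file is opened in VS Code with the Databricks extension.
nb = minimal_notebook(["df = spark.range(3)", "display(df)"])
path = Path(tempfile.mkdtemp()) / "example.ipynb"  # hypothetical file name
path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
```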
Notebook globals
The following notebook globals are also enabled:
- spark, representing an instance of databricks.connect.DatabricksSession, is preconfigured to instantiate DatabricksSession by getting Azure Databricks authentication credentials from the extension. If DatabricksSession is already instantiated in a notebook cell's code, those DatabricksSession settings are used instead. See Code examples for Databricks Connect for Python.
- udf, preconfigured as an alias for pyspark.sql.functions.udf, which creates Python user-defined functions (UDFs). See pyspark.sql.functions.udf.
- sql, preconfigured as an alias for spark.sql, where spark is the preconfigured DatabricksSession instance described earlier. See Spark SQL.
- dbutils, preconfigured as an instance of Databricks Utilities, which is imported from databricks-sdk and is instantiated by getting Azure Databricks authentication credentials from the extension. See Use Databricks Utilities.
  Note: Only a subset of Databricks Utilities is supported for notebooks with Databricks Connect. To enable dbutils.widgets, you must first install the Databricks SDK for Python by running the following command in your local development machine's terminal: pip install 'databricks-sdk[notebook]'
- display, preconfigured as an alias for the Jupyter built-in IPython.display.display. See IPython.display.display.
- displayHTML, preconfigured as an alias for dbruntime.display.displayHTML, which is an alias for display.HTML from ipython. See IPython.display.html.
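The alias relationship between sql and spark.sql can be pictured with a toy model. This is a sketch under stated assumptions, not the extension's implementation: StubSession is a hypothetical stand-in for databricks.connect.DatabricksSession so the code runs without a cluster, and build_notebook_globals is a hypothetical helper. It only shows that sql is the bound spark.sql method, so cell code can call sql(...) without naming spark.

```python
class StubSession:
    """Hypothetical stand-in for databricks.connect.DatabricksSession."""

    def __init__(self, label):
        self.label = label

    def sql(self, query):
        # A real session returns a DataFrame; the stub returns a description.
        return f"{self.label} ran: {query}"


def build_notebook_globals(session):
    """Hypothetical helper: expose spark and its sql alias in one namespace."""
    return {"spark": session, "sql": session.sql}


ns = build_notebook_globals(StubSession("preconfigured"))
# Cell code shares the notebook namespace, so sql(...) is callable directly.
exec("result = sql('SELECT 1')", ns)
print(ns["result"])  # preconfigured ran: SELECT 1
```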
Notebook magics
The following notebook magics are also enabled:
- %fs, which is the same as making dbutils.fs calls. See Mix languages.
- %sh, which runs a command by using the cell magic %%script on the local machine. This does not run the command in the remote Azure Databricks workspace. See Mix languages.
- %md and %md-sandbox, which run the cell magic %%markdown. See Mix languages.
- %sql, which runs spark.sql. See Mix languages.
- %pip, which runs pip install on the local machine. This does not run pip install in the remote Azure Databricks workspace. See Manage libraries with %pip commands.
- %run, which runs another notebook. See Run a Databricks notebook from another notebook.
  Note: To enable %run, you must first install the nbformat library by running the following command in your local development machine's terminal: pip install nbformat
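The list of magics above amounts to a routing table keyed on a cell's first token. The following is a hypothetical sketch that restates that table; it is not how the extension is implemented. route_cell and fs_to_dbutils are hypothetical helpers, and fs_to_dbutils assumes simple positional arguments (flags such as -r are not handled). It also rejects %r and %scala, which the Limitations section lists as unsupported.

```python
# Hypothetical routing table that restates the magics listed above.
MAGIC_TARGETS = {
    "%fs": "dbutils.fs",
    "%sh": "%%script on the local machine",
    "%md": "%%markdown",
    "%md-sandbox": "%%markdown",
    "%sql": "spark.sql",
    "%pip": "pip install on the local machine",
    "%run": "run another notebook (needs nbformat)",
}
# Magics that are not supported for notebooks in VS Code.
UNSUPPORTED = {"%r", "%scala"}


def route_cell(cell):
    """Describe where a cell's code would be handled, based on its leading magic."""
    stripped = cell.strip()
    if not stripped.startswith("%"):
        return "python: runs locally, DataFrame operations run on the cluster"
    magic = stripped.split(maxsplit=1)[0]
    if magic in UNSUPPORTED:
        raise ValueError(f"{magic} is not supported for notebooks in VS Code")
    if magic not in MAGIC_TARGETS:
        raise ValueError(f"unrecognized magic: {magic}")
    return MAGIC_TARGETS[magic]


def fs_to_dbutils(line):
    """Rewrite a simple '%fs <command> <args>' line as a dbutils.fs call string."""
    _magic, command, *args = line.split()
    return f"dbutils.fs.{command}({', '.join(repr(arg) for arg in args)})"


print(route_cell("%sql\nSELECT 1"))  # spark.sql
print(fs_to_dbutils("%fs ls /tmp"))  # dbutils.fs.ls('/tmp')
```

Keeping the unsupported magics in a separate set makes the error for %r and %scala explicit rather than falling through to a generic "unrecognized" message.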
Additional features that are enabled include:
- Spark DataFrames are converted to pandas DataFrames, which are displayed in Jupyter table format.
Limitations
Limitations of running cells in notebooks in Visual Studio Code include:
- The notebook magics %r and %scala are not supported and display an error if called. See Mix languages.
- The notebook magic %sql does not support some commands, such as SHOW TABLES.