Known limitations of Databricks notebooks
This article covers known limitations of Databricks notebooks. For additional resource limits, see Resource limits.
Notebook sizing
- Individual notebook cells have an input limit of 6 MB.
- The maximum notebook size for autosaved revision snapshots, import, export, and cloning is 10 MB.
- You can manually save notebooks up to 32 MB.
Notebook results table
- Table results are limited to 10,000 rows or 2 MB, whichever is lower.
- Job clusters have a maximum notebook output size of 30 MB.
- Results of non-tabular commands have a 20 MB limit.
- By default, text results return a maximum of 50,000 characters. With Databricks Runtime 12.2 LTS and above, you can increase this limit by setting the Spark configuration property `spark.databricks.driver.maxReplOutputLength`.
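For example, to raise the limit to 100,000 characters (an illustrative value), you could add the property to the cluster's Spark configuration:

```
spark.databricks.driver.maxReplOutputLength 100000
```
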
Notebook debugger
Limitations of the notebook debugger:
- The debugger works only with Python. It does not support Scala or R.
- The debugger does not work on Shared access mode clusters.
- The debugger does not support stepping into external files or modules.
- You cannot run other commands in the notebook when a debug session is active.
SQL warehouse notebooks
Limitations of SQL warehouse notebooks:
- When attached to a SQL warehouse, execution contexts have an idle timeout of 8 hours.
ipywidgets
Limitations of ipywidgets:
- A notebook using ipywidgets must be attached to a running cluster.
- Widget states are not preserved across notebook sessions. You must re-run widget cells to render them each time you attach the notebook to a cluster.
- The Password and Controller ipywidgets are not supported.
- HTMLMath and Label widgets with LaTeX expressions do not render correctly. For example, `widgets.Label(value=r'$$\frac{x+1}{x-1}$$')` does not render correctly.
- Widgets might not render correctly if the notebook is in dark mode, especially colored widgets.
- Widget outputs cannot be used in notebook dashboard views.
- The maximum message payload size for an ipywidget is 5 MB. Widgets that use images or large text data may not be properly rendered.
Databricks widgets
Limitations of Databricks widgets:
- A maximum of 512 widgets can be created in a notebook.
- A widget name is limited to 1024 characters.
- A widget label is limited to 2048 characters.
- A maximum of 2048 characters can be input to a text widget.
- There can be a maximum of 1024 choices for a multi-select, combo box, or dropdown widget.
- There is a known issue where a widget state might not properly clear after pressing Run All, even after clearing or removing the widget in code. If this happens, you will see a discrepancy between the widget's visual and printed states. Re-running the cells individually might bypass this issue. To avoid this issue entirely, Databricks recommends using ipywidgets.
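If your code creates widgets dynamically, it can help to validate inputs against these limits first. A minimal sketch, assuming the limits listed above (the helper name `check_dropdown_args` is hypothetical, not part of the Databricks API):

```python
# Documented Databricks widget limits (see the list above).
MAX_WIDGET_NAME_CHARS = 1024
MAX_WIDGET_LABEL_CHARS = 2048
MAX_DROPDOWN_CHOICES = 1024

def check_dropdown_args(name, label, choices):
    """Hypothetical helper: validate arguments before calling dbutils.widgets.dropdown."""
    if len(name) > MAX_WIDGET_NAME_CHARS:
        raise ValueError(f"widget name exceeds {MAX_WIDGET_NAME_CHARS} characters")
    if len(label) > MAX_WIDGET_LABEL_CHARS:
        raise ValueError(f"widget label exceeds {MAX_WIDGET_LABEL_CHARS} characters")
    if len(choices) > MAX_DROPDOWN_CHOICES:
        raise ValueError(f"dropdown widgets support at most {MAX_DROPDOWN_CHOICES} choices")

check_dropdown_args("env", "Environment", ["dev", "staging", "prod"])  # passes silently
```
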
You should not access widget state directly in asynchronous contexts like threads, subprocesses, or Structured Streaming (foreachBatch), as widget state can change while the asynchronous code is running. If you need to access widget state in an asynchronous context, pass it in as an argument. For example, if you have the following code that uses threads:
```python
import threading

def thread_func():
    # Unsafe access in a thread
    value = dbutils.widgets.get('my_widget')
    print(value)

thread = threading.Thread(target=thread_func)
thread.start()
thread.join()
```
Databricks recommends using an argument instead:
```python
import threading

# Access widget values outside the asynchronous context and pass them to the function
value = dbutils.widgets.get('my_widget')

def thread_func(val):
    # Use the passed value safely inside the thread
    print(val)

thread = threading.Thread(target=thread_func, args=(value,))
thread.start()
thread.join()
```
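The same pass-by-argument pattern applies outside of Databricks as well. A standalone sketch, with a plain string standing in for the `dbutils.widgets.get` call (which is only available inside a Databricks notebook):

```python
import threading

# Placeholder for a value read from a widget on the main thread,
# e.g. value = dbutils.widgets.get('my_widget') in a real notebook.
value = "snapshot-of-widget-state"

results = []

def thread_func(val):
    # The thread only sees the snapshot passed in; later widget
    # changes on the main thread cannot affect it.
    results.append(val)

thread = threading.Thread(target=thread_func, args=(value,))
thread.start()
thread.join()
print(results)  # ['snapshot-of-widget-state']
```
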
Widgets generally cannot pass arguments between different languages within a notebook. You can create a widget `arg1` in a Python cell and use it in a SQL or Scala cell if you run one cell at a time. However, this does not work if you use Run All or run the notebook as a job. Some workarounds are:
- For notebooks that do not mix languages, you can create a notebook for each language and pass the arguments when you run the notebook.
- You can access the widget using a `spark.sql()` call. For example, in Python: `spark.sql("select getArgument('arg1')").take(1)[0][0]`.
Bamboolib
Limitations of bamboolib:
- Using bamboolib for data wrangling is limited to approximately 10 million rows. This limit is based on pandas and your cluster's compute resources.
- Using bamboolib for data visualizations is limited to approximately 10 thousand rows. This limit is based on plotly.