Asynchronous queries and interruptions with Databricks Connect for Python
Note
This article covers Databricks Connect for Databricks Runtime 14.0 and above.
This article describes how to handle asynchronous queries and interruptions with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Azure Databricks clusters. See What is Databricks Connect?. For the Scala version of this article, see Asynchronous queries and interruptions with Databricks Connect for Scala.
Note
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
For Databricks Connect for Databricks Runtime 14.0 and above, query execution is more resilient to network and other interruptions when executing long-running queries. When the client program receives an interruption or the operating system pauses the process (for up to 5 minutes), such as when the laptop lid is shut, the client reconnects to the running query. This also allows queries to run for longer than the previous limit of 1 hour.
Databricks Connect also lets you interrupt running queries when needed, for example to save costs.
The following Python program interrupts a long-running query by using the interruptTag() API.
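from databricks.connect import DatabricksSession
from time import sleep
import threading

session = DatabricksSession.builder.getOrCreate()

def thread_fn():
    # Wait 5 seconds, then interrupt every running query tagged "interrupt-me".
    sleep(5)
    session.interruptTag("interrupt-me")

# All subsequent DataFrame queries that use session will have this tag.
session.addTag("interrupt-me")

# Keep a reference to the Thread object: start() returns None, so chaining
# .start() onto the constructor would leave t as None and break t.join().
t = threading.Thread(target=thread_fn)
t.start()

df = <a long running DataFrame query>
df.show()

t.join()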
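The tag-then-interrupt pattern in the program above can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the Databricks Connect API: the name `interrupt_after` and its parameters are hypothetical, and the sketch assumes the session object exposes `addTag()`, `interruptTag()`, and `removeTag()` (the tag methods provided by Spark Connect sessions). It uses `threading.Timer` in place of a hand-written thread with `sleep()`, so the pending interruption can be cancelled if the query finishes early.

```python
import threading
from contextlib import contextmanager


@contextmanager
def interrupt_after(session, tag, timeout_s):
    """Hypothetical helper: tag queries started inside the `with` block and
    interrupt them if the block is still running after `timeout_s` seconds.

    Assumes `session` provides addTag(), interruptTag(), and removeTag().
    """
    # Queries started by this thread from now on carry the tag.
    session.addTag(tag)

    # Schedule the interruption on a background timer thread.
    timer = threading.Timer(timeout_s, session.interruptTag, args=(tag,))
    timer.start()
    try:
        yield
    finally:
        # If the block finished (or failed) before the timeout, cancel the
        # pending interruption; then stop tagging later queries.
        timer.cancel()
        session.removeTag(tag)


# Example usage (requires a configured Databricks Connect client):
#
#   from databricks.connect import DatabricksSession
#   session = DatabricksSession.builder.getOrCreate()
#   with interrupt_after(session, "interrupt-me", timeout_s=5):
#       df.show()  # df is a long-running DataFrame query
```

Removing the tag in the `finally` clause keeps later queries on the same session from being interrupted by an unrelated call to `interruptTag()` with the same tag.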