Photon runtime

Photon is the native vectorized query engine on Azure Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. It is developed in C++ to take advantage of modern hardware, and it uses vectorized query processing techniques to exploit data- and instruction-level parallelism in CPUs, improving performance on real-world data and applications, all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload.

Azure Databricks clusters

Photon is available for clusters running Databricks Runtime 9.1 LTS and above.

To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you create the cluster using the clusters API, set runtime_engine to PHOTON.
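As a rough illustration of the API route, the sketch below builds a cluster-create request body with runtime_engine set to PHOTON and rejects runtime versions older than 9.1. The helper name build_photon_cluster_spec, the cluster name, the node type, and the version key are placeholders chosen for this example, and the version check assumes the key starts with "<major>.<minor>". Check the Clusters API reference and the list of Photon-supported instance types before using real values.

```python
import json

# Earliest Databricks Runtime with Photon support, per the text above.
MIN_PHOTON_RUNTIME = (9, 1)


def build_photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster-create request body with Photon enabled.

    Hypothetical helper: it only assembles a dictionary and does not
    call any Databricks endpoint.
    """
    # Assumption: the version key starts with "<major>.<minor>",
    # for example "9.1.x-scala2.12".
    major, minor = spark_version.split(".")[:2]
    if (int(major), int(minor)) < MIN_PHOTON_RUNTIME:
        raise ValueError(
            f"Photon needs Databricks Runtime 9.1 LTS or above, got {spark_version}"
        )
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The setting described in the text above.
        "runtime_engine": "PHOTON",
    }


if __name__ == "__main__":
    # Placeholder values; substitute ones that are valid in your workspace.
    spec = build_photon_cluster_spec(
        cluster_name="photon-demo",
        spark_version="9.1.x-scala2.12",
        node_type_id="Standard_DS3_v2",
        num_workers=2,
    )
    print(json.dumps(spec, indent=2))
```

The resulting JSON would be sent as the body of a cluster-create request; authentication and the HTTP call itself are left out here.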

Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Azure Databricks pricing page.

Photon coverage

Operators

  • Scan, Filter, Project
  • Hash Aggregate/Join/Shuffle
  • Nested-Loop Join
  • Null-Aware Anti Join
  • Union, Expand, ScalarSubquery
  • Delta/Parquet Write Sink
  • Sort
  • Window Function

Expressions

  • Comparison / Logic
  • Arithmetic / Math (most)
  • Conditional (IF, CASE, etc.)
  • String (common ones)
  • Casts
  • Aggregates (most common ones)
  • Date/Timestamp

Data types

  • Byte/Short/Int/Long
  • Boolean
  • String/Binary
  • Decimal
  • Float/Double
  • Date/Timestamp
  • Struct
  • Array
  • Map

Photon advantages

  • Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
  • Expected to accelerate queries that process a significant amount of data (100 GB or more) and include aggregations and joins.
  • Faster performance when data is accessed repeatedly from the disk cache.
  • More robust scan performance on tables with many columns and many small files.
  • Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, especially for wide tables (hundreds to thousands of columns).
  • Replaces sort-merge joins with hash joins.

Limitations

  • Does not support Spark Structured Streaming.
  • Does not support UDFs.
  • Does not support RDD APIs.
  • Not expected to improve short-running queries (under 2 seconds), such as queries against small amounts of data.

Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance advantage for those features.
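The guidance in the lists above can be turned into a quick triage check. The sketch below is one possible encoding, not an official tool: the Workload class, its field names, and both functions are hypothetical, and the thresholds (100 GB, 2 seconds) are taken directly from the bullets. It is deliberately conservative, treating any use of an unsupported feature as a reason not to expect a gain, even though the supported parts of a mixed workload may still be accelerated.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; not a Databricks type."""
    uses_structured_streaming: bool = False
    uses_udfs: bool = False
    uses_rdd_api: bool = False
    data_processed_gb: float = 0.0
    has_aggregations_and_joins: bool = False
    typical_query_seconds: float = 0.0


def unaccelerated_features(w):
    """List the features used by the workload that Photon does not support."""
    flags = {
        "Spark Structured Streaming": w.uses_structured_streaming,
        "UDFs": w.uses_udfs,
        "RDD APIs": w.uses_rdd_api,
    }
    return [name for name, used in flags.items() if used]


def likely_to_benefit(w):
    """Conservative rule of thumb built from the bullets above."""
    # Unsupported features run as they would without Photon.
    if unaccelerated_features(w):
        return False
    # Short-running queries (under 2 seconds) are not expected to improve.
    if w.typical_query_seconds < 2:
        return False
    # Large scans (100 GB or more) with aggregations and joins are the target case.
    return w.data_processed_gb >= 100 and w.has_aggregations_and_joins


if __name__ == "__main__":
    nightly_etl = Workload(
        data_processed_gb=500,
        has_aggregations_and_joins=True,
        typical_query_seconds=120,
    )
    print(likely_to_benefit(nightly_etl))
    print(unaccelerated_features(Workload(uses_udfs=True)))
```

A check like this is only a starting point; measuring the same job with and without Photon is the reliable way to confirm a benefit.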