AI and machine learning integrations

Azure Databricks has validated integrations with various third-party solutions that enable common machine learning scenarios.

Ray integration

Ray is an open source framework for scaling Python applications. It includes libraries specific to AI workloads, making it especially suited for developing AI applications. Running Ray on Azure Databricks allows you to leverage the breadth of the Azure Databricks ecosystem, enhancing data processing and machine learning workflows with services and integrations unavailable in open source Ray.
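
As a rough illustration of what scaling a Python function with Ray looks like, the following minimal sketch fans a plain function out as Ray tasks. The function names `square` and `parallel_squares` are hypothetical and chosen only for this example. It assumes the `ray` package is installed and that a Ray cluster is reachable; on Azure Databricks a cluster is typically started first with `ray.util.spark.setup_ray_cluster`, which is omitted here.

```python
def square(x):
    """Plain Python function; Ray wraps it as a remote task below."""
    return x * x


def parallel_squares(values):
    """Run square() as Ray tasks and return the results in input order.

    Hypothetical helper for illustration; assumes Ray is installed.
    """
    import ray  # imported lazily so this module loads without Ray installed

    # Attach to an existing Ray cluster if one is configured,
    # otherwise start a local one. Safe to call more than once.
    ray.init(ignore_reinit_error=True)

    # ray.remote() turns the function into a task that can run on any worker.
    remote_square = ray.remote(square)

    # .remote() schedules each task and immediately returns a future.
    futures = [remote_square.remote(v) for v in values]

    # ray.get() blocks until all tasks finish and preserves list order.
    return ray.get(futures)
```

In a notebook, calling `parallel_squares(range(8))` would schedule eight tasks across the available Ray workers. Keeping `square` as an ordinary function means it can still be called and tested without Ray.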

See What is Ray on Azure Databricks? for more information.

GraphFrames integration

GraphFrames is a package for Apache Spark that provides DataFrame-based graphs, with high-level APIs in Java, Python, and Scala. It aims to provide both the functionality of GraphX and extended functionality that takes advantage of Spark DataFrames, including motif finding, DataFrame-based serialization, and highly expressive graph queries.
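
To make motif finding concrete, here is a minimal sketch that builds a small graph from two DataFrames and searches it for pairs of vertices connected in both directions. The sample data, the constant names, and the helper `find_mutual_follows` are hypothetical. The sketch assumes the `graphframes` package is available on the cluster and that `spark` is an active `SparkSession`, as it is in a Databricks notebook.

```python
# Hypothetical sample data: who follows whom.
VERTICES = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
EDGES = [
    ("a", "b", "follows"),
    ("b", "a", "follows"),
    ("b", "c", "follows"),
]

# Motif pattern: two vertices joined by an edge in each direction.
MUTUAL_MOTIF = "(x)-[e1]->(y); (y)-[e2]->(x)"


def find_mutual_follows(spark):
    """Return a DataFrame of vertex pairs that follow each other.

    Hypothetical helper for illustration; assumes graphframes is installed.
    """
    from graphframes import GraphFrame  # lazy import: needs the GraphFrames package

    # GraphFrames expects an "id" column for vertices
    # and "src" / "dst" columns for edges.
    vertices = spark.createDataFrame(VERTICES, ["id", "name"])
    edges = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])
    graph = GraphFrame(vertices, edges)

    # find() returns one row per match, with a struct column
    # for each named vertex and edge in the motif.
    matches = graph.find(MUTUAL_MOTIF)
    return matches.selectExpr("x.name AS person", "y.name AS follows_back")
```

Calling `find_mutual_follows(spark).show()` in a notebook would display the matches. Because the result is an ordinary DataFrame, it can be filtered, joined, or written out like any other Spark DataFrame.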

Data labeling

Labeling additional training data is an important step for many machine learning workflows, such as classification or computer vision applications. Azure Databricks does not directly support data labeling; however, the Databricks partnership with Labelbox simplifies the process.

See the Partner Connect documentation for Labelbox.