How to use GraphFrames on Azure Databricks
This article includes example notebooks to help you get started using GraphFrames on Azure Databricks. GraphFrames is a package for Apache Spark that provides DataFrame-based graphs. It provides high-level APIs in Java, Python, and Scala. It aims to provide both the functionality of GraphX and extended functionality taking advantage of Spark DataFrames. This extended functionality includes motif finding, DataFrame-based serialization, and highly expressive graph queries.
This article includes three example notebooks: an introductory notebook available in Python and in Scala, and a Python user guide. For additional examples using GraphFrames with Scala, see GraphFrames user guide - Scala.
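To make the DataFrame-based graph model and motif finding more concrete, here is a minimal sketch in Python. It is not taken from the notebooks. The sample data, the constant names, and the helper functions `build_graph` and `find_mutual_pairs` are hypothetical, and the code assumes a cluster where GraphFrames is installed and a `SparkSession` is available (in a Databricks notebook, the predefined `spark` variable).

```python
# Hypothetical sample data: the names and relationships are invented for illustration.
VERTICES = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
EDGES = [("a", "b", "follows"), ("b", "a", "follows"), ("b", "c", "follows")]

# Motif pattern: two vertices connected by edges in both directions.
MUTUAL_MOTIF = "(x)-[e1]->(y); (y)-[e2]->(x)"


def build_graph(spark):
    """Build a GraphFrame from the sample data using an existing SparkSession."""
    # Imported lazily so this module still loads where GraphFrames is not installed.
    from graphframes import GraphFrame

    # GraphFrames expects an "id" column on vertices and "src"/"dst" columns on edges.
    vertices = spark.createDataFrame(VERTICES, ["id", "name"])
    edges = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])
    return GraphFrame(vertices, edges)


def find_mutual_pairs(graph):
    """Return a DataFrame of motif matches, with columns x, e1, y, and e2."""
    return graph.find(MUTUAL_MOTIF)
```

In a notebook, calling `find_mutual_pairs(build_graph(spark)).show()` would display the matches. Because the result is an ordinary DataFrame, it can be filtered or joined like any other Spark DataFrame.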
Databricks Runtime recommendation for GraphFrames
Databricks recommends using a cluster running Databricks Runtime for Machine Learning, as it includes an optimized installation of GraphFrames.
If you are not using a cluster running Databricks Runtime ML, download the JAR file from the GraphFrames library, upload it to a volume, and install it on your cluster.
Get started with GraphFrames
The following notebooks show you how to use GraphFrames to perform graph analysis.
Graph Analysis with GraphFrames (Python)
Graph Analysis with GraphFrames (Scala)
GraphFrames user guide (Python)
The following notebook includes Python code examples of how to use GraphFrames.
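As a rough illustration of the kind of Python code such a guide covers, the sketch below computes in-degrees and PageRank scores for a small graph. It is an assumption-laden example rather than an excerpt from the notebook. `SAMPLE_EDGES`, `vertex_ids`, and `analyze` are hypothetical names, the parameter values are arbitrary, and the code assumes GraphFrames is installed and that a `SparkSession` is passed in.

```python
# Hypothetical sample edges as (src, dst) pairs, invented for illustration.
SAMPLE_EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]


def vertex_ids(edges):
    """Return the sorted, de-duplicated vertex ids that appear in the edge list."""
    return sorted({endpoint for edge in edges for endpoint in edge})


def analyze(spark, edges=SAMPLE_EDGES):
    """Return (in-degree DataFrame, PageRank DataFrame) for the given edges."""
    # Imported lazily so this module still loads where GraphFrames is not installed.
    from graphframes import GraphFrame

    # Vertices need an "id" column; edges need "src" and "dst" columns.
    vertices = spark.createDataFrame([(v,) for v in vertex_ids(edges)], ["id"])
    edge_df = spark.createDataFrame(edges, ["src", "dst"])
    graph = GraphFrame(vertices, edge_df)

    # inDegrees is a DataFrame with columns "id" and "inDegree".
    in_degrees = graph.inDegrees

    # pageRank returns a new GraphFrame whose vertices carry a "pagerank" column.
    # The reset probability and iteration count here are arbitrary example values.
    ranked = graph.pageRank(resetProbability=0.15, maxIter=5)
    return in_degrees, ranked.vertices.select("id", "pagerank")
```

In a notebook, `in_degrees, ranks = analyze(spark)` followed by `ranks.orderBy("pagerank", ascending=False).show()` would list the vertices by score.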