Neo4j

Neo4j is a native graph database that treats data relationships as first-class entities. You can connect an Azure Databricks cluster to a Neo4j cluster using the neo4j-spark-connector, which offers Apache Spark APIs for RDD, DataFrame, GraphX, and GraphFrames. The neo4j-spark-connector uses the binary Bolt protocol to transfer data to and from the Neo4j server.

This article describes how to deploy and configure Neo4j and how to configure Azure Databricks to access Neo4j, and it includes a notebook that demonstrates usage.

Note

You cannot access this data source from a cluster running Databricks Runtime 7.0 or above because a Neo4j connector that supports Apache Spark 3.0 is not available.

Neo4j deployment and configuration

You can deploy Neo4j on various cloud providers.

To deploy Neo4j, see the official Neo4j cloud deployment guide. This guide assumes Neo4j 3.2.2.

Change the default Neo4j password (you are prompted to do so when you first access Neo4j) and modify conf/neo4j.conf to accept remote connections.

# conf/neo4j.conf

# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=0.0.0.0:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=0.0.0.0:7473

For more information, see Configuring Neo4j Connectors.
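
After the Bolt connector is set to listen on 0.0.0.0:7687, a plain TCP check is one way to confirm that the port is reachable from a remote machine or notebook. The following is a minimal sketch that uses only the JVM standard library. It is not part of Neo4j or the connector, and `canConnect` is a hypothetical helper name introduced for illustration. It only verifies that the port accepts TCP connections, not that Bolt authentication succeeds.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: returns true if a TCP connection to host:port
// can be opened within timeoutMs milliseconds, false otherwise.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    // Covers refused connections, timeouts, and unresolvable hosts.
    case _: IOException => false
  } finally {
    socket.close()
  }
}

// Example call; replace the placeholder with the address of your Neo4j instance.
// canConnect("<ip-of-neo4j-instance>", 7687)
```

If the check returns false, the usual suspects are the listen_address setting shown above and any firewall or network security rules between the two machines.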

Azure Databricks configuration

  1. Install two libraries as Spark Packages: neo4j-spark-connector and graphframes. See the libraries guide for instructions.

  2. Create a cluster with the following Spark configuration.

    spark.neo4j.bolt.url bolt://<ip-of-neo4j-instance>:7687
    spark.neo4j.bolt.user <username>
    spark.neo4j.bolt.password <password>
    
  3. Import the libraries and test the connection.

    import org.neo4j.spark._
    import org.graphframes._
    
    val neo = Neo4j(sc)
    
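    // Simple Cypher query to check the connection.
    // id(n) returns a Long per node, which matches loadRdd[Long].
    val testConnection = neo.cypher("MATCH (n) RETURN id(n)").loadRdd[Long]
    // RDDs are lazy, so run an action to actually execute the query.
    testConnection.count()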
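
Once the connection works, the same `Neo4j(sc)` builder can load query results as row RDDs, DataFrames, or GraphFrames. The following sketch follows the usage patterns in the neo4j-spark-connector 2.x README and is meant to run in a notebook on a cluster configured as above. The `Person` label, the `KNOWS` relationship type, and the `id` property are assumptions about your graph, not something this guide defines, so substitute names that exist in your data.

```scala
import org.neo4j.spark._
import org.graphframes._

val neo = Neo4j(sc)

// Load rows with an inferred schema and count them.
// "Person" is a hypothetical node label.
val personIds = neo.cypher("MATCH (n:Person) RETURN id(n) AS id").loadRowRdd
println(personIds.count())

// Pass a query parameter instead of concatenating values into the Cypher string.
// Neo4j 3.x accepts the {name} parameter syntax used here.
val filtered = neo
  .cypher("MATCH (n:Person) WHERE n.id <= {maxId} RETURN n.id AS id")
  .param("maxId", 10)
  .loadRowRdd
println(filtered.count())

// Load the same kind of query as a DataFrame.
val df = neo.cypher("MATCH (n:Person) RETURN id(n) AS id").loadDataFrame
df.show(5)

// Load a (Person)-[KNOWS]->(Person) pattern as a GraphFrame.
// Label, relationship type, and property names are hypothetical.
val graphFrame = neo
  .pattern(("Person", "id"), ("KNOWS", null), ("Person", "id"))
  .partitions(3)
  .rows(1000)
  .loadGraphFrame
println(graphFrame.vertices.count())
println(graphFrame.edges.count())
```

All of these loaders are lazy until an action such as `count()` or `show()` runs, so a query error or connection problem surfaces at that point rather than when the loader is defined.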
    

Neo4j notebook

Get notebook