Access Azure Cosmos DB Cassandra API data from Azure Databricks

APPLIES TO: Cassandra API

This article details how to work with Azure Cosmos DB Cassandra API from Spark on Azure Databricks.

Prerequisites

Add the required dependencies

  • Cassandra Spark connector: To integrate Azure Cosmos DB Cassandra API with Spark, the Cassandra connector must be attached to the Azure Databricks cluster. To attach it:

    • Review the Databricks runtime version, which determines the Spark version. Then find the Maven coordinates of a Cassandra Spark connector compatible with that version, and attach the library to the cluster. See the "Upload a Maven package or Spark package" article for how to attach a connector library. For example, for Databricks Runtime version 4.3 (Spark 2.3.1, Scala 2.11), the Maven coordinate is spark-cassandra-connector_2.11-2.3.1.
  • Azure Cosmos DB Cassandra API-specific library: A custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB Cassandra API. Add the com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.0.0 Maven coordinates to attach the library to the cluster.
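With both libraries attached, the cluster's Spark configuration can point the connector at your Cosmos DB account and at the helper library's connection factory. The following Scala sketch shows a typical setup in a Databricks notebook; the account name, password placeholder, and throughput-related values are illustrative assumptions you must replace with your own, and it only runs on a cluster with the two libraries above attached:

```scala
// Sketch: configure the Cassandra Spark connector for a Cosmos DB
// Cassandra API endpoint (port 10350, SSL required).
// YOUR_ACCOUNT_NAME / YOUR_ACCOUNT_KEY are placeholders.
spark.conf.set("spark.cassandra.connection.host",
  "YOUR_ACCOUNT_NAME.cassandra.cosmosdb.azure.com")
spark.conf.set("spark.cassandra.connection.port", "10350")
spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
spark.conf.set("spark.cassandra.auth.username", "YOUR_ACCOUNT_NAME")
spark.conf.set("spark.cassandra.auth.password", "YOUR_ACCOUNT_KEY")

// Route connections through the Cosmos DB-aware factory from the
// azure-cosmos-cassandra-spark-helper library, which adds a retry
// policy for rate-limited (429) requests.
spark.conf.set("spark.cassandra.connection.factory",
  "com.microsoft.azure.cosmosdb.cassandra.CosmosDbConnectionFactory")
```

The key setting is `spark.cassandra.connection.factory`: without it, the connector uses its default factory and rate-limited requests from Cosmos DB are not retried.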

Sample notebooks

A set of Azure Databricks sample notebooks is available in the GitHub repo for you to download. These samples show how to connect to Azure Cosmos DB Cassandra API from Spark and perform different CRUD operations on the data. You can also import all the notebooks into your Databricks cluster workspace and run them.

Accessing Azure Cosmos DB Cassandra API from Spark Scala programs

Spark programs to be run as automated processes on Azure Databricks are submitted to the cluster by using spark-submit and scheduled to run through Azure Databricks jobs.

The following links help you get started building Spark Scala programs to interact with Azure Cosmos DB Cassandra API.
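Once the cluster is configured as described above, reading and writing go through the connector's DataFrame source. The Scala sketch below assumes a hypothetical keyspace `books_ks` with a table `books` that already exist in your account; it will only run on a cluster with the connector attached and the configuration in place:

```scala
import org.apache.spark.sql.cassandra._

// Read a Cassandra API table into a DataFrame.
// "books_ks" / "books" are example names; substitute your own.
val readDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .load()

readDF.show()

// Append rows back to the same table. mode("append") inserts/upserts;
// the DataFrame schema must match the table's columns.
readDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .mode("append")
  .save()
```

The same `format`/`options` pattern is used throughout the sample notebooks for the other CRUD operations.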

Next steps

Get started with creating a Cassandra API account, database, and a table by using a Java application.