Fivetran integration

Important

This feature is in Public Preview.

The Fivetran integration with Azure Databricks helps you easily centralize data from disparate data sources into Delta Lake.

Here are the steps for using Fivetran with Azure Databricks.

Step 1: Generate a Databricks personal access token

Fivetran authenticates with Azure Databricks using an Azure Databricks personal access token. To generate a personal access token, follow the instructions in Generate a personal access token.
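
If you prefer to script token creation instead of using the UI, the Databricks Token API can also issue a personal access token. The sketch below is illustrative only: the workspace URL and existing credential are placeholders, and it assumes you already have some way to authenticate to the REST API.

    # Minimal sketch: create a personal access token for Fivetran via the Token API.
    # The host and the existing credential below are placeholders.
    import requests

    DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
    EXISTING_TOKEN = "<existing-token-or-azure-ad-token>"

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/token/create",
        headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
        json={"comment": "fivetran-integration", "lifetime_seconds": 7776000},  # about 90 days
    )
    resp.raise_for_status()
    print(resp.json()["token_value"])  # displayed only once; store it securely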

Step 2: Set up a cluster to support integration needs

Fivetran writes data to an Azure Data Lake Storage path, and the Azure Databricks integration cluster reads data from that location. The integration cluster therefore requires secure access to the Azure Data Lake Storage path.

Secure access to an Azure Data Lake Storage path

To secure access to data in Azure Data Lake Storage (ADLS), you can use an Azure storage account access key (recommended) or an Azure service principal.

Use an Azure storage account access key

You can configure a storage account access key on the integration cluster as part of the Apache Spark configuration. Ensure that the key provides access to both the ADLS container and file system used for staging data and the ADLS container and file system where you want to write the Delta Lake tables. To configure the integration cluster to use the key, follow the steps in Access ADLS Gen2 with storage key.
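
As an illustration, key-based ADLS Gen2 access is typically configured with the standard ABFS account-key property; the storage account name and access key below are placeholders, and the linked article is authoritative for the exact steps.

    fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>

Rather than pasting the key in plain text, you can reference it from a Databricks secret scope using the {{secrets/<scope-name>/<key-name>}} syntax.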

Use an Azure service principal

You can configure a service principal on the Azure Databricks integration cluster as part of the Apache Spark configuration. Ensure that the service principal has access to the ADLS container used for staging data and the ADLS container where you want to write the Delta tables.
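
For illustration, the standard ABFS OAuth properties for a service principal are sketched below; the storage account name, application (client) ID, client secret, and directory (tenant) ID are placeholders, and the client secret is best referenced from a Databricks secret scope. Follow the Azure Databricks documentation on accessing ADLS Gen2 with a service principal for the authoritative steps.

    fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net OAuth
    fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net <application-id>
    fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net {{secrets/<scope-name>/<service-credential-key-name>}}
    fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token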

Specify the cluster configuration

  1. In the Cluster Mode drop-down, select Standard.

  2. In the Databricks Runtime Version drop-down, select Runtime: 6.3 or above.

  3. Turn on Auto Optimize by adding the following properties to your Spark configuration:

    spark.databricks.delta.optimizeWrite.enabled true
    spark.databricks.delta.autoCompact.enabled true
    
  4. Configure your cluster depending on your integration and scaling needs.

For cluster configuration details, see Configure clusters.

See Server hostname, port, HTTP path, and JDBC URL for the steps to obtain the JDBC URL and HTTP path.

Step 3: Obtain JDBC and ODBC connection details to connect to a cluster

To connect an Azure Databricks cluster to Fivetran, you need the following JDBC/ODBC connection properties (an example URL shape is sketched after the list):

  • JDBC URL
  • HTTP path
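
For reference, with the legacy Simba Spark JDBC driver that matches the Runtime versions above, the JDBC URL has roughly the following shape (newer Databricks JDBC drivers use a jdbc:databricks:// prefix instead); the server hostname, HTTP path, and personal access token are placeholders taken from your cluster's JDBC/ODBC tab:

    jdbc:spark://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>

For a cluster, the HTTP path typically looks like sql/protocolv1/o/<workspace-id>/<cluster-id>.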

Step 4: Configure Fivetran with Azure Databricks

Go to the Fivetran login page and follow the instructions.