Connect to Syncsort
Important
This feature is in Public Preview.
Syncsort helps you break down data silos by integrating legacy, mainframe, and IBM data with Azure Databricks, letting you pull data from these sources into Delta Lake.
Here are the steps for using Syncsort with Azure Databricks.
Step 1: Generate a Databricks personal access token
Syncsort authenticates with Azure Databricks using an Azure Databricks personal access token.
Note
As a security best practice, when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
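If you prefer to script token creation instead of using the workspace UI, the following Python sketch builds a request to the Databricks Token API (`POST /api/2.0/token/create`) without sending it. It assumes the caller already holds a credential for the identity that should own the new token (for a service principal, for example, a Microsoft Entra ID access token issued to it). The workspace URL, comment, and 90-day lifetime are hypothetical values.

```python
import json
import urllib.request

# Hypothetical workspace URL; replace it with your own per-workspace URL.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"


def build_token_create_request(workspace_url: str, auth_token: str,
                               comment: str, lifetime_seconds: int) -> urllib.request.Request:
    """Build, but do not send, a POST to the Token API endpoint /api/2.0/token/create.

    auth_token authenticates the caller (assumed here to be the service principal);
    the new personal access token is created for that caller.
    """
    payload = {"comment": comment, "lifetime_seconds": lifetime_seconds}
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/token/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Request a 90-day token labeled for the Syncsort integration.
request = build_token_create_request(
    WORKSPACE_URL, "<caller-access-token>", "Syncsort integration", 90 * 24 * 60 * 60
)
# Sending is deliberately left out: urllib.request.urlopen(request) would return
# JSON whose "token_value" field holds the new personal access token.
```

Store the returned token value in a secret store rather than in Syncsort job definitions or source control.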
Step 2: Set up a cluster to support integration needs
Syncsort writes data to an Azure Data Lake Storage path, and the Azure Databricks integration cluster reads data from that location. Therefore, the integration cluster needs secure access to the Azure Data Lake Storage path.
Secure access to an Azure Data Lake Storage path
To secure access to data in Azure Data Lake Storage (ADLS), you can use either an Azure storage account access key (recommended) or a Microsoft Entra ID service principal.
Use an Azure storage account access key
You can configure a storage account access key on the integration cluster as part of its Spark configuration. Make sure the key belongs to the storage account that contains both the ADLS container and file system used for staging data and the ADLS container and file system where you want to write the Delta Lake tables. To configure the integration cluster to use the key, follow the steps in Connect to Azure Data Lake Storage Gen2 and Blob Storage.
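As a sketch of what the resulting cluster Spark configuration entry can look like, the following Python helper renders an account-key property that reads the key from a Databricks secret instead of pasting it in plain text. It assumes the `spark.hadoop.` prefix for Hadoop filesystem properties set at cluster level and the `{{secrets/<scope>/<name>}}` secret-reference syntax; the storage account, secret scope, and secret names are hypothetical, so check the linked article for the exact form your workspace expects.

```python
def account_key_spark_conf(storage_account: str, secret_scope: str, secret_name: str) -> dict:
    """Return a cluster Spark config entry giving ABFS access to one storage account
    through its access key, referenced from a Databricks secret."""
    host = f"{storage_account}.dfs.core.windows.net"
    # Doubled braces in the f-string emit the literal {{secrets/<scope>/<name>}} reference.
    secret_ref = f"{{{{secrets/{secret_scope}/{secret_name}}}}}"
    return {f"spark.hadoop.fs.azure.account.key.{host}": secret_ref}


def to_spark_config_text(conf: dict) -> str:
    """Render entries as one space-separated "key value" pair per line,
    the layout used in the cluster Spark config field."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical names: storage account "syncsortstaging", secret scope "syncsort",
# secret "adls-access-key".
print(to_spark_config_text(account_key_spark_conf("syncsortstaging", "syncsort", "adls-access-key")))
```

If the staging container and the Delta Lake container live in different storage accounts, you would add one such entry per account.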
Use a Microsoft Entra ID service principal
You can configure a service principal on the Azure Databricks integration cluster as part of the Spark configuration. Ensure that the service principal has access to the ADLS container used for staging data and the ADLS container where you want to write the Delta tables. To configure the integration cluster to use the service principal, follow the steps in Access ADLS Gen2 with service principal.
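For the service principal option, the ABFS connector uses OAuth 2.0 client credentials. The following Python sketch assembles the five per-account properties involved, again assuming the `spark.hadoop.` prefix at cluster level and a Databricks secret reference for the client secret. The storage account, application (client) ID, tenant ID, and secret names are hypothetical placeholders.

```python
def service_principal_spark_conf(storage_account: str, application_id: str, tenant_id: str,
                                 secret_scope: str, secret_name: str) -> dict:
    """Return cluster Spark config entries for OAuth 2.0 client-credentials access
    to one ADLS Gen2 storage account as a Microsoft Entra ID service principal."""
    host = f"{storage_account}.dfs.core.windows.net"
    prefix = "spark.hadoop.fs.azure.account"
    return {
        f"{prefix}.auth.type.{host}": "OAuth",
        f"{prefix}.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"{prefix}.oauth2.client.id.{host}": application_id,
        # The client secret is read from a Databricks secret, not stored in plain text.
        f"{prefix}.oauth2.client.secret.{host}": f"{{{{secrets/{secret_scope}/{secret_name}}}}}",
        f"{prefix}.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


conf = service_principal_spark_conf(
    "syncsortstaging", "<application-id>", "<tenant-id>", "syncsort", "sp-client-secret"
)
for key, value in conf.items():
    print(key, value)
```

Because the properties are keyed by storage account, granting the service principal access to containers in two accounts means generating two sets of entries.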
Specify the cluster configuration
Set Cluster Mode to Standard.
Set Databricks Runtime Version to a supported Databricks Runtime version.
Enable optimized writes and auto compaction by adding the following properties to your Spark configuration:
spark.databricks.delta.optimizeWrite.enabled true
spark.databricks.delta.autoCompact.enabled true
Configure your cluster depending on your integration and scaling needs.
For cluster configuration details, see Compute configuration reference.
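The cluster settings in this step can also be expressed as a request body for the Clusters API `clusters/create` endpoint. The following Python sketch assembles one, merging the two Delta Lake write properties with any ADLS access entries. The cluster name, runtime version key, node type, and worker count are hypothetical placeholders, and no cluster-mode field is set on the assumption that the default corresponds to Standard mode; confirm both against the Clusters API reference before use.

```python
import json
from typing import Optional

# The two Delta Lake write properties recommended for the integration cluster.
DELTA_WRITE_CONF = {
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    "spark.databricks.delta.autoCompact.enabled": "true",
}


def build_cluster_spec(cluster_name: str, spark_version: str, node_type_id: str,
                       num_workers: int, storage_conf: Optional[dict] = None) -> dict:
    """Return a clusters/create request body with the Delta write settings applied."""
    spark_conf = dict(DELTA_WRITE_CONF)  # copy so the module-level defaults stay unchanged
    # Merge in ADLS access settings (account key or service principal) if supplied.
    spark_conf.update(storage_conf or {})
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "spark_conf": spark_conf,
    }


spec = build_cluster_spec("syncsort-integration", "<runtime-version-key>", "Standard_DS3_v2", 2)
print(json.dumps(spec, indent=2))
```

Sizing (`node_type_id`, `num_workers`) is where your integration and scaling needs come in; the Delta write settings stay the same regardless.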
Step 3: Obtain JDBC and ODBC connection details to connect to a cluster
To connect Syncsort to an Azure Databricks cluster, you need the following JDBC/ODBC connection properties:
- JDBC URL
- HTTP Path
For the steps to obtain the JDBC URL and HTTP path, see Get connection details for an Azure Databricks compute resource.
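Once you have the server hostname and HTTP path, you can sanity-check how they combine into a JDBC URL. This Python sketch assumes the Databricks JDBC driver URL format (`jdbc:databricks://...;httpPath=...;AuthMech=3;UID=token;PWD=...`) with personal access token authentication, and the cluster HTTP path shape `sql/protocolv1/o/<workspace-id>/<cluster-id>`. The hostname, IDs, and token are hypothetical. Older Simba Spark driver versions use a `jdbc:spark://` prefix and additional parameters, so match the URL to the driver that Syncsort uses.

```python
def build_jdbc_url(server_hostname: str, http_path: str, token: str, port: int = 443) -> str:
    """Assemble a Databricks JDBC URL that authenticates with a personal access token
    (AuthMech=3, with the literal user name "token" and the token as the password)."""
    return (
        f"jdbc:databricks://{server_hostname}:{port};"
        f"httpPath={http_path};AuthMech=3;UID=token;PWD={token}"
    )


def split_cluster_http_path(http_path: str) -> tuple:
    """Split a cluster HTTP path of the assumed form
    sql/protocolv1/o/<workspace-id>/<cluster-id> into its two IDs."""
    parts = http_path.strip("/").split("/")
    if len(parts) != 5 or parts[:3] != ["sql", "protocolv1", "o"]:
        raise ValueError(f"unexpected cluster HTTP path: {http_path!r}")
    return parts[3], parts[4]


# Hypothetical connection details.
HOSTNAME = "adb-1234567890123456.7.azuredatabricks.net"
HTTP_PATH = "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh"
workspace_id, cluster_id = split_cluster_http_path(HTTP_PATH)
url = build_jdbc_url(HOSTNAME, HTTP_PATH, "<personal-access-token>")
```

Because the URL embeds the token, avoid logging it; pass the token through Syncsort's credential settings where possible.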
Step 4: Configure Syncsort with Azure Databricks
Go to the Databricks and Connect for Big Data login page and follow the instructions.