Databricks SQL Driver for Go

The Databricks SQL Driver for Go is a Go library that allows you to use Go code to run SQL commands on Azure Databricks compute resources. This article supplements the Databricks SQL Driver for Go README, API reference, and examples.

Requirements

Getting started with the Databricks SQL Driver for Go

  1. On your development machine with Go 1.20 or above already installed and an existing Go code project already created, create a go.mod file to track your Go code's dependencies by running the go mod init command, for example:

    go mod init sample
    
  2. Take a dependency on the Databricks SQL Driver for Go package by running the go mod edit -require command, replacing v1.5.2 with the latest version of the Databricks SQL Driver for Go package as listed in the Releases:

    go mod edit -require github.com/databricks/databricks-sql-go@v1.5.2
    

    Your go.mod file should now look like this:

    module sample
    
    go 1.20
    
    require github.com/databricks/databricks-sql-go v1.5.2
    
  3. In your project, create a Go code file that imports the Databricks SQL Driver for Go. The following example, in a file named main.go with the following contents, lists all the clusters in your Azure Databricks workspace:

    package main
    
    import (
      "database/sql"
      "os"
      _ "github.com/databricks/databricks-sql-go"
    )
    
    func main() {
      dsn := os.Getenv("DATABRICKS_DSN")
    
      if dsn == "" {
        panic("No connection string found. " +
         "Set the DATABRICKS_DSN environment variable, and try again.")
      }
    
      db, err := sql.Open("databricks", dsn)
      if err != nil {
        panic(err)
      }
      defer db.Close()
    
      if err := db.Ping(); err != nil {
        panic(err)
      }
    }
    
  4. Add any missing module dependencies by running the go mod tidy command:

    go mod tidy
    

    Note

    If you get the error go: warning: "all" matched no packages, you forgot to add a Go code file that imports the Databricks SQL Driver for Go.

  5. Make copies of all packages needed to support builds and tests of packages in your main module, by running the go mod vendor command:

    go mod vendor
    
  6. Modify your code as needed to set the DATABRICKS_DSN environment variable for Azure Databricks authentication. See also Connect with a DSN connection string.

  7. Run your Go code file, assuming a file named main.go, by running the go run command:

    go run main.go
    
  8. If no errors are returned, you have successfully authenticated the Databricks SQL Driver for Go with your Azure Databricks workspace and connected to your running Azure Databricks cluster or SQL warehouse in that workspace.

Connect with a DSN connection string

To access clusters and SQL warehouses, use sql.Open() to create a database handle through a data source name (DSN) connection string. This code example retrieves the DSN connection string from an environment variable named DATABRICKS_DSN:

package main

import (
  "database/sql"
  "os"
  _ "github.com/databricks/databricks-sql-go"
)

func main() {
  dsn := os.Getenv("DATABRICKS_DSN")

  if dsn == "" {
    panic("No connection string found. " +
          "Set the DATABRICKS_DSN environment variable, and try again.")
  }

  db, err := sql.Open("databricks", dsn)
  if err != nil {
    panic(err)
  }
  defer db.Close()

  if err := db.Ping(); err != nil {
    panic(err)
  }
}

To specify the DSN connection string in the correct format, see the DSN connection string examples in Authentication. For example, for Azure Databricks personal access token authentication, use the following syntax, where:

  • <personal-access-token> is your Azure Databricks personal access token from the requirements.
  • <server-hostname> is the Server Hostname value from the requirements.
  • <port-number> is the Port value from the requirements, which is typically 443.
  • <http-path> is the HTTP Path value from the requirements.
  • <paramX=valueX> is one or more Optional parameters listed later in this article.
token:<personal-access-token>@<server-hostname>:<port-number>/<http-path>?<param1=value1>&<param2=value2>

For example, for a cluster:

token:<36bit_Your_Token_PlaceHolder>@adb-1234567890123456.7.databricks.azure.cn:443/sql/protocolv1/o/1234567890123456/1234-567890-abcdefgh

For example, for a SQL warehouse:

token:<36bit_Your_Token_PlaceHolder>@adb-1234567890123456.7.databricks.azure.cn:443/sql/1.0/endpoints/a1b234c5678901d2

Note

As a security best practice, you should not hard-code this DSN connection string into your Go code. Instead, you should retrieve this DSN connection string from a secure location. For example, the code example earlier in this article used an environment variable.

Optional parameters

  • Supported optional connection parameters can be specified in <param=value>. Some of the more frequently used ones include:
    • catalog: Sets the initial catalog name in the session.
    • schema: Sets the initial schema name in the session.
    • maxRows: Sets up the maximum number of rows fetched per request. The default is 10000.
    • timeout: Adds the timeout (in seconds) for the server query execution. The default is no timeout.
    • userAgentEntry: Used to identify partners. For more information, see your partner's documentation.
  • Supported optional session parameters can be specified in param=value. Some of the more frequently used ones include:
    • ansi_mode: A Boolean string. true for session statements to adhere to rules specified by the ANSI SQL specification. The system default is false.
    • timezone: A string, for example China/Beijing. Sets the timezone of the session. The system default is UTC.

For example, for a SQL warehouse:

token:<36bit_Your_Token_PlaceHolder>@adb-1234567890123456.7.databricks.azure.cn:443/sql/1.0/endpoints/a1b234c5678901d2?catalog=hive_metastore&schema=example&maxRows=100&timeout=60&timezone=China/Beijing&ansi_mode=true

Connect with the NewConnector function

Alternatively, use sql.OpenDB() to create a database handle through a new connector object that is created with dbsql.NewConnector() (connecting to Azure Databricks clusters and SQL warehouses with a new connector object requires v1.0.0 or higher of the Databricks SQL Driver for Go). For example:

package main

import (
  "database/sql"
  "os"
  dbsql "github.com/databricks/databricks-sql-go"
)

func main() {
  connector, err := dbsql.NewConnector(
    dbsql.WithAccessToken(os.Getenv("DATABRICKS_ACCESS_TOKEN")),
    dbsql.WithServerHostname(os.Getenv("DATABRICKS_HOST")),
    dbsql.WithPort(443),
    dbsql.WithHTTPPath(os.Getenv("DATABRICKS_HTTP_PATH")),
  )
  if err != nil {
    panic(err)
  }

  db := sql.OpenDB(connector)
  defer db.Close()

  if err := db.Ping(); err != nil {
    panic(err)
  }
}

To specify the correct set of NewConnector settings, see the examples in Authentication.

Note

As a security best practice, you should not hard-code your NewConnector settings into your Go code. Instead, you should retrieve these values from a secure location. For example, the preceding code uses environment variables.

Some of the more frequently used functional options include:

  • WithAccessToken(<access-token>): Your Azure Databricks personal access token from the requirements. Required string.
  • WithServerHostname(<server-hostname>): The Server Hostname value from the requirements. Required string.
  • WithPort(<port>): The server's port number, typically 443. Required int.
  • WithHTTPPath(<http-path>): The HTTP Path value from the requirements. Required string.
  • WithInitialNamespace(<catalog>, <schema>):The catalog and schema name in the session. Optional string, string.
  • WithMaxRows(<max-rows>): The maximum number of rows fetched per request. The default is 10000. Optional int.
  • WithSessionParams(<params-map>): The session parameters including "timezone" and "ansi_mode". Optional map[string]string.
  • WithTimeout(<timeout>). The timeout (in time.Duration) for the server query execution. The default is no timeout. Optional.
  • WithUserAgentEntry(<isv-name-plus-product-name>). Used to identify partners. For more information, see your partner's documentation. Optional string.

For example:

connector, err := dbsql.NewConnector(
  dbsql.WithAccessToken(os.Getenv("DATABRICKS_ACCESS_TOKEN")),
  dbsql.WithServerHostname(os.Getenv("DATABRICKS_HOST")),
  dbsql.WithPort(443),
  dbsql.WithHTTPPath(os.Getenv("DATABRICKS_HTTP_PATH")),
  dbsql.WithInitialNamespace("samples", "nyctaxi"),
  dbsql.WithMaxRows(100),
  dbsql.SessionParams(map[string]string{"timezone": "America/Sao_Paulo", "ansi_mode": "true"}),
  dbsql.WithTimeout(time.Minute),
  dbsql.WithUserAgentEntry("example-user"),
)

Authentication

The Databricks SQL Driver for Go supports the following Azure Databricks authentication types:

The Databricks SQL Driver for Go does not yet support the following Azure Databricks authentication types:

Databricks personal access token authentication

To use the Databricks SQL Driver for Go with Azure Databricks personal access token authentication, you must first create an Azure Databricks personal access token, as follows:

  1. In your Azure Databricks workspace, click your Azure Databricks username in the top bar, and then select Settings from the drop down.
  2. Click Developer.
  3. Next to Access tokens, click Manage.
  4. Click Generate new token.
  5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
  6. Click Generate.
  7. Copy the displayed token to a secure location, and then click Done.

Note

Be sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (Revoke) icon next to the token on the Access tokens page.

If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following topics:

To authenticate the Databricks SQL Driver for Go with a DSN connection string and the code example in Connect with a DSN connection string, use the following DSN connection string syntax, where:

  • <personal-access-token> is your Azure Databricks personal access token from the requirements.
  • <server-hostname> is the Server Hostname value from the requirements.
  • <port-number> is the Port value from the requirements, which is typically 443.
  • <http-path> is the HTTP Path value from the requirements.

You can also append one or more Optional parameters listed previously in this article.

token:<personal-access-token>@<server-hostname>:<port-number>/<http-path>

To authenticate the Databricks SQL Driver for Go with the NewConnector function, use the following code snippet and the code example in Connect with the NewConnector function, which assumes that you have set the following environment variables:

  • DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
  • DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.
  • DATABRICKS_TOKEN, set to the Azure Databricks personal access token.

To set environment variables, see your operating system's documentation.

connector, err := dbsql.NewConnector(
  dbsql.WithServerHostname(os.Getenv("DATABRICKS_SERVER_HOSTNAME")),
  dbsql.WithHTTPPath(os.Getenv("DATABRICKS_HTTP_PATH")),
  dbsql.WithPort(443),
  dbsql.WithAccessToken(os.Getenv("DATABRICKS_TOKEN")),
)

Query data

The following code example demonstrates how to call the Databricks SQL Driver for Go to run a basic SQL query on an Azure Databricks compute resource. This command returns the first two rows from the trips table in the samples catalog's nyctaxi schema.

This code example retrieves the DSN connection string from an environment variable named DATABRICKS_DSN.

package main

import (
  "database/sql"
  "fmt"
  "os"
  "time"

  _ "github.com/databricks/databricks-sql-go"
)

func main() {
  dsn := os.Getenv("DATABRICKS_DSN")

  if dsn == "" {
    panic("No connection string found." +
          "Set the DATABRICKS_DSN environment variable, and try again.")
  }

  db, err := sql.Open("databricks", dsn)
  if err != nil {
    panic(err)
  }

  defer db.Close()

  var (
    tpep_pickup_datetime  time.Time
    tpep_dropoff_datetime time.Time
    trip_distance         float64
    fare_amount           float64
    pickup_zip            int
    dropoff_zip           int
  )

  rows, err := db.Query("SELECT * FROM samples.nyctaxi.trips LIMIT 2")
  if err != nil {
    panic(err)
  }

  defer rows.Close()

  fmt.Print("tpep_pickup_datetime,",
    "tpep_dropoff_datetime,",
    "trip_distance,",
    "fare_amount,",
    "pickup_zip,",
    "dropoff_zip\n")

  for rows.Next() {
    err := rows.Scan(&tpep_pickup_datetime,
      &tpep_dropoff_datetime,
      &trip_distance,
      &fare_amount,
      &pickup_zip,
      &dropoff_zip)
    if err != nil {
      panic(err)
    }

    fmt.Print(tpep_pickup_datetime, ",",
      tpep_dropoff_datetime, ",",
      trip_distance, ",",
      fare_amount, ",",
      pickup_zip, ",",
      dropoff_zip, "\n")
  }

  err = rows.Err()
  if err != nil {
    panic(err)
  }
}

Manage files in Unity Catalog volumes

The Databricks SQL Driver enables you to write local files to Unity Catalog volumes, download files from volumes, and delete files from volumes, as shown in the following example:

package main

import (
  "context"
  "database/sql"
  "os"

  _ "github.com/databricks/databricks-sql-go"
  "github.com/databricks/databricks-sql-go/driverctx"
)

func main() {
  dsn := os.Getenv("DATABRICKS_DSN")

  if dsn == "" {
    panic("No connection string found." +
      "Set the DATABRICKS_DSN environment variable, and try again.")
  }

  db, err := sql.Open("databricks", dsn)
  if err != nil {
    panic(err)
  }
  defer db.Close()

  // For writing local files to volumes and downloading files from volumes,
  // you must first specify the path to the local folder that contains the
  // files to be written or downloaded.
  // For multiple folders, add their paths to the following string array.
  // For deleting files in volumes, this string array is ignored but must
  // still be provided, so in that case its value can be set for example
  // to an empty string.
  ctx := driverctx.NewContextWithStagingInfo(
    context.Background(),
    []string{"/tmp/"},
  )

  // Write a local file to the path in the specified volume.
  // Specify OVERWRITE to overwrite any existing file in that path.
  db.ExecContext(ctx, "PUT '/tmp/my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE")

  // Download a file from the path in the specified volume.
  db.ExecContext(ctx, "GET '/Volumes/main/default/my-volume/my-data.csv' TO '/tmp/my-downloaded-data.csv'")

  // Delete a file from the path in the specified volume.
  db.ExecContext(ctx, "REMOVE '/Volumes/main/default/my-volume/my-data.csv'")

  db.Close()
}

Logging

Use github.com/databricks/databricks-sql-go/logger to log messages that the Databricks SQL Driver for Go emits. The following code example uses sql.Open() to create a database handle through a DSN connection string. This code example retrieves the DSN connection string from an environment variable named DATABRICKS_DSN. All log messages that are emitted at the debug level and below are written to the results.log file.

package main

import (
  "database/sql"
  "io"
  "log"
  "os"

  _ "github.com/databricks/databricks-sql-go"
  dbsqllog "github.com/databricks/databricks-sql-go/logger"
)

func main() {
  dsn := os.Getenv("DATABRICKS_DSN")

  // Use the specified file for logging messages to.
  file, err := os.Create("results.log")
  if err != nil {
    log.Fatal(err)
  }
  defer file.Close()

  writer := io.Writer(file)

  // Log messages at the debug level and below.
  if err := dbsqllog.SetLogLevel("debug"); err != nil {
    log.Fatal(err)
  }

  // Log messages to the file.
  dbsqllog.SetLogOutput(writer)

  if dsn == "" {
    panic("Error: Cannot connect. No connection string found. " +
      "Set the DATABRICKS_DSN environment variable, and try again.")
  }

  db, err := sql.Open("databricks", dsn)
  if err != nil {
    panic(err)
  }
  defer db.Close()

  if err := db.Ping(); err != nil {
    panic(err)
  }
}

Testing

To test your code, use Go test frameworks such as the testing standard library. To test your code under simulated conditions without calling Azure Databricks REST API endpoints or changing the state of your Azure Databricks accounts or workspaces, use Go mocking libraries such as testfify.

For example, given the following file named helpers.go containing a GetDBWithDSNPAT function that returns an Azure Databricks workspace connection, a GetNYCTaxiTrips function that returns data from the trips table in the samples catalog's nyctaxi schema, and a PrintNYCTaxiTrips that prints the returned data:

package main

import (
  "database/sql"
  "fmt"
  "strconv"
  "time"
)

func GetDBWithDSNPAT(dsn string) (*sql.DB, error) {
  db, err := sql.Open("databricks", dsn)
  if err != nil {
    return nil, err
  }
  return db, nil
}

func GetNYCTaxiTrips(db *sql.DB, numRows int) (*sql.Rows, error) {
  rows, err := db.Query("SELECT * FROM samples.nyctaxi.trips LIMIT " + strconv.Itoa(numRows))
  if err != nil {
    return nil, err
  }
  return rows, nil
}

func PrintNYCTaxiTrips(rows *sql.Rows) {
  var (
    tpep_pickup_datetime  time.Time
    tpep_dropoff_datetime time.Time
    trip_distance         float64
    fare_amount           float64
    pickup_zip            int
    dropoff_zip           int
  )

  fmt.Print(
    "tpep_pickup_datetime,",
    "tpep_dropoff_datetime,",
    "trip_distance,",
    "fare_amount,",
    "pickup_zip,",
    "dropoff_zip\n",
  )

  for rows.Next() {
    err := rows.Scan(
      &tpep_pickup_datetime,
      &tpep_dropoff_datetime,
      &trip_distance,
      &fare_amount,
      &pickup_zip,
      &dropoff_zip,
    )
    if err != nil {
      panic(err)
    }

    fmt.Print(
      tpep_pickup_datetime, ",",
      tpep_dropoff_datetime, ",",
      trip_distance, ",",
      fare_amount, ",",
      pickup_zip, ",",
      dropoff_zip, "\n",
    )
  }

  err := rows.Err()
  if err != nil {
    panic(err)
  }
}

And given the following file named main.go that calls these functions:

package main

import (
  "os"
)

func main() {
  db, err := GetDBWithDSNPAT(os.Getenv("DATABRICKS_DSN"))
  if err != nil {
    panic(err)
  }

  rows, err := GetNYCTaxiTrips(db, 2)
  if err != nil {
    panic(err)
  }

  PrintNYCTaxiTrips(rows)
}

The following file named helpers_test.go tests whether the GetNYCTaxiTrips function returns the expected response. Rather than creating a real connection to the target workspace, this test mocks a sql.DB object. The test also mocks some data that conforms to the schema and values that are in the real data. The test returns the mocked data through the mocked connection and then checks whether one of the mocked data rows' values matches the expected value.

package main

import (
  "database/sql"
  "testing"

  "github.com/stretchr/testify/assert"
  "github.com/stretchr/testify/mock"
)

// Define an interface that contains a method with the same signature
// as the real GetNYCTaxiTrips function that you want to test.
type MockGetNYCTaxiTrips interface {
  GetNYCTaxiTrips(db *sql.DB, numRows int) (*sql.Rows, error)
}

// Define a struct that represents the receiver of the interface's method
// that you want to test.
type MockGetNYCTaxiTripsObj struct {
  mock.Mock
}

// Define the behavior of the interface's method that you want to test.
func (m *MockGetNYCTaxiTripsObj) GetNYCTaxiTrips(db *sql.DB, numRows int) (*sql.Rows, error) {
  args := m.Called(db, numRows)
  return args.Get(0).(*sql.Rows), args.Error(1)
}

func TestGetNYCTaxiTrips(t *testing.T) {
  // Instantiate the receiver.
  mockGetNYCTaxiTripsObj := new(MockGetNYCTaxiTripsObj)

  // Define how the mock function should be called and what it should return.
  // We're not concerned with whether the actual database is connected to--just
  // what is returned.
  mockGetNYCTaxiTripsObj.On("GetNYCTaxiTrips", mock.Anything, mock.AnythingOfType("int")).Return(&sql.Rows{}, nil)

  // Call the mock function that you want to test.
  rows, err := mockGetNYCTaxiTripsObj.GetNYCTaxiTrips(nil, 2)

  // Assert that the mock function was called as expected.
  mockGetNYCTaxiTripsObj.AssertExpectations(t)

  // Assert that the mock function returned what you expected.
  assert.NotNil(t, rows)
  assert.Nil(t, err)
}

Because the GetNYCTaxiTrips function contains a SELECT statement and therefore does not change the state of the trips table, mocking is not absolutely required in this example. However, mocking enables you to quickly run your tests without waiting for an actual connection to be made with the workspace. Also, mocking enables you to run simulated tests multiple times for functions that might change a table's state, such as INSERT INTO, UPDATE, and DELETE FROM.

Additional resources