Databricks Runtime 8.0 migration guide
Delta is the default format when a format is not specified
Databricks Runtime 8.0 changes the default format to `delta` to make it simpler to create a Delta table. When you create a table using SQL commands or the `{Dataset|DataFrame}.{read|readStream|write|writeTo|writeStream}` APIs and you do not specify a format, the default format is `delta`. The behavior is the same as if you had specified the format as `delta` in your code.
With Delta Lake, you get better performance than with Parquet, and better data reliability through rich schema validation, quality constraints, and transactional guarantees. Delta Lake also lets you simplify your data pipelines with unified structured streaming and batch processing on a single data source.
While Databricks recommends using Delta Lake to store your data, you may have legacy workflows that are not yet ready to migrate to Delta Lake. To maintain the previous behavior, you have two options:
Option 1: Set configurations
You can set `spark.sql.sources.default` to `parquet` and `spark.sql.legacy.createHiveTableByDefault` to `true` on your cluster. After setting these two configurations, you should see exactly the same behavior as in Databricks Runtime 7.x and below.
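As a sketch, the two settings can be applied in the cluster's Spark configuration:

```
spark.sql.sources.default parquet
spark.sql.legacy.createHiveTableByDefault true
```

Setting them in the cluster configuration restores the Databricks Runtime 7.x defaults for every session on that cluster; alternatively, SQL `SET` commands apply the same values to the current session only.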
Option 2: Modify code
If you don’t want to set extra configurations on your cluster, you can modify your code to maintain the previous behavior. The following examples show how to explicitly specify the format to override the default.
`CREATE TABLE` without the `AS SELECT` clause
In Databricks Runtime 7.x and below, when you create a table without the `AS SELECT` clause in a SQL query without specifying the format, such as `CREATE TABLE student(id INT, name STRING, age INT)`, it creates a Hive table as if you wrote `USING HIVE` in the query. In Databricks Runtime 8.x, it behaves as if you wrote `USING DELTA`. If you want to create a Hive table, you must add `USING HIVE` in Databricks Runtime 8.0.
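The difference can be sketched as:

```sql
-- Databricks Runtime 8.0: creates a Delta table (implicit USING DELTA)
CREATE TABLE student(id INT, name STRING, age INT);

-- To keep the Databricks Runtime 7.x behavior, name the format explicitly
CREATE TABLE student(id INT, name STRING, age INT) USING HIVE;
```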
`CREATE TABLE` with the `AS SELECT` clause (CTAS)
In Databricks Runtime 7.x and below, when you create a table with the `AS SELECT` clause in a SQL query without specifying the format, such as `CREATE TABLE student AS SELECT 1 AS id, 'Andy' AS name, 20 AS age`, it creates a Parquet table as if you wrote `USING PARQUET` in the query. In Databricks Runtime 8.x, it behaves as if you wrote `USING DELTA`. If you want to create a Parquet table, you must add `USING PARQUET` in Databricks Runtime 8.0.
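The difference can be sketched as:

```sql
-- Databricks Runtime 8.0: CTAS creates a Delta table (implicit USING DELTA)
CREATE TABLE student AS SELECT 1 AS id, 'Andy' AS name, 20 AS age;

-- To keep the Databricks Runtime 7.x behavior, name the format explicitly
CREATE TABLE student USING PARQUET AS SELECT 1 AS id, 'Andy' AS name, 20 AS age;
```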
`{Dataset|DataFrame}.{read|readStream}` APIs
In Databricks Runtime 7.x and below, when you read a path or a table using the `{Dataset|DataFrame}.{read|readStream}` APIs without specifying a format, and the underlying file format is not Delta, the path or table is read using the Parquet format, as if you wrote `format("parquet")`. In Databricks Runtime 8.0, because the default format is `delta`, it behaves as if you wrote `format("delta")`. This means that if you previously omitted the format to read a Parquet table in Databricks Runtime 7.x, you must add `format("parquet")` in Databricks Runtime 8.0. A Delta table can be both read and written in Databricks Runtime 8.0 and above without specifying any format, because both the read and write APIs use the `delta` format by default.
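As a PySpark sketch (the path `/data/events` is a hypothetical example; `spark` is the session provided on a Databricks cluster):

```python
# Databricks Runtime 8.0: load() without a format reads the path as Delta
df = spark.read.load("/data/events")

# To read a Parquet path in Databricks Runtime 8.0, specify the format explicitly
df = spark.read.format("parquet").load("/data/events")
```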
`{Dataset|DataFrame}.{write|writeTo|writeStream}` APIs
In Databricks Runtime 7.x and below, when you write to a path or a table using the `{Dataset|DataFrame}.{write|writeTo|writeStream}` APIs without specifying a format, and the path or table does not exist or is not a Delta table, the data is written using the Parquet format, as if you wrote `format("parquet")`. In Databricks Runtime 8.0, it behaves as if you wrote `format("delta")`. This means that if you previously omitted the format to write to a Parquet table, you must add `format("parquet")` in Databricks Runtime 8.0. A Delta table can be both read and written in Databricks Runtime 8.0 and above without specifying any format, because both the read and write APIs use the `delta` format by default.
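As a PySpark sketch (the path `/data/events` and DataFrame `df` are hypothetical; `spark` is the session provided on a Databricks cluster):

```python
# Databricks Runtime 8.0: save() without a format writes the path as Delta
df.write.save("/data/events")

# To keep writing Parquet in Databricks Runtime 8.0, specify the format explicitly
df.write.format("parquet").mode("append").save("/data/events")
```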
Frequently asked questions (FAQs)
How do I know the table format after I create it?
You can use `DESCRIBE DETAIL` to check the table format.
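For example, assuming a table named `student`:

```sql
DESCRIBE DETAIL student;
-- The format column of the result shows the table's provider, e.g. delta or parquet
```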
Is it possible to convert a Delta table to a Parquet table or a Hive table if I created it without specifying the format unintentionally?
The easiest way is to create a new table with the format you would like and write the existing data in the Delta table to the new table.
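One way to do this is a CTAS statement that names the target format (a sketch; `student` and `student_parquet` are hypothetical table names):

```sql
CREATE TABLE student_parquet USING PARQUET AS SELECT * FROM student;
```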
How can I get additional help if I encounter an unexpected issue?
Contact Databricks Support who can review your case and help with a migration strategy.
Apache Spark 3.1.1 Migration Guide
See the Apache Spark 3.1.1 Migration Guide for changes inherited by Databricks Runtime 8.0 from Apache Spark.