Databricks Runtime 8.0 migration guide
Delta is the default format when a format is not specified
Databricks Runtime 8.0 changes the default format to `delta` to make it simpler to create a Delta table. When you create a table using SQL commands or the `{Dataset|DataFrame}.{read|readStream|write|writeTo|writeStream}` APIs and you do not specify a format, the default format is `delta`. The behavior is the same as if you had specified the format as `delta` in your code.
With Delta Lake, you get better performance than with Parquet, and better data reliability through rich schema validation, quality constraints, and transactional guarantees. Delta Lake also lets you simplify your data pipelines with unified structured streaming and batch processing on a single data source.
While Databricks recommends using Delta Lake to store your data, you may have legacy workflows that are not yet ready to migrate to Delta Lake. To maintain the previous behavior, you have two options:
Option 1: Set configurations
You can set `spark.sql.sources.default` to `parquet` and `spark.sql.legacy.createHiveTableByDefault` to `true` on your cluster. After setting these two configurations, you should see exactly the same behavior as in Databricks Runtime 7.x and below.
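As a sketch, the two settings can be applied in the cluster's Spark configuration:

```
spark.sql.sources.default parquet
spark.sql.legacy.createHiveTableByDefault true
```

Setting them in the cluster configuration restores the Databricks Runtime 7.x defaults for every session on that cluster; alternatively, SQL `SET` commands apply the same values to the current session only.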
Option 2: Modify code
If you don’t want to set extra configurations on your cluster, you can modify your code to maintain the previous behavior. The following examples show how to explicitly specify the format to override the default.
`CREATE TABLE` without the `AS SELECT` clause
In Databricks Runtime 7.x and below, when you create a table without the `AS SELECT` clause in a SQL query without specifying the format, such as `CREATE TABLE student(id INT, name STRING, age INT)`, it creates a Hive table as if you wrote `USING HIVE` in the query. In Databricks Runtime 8.x, it behaves as if you wrote `USING DELTA`. If you want to create a Hive table, you must add `USING HIVE` in Databricks Runtime 8.0.
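The difference can be sketched as:

```sql
-- Databricks Runtime 8.0: creates a Delta table (implicit USING DELTA)
CREATE TABLE student(id INT, name STRING, age INT);

-- To keep the Databricks Runtime 7.x behavior, name the format explicitly
CREATE TABLE student(id INT, name STRING, age INT) USING HIVE;
```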
`CREATE TABLE` with the `AS SELECT` clause (CTAS)
In Databricks Runtime 7.x and below, when you create a table with the `AS SELECT` clause in a SQL query without specifying the format, such as `CREATE TABLE student AS SELECT 1 AS id, 'Andy' AS name, 20 AS age`, it creates a Parquet table as if you wrote `USING PARQUET` in the query. In Databricks Runtime 8.x, it behaves as if you wrote `USING DELTA`. If you want to create a Parquet table, you must add `USING PARQUET` in Databricks Runtime 8.0.
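The difference can be sketched as:

```sql
-- Databricks Runtime 8.0: CTAS creates a Delta table (implicit USING DELTA)
CREATE TABLE student AS SELECT 1 AS id, 'Andy' AS name, 20 AS age;

-- To keep the Databricks Runtime 7.x behavior, name the format explicitly
CREATE TABLE student USING PARQUET AS SELECT 1 AS id, 'Andy' AS name, 20 AS age;
```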
`{Dataset|DataFrame}.{read|readStream}` APIs
In Databricks Runtime 7.x and below, when you read a path or a table using the `{Dataset|DataFrame}.{read|readStream}` APIs without specifying a format, and the underlying file format is not Delta, the path or table is read using the Parquet format, as if you wrote `format("parquet")`. In Databricks Runtime 8.0, because the default format is `delta`, it behaves as if you wrote `format("delta")`. This means that if you previously omitted the format to read a Parquet table in Databricks Runtime 7.x, you must add `format("parquet")` in Databricks Runtime 8.0. A Delta table can be both read and written in Databricks Runtime 8.0 and above without specifying any format, because both the read and write APIs use the `delta` format by default.
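As a PySpark sketch (the path `/data/events` is a hypothetical example; `spark` is the session provided on a Databricks cluster):

```python
# Databricks Runtime 8.0: load() without a format reads the path as Delta
df = spark.read.load("/data/events")

# To read a Parquet path in Databricks Runtime 8.0, specify the format explicitly
df = spark.read.format("parquet").load("/data/events")
```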
`{Dataset|DataFrame}.{write|writeTo|writeStream}` APIs
In Databricks Runtime 7.x and below, when you write to a path or a table using the `{Dataset|DataFrame}.{write|writeTo|writeStream}` APIs without specifying a format, and the path or table does not exist or is not a Delta table, the data is written using the Parquet format, as if you wrote `format("parquet")`. In Databricks Runtime 8.0, it behaves as if you wrote `format("delta")`. This means that if you previously omitted the format to write to a Parquet table, you must add `format("parquet")` in Databricks Runtime 8.0. A Delta table can be both read and written in Databricks Runtime 8.0 and above without specifying any format, because both the read and write APIs use the `delta` format by default.
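As a PySpark sketch (the path `/data/events` and DataFrame `df` are hypothetical; `spark` is the session provided on a Databricks cluster):

```python
# Databricks Runtime 8.0: save() without a format writes the path as Delta
df.write.save("/data/events")

# To keep writing Parquet in Databricks Runtime 8.0, specify the format explicitly
df.write.format("parquet").mode("append").save("/data/events")
```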
Frequently asked questions (FAQs)
How do I know the table format after I create it?
You can use `DESCRIBE DETAIL` to check the table format.
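For example, assuming a table named `student`:

```sql
DESCRIBE DETAIL student;
-- The format column of the result shows the table's provider, e.g. delta or parquet
```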
Is it possible to convert a Delta table to a Parquet table or a Hive table if I created it without specifying the format unintentionally?
The easiest way is to create a new table with the format you would like and write the existing data in the Delta table to the new table.
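One way to do this is a CTAS statement that names the target format (a sketch; `student` and `student_parquet` are hypothetical table names):

```sql
CREATE TABLE student_parquet USING PARQUET AS SELECT * FROM student;
```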
How can I get additional help if I encounter an unexpected issue?
Contact Databricks Support who can review your case and help with a migration strategy.
Apache Spark 3.1.1 Migration Guide
See the Apache Spark 3.1.1 Migration Guide for changes inherited by Databricks Runtime 8.0 from Apache Spark.