Create or modify a table using file upload
The Create or modify a table using file upload page allows you to upload CSV, TSV, JSON, Avro, Parquet, or text files to create or overwrite a managed Delta Lake table.
You can create managed Delta tables in Unity Catalog or in the Hive metastore.
Note
In addition, you can use the add data UI or COPY INTO to load files from cloud storage.
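For comparison, here is a minimal COPY INTO sketch that loads Parquet files from cloud storage into an existing Delta table. The catalog, schema, table, and storage path names are hypothetical; substitute your own.

```sql
-- Hypothetical names throughout; COPY INTO requires the target table to exist.
COPY INTO main.default.my_table
FROM 'abfss://container@storageaccount.dfs.core.windows.net/path/to/files'
FILEFORMAT = PARQUET;
```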
Important
- You must have access to a running compute resource and permissions to create tables in a target schema.
- Workspace admins can disable the Create or modify a table using file upload page.
You can use the UI to create a Delta table by importing small CSV, TSV, JSON, Avro, Parquet, or text files from your local machine.
- The Create or modify a table using file upload page supports uploading up to 10 files at a time.
- The total size of uploaded files must be under 2 gigabytes.
- Each file must be a CSV, TSV, JSON, Avro, Parquet, or text file and have the extension ".csv", ".tsv" (or ".tab"), ".json", ".avro", ".parquet", or ".txt".
- Compressed files, such as `zip` and `tar` files, are not supported.
Upload the file
- Click New > Add data.
- Click Create or modify a table.
- Click the file browser button or drag and drop files directly on the drop zone.
Note
Imported files are uploaded to a secure internal location within your account, which is garbage-collected daily.
Preview, configure, and create a table
You can upload data to the staging area without connecting to compute resources, but you must select an active compute resource to preview and configure your table.
You can preview 50 rows of your data when you configure the options for the uploaded table. Click the grid or list buttons under the file name to switch the presentation of your data.
Azure Databricks stores data files for managed tables in the locations configured for the containing schema. You need proper permissions to create a table in a schema.
Select the desired schema in which to create a table by doing the following:
- (For Unity Catalog-enabled workspaces only) Select a catalog or the legacy `hive_metastore`.
- Select a schema.
- (Optional) Edit the table name.
Note
You can use the dropdown to select Overwrite existing table or Create new table. Operations that attempt to create new tables with name conflicts display an error message.
You can configure options or columns before you create the table.
To create the table, click Create at the bottom of the page.
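The two dropdown choices correspond, roughly, to standard SQL table-creation semantics. A minimal sketch with hypothetical names:

```sql
-- "Create new table" behaves like CREATE TABLE: it errors on a name conflict.
CREATE TABLE main.default.my_upload (id BIGINT, name STRING);

-- "Overwrite existing table" behaves like CREATE OR REPLACE TABLE:
-- it replaces the existing table's data and schema.
CREATE OR REPLACE TABLE main.default.my_upload (id BIGINT, name STRING);
```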
Format options
Format options depend on the file format you upload. Common format options appear in the header bar, while less commonly used options are available in the Advanced attributes dialog. For how these toggles roughly map to SQL reader options, see the sketch after this section.
- For CSV, the following options are available:
- First row contains the header (enabled by default): This option specifies whether the CSV/TSV file contains a header.
- Column delimiter: The separator character between columns. Only a single character is allowed, and backslash is not supported. This defaults to comma for CSV files.
- Automatically detect column types (enabled by default): Automatically detect column types from file content. You can edit types in the preview table. If this is set to false, all column types are inferred as `STRING`.
- Rows span multiple lines (disabled by default): Whether a column's value can span multiple lines in the file.
- Merge the schema across multiple files: Whether to infer the schema across multiple files and to merge the schema of each file. If disabled, the schema from one file is used.
- For JSON, the following options are available:
- Automatically detect column types (enabled by default): Automatically detect column types from file content. You can edit types in the preview table. If this is set to false, all column types are inferred as `STRING`.
- Rows span multiple lines (enabled by default): Whether a column's value can span multiple lines in the file.
- Allow comments (enabled by default): Whether comments are allowed in the file.
- Allow single quotes (enabled by default): Whether single quotes are allowed in the file.
- Infer timestamp (enabled by default): Whether to try to infer timestamp strings as `TimestampType`.
The data preview updates automatically when you edit format options.
Note
When you upload multiple files, the following rules apply:
- Header settings apply to all files. Make sure headers are consistently absent or present in all uploaded files to avoid data loss.
- Uploaded files combine by appending all data as rows in the target table. Joining or merging records during file upload is not supported.
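These UI toggles correspond, approximately, to the CSV reader options that Databricks SQL commands such as COPY INTO accept. A sketch with hypothetical table and path names:

```sql
-- Hypothetical names; each option mirrors one of the UI toggles above.
COPY INTO main.default.my_csv_table
FROM 'abfss://container@storageaccount.dfs.core.windows.net/uploads/'
FILEFORMAT = CSV
FORMAT_OPTIONS (
  'header' = 'true',       -- First row contains the header
  'delimiter' = ',',       -- Column delimiter
  'inferSchema' = 'true',  -- Automatically detect column types
  'multiLine' = 'true',    -- Rows span multiple lines
  'mergeSchema' = 'true'   -- Merge the schema across multiple files
);
```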
Column names and types
You can edit column names and types.
To edit types, click the icon with the type.
Note
You can't edit nested types for `STRUCT` or `ARRAY`.
To edit the column name, click the input box at the top of the column.
Column names do not support commas, backslashes, or Unicode characters (such as emojis).
Column data types are inferred by default for CSV and JSON files. You can interpret all columns as `STRING` type by disabling Advanced attributes > Automatically detect column types.
Note
- Schema inference does a best-effort detection of column types. Changing column types can lead to some values being cast to `NULL` if the value cannot be cast correctly to the target data type. Casting `BIGINT` to `DATE` or `TIMESTAMP` columns is not supported. Databricks recommends that you create the table first and then transform these columns using SQL functions afterwards, as sketched after this note.
- To support table column names with special characters, the Create or modify a table using file upload page leverages Column Mapping.
- To add comments to columns, create the table and navigate to Catalog Explorer where you can add comments.
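Following that recommendation, a minimal sketch of the post-creation transformation. The table and column names are hypothetical; it assumes a year was loaded as `BIGINT` (for example, 2024):

```sql
-- Hypothetical table/column; convert a BIGINT year to DATE after upload
-- instead of casting during table creation.
SELECT
  to_date(CAST(year_raw AS STRING), 'yyyy') AS year_date
FROM main.default.my_upload;
```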
Supported data types
The Create or modify a table using file upload page supports the following data types. For more information about individual data types, see SQL data types.
| Data Type | Description |
|---|---|
| `BIGINT` | 8-byte signed integer numbers. |
| `BOOLEAN` | Boolean (`true`, `false`) values. |
| `DATE` | Values comprising values of fields year, month, and day, without a time zone. |
| `DOUBLE` | 8-byte double-precision floating point numbers. |
| `STRING` | Character string values. |
| `TIMESTAMP` | Values comprising values of fields year, month, day, hour, minute, and second, with the session local time zone. |
| `STRUCT` | Values with the structure described by a sequence of fields. |
| `ARRAY` | Values comprising a sequence of elements with the type `elementType`. |
| `DECIMAL(P,S)` | Numbers with maximum precision `P` and fixed scale `S`. |
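For illustration, a DDL sketch (hypothetical table and column names) showing each supported type in a table definition:

```sql
-- Hypothetical example covering the supported data types.
CREATE TABLE main.default.type_examples (
  id        BIGINT,
  is_active BOOLEAN,
  event_day DATE,
  score     DOUBLE,
  label     STRING,
  event_ts  TIMESTAMP,
  address   STRUCT<street: STRING, city: STRING>,
  tags      ARRAY<STRING>,
  amount    DECIMAL(10, 2)
);
```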
Known issues
Casting `BIGINT` to non-castable types like `DATE`, such as dates in the format of 'yyyy', may trigger errors.