Data types

Applies to: Databricks SQL, Databricks Runtime

For rules governing how conflicts between data types are resolved, see SQL data type rules.

Supported data types

Azure Databricks supports the following data types:

Data Type Description
BIGINT Represents 8-byte signed integer numbers.
BINARY Represents byte sequence values.
BOOLEAN Represents Boolean values.
DATE Represents values comprising the fields year, month, and day, without a time zone.
DECIMAL(p,s) Represents numbers with maximum precision p and fixed scale s.
DOUBLE Represents 8-byte double-precision floating point numbers.
FLOAT Represents 4-byte single-precision floating point numbers.
INT Represents 4-byte signed integer numbers.
INTERVAL intervalQualifier Represents intervals of time either on a scale of seconds or months.
VOID Represents the untyped NULL.
SMALLINT Represents 2-byte signed integer numbers.
STRING Represents character string values.
TIMESTAMP Represents values comprising the fields year, month, day, hour, minute, and second, with the session local time zone.
TIMESTAMP_NTZ Represents values comprising the fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account.
TINYINT Represents 1-byte signed integer numbers.
ARRAY <elementType> Represents values comprising a sequence of elements with the type of elementType.
MAP < keyType,valueType > Represents values comprising a set of key-value pairs.
STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > Represents values with the structure described by a sequence of fields.
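
To make the precision p and scale s of DECIMAL(p,s) concrete, here is a minimal sketch in plain Python using the standard decimal module. The helper name fits_decimal is hypothetical and is not part of any Databricks or Spark API. It assumes p is the total number of digits and s is the number of digits to the right of the decimal point, and it only counts digits; it makes no claim about how Azure Databricks rounds or rejects values that do not fit.

```python
from decimal import Decimal


def fits_decimal(value: str, p: int, s: int) -> bool:
    """Return True if value is exactly representable with precision p and scale s.

    Hypothetical helper for illustration only; not a Databricks or Spark API.
    """
    d = Decimal(value)
    if not d.is_finite():
        return False  # NaN and Infinity have no digit count
    if d.is_zero():
        return True  # zero fits any precision and scale
    # normalize() drops trailing zeros, so "1.50" is counted as 1.5
    _, digits, exponent = d.normalize().as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= s and integer_digits <= p - s


print(fits_decimal("123.45", 5, 2))  # True: 3 integer digits, 2 fractional digits
print(fits_decimal("1234.5", 5, 2))  # False: 4 integer digits exceed p - s = 3
```

With DECIMAL(5,2), at most three digits are left of the decimal point, which is why 1234.5 does not fit even though it has only five digits in total.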

Data type classification

Data types are grouped into the following classes:

Language mappings

Applies to: Databricks Runtime

Scala

Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:

import org.apache.spark.sql.types._
SQL type Data type Value type API to access or create data type
TINYINT ByteType Byte ByteType
SMALLINT ShortType Short ShortType
INT IntegerType Int IntegerType
BIGINT LongType Long LongType
FLOAT FloatType Float FloatType
DOUBLE DoubleType Double DoubleType
DECIMAL(p,s) DecimalType java.math.BigDecimal DecimalType
STRING StringType String StringType
BINARY BinaryType Array[Byte] BinaryType
BOOLEAN BooleanType Boolean BooleanType
TIMESTAMP TimestampType java.sql.Timestamp TimestampType
TIMESTAMP_NTZ TimestampNTZType java.time.LocalDateTime TimestampNTZType
DATE DateType java.sql.Date DateType
year-month interval YearMonthIntervalType java.time.Period YearMonthIntervalType (3)
day-time interval DayTimeIntervalType java.time.Duration DayTimeIntervalType (3)
ARRAY ArrayType scala.collection.Seq ArrayType(elementType [, containsNull]). (2)
MAP MapType scala.collection.Map MapType(keyType, valueType [, valueContainsNull]). (2)
STRUCT StructType org.apache.spark.sql.Row StructType(fields). fields is a Seq of StructField. (4)
StructField The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) StructField(name, dataType [, nullable]). (4)

Java

Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use factory methods provided in org.apache.spark.sql.types.DataTypes.

SQL type Data Type Value type API to access or create data type
TINYINT ByteType byte or Byte DataTypes.ByteType
SMALLINT ShortType short or Short DataTypes.ShortType
INT IntegerType int or Integer DataTypes.IntegerType
BIGINT LongType long or Long DataTypes.LongType
FLOAT FloatType float or Float DataTypes.FloatType
DOUBLE DoubleType double or Double DataTypes.DoubleType
DECIMAL(p,s) DecimalType java.math.BigDecimal DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale)
STRING StringType String DataTypes.StringType
BINARY BinaryType byte[] DataTypes.BinaryType
BOOLEAN BooleanType boolean or Boolean DataTypes.BooleanType
TIMESTAMP TimestampType java.sql.Timestamp DataTypes.TimestampType
TIMESTAMP_NTZ TimestampNTZType java.time.LocalDateTime DataTypes.TimestampNTZType
DATE DateType java.sql.Date DataTypes.DateType
year-month interval YearMonthIntervalType java.time.Period YearMonthIntervalType (3)
day-time interval DayTimeIntervalType java.time.Duration DayTimeIntervalType (3)
ARRAY ArrayType java.util.List DataTypes.createArrayType(elementType [, containsNull]). (2)
MAP MapType java.util.Map DataTypes.createMapType(keyType, valueType [, valueContainsNull]).(2)
STRUCT StructType org.apache.spark.sql.Row DataTypes.createStructType(fields). fields is a List or array of StructField. (4)
StructField The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) DataTypes.createStructField(name, dataType, nullable). (4)

Python

Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:

from pyspark.sql.types import *
SQL type Data type Value type API to access or create data type
TINYINT ByteType int or long. (1) ByteType()
SMALLINT ShortType int or long. (1) ShortType()
INT IntegerType int or long IntegerType()
BIGINT LongType long (1) LongType()
FLOAT FloatType float (1) FloatType()
DOUBLE DoubleType float DoubleType()
DECIMAL(p,s) DecimalType decimal.Decimal DecimalType()
STRING StringType string StringType()
BINARY BinaryType bytearray BinaryType()
BOOLEAN BooleanType bool BooleanType()
TIMESTAMP TimestampType datetime.datetime TimestampType()
TIMESTAMP_NTZ TimestampNTZType datetime.datetime TimestampNTZType()
DATE DateType datetime.date DateType()
year-month interval YearMonthIntervalType Not supported Not supported
day-time interval DayTimeIntervalType datetime.timedelta DayTimeIntervalType (3)
ARRAY ArrayType list, tuple, or array ArrayType(elementType, [containsNull]).(2)
MAP MapType dict MapType(keyType, valueType, [valueContainsNull]).(2)
STRUCT StructType list or tuple StructType(fields). fields is a list of StructField. (4)
StructField The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) StructField(name, dataType, [nullable]). (4)
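
The value-type column of the Python table can be read as a lookup from SQL type to Python type. The sketch below encodes a subset of that column using only the standard library. The names PYTHON_VALUE_TYPES and matches_sql_type are hypothetical, and PySpark's own type verification may accept more or fewer Python types than this illustration does.

```python
import datetime
import decimal

# Subset of the value types from the Python table above (illustrative, not exhaustive).
PYTHON_VALUE_TYPES = {
    "INT": (int,),
    "DOUBLE": (float,),
    "DECIMAL": (decimal.Decimal,),
    "STRING": (str,),
    "BINARY": (bytearray,),
    "BOOLEAN": (bool,),
    "TIMESTAMP": (datetime.datetime,),
    "DATE": (datetime.date,),
    "ARRAY": (list, tuple),
    "MAP": (dict,),
}


def matches_sql_type(value, sql_type: str) -> bool:
    """Hypothetical helper: does value have a Python type listed for sql_type?"""
    # Compare exact types: bool subclasses int and datetime subclasses date,
    # so isinstance() would blur BOOLEAN with INT and TIMESTAMP with DATE.
    return type(value) in PYTHON_VALUE_TYPES[sql_type]


print(matches_sql_type(1, "INT"))     # True
print(matches_sql_type(True, "INT"))  # False: bool is listed under BOOLEAN only
```

The exact-type comparison is a deliberate choice for this sketch, because Python's subclass relationships would otherwise let a BOOLEAN value pass as INT.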

R

SQL type Data type Value type API to access or create data type
TINYINT ByteType integer (1) 'byte'
SMALLINT ShortType integer (1) 'short'
INT IntegerType integer 'integer'
BIGINT LongType integer (1) 'long'
FLOAT FloatType numeric (1) 'float'
DOUBLE DoubleType numeric 'double'
DECIMAL(p,s) DecimalType Not supported Not supported
STRING StringType character 'string'
BINARY BinaryType raw 'binary'
BOOLEAN BooleanType logical 'bool'
TIMESTAMP TimestampType POSIXct 'timestamp'
TIMESTAMP_NTZ TimestampNTZType datetime.datetime TimestampNTZType()
DATE DateType Date 'date'
year-month interval YearMonthIntervalType Not supported Not supported
day-time interval DayTimeIntervalType Not supported Not supported
ARRAY ArrayType vector or list list(type='array', elementType=elementType, containsNull=[containsNull]).(2)
MAP MapType environment list(type='map', keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]).(2)
STRUCT StructType named list list(type='struct', fields=fields). fields is a Seq of StructField. (4)
StructField The value type of the data type of this field (For example, integer for a StructField with the data type IntegerType) list(name=name, type=dataType, nullable=[nullable]).(4)

(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.
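
As an illustration of note (1), the range of each integer type follows from its byte width in the supported data types table. The sketch below is plain Python and assumes the usual two's-complement representation of signed integers. The names INTEGER_BYTES, value_range, and in_range are hypothetical helpers, not part of any Spark API.

```python
from typing import Tuple

# Byte widths as listed in the supported data types table.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def value_range(sql_type: str) -> Tuple[int, int]:
    """Return (min, max) for a signed integer type, assuming two's complement."""
    bits = 8 * INTEGER_BYTES[sql_type]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def in_range(value: int, sql_type: str) -> bool:
    """Check a Python int against the range of the given SQL integer type."""
    low, high = value_range(sql_type)
    return low <= value <= high


print(value_range("TINYINT"))    # (-128, 127)
print(in_range(128, "TINYINT"))  # False
```

A check like this can run before values are handed to a typed column, since Python integers are unbounded while the SQL types are not.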

(2) The optional value defaults to TRUE.

(3) Interval types

  • YearMonthIntervalType([startField,] endField): Represents a year-month interval which is made up of a contiguous subset of the following fields: YEAR, MONTH.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).

  • DayTimeIntervalType([startField,] endField): Represents a day-time interval which is made up of a contiguous subset of the following fields: DAY, HOUR, MINUTE, SECOND.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
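
The "contiguous subset" rule for day-time intervals can be sketched in plain Python using the field indices given above. The names DAY_TIME_FIELDS, day_time_subset, and describe are hypothetical. The sketch requires both startField and endField and makes no assumption about what the real constructor does when startField is omitted.

```python
from typing import List

# Field indices 0..3 as given in note (3).
DAY_TIME_FIELDS = ["DAY", "HOUR", "MINUTE", "SECOND"]


def day_time_subset(start_field: int, end_field: int) -> List[str]:
    """Return the contiguous run of fields from start_field to end_field."""
    if not (0 <= start_field <= end_field < len(DAY_TIME_FIELDS)):
        raise ValueError("startField must not come after endField, both in 0..3")
    return DAY_TIME_FIELDS[start_field:end_field + 1]


def describe(start_field: int, end_field: int) -> str:
    """Render the leftmost and rightmost fields in SQL qualifier style."""
    fields = day_time_subset(start_field, end_field)
    if len(fields) == 1:
        return f"INTERVAL {fields[0]}"
    return f"INTERVAL {fields[0]} TO {fields[-1]}"


print(describe(0, 3))  # INTERVAL DAY TO SECOND
print(describe(1, 2))  # INTERVAL HOUR TO MINUTE
```

Because the subset must be contiguous, the leftmost and rightmost fields are enough to identify the type, which is why only two values are needed.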

(4) StructType

  • StructType(fields) Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
  • StructField(name, dataType, nullable) Represents a field in a StructType. name is the name of the field, dataType is its data type, and nullable indicates whether values of the field can be null. nullable defaults to true.
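
The two rules in note (4), unique field names and a nullable flag that defaults to true, can be modelled in a few lines of plain Python. The Field class and check_fields function below are hypothetical stand-ins written for illustration. They are not the Spark StructField and StructType classes, and they do not claim to reproduce how Spark itself validates a schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for StructField; not the Spark class."""
    name: str
    data_type: str
    nullable: bool = True  # note (2): the optional value defaults to TRUE


def check_fields(fields: List[Field]) -> List[Field]:
    """Reject a field list in which two fields share the same name."""
    seen = set()
    for field in fields:
        if field.name in seen:
            raise ValueError(f"duplicate field name: {field.name!r}")
        seen.add(field.name)
    return list(fields)


schema = check_fields([Field("id", "INT", nullable=False), Field("name", "STRING")])
print([(f.name, f.nullable) for f in schema])  # [('id', False), ('name', True)]
```

This sketch compares names exactly as written; whether a real schema treats names that differ only in case as duplicates is outside its scope.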