from_json
function
Applies to: Databricks SQL Databricks Runtime
Returns a struct value with the jsonStr
and schema
.
Syntax
from_json(jsonStr, schema [, options])
Arguments
jsonStr
: A STRING expression specifying a json document.schema
: A STRING expression or invocation of schema_of_json function.options
: An optional MAP<STRING,STRING> literal specifying directives.
jsonStr
should be well-formed with respect to schema
and options
.
schema
must be defined as comma-separated column name and data type pairs as used in for example CREATE TABLE
. Prior to Databricks Runtime 12.2 schema
must be a literal.
options
, if provided, can be any of the following:
primitivesAsString
(defaultfalse
): infers all primitive values as a string type.prefersDecimal
(defaultfalse
): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.allowComments
(defaultfalse
): ignores Java and C++ style comment in JSON records.allowUnquotedFieldNames
(defaultfalse
): allows unquoted JSON field names.allowSingleQuotes
(defaulttrue
): allows single quotes in addition to double quotes.allowNumericLeadingZeros
(defaultfalse
): allows leading zeros in numbers (for example,00012
).allowBackslashEscapingAnyCharacter
(defaultfalse
): allows accepting quoting of all character using backslash quoting mechanism.allowUnquotedControlChars
(defaultfalse
): allows JSON Strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters) or not.mode
(defaultPERMISSIVE
): allows a mode for dealing with corrupt records during parsing.PERMISSIVE
: when it meets a corrupted record, puts the malformed string into a field configured bycolumnNameOfCorruptRecord
, and sets malformed fields to null. To keep corrupt records, you can set a string type field namedcolumnNameOfCorruptRecord
in an user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds acolumnNameOfCorruptRecord
field in an output schema.FAILFAST
: throws an exception when it meets corrupted records.
columnNameOfCorruptRecord
(default is the value specified inspark.sql.columnNameOfCorruptRecord
): allows renaming the new field having malformed string created byPERMISSIVE
mode. This overridesspark.sql.columnNameOfCorruptRecord
.dateFormat
(defaultyyyy-MM-dd
): sets the string that indicates a date format. Custom date formats follow the formats at Datetime patterns. This applies to date type.timestampFormat
(defaultyyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]
): sets the string that indicates a timestamp format. Custom date formats follow the formats at Datetime patterns. This applies to timestamp type.multiLine
(defaultfalse
): parses one record, which may span multiple lines, per file.encoding
(by default it is not set): allows to forcibly set one of standard basic or extended encoding for the JSON files. For example UTF-16BE, UTF-32LE. If the encoding is not specified andmultiLine
is set totrue
, it is detected automatically.lineSep
(default covers all\r
,\r\n
and\n
): defines the line separator that should be used for parsing.samplingRatio
(default 1.0): defines fraction of input JSON objects used for schema inferring.dropFieldIfAllNull
(defaultfalse
): whether to ignore column of all null values or empty array/struct during schema inference.locale
(default isen-US
):sets
a locale as language tag in IETF BCP 47 format. For instance, this is used while parsing dates and timestamps.allowNonNumericNumbers
(defaulttrue
): allows JSON parser to recognize set of not-a-number (NaN
) tokens as legal floating number values:+INF
for positive infinity, as well as alias of+Infinity
andInfinity
.-INF
for negative infinity), alias-Infinity
.NaN
for other not-a-numbers, like result of division by zero.
readerCaseSensitive
(defaulttrue
): specifies the case sensitivity behavior whenrescuedDataColumn
is enabled. If true, rescue the data columns whose names differ by case from the schema; otherwise, read the data in a case-insensitive manner. Available in Databricks SQL and Databricks Runtime 13.3 LTS and above.
Returns
A struct with field names and types matching the schema definition.
Examples
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{1,0.8}
> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{2015-08-26 00:00:00}