INSERT

适用于: 勾选“是” Databricks SQL 勾选“是” Databricks Runtime

将新行插入表中,并可选择截断表或分区。 通过值表达式或查询的结果指定插入的行。

如果表架构中存在 INSERT 类型,则 Databricks 对 Hive Avro 表不支持 timestamp-millis

语法

INSERT { OVERWRITE | INTO } [ TABLE ] table_name
    [ PARTITION clause ]
    [ ( column_name [, ...] ) | BY NAME ]
    query

INSERT INTO [ TABLE ] table_name
    REPLACE WHERE predicate
    query

注意

当你 INSERT INTO Delta 表时,支持架构强制和演变。 如果列的数据类型不能安全地强制转换为 Delta 表的数据类型,则会引发运行时异常。 如果启用了架构演变,则新列可以作为架构的最后一列(或嵌套列)存在,以便架构得以演变。

参数

  • INTOOVERWRITE

    如果指定 OVERWRITE,则将应用以下操作:

    • 如果没有 partition_spec,将在插入第一行之前截断表。
    • 否则,在插入第一行之前,与 partition_spec 匹配的所有分区都将被截断。

    如果指定 INTO,则插入的所有行都将添加到现有行。

  • table_name

    标识要插入到的表。 名称不得包含时态规范。 如果找不到表,Azure Databricks 会引发 TABLE_OR_VIEW_NOT_FOUND 错误。

    table_name 不得为外表。

    适用于: 勾选“是” Databricks SQL 勾选“是” Databricks Runtime 16.0 及更高版本

    table_name 可能包含选项规范

  • PARTITION 子句

    一个可选参数,可指定插入的目标分区。 也可仅部分指定分区。

    指定静态分区 (column = value) 时,此列不得在插入列列表中重复。

    INSERT OVERWRITE 指定动态分区时,table_name 必须是 Delta Lake 表。

  • ( column_name [, …] )

    表中列的可选列表。 insert 命令最多可以指定表中的任何特定列一次。

    适用于: 勾选“是” Databricks SQL 勾选“是” Databricks Runtime 12.2 LTS 及更高版本

    • 如果此命令省略某列,Databricks SQL 将改为分配相应的默认值。
    • 如果目标表架构没有为插入的列定义任何默认值,则 Databricks SQL 会在该列可为空时分配 NULL。 否则,Databricks SQL 会引发错误。

    不提供列列表等同于按照表中定义的顺序指定所有列(在 PARTITION 子句中分配了值的列除外)。

  • BY NAME

    适用于: 勾选“是” Databricks SQL 勾选“是” Databricks Runtime 13.3 LTS 及更高版本

    当使用此子句而不是显式列列表时,该命令使用 query 的公开列名按 query 顺序生成列列表。 与显式列列表一样,每个列必须存在于目标表中,并且不得重复。 如果 table_name 中存在的列不是隐含列列表的一部分,则会改用 DEFAULT 值。

    BY NAME 还按名称匹配结构的属性。

    query 中的任何列都不能与 PARTITION 子句中指定的列或生成的任何列匹配。

  • REPLACE WHERE boolean_expression

    适用于: 勾选“是” Databricks SQL 勾选“是” Databricks Runtime 12.2 LTS 及更高版本

    如果 table_name 是 Delta Lake 表,请先删除与 boolean_expression 匹配的行,然后再插入与 query 中指定的 boolean-expression 匹配的任何行。 query 中不匹配 boolean_expression 的行将被忽略。

    boolean_expression 可以是计算得出 BOOLEAN 结果类型的任何表达式。

    请参阅使用 replaceWhere 进行任意选择性覆盖

  • 查询

    生成要插入的行的查询。

    必须将查询返回的列数与指定或隐含的插入列列表相匹配。

    如果数据类型不能安全地转换为匹配的列数据类型,则会引发运行时异常。

    适用于:勾选“是” Databricks SQL 勾选“是” Databricks Runtime 11.3 LTS 及更高版本

    • 如果 queryVALUES 子句组成,则 expression 可以是 DEFAULT
    • 如果 querySELECT 子句组成,则 named_expression 可以是 DEFAULT
    • DEFAULT 将在 table_name 中插入相应列的明确定义的 DEFAULT 表达式,如果没有定义任何表达式,则插入 NULL

    如果启用了架构演变,则新列可以作为架构的最后一列(或嵌套列)存在,以便架构得以演变。

示例

本节内容:

INSERT INTO

INSERT 使用 VALUES

> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64) DEFAULT 'unknown', student_id INT)
  PARTITIONED BY (student_id);

-- Single row insert using a `VALUES` clause specifying all columns.
> INSERT INTO students VALUES
    ('Amy Smith', '123 Park Ave, San Jose', 111111);

-- Single row insert using an implicit default for address
> INSERT INTO students(name, student_id) VALUES('Grayson Miller', 222222);

-- Single row insert using an explicit DEFAULT keyword for address
> INSERT INTO students VALUES('Youna Kim', DEFAULT, 333333);

-- Multi-row insert using a `VALUES` clause
> INSERT INTO students VALUES
    ('Bob Brown', '456 Taylor St, Cupertino', 444444),
    ('Cathy Johnson', '789 Race Ave, Palo Alto', 555555);

-- Multi-row insert using a mix of DEFAULT and literals
> INSERT INTO students VALUES
    ('Gwyneth Zhao', '120 Main St, Rockport', 666666),
    ('Jackson Peterson', DEFAULT, 777777);

> SELECT * FROM students;
            name                  address student_id
 ---------------- ------------------------ ----------
        Amy Smith   123 Park Ave, San Jose     111111
   Grayson Miller                  unknown     222222
        Youna Kim                  unknown     333333
        Bob Brown 456 Taylor St, Cupertino     444444
    Cathy Johnson  789 Race Ave, Palo Alto     555555
     Gwyneth Zhao    120 Main St, Rockport     666666
 Jackson Peterson                  unknown     777777

使用子查询进行插入

-- Assuming the persons table has already been created and populated.
> SELECT * FROM persons;
          name                   address       ssn
 ------------- ------------------------- ---------
 Dora Williams 134 Forest Ave, Melo Park 123456789
   Eddie Davis   245 Market St, Milpitas 345678901

> INSERT INTO students PARTITION (student_id = 444444)
    SELECT name, address FROM persons WHERE name = "Dora Williams";

> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
     Amy Smith    123 Park Ave, San Jose     111111
     Bob Brown  456 Taylor St, Cupertino     222222
 Cathy Johnson   789 Race Ave, Palo Alto     333333
 Dora Williams 134 Forest Ave, Melo Park     444444

-- Use an option specification to influence the write behavior
INSERT INTO t WITH ('write.split-size' = 10) SELECT * FROM s;

使用 TABLE 子句进行插入

-- Assuming the visiting_students table has already been created and populated.
> SELECT * FROM visiting_students;
          name               address student_id
 ------------- --------------------- ----------
 Fleur Laurent 345 Copper St, London     777777
 Gordon Martin  779 Lake Ave, Oxford     888888

> INSERT INTO students TABLE visiting_students;

> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
     Amy Smith     123 Park Ave,San Jose     111111
     Bob Brown  456 Taylor St, Cupertino     222222
 Cathy Johnson   789 Race Ave, Palo Alto     333333
 Dora Williams 134 Forest Ave, Melo Park     444444
 Fleur Laurent     345 Copper St, London     777777
 Gordon Martin      779 Lake Ave, Oxford     888888

插入到目录中

> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
    PARTITIONED BY (student_id)
    LOCATION "/path/to/students_table";

> INSERT INTO delta.`/path/to/students_table` VALUES
    ('Amy Smith', '123 Park Ave, San Jose', 111111);
> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
     Amy Smith    123 Park Ave, San Jose     111111

插入列列表

> INSERT INTO students (address, name, student_id) VALUES
    ('Hangzhou, China', 'Kent Yao', 11215016);
> SELECT * FROM students WHERE name = 'Kent Yao';
      name                address student_id
 --------- ---------------------- ----------
 Kent Yao         Hangzhou, China   11215016

插入分区规范和列列表

> INSERT INTO students PARTITION (student_id = 11215017) (address, name) VALUES
    ('Hangzhou, China', 'Kent Yao Jr.');
> SELECT * FROM students WHERE student_id = 11215017;
         name                address student_id
 ------------ ---------------------- ----------
 Kent Yao Jr.        Hangzhou, China   11215017

使用 BY NAME 子句的 INSERT

> CREATE TABLE target(n INT, text STRING, s STRUCT<a INT, b INT>);
> INSERT INTO target BY NAME SELECT named_struct('b', 2, 'a', 1) AS s, 0 AS n, 'data' AS text;
> SELECT * FROM target;
  0  data  {"a":1,"b":2}

> CREATE OR REPLACE TABLE target(n INT, arr ARRAY<STRUCT<a INT, b INT>>);
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS n;
> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr;
> SELECT * FROM target;
  0     [{"a":1,"b":2}]
  NULL  [{"a":1,"b":2}]

> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS badname;
Error

> INSERT INTO target BY NAME SELECT array(named_struct('b', 2, 'a', 1)) AS arr, 0 AS n, 1 AS n;
 Error: INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS

REPLACE WHERE

> CREATE TABLE sales(tx_date DATE, amount INTEGER);
> INSERT INTO sales VALUES
   (DATE'2022-10-01', 1234),
   (DATE'2022-10-02', 2345),
   (DATE'2022-10-03', 3456),
   (DATE'2022-11-01', 3214);

-- Replace any rows with a transaction date in October 2022.
> INSERT INTO sales REPLACE WHERE tx_date BETWEEN '2022-10-01' AND '2022-10-31'
   VALUES (DATE'2022-10-01', 1237),
          (DATE'2022-10-02', 2378),
          (DATE'2022-10-04', 2456),
          (DATE'2022-10-05', 6328);
> SELECT * FROM sales ORDER BY  tx_date;
 tx_date    amount
 ---------- ------
 2022-10-01   1237
 2022-10-02   2378
 2022-10-04   2456
 2022-10-05   6328
 2022-11-01   3214

INSERT OVERWRITE

使用 VALUES 子句进行插入

-- Assuming the students table has already been created and populated.
> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
     Amy Smith    123 Park Ave, San Jose     111111
     Bob Brown  456 Taylor St, Cupertino     222222
 Cathy Johnson   789 Race Ave, Palo Alto     333333
 Dora Williams 134 Forest Ave, Melo Park     444444
 Fleur Laurent     345 Copper St, London     777777
 Gordon Martin      779 Lake Ave, Oxford     888888
   Helen Davis 469 Mission St, San Diego     999999
    Jason Wang     908 Bird St, Saratoga     121212

> INSERT OVERWRITE students VALUES
    ('Ashua Hill', '456 Erica Ct, Cupertino', 111111),
    ('Brian Reed', '723 Kern Ave, Palo Alto', 222222);

> SELECT * FROM students;
       name                 address student_id
 ---------- ----------------------- ----------
 Ashua Hill 456 Erica Ct, Cupertino     111111
 Brian Reed 723 Kern Ave, Palo Alto     222222

使用子查询进行插入

-- Assuming the persons table has already been created and populated.
> SELECT * FROM persons;
          name                   address       ssn
 ------------- ------------------------- ---------
 Dora Williams 134 Forest Ave, Melo Park 123456789
   Eddie Davis    245 Market St,Milpitas 345678901

> INSERT OVERWRITE students PARTITION (student_id = 222222)
    SELECT name, address FROM persons WHERE name = "Dora Williams";

> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
    Ashua Hill   456 Erica Ct, Cupertino     111111
 Dora Williams 134 Forest Ave, Melo Park     222222

使用 TABLE 子句进行插入

-- Assuming the visiting_students table has already been created and populated.
> SELECT * FROM visiting_students;
          name               address student_id
 ------------- --------------------- ----------
 Fleur Laurent 345 Copper St, London     777777
 Gordon Martin  779 Lake Ave, Oxford     888888

> INSERT OVERWRITE students TABLE visiting_students;

> SELECT * FROM students;
          name               address student_id
 ------------- --------------------- ----------
 Fleur Laurent 345 Copper St, London     777777
 Gordon Martin  779 Lake Ave, Oxford     888888

插入覆盖目录

> CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
    PARTITIONED BY (student_id)
    LOCATION "/path/to/students_table";

> INSERT OVERWRITE delta.`/path/to/students_table` VALUES
    ('Amy Smith', '123 Park Ave, San Jose', 111111);
> SELECT * FROM students;
          name                   address student_id
 ------------- ------------------------- ----------
     Amy Smith    123 Park Ave, San Jose     111111