CREATE TABLE USINGCREATE TABLE USING

使用数据源来定义表。Defines a table using a data source.

语法Syntax

CREATE TABLE [ OR REPLACE ] [ IF NOT EXISTS ] table_identifier
    [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
    [ USING data_source ]
    [ OPTIONS ( key1=val1, key2=val2, ... ) ]
    [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
    [ LOCATION path ]
    [ COMMENT table_comment ]
    [ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
    [ AS select_statement ]

USING 子句和 AS SELECT 子句之间的子句可按任意顺序出现。The clauses between the USING clause and the AS SELECT clause can appear in any order. 例如,可在 TBLPROPERTIES 之后写入 COMMENT table_commentFor example, you can write COMMENT table_comment after TBLPROPERTIES.

参数Parameters

  • table_identifiertable_identifier

    表名,可选择使用数据库名称进行限定。A table name, optionally qualified with a database name.

    语法: [ database_name. ] table_nameSyntax: [ database_name. ] table_name

  • OR REPLACEOR REPLACE

    如果已存在同名的表,则会使用新的配置替换该表。If a table with the same name already exists, the table is replaced with the new configuration.

    备注

    Databricks 强烈建议使用 OR REPLACE,而不是删除再重新创建表。Databricks strongly recommends using OR REPLACE instead of dropping and re-creating tables.

  • IF NOT EXISTSIF NOT EXISTS

    如果已存在同名表,则不会执行任何操作。If a table with the same name already exists, nothing will happen. IF NOT EXISTS 无法与 OR REPLACE 共存,这意味着不允许使用 CREATE TABLE OR REPLACE IF NOT EXISTSIF NOT EXISTS cannot coexist with OR REPLACE, which means CREATE TABLE OR REPLACE IF NOT EXISTS is not allowed.

  • USING data_sourceUSING data_source

    将要用于表的文件格式。The file format to use for the table. data_source 必须是 TEXTCSVJSONJDBCPARQUETORCDELTA 之一。data_source must be one of TEXT, CSV, JSON, JDBC, PARQUET, ORC, or DELTA. 如果省略 USING,则默认值为 DELTAIf USING is omitted, the default is DELTA.

    重要

    • 当省略 USINGdata_sourceDELTA 时,将应用 Delta 表选项中的其他选项。When USING is omitted or data_source is DELTA, the additional options in Delta table options apply.
    • data_sourceTEXTCSVJSONJDBCPARQUETORC 时,必须指定 LOCATION,并且不允许使用 AS select_statementWhen data_source is TEXT, CSV, JSON, JDBC, PARQUET, or ORC you must specify LOCATION and AS select_statement is not allowed.
  • PARTITIONED BYPARTITIONED BY

    请按指定的列对表进行分区。Partitions the table by the specified columns.

  • LOCATIONLOCATION

    用于存储表数据的目录路径,可以是分布式存储上的一个路径。Path to the directory where table data is stored, which could be a path on distributed storage.

  • OPTIONSOPTIONS

    用于优化表行为的表选项。Table options used to optimize the behavior of the table. 当数据源为 DELTA 时,不支持此子句。This clause is not supported when the data source is DELTA.

  • COMMENTCOMMENT

    用于描述表的字符串字面量。A string literal to describe the table.

  • TBLPROPERTIESTBLPROPERTIES

    用于标记表定义的键值对列表。A list of key-value pairs used to tag the table definition.

  • AS select_statementAS select_statement

    使用来自 select_statement 的数据填充表。Populates the table using the data from select_statement.

Delta 表选项 Delta table options

除标准 CREATE TABLE 选项外,Delta 表还支持本部分中介绍的选项。In addition to the standard CREATE TABLE options, Delta tables support the options described in this section.

CREATE ... table_identifier
  [ (col_name1 col_type1 [ NOT NULL ] [ COMMENT col_comment1], ...) ]
  [ USING DELTA ]
  [ LOCATION <path-to-delta-files> ]

参数Parameters

  • table_identifiertable_identifier

    • [database_name.] table_name:表名,可选择使用数据库名称进行限定。[database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`:现有 Delta 表的位置。delta.`<path-to-table>`: The location of an existing Delta table.
  • NOT NULLNOT NULL

    指示列值不能为 NULLIndicate that a column value cannot be NULL. 默认设置为允许 NULL 值。The default is to allow a NULL value. 如果指定了该设置,则对 Delta 表的任何更改都将检查这些 NOT NULL 约束。If specified any change to the Delta table will check these NOT NULL constraints.

    有关详细信息,请参阅 NOT NULL 约束For details, see NOT NULL constraint.

  • LOCATION LOCATION

    如果指定的 LOCATION 已包含存储在 Delta Lake 中的数据,Delta Lake 会执行以下操作:If you specify a LOCATION that already contains data stored in Delta Lake, Delta Lake does the following:

    • 如果仅指定了表名称和位置,例如:If you specify only the table name and location, for example:

      CREATE TABLE events
        LOCATION '/mnt/delta/events'
      

      Hive 元存储中的表会自动继承现有数据的架构、分区和表属性。the table in the Hive metastore automatically inherits the schema, partitioning, and table properties of the existing data. 此功能可用于将数据“导入”到元存储中。This functionality can be used to “import” data into the metastore.

    • 如果你指定了任何配置(架构、分区或表属性),则 Delta Lake 会验证指定的内容是否与现有数据的配置完全匹配。If you specify any configuration (schema, partitioning, or table properties), Delta Lake verifies that the specification exactly matches the configuration of the existing data.

    警告

    如果指定的配置与数据的配置并非完全匹配,则 Delta Lake 会引发一个描述差异的异常。If the specified configuration does not exactly match the configuration of the data, Delta Lake throws an exception that describes the discrepancy.

数据源交互Data source interaction

数据源表的作用类似于指向基础数据源的指针。A data source table acts like a pointer to the underlying data source. 例如,可在 Azure Databricks 中创建指向文件 bar 的表 fooFor example, you can create a table foo in Azure Databricks that points to a file bar. 读取和写入表 foo 时,实际上是读取和写入文件 barWhen you read and write table foo, you actually read and write the file bar. 如果未指定 LOCATION,Azure Databricks 会创建默认的表位置。If you don’t specify the LOCATION, Azure Databricks creates a default table location.

对于 CREATE TABLE AS SELECT,Azure Databricks 会用输入查询的数据覆盖基础数据源,确保创建的表包含与输入查询完全相同的数据。For CREATE TABLE AS SELECT, Azure Databricks overwrites the underlying data source with the data of the input query, to make sure the table gets created contains exactly the same data as the input query.

示例Examples

--Creates a Delta table
CREATE TABLE student (id INT, name STRING, age INT);
--Use data from another table
CREATE TABLE student_copy AS SELECT * FROM student;
--Creates a CSV table from an external directory
CREATE TABLE student USING CSV LOCATION '/mnt/csv_files';
--Specify table comment and properties
CREATE TABLE student (id INT, name STRING, age INT)
    COMMENT 'this is a comment'
    TBLPROPERTIES ('foo'='bar');
--Specify table comment and properties with different clauses order
CREATE TABLE student (id INT, name STRING, age INT)
    TBLPROPERTIES ('foo'='bar')
    COMMENT 'this is a comment';
--Create partitioned table
CREATE TABLE student (id INT, name STRING, age INT)
    PARTITIONED BY (age);