Copy Into (Delta Lake on Azure Databricks)

Important

This feature is in Public Preview.

COPY INTO table_identifier
  FROM [ file_location | (SELECT identifier_list FROM file_location) ]
  FILEFORMAT = data_source
  [FILES = (file_name, ...) | PATTERN = 'regex_pattern']
  [FORMAT_OPTIONS ('data_source_reader_option' = 'value', ...)]
  [COPY_OPTIONS ('force' = ('false'|'true'))]

Loads data from a file location into a Delta table. This is a retriable and idempotent operation: files in the source location that have already been loaded are skipped.
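A minimal sketch of a basic load. It assumes a hypothetical existing Delta table `events_bronze` and a hypothetical source directory `/mnt/raw/events` that holds JSON files. If you run the same statement again, it should load only files that arrived after the first run, because files that were already loaded are skipped.

```sql
-- Hypothetical target table and source path; substitute your own.
-- Re-running this statement skips files that were already loaded.
COPY INTO events_bronze
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
```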

table_identifier

The Delta table to copy into.

FROM file_location

The file location to load the data from. Files in this location must be in the format specified in FILEFORMAT.

SELECT identifier_list

Selects the specified columns or expressions from the source data before copying them into the Delta table.
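When a CSV source has no header row, Spark assigns default column names such as `_c0`, `_c1`, and so on. The following sketch uses expressions in the SELECT list to cast and rename those columns. It assumes a hypothetical Delta table `orders(order_id BIGINT, quantity INT, note STRING)` and headerless CSV files under the hypothetical path `/mnt/raw/orders`.

```sql
-- Hypothetical table and path; the source CSV files have no header row,
-- so Spark names the columns _c0, _c1, _c2.
COPY INTO orders
  FROM (SELECT CAST(_c0 AS BIGINT) AS order_id,
               CAST(_c1 AS INT)    AS quantity,
               _c2                 AS note
          FROM '/mnt/raw/orders')
  FILEFORMAT = CSV
```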

FILEFORMAT = data_source

The format of the source files to load. One of CSV, JSON, AVRO, ORC, or PARQUET.

FILES

A list of up to 1000 file names to load. Cannot be combined with PATTERN.
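A sketch that loads only two named files from the source location and ignores every other file there. The table, path, and file names are all hypothetical.

```sql
-- Load only the listed files (hypothetical names) from the source directory.
COPY INTO events_bronze
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
  FILES = ('2021-01-01.json', '2021-01-02.json')
```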

PATTERN

A regex pattern that identifies the files to load from the source directory. Cannot be combined with FILES.

FORMAT_OPTIONS

Options to pass to the Apache Spark data source reader for the specified format.
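Reader options are passed through as key/value string pairs. The following sketch uses `multiLine`, a Spark JSON reader option for records that span several lines. The target table and the `/mnt/raw/events_pretty` path are hypothetical.

```sql
-- multiLine is forwarded to the Spark JSON reader so that each file can hold
-- records spanning multiple lines. Table and path are hypothetical.
COPY INTO events_bronze
  FROM '/mnt/raw/events_pretty'
  FILEFORMAT = JSON
  FORMAT_OPTIONS ('multiLine' = 'true')
```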

COPY_OPTIONS

Options that control how the COPY INTO command operates. The only option is 'force'. When it is set to 'true', idempotency is disabled and files are loaded regardless of whether they have been loaded before.
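A sketch of a forced reload with the hypothetical table and path used above. Because previously loaded files are read again and appended, rows that already exist in the target table may be duplicated.

```sql
-- Disable idempotency: every file in the source location is loaded again,
-- including files loaded by earlier runs, which may duplicate rows.
COPY INTO events_bronze
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
  COPY_OPTIONS ('force' = 'true')
```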

Examples

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')