infer_storage_schema plugin

This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables.

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.chinacloudapi.cn/container1;secretKey'
  ],
  'DataFormat': 'parquet',
  'FileExtension': '.parquet'
});
evaluate infer_storage_schema(options)

Syntax

evaluate infer_storage_schema( Options )

Arguments

A single Options argument is a constant value of type dynamic that holds a property bag specifying the properties of the request:

Name               Required  Description
StorageContainers  Yes       List of storage connection strings that represent the prefix URI for stored data artifacts.
DataFormat         Yes       One of the supported data formats.
FileExtension      No        Only scan files ending with this file extension. Not required, but specifying it may speed up the process (or eliminate data reading issues).
FileNamePrefix     No        Only scan files starting with this prefix. Not required, but specifying it may speed up the process.
Mode               No        Schema inference strategy, one of: any, last, all. Infers the data schema from any (first found) file, from the last written file, or from all files, respectively. The default value is last.
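
As a rough illustration of the Mode property described above, the following sketch asks the plugin to infer the schema from the first file it finds instead of the last written one. The storage URI and secretKey are the same placeholders used elsewhere on this page, not a real account.

```kusto
// Sketch only: 'Mode' is set to 'any' (first file found).
// Replace with 'last' (the default) or 'all' as needed.
let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.chinacloudapi.cn/container1;secretKey'
  ],
  'DataFormat': 'parquet',
  'FileExtension': '.parquet',
  'Mode': 'any'
});
evaluate infer_storage_schema(options)
```

Because 'any' reads a single file, it is likely the cheapest option, but it can miss columns that appear only in other files.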

Returns

The infer_storage_schema plugin returns a single result table containing a single row and column that holds the CSL schema string.

Notes

  • Storage container URI secret keys must have List permission in addition to Read permission.
  • The 'all' schema inference strategy is a very expensive operation, because it reads every artifact found and merges their schemas.
  • Some returned types may differ from the actual ones because of a wrong type guess or as a result of the schema merge process. Review the result carefully before creating an external table.

Example

let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.chinacloudapi.cn/MovileEvents/2015;secretKey'
  ],
  'FileExtension': '.parquet',
  'FileNamePrefix': 'part-',
  'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)

Result

CslSchema
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real
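
Once reviewed, the returned schema string can be pasted into an external table definition. The command below is a minimal sketch, not a definitive recipe: the table name ExternalMobileEvents is hypothetical, the connection string is the placeholder from the example above, and the kind value is an assumption about the service version (kind=storage is the current term; older releases used kind=blob).

```kusto
// Sketch only: ExternalMobileEvents is a hypothetical table name.
// The column list is the CslSchema value returned above, after manual review.
.create external table ExternalMobileEvents (
    app_id:string, user_id:long, event_time:datetime, country:string, city:string,
    device_type:string, device_vendor:string, ad_network:string, campaign:string,
    site_id:string, event_type:string, event_name:string, organic:string,
    days_from_install:int, revenue:real
)
kind=storage
dataformat=parquet
(
    h@'https://storageaccount.blob.core.chinacloudapi.cn/MovileEvents/2015;secretKey'
)
```

If a guessed type looks wrong (for example, a numeric identifier inferred as long that should be string), adjust it in the column list before running the command.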