infer_storage_schema plugin
This plugin infers the schema of external data and returns it as a CSL schema string. The string can be used when creating external tables.
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.chinacloudapi.cn/container1;secretKey'
],
'DataFormat': 'parquet',
'FileExtension': '.parquet'
});
evaluate infer_storage_schema(options)
Syntax
evaluate infer_storage_schema( Options )
Arguments
The single Options argument is a constant value of type dynamic that holds a property bag specifying the properties of the request:
Name | Required | Description |
---|---|---|
StorageContainers | Yes | List of storage connection strings that represent the prefix URI for stored data artifacts. |
DataFormat | Yes | One of the supported data formats. |
FileExtension | No | Only scan files ending with this file extension. It's not required, but specifying it may speed up the process (or eliminate data reading issues). |
FileNamePrefix | No | Only scan files starting with this prefix. It's not required, but specifying it may speed up the process. |
Mode | No | Schema inference strategy, one of: any, last, all. Infers the data schema from any (first found) file, from the last written file, or from all files, respectively. The default value is last. |
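As an illustration of the optional properties, the following sketch narrows the scan with FileExtension and FileNamePrefix and sets Mode to any, so that the schema is taken from the first matching file found. The storage account, container, and secret key are placeholders, and the 'part-' prefix is only an assumed naming pattern for the files.

```kusto
// Sketch only: storage account, container, and secretKey are placeholders.
// FileExtension and FileNamePrefix limit which files are scanned;
// Mode 'any' infers the schema from the first matching file found.
let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.chinacloudapi.cn/container1;secretKey'
  ],
  'DataFormat': 'parquet',
  'FileExtension': '.parquet',
  'FileNamePrefix': 'part-',
  'Mode': 'any'
});
evaluate infer_storage_schema(options)
```

Switching Mode to all would instead read every matching file and merge the schemas, which is slower.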
Returns
The infer_storage_schema plugin returns a single result table containing a single row and column that holds the CSL schema string.
Notes
- Storage container URI secret keys must have List permission in addition to Read.
- The schema inference strategy 'all' is a very expensive operation, because it reads all artifacts found and merges their schemas.
- Some returned types may not be the actual ones, as a result of a wrong type guess or of the schema merge process. Review the result carefully before creating an external table.
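To make that review easier, one option is to split the returned schema string into one row per column. The following is a minimal sketch, assuming the result column is named CslSchema and that column names contain no commas or colons (bracket-quoted names with such characters would need different handling). The connection string is a placeholder.

```kusto
// Sketch only: placeholder connection string; assumes simple column names.
let options = dynamic({
  'StorageContainers': [
    h@'https://storageaccount.blob.core.chinacloudapi.cn/container1;secretKey'
  ],
  'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)
// Break "name:type, name:type, ..." into one entry per column
| project Column = split(CslSchema, ',')
| mv-expand Column to typeof(string)
// Separate each entry into its name and its type
| parse Column with ColumnName ':' ColumnType
| project ColumnName = trim(@'\s+', ColumnName), ColumnType = trim(@'\s+', ColumnType)
```

Each row then shows one inferred column and its type, which can be compared against what the data is expected to contain.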
Example
let options = dynamic({
'StorageContainers': [
h@'https://storageaccount.blob.core.chinacloudapi.cn/MovileEvents/2015;secretKey'
],
'FileExtension': '.parquet',
'FileNamePrefix': 'part-',
'DataFormat': 'parquet'
});
evaluate infer_storage_schema(options)
Result
CslSchema |
---|
app_id:string, user_id:long, event_time:datetime, country:string, city:string, device_type:string, device_vendor:string, ad_network:string, campaign:string, site_id:string, event_type:string, event_name:string, organic:string, days_from_install:int, revenue:real |
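Once reviewed, the schema string can be pasted into an external table definition. The following is a sketch of what that might look like for the result above. The table name MobileEventsExternal is hypothetical, the connection string is the placeholder from the example, and kind=storage is assumed (older versions of the command use kind=blob).

```kusto
// Sketch only: MobileEventsExternal is a hypothetical table name and
// the connection string is a placeholder. Schema copied from the result above.
.create external table MobileEventsExternal (
    app_id:string, user_id:long, event_time:datetime, country:string,
    city:string, device_type:string, device_vendor:string, ad_network:string,
    campaign:string, site_id:string, event_type:string, event_name:string,
    organic:string, days_from_install:int, revenue:real
)
kind=storage
dataformat=parquet
(
    h@'https://storageaccount.blob.core.chinacloudapi.cn/MovileEvents/2015;secretKey'
)
```

Any column whose inferred type looks wrong can be corrected in the column list before running the command.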