externaldata 运算符externaldata operator

externaldata 运算符返回一个表,该表的架构是在查询自身中定义的,并且该表的数据是从外部存储项目(如 Azure Blob 存储中的 Blob 或 Azure Data Lake Storage 中的文件)中读取的。The externaldata operator returns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.

语法Syntax

externaldata ( ColumnName : ColumnType [, ...] ) externaldata ( ColumnName : ColumnType [, ...] )
[ StorageConnectionString [, ...] ] [ StorageConnectionString [, ...] ]
[with ( PropertyName = PropertyValue [, ...] )] [with ( PropertyName = PropertyValue [, ...] )]

参数Arguments

  • ColumnName , ColumnType :这些参数定义表的架构。ColumnName , ColumnType : The arguments define the schema of the table. 该语法与定义 .create table 中的表时所使用的语法相同。The syntax is the same as the syntax used when defining a table in .create table.

  • StorageConnectionString:存储连接字符串,用于描述包含要返回的数据的存储项目。StorageConnectionString : Storage connection strings that describe the storage artifacts holding the data to return.

  • PropertyName , PropertyValue , ...:其他属性(如 引入属性下面所列),描述如何解释从存储中检索到的数据。PropertyName , PropertyValue , ...: Additional properties that describe how to interpret the data retrieved from storage, as listed under ingestion properties.

当前支持的属性包括:Currently supported properties are:

属性Property 类型Type 说明Description
format string 数据格式。Data format. 如果未指定,则会尝试从文件扩展名检测数据格式(默认为 CSV)。If not specified, an attempt is made to detect the data format from file extension (defaults to CSV). 支持任何引入数据格式Any of the ingestion data formats are supported.
ignoreFirstRecord bool 如果设置为 true,则表示每个文件中的第一条记录均会被忽略。If set to true, indicates that the first record in every file is ignored. 在查询带有标题的 CSV 文件时,此属性很有用。This property is useful when querying CSV files with headers.
ingestionMapping string 一个字符串值,表示如何将数据从源文件映射到运算符结果集中的实际列。A string value that indicates how to map data from the source file to the actual columns in the operator result set. 请参阅数据映射See data mappings.

备注

  • 此运算符不接受任何管道输入。This operator doesn't accept any pipeline input.
  • 标准查询限制还适用于外部数据查询。Standard query limits apply to external data queries as well.

返回Returns

externaldata 运算符返回给定架构的数据表,表中的数据是从指定的存储项目中分析的,由存储连接字符串指示。The externaldata operator returns a data table of the given schema whose data was parsed from the specified storage artifact, indicated by the storage connection string.

示例Examples

提取存储在 Azure Blob 存储中的用户 ID 的列表Fetch a list of user IDs stored in Azure Blob Storage

下面的示例显示了如何查找表中的所有记录,该表的 UserID 列属于一个已知 ID 集,这些 ID 保存在外部存储文件中(每行一个 ID)。The following example shows how to find all records in a table whose UserID column falls into a known set of IDs, held (one per line) in an external storage file. 由于未指定数据格式,因此检测到的数据格式是 TXTSince the data format isn't specified, the detected data format is TXT.

Users
| where UserID in ((externaldata (UserID:string) [
    @"https://storageaccount.blob.core.chinacloudapi.cn/storagecontainer/users.txt"
      h@"?...SAS..." // Secret token needed to access the blob
    ]))
| ...

查询多个数据文件Query multiple data files

下面的示例查询外部存储中存储的多个数据文件。The following example queries multiple data files stored in external storage.

externaldata(Timestamp:datetime, ProductId:string, ProductDescription:string)
[
  h@"https://mycompanystorage.blob.core.chinacloudapi.cn/archivedproducts/2019/01/01/part-00000-7e967c99-cf2b-4dbb-8c53-ce388389470d.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.chinacloudapi.cn/archivedproducts/2019/01/02/part-00000-ba356fa4-f85f-430a-8b5a-afd64f128ca4.csv.gz?...SAS...",
  h@"https://mycompanystorage.blob.core.chinacloudapi.cn/archivedproducts/2019/01/03/part-00000-acb644dc-2fc6-467c-ab80-d1590b23fc31.csv.gz?...SAS..."
]
with(format="csv")
| summarize count() by ProductId

可将上述示例视为快速查询多个数据文件(无需定义外部表)的方法。The above example can be thought of as a quick way to query multiple data files without defining an external table.

备注

externaldata 运算符无法识别数据分区。Data partitioning isn't recognized by the externaldata operator.

查询分层数据格式Query hierarchical data formats

若要查询分层数据格式(如 JSONParquetAvroORC),必须在运算符属性中指定 ingestionMappingTo query hierarchical data format, such as JSON, Parquet, Avro, or ORC, ingestionMapping must be specified in the operator properties. 在此示例中,有一个 JSON 文件存储在 Azure Blob 存储中,该文件包含以下内容:In this example, there's a JSON file stored in Azure Blob Storage with the following contents:

{
  "timestamp": "2019-01-01 10:00:00.238521",   
  "data": {    
    "tenant": "e1ef54a6-c6f2-4389-836e-d289b37bcfe0",   
    "method": "RefreshTableMetadata"   
  }   
}   
{
  "timestamp": "2019-01-01 10:00:01.845423",   
  "data": {   
    "tenant": "9b49d0d7-b3e6-4467-bb35-fa420a25d324",   
    "method": "GetFileList"   
  }   
}
...

若要使用 externaldata 运算符查询此文件,必须指定数据映射。To query this file using the externaldata operator, a data mapping must be specified. 该映射指示如何将 JSON 字段映射到运算符结果集列:The mapping dictates how to map JSON fields to the operator result set columns:

externaldata(Timestamp: datetime, TenantId: guid, MethodName: string)
[ 
   h@'https://mycompanystorage.blob.core.chinacloudapi.cn/events/2020/09/01/part-0000046c049c1-86e2-4e74-8583-506bda10cca8.json?...SAS...'
]
with(format='multijson', ingestionMapping='[{"Column":"Timestamp","Properties":{"Path":"$.time"}},{"Column":"TenantId","Properties":{"Path":"$.data.tenant"}},{"Column":"MethodName","Properties":{"Path":"$.data.method"}}]')

此处使用 MultiJSON 格式,因为单个 JSON 记录跨越多行。The MultiJSON format is used here because single JSON records are spanned into multiple lines.

有关映射语法的详细信息,请参阅数据映射For more info on mapping syntax, see data mappings.