.list blobs command (Preview)

Applies to: ✅ Azure Data Explorer

The .list blobs command lists blobs under a specified container path.

This command is typically used with .ingest-from-storage-queued to ingest data. You can also use it on its own to better understand folder contents and parameterize ingestion commands.
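When pairing the two commands, the output of .list blobs feeds the queued ingestion directly. The following is a minimal sketch, assuming a target table named MyTable and the <| composition form; see the .ingest-from-storage-queued documentation for the authoritative syntax.

// MyTable, the format, and the container path are placeholders
.ingest-from-storage-queued into table MyTable with (format="csv")
<|
.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
MaxFiles=100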

Note

Queued ingestion commands are run on the data ingestion URI endpoint https://ingest-<YourClusterName>.<Region>.kusto.chinacloudapi.cn.

Permissions

You must have at least Table Ingestor permissions to run this command.

Syntax

.list blobs (SourceDataLocators) [Suffix=SuffixValue] [MaxFiles=MaxFilesValue] [PathFormat=PathFormatValue]

Learn more about syntax conventions.

Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| SourceDataLocators | string | ✔️ | One or more storage connection strings, separated by commas. Each connection string can refer to a storage container or to a file prefix within a container. Currently, only one storage connection string is supported. |
| SuffixValue | string | | The suffix by which to filter blobs; only blobs whose paths end with this value are listed. |
| MaxFilesValue | integer | | The maximum number of blobs to return. |
| PathFormatValue | string | | The pattern in the blob's path used to extract the creation time as an output field. For more information, see Path format. |

Note

  • We recommend using obfuscated string literals for SourceDataLocators; see the sketch after this note.

  • When used alone, .list blobs returns up to 1,000 files, regardless of any larger value specified in MaxFiles.
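
For example, prefixing the string literal with h marks it as obfuscated, so the connection string isn't emitted to logs or traces. A minimal sketch; the account and container names are the same placeholders used in the examples below:

// the h prefix obfuscates the literal; account and container are placeholders
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
MaxFiles=10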

Ingestion properties

Important

In queued ingestion, data is batched using ingestion properties. The more distinct ingestion mapping property values used, such as different ConstValue values, the more fragmented the ingestion becomes, which can lead to performance degradation.

The following table lists and describes the supported properties, and provides examples:

| Property | Description | Example |
|---|---|---|
| ingestionMapping | A string value that indicates how to map data from the source file to the actual columns in the table. Define the format value with the relevant mapping type. See data mappings. | with (format="json", ingestionMapping = "[{\"column\":\"rownumber\", \"Properties\":{\"Path\":\"$.RowNumber\"}}, {\"column\":\"rowguid\", \"Properties\":{\"Path\":\"$.RowGuid\"}}]") (deprecated: avroMapping, csvMapping, jsonMapping) |
| ingestionMappingReference | A string value that indicates how to map data from the source file to the actual columns in the table using a named mapping policy object. Define the format value with the relevant mapping type. See data mappings. | with (format="csv", ingestionMappingReference = "Mapping1") (deprecated: avroMappingReference, csvMappingReference, jsonMappingReference) |
| creationTime | The datetime value (formatted as an ISO 8601 string) to use as the creation time of the ingested data extents. If unspecified, the current value (now()) is used. Overriding the default is useful when ingesting older data, so that the retention policy is applied correctly. When specified, make sure the Lookback property in the target table's effective Extents merge policy is aligned with the specified value. | with (creationTime="2017-02-13") |
| extend_schema | A Boolean value that, if specified, instructs the command to extend the schema of the table (defaults to false). This option applies only to .append and .set-or-append commands. The only allowed schema extensions add columns to the end of the table. | If the original table schema is (a:string, b:int), a valid schema extension would be (a:string, b:int, c:datetime, d:string), but (a:string, c:datetime) wouldn't be valid. |
| folder | For ingest-from-query commands, the folder to assign to the table. If the table already exists, this property overrides the table's folder. | with (folder="Tables/Temporary") |
| format | The data format (see supported data formats). | with (format="csv") |
| ingestIfNotExists | A string value that, if specified, prevents ingestion from succeeding if the table already has data tagged with an ingest-by: tag with the same value. This ensures idempotent data ingestion. For more information, see ingest-by: tags. | The properties with (ingestIfNotExists='["Part0001"]', tags='["ingest-by:Part0001"]') indicate that if data with the tag ingest-by:Part0001 already exists, the current ingestion isn't completed. If it doesn't already exist, the new ingestion has this tag set (in case a future ingestion attempts to ingest the same data again). |
| ignoreFirstRecord | A Boolean value that, if set to true, indicates that ingestion should ignore the first record of every file. This property is useful for files in CSV and similar formats, if the first record in the file is the column names. By default, false is assumed. | with (ignoreFirstRecord=false) |
| policy_ingestiontime | A Boolean value that, if specified, describes whether to enable the Ingestion Time Policy on a table created by this command. The default is true. | with (policy_ingestiontime=false) |
| recreate_schema | A Boolean value that, if specified, describes whether the command may recreate the schema of the table. This property applies only to the .set-or-replace command, and takes precedence over the extend_schema property if both are set. | with (recreate_schema=true) |
| tags | A list of tags to associate with the ingested data, formatted as a JSON string. | with (tags="['Tag1', 'Tag2']") |
| treatGzAsUncompressed | A Boolean value that, if set to true, indicates that files with the .gz extension aren't compressed. This flag is sometimes needed when ingesting from Amazon AWS S3. | with (treatGzAsUncompressed=true) |
| validationPolicy | A JSON string that indicates which validations to run during ingestion of data represented in CSV format. See Data ingestion for an explanation of the different options. | with (validationPolicy='{"ValidationOptions":1, "ValidationImplications":1}') (this is the default policy) |
| zipPattern | Use this property when ingesting data from storage that has a ZIP archive. A string value indicating the regular expression to use when selecting which files in the ZIP archive to ingest. All other files in the archive are ignored. | with (zipPattern="*.csv") |
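
As an illustration, several of these properties can be combined in a single with clause on the ingestion command that consumes the listing. A hedged sketch, assuming the <| composition form; the table name, mapping reference, and tag value are placeholders:

// placeholders: MyTable, Mapping1, and Tag1
.ingest-from-storage-queued into table MyTable with (
    format="csv",
    ingestionMappingReference="Mapping1",
    creationTime="2024-03-16",
    tags="['Tag1']"
)
<|
.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".csv"

Keeping the property set identical across runs helps batches coalesce and avoids the fragmentation described above.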

Authentication and authorization

Each storage connection string indicates the authorization method to use for access to the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.

The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.

| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|---|---|---|
| Shared Access (SAS) token | List + Read | This authentication method isn't supported in Gen1. |
| Storage account access key | | This authentication method isn't supported in Gen1. |
| Managed identity | Storage Blob Data Reader | Reader |

The primary use of .list blobs is for queued ingestion, which is done asynchronously with no user context. Therefore, impersonation isn't supported.
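
For reference, the authorization method is encoded in the connection string itself. Hedged sketches of the methods above; the account, container, SAS token, and account key values are placeholders:

// Managed identity (system-assigned)
.list blobs ("https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system")

// Shared Access (SAS) token appended as the URI query string
.list blobs ("https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder?<SasToken>")

// Storage account access key appended after a semicolon
.list blobs ("https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;<AccountKey>")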

Path format

The PathFormat parameter lets you specify the pattern in the blob path from which the creation time is extracted. It consists of a sequence of text separators and partition elements. A partition element refers to a captured value, currently the creation time, and the text separator is any text enclosed in quotes. Consecutive partition elements must be set apart using a text separator.

[StringSeparator] Partition [StringSeparator] [Partition [StringSeparator] ...]

To construct the original file path prefix, partition elements are rendered as strings and separated with corresponding text separators. You can use the datetime_pattern macro (datetime_pattern(DateTimeFormat, PartitionName)) to specify the format used for rendering a datetime partition value. The macro adheres to the .NET format specification, and allows format specifiers to be enclosed in curly brackets. For example, the following two formats are equivalent:

  • 'year='yyyy'/month='MM
  • year={yyyy}/month={MM}
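
For example, with the .NET specifiers yyyy and MM, both spellings render a March 2024 value identically:

'year='yyyy'/month='MM   ->   year=2024/month=03
year={yyyy}/month={MM}   ->   year=2024/month=03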

By default, datetime values are rendered using the following formats:

| Partition function | Default format |
|---|---|
| startofyear | yyyy |
| startofmonth | yyyy/MM |
| startofweek | yyyy/MM/dd |
| startofday | yyyy/MM/dd |
| bin(Column, 1d) | yyyy/MM/dd |
| bin(Column, 1h) | yyyy/MM/dd/HH |
| bin(Column, 1m) | yyyy/MM/dd/HH/mm |
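
For instance, under the default format for bin(Column, 1h), a datetime of 2024-03-16 09:00 renders as:

2024/03/16/09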

Returns

The result of the command is a table with one record per blob listed.

| Name | Type | Description |
|---|---|---|
| BlobUri | string | The URI of the blob. |
| SizeInBytes | long | The size of the blob in bytes (its content length). |
| CapturedVariables | dynamic | The captured variables. Currently, only CreationTime is supported. |
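
For example, a blob matched by the PathFormat used in the examples below might produce a record shaped like the following; the values are illustrative only:

BlobUri            https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder/year=2024/month=03/day=16/myblob.parquet
SizeInBytes        1048576
CapturedVariables  {"CreationTime": "2024-03-16T00:00:00.0000000Z"}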

Examples

List maximum number of blobs

The following command lists a maximum of 20 blobs from the myfolder folder using system-assigned managed identity authentication.

.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
MaxFiles=20

List Parquet blobs

The following command lists a maximum of 10 blobs of type .parquet from a folder, using system-assigned managed identity authentication.

.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".parquet"
MaxFiles=10

Capture date from blob path

The following command lists a maximum of 10 blobs of type .parquet from a folder, using system-assigned managed identity authentication, and extracts the date from the URL path.

.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".parquet"
MaxFiles=10
PathFormat=("myfolder/year=" datetime_pattern("yyyy'/month='MM'/day='dd", creationTime) "/")

The PathFormat in the example extracts dates from paths such as the following:

https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder/year=2024/month=03/day=16/myblob.parquet