Applies to: ✅ Azure Data Explorer
The `.list blobs` command lists blobs under a specified container path.
This command is typically used with `.ingest-from-storage-queued` to ingest data. You can also use it on its own to better understand folder contents and to parameterize ingestion commands.
Note
Queued ingestion commands are run on the data ingestion URI endpoint: `https://ingest-<YourClusterName><Region>.kusto.chinacloudapi.cn`.
Permissions
You must have at least Table Ingestor permissions to run this command.
Syntax
`.list blobs (SourceDataLocators) [Suffix=SuffixValue] [MaxFiles=MaxFilesValue] [PathFormat=PathFormatValue]`
Learn more about syntax conventions.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| *SourceDataLocators* | `string` | ✔️ | One or more storage connection strings, separated by commas. Each connection string can refer to a storage container or to a file prefix within a container. Currently, only one storage connection string is supported. |
| *SuffixValue* | `string` | | A suffix used to filter the blobs to list. |
| *MaxFilesValue* | `int` | | The maximum number of blobs to return. |
| *PathFormatValue* | `string` | | The pattern in the blob's path used to extract the creation time as an output field. For more information, see Path format. |
Note

We recommend using obfuscated string literals for *SourceDataLocators*, as shown in the sketch after this note. When used alone, `.list blobs` returns up to 1,000 files, regardless of any larger value specified in `MaxFiles`.
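In Kusto, prefixing a string literal with `h` makes it an obfuscated string literal, which keeps the value out of diagnostic logs and error messages. The following is a minimal sketch; the storage account and container are placeholders consistent with the examples later in this article:

```kusto
// The h prefix obfuscates the connection string, which can embed
// credentials, so it isn't echoed in traces or error messages.
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
MaxFiles=5
```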
Ingestion properties
Important
In queued ingestion, data is batched using ingestion properties. The more distinct ingestion mapping properties used, such as different `ConstValue` values, the more fragmented the ingestion becomes, which can lead to performance degradation.
The following table lists and describes the supported properties, and provides examples:
| Property | Description | Example |
|---|---|---|
| `ingestionMapping` | A string value that indicates how to map data from the source file to the actual columns in the table. Define the `format` value with the relevant mapping type. See data mappings. | `with (format="json", ingestionMapping = "[{\"column\":\"rownumber\", \"Properties\":{\"Path\":\"$.RowNumber\"}}, {\"column\":\"rowguid\", \"Properties\":{\"Path\":\"$.RowGuid\"}}]")` (deprecated: `avroMapping`, `csvMapping`, `jsonMapping`) |
| `ingestionMappingReference` | A string value that indicates how to map data from the source file to the actual columns in the table using a named mapping policy object. Define the `format` value with the relevant mapping type. See data mappings. | `with (format="csv", ingestionMappingReference = "Mapping1")` (deprecated: `avroMappingReference`, `csvMappingReference`, `jsonMappingReference`) |
| `creationTime` | The datetime value (formatted as an ISO8601 string) to use as the creation time of the ingested data extents. If unspecified, the current value (`now()`) is used. Overriding the default is useful when ingesting older data, so that the retention policy is applied correctly. When specified, make sure the `Lookback` property in the target table's effective extents merge policy is aligned with the specified value. | `with (creationTime="2017-02-13")` |
| `extend_schema` | A Boolean value that, if specified, instructs the command to extend the schema of the table (defaults to `false`). This option applies only to `.append` and `.set-or-append` commands. The only allowed schema extensions add more columns at the end of the table. | If the original table schema is `(a:string, b:int)`, a valid schema extension would be `(a:string, b:int, c:datetime, d:string)`, but `(a:string, c:datetime)` wouldn't be valid. |
| `folder` | For ingest-from-query commands, the folder to assign to the table. If the table already exists, this property overrides the table's folder. | `with (folder="Tables/Temporary")` |
| `format` | The data format (see supported data formats). | `with (format="csv")` |
| `ingestIfNotExists` | A string value that, if specified, prevents ingestion from succeeding if the table already has data tagged with an `ingest-by:` tag with the same value. This ensures idempotent data ingestion. For more information, see ingest-by: tags. | The properties `with (ingestIfNotExists='["Part0001"]', tags='["ingest-by:Part0001"]')` indicate that if data with the tag `ingest-by:Part0001` already exists, the current ingestion isn't completed. If it doesn't already exist, this new ingestion has the tag set (in case a future ingestion attempts to ingest the same data again). |
| `ignoreFirstRecord` | A Boolean value that, if set to `true`, indicates that ingestion should ignore the first record of every file. This property is useful for files in CSV and similar formats, if the first record in the file contains the column names. By default, `false` is assumed. | `with (ignoreFirstRecord=false)` |
| `policy_ingestiontime` | A Boolean value that, if specified, describes whether to enable the ingestion time policy on a table created by this command. The default is `true`. | `with (policy_ingestiontime=false)` |
| `recreate_schema` | A Boolean value that, if specified, describes whether the command may recreate the schema of the table. This property applies only to the `.set-or-replace` command, and takes precedence over the `extend_schema` property if both are set. | `with (recreate_schema=true)` |
| `tags` | A list of tags to associate with the ingested data, formatted as a JSON string. | `with (tags="['Tag1', 'Tag2']")` |
| `TreatGzAsUncompressed` | A Boolean value that, if set to `true`, indicates that files with the extension `.gz` aren't compressed. This flag is sometimes needed when ingesting from Amazon AWS S3. | `with (treatGzAsUncompressed=true)` |
| `validationPolicy` | A JSON string that indicates which validations to run during ingestion of data represented in CSV format. See Data ingestion for an explanation of the different options. | `with (validationPolicy='{"ValidationOptions":1, "ValidationImplications":1}')` (this is the default policy) |
| `zipPattern` | Use this property when ingesting data from storage that has a ZIP archive. A string value indicating the regular expression to use when selecting which files in the ZIP archive to ingest. All other files in the archive are ignored. | `with (zipPattern="*.csv")` |
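One practical consequence of the batching caveat above: when queuing many files for ingestion, reuse a single, fixed property bag rather than varying it per file, so that the listed blobs can be batched together. The following sketch assumes the `<|` pipe form documented for `.ingest-from-storage-queued`; `MyTable` and `Mapping1` are hypothetical names:

```kusto
// Hypothetical flow: one fixed property bag applies to every listed blob,
// keeping the ingestion in as few batches as possible.
.ingest-from-storage-queued into table MyTable
with (
    format="csv",
    ingestionMappingReference="Mapping1",  // hypothetical named mapping
    ignoreFirstRecord=true                 // skip the CSV header row
)
<|
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".csv"
```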
Authentication and authorization
Each storage connection string indicates the authorization method to use when accessing the storage. Depending on the authorization method, the principal might need to be granted permissions on the external storage to perform the ingestion.
The following table lists the supported authentication methods and the permissions needed for ingesting data from external storage.
| Authentication method | Azure Blob Storage / Data Lake Storage Gen2 | Data Lake Storage Gen1 |
|---|---|---|
| Shared Access (SAS) token | List + Read | This authentication method isn't supported in Gen1. |
| Storage account access key | | This authentication method isn't supported in Gen1. |
| Managed identity | Storage Blob Data Reader | Reader |
The primary use of `.list blobs` is for queued ingestion, which is done asynchronously with no user context. Therefore, impersonation isn't supported.
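For illustration, the authorization method is carried in the connection string itself. The following sketch shows the managed identity form used throughout this article alongside a SAS form; the account, container, and SAS token are placeholders:

```kusto
// System-assigned managed identity (requires Storage Blob Data Reader),
// as used in the examples below.
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)

// Shared Access (SAS) token appended as the query string (requires
// List + Read). The token here is a truncated placeholder.
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder?sv=2022-11-02&sp=rl&sig=<signature>"
)
```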
Path format
The *PathFormat* parameter allows you to specify the format of the creation time for listed blobs. It consists of a sequence of text separators and partition elements. A partition element refers to a partition declared in the `partition by` clause, and the text separator is any text enclosed in quotes. Consecutive partition elements must be set apart using the text separator.
[ StringSeparator ] Partition [ StringSeparator ] [Partition [ StringSeparator ] ...]
To construct the original file path prefix, partition elements are rendered as strings and separated with corresponding text separators. You can use the `datetime_pattern(DateTimeFormat, PartitionName)` macro to specify the format used for rendering a datetime partition value. The macro adheres to the .NET format specification, and allows format specifiers to be enclosed in curly brackets. For example, the following two formats are equivalent:
- 'year='yyyy'/month='MM
- year={yyyy}/month={MM}
By default, datetime values are rendered using the following formats:
| Partition function | Default format |
|---|---|
| `startofyear` | `yyyy` |
| `startofmonth` | `yyyy/MM` |
| `startofweek` | `yyyy/MM/dd` |
| `startofday` | `yyyy/MM/dd` |
| `bin(Column, 1d)` | `yyyy/MM/dd` |
| `bin(Column, 1h)` | `yyyy/MM/dd/HH` |
| `bin(Column, 1m)` | `yyyy/MM/dd/HH/mm` |
Returns
The result of the command is a table with one record per blob listed.
| Name | Type | Description |
|---|---|---|
| BlobUri | `string` | The URI of the blob. |
| SizeInBytes | `long` | The size of the blob in bytes (its content length). |
| CapturedVariables | `dynamic` | The captured variables. Currently, only `CreationTime` is supported. |
Examples
List maximum number of blobs
The following command lists a maximum of 20 blobs from the `myfolder` folder, using system-assigned managed identity authentication.

```kusto
.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
MaxFiles=20
```
List Parquet blobs
The following command lists a maximum of 10 blobs of type `.parquet` from a folder, using system-assigned managed identity authentication.

```kusto
.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".parquet"
MaxFiles=10
```
Capture date from blob path
The following command lists a maximum of 10 blobs of type `.parquet` from a folder, using system-assigned managed identity authentication, and extracts the date from the URL path.

```kusto
.list blobs (
    "https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".parquet"
MaxFiles=10
PathFormat=("myfolder/year=" datetime_pattern("yyyy'/month='MM'/day='dd", creationTime) "/")
```
The `PathFormat` in this example can extract dates from a path such as the following:

```text
https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder/year=2024/month=03/day=16/myblob.parquet
```
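Putting the pieces together: the `CreationTime` value captured through `PathFormat` is what queued ingestion can use to stamp the resulting extents. The following is a minimal sketch, assuming the `<|` pipe form documented for `.ingest-from-storage-queued`; `MyTable` is a hypothetical target table:

```kusto
// Hypothetical end-to-end flow: list Parquet blobs, capture each blob's
// creation time from its path, and queue the files for ingestion.
.ingest-from-storage-queued into table MyTable
with (format="parquet")
<|
.list blobs (
    h"https://mystorageaccount.blob.core.chinacloudapi.cn/datasets/myfolder;managed_identity=system"
)
Suffix=".parquet"
PathFormat=("myfolder/year=" datetime_pattern("yyyy'/month='MM'/day='dd", creationTime) "/")
```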