tokenize_fl()

Switch services using the Version drop-down list. Learn more about navigation.
Applies to: ✅ Azure Data Explorer ✅ Azure Monitor ✅ Microsoft Sentinel

The function tokenize_fl() is a user-defined function (UDF) that tokenizes semi-structured text strings, such as log lines, into separate columns. It uses a regex-based approach to split text on non-alphanumeric delimiters while preserving common semantic characters like periods, hyphens, colons, slashes, and equals signs.

Syntax

T | invoke tokenize_fl(text_col)

Learn more about syntax conventions.

Parameters

Name Type Required Description
text_col string ✔️ The name of the string column containing the text to tokenize.

Function definition

You can define the function by either embedding its code as a query-defined function, or creating it as a stored function in your database, as follows:

Define the function using the following let statement. No permissions are required.

Important

A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of tokenize_fl(), see Example.

// tokenize_fl(): splits the string in the column named by `text_col` into tokens
// and returns the input rows widened with one column per token (col1, col2, ...).
// A token is a maximal run of characters in [\w.\-:/=]; anything else delimits.
// Parameters:
//   tbl      - input tabular expression (any schema).
//   text_col - name of the string column to tokenize.
let tokenize_fl=(tbl:(*), text_col:string)
{
    // Delimiter pattern: one or more characters that are NOT word characters,
    // periods, hyphens, colons, slashes, or equals signs.
    let universal_regex = @"[^\w\.\-:/=]+";             //  Universal Token Splitter
    tbl
    // Tag each input row with an id so tokenized columns can be joined back.
    | serialize _rid=row_number()
    | join (tbl
    // Read the target column by name; falls back to '' if the column is absent.
    | extend _text = column_ifexists(text_col, '')
    // Collapse every delimiter run to the sentinel "$$$". '$' itself matches the
    // delimiter class, so the sentinel cannot collide with surviving text.
    | extend norm = replace_regex(_text, universal_regex, "$$$")
    | extend tokens = split(norm, "$$$")
    // NOTE(review): row_number() here runs over a second, independent
    // serialization of tbl; correct row alignment assumes both passes enumerate
    // rows in the same order — confirm for sources without a stable order.
    | serialize _rid=row_number()
    // One output row per token; i is the zero-based token index.
    | mv-expand with_itemindex = i tokens
    // Re-pack each row's tokens as a bag {"col1": t0, "col2": t1, ...}.
    | summarize cols = make_bag(bag_pack(strcat("col", tostring(i + 1)), tokens)) by _rid
    // Expand the bag into real columns col1..colN.
    | evaluate bag_unpack(cols)
    ) on _rid
    // Drop the helper join keys from the result.
    | project-away _rid, _rid1
};
// Write your query to use the function here.

Example

The following example uses the invoke operator to run the function.

To use a query-defined function, invoke it after the embedded function definition.

// tokenize_fl(): splits the string in the column named by `text_col` into tokens
// and returns the input rows widened with one column per token (col1, col2, ...).
// A token is a maximal run of characters in [\w.\-:/=]; anything else delimits.
let tokenize_fl=(tbl:(*), text_col:string)
{
    // Delimiter pattern: one or more characters that are NOT word characters,
    // periods, hyphens, colons, slashes, or equals signs.
    let universal_regex = @"[^\w\.\-:/=]+";             //  Universal Token Splitter
    tbl
    // Tag each input row with an id so tokenized columns can be joined back.
    | serialize _rid=row_number()
    | join (tbl
    // Read the target column by name; falls back to '' if the column is absent.
    | extend _text = column_ifexists(text_col, '')
    // Collapse every delimiter run to the sentinel "$$$". '$' itself matches the
    // delimiter class, so the sentinel cannot collide with surviving text.
    | extend norm = replace_regex(_text, universal_regex, "$$$")
    | extend tokens = split(norm, "$$$")
    // NOTE(review): row_number() here runs over a second, independent
    // serialization of tbl; correct row alignment assumes both passes enumerate
    // rows in the same order — confirm for sources without a stable order.
    | serialize _rid=row_number()
    // One output row per token; i is the zero-based token index.
    | mv-expand with_itemindex = i tokens
    // Re-pack each row's tokens as a bag {"col1": t0, "col2": t1, ...}.
    | summarize cols = make_bag(bag_pack(strcat("col", tostring(i + 1)), tokens)) by _rid
    // Expand the bag into real columns col1..colN.
    | evaluate bag_unpack(cols)
    ) on _rid
    // Drop the helper join keys from the result.
    | project-away _rid, _rid1
};
// Sample input: two semi-structured log lines in a single string column 's'.
let tbl = datatable(s:string)
[
  "INFO 2025-12-11T11:22:33Z device=cam01 temp=32.8C ip=10.0.0.5/24",
  "WARN user=adi path=/var/log/syslog error:disk-full id=aa:bb:cc:01",
];
// Tokenize column 's'; each row gains one column per token (col1, col2, ...).
tbl
| invoke tokenize_fl('s')

Output

s col1 col2 col3 col4 col5
INFO 2025-12-11T11:22:33Z device=cam01 temp=32.8C ip=10.0.0.5/24 INFO 2025-12-11T11:22:33Z device=cam01 temp=32.8C ip=10.0.0.5/24
WARN user=adi path=/var/log/syslog error:disk-full id=aa:bb:cc:01 WARN user=adi path=/var/log/syslog error:disk-full id=aa:bb:cc:01