Parse text data in Azure Monitor logs

Some log data collected by Azure Monitor includes multiple pieces of information in a single property. Parsing this data into multiple properties makes it easier to use in queries. A common example is a custom log that collects an entire log entry, with multiple values, into a single property. By creating separate properties for the different values, you can search and aggregate on each.

This article describes the options for parsing log data in Azure Monitor, either when the data is ingested or when it's retrieved in a query, and compares the relative advantages of each.

Parsing methods

You can parse data either at ingestion time, when the data is collected, or at query time, when you analyze the data with a query. Each strategy has unique advantages as described below.

Parse data at collection time

When you parse data at collection time, you configure Custom Fields that create new properties in the table. Queries don't have to include any parsing logic and simply use these properties like any other field in the table.

Advantages to this method include the following:

  • Easier to query the collected data since you don't need to include parse commands in the query.
  • Better query performance since the query doesn't need to perform parsing.

Disadvantages to this method include the following:

  • Must be defined in advance. Can't include data that's already been collected.
  • If you change the parsing logic, it will only apply to new data.
  • Fewer parsing options than are available in queries.
  • Increases latency for collecting data.
  • Errors can be difficult to handle.

Parse data at query time

When you parse data at query time, you include logic in your query to parse data into multiple fields. The actual table itself isn't modified.

Advantages to this method include the following:

  • Applies to any data, including data that's already been collected.
  • Changes in logic can be applied immediately to all data.
  • Flexible parsing options, including predefined logic for particular data structures.

Disadvantages to this method include the following:

  • Requires more complex queries. This can be mitigated by using functions to simulate a table.
  • Must replicate parsing logic in multiple queries. Some logic can be shared through functions.
  • Can create overhead when running complex logic against very large record sets (billions of records).

Parse data as it's collected

See Create custom fields in Azure Monitor for details on parsing data as it's collected. This creates custom properties in the table that queries can use just like any other property.

Parse data in query using patterns

When the data you want to parse can be identified by a pattern repeated across records, you can use different operators in the Kusto query language to extract the specific piece of data into one or more new properties.

Simple text patterns

Use the parse operator in your query to create one or more custom properties that can be extracted from a string expression. You specify the pattern to be identified and the names of the properties to create. This is particularly useful for data with key-value strings in a form similar to key=value.

Consider a custom log with data in the following format.

Time=2018-03-10 01:34:36 Event Code=207 Status=Success Message=Client 05a26a97-272a-4bc9-8f64-269d154b0e39 connected
Time=2018-03-10 01:33:33 Event Code=208 Status=Warning Message=Client ec53d95c-1c88-41ae-8174-92104212de5d disconnected
Time=2018-03-10 01:35:44 Event Code=209 Status=Success Message=Transaction 10d65890-b003-48f8-9cfc-9c74b51189c8 succeeded
Time=2018-03-10 01:38:22 Event Code=302 Status=Error Message=Application could not connect to database
Time=2018-03-10 01:31:34 Event Code=303 Status=Error Message=Application lost connection to database

The following query parses this data into individual properties. The line with project is added to return only the calculated properties and not RawData, which is the single property holding the entire entry from the custom log.

MyCustomLog_CL
| parse RawData with * "Time=" EventTime " Event Code=" Code " Status=" Status " Message=" Message
| project EventTime, Code, Status, Message
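
The parse operator can also assign a type to each extracted property. The following is a minimal sketch, not part of the original sample: it assumes the same log format, replaces MyCustomLog_CL with a hypothetical inline datatable so that it can be run without the custom log, and extracts EventTime as a datetime and Code as an int rather than as strings.

```kusto
// Hypothetical inline sample standing in for MyCustomLog_CL.
datatable(RawData:string)
[
    "Time=2018-03-10 01:34:36 Event Code=207 Status=Success Message=Client 05a26a97-272a-4bc9-8f64-269d154b0e39 connected",
    "Time=2018-03-10 01:38:22 Event Code=302 Status=Error Message=Application could not connect to database"
]
// A ":type" suffix asks parse to convert the extracted text to that type;
// properties without a suffix (Status, Message) stay strings.
| parse RawData with * "Time=" EventTime:datetime " Event Code=" Code:int " Status=" Status " Message=" Message
| project EventTime, Code, Status, Message
```

Typed properties can be compared and aggregated directly, for example with a numeric filter on Code, without a separate conversion step.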

The following example breaks out the user name part of a UPN in the AzureActivity table.

AzureActivity
| parse Caller with UPNUserPart "@" *
| where UPNUserPart != "" // Remove non-UPN callers (apps, SPNs, etc.)
| distinct UPNUserPart, Caller

Regular expressions

If your data can be identified with a regular expression, you can use functions that take regular expressions to extract individual values. The following example uses extract to break out the user name part of the UPN in the Caller field of AzureActivity records and then returns distinct users.

AzureActivity
| extend UPNUserPart = extract("([a-z.]*)@", 1, Caller) 
| distinct UPNUserPart, Caller
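
The same function can pull values out of a raw log entry. The sketch below is illustrative only: the inline datatable and its shortened messages are hypothetical sample data modeled on the custom log format shown earlier. It uses the optional fourth argument of extract to convert the captured text to an int.

```kusto
// Hypothetical inline sample modeled on the custom log format used earlier.
datatable(RawData:string)
[
    "Time=2018-03-10 01:34:36 Event Code=207 Status=Success Message=Client connected",
    "Time=2018-03-10 01:38:22 Event Code=302 Status=Error Message=Application could not connect to database"
]
// Capture group 1 holds the digits after "Event Code="; typeof(int) converts the result.
| extend Code = extract(@"Event Code=(\d+)", 1, RawData, typeof(int))
// Without a type argument, extract returns a string.
| extend Status = extract(@"Status=(\w+)", 1, RawData)
| summarize count() by Status, Code
```

The @ prefix marks a verbatim string literal, so backslashes in the regular expression don't need to be escaped.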

To enable efficient parsing at large scale, Azure Monitor uses the re2 version of regular expressions, which is similar but not identical to some other regular expression variants. Refer to the re2 expression syntax for details.

Parse delimited data in a query

Delimited data separates fields with a common character, such as a comma in a CSV file. Use the split function to parse delimited data using a delimiter that you specify. You can use this with the extend operator to return all fields in the data or to specify individual fields to be included in the output.

Note

Since split returns a dynamic object, the results may need to be explicitly cast to data types such as string to be used in operators and filters.
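
As a minimal sketch of that cast, the following query splits a hypothetical inline sample and converts two of the fields before using them. Trimming whitespace before the integer conversion is a defensive assumption for fields that carry a leading space; it isn't a documented requirement.

```kusto
// Hypothetical inline sample; the second field has a leading space.
datatable(RawData:string)
[
    "2018-03-10 01:34:36, 207,Success,Client connected",
    "2018-03-10 01:38:22, 302,Error,Application could not connect to database"
]
| extend CSVFields = split(RawData, ',')
// CSVFields[2] is dynamic; cast it to string before filtering on it.
| extend Status = tostring(CSVFields[2])
// Remove leading and trailing whitespace, then convert to int.
| extend Code = toint(trim(@"\s+", tostring(CSVFields[1])))
| where Status == "Error"
| project Status, Code
```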

Consider a custom log with data in the following CSV format.

2018-03-10 01:34:36, 207,Success,Client 05a26a97-272a-4bc9-8f64-269d154b0e39 connected
2018-03-10 01:33:33, 208,Warning,Client ec53d95c-1c88-41ae-8174-92104212de5d disconnected
2018-03-10 01:35:44, 209,Success,Transaction 10d65890-b003-48f8-9cfc-9c74b51189c8 succeeded
2018-03-10 01:38:22, 302,Error,Application could not connect to database
2018-03-10 01:31:34, 303,Error,Application lost connection to database

The following query parses this data and summarizes by two of the calculated properties. The first extend line splits the RawData property into an array of fields. Each of the next lines names an individual property and adds it to the output, using a conversion function to give it the appropriate data type.

MyCustomCSVLog_CL
| extend CSVFields  = split(RawData, ',')
| extend EventTime  = todatetime(CSVFields[0])
| extend Code       = toint(CSVFields[1]) 
| extend Status     = tostring(CSVFields[2]) 
| extend Message    = tostring(CSVFields[3]) 
| where getyear(EventTime) == 2018
| summarize count() by Status,Code

Parse predefined structures in a query

If your data is formatted in a known structure, you may be able to use one of the functions in the Kusto query language for parsing predefined structures, such as parse_json for JSON.

The following example query parses the Properties field of the AzureActivity table, which is structured in JSON. It saves the results to a dynamic property called parsedProp, which includes the individual named values in the JSON. These values are used to filter and summarize the query results.

AzureActivity
| extend parsedProp = parse_json(Properties) 
| where parsedProp.isComplianceCheck == "True" 
| summarize count() by ResourceGroup, tostring(parsedProp.tags.businessowner)
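
To see what the parsed structure looks like without access to AzureActivity, the sketch below runs the same logic against a hypothetical inline table. The resource group names, tag values, and JSON shape are assumptions made for illustration and only mirror the fields referenced in the query above.

```kusto
// Hypothetical sample rows; the real Properties field contains more values.
datatable(ResourceGroup:string, Properties:string)
[
    "rg-app",  '{"isComplianceCheck":"True","tags":{"businessowner":"alice"}}',
    "rg-app",  '{"isComplianceCheck":"False","tags":{"businessowner":"alice"}}',
    "rg-data", '{"isComplianceCheck":"True","tags":{"businessowner":"bob"}}'
]
| extend parsedProp = parse_json(Properties)
// Values read from a dynamic property are dynamic; cast before comparing or grouping.
| where tostring(parsedProp.isComplianceCheck) == "True"
| summarize count() by ResourceGroup, BusinessOwner = tostring(parsedProp.tags.businessowner)
```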

These parsing functions can be processor intensive, so use them only when your query uses multiple properties from the formatted data. Otherwise, simple pattern matching will be faster.

The following example shows the breakdown of domain controller TGT Preauth type. The type exists only in the EventData field, which is an XML string, but no other data from this field is needed. In this case, parse is used to pick out the required piece of data.

SecurityEvent
| where EventID == 4768
| parse EventData with * 'PreAuthType">' PreAuthType '</Data>' * 
| summarize count() by PreAuthType

Use a function to simulate a table

You may have multiple queries that perform the same parsing of a particular table. In this case, create a function that returns the parsed data instead of replicating the parsing logic in each query. You can then use the function alias in place of the original table in other queries.

Consider the comma-delimited custom log example above. To use the parsed data in multiple queries, create a function using the following query and save it with the alias MyCustomCSVLog.

MyCustomCSVLog_CL
| extend CSVFields = split(RawData, ',')
| extend EventTime = todatetime(CSVFields[0])
| extend Code      = toint(CSVFields[1])
| extend Status    = tostring(CSVFields[2])
| extend Message   = tostring(CSVFields[3])

You can now use the alias MyCustomCSVLog in place of the actual table name in queries like the following.

MyCustomCSVLog
| summarize count() by Status,Code
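
Before saving a function, it can help to try the logic inline. The following sketch does that with a let statement. MyCustomCSVLogPreview is a hypothetical name, the inline datatable stands in for MyCustomCSVLog_CL, and the whitespace trim is a defensive assumption. Unlike a saved function alias, a function defined with let is called with parentheses.

```kusto
// Hypothetical inline function that mimics the saved MyCustomCSVLog function.
let MyCustomCSVLogPreview = () {
    datatable(RawData:string)
    [
        "2018-03-10 01:34:36, 207,Success,Client connected",
        "2018-03-10 01:38:22, 302,Error,Application could not connect to database"
    ]
    | extend CSVFields = split(RawData, ',')
    | extend Code   = toint(trim(@"\s+", tostring(CSVFields[1])))
    | extend Status = tostring(CSVFields[2])
};
// The function call takes the place of a table name.
MyCustomCSVLogPreview()
| summarize count() by Status, Code
```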

Next steps

  • Learn about log queries to analyze the data collected from data sources and solutions.