Monitor health of Log Analytics workspace in Azure Monitor

To maintain the performance and availability of your Log Analytics workspace in Azure Monitor, you need to be able to proactively detect any issues that arise. This article describes how to monitor the health of your Log Analytics workspace using data in the Operation table. This table is included in every Log Analytics workspace and contains errors and warnings that occur in your workspace. You should review this data regularly and create alerts so that you are notified proactively of any important incidents in your workspace.

_LogOperation function

Azure Monitor Logs sends details on any issue to the Operation table in the workspace where the issue occurred. The _LogOperation system function is based on the Operation table and provides a simplified set of information for analysis and alerting.
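As a starting point, a query along the following lines could be run in the workspace to review recent issues. This is a minimal sketch: the 24-hour window is an arbitrary choice for illustration, and the Level values are the ones listed in the Columns section.

```kusto
// List warnings and errors reported in this workspace during the last 24 hours.
// The time window is an illustrative assumption; adjust it to your review cadence.
_LogOperation
| where TimeGenerated > ago(1d)
| where Level in ("Warning", "Error")
| sort by TimeGenerated desc
```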

Columns

The _LogOperation function returns the columns in the following table.

| Column | Description |
|---|---|
| TimeGenerated | Time that the incident occurred, in UTC. |
| Category | Operation category group. Can be used to filter on types of operations and to create more precise system auditing and alerts. See the Categories section for the list of categories. |
| Operation | Description of the operation type. This can indicate one of the Log Analytics limits, a type of operation, or part of a process. |
| Level | Severity level of the issue. Info: no specific attention needed. Warning: the process was not completed as expected, and attention is needed. Error: the process failed, and urgent attention is needed. |
| Detail | Detailed description of the operation, including the specific error message if one exists. |
| _ResourceId | Resource ID of the Azure resource related to the operation. |
| Computer | Computer name, if the operation is related to an Azure Monitor agent. |
| CorrelationId | Used to group consecutive related operations. |
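To illustrate how the columns fit together, the following sketch groups records that share a CorrelationId so that related operations can be read as one sequence. The output column names (Steps, FirstSeen, LastSeen, Levels, Operations) and the one-day window are assumptions made for this example.

```kusto
// Group related operations by CorrelationId and summarize each group.
// Output column names and the time window are illustrative choices.
_LogOperation
| where TimeGenerated > ago(1d)
| where isnotempty(CorrelationId)
| summarize
    Steps = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    Levels = make_set(Level),
    Operations = make_set(Operation)
    by CorrelationId
| sort by LastSeen desc
```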

Categories

The following table describes the categories from the _LogOperation function.

| Category | Description |
|---|---|
| Ingestion | Operations that are part of the data ingestion process. See the Ingestion section for details. |
| Agent | Indicates an issue with agent installation. |
| Data collection | Operations related to data collection processes. |
| Solution targeting | An operation of type ConfigurationScope was processed. |
| Assessment solution | An assessment process was executed. |

Ingestion

Ingestion operations are issues that occurred during data ingestion, including notifications about reaching the Azure Log Analytics workspace limits. Error conditions in this category might indicate data loss, so they are particularly important to monitor. The following table provides details on these operations. For the limits that apply to Log Analytics workspaces, see "Azure Monitor service limits".

| Operation | Level | Detail | Related article |
|---|---|---|---|
| Custom log | Error | Custom fields column limit reached. | Azure Monitor service limits |
| Custom log | Error | Custom logs ingestion failed. | |
| Metadata | Error | Configuration error detected. | |
| Data collection | Error | Data was dropped because the request was created earlier than the number of set days. | Manage usage and costs with Azure Monitor Logs |
| Data collection | Info | Collection machine configuration is detected. | |
| Data collection | Info | Data collection started due to new day. | Manage usage and costs with Azure Monitor Logs |
| Data collection | Warning | Data collection stopped due to daily limit reached. | Manage usage and costs with Azure Monitor Logs |
| Data processing | Error | Invalid JSON format. | Send log data to Azure Monitor with the HTTP Data Collector API (public preview) |
| Data processing | Warning | Value has been trimmed to the max allowed size. | Azure Monitor service limits |
| Data processing | Warning | Field value trimmed as size limit reached. | Azure Monitor service limits |
| Ingestion rate | Info | Ingestion rate limit approaching 70%. | Azure Monitor service limits |
| Ingestion rate | Warning | Ingestion rate limit approaching the limit. | Azure Monitor service limits |
| Ingestion rate | Error | Rate limit reached. | Azure Monitor service limits |
| Storage | Error | Cannot access the storage account as credentials used are invalid. | |
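Because errors in the Ingestion category might indicate data loss, it can be useful to review them on their own. The following is a sketch under the assumption that a one-day lookback is sufficient; it relies only on the columns and values described in this article.

```kusto
// Show ingestion errors from the last day with the details needed to investigate.
// The one-day lookback is an illustrative assumption.
_LogOperation
| where TimeGenerated > ago(1d)
| where Category == "Ingestion" and Level == "Error"
| project TimeGenerated, Operation, Detail, _ResourceId
| sort by TimeGenerated desc
```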

Alert rules

Use log query alerts in Azure Monitor to be proactively notified when an issue is detected in your Log Analytics workspace. Choose a strategy that lets you respond to issues in a timely manner while minimizing your costs. Your subscription is charged for each alert rule, and the cost depends on how frequently the rule is evaluated.

A recommended strategy is to start with two alert rules based on the level of the issue. Use a short frequency, such as every 5 minutes, for errors and a longer frequency, such as every 24 hours, for warnings. Because errors indicate potential data loss, you want to respond to them quickly to minimize any loss. Warnings typically indicate an issue that does not require immediate attention, so you can review them daily.

Use the process in "Create, view, and manage log alerts using Azure Monitor" to create the log alert rules. The following table lists the details for each rule.

Query                                       Threshold value   Period (minutes)   Frequency (minutes)
_LogOperation | where Level == "Error"      0                 5                  5
_LogOperation | where Level == "Warning"    0                 1440               1440

These alert rules respond in the same way to every operation that reports an error or a warning. As you become more familiar with the operations that generate alerts, you may want to respond differently to particular operations. For example, you may want to send notifications to different people for particular operations.
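One way to see which operations are worth a dedicated rule is to count the warnings and errors each one has produced. The query below is a sketch only: the 30-day window and the Count column name are assumptions for illustration, and the window must fall within your workspace's retention period.

```kusto
// Count warnings and errors per category and operation to find frequent sources.
// The 30-day window and the Count column name are illustrative assumptions.
_LogOperation
| where TimeGenerated > ago(30d)
| where Level in ("Warning", "Error")
| summarize Count = count() by Category, Operation, Level
| sort by Count desc
```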

To create an alert rule for a specific operation, use a query that filters on the Category and Operation columns.

The following example creates a warning alert when the ingestion volume rate has reached 80% of the limit.

  • Target: Select your Log Analytics workspace
  • Criteria:
    • Signal name: Custom log search
    • Search query: _LogOperation | where Category == "Ingestion" | where Operation == "Ingestion rate" | where Level == "Warning"
    • Based on: Number of results
    • Condition: Greater than
    • Threshold: 0
    • Period: 5 (minutes)
    • Frequency: 5 (minutes)
  • Alert rule name: Ingestion rate approaching limit
  • Severity: Warning (Sev 1)

The following example creates a warning alert when data collection has reached the daily limit.

  • Target: Select your Log Analytics workspace
  • Criteria:
    • Signal name: Custom log search
    • Search query: _LogOperation | where Category == "Ingestion" | where Operation =~ "Data collection" | where Level == "Warning"
    • Based on: Number of results
    • Condition: Greater than
    • Threshold: 0
    • Period: 5 (minutes)
    • Frequency: 5 (minutes)
  • Alert rule name: Daily data limit reached
  • Severity: Warning (Sev 1)

Next steps