Monitor at scale by using Azure Monitor

Azure Backup provides built-in monitoring and alerting capabilities in a Recovery Services vault. These capabilities are available without any additional management infrastructure. But this built-in service is limited in the following scenarios:

  • If you monitor data from multiple Recovery Services vaults across subscriptions
  • If the preferred notification channel is not email
  • If users want alerts for more scenarios
  • If you want to view information from an on-premises component such as System Center Data Protection Manager in Azure, which the portal doesn't show in Backup Jobs or Backup Alerts

Using Log Analytics workspace

Create alerts by using Log Analytics

In Azure Monitor, you can create your own alerts in a Log Analytics workspace. In the workspace, you use Azure action groups to select your preferred notification mechanism.

Important

For information on the cost of creating this query, see Azure Monitor pricing.

Open the Logs section of the Log Analytics workspace and create a query for your own logs. When you select New Alert Rule, the Azure Monitor alert-creation page opens, as shown in the following image.

Create an alert in a Log Analytics workspace

Here the resource is already marked as the Log Analytics workspace, and action group integration is provided.

Log Analytics alert-creation page

Alert condition

The defining characteristic of an alert is its triggering condition. Select Condition to automatically load the Kusto query on the Logs page, as shown in the following image. Here you can edit the condition to suit your needs. For more information, see Sample Kusto queries.

Set the alert condition

If necessary, you can edit the Kusto query. Choose a threshold, period, and frequency. The threshold determines when the alert is raised. The period is the window of time that each run of the query covers, and the frequency is how often the query runs. For example, if the threshold is greater than 0, the period is 5 minutes, and the frequency is 5 minutes, then the rule runs the query every 5 minutes, reviewing the previous 5 minutes. If the number of results is greater than 0, you're notified through the selected action group.
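
As an illustration of a condition that fits this pattern, the following sketch returns failed backup jobs, so a threshold of greater than 0 results would raise the alert. It reuses the table and columns from the sample Kusto queries. The explicit five-minute filter on TimeGenerated is an assumption added for illustration, because the rule's period normally limits the window.

    AddonAzureBackupJobs
    // Assumption: mirror a 5-minute period; the alert rule's period usually sets this window
    | where TimeGenerated > ago(5m)
    | where JobOperation == "Backup"
    // Keep only the latest record for each job
    | summarize arg_max(TimeGenerated, *) by JobUniqueId
    | where JobStatus == "Failed"

With a threshold of greater than 0, any row that this query returns would trigger a notification through the selected action group.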

Note

To run the alert rule once a day, across all the events and logs that were created on the given day, change both the period and frequency values to 1440 minutes (24 hours).

Alert action groups

Use an action group to specify a notification channel. To see the available notification mechanisms, under Action groups, select Create New.

Available notification mechanisms in the Add action group window

You can satisfy all alerting and monitoring requirements from Log Analytics alone, or you can use Log Analytics to supplement built-in notifications.

For more information, see Create, view, and manage log alerts by using Azure Monitor and Create and manage action groups in the Azure portal.

Sample Kusto queries

The default graphs give you Kusto queries for basic scenarios on which you can build alerts. You can also modify the queries to get the data you want to be alerted on. Paste the following sample Kusto queries in the Logs page and then create alerts on the queries:

  • All successful backup jobs

    AddonAzureBackupJobs
    | where JobOperation=="Backup"
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Completed"
    
  • All failed backup jobs

    AddonAzureBackupJobs
    | where JobOperation=="Backup"
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Failed"
    
  • All successful Azure VM backup jobs

    AddonAzureBackupJobs
    | where JobOperation=="Backup"
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Completed"
    | join kind=inner
    (
        CoreAzureBackup
        | where OperationName == "BackupItem"
        | where BackupItemType=="VM" and BackupManagementType=="IaaSVM"
        | distinct BackupItemUniqueId, BackupItemFriendlyName
    )
    on BackupItemUniqueId
    
  • All successful SQL log backup jobs

    AddonAzureBackupJobs
    | where JobOperation=="Backup" and JobOperationSubType=="Log"
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Completed"
    | join kind=inner
    (
        CoreAzureBackup
        | where OperationName == "BackupItem"
        | where BackupItemType=="SQLDataBase" and BackupManagementType=="AzureWorkload"
        | distinct BackupItemUniqueId, BackupItemFriendlyName
    )
    on BackupItemUniqueId
    
  • All successful Azure Backup agent jobs

    AddonAzureBackupJobs
    | where JobOperation=="Backup"
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Completed"
    | join kind=inner
    (
        CoreAzureBackup
        | where OperationName == "BackupItem"
        | where BackupItemType=="FileFolder" and BackupManagementType=="MAB"
        | distinct BackupItemUniqueId, BackupItemFriendlyName
    )
    on BackupItemUniqueId
    
  • Backup storage consumed per backup item

    CoreAzureBackup
    //Get all Backup Items
    | where OperationName == "BackupItem"
    //Get distinct Backup Items
    | distinct BackupItemUniqueId, BackupItemFriendlyName
    | join kind=leftouter
    (AddonAzureBackupStorage
    | where OperationName == "StorageAssociation"
    //Get latest record for each Backup Item
    | summarize arg_max(TimeGenerated, *) by BackupItemUniqueId
    | project BackupItemUniqueId , StorageConsumedInMBs)
    on BackupItemUniqueId
    | project BackupItemUniqueId , BackupItemFriendlyName , StorageConsumedInMBs
    | sort by StorageConsumedInMBs desc
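
The patterns in these samples can be combined. As a minimal sketch, assuming the same tables and column values that the samples use, the following query returns failed Azure VM backup jobs by pairing the failed-jobs filter with the Azure VM join. It isn't one of the original samples, so adjust it to your own data before you build an alert on it.

    AddonAzureBackupJobs
    | where JobOperation=="Backup"
    // Keep only the latest record for each job
    | summarize arg_max(TimeGenerated,*) by JobUniqueId
    | where JobStatus=="Failed"
    | join kind=inner
    (
        CoreAzureBackup
        | where OperationName == "BackupItem"
        // Restrict the backup items to Azure VMs
        | where BackupItemType=="VM" and BackupManagementType=="IaaSVM"
        | distinct BackupItemUniqueId, BackupItemFriendlyName
    )
    on BackupItemUniqueId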
    

Diagnostic data update frequency

The diagnostic data from the vault is pumped to the Log Analytics workspace with some lag. Every event arrives at the Log Analytics workspace 20 to 30 minutes after it's pushed from the Recovery Services vault. Here are further details about the lag:

  • Across all solutions, the backup service's built-in alerts are pushed as soon as they're created. So they usually appear in the Log Analytics workspace after 20 to 30 minutes.
  • Across all solutions, on-demand backup jobs and restore jobs are pushed as soon as they finish.
  • For all solutions except SQL backup, scheduled backup jobs are pushed as soon as they finish.
  • For SQL backup, because log backups can occur every 15 minutes, information for all the completed scheduled backup jobs, including logs, is batched and pushed every 6 hours.
  • Across all solutions, other information such as the backup item, policy, recovery points, storage, and so on, is pushed at least once per day.
  • A change in the backup configuration (such as changing policy or editing policy) triggers a push of all related backup information.
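
To check how much lag your own workspace sees, one option is to compare each record's TimeGenerated value with the time at which the workspace ingested it. The following is a sketch rather than one of the original samples. It assumes that the ingestion_time() function is available for this table and that TimeGenerated reflects when the vault generated the record.

    AddonAzureBackupJobs
    // Assumption: TimeGenerated is the time the vault generated the record
    | extend IngestionLag = ingestion_time() - TimeGenerated
    // Median, 95th percentile, and maximum observed lag per job operation
    | summarize percentiles(IngestionLag, 50, 95), max(IngestionLag) by JobOperation

If the observed lag is regularly longer than the period of an alert rule, a longer period may be worth considering.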

Using the Recovery Services vault's activity logs

Note

The following steps apply only to Azure VM backups. You can't use these steps for solutions such as the Azure Backup agent, SQL backups within Azure, or Azure Files.

You can also use activity logs to get notification for events such as backup success. To begin, follow these steps:

  1. Sign in to the Azure portal.
  2. Open the relevant Recovery Services vault.
  3. In the vault's properties, open the Activity log section.

To identify the appropriate log and create an alert:

  1. Verify that you're receiving activity logs for successful backups by applying the filters shown in the following image. Change the Timespan value as necessary to view records.

    Filter to find activity logs for Azure VM backups

  2. Select the operation name to see the relevant details.

  3. Select New alert rule to open the Create rule page.

  4. Create an alert by following the steps in Create, view, and manage activity log alerts by using Azure Monitor.

    New alert rule

Here the resource is the Recovery Services vault itself. Repeat the same steps for all of the vaults in which you want to be notified through activity logs. The condition won't have a threshold, period, or frequency because this alert is based on events. As soon as the relevant activity log is generated, the alert is raised.

Using Log Analytics to monitor at scale

You can view all alerts created from activity logs and Log Analytics workspaces in Azure Monitor. Just open the Alerts pane on the left.

Although you can get notifications through activity logs, we highly recommend using Log Analytics rather than activity logs for monitoring at scale. Here's why:

  • Limited scenarios: Notifications through activity logs apply only to Azure VM backups. The notifications must be set up for every Recovery Services vault.
  • Definition fit: The scheduled backup activity doesn't fit with the latest definition of activity logs. Instead, it aligns with resource logs. This alignment causes unexpected effects when the data that flows through the activity log channel changes.
  • Problems with the activity log channel: In Recovery Services vaults, activity logs that are pumped from Azure Backup follow a new model. Unfortunately, this change affects the generation of activity logs in Azure China. If users of this cloud service create or configure any alerts from activity logs in Azure Monitor, the alerts aren't triggered. Also, in all Azure public regions, if a user collects Recovery Services activity logs into a Log Analytics workspace, these logs don't appear.

Use a Log Analytics workspace for monitoring and alerting at scale for all your workloads that are protected by Azure Backup.