Monitoring and alerting for Azure Key Vault

Overview

Once you start using a key vault to store your production secrets, it is important to monitor the health of the vault to make sure your service operates as intended. As you scale your service, the number of requests sent to your key vault will rise. This can increase the latency of your requests and, in extreme cases, cause requests to be throttled, which degrades the performance of your service. You also need to be alerted when your key vault returns an unusual number of error codes, so that you can quickly find any access policy or firewall configuration issues. This document covers the following topics:

  • Basic Key Vault metrics to monitor
  • How to configure metrics and create a dashboard
  • How to create alerts at specified thresholds

Basic Key Vault metrics to monitor

  • Vault Availability
  • Vault Saturation
  • Service API Latency
  • Total Service API Hits (filter by Activity Type)
  • Error Codes (filter by Status Code)

Vault Availability - This metric should always be at 100%. It is an important metric to monitor because it quickly shows whether your key vault has experienced an outage.

Vault Saturation - The number of requests per second that a key vault can serve depends on the type of operation being performed. Some vault operations have a lower requests-per-second threshold. This metric aggregates the total usage of your key vault across all operation types into a single percentage value that indicates your current key vault usage. For a full list of key vault service limits, see Azure Key Vault Service Limits.
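To make the idea of a single saturation percentage concrete, here is a minimal sketch. It assumes saturation is the sum of each operation type's share of its own limit; the article does not give the service's actual formula, and the operation names and limit values below are hypothetical placeholders, not real Key Vault limits.

```python
# Hypothetical per-operation limits (requests per interval).
# These are placeholders, not the real Key Vault service limits.
ASSUMED_LIMITS = {"secret_get": 4000, "key_sign": 2000}


def vault_saturation(usage, limits):
    """Return an assumed saturation percentage.

    Each operation type contributes the fraction of its own limit
    that it consumed; the fractions are summed across operation types.
    """
    total = 0.0
    for operation, count in usage.items():
        total += count / limits[operation]
    return round(100.0 * total, 1)


usage = {"secret_get": 1000, "key_sign": 500}
print(vault_saturation(usage, ASSUMED_LIMITS))  # 50.0
```

Under this assumption, a small number of expensive operations can saturate a vault as much as a large number of cheap ones, which is why a per-operation request count alone is not enough.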

Service API Latency - This metric shows the average latency of a call to key vault. Even when your key vault is within its service limits, high utilization can introduce latency that causes downstream applications to fail.

Total Service API Hits - This metric shows all of the calls made to your key vault. This will help you identify which applications are calling your key vault.

Error Codes - This metric shows whether your key vault is experiencing an unusual number of errors. For a full list of error codes and troubleshooting guidance, see Azure Key Vault REST API Error Codes.
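As an illustration of what filtering the error metric by status code gives you, the sketch below tallies error responses from a list of HTTP status codes. The sample data and the function name are made up for this example. In practice, 403 responses often point to access policy or firewall problems, and 429 responses indicate throttling.

```python
from collections import Counter


def tally_error_codes(status_codes):
    """Count responses per HTTP status code, keeping only errors (>= 400)."""
    return Counter(code for code in status_codes if code >= 400)


# Made-up sample of response codes observed over some interval.
sample = [200, 200, 403, 429, 403, 200, 401]
errors = tally_error_codes(sample)

for code, count in sorted(errors.items()):
    print(code, count)
```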

How to configure metrics and create a dashboard

  1. Log in to the Azure portal.
  2. Navigate to your key vault.
  3. Select Metrics under Monitoring.

Screenshot of Azure portal

  4. Update the title of the chart to what you want to see on your dashboard.
  5. Select the scope. In this example, we will select a single key vault.
  6. Select the metric Overall Vault Availability and the aggregation Avg.
  7. Set the time range to Last 24 hours and the time granularity to 1 minute.

Screenshot of Azure portal

  8. Repeat the steps above for the Vault Saturation and Service API Latency metrics. Select Pin to Dashboard to save your metrics to a dashboard.

Important

Select "Pin to Dashboard" and save every metric you configure. If you leave the page without saving, your configuration changes will be lost when you return.

  9. To monitor all of the types of operations on the key vault, use the Total Service API Hits metric and select Apply Splitting by Activity Type.

Screenshot of Azure portal

  10. To monitor for error codes on the key vault, use the Total Service API Results metric and select Apply Splitting by Status Code.

Screenshot of Azure portal

You will now have a dashboard that looks like this. Click the three dots at the top right of each tile to rearrange and resize the tiles as needed.

Once you save and publish the dashboard, it creates a new resource in your Azure subscription. You can find it at any time by searching for "shared dashboard".

Screenshot of Azure portal

How to configure alerts on your Key Vault

This section shows how to configure alerts on your key vault so that your team can act immediately if the vault is in an unhealthy state. You can configure alerts that send an email (preferably to a team distribution list), fire an Event Grid notification, or call or text a phone number. You can choose static alerts based on a fixed value, or dynamic alerts that fire when a monitored metric exceeds your key vault's average level a certain number of times within a defined time range.

Important

It can take up to 10 minutes for newly configured alerts to start sending notifications.

Configure an action group

An action group is a configurable list of notifications and properties.

  1. Log in to the Azure portal.
  2. Search for Alerts in the search box.
  3. Select Manage Actions.

Screenshot of Azure portal

  4. Select + Add Action Group.

Screenshot of Azure portal

  5. Choose the Action Type for your action group. In this example, we will create an email alert.

Screenshot of Azure portal

Screenshot of Azure portal

  6. Click OK at the bottom of the page. You have successfully created an action group.

Now that you have configured an action group, we will configure the key vault alert thresholds.

Configure alert thresholds

  1. Select your key vault resource in the Azure portal and select Alerts under Monitoring.

Screenshot of Azure portal

  2. Select New Alert Rule.

Screenshot of Azure portal

  3. Select the scope of your alert rule. You can select a single vault or multiple vaults.

Important

When you select multiple vaults for the scope of an alert rule, all selected vaults must be in the same region. You will have to configure separate alert rules for vaults in different regions.

Screenshot of Azure portal
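Because a multi-vault alert rule can only cover vaults in one region, it can help to group your vaults by region before creating rules. The following is a minimal sketch of that planning step; the vault names and regions are hypothetical, and the function only groups data locally without calling any Azure API.

```python
from collections import defaultdict


def group_vaults_by_region(vaults):
    """Group (vault_name, region) pairs so each region gets its own alert rule scope."""
    scopes = defaultdict(list)
    for name, region in vaults:
        scopes[region].append(name)
    return dict(scopes)


# Hypothetical inventory of vaults and their regions.
inventory = [
    ("contoso-prod-kv", "westus"),
    ("contoso-test-kv", "westus"),
    ("contoso-eu-kv", "northeurope"),
]

for region, names in group_vaults_by_region(inventory).items():
    print(region, names)
```

Each resulting group corresponds to one alert rule scope.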

  4. Select the conditions for your alerts. You can choose any of the following signals and define your logic for alerting. The Key Vault team recommends configuring the following alerting thresholds.

    • Key Vault Availability drops below 100% (Static Threshold)
    • Key Vault Latency is greater than 500 ms (Static Threshold)
    • Overall Vault Saturation is greater than 75% (Static Threshold)
    • Overall Vault Saturation exceeds average (Dynamic Threshold)
    • Total Error Codes higher than average (Dynamic Threshold)

Screenshot of Azure portal
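The three recommended static thresholds can be written down as a small table and checked against a metrics sample. This is only an illustrative sketch of the comparison logic; the metric key names are invented for this example and are not Azure Monitor metric identifiers.

```python
# Recommended static thresholds from the list above.
# Keys are illustrative names, not Azure Monitor metric IDs.
STATIC_THRESHOLDS = {
    "availability_percent": ("lt", 100.0),  # alert when below 100%
    "latency_ms": ("gt", 500.0),            # alert when above 500 ms
    "saturation_percent": ("gt", 75.0),     # alert when above 75%
}


def breached_thresholds(sample):
    """Return the names of metrics in `sample` that violate a static threshold."""
    breached = []
    for metric, (operator, limit) in STATIC_THRESHOLDS.items():
        value = sample[metric]
        if operator == "lt" and value < limit:
            breached.append(metric)
        elif operator == "gt" and value > limit:
            breached.append(metric)
    return breached


sample = {"availability_percent": 99.9, "latency_ms": 120.0, "saturation_percent": 80.0}
print(breached_thresholds(sample))
```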

Example 1: Configuring a static alert threshold for latency

Select Overall Service API Latency as the signal name.

Screenshot of Azure portal

Use the following configuration parameters.

  • Set the Threshold to Static
  • Set the Operator to Greater Than
  • Set the Aggregation Type to Average
  • Set the Threshold Value to 500
  • Set the Aggregation Period to 5 minutes
  • Set the Evaluation Frequency to 1 minute
  • Select Done

Screenshot of Azure portal
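To see how these parameters interact, the sketch below evaluates the rule offline: every minute (the evaluation frequency) it averages the last five one-minute latency samples (the aggregation period) and compares the result with the threshold of 500. It assumes one latency sample per minute and uses made-up data; it approximates the rule's logic and is not how Azure Monitor is implemented.

```python
from statistics import mean


def static_alert_states(latencies_ms, threshold=500.0, window=5):
    """Evaluate a static 'average greater than threshold' rule once per minute.

    `latencies_ms` holds one sample per minute. Evaluation starts once a
    full aggregation window of `window` samples is available.
    """
    states = []
    for end in range(window, len(latencies_ms) + 1):
        window_average = mean(latencies_ms[end - window:end])
        states.append(window_average > threshold)
    return states


# Made-up per-minute latency samples with a three-minute spike.
samples = [100, 120, 110, 900, 1200, 1500, 130, 110]
print(static_alert_states(samples))  # [False, True, True, True]
```

Note that the alert condition stays true for several evaluations after the spike ends, because the high samples remain inside the five-minute window.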

Example 2: Configuring a dynamic alert threshold for vault saturation

When you use a dynamic alert, you can see historical data for the key vault you have selected. The blue area represents the average usage of your key vault. The red area shows spikes that would have triggered an alert, provided the other criteria in the alert configuration were met. The red dots show violations, where the alert criteria were met during the aggregation window. You can set an alert to fire after a certain number of violations within a set time. If you don't want to include past data, the advanced settings have an option to exclude old data.

Screenshot of Azure portal

Use the following configuration parameters.

  • Set the Threshold to Dynamic
  • Set the Operator to Greater Than
  • Set the Aggregation Type to Average
  • Set the Threshold Sensitivity to Medium
  • Set the Aggregation Period to 5 minutes
  • Set the Evaluation Frequency to 1 minute
  • (Optional) Configure Advanced Settings
  • Select Done

Screenshot of Azure portal
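The idea of firing only after a certain number of violations can be sketched with a simple stand-in. The article does not describe how the service computes dynamic thresholds, so this example substitutes a plain statistical band (historical mean plus a multiple of the standard deviation); the `sensitivity` and `min_violations` parameters and the data are assumptions made for illustration.

```python
from statistics import mean, stdev


def dynamic_alert(history, recent, sensitivity=2.0, min_violations=3):
    """Fire when enough recent points exceed a band learned from history.

    The upper bound is mean(history) + sensitivity * stdev(history).
    This is a simple stand-in, not the service's dynamic threshold model.
    """
    upper_bound = mean(history) + sensitivity * stdev(history)
    violations = sum(1 for value in recent if value > upper_bound)
    return violations >= min_violations


# Made-up saturation percentages: a stable history, then two recent windows.
history = [20, 22, 19, 21, 18, 20, 23, 21]
sustained_spike = [22, 40, 45, 50, 21]
single_blip = [22, 40, 21, 20, 19]

print(dynamic_alert(history, sustained_spike))  # True
print(dynamic_alert(history, single_blip))      # False
```

Requiring several violations inside the window keeps a single short blip from paging the team, while a sustained rise still fires.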

  5. Add the action group that you have configured.

Screenshot of Azure portal

  6. Enable the alert and assign a severity.

Screenshot of Azure portal

  7. Create the alert.

Next steps

Congratulations, you have now created a monitoring dashboard and configured alerts for your key vault. Once you have followed all of the steps above, you should receive email alerts when your key vault meets the alert criteria you configured. An example is shown below. Use the tools you have set up in this article to actively monitor the health of your key vault.

Example email alert

Screenshot of Azure portal