Create alerts for Azure Cosmos DB using Azure Monitor

Alerts are used to set up recurring tests that monitor the availability and responsiveness of your Azure Cosmos DB resources. An alert can send you an email notification or execute an Azure Function when one of your metrics reaches its threshold or when a specific event is logged in the activity log.

You can receive an alert based on metrics or on activity log events for your Azure Cosmos account:

  • Metrics - The alert triggers when the value of a specified metric crosses a threshold you assign, for example, when the total request units consumed exceed 1000 RU/s. The alert is triggered both when the condition is first met and again when the condition is no longer met. See the monitoring data reference article for the metrics available in Azure Cosmos DB.

  • Activity log events - The alert triggers when a certain event occurs, for example, when the keys of your Azure Cosmos account are accessed or refreshed.
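The fire-and-resolve behavior of a metric alert can be illustrated with a small simulation. The following is a minimal sketch, not Azure Monitor's implementation: the function name `alert_notifications` and the sample values are hypothetical, and each list entry is assumed to stand for one evaluation of the metric.

```python
from typing import Iterable, List, Tuple


def alert_notifications(values: Iterable[float], threshold: float) -> List[Tuple[int, str]]:
    """Return (evaluation index, state) pairs for every change of alert state.

    Hypothetical helper for illustration only. A notification is recorded
    when the condition is first met ("fired") and again when it is no
    longer met ("resolved").
    """
    notifications = []
    firing = False
    for index, value in enumerate(values):
        breached = value > threshold
        if breached and not firing:
            notifications.append((index, "fired"))
        elif not breached and firing:
            notifications.append((index, "resolved"))
        firing = breached
    return notifications


# Assumed sample data: consumed request units per second at each evaluation.
consumed_ru_per_second = [800, 950, 1200, 1500, 1100, 900, 700]
print(alert_notifications(consumed_ru_per_second, threshold=1000))
# prints [(2, 'fired'), (5, 'resolved')]
```

Evaluations that stay above the threshold (indexes 3 and 4 in the sample data) produce no additional notification in this sketch; only the two state changes do.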

You can set up alerts from the Azure Cosmos DB pane or from the Azure Monitor service in the Azure portal. Both interfaces offer the same options. This article shows you how to set up alerts for Azure Cosmos DB using Azure Monitor.

Create an alert rule

This section shows how to create an alert for HTTP status code 429, which is returned when requests are rate limited. For example, you may want to receive an alert when there are more than 100 rate-limited requests. The following steps configure an alert for this scenario by using the HTTP status code. You can use similar steps to configure other types of alerts; you just need to choose a different condition based on your requirements.

  1. Sign in to the Azure portal.

  2. Select Monitor from the left-hand navigation bar, and then select Alerts.

  3. Select the New alert rule button to open the Create alert rule pane.

  4. Fill out the Scope section:

    • Open the Select resource pane and configure the following:

    • Choose your subscription name.

    • Select Azure Cosmos DB accounts as the resource type.

    • Select the location of your Azure Cosmos account.

    • After you fill in the details, a list of the Azure Cosmos accounts in the selected scope is displayed. Choose the account for which you want to configure alerts, and then select Done.

  5. Fill out the Condition section:

    • Open the Select condition pane, which opens the Configure signal logic page, and configure the following:

    • Select a signal. The signal type can be Metrics or Activity Log. Choose Metrics for this scenario, because you want an alert when there are rate-limiting issues on the Total Request Units metric.

    • Select All for the Monitor service.

    • Choose a signal name. To get an alert for HTTP status codes, choose the Total Request Units signal.

    • In the next tab, you can define the logic for triggering an alert and use the chart to view trends for your Azure Cosmos account. The Total Request Units metric supports dimensions, which let you filter the metric. If you don't select a dimension, this value is ignored.

    • Choose StatusCode as the dimension name. Select Add custom value and set the status code to 429.

    • In the Alert logic, set the Threshold to Static. A static threshold uses a user-defined threshold value to evaluate the rule, whereas dynamic thresholds use built-in machine learning algorithms to continuously learn the metric's behavior pattern and calculate the thresholds automatically.

    • Set the Operator to Greater than, the Aggregation type to Total, and the Threshold value to 100. With this logic, the alert is triggered if your client sees more than 100 requests that have a 429 status code. You can also configure the aggregation type, aggregation granularity, and frequency of evaluation based on your requirements.

    • After filling out the form, select Done. The following screenshot shows the details of the alert logic:

      Configure the logic to receive alerts for rate-limited (429) requests

  6. Fill out the Action group section:

    • On the Create rule pane, select an existing action group or create a new one. An action group defines the action to take when an alert condition occurs. For this example, create a new action group to receive an email notification when the alert is triggered. Open the Add action group pane and fill out the following details:

    • Action group name - The action group name must be unique within a resource group.

    • Short name - The short name of the action group. This value is included in email and SMS notifications to identify which action group was the source of the notification.

    • Choose the subscription and the resource group in which this action group will be created.

      Configure the action type name for receiving alerts

    • Provide a name for your action and select Email/SMS message as the notification type. The following screenshot shows the details of the action type:

      Configure the action type, such as email notifications, to receive alerts

  7. Fill out the Alert rule details section:

    • Define a name for the rule, provide an optional description, set the severity level of the alert, choose whether to enable the rule upon creation, and then select Create rule alert to create the metric rule alert.

After you create the alert, it becomes active within 10 minutes.
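The condition configured in step 5 (StatusCode dimension set to 429, Total aggregation, Greater than operator, static threshold of 100) can be sketched in a few lines. This is an illustrative model only, not how Azure Monitor evaluates rules internally: the `MetricSample` type, the `should_fire` function, and the sample values are hypothetical, and the samples are assumed to all fall within a single evaluation window.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MetricSample:
    """One hypothetical metric data point inside the evaluation window."""
    status_code: int  # value of the StatusCode dimension
    value: float      # metric value reported for this data point


def should_fire(samples: Iterable[MetricSample],
                status_code: int = 429,
                threshold: float = 100.0) -> bool:
    """Apply the static-threshold logic from step 5.

    Filter on the StatusCode dimension, aggregate with Total (a sum),
    and compare the result with the Greater than operator.
    """
    total = sum(s.value for s in samples if s.status_code == status_code)
    return total > threshold


# Assumed data: only the samples with status code 429 count toward the total.
window = [MetricSample(200, 5000), MetricSample(429, 60), MetricSample(429, 41)]
print(should_fire(window))  # prints True, because 60 + 41 = 101 > 100
```

Because the operator is Greater than, a total of exactly 100 does not trigger the alert in this sketch; the total must exceed the threshold.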

Common alerting scenarios

The following are some scenarios where you can use alerts:

  • When the keys of an Azure Cosmos account are updated.

  • When the data or index usage of a container, database, or region exceeds a certain number of bytes.

  • When a region is added, removed, or goes offline.

  • When a database or a container is created, deleted, or updated.

  • When the throughput of a database or a container is changed.

Next steps