Custom metric collection in .NET and .NET Core

The Azure Monitor Application Insights .NET and .NET Core SDKs have two different methods of collecting custom metrics: TrackMetric() and GetMetric(). The key difference between these two methods is local aggregation: TrackMetric() lacks pre-aggregation, while GetMetric() has it. The recommended approach is to use aggregation, so TrackMetric() is no longer the preferred method of collecting custom metrics. This article walks you through using the GetMetric() method and some of the rationale behind how it works.

TrackMetric versus GetMetric

TrackMetric() sends raw telemetry denoting a metric. Sending a single telemetry item for each value is inefficient. TrackMetric() is also inefficient in terms of performance, since every TrackMetric(item) call goes through the full SDK pipeline of telemetry initializers and processors. Unlike TrackMetric(), GetMetric() handles local pre-aggregation for you and submits only an aggregated summary metric at a fixed interval of one minute. So if you need to closely monitor a custom metric at the second or even millisecond level, you can do so while incurring only the storage and network traffic cost of monitoring once a minute. This also greatly reduces the risk of throttling, since far fewer telemetry items need to be sent for an aggregated metric.

In Application Insights, custom metrics collected via TrackMetric() and GetMetric() are not subject to sampling. Sampling important metrics can make any alerting you have built around those metrics unreliable. By never sampling your custom metrics, you can generally be confident that when your alert thresholds are breached, an alert will fire. But since custom metrics aren't sampled, there are some potential concerns.

If you need to track trends in a metric every second, or at an even more granular interval, this can result in:

  • Increased data storage costs. There is a cost associated with how much data you send to Azure Monitor. (The more data you send, the greater the overall cost of monitoring.)
  • Increased network traffic/performance overhead. (In some scenarios this could have both a monetary and an application performance cost.)
  • Risk of ingestion throttling. (The Azure Monitor service drops ("throttles") data points when your app sends a very high rate of telemetry in a short time interval.)

Throttling is of particular concern because, like sampling, it can lead to missed alerts: the condition that should trigger an alert can occur locally and then be dropped at the ingestion endpoint because too much data is being sent. This is why, for .NET and .NET Core, we don't recommend using TrackMetric() unless you have implemented your own local aggregation logic. If you are trying to track every instance in which an event occurs over a given time period, you may find that TrackEvent() is a better fit. Keep in mind, though, that unlike custom metrics, custom events are subject to sampling. You can of course still use TrackMetric() without writing your own local pre-aggregation, but if you do so, be aware of the pitfalls.

In summary, GetMetric() is the recommended approach because it does pre-aggregation: it accumulates values from all the Track() calls and sends a summary/aggregate once every minute. By sending fewer data points while still collecting all relevant information, it can significantly reduce cost and performance overhead.
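To make the saving concrete, here is a back-of-the-envelope sketch in C#. It doesn't call the SDK. The `TelemetryVolume` class and its method names are hypothetical. It assumes, as described above, one telemetry item per tracked value for TrackMetric() and one aggregate per metric series per one-minute interval for GetMetric().

```csharp
using System;

// Illustrative arithmetic only; not part of the Application Insights SDK.
public static class TelemetryVolume
{
    // TrackMetric(): every tracked value becomes its own telemetry item.
    public static long RawItemsPerMinute(double valuesPerSecond)
        => (long)Math.Round(valuesPerSecond * 60);

    // GetMetric(): one aggregate per metric series per one-minute interval,
    // no matter how many values were tracked in that minute (assumption
    // taken from the description above).
    public static long AggregatedItemsPerMinute(int seriesCount)
        => seriesCount;

    public static void Main()
    {
        // One value tracked per second for a single metric.
        Console.WriteLine(RawItemsPerMinute(1));        // 60
        // One value tracked per millisecond for a single metric.
        Console.WriteLine(RawItemsPerMinute(1000));     // 60000
        // The same metric with local pre-aggregation.
        Console.WriteLine(AggregatedItemsPerMinute(1)); // 1
    }
}
```

Under these assumptions, the aggregated count depends only on the number of series, not on how often values are tracked.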

Note

Only the .NET and .NET Core SDKs have a GetMetric() method. If you are using Java, you can use Micrometer metrics or TrackMetric(). For Python, you can use OpenCensus.stats to send custom metrics. For JavaScript and Node.js, you would still use TrackMetric(), but keep in mind the caveats outlined in the previous section.

Getting started with GetMetric

For our examples, we are going to use a basic .NET Core 3.1 worker service application. If you would like to exactly replicate the test environment used with these examples, follow steps 1-6 of the monitoring worker service article to add Application Insights to a basic worker service project template. These concepts apply to any general application where the SDK can be used, including web apps and console apps.

Sending metrics

Replace the contents of your worker.cs file with the following:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.ApplicationInsights;

namespace WorkerService3
{
    public class Worker : BackgroundService
    {
        private readonly ILogger<Worker> _logger;
        private TelemetryClient _telemetryClient;

        public Worker(ILogger<Worker> logger, TelemetryClient tc)
        {
            _logger = logger;
            _telemetryClient = tc;
        }

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            // The following loop demonstrates usage of the GetMetric API.
            // Here "computersSold", a custom metric name, is tracked with a value of 42 every second.
            while (!stoppingToken.IsCancellationRequested)
            {
                _telemetryClient.GetMetric("computersSold").TrackValue(42);

                _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
                await Task.Delay(1000, stoppingToken);
            }
        }
    }
}

If you run the code above and watch the telemetry being sent via the Visual Studio output window or a tool like Telerik's Fiddler, you will see the while loop execute repeatedly with no telemetry being sent, and then a single telemetry item sent at around the 60-second mark. In our test it looks as follows:

Application Insights Telemetry: {"name":"Microsoft.ApplicationInsights.Dev.00000000-0000-0000-0000-000000000000.Metric", "time":"2019-12-28T00:54:19.0000000Z",
"ikey":"00000000-0000-0000-0000-000000000000",
"tags":{"ai.application.ver":"1.0.0.0",
"ai.cloud.roleInstance":"Test-Computer-Name",
"ai.internal.sdkVersion":"m-agg2c:2.12.0-21496",
"ai.internal.nodeName":"Test-Computer-Name"},
"data":{"baseType":"MetricData",
"baseData":{"ver":2,"metrics":[{"name":"computersSold",
"kind":"Aggregation",
"value":1722,
"count":41,
"min":42,
"max":42,
"stdDev":0}],
"properties":{"_MS.AggregationIntervalMs":"42000",
"DeveloperMode":"true"}}}}

This single telemetry item represents an aggregate of 41 distinct metric measurements. Since we sent the same value over and over again, the standard deviation (stdDev) is 0 and the maximum (max) and minimum (min) values are identical. The value property represents the sum of all the individual values that were aggregated.
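The numbers in that item can be checked by hand: 41 measurements of 42 sum to 1722. The sketch below reproduces the same summary fields in plain C#. `AggregateCheck` is a hypothetical helper, not the SDK's aggregator, and the use of the population standard deviation is an assumption. With identical values, either definition of standard deviation gives 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; it does not use the Application Insights SDK.
public static class AggregateCheck
{
    public static (double Sum, int Count, double Min, double Max, double StdDev) Summarize(
        IReadOnlyCollection<double> values)
    {
        double sum = values.Sum();
        int count = values.Count;
        double mean = sum / count;
        // Population variance of the tracked values (assumed definition).
        double variance = values.Sum(v => (v - mean) * (v - mean)) / count;
        return (sum, count, values.Min(), values.Max(), Math.Sqrt(variance));
    }

    public static void Main()
    {
        // 41 measurements, each with the value 42, as in the telemetry item above.
        List<double> values = Enumerable.Repeat(42.0, 41).ToList();
        var a = Summarize(values);
        Console.WriteLine($"value={a.Sum} count={a.Count} min={a.Min} max={a.Max} stdDev={a.StdDev}");
        // value=1722 count=41 min=42 max=42 stdDev=0
    }
}
```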

If we examine our Application Insights resource in the Logs (Analytics) experience, this individual telemetry item looks as follows:

(Screenshot: Log Analytics query view)

Note

While the raw telemetry item did not contain an explicit sum property/field, we create one for you once it is ingested. In this case, both the value and valueSum properties represent the same thing.

You can also access your custom metric telemetry in the Metrics section of the portal, both as a log-based metric and as a custom metric. (The screenshot below is an example of a log-based metric.)

(Screenshot: Metrics explorer view)

Caching metric reference for high-throughput usage

In some cases, metric values are observed very frequently. For example, a high-throughput service that processes 500 requests/second may want to emit 20 telemetry metrics for each request. This means tracking 10,000 values per second. In such high-throughput scenarios, users may need to help the SDK by avoiding some lookups.

In this case, the example above performed a lookup for a handle for the metric "computersSold" and then tracked an observed value of 42. Instead, the handle may be cached for multiple track invocations:

//...

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            // Cache the metric handle once so the loop avoids a lookup on every iteration.
            Metric computersSold = _telemetryClient.GetMetric("ComputersSold");
            while (!stoppingToken.IsCancellationRequested)
            {

                computersSold.TrackValue(42);

                computersSold.TrackValue(142);

                _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
                await Task.Delay(50, stoppingToken);
            }
        }

In addition to caching the metric handle, the example above also reduced the Task.Delay to 50 milliseconds so that the loop would execute more frequently, resulting in 772 TrackValue() invocations.

Multi-dimensional metrics

The examples in the previous section show zero-dimensional metrics. Metrics can also be multi-dimensional. We currently support up to 10 dimensions.

Here is an example of how to create a one-dimensional metric:

//...

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            // This is an example of a metric with a single dimension.
            // FormFactor is the name of the dimension.
            Metric computersSold = _telemetryClient.GetMetric("ComputersSold", "FormFactor");

            while (!stoppingToken.IsCancellationRequested)
            {
                // The number of arguments (dimension values)
                // must match the number of dimensions specified in the GetMetric() call.
                // Laptop, Tablet, and Desktop are values for the dimension "FormFactor".
                computersSold.TrackValue(42, "Laptop");
                computersSold.TrackValue(20, "Tablet");
                computersSold.TrackValue(126, "Desktop");


                _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);
                await Task.Delay(50, stoppingToken);
            }
        }

Running this code for at least 60 seconds results in three distinct telemetry items being sent to Azure, each representing the aggregation of one of the three form factors. As before, you can examine these in the Logs (Analytics) view:

(Screenshot: Log Analytics view of multi-dimensional metric)

As well as in the Metrics explorer experience:

(Screenshot: Custom metrics)

However, you will notice that you aren't able to split the metric by your new custom dimension, or view your custom dimension in the metrics view:

(Screenshot: Splitting support)

By default, multi-dimensional metrics within the Metrics explorer experience are not turned on in Application Insights resources.

Enable multi-dimensional metrics

To enable multi-dimensional metrics for an Application Insights resource, select Usage and estimated costs > Custom Metrics > Enable alerting on custom metric dimensions > OK. More details about this can be found here.

Once you have made that change and sent new multi-dimensional telemetry, you will be able to Apply splitting.

Note

Only metrics sent after the feature was turned on in the portal will have dimensions stored.

(Screenshot: Apply splitting)

And view your metric aggregations for each FormFactor dimension:

(Screenshot: Form factor)

How to use MetricIdentifier when there are more than three dimensions

Currently 10 dimensions are supported; however, using more than three dimensions requires the use of MetricIdentifier:

// Add "using Microsoft.ApplicationInsights.Metrics;" to use MetricIdentifier
// MetricIdentifier id = new MetricIdentifier("[metricNamespace]", "[metricId]", "[dim1]", "[dim2]", "[dim3]", "[dim4]", "[dim5]");
MetricIdentifier id = new MetricIdentifier("CustomMetricNamespace", "ComputersSold", "FormFactor", "GraphicsCard", "MemorySpeed", "BatteryCapacity", "StorageCapacity");
Metric computersSold  = _telemetryClient.GetMetric(id);
computersSold.TrackValue(110,"Laptop", "Nvidia", "DDR4", "39Wh", "1TB");

Custom metric configuration

If you want to alter the metric configuration, you need to do this in the place where the metric is initialized.

Special dimension names

Metrics do not use the telemetry context of the TelemetryClient used to access them. The special dimension names available as constants in the MetricDimensionNames class are the best workaround for this limitation.

Metric aggregates sent by the "Special Operation Request Size" metric below will not have their Context.Operation.Name set to "Special Operation", whereas TrackMetric() or any other TrackXXX() call will have OperationName set correctly to "Special Operation".

        //...
        TelemetryClient specialClient;
        private static int GetCurrentRequestSize()
        {
            // Do stuff
            return 1100;
        }
        int requestSize = GetCurrentRequestSize();

        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while (!stoppingToken.IsCancellationRequested)
            {
                //...
                specialClient.Context.Operation.Name = "Special Operation";
                specialClient.GetMetric("Special Operation Request Size").TrackValue(requestSize);
                //...
            }
                   
        }

In this circumstance, use the special dimension names listed in the MetricDimensionNames class to specify TelemetryContext values.

For example, when the metric aggregate resulting from the next statement is sent to the Application Insights cloud endpoint, its Context.Operation.Name data field will be set to "Special Operation":

_telemetryClient.GetMetric("Request Size", MetricDimensionNames.TelemetryContext.Operation.Name).TrackValue(requestSize, "Special Operation");

The values of this special dimension will be copied into the TelemetryContext and will not be used as a 'normal' dimension. If you also want to keep an operation dimension for normal metric exploration, you need to create a separate dimension for that purpose:

_telemetryClient.GetMetric("Request Size", "Operation Name", MetricDimensionNames.TelemetryContext.Operation.Name).TrackValue(requestSize, "Special Operation", "Special Operation");

Dimension and time-series capping

To prevent the telemetry subsystem from accidentally using up your resources, you can control the maximum number of data series per metric. The default limits are no more than 1000 total data series per metric, and no more than 100 different values per dimension.

In the context of dimension and time-series capping, we use Metric.TrackValue(..) to make sure that the limits are observed. If the limits are already reached, Metric.TrackValue(..) returns false and the value is not tracked. Otherwise it returns true. This is useful if the data for a metric originates from user input.

The MetricConfiguration constructor takes some options on how to manage different series within the respective metric, and an object of a class implementing IMetricSeriesConfiguration that specifies aggregation behavior for each individual series of the metric:

var metConfig = new MetricConfiguration(seriesCountLimit: 100, valuesPerDimensionLimit:2,
                new MetricSeriesConfigurationForMeasurement(restrictToUInt32Values: false));

Metric computersSold = _telemetryClient.GetMetric("ComputersSold", "Dimension1", "Dimension2", metConfig);

// Start tracking.
computersSold.TrackValue(100, "Dim1Value1", "Dim2Value1");
computersSold.TrackValue(100, "Dim1Value1", "Dim2Value2");

// The following call gives 3rd unique value for dimension2, which is above the limit of 2.
computersSold.TrackValue(100, "Dim1Value1", "Dim2Value3");
// The above call does not track the metric, and returns false.
  • seriesCountLimit is the maximum number of data time series a metric can contain. Once this limit is reached, calls to TrackValue() that would create a new series return false, and the value is not tracked.
  • valuesPerDimensionLimit limits the number of distinct values per dimension in a similar manner.
  • restrictToUInt32Values determines whether only non-negative integer values should be tracked.

Here is an example of how to send a message so that you know when cap limits are exceeded:

if (!computersSold.TrackValue(100, "Dim1Value1", "Dim2Value3"))
{
    // Add "using Microsoft.ApplicationInsights.DataContracts;" to use SeverityLevel.Error
    _telemetryClient.TrackTrace(
        "Metric value not tracked because one of the dimensions exceeded its cap. Revisit the dimensions to ensure they are within the limits.",
        SeverityLevel.Error);
}

Next steps