Telemetry correlation in Application Insights

In the world of microservices, every logical operation requires work to be done in various components of the service. You can monitor each of these components separately by using Application Insights. Application Insights supports distributed telemetry correlation, which you use to detect which component is responsible for failures or performance degradation.

This article explains the data model that Application Insights uses to correlate telemetry sent by multiple components. It covers context-propagation techniques and protocols, and the implementation of correlation tactics on different languages and platforms.

Data model for telemetry correlation

Application Insights defines a data model for distributed telemetry correlation. To associate telemetry with a logical operation, every telemetry item has a context field called operation_Id. This identifier is shared by every telemetry item in the distributed trace. So even if you lose telemetry from a single layer, you can still associate telemetry reported by other components.

A distributed logical operation typically consists of a set of smaller operations that are requests processed by one of the components. These operations are defined by request telemetry. Every request telemetry item has its own id that identifies it uniquely and globally. All telemetry items (such as traces and exceptions) that are associated with the request should set operation_parentId to the value of the request id.

Every outgoing operation, such as an HTTP call to another component, is represented by dependency telemetry. Dependency telemetry also defines its own id that's globally unique. Request telemetry that is initiated by this dependency call uses this id as its operation_parentId.

You can build a view of the distributed logical operation by using operation_Id, operation_parentId, and request.id with dependency.id. These fields also define the causality order of telemetry calls.
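As a rough illustration of how these fields define a causality order, the following Python sketch arranges telemetry items into a parent-before-child sequence. The items, their IDs, and the causal_order helper are hypothetical and exist only for this example; they aren't part of any Application Insights SDK.

```python
from collections import defaultdict

# Hypothetical telemetry items; the IDs are made up for illustration.
ITEMS = [
    {"itemType": "request", "name": "GET /checkout", "id": "req-1",
     "operation_ParentId": "op-1", "operation_Id": "op-1"},
    {"itemType": "dependency", "name": "GET /api/price", "id": "dep-1",
     "operation_ParentId": "req-1", "operation_Id": "op-1"},
    {"itemType": "request", "name": "GET /api/price", "id": "req-2",
     "operation_ParentId": "dep-1", "operation_Id": "op-1"},
    {"itemType": "request", "name": "GET /unrelated", "id": "req-9",
     "operation_ParentId": "op-9", "operation_Id": "op-9"},
]


def causal_order(items, operation_id):
    """Return (depth, item) pairs for one operation, parents before children."""
    in_operation = [i for i in items if i["operation_Id"] == operation_id]
    known_ids = {i["id"] for i in in_operation}
    children = defaultdict(list)
    roots = []
    for item in in_operation:
        parent = item["operation_ParentId"]
        if parent in known_ids:
            children[parent].append(item)
        else:
            # The parent is the operation itself, or its telemetry was lost.
            roots.append(item)

    ordered = []

    def visit(item, depth):
        ordered.append((depth, item))
        for child in children[item["id"]]:
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return ordered


for depth, item in causal_order(ITEMS, "op-1"):
    print("  " * depth + f'{item["itemType"]}: {item["name"]} ({item["id"]})')
```

Because every item carries operation_Id, an item whose parent is missing still shows up in the result (as an extra root) instead of being dropped.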

In a microservices environment, traces from components can go to different storage items. Every component can have its own instrumentation key in Application Insights. To get telemetry for the logical operation, Application Insights queries data from every storage item. When the number of storage items is large, you need a hint about where to look next. The Application Insights data model defines two fields to solve this problem: request.source and dependency.target. The first field identifies the component that initiated the dependency request. The second field identifies which component returned the response of the dependency call.

Example

Let's look at an example. An application called Stock Prices shows the current market price of a stock by using an external API called Stock. The Stock Prices application has a page called Stock page that the client web browser opens by using GET /Home/Stock. The application queries the Stock API by using the HTTP call GET /api/stock/value.

You can analyze the resulting telemetry by running a query:

(requests | union dependencies | union pageViews)
| where operation_Id == "STYz"
| project timestamp, itemType, name, id, operation_ParentId, operation_Id

In the results, note that all telemetry items share the root operation_Id. When an Ajax call is made from the page, a new unique ID (qJSXU) is assigned to the dependency telemetry, and the ID of the pageView is used as operation_ParentId. The server request then uses the Ajax ID as operation_ParentId.

itemType    name                  ID            operation_ParentId  operation_Id
pageView    Stock page                          STYz                STYz
dependency  GET /Home/Stock       qJSXU         STYz                STYz
request     GET Home/Stock        KqKwlrSt9PA=  qJSXU               STYz
dependency  GET /api/stock/value  bBrf2L7mm2g=  KqKwlrSt9PA=        STYz

When the call GET /api/stock/value is made to an external service, you need to know the identity of that server so you can set the dependency.target field appropriately. When the external service doesn't support monitoring, target is set to the host name of the service (for example, stock-prices-api.com). But if the service identifies itself by returning a predefined HTTP header, target contains the service identity that allows Application Insights to build a distributed trace by querying telemetry from that service.

Correlation headers using W3C TraceContext

Application Insights is transitioning to W3C Trace-Context, which defines:

  • traceparent: Carries the globally unique operation ID and the unique identifier of the call.
  • tracestate: Carries system-specific tracing context.

The latest version of the Application Insights SDK supports the Trace-Context protocol, but you might need to opt in to it. (Backward compatibility with the previous correlation protocol supported by the Application Insights SDK will be maintained.)

The correlation HTTP protocol, also called Request-Id, is being deprecated. This protocol defines two headers:

  • Request-Id: Carries the globally unique ID of the call.
  • Correlation-Context: Carries the collection of name-value pairs that describe the distributed trace properties.

Application Insights also defines an extension for the correlation HTTP protocol. It uses Request-Context name-value pairs to propagate the collection of properties used by the immediate caller or callee. The Application Insights SDK uses this header to set the dependency.target and request.source fields.

The W3C Trace-Context and Application Insights data models map in the following way:

Application Insights          W3C TraceContext
Id of Request and Dependency  parent-id
Operation_Id                  trace-id
Operation_ParentId            parent-id of this span's parent span. If this is a root span, this field must be empty.

For more information, see the Application Insights telemetry data model.

Enable W3C distributed tracing support for .NET apps

W3C TraceContext-based distributed tracing is enabled by default in all recent .NET Framework/.NET Core SDKs, along with backward compatibility with the legacy Request-Id protocol.

Enable W3C distributed tracing support for Java apps

Java 3.0 agent

The Java 3.0 agent supports W3C out of the box, and no additional configuration is needed.

Java SDK

  • Incoming configuration

    • For Java EE apps, add the following to the <TelemetryModules> tag in ApplicationInsights.xml:

      <Add type="com.microsoft.applicationinsights.web.extensibility.modules.WebRequestTrackingTelemetryModule">
         <Param name="W3CEnabled" value="true" />
         <Param name="enableW3CBackCompat" value="true" />
      </Add>
      
    • For Spring Boot apps, add these properties:

      • azure.application-insights.web.enable-W3C=true
      • azure.application-insights.web.enable-W3C-backcompat-mode=true
  • Outgoing configuration

    Add the following to AI-Agent.xml:

    <Instrumentation>
      <BuiltIn enabled="true">
        <HTTP enabled="true" W3C="true" enableW3CBackCompat="true"/>
      </BuiltIn>
    </Instrumentation>
    

    Note

    Backward compatibility mode is enabled by default, and the enableW3CBackCompat parameter is optional. Use it only when you want to turn backward compatibility off.

    Ideally, you would turn this off when all your services have been updated to newer versions of SDKs that support the W3C protocol. We highly recommend that you move to these newer SDKs as soon as possible.

Important

Make sure the incoming and outgoing configurations are exactly the same.

Enable W3C distributed tracing support for Web apps

This feature is in Microsoft.ApplicationInsights.JavaScript. It's disabled by default. To enable it, use the distributedTracingMode configuration. The AI_AND_W3C mode is provided for backward compatibility with any legacy services instrumented by Application Insights.

If your setup can reference the DistributedTracingModes enum (for example, an npm-based setup), add the following configuration:

    distributedTracingMode: DistributedTracingModes.W3C

If it can't (for example, a snippet-based setup), use the numeric value instead:

    distributedTracingMode: 2 // DistributedTracingModes.W3C

Important

To see all configurations required to enable correlation, see the JavaScript correlation documentation.

Telemetry correlation in OpenCensus Python

OpenCensus Python supports W3C Trace-Context without requiring additional configuration.

For reference, see the OpenCensus data model.

Incoming request correlation

OpenCensus Python correlates W3C Trace-Context headers from incoming requests to the spans that are generated from the requests themselves. OpenCensus does this automatically with integrations for these popular web application frameworks: Flask, Django, and Pyramid. You just need to populate the W3C Trace-Context headers with the correct format and send them with the request. Here's a sample Flask application that demonstrates this:

from flask import Flask
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.ext.flask.flask_middleware import FlaskMiddleware
from opencensus.trace.samplers import ProbabilitySampler

app = Flask(__name__)
middleware = FlaskMiddleware(
    app,
    exporter=AzureExporter(),
    sampler=ProbabilitySampler(rate=1.0),
)

@app.route('/')
def hello():
    return 'Hello World!'

if __name__ == '__main__':
    app.run(host='localhost', port=8080, threaded=True)

This code runs a sample Flask application on your local machine, listening on port 8080. To correlate trace context, send a request to the endpoint. In this example, you can use a curl command:

curl --header "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" localhost:8080
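If the caller is itself a Python program, the same header can be produced in code. The following is a minimal sketch that uses only the standard library; new_traceparent and build_request are hypothetical helper names, and the URL assumes the Flask sample above is listening on localhost:8080.

```python
import secrets
import urllib.request


def new_traceparent(sampled=True):
    """Build a version-00 traceparent value with random IDs."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def build_request(url, traceparent):
    """Create a GET request that carries the traceparent header."""
    return urllib.request.Request(url, headers={"traceparent": traceparent})


traceparent = new_traceparent()
request = build_request("http://localhost:8080/", traceparent)
# urllib stores header names capitalized, hence "Traceparent" here.
print(request.get_header("Traceparent"))

# To actually send the request while the Flask sample is running:
# urllib.request.urlopen(request)
```

The request is only constructed here, not sent, so the sketch runs without a server. HTTP header names are case-insensitive, so the receiving side still sees a traceparent header.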

By looking at the Trace-Context header format, you can derive the following information:

version: 00

trace-id: 4bf92f3577b34da6a3ce929d0e0e4736

parent-id/span-id: 00f067aa0ba902b7

trace-flags: 01
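The same breakdown can be done programmatically. The sketch below is one possible parser for the version 00 layout shown above; TraceParent and parse_traceparent are hypothetical names, and the parser deliberately rejects anything that doesn't match the four-field format rather than trying to handle future versions.

```python
import re
from typing import NamedTuple

# Four lowercase-hex fields separated by hyphens: 2, 32, 16, and 2 characters.
TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<trace_flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: str

    @property
    def sampled(self) -> bool:
        # The least significant bit of trace-flags is the sampled flag.
        return bool(int(self.trace_flags, 16) & 0x01)


def parse_traceparent(value: str) -> TraceParent:
    """Split a traceparent header value into its four fields."""
    match = TRACEPARENT.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid traceparent value: {value!r}")
    parsed = TraceParent(**match.groupdict())
    # All-zero IDs are treated as invalid.
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        raise ValueError("trace-id and parent-id must not be all zeros")
    return parsed


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(parse_traceparent(header))
```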

If you look at the request entry that was sent to Azure Monitor, you can see fields populated with the trace header information. You can find this data under Logs (Analytics) in the Azure Monitor Application Insights resource.

[Screenshot: Request telemetry in Logs (Analytics)]

The id field is in the format <trace-id>.<span-id>, where the trace-id is taken from the trace header that was passed in the request and the span-id is a generated 8-byte array for this span.

The operation_ParentId field is in the format <trace-id>.<parent-id>, where both the trace-id and the parent-id are taken from the trace header that was passed in the request.
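To make the two formats concrete, the following sketch derives both fields from an incoming traceparent value. It only mirrors the formats described above and is not the exporter's actual code; request_fields is a hypothetical helper, and setting operation_Id to the trace-id follows the mapping given earlier in this article.

```python
import secrets


def request_fields(traceparent: str) -> dict:
    """Derive the request's correlation fields from a traceparent value."""
    _version, trace_id, parent_id, _flags = traceparent.split("-")
    span_id = secrets.token_hex(8)  # newly generated 8-byte ID for this span
    return {
        "operation_Id": trace_id,
        "operation_ParentId": f"{trace_id}.{parent_id}",
        "id": f"{trace_id}.{span_id}",
    }


fields = request_fields("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
for name, value in fields.items():
    print(f"{name}: {value}")
```

The id value differs on every run because the span-id is random, while operation_Id and operation_ParentId are fully determined by the header.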

Log correlation

OpenCensus Python enables you to correlate logs by adding a trace ID, a span ID, and a sampling flag to log records. You add these attributes by installing the OpenCensus logging integration. The following attributes are added to Python LogRecord objects: traceId, spanId, and traceSampled. Note that this takes effect only for loggers that are created after the integration.

Here's a sample application that demonstrates this:

import logging

from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

config_integration.trace_integrations(['logging'])
logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
tracer = Tracer(sampler=AlwaysOnSampler())

logger = logging.getLogger(__name__)
logger.warning('Before the span')
with tracer.span(name='hello'):
    logger.warning('In the span')
logger.warning('After the span')

When this code runs, the following prints in the console:

2019-10-17 11:25:59,382 traceId=c54cb1d4bbbec5864bf0917c64aeacdc spanId=0000000000000000 Before the span
2019-10-17 11:25:59,384 traceId=c54cb1d4bbbec5864bf0917c64aeacdc spanId=70da28f5a4831014 In the span
2019-10-17 11:25:59,385 traceId=c54cb1d4bbbec5864bf0917c64aeacdc spanId=0000000000000000 After the span

Notice that there's a spanId present for the log message that's within the span. This is the same spanId that belongs to the span named hello.
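To see how attributes such as traceId and spanId can end up on a LogRecord at all, here is a standard-library-only sketch. It is not how the OpenCensus integration is implemented; it uses a logging.Filter on a handler, and the context variables, the TraceContextFilter class, and the hard-coded span ID are all hypothetical stand-ins for a real tracer.

```python
import contextvars
import io
import logging

# Hypothetical stand-ins for the tracer's current context.
current_trace_id = contextvars.ContextVar("trace_id", default="0" * 32)
current_span_id = contextvars.ContextVar("span_id", default="0" * 16)


class TraceContextFilter(logging.Filter):
    """Copy the current trace context onto every record that passes through."""

    def filter(self, record):
        record.traceId = current_trace_id.get()
        record.spanId = current_span_id.get()
        return True  # never drop the record


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("traceId=%(traceId)s spanId=%(spanId)s %(message)s")
)
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("correlation_sketch")
logger.addHandler(handler)
logger.propagate = False  # keep the root logger out of this example

logger.warning("Before the span")
token = current_span_id.set("70da28f5a4831014")  # entering a span
logger.warning("In the span")
current_span_id.reset(token)  # leaving the span
logger.warning("After the span")

print(stream.getvalue(), end="")
```

As in the sample output above, only the message logged inside the span carries a non-zero spanId.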

You can export the log data by using AzureLogHandler. For more information, see this article.

You can also pass trace information from one component to another for proper correlation. For example, consider a scenario with two components, module1 and module2. module1 calls functions in module2. To get logs from both module1 and module2 in a single trace, you can use the following approach:

# module1.py
import logging

from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer
from module2 import function_1

config_integration.trace_integrations(['logging'])
logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
tracer = Tracer(sampler=AlwaysOnSampler())

logger = logging.getLogger(__name__)
logger.warning('Before the span')
with tracer.span(name='hello'):
    logger.warning('In the span')
    function_1(tracer)
logger.warning('After the span')


# module2.py

import logging

from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

config_integration.trace_integrations(['logging'])
logging.basicConfig(format='%(asctime)s traceId=%(traceId)s spanId=%(spanId)s %(message)s')
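# Create the logger after the logging integration is enabled.
logger = logging.getLogger(__name__)

def function_1(parent_tracer=None):
    if parent_tracer is not None:
        # Reuse the caller's span context so both modules log under one trace.
        tracer = Tracer(
            span_context=parent_tracer.span_context,
            sampler=AlwaysOnSampler(),
        )
    else:
        tracer = Tracer(sampler=AlwaysOnSampler())

    with tracer.span("function_1"):
        # WARNING level so the message passes the default logging threshold.
        logger.warning("In function_1")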

Telemetry correlation in .NET

The .NET runtime supports distributed tracing with the help of Activity and DiagnosticSource.

The Application Insights .NET SDK uses DiagnosticSource and Activity to collect and correlate telemetry.

Telemetry correlation in Java

The Java agent and Java SDK version 2.0.0 or later support automatic correlation of telemetry. They automatically populate operation_id for all telemetry (like traces, exceptions, and custom events) issued within the scope of a request. They also propagate the correlation headers (described earlier) for service-to-service calls via HTTP, if the Java SDK agent is configured.

Note

The Application Insights Java agent auto-collects requests and dependencies for JMS, Kafka, Netty/Webflux, and more. For the Java SDK, only calls made via Apache HttpClient are supported for the correlation feature. Automatic context propagation across messaging technologies (like Kafka, RabbitMQ, and Azure Service Bus) isn't supported in the SDK.

Note

To collect custom telemetry, you need to instrument the application with the Java 2.6 SDK.

Role names

You might want to customize the way component names are displayed in the Application Map. To do so, you can manually set cloud_RoleName by taking one of the following actions:

  • For Application Insights Java agent 3.0, set the cloud role name as follows:

    {
      "role": {
        "name": "my cloud role name"
      }
    }
    

    You can also set the cloud role name by using the environment variable APPLICATIONINSIGHTS_ROLE_NAME.

  • With Application Insights Java SDK 2.5.0 and later, you can specify cloud_RoleName by adding <RoleName> to your ApplicationInsights.xml file:

    <?xml version="1.0" encoding="utf-8"?>
    <ApplicationInsights xmlns="http://schemas.microsoft.com/ApplicationInsights/2013/Settings" schemaVersion="2014-05-30">
       <InstrumentationKey>** Your instrumentation key **</InstrumentationKey>
       <RoleName>** Your role name **</RoleName>
       ...
    </ApplicationInsights>
    
  • If you use Spring Boot with the Application Insights Spring Boot Starter, you just need to set your custom name for the application in the application.properties file:

    spring.application.name=<name-of-app>

    The Spring Boot Starter automatically assigns cloudRoleName to the value you enter for the spring.application.name property.

Next steps