Monitor and debug an Azure Batch .NET application with Application Insights

Application Insights gives developers a straightforward way to monitor and debug applications deployed to Azure services. Use it to monitor performance counters and exceptions, and to instrument your code with custom metrics and tracing. Integrating Application Insights with your Azure Batch application lets you observe application behavior and investigate issues in near real time.

This article shows how to add and configure the Application Insights library in your Azure Batch .NET solution and how to instrument your application code. It also shows ways to monitor your application via the Azure portal and build custom dashboards. For Application Insights support in other languages, see the languages, platforms, and integrations documentation.

A sample C# solution with code to accompany this article is available on GitHub. This example adds Application Insights instrumentation code to the TopNWords example. If you're not familiar with that example, try building and running TopNWords first. Doing so will help you understand a basic Batch workflow that processes a set of input blobs in parallel on multiple compute nodes.

Tip

As an alternative, configure your Batch solution to display Application Insights data, such as VM performance counters, in Batch Explorer. Batch Explorer is a free, feature-rich, standalone client tool that helps you create, debug, and monitor Azure Batch applications. Download an installation package for Mac, Linux, or Windows. See the batch-insights repo for quick steps to enable Application Insights data in Batch Explorer.

Prerequisites

Add Application Insights to your project

Your project requires the Microsoft.ApplicationInsights.WindowsServer NuGet package and its dependencies. Add or restore them to your application's project. To install the package, use the Install-Package command or NuGet Package Manager.

Install-Package Microsoft.ApplicationInsights.WindowsServer

Reference Application Insights from your .NET application by using the Microsoft.ApplicationInsights namespace.

Instrument your code

To instrument your code, your solution needs to create an Application Insights TelemetryClient. In the example, the TelemetryClient loads its configuration from the ApplicationInsights.config file. Be sure to update ApplicationInsights.config in the following projects with your Application Insights instrumentation key: Microsoft.Azure.Batch.Samples.TelemetryStartTask and TopNWordsSample.

<InstrumentationKey>YOUR-IKEY-GOES-HERE</InstrumentationKey>

Also add the instrumentation key in the file TopNWords.cs.
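As a rough illustration of what creating the client and assigning the key in code might look like, here is a minimal sketch. It assumes the Microsoft.ApplicationInsights package is referenced; the class name TelemetrySketch and the helper CreateClient are hypothetical and are not taken from the sample. Newer SDK versions may flag the parameterless constructor and the InstrumentationKey property as obsolete.

```csharp
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

public static class TelemetrySketch
{
    // Hypothetical helper: creates a client and assigns the instrumentation key in code.
    public static TelemetryClient CreateClient(string instrumentationKey)
    {
        // The parameterless constructor uses the active configuration, which is
        // where settings from ApplicationInsights.config end up when that file is deployed.
        var client = new TelemetryClient();
        client.InstrumentationKey = instrumentationKey;
        return client;
    }

    public static void Main()
    {
        TelemetryClient client = CreateClient("YOUR-IKEY-GOES-HERE");

        client.TrackTrace("Task started", SeverityLevel.Information);
        client.TrackEvent("Task finished");

        // Batch tasks can be short-lived; flush so buffered telemetry is handed
        // to the channel before the process exits.
        client.Flush();
    }
}
```

Calling Flush() before exit is a design choice for short-running task processes. Depending on the telemetry channel in use, a brief delay after the flush may also be needed for items to leave the machine.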

The example in TopNWords.cs uses the following instrumentation calls from the Application Insights API:

  • TrackMetric() - Tracks how long, on average, a compute node takes to download the required text file.
  • TrackTrace() - Adds debugging calls to your code.
  • TrackEvent() - Tracks interesting events to capture.

This example purposely leaves out exception handling. Instead, Application Insights automatically reports unhandled exceptions, which significantly improves the debugging experience.

The following snippet illustrates how to use these methods.

public void CountWords(string blobName, int numTopN, string storageAccountName, string storageAccountKey)
{
    // simulate exception for some set of tasks
    Random rand = new Random();
    if (rand.Next(0, 10) % 10 == 0)
    {
        blobName += ".badUrl";
    }

    // log the url we are downloading the file from
    insightsClient.TrackTrace(new TraceTelemetry(string.Format("Task {0}: Download file from: {1}", this.taskId, blobName), SeverityLevel.Verbose));

    // open the cloud blob that contains the book
    var storageCred = new StorageCredentials(storageAccountName, storageAccountKey);
    CloudBlockBlob blob = new CloudBlockBlob(new Uri(blobName), storageCred);
    using (Stream memoryStream = new MemoryStream())
    {
        // calculate blob download time
        DateTime start = DateTime.Now;
        blob.DownloadToStream(memoryStream);
        TimeSpan downloadTime = DateTime.Now.Subtract(start);

        // track how long the blob takes to download on this node
        // this will help debug timing issues or identify poorly performing nodes
        insightsClient.TrackMetric("Blob download in seconds", downloadTime.TotalSeconds, this.CommonProperties);

        memoryStream.Position = 0; //Reset the stream
        var sr = new StreamReader(memoryStream);
        var myStr = sr.ReadToEnd();
        string[] words = myStr.Split(' ');

        // log how many words were found in the text file
        insightsClient.TrackTrace(new TraceTelemetry(string.Format("Task {0}: Found {1} words", this.taskId, words.Length), SeverityLevel.Verbose));
        var topNWords =
            words.
                Where(word => word.Length > 0).
                GroupBy(word => word, (key, group) => new KeyValuePair<String, long>(key, group.LongCount())).
                OrderByDescending(x => x.Value).
                Take(numTopN).
                ToList();
        foreach (var pair in topNWords)
        {
            Console.WriteLine("{0} {1}", pair.Key, pair.Value);
        }

        // emit an event to track the completion of the task
        insightsClient.TrackEvent("Done counting words");
    }
}
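The snippet above passes this.CommonProperties to TrackMetric() without showing how that collection is defined. One plausible shape, sketched below, is a dictionary of custom properties populated from the Batch environment variables AZ_BATCH_POOL_ID and AZ_BATCH_NODE_ID. The class name BatchTelemetryProperties, the method Build, and the property keys PoolId and TaskId are assumptions made for illustration; only a NodeId property is implied by the article, which groups a chart by it.

```csharp
using System;
using System.Collections.Generic;

public static class BatchTelemetryProperties
{
    // Hypothetical helper: builds the custom properties attached to each telemetry item.
    public static IDictionary<string, string> Build(string taskId)
    {
        return new Dictionary<string, string>
        {
            // Both variables are set by the Batch service on compute nodes;
            // fall back to an empty string when running elsewhere.
            { "PoolId", Environment.GetEnvironmentVariable("AZ_BATCH_POOL_ID") ?? string.Empty },
            { "NodeId", Environment.GetEnvironmentVariable("AZ_BATCH_NODE_ID") ?? string.Empty },
            { "TaskId", taskId ?? string.Empty },
        };
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> pair in Build("task_no_1"))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Attaching the same properties to every metric lets you filter or group by node in the portal without changing the metric name.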

Azure Batch telemetry initializer helper

When reporting telemetry for a given server and instance, Application Insights uses the Azure VM role and VM name as the default values. In the context of Azure Batch, the example shows how to use the pool name and compute node name instead. Use a telemetry initializer to override the default values.

using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;
using System;
using System.Threading;

namespace Microsoft.Azure.Batch.Samples.TelemetryInitializer
{
    public class AzureBatchNodeTelemetryInitializer : ITelemetryInitializer
    {
        // Azure Batch environment variables
        private const string PoolIdEnvironmentVariable = "AZ_BATCH_POOL_ID";
        private const string NodeIdEnvironmentVariable = "AZ_BATCH_NODE_ID";

        private string roleInstanceName;
        private string roleName;

        public void Initialize(ITelemetry telemetry)
        {
            if (string.IsNullOrEmpty(telemetry.Context.Cloud.RoleName))
            {
                // override the role name with the Azure Batch Pool name
                string name = LazyInitializer.EnsureInitialized(ref this.roleName, this.GetPoolName);
                telemetry.Context.Cloud.RoleName = name;
            }

            if (string.IsNullOrEmpty(telemetry.Context.Cloud.RoleInstance))
            {
                // override the role instance with the Azure Batch Compute Node name
                string name = LazyInitializer.EnsureInitialized(ref this.roleInstanceName, this.GetNodeName);
                telemetry.Context.Cloud.RoleInstance = name;
            }
        }

        private string GetPoolName()
        {
            return Environment.GetEnvironmentVariable(PoolIdEnvironmentVariable) ?? string.Empty;
        }

        private string GetNodeName()
        {
            return Environment.GetEnvironmentVariable(NodeIdEnvironmentVariable) ?? string.Empty;
        }
    }
}

To enable the telemetry initializer, the ApplicationInsights.config file in the TopNWordsSample project includes the following:

<TelemetryInitializers>
    <Add Type="Microsoft.Azure.Batch.Samples.TelemetryInitializer.AzureBatchNodeTelemetryInitializer, Microsoft.Azure.Batch.Samples.TelemetryInitializer"/>
</TelemetryInitializers>

Update the job and tasks to include Application Insights binaries

For Application Insights to run correctly on your compute nodes, make sure the binaries are correctly placed. Add the required binaries to your task's resource files collection so that they are downloaded when your task executes. The following snippets are similar to code in Job.cs.

First, create a static list of Application Insights files to upload.

private static readonly List<string> AIFilesToUpload = new List<string>()
{
    // Application Insights config and assemblies
    "ApplicationInsights.config",
    "Microsoft.ApplicationInsights.dll",
    "Microsoft.AI.Agent.Intercept.dll",
    "Microsoft.AI.DependencyCollector.dll",
    "Microsoft.AI.PerfCounterCollector.dll",
    "Microsoft.AI.ServerTelemetryChannel.dll",
    "Microsoft.AI.WindowsServer.dll",
    
    // custom telemetry initializer assemblies
    "Microsoft.Azure.Batch.Samples.TelemetryInitializer.dll",
};
...

Next, create the staging files that are used by the task.

...
// create file staging objects that represent the executable and its dependent assembly to run as the task.
// These files are copied to every node before the corresponding task is scheduled to run on that node.
FileToStage topNWordExe = new FileToStage(TopNWordsExeName, stagingStorageAccount);
FileToStage storageDll = new FileToStage(StorageClientDllName, stagingStorageAccount);

// Upload Application Insights assemblies
List<FileToStage> aiStagedFiles = new List<FileToStage>();
foreach (string aiFile in AIFilesToUpload)
{
    aiStagedFiles.Add(new FileToStage(aiFile, stagingStorageAccount));
}
...

FileToStage is a helper class in the code sample that uploads a file from local disk to an Azure Storage blob. Each file is later downloaded to a compute node and referenced by a task.

Finally, add the tasks to the job and include the necessary Application Insights binaries.

...
// initialize a collection to hold the tasks that will be submitted in their entirety
List<CloudTask> tasksToRun = new List<CloudTask>(topNWordsConfiguration.NumberOfTasks);
for (int i = 1; i <= topNWordsConfiguration.NumberOfTasks; i++)
{
    CloudTask task = new CloudTask("task_no_" + i, String.Format("{0} --Task {1} {2} {3} {4}",
        TopNWordsExeName,
        string.Format("https://{0}.blob.chinacloudapi.cn/{1}",
            accountSettings.StorageAccountName,
            documents[i]),
        topNWordsConfiguration.TopWordCount,
        accountSettings.StorageAccountName,
        accountSettings.StorageAccountKey));

    //This is the list of files to stage to a container -- for each job, one container is created and 
    //files all resolve to Azure Blobs by their name (so two tasks with the same named file will create just 1 blob in
    //the container).
    task.FilesToStage = new List<IFileStagingProvider>
                        {
                            // required application binaries
                            topNWordExe,
                            storageDll,
                        };
    foreach (FileToStage stagedFile in aiStagedFiles)
    {
        task.FilesToStage.Add(stagedFile);
    }
    task.RunElevated = false;
    tasksToRun.Add(task);
}

View data in the Azure portal

Now that you've configured the job and tasks to use Application Insights, run the example job in your pool. Navigate to the Azure portal and open the Application Insights resource that you provisioned. After the pool is provisioned, you should start to see data flowing and being logged. The rest of this article touches on only a few Application Insights features; feel free to explore the full feature set.

View live stream data

To view live telemetry in your Application Insights resource, click Live Stream. The following screenshot shows live data coming from the compute nodes in the pool, for example the CPU usage per compute node.

Screenshot: live stream compute node data

View trace logs

To view trace logs in your Application Insights resource, click Search. This view shows a list of diagnostic data captured by Application Insights, including traces, events, and exceptions.

The following screenshot shows how a single trace for a task is logged and later queried for debugging purposes.

Screenshot: trace logs

View unhandled exceptions

The following screenshot shows how Application Insights logs exceptions thrown from your application. In this case, within seconds of the application throwing the exception, you can drill into a specific exception and diagnose the issue.

Screenshot: unhandled exceptions

Measure blob download time

Custom metrics are also a valuable tool in the portal. For example, you can display the average time each compute node took to download the text file it was processing.

To create a sample chart:

  1. In your Application Insights resource, click Metrics Explorer > Add chart.
  2. Click Edit on the chart that was added.
  3. Update the chart details as follows:
    • Set Chart type to Grid.
    • Set Aggregation to Average.
    • Set Group by to NodeId.
    • In Metrics, select Custom > Blob download in seconds.
    • Adjust the display Color palette to your choice.

Screenshot: blob download time per node

Monitor compute nodes continuously

You may have noticed that all metrics, including performance counters, are logged only while tasks are running. This behavior is useful because it limits the amount of data that Application Insights logs. However, there are cases when you want to monitor the compute nodes at all times. For example, they might be running background work that is not scheduled via the Batch service. In this case, set up a monitoring process that runs for the life of the compute node.

One way to achieve this behavior is to spawn a process that loads the Application Insights library and runs in the background. In the example, the start task loads the binaries on the machine and keeps a process running indefinitely. Configure the Application Insights configuration file for this process to emit the additional data you're interested in, such as performance counters.

...
// Batch start task telemetry runner
private const string BatchStartTaskFolderName = "StartTask";
private const string BatchStartTaskTelemetryRunnerName = "Microsoft.Azure.Batch.Samples.TelemetryStartTask.exe";
private const string BatchStartTaskTelemetryRunnerAIConfig = "ApplicationInsights.config";
...
CloudPool pool = client.PoolOperations.CreatePool(
    topNWordsConfiguration.PoolId,
    targetDedicated: topNWordsConfiguration.PoolNodeCount,
    virtualMachineSize: "standard_d1_v2",
    cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5"));
...

// Create a start task which will run a dummy exe in background that simply emits performance
// counter data as defined in the relevant ApplicationInsights.config.
// Note that the waitForSuccess on the start task was not set so the Compute Node will be
// available immediately after this command is run.
pool.StartTask = new StartTask()
{
    CommandLine = string.Format("cmd /c {0}", BatchStartTaskTelemetryRunnerName),
    ResourceFiles = resourceFiles
};
...

Tip

To increase the manageability of your solution, you can bundle the assembly in an application package. Then, to deploy the application package automatically to your pools, add an application package reference to the pool configuration.

Throttle and sample data

Because Azure Batch applications running in production often operate at large scale, you might want to limit the amount of data collected by Application Insights to manage costs. See Sampling in Application Insights for some mechanisms to achieve this.
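As one possible approach, adaptive sampling can be enabled in code through the telemetry processor chain builder. The sketch below assumes the Microsoft.ApplicationInsights.WindowsServer package (which brings in the server telemetry channel) is referenced. The class name SamplingSketch, the helper EnableAdaptiveSampling, and the target of 5 items per second are illustrative assumptions, not values from the sample. The exact property that exposes the builder can differ between SDK versions.

```csharp
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel;

public static class SamplingSketch
{
    // Hypothetical helper: adds an adaptive sampling processor to the given configuration.
    public static void EnableAdaptiveSampling(TelemetryConfiguration configuration)
    {
        var builder = configuration.TelemetryProcessorChainBuilder;

        // Aim to keep telemetry volume near 5 items per second for this process
        // (an assumed target; tune it for your workload and budget).
        builder.UseAdaptiveSampling(maxTelemetryItemsPerSecond: 5);

        // Rebuild the processor chain so the sampling processor takes effect.
        builder.Build();
    }

    public static void Main()
    {
        // Apply sampling to the configuration used by default-constructed clients.
        EnableAdaptiveSampling(TelemetryConfiguration.Active);
    }
}
```

Adaptive sampling adjusts the sampling rate as volume changes, which suits Batch workloads whose telemetry rate varies with the number of running tasks. A fixed-rate alternative exists if you prefer a constant percentage.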

Next steps