Diagnostics in Durable Functions in Azure

There are several options for diagnosing issues with Durable Functions. Some of these options are the same as for regular functions, and some are unique to Durable Functions.

Application Insights

Application Insights is the recommended way to do diagnostics and monitoring in Azure Functions. The same applies to Durable Functions. For an overview of how to use Application Insights in your function app, see Monitor Azure Functions.

The Azure Functions Durable Extension also emits tracking events that allow you to trace the end-to-end execution of an orchestration. These tracking events can be found and queried using the Application Insights Analytics tool in the Azure portal.

Tracking data

Each lifecycle event of an orchestration instance causes a tracking event to be written to the traces collection in Application Insights. This event contains a customDimensions payload with several fields. All field names are prefixed with prop__.

  • hubName: The name of the task hub in which your orchestrations are running.
  • appName: The name of the function app. This field is useful when you have multiple function apps sharing the same Application Insights instance.
  • slotName: The deployment slot in which the current function app is running. This field is useful when you use deployment slots to version your orchestrations.
  • functionName: The name of the orchestrator or activity function.
  • functionType: The type of the function, such as Orchestrator or Activity.
  • instanceId: The unique ID of the orchestration instance.
  • state: The lifecycle execution state of the instance. Valid values include:
    • Scheduled: The function was scheduled for execution but hasn't started running yet.
    • Started: The function has started running but has not yet awaited or completed.
    • Awaited: The orchestrator has scheduled some work and is waiting for it to complete.
    • Listening: The orchestrator is listening for an external event notification.
    • Completed: The function has completed successfully.
    • Failed: The function failed with an error.
  • reason: Additional data associated with the tracking event. For example, if an instance is waiting for an external event notification, this field indicates the name of the event it is waiting for. If a function has failed, this field contains the error details.
  • isReplay: Boolean value indicating whether the tracking event is for replayed execution.
  • extensionVersion: The version of the Durable Task extension. The version information is especially important when reporting possible bugs in the extension. A long-running instance may report multiple versions if an update occurs while it is running.
  • sequenceNumber: Execution sequence number for an event. Combined with the timestamp, it helps to order the events by execution time. Note that this number is reset to zero if the host restarts while the instance is running, so it's important to always sort by timestamp first, then by sequenceNumber.

The verbosity of tracking data emitted to Application Insights can be configured in the logger (Functions 1.x) or logging (Functions 2.0) section of the host.json file.

Functions 1.0

{
    "logger": {
        "categoryFilter": {
            "categoryLevels": {
                "Host.Triggers.DurableTask": "Information"
            }
        }
    }
}

Functions 2.0

{
    "logging": {
        "logLevel": {
            "Host.Triggers.DurableTask": "Information"
        }
    }
}

By default, all non-replay tracking events are emitted. The volume of data can be reduced by setting Host.Triggers.DurableTask to "Warning" or "Error", in which case tracking events are emitted only for exceptional situations.
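As a minimal sketch of the reduced-volume setting described above, assuming a Functions 2.0 host.json that contains no other logging settings, the category can be set to "Warning" like this:

```json
{
    "version": "2.0",
    "logging": {
        "logLevel": {
            "Host.Triggers.DurableTask": "Warning"
        }
    }
}
```

With this filter in place, the lifecycle events used by the queries later in this article are mostly not emitted, so it is a trade-off between data volume and the ability to trace individual orchestrations.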

To enable emitting the verbose orchestration replay events, set logReplayEvents to true in the host.json file under durableTask, as shown:

Functions 1.0

{
    "durableTask": {
        "logReplayEvents": true
    }
}

Functions 2.0

{
    "extensions": {
        "durableTask": {
            "logReplayEvents": true
        }
    }
}

Note

By default, Application Insights telemetry is sampled by the Azure Functions runtime to avoid emitting data too frequently. Sampling can cause tracking information to be lost when many lifecycle events occur in a short period of time. The Azure Functions Monitoring article explains how to configure this behavior.
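As a rough illustration only, assuming a Functions 2.0 host.json and the samplingSettings section covered in the Azure Functions monitoring documentation, sampling could be turned off as follows. Check that article for the options that apply to your runtime version before relying on this sketch.

```json
{
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {
                "isEnabled": false
            }
        }
    }
}
```

Turning sampling off keeps every tracking event but also increases the amount of telemetry sent to Application Insights.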

Single instance query

The following query shows historical tracking data for a single instance of the Hello Sequence function orchestration. It's written using the Kusto Query Language. It filters out replay execution so that only the logical execution path is shown. Events can be ordered by sorting by timestamp and sequenceNumber, as shown in the query below:

let targetInstanceId = "ddd1aaa685034059b545eb004b15d4eb";
let start = datetime(2018-03-25T09:20:00);
traces
| where timestamp > start and timestamp < start + 30m
| where customDimensions.Category == "Host.Triggers.DurableTask"
| extend functionName = customDimensions["prop__functionName"]
| extend instanceId = customDimensions["prop__instanceId"]
| extend state = customDimensions["prop__state"]
| extend isReplay = tobool(tolower(customDimensions["prop__isReplay"]))
| extend sequenceNumber = tolong(customDimensions["prop__sequenceNumber"])
| where isReplay != true
| where instanceId == targetInstanceId
| sort by timestamp asc, sequenceNumber asc
| project timestamp, functionName, state, instanceId, sequenceNumber, appName = cloud_RoleName

The result is a list of tracking events that shows the execution path of the orchestration, including any activity functions, ordered by execution time in ascending order.

Application Insights single instance ordered query

Instance summary query

The following query displays the status of all orchestration instances that were run in a specified time range.

let start = datetime(2017-09-30T04:30:00);
traces
| where timestamp > start and timestamp < start + 1h
| where customDimensions.Category == "Host.Triggers.DurableTask"
| extend functionName = tostring(customDimensions["prop__functionName"])
| extend instanceId = tostring(customDimensions["prop__instanceId"])
| extend state = tostring(customDimensions["prop__state"])
| extend isReplay = tobool(tolower(customDimensions["prop__isReplay"]))
| extend output = tostring(customDimensions["prop__output"])
| where isReplay != true
| summarize arg_max(timestamp, *) by instanceId
| project timestamp, instanceId, functionName, state, output, appName = cloud_RoleName
| order by timestamp asc

The result is a list of instance IDs and their current runtime status.

Application Insights single instance query

Durable Task Framework Logging

The Durable extension logs are useful for understanding the behavior of your orchestration logic. However, these logs don't always contain enough information to debug framework-level performance and reliability issues. Starting in v2.3.0 of the Durable extension, logs emitted by the underlying Durable Task Framework (DTFx) are also available for collection.

When looking at logs emitted by the DTFx, it's important to understand that the DTFx engine is composed of two components: the core dispatch engine (DurableTask.Core) and one of many supported storage providers (Durable Functions uses DurableTask.AzureStorage by default).

  • DurableTask.Core: Contains information about orchestration execution and low-level scheduling.
  • DurableTask.AzureStorage: Contains information related to interactions with Azure Storage artifacts, including the internal queues, blobs, and storage tables used to store and fetch internal orchestration state.

You can enable these logs by updating the logging/logLevel section of your function app's host.json file. The following example shows how to enable warning and error logs from both DurableTask.Core and DurableTask.AzureStorage:

{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "DurableTask.AzureStorage": "Warning",
      "DurableTask.Core": "Warning"
    }
  }
}

If you have Application Insights enabled, these logs are automatically added to the traces collection. You can search them with Kusto queries the same way that you search other trace logs.

Note

For production applications, we recommend enabling DurableTask.Core and DurableTask.AzureStorage logs using the "Warning" filter. Higher verbosity filters such as "Information" are very useful for debugging performance issues. However, these log events are high-volume and can significantly increase Application Insights data storage costs.

The following Kusto query shows how to query for DTFx logs. The most important part of the query is where customDimensions.Category startswith "DurableTask", since that filters the results to logs in the DurableTask.Core and DurableTask.AzureStorage categories.

traces
| where customDimensions.Category startswith "DurableTask"
| project
    timestamp,
    severityLevel,
    Category = customDimensions.Category,
    EventId = customDimensions.EventId,
    message,
    customDimensions
| order by timestamp asc 

The result is a set of logs written by the Durable Task Framework log providers.

Application Insights DTFx query results

For more information about what log events are available, see the Durable Task Framework structured logging documentation on GitHub.

App Logging

It's important to keep the orchestrator replay behavior in mind when writing logs directly from an orchestrator function. For example, consider the following orchestrator function:

[FunctionName("FunctionChain")]
public static async Task Run(
    [OrchestrationTrigger] IDurableOrchestrationContext context,
    ILogger log)
{
    log.LogInformation("Calling F1.");
    await context.CallActivityAsync("F1");
    log.LogInformation("Calling F2.");
    await context.CallActivityAsync("F2");
    log.LogInformation("Calling F3");
    await context.CallActivityAsync("F3");
    log.LogInformation("Done!");
}