Collect machine learning pipeline log files in Application Insights for alerts and debugging

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

The OpenCensus Python library can be used to route logs to Application Insights from your scripts. Aggregating logs from pipeline runs in one place allows you to build queries and diagnose issues. Using Application Insights lets you track logs over time and compare pipeline logs across runs.

Having your logs in one place provides a history of exceptions and error messages. Because Application Insights integrates with Azure Alerts, you can also create alerts based on Application Insights queries.

Prerequisites

Getting Started

This section is an introduction specific to using OpenCensus from an Azure Machine Learning pipeline. For a detailed tutorial, see OpenCensus Azure Monitor Exporters.

Add a PythonScriptStep to your Azure ML pipeline. Configure your RunConfiguration with a dependency on opencensus-ext-azure, and set the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable.

from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Connecting to the workspace and compute target not shown

# Add pip dependency on OpenCensus
dependencies = CondaDependencies()
dependencies.add_pip_package("opencensus-ext-azure>=1.0.1")
run_config = RunConfiguration(conda_dependencies=dependencies)

# Add environment variable with Application Insights Connection String
# Replace the value with your own connection string
run_config.environment.environment_variables = {
    "APPLICATIONINSIGHTS_CONNECTION_STRING": 'InstrumentationKey=00000000-0000-0000-0000-000000000000'
}

# Configure step with runconfig
sample_step = PythonScriptStep(
        script_name="sample_step.py",
        compute_target=compute_target,
        runconfig=run_config
)

# Submit new pipeline run
pipeline = Pipeline(workspace=ws, steps=[sample_step])
pipeline.submit(experiment_name="Logging_Experiment")

Create a file called sample_step.py. Import the AzureLogHandler class to route logs to Application Insights. You'll also need to import the Python logging library.

from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging

Next, add the AzureLogHandler to the Python logger.

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Assumes the environment variable APPLICATIONINSIGHTS_CONNECTION_STRING is already set
logger.addHandler(AzureLogHandler())
logger.warning("I will be sent to Application Insights")
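
The same script may also run locally or in tests, where the connection string is not set. The following is a minimal sketch, assuming you want the Application Insights handler attached only when APPLICATIONINSIGHTS_CONNECTION_STRING is present. The helper name build_logger is hypothetical and not part of any library.

```python
import logging
import os


def build_logger(name):
    """Return a logger that always logs to the console and, when configured,
    also to Application Insights. (build_logger is a hypothetical helper.)"""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    # Avoid attaching duplicate handlers if this function is called twice.
    if logger.handlers:
        return logger

    logger.addHandler(logging.StreamHandler())

    # Attach the Azure handler only when the connection string is available.
    if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
        # Imported lazily so the script still runs where the package is absent.
        from opencensus.ext.azure.log_exporter import AzureLogHandler
        logger.addHandler(AzureLogHandler())

    return logger
```

Returning early when handlers already exist keeps repeated calls from sending each message more than once.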

Logging with Custom Dimensions

By default, logs forwarded to Application Insights won't have enough context to trace back to the run or experiment. To make the logs actionable for diagnosing issues, additional fields are needed.

To add these fields, attach Custom Dimensions that provide context for a log message. One example is viewing logs across multiple steps in the same pipeline run.

Custom Dimensions make up a dictionary of key-value pairs (stored as string, string). The dictionary is sent to Application Insights and displayed as a column in the query results. Its individual dimensions can be used as query parameters.
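
Because the pairs are stored as strings, it can help to normalize the dictionary before logging. This is a minimal sketch. The helper name to_custom_dimensions is hypothetical, and dropping None values is an assumption made here for tidiness, not a documented requirement.

```python
def to_custom_dimensions(fields):
    """Convert arbitrary key/value pairs to a string-to-string dictionary."""
    return {
        str(key): str(value)
        for key, value in fields.items()
        if value is not None  # assumption: skip fields that have no value
    }


dimensions = to_custom_dimensions(
    {"step_name": "train", "attempt": 2, "build_url": None}
)
print(dimensions)
```

In this example the integer attempt is stored as the string "2", and build_url is omitted because it has no value.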

Helpful Context to include

Field | Reasoning/Example
parent_run_id | Query logs that share the same parent_run_id to see logs over time for all steps, instead of diving into each individual step.
step_id | Query logs that share the same step_id to narrow the scope to an individual step and see where an issue occurred.
step_name | Query logs to see step performance over time. Also helps to find a step_id for recent runs without diving into the portal UI.
experiment_name | Query across logs to see experiment performance over time. Also helps to find a parent_run_id or step_id for recent runs without diving into the portal UI.
run_url | Provides a link directly back to the run for investigation.

Other helpful fields

These fields may require additional code instrumentation, and aren't provided by the run context.

Field | Reasoning/Example
build_url/build_version | If using CI/CD to deploy, this field can correlate logs to the code version that provided the step and pipeline logic. This link can further help to diagnose issues, or identify models with specific traits (log/metric values).
run_type | Can differentiate between different model types, or training vs. scoring runs.

Creating a Custom Dimensions dictionary

from azureml.core import Run

run = Run.get_context(allow_offline=False)

custom_dimensions = {
    "parent_run_id": run.parent.id,
    "step_id": run.id,
    "step_name": run.name,
    "experiment_name": run.experiment.name,
    "run_url": run.parent.get_portal_url(),
    "run_type": "training"
}

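# Assumes AzureLogHandler was already registered above.
# Recent opencensus-ext-azure releases read custom dimensions from the
# "custom_dimensions" key of the logging `extra` argument.
logger.info("I will be sent to Application Insights with Custom Dimensions", extra={"custom_dimensions": custom_dimensions})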
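
To avoid repeating the dictionary on every call, the standard library's logging.LoggerAdapter can attach it automatically. This is a minimal sketch, assuming the handler reads custom dimensions from a custom_dimensions attribute set through extra, as recent opencensus-ext-azure releases do. The logger name and the placeholder values are illustrative only.

```python
import logging

base_logger = logging.getLogger("pipeline_step")
base_logger.setLevel(logging.INFO)

# Placeholder values; in a pipeline step these would come from the run context.
custom_dimensions = {"parent_run_id": "example-parent", "run_type": "training"}

# LoggerAdapter passes its `extra` dictionary along with every record it emits,
# so each record gets a `custom_dimensions` attribute.
step_logger = logging.LoggerAdapter(
    base_logger, {"custom_dimensions": custom_dimensions}
)

step_logger.info("Step started")
step_logger.warning("Step finished with warnings")
```

Any handler registered on base_logger, including an AzureLogHandler, then receives records that already carry the dimensions.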

OpenCensus Python logging considerations

The OpenCensus AzureLogHandler is used to route Python logs to Application Insights. As a result, Python logging nuances should be considered. When a logger is created, it has a default log level and only shows logs greater than or equal to that level. A good reference for using Python logging features is the Logging Cookbook.
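
The level behavior can be seen with the standard library alone. This is a minimal sketch that collects messages in memory instead of sending them anywhere. ListHandler and the logger name level_demo are illustrative, and it assumes the root logger is still at its default level (WARNING).

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages in memory so filtering is easy to see."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


demo_logger = logging.getLogger("level_demo")
demo_logger.propagate = False  # keep the demo output out of the root handlers
capture = ListHandler()
demo_logger.addHandler(capture)

# No level set: the logger inherits the root logger's level (WARNING by default).
demo_logger.info("dropped unless the root level was lowered")
demo_logger.warning("kept")

# After lowering the level, INFO records reach the handler too.
demo_logger.setLevel(logging.INFO)
demo_logger.info("kept after setLevel")

print(capture.messages)
```

The same filtering applies before a record reaches AzureLogHandler, which is why the earlier example calls setLevel explicitly.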

The OpenCensus library needs the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable. We recommend setting this environment variable instead of passing the value in as a pipeline parameter, to avoid passing around plaintext connection strings.

Querying logs in Application Insights

The logs routed to Application Insights will show up under 'traces' or 'exceptions'. Be sure to adjust your time window to include your pipeline run.

[Image: Application Insights query results]

The result in Application Insights shows the log message and level, file path, and code line number. It also shows any custom dimensions included. In this image, the customDimensions dictionary shows the key/value pairs from the previous code sample.

Additional helpful queries

Some of the queries below use 'customDimensions.Level'. These severity levels correspond to the level the Python log was originally sent with. For additional query information, see Azure Monitor Log Queries.

Use case: Log results for a specific custom dimension, for example 'parent_run_id'
Query:
traces
| where customDimensions.parent_run_id == '931024c2-3720-11ea-b247-c49deda841c1'

Use case: Log results for all training runs over the last 7 days
Query:
traces
| where timestamp > ago(7d)
    and customDimensions.run_type == 'training'

Use case: Log results with severityLevel Error from the last 7 days
Query:
traces
| where timestamp > ago(7d)
    and customDimensions.Level == 'ERROR'

Use case: Count of log results with severityLevel Error over the last 7 days
Query:
traces
| where timestamp > ago(7d)
    and customDimensions.Level == 'ERROR'
| summarize count()
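
If you generate such queries from a script, for example to paste into the portal, a small builder keeps the filter consistent. This is a minimal sketch. build_traces_query is a hypothetical helper, and the character restrictions are a conservative assumption to keep values from breaking out of the quoted literal.

```python
import re

_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_SAFE_VALUE = re.compile(r"^[A-Za-z0-9_.\-]+$")


def build_traces_query(dimension, value, days=7):
    """Build a traces query filtered by one custom dimension and a time window."""
    # Reject anything that could break out of the field name or quoted literal.
    if not _FIELD_NAME.match(dimension):
        raise ValueError("dimension must be a simple identifier")
    if not _SAFE_VALUE.match(value):
        raise ValueError("value may only contain letters, digits, '_', '.', '-'")
    return (
        "traces\n"
        f"| where timestamp > ago({int(days)}d)\n"
        f"| where customDimensions.{dimension} == '{value}'"
    )


print(build_traces_query("run_type", "training"))
```

The helper only produces the query text. Running it against Application Insights is left to whichever tool you already use.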

Next Steps

Once you have logs in your Application Insights instance, you can use them to set Azure Monitor alerts based on query results.

You can also add results from queries to an Azure Dashboard for additional insights.