Monitor and view ML run logs and metrics

APPLIES TO: Basic edition and Enterprise edition

In this article, you learn how to monitor Azure Machine Learning runs and view their logs. Before you can view logs, you have to enable them first. For more information, see Enable logging in Azure ML training runs.

Logs can help you diagnose errors and warnings, or track performance metrics like parameters and model accuracy. In this article, you learn how to view logs using the following methods:

  • Monitor runs in the studio
  • Monitor runs using the Jupyter Notebook widget
  • Monitor automated machine learning runs
  • View output logs upon completion
  • View output logs in the studio

For general information on how to manage your experiments, see Start, monitor, and cancel training runs.

Monitor runs in the studio

To monitor runs for a specific compute target from your browser, use the following steps:

  1. In the Azure Machine Learning studio, select your workspace, and then select Compute from the left side of the page.

  2. Select Training Clusters to display a list of compute targets used for training. Then select the cluster.

    [Screenshot: Select the training cluster]

  3. Select Runs. The list of runs that use this cluster is displayed. To view details for a specific run, use the link in the Run column. To view details for the experiment, use the link in the Experiment column.

    [Screenshot: Select runs for the training cluster]

    Tip

    Since training compute targets are a shared resource, they can have multiple runs queued or active at a given time.

    A run can contain child runs, so one training job can result in multiple entries.

Once a run completes, it is no longer displayed on this page. To view information on completed runs, visit the Experiments section of the studio and select the experiment and run. For more information, see the section View metrics for completed runs.

Monitor runs using the Jupyter Notebook widget

When you use the ScriptRunConfig method to submit runs, you can watch the progress of the run using the Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

View the Jupyter widget while waiting for the run to complete.

from azureml.widgets import RunDetails
RunDetails(run).show()

[Screenshot: Jupyter Notebook widget]

You can also get a link to the same display in your workspace.

print(run.get_portal_url())

Monitor automated machine learning runs

For automated machine learning runs, to access the charts from a previous run, replace <<experiment_name>> with the appropriate experiment name:

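from azureml.core import Experiment
from azureml.core.run import Run
from azureml.widgets import RunDetails

# `workspace` is assumed to be an existing azureml.core.Workspace object.
experiment = Experiment(workspace, '<<experiment_name>>')  # replace with the experiment name
run_id = 'autoML_my_runID'  # replace with the run ID
run = Run(experiment, run_id)
RunDetails(run).show()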

[Screenshot: Jupyter Notebook widget for automated machine learning]

Show output upon completion

When you use ScriptRunConfig, you can call run.wait_for_completion(show_output = True) to wait until model training is complete. The show_output flag gives you verbose output while you wait. For more information, see the ScriptRunConfig section of How to enable logging.
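
As a minimal sketch of the call described above, the helper below waits for a run to finish and then reports its final status. The name wait_and_report is hypothetical and not part of the SDK; the sketch assumes that run is an azureml.core.Run returned by an earlier submission, and it relies only on the wait_for_completion and get_status methods.

```python
def wait_and_report(run, show_output=True):
    """Block until `run` finishes, then return its final status string.

    `run` is assumed to be an azureml.core.Run from an earlier submission;
    this helper's name is illustrative only.
    """
    # With show_output=True, the run's logs are streamed while waiting.
    run.wait_for_completion(show_output=show_output)
    # get_status() returns a string such as "Completed".
    status = run.get_status()
    print(f"Run finished with status: {status}")
    return status
```

In a notebook, this would typically be called right after submitting the run, for example wait_and_report(run).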

Query run metrics

You can view the metrics of a trained model using run.get_metrics(). For example, you could use this with the example above to determine the best model by looking for the model with the lowest mean square error (mse) value.
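
One way to pick the best model from the metrics is sketched below. The function best_run_by_metric and the sample values are hypothetical; the sketch assumes each run logged a metric named mse and that you have collected the dictionaries returned by run.get_metrics() into a mapping keyed by run ID.

```python
def best_run_by_metric(metrics_by_run, metric="mse"):
    """Return (run_id, value) for the run with the lowest `metric`.

    `metrics_by_run` maps a run ID to the dict returned by run.get_metrics().
    A metric logged several times comes back as a list; the last value is used.
    Runs that did not log the metric are skipped.
    """
    best = None
    for run_id, metrics in metrics_by_run.items():
        if metric not in metrics:
            continue
        value = metrics[metric]
        if isinstance(value, list):
            value = value[-1]
        if best is None or value < best[1]:
            best = (run_id, value)
    return best


# Hypothetical values standing in for real run.get_metrics() results.
example = {
    "run_1": {"alpha": 0.1, "mse": 3310.5},
    "run_2": {"alpha": 0.5, "mse": 3295.7},
    "run_3": {"alpha": 0.9, "mse": [3400.2, 3301.1]},
}
print(best_run_by_metric(example))  # ('run_2', 3295.7)
```

With the SDK, the mapping could be built from the runs of an experiment, for example by calling get_metrics() on each run returned by experiment.get_runs().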

View run records in the studio

You can browse completed run records, including logged metrics, in the Azure Machine Learning studio.

Navigate to the Experiments tab and select your experiment. On the experiment run dashboard, you can see tracked metrics and logs for each run.

Drill down to a specific run to view its outputs or logs, or download the snapshot of the experiment so you can share the experiment folder with others.

You can also edit the run list table to select multiple runs and display either the last, minimum, or maximum logged value for your runs. Customize your charts to compare the logged metrics values and aggregates across multiple runs.
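
The last, minimum, and maximum aggregates can also be computed outside the studio. The sketch below is only an illustration of that idea, not the studio's implementation; the function names and sample values are hypothetical, and it assumes the per-run dictionaries come from run.get_metrics().

```python
def summarize_metric(values):
    """Return the last, minimum, and maximum of a logged metric.

    A metric logged once is a single number; one logged repeatedly is a list.
    """
    if not isinstance(values, list):
        values = [values]
    return {"last": values[-1], "min": min(values), "max": max(values)}


def compare_runs(metrics_by_run, metric):
    """Summarize `metric` for every run that logged it."""
    return {
        run_id: summarize_metric(metrics[metric])
        for run_id, metrics in metrics_by_run.items()
        if metric in metrics
    }


# Hypothetical values standing in for real run.get_metrics() results.
sample = {
    "run_1": {"loss": [0.9, 0.5, 0.6]},
    "run_2": {"loss": 0.4},
    "run_3": {"accuracy": 0.8},
}
for run_id, summary in compare_runs(sample, "loss").items():
    print(run_id, summary)
```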

[Screenshot: Run details in the Azure Machine Learning studio]

Format charts in the studio

Use the following methods in the logging APIs to influence how the studio visualizes your metrics.

| Logged value | Example code | Format in portal |
|---|---|---|
| Log an array of numeric values | run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]) | Single-variable line chart |
| Log a single numeric value with the same metric name repeatedly used (like from within a for loop) | for i in tqdm(range(-10, 10)): run.log(name='Sigmoid', value=1 / (1 + np.exp(-i))); angle = i / 2.0 | Single-variable line chart |
| Log a row with 2 numerical columns repeatedly | run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle)); sines['angle'].append(angle); sines['sine'].append(np.sin(angle)) | Two-variable line chart |
| Log table with 2 numerical columns | run.log_table(name='Sine Wave', value=sines) | Two-variable line chart |
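
The example cells in the table are fragments of one loop, so it may help to see them assembled. The sketch below is one possible arrangement, not the original sample: the function name log_chart_examples is hypothetical, the standard math module replaces np, the tqdm progress bar is omitted, and run is assumed to be an azureml.core.Run (for example, the object returned by Run.get_context() inside a training script).

```python
import math


def log_chart_examples(run):
    """Log sample metrics covering each row of the table above.

    `run` is assumed to be an azureml.core.Run; this helper is illustrative.
    """
    # An array of numeric values: single-variable line chart.
    run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89])

    sines = {'angle': [], 'sine': []}
    for i in range(-10, 10):
        # Same metric name logged repeatedly: single-variable line chart.
        run.log(name='Sigmoid', value=1 / (1 + math.exp(-i)))

        angle = i / 2.0
        # A row with two numeric columns, logged repeatedly: two-variable line chart.
        run.log_row(name='Cosine Wave', angle=angle, cos=math.cos(angle))

        # Collect columns for the table logged after the loop.
        sines['angle'].append(angle)
        sines['sine'].append(math.sin(angle))

    # A table with two numeric columns: two-variable line chart.
    run.log_table(name='Sine Wave', value=sines)
```

Taking the run as a parameter keeps the logging calls separate from how the run is obtained, which makes the function easy to try with any object that exposes the same four methods.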

Next steps

Try these next steps to learn how to use Azure Machine Learning: