Monitor and collect data from ML web service endpoints

APPLIES TO: Basic edition, Enterprise edition (Upgrade to Enterprise edition)

In this article, you learn how to collect data from and monitor models deployed to web service endpoints in Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). You do this by querying logs and by enabling Azure Application Insights through the Python SDK or Azure Machine Learning studio.

In addition to collecting an endpoint's output data and responses, you can monitor:

  • Request rates, response times, and failure rates
  • Dependency rates, response times, and failure rates
  • Exceptions

Learn more about Azure Application Insights.

Prerequisites

  • If you don't have an Azure subscription, create a trial account before you begin. Try the Azure trial today.

  • An Azure Machine Learning workspace, a local directory that contains your scripts, and the Azure Machine Learning SDK for Python installed. To learn how to get these prerequisites, see How to configure a development environment.

  • A trained machine learning model to be deployed to Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). If you don't have one, see the Train image classification model tutorial.

Query logs for deployed models

To retrieve logs from a previously deployed web service, load the service and call its get_logs() method. The logs may contain detailed information about any errors that occurred during deployment.

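from azureml.core import Workspace
from azureml.core.webservice import Webservice

# load the workspace from the config.json file in the current directory
ws = Workspace.from_config()

# load the existing web service by name and retrieve its logs
service = Webservice(name="service-name", workspace=ws)
logs = service.get_logs()
print(logs)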
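get_logs() returns the service logs as a single string, which can be long. A small filter can surface likely failures quickly. The sketch below is a hypothetical helper (find_error_lines is not part of the SDK), and the keywords it matches on are an assumption about what error lines look like; adjust them to your own log output.

```python
def find_error_lines(logs, keywords=("error", "exception", "traceback")):
    """Return the log lines that contain any of the given keywords (case-insensitive).

    find_error_lines is a hypothetical helper; the keyword list is an assumption.
    """
    matches = []
    for line in logs.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            matches.append(line)
    return matches


if __name__ == "__main__":
    # sample text standing in for the string returned by service.get_logs()
    sample_logs = "Starting server\nModel loaded\nERROR: could not parse input\nReady"
    for line in find_error_lines(sample_logs):
        print(line)
```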

Web service metadata and response data

Important

Azure Application Insights only logs payloads of up to 64 KB. If this limit is reached, you may see errors such as out of memory, or no information may be logged.

To log information for a request to the web service, add print statements to your score.py file. Each print statement results in one entry in the traces table in Application Insights, under the message STDOUT. The contents of the print statement are stored in the traces table under customDimensions, in the Contents field. If you print a JSON string, it produces a hierarchical data structure in the trace output under Contents.
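Because each print call becomes one trace entry and Application Insights only logs payloads of up to 64 KB, it can help to serialize the record to JSON yourself and check its size before printing. The sketch below is a hypothetical helper (log_trace and MAX_PAYLOAD_BYTES are illustrative names, not SDK APIs), and it assumes the limit is measured as 64 × 1024 bytes of UTF-8 text.

```python
import json

# assumption: the 64 KB limit is measured as 64 * 1024 bytes of UTF-8 text
MAX_PAYLOAD_BYTES = 64 * 1024


def log_trace(record):
    """Print `record` as one JSON line so it appears under customDimensions/Contents.

    Returns True if the full record was printed, or False if only a size marker was printed.
    """
    line = json.dumps(record)
    size = len(line.encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        # too large for Application Insights: log a small marker instead of the payload
        print(json.dumps({"skipped": True, "size_bytes": size}))
        return False
    print(line)
    return True


if __name__ == "__main__":
    log_trace({"input": [[1, 2, 3]], "output": [0.5]})
```

Records that exceed the limit are better written to Blob storage, as described in Collect data for models in production.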

You can query Azure Application Insights directly to access this data, or set up a continuous export to a storage account for longer retention or further processing. You can then use the model data in Azure Machine Learning for labeling, retraining, explainability, data analysis, or other uses.

Use Python SDK to configure

Update a deployed service

  1. Identify the service in your workspace. The value for ws is the Workspace object for your workspace.

    from azureml.core.webservice import Webservice
    # look up the deployed service by name in workspace ws
    aks_service = Webservice(ws, "my-service-name")
    
  2. Update your service and enable Azure Application Insights:

    aks_service.update(enable_app_insights=True)
    

Log custom traces in your service

If you want to log custom traces, follow the standard deployment process for AKS or ACI described in How to deploy and where. Then use the following steps:

  1. To send data to Application Insights during inference, update the scoring file by adding print statements. To log more complex information, such as the request data and the response, use a JSON structure. The following example score.py file logs the time the model is initialized, the input and output during inference, and the time any errors occur:

    Important

    Azure Application Insights only logs payloads of up to 64 KB. If this limit is reached, you may see errors such as out of memory, or no information may be logged. If the data you want to log is larger than 64 KB, store it in Blob storage instead, by using the information in Collect data for models in production.

    import json
    import time

    # sklearn.externals.joblib was removed from recent scikit-learn releases; use joblib directly
    import joblib
    import numpy
    from azureml.core.model import Model

    def init():
        global model
        # Print statement for Application Insights custom traces:
        print("model initialized " + time.strftime("%H:%M:%S"))

        # "sklearn_regression_model.pkl" is the name of the model registered in the workspace;
        # this call returns the path to the model.pkl file on the local disk
        model_path = Model.get_model_path(model_name='sklearn_regression_model.pkl')

        # deserialize the model file back into a scikit-learn model
        model = joblib.load(model_path)


    # note: you can pass in multiple rows for scoring
    def run(raw_data):
        try:
            data = json.loads(raw_data)['data']
            data = numpy.array(data)
            result = model.predict(data)
            # Log the input and output data to Application Insights as one JSON trace:
            info = {
                "input": raw_data,
                "output": result.tolist()
            }
            print(json.dumps(info))
            # you can return any data type as long as it is JSON-serializable
            return result.tolist()
        except Exception as e:
            error = str(e)
            print(error + " " + time.strftime("%H:%M:%S"))
            return error
    
  2. Update the service configuration:

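    from azureml.core.webservice import AksWebservice
    # for ACI, use AciWebservice.deploy_configuration(enable_app_insights=True) instead
    config = AksWebservice.deploy_configuration(enable_app_insights=True)
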
  3. Build an image and deploy it on AKS or ACI.
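Before building the image, you can exercise the init()/run() contract locally to see what the JSON trace line looks like. The sketch below uses only the standard library: StubModel is a hypothetical stand-in for the deserialized scikit-learn model (it predicts the sum of each row), and this run mirrors the run() function from score.py without NumPy, so it is an illustration rather than the deployed code.

```python
import json
import time


class StubModel:
    """Hypothetical stand-in for the scikit-learn model: predicts the sum of each row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


model = StubModel()


def run(raw_data):
    # mirrors run() in score.py: parse, predict, then log input/output as one JSON trace
    try:
        data = json.loads(raw_data)["data"]
        result = model.predict(data)
        print(json.dumps({"input": raw_data, "output": result}))
        return result
    except Exception as e:
        error = str(e)
        print(error + " " + time.strftime("%H:%M:%S"))
        return error


if __name__ == "__main__":
    print(run(json.dumps({"data": [[1, 2, 3], [4, 5, 6]]})))
```

A malformed request (for example, one without a "data" key) takes the except branch, so the error message is both printed as a trace and returned to the caller.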

For more information on logging and data collection, see Enable logging in Azure Machine Learning and Collect data from models in production.

Disable tracking in Python

To disable Azure Application Insights, use the following code:

# replace "service-name" with the name of your web service; ws is your Workspace object
service = Webservice(name="service-name", workspace=ws)
service.update(enable_app_insights=False)

Use Azure Machine Learning studio to configure

You can also enable Azure Application Insights from Azure Machine Learning studio when you're ready to deploy your model, by following these steps:

  1. Sign in to your workspace at https://studio.ml.azure.cn/

  2. Go to Models and select the model you want to deploy.

  3. Select +Deploy.

  4. Populate the Deploy model form.

  5. Expand the Advanced menu.

    Screenshot: Deploy form

  6. Select Enable Application Insights diagnostics and data collection.

    Screenshot: Enable App Insights

View metrics and logs

Your service's data is stored in your Azure Application Insights account, within the same resource group as your Azure Machine Learning workspace. To view it:

  1. Go to your Azure Machine Learning workspace in the Azure portal.

  2. Select Endpoints.

  3. Select your deployed service.

  4. Scroll down to find the Application Insights URL and select the link.


  5. In Application Insights, from the Overview tab or the Monitoring section in the list on the left, select Logs.

    Screenshot: Overview tab of monitoring

  6. To view information logged from the score.py file, look at the traces table. The following query searches for logs where the input value was logged:

    traces
    | where customDimensions contains "input"
    | limit 10
    

    Screenshot: trace data

To learn more about how to use Azure Application Insights, see What is Application Insights?

Export data for further processing and longer retention

Important

Azure Application Insights only supports exports to Blob storage. Additional limits of this export capability are listed in Export telemetry from App Insights.

You can use Azure Application Insights' continuous export to send messages to a supported storage account, where a longer retention period can be set. The data is stored in JSON format and can be parsed to extract model data.
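As a sketch of that parsing step, the code below reads newline-delimited JSON and pulls out the input/output records written by score.py. The record layout it expects (a customDimensions object whose Contents field holds the printed JSON string, mirroring the traces table) is an assumption for illustration only; inspect a few of your exported blobs and adjust extract_model_records, a hypothetical helper, to the actual export schema.

```python
import json


def extract_model_records(ndjson_text):
    """Yield (input, output) pairs from newline-delimited JSON trace records.

    Assumed layout per line: {"customDimensions": {"Contents": "<JSON printed by score.py>"}}.
    Lines that are empty, not JSON, or lack input/output are skipped.
    """
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
            contents = json.loads(record["customDimensions"]["Contents"])
        except (json.JSONDecodeError, KeyError, TypeError):
            continue
        if isinstance(contents, dict) and "input" in contents and "output" in contents:
            yield contents["input"], contents["output"]


if __name__ == "__main__":
    printed = json.dumps({"input": '{"data": [[1, 2]]}', "output": [3.0]})
    line = json.dumps({"customDimensions": {"Contents": printed}})
    for model_input, model_output in extract_model_records(line + "\nnot json\n"):
        print(model_input, model_output)
```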

You can use Azure Data Factory, Azure ML Pipelines, or other data processing tools to transform the data as needed. After you transform the data, you can register it with the Azure Machine Learning workspace as a dataset. To do so, see How to create and register datasets.

Screenshot: Continuous Export

Example notebook

The enable-app-insights-in-production-service.ipynb notebook demonstrates the concepts in this article.

To learn how to run notebooks, see Use Jupyter notebooks to explore this service.

Next steps