Use the interpretability package to explain ML models & predictions in Python (preview)

In this how-to guide, you learn to use the interpretability package of the Azure Machine Learning Python SDK to perform the following tasks:

  • Explain the entire model behavior or individual predictions on your personal machine locally.

  • Enable interpretability techniques for engineered features.

  • Explain the behavior for the entire model and individual predictions in Azure.

  • Use a visualization dashboard to interact with your model explanations.

  • Deploy a scoring explainer alongside your model to observe explanations during inferencing.

For more information on the supported interpretability techniques and machine learning models, see Model interpretability in Azure Machine Learning and the sample notebooks.

Generate feature importance values on your personal machine

The following example shows how to use the interpretability package on your personal machine without contacting Azure services.

  1. Install the azureml-interpret package.

    pip install azureml-interpret
    
  2. Train a sample model in a local Jupyter notebook.

    # load breast cancer dataset, a well-known small dataset that comes with scikit-learn
    from sklearn.datasets import load_breast_cancer
    from sklearn import svm
    from sklearn.model_selection import train_test_split

    breast_cancer_data = load_breast_cancer()
    classes = breast_cancer_data.target_names.tolist()

    # split data into train and test sets
    x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data,
                                                        breast_cancer_data.target,
                                                        test_size=0.2,
                                                        random_state=0)
    clf = svm.SVC(gamma=0.001, C=100., probability=True)
    model = clf.fit(x_train, y_train)
    
  3. Call the explainer locally.

    • To initialize an explainer object, pass your model and some training data to the explainer's constructor.
    • To make your explanations and visualizations more informative, you can choose to pass in feature names and output class names if doing classification.

    The following code blocks show how to instantiate an explainer object with TabularExplainer, MimicExplainer, and PFIExplainer locally.

    • TabularExplainer calls one of the three SHAP explainers underneath (TreeExplainer, DeepExplainer, or KernelExplainer).
    • TabularExplainer automatically selects the most appropriate one for your use case, but you can call each of its three underlying explainers directly.

    from interpret.ext.blackbox import TabularExplainer
    
    # "features" and "classes" fields are optional
    explainer = TabularExplainer(model, 
                                 x_train, 
                                 features=breast_cancer_data.feature_names, 
                                 classes=classes)
    

    or

    
    from interpret.ext.blackbox import MimicExplainer
    
    # you can use one of the following four interpretable models as a global surrogate to the black box model
    
    from interpret.ext.glassbox import LGBMExplainableModel
    from interpret.ext.glassbox import LinearExplainableModel
    from interpret.ext.glassbox import SGDExplainableModel
    from interpret.ext.glassbox import DecisionTreeExplainableModel
    
    # "features" and "classes" fields are optional
    # augment_data is optional and if true, oversamples the initialization examples to improve
    # surrogate model accuracy to fit the original model. Useful for high-dimensional data where
    # the number of rows is less than the number of columns.
    # max_num_of_augmentations is optional and defines max number of times we can increase the input data size.
    # LGBMExplainableModel can be replaced with LinearExplainableModel, SGDExplainableModel, or DecisionTreeExplainableModel
    explainer = MimicExplainer(model, 
                               x_train, 
                               LGBMExplainableModel, 
                               augment_data=True, 
                               max_num_of_augmentations=10, 
                               features=breast_cancer_data.feature_names, 
                               classes=classes)
    

    or

    from interpret.ext.blackbox import PFIExplainer
    
    # "features" and "classes" fields are optional
    explainer = PFIExplainer(model,
                             features=breast_cancer_data.feature_names, 
                             classes=classes)
    

Explain the entire model behavior (global explanation)

Refer to the following example to help you get the aggregate (global) feature importance values.


# you can use the training data or the test data here, but test data would allow you to use Explanation Exploration
global_explanation = explainer.explain_global(x_test)

# if you used the PFIExplainer in the previous step, use the next line of code instead
# global_explanation = explainer.explain_global(x_train, true_labels=y_train)

# sorted feature importance values and feature names
sorted_global_importance_values = global_explanation.get_ranked_global_values()
sorted_global_importance_names = global_explanation.get_ranked_global_names()
dict(zip(sorted_global_importance_names, sorted_global_importance_values))

# alternatively, you can print out a dictionary that holds the top K feature names and values
global_explanation.get_feature_importance_dict()

Explain an individual prediction (local explanation)

Get the individual feature importance values of different datapoints by calling explanations for an individual instance or a group of instances.

Note

PFIExplainer does not support local explanations.

# get explanations for the first five data points in the test set
local_explanation = explainer.explain_local(x_test[0:5])

# sorted feature importance values and feature names
sorted_local_importance_names = local_explanation.get_ranked_local_names()
sorted_local_importance_values = local_explanation.get_ranked_local_values()
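
For a classification model, the ranked local names and values are typically nested per class and then per instance. A minimal sketch of inspecting one entry (class index 0, first test row), assuming the explainer and explanation above:

# hedged sketch: for classification explanations, the ranked lists are commonly
# indexed as [class][instance]; look at class 0 for the first test row
first_row_names = sorted_local_importance_names[0][0]
first_row_values = sorted_local_importance_values[0][0]
print(dict(zip(first_row_names[:5], first_row_values[:5])))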

Raw feature transformations

You can opt to get explanations in terms of raw, untransformed features rather than engineered features. For this option, you pass your feature transformation pipeline to the explainer in train_explain.py. Otherwise, the explainer provides explanations in terms of engineered features.

The format of supported transformations is the same as described in sklearn-pandas. In general, any transformations are supported as long as they operate on a single column so that it's clear they're one-to-many.

Get an explanation for raw features by using a sklearn.compose.ColumnTransformer or with a list of fitted transformer tuples. The following example uses sklearn.compose.ColumnTransformer.

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# numeric_features and categorical_features are assumed to be lists holding the
# names of the numerical and categorical columns in your dataset

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# append classifier to preprocessing pipeline.
# now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])


# clf.steps[-1][1] returns the trained classification model
# pass transformation as an input to create the explanation object
# "features" and "classes" fields are optional
tabular_explainer = TabularExplainer(clf.steps[-1][1],
                                     initialization_examples=x_train,
                                     features=dataset_feature_names,
                                     classes=dataset_classes,
                                     transformations=preprocessor)

If you want to run the example with the list of fitted transformer tuples, use the following code:

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn_pandas import DataFrameMapper

# assume that we have created two arrays, numerical and categorical, which hold the numerical and categorical feature names

numeric_transformations = [([f], Pipeline(steps=[('imputer', SimpleImputer(
    strategy='median')), ('scaler', StandardScaler())])) for f in numerical]

categorical_transformations = [([f], OneHotEncoder(
    handle_unknown='ignore', sparse=False)) for f in categorical]

transformations = numeric_transformations + categorical_transformations

# append model to preprocessing pipeline.
# now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

# clf.steps[-1][1] returns the trained classification model
# pass transformation as an input to create the explanation object
# "features" and "classes" fields are optional
tabular_explainer = TabularExplainer(clf.steps[-1][1],
                                     initialization_examples=x_train,
                                     features=dataset_feature_names,
                                     classes=dataset_classes,
                                     transformations=transformations)

Generate feature importance values via remote runs

The following example shows how you can use the ExplanationClient class to enable model interpretability for remote runs. It is conceptually similar to the local process, except you:

  • Use the ExplanationClient in the remote run to upload the interpretability context.
  • Download the context later in a local environment.

  1. Install the azureml-interpret package.

    pip install azureml-interpret
    
  2. Create a training script in a local Jupyter notebook. For example, train_explain.py.

    from azureml.interpret import ExplanationClient
    from azureml.core.run import Run
    from interpret.ext.blackbox import TabularExplainer
    
    run = Run.get_context()
    client = ExplanationClient.from_run(run)
    
    # write code to get and split your data into train and test sets here
    # write code to train your model here 
    
    # explain predictions on your local machine
    # "features" and "classes" fields are optional
    explainer = TabularExplainer(model, 
                                 x_train, 
                                 features=feature_names, 
                                 classes=classes)
    
    # explain overall model predictions (global explanation)
    global_explanation = explainer.explain_global(x_test)
    
    # uploading global model explanation data for storage or visualization in webUX
    # the explanation can then be downloaded on any compute
    # multiple explanations can be uploaded
    client.upload_model_explanation(global_explanation, comment='global explanation: all features')
    # or you can only upload the explanation object with the top k feature info
    #client.upload_model_explanation(global_explanation, top_k=2, comment='global explanation: Only top 2 features')
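
    The placeholder comments above can be filled in with any data loading and training code. A minimal sketch, reusing the breast cancer dataset and SVC model from the local example earlier in this article:

    from sklearn.datasets import load_breast_cancer
    from sklearn import svm
    from sklearn.model_selection import train_test_split

    breast_cancer_data = load_breast_cancer()
    classes = breast_cancer_data.target_names.tolist()
    feature_names = breast_cancer_data.feature_names

    x_train, x_test, y_train, y_test = train_test_split(breast_cancer_data.data,
                                                        breast_cancer_data.target,
                                                        test_size=0.2,
                                                        random_state=0)
    model = svm.SVC(gamma=0.001, C=100., probability=True).fit(x_train, y_train)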
    
  3. Set up an Azure Machine Learning Compute as your compute target and submit your training run. See Create and manage Azure Machine Learning compute clusters for instructions. You might also find the example notebooks helpful.
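
    A minimal submission sketch, assuming an existing compute cluster named 'cpu-cluster' and an experiment named 'explain-remote-run' (both names are illustrative assumptions):

    from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
    from azureml.core.conda_dependencies import CondaDependencies

    # connect to the workspace (assumes a local config.json)
    ws = Workspace.from_config()

    # assumes a compute cluster named 'cpu-cluster' already exists in the workspace
    compute_target = ws.compute_targets['cpu-cluster']

    # environment that installs the interpretability packages on the remote compute
    env = Environment(name='explain-env')
    env.python.conda_dependencies = CondaDependencies.create(
        conda_packages=['scikit-learn', 'pandas'],
        pip_packages=['azureml-defaults', 'azureml-interpret'])

    src = ScriptRunConfig(source_directory='.',
                          script='train_explain.py',
                          compute_target=compute_target,
                          environment=env)

    run = Experiment(ws, 'explain-remote-run').submit(src)
    run.wait_for_completion(show_output=True)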

  4. Download the explanation in your local Jupyter notebook.

    from azureml.interpret import ExplanationClient
    
    client = ExplanationClient.from_run(run)
    
    # get model explanation data
    explanation = client.download_model_explanation()
    # or only get the top k (e.g., 4) most important features with their importance values
    explanation = client.download_model_explanation(top_k=4)
    
    global_importance_values = explanation.get_ranked_global_values()
    global_importance_names = explanation.get_ranked_global_names()
    print('global importance values: {}'.format(global_importance_values))
    print('global importance names: {}'.format(global_importance_names))
    

Visualizations

After you download the explanations in your local Jupyter notebook, you can use the visualization dashboard to understand and interpret your model. To load the visualization dashboard widget in your Jupyter notebook, use the following code:

from interpret_community.widget import ExplanationDashboard

ExplanationDashboard(global_explanation, model, datasetX=x_test)

The visualization supports explanations on both engineered and raw features. Raw explanations are based on the features from the original dataset, and engineered explanations are based on the features from the dataset with feature engineering applied.

When attempting to interpret a model with respect to the original dataset, it's recommended to use raw explanations, as each feature importance corresponds to a column from the original dataset. One scenario where engineered explanations might be useful is when examining the impact of individual categories from a categorical feature. If one-hot encoding is applied to a categorical feature, the resulting engineered explanations include a different importance value per category, one per one-hot engineered feature. This can be useful when narrowing down which part of the dataset is most informative to the model.

Note

Engineered and raw explanations are computed sequentially. First, an engineered explanation is created based on the model and featurization pipeline. Then the raw explanation is created based on that engineered explanation by aggregating the importance of engineered features that came from the same raw feature.
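
A conceptual sketch with hypothetical values (not the library API) of how a raw importance can be viewed as an aggregate of the engineered importances derived from the same raw feature; a simple sum is used purely for illustration:

# hypothetical engineered importances for a one-hot encoded categorical feature 'color'
engineered_importance = {'color_red': 0.10, 'color_blue': 0.05, 'color_green': 0.02}

# the raw importance aggregates the per-category engineered importances
raw_importance = {'color': sum(engineered_importance.values())}
print(raw_importance)  # approximately {'color': 0.17}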

Create, edit, and view dataset cohorts

The top ribbon shows the overall statistics on your model and data. You can slice and dice your data into dataset cohorts, or subgroups, to investigate or compare your model's performance and explanations across these defined subgroups. By comparing your dataset statistics and explanations across those subgroups, you can get a sense of why possible errors are happening in one group versus another.

Creating, editing, and viewing dataset cohorts

Understand entire model behavior (global explanation)

The first three tabs of the explanation dashboard provide an overall analysis of the trained model along with its predictions and explanations.

Model performance

Evaluate the performance of your model by exploring the distribution of your prediction values and the values of your model performance metrics. You can further investigate your model by looking at a comparative analysis of its performance across different cohorts or subgroups of your dataset. Select filters along y-value and x-value to cut across different dimensions. View metrics such as accuracy, precision, recall, false positive rate (FPR), and false negative rate (FNR).

Model performance tab in the explanation visualization

Dataset explorer

Explore your dataset statistics by selecting different filters along the X, Y, and color axes to slice your data along different dimensions. Create dataset cohorts above to analyze dataset statistics with filters such as predicted outcome, dataset features, and error groups. Use the gear icon in the upper right-hand corner of the graph to change graph types.

Dataset explorer tab in the explanation visualization

Aggregate feature importance

Explore the top-k important features that impact your overall model predictions (also known as global explanation). Use the slider to show descending feature importance values. Select up to three cohorts to see their feature importance values side by side. Click on any of the feature bars in the graph to see how values of the selected feature impact model prediction in the dependence plot below.

Aggregate feature importance tab in the explanation visualization

Understand individual predictions (local explanation)

The fourth tab of the explanation dashboard lets you drill into an individual datapoint and its individual feature importances. You can load the individual feature importance plot for any data point by clicking on any of the individual data points in the main scatter plot or by selecting a specific datapoint in the panel wizard on the right.

  • Individual feature importance: Shows the top-k important features for an individual prediction. Helps illustrate the local behavior of the underlying model on a specific data point.
  • What-If analysis: Allows changes to feature values of the selected real data point, and observes the resulting changes to the prediction value by generating a hypothetical datapoint with the new feature values.
  • Individual Conditional Expectation (ICE): Allows feature values to change from a minimum value to a maximum value. Helps illustrate how the data point's prediction changes when a feature changes.

Individual feature importance and What-If tab in the explanation dashboard

Note

These explanations are based on many approximations and are not the "cause" of the predictions. Without the strict mathematical robustness of causal inference, we don't advise users to make real-life decisions based on the feature perturbations of the What-If tool. This tool is primarily for understanding your model and debugging.

Visualization in Azure Machine Learning studio

If you complete the remote interpretability steps (uploading the generated explanations to Azure Machine Learning Run History), you can view the visualization dashboard in Azure Machine Learning studio. This dashboard is a simpler version of the visualization dashboard described above. What-If datapoint generation and ICE plots are disabled, because there is no active compute in Azure Machine Learning studio that can perform their real-time computations.

If the dataset, global, and local explanations are available, data populates all of the tabs. If only a global explanation is available, the Individual feature importance tab will be disabled.

Follow one of these paths to access the visualization dashboard in Azure Machine Learning studio:

  • Experiments pane (Preview)

    1. Select Experiments in the left pane to see a list of experiments that you've run on Azure Machine Learning.
    2. Select a particular experiment to view all the runs in that experiment.
    3. Select a run, and then select the Explanations tab to view the explanation visualization dashboard.

    Visualization dashboard with aggregate feature importance in Azure Machine Learning studio (experiments)

  • Models pane

    1. If you registered your original model by following the steps in Deploy models with Azure Machine Learning, you can select Models in the left pane to view it.
    2. Select a model, and then select the Explanations tab to view the explanation visualization dashboard.

Interpretability at inference time

You can deploy the explainer along with the original model and use it at inference time to provide the individual feature importance values (local explanation) for any new datapoint. We also offer lighter-weight scoring explainers to improve interpretability performance at inference time, currently supported only in the Azure Machine Learning SDK. The process of deploying a lighter-weight scoring explainer is similar to deploying a model and includes the following steps:

  1. Create an explanation object. For example, you can use TabularExplainer:

    from interpret.ext.blackbox import TabularExplainer

    explainer = TabularExplainer(model, 
                                 initialization_examples=x_train, 
                                 features=dataset_feature_names, 
                                 classes=dataset_classes, 
                                 transformations=transformations)
    
  2. Create a scoring explainer with the explanation object.

    from azureml.interpret.scoring.scoring_explainer import KernelScoringExplainer, save
    
    # create a lightweight explainer at scoring time
    scoring_explainer = KernelScoringExplainer(explainer)
    
    # pickle the scoring explainer locally
    OUTPUT_DIR = 'my_directory'
    save(scoring_explainer, directory=OUTPUT_DIR, exist_ok=True)
    
  3. Configure and register an image that uses the scoring explainer model.

    # register explainer model using the path from ScoringExplainer.save - could be done on remote compute
    # scoring_explainer.pkl is the filename on disk, while my_scoring_explainer.pkl will be the filename in cloud storage
    run.upload_file('my_scoring_explainer.pkl', os.path.join(OUTPUT_DIR, 'scoring_explainer.pkl'))
    
    scoring_explainer_model = run.register_model(model_name='my_scoring_explainer', 
                                                 model_path='my_scoring_explainer.pkl')
    print(scoring_explainer_model.name, scoring_explainer_model.id, scoring_explainer_model.version, sep = '\t')
    
  4. As an optional step, you can retrieve the scoring explainer from the cloud and test the explanations.

    import os

    from azureml.core.model import Model
    from azureml.interpret.scoring.scoring_explainer import load

    # retrieve the scoring explainer model from the cloud (ws is your workspace)
    scoring_explainer_model = Model(ws, 'my_scoring_explainer')
    scoring_explainer_model_path = scoring_explainer_model.download(target_dir=os.getcwd(), exist_ok=True)
    
    # load scoring explainer from disk
    scoring_explainer = load(scoring_explainer_model_path)
    
    # test scoring explainer locally
    preds = scoring_explainer.explain(x_test)
    print(preds)
    
  5. Deploy the image to a compute target by following these steps:

    1. If needed, register your original prediction model by following the steps in Deploy models with Azure Machine Learning.
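
      A minimal registration sketch, assuming the trained model object from earlier and registering it under the name original_prediction_model that the score.py file below expects (the local file name original_model.pkl is an illustrative assumption):

      import joblib
      from azureml.core.model import Model

      # save the trained model locally (file name is illustrative)
      joblib.dump(model, 'original_model.pkl')

      # register it under the model name used by score.py below
      original_model = Model.register(workspace=ws,
                                      model_name='original_prediction_model',
                                      model_path='original_model.pkl')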

    2. Create a scoring file.

      %%writefile score.py
      import json
      import numpy as np
      import pandas as pd
      import os
      import pickle
      import joblib
      from sklearn.linear_model import LogisticRegression
      from azureml.core.model import Model
      
      def init():
      
         global original_model
         global scoring_explainer
      
         # retrieve the path to the model file using the model name
         # assume original model is named original_prediction_model
         original_model_path = Model.get_model_path('original_prediction_model')
         scoring_explainer_path = Model.get_model_path('my_scoring_explainer')
      
         original_model = joblib.load(original_model_path)
         scoring_explainer = joblib.load(scoring_explainer_path)
      
      def run(raw_data):
         # get predictions and explanations for each data point
         data = pd.read_json(raw_data)
         # make prediction
         predictions = original_model.predict(data)
         # retrieve model explanations
         local_importance_values = scoring_explainer.explain(data)
         # you can return any data type as long as it is JSON-serializable
         return {'predictions': predictions.tolist(), 'local_importance_values': local_importance_values}
      
    3. Define the deployment configuration.

      This configuration depends on the requirements of your model. The following example defines a configuration that uses one CPU core and 1 GB of memory.

      from azureml.core.webservice import AciWebservice
      
      aciconfig = AciWebservice.deploy_configuration(
          cpu_cores=1,
          memory_gb=1,
          tags={"data": "NAME_OF_THE_DATASET", "method": "local_explanation"},
          description='Get local explanations for NAME_OF_THE_PROBLEM')
      
    4. Create a file with environment dependencies.

      from azureml.core.conda_dependencies import CondaDependencies
      
      # WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)
      
      azureml_pip_packages = ['azureml-defaults', 'azureml-core', 'azureml-telemetry', 'azureml-interpret']
      
      
      # specify CondaDependencies obj
      myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas'],
                                       pip_packages=['sklearn-pandas'] + azureml_pip_packages,
                                       pin_sdk_version=False)
      
      
      with open("myenv.yml","w") as f:
         f.write(myenv.serialize_to_string())
      
      with open("myenv.yml","r") as f:
         print(f.read())
      
    5. Create a custom dockerfile with g++ installed.

      %%writefile dockerfile
      RUN apt-get update && apt-get install -y g++
      
    6. Deploy the created image.

      This process takes approximately five minutes.

      from azureml.core.webservice import Webservice
      from azureml.core.image import ContainerImage
      
      # use the custom scoring, docker, and conda files we created above
      image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                      docker_file="dockerfile",
                                                      runtime="python",
                                                      conda_file="myenv.yml")
      
      # use configs and models generated above
      service = Webservice.deploy_from_model(workspace=ws,
                                          name='model-scoring-service',
                                          deployment_config=aciconfig,
                                          models=[scoring_explainer_model, original_model],
                                          image_config=image_config)
      
      service.wait_for_deployment(show_output=True)
      
  6. Test the deployment.

    import requests
    
    # create data to test service with
    examples = x_list[:4]
    input_data = examples.to_json()
    
    headers = {'Content-Type':'application/json'}
    
    # send request to service
    resp = requests.post(service.scoring_uri, input_data, headers=headers)
    
    print("POST to url", service.scoring_uri)
    # can convert back to Python objects from the JSON string if desired
    print("prediction:", resp.text)
    
  7. Clean up.

    To delete a deployed web service, use service.delete().

Troubleshooting

  • Sparse data not supported: The model explanation dashboard breaks or slows down substantially with a large number of features, so we currently don't support sparse data formats. Additionally, general memory issues arise with large datasets and large numbers of features.

  • Forecasting models not supported with model explanations: Interpretability (best model explanation) isn't available for AutoML forecasting experiments that recommend the following algorithms as the best model: TCNForecaster, AutoArima, Prophet, ExponentialSmoothing, Average, Naive, Seasonal Average, and Seasonal Naive. AutoML forecasting has regression models that support explanations. However, in the explanation dashboard, the Individual feature importance tab isn't supported for forecasting because of the complexity of their data pipelines.

  • Local explanation for data index: The explanation dashboard doesn't support relating local importance values to a row identifier from the original validation dataset if that dataset is greater than 5000 datapoints, because the dashboard randomly downsamples the data. However, the dashboard shows raw dataset feature values for each datapoint passed into the dashboard under the Individual feature importance tab. Users can map local importances back to the original dataset by matching the raw dataset feature values. If the validation dataset size is less than 5000 samples, the index feature in AzureML studio corresponds to the index in the validation dataset.

  • What-If/ICE plots not supported in studio: What-If and Individual Conditional Expectation (ICE) plots aren't supported in Azure Machine Learning studio under the Explanations tab, because the uploaded explanation needs an active compute to recalculate predictions and probabilities of perturbed features. They're currently supported in Jupyter notebooks when run as a widget by using the SDK.

Next steps

Learn more about model interpretability

Check out the Azure Machine Learning interpretability sample notebooks