Use a boosted decision tree to predict churn with Azure Machine Learning designer

Designer (preview) sample 5

APPLIES TO: Enterprise edition only; not available in Basic edition (Upgrade to Enterprise)

Learn how to build a complex machine learning pipeline without writing a single line of code using the designer (preview).

This pipeline trains two two-class boosted decision tree classifiers to predict customer churn, a common task for customer relationship management (CRM) systems. The data values and labels are split across multiple data sources and scrambled to anonymize customer information. Even so, you can use the designer to combine the datasets and train a model on the obscured values.

Because you're trying to answer the question "Which one?", this is called a classification problem. However, you can apply the same logic shown in this sample to other types of machine learning problems, such as regression, classification, or clustering.

Here's the completed graph for this pipeline:

[Image: graph of the pipeline]

Prerequisites

  1. Create an Azure Machine Learning workspace if you don't have one.

  2. Sign in to ml.azure.com and select the workspace you want to work with.

  3. Select Designer.

    [Image: launching the designer]

  4. Click sample 5 to open it.

Data

The data for this pipeline is from KDD Cup 2009. It has 50,000 rows and 230 feature columns. The task is to use these features to predict churn, appetency, and up-selling for customers. For more information about the data and the task, see the KDD website.

Pipeline summary

This sample pipeline in the designer shows binary classifier prediction of churn, appetency, and up-selling, a common task for customer relationship management (CRM).

The pipeline starts with some simple data processing.

  • The raw dataset has many missing values. Use the Clean Missing Data module to replace the missing values with 0.

    [Image: cleaning the dataset]

  • The features and the corresponding churn labels are in different datasets. Use the Add Columns module to append the label columns to the feature columns. The first column, Col1, is the label column. The visualization result shows that the dataset is unbalanced: there are far more negative examples (-1) than positive examples (+1). A later step uses the SMOTE module to increase the number of underrepresented cases.

    [Image: dataset after adding the label column]

  • Use the Split Data module to split the dataset into train and test sets.

  • Then use the Boosted Decision Tree binary classifier with the default parameters to build the prediction models. Build one model per task, that is, one model each to predict up-selling, appetency, and churn.

  • In the right part of the pipeline, the SMOTE module increases the percentage of positive examples. The SMOTE percentage is set to 100 to double the number of positive examples. To learn more about how the SMOTE module works, see the SMOTE module reference.
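The designer performs these steps without code, but the logic behind three of them (filling missing values with 0, putting the label in the first column, and oversampling with SMOTE at 100 percent) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the designer's implementation: the function names `clean_missing`, `add_columns`, and `smote`, the toy data, and the choice of `k=2` neighbors are all hypothetical.

```python
import math
import random


def clean_missing(rows, fill=0.0):
    """Replace missing values (None) with a constant, here 0."""
    return [[fill if value is None else value for value in row] for row in rows]


def add_columns(labels, features):
    """Place the label first (like Col1), followed by the feature columns."""
    if len(labels) != len(features):
        raise ValueError("labels and features must have the same number of rows")
    return [[label] + list(row) for label, row in zip(labels, features)]


def smote(minority, percentage=100, k=5, seed=0):
    """Create synthetic minority rows by interpolating toward near neighbors.

    percentage=100 yields one synthetic row per original row, which
    doubles the number of minority examples once they are appended.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    per_row = percentage // 100
    synthetic = []
    for i, row in enumerate(minority):
        others = [r for j, r in enumerate(minority) if j != i]
        # k nearest minority-class neighbors by Euclidean distance
        neighbors = sorted(others, key=lambda r: math.dist(row, r))[:k]
        for _ in range(per_row):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position on the segment between row and neighbor
            synthetic.append([a + gap * (b - a) for a, b in zip(row, neighbor)])
    return synthetic


# Toy data (invented for illustration): 3 positive and 4 negative examples.
raw_features = [[1.0, None], [1.2, 0.9], [None, 1.1],
                [5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.8, 5.1]]
labels = [1, 1, 1, -1, -1, -1, -1]

features = clean_missing(raw_features)
table = add_columns(labels, features)

positives = [row[1:] for row in table if row[0] == 1]
new_rows = smote(positives, percentage=100, k=2)
balanced = table + [[1] + row for row in new_rows]
```

Each synthetic row lies on the line segment between a real positive example and one of its nearest positive neighbors, so the added examples stay inside the region the minority class already occupies rather than being exact duplicates.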

Results

Visualize the output of the Evaluate Model module to see the performance of the model on the test set.

[Image: evaluation results]

You can move the Threshold slider and see how the metrics change for the binary classification task.
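To see why the metrics move with the threshold, the following sketch recomputes the confusion counts for a few scored examples at two thresholds. It is an illustration only: the function name `metrics_at_threshold`, the scores, and the rule that a score greater than or equal to the threshold counts as positive are assumptions, not the designer's documented behavior.

```python
def metrics_at_threshold(scores, labels, threshold=0.5):
    """Return confusion counts and common metrics for one threshold.

    scores: predicted probability of the positive class
    labels: true labels, +1 (positive) or -1 (negative)
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold  # assumed comparison rule
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Invented scores for six test examples, two of which are truly positive.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, -1, 1, -1, -1, -1]

at_half = metrics_at_threshold(scores, labels, threshold=0.5)
lower = metrics_at_threshold(scores, labels, threshold=0.35)
```

With these toy scores, lowering the threshold from 0.5 to 0.35 turns the positive example scored 0.4 from a false negative into a true positive, so recall rises. On an unbalanced dataset like this one, that trade-off between catching more positives and admitting more false positives is what the slider lets you explore.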

Clean up resources

Important

You can use the resources that you created as prerequisites for other Azure Machine Learning tutorials and how-to articles.

Delete everything

If you don't plan to use anything that you created, delete the entire resource group so you don't incur any charges.

  1. In the Azure portal, select Resource groups on the left side of the window.

    [Image: deleting a resource group in the Azure portal]

  2. In the list, select the resource group that you created.

  3. Select Delete resource group.

Deleting the resource group also deletes all resources that you created in the designer.

Delete individual assets

In the designer where you created your experiment, delete individual assets by selecting them and then selecting the Delete button.

The compute target that you created here automatically scales down to zero nodes when it's not being used, which minimizes charges. If you want to delete the compute target, take these steps:

[Image: deleting assets]

You can unregister datasets from your workspace by selecting each dataset and selecting Unregister.

[Image: unregistering a dataset]

To delete a dataset, go to the storage account by using the Azure portal or Azure Storage Explorer and manually delete those assets.

Next steps

Explore the other samples available for the designer: