Use Apache Spark MLlib on Azure Databricks

Apache Spark MLlib is the Apache Spark machine learning library. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and the underlying optimization primitives. Azure Databricks recommends the following Apache Spark MLlib guides:

Example notebooks

The following notebooks demonstrate how to use various Apache Spark MLlib features on Azure Databricks.

In this section:

Binary classification example

This notebook shows you how to build a binary classification application using the Apache Spark MLlib Pipelines API.

Binary classification notebook


Decision tree examples

These examples demonstrate various applications of decision trees using the Apache Spark MLlib Pipelines API.

Decision trees

These notebooks show you how to perform classification with decision trees.

Decision trees for digit recognition notebook


Decision trees for SFO survey notebook


GBT regression using MLlib pipelines

This notebook shows you how to use MLlib pipelines to perform a regression using gradient boosted trees (GBT) to predict hourly bike rental counts from information such as day of the week, weather, and season.

Bike sharing regression notebook


Apache Spark MLlib pipelines and Structured Streaming example

This notebook shows how to train an Apache Spark MLlib pipeline on historical data and apply it to streaming data.

MLlib pipeline Structured Streaming notebook


Advanced Apache Spark MLlib example

This notebook illustrates how to create a custom transformer.

Custom transformer notebook


For reference information about MLlib features, Azure Databricks recommends the following Apache Spark API reference:

To use Apache Spark MLlib from R, see the R machine learning documentation.

For information about Azure Databricks support for visualizing machine learning algorithms, see Machine learning visualizations.