Machine learning tutorial

Note

Databricks Runtime ML is a comprehensive tool for developing and deploying machine learning models with Azure Databricks. It includes the most popular machine learning and deep learning libraries, as well as MLflow, a machine learning platform API for tracking and managing the end-to-end machine learning lifecycle. For details, see the Machine learning and deep learning guide.

The Apache Spark machine learning library (MLlib) lets data scientists focus on their data problems and models instead of the complexities of distributed data, such as infrastructure and configuration. The tutorial notebook takes you through loading and preprocessing data, training a model with an MLlib algorithm, evaluating model performance, tuning the model, and making predictions. It also illustrates the use of MLlib pipelines and the MLflow machine learning platform.

Notebook

Use the notebook that corresponds to the Databricks Runtime version on your cluster. For more machine learning examples, see the Machine learning and deep learning guide.

Get started with MLlib notebook (Databricks Runtime 7.0 and above)

Get notebook

Get started with MLlib notebook (Databricks Runtime 5.5 LTS or 6.x)

Get notebook