End-to-end example of building machine learning models on Azure Databricks

Machine learning in the real world is messy. Data sources contain missing values, include redundant rows, or may not fit in memory. Feature engineering often requires domain expertise and can be tedious. Modeling, too, often mixes data science and systems engineering, requiring knowledge not only of algorithms but also of machine architecture and distributed systems.
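As a rough illustration of the data problems mentioned above (missing values, redundant rows, data too large to load at once), the following is a minimal sketch in plain Python. It is not taken from the tutorial notebook: the function name `clean_rows`, the sample columns, and the default fill values are all hypothetical, chosen only for this example.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def clean_rows(rows: Iterable[Dict[str, str]],
               defaults: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield rows one at a time, skipping exact duplicates and filling blanks.

    Hypothetical helper for illustration only. Rows are processed lazily, so
    the input never has to be fully loaded; note that the set of seen rows
    still grows with the number of distinct rows.
    """
    seen = set()
    for row in rows:
        # Drop rows that repeat an earlier row exactly.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Replace empty or missing fields with an assumed per-column default.
        yield {
            column: value if value not in ("", None) else defaults.get(column, "")
            for column, value in row.items()
        }


# Made-up sample data: one duplicate row and two missing fields.
SAMPLE_CSV = """id,age,city
1,34,Seattle
2,,Austin
1,34,Seattle
3,41,
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
cleaned = list(clean_rows(reader, defaults={"age": "0", "city": "unknown"}))

for row in cleaned:
    print(row)
```

In practice a fixed default is a crude way to impute values, and duplicate detection on very large data usually needs a distributed engine rather than an in-memory set; the sketch only shows the shape of the problem.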

Azure Databricks simplifies this process. The following 10-minute tutorial notebook shows an end-to-end example of training machine learning models on tabular data. You can import this notebook and run it yourself, or copy code snippets and ideas for your own use.

Requirements

This notebook requires Databricks Runtime 6.5 ML or above.

Notebook

MLflow end-to-end example notebook

Get notebook