MLflow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It has the following primary components:

  • Tracking: Allows you to track experiments to record and compare parameters and results.
  • Models: Allows you to manage models from a variety of ML libraries and deploy them to a variety of model serving and inference platforms.
  • Projects: Allows you to package ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
  • Model Registry: Allows you to centralize a model store for managing the full lifecycle stage transitions of models, from staging to production, with capabilities for versioning and annotating.
  • Model Serving: Allows you to host MLflow Models as REST endpoints.

MLflow supports Java, Python, R, and REST APIs.

Azure Databricks provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Azure Databricks workspace features such as experiment and run management and notebook revision capture. MLflow on Azure Databricks offers an integrated experience for tracking and securing machine learning model training runs and for running machine learning projects.

First-time users should begin with the Quick Start, which demonstrates the basic MLflow tracking APIs. The subsequent articles introduce each MLflow component with example notebooks and describe how these components are hosted within Azure Databricks.
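
To give a feel for the Tracking component before the Quick Start, here is a minimal Python sketch of recording one run's parameters and metrics. It is an illustration rather than the Quick Start's own code. It assumes the `mlflow` package is installed and that a tracking location is available (an Azure Databricks notebook, or the default local `mlruns` directory). The helper names `rmse` and `log_run` are hypothetical and chosen only for this example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, using only the standard library."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def log_run(params, metrics):
    """Record one run with MLflow Tracking and return its run ID.

    Hypothetical helper: `params` and `metrics` are plain dicts.
    """
    # Imported lazily so that rmse() stays usable without MLflow installed.
    import mlflow

    with mlflow.start_run() as run:
        mlflow.log_params(params)    # e.g. {"alpha": 0.5}
        mlflow.log_metrics(metrics)  # e.g. {"rmse": 0.41}
        return run.info.run_id
```

Calling `log_run({"alpha": 0.5}, {"rmse": rmse(y_true, y_pred)})` once per hyperparameter setting should produce one run per call, which can then be compared side by side in the experiment view.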