Workspace assets

This article provides a high-level introduction to Azure Databricks workspace assets.

Clusters

Azure Databricks clusters provide a unified platform for various use cases, such as running production ETL pipelines, streaming analytics, ad hoc analytics, and machine learning.

For detailed information on managing and using clusters, see Clusters.

Notebooks

A notebook is a web-based interface to a document that contains a series of runnable cells (commands) that operate on files and tables, along with visualizations and narrative text. Commands can be run in sequence, referring to the output of one or more previously run commands.

Notebooks are one mechanism for running code in Azure Databricks. The other mechanism is jobs.

For detailed information on managing and using notebooks, see Notebooks.

Jobs

Jobs are one mechanism for running code in Azure Databricks. The other mechanism is notebooks.

For detailed information on managing and using jobs, see Jobs.
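
As a rough illustration of a job that runs a notebook, the sketch below builds a request body in the shape used by the Jobs API 2.1 `jobs/create` endpoint. The job name, notebook path, cluster ID, and the helper `build_notebook_job` are placeholders chosen for this example, and the snippet only builds and prints JSON; it does not contact a workspace.

```python
import json


def build_notebook_job(name, notebook_path, cluster_id):
    """Return a Jobs API 2.1 style payload with a single notebook task.

    All argument values used below are placeholders, not real workspace objects.
    """
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
    }


payload = build_notebook_job(
    name="nightly-etl",                       # hypothetical job name
    notebook_path="/Users/someone@example.com/etl",  # hypothetical notebook path
    cluster_id="0000-000000-example",         # hypothetical cluster ID
)

# Serialize the payload as it would be sent in the body of a create request.
print(json.dumps(payload, indent=2))
```

Sending this body to a workspace would additionally require the workspace URL and an access token, which are left out here on purpose.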

Libraries

A library makes third-party or locally built code available to notebooks and jobs running on your clusters.

For detailed information on managing and using libraries, see Libraries.

Data

You can import data into a distributed file system mounted into an Azure Databricks workspace and work with it in Azure Databricks notebooks and clusters. You can also use a wide variety of Apache Spark data sources to access data.

For detailed information on managing and using data, see Data guide.

Repos

Repos are Azure Databricks folders whose contents are versioned together by syncing them to a remote Git repository. Using an Azure Databricks repo, you can develop notebooks in Azure Databricks and use a remote Git repository for collaboration and version control.

For detailed information on using repos, see Repos for Git integration.

Models

A model, in this context, is a model registered in the MLflow Model Registry. The Model Registry is a centralized model store that enables you to manage the full lifecycle of MLflow models. It provides chronological model lineage, model versioning, stage transitions, and annotations and descriptions for both models and model versions.

For detailed information on managing and using models, see MLflow Model Registry on Azure Databricks.
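
The ideas of model versioning and stage transitions can be pictured with a small in-memory model. This is a conceptual sketch only, not the MLflow API: the classes `RegisteredModel` and `ModelVersion` and their methods are hypothetical names for this example. The stage names follow the ones used by the MLflow Model Registry (`None`, `Staging`, `Production`, `Archived`).

```python
from dataclasses import dataclass, field

# Stage names as used by the MLflow Model Registry.
STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    """One version of a registered model (hypothetical class)."""
    version: int
    stage: str = "None"
    description: str = ""


@dataclass
class RegisteredModel:
    """A named model with an ordered list of versions (hypothetical class)."""
    name: str
    versions: list = field(default_factory=list)

    def register(self, description=""):
        # Each new version gets the next sequential version number.
        mv = ModelVersion(version=len(self.versions) + 1, description=description)
        self.versions.append(mv)
        return mv

    def transition(self, version, stage):
        # Move an existing version to another stage.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        mv = self.versions[version - 1]
        mv.stage = stage
        return mv


model = RegisteredModel("churn-classifier")
model.register("baseline")
model.register("tuned depth")
model.transition(1, "Archived")
model.transition(2, "Production")

print([(v.version, v.stage) for v in model.versions])
# [(1, 'Archived'), (2, 'Production')]
```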

Experiments

An MLflow experiment is the primary unit of organization and access control for MLflow machine learning model training runs; all MLflow runs belong to an experiment. Each experiment lets you visualize, search, and compare runs, as well as download run artifacts or metadata for analysis in other tools.

For detailed information on managing and using experiments, see Experiments.
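
The relationship between an experiment and its runs, and what searching and comparing runs means, can be sketched with a small in-memory model. This is not the MLflow API: the classes `Experiment` and `Run`, their methods, and the sample parameter and metric values are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run with its parameters and metrics (hypothetical class)."""
    run_id: str
    params: dict
    metrics: dict


@dataclass
class Experiment:
    """A named container for runs (hypothetical class)."""
    name: str
    runs: list = field(default_factory=list)

    def log_run(self, run_id, params, metrics):
        # Every run belongs to exactly one experiment.
        run = Run(run_id, params, metrics)
        self.runs.append(run)
        return run

    def search(self, predicate):
        # Return the runs for which the predicate is true.
        return [r for r in self.runs if predicate(r)]

    def best(self, metric, higher_is_better=True):
        # Compare runs on one metric and return the best one.
        candidates = [r for r in self.runs if metric in r.metrics]
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])


exp = Experiment("churn-model")
exp.log_run("run-1", {"max_depth": 3}, {"accuracy": 0.81})
exp.log_run("run-2", {"max_depth": 5}, {"accuracy": 0.86})
exp.log_run("run-3", {"max_depth": 8}, {"accuracy": 0.84})

best = exp.best("accuracy")
deep = exp.search(lambda r: r.params["max_depth"] >= 5)

print(best.run_id)               # run-2
print([r.run_id for r in deep])  # ['run-2', 'run-3']
```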