MLflow experiment

The MLflow experiment data source provides a standard API to load MLflow experiment run data. You can load data from the notebook experiment, or you can use the MLflow experiment name or experiment ID.

Requirements

Databricks Runtime 6.0 ML or above.

Load data from the notebook experiment

To load data from the notebook experiment, call load() with no arguments.

Python

df = spark.read.format("mlflow-experiment").load()
display(df)

Scala

val df = spark.read.format("mlflow-experiment").load()
display(df)

Load data using experiment IDs

To load data from one or more workspace experiments, pass the experiment IDs to load(). Separate multiple IDs with commas.

Python

df = spark.read.format("mlflow-experiment").load("3270527066281272")
display(df)

Scala

val df = spark.read.format("mlflow-experiment").load("3270527066281272,953590262154175")
display(df)
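
When the experiment IDs come from a list rather than a literal, the comma-separated argument can be assembled first. The following is a minimal sketch: `experiment_path` is a hypothetical helper, not part of the data source, and the Spark call is left as a comment because it needs a Databricks cluster.

```python
def experiment_path(experiment_ids):
    """Join experiment IDs into the comma-separated string passed to load()."""
    # Accept ints or strings and strip stray whitespace.
    ids = [str(i).strip() for i in experiment_ids]
    if not ids or any(not i for i in ids):
        raise ValueError("at least one non-empty experiment ID is required")
    return ",".join(ids)


path = experiment_path(["3270527066281272", 953590262154175])
print(path)

# On a cluster, the result would be used like the examples above:
# df = spark.read.format("mlflow-experiment").load(path)
```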

Load data using experiment name

To load data by experiment name, resolve the name to an experiment ID and pass that ID to the load() method.

Python

import mlflow

# Look up the experiment by name, then load its runs by ID.
expId = mlflow.get_experiment_by_name("/Shared/diabetes_experiment/").experiment_id
df = spark.read.format("mlflow-experiment").load(expId)
display(df)

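Scala

import org.mlflow.tracking.MlflowClient

// Assumes `mlflow` is an MlflowClient from the MLflow Java client.
val mlflow = new MlflowClient()
val expId = mlflow.getExperimentByName("/Shared/diabetes_experiment/").get.getExperimentId
val df = spark.read.format("mlflow-experiment").load(expId)
display(df)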

Filter data based on metrics and parameters

The examples in this section show how you can filter data after loading it from an experiment. Metric values are doubles and parameter values are strings, so the conditions on params below compare strings.

Python

df = spark.read.format("mlflow-experiment").load("3270527066281272")
filtered_df = df.filter("metrics.loss < 0.01 AND params.learning_rate > '0.001'")
display(filtered_df)

Scala

val df = spark.read.format("mlflow-experiment").load("3270527066281272")
val filtered_df = df.filter("metrics.loss < 1.85 AND params.num_epochs > '30'")
display(filtered_df)
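
Because params values are strings, a condition such as params.num_epochs > '30' is evaluated as a string comparison, which can disagree with the numeric ordering. The plain-Python sketch below illustrates the difference for digit strings. The helper names are hypothetical, and the cast shown in the closing comment is one possible way to get a numeric comparison in a Spark filter, not something the data source requires.

```python
def passes_as_string(value, threshold):
    """Compare character by character, as a string-typed column would."""
    return value > threshold


def passes_as_number(value, threshold):
    """Convert both sides to float before comparing."""
    return float(value) > float(threshold)


for epochs in ["9", "100"]:
    print(epochs, passes_as_string(epochs, "30"), passes_as_number(epochs, "30"))

# "9" passes the string test ('9' sorts after '3') although 9 < 30;
# "100" fails it ('1' sorts before '3') although 100 > 30.
#
# A numeric filter on a cluster might instead cast the parameter first:
# df.filter("metrics.loss < 1.85 AND CAST(params.num_epochs AS DOUBLE) > 30")
```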

Schema

The schema of the DataFrame returned by the data source is:

root
|-- run_id: string
|-- experiment_id: string
|-- metrics: map
|    |-- key: string
|    |-- value: double
|-- params: map
|    |-- key: string
|    |-- value: string
|-- tags: map
|    |-- key: string
|    |-- value: string
|-- start_time: timestamp
|-- end_time: timestamp
|-- status: string
|-- artifact_uri: string
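
Since metrics, params, and tags are maps, each run carries only the keys it actually logged. The sketch below flattens one run, represented as a Python dict (for example, as obtained from a collected row via `row.asDict()`), into dotted column names. `flatten_run` and the sample values are hypothetical and only mirror the shape of the schema above.

```python
MAP_FIELDS = ("metrics", "params", "tags")


def flatten_run(run):
    """Turn map fields into flat keys such as 'metrics.loss'."""
    # Copy the scalar fields unchanged.
    flat = {k: v for k, v in run.items() if k not in MAP_FIELDS}
    # Expand each map field; a missing or None map contributes nothing.
    for field in MAP_FIELDS:
        for key, value in (run.get(field) or {}).items():
            flat[f"{field}.{key}"] = value
    return flat


# Hypothetical run shaped like the schema (timestamps omitted for brevity).
example_run = {
    "run_id": "hypothetical-run-id",
    "experiment_id": "3270527066281272",
    "metrics": {"loss": 0.008},          # map<string, double>
    "params": {"learning_rate": "0.01"}, # map<string, string>
    "tags": {},
    "status": "FINISHED",
}

flat = flatten_run(example_run)
print(sorted(flat))
```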