MLflow experiment
The MLflow experiment data source provides a standard API to load MLflow experiment run data. You can load data from the notebook experiment, or you can use the MLflow experiment name or experiment ID.
Requirements
Databricks Runtime 6.0 ML or above.
Load data from the notebook experiment
To load data from the notebook experiment, use load().
Python
df = spark.read.format("mlflow-experiment").load()
display(df)
Scala
val df = spark.read.format("mlflow-experiment").load()
display(df)
Load data using experiment IDs
To load data from one or more workspace experiments, specify the experiment IDs as shown; to load several experiments at once, pass their IDs as a single comma-separated string (see the Scala example).
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")
display(df)
Scala
val df = spark.read.format("mlflow-experiment").load("3270527066281272,953590262154175")
display(df)
Load data using experiment name
You can also pass the experiment name to the load() method.
Python
import mlflow

expId = mlflow.get_experiment_by_name("/Shared/diabetes_experiment/").experiment_id
df = spark.read.format("mlflow-experiment").load(expId)
display(df)
Scala
val expId = mlflow.getExperimentByName("/Shared/diabetes_experiment/").get.getExperimentId
val df = spark.read.format("mlflow-experiment").load(expId)
display(df)
Filter data based on metrics and parameters
The examples in this section show how you can filter data after loading it from an experiment.
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")
filtered_df = df.filter("metrics.loss < 0.01 AND params.learning_rate > '0.001'")
display(filtered_df)
Scala
val df = spark.read.format("mlflow-experiment").load("3270527066281272")
val filtered_df = df.filter("metrics.loss < 1.85 AND params.num_epochs > '30'")
display(filtered_df)
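The same filter syntax applies to the other run fields returned by the data source, such as status and tags (see the schema below). A minimal Python sketch, in which the tag key "mlflow.user" and the user value are assumptions for illustration:
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")

# Keep only runs that finished successfully and carry a specific tag value.
# The tag key "mlflow.user" and the email address are placeholders; adjust them
# to the tags your runs actually record.
filtered_df = df.filter("status = 'FINISHED'") \
                .filter("tags['mlflow.user'] = 'someone@example.com'")

display(filtered_df)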
Schema
The schema of the DataFrame returned by the data source is:
root
|-- run_id: string
|-- experiment_id: string
|-- metrics: map
| |-- key: string
| |-- value: double
|-- params: map
| |-- key: string
| |-- value: string
|-- tags: map
| |-- key: string
| |-- value: string
|-- start_time: timestamp
|-- end_time: timestamp
|-- status: string
|-- artifact_uri: string
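Because metrics, params, and tags are map columns, individual values can be pulled out by key with ordinary Spark column expressions. A minimal Python sketch, reusing the experiment ID from the examples above; the metric and parameter names (loss, learning_rate) are assumptions carried over from the filtering examples:
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")

# Map columns can be indexed by key; runs that did not log a given key yield null.
summary_df = df.select(
    "run_id",
    "status",
    df["metrics"]["loss"].alias("loss"),
    df["params"]["learning_rate"].alias("learning_rate"),
)

display(summary_df)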