
Production monitoring API reference (legacy)

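Important

This feature is in Beta.

Note

Databricks recommends using registered scorers for production monitoring. See the Scorer lifecycle management API reference.

Production monitoring lets you continuously assess the quality of your GenAI application by automatically running scorers on live traffic. The monitoring service runs every 15 minutes and evaluates a configurable sample of traces using the same scorers you use in development.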

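How it works

When you enable production monitoring for an MLflow experiment:

  1. Automatic execution - a background job runs every 15 minutes (after initial setup)

  2. Scorer evaluation - each configured scorer runs on a sample of production traces

  3. Feedback attachment - results are attached as feedback to each evaluated trace

  4. Data archival - all traces (not only the sampled ones) are written to a Delta table in Unity Catalog for analysis

Because the monitoring service uses the same scorers as development, evaluation stays consistent, and quality is assessed automatically with no manual intervention.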
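The four steps above can be pictured with a small, self-contained simulation. This is only a sketch of the behavior described here, not the monitoring service's implementation: `Trace`, `run_monitoring_cycle`, and `non_empty` are hypothetical names, and sampling each trace independently at random is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical stand-in for a production trace."""
    trace_id: str
    response: str
    feedback: dict = field(default_factory=dict)


def run_monitoring_cycle(traces, scorers, sample_rate, rng):
    """Simulate one cycle: sample traces, score them, attach feedback, archive all."""
    # Scorer evaluation runs only on a sample of the traces.
    sampled = [t for t in traces if rng.random() < sample_rate]
    for trace in sampled:
        for name, scorer in scorers.items():
            # Each scorer's result is attached to the trace as feedback.
            trace.feedback[name] = scorer(trace)
    # Every trace is archived, not only the sampled ones.
    archive = list(traces)
    return sampled, archive


def non_empty(trace):
    """Toy scorer: passes when the response is not blank."""
    return bool(trace.response.strip())


traces = [Trace(f"t{i}", "hello" if i % 2 else " ") for i in range(10)]
sampled, archive = run_monitoring_cycle(
    traces, {"non_empty": non_empty}, sample_rate=0.5, rng=random.Random(0)
)
print(f"evaluated {len(sampled)} of {len(archive)} archived traces")
```

The point of the sketch is the asymmetry between steps 2 and 4: feedback exists only on sampled traces, while the archive always contains every trace.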

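Important

Production monitoring supports only predefined scorers. If you need to run custom code-based or LLM-based scorers in production, contact your Databricks account representative.

API reference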

create_external_monitor

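Creates a monitor for a GenAI application served outside Databricks. Once created, the monitor begins automatically evaluating traces according to the configured assessment suite.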

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import create_external_monitor

create_external_monitor(
    *,
    catalog_name: str,
    schema_name: str,
    assessments_config: AssessmentsSuiteConfig | dict,
    experiment_id: str | None = None,
    experiment_name: str | None = None,
) -> ExternalMonitor

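Parameters

Parameter Type Description
catalog_name str Name of the Unity Catalog catalog in which the trace archive table will be created
schema_name str Name of the Unity Catalog schema in which the trace archive table will be created
assessments_config AssessmentsSuiteConfig | dict Configuration for the suite of assessments to run on traces
experiment_id str | None ID of the MLflow experiment to associate with the monitor. Defaults to the currently active experiment
experiment_name str | None Name of the MLflow experiment to associate with the monitor. Defaults to the currently active experiment

Returns

ExternalMonitor - the created monitor object, containing the experiment ID, configuration, and monitoring URL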

Example

import mlflow
from databricks.agents.monitoring import create_external_monitor, AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge

# Create a monitor with multiple scorers
external_monitor = create_external_monitor(
    catalog_name="workspace",
    schema_name="default",
    assessments_config=AssessmentsSuiteConfig(
        sample=0.5,  # Sample 50% of traces
        assessments=[
            BuiltinJudge(name="safety"),
            BuiltinJudge(name="relevance_to_query"),
            BuiltinJudge(name="groundedness", sample_rate=0.2),  # Override sampling for this scorer
            GuidelinesJudge(
                guidelines={
                    "mlflow_only": [
                        "If the request is unrelated to MLflow, the response must refuse to answer."
                    ],
                    "professional_tone": [
                        "The response must maintain a professional and helpful tone."
                    ]
                }
            ),
        ],
    ),
)

print(f"Monitor created for experiment: {external_monitor.experiment_id}")
print(f"View traces at: {external_monitor.monitoring_page_url}")

get_external_monitor

Retrieves the existing monitor for a GenAI application served outside Databricks.

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import get_external_monitor

get_external_monitor(
    *,
    experiment_id: str | None = None,
    experiment_name: str | None = None,
) -> ExternalMonitor

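Parameters

Parameter Type Description
experiment_id str | None ID of the MLflow experiment associated with the monitor
experiment_name str | None Name of the MLflow experiment associated with the monitor

Returns

ExternalMonitor - the retrieved monitor object

Raises

  • ValueError - when experiment_id and experiment_name are provided
  • NoMonitorFoundError - when no monitor is found for the given experiment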

Example

from databricks.agents.monitoring import get_external_monitor

# Get monitor by experiment ID
monitor = get_external_monitor(experiment_id="123456789")

# Get monitor by experiment name
monitor = get_external_monitor(experiment_name="my-genai-app-experiment")

# Access monitor configuration
print(f"Sampling rate: {monitor.assessments_config.sample}")
print(f"Archive table: {monitor.trace_archive_table}")

update_external_monitor

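Updates the configuration of an existing monitor. The new values replace the existing configuration entirely; they are not merged with it.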

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import update_external_monitor

update_external_monitor(
    *,
    experiment_id: str | None = None,
    experiment_name: str | None = None,
    assessments_config: AssessmentsSuiteConfig | dict,
) -> ExternalMonitor

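Parameters

Parameter Type Description
experiment_id str | None ID of the MLflow experiment associated with the monitor
experiment_name str | None Name of the MLflow experiment associated with the monitor
assessments_config AssessmentsSuiteConfig | dict Updated configuration, which completely replaces the existing configuration

Returns

ExternalMonitor - the updated monitor object

Raises

  • ValueError - when assessments_config is not provided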
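Because the configuration is replaced rather than merged, adding one assessment means sending the complete new configuration, including everything that should be kept. The sketch below builds such a dict in plain Python. `build_replacement_config` and the `current` dict are hypothetical, and the final call is left as a comment because it needs a live workspace.

```python
import copy


def build_replacement_config(current: dict, new_assessment: dict) -> dict:
    """Return a complete config: everything in `current` plus one more assessment."""
    updated = copy.deepcopy(current)  # leave the caller's dict untouched
    assessments = list(updated.get("assessments") or [])
    assessments.append(new_assessment)
    updated["assessments"] = assessments
    return updated


# Hypothetical current configuration, in the dict form accepted by assessments_config.
current = {
    "sample": 0.3,
    "assessments": [{"name": "safety"}],
}

new_config = build_replacement_config(
    current, {"name": "groundedness", "sample_rate": 0.2}
)
print(new_config)

# The full dict would then be passed as the replacement configuration, for example:
# update_external_monitor(experiment_id="123456789", assessments_config=new_config)
```

Passing only the new assessment would, under replace semantics, drop the global `sample` value and the existing `safety` assessment.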

delete_external_monitor

Deletes the monitor for a GenAI application served outside Databricks.

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import delete_external_monitor

delete_external_monitor(
    *,
    experiment_id: str | None = None,
    experiment_name: str | None = None,
) -> None

Parameters

Parameter Type Description
experiment_id str | None ID of the MLflow experiment associated with the monitor
experiment_name str | None Name of the MLflow experiment associated with the monitor

Example

from databricks.agents.monitoring import delete_external_monitor

# Delete monitor by experiment ID
delete_external_monitor(experiment_id="123456789")

# Delete monitor by experiment name
delete_external_monitor(experiment_name="my-genai-app-experiment")

Configuration classes

AssessmentsSuiteConfig

Configuration for a suite of assessments to run on traces from a GenAI application.

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import AssessmentsSuiteConfig

@dataclasses.dataclass
class AssessmentsSuiteConfig:
    sample: float | None = None
    paused: bool | None = None
    assessments: list[AssessmentConfig] | None = None

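Attributes

Attribute Type Description
sample float | None Global sampling rate between 0.0 (exclusive) and 1.0 (inclusive). Individual assessments can override this
paused bool | None Whether monitoring is paused
assessments list[AssessmentConfig] | None List of assessments to run on traces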
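To make the interaction between the global rate and per-assessment overrides concrete, here is a minimal sketch that resolves the rate applying to each assessment in a dict-form configuration. `validate_rate` and `effective_rates` are hypothetical helpers, not part of the library. The sketch assumes that an override simply takes the place of the global rate and that a per-assessment rate may be 0.0, since only the global rate is documented as exclusive of 0.0.

```python
def validate_rate(rate: float, allow_zero: bool = False) -> float:
    """Check that a rate lies in (0.0, 1.0], or in [0.0, 1.0] when allow_zero is set."""
    lower_ok = rate >= 0.0 if allow_zero else rate > 0.0
    if not (lower_ok and rate <= 1.0):
        raise ValueError(f"invalid sampling rate: {rate}")
    return rate


def effective_rates(config: dict) -> dict:
    """Map each assessment name to the sampling rate that applies to it."""
    global_rate = validate_rate(config["sample"])
    rates = {}
    for assessment in config.get("assessments") or []:
        override = assessment.get("sample_rate")
        if override is None:
            # No override: the global rate applies.
            rates[assessment["name"]] = global_rate
        else:
            # Assumption: per-assessment rates may include 0.0.
            rates[assessment["name"]] = validate_rate(override, allow_zero=True)
    return rates


config = {
    "sample": 0.3,
    "assessments": [
        {"name": "safety"},
        {"name": "relevance_to_query", "sample_rate": 0.5},
    ],
}
print(effective_rates(config))  # {'safety': 0.3, 'relevance_to_query': 0.5}
```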

Methods

from_dict

Creates an AssessmentsSuiteConfig from a dictionary representation.

@classmethod
def from_dict(cls, data: dict) -> AssessmentsSuiteConfig

get_guidelines_judge

Returns the first GuidelinesJudge in the assessments list, or None if there is none.

def get_guidelines_judge(self) -> GuidelinesJudge | None

Example

from databricks.agents.monitoring import AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge

# Create configuration with multiple assessments
config = AssessmentsSuiteConfig(
    sample=0.3,  # Sample 30% of all traces
    assessments=[
        BuiltinJudge(name="safety"),
        BuiltinJudge(name="relevance_to_query", sample_rate=0.5),  # Override to 50%
        GuidelinesJudge(
            guidelines={
                "accuracy": ["The response must be factually accurate"],
                "completeness": ["The response must fully address the user's question"]
            }
        )
    ]
)

# Create from dictionary
config_dict = {
    "sample": 0.3,
    "assessments": [
        {"name": "safety"},
        {"name": "relevance_to_query", "sample_rate": 0.5}
    ]
}
config = AssessmentsSuiteConfig.from_dict(config_dict)

BuiltinJudge

Configuration for a built-in judge to run on traces.

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import BuiltinJudge

@dataclasses.dataclass
class BuiltinJudge:
    name: Literal["safety", "groundedness", "relevance_to_query", "chunk_relevance"]
    sample_rate: float | None = None

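Attributes

Attribute Type Description
name str Name of the built-in judge. Must be one of: "safety", "groundedness", "relevance_to_query", "chunk_relevance"
sample_rate float | None Optional override sampling rate for this specific judge (0.0 to 1.0)

Available built-in judges

  • safety - detects harmful or toxic content in responses
  • groundedness - evaluates whether the response is grounded in the retrieved context (RAG applications)
  • relevance_to_query - checks whether the response addresses the user's request
  • chunk_relevance - evaluates the relevance of each retrieved chunk (RAG applications)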
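When configurations are assembled as plain dicts, a misspelled judge name is easy to miss. The following sketch checks names against the four values listed above before building the dict entry. `BUILTIN_JUDGE_NAMES` and `builtin_judge_dict` are hypothetical helpers written for illustration, not part of the library.

```python
# The four built-in judge names listed in this reference.
BUILTIN_JUDGE_NAMES = {"safety", "groundedness", "relevance_to_query", "chunk_relevance"}


def builtin_judge_dict(name, sample_rate=None):
    """Build the dict form of a built-in judge entry, rejecting unknown names."""
    if name not in BUILTIN_JUDGE_NAMES:
        raise ValueError(
            f"unknown built-in judge {name!r}; expected one of {sorted(BUILTIN_JUDGE_NAMES)}"
        )
    entry = {"name": name}
    if sample_rate is not None:
        # Only include the override when one was given.
        entry["sample_rate"] = sample_rate
    return entry


assessments = [
    builtin_judge_dict("safety"),
    builtin_judge_dict("chunk_relevance", sample_rate=0.2),
]
print(assessments)
```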

GuidelinesJudge

Configuration for a guideline adherence judge, which evaluates custom business rules.

# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import GuidelinesJudge

@dataclasses.dataclass
class GuidelinesJudge:
    guidelines: dict[str, list[str]]
    sample_rate: float | None = None
    name: Literal["guideline_adherence"] = "guideline_adherence"  # Set automatically

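Attributes

Attribute Type Description
guidelines dict[str, list[str]] Dictionary mapping guideline names to lists of guideline descriptions
sample_rate float | None Optional override sampling rate for this judge (0.0 to 1.0)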

Example

from databricks.agents.monitoring import GuidelinesJudge

# Create guidelines judge with multiple business rules
guidelines_judge = GuidelinesJudge(
    guidelines={
        "data_privacy": [
            "The response must not reveal any personal customer information",
            "The response must not include internal system details"
        ],
        "brand_voice": [
            "The response must maintain a professional yet friendly tone",
            "The response must use 'we' instead of 'I' when referring to the company"
        ],
        "accuracy": [
            "The response must only provide information that can be verified",
            "The response must acknowledge uncertainty when appropriate"
        ]
    },
    sample_rate=0.8  # Evaluate 80% of traces with these guidelines
)

ExternalMonitor

Represents a monitor for a GenAI application served outside Databricks.

@dataclasses.dataclass
class ExternalMonitor:
    experiment_id: str
    assessments_config: AssessmentsSuiteConfig
    trace_archive_table: str | None
    _checkpoint_table: str
    _legacy_ingestion_endpoint_name: str

    @property
    def monitoring_page_url(self) -> str

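Attributes

Attribute Type Description
experiment_id str ID of the MLflow experiment associated with this monitor
assessments_config AssessmentsSuiteConfig Configuration of the assessments being run
trace_archive_table str | None Unity Catalog table in which traces are archived
monitoring_page_url str URL for viewing monitoring results in the MLflow UI

Next steps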