Important
This feature is in Beta.
Note
Databricks recommends using registered scorers for production monitoring. See the scorer lifecycle management API reference.
Production monitoring lets you continuously evaluate the quality of your GenAI application by automatically running scorers on live traffic. The monitoring service runs every 15 minutes, evaluating a configurable sample of traces with the same scorers you used during development.
How it works
When production monitoring is enabled for an MLflow experiment:
Automatic execution - a background job runs every 15 minutes (after initial setup)
Scorer evaluation - each configured scorer runs on a sample of production traces
Feedback attachment - results are attached to each evaluated trace as feedback
Data archiving - all traces (not just the sampled ones) are written to a Delta table in Unity Catalog for analysis
The monitoring service ensures consistent evaluation with the same scorers used in development, providing automated quality assessment without manual intervention.
Important
Production monitoring supports only predefined scorers. If you need to run custom code-based or LLM-based scorers in production, contact your Databricks account representative.
API reference
create_external_monitor
Creates a monitor for a GenAI application served outside of Databricks. Once created, the monitor begins automatically evaluating traces against the configured assessment suite.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import create_external_monitor
create_external_monitor(
*,
catalog_name: str,
schema_name: str,
assessments_config: AssessmentsSuiteConfig | dict,
experiment_id: str | None = None,
experiment_name: str | None = None,
) -> ExternalMonitor
Parameters

| Parameter | Type | Description |
|---|---|---|
| `catalog_name` | `str` | Name of the Unity Catalog catalog where the trace archive table will be created |
| `schema_name` | `str` | Name of the Unity Catalog schema where the trace archive table will be created |
| `assessments_config` | `AssessmentsSuiteConfig` or `dict` | Configuration for the suite of assessments to run on traces |
| `experiment_id` | `str` or `None` | ID of the MLflow experiment to associate with the monitor. Defaults to the currently active experiment |
| `experiment_name` | `str` or `None` | Name of the MLflow experiment to associate with the monitor. Defaults to the currently active experiment |
Returns
ExternalMonitor - the created monitor object, containing the experiment ID, configuration, and monitoring URL
Example
import mlflow
from databricks.agents.monitoring import create_external_monitor, AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge
# Create a monitor with multiple scorers
external_monitor = create_external_monitor(
catalog_name="workspace",
schema_name="default",
assessments_config=AssessmentsSuiteConfig(
sample=0.5, # Sample 50% of traces
assessments=[
BuiltinJudge(name="safety"),
BuiltinJudge(name="relevance_to_query"),
BuiltinJudge(name="groundedness", sample_rate=0.2), # Override sampling for this scorer
GuidelinesJudge(
guidelines={
"mlflow_only": [
"If the request is unrelated to MLflow, the response must refuse to answer."
],
"professional_tone": [
"The response must maintain a professional and helpful tone."
]
}
),
],
),
)
print(f"Monitor created for experiment: {external_monitor.experiment_id}")
print(f"View traces at: {external_monitor.monitoring_page_url}")
get_external_monitor
Retrieves an existing monitor for a GenAI application served outside of Databricks.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import get_external_monitor
get_external_monitor(
*,
experiment_id: str | None = None,
experiment_name: str | None = None,
) -> ExternalMonitor
Parameters

| Parameter | Type | Description |
|---|---|---|
| `experiment_id` | `str` or `None` | ID of the MLflow experiment associated with the monitor |
| `experiment_name` | `str` or `None` | Name of the MLflow experiment associated with the monitor |
Returns
ExternalMonitor - the retrieved monitor object
Raises
- ValueError - if both experiment_id and experiment_name are provided
- NoMonitorFoundError - if no monitor is found for the given experiment
Example
from databricks.agents.monitoring import get_external_monitor
# Get monitor by experiment ID
monitor = get_external_monitor(experiment_id="123456789")
# Get monitor by experiment name
monitor = get_external_monitor(experiment_name="my-genai-app-experiment")
# Access monitor configuration
print(f"Sampling rate: {monitor.assessments_config.sample}")
print(f"Archive table: {monitor.trace_archive_table}")
update_external_monitor
Updates the configuration of an existing monitor. The configuration is completely replaced by the new values (not merged).
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import update_external_monitor
update_external_monitor(
*,
experiment_id: str | None = None,
experiment_name: str | None = None,
assessments_config: AssessmentsSuiteConfig | dict,
) -> ExternalMonitor
Parameters

| Parameter | Type | Description |
|---|---|---|
| `experiment_id` | `str` or `None` | ID of the MLflow experiment associated with the monitor |
| `experiment_name` | `str` or `None` | Name of the MLflow experiment associated with the monitor |
| `assessments_config` | `AssessmentsSuiteConfig` or `dict` | Updated configuration that completely replaces the existing configuration |
Returns
ExternalMonitor - the updated monitor object
Raises
- ValueError - if assessments_config is not provided
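Because the configuration is replaced rather than merged, an update must restate every field you want to keep. A minimal sketch of the difference, using plain dicts to model the config (replace-vs-merge is the documented behavior; the dict shapes are illustrative):

```python
# The documented behavior: a new assessments_config fully replaces
# the old one. Modeled here with plain dicts for illustration.
existing_config = {
    "sample": 0.5,
    "assessments": [
        {"name": "safety"},
        {"name": "relevance_to_query"},
    ],
}

# An update that only sets "sample" drops the assessments entirely,
# because nothing is merged:
replaced = {"sample": 0.8}  # what the monitor would end up with

# To keep the existing assessments while changing the sample rate,
# restate the full configuration:
updated = {**existing_config, "sample": 0.8}

# In practice you would pass the full dict to the API, e.g.:
# update_external_monitor(experiment_id="...", assessments_config=updated)
print(replaced)             # {'sample': 0.8}
print(updated["assessments"])  # the original two assessments survive
```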
delete_external_monitor
Deletes the monitor for a GenAI application served outside of Databricks.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import delete_external_monitor
delete_external_monitor(
*,
experiment_id: str | None = None,
experiment_name: str | None = None,
) -> None
Parameters

| Parameter | Type | Description |
|---|---|---|
| `experiment_id` | `str` or `None` | ID of the MLflow experiment associated with the monitor |
| `experiment_name` | `str` or `None` | Name of the MLflow experiment associated with the monitor |
Example
from databricks.agents.monitoring import delete_external_monitor
# Delete monitor by experiment ID
delete_external_monitor(experiment_id="123456789")
# Delete monitor by experiment name
delete_external_monitor(experiment_name="my-genai-app-experiment")
Configuration classes
AssessmentsSuiteConfig
Configuration for a suite of assessments to run on a GenAI application's traces.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import AssessmentsSuiteConfig
@dataclasses.dataclass
class AssessmentsSuiteConfig:
sample: float | None = None
paused: bool | None = None
assessments: list[AssessmentConfig] | None = None
Attributes

| Attribute | Type | Description |
|---|---|---|
| `sample` | `float` or `None` | Global sampling rate, between 0.0 (exclusive) and 1.0 (inclusive). Individual assessments can override this |
| `paused` | `bool` or `None` | Whether monitoring is paused |
| `assessments` | `list[AssessmentConfig]` or `None` | List of assessments to run on traces |
Methods
from_dict
Creates an AssessmentsSuiteConfig from its dictionary representation.
@classmethod
def from_dict(cls, data: dict) -> AssessmentsSuiteConfig
get_guidelines_judge
Returns the first GuidelinesJudge from the assessments list, or None if none is found.
def get_guidelines_judge(self) -> GuidelinesJudge | None
Example
from databricks.agents.monitoring import AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge
# Create configuration with multiple assessments
config = AssessmentsSuiteConfig(
sample=0.3, # Sample 30% of all traces
assessments=[
BuiltinJudge(name="safety"),
BuiltinJudge(name="relevance_to_query", sample_rate=0.5), # Override to 50%
GuidelinesJudge(
guidelines={
"accuracy": ["The response must be factually accurate"],
"completeness": ["The response must fully address the user's question"]
}
)
]
)
# Create from dictionary
config_dict = {
"sample": 0.3,
"assessments": [
{"name": "safety"},
{"name": "relevance_to_query", "sample_rate": 0.5}
]
}
config = AssessmentsSuiteConfig.from_dict(config_dict)
BuiltinJudge
Configuration for a built-in judge to run on traces.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import BuiltinJudge
@dataclasses.dataclass
class BuiltinJudge:
name: Literal["safety", "groundedness", "relevance_to_query", "chunk_relevance"]
sample_rate: float | None = None
Attributes

| Attribute | Type | Description |
|---|---|---|
| `name` | `str` | Name of the built-in judge. Must be one of: `"safety"`, `"groundedness"`, `"relevance_to_query"`, `"chunk_relevance"` |
| `sample_rate` | `float` or `None` | Optional override sampling rate (0.0 to 1.0) for this specific judge |
Available built-in judges
- safety - detects harmful or toxic content in responses
- groundedness - assesses whether responses are grounded in the retrieved context (RAG applications)
- relevance_to_query - checks whether the response addresses the user's request
- chunk_relevance - evaluates the relevance of each retrieved chunk (RAG applications)
GuidelinesJudge
Configuration for the guideline-adherence judge, which evaluates custom business rules.
# These packages are automatically installed with mlflow[databricks]
from databricks.agents.monitoring import GuidelinesJudge
@dataclasses.dataclass
class GuidelinesJudge:
guidelines: dict[str, list[str]]
sample_rate: float | None = None
name: Literal["guideline_adherence"] = "guideline_adherence" # Set automatically
Attributes

| Attribute | Type | Description |
|---|---|---|
| `guidelines` | `dict[str, list[str]]` | Dictionary mapping guideline names to lists of guideline descriptions |
| `sample_rate` | `float` or `None` | Optional override sampling rate (0.0 to 1.0) for this judge |
Example
from databricks.agents.monitoring import GuidelinesJudge
# Create guidelines judge with multiple business rules
guidelines_judge = GuidelinesJudge(
guidelines={
"data_privacy": [
"The response must not reveal any personal customer information",
"The response must not include internal system details"
],
"brand_voice": [
"The response must maintain a professional yet friendly tone",
"The response must use 'we' instead of 'I' when referring to the company"
],
"accuracy": [
"The response must only provide information that can be verified",
"The response must acknowledge uncertainty when appropriate"
]
},
sample_rate=0.8 # Evaluate 80% of traces with these guidelines
)
ExternalMonitor
Represents a monitor for a GenAI application served outside of Databricks.
@dataclasses.dataclass
class ExternalMonitor:
experiment_id: str
assessments_config: AssessmentsSuiteConfig
trace_archive_table: str | None
_checkpoint_table: str
_legacy_ingestion_endpoint_name: str
@property
def monitoring_page_url(self) -> str
Attributes

| Attribute | Type | Description |
|---|---|---|
| `experiment_id` | `str` | ID of the MLflow experiment associated with this monitor |
| `assessments_config` | `AssessmentsSuiteConfig` | Configuration for the assessments being run |
| `trace_archive_table` | `str` or `None` | Unity Catalog table where traces are archived |
| `monitoring_page_url` | `str` | URL for viewing monitoring results in the MLflow UI |