judges.is_context_relevant()
This predefined judge assesses whether context, either retrieved by a RAG system or generated by a tool call, is relevant to the user's request. This is critical for diagnosing quality issues: if the context is not relevant, the generation step cannot produce a useful response.
This judge is available through two predefined scorers:

- RelevanceToQuery: evaluates whether the app's response directly addresses the user's input
- RetrievalRelevance: evaluates whether each document returned by the app's retriever is relevant
API signature
For details, see mlflow.genai.judges.is_context_relevant().
from mlflow.genai.judges import is_context_relevant

def is_context_relevant(
    *,
    request: str,               # User's question or query
    context: Any,               # Context to evaluate for relevance; can be any Python primitive or a JSON-serializable dict
    name: Optional[str] = None  # Optional custom name for display in the MLflow UIs
) -> mlflow.entities.Feedback:
    """Returns Feedback with 'yes' or 'no' value and a rationale"""
Prerequisites for running the examples

Install MLflow and the required packages:

pip install --upgrade "mlflow[databricks]>=3.1.0" openai "databricks-connect>=16.1"

Follow the set up your environment quickstart to create an MLflow experiment.
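If you are connecting from outside a Databricks notebook, a minimal setup sketch (assuming Databricks authentication is already configured, and using a hypothetical experiment path) looks like this:

import mlflow

# Point MLflow at your Databricks workspace; assumes credentials are already
# configured (e.g., via environment variables or the Databricks CLI)
mlflow.set_tracking_uri("databricks")

# Create or reuse an experiment; this path is a hypothetical placeholder
mlflow.set_experiment("/Users/you@example.com/context-relevance-demo")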
Using the SDK directly
from mlflow.genai.judges import is_context_relevant

# Example 1: Relevant context
feedback = is_context_relevant(
    request="What is the capital of France?",
    context="Paris is the capital of France."
)
print(feedback.value)      # "yes"
print(feedback.rationale)  # Explanation of relevance

# Example 2: Irrelevant context
feedback = is_context_relevant(
    request="What is the capital of France?",
    context="Paris is known for its Eiffel Tower."
)
print(feedback.value)      # "no"
print(feedback.rationale)  # Explanation of why it's not relevant
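The optional name argument from the API signature above controls how the feedback is labeled in the MLflow UIs. A small sketch; the name value here is illustrative:

# Example 3: Custom display name for the feedback
feedback = is_context_relevant(
    request="What is the capital of France?",
    context="Paris is the capital of France.",
    name="context_relevance_check"  # illustrative display name
)
print(feedback.name)   # "context_relevance_check"
print(feedback.value)  # "yes"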
Using the prebuilt scorers

The is_context_relevant judge is available through two prebuilt scorers:
1. RelevanceToQuery scorer

This scorer evaluates whether the app's response directly addresses the user's input without deviating into unrelated topics.
Requirements:

- Trace requirements: inputs and outputs must be on the Trace's root span
import mlflow
from mlflow.genai.scorers import RelevanceToQuery

eval_dataset = [
    {
        "inputs": {"query": "What is the capital of France?"},
        "outputs": {
            "response": "Paris is the capital of France. It's known for the Eiffel Tower and is a major European city."
        },
    },
    {
        "inputs": {"query": "What is the capital of France?"},
        "outputs": {
            "response": "France is a beautiful country with great wine and cuisine."
        },
    }
]

# Run evaluation with the RelevanceToQuery scorer
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[RelevanceToQuery()]
)
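The root-span requirement above is also satisfied when you evaluate a live app: with predict_fn, MLflow traces the function call, so its arguments and return value land on the root span of the generated trace. A sketch, assuming a trivial app function whose response text is illustrative:

import mlflow
from mlflow.genai.scorers import RelevanceToQuery

@mlflow.trace
def qa_app(query: str):
    # inputs (query) and outputs (the return value) are captured on the root span
    return {"response": f"You asked: {query}"}

eval_results = mlflow.genai.evaluate(
    data=[{"inputs": {"query": "What is the capital of France?"}}],
    predict_fn=qa_app,
    scorers=[RelevanceToQuery()]
)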
2. RetrievalRelevance scorer

This scorer evaluates whether each document returned by the app's retriever(s) is relevant to the input request.
Requirements:

- Trace requirements: the MLflow Trace must contain at least one span with span_type set to RETRIEVER
import mlflow
from mlflow.genai.scorers import RetrievalRelevance
from mlflow.entities import Document
from typing import List

# Define a retriever function with the proper span type
@mlflow.trace(span_type="RETRIEVER")
def retrieve_docs(query: str) -> List[Document]:
    # Simulated retrieval - in practice, this would query a vector database
    if "capital" in query.lower() and "france" in query.lower():
        return [
            Document(
                id="doc_1",
                page_content="Paris is the capital of France.",
                metadata={"source": "geography.txt"}
            ),
            Document(
                id="doc_2",
                page_content="The Eiffel Tower is located in Paris.",
                metadata={"source": "landmarks.txt"}
            )
        ]
    else:
        return [
            Document(
                id="doc_3",
                page_content="Python is a programming language.",
                metadata={"source": "tech.txt"}
            )
        ]

# Define your app that uses the retriever
@mlflow.trace
def rag_app(query: str):
    docs = retrieve_docs(query)
    # In practice, you would pass these docs to an LLM
    return {"response": f"Found {len(docs)} relevant documents."}

# Create the evaluation dataset
eval_dataset = [
    {
        "inputs": {"query": "What is the capital of France?"}
    },
    {
        "inputs": {"query": "How do I use Python?"}
    }
]

# Run evaluation with the RetrievalRelevance scorer
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=rag_app,
    scorers=[RetrievalRelevance()]
)
Using it in a custom scorer

When evaluating an application whose data structure differs from what the predefined scorers require, wrap the judge in a custom scorer:
import mlflow
from mlflow.genai.judges import is_context_relevant
from mlflow.genai.scorers import scorer
from typing import Dict, Any

eval_dataset = [
    {
        "inputs": {"query": "What are MLflow's main components?"},
        "outputs": {
            "retrieved_context": [
                {"content": "MLflow has four main components: Tracking, Projects, Models, and Registry."}
            ]
        }
    },
    {
        "inputs": {"query": "What are MLflow's main components?"},
        "outputs": {
            "retrieved_context": [
                {"content": "Python is a popular programming language."}
            ]
        }
    }
]

@scorer
def context_relevance_scorer(inputs: Dict[Any, Any], outputs: Dict[Any, Any]):
    # Pass the retrieved context to the judge; any JSON-serializable value works
    context = outputs["retrieved_context"]
    return is_context_relevant(
        request=inputs["query"],
        context=context
    )

# Run evaluation
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[context_relevance_scorer]
)
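If you want one judgment per retrieved chunk instead of a single judgment over the whole list, a custom scorer can emit several Feedback objects. A sketch, assuming the scorer is allowed to return a list of Feedback, with illustrative per-chunk display names:

from typing import Any, Dict, List

from mlflow.entities import Feedback
from mlflow.genai.judges import is_context_relevant
from mlflow.genai.scorers import scorer

@scorer
def per_chunk_relevance(inputs: Dict[Any, Any], outputs: Dict[Any, Any]) -> List[Feedback]:
    # Judge each retrieved chunk independently so irrelevant chunks are easy to spot
    return [
        is_context_relevant(
            request=inputs["query"],
            context=chunk["content"],
            name=f"chunk_{i}_relevance",  # illustrative per-chunk display name
        )
        for i, chunk in enumerate(outputs["retrieved_context"])
    ]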
Interpreting results

The judge returns a Feedback object containing:

- value: "yes" if the context is relevant, "no" otherwise
- rationale: an explanation of why the context was judged relevant or irrelevant
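Because the result is a structured Feedback object, it can also gate logic outside of evaluation runs; a minimal sketch:

from mlflow.genai.judges import is_context_relevant

feedback = is_context_relevant(
    request="What is the capital of France?",
    context="Paris is the capital of France."
)

if feedback.value == "yes":
    # Context passed the relevance check; proceed to the generation step
    ...
else:
    # Surface the judge's reasoning to debug retrieval quality
    print(feedback.rationale)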
Next steps

- Explore other predefined judges - learn about the groundedness, safety, and correctness judges
- Create custom judges - build specialized judges for your use cases
- Evaluate RAG applications - apply relevance judges in a comprehensive RAG evaluation