Predefined judge: judges.is_grounded()
The judge evaluates whether your application's response is factually supported by the provided context (retrieved by a RAG system or generated from tool calls), helping detect hallucinations or statements that are not backed by that context.
This judge is available through the predefined RetrievalGroundedness scorer, used to evaluate RAG applications that need to ensure responses are grounded in retrieved information.
API signature
For details, see mlflow.genai.judges.is_grounded().
from mlflow.genai.judges import is_grounded
def is_grounded(
*,
request: str, # User's original query
response: str, # Application's response
context: Any, # Context to evaluate the response against; can be any Python primitive or a JSON-serializable dict
name: Optional[str] = None # Optional custom name for display in the MLflow UIs
) -> mlflow.entities.Feedback:
"""Returns Feedback with 'yes' or 'no' value and a rationale"""
Prerequisites for running the examples
Install MLflow and the required packages:
pip install --upgrade "mlflow[databricks]>=3.1.0"
Create an MLflow experiment by following the environment setup quickstart.
Direct SDK usage
from mlflow.genai.judges import is_grounded
# Example 1: Response is grounded in context
feedback = is_grounded(
request="What is the capital of France?",
response="Paris",
context=[
{"content": "Paris is the capital of France."},
{"content": "Paris is known for its Eiffel Tower."}
]
)
print(feedback.value) # "yes"
print(feedback.rationale) # Explanation of groundedness
# Example 2: Response contains hallucination
feedback = is_grounded(
request="What is the capital of France?",
response="Paris, which has a population of 10 million people",
context=[
{"content": "Paris is the capital of France."}
]
)
print(feedback.value) # "no"
print(feedback.rationale) # Identifies unsupported claim about population
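Because context accepts any Python primitive or JSON-serializable value (per the signature above), you can also pass a plain string instead of a list of document dicts. A minimal sketch:
from mlflow.genai.judges import is_grounded

# Sketch: context passed as a plain string rather than a list of dicts
feedback = is_grounded(
    request="What is the capital of France?",
    response="Paris",
    context="Paris is the capital of France. It is known for the Eiffel Tower.",
)
print(feedback.value)      # expected "yes", since the claim appears in the context
print(feedback.rationale)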
Using the prebuilt scorer
The is_grounded judge is available through the RetrievalGroundedness prebuilt scorer.
Requirements:
- Trace requirements (see the minimal sketch below):
  - The MLflow Trace must contain at least one span with span_type set to RETRIEVER
  - inputs and outputs must be on the Trace's root span
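A minimal sketch of a trace shape that satisfies these requirements, with hypothetical retrieve_context and answer_query functions; the full RAG example later in this section shows the complete workflow:
import mlflow
from mlflow.entities import Document
from typing import List

# Retriever span: span_type must be RETRIEVER so RetrievalGroundedness can find the retrieved documents
@mlflow.trace(span_type="RETRIEVER")
def retrieve_context(query: str) -> List[Document]:
    return [Document(id="doc_1", page_content="Paris is the capital of France.", metadata={})]

# Root span: the trace's inputs and outputs live here
@mlflow.trace
def answer_query(query: str) -> str:
    docs = retrieve_context(query)
    return docs[0].page_content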
Initialize an OpenAI client to connect to either Databricks-hosted LLMs or LLMs hosted by OpenAI.
Databricks-hosted LLMs
Use MLflow to get an OpenAI client that connects to Databricks-hosted LLMs. Select a model from the available foundation models.
import mlflow
from databricks.sdk import WorkspaceClient

# Enable MLflow's autologging to instrument your application with Tracing
mlflow.openai.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/docs-demo")

# Create an OpenAI client that is connected to Databricks-hosted LLMs
w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

# Select an LLM
model_name = "databricks-claude-sonnet-4"
OpenAI-hosted LLMs
Use the native OpenAI SDK to connect to models hosted by OpenAI. Select a model from the available OpenAI models.
import mlflow
import os
import openai

# Ensure your OPENAI_API_KEY is set in your environment
# os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"  # Uncomment and set if not globally configured

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/docs-demo")

# Create an OpenAI client connected to OpenAI SDKs
client = openai.OpenAI()

# Select an LLM
model_name = "gpt-4o-mini"
Use the scorer in an evaluation:
from mlflow.genai.scorers import RetrievalGroundedness
from mlflow.entities import Document
from typing import List

# Define a retriever function with proper span type
@mlflow.trace(span_type="RETRIEVER")
def retrieve_docs(query: str) -> List[Document]:
    # Simulated retrieval based on query
    if "mlflow" in query.lower():
        return [
            Document(
                id="doc_1",
                page_content="MLflow is an open-source platform for managing the ML lifecycle.",
                metadata={"source": "mlflow_docs.txt"}
            ),
            Document(
                id="doc_2",
                page_content="MLflow provides tools for experiment tracking, model packaging, and deployment.",
                metadata={"source": "mlflow_features.txt"}
            )
        ]
    else:
        return [
            Document(
                id="doc_3",
                page_content="Machine learning involves training models on data.",
                metadata={"source": "ml_basics.txt"}
            )
        ]

# Define your RAG app
@mlflow.trace
def rag_app(query: str):
    # Retrieve relevant documents
    docs = retrieve_docs(query)
    context = "\n".join([doc.page_content for doc in docs])

    # Generate response using LLM
    messages = [
        {"role": "system", "content": f"Answer based on this context: {context}"},
        {"role": "user", "content": query}
    ]
    response = client.chat.completions.create(
        # This example uses Databricks-hosted Claude. If you provide your own OpenAI credentials, replace with a valid OpenAI model, e.g., gpt-4o.
        model=model_name,
        messages=messages
    )

    return {"response": response.choices[0].message.content}

# Create evaluation dataset
eval_dataset = [
    {
        "inputs": {"query": "What is MLflow used for?"}
    },
    {
        "inputs": {"query": "What are the main features of MLflow?"}
    }
]

# Run evaluation with RetrievalGroundedness scorer
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=rag_app,
    scorers=[RetrievalGroundedness()]
)
Using the judge in a custom scorer
When evaluating an application whose data structure differs from what the predefined scorer requires, wrap the judge in a custom scorer:
from mlflow.genai.judges import is_grounded
from mlflow.genai.scorers import scorer
from typing import Dict, Any
eval_dataset = [
{
"inputs": {"query": "What is MLflow used for?"},
"outputs": {
"response": "MLflow is used for managing the ML lifecycle, including experiment tracking and model deployment.",
"retrieved_context": [
{"content": "MLflow is a platform for managing the ML lifecycle."},
{"content": "MLflow includes capabilities for experiment tracking, model packaging, and deployment."}
]
}
},
{
"inputs": {"query": "Who created MLflow?"},
"outputs": {
"response": "MLflow was created by Databricks in 2018 and has over 10,000 contributors.",
"retrieved_context": [
{"content": "MLflow was created by Databricks."},
{"content": "MLflow was open-sourced in 2018."}
]
}
}
]
@scorer
def groundedness_scorer(inputs: Dict[Any, Any], outputs: Dict[Any, Any]):
return is_grounded(
request=inputs["query"],
response=outputs["response"],
context=outputs["retrieved_context"]
)
# Run evaluation
eval_results = mlflow.genai.evaluate(
data=eval_dataset,
scorers=[groundedness_scorer]
)
Interpreting results
The judge returns a Feedback object containing:
- value: "yes" if the response is grounded in the context, "no" if it contains hallucinations
- rationale: a detailed explanation that identifies:
  - Which statements are supported by the context
  - Which statements lack support (hallucinations)
  - Specific quotes from the context that support or contradict the claims
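Because value is returned as "yes" or "no", you can act on the judgment programmatically, for example to flag ungrounded responses during spot checks. A minimal sketch reusing the hallucination example from the SDK section above:
from mlflow.genai.judges import is_grounded

feedback = is_grounded(
    request="What is the capital of France?",
    response="Paris, which has a population of 10 million people",
    context=[{"content": "Paris is the capital of France."}],
)

# Surface the rationale when the judge marks the response as ungrounded
if feedback.value == "no":
    print(f"Ungrounded response detected: {feedback.rationale}")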
Next steps
- Evaluate context sufficiency - check whether your retriever provides enough information
- Evaluate context relevance - ensure retrieved documents are relevant to the query
- Run a comprehensive RAG evaluation - combine multiple judges for a complete RAG assessment