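Labeling schemas define the specific questions that domain experts answer when they label existing traces in the Review App. They structure the feedback collection process so that you get consistent, relevant information for evaluating your GenAI app.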
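Note
Labeling schemas apply only when you use the Review App to label existing traces. They do not apply when you use the Review App to test new app versions in the chat UI.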
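How labeling schemas work
When you create a labeling session, you associate it with one or more labeling schemas. Each schema represents either a Feedback or an Expectation assessment that is attached to the trace being labeled.
Schemas control:
- The questions shown to reviewers
- The input method (dropdown list, text box, and so on)
- Validation rules and constraints
- Optional instructions and comments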
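Important
Labeling schema names must be unique within each MLflow experiment. You cannot have two schemas with the same name in the same experiment, but you can reuse a schema name in a different experiment.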
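The uniqueness rule can be pictured as a store keyed by the pair (experiment, schema name). The following is a minimal sketch under that assumption: SchemaRegistry, its methods, and the experiment IDs are hypothetical and are not part of the MLflow SDK. The overwrite flag only mirrors the parameter of the same name that create_label_schema takes in the examples on this page.

```python
class SchemaRegistry:
    """Hypothetical in-memory stand-in for per-experiment schema storage.

    This is not MLflow's implementation. It only models the rule that a schema
    name must be unique within an experiment but may be reused across experiments.
    """

    def __init__(self) -> None:
        # Schema definitions keyed by (experiment_id, schema_name).
        self._schemas: dict[tuple[str, str], dict] = {}

    def create(self, experiment_id: str, name: str, definition: dict,
               overwrite: bool = False) -> None:
        key = (experiment_id, name)
        # A second schema with the same name in the same experiment is rejected
        # unless the caller explicitly asks to replace the existing one.
        if key in self._schemas and not overwrite:
            raise ValueError(
                f"Schema {name!r} already exists in experiment {experiment_id!r}"
            )
        self._schemas[key] = definition

    def get(self, experiment_id: str, name: str) -> dict:
        return self._schemas[(experiment_id, name)]


registry = SchemaRegistry()
registry.create("exp-1", "response_quality", {"type": "feedback"})
# The same name in a different experiment is allowed.
registry.create("exp-2", "response_quality", {"type": "feedback"})
```

Keying on the pair rather than on the name alone is what lets the same name coexist in two experiments while still colliding inside one.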
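Labeling schemas for common use cases
MLflow provides predefined schema names for the built-in LLM judges that use expectations. You can create custom schemas with these names to ensure compatibility with the built-in evaluation functionality.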
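- Works with the guidelines judge:
  - GUIDELINES: Collects the ideal instructions the GenAI app should follow when handling a request
- Works with the correctness judge:
  - EXPECTED_FACTS: Collects factual statements that must be included for a response to be correct
  - EXPECTED_RESPONSE: Collects the complete ground-truth answer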
Create schemas for common use cases
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import LabelSchemaType, InputTextList, InputText
# Schema for collecting expected facts
expected_facts_schema = schemas.create_label_schema(
name=schemas.EXPECTED_FACTS,
type=LabelSchemaType.EXPECTATION,
title="Expected facts",
input=InputTextList(max_length_each=1000),
instruction="Please provide a list of facts that you expect to see in a correct response.",
overwrite=True
)
# Schema for collecting guidelines
guidelines_schema = schemas.create_label_schema(
name=schemas.GUIDELINES,
type=LabelSchemaType.EXPECTATION,
title="Guidelines",
input=InputTextList(max_length_each=500),
instruction="Please provide guidelines that the model's output is expected to adhere to.",
overwrite=True
)
# Schema for collecting expected response
expected_response_schema = schemas.create_label_schema(
name=schemas.EXPECTED_RESPONSE,
type=LabelSchemaType.EXPECTATION,
title="Expected response",
input=InputText(),
instruction="Please provide a correct agent response.",
overwrite=True
)
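Create custom labeling schemas
Create custom schemas to collect feedback specific to your domain. You can create schemas through the MLflow UI or programmatically with the SDK.
Note
Remember that schema names must be unique within the current MLflow experiment. Choose descriptive names that clearly indicate the purpose of each schema.
Create schemas through the UI
Go to the Labeling tab in the MLflow UI to create schemas visually. The tab provides an interface for defining questions, input types, and validation rules without writing code.
Create schemas programmatically
Every schema requires a name, a type, a title, and an input specification.
Basic schema creation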
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputText
# Create a feedback schema for rating response quality
quality_schema = schemas.create_label_schema(
name="response_quality",
type="feedback",
title="How would you rate the overall quality of this response?",
input=InputCategorical(options=["Poor", "Fair", "Good", "Excellent"]),
instruction="Consider accuracy, relevance, and helpfulness when rating."
)
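Schema types
Choose between two schema types:
- feedback: Subjective assessments such as ratings, preferences, or opinions
- expectation: Objective ground truth such as correct answers or expected behavior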
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputTextList
# Feedback schema for subjective assessment
tone_schema = schemas.create_label_schema(
name="response_tone",
type="feedback",
title="Is the response tone appropriate for the context?",
input=InputCategorical(options=["Too formal", "Just right", "Too casual"]),
enable_comment=True # Allow additional comments
)
# Expectation schema for ground truth
facts_schema = schemas.create_label_schema(
name="required_facts",
type="expectation",
title="What facts must be included in a correct response?",
input=InputTextList(max_count=5, max_length_each=200),
instruction="List key facts that any correct response must contain."
)
Manage labeling schemas
Use the SDK functions to manage schemas programmatically:
Retrieve a schema
import mlflow.genai.label_schemas as schemas
# Get an existing schema
schema = schemas.get_label_schema("response_quality")
print(f"Schema: {schema.name}")
print(f"Type: {schema.type}")
print(f"Title: {schema.title}")
Update a schema
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical
# Update by recreating with overwrite=True
updated_schema = schemas.create_label_schema(
name="response_quality",
type="feedback",
title="Rate the response quality (updated question)",
input=InputCategorical(options=["Excellent", "Good", "Fair", "Poor", "Very Poor"]),
instruction="Updated: Focus on factual accuracy above all else.",
overwrite=True # Replace existing schema
)
Delete a schema
import mlflow.genai.label_schemas as schemas
# Remove a schema that's no longer needed
schemas.delete_label_schema("old_schema_name")
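Input types for custom schemas
MLflow supports five input types for collecting different kinds of feedback:
Single-select dropdown (InputCategorical)
Use for mutually exclusive options: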
from mlflow.genai.label_schemas import InputCategorical
# Rating scale
rating_input = InputCategorical(
options=["1 - Poor", "2 - Below Average", "3 - Average", "4 - Good", "5 - Excellent"]
)
# Binary choice
safety_input = InputCategorical(options=["Safe", "Unsafe"])
# Multiple categories
error_type_input = InputCategorical(
options=["Factual Error", "Logical Error", "Formatting Error", "No Error"]
)
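Conceptually, a single-select answer is valid only when it is exactly one of the configured options. The function below is a hypothetical, client-side sketch of that constraint in plain Python; validate_categorical is an illustrative name, and the code does not call MLflow or reflect how the Review App enforces the rule.

```python
def validate_categorical(answer: str, options: list[str]) -> str:
    """Hypothetical check mirroring InputCategorical: exactly one listed option."""
    if answer not in options:
        raise ValueError(f"{answer!r} is not one of {options}")
    return answer


safety_options = ["Safe", "Unsafe"]
validate_categorical("Safe", safety_options)  # accepted
```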
Multi-select dropdown (InputCategoricalList)
Use when reviewers can select more than one option:
from mlflow.genai.label_schemas import InputCategoricalList
# Multiple error types can be present
errors_input = InputCategoricalList(
options=[
"Factual inaccuracy",
"Missing context",
"Inappropriate tone",
"Formatting issues",
"Off-topic content"
]
)
# Multiple content types
content_input = InputCategoricalList(
options=["Technical details", "Examples", "References", "Code samples"]
)
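A multi-select answer is a list in which every entry must come from the configured options. The sketch below is hypothetical: validate_categorical_list is an illustrative helper, and two behaviors are assumptions rather than documented Review App rules. It rejects an option selected twice, and it accepts an empty selection because this page does not say whether at least one choice is required.

```python
def validate_categorical_list(answers: list[str], options: list[str]) -> list[str]:
    """Hypothetical check mirroring InputCategoricalList semantics."""
    unknown = [a for a in answers if a not in options]
    if unknown:
        raise ValueError(f"Unknown options: {unknown}")
    # Assumption: each option can be chosen at most once.
    if len(set(answers)) != len(answers):
        raise ValueError("Each option may be selected at most once")
    return answers


content_options = ["Technical details", "Examples", "References", "Code samples"]
validate_categorical_list(["Examples", "Code samples"], content_options)  # accepted
```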
Free-form text (InputText)
Use for open-ended responses:
from mlflow.genai.label_schemas import InputText
# General feedback
feedback_input = InputText(max_length=500)
# Specific improvement suggestions
improvement_input = InputText(
max_length=200 # Limit length for focused feedback
)
# Short answers
summary_input = InputText(max_length=100)
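The max_length setting caps how long a free-text answer can be, and leaving it unset means no cap. The following hypothetical sketch shows that rule. It assumes that length is counted in characters, which matches Python's len on a str. validate_text is an illustrative name, not an MLflow function.

```python
from typing import Optional


def validate_text(answer: str, max_length: Optional[int] = None) -> str:
    """Hypothetical check mirroring InputText(max_length=...)."""
    # None models an InputText created without max_length (no limit).
    if max_length is not None and len(answer) > max_length:
        raise ValueError(f"Answer has {len(answer)} characters; limit is {max_length}")
    return answer


validate_text("Cite the refund policy explicitly.", max_length=200)  # accepted
```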
Multiple text entries (InputTextList)
Use for collecting a list of text items:
from mlflow.genai.label_schemas import InputTextList
# List of factual errors
errors_input = InputTextList(
max_count=10, # Maximum 10 errors
max_length_each=150 # Each error description limited to 150 chars
)
# Missing information
missing_input = InputTextList(
max_count=5,
max_length_each=200
)
# Improvement suggestions
suggestions_input = InputTextList(max_count=3) # No length limit per item
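InputTextList combines two independent limits: the number of items (max_count) and the length of each item (max_length_each). Either one can be left unset. This is a hypothetical plain-Python sketch of how the two limits interact. validate_text_list is an illustrative name, and the code assumes that lengths are counted in characters.

```python
from typing import Optional


def validate_text_list(items: list[str], max_count: Optional[int] = None,
                       max_length_each: Optional[int] = None) -> list[str]:
    """Hypothetical check mirroring InputTextList(max_count=..., max_length_each=...)."""
    if max_count is not None and len(items) > max_count:
        raise ValueError(f"{len(items)} items given; at most {max_count} allowed")
    if max_length_each is not None:
        # Report every offending position, not just the first one.
        too_long = [i for i, item in enumerate(items) if len(item) > max_length_each]
        if too_long:
            raise ValueError(f"Items at positions {too_long} exceed {max_length_each} characters")
    return items


# Mirrors suggestions_input: at most 3 items, no per-item length limit.
validate_text_list(["Add an example", "Shorten the intro"], max_count=3)
```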
Numeric input (InputNumeric)
Use for numeric ratings or scores:
from mlflow.genai.label_schemas import InputNumeric
# Confidence score
confidence_input = InputNumeric(
min_value=0.0,
max_value=1.0
)
# Rating scale
rating_input = InputNumeric(
min_value=1,
max_value=10
)
# Cost estimate
cost_input = InputNumeric(min_value=0) # No maximum limit
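min_value and max_value define an optional range, and either bound can be omitted, as in cost_input. The sketch below is hypothetical. It assumes both bounds are inclusive, so 0.0 and 1.0 are valid confidence scores, but this page does not state whether the Review App treats them that way. validate_numeric is an illustrative name.

```python
from typing import Optional, Union

Number = Union[int, float]


def validate_numeric(value: Number, min_value: Optional[Number] = None,
                     max_value: Optional[Number] = None) -> Number:
    """Hypothetical check mirroring InputNumeric, assuming inclusive bounds."""
    if min_value is not None and value < min_value:
        raise ValueError(f"{value} is below the minimum {min_value}")
    if max_value is not None and value > max_value:
        raise ValueError(f"{value} is above the maximum {max_value}")
    return value


validate_numeric(0.75, min_value=0.0, max_value=1.0)  # confidence score
validate_numeric(1_000_000, min_value=0)              # cost estimate, no maximum
```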
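Complete examples
Customer service evaluation
The following example shows a complete set of schemas for evaluating customer service responses: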
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import (
InputCategorical,
InputCategoricalList,
InputText,
InputTextList,
InputNumeric
)
# Overall quality rating
quality_schema = schemas.create_label_schema(
name="service_quality",
type="feedback",
title="Rate the overall quality of this customer service response",
input=InputCategorical(options=["Excellent", "Good", "Average", "Poor", "Very Poor"]),
instruction="Consider helpfulness, accuracy, and professionalism.",
enable_comment=True
)
# Issues identification
issues_schema = schemas.create_label_schema(
name="response_issues",
type="feedback",
title="What issues are present in this response? (Select all that apply)",
input=InputCategoricalList(options=[
"Factually incorrect information",
"Unprofessional tone",
"Doesn't address the question",
"Too vague or generic",
"Contains harmful content",
"No issues identified"
]),
instruction="Select all issues you identify. Choose 'No issues identified' if the response is problem-free."
)
# Expected resolution steps
resolution_schema = schemas.create_label_schema(
name="expected_resolution",
type="expectation",
title="What steps should be included in the ideal resolution?",
input=InputTextList(max_count=5, max_length_each=200),
instruction="List the key steps a customer service rep should take to properly resolve this issue."
)
# Confidence in assessment
confidence_schema = schemas.create_label_schema(
name="assessment_confidence",
type="feedback",
title="How confident are you in your assessment?",
input=InputNumeric(min_value=1, max_value=10),
instruction="Rate from 1 (not confident) to 10 (very confident)"
)
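Because response_issues uses InputCategoricalList, the schema itself does not stop a reviewer from selecting "No issues identified" together with a real issue, even though the instruction reserves that option for problem-free responses. If you post-process the collected labels, a check like the hypothetical check_issue_selection below can flag contradictory answers. It is a sketch that operates on plain Python lists of selected options, not on MLflow assessment objects.

```python
NO_ISSUES = "No issues identified"


def check_issue_selection(selected: list[str]) -> list[str]:
    """Return problems found in one reviewer's response_issues answer.

    Hypothetical post-processing helper; an empty result means the answer
    is internally consistent.
    """
    problems = []
    if not selected:
        problems.append("No option selected")
    # "No issues identified" contradicts any other selected issue.
    if NO_ISSUES in selected and len(selected) > 1:
        problems.append(f"{NO_ISSUES!r} was selected together with other issues")
    return problems


print(check_issue_selection(["Unprofessional tone", NO_ISSUES]))
```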
Medical information review
The following example evaluates responses that contain medical information:
import mlflow.genai.label_schemas as schemas
from mlflow.genai.label_schemas import InputCategorical, InputTextList, InputNumeric
# Safety assessment
safety_schema = schemas.create_label_schema(
name="medical_safety",
type="feedback",
title="Is this medical information safe and appropriate?",
input=InputCategorical(options=[
"Safe - appropriate general information",
"Concerning - may mislead patients",
"Dangerous - could cause harm if followed"
]),
instruction="Assess whether the information could be safely consumed by patients."
)
# Required disclaimers
disclaimers_schema = schemas.create_label_schema(
name="required_disclaimers",
type="expectation",
title="What medical disclaimers should be included?",
input=InputTextList(max_count=3, max_length_each=300),
instruction="List disclaimers that should be present (e.g., 'consult your doctor', 'not professional medical advice')."
)
# Accuracy of medical facts
accuracy_schema = schemas.create_label_schema(
name="medical_accuracy",
type="feedback",
title="Rate the factual accuracy of the medical information",
input=InputNumeric(min_value=0, max_value=100),
instruction="Score from 0 (completely inaccurate) to 100 (completely accurate)"
)
Integration with labeling sessions
After you create schemas, you can use them in labeling sessions:
import mlflow.genai.label_schemas as schemas
# Schemas are automatically available when creating labeling sessions
# The Review App will present questions based on your schema definitions
# Example: Using schemas in a session (conceptual - actual session creation
# happens through the Review App UI or other APIs)
session_schemas = [
"service_quality", # Your custom schema
"response_issues", # Your custom schema
schemas.EXPECTED_FACTS # Built-in schema
]
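A list like session_schemas is only useful if every name in it refers to a schema that exists in the experiment. A typo would otherwise surface only later. The sketch below is hypothetical. It compares the requested names with a plain set of known names instead of calling MLflow, and find_missing_schemas is an illustrative helper. The known names are taken from the customer service example above.

```python
def find_missing_schemas(requested: list[str], existing: set[str]) -> list[str]:
    """Return requested schema names not found in `existing`, in request order."""
    return [name for name in requested if name not in existing]


existing_schemas = {
    "service_quality",
    "response_issues",
    "expected_resolution",
    "assessment_confidence",
}
# "response_issue" is a deliberate typo for "response_issues".
missing = find_missing_schemas(["service_quality", "response_issue"], existing_schemas)
if missing:
    print(f"Unknown schema names: {missing}")
```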
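Best practices
Schema design
- Clear titles: Write questions as clear, specific prompts
- Helpful instructions: Provide context that guides reviewers
- Appropriate constraints: Set reasonable limits on text length and list counts
- Logical options: For categorical inputs, make sure the options are mutually exclusive and comprehensive
Schema management
- Consistent naming: Use descriptive, consistent names across schemas
- Versioning: When you update a schema, consider the impact on existing sessions
- Cleanup: Delete unused schemas to keep the workspace organized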
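Input type selection
- Use InputCategorical for standardized ratings or classifications
- Use InputCategoricalList when multiple issues or features can be present
- Use InputText for detailed explanations or custom feedback
- Use InputTextList for structured lists of items
- Use InputNumeric for precise scores or confidence ratings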
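The "Appropriate constraints" and "Logical options" guidelines can be partly checked automatically. The following is a hypothetical linter that runs over plain dictionaries describing schema inputs. The dictionary layout (a "kind" key plus the constructor arguments) is an assumption and not an MLflow format. It flags text inputs that have no limits, and categorical inputs that have fewer than two options or duplicate options. Whether the options are truly mutually exclusive and comprehensive still requires human judgment.

```python
def lint_input(spec: dict) -> list[str]:
    """Return best-practice warnings for one hypothetical input spec."""
    warnings = []
    kind = spec.get("kind")
    if kind == "InputText" and spec.get("max_length") is None:
        warnings.append("InputText has no max_length")
    if kind == "InputTextList":
        if spec.get("max_count") is None:
            warnings.append("InputTextList has no max_count")
        if spec.get("max_length_each") is None:
            warnings.append("InputTextList has no max_length_each")
    if kind in ("InputCategorical", "InputCategoricalList"):
        options = spec.get("options", [])
        if len(options) < 2:
            warnings.append(f"{kind} should offer at least two options")
        if len(set(options)) != len(options):
            warnings.append(f"{kind} has duplicate options")
    return warnings


# Mirrors suggestions_input from the InputTextList section.
print(lint_input({"kind": "InputTextList", "max_count": 3}))
```

These are warnings rather than errors: an unbounded list, such as the suggestions example, can be a deliberate choice.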