This tutorial uses MLflow prompt optimization with GEPA and GPT-OSS 20B to optimize a simple query-classification prompt for a classification task.
Install dependencies
%pip install --upgrade mlflow databricks-sdk dspy openai
dbutils.library.restartPython()
Make sure you have access to the Databricks Foundation Model APIs. The code below needs them to run successfully.
import mlflow
import openai
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.scorers import Correctness
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Change the catalog and schema to your catalog and schema
catalog = ""
schema = ""
prompt_registry_name = "qa"
prompt_location = f"{catalog}.{schema}.{prompt_registry_name}"
openai_client = w.serving_endpoints.get_open_ai_client()
# Register initial prompt
prompt = mlflow.genai.register_prompt(
    name=prompt_location,
    template="classify this: {{query}}",
)
# Define your prediction function
def predict_fn(query: str) -> str:
    # Load version 1 of the registered prompt
    prompt = mlflow.genai.load_prompt(f"prompts:/{prompt_location}/1")
    completion = openai_client.chat.completions.create(
        model="databricks-gpt-oss-20b",
        # Fill the {{query}} template variable with PromptVersion.format();
        # the keyword must match the placeholder name in the template
        messages=[{"role": "user", "content": prompt.format(query=query)}],
    )
    return completion.choices[0].message.content
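The registered template uses a `{{query}}` placeholder, so the keyword passed to `format()` has to match that name. Below is a minimal sketch of double-brace substitution. It uses a hypothetical `render_template` helper, which is an approximation for illustration and not MLflow's implementation of `PromptVersion.format()`.

```python
import re

# Hypothetical helper approximating {{name}} substitution for illustration;
# it is NOT MLflow's PromptVersion.format().
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, **variables: str) -> str:
    """Replace each {{name}} with variables[name]; raise if any name is missing."""
    missing = [name for name in _PLACEHOLDER.findall(template) if name not in variables]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return _PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


template = "classify this: {{query}}"
print(render_template(template, query="Participants will access the program ..."))

# A keyword that does not match the placeholder name is reported, not ignored
try:
    render_template(template, question="...")
except KeyError as err:
    print(err)
```

This sketch raises on a missing variable so that a name mismatch is visible immediately. Whether the real `format()` behaves the same way is not covered here.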
Test your function
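Observe how the model classifies the input when it uses only the basic prompt. The answer may be accurate, but it is not aligned with the specific task, labels, or output format your use case needs.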
from IPython.display import Markdown
output = predict_fn("The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility for the self-management of their condition , including making physical , emotional and social adjustments.")
Markdown(output[1]['text'])
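The indexing `output[1]['text']` implies that, for this model, `message.content` comes back as a list of parts, not as a plain string, and that the second part holds the answer text. The sketch below is a defensive way to handle either shape. It uses a hypothetical `extract_text` helper and assumes that answer parts carry `"type": "text"` and a `"text"` key. The sample part shapes are illustrative only.

```python
from typing import Any


def extract_text(content: Any) -> str:
    """Return the answer text from a chat message's content.

    Hypothetical helper. Assumes content is either a plain string or a list of
    dict parts in which answer parts have type "text" and a "text" key.
    """
    if isinstance(content, str):
        return content
    texts = [
        part.get("text", "")
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    return "".join(texts)


# Illustrative shapes only; real responses may contain more fields
print(extract_text("BACKGROUND"))
print(extract_text([
    {"type": "reasoning", "summary": []},
    {"type": "text", "text": "BACKGROUND"},
]))
```

With a helper like this, `Markdown(extract_text(output))` does not depend on the answer being at a fixed index.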
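Optimize on your data
Provide expectations and ground truth that help the optimizer shape the model's behavior and outputs to fit your use case.
In this case, you want the model to output exactly one word chosen from five labels: CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, or BACKGROUND. It should output only that word, without any further explanation.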
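The requirement above (exactly one of five labels, and nothing else) can be written as a strict check. The functions `is_valid_label` and `label_matches` below are hypothetical and for illustration only. The notebook itself relies on the `Correctness` scorer and the `expected_facts` in the dataset, not on this code.

```python
# The five labels used by this classification task
LABELS = {"CONCLUSIONS", "RESULTS", "METHODS", "OBJECTIVE", "BACKGROUND"}


def is_valid_label(output: str) -> bool:
    """True if the output, ignoring surrounding whitespace, is exactly one label."""
    return output.strip() in LABELS


def label_matches(output: str, expected: str) -> bool:
    """True if the output is a valid label and equals the expected label."""
    return is_valid_label(output) and output.strip() == expected


print(is_valid_label("METHODS"))
print(is_valid_label("The sentence describes METHODS."))
print(label_matches(" BACKGROUND\n", "BACKGROUND"))
```

The check is deliberately strict: a correct label wrapped in an explanation still fails, which matches the "only that word" requirement.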
# Training data with inputs and expected outputs
dataset = [
    {
        "inputs": {"query": "The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility for the self-management of their condition , including making physical , emotional and social adjustments."},
        "outputs": {"response": "BACKGROUND"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    },
    {
        "inputs": {"query": "This paper describes the design and evaluation of Positive Outlook , an online program aiming to enhance the self-management skills of men living with HIV ."},
        "outputs": {"response": "BACKGROUND"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    },
    {
        "inputs": {"query": "This study is designed as a randomised controlled trial in which men living with HIV in Australia will be assigned to either an intervention group or usual care control group ."},
        "outputs": {"response": "METHODS"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    },
    {
        "inputs": {"query": "The intervention group will participate in the online group program ` Positive Outlook ' ."},
        "outputs": {"response": "METHODS"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    },
    {
        "inputs": {"query": "The program is based on self-efficacy theory and uses a self-management approach to enhance skills , confidence and abilities to manage the psychosocial issues associated with HIV in daily life ."},
        "outputs": {"response": "METHODS"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    },
    {
        "inputs": {"query": "Participants will access the program for a minimum of 90 minutes per week over seven weeks ."},
        "outputs": {"response": "METHODS"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]}
    }
]
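Each record in this dataset has three parts:

- an `inputs` dict whose `query` key lines up with the `query` parameter of `predict_fn`
- the label in `outputs`
- the `expected_facts` that the scorer checks

The hypothetical `check_record` helper below catches shape mistakes before you start a costly optimization run. It is shown on one record copied from the dataset above and on one deliberately malformed record. The stand-in `predict_fn` only mirrors the notebook's signature.

```python
import inspect
from typing import Any, Callable

LABELS = {"CONCLUSIONS", "RESULTS", "METHODS", "OBJECTIVE", "BACKGROUND"}


def check_record(record: dict[str, Any], predict_fn: Callable[..., Any]) -> list[str]:
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    params = set(inspect.signature(predict_fn).parameters)
    input_keys = set(record.get("inputs", {}))
    if input_keys != params:
        problems.append(f"inputs keys {sorted(input_keys)} != predict_fn params {sorted(params)}")
    label = record.get("outputs", {}).get("response")
    if label not in LABELS:
        problems.append(f"unknown label: {label!r}")
    if not record.get("expectations", {}).get("expected_facts"):
        problems.append("missing expected_facts")
    return problems


def predict_fn(query: str) -> str:
    """Stand-in with the same signature as the notebook's predict_fn."""
    return "METHODS"


records = [
    {
        "inputs": {"query": "Participants will access the program for a minimum of 90 minutes per week over seven weeks ."},
        "outputs": {"response": "METHODS"},
        "expectations": {"expected_facts": ["Classification label must be 'CONCLUSIONS', 'RESULTS', 'METHODS', 'OBJECTIVE', 'BACKGROUND'"]},
    },
    # Deliberately malformed: wrong input key, unknown label, no expectations
    {"inputs": {"question": "..."}, "outputs": {"response": "INTRO"}},
]
for r in records:
    print(check_record(r, predict_fn))
```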
# Optimize the prompt
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="databricks:/databricks-claude-sonnet-4-5"),
    scorers=[Correctness(model="databricks:/databricks-gpt-5")],
)
# Use the optimized prompt
optimized_prompt = result.optimized_prompts[0]
print(f"Optimized template: {optimized_prompt.template}")
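Roughly speaking, `optimize_prompts` runs `predict_fn` over the training inputs, scores the outputs with the given scorers, and uses those scores to arrive at a better template. GEPA contributes the proposal step, using the reflection model to analyze results and suggest revised prompts.

The sketch below is NOT GEPA. It is a deliberately simplified loop that scores a fixed list of candidate templates with a stand-in model and keeps the best one, to illustrate only the evaluate-and-select part. `fake_model`, `score`, `evaluate`, and the candidate templates are all hypothetical.

```python
from typing import Callable


def fake_model(prompt_text: str) -> str:
    """Stand-in for the LLM: answers tersely only when told to use one label."""
    if "exactly one label" in prompt_text:
        return "METHODS"
    return "This sentence appears to describe the study's methods."


def score(output: str, expected: str) -> float:
    """1.0 for an exact label match, else 0.0."""
    return 1.0 if output.strip() == expected else 0.0


def evaluate(template: str, data: list[tuple[str, str]], model: Callable[[str], str]) -> float:
    """Mean score of one template over (query, expected_label) pairs."""
    total = sum(score(model(template.replace("{{query}}", q)), y) for q, y in data)
    return total / len(data)


candidates = [
    "classify this: {{query}}",
    "Classify the sentence into exactly one label from "
    "CONCLUSIONS, RESULTS, METHODS, OBJECTIVE, BACKGROUND. "
    "Reply with the label only.\n\nSentence: {{query}}",
]
train = [
    ("Participants will access the program for a minimum of 90 minutes per week over seven weeks .", "METHODS"),
    ("The intervention group will participate in the online group program ` Positive Outlook ' .", "METHODS"),
]

scores = {t: evaluate(t, train, fake_model) for t in candidates}
best = max(candidates, key=scores.get)
print(scores)
print("best template:", best)
```

A real optimizer also generates new candidates instead of picking from a fixed list. That proposal step is where GEPA and the reflection model do their work.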
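View the prompt
Open the link to the MLflow experiment and complete the following steps to make the prompts appear in the experiment:
- Make sure the experiment type is set to GenAI apps & agents
- Navigate to the Prompts tab
- Click Select schema in the upper-right corner and enter the same schema you set above to view the prompt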
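In code, the prompts listed in that tab are addressed by URIs of the form `prompts:/<catalog>.<schema>.<name>/<version>`, as in the `load_prompt` calls in this notebook. The hypothetical helper below builds such a URI. It rejects empty parts, which catches the case where the `catalog` and `schema` placeholders were never filled in. The values `"main"` and `"default"` are example names only.

```python
def prompt_uri(catalog: str, schema: str, name: str, version: int) -> str:
    """Build a prompt URI like the ones passed to load_prompt.

    Hypothetical helper for illustration; raises if any location part is
    empty or the version is not a positive integer.
    """
    parts = {"catalog": catalog, "schema": schema, "name": name}
    empty = [key for key, value in parts.items() if not value]
    if empty:
        raise ValueError(f"empty prompt location parts: {empty}")
    if version < 1:
        raise ValueError("version must be >= 1")
    return f"prompts:/{catalog}.{schema}.{name}/{version}"


# Example catalog/schema names; use the ones you set at the top of the notebook
print(prompt_uri("main", "default", "qa", 1))
```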
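Load the new prompt and test again
Look at what the optimized prompt looks like, then load it into the prediction function to see how the model's output changes.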
from IPython.display import Markdown
# Load the optimized version (or use the version number shown in the Prompts tab)
prompt = mlflow.genai.load_prompt(optimized_prompt.uri)
Markdown(prompt.template)
def predict_fn(query: str) -> str:
    # Load the optimized prompt version (or use the version number shown in the Prompts tab)
    prompt = mlflow.genai.load_prompt(optimized_prompt.uri)
    completion = openai_client.chat.completions.create(
        model="databricks-gpt-oss-20b",
        # Fill the {{query}} template variable with PromptVersion.format()
        messages=[{"role": "user", "content": prompt.format(query=query)}],
    )
    return completion.choices[0].message.content
output = predict_fn("The emergence of HIV as a chronic condition means that people living with HIV are required to take more responsibility for the self-management of their condition , including making physical , emotional and social adjustments.")
Markdown(output[1]['text'])
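To compare the original and optimized prompts on more than one sentence, you can compute label accuracy for each prediction function over labeled examples. The sketch below assumes each prediction function returns the label as plain text, after any text extraction. `accuracy`, `baseline_predict`, and `optimized_predict` are hypothetical. The two predictors are stand-ins so the code runs without calling an endpoint, and their outputs are invented for illustration, saying nothing about the real model.

```python
from typing import Callable


def accuracy(predict: Callable[[str], str], examples: list[tuple[str, str]]) -> float:
    """Fraction of examples whose prediction exactly equals the expected label."""
    hits = sum(predict(query).strip() == label for query, label in examples)
    return hits / len(examples)


examples = [
    ("This paper describes the design and evaluation of Positive Outlook , an online program aiming to enhance the self-management skills of men living with HIV .", "BACKGROUND"),
    ("Participants will access the program for a minimum of 90 minutes per week over seven weeks .", "METHODS"),
]


# Stand-ins: in the notebook these would be the baseline and optimized predict_fn
def baseline_predict(query: str) -> str:
    return "This sentence describes part of a study."


def optimized_predict(query: str) -> str:
    return "METHODS" if "minutes per week" in query else "BACKGROUND"


print("baseline:", accuracy(baseline_predict, examples))
print("optimized:", accuracy(optimized_predict, examples))
```

Because the cell above redefines `predict_fn`, keep a reference to the baseline version under another name if you want to run this comparison in the notebook.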
Example notebook
The following runnable notebook uses the GEPA prompt optimization in MLflow GenAI to optimize a prompt and demonstrates a classification task with GPT-OSS 20B.