MLflow Tracing integrates with a range of popular generative AI libraries and frameworks, offering a one-line automatic tracing experience for all of them. This gives you immediate observability into your GenAI application with minimal setup.
Automatic tracing captures your application's logic and intermediate steps, such as LLM calls, tool usage, and agent interactions, based on how the specific library or SDK is implemented.
For a deeper dive into how automatic tracing works, its prerequisites, and examples of combining it with manual tracing, see the main Automatic Tracing guide. The quick examples below highlight some of the top integrations. Detailed guides for every supported library, covering prerequisites, advanced examples, and library-specific behavior, are available on their respective pages in this section.
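As a quick illustration of combining the two approaches, the minimal sketch below wraps an auto-traced OpenAI call in a manual parent span via the mlflow.trace decorator; the answer_question function and its prompt are illustrative assumptions, not taken from any integration guide:
import mlflow
import openai

# Enable auto-tracing; the LLM call below becomes a child span automatically
mlflow.openai.autolog()

@mlflow.trace  # Manual parent span wrapping the application logic
def answer_question(question: str) -> str:
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

answer_question("What is MLflow Tracing?")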
Overview of top integrations
Below are quickstart examples for some of the most commonly used integrations. Click a tab to see a basic usage example. For detailed prerequisites and more advanced scenarios, visit each integration's dedicated page (linked from the tabs or the list below).
OpenAI
import mlflow
import openai
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
# Enable auto-tracing for OpenAI
mlflow.openai.autolog()
# Set up MLflow tracking
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/openai-tracing-demo")
openai_client = openai.OpenAI()
messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.1,
    max_tokens=100,
)
# View trace in MLflow UI
LangChain
import mlflow
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.langchain.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langchain-tracing-demo")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7, max_tokens=1000)
prompt = PromptTemplate.from_template("Tell me a joke about {topic}.")
chain = prompt | llm | StrOutputParser()
chain.invoke({"topic": "artificial intelligence"})
# View trace in MLflow UI
LangGraph
import mlflow
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.langchain.autolog() # LangGraph uses LangChain's autolog
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langgraph-tracing-demo")
@tool
def get_weather(city: str):
    """Use this to get weather information."""
    return f"It might be cloudy in {city}"
llm = ChatOpenAI(model="gpt-4o-mini")
graph = create_react_agent(llm, [get_weather])
result = graph.invoke({"messages": [("user", "what is the weather in sf?")]})
# View trace in MLflow UI
Anthropic
import mlflow
import anthropic
import os
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.anthropic.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/anthropic-tracing-demo")
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
# View trace in MLflow UI
DSPy
import mlflow
import dspy
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.dspy.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/dspy-tracing-demo")
lm = dspy.LM("openai/gpt-4o-mini") # Assumes OPENAI_API_KEY is set
dspy.configure(lm=lm)
class SimpleSignature(dspy.Signature):
    input_text: str = dspy.InputField()
    output_text: str = dspy.OutputField()
program = dspy.Predict(SimpleSignature)
result = program(input_text="Summarize MLflow Tracing.")
# View trace in MLflow UI
Databricks
import mlflow
import os
from openai import OpenAI # Databricks FMAPI uses OpenAI client
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.openai.autolog() # Traces Databricks FMAPI via OpenAI client
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/databricks-fmapi-tracing")
client = OpenAI(
    api_key=os.environ.get("DATABRICKS_TOKEN"),
    base_url=f"{os.environ.get('DATABRICKS_HOST')}/serving-endpoints",
)
response = client.chat.completions.create(
    model="databricks-llama-4-maverick",
    messages=[{"role": "user", "content": "Key features of MLflow?"}],
)
# View trace in MLflow UI
Bedrock
import mlflow
import boto3
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.bedrock.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/bedrock-tracing-demo")
bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",  # Replace with your region
)
# The Converse API expects message content as a list of content blocks
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Hello World in one line."}]}],
)
# View trace in MLflow UI
AutoGen
import mlflow
from autogen import ConversableAgent
import os
# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
mlflow.autogen.autolog()
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/autogen-tracing-demo")
config_list = [{"model": "gpt-4o-mini", "api_key": os.environ.get("OPENAI_API_KEY")}]
assistant = ConversableAgent("assistant", llm_config={"config_list": config_list})
user_proxy = ConversableAgent("user_proxy", human_input_mode="NEVER", code_execution_config=False)
user_proxy.initiate_chat(assistant, message="What is 2+2?")
# View trace in MLflow UI
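Every example above ends by viewing the trace in the MLflow UI, but traces can also be fetched programmatically. Below is a minimal sketch using mlflow.search_traces, assuming the experiment path from the OpenAI example above:
import mlflow

mlflow.set_tracking_uri("databricks")

# Look up the demo experiment and fetch its most recent traces as a DataFrame
experiment = mlflow.get_experiment_by_name("/Shared/openai-tracing-demo")
traces = mlflow.search_traces(
    experiment_ids=[experiment.experiment_id],
    max_results=5,
)
print(traces.head())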
Secure API key management
For production environments, Databricks recommends using AI Gateway or Databricks secrets to manage API keys. AI Gateway is the preferred approach and provides additional governance capabilities.
Warning
Never commit API keys directly in code or notebooks. Always use AI Gateway or Databricks secrets for sensitive credentials.
Databricks secrets
To manage API keys using Databricks secrets:
Create a secret scope and store your API key:
from databricks.sdk import WorkspaceClient

# Set your secret scope and key names
secret_scope_name = "llm-secrets"  # Choose an appropriate scope name
secret_key_name = "api-key"  # Choose an appropriate key name

# Create the secret scope and store your API key
w = WorkspaceClient()
w.secrets.create_scope(scope=secret_scope_name)
w.secrets.put_secret(
    scope=secret_scope_name,
    key=secret_key_name,
    string_value="your-api-key-here",  # Replace with your actual API key
)
Retrieve and use the secret in your code:
import mlflow
import openai
import os

# Configure your secret scope and key names
secret_scope_name = "llm-secrets"
secret_key_name = "api-key"

# Retrieve the API key from Databricks secrets
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(
    scope=secret_scope_name, key=secret_key_name
)

# Enable automatic tracing
mlflow.openai.autolog()

# Use OpenAI client with securely managed API key
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MLflow Tracing"}],
    max_tokens=100,
)
Enabling multiple auto-tracing integrations
Because GenAI applications often combine multiple libraries, MLflow Tracing lets you enable automatic tracing for several integrations at once, providing a unified tracing experience.
For example, to enable tracing for both LangChain and direct OpenAI usage:
import mlflow
# Enable MLflow Tracing for both LangChain and OpenAI
mlflow.langchain.autolog()
mlflow.openai.autolog()
# Your code using both LangChain and OpenAI directly...
# ... an example can be found on the Automatic Tracing page ...
MLflow generates a single unified trace that combines the steps from both the LangChain and direct OpenAI LLM calls, letting you inspect the complete flow. More examples of combined integrations can be found on the Automatic Tracing page.
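For illustration, here is a minimal sketch of such a combined flow; the summarize_and_title function and its prompts are assumptions for demonstration, and wrapping both calls in a single mlflow.trace parent span is one way to ensure they land in the same trace:
import mlflow
import openai
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

mlflow.langchain.autolog()
mlflow.openai.autolog()

@mlflow.trace  # Parent span so both calls appear in one trace
def summarize_and_title(text: str) -> str:
    # LangChain chain call, captured by the LangChain integration
    chain = (
        PromptTemplate.from_template("Summarize in one sentence: {text}")
        | ChatOpenAI(model="gpt-4o-mini")
        | StrOutputParser()
    )
    summary = chain.invoke({"text": text})
    # Direct OpenAI call, captured by the OpenAI integration
    client = openai.OpenAI()
    title = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Write a short title for: {summary}"}],
    )
    return title.choices[0].message.content

summarize_and_title("MLflow Tracing provides observability for GenAI applications.")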
Disabling automatic tracing
Automatic tracing for any specific library can be disabled by calling mlflow.<library>.autolog(disable=True).
To disable all autologging integrations at once, use mlflow.autolog(disable=True).
import mlflow
# Disable for a specific library
mlflow.openai.autolog(disable=True)
# Disable all autologging
mlflow.autolog(disable=True)