Tracing Haystack

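Haystack is an open-source AI orchestration framework for building production-ready LLM applications, semantic search systems, and question-answering systems.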

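MLflow Tracing provides automatic tracing for Haystack. Enable it by calling the mlflow.haystack.autolog function. Whenever pipelines and components are invoked, traces are automatically logged to the active MLflow experiment.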

import mlflow

mlflow.haystack.autolog()

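MLflow Tracing automatically captures the following information about Haystack pipeline runs:

Pipelines and components
Latency
Component metadata
Token usage and cost
Cache hit information
Any exceptions raised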
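To make the latency field concrete, the sketch below computes per-span latency from a list of spans. It assumes each span exposes `name`, `start_time_ns`, and `end_time_ns`, as MLflow span objects do. The helper name `span_latencies_ms` and the sample spans are hypothetical and stand in for the spans of a real Haystack trace.

```python
from types import SimpleNamespace


def span_latencies_ms(spans):
    """Return (span name, latency in milliseconds) pairs, slowest first."""
    rows = []
    for span in spans:
        # Skip spans that have not finished yet (no end timestamp).
        if span.end_time_ns is None:
            continue
        rows.append((span.name, (span.end_time_ns - span.start_time_ns) / 1e6))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical stand-ins for the spans of a real trace.
sample_spans = [
    SimpleNamespace(name="retriever", start_time_ns=0, end_time_ns=2_000_000),
    SimpleNamespace(name="llm", start_time_ns=2_000_000, end_time_ns=52_000_000),
]

for name, latency in span_latencies_ms(sample_spans):
    print(f"{name}: {latency:.1f} ms")
```

With a real trace, you would pass `mlflow.get_trace(trace_id).data.spans` instead of the sample list.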

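Note

On serverless compute clusters, autologging is not enabled automatically. You must explicitly call mlflow.haystack.autolog() to enable automatic tracing for this integration.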

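Prerequisites

To use MLflow Tracing with Haystack, install MLflow and the Haystack package.

Development

For development environments, install the full MLflow package with the Databricks extras, along with the Haystack package: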

pip install --upgrade "mlflow[databricks]>=3.1" haystack-ai

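The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

Production

For production deployments, install the mlflow-tracing package and the Haystack package: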

pip install --upgrade mlflow-tracing haystack-ai

The mlflow-tracing package is optimized for production use.

Note

MLflow 3 is strongly recommended for the best tracing experience with Haystack.
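As a quick sanity check on the installed version, the following sketch looks up whichever of the two distributions is present and reports whether it is MLflow 3 or later. It uses only the standard library. The helper names `installed_version` and `numeric_prefix` are hypothetical, and the version parsing is deliberately simple, so treat it as an illustration and not a full version parser.

```python
from importlib import metadata


def installed_version(candidates=("mlflow", "mlflow-tracing")):
    """Return (distribution name, version) for the first installed candidate."""
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def numeric_prefix(version, width=2):
    """Parse the leading numeric components, e.g. '3.1.0rc1' -> (3, 1)."""
    parts = []
    for piece in version.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


found = installed_version()
if found is None:
    print("Neither mlflow nor mlflow-tracing is installed")
else:
    name, version = found
    status = "OK" if numeric_prefix(version)[0] >= 3 else "older than MLflow 3"
    print(f"{name} {version}: {status}")
```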

Before running the examples, configure your environment:

For users outside Databricks notebooks: Set your Databricks environment variables:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

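For users inside Databricks notebooks: These credentials are set for you automatically.

API keys: Make sure your LLM provider API keys are configured. For production environments, use Mosaic AI Gateway or Databricks secrets instead of hard-coded values to manage API keys more securely.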

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
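Before running a pipeline, it can help to fail fast when a variable is missing. The sketch below checks the three variables named above. The helper `missing_env_vars` is a hypothetical name, and the list of required variables is an assumption for the setup outside a Databricks notebook; inside a notebook, the Databricks credentials are already provided.

```python
import os

# Assumed set of variables for running outside a Databricks notebook.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Environment looks configured")
```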

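Example usage

The following example shows how to use Haystack with MLflow Tracing. It creates a simple RAG (retrieval-augmented generation) pipeline consisting of a retriever, a prompt builder, and a language model.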

import mlflow
import os

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Document

# Ensure your OPENAI_API_KEY is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enable auto tracing for Haystack
mlflow.haystack.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/haystack-tracing-demo")

# Create a simple document store with sample documents
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
    Document(content="Rome is the capital of Italy."),
])

# Build a simple RAG pipeline
template = """
Given the following documents, answer the question.

Documents:
{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:
"""

pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

# Run the pipeline - trace will be automatically logged
result = pipe.run({
    "retriever": {"query": "What is the capital of France?"},
    "prompt_builder": {"question": "What is the capital of France?"}
})

print(result["llm"]["replies"][0])

Warning

For production environments, use Mosaic AI Gateway or Databricks secrets instead of hard-coded values to manage API keys more securely.
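One way to avoid hard-coded keys is to read the key from a secret store at startup and place it in the environment. The sketch below takes the secret getter as a parameter so it can run anywhere. In a Databricks notebook the getter would be `dbutils.secrets.get`. The helper name `export_api_key` and the scope and key names in the comment are hypothetical.

```python
import os


def export_api_key(get_secret, scope, key, env_var="OPENAI_API_KEY", env=None):
    """Fetch a secret through get_secret and place it in the environment."""
    env = os.environ if env is None else env
    value = get_secret(scope=scope, key=key)
    if not value:
        raise ValueError(f"Secret {scope}/{key} is empty or missing")
    env[env_var] = value
    return env_var


# In a Databricks notebook, the call might look like this:
#   export_api_key(dbutils.secrets.get, scope="llm-keys", key="openai")
# "llm-keys" and "openai" are hypothetical names; use your own scope and key.
```

Passing the getter in also keeps the function easy to exercise with a stub in unit tests.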

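Token usage tracking

With MLflow 3.4.0 or later, MLflow automatically tracks token usage for Haystack pipelines. The token usage information includes input tokens, output tokens, and the total number of tokens used during pipeline execution.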

import mlflow

mlflow.haystack.autolog()

from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator

# Create and run a pipeline
pipe = Pipeline()
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))

# Run the pipeline and retrieve trace information
with mlflow.start_span(name="haystack_pipeline_run") as span:
    result = pipe.run({"llm": {"prompt": "What is the capital of France?"}})
    print(result["llm"]["replies"][0])

# Token usage is automatically logged and visible in the MLflow UI
trace_id = mlflow.get_last_active_trace_id()
print(f"Trace ID: {trace_id}")

Token usage details are displayed in the MLflow trace UI, so you can monitor and optimize the performance and cost of your pipelines.
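Token usage can also be read programmatically. The sketch below assumes the aggregated counts are available as `trace.info.token_usage`, a mapping with `input_tokens`, `output_tokens`, and `total_tokens` keys. Verify this against your MLflow version. The helper names and the sample counts are hypothetical.

```python
def format_token_usage(usage):
    """Render a token usage mapping as a one-line summary."""
    if not usage:
        return "no token usage recorded"
    return (
        f"input={usage.get('input_tokens', 0)} "
        f"output={usage.get('output_tokens', 0)} "
        f"total={usage.get('total_tokens', 0)}"
    )


def last_trace_token_usage():
    """Return token usage for the most recent trace, or None if unavailable."""
    import mlflow  # imported lazily so the formatter works without MLflow

    trace_id = mlflow.get_last_active_trace_id()
    if trace_id is None:
        return None
    trace = mlflow.get_trace(trace_id)
    # Assumption: trace.info.token_usage holds the aggregated counts.
    return trace.info.token_usage if trace is not None else None


# Hypothetical counts, shown only to illustrate the output format.
print(format_token_usage({"input_tokens": 42, "output_tokens": 7, "total_tokens": 49}))
```

After a pipeline run, `print(format_token_usage(last_trace_token_usage()))` would summarize the most recent trace.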

Disable auto-tracing

Auto-tracing for Haystack can be disabled globally by calling mlflow.haystack.autolog(disable=True) or mlflow.autolog(disable=True).
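If tracing needs to be switched per environment, the `disable` flag can be driven by configuration. The sketch below reads a hypothetical `HAYSTACK_TRACING` environment variable. This variable is an assumption made for illustration and is not something MLflow itself reads. The result is passed to `mlflow.haystack.autolog`.

```python
import os


def tracing_enabled(env=None, var="HAYSTACK_TRACING"):
    """Tracing stays on unless the variable is set to 0, false, or off."""
    env = os.environ if env is None else env
    return env.get(var, "1").strip().lower() not in {"0", "false", "off"}


def configure_haystack_tracing(env=None):
    """Enable or disable Haystack auto-tracing based on the environment."""
    import mlflow  # imported lazily; requires mlflow or mlflow-tracing

    enabled = tracing_enabled(env)
    mlflow.haystack.autolog(disable=not enabled)
    return enabled


print(f"Haystack tracing requested: {tracing_enabled()}")
```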

Last updated on 2026-04-20
