Tracing LlamaIndex

LlamaIndex tracing via autolog

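LlamaIndex is an open-source framework for building agentic generative AI applications that let large language models work with your data in any format.

MLflow Tracing provides automatic tracing for LlamaIndex. Enable it by calling the mlflow.llama_index.autolog function. Once enabled, nested traces are logged automatically to the active MLflow experiment whenever LlamaIndex engines and workflows are invoked.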

import mlflow

mlflow.llama_index.autolog()

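Note

On serverless compute clusters, autologging is not enabled automatically. You must explicitly call mlflow.llama_index.autolog() to enable automatic tracing for this integration.

Tip

The MLflow LlamaIndex integration is not only about tracing. MLflow offers a full tracking experience for LlamaIndex, including model tracking, index management, and evaluation.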

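Prerequisites

To use MLflow Tracing with LlamaIndex, you need to install MLflow and the llama-index library.

Development

For development environments, install the full MLflow package with the Databricks extras, along with llama-index: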

pip install --upgrade "mlflow[databricks]>=3.1" llama-index

The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

Production

For production deployments, install mlflow-tracing and llama-index:

pip install --upgrade mlflow-tracing llama-index

The mlflow-tracing package is optimized for production use.

Note

MLflow 3 is highly recommended for the best tracing experience with LlamaIndex.
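As a quick sanity check after installation, the sketch below reports which of the packages named above are installed and flags an MLflow major version older than 3. The helper names `installed_version` and `major_version` are illustrative and not part of MLflow; only the Python standard library is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version such as '3.1.0'."""
    return int(version_string.split(".")[0])


# Either the full package or the tracing-only package may be present.
for name in ("mlflow", "mlflow-tracing", "llama-index"):
    found = installed_version(name)
    print(f"{name}: {found or 'not installed'}")
    if found and name.startswith("mlflow") and major_version(found) < 3:
        print(f"  {name} {found} is older than MLflow 3; consider upgrading.")
```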

Before running the examples, configure your environment:

For users outside Databricks notebooks: set your Databricks environment variables:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

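For users inside Databricks notebooks: these credentials are set for you automatically.

API keys: make sure your LLM provider API keys are configured. For production environments, use Mosaic AI Gateway or Databricks secrets instead of hardcoded values to manage API keys securely.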

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
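Because a missing variable usually surfaces only as an authentication error at query time, it can help to check the environment up front. A minimal sketch, assuming the variable names used above; `missing_env_vars` is a hypothetical helper, not an MLflow or Databricks API. Inside a Databricks notebook the two `DATABRICKS_*` variables are not needed, so only the provider key would be checked there.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names taken from the export commands above.
required = ["DATABRICKS_HOST", "DATABRICKS_TOKEN", "OPENAI_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```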

Example usage

First, download some test data to build a small demo index:

!mkdir -p data
!curl -L https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -o ./data/paul_graham_essay.txt

Load the documents into a simple in-memory vector index:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

Now you can enable LlamaIndex auto-tracing and start querying the index:

import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling tracing for LlamaIndex
mlflow.llama_index.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/llamaindex-demo")

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What was the first program the author wrote?")

Warning

For production environments, use Mosaic AI Gateway or Databricks secrets instead of hardcoded values to manage API keys securely.
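After the query runs, the recorded trace can also be inspected programmatically rather than only in the MLflow UI. The following is a sketch under the assumption that the full `mlflow` package (version 3 or later) is installed, where `mlflow.get_last_active_trace_id()` and `mlflow.get_trace()` are available. `describe_spans` and `print_last_trace` are illustrative helper names, and the span names and types you see depend on the LlamaIndex components involved.

```python
from typing import Iterable, List


def describe_spans(spans: Iterable) -> List[str]:
    """Format each span as 'name [type]' using its name and span_type attributes."""
    return [f"{span.name} [{span.span_type}]" for span in spans]


def print_last_trace() -> None:
    """Print the spans of the most recent trace in this process, if any."""
    import mlflow  # imported here so the formatter above has no MLflow dependency

    trace_id = mlflow.get_last_active_trace_id()
    if trace_id is None:
        print("No trace has been recorded yet.")
        return
    trace = mlflow.get_trace(trace_id)
    if trace is None:
        print(f"Trace {trace_id} could not be retrieved.")
        return
    for line in describe_spans(trace.data.spans):
        print(line)
```

Calling `print_last_trace()` right after `query_engine.query(...)` lists the nested spans recorded for that query.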

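LlamaIndex workflow

Workflow is LlamaIndex's next-generation GenAI orchestration framework. It is designed to be a flexible and interpretable framework for building arbitrary LLM applications such as agents, RAG flows, and data extraction pipelines. MLflow supports tracking, evaluating, and tracing Workflow objects, which makes them more observable and maintainable.

Automatic tracing for LlamaIndex workflows is enabled by the same mlflow.llama_index.autolog() call.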

Disable auto-tracing

Auto-tracing for LlamaIndex can be disabled globally by calling mlflow.llama_index.autolog(disable=True) or mlflow.autolog(disable=True).
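When tracing should be suspended only around a specific block of code, the disable call can be wrapped in a context manager. This is a sketch, not an MLflow feature: `tracing_paused` is a hypothetical helper, and it assumes tracing was previously enabled with default arguments, since it re-enables the integration by calling the autolog function with no arguments.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def tracing_paused(autolog: Callable[..., None]) -> Iterator[None]:
    """Disable an autolog integration for the duration of a with-block.

    On exit the integration is re-enabled by calling autolog() with no
    arguments, so any custom options passed earlier are not restored.
    """
    autolog(disable=True)
    try:
        yield
    finally:
        autolog()


# Intended usage, assuming mlflow is installed and a query engine exists:
#
#     import mlflow
#     with tracing_paused(mlflow.llama_index.autolog):
#         query_engine.query("...")  # this call is not traced
```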

Last updated on 2026-01-26
