Quickstart: Generative search (RAG) with grounding data from Azure AI Search
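This quickstart shows you how to send basic and complex queries to a large language model (LLM) for a conversational search experience over your indexed content in Azure AI Search. You use the Azure portal to set up resources, and then run Python code to call the APIs.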
Prerequisites
An Azure subscription. Create a trial subscription.
Azure AI Search. Its region must be the same as the one used by Azure OpenAI.
An Azure OpenAI resource in the same region as Azure AI Search, with gpt-4o, gpt-4o-mini, or an equivalent LLM deployed.
Visual Studio Code with the Python extension and the Jupyter package. For more information, see Python in Visual Studio Code.
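Download file
Download a Jupyter notebook from GitHub to send the requests in this quickstart. For more information, see Downloading files from GitHub.
You can also start a new file on your local system and create the requests manually by following the instructions in this article.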
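Configure access
Requests to the search endpoint must be authenticated and authorized. You can use API keys or roles for this task. Keys are easier to start with, but roles are more secure. This quickstart uses roles.
You're setting up two clients, so you need permissions on both resources.
Azure AI Search receives the query request from your local system. Assign yourself the Search Index Data Reader role for that task. If you're also creating and loading the hotel sample index, add the Search Service Contributor and Search Index Data Contributor roles as well.
Azure OpenAI receives the query ("Can you recommend a few hotels?") from your local system, along with the search results (sources) from the search service. Assign the Cognitive Services OpenAI User role to yourself and to the search service.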
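Sign in to the Azure portal.
Configure Azure AI Search to use a system-assigned managed identity so that you can give it role assignments:
Find your search service in the Azure portal.
On the left menu, select Settings > Identity.
On the System assigned tab, set the status to On.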
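Configure Azure AI Search for role-based access:
In the Azure portal, find your Azure AI Search service.
On the left menu, select Settings > Keys, and then select either Role-based access control or Both.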
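Assign roles:
On the left menu, select Access control (IAM).
On Azure AI Search, make sure you have permissions to create, load, and query a search index:
- Search Index Data Reader
- Search Index Data Contributor
- Search Service Contributor
On Azure OpenAI, select Access control (IAM) to give yourself and the search service identity permissions on Azure OpenAI. The code for this quickstart runs locally, so requests to Azure OpenAI originate from your system. In addition, search results from the search engine are passed to Azure OpenAI. For these reasons, both you and the search service need permissions on Azure OpenAI:
- Cognitive Services OpenAI User
It can take several minutes for permissions to take effect.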
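If you want to double-check the role assignments above before moving on, the following minimal sketch records which principal needs which role on which resource, as listed in this section, and reports any gaps in a set of role names that you supply yourself. It doesn't query Azure. The `REQUIRED_ROLES` and `missing_roles` names are hypothetical, and the two contributor roles are only needed if you create and load the index yourself.

```python
from typing import Dict, Set, Tuple

# Hypothetical checklist: (principal, resource) -> roles this quickstart expects.
# Role names are the ones listed in this section; nothing here calls Azure.
REQUIRED_ROLES: Dict[Tuple[str, str], Set[str]] = {
    ("you", "Azure AI Search"): {
        "Search Index Data Reader",
        "Search Index Data Contributor",  # needed only to create/load the index
        "Search Service Contributor",  # needed only to create/load the index
    },
    ("you", "Azure OpenAI"): {"Cognitive Services OpenAI User"},
    ("search service identity", "Azure OpenAI"): {"Cognitive Services OpenAI User"},
}


def missing_roles(
    assigned: Dict[Tuple[str, str], Set[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Return, per (principal, resource), the required roles absent from `assigned`."""
    gaps: Dict[Tuple[str, str], Set[str]] = {}
    for key, required in REQUIRED_ROLES.items():
        absent = required - assigned.get(key, set())
        if absent:
            gaps[key] = absent
    return gaps


# Example: only the reader role on the search service has been assigned so far.
print(missing_roles({("you", "Azure AI Search"): {"Search Index Data Reader"}}))
```

You could fill in `assigned` by hand from the Role assignments tab of each resource's Access control (IAM) page.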
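Create an index
We recommend the hotels-sample-index, which can be created in minutes and runs on any search service tier. This index is created by using built-in sample data.
Find your search service in the Azure portal.
On the Overview home page, select Import data to start the wizard.
On the Connect to your data page, select Samples from the dropdown list.
Choose hotels-sample.
Select Next through the remaining pages, accepting the default values.
After the index is created, select Search management > Indexes from the left menu to open the index.
Select Edit JSON.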
Search for "semantic" to find the semantic configuration section in the index. Replace the empty "semantic": {} line with the following semantic configuration. This example specifies a "defaultConfiguration", which is important for running this quickstart.
"semantic": {
  "defaultConfiguration": "semantic-config",
  "configurations": [
    {
      "name": "semantic-config",
      "prioritizedFields": {
        "titleField": { "fieldName": "HotelName" },
        "prioritizedContentFields": [ { "fieldName": "Description" } ],
        "prioritizedKeywordsFields": [ { "fieldName": "Category" }, { "fieldName": "Tags" } ]
      }
    }
  ]
},
Save your changes.
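If you prefer to edit a saved copy of the index definition instead of typing into the portal editor, this minimal sketch makes the same replacement with Python's standard json module. It assumes you've copied the index JSON from Edit JSON into a string; `add_semantic_config` is a hypothetical helper, and you would still paste its result back into the editor and save.

```python
import json

# The semantic configuration from the step above, expressed as a Python dict.
SEMANTIC_CONFIG = {
    "defaultConfiguration": "semantic-config",
    "configurations": [
        {
            "name": "semantic-config",
            "prioritizedFields": {
                "titleField": {"fieldName": "HotelName"},
                "prioritizedContentFields": [{"fieldName": "Description"}],
                "prioritizedKeywordsFields": [
                    {"fieldName": "Category"},
                    {"fieldName": "Tags"},
                ],
            },
        }
    ],
}


def add_semantic_config(index_json: str) -> str:
    """Return the index definition with its empty "semantic" section filled in.

    Only an empty or missing "semantic" section is replaced, so an existing
    configuration is never silently overwritten.
    """
    index = json.loads(index_json)
    if index.get("semantic"):
        raise ValueError("index already has a semantic configuration")
    index["semantic"] = SEMANTIC_CONFIG
    return json.dumps(index, indent=2)


print(add_semantic_config('{"name": "hotels-sample-index", "semantic": {}}'))
```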
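Run the following query in Search Explorer to test your index: complimentary breakfast.
The output should look similar to the following example. Results returned directly from the search engine consist of fields and their verbatim values, along with metadata such as a search score, plus a semantic ranking score and captions if you use the semantic ranker. We used a select statement to return only the HotelName, Description, and Tags fields.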
{
  "@odata.count": 18,
  "@search.answers": [],
  "value": [
    {
      "@search.score": 2.2896252,
      "@search.rerankerScore": 2.506816864013672,
      "@search.captions": [
        {
          "text": "Head Wind Resort. Suite. coffee in lobby\r\nfree wifi\r\nview. The best of old town hospitality combined with views of the river and cool breezes off the prairie. Our penthouse suites offer views for miles and the rooftop plaza is open to all guests from sunset to 10 p.m. Enjoy a **complimentary continental breakfast** in the lobby, and free Wi-Fi throughout the hotel..",
          "highlights": ""
        }
      ],
      "HotelName": "Head Wind Resort",
      "Description": "The best of old town hospitality combined with views of the river and cool breezes off the prairie. Our penthouse suites offer views for miles and the rooftop plaza is open to all guests from sunset to 10 p.m. Enjoy a complimentary continental breakfast in the lobby, and free Wi-Fi throughout the hotel.",
      "Tags": [ "coffee in lobby", "free wifi", "view" ]
    },
    {
      "@search.score": 2.2158256,
      "@search.rerankerScore": 2.288334846496582,
      "@search.captions": [
        {
          "text": "Swan Bird Lake Inn. Budget. continental breakfast\r\nfree wifi\r\n24-hour front desk service. We serve a continental-style breakfast each morning, featuring a variety of food and drinks. Our locally made, oh-so-soft, caramel cinnamon rolls are a favorite with our guests. Other breakfast items include coffee, orange juice, milk, cereal, instant oatmeal, bagels, and muffins..",
          "highlights": ""
        }
      ],
      "HotelName": "Swan Bird Lake Inn",
      "Description": "We serve a continental-style breakfast each morning, featuring a variety of food and drinks. Our locally made, oh-so-soft, caramel cinnamon rolls are a favorite with our guests. Other breakfast items include coffee, orange juice, milk, cereal, instant oatmeal, bagels, and muffins.",
      "Tags": [ "continental breakfast", "free wifi", "24-hour front desk service" ]
    },
    {
      "@search.score": 0.92481667,
      "@search.rerankerScore": 2.221315860748291,
      "@search.captions": [
        {
          "text": "White Mountain Lodge & Suites. Resort and Spa. continental breakfast\r\npool\r\nrestaurant. Live amongst the trees in the heart of the forest. Hike along our extensive trail system. Visit the Natural Hot Springs, or enjoy our signature hot stone massage in the Cathedral of Firs. Relax in the meditation gardens, or join new friends around the communal firepit. Weekend evening entertainment on the patio features special guest musicians or poetry readings..",
          "highlights": ""
        }
      ],
      "HotelName": "White Mountain Lodge & Suites",
      "Description": "Live amongst the trees in the heart of the forest. Hike along our extensive trail system. Visit the Natural Hot Springs, or enjoy our signature hot stone massage in the Cathedral of Firs. Relax in the meditation gardens, or join new friends around the communal firepit. Weekend evening entertainment on the patio features special guest musicians or poetry readings.",
      "Tags": [ "continental breakfast", "pool", "restaurant" ]
    },
    . . .
  ]
}
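To compare the two scores in a Search Explorer response, you can copy the JSON and summarize it with a few lines of Python. This sketch assumes the response shape shown above: a `value` array whose items carry `@search.score` and, when semantic ranking is used, `@search.rerankerScore`. `summarize_results` is a hypothetical helper, and the sample input is trimmed to the three hits shown above.

```python
import json
from typing import List, Tuple


def summarize_results(response_json: str) -> List[Tuple[str, float, float]]:
    """Return (HotelName, @search.score, @search.rerankerScore) for each hit,
    ordered by reranker score, highest first. Hits without a reranker score
    (no semantic ranking) get 0.0, so the service's own order is preserved."""
    hits = json.loads(response_json)["value"]
    rows = [
        (hit["HotelName"], hit["@search.score"], hit.get("@search.rerankerScore", 0.0))
        for hit in hits
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Scores copied from the sample response above (other fields omitted).
sample = json.dumps({"value": [
    {"@search.score": 2.2896252, "@search.rerankerScore": 2.506816864013672, "HotelName": "Head Wind Resort"},
    {"@search.score": 2.2158256, "@search.rerankerScore": 2.288334846496582, "HotelName": "Swan Bird Lake Inn"},
    {"@search.score": 0.92481667, "@search.rerankerScore": 2.221315860748291, "HotelName": "White Mountain Lodge & Suites"},
]})
for name, score, reranker in summarize_results(sample):
    print(f"{name}: search score {score}, reranker score {reranker}")
```

In the sample above, White Mountain Lodge & Suites has a much lower @search.score (0.92) than the other two hotels, yet its @search.rerankerScore (2.22) is close to theirs, which is the kind of difference the semantic ranker introduces.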
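Get service endpoints
In the remaining sections, you set up API calls to Azure OpenAI and Azure AI Search. Get the service endpoints so that you can provide them as variables in your code.
Sign in to the Azure portal.
Find your search service. On its Overview home page, copy the URL. An example endpoint might look like https://example.search.azure.cn.
Find your Azure OpenAI resource. On its Overview home page, select the link to view the endpoints, and then copy the URL. An example endpoint might look like https://example.openai.azure.com/.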
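Set up the query and chat thread
This section uses Visual Studio Code and Python to call the chat completions API on Azure OpenAI.
Start Visual Studio Code and open the .ipynb file, or create a new Python file.
Install the following Python packages.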
! pip install azure-search-documents==11.6.0b5 --quiet
! pip install azure-identity==1.16.1 --quiet
! pip install openai --quiet
! pip install aiohttp --quiet
! pip install ipykernel --quiet
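After installing, you can confirm that the pinned versions are the ones your notebook kernel actually sees, which helps when VS Code picks a different Python environment than the one pip installed into. This sketch uses only importlib.metadata from the standard library; `PINNED` mirrors the pip commands above, and `check_packages` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions pinned by the pip commands above; None means "any version is fine".
PINNED: Dict[str, Optional[str]] = {
    "azure-search-documents": "11.6.0b5",
    "azure-identity": "1.16.1",
    "openai": None,
    "aiohttp": None,
    "ipykernel": None,
}


def check_packages(pinned: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Report, per package, whether it's installed and matches the pinned version."""
    report: Dict[str, str] = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if expected is None or installed == expected:
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"installed {installed}, expected {expected}"
    return report


for package, status in check_packages(PINNED).items():
    print(f"{package}: {status}")
```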
Set the following variables, replacing the placeholders with the endpoints you collected in the previous step.
AZURE_SEARCH_SERVICE: str = "PUT YOUR SEARCH SERVICE ENDPOINT HERE"
AZURE_OPENAI_ACCOUNT: str = "PUT YOUR AZURE OPENAI ENDPOINT HERE"
AZURE_DEPLOYMENT_MODEL: str = "gpt-4o"
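Rather than editing the placeholders in code, you can keep the endpoints in environment variables. This is a minimal sketch: it assumes environment variables named the same as the Python variables above (a hypothetical convention, not something the quickstart requires), and the hypothetical `load_setting` helper only checks for an unreplaced placeholder or a missing https:// scheme.

```python
import os
from typing import Mapping


def load_setting(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Read an endpoint from the environment and fail fast on obvious mistakes."""
    value = environ.get(name, "").strip()
    if not value or value.startswith("PUT YOUR"):
        raise ValueError(f"{name} is not set; replace the placeholder with your endpoint")
    if not value.startswith("https://"):
        raise ValueError(f"{name} should be an https:// URL, got {value!r}")
    return value


# Usage (assumes you've exported these hypothetical variables beforehand):
# AZURE_SEARCH_SERVICE = load_setting("AZURE_SEARCH_SERVICE")
# AZURE_OPENAI_ACCOUNT = load_setting("AZURE_OPENAI_ACCOUNT")
# AZURE_DEPLOYMENT_MODEL = os.environ.get("AZURE_DEPLOYMENT_MODEL", "gpt-4o")
```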
Set up the clients, the prompt, the query, and the response.
# Set up the query for generating responses
from azure.identity import DefaultAzureCredential
from azure.identity import get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider
)

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name="hotels-sample-index",
    credential=credential
)

# This prompt provides instructions to the model
GROUNDED_PROMPT="""
You are a friendly assistant that recommends hotels based on activities and amenities.
Answer the query using only the sources provided below in a friendly and concise bulleted manner.
Answer ONLY with the facts listed in the list of sources below.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Query is the question being asked. It's sent to the search engine and the LLM.
query="Can you recommend a few hotels with complimentary breakfast?"

# Set up the search results and the chat thread.
# Retrieve the selected fields from the search index related to the question.
search_results = search_client.search(
    search_text=query,
    top=5,
    select="Description,HotelName,Tags"
)
sources_formatted = "\n".join([f'{document["HotelName"]}:{document["Description"]}:{document["Tags"]}' for document in search_results])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=AZURE_DEPLOYMENT_MODEL
)

print(response.choices[0].message.content)
The output comes from Azure OpenAI and consists of recommendations for several hotels. Here's an example of what the output looks like:
Sure! Here are a few hotels that offer complimentary breakfast:
- **Head Wind Resort**
  - Complimentary continental breakfast in the lobby
  - Free Wi-Fi throughout the hotel
- **Double Sanctuary Resort**
  - Continental breakfast included
- **White Mountain Lodge & Suites**
  - Continental breakfast available
- **Swan Bird Lake Inn**
  - Continental-style breakfast each morning with a variety of food and drinks such as caramel cinnamon rolls, coffee, orange juice, milk, cereal, instant oatmeal, bagels, and muffins
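If you get a Forbidden error message, check your Azure AI Search configuration to make sure role-based access is enabled.
If you get an Authorization failed error message, wait a few minutes and try again. It can take several minutes for role assignments to become operational.
Otherwise, to experiment further, change the query and rerun the last step to better understand how the model works with the grounding data.
You can also modify the prompt to change the tone or structure of the output.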
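Because the prompt is an ordinary format string, you can preview exactly what the model will receive, or try a different output structure, without calling either service. In this sketch, `GROUNDED_PROMPT` is copied from the code above, while `TABLE_PROMPT` and `build_messages` are hypothetical examples of a modified prompt and a helper. The sources are formatted the same way as in the quickstart code.

```python
from typing import Dict, List

# Copied from the quickstart code.
GROUNDED_PROMPT = """
You are a friendly assistant that recommends hotels based on activities and amenities.
Answer the query using only the sources provided below in a friendly and concise bulleted manner.
Answer ONLY with the facts listed in the list of sources below.
If there isn't enough information below, say you don't know.
Do not generate answers that don't use the sources below.
Query: {query}
Sources:\n{sources}
"""

# Hypothetical variant: same grounding rules, different output structure.
TABLE_PROMPT = """
You are a friendly assistant that recommends hotels based on activities and amenities.
Answer ONLY with the facts listed in the sources below; if they are not enough, say you don't know.
Format the answer as a table with the columns Hotel and Why it matches.
Query: {query}
Sources:\n{sources}
"""


def build_messages(template: str, query: str, documents: List[Dict]) -> List[Dict[str, str]]:
    """Format search hits like the quickstart code and wrap them in one user message."""
    sources = "\n".join(
        f'{doc["HotelName"]}:{doc["Description"]}:{doc["Tags"]}' for doc in documents
    )
    return [{"role": "user", "content": template.format(query=query, sources=sources)}]


# Offline preview with one hand-written hit based on the sample data:
preview = build_messages(
    TABLE_PROMPT,
    "Can you recommend a few hotels with complimentary breakfast?",
    [{"HotelName": "Head Wind Resort",
      "Description": "Enjoy a complimentary continental breakfast in the lobby.",
      "Tags": ["coffee in lobby", "free wifi", "view"]}],
)
print(preview[0]["content"])

# With live results (search_client, query, openai_client as defined in the quickstart):
# messages = build_messages(TABLE_PROMPT, query, list(search_results))
# response = openai_client.chat.completions.create(messages=messages, model=AZURE_DEPLOYMENT_MODEL)
```

search_client.search returns an iterator that is consumed as you read it, so convert the results with list() if you want to reuse them for several prompt variants.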
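You can also compare results with and without semantic ranking by adding or removing query_type="semantic" in the search_client.search call. Semantic ranking can noticeably improve the relevance of query results and the LLM's ability to return useful information. Experimentation can help you decide whether it makes a difference for your content.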
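One way to run that comparison is to build the search arguments in a small helper and call the search twice. This sketch assumes the SearchClient.search keyword arguments already used in this quickstart (search_text, top, select, query_type); `search_kwargs` is a hypothetical helper. When query_type is "semantic" and no semantic configuration name is passed, the service falls back to the index's defaultConfiguration, which is why that setting matters for this quickstart.

```python
from typing import Any, Dict


def search_kwargs(query: str, use_semantic: bool, top: int = 5) -> Dict[str, Any]:
    """Build keyword arguments for SearchClient.search, with or without semantic ranking."""
    kwargs: Dict[str, Any] = {
        "search_text": query,
        "top": top,
        "select": "Description,HotelName,Tags",
    }
    if use_semantic:
        # Relies on the index's "defaultConfiguration": "semantic-config".
        kwargs["query_type"] = "semantic"
    return kwargs


# Usage with the quickstart's search_client and query:
# for use_semantic in (False, True):
#     hits = search_client.search(**search_kwargs(query, use_semantic))
#     print(use_semantic, [hit["HotelName"] for hit in hits])
```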
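Send a complex RAG query
Azure AI Search supports complex types for nested JSON structures. In the hotels-sample-index, Address is an example of a complex type, comprising Address.StreetAddress, Address.City, Address.StateProvince, Address.PostalCode, and Address.Country. The index also has a complex collection named Rooms for each hotel.
If your index has complex types, your query can provide those fields to the LLM, as long as you first convert the search results to JSON and then pass the JSON to the LLM. The following example adds complex types to the request. The formatting step uses json.dumps to serialize each result, including its nested fields.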
import json

# Query is the question being asked. It's sent to the search engine and the LLM.
query="Can you recommend a few hotels that offer complimentary breakfast? Tell me their description, address, tags, and the rate for one room that sleeps 4 people."

# Set up the search results and the chat thread.
# Retrieve the selected fields from the search index related to the question.
selected_fields = ["HotelName","Description","Address","Rooms","Tags"]
search_results = search_client.search(
    search_text=query,
    top=5,
    select=selected_fields,
    query_type="semantic"
)
# Keep only the selected fields and serialize each result, including nested complex types, as JSON.
sources_filtered = [{field: result[field] for field in selected_fields} for result in search_results]
sources_formatted = "\n".join([json.dumps(source) for source in sources_filtered])

response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted)
        }
    ],
    model=AZURE_DEPLOYMENT_MODEL
)
print(response.choices[0].message.content)
The output comes from Azure OpenAI, and it now includes content from the complex types.
Here are a few hotels that offer complimentary breakfast and have rooms that sleep 4 people:
1. **Head Wind Resort**
- **Description:** The best of old town hospitality combined with views of the river and
cool breezes off the prairie. Enjoy a complimentary continental breakfast in the lobby,
and free Wi-Fi throughout the hotel.
- **Address:** 7633 E 63rd Pl, Tulsa, OK 74133, USA
- **Tags:** Coffee in lobby, free Wi-Fi, view
- **Room for 4:** Suite, 2 Queen Beds (Amenities) - $254.99
2. **Double Sanctuary Resort**
- **Description:** 5-star Luxury Hotel - Biggest Rooms in the city. #1 Hotel in the area
listed by Traveler magazine. Free WiFi, Flexible check in/out, Fitness Center & espresso
in room. Offers continental breakfast.
- **Address:** 2211 Elliott Ave, Seattle, WA 98121, USA
- **Tags:** View, pool, restaurant, bar, continental breakfast
- **Room for 4:** Suite, 2 Queen Beds (Amenities) - $254.99
3. **Swan Bird Lake Inn**
- **Description:** Continental-style breakfast featuring a variety of food and drinks.
Locally made caramel cinnamon rolls are a favorite.
- **Address:** 1 Memorial Dr, Cambridge, MA 02142, USA
- **Tags:** Continental breakfast, free Wi-Fi, 24-hour front desk service
- **Room for 4:** Budget Room, 2 Queen Beds (City View) - $85.99
4. **Gastronomic Landscape Hotel**
- **Description:** Known for its culinary excellence under the management of William Dough,
offers continental breakfast.
- **Address:** 3393 Peachtree Rd, Atlanta, GA 30326, USA
- **Tags:** Restaurant, bar, continental breakfast
- **Room for 4:** Budget Room, 2 Queen Beds (Amenities) - $66.99
...
- **Tags:** Pool, continental breakfast, free parking
- **Room for 4:** Budget Room, 2 Queen Beds (Amenities) - $60.99
Enjoy your stay! Let me know if you need any more information.
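To see how the complex Address and Rooms fields reach the LLM, you can run the serialization step from the code above on a hand-written result. In this sketch, the sample hit is assembled from the Head Wind Resort details in the output above; the Address subfield names come from this section, but the Rooms subfield names (Type, BaseRate) are an assumption about the sample data. `to_json_sources` is a hypothetical helper equivalent to the sources_filtered and sources_formatted lines.

```python
import json
from typing import Dict, Iterable, Sequence

SELECTED_FIELDS = ("HotelName", "Description", "Address", "Rooms", "Tags")


def to_json_sources(results: Iterable[Dict], fields: Sequence[str] = SELECTED_FIELDS) -> str:
    """Keep only the selected fields and emit one JSON object per hit, one per line."""
    return "\n".join(json.dumps({field: hit[field] for field in fields}) for hit in results)


# Hand-written hit for illustration; "@search.score" stands in for search metadata.
hit = {
    "@search.score": 2.29,
    "HotelName": "Head Wind Resort",
    "Description": "Enjoy a complimentary continental breakfast in the lobby.",
    "Address": {
        "StreetAddress": "7633 E 63rd Pl",
        "City": "Tulsa",
        "StateProvince": "OK",
        "PostalCode": "74133",
        "Country": "USA",
    },
    # Assumed Rooms subfield names.
    "Rooms": [{"Type": "Suite, 2 Queen Beds (Amenities)", "BaseRate": 254.99}],
    "Tags": ["coffee in lobby", "free wifi", "view"],
}
print(to_json_sources([hit]))
```

Serializing with json.dumps keeps the nesting explicit as JSON objects and arrays, and selecting only the listed fields drops search metadata such as @search.score before the text reaches the model.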
Troubleshoot errors
To debug authentication errors, insert the following code before the step that calls the search engine and the LLM.
import sys
import logging

# Set the logging level for the azure.identity library
logger = logging.getLogger('azure.identity')
logger.setLevel(logging.DEBUG)

# Print log records to standard output
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('[%(levelname)s %(name)s] %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
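Rerun the query script. You should now see INFO and DEBUG statements in the output that provide more detail about the issue.
If you see output messages about ManagedIdentityCredential and token acquisition failures, you might have multiple tenants, and your Azure sign-in is using a tenant that doesn't contain your search service. To get your tenant ID, search the Azure portal for "tenant properties" or run az account tenant list.
After you have the tenant ID, run az login --tenant <YOUR-TENANT-ID> at a command prompt, and then rerun the script.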
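Clean up
When you're working in your own subscription, it's a good idea at the end of a project to decide whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually, or delete the resource group to delete the entire set of resources.
You can find and manage resources in the Azure portal by using the All resources or Resource groups link in the leftmost pane.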