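Quickstart: Vector search using REST
Learn how to use the Search REST APIs to create, load, and query vectors in Azure AI Search.
In Azure AI Search, a vector store has an index schema that defines vector and nonvector fields, a vector configuration for the algorithms that create the embedding space, and settings on vector field definitions that are used in query requests. The Create Index API creates the vector store.
If you don't have an Azure subscription, create a trial subscription before you begin.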
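Note
This quickstart omits the vectorization step and provides embeddings in the sample documents. If you want to add built-in data chunking and vectorization over your own content, try the Import and vectorize data wizard for an end-to-end walkthrough.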
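Prerequisites
Visual Studio Code with a REST client. If you need help getting started, see Quickstart: Text search using REST.
Azure AI Search, in any region and on any tier. You can use the Free tier for this quickstart, but Basic or higher is recommended for larger data files. Create a search service or find an existing Azure AI Search resource under your current subscription.
Most existing services support vector search. For a small subset of services created before January 2019, an index that contains vector fields fails on creation. In that case, you must create a new service.
(Optional) An Azure OpenAI resource with a deployment of text-embedding-ada-002. The source .rest file includes an optional step for generating new text embeddings, but we provide pregenerated embeddings so that you can omit this dependency.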
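Download files
Download a REST sample from GitHub to send the requests in this quickstart. For more information, see Download files from GitHub.
You can also start a new file on your local system and create the requests manually by following the instructions in this article.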
Get a search service endpoint
You can find the search service endpoint in the Azure portal.
In a later step, you paste this endpoint into the .rest or .http file.
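Configure access
Requests to the search endpoint must be authenticated and authorized. You can use API keys or roles for this task. Keys are easier to start with, but roles are more secure.
For a role-based connection, the following instructions have you connect to Azure AI Search under your own identity, not the identity of a client app.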
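Option 1: Use keys
Select Settings > Keys and then copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one. For more information, see Connect to Azure AI Search using key authentication.
In a later step, you paste this key into the .rest or .http file.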
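Option 2: Use roles
Make sure your search service is configured for role-based access. You must have preconfigured role assignments for developer access. Your role assignments must grant permission to create, load, and query a search index.
In this section, obtain your personal identity token by using the Azure CLI, Azure PowerShell, or the Azure portal.
Sign in to the Azure CLI.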
az cloud set -n AzureChinaCloud
az login
# To switch back to the public Azure cloud, run: az cloud set -n AzureCloud
Get your personal identity token.
az account get-access-token --scope https://search.azure.com/.default
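In a later step, you paste your personal identity token into the .rest or .http file.
Note
This section assumes you're using a local client that connects to Azure AI Search on your behalf. An alternative approach is getting a token for the client app, assuming your application is registered with Microsoft Entra ID.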
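The two access options differ mainly in which HTTP header carries the credential. The following Python sketch illustrates this under stated assumptions: the helper names `extract_access_token` and `build_headers` are hypothetical, the sample CLI output is a made-up placeholder, and the sketch assumes the `accessToken` property in the JSON printed by `az account get-access-token` and the `api-key` request header for key-based access. It builds headers only and sends nothing.

```python
import json
from typing import Dict, Optional


def extract_access_token(cli_output: str) -> str:
    """Pull the token string out of the JSON printed by the Azure CLI."""
    payload = json.loads(cli_output)
    return payload["accessToken"]


def build_headers(api_key: Optional[str] = None,
                  token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for exactly one of the two access options."""
    if (api_key is None) == (token is None):
        raise ValueError("Provide exactly one of api_key or token")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Option 1: key-based access uses the api-key header.
        headers["api-key"] = api_key
    else:
        # Option 2: role-based access uses a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Placeholder CLI output; a real token is a long opaque string.
sample_output = '{"accessToken": "EXAMPLE-TOKEN", "tokenType": "Bearer"}'
token_headers = build_headers(token=extract_access_token(sample_output))
key_headers = build_headers(api_key="EXAMPLE-ADMIN-KEY")
print(token_headers)
print(key_headers)
```

The requests in this quickstart use the bearer-token form; with a key, the `Authorization` line in each request would be replaced by an `api-key` line.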
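Create a vector index
Create Index (REST) creates a vector index and sets up the physical data structures on your search service.
The index schema is organized around hotel content. Sample data consists of vector and nonvector names and descriptions of seven fictitious hotels. The schema includes configurations for vector indexing and queries, and for semantic ranking.
Open a new text file in Visual Studio Code.
Set variables to the values you collected earlier. This example uses a personal identity token.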
@baseUrl = PUT-YOUR-SEARCH-SERVICE-URL-HERE
@token = PUT-YOUR-PERSONAL-IDENTITY-TOKEN-HERE
Save the file with a .rest or .http file extension.
Paste in the following example to create the hotels-vector-quickstart index on your search service.
### Create a new index
POST {{baseUrl}}/indexes?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}

{
  "name": "hotels-vector-quickstart",
  "fields": [
    { "name": "HotelId", "type": "Edm.String", "searchable": false, "filterable": true, "retrievable": true, "sortable": false, "facetable": false, "key": true },
    { "name": "HotelName", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "sortable": true, "facetable": false },
    { "name": "HotelNameVector", "type": "Collection(Edm.Single)", "searchable": true, "retrievable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile" },
    { "name": "Description", "type": "Edm.String", "searchable": true, "filterable": false, "retrievable": true, "sortable": false, "facetable": false },
    { "name": "DescriptionVector", "type": "Collection(Edm.Single)", "searchable": true, "retrievable": true, "dimensions": 1536, "vectorSearchProfile": "my-vector-profile" },
    { "name": "Category", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true },
    { "name": "Tags", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "retrievable": true, "sortable": false, "facetable": true },
    {
      "name": "Address",
      "type": "Edm.ComplexType",
      "fields": [
        { "name": "City", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true },
        { "name": "StateProvince", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true }
      ]
    },
    { "name": "Location", "type": "Edm.GeographyPoint", "searchable": false, "filterable": true, "retrievable": true, "sortable": true, "facetable": false }
  ],
  "vectorSearch": {
    "algorithms": [
      { "name": "my-hnsw-vector-config-1", "kind": "hnsw", "hnswParameters": { "m": 4, "efConstruction": 400, "efSearch": 500, "metric": "cosine" } },
      { "name": "my-hnsw-vector-config-2", "kind": "hnsw", "hnswParameters": { "m": 4, "metric": "euclidean" } },
      { "name": "my-eknn-vector-config", "kind": "exhaustiveKnn", "exhaustiveKnnParameters": { "metric": "cosine" } }
    ],
    "profiles": [
      { "name": "my-vector-profile", "algorithm": "my-hnsw-vector-config-1" }
    ]
  }
}
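Select Send request. Recall that you need the REST client to send requests. You should have an HTTP/1.1 201 Created response. The response body should include the JSON representation of the index schema.
Key points:
- The fields collection includes a required key field, plus the text and vector fields (such as Description and DescriptionVector) used for text and vector search. Colocating vector and nonvector fields in the same index enables hybrid queries. For instance, you can combine filters, text search with semantic ranking, and vectors into a single query operation.
- Vector fields must be type: Collection(Edm.Single) with dimensions and vectorSearchProfile properties.
- The vectorSearch section is an array of nearest neighbor algorithm configurations and profiles. Supported algorithms include hierarchical navigable small world (HNSW) and exhaustive k-nearest neighbor (KNN). For more information, see Relevance scoring in vector search.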
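The relationships described in the key points (a vector field names a profile, and a profile names an algorithm configuration) can be checked locally before sending the request. The following Python sketch is illustrative only: `check_vector_fields` is a hypothetical helper, not part of any Azure SDK, and the index definition is a trimmed copy of the schema shown earlier.

```python
from typing import Any, Dict, List


def check_vector_fields(index: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the vector parts of an index definition."""
    problems: List[str] = []
    vector_search = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    profiles = {p["name"]: p["algorithm"] for p in vector_search.get("profiles", [])}

    # Every profile must point at a defined algorithm configuration.
    for name, algorithm in profiles.items():
        if algorithm not in algorithms:
            problems.append(f"profile {name!r} references unknown algorithm {algorithm!r}")

    # Every vector field needs dimensions and a defined profile.
    for field in index.get("fields", []):
        if field.get("type") != "Collection(Edm.Single)":
            continue
        if "dimensions" not in field:
            problems.append(f"field {field['name']!r} is missing dimensions")
        if field.get("vectorSearchProfile") not in profiles:
            problems.append(f"field {field['name']!r} references an unknown profile")
    return problems


# Trimmed copy of the quickstart schema.
index_definition = {
    "name": "hotels-vector-quickstart",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": True},
        {"name": "DescriptionVector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "my-vector-profile"},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "my-hnsw-vector-config-1", "kind": "hnsw"}],
        "profiles": [{"name": "my-vector-profile",
                      "algorithm": "my-hnsw-vector-config-1"}],
    },
}
print(check_vector_fields(index_definition))
```

A check like this only catches broken name references; the service remains the authority on whether a schema is valid.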
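Upload documents
Creating and loading the index are separate steps. In Azure AI Search, the index contains all searchable data, and queries run on the search service. For REST calls, the data is provided as JSON documents. Use the Documents - Index REST API for this task.
The URI is extended to include the docs collection and the index operation.
Important
The following example isn't runnable code. For readability, we excluded the vector values because each vector array contains 1,536 floating-point values, which is too long for this article. If you want to try this step, copy runnable code from the sample on GitHub.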
### Upload documents
POST {{baseUrl}}/indexes/hotels-vector-quickstart/docs/index?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}
{
"value": [
{
"@search.action": "mergeOrUpload",
"HotelId": "1",
"HotelName": "Stay-Kay City Hotel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"The hotel is ideally located on the main commercial artery of the city
in the heart of Beijing.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Boutique",
"Tags": [
"pool",
"air conditioning",
"concierge"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "2",
"HotelName": "Old Century Hotel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"The hotel is situated in a nineteenth century plaza, which has been
expanded and renovated to the highest architectural standards to create a modern,
functional and first-class hotel in which art and unique historical elements
coexist with the most modern comforts.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Boutique",
"Tags": [
"pool",
"air conditioning",
"free wifi",
"concierge"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "3",
"HotelName": "Gastronomic Landscape Hotel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"The Hotel stands out for its gastronomic excellence under the management of
William Dough, who advises on and oversees all of the Hotel's restaurant services.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Resort and Spa",
"Tags": [
"air conditioning",
"bar",
"continental breakfast"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "4",
"HotelName": "Sublime Palace Hotel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"Sublime Palace Hotel is located in the heart of the historic center of
Sublime in an extremely vibrant and lively area within short walking distance to
the sites and landmarks of the city and is surrounded by the extraordinary beauty
of churches, buildings, shops and monuments.
Sublime Palace is part of a lovingly restored 1800 palace.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Boutique",
"Tags": [
"concierge",
"view",
"24-hour front desk service"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "13",
"HotelName": "Luxury Lion Resort",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"Unmatched Luxury. Visit our downtown hotel to indulge in luxury
accommodations. Moments from the stadium, we feature the best in comfort",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Resort and Spa",
"Tags": [
"view",
"free wifi",
"pool"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "48",
"HotelName": "Nordick's Valley Motel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"Only 90 miles (about 2 hours) from the nation's capital and nearby
most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring
the caverns? It's all nearby and we have specially priced packages to help make
our B&B your home base for fun while visiting the valley.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Boutique",
"Tags": [
"continental breakfast",
"air conditioning",
"free wifi"
]
},
{
"@search.action": "mergeOrUpload",
"HotelId": "49",
"HotelName": "Swirling Currents Hotel",
"HotelNameVector": [VECTOR ARRAY OMITTED],
"Description":
"Spacious rooms, glamorous suites and residences, rooftop pool, walking
access to shopping, dining, entertainment and the city center.",
"DescriptionVector": [VECTOR ARRAY OMITTED],
"Category": "Luxury",
"Tags": [
"air conditioning",
"laundry service",
"24-hour front desk service"
]
}
]
}
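Key points:
- Documents in the payload consist of fields defined in the index schema.
- Vector fields contain floating-point values. The dimensions attribute has a minimum of 2 and a maximum of 3,072 floating-point values per field. This quickstart sets the dimensions attribute to 1,536 because that's the size of the embeddings generated by the Azure OpenAI text-embedding-ada-002 model.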
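The payload shape and the dimensions constraint can be sketched in Python. This is a minimal illustration, not the sample's code: `make_action` is a hypothetical helper, and the zero-filled vectors are placeholders standing in for real embeddings, which this article omits.

```python
import json
from typing import Any, Dict, List

DIMENSIONS = 1536  # must match the dimensions value in the index schema


def make_action(doc: Dict[str, Any], vector_fields: List[str],
                action: str = "mergeOrUpload") -> Dict[str, Any]:
    """Wrap a document in an indexing action after checking vector lengths."""
    for name in vector_fields:
        vector = doc.get(name)
        if vector is None or len(vector) != DIMENSIONS:
            raise ValueError(f"{name} must contain exactly {DIMENSIONS} floats")
    return {"@search.action": action, **doc}


# Placeholder vectors only; real embeddings come from an embedding model.
hotel = {
    "HotelId": "1",
    "HotelName": "Stay-Kay City Hotel",
    "HotelNameVector": [0.0] * DIMENSIONS,
    "DescriptionVector": [0.0] * DIMENSIONS,
    "Category": "Boutique",
    "Tags": ["pool", "air conditioning", "concierge"],
}
payload = {"value": [make_action(hotel, ["HotelNameVector", "DescriptionVector"])]}
body = json.dumps(payload)
print(len(body) > 0, payload["value"][0]["@search.action"])
```

Checking the vector length on the client side surfaces a dimension mismatch before the request is sent rather than as an indexing error.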
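Run queries
Now that documents are loaded, you can issue vector queries against them by using Documents - Search Post (REST).
There are several queries to demonstrate various patterns:
- Single vector search
- Single vector search with a filter
- Hybrid search
The vector queries in this section are based on two strings:
- Search string:
historic hotel walk to restaurants and shopping
- Vector query string (vectorized into a mathematical representation):
classic lodging near running trails, eateries, retail
Important
The following examples aren't runnable code. For readability, we excluded the vector values because each array contains 1,536 floating-point values, which is too long for this article. If you want to try these queries, copy runnable code from the sample on GitHub.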
Single vector search
Paste in a POST request to query the search index. Then select Send request. The URI is extended to include the /docs/search operator.
### Run a query
POST {{baseUrl}}/indexes/hotels-vector-quickstart/docs/search?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}

{
  "count": true,
  "select": "HotelId, HotelName, Description, Category",
  "vectorQueries": [
    {
      "vector": [0.01944167, 0.0040178085 . . . TRIMMED FOR BREVITY 010858015, -0.017496133],
      "k": 7,
      "fields": "DescriptionVector",
      "kind": "vector",
      "exhaustive": true
    }
  ]
}
This vector query is shortened for brevity. vectorQueries.vector contains the vectorized text of the query input, fields determines which vector fields are searched, and k specifies the number of nearest neighbors to return.
The vector query string is classic lodging near running trails, eateries, retail, which is vectorized into 1,536 embeddings for this query.
Review the response. The response for the vector equivalent of classic lodging near running trails, eateries, retail includes seven results. Each result provides a search score and the fields listed in select. In a similarity search, the response always includes k results (provided the index contains at least k documents), ordered by similarity score.
{
  "@odata.context": "https://my-demo-search.search.azure.cn/indexes('hotels-vector-quickstart')/$metadata#docs(*)",
  "@odata.count": 7,
  "value": [
    {
      "@search.score": 0.857736,
      "HotelName": "Nordick's Valley Motel",
      "Description": "Only 90 miles (about 2 hours) from the nation's capital and nearby most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring the caverns? It's all nearby and we have specially priced packages to help make our B&B your home base for fun while visiting the valley."
    },
    {
      "@search.score": 0.8399129,
      "HotelName": "Swirling Currents Hotel",
      "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center."
    },
    {
      "@search.score": 0.8383954,
      "HotelName": "Luxury Lion Resort",
      "Description": "Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium, we feature the best in comfort"
    },
    {
      "@search.score": 0.8254346,
      "HotelName": "Sublime Palace Hotel",
      "Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Palace is part of a lovingly restored 1800 palace."
    },
    {
      "@search.score": 0.82380056,
      "HotelName": "Stay-Kay City Hotel",
      "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of Beijing."
    },
    {
      "@search.score": 0.81514084,
      "HotelName": "Old Century Hotel",
      "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts."
    },
    {
      "@search.score": 0.8133763,
      "HotelName": "Gastronomic Landscape Hotel",
      "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services."
    }
  ]
}
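To make the roles of vector, fields, and k more concrete, here is a toy exhaustive k-nearest-neighbor search in Python. It is a conceptual sketch only, not the service's implementation: the three-dimensional vectors and document IDs are invented for illustration, and the raw cosine similarity it prints is not the same quantity as the @search.score values shown above.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exhaustive_knn(query: List[float], docs: Dict[str, List[float]],
                   k: int) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented toy vectors; real embeddings in this quickstart have 1,536 dimensions.
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.8, 0.6, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]
nearest = exhaustive_knn(query, docs, k=2)
for doc_id, score in nearest:
    print(doc_id, round(score, 4))
```

An exhaustive search compares the query against every vector, which is exact but scales linearly with index size; HNSW trades some exactness for speed by searching a graph instead.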
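Single vector search with a filter
You can add filters, but the filters are applied to the nonvector content in your index. In this example, the filter applies to the Tags field to filter out any hotels that don't provide free Wi-Fi.
Paste in a POST request to query the search index.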
### Run a vector query with a filter
POST {{baseUrl}}/indexes/hotels-vector-quickstart/docs/search?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}

{
  "count": true,
  "select": "HotelId, HotelName, Category, Tags, Description",
  "filter": "Tags/any(tag: tag eq 'free wifi')",
  "vectorFilterMode": "postFilter",
  "vectorQueries": [
    {
      "vector": [ VECTOR OMITTED ],
      "k": 7,
      "fields": "DescriptionVector",
      "kind": "vector",
      "exhaustive": true
    }
  ]
}
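Review the response. The query is the same as the previous example, but it includes a post-processing exclusion filter and returns only the three hotels that have free Wi-Fi.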
{ "@odata.count": 3, "value": [ { "@search.score": 0.857736, "HotelName": "Nordick's Valley Motel", "Description": "Only 90 miles (about 2 hours) from the nation's capital and nearby most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring the caverns? It's all nearby and we have specially priced packages to help make our B&B your home base for fun while visiting the valley.", "Tags": [ "continental breakfast", "air conditioning", "free wifi" ] }, { "@search.score": 0.8383954, "HotelName": "Luxury Lion Resort", "Description": "Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium, we feature the best in comfort", "Tags": [ "view", "free wifi", "pool" ] }, { "@search.score": 0.81514084, "HotelName": "Old Century Hotel", "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.", "Tags": [ "pool", "free wifi", "concierge" ] } ] }
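The effect of vectorFilterMode can be illustrated with the seven results from the unfiltered query. The Python sketch below is a simplified model, not the service's implementation: `post_filter` and `pre_filter` are hypothetical helpers, the scores are copied from the earlier response, and the free Wi-Fi flags come from the Tags values in the uploaded documents.

```python
from typing import List, Tuple

# (hotel name, similarity score, has the "free wifi" tag), already ranked by score.
ranked: List[Tuple[str, float, bool]] = [
    ("Nordick's Valley Motel", 0.857736, True),
    ("Swirling Currents Hotel", 0.8399129, False),
    ("Luxury Lion Resort", 0.8383954, True),
    ("Sublime Palace Hotel", 0.8254346, False),
    ("Stay-Kay City Hotel", 0.82380056, False),
    ("Old Century Hotel", 0.81514084, True),
    ("Gastronomic Landscape Hotel", 0.8133763, False),
]


def post_filter(results: List[Tuple[str, float, bool]], k: int) -> List[str]:
    """Take the k nearest neighbors first, then drop those failing the filter."""
    return [name for name, _, has_wifi in results[:k] if has_wifi]


def pre_filter(results: List[Tuple[str, float, bool]], k: int) -> List[str]:
    """Drop documents failing the filter first, then take the k nearest."""
    return [name for name, _, has_wifi in results if has_wifi][:k]


print(post_filter(ranked, k=7))  # three hotels, as in the response above
print(post_filter(ranked, k=3))  # fewer than k survive the filter
print(pre_filter(ranked, k=3))   # filter first, so up to k results remain
```

With k set to 7 and only seven documents in the index, both modes return the same three hotels. The difference shows up when k is smaller than the index: post-filtering can return fewer than k results.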
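Hybrid search
Hybrid search consists of keyword queries and vector queries in a single search request. This example runs the vector query and full-text search concurrently:
- Search string:
historic hotel walk to restaurants and shopping
- Vector query string (vectorized into a mathematical representation):
classic lodging near running trails, eateries, retail
Paste in a POST request to query the search index. Then select Send request.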
### Run a hybrid query
POST {{baseUrl}}/indexes/hotels-vector-quickstart/docs/search?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}

{
  "count": true,
  "search": "historic hotel walk to restaurants and shopping",
  "select": "HotelName, Description",
  "top": 7,
  "vectorQueries": [
    {
      "vector": [ VECTOR OMITTED ],
      "k": 7,
      "fields": "DescriptionVector",
      "kind": "vector",
      "exhaustive": true
    }
  ]
}
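Because this is a hybrid query, results are ranked by Reciprocal Rank Fusion (RRF). RRF takes the reciprocal of each document's rank position in every result set, sums those values per document, and then sorts the merged results by the combined score. The top number of results are returned.
Review the response.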
{ "@odata.count": 7, "value": [ { "@search.score": 0.03279569745063782, "HotelName": "Luxury Lion Resort", "Description": "Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium, we feature the best in comfort" }, { "@search.score": 0.03226646035909653, "HotelName": "Sublime Palace Hotel", "Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Palace is part of a lovingly restored 1800 palace." }, { "@search.score": 0.03226646035909653, "HotelName": "Swirling Currents Hotel", "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center." }, { "@search.score": 0.03205128386616707, "HotelName": "Nordick's Valley Motel", "Description": "Only 90 miles (about 2 hours) from the nation's capital and nearby most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring the caverns? It's all nearby and we have specially priced packages to help make our B&B your home base for fun while visiting the valley." }, { "@search.score": 0.03128054738044739, "HotelName": "Gastronomic Landscape Hotel", "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services." }, { "@search.score": 0.03100961446762085, "HotelName": "Old Century Hotel", "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts." 
}, { "@search.score": 0.03077651560306549, "HotelName": "Stay-Kay City Hotel", "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of Beijing." } ] }
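Because RRF merges results, it helps to review the inputs. The following results are from the full-text query only. The top two results are Sublime Palace Hotel and Luxury Lion Resort. Sublime Palace Hotel has a stronger BM25 relevance score.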
{ "@search.score": 2.2626662, "HotelName": "Sublime Palace Hotel", "Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Palace is part of a lovingly restored 1800 palace." }, { "@search.score": 0.86421645, "HotelName": "Luxury Lion Resort", "Description": "Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium, we feature the best in comfort" },
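In the vector-only query, which uses HNSW for finding matches, Sublime Palace Hotel drops to fourth position. Luxury Lion Resort, which was second in the full-text search and third in the vector search, doesn't experience the same range of fluctuation, so it appears as a top match in a homogenized result set.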
"value": [ { "@search.score": 0.857736, "HotelId": "48", "HotelName": "Nordick's Valley Motel", "Description": "Only 90 miles (about 2 hours) from the nation's capital and nearby most everything the historic valley has to offer. Hiking? Wine Tasting? Exploring the caverns? It's all nearby and we have specially priced packages to help make our B&B your home base for fun while visiting the valley.", "Category": "Boutique" }, { "@search.score": 0.8399129, "HotelId": "49", "HotelName": "Swirling Currents Hotel", "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center.", "Category": "Luxury" }, { "@search.score": 0.8383954, "HotelId": "13", "HotelName": "Luxury Lion Resort", "Description": "Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium, we feature the best in comfort", "Category": "Resort and Spa" }, { "@search.score": 0.8254346, "HotelId": "4", "HotelName": "Sublime Palace Hotel", "Description": "Sublime Palace Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. 
Sublime Palace is part of a lovingly restored 1800 palace.", "Category": "Boutique" }, { "@search.score": 0.82380056, "HotelId": "1", "HotelName": "Stay-Kay City Hotel", "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of Beijing.", "Category": "Boutique" }, { "@search.score": 0.81514084, "HotelId": "2", "HotelName": "Old Century Hotel", "Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.", "Category": "Boutique" }, { "@search.score": 0.8133763, "HotelId": "3", "HotelName": "Gastronomic Landscape Hotel", "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services.", "Category": "Resort and Spa" } ]
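The fusion step can be sketched generically in Python. This is an illustration of the RRF idea under stated assumptions, not the service's exact computation: the function `reciprocal_rank_fusion` is hypothetical, the constant 60 is a commonly used default that is assumed here, ranks are assumed to start at 1, and the document IDs A through D are invented. The sketch doesn't attempt to reproduce the scores in the responses above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]],
                           constant: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists by summing 1 / (constant + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (constant + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Invented rankings: one from a text query, one from a vector query.
text_ranking = ["A", "B", "C"]
vector_ranking = ["C", "A", "D"]
fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Because only rank positions enter the formula, BM25 scores and vector similarity scores never have to be put on a common scale. A document that ranks reasonably well in both lists can outrank one that tops a single list.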
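Clean up
When you're working in your own subscription, it's a good idea at the end of a project to decide whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.
You can find and manage resources in the portal by using the All resources or Resource groups link in the leftmost pane.
You can also try this DELETE command: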
### Delete an index
DELETE {{baseUrl}}/indexes/hotels-vector-quickstart?api-version=2023-11-01 HTTP/1.1
Content-Type: application/json
Authorization: Bearer {{token}}
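For readers who script cleanup outside the REST client, the same DELETE call can be constructed with the Python standard library. This is a sketch under stated assumptions: `build_delete_index_request` is a hypothetical helper, and the endpoint and token are placeholders. The request object is only built here. Passing it to urllib.request.urlopen would actually delete the index.

```python
import urllib.request
from urllib.parse import quote


def build_delete_index_request(base_url: str, index_name: str, token: str,
                               api_version: str = "2023-11-01") -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a search index."""
    url = (f"{base_url.rstrip('/')}/indexes/{quote(index_name, safe='')}"
           f"?api-version={api_version}")
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder endpoint and token.
request = build_delete_index_request(
    "https://example.search.azure.cn",
    "hotels-vector-quickstart",
    "EXAMPLE-TOKEN",
)
print(request.get_method(), request.full_url)
```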
Next steps
As a next step, we recommend reviewing the demo code for Python, C#, or JavaScript.