“完整”Lucene 搜索语法(高级查询)的示例

在构造 Azure AI 搜索的查询时,可以将默认的简单查询分析程序替换为功能更强大的 Lucene 查询分析程序,以便构建专用的高级查询表达式。

Lucene 分析程序支持复杂的查询格式,比如字段范围查询、模糊搜索、中缀和后缀通配符搜索、邻近搜索、术语提升以及正则表达式搜索。 额外的功能需遵守更多处理要求,因此执行时间应该会更长一些。 在本文中,你可以逐步了解一些示例,这些示例演示了基于完整语法的查询操作。

注意

通过完整的 Lucene 查询语法实现的专用查询构造很多都不是按文本分析的,所以并不涉及词干分解和词形还原,这一点有些出人意料。 只会对完整字词(字词查询或短语查询)进行词法分析。 字词不完整的查询类型(前缀查询、通配符查询、正则表达式查询、模糊查询)会被直接添加到查询树中,绕过分析阶段。 对部分查询字词执行的唯一转换操作是转换为小写。

酒店示例索引

以下查询基于 hotels-sample-index,你可以按照此快速入门中的说明进行创建。

示例查询使用 REST API 和 POST 请求来表达。 可以在 REST 客户端中粘贴并运行它们。 或者,在 Azure 门户中使用搜索资源管理器的 JSON 视图。 在 JSON 视图中,可以粘贴本文中所示的查询示例。

请求头必须具有以下值:

密钥
Content-Type application/json
api-key <your-search-service-api-key>,查询或管理密钥

URI 参数必须包括具有索引名、文档集合、搜索命令和 API 版本的搜索服务终结点,类似于以下示例:

https://{{service-name}}.search.azure.cn/indexes/hotels-sample-index/docs/search?api-version=2024-07-01

请求正文的格式应为有效的 JSON:

{
    "search": "*",
    "queryType": "full",
    "select": "HotelId, HotelName, Category, Tags, Description",
    "count": true
}
  • search 设置为“*”时,表示一个未指定的查询,等效于 NULL 或空搜索。 它不是特别有用,但却是你可以执行的最简单的搜索,并且会显示索引中所有可检索的字段以及所有值。

  • queryType 设置为“完整”时,会调用完整的 Lucene 查询分析程序,它是此语法的必需参数

  • select 设置为以逗号分隔的字段列表时,可用于搜索结果组合,使其只包括在搜索结果上下文中有用的字段。

  • count 返回与搜索条件匹配的文档数。 在空搜索字符串上,计数将是索引中的所有文档(在 hotels-sample-index 中,数量为 50)。

字段化搜索将单个嵌入式搜索表达式的范围限定为特定字段。 此示例搜索包含字词“hotel”的酒店名称,而不是“motel”。 可以使用 AND 指定多个字段。

使用此查询语法时,如果想要查询的字段在搜索表达式中,则可以省略 searchFields 参数。 如果字段化搜索包含 searchFields,则 fieldName:searchExpression 始终优先于 searchFields

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "HotelName:(hotel NOT motel) AND Category:'Boutique'",
    "queryType": "full",
    "select": "HotelName, Category",
    "count": true
}

此查询的响应应该类似于以下示例,根据“Boutique”进行筛选,返回名称中包含“hotel”的酒店,同时排除名称中包含“motel”的结果

{
  "@odata.count": 5,
  "value": [
    {
      "@search.score": 2.2289815,
      "HotelName": "Stay-Kay City Hotel",
      "Category": "Boutique"
    },
    {
      "@search.score": 1.3862944,
      "HotelName": "City Skyline Antiquity Hotel",
      "Category": "Boutique"
    },
    {
      "@search.score": 1.355046,
      "HotelName": "Old Century Hotel",
      "Category": "Boutique"
    },
    {
      "@search.score": 1.355046,
      "HotelName": "Sublime Palace Hotel",
      "Category": "Boutique"
    },
    {
      "@search.score": 1.355046,
      "HotelName": "Red Tide Hotel",
      "Category": "Boutique"
    }
  ]
}

搜索表达式可以是单个字词或短语,也可以是用括号括起来的更复杂的表达式,其中可以选择使用布尔运算符。 下面是部分示例:

  • HotelName:(hotel NOT motel)
  • Address/StateProvince:("WA" OR "CA")
  • Tags:("free wifi" NOT "free parking") AND "coffee in lobby"

如果想要两个字符串评估为单个实体,请务必将短语放置在引号内,正如这个在 Address/StateProvince 字段中搜索两个不同位置的情况一样。 你可能需要对引号进行转义 (\),具体取决于客户端。

fieldName:searchExpression 中指定的字段必须是可搜索的字段。 如需了解如何特性化字段定义,请参阅创建索引 (REST API)

模糊搜索可匹配类似的字词,包括拼写错误的字词。 若要执行模糊搜索,请在单个字词的末尾追加“~”波形符,后跟指定编辑距离的可选参数(介于 0 到 2 之间的值)。 例如,blue~blue~1 会返回 blue、blues 和 glue。

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "Tags:conserge~",
    "queryType": "full",
    "select": "HotelName, Category, Tags",
    "searchFields": "HotelName, Category, Tags",
    "count": true
}

对此查询的响应会解析为匹配文档中的“concierge”,为简洁起见只截取了一部分

{
  "@odata.count": 9,
  "value": [
    {
      "@search.score": 1.4947624,
      "HotelName": "Twin Vortex Hotel",
      "Category": "Luxury",
      "Tags": [
        "bar",
        "restaurant",
        "concierge"
      ]
    },
    {
      "@search.score": 1.1685618,
      "HotelName": "Stay-Kay City Hotel",
      "Category": "Boutique",
      "Tags": [
        "view",
        "air conditioning",
        "concierge"
      ]
    },
    {
      "@search.score": 1.1465473,
      "HotelName": "Old Century Hotel",
      "Category": "Boutique",
      "Tags": [
        "pool",
        "free wifi",
        "concierge"
      ]
    },
. . .
  ]
}

不直接支持短语,但你可以基于多部件短语的每个字词指定一个模糊匹配,例如 search=Tags:landy~ AND sevic~。 此查询表达式根据“laundry service”查找到 15 个匹配项

注意

不会对模糊查询进行分析。 字词不完整的查询类型(前缀查询、通配符查询、正则表达式查询、模糊查询)会被直接添加到查询树中,绕过分析阶段。 对部分查询字词执行的唯一转换操作是转换为小写。

邻近搜索会查找在文档中相互靠近的字词。 在短语末尾插入波形符“~”,后跟创建邻近边界的字数。

此查询将搜索文档中彼此相距 5 个单词以内的字词“hotel”和“airport”。 引号经过转义 (\"),以保留短语:

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "Description: \"hotel airport\"~5",
    "queryType": "full",
    "select": "HotelName, Description",
    "searchFields": "HotelName, Description",
    "count": true
}

此查询的响应应与以下示例类似:

{
  "@odata.count": 1,
  "value": [
    {
      "@search.score": 0.69167054,
      "HotelName": "Trails End Motel",
      "Description": "Only 8 miles from Downtown. On-site bar/restaurant, Free hot breakfast buffet, Free wireless internet, All non-smoking hotel. Only 15 miles from airport."
    }
  ]
}

示例 4:术语提升

术语提升是指相对于不包含术语的文档,提高包含提升术语的文档排名。 若要提升字词,请使用插入符号 ^,并且所搜索字词末尾还要附加提升系数(数字)。 提升系数默认为 1,虽然它必须是正数,但可以小于 1(例如 0.2)。 术语提升不同于计分配置文件,因为计分配置文件提升某些字段,而非特定术语。

在“before”查询中,搜索“beach access”,你会注意到有六个文档匹配一个或两个字词

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "beach access",
    "queryType": "full",
    "select": "HotelName, Description, Tags",
    "searchFields": "HotelName, Description, Tags",
    "count": true
}

事实上,关于 access 只有两个文档匹配。 尽管文档中缺少术语 beach,但第一个实例位于第二位。

{
  "@odata.count": 6,
  "value": [
    {
      "@search.score": 1.068669,
      "HotelName": "Johnson's Family Resort",
      "Description": "Family oriented resort located in the heart of the northland. Operated since 1962 by the Smith family, we have grown into one of the largest family resorts in the state. The home of excellent Smallmouth Bass fishing with 10 small cabins, we're a home not only to fishermen but their families as well. Rebuilt in the early 2000's, all of our cabins have all the comforts of home. Sporting a huge **beach** with multiple water toys for those sunny summer days and a Lodge full of games for when you just can't swim anymore, there's always something for the family to do. A full marina offers watercraft rentals, boat launch, powered dock slips, canoes (free to use), & fish cleaning facility. Rent pontoons, 14' fishing boats, 16' fishing rigs or jet ski's for a fun day or week on the water. Pets are accepted in the lakeside cottages.",
      "Tags": [
        "24-hour front desk service",
        "pool",
        "coffee in lobby"
      ]
    },
    {
      "@search.score": 1.0162708,
      "HotelName": "Campus Commander Hotel",
      "Description": "Easy **access** to campus and steps away from the best shopping corridor in the city. From meetings in town or gameday, enjoy our prime location between the union and proximity to the university stadium.",
      "Tags": [
        "free parking",
        "coffee in lobby",
        "24-hour front desk service"
      ]
    },
    {
      "@search.score": 0.9050383,
      "HotelName": "Lakeside B & B",
      "Description": "Nature is Home on the **beach**. Explore the shore by day, and then come home to our shared living space to relax around a stone fireplace, sip something warm, and explore the library by night. Save up to 30 percent. Valid Now through the end of the year. Restrictions and blackouts may apply.",
      "Tags": [
        "laundry service",
        "concierge",
        "free parking"
      ]
    },
    {
      "@search.score": 0.8955848,
      "HotelName": "Windy Ocean Motel",
      "Description": "Oceanfront hotel overlooking the **beach** features rooms with a private balcony and 2 indoor and outdoor pools. Inspired by the natural beauty of the island, each room includes an original painting of local scenes by the owner. Rooms include a mini fridge, Keurig coffee maker, and flatscreen TV. Various shops and art entertainment are on the boardwalk, just steps away.",
      "Tags": [
        "pool",
        "air conditioning",
        "bar"
      ]
    },
    {
      "@search.score": 0.83636594,
      "HotelName": "Happy Lake Resort & Restaurant",
      "Description": "The largest year-round resort in the area offering more of everything for your vacation - at the best value! What can you enjoy while at the resort, aside from the mile-long sandy **beaches** of the lake? Check out our activities sure to excite both young and young-at-heart guests. We have it all, including being named "Property of the Year" and a "Top Ten Resort" by top publications.",
      "Tags": [
        "pool",
        "bar",
        "restaurant"
      ]
    },
    {
      "@search.score": 0.7808502,
      "HotelName": "Swirling Currents Hotel",
      "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking **access** to shopping, dining, entertainment and the city center. Each room comes equipped with a microwave, a coffee maker and a minifridge. In-room entertainment includes complimentary W-Fi and flat-screen TVs. ",
      "Tags": [
        "air conditioning",
        "laundry service",
        "24-hour front desk service"
      ]
    }
  ]
}

在“after”查询中,重试该搜索,此时会提升包含字词“beach”而非“access”的结果。 查询的人工可读版本为 search=Description:beach^2 access。 根据你的客户端,可能需要将 ^2 表达为 %5E2

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "Description:beach^2 access",
    "queryType": "full",
    "select": "HotelName, Description, Tags",
    "searchFields": "HotelName, Description, Tags",
    "count": true
}

提升术语 beach后,关于 Campus Commander Hotel 的匹配将下降到第五位。

{
  "@odata.count": 6,
  "value": [
    {
      "@search.score": 2.137338,
      "HotelName": "Johnson's Family Resort",
      "Description": "Family oriented resort located in the heart of the northland. Operated since 1962 by the Smith family, we have grown into one of the largest family resorts in the state. The home of excellent Smallmouth Bass fishing with 10 small cabins, we're a home not only to fishermen but their families as well. Rebuilt in the early 2000's, all of our cabins have all the comforts of home. Sporting a huge beach with multiple water toys for those sunny summer days and a Lodge full of games for when you just can't swim anymore, there's always something for the family to do. A full marina offers watercraft rentals, boat launch, powered dock slips, canoes (free to use), & fish cleaning facility. Rent pontoons, 14' fishing boats, 16' fishing rigs or jet ski's for a fun day or week on the water. Pets are accepted in the lakeside cottages.",
      "Tags": [
        "24-hour front desk service",
        "pool",
        "coffee in lobby"
      ]
    },
    {
      "@search.score": 1.8100766,
      "HotelName": "Lakeside B & B",
      "Description": "Nature is Home on the beach. Explore the shore by day, and then come home to our shared living space to relax around a stone fireplace, sip something warm, and explore the library by night. Save up to 30 percent. Valid Now through the end of the year. Restrictions and blackouts may apply.",
      "Tags": [
        "laundry service",
        "concierge",
        "free parking"
      ]
    },
    {
      "@search.score": 1.7911696,
      "HotelName": "Windy Ocean Motel",
      "Description": "Oceanfront hotel overlooking the beach features rooms with a private balcony and 2 indoor and outdoor pools. Inspired by the natural beauty of the island, each room includes an original painting of local scenes by the owner. Rooms include a mini fridge, Keurig coffee maker, and flatscreen TV. Various shops and art entertainment are on the boardwalk, just steps away.",
      "Tags": [
        "pool",
        "air conditioning",
        "bar"
      ]
    },
    {
      "@search.score": 1.6727319,
      "HotelName": "Happy Lake Resort & Restaurant",
      "Description": "The largest year-round resort in the area offering more of everything for your vacation - at the best value! What can you enjoy while at the resort, aside from the mile-long sandy beaches of the lake? Check out our activities sure to excite both young and young-at-heart guests. We have it all, including being named "Property of the Year" and a "Top Ten Resort" by top publications.",
      "Tags": [
        "pool",
        "bar",
        "restaurant"
      ]
    },
    {
      "@search.score": 1.0162708,
      "HotelName": "Campus Commander Hotel",
      "Description": "Easy access to campus and steps away from the best shopping corridor in the city. From meetings in town or gameday, enjoy our prime location between the union and proximity to the university stadium.",
      "Tags": [
        "free parking",
        "coffee in lobby",
        "24-hour front desk service"
      ]
    },
    {
      "@search.score": 0.7808502,
      "HotelName": "Swirling Currents Hotel",
      "Description": "Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center. Each room comes equipped with a microwave, a coffee maker and a minifridge. In-room entertainment includes complimentary W-Fi and flat-screen TVs. ",
      "Tags": [
        "air conditioning",
        "laundry service",
        "24-hour front desk service"
      ]
    }
  ]
}

示例 5:正则表达式

正则表达式搜索基于正斜杠“/”之间的内容查找匹配项,如在 RegExp 类中所记录的那样。

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "HotelName:/(Mo|Ho)tel/",
    "queryType": "full",
    "select": "HotelName",
    "count": true
}

对此查询的响应应类似于下面的示例(为简洁起见只截取了一部分):

{
  "@odata.count": 25,
  "value": [
    {
      "@search.score": 1,
      "HotelName": "Country Residence Hotel"
    },
    {
      "@search.score": 1,
      "HotelName": "Downtown Mix Hotel"
    },
    {
      "@search.score": 1,
      "HotelName": "Gastronomic Landscape Hotel"
    },
    . . . 
    {
      "@search.score": 1,
      "HotelName": "Trails End Motel"
    },
    {
      "@search.score": 1,
      "HotelName": "Nordick's Valley Motel"
    },
    {
      "@search.score": 1,
      "HotelName": "King's Cellar Hotel"
    }
  ]
}

注意

不会对正则表达式查询进行分析。 对部分查询字词执行的唯一转换操作是转换为小写。

可将通常可识别的语法用于多个 (*) 或单个 (?) 字符通配符搜索。 Lucene 查询分析器支持将这些符号与单个术语一起使用,但不能与短语一起使用。

在此查询中,搜索包含前缀“sc”的酒店名称。 不能将 *? 符号用作搜索的第一个字符。

POST /indexes/hotel-samples-index/docs/search?api-version=2024-07-01
{
    "search": "HotelName:sc*",
    "queryType": "full",
    "select": "HotelName",
    "count": true
}

此查询的响应应与以下示例类似:

{
  "@odata.count": 1,
  "value": [
    {
      "@search.score": 1,
      "HotelName": "Waterfront Scottish Inn"
    }
  ]
}

注意

不会对通配符查询进行分析。 对部分查询字词执行的唯一转换操作是转换为小写。

尝试在代码中指定查询。 以下链接介绍了如何使用 Azure SDK 设置搜索查询。

可在以下文章中找到更多语法参考、查询体系结构和示例: