Synonyms in Azure Cognitive Search

Synonyms in search engines associate equivalent terms so that the scope of a query expands implicitly, without the user having to supply the extra terms. For example, given the term "dog" with synonym associations of "canine" and "puppy", any document containing "dog", "canine", or "puppy" falls within the scope of the query.

In Azure Cognitive Search, synonym expansion is done at query time. You can add synonym maps to a service with no disruption to existing operations, and you can add a synonymMaps property to a field definition without rebuilding the index.

Create synonyms

There is no portal support for creating synonyms, but you can use the REST API or the .NET SDK. To get started with REST, we recommend using Postman and formulating requests with this API: Create Synonym Maps. C# developers can get started with Add Synonyms in Azure Cognitive Search using C#.

Optionally, if you are using customer-managed keys for service-side encryption-at-rest, you can apply that protection to the contents of your synonym map.

Use synonyms

In Azure Cognitive Search, synonym support is based on synonym maps that you define and upload to your service. These maps constitute an independent resource (like indexes or data sources) and can be used by any searchable field in any index in your search service.

Synonym maps and indexes are maintained independently. Once you define a synonym map and upload it to your service, you can enable the synonym feature on a field by adding a property called synonymMaps to the field definition. Creating, updating, and deleting a synonym map is always a whole-document operation, meaning that you cannot create, update, or delete parts of the synonym map incrementally. Updating even a single entry requires reloading the whole map.

Incorporating synonyms into your search application is a two-step process:

  1. Add a synonym map to your search service through the APIs below.

  2. Configure a searchable field to use the synonym map in the index definition.

You can create multiple synonym maps for your search application (for example, one per language if your application supports a multilingual customer base). Currently, a field can use only one of them. You can update a field's synonymMaps property at any time.

SynonymMaps Resource APIs

Add or update a synonym map under your service, using POST or PUT.

Synonym maps are uploaded to the service via POST or PUT. Each rule must be delimited by the newline character ('\n'). You can define up to 5,000 rules per synonym map in a free service and up to 20,000 rules per map in all other SKUs. Each rule can have up to 20 expansions.

Synonym maps must be in the Apache Solr format, which is explained below. If you have an existing synonym dictionary in a different format and want to use it directly, please let us know on UserVoice.

You can create a new synonym map using HTTP POST, as in the following example:

    POST https://[servicename].search.azure.cn/synonymmaps?api-version=2020-06-30
    api-key: [admin key]

    {
       "name":"mysynonymmap",
       "format":"solr",
       "synonyms": "USA, United States, United States of America\nWashington, Wash., WA => WA\n"
    }
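
Because each rule must be separated by '\n' inside a single JSON string, it can be easier to build the request body in code than by hand. The following is a minimal Python sketch, not part of the service or its SDK: the helper name build_synonym_map is hypothetical, and the sketch only serializes the body without sending it.

```python
import json

def build_synonym_map(name, rules):
    """Return a JSON request body for a Solr-format synonym map.

    Hypothetical helper for illustration: it joins one rule per list
    item with '\n' and lets json.dumps handle the string escaping.
    """
    cleaned = [rule.strip() for rule in rules if rule.strip()]
    return json.dumps({
        "name": name,
        "format": "solr",
        "synonyms": "\n".join(cleaned),
    })

body = build_synonym_map("mysynonymmap", [
    "USA, United States, United States of America",
    "Washington, Wash., WA => WA",
])
print(body)
```

The serialized string carries the newline as the two-character JSON escape, so the body stays valid JSON on a single line.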

Alternatively, you can use PUT and specify the synonym map name on the URI. If the synonym map does not exist, it will be created.

    PUT https://[servicename].search.azure.cn/synonymmaps/mysynonymmap?api-version=2020-06-30
    api-key: [admin key]

    {
       "format":"solr",
       "synonyms": "USA, United States, United States of America\nWashington, Wash., WA => WA\n"
    }

Apache Solr synonym format

The Solr format supports equivalent and explicit synonym mappings. Mapping rules adhere to the open-source synonym filter specification of Apache Solr, described in this document: SynonymFilter. Below is a sample rule for equivalent synonyms.

USA, United States, United States of America

With the rule above, a search query "USA" will expand to "USA" OR "United States" OR "United States of America".

Explicit mapping is denoted by an arrow "=>". When specified, a term sequence of a search query that matches the left-hand side of "=>" is replaced with the alternatives on the right-hand side. Given the rule below, the search queries "Washington", "Wash.", and "WA" will all be rewritten to "WA". Explicit mapping applies only in the direction specified, so in this case the query "WA" is not rewritten to "Washington".

Washington, Wash., WA => WA

If you need to define synonyms that contain commas, you can escape them with a backslash, as in this example:

WA\, USA, WA, Washington

Because the backslash is itself a special character in other languages such as JSON and C#, you will probably need to double-escape it. For example, the JSON sent to the REST API for the above synonym map would look like this:

    {
       "format":"solr",
       "synonyms": "WA\\, USA, WA, Washington"
    }
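
The double escaping can be left to a JSON serializer. The sketch below is a hedged illustration in Python, with escape_commas as a hypothetical helper name: commas are escaped once for the Solr format, and json.dumps then escapes the backslash for JSON.

```python
import json

def escape_commas(term):
    """Escape commas inside one synonym term for the Solr format."""
    return term.replace(",", "\\,")

# One equivalent-synonym rule whose first term contains a comma.
rule = ", ".join(escape_commas(t) for t in ["WA, USA", "WA", "Washington"])

# json.dumps adds the second level of escaping for the backslash.
body = json.dumps({"format": "solr", "synonyms": rule})

print(rule)
print(body)
```

The rule holds a single backslash before the comma, and the serialized body holds two, matching the JSON shown above.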

List synonym maps under your service.

    GET https://[servicename].search.azure.cn/synonymmaps?api-version=2020-06-30
    api-key: [admin key]

Get a synonym map under your service.

    GET https://[servicename].search.azure.cn/synonymmaps/mysynonymmap?api-version=2020-06-30
    api-key: [admin key]

Delete a synonym map under your service.

    DELETE https://[servicename].search.azure.cn/synonymmaps/mysynonymmap?api-version=2020-06-30
    api-key: [admin key]
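
The list, get, and delete requests above differ only in the HTTP method and in whether the map name appears in the URI. As a rough sketch using only the Python standard library, they could be built as follows. The function name synonym_map_request, the service name "myservice", and the key placeholder are assumptions for illustration; the code builds the requests but does not send them.

```python
import urllib.request

API_VERSION = "2020-06-30"

def synonym_map_request(service, admin_key, method, map_name=None):
    """Build (but do not send) a request against the synonymmaps collection.

    Hypothetical helper: omit map_name to address the whole collection.
    """
    path = "synonymmaps" + (f"/{map_name}" if map_name else "")
    url = f"https://{service}.search.azure.cn/{path}?api-version={API_VERSION}"
    return urllib.request.Request(url, method=method, headers={"api-key": admin_key})

list_req = synonym_map_request("myservice", "<admin key>", "GET")
get_req = synonym_map_request("myservice", "<admin key>", "GET", "mysynonymmap")
delete_req = synonym_map_request("myservice", "<admin key>", "DELETE", "mysynonymmap")

# Passing one of these to urllib.request.urlopen would send it to the service.
print(delete_req.get_method(), delete_req.full_url)
```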

Configure a searchable field to use the synonym map in the index definition.

The field property synonymMaps specifies the synonym map to use for a searchable field. Synonym maps are service-level resources and can be referenced by any field of any index under the service.

    POST https://[servicename].search.azure.cn/indexes?api-version=2020-06-30
    api-key: [admin key]

    {
       "name":"myindex",
       "fields":[
          {
             "name":"id",
             "type":"Edm.String",
             "key":true
          },
          {
             "name":"name",
             "type":"Edm.String",
             "searchable":true,
             "analyzer":"en.lucene",
             "synonymMaps":[
                "mysynonymmap"
             ]
          },
          {
             "name":"name_jp",
             "type":"Edm.String",
             "searchable":true,
             "analyzer":"ja.microsoft",
             "synonymMaps":[
                "japanesesynonymmap"
             ]
          }
       ]
    }

synonymMaps can be specified for searchable fields of the type 'Edm.String' or 'Collection(Edm.String)'.

Note

You can only have one synonym map per field. If you want to use multiple synonym maps, please let us know on UserVoice.
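
The constraints above (searchable field, string type, one map per field) can be checked before an index definition is sent. The following Python sketch is one possible way to do that on the client side. The helper with_synonym_map is hypothetical and is not part of any SDK; the service performs its own validation regardless.

```python
ALLOWED_TYPES = {"Edm.String", "Collection(Edm.String)"}

def with_synonym_map(field, map_name):
    """Return a copy of a field definition that references one synonym map.

    Hypothetical client-side check of the constraints described above.
    """
    if not field.get("searchable"):
        raise ValueError(f"field {field.get('name')!r} is not searchable")
    if field.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"field {field.get('name')!r} has an unsupported type")
    updated = dict(field)
    # A field can reference only one synonym map, so the list has one entry.
    updated["synonymMaps"] = [map_name]
    return updated

name_field = {
    "name": "name",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "en.lucene",
}
updated = with_synonym_map(name_field, "mysynonymmap")
print(updated)
```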

Impact of synonyms on other search features

The synonyms feature rewrites the original query, joining the original term and its synonyms with the OR operator. For this reason, hit highlighting and scoring profiles treat the original term and synonyms as equivalent.

The synonym feature applies to search queries and does not apply to filters or facets. Similarly, suggestions are based only on the original term; synonym matches do not appear in the response.

Synonym expansions do not apply to wildcard search terms; prefix, fuzzy, and regex terms aren't expanded.

If you need a single query that applies synonym expansion together with wildcard, regex, or fuzzy search, you can combine the queries using the OR syntax. For example, to combine synonyms with wildcards in the simple query syntax, the term would be <query> | <query>*.
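
As a small illustration of the <query> | <query>* pattern, the Python sketch below builds such a search string for a single term in simple query syntax. The function name is hypothetical, and the sketch assumes a single term with no characters that need escaping in the query syntax.

```python
def synonym_or_prefix(term):
    """Combine a plain term (eligible for synonym expansion) with a prefix search.

    Assumes a single term without characters that are special in simple query syntax.
    """
    return f"{term} | {term}*"

search_text = synonym_or_prefix("dog")
print(search_text)
```

The left side of the OR remains an ordinary term and can be expanded with synonyms, while the right side is a prefix search that is not expanded.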

If you have an existing index in a development (non-production) environment, experiment with a small dictionary to see how the addition of synonyms changes the search experience, including the impact on scoring profiles, hit highlighting, and suggestions.

Next steps