How to create an index for multiple languages in Azure Cognitive Search

An index can include fields containing content in multiple languages, for example by creating a separate field for each language-specific string. For best results during indexing and querying, assign each of these fields a language analyzer that provides the appropriate linguistic rules.

Azure Cognitive Search offers a large selection of language analyzers from both Lucene and Microsoft, which can be assigned to individual fields using the Analyzer property. You can also specify a language analyzer in the portal, as described in this article.

Add analyzers to fields

A language analyzer is specified when a field is created. Adding an analyzer to an existing field definition requires overwriting (and reloading) the index, or creating a new field that is identical to the original but has an analyzer assigned. You can then delete the unused field at your convenience.
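The second option, adding a copy of an existing field with an analyzer assigned, can be sketched as a small transformation of the index definition. This is an illustrative sketch only: the index name `hotels`, the field names, and the helper `add_analyzed_copy` are hypothetical, and the code only builds the updated definition without sending it to the service.

```python
import copy


def add_analyzed_copy(index_def, source_name, new_name, analyzer):
    """Return a copy of index_def with a new field cloned from source_name.

    The new field keeps every attribute of the source field but gets its
    own name and an analyzer assignment. The input is left unmodified.
    """
    updated = copy.deepcopy(index_def)
    source = next(f for f in updated["fields"] if f["name"] == source_name)
    # dict(source, ...) copies the source field and overrides the given keys.
    updated["fields"].append(dict(source, name=new_name, analyzer=analyzer))
    return updated


# Hypothetical index definition with one searchable field and no analyzer.
index_def = {
    "name": "hotels",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
    ],
}

updated = add_analyzed_copy(index_def, "description", "description_fr", "fr.lucene")
```

The new field still has to be populated by reloading documents before queries against it return anything.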

  1. Sign in to the Azure portal and find your search service.
  2. Click Add index in the command bar at the top of the service dashboard to start a new index, or open an existing index to set an analyzer on new fields you are adding to it.
  3. Start a field definition by providing a name.
  4. Choose the Edm.String data type. Only string fields are full-text searchable.
  5. Set the Searchable attribute to enable the Analyzer property. A field must be text-based to make use of a language analyzer.
  6. Choose one of the available analyzers.
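Outside the portal, the same choices (a name, the Edm.String type, the searchable attribute, and an analyzer) appear as properties of each field in the index definition. The sketch below builds such a definition as a Python dictionary and serializes it to JSON. The index name, the `description_<lang>` naming scheme, and the specific analyzers chosen are assumptions for illustration; check the list of supported analyzers for the names available to your service.

```python
import json

# Assumed mapping from a language code to a language analyzer name.
LANGUAGE_ANALYZERS = {
    "en": "en.microsoft",
    "fr": "fr.microsoft",
    "pl": "pl.microsoft",
}


def language_field(base_name, lang, analyzer):
    """Build one searchable string field that uses a language analyzer."""
    return {
        "name": f"{base_name}_{lang}",
        "type": "Edm.String",   # only string fields are full-text searchable
        "searchable": True,     # required before an analyzer can be assigned
        "analyzer": analyzer,
    }


def build_index(name, base_name="description"):
    """Build an index definition with one field per language."""
    fields = [{"name": "id", "type": "Edm.String", "key": True}]
    for lang, analyzer in LANGUAGE_ANALYZERS.items():
        fields.append(language_field(base_name, lang, analyzer))
    return {"name": name, "fields": fields}


index_def = build_index("hotels")
index_json = json.dumps(index_def, indent=2)
```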

Assign language analyzers during field definition

By default, all searchable fields use the Standard Lucene analyzer, which is language-agnostic. To view the full list of supported analyzers, see Add language analyzers to an Azure Cognitive Search index.

In the portal, analyzers are intended to be used as-is. If you need customization or a specific configuration of filters and tokenizers, create a custom analyzer in code. The portal does not support selecting or configuring custom analyzers.
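As a rough sketch of what "a custom analyzer in code" can look like, the definition below adds an `analyzers` section to an index definition and references it from a field by name. The analyzer name `folded_standard`, the index and field names, and the choice of tokenizer and token filters are assumptions made for illustration, and the helper `undefined_custom_analyzers` is a hypothetical local check, not part of any SDK.

```python
# A custom analyzer combining a tokenizer with token filters (illustrative choice).
custom_analyzer = {
    "name": "folded_standard",
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "tokenizer": "standard_v2",
    "tokenFilters": ["lowercase", "asciifolding"],
}

index_def = {
    "name": "hotels",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {
            "name": "description",
            "type": "Edm.String",
            "searchable": True,
            "analyzer": "folded_standard",  # refers to the custom analyzer by name
        },
    ],
    "analyzers": [custom_analyzer],
}


def undefined_custom_analyzers(index_def, predefined=()):
    """List analyzer names used by fields that are neither defined in the
    index's own 'analyzers' section nor in the given predefined names."""
    defined = {a["name"] for a in index_def.get("analyzers", [])}
    used = {f["analyzer"] for f in index_def["fields"] if "analyzer" in f}
    return sorted(used - defined - set(predefined))


missing = undefined_custom_analyzers(index_def)
```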

Query language-specific fields

Once a language analyzer is selected for a field, it is used with each indexing and search request for that field. When a query is issued against multiple fields that use different analyzers, the query is processed independently by the analyzer assigned to each field.

If the language of the agent issuing a query is known, a search request can be scoped to a specific field using the searchFields query parameter. The following query is issued only against the description in Polish:

https://[service name].search.azure.cn/indexes/[index name]/docs?search=darmowy&searchFields=PolishContent&api-version=2019-05-06

You can also query your index from the portal by pasting a query similar to the one shown above into Search explorer.
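When the language is determined at run time, the scoped query URL can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The service name `myservice`, the index name `myindex`, and the helper `build_search_url` are placeholders; the host suffix and parameter names are taken from the example URL above.

```python
from urllib.parse import urlencode


def build_search_url(service, index, search, search_fields, api_version):
    """Build a GET query URL scoped to the given fields.

    search_fields is a list of field names; multiple names are joined
    with commas, as the searchFields parameter expects.
    """
    params = {
        "search": search,
        "searchFields": ",".join(search_fields),
        "api-version": api_version,
    }
    base = f"https://{service}.search.azure.cn/indexes/{index}/docs"
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    service="myservice",
    index="myindex",
    search="darmowy",
    search_fields=["PolishContent"],
    api_version="2019-05-06",
)
```

Using `urlencode` keeps search terms with spaces or non-ASCII characters correctly escaped, which matters for queries in languages such as Polish.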

Boost language-specific fields

Sometimes the language of the agent issuing a query is not known, in which case the query can be issued against all fields simultaneously. If needed, a preference for results in a certain language can be defined using scoring profiles. In the example below, matches found in the English description are scored higher than matches in Polish and French:

    "scoringProfiles": [
      {
        "name": "englishFirst",
        "text": {
          "weights": { "description_en": 2 }
        }
      }
    ]

https://[service name].search.azure.cn/indexes/[index name]/docs?search=Microsoft&scoringProfile=englishFirst&api-version=2020-06-30
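The profile and the query string above can be generated together, which helps keep the profile name consistent between the index definition and the request. This is a sketch under stated assumptions: the helpers `text_weight_profile` and `scored_query_string` are hypothetical, and the field name `description_en` is taken from the example above.

```python
from urllib.parse import urlencode


def text_weight_profile(name, weights):
    """Build a scoring profile that boosts matches in the given fields.

    weights maps a searchable field name to its relative weight.
    """
    return {"name": name, "text": {"weights": dict(weights)}}


def scored_query_string(search, profile_name, api_version):
    """Build the query string for a search that applies a scoring profile."""
    return urlencode({
        "search": search,
        "scoringProfile": profile_name,
        "api-version": api_version,
    })


profile = text_weight_profile("englishFirst", {"description_en": 2})

# The profile belongs in the "scoringProfiles" section of the index definition.
index_fragment = {"scoringProfiles": [profile]}

query = scored_query_string("Microsoft", profile["name"], "2020-06-30")
```

Because no searchFields parameter is set, the query runs against all searchable fields, and the profile only changes how matches are ranked.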

Next steps

If you're a .NET developer, note that you can configure language analyzers using the Azure Cognitive Search .NET SDK and the Analyzer property.