Tutorial: Create a custom analyzer for phone numbers

Analyzers are a key component in any search solution. To improve the quality of search results, it's important to understand how analyzers work and how they impact those results.

In some cases, such as with a free-text field, simply selecting the correct language analyzer will improve search results. However, other scenarios, such as accurately searching phone numbers, URLs, or emails, may require the use of custom analyzers.

This tutorial uses Postman and Azure Cognitive Search's REST APIs to:

  • Explain how analyzers work
  • Define a custom analyzer for searching phone numbers
  • Test how the custom analyzer tokenizes text
  • Create separate analyzers for indexing and searching to further improve results

Prerequisites

The following services and tools are required for this tutorial.

Download files

Source code for this tutorial is in the custom-analyzers folder in the Azure-Samples/azure-search-postman-samples GitHub repository.

1 - Create Azure Cognitive Search service

To complete this tutorial, you'll need an Azure Cognitive Search service, which you can create in the portal. You can use the Free tier to complete this walkthrough.

For the next step, you'll need to know the name of your search service and its API key. If you're unsure how to find those items, check out this quickstart.

2 - 设置 Postman2 - Set up Postman

接下来,启动 Postman 并导入从 Azure-Samples/azure-search-postman-samples 下载的集合。Next, start Postman and import the collection you downloaded from Azure-Samples/azure-search-postman-samples.

若要导入集合,请转到“文件” > “导入”,然后选择要导入的集合文件 。To import the collection, go to Files > Import, then select the collection file you'd like to import.

对于每个请求,需要:For each request, you need to:

  1. 请将 <YOUR-SEARCH-SERVICE> 替换为搜索服务的名称。Replace <YOUR-SEARCH-SERVICE> with the name of your search service.

  2. <YOUR-ADMIN-API-KEY> 替换为搜索服务的主要密钥或辅助密钥。Replace <YOUR-ADMIN-API-KEY> with either the primary or secondary key of your search service.

Postman 请求 URL 和标头

如果不熟悉 Postman,请参阅探索 Azure 认知搜索 REST APIIf you're unfamiliar with Postman, see Explore Azure Cognitive Search REST APIs.

3 - Create an initial index

In this step, we'll create an initial index, load documents into it, and then query the documents to see how our initial searches perform.

Create index

We'll start by creating a simple index called tutorial-basic-index with two fields: id and phone_number. We haven't defined an analyzer yet, so the standard.lucene analyzer will be used by default.

To create the index, we send the following request:

PUT https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index?api-version=2019-05-06
  Content-Type: application/json
  api-key: <YOUR-ADMIN-API-KEY>

  {
    "fields": [
      {
        "name": "id",
        "type": "Edm.String",
        "key": true,
        "searchable": true,
        "filterable": false,
        "facetable": false,
        "sortable": true
      },
      {
        "name": "phone_number",
        "type": "Edm.String",
        "sortable": false,
        "searchable": true,
        "filterable": false,
        "facetable": false
      }
    ]
  }

Load data

Next, we'll load data into the index. In some cases, you may not have control over the format of the phone numbers ingested, so we'll test against different kinds of formats. Ideally, a search solution will return all matching phone numbers regardless of their format.

Data is loaded into the index using the following request:

POST https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index/docs/index?api-version=2019-05-06
  Content-Type: application/json
  api-key: <YOUR-ADMIN-API-KEY>

  {
    "value": [
      {
        "@search.action": "upload",  
        "id": "1",
        "phone_number": "425-555-0100"
      },
      {
        "@search.action": "upload",  
        "id": "2",
        "phone_number": "(321) 555-0199"
      },
      {  
        "@search.action": "upload",  
        "id": "3",
        "phone_number": "+1 425-555-0100"
      },
      {  
        "@search.action": "upload",  
        "id": "4",  
        "phone_number": "+1 (321) 555-0199"
      },
      {
        "@search.action": "upload",  
        "id": "5",
        "phone_number": "4255550100"
      },
      {
        "@search.action": "upload",  
        "id": "6",
        "phone_number": "13215550199"
      },
      {
        "@search.action": "upload",  
        "id": "7",
        "phone_number": "425 555 0100"
      },
      {
        "@search.action": "upload",  
        "id": "8",
        "phone_number": "321.555.0199"
      }
    ]  
  }

With the data in the index, we're ready to start searching.

To make the search intuitive, it's best not to expect users to format queries in a specific way. A user could search for (425) 555-0100 in any of the formats shown above and still expect results to be returned. In this step, we'll test a couple of sample queries to see how they perform.

We start by searching for (425) 555-0100:

GET https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index/docs?api-version=2019-05-06&search=(425) 555-0100
  Content-Type: application/json
  api-key: <YOUR-ADMIN-API-KEY>  

This query returns three of the four expected results, but also returns two unexpected results:

{
    "value": [
        {
            "@search.score": 0.05634898,
            "phone_number": "+1 425-555-0100"
        },
        {
            "@search.score": 0.05634898,
            "phone_number": "425 555 0100"
        },
        {
            "@search.score": 0.05634898,
            "phone_number": "425-555-0100"
        },
        {
            "@search.score": 0.020766128,
            "phone_number": "(321) 555-0199"
        },
        {
            "@search.score": 0.020766128,
            "phone_number": "+1 (321) 555-0199"
        }
    ]
}

Next, let's search for a number without any formatting, 4255550100:

GET https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index/docs?api-version=2019-05-06&search=4255550100
  api-key: <YOUR-ADMIN-API-KEY>

This query does even worse, returning only one of the four correct matches.

{
    "value": [
        {
            "@search.score": 0.6015292,
            "phone_number": "4255550100"
        }
    ]
}

If you find these results confusing, you're not alone. In the next section, we'll dig into why we're getting them.

4 - Debug search results

To understand these search results, it's important to first understand how analyzers work. From there, we can test the default analyzer using the Analyze Text API and then create an analyzer that meets our needs.

How analyzers work

An analyzer is a component of the full-text search engine responsible for processing the text in query strings and indexed documents. Different analyzers manipulate text in different ways depending on the scenario. For this scenario, we need to build an analyzer tailored to phone numbers.

Analyzers consist of three components:

  • Character filters that remove or replace individual characters in the input text.
  • A tokenizer that breaks the input text into tokens, which become keys in the search index.
  • Token filters that manipulate the tokens produced by the tokenizer.

In the diagram below, you can see how these three components work together to tokenize a sentence:

Diagram of the analyzer process tokenizing a sentence

These tokens are then stored in an inverted index, which allows for fast full-text searches. An inverted index enables full-text search by mapping all unique terms extracted during lexical analysis to the documents in which they occur. You can see an example in the diagram below:

Example of an inverted index

All of search comes down to searching for the terms stored in the inverted index. When a user issues a query:

  1. The query is parsed and the query terms are analyzed.
  2. The inverted index is then scanned for documents with matching terms.
  3. Finally, the retrieved documents are ranked by the similarity algorithm.

Diagram of the analyzer process ranking similarity
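The term-matching in steps 1 and 2 can be sketched as a toy inverted index in Python. This is illustrative only — the real service also applies analyzers, scoring, and ranking:

```python
# Toy inverted index: map each term to the set of document ids containing it.
# The tokens below are what the standard analyzer produces for two of the
# sample phone numbers; the service builds this structure at indexing time.
documents = {
    "1": ["425", "555", "0100"],
    "2": ["321", "555", "0199"],
}

inverted_index = {}
for doc_id, terms in documents.items():
    for term in terms:
        inverted_index.setdefault(term, set()).add(doc_id)

def search(query_terms):
    """Return ids of documents sharing at least one analyzed query term."""
    matches = set()
    for term in query_terms:
        matches |= inverted_index.get(term, set())
    return matches

# The shared term "555" pulls in both documents -- the same effect that
# produced the unexpected matches in the earlier query.
print(sorted(search(["425", "555", "0100"])))  # ['1', '2']
```

Because matching happens term by term, any document that shares even one token with the analyzed query becomes a candidate result.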

If the query terms don't match the terms in the inverted index, no results are returned. To learn more about how queries work, see this article on full-text search.

Note

Partial term queries are an important exception to this rule. Unlike regular term queries, these queries (prefix, wildcard, and regex queries) bypass the lexical analysis process. Partial terms are only lowercased before being matched against terms in the index. If an analyzer isn't configured to support these types of queries, you'll often receive unexpected results because matching terms don't exist in the index.

Test the analyzer using the Analyze Text API

Azure Cognitive Search provides an Analyze Text API that allows you to test analyzers to understand how they process text.

The Analyze Text API is called using the following request:

POST https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index/analyze?api-version=2019-05-06
  Content-Type: application/json
  api-key: <YOUR-ADMIN-API-KEY>

  {
      "text": "(425) 555-0100",
      "analyzer": "standard.lucene"
  }

The API then returns the list of tokens extracted from the text. You can see that the standard Lucene analyzer splits the phone number into three separate tokens:

{
    "tokens": [
        {
            "token": "425",
            "startOffset": 1,
            "endOffset": 4,
            "position": 0
        },
        {
            "token": "555",
            "startOffset": 6,
            "endOffset": 9,
            "position": 1
        },
        {
            "token": "0100",
            "startOffset": 10,
            "endOffset": 14,
            "position": 2
        }
    ]
}

Conversely, the phone number 4255550100, formatted without any punctuation, is tokenized into a single token:

{
  "text": "4255550100",
  "analyzer": "standard.lucene"
}
{
    "tokens": [
        {
            "token": "4255550100",
            "startOffset": 0,
            "endOffset": 10,
            "position": 0
        }
    ]
}

Keep in mind that both the query terms and the indexed documents are analyzed. Thinking back to the search results from the previous step, we can start to see why those results were returned.

In the first query, the incorrect phone numbers were returned because one of their terms, 555, matched one of the terms we searched for. In the second query, only one number was returned because it was the only record with a term matching 4255550100.

5 - 生成自定义分析器5 - Build a custom analyzer

现在我们理解了我们所看到的结果,接着我们生成一个自定义分析器来改善词汇切分逻辑。Now that we understand the results we're seeing, let's build a custom analyzer to improve the tokenization logic.

目的是提供对电话号码的直观搜索,无论查询或索引字符串采用哪种格式。The goal is to provide intuitive search against phone numbers no matter what format the query or indexed string is in. 为实现此结果,我们将指定字符筛选器tokenizer标记筛选器To achieve this result, we'll specify a character filter, a tokenizer, and a token filter.

字符筛选器Character filters

字符筛选器用于在将文本馈送到 tokenizer 之前对其进行处理。Character filters are used to process text before it's fed into the tokenizer. 字符筛选器的常见用途包括筛选出 HTML 元素或替换特殊字符。Common uses of character filters include filtering out HTML elements or replacing special characters.

对于电话号码,我们希望删除空格和特殊字符,因为并非所有电话号码格式都包含相同的特殊字符和空格。For phone numbers, we want to remove whitespace and special characters because not all phone number formats contain the same special characters and spaces.

"charFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
      "name": "phone_char_mapping",
      "mappings": [
        "-=>",
        "(=>",
        ")=>",
        "+=>",
        ".=>",
        "\\u0020=>"
      ]
    }
  ]

The filter above removes -, (, ), +, ., and spaces from the input.

| Input | Output |
|-------|--------|
| (321) 555-0199 | 3215550199 |
| 321.555.0199 | 3215550199 |
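The effect of this mapping can be sketched in Python: each mapped character is replaced with the empty string before the text reaches the tokenizer. This is a toy model of the MappingCharFilter, not the service's implementation:

```python
# Characters removed by the phone_char_mapping filter (each maps to "").
REMOVED = "-()+. "

def phone_char_filter(text: str) -> str:
    """Drop every occurrence of the mapped characters from the input."""
    return text.translate({ord(c): None for c in REMOVED})

print(phone_char_filter("(321) 555-0199"))  # 3215550199
print(phone_char_filter("321.555.0199"))    # 3215550199
```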

Tokenizers

Tokenizers split text into tokens, discarding some characters, such as punctuation, along the way. In many cases, the goal of tokenization is to split a sentence into individual words.

For this scenario, we'll use a keyword tokenizer, keyword_v2, because we want to capture the phone number as a single term. Note that this isn't the only way to solve this problem; see the Alternate approaches section below.

The keyword tokenizer always outputs the text it's given as a single term.

| Input | Output |
|-------|--------|
| The dog swims. | [The dog swims.] |
| 3215550199 | [3215550199] |

Token filters

Token filters filter out or modify the tokens generated by the tokenizer. One common use of a token filter is to lowercase all characters using a lowercase token filter. Another common use is filtering out stopwords such as the, and, or is.

While we don't need either of those filters for this scenario, we'll use an nGram token filter to allow for partial searches of phone numbers.

"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
    "name": "custom_ngram_filter",
    "minGram": 3,
    "maxGram": 20
  }
]

NGramTokenFilterV2

The nGram_v2 token filter splits tokens into n-grams of a given size based on the minGram and maxGram parameters.

For the phone analyzer, we set minGram to 3 because that's the shortest substring we expect users to search for. maxGram is set to 20 to ensure that all phone numbers, even with extensions, fit into a single n-gram.

The unfortunate side effect of n-grams is that some false positives will be returned. We'll fix this in step 7 by building a separate search analyzer that doesn't include the n-gram token filter.

| Input | Output |
|-------|--------|
| [12345] | [123, 1234, 12345, 234, 2345, 345] |
| [3215550199] | [321, 3215, 32155, 321555, 3215550, 32155501, 321555019, 3215550199, 215, 2155, 21555, 215550, ... ] |
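A rough Python model of what this filter does to a single token (the real nGram_v2 filter also tracks character offsets and token positions):

```python
def ngram_filter(token: str, min_gram: int = 3, max_gram: int = 20) -> list:
    """Emit every substring of `token` whose length falls in
    [min_gram, max_gram], grouped by start offset."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams

print(ngram_filter("12345"))  # ['123', '1234', '12345', '234', '2345', '345']
```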

Analyzer

With our character filter, tokenizer, and token filter in place, we're ready to define the analyzer.

"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "phone_analyzer",
    "tokenizer": "custom_tokenizer_phone",
    "tokenFilters": [
      "custom_ngram_filter"
    ],
    "charFilters": [
      "phone_char_mapping"
    ]
  }
]

| Input | Output |
|-------|--------|
| 12345 | [123, 1234, 12345, 234, 2345, 345] |
| (321) 555-0199 | [321, 3215, 32155, 321555, 3215550, 32155501, 321555019, 3215550199, 215, 2155, 21555, 215550, ... ] |
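End to end, phone_analyzer behaves roughly like this self-contained sketch: character mapping, then keyword tokenization, then n-gram filtering. A toy model only, not the service's implementation:

```python
def analyze_phone(text: str, min_gram: int = 3, max_gram: int = 20) -> list:
    """Toy model of the phone_analyzer pipeline."""
    # Character filter: strip the characters mapped away by phone_char_mapping.
    token = text.translate({ord(c): None for c in "-()+. "})
    # Keyword tokenizer: the whole stripped string is a single token,
    # which the n-gram token filter then expands into substrings.
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams

print(analyze_phone("12345"))
# ['123', '1234', '12345', '234', '2345', '345']
print("3215550199" in analyze_phone("(321) 555-0199"))  # True
```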

Notice that any of the tokens in the output can now be searched. If our query includes any of those tokens, the phone number will be returned.

With the custom analyzer defined, recreate the index so that the custom analyzer is available for testing in the next step. For simplicity, the Postman collection creates a new index named tutorial-first-analyzer with the analyzer we defined.

6 - Test the custom analyzer

After creating the index, you can test the analyzer using the following request:

POST https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-first-analyzer/analyze?api-version=2019-05-06
  Content-Type: application/json
  api-key: <YOUR-ADMIN-API-KEY>  

  {
    "text": "+1 (321) 555-0199",
    "analyzer": "phone_analyzer"
  }

You'll then see the collection of tokens produced from the phone number:

{
    "tokens": [
        {
            "token": "132",
            "startOffset": 1,
            "endOffset": 17,
            "position": 0
        },
        {
            "token": "1321",
            "startOffset": 1,
            "endOffset": 17,
            "position": 0
        },
        {
            "token": "13215",
            "startOffset": 1,
            "endOffset": 17,
            "position": 0
        },
        ...
        ...
        ...
    ]
}

7 - Build a custom analyzer for queries

After making some sample queries against the index with the custom analyzer, you'll find that recall has improved and all matching phone numbers are now returned. However, the n-gram token filter also causes some false positives to be returned. This is a common side effect of an n-gram token filter.

To prevent false positives, we'll create a separate analyzer for querying. This analyzer is the same as the one we already created, but without the custom_ngram_filter.

    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "phone_analyzer_search",
      "tokenizer": "custom_tokenizer_phone",
      "tokenFilters": [],
      "charFilters": [
        "phone_char_mapping"
      ]
    }

In the index definition, we then specify both an indexAnalyzer and a searchAnalyzer.

    {
      "name": "phone_number",
      "type": "Edm.String",
      "sortable": false,
      "searchable": true,
      "filterable": false,
      "facetable": false,
      "indexAnalyzer": "phone_analyzer",
      "searchAnalyzer": "phone_analyzer_search"
    }

With this change, you're all set. Recreate the index, index the data, and test the queries again to verify that search works as expected. If you're using the Postman collection, it creates a third index named tutorial-second-analyzer.
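The asymmetry between the two analyzers can be sketched as follows: documents are expanded into n-grams at indexing time, while a query is reduced to a single normalized token, so a query matches only if its digits actually appear in the indexed number. A toy model whose function names merely mirror the analyzers above:

```python
def strip_formatting(text: str) -> str:
    # phone_char_mapping equivalent: drop formatting characters.
    return text.translate({ord(c): None for c in "-()+. "})

def index_terms(text: str, min_gram: int = 3, max_gram: int = 20) -> set:
    """phone_analyzer: normalized string expanded into n-grams."""
    token = strip_formatting(text)
    return {token[s:s + n]
            for s in range(len(token))
            for n in range(min_gram, max_gram + 1)
            if s + n <= len(token)}

def query_terms(text: str) -> set:
    """phone_analyzer_search: normalized string, no n-gram expansion."""
    return {strip_formatting(text)}

indexed = index_terms("+1 (321) 555-0199")
print(query_terms("3215550199") <= indexed)    # True: the full number matches
print(query_terms("555-0199") <= indexed)      # True: partial search still works
print(query_terms("425-555-0100") <= indexed)  # False: no false positive
```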

Alternate approaches

The analyzer above was designed to maximize flexibility for search. However, it does so at the cost of storing many potentially unimportant terms in the index.

The example below shows a different analyzer that can also be used for this task.

The analyzer works well except for input data such as 14255550100 that makes it difficult to logically chunk the phone number. For example, the analyzer wouldn't be able to separate the country code, 1, from the area code, 425. This discrepancy would mean the number above isn't returned if a user doesn't include a country code in their search.

"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "phone_analyzer_shingles",
    "tokenizer": "custom_tokenizer_phone",
    "tokenFilters": [
      "custom_shingle_filter"
    ]
  }
],
"tokenizers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.StandardTokenizerV2",
    "name": "custom_tokenizer_phone",
    "maxTokenLength": 4
  }
],
"tokenFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.ShingleTokenFilter",
    "name": "custom_shingle_filter",
    "minShingleSize": 2,
    "maxShingleSize": 6,
    "tokenSeparator": ""
  }
]

In the example below, you can see that the phone number is split into the chunks you would normally expect a user to search for.

| Input | Output |
|-------|--------|
| (321) 555-0199 | [321, 555, 0199, 321555, 5550199, 3215550199] |
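A toy Python model of this shingle-based analyzer. Splitting on non-word characters stands in for the standard tokenizer, and the maxTokenLength cap is ignored for brevity:

```python
import re

def shingle_analyzer(text: str, min_shingle: int = 2, max_shingle: int = 6) -> list:
    """Toy model of phone_analyzer_shingles: tokenize on separators, then
    emit the original tokens plus runs of 2..max_shingle adjacent tokens
    joined with an empty separator."""
    tokens = [t for t in re.split(r"\W+", text) if t]
    out = list(tokens)  # the shingle filter emits unigrams by default
    for size in range(min_shingle, max_shingle + 1):
        for i in range(len(tokens) - size + 1):
            out.append("".join(tokens[i:i + size]))
    return out

print(shingle_analyzer("(321) 555-0199"))
# ['321', '555', '0199', '321555', '5550199', '3215550199']
```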

Depending on your requirements, this may be a more efficient approach to the problem.

Reset and rerun

For simplicity, this tutorial had you create three new indexes. However, it's common to delete and recreate indexes during the early stages of development. You can delete an index in the Azure portal or with the following API call:

DELETE https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/indexes/tutorial-basic-index?api-version=2019-05-06
  api-key: <YOUR-ADMIN-API-KEY>

Takeaways

This tutorial demonstrated the process of building and testing a custom analyzer. You created an index, indexed data, and then queried against the index to see what search results were returned. From there, you used the Analyze Text API to see the lexical analysis process in action.

While the analyzer defined in this tutorial offers an easy solution for searching phone numbers, the same process can be used to build a custom analyzer for any scenario you may encounter.

Clean up resources

When you're working in your own subscription, it's a good idea to remove the resources you no longer need at the end of a project. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.

You can find and manage resources in the portal, using the All resources or Resource groups link in the left navigation pane.

Next steps

Now that you're familiar with creating a custom analyzer, let's take a look at all the different filters, tokenizers, and analyzers available to help you build a rich search experience.