Quickstart: Create an Azure Cognitive Search index in the Azure portal

The Import data wizard is an Azure portal tool that guides you through creating a search index, so that you can be writing interesting queries within minutes.

The wizard also has pages for AI enrichment, so that you can extract text and structure from image files and unstructured text. AI content processing includes optical character recognition (OCR), key phrase and entity extraction, and image analysis.

Prerequisites

Before you begin, you must have the following:

  • An Azure subscription.
  • An Azure Cognitive Search service. You can use a free service for this quickstart.

Check for space

Many customers start with the free service. The free tier is limited to three indexes, three data sources, and three indexers. Make sure you have room for the extra items before you begin. This tutorial creates one of each object.

Sections on the service dashboard show how many indexes, indexers, and data sources you already have.

Screenshot: Lists of indexes, indexers, and data sources

Create an index and load data

Search queries iterate over an index that contains searchable data, metadata, and additional constructs that optimize certain search behaviors.

For this tutorial, we use a built-in sample dataset that an indexer crawls via the Import data wizard. An indexer is a source-specific crawler that can read metadata and content from supported Azure data sources. Indexers are normally used programmatically, but in the portal, you can access them through the Import data wizard.

Step 1 - Start the Import data wizard and create a data source

  1. Sign in to the Azure portal with your Azure account.

  2. Find your search service, and on the Overview page, click Import data on the command bar to create and populate a search index.

    Screenshot: Import data command

  3. In the wizard, click Connect to your data > Samples > hotels-sample. This data source is built in. If you were creating your own data source, you would need to specify a name, type, and connection information. Once created, it becomes an "existing data source" that can be reused in other import operations.

    Screenshot: Select the sample dataset

  4. Continue to the next page.

Step 2 - Skip the "Enrich content" page

The wizard supports the creation of an AI enrichment pipeline that incorporates Cognitive Services AI algorithms into indexing.

We'll skip this step for now and move directly on to Customize target index.

Screenshot: Skip the cognitive skills step

Tip

You can step through an AI-indexing example in a quickstart or tutorial.

Step 3 - Configure index

Typically, index creation is a code-based exercise, completed before loading data. However, as this tutorial shows, the wizard can generate a basic index for any data source it can crawl. At a minimum, an index requires a name and a fields collection; one of the fields must be marked as the document key to uniquely identify each document. Additionally, you can specify language analyzers, or suggesters if you want autocomplete or suggested queries.

Fields have data types and attributes. The check boxes across the top are index attributes that control how each field is used.

  • Retrievable means that the field shows up in the search results list. You can mark individual fields as off limits for search results by clearing this check box, for example, for fields used only in filter expressions.
  • Key is the unique document identifier. It's always a string, and it's required.
  • Filterable, Sortable, and Facetable determine whether a field can be used in a filter, a sort, or a faceted navigation structure.
  • Searchable means that a field is included in full text search. Strings are searchable. Numeric and Boolean fields are often marked as not searchable.

Storage requirements do not vary as a result of your selection. For example, if you set the Retrievable attribute on multiple fields, storage requirements do not go up.

By default, the wizard scans the data source for a unique identifier to use as the basis for the key field. Strings are attributed as Retrievable and Searchable. Integers are attributed as Retrievable, Filterable, Sortable, and Facetable.
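To make these defaults concrete, here is a minimal sketch that produces field definitions shaped like the REST API index schema (`name`, `type`, `key`, and the boolean attribute flags). The helpers `default_field` and `check_single_key` and the `ParkingSpots` field are hypothetical, and the wizard's actual choices may differ in detail.

```python
import json

def default_field(name: str, edm_type: str, is_key: bool = False) -> dict:
    """Hypothetical helper: apply wizard-style default attributes to one field."""
    if is_key and edm_type != "Edm.String":
        raise ValueError("the document key must be an Edm.String field")
    field = {"name": name, "type": edm_type, "key": is_key, "retrievable": True,
             "searchable": False, "filterable": False, "sortable": False, "facetable": False}
    if edm_type == "Edm.String":
        field["searchable"] = True  # strings: Retrievable + Searchable
    elif edm_type in ("Edm.Int32", "Edm.Int64"):
        # integers: Retrievable, Filterable, Sortable, Facetable
        field.update(filterable=True, sortable=True, facetable=True)
    return field

def check_single_key(index: dict) -> None:
    """An index needs exactly one key field to identify each document."""
    keys = [f["name"] for f in index["fields"] if f["key"]]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key field, found {keys}")

index = {
    "name": "hotels-sample-index",
    "fields": [
        default_field("HotelId", "Edm.String", is_key=True),
        default_field("HotelName", "Edm.String"),
        default_field("ParkingSpots", "Edm.Int32"),  # hypothetical integer field
    ],
}
check_single_key(index)
print(json.dumps(index, indent=2))
```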

  1. Accept the defaults.

    If you rerun the wizard using an existing hotels data source, the index won't be configured with default attributes. You'll have to select attributes manually on future imports.

    Screenshot: Generated hotels index

  2. Continue to the next page.

Step 4 - Configure indexer

Still in the Import data wizard, click Indexer > Name, and type a name for the indexer.

This object defines an executable process. You could put it on a recurring schedule, but for now, use the default option to run the indexer once, immediately.

Click Submit to create and simultaneously run the indexer.

Screenshot: hotels indexer
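Outside the portal, an indexer is created from a JSON definition sent to the REST API. The sketch below builds such a definition: `dataSourceName`, `targetIndexName`, and the optional `schedule` with an ISO 8601 `interval` are REST properties, while `build_indexer` and the object names are hypothetical stand-ins for what the wizard generated. Omitting `schedule` roughly corresponds to the wizard's run-once default.

```python
import json
from typing import Optional

def build_indexer(name: str, data_source: str, index: str,
                  interval: Optional[str] = None) -> dict:
    """Build an indexer definition; without a schedule it runs once when created."""
    indexer = {"name": name, "dataSourceName": data_source, "targetIndexName": index}
    if interval is not None:
        # ISO 8601 duration, for example "PT2H" for every two hours.
        indexer["schedule"] = {"interval": interval}
    return indexer

run_once = build_indexer("hotels-sample-indexer", "hotels-sample", "hotels-sample-index")
recurring = build_indexer("hotels-sample-indexer", "hotels-sample",
                          "hotels-sample-index", interval="PT2H")
print(json.dumps(recurring, indent=2))
```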

Monitor progress

The wizard should take you to the Indexers list, where you can monitor progress. To get there yourself, go to the Overview page and click Indexers.

It can take a few minutes for the portal to update the page, but you should see the newly created indexer in the list, with a status of "in progress" or "success", along with the number of documents indexed.

Screenshot: Indexer progress message

View the index

The main service page provides links to the resources created in your Azure Cognitive Search service. To view the index you just created, click Indexes from the list of links.

Wait for the portal page to refresh. After a few minutes, you should see the index with a document count and storage size.

Screenshot: Indexes list on the service dashboard

From this list, you can click the hotels-sample index that you just created to view the index schema and optionally add new fields.

The Fields tab shows the index schema. Scroll to the bottom of the list to enter a new field. In most cases, you cannot change existing fields. Existing fields have a physical representation in Azure Cognitive Search and are thus non-modifiable, not even in code. To fundamentally change an existing field, create a new index and drop the original.

Screenshot: Sample index definition
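The rule that new fields can be added while existing ones generally cannot be changed can be modeled as a comparison of two field lists. This is a simplified, hypothetical model (`check_update` is not a service API); the service applies its own, more detailed rules, and a few attributes may still be modifiable.

```python
def check_update(old_fields: list, new_fields: list) -> list:
    """Simplified model: report removed or modified fields; pure additions pass."""
    new_by_name = {f["name"]: f for f in new_fields}
    problems = []
    for old in old_fields:
        new = new_by_name.get(old["name"])
        if new is None:
            problems.append(f"field '{old['name']}' was removed")
        elif new != old:
            problems.append(f"field '{old['name']}' was modified")
    return problems

old = [
    {"name": "HotelId", "type": "Edm.String", "key": True},
    {"name": "Rating", "type": "Edm.Double", "key": False},
]
added = old + [{"name": "Tags", "type": "Collection(Edm.String)", "key": False}]
retyped = [old[0], {"name": "Rating", "type": "Edm.Int32", "key": False}]

print(check_update(old, added))    # adding a field raises no problems
print(check_update(old, retyped))  # changing an existing field's type is flagged
```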

Other constructs, such as scoring profiles and CORS options, can be added at any time.

To understand clearly what you can and cannot edit during index design, take a minute to view the index definition options. Grayed-out options indicate that a value cannot be modified or deleted.

Query using Search explorer

You should now have a search index that's ready to query using the built-in Search explorer query page. It provides a search box so that you can test arbitrary query strings.

Search explorer is only equipped to handle REST API requests, but it accepts both the simple query syntax and the full Lucene query parser syntax, plus all the search parameters available in the Search Documents REST API operation.
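Because Search explorer sends Search Documents REST requests, the same query can be issued from code. The sketch below only builds the GET request without sending it; the service name and key are hypothetical placeholders, while the URL pattern, `api-version`, and `api-key` header follow the REST API.

```python
import urllib.parse
import urllib.request

SERVICE = "my-search-service"   # hypothetical service name
API_KEY = "<your-query-key>"    # placeholder; never hard-code real keys
API_VERSION = "2020-06-30"

def search_request(index: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a Search Documents GET request."""
    query = {"api-version": API_VERSION, **params}
    # Keep '$' readable in OData names like $top; encode spaces as %20, not '+'.
    qs = urllib.parse.urlencode(query, safe="$", quote_via=urllib.parse.quote)
    url = f"https://{SERVICE}.search.windows.net/indexes/{index}/docs?{qs}"
    return urllib.request.Request(url, headers={"api-key": API_KEY}, method="GET")

req = search_request("hotels-sample-index", {"search": "spa"})
print(req.full_url)
# With a real service name and key, urllib.request.urlopen(req) would send it.
```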

  1. Click Search explorer on the command bar.

    Screenshot: Search explorer command

  2. From the Index dropdown, choose hotels-sample-index. Click the API Version dropdown to see which REST APIs are available. For the queries below, use the generally available version (2020-06-30).

    Screenshot: Index and API commands

  3. In the search bar, paste in the query strings below and click Search.

    Screenshot: Query string and search button

Example queries

You can enter terms and phrases, similar to what you might do in a Bing search, or fully specified query expressions. Results are returned as verbose JSON documents.

Simple query with top N results

Example (string query): search=spa

  • The search parameter takes keywords for full text search. In this case, it returns hotel data for documents that contain spa in any searchable field.

  • Search explorer returns results in JSON, which is verbose and hard to read if documents have a dense structure. This is intentional; visibility into the entire document is important for development purposes, especially during testing. For a better user experience, you will need to write code that handles search results and brings out the important elements.

  • Documents are composed of all fields marked as "retrievable" in the index. To view index attributes in the portal, click hotels-sample in the Indexes list.
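Code that brings out the important elements can be as small as picking a few fields from each entry in the response's `value` array, along with its `@search.score`. This is a minimal sketch; the response below is an abbreviated, hypothetical body and the hotel names are made up. On the service side, the `$select` parameter can also trim the fields returned.

```python
import json

# Abbreviated, hypothetical response body in the shape Search Documents returns.
raw = """
{
  "value": [
    {"@search.score": 1.53, "HotelId": "21", "HotelName": "Sample Hotel A", "Rating": 4.2},
    {"@search.score": 0.87, "HotelId": "7", "HotelName": "Sample Hotel B", "Rating": 3.6}
  ]
}
"""

def summarize(body: str, fields=("HotelName", "Rating")) -> list:
    """Keep only the relevance score and a few display fields from each result."""
    results = json.loads(body)["value"]
    return [{"score": doc["@search.score"], **{f: doc.get(f) for f in fields}}
            for doc in results]

for row in summarize(raw):
    print(f"{row['score']:.2f}  {row['HotelName']}  ({row['Rating']})")
```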

Example (parameterized query): search=spa&$count=true&$top=10

  • The & symbol is used to append search parameters, which can be specified in any order.

  • The $count=true parameter returns the total count of documents that match the query, not just those returned in the current response. This value appears near the top of the search results. You can verify filter queries by monitoring changes reported by $count=true. Smaller counts indicate your filter is working.

  • $top=10 returns the 10 highest-ranked documents out of the total. By default, Azure Cognitive Search returns the 50 best matches. You can increase or decrease that number with $top.
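Since $count reports the total number of matches and $top caps how many come back per request, the two combine with the $skip parameter for paging. A minimal sketch: `page_params` is a hypothetical helper and the count of 23 is a made-up value; the service limits how large $skip can be, so very deep paging needs another approach.

```python
import math

def page_params(total_count: int, top: int = 10) -> list:
    """Return the $top/$skip pairs needed to page through total_count matches."""
    pages = math.ceil(total_count / top)
    return [{"$top": top, "$skip": page * top} for page in range(pages)]

# Suppose a query with $count=true reported "@odata.count": 23 (hypothetical value).
for params in page_params(23):
    print(params)
```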

Filter the query

Filters are included in search requests when you append the $filter parameter.

Example (filtered): search=beach&$filter=Rating gt 4

  • The $filter parameter returns results matching the criteria you provide; in this case, ratings greater than 4.

  • Filter syntax is an OData construction. For more information, see Filter OData syntax.
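When filter values come from user input, string literals need OData quoting: wrap the value in single quotes and double any embedded single quote. A minimal sketch with hypothetical helpers `odata_string` and `and_filters`; the category value is illustrative.

```python
def odata_string(value: str) -> str:
    """Quote an OData string literal: wrap in single quotes, double embedded quotes."""
    return "'" + value.replace("'", "''") + "'"

def and_filters(*clauses: str) -> str:
    """Combine filter clauses with 'and', parenthesizing each one."""
    return " and ".join(f"({c})" for c in clauses)

rating_clause = "Rating gt 4"
category_clause = f"Category eq {odata_string('Resort and Spa')}"
print(and_filters(rating_clause, category_clause))
# (Rating gt 4) and (Category eq 'Resort and Spa')
```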

Facet the query

Facets are requested by adding the facet parameter to a search request. For each distinct value of the specified field, the response includes a count of the matching documents.

Example (faceted with scope reduction): search=*&facet=Category&$top=2

  • search=* is an empty search. Empty searches search over everything. One reason for submitting an empty query is to filter or facet over the complete set of documents. For example, you might want a faceted navigation structure that consists of all hotels in the index.

  • facet returns a navigation structure that you can pass to a UI control. It returns categories and a count for each. In this case, categories are based on a field conveniently called Category. There is no aggregation in Azure Cognitive Search, but you can approximate aggregation via facet, which gives a count of documents in each category.

  • $top=2 brings back two documents, illustrating that you can use top to either reduce or increase results.
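Facet results arrive in a separate `@search.facets` section, keyed by field name, with a `value` and `count` per bucket; the counts cover all matching documents, not only the $top documents returned. The sketch below turns one field's buckets into label/count pairs for a UI control. The response and its counts are hypothetical, and `facet_rows` is an illustrative helper.

```python
import json

# Hypothetical excerpt of a response to search=*&facet=Category&$top=2.
raw = """
{
  "@search.facets": {
    "Category": [
      {"count": 13, "value": "Budget"},
      {"count": 12, "value": "Luxury"},
      {"count": 9, "value": "Boutique"}
    ]
  },
  "value": []
}
"""

def facet_rows(body: str, field: str) -> list:
    """Turn one field's facet buckets into (label, count) pairs for a UI control."""
    buckets = json.loads(body)["@search.facets"][field]
    return [(b["value"], b["count"]) for b in buckets]

for label, count in facet_rows(raw, "Category"):
    print(f"{label} ({count})")
```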

Example (facet on numeric values): search=spa&facet=Rating

  • This query returns a facet on Rating for a text search on spa. Rating can be specified as a facet because the field is marked as retrievable, filterable, and facetable in the index, and the values it contains (numeric, 1 through 5) are suitable for categorizing listings into groups.

  • Only fields marked as facetable can be faceted. Only retrievable fields can be returned in the results.

  • The Rating field is a double-precision floating-point number, so grouping is by precise value. For more information on grouping by interval (for instance, "3 star ratings," "4 star ratings," and so on), see How to implement faceted navigation in Azure Cognitive Search.
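The difference between grouping by precise value and grouping by interval can be seen with a local emulation. On the service, an interval is requested with a facet expression such as facet=Rating,interval:1; the code below only mimics that idea in Python on hypothetical ratings, and the service's exact bucket boundaries may differ.

```python
import math
from collections import Counter

def precise_buckets(values):
    """Group by exact value, as a plain facet=Rating does."""
    return dict(sorted(Counter(values).items()))

def interval_buckets(values, interval=1.0):
    """Group by each interval's lower bound, similar in spirit to facet=Rating,interval:1."""
    return dict(sorted(Counter(math.floor(v / interval) * interval for v in values).items()))

ratings = [3.6, 3.8, 4.1, 4.1, 4.8, 2.5]  # hypothetical Rating values
print(precise_buckets(ratings))   # one bucket per distinct value
print(interval_buckets(ratings))  # one bucket per whole-star range
```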

Highlight search results

Hit highlighting refers to formatting applied to text that matches the keyword, when matches are found in a specific field. If your search term is buried deep in a description, you can add hit highlighting to make it easier to spot.

Example (highlighter): search=beach&highlight=Description

  • In this example, the formatted word beach is easier to spot in the Description field.
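Highlighted fragments are returned in a separate `@search.highlights` section rather than in the field itself, wrapped in `<em>` tags by default (the highlightPreTag and highlightPostTag parameters change the tags). A minimal sketch with a hypothetical response and a hypothetical `snippet` helper that prefers the highlighted fragment and falls back to the raw field.

```python
import json

# Hypothetical excerpt of a response to search=beach&highlight=Description.
raw = """
{
  "value": [
    {
      "HotelName": "Sample Hotel",
      "Description": "Steps from the beach, with ocean views.",
      "@search.highlights": {
        "Description": ["Steps from the <em>beach</em>, with ocean views."]
      }
    }
  ]
}
"""

def snippet(doc: dict, field: str) -> str:
    """Prefer highlighted fragments; fall back to the raw field value."""
    fragments = doc.get("@search.highlights", {}).get(field)
    return " ... ".join(fragments) if fragments else doc.get(field, "")

doc = json.loads(raw)["value"][0]
print(snippet(doc, "Description"))
```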

Example (linguistic analysis): search=beaches&highlight=Description

  • Full text search recognizes basic variations in word forms. In this case, a keyword search on "beaches" returns hotels whose searchable fields contain "beach", with that word highlighted. Different forms of the same word can appear in results because of linguistic analysis.

  • Azure Cognitive Search supports 56 analyzers from both Lucene and Microsoft. The default used by Azure Cognitive Search is the standard Lucene analyzer.
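To see linguistic analysis at work, you can call the Analyze Text REST API, which returns the tokens an analyzer produces for a piece of text. This sketch only builds the POST request; the service name and admin key are hypothetical placeholders. Comparing standard.lucene with a language analyzer such as en.lucene is a quick way to check whether a plural like "beaches" is reduced to a common form.

```python
import json
import urllib.request

SERVICE = "my-search-service"   # hypothetical service name
API_KEY = "<your-admin-key>"    # placeholder admin key

def analyze_request(index: str, text: str, analyzer: str) -> urllib.request.Request:
    """Build (but do not send) an Analyze Text request for one analyzer."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{index}/analyze"
           "?api-version=2020-06-30")
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json", "api-key": API_KEY},
    )

# Compare how two built-in analyzers would tokenize the same word.
for name in ("standard.lucene", "en.lucene"):
    req = analyze_request("hotels-sample-index", "beaches", name)
    print(name, req.full_url)
```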

By default, misspelled query terms, like seatle for "Seattle", fail to return matches in a typical search. The following example returns no results.

Example (misspelled term, unhandled): search=seatle

To handle misspellings, you can use fuzzy search. Fuzzy search is enabled when you use the full Lucene query syntax, which requires two things: set queryType=full on the query, and append ~ to the search term.

Example (misspelled term, handled): search=seatle~&queryType=full

This example now returns documents that include matches on "Seattle".
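Fuzzy search matches terms within a small edit distance of the query term. The sketch below computes an optimal-string-alignment variant of Damerau-Levenshtein distance (insertions, deletions, substitutions, and adjacent transpositions each cost 1) and keeps candidates within two edits, which is assumed here to be the default for a bare "~". It is a local illustration, not the service's implementation, which expands the term against the terms actually present in the index.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

MAX_EDITS = 2  # assumed default edit distance for a bare "~"

def fuzzy_matches(term: str, candidates: list, max_edits: int = MAX_EDITS) -> list:
    """Keep candidates within max_edits of term, ignoring case."""
    return [c for c in candidates if edit_distance(term.lower(), c.lower()) <= max_edits]

print(fuzzy_matches("seatle", ["Seattle", "Seaside", "Atlanta"]))
```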

When queryType is unspecified, the default simple query parser is used. The simple query parser is faster, but if you require fuzzy search, regular expressions, proximity search, or other advanced query types, you need the full syntax.

Fuzzy search and wildcard search have implications for search output: linguistic analysis is not performed on these query formats. Before using fuzzy and wildcard search, review How full text search works in Azure Cognitive Search and look for the section about exceptions to lexical analysis.

For more information about query scenarios enabled by the full query parser, see Lucene query syntax in Azure Cognitive Search.

Geospatial search is supported through the Edm.GeographyPoint data type on a field containing coordinates. Geosearch is a type of filter, specified in Filter OData syntax.

Example (geo-coordinate filters): search=*&$count=true&$filter=geo.distance(Location,geography'POINT(-122.12 47.67)') le 5

The example query filters all results for positional data, keeping those that are 5 kilometers or less from a given point (specified as longitude and latitude, in that order). By adding $count, you can see how many results are returned when you change either the distance or the coordinates.

Geospatial search is useful if your search application has a "find near me" feature or uses map navigation. It is not full text search, however. If users need to search for a city or country/region by name, add fields containing city or country/region names in addition to coordinates.
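Two details are easy to get wrong here: the point literal lists longitude before latitude, and the distance is in kilometers. The sketch below builds the same filter string and uses the haversine formula to estimate how far a hypothetical hotel lies from the reference point. The spherical-Earth radius of 6371 km is an approximation, so results near the 5 km boundary may differ slightly from the service's.

```python
import math

def geo_distance_filter(field: str, lon: float, lat: float, km: float) -> str:
    """Build a geo.distance filter; the POINT literal takes longitude first."""
    return f"geo.distance({field},geography'POINT({lon} {lat})') le {km}"

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))

print(geo_distance_filter("Location", -122.12, 47.67, 5))
# A hypothetical hotel 0.03 degrees of latitude north of the reference point.
print(round(haversine_km(-122.12, 47.67, -122.12, 47.70), 2))
```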

Takeaways

This tutorial provided a quick introduction to Azure Cognitive Search using the Azure portal.

You learned how to create a search index using the Import data wizard. You learned about indexers, as well as the basic workflow for index design, including supported modifications to a published index.

Using Search explorer in the Azure portal, you learned some basic query syntax through hands-on examples that demonstrated key capabilities such as filters, hit highlighting, fuzzy search, and geo-search.

You also learned how to find indexes, indexers, and data sources in the portal. Given any new data source in the future, you can use the portal to quickly check its definitions or field collections with minimal effort.

Clean up resources

When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually, or delete the resource group to delete the entire set of resources.

You can find and manage resources in the portal by using the All resources or Resource groups link in the left navigation pane.

If you are using a free service, remember that you are limited to three indexes, three indexers, and three data sources. You can delete individual items in the portal to stay under the limit.
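Individual items can also be removed with REST DELETE requests against the indexes, indexers, and datasources collections. This sketch only builds the requests; the service name, admin key, and object names are hypothetical placeholders (the names the wizard generated for you may differ), and `delete_request` is an illustrative helper.

```python
import urllib.request

SERVICE = "my-search-service"   # hypothetical service name
ADMIN_KEY = "<your-admin-key>"  # placeholder; deletes require an admin key

def delete_request(collection: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE for an index, indexer, or data source."""
    if collection not in ("indexes", "indexers", "datasources"):
        raise ValueError(f"unexpected collection: {collection}")
    url = (f"https://{SERVICE}.search.windows.net/{collection}/{name}"
           "?api-version=2020-06-30")
    return urllib.request.Request(url, method="DELETE", headers={"api-key": ADMIN_KEY})

for collection, name in [("indexers", "hotels-sample-indexer"),
                         ("indexes", "hotels-sample-index"),
                         ("datasources", "hotels-sample")]:
    print(delete_request(collection, name).full_url)
```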

Next steps

Use a portal wizard to generate a ready-to-use web app that runs in a browser. You can try this wizard on the small index you just created, or use one of the built-in sample datasets for a richer search experience.