What is Azure Cognitive Search?

Azure Cognitive Search (formerly known as "Azure Search") is a cloud search service that gives developers APIs and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.

When you create a Cognitive Search service, you get:

  • A search engine that performs indexing and query execution
  • AI-centered analysis and transformation of images and undifferentiated text during indexing
  • Persistent storage of the search indexes that you create and manage
  • A query language for composing simple to complex queries

Architecturally, a search service sits between the external data stores that contain your un-indexed data and a client app that sends query requests to a search index and handles the response.

[Figure: Azure Cognitive Search architecture]

Outwardly, a search service integrates with other Azure services through indexers, which automate data ingestion and retrieval from Azure data sources, and through skillsets, which incorporate consumable AI from Cognitive Services (such as image and text analysis) or custom AI that you create in Azure Machine Learning or wrap inside Azure Functions.

On the search service itself, the two primary workloads are indexing and querying.

  • Indexing brings text into your search service and makes it searchable. Internally, inbound text is processed into tokens and stored in inverted indexes for fast scans.

    Within indexing, you have the option of adding AI enrichment through cognitive skills, either predefined ones from Microsoft or custom skills that you create. The subsequent analysis and transformations can produce new information and structures that did not previously exist, providing high utility for many search and knowledge mining scenarios.

  • Querying begins once an index is populated with searchable data: your client app sends query requests to the search service and handles the responses. All query execution is over a search index that you create, own, and store in your service. In your client app, the search experience is defined using APIs from Azure Cognitive Search, and can include relevance tuning, autocomplete, synonym matching, fuzzy matching, pattern matching, filtering, and sorting.
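To make the phrase "processed into tokens and stored in inverted indexes" more concrete, here is a toy sketch in Python. It is not how the service is implemented; the tokenizer, the function names, and the sample documents are all assumptions made for illustration. It only shows the general idea: each token maps to the set of documents that contain it, so a query becomes a lookup instead of a scan over every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample content, keyed by document id.
docs = {
    "1": "Luxury hotel with ocean view",
    "2": "Budget hotel near the airport",
    "3": "Ocean-front cottage",
}
index = build_inverted_index(docs)
print(sorted(search(index, "hotel")))
print(sorted(search(index, "ocean hotel")))
```

A real search engine adds much more on top of this (language analyzers, scoring, term positions), but the token-to-documents mapping is the core structure the text refers to.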

Functionality is exposed through a simple REST API or .NET SDK that masks the inherent complexity of information retrieval. You can also use the Azure portal for service administration and content management, with tools for prototyping and querying your indexes and skillsets. Because the service runs in the cloud, infrastructure and availability are managed by Microsoft.
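As a rough illustration of what a query over the REST API can look like, the following Python sketch builds (but does not send) a search request. The service name, index name, key, field names, and the API version string are placeholders or assumptions, not values taken from this article; check the REST API reference for the version your service supports.

```python
import json
import urllib.request


def build_search_request(service, index, api_key, search_text,
                         filter_expr=None, order_by=None, top=10,
                         api_version="2020-06-30"):
    """Build a POST request for the Search Documents operation.

    The request is only constructed here; call urllib.request.urlopen(req)
    against a real service to execute it.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={api_version}"
    )
    body = {"search": search_text, "top": top}
    if filter_expr:
        body["filter"] = filter_expr    # OData filter expression
    if order_by:
        body["orderby"] = order_by      # for example "rating desc"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical service, index, and field names.
req = build_search_request(
    "my-service", "hotels", "<query-key>", "ocean view",
    filter_expr="rating ge 4", order_by="rating desc", top=5,
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable without a service and makes the URL and JSON body easy to inspect.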

Azure Cognitive Search is well suited for the following application scenarios:

  • You need to consolidate heterogeneous content into a private, user-defined search index. You can populate a search index with streams of JSON documents from any source. For supported sources on Azure, use an indexer to automate indexing. Control over the index schema and refresh schedule is a key reason for using Cognitive Search.

  • You want search-related features that are easy to implement. Search APIs simplify query construction, faceted navigation, filters (including geo-spatial search), synonym mapping, autocomplete, and relevance tuning. Using built-in features, you can satisfy end-user expectations for a search experience similar to commercial web search engines.

  • Your raw content consists of large undifferentiated text files, image files, or application files stored in Azure Blob storage or Cosmos DB. You can apply cognitive skills during indexing to identify and extract text, create structure, or create new information such as translated text or entities.

  • Your content needs linguistic or custom text analysis. If you have non-English content, Azure Cognitive Search supports both Lucene analyzers and Microsoft's natural language processors. You can also configure analyzers to achieve specialized processing of raw content, such as filtering out diacritics, or recognizing and preserving patterns in strings.
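To show what "filtering out diacritics" means in practice, here is a small standalone Python sketch. In the service this kind of processing is configured on an analyzer rather than written by hand; the function name and sample strings below are illustrative assumptions and do not represent any service API.

```python
import unicodedata


def fold_diacritics(text):
    """Remove combining marks so that accented letters match their base form."""
    # NFD splits an accented character into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


print(fold_diacritics("Crème brûlée"))
print(fold_diacritics("São Paulo"))
```

Applying the same folding at both indexing time and query time is what lets a query typed without accents match content written with them.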

For more information about specific functionality, see Features of Azure Cognitive Search.

How to get started

You can explore the core search features end to end in four steps:

  1. Create a search service at the Free tier, which is shared with other subscribers, or at a paid tier for dedicated resources used only by your service. All quickstarts and tutorials can be completed on the free service.

  2. Create a search index using the portal, REST API, .NET SDK, or another SDK. The index schema defines the structure of searchable content.

  3. Upload content to the index. Use the "push" model to push JSON documents from any source, or use the "pull" model (indexers) if your source data is on Azure.

  4. Query an index using Search explorer in the portal, REST API, .NET SDK, or another SDK.
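Steps 2 and 3 can be sketched as the JSON payloads a client might prepare for the REST API: an index definition, and a "push" batch of documents. This is a minimal illustration under stated assumptions. The index name, field names, and analyzer choice are hypothetical, and the payload shapes reflect the REST API as generally documented, so verify them against the reference for your API version before relying on them.

```python
import json


def build_index_definition(name):
    """Return a small index schema: one key field plus searchable content."""
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "title", "type": "Edm.String", "searchable": True},
            {"name": "description", "type": "Edm.String",
             "searchable": True, "analyzer": "en.lucene"},
            {"name": "rating", "type": "Edm.Int32",
             "filterable": True, "sortable": True},
        ],
    }


def build_upload_batch(documents):
    """Wrap documents in a push-model batch, marking each one as an upload."""
    return {"value": [{"@search.action": "upload", **doc} for doc in documents]}


# Hypothetical index and documents that conform to the schema above.
index_definition = build_index_definition("hotels")
batch = build_upload_batch([
    {"id": "1", "title": "Seaside Inn", "description": "Ocean view rooms", "rating": 4},
    {"id": "2", "title": "Airport Lodge", "description": "Close to terminals", "rating": 3},
])

# The index definition would be sent to the service's /indexes endpoint and the
# batch to /indexes/hotels/docs/index; here they are only printed.
print(json.dumps(index_definition, indent=2))
print(json.dumps(batch, indent=2))
```

Every document in the batch has to match the field names and types declared in the schema, which is why the schema is created first.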


To consolidate these steps, start with the Import data wizard and an Azure data source to create, load, and query an index in minutes.

How it compares

Customers often ask how Azure Cognitive Search compares with other search-related solutions. The following table summarizes key differences.

Compared to: Key differences

Microsoft Search: Microsoft Search is for Microsoft 365 authenticated users who need to query over content in SharePoint. It's offered as a ready-to-use search experience, enabled and configured by administrators, with the ability to accept external content through connectors from Microsoft and other sources. If this describes your scenario, then Microsoft Search with Microsoft 365 is an attractive option to explore.

In contrast, Azure Cognitive Search executes queries over an index that you define, populated with data and documents you own, often from diverse sources. Azure Cognitive Search has crawler capabilities for some Azure data sources through indexers, but you can also push any JSON document that conforms to your index schema into a single, consolidated searchable resource. You can also customize the indexing pipeline to include machine learning and lexical analyzers. Because Cognitive Search is built to be a plug-in component in larger solutions, you can integrate search into almost any app, on any platform.

Database search: Many database platforms include a built-in search experience. SQL Server has full text search. Cosmos DB and similar technologies have queryable indexes. When evaluating products that combine search and storage, it can be challenging to determine which way to go. Many solutions use both: a DBMS for storage, and Azure Cognitive Search for specialized search features.

Compared to DBMS search, Azure Cognitive Search stores content from heterogeneous sources and offers specialized text processing features such as language-aware text processing (stemming, lemmatization, word forms) in 56 languages. It also supports autocorrection of misspelled words, synonyms, suggestions, scoring controls, facets, and custom tokenization. The full text search engine in Azure Cognitive Search is built on Apache Lucene, an industry standard in information retrieval. However, while Azure Cognitive Search persists data in the form of an inverted index, it is not a replacement for true data storage and we don't recommend using it in that capacity. For more information, see this forum post.

Resource utilization is another deciding factor in this category. Indexing and some query operations are often computationally intensive. Offloading search from the DBMS to a dedicated solution in the cloud preserves system resources for transaction processing. Furthermore, by externalizing search, you can easily adjust scale to match query volume.

Dedicated search solution: Assuming you have decided on dedicated search with full spectrum functionality, a final categorical comparison is between on-premises solutions and a cloud service. Many search technologies offer controls over indexing and query pipelines, access to richer query and filtering syntax, control over rank and relevance, and features for self-directed and intelligent search.

A cloud service is the right choice if you want a turn-key solution with minimal overhead and maintenance, and adjustable scale.

Within the cloud paradigm, several providers offer comparable baseline features, with full-text search, geo-search, and the ability to handle a certain level of ambiguity in search inputs. Typically, it's a specialized feature, or the ease and overall simplicity of APIs, tools, and management, that determines the best fit.

Among cloud providers, Azure Cognitive Search is strongest for full text search workloads over content stores and databases on Azure, for apps that rely primarily on search for both information retrieval and content navigation.

Key strengths include:

  • Azure data integration (crawlers) at the indexing layer
  • Azure portal for central management
  • Azure scale, reliability, and world-class availability
  • AI processing of raw data to make it more searchable, including extracting text from images and finding patterns in unstructured content
  • Linguistic and custom analysis, with analyzers for solid full text search in 56 languages
  • Core features common to search-centric apps: scoring, faceting, suggestions, synonyms, geo-search, and more

Among our customers, those able to leverage the widest range of features in Azure Cognitive Search include online catalogs, line-of-business programs, and document discovery applications.