Quickstart: Create an Azure Cognitive Search index in Python using Jupyter notebooks

Build a Jupyter notebook that creates, loads, and queries an Azure Cognitive Search index using Python and the Azure Cognitive Search REST APIs. This article explains how to build the notebook step by step. Alternatively, you can download and run a finished Jupyter Python notebook.

If you don't have an Azure subscription, create a trial account before you begin.

Prerequisites

The following services and tools are required for this quickstart.

Get a key and URL

REST calls require the service URL and an access key on every request. A search service is created with both, so if you added Azure Cognitive Search to your subscription, follow these steps to get the necessary information:

  1. Sign in to the Azure portal, and in your search service Overview page, get the URL. An example endpoint might look like https://mydemo.search.azure.cn.

  2. In Settings > Keys, get an admin key for full rights on the service. There are two interchangeable admin keys, provided for business continuity in case you need to roll one over. You can use either the primary or secondary key on requests for adding, modifying, and deleting objects.

Get an HTTP endpoint and access key

Every request sent to your service requires an api-key. A valid key establishes trust, on a per-request basis, between the application sending the request and the service that handles it.

In this task, start a Jupyter notebook and verify that you can connect to Azure Cognitive Search by requesting a list of indexes from your service. On Windows with Anaconda3, you can use Anaconda Navigator to launch a notebook.

  1. Create a new Python3 notebook.

  2. In the first cell, load the libraries used for working with JSON and formulating HTTP requests.

    import json
    import requests
    from pprint import pprint
    
  3. In the second cell, input the request elements that will be constants on every request. Replace the search service name (YOUR-SEARCH-SERVICE-NAME) and admin API key (YOUR-ADMIN-API-KEY) with valid values.

    endpoint = 'https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/'
    api_version = '?api-version=2019-05-06'
    headers = {'Content-Type': 'application/json',
            'api-key': '<YOUR-ADMIN-API-KEY>' }
    

    If you get ConnectionError "Failed to establish a new connection", verify that the api-key is a primary or secondary admin key, and that all leading and trailing characters (? and /) are in place.

  4. In the third cell, formulate the request. This GET request targets the indexes collection of your search service and selects the name property of existing indexes.

    url = endpoint + "indexes" + api_version + "&$select=name"
    response  = requests.get(url, headers=headers)
    index_list = response.json()
    pprint(index_list)
    
  5. Run each step. If indexes exist, the response contains a list of index names. In the screenshot below, the service already has an azureblob-index and a realestate-us-sample index.

    [Screenshot: Python script in Jupyter notebook with HTTP requests to Azure Cognitive Search]

    In contrast, an empty index collection returns this response: {'@odata.context': 'https://mydemo.search.azure.cn/$metadata#indexes(name)', 'value': []}
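
The ConnectionError note in step 3 comes down to how the URL pieces are concatenated. As a minimal sketch (the helper name `build_url` is hypothetical and not part of the quickstart), the separators could be normalized in one place so that a missing or doubled `/` or `?` does not matter:

```python
def build_url(endpoint, path, api_version, extra_query=""):
    """Join the request parts, normalizing the separators between them."""
    base = endpoint.rstrip("/")          # drop any trailing slash on the service URL
    resource = path.strip("/")           # drop stray slashes around the resource path
    version = api_version.lstrip("?&")   # accept '?api-version=...' or 'api-version=...'
    query = extra_query.lstrip("?&")     # accept '&$select=name' or '$select=name'
    url = f"{base}/{resource}?{version}"
    if query:
        url += "&" + query
    return url


# Same placeholder service name as in the cells above.
url = build_url("https://<YOUR-SEARCH-SERVICE-NAME>.search.azure.cn/",
                "indexes", "?api-version=2019-05-06", "&$select=name")
print(url)
```

The result can be passed to `requests.get(url, headers=headers)` exactly as in step 4.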

1 - Create an index

Unless you are using the portal, an index must exist on the service before you can load data. This step uses the Create Index REST API to push an index schema to the service.

Required elements of an index include a name, a fields collection, and a key. The fields collection defines the structure of a document. Each field has a name, type, and attributes that determine how the field is used (for example, whether it is full-text searchable, filterable, or retrievable in search results). Within an index, one of the fields of type Edm.String must be designated as the key for document identity.
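
These requirements can be checked locally before a schema is sent to the service. The following is only a sketch: `check_index_schema` and `is_true` are hypothetical helpers, the sample schema is a cut-down illustration, and the service performs its own, more thorough validation.

```python
def is_true(value):
    # The schema in this quickstart writes attribute values as the strings
    # "true"/"false"; accept those as well as Python booleans.
    return value is True or (isinstance(value, str) and value.lower() == "true")


def check_index_schema(schema):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    if not schema.get("name"):
        problems.append("missing index name")
    fields = schema.get("fields") or []
    if not fields:
        problems.append("missing fields collection")
    key_fields = [f for f in fields if is_true(f.get("key", False))]
    if len(key_fields) != 1:
        problems.append(f"expected exactly one key field, found {len(key_fields)}")
    for f in key_fields:
        if f.get("type") != "Edm.String":
            problems.append(f"key field {f.get('name')} must be of type Edm.String")
    return problems


# Cut-down sample for illustration only.
sample = {
    "name": "hotels-quickstart",
    "fields": [
        {"name": "HotelId", "type": "Edm.String", "key": "true", "filterable": "true"},
        {"name": "Rating", "type": "Edm.Double", "filterable": "true"},
    ],
}
print(check_index_schema(sample))
```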

This index is named "hotels-quickstart" and has the field definitions you see below. It's a subset of a larger Hotels index used in other walkthroughs. We trimmed it in this quickstart for brevity.

  1. In the next cell, paste the following example to provide the schema.

    index_schema = {
       "name": "hotels-quickstart",  
       "fields": [
         {"name": "HotelId", "type": "Edm.String", "key": "true", "filterable": "true"},
         {"name": "HotelName", "type": "Edm.String", "searchable": "true", "filterable": "false", "sortable": "true", "facetable": "false"},
         {"name": "Description", "type": "Edm.String", "searchable": "true", "filterable": "false", "sortable": "false", "facetable": "false", "analyzer": "en.lucene"},
         {"name": "Description_fr", "type": "Edm.String", "searchable": "true", "filterable": "false", "sortable": "false", "facetable": "false", "analyzer": "fr.lucene"},
         {"name": "Category", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "Tags", "type": "Collection(Edm.String)", "searchable": "true", "filterable": "true", "sortable": "false", "facetable": "true"},
         {"name": "ParkingIncluded", "type": "Edm.Boolean", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "LastRenovationDate", "type": "Edm.DateTimeOffset", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "Rating", "type": "Edm.Double", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "Address", "type": "Edm.ComplexType", 
         "fields": [
         {"name": "StreetAddress", "type": "Edm.String", "filterable": "false", "sortable": "false", "facetable": "false", "searchable": "true"},
         {"name": "City", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "StateProvince", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "PostalCode", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "true"},
         {"name": "Country", "type": "Edm.String", "searchable": "true", "filterable": "true", "sortable": "true", "facetable": "true"}
        ]
       }
      ]
    }
    
  2. In another cell, formulate the request. This POST request targets the indexes collection of your search service and creates an index based on the index schema you provided in the previous cell.

    url = endpoint + "indexes" + api_version
    response  = requests.post(url, headers=headers, json=index_schema)
    index = response.json()
    pprint(index)
    
  3. Run each step.

    The response includes the JSON representation of the schema. The following screenshot shows just a portion of the response.

    [Screenshot: Request to create an index]

Tip

Another way to verify index creation is to check the Indexes list in the portal.
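
Because the response carries the schema back as JSON, a quick check inside the notebook is also possible. The sketch below flattens a fields collection into paths such as `Address/City`; `field_paths` is a hypothetical helper, and the sample dictionary only stands in for the `index` value returned in step 2.

```python
def field_paths(fields, prefix=""):
    """Flatten a fields collection into paths such as 'Address/City'."""
    paths = []
    for field in fields:
        path = prefix + field["name"]
        paths.append(path)
        # Complex fields carry their own nested "fields" collection.
        paths.extend(field_paths(field.get("fields") or [], prefix=path + "/"))
    return paths


# Stand-in for the schema returned by the service (illustrative, not real output).
returned = {
    "name": "hotels-quickstart",
    "fields": [
        {"name": "HotelId", "type": "Edm.String"},
        {"name": "Address", "type": "Edm.ComplexType",
         "fields": [{"name": "City", "type": "Edm.String"}]},
    ],
}
print(field_paths(returned["fields"]))
```

In the notebook, `field_paths(index["fields"])` would list the fields the service actually created, assuming the request succeeded.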

2 - Load documents

To push documents, use an HTTP POST request to your index's URL endpoint. The REST API is Add, Update, or Delete Documents. Documents originate from HotelsData on GitHub.
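
Each document in the request body carries an `@search.action` property. As a sketch of how a batch could be assembled from plain dictionaries (the helper `make_batch` is hypothetical; the action names are the ones the Add, Update, or Delete Documents API is understood to accept):

```python
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def make_batch(docs, action="upload"):
    """Wrap plain documents in the {"value": [...]} envelope with an action on each."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # A document that already sets "@search.action" keeps its own value,
    # because its keys are merged in after the default.
    return {"value": [{"@search.action": action, **doc} for doc in docs]}


batch = make_batch([{"HotelId": "1", "HotelName": "Secret Point Motel"}])
print(batch)
```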

  1. In a new cell, provide four documents that conform to the index schema. Specify an upload action for each document.

    documents = {
        "value": [
        {
        "@search.action": "upload",
        "HotelId": "1",
        "HotelName": "Secret Point Motel",
        "Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Time's Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
        "Description_fr": "L'hôtel est idéalement situé sur la principale artère commerciale de la ville en plein cœur de New York. A quelques minutes se trouve la place du temps et le centre historique de la ville, ainsi que d'autres lieux d'intérêt qui font de New York l'une des villes les plus attractives et cosmopolites de l'Amérique.",
        "Category": "Boutique",
        "Tags": [ "pool", "air conditioning", "concierge" ],
        "ParkingIncluded": "false",
        "LastRenovationDate": "1970-01-18T00:00:00Z",
        "Rating": 3.60,
        "Address": {
            "StreetAddress": "677 5th Ave",
            "City": "New York",
            "StateProvince": "NY",
            "PostalCode": "10022",
            "Country": "USA"
            }
        },
        {
        "@search.action": "upload",
        "HotelId": "2",
        "HotelName": "Twin Dome Motel",
        "Description": "The hotel is situated in a  nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.",
        "Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
        "Category": "Boutique",
        "Tags": [ "pool", "free wifi", "concierge" ],
        "ParkingIncluded": "false",
        "LastRenovationDate": "1979-02-18T00:00:00Z",
        "Rating": 3.60,
        "Address": {
            "StreetAddress": "140 University Town Center Dr",
            "City": "Sarasota",
            "StateProvince": "FL",
            "PostalCode": "34243",
            "Country": "USA"
            }
        },
        {
        "@search.action": "upload",
        "HotelId": "3",
        "HotelName": "Triple Landscape Hotel",
        "Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services.",
        "Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
        "Category": "Resort and Spa",
        "Tags": [ "air conditioning", "bar", "continental breakfast" ],
        "ParkingIncluded": "true",
        "LastRenovationDate": "2015-09-20T00:00:00Z",
        "Rating": 4.80,
        "Address": {
            "StreetAddress": "3393 Peachtree Rd",
            "City": "Atlanta",
            "StateProvince": "GA",
            "PostalCode": "30326",
            "Country": "USA"
            }
        },
        {
        "@search.action": "upload",
        "HotelId": "4",
        "HotelName": "Sublime Cliff Hotel",
        "Description": "Sublime Cliff Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Cliff is part of a lovingly restored 1800 palace.",
        "Description_fr": "Le sublime Cliff Hotel est situé au coeur du centre historique de sublime dans un quartier extrêmement animé et vivant, à courte distance de marche des sites et monuments de la ville et est entouré par l'extraordinaire beauté des églises, des bâtiments, des commerces et Monuments. Sublime Cliff fait partie d'un Palace 1800 restauré avec amour.",
        "Category": "Boutique",
        "Tags": [ "concierge", "view", "24-hour front desk service" ],
        "ParkingIncluded": "true",
        "LastRenovationDate": "1960-02-06T00:00:00Z",
        "Rating": 4.60,
        "Address": {
            "StreetAddress": "7400 San Pedro Ave",
            "City": "San Antonio",
            "StateProvince": "TX",
            "PostalCode": "78216",
            "Country": "USA"
            }
        }
    ]
    }
    
  2. In another cell, formulate the request. This POST request targets the docs collection of the hotels-quickstart index and pushes the documents provided in the previous step.

    url = endpoint + "indexes/hotels-quickstart/docs/index" + api_version
    response  = requests.post(url, headers=headers, json=documents)
    index_content = response.json()
    pprint(index_content)
    
  3. Run each step to push the documents to an index in your search service. Results should look similar to the following example.

    [Screenshot: Send documents to an index]
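
The indexing response can also be inspected in code rather than by eye. This sketch assumes the response lists one entry per document with `key`, `status`, `errorMessage`, and `statusCode` properties; the sample below is made up for illustration and is not output from a real service, and `failed_documents` is a hypothetical helper.

```python
def failed_documents(index_content):
    """Return (key, statusCode, errorMessage) for every document that was not indexed."""
    failures = []
    for item in index_content.get("value", []):
        if not item.get("status"):
            failures.append((item.get("key"), item.get("statusCode"), item.get("errorMessage")))
    return failures


# Illustrative sample only; the error text is invented.
sample_result = {
    "value": [
        {"key": "1", "status": True, "errorMessage": None, "statusCode": 201},
        {"key": "2", "status": False, "errorMessage": "example failure", "statusCode": 400},
    ]
}
print(failed_documents(sample_result))
```

In the notebook, `failed_documents(index_content)` would be expected to return an empty list when all four documents were accepted.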

3 - Search an index

This step shows you how to query an index using the Search Documents REST API.
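
The cells in this section write each query string by hand. As an alternative sketch, the fragment can be assembled from named options with the standard library, which also percent-encodes spaces in search terms and filters. The helper `build_search_query` is hypothetical; the `$`-prefixed names it produces are the query parameters used in this section.

```python
from urllib.parse import urlencode, quote


def build_search_query(search="*", **options):
    """Return a query-string fragment such as '&search=...&$count=true'."""
    params = {"search": search}
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"   # the query string uses lowercase true/false
        params["$" + name] = value                 # count -> $count, select -> $select, ...
    # quote() encodes spaces as %20; '$', ',', '/' and '*' are left readable.
    return "&" + urlencode(params, quote_via=quote, safe="$,/*")


print(build_search_query("hotels wifi", count=True, select="HotelId,HotelName"))
print(build_search_query(filter="Rating gt 4", select="HotelId,HotelName,Rating"))
```

The returned fragment starts with `&`, so it can be appended after `api_version` in the same way as the `searchstring` variable.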

  1. In a cell, provide a query expression that executes an empty search (search=*), returning an unranked list (search score = 1.0) of arbitrary documents. By default, Azure Cognitive Search returns 50 matches at a time. As structured, this query returns an entire document structure and values. Add $count=true to get a count of all documents in the results.

    searchstring = '&search=*&$count=true'
    
    url = endpoint + "indexes/hotels-quickstart/docs" + api_version + searchstring
    response = requests.get(url, headers=headers)  # the query travels in the URL, so no request body is needed
    query = response.json()
    pprint(query)
    
  2. In a new cell, provide the following example to search on the terms "hotels" and "wifi". Add $select to specify which fields to include in the search results.

    searchstring = '&search=hotels wifi&$count=true&$select=HotelId,HotelName'
    
    url = endpoint + "indexes/hotels-quickstart/docs" + api_version + searchstring
    response = requests.get(url, headers=headers)  # the query travels in the URL, so no request body is needed
    query = response.json()
    pprint(query)   
    

    Results should look similar to the following output.

    [Screenshot: Search an index]

  3. Next, apply a $filter expression that selects only those hotels with a rating greater than 4.

    searchstring = '&search=*&$filter=Rating gt 4&$select=HotelId,HotelName,Description,Rating'
    
    url = endpoint + "indexes/hotels-quickstart/docs" + api_version + searchstring
    response = requests.get(url, headers=headers)  # the query travels in the URL, so no request body is needed
    query = response.json()
    pprint(query)     
    
  4. By default, the search engine returns the top 50 documents, but you can use top and skip to add pagination and choose how many documents appear in each result. This query returns two documents in each result set.

    searchstring = '&search=boutique&$top=2&$select=HotelId,HotelName,Description'
    
    url = endpoint + "indexes/hotels-quickstart/docs" + api_version + searchstring
    response = requests.get(url, headers=headers)  # the query travels in the URL, so no request body is needed
    query = response.json()
    pprint(query)
    
  5. In this last example, use $orderby to sort results by city. This example includes fields from the Address complex type.

    searchstring = '&search=pool&$orderby=Address/City&$select=HotelId, HotelName, Address/City, Address/StateProvince'
    
    url = endpoint + "indexes/hotels-quickstart/docs" + api_version + searchstring
    response = requests.get(url, headers=headers)  # the query travels in the URL, so no request body is needed
    query = response.json()
    pprint(query)
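
Step 4 mentions skip but only demonstrates top. As a sketch of how the two combine, the generator below produces one query-string fragment per page. `page_queries` is a hypothetical helper, and `total` is an assumed document count, for example one read from the count returned by an earlier query that used $count=true.

```python
def page_queries(search, page_size, total, select=None):
    """Yield one query-string fragment per page, using $top and $skip."""
    for skip in range(0, total, page_size):
        fragment = f"&search={search}&$top={page_size}&$skip={skip}"
        if select:
            fragment += f"&$select={select}"
        yield fragment


# total=3 is an assumed count used only for illustration.
for fragment in page_queries("boutique", page_size=2, total=3, select="HotelId,HotelName"):
    print(fragment)
```

Each fragment can be appended after `api_version` and sent with `requests.get`, in the same way as the queries above.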
    

Clean up

When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.

You can find and manage resources in the portal, using the All resources or Resource groups link in the left-navigation pane.

If you are using a free service, remember that you are limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
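
The portal is the route described here. As an alternative sketch, the quickstart index could also be removed from the notebook itself. This assumes the Delete Index REST operation (an HTTP DELETE on the index URL) and that success is reported as 204 No Content; `delete_index` and its `http_delete` parameter are hypothetical names, with the HTTP call passed in so the helper does not depend on a particular library.

```python
def delete_index(http_delete, endpoint, index_name, api_version, headers):
    """Send DELETE for one index and report whether the service accepted it."""
    url = endpoint.rstrip("/") + "/indexes/" + index_name + api_version
    response = http_delete(url, headers=headers)
    # Assumption: a successful delete is reported as 204 No Content.
    return response.status_code == 204


# In the notebook, with the variables defined earlier, the call might look like:
# delete_index(requests.delete, endpoint, "hotels-quickstart", api_version, headers)
```

Deleting an index removes its documents as well, so run this only when the index is no longer needed.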

Next steps

As a simplification, this quickstart uses an abbreviated version of the Hotels index. You can create the full version to try out more interesting queries. To get the full version and all 50 documents, run the Import data wizard, selecting hotels-sample from the built-in sample data sources.