Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service

Use this quickstart to begin analyzing language with the Text Analytics REST API and Python. This article shows you how to detect language, analyze sentiment, extract key phrases, and identify linked entities.

For technical documentation for the APIs, refer to the API definitions.

Prerequisites

You must have a Cognitive Services API subscription with access to the Text Analytics API. If you don't have a subscription, you can create a 1 RMB trial account. Before continuing, you will need the Text Analytics subscription key provided after activating your account.

Create a new Python application

Create a new Python application in your favorite editor or IDE. Add the following imports to your file.

import requests
# pprint is used to format the JSON response
from pprint import pprint

Create variables for your subscription key and the endpoint for the Text Analytics REST API. Verify that the region in the endpoint corresponds to the one you used when you signed up (for example, chinaeast2).

subscription_key = "<ADD YOUR KEY HERE>"
text_analytics_base_url = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/"

The following sections describe how to call each of the API's features.

Detect languages

Append languages to the Text Analytics base endpoint to form the language detection URL. For example: https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/languages

language_api_url = text_analytics_base_url + "languages"

The payload to the API consists of a list of documents, each of which contains an id and a text attribute. The text attribute stores the text to be analyzed, and the id can be any value.

documents = { "documents": [
    { "id": "1", "text": "This is a document written in English." },
    { "id": "2", "text": "Este es un document escrito en Español." },
    { "id": "3", "text": "这是一个用中文写的文件" }
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(language_api_url, headers=headers, json=documents)
languages = response.json()
pprint(languages)

Output

{
"documents":[
    {
        "detectedLanguages":[
        {
            "iso6391Name":"en",
            "name":"English",
            "score":1.0
        }
        ],
        "id":"1"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"es",
            "name":"Spanish",
            "score":1.0
        }
        ],
        "id":"2"
    },
    {
        "detectedLanguages":[
        {
            "iso6391Name":"zh_chs",
            "name":"Chinese_Simplified",
            "score":1.0
        }
        ],
        "id":"3"
    }
],
"errors":[]
}
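
The response is plain JSON, so you can work with it as an ordinary Python dictionary. The following is a minimal sketch, based on the response shape shown above, that prints the highest-scoring detected language for each document:

# A minimal sketch: walk the parsed response and print the top detected
# language for each document. Field names follow the sample output above.
for document in languages["documents"]:
    best = max(document["detectedLanguages"], key=lambda lang: lang["score"])
    print(f"Document {document['id']}: {best['name']} ({best['iso6391Name']}), score {best['score']:.2f}")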

Analyze sentiment

To detect the sentiment (ranging from negative to positive) of a set of documents, append sentiment to the Text Analytics base endpoint to form the sentiment analysis URL. For example: https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/sentiment

sentiment_url = text_analytics_base_url + "sentiment"

As with the language detection example, create a dictionary with a documents key that consists of a list of documents. Each document contains an id, the text to be analyzed, and the language of the text.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(sentiment_url, headers=headers, json=documents)
sentiments = response.json()
pprint(sentiments)

Output

The sentiment score for a document is between 0.0 and 1.0, with a higher score indicating a more positive sentiment.

{
  "documents":[
    {
      "id":"1",
      "score":0.9708490371704102
    },
    {
      "id":"2",
      "score":0.0019068121910095215
    },
    {
      "id":"3",
      "score":0.7456425428390503
    },
    {
      "id":"4",
      "score":0.334433376789093
    }
  ],
  "errors":[

  ]
}
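
Because scores near 1.0 are positive and scores near 0.0 are negative, you can map each score to a label. Here is a minimal sketch that uses a 0.5 cutoff; the cutoff is an assumption made for this example, not something the API prescribes:

# A minimal sketch: label each document using a 0.5 cutoff (an assumption
# for this example; the API itself only returns the raw score).
for document in sentiments["documents"]:
    label = "positive" if document["score"] >= 0.5 else "negative"
    print(f"Document {document['id']}: {label} (score {document['score']:.3f})")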

Extract key phrases

To extract the key phrases from a set of documents, append keyPhrases to the Text Analytics base endpoint to form the key phrase extraction URL. For example: https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/keyPhrases

keyphrase_url = text_analytics_base_url + "keyPhrases"

This collection of documents is the same one used for the sentiment analysis example.

documents = {"documents" : [
  {"id": "1", "language": "en", "text": "I had a wonderful experience! The rooms were wonderful and the staff was helpful."},
  {"id": "2", "language": "en", "text": "I had a terrible time at the hotel. The staff was rude and the food was awful."},  
  {"id": "3", "language": "es", "text": "Los caminos que llevan hasta Monte Rainier son espectaculares y hermosos."},  
  {"id": "4", "language": "es", "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(keyphrase_url, headers=headers, json=documents)
key_phrases = response.json()
pprint(key_phrases)

Output

{
  "documents":[
    {
      "keyPhrases":[
        "wonderful experience",
        "staff",
        "rooms"
      ],
      "id":"1"
    },
    {
      "keyPhrases":[
        "food",
        "terrible time",
        "hotel",
        "staff"
      ],
      "id":"2"
    },
    {
      "keyPhrases":[
        "Monte Rainier",
        "caminos"
      ],
      "id":"3"
    },
    {
      "keyPhrases":[
        "carretera",
        "tráfico",
        "día"
      ],
      "id":"4"
    }
  ],
  "errors":[

  ]
}
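
To work with the phrases themselves rather than the raw JSON, you can collect them per document. A minimal sketch, based on the response shape shown above:

# A minimal sketch: print the key phrases returned for each document.
for document in key_phrases["documents"]:
    print(f"Document {document['id']}: {', '.join(document['keyPhrases'])}")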

Identify entities

To identify well-known entities (people, places, and things) in text documents, append entities to the Text Analytics base endpoint to form the entity recognition URL. For example: https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/entities

entities_url = text_analytics_base_url + "entities"

Create a collection of documents, as in the previous examples.

documents = {"documents" : [
  {"id": "1", "text": "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(entities_url, headers=headers, json=documents)
entities  = response.json()
pprint(entities)

Output

{'documents': [{'id': '1',
   'entities': [{'name': 'Microsoft',
     'matches': [{'wikipediaScore': 0.502357972145024,
       'entityTypeScore': 1.0,
       'text': 'Microsoft',
       'offset': 0,
       'length': 9}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Microsoft',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Microsoft',
     'bingId': 'a093e9b9-90f5-a3d5-c4b8-5855e1b01f85',
     'type': 'Organization'},
    {'name': 'Bill Gates',
     'matches': [{'wikipediaScore': 0.5849375085784292,
       'entityTypeScore': 0.999847412109375,
       'text': 'Bill Gates',
       'offset': 25,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Bill Gates',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Bill_Gates',
     'bingId': '0d47c987-0042-5576-15e8-97af601614fa',
     'type': 'Person'},
    {'name': 'Paul Allen',
     'matches': [{'wikipediaScore': 0.5314163053043621,
       'entityTypeScore': 0.9988409876823425,
       'text': 'Paul Allen',
       'offset': 40,
       'length': 10}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Paul Allen',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Paul_Allen',
     'bingId': 'df2c4376-9923-6a54-893f-2ee5a5badbc7',
     'type': 'Person'},
    {'name': 'April 4',
     'matches': [{'wikipediaScore': 0.37312706493069636,
       'entityTypeScore': 0.8,
       'text': 'April 4',
       'offset': 54,
       'length': 7}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'April 4',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/April_4',
     'bingId': '52535f87-235e-b513-54fe-c03e4233ac6e',
     'type': 'Other'},
    {'name': 'April 4, 1975',
     'matches': [{'entityTypeScore': 0.8,
       'text': 'April 4, 1975',
       'offset': 54,
       'length': 13}],
     'type': 'DateTime',
     'subType': 'Date'},
    {'name': 'BASIC',
     'matches': [{'wikipediaScore': 0.35916049097766867,
       'entityTypeScore': 0.8,
       'text': 'BASIC',
       'offset': 89,
       'length': 5}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'BASIC',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/BASIC',
     'bingId': '5b16443d-501c-58f3-352e-611bbe75aa6e',
     'type': 'Other'},
    {'name': 'Altair 8800',
     'matches': [{'wikipediaScore': 0.8697256853652899,
       'entityTypeScore': 0.8,
       'text': 'Altair 8800',
       'offset': 116,
       'length': 11}],
     'wikipediaLanguage': 'en',
     'wikipediaId': 'Altair 8800',
     'wikipediaUrl': 'https://en.wikipedia.org/wiki/Altair_8800',
     'bingId': '7216c654-3779-68a2-c7b7-12ff3dad5606',
     'type': 'Other'}]}],
 'errors': []}
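
Each entity includes its type and, where available, a link to the matching Wikipedia article. The following is a minimal sketch that summarizes the entities found, assuming the response shape shown above:

# A minimal sketch: list each entity's name, type, and Wikipedia URL
# (when present). Field names follow the sample output above.
for document in entities["documents"]:
    for entity in document["entities"]:
        url = entity.get("wikipediaUrl", "no Wikipedia link")
        print(f"{entity['name']} ({entity['type']}): {url}")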

Next steps

See also

Text Analytics overview
Frequently asked questions (FAQ)