Quickstart: Using the Python REST API to call the Text Analytics Cognitive Service

Use this quickstart to begin analyzing language with the Text Analytics REST API and Python. This article shows you how to detect language, analyze sentiment, extract key phrases, and identify entities.

Tip

For detailed API technical documentation and to see it in action, use the following links. You can also send POST requests from the built-in API test console. No setup is required; simply paste your resource key and JSON documents into the request:

Prerequisites

  • Python 3.x

  • The Python requests library

    You can install the library with this command:

    pip install --upgrade requests
    

You must have a Cognitive Services API subscription with access to the Text Analytics API. If you don't have a subscription, you can create a 1 RMB trial account. Before continuing, you will need the Text Analytics subscription key provided after activating your account.

Create a new Python application

Create a new Python application in your favorite editor or IDE. Add the following imports to your file.

import requests
# pprint is used to format the JSON response
from pprint import pprint

Create variables for your resource's Azure endpoint and subscription key.

import os

# Read the key and endpoint from environment variables, or paste them in directly
subscription_key = os.environ.get(
    "TEXT_ANALYTICS_KEY", "<paste-your-text-analytics-key-here>")
endpoint = os.environ.get(
    "TEXT_ANALYTICS_ENDPOINT", "<paste-your-text-analytics-endpoint-here>")

The following sections describe how to call each of the API's features.

Detect languages

Append /text/analytics/v3.0/languages to the Text Analytics base endpoint to form the language detection URL. For example: https://<your-custom-subdomain>.cognitiveservices.azure.cn/text/analytics/v3.0/languages

language_api_url = endpoint + "/text/analytics/v3.0/languages"

The payload to the API consists of a list of documents, each a dictionary containing an id and a text attribute. The text attribute stores the text to be analyzed, and the id can be any value.

documents = {"documents": [
    {"id": "1", "text": "This is a document written in English."},
    {"id": "2", "text": "Este es un document escrito en Español."},
    {"id": "3", "text": "这是一个用中文写的文件"}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.post(language_api_url, headers=headers, json=documents)
languages = response.json()
pprint(languages)

Output

{
    "documents": [
        {
            "id": "1",
            "detectedLanguage": {
                "name": "English",
                "iso6391Name": "en",
                "confidenceScore": 1.0
            },
            "warnings": []
        },
        {
            "id": "2",
            "detectedLanguage": {
                "name": "Spanish",
                "iso6391Name": "es",
                "confidenceScore": 1.0
            },
            "warnings": []
        },
        {
            "id": "3",
            "detectedLanguage": {
                "name": "Chinese_Simplified",
                "iso6391Name": "zh_chs",
                "confidenceScore": 1.0
            },
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2019-10-01"
}
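To work with the response in code rather than just printing it, you can index into the returned JSON. Here is a minimal sketch that maps each document id to its detected ISO 639-1 code; it uses a hard-coded subset of the sample output above so it runs without a subscription, where in a real run `languages` would be the dictionary returned by response.json().

```python
# Hard-coded subset of the sample response above; in a real run this
# would be the dictionary returned by response.json().
languages = {
    "documents": [
        {"id": "1", "detectedLanguage": {"name": "English",
                                         "iso6391Name": "en",
                                         "confidenceScore": 1.0}},
        {"id": "3", "detectedLanguage": {"name": "Chinese_Simplified",
                                         "iso6391Name": "zh_chs",
                                         "confidenceScore": 1.0}},
    ]
}

# Map each document id to its detected ISO 639-1 language code
detected = {doc["id"]: doc["detectedLanguage"]["iso6391Name"]
            for doc in languages["documents"]}
print(detected)  # {'1': 'en', '3': 'zh_chs'}
```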

Analyze sentiment

To detect the sentiment (positive or negative) of a set of documents, append /text/analytics/v3.0/sentiment to the Text Analytics base endpoint to form the sentiment analysis URL. For example: https://<your-custom-subdomain>.cognitiveservices.azure.cn/text/analytics/v3.0/sentiment

sentiment_url = endpoint + "/text/analytics/v3.0/sentiment"

As with the language detection example, create a dictionary with a documents key that consists of a list of documents. Each document is a dictionary consisting of the id, the text to be analyzed, and the language of the text.

documents = {"documents": [
    {"id": "1", "language": "en",
        "text": "I really enjoy the new XBox One S. It has a clean look, it has 4K/HDR resolution and it is affordable."},
    {"id": "2", "language": "es",
        "text": "Este ha sido un dia terrible, llegué tarde al trabajo debido a un accidente automobilistico."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.post(sentiment_url, headers=headers, json=documents)
sentiments = response.json()
pprint(sentiments)

Output

Each document gets an overall sentiment label along with confidence scores between 0.0 and 1.0 for the positive, neutral, and negative labels; a higher score indicates greater confidence in that label.

{
    "documents": [
        {
            "id": "1",
            "sentiment": "positive",
            "confidenceScores": {
                "positive": 1.0,
                "neutral": 0.0,
                "negative": 0.0
            },
            "sentences": [
                {
                    "sentiment": "positive",
                    "confidenceScores": {
                        "positive": 1.0,
                        "neutral": 0.0,
                        "negative": 0.0
                    },
                    "offset": 0,
                    "length": 102,
                    "text": "I really enjoy the new XBox One S. It has a clean look, it has 4K/HDR resolution and it is affordable."
                }
            ],
            "warnings": []
        },
        {
            "id": "2",
            "sentiment": "negative",
            "confidenceScores": {
                "positive": 0.02,
                "neutral": 0.05,
                "negative": 0.93
            },
            "sentences": [
                {
                    "sentiment": "negative",
                    "confidenceScores": {
                        "positive": 0.02,
                        "neutral": 0.05,
                        "negative": 0.93
                    },
                    "offset": 0,
                    "length": 92,
                    "text": "Este ha sido un dia terrible, llegué tarde al trabajo debido a un accidente automobilistico."
                }
            ],
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2020-04-01"
}
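If you need the highest-confidence label programmatically rather than reading it off the sentiment field, you can take the maximum of the three confidence scores. A minimal sketch, again using a hard-coded subset of the sample response above in place of response.json():

```python
# Hard-coded subset of the sample response above; in a real run this
# would be the dictionary returned by response.json().
sentiments = {
    "documents": [
        {"id": "1", "sentiment": "positive",
         "confidenceScores": {"positive": 1.0, "neutral": 0.0, "negative": 0.0}},
        {"id": "2", "sentiment": "negative",
         "confidenceScores": {"positive": 0.02, "neutral": 0.05, "negative": 0.93}},
    ]
}

# For each document, pick the label with the highest confidence score
overall = {}
for doc in sentiments["documents"]:
    scores = doc["confidenceScores"]
    top = max(scores, key=scores.get)
    overall[doc["id"]] = (top, scores[top])

print(overall)  # {'1': ('positive', 1.0), '2': ('negative', 0.93)}
```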

Extract key phrases

To extract the key phrases from a set of documents, append /text/analytics/v3.0/keyPhrases to the Text Analytics base endpoint to form the key phrase extraction URL. For example: https://<your-custom-subdomain>.cognitiveservices.azure.cn/text/analytics/v3.0/keyPhrases

keyphrase_url = endpoint + "/text/analytics/v3.0/keyPhrases"

This collection of documents is similar to the one used for the sentiment analysis example.

documents = {"documents": [
    {"id": "1", "language": "en",
        "text": "I really enjoy the new XBox One S. It has a clean look, it has 4K/HDR resolution and it is affordable."},
    {"id": "2", "language": "es",
        "text": "Si usted quiere comunicarse con Carlos, usted debe de llamarlo a su telefono movil. Carlos es muy responsable, pero necesita recibir una notificacion si hay algun problema."},
    {"id": "3", "language": "en",
        "text": "The Grand Hotel is a new hotel in the center of Seattle. It earned 5 stars in my review, and has the classiest decor I've ever seen."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.post(keyphrase_url, headers=headers, json=documents)
key_phrases = response.json()
pprint(key_phrases)

Output

{
    "documents": [
        {
            "id": "1",
            "keyPhrases": [
                "HDR resolution",
                "new XBox",
                "clean look"
            ],
            "warnings": []
        },
        {
            "id": "2",
            "keyPhrases": [
                "Carlos",
                "notificacion",
                "algun problema",
                "telefono movil"
            ],
            "warnings": []
        },
        {
            "id": "3",
            "keyPhrases": [
                "new hotel",
                "Grand Hotel",
                "review",
                "center of Seattle",
                "classiest decor",
                "stars"
            ],
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2019-10-01"
}
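A common follow-up is to collect the phrases from all documents into a single deduplicated list, for example to build a tag index. A minimal sketch using a hard-coded variant of the sample response above (one phrase is repeated across documents to show the deduplication), standing in for response.json():

```python
# Hard-coded variant of the sample response above; in a real run this
# would be the dictionary returned by response.json().
key_phrases = {
    "documents": [
        {"id": "1", "keyPhrases": ["HDR resolution", "new XBox", "clean look"]},
        {"id": "3", "keyPhrases": ["new hotel", "Grand Hotel", "clean look"]},
    ]
}

# Flatten the per-document lists into one sorted, deduplicated list
all_phrases = sorted({phrase
                      for doc in key_phrases["documents"]
                      for phrase in doc["keyPhrases"]})
print(all_phrases)
# ['Grand Hotel', 'HDR resolution', 'clean look', 'new XBox', 'new hotel']
```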

Identify entities

To identify well-known entities (people, places, and things) in text documents, append /text/analytics/v3.0/entities/recognition/general to the Text Analytics base endpoint to form the entity recognition URL. For example: https://<your-custom-subdomain>.cognitiveservices.azure.cn/text/analytics/v3.0/entities/recognition/general

entities_url = endpoint + "/text/analytics/v3.0/entities/recognition/general"

Create a collection of documents, as in the previous examples.

documents = {"documents": [
    {"id": "1", "text": "Microsoft is an IT company."}
]}

Use the Requests library to send the documents to the API. Add your subscription key to the Ocp-Apim-Subscription-Key header, and send the request with requests.post().

headers = {"Ocp-Apim-Subscription-Key": subscription_key}
response = requests.post(entities_url, headers=headers, json=documents)
entities = response.json()
pprint(entities)

Output

{
    "documents": [
        {
            "id": "1",
            "entities": [
                {
                    "text": "Microsoft",
                    "category": "Organization",
                    "offset": 0,
                    "length": 9,
                    "confidenceScore": 0.86
                },
                {
                    "text": "IT",
                    "category": "Skill",
                    "offset": 16,
                    "length": 2,
                    "confidenceScore": 0.8
                }
            ],
            "warnings": []
        }
    ],
    "errors": [],
    "modelVersion": "2020-04-01"
}
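Each entity comes back with a confidenceScore, so you can filter out low-confidence matches. A minimal sketch using a hard-coded copy of the sample response above in place of response.json(); the 0.85 threshold is an arbitrary cutoff chosen for illustration:

```python
# Hard-coded copy of the sample response above; in a real run this
# would be the dictionary returned by response.json().
entities = {
    "documents": [
        {"id": "1", "entities": [
            {"text": "Microsoft", "category": "Organization",
             "offset": 0, "length": 9, "confidenceScore": 0.86},
            {"text": "IT", "category": "Skill",
             "offset": 16, "length": 2, "confidenceScore": 0.8},
        ]}
    ]
}

# Keep only entities whose confidence clears the threshold
THRESHOLD = 0.85  # arbitrary illustrative cutoff
confident = [entity["text"]
             for doc in entities["documents"]
             for entity in doc["entities"]
             if entity["confidenceScore"] >= THRESHOLD]
print(confident)  # ['Microsoft']
```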

Next steps

See also

Text Analytics overview
Frequently asked questions (FAQ)