How to: Detect sentiment using the Text Analytics API

The Sentiment Analysis feature of the Text Analytics API evaluates text and returns sentiment scores and labels for each sentence. This is useful for detecting positive and negative sentiment in social media, customer reviews, discussion forums, and more. The service provides the AI models that the API uses; you only have to send content for analysis.

Tip

Text Analytics also provides a Linux-based Docker container image for language detection, so you can install and run the Text Analytics container close to your data.

Sentiment Analysis supports a wide range of languages, with more in preview. For more information, see Supported languages.

Concepts

The Text Analytics API uses a machine learning classification algorithm to generate a sentiment score between 0 and 1. Scores closer to 1 indicate positive sentiment, while scores closer to 0 indicate negative sentiment. Sentiment analysis is performed on the entire document rather than on individual entities in the text, so sentiment scores are returned at the document or sentence level.

The model is pre-trained on an extensive corpus of text and sentiment associations. It combines several analysis techniques, including text processing, part-of-speech analysis, word placement, and word associations. For more information about the algorithm, see Introducing Text Analytics. Currently, you can't provide your own training data.

Scoring accuracy tends to improve when documents contain a few sentences rather than a large block of text. During an objectivity assessment phase, the model determines whether a document as a whole is objective or contains sentiment. A document that's mostly objective doesn't progress to the sentiment detection phase; it receives a score of 0.50 with no further processing. For documents that continue in the pipeline, the next phase generates a score above or below 0.50, depending on the degree of sentiment detected in the document.

Sentiment Analysis versions and features

The Text Analytics API offers two versions of Sentiment Analysis: v2 and v3. Sentiment Analysis v3 (public preview) significantly improves the accuracy and detail of the API's text categorization and scoring.

Note

  • The Sentiment Analysis v3 request format and data limits are the same as in the previous version.
  • Sentiment Analysis v3 is available in the following regions: Australia East, Central Canada, Central US, East Asia, East US, East US 2, North Europe, Southeast Asia, South Central US, UK South, West Europe, and West US 2.

Feature | Sentiment Analysis v2 | Sentiment Analysis v3
Methods for single and batch requests | X | X
Sentiment scores for the entire document | X | X
Sentiment scores for individual sentences | - | X
Sentiment labeling | - | X
Model versioning | - | X

Sentiment scoring

Sentiment Analysis v3 classifies text with sentiment labels (described below). The returned scores represent the model's confidence that the text is positive, negative, or neutral. Higher values signify higher confidence.

Sentiment labeling

Sentiment Analysis v3 can return scores and labels at the sentence and document level. The scores and labels are positive, negative, and neutral. At the document level, the mixed sentiment label can also be returned, without a score. The document's sentiment label is determined as follows:

Sentence sentiment | Returned document label
At least one positive sentence is in the document. The rest of the sentences are neutral. | positive
At least one negative sentence is in the document. The rest of the sentences are neutral. | negative
At least one negative sentence and at least one positive sentence are in the document. | mixed
All sentences in the document are neutral. | neutral
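
The labeling rules in the table can be sketched as a small Python function. This is only an illustration of the rules as stated here, not the service's implementation. The function name `document_label` is hypothetical, and rejecting an empty list is an assumption, since the table doesn't cover that case.

```python
from typing import Iterable

VALID_LABELS = {"positive", "negative", "neutral"}


def document_label(sentence_labels: Iterable[str]) -> str:
    """Derive a document label from sentence labels, following the table above."""
    labels = set(sentence_labels)
    if not labels:
        # Assumption: the table does not say what happens with no sentences.
        raise ValueError("at least one sentence label is required")
    unknown = labels - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected sentence labels: {sorted(unknown)}")

    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"  # every sentence was neutral
```

For example, `document_label(["neutral", "positive"])` returns `"positive"`, while a document with both a positive and a negative sentence yields `"mixed"`.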

Example C# code

You can find an example C# application that calls this version of Sentiment Analysis on GitHub.

Sending a REST API request

Preparation

Sentiment analysis produces a higher-quality result when you give it smaller amounts of text to work on. This is the opposite of key phrase extraction, which performs better on larger blocks of text. To get the best results from both operations, consider restructuring the inputs accordingly.

Each JSON document must contain an ID, text, and a language code.

Each document must be under 5,120 characters, and a collection can contain up to 1,000 items (IDs). The collection is submitted in the body of the request.
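
As a rough sketch, these limits could be checked on the client before sending anything. The helper name `batch_documents` and the constant names are hypothetical. Using Python's `len()` to count characters is an assumption, because the service may count characters differently for some scripts.

```python
from typing import Dict, Iterator, List

MAX_CHARS_PER_DOCUMENT = 5120        # each document must be under this size
MAX_DOCUMENTS_PER_COLLECTION = 1000  # maximum items (IDs) per collection


def batch_documents(
    documents: List[Dict[str, str]],
) -> Iterator[List[Dict[str, str]]]:
    """Validate document sizes, then yield collections of at most 1,000 items."""
    for doc in documents:
        # Assumption: len() approximates the service's character count.
        if len(doc["text"]) >= MAX_CHARS_PER_DOCUMENT:
            raise ValueError(f"document {doc['id']} is too long")
    for start in range(0, len(documents), MAX_DOCUMENTS_PER_COLLECTION):
        yield documents[start:start + MAX_DOCUMENTS_PER_COLLECTION]
```

Each yielded list can then be submitted as the documents collection of a separate request.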

Structure the request

Create a POST request. You can use Postman or the API testing console in the following reference links to quickly structure and send one.

Set the HTTPS endpoint for sentiment analysis by using either a Text Analytics resource on Azure or an instantiated Text Analytics container. You must include the correct URL for the version you want to use. For example:

https://<your-custom-subdomain>.cognitiveservices.azure.cn/text/analytics/v3.0-preview.1/sentiment

Note

You can find the key and endpoint for your Text Analytics resource in the Azure portal. They are located on the resource's Quick start page, under Resource Management.

Set a request header to include your Text Analytics API key. In the request body, provide the JSON documents collection you prepared for this analysis.
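
A minimal Python sketch of such a request, using only the standard library, might look like the following. The function name `build_sentiment_request` is hypothetical. The header name `Ocp-Apim-Subscription-Key` is the one Azure Cognitive Services commonly uses for API keys, but this article doesn't state it, so check it against the API reference. The function only builds the request and doesn't send it.

```python
import json
import urllib.request

# Path taken from the example URL in this article.
SENTIMENT_PATH = "/text/analytics/v3.0-preview.1/sentiment"


def build_sentiment_request(
    endpoint: str, api_key: str, documents: list
) -> urllib.request.Request:
    """Build (but do not send) a POST request for sentiment analysis."""
    url = endpoint.rstrip("/") + SENTIMENT_PATH
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        # Assumption: standard Cognitive Services key header.
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send it, pass the returned object to `urllib.request.urlopen` and parse the response body with `json.load`.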

Example Sentiment Analysis request

The following is an example of content you might submit for sentiment analysis. The request format is the same for both versions of the API.

{
    "documents": [
        {
            "language": "en",
            "id": "1",
            "text": "Hello world. This is some input text that I love."
        },
        {
            "language": "en",
            "id": "2",
            "text": "It's incredibly sunny outside! I'm so happy."
        }
    ]
}

Post the request

Analysis is performed upon receipt of the request. For information on the size and number of requests you can send per minute and per second, see the data limits section in the overview.

The Text Analytics API is stateless. No data is stored in your account, and results are returned immediately in the response.

View the results

The sentiment analyzer classifies text as predominantly positive or negative and assigns a score in the range of 0 to 1. Values close to 0.5 are neutral or indeterminate, and a score of exactly 0.5 indicates neutrality. When a string can't be analyzed for sentiment or has no sentiment, the score is always exactly 0.5. For example, if you pass in a Spanish string with an English language code, the score is 0.5.
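
One way to turn such a score into a readable category is sketched below. The function name `describe_score` and the `neutral_band` parameter are hypothetical. The article only says that values close to 0.5 are neutral or indeterminate, so the default band width of 0.1 is an arbitrary illustration rather than a documented threshold.

```python
def describe_score(score: float, neutral_band: float = 0.1) -> str:
    """Map a 0-to-1 sentiment score to a rough category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # Assumption: "close to 0.5" means within neutral_band of 0.5.
    if abs(score - 0.5) <= neutral_band:
        return "neutral or indeterminate"
    return "positive" if score > 0.5 else "negative"
```

Choose the band to suit your data; a band of 0 treats only an exact 0.5 as neutral.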

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system. Then, import the output into an application that you can use to sort, search, and manipulate the data.

Sentiment Analysis v3 example response

Responses from Sentiment Analysis v3 contain sentiment labels and scores for each analyzed sentence and document. documentScores is not returned if the document sentiment label is mixed.

{
    "documents": [
        {
            "id": "1",
            "sentiment": "positive",
            "documentScores": {
                "positive": 0.98570585250854492,
                "neutral": 0.0001625834556762,
                "negative": 0.0141316400840878
            },
            "sentences": [
                {
                    "sentiment": "neutral",
                    "sentenceScores": {
                        "positive": 0.0785155147314072,
                        "neutral": 0.89702343940734863,
                        "negative": 0.0244610067456961
                    },
                    "offset": 0,
                    "length": 12
                },
                {
                    "sentiment": "positive",
                    "sentenceScores": {
                        "positive": 0.98570585250854492,
                        "neutral": 0.0001625834556762,
                        "negative": 0.0141316400840878
                    },
                    "offset": 13,
                    "length": 36
                }
            ]
        },
        {
            "id": "2",
            "sentiment": "positive",
            "documentScores": {
                "positive": 0.89198976755142212,
                "neutral": 0.103382371366024,
                "negative": 0.0046278294175863
            },
            "sentences": [
                {
                    "sentiment": "positive",
                    "sentenceScores": {
                        "positive": 0.78401315212249756,
                        "neutral": 0.2067587077617645,
                        "negative": 0.0092281140387058
                    },
                    "offset": 0,
                    "length": 30
                },
                {
                    "sentiment": "positive",
                    "sentenceScores": {
                        "positive": 0.99996638298034668,
                        "neutral": 0.0000060341349126,
                        "negative": 0.0000275444017461
                    },
                    "offset": 31,
                    "length": 13
                }
            ]
        }
    ],
    "errors": []
}
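
A response shaped like the one above could be processed with a short Python sketch such as the following. The function name `summarize_response` is hypothetical. Treating `offset` and `length` as character indexes into the submitted text is an assumption that matches this example, though it may not hold for every kind of text.

```python
from typing import Dict


def summarize_response(response: dict, texts_by_id: Dict[str, str]) -> Dict[str, dict]:
    """Pair each document's label with its sentence texts and sentence labels."""
    summary = {}
    for doc in response["documents"]:
        text = texts_by_id[doc["id"]]
        sentences = []
        for sentence in doc["sentences"]:
            # Assumption: offset and length are character positions in the text.
            start = sentence["offset"]
            end = start + sentence["length"]
            sentences.append((text[start:end], sentence["sentiment"]))
        summary[doc["id"]] = {
            "sentiment": doc["sentiment"],
            # documentScores is absent when the document label is mixed.
            "scores": doc.get("documentScores"),
            "sentences": sentences,
        }
    return summary
```

Here `texts_by_id` maps each document ID to the text that was submitted for it, so the sentence text can be recovered from the offsets.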

Summary

In this article, you learned the concepts and workflow for sentiment analysis using the Text Analytics API. In summary:

  • Sentiment Analysis is available for selected languages in two versions.
  • JSON documents in the request body include an ID, text, and a language code.
  • The POST request goes to a /sentiment endpoint and uses a personalized access key and an endpoint that's valid for your subscription.
  • Response output, which consists of a sentiment score for each document ID, can be streamed to any app that accepts JSON, such as Excel or Power BI.

See also