Example: How to extract key phrases using Text Analytics

The Key Phrase Extraction API evaluates unstructured text and, for each JSON document, returns a list of key phrases.

This capability is useful if you need to quickly identify the main points in a collection of documents. For example, given the input text "The food was delicious and there were wonderful staff", the service returns the main talking points: "food" and "wonderful staff".

See the Supported languages article for more information.

Tip

Text Analytics also provides a Linux-based Docker container image for key phrase extraction, so you can install and run the Text Analytics container close to your data.

Preparation

Key phrase extraction works best when you give it larger amounts of text to work on. This is the opposite of sentiment analysis, which performs better on smaller amounts of text. To get the best results from both operations, consider restructuring the inputs accordingly.

You must have JSON documents in this format: id, text, language.

Each document must be 5,120 characters or fewer, and a collection can contain up to 1,000 items (IDs). The collection is submitted in the body of the request. The following example illustrates content you might submit for key phrase extraction.

    {
        "documents": [
            {
                "language": "en",
                "id": "1",
                "text": "We love this trail and make the trip every year. The views are breathtaking and well worth the hike!"
            },
            {
                "language": "en",
                "id": "2",
                "text": "Poorly marked trails! I thought we were goners. Worst hike ever."
            },
            {
                "language": "en",
                "id": "3",
                "text": "Everyone in my family liked the trail but thought it was too challenging for the less athletic among us. Not necessarily recommended for small children."
            },
            {
                "language": "en",
                "id": "4",
                "text": "It was foggy so we missed the spectacular views, but the trail was ok. Worth checking out if you are in the area."
            },
            {
                "language": "en",
                "id": "5",
                "text": "This is my favorite trail. It has beautiful views and many places to stop and rest"
            }
        ]
    }
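A collection like the one above could be assembled programmatically. The following is a minimal Python sketch, not part of the service or any SDK: the helper name `build_collection` is hypothetical, ids are simply assigned as "1", "2", and so on, and `len()` counts Unicode code points, which is an assumption about how the service measures characters.

```python
import json

# Limits stated in this article.
MAX_CHARS_PER_DOCUMENT = 5120
MAX_DOCUMENTS_PER_COLLECTION = 1000


def build_collection(texts, language="en"):
    """Wrap plain strings in the {"documents": [...]} shape (hypothetical helper)."""
    if len(texts) > MAX_DOCUMENTS_PER_COLLECTION:
        raise ValueError("a collection can hold at most 1,000 documents")
    documents = []
    for index, text in enumerate(texts, start=1):
        # Assumption: len() approximates the service's character count.
        if len(text) > MAX_CHARS_PER_DOCUMENT:
            raise ValueError(f"document {index} exceeds 5,120 characters")
        documents.append({"language": language, "id": str(index), "text": text})
    return {"documents": documents}


collection = build_collection([
    "We love this trail and make the trip every year. The views are breathtaking and well worth the hike!",
    "Poorly marked trails! I thought we were goners. Worst hike ever.",
])
body = json.dumps(collection)  # JSON text to place in the request body
```

Checking the limits before sending avoids a round trip for input that would exceed the documented sizes.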

Step 1: Structure the request

Details on request definition can be found in How to call the Text Analytics API. The following points are restated for convenience:

  • Create a POST request. Review the API documentation for this request: Key Phrases API

  • Set the HTTP endpoint for key phrase extraction, using either a Text Analytics resource on Azure or an instantiated Text Analytics container. The URL must include the /keyPhrases resource: https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/keyPhrases

  • Set a request header to include the access key for Text Analytics operations. For more information, see How to find endpoints and access keys.

  • In the request body, provide the JSON documents collection you prepared for this analysis.
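The points above can be sketched in Python with the standard library. This block only constructs the request and does not send it. The endpoint is the one quoted in this article, `<your-access-key>` is a placeholder, and `build_request` is a hypothetical helper. The article does not name the key header; `Ocp-Apim-Subscription-Key`, the header commonly used by Cognitive Services, is assumed here.

```python
import json
import urllib.request

# Endpoint quoted in this article; substitute the one for your resource or container.
ENDPOINT = "https://chinaeast2.api.cognitive.azure.cn/text/analytics/v2.1/keyPhrases"
ACCESS_KEY = "<your-access-key>"  # placeholder, not a real key


def build_request(collection, endpoint=ENDPOINT, access_key=ACCESS_KEY):
    """Build (but do not send) a POST request carrying the documents collection."""
    body = json.dumps(collection).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the access key (not stated in this article).
            "Ocp-Apim-Subscription-Key": access_key,
        },
    )


request = build_request(
    {"documents": [{"language": "en", "id": "1", "text": "This is my favorite trail."}]}
)
```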

Tip

Use Postman or open the API testing console in the documentation to structure a request and POST it to the service.

Step 2: Post the request

Analysis is performed upon receipt of the request. See the data limits section in the overview for information on the size and number of requests you can send per minute and per second.

Recall that the service is stateless. No data is stored in your account. Results are returned immediately in the response.
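Because results come back in the same response, posting and reading can happen in one call. The sketch below assumes a prepared `urllib.request.Request`; `post_and_parse` is a hypothetical helper, and the `opener` parameter exists only so the example can run offline with a stand-in instead of contacting the service.

```python
import io
import json
import urllib.request


def post_and_parse(request, opener=urllib.request.urlopen):
    """Send the request and decode the JSON body of the response."""
    # urlopen raises urllib.error.HTTPError for non-success status codes.
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(request):
    """Stand-in for urlopen that returns a canned body; nothing is sent."""
    return io.BytesIO(
        b'{"documents": [{"id": "1", "keyPhrases": ["favorite trail"]}], "errors": []}'
    )


result = post_and_parse(
    urllib.request.Request("https://example.invalid/keyPhrases"),
    opener=fake_opener,
)
```

In real use, omit the `opener` argument so that `urllib.request.urlopen` performs the call.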

Step 3: View results

All POST requests return a JSON-formatted response with the IDs and detected properties.

Output is returned immediately. You can stream the results to an application that accepts JSON, or save the output to a file on the local system and then import it into an application that allows you to sort, search, and manipulate the data.
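One way to prepare the output for a spreadsheet-style application is to flatten it into CSV, one row per key phrase. This is a sketch under assumptions: `key_phrases_to_csv` is a hypothetical helper, the column names are arbitrary, and the response is assumed to have a `documents` list whose items carry `id` and `keyPhrases`, as in the example output in this article.

```python
import csv
import io


def key_phrases_to_csv(response, out):
    """Write one (id, keyPhrase) row per extracted phrase to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["id", "keyPhrase"])
    for document in response["documents"]:
        for phrase in document["keyPhrases"]:
            writer.writerow([document["id"], phrase])


sample = {
    "documents": [
        {"id": "2", "keyPhrases": ["marked trails", "Worst hike", "goners"]}
    ],
    "errors": [],
}
buffer = io.StringIO()  # use open("phrases.csv", "w", newline="") to write a file
key_phrases_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```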

An example of the output for key phrase extraction is shown here:

    {
        "documents": [
            {
                "keyPhrases": [
                    "year",
                    "trail",
                    "trip",
                    "views"
                ],
                "id": "1"
            },
            {
                "keyPhrases": [
                    "marked trails",
                    "Worst hike",
                    "goners"
                ],
                "id": "2"
            },
            {
                "keyPhrases": [
                    "trail",
                    "small children",
                    "family"
                ],
                "id": "3"
            },
            {
                "keyPhrases": [
                    "spectacular views",
                    "trail",
                    "area"
                ],
                "id": "4"
            },
            {
                "keyPhrases": [
                    "places",
                    "beautiful views",
                    "favorite trail"
                ],
                "id": "5"
            }
        ],
        "errors": []
    }
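A response of this shape can be indexed by document ID, and the phrases can be tallied across the collection to surface recurring talking points. The following Python sketch uses a shortened copy of the output above; `split_response` is a hypothetical helper, and the entries of `errors` are passed through untouched because their structure is not shown in this article.

```python
import json
from collections import Counter

# Shortened copy of the example output (documents 1 and 3 only).
response_text = """
{
    "documents": [
        {"keyPhrases": ["year", "trail", "trip", "views"], "id": "1"},
        {"keyPhrases": ["trail", "small children", "family"], "id": "3"}
    ],
    "errors": []
}
"""


def split_response(response):
    """Return (phrases keyed by document id, list of reported errors)."""
    phrases_by_id = {doc["id"]: doc["keyPhrases"] for doc in response.get("documents", [])}
    return phrases_by_id, response.get("errors", [])


phrases_by_id, errors = split_response(json.loads(response_text))

# Count how many documents mention each phrase.
phrase_counts = Counter(
    phrase for phrases in phrases_by_id.values() for phrase in phrases
)
top_phrase, top_count = phrase_counts.most_common(1)[0]
```

Returning the errors alongside the phrases, rather than raising, keeps the results for documents that were processed successfully.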

As noted, the analyzer finds and discards non-essential words, and keeps single terms or phrases that appear to be the subject or object of a sentence.

Summary

In this article, you learned the concepts and workflow for key phrase extraction using Text Analytics in Cognitive Services. In summary:

  • The Key Phrase Extraction API is available for selected languages.
  • JSON documents in the request body include an id, text, and language code.
  • The POST request goes to a /keyPhrases endpoint, using a personalized access key and an endpoint that is valid for your subscription.
  • Response output, which consists of key words and phrases for each document ID, can be streamed to any app that accepts JSON, including Excel and Power BI, to name a few.

See also

Text Analytics overview
Frequently asked questions (FAQ)
Text Analytics product page

Next steps