What is the Text Analytics API?

The Text Analytics API is a cloud-based service that provides Natural Language Processing (NLP) features for text mining and text analysis, including sentiment analysis, opinion mining, key phrase extraction, language detection, and named entity recognition.

The API is part of Azure Cognitive Services, a collection of machine learning and AI algorithms in the cloud for your development projects. You can use these features with the REST API version 3.0 or version 3.1-preview, or with the client library.

This documentation contains the following types of articles:

  • Quickstarts are step-by-step instructions that let you make calls to the service and get results in a short period of time.
  • How-to guides contain instructions for using the service in more specific or customized ways.
  • Concepts provide in-depth explanations of the service's functionality and features.
  • Tutorials are longer guides that show you how to use this service as a component in broader business solutions.

Sentiment analysis

Use sentiment analysis to find out what people think of your brand or topic by mining text for clues about positive or negative sentiment.

The feature provides sentiment labels (such as "negative", "neutral", and "positive") based on the highest confidence score found by the service at the sentence and document level. It also returns confidence scores between 0 and 1 for each document, and for each sentence within it, for positive, neutral, and negative sentiment. You can also run the service on premises using a container.
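As a rough illustration of how a label follows from the confidence scores, the sketch below picks the label with the highest score. The function name `top_sentiment` and the score values are hypothetical and chosen only for this example; the service's own labeling may apply additional rules.

```python
def top_sentiment(confidence_scores):
    """Return the label whose confidence score is highest."""
    return max(confidence_scores, key=confidence_scores.get)


# Hypothetical scores for one sentence; the values are made up for illustration.
sentence_scores = {"positive": 0.91, "neutral": 0.07, "negative": 0.02}

label = top_sentiment(sentence_scores)
print(label)
```

The same helper can be applied to document-level scores, since both levels carry one score per label.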

Starting in the v3.1 preview, opinion mining is a feature of sentiment analysis. Also known as aspect-based sentiment analysis in Natural Language Processing (NLP), this feature provides more granular information about the opinions related to words (such as the attributes of products or services) in text.

Key phrase extraction

Use key phrase extraction to quickly identify the main concepts in text. For example, in the text "The food was delicious and there were wonderful staff", key phrase extraction returns the main talking points: "food" and "wonderful staff".

Language detection

Language detection identifies the language an input text is written in and reports a single language code for every document submitted in the request. It covers a wide range of languages, variants, dialects, and some regional or cultural languages. The language code is paired with a confidence score.
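The sketch below shows one way to read a language code and its confidence score for each document out of a parsed response. The response shape and field names (`detectedLanguage`, `iso6391Name`, `confidenceScore`) are assumptions about the version 3.0 REST response, and the sample values are invented; check the API reference before relying on them.

```python
def detected_languages(response_body):
    """Map each document ID to its (language code, confidence score) pair."""
    return {
        doc["id"]: (
            doc["detectedLanguage"]["iso6391Name"],
            doc["detectedLanguage"]["confidenceScore"],
        )
        for doc in response_body.get("documents", [])
    }


# Invented sample response; field names are assumed, not taken from this article.
sample_response = {
    "documents": [
        {"id": "1", "detectedLanguage": {"name": "English", "iso6391Name": "en", "confidenceScore": 1.0}},
        {"id": "2", "detectedLanguage": {"name": "French", "iso6391Name": "fr", "confidenceScore": 0.95}},
    ],
    "errors": [],
}

languages = detected_languages(sample_response)
print(languages)
```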

Named entity recognition

Named Entity Recognition (NER) can identify and categorize entities in your text as people, places, organizations, and quantities. Well-known entities are also recognized and linked to more information on the web.

Deploy on premises using Docker containers

Use Text Analytics containers to deploy API features on premises. These Docker containers enable you to bring the service closer to your data for compliance, security, or other operational reasons. Text Analytics offers the following containers:

  • sentiment analysis
  • key phrase extraction (preview)
  • language detection (preview)

Typical workflow

The workflow is simple: you submit data for analysis and handle outputs in your code. Analyzers are consumed as-is, with no additional configuration or customization.

  1. Create an Azure resource for Text Analytics. Afterwards, get the key generated for you to authenticate your requests.

  2. Formulate a request containing your data as raw unstructured text, in JSON.

  3. Post the request to the endpoint established during sign-up, appending the desired resource: sentiment analysis, key phrase extraction, language detection, or named entity recognition.

  4. Stream or store the response locally. Depending on the request, results are a sentiment score, a collection of extracted key phrases, a language code, or a set of recognized entities.

Output is returned as a single JSON document, with results for each text document you posted, based on ID. You can subsequently analyze, visualize, or categorize the results into actionable insights.
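The steps above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not a definitive client: the endpoint host and key are placeholders, the `/text/analytics/v3.0/` path and the `Ocp-Apim-Subscription-Key` header are assumed from the version 3.0 REST API, and `build_request` and `results_by_id` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Placeholders: substitute the endpoint and key of your own Text Analytics resource.
ENDPOINT = "https://example.cognitiveservices.azure.com"
KEY = "your-key-here"


def build_request(feature, documents):
    """Build a POST request for one feature, e.g. 'sentiment' or 'keyPhrases'."""
    url = f"{ENDPOINT}/text/analytics/v3.0/{feature}"  # assumed v3.0 path
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def results_by_id(response_body):
    """Index the per-document results of a parsed response by document ID."""
    return {doc["id"]: doc for doc in response_body.get("documents", [])}


documents = [
    {"id": "1", "language": "en", "text": "The food was delicious."},
    {"id": "2", "language": "en", "text": "The service was slow."},
]
request = build_request("sentiment", documents)

# To send the request against a real resource:
# with urllib.request.urlopen(request) as resp:
#     results = results_by_id(json.load(resp))
```

Indexing the results by ID makes it straightforward to join each result back to the text that produced it.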

Data is not stored in your account. Operations performed by the Text Analytics API are stateless, which means the text you provide is processed and results are returned immediately.

Text Analytics for multiple programming experience levels

You can start using the Text Analytics API in your processes, even if you don't have much experience in programming. Use these tutorials to learn how you can use the API to analyze text in different ways to fit your experience level.

Supported languages

This section has been moved to a separate article for better discoverability. Refer to Supported languages in the Text Analytics API for this content.

Data limits

All of the Text Analytics API endpoints accept raw text data. See the Data limits article for more information.

Unicode encoding

The Text Analytics API uses Unicode encoding for text representation and character count calculations. Requests can be submitted in either UTF-8 or UTF-16 with no measurable difference in the character count. Unicode codepoints are used as the heuristic for character length and are considered equivalent for the purposes of text analytics data limits. If you use StringInfo.LengthInTextElements to get the character count, you are using the same method we use to measure data size.
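To illustrate why the encoding does not affect the count, the sketch below compares the number of codepoints in a string with its byte length in UTF-8 and UTF-16. Python's `len` on a `str` counts codepoints; this is used here as an approximation of the measure described above, since .NET's StringInfo.LengthInTextElements counts text elements, which can differ from the codepoint count for combining sequences.

```python
# "h", "é" (U+00E9), "l", "l", "o", space, and one emoji (U+1F600).
text = "h\u00e9llo \U0001F600"

code_points = len(text)                      # number of Unicode codepoints
utf8_bytes = len(text.encode("utf-8"))       # byte length when sent as UTF-8
utf16_bytes = len(text.encode("utf-16-le"))  # byte length as UTF-16, without a BOM

print(code_points, utf8_bytes, utf16_bytes)  # prints "7 11 16"
```

The byte lengths differ between the two encodings, while the codepoint count is the same in both.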

Next steps