Quickstart: Use the Text Analytics client library and REST API
Use this article to get started with the Text Analytics client library and REST API. Follow these steps to try out example code for mining text:
- Sentiment analysis
- Opinion mining
- Language detection
- Entity recognition
- Personally Identifiable Information recognition
- Key phrase extraction
Important
- The latest stable version of the Text Analytics API is 3.0.
- Be sure to only follow the instructions for the version you are using.
- The code in this article uses synchronous methods and unsecured credential storage for simplicity. For production scenarios, we recommend using the batched asynchronous methods for performance and scalability. See the reference documentation below.
v3.1 Reference documentation | v3.1 Library source code | v3.1 Package (NuGet) | v3.1 Samples
Prerequisites
- Azure subscription - Create one for trial
- The Visual Studio IDE
- Once you have your Azure subscription, create a Text Analytics resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the Text Analytics API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Text Analytics resource with the standard (S) pricing tier.
Setting up
Create a new .NET Core application
Using the Visual Studio IDE, create a new .NET Core console app. This will create a "Hello World" project with a single C# source file: program.cs.
Install the client library by right-clicking on the solution in the Solution Explorer and selecting Manage NuGet Packages. In the package manager that opens, select Browse and search for Azure.AI.TextAnalytics. Check the Include prerelease box, select version 5.1.0-beta.3, and then Install. You can also use the Package Manager Console.
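If you use the Package Manager Console instead, the equivalent command should look like the following sketch (pin the same prerelease version you selected above; Install-Package is the standard NuGet console cmdlet):

Install-Package Azure.AI.TextAnalytics -Version 5.1.0-beta.3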
Open the program.cs file and add the following using directives:
using Azure;
using System;
using System.Globalization;
// Needed for the List<string> batch input used later in this quickstart.
using System.Collections.Generic;
using Azure.AI.TextAnalytics;
In the application's Program class, create variables for your resource's key and endpoint.
Important
Go to the Azure portal. If the Text Analytics resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's Keys and Endpoint page, under Resource Management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials, such as Azure Key Vault.
private static readonly AzureKeyCredential credentials = new AzureKeyCredential("<replace-with-your-text-analytics-key-here>");
private static readonly Uri endpoint = new Uri("<replace-with-your-text-analytics-endpoint-here>");
Replace the application's Main method. You will define the methods called here later.
static void Main(string[] args)
{
    var client = new TextAnalyticsClient(endpoint, credentials);
    // You will implement these methods later in the quickstart.
    SentimentAnalysisExample(client);
    SentimentAnalysisWithOpinionMiningExample(client);
    LanguageDetectionExample(client);
    EntityRecognitionExample(client);
    EntityLinkingExample(client);
    RecognizePIIExample(client);
    KeyPhraseExtractionExample(client);

    Console.Write("Press any key to exit.");
    Console.ReadKey();
}
Object model
The Text Analytics client is a TextAnalyticsClient object that authenticates to Azure using your key, and provides functions to accept text as single strings or as a batch. You can send text to the API synchronously. The response object will contain the analysis information for each document you send.
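For instance, most operations come in single-string and batched pairs; a minimal sketch (using the DetectLanguage/DetectLanguageBatch pair from this SDK; other operations follow the same pattern):

// Single string: one document in, one result out.
DetectedLanguage language = client.DetectLanguage("Ce document est rédigé en Français.");

// Batch: one result per input document.
DetectLanguageResultCollection results =
    client.DetectLanguageBatch(new List<string> { "Hello world", "Bonjour tout le monde" });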
If you're using version 3.x of the service, you can use an optional TextAnalyticsClientOptions instance to initialize the client with various default settings (for example, a default language or country/region hint). You can also authenticate using an Azure Active Directory token.
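For example, a minimal sketch of initializing the client with such defaults (assuming the DefaultLanguage and DefaultCountryHint properties exposed by TextAnalyticsClientOptions; verify against the SDK version you installed):

var options = new TextAnalyticsClientOptions
{
    DefaultLanguage = "en",     // assumed default for documents without an explicit language
    DefaultCountryHint = "us"   // assumed default hint for language detection
};
var client = new TextAnalyticsClient(endpoint, credentials, options);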
Code examples
- Sentiment analysis
- Opinion mining
- Language detection
- Named Entity Recognition
- Entity linking
- Key phrase extraction
Authenticate the client
Make sure your main method from earlier creates a new client object with your endpoint and credentials.
var client = new TextAnalyticsClient(endpoint, credentials);
Sentiment analysis
Create a new function called SentimentAnalysisExample() that takes the client that you created earlier, and call its AnalyzeSentiment() function. The returned Response<DocumentSentiment> object will contain the sentiment label and score of the entire input document, as well as a sentiment analysis for each sentence, if successful. If there was an error, it will throw a RequestFailedException.
static void SentimentAnalysisExample(TextAnalyticsClient client)
{
    string inputText = "I had the best day of my life. I wish you were there with me.";
    DocumentSentiment documentSentiment = client.AnalyzeSentiment(inputText);
    Console.WriteLine($"Document sentiment: {documentSentiment.Sentiment}\n");
    foreach (var sentence in documentSentiment.Sentences)
    {
        Console.WriteLine($"\tText: \"{sentence.Text}\"");
        Console.WriteLine($"\tSentence sentiment: {sentence.Sentiment}");
        Console.WriteLine($"\tPositive score: {sentence.ConfidenceScores.Positive:0.00}");
        Console.WriteLine($"\tNegative score: {sentence.ConfidenceScores.Negative:0.00}");
        Console.WriteLine($"\tNeutral score: {sentence.ConfidenceScores.Neutral:0.00}\n");
    }
}
Output
Document sentiment: Positive
Text: "I had the best day of my life."
Sentence sentiment: Positive
Positive score: 1.00
Negative score: 0.00
Neutral score: 0.00
Text: "I wish you were there with me."
Sentence sentiment: Neutral
Positive score: 0.21
Negative score: 0.02
Neutral score: 0.77
Opinion mining
Create a new function called SentimentAnalysisWithOpinionMiningExample() that takes the client that you created earlier, and call its AnalyzeSentimentBatch() function with an AnalyzeSentimentOptions instance whose IncludeOpinionMining option is set to true. The returned AnalyzeSentimentResultCollection object will contain the collection of AnalyzeSentimentResult, each of which represents a Response<DocumentSentiment>. The difference between SentimentAnalysisExample() and SentimentAnalysisWithOpinionMiningExample() is that the latter contains a MinedOpinion collection in each sentence, which shows an analyzed aspect and the related opinion(s). If there was an error, it will throw a RequestFailedException.
static void SentimentAnalysisWithOpinionMiningExample(TextAnalyticsClient client)
{
    var documents = new List<string>
    {
        "The food and service were unacceptable, but the concierge were nice."
    };

    AnalyzeSentimentResultCollection reviews = client.AnalyzeSentimentBatch(documents, options: new AnalyzeSentimentOptions()
    {
        IncludeOpinionMining = true
    });

    foreach (AnalyzeSentimentResult review in reviews)
    {
        Console.WriteLine($"Document sentiment: {review.DocumentSentiment.Sentiment}\n");
        Console.WriteLine($"\tPositive score: {review.DocumentSentiment.ConfidenceScores.Positive:0.00}");
        Console.WriteLine($"\tNegative score: {review.DocumentSentiment.ConfidenceScores.Negative:0.00}");
        Console.WriteLine($"\tNeutral score: {review.DocumentSentiment.ConfidenceScores.Neutral:0.00}\n");
        foreach (SentenceSentiment sentence in review.DocumentSentiment.Sentences)
        {
            Console.WriteLine($"\tText: \"{sentence.Text}\"");
            Console.WriteLine($"\tSentence sentiment: {sentence.Sentiment}");
            Console.WriteLine($"\tSentence positive score: {sentence.ConfidenceScores.Positive:0.00}");
            Console.WriteLine($"\tSentence negative score: {sentence.ConfidenceScores.Negative:0.00}");
            Console.WriteLine($"\tSentence neutral score: {sentence.ConfidenceScores.Neutral:0.00}\n");
            foreach (MinedOpinion minedOpinion in sentence.MinedOpinions)
            {
                Console.WriteLine($"\tAspect: {minedOpinion.Aspect.Text}, Value: {minedOpinion.Aspect.Sentiment}");
                Console.WriteLine($"\tAspect positive score: {minedOpinion.Aspect.ConfidenceScores.Positive:0.00}");
                Console.WriteLine($"\tAspect negative score: {minedOpinion.Aspect.ConfidenceScores.Negative:0.00}");
                foreach (OpinionSentiment opinion in minedOpinion.Opinions)
                {
                    Console.WriteLine($"\t\tRelated Opinion: {opinion.Text}, Value: {opinion.Sentiment}");
                    Console.WriteLine($"\t\tRelated Opinion positive score: {opinion.ConfidenceScores.Positive:0.00}");
                    Console.WriteLine($"\t\tRelated Opinion negative score: {opinion.ConfidenceScores.Negative:0.00}");
                }
            }
        }
        Console.WriteLine($"\n");
    }
}
Output
Document sentiment: Positive
Positive score: 0.84
Negative score: 0.16
Neutral score: 0.00
Text: "The food and service were unacceptable, but the concierge were nice."
Sentence sentiment: Positive
Sentence positive score: 0.84
Sentence negative score: 0.16
Sentence neutral score: 0.00
Aspect: food, Value: Negative
Aspect positive score: 0.01
Aspect negative score: 0.99
Related Opinion: unacceptable, Value: Negative
Related Opinion positive score: 0.01
Related Opinion negative score: 0.99
Aspect: service, Value: Negative
Aspect positive score: 0.01
Aspect negative score: 0.99
Related Opinion: unacceptable, Value: Negative
Related Opinion positive score: 0.01
Related Opinion negative score: 0.99
Aspect: concierge, Value: Positive
Aspect positive score: 1.00
Aspect negative score: 0.00
Related Opinion: nice, Value: Positive
Related Opinion positive score: 1.00
Related Opinion negative score: 0.00
Press any key to exit.
Language detection
Create a new function called LanguageDetectionExample() that takes the client that you created earlier, and call its DetectLanguage() function. The returned Response<DetectedLanguage> object will contain the detected language along with its name and ISO 639-1 code. If there was an error, it will throw a RequestFailedException.
Tip
In some cases it may be hard to disambiguate languages based on the input. You can use the countryHint parameter to specify a 2-letter country/region code. By default, the API uses "US" as the countryHint. To remove this behavior, reset the parameter by setting it to an empty string (countryHint = ""). To set a different default, set the TextAnalyticsClientOptions.DefaultCountryHint property and pass it during the client's initialization.
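A brief, hedged sketch of passing the hint per call (the countryHint parameter on the single-document DetectLanguage overload):

// Hint that the text most likely originates from France.
DetectedLanguage hinted = client.DetectLanguage("Ce document est rédigé en Français.", countryHint: "fr");

// Or suppress the default "US" hint for this one call.
DetectedLanguage noHint = client.DetectLanguage("Ce document est rédigé en Français.", countryHint: "");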
static void LanguageDetectionExample(TextAnalyticsClient client)
{
    DetectedLanguage detectedLanguage = client.DetectLanguage("Ce document est rédigé en Français.");
    Console.WriteLine("Language:");
    Console.WriteLine($"\t{detectedLanguage.Name},\tISO-6391: {detectedLanguage.Iso6391Name}\n");
}
Output
Language:
French, ISO-6391: fr
Named Entity Recognition (NER)
Create a new function called EntityRecognitionExample() that takes the client that you created earlier, call its RecognizeEntities() function, and iterate through the results. The returned Response<CategorizedEntityCollection> object will contain the collection of detected entities as CategorizedEntity objects. If there was an error, it will throw a RequestFailedException.
static void EntityRecognitionExample(TextAnalyticsClient client)
{
    var response = client.RecognizeEntities("I had a wonderful trip to Seattle last week.");
    Console.WriteLine("Named Entities:");
    foreach (var entity in response.Value)
    {
        Console.WriteLine($"\tText: {entity.Text},\tCategory: {entity.Category},\tSub-Category: {entity.SubCategory}");
        Console.WriteLine($"\t\tScore: {entity.ConfidenceScore:F2},\tLength: {entity.Length},\tOffset: {entity.Offset}\n");
    }
}
Output
Named Entities:
Text: trip, Category: Event, Sub-Category:
Score: 0.61, Length: 4, Offset: 18
Text: Seattle, Category: Location, Sub-Category: GPE
Score: 0.82, Length: 7, Offset: 26
Text: last week, Category: DateTime, Sub-Category: DateRange
Score: 0.80, Length: 9, Offset: 34
Entity linking
Create a new function called EntityLinkingExample() that takes the client that you created earlier, call its RecognizeLinkedEntities() function, and iterate through the results. The returned Response<LinkedEntityCollection> object will contain the collection of detected entities as LinkedEntity objects. If there was an error, it will throw a RequestFailedException. Since linked entities are uniquely identified, occurrences of the same entity are grouped under a LinkedEntity object as a list of LinkedEntityMatch objects.
static void EntityLinkingExample(TextAnalyticsClient client)
{
    var response = client.RecognizeLinkedEntities(
        "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, " +
        "to develop and sell BASIC interpreters for the Altair 8800. " +
        "During his career at Microsoft, Gates held the positions of chairman, " +
        "chief executive officer, president and chief software architect, " +
        "while also being the largest individual shareholder until May 2014.");
    Console.WriteLine("Linked Entities:");
    foreach (var entity in response.Value)
    {
        Console.WriteLine($"\tName: {entity.Name},\tID: {entity.DataSourceEntityId},\tURL: {entity.Url}\tData Source: {entity.DataSource}");
        Console.WriteLine("\tMatches:");
        foreach (var match in entity.Matches)
        {
            Console.WriteLine($"\t\tText: {match.Text}");
            Console.WriteLine($"\t\tScore: {match.ConfidenceScore:F2}");
            Console.WriteLine($"\t\tLength: {match.Length}");
            Console.WriteLine($"\t\tOffset: {match.Offset}\n");
        }
    }
}
Output
Linked Entities:
Name: Microsoft, ID: Microsoft, URL: https://en.wikipedia.org/wiki/Microsoft Data Source: Wikipedia
Matches:
Text: Microsoft
Score: 0.55
Length: 9
Offset: 0
Text: Microsoft
Score: 0.55
Length: 9
Offset: 150
Name: Bill Gates, ID: Bill Gates, URL: https://en.wikipedia.org/wiki/Bill_Gates Data Source: Wikipedia
Matches:
Text: Bill Gates
Score: 0.63
Length: 10
Offset: 25
Text: Gates
Score: 0.63
Length: 5
Offset: 161
Name: Paul Allen, ID: Paul Allen, URL: https://en.wikipedia.org/wiki/Paul_Allen Data Source: Wikipedia
Matches:
Text: Paul Allen
Score: 0.60
Length: 10
Offset: 40
Name: April 4, ID: April 4, URL: https://en.wikipedia.org/wiki/April_4 Data Source: Wikipedia
Matches:
Text: April 4
Score: 0.32
Length: 7
Offset: 54
Name: BASIC, ID: BASIC, URL: https://en.wikipedia.org/wiki/BASIC Data Source: Wikipedia
Matches:
Text: BASIC
Score: 0.33
Length: 5
Offset: 89
Name: Altair 8800, ID: Altair 8800, URL: https://en.wikipedia.org/wiki/Altair_8800 Data Source: Wikipedia
Matches:
Text: Altair 8800
Score: 0.88
Length: 11
Offset: 116
Personally Identifiable Information recognition
Create a new function called RecognizePIIExample() that takes the client that you created earlier, call its RecognizePiiEntities() function, and iterate through the results. The returned PiiEntityCollection represents the list of detected PII entities. If there was an error, it will throw a RequestFailedException.
static void RecognizePIIExample(TextAnalyticsClient client)
{
    string document = "A developer with SSN 859-98-0987 whose phone number is 800-102-1100 is building tools with our APIs.";

    PiiEntityCollection entities = client.RecognizePiiEntities(document).Value;

    Console.WriteLine($"Redacted Text: {entities.RedactedText}");
    if (entities.Count > 0)
    {
        Console.WriteLine($"Recognized {entities.Count} PII entit{(entities.Count > 1 ? "ies" : "y")}:");
        foreach (PiiEntity entity in entities)
        {
            Console.WriteLine($"Text: {entity.Text}, Category: {entity.Category}, SubCategory: {entity.SubCategory}, Confidence score: {entity.ConfidenceScore}");
        }
    }
    else
    {
        Console.WriteLine("No entities were found.");
    }
}
Output
Redacted Text: A developer with SSN *********** whose phone number is ************ is building tools with our APIs.
Recognized 2 PII entities:
Text: 859-98-0987, Category: U.S. Social Security Number (SSN), SubCategory: , Confidence score: 0.65
Text: 800-102-1100, Category: Phone Number, SubCategory: , Confidence score: 0.8
Key phrase extraction
Create a new function called KeyPhraseExtractionExample() that takes the client that you created earlier, and call its ExtractKeyPhrases() function. The returned Response<KeyPhraseCollection> object will contain the list of detected key phrases. If there was an error, it will throw a RequestFailedException.
static void KeyPhraseExtractionExample(TextAnalyticsClient client)
{
    var response = client.ExtractKeyPhrases("My cat might need to see a veterinarian.");

    // Printing key phrases
    Console.WriteLine("Key phrases:");
    foreach (string keyphrase in response.Value)
    {
        Console.WriteLine($"\t{keyphrase}");
    }
}
Output
Key phrases:
cat
veterinarian
Important
- The latest stable version of the Text Analytics API is 3.0.
- The code in this article uses synchronous methods and unsecured credential storage for simplicity. For production scenarios, we recommend using the batched asynchronous methods for performance and scalability. See the reference documentation below.
Reference documentation | Library source code | Package | Samples
Prerequisites
- Azure subscription - Create one for trial
- Java Development Kit (JDK) with version 8 or above
- Once you have your Azure subscription, create a Text Analytics resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the Text Analytics API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
Setting up
Add the client library
Create a Maven project in your preferred IDE or development environment. Then add the following dependency to your project's pom.xml file. You can find the implementation syntax for other build tools online.
<dependencies>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-textanalytics</artifactId>
        <version>5.1.0-beta.1</version>
    </dependency>
</dependencies>
Create a Java file named TextAnalyticsSamples.java. Open the file and add the following import statements:
import com.azure.core.credential.AzureKeyCredential;
import com.azure.ai.textanalytics.models.*;
import com.azure.ai.textanalytics.TextAnalyticsClientBuilder;
import com.azure.ai.textanalytics.TextAnalyticsClient;
In the Java file, add a new class and add your Azure resource's key and endpoint as shown below.
Important
Go to the Azure portal. If the Text Analytics resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's Keys and Endpoint page, under Resource Management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials, such as Azure Key Vault.
public class TextAnalyticsSamples {
    private static String KEY = "<replace-with-your-text-analytics-key-here>";
    private static String ENDPOINT = "<replace-with-your-text-analytics-endpoint-here>";
}
Add the following main method to the class. You will define the methods called here later.
public static void main(String[] args) {
    // You will create these methods later in the quickstart.
    TextAnalyticsClient client = authenticateClient(KEY, ENDPOINT);

    sentimentAnalysisExample(client);
    sentimentAnalysisWithOpinionMiningExample(client);
    detectLanguageExample(client);
    recognizeEntitiesExample(client);
    recognizeLinkedEntitiesExample(client);
    recognizePiiEntitiesExample(client);
    extractKeyPhrasesExample(client);
}
Object model
The Text Analytics client is a TextAnalyticsClient object that authenticates to Azure using your key, and provides functions to accept text as single strings or as a batch. You can send text to the API synchronously or asynchronously. The response object will contain the analysis information for each document you send.
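For asynchronous calls, the same builder can produce an async client; a minimal sketch (assuming the buildAsyncClient() builder method and the TextAnalyticsAsyncClient type, which needs an additional import):

// Non-blocking client; its methods return reactive Mono/Flux types.
TextAnalyticsAsyncClient asyncClient = new TextAnalyticsClientBuilder()
    .credential(new AzureKeyCredential(KEY))
    .endpoint(ENDPOINT)
    .buildAsyncClient();

asyncClient.detectLanguage("Ce document est rédigé en Français.")
    .subscribe(language -> System.out.printf("Detected: %s%n", language.getName()));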
Code examples
- Authenticate the client
- Sentiment analysis
- Opinion mining
- Language detection
- Named Entity Recognition
- Entity linking
- Key phrase extraction
Authenticate the client
Create a method to instantiate the TextAnalyticsClient object with the key and endpoint for your Text Analytics resource. This example is the same for versions 3.0 and 3.1 of the API.
static TextAnalyticsClient authenticateClient(String key, String endpoint) {
    return new TextAnalyticsClientBuilder()
        .credential(new AzureKeyCredential(key))
        .endpoint(endpoint)
        .buildClient();
}
In your program's main() method, call the authentication method to instantiate the client.
Sentiment analysis
Note
In version 3.1:
- Sentiment Analysis includes Opinion Mining analysis, which is an optional flag.
- Opinion Mining contains aspect- and opinion-level sentiment.
Create a new function called sentimentAnalysisExample() that takes the client that you created earlier, and call its analyzeSentiment() function. The returned AnalyzeSentimentResult object will contain documentSentiment and sentenceSentiments if successful, or an errorMessage if not.
static void sentimentAnalysisExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String text = "I had the best day of my life. I wish you were there with me.";

    DocumentSentiment documentSentiment = client.analyzeSentiment(text);
    System.out.printf(
        "Recognized document sentiment: %s, positive score: %s, neutral score: %s, negative score: %s.%n",
        documentSentiment.getSentiment(),
        documentSentiment.getConfidenceScores().getPositive(),
        documentSentiment.getConfidenceScores().getNeutral(),
        documentSentiment.getConfidenceScores().getNegative());

    for (SentenceSentiment sentenceSentiment : documentSentiment.getSentences()) {
        System.out.printf(
            "Recognized sentence sentiment: %s, positive score: %s, neutral score: %s, negative score: %s.%n",
            sentenceSentiment.getSentiment(),
            sentenceSentiment.getConfidenceScores().getPositive(),
            sentenceSentiment.getConfidenceScores().getNeutral(),
            sentenceSentiment.getConfidenceScores().getNegative());
    }
}
Output
Recognized document sentiment: positive, positive score: 1.0, neutral score: 0.0, negative score: 0.0.
Recognized sentence sentiment: positive, positive score: 1.0, neutral score: 0.0, negative score: 0.0.
Recognized sentence sentiment: neutral, positive score: 0.21, neutral score: 0.77, negative score: 0.02.
Opinion mining
To perform sentiment analysis with opinion mining, create a new function called sentimentAnalysisWithOpinionMiningExample() that takes the client that you created earlier, and call its analyzeSentiment() function with an AnalyzeSentimentOptions option object. The returned AnalyzeSentimentResult object will contain documentSentiment and sentenceSentiments if successful, or an errorMessage if not.
static void sentimentAnalysisWithOpinionMiningExample(TextAnalyticsClient client)
{
    // The document to be analyzed.
    String document = "Bad atmosphere. Not close to plenty of restaurants, hotels, and transit! Staff are not friendly and helpful.";
    System.out.printf("Document = %s%n", document);

    AnalyzeSentimentOptions options = new AnalyzeSentimentOptions().setIncludeOpinionMining(true);
    final DocumentSentiment documentSentiment = client.analyzeSentiment(document, "en", options);
    SentimentConfidenceScores scores = documentSentiment.getConfidenceScores();
    System.out.printf(
        "\tRecognized document sentiment: %s, positive score: %f, neutral score: %f, negative score: %f.%n",
        documentSentiment.getSentiment(), scores.getPositive(), scores.getNeutral(), scores.getNegative());

    documentSentiment.getSentences().forEach(sentenceSentiment -> {
        SentimentConfidenceScores sentenceScores = sentenceSentiment.getConfidenceScores();
        System.out.printf("\t\tSentence sentiment: %s, positive score: %f, neutral score: %f, negative score: %f.%n",
            sentenceSentiment.getSentiment(), sentenceScores.getPositive(), sentenceScores.getNeutral(), sentenceScores.getNegative());
        sentenceSentiment.getMinedOpinions().forEach(minedOpinions -> {
            AspectSentiment aspectSentiment = minedOpinions.getAspect();
            System.out.printf("\t\t\tAspect sentiment: %s, aspect text: %s%n", aspectSentiment.getSentiment(),
                aspectSentiment.getText());
            for (OpinionSentiment opinionSentiment : minedOpinions.getOpinions()) {
                System.out.printf("\t\t\t\t'%s' opinion sentiment because of \"%s\". Is the opinion negated: %s.%n",
                    opinionSentiment.getSentiment(), opinionSentiment.getText(), opinionSentiment.isNegated());
            }
        });
    });
}
Output
Document = Bad atmosphere. Not close to plenty of restaurants, hotels, and transit! Staff are not friendly and helpful.
Recognized document sentiment: negative, positive score: 0.010000, neutral score: 0.140000, negative score: 0.850000.
Sentence sentiment: negative, positive score: 0.000000, neutral score: 0.000000, negative score: 1.000000.
Aspect sentiment: negative, aspect text: atmosphere
'negative' opinion sentiment because of "bad". Is the opinion negated: false.
Sentence sentiment: negative, positive score: 0.020000, neutral score: 0.440000, negative score: 0.540000.
Sentence sentiment: negative, positive score: 0.000000, neutral score: 0.000000, negative score: 1.000000.
Aspect sentiment: negative, aspect text: Staff
'negative' opinion sentiment because of "friendly". Is the opinion negated: true.
'negative' opinion sentiment because of "helpful". Is the opinion negated: true.
Process finished with exit code 0
Language detection
Create a new function called detectLanguageExample() that takes the client that you created earlier, and call its detectLanguage() function. The returned DetectLanguageResult object will contain the primary detected language and a list of other detected languages if successful, or an errorMessage if not. This example is the same for versions 3.0 and 3.1 of the API.
Tip
In some cases it may be hard to disambiguate languages based on the input. You can use the countryHint parameter to specify a 2-letter country code. By default, the API uses "US" as the countryHint. To remove this behavior, reset the parameter by setting it to an empty string (countryHint = ""). To set a different default, set the TextAnalyticsClientOptions.DefaultCountryHint property and pass it during the client's initialization.
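For example, a short sketch using the overload that accepts a hint (the detectLanguage(String, String) overload in the Java SDK):

// Hint that the text most likely originates from France.
DetectedLanguage hinted = client.detectLanguage("Ce document est rédigé en Français.", "fr");
System.out.printf("Detected: %s%n", hinted.getName());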
static void detectLanguageExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String text = "Ce document est rédigé en Français.";

    DetectedLanguage detectedLanguage = client.detectLanguage(text);
    System.out.printf("Detected primary language: %s, ISO 6391 name: %s, score: %.2f.%n",
        detectedLanguage.getName(),
        detectedLanguage.getIso6391Name(),
        detectedLanguage.getConfidenceScore());
}
Output
Detected primary language: French, ISO 6391 name: fr, score: 1.00.
Named Entity Recognition (NER)
Note
In version 3.1:
- NER includes separate methods for detecting personal information.
- Entity linking is a separate request from NER.
Create a new function called recognizeEntitiesExample() that takes the client that you created earlier, and call its recognizeEntities() function. The returned CategorizedEntityCollection object will contain a list of CategorizedEntity if successful, or an errorMessage if not.
static void recognizeEntitiesExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String text = "I had a wonderful trip to Seattle last week.";

    for (CategorizedEntity entity : client.recognizeEntities(text)) {
        System.out.printf(
            "Recognized entity: %s, entity category: %s, entity sub-category: %s, score: %s, offset: %s, length: %s.%n",
            entity.getText(),
            entity.getCategory(),
            entity.getSubcategory(),
            entity.getConfidenceScore(),
            entity.getOffset(),
            entity.getLength());
    }
}
Output
Recognized entity: trip, entity category: Event, entity sub-category: null, score: 0.61, offset: 8, length: 4.
Recognized entity: Seattle, entity category: Location, entity sub-category: GPE, score: 0.82, offset: 16, length: 7.
Recognized entity: last week, entity category: DateTime, entity sub-category: DateRange, score: 0.8, offset: 24, length: 9.
Entity linking
Create a new function called recognizeLinkedEntitiesExample() that takes the client that you created earlier, and call its recognizeLinkedEntities() function. The returned LinkedEntityCollection object will contain a list of LinkedEntity if successful, or an errorMessage if not. Since linked entities are uniquely identified, occurrences of the same entity are grouped under a LinkedEntity object as a list of LinkedEntityMatch objects.
static void recognizeLinkedEntitiesExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String text = "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, " +
        "to develop and sell BASIC interpreters for the Altair 8800. " +
        "During his career at Microsoft, Gates held the positions of chairman, " +
        "chief executive officer, president and chief software architect, " +
        "while also being the largest individual shareholder until May 2014.";

    System.out.printf("Linked Entities:%n");
    for (LinkedEntity linkedEntity : client.recognizeLinkedEntities(text)) {
        System.out.printf("Name: %s, ID: %s, URL: %s, Data Source: %s.%n",
            linkedEntity.getName(),
            linkedEntity.getDataSourceEntityId(),
            linkedEntity.getUrl(),
            linkedEntity.getDataSource());
        System.out.printf("Matches:%n");
        for (LinkedEntityMatch linkedEntityMatch : linkedEntity.getMatches()) {
            System.out.printf("Text: %s, Score: %.2f, Offset: %s, Length: %s%n",
                linkedEntityMatch.getText(),
                linkedEntityMatch.getConfidenceScore(),
                linkedEntityMatch.getOffset(),
                linkedEntityMatch.getLength());
        }
    }
}
Output
Linked Entities:
Name: Microsoft, ID: Microsoft, URL: https://en.wikipedia.org/wiki/Microsoft, Data Source: Wikipedia.
Matches:
Text: Microsoft, Score: 0.55, Offset: 0, Length: 9
Text: Microsoft, Score: 0.55, Offset: 150, Length: 9
Name: Bill Gates, ID: Bill Gates, URL: https://en.wikipedia.org/wiki/Bill_Gates, Data Source: Wikipedia.
Matches:
Text: Bill Gates, Score: 0.63, Offset: 25, Length: 10
Text: Gates, Score: 0.63, Offset: 161, Length: 5
Name: Paul Allen, ID: Paul Allen, URL: https://en.wikipedia.org/wiki/Paul_Allen, Data Source: Wikipedia.
Matches:
Text: Paul Allen, Score: 0.60, Offset: 40, Length: 10
Name: April 4, ID: April 4, URL: https://en.wikipedia.org/wiki/April_4, Data Source: Wikipedia.
Matches:
Text: April 4, Score: 0.32, Offset: 54, Length: 7
Name: BASIC, ID: BASIC, URL: https://en.wikipedia.org/wiki/BASIC, Data Source: Wikipedia.
Matches:
Text: BASIC, Score: 0.33, Offset: 89, Length: 5
Name: Altair 8800, ID: Altair 8800, URL: https://en.wikipedia.org/wiki/Altair_8800, Data Source: Wikipedia.
Matches:
Text: Altair 8800, Score: 0.88, Offset: 116, Length: 11
Personally Identifiable Information recognition
Create a new function called recognizePiiEntitiesExample() that takes the client that you created earlier, and call its recognizePiiEntities() function. The returned PiiEntityCollection object will contain a list of PiiEntity if successful, or an errorMessage if not. It will also contain the redacted text, which consists of the input text with all identifiable entities replaced with *****.
static void recognizePiiEntitiesExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String document = "My SSN is 859-98-0987";

    PiiEntityCollection piiEntityCollection = client.recognizePiiEntities(document);
    System.out.printf("Redacted Text: %s%n", piiEntityCollection.getRedactedText());
    piiEntityCollection.forEach(entity -> System.out.printf(
        "Recognized Personally Identifiable Information entity: %s, entity category: %s, entity subcategory: %s,"
            + " confidence score: %f.%n",
        entity.getText(), entity.getCategory(), entity.getSubcategory(), entity.getConfidenceScore()));
}
Output
Redacted Text: My SSN is ***********
Recognized Personally Identifiable Information entity: 859-98-0987, entity category: U.S. Social Security Number (SSN), entity subcategory: null, confidence score: 0.650000.
Key phrase extraction
Create a new function called extractKeyPhrasesExample() that takes the client that you created earlier, and call its extractKeyPhrases() function. The returned ExtractKeyPhraseResult object will contain a list of key phrases if successful, or an errorMessage if not. This example is the same for versions 3.0 and 3.1 of the API.
static void extractKeyPhrasesExample(TextAnalyticsClient client)
{
    // The text to be analyzed.
    String text = "My cat might need to see a veterinarian.";

    System.out.printf("Recognized phrases: %n");
    for (String keyPhrase : client.extractKeyPhrases(text)) {
        System.out.printf("%s%n", keyPhrase);
    }
}
Output
Recognized phrases:
cat
veterinarian
Important
- The latest stable version of the Text Analytics API is 3.0.
- Be sure to only follow the instructions for the version you are using.
- The code in this article uses synchronous methods and unsecured credential storage for simplicity. For production scenarios, we recommend using the batched asynchronous methods for performance and scalability. See the reference documentation below.
- You can also run this version of the Text Analytics client library in your browser.
v3 Reference documentation | v3 Library source code | v3 Package (NPM) | v3 Samples
Prerequisites
- Azure subscription - Create one for trial
- The current version of Node.js.
- Once you have your Azure subscription, create a Text Analytics resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the Text Analytics API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Text Analytics resource with the standard (S) pricing tier.
Setting up
Create a new Node.js application
In a console window (such as cmd, PowerShell, or Bash), create a new directory for your app, and navigate to it.
mkdir myapp
cd myapp
Run the npm init command to create a node application with a package.json file.
npm init
Install the client library
Install the @azure/ai-text-analytics NPM package:
npm install --save @azure/ai-text-analytics@5.1.0-beta.3
Tip
Want to view the whole quickstart code file at once? You can find it on GitHub, which contains the code examples in this quickstart.
Your app's package.json file will be updated with the dependencies.
Create a file named index.js and add the following:
"use strict";
const { TextAnalyticsClient, AzureKeyCredential } = require("@azure/ai-text-analytics");
Create variables for your resource's Azure endpoint and key.
Important
Go to the Azure portal. If the Text Analytics resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's Keys and Endpoint page, under Resource Management.
Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials, such as Azure Key Vault.
const key = '<paste-your-text-analytics-key-here>';
const endpoint = '<paste-your-text-analytics-endpoint-here>';
Object model
The Text Analytics client is a TextAnalyticsClient object that authenticates to Azure using your key. The client provides several methods for analyzing text, as a single string, or as a batch.
Text is sent to the API as a list of documents, which are dictionary objects containing a combination of id, text, and language attributes depending on the method used. The text attribute stores the text to be analyzed in the origin language, and the id can be any value.

The response object is a list containing the analysis information for each document.
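For example, a minimal sketch of the two input shapes (plain strings versus document objects; the id, language, and text fields match those used in the opinion mining example below):

// Plain strings: the SDK assigns ids automatically ("0", "1", ...).
const simpleInput = [
    "I had the best day of my life.",
    "Ce document est rédigé en Français."
];

// Document objects: you control the id and language of each document.
const documentInput = [
    { id: "review-1", language: "en", text: "I had the best day of my life." },
    { id: "avis-1", language: "fr", text: "Ce document est rédigé en Français." }
];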
Code examples
- Client Authentication
- Sentiment Analysis
- Opinion mining
- Language detection
- Named Entity Recognition
- Entity linking
- Personally Identifiable Information
- Key phrase extraction
Client Authentication
Create a new TextAnalyticsClient object with your key and endpoint as parameters.
const textAnalyticsClient = new TextAnalyticsClient(endpoint, new AzureKeyCredential(key));
Sentiment analysis
Create an array of strings containing the document you want to analyze. Call the client's analyzeSentiment() method and get the returned SentimentBatchResult object. Iterate through the list of results, and print each document's ID and document-level sentiment with confidence scores. For each document, the result contains sentence-level sentiment along with offsets, lengths, and confidence scores.
async function sentimentAnalysis(client){
    const sentimentInput = [
        "I had the best day of my life. I wish you were there with me."
    ];
    const sentimentResult = await client.analyzeSentiment(sentimentInput);

    sentimentResult.forEach(document => {
        console.log(`ID: ${document.id}`);
        console.log(`\tDocument Sentiment: ${document.sentiment}`);
        console.log(`\tDocument Scores:`);
        console.log(`\t\tPositive: ${document.confidenceScores.positive.toFixed(2)} \tNegative: ${document.confidenceScores.negative.toFixed(2)} \tNeutral: ${document.confidenceScores.neutral.toFixed(2)}`);
        console.log(`\tSentences Sentiment(${document.sentences.length}):`);
        document.sentences.forEach(sentence => {
            console.log(`\t\tSentence sentiment: ${sentence.sentiment}`)
            console.log(`\t\tSentences Scores:`);
            console.log(`\t\tPositive: ${sentence.confidenceScores.positive.toFixed(2)} \tNegative: ${sentence.confidenceScores.negative.toFixed(2)} \tNeutral: ${sentence.confidenceScores.neutral.toFixed(2)}`);
        });
    });
}
sentimentAnalysis(textAnalyticsClient)
Run your code with node index.js in your console window.
Output
ID: 0
Document Sentiment: positive
Document Scores:
Positive: 1.00 Negative: 0.00 Neutral: 0.00
Sentences Sentiment(2):
Sentence sentiment: positive
Sentences Scores:
Positive: 1.00 Negative: 0.00 Neutral: 0.00
Sentence sentiment: neutral
Sentences Scores:
Positive: 0.21 Negative: 0.02 Neutral: 0.77
Opinion mining
In order to do sentiment analysis with opinion mining, create an array of strings containing the document you want to analyze. Call the client's analyzeSentiment() method with the option flag includeOpinionMining: true and get the returned SentimentBatchResult object. Iterate through the list of results, and print each document's ID and document-level sentiment with confidence scores. For each document, the result contains not only sentence-level sentiment as above, but also aspect- and opinion-level sentiment.
async function sentimentAnalysisWithOpinionMining(client){
    const sentimentInput = [
        {
            text: "The food and service were unacceptable, but the concierge were nice",
            id: "0",
            language: "en"
        }
    ];
    const sentimentResult = await client.analyzeSentiment(sentimentInput, { includeOpinionMining: true });

    sentimentResult.forEach(document => {
        console.log(`ID: ${document.id}`);
        console.log(`\tDocument Sentiment: ${document.sentiment}`);
        console.log(`\tDocument Scores:`);
        console.log(`\t\tPositive: ${document.confidenceScores.positive.toFixed(2)} \tNegative: ${document.confidenceScores.negative.toFixed(2)} \tNeutral: ${document.confidenceScores.neutral.toFixed(2)}`);
        console.log(`\tSentences Sentiment(${document.sentences.length}):`);
        document.sentences.forEach(sentence => {
            console.log(`\t\tSentence sentiment: ${sentence.sentiment}`)
            console.log(`\t\tSentences Scores:`);
            console.log(`\t\tPositive: ${sentence.confidenceScores.positive.toFixed(2)} \tNegative: ${sentence.confidenceScores.negative.toFixed(2)} \tNeutral: ${sentence.confidenceScores.neutral.toFixed(2)}`);
            console.log("\tMined opinions");
            for (const { aspect, opinions } of sentence.minedOpinions) {
                console.log(`\t\tAspect text: ${aspect.text}`);
                console.log(`\t\tAspect sentiment: ${aspect.sentiment}`);
                console.log(`\t\tAspect Positive: ${aspect.confidenceScores.positive.toFixed(2)} \tNegative: ${aspect.confidenceScores.negative.toFixed(2)}`);
                console.log("\t\tAspect opinions:");
                for (const { text, sentiment, confidenceScores } of opinions) {
                    console.log(`\t\tOpinion text: ${text}`);
                    console.log(`\t\tOpinion sentiment: ${sentiment}`);
                    console.log(`\t\tOpinion Positive: ${confidenceScores.positive.toFixed(2)} \tNegative: ${confidenceScores.negative.toFixed(2)}`);
                }
            }
        });
    });
}
sentimentAnalysisWithOpinionMining(textAnalyticsClient)
Run your code with node index.js in your console window.
Output
ID: 0
Document Sentiment: positive
Document Scores:
Positive: 0.84 Negative: 0.16 Neutral: 0.00
Sentences Sentiment(1):
Sentence sentiment: positive
Sentences Scores:
Positive: 0.84 Negative: 0.16 Neutral: 0.00
Mined opinions
Aspect text: food
Aspect sentiment: negative
Aspect Positive: 0.01 Negative: 0.99
Aspect opinions:
Opinion text: unacceptable
Opinion sentiment: negative
Opinion Positive: 0.01 Negative: 0.99
Aspect text: service
Aspect sentiment: negative
Aspect Positive: 0.01 Negative: 0.99
Aspect opinions:
Opinion text: unacceptable
Opinion sentiment: negative
Opinion Positive: 0.01 Negative: 0.99
Aspect text: concierge
Aspect sentiment: positive
Aspect Positive: 1.00 Negative: 0.00
Aspect opinions:
Opinion text: nice
Opinion sentiment: positive
Opinion Positive: 1.00 Negative: 0.00
Language detection
Create an array of strings containing the document you want to analyze. Call the client's detectLanguage() method and get the returned DetectLanguageResultCollection. Then iterate through the results, and print each document's ID with the respective primary language.
async function languageDetection(client) {
    const languageInputArray = [
        "Ce document est rédigé en Français."
    ];
    const languageResult = await client.detectLanguage(languageInputArray);

    languageResult.forEach(document => {
        console.log(`ID: ${document.id}`);
        console.log(`\tPrimary Language ${document.primaryLanguage.name}`)
    });
}
languageDetection(textAnalyticsClient);
Run your code with node index.js in your console window.
Output
ID: 0
Primary Language French
Named Entity Recognition (NER)
Note
In version 3.1:
- Entity linking is a separate request from NER.
Create an array of strings containing the document you want to analyze. Call the client's recognizeEntities() method and get the RecognizeEntitiesResult object. Iterate through the list of results, and print the entity name, type, subtype, offset, length, and score.
async function entityRecognition(client){
    const entityInputs = [
        "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800",
        "La sede principal de Microsoft se encuentra en la ciudad de Redmond, a 21 kilómetros de Seattle."
    ];
    const entityResults = await client.recognizeEntities(entityInputs);

    entityResults.forEach(document => {
        console.log(`Document ID: ${document.id}`);
        document.entities.forEach(entity => {
            console.log(`\tName: ${entity.text} \tCategory: ${entity.category} \tSubcategory: ${entity.subCategory ? entity.subCategory : "N/A"}`);
            console.log(`\tScore: ${entity.confidenceScore}`);
        });
    });
}
entityRecognition(textAnalyticsClient);
Run your code with node index.js in your console window.
Output
Document ID: 0
Name: Microsoft Category: Organization Subcategory: N/A
Score: 0.29
Name: Bill Gates Category: Person Subcategory: N/A
Score: 0.78
Name: Paul Allen Category: Person Subcategory: N/A
Score: 0.82
Name: April 4, 1975 Category: DateTime Subcategory: Date
Score: 0.8
Name: 8800 Category: Quantity Subcategory: Number
Score: 0.8
Document ID: 1
Name: 21 Category: Quantity Subcategory: Number
Score: 0.8
Name: Seattle Category: Location Subcategory: GPE
Score: 0.25
Entity Linking
Create an array of strings containing the document you want to analyze. Call the client's recognizeLinkedEntities() method and get the RecognizeLinkedEntitiesResult object. Iterate through the list of results, and print the entity name, ID, data source, URL, and matches. Every object in the matches array will contain the offset, length, and score for that match.
async function linkedEntityRecognition(client){
    const linkedEntityInput = [
        "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800. During his career at Microsoft, Gates held the positions of chairman, chief executive officer, president and chief software architect, while also being the largest individual shareholder until May 2014."
    ];
    const entityResults = await client.recognizeLinkedEntities(linkedEntityInput);

    entityResults.forEach(document => {
        console.log(`Document ID: ${document.id}`);
        document.entities.forEach(entity => {
            console.log(`\tName: ${entity.name} \tID: ${entity.dataSourceEntityId} \tURL: ${entity.url} \tData Source: ${entity.dataSource}`);
            console.log(`\tMatches:`)
            entity.matches.forEach(match => {
                console.log(`\t\tText: ${match.text} \tScore: ${match.confidenceScore.toFixed(2)}`);
            })
        });
    });
}
linkedEntityRecognition(textAnalyticsClient);
Run your code with node index.js in your console window.
Output
Document ID: 0
Name: Altair 8800 ID: Altair 8800 URL: https://en.wikipedia.org/wiki/Altair_8800 Data Source: Wikipedia
Matches:
Text: Altair 8800 Score: 0.88
Name: Bill Gates ID: Bill Gates URL: https://en.wikipedia.org/wiki/Bill_Gates Data Source: Wikipedia
Matches:
Text: Bill Gates Score: 0.63
Text: Gates Score: 0.63
Name: Paul Allen ID: Paul Allen URL: https://en.wikipedia.org/wiki/Paul_Allen Data Source: Wikipedia
Matches:
Text: Paul Allen Score: 0.60
Name: Microsoft ID: Microsoft URL: https://en.wikipedia.org/wiki/Microsoft Data Source: Wikipedia
Matches:
Text: Microsoft Score: 0.55
Text: Microsoft Score: 0.55
Name: April 4 ID: April 4 URL: https://en.wikipedia.org/wiki/April_4 Data Source: Wikipedia
Matches:
Text: April 4 Score: 0.32
Name: BASIC ID: BASIC URL: https://en.wikipedia.org/wiki/BASIC Data Source: Wikipedia
Matches:
Text: BASIC Score: 0.33
Personally Identifiable Information (PII) recognition
Create an array of strings containing the document you want to analyze. Call the client's recognizePiiEntities() method and get the RecognizePIIEntitiesResult object. Iterate through the list of results, and print the entity name, type, and score.
async function piiRecognition(client) {
    const documents = [
        "The employee's phone number is (555) 555-5555."
    ];

    const results = await client.recognizePiiEntities(documents, "en");
    for (const result of results) {
        if (result.error === undefined) {
            console.log("Redacted Text: ", result.redactedText);
            console.log(" -- Recognized PII entities for input", result.id, "--");
            for (const entity of result.entities) {
                console.log(entity.text, ":", entity.category, "(Score:", entity.confidenceScore, ")");
            }
        } else {
            console.error("Encountered an error:", result.error);
        }
    }
}
piiRecognition(textAnalyticsClient)
Run your code with node index.js in your console window.
Output
Redacted Text: The employee's phone number is **************.
-- Recognized PII entities for input 0 --
(555) 555-5555 : Phone Number (Score: 0.8 )
Key phrase extraction
Create an array of strings containing the document you want to analyze. Call the client's extractKeyPhrases() method and get the returned ExtractKeyPhrasesResult object. Iterate through the results and print each document's ID, and any detected key phrases.
async function keyPhraseExtraction(client){
    const keyPhrasesInput = [
        "My cat might need to see a veterinarian.",
    ];
    const keyPhraseResult = await client.extractKeyPhrases(keyPhrasesInput);

    keyPhraseResult.forEach(document => {
        console.log(`ID: ${document.id}`);
        console.log(`\tDocument Key Phrases: ${document.keyPhrases}`);
    });
}
keyPhraseExtraction(textAnalyticsClient);
Run your code with node index.js in your console window.
Output
ID: 0
Document Key Phrases: cat,veterinarian
Use the API asynchronously with the Analyze operation
Note
To use Analyze operations, you must use a Text Analytics resource with the standard (S) pricing tier.
Create a new function called analyze_example(), which calls the beginAnalyze() function. The result will be a long-running operation, which will be polled for results.
const documents = [
"Microsoft was founded by Bill Gates and Paul Allen.",
];
async function analyze_example(client) {
console.log("== Analyze Sample ==");
const tasks = {
entityRecognitionTasks: [{ modelVersion: "latest" }]
};
const poller = await client.beginAnalyze(documents, tasks);
const resultPages = await poller.pollUntilDone();
for await (const page of resultPages) {
const entitiesResults = page.entitiesRecognitionResults![0];
for (const doc of entitiesResults) {
console.log(`- Document ${doc.id}`);
if (!doc.error) {
console.log("\tEntities:");
for (const entity of doc.entities) {
console.log(`\t- Entity ${entity.text} of type ${entity.category}`);
}
} else {
console.error(" Error:", doc.error);
}
}
}
}
analyze_example(textAnalyticsClient);
Output
== Analyze Sample ==
- Document 0
Entities:
- Entity Microsoft of type Organization
- Entity Bill Gates of type Person
- Entity Paul Allen of type Person
You can also use the Analyze operation to detect PII and extract key phrases. See the Analyze samples for JavaScript and TypeScript on GitHub.
Run the application
Run the application with the `node` command on your quickstart file.
node index.js
Important

- The latest stable version of the Text Analytics API is `3.0`.
- Be sure to only follow the instructions for the version you are using.
- The code in this article uses synchronous methods and unsecured credential storage for simplicity. For production scenarios, we recommend using the batched asynchronous methods for performance and scalability. See the reference documentation below.
v3.1 Reference documentation | v3.1 Library source code | v3.1 Package (PyPI) | v3.1 Samples
Prerequisites
- Azure subscription - Create one for trial
- Python 3.x
- Once you have your Azure subscription, create a Text Analytics resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the Text Analytics API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (`F0`) to try the service, and upgrade later to a paid tier for production.
- To use the Analyze feature, you will need a Text Analytics resource with the standard (S) pricing tier.
Setting up
Install the client library
After installing Python, you can install the client library with:
pip install azure-ai-textanalytics --pre
Tip

Want to view the whole quickstart code file at once? You can find it on GitHub, which contains the code examples in this quickstart.
Create a new Python application
Create a new Python file and create variables for your resource's Azure endpoint and subscription key.
Important

Go to the Azure portal. If the Text Analytics resource you created in the Prerequisites section deployed successfully, click the Go to Resource button under Next Steps. You can find your key and endpoint in the resource's key and endpoint page, under resource management.

Remember to remove the key from your code when you're done, and never post it publicly. For production, consider using a secure way of storing and accessing your credentials. For example, Azure Key Vault.
key = "<paste-your-text-analytics-key-here>"
endpoint = "<paste-your-text-analytics-endpoint-here>"
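For instance, rather than hardcoding the values, you could read them from environment variables. The following is a minimal sketch, assuming you have set the two variables yourself; the names `TEXT_ANALYTICS_KEY` and `TEXT_ANALYTICS_ENDPOINT` are illustrative examples, not names required by the SDK.

import os

# Hypothetical environment variable names; set them in your shell before running.
key = os.environ["TEXT_ANALYTICS_KEY"]
endpoint = os.environ["TEXT_ANALYTICS_ENDPOINT"]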
Object model
The Text Analytics client is a `TextAnalyticsClient` object that authenticates to Azure. The client provides several methods for analyzing text.

Text for processing is sent to the API as a list of `documents`, which is either a list of strings, a list of dict-like representations, or a list of `TextDocumentInput`/`DetectLanguageInput` objects. A dict-like object contains a combination of `id`, `text`, and `language`/`country_hint`. The `text` attribute stores the text to be analyzed, `country_hint` indicates the text's country of origin, and the `id` can be any value.
The response object is a list containing the analysis information for each document.
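As a quick illustration of these input shapes, the same two documents can be passed as plain strings (the service assigns IDs automatically, starting at "0") or as objects with explicit IDs and hints. This is a sketch only; the IDs and hint values below are arbitrary examples.

from azure.ai.textanalytics import DetectLanguageInput

# As plain strings; the service assigns ids "0" and "1".
documents_as_strings = [
    "My cat might need to see a veterinarian.",
    "Ce document est rédigé en Français."
]

# As dict-like objects with explicit ids and country hints.
documents_as_dicts = [
    {"id": "1", "country_hint": "US", "text": "My cat might need to see a veterinarian."},
    {"id": "2", "country_hint": "FR", "text": "Ce document est rédigé en Français."}
]

# As typed input objects (use TextDocumentInput with a language for the analysis methods).
documents_as_inputs = [
    DetectLanguageInput(id="1", country_hint="US", text="My cat might need to see a veterinarian."),
    DetectLanguageInput(id="2", country_hint="FR", text="Ce document est rédigé en Français.")
]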
Code examples
These code snippets show you how to do the following tasks with the Text Analytics client library for Python:
Authenticate the client
Create a function to instantiate the `TextAnalyticsClient` object with the `key` and `endpoint` created above. Then create a new client.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
def authenticate_client():
ta_credential = AzureKeyCredential(key)
text_analytics_client = TextAnalyticsClient(
endpoint=endpoint,
credential=ta_credential)
return text_analytics_client
client = authenticate_client()
Sentiment analysis
Create a new function called `sentiment_analysis_example()` that takes the client as an argument, then calls the `analyze_sentiment()` function. The returned response object will contain the sentiment label and score of the entire input document, as well as a sentiment analysis for each sentence.
def sentiment_analysis_example(client):
documents = ["I had the best day of my life. I wish you were there with me."]
response = client.analyze_sentiment(documents=documents)[0]
print("Document Sentiment: {}".format(response.sentiment))
print("Overall scores: positive={0:.2f}; neutral={1:.2f}; negative={2:.2f} \n".format(
response.confidence_scores.positive,
response.confidence_scores.neutral,
response.confidence_scores.negative,
))
for idx, sentence in enumerate(response.sentences):
print("Sentence: {}".format(sentence.text))
print("Sentence {} sentiment: {}".format(idx+1, sentence.sentiment))
print("Sentence score:\nPositive={0:.2f}\nNeutral={1:.2f}\nNegative={2:.2f}\n".format(
sentence.confidence_scores.positive,
sentence.confidence_scores.neutral,
sentence.confidence_scores.negative,
))
sentiment_analysis_example(client)
Output
Document Sentiment: positive
Overall scores: positive=1.00; neutral=0.00; negative=0.00
Sentence: I had the best day of my life.
Sentence 1 sentiment: positive
Sentence score:
Positive=1.00
Neutral=0.00
Negative=0.00
Sentence: I wish you were there with me.
Sentence 2 sentiment: neutral
Sentence score:
Positive=0.21
Neutral=0.77
Negative=0.02
Opinion mining
To do sentiment analysis with opinion mining, create a new function called `sentiment_analysis_with_opinion_mining_example()` that takes the client as an argument, then calls the `analyze_sentiment()` function with the option flag `show_opinion_mining=True`. The returned response object will contain not only the sentiment label and score of the entire input document, with sentiment analysis for each sentence, but also aspect- and opinion-level sentiment analysis.
def sentiment_analysis_with_opinion_mining_example(client):
documents = [
"The food and service were unacceptable, but the concierge were nice"
]
result = client.analyze_sentiment(documents, show_opinion_mining=True)
doc_result = [doc for doc in result if not doc.is_error]
positive_reviews = [doc for doc in doc_result if doc.sentiment == "positive"]
negative_reviews = [doc for doc in doc_result if doc.sentiment == "negative"]
positive_mined_opinions = []
mixed_mined_opinions = []
negative_mined_opinions = []
for document in doc_result:
print("Document Sentiment: {}".format(document.sentiment))
print("Overall scores: positive={0:.2f}; neutral={1:.2f}; negative={2:.2f} \n".format(
document.confidence_scores.positive,
document.confidence_scores.neutral,
document.confidence_scores.negative,
))
for sentence in document.sentences:
print("Sentence: {}".format(sentence.text))
print("Sentence sentiment: {}".format(sentence.sentiment))
print("Sentence score:\nPositive={0:.2f}\nNeutral={1:.2f}\nNegative={2:.2f}\n".format(
sentence.confidence_scores.positive,
sentence.confidence_scores.neutral,
sentence.confidence_scores.negative,
))
for mined_opinion in sentence.mined_opinions:
aspect = mined_opinion.aspect
print("......'{}' aspect '{}'".format(aspect.sentiment, aspect.text))
print("......Aspect score:\n......Positive={0:.2f}\n......Negative={1:.2f}\n".format(
aspect.confidence_scores.positive,
aspect.confidence_scores.negative,
))
for opinion in mined_opinion.opinions:
print("......'{}' opinion '{}'".format(opinion.sentiment, opinion.text))
print("......Opinion score:\n......Positive={0:.2f}\n......Negative={1:.2f}\n".format(
opinion.confidence_scores.positive,
opinion.confidence_scores.negative,
))
print("\n")
print("\n")
sentiment_analysis_with_opinion_mining_example(client)
Output
Document Sentiment: positive
Overall scores: positive=0.84; neutral=0.00; negative=0.16
Sentence: The food and service were unacceptable, but the concierge were nice
Sentence sentiment: positive
Sentence score:
Positive=0.84
Neutral=0.00
Negative=0.16
......'negative' aspect 'food'
......Aspect score:
......Positive=0.01
......Negative=0.99
......'negative' opinion 'unacceptable'
......Opinion score:
......Positive=0.01
......Negative=0.99
......'negative' aspect 'service'
......Aspect score:
......Positive=0.01
......Negative=0.99
......'negative' opinion 'unacceptable'
......Opinion score:
......Positive=0.01
......Negative=0.99
......'positive' aspect 'concierge'
......Aspect score:
......Positive=1.00
......Negative=0.00
......'positive' opinion 'nice'
......Opinion score:
......Positive=1.00
......Negative=0.00
Language detection
Create a new function called `language_detection_example()` that takes the client as an argument, then calls the `detect_language()` function. The returned response object will contain the detected language in `primary_language` if successful, and an `error` if not.
Tip

In some cases it may be hard to disambiguate languages based on the input. You can use the `country_hint` parameter to specify a 2-letter country code. By default, the API uses "US" as the default country hint; to remove this behavior, reset the parameter by setting it to an empty string, `country_hint=""`, as shown in the sketch after the example output below.
def language_detection_example(client):
try:
documents = ["Ce document est rédigé en Français."]
response = client.detect_language(documents = documents, country_hint = 'us')[0]
print("Language: ", response.primary_language.name)
except Exception as err:
print("Encountered exception. {}".format(err))
language_detection_example(client)
Output
Language: French
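As a sketch of the tip above, the same call with the country hint reset to an empty string would look like the following; the function name is illustrative.

def language_detection_no_hint_example(client):
    documents = ["Ce document est rédigé en Français."]
    # An empty string removes the default "US" country hint.
    response = client.detect_language(documents=documents, country_hint="")[0]
    print("Language: ", response.primary_language.name)

language_detection_no_hint_example(client)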
Named Entity Recognition (NER)
Note

In version `3.1`:

- Entity linking is a separate request from NER.
Create a new function called `entity_recognition_example` that takes the client as an argument, then calls the `recognize_entities()` function and iterates through the results. The returned response object will contain the list of detected entities in `entities` if successful, and an `error` if not. For each detected entity, print its category and subcategory, if present.
def entity_recognition_example(client):
try:
documents = ["I had a wonderful trip to Seattle last week."]
result = client.recognize_entities(documents = documents)[0]
print("Named Entities:\n")
for entity in result.entities:
print("\tText: \t", entity.text, "\tCategory: \t", entity.category, "\tSubCategory: \t", entity.subcategory,
"\n\tConfidence Score: \t", round(entity.confidence_score, 2), "\tLength: \t", entity.length, "\tOffset: \t", entity.offset, "\n")
except Exception as err:
print("Encountered exception. {}".format(err))
entity_recognition_example(client)
Output
Named Entities:
Text: trip Category: Event SubCategory: None
Confidence Score: 0.61 Length: 4 Offset: 18
Text: Seattle Category: Location SubCategory: GPE
Confidence Score: 0.82 Length: 7 Offset: 26
Text: last week Category: DateTime SubCategory: DateRange
Confidence Score: 0.8 Length: 9 Offset: 34
Entity Linking
Create a new function called `entity_linking_example()` that takes the client as an argument, then calls the `recognize_linked_entities()` function and iterates through the results. The returned response object will contain the list of detected entities in `entities` if successful, and an `error` if not. Since linked entities are uniquely identified, occurrences of the same entity are grouped under an `entity` object as a list of `match` objects.
def entity_linking_example(client):
try:
documents = ["""Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975,
to develop and sell BASIC interpreters for the Altair 8800.
During his career at Microsoft, Gates held the positions of chairman,
chief executive officer, president and chief software architect,
while also being the largest individual shareholder until May 2014."""]
result = client.recognize_linked_entities(documents = documents)[0]
print("Linked Entities:\n")
for entity in result.entities:
print("\tName: ", entity.name, "\tId: ", entity.data_source_entity_id, "\tUrl: ", entity.url,
"\n\tData Source: ", entity.data_source)
print("\tMatches:")
for match in entity.matches:
print("\t\tText:", match.text)
print("\t\tConfidence Score: {0:.2f}".format(match.confidence_score))
print("\t\tOffset: {}".format(match.offset))
print("\t\tLength: {}".format(match.length))
except Exception as err:
print("Encountered exception. {}".format(err))
entity_linking_example(client)
Output
Linked Entities:
Name: Microsoft Id: Microsoft Url: https://en.wikipedia.org/wiki/Microsoft
Data Source: Wikipedia
Matches:
Text: Microsoft
Confidence Score: 0.55
Offset: 0
Length: 9
Text: Microsoft
Confidence Score: 0.55
Offset: 168
Length: 9
Name: Bill Gates Id: Bill Gates Url: https://en.wikipedia.org/wiki/Bill_Gates
Data Source: Wikipedia
Matches:
Text: Bill Gates
Confidence Score: 0.63
Offset: 25
Length: 10
Text: Gates
Confidence Score: 0.63
Offset: 179
Length: 5
Name: Paul Allen Id: Paul Allen Url: https://en.wikipedia.org/wiki/Paul_Allen
Data Source: Wikipedia
Matches:
Text: Paul Allen
Confidence Score: 0.60
Offset: 40
Length: 10
Name: April 4 Id: April 4 Url: https://en.wikipedia.org/wiki/April_4
Data Source: Wikipedia
Matches:
Text: April 4
Confidence Score: 0.32
Offset: 54
Length: 7
Name: BASIC Id: BASIC Url: https://en.wikipedia.org/wiki/BASIC
Data Source: Wikipedia
Matches:
Text: BASIC
Confidence Score: 0.33
Offset: 98
Length: 5
Name: Altair 8800 Id: Altair 8800 Url: https://en.wikipedia.org/wiki/Altair_8800
Data Source: Wikipedia
Matches:
Text: Altair 8800
Confidence Score: 0.88
Offset: 125
Length: 11
Personally Identifiable Information recognition
Create a new function called `pii_recognition_example` that takes the client as an argument, then calls the `recognize_pii_entities()` function and iterates through the results. The returned response object will contain the list of detected entities in `entities` if successful, and an `error` if not. For each detected entity, print its category and subcategory, if present.
def pii_recognition_example(client):
documents = [
"The employee's SSN is 859-98-0987.",
"The employee's phone number is 555-555-5555."
]
response = client.recognize_pii_entities(documents, language="en")
result = [doc for doc in response if not doc.is_error]
for doc in result:
print("Redacted Text: {}".format(doc.redacted_text))
for entity in doc.entities:
print("Entity: {}".format(entity.text))
print("\tCategory: {}".format(entity.category))
print("\tConfidence Score: {}".format(entity.confidence_score))
print("\tOffset: {}".format(entity.offset))
print("\tLength: {}".format(entity.length))
pii_recognition_example(client)
Output
Redacted Text: The employee's SSN is ***********.
Entity: 859-98-0987
Category: U.S. Social Security Number (SSN)
Confidence Score: 0.65
Offset: 22
Length: 11
Redacted Text: The employee's phone number is ************.
Entity: 555-555-5555
Category: Phone Number
Confidence Score: 0.8
Offset: 31
Length: 12
Key phrase extraction
Create a new function called `key_phrase_extraction_example()` that takes the client as an argument, then calls the `extract_key_phrases()` function. The result will contain the list of detected key phrases in `key_phrases` if successful, and an `error` if not. Print any detected key phrases.
def key_phrase_extraction_example(client):
try:
documents = ["My cat might need to see a veterinarian."]
response = client.extract_key_phrases(documents = documents)[0]
if not response.is_error:
print("\tKey Phrases:")
for phrase in response.key_phrases:
print("\t\t", phrase)
else:
print(response.id, response.error)
except Exception as err:
print("Encountered exception. {}".format(err))
key_phrase_extraction_example(client)
Output
Key Phrases:
cat
veterinarian
Important

- The latest stable version of the Text Analytics API is `3.0`.
- Be sure to only follow the instructions for the version you are using.
Prerequisites
- The current version of cURL.
- Once you have your Azure subscription, create a Text Analytics resource in the Azure portal to get your key and endpoint. After it deploys, click Go to resource.
- You will need the key and endpoint from the resource you create to connect your application to the Text Analytics API. You'll paste your key and endpoint into the code below later in the quickstart.
- You can use the free pricing tier (`F0`) to try the service, and upgrade later to a paid tier for production.
Note

- The following BASH examples use the `\` line continuation character. If your console or terminal uses a different line continuation character, use that character.
- You can find language-specific samples on GitHub.
- Go to the Azure portal and find the key and endpoint for the Text Analytics resource you created in the prerequisites. They will be located on the resource's key and endpoint page, under resource management. Then replace the strings in the code below with your key and endpoint. To call the Text Analytics API, you need the following information:
| Parameter | Description |
|---|---|
| `-X POST <endpoint>` | Specifies your endpoint for accessing the API. |
| `-H Content-Type: application/json` | The content type for sending JSON data. |
| `-H "Ocp-Apim-Subscription-Key:<key>"` | Specifies the key for accessing the API. |
| `-d <documents>` | The JSON containing the documents you want to send. |
The following cURL commands are executed from a BASH shell. Edit these commands with your own resource name, resource key, and JSON values.
Sentiment Analysis
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
Note

The example below includes a request for the Opinion Mining feature of Sentiment Analysis, using the `opinionMining=true` parameter, which provides granular information about the opinions related to aspects (such as the attributes of products or services) in text.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/sentiment?opinionMining=true \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", text: "The customer service here is really good."}]}'
JSON response
{
"documents":[
{
"id":"1",
"sentiment":"positive",
"confidenceScores":{
"positive":1.0,
"neutral":0.0,
"negative":0.0
},
"sentences":[
{
"sentiment":"positive",
"confidenceScores":{
"positive":1.0,
"neutral":0.0,
"negative":0.0
},
"offset":0,
"length":41,
"text":"The customer service here is really good.",
"aspects":[
{
"sentiment":"positive",
"confidenceScores":{
"positive":1.0,
"negative":0.0
},
"offset":4,
"length":16,
"text":"customer service",
"relations":[
{
"relationType":"opinion",
"ref":"#/documents/0/sentences/0/opinions/0"
}
]
}
],
"opinions":[
{
"sentiment":"positive",
"confidenceScores":{
"positive":1.0,
"negative":0.0
},
"offset":36,
"length":4,
"text":"good",
"isNegated":false
}
]
}
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-04-01"
}
Language detection
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/languages/ \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", text: "This is a document written in English."}]}'
JSON response
{
"documents":[
{
"id":"1",
"detectedLanguage":{
"name":"English",
"iso6391Name":"en",
"confidenceScore":0.99
},
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-09-01"
}
Named Entity Recognition (NER)
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/entities/recognition/general \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "I had a wonderful trip to Seattle last week."}]}'
JSON response
{
"documents":[
{
"id":"1",
"entities":[
{
"text":"trip",
"category":"Event",
"offset":18,
"length":4,
"confidenceScore":0.61
},
{
"text":"Seattle",
"category":"Location",
"subcategory":"GPE",
"offset":26,
"length":7,
"confidenceScore":0.82
},
{
"text":"last week",
"category":"DateTime",
"subcategory":"DateRange",
"offset":34,
"length":9,
"confidenceScore":0.8
}
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-04-01"
}
Detecting personally identifying information
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/entities/recognition/pii \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "Insurance policy for SSN on file 123-12-1234 is here by approved."}]}'
JSON response
{
"documents":[
{
"redactedText":"Insurance policy for *** on file 123-12-1234 is here by approved.",
"id":"1",
"entities":[
{
"text":"SSN",
"category":"Organization",
"offset":21,
"length":3,
"confidenceScore":0.45
}
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-07-01"
}
Entity linking
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/entities/linking \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975."}]}'
JSON response
{
"documents":[
{
"id":"1",
"entities":[
{
"bingId":"a093e9b9-90f5-a3d5-c4b8-5855e1b01f85",
"name":"Microsoft",
"matches":[
{
"text":"Microsoft",
"offset":0,
"length":9,
"confidenceScore":0.48
}
],
"language":"en",
"id":"Microsoft",
"url":"https://en.wikipedia.org/wiki/Microsoft",
"dataSource":"Wikipedia"
},
{
"bingId":"0d47c987-0042-5576-15e8-97af601614fa",
"name":"Bill Gates",
"matches":[
{
"text":"Bill Gates",
"offset":25,
"length":10,
"confidenceScore":0.52
}
],
"language":"en",
"id":"Bill Gates",
"url":"https://en.wikipedia.org/wiki/Bill_Gates",
"dataSource":"Wikipedia"
},
{
"bingId":"df2c4376-9923-6a54-893f-2ee5a5badbc7",
"name":"Paul Allen",
"matches":[
{
"text":"Paul Allen",
"offset":40,
"length":10,
"confidenceScore":0.54
}
],
"language":"en",
"id":"Paul Allen",
"url":"https://en.wikipedia.org/wiki/Paul_Allen",
"dataSource":"Wikipedia"
},
{
"bingId":"52535f87-235e-b513-54fe-c03e4233ac6e",
"name":"April 4",
"matches":[
{
"text":"April 4",
"offset":54,
"length":7,
"confidenceScore":0.38
}
],
"language":"en",
"id":"April 4",
"url":"https://en.wikipedia.org/wiki/April_4",
"dataSource":"Wikipedia"
}
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-02-01"
}
Key phrase extraction
- Copy the command into a text editor.
- Make the following changes in the command where needed:
  - Replace the value `<your-text-analytics-key-here>` with your key.
  - Replace the first part of the request URL, `<your-text-analytics-endpoint-here>`, with your own endpoint URL.
- Open a command prompt window.
- Paste the command from the text editor into the command prompt window, and then run the command.
curl -X POST https://<your-text-analytics-endpoint-here>/text/analytics/v3.1-preview.3/keyPhrases \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <your-text-analytics-key-here>" \
-d '{ documents: [{ id: "1", language:"en", text: "I had a wonderful trip to Seattle last week."}]}'
JSON response
{
"documents":[
{
"id":"1",
"keyPhrases":[
"wonderful trip",
"Seattle",
"week"
],
"warnings":[
]
}
],
"errors":[
],
"modelVersion":"2020-07-01"
}
Clean up resources
If you want to clean up and remove a Cognitive Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it.