使用文档智能模型
此内容适用于: v2.1
本指南介绍如何将文档智能模型添加到应用程序和工作流中。 可以使用所选编程语言 SDK 或 REST API。
Azure AI 文档智能是一个基于云的 Azure AI 服务,它使用机器学习从文档中提取关键文本和结构元素。 我们建议你在学习该技术时使用免费服务。 请记住,每月的免费页数限于 500。
从以下文档智能模型中选择一种模型,分析并提取表单和文档中的数据和值:
预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
read
模型作为从文档中提取文本的基础。预生成布局模型从文档和图像中提取文本和文本位置、表、选择标记和结构信息。 可以使用布局模型并启用可选的查询字符串参数
features=keyValuePairs
来提取键/值对。prebuilt-contract 模型从合同协议中提取关键信息。
prebuilt-healthInsuranceCard.us 模型可从美国医疗保险卡中提取重要信息。
预生成税务文件模型模型提取美国税务表单上报告的信息。
预生成发票模型会从各种格式和质量的销售发票中提取关键字段和行项。 字段包括手机捕获的图像、扫描的文档和数字 PDF。
预生成收据模型提取印刷体和手写体销售收据中的关键信息。
预生成 idDocument 模型从美国驾驶执照、国际护照个人资料页、美国各州身份证、社会保障卡和永久居民卡中提取关键信息。
- 预生成 businessCard 模型从名片图像中提取关键信息和联系人详细信息。
客户端库 | SDK 参考 | REST API 参考 | 包 | 示例 |支持的 REST API 版本
先决条件
Azure 订阅 - 创建试用订阅。
Azure AI 服务或文档智能资源。 创建“单一服务或多个服务。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
在 URL 中的文档文件。 对于本项目,可以使用下表中为每项功能提供的示例表单:
功能 modelID document-url “读取”模型 prebuilt-read 示例手册 布局模型 预生成布局 预订确认示例 W-2 窗体模型 prebuilt-tax.us.w2 示例 W-2 表单 发票模型 预生成的发票 示例发票 收据模型 prebuilt-receipt 示例收据 ID 文档模型 prebuilt-idDocument 示例身份证明文档 名片模型 prebuilt-businessCard 名片示例
设置环境变量
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,请使用 Azure 门户中的 key
和 endpoint
实例化客户端。 对于此项目,使用环境变量来存储和访问凭据。
重要
如果使用 API 密钥,请将其安全地存储在某个其他位置,例如 Azure Key Vault 中。 请不要直接在代码中包含 API 密钥,并且切勿公开发布该密钥。
有关 Azure AI 服务安全性的详细信息,请参阅对 Azure AI 服务的请求进行身份验证。
若要为文档智能资源密钥设置环境变量,请打开控制台窗口,按照操作系统和开发环境的说明进行操作。 将 <yourKey> 和 <yourEndpoint> 替换为 Azure 门户中资源的值。
环境变量不区分大小写。 它们通常以大写形式声明,其中包含的单词以下划线联接。 请根据命令提示符运行以下命令:
设置密钥变量:
setx DI_KEY <yourKey>
设置终结点变量
setx DI_ENDPOINT <yourEndpoint>
设置环境变量后关闭命令提示符窗口。 这些值将一直保留,直到你再次更改它们。
重启任何读取环境变量的运行程序。 例如,如果正在使用 Visual Studio 或 Visual Studio Code 作为编辑器,请在运行示例代码之前重启它。
下面是其他一些可与环境变量一起使用的有用命令:
命令 | 操作 | 示例 |
---|---|---|
setx VARIABLE_NAME= |
通过将环境变量值设置为空字符串来删除环境变量。 | setx DI_KEY= |
setx VARIABLE_NAME=value |
设置或更改环境变量的值。 | setx DI_KEY=<yourKey> |
set VARIABLE_NAME |
显示特定环境变量的值。 | set DI_KEY |
set |
显示所有环境变量。 | set |
设置编程环境
启动 Visual Studio。
在“开始”页上,选择“创建新项目”。
在“创建新项目”页面上,在搜索框中输入“控制台”。 选择“控制台应用程序”模板,然后选择“下一步”。
在“配置新项目”页的“项目名称”下,输入 docIntelligence_app。 然后,选择“下一步”。
在“其他信息”页面中,选择“.NET 8.0 (长期支持)”,然后选择“创建”。
使用 NuGet 安装客户端库
右键单击 docIntelligence_app 项目,然后选择“管理 NuGet 包...”。
选择“浏览”选项卡,并输入 Azure.AI.FormRecognizer。
从下拉菜单中选择一个版本,然后将包安装到项目中。
生成应用程序
注意
从 .NET 6 开始,使用 console
模板的新项目将生成与以前版本不同的新程序样式。 新的输出使用最新的 C# 功能,这些功能简化了你需要编写的代码。
使用较新版本时,只需编写 Main
方法的主体。 无需包括顶级语句、全局 using 指令或隐式 using 指令。 有关详细信息,请参阅 C# 控制台应用模板生成顶级语句。
打开 Program.cs 文件。
删除现有的代码,包括
Console.Writeline("Hello World!")
行。选择以下代码示例之一,然后将其复制并粘贴到应用程序的 Program.cs 文件中:
将代码示例添加到应用程序后,选择项目名称旁边的绿色“开始”按钮以生成和运行程序,或按 F5 键。
使用“读取”模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
//sample document
Uri fileUri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-read", fileUri);
AnalyzeResult result = operation.Value;
foreach (DocumentPage page in result.Pages)
{
Console.WriteLine($"Document Page {page.PageNumber} has {page.Lines.Count} line(s), {page.Words.Count} word(s),");
Console.WriteLine($"and {page.SelectionMarks.Count} selection mark(s).");
for (int i = 0; i < page.Lines.Count; i++)
{
DocumentLine line = page.Lines[i];
Console.WriteLine($" Line {i} has content: '{line.Content}'.");
Console.WriteLine($" Its bounding polygon (points ordered clockwise):");
for (int j = 0; j < line.BoundingPolygon.Count; j++)
{
Console.WriteLine($" Point {j} => X: {line.BoundingPolygon[j].X}, Y: {line.BoundingPolygon[j].Y}");
}
}
}
foreach (DocumentStyle style in result.Styles)
{
// Check the style and style confidence to see if text is handwritten.
// Note that value '0.8' is used as an example.
bool isHandwritten = style.IsHandwritten.HasValue && style.IsHandwritten == true;
if (isHandwritten && style.Confidence > 0.8)
{
Console.WriteLine($"Handwritten content found:");
foreach (DocumentSpan span in style.Spans)
{
Console.WriteLine($" Content: {result.Content.Substring(span.Index, span.Length)}");
}
}
}
Console.WriteLine("Detected languages:");
foreach (DocumentLanguage language in result.Languages)
{
Console.WriteLine($" Found language with locale'{language.Locale}' with confidence {language.Confidence}.");
}
请访问 GitHub 上的 Azure 示例存储库,并查看 read
模型输出。
使用布局模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri fileUri = new Uri ("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-layout", fileUri);
AnalyzeResult result = operation.Value;
foreach (DocumentPage page in result.Pages)
{
Console.WriteLine($"Document Page {page.PageNumber} has {page.Lines.Count} line(s), {page.Words.Count} word(s),");
Console.WriteLine($"and {page.SelectionMarks.Count} selection mark(s).");
for (int i = 0; i < page.Lines.Count; i++)
{
DocumentLine line = page.Lines[i];
Console.WriteLine($" Line {i} has content: '{line.Content}'.");
Console.WriteLine($" Its bounding polygon (points ordered clockwise):");
for (int j = 0; j < line.BoundingPolygon.Count; j++)
{
Console.WriteLine($" Point {j} => X: {line.BoundingPolygon[j].X}, Y: {line.BoundingPolygon[j].Y}");
}
}
for (int i = 0; i < page.SelectionMarks.Count; i++)
{
DocumentSelectionMark selectionMark = page.SelectionMarks[i];
Console.WriteLine($" Selection Mark {i} is {selectionMark.State}.");
Console.WriteLine($" Its bounding polygon (points ordered clockwise):");
for (int j = 0; j < selectionMark.BoundingPolygon.Count; j++)
{
Console.WriteLine($" Point {j} => X: {selectionMark.BoundingPolygon[j].X}, Y: {selectionMark.BoundingPolygon[j].Y}");
}
}
}
Console.WriteLine("Paragraphs:");
foreach (DocumentParagraph paragraph in result.Paragraphs)
{
Console.WriteLine($" Paragraph content: {paragraph.Content}");
if (paragraph.Role != null)
{
Console.WriteLine($" Role: {paragraph.Role}");
}
}
foreach (DocumentStyle style in result.Styles)
{
// Check the style and style confidence to see if text is handwritten.
// Note that value '0.8' is used as an example.
bool isHandwritten = style.IsHandwritten.HasValue && style.IsHandwritten == true;
if (isHandwritten && style.Confidence > 0.8)
{
Console.WriteLine($"Handwritten content found:");
foreach (DocumentSpan span in style.Spans)
{
Console.WriteLine($" Content: {result.Content.Substring(span.Index, span.Length)}");
}
}
}
Console.WriteLine("The following tables were extracted:");
for (int i = 0; i < result.Tables.Count; i++)
{
DocumentTable table = result.Tables[i];
Console.WriteLine($" Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (DocumentTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) has kind '{cell.Kind}' and content: '{cell.Content}'.");
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看布局模型输出。
使用“常规文档”模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri fileUri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-document", fileUri);
AnalyzeResult result = operation.Value;
Console.WriteLine("Detected key-value pairs:");
foreach (DocumentKeyValuePair kvp in result.KeyValuePairs)
{
if (kvp.Value == null)
{
Console.WriteLine($" Found key with no value: '{kvp.Key.Content}'");
}
else
{
Console.WriteLine($" Found key-value pair: '{kvp.Key.Content}' and '{kvp.Value.Content}'");
}
}
foreach (DocumentPage page in result.Pages)
{
Console.WriteLine($"Document Page {page.PageNumber} has {page.Lines.Count} line(s), {page.Words.Count} word(s),");
Console.WriteLine($"and {page.SelectionMarks.Count} selection mark(s).");
for (int i = 0; i < page.Lines.Count; i++)
{
DocumentLine line = page.Lines[i];
Console.WriteLine($" Line {i} has content: '{line.Content}'.");
Console.WriteLine($" Its bounding polygon (points ordered clockwise):");
for (int j = 0; j < line.BoundingPolygon.Count; j++)
{
Console.WriteLine($" Point {j} => X: {line.BoundingPolygon[j].X}, Y: {line.BoundingPolygon[j].Y}");
}
}
for (int i = 0; i < page.SelectionMarks.Count; i++)
{
DocumentSelectionMark selectionMark = page.SelectionMarks[i];
Console.WriteLine($" Selection Mark {i} is {selectionMark.State}.");
Console.WriteLine($" Its bounding polygon (points ordered clockwise):");
for (int j = 0; j < selectionMark.BoundingPolygon.Count; j++)
{
Console.WriteLine($" Point {j} => X: {selectionMark.BoundingPolygon[j].X}, Y: {selectionMark.BoundingPolygon[j].Y}");
}
}
}
foreach (DocumentStyle style in result.Styles)
{
// Check the style and style confidence to see if text is handwritten.
// Note that value '0.8' is used as an example.
bool isHandwritten = style.IsHandwritten.HasValue && style.IsHandwritten == true;
if (isHandwritten && style.Confidence > 0.8)
{
Console.WriteLine($"Handwritten content found:");
foreach (DocumentSpan span in style.Spans)
{
Console.WriteLine($" Content: {result.Content.Substring(span.Index, span.Length)}");
}
}
}
Console.WriteLine("The following tables were extracted:");
for (int i = 0; i < result.Tables.Count; i++)
{
DocumentTable table = result.Tables[i];
Console.WriteLine($" Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (DocumentTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) has kind '{cell.Kind}' and content: '{cell.Content}'.");
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看常规文档模型输出。
使用 W-2 税务模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri w2Uri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/w2.png");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-tax.us.w2", w2Uri);
AnalyzeResult result = operation.Value;
for (int i = 0; i < result.Documents.Count; i++)
{
Console.WriteLine($"Document {i}:");
AnalyzedDocument document = result.Documents[i];
if (document.Fields.TryGetValue("AdditionalInfo", out DocumentField? additionalInfoField))
{
if (additionalInfoField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField infoField in additionalInfoField.Value.AsList())
{
Console.WriteLine("AdditionalInfo:");
if (infoField.FieldType == DocumentFieldType.Dictionary)
{
IReadOnlyDictionary<string, DocumentField> infoFields = infoField.Value.AsDictionary();
if (infoFields.TryGetValue("Amount", out DocumentField? amountField))
{
if (amountField.FieldType == DocumentFieldType.Double)
{
double amount = amountField.Value.AsDouble();
Console.WriteLine($" Amount: '{amount}', with confidence {amountField.Confidence}");
}
}
if (infoFields.TryGetValue("LetterCode", out DocumentField? letterCodeField))
{
if (letterCodeField.FieldType == DocumentFieldType.String)
{
string letterCode = letterCodeField.Value.AsString();
Console.WriteLine($" LetterCode: '{letterCode}', with confidence {letterCodeField.Confidence}");
}
}
}
}
}
}
if (document.Fields.TryGetValue("AllocatedTips", out DocumentField? allocatedTipsField))
{
if (allocatedTipsField.FieldType == DocumentFieldType.Double)
{
double allocatedTips = allocatedTipsField.Value.AsDouble();
Console.WriteLine($"Allocated Tips: '{allocatedTips}', with confidence {allocatedTipsField.Confidence}");
}
}
if (document.Fields.TryGetValue("Employer", out DocumentField? employerField))
{
if (employerField.FieldType == DocumentFieldType.Dictionary)
{
IReadOnlyDictionary<string, DocumentField> employerFields = employerField.Value.AsDictionary();
if (employerFields.TryGetValue("Name", out DocumentField? employerNameField))
{
if (employerNameField.FieldType == DocumentFieldType.String)
{
string name = employerNameField.Value.AsString();
Console.WriteLine($"Employer Name: '{name}', with confidence {employerNameField.Confidence}");
}
}
if (employerFields.TryGetValue("IdNumber", out DocumentField? idNumberField))
{
if (idNumberField.FieldType == DocumentFieldType.String)
{
string id = idNumberField.Value.AsString();
Console.WriteLine($"Employer ID Number: '{id}', with confidence {idNumberField.Confidence}");
}
}
if (employerFields.TryGetValue("Address", out DocumentField? addressField))
{
if (addressField.FieldType == DocumentFieldType.Address)
{
Console.WriteLine($"Employer Address: '{addressField.Content}', with confidence {addressField.Confidence}");
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看 W-2 税务模型输出。
使用发票模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri invoiceUri = new Uri("https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/rest-api/invoice.pdf");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-invoice", invoiceUri);
AnalyzeResult result = operation.Value;
for (int i = 0; i < result.Documents.Count; i++)
{
Console.WriteLine($"Document {i}:");
AnalyzedDocument document = result.Documents[i];
if (document.Fields.TryGetValue("VendorName", out DocumentField vendorNameField))
{
if (vendorNameField.FieldType == DocumentFieldType.String)
{
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($"Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
if (document.Fields.TryGetValue("CustomerName", out DocumentField customerNameField))
{
if (customerNameField.FieldType == DocumentFieldType.String)
{
string customerName = customerNameField.Value.AsString();
Console.WriteLine($"Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
if (document.Fields.TryGetValue("Items", out DocumentField itemsField))
{
if (itemsField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.FieldType == DocumentFieldType.Dictionary)
{
IReadOnlyDictionary<string, DocumentField> itemFields = itemField.Value.AsDictionary();
if (itemFields.TryGetValue("Description", out DocumentField itemDescriptionField))
{
if (itemDescriptionField.FieldType == DocumentFieldType.String)
{
string itemDescription = itemDescriptionField.Value.AsString();
Console.WriteLine($" Description: '{itemDescription}', with confidence {itemDescriptionField.Confidence}");
}
}
if (itemFields.TryGetValue("Amount", out DocumentField itemAmountField))
{
if (itemAmountField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue itemAmount = itemAmountField.Value.AsCurrency();
Console.WriteLine($" Amount: '{itemAmount.Symbol}{itemAmount.Amount}', with confidence {itemAmountField.Confidence}");
}
}
}
}
}
}
if (document.Fields.TryGetValue("SubTotal", out DocumentField subTotalField))
{
if (subTotalField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue subTotal = subTotalField.Value.AsCurrency();
Console.WriteLine($"Sub Total: '{subTotal.Symbol}{subTotal.Amount}', with confidence {subTotalField.Confidence}");
}
}
if (document.Fields.TryGetValue("TotalTax", out DocumentField totalTaxField))
{
if (totalTaxField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue totalTax = totalTaxField.Value.AsCurrency();
Console.WriteLine($"Total Tax: '{totalTax.Symbol}{totalTax.Amount}', with confidence {totalTaxField.Confidence}");
}
}
if (document.Fields.TryGetValue("InvoiceTotal", out DocumentField invoiceTotalField))
{
if (invoiceTotalField.FieldType == DocumentFieldType.Currency)
{
CurrencyValue invoiceTotal = invoiceTotalField.Value.AsCurrency();
Console.WriteLine($"Invoice Total: '{invoiceTotal.Symbol}{invoiceTotal.Amount}', with confidence {invoiceTotalField.Confidence}");
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看发票模型输出。
使用收据模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri receiptUri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/receipt.png");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-receipt", receiptUri);
AnalyzeResult receipts = operation.Value;
foreach (AnalyzedDocument receipt in receipts.Documents)
{
if (receipt.Fields.TryGetValue("MerchantName", out DocumentField merchantNameField))
{
if (merchantNameField.FieldType == DocumentFieldType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
if (receipt.Fields.TryGetValue("TransactionDate", out DocumentField transactionDateField))
{
if (transactionDateField.FieldType == DocumentFieldType.Date)
{
DateTimeOffset transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
if (receipt.Fields.TryGetValue("Items", out DocumentField itemsField))
{
if (itemsField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.FieldType == DocumentFieldType.Dictionary)
{
IReadOnlyDictionary<string, DocumentField> itemFields = itemField.Value.AsDictionary();
if (itemFields.TryGetValue("Description", out DocumentField itemDescriptionField))
{
if (itemDescriptionField.FieldType == DocumentFieldType.String)
{
string itemDescription = itemDescriptionField.Value.AsString();
Console.WriteLine($" Description: '{itemDescription}', with confidence {itemDescriptionField.Confidence}");
}
}
if (itemFields.TryGetValue("TotalPrice", out DocumentField itemTotalPriceField))
{
if (itemTotalPriceField.FieldType == DocumentFieldType.Double)
{
double itemTotalPrice = itemTotalPriceField.Value.AsDouble();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
if (receipt.Fields.TryGetValue("Total", out DocumentField totalField))
{
if (totalField.FieldType == DocumentFieldType.Double)
{
double total = totalField.Value.AsDouble();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看收据模型输出。
ID 文档模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri idDocumentUri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/identity_documents.png");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-idDocument", idDocumentUri);
AnalyzeResult identityDocuments = operation.Value;
AnalyzedDocument identityDocument = identityDocuments.Documents.Single();
if (identityDocument.Fields.TryGetValue("Address", out DocumentField addressField))
{
if (addressField.FieldType == DocumentFieldType.String)
{
string address = addressField.Value. AsString();
Console.WriteLine($"Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out DocumentField countryRegionField))
{
if (countryRegionField.FieldType == DocumentFieldType.CountryRegion)
{
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($"CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out DocumentField dateOfBirthField))
{
if (dateOfBirthField.FieldType == DocumentFieldType.Date)
{
DateTimeOffset dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($"Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out DocumentField dateOfExpirationField))
{
if (dateOfExpirationField.FieldType == DocumentFieldType.Date)
{
DateTimeOffset dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($"Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out DocumentField documentNumberField))
{
if (documentNumberField.FieldType == DocumentFieldType.String)
{
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($"Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out DocumentField firstNameField))
{
if (firstNameField.FieldType == DocumentFieldType.String)
{
string firstName = firstNameField.Value.AsString();
Console.WriteLine($"First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out DocumentField lastNameField))
{
if (lastNameField.FieldType == DocumentFieldType.String)
{
string lastName = lastNameField.Value.AsString();
Console.WriteLine($"Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out DocumentField regionfield))
{
if (regionfield.FieldType == DocumentFieldType.String)
{
string region = regionfield.Value.AsString();
Console.WriteLine($"Region: '{region}', with confidence {regionfield.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Sex", out DocumentField sexfield))
{
if (sexfield.FieldType == DocumentFieldType.String)
{
string sex = sexfield.Value.AsString();
Console.WriteLine($"Sex: '{sex}', with confidence {sexfield.Confidence}");
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看 ID 文档模型输出。
使用名片模型
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
//use your `key` and `endpoint` environment variables to create your `AzureKeyCredential` and `DocumentAnalysisClient` instances
string key = Environment.GetEnvironmentVariable("DI_KEY");
string endpoint = Environment.GetEnvironmentVariable("DI_ENDPOINT");
AzureKeyCredential credential = new AzureKeyCredential(key);
DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);
// sample document document
Uri businessCardUri = new Uri("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/business-card-english.jpg");
AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-businessCard", businessCardUri);
AnalyzeResult businessCards = operation.Value;
foreach (AnalyzedDocument businessCard in businessCards.Documents)
{
if (businessCard.Fields.TryGetValue("ContactNames", out DocumentField ContactNamesField))
{
if (ContactNamesField.FieldType == DocumentFieldType.List)
{
foreach (DocumentField contactNameField in ContactNamesField.Value.AsList())
{
Console.WriteLine("Contact Name: ");
if (contactNameField.FieldType == DocumentFieldType.Dictionary)
{
IReadOnlyDictionary<string, DocumentField> contactNameFields = contactNameField.Value.AsDictionary();
if (contactNameFields.TryGetValue("FirstName", out DocumentField firstNameField))
{
if (firstNameField.FieldType == DocumentFieldType.String)
{
string firstName = firstNameField.Value.AsString();
Console.WriteLine($" First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (contactNameFields.TryGetValue("LastName", out DocumentField lastNameField))
{
if (lastNameField.FieldType == DocumentFieldType.String)
{
string lastName = lastNameField.Value.AsString();
Console.WriteLine($" Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
if (businessCard.Fields.TryGetValue("JobTitles", out DocumentField jobTitlesFields))
{
if (jobTitlesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField jobTitleField in jobTitlesFields.Value.AsList())
{
if (jobTitleField.FieldType == DocumentFieldType.String)
{
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($"Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("Departments", out DocumentField departmentFields))
{
if (departmentFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField departmentField in departmentFields.Value.AsList())
{
if (departmentField.FieldType == DocumentFieldType.String)
{
string department = departmentField.Value.AsString();
Console.WriteLine($"Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("Emails", out DocumentField emailFields))
{
if (emailFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField emailField in emailFields.Value.AsList())
{
if (emailField.FieldType == DocumentFieldType.String)
{
string email = emailField.Value.AsString();
Console.WriteLine($"Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("Websites", out DocumentField websiteFields))
{
if (websiteFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField websiteField in websiteFields.Value.AsList())
{
if (websiteField.FieldType == DocumentFieldType.String)
{
string website = websiteField.Value.AsString();
Console.WriteLine($"Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("MobilePhones", out DocumentField mobilePhonesFields))
{
if (mobilePhonesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField mobilePhoneField in mobilePhonesFields.Value.AsList())
{
if (mobilePhoneField.FieldType == DocumentFieldType.PhoneNumber)
{
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($"Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("WorkPhones", out DocumentField workPhonesFields))
{
if (workPhonesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField workPhoneField in workPhonesFields.Value.AsList())
{
if (workPhoneField.FieldType == DocumentFieldType.PhoneNumber)
{
string workPhone = workPhoneField.Value.AsPhoneNumber();
Console.WriteLine($"Work phone number: '{workPhone}', with confidence {workPhoneField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("Faxes", out DocumentField faxesFields))
{
if (faxesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField faxField in faxesFields.Value.AsList())
{
if (faxField.FieldType == DocumentFieldType.PhoneNumber)
{
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($"Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("Addresses", out DocumentField addressesFields))
{
if (addressesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField addressField in addressesFields.Value.AsList())
{
if (addressField.FieldType == DocumentFieldType.String)
{
string address = addressField.Value.AsString();
Console.WriteLine($"Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
if (businessCard.Fields.TryGetValue("CompanyNames", out DocumentField companyNamesFields))
{
if (companyNamesFields.FieldType == DocumentFieldType.List)
{
foreach (DocumentField companyNameField in companyNamesFields.Value.AsList())
{
if (companyNameField.FieldType == DocumentFieldType.String)
{
string companyName = companyNameField.Value.AsString();
Console.WriteLine($"Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看名片模型输出。
客户端库 | SDK 参考 | REST API 参考 | 包 (Maven) | 示例| 支持的 REST API 版本
先决条件
Azure 订阅 - 创建试用订阅。
最新版本的 Visual Studio Code 或者你首选的 IDE。 请参阅 Visual Studio Code 中的 Java。
- Visual Studio Code 提供了适用于 Windows 和 macOS 的 Java 编码包。 该编码包是
VS Code
、Java 开发工具包 (JDK) 和 Microsoft 建议扩展的集合。 编码包还可用于修复现有开发环境。 - 如果使用的是
VS Code
和适用于 Java 的编码包,请安装 Gradle for Java 扩展。
如果不使用 Visual Studio Code,请确保在开发环境中安装了以下内容:
- Java 开发工具包 (JDK) 版本 8 或更高版本。 有关详细信息,请参阅 Microsoft Build of OpenJDK。
- Gradle 版本 6.8 或更高版本。
- Visual Studio Code 提供了适用于 Windows 和 macOS 的 Java 编码包。 该编码包是
Azure AI 服务或文档智能资源。 创建“单一服务或多个服务。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。提示
如果计划通过一个终结点和密钥访问多个 Azure AI 服务,请创建 Azure AI 服务资源。 请创建仅供文档智能访问的文档智能资源。 若要使用 Microsoft Entra 身份验证,则需要单服务资源。
从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
URL 中的文档文件。 对于本项目,可以使用下表中为每项功能提供的示例表单:
功能 modelID document-url “读取”模型 prebuilt-read 示例手册 布局模型 预生成布局 预订确认示例 W-2 窗体模型 prebuilt-tax.us.w2 示例 W-2 表单 发票模型 预生成的发票 示例发票 收据模型 prebuilt-receipt 示例收据 ID 文档模型 prebuilt-idDocument 示例身份证明文档 名片模型 prebuilt-businessCard 名片示例
设置环境变量
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,请使用 Azure 门户中的 key
和 endpoint
实例化客户端。 对于此项目,使用环境变量来存储和访问凭据。
重要
如果使用 API 密钥,请将其安全地存储在某个其他位置,例如 Azure Key Vault 中。 请不要直接在代码中包含 API 密钥,并且切勿公开发布该密钥。
有关 Azure AI 服务安全性的详细信息,请参阅对 Azure AI 服务的请求进行身份验证。
若要为文档智能资源密钥设置环境变量,请打开控制台窗口,按照操作系统和开发环境的说明进行操作。 将 <yourKey> 和 <yourEndpoint> 替换为 Azure 门户中资源的值。
环境变量不区分大小写。 它们通常以大写形式声明,其中包含的单词以下划线联接。 请根据命令提示符运行以下命令:
设置密钥变量:
setx DI_KEY <yourKey>
设置终结点变量
setx DI_ENDPOINT <yourEndpoint>
设置环境变量后关闭命令提示符窗口。 这些值将一直保留,直到你再次更改它们。
重启任何读取环境变量的运行程序。 例如,如果正在使用 Visual Studio 或 Visual Studio Code 作为编辑器,请在运行示例代码之前重启它。
下面是其他一些可与环境变量一起使用的有用命令:
命令 | 操作 | 示例 |
---|---|---|
setx VARIABLE_NAME= |
通过将环境变量值设置为空字符串来删除环境变量。 | setx DI_KEY= |
setx VARIABLE_NAME=value |
设置或更改环境变量的值。 | setx DI_KEY=<yourKey> |
set VARIABLE_NAME |
显示特定环境变量的值。 | set DI_KEY |
set |
显示所有环境变量。 | set |
设置编程环境
若要设置编程环境,请创建 Gradle 项目并安装客户端库。
创建新的 Gradle 项目
在控制台窗口中,为你的应用 (名为“form-recognizer-app”) 创建一个新目录,然后导航到该目录。
mkdir form-recognizer-app cd form-recognizer-app
从工作目录运行
gradle init
命令。 此命令将创建 Gradle 的基本生成文件,其中包括 build.gradle.kts - 在运行时将使用该文件创建并配置应用程序。gradle init --type basic
当提示你选择一个 DSL 时,选择 Kotlin。
选择“Enter”以接受默认项目名称 (form-recognizer-app)。
安装客户端库
本文使用 Gradle 依赖项管理器。 可以在 Maven 中央存储库中找到客户端库以及其他依赖项管理器的信息。
在 IDE 中打开项目的 build.gradle.kts 文件。 复制并粘贴以下代码,将客户端库作为
implementation
语句与所需的插件和设置一起包括在内。plugins { java application } application { mainClass.set("FormRecognizer") } repositories { mavenCentral() } dependencies { implementation(group = "com.azure", name = "azure-ai-formrecognizer", version = "4.0.0") }
创建 Java 应用程序
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,你要通过 Azure 门户使用 key
创建一个 AzureKeyCredential
,并使用 AzureKeyCredential
和文档智能 endpoint
创建一个 DocumentAnalysisClient
实例。
在 form-recognizer-app 目录中,运行以下命令:
mkdir -p src/main/java
创建以下目录结构:
导航到
java
目录,创建一个名为 FormRecognizer.java 的文件。提示
可以使用 Powershell 创建新文件。 按住 Shift 键并右键单击文件夹,在项目目录中打开 PowerShell 窗口,然后输入以下命令:New-Item FormRecognizer.java。
打开 FormRecognizer.java 文件,选择以下代码示例之一,然后将其复制/粘贴到你的应用程序中:
- 预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
read
模型作为从文档中提取文本的基础。 - 预生成布局模型从文档和图像中提取文本和文本位置、表、选择标记和结构信息。
- 预生成 tax.us.w2 模型提取美国国内税收署 (IRS) 纳税表单上报告的信息。
- 预生成发票模型会从各种格式和质量的销售发票中提取关键字段和行项。
- 预生成收据模型提取印刷体和手写体销售收据中的关键信息。
- 预生成 idDocument 模型从美国驾驶执照、国际护照个人资料页、美国各州身份证、社会保障卡和永久居民卡中提取关键信息。
- 预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
键入以下命令:
gradle build gradle -PmainClass=FormRecognizer run
使用“读取”模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
//sample document
String documentUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png";
String modelId = "prebuilt-read";
SyncPoller < OperationResult, AnalyzeResult > analyzeLayoutResultPoller =
client.beginAnalyzeDocumentFromUrl(modelId, documentUrl);
AnalyzeResult analyzeLayoutResult = analyzeLayoutResultPoller.getFinalResult();
// pages
analyzeLayoutResult.getPages().forEach(documentPage -> {
System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
documentPage.getWidth(),
documentPage.getHeight(),
documentPage.getUnit());
// lines
documentPage.getLines().forEach(documentLine ->
System.out.printf("Line %s is within a bounding polygon %s.%n",
documentLine.getContent(),
documentLine.getBoundingPolygon().toString()));
// words
documentPage.getWords().forEach(documentWord ->
System.out.printf("Word '%s' has a confidence score of %.2f.%n",
documentWord.getContent(),
documentWord.getConfidence()));
});
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看 read
模型输出。
使用布局模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
//sample document
String layoutDocumentUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png";
String modelId = "prebuilt-layout";
SyncPoller < OperationResult, AnalyzeResult > analyzeLayoutResultPoller =
client.beginAnalyzeDocumentFromUrl(modelId, layoutDocumentUrl);
AnalyzeResult analyzeLayoutResult = analyzeLayoutResultPoller.getFinalResult();
// pages
analyzeLayoutResult.getPages().forEach(documentPage -> {
System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
documentPage.getWidth(),
documentPage.getHeight(),
documentPage.getUnit());
// lines
documentPage.getLines().forEach(documentLine ->
System.out.printf("Line %s is within a bounding polygon %s.%n",
documentLine.getContent(),
documentLine.getBoundingPolygon().toString()));
// words
documentPage.getWords().forEach(documentWord ->
System.out.printf("Word '%s' has a confidence score of %.2f%n",
documentWord.getContent(),
documentWord.getConfidence()));
// selection marks
documentPage.getSelectionMarks().forEach(documentSelectionMark ->
System.out.printf("Selection mark is '%s' and is within a bounding polygon %s with confidence %.2f.%n",
documentSelectionMark.getSelectionMarkState().toString(),
getBoundingCoordinates(documentSelectionMark.getBoundingPolygon()),
documentSelectionMark.getConfidence()));
});
// tables
List < DocumentTable > tables = analyzeLayoutResult.getTables();
for (int i = 0; i < tables.size(); i++) {
DocumentTable documentTables = tables.get(i);
System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTables.getRowCount(),
documentTables.getColumnCount());
documentTables.getCells().forEach(documentTableCell -> {
System.out.printf("Cell '%s', has row index %d and column index %d.%n", documentTableCell.getContent(),
documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
});
System.out.println();
}
}
// Utility function to get the bounding polygon coordinates.
private static String getBoundingCoordinates(List < Point > boundingPolygon) {
return boundingPolygon.stream().map(point -> String.format("[%.2f, %.2f]", point.getX(),
point.getY())).collect(Collectors.joining(", "));
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看布局模型输出。
使用“常规文档”模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
//sample document
String generalDocumentUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf";
String modelId = "prebuilt-document";
SyncPoller < OperationResult, AnalyzeResult > analyzeDocumentPoller =
client.beginAnalyzeDocumentFromUrl(modelId, generalDocumentUrl);
AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();
// pages
analyzeResult.getPages().forEach(documentPage -> {
System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
documentPage.getWidth(),
documentPage.getHeight(),
documentPage.getUnit());
// lines
documentPage.getLines().forEach(documentLine ->
System.out.printf("Line %s is within a bounding polygon %s.%n",
documentLine.getContent(),
documentLine.getBoundingPolygon().toString()));
// words
documentPage.getWords().forEach(documentWord ->
System.out.printf("Word %s has a confidence score of %.2f%n.",
documentWord.getContent(),
documentWord.getConfidence()));
});
// tables
List < DocumentTable > tab_les = analyzeResult.getTables();
for (int i = 0; i < tab_les.size(); i++) {
DocumentTable documentTable = tab_les.get(i);
System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
documentTable.getColumnCount());
documentTable.getCells().forEach(documentTableCell -> {
System.out.printf("Cell '%s', has row index %d and column index %d.%n",
documentTableCell.getContent(),
documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
});
System.out.println();
}
// Key-value pairs
analyzeResult.getKeyValuePairs().forEach(documentKeyValuePair -> {
System.out.printf("Key content: %s%n", documentKeyValuePair.getKey().getContent());
System.out.printf("Key content bounding region: %s%n",
documentKeyValuePair.getKey().getBoundingRegions().toString());
if (documentKeyValuePair.getValue() != null) {
System.out.printf("Value content: %s%n", documentKeyValuePair.getValue().getContent());
System.out.printf("Value content bounding region: %s%n", documentKeyValuePair.getValue().getBoundingRegions().toString());
}
});
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看常规文档模型输出。
使用 W-2 税务模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
// sample document
String w2Url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/w2.png";
String modelId = "prebuilt-tax.us.w2";
SyncPoller < OperationResult, AnalyzeResult > analyzeW2Poller =
client.beginAnalyzeDocumentFromUrl(modelId, w2Url);
AnalyzeResult analyzeTaxResult = analyzeW2Poller.getFinalResult();
for (int i = 0; i < analyzeTaxResult.getDocuments().size(); i++) {
AnalyzedDocument analyzedTaxDocument = analyzeTaxResult.getDocuments().get(i);
Map < String, DocumentField > taxFields = analyzedTaxDocument.getFields();
System.out.printf("----------- Analyzing Document %d -----------%n", i);
DocumentField w2FormVariantField = taxFields.get("W2FormVariant");
if (w2FormVariantField != null) {
if (DocumentFieldType.STRING == w2FormVariantField.getType()) {
String merchantName = w2FormVariantField.getValueAsString();
System.out.printf("Form variant: %s, confidence: %.2f%n",
merchantName, w2FormVariantField.getConfidence());
}
}
DocumentField employeeField = taxFields.get("Employee");
if (employeeField != null) {
System.out.println("Employee Data: ");
if (DocumentFieldType.MAP == employeeField.getType()) {
Map < String, DocumentField > employeeDataFieldMap = employeeField.getValueAsMap();
DocumentField employeeName = employeeDataFieldMap.get("Name");
if (employeeName != null) {
if (DocumentFieldType.STRING == employeeName.getType()) {
String employeesName = employeeName.getValueAsString();
System.out.printf("Employee Name: %s, confidence: %.2f%n",
employeesName, employeeName.getConfidence());
}
}
DocumentField employeeAddrField = employeeDataFieldMap.get("Address");
if (employeeAddrField != null) {
if (DocumentFieldType.STRING == employeeAddrField.getType()) {
String employeeAddress = employeeAddrField.getValueAsString();
System.out.printf("Employee Address: %s, confidence: %.2f%n",
employeeAddress, employeeAddrField.getConfidence());
}
}
}
}
DocumentField employerField = taxFields.get("Employer");
if (employerField != null) {
System.out.println("Employer Data: ");
if (DocumentFieldType.MAP == employerField.getType()) {
Map < String, DocumentField > employerDataFieldMap = employerField.getValueAsMap();
DocumentField employerNameField = employerDataFieldMap.get("Name");
if (employerNameField != null) {
if (DocumentFieldType.STRING == employerNameField.getType()) {
String employerName = employerNameField.getValueAsString();
System.out.printf("Employer Name: %s, confidence: %.2f%n",
employerName, employerNameField.getConfidence());
}
}
DocumentField employerIDNumberField = employerDataFieldMap.get("IdNumber");
if (employerIDNumberField != null) {
if (DocumentFieldType.STRING == employerIDNumberField.getType()) {
String employerIdNumber = employerIDNumberField.getValueAsString();
System.out.printf("Employee ID Number: %s, confidence: %.2f%n",
employerIdNumber, employerIDNumberField.getConfidence());
}
}
}
}
DocumentField taxYearField = taxFields.get("TaxYear");
if (taxYearField != null) {
if (DocumentFieldType.STRING == taxYearField.getType()) {
String taxYear = taxYearField.getValueAsString();
System.out.printf("Tax year: %s, confidence: %.2f%n",
taxYear, taxYearField.getConfidence());
}
}
DocumentField taxDateField = taxFields.get("TaxDate");
if (taxDateField != null) {
if (DocumentFieldType.DATE == taxDateField.getType()) {
LocalDate taxDate = taxDateField.getValueAsDate();
System.out.printf("Tax Date: %s, confidence: %.2f%n",
taxDate, taxDateField.getConfidence());
}
}
DocumentField socialSecurityTaxField = taxFields.get("SocialSecurityTaxWithheld");
if (socialSecurityTaxField != null) {
if (DocumentFieldType.DOUBLE == socialSecurityTaxField.getType()) {
Double socialSecurityTax = socialSecurityTaxField.getValueAsDouble();
System.out.printf("Social Security Tax withheld: %.2f, confidence: %.2f%n",
socialSecurityTax, socialSecurityTaxField.getConfidence());
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看 W-2 税务模型输出。
使用发票模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
// sample document
String invoiceUrl = "https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/rest-api/invoice.pdf";
String modelId = "prebuilt-invoice";
SyncPoller < OperationResult, AnalyzeResult > analyzeInvoicesPoller =
client.beginAnalyzeDocumentFromUrl(modelId, invoiceUrl);
AnalyzeResult analyzeInvoiceResult = analyzeInvoicesPoller.getFinalResult();
for (int i = 0; i < analyzeInvoiceResult.getDocuments().size(); i++) {
AnalyzedDocument analyzedInvoice = analyzeInvoiceResult.getDocuments().get(i);
Map < String, DocumentField > invoiceFields = analyzedInvoice.getFields();
System.out.printf("----------- Analyzing invoice %d -----------%n", i);
DocumentField vendorNameField = invoiceFields.get("VendorName");
if (vendorNameField != null) {
if (DocumentFieldType.STRING == vendorNameField.getType()) {
String merchantName = vendorNameField.getValueAsString();
System.out.printf("Vendor Name: %s, confidence: %.2f%n",
merchantName, vendorNameField.getConfidence());
}
}
DocumentField vendorAddressField = invoiceFields.get("VendorAddress");
if (vendorAddressField != null) {
if (DocumentFieldType.STRING == vendorAddressField.getType()) {
String merchantAddress = vendorAddressField.getValueAsString();
System.out.printf("Vendor address: %s, confidence: %.2f%n",
merchantAddress, vendorAddressField.getConfidence());
}
}
DocumentField customerNameField = invoiceFields.get("CustomerName");
if (customerNameField != null) {
if (DocumentFieldType.STRING == customerNameField.getType()) {
String merchantAddress = customerNameField.getValueAsString();
System.out.printf("Customer Name: %s, confidence: %.2f%n",
merchantAddress, customerNameField.getConfidence());
}
}
DocumentField customerAddressRecipientField = invoiceFields.get("CustomerAddressRecipient");
if (customerAddressRecipientField != null) {
if (DocumentFieldType.STRING == customerAddressRecipientField.getType()) {
String customerAddr = customerAddressRecipientField.getValueAsString();
System.out.printf("Customer Address Recipient: %s, confidence: %.2f%n",
customerAddr, customerAddressRecipientField.getConfidence());
}
}
DocumentField invoiceIdField = invoiceFields.get("InvoiceId");
if (invoiceIdField != null) {
if (DocumentFieldType.STRING == invoiceIdField.getType()) {
String invoiceId = invoiceIdField.getValueAsString();
System.out.printf("Invoice ID: %s, confidence: %.2f%n",
invoiceId, invoiceIdField.getConfidence());
}
}
DocumentField invoiceDateField = invoiceFields.get("InvoiceDate");
if (customerNameField != null) {
if (DocumentFieldType.DATE == invoiceDateField.getType()) {
LocalDate invoiceDate = invoiceDateField.getValueAsDate();
System.out.printf("Invoice Date: %s, confidence: %.2f%n",
invoiceDate, invoiceDateField.getConfidence());
}
}
DocumentField invoiceTotalField = invoiceFields.get("InvoiceTotal");
if (customerAddressRecipientField != null) {
if (DocumentFieldType.DOUBLE == invoiceTotalField.getType()) {
Double invoiceTotal = invoiceTotalField.getValueAsDouble();
System.out.printf("Invoice Total: %.2f, confidence: %.2f%n",
invoiceTotal, invoiceTotalField.getConfidence());
}
}
DocumentField invoiceItemsField = invoiceFields.get("Items");
if (invoiceItemsField != null) {
System.out.printf("Invoice Items: %n");
if (DocumentFieldType.LIST == invoiceItemsField.getType()) {
List < DocumentField > invoiceItems = invoiceItemsField.getValueAsList();
invoiceItems.stream()
.filter(invoiceItem -> DocumentFieldType.MAP == invoiceItem.getType())
.map(documentField -> documentField.getValueAsMap())
.forEach(documentFieldMap -> documentFieldMap.forEach((key, documentField) -> {
if ("Description".equals(key)) {
if (DocumentFieldType.STRING == documentField.getType()) {
String name = documentField.getValueAsString();
System.out.printf("Description: %s, confidence: %.2fs%n",
name, documentField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double quantity = documentField.getValueAsDouble();
System.out.printf("Quantity: %f, confidence: %.2f%n",
quantity, documentField.getConfidence());
}
}
if ("UnitPrice".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double unitPrice = documentField.getValueAsDouble();
System.out.printf("Unit Price: %f, confidence: %.2f%n",
unitPrice, documentField.getConfidence());
}
}
if ("ProductCode".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double productCode = documentField.getValueAsDouble();
System.out.printf("Product Code: %f, confidence: %.2f%n",
productCode, documentField.getConfidence());
}
}
}));
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看发票模型输出。
使用收据模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
String receiptUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/receipt.png";
String modelId = "prebuilt-receipt";
SyncPoller < OperationResult, AnalyzeResult > analyzeReceiptPoller =
client.beginAnalyzeDocumentFromUrl(modelId, receiptUrl);
AnalyzeResult receiptResults = analyzeReceiptPoller.getFinalResult();
for (int i = 0; i < receiptResults.getDocuments().size(); i++) {
AnalyzedDocument analyzedReceipt = receiptResults.getDocuments().get(i);
Map < String, DocumentField > receiptFields = analyzedReceipt.getFields();
System.out.printf("----------- Analyzing receipt info %d -----------%n", i);
DocumentField merchantNameField = receiptFields.get("MerchantName");
if (merchantNameField != null) {
if (DocumentFieldType.STRING == merchantNameField.getType()) {
String merchantName = merchantNameField.getValueAsString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n",
merchantName, merchantNameField.getConfidence());
}
}
DocumentField merchantPhoneNumberField = receiptFields.get("MerchantPhoneNumber");
if (merchantPhoneNumberField != null) {
if (DocumentFieldType.PHONE_NUMBER == merchantPhoneNumberField.getType()) {
String merchantAddress = merchantPhoneNumberField.getValueAsPhoneNumber();
System.out.printf("Merchant Phone number: %s, confidence: %.2f%n",
merchantAddress, merchantPhoneNumberField.getConfidence());
}
}
DocumentField merchantAddressField = receiptFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (DocumentFieldType.STRING == merchantAddressField.getType()) {
String merchantAddress = merchantAddressField.getValueAsString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n",
merchantAddress, merchantAddressField.getConfidence());
}
}
DocumentField transactionDateField = receiptFields.get("TransactionDate");
if (transactionDateField != null) {
if (DocumentFieldType.DATE == transactionDateField.getType()) {
LocalDate transactionDate = transactionDateField.getValueAsDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n",
transactionDate, transactionDateField.getConfidence());
}
}
DocumentField receiptItemsField = receiptFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (DocumentFieldType.LIST == receiptItemsField.getType()) {
List < DocumentField > receiptItems = receiptItemsField.getValueAsList();
receiptItems.stream()
.filter(receiptItem -> DocumentFieldType.MAP == receiptItem.getType())
.map(documentField -> documentField.getValueAsMap())
.forEach(documentFieldMap -> documentFieldMap.forEach((key, documentField) -> {
if ("Name".equals(key)) {
if (DocumentFieldType.STRING == documentField.getType()) {
String name = documentField.getValueAsString();
System.out.printf("Name: %s, confidence: %.2fs%n",
name, documentField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double quantity = documentField.getValueAsDouble();
System.out.printf("Quantity: %f, confidence: %.2f%n",
quantity, documentField.getConfidence());
}
}
if ("Price".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double price = documentField.getValueAsDouble();
System.out.printf("Price: %f, confidence: %.2f%n",
price, documentField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (DocumentFieldType.DOUBLE == documentField.getType()) {
Double totalPrice = documentField.getValueAsDouble();
System.out.printf("Total Price: %f, confidence: %.2f%n",
totalPrice, documentField.getConfidence());
}
}
}));
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看收据模型输出。
使用 ID 文档模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
//sample document
String licenseUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/identity_documents.png";
String modelId = "prebuilt-idDocument";
SyncPoller < OperationResult, AnalyzeResult > analyzeIdentityDocumentPoller = client.beginAnalyzeDocumentFromUrl(modelId, licenseUrl);
AnalyzeResult identityDocumentResults = analyzeIdentityDocumentPoller.getFinalResult();
for (int i = 0; i < identityDocumentResults.getDocuments().size(); i++) {
AnalyzedDocument analyzedIDDocument = identityDocumentResults.getDocuments().get(i);
Map < String, DocumentField > licenseFields = analyzedIDDocument.getFields();
System.out.printf("----------- Analyzed license info for page %d -----------%n", i);
DocumentField addressField = licenseFields.get("Address");
if (addressField != null) {
if (DocumentFieldType.STRING == addressField.getType()) {
String address = addressField.getValueAsString();
System.out.printf("Address: %s, confidence: %.2f%n",
address, addressField.getConfidence());
}
}
DocumentField countryRegionDocumentField = licenseFields.get("CountryRegion");
if (countryRegionDocumentField != null) {
if (DocumentFieldType.STRING == countryRegionDocumentField.getType()) {
String countryRegion = countryRegionDocumentField.getValueAsCountry();
System.out.printf("Country or region: %s, confidence: %.2f%n",
countryRegion, countryRegionDocumentField.getConfidence());
}
}
DocumentField dateOfBirthField = licenseFields.get("DateOfBirth");
if (dateOfBirthField != null) {
if (DocumentFieldType.DATE == dateOfBirthField.getType()) {
LocalDate dateOfBirth = dateOfBirthField.getValueAsDate();
System.out.printf("Date of Birth: %s, confidence: %.2f%n",
dateOfBirth, dateOfBirthField.getConfidence());
}
}
DocumentField dateOfExpirationField = licenseFields.get("DateOfExpiration");
if (dateOfExpirationField != null) {
if (DocumentFieldType.DATE == dateOfExpirationField.getType()) {
LocalDate expirationDate = dateOfExpirationField.getValueAsDate();
System.out.printf("Document date of expiration: %s, confidence: %.2f%n",
expirationDate, dateOfExpirationField.getConfidence());
}
}
DocumentField documentNumberField = licenseFields.get("DocumentNumber");
if (documentNumberField != null) {
if (DocumentFieldType.STRING == documentNumberField.getType()) {
String documentNumber = documentNumberField.getValueAsString();
System.out.printf("Document number: %s, confidence: %.2f%n",
documentNumber, documentNumberField.getConfidence());
}
}
DocumentField firstNameField = licenseFields.get("FirstName");
if (firstNameField != null) {
if (DocumentFieldType.STRING == firstNameField.getType()) {
String firstName = firstNameField.getValueAsString();
System.out.printf("First Name: %s, confidence: %.2f%n",
firstName, documentNumberField.getConfidence());
}
}
DocumentField lastNameField = licenseFields.get("LastName");
if (lastNameField != null) {
if (DocumentFieldType.STRING == lastNameField.getType()) {
String lastName = lastNameField.getValueAsString();
System.out.printf("Last name: %s, confidence: %.2f%n",
lastName, lastNameField.getConfidence());
}
}
DocumentField regionField = licenseFields.get("Region");
if (regionField != null) {
if (DocumentFieldType.STRING == regionField.getType()) {
String region = regionField.getValueAsString();
System.out.printf("Region: %s, confidence: %.2f%n",
region, regionField.getConfidence());
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看 ID 文档模型输出。
使用名片模型
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.documentanalysis.models.*;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClient;
import com.azure.ai.formrecognizer.documentanalysis.DocumentAnalysisClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.io.IOException;
import java.util.List;
import java.util.Arrays;
import java.time.LocalDate;
import java.util.Map;
import java.util.stream.Collectors;
public class FormRecognizer {
//use your `key` and `endpoint` environment variables
private static final String key = System.getenv("FR_KEY");
private static final String endpoint = System.getenv("FR_ENDPOINT");
public static void main(final String[] args) {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
DocumentAnalysisClient client = new DocumentAnalysisClientBuilder()
.credential(new AzureKeyCredential(key))
.endpoint(endpoint)
.buildClient();
//sample document
String businessCardUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/de5e0d8982ab754823c54de47a47e8e499351523/curl/form-recognizer/rest-api/business_card.jpg";
String modelId = "prebuilt-businessCard";
SyncPoller < OperationResult, AnalyzeResult > analyzeBusinessCardPoller = client.beginAnalyzeDocumentFromUrl(modelId, businessCardUrl);
AnalyzeResult businessCardPageResults = analyzeBusinessCardPoller.getFinalResult();
for (int i = 0; i < businessCardPageResults.getDocuments().size(); i++) {
System.out.printf("--------Analyzing business card %d -----------%n", i);
AnalyzedDocument analyzedBusinessCard = businessCardPageResults.getDocuments().get(i);
Map < String, DocumentField > businessCardFields = analyzedBusinessCard.getFields();
DocumentField contactNamesDocumentField = businessCardFields.get("ContactNames");
if (contactNamesDocumentField != null) {
if (DocumentFieldType.LIST == contactNamesDocumentField.getType()) {
List < DocumentField > contactNamesList = contactNamesDocumentField.getValueAsList();
contactNamesList.stream()
.filter(contactName -> DocumentFieldType.MAP == contactName.getType())
.map(contactName -> {
System.out.printf("Contact name: %s%n", contactName.getContent());
return contactName.getValueAsMap();
})
.forEach(contactNamesMap -> contactNamesMap.forEach((key, contactName) -> {
if ("FirstName".equals(key)) {
if (DocumentFieldType.STRING == contactName.getType()) {
String firstName = contactName.getValueAsString();
System.out.printf("\tFirst Name: %s, confidence: %.2f%n",
firstName, contactName.getConfidence());
}
}
if ("LastName".equals(key)) {
if (DocumentFieldType.STRING == contactName.getType()) {
String lastName = contactName.getValueAsString();
System.out.printf("\tLast Name: %s, confidence: %.2f%n",
lastName, contactName.getConfidence());
}
}
}));
}
}
DocumentField jobTitles = businessCardFields.get("JobTitles");
if (jobTitles != null) {
if (DocumentFieldType.LIST == jobTitles.getType()) {
List < DocumentField > jobTitlesItems = jobTitles.getValueAsList();
jobTitlesItems.forEach(jobTitlesItem -> {
if (DocumentFieldType.STRING == jobTitlesItem.getType()) {
String jobTitle = jobTitlesItem.getValueAsString();
System.out.printf("Job Title: %s, confidence: %.2f%n",
jobTitle, jobTitlesItem.getConfidence());
}
});
}
}
DocumentField departments = businessCardFields.get("Departments");
if (departments != null) {
if (DocumentFieldType.LIST == departments.getType()) {
List < DocumentField > departmentsItems = departments.getValueAsList();
departmentsItems.forEach(departmentsItem -> {
if (DocumentFieldType.STRING == departmentsItem.getType()) {
String department = departmentsItem.getValueAsString();
System.out.printf("Department: %s, confidence: %.2f%n",
department, departmentsItem.getConfidence());
}
});
}
}
DocumentField emails = businessCardFields.get("Emails");
if (emails != null) {
if (DocumentFieldType.LIST == emails.getType()) {
List < DocumentField > emailsItems = emails.getValueAsList();
emailsItems.forEach(emailsItem -> {
if (DocumentFieldType.STRING == emailsItem.getType()) {
String email = emailsItem.getValueAsString();
System.out.printf("Email: %s, confidence: %.2f%n", email, emailsItem.getConfidence());
}
});
}
}
DocumentField websites = businessCardFields.get("Websites");
if (websites != null) {
if (DocumentFieldType.LIST == websites.getType()) {
List < DocumentField > websitesItems = websites.getValueAsList();
websitesItems.forEach(websitesItem -> {
if (DocumentFieldType.STRING == websitesItem.getType()) {
String website = websitesItem.getValueAsString();
System.out.printf("Web site: %s, confidence: %.2f%n",
website, websitesItem.getConfidence());
}
});
}
}
DocumentField mobilePhones = businessCardFields.get("MobilePhones");
if (mobilePhones != null) {
if (DocumentFieldType.LIST == mobilePhones.getType()) {
List < DocumentField > mobilePhonesItems = mobilePhones.getValueAsList();
mobilePhonesItems.forEach(mobilePhonesItem -> {
if (DocumentFieldType.PHONE_NUMBER == mobilePhonesItem.getType()) {
String mobilePhoneNumber = mobilePhonesItem.getValueAsPhoneNumber();
System.out.printf("Mobile phone number: %s, confidence: %.2f%n",
mobilePhoneNumber, mobilePhonesItem.getConfidence());
}
});
}
}
DocumentField otherPhones = businessCardFields.get("OtherPhones");
if (otherPhones != null) {
if (DocumentFieldType.LIST == otherPhones.getType()) {
List < DocumentField > otherPhonesItems = otherPhones.getValueAsList();
otherPhonesItems.forEach(otherPhonesItem -> {
if (DocumentFieldType.PHONE_NUMBER == otherPhonesItem.getType()) {
String otherPhoneNumber = otherPhonesItem.getValueAsPhoneNumber();
System.out.printf("Other phone number: %s, confidence: %.2f%n",
otherPhoneNumber, otherPhonesItem.getConfidence());
}
});
}
}
DocumentField faxes = businessCardFields.get("Faxes");
if (faxes != null) {
if (DocumentFieldType.LIST == faxes.getType()) {
List < DocumentField > faxesItems = faxes.getValueAsList();
faxesItems.forEach(faxesItem -> {
if (DocumentFieldType.PHONE_NUMBER == faxesItem.getType()) {
String faxPhoneNumber = faxesItem.getValueAsPhoneNumber();
System.out.printf("Fax phone number: %s, confidence: %.2f%n",
faxPhoneNumber, faxesItem.getConfidence());
}
});
}
}
DocumentField addresses = businessCardFields.get("Addresses");
if (addresses != null) {
if (DocumentFieldType.LIST == addresses.getType()) {
List < DocumentField > addressesItems = addresses.getValueAsList();
addressesItems.forEach(addressesItem -> {
if (DocumentFieldType.STRING == addressesItem.getType()) {
String address = addressesItem.getValueAsString();
System.out
.printf("Address: %s, confidence: %.2f%n", address, addressesItem.getConfidence());
}
});
}
}
DocumentField companyName = businessCardFields.get("CompanyNames");
if (companyName != null) {
if (DocumentFieldType.LIST == companyName.getType()) {
List < DocumentField > companyNameItems = companyName.getValueAsList();
companyNameItems.forEach(companyNameItem -> {
if (DocumentFieldType.STRING == companyNameItem.getType()) {
String companyNameValue = companyNameItem.getValueAsString();
System.out.printf("Company name: %s, confidence: %.2f%n", companyNameValue,
companyNameItem.getConfidence());
}
});
}
}
}
}
}
请访问 GitHub 上的 Azure 示例存储库,并查看名片模型输出。
先决条件
Azure 订阅 - 创建试用订阅。
最新版本的 Visual Studio Code 或者你首选的 IDE。 有关详细信息,请参阅 Visual Studio Code 中的 Node.js。
Node.js 的最新
LTS
版本。Azure AI 服务或文档智能资源。 创建“单一服务或多个服务。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。提示
如果计划通过一个终结点和密钥访问多个 Azure AI 服务,请创建 Azure AI 服务资源。 请创建仅供文档智能访问的文档智能资源。 若要使用 Microsoft Entra 身份验证,则需要单服务资源。
从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
URL 中的文档文件。 对于本项目,可以使用下表中为每项功能提供的示例表单:
功能 modelID document-url “读取”模型 prebuilt-read 示例手册 布局模型 预生成布局 预订确认示例 W-2 窗体模型 prebuilt-tax.us.w2 示例 W-2 表单 发票模型 预生成的发票 示例发票 收据模型 prebuilt-receipt 示例收据 ID 文档模型 prebuilt-idDocument 示例身份证明文档 名片模型 prebuilt-businessCard 名片示例
设置环境变量
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,请使用 Azure 门户中的 key
和 endpoint
实例化客户端。 对于此项目,使用环境变量来存储和访问凭据。
重要
如果使用 API 密钥,请将其安全地存储在某个其他位置,例如 Azure Key Vault 中。 请不要直接在代码中包含 API 密钥,并且切勿公开发布该密钥。
有关 Azure AI 服务安全性的详细信息,请参阅对 Azure AI 服务的请求进行身份验证。
若要为文档智能资源密钥设置环境变量,请打开控制台窗口,按照操作系统和开发环境的说明进行操作。 将 <yourKey> 和 <yourEndpoint> 替换为 Azure 门户中资源的值。
环境变量不区分大小写。 它们通常以大写形式声明,其中包含的单词以下划线联接。 请根据命令提示符运行以下命令:
设置密钥变量:
setx DI_KEY <yourKey>
设置终结点变量
setx DI_ENDPOINT <yourEndpoint>
设置环境变量后关闭命令提示符窗口。 这些值将一直保留,直到你再次更改它们。
重启任何读取环境变量的运行程序。 例如,如果正在使用 Visual Studio 或 Visual Studio Code 作为编辑器,请在运行示例代码之前重启它。
下面是其他一些可与环境变量一起使用的有用命令:
命令 | 操作 | 示例 |
---|---|---|
setx VARIABLE_NAME= |
通过将环境变量值设置为空字符串来删除环境变量。 | setx DI_KEY= |
setx VARIABLE_NAME=value |
设置或更改环境变量的值。 | setx DI_KEY=<yourKey> |
set VARIABLE_NAME |
显示特定环境变量的值。 | set DI_KEY |
set |
显示所有环境变量。 | set |
设置编程环境
创建 Node.js Express 应用程序。
在控制台窗口中,为名为“
form-recognizer-app
”的应用创建一个新目录,然后导航到该目录。mkdir form-recognizer-app cd form-recognizer-app
运行
npm init
命令以初始化应用程序并为项目构建基架。npm init
使用终端中提供的提示指定项目的属性。
- 最重要的属性包括名称、版本号和入口点。
- 建议保留
index.js
作为入口点名称。 描述、测试命令、GitHub 存储库、关键字、作者和许可证信息均为可选属性。 可在此项目中跳过它们。 - 选择“Enter”以接受括号中的建议。
完成提示后,命令会在
package.json
form-recognizer-app 目录中创建一个 文件。安装
ai-form-recognizer
客户端库和azure/identity
npm 包:npm i @azure/ai-form-recognizer @azure/identity
应用的 package.json 文件将随依赖项进行更新。
在应用程序目录中创建一个名为“index.js”的文件。
提示
可以使用 Powershell 创建新文件。 按住 Shift 键并右键单击文件夹,在项目目录中打开 PowerShell 窗口,然后输入以下命令:New-Item index.js。
生成应用程序
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,你要通过 Azure 门户使用你的密钥创建一个 AzureKeyCredential
,并使用 AzureKeyCredential
和文档智能终结点创建一个 DocumentAnalysisClient
实例。
在 Visual Studio Code 或你喜爱的 IDE 中打开 index.js
文件,选择以下代码示例之一,然后将其复制/粘贴到你的应用程序中:
- 预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
read
模型作为从文档中提取文本的基础。 - 预生成布局模型从文档和图像中提取文本和文本位置、表、选择标记和结构信息。
- 预生成 tax.us.w2 模型提取美国国内税收署 (IRS) 纳税表单上报告的信息。
- prebuilt-invoice 模型会提取美国国内税收署纳税表单上报告的信息。
- 预生成收据模型提取印刷体和手写体销售收据中的关键信息。
- 预生成 idDocument 模型从美国驾驶执照、国际护照个人资料页、美国各州身份证、社会保障卡和永久居民卡中提取关键信息。
使用“读取”模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample document
const documentUrlRead = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png"
// helper function
function* getTextOfSpans(content, spans) {
for (const span of spans) {
yield content.slice(span.offset, span.offset + span.length);
}
}
async function main() {
// create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-read", documentUrlRead);
const {
content,
pages,
languages,
styles
} = await poller.pollUntilDone();
if (pages.length <= 0) {
console.log("No pages were extracted from the document.");
} else {
console.log("Pages:");
for (const page of pages) {
console.log("- Page", page.pageNumber, `(unit: ${page.unit})`);
console.log(` ${page.width}x${page.height}, angle: ${page.angle}`);
console.log(` ${page.lines.length} lines, ${page.words.length} words`);
if (page.lines.length > 0) {
console.log(" Lines:");
for (const line of page.lines) {
console.log(` - "${line.content}"`);
// The words of the line can also be iterated independently. The words are computed based on their
// corresponding spans.
for (const word of line.words()) {
console.log(` - "${word.content}"`);
}
}
}
}
}
if (languages.length <= 0) {
console.log("No language spans were extracted from the document.");
} else {
console.log("Languages:");
for (const languageEntry of languages) {
console.log(
`- Found language: ${languageEntry.languageCode} (confidence: ${languageEntry.confidence})`
);
for (const text of getTextOfSpans(content, languageEntry.spans)) {
const escapedText = text.replace(/\r?\n/g, "\\n").replace(/"/g, '\\"');
console.log(` - "${escapedText}"`);
}
}
}
if (styles.length <= 0) {
console.log("No text styles were extracted from the document.");
} else {
console.log("Styles:");
for (const style of styles) {
console.log(
`- Handwritten: ${style.isHandwritten ? "yes" : "no"} (confidence=${style.confidence})`
);
for (const word of getTextOfSpans(content, style.spans)) {
console.log(` - "${word}"`);
}
}
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看 read
模型输出。
使用布局模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample document
const layoutUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png"
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocumentFromUrl(
"prebuilt-layout", layoutUrl);
// Layout extraction produces basic elements such as pages, words, lines, etc. as well as information about the
// appearance (styles) of textual elements.
const { pages, tables } = await poller.pollUntilDone();
if (!pages || pages.length <= 0) {
console.log("No pages were extracted from the document.");
} else {
console.log("Pages:");
for (const page of pages) {
console.log("- Page", page.pageNumber, `(unit: ${page.unit})`);
console.log(` ${page.width}x${page.height}, angle: ${page.angle}`);
console.log(
` ${page.lines && page.lines.length} lines, ${page.words && page.words.length} words`
);
if (page.lines && page.lines.length > 0) {
console.log(" Lines:");
for (const line of page.lines) {
console.log(` - "${line.content}"`);
// The words of the line can also be iterated independently. The words are computed based on their
// corresponding spans.
for (const word of line.words()) {
console.log(` - "${word.content}"`);
}
}
}
}
}
if (!tables || tables.length <= 0) {
console.log("No tables were extracted from the document.");
} else {
console.log("Tables:");
for (const table of tables) {
console.log(
`- Extracted table: ${table.columnCount} columns, ${table.rowCount} rows (${table.cells.length} cells)`
);
}
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看布局模型输出。
使用“常规文档”模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample document
const documentUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf"
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocumentFromUrl("prebuilt-document", documentUrl);
const {
keyValuePairs
} = await poller.pollUntilDone();
if (!keyValuePairs || keyValuePairs.length <= 0) {
console.log("No key-value pairs were extracted from the document.");
} else {
console.log("Key-Value Pairs:");
for (const {
key,
value,
confidence
} of keyValuePairs) {
console.log("- Key :", `"${key.content}"`);
console.log(" Value:", `"${(value && value.content) || "<undefined>"}" (${confidence})`);
}
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看常规文档模型输出。
使用 W-2 税务模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
const w2DocumentURL = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/w2.png"
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-tax.us.w2", w2DocumentURL);
const {
documents: [result]
} = await poller.pollUntilDone();
if (result) {
const { Employee, Employer, ControlNumber, TaxYear, AdditionalInfo } = result.fields;
if (Employee) {
const { Name, Address, SocialSecurityNumber } = Employee.properties;
console.log("Employee:");
console.log(" Name:", Name && Name.content);
console.log(" SSN/TIN:", SocialSecurityNumber && SocialSecurityNumber.content);
if (Address && Address.value) {
const { streetAddress, postalCode } = Address.value;
console.log(" Address:");
console.log(" Street Address:", streetAddress);
console.log(" Postal Code:", postalCode);
}
} else {
console.log("No employee information extracted.");
}
if (Employer) {
const { Name, Address, IdNumber } = Employer.properties;
console.log("Employer:");
console.log(" Name:", Name && Name.content);
console.log(" ID (EIN):", IdNumber && IdNumber.content);
if (Address && Address.value) {
const { streetAddress, postalCode } = Address.value;
console.log(" Address:");
console.log(" Street Address:", streetAddress);
console.log(" Postal Code:", postalCode);
}
} else {
console.log("No employer information extracted.");
}
console.log("Control Number:", ControlNumber && ControlNumber.content);
console.log("Tax Year:", TaxYear && TaxYear.content);
if (AdditionalInfo) {
console.log("Additional Info:");
for (const info of AdditionalInfo.values) {
const { LetterCode, Amount } = info.properties;
console.log(`- ${LetterCode && LetterCode.content}: ${Amount && Amount.content}`);
}
}
} else {
throw new Error("Expected at least one document in the result.");
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看 W-2 税务模型输出。
使用发票模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample url
const invoiceUrl = "https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/rest-api/invoice.pdf";
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-invoice", invoiceUrl);
const {
documents: [result]
} = await poller.pollUntilDone();
if (result) {
const invoice = result.fields;
console.log("Vendor Name:", invoice.VendorName?.content);
console.log("Customer Name:", invoice.CustomerName?.content);
console.log("Invoice Date:", invoice.InvoiceDate?.content);
console.log("Due Date:", invoice.DueDate?.content);
console.log("Items:");
for (const {
properties: item
} of invoice.Items?.values ?? []) {
console.log("-", item.ProductCode?.content ?? "<no product code>");
console.log(" Description:", item.Description?.content);
console.log(" Quantity:", item.Quantity?.content);
console.log(" Date:", item.Date?.content);
console.log(" Unit:", item.Unit?.content);
console.log(" Unit Price:", item.UnitPrice?.content);
console.log(" Tax:", item.Tax?.content);
console.log(" Amount:", item.Amount?.content);
}
console.log("Subtotal:", invoice.SubTotal?.content);
console.log("Previous Unpaid Balance:", invoice.PreviousUnpaidBalance?.content);
console.log("Tax:", invoice.TotalTax?.content);
console.log("Amount Due:", invoice.AmountDue?.content);
} else {
throw new Error("Expected at least one receipt in the result.");
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看发票模型输出。
使用收据模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample url
const receiptUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/receipt.png";
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-receipt", receiptUrl);
const {
documents: [result]
} = await poller.pollUntilDone();
if (result) {
const {
MerchantName,
Items,
Total
} = result.fields;
console.log("=== Receipt Information ===");
console.log("Type:", result.docType);
console.log("Merchant:", MerchantName && MerchantName.content);
console.log("Items:");
for (const item of (Items && Items.values) || []) {
const {
Description,
TotalPrice
} = item.properties;
console.log("- Description:", Description && Description.content);
console.log(" Total Price:", TotalPrice && TotalPrice.content);
}
console.log("Total:", Total && Total.content);
} else {
throw new Error("Expected at least one receipt in the result.");
}
}
main().catch((err) => {
console.error("The sample encountered an error:", err);
});
请访问 GitHub 上的 Azure 示例存储库,并查看收据模型输出。
使用 ID 文档模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample document
const idDocumentURL = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/identity_documents.png"
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-idDocument", idDocumentURL);
const {
documents: [result]
} = await poller.pollUntilDone();
if (result) {
// The identity document model has multiple document types, so we need to know which document type was actually
extracted.
if (result.docType === "idDocument.driverLicense") {
const { FirstName, LastName, DocumentNumber, DateOfBirth, DateOfExpiration, Height, Weight, EyeColor, Endorsements, Restrictions, VehicleClassifications} = result.fields;
// For the sake of the example, we'll only show a few of the fields that are produced.
console.log("Extracted a Driver License:");
console.log(" Name:", FirstName && FirstName.content, LastName && LastName.content);
console.log(" License No.:", DocumentNumber && DocumentNumber.content);
console.log(" Date of Birth:", DateOfBirth && DateOfBirth.content);
console.log(" Expiration:", DateOfExpiration && DateOfExpiration.content);
console.log(" Height:", Height && Height.content);
console.log(" Weight:", Weight && Weight.content);
console.log(" Eye color:", EyeColor && EyeColor.content);
console.log(" Restrictions:", Restrictions && Restrictions.content);
console.log(" Endorsements:", Endorsements && Endorsements.content);
console.log(" Class:", VehicleClassifications && VehicleClassifications.content);
} else if (result.docType === "idDocument.passport") {
// The passport document type extracts and parses the Passport's machine-readable zone
if (!result.fields.machineReadableZone) {
throw new Error("No Machine Readable Zone extracted from passport.");
}
const {
FirstName,
LastName,
DateOfBirth,
Nationality,
DocumentNumber,
CountryRegion,
DateOfExpiration,
} = result.fields.machineReadableZone.properties;
console.log("Extracted a Passport:");
console.log(" Name:", FirstName && FirstName.content, LastName && LastName.content);
console.log(" Date of Birth:", DateOfBirth && DateOfBirth.content);
console.log(" Nationality:", Nationality && natiNationalityonality.content);
console.log(" Passport No.:", DocumentNumber && DocumentNumber.content);
console.log(" Issuer:", CountryRegion && CountryRegion.content);
console.log(" Expiration Date:", DateOfExpiration && DateOfExpiration.content);
} else {
// The only reason this would happen is if the client library's schema for the prebuilt identity document model is
out of date, and a new document type has been introduced.
console.error("Unknown document type in result:", result);
}
} else {
throw new Error("Expected at least one receipt in the result.");
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看 ID 文档模型输出。
使用名片模型
const { AzureKeyCredential, DocumentAnalysisClient } = require("@azure/ai-form-recognizer");
//use your `key` and `endpoint` environment variables
const key = process.env['FR_KEY'];
const endpoint = process.env['FR_ENDPOINT'];
// sample document
const businessCardURL = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/de5e0d8982ab754823c54de47a47e8e499351523/curl/form-recognizer/rest-api/business_card.jpg"
async function main() {
const client = new DocumentAnalysisClient(endpoint, new AzureKeyCredential(key));
const poller = await client.beginAnalyzeDocument("prebuilt-businessCard", businessCardURL);
const {
documents: [result]
} = await poller.pollUntilDone();
if (result) {
const businessCard = result.fields;
console.log("=== Business Card Information ===");
// There are more fields than just these few, and the model allows for multiple contact & company names as well as
// phone numbers, though we'll only show the first extracted values here.
const name = businessCard.ContactNames && businessCard.ContactNames.values[0];
if (name) {
const {
FirstName,
LastName
} = name.properties;
console.log("Name:", FirstName && FirstName.content, LastName && LastName.content);
}
const company = businessCard.CompanyNames && businessCard.CompanyNames.values[0];
if (company) {
console.log("Company:", company.content);
}
const address = businessCard.Addresses && businessCard.Addresses.values[0];
if (address) {
console.log("Address:", address.content);
}
const jobTitle = businessCard.JobTitles && businessCard.JobTitles.values[0];
if (jobTitle) {
console.log("Job title:", jobTitle.content);
}
const department = businessCard.Departments && businessCard.Departments.values[0];
if (department) {
console.log("Department:", department.content);
}
const email = businessCard.Emails && businessCard.Emails.values[0];
if (email) {
console.log("Email:", email.content);
}
const workPhone = businessCard.WorkPhones && businessCard.WorkPhones.values[0];
if (workPhone) {
console.log("Work phone:", workPhone.content);
}
const website = businessCard.Websites && businessCard.Websites.values[0];
if (website) {
console.log("Website:", website.content);
}
} else {
throw new Error("Expected at least one business card in the result.");
}
}
main().catch((error) => {
console.error("An error occurred:", error);
process.exit(1);
});
请访问 GitHub 上的 Azure 示例存储库,并查看名片模型输出。
先决条件
Azure 订阅 - 创建试用订阅。
Python 3.7 或更高版本。 你的 Python 安装应包含 pip。 可以通过在命令行上运行
pip --version
来检查是否安装了 pip。 通过安装最新版本的 Python 获取 pip。最新版本的 Visual Studio Code 或者你首选的 IDE。 请参阅 Visual Studio Code 中的 Python 使用入门。
Azure AI 服务或文档智能资源。 创建“单一服务或多个服务。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
URL 中的文档文件。 对于本项目,可以使用下表中为每项功能提供的示例表单:
功能 modelID document-url “读取”模型 prebuilt-read 示例手册 布局模型 预生成布局 预订确认示例 W-2 窗体模型 prebuilt-tax.us.w2 示例 W-2 表单 发票模型 预生成的发票 示例发票 收据模型 prebuilt-receipt 示例收据 ID 文档模型 prebuilt-idDocument 示例身份证明文档 名片模型 prebuilt-businessCard 名片示例
设置环境变量
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,请使用 Azure 门户中的 key
和 endpoint
实例化客户端。 对于此项目,使用环境变量来存储和访问凭据。
重要
如果使用 API 密钥,请将其安全地存储在某个其他位置,例如 Azure Key Vault 中。 请不要直接在代码中包含 API 密钥,并且切勿公开发布该密钥。
有关 Azure AI 服务安全性的详细信息,请参阅对 Azure AI 服务的请求进行身份验证。
若要为文档智能资源密钥设置环境变量,请打开控制台窗口,按照操作系统和开发环境的说明进行操作。 将 <yourKey> 和 <yourEndpoint> 替换为 Azure 门户中资源的值。
环境变量不区分大小写。 它们通常以大写形式声明,其中包含的单词以下划线联接。 请根据命令提示符运行以下命令:
设置密钥变量:
setx DI_KEY <yourKey>
设置终结点变量
setx DI_ENDPOINT <yourEndpoint>
设置环境变量后关闭命令提示符窗口。 这些值将一直保留,直到你再次更改它们。
重启任何读取环境变量的运行程序。 例如,如果正在使用 Visual Studio 或 Visual Studio Code 作为编辑器,请在运行示例代码之前重启它。
下面是其他一些可与环境变量一起使用的有用命令:
命令 | 操作 | 示例 |
---|---|---|
setx VARIABLE_NAME= |
通过将环境变量值设置为空字符串来删除环境变量。 | setx DI_KEY= |
setx VARIABLE_NAME=value |
设置或更改环境变量的值。 | setx DI_KEY=<yourKey> |
set VARIABLE_NAME |
显示特定环境变量的值。 | set DI_KEY |
set |
显示所有环境变量。 | set |
设置编程环境
在本地环境中打开一个控制台窗口,并使用 pip 安装适用于 Python 的 Azure AI 文档智能客户端库:
pip install azure-ai-formrecognizer==3.2.0
创建 Python 应用程序
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,你要通过 Azure 门户使用你的密钥创建一个 AzureKeyCredential
,并使用 AzureKeyCredential
和文档智能终结点创建一个 DocumentAnalysisClient
实例。
在编辑器或 IDE 中创建一个名为 form_recognizer_quickstart.py 的新 Python 文件。
打开 form_recognizer_quickstart.py 文件,选择以下代码示例之一,然后将其复制/粘贴到你的应用程序中:
- 预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
read
模型作为从文档中提取文本的基础。 - 预生成布局模型从文档和图像中提取文本和文本位置、表、选择标记和结构信息。
- 预生成 tax.us.w2 模型提取美国国内税收署 (IRS) 纳税表单上报告的信息。
- 预生成发票模型会从各种格式和质量的销售发票中提取关键字段和行项。
- 预生成收据模型提取印刷体和手写体销售收据中的关键信息。
- 预生成 idDocument 模型从美国驾驶执照、国际护照个人资料页、美国各州身份证、社会保障卡和永久居民卡中提取关键信息。
- 预生成读取模型是所有文档智能模型的核心,可以检测行、字词、位置和语言。 布局、常规文档、预生成和自定义模型都使用
从命令提示符运行 Python 代码。
python form_recognizer_quickstart.py
使用“读取”模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
# formatting function
def format_polygon(polygon):
if not polygon:
return "N/A"
return ", ".join(["[{}, {}]".format(p.x, p.y) for p in polygon])
def analyze_read():
# sample document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-read", formUrl
)
result = poller.result()
print("Document contains content: ", result.content)
for idx, style in enumerate(result.styles):
print(
"Document contains {} content".format(
"handwritten" if style.is_handwritten else "no handwritten"
)
)
for page in result.pages:
print("----Analyzing Read from page #{}----".format(page.page_number))
print(
"Page has width: {} and height: {}, measured with unit: {}".format(
page.width, page.height, page.unit
)
)
for line_idx, line in enumerate(page.lines):
print(
"...Line # {} has text content '{}' within bounding box '{}'".format(
line_idx,
line.content,
format_polygon(line.polygon),
)
)
for word in page.words:
print(
"...Word '{}' has a confidence of {}".format(
word.content, word.confidence
)
)
print("----------------------------------------")
if __name__ == "__main__":
analyze_read()
请访问 GitHub 上的 Azure 示例存储库,并查看 read
模型输出。
使用布局模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
# formatting function
def format_polygon(polygon):
if not polygon:
return "N/A"
return ", ".join(["[{}, {}]".format(p.x, p.y) for p in polygon])
def analyze_layout():
# sample document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-layout", formUrl
)
result = poller.result()
for idx, style in enumerate(result.styles):
print(
"Document contains {} content".format(
"handwritten" if style.is_handwritten else "no handwritten"
)
)
for page in result.pages:
print("----Analyzing layout from page #{}----".format(page.page_number))
print(
"Page has width: {} and height: {}, measured with unit: {}".format(
page.width, page.height, page.unit
)
)
for line_idx, line in enumerate(page.lines):
words = line.get_words()
print(
"...Line # {} has word count {} and text '{}' within bounding box '{}'".format(
line_idx,
len(words),
line.content,
format_polygon(line.polygon),
)
)
for word in words:
print(
"......Word '{}' has a confidence of {}".format(
word.content, word.confidence
)
)
for selection_mark in page.selection_marks:
print(
"...Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
selection_mark.state,
format_polygon(selection_mark.polygon),
selection_mark.confidence,
)
)
for table_idx, table in enumerate(result.tables):
print(
"Table # {} has {} rows and {} columns".format(
table_idx, table.row_count, table.column_count
)
)
for region in table.bounding_regions:
print(
"Table # {} location on page: {} is {}".format(
table_idx,
region.page_number,
format_polygon(region.polygon),
)
)
for cell in table.cells:
print(
"...Cell[{}][{}] has content '{}'".format(
cell.row_index,
cell.column_index,
cell.content,
)
)
for region in cell.bounding_regions:
print(
"...content on page {} is within bounding box '{}'".format(
region.page_number,
format_polygon(region.polygon),
)
)
print("----------------------------------------")
if __name__ == "__main__":
analyze_layout()
请访问 GitHub 上的 Azure 示例存储库,并查看布局模型输出。
使用“常规文档”模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
# formatting function
def format_bounding_region(bounding_regions):
if not bounding_regions:
return "N/A"
return ", ".join("Page #{}: {}".format(region.page_number, format_polygon(region.polygon)) for region in bounding_regions)
# formatting function
def format_polygon(polygon):
if not polygon:
return "N/A"
return ", ".join(["[{}, {}]".format(p.x, p.y) for p in polygon])
def analyze_general_documents():
# sample document
docUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf"
# create your `DocumentAnalysisClient` instance and `AzureKeyCredential` variable
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-document", docUrl)
result = poller.result()
for style in result.styles:
if style.is_handwritten:
print("Document contains handwritten content: ")
print(",".join([result.content[span.offset:span.offset + span.length] for span in style.spans]))
print("----Key-value pairs found in document----")
for kv_pair in result.key_value_pairs:
if kv_pair.key:
print(
"Key '{}' found within '{}' bounding regions".format(
kv_pair.key.content,
format_bounding_region(kv_pair.key.bounding_regions),
)
)
if kv_pair.value:
print(
"Value '{}' found within '{}' bounding regions\n".format(
kv_pair.value.content,
format_bounding_region(kv_pair.value.bounding_regions),
)
)
for page in result.pages:
print("----Analyzing document from page #{}----".format(page.page_number))
print(
"Page has width: {} and height: {}, measured with unit: {}".format(
page.width, page.height, page.unit
)
)
for line_idx, line in enumerate(page.lines):
print(
"...Line # {} has text content '{}' within bounding box '{}'".format(
line_idx,
line.content,
format_polygon(line.polygon),
)
)
for word in page.words:
print(
"...Word '{}' has a confidence of {}".format(
word.content, word.confidence
)
)
for selection_mark in page.selection_marks:
print(
"...Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
selection_mark.state,
format_polygon(selection_mark.polygon),
selection_mark.confidence,
)
)
for table_idx, table in enumerate(result.tables):
print(
"Table # {} has {} rows and {} columns".format(
table_idx, table.row_count, table.column_count
)
)
for region in table.bounding_regions:
print(
"Table # {} location on page: {} is {}".format(
table_idx,
region.page_number,
format_polygon(region.polygon),
)
)
for cell in table.cells:
print(
"...Cell[{}][{}] has content '{}'".format(
cell.row_index,
cell.column_index,
cell.content,
)
)
for region in cell.bounding_regions:
print(
"...content on page {} is within bounding box '{}'\n".format(
region.page_number,
format_polygon(region.polygon),
)
)
print("----------------------------------------")
if __name__ == "__main__":
analyze_general_documents()
请访问 GitHub 上的 Azure 示例存储库,并查看常规文档模型输出。
使用 W-2 税务模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
# formatting function
def format_address_value(address_value):
return f"\n......House/building number: {address_value.house_number}\n......Road: {address_value.road}\n......City: {address_value.city}\n......State: {address_value.state}\n......Postal code: {address_value.postal_code}"
def analyze_tax_us_w2():
# sample document
formUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/w2.png"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-tax.us.w2", formUrl
)
w2s = poller.result()
for idx, w2 in enumerate(w2s.documents):
print("--------Analyzing US Tax W-2 Form #{}--------".format(idx 1))
form_variant = w2.fields.get("W2FormVariant")
if form_variant:
print(
"Form variant: {} has confidence: {}".format(
form_variant.value, form_variant.confidence
)
)
tax_year = w2.fields.get("TaxYear")
if tax_year:
print(
"Tax year: {} has confidence: {}".format(
tax_year.value, tax_year.confidence
)
)
w2_copy = w2.fields.get("W2Copy")
if w2_copy:
print(
"W-2 Copy: {} has confidence: {}".format(
w2_copy.value,
w2_copy.confidence,
)
)
employee = w2.fields.get("Employee")
if employee:
print("Employee data:")
employee_name = employee.value.get("Name")
if employee_name:
print(
"...Name: {} has confidence: {}".format(
employee_name.value, employee_name.confidence
)
)
employee_ssn = employee.value.get("SocialSecurityNumber")
if employee_ssn:
print(
"...SSN: {} has confidence: {}".format(
employee_ssn.value, employee_ssn.confidence
)
)
employee_address = employee.value.get("Address")
if employee_address:
print(
"...Address: {}\n......has confidence: {}".format(
format_address_value(employee_address.value),
employee_address.confidence,
)
)
employee_zipcode = employee.value.get("ZipCode")
if employee_zipcode:
print(
"...Zipcode: {} has confidence: {}".format(
employee_zipcode.value, employee_zipcode.confidence
)
)
control_number = w2.fields.get("ControlNumber")
if control_number:
print(
"Control Number: {} has confidence: {}".format(
control_number.value, control_number.confidence
)
)
employer = w2.fields.get("Employer")
if employer:
print("Employer data:")
employer_name = employer.value.get("Name")
if employer_name:
print(
"...Name: {} has confidence: {}".format(
employer_name.value, employer_name.confidence
)
)
employer_id = employer.value.get("IdNumber")
if employer_id:
print(
"...ID Number: {} has confidence: {}".format(
employer_id.value, employer_id.confidence
)
)
employer_address = employer.value.get("Address")
if employer_address:
print(
"...Address: {}\n......has confidence: {}".format(
format_address_value(employer_address.value),
employer_address.confidence,
)
)
employer_zipcode = employer.value.get("ZipCode")
if employer_zipcode:
print(
"...Zipcode: {} has confidence: {}".format(
employer_zipcode.value, employer_zipcode.confidence
)
)
wages_tips = w2.fields.get("WagesTipsAndOtherCompensation")
if wages_tips:
print(
"Wages, tips, and other compensation: {} has confidence: {}".format(
wages_tips.value,
wages_tips.confidence,
)
)
fed_income_tax_withheld = w2.fields.get("FederalIncomeTaxWithheld")
if fed_income_tax_withheld:
print(
"Federal income tax withheld: {} has confidence: {}".format(
fed_income_tax_withheld.value, fed_income_tax_withheld.confidence
)
)
social_security_wages = w2.fields.get("SocialSecurityWages")
if social_security_wages:
print(
"Social Security wages: {} has confidence: {}".format(
social_security_wages.value, social_security_wages.confidence
)
)
social_security_tax_withheld = w2.fields.get("SocialSecurityTaxWithheld")
if social_security_tax_withheld:
print(
"Social Security tax withheld: {} has confidence: {}".format(
social_security_tax_withheld.value,
social_security_tax_withheld.confidence,
)
)
medicare_wages_tips = w2.fields.get("MedicareWagesAndTips")
if medicare_wages_tips:
print(
"Medicare wages and tips: {} has confidence: {}".format(
medicare_wages_tips.value, medicare_wages_tips.confidence
)
)
medicare_tax_withheld = w2.fields.get("MedicareTaxWithheld")
if medicare_tax_withheld:
print(
"Medicare tax withheld: {} has confidence: {}".format(
medicare_tax_withheld.value, medicare_tax_withheld.confidence
)
)
social_security_tips = w2.fields.get("SocialSecurityTips")
if social_security_tips:
print(
"Social Security tips: {} has confidence: {}".format(
social_security_tips.value, social_security_tips.confidence
)
)
allocated_tips = w2.fields.get("AllocatedTips")
if allocated_tips:
print(
"Allocated tips: {} has confidence: {}".format(
allocated_tips.value,
allocated_tips.confidence,
)
)
verification_code = w2.fields.get("VerificationCode")
if verification_code:
print(
"Verification code: {} has confidence: {}".format(
verification_code.value, verification_code.confidence
)
)
dependent_care_benefits = w2.fields.get("DependentCareBenefits")
if dependent_care_benefits:
print(
"Dependent care benefits: {} has confidence: {}".format(
dependent_care_benefits.value,
dependent_care_benefits.confidence,
)
)
non_qualified_plans = w2.fields.get("NonQualifiedPlans")
if non_qualified_plans:
print(
"Non-qualified plans: {} has confidence: {}".format(
non_qualified_plans.value,
non_qualified_plans.confidence,
)
)
additional_info = w2.fields.get("AdditionalInfo")
if additional_info:
print("Additional information:")
for item in additional_info.value:
letter_code = item.value.get("LetterCode")
if letter_code:
print(
"...Letter code: {} has confidence: {}".format(
letter_code.value, letter_code.confidence
)
)
amount = item.value.get("Amount")
if amount:
print(
"...Amount: {} has confidence: {}".format(
amount.value, amount.confidence
)
)
is_statutory_employee = w2.fields.get("IsStatutoryEmployee")
if is_statutory_employee:
print(
"Is statutory employee: {} has confidence: {}".format(
is_statutory_employee.value, is_statutory_employee.confidence
)
)
is_retirement_plan = w2.fields.get("IsRetirementPlan")
if is_retirement_plan:
print(
"Is retirement plan: {} has confidence: {}".format(
is_retirement_plan.value, is_retirement_plan.confidence
)
)
third_party_sick_pay = w2.fields.get("IsThirdPartySickPay")
if third_party_sick_pay:
print(
"Is third party sick pay: {} has confidence: {}".format(
third_party_sick_pay.value, third_party_sick_pay.confidence
)
)
other_info = w2.fields.get("Other")
if other_info:
print(
"Other information: {} has confidence: {}".format(
other_info.value,
other_info.confidence,
)
)
state_tax_info = w2.fields.get("StateTaxInfos")
if state_tax_info:
print("State Tax info:")
for tax in state_tax_info.value:
state = tax.value.get("State")
if state:
print(
"...State: {} has confidence: {}".format(
state.value, state.confidence
)
)
employer_state_id_number = tax.value.get("EmployerStateIdNumber")
if employer_state_id_number:
print(
"...Employer state ID number: {} has confidence: {}".format(
employer_state_id_number.value,
employer_state_id_number.confidence,
)
)
state_wages_tips = tax.value.get("StateWagesTipsEtc")
if state_wages_tips:
print(
"...State wages, tips, etc: {} has confidence: {}".format(
state_wages_tips.value, state_wages_tips.confidence
)
)
state_income_tax = tax.value.get("StateIncomeTax")
if state_income_tax:
print(
"...State income tax: {} has confidence: {}".format(
state_income_tax.value, state_income_tax.confidence
)
)
local_tax_info = w2.fields.get("LocalTaxInfos")
if local_tax_info:
print("Local Tax info:")
for tax in local_tax_info.value:
local_wages_tips = tax.value.get("LocalWagesTipsEtc")
if local_wages_tips:
print(
"...Local wages, tips, etc: {} has confidence: {}".format(
local_wages_tips.value, local_wages_tips.confidence
)
)
local_income_tax = tax.value.get("LocalIncomeTax")
if local_income_tax:
print(
"...Local income tax: {} has confidence: {}".format(
local_income_tax.value, local_income_tax.confidence
)
)
locality_name = tax.value.get("LocalityName")
if locality_name:
print(
"...Locality name: {} has confidence: {}".format(
locality_name.value, locality_name.confidence
)
)
print("----------------------------------------")
if __name__ == "__main__":
analyze_tax_us_w2()
请访问 GitHub 上的 Azure 示例存储库,并查看 W-2 税务模型输出。
使用发票模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
# formatting function
def format_bounding_region(bounding_regions):
if not bounding_regions:
return "N/A"
return ", ".join("Page #{}: {}".format(region.page_number, format_polygon(region.polygon)) for region in bounding_regions)
# formatting function
def format_polygon(polygon):
if not polygon:
return "N/A"
return ", ".join(["[{}, {}]".format(p.x, p.y) for p in polygon])
def analyze_invoice():
invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-invoice.pdf"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-invoice", invoiceUrl)
invoices = poller.result()
for idx, invoice in enumerate(invoices.documents):
print("--------Recognizing invoice #{}--------".format(idx + 1))
vendor_name = invoice.fields.get("VendorName")
if vendor_name:
print(
"Vendor Name: {} has confidence: {}".format(
vendor_name.value, vendor_name.confidence
)
)
vendor_address = invoice.fields.get("VendorAddress")
if vendor_address:
print(
"Vendor Address: {} has confidence: {}".format(
vendor_address.value, vendor_address.confidence
)
)
vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
if vendor_address_recipient:
print(
"Vendor Address Recipient: {} has confidence: {}".format(
vendor_address_recipient.value, vendor_address_recipient.confidence
)
)
customer_name = invoice.fields.get("CustomerName")
if customer_name:
print(
"Customer Name: {} has confidence: {}".format(
customer_name.value, customer_name.confidence
)
)
customer_id = invoice.fields.get("CustomerId")
if customer_id:
print(
"Customer Id: {} has confidence: {}".format(
customer_id.value, customer_id.confidence
)
)
customer_address = invoice.fields.get("CustomerAddress")
if customer_address:
print(
"Customer Address: {} has confidence: {}".format(
customer_address.value, customer_address.confidence
)
)
customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
if customer_address_recipient:
print(
"Customer Address Recipient: {} has confidence: {}".format(
customer_address_recipient.value,
customer_address_recipient.confidence,
)
)
invoice_id = invoice.fields.get("InvoiceId")
if invoice_id:
print(
"Invoice Id: {} has confidence: {}".format(
invoice_id.value, invoice_id.confidence
)
)
invoice_date = invoice.fields.get("InvoiceDate")
if invoice_date:
print(
"Invoice Date: {} has confidence: {}".format(
invoice_date.value, invoice_date.confidence
)
)
invoice_total = invoice.fields.get("InvoiceTotal")
if invoice_total:
print(
"Invoice Total: {} has confidence: {}".format(
invoice_total.value, invoice_total.confidence
)
)
due_date = invoice.fields.get("DueDate")
if due_date:
print(
"Due Date: {} has confidence: {}".format(
due_date.value, due_date.confidence
)
)
purchase_order = invoice.fields.get("PurchaseOrder")
if purchase_order:
print(
"Purchase Order: {} has confidence: {}".format(
purchase_order.value, purchase_order.confidence
)
)
billing_address = invoice.fields.get("BillingAddress")
if billing_address:
print(
"Billing Address: {} has confidence: {}".format(
billing_address.value, billing_address.confidence
)
)
billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
if billing_address_recipient:
print(
"Billing Address Recipient: {} has confidence: {}".format(
billing_address_recipient.value,
billing_address_recipient.confidence,
)
)
shipping_address = invoice.fields.get("ShippingAddress")
if shipping_address:
print(
"Shipping Address: {} has confidence: {}".format(
shipping_address.value, shipping_address.confidence
)
)
shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
if shipping_address_recipient:
print(
"Shipping Address Recipient: {} has confidence: {}".format(
shipping_address_recipient.value,
shipping_address_recipient.confidence,
)
)
print("Invoice items:")
for idx, item in enumerate(invoice.fields.get("Items").value):
print("...Item #{}".format(idx + 1))
item_description = item.value.get("Description")
if item_description:
print(
"......Description: {} has confidence: {}".format(
item_description.value, item_description.confidence
)
)
item_quantity = item.value.get("Quantity")
if item_quantity:
print(
"......Quantity: {} has confidence: {}".format(
item_quantity.value, item_quantity.confidence
)
)
unit = item.value.get("Unit")
if unit:
print(
"......Unit: {} has confidence: {}".format(
unit.value, unit.confidence
)
)
unit_price = item.value.get("UnitPrice")
if unit_price:
print(
"......Unit Price: {} has confidence: {}".format(
unit_price.value, unit_price.confidence
)
)
product_code = item.value.get("ProductCode")
if product_code:
print(
"......Product Code: {} has confidence: {}".format(
product_code.value, product_code.confidence
)
)
item_date = item.value.get("Date")
if item_date:
print(
"......Date: {} has confidence: {}".format(
item_date.value, item_date.confidence
)
)
tax = item.value.get("Tax")
if tax:
print(
"......Tax: {} has confidence: {}".format(tax.value, tax.confidence)
)
amount = item.value.get("Amount")
if amount:
print(
"......Amount: {} has confidence: {}".format(
amount.value, amount.confidence
)
)
subtotal = invoice.fields.get("SubTotal")
if subtotal:
print(
"Subtotal: {} has confidence: {}".format(
subtotal.value, subtotal.confidence
)
)
total_tax = invoice.fields.get("TotalTax")
if total_tax:
print(
"Total Tax: {} has confidence: {}".format(
total_tax.value, total_tax.confidence
)
)
previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
if previous_unpaid_balance:
print(
"Previous Unpaid Balance: {} has confidence: {}".format(
previous_unpaid_balance.value, previous_unpaid_balance.confidence
)
)
amount_due = invoice.fields.get("AmountDue")
if amount_due:
print(
"Amount Due: {} has confidence: {}".format(
amount_due.value, amount_due.confidence
)
)
service_start_date = invoice.fields.get("ServiceStartDate")
if service_start_date:
print(
"Service Start Date: {} has confidence: {}".format(
service_start_date.value, service_start_date.confidence
)
)
service_end_date = invoice.fields.get("ServiceEndDate")
if service_end_date:
print(
"Service End Date: {} has confidence: {}".format(
service_end_date.value, service_end_date.confidence
)
)
service_address = invoice.fields.get("ServiceAddress")
if service_address:
print(
"Service Address: {} has confidence: {}".format(
service_address.value, service_address.confidence
)
)
service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
if service_address_recipient:
print(
"Service Address Recipient: {} has confidence: {}".format(
service_address_recipient.value,
service_address_recipient.confidence,
)
)
remittance_address = invoice.fields.get("RemittanceAddress")
if remittance_address:
print(
"Remittance Address: {} has confidence: {}".format(
remittance_address.value, remittance_address.confidence
)
)
remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
if remittance_address_recipient:
print(
"Remittance Address Recipient: {} has confidence: {}".format(
remittance_address_recipient.value,
remittance_address_recipient.confidence,
)
)
print("----------------------------------------")
if __name__ == "__main__":
analyze_invoice()
请访问 GitHub 上的 Azure 示例存储库,并查看发票模型输出。
使用收据模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
def analyze_receipts():
# sample document
receiptUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/receipt.png"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-receipt", receiptUrl, locale="en-US"
)
receipts = poller.result()
for idx, receipt in enumerate(receipts.documents):
print("--------Analysis of receipt #{}--------".format(idx 1))
print("Receipt type: {}".format(receipt.doc_type or "N/A"))
merchant_name = receipt.fields.get("MerchantName")
if merchant_name:
print(
"Merchant Name: {} has confidence: {}".format(
merchant_name.value, merchant_name.confidence
)
)
transaction_date = receipt.fields.get("TransactionDate")
if transaction_date:
print(
"Transaction Date: {} has confidence: {}".format(
transaction_date.value, transaction_date.confidence
)
)
if receipt.fields.get("Items"):
print("Receipt items:")
for idx, item in enumerate(receipt.fields.get("Items").value):
print("...Item #{}".format(idx 1))
item_description = item.value.get("Description")
if item_description:
print(
"......Item Description: {} has confidence: {}".format(
item_description.value, item_description.confidence
)
)
item_quantity = item.value.get("Quantity")
if item_quantity:
print(
"......Item Quantity: {} has confidence: {}".format(
item_quantity.value, item_quantity.confidence
)
)
item_price = item.value.get("Price")
if item_price:
print(
"......Individual Item Price: {} has confidence: {}".format(
item_price.value, item_price.confidence
)
)
item_total_price = item.value.get("TotalPrice")
if item_total_price:
print(
"......Total Item Price: {} has confidence: {}".format(
item_total_price.value, item_total_price.confidence
)
)
subtotal = receipt.fields.get("Subtotal")
if subtotal:
print(
"Subtotal: {} has confidence: {}".format(
subtotal.value, subtotal.confidence
)
)
tax = receipt.fields.get("TotalTax")
if tax:
print("Total tax: {} has confidence: {}".format(tax.value, tax.confidence))
tip = receipt.fields.get("Tip")
if tip:
print("Tip: {} has confidence: {}".format(tip.value, tip.confidence))
total = receipt.fields.get("Total")
if total:
print("Total: {} has confidence: {}".format(total.value, total.confidence))
print("--------------------------------------")
if __name__ == "__main__":
analyze_receipts()
请访问 GitHub 上的 Azure 示例存储库,并查看收据模型输出。
使用 ID 文档模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
def analyze_identity_documents():
# sample document
identityUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/identity_documents.png"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-idDocument", identityUrl
)
id_documents = poller.result()
for idx, id_document in enumerate(id_documents.documents):
print("--------Analyzing ID document #{}--------".format(idx + 1))
first_name = id_document.fields.get("FirstName")
if first_name:
print(
"First Name: {} has confidence: {}".format(
first_name.value, first_name.confidence
)
)
last_name = id_document.fields.get("LastName")
if last_name:
print(
"Last Name: {} has confidence: {}".format(
last_name.value, last_name.confidence
)
)
document_number = id_document.fields.get("DocumentNumber")
if document_number:
print(
"Document Number: {} has confidence: {}".format(
document_number.value, document_number.confidence
)
)
dob = id_document.fields.get("DateOfBirth")
if dob:
print(
"Date of Birth: {} has confidence: {}".format(dob.value, dob.confidence)
)
doe = id_document.fields.get("DateOfExpiration")
if doe:
print(
"Date of Expiration: {} has confidence: {}".format(
doe.value, doe.confidence
)
)
sex = id_document.fields.get("Sex")
if sex:
print("Sex: {} has confidence: {}".format(sex.value, sex.confidence))
address = id_document.fields.get("Address")
if address:
print(
"Address: {} has confidence: {}".format(
address.value, address.confidence
)
)
country_region = id_document.fields.get("CountryRegion")
if country_region:
print(
"Country/Region: {} has confidence: {}".format(
country_region.value, country_region.confidence
)
)
region = id_document.fields.get("Region")
if region:
print(
"Region: {} has confidence: {}".format(region.value, region.confidence)
)
print("--------------------------------------")
if __name__ == "__main__":
analyze_identity_documents()
请访问 GitHub 上的 Azure 示例存储库,并查看 ID 文档模型输出。
使用名片模型
import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
# use your `key` and `endpoint` environment variables
key = os.environ.get('FR_KEY')
endpoint = os.environ.get('FR_ENDPOINT')
def analyze_business_card():
# sample document
businessCardUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/de5e0d8982ab754823c54de47a47e8e499351523/curl/form-recognizer/rest-api/business_card.jpg"
document_analysis_client = DocumentAnalysisClient(
endpoint=endpoint, credential=AzureKeyCredential(key)
)
poller = document_analysis_client.begin_analyze_document_from_url(
"prebuilt-businessCard", businessCardUrl, locale="en-US"
)
business_cards = poller.result()
for idx, business_card in enumerate(business_cards.documents):
print("--------Analyzing business card #{}--------".format(idx + 1))
contact_names = business_card.fields.get("ContactNames")
if contact_names:
for contact_name in contact_names.value:
print(
"Contact First Name: {} has confidence: {}".format(
contact_name.value["FirstName"].value,
contact_name.value[
"FirstName"
].confidence,
)
)
print(
"Contact Last Name: {} has confidence: {}".format(
contact_name.value["LastName"].value,
contact_name.value[
"LastName"
].confidence,
)
)
company_names = business_card.fields.get("CompanyNames")
if company_names:
for company_name in company_names.value:
print(
"Company Name: {} has confidence: {}".format(
company_name.value, company_name.confidence
)
)
departments = business_card.fields.get("Departments")
if departments:
for department in departments.value:
print(
"Department: {} has confidence: {}".format(
department.value, department.confidence
)
)
job_titles = business_card.fields.get("JobTitles")
if job_titles:
for job_title in job_titles.value:
print(
"Job Title: {} has confidence: {}".format(
job_title.value, job_title.confidence
)
)
emails = business_card.fields.get("Emails")
if emails:
for email in emails.value:
print(
"Email: {} has confidence: {}".format(email.value, email.confidence)
)
websites = business_card.fields.get("Websites")
if websites:
for website in websites.value:
print(
"Website: {} has confidence: {}".format(
website.value, website.confidence
)
)
addresses = business_card.fields.get("Addresses")
if addresses:
for address in addresses.value:
print(
"Address: {} has confidence: {}".format(
address.value, address.confidence
)
)
mobile_phones = business_card.fields.get("MobilePhones")
if mobile_phones:
for phone in mobile_phones.value:
print(
"Mobile phone number: {} has confidence: {}".format(
phone.content, phone.confidence
)
)
faxes = business_card.fields.get("Faxes")
if faxes:
for fax in faxes.value:
print(
"Fax number: {} has confidence: {}".format(
fax.content, fax.confidence
)
)
work_phones = business_card.fields.get("WorkPhones")
if work_phones:
for work_phone in work_phones.value:
print(
"Work phone number: {} has confidence: {}".format(
work_phone.content, work_phone.confidence
)
)
other_phones = business_card.fields.get("OtherPhones")
if other_phones:
for other_phone in other_phones.value:
print(
"Other phone number: {} has confidence: {}".format(
other_phone.value, other_phone.confidence
)
)
print("--------------------------------------")
if __name__ == "__main__":
analyze_business_card()
请访问 GitHub 上的 Azure 示例存储库,并查看名片模型输出。
| 文档智能 REST API | 支持的 Azure SDK |
| 文档智能 REST API | 支持的 Azure SDK |
注意
此项目使用 cURL
命令行工具执行 REST API 调用。
先决条件
Azure 订阅 - 创建试用订阅。
已安装 cURL 命令行工具。 Windows 10 和 Windows 11 随附 cURL 副本。 在命令提示符窗口中输入以下 cURL 命令。 如果显示帮助选项,则 cURL 已安装在 Windows 环境中。
curl -help
如果未安装 cURL,可以在此处获取:
Azure AI 服务或文档智能资源。 创建“单一服务或多个服务。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
设置环境变量
若要与文档智能服务交互,需要创建 DocumentAnalysisClient
类的实例。 为此,请使用 Azure 门户中的 key
和 endpoint
实例化客户端。 对于此项目,使用环境变量来存储和访问凭据。
重要
如果使用 API 密钥,请将其安全地存储在某个其他位置,例如 Azure Key Vault 中。 请不要直接在代码中包含 API 密钥,并且切勿公开发布该密钥。
有关 Azure AI 服务安全性的详细信息,请参阅对 Azure AI 服务的请求进行身份验证。
若要为文档智能资源密钥设置环境变量,请打开控制台窗口,按照操作系统和开发环境的说明进行操作。 将 <yourKey> 和 <yourEndpoint> 替换为 Azure 门户中资源的值。
环境变量不区分大小写。 它们通常以大写形式声明,其中包含的单词以下划线联接。 请根据命令提示符运行以下命令:
设置密钥变量:
setx DI_KEY <yourKey>
设置终结点变量
setx DI_ENDPOINT <yourEndpoint>
设置环境变量后关闭命令提示符窗口。 这些值将一直保留,直到你再次更改它们。
重启任何读取环境变量的运行程序。 例如,如果正在使用 Visual Studio 或 Visual Studio Code 作为编辑器,请在运行示例代码之前重启它。
下面是其他一些可与环境变量一起使用的有用命令:
命令 | 操作 | 示例 |
---|---|---|
setx VARIABLE_NAME= |
通过将环境变量值设置为空字符串来删除环境变量。 | setx DI_KEY= |
setx VARIABLE_NAME=value |
设置或更改环境变量的值。 | setx DI_KEY=<yourKey> |
set VARIABLE_NAME |
显示特定环境变量的值。 | set DI_KEY |
set |
显示所有环境变量。 | set |
分析文档并获取结果
POST 请求用于通过预生成或自定义模型分析文档。 GET 请求用于检索文档分析调用的结果。 modelId
与 POST 一起使用,resultId
与 GET 操作一起使用。
使用下表作为指南。 将 <modelId> 和 <document-url> 替换为所需的值:
型号 | modelId | description | document-url |
---|---|---|---|
“读取”模型 | prebuilt-read | 示例手册 | https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/read.png |
布局模型 | 预生成布局 | 预订确认示例 | https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/layout.png |
W-2 窗体模型 | prebuilt-tax.us.w2 | 示例 W-2 表单 | https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/w2.png |
发票模型 | 预生成的发票 | 示例发票 | https://github.com/Azure-Samples/cognitive-services-REST-api-samples/raw/master/curl/form-recognizer/rest-api/invoice.pdf |
收据模型 | prebuilt-receipt | 示例收据 | https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/receipt.png |
ID 文档模型 | prebuilt-idDocument | 示例身份证明文档 | https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/rest-api/identity_documents.png |
POST 请求
打开控制台窗口并运行以下 cURL 命令。 该命令包含之前在“设置环境变量”部分中创建的终结点和关键环境变量。 如果变量名称不同,请替换这些变量。 请记得替换 <modelId> 和 <document-url> 参数。
curl -i -X POST "%FR_ENDPOINT%formrecognizer/documentModels/<modelId>:analyze?api-version=2023-07-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: %FR_KEY%" --data-ascii "{'urlSource': '<document-url>'}"
若要启用附加功能,请在 POST 请求中使用 features
查询参数。 2023-07-31
(GA) 版本提供了四项附加功能:ocr.highResolution、ocr.formula、ocr.font 和 queryFields.premium。 若要详细了解每项功能,请参阅自定义模型。
只能调用读取和布局模型的 highResolution、公式和字体功能,以及常规文档模型的 queryFields 功能。 以下示例演示如何调用布局模型的 highResolution、公式和字体功能。
curl -i -X POST "%FR_ENDPOINT%formrecognizer/documentModels/prebuilt-layout:analyze?features=ocr.highResolution,ocr.formula,ocr.font?api-version=2023-07-31" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: %FR_KEY%" --data-ascii "{'urlSource': '<document-url>'}"
POST 响应
你将收到 202 (Success)
响应,其中包含“Operation-location
”标头。 使用此标头的值来检索响应结果。
获取分析结果 (GET 请求)
调用 Analyze document
API 后,请调用 [Get analyze
结果}(https://learn.microsoft.com/rest/api/aiservices/document-models/get-analyze-result?view=rest-aiservices-2023-07-31&preserve-view=true&tabs=HTTP) API,以获取操作的状态和提取的数据。
cURL 命令行工具不设置包含 JSON 内容的 API 响应的格式,这会使内容难以阅读。 要设置 JSON 响应的格式,请在 GET 请求中的 JSON 格式设置工具前面添加竖线。
将 NodeJS json 工具用作 cURL 的 JSON 格式化程序。 如果未安装 Node.js,请下载并安装最新版本。
使用以下命令打开控制台窗口并安装 json 工具:
npm install -g jsontool
通过在 GET 请求中添加竖线
| json
,打印易读的 JSON 输出。curl -i -X GET "<endpoint>formrecognizer/documentModels/prebuilt-read/analyzeResults/0e49604a-2d8e-4b15-b6b8-bb456e5d3e0a?api-version=2023-07-31"-H "Ocp-Apim-Subscription-Key: <subscription key>" | json
GET 请求
在运行以下命令之前,请进行以下更改:
- 将 <POST 响应>替换为 POST 响应的标头“
Operation-location
”。 - 如果 <FR_KEY 不同于代码中的名称,请将其替换为你的环境变量的变量。
- 将 *<json-tool> 替换为 JSON 格式设置工具。
curl -i -X GET "<POST response>" -H "Ocp-Apim-Subscription-Key: %FR_KEY%" | `<json-tool>`
检查响应
你将收到包含 JSON 输出的 200 (Success)
响应。 第一个字段 status
指示操作的状态。 如果操作未完成,则 status
的值为 running
或 notStarted
。 再次手动或通过脚本调用 API。 我们建议两次调用间隔一秒或更长时间。
请访问 GitHub 上的 Azure 示例存储库,查看每个文档智能模型的 GET
响应:
型号 | 输出 URL |
---|---|
“读取”模型 | 读取模型输出 |
布局模型 | 布局模型输出 |
W-2 税务模型 | W-2 税务模型输出 |
发票模型 | 发票模型输出 |
收据模型 | 收据模型输出 |
ID 文档模型 | ID 文档模型输出 |
本指南介绍如何将文档智能添加到应用程序和工作流中。 可以使用所选编程语言或 REST API。 Azure AI 文档智能是一项基于云的 Azure AI 服务,它使用机器学习从文档中提取键值对、文本和表。 我们建议你在学习该技术时使用免费服务。 请记住,每月的免费页数限于 500。
可使用以下 API 提取表单和文档中的结构化数据:
重要
本项目面向文档智能 REST API 版本 v2.1。
本文中的代码使用了同步方法和不受保护的凭据存储。
先决条件
Azure 订阅 - 创建试用订阅。
Visual Studio IDE 或最新版本的 .NET Core。
包含一组训练数据的 Azure 存储 Blob。 有关整理训练数据集的提示和选项,请参阅生成和训练自定义模型。 对于本教程,可使用示例数据集的 Train 文件夹下的文件。 下载并提取 sample_data.zip。
智能资源。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
设置编程环境
在控制台窗口中,使用 dotnet new
命令创建名为 formrecognizer-project
的新控制台应用。 此命令将创建包含单个源文件的简单“Hello World”C# 项目:program.cs。
dotnet new console -n formrecognizer-project
将目录更改为新创建的应用文件夹。 可以使用以下命令生成应用程序:
dotnet build
生成输出不应包含警告或错误。
...
Build succeeded.
0 Warning(s)
0 Error(s)
...
安装客户端库
在应用程序目录中,使用以下命令安装适用于 .NET 的文档智能客户端库:
dotnet add package Azure.AI.FormRecognizer --version 3.1.1
在编辑器或 IDE 中,从项目目录打开 Program.cs 文件。 然后,添加以下 using
指令:
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
在应用程序的 Program
类中,为资源的密钥和终结点创建变量。
重要
转到 Azure 门户。 如果你在“先决条件”部分创建的文档智能资源部署成功,请选择“后续步骤”下的“转到资源”按钮。 在左侧导航菜单中的“资源管理”下,选择“密钥和终结点”。
请记住完成后将密钥从代码中删除。 切勿公开发布。 对于生产环境,请使用安全方法来存储和访问凭据。 有关详细信息,请参阅 Azure AI 服务安全性。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
在应用程序的“Main
”方法中,添加对此项目中使用的异步任务的调用:
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
使用对象模型
在文档智能中,可以创建两种不同的客户端类型。 第一种是 FormRecognizerClient
,用于查询服务以识别表单域和内容。 第二种是 FormTrainingClient
,用于创建和管理可改进识别的自定义模型。
FormRecognizerClient
提供以下操作:
- 使用为了分析自定义表单而训练的自定义模型,来识别表单字段和内容。 这些值在
RecognizedForm
对象的集合中返回。 请参阅使用自定义模型分析表单。 - 无需训练模型即可识别表单内容,包括表格、行和单词。 表单内容在
FormPage
对象的集合中返回。 请参阅分析布局。 - 使用文档智能服务上预先训练的模型来识别美国的收据、名片、发票和 ID 文档中的常见字段。
FormTrainingClient
提供操作以实现以下目的:
- 训练自定义模型,以分析在自定义表单中找到的所有字段和值。 会返回一个
CustomFormModel
,它指示模型分析的表单类型,以及为每种表单类型提取的字段。 - 训练自定义模型,以分析通过标记自定义表单指定的特定字段和值。 会返回一个
CustomFormModel
,它指示模型提取的字段以及每个字段的估计准确度。 - 管理在帐户中创建的模型。
- 将自定义模型从一个文档智能资源复制到另一个资源。
注意
还可以使用图形用户界面(例如示例标记工具)来训练模型。
对客户端进行身份验证
在 Main
下,创建名为“AuthenticateClient
”的方法。 在其他任务中使用此方法来验证对文档智能服务的请求。 此方法使用 AzureKeyCredential
对象,因此,如果需要,你可以更新密钥而无需创建新的客户端对象。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
对于就训练客户端进行身份验证的新方法,请重复这些步骤。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
获取用于测试的资产
还需要为训练和测试数据添加对 URL 的引用。 将这些引用添加到“Program
”类的根中。
若要检索自定义模型训练数据的 SAS URL,请在 Azure 门户中找到存储资源,然后选择“数据存储>容器”。
导航到你的容器,右键单击并选择“生成 SAS”。
请务必获取容器的 SAS,而不是存储帐户本身的。
确保选中“读取”、“写入”、“删除”和“列出”权限,然后选择“生成 SAS 令牌和 URL”。
将“URL”部分中的值复制到临时位置。 它应当采用
https://<storage account>.blob.core.chinacloudapi.cn/<container name>?<SAS value>
形式。
重复上述步骤以获取 Blob 存储容器中单个文档的 SAS URL。 将该 SAS URL 也保存到临时位置。
保存包含的示例图像的 URL。 GitHub 上也会提供该图片)。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
分析布局
可以使用文档智能分析文档中的表格、行和单词,而无需训练模型。 返回的值是 FormPage 对象的集合。 提交的文档中每个页面都有一个对象。 有关布局提取的详细信息,请参阅文档智能布局模型。
若要分析位于给定 URL 的文件的内容,请使用 StartRecognizeContentFromUri
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
还可从本地文件中获取内容。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeContent
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
此任务的其余部分将内容信息输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
结果与以下输出类似。
Form Page 1 has 18 lines.
Line 0 has 1 word, and text: 'Contoso'.
Line 1 has 1 word, and text: 'Address:'.
Line 2 has 3 words, and text: 'Invoice For: Microsoft'.
Line 3 has 4 words, and text: '1 Redmond way Suite'.
Line 4 has 3 words, and text: '1020 Enterprise Way'.
Line 5 has 3 words, and text: '6000 Redmond, WA'.
Line 6 has 3 words, and text: 'Sunnayvale, CA 87659'.
Line 7 has 1 word, and text: '99243'.
Line 8 has 2 words, and text: 'Invoice Number'.
Line 9 has 2 words, and text: 'Invoice Date'.
Line 10 has 3 words, and text: 'Invoice Due Date'.
Line 11 has 1 word, and text: 'Charges'.
Line 12 has 2 words, and text: 'VAT ID'.
Line 13 has 1 word, and text: '34278587'.
Line 14 has 1 word, and text: '6/18/2017'.
Line 15 has 1 word, and text: '6/24/2017'.
Line 16 has 1 word, and text: '$56,651.49'.
Line 17 has 1 word, and text: 'PT'.
Table 0 has 2 rows and 6 columns.
Cell (0, 0) contains text: 'Invoice Number'.
Cell (0, 1) contains text: 'Invoice Date'.
Cell (0, 2) contains text: 'Invoice Due Date'.
Cell (0, 3) contains text: 'Charges'.
Cell (0, 5) contains text: 'VAT ID'.
Cell (1, 0) contains text: '34278587'.
Cell (1, 1) contains text: '6/18/2017'.
Cell (1, 2) contains text: '6/24/2017'.
Cell (1, 3) contains text: '$56,651.49'.
Cell (1, 5) contains text: 'PT'.
分析回执
本部分演示了如何使用预先训练的收据模型分析和提取美国收据中的常见字段。 有关收据分析的详细信息,请参阅文档智能收据模型。
若要分析位于某个 URL 的收据,请使用 StartRecognizeReceiptsFromUri
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
你还可以分析本地收据图像。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeReceipts
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
返回的值是 RecognizedForm
对象的集合。 提交的文档中每个页面都有一个对象。 下面代码处理给定 URI 的回执,并将主字段和值输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
结果与以下输出类似。
Form Page 1 has 18 lines.
Line 0 has 1 word, and text: 'Contoso'.
Line 1 has 1 word, and text: 'Address:'.
Line 2 has 3 words, and text: 'Invoice For: Microsoft'.
Line 3 has 4 words, and text: '1 Redmond way Suite'.
Line 4 has 3 words, and text: '1020 Enterprise Way'.
Line 5 has 3 words, and text: '6000 Redmond, WA'.
Line 6 has 3 words, and text: 'Sunnayvale, CA 87659'.
Line 7 has 1 word, and text: '99243'.
Line 8 has 2 words, and text: 'Invoice Number'.
Line 9 has 2 words, and text: 'Invoice Date'.
Line 10 has 3 words, and text: 'Invoice Due Date'.
Line 11 has 1 word, and text: 'Charges'.
Line 12 has 2 words, and text: 'VAT ID'.
Line 13 has 1 word, and text: '34278587'.
Line 14 has 1 word, and text: '6/18/2017'.
Line 15 has 1 word, and text: '6/24/2017'.
Line 16 has 1 word, and text: '$56,651.49'.
Line 17 has 1 word, and text: 'PT'.
Table 0 has 2 rows and 6 columns.
Cell (0, 0) contains text: 'Invoice Number'.
Cell (0, 1) contains text: 'Invoice Date'.
Cell (0, 2) contains text: 'Invoice Due Date'.
Cell (0, 3) contains text: 'Charges'.
Cell (0, 5) contains text: 'VAT ID'.
Cell (1, 0) contains text: '34278587'.
Cell (1, 1) contains text: '6/18/2017'.
Cell (1, 2) contains text: '6/24/2017'.
Cell (1, 3) contains text: '$56,651.49'.
Cell (1, 5) contains text: 'PT'.
Merchant Name: 'Contoso Contoso', with confidence 0.516
Transaction Date: '6/10/2019 12:00:00 AM', with confidence 0.985
Item:
Name: '8GB RAM (Black)', with confidence 0.916
Total Price: '999', with confidence 0.559
Item:
Name: 'SurfacePen', with confidence 0.858
Total Price: '99.99', with confidence 0.386
Total: '1203.39', with confidence '0.774'
分析名片
本部分演示了如何使用预先训练的模型分析和提取英文名片中的常见字段。 有关名片分析的详细信息,请参阅文档智能名片模型。
若要分析位于某个 URL 的名片,请使用 StartRecognizeBusinessCardsFromUriAsync
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
你还可以分析本地名片图像。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeBusinessCards
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
以下代码处理给定 URI 的名片,并将主字段和值输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
分析发票
本部分演示了如何使用预先训练的模型分析和提取销售发票中的常见字段。 有关发票分析的详细信息,请参阅文档智能发票模型。
若要分析位于某个 URL 的发票,请使用 StartRecognizeInvoicesFromUriAsync
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
你还可以分析本地发票图像。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeInvoices
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
以下代码处理给定 URI 的发票,并将主字段和值输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
分析 ID 文档
本部分演示了如何使用文档智能预生成的 ID 模型来分析和提取政府颁发的身份文档(全球护照和美国驾照)中的关键信息。 有关 ID 文档分析的详细信息,请参阅文档智能 ID 文档模型。
若要从 URI 分析 ID 文档,请使用 StartRecognizeIdentityDocumentsFromUriAsync
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
还可分析本地 ID 文档图像。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeIdentityDocumentsAsync
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
以下代码会处理给定 URI 的 ID 文档,并将主字段和值输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program {
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
string receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
string bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
string invoiceUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
string idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args) {
// new code:
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var analyzeBusinessCard = AnalyzeBusinessCard(recognizerClient, bcUrl);
Task.WaitAll(analyzeBusinessCard);
var analyzeInvoice = AnalyzeInvoice(recognizerClient, invoiceUrl);
Task.WaitAll(analyzeInvoice);
var analyzeId = AnalyzeId(recognizerClient, idUrl);
Task.WaitAll(analyzeId);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var trainModelWithLabels = TrainModelWithLabels(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, modelId, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
static private FormRecognizerClient AuthenticateClient() {
string endpoint = "<replace-with-your-form-recognizer-endpoint-here>";
string apiKey = "<replace-with-your-form-recognizer-key-here>";
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient) {
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizeClient.StartRecognizeContentFromUri(new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach(FormPage page in formPages) {
Console.WriteLine($ "Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++) {
FormLine line = page.Lines[i];
Console.WriteLine($ " Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "
s " : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri) {
RecognizedFormCollection receipts = await recognizeClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach(RecognizedForm receipt in receipts) {
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField)) {
if (merchantNameField.Value.ValueType == FieldValueType.String) {
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($ "Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField)) {
if (transactionDateField.Value.ValueType == FieldValueType.Date) {
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($ "Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField)) {
if (itemsField.Value.ValueType == FieldValueType.List) {
foreach(FormField itemField in itemsField.Value.AsList()) {
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField)) {
if (itemNameField.Value.ValueType == FieldValueType.String) {
string itemName = itemNameField.Value.AsString();
Console.WriteLine($ " Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField)) {
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float) {
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($ " Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField)) {
if (totalField.Value.ValueType == FieldValueType.Float) {
float total = totalField.Value.AsFloat();
Console.WriteLine($ "Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_bc_call>
private static async Task AnalyzeBusinessCard(
FormRecognizerClient recognizerClient, string bcUrl) {
RecognizedFormCollection businessCards = await recognizerClient.StartRecognizeBusinessCardsFromUriAsync(bcUrl).WaitForCompletionAsync();
// </snippet_bc_call>
// <snippet_bc_print>
foreach(RecognizedForm businessCard in businessCards) {
FormField ContactNamesField;
if (businessCard.Fields.TryGetValue("ContactNames", out ContactNamesField)) {
if (ContactNamesField.Value.ValueType == FieldValueType.List) {
foreach(FormField contactNameField in ContactNamesField.Value.AsList()) {
Console.WriteLine($ "Contact Name: {contactNameField.ValueData.Text}");
if (contactNameField.Value.ValueType == FieldValueType.Dictionary) {
IReadOnlyDictionary < string,
FormField > contactNameFields = contactNameField.Value.AsDictionary();
FormField firstNameField;
if (contactNameFields.TryGetValue("FirstName", out firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ " First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
FormField lastNameField;
if (contactNameFields.TryGetValue("LastName", out lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ " Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
}
}
}
}
FormField jobTitlesFields;
if (businessCard.Fields.TryGetValue("JobTitles", out jobTitlesFields)) {
if (jobTitlesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField jobTitleField in jobTitlesFields.Value.AsList()) {
if (jobTitleField.Value.ValueType == FieldValueType.String) {
string jobTitle = jobTitleField.Value.AsString();
Console.WriteLine($ " Job Title: '{jobTitle}', with confidence {jobTitleField.Confidence}");
}
}
}
}
FormField departmentFields;
if (businessCard.Fields.TryGetValue("Departments", out departmentFields)) {
if (departmentFields.Value.ValueType == FieldValueType.List) {
foreach(FormField departmentField in departmentFields.Value.AsList()) {
if (departmentField.Value.ValueType == FieldValueType.String) {
string department = departmentField.Value.AsString();
Console.WriteLine($ " Department: '{department}', with confidence {departmentField.Confidence}");
}
}
}
}
FormField emailFields;
if (businessCard.Fields.TryGetValue("Emails", out emailFields)) {
if (emailFields.Value.ValueType == FieldValueType.List) {
foreach(FormField emailField in emailFields.Value.AsList()) {
if (emailField.Value.ValueType == FieldValueType.String) {
string email = emailField.Value.AsString();
Console.WriteLine($ " Email: '{email}', with confidence {emailField.Confidence}");
}
}
}
}
FormField websiteFields;
if (businessCard.Fields.TryGetValue("Websites", out websiteFields)) {
if (websiteFields.Value.ValueType == FieldValueType.List) {
foreach(FormField websiteField in websiteFields.Value.AsList()) {
if (websiteField.Value.ValueType == FieldValueType.String) {
string website = websiteField.Value.AsString();
Console.WriteLine($ " Website: '{website}', with confidence {websiteField.Confidence}");
}
}
}
}
FormField mobilePhonesFields;
if (businessCard.Fields.TryGetValue("MobilePhones", out mobilePhonesFields)) {
if (mobilePhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField mobilePhoneField in mobilePhonesFields.Value.AsList()) {
if (mobilePhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string mobilePhone = mobilePhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Mobile phone number: '{mobilePhone}', with confidence {mobilePhoneField.Confidence}");
}
}
}
}
FormField otherPhonesFields;
if (businessCard.Fields.TryGetValue("OtherPhones", out otherPhonesFields)) {
if (otherPhonesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField otherPhoneField in otherPhonesFields.Value.AsList()) {
if (otherPhoneField.Value.ValueType == FieldValueType.PhoneNumber) {
string otherPhone = otherPhoneField.Value.AsPhoneNumber();
Console.WriteLine($ " Other phone number: '{otherPhone}', with confidence {otherPhoneField.Confidence}");
}
}
}
}
FormField faxesFields;
if (businessCard.Fields.TryGetValue("Faxes", out faxesFields)) {
if (faxesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField faxField in faxesFields.Value.AsList()) {
if (faxField.Value.ValueType == FieldValueType.PhoneNumber) {
string fax = faxField.Value.AsPhoneNumber();
Console.WriteLine($ " Fax phone number: '{fax}', with confidence {faxField.Confidence}");
}
}
}
}
FormField addressesFields;
if (businessCard.Fields.TryGetValue("Addresses", out addressesFields)) {
if (addressesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField addressField in addressesFields.Value.AsList()) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ " Address: '{address}', with confidence {addressField.Confidence}");
}
}
}
}
FormField companyNamesFields;
if (businessCard.Fields.TryGetValue("CompanyNames", out companyNamesFields)) {
if (companyNamesFields.Value.ValueType == FieldValueType.List) {
foreach(FormField companyNameField in companyNamesFields.Value.AsList()) {
if (companyNameField.Value.ValueType == FieldValueType.String) {
string companyName = companyNameField.Value.AsString();
Console.WriteLine($ " Company name: '{companyName}', with confidence {companyNameField.Confidence}");
}
}
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static async Task AnalyzeInvoice(
FormRecognizerClient recognizerClient, string invoiceUrl) {
var options = new RecognizeInvoicesOptions() {
Locale = "en-US"
};
RecognizedFormCollection invoices = await recognizerClient.StartRecognizeInvoicesFromUriAsync(invoiceUrl, options).WaitForCompletionAsync();
// </snippet_invoice_call>
// <snippet_invoice_print>
RecognizedForm invoice = invoices.Single();
FormField invoiceIdField;
if (invoice.Fields.TryGetValue("InvoiceId", out invoiceIdField)) {
if (invoiceIdField.Value.ValueType == FieldValueType.String) {
string invoiceId = invoiceIdField.Value.AsString();
Console.WriteLine($ " Invoice Id: '{invoiceId}', with confidence {invoiceIdField.Confidence}");
}
}
FormField invoiceDateField;
if (invoice.Fields.TryGetValue("InvoiceDate", out invoiceDateField)) {
if (invoiceDateField.Value.ValueType == FieldValueType.Date) {
DateTime invoiceDate = invoiceDateField.Value.AsDate();
Console.WriteLine($ " Invoice Date: '{invoiceDate}', with confidence {invoiceDateField.Confidence}");
}
}
FormField dueDateField;
if (invoice.Fields.TryGetValue("DueDate", out dueDateField)) {
if (dueDateField.Value.ValueType == FieldValueType.Date) {
DateTime dueDate = dueDateField.Value.AsDate();
Console.WriteLine($ " Due Date: '{dueDate}', with confidence {dueDateField.Confidence}");
}
}
FormField vendorNameField;
if (invoice.Fields.TryGetValue("VendorName", out vendorNameField)) {
if (vendorNameField.Value.ValueType == FieldValueType.String) {
string vendorName = vendorNameField.Value.AsString();
Console.WriteLine($ " Vendor Name: '{vendorName}', with confidence {vendorNameField.Confidence}");
}
}
FormField vendorAddressField;
if (invoice.Fields.TryGetValue("VendorAddress", out vendorAddressField)) {
if (vendorAddressField.Value.ValueType == FieldValueType.String) {
string vendorAddress = vendorAddressField.Value.AsString();
Console.WriteLine($ " Vendor Address: '{vendorAddress}', with confidence {vendorAddressField.Confidence}");
}
}
FormField customerNameField;
if (invoice.Fields.TryGetValue("CustomerName", out customerNameField)) {
if (customerNameField.Value.ValueType == FieldValueType.String) {
string customerName = customerNameField.Value.AsString();
Console.WriteLine($ " Customer Name: '{customerName}', with confidence {customerNameField.Confidence}");
}
}
FormField customerAddressField;
if (invoice.Fields.TryGetValue("CustomerAddress", out customerAddressField)) {
if (customerAddressField.Value.ValueType == FieldValueType.String) {
string customerAddress = customerAddressField.Value.AsString();
Console.WriteLine($ " Customer Address: '{customerAddress}', with confidence {customerAddressField.Confidence}");
}
}
FormField customerAddressRecipientField;
if (invoice.Fields.TryGetValue("CustomerAddressRecipient", out customerAddressRecipientField)) {
if (customerAddressRecipientField.Value.ValueType == FieldValueType.String) {
string customerAddressRecipient = customerAddressRecipientField.Value.AsString();
Console.WriteLine($ " Customer address recipient: '{customerAddressRecipient}', with confidence {customerAddressRecipientField.Confidence}");
}
}
FormField invoiceTotalField;
if (invoice.Fields.TryGetValue("InvoiceTotal", out invoiceTotalField)) {
if (invoiceTotalField.Value.ValueType == FieldValueType.Float) {
float invoiceTotal = invoiceTotalField.Value.AsFloat();
Console.WriteLine($ " Invoice Total: '{invoiceTotal}', with confidence {invoiceTotalField.Confidence}");
}
}
}
// </snippet_invoice_print>
// <snippet_id_call>
private static async Task AnalyzeId(
FormRecognizerClient recognizerClient, string idUrl) {
RecognizedFormCollection identityDocument = await recognizerClient.StartRecognizeIdDocumentsFromUriAsync(idUrl).WaitForCompletionAsync();
//</snippet_id_call>
//<snippet_id_print>
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
RecognizedForm identityDocument = identityDocuments.Single();
if (identityDocument.Fields.TryGetValue("Address", out FormField addressField)) {
if (addressField.Value.ValueType == FieldValueType.String) {
string address = addressField.Value.AsString();
Console.WriteLine($ "Address: '{address}', with confidence {addressField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("CountryRegion", out FormField countryRegionField)) {
if (countryRegionField.Value.ValueType == FieldValueType.CountryRegion) {
string countryRegion = countryRegionField.Value.AsCountryRegion();
Console.WriteLine($ "CountryRegion: '{countryRegion}', with confidence {countryRegionField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfBirth", out FormField dateOfBirthField)) {
if (dateOfBirthField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfBirth = dateOfBirthField.Value.AsDate();
Console.WriteLine($ "Date Of Birth: '{dateOfBirth}', with confidence {dateOfBirthField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DateOfExpiration", out FormField dateOfExpirationField)) {
if (dateOfExpirationField.Value.ValueType == FieldValueType.Date) {
DateTime dateOfExpiration = dateOfExpirationField.Value.AsDate();
Console.WriteLine($ "Date Of Expiration: '{dateOfExpiration}', with confidence {dateOfExpirationField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("DocumentNumber", out FormField documentNumberField)) {
if (documentNumberField.Value.ValueType == FieldValueType.String) {
string documentNumber = documentNumberField.Value.AsString();
Console.WriteLine($ "Document Number: '{documentNumber}', with confidence {documentNumberField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("FirstName", out FormField firstNameField)) {
if (firstNameField.Value.ValueType == FieldValueType.String) {
string firstName = firstNameField.Value.AsString();
Console.WriteLine($ "First Name: '{firstName}', with confidence {firstNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("LastName", out FormField lastNameField)) {
if (lastNameField.Value.ValueType == FieldValueType.String) {
string lastName = lastNameField.Value.AsString();
Console.WriteLine($ "Last Name: '{lastName}', with confidence {lastNameField.Confidence}");
}
}
if (identityDocument.Fields.TryGetValue("Region", out FormField regionfield)) {
if (regionfield.Value.ValueType == FieldValueType.String) {
string region = regionfield.Value.AsString();
Console.WriteLine($ "Region: '{region}', with confidence {regionfield.Confidence}");
}
}
//</snippet_id_print>
// <snippet_train>
private static async Task < Guid > TrainModel(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task < Guid > TrainModelWithLabels(
FormRecognizerClient trainingClient, string trainingDataUrl) {
CustomFormModel model = await trainingClient.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true).WaitForCompletionAsync();
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {model.ModelId}");
Console.WriteLine($ " Model Status: {model.Status}");
Console.WriteLine($ " Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach(CustomFormSubmodel submodel in model.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, Guid modelId, string formUrl) {
RecognizedFormCollection forms = await recognizeClient.StartRecognizeCustomFormsFromUri(modelId, new Uri(invoiceUri)).WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach(RecognizedForm form in forms) {
Console.WriteLine($ "Form of type: {form.FormType}");
foreach(FormField field in form.Fields.Values) {
Console.WriteLine($ "Field '{field.Name}: ");
if (field.LabelData != null) {
Console.WriteLine($ " Label: '{field.LabelData.Text}");
}
Console.WriteLine($ " Value: '{field.ValueData.Text}");
Console.WriteLine($ " Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach(FormPage page in form.Pages.Values) {
for (int i = 0; i < page.Tables.Count; i++) {
FormTable table = page.Tables[i];
Console.WriteLine($ "Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach(FormTableCell cell in table.Cells) {
Console.WriteLine($ " Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "
header " : "
text ")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormRecognizerClient trainingClient, string trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($ "Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($ "It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable < CustomFormModelInfo > models = trainingClient.GetCustomModels();
foreach(CustomFormModelInfo modelInfo in models) {
Console.WriteLine($ "Custom Model Info:");
Console.WriteLine($ " Model Id: {modelInfo.ModelId}");
Console.WriteLine($ " Model Status: {modelInfo.Status}");
Console.WriteLine($ " Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($ " Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(modelId);
Console.WriteLine($ "Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach(CustomFormSubmodel submodel in modelCopy.Submodels) {
Console.WriteLine($ "Submodel Form Type: {submodel.FormType}");
foreach(CustomFormModelField field in submodel.Fields.Values) {
Console.Write($ " FieldName: {field.Name}");
if (field.Label != null) {
Console.Write($ ", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
训练自定义模型
本部分演示如何使用自己的数据训练模型。 训练的模型可以输出结构化数据,其中包含原始文档中的键/值关系。 训练模型后,可对其进行测试和重新训练,并最终使用它来根据需求提取其他表单中的数据。
注意
你还可以使用图形用户界面(例如文档智能示例标记工具)来训练模型。
不使用标签训练模型
训练自定义模型可分析在自定义表单中找到的所有字段和值,无需手动标记训练文档。 下面的方法使用给定文档集训练模型,并将该模型的状态输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
返回的 CustomFormModel
对象包含该模型可以分析的表单类型以及它可以从每种表单类型中提取的字段的相关信息。 下面的代码块将此信息输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
最后,返回训练的模型 ID 以供后续步骤使用。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
为了提高可读性,此示例输出已截断。
Merchant Name: 'Contoso Contoso', with confidence 0.516
Transaction Date: '6/10/2019 12:00:00 AM', with confidence 0.985
Item:
Name: '8GB RAM (Black)', with confidence 0.916
Total Price: '999', with confidence 0.559
Item:
Name: 'SurfacePen', with confidence 0.858
Total Price: '99.99', with confidence 0.386
Total: '1203.39', with confidence '0.774'
Form Page 1 has 18 lines.
Line 0 has 1 word, and text: 'Contoso'.
Line 1 has 1 word, and text: 'Address:'.
Line 2 has 3 words, and text: 'Invoice For: Microsoft'.
Line 3 has 4 words, and text: '1 Redmond way Suite'.
Line 4 has 3 words, and text: '1020 Enterprise Way'.
...
Table 0 has 2 rows and 6 columns.
Cell (0, 0) contains text: 'Invoice Number'.
Cell (0, 1) contains text: 'Invoice Date'.
Cell (0, 2) contains text: 'Invoice Due Date'.
Cell (0, 3) contains text: 'Charges'.
...
Custom Model Info:
Model Id: 95035721-f19d-40eb-8820-0c806b42798b
Model Status: Ready
Training model started on: 8/24/2020 6:36:44 PM +00:00
Training model completed on: 8/24/2020 6:36:50 PM +00:00
Submodel Form Type: form-95035721-f19d-40eb-8820-0c806b42798b
FieldName: CompanyAddress
FieldName: CompanyName
FieldName: CompanyPhoneNumber
...
Custom Model Info:
Model Id: e7a1181b-1fb7-40be-bfbe-1ee154183633
Model Status: Ready
Training model started on: 8/24/2020 6:36:44 PM +00:00
Training model completed on: 8/24/2020 6:36:52 PM +00:00
Submodel Form Type: form-0
FieldName: field-0, FieldLabel: Additional Notes:
FieldName: field-1, FieldLabel: Address:
FieldName: field-2, FieldLabel: Company Name:
FieldName: field-3, FieldLabel: Company Phone:
FieldName: field-4, FieldLabel: Dated As:
FieldName: field-5, FieldLabel: Details
FieldName: field-6, FieldLabel: Email:
FieldName: field-7, FieldLabel: Hero Limited
FieldName: field-8, FieldLabel: Name:
FieldName: field-9, FieldLabel: Phone:
...
使用标签训练模型
你还可以通过手动标记训练文档来训练自定义模型。 在某些情况下,使用标签训练可改善效果。 若要使用标签进行训练,需要在 Blob 存储容器中添加特殊标签信息文件 (<filename>.pdf.labels.json) 和训练文档。 文档智能示例标记工具提供了可帮助你创建这些标签文件的 UI。 获得它们后,可以调用“StartTrainingAsync
”方法,并将“uselabels
”参数设置为 true
。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
返回的 CustomFormModel
指示模型可以提取的字段,以及每个字段中的估计准确度。 下面的代码块将此信息输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
为了提高可读性,此示例输出已截断。
Form Page 1 has 18 lines.
Line 0 has 1 word, and text: 'Contoso'.
Line 1 has 1 word, and text: 'Address:'.
Line 2 has 3 words, and text: 'Invoice For: Microsoft'.
Line 3 has 4 words, and text: '1 Redmond way Suite'.
Line 4 has 3 words, and text: '1020 Enterprise Way'.
Line 5 has 3 words, and text: '6000 Redmond, WA'.
...
Table 0 has 2 rows and 6 columns.
Cell (0, 0) contains text: 'Invoice Number'.
Cell (0, 1) contains text: 'Invoice Date'.
Cell (0, 2) contains text: 'Invoice Due Date'.
...
Merchant Name: 'Contoso Contoso', with confidence 0.516
Transaction Date: '6/10/2019 12:00:00 AM', with confidence 0.985
Item:
Name: '8GB RAM (Black)', with confidence 0.916
Total Price: '999', with confidence 0.559
Item:
Name: 'SurfacePen', with confidence 0.858
Total Price: '99.99', with confidence 0.386
Total: '1203.39', with confidence '0.774'
Custom Model Info:
Model Id: 63c013e3-1cab-43eb-84b0-f4b20cb9214c
Model Status: Ready
Training model started on: 8/24/2020 6:42:54 PM +00:00
Training model completed on: 8/24/2020 6:43:01 PM +00:00
Submodel Form Type: form-63c013e3-1cab-43eb-84b0-f4b20cb9214c
FieldName: CompanyAddress
FieldName: CompanyName
FieldName: CompanyPhoneNumber
FieldName: DatedAs
FieldName: Email
FieldName: Merchant
...
使用自定义模型分析表单
本部分演示了如何通过你使用自己的表单训练的模型,从自定义模板类型中提取键/值信息和其他内容。
重要
若要实现此方案,必须已训练好一个模型,以便可以将其 ID 传递到下面的方法中。
使用 StartRecognizeCustomFormsFromUri
方法。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
提示
还可分析本地文件。 请参阅 FormRecognizerClient 方法,例如 StartRecognizeCustomForms
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
返回的值是 RecognizedForm
对象的集合。 提交的文档中每个页面都有一个对象。 以下代码将分析结果输出到控制台。 它将输出每个识别的字段和相应的值以及置信度分数。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
为了提高可读性,此输出响应已截断。
Custom Model Info:
Model Id: 9b0108ee-65c8-450e-b527-bb309d054fc4
Model Status: Ready
Training model started on: 8/24/2020 7:00:31 PM +00:00
Training model completed on: 8/24/2020 7:00:32 PM +00:00
Submodel Form Type: form-9b0108ee-65c8-450e-b527-bb309d054fc4
FieldName: CompanyAddress
FieldName: CompanyName
FieldName: CompanyPhoneNumber
...
Form Page 1 has 18 lines.
Line 0 has 1 word, and text: 'Contoso'.
Line 1 has 1 word, and text: 'Address:'.
Line 2 has 3 words, and text: 'Invoice For: Microsoft'.
Line 3 has 4 words, and text: '1 Redmond way Suite'.
...
Table 0 has 2 rows and 6 columns.
Cell (0, 0) contains text: 'Invoice Number'.
Cell (0, 1) contains text: 'Invoice Date'.
Cell (0, 2) contains text: 'Invoice Due Date'.
...
Merchant Name: 'Contoso Contoso', with confidence 0.516
Transaction Date: '6/10/2019 12:00:00 AM', with confidence 0.985
Item:
Name: '8GB RAM (Black)', with confidence 0.916
Total Price: '999', with confidence 0.559
Item:
Name: 'SurfacePen', with confidence 0.858
Total Price: '99.99', with confidence 0.386
Total: '1203.39', with confidence '0.774'
Custom Model Info:
Model Id: dc115156-ce0e-4202-bbe4-7426e7bee756
Model Status: Ready
Training model started on: 8/24/2020 7:00:31 PM +00:00
Training model completed on: 8/24/2020 7:00:41 PM +00:00
Submodel Form Type: form-0
FieldName: field-0, FieldLabel: Additional Notes:
FieldName: field-1, FieldLabel: Address:
FieldName: field-2, FieldLabel: Company Name:
FieldName: field-3, FieldLabel: Company Phone:
FieldName: field-4, FieldLabel: Dated As:
...
Form of type: custom:form
Field 'Azure.AI.FormRecognizer.Models.FieldValue:
Value: '$56,651.49
Confidence: '0.249
Field 'Azure.AI.FormRecognizer.Models.FieldValue:
Value: 'PT
Confidence: '0.245
Field 'Azure.AI.FormRecognizer.Models.FieldValue:
Value: '99243
Confidence: '0.114
...
管理自定义模型
本部分演示如何管理帐户中存储的自定义模型。 你将在以下方法中完成多个操作:
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
检查 FormRecognizer 资源帐户中的模型数
下面的代码块会检查文档智能帐户中保存的模型数,并将其与帐户限制进行比较。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
输出
Account has 20 models.
It can have at most 5000 models.
列出资源帐户中当前存储的模型
下面的代码块会列出帐户中的当前模型,并将这些模型的详细信息输出到控制台。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
为了提高可读性,此示例输出已截断。
Custom Model Info:
Model Id: 05932d5a-a2f8-4030-a2ef-4e5ed7112515
Model Status: Creating
Training model started on: 8/24/2020 7:35:02 PM +00:00
Training model completed on: 8/24/2020 7:35:02 PM +00:00
Custom Model Info:
Model Id: 150828c4-2eb2-487e-a728-60d5d504bd16
Model Status: Ready
Training model started on: 8/24/2020 7:33:25 PM +00:00
Training model completed on: 8/24/2020 7:33:27 PM +00:00
Custom Model Info:
Model Id: 3303e9de-6cec-4dfb-9e68-36510a6ecbb2
Model Status: Ready
Training model started on: 8/24/2020 7:29:27 PM +00:00
Training model completed on: 8/24/2020 7:29:36 PM +00:00
使用模型 ID 获取特定模型
下面的代码块会训练新模型(如不使用标签训练模型中所述),然后使用该模型的 ID 检索对它的第二次引用。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
为了提高可读性,此示例输出已截断。
Custom Model Info:
Model Id: 150828c4-2eb2-487e-a728-60d5d504bd16
Model Status: Ready
Training model started on: 8/24/2020 7:33:25 PM +00:00
Training model completed on: 8/24/2020 7:33:27 PM +00:00
Submodel Form Type: form-150828c4-2eb2-487e-a728-60d5d504bd16
FieldName: CompanyAddress
FieldName: CompanyName
FieldName: CompanyPhoneNumber
FieldName: DatedAs
FieldName: Email
FieldName: Merchant
FieldName: PhoneNumber
FieldName: PurchaseOrderNumber
FieldName: Quantity
FieldName: Signature
FieldName: Subtotal
FieldName: Tax
FieldName: Total
FieldName: VendorName
FieldName: Website
...
从资源帐户中删除模型
还可以通过引用模型的 ID 从帐户中删除该模型。 该方法也以此步骤结束。
// <snippet_using>
using Azure;
using Azure.AI.FormRecognizer;
using Azure.AI.FormRecognizer.Models;
using Azure.AI.FormRecognizer.Training;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
// </snippet_using>
class Program
{
// <snippet_creds>
private static readonly string endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
private static readonly string apiKey = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
private static readonly AzureKeyCredential credential = new AzureKeyCredential(apiKey);
// </snippet_creds>
// <snippet_urls>
private static string trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
private static string formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
private static string receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_urls>
// <snippet_main>
static void Main(string[] args)
{
var recognizerClient = AuthenticateClient();
var trainingClient = AuthenticateTrainingClient();
var recognizeContent = RecognizeContent(recognizerClient);
Task.WaitAll(recognizeContent);
var analyzeReceipt = AnalyzeReceipt(recognizerClient, receiptUrl);
Task.WaitAll(analyzeReceipt);
var trainModel = TrainModel(trainingClient, trainingDataUrl);
Task.WaitAll(trainModel);
var analyzeForm = AnalyzePdfForm(recognizerClient, trainModel, formUrl);
Task.WaitAll(analyzeForm);
var manageModels = ManageModels(trainingClient, trainingDataUrl);
Task.WaitAll(manageModels);
}
// </snippet_main>
// <snippet_auth>
private static FormRecognizerClient AuthenticateClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormRecognizerClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth>
// <snippet_auth_training>
static private FormTrainingClient AuthenticateTrainingClient()
{
var credential = new AzureKeyCredential(apiKey);
var client = new FormTrainingClient(new Uri(endpoint), credential);
return client;
}
// </snippet_auth_training>
// <snippet_getcontent_call>
private static async Task RecognizeContent(FormRecognizerClient recognizerClient)
{
var invoiceUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/simple-invoice.png";
FormPageCollection formPages = await recognizerClient
.StartRecognizeContentFromUri(new Uri(invoiceUri))
.WaitForCompletionAsync();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
foreach (FormPage page in formPages)
{
Console.WriteLine($"Form Page {page.PageNumber} has {page.Lines.Count} lines.");
for (int i = 0; i < page.Lines.Count; i++)
{
FormLine line = page.Lines[i];
Console.WriteLine($" Line {i} has {line.Words.Count} word{(line.Words.Count > 1 ? "s" : "")}, and text: '{line.Text}'.");
}
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains text: '{cell.Text}'.");
}
}
}
}
// </snippet_getcontent_print>
// <snippet_receipt_call>
private static async Task AnalyzeReceipt(
FormRecognizerClient recognizerClient, string receiptUri)
{
RecognizedFormCollection receipts = await recognizerClient.StartRecognizeReceiptsFromUri(new Uri(receiptUrl)).WaitForCompletionAsync();
// </snippet_receipt_call>
// <snippet_receipt_print>
foreach (RecognizedForm receipt in receipts)
{
FormField merchantNameField;
if (receipt.Fields.TryGetValue("MerchantName", out merchantNameField))
{
if (merchantNameField.Value.ValueType == FieldValueType.String)
{
string merchantName = merchantNameField.Value.AsString();
Console.WriteLine($"Merchant Name: '{merchantName}', with confidence {merchantNameField.Confidence}");
}
}
FormField transactionDateField;
if (receipt.Fields.TryGetValue("TransactionDate", out transactionDateField))
{
if (transactionDateField.Value.ValueType == FieldValueType.Date)
{
DateTime transactionDate = transactionDateField.Value.AsDate();
Console.WriteLine($"Transaction Date: '{transactionDate}', with confidence {transactionDateField.Confidence}");
}
}
FormField itemsField;
if (receipt.Fields.TryGetValue("Items", out itemsField))
{
if (itemsField.Value.ValueType == FieldValueType.List)
{
foreach (FormField itemField in itemsField.Value.AsList())
{
Console.WriteLine("Item:");
if (itemField.Value.ValueType == FieldValueType.Dictionary)
{
IReadOnlyDictionary<string, FormField> itemFields = itemField.Value.AsDictionary();
FormField itemNameField;
if (itemFields.TryGetValue("Name", out itemNameField))
{
if (itemNameField.Value.ValueType == FieldValueType.String)
{
string itemName = itemNameField.Value.AsString();
Console.WriteLine($" Name: '{itemName}', with confidence {itemNameField.Confidence}");
}
}
FormField itemTotalPriceField;
if (itemFields.TryGetValue("TotalPrice", out itemTotalPriceField))
{
if (itemTotalPriceField.Value.ValueType == FieldValueType.Float)
{
float itemTotalPrice = itemTotalPriceField.Value.AsFloat();
Console.WriteLine($" Total Price: '{itemTotalPrice}', with confidence {itemTotalPriceField.Confidence}");
}
}
}
}
}
}
FormField totalField;
if (receipt.Fields.TryGetValue("Total", out totalField))
{
if (totalField.Value.ValueType == FieldValueType.Float)
{
float total = totalField.Value.AsFloat();
Console.WriteLine($"Total: '{total}', with confidence '{totalField.Confidence}'");
}
}
}
}
// </snippet_receipt_print>
// <snippet_train>
private static async Task<String> TrainModel(
FormTrainingClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: false)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_train>
// <snippet_train_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_train_response>
// <snippet_train_return>
return model.ModelId;
}
// </snippet_train_return>
// <snippet_trainlabels>
private static async Task<Guid> TrainModelWithLabelsAsync(
FormRecognizerClient trainingClient, string trainingDataUrl)
{
CustomFormModel model = await trainingClient
.StartTrainingAsync(new Uri(trainingDataUrl), useTrainingLabels: true)
.WaitForCompletionAsync();
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {model.ModelId}");
Console.WriteLine($" Model Status: {model.Status}");
Console.WriteLine($" Training model started on: {model.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {model.TrainingCompletedOn}");
// </snippet_trainlabels>
// <snippet_trainlabels_response>
foreach (CustomFormSubmodel submodel in model.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
return model.ModelId;
}
// </snippet_trainlabels_response>
// <snippet_analyze>
// Analyze PDF form data
private static async Task AnalyzePdfForm(
FormRecognizerClient recognizerClient, String modelId, string formUrl)
{
RecognizedFormCollection forms = await recognizerClient
.StartRecognizeCustomFormsFromUri(modelId, new Uri(formUrl))
.WaitForCompletionAsync();
// </snippet_analyze>
// <snippet_analyze_response>
foreach (RecognizedForm form in forms)
{
Console.WriteLine($"Form of type: {form.FormType}");
foreach (FormField field in form.Fields.Values)
{
Console.WriteLine($"Field '{field.Name}: ");
if (field.LabelData != null)
{
Console.WriteLine($" Label: '{field.LabelData.Text}");
}
Console.WriteLine($" Value: '{field.ValueData.Text}");
Console.WriteLine($" Confidence: '{field.Confidence}");
}
Console.WriteLine("Table data:");
foreach (FormPage page in form.Pages)
{
for (int i = 0; i < page.Tables.Count; i++)
{
FormTable table = page.Tables[i];
Console.WriteLine($"Table {i} has {table.RowCount} rows and {table.ColumnCount} columns.");
foreach (FormTableCell cell in table.Cells)
{
Console.WriteLine($" Cell ({cell.RowIndex}, {cell.ColumnIndex}) contains {(cell.IsHeader ? "header" : "text")}: '{cell.Text}'");
}
}
}
}
}
// </snippet_analyze_response>
// <snippet_manage>
private static async Task ManageModels(
FormTrainingClient trainingClient, string trainingFileUrl)
{
// </snippet_manage>
// <snippet_manage_model_count>
// Check number of models in the FormRecognizer account,
// and the maximum number of models that can be stored.
AccountProperties accountProperties = trainingClient.GetAccountProperties();
Console.WriteLine($"Account has {accountProperties.CustomModelCount} models.");
Console.WriteLine($"It can have at most {accountProperties.CustomModelLimit} models.");
// </snippet_manage_model_count>
// <snippet_manage_model_list>
Pageable<CustomFormModelInfo> models = trainingClient.GetCustomModels();
foreach (CustomFormModelInfo modelInfo in models)
{
Console.WriteLine($"Custom Model Info:");
Console.WriteLine($" Model Id: {modelInfo.ModelId}");
Console.WriteLine($" Model Status: {modelInfo.Status}");
Console.WriteLine($" Training model started on: {modelInfo.TrainingStartedOn}");
Console.WriteLine($" Training model completed on: {modelInfo.TrainingCompletedOn}");
}
// </snippet_manage_model_list>
// <snippet_manage_model_get>
// Create a new model to store in the account
CustomFormModel model = await trainingClient.StartTrainingAsync(
new Uri(trainingFileUrl)).WaitForCompletionAsync();
// Get the model that was just created
CustomFormModel modelCopy = trainingClient.GetCustomModel(model.ModelId);
Console.WriteLine($"Custom Model {modelCopy.ModelId} recognizes the following form types:");
foreach (CustomFormSubmodel submodel in modelCopy.Submodels)
{
Console.WriteLine($"Submodel Form Type: {submodel.FormType}");
foreach (CustomFormModelField field in submodel.Fields.Values)
{
Console.Write($" FieldName: {field.Name}");
if (field.Label != null)
{
Console.Write($", FieldLabel: {field.Label}");
}
Console.WriteLine("");
}
}
// </snippet_manage_model_get>
// <snippet_manage_model_delete>
// Delete the model from the account.
trainingClient.DeleteModel(model.ModelId);
}
// </snippet_manage_model_delete>
}
运行应用程序
从应用程序目录使用 dotnet run
命令运行应用程序。
dotnet run
清理资源
如果想要清理并移除 Azure AI 服务订阅,可以删除资源或资源组。 删除资源组同时也会删除与之相关联的任何其他资源。
疑难解答
使用 .NET SDK 与 Azure AI 文档智能客户端库交互时,服务返回的错误将导致 RequestFailedException
。 它们包括 REST API 请求会返回的相同的 HTTP 状态代码。
例如,如果提交具有无效 URI 的收据图像,则会返回 400
错误,表示“请求错误”。
try
{
RecognizedReceiptCollection receipts = await client.StartRecognizeReceiptsFromUri(new Uri(receiptUri)).WaitForCompletionAsync();
}
catch (RequestFailedException e)
{
Console.WriteLine(e.ToString());
}
你会发现,操作的客户端请求 ID 等其他信息已被记录下来。
Message:
Azure.RequestFailedException: Service request failed.
Status: 400 (Bad Request)
Content:
{"error":{"code":"FailedToDownloadImage","innerError":
{"requestId":"8ca04feb-86db-4552-857c-fde903251518"},
"message":"Failed to download image from input URL."}}
Headers:
Transfer-Encoding: chunked
x-envoy-upstream-service-time: REDACTED
apim-request-id: REDACTED
Strict-Transport-Security: REDACTED
X-Content-Type-Options: REDACTED
Date: Mon, 20 Apr 2020 22:48:35 GMT
Content-Type: application/json; charset=utf-8
后续步骤
在此项目中,你使用了文档智能 .NET 客户端库以不同的方式来训练模型和分析表单。 接下来,请了解如何创建更好的训练数据集并生成更准确的模型。
重要
本项目面向文档智能 REST API 版本 2.1。
先决条件
Azure 订阅 - 创建试用订阅。
最新版 Java 开发工具包 (JDK)。
Gradle 生成工具,或其他依赖项管理器。
文档智能资源。 可以使用免费定价层 (
F0
) 试用该服务,然后再升级到付费层进行生产。从创建的资源获取密钥和终结点,以便将应用程序连接到文档智能服务。
- 部署资源后,选择“转到资源”。
- 在左侧导航菜单中,选择“密钥和终结点”。
- 复制其中一个密钥和终结点,稍后你将在本快速入门中使用它们。
包含一组训练数据的 Azure 存储 Blob。 有关整理训练数据集的提示和选项,请参阅生成和训练自定义模型。 对于本项目,可使用示例数据集的 Train 文件夹下的文件。 下载并提取 sample_data.zip。
设置编程环境
若要设置编程环境,请创建 Gradle 项目并安装客户端库。
创建新的 Gradle 项目
在控制台窗口中,为你的应用创建一个新目录,然后导航到该目录。
mkdir myapp
cd myapp
从工作目录运行 gradle init
命令。 此命令将创建 Gradle 的基本生成文件,其中包括 build.gradle.kts - 在运行时将使用该文件创建并配置应用程序。
gradle init --type basic
当提示你选择一个 DSL 时,选择 Kotlin。
安装客户端库
本项目使用 Gradle 依赖项管理器。 可以在 Maven 中央存储库中找到客户端库以及其他依赖项管理器的信息。
在项目的 build.gradle.kts 文件中,以 implementation
语句的形式包含客户端库及所需的插件和设置。
plugins {
java
application
}
application {
mainClass.set("FormRecognizer")
}
repositories {
mavenCentral()
}
dependencies {
implementation(group = "com.azure", name = "azure-ai-formrecognizer", version = "3.1.1")
}
创建 Java 文件
在工作目录中运行以下命令:
mkdir -p src/main/java
导航到新的文件夹,创建一个名为 FormRecognizer.java 的文件。 在编辑器或 IDE 中打开该文件并添加以下 import
语句:
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
在应用程序的 FormRecognizer 类中,为资源的密钥和终结点创建变量。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
重要
转到 Azure 门户。 如果你在“先决条件”部分创建的文档智能资源部署成功,请选择“后续步骤”下的“转到资源”按钮。 在“密钥和终结点”页的“资源管理”下,可以找到密钥和终结点。
请记住完成后将密钥从代码中删除。 切勿公开发布。 对于生产环境,请使用安全方法来存储和访问凭据。 有关详细信息,请参阅 Azure AI 服务安全性。
在应用程序的 main
方法中,添加对本项目中使用的方法的调用。 你将在稍后定义这些调用。 还需要为训练和测试数据添加对 URL 的引用。
若要检索自定义模型训练数据的 SAS URL,请在 Azure 门户中找到存储资源,然后选择“数据存储>容器”。
导航到你的容器,右键单击并选择“生成 SAS”。
请务必获取容器的 SAS,而不是存储帐户本身的。
确保选中“读取”、“写入”、“删除”和“列出”权限,然后选择“生成 SAS 令牌和 URL”。
将“URL”部分中的值复制到临时位置。 它应当采用
https://<storage account>.blob.core.chinacloudapi.cn/<container name>?<SAS value>
形式。
若要获取要测试的表单的 URL,可以使用上述步骤获取 blob 存储中单个文档的 SAS URL。 或者获取位于其他位置的文档的 URL。
使用上述方法还可获取收据图像的 URL。
// <snippet_imports>
import com.azure.ai.formrecognizer.* ;
import com.azure.ai.formrecognizer.training.* ;
import com.azure.ai.formrecognizer.models.* ;
import com.azure.ai.formrecognizer.training.models.* ;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
String bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
String invoiceUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/forms/Invoice_1.pdf";
String idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg"
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Analyze business card...");
AnalyzeBusinessCard(recognizerClient, bcUrl);
System.out.println("Analyze invoice...");
AnalyzeInvoice(recognizerClient, invoiceUrl);
System.out.println("Analyze id...");
AnalyzeId(recognizerClient, idUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller < FormRecognizerOperationResult,
List < FormPage >> recognizeContentPoller = recognizerClient.beginRecognizeContentFromUrl(analyzeFilePath);
List < FormPage > contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage - >{
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(), formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable - >{
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(), formTable.getColumnCount());
formTable.getCells().forEach(formTableCell - >{
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> syncPoller = recognizerClient.beginRecognizeReceiptsFromUrl(receiptUri);
List < RecognizedForm > receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName, merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress, merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate, transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List < FormField > receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream().filter(receiptItem - >FieldValueType.MAP == receiptItem.getValue().getValueType()).map(formField - >formField.getValue().asMap()).forEach(formFieldMap - >formFieldMap.forEach((key, formField) - >{
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name, formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity, formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price, formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice, formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_bc_call>
private static void AnalyzeBusinessCard(FormRecognizerClient recognizerClient, String bcUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeBusinessCardPoller = client.beginRecognizeBusinessCardsFromUrl(businessCardUrl);
List < RecognizedForm > businessCardPageResults = recognizeBusinessCardPoller.getFinalResult();
// </snippet_bc_call>
// <snippet_bc_print>
for (int i = 0; i < businessCardPageResults.size(); i++) {
RecognizedForm recognizedForm = businessCardPageResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized business card info for page %d -----------%n", i);
FormField contactNamesFormField = recognizedFields.get("ContactNames");
if (contactNamesFormField != null) {
if (FieldValueType.LIST == contactNamesFormField.getValue().getValueType()) {
List < FormField > contactNamesList = contactNamesFormField.getValue().asList();
contactNamesList.stream().filter(contactName - >FieldValueType.MAP == contactName.getValue().getValueType()).map(contactName - >{
System.out.printf("Contact name: %s%n", contactName.getValueData().getText());
return contactName.getValue().asMap();
}).forEach(contactNamesMap - >contactNamesMap.forEach((key, contactName) - >{
if ("FirstName".equals(key)) {
if (FieldValueType.STRING == contactName.getValue().getValueType()) {
String firstName = contactName.getValue().asString();
System.out.printf("\tFirst Name: %s, confidence: %.2f%n", firstName, contactName.getConfidence());
}
}
if ("LastName".equals(key)) {
if (FieldValueType.STRING == contactName.getValue().getValueType()) {
String lastName = contactName.getValue().asString();
System.out.printf("\tLast Name: %s, confidence: %.2f%n", lastName, contactName.getConfidence());
}
}
}));
}
}
FormField jobTitles = recognizedFields.get("JobTitles");
if (jobTitles != null) {
if (FieldValueType.LIST == jobTitles.getValue().getValueType()) {
List < FormField > jobTitlesItems = jobTitles.getValue().asList();
jobTitlesItems.stream().forEach(jobTitlesItem - >{
if (FieldValueType.STRING == jobTitlesItem.getValue().getValueType()) {
String jobTitle = jobTitlesItem.getValue().asString();
System.out.printf("Job Title: %s, confidence: %.2f%n", jobTitle, jobTitlesItem.getConfidence());
}
});
}
}
FormField departments = recognizedFields.get("Departments");
if (departments != null) {
if (FieldValueType.LIST == departments.getValue().getValueType()) {
List < FormField > departmentsItems = departments.getValue().asList();
departmentsItems.stream().forEach(departmentsItem - >{
if (FieldValueType.STRING == departmentsItem.getValue().getValueType()) {
String department = departmentsItem.getValue().asString();
System.out.printf("Department: %s, confidence: %.2f%n", department, departmentsItem.getConfidence());
}
});
}
}
FormField emails = recognizedFields.get("Emails");
if (emails != null) {
if (FieldValueType.LIST == emails.getValue().getValueType()) {
List < FormField > emailsItems = emails.getValue().asList();
emailsItems.stream().forEach(emailsItem - >{
if (FieldValueType.STRING == emailsItem.getValue().getValueType()) {
String email = emailsItem.getValue().asString();
System.out.printf("Email: %s, confidence: %.2f%n", email, emailsItem.getConfidence());
}
});
}
}
FormField websites = recognizedFields.get("Websites");
if (websites != null) {
if (FieldValueType.LIST == websites.getValue().getValueType()) {
List < FormField > websitesItems = websites.getValue().asList();
websitesItems.stream().forEach(websitesItem - >{
if (FieldValueType.STRING == websitesItem.getValue().getValueType()) {
String website = websitesItem.getValue().asString();
System.out.printf("Web site: %s, confidence: %.2f%n", website, websitesItem.getConfidence());
}
});
}
}
FormField mobilePhones = recognizedFields.get("MobilePhones");
if (mobilePhones != null) {
if (FieldValueType.LIST == mobilePhones.getValue().getValueType()) {
List < FormField > mobilePhonesItems = mobilePhones.getValue().asList();
mobilePhonesItems.stream().forEach(mobilePhonesItem - >{
if (FieldValueType.PHONE_NUMBER == mobilePhonesItem.getValue().getValueType()) {
String mobilePhoneNumber = mobilePhonesItem.getValue().asPhoneNumber();
System.out.printf("Mobile phone number: %s, confidence: %.2f%n", mobilePhoneNumber, mobilePhonesItem.getConfidence());
}
});
}
}
FormField otherPhones = recognizedFields.get("OtherPhones");
if (otherPhones != null) {
if (FieldValueType.LIST == otherPhones.getValue().getValueType()) {
List < FormField > otherPhonesItems = otherPhones.getValue().asList();
otherPhonesItems.stream().forEach(otherPhonesItem - >{
if (FieldValueType.PHONE_NUMBER == otherPhonesItem.getValue().getValueType()) {
String otherPhoneNumber = otherPhonesItem.getValue().asPhoneNumber();
System.out.printf("Other phone number: %s, confidence: %.2f%n", otherPhoneNumber, otherPhonesItem.getConfidence());
}
});
}
}
FormField faxes = recognizedFields.get("Faxes");
if (faxes != null) {
if (FieldValueType.LIST == faxes.getValue().getValueType()) {
List < FormField > faxesItems = faxes.getValue().asList();
faxesItems.stream().forEach(faxesItem - >{
if (FieldValueType.PHONE_NUMBER == faxesItem.getValue().getValueType()) {
String faxPhoneNumber = faxesItem.getValue().asPhoneNumber();
System.out.printf("Fax phone number: %s, confidence: %.2f%n", faxPhoneNumber, faxesItem.getConfidence());
}
});
}
}
FormField addresses = recognizedFields.get("Addresses");
if (addresses != null) {
if (FieldValueType.LIST == addresses.getValue().getValueType()) {
List < FormField > addressesItems = addresses.getValue().asList();
addressesItems.stream().forEach(addressesItem - >{
if (FieldValueType.STRING == addressesItem.getValue().getValueType()) {
String address = addressesItem.getValue().asString();
System.out.printf("Address: %s, confidence: %.2f%n", address, addressesItem.getConfidence());
}
});
}
}
FormField companyName = recognizedFields.get("CompanyNames");
if (companyName != null) {
if (FieldValueType.LIST == companyName.getValue().getValueType()) {
List < FormField > companyNameItems = companyName.getValue().asList();
companyNameItems.stream().forEach(companyNameItem - >{
if (FieldValueType.STRING == companyNameItem.getValue().getValueType()) {
String companyNameValue = companyNameItem.getValue().asString();
System.out.printf("Company name: %s, confidence: %.2f%n", companyNameValue, companyNameItem.getConfidence());
}
});
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static void AnalyzeInvoice(FormRecognizerClient recognizerClient, String invoiceUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeInvoicesPoller = client.beginRecognizeInvoicesFromUrl(invoiceUrl);
List < RecognizedForm > recognizedInvoices = recognizeInvoicesPoller.getFinalResult();
// </snippet_invoice_call>
// <snippet_invoice_print>
for (int i = 0; i < recognizedInvoices.size(); i++) {
RecognizedForm recognizedInvoice = recognizedInvoices.get(i);
Map < String,
FormField > recognizedFields = recognizedInvoice.getFields();
System.out.printf("----------- Recognized invoice info for page %d -----------%n", i);
FormField vendorNameField = recognizedFields.get("VendorName");
if (vendorNameField != null) {
if (FieldValueType.STRING == vendorNameField.getValue().getValueType()) {
String merchantName = vendorNameField.getValue().asString();
System.out.printf("Vendor Name: %s, confidence: %.2f%n", merchantName, vendorNameField.getConfidence());
}
}
FormField vendorAddressField = recognizedFields.get("VendorAddress");
if (vendorAddressField != null) {
if (FieldValueType.STRING == vendorAddressField.getValue().getValueType()) {
String merchantAddress = vendorAddressField.getValue().asString();
System.out.printf("Vendor address: %s, confidence: %.2f%n", merchantAddress, vendorAddressField.getConfidence());
}
}
FormField customerNameField = recognizedFields.get("CustomerName");
if (customerNameField != null) {
if (FieldValueType.STRING == customerNameField.getValue().getValueType()) {
String merchantAddress = customerNameField.getValue().asString();
System.out.printf("Customer Name: %s, confidence: %.2f%n", merchantAddress, customerNameField.getConfidence());
}
}
FormField customerAddressRecipientField = recognizedFields.get("CustomerAddressRecipient");
if (customerAddressRecipientField != null) {
if (FieldValueType.STRING == customerAddressRecipientField.getValue().getValueType()) {
String customerAddr = customerAddressRecipientField.getValue().asString();
System.out.printf("Customer Address Recipient: %s, confidence: %.2f%n", customerAddr, customerAddressRecipientField.getConfidence());
}
}
FormField invoiceIdField = recognizedFields.get("InvoiceId");
if (invoiceIdField != null) {
if (FieldValueType.STRING == invoiceIdField.getValue().getValueType()) {
String invoiceId = invoiceIdField.getValue().asString();
System.out.printf("Invoice Id: %s, confidence: %.2f%n", invoiceId, invoiceIdField.getConfidence());
}
}
FormField invoiceDateField = recognizedFields.get("InvoiceDate");
if (customerNameField != null) {
if (FieldValueType.DATE == invoiceDateField.getValue().getValueType()) {
LocalDate invoiceDate = invoiceDateField.getValue().asDate();
System.out.printf("Invoice Date: %s, confidence: %.2f%n", invoiceDate, invoiceDateField.getConfidence());
}
}
FormField invoiceTotalField = recognizedFields.get("InvoiceTotal");
if (customerAddressRecipientField != null) {
if (FieldValueType.FLOAT == invoiceTotalField.getValue().getValueType()) {
Float invoiceTotal = invoiceTotalField.getValue().asFloat();
System.out.printf("Invoice Total: %.2f, confidence: %.2f%n", invoiceTotal, invoiceTotalField.getConfidence());
}
}
}
}
// </snippet_invoice_print>
//<snippet_id_call>
private static void AnalyzeId(FormRecognizerClient client, String idUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> analyzeIdentityDocumentPoller = client.beginRecognizeIdentityDocumentsFromUrl(licenseDocumentUrl);
List < RecognizedForm > identityDocumentResults = analyzeIdentityDocumentPoller.getFinalResult();
//</snippet_id_call>
//<snippet_id_print>
for (int i = 0; i < identityDocumentResults.size(); i++) {
RecognizedForm recognizedForm = identityDocumentResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized license info for page %d -----------%n", i);
FormField addressField = recognizedFields.get("Address");
if (addressField != null) {
if (FieldValueType.STRING == addressField.getValue().getValueType()) {
String address = addressField.getValue().asString();
System.out.printf("Address: %s, confidence: %.2f%n", address, addressField.getConfidence());
}
}
FormField countryRegionFormField = recognizedFields.get("CountryRegion");
if (countryRegionFormField != null) {
if (FieldValueType.STRING == countryRegionFormField.getValue().getValueType()) {
String countryRegion = countryRegionFormField.getValue().asCountryRegion();
System.out.printf("Country or region: %s, confidence: %.2f%n", countryRegion, countryRegionFormField.getConfidence());
}
}
FormField dateOfBirthField = recognizedFields.get("DateOfBirth");
if (dateOfBirthField != null) {
if (FieldValueType.DATE == dateOfBirthField.getValue().getValueType()) {
LocalDate dateOfBirth = dateOfBirthField.getValue().asDate();
System.out.printf("Date of Birth: %s, confidence: %.2f%n", dateOfBirth, dateOfBirthField.getConfidence());
}
}
FormField dateOfExpirationField = recognizedFields.get("DateOfExpiration");
if (dateOfExpirationField != null) {
if (FieldValueType.DATE == dateOfExpirationField.getValue().getValueType()) {
LocalDate expirationDate = dateOfExpirationField.getValue().asDate();
System.out.printf("Document date of expiration: %s, confidence: %.2f%n", expirationDate, dateOfExpirationField.getConfidence());
}
}
FormField documentNumberField = recognizedFields.get("DocumentNumber");
if (documentNumberField != null) {
if (FieldValueType.STRING == documentNumberField.getValue().getValueType()) {
String documentNumber = documentNumberField.getValue().asString();
System.out.printf("Document number: %s, confidence: %.2f%n", documentNumber, documentNumberField.getConfidence());
}
}
FormField firstNameField = recognizedFields.get("FirstName");
if (firstNameField != null) {
if (FieldValueType.STRING == firstNameField.getValue().getValueType()) {
String firstName = firstNameField.getValue().asString();
System.out.printf("First Name: %s, confidence: %.2f%n", firstName, documentNumberField.getConfidence());
}
}
FormField lastNameField = recognizedFields.get("LastName");
if (lastNameField != null) {
if (FieldValueType.STRING == lastNameField.getValue().getValueType()) {
String lastName = lastNameField.getValue().asString();
System.out.printf("Last name: %s, confidence: %.2f%n", lastName, lastNameField.getConfidence());
}
}
FormField regionField = recognizedFields.get("Region");
if (regionField != null) {
if (FieldValueType.STRING == regionField.getValue().getValueType()) {
String region = regionField.getValue().asString();
System.out.printf("Region: %s, confidence: %.2f%n", region, regionField.getConfidence());
}
}
}
//</snippet_id_print>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller < FormRecognizerOperationResult,
CustomFormModel > trainingPoller = trainingClient.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel - >{
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) - >System.out.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller < FormRecognizerOperationResult,
CustomFormModel > trainingPoller = trainingClient.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel - >{
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(), customFormSubmodel.getAccuracy());
customFormSubmodel.getFields().forEach((label, customFormModelField) - >System.out.printf("The model found field '%s' to have name: %s with an accuracy: %.2f%n", label, customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeFormPoller = formClient.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List < RecognizedForm > recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) - >
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label, formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference < String > modelId = new AtomicReference < >();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models", accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable < CustomFormModelInfo > customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo - >{
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel - >{
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) - >{
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(), trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
// <snippet_imports>
import com.azure.ai.formrecognizer.* ;
import com.azure.ai.formrecognizer.training.* ;
import com.azure.ai.formrecognizer.models.* ;
import com.azure.ai.formrecognizer.training.models.* ;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
String bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
String invoiceUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/forms/Invoice_1.pdf";
String idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg"
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Analyze business card...");
AnalyzeBusinessCard(recognizerClient, bcUrl);
System.out.println("Analyze invoice...");
AnalyzeInvoice(recognizerClient, invoiceUrl);
System.out.println("Analyze id...");
AnalyzeId(recognizerClient, idUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller < FormRecognizerOperationResult,
List < FormPage >> recognizeContentPoller = recognizerClient.beginRecognizeContentFromUrl(analyzeFilePath);
List < FormPage > contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage - >{
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(), formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable - >{
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(), formTable.getColumnCount());
formTable.getCells().forEach(formTableCell - >{
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> syncPoller = recognizerClient.beginRecognizeReceiptsFromUrl(receiptUri);
List < RecognizedForm > receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName, merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress, merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate, transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List < FormField > receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream().filter(receiptItem - >FieldValueType.MAP == receiptItem.getValue().getValueType()).map(formField - >formField.getValue().asMap()).forEach(formFieldMap - >formFieldMap.forEach((key, formField) - >{
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name, formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity, formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price, formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice, formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_bc_call>
private static void AnalyzeBusinessCard(FormRecognizerClient recognizerClient, String bcUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeBusinessCardPoller = client.beginRecognizeBusinessCardsFromUrl(businessCardUrl);
List < RecognizedForm > businessCardPageResults = recognizeBusinessCardPoller.getFinalResult();
// </snippet_bc_call>
// <snippet_bc_print>
for (int i = 0; i < businessCardPageResults.size(); i++) {
RecognizedForm recognizedForm = businessCardPageResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized business card info for page %d -----------%n", i);
FormField contactNamesFormField = recognizedFields.get("ContactNames");
if (contactNamesFormField != null) {
if (FieldValueType.LIST == contactNamesFormField.getValue().getValueType()) {
List < FormField > contactNamesList = contactNamesFormField.getValue().asList();
contactNamesList.stream().filter(contactName - >FieldValueType.MAP == contactName.getValue().getValueType()).map(contactName - >{
System.out.printf("Contact name: %s%n", contactName.getValueData().getText());
return contactName.getValue().asMap();
}).forEach(contactNamesMap - >contactNamesMap.forEach((key, contactName) - >{
if ("FirstName".equals(key)) {
if (FieldValueType.STRING == contactName.getValue().getValueType()) {
String firstName = contactName.getValue().asString();
System.out.printf("\tFirst Name: %s, confidence: %.2f%n", firstName, contactName.getConfidence());
}
}
if ("LastName".equals(key)) {
if (FieldValueType.STRING == contactName.getValue().getValueType()) {
String lastName = contactName.getValue().asString();
System.out.printf("\tLast Name: %s, confidence: %.2f%n", lastName, contactName.getConfidence());
}
}
}));
}
}
FormField jobTitles = recognizedFields.get("JobTitles");
if (jobTitles != null) {
if (FieldValueType.LIST == jobTitles.getValue().getValueType()) {
List < FormField > jobTitlesItems = jobTitles.getValue().asList();
jobTitlesItems.stream().forEach(jobTitlesItem - >{
if (FieldValueType.STRING == jobTitlesItem.getValue().getValueType()) {
String jobTitle = jobTitlesItem.getValue().asString();
System.out.printf("Job Title: %s, confidence: %.2f%n", jobTitle, jobTitlesItem.getConfidence());
}
});
}
}
FormField departments = recognizedFields.get("Departments");
if (departments != null) {
if (FieldValueType.LIST == departments.getValue().getValueType()) {
List < FormField > departmentsItems = departments.getValue().asList();
departmentsItems.stream().forEach(departmentsItem - >{
if (FieldValueType.STRING == departmentsItem.getValue().getValueType()) {
String department = departmentsItem.getValue().asString();
System.out.printf("Department: %s, confidence: %.2f%n", department, departmentsItem.getConfidence());
}
});
}
}
FormField emails = recognizedFields.get("Emails");
if (emails != null) {
if (FieldValueType.LIST == emails.getValue().getValueType()) {
List < FormField > emailsItems = emails.getValue().asList();
emailsItems.stream().forEach(emailsItem - >{
if (FieldValueType.STRING == emailsItem.getValue().getValueType()) {
String email = emailsItem.getValue().asString();
System.out.printf("Email: %s, confidence: %.2f%n", email, emailsItem.getConfidence());
}
});
}
}
FormField websites = recognizedFields.get("Websites");
if (websites != null) {
if (FieldValueType.LIST == websites.getValue().getValueType()) {
List < FormField > websitesItems = websites.getValue().asList();
websitesItems.stream().forEach(websitesItem - >{
if (FieldValueType.STRING == websitesItem.getValue().getValueType()) {
String website = websitesItem.getValue().asString();
System.out.printf("Web site: %s, confidence: %.2f%n", website, websitesItem.getConfidence());
}
});
}
}
FormField mobilePhones = recognizedFields.get("MobilePhones");
if (mobilePhones != null) {
if (FieldValueType.LIST == mobilePhones.getValue().getValueType()) {
List < FormField > mobilePhonesItems = mobilePhones.getValue().asList();
mobilePhonesItems.stream().forEach(mobilePhonesItem - >{
if (FieldValueType.PHONE_NUMBER == mobilePhonesItem.getValue().getValueType()) {
String mobilePhoneNumber = mobilePhonesItem.getValue().asPhoneNumber();
System.out.printf("Mobile phone number: %s, confidence: %.2f%n", mobilePhoneNumber, mobilePhonesItem.getConfidence());
}
});
}
}
FormField otherPhones = recognizedFields.get("OtherPhones");
if (otherPhones != null) {
if (FieldValueType.LIST == otherPhones.getValue().getValueType()) {
List < FormField > otherPhonesItems = otherPhones.getValue().asList();
otherPhonesItems.stream().forEach(otherPhonesItem - >{
if (FieldValueType.PHONE_NUMBER == otherPhonesItem.getValue().getValueType()) {
String otherPhoneNumber = otherPhonesItem.getValue().asPhoneNumber();
System.out.printf("Other phone number: %s, confidence: %.2f%n", otherPhoneNumber, otherPhonesItem.getConfidence());
}
});
}
}
FormField faxes = recognizedFields.get("Faxes");
if (faxes != null) {
if (FieldValueType.LIST == faxes.getValue().getValueType()) {
List < FormField > faxesItems = faxes.getValue().asList();
faxesItems.stream().forEach(faxesItem - >{
if (FieldValueType.PHONE_NUMBER == faxesItem.getValue().getValueType()) {
String faxPhoneNumber = faxesItem.getValue().asPhoneNumber();
System.out.printf("Fax phone number: %s, confidence: %.2f%n", faxPhoneNumber, faxesItem.getConfidence());
}
});
}
}
FormField addresses = recognizedFields.get("Addresses");
if (addresses != null) {
if (FieldValueType.LIST == addresses.getValue().getValueType()) {
List < FormField > addressesItems = addresses.getValue().asList();
addressesItems.stream().forEach(addressesItem - >{
if (FieldValueType.STRING == addressesItem.getValue().getValueType()) {
String address = addressesItem.getValue().asString();
System.out.printf("Address: %s, confidence: %.2f%n", address, addressesItem.getConfidence());
}
});
}
}
FormField companyName = recognizedFields.get("CompanyNames");
if (companyName != null) {
if (FieldValueType.LIST == companyName.getValue().getValueType()) {
List < FormField > companyNameItems = companyName.getValue().asList();
companyNameItems.stream().forEach(companyNameItem - >{
if (FieldValueType.STRING == companyNameItem.getValue().getValueType()) {
String companyNameValue = companyNameItem.getValue().asString();
System.out.printf("Company name: %s, confidence: %.2f%n", companyNameValue, companyNameItem.getConfidence());
}
});
}
}
}
}
// </snippet_bc_print>
// <snippet_invoice_call>
private static void AnalyzeInvoice(FormRecognizerClient recognizerClient, String invoiceUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeInvoicesPoller = client.beginRecognizeInvoicesFromUrl(invoiceUrl);
List < RecognizedForm > recognizedInvoices = recognizeInvoicesPoller.getFinalResult();
// </snippet_invoice_call>
// <snippet_invoice_print>
for (int i = 0; i < recognizedInvoices.size(); i++) {
RecognizedForm recognizedInvoice = recognizedInvoices.get(i);
Map < String,
FormField > recognizedFields = recognizedInvoice.getFields();
System.out.printf("----------- Recognized invoice info for page %d -----------%n", i);
FormField vendorNameField = recognizedFields.get("VendorName");
if (vendorNameField != null) {
if (FieldValueType.STRING == vendorNameField.getValue().getValueType()) {
String merchantName = vendorNameField.getValue().asString();
System.out.printf("Vendor Name: %s, confidence: %.2f%n", merchantName, vendorNameField.getConfidence());
}
}
FormField vendorAddressField = recognizedFields.get("VendorAddress");
if (vendorAddressField != null) {
if (FieldValueType.STRING == vendorAddressField.getValue().getValueType()) {
String merchantAddress = vendorAddressField.getValue().asString();
System.out.printf("Vendor address: %s, confidence: %.2f%n", merchantAddress, vendorAddressField.getConfidence());
}
}
FormField customerNameField = recognizedFields.get("CustomerName");
if (customerNameField != null) {
if (FieldValueType.STRING == customerNameField.getValue().getValueType()) {
String merchantAddress = customerNameField.getValue().asString();
System.out.printf("Customer Name: %s, confidence: %.2f%n", merchantAddress, customerNameField.getConfidence());
}
}
FormField customerAddressRecipientField = recognizedFields.get("CustomerAddressRecipient");
if (customerAddressRecipientField != null) {
if (FieldValueType.STRING == customerAddressRecipientField.getValue().getValueType()) {
String customerAddr = customerAddressRecipientField.getValue().asString();
System.out.printf("Customer Address Recipient: %s, confidence: %.2f%n", customerAddr, customerAddressRecipientField.getConfidence());
}
}
FormField invoiceIdField = recognizedFields.get("InvoiceId");
if (invoiceIdField != null) {
if (FieldValueType.STRING == invoiceIdField.getValue().getValueType()) {
String invoiceId = invoiceIdField.getValue().asString();
System.out.printf("Invoice Id: %s, confidence: %.2f%n", invoiceId, invoiceIdField.getConfidence());
}
}
FormField invoiceDateField = recognizedFields.get("InvoiceDate");
if (customerNameField != null) {
if (FieldValueType.DATE == invoiceDateField.getValue().getValueType()) {
LocalDate invoiceDate = invoiceDateField.getValue().asDate();
System.out.printf("Invoice Date: %s, confidence: %.2f%n", invoiceDate, invoiceDateField.getConfidence());
}
}
FormField invoiceTotalField = recognizedFields.get("InvoiceTotal");
if (customerAddressRecipientField != null) {
if (FieldValueType.FLOAT == invoiceTotalField.getValue().getValueType()) {
Float invoiceTotal = invoiceTotalField.getValue().asFloat();
System.out.printf("Invoice Total: %.2f, confidence: %.2f%n", invoiceTotal, invoiceTotalField.getConfidence());
}
}
}
}
// </snippet_invoice_print>
//<snippet_id_call>
private static void AnalyzeId(FormRecognizerClient client, String idUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> analyzeIdentityDocumentPoller = client.beginRecognizeIdentityDocumentsFromUrl(licenseDocumentUrl);
List < RecognizedForm > identityDocumentResults = analyzeIdentityDocumentPoller.getFinalResult();
//</snippet_id_call>
//<snippet_id_print>
for (int i = 0; i < identityDocumentResults.size(); i++) {
RecognizedForm recognizedForm = identityDocumentResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized license info for page %d -----------%n", i);
FormField addressField = recognizedFields.get("Address");
if (addressField != null) {
if (FieldValueType.STRING == addressField.getValue().getValueType()) {
String address = addressField.getValue().asString();
System.out.printf("Address: %s, confidence: %.2f%n", address, addressField.getConfidence());
}
}
FormField countryRegionFormField = recognizedFields.get("CountryRegion");
if (countryRegionFormField != null) {
if (FieldValueType.STRING == countryRegionFormField.getValue().getValueType()) {
String countryRegion = countryRegionFormField.getValue().asCountryRegion();
System.out.printf("Country or region: %s, confidence: %.2f%n", countryRegion, countryRegionFormField.getConfidence());
}
}
FormField dateOfBirthField = recognizedFields.get("DateOfBirth");
if (dateOfBirthField != null) {
if (FieldValueType.DATE == dateOfBirthField.getValue().getValueType()) {
LocalDate dateOfBirth = dateOfBirthField.getValue().asDate();
System.out.printf("Date of Birth: %s, confidence: %.2f%n", dateOfBirth, dateOfBirthField.getConfidence());
}
}
FormField dateOfExpirationField = recognizedFields.get("DateOfExpiration");
if (dateOfExpirationField != null) {
if (FieldValueType.DATE == dateOfExpirationField.getValue().getValueType()) {
LocalDate expirationDate = dateOfExpirationField.getValue().asDate();
System.out.printf("Document date of expiration: %s, confidence: %.2f%n", expirationDate, dateOfExpirationField.getConfidence());
}
}
FormField documentNumberField = recognizedFields.get("DocumentNumber");
if (documentNumberField != null) {
if (FieldValueType.STRING == documentNumberField.getValue().getValueType()) {
String documentNumber = documentNumberField.getValue().asString();
System.out.printf("Document number: %s, confidence: %.2f%n", documentNumber, documentNumberField.getConfidence());
}
}
FormField firstNameField = recognizedFields.get("FirstName");
if (firstNameField != null) {
if (FieldValueType.STRING == firstNameField.getValue().getValueType()) {
String firstName = firstNameField.getValue().asString();
System.out.printf("First Name: %s, confidence: %.2f%n", firstName, documentNumberField.getConfidence());
}
}
FormField lastNameField = recognizedFields.get("LastName");
if (lastNameField != null) {
if (FieldValueType.STRING == lastNameField.getValue().getValueType()) {
String lastName = lastNameField.getValue().asString();
System.out.printf("Last name: %s, confidence: %.2f%n", lastName, lastNameField.getConfidence());
}
}
FormField regionField = recognizedFields.get("Region");
if (regionField != null) {
if (FieldValueType.STRING == regionField.getValue().getValueType()) {
String region = regionField.getValue().asString();
System.out.printf("Region: %s, confidence: %.2f%n", region, regionField.getConfidence());
}
}
}
//</snippet_id_print>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller < FormRecognizerOperationResult,
CustomFormModel > trainingPoller = trainingClient.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel - >{
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) - >System.out.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller < FormRecognizerOperationResult,
CustomFormModel > trainingPoller = trainingClient.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel - >{
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(), customFormSubmodel.getAccuracy());
customFormSubmodel.getFields().forEach((label, customFormModelField) - >System.out.printf("The model found field '%s' to have name: %s with an accuracy: %.2f%n", label, customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> recognizeFormPoller = formClient.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List < RecognizedForm > recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) - >
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label, formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference < String > modelId = new AtomicReference < >();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models", accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable < CustomFormModelInfo > customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo - >{
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel - >{
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) - >{
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(), trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
使用对象模型
在文档智能中,可以创建两种不同的客户端类型。 第一种是 FormRecognizerClient
,用于查询服务以识别表单域和内容。 第二种是 FormTrainingClient
,用于创建和管理可改进识别的自定义模型。
FormRecognizerClient
提供以下任务的操作:
- 使用为了分析自定义表单而训练的自定义模型,来识别表单字段和内容。 这些值在
RecognizedForm
对象的集合中返回。 请参阅分析自定义表单。 - 无需训练模型即可识别表单内容,包括表格、行和单词。 表单内容在
FormPage
对象的集合中返回。 请参阅分析布局。 - 使用文档智能服务上预先训练的模型来识别美国的收据、名片、发票和 ID 文档中的常见字段。
FormTrainingClient
提供操作以实现以下目的:
- 训练自定义模型,以分析在自定义表单中找到的所有字段和值。 会返回一个
CustomFormModel
,它指示模型分析的表单类型,以及为每种表单类型提取的字段。 - 训练自定义模型,以分析通过标记自定义表单指定的特定字段和值。 会返回一个
CustomFormModel
,它指示模型提取的字段以及每个字段的估计准确度。 - 管理在帐户中创建的模型。
- 将自定义模型从一个文档智能资源复制到另一个资源。
注意
还可以使用图形用户界面(例如示例标记工具)来训练模型。
对客户端进行身份验证
在 main
方法顶部添加以下代码。 你将使用前面定义的订阅变量对两个客户端对象进行身份验证。 你将使用 AzureKeyCredential
对象,因此在需要时,你无需创建新的客户端对象即可更新密钥。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
分析布局
可以使用文档智能分析文档中的表格、行和单词,而无需训练模型。 有关布局提取的详细信息,请参阅文档智能布局模型。
若要分析位于给定 URL 的文件的内容,请使用 beginRecognizeContentFromUrl
方法。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
提示
还可从本地文件中获取内容。 请参阅 FormRecognizerClient 方法,例如 beginRecognizeContent
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
返回的值是 FormPage
对象的集合。 提交的文档中每个页面都有一个对象。 下面的代码循环访问这些对象,并输出提取的键/值对和表数据。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
结果与以下输出类似。
Get form content...
----Recognizing content ----
Has width: 8.500000 and height: 11.000000, measured with unit: inch.
Table has 2 rows and 6 columns.
Cell has text Invoice Number.
Cell has text Invoice Date.
Cell has text Invoice Due Date.
Cell has text Charges.
Cell has text VAT ID.
Cell has text 458176.
Cell has text 3/28/2018.
Cell has text 4/16/2018.
Cell has text $89,024.34.
Cell has text ET.
分析回执
本部分演示了如何使用预先训练的收据模型分析和提取美国收据中的常见字段。 有关收据分析的详细信息,请参阅文档智能收据模型。
若要分析位于某个 URI 的收据,请使用 beginRecognizeReceiptsFromUrl
方法。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
提示
你还可以分析本地收据图像。 请参阅 FormRecognizerClient 方法,例如 beginRecognizeReceipts
。 或者,请参阅 GitHub 上的示例代码,了解涉及本地图像的方案。
返回的值是 RecognizedReceipt
对象的集合。 提交的文档中每个页面都有一个对象。 接下来的代码块会循环访问回执,并将其详细信息输出到控制台。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
接下来的代码块会循环访问在回执上检测到的各个项,并将其详细信息输出到控制台。
// <snippet_imports>
import com.azure.ai.formrecognizer.*;
import com.azure.ai.formrecognizer.training.*;
import com.azure.ai.formrecognizer.models.*;
import com.azure.ai.formrecognizer.training.models.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder()
.credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key))
.endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media"
+ "/contoso-allinone.jpg";
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller<FormRecognizerOperationResult, List<FormPage>> recognizeContentPoller = recognizerClient
.beginRecognizeContentFromUrl(analyzeFilePath);
List<FormPage> contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage -> {
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(),
formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable -> {
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(),
formTable.getColumnCount());
formTable.getCells().forEach(formTableCell -> {
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> syncPoller = recognizerClient
.beginRecognizeReceiptsFromUrl(receiptUri);
List<RecognizedForm> receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map<String, FormField> recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName,
merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress,
merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate,
transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>
FormField receiptItemsField = recognizedFields.get("Items");
if (receiptItemsField != null) {
System.out.printf("Receipt Items: %n");
if (FieldValueType.LIST == receiptItemsField.getValue().getValueType()) {
List<FormField> receiptItems = receiptItemsField.getValue().asList();
receiptItems.stream()
.filter(receiptItem -> FieldValueType.MAP == receiptItem.getValue().getValueType())
.map(formField -> formField.getValue().asMap())
.forEach(formFieldMap -> formFieldMap.forEach((key, formField) -> {
if ("Name".equals(key)) {
if (FieldValueType.STRING == formField.getValue().getValueType()) {
String name = formField.getValue().asString();
System.out.printf("Name: %s, confidence: %.2fs%n", name,
formField.getConfidence());
}
}
if ("Quantity".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float quantity = formField.getValue().asFloat();
System.out.printf("Quantity: %f, confidence: %.2f%n", quantity,
formField.getConfidence());
}
}
if ("Price".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float price = formField.getValue().asFloat();
System.out.printf("Price: %f, confidence: %.2f%n", price,
formField.getConfidence());
}
}
if ("TotalPrice".equals(key)) {
if (FieldValueType.FLOAT == formField.getValue().getValueType()) {
Float totalPrice = formField.getValue().asFloat();
System.out.printf("Total Price: %f, confidence: %.2f%n", totalPrice,
formField.getConfidence());
}
}
}));
}
}
}
}
// </snippet_receipts_print_items>
// <snippet_train_call>
private static String TrainModel(FormTrainingClient trainingClient, String trainingDataUrl) {
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingDataUrl, false);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_train_call>
// <snippet_train_print>
System.out.println("Recognized Fields:");
// looping through the subModels, which contains the fields they were trained on
// Since the given training documents are unlabeled, we still group them but
// they do not have a label.
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
// Since the training data is unlabeled, we are unable to return the accuracy of
// this model
System.out.printf("The subModel has form type %s%n", customFormSubmodel.getFormType());
customFormSubmodel.getFields().forEach((field, customFormModelField) -> System.out
.printf("The model found field '%s' with label: %s%n", field, customFormModelField.getLabel()));
});
// </snippet_train_print>
// <snippet_train_return>
return customFormModel.getModelId();
}
// </snippet_train_return>
// <snippet_trainlabels_call>
private static String TrainModelWithLabels(FormTrainingClient trainingClient, String trainingDataUrl) {
// Train custom model
String trainingSetSource = trainingDataUrl;
SyncPoller<FormRecognizerOperationResult, CustomFormModel> trainingPoller = trainingClient
.beginTraining(trainingSetSource, true);
CustomFormModel customFormModel = trainingPoller.getFinalResult();
// Model Info
System.out.printf("Model Id: %s%n", customFormModel.getModelId());
System.out.printf("Model Status: %s%n", customFormModel.getModelStatus());
System.out.printf("Training started on: %s%n", customFormModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n%n", customFormModel.getTrainingCompletedOn());
// </snippet_trainlabels_call>
// <snippet_trainlabels_print>
// looping through the subModels, which contains the fields they were trained on
// The labels are based on the ones you gave the training document.
System.out.println("Recognized Fields:");
// Since the data is labeled, we are able to return the accuracy of the model
customFormModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("The subModel with form type %s has accuracy: %.2f%n", customFormSubmodel.getFormType(),
customFormSubmodel.getAccuracy());
customFormSubmodel.getFields()
.forEach((label, customFormModelField) -> System.out.printf(
"The model found field '%s' to have name: %s with an accuracy: %.2f%n", label,
customFormModelField.getName(), customFormModelField.getAccuracy()));
});
return customFormModel.getModelId();
}
// </snippet_trainlabels_print>
// <snippet_analyze_call>
// Analyze PDF form data
private static void AnalyzePdfForm(FormRecognizerClient formClient, String modelId, String pdfFormUrl) {
SyncPoller<FormRecognizerOperationResult, List<RecognizedForm>> recognizeFormPoller = formClient
.beginRecognizeCustomFormsFromUrl(modelId, pdfFormUrl);
List<RecognizedForm> recognizedForms = recognizeFormPoller.getFinalResult();
// </snippet_analyze_call>
// <snippet_analyze_print>
for (int i = 0; i < recognizedForms.size(); i++) {
final RecognizedForm form = recognizedForms.get(i);
System.out.printf("----------- Recognized custom form info for page %d -----------%n", i);
System.out.printf("Form type: %s%n", form.getFormType());
form.getFields().forEach((label, formField) ->
// label data is populated if you are using a model trained with unlabeled data,
// since the service needs to make predictions for labels if not explicitly
// given to it.
System.out.printf("Field '%s' has label '%s' with a confidence " + "score of %.2f.%n", label,
formField.getLabelData().getText(), formField.getConfidence()));
}
}
// </snippet_analyze_print>
// <snippet_manage>
private static void ManageModels(FormTrainingClient trainingClient, String trainingFileUrl) {
// </snippet_manage>
// <snippet_manage_count>
AtomicReference<String> modelId = new AtomicReference<>();
// First, we see how many custom models we have, and what our limit is
AccountProperties accountProperties = trainingClient.getAccountProperties();
System.out.printf("The account has %s custom models, and we can have at most %s custom models",
accountProperties.getCustomModelCount(), accountProperties.getCustomModelLimit());
// </snippet_manage_count>
// <snippet_manage_list>
// Next, we get a paged list of all of our custom models
PagedIterable<CustomFormModelInfo> customModels = trainingClient.listCustomModels();
System.out.println("We have following models in the account:");
customModels.forEach(customFormModelInfo -> {
System.out.printf("Model Id: %s%n", customFormModelInfo.getModelId());
// get custom model info
modelId.set(customFormModelInfo.getModelId());
CustomFormModel customModel = trainingClient.getCustomModel(customFormModelInfo.getModelId());
System.out.printf("Model Id: %s%n", customModel.getModelId());
System.out.printf("Model Status: %s%n", customModel.getModelStatus());
System.out.printf("Training started on: %s%n", customModel.getTrainingStartedOn());
System.out.printf("Training completed on: %s%n", customModel.getTrainingCompletedOn());
customModel.getSubmodels().forEach(customFormSubmodel -> {
System.out.printf("Custom Model Form type: %s%n", customFormSubmodel.getFormType());
System.out.printf("Custom Model Accuracy: %.2f%n", customFormSubmodel.getAccuracy());
if (customFormSubmodel.getFields() != null) {
customFormSubmodel.getFields().forEach((fieldText, customFormModelField) -> {
System.out.printf("Field Text: %s%n", fieldText);
System.out.printf("Field Accuracy: %.2f%n", customFormModelField.getAccuracy());
});
}
});
});
// </snippet_manage_list>
// <snippet_manage_delete>
// Delete Custom Model
System.out.printf("Deleted model with model Id: %s, operation completed with status: %s%n", modelId.get(),
trainingClient.deleteModelWithResponse(modelId.get(), Context.NONE).getStatusCode());
}
// </snippet_manage_delete>
}
结果与以下输出类似。
Analyze receipt...
----------- Recognized Receipt page 0 -----------
Merchant Name: Contoso Contoso, confidence: 0.62
Merchant Address: 123 Main Street Redmond, WA 98052, confidence: 0.99
Transaction Date: 2020-06-10, confidence: 0.90
Receipt Items:
Name: Cappuccino, confidence: 0.96s
Quantity: null, confidence: 0.957s]
Total Price: 2.200000, confidence: 0.95
Name: BACON & EGGS, confidence: 0.94s
Quantity: null, confidence: 0.927s]
Total Price: null, confidence: 0.93
分析名片
本部分演示了如何使用预先训练的模型分析和提取英文名片中的常见字段。 有关名片分析的详细信息,请参阅文档智能名片模型。
若要分析位于某个 URL 的名片,请使用 beginRecognizeBusinessCardsFromUrl
方法。
// <snippet_imports>
import com.azure.ai.formrecognizer.* ;
import com.azure.ai.formrecognizer.training.* ;
import com.azure.ai.formrecognizer.models.* ;
import com.azure.ai.formrecognizer.training.models.* ;
import java.util.concurrent.atomic.AtomicReference;
import java.util.List;
import java.util.Map;
import java.time.LocalDate;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.Context;
import com.azure.core.util.polling.SyncPoller;
// </snippet_imports>
public class FormRecognizer {
// <snippet_creds>
static final String key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE";
static final String endpoint = "PASTE_YOUR_FORM_RECOGNIZER_ENDPOINT_HERE";
// </snippet_creds>
public static void main(String[] args) {
// <snippet_auth>
FormRecognizerClient recognizerClient = new FormRecognizerClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
FormTrainingClient trainingClient = new FormTrainingClientBuilder().credential(new AzureKeyCredential(key)).endpoint(endpoint).buildClient();
// </snippet_auth>
// <snippet_mainvars>
String trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE";
String formUrl = "PASTE_YOUR_FORM_RECOGNIZER_FORM_URL_HERE";
String receiptUrl = "/cognitive-services/form-recognizer/media" + "/contoso-allinone.jpg";
String bcUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/business_cards/business-card-english.jpg";
String invoiceUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/forms/Invoice_1.pdf";
String idUrl = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/id-license.jpg"
// </snippet_mainvars>
// <snippet_maincalls>
// Call Form Recognizer scenarios:
System.out.println("Get form content...");
GetContent(recognizerClient, formUrl);
System.out.println("Analyze receipt...");
AnalyzeReceipt(recognizerClient, receiptUrl);
System.out.println("Analyze business card...");
AnalyzeBusinessCard(recognizerClient, bcUrl);
System.out.println("Analyze invoice...");
AnalyzeInvoice(recognizerClient, invoiceUrl);
System.out.println("Analyze id...");
AnalyzeId(recognizerClient, idUrl);
System.out.println("Train Model with training data...");
String modelId = TrainModel(trainingClient, trainingDataUrl);
System.out.println("Analyze PDF form...");
AnalyzePdfForm(recognizerClient, modelId, formUrl);
System.out.println("Manage models...");
ManageModels(trainingClient, trainingDataUrl);
// </snippet_maincalls>
}
// <snippet_getcontent_call>
private static void GetContent(FormRecognizerClient recognizerClient, String invoiceUri) {
String analyzeFilePath = invoiceUri;
SyncPoller < FormRecognizerOperationResult,
List < FormPage >> recognizeContentPoller = recognizerClient.beginRecognizeContentFromUrl(analyzeFilePath);
List < FormPage > contentResult = recognizeContentPoller.getFinalResult();
// </snippet_getcontent_call>
// <snippet_getcontent_print>
contentResult.forEach(formPage - >{
// Table information
System.out.println("----Recognizing content ----");
System.out.printf("Has width: %f and height: %f, measured with unit: %s.%n", formPage.getWidth(), formPage.getHeight(), formPage.getUnit());
formPage.getTables().forEach(formTable - >{
System.out.printf("Table has %d rows and %d columns.%n", formTable.getRowCount(), formTable.getColumnCount());
formTable.getCells().forEach(formTableCell - >{
System.out.printf("Cell has text %s.%n", formTableCell.getText());
});
System.out.println();
});
});
}
// </snippet_getcontent_print>
// <snippet_receipts_call>
private static void AnalyzeReceipt(FormRecognizerClient recognizerClient, String receiptUri) {
SyncPoller < FormRecognizerOperationResult,
List < RecognizedForm >> syncPoller = recognizerClient.beginRecognizeReceiptsFromUrl(receiptUri);
List < RecognizedForm > receiptPageResults = syncPoller.getFinalResult();
// </snippet_receipts_call>
// <snippet_receipts_print>
for (int i = 0; i < receiptPageResults.size(); i++) {
RecognizedForm recognizedForm = receiptPageResults.get(i);
Map < String,
FormField > recognizedFields = recognizedForm.getFields();
System.out.printf("----------- Recognized Receipt page %d -----------%n", i);
FormField merchantNameField = recognizedFields.get("MerchantName");
if (merchantNameField != null) {
if (FieldValueType.STRING == merchantNameField.getValue().getValueType()) {
String merchantName = merchantNameField.getValue().asString();
System.out.printf("Merchant Name: %s, confidence: %.2f%n", merchantName, merchantNameField.getConfidence());
}
}
FormField merchantAddressField = recognizedFields.get("MerchantAddress");
if (merchantAddressField != null) {
if (FieldValueType.STRING == merchantAddressField.getValue().getValueType()) {
String merchantAddress = merchantAddressField.getValue().asString();
System.out.printf("Merchant Address: %s, confidence: %.2f%n", merchantAddress, merchantAddressField.getConfidence());
}
}
FormField transactionDateField = recognizedFields.get("TransactionDate");
if (transactionDateField != null) {
if (FieldValueType.DATE == transactionDateField.getValue().getValueType()) {
LocalDate transactionDate = transactionDateField.getValue().asDate();
System.out.printf("Transaction Date: %s, confidence: %.2f%n", transactionDate, transactionDateField.getConfidence());
}
}
// </snippet_receipts_print>
// <snippet_receipts_print_items>