Important
The Azure Language Text Personally Identifiable Information (PII) detection anonymization feature (synthetic replacement) is currently available in preview and licensed to you as part of your Azure subscription. Your use of this feature is subject to the terms applicable to Previews as described in the Supplemental Terms of Use for Microsoft Azure Previews and the Microsoft Products and Services Data Protection Addendum (DPA).
Azure Language Personally Identifiable Information (PII) detection is a feature offered by Azure Language. The PII detection service is a cloud-based API that utilizes machine learning and AI algorithms to help you develop intelligent applications with advanced natural language understanding. Azure Language PII detection uses Named Entity Recognition (NER) to identify and redact sensitive information from input data. The service classifies sensitive personal data into predefined categories. These categories include phone numbers, email addresses, and identification documents. This classification helps to efficiently detect and eliminate such information.
What's new
The 2025-11-15-preview version introduces the following new PII task parameters:
Multiple redaction policies let you apply different redaction approaches within a single request.
Configurable confidence threshold enables you to set a minimum confidence score. Entities are only included in the output if their confidence score meets or exceeds the specified threshold.
Disable type validation enforcement enables you to bypass the entity type validation. By default, the service enforces validation across multiple entity types to ensure data integrity and minimize false positives. Disabling this enforcement can enhance operational efficiency in cases where strict validation isn't required.
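The confidence threshold rule described above can be illustrated on the client side. The following is a minimal sketch, not the service's implementation: the helper name `filter_by_confidence` and the sample entities are hypothetical, and the only assumption about the response is that each entity carries a `confidenceScore` field.

```python
from typing import Any


def filter_by_confidence(entities: list[dict[str, Any]], threshold: float) -> list[dict[str, Any]]:
    """Keep only entities whose confidenceScore meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Entities without a score are treated as 0.0 (an assumption of this sketch).
    return [e for e in entities if e.get("confidenceScore", 0.0) >= threshold]


# Hand-written sample entities, shaped like PII detection output.
sample_entities = [
    {"text": "Jane Doe", "category": "Person", "confidenceScore": 0.98},
    {"text": "Paris", "category": "City", "confidenceScore": 0.55},
    {"text": "555-0100", "category": "PhoneNumber", "confidenceScore": 0.80},
]

kept = filter_by_confidence(sample_entities, threshold=0.8)
print([e["text"] for e in kept])
```

With a threshold of 0.8, the entity scored 0.55 is dropped, while the entity scored exactly 0.80 is kept because the comparison is "meets or exceeds".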
The following entities are available in preview:
- Airport
- DateOfBirth
- BankAccountNumber
- CASocialIdentificationNumber
- CVV (card verification value)
- City
- PassportNumber
- DriversLicenseNumber
- ExpirationDate
- Geopolitical Entity
- KRDriversLicenseNumber
- KRPassportNumber
- KRSocialSecurityNumber
- LicensePlate
- Location
- Password
- SortCode
- State
- USMedicareBeneficiaryId
- VIN (vehicle identification number)
- ZipCode
Conversational PII detection models (both version 2024-11-01-preview and GA) are updated to provide enhanced AI quality and accuracy. The numeric identifier entity type now also includes Drivers License and Medicare Beneficiary Identifier.
- As of June 2024, we now provide General Availability support for the Conversational PII service (English-language only).
- Customers can now redact transcripts, chats, and other text written in a conversational style.
- These capabilities provide better confidence in AI quality. They also offer Azure SLA support, production environment support, and enterprise-grade security.
Capabilities
Currently, PII support is available for the following capabilities:
- General text PII detection for processing sensitive information (PII) and health information (PHI) in unstructured text across several predefined categories.
- Conversation PII detection, a specialized model designed to handle speech transcriptions and the informal, conversational tone found in meeting and call transcripts.
- Native Document PII detection for processing structured document files.
Language is a cloud-based service that applies Natural Language Processing (NLP) features to detect categories of personal information (PII) in text-based data. This documentation contains the following types of articles:
- Quickstarts are getting-started instructions to guide you through making requests to the service.
- How-to guides contain instructions for using the service in more specific or customized ways.
Typical workflow
To use this feature, you submit data for analysis and handle the API output in your application. Analysis is performed as-is, with no added customization to the model used on your data.
Create an Azure Language resource, which grants you access to the features offered by Language. It generates a password (called a key) and an endpoint URL that you use to authenticate API requests.
Create a request using either the REST API or the client library for C#, Java, JavaScript, and Python. You can also send asynchronous calls with a batch request to combine API requests for multiple features into a single call.
Send the request containing your text data. Your key and endpoint are used for authentication.
Stream or store the response locally.
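The workflow above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the official client library: the `api-version` value, the example endpoint, and the placeholder key are assumptions, and the helper names `build_pii_request` and `send` are hypothetical. Check the REST API reference for the request shape that applies to your resource.

```python
import json
import urllib.request

# Assumption: an api-version your resource supports; confirm in the REST reference.
API_VERSION = "2023-04-01"


def build_pii_request(endpoint: str, key: str, texts: list[str],
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a PII detection request for a list of texts."""
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,  # the key from your Language resource
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical endpoint and key; replace with the values from your own resource.
request = build_pii_request(
    "https://example-resource.cognitiveservices.azure.com",
    "your-key-here",
    ["Call our office at 555-0100 and ask for Jane Doe."],
)
print(request.get_method(), request.full_url)
# response = send(request)  # uncomment once the endpoint and key are real
```

Building the request separately from sending it keeps the key and endpoint handling in one place and lets you inspect the payload before any data leaves your application.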
Key features for text PII
Language offers named entity recognition to identify and categorize information within your text. The feature detects PII categories including names, organizations, addresses, phone numbers, financial account numbers or codes, and government identification numbers. A subset of this PII is protected health information (PHI). By specifying domain=phi in your request, only PHI entities are returned.
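As a rough sketch of how the PHI filter might be expressed in a request body, the following helper adds `domain` to the task parameters only when it is requested. The helper name `build_pii_body` and the sample text are hypothetical, and placing `domain` inside the `parameters` object is an assumption to confirm against the REST API reference.

```python
import json
from typing import Optional


def build_pii_body(text: str, domain: Optional[str] = None, language: str = "en") -> dict:
    """Build a PII detection request body, optionally restricted to a domain."""
    parameters = {"modelVersion": "latest"}
    if domain is not None:
        # Assumption: "phi" limits the returned entities to PHI categories.
        parameters["domain"] = domain
    return {
        "kind": "PiiEntityRecognition",
        "parameters": parameters,
        "analysisInput": {
            "documents": [{"id": "1", "language": language, "text": text}]
        },
    }


phi_body = build_pii_body(
    "The patient was seen on March 3 for a follow-up visit.", domain="phi"
)
print(json.dumps(phi_body["parameters"], indent=2))
```

Leaving `domain` out of the body entirely, rather than sending an empty value, keeps the default behavior of returning all supported PII categories.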
Get started with PII detection
To use PII detection, you submit text for analysis and handle the API output in your application. Analysis is performed as-is, with no customization to the model used on your data. There are two ways to use PII detection:
| Development option | Description |
|---|---|
| Language Studio | Language Studio is a web-based platform that lets you try PII detection with text examples, or with your own data when you sign up. For more information, see the Language Studio website or the Language Studio quickstart. |
| REST API or Client library (Azure SDK) | Integrate PII detection into your applications using the REST API, or the client library available in various languages. |
Reference documentation and code samples
As you use this feature in your applications, see the following reference documentation and samples for Azure Language:
| Development option / language | Reference documentation | Samples |
|---|---|---|
| REST API | REST API documentation | |
| C# | C# documentation | C# samples |
| Java | Java documentation | Java samples |
| JavaScript | JavaScript documentation | JavaScript samples |
| Python | Python documentation | Python samples |
Input requirements and service limits
- Text PII takes text for analysis. For more information, see Data and service limits in the how-to guide.
- PII works with various written languages. For more information, see language support. You can specify in which supported languages your source text is written. If you don't specify a language, the extraction defaults to English. The API may return offsets in the response to support different multilingual and emoji encodings.
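Offsets matter because the same entity can sit at different positions depending on how characters are counted. The sketch below compares two common units, Unicode code points (what Python string indexing uses) and UTF-16 code units, on a string that contains an emoji. The helper name `slice_utf16` and the sample text are hypothetical; which unit applies to your responses is something to confirm in the how-to guide.

```python
def slice_utf16(text: str, offset: int, length: int) -> str:
    """Extract a substring using an offset and length counted in UTF-16 code units."""
    encoded = text.encode("utf-16-le")  # every UTF-16 code unit is 2 bytes
    return encoded[2 * offset: 2 * (offset + length)].decode("utf-16-le")


text = "Hi 👋 call 555-0100"
phone = "555-0100"

# Offset in Unicode code points: plain Python string indexing.
codepoint_offset = text.index(phone)

# Offset in UTF-16 code units: the emoji occupies two units instead of one.
utf16_offset = len(text[:codepoint_offset].encode("utf-16-le")) // 2

print(codepoint_offset, utf16_offset)
print(text[codepoint_offset:codepoint_offset + len(phone)])
print(slice_utf16(text, utf16_offset, len(phone)))
```

Both slices recover the same phone number, but only when each offset is paired with the matching unit. Mixing them shifts the span by one position for every character outside the Basic Multilingual Plane that precedes the entity.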
Example scenarios
- Apply sensitivity labels - For example, based on the results from the PII service, a public sensitivity label might be applied to documents where no PII entities are detected. For documents where US addresses and phone numbers are recognized, a confidential label might be applied. A highly confidential label might be used for documents where bank routing numbers are recognized.
- Redact some categories of personal information from documents that get wider circulation - For example, if customer contact records are accessible to frontline support representatives, the company can redact the customer's personal information other than their name from that version of the customer history to preserve the customer's privacy.
- Redact personal information in order to reduce unconscious bias - For example, during its resume review process, a company can block name, address, and phone number to help reduce unconscious gender or other biases.
- Replace personal information in source data for machine learning to reduce unfairness - For example, if you want to remove names that might reveal gender when training a machine learning model, you could use the service to identify them and you could replace them with generic placeholders for model training.
- Remove personal information from call center transcription - For example, if you want to remove names or other PII data exchanged between the agent and the customer in a call center scenario, you could use the service to identify and remove them.
- Data cleaning for data science - PII detection can be used to prepare data for data scientists and engineers to train their machine learning models, redacting the data to make sure that customer data isn't exposed.
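Several of these scenarios come down to replacing detected spans with generic placeholders. The following is a minimal sketch of that step, assuming each entity carries `offset`, `length`, and `category` fields, that offsets are counted in Unicode code points, and that entities don't overlap. The helper name `replace_with_placeholders`, the sample transcript, and the entities are hypothetical.

```python
def replace_with_placeholders(text: str, entities: list[dict]) -> str:
    """Replace each detected entity span with a placeholder such as [PERSON]."""
    result = text
    # Process from the end of the text so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["offset"], reverse=True):
        start = entity["offset"]
        end = start + entity["length"]
        placeholder = f"[{entity['category'].upper()}]"
        result = result[:start] + placeholder + result[end:]
    return result


transcript = "Agent: Thanks, Jane Doe. I'll call 555-0100 tomorrow."

# Hand-written entities shaped like PII detection output; offsets are computed
# from the sample text rather than returned by the service.
detected = [
    {"category": "Person", "offset": transcript.index("Jane Doe"), "length": len("Jane Doe")},
    {"category": "PhoneNumber", "offset": transcript.index("555-0100"), "length": len("555-0100")},
]

print(replace_with_placeholders(transcript, detected))
```

Category-based placeholders keep the sentence structure readable for downstream training or review, which plain character masking doesn't.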
Next steps
There are two ways to get started using the PII detection feature:
- Language Studio is a web-based platform that lets you use several Language service features without needing to write code.
- The quickstart article provides instructions for making requests to the service using the REST API and client library SDK.