What is the Azure Information Protection unified labeling scanner?

*Applies to: Azure Information Protection, Windows Server 2019, Windows Server 2016, Windows Server 2012 R2*

*Relevant for: AIP unified labeling client only. For the classic client, see What is the Azure Information Protection classic scanner?*

Note

To scan and label files on cloud repositories, use Cloud App Security instead of the scanner.

Use the information in this section to learn about the Azure Information Protection unified labeling scanner, and then how to install, configure, and run it, and, if necessary, troubleshoot it.

The AIP scanner runs as a service on Windows Server and lets you discover, classify, and protect files on the following data stores:

  • UNC paths for network shares that use the SMB or NFS (Preview) protocols.

  • SharePoint document libraries and folders for SharePoint Server 2013 through SharePoint Server 2019. SharePoint 2010 is also supported for customers who have extended support for this version of SharePoint.

To classify and protect your files, the scanner uses sensitivity labels configured in one of the Microsoft 365 labeling admin centers: the Microsoft 365 Security Center, the Microsoft 365 Compliance Center, or the Microsoft 365 Security and Compliance Center.

Azure Information Protection unified labeling scanner overview

The AIP scanner can inspect any files that Windows can index. If you've configured sensitivity labels to apply automatic classification, the scanner can label discovered files to apply that classification, and optionally apply or remove protection.

The following image shows the AIP scanner architecture, where the scanner discovers files across your on-premises and SharePoint servers.

Azure Information Protection unified labeling scanner architecture

To inspect your files, the scanner uses IFilters installed on the computer. To determine whether the files need labeling, the scanner uses the Microsoft 365 built-in data loss prevention (DLP) sensitivity information types and pattern detection, or Microsoft 365 regex patterns.

The scanner uses the Azure Information Protection client, and can classify and protect the same types of files as the client. For more information, see File types supported by the Azure Information Protection unified labeling client.

Do any of the following to configure your scans as needed:

  • Run the scanner in discovery mode only, to create reports that show what would happen if your files were labeled.
  • Run the scanner to discover files with sensitive information, without configuring labels that apply automatic classification.
  • Run the scanner automatically to apply labels as configured.
  • Define a file types list to specify which file types to scan or to exclude.

Note

The scanner does not discover and label in real time. It systematically crawls through files on data stores that you specify. Configure this cycle to run once, or repeatedly.

Tip

The unified labeling scanner supports scanner clusters with multiple nodes, enabling your organization to scale out, achieving faster scan times and broader scope.

Deploy multiple nodes right from the start, or start with a single-node cluster and add nodes later as you grow. To deploy multiple nodes, use the same cluster name and database for the Install-AIPScanner cmdlet on each node.

AIP scanning process

When scanning files, the AIP scanner runs through the following steps:

1. Determine whether files are included or excluded for scanning

2. Inspect and label files

3. Label files that can't be inspected

For more information, see Files not labeled by the scanner.

1. Determine whether files are included or excluded for scanning

The scanner automatically skips files that are excluded from classification and protection, such as executable files and system files. For more information, see File types excluded from classification and protection.

The scanner also considers any file lists explicitly defined to scan, or to exclude from scanning. File lists apply to all data repositories by default, and can also be defined for specific repositories only.
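The inclusion and exclusion logic described above can be pictured with a minimal Python sketch. This is not the scanner's implementation: the function names, the `ALWAYS_EXCLUDED` set, and the way a per-repository list overrides the default are all assumptions made for illustration, and the real built-in exclusion list is the one in the documentation referenced above.

```python
from pathlib import PureWindowsPath

# Hypothetical stand-in for the built-in exclusions (executables, system files).
# The real list is documented separately; these three are only examples.
ALWAYS_EXCLUDED = {".exe", ".dll", ".sys"}


def file_list_for(repo, default_list, per_repo_lists):
    """Pick the file list for a repository: a repository-specific list if one
    is defined, otherwise the default list that applies to all repositories."""
    return per_repo_lists.get(repo, default_list)


def is_in_scope(path, include=None, exclude=None):
    """Return True if this sketch would pass the file on to inspection."""
    ext = PureWindowsPath(path).suffix.lower()
    if ext in ALWAYS_EXCLUDED:
        return False              # skipped regardless of any file list
    if exclude and ext in exclude:
        return False              # explicitly excluded
    if include is not None:
        return ext in include     # an include list restricts the scope
    return True                   # no lists defined: in scope
```

For example, `is_in_scope(r"\\fs01\share\setup.exe")` is rejected by the built-in exclusions, while `is_in_scope(r"\\fs01\share\plan.docx", include={".docx"})` is accepted. The server and share names are placeholders.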

To define file lists for scanning or exclusion, use the File types to scan setting in the content scan job. For example:

Configure file types to scan for the Azure Information Protection scanner

For more information, see Deploying the Azure Information Protection scanner to automatically classify and protect files.

2. Inspect and label files

After identifying excluded files, the scanner filters again to identify files supported for inspection.

These filters are the same ones used by the operating system for Windows Search and indexing, and require no extra configuration. Windows IFilter is also used to scan file types that are used by Word, Excel, and PowerPoint, and for PDF documents and text files.

For a full list of file types supported for inspection, and other instructions for configuring filters to include .zip and .tiff files, see File types supported for inspection.

After inspection, supported file types are labeled using the conditions specified for your labels. If you're using discovery mode, these files can either be reported to contain the conditions specified for your labels, or reported to contain any known sensitive information types.

Stopped scanner processes

If the scanner stops and doesn't complete a scan for a large number of files in your repository, you may need to increase the number of dynamic ports for the operating system hosting the files.

For example, server hardening for SharePoint is one reason why the scanner would exceed the number of allowed network connections, and therefore stop.

To check whether server hardening for SharePoint is the cause of the scanner stopping, check for the following error message in the scanner logs at %localappdata%\Microsoft\MSIP\Logs\MSIPScanner.iplog (multiple logs are compressed into a zip file):

Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted IP:port
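Checking the log for this message can be scripted. The following Python sketch is one possible approach, not a supported tool: the function names are hypothetical, the log is assumed to be readable as UTF-8 text (undecodable bytes are replaced), and `%localappdata%` expands for whichever account runs the script, so it likely needs to run as the scanner service account or use that account's profile path instead.

```python
import os
from pathlib import Path

# Substring of the error message quoted above.
SOCKET_ERROR = "Only one usage of each socket address"


def find_socket_errors(log_text):
    """Return the log lines that contain the socket-exhaustion message."""
    return [line for line in log_text.splitlines() if SOCKET_ERROR in line]


def default_log_path():
    """Expand the log location given in the text for the current account."""
    return Path(os.path.expandvars(
        r"%localappdata%\Microsoft\MSIP\Logs\MSIPScanner.iplog"))


if __name__ == "__main__":
    log = default_log_path()
    if log.is_file():
        # Encoding is an assumption; errors="replace" avoids decode failures.
        text = log.read_text(encoding="utf-8", errors="replace")
        hits = find_socket_errors(text)
        print(f"{len(hits)} matching line(s) in {log}")
    else:
        print(f"Log not found: {log}")
```

The sketch reads only the current log file. Older logs that have been compressed into a zip file would need to be extracted first.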

For more information about how to view the current port range and increase it if needed, see Settings that can be modified to improve network performance.

Tip

For large SharePoint farms, you may need to increase the list view threshold, which has a default of 5,000.

For more information, see Manage large lists and libraries in SharePoint.

3. Label files that can't be inspected

For any file types that can't be inspected, the AIP scanner applies the default label in the Azure Information Protection policy, or the default label configured for the scanner.

Files not labeled by the scanner

The AIP scanner cannot label files under the following circumstances:

  • When the label applies classification, but not protection, and the client does not support classification only for that file type. For more information, see Unified labeling client file types.

  • When the label applies classification and protection, but the scanner does not support the file type.

    By default, the scanner protects only Office file types, and PDF files when they are protected by using the ISO standard for PDF encryption.

    Other types of files can be added for protection when you change the types of files to protect.

Example: After inspecting .txt files, the scanner can't apply a label that's configured for classification only, because the .txt file type doesn't support classification only.

However, if the label is configured for both classification and protection, and the .txt file type is included for the scanner to protect, the scanner can label the file.
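The two circumstances and the .txt example can be restated as a small decision function. This Python sketch only models the rules as described in this section. The function name and the extension sets are illustrative assumptions, not the product's actual lists, which are in the file-type documentation referenced above.

```python
def can_label(ext, label_applies_protection, classification_only_exts, protectable_exts):
    """Model of the rules above: a label that also protects needs a file type
    the scanner is configured to protect; a classification-only label needs a
    file type that supports classification only."""
    ext = ext.lower()
    if label_applies_protection:
        return ext in protectable_exts
    return ext in classification_only_exts


# Illustrative sets only (assumptions, not the documented lists).
CLASSIFICATION_ONLY = {".docx", ".xlsx", ".pptx", ".pdf"}
PROTECTED_BY_DEFAULT = {".docx", ".xlsx", ".pptx", ".pdf"}

# Classification-only label on a .txt file: cannot be labeled.
txt_classify_only = can_label(".txt", False, CLASSIFICATION_ONLY, PROTECTED_BY_DEFAULT)

# Label with protection, after .txt has been added to the types to protect.
txt_with_protection = can_label(".txt", True, CLASSIFICATION_ONLY,
                                PROTECTED_BY_DEFAULT | {".txt"})
```

In this model `txt_classify_only` is `False` and `txt_with_protection` is `True`, matching the example in the text.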

Next steps

For more information about deploying the scanner, see the following articles:

More information: