Guidance for personal data stored in Log Analytics and Application Insights

Log Analytics is a data store where personal data is likely to be found. Application Insights stores its data in a Log Analytics partition. This article discusses where in Log Analytics and Application Insights such data is typically found, as well as the capabilities available to you to handle it.

Note

For the purposes of this article, _log data_ refers to data sent to a Log Analytics workspace, while _application data_ refers to data collected by Application Insights.

Note

If you're interested in viewing or deleting personal data, see the Azure Data Subject Requests for the GDPR article. For general information about GDPR, see the GDPR section of the Service Trust portal.

Strategy for personal data handling

It is ultimately up to you and your company to determine the strategy with which you will handle your private data (if at all), but the following are some possible approaches. They are listed in order of preference from a technical point of view, from most to least preferable:

  • Where possible, stop collecting the data, or obfuscate, anonymize, or otherwise adjust the data being collected so that it is no longer considered "private". This is _by far_ the preferred approach, saving you the need to create a very costly and impactful data handling strategy.
  • Where that is not possible, attempt to normalize the data to reduce the impact on the data platform and performance. For example, instead of logging an explicit user ID, create a lookup table that correlates the username and their details to an internal ID, which can then be logged elsewhere. That way, should one of your users ask you to delete their personal information, deleting only the row in the lookup table corresponding to that user may be sufficient.
  • Finally, if private data must be collected, build a process around the purge API path and the existing query API path to meet any obligations you may have around exporting and deleting any private data associated with a user.
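
The lookup-table approach in the second bullet can be sketched roughly as follows. This is a minimal in-memory illustration, not a prescribed design: the `UserLookup` class, the `log_record` helper, and the `user_ref` field name are all hypothetical, and a real lookup table would live in a durable store outside Log Analytics.

```python
import uuid


class UserLookup:
    """Hypothetical lookup table: username -> opaque internal ID.

    Only the internal ID is ever written to the logs.
    """

    def __init__(self):
        self._by_name = {}

    def internal_id(self, username):
        # Create a random ID on first sight; reuse it afterwards so that
        # log records for the same user can still be correlated.
        if username not in self._by_name:
            self._by_name[username] = str(uuid.uuid4())
        return self._by_name[username]

    def forget(self, username):
        # Deleting the row breaks the link between logged IDs and the person.
        return self._by_name.pop(username, None) is not None


def log_record(lookup, username, action):
    """Build the record that would be sent to the workspace (assumed shape)."""
    return {"user_ref": lookup.internal_id(username), "action": action}
```

After `forget` is called, records already logged still carry the old `user_ref`, but nothing in the lookup table maps it back to a username. Whether that alone satisfies a deletion request is a compliance decision, not a technical one.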

Where to look for private data in Log Analytics?

Log Analytics is a flexible store which, while prescribing a schema for your data, allows you to override every field with custom values. Additionally, any custom schema can be ingested. As such, it is impossible to say exactly where private data will be found in your specific workspace. The following locations, however, are good starting points for your inventory:

Log data

  • IP addresses: Log Analytics collects a variety of IP information across many different tables. For example, the following query shows all tables where IPv4 addresses have been collected over the last 24 hours:
    search * 
    | where * matches regex @'\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b' //RegEx originally provided on https://stackoverflow.com/questions/5284147/validating-ipv4-addresses-with-regexp
    | summarize count() by $table
    
  • User IDs: User IDs are found in a large variety of solutions and tables. You can look for a particular username across your entire dataset using the search command:
    search "[username goes here]"
    
    Remember to look not only for human-readable usernames but also for GUIDs that can be traced directly back to a particular user!
  • Device IDs: Like user IDs, device IDs are sometimes considered "private". Use the same approach as listed above for user IDs to identify tables where this might be a concern.
  • Custom data: Log Analytics allows data to be collected in a variety of ways: custom logs and custom fields, the HTTP Data Collector API, and custom data collected as part of system event logs. All of these are susceptible to containing private data, and should be examined to verify whether any such data exists.
  • Solution-captured data: Because the solution mechanism is an open-ended one, we recommend reviewing all tables generated by solutions to ensure compliance.
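
As a complement to the queries above, the same inventory can be run offline against rows that have already been exported, which helps with custom logs and solution tables whose columns are not known in advance. The sketch below is only an illustration: the `flag_columns` helper and the row shape (one dict per row) are assumptions, and the patterns will produce false positives (for example, a four-part version number looks like an IPv4 address).

```python
import re

# IPv4-looking and GUID-looking substrings; heuristics, not validators.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b")
GUID = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")


def flag_columns(rows):
    """Return {column name: set of pattern names} for suspicious columns.

    `rows` is assumed to be an iterable of dicts, one per exported row.
    """
    flagged = {}
    for row in rows:
        for column, value in row.items():
            text = str(value)
            if IPV4.search(text):
                flagged.setdefault(column, set()).add("ipv4")
            if GUID.search(text):
                flagged.setdefault(column, set()).add("guid")
    return flagged
```

Columns reported by a scan like this are candidates for manual review, not a definitive list of where private data lives.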

Application data

  • IP addresses: While Application Insights by default obfuscates all IP address fields to "0.0.0.0", it is a fairly common pattern to override this value with the actual user IP to maintain session information. The Analytics query below can be used to find any table that contains values other than "0.0.0.0" in the IP address column over the last 24 hours:
    search client_IP != "0.0.0.0"
    | where timestamp > ago(1d)
    | summarize numNonObfuscatedIPs_24h = count() by $table
    
  • User IDs: By default, Application Insights uses randomly generated IDs for user and session tracking. However, it is common to see these fields overridden to store an ID more relevant to the application, for example usernames or AAD GUIDs. These IDs are often considered to be in scope as personal data and should therefore be handled appropriately. Our recommendation is always to attempt to obfuscate or anonymize these IDs. Fields where these values are commonly found include session_Id, user_Id, user_AuthenticatedId, user_AccountId, and customDimensions.
  • Custom data: Application Insights allows you to append a set of custom dimensions to any data type. These dimensions can contain any data. Use the following query to identify any custom dimensions collected over the last 24 hours:
    search * 
    | where isnotempty(customDimensions)
    | where timestamp > ago(1d)
    | project $table, timestamp, name, customDimensions 
    
  • In-memory and in-transit data: Application Insights tracks exceptions, requests, dependency calls, and traces. Private data can often be collected at the code and HTTP call level. Review the exceptions, requests, dependencies, and traces tables to identify any such data. Use telemetry initializers where possible to obfuscate this data.
  • Snapshot Debugger captures: The Snapshot Debugger feature in Application Insights allows you to collect debug snapshots whenever an exception is caught on the production instance of your application. Snapshots expose the full stack trace leading to the exception as well as the values of local variables at every step in the stack. Unfortunately, this feature does not allow for selective deletion of snap points or programmatic access to data within the snapshot. Therefore, if the default snapshot retention rate does not satisfy your compliance requirements, the recommendation is to turn off the feature.
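
The obfuscation recommended for IP addresses and user IDs can be illustrated with a small scrubbing step applied before telemetry leaves the application. The sketch below is not the Application Insights telemetry initializer API: `scrub` is a hypothetical helper operating on a plain dict, the field names are taken from the list above, and the keyed hash is one possible pseudonymization choice among several.

```python
import hashlib
import hmac

# Fields named in this article as common places for user identifiers.
ID_FIELDS = ("user_Id", "user_AuthenticatedId", "user_AccountId", "session_Id")


def scrub(item, secret):
    """Return a copy of a telemetry item with IP and ID fields obfuscated.

    `item` is assumed to be a flat dict; `secret` is a bytes key kept
    outside the telemetry store.
    """
    clean = dict(item)
    if "client_IP" in clean:
        clean["client_IP"] = "0.0.0.0"  # same placeholder the default uses
    for field in ID_FIELDS:
        value = clean.get(field)
        if value:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            # Keyed hash: stable per user, not reversible without the key.
            clean[field] = digest.hexdigest()[:32]
    return clean
```

Because the hash is keyed and deterministic, sessions for the same user still correlate. Note that a keyed hash is pseudonymization rather than anonymization, so whether it takes the data out of scope is a question for your compliance team.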

How to export and delete private data

As mentioned in the Strategy for personal data handling section earlier, it is __strongly__ recommended that, if at all possible, you restructure your data collection policy to disable the collection of private data, obfuscate or anonymize it, or otherwise modify it so that it is no longer considered "private". Handling the data will first of all cost you and your team the effort to define and automate a strategy and to build an interface through which your customers can interact with their data, plus ongoing maintenance. Further, it is computationally costly for Log Analytics and Application Insights, and a large volume of concurrent query or purge API calls has the potential to negatively impact all other interaction with Log Analytics functionality. That said, there are indeed some valid scenarios where private data must be collected. For these cases, data should be handled as described in this section.

Note

This article provides steps for deleting personal data from the device or service and can be used to support your obligations under the GDPR. For general information about GDPR, see the GDPR section of the Service Trust portal.

View and export

For both view and export data requests, use the Log Analytics query API or the Application Insights query API. The logic to convert the shape of the data into one appropriate for delivery to your users is up to you to implement. Azure Functions is a good place to host such logic.
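
The shape-conversion logic mentioned above might look something like the following sketch. It assumes the query API returns a JSON document with a `tables` array, where each table has a `name`, a list of `columns` (each with a `name`), and a list of `rows`; check this against the response you actually receive. The helper names `tables_to_records` and `export_json` are hypothetical.

```python
import json


def tables_to_records(response):
    """Turn a column/row query response into {table name: [row dicts]}.

    Assumes the response shape described in the lead-in.
    """
    records = {}
    for table in response.get("tables", []):
        names = [column["name"] for column in table["columns"]]
        records[table["name"]] = [dict(zip(names, row)) for row in table["rows"]]
    return records


def export_json(response):
    """Serialize the converted records as readable JSON for delivery."""
    return json.dumps(tables_to_records(response), indent=2, sort_keys=True)
```

A function like `export_json` could be called from an Azure Function that runs the user-scoped query and returns the result to the requester.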

Important

While the vast majority of purge operations may complete much quicker than the SLA, the formal SLA for the completion of purge operations is set at 30 days due to their heavy impact on the data platform used. This is an automated process; there is no way to request that an operation be handled faster.

Delete

Warning

Deletes in Log Analytics are destructive and non-reversible! Use extreme caution when executing them.

A purge API path is available as part of privacy handling. This path should be used sparingly due to the risk associated with doing so, the potential performance impact, and the potential to skew all-up aggregations, measurements, and other aspects of your Log Analytics data. See the Strategy for personal data handling section for alternative approaches to handling private data.

Purge is a highly privileged operation that no app or user in Azure (including even the resource owner) has permission to execute without explicitly being granted a role in Azure Resource Manager. This role is _Data Purger_ and should be delegated cautiously due to the potential for data loss.

Important

In order to manage system resources, purge requests are throttled at 50 requests per hour. You should batch the execution of purge requests by sending a single command whose predicate includes all user identities that require purging. Use the in operator to specify multiple identities. Before executing the purge request, run the equivalent query to verify that the results are what you expect.
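
Batching several identities into one purge request, together with the verification query to run first, could be sketched as follows. The request body shape (`table` plus a `filters` list of `column` / `operator` / `value` entries) is an assumption about the purge API that should be checked against the REST reference, and the function names and the example table and column are hypothetical.

```python
def build_purge_body(table, column, identities):
    """Build one purge request body covering all identities (assumed shape)."""
    unique = sorted(set(identities))
    if not unique:
        raise ValueError("at least one identity is required")
    return {
        "table": table,
        "filters": [{"column": column, "operator": "in", "value": unique}],
    }


def verification_query(table, column, identities):
    """Build a query matching the same rows, to run before purging."""
    def quote(text):
        # Escape backslashes and double quotes for a query string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    values = ", ".join(quote(i) for i in sorted(set(identities)))
    return f"{table} | where {column} in ({values}) | summarize count()"
```

For example, `build_purge_body("MyApp_CL", "UserRef_s", ["u-1", "u-2"])` yields a single request for both users, which counts once against the 50 requests per hour limit instead of twice.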

Once the Azure Resource Manager role has been assigned, two new API paths are available:

Log data

  • POST purge - takes an object specifying parameters of the data to delete and returns a reference GUID.

  • GET purge status - the POST purge call returns an 'x-ms-status-location' header that includes a URL you can call to determine the status of your purge operation. For example:

    x-ms-status-location: https://management.chinacloudapi.cn/subscriptions/[SubscriptionId]/resourceGroups/[ResourceGroupName]/providers/Microsoft.OperationalInsights/workspaces/[WorkspaceName]/operations/purge-[PurgeOperationId]?api-version=2015-03-20
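
A client that tracks purge operations will typically want to record the operation ID from this header so the status can be checked later. The sketch below parses a status URL of the form shown above using only the standard library; `parse_status_location` is a hypothetical helper, and it makes no assumption about what the status endpoint returns.

```python
from urllib.parse import parse_qs, urlsplit


def parse_status_location(url):
    """Extract host, purge operation ID, and api-version from a status URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if (len(segments) < 2 or segments[-2] != "operations"
            or not segments[-1].startswith("purge-")):
        raise ValueError("not a purge status URL: " + url)
    return {
        "host": parts.netloc,
        "operation_id": segments[-1][len("purge-"):],
        "api_version": parse_qs(parts.query).get("api-version", [None])[0],
    }
```

Storing the parsed operation ID alongside the data subject request gives you an audit trail for each purge, which is useful given the 30-day completion window.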
    

Important

While we expect the vast majority of purge operations to complete much quicker than our SLA, the formal SLA for the completion of purge operations is set at 30 days due to their heavy impact on the data platform used by Log Analytics.

Application data

  • POST purge - takes an object specifying parameters of the data to delete and returns a reference GUID.

  • GET purge status - the POST purge call returns an 'x-ms-status-location' header that includes a URL you can call to determine the status of your purge operation. For example:

    x-ms-status-location: https://management.chinacloudapi.cn/subscriptions/[SubscriptionId]/resourceGroups/[ResourceGroupName]/providers/microsoft.insights/components/[ComponentName]/operations/purge-[PurgeOperationId]?api-version=2015-05-01
    

Important

While the vast majority of purge operations may complete much quicker than the SLA, the formal SLA for the completion of purge operations is set at 30 days due to their heavy impact on the data platform used by Application Insights.

Next steps