Designing your Azure Monitor Logs deployment

Azure Monitor stores log data in a Log Analytics workspace. A workspace is an Azure resource and a container where data is collected and aggregated, and it serves as an administrative boundary. You can deploy one or more workspaces in your Azure subscription. To make sure your initial deployment follows our guidelines and gives you a cost-effective, manageable, and scalable deployment that meets your organization's needs, you should understand several considerations.

Data in a workspace is organized into tables. Each table stores a different kind of data and has its own set of properties, based on the resource that generates the data. Most data sources write to their own tables in a Log Analytics workspace.

Figure: Example workspace data model

A Log Analytics workspace provides:

  • A geographic location for data storage.
  • Data isolation, by granting different users access rights following one of our recommended design strategies.
  • Scope for configuration of settings like pricing tier, retention, and data capping.

This article provides a detailed overview of the design and migration considerations, an overview of access control, and the design implementations we recommend for your IT organization.

Important considerations for an access control strategy

The number of workspaces you need is influenced by one or more of the following requirements:

  • You are a global company and you need log data stored in specific regions for data sovereignty or compliance reasons.
  • You are using Azure and you want to avoid outbound data transfer charges by having a workspace in the same region as the Azure resources it manages.
  • You manage multiple departments or business groups, and you want each to see their own data but not data from others. Also, there is no business requirement for a consolidated cross-department or cross-business-group view.
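To see how these requirements multiply the number of workspaces, here is a minimal Python sketch. The `Resource` type, the `plan_workspaces` function, and its two flags are hypothetical planning aids, not part of any Azure SDK. Each flag stands for one of the requirements above, and the result is a rough upper bound on workspaces before access control is taken into account.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    region: str      # Azure region the resource runs in, e.g. "westeurope"
    department: str  # owning department or business group


def plan_workspaces(resources, pin_to_region=True, isolate_departments=True):
    """Group resources into hypothetical workspaces.

    pin_to_region: data must stay in the resource's region (sovereignty or
        compliance), or you want to avoid outbound data transfer charges.
    isolate_departments: each department sees only its own data and no
        consolidated cross-department view is needed.
    Returns a dict mapping a (region, department) key to resource names;
    "*" in a key means that dimension does not split workspaces.
    """
    plan = defaultdict(list)
    for r in resources:
        key = (
            r.region if pin_to_region else "*",
            r.department if isolate_departments else "*",
        )
        plan[key].append(r.name)
    return dict(plan)


resources = [
    Resource("vm-fin-eu", "westeurope", "finance"),
    Resource("vm-hr-eu", "westeurope", "hr"),
    Resource("vm-fin-us", "eastus", "finance"),
]
print(len(plan_workspaces(resources)))                             # 3
print(len(plan_workspaces(resources, isolate_departments=False)))  # 2
print(len(plan_workspaces(resources, False, False)))               # 1
```

Role-based access control, described in the access control sections of this article, can provide per-team isolation inside a single workspace, so the department split is often the first requirement worth revisiting.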

IT organizations today follow a centralized, a decentralized, or an in-between hybrid structure. As a result, the following workspace deployment models are commonly used to map to one of these organizational structures:

  • Centralized: All logs are stored in a central workspace and administered by a single team, with Azure Monitor providing differentiated access per team. In this scenario, it is easy to manage, search across resources, and cross-correlate logs. The workspace can grow significantly depending on the amount of data collected from multiple resources in your subscription, with additional administrative overhead to maintain access control for different users. This model is known as "hub and spoke".
  • Decentralized: Each team has its own workspace, created in a resource group it owns and manages, and log data is segregated per resource. In this scenario, the workspace can be kept secure and access control is consistent with resource access, but it's difficult to cross-correlate logs. Users who need a broad view of many resources cannot analyze the data in a meaningful way.
  • Hybrid: Security audit compliance requirements further complicate this scenario because many organizations implement both deployment models in parallel. This commonly results in a complex, expensive, and hard-to-maintain configuration with gaps in log coverage.

When using the Log Analytics agents to collect data, you need to understand the following in order to plan your agent deployment:

Access control overview

With role-based access control (RBAC), you can grant users and groups only the amount of access they need to work with monitoring data in a workspace. This lets you align with your IT organization's operating model while using a single workspace to store the data collected from all your resources. For example, you can grant access to the team responsible for infrastructure services hosted on Azure virtual machines (VMs), and as a result they have access only to the logs generated by those VMs. This follows our new resource-context log model. The basis for this model is that every log record emitted by an Azure resource is automatically associated with that resource. Logs are forwarded to a central workspace that respects scoping and RBAC based on the resources.

The data a user has access to is determined by a combination of the factors listed in the following table. Each is described in the sections below.

Factor | Description
Access mode | Method the user uses to access the workspace. Defines the scope of the data available and the access control mode that's applied.
Access control mode | Setting on the workspace that defines whether permissions are applied at the workspace or resource level.
Permissions | Permissions applied to individual users or groups of users for the workspace or resource. Defines what data the user has access to.
Table-level RBAC | Optional granular permissions that apply to all users regardless of their access mode or access control mode. Defines which data types a user can access.

Access mode

The access mode refers to how a user accesses a Log Analytics workspace and defines the scope of data they can access.

Users have two options for accessing the data:

  • Workspace-context: You can view all logs in the workspace you have permission to. Queries in this mode are scoped to all data in all tables in the workspace. This is the access mode used when logs are accessed with the workspace as the scope, such as when you select Logs from the Azure Monitor menu in the Azure portal.

    Figure: Log Analytics context from workspace

  • Resource-context: When you access the workspace for a particular resource, resource group, or subscription, such as when you select Logs from a resource menu in the Azure portal, you can view logs only for the resources you have access to, in all tables. Queries in this mode are scoped to only data associated with that resource. This mode also enables granular RBAC.

    Figure: Log Analytics context from resource

    Note

    Logs are available for resource-context queries only if they were properly associated with the relevant resource. Currently, the following resources have limitations:

    • Computers outside of Azure
    • Service Fabric
    • Application Insights

    You can test whether logs are properly associated with their resource by running a query and inspecting the records you're interested in. If the correct resource ID is in the _ResourceId property, the data is available to resource-centric queries.
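The resource-context rule can be modeled as a filter on `_ResourceId`. The following Python sketch is a conceptual model only, not how Azure Monitor evaluates queries: `in_scope` and `resource_context_view` are hypothetical helpers, records are plain dictionaries, and the subscription ID is a placeholder. It shows why a record with an empty `_ResourceId` never appears in a resource-context view, and how a resource group or subscription scope includes every resource beneath it.

```python
def in_scope(resource_id, scope):
    """True if resource_id is the scope itself or a child of it.

    Azure resource IDs are case-insensitive, so compare lowercased.
    Appending "/" prevents "rg-app" from matching "rg-app2".
    """
    rid = resource_id.lower().rstrip("/")
    sc = scope.lower().rstrip("/")
    return rid == sc or rid.startswith(sc + "/")


def resource_context_view(records, scope):
    """Return the records a resource-context query at `scope` could see.

    Records without a _ResourceId (for example, from a machine that is not
    associated with any Azure resource) are never returned.
    """
    return [r for r in records
            if r.get("_ResourceId") and in_scope(r["_ResourceId"], scope)]


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
records = [
    {"Computer": "vm1",
     "_ResourceId": SUB + "/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm1"},
    {"Computer": "vm2",
     "_ResourceId": SUB + "/resourcegroups/RG-APP2/providers/Microsoft.Compute/virtualMachines/vm2"},
    {"Computer": "onprem01", "_ResourceId": ""},
]
print([r["Computer"] for r in resource_context_view(records, SUB + "/resourceGroups/rg-app")])  # ['vm1']
print([r["Computer"] for r in resource_context_view(records, SUB)])                              # ['vm1', 'vm2']
```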

Azure Monitor automatically determines the right mode depending on the context from which you perform the log search. The scope is always shown in the top-left section of Log Analytics.

Comparing access modes

The following table summarizes the access modes:

Issue | Workspace-context | Resource-context
Who is each mode intended for? | Central administration. Administrators who need to configure data collection and users who need access to a wide variety of resources. Also currently required for users who need to access logs for resources outside of Azure. | Application teams. Administrators of Azure resources being monitored.
What does a user require to view logs? | Permissions to the workspace. See Workspace permissions in Manage access using workspace permissions. | Read access to the resource. See Resource permissions in Manage access using Azure permissions. Permissions can be inherited (such as from the containing resource group) or directly assigned to the resource. Permission to the logs for the resource is automatically assigned.
What is the scope of permissions? | Workspace. Users with access to the workspace can query all logs in the workspace from tables that they have permissions to. See Table access control. | Azure resource. Users can query logs for specific resources, resource groups, or subscriptions they have access to from any workspace, but can't query logs for other resources.
How can users access logs? | Start Logs from the Azure Monitor menu. Start Logs from Log Analytics workspaces. | Start Logs from the menu for the Azure resource. Start Logs from the Azure Monitor menu. Start Logs from Log Analytics workspaces.

Access control mode

The access control mode is a setting on each workspace that defines how permissions are determined for the workspace.

  • Require workspace permissions: This control mode does not allow granular RBAC. For a user to access the workspace, they must be granted permissions to the workspace or to specific tables.

    If a user accesses the workspace in workspace-context mode, they have access to all data in any table they've been granted access to. If a user accesses the workspace in resource-context mode, they have access only to data for that resource in any table they've been granted access to.

    This is the default setting for all workspaces created before March 2019.

  • Use resource or workspace permissions: This control mode allows granular RBAC. Users can be granted access only to data associated with resources they can view, by assigning Azure read permission.

    When a user accesses the workspace in workspace-context mode, workspace permissions apply. When a user accesses the workspace in resource-context mode, only resource permissions are verified, and workspace permissions are ignored. Enable RBAC for a user by removing them from workspace permissions so that their resource permissions are recognized.

    This is the default setting for all workspaces created after March 2019.

    Note

    If a user has only resource permissions and no workspace permissions, they can access the workspace only in resource-context mode, provided the workspace access control mode is set to Use resource or workspace permissions.
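The interaction between access mode and access control mode can be summarized as a small decision function. This Python sketch encodes only the rules stated above. `Grants`, `can_read`, and the short resource names standing in for full resource IDs are hypothetical, and table-level RBAC is modeled only through the set of tables granted by workspace permissions.

```python
from dataclasses import dataclass

REQUIRE_WORKSPACE = "Require workspace permissions"
RESOURCE_OR_WORKSPACE = "Use resource or workspace permissions"


@dataclass(frozen=True)
class Grants:
    # Tables readable through workspace permissions; "*" means every table.
    workspace_tables: frozenset = frozenset()
    # Resources on which the user holds Azure read permission.
    readable_resources: frozenset = frozenset()


def can_read(grants, control_mode, access_mode, table, record_resource, scope=None):
    """Decide whether a user may read one log record.

    access_mode is "workspace" or "resource"; for "resource", scope is the
    resource the user opened Logs from.
    """
    if access_mode == "resource" and record_resource != scope:
        return False  # resource-context only shows the scoped resource's data
    if access_mode == "workspace" or control_mode == REQUIRE_WORKSPACE:
        # Workspace (or table) permissions decide access.
        return "*" in grants.workspace_tables or table in grants.workspace_tables
    # Resource-context under "Use resource or workspace permissions":
    # only resource permissions are verified; workspace permissions are ignored.
    return scope in grants.readable_resources


ops = Grants(workspace_tables=frozenset({"*"}))
app = Grants(readable_resources=frozenset({"vm1"}))
print(can_read(app, RESOURCE_OR_WORKSPACE, "resource", "Perf", "vm1", scope="vm1"))  # True
print(can_read(app, RESOURCE_OR_WORKSPACE, "workspace", "Perf", "vm1"))             # False
print(can_read(app, REQUIRE_WORKSPACE, "resource", "Perf", "vm1", scope="vm1"))      # False
```

The `app` user mirrors the Note above: with only resource permissions, access works in resource-context mode and only when the workspace uses Use resource or workspace permissions.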

To learn how to change the access control mode in the portal, with PowerShell, or by using a Resource Manager template, see Configure access control mode.

Ingestion volume rate limit

Azure Monitor is a high-scale data service that serves thousands of customers sending terabytes of data each month at a growing pace. The volume rate limit is intended to protect Azure Monitor customers from sudden ingestion spikes in a multitenant environment. A default ingestion volume rate threshold of 500 MB (compressed) per minute applies to workspaces, which is approximately 6 GB/min uncompressed; the actual size can vary between data types depending on the log length and its compression ratio. This threshold applies to all ingested data, whether it is sent from Azure resources using diagnostic settings, the Data Collector API, or agents.

When you send data to a workspace at a volume rate higher than 80% of the threshold configured in your workspace, an event is sent to the Operation table in your workspace every 6 hours while that level continues to be exceeded. When the ingested volume rate is higher than the threshold, some data is dropped, and an event is sent to the Operation table in your workspace every 6 hours while the threshold continues to be exceeded. If your ingestion volume rate continues to exceed the threshold, or you expect to reach it soon, you can request an increase for your workspace by opening a support request.
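The thresholds above reduce to simple comparisons. This Python sketch classifies a per-minute compressed ingestion rate against the default 500 MB threshold and models the 6-hour event cadence. The function names are hypothetical, and the event logic (one event when a condition starts, then at most one per 6 hours, reset when the rate falls back under 80%) is an assumption for illustration, not documented service behavior.

```python
from datetime import datetime, timedelta

DEFAULT_THRESHOLD_MB = 500  # default threshold, compressed MB per minute
WARNING_FRACTION = 0.8
EVENT_INTERVAL = timedelta(hours=6)


def ingestion_status(compressed_mb_per_min, threshold=DEFAULT_THRESHOLD_MB):
    """Classify a per-minute ingestion rate against the threshold."""
    if compressed_mb_per_min > threshold:
        return "over"      # some data is dropped
    if compressed_mb_per_min > WARNING_FRACTION * threshold:
        return "warning"   # above 80% of the threshold
    return "ok"


def operation_events(samples, threshold=DEFAULT_THRESHOLD_MB):
    """Yield (time, status) events for time-ordered (time, rate) samples.

    Assumed behavior: emit when a condition is first seen, then at most once
    per 6 hours while it persists; falling back to "ok" resets the timers.
    """
    last_sent = {}
    for ts, rate in samples:
        status = ingestion_status(rate, threshold)
        if status == "ok":
            last_sent.clear()
            continue
        prev = last_sent.get(status)
        if prev is None or ts - prev >= EVENT_INTERVAL:
            last_sent[status] = ts
            yield ts, status


# 500 MB compressed vs. ~6 GB uncompressed implies roughly a 12:1 ratio.
print(6 * 1000 / DEFAULT_THRESHOLD_MB)  # 12.0
t0 = datetime(2020, 1, 1)
samples = [(t0 + timedelta(hours=h), 600) for h in range(13)]
print([(ts.hour, s) for ts, s in operation_events(samples)])  # [(0, 'over'), (6, 'over'), (12, 'over')]
```

Note that the warning band starts strictly above 400 MB/min, matching "higher than 80% of the threshold".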

To be notified of such an event in your workspace, create a log alert rule that uses the following query, with alert logic based on a number of results greater than zero, an evaluation period of 5 minutes, and a frequency of 5 minutes.

Ingestion volume rate reached 80% of threshold:

```kusto
Operation
| where OperationCategory == "Ingestion"
| where Detail startswith "The data ingestion volume rate crossed 80% of the threshold"
```

Ingestion volume rate reached threshold:

```kusto
Operation
| where OperationCategory == "Ingestion"
| where Detail startswith "The data ingestion volume rate crossed the threshold"
```
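The alert logic described above (run the query over a 5-minute evaluation period and fire when there are more than zero results) can be mimicked in Python against in-memory rows. This is a hypothetical local model for reasoning about the rule, not the Azure Monitor alerting engine: the row dictionaries and `alert_fires` are assumptions, and only the `OperationCategory` and `Detail` filters come from the queries above. The `TimeGenerated` filter stands in for the evaluation period.

```python
from datetime import datetime, timedelta

WARNING_PREFIX = "The data ingestion volume rate crossed 80% of the threshold"
OVER_PREFIX = "The data ingestion volume rate crossed the threshold"


def alert_fires(operation_rows, now, prefix, window=timedelta(minutes=5)):
    """Run the query's filters over the evaluation window and fire when the
    number of results is greater than zero."""
    start = now - window
    results = [
        row for row in operation_rows
        if start < row["TimeGenerated"] <= now
        and row["OperationCategory"] == "Ingestion"
        and row["Detail"].startswith(prefix)
    ]
    return len(results) > 0


now = datetime(2020, 1, 1, 12, 0)
rows = [{
    "TimeGenerated": now - timedelta(minutes=2),
    "OperationCategory": "Ingestion",
    "Detail": WARNING_PREFIX,  # hypothetical row; real messages may continue
}]
print(alert_fires(rows, now, WARNING_PREFIX))  # True
print(alert_fires(rows, now, OVER_PREFIX))     # False
```

The two prefixes differ right after "crossed", so the 80% event never matches the threshold query; that is why two separate alert rules are needed.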

Recommendations

Figure: Example resource-context design

This scenario covers a single-workspace design in your IT organization's subscription that is not constrained by data sovereignty or regulatory compliance, and that does not need to map to the regions your resources are deployed in. It lets your organization's security and IT admin teams take advantage of the improved integration with Azure access management and more secure access control.

All resources, monitoring solutions, and insights (such as Application Insights and Azure Monitor for VMs) that support the infrastructure and applications maintained by the different teams are configured to forward their collected log data to the IT organization's centralized shared workspace. Users on each team are granted access to logs for the resources they have been given access to.

Once you have deployed your workspace architecture, you can enforce this on Azure resources with Azure Policy. It provides a way to define policies and ensure that your Azure resources comply with them, so that they send all their resource logs to a particular workspace. For example, with Azure virtual machines or virtual machine scale sets, you can use existing policies that evaluate workspace compliance and report results, or customize them to remediate non-compliant resources.
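Conceptually, an audit-style policy asks one question per resource: does it send its logs to the required workspace? The Python sketch below expresses that check; it is neither Azure Policy syntax nor its evaluation engine. `noncompliant_resources` and the `inventory` mapping are hypothetical, and you would have to collect such an inventory yourself.

```python
def noncompliant_resources(log_destinations, required_workspace_id):
    """Return resources that do not send logs to the required workspace.

    log_destinations maps a resource ID to the list of workspace resource
    IDs it currently sends logs to (a hypothetical inventory).
    IDs are compared case-insensitively.
    """
    required = required_workspace_id.lower()
    return sorted(
        rid for rid, targets in log_destinations.items()
        if required not in {t.lower() for t in targets}
    )


central = ("/subscriptions/s1/resourceGroups/rg-monitor/providers/"
           "Microsoft.OperationalInsights/workspaces/central")
inventory = {
    "vm1": [central],
    "vm2": [],
    "vmss1": ["/subscriptions/s1/resourceGroups/rg-old/providers/"
              "Microsoft.OperationalInsights/workspaces/team-a"],
}
print(noncompliant_resources(inventory, central))  # ['vm2', 'vmss1']
```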

Workspace consolidation migration strategy

If you have already deployed multiple workspaces and are interested in consolidating to the resource-context access model, we recommend an incremental approach to migrating to the recommended access model rather than attempting it quickly or aggressively. Following a phased approach to plan, migrate, validate, and retire on a reasonable timeline helps avoid unplanned incidents or unexpected impact on your cloud operations. If you do not have a data retention policy for compliance or business reasons, you need to assess how long to retain data in the workspace you are migrating from during the process. While you are reconfiguring resources to report to the shared workspace, you can still analyze the data in the original workspace as necessary. Once the migration is complete, if you are required to retain data in the original workspace until the end of the retention period, don't delete that workspace.

While planning your migration to this model, consider the following:
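The retention assessment reduces to date arithmetic: the newest data in the original workspace dates from the last resource cutover, so that data ages out one retention period later. A minimal Python sketch, assuming a single fixed retention period for the whole workspace (`earliest_delete_date` is a hypothetical helper):

```python
from datetime import date, timedelta


def earliest_delete_date(cutover_dates, retention_days):
    """Earliest date the original workspace can be deleted without losing
    data that is still inside its retention period.

    cutover_dates: the date each resource stopped reporting to the original
    workspace. The newest data there is from the last cutover, so it ages
    out retention_days after that date.
    """
    last_cutover = max(cutover_dates)
    return last_cutover + timedelta(days=retention_days)


cutovers = [date(2020, 2, 10), date(2020, 3, 1)]
print(earliest_delete_date(cutovers, retention_days=90))  # 2020-05-30
```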

  • Understand which industry regulations and internal policies regarding data retention you must comply with.
  • Make sure that your application teams can work within the existing resource-context functionality.
  • Identify the access granted to resources for your application teams, and test in a development environment before implementing in production.
  • Configure the workspace to enable Use resource or workspace permissions.
  • Remove application teams' permission to read and query the workspace.
  • Enable and configure any monitoring solutions, insights such as Azure Monitor for containers and/or Azure Monitor for VMs, your Automation accounts, and management solutions such as Update Management and Start/Stop VMs that were deployed in the original workspace.

Next steps

To implement the security permissions and controls recommended in this guide, review Manage access to logs.