What is the Azure Active Directory architecture?

Azure Active Directory (Azure AD) enables you to securely manage access to Azure services and resources for your users. Azure AD includes a full suite of identity management capabilities. For information about Azure AD features, see What is Azure Active Directory?

With Azure AD, you can create and manage users and groups, and use permissions to allow and deny access to enterprise resources. For information about identity management, see The fundamentals of Azure identity management.

Azure AD architecture

Azure AD's geographically distributed architecture combines extensive monitoring, automated rerouting, failover, and recovery capabilities, which deliver enterprise-grade availability and performance to customers.

The following architecture elements are covered in this article:

  • Service architecture design
  • Scalability
  • Continuous availability
  • Datacenters

Service architecture design

The most common way to build an accessible, usable, and data-rich system is through independent building blocks, or scale units. For the Azure AD data tier, scale units are called partitions.

The data tier has several front-end services that provide read-write capability. The diagram below shows how the components of a single-directory partition are delivered throughout geographically distributed datacenters.

Diagram of a single-directory partition

The components of the Azure AD architecture include a primary replica and secondary replicas.

Primary replica

The primary replica receives all writes for the partition it belongs to. Any write operation is immediately replicated to a secondary replica in a different datacenter before success is returned to the caller, which ensures geo-redundant durability of writes.
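
The write path described above can be sketched as follows. This is a minimal, in-memory illustration in Python, not Azure AD's implementation: the `Replica` and `Primary` classes, their methods, and the datacenter names are all hypothetical.

```python
class Replica:
    """In-memory stand-in for a directory replica hosted in one datacenter."""

    def __init__(self, datacenter):
        self.datacenter = datacenter
        self.log = []  # committed write entries, in order

    def apply(self, entry):
        self.log.append(entry)


class Primary(Replica):
    """Hypothetical primary replica: the only replica that accepts writes."""

    def __init__(self, datacenter, secondaries):
        super().__init__(datacenter)
        self.secondaries = secondaries

    def write(self, entry):
        # Commit on the primary first.
        self.apply(entry)
        # Copy the entry to a secondary in a different datacenter
        # before reporting success to the caller.
        for secondary in self.secondaries:
            if secondary.datacenter != self.datacenter:
                secondary.apply(entry)
                return True
        # No remote copy could be made: undo the local commit and fail.
        self.log.pop()
        raise RuntimeError("write not acknowledged: no secondary in another datacenter")
```

In this sketch the caller sees success only once the entry exists in two datacenters, so losing the primary's datacenter after an acknowledged write would not lose that write.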

Secondary replicas

All directory reads are serviced from secondary replicas, which reside in datacenters located across different geographies. There are many secondary replicas, as data is replicated asynchronously. Directory reads, such as authentication requests, are serviced from datacenters that are close to customers. The secondary replicas are responsible for read scalability.

Scalability

Scalability is the ability of a service to expand to meet increasing performance demands. Write scalability is achieved by partitioning the data. Read scalability is achieved by replicating data from one partition to multiple secondary replicas distributed throughout the world.
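
To illustrate how partitioning spreads write load, here is a minimal Python sketch. The article does not say how Azure AD assigns data to partitions, so the scheme below (hashing a tenant identifier) and the name `partition_for` are assumptions made purely for illustration.

```python
import hashlib


def partition_for(tenant_id: str, partition_count: int) -> int:
    """Map a tenant to one partition (assumed scheme: stable hash of the tenant ID)."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a partition.
    return int.from_bytes(digest[:8], "big") % partition_count
```

Because each partition has its own primary replica, writes for tenants that land in different partitions go to different primaries and do not compete with one another.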

Requests from directory applications are routed to the datacenter that is physically closest to them. Writes are transparently redirected to the primary replica to provide read-write consistency. Secondary replicas significantly extend the scale of partitions because directories serve reads most of the time.

Directory applications connect to the nearest datacenters. Connecting to a nearby datacenter improves performance and makes it possible to scale out. Because a directory partition can have many secondary replicas, secondary replicas can be placed closer to the directory clients. Only internal directory service components that are write-intensive target the active primary replica directly.
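
The routing rule in this section (reads go to the nearest secondary, writes are redirected to the primary) can be sketched in Python as below. The `ReplicaSite` type, the `route` function, and the distance figures are hypothetical and stand in for whatever proximity measure the real service uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSite:
    name: str
    distance_km: float  # hypothetical distance from the calling application
    is_primary: bool = False


def route(operation, sites):
    """Return the site that should serve a 'read' or a 'write'."""
    if operation == "write":
        # Writes always go to the single primary, however far away it is.
        return next(site for site in sites if site.is_primary)
    if operation == "read":
        # Reads go to the closest secondary replica.
        secondaries = [site for site in sites if not site.is_primary]
        return min(secondaries, key=lambda site: site.distance_km)
    raise ValueError(f"unknown operation: {operation!r}")
```

For example, with a primary 6,000 km away and secondaries at 40 km and 900 km, `route("read", sites)` selects the 40 km secondary while `route("write", sites)` still selects the distant primary.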

Continuous availability

Availability (or uptime) is the ability of a system to run without interruption. The key to Azure AD's high availability is that the services can quickly shift traffic across multiple geographically distributed datacenters. Each datacenter is independent, which enables de-correlated failure modes. Through this high-availability design, Azure AD requires no downtime for maintenance activities.

Azure AD's partition design is simplified compared to the enterprise AD design, using a single-master design that includes a carefully orchestrated and deterministic primary replica failover process.

Fault tolerance

A system is more available if it is tolerant to hardware, network, and software failures. For each partition on the directory, a highly available master replica exists: the primary replica. Only writes to the partition are performed at this replica. This replica is continuously and closely monitored, and writes can be immediately shifted to another replica (which becomes the new primary) if a failure is detected. During failover, there could be a loss of write availability, typically of 1-2 minutes. Read availability is not affected during this time.

Read operations (which outnumber writes by many orders of magnitude) only go to secondary replicas. Since secondary replicas are idempotent, the loss of any one replica in a given partition is easily compensated for by directing the reads to another replica, usually in the same datacenter.
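
The read-side compensation described above can be sketched as a simple retry across secondary replicas. This Python example is an illustration only: replicas are modeled as plain callables, and `read_with_failover`, `make_replica`, and `lost_replica` are hypothetical names.

```python
def read_with_failover(key, replicas):
    """Serve a read from the first secondary replica that responds."""
    last_error = None
    for read in replicas:
        try:
            return read(key)
        except ConnectionError as error:
            # This replica is lost; move on to the next one.
            last_error = error
    raise ConnectionError("no secondary replica could serve the read") from last_error


def make_replica(data):
    """Build a healthy replica that answers reads from a dictionary copy."""
    def read(key):
        return data[key]
    return read


def lost_replica(key):
    """A replica that is unavailable for every read."""
    raise ConnectionError("replica unavailable")
```

Because every secondary holds the same data for the partition, the caller gets the same answer regardless of which surviving replica serves the read.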

Data durability

A write is durably committed to at least two datacenters before it is acknowledged. This happens by first committing the write on the primary, and then immediately replicating the write to at least one other datacenter. This ensures that a potential catastrophic loss of the datacenter hosting the primary does not result in data loss.

Azure AD is designed not to lose data on failovers and maintains the following Recovery Time Objectives (RTO):

  • Zero RTO for token issuance and directory reads
  • About 5 minutes RTO for directory writes

Datacenters

Azure AD's replicas are stored in datacenters located throughout the world. For more information, see Azure global infrastructure.

Azure AD operates across datacenters with the following characteristics:

  • Authentication, Graph, and other AD services reside behind the Gateway service. The Gateway manages load balancing of these services. It fails over automatically if any unhealthy servers are detected using transactional health probes. Based on these health probes, the Gateway dynamically routes traffic to healthy datacenters.
  • For reads, the directory has secondary replicas and corresponding front-end services in an active-active configuration operating in multiple datacenters. If an entire datacenter fails, traffic is automatically routed to a different datacenter.
  • For writes, the directory fails over the primary (master) replica across datacenters via planned (new primary is synchronized to old primary) or emergency failover procedures. Data durability is achieved by replicating any commit to at least two datacenters.
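
The Gateway behavior in the first bullet can be sketched as follows. This is a hedged Python illustration, not the Gateway's actual design: the `Gateway` class, its methods, and the idea that a probe is a callable returning `True` on a successful test transaction are all assumptions.

```python
class Gateway:
    """Hypothetical gateway that routes traffic based on transactional probes."""

    def __init__(self, probes):
        # probes: mapping of datacenter name -> callable running a test transaction
        self.probes = probes
        self.healthy = set(probes)

    def run_probes(self):
        """Run one probe transaction per datacenter and record the outcome."""
        for name, transaction in self.probes.items():
            try:
                ok = transaction() is True
            except Exception:
                ok = False  # a probe that raises counts as unhealthy
            if ok:
                self.healthy.add(name)
            else:
                self.healthy.discard(name)

    def route(self, preferred):
        """Return the first healthy datacenter in the caller's preference order."""
        for name in preferred:
            if name in self.healthy:
                return name
        raise RuntimeError("no healthy datacenter available")
```

Running the probes again after a datacenter recovers puts it back into rotation, which mirrors the dynamic routing described in the list.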

Data consistency

The directory model is one of eventual consistency. One typical problem with distributed, asynchronously replicating systems is that the data returned from a particular replica may not be up to date.

Azure AD provides read-write consistency for applications targeting a secondary replica by routing their writes to the primary replica and synchronously pulling the writes back to the secondary replica.

Applications that write through the Azure AD Graph API do not need to maintain affinity to a directory replica themselves to get read-write consistency. The Azure AD Graph service maintains a logical session, which has affinity to a secondary replica used for reads. The affinity is captured in a "replica token" that the Graph service caches using a distributed cache. This token is then used for subsequent operations in the same logical session.

Note

Writes are immediately replicated to the secondary replica to which the logical session's reads were issued.
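
The session behavior described in this section can be sketched with a small in-memory model. Everything in this Python example is hypothetical (the `Directory` and `Session` classes, and using a replica's name as its "replica token"); it only illustrates why a session sees its own writes while other replicas may lag.

```python
class Directory:
    """One primary store plus named secondary stores, all in memory."""

    def __init__(self, secondary_names):
        self.primary = {}
        self.secondaries = {name: {} for name in secondary_names}

    def replicate_async(self):
        """Stand-in for background replication catching every secondary up."""
        for store in self.secondaries.values():
            store.update(self.primary)


class Session:
    """Logical session pinned to one secondary by its replica token."""

    def __init__(self, directory, replica_token):
        self.directory = directory
        self.replica_token = replica_token  # here: simply the secondary's name

    def write(self, key, value):
        # The write is routed to the primary...
        self.directory.primary[key] = value
        # ...and pulled back to this session's secondary before returning.
        self.directory.secondaries[self.replica_token][key] = value

    def read(self, key):
        return self.directory.secondaries[self.replica_token].get(key)
```

A session pinned to one secondary reads its own write immediately, while a session pinned to a different secondary sees the value only after `replicate_async()` runs, which is the eventual-consistency window the section describes.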

Backup protection

The directory implements soft deletes, instead of hard deletes, for users and tenants so that data can be recovered easily after an accidental delete by a customer. If your tenant administrator accidentally deletes users, they can easily undo the action and restore the deleted users.
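
A soft delete marks a record as deleted instead of removing it, which is what makes the restore possible. The Python sketch below shows the general idea with a hypothetical `UserStore` class; it is not how Azure AD stores users, and it omits details such as retention periods that the article does not describe.

```python
from datetime import datetime, timezone


class UserStore:
    """Hypothetical user store that soft-deletes instead of hard-deleting."""

    def __init__(self):
        # Maps user ID -> deletion timestamp, or None while the user is active.
        self._deleted_at = {}

    def add(self, user_id):
        self._deleted_at[user_id] = None

    def soft_delete(self, user_id):
        if user_id not in self._deleted_at:
            raise KeyError(user_id)
        # Keep the record and only mark when it was deleted.
        self._deleted_at[user_id] = datetime.now(timezone.utc)

    def restore(self, user_id):
        if user_id not in self._deleted_at:
            raise KeyError(user_id)
        self._deleted_at[user_id] = None

    def active_users(self):
        return sorted(u for u, ts in self._deleted_at.items() if ts is None)

    def deleted_users(self):
        return sorted(u for u, ts in self._deleted_at.items() if ts is not None)
```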

Azure AD implements daily backups of all data, and therefore can authoritatively restore data in case of any logical deletions or corruptions. The data tier employs error-correcting codes, so that it can check for errors and automatically correct particular types of disk errors.

Metrics and monitors

Running a high-availability service requires world-class metrics and monitoring capabilities. Azure AD continually analyzes and reports key service health metrics and success criteria for each of its services. Metrics, monitoring, and alerting are also continuously developed and tuned for each scenario, within each Azure AD service and across all services.

If any Azure AD service is not working as expected, action is immediately taken to restore functionality as quickly as possible. The most important metric Azure AD tracks is how quickly live site issues can be detected and mitigated for customers. We invest heavily in monitoring and alerts to minimize time to detect (TTD target: <5 minutes) and in operational readiness to minimize time to mitigate (TTM target: <30 minutes).

Secure operations

Operational controls such as multi-factor authentication (MFA) are used for any operation, and all operations are audited. In addition, a just-in-time elevation system grants the necessary temporary access for any on-demand operational task on an ongoing basis. For more information, see The Trusted Cloud.

Next steps

Azure Active Directory developer's guide