Resilience through developer best practices

In this article, we share lessons learned from working with large customers. Consider these recommendations when you design and implement your services.

[Image: developer experience components]

Use the Microsoft Authentication Library (MSAL)

The Microsoft Authentication Library (MSAL) and the Microsoft identity web authentication library for ASP.NET simplify acquiring, managing, caching, and refreshing the tokens an application requires. These libraries are optimized specifically to support Microsoft Identity, and they include features that improve application resiliency.

Developers should adopt the latest releases of MSAL and stay up to date. See how to increase resilience of authentication and authorization in your applications. Where possible, avoid implementing your own authentication stack and use well-established libraries.

Optimize directory reads and writes

The Azure AD B2C directory service supports billions of authentications a day. It's designed for a high rate of reads per second. Optimize your writes to minimize dependencies and increase resilience.

How to optimize directory reads and writes

  • Avoid write functions to the directory on sign-in: Never execute a write on sign-in without a precondition (if clause) in your custom policies. One use case that requires a write on sign-in is just-in-time migration of user passwords. Avoid any scenario that requires a write on every sign-in.

    <Precondition Type="ClaimEquals" ExecuteActionsIf="true">
      <Value>requiresMigration</Value>
      ...
    </Precondition>

  • Understand throttling: The directory implements both application-level and tenant-level throttling rules. There are further rate limits for Read/GET, Write/POST, Update/PUT, and Delete/DELETE operations, and each operation has different limits.

    • A write at the time of sign-in falls under POST for new users or PUT for existing users.

    • A custom policy that creates or updates a user on every sign-in can potentially hit an application-level PUT or POST rate limit. The same limits apply when updating directory objects via Azure AD or Microsoft Graph. Similarly, examine the reads to keep the number of reads on every sign-in to a minimum.

    • Estimate peak load to predict the rate of directory writes and avoid throttling. Peak traffic estimates should include estimates for actions such as sign-up, sign-in, and multifactor authentication (MFA). Be sure to test both the Azure AD B2C system and your application for peak traffic. It's possible that Azure AD B2C can handle the load without throttling while your downstream applications or services can't.

    • Understand and plan your migration timeline. When planning to migrate users to Azure AD B2C using Microsoft Graph, consider the application and tenant limits to calculate the time needed to complete the migration. If you split your user creation job or script across two applications, each one can use the per-application limit. The combined rate still needs to remain below the per-tenant threshold.

    • Understand the effects of your migration job on other applications. Consider the live traffic served by other relying applications to make sure you don't cause throttling at the tenant level or resource starvation for your live applications. For more information, see the Microsoft Graph throttling guidance.
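
The migration timeline calculation described above can be sketched roughly as follows. This is a minimal illustration, not an official formula: the function name, its parameters, and every rate used with it are hypothetical, and the real per-application and per-tenant limits should be taken from the Microsoft Graph throttling guidance.

```python
import math


def estimate_migration_seconds(total_users, per_app_writes_per_sec,
                               tenant_writes_per_sec, app_count=1,
                               live_writes_per_sec=0.0):
    """Estimate how long a bulk user-creation job takes under throttling.

    All rates are hypothetical inputs supplied by the caller; they are
    not real Azure AD B2C or Microsoft Graph limits.
    """
    if total_users < 0 or app_count < 1:
        raise ValueError("total_users must be >= 0 and app_count >= 1")
    # Each application registration gets its own per-application budget.
    combined_app_rate = per_app_writes_per_sec * app_count
    # The whole job must stay below the tenant limit, leaving room for
    # the writes that live applications are already making.
    tenant_budget = tenant_writes_per_sec - live_writes_per_sec
    effective_rate = min(combined_app_rate, tenant_budget)
    if effective_rate <= 0:
        raise ValueError("no write capacity left for the migration job")
    return math.ceil(total_users / effective_rate)
```

With made-up limits of 50 writes per second per application and 200 per tenant, splitting the job across two applications doubles the usable rate, while adding a fourth application gains little because the tenant budget becomes the bottleneck.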

Extend token lifetimes

In the unlikely event that the Azure AD B2C authentication service is unable to complete new sign-ups and sign-ins, you can still provide mitigation for users who are already signed in. With configuration, you can allow users who are already signed in to continue using the application without any perceived disruption until they sign out or the session times out due to inactivity.

Your business requirements and desired end-user experience dictate how often tokens are refreshed for both web applications and single-page applications (SPAs).

How to extend token lifetimes

  • Web applications: For web applications where the authentication token is validated at the beginning of sign-in, the application depends on the session cookie to continue to extend the session validity.

    • Enable users to remain signed in by implementing rolling session times that continue to renew sessions based on user activity. If there is a long-term token issuance outage, these session times can be further increased as a one-time configuration on the application. Keep the lifetime of the session to the maximum allowed.
  • SPAs: A SPA may depend on access tokens to make calls to the APIs. A SPA traditionally uses the implicit flow, which doesn't result in a refresh token. The SPA can use a hidden iframe to perform new token requests against the authorization endpoint if the browser still has an active session with Azure AD B2C. For SPAs, there are a few options available to allow the user to continue to use the application.

    • Extend the access token's validity duration to meet your business requirements.

    • Build your application to use an API gateway as the authentication proxy. In this configuration, the SPA loads without any authentication and the API calls are made to the API gateway. The API gateway sends the user through a sign-in process using an authorization code grant based on a policy and authenticates the user. Subsequently, the authentication session between the API gateway and the client is maintained using an authentication cookie. The APIs are serviced from the API gateway using the token obtained by the API gateway or some other direct authentication method such as certificates, client credentials, or API keys.

    • Migrate your SPA from implicit grant to authorization code grant flow with Proof Key for Code Exchange (PKCE) and Cross-origin Resource Sharing (CORS) support. Migrate your application from MSAL.js 1.x to MSAL.js 2.x to realize the resiliency of web applications.

    • For mobile applications, it's recommended to extend both the refresh and access token lifetimes.

  • Backend or microservice applications: Because backend (daemon) applications are non-interactive and aren't in a user context, the prospect of token theft is greatly diminished. We recommend striking a balance between security and lifetime and setting a long token lifetime.
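
The rolling session behavior recommended for web applications above might look something like the following sketch. It is only an illustration of the idea: the class and field names are hypothetical, and in practice this logic usually lives in the session cookie configuration of your web framework rather than in hand-written code.

```python
from dataclasses import dataclass


@dataclass
class RollingSession:
    """A session that is renewed by activity, up to a hard maximum."""
    created_at: float     # time of sign-in, in seconds
    last_activity: float  # time of the most recent user activity
    idle_timeout: float   # session ends after this much inactivity
    max_lifetime: float   # upper bound that activity cannot extend

    def is_valid(self, now: float) -> bool:
        within_idle = now - self.last_activity < self.idle_timeout
        within_max = now - self.created_at < self.max_lifetime
        return within_idle and within_max

    def touch(self, now: float) -> bool:
        """Record user activity; renews the idle window if still valid."""
        if not self.is_valid(now):
            return False
        self.last_activity = now
        return True
```

During a long token issuance outage, an operator could raise idle_timeout and max_lifetime as a one-time configuration change so that signed-in users are not forced back to a sign-in page that cannot serve them.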

Safe deployment practices

The most common disrupters of service are code and configuration changes. Adopting Continuous Integration and Continuous Delivery (CICD) processes and tools helps with rapid deployment at large scale and reduces human errors during testing and deployment into production. Adopt CICD for error reduction, efficiency, and consistency.

Secrets rotation

Azure AD B2C uses secrets for applications, APIs, policies, and encryption. The secrets secure authentication, external interactions, and storage. The National Institute of Standards and Technology (NIST) calls the time span during which a specific key is authorized for use by legitimate entities a cryptoperiod. Choose the right length of cryptoperiod to meet your business needs. Developers need to manually set the expiration and rotate secrets well in advance of their expiration.

How to implement secret rotation

  • Use managed identities for supported resources to authenticate to any service that supports Azure AD authentication. When you use managed identities, you can manage resources automatically, including rotation of credentials.

  • Take an inventory of all the keys and certificates configured in Azure AD B2C. This list is likely to include keys used in custom policies, APIs, ID token signing, and certificates for SAML.

  • Using CICD, rotate secrets that will expire within two months of the anticipated peak season. The recommended maximum cryptoperiod of private keys associated with a certificate is one year.

  • Proactively monitor and rotate API access credentials such as passwords and certificates.
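
As a rough sketch of how the inventory and the two-month rule above could be combined in a CICD step, the function below flags secrets for rotation. The function name, the inventory shape (a name-to-expiry mapping), and the interpretation of the rule (anything expiring on or before the peak season start plus a 60-day window) are all assumptions made for illustration.

```python
from datetime import date, timedelta


def secrets_to_rotate(inventory, peak_season_start, window_days=60):
    """Return names of secrets that should be rotated before peak season.

    inventory maps a (hypothetical) secret name to its expiry date.
    A secret is flagged when it expires on or before
    peak_season_start + window_days, soonest expiry first.
    """
    cutoff = peak_season_start + timedelta(days=window_days)
    due = [(expires, name) for name, expires in inventory.items()
           if expires <= cutoff]
    return [name for expires, name in sorted(due)]
```

A pipeline job could run this against the inventory on a schedule and fail the build, or open a work item, whenever the returned list is not empty.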

Test REST APIs

In the context of resiliency, testing of REST APIs needs to include verification of HTTP codes, response payload, headers, and performance. Testing shouldn't include only happy-path tests; it should also check whether the API handles problem scenarios gracefully.

How to test APIs

We recommend that your test plan include comprehensive API tests. If you're planning for an upcoming surge because of promotion or holiday traffic, you need to revise your load testing with the new estimates. Conduct load testing of your APIs and Content Delivery Network (CDN) in a developer environment and not in production.
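
One way to cover the four verification areas named above (HTTP code, payload, headers, and performance) is a small checker that a test suite calls for both happy-path and failure responses. The sketch below is illustrative only: the function, its parameters, and the default thresholds are hypothetical, and it assumes the API returns a JSON object.

```python
import json


def check_response(status, headers, body, elapsed_seconds,
                   expected_status=200, required_headers=("Content-Type",),
                   required_fields=(), max_seconds=1.0):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != {expected_status}")
    # Header names are case-insensitive, so compare them in lower case.
    present = {name.lower() for name in headers}
    for name in required_headers:
        if name.lower() not in present:
            failures.append(f"missing header {name}")
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        failures.append("body is not valid JSON")
        payload = {}
    if not isinstance(payload, dict):
        failures.append("body is not a JSON object")
        payload = {}
    for field in required_fields:
        if field not in payload:
            failures.append(f"missing field {field}")
    if elapsed_seconds > max_seconds:
        failures.append(f"took {elapsed_seconds:.2f}s, limit {max_seconds:.2f}s")
    return failures
```

For a problem scenario, pass the error status you expect (for example a conflict response) as expected_status and list the error fields your API is supposed to return, so the test confirms that the failure is handled gracefully rather than only that it occurred.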

Next steps