X.509 Certificate-based authentication in Service Fabric clusters

This article complements the introduction to Service Fabric cluster security, and goes into the details of certificate-based authentication in Service Fabric clusters. We assume the reader is familiar with fundamental security concepts, and also with the controls that Service Fabric exposes to manage the security of a cluster.

This article covers the following topics:

  • Certificate-based authentication basics
  • Identities and their respective roles
  • Certificate configuration rules
  • Troubleshooting and Frequently Asked Questions

Certificate-based authentication basics

As a brief refresher, in security, a certificate is an instrument meant to bind information regarding an entity (the subject) to their possession of a pair of asymmetric cryptographic keys, and so constitutes a core construct of public key cryptography. The keys represented by a certificate can be used for protecting data, or for proving the identity of key holders; when used in conjunction with a Public Key Infrastructure (PKI) system, a certificate can represent additional traits of its subject, such as the ownership of an internet domain, or certain privileges granted to it by the issuer of the certificate (known as a Certification Authority, or CA). A common application of certificates is supporting the Transport Layer Security (TLS) cryptographic protocol, which allows for secure communications over a computer network. Specifically, the client and server use certificates to ensure the privacy and integrity of their communication, and also to conduct mutual authentication.

In Service Fabric, the fundamental layer of a cluster (Federation) also builds on TLS (among other protocols) to achieve a reliable, secure network of participating nodes. Connections into the cluster via Service Fabric Client APIs use TLS as well, both to protect traffic and to establish the identities of the parties. Specifically, when used for authentication in Service Fabric, a certificate can be used to prove the following claims: a) the presenter of the certificate credential has possession of the certificate's private key, and either b) the certificate's SHA-1 hash ('thumbprint') matches a declaration included in the cluster definition, or c) the certificate's distinguished Subject Common Name matches a declaration included in the cluster definition, and the certificate's issuer is known or trusted.

In the list above, 'b' is colloquially known as 'thumbprint pinning'; in this case, the declaration refers to a specific certificate, and the strength of the authentication scheme rests on the premise that it is computationally infeasible to forge a certificate that produces the same hash value as another one while still being a valid, well-formed object in all other respects. Item 'c' represents an alternative form of declaring a certificate, where the strength of the scheme rests on the combination of the subject of the certificate and the issuing authority. In this case, the declaration refers to a class of certificates - any two certificates with the same characteristics are considered fully equivalent.

The following sections will explain in depth how the Service Fabric runtime uses and validates certificates to ensure cluster security.

Identities and their respective roles

Before diving into the details of authentication or securing communication channels, it is important to list the participating actors and the corresponding roles they play in a cluster:

  • the Service Fabric runtime, referred to as 'system': the set of services which provide the abstractions and functionality representing the cluster. When referring to in-cluster communication between system instances, we'll use the term 'cluster identity'; when referring to the cluster as the recipient/target of traffic from outside the cluster, we'll use the term 'server identity'.
  • hosted applications, referred to as 'applications': code provided by the owner of the cluster, which is orchestrated and executed in the cluster.
  • clients: entities allowed to connect to, and execute functionality in, a cluster, according to the cluster configuration. We distinguish between two levels of privileges - 'user' and 'admin', respectively. A 'user' client is restricted primarily to read-only operations (but not all read-only functionality), whereas an 'admin' client has unrestricted access to the cluster's functionality. (For more details, refer to Security roles in a Service Fabric cluster.)
  • (Azure-only) the Service Fabric services which orchestrate and expose controls for operation and management of Service Fabric clusters, referred to simply as 'service'. Depending on the environment, the 'service' may refer to the Azure Service Fabric Resource Provider, or other Resource Providers owned and operated by the Service Fabric team.

In a secure cluster, each of these roles can be configured with its own, distinct identity, declared as the pairing of a predefined role name and its corresponding credential. Service Fabric supports declaring credentials as certificates or domain-based service principals. (Windows-/Kerberos-based identities are also supported, but are beyond the scope of this article; refer to Windows-based security in Service Fabric clusters.) In Azure clusters, client roles may also be declared as Azure Active Directory-based identities.

As alluded to above, the Service Fabric runtime defines two levels of privilege in a cluster: 'admin' and 'user'. An administrator client and a 'system' component would both operate with 'admin' privileges, and so are indistinguishable from each other. Upon establishing a connection in/to the cluster, an authenticated caller will be granted one of the two roles by the Service Fabric runtime, as the basis for the subsequent authorization. We'll examine authentication in depth in the following sections.

Certificate configuration rules

Validation rules

The security settings of a Service Fabric cluster describe, in principle, the following aspects:

  • the authentication type; this is a creation-time, immutable characteristic of the cluster. Examples of such settings are 'ClusterCredentialType' and 'ServerAuthCredentialType', and allowed values are 'none', 'x509' or 'windows'. This article focuses on the x509-type authentication.
  • the (authentication) validation rules; these settings are configured by the cluster owner and describe which credentials shall be accepted for a given role. Examples will be examined in depth immediately below.
  • settings used to tweak or subtly alter the result of authentication; examples here include flags that restrict or relax the enforcement of certificate revocation lists, among others.

Note

Cluster configuration examples provided below are excerpts from the cluster manifest in XML format, the most-digested format, which directly supports the Service Fabric functionality described in this article. The same settings can be expressed directly in the JSON representations of a cluster definition, whether a standalone JSON cluster manifest or an Azure Resource Manager template.

A certificate validation rule comprises the following elements:

  • the corresponding role: client, admin client (privileged role)
  • the credential accepted for the role, declared either by thumbprint or subject common name

Thumbprint-based certificate validation declarations

In the case of thumbprint-based validation rules, the credentials presented by a caller requesting a connection in/to the cluster will be validated as follows:

  • the credential is a valid, well-formed certificate: its chain can be built, signatures match
  • the certificate is time valid (NotBefore <= now < NotAfter)
  • the certificate's SHA-1 hash matches the declaration, as a case-insensitive string comparison excluding all whitespace
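The last two checks in the list above can be sketched in a few lines of Python. This is only an illustration of the comparison semantics stated here (case-insensitive, whitespace ignored, half-open validity interval); the function names and the sample thumbprint values are hypothetical, and the code is not the Service Fabric runtime's implementation.

```python
from datetime import datetime, timezone


def normalize_thumbprint(value: str) -> str:
    """Remove all whitespace and fold case so equivalent spellings compare equal."""
    return "".join(value.split()).lower()


def thumbprint_matches(presented: str, declared_list: str) -> bool:
    """Compare a presented thumbprint with a comma-separated declaration."""
    wanted = normalize_thumbprint(presented)
    return any(
        normalize_thumbprint(entry) == wanted
        for entry in declared_list.split(",")
        if entry.strip()  # skip empty entries such as a trailing comma
    )


def is_time_valid(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """Time validity as stated above: NotBefore <= now < NotAfter."""
    return not_before <= now < not_after


# Hypothetical values, for illustration only.
declared = "AB CD 00 12 34, ef01005678"
print(thumbprint_matches("abcd001234", declared))
print(is_time_valid(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 6, 1, tzinfo=timezone.utc),
))
```

Normalizing both sides before comparing keeps the check independent of how the thumbprint was copied into the declaration (with or without spaces, upper or lower case).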

Any trust errors encountered during chain building or validation will be suppressed for thumbprint-based declarations, except for expired certificates - although provisions exist for that case as well. Specifically, failures related to revocation status being unknown or offline, an untrusted root, invalid key usage, or a partial chain are considered non-fatal errors; the premise, in this case, is that the certificate is merely an envelope for a key pair - the security lies in the fact that the cluster owner has put measures in place to safeguard the private key.

The following excerpt from a cluster manifest exemplifies such a set of thumbprint-based validation rules:

<Section Name="Security">
  <Parameter Name="ClusterCredentialType" Value="X509" />
  <Parameter Name="ServerAuthCredentialType" Value="X509" />
  <Parameter Name="AdminClientCertThumbprints" Value="d5ec...4264" />
  <Parameter Name="ClientCertThumbprints" Value="7c8f...01b0" />
  <Parameter Name="ClusterCertThumbprints" Value="abcd...1234,ef01...5678" />
  <Parameter Name="ServerCertThumbprints" Value="ef01...5678" />
</Section>

Each of the entries above refers to a specific identity as described earlier; each entry also allows for specifying multiple values, as a comma-separated list of strings. In this example, upon successfully validating the incoming credentials, the presenter of a certificate with the SHA-1 thumbprint 'd5ec...4264' will be granted the 'admin' role; conversely, a caller authenticating with certificate '7c8f...01b0' will be granted a 'user' role, restricted to primarily read-only operations. An inbound caller that presents a certificate whose thumbprint is either 'abcd...1234' or 'ef01...5678' will be accepted as a peer node in the cluster. Lastly, a client connecting to a management endpoint of the cluster will expect the thumbprint of the server certificate to be 'ef01...5678'.

As mentioned earlier, Service Fabric does make provisions for accepting expired certificates; the reason is that certificates have a limited lifetime and, when a certificate is declared by thumbprint (which refers to a specific certificate instance), allowing it to expire will result in either a failure to connect to the cluster or an outright collapse of the cluster. It is all too easy to forget or neglect rotating a thumbprint-pinned certificate, and unfortunately the recovery from such a situation is difficult.

To that end, the cluster owner can explicitly state that expired self-signed certificates declared by thumbprint shall be considered valid, as follows:

  <Section Name="Security">
    <Parameter Name="AcceptExpiredPinnedClusterCertificate" Value="true" />
  </Section>

This behavior does not extend to CA-issued certificates; if that were the case, a revoked, known-to-be-compromised expired certificate could become 'valid' as soon as it no longer appeared in the CA's certificate revocation list, and thus present a security risk. With self-signed certificates, the cluster owner is considered the only party responsible for safeguarding the certificate's private key, which is not the case with CA-issued certificates - the cluster owner may not be aware of how or when their certificate was declared compromised.

Common name-based certificate validation declarations

Common name-based declarations take one of the following forms:

  • subject common name (only)
  • subject common name with issuer pinning

Let us first consider an excerpt from a cluster manifest exemplifying both declaration styles:

    <Section Name="Security/ServerX509Names">
      <Parameter Name="server.demo.system.servicefabric.azure-int" Value="" />
    </Section>
    <Section Name="Security/ClusterX509Names">
      <Parameter Name="cluster.demo.system.servicefabric.azure-int" Value="1b45...844d,d7fe...26c8,3ac7...6960,96ea...fb5e" />
    </Section>

The declarations refer to the server and cluster identities, respectively; note that the CN-based declarations have their own sections in the cluster manifest, separate from the standard 'Security' section. In both declarations, the 'Name' represents the distinguished subject common name of the certificate, and the 'Value' field represents the expected issuer, as follows:

  • in the first case, the declaration states that the common name element of the distinguished subject of the server certificate is expected to match the string "server.demo.system.servicefabric.azure-int"; the empty 'Value' field denotes the expectation that the root of the certificate chain is trusted on the node/machine where the server certificate is being validated; on Windows, this means that the certificate can chain up to any of the certificates installed in the 'Trusted Root CA' store;
  • in the second case, the declaration states that the presenter of a certificate is accepted as a peer node in the cluster if the certificate's common name matches the string "cluster.demo.system.servicefabric.azure-int", and the thumbprint of the direct issuer of the certificate matches one of the comma-separated entries in the 'Value' field. (This rule type is colloquially known as 'common name with issuer pinning'.)

In either case, the certificate's chain is built and is expected to be error-free; that is, revocation errors, a partial chain, or time-invalid trust errors are considered fatal, and the certificate validation will fail. Pinning the issuers will result in considering the 'untrusted root' status as a non-fatal error; despite appearances, this is a stricter form of validation, as it allows the cluster owner to constrain the set of authorized/accepted issuers to their own PKI.
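The difference between the two declaration styles can be summarized in a small Python sketch. It models only the issuer decision described above (empty 'Value' means the root must be trusted on the machine; a non-empty 'Value' pins the direct issuer's thumbprint); the function name, parameters and sample thumbprints are hypothetical, and chain building itself is assumed to have happened elsewhere.

```python
def issuer_accepted(declared_value: str, issuer_thumbprint: str, root_is_trusted: bool) -> bool:
    """Decide whether the issuer satisfies a common name-based declaration.

    declared_value    -- the 'Value' field of the declaration (may be empty)
    issuer_thumbprint -- thumbprint of the certificate's direct issuer
    root_is_trusted   -- whether the chain ends in a root trusted on this machine
    """
    def normalize(text: str) -> str:
        return "".join(text.split()).lower()

    pinned = {normalize(entry) for entry in declared_value.split(",") if entry.strip()}
    if not pinned:
        # Subject common name only: rely on the machine's trusted roots.
        return root_is_trusted
    # Issuer pinning: an untrusted root is tolerated, but the direct issuer
    # must be one of the declared thumbprints.
    return normalize(issuer_thumbprint) in pinned


# Hypothetical thumbprints, for illustration only.
print(issuer_accepted("", "aa11", root_is_trusted=True))
print(issuer_accepted("1b45,d7fe", "D7FE", root_is_trusted=False))
print(issuer_accepted("1b45,d7fe", "ffff", root_is_trusted=True))
```

Note how a pinned declaration rejects an issuer outside the list even when its root is trusted on the machine, which is why pinning is the stricter form.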

After the certificate chain is built, it is validated against a standard TLS/SSL policy with the declared subject as the remote name; a certificate will be considered a match if its subject common name or any of its subject alternative names matches the CN declaration from the cluster manifest. Wildcards are supported in this case, and the string matching is case-insensitive.
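As a rough illustration of the name matching described above, the following Python sketch compares a declared name against a certificate's subject common name and subject alternative names. The wildcard semantics are an assumption borrowed from common TLS practice (a '*' is honored only as the whole leftmost label and stands for exactly one label); the article does not spell out the exact rules the platform's TLS policy applies, and the names used are hypothetical.

```python
from typing import Iterable


def name_matches(cert_name: str, declared: str) -> bool:
    """Case-insensitive match; '*' as the leftmost label matches one label (assumed)."""
    cert_labels = cert_name.lower().split(".")
    declared_labels = declared.lower().split(".")
    if len(cert_labels) != len(declared_labels):
        return False
    if cert_labels[0] == "*":
        # Require at least one fixed label after the wildcard.
        return len(cert_labels) > 1 and cert_labels[1:] == declared_labels[1:]
    return cert_labels == declared_labels


def certificate_matches(subject_cn: str, alt_names: Iterable[str], declared: str) -> bool:
    """A certificate matches if its CN or any SAN matches the declaration."""
    return any(name_matches(name, declared) for name in [subject_cn, *alt_names])


# Hypothetical names, for illustration only.
print(certificate_matches("Cluster.Demo.Example", [], "cluster.demo.example"))
print(certificate_matches("other.example", ["*.demo.example"], "cluster.demo.example"))
```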

(We should clarify that the sequence described above could be executed for each type of key usage declared by the certificate; if the certificate specifies the client authentication key usage, the chain is built and evaluated first for a client role. If that succeeds, evaluation completes and validation is successful. If the certificate does not have the client authentication usage, or the validation failed, the Service Fabric runtime will build and evaluate the chain for server authentication.)

To complete the example, the following excerpt illustrates declaring client certificates by common name:

    <Section Name="Security/AdminClientX509Names">
      <Parameter Name="admin.demo.client.servicefabric.azure-int" Value="1b45...844d,d7fe...26c8,3ac7...6960,96ea...fb5e" />
    </Section>
    <Section Name="Security/ClientX509Names">
      <Parameter Name="user.demo.client.servicefabric.azure-int" Value="1b45...844d,d7fe...26c8,3ac7...6960,96ea...fb5e" />
    </Section>

The declarations above correspond to the admin and user identities, respectively; validation of certificates declared in this manner is exactly as described in the previous examples for cluster and server certificates.

Note

Common name-based declarations are meant to simplify rotation and, in general, management of cluster certificates. However, we recommend following these guidelines to ensure the continued availability and security of the cluster:

  • prefer issuer pinning to relying on trusted roots
  • avoid mixing issuers from different PKIs
  • ensure that all expected issuers are listed on the certificate declaration; a mismatching issuer will result in a failed validation
  • ensure that the PKI's certificate policy endpoints are discoverable, available and accessible - this means that the AIA, CRL or OCSP endpoints are declared on the leaf certificate, and that they are accessible so that certificate chain building can complete.

Tying it all together, upon receiving a request for a connection in a cluster secured with X.509 certificates, the Service Fabric runtime will use the cluster's security settings to validate the credentials of the remote party as described above; if successful, the caller/remote party is considered to be authenticated. If the credential matches multiple validation rules, the runtime will grant the caller the highest-privileged role of any of the matched rules.
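The 'highest-privileged role wins' behavior can be pictured with a short Python sketch. It assumes thumbprint-based rules that have already been normalized, and it models only the two privilege levels named in this article; the 'Role' type, the rule layout and the sample values are hypothetical and are not part of any Service Fabric API.

```python
from enum import IntEnum
from typing import Iterable, Optional, Set, Tuple


class Role(IntEnum):
    """Privilege levels, ordered so that a larger value means more privilege."""
    USER = 1
    ADMIN = 2


def granted_role(presented: str, rules: Iterable[Tuple[Role, Set[str]]]) -> Optional[Role]:
    """Return the highest-privileged role among all matching rules, or None."""
    matched = [role for role, thumbprints in rules if presented in thumbprints]
    return max(matched) if matched else None


# Hypothetical, already-normalized thumbprints.
rules = [
    (Role.ADMIN, {"d5ec", "aaaa"}),
    (Role.USER, {"7c8f", "aaaa"}),
]
print(granted_role("aaaa", rules))  # listed under both rules
print(granted_role("7c8f", rules))
print(granted_role("ffff", rules))
```

A credential listed under both rules ends up with the 'admin' role, so declaring the same certificate for both client roles effectively makes every holder an administrator.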

Presentation rules

The previous section described how authentication works in a certificate-secured cluster; this section will explain how the Service Fabric runtime itself discovers and loads the certificates it uses for in-cluster communication; we call these the "presentation" rules.

As with the validation rules, the presentation rules specify a role and the associated credential declaration, expressed either by thumbprint or common name. Unlike the validation rules, common name-based declarations do not have provisions for issuer pinning; this allows for greater flexibility as well as improved performance. The presentation rules are declared in the 'NodeType' section(s) of the cluster manifest, for each distinct node type; the settings are split from the Security sections of the cluster to allow each node type to have its full configuration in a single section. In Azure Service Fabric clusters, the node type certificate declarations default to their corresponding settings in the Security section of the definition of the cluster.

Thumbprint-based certificate presentation declarations

As previously described, the Service Fabric runtime distinguishes between its role as the peer of other nodes in the cluster, and as the server for cluster management operations. In principle, these settings can be configured distinctly, but in practice they tend to align. For the remainder of this article, we'll assume the settings match for simplicity.

Let us consider the following excerpt from a cluster manifest:

  <NodeTypes>
    <NodeType Name="nt1vm">
      <Certificates>
        <ClusterCertificate X509FindType="FindByThumbprint" X509FindValue="cc71...1984" X509FindValueSecondary="49e2...19d6" X509StoreName="my" Name="ClusterCertificate" />
        <ServerCertificate X509FindValue="cc71...1984" Name="ServerCertificate" />
        <ClientCertificate X509FindValue="cc71...1984" Name="ClientCertificate" />
      </Certificates>
    </NodeType>
  </NodeTypes>

The 'ClusterCertificate' element demonstrates the full schema, including optional parameters ('X509FindValueSecondary') and those with appropriate defaults ('X509StoreName'); the other declarations show an abbreviated form. The cluster certificate declaration above states that the security settings of nodes of type 'nt1vm' are initialized with certificate 'cc71...1984' as the primary and certificate '49e2...19d6' as the secondary; both certificates are expected to be found in the LocalMachine\My certificate store (or the Linux equivalent path, /var/lib/sfcerts).

Common name-based certificate presentation declarations

The node type certificates can also be declared by subject common name, as exemplified below:

  <NodeTypes>
    <NodeType Name="nt1vm">
      <Certificates>
        <ClusterCertificate X509FindType="FindBySubjectName" X509FindValue="demo.cluster.azuredocpr.system.servicefabric.azure-int" Name="ClusterCertificate" />
      </Certificates>
    </NodeType>
  </NodeTypes>

For either type of declaration, a Service Fabric node will read the configuration at startup, locate and load the specified certificates, and sort them in descending order of their NotAfter attribute; expired certificates are ignored, and the first element of the list is selected as the client credential for any Service Fabric connection attempted by this node. (In effect, Service Fabric favors the certificate that expires last.)
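The selection described above amounts to a filter followed by a sort. The following Python sketch shows that logic under simplifying assumptions: certificates are reduced to a thumbprint and a NotAfter timestamp, and the 'Cert' type and sample values are hypothetical stand-ins for whatever the runtime loads from the certificate store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, minimal view of a loaded certificate."""
    thumbprint: str
    not_after: datetime


def select_credential(candidates: Iterable[Cert], now: datetime) -> Optional[Cert]:
    """Ignore expired certificates, then pick the one that expires last."""
    usable = [cert for cert in candidates if now < cert.not_after]
    usable.sort(key=lambda cert: cert.not_after, reverse=True)
    return usable[0] if usable else None


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
loaded = [
    Cert("expired", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    Cert("primary", datetime(2024, 9, 1, tzinfo=timezone.utc)),
    Cert("secondary", datetime(2025, 3, 1, tzinfo=timezone.utc)),
]
chosen = select_credential(loaded, now)
print(chosen.thumbprint if chosen else "no usable certificate")
```

In this example the node would present the certificate labeled 'secondary', because it is the unexpired certificate with the latest NotAfter value.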

Note that, for common name-based presentation declarations, a certificate is considered a match if its subject common name is equal to the X509FindValue (or X509FindValueSecondary) field of the declaration as a case-sensitive, exact string comparison. This is in contrast with the validation rules, which do support wildcard matching, as well as case-insensitive string comparisons.

Miscellaneous certificate configuration settings

It was mentioned previously that the security settings of a Service Fabric cluster also allow for subtly changing the behavior of the authentication code. While the article on Service Fabric cluster settings represents the comprehensive and most up-to-date list of settings, we'll expand on the meaning of a select few of the security settings here, to complete the exposition of certificate-based authentication. For each setting, we'll explain the intent, default value/behavior, how it affects authentication and which values are acceptable.

As mentioned, certificate validation always implies building and evaluating the certificate's chain. For CA-issued certificates, this apparently simple OS API call typically entails several outbound calls to various endpoints of the issuing PKI, caching of responses and so on. Given the prevalence of certificate validation calls in a Service Fabric cluster, any issues in the PKI's endpoints can result in reduced availability of the cluster, or outright breakdown. While the outbound calls cannot be suppressed (see below in the FAQ section for more on this), the following settings can be used to mask out validation errors caused by failing CRL calls.

  • CrlCheckingFlag - under the 'Security' section, string converted to UINT. The value of this setting is used by Service Fabric to mask out certificate chain status errors by changing the behavior of chain building; it is passed in to the Win32 CryptoAPI CertGetCertificateChain call as the 'dwFlags' parameter, and can be set to any valid combination of flags accepted by the function. A value of 0 forces the Service Fabric runtime to ignore any trust status errors - this is not recommended, as its use would constitute a significant security exposure. The default value is 0x40000000 (CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT).

    When to use: for local testing, with self-signed certificates or developer certificates which aren't fully formed/do not have a proper public key infrastructure to support the certificates. It may also be used as a mitigation in air-gapped environments during the transition between PKIs.

    How to use: we'll take an example that forces the revocation check to access cached URLs only. Assuming:

    #define CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY         0x80000000
    

    then the declaration in the cluster manifest becomes:

    <Section Name="Security">
      <Parameter Name="CrlCheckingFlag" Value="0x80000000" />
    </Section>
    
  • IgnoreCrlOfflineError - under the 'Security' section, boolean with a default value of 'false'. Represents a shortcut for suppressing a 'revocation offline' chain building error status (or a subsequent chain policy validation error status).

    When to use: local testing, or with developer certificates not backed by a proper PKI. Use as mitigation in air-gapped environments or when the PKI is known to be inaccessible.

    How to use:

    <Section Name="Security">
      <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    </Section>
    

    Other notable settings (all under the 'Security' section):

    • AcceptExpiredPinnedClusterCertificate - discussed in the section dedicated to thumbprint-based certificate validation; allows accepting expired self-signed cluster certificates.
    • CertificateExpirySafetyMargin - interval, expressed in minutes prior to the certificate's NotAfter timestamp, during which the certificate is considered at risk of expiration. Service Fabric monitors cluster certificate(s) and periodically emits health reports on their remaining availability. Inside the 'safety' interval, these health reports are elevated to 'warning' status. The default is 30 days (43,200 minutes).
    • CertificateHealthReportingInterval - controls the frequency of health reports concerning the remaining time validity of cluster certificates. Reports will only be emitted once per this interval. The value is expressed in seconds, with a default of 8 hours (28,800 seconds).
    • EnforcePrevalidationOnSecurityChanges - boolean, controls the behavior of cluster upgrade upon detecting changes of security settings. If set to 'true', the cluster upgrade will attempt to ensure that at least one of the certificates matching any of the presentation rules can pass a corresponding validation rule. The pre-validation is executed before the new settings are applied to any node, but runs only on the node hosting the primary replica of the Cluster Manager service at the time of initiating the upgrade. As of this writing, the setting has a default of 'false', and will be set to 'true' for new Azure Service Fabric clusters with a runtime version starting with 7.1.
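Since 'CrlCheckingFlag' accepts a combination of flags, it can help to see how a value is composed and decoded. The Python sketch below uses the revocation-related flag values defined in the Windows SDK header wincrypt.h; the helper names are hypothetical, and parsing the manifest string with `int(value, 0)` is only an assumption about how a string such as "0x80000000" maps to an unsigned integer. Whether a particular combination is appropriate for a given cluster is a separate decision.

```python
from typing import List

# Revocation-related dwFlags values for CertGetCertificateChain (wincrypt.h).
CERT_CHAIN_REVOCATION_CHECK_END_CERT = 0x10000000
CERT_CHAIN_REVOCATION_CHECK_CHAIN = 0x20000000
CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT = 0x40000000
CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY = 0x80000000

FLAG_NAMES = {
    CERT_CHAIN_REVOCATION_CHECK_END_CERT: "CERT_CHAIN_REVOCATION_CHECK_END_CERT",
    CERT_CHAIN_REVOCATION_CHECK_CHAIN: "CERT_CHAIN_REVOCATION_CHECK_CHAIN",
    CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT: "CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT",
    CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY: "CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY",
}


def format_flag_value(*flags: int) -> str:
    """OR the given flags together and render them as a hexadecimal string."""
    combined = 0
    for flag in flags:
        combined |= flag
    return f"0x{combined:08X}"


def describe_flag_value(value: str) -> List[str]:
    """List the known revocation flags set in a manifest-style value string."""
    number = int(value, 0)  # accepts "0x..." as well as decimal strings
    return [name for bit, name in FLAG_NAMES.items() if number & bit]


# The documented default combined with cache-only revocation checking.
print(format_flag_value(
    CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT,
    CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY,
))
print(describe_flag_value("0x80000000"))
```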

End-to-end scenario (examples)

We've looked at presentation rules, validation rules and tweaking flags, but how does this all work together? In this section, we'll work through two end-to-end examples demonstrating how the security settings can be leveraged for safe cluster upgrades. Note that this is not intended to be a comprehensive treatment of proper certificate management in Service Fabric; look for a companion article on that topic.

The separation of presentation and validation rules raises the obvious question (or concern) of whether they can diverge, and what the consequences would be. It is, indeed, possible that the certificate a node selects for authentication won't pass the validation rules of another node. In fact, this discrepancy is the primary cause of authentication-related incidents. At the same time, the separation of these rules allows a cluster to continue operating during an upgrade that changes the cluster's security settings. Consider that, by augmenting the validation rules as a first step, all of the cluster's nodes will converge on the new settings while still using the current credentials.

Recall that, in a Service Fabric cluster, an upgrade progresses through (up to 5) 'upgrade domains', or UDs. Only nodes in the current UD are being upgraded/changed at a given time, and the upgrade will proceed to the next UD only if the cluster's availability allows it. (Refer to Service Fabric cluster upgrades and other articles on the same topic for more details.) Certificate/security changes are particularly risky, since they can isolate nodes from the cluster, or leave the cluster at the edge of quorum loss.

We will use the following notation to describe a node's security settings:

Nk: {P:{TP=A}, V:{TP=A}}, where:

  • 'Nk' represents a node in upgrade domain k
  • 'P' represents the node's current presentation rules (assuming we are referring to cluster certificates only)
  • 'V' represents the node's current validation rules (cluster certificates only)
  • 'TP=A' represents a thumbprint-based declaration (TP), with 'A' being a certificate thumbprint
  • 'CN=B' represents a common name-based declaration (CN), with 'B' being the certificate's subject common name
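
To make the notation concrete, here is a minimal Python sketch that models a node's settings and checks whether a certificate passes its validation rules. All names (Cert, NodeSettings, matches, accepts, notation) are hypothetical and exist only for this illustration; real validation also involves chain building, issuer checks and expiry, which are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    thumbprint: str
    common_name: str


@dataclass(frozen=True)
class NodeSettings:
    presents: tuple   # presentation rules (P), e.g. ("TP=A",)
    validates: tuple  # validation rules (V), e.g. ("TP=A", "CN=B")


def matches(declaration, cert):
    """Return True if a 'TP=...' or 'CN=...' declaration matches the certificate."""
    kind, _, value = declaration.partition("=")
    if kind == "TP":
        return cert.thumbprint == value
    if kind == "CN":
        return cert.common_name == value
    raise ValueError(f"unknown declaration type: {declaration!r}")


def accepts(node, cert):
    """A node accepts a certificate if any of its validation rules matches."""
    return any(matches(d, cert) for d in node.validates)


def notation(name, node):
    """Render the settings in the notation used in this article."""
    p = ", ".join(node.presents)
    v = ", ".join(node.validates)
    return f"{name}: {{P:{{{p}}}, V:{{{v}}}}}"


if __name__ == "__main__":
    n0 = NodeSettings(presents=("TP=A",), validates=("TP=A", "CN=B"))
    print(notation("N0", n0))
    print(accepts(n0, Cert(thumbprint="A", common_name="B")))
```

Keeping P and V as separate fields mirrors the separation of presentation and validation rules discussed earlier: each can be changed independently during an upgrade.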

Rotating a cluster certificate declared by thumbprint

The following sequence describes how a two-phase upgrade can be used to safely introduce a secondary cluster certificate, declared by thumbprint; the first phase introduces the new certificate declaration in the validation rules, and the second phase introduces it in the presentation rules:

  • initial state: N0 = {P:{TP=A}, V:{TP=A}}, ... Nk = {P:{TP=A}, V:{TP=A}} - the cluster is at rest, all nodes share a common configuration
  • upon completing upgrade domain 0: N0 = {P:{TP=A}, V:{TP=A, TP=B}}, ... Nk = {P:{TP=A}, V:{TP=A}} - nodes in UD0 will present certificate A, and accept certificates A or B; all other nodes present and accept certificate A only
  • upon completing the last upgrade domain: N0 = {P:{TP=A}, V:{TP=A, TP=B}}, ... Nk = {P:{TP=A}, V:{TP=A, TP=B}} - all nodes present certificate A, all nodes would accept either certificate A or B

At this point, the cluster is again in equilibrium, and the second phase of the upgrade (changing the presentation rules) can commence:

  • upon completing upgrade domain 0: N0 = {P:{TP=A, TP=B}, V:{TP=A, TP=B}}, ... Nk = {P:{TP=A}, V:{TP=A, TP=B}} - nodes in UD0 will start presenting certificate B, which is accepted by every other node in the cluster.
  • upon completing the last upgrade domain: N0 = {P:{TP=A, TP=B}, V:{TP=A, TP=B}}, ... Nk = {P:{TP=A, TP=B}, V:{TP=A, TP=B}} - all nodes have switched to presenting certificate B. Certificate A can now be retired/removed from the cluster definition with a subsequent set of upgrades.
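
The two phases above can be simulated with a short Python sketch. It is a simplified model, not Service Fabric behavior: it assumes one node per upgrade domain, treats a certificate as just its thumbprint, and assumes that a node declaring several certificates presents the last one declared (B in the sequence above). The function names are hypothetical.

```python
def presented(node):
    # Assumption for this sketch: when several certificates are declared in
    # the presentation rules, the node presents the last one declared.
    return node["P"][-1]


def cluster_connected(nodes):
    """True if every node's presented certificate passes every node's validation rules."""
    return all(presented(a) in b["V"] for a in nodes for b in nodes)


def rolling_upgrade(nodes, new_settings):
    """Apply new_settings one upgrade domain at a time.

    Returns (resulting nodes, index of the first UD whose upgrade broke
    connectivity, or None if the whole upgrade stayed connected).
    """
    nodes = [dict(n) for n in nodes]
    for ud in range(len(nodes)):
        nodes[ud] = dict(new_settings)
        if not cluster_connected(nodes):
            return nodes, ud
    return nodes, None


initial = [{"P": ["A"], "V": {"A"}} for _ in range(5)]

# Phase 1: widen the validation rules only.
phase1, failed1 = rolling_upgrade(initial, {"P": ["A"], "V": {"A", "B"}})
# Phase 2: start presenting B once every node accepts it.
phase2, failed2 = rolling_upgrade(phase1, {"P": ["A", "B"], "V": {"A", "B"}})
# Counter-example: changing both rule sets in a single upgrade.
_, failed_single = rolling_upgrade(initial, {"P": ["A", "B"], "V": {"A", "B"}})

if __name__ == "__main__":
    print("phase 1 broke at UD:", failed1)
    print("phase 2 broke at UD:", failed2)
    print("single-step upgrade broke at UD:", failed_single)
```

In this model the single-step upgrade fails at the first upgrade domain, because the upgraded node presents B while the remaining nodes still validate against A only; the two-phase sequence never reaches such a state.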

Converting a cluster from thumbprint- to common-name-based certificate declarations

Similarly, changing the type of certificate declaration (from thumbprint to common name) follows the same pattern as above. Note that validation rules allow declaring the certificates of a given role by both thumbprint and common name in the same cluster definition. By contrast, the presentation rules allow only one form of declaration. Incidentally, the safe approach to converting a cluster certificate from thumbprint to common name is to introduce the intended target certificate first by thumbprint, and then change that declaration to a common name-based one. In the following example, we will assume that thumbprint 'A' and subject common name 'B' refer to the same certificate.

  • initial state: N0 = {P:{TP=A}, V:{TP=A}}, ... Nk = {P:{TP=A}, V:{TP=A}} - the cluster is at rest, all nodes share a common configuration, with A being the primary certificate thumbprint
  • upon completing upgrade domain 0: N0 = {P:{TP=A}, V:{TP=A, CN=B}}, ... Nk = {P:{TP=A}, V:{TP=A}} - nodes in UD0 will present certificate A, and accept certificates with either thumbprint A or common name B; all other nodes present and accept certificate A only
  • upon completing the last upgrade domain: N0 = {P:{TP=A}, V:{TP=A, CN=B}}, ... Nk = {P:{TP=A}, V:{TP=A, CN=B}} - all nodes present certificate A, all nodes would accept either certificate A (TP) or B (CN)

At this point we can proceed with changing the presentation rules in a subsequent upgrade:

  • upon completing upgrade domain 0: N0 = {P:{CN=B}, V:{TP=A, CN=B}}, ... Nk = {P:{TP=A}, V:{TP=A, CN=B}} - nodes in UD0 will present certificate B found by CN, and accept certificates with either thumbprint A or common name B; all other nodes still present certificate A, selected by thumbprint, and likewise accept either declaration
  • upon completing the last upgrade domain: N0 = {P:{CN=B}, V:{TP=A, CN=B}}, ... Nk = {P:{CN=B}, V:{TP=A, CN=B}} - all nodes present certificate B found by CN, all nodes would accept either certificate A (TP) or B (CN)

Completion of phase 2 also marks the conversion of the cluster to common name-based certificates; the thumbprint-based validation declarations can be removed in a subsequent cluster upgrade.
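
The following Python sketch replays the conversion in a simplified model and shows one situation in which skipping phase 1 would break connectivity. It rests on assumptions that are not part of the text: one node per upgrade domain, a hypothetical renewed certificate 'C' that shares common name 'B', and a selection rule in which a common-name declaration resolves to the matching certificate that expires last. If the local store held only certificate A, every ordering would pass in this model.

```python
from typing import NamedTuple


class Cert(NamedTuple):
    thumbprint: str
    common_name: str
    expires: int  # hypothetical ordering key; a larger value expires later


def select(rule, store):
    """Resolve a ('TP' | 'CN', value) rule against a certificate store."""
    kind, value = rule
    field = "thumbprint" if kind == "TP" else "common_name"
    hits = [c for c in store if getattr(c, field) == value]
    # Assumption: among several matches, the one expiring last is chosen.
    return max(hits, key=lambda c: c.expires, default=None)


def accepts(validation_rules, cert):
    """True if the certificate matches at least one validation rule."""
    return cert is not None and any(
        select(rule, [cert]) is not None for rule in validation_rules
    )


def first_broken_ud(nodes, new_settings, store):
    """Roll new_settings through the UDs; return the first UD after which some
    node presents a certificate that another node rejects, or None."""
    nodes = list(nodes)
    for ud in range(len(nodes)):
        nodes[ud] = new_settings
        for sender in nodes:
            cert = select(sender["P"], store)
            if not all(accepts(receiver["V"], cert) for receiver in nodes):
                return ud
    return None


old = {"P": ("TP", "A"), "V": [("TP", "A")]}
widened = {"P": ("TP", "A"), "V": [("TP", "A"), ("CN", "B")]}
by_cn = {"P": ("CN", "B"), "V": [("TP", "A"), ("CN", "B")]}

# Thumbprint A and common name B refer to the same certificate, as in the text;
# certificate C is a hypothetical renewal carrying the same common name.
store = [Cert("A", "B", expires=1), Cert("C", "B", expires=2)]

phase1 = first_broken_ud([old] * 5, widened, store)
phase2 = first_broken_ud([widened] * 5, by_cn, store)
skipped = first_broken_ud([old] * 5, by_cn, store)

if __name__ == "__main__":
    print("phase 1 broke at UD:", phase1)
    print("phase 2 broke at UD:", phase2)
    print("skipping phase 1 broke at UD:", skipped)
```

Under these assumptions, widening the validation rules first keeps every intermediate state connected, whereas switching the presentation rule directly fails at the first upgrade domain.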

Note

In Azure Service Fabric clusters, the workflows presented above are orchestrated by the Service Fabric Resource Provider; the cluster owner is still responsible for provisioning certificates into the cluster according to the indicated rules (presentation or validation), and is encouraged to perform changes in multiple steps.

In a separate article we will address the topic of managing and provisioning certificates into a Service Fabric cluster.

Troubleshooting and Frequently Asked Questions

While debugging authentication-related issues in Service Fabric clusters is not easy, we're hopeful the following hints and tips will help. The easiest way to begin an investigation is to examine the Service Fabric event logs on the nodes of the cluster: not only the nodes showing symptoms, but also nodes which are up yet unable to connect to one of their neighbors. On Windows, events of significance are typically logged under the 'Applications and Services Logs\Microsoft-ServiceFabric\Admin' or 'Operational' channels. Sometimes it may be helpful to enable CAPI2 logging, to capture more details regarding certificate validation, retrieval of CRL/CTL, and so on. (Do remember to disable it after completing the repro, as it can be quite verbose.)

Typical symptoms that manifest themselves in a cluster experiencing authentication issues are:

  • nodes are down/cycling
  • connection attempts are rejected
  • connection attempts are timing out

Each of the symptoms may be caused by different problems, and the same root cause may show different manifestations; as such, we'll just list a small sample of typical problems, with recommendations for fixing them.

  • Nodes can exchange messages but cannot connect. A possible cause for connection attempts to be terminated is the 'certificate not matched' error: one of the parties in a Service Fabric-to-Service Fabric connection is presenting a certificate which fails the recipient's validation rules. It may be accompanied by the following error:

    0x80071c44  -2147017660 FABRIC_E_SERVER_AUTHENTICATION_FAILED
    

    To diagnose/investigate further: on each of the nodes attempting the connection, determine which certificate is being presented; examine the certificate and try to emulate the validation rules (check for thumbprint or common name equality, and check issuer thumbprints if specified).

    Another common accompanying error code may be:

    0x800b0109  -2146762487 CERT_E_UNTRUSTEDROOT
    

    In this case, the certificate is declared by common name, and either of the following applies:

    • the issuers are not pinned, and the root certificate is not trusted, or
    • the issuers are pinned, but the declaration does not include the thumbprint of the direct issuer of this certificate
  • A node is up, but cannot connect to other nodes; other nodes do not receive inbound traffic from the failing node. In this case, it is possible that certificate loading fails on the local node. Look for the following errors:

    • certificate not found - ensure the certificates declared in the presentation rules can be resolved from the contents of the LocalMachine\My (or otherwise specified) certificate store. Possible causes for failure include:

      • invalid characters in the thumbprint declaration
      • the certificate is not installed
      • the certificate is expired
      • the common-name declaration includes the prefix 'CN='
      • the declaration specifies a wildcard and no exact match exists in the certificate store (declaration: CN=*.mydomain.com, actual certificate: CN=server.mydomain.com)
    • unknown credentials - indicates a missing private key corresponding to the certificate, typically accompanied by one of the following error codes:

      0x8009030d  -2146893043 SEC_E_UNKNOWN_CREDENTIALS
      0x8009030e  -2146893042 SEC_E_NO_CREDENTIALS
      

      To remedy, check that the private key exists, and verify that SFAdmins is granted 'read|execute' access to the private key.

    • bad provider type - indicates a Cryptography API: Next Generation (CNG) certificate ("Microsoft Software Key Storage Provider"); at this time, Service Fabric only supports CAPI1 certificates. Typically accompanied by error code:

      0x80090014  -2146893804 NTE_BAD_PROV_TYPE
      

      To remedy, re-create the cluster certificate using a CAPI1 provider (e.g. "Microsoft Enhanced RSA and AES Cryptographic Provider"). For more details on crypto providers, refer to Understanding Cryptographic Providers.
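
The error codes in this section are shown both as hexadecimal values and as signed 32-bit integers, and logs may contain either form. The following Python sketch converts between the two and looks up the symbolic name; the table contains only the codes quoted above, and the helper names are hypothetical.

```python
# Error codes quoted in this section, keyed by their unsigned 32-bit value.
KNOWN_ERRORS = {
    0x80071C44: "FABRIC_E_SERVER_AUTHENTICATION_FAILED",
    0x800B0109: "CERT_E_UNTRUSTEDROOT",
    0x8009030D: "SEC_E_UNKNOWN_CREDENTIALS",
    0x8009030E: "SEC_E_NO_CREDENTIALS",
    0x80090014: "NTE_BAD_PROV_TYPE",
}


def to_signed(code):
    """Interpret a 32-bit error code as a signed integer."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


def describe(code):
    """Format a code (given in signed or unsigned form) like the listings above."""
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_ERRORS.get(unsigned, "unknown")
    return f"0x{unsigned:08x}  {to_signed(unsigned)} {name}"


if __name__ == "__main__":
    print(describe(0x80071C44))
    print(describe(-2146762487))
```

Masking with 0xFFFFFFFF lets the same lookup accept a negative decimal value copied from an event log as well as the hexadecimal form.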