Plan and prepare your Service Fabric Standalone cluster deployment

Perform the following steps before you create your cluster.

Plan your cluster infrastructure

You are about to create a Service Fabric cluster on machines you "own", so you can decide what kinds of failures you want the cluster to survive. For example, do you need separate power lines or Internet connections supplied to these machines? In addition, consider the physical security of these machines. Where are the machines located and who needs access to them? After you make these decisions, you can logically map the machines to various fault domains (see next step). The infrastructure planning for production clusters is more involved than for test clusters.

Determine the number of fault domains and upgrade domains

A fault domain (FD) is a physical unit of failure and is directly related to the physical infrastructure in the data centers. A fault domain consists of hardware components (computers, switches, networks, and more) that share a single point of failure. Although there is no 1:1 mapping between fault domains and racks, loosely speaking, each rack can be considered a fault domain.

When you specify FDs in ClusterConfig.json, you can choose the name for each FD. Service Fabric supports hierarchical FDs, so you can reflect your infrastructure topology in them. For example, the following FDs are valid:

  • "faultDomain": "fd:/Room1/Rack1/Machine1"
  • "faultDomain": "fd:/FD1"
  • "faultDomain": "fd:/Room1/Rack1/PDU1/M1"
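
To make the hierarchical naming concrete, the following is a minimal Python sketch that splits an FD value into its path segments. The function name parse_fault_domain and its validation rules (the value must start with "fd:/" and contain no empty segments) are assumptions made for illustration; they are not Service Fabric's own validation logic.

```python
def parse_fault_domain(fd: str) -> list[str]:
    """Split a hierarchical fault domain value into its path segments.

    Assumption: a value is treated as well-formed when it starts with
    "fd:/" and every segment after that prefix is non-empty.
    """
    prefix = "fd:/"
    if not fd.startswith(prefix):
        raise ValueError(f"fault domain must start with {prefix!r}: {fd!r}")
    segments = fd[len(prefix):].split("/")
    if any(not segment for segment in segments):
        raise ValueError(f"fault domain has an empty segment: {fd!r}")
    return segments


# The three valid examples from the list above.
for value in ("fd:/Room1/Rack1/Machine1", "fd:/FD1", "fd:/Room1/Rack1/PDU1/M1"):
    print(value, "->", parse_fault_domain(value))
```

Each segment corresponds to one level of the topology you chose to model, such as a room, a rack, or a power distribution unit.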

An upgrade domain (UD) is a logical unit of nodes. During Service Fabric orchestrated upgrades (either an application upgrade or a cluster upgrade), all nodes in a UD are taken down to perform the upgrade while nodes in other UDs remain available to serve requests. The firmware upgrades you perform on your machines do not honor UDs, so you must do them one machine at a time.

The simplest way to think about these concepts is to consider FDs as the unit of unplanned failure and UDs as the unit of planned maintenance.

When you specify UDs in ClusterConfig.json, you can choose the name for each UD. For example, the following names are valid:

  • "upgradeDomain": "UD0"
  • "upgradeDomain": "UD1A"
  • "upgradeDomain": "DomainRed"
  • "upgradeDomain": "Blue"

For more detailed information on FDs and UDs, see Describing a Service Fabric cluster.

If you have full control over the maintenance and management of the nodes (that is, you are responsible for updating and replacing machines), a production cluster should span at least three FDs in order to be supported. For clusters running in environments where you do not have full control over the machines (for example, Amazon Web Services VM instances), you should have a minimum of five FDs in your cluster. Each FD can have one or more nodes. This is to prevent issues caused by machine upgrades and updates, which, depending on their timing, can interfere with the running of applications and services in clusters.
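
The FD minimums above can be checked against a planned node list before you write the configuration file. The following Python sketch is illustrative only: the function name check_fault_domain_count is hypothetical, and it makes the simplifying assumption that every distinct "faultDomain" string counts as one FD, which may not match how you want to count domains in a deep hierarchy.

```python
def check_fault_domain_count(nodes: list[dict], full_control: bool) -> tuple[bool, int, int]:
    """Return (ok, distinct_fd_count, required_fd_count) for a node list.

    full_control=True  -> you update and replace the machines yourself (3 FDs).
    full_control=False -> machines are managed by someone else (5 FDs).
    Assumption: each distinct "faultDomain" string is one fault domain.
    """
    required = 3 if full_control else 5
    distinct = {node["faultDomain"] for node in nodes}
    return len(distinct) >= required, len(distinct), required


# Hypothetical plan: four nodes spread over three racks.
planned_nodes = [
    {"nodeName": "vm0", "faultDomain": "fd:/Room1/Rack1", "upgradeDomain": "UD0"},
    {"nodeName": "vm1", "faultDomain": "fd:/Room1/Rack2", "upgradeDomain": "UD1"},
    {"nodeName": "vm2", "faultDomain": "fd:/Room1/Rack3", "upgradeDomain": "UD2"},
    {"nodeName": "vm3", "faultDomain": "fd:/Room1/Rack1", "upgradeDomain": "UD3"},
]

print(check_fault_domain_count(planned_nodes, full_control=True))
print(check_fault_domain_count(planned_nodes, full_control=False))
```

The same plan passes when you control the machines and fails when you do not, which is the distinction the paragraph above draws.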

Determine the initial cluster size

Generally, the number of nodes in your cluster is determined by your business needs, such as how many services and containers will run on the cluster and how many resources you need to sustain your workloads. For production clusters, we recommend having at least five nodes in your cluster, spanning five FDs. However, as described above, if you have full control over your nodes and can span three FDs, then three nodes should also do the job.

Test clusters running stateful workloads should have three nodes, whereas test clusters running only stateless workloads need just one node. Note also that for development purposes, you can have more than one node on a given machine. In a production environment, however, Service Fabric supports only one node per physical or virtual machine.
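
The sizing guidance in this section reduces to a small decision table. The Python sketch below encodes it; the function name minimum_node_count and the "production"/"test" labels are hypothetical names chosen for this example, and the numbers are only the minimums stated above, not a capacity plan.

```python
def minimum_node_count(environment: str, stateful: bool = True, full_control: bool = False) -> int:
    """Return the minimum node count suggested by the guidance above.

    environment:  "production" or "test" (labels assumed for this sketch).
    stateful:     only relevant for test clusters.
    full_control: only relevant for production clusters.
    """
    if environment == "production":
        # Five nodes across five FDs, or three when you fully control the machines.
        return 3 if full_control else 5
    if environment == "test":
        # Stateful workloads need three nodes; stateless-only needs one.
        return 3 if stateful else 1
    raise ValueError(f"unknown environment: {environment!r}")


print(minimum_node_count("production"))
print(minimum_node_count("production", full_control=True))
print(minimum_node_count("test", stateful=False))
```

Your actual node count will usually be higher, because it also depends on the resources your services and containers need.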

Prepare the machines that will serve as nodes

Here are recommended specs for machines in a Service Fabric cluster:

  • A minimum of 16 GB of RAM
  • A minimum of 40 GB of available disk space
  • A 4 core or greater CPU
  • Connectivity to a secure network or networks for all machines
  • Windows Server OS installed (valid versions: 2012 R2, 2016, 1709, or 1803). Service Fabric version 6.4.654.9590 and later also supports Server 2019 and 1809.
  • .NET Framework 4.5.1 or higher, full install
  • Windows PowerShell 3.0
  • The RemoteRegistry service should be running on all the machines
  • The Service Fabric installation drive must use the NTFS file system
  • The Windows services Performance Logs & Alerts and Windows Event Log must be enabled
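
If you keep an inventory of candidate machines, the numeric items in the list above can be screened automatically. The following Python sketch is a hypothetical helper: the MachineSpec fields and the spec_problems function are names invented for this example, the values would have to come from your own inventory, and it covers only RAM, disk, CPU, and file system rather than the OS, .NET, PowerShell, or service requirements.

```python
from dataclasses import dataclass


@dataclass
class MachineSpec:
    """Inventory record for one candidate machine (fields are assumptions)."""
    ram_gb: float
    free_disk_gb: float
    cpu_cores: int
    install_drive_fs: str


def spec_problems(machine: MachineSpec) -> list[str]:
    """Return a list of ways the machine falls short of the recommended specs."""
    problems = []
    if machine.ram_gb < 16:
        problems.append("less than 16 GB of RAM")
    if machine.free_disk_gb < 40:
        problems.append("less than 40 GB of available disk space")
    if machine.cpu_cores < 4:
        problems.append("fewer than 4 CPU cores")
    if machine.install_drive_fs.upper() != "NTFS":
        problems.append("installation drive is not NTFS")
    return problems


print(spec_problems(MachineSpec(ram_gb=16, free_disk_gb=100, cpu_cores=8, install_drive_fs="NTFS")))
print(spec_problems(MachineSpec(ram_gb=8, free_disk_gb=100, cpu_cores=8, install_drive_fs="FAT32")))
```

An empty list means the machine meets the numeric recommendations; the remaining items still need to be verified on the machine itself.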


The cluster administrator deploying and configuring the cluster must have administrator privileges on each of the machines. You cannot install Service Fabric on a domain controller.

Download the Service Fabric standalone package for Windows Server

Download the package from Download Link - Service Fabric Standalone Package - Windows Server and unzip it, either to a deployment machine that is not part of the cluster, or to one of the machines that will be a part of your cluster.

Modify cluster configuration

To create a standalone cluster, you have to create a standalone cluster configuration file, ClusterConfig.json, which describes the specification of the cluster. You can base the configuration file on the templates found at the following link:
Standalone Cluster Configurations

For details on the sections in this file, see Configuration settings for standalone Windows cluster.

Open one of the ClusterConfig.json files from the package you downloaded and modify the following settings:

Configuration Setting: NodeTypes
Description: Node types allow you to separate your cluster nodes into various groups. A cluster must have at least one NodeType. All nodes in a group have the following common characteristics:
  • Name - This is the node type name.
  • Endpoint Ports - These are various named endpoints (ports) that are associated with this node type. You can use any port number that you wish, as long as it does not conflict with anything else in this manifest and is not already in use by any other application running on the machine/VM.
  • Placement Properties - These describe properties for this node type that you use as placement constraints for the system services or your services. These properties are user-defined key/value pairs that provide extra metadata for a given node. Examples of node properties would be whether the node has a hard drive or graphics card, the number of spindles in its hard drive, cores, and other physical properties.
  • Capacities - Node capacities define the name and amount of a particular resource that a particular node has available for consumption. For example, a node may define that it has capacity for a metric called "MemoryInMb" and that it has 2048 MB available by default. These capacities are used at runtime to ensure that services that require particular amounts of resources are placed on the nodes that have those resources available in the required amounts.
  • IsPrimary - If you have more than one NodeType defined, ensure that only one is set to primary with the value true; this is where the system services run. All other node types should be set to the value false.

Configuration Setting: Nodes
Description: These are the details for each of the nodes that are part of the cluster (node type, node name, IP address, fault domain, and upgrade domain of the node). The machines you want the cluster to be created on need to be listed here with their IP addresses. If you use the same IP address for all the nodes, then a one-box cluster is created, which you can use for testing purposes. Do not use one-box clusters for deploying production workloads.
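
Two of the rules in the table above lend themselves to a quick check before deployment: the primary node type rule and the one-box condition. The Python sketch below works on a dictionary loaded from a configuration file. The key names "properties", "nodeTypes", "name", "isPrimary", "nodes", "nodeTypeRef", and "iPAddress" are assumptions based on the layout of the sample templates, so verify them against the ClusterConfig.json you downloaded; the function names are hypothetical. Requiring exactly one primary node type is a slightly stricter reading of the IsPrimary rule.

```python
def config_problems(config: dict) -> list[str]:
    """Return problems found in the node type and node sections."""
    problems = []
    node_types = config.get("properties", {}).get("nodeTypes", [])
    if not node_types:
        problems.append("a cluster must have at least one node type")
    primaries = [t.get("name") for t in node_types if t.get("isPrimary") is True]
    if node_types and len(primaries) != 1:
        problems.append(f"expected exactly one primary node type, found {len(primaries)}")
    known_types = {t.get("name") for t in node_types}
    for node in config.get("nodes", []):
        if node.get("nodeTypeRef") not in known_types:
            problems.append(f"node {node.get('nodeName')!r} references an unknown node type")
    return problems


def is_one_box(config: dict) -> bool:
    """True when every listed node uses the same IP address."""
    addresses = {node.get("iPAddress") for node in config.get("nodes", [])}
    return len(addresses) == 1


# Hypothetical three-node configuration on a single machine.
sample = {
    "nodes": [
        {"nodeName": "vm0", "iPAddress": "localhost", "nodeTypeRef": "NodeType0"},
        {"nodeName": "vm1", "iPAddress": "localhost", "nodeTypeRef": "NodeType0"},
        {"nodeName": "vm2", "iPAddress": "localhost", "nodeTypeRef": "NodeType0"},
    ],
    "properties": {"nodeTypes": [{"name": "NodeType0", "isPrimary": True}]},
}

print(config_problems(sample))
print(is_one_box(sample))
```

A configuration for which is_one_box returns True is suitable for testing only, as the Nodes description states.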

After all settings in the cluster configuration have been adapted to the environment, the configuration can be tested against the cluster environment (step 7).

Environment setup

When a cluster administrator configures a Service Fabric standalone cluster, the environment needs to be set up according to the following criteria:

  1. The user creating the cluster should have administrator-level security privileges to all machines that are listed as nodes in the cluster configuration file.

  2. The machine from which the cluster is created, as well as each cluster node machine, must:

    • Have Service Fabric SDK uninstalled
    • Have Service Fabric runtime uninstalled
    • Have the Windows Firewall service (mpssvc) enabled
    • Have the Remote Registry Service (remote registry) enabled
    • Have file sharing (SMB) enabled
    • Have necessary ports opened, based on cluster configuration ports
    • Have necessary ports opened for Windows SMB and Remote Registry service: 135, 137, 138, 139, and 445
    • Have network connectivity to one another
  3. None of the cluster node machines should be a Domain Controller.

  4. If the cluster to be deployed is a secure cluster, validate that the necessary security prerequisites are in place and are configured correctly against the configuration.

  5. If the cluster machines are not internet-accessible, set the following in the cluster configuration:

    • Disable telemetry: Under properties, set "enableTelemetry": false
    • Disable automatic Fabric version downloading and notifications that the current cluster version is nearing end of support: Under properties, set "fabricClusterAutoupgradeEnabled": false
    • Alternatively, if network internet access is limited to allow-listed domains, the following domains are required for automatic upgrade: go.microsoft.com and download.microsoft.com
  6. Set appropriate Service Fabric antivirus exclusions:

    Antivirus excluded directories
    Program Files\Microsoft Service Fabric
    FabricDataRoot (from cluster configuration)
    FabricLogRoot (from cluster configuration)
    Antivirus excluded processes
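
For step 5 above, the two settings for machines without internet access can be applied to a configuration programmatically. The following Python sketch is a minimal illustration: the function name apply_offline_settings is hypothetical, the setting names are the ones quoted in step 5, and the sample dictionary stands in for a configuration that you would normally load from your ClusterConfig.json with json.load.

```python
import copy
import json


def apply_offline_settings(config: dict) -> dict:
    """Return a copy of the configuration with the step 5 settings applied."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    properties = updated.setdefault("properties", {})
    properties["enableTelemetry"] = False
    properties["fabricClusterAutoupgradeEnabled"] = False
    return updated


# Hypothetical, heavily trimmed configuration used only for demonstration.
original = {"name": "SampleCluster", "properties": {"enableTelemetry": True}}
offline = apply_offline_settings(original)
print(json.dumps(offline, indent=4))
```

Working on a copy keeps the original configuration available for comparison before you write the modified version back to disk.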

Validate environment using TestConfiguration script

The TestConfiguration.ps1 script can be found in the standalone package. It is used as a Best Practices Analyzer to validate some of the criteria above and should be used as a sanity check to validate whether a cluster can be deployed on a given environment. If there is any failure, refer to the list under Environment setup for troubleshooting.

This script can be run on any machine that has administrator access to all the machines that are listed as nodes in the cluster configuration file. The machine that this script is run on does not have to be part of the cluster.

PS C:\temp\Microsoft.Azure.ServiceFabric.WindowsServer> .\TestConfiguration.ps1 -ClusterConfigFilePath .\ClusterConfig.Unsecure.DevCluster.json
Trace folder already exists. Traces will be written to existing trace folder: C:\temp\Microsoft.Azure.ServiceFabric.WindowsServer\DeploymentTraces
Running Best Practices Analyzer...
Best Practices Analyzer completed successfully.

LocalAdminPrivilege        : True
IsJsonValid                : True
IsCabValid                 : True
RequiredPortsOpen          : True
RemoteRegistryAvailable    : True
FirewallAvailable          : True
RpcCheckPassed             : True
NoConflictingInstallations : True
FabricInstallable          : True
Passed                     : True
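
If you capture the console output shown above to a text file, the individual checks can be summarized with a short script. The Python sketch below assumes the report keeps the "Name : True/False" layout shown in this transcript; the function name parse_bpa_report is hypothetical, and the False value in the sample text is invented to show how a failing check is reported.

```python
def parse_bpa_report(text: str) -> dict[str, bool]:
    """Collect every 'Name : True' or 'Name : False' line into a dictionary."""
    results = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        # Skip lines such as trace messages whose value is not a boolean.
        if separator and value in ("True", "False"):
            results[name.strip()] = value == "True"
    return results


# Sample text modeled on the transcript above, with one check set to False.
sample_report = """
Running Best Practices Analyzer...
LocalAdminPrivilege        : True
IsJsonValid                : True
RequiredPortsOpen          : False
RemoteRegistryAvailable    : True
Passed                     : False
"""

report = parse_bpa_report(sample_report)
failed = [name for name, ok in report.items() if not ok and name != "Passed"]
print("Overall result:", report.get("Passed"))
print("Failed checks:", failed)
```

Each failed check name can then be matched against the list under Environment setup when troubleshooting.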

Currently, this configuration testing module does not validate the security configuration, so this has to be done independently.

We are continually making improvements to make this module more robust, so if there is a faulty or missing case that you believe isn't currently caught by TestConfiguration, please let us know through our support channels.

Next steps