Commonly asked Service Fabric questions

There are many commonly asked questions about what Service Fabric can do and how it should be used. This document covers many of those questions and their answers.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Cluster setup and management

How do I roll back my Service Fabric cluster certificate?

Rolling back any application upgrade requires that a health failure be detected before the Service Fabric cluster quorum commits the change; committed changes can only be rolled forward. If an unmonitored breaking certificate change has been introduced, escalation engineers, engaged through Customer Support Services, may be required to recover your cluster. Service Fabric application upgrades apply the application upgrade parameters and deliver on the promise of zero-downtime upgrades. In the recommended monitored upgrade mode, progress through update domains is automatic and depends on health checks passing; the upgrade rolls back automatically if updating a default service fails.

If your cluster still uses the classic Certificate Thumbprint property in your Resource Manager template, we recommend that you change the cluster from certificate thumbprint to common name so that you can take advantage of modern secrets management features.

Can I create a cluster that spans multiple Azure regions or my own datacenters?

Yes.

The core Service Fabric clustering technology can be used to combine machines running anywhere in the world, so long as they have network connectivity to each other. However, building and running such a cluster can be complicated.

If you are interested in this scenario, we encourage you to get in contact either through the Service Fabric GitHub Issues List or through your support representative to obtain additional guidance. The Service Fabric team is working to provide additional clarity, guidance, and recommendations for this scenario.

Some things to consider:

  1. The Service Fabric cluster resource in Azure is regional today, as are the virtual machine scale sets that the cluster is built on. This means that in the event of a regional failure you may lose the ability to manage the cluster through Azure Resource Manager or the Azure portal. This can happen even though the cluster remains running and you would be able to interact with it directly. In addition, Azure today does not offer the ability to have a single virtual network that is usable across regions. This means that a multi-region cluster in Azure requires either public IP addresses for each VM in the virtual machine scale sets or Azure VPN gateways. These networking choices have different impacts on cost, performance, and to some degree application design, so careful analysis and planning are required before standing up such an environment.
  2. The maintenance, management, and monitoring of these machines can become complicated, especially when the cluster spans different types of environments, such as different cloud providers or on-premises resources and Azure. Make sure that upgrades, monitoring, management, and diagnostics are understood for both the cluster and the applications before running production workloads in such an environment. If you already have experience solving these problems in Azure or within your own datacenters, those same solutions can likely be applied when building out or running your Service Fabric cluster.

Do Service Fabric nodes automatically receive OS updates?

For clusters that are not run in Azure, we have provided an application to patch the operating systems underneath your Service Fabric nodes.

Can I use large virtual machine scale sets in my SF cluster?

Short answer - No.

Long answer - Although large virtual machine scale sets allow you to scale a virtual machine scale set up to 1000 VM instances, they do so by using placement groups (PGs). Fault domains (FDs) and upgrade domains (UDs) are only consistent within a placement group. Service Fabric uses FDs and UDs to make placement decisions for your service replicas and service instances. Because FDs and UDs are comparable only within a placement group, SF cannot use them across placement groups. For example, if VM1 in PG1 has a topology of FD=0 and VM9 in PG2 has a topology of FD=4, it does not mean that VM1 and VM9 are on two different hardware racks, so SF cannot use the FD values in this case to make placement decisions.

There are currently other issues with large virtual machine scale sets, such as the lack of layer-4 load balancing support.

What is the minimum size of a Service Fabric cluster? Why can't it be smaller?

The minimum supported size for a Service Fabric cluster running production workloads is five nodes. For dev scenarios, we support one-node clusters (optimized for a quick development experience in Visual Studio) and five-node clusters.

We require a production cluster to have at least five nodes for the following three reasons:

  1. Even when no user services are running, a Service Fabric cluster runs a set of stateful system services, including the naming service and the failover manager service. These system services are essential for the cluster to remain operational.
  2. We always place at most one replica of a service per node, so cluster size is the upper limit for the number of replicas a service (actually a partition) can have.
  3. Since a cluster upgrade will bring down at least one node, we want to have a buffer of at least one more node. Therefore, we want a production cluster to have at least two nodes in addition to the bare minimum. The bare minimum is the quorum size of a system service, as explained below.

We want the cluster to be available in the face of simultaneous failure of two nodes. For a Service Fabric cluster to be available, the system services must be available. Stateful system services like the naming service and the failover manager service, which track what services have been deployed to the cluster and where they are currently hosted, depend on strong consistency. That strong consistency, in turn, depends on the ability to acquire a quorum for any given update to the state of those services, where a quorum represents a strict majority of the replicas (N/2 + 1, using integer division) for a given service. Thus, if we want to be resilient against simultaneous loss of two nodes (and therefore simultaneous loss of two replicas of a system service), we must have ClusterSize - QuorumSize >= 2, which forces the minimum size to be five.

To see that, suppose the cluster has N nodes and there are N replicas of a system service, one on each node. The quorum size for a system service is (N/2 + 1), so the inequality becomes N - (N/2 + 1) >= 2. There are two cases to consider: N even and N odd. If N is even, say N = 2*m where m >= 1, the inequality becomes 2*m - (2*m/2 + 1) >= 2, or m >= 3. The minimum for N is 6, achieved when m = 3. If N is odd, say N = 2*m+1 where m >= 1, the inequality becomes 2*m+1 - ( (2*m+1)/2 + 1 ) >= 2, or 2*m+1 - (m+1) >= 2, or m >= 2. The minimum for N is 5, achieved when m = 2. Therefore, among all values of N that satisfy the inequality ClusterSize - QuorumSize >= 2, the minimum is 5.
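The derivation above can be checked numerically. The following is a minimal sketch, assuming one system-service replica per node and integer division in the quorum formula; the function names are illustrative and are not part of any Service Fabric API.

```python
def quorum_size(replica_count: int) -> int:
    """Strict majority of the replicas: N/2 + 1 with integer division."""
    return replica_count // 2 + 1


def tolerable_failures(cluster_size: int) -> int:
    """Nodes that can fail at once while a quorum survives,
    assuming one system-service replica on every node."""
    return cluster_size - quorum_size(cluster_size)


def minimum_cluster_size(required_failures: int = 2) -> int:
    """Smallest N with ClusterSize - QuorumSize >= required_failures."""
    n = 1
    while tolerable_failures(n) < required_failures:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"N={n}: quorum={quorum_size(n)}, "
              f"tolerable failures={tolerable_failures(n)}")
    print("minimum production size:", minimum_cluster_size(2))
```

Running the loop shows that clusters of three and four nodes each tolerate only one failure, while five is the first size that tolerates two.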

Note that the above argument assumes that every node has a replica of a system service, so the quorum size is computed from the number of nodes in the cluster. However, by changing TargetReplicaSetSize we could make the quorum size less than (N/2 + 1), which might give the impression that we could have a cluster smaller than five nodes and still have two extra nodes above the quorum size. For example, in a four-node cluster, if we set TargetReplicaSetSize to 3, the quorum size based on TargetReplicaSetSize is (3/2 + 1), or 2, so ClusterSize - QuorumSize = 4 - 2 >= 2. However, we cannot guarantee that the system service will stay at or above quorum if we lose any pair of nodes simultaneously. The two nodes we lose could both be hosting replicas, in which case the system service goes into quorum loss (with only a single replica left) and becomes unavailable.
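To make the four-node counterexample concrete, the sketch below enumerates every pair of simultaneous node failures. The node names and the placement of the three replicas on N1 to N3 are hypothetical choices for illustration only.

```python
from itertools import combinations


def quorum_size(replica_count: int) -> int:
    """Strict majority of the replicas: N/2 + 1 with integer division."""
    return replica_count // 2 + 1


# Hypothetical four-node cluster with TargetReplicaSetSize = 3.
nodes = ["N1", "N2", "N3", "N4"]
replica_nodes = {"N1", "N2", "N3"}  # assumed placement; N4 hosts no replica
quorum = quorum_size(len(replica_nodes))


def keeps_quorum(failed_nodes) -> bool:
    """True if enough replicas survive the given node failures."""
    surviving = replica_nodes - set(failed_nodes)
    return len(surviving) >= quorum


# Pairs of failed nodes that push the system service into quorum loss.
losing_pairs = [pair for pair in combinations(nodes, 2)
                if not keeps_quorum(pair)]

if __name__ == "__main__":
    print("quorum size:", quorum)
    for pair in losing_pairs:
        print("quorum loss when these fail together:", pair)
```

With this placement, every pair drawn from the three replica-hosting nodes causes quorum loss, which is why ClusterSize - QuorumSize >= 2 alone is not a sufficient guarantee.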

With that background, let's examine some possible cluster configurations:

One node: this option does not provide high availability, since the loss of the single node for any reason means the loss of the entire cluster.

Two nodes: a quorum for a service deployed across two nodes (N = 2) is 2 (2/2 + 1 = 2). When a single replica is lost, it is impossible to create a quorum. Since performing a service upgrade requires temporarily taking down a replica, this is not a useful configuration.

Three nodes: with three nodes (N = 3), the requirement to create a quorum is still two nodes (3/2 + 1 = 2). This means that you can lose an individual node and still maintain quorum, but simultaneous failure of two nodes will drive the system services into quorum loss and cause the cluster to become unavailable.

Four nodes: with four nodes (N = 4), the requirement to create a quorum is three nodes (4/2 + 1 = 3). This means that you can lose an individual node and still maintain quorum, but simultaneous failure of two nodes will drive the system services into quorum loss and cause the cluster to become unavailable.

Five nodes: with five nodes (N = 5), the requirement to create a quorum is still three nodes (5/2 + 1 = 3). This means that you can lose two nodes at the same time and still maintain quorum for the system services.

For production workloads, you must be resilient to simultaneous failure of at least two nodes (for example, one due to a cluster upgrade and one due to other reasons), so five nodes are required.

Can I turn off my cluster at night/weekends to save costs?

In general, no. Service Fabric stores state on local, ephemeral disks, meaning that if the virtual machine is moved to a different host, the data does not move with it. In normal operation, that is not a problem, because the new node is brought up to date by other nodes. However, if you stop all nodes and restart them later, there is a significant possibility that most of the nodes start on new hosts and leave the system unable to recover.

If you would like to create clusters for testing your application before it is deployed, we recommend that you dynamically create those clusters as part of your continuous integration/continuous deployment pipeline.

How do I upgrade my operating system (for example, from Windows Server 2012 to Windows Server 2016)?

While we're working on an improved experience, today you are responsible for the upgrade. You must upgrade the OS image on the virtual machines of the cluster one VM at a time.

Can I encrypt attached data disks in a cluster node type (virtual machine scale set)?

Yes. For more information, see Create a cluster with attached data disks and Azure Disk Encryption for Virtual Machine Scale Sets.

Can I use low-priority VMs in a cluster node type (virtual machine scale set)?

No. Low-priority VMs are not supported.

What are the directories and processes that I need to exclude when running an antivirus program in my cluster?

Antivirus excluded directories
  • Program Files\Microsoft Service Fabric
  • FabricDataRoot (from cluster configuration)
  • FabricLogRoot (from cluster configuration)

Antivirus excluded processes
  • Fabric.exe
  • FabricHost.exe
  • FabricInstallerService.exe
  • FabricSetup.exe
  • FabricDeployer.exe
  • ImageBuilder.exe
  • FabricGateway.exe
  • FabricDCA.exe
  • FabricFAS.exe
  • FabricUOS.exe
  • FabricRM.exe
  • FileStoreService.exe

How can my application authenticate to KeyVault to get secrets?

Your application can obtain credentials for authenticating to KeyVault in the following ways:

A. During your application's build/packaging job, you can pull a certificate into your SF app's data package and use it to authenticate to KeyVault.
B. For hosts with virtual machine scale set MSI enabled, you can develop a simple PowerShell SetupEntryPoint for your SF app that gets an access token from the MSI endpoint and then retrieves your secrets from KeyVault.
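As a rough illustration of option B, the sketch below shows the two HTTP requests involved: one to the MSI endpoint for a token, and one to KeyVault for the secret. It is written in Python rather than the PowerShell SetupEntryPoint mentioned above. It assumes the Azure Instance Metadata Service token endpoint at 169.254.169.254 with API version 2018-02-01 and the KeyVault REST API version 7.4; the vault and secret names are placeholders. Treat it as a sketch under those assumptions, not a definitive implementation.

```python
import json
import urllib.parse
import urllib.request

# Assumed MSI (managed identity) token endpoint exposed on the VM.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource: str = "https://vault.azure.net"):
    """Build the request that asks the MSI endpoint for a KeyVault token."""
    query = urllib.parse.urlencode(
        {"api-version": "2018-02-01", "resource": resource})
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


def build_secret_request(vault_name: str, secret_name: str,
                         access_token: str, api_version: str = "7.4"):
    """Build the request that reads one secret from KeyVault."""
    url = (f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
           f"?api-version={api_version}")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"})


def fetch_secret(vault_name: str, secret_name: str) -> str:
    """Get a token from the MSI endpoint, then fetch the secret value.
    Only works on a host where the managed identity is enabled."""
    with urllib.request.urlopen(build_token_request(), timeout=10) as resp:
        token = json.load(resp)["access_token"]
    request = build_secret_request(vault_name, secret_name, token)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["value"]


if __name__ == "__main__":
    # "my-vault" and "my-secret" are hypothetical names.
    print(fetch_secret("my-vault", "my-secret"))
```

The managed identity must also be granted permission to read secrets from the vault; that configuration is outside the scope of this sketch.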

Application design

What's the best way to query data across partitions of a Reliable Collection?

Reliable collections are typically partitioned to enable scale-out for greater performance and throughput. That means that the state for a given service may be spread across tens or hundreds of machines. To perform operations over that full data set, you have a few options:

  • Create a service that queries all partitions of another service to pull in the required data.
  • Create a service that can receive data from all partitions of another service.
  • Periodically push data from each service to an external store. This approach is only appropriate if the queries you're performing are not part of your core business logic, as the external store's data will be stale.
  • Alternatively, store data that must support querying across all records directly in a data store rather than in a reliable collection. This eliminates the issue with stale data, but you give up the advantages of reliable collections.
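The first option in the list above is a fan-out query: ask every partition for its matching records and merge the partial results. The sketch below illustrates only the pattern. The in-memory dictionaries stand in for partitions, and `query_partition` is a hypothetical placeholder for a remote call to one partition of the target service; none of this is Service Fabric API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three partitions of a stateful service.
PARTITIONS = [
    {"alice": 3, "bob": 7},
    {"carol": 12},
    {"dave": 1, "erin": 9},
]


def query_partition(partition: dict, predicate) -> dict:
    """Placeholder for a remote call that filters one partition's data."""
    return {key: value for key, value in partition.items()
            if predicate(value)}


def query_all_partitions(partitions, predicate) -> dict:
    """Fan out to every partition in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: query_partition(p, predicate),
                            partitions)
        for partial in partials:
            merged.update(partial)
    return merged


result = query_all_partitions(PARTITIONS, lambda value: value >= 7)

if __name__ == "__main__":
    print(result)
```

Pushing the filter down to each partition keeps the merged result small; the number of network requests equals the number of partitions.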

What's the best way to query data across my actors?

Actors are designed to be independent units of state and compute, so it is not recommended to perform broad queries of actor state at runtime. If you need to query across the full set of actor state, consider either:

  • Replacing your actor services with stateful reliable services, so that the number of network requests needed to gather all data drops from the number of actors to the number of partitions in your service.
  • Designing your actors to periodically push their state to an external store for easier querying. As above, this approach is only viable if the queries you're performing are not required for your runtime behavior.

How much data can I store in a Reliable Collection?

Reliable services are typically partitioned, so the amount you can store is limited only by the number of machines you have in the cluster and the amount of memory available on those machines.

As an example, suppose that you have a reliable collection in a service with 100 partitions and 3 replicas, storing objects that average 1 KB in size. Now suppose that you have a 10-machine cluster with 16 GB of memory per machine. For simplicity and to be conservative, assume that the operating system and system services, the Service Fabric runtime, and your services consume 6 GB of that, leaving 10 GB available per machine, or 100 GB for the cluster.

Keeping in mind that each object must be stored three times (one primary and two secondary replicas), you would have sufficient memory for approximately 35 million objects in your collection when operating at full capacity. However, we recommend being resilient to the simultaneous loss of a fault domain and an upgrade domain, which represents about 1/3 of capacity, and would reduce the number to roughly 23 million.
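The arithmetic behind those two figures can be reproduced with a short calculation. This sketch assumes binary units (1 KB = 1024 bytes, 1 GB = 1024^3 bytes), which is what yields roughly 35 million; the function name is illustrative.

```python
KB = 1024
GB = 1024 ** 3


def max_objects(machines: int, usable_bytes_per_machine: int,
                object_bytes: int, copies_per_object: int) -> int:
    """Objects that fit when every object is stored once per replica."""
    total_bytes = machines * usable_bytes_per_machine
    return total_bytes // (object_bytes * copies_per_object)


# 10 machines, 10 GB usable each, 1 KB objects, 3 copies of each object.
full_capacity = max_objects(10, 10 * GB, 1 * KB, 3)

# Keep about 1/3 of capacity in reserve for the simultaneous loss of
# a fault domain and an upgrade domain.
with_reserve = full_capacity * 2 // 3

if __name__ == "__main__":
    print(f"full capacity: {full_capacity:,} objects")
    print(f"with 1/3 reserve: {with_reserve:,} objects")
```

The results are about 34.95 million and 23.3 million objects, matching the rounded figures in the text.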

Note that this calculation also assumes:

  • That the distribution of data across the partitions is roughly uniform or that you're reporting load metrics to the Cluster Resource Manager. By default, Service Fabric load balances based on replica count. In the preceding example, that would put 10 primary replicas and 20 secondary replicas on each node in the cluster. That works well for load that is evenly distributed across the partitions. If load is not even, you must report load so that the Cluster Resource Manager can pack smaller replicas together and allow larger replicas to consume more memory on an individual node.

  • That the reliable service in question is the only one storing state in the cluster. Since you can deploy multiple services to a cluster, you need to be mindful of the resources that each needs to run and manage its state.

  • That the cluster itself is not growing or shrinking. If you add more machines, Service Fabric will rebalance your replicas to use the additional capacity until the number of machines surpasses the number of partitions in your service, since an individual replica cannot span machines. By contrast, if you reduce the size of the cluster by removing machines, your replicas are packed more tightly and the cluster has less overall capacity.
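The per-node replica counts quoted in the list above follow from dividing replicas evenly across nodes. A minimal sketch, assuming balancing purely by replica count and an even split (real placement also honors fault and upgrade domain constraints); the function name is illustrative:

```python
def average_replicas_per_node(partitions: int, replicas_per_partition: int,
                              nodes: int) -> tuple:
    """Average (primary, secondary) replica counts per node, assuming
    replicas are spread evenly by count alone."""
    primaries = partitions  # one primary per partition
    secondaries = partitions * (replicas_per_partition - 1)
    return primaries / nodes, secondaries / nodes


if __name__ == "__main__":
    for node_count in (5, 10, 20):
        primary, secondary = average_replicas_per_node(100, 3, node_count)
        print(f"{node_count} nodes: {primary:g} primaries, "
              f"{secondary:g} secondaries per node")
```

For the example service (100 partitions, 3 replicas, 10 nodes) this gives 10 primaries and 20 secondaries per node; shrinking to 5 nodes doubles the replicas each node must hold.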

How much data can I store in an actor?

As with reliable services, the amount of data that you can store in an actor service is limited only by the total disk space and memory available across the nodes in your cluster. However, individual actors are most effective when they are used to encapsulate a small amount of state and associated business logic. As a general rule, an individual actor should have state that is measured in kilobytes.

Other questions

How does Service Fabric relate to containers?

Containers offer a simple way to package services and their dependencies so that they run consistently in all environments and can operate in an isolated fashion on a single machine. Service Fabric offers a way to deploy and manage services, including services that have been packaged in a container.

Are you planning to open-source Service Fabric?

We have open-sourced parts of Service Fabric (reliable services framework, reliable actors framework, ASP.NET Core integration libraries, Service Fabric Explorer, and Service Fabric CLI) on GitHub and accept community contributions to those projects.

We recently announced that we plan to open-source the Service Fabric runtime. At this point we have the Service Fabric repo up on GitHub with Linux build and test tools, which means you can clone the repo, build Service Fabric for Linux, run basic tests, open issues, and submit pull requests. We're working hard to get the Windows build environment migrated over as well, along with a complete CI environment.

Follow the Service Fabric blog for more details as they're announced.

Next steps

Learn about core Service Fabric concepts and best practices.