Azure Service Fabric application design best practices

This article provides best practice guidance for building applications and services on Azure Service Fabric.

Get familiar with Service Fabric

Application design guidance

Become familiar with the general architecture of Service Fabric applications and their design considerations.

Choose an API gateway

Use an API gateway service that communicates with back-end services, which can then be scaled out. The most common API gateway services used are:

Stateless services

We recommend that you always start by building stateless services with Reliable Services and storing state in an Azure database, Azure Cosmos DB, or Azure Storage. Externalized state is the more familiar approach for most developers. This approach also lets you take advantage of the query capabilities of the store.

When to use stateful services

Consider stateful services when your scenario requires low latency and you need to keep the data close to the compute. Example scenarios include IoT digital twin devices, game state, session state, caching data from a database, and long-running workflows that track calls to other services.

Decide on the data retention time frame:

  • Cached data. Use caching when latency to external stores is an issue. Use a stateful service as your own data cache, or consider using the open-source SoCreate Service Fabric Distributed Cache. In this scenario, you don't need to be concerned if you lose all the data in the cache.
  • Time-bound data. In this scenario, you need to keep data close to compute for a period of time to reduce latency, but you can afford to lose the data in a disaster. For example, in many IoT solutions, data needs to be close to compute, such as when the average temperature over the past few days is being calculated, but if this data is lost, the specific data points recorded aren't that important. Also, in this scenario you don't typically care about backing up the individual data points. You only back up computed average values that are periodically written to external storage.
  • Long-term data. Reliable collections can store your data permanently. But in this case you need to prepare for disaster recovery, including configuring periodic backup policies for your clusters. In effect, you plan for the case in which your cluster is destroyed in a disaster: you would need to create a new cluster, deploy new application instances, and recover from the latest backup.
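
The time-bound data scenario can be illustrated with a small, language-neutral sketch. It is written in Python and does not use any Service Fabric API. The names `RollingAverage` and `flush_average`, and the list that stands in for external storage, are assumptions made for illustration. Raw readings stay in local memory for a fixed window, and only the computed average is written out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class RollingAverage:
    """Hypothetical holder of recent readings, kept close to the compute."""
    window_seconds: float
    readings: Deque[Tuple[float, float]] = field(default_factory=deque)

    def add(self, timestamp: float, value: float) -> None:
        # Store the raw data point locally, then drop anything outside the window.
        self.readings.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        while self.readings and now - self.readings[0][0] > self.window_seconds:
            self.readings.popleft()

    def average(self) -> float:
        if not self.readings:
            raise ValueError("no readings in the window")
        return sum(value for _, value in self.readings) / len(self.readings)


def flush_average(window: RollingAverage, external_store: List[float]) -> None:
    # Only the computed value leaves the service; the list is a stand-in
    # for external storage. Individual data points are never backed up.
    external_store.append(window.average())
```

If the in-memory readings are lost, the next flush simply starts from a fresh window, which matches the loss tolerance described for this scenario.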

Save costs and improve availability:

  • You can reduce costs by using stateful services because you don't incur data access and transaction costs from the remote store, and because you don't need to use another service, like Azure Cache for Redis.
  • Using stateful services primarily for storage and not for compute is expensive, and we don't recommend it. Think of stateful services as compute with cheap local storage.
  • By removing dependencies on other services, you can improve your service availability. Managing state with high availability (HA) in the cluster isolates you from the downtime or latency issues of other services.

How to work with Reliable Services

Service Fabric Reliable Services enables you to easily create stateless and stateful services. For more information, see the introduction to Reliable Services.

  • Always honor the cancellation token in the RunAsync() method for stateless and stateful services and the ChangeRole() method for stateful services. If you don't, Service Fabric doesn't know if your service can be closed. For example, if you don't honor the cancellation token, application upgrades can take much longer.
  • Open and close communication listeners in a timely way, and honor the cancellation tokens.
  • Never mix sync code with async code. For example, don't use .GetAwaiter().GetResult() in your async calls. Use async all the way through the call stack.
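
The cancellation guidance above can be sketched in a language-neutral way. The following Python `asyncio` example is not the Service Fabric API: `run_async`, `host`, and the `asyncio.Event` that stands in for the cancellation token are assumptions made for illustration. It shows a main loop that waits on the token between units of work, so a shutdown request is observed promptly.

```python
import asyncio


async def run_async(cancel: asyncio.Event, processed: list) -> None:
    """Stand-in for a service's main loop; `cancel` plays the cancellation token."""
    while not cancel.is_set():
        processed.append(len(processed))  # one unit of work
        try:
            # Wait on the token instead of sleeping blindly, so shutdown is prompt.
            await asyncio.wait_for(cancel.wait(), timeout=0.01)
        except asyncio.TimeoutError:
            pass  # no cancellation yet; continue with the next unit of work


async def host(run_for: float) -> list:
    """Hypothetical host: starts the loop, requests cancellation, awaits a clean exit."""
    cancel = asyncio.Event()
    processed: list = []
    task = asyncio.create_task(run_async(cancel, processed))
    await asyncio.sleep(run_for)
    cancel.set()
    # If the loop ignored the token, this wait would time out and raise.
    await asyncio.wait_for(task, timeout=1.0)
    return processed


if __name__ == "__main__":
    print(len(asyncio.run(host(0.05))), "units of work before shutdown")
```

Every wait in the sketch is awaited rather than blocked on, which is the same idea as keeping the call stack async from top to bottom.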

How to work with Reliable Actors

Service Fabric Reliable Actors enables you to easily create stateful, virtual actors. For more information, see the introduction to Reliable Actors.

  • Seriously consider using pub/sub messaging between your actors for scaling your application. Tools that provide this service include the open-source SoCreate Service Fabric Pub/Sub and Azure Service Bus.
  • Make the actor state as granular as possible.
  • Manage the actor's life cycle. Delete actors if you're not going to use them again. Deleting unneeded actors is especially important when you're using the volatile state provider, because all the state is stored in memory.
  • Because of their turn-based concurrency, actors are best used as independent objects. Don't create graphs of multi-actor, synchronous method calls (each of which most likely becomes a separate network call) or create circular actor requests. These will significantly affect performance and scale.
  • Don't mix sync code with async code. Use async consistently to prevent performance issues.
  • Don't make long-running calls in actors. Long-running calls block other calls to the same actor because of the turn-based concurrency.
  • If you're communicating with other services by using Service Fabric remoting and you're creating a ServiceProxyFactory, create the factory at the actor-service level and not at the actor level.
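
To see why long-running calls hurt under turn-based concurrency, consider the following toy model. It is a Python `asyncio` sketch, not the Reliable Actors runtime: `TurnBasedActor` and `demo` are hypothetical names, and a lock stands in for the rule that an actor processes one call at a time.

```python
import asyncio


class TurnBasedActor:
    """Toy actor: a lock admits one call at a time, mimicking turn-based concurrency."""

    def __init__(self) -> None:
        self._turn = asyncio.Lock()
        self.log: list = []

    async def call(self, name: str, duration: float) -> None:
        async with self._turn:  # wait for this call's turn
            self.log.append(f"start {name}")
            await asyncio.sleep(duration)  # simulated work inside the actor
            self.log.append(f"end {name}")


async def demo() -> list:
    actor = TurnBasedActor()
    # The long call takes its turn first; the short call cannot start until it ends.
    await asyncio.gather(actor.call("long", 0.05), actor.call("short", 0.0))
    return actor.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the short call finishes only after the long one releases its turn, even though it needs almost no time itself.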

Application diagnostics

Be thorough about adding application logging in service calls. Logging helps you diagnose scenarios in which services call each other. For example, when A calls B, B calls C, and C calls D, the call could fail anywhere in the chain. If you don't have enough logging, failures are hard to diagnose. If the services log too much because of call volumes, be sure to log at least errors and warnings.
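
One way to make a call chain such as A, B, C, D diagnosable is to log every hop with a shared identifier. The sketch below uses Python's standard `logging` module and is not tied to Service Fabric. The correlation ID and the functions `call_service` and `run_chain` are assumptions added for illustration; the guidance above does not prescribe them.

```python
import logging
import uuid

logger = logging.getLogger("call_chain")


def call_service(name: str, correlation_id: str, downstream: list, fail_at: str = "") -> None:
    """Hypothetical service hop: logs entry, then calls the next service in the chain."""
    logger.info("[%s] %s: received call", correlation_id, name)
    if name == fail_at:
        # Errors are always logged, even when informational logging is turned down.
        logger.error("[%s] %s: call failed", correlation_id, name)
        raise RuntimeError(f"{name} failed")
    if downstream:
        try:
            call_service(downstream[0], correlation_id, downstream[1:], fail_at)
        except RuntimeError:
            logger.warning("[%s] %s: downstream %s failed", correlation_id, name, downstream[0])
            raise


def run_chain(fail_at: str = "") -> str:
    """Starts the chain A -> B -> C -> D with a fresh correlation ID."""
    correlation_id = uuid.uuid4().hex[:8]
    call_service("A", correlation_id, ["B", "C", "D"], fail_at)
    return correlation_id


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)  # high-volume setting: warnings and errors only
    try:
        run_chain(fail_at="C")
    except RuntimeError:
        pass
```

With the level set to warnings and errors only, the log still shows where the failure started (C) and which callers saw it (B, then A).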

IoT and messaging applications

When you're reading messages from Azure IoT Hub or Azure Event Hubs, use ServiceFabricProcessor. ServiceFabricProcessor integrates with Service Fabric Reliable Services to maintain the state of reading from the event hub partitions, and it pushes new messages to your services via the IEventProcessor::ProcessEventsAsync() method.