Plan and prepare for a cluster deployment

Planning and preparing for a production cluster deployment is essential, and there are many factors to consider. This article walks you through the steps of preparing your cluster deployment.

Read the best-practices information

To manage Azure Service Fabric applications and clusters successfully, we strongly recommend performing certain operations that optimize the reliability of your production environment. For more information, read Service Fabric application and cluster best practices.

Select the OS for the cluster

Service Fabric lets you create clusters on any VMs or computers running Windows Server or Linux. Before deploying your cluster, you must choose the OS: Windows or Linux. Every node (virtual machine) in the cluster runs the same OS; you cannot mix Windows and Linux VMs in the same cluster.

Capacity planning

For any production deployment, capacity planning is an important step. Here are some things to consider as part of that process:

  • The initial number of node types for your cluster
  • The properties of each node type (size, number of instances, primary or non-primary, internet facing, number of VMs, and so on)
  • The reliability and durability characteristics of the cluster

Select the initial number of node types

First, determine what the cluster will be used for. What kinds of applications do you plan to deploy into it? Does your application have multiple services, and do any of them need to be public or internet facing? Do the services that make up your application have different infrastructure needs, such as more RAM or more CPU cycles? A Service Fabric cluster can consist of more than one node type: a primary node type and one or more non-primary node types. Each node type is mapped to a virtual machine scale set. Each node type can then be scaled up or down independently, have different sets of ports open, and have different capacity metrics. Node properties and placement constraints can be set up to constrain specific services to specific node types. For more information, see Service Fabric cluster capacity planning.
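
To make the idea of node types and placement concrete, here is a minimal Python sketch of a cluster with two node types and a lookup that finds which node types a service may run on. It is a simplified model only: the `NodeType` class, the `eligible_node_types` helper, the node type names, and the `Tier` property are all hypothetical, and real placement constraints are expressions evaluated by Service Fabric itself rather than the plain equality matching shown here.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    """Simplified stand-in for a cluster node type (hypothetical model, not an SDK type)."""
    name: str
    is_primary: bool
    vm_size: str
    vm_count: int
    open_ports: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)  # node properties used for placement


def eligible_node_types(node_types, required):
    """Return the names of node types whose properties match every required key/value."""
    return [
        nt.name
        for nt in node_types
        if all(nt.properties.get(key) == value for key, value in required.items())
    ]


# Illustrative cluster: names, counts, ports, and properties are assumptions.
cluster = [
    NodeType("FrontEnd", True, "Standard_DS3_v2", 5, {80, 443}, {"Tier": "Web"}),
    NodeType("BackEnd", False, "Standard_DS3_v2", 5, set(), {"Tier": "Data"}),
]

print(eligible_node_types(cluster, {"Tier": "Data"}))
```

Keeping ports and properties per node type mirrors the point above: each node type can be sized and exposed independently, and services are steered to it by matching on its properties.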

Select node properties for each node type

Node types define the VM SKU, number, and properties of the VMs in the associated scale set.

The minimum size of VMs for each node type is determined by the durability tier you choose for the node type.

The minimum number of VMs for the primary node type is determined by the reliability tier you choose.

See the minimum recommendations for primary node types, stateful workloads on non-primary node types, and stateless workloads on non-primary node types.

Any nodes beyond the minimum number should be based on the number of replicas of the applications and services that you want to run in this node type. Capacity planning for Service Fabric applications helps you estimate the resources you need to run your applications. You can always scale the cluster up or down later to adjust for changing application workload.

Use ephemeral OS disks for virtual machine scale sets

Ephemeral OS disks are storage created on the local virtual machine (VM) and not saved to remote Azure Storage. They are recommended for all Service Fabric node types (primary and secondary) because, compared to traditional persistent OS disks, ephemeral OS disks:

  • Reduce read/write latency to the OS disk
  • Enable faster reset/reimage node management operations
  • Reduce overall costs (the disks are free and incur no additional storage cost)

Ephemeral OS disks are not a Service Fabric-specific feature, but rather a feature of the Azure virtual machine scale sets that are mapped to Service Fabric node types. Using them with Service Fabric requires the following in your cluster Azure Resource Manager template:

  1. Ensure your node types specify Azure VM sizes that support ephemeral OS disks, and that the VM size has a cache large enough to hold its OS disk (see the note below). For example:

    "vmNodeType1Size": {
        "type": "string",
        "defaultValue": "Standard_DS3_v2"
    }

    Note

    Be sure to select a VM size with a cache size equal to or greater than the OS disk size of the VM itself; otherwise, your Azure deployment might fail, even if it is initially accepted.

  2. Specify a virtual machine scale set API version (vmssApiVersion) of 2018-06-01 or later:

    "variables": {
        "vmssApiVersion": "2018-06-01"
    }

  3. In the virtual machine scale set section of your deployment template, specify the Local option for diffDiskSettings:

    "apiVersion": "[variables('vmssApiVersion')]",
    "type": "Microsoft.Compute/virtualMachineScaleSets",
    "virtualMachineProfile": {
        "storageProfile": {
            "osDisk": {
                "caching": "ReadOnly",
                "createOption": "FromImage",
                "diffDiskSettings": {
                    "option": "Local"
                }
            }
        }
    }
    

Note

User applications should not depend on any file or artifact on the OS disk, because the OS disk is lost during an OS upgrade. For this reason, using PatchOrchestrationApplication with ephemeral disks is not recommended.

Note

Existing non-ephemeral virtual machine scale sets can't be upgraded in place to use ephemeral disks. To migrate, add a new nodeType with ephemeral disks, move the workloads to the new nodeType, and remove the existing nodeType.

For more information and further configuration options, see Ephemeral OS disks for Azure VMs.
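
The cache-size rule for ephemeral OS disks can be checked before a template is deployed. The following is a minimal Python sketch, not part of any Azure SDK: the function name `ephemeral_os_disk_profile` is hypothetical, and the sizes passed in the usage line are placeholder numbers. Look up the real cache size for your VM size in the Azure VM size documentation.

```python
import json


def ephemeral_os_disk_profile(vm_cache_gib, os_disk_gib):
    """Return osDisk settings for an ephemeral OS disk, or raise if the cache is too small.

    Both sizes are supplied by the caller; this sketch does not look them up.
    """
    if vm_cache_gib < os_disk_gib:
        raise ValueError(
            f"VM cache ({vm_cache_gib} GiB) is smaller than the OS disk ({os_disk_gib} GiB)"
        )
    # Same keys and values as the osDisk fragment shown in the template steps.
    return {
        "caching": "ReadOnly",
        "createOption": "FromImage",
        "diffDiskSettings": {"option": "Local"},
    }


# Placeholder sizes for illustration only.
print(json.dumps(ephemeral_os_disk_profile(vm_cache_gib=172, os_disk_gib=128), indent=4))
```

Failing early in a script like this avoids the case described in the note, where a deployment is accepted at first and only later results in an error.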

Select the durability and reliability levels for the cluster

The durability tier indicates to the system the privileges that your VMs have with the underlying Azure infrastructure. In the primary node type, this privilege allows Service Fabric to pause any VM-level infrastructure request (such as a VM reboot, VM reimage, or VM migration) that impacts the quorum requirements for the system services and your stateful services. In non-primary node types, this privilege allows Service Fabric to pause any VM-level infrastructure request (such as a VM reboot, VM reimage, or VM migration) that impacts the quorum requirements for your stateful services. For the advantages of the different levels and recommendations on which level to use and when, see The durability characteristics of the cluster.

The reliability tier sets the number of replicas of the system services that you want to run in this cluster on the primary node type. The more replicas there are, the more reliable the system services are in your cluster. For the advantages of the different levels and recommendations on which level to use and when, see The reliability characteristics of the cluster.
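
Because the reliability tier fixes how many system service replicas run on the primary node type, the VM count of that node type limits which tiers are usable. Here is a minimal Python sketch of that relationship. The replica counts (Bronze 3, Silver 5, Gold 7, Platinum 9) are an assumption taken from the Service Fabric capacity planning guidance and should be verified there; the helper name is hypothetical.

```python
# Assumed system service replica counts per reliability tier, in ascending order.
# Verify these against the Service Fabric cluster capacity planning guidance.
RELIABILITY_REPLICAS = {"Bronze": 3, "Silver": 5, "Gold": 7, "Platinum": 9}


def highest_supported_tier(primary_vm_count):
    """Return the highest tier whose replica count fits on the primary node type, or None."""
    best = None
    for tier, replicas in RELIABILITY_REPLICAS.items():
        if primary_vm_count >= replicas:
            best = tier
    return best


for count in (2, 5, 9):
    print(count, highest_supported_tier(count))
```

A result of `None` means the primary node type is too small for any of the tiers listed here.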

Enable reverse proxy and/or DNS

Services connecting to each other inside a cluster can generally access the endpoints of other services directly, because the nodes in a cluster are on the same local network. To make it easier to connect between services, Service Fabric provides additional services: a DNS service and a reverse proxy service. Both services can be enabled when deploying a cluster.

Because many services, especially containerized services, can have an existing URL name, being able to resolve these names using the standard DNS protocol (rather than the Naming Service protocol) is convenient, especially in application "lift and shift" scenarios. This is exactly what the DNS service does. It enables you to map DNS names to a service name and hence resolve endpoint IP addresses.
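
Since the DNS service speaks the standard DNS protocol, a client needs nothing Service Fabric-specific to find a service. The following minimal Python sketch resolves a name with the system resolver from the standard library. The helper name `resolve_endpoints` is hypothetical, and the usage line resolves a literal loopback address only so that the example runs anywhere; inside a cluster you would pass the DNS name you assigned to your service.

```python
import socket


def resolve_endpoints(dns_name, port):
    """Resolve a DNS name to a sorted list of unique (ip, port) pairs via the system resolver."""
    infos = socket.getaddrinfo(dns_name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr starts with (ip, port).
    return sorted({(info[4][0], info[4][1]) for info in infos})


# Loopback literal used as a stand-in for a service's DNS name.
print(resolve_endpoints("127.0.0.1", 8080))
```

An existing application that already connects by host name can therefore keep its connection code unchanged, which is the main appeal in lift-and-shift scenarios.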

The reverse proxy addresses services in the cluster that expose HTTP endpoints (including HTTPS). It greatly simplifies calling other services by providing a specific URI format, and it handles the resolve, connect, and retry steps required for one service to communicate with another.
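
To illustrate what a "specific URI format" means in practice, here is a minimal Python sketch that builds a reverse proxy URL from a service name. It rests on assumptions that are not stated in this article and should be checked against the Service Fabric reverse proxy documentation: that the URL takes the form scheme://host:port/ServiceInstanceName/suffix?query, that the service instance name is the fabric:/ name without its scheme, and that 19081 is the reverse proxy port. The function name and the example service name are hypothetical.

```python
from urllib.parse import urlencode

FABRIC_SCHEME = "fabric:/"


def reverse_proxy_url(service_name, suffix_path="", host="localhost", port=19081,
                      scheme="http", **query):
    """Build a reverse proxy URL for a service (assumed format, see the lead-in)."""
    # Drop the fabric:/ scheme so only the application/service path remains.
    if service_name.startswith(FABRIC_SCHEME):
        service_name = service_name[len(FABRIC_SCHEME):]
    parts = [service_name.strip("/"), suffix_path.strip("/")]
    url = f"{scheme}://{host}:{port}/" + "/".join(p for p in parts if p)
    if query:
        # Query parameters such as partition information are passed through as given.
        url += "?" + urlencode(query)
    return url


print(reverse_proxy_url("fabric:/MyApp/MyService", "api/values"))
```

The caller only needs the service name, not the address of the node currently hosting it; locating the service, connecting, and retrying are left to the reverse proxy.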

Prepare for disaster recovery

A critical part of delivering high availability is ensuring that services can survive all types of failures. This is especially important for failures that are unplanned and outside of your control. Prepare for disaster recovery describes some common failure modes that could become disasters if not modeled and managed correctly. It also discusses mitigations and actions to take if a disaster happens anyway.

Production readiness checklist

Are your application and cluster ready to take production traffic? Before deploying your cluster to production, run through the Production readiness checklist. Working through the items in this checklist helps keep your application and cluster running smoothly. We strongly recommend checking off all of these items before going into production.

Next steps