Scale up a Service Fabric cluster primary node type

This article describes how to scale up a Service Fabric cluster primary node type by increasing the virtual machine resources. A Service Fabric cluster is a network-connected set of virtual or physical machines into which your microservices are deployed and managed. A machine or VM that's part of a cluster is called a node. Virtual machine scale sets are an Azure compute resource that you use to deploy and manage a collection of virtual machines as a set. Every node type defined in an Azure cluster is set up as a separate scale set, so each node type can be managed separately. After creating a Service Fabric cluster, you can scale a cluster node type vertically (change the resources of the nodes) or upgrade the operating system of the node type VMs. You can scale the cluster at any time, even when workloads are running on the cluster. As the cluster scales, your applications automatically scale as well.

Warning

Do not attempt a primary node type scale-up procedure if the cluster status is unhealthy, as this will only destabilize the cluster further.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Process to upgrade the size and operating system of the primary node type VMs

Here is the process for updating the VM size and operating system of the primary node type VMs. After the upgrade, the primary node type VMs are size Standard D4_V2 and run Windows Server 2016 Datacenter with Containers.

Warning

Before attempting this procedure on a production cluster, we recommend that you study the sample templates and verify the process against a test cluster. The cluster is also unavailable for a time. You cannot make changes in parallel to multiple virtual machine scale sets (VMSS) declared as the same NodeType; you need to perform separate deployment operations to apply changes to each NodeType VMSS individually.

  1. Deploy the initial cluster with two node types and two scale sets (one scale set per node type) using these sample template and parameters files. Both scale sets are size Standard D2_V2 and run Windows Server 2012 R2 Datacenter. Wait for the cluster to complete the baseline upgrade.
  2. Optional: deploy a stateful sample to the cluster.
  3. After deciding to upgrade the primary node type VMs, add a new scale set to the primary node type using these sample template and parameters files so that the primary node type now has two scale sets. System services and user applications are able to migrate between VMs in the two different scale sets. The new scale set VMs are size Standard D4_V2 and run Windows Server 2016 Datacenter with Containers. A new load balancer and public IP address are also added with the new scale set.
    To find the new scale set in the template, search for the "Microsoft.Compute/virtualMachineScaleSets" resource named by the vmNodeType2Name parameter. The new scale set is added to the primary node type using the properties->virtualMachineProfile->extensionProfile->extensions->properties->settings->nodeTypeRef setting.
  4. Check the cluster health and verify that all the nodes are healthy.
  5. Disable the nodes in the old scale set of the primary node type with the intent to remove the nodes. You can disable all of them at once; the operations are queued. Wait until all nodes are disabled, which may take some time. As the older nodes in the node type are disabled, the system services and seed nodes migrate to the VMs of the new scale set in the primary node type.
  6. Remove the older scale set from the primary node type. (After the nodes are disabled as in step 5, in the virtual machine scale set blade in the Azure portal, deallocate the nodes from the old node type one by one.)
  7. Remove the load balancer associated with the old scale set. The cluster is unavailable while the new public IP address and load balancer are configured for the new scale set.
  8. Store the DNS settings of the public IP address associated with the old primary node type scale set in a variable, and remove that public IP address.
  9. Replace the DNS settings of the public IP address associated with the new primary node type scale set with the DNS settings of the deleted public IP address. The cluster is now reachable again.
  10. Remove the node state of the nodes from the cluster. If the durability level of the old scale set was silver or gold, the system does this step automatically.
  11. If you deployed the stateful application in a previous step, verify that the application is functional.

Set up the test cluster

Begin by downloading the two sets of files needed for this tutorial: the before template and parameters, and the after template and parameters.

Next, sign in to your Azure account.

# sign in to your Azure account and select your subscription
Connect-AzAccount -Environment AzureChinaCloud -SubscriptionId "<your subscription ID>"

This tutorial walks through the scenario of creating a self-signed certificate. To use an existing certificate from Azure Key Vault, skip the step below and instead follow the steps in using an existing certificate to deploy the cluster.

Generate a self-signed certificate and deploy the cluster

First, assign the variables you'll need for Service Fabric cluster deployment. Adjust the values for resourceGroupName, certSubjectName, parameterFilePath, and templateFilePath for your specific account and environment:

# Assign deployment variables
$resourceGroupName = "sftestupgradegroup"
$certOutputFolder = "c:\certificates"
$certPassword = "Password!1" | ConvertTo-SecureString -AsPlainText -Force
$certSubjectName = "sftestupgrade.chinaeast.cloudapp.chinacloudapi.cn"
$templateFilePath = "C:\Deploy-2NodeTypes-2ScaleSets.json"
$parameterFilePath = "C:\Deploy-2NodeTypes-2ScaleSets.parameters.json"

Note

Ensure that the certOutputFolder location exists on your local machine before running the command to deploy a new Service Fabric cluster.

Next, open the Deploy-2NodeTypes-2ScaleSets.parameters.json file, adjust the values for clusterName and dnsName to correspond to the dynamic values you set in PowerShell, and save your changes.

Then deploy the Service Fabric test cluster:

# Deploy the initial test cluster
New-AzServiceFabricCluster `
    -ResourceGroupName $resourceGroupName `
    -CertificateOutputFolder $certOutputFolder `
    -CertificatePassword $certPassword `
    -CertificateSubjectName $certSubjectName `
    -TemplateFile $templateFilePath `
    -ParameterFile $parameterFilePath

Once the deployment is complete, locate the .pfx file ($certPfx) on your local machine and import it to your certificate store:

cd c:\certificates
$certPfx = ".\sftestupgradegroup20200312121003.pfx"

Import-PfxCertificate `
     -FilePath $certPfx `
     -CertStoreLocation Cert:\CurrentUser\My `
     -Password (ConvertTo-SecureString 'Password!1' -AsPlainText -Force)

The operation returns the certificate thumbprint, which you'll use to connect to the new cluster and check its health status.

Connect to the new cluster and check health status

Connect to the cluster and ensure that all of its nodes are healthy (replace the clusterName and thumb variables with the values for your cluster):

# Connect to the cluster
$clusterName = "sftestupgrade.chinaeast.cloudapp.chinacloudapi.cn:19000"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Connect-ServiceFabricCluster `
    -ConnectionEndpoint $clusterName `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb  `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

# Check cluster health
Get-ServiceFabricClusterHealth
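
The cluster-level report aggregates many entities, so it can help to list only the nodes that are not healthy. The following is a minimal sketch rather than part of the original procedure: Select-UnhealthyNode is a hypothetical helper name, and it assumes the input objects expose NodeName, NodeStatus, and HealthState properties, as the objects returned by Get-ServiceFabricNode do.

```powershell
# Hypothetical helper (not a Service Fabric cmdlet): pass through only the
# nodes whose HealthState is anything other than Ok.
function Select-UnhealthyNode {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Node
    )
    process {
        # Compare as strings so the filter works for enum values and plain strings alike.
        if ("$($Node.HealthState)" -ne 'Ok') {
            $Node | Select-Object NodeName, NodeStatus, HealthState
        }
    }
}
```

With an active cluster connection, `Get-ServiceFabricNode | Select-UnhealthyNode` should return nothing when every node is healthy.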

We are ready to begin the upgrade procedure.

Upgrade the primary node type VMs

After deciding to upgrade the primary node type VMs, add a new scale set to the primary node type so that the primary node type now has two scale sets. Sample template and parameters files are provided to show the necessary changes. The new scale set's VMs are size Standard D4_V2 and run Windows Server 2016 Datacenter with Containers. A new load balancer and public IP address are also added with the new scale set.

To find the new scale set in the template, search for the "Microsoft.Compute/virtualMachineScaleSets" resource named by the vmNodeType2Name parameter. The new scale set is added to the primary node type using the properties->virtualMachineProfile->extensionProfile->extensions->properties->settings->nodeTypeRef setting.

Deploy the updated template

Adjust the parameterFilePath and templateFilePath as needed and then run the following command:

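# Deploy the new scale set into the primary node type along with a new load balancer and public IP.
# Note: $thumb comes from the connection step above. $certUrlValue and $sourceVaultValue are not
# assigned anywhere in this walkthrough; set them to the values for your cluster before running.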
$templateFilePath = "C:\Deploy-2NodeTypes-3ScaleSets.json"
$parameterFilePath = "C:\Deploy-2NodeTypes-3ScaleSets.parameters.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

When the deployment completes, check the cluster health again and ensure that all nodes (on both the original and the new scale set) are healthy.

Get-ServiceFabricClusterHealth

Migrate nodes to the new scale set

We're now ready to start disabling the nodes of the original scale set. As these nodes become disabled, the system services and seed nodes migrate to the VMs of the new scale set because it is also marked as the primary node type.

For scaling up non-primary node types, in this step you would instead modify the service placement constraint to include the new virtual machine scale set/node type and then reduce the old virtual machine scale set instance count to zero, one node at a time (to ensure that node removal doesn't impact cluster reliability).

# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}
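
Hard-coding the node names works for this five-node sample. As an alternative sketch, the names could be derived from the nodes the cluster reports, assuming (as in the list above) that nodes of the original scale set are named _NTvm1_ followed by the instance number. Get-ScaleSetNodeName is a hypothetical helper, not a Service Fabric cmdlet.

```powershell
# Hypothetical helper: return the names of the nodes that belong to one scale set,
# assuming the "_<scale set name>_<instance>" naming shown in this article.
function Get-ScaleSetNodeName {
    param(
        [Parameter(Mandatory)][object[]]$Node,
        [Parameter(Mandatory)][string]$ScaleSetName
    )
    # The trailing underscore keeps "NTvm1" from also matching "NTvm10".
    $prefix = "_${ScaleSetName}_"
    $Node |
        Where-Object { "$($_.NodeName)".StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase) } |
        ForEach-Object { $_.NodeName } |
        Sort-Object
}
```

With an active cluster connection, `$nodeNames = Get-ScaleSetNodeName -Node (Get-ServiceFabricNode) -ScaleSetName "NTvm1"` would replace the hard-coded array. Review the resulting list before disabling anything.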

Use Service Fabric Explorer to monitor the migration of seed nodes to the new scale set and the progression of nodes in the original scale set from Disabling to Disabled status.

Note

It may take some time to complete the disabling operation across all the nodes of the original scale set. To guarantee data consistency, only one seed node can change at a time. Each seed node change requires a cluster update; thus replacing a seed node requires two cluster upgrades (one each for node addition and removal). Upgrading the five seed nodes in this sample scenario results in ten cluster upgrades.
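
The same progress can also be watched from PowerShell. The following sketch polls the node list until every node in the original scale set reports Disabled. Wait-NodeDisabled is a hypothetical helper; the 30-second polling interval and 120-minute timeout are assumed defaults, not values from the original procedure.

```powershell
# Hypothetical helper: block until every named node reports NodeStatus Disabled.
function Wait-NodeDisabled {
    param(
        [Parameter(Mandatory)][string[]]$NodeName,
        # How the current node list is fetched; defaults to querying the connected cluster.
        [scriptblock]$GetNode = { Get-ServiceFabricNode },
        [int]$PollSeconds = 30,      # assumed polling interval
        [int]$TimeoutMinutes = 120   # assumed upper bound; adjust for your cluster
    )
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    while ($true) {
        # Named nodes that are still reported and have not reached Disabled yet.
        $pending = @(& $GetNode | Where-Object {
            $NodeName -contains $_.NodeName -and "$($_.NodeStatus)" -ne 'Disabled'
        })
        if ($pending.Count -eq 0) { return }
        if ((Get-Date) -gt $deadline) {
            throw "Timed out waiting for: $($pending.NodeName -join ', ')"
        }
        Write-Host "Still disabling: $($pending.NodeName -join ', ')"
        Start-Sleep -Seconds $PollSeconds
    }
}
```

With an active cluster connection, `Wait-NodeDisabled -NodeName $nodeNames` returns once all listed nodes are disabled. The GetNode parameter exists only so the loop can be exercised without a cluster.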

Remove the original scale set

Once the disabling operation is complete, remove the scale set.

# Remove the original scale set
$scaleSetName = "NTvm1"

Remove-AzVmss `
    -ResourceGroupName $resourceGroupName `
    -VMScaleSetName $scaleSetName `
    -Force

Write-Host "Removed scale set $scaleSetName"

In Service Fabric Explorer, the removed nodes (and thus the cluster health state) now appear in Error state.

Remove the old load balancer and update DNS settings

Now we can remove the resources related to the old primary node type, beginning with the load balancer and the old public IP.

$lbname="LB-sfupgradetest-NTvm1"
$oldPublicIpName="PublicIP-LB-FE-0"
$newPublicIpName="PublicIP-LB-FE-2"

# Store DNS settings of the public IP address related to the old primary node type in variables
$oldprimaryPublicIP = Get-AzPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $resourceGroupName

$primaryDNSName = $oldprimaryPublicIP.DnsSettings.DomainNameLabel

$primaryDNSFqdn = $oldprimaryPublicIP.DnsSettings.Fqdn

# Remove the load balancer related to the old primary node type. This causes a brief period of downtime for the cluster
Remove-AzLoadBalancer -Name $lbname -ResourceGroupName $resourceGroupName -Force

# Remove the old public IP
Remove-AzPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $resourceGroupName -Force

Next, we update the DNS settings of the new public IP to mirror the settings from the old primary node type's public IP.

# Replace the DNS settings of the new primary node type's public IP address with those saved from the old one
$PublicIP = Get-AzPublicIpAddress -Name $newPublicIpName -ResourceGroupName $resourceGroupName
$PublicIP.DnsSettings.DomainNameLabel = $primaryDNSName
$PublicIP.DnsSettings.Fqdn = $primaryDNSFqdn
Set-AzPublicIpAddress -PublicIpAddress $PublicIP

Once more, check the cluster health:

# Check the cluster health
Get-ServiceFabricClusterHealth

Finally, remove the node state for each of the related nodes. If the durability level of the old scale set was silver or gold, this occurs automatically.

# Remove node state for the deleted nodes.
foreach($name in $nodeNames){
    # Remove the node from the cluster
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}

The cluster's primary node type has now been upgraded. Verify that any deployed applications function properly and that the cluster health is OK.

Next steps