Scale up a Service Fabric cluster primary node type

This article describes how to scale up a Service Fabric cluster primary node type with minimal downtime. The general strategy for upgrading a Service Fabric cluster node type is to:

  1. Add a new node type to your Service Fabric cluster, backed by your upgraded (or modified) virtual machine scale set SKU and configuration. This step also involves setting up a new load balancer, subnet, and public IP for the scale set.

  2. Once both the original and upgraded scale sets are running side by side, disable the original node instances one at a time so that the system services (or replicas of stateful services) migrate to the new scale set.

  3. Verify that the cluster and new nodes are healthy, then remove the original scale set (and related resources) and the node state for the deleted nodes.

The following walks you through updating the VM size and operating system of the primary node type VMs in a sample cluster with Silver durability, backed by a single scale set with five nodes. We'll upgrade the primary node type:

  • From VM size Standard_D2_V2 to Standard_D4_V2, and
  • From VM operating system Windows Server 2016 Datacenter with Containers to Windows Server 2019 Datacenter with Containers.

Warning

Before attempting this procedure on a production cluster, we recommend that you study the sample templates and verify the process against a test cluster. The cluster may also be briefly unavailable during the procedure.

Do not attempt a primary node type scale-up procedure if the cluster status is unhealthy, as this will only destabilize the cluster further.

Here are the step-by-step Azure deployment templates that we'll use to complete this sample upgrade scenario: https://github.com/microsoft/service-fabric-scripts-and-templates/tree/master/templates/nodetype-upgrade

Set up the test cluster

Let's set up the initial Service Fabric test cluster. First, download the Azure Resource Manager sample templates that we'll use to complete this scenario.

Next, sign in to your Azure account.

# Sign in to your Azure account
Connect-AzAccount -Environment AzureChinaCloud -SubscriptionId "<subscription ID>"

Next, open the parameters.json file and update the value for clusterName to something unique (within Azure).
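You can also script this edit. The following is a minimal sketch that assumes parameters.json uses the standard ARM parameters layout: a top-level "parameters" object whose entries each carry a "value". Set-ClusterNameParameter is a hypothetical helper, not a cmdlet from any Azure module.

```powershell
# Hypothetical helper: set the clusterName value in an ARM parameters file.
# Assumes the standard layout: { "parameters": { "clusterName": { "value": "..." } } }
function Set-ClusterNameParameter {
    param(
        [Parameter(Mandatory)] [string] $ParameterFilePath,
        [Parameter(Mandatory)] [string] $ClusterName
    )

    $params = Get-Content -Path $ParameterFilePath -Raw | ConvertFrom-Json

    if (-not $params.parameters.PSObject.Properties['clusterName']) {
        throw "clusterName is not defined in $ParameterFilePath"
    }

    $params.parameters.clusterName.value = $ClusterName

    # -Depth keeps nested parameter objects from being truncated when written back.
    $params | ConvertTo-Json -Depth 20 | Set-Content -Path $ParameterFilePath
}

# Example (choose a name that is unique within Azure):
# Set-ClusterNameParameter -ParameterFilePath "C:\parameters.json" -ClusterName "sftestupgrade"
```

Rewriting the file through ConvertTo-Json keeps the values but not the original indentation.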

The following commands will guide you through generating a new self-signed certificate and deploying the test cluster. If you already have a certificate you'd like to use, skip to Use an existing certificate to deploy the cluster.

Generate a self-signed certificate and deploy the cluster

First, assign the variables you'll need for Service Fabric cluster deployment. Adjust the values for resourceGroupName, certSubjectName, parameterFilePath, and templateFilePath for your specific account and environment:

# Assign deployment variables
$resourceGroupName = "sftestupgradegroup"
$certOutputFolder = "c:\certificates"
$certPassword = "Password!1" | ConvertTo-SecureString -AsPlainText -Force
$certSubjectName = "sftestupgrade.southcentralus.cloudapp.chinacloudapi.cn"
$parameterFilePath = "C:\parameters.json"
$templateFilePath = "C:\Initial-TestClusterSetup.json"

Note

Ensure that the certOutputFolder location exists on your local machine before running the command to deploy a new Service Fabric cluster.
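If you are not sure whether the folder exists, a small guard can create it before you deploy. This is a minimal sketch, and Initialize-OutputFolder is a hypothetical helper name. Calling it a second time leaves an existing folder untouched.

```powershell
# Hypothetical helper: make sure a folder exists before it is used as an output location.
function Initialize-OutputFolder {
    param([Parameter(Mandatory)] [string] $Path)

    if (-not (Test-Path -Path $Path -PathType Container)) {
        New-Item -Path $Path -ItemType Directory | Out-Null
    }

    # Return the resolved path so callers can log or reuse it.
    return (Resolve-Path -Path $Path).Path
}

# Usage with the variable assigned above:
# Initialize-OutputFolder -Path $certOutputFolder
```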

Then deploy the Service Fabric test cluster:

# Deploy the initial test cluster
New-AzServiceFabricCluster `
    -ResourceGroupName $resourceGroupName `
    -CertificateOutputFolder $certOutputFolder `
    -CertificatePassword $certPassword `
    -CertificateSubjectName $certSubjectName `
    -TemplateFile $templateFilePath `
    -ParameterFile $parameterFilePath

Once the deployment is complete, locate the .pfx file ($certPfx) on your local machine and import it to your certificate store:

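cd c:\certificates
# The .pfx file name includes a timestamp, so yours will differ.
$certPfx = ".\sftestupgradegroup20200312121003.pfx"

# Import the certificate (reusing $certPassword from above) and keep its thumbprint.
$thumb = (Import-PfxCertificate `
     -FilePath $certPfx `
     -CertStoreLocation Cert:\CurrentUser\My `
     -Password $certPassword).Thumbprint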

The operation returns the certificate thumbprint, which you can now use to connect to the new cluster and check its health status. (Skip the following section, which is an alternate approach to cluster deployment.)

Use an existing certificate to deploy the cluster

Alternatively, you can use an existing Azure Key Vault certificate to deploy the test cluster. To do this, you'll need to obtain references to your Key Vault and certificate thumbprint.

# Key Vault variables
$certUrlValue = "https://sftestupgradegroup.vault.azure.cn/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
$sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Next, designate a resource group name for the cluster and set the templateFilePath and parameterFilePath locations:

Note

The designated resource group must already exist and be located in the same region as your Key Vault.

$resourceGroupName = "sftestupgradegroup"
$templateFilePath = "C:\Initial-TestClusterSetup.json"
$parameterFilePath = "C:\parameters.json"

Finally, run the following command to deploy the initial test cluster:

# Deploy the initial test cluster
New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

Connect to the new cluster and check health status

Connect to the cluster and ensure that all five of its nodes are healthy (substitute your own values for the clusterName and thumb variables):

# Connect to the cluster
$clusterName = "sftestupgrade.southcentralus.cloudapp.chinacloudapi.cn:19000"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Connect-ServiceFabricCluster `
    -ConnectionEndpoint $clusterName `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb  `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

# Check cluster health
Get-ServiceFabricClusterHealth
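Because the procedure should not start on an unhealthy cluster, you may want a check that stops the script instead of only printing the health report. This is a minimal sketch. It assumes that the object returned by Get-ServiceFabricClusterHealth exposes AggregatedHealthState (Ok, Warning, or Error). Assert-ClusterHealthy is a hypothetical helper. Treating Warning as blocking is a deliberately conservative choice.

```powershell
# Hypothetical helper: stop the procedure early if the cluster is not healthy.
# $Health is expected to be the object returned by Get-ServiceFabricClusterHealth.
function Assert-ClusterHealthy {
    param([Parameter(Mandatory)] $Health)

    # Compare as a string so both the HealthState enum and test data work.
    $state = [string]$Health.AggregatedHealthState
    if ($state -ne 'Ok') {
        throw "Cluster health is '$state'. Do not start the primary node type scale-up until it is 'Ok'."
    }
    Write-Host "Cluster health is Ok; safe to continue."
}

# Usage after connecting to the cluster:
# Assert-ClusterHealthy -Health (Get-ServiceFabricClusterHealth)
```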

With that, we're ready to begin the upgrade procedure.

Deploy a new primary node type with upgraded scale set

To upgrade (vertically scale) a node type, we first need to deploy a new node type backed by a new scale set and supporting resources. The new scale set will be marked as primary (isPrimary: true), just like the original scale set (unless you're doing a non-primary node type upgrade). The resources created in the following section will ultimately become the new primary node type in your cluster, and the original primary node type resources will be deleted.

Update the cluster template with the upgraded scale set

Here are the section-by-section modifications to the original cluster deployment template for adding a new primary node type and supporting resources.

The required changes for this step have already been made for you in the Step1-AddPrimaryNodeType.json template file, and the following explains these changes in detail. If you prefer, you can skip the explanation and continue to obtain your Key Vault references and deploy the updated template that adds a new primary node type to your cluster.

Note

Ensure that you use names that are distinct from those of the original primary node type's node type, scale set, load balancer, public IP, and subnet, as these resources will be deleted in a later step.

Create a new subnet in the existing virtual network

{
    "name": "[variables('subnet1Name')]",
    "properties": {
        "addressPrefix": "[variables('subnet1Prefix')]"
    }
}

Create a new public IP with a unique domainNameLabel

{
    "apiVersion": "[variables('publicIPApiVersion')]",
    "type": "Microsoft.Network/publicIPAddresses",
    "name": "[concat(variables('lbIPName'),'-',variables('vmNodeType1Name'))]",
    "location": "[variables('computeLocation')]",
    "properties": {
    "dnsSettings": {
        "domainNameLabel": "[concat(variables('dnsName'),'-','nt1')]"
    },
    "publicIPAllocationMethod": "Dynamic"
    },
    "tags": {
    "resourceType": "Service Fabric",
    "clusterName": "[parameters('clusterName')]"
    }
}

Create a new load balancer for the public IP

"dependsOn": [
    "[concat('Microsoft.Network/publicIPAddresses/',concat(variables('lbIPName'),'-',variables('vmNodeType1Name')))]"
]

Create a new virtual machine scale set (with upgraded VM and OS SKUs)

Node Type Ref

"nodeTypeRef": "[variables('vmNodeType1Name')]"

VM SKU

"sku": {
    "name": "[parameters('vmNodeType1Size')]",
    "capacity": "[parameters('nt1InstanceCount')]",
    "tier": "Standard"
}

OS SKU

"imageReference": {
    "publisher": "[parameters('vmImagePublisher1')]",
    "offer": "[parameters('vmImageOffer1')]",
    "sku": "[parameters('vmImageSku1')]",
    "version": "[parameters('vmImageVersion1')]"
}

Also, ensure that you include any additional extensions required for your workload.
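The new scale set reads its size, instance count, and image from the vmNodeType1Size, nt1InstanceCount, and vmImage*1 parameters shown above. If your parameters file does not define them yet, the sketch below adds them with this example's targets. The image publisher, offer, and "latest" version mirror the original node type's values. The SKU string 2019-Datacenter-with-Containers and the instance count of 5 are assumptions for this scenario and may differ from the defaults in Step1-AddPrimaryNodeType.json. Add-NodeType1Parameters is a hypothetical helper.

```powershell
# Hypothetical helper: add (or overwrite) the parameters read by the new node type.
# Parameter names match the template fragments above; values are this example's targets.
function Add-NodeType1Parameters {
    param([Parameter(Mandatory)] [string] $ParameterFilePath)

    $newValues = [ordered]@{
        vmNodeType1Size   = 'Standard_D4_V2'
        nt1InstanceCount  = 5
        vmImagePublisher1 = 'MicrosoftWindowsServer'
        vmImageOffer1     = 'WindowsServer'
        vmImageSku1       = '2019-Datacenter-with-Containers'
        vmImageVersion1   = 'latest'
    }

    $params = Get-Content -Path $ParameterFilePath -Raw | ConvertFrom-Json

    foreach ($name in $newValues.Keys) {
        # -Force replaces an existing entry with the same name.
        $params.parameters | Add-Member -NotePropertyName $name `
            -NotePropertyValue ([pscustomobject]@{ value = $newValues[$name] }) -Force
    }

    $params | ConvertTo-Json -Depth 20 | Set-Content -Path $ParameterFilePath
}

# Usage:
# Add-NodeType1Parameters -ParameterFilePath $parameterFilePath
```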

Add a new primary node type to the cluster

Now that the new node type (vmNodeType1Name) has its own name, subnet, IP, load balancer, and scale set, it can reuse all other variables from the original node type (such as nt0applicationEndPort, nt0applicationStartPort, and nt0fabricTcpGatewayPort):

"name": "[variables('vmNodeType1Name')]",
"applicationPorts": {
    "endPort": "[variables('nt0applicationEndPort')]",
    "startPort": "[variables('nt0applicationStartPort')]"
},
"clientConnectionEndpointPort": "[variables('nt0fabricTcpGatewayPort')]",
"durabilityLevel": "Bronze",
"ephemeralPorts": {
    "endPort": "[variables('nt0ephemeralEndPort')]",
    "startPort": "[variables('nt0ephemeralStartPort')]"
},
"httpGatewayEndpointPort": "[variables('nt0fabricHttpGatewayPort')]",
"isPrimary": true,
"reverseProxyEndpointPort": "[variables('nt0reverseProxyEndpointPort')]",
"vmInstanceCount": "[parameters('nt1InstanceCount')]"

Once you've implemented all the changes in your template and parameters files, proceed to the next section to obtain your Key Vault references and deploy the updates to your cluster.

Obtain your Key Vault references

To deploy the updated configuration, you'll need several references to the cluster certificate stored in your Key Vault. The easiest way to find these values is through the Azure portal. You'll need:

  • The Key Vault URL of your cluster certificate. From your Key Vault in the Azure portal, select Certificates > Your desired certificate > Secret Identifier:

    $certUrlValue="https://sftestupgradegroup.vault.azure.cn/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
    
  • The thumbprint of your cluster certificate. (You probably already have this if you connected to the initial cluster to check its health status.) From the same certificate blade (Certificates > Your desired certificate) in the Azure portal, copy X.509 SHA-1 Thumbprint (in hex):

    $thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
    
  • The Resource ID of your Key Vault. From your Key Vault in the Azure portal, select Properties > Resource ID:

    $sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
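Copying these three values by hand is error-prone, so a quick shape check before deploying can help. The sketch below checks only the form of each value, not whether the vault or secret exists. It assumes a versioned secret identifier (…/secrets/<name>/<version>), a 40-character hex SHA-1 thumbprint, and the usual Key Vault resource ID layout. Test-KeyVaultReferences is a hypothetical helper.

```powershell
# Hypothetical helper: sanity-check the shape of the three Key Vault references.
# Returns a list of problems; an empty list means every value looks well formed.
function Test-KeyVaultReferences {
    param(
        [Parameter(Mandatory)] [string] $CertUrlValue,
        [Parameter(Mandatory)] [string] $SourceVaultValue,
        [Parameter(Mandatory)] [string] $Thumbprint
    )

    $problems = @()

    # An X.509 SHA-1 thumbprint is 40 hexadecimal characters.
    if ($Thumbprint -notmatch '^[0-9A-Fa-f]{40}$') {
        $problems += "Thumbprint is not 40 hex characters."
    }

    # Expected form: https://<vault host>/secrets/<name>/<version>
    $uri = $null
    if (-not [uri]::TryCreate($CertUrlValue, [UriKind]::Absolute, [ref]$uri) -or
        $uri.Scheme -ne 'https' -or
        $uri.AbsolutePath -notmatch '^/secrets/[^/]+/[^/]+$') {
        $problems += "Certificate URL is not a versioned Key Vault secret identifier."
    }

    # Expected form: /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<name>
    if ($SourceVaultValue -notmatch '^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.KeyVault/vaults/[^/]+$') {
        $problems += "Source vault value is not a Key Vault resource ID."
    }

    # The leading comma returns the array as one object, even when it is empty.
    return ,$problems
}

# Usage with the variables above:
# Test-KeyVaultReferences -CertUrlValue $certUrlValue -SourceVaultValue $sourceVaultValue -Thumbprint $thumb
```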
    

Deploy the updated template

Adjust the templateFilePath as needed and run the following command:

# Deploy the new node type and its resources
$templateFilePath = "C:\Step1-AddPrimaryNodeType.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

When the deployment completes, check the cluster health again and ensure that all nodes on both node types are healthy.

Get-ServiceFabricClusterHealth
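The cluster-level report does not break health down by node type. A per-type summary makes it easier to confirm that nodes of both types are up. This is a minimal sketch. It assumes that each node returned by Get-ServiceFabricNode exposes NodeName, NodeType, NodeStatus, and HealthState. Get-NodeTypeHealthSummary is a hypothetical helper.

```powershell
# Hypothetical helper: group nodes by node type and list any that are not Up and Ok.
function Get-NodeTypeHealthSummary {
    param([Parameter(Mandatory)] [object[]] $Nodes)

    foreach ($group in ($Nodes | Group-Object -Property NodeType)) {
        # Compare as strings so both Service Fabric enums and test data work.
        $unhealthy = @($group.Group | Where-Object {
            [string]$_.NodeStatus -ne 'Up' -or [string]$_.HealthState -ne 'Ok'
        })

        [pscustomobject]@{
            NodeType       = $group.Name
            NodeCount      = $group.Count
            UnhealthyNodes = @($unhealthy | ForEach-Object NodeName)
        }
    }
}

# Usage after the deployment; expect nt0vm and nt1vm with no unhealthy nodes:
# Get-NodeTypeHealthSummary -Nodes (Get-ServiceFabricNode) | Format-Table
```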

Migrate seed nodes to the new node type

We're now ready to mark the original node type as non-primary and start disabling its nodes. As the nodes are disabled, the cluster's system services and seed nodes migrate to the new scale set.

Unmark the original node type as primary

First, set isPrimary to false for the original node type in the template.

"name": "[variables('vmNodeType0Name')]",
"isPrimary": false,
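Step2-UnmarkOriginalPrimaryNodeType.json already contains this change. If you maintain your own template instead, you can script the edit. The sketch below assumes that the cluster resource has type Microsoft.ServiceFabric/clusters and that node types are matched by their literal "name" expression. It refuses to leave the cluster with no primary node type. Set-NodeTypeNonPrimary is a hypothetical helper.

```powershell
# Hypothetical helper: mark one node type in an ARM template as non-primary.
function Set-NodeTypeNonPrimary {
    param(
        [Parameter(Mandatory)] [string] $TemplateFilePath,
        [string] $NodeTypeNameExpression = "[variables('vmNodeType0Name')]"
    )

    $template = Get-Content -Path $TemplateFilePath -Raw | ConvertFrom-Json
    $cluster = $template.resources |
        Where-Object { $_.type -eq 'Microsoft.ServiceFabric/clusters' } |
        Select-Object -First 1
    if (-not $cluster) { throw "No Microsoft.ServiceFabric/clusters resource found." }

    $matching = @($cluster.properties.nodeTypes | Where-Object { $_.name -eq $NodeTypeNameExpression })
    if ($matching.Count -eq 0) { throw "Node type $NodeTypeNameExpression not found." }
    foreach ($nodeType in $matching) { $nodeType.isPrimary = $false }

    # The new primary node type must already be in the template before this edit.
    $primaries = @($cluster.properties.nodeTypes | Where-Object { $_.isPrimary -eq $true })
    if ($primaries.Count -eq 0) { throw "No primary node type would remain; add the new primary node type first." }

    $template | ConvertTo-Json -Depth 50 | Set-Content -Path $TemplateFilePath
}
```

Like any ConvertTo-Json round trip, this rewrites the template's formatting, so review the diff before deploying.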

Then deploy the template with the update. This will initiate the migration of seed nodes to the new scale set.

$templateFilePath = "C:\Step2-UnmarkOriginalPrimaryNodeType.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

Note

It will take some time to complete the seed node migration to the new scale set. To guarantee data consistency, only one seed node can change at a time. Each seed node change requires a cluster update; thus, replacing a seed node requires two cluster upgrades (one each for node addition and removal). Upgrading the five seed nodes in this sample scenario will result in ten cluster upgrades.
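The arithmetic in this note is a simple rule: each seed node you replace costs two cluster upgrades, one to add a node and one to remove one. The helper name below is hypothetical. It only restates that rule so you can estimate the number of upgrades for a cluster with a different seed node count.

```powershell
# Hypothetical helper: number of cluster upgrades needed to move all seed nodes.
# Each seed node move is one addition plus one removal, and each is a separate upgrade.
function Get-SeedNodeMigrationUpgradeCount {
    param([Parameter(Mandatory)] [int] $SeedNodeCount)

    if ($SeedNodeCount -lt 1) { throw "SeedNodeCount must be at least 1." }
    return 2 * $SeedNodeCount
}

Get-SeedNodeMigrationUpgradeCount -SeedNodeCount 5   # returns 10
```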

Use Service Fabric Explorer to monitor the migration of seed nodes to the new scale set. The nodes of the original node type (nt0vm) should all show false in the Is Seed Node column, and those of the new node type (nt1vm) should show true.
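You can make the same check from PowerShell. This is a minimal sketch. It assumes that each node returned by Get-ServiceFabricNode exposes NodeType and IsSeedNode. Test-SeedNodeMigrationComplete is a hypothetical helper. It treats the migration as done when the original node type holds no seed nodes and the new node type holds at least one.

```powershell
# Hypothetical helper: report whether seed nodes have finished moving to the new node type.
function Test-SeedNodeMigrationComplete {
    param(
        [Parameter(Mandatory)] [object[]] $Nodes,
        [string] $OriginalNodeType = 'nt0vm',
        [string] $NewNodeType = 'nt1vm'
    )

    $originalSeeds = @($Nodes | Where-Object { $_.NodeType -eq $OriginalNodeType -and $_.IsSeedNode })
    $newSeeds      = @($Nodes | Where-Object { $_.NodeType -eq $NewNodeType -and $_.IsSeedNode })

    Write-Host "Seed nodes: $($originalSeeds.Count) on $OriginalNodeType, $($newSeeds.Count) on $NewNodeType"

    return ($originalSeeds.Count -eq 0 -and $newSeeds.Count -gt 0)
}

# Usage: poll once a minute until the migration finishes.
# while (-not (Test-SeedNodeMigrationComplete -Nodes (Get-ServiceFabricNode))) { Start-Sleep -Seconds 60 }
```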

Disable the nodes in the original node type scale set

Once all seed nodes have migrated to the new scale set, you can disable the nodes of the original scale set.

# Disable the nodes in the original scale set.
$nodeType = "nt0vm"
$nodes = Get-ServiceFabricNode

Write-Host "Disabling nodes..."
foreach($node in $nodes)
{
  if ($node.NodeType -eq $nodeType)
  {
    $node.NodeName

    Disable-ServiceFabricNode -Intent RemoveNode -NodeName $node.NodeName -Force
  }
}

Use Service Fabric Explorer to monitor the progression of nodes in the original scale set from Disabling to Disabled status.
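If you prefer to wait from a script, you can poll the node status instead of refreshing Service Fabric Explorer. This is a minimal sketch. Wait-NodeTypeDisabled is a hypothetical helper. It takes a script block that returns nodes, so it can call Get-ServiceFabricNode or use test data. It assumes that each node exposes NodeName, NodeType, and NodeStatus. On Silver and Gold clusters some nodes may stay in Disabling, so expect the helper to time out there and return false.

```powershell
# Hypothetical helper: poll until every node of a node type reports NodeStatus Disabled.
# Returns $true when all are Disabled, or $false if the timeout expires first.
function Wait-NodeTypeDisabled {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetNodes,
        [string] $NodeType = 'nt0vm',
        [int] $PollSeconds = 30,
        [int] $TimeoutSeconds = 3600
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ($true) {
        $nodes   = @(& $GetNodes | Where-Object { $_.NodeType -eq $NodeType })
        $pending = @($nodes | Where-Object { [string]$_.NodeStatus -ne 'Disabled' })
        if ($pending.Count -eq 0) { return $true }

        Write-Host ("Waiting on: " + (($pending | ForEach-Object { "$($_.NodeName) ($($_.NodeStatus))" }) -join ', '))
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Seconds $PollSeconds
    }
}

# Usage:
# Wait-NodeTypeDisabled -GetNodes { Get-ServiceFabricNode } -NodeType 'nt0vm'
```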

(Screenshot: Service Fabric Explorer showing the status of disabled nodes)

For Silver and Gold durability, some nodes will go into the Disabled state, while others might remain in the Disabling state. In Service Fabric Explorer, check the Details tab of nodes in the Disabling state. If they show a pending safety check of kind EnsurePartitionQuorum (ensuring quorum for infrastructure service partitions), then it is safe to continue: you can stop data on, and remove, the nodes that are stuck in the Disabling state.

If your cluster has Bronze durability, wait for all nodes to reach the Disabled state.

Stop data on the disabled nodes

Now you can stop data on the disabled nodes.

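# Stop data on the disabled nodes.
# Reuses $nodeType and $nodes from the previous step, so run this in the same session.
foreach($node in $nodes)
{
  if ($node.NodeType -eq $nodeType)
  {
    $node.NodeName

    # -StopDurationInSeconds sets how long the node stays stopped.
    Start-ServiceFabricNodeTransition -Stop -OperationId (New-Guid) -NodeInstanceId $node.NodeInstanceId -NodeName $node.NodeName -StopDurationInSeconds 10000
  }
}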

Remove the original node type and clean up its resources

We're ready to remove the original node type and its associated resources to conclude the vertical scaling procedure.

Remove the original scale set

First, remove the node type's backing scale set.

$scaleSetName = "nt0vm"
$scaleSetResourceType = "Microsoft.Compute/virtualMachineScaleSets"

Remove-AzResource -ResourceName $scaleSetName -ResourceType $scaleSetResourceType -ResourceGroupName $resourceGroupName -Force

Delete the original IP and load balancer resources

You can now delete the original IP and load balancer resources. In this step, you will also update the DNS name.

Note

This step is optional if you're already using a Standard SKU public IP and load balancer. In that case, you could have multiple scale sets/node types under the same load balancer.

Run the following commands, modifying the $lbName value as needed.

# Delete the original IP and load balancer resources
$lbName = "LB-sftestupgrade-nt0vm"
$lbResourceType = "Microsoft.Network/loadBalancers"
$ipResourceType = "Microsoft.Network/publicIPAddresses"
$oldPublicIpName = "PublicIP-LB-FE-nt0vm"
$newPublicIpName = "PublicIP-LB-FE-nt1vm"

$oldPrimaryPublicIP = Get-AzPublicIpAddress -Name $oldPublicIpName  -ResourceGroupName $resourceGroupName
$primaryDNSName = $oldPrimaryPublicIP.DnsSettings.DomainNameLabel
$primaryDNSFqdn = $oldPrimaryPublicIP.DnsSettings.Fqdn

Remove-AzResource -ResourceName $lbName -ResourceType $lbResourceType -ResourceGroupName $resourceGroupName -Force
Remove-AzResource -ResourceName $oldPublicIpName -ResourceType $ipResourceType -ResourceGroupName $resourceGroupName -Force

$PublicIP = Get-AzPublicIpAddress -Name $newPublicIpName  -ResourceGroupName $resourceGroupName
$PublicIP.DnsSettings.DomainNameLabel = $primaryDNSName
$PublicIP.DnsSettings.Fqdn = $primaryDNSFqdn
Set-AzPublicIpAddress -PublicIpAddress $PublicIP

Remove node state from the original node type

The original node type's nodes will now show Error for their Health State. Remove their node state from the cluster.

# Remove state of the obsolete nodes from the cluster
$nodeType = "nt0vm"
$nodes = Get-ServiceFabricNode

Write-Host "Removing node state..."
foreach($node in $nodes)
{
  if ($node.NodeType -eq $nodeType)
  {
    $node.NodeName

    Remove-ServiceFabricNodeState -NodeName $node.NodeName -Force
  }
}

Service Fabric Explorer should now show only the five nodes of the new node type (nt1vm), all with a Health State of OK. Your Cluster Health State will still show Error. We'll remediate that next by updating the template to reflect the latest changes and redeploying.

Update the deployment template to reflect the newly scaled-up primary node type

The required changes for this step have already been made for you in the Step3-CleanupOriginalPrimaryNodeType.json template file, and the following sections explain these template changes in detail. If you prefer, you can skip the explanation and continue to deploy the updated template and complete the tutorial.

Update the cluster management endpoint

Update the cluster managementEndpoint in the deployment template to reference the new IP (by replacing vmNodeType0Name with vmNodeType1Name).

  "managementEndpoint": "[concat('https://',reference(concat(variables('lbIPName'),'-',variables('vmNodeType1Name'))).dnsSettings.fqdn,':',variables('nt0fabricHttpGatewayPort'))]",

Remove the original node type reference

Remove the original node type reference from the Service Fabric resource in the deployment template:

"name": "[variables('vmNodeType0Name')]",
"applicationPorts": {
    "endPort": "[variables('nt0applicationEndPort')]",
    "startPort": "[variables('nt0applicationStartPort')]"
},
"clientConnectionEndpointPort": "[variables('nt0fabricTcpGatewayPort')]",
"durabilityLevel": "Bronze",
"ephemeralPorts": {
    "endPort": "[variables('nt0ephemeralEndPort')]",
    "startPort": "[variables('nt0ephemeralStartPort')]"
},
"httpGatewayEndpointPort": "[variables('nt0fabricHttpGatewayPort')]",
"isPrimary": true,
"reverseProxyEndpointPort": "[variables('nt0reverseProxyEndpointPort')]",
"vmInstanceCount": "[parameters('nt0InstanceCount')]"

Configure health policies to ignore existing errors

(Only for clusters with Silver or higher durability.) Update the cluster resource in the template and configure health policies to ignore fabric:/System application health by adding the following upgradeDescription, which contains applicationDeltaHealthPolicies, under the cluster resource properties. This policy ignores existing errors but does not allow new health errors.

"upgradeDescription":  
{ 
 "forceRestart": false, 
 "upgradeReplicaSetCheckTimeout": "10675199.02:48:05.4775807", 
 "healthCheckWaitDuration": "00:05:00", 
 "healthCheckStableDuration": "00:05:00", 
 "healthCheckRetryTimeout": "00:45:00", 
 "upgradeTimeout": "12:00:00", 
 "upgradeDomainTimeout": "02:00:00", 
 "healthPolicy": { 
   "maxPercentUnhealthyNodes": 100, 
   "maxPercentUnhealthyApplications": 100 
 }, 
 "deltaHealthPolicy":  
 { 
   "maxPercentDeltaUnhealthyNodes": 0, 
   "maxPercentUpgradeDomainDeltaUnhealthyNodes": 0, 
   "maxPercentDeltaUnhealthyApplications": 0, 
   "applicationDeltaHealthPolicies":  
   { 
       "fabric:/System":  
       { 
           "defaultServiceTypeDeltaHealthPolicy":  
           { 
                   "maxPercentDeltaUnhealthyServices": 0 
           } 
       } 
   } 
 } 
}
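The timeout strings in upgradeDescription use the .NET TimeSpan format ([d.]hh:mm:ss[.fffffff]). The value of upgradeReplicaSetCheckTimeout, 10675199.02:48:05.4775807, is TimeSpan.MaxValue, so in practice Service Fabric waits without a time limit for partitions to reach a safe state. The sketch below parses the values so you can see the hours they represent. Get-UpgradeTimeoutSummary is a hypothetical helper, and the dictionary in the usage example simply copies the values from the JSON above.

```powershell
# Hypothetical helper: interpret upgradeDescription timeouts as .NET TimeSpan values.
function Get-UpgradeTimeoutSummary {
    param([Parameter(Mandatory)] [System.Collections.IDictionary] $Timeouts)

    foreach ($name in $Timeouts.Keys) {
        $span = [TimeSpan]::Parse($Timeouts[$name], [Globalization.CultureInfo]::InvariantCulture)
        [pscustomobject]@{
            Setting    = $name
            TotalHours = [math]::Round($span.TotalHours, 2)
            IsMaxValue = ($span -eq [TimeSpan]::MaxValue)
        }
    }
}

# Usage with the values from the template fragment above:
$timeouts = [ordered]@{
    upgradeReplicaSetCheckTimeout = '10675199.02:48:05.4775807'
    healthCheckWaitDuration       = '00:05:00'
    healthCheckStableDuration     = '00:05:00'
    healthCheckRetryTimeout       = '00:45:00'
    upgradeTimeout                = '12:00:00'
    upgradeDomainTimeout          = '02:00:00'
}
Get-UpgradeTimeoutSummary -Timeouts $timeouts | Format-Table
```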

Remove supporting resources for the original node type

Remove all other resources related to the original node type from the ARM template and the parameters file. Delete the following:

    "vmImagePublisher": {
      "value": "MicrosoftWindowsServer"
    },
    "vmImageOffer": {
      "value": "WindowsServer"
    },
    "vmImageSku": {
      "value": "2016-Datacenter-with-Containers"
    },
    "vmImageVersion": {
      "value": "latest"
    },
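Keep the template and the parameters file in sync when you remove these entries. Resource Manager validation typically rejects a parameters file that supplies a parameter the template no longer declares. The sketch below removes the original node type's image parameters from parameters.json and leaves everything else unchanged. Remove-ObsoleteParameters is a hypothetical helper, and its default names are the four shown above.

```powershell
# Hypothetical helper: remove parameters that only the original node type used.
function Remove-ObsoleteParameters {
    param(
        [Parameter(Mandatory)] [string] $ParameterFilePath,
        [string[]] $Names = @('vmImagePublisher', 'vmImageOffer', 'vmImageSku', 'vmImageVersion')
    )

    $params = Get-Content -Path $ParameterFilePath -Raw | ConvertFrom-Json

    foreach ($name in $Names) {
        # Only remove entries that exist; the new node type's *1 parameters are untouched.
        if ($params.parameters.PSObject.Properties[$name]) {
            $params.parameters.PSObject.Properties.Remove($name)
        }
    }

    $params | ConvertTo-Json -Depth 20 | Set-Content -Path $ParameterFilePath
}

# Usage:
# Remove-ObsoleteParameters -ParameterFilePath $parameterFilePath
```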

Deploy the finalized template

Finally, deploy the modified Azure Resource Manager template.

# Deploy the updated template file
$templateFilePath = "C:\Step3-CleanupOriginalPrimaryNodeType.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

Note

This step will take a while, usually up to two hours.

The upgrade changes settings of the InfrastructureService; therefore, a node restart is needed. In this case, forceRestart is ignored. The parameter upgradeReplicaSetCheckTimeout specifies the maximum time that Service Fabric waits for a partition to reach a safe state, if it is not already in one. Once safety checks pass for all partitions on a node, Service Fabric proceeds with the upgrade on that node. The value for the parameter upgradeTimeout can be reduced to 6 hours, but for maximal safety, 12 hours should be used.

Once the deployment has completed, verify in the Azure portal that the Service Fabric resource Status is Ready. Verify that you can reach the new Service Fabric Explorer endpoint, that the Cluster Health State is OK, and that any deployed applications function properly.

With that, you've vertically scaled a cluster primary node type!

Next steps