Upgrade cluster nodes to use Azure managed disks

Azure managed disks are the recommended disk storage offering for persistent data storage on Azure virtual machines. You can improve the resiliency of your Service Fabric workloads by upgrading the virtual machine scale sets that underlie your node types to use managed disks. This article describes how to upgrade an existing Service Fabric cluster to use Azure managed disks with little or no cluster downtime.

The general strategy for upgrading the nodes of a Service Fabric cluster to use managed disks is to:

  1. Deploy a duplicate virtual machine scale set for that node type. It is identical to the original except that the managedDisk object is added to the osDisk section of the scale set deployment template. The new scale set should bind to the same load balancer and IP address as the original, so that your customers don't experience a service outage during the migration.

  2. Once the original and upgraded scale sets are running side by side, disable the original node instances one at a time so that the system services (or replicas of stateful services) migrate to the new scale set.

  3. Verify that the cluster and the new nodes are healthy, then remove the original scale set and the node state of the deleted nodes.

This article walks you through upgrading the primary node type of an example cluster to use managed disks while avoiding any cluster downtime (see the notes below). The example test cluster starts with one node type of Silver durability, backed by a single scale set with five nodes.

Note

The limitations of a Basic SKU load balancer prevent an additional scale set from being added. We recommend using the Standard SKU load balancer instead. For more information, see the comparison of the two SKUs.
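Before you start, you can confirm which load balancer SKU your cluster uses. The following is a minimal sketch that assumes the Az.Network module is installed and that you are already signed in. Test-StandardLoadBalancerSku and Get-LoadBalancerSkuReport are hypothetical helper names and are not part of any Azure module.

```powershell
# Hypothetical helper: per the note above, a Basic SKU load balancer prevents
# adding another scale set, so only "Standard" is treated as suitable here.
function Test-StandardLoadBalancerSku {
    param([string] $SkuName)
    return $SkuName -eq 'Standard'
}

# Hypothetical helper: lists every load balancer in the resource group with its SKU.
# Requires Az.Network and an authenticated session (Connect-AzAccount).
function Get-LoadBalancerSkuReport {
    param([Parameter(Mandatory)][string] $ResourceGroupName)
    foreach ($lb in (Get-AzLoadBalancer -ResourceGroupName $ResourceGroupName)) {
        [pscustomobject]@{
            Name       = $lb.Name
            Sku        = $lb.Sku.Name
            IsStandard = Test-StandardLoadBalancerSku -SkuName $lb.Sku.Name
        }
    }
}

# Usage: Get-LoadBalancerSkuReport -ResourceGroupName "sftestupgradegroup"
```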

Caution

You will experience an outage with this procedure only if you have dependencies on the cluster DNS (for example, when accessing Service Fabric Explorer). Architectural best practice for front-end services is to place some kind of load balancer in front of your node types, so that nodes can be swapped without an outage.

Here are the Azure Resource Manager templates and cmdlets that we'll use to complete the upgrade scenario. The template changes are explained in Deploy an upgraded scale set for the primary node type below.

Set up the test cluster

Let's set up the initial Service Fabric test cluster. First, download the Azure Resource Manager sample templates that we'll use to complete this scenario.

Next, sign in to your Azure account.

# Sign in to your Azure account
Connect-AzAccount -Environment AzureChinaCloud -SubscriptionId "<subscription ID>"

The following commands guide you through generating a new self-signed certificate and deploying the test cluster. If you already have a certificate you'd like to use, skip to Use an existing certificate to deploy the cluster.

Generate a self-signed certificate and deploy the cluster

First, assign the variables you'll need for the Service Fabric cluster deployment. Adjust the values of resourceGroupName, certSubjectName, parameterFilePath, and templateFilePath for your specific account and environment:

# Assign deployment variables
$resourceGroupName = "sftestupgradegroup"
$certOutputFolder = "c:\certificates"
$certPassword = "Password!1" | ConvertTo-SecureString -AsPlainText -Force
$certSubjectName = "sftestupgrade.chinaeast.cloudapp.chinacloudapi.cn"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"

Note

Ensure that the certOutputFolder location exists on your local machine before running the command to deploy the new Service Fabric cluster.
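One way to meet this requirement is to create the folder if it is missing. This is a minimal sketch. Initialize-CertOutputFolder is a hypothetical helper name, and $certOutputFolder is the variable assigned in the deployment variables above.

```powershell
# Hypothetical helper: creates the certificate output folder when it is missing
# and returns its full path. Safe to call repeatedly.
function Initialize-CertOutputFolder {
    param([Parameter(Mandatory)][string] $Path)
    if (-not (Test-Path -Path $Path -PathType Container)) {
        New-Item -Path $Path -ItemType Directory | Out-Null
    }
    return (Resolve-Path -Path $Path).Path
}

# Usage: Initialize-CertOutputFolder -Path $certOutputFolder
```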

Next, open the Initial-1NodeType-UnmanagedDisks.parameters.json file, adjust the values of clusterName and dnsName to match the values you set in PowerShell, and save your changes.
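You can also make this edit from PowerShell. The sketch below assumes the standard ARM parameters layout, { "parameters": { "<name>": { "value": ... } } }, and that the file already defines clusterName and dnsName. Set-ClusterNamesInParameterFile is a hypothetical helper. The usage line that derives the name from the first DNS label of $certSubjectName is an illustrative choice, not a requirement.

```powershell
# Hypothetical helper: rewrites the clusterName and dnsName values in an ARM parameters file.
function Set-ClusterNamesInParameterFile {
    param(
        [Parameter(Mandatory)][string] $ParameterFilePath,
        [Parameter(Mandatory)][string] $ClusterName,
        [Parameter(Mandatory)][string] $DnsName
    )
    $json = Get-Content -Path $ParameterFilePath -Raw | ConvertFrom-Json
    $json.parameters.clusterName.value = $ClusterName
    $json.parameters.dnsName.value = $DnsName
    # ConvertTo-Json defaults to a depth of 2, which would truncate nested values.
    $json | ConvertTo-Json -Depth 20 | Set-Content -Path $ParameterFilePath -Encoding UTF8
}

# Usage (illustrative): "sftestupgrade" from "sftestupgrade.chinaeast.cloudapp.chinacloudapi.cn"
# $dnsLabel = $certSubjectName.Split('.')[0]
# Set-ClusterNamesInParameterFile -ParameterFilePath $parameterFilePath -ClusterName $dnsLabel -DnsName $dnsLabel
```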

Then deploy the Service Fabric test cluster:

# Deploy the initial test cluster
New-AzServiceFabricCluster `
    -ResourceGroupName $resourceGroupName `
    -CertificateOutputFolder $certOutputFolder `
    -CertificatePassword $certPassword `
    -CertificateSubjectName $certSubjectName `
    -TemplateFile $templateFilePath `
    -ParameterFile $parameterFilePath

Once the deployment is complete, locate the .pfx file ($certPfx) on your local machine and import it into your certificate store:

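Set-Location $certOutputFolder
# The file name includes a timestamp; use the name generated for your deployment.
$certPfx = ".\sftestupgradegroup20200312121003.pfx"

# Reuse the same secure password that was used to create the certificate.
Import-PfxCertificate `
     -FilePath $certPfx `
     -CertStoreLocation Cert:\CurrentUser\My `
     -Password $certPassword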
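Because the generated .pfx file name contains a timestamp, you may prefer not to hard-code it. The sketch below picks the most recently written .pfx file in the output folder, on the assumption that this is the file the deployment just created. Get-LatestPfxPath is a hypothetical helper. The commented usage relies on Import-PfxCertificate returning the imported certificate, whose Thumbprint property is the value you need in later steps.

```powershell
# Hypothetical helper: full path of the newest .pfx file in a folder.
function Get-LatestPfxPath {
    param([Parameter(Mandatory)][string] $Folder)
    $file = Get-ChildItem -Path $Folder -Filter '*.pfx' -File |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -First 1
    if (-not $file) { throw "No .pfx file found in $Folder" }
    return $file.FullName
}

# Usage:
# $cert = Import-PfxCertificate -FilePath (Get-LatestPfxPath -Folder $certOutputFolder) `
#     -CertStoreLocation Cert:\CurrentUser\My -Password $certPassword
# $thumb = $cert.Thumbprint
```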

The operation returns the certificate thumbprint, which you'll use to connect to the new cluster and check its health status. (Skip the following section, which describes an alternate approach to cluster deployment.)

Use an existing certificate to deploy the cluster

You can also use an existing Azure Key Vault certificate to deploy the test cluster. To do this, you'll need the certificate's secret URL, the resource ID of your Key Vault, and the certificate thumbprint.

# Key Vault variables
$certUrlValue = "https://sftestupgradegroup.vault.azure.cn/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
$sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
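A quick format check can catch copy-and-paste mistakes in these three values before you start a deployment. This is a sketch built on stated assumptions. Test-KeyVaultReferences is a hypothetical helper. Its patterns check only the shape of each value: a 40-character hexadecimal SHA-1 thumbprint, a https://<vault host>/secrets/<name>/<version> secret URL, and a Key Vault resource ID. They do not check whether the resources exist.

```powershell
# Hypothetical helper: true when all three Key Vault references have the expected shape.
function Test-KeyVaultReferences {
    param(
        [Parameter(Mandatory)][string] $CertUrl,
        [Parameter(Mandatory)][string] $SourceVault,
        [Parameter(Mandatory)][string] $Thumbprint
    )
    # SHA-1 thumbprint: 40 hexadecimal characters.
    $thumbOk = $Thumbprint -match '^[0-9A-Fa-f]{40}$'
    # Versioned secret identifier: https://<vault host>/secrets/<name>/<version>
    $urlOk = $CertUrl -match '^https://[^/]+/secrets/[^/]+/[^/]+$'
    # Key Vault resource ID.
    $vaultOk = $SourceVault -match '^/subscriptions/[^/]+/resourceGroups/[^/]+/providers/Microsoft\.KeyVault/vaults/[^/]+$'
    return ($thumbOk -and $urlOk -and $vaultOk)
}

# Usage: Test-KeyVaultReferences -CertUrl $certUrlValue -SourceVault $sourceVaultValue -Thumbprint $thumb
```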

Open the Initial-1NodeType-UnmanagedDisks.parameters.json file and change the values of clusterName and dnsName to something unique.

Next, designate a resource group name for the cluster and set the templateFilePath and parameterFilePath locations of your Initial-1NodeType-UnmanagedDisks files:

Note

The designated resource group must already exist and be located in the same region as your Key Vault.

# Assign deployment variables for the initial test cluster (unmanaged disks).
$resourceGroupName = "sftestupgradegroup"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"
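To check the requirement in the note above before deploying, you could use a guard like the following minimal sketch. It assumes the Az.Resources and Az.KeyVault modules are installed. Test-SameAzureRegion and Assert-ResourceGroupMatchesVault are hypothetical helpers. Location strings are normalized only as a precaution, in case one of them is reported as a display name such as "China East".

```powershell
# Hypothetical helper: compares two Azure location strings, ignoring case and spaces.
function Test-SameAzureRegion {
    param(
        [Parameter(Mandatory)][string] $First,
        [Parameter(Mandatory)][string] $Second
    )
    $a = ($First -replace '\s', '').ToLowerInvariant()
    $b = ($Second -replace '\s', '').ToLowerInvariant()
    return $a -eq $b
}

# Hypothetical helper: throws unless the resource group exists in the vault's region.
function Assert-ResourceGroupMatchesVault {
    param(
        [Parameter(Mandatory)][string] $ResourceGroupName,
        [Parameter(Mandatory)][string] $VaultName
    )
    $vault = Get-AzKeyVault -VaultName $VaultName
    if (-not $vault) { throw "Key Vault '$VaultName' was not found." }

    $rg = Get-AzResourceGroup -Name $ResourceGroupName -ErrorAction SilentlyContinue
    if (-not $rg) { throw "Resource group '$ResourceGroupName' does not exist." }

    if (-not (Test-SameAzureRegion -First $rg.Location -Second $vault.Location)) {
        throw "Resource group is in '$($rg.Location)' but the vault is in '$($vault.Location)'."
    }
}

# Usage: Assert-ResourceGroupMatchesVault -ResourceGroupName $resourceGroupName -VaultName "sftestupgradegroup"
```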

Finally, run the following command to deploy the initial test cluster:

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose

Connect to the new cluster and check health status

Connect to the cluster and ensure that all five of its nodes are healthy (replace the clusterName and thumb variables with the values for your cluster):

# Connect to the cluster
$clusterName = "sftestupgrade.chinaeast.cloudapp.chinacloudapi.cn:19000"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Connect-ServiceFabricCluster `
    -ConnectionEndpoint $clusterName `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb  `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

# Check cluster health
Get-ServiceFabricClusterHealth
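Get-ServiceFabricClusterHealth reports an aggregated state. To confirm node by node that the expected number of nodes are up and healthy, you could use a check like this minimal sketch. Test-ClusterNodes is a hypothetical helper that reads the NodeStatus and HealthState properties of the objects returned by Get-ServiceFabricNode.

```powershell
# Hypothetical helper: true when exactly $ExpectedCount nodes are reported and
# every one of them is Up with an Ok health state.
function Test-ClusterNodes {
    param(
        $Nodes,
        [Parameter(Mandatory)][int] $ExpectedCount
    )
    $all = @($Nodes | Where-Object { $null -ne $_ })
    $notReady = @($all | Where-Object {
        "$($_.NodeStatus)" -ne 'Up' -or "$($_.HealthState)" -ne 'Ok'
    })
    return ($all.Count -eq $ExpectedCount) -and ($notReady.Count -eq 0)
}

# Usage (after Connect-ServiceFabricCluster): five nodes now, ten after the new scale set is deployed.
# Test-ClusterNodes -Nodes (Get-ServiceFabricNode) -ExpectedCount 5
```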

With that, we're ready to begin the upgrade procedure.

Deploy an upgraded scale set for the primary node type

To upgrade, or vertically scale, a node type, we'll need to deploy a copy of that node type's virtual machine scale set. The copy is identical to the original scale set (including references to the same nodeTypeRef, subnet, and loadBalancerBackendAddressPools), except that it includes the desired upgrades or changes and its own separate inbound NAT address pool. Because we are upgrading a primary node type, the new scale set is marked as primary (isPrimary: true), just like the original scale set. (For non-primary node type upgrades, simply omit this.)

For convenience, the required changes have already been made for you in the Upgrade-1NodeType-2ScaleSets-ManagedDisks template and parameters files.

The following sections explain the template changes in detail. If you prefer, you can skip the explanation and continue to the next step of the upgrade procedure.

Update the cluster template with the upgraded scale set

Here are the section-by-section modifications to the original cluster deployment template for adding an upgraded scale set to the primary node type.

Parameters

Add parameters for the instance name, count, and size of the new scale set. Note that vmNodeType1Name is unique to the new scale set, while the count and size values are identical to those of the original scale set.

Template file

"vmNodeType1Name": {
    "type": "string",
    "defaultValue": "NTvm2",
    "maxLength": 9
},
"nt1InstanceCount": {
    "type": "int",
    "defaultValue": 5,
    "metadata": {
        "description": "Instance count for node type"
    }
},
"vmNodeType1Size": {
    "type": "string",
    "defaultValue": "Standard_D2_v2"
},

Parameters file

"vmNodeType1Name": {
    "value": "NTvm2"
},
"nt1InstanceCount": {
    "value": 5
},
"vmNodeType1Size": {
    "value": "Standard_D2_v2"
}
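To confirm that the new scale set copies the original's instance count and VM size but has its own name, you could compare the values in the parameters file. This is a sketch under an explicit assumption: the original node type's parameters are taken to be named nt0InstanceCount, vmNodeType0Size, and vmNodeType0Name, by analogy with the new ones. If your parameters file relies on template defaults instead, these names may not be present. Test-NewScaleSetMatchesOriginal is a hypothetical helper.

```powershell
# Hypothetical helper: compares the new scale set's parameters with the original's.
# The *0* parameter names are assumed, not confirmed by the sample files.
function Test-NewScaleSetMatchesOriginal {
    param([Parameter(Mandatory)][string] $ParameterFilePath)
    $p = (Get-Content -Path $ParameterFilePath -Raw | ConvertFrom-Json).parameters

    $required = 'nt0InstanceCount', 'nt1InstanceCount', 'vmNodeType0Size',
                'vmNodeType1Size', 'vmNodeType0Name', 'vmNodeType1Name'
    foreach ($name in $required) {
        if (-not $p.PSObject.Properties[$name]) {
            throw "Parameter '$name' not found in $ParameterFilePath"
        }
    }

    # Same count and size, different scale set name.
    return ($p.nt1InstanceCount.value -eq $p.nt0InstanceCount.value) -and
           ($p.vmNodeType1Size.value -eq $p.vmNodeType0Size.value) -and
           ($p.vmNodeType1Name.value -ne $p.vmNodeType0Name.value)
}
```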

Variables

In the variables section of the deployment template, add an entry for the inbound NAT address pool of the new scale set.

Template file

"lbNatPoolID1": "[concat(variables('lbID0'),'/inboundNatPools/LoadBalancerBEAddressNatPool1')]", 

Resources

In the resources section of the deployment template, add the new virtual machine scale set, keeping the following in mind:

  • The new scale set references the same node type as the original:

    "nodeTypeRef": "[parameters('vmNodeType0Name')]",
    
  • The new scale set references the same load balancer backend address pool and subnet (but uses a different load balancer inbound NAT pool):

      "loadBalancerBackendAddressPools": [
          {
              "id": "[variables('lbPoolID0')]"
          }
      ],
      "loadBalancerInboundNatPools": [
          {
              "id": "[variables('lbNatPoolID1')]"
          }
      ],
      "subnet": {
          "id": "[variables('subnet0Ref')]"
      }
    
  • Like the original scale set, the new scale set is marked as the primary node type. (When upgrading non-primary node types, omit this change.)

    "isPrimary": true,
    
  • Unlike the original scale set, the new scale set is upgraded to use managed disks.

    "managedDisk": {
        "storageAccountType": "[parameters('storageAccountType')]"
    }
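Before deploying, you can check the edited template for these properties. This minimal sketch assumes the usual layout of a scale set resource, with the OS disk under properties.virtualMachineProfile.storageProfile.osDisk and nodeTypeRef in the settings of the Service Fabric extension. Get-ScaleSetDiskReport is a hypothetical helper, and it assumes the template is plain JSON without comments.

```powershell
# Hypothetical helper: lists each scale set in a template, whether its OS disk
# declares a managedDisk object, and the node type its Service Fabric extension references.
function Get-ScaleSetDiskReport {
    param([Parameter(Mandatory)][string] $TemplateFilePath)
    $template = Get-Content -Path $TemplateFilePath -Raw | ConvertFrom-Json
    $scaleSets = @($template.resources |
        Where-Object { $_.type -eq 'Microsoft.Compute/virtualMachineScaleSets' })

    foreach ($vmss in $scaleSets) {
        $vmProfile = $vmss.properties.virtualMachineProfile
        # The Service Fabric extension is the one whose settings carry nodeTypeRef.
        $sfExtension = @($vmProfile.extensionProfile.extensions) |
            Where-Object { $_.properties.settings.nodeTypeRef } |
            Select-Object -First 1
        [pscustomobject]@{
            Name           = $vmss.name
            HasManagedDisk = $null -ne $vmProfile.storageProfile.osDisk.managedDisk
            NodeTypeRef    = $sfExtension.properties.settings.nodeTypeRef
        }
    }
}

# Usage: Get-ScaleSetDiskReport -TemplateFilePath "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.json"
```

In the upgraded template, you would expect two scale sets that share the same NodeTypeRef, with HasManagedDisk set only on the new one.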
    

Once you've implemented all the changes in your template and parameters files, proceed to the next section to obtain your Key Vault references and deploy the updates to your cluster.

Obtain your Key Vault references

To deploy the updated configuration, you'll first need to obtain several references to the cluster certificate stored in your Key Vault. The easiest way to find these values is through the Azure portal. You'll need:

  • The Key Vault URL of your cluster certificate. From your Key Vault in the Azure portal, select Certificates > Your desired certificate > Secret Identifier:

    $certUrlValue="https://sftestupgradegroup.vault.azure.cn/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
    
  • The thumbprint of your cluster certificate. (You probably already have this if you connected to the initial cluster to check its health status.) From the same certificate blade (Certificates > Your desired certificate) in the Azure portal, copy X.509 SHA-1 Thumbprint (in hex):

    $thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
    
  • The resource ID of your Key Vault. From your Key Vault in the Azure portal, select Properties > Resource ID:

    $sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
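If you prefer the command line to the portal, the same three values can be read with Az PowerShell. This is a minimal sketch that assumes the Az.KeyVault module is installed. Get-ClusterCertReferences is a hypothetical helper, and the vault and certificate names in the usage lines are placeholders taken from this article's example.

```powershell
# Hypothetical helper: collects the certificate secret URL, thumbprint, and vault resource ID.
function Get-ClusterCertReferences {
    param(
        [Parameter(Mandatory)][string] $VaultName,
        [Parameter(Mandatory)][string] $CertificateName
    )
    $cert  = Get-AzKeyVaultCertificate -VaultName $VaultName -Name $CertificateName
    $vault = Get-AzKeyVault -VaultName $VaultName
    if (-not $cert)  { throw "Certificate '$CertificateName' not found in '$VaultName'." }
    if (-not $vault) { throw "Key Vault '$VaultName' not found." }

    [pscustomobject]@{
        CertUrlValue     = $cert.SecretId    # secret identifier of the certificate
        Thumbprint       = $cert.Thumbprint  # X.509 SHA-1 thumbprint
        SourceVaultValue = $vault.ResourceId # Key Vault resource ID
    }
}

# Usage:
# $refs = Get-ClusterCertReferences -VaultName "sftestupgradegroup" -CertificateName "sftestupgradegroup20200309235308"
# $certUrlValue = $refs.CertUrlValue; $thumb = $refs.Thumbprint; $sourceVaultValue = $refs.SourceVaultValue
```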
    

Deploy the updated template

Adjust parameterFilePath and templateFilePath as needed, and then run the following command:

# Deploy the new scale set (upgraded to use managed disks) into the primary node type.
$templateFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.json"
$parameterFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.parameters.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose
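Because the same arguments are used for validation and deployment, you may find it convenient to collect them in one hashtable and pass it to both cmdlets by splatting. With this approach, Test-AzResourceGroupDeployment checks the template before New-AzResourceGroupDeployment changes anything. This is a sketch under stated assumptions. New-UpgradeDeploymentArgs is a hypothetical helper. Apart from ResourceGroupName, TemplateFile, and TemplateParameterFile, the keys are template parameters, which the cmdlets expose as dynamic parameters when a template is supplied.

```powershell
# Hypothetical helper: builds one argument set for validation and deployment.
function New-UpgradeDeploymentArgs {
    param(
        [Parameter(Mandatory)][string] $ResourceGroupName,
        [Parameter(Mandatory)][string] $TemplateFile,
        [Parameter(Mandatory)][string] $ParameterFile,
        [Parameter(Mandatory)][string] $Thumbprint,
        [Parameter(Mandatory)][string] $CertUrl,
        [Parameter(Mandatory)][string] $SourceVault
    )
    return @{
        ResourceGroupName     = $ResourceGroupName
        TemplateFile          = $TemplateFile
        TemplateParameterFile = $ParameterFile
        CertificateThumbprint = $Thumbprint   # template parameter
        CertificateUrlValue   = $CertUrl      # template parameter
        SourceVaultValue      = $SourceVault  # template parameter
    }
}

# Usage:
# $deployArgs = New-UpgradeDeploymentArgs -ResourceGroupName $resourceGroupName `
#     -TemplateFile $templateFilePath -ParameterFile $parameterFilePath `
#     -Thumbprint $thumb -CertUrl $certUrlValue -SourceVault $sourceVaultValue
# $validationErrors = Test-AzResourceGroupDeployment @deployArgs
# if ($validationErrors) { $validationErrors | Format-List; throw "Template validation failed." }
# New-AzResourceGroupDeployment @deployArgs -Verbose
```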

When the deployment completes, check the cluster health again and ensure that all ten nodes (five in the original scale set and five in the new one) are healthy.

Get-ServiceFabricClusterHealth
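To confirm that the ten nodes are split five and five across the two scale sets, you can group node names by scale set. This is a minimal sketch that assumes the _<scaleSetName>_<instance> naming used in this article (for example, _NTvm1_0). Get-NodeCountByScaleSet is a hypothetical helper.

```powershell
# Hypothetical helper: counts nodes per scale set, parsed from node names.
function Get-NodeCountByScaleSet {
    param([Parameter(Mandatory)][string[]] $NodeNames)
    $NodeNames |
        ForEach-Object {
            if ($_ -match '^_(.+)_(\d+)$') { $Matches[1] } else { '(unrecognized)' }
        } |
        Group-Object |
        ForEach-Object { [pscustomobject]@{ ScaleSet = $_.Name; NodeCount = $_.Count } }
}

# Usage (after Connect-ServiceFabricCluster); expect NTvm1 and NTvm2 with five nodes each:
# Get-NodeCountByScaleSet -NodeNames (Get-ServiceFabricNode).NodeName
```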

Migrate seed nodes to the new scale set

We're now ready to start disabling the nodes of the original scale set. As these nodes become disabled, the system services and seed nodes migrate to the VMs of the new scale set, because it is also marked as the primary node type.

# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}
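The loop above only submits the disable requests. The nodes then move from Disabling to Disabled over time. If you want a script to wait for that before you remove the scale set, a polling loop like this minimal sketch can help. Test-NodesDisabled and Wait-NodesDisabled are hypothetical helpers, and the poll interval and timeout defaults are arbitrary values to tune for your cluster.

```powershell
# Hypothetical helper: true when every named node is present and reports Disabled.
function Test-NodesDisabled {
    param(
        $Nodes,
        [Parameter(Mandatory)][string[]] $NodeNames
    )
    $targets = @($Nodes | Where-Object { $NodeNames -contains $_.NodeName })
    $pending = @($targets | Where-Object { "$($_.NodeStatus)" -ne 'Disabled' })
    return ($targets.Count -eq $NodeNames.Count) -and ($pending.Count -eq 0)
}

# Hypothetical helper: polls Get-ServiceFabricNode until the named nodes are Disabled.
function Wait-NodesDisabled {
    param(
        [Parameter(Mandatory)][string[]] $NodeNames,
        [int] $PollSeconds = 60,
        [int] $TimeoutMinutes = 180
    )
    $deadline = (Get-Date).AddMinutes($TimeoutMinutes)
    while (-not (Test-NodesDisabled -Nodes (Get-ServiceFabricNode) -NodeNames $NodeNames)) {
        if ((Get-Date) -gt $deadline) {
            throw "Timed out after $TimeoutMinutes minutes waiting for nodes to be Disabled."
        }
        Start-Sleep -Seconds $PollSeconds
    }
    Write-Host "All $($NodeNames.Count) nodes are Disabled."
}

# Usage: Wait-NodesDisabled -NodeNames $nodeNames
```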

Use Service Fabric Explorer to monitor the migration of seed nodes to the new scale set and the progression of nodes in the original scale set from Disabling to Disabled status.

Service Fabric Explorer showing the status of disabled nodes

Note

It may take some time to complete the disabling operation across all the nodes of the original scale set. To guarantee data consistency, only one seed node can change at a time. Each seed node change requires a cluster upgrade; thus, replacing a seed node requires two cluster upgrades (one each for node addition and removal). Upgrading the five seed nodes in this sample scenario results in ten cluster upgrades.
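The arithmetic in this note, and the current placement of the seed nodes, can be checked from PowerShell. This is a minimal sketch. Get-ExpectedSeedUpgradeCount and Get-SeedNodeNames are hypothetical helpers, and the second one relies on the IsSeedNode property of the objects returned by Get-ServiceFabricNode.

```powershell
# Per the note above: each seed node replacement takes two cluster upgrades
# (one to add the new seed node and one to remove the old one).
function Get-ExpectedSeedUpgradeCount {
    param([Parameter(Mandatory)][int] $SeedNodeCount)
    return 2 * $SeedNodeCount
}

# Hypothetical helper: names of the nodes currently acting as seed nodes.
function Get-SeedNodeNames {
    param($Nodes)
    @($Nodes | Where-Object { $_.IsSeedNode } | ForEach-Object { $_.NodeName })
}

# Usage (after Connect-ServiceFabricCluster):
# Get-ExpectedSeedUpgradeCount -SeedNodeCount 5   # 10 for this article's five seed nodes
# Get-SeedNodeNames -Nodes (Get-ServiceFabricNode)
```

As the migration progresses, the seed node names should move from the _NTvm1_ prefix to the _NTvm2_ prefix.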

Remove the original scale set

Once the disabling operation is complete, remove the original scale set.

# Remove the original scale set
$scaleSetName = "NTvm1"

Remove-AzVmss `
    -ResourceGroupName $resourceGroupName `
    -VMScaleSetName $scaleSetName `
    -Force

Write-Host "Removed scale set $scaleSetName"
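To confirm that the removal took effect, you can list the scale sets that remain in the resource group. This is a minimal sketch that assumes the Az.Compute module is installed. When Get-AzVmss is called without -VMScaleSetName, it lists every scale set in the group. Test-ScaleSetRemoved is a hypothetical helper.

```powershell
# Hypothetical helper: true when no scale set with the given name remains in the resource group.
function Test-ScaleSetRemoved {
    param(
        [Parameter(Mandatory)][string] $ResourceGroupName,
        [Parameter(Mandatory)][string] $ScaleSetName
    )
    $remaining = @(Get-AzVmss -ResourceGroupName $ResourceGroupName |
        Where-Object { $_.Name -eq $ScaleSetName })
    return $remaining.Count -eq 0
}

# Usage: Test-ScaleSetRemoved -ResourceGroupName $resourceGroupName -ScaleSetName $scaleSetName
```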

In Service Fabric Explorer, the removed nodes (and thus the Cluster Health State) now appear in Error state.

Service Fabric Explorer showing disabled nodes in Error state

Remove the obsolete nodes from the Service Fabric cluster to restore the Cluster Health State to OK.

# Remove node states for the deleted scale set
foreach($name in $nodeNames){
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}
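To confirm that the cluster has returned to a healthy state, you can list any nodes whose aggregated health is still not Ok. This is a minimal sketch. Get-UnhealthyNodeNames is a hypothetical helper that reads the NodeHealthStates collection of the object returned by Get-ServiceFabricClusterHealth.

```powershell
# Hypothetical helper: names of nodes whose aggregated health state is not Ok.
function Get-UnhealthyNodeNames {
    param([Parameter(Mandatory)] $ClusterHealth)
    @($ClusterHealth.NodeHealthStates |
        Where-Object { "$($_.AggregatedHealthState)" -ne 'Ok' } |
        ForEach-Object { $_.NodeName })
}

# Usage (after Connect-ServiceFabricCluster):
# $health = Get-ServiceFabricClusterHealth
# "Cluster health: $($health.AggregatedHealthState)"
# Get-UnhealthyNodeNames -ClusterHealth $health   # expected to return nothing once the node states are removed
```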

Service Fabric Explorer with the down nodes in Error state removed

Next steps

In this walkthrough, you learned how to upgrade the virtual machine scale sets of a Service Fabric cluster to use managed disks while avoiding service outages during the process. For more information on related topics, see the following resources.

Learn how to:

See also: