Periodic backup and restore in an Azure Service Fabric cluster

Service Fabric is a distributed systems platform that makes it easy to develop and manage reliable, distributed, microservices-based cloud applications. It runs both stateless and stateful microservices. Stateful services can maintain mutable, authoritative state beyond the request and response or a complete transaction. If a stateful service goes down for a long time or loses information due to a disaster, it may need to be restored from a recent backup of its state in order to continue providing service after it comes back up.

Service Fabric replicates the state across multiple nodes to ensure that the service is highly available. Even if one node in the cluster fails, the service continues to be available. In certain cases, however, it is still desirable for the service data to be reliable against broader failures.

For example, a service may want to back up its data to protect against the following scenarios:

  • Permanent loss of an entire Service Fabric cluster.
  • Permanent loss of a majority of the replicas of a service partition.
  • Administrative errors whereby the state accidentally gets deleted or corrupted. For example, an administrator with sufficient privilege erroneously deletes the service.
  • Bugs in the service that cause data corruption. For example, this may happen when a service code upgrade starts writing faulty data to a Reliable Collection. In such a case, both the code and the data may have to be reverted to an earlier state.
  • Offline data processing. It might be convenient to have offline processing of data for business intelligence that happens separately from the service that generates the data.

Service Fabric provides a built-in API for point-in-time backup and restore. Application developers may use these APIs to back up the state of the service periodically. Additionally, if service administrators want to trigger a backup from outside of the service at a specific time, such as before upgrading the application, developers need to expose backup (and restore) as an API from the service. Maintaining the backups is an additional cost on top of this. For example, you may want to take five incremental backups, one every half hour, followed by a full backup. After the full backup, you can delete the prior incremental backups. This approach requires additional code, which adds cost during application development.
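To illustrate the kind of bookkeeping that hand-written backup code has to carry, here is a minimal sketch of the rotation described above (five incremental backups, then a full backup). The function name Get-NextBackupType and its parameters are hypothetical and are not part of any Service Fabric API; the sketch only simulates the decision and does not take real backups.

```powershell
# Hypothetical helper (not a Service Fabric API): decide which backup type
# to take next, given how many incrementals were taken since the last full.
function Get-NextBackupType {
    param(
        [int]$IncrementalsSinceFull,
        [int]$MaxIncrementalBackups = 5
    )
    if ($IncrementalsSinceFull -ge $MaxIncrementalBackups) { 'Full' } else { 'Incremental' }
}

# Simulate twelve half-hour ticks, starting right after a full backup.
$sinceFull = 0
$schedule = foreach ($tick in 1..12) {
    $type = Get-NextBackupType -IncrementalsSinceFull $sinceFull
    if ($type -eq 'Full') {
        # A full backup resets the chain; older incrementals could be deleted here.
        $sinceFull = 0
    } else {
        $sinceFull++
    }
    $type
}

$schedule -join ','
```

Real service code would also need to persist the counter across failovers and clean up storage, which is the extra development cost the paragraph above refers to.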

The backup and restore service in Service Fabric enables easy and automatic backup of information stored in stateful services. Backing up application data on a periodic basis is fundamental for guarding against data loss and service unavailability. Service Fabric provides an optional backup and restore service, which allows you to configure periodic backup of stateful Reliable Services (including Actor Services) without having to write any additional code. It also facilitates restoring previously taken backups.

Service Fabric provides a set of APIs for the following functionality related to the periodic backup and restore feature:

  • Schedule periodic backup of Reliable Stateful services and Reliable Actors, with support for uploading backups to (external) storage locations. Supported storage locations:
    • Azure Storage
    • File share (on-premises)
  • Enumerate backups
  • Trigger an ad hoc backup of a partition
  • Restore a partition using a previous backup
  • Temporarily suspend backups
  • Retention management of backups (upcoming)

Prerequisites

  • Service Fabric cluster with Fabric version 6.4 or above. Refer to this article for steps to create a Service Fabric cluster using an Azure resource template.
  • X.509 certificate for encrypting the secrets needed to connect to the storage where backups are stored. Refer to the article to learn how to get or create an X.509 certificate.
  • Service Fabric Reliable Stateful application built using Service Fabric SDK version 3.0 or above. For applications targeting .NET Core 2.0, the application should be built using Service Fabric SDK version 3.1 or above.
  • Create an Azure Storage account for storing application backups.
  • Install the Microsoft.ServiceFabric.Powershell.Http module [in preview] for making configuration calls.
Install-Module -Name Microsoft.ServiceFabric.Powershell.Http -AllowPrerelease
  • Make sure that the cluster is connected using the Connect-SFCluster command before making any configuration request with the Microsoft.ServiceFabric.Powershell.Http module.

Connect-SFCluster -ConnectionEndpoint 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080' -X509Credential -FindType FindByThumbprint -FindValue '1b7ebe2174649c45474a4819dafae956712c31d3' -StoreLocation 'CurrentUser' -StoreName 'My' -ServerCertThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'  

Enabling backup and restore service

Using Azure portal

Select the Include backup restore service check box under + Show optional settings in the Cluster Configuration tab.

Enable backup restore service with portal

Using Azure Resource Manager Template

First, enable the backup and restore service in your cluster. Get the template for the cluster that you want to deploy. You can either use the sample templates or create a Resource Manager template. Enable the backup and restore service with the following steps:

  1. Check that the apiVersion is set to 2018-02-01 for the Microsoft.ServiceFabric/clusters resource, and if not, update it as shown in the following snippet:

    {
        "apiVersion": "2018-02-01",
        "type": "Microsoft.ServiceFabric/clusters",
        "name": "[parameters('clusterName')]",
        "location": "[parameters('clusterLocation')]",
        ...
    }
    
  2. Now enable the backup and restore service by adding the following addonFeatures section under the properties section, as shown in the following snippet:

    "properties": {
        ...
        "addonFeatures":  ["BackupRestoreService"],
        "fabricSettings": [ ... ]
        ...
    }
    
    
  3. Configure the X.509 certificate for encryption of credentials. This is important to ensure that the credentials provided to connect to storage are encrypted before they are persisted. Configure the encryption certificate by adding the following BackupRestoreService section under the fabricSettings section, as shown in the following snippet:

    "properties": {
        ...
        "addonFeatures": ["BackupRestoreService"],
        "fabricSettings": [{
            "name": "BackupRestoreService",
            "parameters":  [{
                "name": "SecretEncryptionCertThumbprint",
                "value": "[Thumbprint]"
            }]
        }]
        ...
    }
    
  4. Once you have updated your cluster template with the preceding changes, apply them and let the deployment/upgrade complete. Once complete, the backup and restore service starts running in your cluster. The URI of this service is fabric:/System/BackupRestoreService, and the service can be found under the system services section in Service Fabric Explorer.

Enabling periodic backup for Reliable Stateful service and Reliable Actors

Let's walk through the steps to enable periodic backup for Reliable Stateful service and Reliable Actors. These steps assume:

  • The cluster is set up using X.509 security with the backup and restore service.
  • A Reliable Stateful service is deployed on the cluster. For the purpose of this quickstart guide, the application URI is fabric:/SampleApp and the URI of the Reliable Stateful service belonging to this application is fabric:/SampleApp/MyStatefulService. This service is deployed with a single partition, and the partition ID is 974bd92a-b395-4631-8a7f-53bd4ae9cf22.
  • The client certificate with the administrator role is installed in the My (Personal) store of the CurrentUser certificate store location on the machine from which the scripts below will be invoked. This example uses 1b7ebe2174649c45474a4819dafae956712c31d3 as the thumbprint of this certificate. For more information on client certificates, see Role-based access control for Service Fabric clients.

Create backup policy

The first step is to create a backup policy that describes the backup schedule, the target storage for backup data, the policy name, the maximum number of incremental backups allowed before a full backup is triggered, and the retention policy for backup storage.

For backup storage, use the Azure Storage account created above. The container backup-container is configured to store backups. A container with this name is created during backup upload if it does not already exist. Populate ConnectionString with a valid connection string for the Azure Storage account, replacing account-name with your storage account name and account-key with your storage account key.
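As a small convenience, the connection string can be assembled from its parts rather than edited by hand. The sketch below assumes the same connection string layout used in this article; the function name New-StorageConnectionString and the placeholder account values are hypothetical, and the default endpoint suffix is simply the one that appears in this article's examples.

```powershell
# Hypothetical helper: build an Azure Storage connection string from its parts.
function New-StorageConnectionString {
    param(
        [Parameter(Mandatory)][string]$AccountName,
        [Parameter(Mandatory)][string]$AccountKey,
        # Endpoint suffix taken from this article's examples; adjust for your cloud.
        [string]$EndpointSuffix = 'core.chinacloudapi.cn'
    )
    "DefaultEndpointsProtocol=https;AccountName=$AccountName;AccountKey=$AccountKey;EndpointSuffix=$EndpointSuffix"
}

# Placeholder values for illustration only; never hard-code a real key in a script.
$connectionString = New-StorageConnectionString -AccountName 'mystorageaccount' -AccountKey 'placeholder-key'
$connectionString
```

The resulting string can then be passed wherever ConnectionString is expected in the policy definitions that follow.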

PowerShell using Microsoft.ServiceFabric.Powershell.Http Module

Execute the following PowerShell cmdlet to create a new backup policy. Replace account-name with your storage account name and account-key with your storage account key.


New-SFBackupPolicy -Name 'BackupPolicy1' -AutoRestoreOnDataLoss $true -MaxIncrementalBackups 20 -FrequencyBased -Interval 00:15:00 -AzureBlobStore -ConnectionString 'DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.chinacloudapi.cn' -ContainerName 'backup-container' -Basic -RetentionDuration '10.00:00:00'

REST call using PowerShell

Execute the following PowerShell script to invoke the required REST API and create a new policy. Replace account-name with your storage account name and account-key with your storage account key.

$StorageInfo = @{
    ConnectionString = 'DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.chinacloudapi.cn'
    ContainerName = 'backup-container'
    StorageKind = 'AzureBlobStore'
}

$ScheduleInfo = @{
    Interval = 'PT15M'
    ScheduleKind = 'FrequencyBased'
}

$RetentionPolicy = @{ 
    RetentionPolicyType = 'Basic'
    RetentionDuration =  'P10D'
}

$BackupPolicy = @{
    Name = 'BackupPolicy1'
    MaxIncrementalBackups = 20
    Schedule = $ScheduleInfo
    Storage = $StorageInfo
    RetentionPolicy = $RetentionPolicy
}

$body = (ConvertTo-Json $BackupPolicy)
$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4"

Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
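Note that the REST body above expresses durations in ISO 8601 form (PT15M, P10D), whereas the cmdlet example earlier uses TimeSpan-style strings (00:15:00, 10.00:00:00). The following sketch shows one way to convert between the two using the .NET XmlConvert class; the variable names are illustrative only, and nothing here is required by the backup and restore service itself.

```powershell
# Parse the TimeSpan-style values used with the cmdlet.
$interval  = [TimeSpan]::Parse('00:15:00')      # 15 minutes
$retention = [TimeSpan]::Parse('10.00:00:00')   # 10 days

# XmlConvert.ToString(TimeSpan) emits an ISO 8601 (xsd:duration) string.
$isoInterval  = [System.Xml.XmlConvert]::ToString($interval)
$isoRetention = [System.Xml.XmlConvert]::ToString($retention)

# Round-trip back to a TimeSpan to confirm nothing was lost.
$roundTrip = [System.Xml.XmlConvert]::ToTimeSpan($isoInterval)

"Interval: $isoInterval, Retention: $isoRetention"
```

This keeps a single source of truth for the schedule when the same values have to be supplied to both the cmdlet and the REST API.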

Using Service Fabric Explorer

  1. In Service Fabric Explorer, navigate to the Backups tab and select Actions > Create Backup Policy.

    Create Backup Policy

  2. Fill out the information. For Azure clusters, AzureBlobStore should be selected.

    Create Backup Policy Azure Blob Storage

Enable periodic backup

After defining a backup policy that fulfills the data protection requirements of the application, associate the backup policy with the application. Depending on the requirement, the backup policy can be associated with an application, a service, or a partition.

PowerShell using Microsoft.ServiceFabric.Powershell.Http Module


Enable-SFApplicationBackup -ApplicationId 'SampleApp' -BackupPolicyName 'BackupPolicy1'

REST call using PowerShell

Execute the following PowerShell script to invoke the required REST API and associate the backup policy named BackupPolicy1, created in the step above, with the application SampleApp.

$BackupPolicyReference = @{
    BackupPolicyName = 'BackupPolicy1'
}

$body = (ConvertTo-Json $BackupPolicyReference)
$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/Applications/SampleApp/$/EnableBackup?api-version=6.4"

Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
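Because a policy can also be associated at service or partition scope, the request URL changes with the scope. The sketch below builds the URL for each case. It assumes that the Services and Partitions collections expose an EnableBackup operation shaped like the application one shown above, and that the service ID in REST paths is written with a "~" delimiter (SampleApp~MyStatefulService); verify both against the REST API reference for your cluster version. The helper name Get-EnableBackupUrl is hypothetical.

```powershell
# Hypothetical helper: build the EnableBackup URL for a given scope.
function Get-EnableBackupUrl {
    param(
        [Parameter(Mandatory)][string]$ClusterEndpoint,
        [Parameter(Mandatory)][ValidateSet('Application', 'Service', 'Partition')][string]$Scope,
        [Parameter(Mandatory)][string]$Id
    )
    # Map the scope to the REST collection name (assumed layout, see lead-in).
    $collection = switch ($Scope) {
        'Application' { 'Applications' }
        'Service'     { 'Services' }
        'Partition'   { 'Partitions' }
    }
    # The backtick keeps the literal "$" that Service Fabric REST paths use.
    "$ClusterEndpoint/$collection/$Id/`$/EnableBackup?api-version=6.4"
}

$endpoint = 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080'
$appUrl       = Get-EnableBackupUrl -ClusterEndpoint $endpoint -Scope Application -Id 'SampleApp'
$serviceUrl   = Get-EnableBackupUrl -ClusterEndpoint $endpoint -Scope Service -Id 'SampleApp~MyStatefulService'
$partitionUrl = Get-EnableBackupUrl -ClusterEndpoint $endpoint -Scope Partition -Id '974bd92a-b395-4631-8a7f-53bd4ae9cf22'

$appUrl, $serviceUrl, $partitionUrl
```

Each URL would be used with the same request body and certificate thumbprint as the application-level call above.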

Using Service Fabric Explorer

  1. Select an application and go to Actions. Click Enable/Update Application Backup.

    Enable Application Backup

  2. Finally, select the desired policy and click Enable Backup.

    Select Policy

Verify that periodic backups are working

After backup is enabled at the application level, all partitions belonging to Reliable Stateful services and Reliable Actors under the application start getting backed up periodically according to the associated backup policy.

Partition Backup Health Event

List Backups

Backups associated with all partitions belonging to Reliable Stateful services and Reliable Actors of the application can be enumerated using the GetBackups API. Backups can be enumerated for an application, a service, or a partition.

PowerShell using Microsoft.ServiceFabric.Powershell.Http Module


Get-SFApplicationBackupList -ApplicationId 'SampleApp'

REST call using PowerShell

Execute the following PowerShell script to invoke the HTTP API and enumerate the backups created for all partitions inside the SampleApp application.

$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/Applications/SampleApp/$/GetBackups?api-version=6.4"

$response = Invoke-WebRequest -Uri $url -Method Get -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'

$BackupPoints = (ConvertFrom-Json $response.Content)
$BackupPoints.Items

Sample output for the above run:

BackupId                : b9577400-1131-4f88-b309-2bb1e943322c
BackupChainId           : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName         : fabric:/SampleApp
ServiceName             : fabric:/SampleApp/MyStatefulService
PartitionInformation    : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation          : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 20.55.16.zip
BackupType              : Full
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord   : 3334
CreationTimeUtc         : 2018-04-06T20:55:16Z
FailureError            : 

BackupId                : b0035075-b327-41a5-a58f-3ea94b68faa4
BackupChainId           : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName         : fabric:/SampleApp
ServiceName             : fabric:/SampleApp/MyStatefulService
PartitionInformation    : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation          : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 21.10.27.zip
BackupType              : Incremental
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord   : 3552
CreationTimeUtc         : 2018-04-06T21:10:27Z
FailureError            : 

BackupId                : 69436834-c810-4163-9386-a7a800f78359
BackupChainId           : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName         : fabric:/SampleApp
ServiceName             : fabric:/SampleApp/MyStatefulService
PartitionInformation    : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation          : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 21.25.36.zip
BackupType              : Incremental
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord   : 3764
CreationTimeUtc         : 2018-04-06T21:25:36Z
FailureError            : 
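Once the list is available, it is often useful to group the entries by backup chain and pick the most recent one. The sketch below works on hand-built objects that mirror a subset of the fields in the sample output above, so it runs without a cluster; with a live cluster, $items would instead come from $BackupPoints.Items. Sorting by LsnOfLastBackupRecord is an assumption made here for illustration, not a documented ordering rule.

```powershell
# Sample objects mirroring a subset of the fields in the output above.
$items = @(
    [pscustomobject]@{ BackupId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Full';        LsnOfLastBackupRecord = 3334 }
    [pscustomobject]@{ BackupId = 'b0035075-b327-41a5-a58f-3ea94b68faa4'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Incremental'; LsnOfLastBackupRecord = 3552 }
    [pscustomobject]@{ BackupId = '69436834-c810-4163-9386-a7a800f78359'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Incremental'; LsnOfLastBackupRecord = 3764 }
)

# Group backups into chains: each chain starts with a full backup,
# followed by the incrementals that build on it.
$chains = @($items | Group-Object -Property BackupChainId)

# Pick the backup covering the highest log sequence number.
$latest = $items | Sort-Object -Property LsnOfLastBackupRecord | Select-Object -Last 1

foreach ($chain in $chains) {
    $fullCount = @($chain.Group | Where-Object { $_.BackupType -eq 'Full' }).Count
    "Chain $($chain.Name): $($chain.Count) backups, $fullCount full"
}
"Latest backup: $($latest.BackupId) ($($latest.BackupType))"
```

In the sample data, all three backups share one BackupChainId, which matches the pattern of a full backup followed by two incrementals.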

Using Service Fabric Explorer

To view backups in Service Fabric Explorer, navigate to a partition and select the Backups tab.

Enumerate Backups

Limitations/caveats

  • Service Fabric PowerShell cmdlets are in preview mode.
  • No support for Service Fabric clusters on Linux.

Next steps