Periodic backup and restore in an Azure Service Fabric cluster
Service Fabric is a platform for developing and managing reliable, distributed cloud applications. It supports both stateless and stateful microservices. Stateful services can keep important data beyond a single request or transaction. If a stateful service goes down or loses data, it may need to be restored from a recent backup to continue working properly.
Service Fabric replicates the state across multiple nodes to ensure that the service is highly available. Even if one node in the cluster fails, the service continues to be available. In certain cases, however, it's still desirable for the service data to be reliable against broader failures.
For example, a service may want to back up its data to protect against the following scenarios:
- Permanent loss of an entire Service Fabric cluster.
- Permanent loss of a majority of the replicas of a service partition.
- Administrative errors whereby the state accidentally gets deleted or corrupted. For example, an administrator with sufficient privilege erroneously deletes the service.
- Bugs in the service that cause data corruption. For example, this may happen when a service code upgrade starts writing faulty data to a Reliable Collection. In such a case, both the code and the data may have to be reverted to an earlier state.
- Offline data processing. It might be convenient to have offline processing of data for business intelligence that happens separately from the service that generates the data.
Service Fabric provides a built-in API for point-in-time backup and restore. Application developers may use these APIs to back up the state of the service periodically. Additionally, if service administrators want to trigger a backup from outside the service at a specific time, such as before upgrading the application, developers need to expose backup (and restore) as an API from the service. Maintaining the backups is a further cost on top of this. For example, you may want to take five incremental backups every half hour, followed by a full backup. After the full backup, you can delete the prior incremental backups. This approach requires additional code, which adds cost during application development.
The Backup and Restore service in Service Fabric enables easy and automatic backup of information stored in stateful services. Backing up application data on a periodic basis is fundamental for guarding against data loss and service unavailability. Service Fabric provides an optional backup and restore service, which allows you to configure periodic backup of stateful Reliable Services (including Actor Services) without having to write any additional code. It also facilitates restoring previously taken backups.
Service Fabric provides a set of APIs to achieve the following functionality related to the periodic backup and restore feature:
- Schedule periodic backup of Reliable Stateful services and Reliable Actors, with support for uploading backups to (external) storage locations. Supported storage locations:
  - Azure Storage
  - File Share (on-premises)
- Enumerate backups
- Trigger an ad hoc backup of a partition
- Restore a partition using a previous backup
- Temporarily suspend backups
- Retention management of backups (upcoming)
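As a rough sketch of how the ad hoc backup and suspend operations in this list map onto REST calls, the following helper builds the request URLs. The function name `Get-BackupActionUrl` is hypothetical and exists only for illustration. The cluster endpoint, partition ID, and certificate thumbprint are the sample values used later in this article, and the `Backup`, `SuspendBackup`, and `ResumeBackup` paths are assumed to be available with `api-version=6.4` on your cluster.

```powershell
# Sketch only: Get-BackupActionUrl is a hypothetical helper, not part of any Service Fabric module.
function Get-BackupActionUrl {
    param(
        [Parameter(Mandatory)] [string] $ClusterEndpoint,
        [Parameter(Mandatory)] [ValidateSet('Applications', 'Partitions')] [string] $Scope,
        [Parameter(Mandatory)] [string] $Id,
        [Parameter(Mandatory)] [ValidateSet('Backup', 'SuspendBackup', 'ResumeBackup')] [string] $Action
    )
    # An ad hoc backup is requested for one partition at a time.
    if ($Action -eq 'Backup' -and $Scope -ne 'Partitions') {
        throw 'An ad hoc backup is triggered per partition.'
    }
    # The single-quoted format string keeps the literal "$" used in Service Fabric REST paths.
    return ('{0}/{1}/{2}/$/{3}?api-version=6.4' -f $ClusterEndpoint.TrimEnd('/'), $Scope, $Id, $Action)
}

$endpoint   = 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080'
$thumbprint = '1b7ebe2174649c45474a4819dafae956712c31d3'

$backupUrl  = Get-BackupActionUrl -ClusterEndpoint $endpoint -Scope Partitions -Id '974bd92a-b395-4631-8a7f-53bd4ae9cf22' -Action Backup
$suspendUrl = Get-BackupActionUrl -ClusterEndpoint $endpoint -Scope Applications -Id 'SampleApp' -Action SuspendBackup
$resumeUrl  = Get-BackupActionUrl -ClusterEndpoint $endpoint -Scope Applications -Id 'SampleApp' -Action ResumeBackup

$backupUrl
$suspendUrl
$resumeUrl

# To send a request against a real cluster, uncomment a line such as:
# Invoke-WebRequest -Uri $backupUrl -Method Post -ContentType 'application/json' -CertificateThumbprint $thumbprint
```

The request is left commented out so that the snippet can be run without a cluster to inspect the URLs first.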
Prerequisites
- Service Fabric cluster with Fabric version 6.4 or above. Refer to this article for steps to create a Service Fabric cluster using an Azure Resource Manager template.
- X.509 certificate for encrypting the secrets needed to connect to the storage that holds the backups. Refer to this article to learn how to get or create an X.509 certificate.
- Service Fabric Reliable Stateful application built using Service Fabric SDK version 3.0 or above. For applications targeting .NET Core 2.0, the application should be built using Service Fabric SDK version 3.1 or above.
- Create an Azure Storage account for storing application backups.
- Install the Microsoft.ServiceFabric.Powershell.Http module for making configuration calls.
Install-Module -Name Microsoft.ServiceFabric.Powershell.Http -AllowPrerelease
Note
If your PowerShellGet version is less than 1.6.0, you'll need to update to add support for the -AllowPrerelease flag:
Install-Module -Name PowerShellGet -Force
- Make sure that the cluster is connected using the Connect-SFCluster command before making any configuration request using the Microsoft.ServiceFabric.Powershell.Http module.
Connect-SFCluster -ConnectionEndpoint 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080' -X509Credential -FindType FindByThumbprint -FindValue '1b7ebe2174649c45474a4819dafae956712c31d3' -StoreLocation 'CurrentUser' -StoreName 'My' -ServerCertThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
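Before installing the module, it can help to check whether the installed PowerShellGet version already meets the 1.6.0 requirement noted above. The following is a minimal sketch. `Test-SupportsAllowPrerelease` is a hypothetical helper name, and the check assumes the newest PowerShellGet version found on the machine is the one that gets loaded.

```powershell
# Sketch only: Test-SupportsAllowPrerelease is a hypothetical helper for illustration.
function Test-SupportsAllowPrerelease {
    param([Parameter(Mandatory)] [version] $PowerShellGetVersion)
    # The -AllowPrerelease flag requires PowerShellGet 1.6.0 or later.
    return $PowerShellGetVersion -ge [version] '1.6.0'
}

# Pick the newest PowerShellGet version available on this machine.
$installed = Get-Module -Name PowerShellGet -ListAvailable |
    Sort-Object -Property Version -Descending |
    Select-Object -First 1

if ($null -eq $installed -or -not (Test-SupportsAllowPrerelease -PowerShellGetVersion $installed.Version)) {
    Write-Output 'PowerShellGet is missing or older than 1.6.0. Run: Install-Module -Name PowerShellGet -Force'
} else {
    Write-Output "PowerShellGet $($installed.Version) supports -AllowPrerelease."
}
```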
Enabling backup and restore service
Using Azure portal
Enable the Include backup restore service check box under + Show optional settings in the Cluster Configuration tab.
Using Azure Resource Manager Template
First you need to enable the backup and restore service in your cluster. Get the template for the cluster that you want to deploy. You can either use the sample templates or create a Resource Manager template. Enable the backup and restore service with the following steps:
Check that the apiVersion is set to 2018-02-01 for the Microsoft.ServiceFabric/clusters resource, and if not, update it as shown in the following snippet:
{
    "apiVersion": "2018-02-01",
    "type": "Microsoft.ServiceFabric/clusters",
    "name": "[parameters('clusterName')]",
    "location": "[parameters('clusterLocation')]",
    ...
}
Now enable the backup and restore service by adding the following addonFeatures section under the properties section, as shown in the following snippet:
"properties": {
    ...
    "addonFeatures": ["BackupRestoreService"],
    "fabricSettings": [ ... ]
    ...
}
Configure an X.509 certificate for encrypting credentials. This ensures that the credentials provided to connect to storage are encrypted before they are persisted. Configure the encryption certificate by adding the following BackupRestoreService section under the fabricSettings section, as shown in the following snippet:
"properties": {
    ...
    "addonFeatures": ["BackupRestoreService"],
    "fabricSettings": [
        {
            "name": "BackupRestoreService",
            "parameters": [
                {
                    "name": "SecretEncryptionCertThumbprint",
                    "value": "[Thumbprint]"
                },
                {
                    "name": "SecretEncryptionCertX509StoreName",
                    "value": "My"
                }
            ]
        }
    ]
    ...
}
Note
[Thumbprint] needs to be replaced with the thumbprint of a valid certificate to be used for encryption.
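Before deploying, the three template changes described above can be checked with a small script. This is a minimal sketch, not an official validation tool. `Test-BackupRestoreTemplate` is a hypothetical helper, and it assumes the template is plain JSON (no comments) with the cluster resource at the top level of `resources`. The inline sample stands in for your own template file.

```powershell
# Sketch only: Test-BackupRestoreTemplate is a hypothetical helper for illustration.
function Test-BackupRestoreTemplate {
    param([Parameter(Mandatory)] [string] $TemplateJson)

    $template = ConvertFrom-Json -InputObject $TemplateJson
    $cluster  = $template.resources |
        Where-Object { $_.type -eq 'Microsoft.ServiceFabric/clusters' } |
        Select-Object -First 1
    if ($null -eq $cluster) {
        return @('No Microsoft.ServiceFabric/clusters resource found.')
    }

    $problems = @()
    if ($cluster.apiVersion -ne '2018-02-01') {
        $problems += "apiVersion is '$($cluster.apiVersion)', expected '2018-02-01'."
    }
    if ($cluster.properties.addonFeatures -notcontains 'BackupRestoreService') {
        $problems += 'addonFeatures does not include BackupRestoreService.'
    }
    # Look up the encryption certificate thumbprint in the BackupRestoreService section.
    $section    = $cluster.properties.fabricSettings | Where-Object { $_.name -eq 'BackupRestoreService' }
    $thumbprint = $section.parameters | Where-Object { $_.name -eq 'SecretEncryptionCertThumbprint' }
    if (-not $thumbprint.value -or $thumbprint.value -eq '[Thumbprint]') {
        $problems += 'SecretEncryptionCertThumbprint is missing or still a placeholder.'
    }
    return $problems
}

# Inline sample template; replace with (Get-Content -Raw <your template path>) in practice.
$sample = @'
{
  "resources": [
    {
      "apiVersion": "2018-02-01",
      "type": "Microsoft.ServiceFabric/clusters",
      "properties": {
        "addonFeatures": ["BackupRestoreService"],
        "fabricSettings": [
          {
            "name": "BackupRestoreService",
            "parameters": [
              { "name": "SecretEncryptionCertThumbprint", "value": "[Thumbprint]" },
              { "name": "SecretEncryptionCertX509StoreName", "value": "My" }
            ]
          }
        ]
      }
    }
  ]
}
'@

# The sample still contains the [Thumbprint] placeholder, so one problem is reported.
$issues = @(Test-BackupRestoreTemplate -TemplateJson $sample)
$issues
```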
Once you have updated your cluster template with the preceding changes, apply them and let the deployment/upgrade complete. Once complete, the backup and restore service starts running in your cluster. The Uri of this service is fabric:/System/BackupRestoreService, and the service can be found under the system services section in Service Fabric Explorer.
Enabling periodic backup for Reliable Stateful service and Reliable Actors
Let's walk through the steps to enable periodic backup for a Reliable Stateful service and Reliable Actors. These steps assume that:
- The cluster is set up using X.509 security with the backup and restore service.
- A Reliable Stateful service is deployed on the cluster. For this quickstart guide, the application Uri is fabric:/SampleApp and the Uri for the Reliable Stateful service belonging to this application is fabric:/SampleApp/MyStatefulService. This service is deployed with a single partition, and the partition ID is 974bd92a-b395-4631-8a7f-53bd4ae9cf22.
- The client certificate with the administrator role is installed in the My (Personal) store of the CurrentUser certificate store location on the machine from which the scripts below will be invoked. This example uses 1b7ebe2174649c45474a4819dafae956712c31d3 as the thumbprint of this certificate. For more information on client certificates, see Role-based access control for Service Fabric clients.
Create backup policy
The first step is to create a backup policy. This policy should include the backup schedule, target storage for the backup data, policy name, the maximum number of incremental backups allowed before a full backup is triggered, and the retention policy for the backup storage.
For backup storage, use the Azure Storage account created above. The container backup-container is configured to store backups. A container with this name is created during backup upload if it doesn't already exist. Populate BlobServiceUri with the Azure Storage account URL, replacing account-name with your storage account name. If multiple managed identities are assigned to your resource, populate the optional parameter ManagedIdentityClientId with the client ID of the user-assigned managed identity.
Follow these steps to assign a managed identity on the Azure resource:
- Enable a system-assigned or user-assigned managed identity on the virtual machine scale set. See Configure managed identities on virtual machine scale set.
- Assign a role on the storage account to the managed identity of the virtual machine scale set. See Assign Azure roles using the Azure portal - Azure RBAC.
  - Storage Blob Data Contributor role at minimum
For more information, see Managed Identity.
PowerShell using Microsoft.ServiceFabric.Powershell.Http Module
Execute the following PowerShell cmdlet to create a new backup policy. Replace account-name with your storage account name.
New-SFBackupPolicy -Name 'BackupPolicy1' -AutoRestoreOnDataLoss $false -MaxIncrementalBackups 20 -FrequencyBased -Interval "<hh:mm>" -ManagedIdentityAzureBlobStore -FriendlyName "AzureMI_storagesample" -BlobServiceUri 'https://<account-name>.blob.core.chinacloudapi.cn' -ContainerName 'backup-container' -ManagedIdentityType "VMSS" -ManagedIdentityClientId "<Client-Id of User-Assigned MI>" -Basic -RetentionDuration '10.00:00:00'
# The optional parameter `ManagedIdentityClientId` takes the client ID of a user-assigned managed identity (UAMI). Specify it when multiple UAMIs are assigned to your resource, or when both a system-assigned managed identity (SAMI) and a UAMI are assigned and the UAMI should be the default. Otherwise, omit this parameter.
REST call using PowerShell
Execute the following PowerShell script to invoke the REST API that creates a new policy. Replace account-name with your storage account name.
$StorageInfo = @{
    StorageKind             = "ManagedIdentityAzureBlobStore"
    FriendlyName            = "AzureMI_storagesample"
    BlobServiceUri          = "https://<account-name>.blob.core.chinacloudapi.cn"
    ContainerName           = "backup-container"
    ManagedIdentityType     = "VMSS"
    # Optional: the client ID of a user-assigned managed identity (UAMI). Specify it when multiple
    # UAMIs are assigned to your resource, or when both a system-assigned managed identity (SAMI)
    # and a UAMI are assigned and the UAMI should be the default. Otherwise, omit this parameter.
    ManagedIdentityClientId = "<Client-Id of User-Assigned MI>"
}

# Take a backup every 15 minutes (ISO 8601 duration).
$ScheduleInfo = @{
    Interval     = 'PT15M'
    ScheduleKind = 'FrequencyBased'
}

# Keep backups for 10 days (ISO 8601 duration).
$RetentionPolicy = @{
    RetentionPolicyType = 'Basic'
    RetentionDuration   = 'P10D'
}

$BackupPolicy = @{
    Name                  = 'BackupPolicy1'
    MaxIncrementalBackups = 20
    Schedule              = $ScheduleInfo
    Storage               = $StorageInfo
    RetentionPolicy       = $RetentionPolicy
}
$body = (ConvertTo-Json $BackupPolicy)
$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4"
Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
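The policy above uses a frequency-based schedule. As a hedged variation, the sketch below builds the request body for a time-based schedule that runs daily at fixed times. The policy name `DailyBackupPolicy` is hypothetical, and the `TimeBased`, `ScheduleFrequencyType`, and `RunTimes` fields are assumed to be accepted by the backup policy REST API on your cluster version. `ManagedIdentityClientId` is omitted because it is optional.

```powershell
# Sketch only: a time-based variant of the backup policy body. DailyBackupPolicy is a hypothetical name.
$ScheduleInfo = @{
    ScheduleKind          = 'TimeBased'
    ScheduleFrequencyType = 'Daily'
    # Only the time of day is meaningful here; the date portion is a placeholder.
    RunTimes              = @('0001-01-01T09:00:00Z', '0001-01-01T17:00:00Z')
}

$StorageInfo = @{
    StorageKind         = 'ManagedIdentityAzureBlobStore'
    FriendlyName        = 'AzureMI_storagesample'
    BlobServiceUri      = 'https://<account-name>.blob.core.chinacloudapi.cn'
    ContainerName       = 'backup-container'
    ManagedIdentityType = 'VMSS'
}

$RetentionPolicy = @{
    RetentionPolicyType = 'Basic'
    RetentionDuration   = 'P10D'
}

$BackupPolicy = @{
    Name                  = 'DailyBackupPolicy'
    MaxIncrementalBackups = 20
    Schedule              = $ScheduleInfo
    Storage               = $StorageInfo
    RetentionPolicy       = $RetentionPolicy
}

# An explicit depth makes sure the nested schedule array is serialized in full.
$body = ConvertTo-Json -InputObject $BackupPolicy -Depth 4
$body

# To create the policy on a real cluster, uncomment:
# $url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/BackupRestore/BackupPolicies/$/Create?api-version=6.4"
# Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
```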
Using Service Fabric Explorer
In Service Fabric Explorer, select Cluster in the left panel, navigate to the Backups tab, and select Actions > Create Backup Policy.
Fill out the information. For details on how to specify a frequency-based interval, see the TimeGrain property. For Azure clusters, select ManagedIdentityAzureBlobStore.
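The REST examples in this article write intervals and retention periods as ISO 8601 durations, such as PT15M and P10D. As a small sketch, a .NET TimeSpan can be converted to that form with `System.Xml.XmlConvert`. The helper name `ConvertTo-Iso8601Duration` is hypothetical.

```powershell
# Sketch only: ConvertTo-Iso8601Duration is a hypothetical helper for illustration.
function ConvertTo-Iso8601Duration {
    param([Parameter(Mandatory)] [TimeSpan] $Duration)
    # XmlConvert formats a TimeSpan as an xsd:duration, which follows ISO 8601.
    return [System.Xml.XmlConvert]::ToString($Duration)
}

ConvertTo-Iso8601Duration -Duration ([TimeSpan]::FromMinutes(15))   # PT15M
ConvertTo-Iso8601Duration -Duration ([TimeSpan]::FromDays(10))      # P10D
```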
Enable periodic backup
After you define a backup policy that fulfills the data protection requirements of the application, associate the policy with the application. Depending on the requirement, the backup policy can be associated with an application, a service, or a partition.
PowerShell using Microsoft.ServiceFabric.Powershell.Http Module
Enable-SFApplicationBackup -ApplicationId 'SampleApp' -BackupPolicyName 'BackupPolicy1'
REST call using PowerShell
Execute the following PowerShell script to invoke the REST API that associates the backup policy named BackupPolicy1, created in the preceding step, with the application SampleApp.
$BackupPolicyReference = @{
    BackupPolicyName = 'BackupPolicy1'
}
$body = (ConvertTo-Json $BackupPolicyReference)
$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/Applications/SampleApp/$/EnableBackup?api-version=6.4"
Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
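The same policy can also be associated at the service or partition level. The sketch below builds the corresponding URLs. `ConvertTo-FabricId` and `Get-EnableBackupUrl` are hypothetical helpers, and the snippet assumes the `EnableBackup` path is available under `Services` and `Partitions` with `api-version=6.4`, and that hierarchical names in REST paths use the "~" delimiter (Service Fabric 6.0 and later).

```powershell
# Sketch only: both helpers are hypothetical and not part of any Service Fabric module.

# Convert a fabric:/ URI into the ID form used in REST paths.
function ConvertTo-FabricId {
    param([Parameter(Mandatory)] [string] $FabricUri)
    return ($FabricUri -replace '^fabric:/', '') -replace '/', '~'
}

# Build the EnableBackup URL for an application, a service, or a partition.
function Get-EnableBackupUrl {
    param(
        [Parameter(Mandatory)] [string] $ClusterEndpoint,
        [Parameter(Mandatory)] [ValidateSet('Applications', 'Services', 'Partitions')] [string] $Scope,
        [Parameter(Mandatory)] [string] $Id
    )
    return ('{0}/{1}/{2}/$/EnableBackup?api-version=6.4' -f $ClusterEndpoint.TrimEnd('/'), $Scope, $Id)
}

$endpoint = 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080'
$body     = ConvertTo-Json @{ BackupPolicyName = 'BackupPolicy1' }

$serviceId    = ConvertTo-FabricId -FabricUri 'fabric:/SampleApp/MyStatefulService'
$serviceUrl   = Get-EnableBackupUrl -ClusterEndpoint $endpoint -Scope Services -Id $serviceId
$partitionUrl = Get-EnableBackupUrl -ClusterEndpoint $endpoint -Scope Partitions -Id '974bd92a-b395-4631-8a7f-53bd4ae9cf22'

$serviceUrl
$partitionUrl

# To send a request against a real cluster, uncomment:
# Invoke-WebRequest -Uri $serviceUrl -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
```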
Using Service Fabric Explorer
Make sure the BackupRestoreService is enabled on the cluster.
Open Service Fabric Explorer.
Select an application and go to the Backup section. Click Backup Action.
Click Enable/Update Application Backup.
Finally, select the desired policy and click Enable Backup.
Verify that periodic backups are working
After backup is enabled at the application level, all partitions belonging to Reliable Stateful services and Reliable Actors under the application are backed up periodically according to the associated backup policy.
List Backups
Backups associated with all partitions belonging to Reliable Stateful services and Reliable Actors of the application can be enumerated using the GetBackups API. Backups can be enumerated for an application, a service, or a partition.
PowerShell using Microsoft.ServiceFabric.Powershell.Http Module
Get-SFApplicationBackupList -ApplicationId 'SampleApp'
REST call using PowerShell
Execute the following PowerShell script to invoke the HTTP API to enumerate the backups created for all partitions inside the SampleApp application.
$url = "https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080/Applications/SampleApp/$/GetBackups?api-version=6.4"
$response = Invoke-WebRequest -Uri $url -Method Get -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
$BackupPoints = (ConvertFrom-Json $response.Content)
$BackupPoints.Items
Sample output for the above run:
BackupId : b9577400-1131-4f88-b309-2bb1e943322c
BackupChainId : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName : fabric:/SampleApp
ServiceName : fabric:/SampleApp/MyStatefulService
PartitionInformation : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 20.55.16.zip
BackupType : Full
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord : 3334
CreationTimeUtc : 2018-04-06T20:55:16Z
FailureError :

BackupId : b0035075-b327-41a5-a58f-3ea94b68faa4
BackupChainId : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName : fabric:/SampleApp
ServiceName : fabric:/SampleApp/MyStatefulService
PartitionInformation : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 21.10.27.zip
BackupType : Incremental
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord : 3552
CreationTimeUtc : 2018-04-06T21:10:27Z
FailureError :

BackupId : 69436834-c810-4163-9386-a7a800f78359
BackupChainId : b9577400-1131-4f88-b309-2bb1e943322c
ApplicationName : fabric:/SampleApp
ServiceName : fabric:/SampleApp/MyStatefulService
PartitionInformation : @{LowKey=-9223372036854775808; HighKey=9223372036854775807; ServicePartitionKind=Int64Range; Id=974bd92a-b395-4631-8a7f-53bd4ae9cf22}
BackupLocation : SampleApp\MyStatefulService\974bd92a-b395-4631-8a7f-53bd4ae9cf22\2018-04-06 21.25.36.zip
BackupType : Incremental
EpochOfLastBackupRecord : @{DataLossNumber=131675205859825409; ConfigurationNumber=8589934592}
LsnOfLastBackupRecord : 3764
CreationTimeUtc : 2018-04-06T21:25:36Z
FailureError :
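In the sample output, the full backup and the two incremental backups that follow it share one BackupChainId. The sketch below groups backup records by chain and reports the most recent backup in each. It is self-contained: the `$items` array is a trimmed copy of the sample output above and stands in for `$BackupPoints.Items` from a real response.

```powershell
# Sketch only: $items is a trimmed copy of the sample output, standing in for $BackupPoints.Items.
$items = @(
    [pscustomobject]@{ BackupId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Full';        CreationTimeUtc = '2018-04-06T20:55:16Z' }
    [pscustomobject]@{ BackupId = 'b0035075-b327-41a5-a58f-3ea94b68faa4'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Incremental'; CreationTimeUtc = '2018-04-06T21:10:27Z' }
    [pscustomobject]@{ BackupId = '69436834-c810-4163-9386-a7a800f78359'; BackupChainId = 'b9577400-1131-4f88-b309-2bb1e943322c'; BackupType = 'Incremental'; CreationTimeUtc = '2018-04-06T21:25:36Z' }
)

# Summarize each backup chain: how many backups it holds and which one is the latest.
$chains = $items | Group-Object -Property BackupChainId | ForEach-Object {
    $ordered = @($_.Group | Sort-Object -Property CreationTimeUtc)
    [pscustomobject]@{
        BackupChainId         = $_.Name
        FullBackups           = @($ordered | Where-Object BackupType -eq 'Full').Count
        IncrementalBackups    = @($ordered | Where-Object BackupType -eq 'Incremental').Count
        LatestBackupId        = $ordered[-1].BackupId
        LatestCreationTimeUtc = $ordered[-1].CreationTimeUtc
    }
}

$chains | Format-List
```

For the sample data this yields a single chain with one full backup and two incremental backups, with 69436834-c810-4163-9386-a7a800f78359 as the latest backup.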
Using Service Fabric Explorer
To view backups in Service Fabric Explorer, navigate to a partition and select the Backups tab.
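The GetBackups call shown earlier returns every backup for the application. As a hedged sketch, the list can be narrowed with query parameters. `Get-BackupListUrl` is a hypothetical helper, and the `Latest`, `StartDateTimeFilter`, and `EndDateTimeFilter` parameter names are assumed to be supported by the GetBackups REST API on your cluster version.

```powershell
# Sketch only: Get-BackupListUrl is a hypothetical helper for illustration.
function Get-BackupListUrl {
    param(
        [Parameter(Mandatory)] [string] $ClusterEndpoint,
        [Parameter(Mandatory)] [string] $ApplicationId,
        [switch] $Latest,
        [string] $StartDateTimeFilter,   # ISO 8601 date-time, for example 2018-04-06T21:00:00Z
        [string] $EndDateTimeFilter      # ISO 8601 date-time
    )
    $query = @('api-version=6.4')
    if ($Latest) { $query += 'Latest=true' }
    if ($StartDateTimeFilter) { $query += 'StartDateTimeFilter=' + [uri]::EscapeDataString($StartDateTimeFilter) }
    if ($EndDateTimeFilter)   { $query += 'EndDateTimeFilter=' + [uri]::EscapeDataString($EndDateTimeFilter) }
    return ('{0}/Applications/{1}/$/GetBackups?{2}' -f $ClusterEndpoint.TrimEnd('/'), $ApplicationId, ($query -join '&'))
}

$endpoint  = 'https://mysfcluster.chinaeast.cloudapp.chinacloudapi.cn:19080'
$latestUrl = Get-BackupListUrl -ClusterEndpoint $endpoint -ApplicationId 'SampleApp' -Latest
$rangeUrl  = Get-BackupListUrl -ClusterEndpoint $endpoint -ApplicationId 'SampleApp' -StartDateTimeFilter '2018-04-06T21:00:00Z'

$latestUrl
$rangeUrl

# To query a real cluster, uncomment:
# $response = Invoke-WebRequest -Uri $latestUrl -Method Get -CertificateThumbprint '1b7ebe2174649c45474a4819dafae956712c31d3'
# (ConvertFrom-Json $response.Content).Items
```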
Limitations and caveats
- Service Fabric PowerShell cmdlets are in preview mode.
- No support for Service Fabric clusters on Linux.