Fault tolerance and storage efficiency in Azure Stack HCI

Applies to: Azure Stack HCI, version 20H2; Windows Server 2019

This topic introduces the resiliency options available in Storage Spaces Direct and outlines the scale requirements, storage efficiency, and general advantages and tradeoffs of each. It also presents some usage instructions to get you started, and references papers, blogs, and additional content where you can learn more.

If you are already familiar with Storage Spaces, you may want to skip to the Summary section.

Overview

At its heart, Storage Spaces is about providing fault tolerance, often called "resiliency," for your data. Its implementation is similar to RAID, except that it is distributed across servers and implemented in software.

As with RAID, there are a few different ways Storage Spaces can do this, which make different tradeoffs between fault tolerance, storage efficiency, and compute complexity. These broadly fall into two categories: "mirroring" and "parity," the latter sometimes called "erasure coding."

Mirroring

Mirroring provides fault tolerance by keeping multiple copies of all data. It most closely resembles RAID-1. How that data is striped and placed is non-trivial (see this blog to learn more), but any data stored using mirroring is always written, in its entirety, multiple times. Each copy is written to different physical hardware (different drives in different servers) that is assumed to fail independently.

Storage Spaces offers two flavors of mirroring: "two-way" and "three-way."

Two-way mirror

Two-way mirroring writes two copies of everything. Its storage efficiency is 50 percent: to write 1 TB of data, you need at least 2 TB of physical storage capacity. Likewise, you need at least two hardware "fault domains"; with Storage Spaces Direct, that means two servers.

[Figure: two-way mirror]

Warning

If you have more than two servers, we recommend using three-way mirroring instead.

Three-way mirror

Three-way mirroring writes three copies of everything. Its storage efficiency is 33.3 percent: to write 1 TB of data, you need at least 3 TB of physical storage capacity. Likewise, you need at least three hardware fault domains; with Storage Spaces Direct, that means three servers.

Three-way mirroring can safely tolerate at least two hardware problems (drive or server) at a time. For example, if you're rebooting one server when suddenly another drive or server fails, all data remains safe and continuously accessible.

[Figure: three-way mirror]
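The mirroring arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in this section; the function names are hypothetical and are not part of any Storage Spaces API.

```python
def mirror_efficiency(copies: int) -> float:
    """Fraction of raw capacity that holds unique data when every
    byte is written `copies` times (2 = two-way, 3 = three-way)."""
    if copies < 2:
        raise ValueError("mirroring needs at least two copies")
    return 1.0 / copies


def raw_capacity_needed(usable_tb: float, copies: int) -> float:
    """Minimum physical capacity (TB) needed to store `usable_tb` of data."""
    return usable_tb * copies


if __name__ == "__main__":
    for name, copies in (("two-way", 2), ("three-way", 3)):
        print(
            f"{name} mirror: {mirror_efficiency(copies):.1%} efficient, "
            f"1 TB of data needs {raw_capacity_needed(1, copies):.0f} TB raw"
        )
```

Running the script reproduces the 50 percent and 33.3 percent figures, along with the 2 TB and 3 TB raw capacity requirements for 1 TB of data.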

Parity

Parity encoding, often called "erasure coding," provides fault tolerance using bitwise arithmetic, which can get remarkably complicated. The way this works is less obvious than mirroring, and there are many great online resources (for example, this third-party Dummies Guide to Erasure Coding) that can help you get the idea. Suffice it to say that it provides better storage efficiency without compromising fault tolerance.

Storage Spaces offers two flavors of parity: "single" parity and "dual" parity, the latter employing an advanced technique called "local reconstruction codes" at larger scales.

Important

We recommend using mirroring for most performance-sensitive workloads. To learn more about how to balance performance and capacity depending on your workload, see Plan volumes.

Single parity

Single parity keeps only one bitwise parity symbol, which provides fault tolerance against only one failure at a time. It most closely resembles RAID-5. To use single parity, you need at least three hardware fault domains; with Storage Spaces Direct, that means three servers. Because three-way mirroring provides more fault tolerance at the same scale, we discourage using single parity. It is fully supported, however, if you insist on using it.

Warning

We discourage using single parity because it can only safely tolerate one hardware failure at a time: if you're rebooting one server when suddenly another drive or server fails, you will experience downtime. If you only have three servers, we recommend using three-way mirroring. If you have four or more, see the next section.
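To make the idea of a single bitwise parity symbol concrete, here is a minimal Python sketch that uses XOR, the classic RAID-5 style parity. It is an illustration only: the helper name and the tiny four-byte blocks are assumptions, and it does not reflect how Storage Spaces actually stripes or encodes data.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data symbols plus one parity symbol, one per fault domain.
data = [b"AAAA", b"BBBB"]
parity = xor_blocks(data)

# Lose the first data symbol, then rebuild it from the survivors.
rebuilt = xor_blocks([data[1], parity])
assert rebuilt == data[0]
```

With only one parity symbol there is one equation per byte position, so only one missing symbol can be solved for. That is why a second simultaneous failure causes downtime.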

Dual parity

Dual parity implements Reed-Solomon error-correcting codes to keep two bitwise parity symbols, thereby providing the same fault tolerance as three-way mirroring (that is, up to two failures at once), but with better storage efficiency. It most closely resembles RAID-6. To use dual parity, you need at least four hardware fault domains; with Storage Spaces Direct, that means four servers. At that scale, the storage efficiency is 50 percent: to store 2 TB of data, you need 4 TB of physical storage capacity.

[Figure: dual parity]

The storage efficiency of dual parity increases the more hardware fault domains you have, from 50 percent up to 80 percent. For example, at seven fault domains (with Storage Spaces Direct, that means seven servers) the efficiency jumps to 66.7 percent: to store 4 TB of data, you need just 6 TB of physical storage capacity.

[Figure: dual parity at larger scale]

See the Summary section for the efficiency of dual parity and local reconstruction codes at every scale.
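The efficiency figures for dual parity follow from the ratio of data symbols to total symbols. The Python sketch below reproduces the two examples given above, using the RS 2+2 and RS 4+2 layouts listed in the Summary section; the function names are hypothetical.

```python
def parity_efficiency(data_symbols: int, parity_symbols: int) -> float:
    """Storage efficiency of a layout with the given symbol counts."""
    return data_symbols / (data_symbols + parity_symbols)


def raw_capacity_needed(usable_tb: float, data_symbols: int, parity_symbols: int) -> float:
    """Physical capacity (TB) needed to store `usable_tb` of data."""
    return usable_tb * (data_symbols + parity_symbols) / data_symbols


if __name__ == "__main__":
    # Four fault domains: two data symbols plus two parity symbols.
    print(f"RS 2+2: {parity_efficiency(2, 2):.1%}, "
          f"2 TB needs {raw_capacity_needed(2, 2, 2):.0f} TB raw")
    # Seven fault domains: four data symbols plus two parity symbols.
    print(f"RS 4+2: {parity_efficiency(4, 2):.1%}, "
          f"4 TB needs {raw_capacity_needed(4, 4, 2):.0f} TB raw")
```

The parity overhead stays fixed at two symbols, so widening the stripe with more data symbols is what raises efficiency as fault domains are added.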

Local reconstruction codes

Storage Spaces introduces an advanced technique developed by Microsoft Research called "local reconstruction codes," or LRC. At large scale, dual parity uses LRC to split its encoding/decoding into a few smaller groups, to reduce the overhead required to make writes or recover from failures.

With hard disk drives (HDD), the group size is four symbols; with solid-state drives (SSD), the group size is six symbols. For example, here's what the layout looks like with hard disk drives and 12 hardware fault domains (meaning 12 servers): there are two groups of four data symbols. It achieves 72.7 percent storage efficiency.

[Figure: local reconstruction codes]

We recommend this in-depth yet eminently readable walk-through of how local reconstruction codes handle various failure scenarios, and why they're appealing.
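The following Python sketch works through the LRC numbers. It assumes that a layout written LRC (8, 2, 1) means 8 data symbols, 2 local parity symbols (one per group), and 1 global parity symbol, which is consistent with the 72.7 percent figure above. The read-count comparison is a simplified model of why smaller groups reduce repair overhead, not a description of the actual Storage Spaces repair path, and all names are hypothetical.

```python
def lrc_efficiency(data: int, local_parities: int, global_parities: int) -> float:
    """Data symbols divided by all symbols in the layout."""
    return data / (data + local_parities + global_parities)


def split_into_groups(symbols, group_size):
    """Partition data symbols into consecutive local groups."""
    return [symbols[i:i + group_size] for i in range(0, len(symbols), group_size)]


def reads_for_single_repair(data: int, group_size=None) -> int:
    """Symbols read to rebuild one lost data symbol (simplified model).

    Without local groups, all `data` surviving symbols are read.
    With local groups, only the rest of the group plus its local
    parity are read, which is `group_size` symbols in total.
    """
    return data if group_size is None else group_size


if __name__ == "__main__":
    groups = split_into_groups([f"D{i}" for i in range(8)], group_size=4)
    print(groups)  # two groups of four data symbols
    print(f"LRC (8, 2, 1): {lrc_efficiency(8, 2, 1):.1%}")
    print("reads without groups:", reads_for_single_repair(8))
    print("reads with groups of 4:", reads_for_single_repair(8, group_size=4))
```

Under this model, rebuilding one lost symbol in the 12-server HDD layout touches four symbols instead of eight, at the cost of one extra parity symbol compared with two plain parity symbols.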

Mirror-accelerated parity

A Storage Spaces Direct volume can be part mirror and part parity. Writes land first in the mirrored portion and are gradually moved into the parity portion later. Effectively, this is using mirroring to accelerate erasure coding.

To mix three-way mirror and dual parity, you need at least four fault domains, meaning four servers.

The storage efficiency of mirror-accelerated parity is in between what you'd get from using all mirror or all parity, and depends on the proportions you choose.
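As a rough illustration of how the proportions affect efficiency, the Python sketch below treats a mixed volume as a mirror portion and a parity portion, each with its own footprint. This is a simple weighted model under stated assumptions: three-way mirror at 33.3 percent and dual parity at 50 percent (the four-server case), with a hypothetical function name. It is not a statement of how Storage Spaces accounts for capacity.

```python
def mixed_efficiency(mirror_tb: float, parity_tb: float,
                     mirror_eff: float = 1 / 3, parity_eff: float = 0.5) -> float:
    """Overall efficiency of a volume with `mirror_tb` usable capacity in
    the mirror portion and `parity_tb` usable capacity in the parity portion."""
    usable = mirror_tb + parity_tb
    if usable <= 0:
        raise ValueError("volume must have some usable capacity")
    # Raw footprint of each portion is its usable size divided by its efficiency.
    footprint = mirror_tb / mirror_eff + parity_tb / parity_eff
    return usable / footprint


if __name__ == "__main__":
    for mirror_tb, parity_tb in ((5, 0), (1, 4), (0, 5)):
        eff = mixed_efficiency(mirror_tb, parity_tb)
        print(f"{mirror_tb} TB mirror + {parity_tb} TB parity: {eff:.1%}")
```

An all-mirror volume comes out at 33.3 percent and an all-parity volume at 50 percent, with every mix landing between the two.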

Important

We recommend using mirroring for most performance-sensitive workloads. To learn more about how to balance performance and capacity depending on your workload, see Plan volumes.

Summary

This section summarizes the resiliency types available in Storage Spaces Direct, the minimum scale requirements to use each type, how many failures each type can tolerate, and the corresponding storage efficiency.

Resiliency types

Resiliency          Failure tolerance   Storage efficiency
Two-way mirror      1                   50.0%
Three-way mirror    2                   33.3%
Dual parity         2                   50.0% - 80.0%
Mixed               2                   33.3% - 80.0%

Minimum scale requirements

Resiliency          Minimum required fault domains
Two-way mirror      2
Three-way mirror    3
Dual parity         4
Mixed               4

Tip

Unless you are using chassis or rack fault tolerance, the number of fault domains refers to the number of servers. The number of drives in each server does not affect which resiliency types you can use, as long as you meet the minimum requirements for Storage Spaces Direct.
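The minimum scale table can be turned into a small lookup. The Python sketch below encodes only the four rows of that table; the dictionary and function names are hypothetical.

```python
# Minimum fault domains per resiliency type, copied from the table above.
MIN_FAULT_DOMAINS = {
    "Two-way mirror": 2,
    "Three-way mirror": 3,
    "Dual parity": 4,
    "Mixed": 4,
}


def available_resiliency_types(fault_domains: int) -> list:
    """Resiliency types whose minimum scale is met by `fault_domains`."""
    return [
        name
        for name, minimum in MIN_FAULT_DOMAINS.items()
        if fault_domains >= minimum
    ]


if __name__ == "__main__":
    for servers in (2, 3, 4):
        print(servers, "servers:", ", ".join(available_resiliency_types(servers)))
```

Following the tip above, the input is the number of fault domains (normally servers), not the number of drives.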

Dual parity efficiency for hybrid deployments

This table shows the storage efficiency of dual parity and local reconstruction codes at each scale for hybrid deployments, which contain both hard disk drives (HDD) and solid-state drives (SSD).

Fault domains   Layout          Efficiency
2               -               -
3               -               -
4               RS 2+2          50.0%
5               RS 2+2          50.0%
6               RS 2+2          50.0%
7               RS 4+2          66.7%
8               RS 4+2          66.7%
9               RS 4+2          66.7%
10              RS 4+2          66.7%
11              RS 4+2          66.7%
12              LRC (8, 2, 1)   72.7%
13              LRC (8, 2, 1)   72.7%
14              LRC (8, 2, 1)   72.7%
15              LRC (8, 2, 1)   72.7%
16              LRC (8, 2, 1)   72.7%

Dual parity efficiency for all-flash deployments

This table shows the storage efficiency of dual parity and local reconstruction codes at each scale for all-flash deployments, which contain only solid-state drives (SSD). The parity layout can use larger group sizes and achieve better storage efficiency in an all-flash configuration.

Fault domains   Layout           Efficiency
2               -                -
3               -                -
4               RS 2+2           50.0%
5               RS 2+2           50.0%
6               RS 2+2           50.0%
7               RS 4+2           66.7%
8               RS 4+2           66.7%
9               RS 6+2           75.0%
10              RS 6+2           75.0%
11              RS 6+2           75.0%
12              RS 6+2           75.0%
13              RS 6+2           75.0%
14              RS 6+2           75.0%
15              RS 6+2           75.0%
16              LRC (12, 2, 1)   80.0%
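Both efficiency tables follow a threshold pattern, so they can be expressed compactly in code. The Python sketch below encodes the hybrid and all-flash tables as (minimum fault domains, layout, data symbols, parity symbols) entries and derives the percentage from the symbol counts. The names are hypothetical, and the sketch rejects values outside 2 to 16 only because the tables do not cover them.

```python
# (minimum fault domains, layout name, data symbols, total parity symbols)
LAYOUTS = {
    "hybrid": [
        (4, "RS 2+2", 2, 2),
        (7, "RS 4+2", 4, 2),
        (12, "LRC (8, 2, 1)", 8, 3),
    ],
    "all-flash": [
        (4, "RS 2+2", 2, 2),
        (7, "RS 4+2", 4, 2),
        (9, "RS 6+2", 6, 2),
        (16, "LRC (12, 2, 1)", 12, 3),
    ],
}


def dual_parity_layout(fault_domains: int, deployment: str):
    """Return (layout, efficiency percent), or None below four fault domains."""
    if not 2 <= fault_domains <= 16:
        raise ValueError("the tables cover 2 to 16 fault domains")
    chosen = None
    for minimum, name, data, parity in LAYOUTS[deployment]:
        if fault_domains >= minimum:
            # Later entries have higher thresholds, so the last match wins.
            chosen = (name, round(100 * data / (data + parity), 1))
    return chosen


if __name__ == "__main__":
    for kind in ("hybrid", "all-flash"):
        for count in (3, 4, 9, 12, 16):
            print(kind, count, dual_parity_layout(count, kind))
```

Deriving the percentage from symbol counts instead of hard-coding it gives a quick consistency check against the published figures.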

Examples

Unless you have only two servers, we recommend using three-way mirroring and/or dual parity, because they offer better fault tolerance. Specifically, they ensure that all data remains safe and continuously accessible even when two fault domains (with Storage Spaces Direct, that means two servers) are affected by simultaneous failures.

Examples where everything stays online

These six examples show what three-way mirroring and/or dual parity can tolerate.

  • 1. One drive lost (includes cache drives)
  • 2. One server lost

[Figure: fault tolerance examples 1 and 2]

  • 3. One server and one drive lost
  • 4. Two drives lost in different servers

[Figure: fault tolerance examples 3 and 4]

  • 5. More than two drives lost, so long as at most two servers are affected
  • 6. Two servers lost

[Figure: fault tolerance examples 5 and 6]

In every case, all volumes stay online. (Make sure your cluster maintains quorum.)

Examples where everything goes offline

Over its lifetime, Storage Spaces can tolerate any number of failures, because it restores to full resiliency after each one, given sufficient time. However, at most two fault domains can safely be affected by failures at any given moment. The following are therefore examples of what three-way mirroring and/or dual parity cannot tolerate.

  • 7. Drives lost in three or more servers at once
  • 8. Three or more servers lost at once

[Figure: fault tolerance examples 7 and 8]
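The eight examples reduce to one rule: count the distinct fault domains affected at the same moment and compare that count with two. The Python sketch below applies that rule. It is a simplification: it assumes servers are the fault domains and that all listed failures are simultaneous, and it ignores cluster quorum. The function names and the server and drive labels are hypothetical.

```python
def affected_fault_domains(failed_servers=(), failed_drives=()):
    """Set of servers touched by any failure.

    `failed_drives` is an iterable of (server, drive) pairs.
    """
    return set(failed_servers) | {server for server, _drive in failed_drives}


def volumes_stay_online(failed_servers=(), failed_drives=(), tolerated=2):
    """True if no more than `tolerated` fault domains are affected."""
    return len(affected_fault_domains(failed_servers, failed_drives)) <= tolerated


if __name__ == "__main__":
    # Example 5: three drives lost, but only two servers affected.
    print(volumes_stay_online(failed_drives=[("s1", "d1"), ("s1", "d2"), ("s2", "d1")]))
    # Example 7: drives lost in three servers at once.
    print(volumes_stay_online(failed_drives=[("s1", "d1"), ("s2", "d1"), ("s3", "d1")]))
```

Counting servers instead of drives is what makes example 5 survivable even though more than two drives are lost.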

Usage

To get started, see Create volumes.

Next steps

For further reading on subjects mentioned in this article, see the following: