Client-side encryption with Python

Overview

The Azure Storage Client Library for Python supports encrypting data within client applications before uploading to Azure Storage, and decrypting data while downloading to the client.

Note

The Azure Storage Python library is in preview.

Encryption and decryption via the envelope technique

The processes of encryption and decryption follow the envelope technique.

Encryption via the envelope technique

Encryption via the envelope technique works in the following way:

  1. The Azure storage client library generates a content encryption key (CEK), which is a one-time-use symmetric key.
  2. User data is encrypted using this CEK.
  3. The CEK is then wrapped (encrypted) using the key encryption key (KEK). The KEK is identified by a key identifier and can be an asymmetric key pair or a symmetric key, which is managed locally. The storage client library itself never has access to the KEK. The library invokes the key wrapping algorithm that is provided by the KEK. Users can choose to use custom providers for key wrapping/unwrapping if desired.
  4. The encrypted data is then uploaded to the Azure Storage service. The wrapped key, along with some additional encryption metadata, is either stored as metadata (on a blob) or interpolated with the encrypted data (queue messages and table entities).

Decryption via the envelope technique

Decryption via the envelope technique works in the following way:

  1. The client library assumes that the user is managing the key encryption key (KEK) locally. The user does not need to know the specific key that was used for encryption. Instead, a key resolver, which resolves different key identifiers to keys, can be set up and used.
  2. The client library downloads the encrypted data along with any encryption material that is stored on the service.
  3. The wrapped content encryption key (CEK) is then unwrapped (decrypted) using the key encryption key (KEK). Here again, the client library does not have access to the KEK. It simply invokes the custom provider's unwrapping algorithm.
  4. The content encryption key (CEK) is then used to decrypt the encrypted user data.
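
The two flows above can be sketched end to end with only the standard library. This is an illustration under stated assumptions, not the library's implementation: `toy_cipher` is a hypothetical, insecure XOR keystream standing in for AES, and `envelope_encrypt`, `envelope_decrypt`, `wrap`, and `unwrap_by_kid` are names invented for this sketch.

```python
import hashlib
import os


def toy_cipher(key, iv, data):
    """XOR data with a SHA-256 counter keystream.

    NOT secure: a stand-in for AES so the sketch needs only the standard library.
    Applying it twice with the same key and IV returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(plaintext, wrap, kid):
    cek = os.urandom(32)                          # 1. one-time content encryption key
    iv = os.urandom(16)
    ciphertext = toy_cipher(cek, iv, plaintext)   # 2. encrypt the data with the CEK
    wrapped_cek = wrap(cek)                       # 3. the KEK wraps the CEK; the raw KEK is never seen here
    # 4. this dict stands for everything that would be uploaded to the service
    return {"ciphertext": ciphertext, "iv": iv, "wrapped_cek": wrapped_cek, "kid": kid}


def envelope_decrypt(envelope, resolve_unwrap):
    unwrap = resolve_unwrap(envelope["kid"])      # key resolver: key identifier -> unwrap callable
    cek = unwrap(envelope["wrapped_cek"])
    return toy_cipher(cek, envelope["iv"], envelope["ciphertext"])


# Hypothetical locally managed KEK, exposed only through a wrap/unwrap callable.
_kek_secret = os.urandom(32)


def wrap(cek):
    return toy_cipher(_kek_secret, b"kek-wrap", cek)


unwrap_by_kid = {"local:key1": wrap}  # the XOR keystream is its own inverse

envelope = envelope_encrypt(b"hello envelope", wrap, "local:key1")
recovered = envelope_decrypt(envelope, unwrap_by_kid.__getitem__)
```

Only the wrapped CEK, the IV, and the key identifier travel with the ciphertext; the KEK secret stays on the client.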

Encryption Mechanism

The storage client library uses AES to encrypt user data, specifically in Cipher Block Chaining (CBC) mode. Each service works somewhat differently, so each of them is discussed here.

Blobs

The client library currently supports encryption of whole blobs only. Specifically, encryption is supported when users use the create* methods. For downloads, both complete and range downloads are supported, and parallelization of both upload and download is available.

During encryption, the client library generates a random Initialization Vector (IV) of 16 bytes, together with a random content encryption key (CEK) of 32 bytes, and performs envelope encryption of the blob data using this information. The wrapped CEK and some additional encryption metadata are then stored as blob metadata along with the encrypted blob on the service.

Warning

If you are editing or uploading your own metadata for the blob, you need to ensure that the encryption metadata is preserved. If you upload new metadata without it, the wrapped CEK, IV, and other metadata will be lost and the blob content will never be retrievable again.

Downloading an encrypted blob involves retrieving the content of the entire blob using the get* convenience methods. The wrapped CEK is unwrapped and used together with the IV (stored as blob metadata in this case) to return the decrypted data to the user.

Downloading an arbitrary range (get* methods with range parameters passed in) in the encrypted blob involves adjusting the range provided by users in order to get a small amount of additional data that can be used to successfully decrypt the requested range.
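
The reason for the adjustment is that CBC works on 16-byte blocks, and decrypting a block needs the ciphertext block before it (or the stored IV for the first block). The following sketch shows one way such a range could be widened; `expand_range` is a hypothetical helper, not the library's code, and it ignores clamping to the blob length and removal of padding on the final block.

```python
AES_BLOCK_SIZE = 16  # bytes per AES block


def expand_range(start, end):
    """Widen an inclusive byte range so it can be decrypted in CBC mode.

    Returns (fetch_start, fetch_end, has_iv_block). When has_iv_block is True,
    the first fetched block is only used as the IV for the block after it.
    """
    aligned_start = start - (start % AES_BLOCK_SIZE)
    has_iv_block = aligned_start > 0
    fetch_start = aligned_start - AES_BLOCK_SIZE if has_iv_block else 0
    fetch_end = end + (AES_BLOCK_SIZE - 1 - end % AES_BLOCK_SIZE)
    return fetch_start, fetch_end, has_iv_block
```

After decryption, the caller would trim the result back to the bytes originally requested.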

Only block blobs and page blobs can be encrypted/decrypted using this scheme. There is currently no support for encrypting append blobs.

Queues

Since queue messages can be of any format, the client library defines a custom format that includes the Initialization Vector (IV) and the encrypted content encryption key (CEK) in the message text.

During encryption, the client library generates a random IV of 16 bytes along with a random CEK of 32 bytes and performs envelope encryption of the queue message text using this information. The wrapped CEK and some additional encryption metadata are then added to the encrypted queue message. This modified message (shown below) is stored on the service.

<MessageText>{"EncryptedMessageContents":"6kOu8Rq1C3+M1QO4alKLmWthWXSmHV3mEfxBAgP9QGTU++MKn2uPq3t2UjF1DO6w","EncryptionData":{…}}</MessageText>

During decryption, the wrapped key is extracted from the queue message and unwrapped. The IV is also extracted from the queue message and used along with the unwrapped key to decrypt the queue message data. Note that the encryption metadata is small (under 500 bytes), so while it does count toward the 64 KB limit for a queue message, the impact should be manageable.
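
To get a feel for the size impact, the sketch below estimates the stored size of an encrypted message. It rests on two assumptions that the text above does not state: that the ciphertext is padded to whole 16-byte blocks (PKCS7 style) and that it is Base64-encoded. The 64-character EncryptedMessageContents value in the sample is consistent with a 48-byte ciphertext encoded this way. `estimated_message_size` and `fits_in_queue_message` are hypothetical helpers.

```python
import math

QUEUE_MESSAGE_LIMIT = 64 * 1024  # 64 KB limit mentioned above
METADATA_OVERHEAD = 500          # upper bound on encryption metadata mentioned above
AES_BLOCK_SIZE = 16


def estimated_message_size(plaintext_len):
    """Rough upper estimate, in bytes, of the stored encrypted message."""
    # Assumption: PKCS7-style padding always adds between 1 and 16 bytes.
    padded = (plaintext_len // AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE
    # Assumption: Base64 output uses 4 characters per 3 input bytes, rounded up.
    encoded = 4 * math.ceil(padded / 3)
    return encoded + METADATA_OVERHEAD


def fits_in_queue_message(plaintext_len):
    return estimated_message_size(plaintext_len) <= QUEUE_MESSAGE_LIMIT
```

Under these assumptions the encoding expansion, not the metadata, dominates the overhead for large messages.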

Tables

The client library supports encryption of entity properties for insert and replace operations.

Note

Merge is not currently supported. Since a subset of properties may have been encrypted previously using a different key, simply merging the new properties and updating the metadata will result in data loss. Merging requires either making extra service calls to read the pre-existing entity from the service, or using a new key per property; neither is suitable for performance reasons.

Table data encryption works as follows:

  1. Users specify the properties to be encrypted.

  2. The client library generates a random Initialization Vector (IV) of 16 bytes along with a random content encryption key (CEK) of 32 bytes for every entity, and performs envelope encryption on the individual properties to be encrypted by deriving a new IV per property. The encrypted property is stored as binary data.

  3. The wrapped CEK and some additional encryption metadata are then stored as two additional reserved properties. The first reserved property (_ClientEncryptionMetadata1) is a string property that holds the information about IV, version, and wrapped key. The second reserved property (_ClientEncryptionMetadata2) is a binary property that holds the information about the properties that are encrypted. The information in this second property (_ClientEncryptionMetadata2) is itself encrypted.

  4. Due to these additional reserved properties required for encryption, users may now have only 250 custom properties instead of 252. The total size of the entity must be less than 1 MB.

    Note that only string properties can be encrypted. If other types of properties are to be encrypted, they must be converted to strings. The encrypted strings are stored on the service as binary properties, and they are converted back to strings (raw strings, not EntityProperties with type EdmType.STRING) after decryption.

    For tables, in addition to the encryption policy, users must specify the properties to be encrypted. This can be done either by storing these properties in EntityProperty objects with the type set to EdmType.STRING and encrypt set to True, or by setting the encryption_resolver_function on the tableservice object. An encryption resolver is a function that takes a partition key, row key, and property name and returns a boolean that indicates whether that property should be encrypted. During encryption, the client library uses this information to decide whether a property should be encrypted while writing to the wire. The resolver also allows for logic around which properties are encrypted. (For example, if X, then encrypt property A; otherwise encrypt properties A and B.) Note that it is not necessary to provide this information while reading or querying entities.
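
As an illustration of the "if X, then encrypt property A; otherwise encrypt properties A and B" idea, a resolver might look like the sketch below. The partition key value and the property names (`internal`, `card_number`, `email`) are hypothetical and chosen only for this example.

```python
def my_encryption_resolver(pk, rk, property_name):
    """Return True when the named property should be encrypted before upload."""
    if pk == "internal":
        # Condition X holds: encrypt property A only.
        return property_name == "card_number"
    # Otherwise: encrypt properties A and B.
    return property_name in ("card_number", "email")
```

The function is pure and depends only on its three arguments, so it can be unit tested without contacting the service.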

Batch Operations

One encryption policy applies to all rows in the batch. The client library will internally generate a new random IV and random CEK per row in the batch. Users can also choose to encrypt different properties for every operation in the batch by defining this behavior in the encryption resolver. If a batch is created as a context manager through the tableservice batch() method, the tableservice's encryption policy will automatically be applied to the batch. If a batch is created explicitly by calling the constructor, the encryption policy must be passed as a parameter and left unmodified for the lifetime of the batch. Note that entities are encrypted as they are inserted into the batch using the batch's encryption policy (entities are NOT encrypted at the time of committing the batch using the tableservice's encryption policy).

Queries

Note

Because the entities are encrypted, you cannot run queries that filter on an encrypted property. If you try, results will be incorrect, because the service would be trying to compare encrypted data with unencrypted data.

To perform query operations, you must specify a key resolver that is able to resolve all the keys in the result set. If an entity contained in the query result cannot be resolved to a provider, the client library will throw an error. For any query that performs server-side projections, the client library will add the special encryption metadata properties (_ClientEncryptionMetadata1 and _ClientEncryptionMetadata2) by default to the selected columns.
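
The effect on a projection can be pictured with the small sketch below. It is not the library's code: `select_with_encryption_metadata` is a hypothetical helper that assumes the projection is given as a comma-separated property list.

```python
ENCRYPTION_METADATA_PROPERTIES = (
    "_ClientEncryptionMetadata1",
    "_ClientEncryptionMetadata2",
)


def select_with_encryption_metadata(select):
    """Append the reserved encryption properties to a comma-separated projection."""
    columns = [name.strip() for name in select.split(",") if name.strip()]
    for name in ENCRYPTION_METADATA_PROPERTIES:
        if name not in columns:
            columns.append(name)
    return ",".join(columns)
```

Without those two properties in the result, the client would have no wrapped key or IV with which to decrypt the projected columns.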

Important

Be aware of these important points when using client-side encryption:

  • When reading from or writing to an encrypted blob, use whole blob upload commands and range/whole blob download commands. Avoid writing to an encrypted blob using protocol operations such as Put Block, Put Block List, Write Pages, or Clear Pages; otherwise you may corrupt the encrypted blob and make it unreadable.
  • For tables, a similar constraint exists. Be careful not to update encrypted properties without updating the encryption metadata.
  • If you set metadata on the encrypted blob, you may overwrite the encryption-related metadata required for decryption, since setting metadata is not additive. This is also true for snapshots; avoid specifying metadata while creating a snapshot of an encrypted blob. If metadata must be set, be sure to call the get_blob_metadata method first to get the current encryption metadata, and avoid concurrent writes while metadata is being set.
  • Enable the require_encryption flag on the service object for users that should work only with encrypted data. See below for more info.
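
Because setting metadata replaces the whole set, a read-merge-write pattern is the safe way to add your own entries. The sketch below shows only the merge step; `merge_metadata` is a hypothetical helper, the metadata key names are placeholders, and the commented calls assume a service object that offers set_blob_metadata alongside get_blob_metadata.

```python
def merge_metadata(current, updates):
    """Return a new dict holding every existing entry plus the updates."""
    merged = dict(current)  # keeps the encryption-related entries
    merged.update(updates)
    return merged


# Placeholder values standing in for what get_blob_metadata would return.
current = {"encryption-metadata-placeholder": "<wrapped CEK, IV, ...>"}
merged = merge_metadata(current, {"author": "me"})

# Assumed usage against a real service object (not executed here):
# current = blob_service.get_blob_metadata(container_name, blob_name)
# blob_service.set_blob_metadata(
#     container_name, blob_name, merge_metadata(current, {"author": "me"}))
```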

The storage client library expects the provided KEK and key resolver to implement the following interface. Azure Key Vault support for Python KEK management is pending and will be integrated into this library when completed.

Client API / Interface

After a storage service object (e.g. blockblobservice) has been created, the user may assign values to the fields that constitute an encryption policy: key_encryption_key, key_resolver_function, and require_encryption. Users can provide only a KEK, only a resolver, or both. key_encryption_key is the basic key type that is identified using a key identifier and that provides the logic for wrapping/unwrapping. key_resolver_function is used to resolve a key during the decryption process. It returns a valid KEK given a key identifier. This gives users the ability to choose between multiple keys that are managed in multiple locations.

The KEK must implement the following methods to successfully encrypt data:

  • wrap_key(cek): Wraps the specified CEK (bytes) using an algorithm of the user's choice. Returns the wrapped key.
  • get_key_wrap_algorithm(): Returns the algorithm used to wrap keys.
  • get_kid(): Returns the string key id for this KEK.

The KEK must implement the following methods to successfully decrypt data:

  • unwrap_key(cek, algorithm): Returns the unwrapped form of the specified CEK using the string-specified algorithm.
  • get_kid(): Returns the string key id for this KEK.
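
A duck-typed object exposing these methods could look like the following sketch. `ToyKeyWrapper` is hypothetical and deliberately insecure: it XORs the CEK with a pad derived from a local secret, standing in for a real key wrap algorithm, and the `toy-xor` algorithm label is made up for this example.

```python
import hashlib
import os


class ToyKeyWrapper:
    """Illustrative KEK with the four methods listed above. NOT secure."""

    def __init__(self, kid):
        self._kid = kid
        self._secret = os.urandom(32)  # never leaves this object

    def wrap_key(self, cek):
        # Derive a pad as long as the CEK and XOR it in.
        pad = hashlib.shake_256(self._secret).digest(len(cek))
        return bytes(a ^ b for a, b in zip(cek, pad))

    def unwrap_key(self, cek, algorithm):
        if algorithm != self.get_key_wrap_algorithm():
            raise ValueError("unsupported key wrap algorithm: " + algorithm)
        return self.wrap_key(cek)  # XOR with the same pad undoes the wrap

    def get_key_wrap_algorithm(self):
        return "toy-xor"

    def get_kid(self):
        return self._kid


kek = ToyKeyWrapper("local:key1")
cek = os.urandom(32)
wrapped = kek.wrap_key(cek)
unwrapped = kek.unwrap_key(wrapped, kek.get_key_wrap_algorithm())
```

Storing the algorithm name next to the wrapped key is what lets unwrap_key reject a key that was wrapped with a scheme it does not support.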

The key resolver must at least implement a method that, given a key id, returns the corresponding KEK implementing the interface above. Only this method is to be assigned to the key_resolver_function property on the service object.

  • For encryption, the key is always used; the absence of a key results in an error.

  • For decryption:

    • If a key resolver is specified, it is invoked to get the key. If the resolver is specified but does not have a mapping for the key identifier, an error is thrown.

    • If a resolver is not specified but a key is specified, the key is used if its identifier matches the required key identifier. If the identifier does not match, an error is thrown.
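
The decryption rules above can be expressed as a small selection function. This is a sketch of the described behavior, not the library's code: `select_kek_for_decryption` and `StubKek` are hypothetical, and the resolver is assumed to return None when it has no mapping for a key identifier.

```python
class StubKek:
    """Minimal stand-in exposing only get_kid(), enough for this sketch."""

    def __init__(self, kid):
        self._kid = kid

    def get_kid(self):
        return self._kid


def select_kek_for_decryption(required_kid, key_encryption_key=None,
                              key_resolver_function=None):
    # Rule 1: a resolver, when present, is consulted.
    if key_resolver_function is not None:
        kek = key_resolver_function(required_kid)
        if kek is None:
            raise ValueError("resolver has no key for id: " + required_kid)
        return kek
    # Rule 2: without a resolver, the configured key must match the required id.
    if key_encryption_key is not None:
        if key_encryption_key.get_kid() != required_kid:
            raise ValueError("configured key does not match id: " + required_kid)
        return key_encryption_key
    raise ValueError("neither a key nor a key resolver was provided")


key1 = StubKek("local:key1")
known_keys = {"local:key1": key1}
via_resolver = select_kek_for_decryption(
    "local:key1", key_resolver_function=known_keys.get)
via_key = select_kek_for_decryption("local:key1", key_encryption_key=key1)
```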

The encryption samples in azure.storage.samples demonstrate a more detailed end-to-end scenario for blobs, queues and tables. Sample implementations of the KEK and key resolver are provided in the sample files as KeyWrapper and KeyResolver respectively.

RequireEncryption mode

Users can optionally enable a mode of operation where all uploads and downloads must be encrypted. In this mode, attempts to upload data without an encryption policy or download data that is not encrypted on the service will fail on the client. The require_encryption flag on the service object controls this behavior.
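
Conceptually, the flag adds two client-side guards, sketched below. `check_upload` and `check_download` are hypothetical helpers that mirror the behavior described above; how the library detects unencrypted data internally is not stated here, so the `has_encryption_metadata` parameter is an assumption.

```python
def check_upload(require_encryption, key_encryption_key):
    """Fail on the client if encryption is required but no KEK is configured."""
    if require_encryption and key_encryption_key is None:
        raise ValueError("encryption is required but no key_encryption_key is set")


def check_download(require_encryption, has_encryption_metadata):
    """Fail on the client if encryption is required but the data is not encrypted."""
    if require_encryption and not has_encryption_metadata:
        raise ValueError("encryption is required but the data is not encrypted")
```

Both checks run before any plaintext is sent or returned, which is what makes the mode useful for callers that must never handle unencrypted data.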

Blob service encryption

Set the encryption policy fields on the blockblobservice object. Everything else will be handled by the client library internally.

# Create the KEK used for encryption.
# KeyWrapper is the provided sample implementation, but the user may use their own object as long as it implements the interface above.
kek = KeyWrapper('local:key1')  # Key identifier

# Create the key resolver used for decryption.
# KeyResolver is the provided sample implementation, but the user may use whatever implementation they choose so long as the function set on the service object behaves appropriately.
key_resolver = KeyResolver()
key_resolver.put_key(kek)

# Set the KEK and key resolver on the service object.
my_block_blob_service.key_encryption_key = kek
my_block_blob_service.key_resolver_function = key_resolver.resolve_key

# Upload the encrypted contents to the blob.
my_block_blob_service.create_blob_from_stream(
    container_name, blob_name, stream)

# Download and decrypt the encrypted contents from the blob.
blob = my_block_blob_service.get_blob_to_bytes(container_name, blob_name)

Queue service encryption

Set the encryption policy fields on the queueservice object. Everything else will be handled by the client library internally.

# Create the KEK used for encryption.
# KeyWrapper is the provided sample implementation, but the user may use their own object as long as it implements the interface above.
kek = KeyWrapper('local:key1')  # Key identifier

# Create the key resolver used for decryption.
# KeyResolver is the provided sample implementation, but the user may use whatever implementation they choose so long as the function set on the service object behaves appropriately.
key_resolver = KeyResolver()
key_resolver.put_key(kek)

# Set the KEK and key resolver on the service object.
my_queue_service.key_encryption_key = kek
my_queue_service.key_resolver_function = key_resolver.resolve_key

# Add message
my_queue_service.put_message(queue_name, content)

# Retrieve message
retrieved_message_list = my_queue_service.get_messages(queue_name)

Table service encryption

In addition to creating an encryption policy and setting it on the service object, you must either specify an encryption_resolver_function on the tableservice, or set the encrypt attribute on the EntityProperty.

Using the resolver

# Create the KEK used for encryption.
# KeyWrapper is the provided sample implementation, but the user may use their own object as long as it implements the interface above.
kek = KeyWrapper('local:key1')  # Key identifier

# Create the key resolver used for decryption.
# KeyResolver is the provided sample implementation, but the user may use whatever implementation they choose so long as the function set on the service object behaves appropriately.
key_resolver = KeyResolver()
key_resolver.put_key(kek)

# Define the encryption resolver function.


def my_encryption_resolver(pk, rk, property_name):
    if property_name == 'foo':
        return True
    return False


# Set the KEK and key resolver on the service object.
my_table_service.key_encryption_key = kek
my_table_service.key_resolver_function = key_resolver.resolve_key
my_table_service.encryption_resolver_function = my_encryption_resolver

# Insert Entity
my_table_service.insert_entity(table_name, entity)

# Retrieve Entity
# Note: No need to specify an encryption resolver for retrieve, but it is harmless to leave the property set.
my_table_service.get_entity(
    table_name, entity['PartitionKey'], entity['RowKey'])

Using attributes

As mentioned above, a property may be marked for encryption by storing it in an EntityProperty object and setting the encrypt field.

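from azure.storage.table import EdmType, EntityProperty

# value must be a string; only string properties can be encrypted.
encrypted_property_1 = EntityProperty(EdmType.STRING, value, encrypt=True)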

Encryption and performance

Note that encrypting your storage data results in additional performance overhead. The content key and IV must be generated, the content itself must be encrypted, and additional metadata must be formatted and uploaded. This overhead will vary depending on the quantity of data being encrypted. We recommend that customers always test their applications for performance during development.

Next steps