Data persistence and serialization in Durable Functions (Azure Functions)

Durable Functions automatically persists function parameters, return values, and other state to a durable backend to provide reliable execution. However, the amount and frequency of data persisted to durable storage can affect application performance and storage transaction costs. Depending on the type of data your application stores, you may also need to consider data retention and privacy policies.

Azure Storage

By default, Durable Functions persists data to queues, tables, and blobs in an Azure Storage account that you specify.

Queues

Durable Functions uses Azure Storage queues to reliably schedule all function executions. These queue messages contain function inputs or outputs, depending on whether the message is being used to schedule an execution or to return a value to a calling function. They also include additional metadata that Durable Functions uses for internal purposes, like routing and end-to-end correlation. After a function has finished executing in response to a received message, that message is deleted, and the result of the execution may also be persisted to either Azure Storage Tables or Azure Storage Blobs.

Within a single task hub, Durable Functions creates and adds messages to a work-item queue named <taskhub>-workitems for scheduling activity functions, and to one or more control queues named <taskhub>-control-## for scheduling or resuming orchestrator and entity functions. The number of control queues is equal to the number of partitions configured for your application. For more information about queues and partitions, see the Performance and Scalability documentation.

Tables

Once orchestrations process messages successfully, records of their resulting actions are persisted to the History table named <taskhub>History. Orchestration inputs, outputs, and custom status data are also persisted to the Instances table named <taskhub>Instances.

Blobs

In most cases, Durable Functions doesn't use Azure Storage Blobs to persist data. However, queues and tables have size limits that can prevent Durable Functions from persisting all of the required data into a storage row or queue message. For example, when a piece of data that needs to be persisted to a queue is greater than 45 KB when serialized, Durable Functions compresses the data and stores it in a blob instead. When persisting data to blob storage in this way, Durable Functions stores a reference to that blob in the table row or queue message. When Durable Functions needs to retrieve the data, it automatically fetches it from the blob. These blobs are stored in the blob container <taskhub>-largemessages.
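To get a rough sense of whether a payload is likely to cross the 45 KB threshold described above, you can serialize it with Json.NET and measure the result. The following is a minimal sketch, not the runtime's own check: the PayloadSizeCheck class and its members are hypothetical names, the threshold is assumed to mean 45 * 1024 bytes, and the size is measured as UTF-8, which only approximates what the runtime stores because queue messages also carry extra metadata.

```csharp
using System.Text;
using Newtonsoft.Json;

public static class PayloadSizeCheck
{
    // Assumption: "45 KB" from the text is interpreted as 45 * 1024 bytes.
    public const int LargeMessageThresholdBytes = 45 * 1024;

    // Serializes the payload with default Json.NET settings and returns
    // the UTF-8 byte count of the resulting JSON text.
    public static int GetSerializedSizeInBytes(object payload)
    {
        string json = JsonConvert.SerializeObject(payload);
        return Encoding.UTF8.GetByteCount(json);
    }

    // Returns true when the serialized payload exceeds the assumed threshold,
    // which suggests it may be compressed and stored in a blob instead.
    public static bool IsLikelyToSpillToBlob(object payload)
    {
        return GetSerializedSizeInBytes(payload) > LargeMessageThresholdBytes;
    }
}
```

A check like this is most useful in tests or diagnostics, for example to flag activity inputs that have grown past the point where passing a reference would be cheaper.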

Note

The extra compression and blob operation steps for large messages can be expensive in terms of CPU and I/O latency. Additionally, Durable Functions needs to load persisted data into memory, and may do so for many different function executions at the same time. As a result, persisting large data payloads can also cause high memory usage. To minimize memory overhead, consider persisting large data payloads manually (for example, in blob storage) and passing around references to this data instead. This way, your code can load the data only when needed and avoid redundant loads during orchestrator function replays. However, storing payloads on local disk is not recommended: functions may execute on different VMs throughout their lifetimes, so on-disk state is not guaranteed to be available.
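The reference-passing approach in the note above can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed pattern: the function names ProcessReport and CountLinesInBlob, the container name payloads, and the CountLines helper are all hypothetical, the payload is assumed to have been uploaded by the caller beforehand, and the blob access assumes the Azure.Storage.Blobs package with the connection string taken from the AzureWebJobsStorage app setting.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class LargePayloadByReference
{
    [FunctionName("ProcessReport")]
    public static async Task<int> ProcessReport(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Only the short blob name is persisted to history and queue messages,
        // so orchestrator replays never reload the large payload.
        string blobName = context.GetInput<string>();
        return await context.CallActivityAsync<int>("CountLinesInBlob", blobName);
    }

    [FunctionName("CountLinesInBlob")]
    public static async Task<int> CountLinesInBlob([ActivityTrigger] string blobName)
    {
        // Hypothetical container name; the payload is loaded only where it is needed.
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "payloads");
        BlobDownloadResult result =
            await container.GetBlobClient(blobName).DownloadContentAsync();
        return CountLines(result.Content.ToString());
    }

    // Pure helper kept separate so the processing logic can be tested without storage.
    public static int CountLines(string content)
    {
        return string.IsNullOrEmpty(content) ? 0 : content.Split('\n').Length;
    }
}
```

Because the activity returns only a small integer, neither the input nor the output of any function in this sketch comes close to the large-message threshold.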

Types of data that are serialized and persisted

The following is a list of the different types of data that will be serialized and persisted when using features of Durable Functions:

  • All inputs and outputs of orchestrator, activity, and entity functions, including any IDs and unhandled exceptions
  • Orchestrator, activity, and entity function names
  • External event names and payloads
  • Custom orchestration status payloads
  • Orchestration termination messages
  • Durable timer payloads
  • Durable HTTP request and response URLs, headers, and payloads
  • Entity call and signal payloads
  • Entity state payloads

Working with sensitive data

When using Azure Storage, all data is automatically encrypted at rest. However, anyone with access to the storage account can read the data in its unencrypted form. If you need stronger protection for sensitive data, consider first encrypting the data using your own encryption keys so that Durable Functions persists the data in a pre-encrypted form.
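One way to pre-encrypt a value before handing it to a Durable Functions API is sketched below using AES-GCM from the .NET base class library. The PayloadProtector class and its packed layout (nonce, then tag, then ciphertext, Base64-encoded) are hypothetical choices made for this illustration, and the sketch assumes you supply a 16, 24, or 32 byte key from your own key store. It does not address key management or rotation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PayloadProtector
{
    private const int NonceSize = 12; // 96-bit nonce used by AES-GCM
    private const int TagSize = 16;   // 128-bit authentication tag

    // Encrypts the text and returns Base64(nonce || tag || ciphertext).
    public static string Encrypt(string plaintext, byte[] key)
    {
        byte[] plainBytes = Encoding.UTF8.GetBytes(plaintext);
        byte[] nonce = new byte[NonceSize];
        RandomNumberGenerator.Fill(nonce); // a fresh random nonce per message
        byte[] cipher = new byte[plainBytes.Length];
        byte[] tag = new byte[TagSize];

        // Newer .NET versions prefer the constructor overload that also takes the tag size.
        using (var aes = new AesGcm(key))
        {
            aes.Encrypt(nonce, plainBytes, cipher, tag);
        }

        byte[] packed = new byte[NonceSize + TagSize + cipher.Length];
        Buffer.BlockCopy(nonce, 0, packed, 0, NonceSize);
        Buffer.BlockCopy(tag, 0, packed, NonceSize, TagSize);
        Buffer.BlockCopy(cipher, 0, packed, NonceSize + TagSize, cipher.Length);
        return Convert.ToBase64String(packed);
    }

    // Reverses Encrypt; throws CryptographicException if the data or key is wrong.
    public static string Decrypt(string protectedText, byte[] key)
    {
        byte[] packed = Convert.FromBase64String(protectedText);
        byte[] nonce = new byte[NonceSize];
        byte[] tag = new byte[TagSize];
        byte[] cipher = new byte[packed.Length - NonceSize - TagSize];
        Buffer.BlockCopy(packed, 0, nonce, 0, NonceSize);
        Buffer.BlockCopy(packed, NonceSize, tag, 0, TagSize);
        Buffer.BlockCopy(packed, NonceSize + TagSize, cipher, 0, cipher.Length);

        byte[] plain = new byte[cipher.Length];
        using (var aes = new AesGcm(key))
        {
            aes.Decrypt(nonce, cipher, tag, plain);
        }
        return Encoding.UTF8.GetString(plain);
    }
}
```

With a helper like this, a client would pass PayloadProtector.Encrypt(value, key) as the function input and the receiving function would call Decrypt, so only the Base64 ciphertext is written to queues, tables, and blobs.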

Alternatively, .NET users have the option of implementing custom serialization providers that provide automatic encryption. An example of custom serialization with encryption can be found in this GitHub sample.

Note

If you decide to implement application-level encryption, be aware that orchestrations and entities can exist for indefinite amounts of time. This matters when it comes time to rotate your encryption keys, because an orchestration or entity may run longer than your key rotation policy allows. If a key rotation happens, the key used to encrypt your data may no longer be available to decrypt it the next time your orchestration or entity executes. Application-level encryption is therefore recommended only when orchestrations and entities are expected to run for relatively short periods of time.

Customizing serialization and deserialization

Default serialization logic

Durable Functions internally uses Json.NET to serialize orchestration and entity data to JSON. The default settings Durable Functions uses for Json.NET are:

Inputs, outputs, and state:

JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.None,
    DateParseHandling = DateParseHandling.None,
}

Exceptions:

JsonSerializerSettings
{
    ContractResolver = new ExceptionResolver(),
    TypeNameHandling = TypeNameHandling.Objects,
    ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
}

For more detailed documentation about JsonSerializerSettings, see the Json.NET documentation.

Customizing serialization with .NET attributes

When serializing data, Json.NET looks for various attributes on classes and properties that control how the data is serialized to and deserialized from JSON. If you own the source code for a data type passed to Durable Functions APIs, consider adding these attributes to the type to customize serialization and deserialization.
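As an illustration of the attribute-based approach, the sketch below decorates a hypothetical OrderInput type with three standard Json.NET attributes. The type, its properties, and the OrderStatus enum are made up for this example; only the attributes themselves come from Json.NET.

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Converters;

public enum OrderStatus
{
    Pending,
    Shipped
}

public class OrderInput
{
    // Renames the property in the persisted JSON.
    [JsonProperty("order_id")]
    public string OrderId { get; set; }

    // Writes the enum as its name ("Shipped") rather than its number (1).
    [JsonConverter(typeof(StringEnumConverter))]
    public OrderStatus Status { get; set; }

    // Excludes the property entirely, so it is never persisted.
    [JsonIgnore]
    public string PaymentToken { get; set; }
}
```

Serializing an OrderInput with OrderId "A1", Status Shipped, and any PaymentToken using JsonConvert.SerializeObject produces {"order_id":"A1","Status":"Shipped"}. Note that an ignored property is also absent after deserialization, so a function receiving this type as input will see PaymentToken as null.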

Customizing serialization with Dependency Injection

Function apps that target .NET and run on the Functions V3 runtime can use Dependency Injection (DI) to customize how data and exceptions are serialized. The sample code below demonstrates how to use DI to override the default Json.NET serialization settings using custom implementations of the IMessageSerializerSettingsFactory and IErrorSerializerSettingsFactory service interfaces.

using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.DependencyInjection;
using Newtonsoft.Json;
using System.Collections.Generic;

[assembly: FunctionsStartup(typeof(MyApplication.Startup))]
namespace MyApplication
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            builder.Services.AddSingleton<IMessageSerializerSettingsFactory, CustomMessageSerializerSettingsFactory>();
            builder.Services.AddSingleton<IErrorSerializerSettingsFactory, CustomErrorSerializerSettingsFactory>();
        }

        /// <summary>
        /// A factory that provides the serialization for all inputs and outputs for activities and
        /// orchestrations, as well as entity state.
        /// </summary>
        internal class CustomMessageSerializerSettingsFactory : IMessageSerializerSettingsFactory
        {
            public JsonSerializerSettings CreateJsonSerializerSettings()
            {
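                // Placeholder settings that mirror the documented defaults;
                // replace them with your own custom JsonSerializerSettings.
                return new JsonSerializerSettings
                {
                    TypeNameHandling = TypeNameHandling.None,
                    DateParseHandling = DateParseHandling.None,
                };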
            }
        }

        /// <summary>
        /// A factory that provides the serialization for all exceptions thrown by activities
        /// and orchestrations
        /// </summary>
        internal class CustomErrorSerializerSettingsFactory : IErrorSerializerSettingsFactory
        {
            public JsonSerializerSettings CreateJsonSerializerSettings()
            {
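                // Placeholder settings based on the documented exception defaults
                // (the ContractResolver is omitted here); replace them with your own.
                return new JsonSerializerSettings
                {
                    TypeNameHandling = TypeNameHandling.Objects,
                    ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
                };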
            }
        }
    }
}