Optimize the performance and reliability of Azure Functions

This article provides guidance to improve the performance and reliability of your serverless function apps.

General best practices

The following are best practices for building and architecting serverless solutions with Azure Functions.

Avoid long-running functions

Large, long-running functions can cause unexpected timeout issues. A function can become large because of many Node.js dependencies. Importing dependencies can also increase load times and result in unexpected timeouts. Dependencies are loaded both explicitly and implicitly: a single module loaded by your code may load its own additional modules.

Whenever possible, refactor large functions into smaller function sets that work together and return responses fast. For example, a webhook or HTTP trigger function might require an acknowledgment response within a certain time limit; it is common for webhooks to require an immediate response. You can pass the HTTP trigger payload into a queue to be processed by a queue trigger function. This approach lets you defer the actual work and return an immediate response.
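
The following is a minimal Python sketch of that hand-off, not Azure Functions binding code. An in-memory queue.Queue stands in for a storage queue, and the names work_queue, handle_http_request, and handle_queue_message are hypothetical.

```python
import json
import queue

# Assumption: an in-memory queue stands in for a real storage queue.
work_queue = queue.Queue()


def handle_http_request(payload):
    """HTTP-trigger side: enqueue the payload and acknowledge right away."""
    work_queue.put(json.dumps(payload))
    return 202, "Accepted"


def handle_queue_message(message):
    """Queue-trigger side: runs later, outside the HTTP time limit."""
    order = json.loads(message)
    order["processed"] = True  # placeholder for the slow, real work
    return order


status, body = handle_http_request({"order_id": 42})
result = handle_queue_message(work_queue.get())
```

The caller only waits for the enqueue, so the response time no longer depends on how long the deferred work takes.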

Cross-function communication

Durable Functions and Azure Logic Apps are built to manage state transitions and communication between multiple functions.

If you are not using Durable Functions or Logic Apps to integrate multiple functions, it is generally a best practice to use storage queues for cross-function communication. The main reason is that storage queues are cheaper and much easier to provision.

Individual messages in a storage queue are limited in size to 64 KB. If you need to pass larger messages between functions, you can use an Azure Service Bus queue, which supports message sizes up to 256 KB in the Standard tier and up to 1 MB in the Premium tier.
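
As a rough illustration of those limits, the sketch below picks a transport from the payload size. It assumes 1 KB means 1,024 bytes and ignores any encoding overhead; the function name pick_queue and the returned labels are hypothetical.

```python
# Limits taken from the text above; 1 KB = 1,024 bytes is an assumption.
STORAGE_QUEUE_LIMIT = 64 * 1024
SERVICE_BUS_STANDARD_LIMIT = 256 * 1024
SERVICE_BUS_PREMIUM_LIMIT = 1024 * 1024


def pick_queue(message):
    """Return the cheapest transport whose size limit fits the message."""
    size = len(message)
    if size <= STORAGE_QUEUE_LIMIT:
        return "storage-queue"
    if size <= SERVICE_BUS_STANDARD_LIMIT:
        return "service-bus-standard"
    if size <= SERVICE_BUS_PREMIUM_LIMIT:
        return "service-bus-premium"
    raise ValueError(f"message of {size} bytes exceeds every listed limit")


small = pick_queue(b"x" * 1_000)
medium = pick_queue(b"x" * 100_000)
large = pick_queue(b"x" * 500_000)
```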

Service Bus topics are useful if you require message filtering before processing.

Event hubs are useful for supporting high-volume communications.

Write functions to be stateless

Functions should be stateless and idempotent if possible. Associate any required state information with your data. For example, an order being processed would likely have an associated state member. A function could process an order based on that state while the function itself remains stateless.

Idempotent functions are especially recommended with timer triggers. For example, if you have something that absolutely must run once a day, write it so it can run at any time during the day with the same results. The function can exit when there is no work for a particular day. Also, if a previous run failed to complete, the next run should pick up where it left off.
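
One way to picture an idempotent daily job is sketched below in plain Python. The dictionary completed_days is a hypothetical stand-in for durable storage such as a table or blob, and run_daily_report is an illustrative name, not part of any Azure API.

```python
from datetime import date

# Assumption: a dict stands in for durable storage keyed by day.
completed_days = {}


def run_daily_report(today):
    """Produce the day's report once; repeat calls return the same result."""
    if today in completed_days:
        return completed_days[today]  # work already done: exit early
    report = f"report for {today.isoformat()}"
    completed_days[today] = report
    return report


first = run_daily_report(date(2024, 1, 1))
second = run_daily_report(date(2024, 1, 1))  # e.g. a retried or late trigger
```

Because the state lives with the data rather than in the function, it does not matter when during the day, or how many times, the timer fires.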

Write defensive functions

Assume your function could encounter an exception at any time. Design your functions so that they can continue from a previous failure point during the next execution. Consider a scenario that requires the following actions:

  1. Query for 10,000 rows in a database.
  2. Create a queue message for each of those rows to process further down the line.

Depending on how complex your system is, downstream services may behave badly, networks may fail, or quota limits may be reached. Any of these can affect your function at any time. Design your functions to be prepared for it.

How does your code react if a failure occurs after inserting 5,000 of those items into a queue for processing? Track the items in a set that you have completed. Otherwise, you might insert them again next time, which can have a serious impact on your workflow.

If a queue item was already processed, allow your function to be a no-op.
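
The scenario above can be sketched as follows. This is a simplified simulation: the list outbox and the set completed are hypothetical in-memory stand-ins for a queue and a durable tracking store, the fail_after parameter exists only to simulate an outage, and a real system would also need the enqueue and the tracking update to be consistent with each other.

```python
def enqueue_rows(rows, outbox, completed, fail_after=None):
    """Enqueue each row once, skipping rows recorded as completed."""
    sent = 0
    for row_id in rows:
        if row_id in completed:
            continue  # already handled in an earlier run: no-op
        if fail_after is not None and sent == fail_after:
            raise RuntimeError("simulated outage")
        outbox.append(row_id)
        completed.add(row_id)
        sent += 1


rows = range(10_000)
outbox, completed = [], set()

try:
    enqueue_rows(rows, outbox, completed, fail_after=5_000)  # fails midway
except RuntimeError:
    pass

enqueue_rows(rows, outbox, completed)  # the next run resumes where it stopped
```

After the second run, every row has been enqueued exactly once even though the first run failed after 5,000 items.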

Take advantage of defensive measures already provided for components you use in the Azure Functions platform. For example, see Handling poison queue messages in the documentation for Azure Storage Queue triggers and bindings.

Scalability best practices

A number of factors affect how instances of your function app scale. The details are provided in the documentation for function scaling. The following are some best practices to ensure optimal scalability of a function app.

Share and manage connections

Reuse connections to external resources whenever possible. See How to manage connections in Azure Functions.

Don't mix test and production code in the same function app

Functions within a function app share resources. For example, memory is shared. If you're using a function app in production, don't add test-related functions and resources to it. Doing so can cause unexpected overhead during production code execution.

Be careful what you load in your production function apps. Memory is averaged across each function in the app.

If you have a shared assembly referenced in multiple .NET functions, put it in a common shared folder. If you are using C# scripts (.csx), reference the assembly with a statement similar to the following example:

#r "..\Shared\MyAssembly.dll"

Otherwise, it is easy to accidentally deploy multiple test versions of the same binary that behave differently between functions.

Don't use verbose logging in production code. It has a negative performance impact.

Use async code but avoid blocking calls

Asynchronous programming is a recommended best practice. However, always avoid referencing the Result property or calling the Wait method on a Task instance. Doing so can lead to thread exhaustion.

Tip

If you plan to use the HTTP or WebHook bindings, plan to avoid port exhaustion that can be caused by improper instantiation of HttpClient. For more information, see How to manage connections in Azure Functions.

Receive messages in batch whenever possible

Some triggers, such as Event Hub, enable receiving a batch of messages on a single invocation. Batching messages gives much better performance. You can configure the max batch size in the host.json file as detailed in the host.json reference documentation.

For C# functions, you can change the type to a strongly typed array. For example, instead of EventData sensorEvent, the method signature could be EventData[] sensorEvent. For other languages, you need to explicitly set the cardinality property in your function.json to many in order to enable batching.

Configure host behaviors to better handle concurrency

The host.json file in the function app allows for configuration of host runtime and trigger behaviors. In addition to batching behaviors, you can manage concurrency for a number of triggers. Adjusting the values in these options can often help each instance scale appropriately for the demands of the invoked functions.

Settings in the host.json file apply across all functions within the app, within a single instance of the function app. For example, if you had a function app with 2 HTTP functions and concurrent requests set to 25, a request to either HTTP trigger would count towards the shared 25 concurrent requests. If that function app scaled to 10 instances, the 2 functions would effectively allow 250 concurrent requests (10 instances * 25 concurrent requests per instance).
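
The arithmetic in that example can be written out as a small Python helper. The function name effective_concurrency is hypothetical; the point is that the limit is per instance and shared by all functions in the app, so the number of functions does not enter the calculation.

```python
def effective_concurrency(instances, max_concurrent_per_instance):
    """Total concurrent requests the app can serve across all instances.

    The per-instance limit is shared by every HTTP function in the app,
    so the function count is deliberately not a parameter.
    """
    return instances * max_concurrent_per_instance


single_instance = effective_concurrency(1, 25)
scaled_out = effective_concurrency(10, 25)
```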

HTTP concurrency host options

{
    "http": {
        "routePrefix": "api",
        "maxOutstandingRequests": 200,
        "maxConcurrentRequests": 100,
        "dynamicThrottlesEnabled": true
    }
}

| Property | Default | Description |
| --- | --- | --- |
| routePrefix | api | The route prefix that applies to all routes. Use an empty string to remove the default prefix. |
| maxOutstandingRequests | 200* | The maximum number of outstanding requests that are held at any given time. This limit includes requests that are queued but have not started executing, as well as any in-progress executions. Any incoming requests over this limit are rejected with a 429 "Too Busy" response. That allows callers to employ time-based retry strategies, and also helps you to control maximum request latencies. This setting only controls queuing that occurs within the script host execution path. Other queues, such as the ASP.NET request queue, remain in effect and are unaffected by this setting. *The default for version 1.x is unbounded. The default for version 2.x in a consumption plan is 200. The default for version 2.x in a dedicated plan is unbounded. |
| maxConcurrentRequests | 100* | The maximum number of HTTP functions that are executed in parallel. This allows you to control concurrency, which can help manage resource utilization. For example, you might have an HTTP function that uses a lot of system resources (memory/CPU/sockets) such that it causes issues when concurrency is too high. Or you might have a function that makes outbound requests to a third-party service, and those calls need to be rate limited. In these cases, applying a throttle here can help. *The default for version 1.x is unbounded. The default for version 2.x in a consumption plan is 100. The default for version 2.x in a dedicated plan is unbounded. |
| dynamicThrottlesEnabled | true* | When enabled, this setting causes the request processing pipeline to periodically check system performance counters such as connections, threads, processes, memory, and CPU. If any of those counters are over a built-in high threshold (80%), requests are rejected with a 429 "Too Busy" response until the counters return to normal levels. *The default for version 1.x is false. The default for version 2.x in a consumption plan is true. The default for version 2.x in a dedicated plan is false. |
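
To make the maxOutstandingRequests behavior concrete, here is a simplified Python simulation of an admission gate that rejects requests over the limit with a 429 status. It is only an illustration of the described behavior, not the runtime's actual implementation; the class RequestGate and its methods are hypothetical.

```python
class RequestGate:
    """Tracks outstanding requests and rejects any over the limit."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding
        self.outstanding = 0  # queued plus in-progress requests

    def try_admit(self):
        """Return 429 when the limit is reached, otherwise admit (200)."""
        if self.outstanding >= self.max_outstanding:
            return 429
        self.outstanding += 1
        return 200

    def complete(self):
        """Mark one admitted request as finished, freeing a slot."""
        self.outstanding -= 1


gate = RequestGate(max_outstanding=200)
statuses = [gate.try_admit() for _ in range(205)]  # burst of 205 requests
rejected = statuses.count(429)

gate.complete()  # one request finishes
retry_status = gate.try_admit()  # a retried caller now gets through
```

A caller that receives a 429 can wait and retry, which is the time-based retry strategy the table describes.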

Other host configuration options can be found in the host configuration document.

Next steps

For more information, see the following resources: