Configure liveness probes

Containerized applications may run for extended periods of time, during which they can enter broken states that only a container restart will repair. Azure Container Instances supports liveness probes, so you can configure the containers in your container group to restart when critical functionality stops working. The liveness probe behaves like a Kubernetes liveness probe.

This article explains how to deploy a container group that includes a liveness probe, and demonstrates the automatic restart of a simulated unhealthy container.

Azure Container Instances also supports readiness probes, which you can configure to ensure that traffic reaches a container only when it's ready for it.

YAML deployment

Create a liveness-probe.yaml file with the following snippet. This file defines a container group that consists of an NGINX container that eventually becomes unhealthy.

apiVersion: 2018-10-01
location: chinaeast2
name: livenesstest
properties:
  containers:
  - name: mycontainer
    properties:
      image: nginx
      command:
        - "/bin/sh"
        - "-c"
        - "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.5
      livenessProbe:
        exec:
          command:
            - "cat"
            - "/tmp/healthy"
        periodSeconds: 5
  osType: Linux
  restartPolicy: Always
tags: null
type: Microsoft.ContainerInstance/containerGroups

Run the following command to deploy this container group with the YAML configuration above:

az container create --resource-group myResourceGroup --name livenesstest -f liveness-probe.yaml

Start command

The deployment defines a start command to run when the container first starts. It is set by the command property, which accepts an array of strings. In this example, the command starts a shell session (/bin/sh) and creates a file called healthy in the /tmp directory by passing this command:

/bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"

The command then sleeps for 30 seconds, deletes the file, and sleeps for another 10 minutes.

Liveness command

This deployment defines a livenessProbe whose exec command, cat /tmp/healthy, acts as the liveness check. If this command exits with a non-zero value, which here signals that the healthy file could not be found, the container is killed and restarted. If the command exits successfully with exit code 0, no action is taken.

The periodSeconds property specifies that the liveness command runs every 5 seconds.
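The exec probe contract described above is simple: run a command, treat exit code 0 as healthy and any other exit code as unhealthy. The following Python sketch mimics that contract locally so you can see how cat behaves before and after the file is removed. It is an illustration, not how Azure Container Instances runs probes internally; exec_probe is a hypothetical name, and a temporary file stands in for /tmp/healthy.

```python
import subprocess
import tempfile
from pathlib import Path


def exec_probe(command: list[str]) -> bool:
    """Run an exec-style liveness check and report whether it passed.

    Mirrors the rule described above: exit code 0 means healthy,
    any non-zero exit code means unhealthy.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,  # only the exit code matters, not the output
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The probe executable itself is missing; treat that as a failed check.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        healthy = Path(tmp) / "healthy"  # stands in for /tmp/healthy
        healthy.touch()                  # like `touch /tmp/healthy`
        print(exec_probe(["cat", str(healthy)]))  # True: cat exits 0
        healthy.unlink()                 # like `rm -rf /tmp/healthy`
        print(exec_probe(["cat", str(healthy)]))  # False: cat exits non-zero
```

Only the exit code is inspected, which is why the probe output is discarded.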

Verify liveness output

During the first 30 seconds, the healthy file created by the start command exists. When the liveness command checks for the file, it returns exit code zero, signaling success, so no restart occurs.

After 30 seconds, cat /tmp/healthy begins to fail, causing Unhealthy and Killing events to occur.

You can view these events in the Azure portal or with the Azure CLI.

Figure: Unhealthy event in the Azure portal

When you view the events in the Azure portal, an event of type Unhealthy is triggered when the liveness command fails. The subsequent event is of type Killing, signifying that the container is being deleted so that a restart can begin. The container's restart count increments each time this event occurs.
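To see how Unhealthy events, Killing events, and the restart count relate over time, the following Python sketch simulates the probe schedule from the YAML above. It rests on assumptions that this article does not state: the first probe runs one period after the container starts, a container is killed after three consecutive failures (the Kubernetes default for failureThreshold), each restart is instantaneous with no back-off, and the restarted start command recreates /tmp/healthy. simulate_events is a hypothetical name.

```python
def simulate_events(
    duration: int,
    healthy_window: int = 30,
    period: int = 5,
    failure_threshold: int = 3,
) -> tuple[list[tuple[int, str]], int]:
    """Return (events, restart_count) for probes run up to `duration` seconds.

    healthy_window: seconds the start command keeps /tmp/healthy (30 in the YAML).
    period: periodSeconds from the YAML (5).
    failure_threshold: ASSUMED consecutive failures before a kill (Kubernetes default).
    """
    events: list[tuple[int, str]] = []
    restart_count = 0
    started_at = 0  # time the current container instance started
    failures = 0    # consecutive failed probes
    t = period      # ASSUMPTION: first probe runs one period after start
    while t <= duration:
        if t - started_at < healthy_window:
            failures = 0  # cat /tmp/healthy succeeds
        else:
            failures += 1
            events.append((t, "Unhealthy"))
            if failures >= failure_threshold:
                events.append((t, "Killing"))
                restart_count += 1
                started_at = t  # ASSUMPTION: instant in-place restart, no back-off
                failures = 0
        t += period
    return events, restart_count


if __name__ == "__main__":
    events, restarts = simulate_events(80)
    for when, kind in events:
        print(f"t={when:>3}s  {kind}")
    print("restart count:", restarts)
```

Under these assumptions, the first Killing event lands at t = 40 s and the restart count reaches 2 by t = 80 s. Changing failure_threshold shows how sensitive the restart timing is to that setting.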

Restarts are completed in place, so resources such as public IP addresses and node-specific contents are preserved.

Figure: Restart counter in the Azure portal

If the liveness probe fails continuously and triggers too many restarts, your container enters an exponential back-off delay.
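This article does not specify the back-off schedule that Azure Container Instances uses. For illustration only, the following Python sketch assumes a Kubernetes-style schedule in which the delay starts at 10 seconds, doubles after each restart, and is capped at 5 minutes (300 seconds). Treat both numbers, and the hypothetical backoff_delays function, as assumptions.

```python
def backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Delay (seconds) before each of `restarts` successive restarts.

    ASSUMED Kubernetes-style schedule: start at `base`, double each time,
    never exceed `cap`. ACI's actual values are not documented in this article.
    """
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))  # clamp to the cap
        delay *= 2                      # exponential growth
    return delays


if __name__ == "__main__":
    print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

The cap keeps a permanently failing container from waiting ever longer between attempts while still sharply reducing restart churn.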

Liveness probes and restart policies

Restart policies supersede the restart behavior triggered by liveness probes. For example, if you set restartPolicy = Never and also configure a liveness probe, the container group does not restart because of a failed liveness check. Instead, the container group adheres to its restart policy of Never.
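The precedence rule above can be expressed as a small decision function. Only the Never case is documented here; the sketch assumes that a probe-triggered kill counts as a failure, so both Always and OnFailure restart the container. RestartPolicy and restarts_after_probe_failure are hypothetical names; the enum values mirror the restartPolicy strings used in the container group YAML.

```python
from enum import Enum


class RestartPolicy(Enum):
    """Values of the restartPolicy property in the container group YAML."""
    ALWAYS = "Always"
    ON_FAILURE = "OnFailure"
    NEVER = "Never"


def restarts_after_probe_failure(policy: RestartPolicy) -> bool:
    """Decide whether a failed liveness probe leads to a container restart."""
    if policy is RestartPolicy.NEVER:
        # Documented behavior: Never supersedes the liveness probe.
        return False
    # ASSUMPTION: a probe-triggered kill counts as a failure, so both
    # Always and OnFailure restart the container.
    return True


if __name__ == "__main__":
    for value in ("Always", "OnFailure", "Never"):
        print(value, restarts_after_probe_failure(RestartPolicy(value)))
```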

Next steps

Task-based scenarios may require a liveness probe to enable automatic restarts when a prerequisite function is not working properly. For more information about running task-based containers, see Run containerized tasks in Azure Container Instances.