Configure liveness probes
Containerized applications may run for extended periods of time and can end up in broken states that need to be repaired by restarting the container. Azure Container Instances supports liveness probes, so you can configure the containers in your container group to restart if critical functionality stops working. The liveness probe behaves like a Kubernetes liveness probe.
This article explains how to deploy a container group that includes a liveness probe and demonstrates the automatic restart of a simulated unhealthy container.
Azure Container Instances also supports readiness probes. You can configure these so that traffic reaches a container only when the container is ready for it.
YAML deployment
Create a file named liveness-probe.yaml with the following snippet. This file defines a container group that consists of an NGINX container that eventually becomes unhealthy.
apiVersion: 2018-10-01
location: chinaeast2
name: livenesstest
properties:
  containers:
  - name: mycontainer
    properties:
      image: nginx
      command:
      - "/bin/sh"
      - "-c"
      - "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"
      ports: []
      resources:
        requests:
          cpu: 1.0
          memoryInGB: 1.5
      livenessProbe:
        exec:
          command:
          - "cat"
          - "/tmp/healthy"
        periodSeconds: 5
  osType: Linux
  restartPolicy: Always
tags: null
type: Microsoft.ContainerInstance/containerGroups
Run the following command to deploy this container group with the YAML configuration above:
az container create --resource-group myResourceGroup --name livenesstest -f liveness-probe.yaml
Start command
The deployment defines a start command that runs when the container first starts. This command is set by the command property, which accepts an array of strings. In this example, the command starts a shell session (/bin/sh) and creates a file named healthy in the /tmp directory by passing this command:
/bin/sh -c "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"
The command then sleeps for 30 seconds, deletes the file, and enters a 10-minute sleep.
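You can replay the start command's timeline locally without waiting 10 minutes. The following sketch wraps the same touch/sleep/rm/sleep sequence in a POSIX shell function and turns the sleep durations into parameters. The function name `run_start_command` and its parameters are hypothetical. The defaults reproduce the 30-second and 600-second values from the YAML above.

```shell
#!/bin/sh
# Hypothetical helper: replays the container's start command locally.
# Usage: run_start_command FILE [HEALTHY_SECS] [IDLE_SECS]
# Defaults mirror the deployment: keep FILE for 30 s, then idle for 600 s.
run_start_command() {
  file=$1
  healthy_secs=${2:-30}
  idle_secs=${3:-600}
  touch "$file"            # create the marker file the liveness probe reads
  sleep "$healthy_secs"    # window during which the probe succeeds
  rm -rf "$file"           # delete the marker; the probe starts failing
  sleep "$idle_secs"       # keep the process alive while it is unhealthy
}
```

For example, `run_start_command /tmp/healthy 3 10 &` creates the file, removes it about 3 seconds later, and then idles for 10 seconds.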
Liveness command
This deployment defines a livenessProbe that uses an exec liveness command as the liveness check. If this command exits with a non-zero value, which signals that the healthy file could not be found, the container is killed and restarted. If the command exits successfully with exit code 0, no action is taken.
The periodSeconds property specifies that the liveness command runs every 5 seconds.
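The exec probe depends only on the exit status of `cat /tmp/healthy`. The following sketch runs the same check against an arbitrary file and discards cat's output, because only the exit code matters. It also adds a polling loop that plays the role of periodSeconds. Both function names, `liveness_check` and `poll_liveness`, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: runs the same check as the livenessProbe exec command.
# Returns 0 (healthy) if FILE can be read, non-zero (unhealthy) otherwise.
liveness_check() {
  cat "$1" >/dev/null 2>&1
}

# Hypothetical: check FILE every PERIOD seconds, at most COUNT times.
# Prints the number of the first failing attempt and returns 1,
# or prints 0 and returns 0 if every attempt succeeded.
poll_liveness() {
  file=$1 period=$2 count=$3
  n=1
  while [ "$n" -le "$count" ]; do
    if ! liveness_check "$file"; then
      echo "$n"
      return 1
    fi
    sleep "$period"
    n=$((n + 1))
  done
  echo 0
}
```

Running `poll_liveness /tmp/healthy 5 12` alongside the start command should report a failure at roughly the sixth or seventh attempt, about 30 seconds in. The exact attempt depends on when the first check lands relative to the file's removal.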
Verify liveness output
During the first 30 seconds, the healthy file created by the start command exists. When the liveness command checks for the healthy file, it returns exit code zero, which signals success, so no restart occurs.
After 30 seconds, cat /tmp/healthy begins to fail, which causes Unhealthy and Killing events to occur.
You can view these events in the Azure portal or with the Azure CLI.
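With the Azure CLI, one option is `az container show` with a JMESPath `--query`. The sketch below wraps it in two hypothetical functions. It assumes the resource group and container group names used in this article (`myResourceGroup`, `livenesstest`). It also assumes that the container of interest is the first one in the group.

```shell
#!/bin/sh
# Hypothetical wrappers around `az container show`.
# Usage: show_container_events RESOURCE_GROUP CONTAINER_GROUP
# Lists the instance-view events of the first container in the group.
show_container_events() {
  az container show --resource-group "$1" --name "$2" \
    --query "containers[0].instanceView.events" --output table
}

# Prints the restart count of the first container in the group.
show_restart_count() {
  az container show --resource-group "$1" --name "$2" \
    --query "containers[0].instanceView.restartCount" --output tsv
}
```

For example, `show_container_events myResourceGroup livenesstest` and `show_restart_count myResourceGroup livenesstest`.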
In the Azure portal, an event of type Unhealthy is triggered when the liveness command fails. The subsequent event is of type Killing, which signifies that the container is being deleted so that a restart can begin. The container's restart count increments each time this event occurs.
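The sequence of Unhealthy and Killing events and the growing restart count can be approximated with a small, deterministic simulation. This is a simplified model, not how Azure Container Instances is implemented. It makes the following assumptions:

- Probes run at every multiple of periodSeconds.
- The file counts as missing from the 30-second mark on.
- A restart re-runs the start command immediately, and back-off delays are ignored.
- A hypothetical `THRESHOLD` parameter sets the number of consecutive failures before Killing. The YAML in this article does not set this value.

```shell
#!/bin/sh
# Simplified, hypothetical model of the liveness probe timeline.
# Usage: simulate_liveness TOTAL_SECS PERIOD_SECS HEALTHY_SECS THRESHOLD
#   TOTAL_SECS   how long to simulate
#   PERIOD_SECS  probe interval (periodSeconds: 5 in the YAML)
#   HEALTHY_SECS how long the start command keeps /tmp/healthy (30)
#   THRESHOLD    consecutive failures before Killing (assumed parameter)
# Back-off delays between restarts are ignored.
simulate_liveness() {
  total=$1 period=$2 healthy=$3 threshold=$4
  started=0      # time the current container instance started
  failures=0     # consecutive failed probes
  restarts=0     # restart count
  t=$period
  while [ "$t" -le "$total" ]; do
    if [ $((t - started)) -lt "$healthy" ]; then
      failures=0                     # file exists: probe exits 0
    else
      failures=$((failures + 1))     # file gone: probe exits non-zero
      echo "t=${t}s Unhealthy"
      if [ "$failures" -ge "$threshold" ]; then
        echo "t=${t}s Killing"
        restarts=$((restarts + 1))
        started=$t                   # restart re-runs the start command
        failures=0
      fi
    fi
    t=$((t + period))
  done
  echo "restartCount=$restarts"
}
```

With `simulate_liveness 60 5 30 1`, the model prints an Unhealthy/Killing pair at 30 s and again at 60 s, then `restartCount=2`. With a threshold of 3, the first Killing moves to 40 s and only one restart happens within 60 s.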
Restarts are performed in place, so resources such as public IP addresses and node-specific contents are preserved.
If the liveness probe fails continuously and triggers too many restarts, the container enters an exponential back-off delay.
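This article does not state the back-off schedule that Azure Container Instances uses. As an illustration only, the sketch below computes a capped exponential delay using the values Kubernetes documents for container restarts: 10 seconds at first, doubling each time, capped at five minutes. Treat those numbers, and the function name `backoff_delay`, as assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a capped exponential back-off.
# Usage: backoff_delay N [BASE_SECS] [CAP_SECS]
# Prints the delay before the Nth restart: BASE * 2^(N-1), at most CAP.
# BASE=10 and CAP=300 are assumed (Kubernetes-style), not ACI-documented.
backoff_delay() {
  n=$1 base=${2:-10} cap=${3:-300}
  delay=$base
  i=1
  # Double until the Nth restart is reached or the cap is hit.
  while [ "$i" -lt "$n" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}
```

For example, `for n in 1 2 3 4 5 6; do backoff_delay "$n"; done` prints 10, 20, 40, 80, 160 and 300.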
Liveness probes and restart policies
Restart policies supersede the restart behavior triggered by liveness probes. For example, if you set restartPolicy = Never and also configure a liveness probe, the container group does not restart because of a failed liveness check. Instead, the container group adheres to its restart policy of Never.
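The precedence rule can be written as a small decision function. This hypothetical sketch models only the two policies discussed here: Always (used in the YAML above) and Never. Azure Container Instances also offers OnFailure, but this article does not describe how it interacts with liveness probes, so the function reports it as unknown.

```shell
#!/bin/sh
# Hypothetical sketch of the precedence rule described above.
# Usage: liveness_restart_action POLICY PROBE_FAILED
#   POLICY        container group restartPolicy (Always or Never)
#   PROBE_FAILED  1 if the liveness check failed, 0 otherwise
liveness_restart_action() {
  policy=$1 probe_failed=$2
  if [ "$probe_failed" -ne 1 ]; then
    echo "none"                       # healthy probe: nothing to do
    return 0
  fi
  case $policy in
    Always) echo "restart" ;;         # probe-triggered restart is allowed
    Never)  echo "no-restart" ;;      # restart policy overrides the probe
    *)      echo "unknown"; return 2 ;;  # e.g. OnFailure: not modeled here
  esac
}
```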
Next steps
Task-based scenarios may need a liveness probe to restart a container automatically when a prerequisite function is not working properly. For more information about running task-based containers, see Run containerized tasks in Azure Container Instances.