Troubleshoot common issues in Azure Container Instances

This article shows how to troubleshoot common issues when managing containers or deploying them to Azure Container Instances. See also Frequently asked questions.

If you need additional support, see the Help + support options available in the Azure portal.

Issues during container group deployment

Naming conventions

When you define your container specification, certain parameters must follow naming restrictions. The following table lists the specific requirements for container group properties. For more information, see Naming rules and restrictions for Azure resources.

Scope | Length | Casing | Valid characters | Suggested pattern | Example
--- | --- | --- | --- | --- | ---
Container name¹ | 1-63 | Lowercase | Alphanumeric characters, and hyphens anywhere except the first or last character | <name>-<role>-container<number> | web-batch-container1
Container ports | Between 1 and 65535 | Not applicable | Integer between 1 and 65535 | <port-number> | 443
DNS name label | 5-63 | Case insensitive | Alphanumeric characters, and hyphens anywhere except the first or last character | <name> | frontend-site1
Environment variable | 1-63 | Case insensitive | Alphanumeric characters, and underscores (_) anywhere except the first or last character | <name> | MY_VARIABLE
Volume name | 5-63 | Lowercase | Alphanumeric characters, and hyphens anywhere except the first or last character; cannot contain two consecutive hyphens | <name> | batch-output-volume

¹ The same restriction applies to container group names when they aren't specified separately from the container instances, for example in deployments made with the az container create command.
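To catch naming errors before a deployment fails, you can check names against the rules in the table locally. The following is a minimal sketch: the function names (valid_container_name, valid_dns_label, and so on) are hypothetical helpers, not part of the Azure CLI, and the regular expressions are one reading of the table's rules rather than the service's own validation logic.

```shell
# Hypothetical helpers that check names against the naming table.
# They use grep -E so they run in any POSIX shell.

# Succeed if string $1 matches extended regex $2 (-e allows patterns starting with "-").
matches() {
  printf '%s\n' "$1" | grep -Eq -e "$2"
}

# Container name (and container group name): 1-63 chars, lowercase letters,
# digits, and hyphens; no hyphen as the first or last character.
valid_container_name() {
  matches "$1" '^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
}

# Container port: integer between 1 and 65535.
valid_port() {
  matches "$1" '^[0-9]+$' && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# DNS name label: 5-63 chars, case insensitive, letters, digits, and hyphens;
# no hyphen as the first or last character.
valid_dns_label() {
  matches "$1" '^[A-Za-z0-9][A-Za-z0-9-]{3,61}[A-Za-z0-9]$'
}

# Environment variable: 1-63 chars, letters, digits, and underscores;
# no underscore as the first or last character.
valid_env_var_name() {
  matches "$1" '^[A-Za-z0-9]([A-Za-z0-9_]{0,61}[A-Za-z0-9])?$'
}

# Volume name: 5-63 chars, lowercase letters, digits, and hyphens; no hyphen
# as the first or last character and no two consecutive hyphens.
valid_volume_name() {
  matches "$1" '^[a-z0-9][a-z0-9-]{3,61}[a-z0-9]$' && ! matches "$1" '--'
}
```

For example, valid_container_name web-batch-container1 succeeds, while valid_volume_name batch--output fails because of the double hyphen.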

OS version of image not supported

If you specify an image that Azure Container Instances doesn't support, an OsVersionNotSupported error is returned. The error is similar to the following, where {0} is the name of the image you attempted to deploy:

{
  "error": {
    "code": "OsVersionNotSupported",
    "message": "The OS version of image '{0}' is not supported."
  }
}

This error is most often encountered when deploying Windows images based on Semi-Annual Channel release 1709 or 1803, which are not supported. For the Windows images that Azure Container Instances supports, see Frequently asked questions.
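To check a Windows image before deploying it, you can read its OS build from the local image metadata. This sketch assumes Docker is installed and the image has already been pulled locally. It flags only the two releases named above (1709 is build 10.0.16299, 1803 is build 10.0.17134), so use the FAQ for the authoritative list of supported versions. The function names are hypothetical.

```shell
# Hypothetical helpers. Builds of the unsupported Semi-Annual Channel
# releases: 1709 = 10.0.16299, 1803 = 10.0.17134.
is_unsupported_windows_build() {
  # The build number is the third dot-separated field of "10.0.<build>.<revision>".
  case $(printf '%s\n' "$1" | cut -d. -f3) in
    16299 | 17134) return 0 ;;
    *) return 1 ;;
  esac
}

# Read OsVersion from a locally pulled image (empty for Linux images).
check_image_os() {
  local os_version
  os_version=$(docker image inspect --format '{{.OsVersion}}' "$1") || return 2
  if is_unsupported_windows_build "$os_version"; then
    echo "unsupported: $1 ($os_version)"
    return 1
  fi
  echo "ok: $1 ($os_version)"
}
```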

Unable to pull image

If Azure Container Instances is initially unable to pull your image, it retries for a period of time. If the image pull operation continues to fail, ACI eventually fails the deployment, and you may see a Failed to pull image error.

To resolve this issue, delete the container instance and retry your deployment. Make sure that the image exists in the registry and that you've typed the image name correctly.
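A minimal sketch of this check-then-retry flow, assuming Docker and the Azure CLI are installed and that you're signed in to any private registry (for example with az acr login). docker manifest inspect queries the registry without pulling the image. The function name is hypothetical, and a real redeployment should repeat every option from your original az container create command (ports, environment variables, registry credentials), not just the image.

```shell
# Hypothetical helper: redeploy only after the image reference resolves.
redeploy_if_image_exists() {
  local rg=$1 name=$2 image=$3
  # Exits non-zero if the registry has no such image or tag, or you lack pull access.
  if ! docker manifest inspect "$image" > /dev/null 2>&1; then
    echo "image not found or not accessible: $image" >&2
    return 1
  fi
  # --yes skips the delete confirmation prompt.
  az container delete --resource-group "$rg" --name "$name" --yes
  az container create --resource-group "$rg" --name "$name" --image "$image"
}
```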

If the image can't be pulled, the output of az container show includes events like the following:

"events": [
  {
    "count": 3,
    "firstTimestamp": "2017-12-21T22:56:19+00:00",
    "lastTimestamp": "2017-12-21T22:57:00+00:00",
    "message": "pulling image \"mcr.microsoft.com/azuredocs/aci-hellowrld\"",
    "name": "Pulling",
    "type": "Normal"
  },
  {
    "count": 3,
    "firstTimestamp": "2017-12-21T22:56:19+00:00",
    "lastTimestamp": "2017-12-21T22:57:00+00:00",
    "message": "Failed to pull image \"mcr.microsoft.com/azuredocs/aci-hellowrld\": rpc error: code 2 desc Error: image t/aci-hellowrld:latest not found",
    "name": "Failed",
    "type": "Warning"
  },
  {
    "count": 3,
    "firstTimestamp": "2017-12-21T22:56:20+00:00",
    "lastTimestamp": "2017-12-21T22:57:16+00:00",
    "message": "Back-off pulling image \"mcr.microsoft.com/azuredocs/aci-hellowrld\"",
    "name": "BackOff",
    "type": "Normal"
  }
],
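Rather than scanning the full output, you can ask az container show to return only the messages of Failed events, such as the pull failure above, by using the CLI's global JMESPath --query option. This sketch assumes the events are nested under containers[].instanceView in the az container show output; the function names are hypothetical.

```shell
# Hypothetical helpers built on the Azure CLI's global --query (JMESPath) option.
# Print the message of every "Failed" event across the group's containers;
# "| []" flattens the per-container lists into one list.
show_failed_events() {
  az container show --resource-group "$1" --name "$2" \
    --query "containers[].instanceView.events[?name=='Failed'].message | []" \
    --output tsv
}

# Succeed if at least one Failed event was reported.
pull_failed() {
  [ -n "$(show_failed_events "$1" "$2")" ]
}
```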

Resource not available error

Due to varying regional resource load in Azure, you might receive the following error when attempting to deploy a container instance:

The requested resource with 'x' CPU and 'y.z' GB memory is not available in the location 'example region' at this moment. Please retry with a different resource request or in another location.

This error indicates that, due to heavy load in the region where you're attempting to deploy, the resources specified for your container can't be allocated at that time. As the error message suggests, you can retry with a smaller resource request (fewer CPU cores or less memory), deploy to a different region, or retry the deployment later.
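If you script deployments, the error message's suggestions (a smaller request or another location) can be combined into a fallback loop that tries a list of regions in order. This is a hypothetical helper, not an Azure CLI feature, and the region names in the usage example are placeholders. If a failed attempt leaves a container group behind, delete it before retrying under the same name.

```shell
# Hypothetical helper: try each region in turn; print the first that succeeds.
# Usage: create_in_first_available_region RG NAME IMAGE CPU MEMORY REGION...
create_in_first_available_region() {
  local rg=$1 name=$2 image=$3 cpu=$4 memory=$5 location
  shift 5
  for location in "$@"; do
    # --output none keeps the CLI's JSON out of stdout so only the region is printed.
    if az container create --resource-group "$rg" --name "$name" \
        --image "$image" --cpu "$cpu" --memory "$memory" \
        --location "$location" --output none; then
      echo "$location"
      return 0
    fi
    echo "deployment failed in $location, trying next region" >&2
  done
  return 1
}
```

For example: create_in_first_available_region myResourceGroup mycontainer mcr.microsoft.com/azuredocs/aci-helloworld 1 1.5 chinaeast2 chinanorth2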

Issues during container group runtime

Container continually exits and restarts (no long-running process)

Container groups default to a restart policy of Always, so containers in the container group always restart after they run to completion. If you intend to run task-based containers, you may need to change this policy to OnFailure or Never. If you specify OnFailure and still see continual restarts, there might be an issue with the application or script executed in your container.
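For a run-to-completion task, you can set the policy explicitly with the --restart-policy parameter of az container create. The wrapper below is a hypothetical sketch that rejects values other than Always, OnFailure, and Never before calling the CLI.

```shell
# Hypothetical wrapper: deploy a task container with an explicit restart policy.
# The policy defaults to OnFailure when the fourth argument is omitted.
create_task_container() {
  local rg=$1 name=$2 image=$3 policy=${4:-OnFailure}
  case $policy in
    Always | OnFailure | Never) ;;
    *) echo "unknown restart policy: $policy" >&2; return 1 ;;
  esac
  az container create --resource-group "$rg" --name "$name" \
    --image "$image" --restart-policy "$policy"
}
```

For example: create_task_container myResourceGroup mytask mcr.microsoft.com/azuredocs/aci-wordcount:latest OnFailure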

When you run container groups without long-running processes, you may see repeated exits and restarts with images such as Ubuntu or Alpine. Connecting with az container exec won't work, because the container has no process keeping it alive. To resolve this problem, include a start command like the following with your container group deployment to keep the container running.

## Deploying a Linux container
az container create -g myResourceGroup --name myapp --image ubuntu --command-line "tail -f /dev/null"
## Deploying a Windows container
az container create -g myResourceGroup --name mywindowsapp --os-type Windows --image mcr.microsoft.com/windows/servercore:ltsc2019 \
  --command-line "ping -t localhost"

The Container Instances API and the Azure portal include a restartCount property. To check the number of restarts for a container, use the az container show command in the Azure CLI. In the following example output (truncated for brevity), the restartCount property appears at the end.

...
 "events": [
   {
     "count": 1,
     "firstTimestamp": "2017-11-13T21:20:06+00:00",
     "lastTimestamp": "2017-11-13T21:20:06+00:00",
     "message": "Pulling: pulling image \"myregistry.azurecr.cn/aci-tutorial-app:v1\"",
     "type": "Normal"
   },
   {
     "count": 1,
     "firstTimestamp": "2017-11-13T21:20:14+00:00",
     "lastTimestamp": "2017-11-13T21:20:14+00:00",
     "message": "Pulled: Successfully pulled image \"myregistry.azurecr.cn/aci-tutorial-app:v1\"",
     "type": "Normal"
   },
   {
     "count": 1,
     "firstTimestamp": "2017-11-13T21:20:14+00:00",
     "lastTimestamp": "2017-11-13T21:20:14+00:00",
     "message": "Created: Created container with id bf25a6ac73a925687cafcec792c9e3723b0776f683d8d1402b20cc9fb5f66a10",
     "type": "Normal"
   },
   {
     "count": 1,
     "firstTimestamp": "2017-11-13T21:20:14+00:00",
     "lastTimestamp": "2017-11-13T21:20:14+00:00",
     "message": "Started: Started container with id bf25a6ac73a925687cafcec792c9e3723b0776f683d8d1402b20cc9fb5f66a10",
     "type": "Normal"
   }
 ],
 "previousState": null,
 "restartCount": 0
...
}
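To read just the restart counter instead of the full output, you can pass a JMESPath --query to az container show. This sketch assumes the counter sits at containers[0].instanceView.restartCount (the first container in the group) and uses an arbitrary default threshold of three restarts; both function names are hypothetical.

```shell
# Hypothetical helpers. Print the restart count of the group's first container.
get_restart_count() {
  az container show --resource-group "$1" --name "$2" \
    --query "containers[0].instanceView.restartCount" --output tsv
}

# Succeed if the first container restarted more than $3 times (default 3).
is_restart_looping() {
  local count
  count=$(get_restart_count "$1" "$2") || return 2
  [ "${count:-0}" -gt "${3:-3}" ]
}
```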

Note

Most container images for Linux distributions set a shell, such as bash, as the default command. Because a shell on its own is not a long-running service, these containers exit immediately and fall into a restart loop when configured with the default Always restart policy.

Container takes a long time to start

The three primary factors that contribute to container startup time in Azure Container Instances are image size, image location, and cached images, each described below.

Windows images have additional considerations.

Image size

If your container takes a long time to start but eventually succeeds, start by looking at the size of your container image. Because Azure Container Instances pulls your container image on demand, the startup time you see is directly related to its size.

You can view the size of your container image by using the docker images command in the Docker CLI:

$ docker images
REPOSITORY                                    TAG       IMAGE ID        CREATED          SIZE
mcr.microsoft.com/azuredocs/aci-helloworld    latest    7367f3256b41    15 months ago    67.6MB

The key to keeping image sizes small is ensuring that your final image doesn't contain anything that isn't required at runtime. One way to do this is with multi-stage builds, which make it easy to ensure that the final image contains only the artifacts your application needs, and none of the extra content that was required at build time.

Image location

Another way to reduce the impact of the image pull on your container's startup time is to host the container image in Azure Container Registry in the same region where you intend to deploy container instances. This shortens the network path the container image needs to travel, which can significantly reduce download time.
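You can check whether a registry is already in the deployment region by comparing its location with your target region. This is a hypothetical helper that assumes an existing Azure Container Registry; it lowercases and strips spaces so that a display name such as "China East 2" compares equal to chinaeast2.

```shell
# Hypothetical helpers. Normalize "China East 2" and "chinaeast2" to the same form.
normalize_location() {
  printf '%s\n' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]'
}

# Succeed if registry $1 is located in region $2.
registry_in_region() {
  local registry=$1 location=$2 acr_location
  acr_location=$(az acr show --name "$registry" --query location --output tsv) || return 2
  [ "$(normalize_location "$acr_location")" = "$(normalize_location "$location")" ]
}
```

If the regions differ, one option is to copy the image into a registry in the target region with az acr import, for example: az acr import --name myregistry --source mcr.microsoft.com/azuredocs/aci-helloworld:latest --image aci-helloworld:latest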

Cached images

Azure Container Instances uses a caching mechanism to help speed container startup time for images built on common Windows base images, including nanoserver:1809, servercore:ltsc2019, and servercore:1809. Commonly used Linux images such as ubuntu:16.04 and alpine:3.6 are also cached. For an up-to-date list of cached images and tags, use the List Cached Images API.
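One way to call the List Cached Images API from the command line is az rest, which prefixes a URL that starts with "/" with the Resource Manager endpoint of your current cloud and substitutes your current subscription for {subscriptionId}. The api-version value and the assumption that each returned entry has an image field are assumptions here; check the REST reference for the versions available to you. The function names are hypothetical.

```shell
# Hypothetical helpers. The api-version below is an assumption; use one
# listed in the Container Instances REST reference for your cloud.
list_cached_images() {
  az rest --method get \
    --url "/subscriptions/{subscriptionId}/providers/Microsoft.ContainerInstance/locations/$1/cachedImages?api-version=2018-10-01" \
    --query "value[].image" --output tsv
}

# Succeed if the exact image string appears in the region's cached list.
is_image_cached() {
  list_cached_images "$1" | grep -qxF -e "$2"
}
```

Compare against the exact strings the API returns, because registry prefixes in the cached list may differ from the names you use in your deployment.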

Note

Use of Windows Server 2019-based images in Azure Container Instances is in preview.

Slow network readiness for Windows containers

On initial creation, Windows containers may have no inbound or outbound connectivity for up to 30 seconds (or longer, in rare cases). If your container application needs an Internet connection, add delay and retry logic that allows up to 30 seconds for connectivity to be established. After initial setup, container networking should resume normally.
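A minimal sketch of such delay-and-retry logic, written as a shell function to match the other examples; in a Windows image you would typically express the same loop in PowerShell. wait_for_network is a hypothetical name, and the probe command is whatever connectivity check your application needs.

```shell
# Hypothetical helper: run a probe command until it succeeds or TIMEOUT seconds pass.
# Usage: wait_for_network TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for_network() {
  local timeout=$1 interval=$2 elapsed=0
  shift 2
  # Guard against a zero or negative interval, which would never advance the clock.
  [ "$interval" -gt 0 ] || interval=1
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "gave up after ${elapsed}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, wait_for_network 30 5 curl --silent --output /dev/null --max-time 5 https://mcr.microsoft.com succeeds as soon as any HTTP response arrives; the URL is only an example endpoint.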

Cannot connect to underlying Docker API or run privileged containers

Azure Container Instances doesn't expose direct access to the underlying infrastructure that hosts container groups. This includes access to the Docker API running on the container's host and running privileged containers. If you require Docker interaction, check the REST reference documentation to see what the ACI API supports. If something is missing, submit a request on the ACI feedback forums.

Container group IP address may not be accessible due to mismatched ports

Azure Container Instances doesn't yet support port mapping as in a regular Docker configuration. If a container group's IP address isn't accessible when you believe it should be, make sure your container image is configured to listen on the same ports that you expose in your container group with the ports property.
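To compare the ports a container group opens with the port your application listens on, you can list the group's ports with az container show. This sketch assumes the ports are reported under ipAddress.ports, and the function name is hypothetical. It can't tell you which port the process inside the container listens on, so check that in your image or application configuration.

```shell
# Hypothetical helper: succeed if container group $2 in resource group $1 opens port $3.
group_exposes_port() {
  az container show --resource-group "$1" --name "$2" \
    --query "ipAddress.ports[].port" --output tsv | grep -qx -e "$3"
}
```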

To confirm that Azure Container Instances can listen on the port you configured in your container image, test a deployment of the aci-helloworld image that exposes that port, and run the aci-helloworld app so that it listens on the same port. The aci-helloworld app accepts an optional PORT environment variable that overrides its default listening port, 80. For example, to test port 9000, set the environment variable when you create the container group:

  1. Set up the container group to expose port 9000, and pass the port number as the value of the environment variable. The example is formatted for the Bash shell. If you prefer another shell such as PowerShell or Command Prompt, adjust the variable assignment and line continuations accordingly.

    az container create --resource-group myResourceGroup \
    --name mycontainer --image mcr.microsoft.com/azuredocs/aci-helloworld \
    --ip-address Public --ports 9000 \
    --environment-variables 'PORT'='9000'
    
  2. az container create 的命令输出中找到该容器组的 IP 地址。Find the IP address of the container group in the command output of az container create. 查找 ip 的值。Look for the value of ip.

  3. After the container is provisioned successfully, browse to the IP address and port of the container app in your browser, for example: 192.0.2.0:9000.

    You should see the "Welcome to Azure Container Instances!" message displayed by the web app.

  4. When you're done with the container, remove it using the az container delete command:

    az container delete --resource-group myResourceGroup --name mycontainer
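Steps 2 and 3 can also be scripted before you delete the container in step 4. This hypothetical helper reads the IP address with a --query instead of scanning the command output, then uses curl to look for the welcome message. It assumes curl is installed and the container group has a public IP address.

```shell
# Hypothetical helper: succeed if the aci-helloworld page is served on port $3.
check_helloworld() {
  local rg=$1 name=$2 port=$3 ip
  ip=$(az container show --resource-group "$rg" --name "$name" \
    --query ipAddress.ip --output tsv) || return 2
  [ -n "$ip" ] || return 2
  curl --silent --max-time 10 "http://$ip:$port" |
    grep -q 'Welcome to Azure Container Instances!'
}
```

For example: check_helloworld myResourceGroup mycontainer 9000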
    

Next steps

Learn how to retrieve container logs and events to help debug your containers.