How to troubleshoot issues with the Log Analytics agent for Linux

This article provides help troubleshooting errors you might experience with the Log Analytics agent for Linux in Azure Monitor and suggests possible solutions to resolve them.

If none of these steps work for you, the following support channels are also available:

Log Analytics Troubleshooting Tool

The Log Analytics Agent Linux Troubleshooting Tool is a script designed to help find and diagnose issues with the Log Analytics agent. It is automatically included with the agent upon installation. Running the tool should be the first step in diagnosing an issue.

How to use

The Troubleshooting Tool can be run by pasting the following command into a terminal window on a machine with the Log Analytics agent:

sudo /opt/microsoft/omsagent/bin/troubleshooter

Manual installation

The Troubleshooting Tool is automatically included upon installation of the Log Analytics agent. However, if installation fails in any way, it can also be installed manually by following the steps below.

  1. Copy the troubleshooter bundle onto your machine:
    wget https://raw.github.com/microsoft/OMS-Agent-for-Linux/master/source/code/troubleshooter/omsagent_tst.tar.gz
  2. Unpack the bundle:
    tar -xzvf omsagent_tst.tar.gz
  3. Run the manual installation:
    sudo ./install_tst

Scenarios covered

Below is a list of scenarios checked by the Troubleshooting Tool:

  1. Agent is unhealthy, heartbeat doesn't work properly
  2. Agent doesn't start, can't connect to Log Analytics services
  3. Agent syslog isn't working
  4. Agent has high CPU / memory usage
  5. Agent has installation issues
  6. Agent custom logs aren't working
  7. Collect agent logs

For more details, please check out our GitHub documentation.

Note

Please run the Log Collector tool when you experience an issue. Having the logs from the start will greatly help our support team troubleshoot your issue more quickly.

Purge and re-install the Linux agent

We've seen that a clean re-install of the agent fixes most issues. In fact, this may be the first suggestion from our support team, to get the agent back into an uncorrupted state. Running the troubleshooter, collecting logs, and attempting a clean re-install will help solve issues more quickly.

  1. Download the purge script:
  • $ wget https://raw.githubusercontent.com/microsoft/OMS-Agent-for-Linux/master/tools/purge_omsagent.sh
  2. Run the purge script (with sudo permissions):
  • $ sudo sh purge_omsagent.sh

Important log locations and Log Collector tool

File | Path
Log Analytics agent for Linux log file | /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log
Log Analytics agent configuration log file | /var/opt/microsoft/omsconfig/omsconfig.log

We recommend that you use our Log Collector tool to retrieve important logs for troubleshooting or before submitting a GitHub issue. You can read more about the tool and how to run it here.
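Before you run the Log Collector tool, you can quickly check which of the two log files above exist on a machine. The following is a minimal sketch that assumes you know the workspace ID. The function names `agent_log_paths` and `show_agent_logs` are hypothetical and not part of the agent. The optional root-prefix argument exists only so the paths can be checked against a scratch directory.

```shell
# Hypothetical helper: print the two log file paths from the table above
# for a given workspace ID.
# $1 = workspace ID, $2 = optional root prefix (default "", the real filesystem)
agent_log_paths() {
    ws_id="$1"
    root="${2:-}"
    printf '%s\n' \
        "$root/var/opt/microsoft/omsagent/$ws_id/log/omsagent.log" \
        "$root/var/opt/microsoft/omsconfig/omsconfig.log"
}

# Hypothetical helper: report whether each of those log files exists.
show_agent_logs() {
    agent_log_paths "$1" "${2:-}" | while IFS= read -r log_file; do
        if [ -f "$log_file" ]; then
            echo "found:   $log_file"
        else
            echo "missing: $log_file"
        fi
    done
}
```

For example, run `show_agent_logs <workspace id>`, then inspect the files it reports as found with `sudo tail`.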

Important configuration files

Category | File location
Syslog | /etc/syslog-ng/syslog-ng.conf or /etc/rsyslog.conf or /etc/rsyslog.d/95-omsagent.conf
Performance, Nagios, Zabbix, Log Analytics output and general agent | /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf
Additional configurations | /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.d/*.conf
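Which Syslog file applies depends on the syslog daemon installed. The following is a minimal sketch that lists whichever of the three Syslog files from the table are present. The function name `find_syslog_conf` and its root-prefix argument are hypothetical.

```shell
# Hypothetical helper: print every Syslog configuration file from the table
# above that exists. $1 = optional root prefix (default "", the real filesystem).
find_syslog_conf() {
    root="${1:-}"
    for conf in /etc/syslog-ng/syslog-ng.conf /etc/rsyslog.conf /etc/rsyslog.d/95-omsagent.conf; do
        [ -f "$root$conf" ] && echo "$root$conf"
    done
    # The last [ -f ] test may have failed; that is not an error here.
    return 0
}
```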

Note

Edits to the configuration files for performance counters and Syslog are overwritten if collection is configured from the Data menu in Log Analytics Advanced Settings in the Azure portal for your workspace. To disable configuration for all agents, disable collection from Log Analytics Advanced Settings. To disable it for a single agent, run the following:
sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/OMS_MetaConfigHelper.py --disable'

Installation error codes

Error code | Meaning
NOT_DEFINED | Because the necessary dependencies are not installed, the auoms auditd plugin will not be installed. Installation of auoms failed; install the auditd package.
2 | Invalid option provided to the shell bundle. Run sudo sh ./omsagent-*.universal*.sh --help for usage.
3 | No option provided to the shell bundle. Run sudo sh ./omsagent-*.universal*.sh --help for usage.
4 | Invalid package type OR invalid proxy settings; omsagent-rpm.sh packages can only be installed on RPM-based systems, and omsagent-deb.sh packages can only be installed on Debian-based systems. It is recommended that you use the universal installer from the latest release. Also verify your proxy settings.
5 | The shell bundle must be executed as root OR a 403 error was returned during onboarding. Run your command using sudo.
6 | Invalid package architecture OR a non-200 HTTP error was returned during onboarding; omsagent-x64.sh packages can only be installed on 64-bit systems, and omsagent-x86.sh packages can only be installed on 32-bit systems. Download the correct package for your architecture from the latest release.
17 | Installation of OMS package failed. Look through the command output for the root failure.
19 | Installation of OMI package failed. Look through the command output for the root failure.
20 | Installation of SCX package failed. Look through the command output for the root failure.
21 | Installation of Provider kits failed. Look through the command output for the root failure.
22 | Installation of bundled package failed. Look through the command output for the root failure.
23 | SCX or OMI package already installed. Use --upgrade instead of --install to install the shell bundle.
30 | Internal bundle error. File a GitHub issue with details from the output.
55 | Unsupported openssl version OR cannot connect to Azure Monitor OR dpkg is locked OR missing curl program.
61 | Missing Python ctypes library. Install the Python ctypes library or package (python-ctypes).
62 | Missing tar program; install tar.
63 | Missing sed program; install sed.
64 | Missing curl program; install curl.
65 | Missing gpg program; install gpg.
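When the shell bundle runs from a provisioning script, it can help to turn its exit status into the short meanings from the table. The following is a minimal sketch. The function name is hypothetical, and the messages are abbreviated from the table rather than printed by the bundle itself.

```shell
# Hypothetical helper: translate a shell-bundle exit code into the meaning
# from the installation error code table. NOT_DEFINED is not a numeric exit
# code, so it is not handled here.
describe_install_error() {
    case "$1" in
        2)  echo "Invalid option provided to the shell bundle" ;;
        3)  echo "No option provided to the shell bundle" ;;
        4)  echo "Invalid package type or invalid proxy settings" ;;
        5)  echo "Not run as root, or a 403 error was returned during onboarding" ;;
        6)  echo "Invalid package architecture, or an error was returned during onboarding" ;;
        17) echo "Installation of OMS package failed" ;;
        19) echo "Installation of OMI package failed" ;;
        20) echo "Installation of SCX package failed" ;;
        21) echo "Installation of Provider kits failed" ;;
        22) echo "Installation of bundled package failed" ;;
        23) echo "SCX or OMI package already installed; use --upgrade instead of --install" ;;
        30) echo "Internal bundle error; file a GitHub issue with the output" ;;
        55) echo "Unsupported openssl, no connection to Azure Monitor, dpkg locked, or curl missing" ;;
        61) echo "Missing Python ctypes library" ;;
        62) echo "Missing tar program; install tar" ;;
        63) echo "Missing sed program; install sed" ;;
        64) echo "Missing curl program; install curl" ;;
        65) echo "Missing gpg program; install gpg" ;;
        *)  echo "Unknown error code: $1" ;;
    esac
}
```

Run the bundle with the options you normally use, then call `describe_install_error $?` on the very next line so that `$?` still holds the bundle's exit status.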

Onboarding error codes

Error code | Meaning
2 | Invalid option provided to the omsadmin script. Run sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -h for usage.
3 | Invalid configuration provided to the omsadmin script. Run sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -h for usage.
4 | Invalid proxy provided to the omsadmin script. Verify the proxy and see our documentation for using an HTTP proxy.
5 | 403 HTTP error received from Azure Monitor. See the full output of the omsadmin script for details.
6 | Non-200 HTTP error received from Azure Monitor. See the full output of the omsadmin script for details.
7 | Unable to connect to Azure Monitor. See the full output of the omsadmin script for details.
8 | Error onboarding to Log Analytics workspace. See the full output of the omsadmin script for details.
30 | Internal script error. File a GitHub issue with details from the output.
31 | Error generating agent ID. File a GitHub issue with details from the output.
32 | Error generating certificates. See the full output of the omsadmin script for details.
33 | Error generating metaconfiguration for omsconfig. File a GitHub issue with details from the output.
34 | Metaconfiguration generation script not present. Retry onboarding with sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key>.
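The onboarding script's exit codes can be explained the same way. The following is a minimal sketch. The names `describe_onboarding_error` and `run_onboarding` are hypothetical. The wrapper runs whatever command you pass to it, so it makes no assumptions about omsadmin.sh options beyond those shown in the table.

```shell
# Hypothetical helper: map an omsadmin.sh exit code to the meaning listed in
# the onboarding error code table.
describe_onboarding_error() {
    case "$1" in
        2)  echo "Invalid option provided to the omsadmin script" ;;
        3)  echo "Invalid configuration provided to the omsadmin script" ;;
        4)  echo "Invalid proxy provided to the omsadmin script" ;;
        5)  echo "403 HTTP error received from Azure Monitor" ;;
        6)  echo "Non-200 HTTP error received from Azure Monitor" ;;
        7)  echo "Unable to connect to Azure Monitor" ;;
        8)  echo "Error onboarding to Log Analytics workspace" ;;
        30) echo "Internal script error" ;;
        31) echo "Error generating agent ID" ;;
        32) echo "Error generating certificates" ;;
        33) echo "Error generating metaconfiguration for omsconfig" ;;
        34) echo "Metaconfiguration generation script not present" ;;
        *)  echo "Unknown error code: $1" ;;
    esac
}

# Hypothetical wrapper: run an onboarding command (passed as arguments) and,
# if it fails, print the meaning of its exit code to stderr.
run_onboarding() {
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "onboarding failed ($rc): $(describe_onboarding_error "$rc")" >&2
    fi
    return "$rc"
}
```

For example: `run_onboarding sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key>`. The wrapper still returns the original exit code, so callers can branch on it.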

Enable debug logging

OMS output plugin debug

FluentD supports plugin-specific logging levels, so you can specify different log levels for inputs and outputs. To specify a different log level for OMS output, edit the general agent configuration at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf.

In the OMS output plugin section near the end of the configuration file, change the log_level property from info to debug:

<match oms.** docker.**>
 type out_oms
 log_level debug
 num_threads 5
 buffer_chunk_limit 5m
 buffer_type file
 buffer_path /var/opt/microsoft/omsagent/<workspace id>/state/out_oms*.buffer
 buffer_queue_limit 10
 flush_interval 20s
 retry_limit 10
 retry_wait 30s
</match>
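The edit can also be scripted. The following is a minimal sketch. It assumes the out_oms block starts with a `<match oms.** docker.**>` line at the beginning of a line, as shown above, and contains `log_level info`. The function name is hypothetical. It keeps a `.bak` copy and changes `log_level` only inside that block.

```shell
# Hypothetical helper: change log_level from info to debug, but only inside
# the <match oms.** docker.**> ... </match> block of the given omsagent.conf.
# A backup of the original file is written to <file>.bak first.
enable_out_oms_debug() {
    conf_file="$1"
    cp "$conf_file" "$conf_file.bak" || return 1
    # Address range: from the out_oms <match> line to the next </match> line.
    sed '/^<match oms\.\*\* docker\.\*\*>/,/^<\/match>/ s/log_level info/log_level debug/' \
        "$conf_file.bak" > "$conf_file"
}
```

Writing back with `>` instead of moving a new file into place keeps the original file's owner and permissions. After the change, restart the agent so that FluentD rereads its configuration, for example with sudo /opt/microsoft/omsagent/bin/service_control restart.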

Debug logging allows you to see batched uploads to Azure Monitor separated by type, number of data items, and time taken to send.

Example of a debug-enabled log:

Success sending oms.nagios x 1 in 0.14s
Success sending oms.omi x 4 in 0.52s
Success sending oms.syslog.authpriv.info x 1 in 0.91s
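With debug logging enabled, these lines can be totaled per data type. The following is a minimal sketch. It assumes each line contains the phrase "Success sending" followed by `<type> x <count> in <seconds>s`, exactly as above. Real log lines may carry a timestamp prefix, so the sketch searches for the phrase anywhere in the line. The function name is hypothetical.

```shell
# Hypothetical helper: read omsagent.log lines on stdin and print, per data
# type, the number of uploads, total items sent, and total send time.
summarize_oms_uploads() {
    awk '
    {
        # Locate "Success sending"; the fields after it are
        # <type> x <count> in <seconds>s
        for (i = 1; i <= NF - 6; i++) {
            if ($i == "Success" && $(i + 1) == "sending") {
                type = $(i + 2); count = $(i + 4); secs = $(i + 6)
                sub(/s$/, "", secs)
                uploads[type]++
                items[type] += count
                seconds[type] += secs
                break
            }
        }
    }
    END {
        # Output order of "for (t in ...)" is unspecified; pipe to sort if needed.
        for (t in uploads)
            printf "%s uploads=%d items=%d seconds=%.2f\n", t, uploads[t], items[t], seconds[t]
    }'
}
```

For example: `sudo cat /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log | summarize_oms_uploads | sort`.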

Verbose output

Instead of using the OMS output plugin, you can also output data items directly to stdout, which is visible in the Log Analytics agent for Linux log file.

In the Log Analytics general agent configuration file at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf, comment out the OMS output plugin by adding a # in front of each line:

#<match oms.** docker.**>
#  type out_oms
#  log_level info
#  num_threads 5
#  buffer_chunk_limit 5m
#  buffer_type file
#  buffer_path /var/opt/microsoft/omsagent/<workspace id>/state/out_oms*.buffer
#  buffer_queue_limit 10
#  flush_interval 20s
#  retry_limit 10
#  retry_wait 30s
#</match>

Below the output plugin, uncomment the following section by removing the # in front of each line:

<match **>
  type stdout
</match>
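Both edits can be made in one pass. The following is a minimal sketch. It assumes the out_oms block begins with `<match oms.** docker.**>` at the start of a line. It also assumes the stdout block is already in the file in commented form, starting with `#<match **>` and ending with `#</match>`; check your file before relying on this. The function name is hypothetical, and the original is kept as a `.bak` copy.

```shell
# Hypothetical helper: switch omsagent.conf to verbose (stdout) output.
# 1) Prefix every line of the <match oms.** docker.**> ... </match> block with "#".
# 2) Strip one leading "#" from each line of a "#<match **>" ... "#</match>" block
#    (assumption: the stdout block exists in that commented form).
enable_stdout_output() {
    conf_file="$1"
    cp "$conf_file" "$conf_file.bak" || return 1
    sed -e '/^<match oms\.\*\* docker\.\*\*>/,/^<\/match>/ s/^/#/' \
        -e '/^#<match \*\*>/,/^#<\/match>/ s/^#//' \
        "$conf_file.bak" > "$conf_file"
}
```

To undo the change, copy the `.bak` file back over omsagent.conf. In either case, restart the agent afterwards.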

Issue: Unable to connect through proxy to Azure Monitor

Probable causes

  • The proxy specified during onboarding was incorrect
  • The Azure Monitor and Azure Automation Service Endpoints are not included in the approved list in your datacenter

Resolution

  1. Reonboard to Azure Monitor with the Log Analytics agent for Linux by using the following command with the -v option enabled. This produces verbose output while the agent connects through the proxy to Azure Monitor.
    /opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key> -p <Proxy Conf> -v

  2. Review the section Update proxy settings to verify you have properly configured the agent to communicate through a proxy server.

  3. Double-check that the endpoints outlined in the Azure Monitor network firewall requirements list are added to an allow list correctly. If you use Azure Automation, the necessary network configuration steps are linked above as well.

Issue: You receive a 403 error when trying to onboard

Probable causes

  • Date and time are incorrect on the Linux server
  • The Workspace ID and Workspace Key used are not correct

Resolution

  1. Check the time on your Linux server with the date command. If the time is +/- 15 minutes from the current time, onboarding fails. To correct this, update the date and/or time zone of your Linux server.
  2. Verify you have installed the latest version of the Log Analytics agent for Linux. The newest version notifies you if time skew is causing the onboarding failure.
  3. Reonboard using the correct Workspace ID and Workspace Key, following the installation instructions earlier in this article.
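The time check in step 1 can be scripted. The following is a minimal sketch. It assumes you obtain a trusted reference time yourself, in seconds since the epoch. The function name is hypothetical. The choice to flag a skew of 900 seconds (15 minutes) or more is an illustrative reading of the limit above, not a documented threshold.

```shell
# Hypothetical helper: compare local time with a reference Unix timestamp.
# $1 = reference time in seconds since the epoch
# $2 = optional local time in seconds (defaults to `date +%s`)
# Prints the absolute skew and returns 1 if it is 900 s (15 minutes) or more.
check_clock_skew() {
    ref="$1"
    now="${2:-$(date +%s)}"
    skew=$((now - ref))
    [ "$skew" -lt 0 ] && skew=$((-skew))
    echo "clock skew: ${skew}s"
    [ "$skew" -lt 900 ]
}
```

For example: `check_clock_skew "$(ssh <reference host> date +%s)"`, where `<reference host>` is a placeholder for a machine whose clock you trust.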

Issue: You see a 500 and 404 error in the log file right after onboarding

This is a known issue that occurs on the first upload of Linux data into a Log Analytics workspace. It does not affect the data being sent or the service experience.

Issue: You see omiagent using 100% CPU

Probable causes

A regression in the nss-pem package v1.0.3-5.el7 caused a severe performance issue, which we have seen come up frequently in Red Hat/CentOS 7.x distributions. To learn more about this issue, see the following documentation: Bug 1667121 Performance regression in libcurl.

Performance-related bugs don't happen all the time, and they are very difficult to reproduce. If you experience such an issue with omiagent, use the script omiHighCPUDiagnostics.sh, which collects the stack trace of omiagent when its CPU usage exceeds a certain threshold.

  1. Download the script:
    wget https://raw.githubusercontent.com/microsoft/OMS-Agent-for-Linux/master/tools/LogCollector/source/omiHighCPUDiagnostics.sh

  2. Run diagnostics for 24 hours with a 30% CPU threshold:
    bash omiHighCPUDiagnostics.sh --runtime-in-min 1440 --cpu-threshold 30

  3. The call stack is dumped to the omiagent_trace file. If you notice many curl and NSS function calls, follow the resolution steps below.
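To judge whether the trace matches the curl/NSS pattern, you can count how many lines of omiagent_trace mention curl or NSS. The following is a minimal sketch. It assumes the trace is plain text with one frame per line. The case-insensitive patterns `curl` and `nss` and the function name are assumptions. The sketch does not define how many matches count as "many".

```shell
# Hypothetical helper: count lines in an omiagent_trace file that mention
# curl or NSS (case-insensitive), alongside the total number of lines.
count_curl_nss_frames() {
    trace_file="$1"
    total=$(wc -l < "$trace_file" | tr -d ' ')
    matches=$(grep -ciE 'curl|nss' "$trace_file")
    echo "curl/NSS frames: $matches of $total lines"
}
```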

Resolution (step by step)

  1. Upgrade the nss-pem package to v1.0.3-5.el7_6.1:
    sudo yum upgrade nss-pem

  2. If nss-pem is not available for upgrade (this mostly happens on CentOS), downgrade curl to 7.29.0-46. If you run "yum update" by mistake, curl is upgraded to 7.29.0-51 and the issue happens again.
    sudo yum downgrade curl libcurl

  3. Restart OMI:
    sudo scxadmin -restart

Issue: You are not seeing any data in the Azure portal

Probable causes

  • Onboarding to Azure Monitor failed
  • Connection to Azure Monitor is blocked
  • Log Analytics agent for Linux data is backed up

Resolution

  1. Check whether onboarding to Azure Monitor was successful by checking if the following file exists: /etc/opt/microsoft/omsagent/<workspace id>/conf/omsadmin.conf

  2. Reonboard using the omsadmin.sh command-line instructions.

  3. If using a proxy, refer to the proxy resolution steps provided earlier.

  4. In some cases, when the Log Analytics agent for Linux cannot communicate with the service, data on the agent is queued up to the full buffer size, which is 50 MB. Restart the agent by running the following command:
    /opt/microsoft/omsagent/bin/service_control restart [<workspace id>]

    Note

    This issue is fixed in agent version 1.1.0-28 and later.
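Steps 1 and 2 can be combined into a quick check. The following is a minimal sketch. The function name and the optional root prefix are hypothetical; the prefix is used only for testing against a scratch directory.

```shell
# Hypothetical helper: report whether onboarding completed for a workspace by
# checking for the omsadmin.conf file described in step 1.
# $1 = workspace ID, $2 = optional root prefix (default "")
is_onboarded() {
    admin_conf="${2:-}/etc/opt/microsoft/omsagent/$1/conf/omsadmin.conf"
    if [ -f "$admin_conf" ]; then
        echo "onboarded: $admin_conf exists"
        return 0
    fi
    echo "not onboarded: $admin_conf is missing; reonboard with omsadmin.sh" >&2
    return 1
}
```

For example: `is_onboarded <workspace id> || sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key>`.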

Issue: You are not seeing forwarded Syslog messages

Probable causes

  • The configuration applied to the Linux server does not allow collection of the sent facilities and/or log levels
  • Syslog is not being forwarded correctly to the Linux server
  • The number of messages being forwarded per second is too great for the base configuration of the Log Analytics agent for Linux to handle

Resolution

  • Verify that the Syslog configuration in the Log Analytics workspace includes all the facilities and the correct log levels. Review Configure Syslog collection in the Azure portal.
  • Verify that the native syslog messaging daemons (rsyslog, syslog-ng) are able to receive the forwarded messages
  • Check the firewall settings on the Syslog server to ensure that messages are not being blocked
  • Simulate a Syslog message to Log Analytics using the logger command:
    • logger -p local0.err "This is my test message"

Issue: You receive "Errno address already in use" in the omsagent log file

You see the following error in omsagent.log:

[error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 25224>

Probable causes

This error indicates that the Linux Diagnostic extension (LAD) is installed side by side with the Log Analytics Linux VM extension, and that it uses the same port for syslog data collection as omsagent.

Resolution

  1. As root, execute the following commands (note that 25224 is an example; in your environment, LAD might use a different port number):

    /opt/microsoft/omsagent/bin/configure_syslog.sh configure LAD 25229
    
    sed -i -e 's/25224/25229/' /etc/opt/microsoft/omsagent/LAD/conf/omsagent.d/syslog.conf
    

    You then need to edit the correct rsyslogd or syslog-ng configuration file and change the LAD-related configuration to write to port 25229.

  2. If the VM is running rsyslogd, the file to modify is /etc/rsyslog.d/95-omsagent.conf (if it exists; otherwise, /etc/rsyslog.conf). If the VM is running syslog-ng, the file to modify is /etc/syslog-ng/syslog-ng.conf.

  3. Restart omsagent:
    sudo /opt/microsoft/omsagent/bin/service_control restart

  4. Restart the syslog service.
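Step 2 requires changing only the LAD-related lines, so a blind search-and-replace across the syslog files is risky. The following is a minimal sketch that lists every line referencing a port in the files named in steps 1 and 2. You can then decide by hand which lines to change. The function name and root-prefix argument are hypothetical.

```shell
# Hypothetical helper: print "<file>:<line number>:<text>" for every line that
# mentions the given port in the syslog-related files from steps 1 and 2.
# $1 = port, $2 = optional root prefix (default "")
find_port_references() {
    port="$1"
    root="${2:-}"
    for conf in /etc/opt/microsoft/omsagent/LAD/conf/omsagent.d/syslog.conf \
                /etc/rsyslog.d/95-omsagent.conf /etc/rsyslog.conf \
                /etc/syslog-ng/syslog-ng.conf; do
        [ -f "$root$conf" ] || continue
        # index() is a plain substring match, so the port is not treated as a regex.
        awk -v f="$root$conf" -v p="$port" 'index($0, p) { print f ":" NR ":" $0 }' "$root$conf"
    done
    return 0
}
```

Run it as root with the old port to see which lines still refer to it. Run it again with the new port to confirm your edits before restarting omsagent and the syslog service.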

Issue: You are unable to uninstall omsagent using the purge option

Probable causes

  • The Linux Diagnostic Extension is installed
  • The Linux Diagnostic Extension was installed and uninstalled, but you still see an error that omsagent is being used by mdsd and cannot be removed.

Resolution

  1. Uninstall the Linux Diagnostic Extension (LAD).
  2. Remove Linux Diagnostic Extension files from the machine if they are present in the following locations: /var/lib/waagent/Microsoft.Azure.Diagnostics.LinuxDiagnostic-<version>/ and /var/opt/microsoft/omsagent/LAD/.

Issue: You cannot see any Nagios data

Probable causes

  • The omsagent user does not have permission to read from the Nagios log file
  • The Nagios source and filter have not been uncommented in the omsagent.conf file

Resolution

  1. Grant the omsagent user permission to read from the Nagios file by following these instructions.

  2. In the Log Analytics agent for Linux general configuration file at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf, ensure that both the Nagios source and filter are uncommented.

    <source>
      type tail
      path /var/log/nagios/nagios.log
      format none
      tag oms.nagios
    </source>
    
    <filter oms.nagios>
      type filter_nagios_log
    </filter>
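To confirm step 2 without reading the whole file, look for the Nagios tag and filter on uncommented lines. The following is a minimal sketch that assumes the block matches the one shown above. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if omsagent.conf contains an uncommented
# "tag oms.nagios" line and an uncommented "<filter oms.nagios>" line.
nagios_enabled() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*tag[[:space:]]+oms\.nagios[[:space:]]*$' "$conf_file" &&
        grep -Eq '^[[:space:]]*<filter[[:space:]]+oms\.nagios>' "$conf_file"
}
```

For step 1, `sudo -u omsagent test -r /var/log/nagios/nagios.log && echo readable` is one way to check whether the omsagent user can read the log.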
    

Issue: You are not seeing any Linux data

Probable causes

  • Onboarding to Azure Monitor failed
  • Connection to Azure Monitor is blocked
  • The virtual machine was rebooted
  • The OMI package was manually upgraded to a newer version than the one installed by the Log Analytics agent for Linux package
  • A DSC resource logs a "class not found" error in the omsconfig.log log file
  • Log Analytics agent for Linux data is backed up
  • DSC logs "Current configuration does not exist. Execute Start-DscConfiguration command with -Path parameter to specify a configuration file and create a current configuration first." in the omsconfig.log log file, but no log message exists about PerformRequiredConfigurationChecks operations.

Resolution

  1. Install all dependencies, such as the auditd package.

  2. Check whether onboarding to Azure Monitor was successful by checking if the following file exists: /etc/opt/microsoft/omsagent/<workspace id>/conf/omsadmin.conf. If it does not exist, reonboard using the omsadmin.sh command-line instructions.

  3. If using a proxy, check the proxy troubleshooting steps above.

  4. On some Azure distributions, the omid OMI server daemon does not start after the virtual machine is rebooted. As a result, you do not see Audit, ChangeTracking, or UpdateManagement solution-related data. The workaround is to manually start the OMI server by running sudo /opt/omi/bin/service_control restart.

  5. After the OMI package is manually upgraded to a newer version, it must be manually restarted for the Log Analytics agent to continue functioning. This step is required for some distros where the OMI server does not automatically start after it is upgraded. Run sudo /opt/omi/bin/service_control restart to restart OMI.

  6. If you see a DSC resource "class not found" error in omsconfig.log, run sudo /opt/omi/bin/service_control restart.

  7. In some cases, when the Log Analytics agent for Linux cannot talk to Azure Monitor, data on the agent is backed up to the full buffer size of 50 MB. Restart the agent by running the following command:
    /opt/microsoft/omsagent/bin/service_control restart

    Note

    This issue is fixed in agent version 1.1.0-28 or later.

  • If the omsconfig.log log file does not indicate that PerformRequiredConfigurationChecks operations are running periodically on the system, there might be a problem with the cron job/service. Make sure the cron job exists at /etc/cron.d/OMSConsistencyInvoker. If needed, run the following commands to create the cron job:

    sudo mkdir -p /etc/cron.d/
    echo "*/15 * * * * omsagent /opt/omi/bin/OMSConsistencyInvoker >/dev/null 2>&1" | sudo tee /etc/cron.d/OMSConsistencyInvoker
    

    Also, make sure the cron service is running. To check the status of this service, use service cron status on Debian, Ubuntu, and SUSE, or service crond status on RHEL, CentOS, and Oracle Linux. If the service does not exist, you can install the binaries and start the service using the following commands:

    Ubuntu/Debian

    # To Install the service binaries
    sudo apt-get install -y cron
    # To start the service
    sudo service cron start
    

    SUSE

    # To Install the service binaries
    sudo zypper in cron -y
    # To start the service
    sudo systemctl enable cron
    sudo systemctl start cron
    

    RHEL/CentOS

    # To Install the service binaries
    sudo yum install -y cronie
    # To start the service
    sudo service crond start
    

    Oracle Linux

    # To Install the service binaries
    sudo yum install -y cronie
    # To start the service
    sudo service crond start
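The service name differs by distribution, as described above. The following is a minimal sketch that picks the name from the ID field of /etc/os-release and checks for the OMSConsistencyInvoker cron file. The use of /etc/os-release and the distribution ID strings (`debian`, `ubuntu`, `sles`, `opensuse*`, `rhel`, `centos`, `ol`) are assumptions not taken from this article. The function names are hypothetical.

```shell
# Hypothetical helper: print the cron service name for a distribution ID.
# $1 = distribution ID; if omitted, the ID field of /etc/os-release is used.
cron_service_name() {
    distro_id="$1"
    if [ -z "$distro_id" ] && [ -r /etc/os-release ]; then
        distro_id=$(. /etc/os-release && echo "$ID")
    fi
    case "$distro_id" in
        debian|ubuntu|sles|opensuse*) echo cron ;;
        rhel|centos|ol) echo crond ;;
        *) echo "unknown distribution: ${distro_id:-unset}" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if the cron file exists and references
# OMSConsistencyInvoker. $1 = optional path (default /etc/cron.d/OMSConsistencyInvoker)
check_consistency_cron() {
    cron_file="${1:-/etc/cron.d/OMSConsistencyInvoker}"
    grep -q 'OMSConsistencyInvoker' "$cron_file" 2>/dev/null
}
```

For example: `check_consistency_cron && sudo service "$(cron_service_name)" status`.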
    

Issue: When configuring collection from the portal for Syslog or Linux performance counters, the settings are not applied

Probable causes

  • The Log Analytics agent for Linux has not picked up the latest configuration
  • The changed settings in the portal were not applied

Resolution

Background: omsconfig is the Log Analytics agent for Linux configuration agent that looks for new portal-side configuration every five minutes. This configuration is then applied to the Log Analytics agent for Linux configuration files located at /etc/opt/microsoft/omsagent/conf/omsagent.conf.

  • In some cases, the Log Analytics agent for Linux configuration agent might not be able to communicate with the portal configuration service, so the latest configuration is not applied.
    1. Check that the omsconfig agent is installed by running dpkg --list omsconfig or rpm -qi omsconfig. If it is not installed, reinstall the latest version of the Log Analytics agent for Linux.

    2. Check that the omsconfig agent can communicate with Azure Monitor by running the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/GetDscConfiguration.py'. This command returns the configuration that the agent receives from the service, including Syslog settings, Linux performance counters, and custom logs. If this command fails, run the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'. This command forces the omsconfig agent to talk to Azure Monitor and retrieve the latest configuration.

Issue: You are not seeing any custom log data

Probable causes

  • Onboarding to Azure Monitor failed.
  • The setting Apply the following configuration to my Linux Servers has not been selected.
  • omsconfig has not picked up the latest custom log configuration from the service.
  • The Log Analytics agent for Linux user omsagent is unable to access the custom log, because of missing permissions or because the log is not found. You may see the following errors:
    • [DATETIME] [warn]: file not found. Continuing without tailing it.
    • [DATETIME] [error]: file not accessible by omsagent.
  • A known race condition issue, fixed in Log Analytics agent for Linux version 1.1.0-217
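The two errors above can be found quickly in the agent log. The following is a minimal sketch that assumes the messages appear verbatim as shown. The function name is hypothetical.

```shell
# Hypothetical helper: print omsagent.log lines that report a custom log file
# as missing or unreadable, using the two messages listed above.
# Returns non-zero (grep's status) if no such line is found.
find_custom_log_errors() {
    log_file="$1"
    grep -E 'file not found\. Continuing without tailing it|file not accessible by omsagent' "$log_file"
}
```

For example: `sudo sh -c '. ./helpers.sh; find_custom_log_errors /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log'`, where helpers.sh is a hypothetical file that holds the function. Alternatively, run the grep directly as root.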

解决方法Resolution

  1. Verify that onboarding to Azure Monitor was successful by checking whether the following file exists: /etc/opt/microsoft/omsagent/<workspace id>/conf/omsadmin.conf. If it doesn't, do either of the following:

      • Reonboard using the omsadmin.sh command line instructions.
      • Under Advanced Settings in the Azure portal, ensure that the setting "Apply the following configuration to my Linux Servers" is enabled.

  2. Check that the omsconfig agent can communicate with Azure Monitor by running the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/GetDscConfiguration.py'. This command returns the configuration that the agent receives from the service, including Syslog settings, Linux performance counters, and custom logs. If this command fails, run the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'. This command forces the omsconfig agent to talk to Azure Monitor and retrieve the latest configuration.
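The onboarding check in the first step can be scripted as a file test. In this sketch, `is_onboarded` and `list_onboarded_workspaces` are hypothetical helper names. The base directory is a parameter, defaulting to `/etc/opt/microsoft/omsagent` from the path above, so the logic can be tried against a scratch directory first.

```shell
#!/usr/bin/env bash
# Sketch: onboarding succeeded if <base>/<workspace id>/conf/omsadmin.conf exists.
is_onboarded() {
    local workspace_id="$1"
    local base_dir="${2:-/etc/opt/microsoft/omsagent}"
    [ -f "${base_dir}/${workspace_id}/conf/omsadmin.conf" ]
}

# Prints the ID of every workspace directory that contains conf/omsadmin.conf.
list_onboarded_workspaces() {
    local base_dir="${1:-/etc/opt/microsoft/omsagent}"
    local conf
    for conf in "$base_dir"/*/conf/omsadmin.conf; do
        [ -f "$conf" ] || continue   # skip the literal pattern if nothing matched
        basename "$(dirname "$(dirname "$conf")")"
    done
}
```

For example, `is_onboarded <workspace id> || echo "not onboarded"`. If the check fails, reonboard with omsadmin.sh, typically `sudo /opt/microsoft/omsagent/bin/omsadmin.sh -w <workspace id> -s <shared key>`. Confirm the options against the omsadmin.sh instructions for your agent version.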

Background: Instead of running as a privileged user (root), the Log Analytics agent for Linux runs as the omsagent user. In most cases, this user must be granted explicit permission to read certain files. To grant permissions to the omsagent user, run the following commands:

  1. Add the omsagent user to the required group (here <USERNAME> is omsagent): sudo usermod -a -G <GROUPNAME> <USERNAME>
  2. Grant universal read access to the required files: sudo chmod -R ugo+rx <FILE DIRECTORY>
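Working out `<GROUPNAME>` and `<FILE DIRECTORY>` for a specific custom log can be sketched as below. The function only prints the two commands for review and runs nothing. It assumes, although this article does not say so, that the right group is the log file's owning group and the right directory is the file's parent directory. `stat -c '%G'` is the GNU coreutils form, and `print_permission_commands` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the usermod/chmod commands for one custom log file.
# ASSUMPTIONS: group = the file's owning group; directory = its parent directory.
print_permission_commands() {
    local log_file="$1"
    local group dir
    group=$(stat -c '%G' "$log_file") || return 1   # GNU coreutils stat
    dir=$(dirname "$log_file")
    printf 'sudo usermod -a -G %s omsagent\n' "$group"
    printf 'sudo chmod -R ugo+rx %s\n' "$dir"
}
```

After applying the commands, you can check access with `sudo -u omsagent test -r <log file> && echo readable`. New group membership only applies to newly started processes. Restart the agent (for example, `sudo /opt/microsoft/omsagent/bin/service_control restart`) before looking for data again.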

There is a known race condition issue in versions of the Log Analytics agent for Linux earlier than 1.1.0-217. After updating to the latest agent, run the following command to get the latest version of the output plugin: sudo cp /etc/opt/microsoft/omsagent/sysconf/omsagent.conf /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf.
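The version check and the copy can be sketched as follows. `version_lt` is a hypothetical helper that relies on GNU `sort -V` to order strings like `1.1.0-217`. `refresh_output_plugin_conf` runs the `cp` command above against a configurable base directory. As an added precaution that is not part of the original step, it keeps the previous file as `omsagent.conf.bak`.

```shell
#!/usr/bin/env bash
# Sketch for the race-condition workaround.

# Returns 0 if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Copies sysconf/omsagent.conf over <workspace id>/conf/omsagent.conf.
refresh_output_plugin_conf() {
    local workspace_id="$1"
    local base_dir="${2:-/etc/opt/microsoft/omsagent}"
    local src="${base_dir}/sysconf/omsagent.conf"
    local dst="${base_dir}/${workspace_id}/conf/omsagent.conf"
    if [ -f "$dst" ]; then
        cp -p "$dst" "${dst}.bak" || return 1   # keep the previous file
    fi
    cp "$src" "$dst"
}
```

For example, `version_lt <installed version> 1.1.0-217` returns 0 when the installed agent predates the fix. Run the copy with root privileges, as the original command does with sudo.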

Issue: You are trying to reonboard to a new workspace

When you reonboard an agent to a new workspace, the Log Analytics agent configuration must be cleaned up first. To clean up the old configuration, run the shell bundle with --purge:

sudo sh ./omsagent-*.universal.x64.sh --purge

Or

sudo sh ./onboard_agent.sh --purge

After running with the --purge option, you can continue reonboarding.
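The purge-then-reonboard sequence can be sketched as follows. The `--purge` call is the one shown above. The onboarding call assumes the bundle accepts `--install -w <workspace id> -s <shared key>`, which this section does not state, so check it against the installation instructions. `run`, `reonboard`, and `DRY_RUN` are hypothetical names. With `DRY_RUN=1`, the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch: purge the old configuration, then onboard to the new workspace.
# ASSUMPTION: the bundle's "--install -w <id> -s <key>" onboarding options.

# Executes its arguments, or only prints them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reonboard() {
    local bundle="$1" workspace_id="$2" shared_key="$3"
    run sudo sh "$bundle" --purge || return 1
    run sudo sh "$bundle" --install -w "$workspace_id" -s "$shared_key"
}
```

For example, `DRY_RUN=1 reonboard ./omsagent-1.4.2-124.universal.x64.sh <workspace id> <shared key>` prints both commands. Drop `DRY_RUN=1` to run them.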

Issue: The Log Analytics agent extension in the Azure portal is marked with a failed state: Provisioning failed

Probable causes

  • The Log Analytics agent has been removed from the operating system.
  • The Log Analytics agent service is down, disabled, or not configured.

Resolution

Perform the following steps to correct the issue.

  1. Remove the extension from the Azure portal.
  2. Install the agent by following the instructions.
  3. Restart the agent by running the following command: sudo /opt/microsoft/omsagent/bin/service_control restart.
  4. Wait several minutes for the provisioning state to change to Provisioning succeeded.
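The restart and the final wait can be scripted, and removing the extension has an Azure CLI equivalent. This sketch assumes the `az` CLI is installed and signed in. It also assumes the extension is named `OmsAgentForLinux`, a common default that this article does not state; override it with `EXT_NAME`. The function names are hypothetical, and installing the agent is left to the installation instructions.

```shell
#!/usr/bin/env bash
# Sketch: CLI alternative to removing the extension, agent restart, and polling.
# ASSUMPTIONS: az CLI available and logged in; extension name below.
EXT_NAME="${EXT_NAME:-OmsAgentForLinux}"

remove_extension() {   # $1 = resource group, $2 = VM name
    az vm extension delete --resource-group "$1" --vm-name "$2" --name "$EXT_NAME"
}

restart_agent() {      # run on the VM itself
    sudo /opt/microsoft/omsagent/bin/service_control restart
}

get_provisioning_state() {
    az vm extension show --resource-group "$1" --vm-name "$2" --name "$EXT_NAME" \
        --query provisioningState --output tsv
}

# Polls until the state is Succeeded; gives up after max_tries attempts.
wait_for_provisioning() {
    local rg="$1" vm="$2" max_tries="${3:-20}" interval="${4:-30}"
    local i state
    for ((i = 1; i <= max_tries; i++)); do
        state=$(get_provisioning_state "$rg" "$vm")
        echo "attempt $i: ${state:-unknown}" >&2
        [ "$state" = Succeeded ] && return 0
        sleep "$interval"
    done
    return 1
}
```

By default, `wait_for_provisioning <resource group> <vm name>` polls every 30 seconds for up to 20 attempts. It returns 0 once the CLI reports `Succeeded`, which corresponds to Provisioning succeeded in the portal.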

Issue: You need to upgrade the Log Analytics agent on demand

Probable causes

The Log Analytics agent packages on the host are outdated.

Resolution

Perform the following steps to correct the issue.

  1. Check for the latest release on the OMS-Agent-for-Linux GitHub releases page.

  2. Download the install script (1.4.2-124 is used as an example version):

    wget https://github.com/Microsoft/OMS-Agent-for-Linux/releases/download/OMSAgent_GA_v1.4.2-124/omsagent-1.4.2-124.universal.x64.sh
    
  3. Upgrade the packages by running: sudo sh ./omsagent-*.universal.x64.sh --upgrade
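Steps 2 and 3 can be combined for a chosen version. This sketch assumes the release tag follows the `OMSAgent_GA_v<version>` pattern from the example URL. That pattern may not hold for every release, so the tag can be passed as a second argument. `bundle_url` and `upgrade_agent` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: download a given agent bundle and run it with --upgrade.
# ASSUMPTION: default tag pattern OMSAgent_GA_v<version>; pass a tag to override.
bundle_url() {
    local version="$1" tag="${2:-OMSAgent_GA_v$1}"   # empty $2 also uses default
    echo "https://github.com/Microsoft/OMS-Agent-for-Linux/releases/download/${tag}/omsagent-${version}.universal.x64.sh"
}

upgrade_agent() {
    local version="$1" tag="${2:-}"
    local bundle="omsagent-${version}.universal.x64.sh"
    wget -O "$bundle" "$(bundle_url "$version" "$tag")" || return 1
    sudo sh "./$bundle" --upgrade
}
```

For example, `upgrade_agent 1.4.2-124` downloads the same bundle as the `wget` command above and runs it with `--upgrade`.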