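While using Ansible to roll out zabbix-agent in bulk recently, I ran into quite a few problems, so I am recording them here.
Ansible version: 2.9.2
Python version: 2.7.5
pywinrm version: 0.4.3
Errors when managing hosts with Ansible
-- First, rule out network problems
-- Then use the error message to locate the root cause (append -vvvvvv to the command for detailed output)
CentOS environment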
Error: Shared connection to xxx.xxx.xxx.xxx closed
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"module_stderr": "Shared connection to xxx.xxx.xxx.xxx closed.\r\n",
"module_stdout": "Traceback (most recent call last):\r\n File \"/root/.ansible/tmp/ansible-tmp-1657010661.53-56592330631735/command.py\", line 123, in <module>\r\n f.write(z.read('ansible_module_command.py'))\r\n File \"/data/app/python/Lib/zipfile.py\", line 1314, in read\r\n with self.open(name, \"r\", pwd) as fp:\r\n File \"/data/app/python/Lib/zipfile.py\", line 1425, in open\r\n return ZipExtFile(zef_file, mode, zinfo, zd, True)\r\n File \"/data/app/python/Lib/zipfile.py\", line 758, in __init__\r\n self._decompressor = _get_decompressor(self._compress_type)\r\n File \"/data/app/python/Lib/zipfile.py\", line 678, in _get_decompressor\r\n return zlib.decompressobj(-15)\r\nAttributeError: 'NoneType' object has no attribute 'decompressobj'\r\n",
"msg": "MODULE FAILURE",
"rc": 0
}
module_stdout:
Traceback (most recent call last):
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 114, in <module>
_ansiballz_main()
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 106, in _ansiballz_main
invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 41, in invoke_module
f.write(z.read('__main__.py'))
File "/data/app/python/Lib/zipfile.py", line 1314, in read
with self.open(name, "r", pwd) as fp:
File "/data/app/python/Lib/zipfile.py", line 1425, in open
return ZipExtFile(zef_file, mode, zinfo, zd, True)
File "/data/app/python/Lib/zipfile.py", line 758, in __init__
self._decompressor = _get_decompressor(self._compress_type)
File "/data/app/python/Lib/zipfile.py", line 678, in _get_decompressor
return zlib.decompressobj(-15)
AttributeError: 'NoneType' object has no attribute 'decompressobj'
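Cause:
On the target host, Python had been upgraded (python2.7.5 -> python3.6.5), and the zlib-devel library was not installed when Python was compiled from source, so the interpreter was built without the zlib module.
Solution:
Install the zlib-devel library, then recompile and reinstall python3.6.5.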
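Before and after rebuilding, it can help to confirm whether the target interpreter can actually import zlib. The following is a minimal sketch, not part of the original setup; the helper name has_module is made up for illustration, and the script is assumed to be run with the rebuilt Python 3 on the target host.

```python
import importlib


def has_module(name):
    """Return True if the current interpreter can import the named module."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # zipfile falls back to zlib = None when "import zlib" fails, which is
    # what leads to the AttributeError shown in the traceback.
    print("zlib available:", has_module("zlib"))
```

If this prints False after the rebuild, the build most likely still did not find the zlib headers.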
Error: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out\r\n",
"unreachable": true
}
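Cause:
Connection timed out is almost always a network problem.
Solution:
Check network connectivity, whether the port is open, and whether a firewall is enabled on the target host.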
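A quick way to check reachability from the Ansible host is to attempt a plain TCP connection to the port. This is a small sketch under assumptions: port_open is a hypothetical helper name, and the 3-second default timeout is arbitrary.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, "connection refused" and "no route to host".
        return False
```

For example, port_open("xxx.xxx.xxx.xxx", 22) with the real address filled in tests SSH; the same call with port 5985 tests WinRM on a Windows host.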
Error: Authentication failure
# Ansible 2.4.0
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Authentication failure.",
"unreachable": true
}
# Ansible 2.9.2
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Invalid/incorrect password: Permission denied, please try again.",
"unreachable": true
}
Cause:
The password is wrong.
Solution:
Use the correct password.
Error: installation fails because the libpcre.so.1() library is missing
fatal: [xxx.xxx.xxx.xxx]: FAILED! => {
"changed": true,
"cmd": ["rpm", "-ivh", "/etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm"],
"delta": "0:00:00.086235", "end": "2022-07-15 05:38:46.825589",
"msg": "non-zero return code",
"rc": 1, "start": "2022-07-15 05:38:46.739354",
"stderr": "warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY\nerror: Failed dependencies: libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64 systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64",
"stderr_lines": [
"warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY",
"error: Failed dependencies:",
"libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64",
"systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64"
],
"stdout": "",
"stdout_lines": []
}
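Cause:
1. When libpcre.so.1() is reported missing, first check whether the zabbix-agent package was built for the target host's OS release (the el7 in the package name); the unmet systemd dependency in the same error also points to an older release;
2. If the release matches, check whether libpcre.so.1 is missing on the target host.
Solution:
Use the agent package built for the target host's OS release.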
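The version check can be scripted before pushing the package. The sketch below assumes that "matching version" means the el tag in the RPM file name equals the major release reported in /etc/redhat-release; the function names and the sample release strings are illustrative, not taken from the original hosts.

```python
import re


def rpm_el_major(rpm_name):
    """Extract the Enterprise Linux major version (the N in '.elN') from an RPM file name."""
    match = re.search(r"\.el(\d+)", rpm_name)
    return int(match.group(1)) if match else None


def release_major(release_text):
    """Extract the major version from text such as the contents of /etc/redhat-release."""
    match = re.search(r"release (\d+)", release_text)
    return int(match.group(1)) if match else None


def package_matches_host(rpm_name, release_text):
    """Return True when the package's el tag equals the host's major release."""
    pkg = rpm_el_major(rpm_name)
    return pkg is not None and pkg == release_major(release_text)


if __name__ == "__main__":
    rpm = "zabbix-agent-4.0.9-3.el7.x86_64.rpm"
    # Sample release string; on a real host, read /etc/redhat-release instead.
    print(package_matches_host(rpm, "CentOS release 6.10 (Final)"))
```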
Error: Permission denied (publickey)
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: root@xxx.xxx.xxx.xxx: Permission denied (publickey).",
"unreachable": true
}
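Cause:
The target host only accepts key-based login.
Solution:
1. Copy the Ansible host's SSH public key to the target host (recommended);
2. Change the SSH login method on the target host.
Error: No space left on device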
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "mkdir: cannot create directory ‘/root/.ansible’: No space left on device\n",
"unreachable": true
}
Cause:
The target host has run out of disk space.
Solution:
Free up disk space.
Windows environment
Error: the specified credentials were rejected by the server
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: the specified credentials were rejected by the server",
"unreachable": true
}
Cause:
The password for the target host is incorrect.
Solution:
Use the correct password.
Error: Failed to establish a new connection [Errno 111]
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2040e7f50>: Failed to establish a new connection: [Errno 111] Connection refused',))",
"unreachable": true
}
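Cause:
1. Check whether the two hosts can reach each other over the network;
2. The target host has not opened remote port 5985 (Connection refused means nothing is listening there).
Solution:
Check whether the WinRM service is running on the target host - powershell -> (Get-Service -Name winrm).status
-- If it is not running, run the 1.ps1 script to enable the WinRM remote service (note: PowerShell >= 3.0, .NET >= 4.0)
Error: Failed to establish a new connection: [Errno 113]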
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f06d333e650>: Failed to establish a new connection: [Errno 113] No route to host',))",
"unreachable": true
}
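Cause:
1. The host is actually a Linux system but is being connected to the Windows way (over WinRM);
2. The host's firewall is enabled; the No route to host error appears because the firewall has not opened the port.
Solution:
Open the corresponding port on the firewall.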
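The numbers in [Errno 111] and [Errno 113] are operating-system error codes raised on the Ansible host. The sketch below decodes them with the standard library; describe_errno is a made-up helper name, and the numeric values are platform-specific (on Linux, 111 is ECONNREFUSED and 113 is EHOSTUNREACH).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for an errno number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Run this on the Ansible control host, since that is where the
    # connection error is raised.
    for code in (111, 113):
        print(code, describe_errno(code))
```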
Error: Connection to xxx.xxx.xxx.xxx timed out
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff2040cff90>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=30)'))",
"unreachable": true
}
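Cause:
1. The network is unreachable;
2. The connection to the port times out.
Solution:
1. Put a network access policy in place between the two hosts;
2. Check whether the target host has opened port 5985 (the WinRM service).
Error: winrm send_input failed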
# Ansible 2.4.0
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"module_stderr": "An error occurred while creating the pipeline.\r\n + CategoryInfo : NotSpecified: (:) [], ParentContainsErrorRecordException\r\n + FullyQualifiedErrorId : RuntimeException\r\n \r\n",
"module_stdout": "",
"msg": "MODULE FAILURE",
"rc": 3221226519
}
# Ansible 2.9.2
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover: WinRMError WinRMError(u"\u7ba1\u9053\u5df2\u7ed3\u675f\u3002 (extended fault data:
{u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '109', 'transport_message': u'Bad HTTP response returned from server.
Code 500', 'http_status_code': 500})",)
xxx.xxx.xxx.xxx | FAILED! => {
"msg": "winrm send_input failed; \nstdout: \nstderr "
}
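Cause:
I have not found the cause of this one. If anyone has found the cause and a fix, please share it; I would be very grateful.
Solution:
Not yet known.
Error: There is not enough space on the disk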
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: at System.Management.Automation.CommandProcessorBase.Complete()
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
"msg": "internal error: failed to run exec_wrapper action module_powershell_wrapper: Exception calling \"CompileAssemblyFromDom\" with \"2\" argument(s): \"There is not enough space on the disk.\r\n\""
}
Cause:
The target host has run out of disk space.
Solution:
Free up disk space.
Error: The RPC server is unavailable
xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "ntlm: The RPC server is unavailable. (extended fault data: {u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '2147944122', 'transport_message': u'Bad HTTP response returned from server. Code 500', 'http_status_code': 500})",
"unreachable": true
}
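Cause:
The RPC service on the target host is not running.
Solution:
Start the RPC service
-- Win+R -> type services.msc -> find RPC Endpoint Mapper in the service list -> right-click and choose Properties -> set Startup type to Automatic, then start the service.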
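To reuse these notes during a bulk run, the cases above can be collected into a small lookup that maps a fragment of Ansible's msg text to the cause recorded here. This is only a sketch: the table reflects the cases in this post, the names KNOWN_ERRORS and likely_cause are invented for illustration, and substring matching is a deliberate simplification.

```python
# Message fragment -> likely cause, taken from the cases recorded in this post.
# Order matters: more specific fragments come before more general ones.
KNOWN_ERRORS = [
    ("has no attribute 'decompressobj'", "target Python was built without zlib"),
    ("Permission denied (publickey)", "target host only accepts key-based login"),
    ("Permission denied, please try again", "wrong password"),
    ("Authentication failure", "wrong password"),
    ("credentials were rejected", "wrong Windows password"),
    ("No space left on device", "target disk is full"),
    ("not enough space on the disk", "target disk is full"),
    ("[Errno 111]", "WinRM port 5985 is not open or the service is not running"),
    ("[Errno 113]", "firewall blocks the port, or a Linux host is being reached over WinRM"),
    ("RPC server is unavailable", "RPC service on the target host is not running"),
    ("timed out", "network unreachable or the port is blocked"),
]


def likely_cause(msg):
    """Return the first recorded cause whose fragment occurs in msg, else None."""
    for fragment, cause in KNOWN_ERRORS:
        if fragment in msg:
            return cause
    return None
```

Calling likely_cause on the "msg" field of a failed host gives a first guess; a result of None means the error is not one of the cases covered here.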