Ansible Error Roundup

I recently used Ansible to deploy zabbix-agent to many hosts at once and ran into quite a few problems, so I am recording them here.

Ansible version: 2.9.2
Python version: 2.7.5
pywinrm version: 0.4.3

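When Ansible reports an error while managing a host:
-- First, rule out network problems.
-- Then use the error message to find the root cause (append -vvvvvv to the command for detailed output).

CentOS environment

Error: Shared connection to xxx.xxx.xxx.xxx closed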

xxx.xxx.xxx.xxx | FAILED! => {
    "changed": false, 
    "module_stderr": "Shared connection to xxx.xxx.xxx.xxx closed.\r\n", 
    "module_stdout": "Traceback (most recent call last):\r\n  File \"/root/.ansible/tmp/ansible-tmp-1657010661.53-56592330631735/command.py\", line 123, in <module>\r\n    f.write(z.read('ansible_module_command.py'))\r\n  File \"/data/app/python/Lib/zipfile.py\", line 1314, in read\r\n    with self.open(name, \"r\", pwd) as fp:\r\n  File \"/data/app/python/Lib/zipfile.py\", line 1425, in open\r\n    return ZipExtFile(zef_file, mode, zinfo, zd, True)\r\n  File \"/data/app/python/Lib/zipfile.py\", line 758, in __init__\r\n    self._decompressor = _get_decompressor(self._compress_type)\r\n  File \"/data/app/python/Lib/zipfile.py\", line 678, in _get_decompressor\r\n    return zlib.decompressobj(-15)\r\nAttributeError: 'NoneType' object has no attribute 'decompressobj'\r\n", 
    "msg": "MODULE FAILURE", 
    "rc": 0
}

module_stdout:
Traceback (most recent call last):
  File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 114, in <module>
    _ansiballz_main()
  File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 106, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/root/.ansible/tmp/ansible-tmp-1657786888.03-3276-66518254516251/AnsiballZ_ping.py", line 41, in invoke_module
    f.write(z.read('__main__.py'))
  File "/data/app/python/Lib/zipfile.py", line 1314, in read
    with self.open(name, "r", pwd) as fp:
  File "/data/app/python/Lib/zipfile.py", line 1425, in open
    return ZipExtFile(zef_file, mode, zinfo, zd, True)
  File "/data/app/python/Lib/zipfile.py", line 758, in __init__
    self._decompressor = _get_decompressor(self._compress_type)
  File "/data/app/python/Lib/zipfile.py", line 678, in _get_decompressor
    return zlib.decompressobj(-15)
AttributeError: 'NoneType' object has no attribute 'decompressobj'

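Cause:
  On inspection, Python on the target host had been upgraded (python2.7.5 -> python3.6.5), and the zlib-devel library was not installed when Python was compiled, so the interpreter was built without the zlib module.

Solution:
  Install the zlib-devel library, then recompile and reinstall python3.6.5.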
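
To confirm the diagnosis before and after rebuilding, it can help to run a small check with the target host's Python interpreter. The following is a minimal sketch, not part of the original procedure; the function name zlib_available is made up for illustration. It tries to import zlib and round-trips a raw deflate stream (wbits=-15), which is the same kind of call that failed in the traceback.

```python
import importlib


def zlib_available():
    """Return True if this interpreter can import zlib and inflate raw deflate data.

    zlib_available is a hypothetical helper name, not an Ansible API.
    """
    try:
        zlib = importlib.import_module("zlib")
    except ImportError:
        # Python was built without zlib support (e.g. zlib-devel missing at build time).
        return False
    # zipfile reads raw deflate streams with wbits=-15, the call seen in the traceback.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    payload = compressor.compress(b"ansible") + compressor.flush()
    return zlib.decompressobj(-15).decompress(payload) == b"ansible"


if __name__ == "__main__":
    print("zlib available:", zlib_available())
```

If this prints False when run with the interpreter Ansible uses on the target, the rebuild did not pick up zlib.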

Error: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out

xxx.xxx.xxx.xxx | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: connect to host xxx.xxx.xxx.xxx port 22: Connection timed out\r\n",
"unreachable": true
}

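Cause:
  Connection timed out is almost always a network problem.

Solution:
  Check network connectivity, whether the port is open, and whether a firewall is enabled on the target host.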
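
A quick way to check the port from the Ansible control host is a plain TCP connect. This is a minimal sketch assuming Python 3; port_is_open is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts alike.
        return False
```

For example, port_is_open("xxx.xxx.xxx.xxx", 22) returning False from the control host points at the network path or the firewall rather than at Ansible itself.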

Error: Authentication failure

# Ansible 2.4.0
xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false, 
    "msg": "Authentication failure.", 
    "unreachable": true
}

# Ansible 2.9.2
xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "Invalid/incorrect password: Permission denied, please try again.",
    "unreachable": true
}

Cause:
  Wrong password.

Solution:
  Use the correct password.

Error: libpcre.so.1() reported missing during installation

fatal: [xxx.xxx.xxx.xxx]: FAILED! => {
"changed": true, 
"cmd": ["rpm", "-ivh", "/etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm"], 
"delta": "0:00:00.086235", "end": "2022-07-15 05:38:46.825589", 
"msg": "non-zero return code", 
"rc": 1, "start": "2022-07-15 05:38:46.739354", 
"stderr": "warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY\nerror: Failed dependencies: libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64 systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64", 
"stderr_lines": [
    "warning: /etc/zabbix/zabbix-agent-4.0.9-3.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID a14fe591: NOKEY", 
    "error: Failed dependencies:", 
    "libpcre.so.1()(64bit) is needed by zabbix-agent-4.0.9-3.el7.x86_64",
    "systemd is needed by zabbix-agent-4.0.9-3.el7.x86_64"
    ], 
"stdout": "", 
"stdout_lines": []
}

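Cause:
  1. When libpcre.so.1() is reported missing, first check whether the zabbix-agent package matches the target host's OS/kernel version (the el7 in the package file name marks a build for EL7);
  2. If they do match, check whether the libpcre.so.1 library is missing on the target host.

Solution:
  Switch to a zabbix-agent package built for the target host's version.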
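
The version check can be done before running rpm at all. The sketch below is illustrative only: rpm_el_major and package_matches_host are hypothetical helper names, and the host's major release is assumed to come from somewhere like the ansible_distribution_major_version fact, converted to an integer by the caller.

```python
import re


def rpm_el_major(rpm_name):
    """Return the EL major release encoded in an RPM file name, or None if absent."""
    match = re.search(r"\.el(\d+)", rpm_name)
    return int(match.group(1)) if match else None


def package_matches_host(rpm_name, host_major):
    """Return True when the package's elN tag equals the host's major release.

    host_major is assumed to be an int supplied by the caller.
    """
    return rpm_el_major(rpm_name) == host_major
```

With the package from the error above, rpm_el_major("zabbix-agent-4.0.9-3.el7.x86_64.rpm") gives 7, so the install should only be attempted on hosts whose major release is also 7.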

Error: Permission denied (publickey)

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: root@xxx.xxx.xxx.xxx: Permission denied (publickey).",
    "unreachable": true
}

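Cause:
  The target host only allows key-based login.

Solution:
  1. Copy the Ansible host's SSH public key to the target host (recommended);
  2. Change the SSH login method on the target host.

Error: No space left on device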
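
In practice option 1 is usually done with ssh-copy-id or Ansible's authorized_key module. To show what that step amounts to on the target host, here is a minimal sketch in Python 3; add_authorized_key is a hypothetical helper, not an Ansible function, and it assumes you can already write to the target user's authorized_keys file by some other means.

```python
import os


def add_authorized_key(public_key, authorized_keys_path):
    """Append public_key to authorized_keys_path unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key.strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as handle:
            content = handle.read()
    if key in [line.strip() for line in content.splitlines()]:
        return False
    ssh_dir = os.path.dirname(authorized_keys_path)
    if ssh_dir and not os.path.isdir(ssh_dir):
        os.makedirs(ssh_dir, 0o700)  # sshd expects a private .ssh directory
    with open(authorized_keys_path, "a") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")  # avoid gluing the key onto an unterminated last line
        handle.write(key + "\n")
    os.chmod(authorized_keys_path, 0o600)
    return True
```

The check for an existing entry keeps the operation idempotent, so running it repeatedly does not pile up duplicate keys.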

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "mkdir: cannot create directory ‘/root/.ansible’: No space left on device\n",
    "unreachable": true
}

Cause:
  The target host has run out of disk space.

Solution:
  Free up disk space.

Windows environment

Error: the specified credentials were rejected by the server
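
Free space can be checked ahead of time on a host. The following is a minimal sketch assuming Python 3.3 or later (for shutil.disk_usage); the helper names and the 50 MB default threshold are arbitrary values chosen for illustration.

```python
import shutil


def free_megabytes(path="/"):
    """Return the free space, in whole megabytes, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def has_free_space(path="/", required_mb=50):
    """Return True if at least `required_mb` megabytes are free (threshold is an assumption)."""
    return free_megabytes(path) >= required_mb
```

Checking the filesystem that holds /root matters here, because that is where Ansible tried to create its .ansible directory in the error above.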

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "ntlm: the specified credentials were rejected by the server",
    "unreachable": true
}

Cause:
  The password for the target host is incorrect.

Solution:
  Use the correct password.

Error: Failed to establish a new connection [Errno 111]

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2040e7f50>: Failed to establish a new connection: [Errno 111] Connection refused',))",
    "unreachable": true
}

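Cause:
  1. The network between the two hosts may not be connected;
  2. The target host has not opened remote port 5985.

Solution:
  Check whether the WinRM service is running on the target host. In PowerShell: (Get-Service -Name winrm).Status
    -- If it is not running, run the 1.ps1 script to enable the WinRM service (note: requires PowerShell >= 3.0 and .NET >= 4.0)

Error: Failed to establish a new connection: [Errno 113]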

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f06d333e650>: Failed to establish a new connection: [Errno 113] No route to host',))",
    "unreachable": true
}

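Cause:
  1. The host is a Linux system but is being connected to as if it were Windows;
  2. The host's firewall is enabled; No route to host is reported because the firewall has not opened the port.

Solution:
  Open the corresponding port on the firewall.

Error: Connection to xxx.xxx.xxx.xxx timed out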

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "ntlm: HTTPConnectionPool(host='xxx.xxx.xxx.xxx', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7ff2040cff90>, 'Connection to xxx.xxx.xxx.xxx timed out. (connect timeout=30)'))",
    "unreachable": true
}

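Cause:
  1. The network is unreachable;
  2. The connection to the port timed out.

Solution:
  1. Put a network policy in place that allows traffic between the hosts;
  2. Check whether the target host has port 5985 (the WinRM service) open.

Error: winrm send_input failed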
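
The three WinRM connection failures described so far (Errno 111, Errno 113, and the connect timeout) can be told apart from the control host without going through Ansible. The sketch below assumes Python 3 on Linux, where ECONNREFUSED is 111 and EHOSTUNREACH is 113; classify_connect_failure is a hypothetical helper name.

```python
import errno
import socket


def classify_connect_failure(host, port, timeout=3.0):
    """Try a TCP connect and return 'open', 'refused', 'no route', 'timed out' or 'other: ...'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        # No answer at all: packets are likely dropped somewhere on the path.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # Host is reachable but nothing listens on the port (e.g. WinRM not running).
            return "refused"
        if exc.errno == errno.EHOSTUNREACH:
            # Typically a firewall rejecting the traffic or a routing problem.
            return "no route"
        return "other: %s" % exc
    sock.close()
    return "open"
```

Calling classify_connect_failure("xxx.xxx.xxx.xxx", 5985) then maps directly onto the cases above: "refused" suggests checking the WinRM service, "no route" the firewall, and "timed out" the network policy.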

# Ansible 2.4.0
xxx.xxx.xxx.xxx | FAILED! => {
"changed": false,
    "module_stderr": "An error occurred while creating the pipeline.\r\n    + CategoryInfo          : NotSpecified: (:) [], ParentContainsErrorRecordException\r\n    + FullyQualifiedErrorId : RuntimeException\r\n \r\n",
"module_stdout": "",
"msg": "MODULE FAILURE",
"rc": 3221226519
}

# Ansible 2.9.2
[WARNING]: ERROR DURING WINRM SEND INPUT - attempting to recover: WinRMError WinRMError(u"\u7ba1\u9053\u5df2\u7ed3\u675f\u3002  (extended fault data:
{u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '109', 'transport_message': u'Bad HTTP response returned from server.
Code 500', 'http_status_code': 500})",)

xxx.xxx.xxx.xxx | FAILED! => {
    "msg": "winrm send_input failed; \nstdout: \nstderr "
}

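Cause:
  I have not found the cause of this one. If anyone has found the cause and a fix, please share it; I would be very grateful.

Solution:
  None found yet.

Error: There is not enough space on the disk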

An exception occurred during task execution. To see the full traceback, use -vvv. The error was:    at System.Management.Automation.CommandProcessorBase.Complete()
xxx.xxx.xxx.xxx | FAILED! => {
    "changed": false,
    "msg": "internal error: failed to run exec_wrapper action module_powershell_wrapper: Exception calling \"CompileAssemblyFromDom\" with \"2\" argument(s): \"There is not enough space on the disk.\r\n\""
}

Cause:
  The target host has run out of disk space.

Solution:
  Free up disk space.

Error: The RPC server is unavailable

xxx.xxx.xxx.xxx | UNREACHABLE! => {
    "changed": false,
    "msg": "ntlm: The RPC server is unavailable.  (extended fault data: {u'fault_subcode': 'w:InternalError', u'fault_code': 's:Receiver', u'wsmanfault_code': '2147944122', 'transport_message': u'Bad HTTP response returned from server. Code 500', 'http_status_code': 500})",
    "unreachable": true
}

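Cause:
  The RPC service on the target host is not running.

Solution:
  Start the RPC service.
    -- Win+R -> type services.msc -> find RPC Endpoint Mapper in the service list -> right-click and choose Properties -> set Startup type to Automatic and start the service.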
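
The wsmanfault_code in the error output is a decimal HRESULT, and decoding it confirms the diagnosis. The sketch below uses a hypothetical helper, decode_hresult, that splits the value into its hex form, facility, and error code following the standard HRESULT bit layout (11-bit facility, 16-bit code).

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (hex string, facility, code)."""
    value &= 0xFFFFFFFF
    facility = (value >> 16) & 0x7FF  # bits 16-26
    code = value & 0xFFFF             # bits 0-15
    return "0x%08X" % value, facility, code


if __name__ == "__main__":
    # Value taken from the wsmanfault_code field in the error above.
    print(decode_hresult(2147944122))
```

For 2147944122 this yields 0x800706BA with facility 7 (Win32) and code 1722, and Win32 error 1722 is "The RPC server is unavailable", matching the message text.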
