Recently, some nodes in the cluster kept ending up with pods stuck in Terminating that could not be deleted. Check the system log on an affected node:
journalctl -xeu kubelet
The kubelet log shows the following error:
no space left on device
Searching by the keyword inotify_add_watch, posts online point out that this happens when fs.inotify.max_user_watches is too small. Check the current value on the node:
cat /proc/sys/fs/inotify/max_user_watches
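Each entry counted against max_user_watches shows up as an "inotify wd:" line in the owning process's fdinfo, so you can also check which processes are holding the most watches. The one-liner below is only a diagnostic sketch, not part of the original fix; map the PID in the output path back to a process with ps:
# count inotify watch entries per fdinfo file and show the biggest consumers (needs root)
sudo find /proc/[0-9]*/fdinfo -type f -exec grep -c '^inotify' {} + 2>/dev/null | grep -v ':0$' | sort -t: -k2 -rn | head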
Adjust it temporarily:
sysctl -w fs.inotify.max_user_watches=65536
kubelet was then observed to start up normally again. To avoid losing the setting after a reboot, write it into /etc/sysctl.conf:
# append the new setting
fs.inotify.max_user_watches=65536
# apply the configuration
sysctl -p
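On systemd-based distros, an equivalent alternative is a drop-in file under /etc/sysctl.d/ instead of editing /etc/sysctl.conf; the file name here is just an example:
# alternative: persist the setting via a sysctl.d drop-in (example file name)
echo "fs.inotify.max_user_watches=65536" | sudo tee /etc/sysctl.d/90-inotify.conf
sudo sysctl --system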
Use Ansible to apply the change across multiple machines.
hosts.ini
[kube-all]
192.168.0.1 ansible_ssh_user=ubuntu ansible_ssh_port=22 ansible_ssh_pass="***" ansible_sudo_pass="***"
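Before running the playbook, a quick ad-hoc ping is a handy sanity check that the inventory and credentials work (not part of the original steps):
ansible -i hosts.ini kube-all -m ping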
fix_watch.yaml
---
- hosts: kube-all
  become_user: root
  become: yes
  gather_facts: no
  tasks:
    - name: change watch
      lineinfile:
        path: /etc/sysctl.conf
        line: fs.inotify.max_user_watches=65536
    - name: sysctl -p
      shell: |
        sysctl -p
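As an alternative to the two tasks above, the sysctl module from the ansible.posix collection can set, persist, and reload the value in one task. This is just a sketch assuming that collection is installed:
# alternative task, assumes the ansible.posix collection is installed
- name: set max_user_watches
  ansible.posix.sysctl:
    name: fs.inotify.max_user_watches
    value: '65536'
    state: present
    sysctl_set: yes
    reload: yes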
ansible.cfg
[defaults]
host_key_checking = False
any_errors_fatal = True
timeout = 30
forks = 10
[ssh_connection]
ssh_args=-F ansible_ssh_config
retries=10
ansible_ssh_config
Host *
  ForwardAgent no
  ControlMaster auto
  ControlPersist 300s
Run the command below to roll the change out to all machines. The ansible.cfg above tweaks a few settings; Ansible automatically looks for this file in the current directory first.
ansible-playbook -i hosts.ini fix_watch.yaml
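Afterwards, the new value can be verified on every node with an ad-hoc command (a hypothetical check, not in the original post):
# confirm the setting on all hosts in the kube-all group
ansible -i hosts.ini kube-all -b -m command -a "sysctl fs.inotify.max_user_watches"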