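Background: the test server is a shared environment, and our service processes kept getting stopped for no apparent reason.
Solution: manage the processes with supervisor, which automatically restarts a service process when it is terminated unexpectedly.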
Installation:
sudo apt-get install supervisor
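Program configuration: create a file such as xxx.conf in /etc/supervisor/conf.d (the default include only picks up *.conf files there), for example: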
[program:abc]
directory=/data/deploy/master/system/test_env/abc
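; supervisor runs the command directly (not through a shell) and expects it to stay in the foreground, so no nohup and no trailing &
command=/data/deploy/master/system/test_env/abc/bin/abc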
stdout_logfile=/var/log/supervisor/abc.out
autostart=true
autorestart=true
startsecs=60
priority=1
stopasgroup=true
killasgroup=true
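Because supervisor starts `command` directly rather than through a shell, a trailing `&` is not a background operator: it reaches the program as a literal argument, and wrapping the program in nohup is redundant. The following is a minimal Python sketch, assuming the program file parses as ordinary INI (true for simple files like the one above), that flags both patterns before deployment. The function name `check_program_commands` is hypothetical and not part of supervisor.

```python
import configparser


def check_program_commands(conf_text: str) -> list[str]:
    """Return a warning for each [program:x] command that uses nohup or a trailing '&'."""
    # interpolation=None: supervisor's %(here)s-style expansions are not configparser syntax.
    # inline_comment_prefixes: supervisor allows " ; comment" after a value.
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=(";",))
    parser.read_string(conf_text)

    warnings = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        name = section.split(":", 1)[1]
        command = parser.get(section, "command", fallback="").strip()
        tokens = command.split()
        # Compare the basename so both "nohup" and "/usr/bin/nohup" are caught.
        if tokens and tokens[0].rsplit("/", 1)[-1] == "nohup":
            warnings.append(f"{name}: nohup is redundant under supervisor")
        if command.endswith("&"):
            warnings.append(f"{name}: trailing '&' is passed to the program as a literal argument")
    return warnings
```

For example, `check_program_commands(open("/etc/supervisor/conf.d/xxx.conf").read())` returns an empty list when the file is clean.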
Process group configuration: edit /etc/supervisor/supervisord.conf and add the following section:
[group:XXXX]
programs = abc,bcd
priority = 999
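Once programs are grouped, supervisorctl refers to each member as `group:program` (for example `supervisorctl restart XXXX:abc`), and `XXXX:*` addresses the whole group. This small Python sketch lists those names from a config fragment; the helper `group_members` is a hypothetical name used only for illustration.

```python
import configparser


def group_members(conf_text: str) -> dict[str, list[str]]:
    """Map each [group:x] section to the 'group:program' names supervisorctl uses."""
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=(";",))
    parser.read_string(conf_text)

    groups = {}
    for section in parser.sections():
        if not section.startswith("group:"):
            continue
        group = section.split(":", 1)[1]
        programs = parser.get(section, "programs", fallback="")
        # programs is a comma-separated list; surrounding spaces are ignored.
        groups[group] = [f"{group}:{p.strip()}" for p in programs.split(",") if p.strip()]
    return groups
```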
To enable the built-in HTTP server (for checking process status in a browser), edit /etc/supervisor/supervisord.conf and add the following section:
[inet_http_server] ; inet (TCP) server disabled by default
port=*:9001 ; (ip_address:port specifier, *:port for all iface)
username=**** ; (default is no username (open server))
password=****
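Besides the browser page, [inet_http_server] also exposes supervisor's XML-RPC API at /RPC2 on the same port, so process status can be scripted. A minimal sketch using only the Python standard library follows. `fetch_process_info` and `summarize` are hypothetical helper names; `supervisor.getAllProcessInfo` and its `name`, `group`, `statename` and `pid` fields belong to supervisor's XML-RPC API.

```python
import xmlrpc.client


def fetch_process_info(url: str) -> list[dict]:
    """Call supervisor.getAllProcessInfo on the given XML-RPC endpoint (normally .../RPC2)."""
    # Basic-auth credentials can be embedded in the URL, e.g. http://user:pass@host:9001/RPC2
    proxy = xmlrpc.client.ServerProxy(url)
    return proxy.supervisor.getAllProcessInfo()


def summarize(infos: list[dict]) -> list[str]:
    """Format process info dicts into one 'group:name STATE pid=N' line each."""
    lines = []
    for info in infos:
        name = info["name"]
        # A program outside any [group:x] section is its own group with the same name.
        if info["group"] != name:
            name = f"{info['group']}:{name}"
        lines.append(f"{name} {info['statename']} pid={info['pid']}")
    return lines
```

Usage, with placeholder host and credentials standing in for the values from your own [inet_http_server] section: `print("\n".join(summarize(fetch_process_info("http://user:password@127.0.0.1:9001/RPC2"))))`.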
Start command:
supervisord -c /etc/supervisor/supervisord.conf
Common issues:
Running supervisord as a non-root user fails with:
error: <class 'socket.error'>, [Errno 13] Permission denied: file: /usr/lib/python2.7/socket.py line: 224
Fix: move the socket to a path the non-root user can write to and relax its permissions:
[unix_http_server]
file=/tmp/supervisor.sock ; (the path to the socket file)
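chmod=0766 ; socket file mode (default 0700)
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; must match file= above so supervisorctl can connect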
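With chmod=0766 the socket file is readable and writable by group and others, not only by its owner as with the default 0700. On Linux, connecting to a Unix domain socket requires write permission on the socket file, so this lets users other than the one running supervisord use supervisorctl against it and therefore control its processes. The Python sketch below decodes both modes with the standard `stat` module; the helper names are hypothetical.

```python
import stat


def describe_socket_mode(mode: int) -> str:
    """Render a [unix_http_server] chmod value the way `ls -l` would show the socket file."""
    return stat.filemode(stat.S_IFSOCK | mode)


def others_can_connect(mode: int) -> bool:
    """On Linux, connect() to a Unix socket requires write permission on the socket file."""
    return bool(mode & stat.S_IWOTH)
```

`describe_socket_mode(0o766)` gives `srwxrw-rw-`, while the default `0o700` gives `srwx------`.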