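When writing Java code, releasing file resources needs particular care. A file should normally be closed after use, and only then deleted.
If a file is deleted while it is still open, its file handle is not released: the file's blocks stay allocated, so deleting it frees no disk space and actual disk usage stays high.
In that situation the file itself is gone, but the output stream (out) still holds it. Until out is closed the file handle is not released, and the space actually available is smaller than the space that should be available.
File handles can be inspected with the lsof command: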
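The behaviour is not specific to Java and can be reproduced from a shell. The following is a minimal sketch, assuming Linux with /proc mounted; the scratch file and descriptor number 9 are arbitrary choices for illustration and do not come from the original incident.

```shell
# Sketch: a deleted file keeps its data until every open descriptor is closed.
# Assumes Linux with /proc; descriptor 9 and the scratch file are arbitrary.
scratch=$(mktemp)
exec 9>"$scratch"            # open the file for writing on descriptor 9
echo "some log data" >&9
rm -f "$scratch"             # removes the name only; the data is still referenced

# The /proc link now carries a " (deleted)" suffix and the data is still there.
link_while_open=$(readlink "/proc/$$/fd/9")
size_while_open=$(stat -L -c %s "/proc/$$/fd/9")
echo "open:   $link_while_open ($size_while_open bytes)"

exec 9>&-                    # close the descriptor; only now is the space freed
if [ -e "/proc/$$/fd/9" ]; then still_open=yes; else still_open=no; fi
echo "closed: descriptor still open? $still_open"
```

In Java terms, `exec 9>&-` plays the role of calling close() on the stream: the order that avoids the problem is close first, delete second.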
lsof -s | grep java
lsof -s |grep deleted
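The system raised a low-disk-space alert: a service had been writing error logs nonstop until the disk filled up, so the container was deleted and a new one was started.
Afterwards, df -h showed that the disk space had not been released,
yet du -sh accounted for far less space than that.
Running lsof | grep deleted shows which file handles on the system have not been released.
Since all of the space belongs to containers, only the unreleased file handles of the container process need to be checked.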
lsof | grep deleted
lsof -p 3495 | grep deleted
lsof -p $(ps aux |grep dockerd |grep -v grep |awk '{print$2}') | grep deleted
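This revealed many unreleased file handles pointing at files of containers that no longer exist.
With the cause found, there are two ways to fix it.
1. Restart the process that holds the handles, so that the handles are released and the space is freed.
2. Locate the specific file handle and truncate the file behind it to zero length.
The first method is unsuitable here because frequent restarts are not acceptable in this production environment, so the second method was used.
Use lsof | grep deleted to get the PID (process identifier) and the FD (file descriptor, the number through which the application identifies the file).
Empty the file's content, then check disk usage again; the space has been recovered: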
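Where lsof is not installed, the same information can be read straight from /proc. This is a rough sketch under that assumption; `list_deleted_fds` is a hypothetical helper name, and it only sees processes whose /proc entries the current user is allowed to read.

```shell
# list_deleted_fds PID
# Prints "fd<TAB>size<TAB>target" for every descriptor of PID whose target
# has been deleted. Hypothetical helper; assumes Linux with /proc.
list_deleted_fds() {
    pid=$1
    for fd_path in /proc/"$pid"/fd/*; do
        # readlink fails for descriptors that vanish mid-scan; skip those
        target=$(readlink "$fd_path" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                size=$(stat -L -c %s "$fd_path" 2>/dev/null) || size="?"
                printf '%s\t%s\t%s\n' "${fd_path##*/}" "$size" "$target"
                ;;
        esac
    done
}

# Example: inspect the current shell (replace $$ with the PID of interest)
list_deleted_fds "$$"
```

The first column is the FD and the argument is the PID, which are the two values needed to address the file as /proc/PID/fd/FD.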
echo > /proc/${pid}/fd/${fd}
echo > /proc/3840/fd/124
or
truncate -s 0 /proc/${pid}/fd/${fd}
truncate -s 0 /proc/3840/fd/124
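The effect of truncating through /proc can be checked on a throwaway file before touching a production process. The sketch below assumes Linux with /proc; descriptor 7, the scratch file, and the 1 MiB size are illustrative values only.

```shell
# Sketch: truncate a deleted-but-open file through /proc and compare sizes.
scratch=$(mktemp)
exec 7>>"$scratch"                   # append mode, as log writers typically use
head -c 1048576 /dev/zero >&7        # write 1 MiB of data
rm -f "$scratch"                     # delete the name while descriptor 7 is open

before=$(stat -L -c %s "/proc/$$/fd/7")
: > "/proc/$$/fd/7"                  # same idea as '>/proc/PID/fd/FD'
after=$(stat -L -c %s "/proc/$$/fd/7")
echo "before=$before after=$after"   # prints: before=1048576 after=0

exec 7>&-                            # clean up the descriptor
```

One caveat: if the writing process did not open the file in append mode, its next write lands at the old offset and the file becomes sparse, so its apparent size grows back even though few blocks are actually used.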
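File deleted but space not released: is a restart the only fix?
In day-to-day operations, DBAs often run into servers that are short on disk space. It is tempting to charge in and delete rarely used logs and files, only to find that the space is not released, which leaves a hidden problem for whoever comes next.
##/usr/local is short on space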
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 9.0G 9.6G 49% /
/dev/sda3 20G 8.2G 11G 79% /usr/local
/dev/sda4 401G 297G 84G 78% /data
##du reports only 8.2G, while about 15G is actually in use
# du -sh /usr/local/
8.2G /usr/local/
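Checking for deleted files under /usr/local shows a log that was deleted while several processes still hold file handles to it, so its space has not been released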
# lsof /usr/local/|grep -i delete
mysqld_sa 21589 user_00 2w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
router_up 27504 user_00 1w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
router_up 27504 user_00 2w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 28895 user_00 1w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 28895 user_00 2w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 28897 user_00 1w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 28897 user_00 2w REG 8,3 6858136961 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
router_up 70763 user_00 1w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
router_up 70763 user_00 2w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 71398 user_00 1w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 71398 user_00 2w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 71403 user_00 1w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
dcagent_t 71403 user_00 2w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
java 72021 user_00 1w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
java 72021 user_00 2w REG 8,3 6858137684 1053546 /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
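Option 1: restart the affected processes
This is the most common approach: once the affected processes are restarted, the file handles they hold are released and the disk space is freed
Option 2: empty the file behind the unreleased file handle
##Pick any one of the affected processes and list its file handles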
# ls -l /proc/21589/fd
total 0
lr-x------ 1 user_00 users 64 Jun 22 15:05 0 -> /dev/null
l-wx------ 1 user_00 users 64 Jun 22 15:05 1 -> /data/log/dblogs/nohup.out
l-wx------ 1 user_00 users 64 Jun 22 15:05 2 -> /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
lrwx------ 1 user_00 users 64 Jun 22 15:05 3 -> socket:[3094866717]
##File handle 2 points at the deleted file
l-wx------ 1 user_00 users 64 Jun 22 15:05 2 -> /usr/local/soft/shell_pkginstall_2021-07-07.log (deleted)
##Empty the file that file handle 2 points at
# >/proc/21589/fd/2
##Check the disk again; the space has been released
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 9.0G 9.6G 49% /
/dev/sda3 20G 8.2G 11G 79% /usr/local
/dev/sda4 401G 297G 84G 78% /data
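#Tune the open-file limits
##On CentOS 6.x, /etc/security/limits.conf is read first, followed by any config files under /etc/security/limits.d/
##CentOS 7.x likewise reads every file in that directory, so settings in /etc/security/limits.d/ override those in /etc/security/limits.conf
##Comment out the existing nofile lines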
sed -i "/nofile/s/^/#/g" /etc/security/limits.conf
##Comment out the existing nproc lines
sed -i "/nproc/s/^/#/g" /etc/security/limits.conf
echo "* soft nofile 1048576" >>/etc/security/limits.conf
echo "* hard nofile 1048576" >>/etc/security/limits.conf
echo "root soft nofile 1048576" >>/etc/security/limits.conf
echo "root hard nofile 1048576" >>/etc/security/limits.conf
echo "* soft nproc 65535" >>/etc/security/limits.conf
echo "* hard nproc 65535" >>/etc/security/limits.conf
echo "root soft nproc unlimited" >>/etc/security/limits.conf
echo "root hard nproc unlimited" >>/etc/security/limits.conf
##Comment out the existing nproc lines
sed -i "/nproc/s/^/#/g" /etc/security/limits.d/90-nproc.conf
##Comment out the existing nofile lines
sed -i "/nofile/s/^/#/g" /etc/security/limits.d/90-nproc.conf
echo "* soft nofile 1048576" >>/etc/security/limits.d/90-nproc.conf
echo "* hard nofile 1048576" >>/etc/security/limits.d/90-nproc.conf
echo "root soft nofile 1048576" >>/etc/security/limits.d/90-nproc.conf
echo "root hard nofile 1048576" >>/etc/security/limits.d/90-nproc.conf
echo "* soft nproc 65535" >>/etc/security/limits.d/90-nproc.conf
echo "* hard nproc 65535" >>/etc/security/limits.d/90-nproc.conf
echo "root soft nproc unlimited" >>/etc/security/limits.d/90-nproc.conf
echo "root hard nproc unlimited" >>/etc/security/limits.d/90-nproc.conf
echo "* soft memlock unlimited" >>/etc/security/limits.d/90-nproc.conf
echo "* hard memlock unlimited" >>/etc/security/limits.d/90-nproc.conf
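After editing the limits files it is worth confirming what a process actually received, since these settings are applied when a session starts and do not change processes that are already running. A small sketch, assuming Linux with /proc; it only reads the current shell's limits and changes nothing.

```shell
# Sketch: compare the shell's own view of the open-file limit with /proc.
soft=$(ulimit -Sn)                   # soft limit on open files for this shell
hard=$(ulimit -Hn)                   # hard limit on open files for this shell
echo "ulimit: soft=$soft hard=$hard"

# /proc/PID/limits lists the same values for any process; fields 4 and 5 of
# the "Max open files" row are the soft and hard limits.
proc_soft=$(awk '/^Max open files/ {print $4}' "/proc/$$/limits")
proc_hard=$(awk '/^Max open files/ {print $5}' "/proc/$$/limits")
echo "proc:   soft=$proc_soft hard=$proc_hard"
```

Replacing `$$` with the PID of a long-running service shows whether that service still carries the old limits and needs to be restarted from a fresh session.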
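References
Linux file handles not released
https://blog.bwcxtech.com/posts/1501dca
Releasing Java file handles
https://oomake.com/question/313034
Using lsof for file recovery, file handles, and space-release problems
https://blog.51cto.com/u_13293070/2298059
File deleted on Linux but space not released
https://blog.51cto.com/chbinmile/1872633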