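Requirements
After the project was containerized with Docker, for various reasons some images were left behind after deployment that no container uses, and they take up disk space. We therefore need to find the Docker images that no container uses and remove them from the deployment scripts. The project has a large number of containers, so checking by hand is slow and error-prone; we want an automated way to find the unused images.
Analysis
- Ideally, Docker's own commands would be able to find the unused images.
docker images --filter dangling=true lists untagged images, for example: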
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>              <none>              8abc22fbb042        4 weeks ago         0 B
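In practice, however, most of the unused images in the project do have tags; they were loaded but never used. So docker images --filter dangling=true cannot find all of the unused images.
- Write our own script
1. docker ps -a lists all containers, including the name of the image each one uses
2. For the image of each container, run docker history image_id to find its base images (because Docker image layers are shared, the underlying base images take up no extra space and do not need to be cleaned up)
3. Take the union of the images from steps 1 and 2 and remove duplicates
4. docker images lists all images; excluding the images from step 3 leaves the unused ones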
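The four steps reduce to set arithmetic once the command output has been parsed. The following is a minimal sketch under that assumption, not the script itself: `find_unused`, its argument shapes, and the sample IDs are hypothetical and only illustrate the logic.

```python
def find_unused(images, container_image_names, history):
    """Return the IDs of images that no container uses, directly or as a base.

    images: dict mapping image ID -> set of names ("repo" or "repo:tag")
    container_image_names: image references taken from `docker ps -a`
    history: dict mapping image ID -> layer IDs taken from `docker history`
    """
    # Step 1: images referenced directly by containers (by name or by ID).
    refs = set(container_image_names)
    used = {image_id for image_id, names in images.items()
            if image_id in refs or names & refs}
    # Step 2: base images that appear in the history of each used image.
    for image_id in list(used):
        used.update(layer for layer in history.get(image_id, [])
                    if layer in images)
    # Steps 3 and 4: `used` is already a de-duplicated union; subtract it.
    return set(images) - used


# Made-up sample data: one container runs web:1.0, which is built on python:2.7.
images = {
    "aaa111": {"web:1.0"},
    "bbb222": {"python:2.7"},
    "ccc333": {"celery", "celery:latest"},
}
container_image_names = ["web:1.0"]
history = {"aaa111": ["aaa111", "<missing>", "bbb222"]}

print(sorted(find_unused(images, container_image_names, history)))
# ['ccc333']
```

Keeping the logic free of any docker calls makes it easy to check against hand-written data before pointing it at a real host.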
Python script
find_unused_images.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re

IMAGES_COMMAND = 'docker images'
PS_COMMAND = 'docker ps -a'
HISTORY = 'docker history %s'
# Columns in docker's table output are separated by two or more whitespace characters.
RE = re.compile(r'\s{2,}')


def exec_command(command):
    """Run a shell command and return its output as a list of lines."""
    result = os.popen(command)
    return result.readlines()


class Image(object):
    def __init__(self, image_info):
        # Columns of `docker images`: REPOSITORY, TAG, IMAGE ID, CREATED, SIZE.
        # Strip the line first so the size does not keep a trailing newline.
        self.split_info = RE.split(image_info.strip())
        self.image_id = self.split_info[2]
        self._generate_image_name()
        self.size = self.split_info[-1]

    def _generate_image_name(self):
        tag = self.split_info[1]
        repo = self.split_info[0]
        # Images tagged 'latest' are matched by their bare repository name.
        self.name = repo if tag == 'latest' else repo + ':' + tag

    def get_related_images(self):
        # The first column of `docker history` is the layer's image ID; skip the header row.
        image_ids = [RE.split(history)[0] for history in exec_command(HISTORY % self.image_id)[1:] if
                     RE.split(history)[0] != '<missing>']
        return filter(None, [ImageUtil.get_image_by_id(image_id) for image_id in image_ids])

    def __repr__(self):
        return 'id:%s name:%s size:%s' % (self.image_id, self.name, self.size)

    # Images are compared and hashed by ID so that they can be put into sets.
    def __hash__(self):
        return hash(self.image_id)

    def __eq__(self, other):
        return self.image_id == other.image_id

    def __ne__(self, other):
        return not self.__eq__(other)


class ImageUtil(object):
    # All local images; [1:] skips the header row.
    all_images = [Image(image_info) for image_info in exec_command(IMAGES_COMMAND)[1:]]

    @classmethod
    def get_image_by_id(cls, image_id):
        try:
            return filter(lambda img: img.image_id == image_id, cls.all_images)[0]
        except IndexError:
            return ''

    @classmethod
    def get_image_by_name(cls, name):
        try:
            return filter(lambda img: img.name == name, cls.all_images)[0]
        except IndexError:
            return ''


class Container(object):
    def __init__(self, container_info):
        self.split_info = RE.split(container_info)
        # The second column of `docker ps -a` is the image the container was created from.
        image_name = self.split_info[1]
        self.image = ImageUtil.get_image_by_name(image_name)


class ContainerUtil(object):
    # All containers, running or stopped; [1:] skips the header row.
    all_containers = [Container(container_info) for container_info in exec_command(PS_COMMAND)[1:]]

    @classmethod
    def get_used_images(cls):
        # Images used directly by containers, plus the base images found in their history.
        used_images = [container.image for container in cls.all_containers if container.image]
        related_images = []
        for used_image in used_images:
            related_images.extend(used_image.get_related_images())
        used_images.extend(related_images)
        return used_images


if __name__ == '__main__':
    all_images = ImageUtil.all_images
    images_used_by_container = ContainerUtil.get_used_images()
    unused_images = set(all_images) - set(images_used_by_container)
    print('unused images')
    for image in unused_images:
        print(image)
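The script splits each row of docker's table output on runs of two or more whitespace characters rather than on single spaces, because some columns (such as CREATED) contain single spaces themselves. A small illustration follows; `COLUMN_SEP`, `split_columns`, and the sample row are made up for this example.

```python
import re

# Two or more whitespace characters mark a column boundary.
COLUMN_SEP = re.compile(r"\s{2,}")


def split_columns(line):
    """Split one row of docker's table output into its columns."""
    return COLUMN_SEP.split(line.strip())


row = "celery              latest              e111a70eee6a        4 weeks ago         216MB\n"
print(split_columns(row))
# ['celery', 'latest', 'e111a70eee6a', '4 weeks ago', '216MB']

# Splitting on any whitespace would break "4 weeks ago" into three fields.
print(len(row.split()))
# 7
```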
Dependencies
- Python 2.7 (also works on Python 3.x with minor modifications)
- root user
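The Python 3 caveat most likely comes from `filter`: on Python 2 it returns a list, while on Python 3 it returns a lazy iterator, so indexing its result the way get_image_by_id and get_image_by_name do fails there. A lookup that behaves the same on both versions could be sketched as follows; `first_match` is a hypothetical helper, and this may not be the only change a given environment needs.

```python
def first_match(items, predicate, default=None):
    """Return the first item that satisfies predicate, or default if none does."""
    return next((item for item in items if predicate(item)), default)


ids = ["8abc22fbb042", "e111a70eee6a"]

print(first_match(ids, lambda i: i.startswith("e111")))
# e111a70eee6a

# A falsy default plays the role of the '' that the script returns on a miss.
print(repr(first_match(ids, lambda i: i == "no-such-id", "")))
# ''
```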
Usage
Place find_unused_images.py in any directory and run
python find_unused_images.py
It prints all unused images along with their sizes:
unused images
id:e111a70eee6a name:celery size:216MB
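If the report needs further processing, for example to compare it against the image list in a deployment script, each line can be parsed back into fields. This is a minimal sketch that assumes the output format shown above; `parse_report` is a hypothetical helper and not part of the script.

```python
import re

# Matches lines such as "id:e111a70eee6a name:celery size:216MB".
LINE = re.compile(r"^id:(?P<id>\S+) name:(?P<name>\S+) size:(?P<size>.+)$")


def parse_report(text):
    """Turn the script's printed report into a list of dicts."""
    records = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:  # skips the "unused images" header and blank lines
            records.append(match.groupdict())
    return records


report = """unused images
id:e111a70eee6a name:celery size:216MB
"""

for record in parse_report(report):
    print(record["name"], record["size"])
# celery 216MB
```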