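The container start process

- `conditionalMountOnStart()`: mount-related setup
- `initializeNetworking()`: networking
- `createSpec()`: filesystem
- `saveApparmorConfig()`: security
- `containerd.Create()` and `containerd.Start()`: hand the container over to containerd to manage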
conditionalMountOnStart()
// conditionalMountOnStart is a platform specific helper function during the
// container start to call mount.
That is what the comment says. The function body is a single line, `return daemon.Mount(container)`, which sets up the container's filesystem:
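- `container.RWLayer.Mount()` mounts the container. The RWLayer was created by `daemon.imageService.CreateLayer()` when the Docker daemon created the container. A quick recap of that code:
  - `layerID = img.RootFS.ChainID()`: the image instance `img` is looked up by image ID. `img` has a `RootFS` attribute describing the image's root filesystem, with two members:
    - `Type string` (JSON tag `type`)
    - `DiffIDs []layer.DiffID` (JSON tag `diff_ids,omitempty`)

    A DiffID is a hash that identifies one layer, while the ChainID identifies the topmost layer. This is easy to picture: an image is layered, each layer has its own root filesystem, and the container is built on top of the topmost image layer. `ChainID()` returns an empty value on Windows; in all other cases it calls `CreateChainID(r.DiffIDs)`, which computes the topmost ChainID recursively using SHA256.
  - An `rwLayerOpts` instance is created with three fields: MountLabel, InitFunc and StorageOpt. InitFunc is responsible for initializing a writable mount point.
  - `layerStores[container.OS].CreateRWLayer()` creates the read-write layer.
- Back in `Mount()`, `container.RWLayer.Mount(container.GetMountLabel())` mounts the rwlayer that was created and returns the path of the filesystem. Its argument is the mount label, a string.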
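To make the ChainID computation concrete, here is a minimal sketch of the scheme as I understand it: the ChainID of the bottom layer is its DiffID, and each further layer hashes the previous ChainID, a space, and its own DiffID. The `chainID` function below is my own simplification that works on plain strings and uses a loop instead of recursion; it is not the code from the layer package, and the DiffIDs in `main` are made-up placeholders.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chainID folds a list of layer DiffIDs (bottom layer first) into one ID:
//
//	ChainID(L0)     = DiffID(L0)
//	ChainID(L0..Ln) = SHA256(ChainID(L0..Ln-1) + " " + DiffID(Ln))
//
// An empty list yields an empty ID.
func chainID(diffIDs []string) string {
	if len(diffIDs) == 0 {
		return ""
	}
	id := diffIDs[0]
	for _, d := range diffIDs[1:] {
		sum := sha256.Sum256([]byte(id + " " + d))
		id = fmt.Sprintf("sha256:%x", sum)
	}
	return id
}

func main() {
	// Placeholder DiffIDs, not taken from a real image.
	layers := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	fmt.Println(chainID(layers[:1])) // a single layer: the DiffID itself
	fmt.Println(chainID(layers))     // the ID of the whole three-layer stack
}
```

Because each step includes the previous result, the final value depends on every layer below it and on their order, which is why one ChainID is enough to identify the whole stack.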
// Set RWLayer for container after mount labels have been set
rwLayer, err := daemon.imageService.CreateLayer(container, setupInitLayer(daemon.idMappings))
if err != nil {
	return nil, errdefs.System(err)
}
container.RWLayer = rwLayer

// CreateLayer creates a filesystem layer for a container.
// called from create.go
// TODO: accept an opt struct instead of container?
func (i *ImageService) CreateLayer(container *container.Container, initFunc layer.MountInit) (layer.RWLayer, error) {
	var layerID layer.ChainID
	if container.ImageID != "" {
		img, err := i.imageStore.Get(container.ImageID)
		if err != nil {
			return nil, err
		}
		layerID = img.RootFS.ChainID()
	}
	rwLayerOpts := &layer.CreateRWLayerOpts{
		MountLabel: container.MountLabel,
		InitFunc:   initFunc,
		StorageOpt: container.HostConfig.StorageOpt,
	}
	// Indexing by OS is safe here as validation of OS has already been performed in create() (the only
	// caller), and guaranteed non-nil
	return i.layerStores[container.OS].CreateRWLayer(container.ID, layerID, rwLayerOpts)
}

// Mount sets container.BaseFS
// (is it not set coming in? why is it unset?)
func (daemon *Daemon) Mount(container *container.Container) error {
	if container.RWLayer == nil {
		return errors.New("RWLayer of container " + container.ID + " is unexpectedly nil")
	}
	dir, err := container.RWLayer.Mount(container.GetMountLabel())
	if err != nil {
		return err
	}
	logrus.Debugf("container mounted via layerStore: %v", dir)
	if container.BaseFS != nil && container.BaseFS.Path() != dir.Path() {
		// The mount path reported by the graph driver should always be trusted on Windows, since the
		// volume path for a given mounted layer may change over time. This should only be an error
		// on non-Windows operating systems.
		if runtime.GOOS != "windows" {
			daemon.Unmount(container)
			return fmt.Errorf("Error: driver %s is returning inconsistent paths for container %s ('%s' then '%s')",
				daemon.imageService.GraphDriverForOS(container.OS), container.ID, container.BaseFS, dir)
		}
	}
	container.BaseFS = dir // TODO: combine these fields
	return nil
}
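createSpec()
Building on the concept of namespaces, the spec configures the container's filesystem and sets up process isolation, separating the container's processes from those of the host.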
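- `DefaultSpec()` generates the default spec. The Linux defaults live in `DefaultLinuxSpec()` in docker/oci/defaults.go, which sets the LinuxCapabilities and the default paths of the various filesystems.
- `daemon.populateCommonSpec()` overrides those defaults with the container's own settings, including BaseFS, linked containers, ReadOnlyRootfs, WorkingDir, the init process (that is, the user can specify which command runs automatically once the container starts), TTY (terminal settings), the daemon environment, and so on.
- The cgroupsPath is set, depending on whether systemd is in use.
- `setResources()` sets the container's resource limits, such as CPU, memory and IO. (I am not familiar with the Linux kernel, but there are surely many low-level mechanisms here worth digging into.)
- `initCgroupsPath()` initializes the cgroups path (see an introduction to cgroups for background).
- `setDevices()` sets the container's permissions for using devices. If the container was started in privileged mode it can use every device on the host; otherwise the usable devices are added for the container and the host according to the cgroup rules.
- `setRlimits()` sets the POSIXRlimit values, the per-process resource limits, which come in two kinds: soft limits and hard limits.
- `setUser()` writes the uid, gid and related information of the user inside the container into spec.Process.User, which makes it possible to isolate container users from host users.
- `setNamespaces()` sets the container's namespaces for resource isolation. The paths of these namespaces have the form /proc/$PID/ns/net.
  - user namespace: if the container uses a private user namespace, a user namespace is created for the container's user based on the UID. In older versions of Docker without user namespaces, the root user inside a container was also root on the host. The user namespace isolates users inside the container so that the host sees them as ordinary users.
  - network namespace: the behavior depends on the network mode. In container mode the network namespace is shared with another container, and a comment in the code notes: "to share a net namespace, they must also share a user namespace". In host mode the namespace is shared with the host.
  - ipc: inter-process communication. IpcMode has four cases: `IsContainer()` means the container shares the ipc namespace of the specified container so the two can communicate; `IsHost()` removes the container's ipc namespace; `IsEmpty()` is now deprecated; `IsPrivate()`, `IsShareable()` or `IsNone()` all give the container its own independent namespace.
  - pid namespace: Docker lets a container have its own pid namespace. Processes inside the container map to processes outside it, and the two sides use different process IDs.
  - uts namespace: the container can have its own hostname and domain name.
- `setCapabilities()`: put simply, capabilities are the permissions granted to a process. A container is essentially a process, and it has a default set of capabilities, for example: CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, and AUDIT_WRITE.
- `setSeccomp()` is another security mechanism. seccomp conflicts with privileged: privileged mode overrides seccomp. More material on seccomp can be found online.
- `setupContainerMountsRoot()` looks up the mount point of the root directory, creates the directory and sets its permissions to 700.
- `setupIpcDirs()`: in host and shareable modes the IPC directory is /dev/shm.
- `setupSecretDir()`: according to its comment, "createSecretsDir is used to create a dir suitable for storing container secrets."
- `setupMounts()`: according to its comment, "setupMounts iterates through each of the mount points for a container and calls Setup() on each." It looks like `Setup()` mainly mounts the volume, creates directories and so on for each mount point. After `setupMounts()`, `setMounts()` performs the mounts for IpcMounts, TmpfsMounts and SecretMounts.
- If a network namespace was created, a hook carrying the container ID and the daemon's netController ID is registered for libnetwork.
- AppArmor and SELinux are configured, and MaskedPaths and ReadonlyPaths are set.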
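The namespace handling described in the list can be sketched with a small stand-alone model. The `namespace` type and the `setNamespace`, `removeNamespace` and `joinPath` helpers below are simplified stand-ins written for this example, not the daemon's or the runtime spec's real types, and PID 1234 is a made-up value. The idea: an entry with an empty path asks for a fresh namespace, an entry with a `/proc/$PID/ns/...` path joins an existing one, and a missing entry leaves the container in the host's namespace.

```go
package main

import "fmt"

// namespace is a simplified stand-in for a namespace entry in the spec.
// An empty Path means "create a new namespace of this type".
type namespace struct {
	Type string
	Path string
}

// setNamespace replaces the entry with the same Type, or appends a new one.
func setNamespace(list []namespace, ns namespace) []namespace {
	for i := range list {
		if list[i].Type == ns.Type {
			list[i] = ns
			return list
		}
	}
	return append(list, ns)
}

// removeNamespace drops the entry of the given type, so the container
// stays in the host's namespace of that type.
func removeNamespace(list []namespace, typ string) []namespace {
	var out []namespace
	for _, n := range list {
		if n.Type != typ {
			out = append(out, n)
		}
	}
	return out
}

// joinPath builds the path of a namespace owned by another process.
func joinPath(pid int, kind string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, kind)
}

func main() {
	nss := []namespace{{Type: "network"}, {Type: "ipc"}, {Type: "pid"}, {Type: "uts"}}
	// "container" network mode: join the network namespace of another process.
	nss = setNamespace(nss, namespace{Type: "network", Path: joinPath(1234, "net")})
	// "host" IPC mode: no ipc entry at all.
	nss = removeNamespace(nss, "ipc")
	fmt.Println(nss)
}
```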
func (daemon *Daemon) createSpec(c *container.Container) (retSpec *specs.Spec, err error) {
	s := oci.DefaultSpec()
	if err := daemon.populateCommonSpec(&s, c); err != nil {
		return nil, err
	}
	var cgroupsPath string
	scopePrefix := "docker"
	parent := "/docker"
	useSystemd := UsingSystemd(daemon.configStore)
	if useSystemd {
		parent = "system.slice"
	}
	if c.HostConfig.CgroupParent != "" {
		parent = c.HostConfig.CgroupParent
	} else if daemon.configStore.CgroupParent != "" {
		parent = daemon.configStore.CgroupParent
	}
	if useSystemd {
		cgroupsPath = parent + ":" + scopePrefix + ":" + c.ID
		logrus.Debugf("createSpec: cgroupsPath: %s", cgroupsPath)
	} else {
		cgroupsPath = filepath.Join(parent, c.ID)
	}
	s.Linux.CgroupsPath = cgroupsPath
	if err := setResources(&s, c.HostConfig.Resources); err != nil {
		return nil, fmt.Errorf("linux runtime spec resources: %v", err)
	}
	s.Linux.Sysctl = c.HostConfig.Sysctls
	p := s.Linux.CgroupsPath
	if useSystemd {
		initPath, err := cgroups.GetInitCgroup("cpu")
		if err != nil {
			return nil, err
		}
		_, err = cgroups.GetOwnCgroup("cpu")
		if err != nil {
			return nil, err
		}
		p = filepath.Join(initPath, s.Linux.CgroupsPath)
	}
	// Clean path to guard against things like ../../../BAD
	parentPath := filepath.Dir(p)
	if !filepath.IsAbs(parentPath) {
		parentPath = filepath.Clean("/" + parentPath)
	}
	if err := daemon.initCgroupsPath(parentPath); err != nil {
		return nil, fmt.Errorf("linux init cgroups path: %v", err)
	}
	if err := setDevices(&s, c); err != nil {
		return nil, fmt.Errorf("linux runtime spec devices: %v", err)
	}
	if err := daemon.setRlimits(&s, c); err != nil {
		return nil, fmt.Errorf("linux runtime spec rlimits: %v", err)
	}
	if err := setUser(&s, c); err != nil {
		return nil, fmt.Errorf("linux spec user: %v", err)
	}
	if err := setNamespaces(daemon, &s, c); err != nil {
		return nil, fmt.Errorf("linux spec namespaces: %v", err)
	}
	if err := setCapabilities(&s, c); err != nil {
		return nil, fmt.Errorf("linux spec capabilities: %v", err)
	}
	if err := setSeccomp(daemon, &s, c); err != nil {
		return nil, fmt.Errorf("linux seccomp: %v", err)
	}
	if err := daemon.setupContainerMountsRoot(c); err != nil {
		return nil, err
	}
	if err := daemon.setupIpcDirs(c); err != nil {
		return nil, err
	}
	defer func() {
		if err != nil {
			daemon.cleanupSecretDir(c)
		}
	}()
	if err := daemon.setupSecretDir(c); err != nil {
		return nil, err
	}
	ms, err := daemon.setupMounts(c)
	if err != nil {
		return nil, err
	}
	if !c.HostConfig.IpcMode.IsPrivate() && !c.HostConfig.IpcMode.IsEmpty() {
		ms = append(ms, c.IpcMounts()...)
	}
	tmpfsMounts, err := c.TmpfsMounts()
	if err != nil {
		return nil, err
	}
	ms = append(ms, tmpfsMounts...)
	secretMounts, err := c.SecretMounts()
	if err != nil {
		return nil, err
	}
	ms = append(ms, secretMounts...)
	sort.Sort(mounts(ms))
	if err := setMounts(daemon, &s, c, ms); err != nil {
		return nil, fmt.Errorf("linux mounts: %v", err)
	}
	for _, ns := range s.Linux.Namespaces {
		if ns.Type == "network" && ns.Path == "" && !c.Config.NetworkDisabled {
			target := filepath.Join("/proc", strconv.Itoa(os.Getpid()), "exe")
			s.Hooks = &specs.Hooks{
				Prestart: []specs.Hook{{
					Path: target,
					Args: []string{"libnetwork-setkey", c.ID, daemon.netController.ID()},
				}},
			}
		}
	}
	if apparmor.IsEnabled() {
		var appArmorProfile string
		if c.AppArmorProfile != "" {
			appArmorProfile = c.AppArmorProfile
		} else if c.HostConfig.Privileged {
			appArmorProfile = "unconfined"
		} else {
			appArmorProfile = "docker-default"
		}
		if appArmorProfile == "docker-default" {
			// Unattended upgrades and other fun services can unload AppArmor
			// profiles inadvertently. Since we cannot store our profile in
			// /etc/apparmor.d, nor can we practically add other ways of
			// telling the system to keep our profile loaded, in order to make
			// sure that we keep the default profile enabled we dynamically
			// reload it if necessary.
			if err := ensureDefaultAppArmorProfile(); err != nil {
				return nil, err
			}
		}
		s.Process.ApparmorProfile = appArmorProfile
	}
	s.Process.SelinuxLabel = c.GetProcessLabel()
	s.Process.NoNewPrivileges = c.NoNewPrivileges
	s.Process.OOMScoreAdj = &c.HostConfig.OomScoreAdj
	s.Linux.MountLabel = c.MountLabel
	// Set the masked and readonly paths with regard to the host config options if they are set.
	if c.HostConfig.MaskedPaths != nil {
		s.Linux.MaskedPaths = c.HostConfig.MaskedPaths
	}
	if c.HostConfig.ReadonlyPaths != nil {
		s.Linux.ReadonlyPaths = c.HostConfig.ReadonlyPaths
	}
	return &s, nil
}
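The cgroups path logic in `createSpec()` is easier to follow when pulled out into a pure function. The sketch below mirrors the branches in the listing, but `cgroupsPath` and `safeParent` are names invented for this example, the `path` package is used instead of `filepath` so the result does not depend on the host OS, and the systemd step that prepends the init cgroup is left out.

```go
package main

import (
	"fmt"
	"path"
)

// cgroupsPath picks the parent cgroup (container setting first, then the
// daemon setting, then the built-in default) and appends the container ID.
// With systemd the result uses the "parent:prefix:id" form instead of a path.
func cgroupsPath(useSystemd bool, containerParent, daemonParent, id string) string {
	parent := "/docker"
	if useSystemd {
		parent = "system.slice"
	}
	if containerParent != "" {
		parent = containerParent
	} else if daemonParent != "" {
		parent = daemonParent
	}
	if useSystemd {
		return parent + ":docker:" + id
	}
	return path.Join(parent, id)
}

// safeParent returns the parent directory of p as an absolute, cleaned path,
// so that a relative value full of "../" cannot climb above the root.
func safeParent(p string) string {
	parent := path.Dir(p)
	if !path.IsAbs(parent) {
		parent = path.Clean("/" + parent)
	}
	return parent
}

func main() {
	fmt.Println(cgroupsPath(false, "", "", "abc"))     // default parent, cgroupfs style
	fmt.Println(cgroupsPath(true, "", "", "abc"))      // default parent, systemd style
	fmt.Println(safeParent("../../../BAD/" + "abc")) // relative input gets anchored at "/"
}
```

Anchoring the relative path at "/" before cleaning is what neutralizes the `../../../BAD` case mentioned in the code comment: the leading `..` elements have nothing to climb out of.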
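saveApparmorConfig()
AppArmor appears to have been configured already in `createSpec()`, yet this extra step follows, and its very first line resets container.AppArmorProfile to an empty string. The function calls `parseSecurityOpt()` to read the security options, that is, whatever the user may have passed with the --security-opt flag when starting the container. These are all security-related settings that I do not fully understand.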
func (daemon *Daemon) saveApparmorConfig(container *container.Container) error {
	container.AppArmorProfile = "" //we don't care about the previous value.
	if !daemon.apparmorEnabled {
		return nil // if apparmor is disabled there is nothing to do here.
	}
	if err := parseSecurityOpt(container, container.HostConfig); err != nil {
		return errdefs.InvalidParameter(err)
	}
	if !container.HostConfig.Privileged {
		if container.AppArmorProfile == "" {
			container.AppArmorProfile = defaultApparmorProfile
		}
	} else {
		container.AppArmorProfile = "unconfined"
	}
	return nil
}
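The decision made by `saveApparmorConfig()` boils down to a small table, which the following sketch spells out as a pure function. `resolveProfile` and its parameters are hypothetical names, `requested` stands for whatever profile `parseSecurityOpt()` extracted from --security-opt, and I am assuming that `defaultApparmorProfile` is the same "docker-default" value that appears literally in `createSpec()`.

```go
package main

import "fmt"

// defaultProfile is assumed to match the daemon's defaultApparmorProfile.
const defaultProfile = "docker-default"

// resolveProfile mirrors the branches of saveApparmorConfig:
//   - AppArmor disabled: no profile at all
//   - privileged container: always "unconfined"
//   - otherwise: the requested profile, or the default if none was requested
func resolveProfile(enabled, privileged bool, requested string) string {
	if !enabled {
		return ""
	}
	if privileged {
		return "unconfined"
	}
	if requested == "" {
		return defaultProfile
	}
	return requested
}

func main() {
	fmt.Println(resolveProfile(true, false, ""))          // falls back to the default
	fmt.Println(resolveProfile(true, false, "my-profile")) // keeps the requested profile
	fmt.Println(resolveProfile(true, true, "my-profile"))  // privileged wins
}
```

Note that in this function a privileged container ends up unconfined even when a profile was requested, which differs from the order of checks in `createSpec()`, where an already-set profile is honored first.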