3. The daemon in detail: networking

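Docker networking

Docker's network modes:

  • bridge: a Linux bridge plus NAT
  • host: shares the host's network namespace
  • none: no network is configured for the container; the user sets it up manually
  • overlay: a layer-2 overlay network in swarm mode

In addition, plugins such as flannel and calico (and orchestrators such as k8s) can customize Docker networking. Docker integrates libnetwork as its networking solution.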

Daemon network configuration

Open docker/daemon/config/config.go to see the daemon's configuration. The network-related parts are:

// commonBridgeConfig stores all the platform-common bridge driver specific
// configuration.
type commonBridgeConfig struct {
    Iface     string `json:"bridge,omitempty"`
    FixedCIDR string `json:"fixed-cidr,omitempty"`
}

// NetworkConfig stores the daemon-wide networking configurations
type NetworkConfig struct {
    // Default address pools for docker networks
    DefaultAddressPools opts.PoolsOpt `json:"default-address-pools,omitempty"`
}

  • Iface names the bridge interface to use; by default Docker uses a Linux bridge
  • FixedCIDR restricts IP allocation to a fixed subnet

config_unix.go contains the following configuration:

// BridgeConfig stores all the bridge driver specific
// configuration.
type BridgeConfig struct {
    commonBridgeConfig

    // These fields are common to all unix platforms.
    commonUnixBridgeConfig

    // Fields below here are platform specific.
    EnableIPv6          bool   `json:"ipv6,omitempty"`
    EnableIPTables      bool   `json:"iptables,omitempty"`
    EnableIPForward     bool   `json:"ip-forward,omitempty"`
    EnableIPMasq        bool   `json:"ip-masq,omitempty"`
    EnableUserlandProxy bool   `json:"userland-proxy,omitempty"`
    UserlandProxyPath   string `json:"userland-proxy-path,omitempty"`
    FixedCIDRv6         string `json:"fixed-cidr-v6,omitempty"`
}

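The daemon configuration provides the default network environment for Docker containers. If no network is specified when a container is created, it uses the daemon's configuration; a different network mode can also be specified.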
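To make the JSON tags above concrete, here is a minimal sketch of decoding a daemon.json-style document. The bridgeSettings type and parseBridgeSettings function are hypothetical names invented for illustration; only the JSON keys come from the structs shown above. The sketch also ignores the daemon's own defaults, so a missing key simply stays at Go's zero value here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bridgeSettings is a hypothetical, trimmed-down stand-in for the daemon's
// bridge configuration. Only the JSON keys are taken from the real structs.
type bridgeSettings struct {
	Iface          string `json:"bridge,omitempty"`
	FixedCIDR      string `json:"fixed-cidr,omitempty"`
	EnableIPv6     bool   `json:"ipv6,omitempty"`
	EnableIPTables bool   `json:"iptables,omitempty"`
	EnableIPMasq   bool   `json:"ip-masq,omitempty"`
}

// parseBridgeSettings decodes a daemon.json-style document.
// Keys that are absent keep their zero value (no daemon defaults are applied).
func parseBridgeSettings(raw []byte) (bridgeSettings, error) {
	var s bridgeSettings
	err := json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	raw := []byte(`{"bridge": "br0", "fixed-cidr": "192.168.1.0/25", "iptables": true}`)
	s, err := parseBridgeSettings(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```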

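Because the user can specify docker0 settings when starting the daemon, NewDaemon() calls verifyDaemonSettings() during startup to validate the flags the user passed. The network-related checks are:

  • Whether -b and --bip were both specified. Only one of the two may be given: if an existing bridge has been specified, its IP cannot be specified as well.
  • Whether enableIPTables and ICC were both set to false. This is not allowed, because ICC=false relies on iptables to work.
  • If enableIPTables is false and enableIPMasq is true, enableIPMasq is set to false.

The rest of the function sets up cgroup and runtime options, which will be covered later.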

// verifyDaemonSettings performs validation of daemon config struct
func verifyDaemonSettings(conf *config.Config) error {
    // Check for mutually incompatible config options
    if conf.BridgeConfig.Iface != "" && conf.BridgeConfig.IP != "" {
        return fmt.Errorf("You specified -b & --bip, mutually exclusive options. Please specify only one")
    }
    if !conf.BridgeConfig.EnableIPTables && !conf.BridgeConfig.InterContainerCommunication {
        return fmt.Errorf("You specified --iptables=false with --icc=false. ICC=false uses iptables to function. Please set --icc or --iptables to true")
    }
    if !conf.BridgeConfig.EnableIPTables && conf.BridgeConfig.EnableIPMasq {
        conf.BridgeConfig.EnableIPMasq = false
    }
    if err := VerifyCgroupDriver(conf); err != nil {
        return err
    }
    if conf.CgroupParent != "" && UsingSystemd(conf) {
        if len(conf.CgroupParent) <= 6 || !strings.HasSuffix(conf.CgroupParent, ".slice") {
            return fmt.Errorf("cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"")
        }
    }

    if conf.DefaultRuntime == "" {
        conf.DefaultRuntime = config.StockRuntimeName
    }
    if conf.Runtimes == nil {
        conf.Runtimes = make(map[string]types.Runtime)
    }
    conf.Runtimes[config.StockRuntimeName] = types.Runtime{Path: DefaultRuntimeName}

    return nil
}
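The three network checks can be tried out in isolation. What follows is only a sketch: bridgeFlags and verifyBridgeFlags are hypothetical names, and the struct is a hand-picked subset of the real configuration, but the conditions mirror the ones in verifyDaemonSettings() above.

```go
package main

import (
	"errors"
	"fmt"
)

// bridgeFlags is a hypothetical subset of the daemon's bridge configuration.
type bridgeFlags struct {
	Iface          string // -b
	IP             string // --bip
	EnableIPTables bool   // --iptables
	EnableIPMasq   bool   // --ip-masq
	ICC            bool   // --icc
}

// verifyBridgeFlags applies the same three rules as the excerpt above:
// two hard errors, plus one silent correction of EnableIPMasq.
func verifyBridgeFlags(f *bridgeFlags) error {
	if f.Iface != "" && f.IP != "" {
		return errors.New("-b and --bip are mutually exclusive")
	}
	if !f.EnableIPTables && !f.ICC {
		return errors.New("--icc=false needs --iptables=true")
	}
	// Masquerading is implemented with iptables, so it cannot stay enabled
	// once iptables handling is turned off.
	if !f.EnableIPTables && f.EnableIPMasq {
		f.EnableIPMasq = false
	}
	return nil
}

func main() {
	conflicting := &bridgeFlags{Iface: "br0", IP: "10.0.0.1/24", EnableIPTables: true, ICC: true}
	fmt.Println("conflicting flags:", verifyBridgeFlags(conflicting))

	noIPTables := &bridgeFlags{EnableIPMasq: true, ICC: true}
	err := verifyBridgeFlags(noIPTables)
	fmt.Println("no iptables:", err, "ip-masq now:", noIPTables.EnableIPMasq)
}
```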

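Later, daemon.go checks whether the bridge network is enabled. DisableNetworkBridge is the string "none"; if Iface is also "none", meaning the user asked for no bridge network to be created, isBridgeNetworkDisabled returns true.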

func isBridgeNetworkDisabled(conf *config.Config) bool {
    return conf.BridgeConfig.Iface == config.DisableNetworkBridge
}

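If the user has not disabled networking, the network environment is initialized in d.restore(), which is called from NewDaemon(). Some containers may already have been running before this point, so the daemon restores their runtime environment and calls initNetworkController().
The logic of this function is as follows:

  • Call networkOptions() to generate the network options; see the function itself for details.
  • Call libnetwork.New() to create a libnetwork controller instance from the generated netOptions. The controller is built with these fields: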
  id:               stringid.GenerateRandomID(),
  cfg:              config.ParseConfigOptions(cfgOptions...),
  sandboxes:        sandboxTable{},
  svcRecords:       make(map[string]svcInfo),
  serviceBindings:  make(map[serviceKey]*service),
  agentInitDone:    make(chan struct{}),
  networkLocker:    locker.New(),
  DiagnosticServer: diagnostic.New(),
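    • The sandbox type is defined in libnetwork/sandbox.go; a container is essentially a sandboxed application. networkLocker is Docker's "finer-grained locking", similar to a mutex. DiagnosticServer is used to diagnose network errors (support for registering user-defined HTTPHandlerFunc handlers is still incomplete).
    • Next, initStores() initializes the datastore. Four backends are currently supported (consul, zookeeper, etcd, boltdb); the datastore is used to keep track of Docker's network state.
    • Next, drvregistry.New() creates a driver registry instance and drivers are added to it. getInitializers() shows that the driver types are bridge, host, macvlan, null, remote, and overlay, which are exactly the network modes Docker supports.
    • Next, initIPAMDrivers() initializes the IP address management (IPAM) drivers, and calls SetDefaultIPAddressPool() to plan the IP address pools.
    • Next, initDiscovery() initializes service discovery, which corresponds to the etcd-style datastores above. A watcher is initialized when the datastore is created, and that watcher is used here.
    • WalkNetworks(populateSpecial) is called. populateSpecial is a NetworkWalker function that calls addNetwork(), which is where the underlying library is actually called to create the Linux bridge. The code is in /vendor/github.com/docker/libnetwork/controller.go. addNetwork() in turn calls CreateNetwork().
    • Finally, leftover sandboxes, endpoints, and networks are cleaned up, since stale resources clearly have to be cleared when the network is initialized.
  • Initialize the networks by name. In bridge mode, removeDefaultBridgeInterface() may have to be called first to delete a leftover bridge. The bridge itself is initialized afterwards in initBridgeDriver().

func (daemon *Daemon) initNetworkController(config *config.Config, activeSandboxes map[string]interface{}) (libnetwork.NetworkController, error) {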
    netOptions, err := daemon.networkOptions(config, daemon.PluginStore, activeSandboxes)
    if err != nil {
        return nil, err
    }

    controller, err := libnetwork.New(netOptions...)
    if err != nil {
        return nil, fmt.Errorf("error obtaining controller instance: %v", err)
    }

    if len(activeSandboxes) > 0 {
        logrus.Info("There are old running containers, the network config will not take affect")
        return controller, nil
    }

    // Initialize default network on "null"
    if n, _ := controller.NetworkByName("none"); n == nil {
        if _, err := controller.NewNetwork("null", "none", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"null\" network: %v", err)
        }
    }

    // Initialize default network on "host"
    if n, _ := controller.NetworkByName("host"); n == nil {
        if _, err := controller.NewNetwork("host", "host", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"host\" network: %v", err)
        }
    }

    // Clear stale bridge network
    if n, err := controller.NetworkByName("bridge"); err == nil {
        if err = n.Delete(); err != nil {
            return nil, fmt.Errorf("could not delete the default bridge network: %v", err)
        }
        if len(config.NetworkConfig.DefaultAddressPools.Value()) > 0 && !daemon.configStore.LiveRestoreEnabled {
            removeDefaultBridgeInterface()
        }
    }

    if !config.DisableBridge {
        // Initialize default driver "bridge"
        if err := initBridgeDriver(controller, config); err != nil {
            return nil, err
        }
    } else {
        removeDefaultBridgeInterface()
    }

    return controller, nil
}

daemon.netController, err = daemon.initNetworkController(daemon.configStore, activeSandboxes)
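The order of operations in initNetworkController() (make sure "none" and "host" exist, drop any stale "bridge" network, then recreate it unless the bridge is disabled) can be sketched with an in-memory stand-in. fakeController, ensureNetwork, and initDefaults are hypothetical names used purely for illustration; the real controller persists networks and talks to drivers, which this sketch does not attempt.

```go
package main

import "fmt"

// fakeController is a hypothetical in-memory stand-in for the libnetwork
// controller. It only remembers which driver backs each network name.
type fakeController struct {
	networks map[string]string // network name -> driver name
}

// ensureNetwork creates the network only if it does not exist yet and
// reports whether it was created.
func (c *fakeController) ensureNetwork(driver, name string) bool {
	if _, ok := c.networks[name]; ok {
		return false
	}
	c.networks[name] = driver
	return true
}

// initDefaults follows the same order as the excerpt above.
func initDefaults(c *fakeController, disableBridge bool) {
	c.ensureNetwork("null", "none") // default network on the "null" driver
	c.ensureNetwork("host", "host") // default network on the "host" driver
	delete(c.networks, "bridge")    // clear a stale bridge network, if any
	if !disableBridge {
		c.ensureNetwork("bridge", "bridge")
	}
}

func main() {
	c := &fakeController{networks: map[string]string{"bridge": "stale"}}
	initDefaults(c, false)
	fmt.Println(c.networks)
}
```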

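What CreateNetwork() does

For the bridge driver the work happens in createNetwork(), shown below:

  • InitOSContext() sets up the namespace
  • getNetworks() fetches the list of networks
  • newInterface() creates a bridgeInterface instance. It calls nlh.LinkByName() from the netlink library to tie the configured bridge to the underlying device.
  • A bridgeNetwork instance is created, with its bridge field set to the bridge interface created above
  • The inter-network communication rules are defined (implemented with iptables)
  • newBridgeSetup() prepares the parameters needed to set up the bridge
  • bridgeSetup.apply() applies those parameters to the bridge

func (d *driver) createNetwork(config *networkConfiguration) (err error) {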
    defer osl.InitOSContext()()

    networkList := d.getNetworks()

    // Initialize handle when needed
    d.Lock()
    if d.nlh == nil {
        d.nlh = ns.NlHandle()
    }
    d.Unlock()

    // Create or retrieve the bridge L3 interface
    bridgeIface, err := newInterface(d.nlh, config)
    if err != nil {
        return err
    }

    // Create and set network handler in driver
    network := &bridgeNetwork{
        id:         config.ID,
        endpoints:  make(map[string]*bridgeEndpoint),
        config:     config,
        portMapper: portmapper.New(d.config.UserlandProxyPath),
        bridge:     bridgeIface,
        driver:     d,
    }

    d.Lock()
    d.networks[config.ID] = network
    d.Unlock()

    // On failure make sure to reset driver network handler to nil
    defer func() {
        if err != nil {
            d.Lock()
            delete(d.networks, config.ID)
            d.Unlock()
        }
    }()

    // Add inter-network communication rules.
    setupNetworkIsolationRules := func(config *networkConfiguration, i *bridgeInterface) error {
        if err := network.isolateNetwork(networkList, true); err != nil {
            if err = network.isolateNetwork(networkList, false); err != nil {
                logrus.Warnf("Failed on removing the inter-network iptables rules on cleanup: %v", err)
            }
            return err
        }
        // register the cleanup function
        network.registerIptCleanFunc(func() error {
            nwList := d.getNetworks()
            return network.isolateNetwork(nwList, false)
        })
        return nil
    }

    // Prepare the bridge setup configuration
    bridgeSetup := newBridgeSetup(config, bridgeIface)

    // If the bridge interface doesn't exist, we need to start the setup steps
    // by creating a new device and assigning it an IPv4 address.
    bridgeAlreadyExists := bridgeIface.exists()
    if !bridgeAlreadyExists {
        bridgeSetup.queueStep(setupDevice)
    }

    // Even if a bridge exists try to setup IPv4.
    bridgeSetup.queueStep(setupBridgeIPv4)

    enableIPv6Forwarding := d.config.EnableIPForwarding && config.AddressIPv6 != nil

    // Conditionally queue setup steps depending on configuration values.
    for _, step := range []struct {
        Condition bool
        Fn        setupStep
    }{
        // Enable IPv6 on the bridge if required. We do this even for a
        // previously  existing bridge, as it may be here from a previous
        // installation where IPv6 wasn't supported yet and needs to be
        // assigned an IPv6 link-local address.
        {config.EnableIPv6, setupBridgeIPv6},

        // We ensure that the bridge has the expectedIPv4 and IPv6 addresses in
        // the case of a previously existing device.
        {bridgeAlreadyExists, setupVerifyAndReconcile},

        // Enable IPv6 Forwarding
        {enableIPv6Forwarding, setupIPv6Forwarding},

        // Setup Loopback Addresses Routing
        {!d.config.EnableUserlandProxy, setupLoopbackAddressesRouting},

        // Setup IPTables.
        {d.config.EnableIPTables, network.setupIPTables},

        //We want to track firewalld configuration so that
        //if it is started/reloaded, the rules can be applied correctly
        {d.config.EnableIPTables, network.setupFirewalld},

        // Setup DefaultGatewayIPv4
        {config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},

        // Setup DefaultGatewayIPv6
        {config.DefaultGatewayIPv6 != nil, setupGatewayIPv6},

        // Add inter-network communication rules.
        {d.config.EnableIPTables, setupNetworkIsolationRules},

        //Configure bridge networking filtering if ICC is off and IP tables are enabled
        {!config.EnableICC && d.config.EnableIPTables, setupBridgeNetFiltering},
    } {
        if step.Condition {
            bridgeSetup.queueStep(step.Fn)
        }
    }

    // Apply the prepared list of steps, and abort at the first error.
    bridgeSetup.queueStep(setupDeviceUp)
    return bridgeSetup.apply()
}
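The queue-then-apply pattern used by bridgeSetup is easy to reproduce on its own. The sketch below is a simplification under stated assumptions: stepQueue, buildQueue, and the record helper are hypothetical, the steps take no arguments (the real ones receive the network configuration and bridge interface), and only a few of the conditional steps are modelled. It does keep the two properties that matter: steps run in the order they were queued, and apply() stops at the first error.

```go
package main

import (
	"errors"
	"fmt"
)

// setupStep is a simplified step signature for this sketch.
type setupStep func() error

// stepQueue collects steps and runs them in order.
type stepQueue struct {
	steps []setupStep
}

func (q *stepQueue) queueStep(s setupStep) { q.steps = append(q.steps, s) }

// apply runs the queued steps and aborts at the first error.
func (q *stepQueue) apply() error {
	for _, s := range q.steps {
		if err := s(); err != nil {
			return err
		}
	}
	return nil
}

// buildQueue queues steps conditionally, in the style of createNetwork().
// Each step only records its name in log when it runs.
func buildQueue(bridgeExists, enableIPv6 bool, log *[]string) *stepQueue {
	record := func(name string) setupStep {
		return func() error {
			*log = append(*log, name)
			return nil
		}
	}
	q := &stepQueue{}
	if !bridgeExists {
		q.queueStep(record("setupDevice"))
	}
	q.queueStep(record("setupBridgeIPv4"))
	for _, step := range []struct {
		Condition bool
		Fn        setupStep
	}{
		{enableIPv6, record("setupBridgeIPv6")},
		{bridgeExists, record("setupVerifyAndReconcile")},
	} {
		if step.Condition {
			q.queueStep(step.Fn)
		}
	}
	q.queueStep(record("setupDeviceUp"))
	return q
}

func main() {
	var log []string
	if err := buildQueue(false, true, &log).apply(); err != nil {
		fmt.Println("setup failed:", err)
	}
	fmt.Println(log)

	// A failing step prevents every later step from running.
	q := &stepQueue{}
	q.queueStep(func() error { return errors.New("device creation failed") })
	q.queueStep(func() error { log = append(log, "unreachable"); return nil })
	fmt.Println(q.apply())
}
```

Building the whole list first and applying it at the end keeps the conditions readable in one place and gives a single point where errors surface.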

How the bridge driver is initialized

This happens in initBridgeDriver() in daemon/daemon_unix.go:

func initBridgeDriver(controller libnetwork.NetworkController, config *config.Config) error {
    bridgeName := bridge.DefaultBridgeName
    if config.BridgeConfig.Iface != "" {
        bridgeName = config.BridgeConfig.Iface
    }
    netOption := map[string]string{
        bridge.BridgeName:         bridgeName,
        bridge.DefaultBridge:      strconv.FormatBool(true),
        netlabel.DriverMTU:        strconv.Itoa(config.Mtu),
        bridge.EnableIPMasquerade: strconv.FormatBool(config.BridgeConfig.EnableIPMasq),
        bridge.EnableICC:          strconv.FormatBool(config.BridgeConfig.InterContainerCommunication),
    }

    // --ip processing
    if config.BridgeConfig.DefaultIP != nil {
        netOption[bridge.DefaultBindingIP] = config.BridgeConfig.DefaultIP.String()
    }

    var (
        ipamV4Conf *libnetwork.IpamConf
        ipamV6Conf *libnetwork.IpamConf
    )

    ipamV4Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}

    nwList, nw6List, err := netutils.ElectInterfaceAddresses(bridgeName)
    if err != nil {
        return errors.Wrap(err, "list bridge addresses failed")
    }

    nw := nwList[0]
    if len(nwList) > 1 && config.BridgeConfig.FixedCIDR != "" {
        _, fCIDR, err := net.ParseCIDR(config.BridgeConfig.FixedCIDR)
        if err != nil {
            return errors.Wrap(err, "parse CIDR failed")
        }
        // Iterate through in case there are multiple addresses for the bridge
        for _, entry := range nwList {
            if fCIDR.Contains(entry.IP) {
                nw = entry
                break
            }
        }
    }

    ipamV4Conf.PreferredPool = lntypes.GetIPNetCanonical(nw).String()
    hip, _ := lntypes.GetHostPartIP(nw.IP, nw.Mask)
    if hip.IsGlobalUnicast() {
        ipamV4Conf.Gateway = nw.IP.String()
    }

    if config.BridgeConfig.IP != "" {
        ipamV4Conf.PreferredPool = config.BridgeConfig.IP
        ip, _, err := net.ParseCIDR(config.BridgeConfig.IP)
        if err != nil {
            return err
        }
        ipamV4Conf.Gateway = ip.String()
    } else if bridgeName == bridge.DefaultBridgeName && ipamV4Conf.PreferredPool != "" {
        logrus.Infof("Default bridge (%s) is assigned with an IP address %s. Daemon option --bip can be used to set a preferred IP address", bridgeName, ipamV4Conf.PreferredPool)
    }

    if config.BridgeConfig.FixedCIDR != "" {
        _, fCIDR, err := net.ParseCIDR(config.BridgeConfig.FixedCIDR)
        if err != nil {
            return err
        }

        ipamV4Conf.SubPool = fCIDR.String()
    }

    if config.BridgeConfig.DefaultGatewayIPv4 != nil {
        ipamV4Conf.AuxAddresses["DefaultGatewayIPv4"] = config.BridgeConfig.DefaultGatewayIPv4.String()
    }

    var deferIPv6Alloc bool
    if config.BridgeConfig.FixedCIDRv6 != "" {
        _, fCIDRv6, err := net.ParseCIDR(config.BridgeConfig.FixedCIDRv6)
        if err != nil {
            return err
        }

        // In case user has specified the daemon flag --fixed-cidr-v6 and the passed network has
        // at least 48 host bits, we need to guarantee the current behavior where the containers'
        // IPv6 addresses will be constructed based on the containers' interface MAC address.
        // We do so by telling libnetwork to defer the IPv6 address allocation for the endpoints
        // on this network until after the driver has created the endpoint and returned the
        // constructed address. Libnetwork will then reserve this address with the ipam driver.
        ones, _ := fCIDRv6.Mask.Size()
        deferIPv6Alloc = ones <= 80

        if ipamV6Conf == nil {
            ipamV6Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}
        }
        ipamV6Conf.PreferredPool = fCIDRv6.String()

        // In case the --fixed-cidr-v6 is specified and the current docker0 bridge IPv6
        // address belongs to the same network, we need to inform libnetwork about it, so
        // that it can be reserved with IPAM and it will not be given away to somebody else
        for _, nw6 := range nw6List {
            if fCIDRv6.Contains(nw6.IP) {
                ipamV6Conf.Gateway = nw6.IP.String()
                break
            }
        }
    }

    if config.BridgeConfig.DefaultGatewayIPv6 != nil {
        if ipamV6Conf == nil {
            ipamV6Conf = &libnetwork.IpamConf{AuxAddresses: make(map[string]string)}
        }
        ipamV6Conf.AuxAddresses["DefaultGatewayIPv6"] = config.BridgeConfig.DefaultGatewayIPv6.String()
    }

    v4Conf := []*libnetwork.IpamConf{ipamV4Conf}
    v6Conf := []*libnetwork.IpamConf{}
    if ipamV6Conf != nil {
        v6Conf = append(v6Conf, ipamV6Conf)
    }
    // Initialize default network on "bridge" with the same name
    _, err = controller.NewNetwork("bridge", "bridge", "",
        libnetwork.NetworkOptionEnableIPv6(config.BridgeConfig.EnableIPv6),
        libnetwork.NetworkOptionDriverOpts(netOption),
        libnetwork.NetworkOptionIpam("default", "", v4Conf, v6Conf, nil),
        libnetwork.NetworkOptionDeferIPv6Alloc(deferIPv6Alloc))
    if err != nil {
        return fmt.Errorf("Error creating default \"bridge\" network: %v", err)
    }
    return nil
}

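The logic of the code:

  • Set the bridge name
  • Call ElectInterfaceAddresses() to look up the addresses of the device named bridgeName on the host. In addition, check whether the user specified a bridge IP via flags, and if so use that IP as the preferred one.
  • After checking for a user-specified bridge IP, configure the fixed CIDR, IPv6, and gateway settings
  • Call controller.NewNetwork() to create the default "bridge" network; the bridge name chosen earlier is passed in through the driver options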
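Two of the decisions in initBridgeDriver() are plain arithmetic on CIDRs and can be checked with the standard net package alone. The following is a sketch with hypothetical helper names (pickBridgeNetwork, shouldDeferIPv6Alloc, mustCIDRs); unlike the original, it also guards against an empty candidate list, which is an addition for safety rather than something the daemon code does.

```go
package main

import (
	"fmt"
	"net"
)

// pickBridgeNetwork returns the first candidate whose IP lies inside
// fixedCIDR, falling back to the first candidate otherwise.
func pickBridgeNetwork(candidates []*net.IPNet, fixedCIDR string) (*net.IPNet, error) {
	if len(candidates) == 0 {
		return nil, fmt.Errorf("no candidate addresses")
	}
	nw := candidates[0]
	if len(candidates) > 1 && fixedCIDR != "" {
		_, fCIDR, err := net.ParseCIDR(fixedCIDR)
		if err != nil {
			return nil, err
		}
		for _, entry := range candidates {
			if fCIDR.Contains(entry.IP) {
				nw = entry
				break
			}
		}
	}
	return nw, nil
}

// shouldDeferIPv6Alloc reports whether the prefix leaves at least 48 host
// bits (prefix length <= 80), enough to embed a MAC address.
func shouldDeferIPv6Alloc(fixedCIDRv6 string) (bool, error) {
	_, n, err := net.ParseCIDR(fixedCIDRv6)
	if err != nil {
		return false, err
	}
	ones, _ := n.Mask.Size()
	return ones <= 80, nil
}

// mustCIDRs parses "address/prefix" strings and keeps the host address in
// IPNet.IP, the way an interface address would be reported.
func mustCIDRs(cidrs ...string) []*net.IPNet {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, s := range cidrs {
		ip, n, err := net.ParseCIDR(s)
		if err != nil {
			panic(err)
		}
		n.IP = ip
		out = append(out, n)
	}
	return out
}

func main() {
	nw, err := pickBridgeNetwork(mustCIDRs("172.17.0.1/16", "10.10.0.1/24"), "10.10.0.0/25")
	fmt.Println(nw, err)

	deferAlloc, err := shouldDeferIPv6Alloc("2001:db8::/64")
	fmt.Println(deferAlloc, err)
}
```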

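The flow after restore completes

initNetworkController() initializes the network, fixes the bridge's IP, netmask, gateway, and related settings, and returns the netController instance. After that, the daemon reloads the relationships between containers and networks, restores container context from checkpoints, and so on.

In short, Docker wraps its networking functionality in the libnetwork library, which keeps the Docker daemon from calling the netlink library directly.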
