【NAS工具箱】Drop Path介绍 + Dropout回顾

【前言】Drop Path是NAS中常用到的一种正则化方法，由于网络训练的过程中常常是动态的，Drop Path就成了一个不错的正则化工具，在FractalNet、NASNet等都有广泛使用。

Dropout

Dropout是最早的用于解决过拟合的方法，是所有drop类方法的大前辈。Dropout在12年被Hinton提出，并且在ImageNet Classification with Deep Convolutional Neural Network工作AlexNet中使用到了Dropout。

原理：在前向传播的时候，让某个神经元激活以概率1-keep_prob（0<p<1）停止工作。

功能：这样可以让模型泛化能力更强，因为其不会过于以来某些局部的节点。训练阶段以keep_prob的概率保留，以1-keep_prob的概率关闭；测试阶段所有的神经元都不关闭，但是对训练阶段应用了dropout的神经元，输出值需要乘以keep_prob。

具体是这样的：

假设一个神经元的输出激活值为a，在不使用dropout的情况下，其输出期望值为a，如果使用了dropout，神经元就可能有保留和关闭两种状态，把它看作一个离散型随机变量，它就符合概率论中的0-1分布，其输出激活值的期望变为 p*a+(1-p)*0=pa，此时若要保持期望和不使用dropout时一致，就要除以 p。作者：种子_fe 链接：https://www.imooc.com/article/30129

实现： pytorch中的实现如下。

<pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" cid="n13" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: Menlo, Monaco, "Courier New", monospace; font-size: 1.125rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(255, 255, 255); position: relative !important; color: rgb(122, 122, 122); padding: 0.5rem 1.125em; margin-bottom: 0.88em; border: 1px solid rgb(122, 122, 122); line-height: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">class _DropoutNd(Module):
constants = ['p', 'inplace']
p: float
inplace: bool

def init(self, p: float = 0.5, inplace: bool = False) -> None:
super(_DropoutNd, self).init()
if p < 0 or p > 1:
raise ValueError("dropout probability has to be between 0 and 1, "
"but got {}".format(p))
self.p = p
self.inplace = inplace

def extra_repr(self) -> str:
return 'p={}, inplace={}'.format(self.p, self.inplace)

class Dropout(_DropoutNd):
def forward(self, input: Tensor) -> Tensor:
return F.dropout(input, self.p, self.training, self.inplace)</pre>

funtional.py中的dropout实现：

<pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" cid="n15" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: Menlo, Monaco, "Courier New", monospace; font-size: 1.125rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(255, 255, 255); position: relative !important; color: rgb(122, 122, 122); padding: 0.5rem 1.125em; margin-bottom: 0.88em; border: 1px solid rgb(122, 122, 122); line-height: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def dropout(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) -> Tensor:
r"""
During training, randomly zeroes some of the elements of the input
tensor with probability :attr:p using samples from a Bernoulli
distribution.
See :class:~torch.nn.Dropout for details.
Args:
p: probability of an element to be zeroed. Default: 0.5
training: apply dropout if is True. Default: True
inplace: If set to True, will do this operation in-place. Default: False
"""
if has_torch_function_unary(input):
return handle_torch_function(dropout, (input,), input, p=p, training=training, inplace=inplace)
if p < 0.0 or p > 1.0:
raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
return VF.dropout(input, p, training) if inplace else _VF.dropout(input, p, training)</pre>

最终在Dropout.cpp中找到具体实现：

<pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="c++" cid="n17" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: Menlo, Monaco, "Courier New", monospace; font-size: 1.125rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(255, 255, 255); position: relative !important; color: rgb(122, 122, 122); padding: 0.5rem 1.125em; margin-bottom: 0.88em; border: 1px solid rgb(122, 122, 122); line-height: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
if (p == 0 || !train || input.numel() == 0) {
return input;
}

if (p == 1) {
return multiply<inplace>(input, at::zeros({}, input.options()));
}

at::Tensor b; // used for alpha_dropout only
auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
noise.bernoulli_(1 - p);
if (alpha_dropout) {
constexpr double alpha = 1.7580993408473766;
double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
noise.mul_(a);
} else {
noise.div_(1 - p);
}

if (!alpha_dropout) {
return multiply<inplace>(input, noise);
} else {
return multiply<inplace>(input, noise).add_(b);
}
}</pre>

流程：

判断p的范围以及训练状态
使用1-p的概率得到伯努利分布（0-1分布）
(input / 1-p) * 伯努利分布

Drop Path

原理：字如其名，Drop Path就是随机将深度学习网络中的多分支结构随机删除。

功能：一般可以作为正则化手段加入网络，但是会增加网络训练的难度。尤其是在NAS问题中，如果设置的drop prob过高，模型甚至有可能不收敛。

实现：

<pre class="md-fences md-end-block ty-contain-cm modeLoaded" spellcheck="false" lang="python" cid="n31" mdtype="fences" style="box-sizing: border-box; overflow: visible; font-family: Menlo, Monaco, "Courier New", monospace; font-size: 1.125rem; display: block; break-inside: avoid; text-align: left; white-space: normal; background-image: inherit; background-position: inherit; background-size: inherit; background-repeat: inherit; background-attachment: inherit; background-origin: inherit; background-clip: inherit; background-color: rgb(255, 255, 255); position: relative !important; color: rgb(122, 122, 122); padding: 0.5rem 1.125em; margin-bottom: 0.88em; border: 1px solid rgb(122, 122, 122); line-height: 1.5rem; width: inherit; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def drop_path(x, drop_prob: float = 0., training: bool = False):
if drop_prob == 0. or not training:
return x
keep_prob = 1 - drop_prob
shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets
random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
random_tensor.floor_() # binarize
output = x.div(keep_prob) * random_tensor
return output

class DropPath(nn.Module):
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
"""
def init(self, drop_prob=None):
super(DropPath, self).init()
self.drop_prob = drop_prob

def forward(self, x):
return drop_path(x, self.drop_prob, self.training)</pre>

有了Dropout的理论铺垫，这里的实现就比较明了了，具体使用的时候一般是这样的：

Drop Path不能直接这样使用：

Reference

https://www.cnblogs.com/dan-baishucaizi/p/14703263.html

https://www.imooc.com/article/30129

https://www.github.com/pytorch/pytorch

【NAS工具箱】Drop Path介绍 + Dropout回顾

Dropout

Drop Path

Reference

推荐阅读更多精彩内容