Improving Deep Neural Networks Study Notes (3)

Author: Tyan
Blog: noahsnail.com  |  CSDN  |  简书

5. Hyperparameter tuning

5.1 Tuning process

Hyperparameters:

$\alpha$, $\beta$, $\beta_1,\beta_2, \epsilon$, layers, hidden units, learning rate decay, mini-batch size.

The learning rate $\alpha$ is the most important hyperparameter to tune. $\beta$, the mini-batch size, and the number of hidden units are next in importance.

Try random values: don't use a grid. Search coarse to fine: zoom in on the region of values that worked best and sample more densely there.

5.2 Using an appropriate scale to pick hyperparameters

Picking an appropriate scale for hyperparameters:

For example, to pick $\alpha$ in $[0.0001, 1]$: set r = -4 * np.random.rand() (so $r \in [-4, 0]$), then set $\alpha = 10^r$.

In general, if $\alpha \in [10^a, 10^b]$, pick $r$ uniformly at random from $[a, b]$ and set $\alpha = 10^r$, as in the sketch below.
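A minimal NumPy sketch of this log-scale sampling (the endpoints $a = -4$, $b = 0$ just reproduce the $[0.0001, 1]$ example above):

```python
import numpy as np

def sample_learning_rate(a=-4, b=0):
    """Sample alpha uniformly on a log scale between 10**a and 10**b."""
    r = a + (b - a) * np.random.rand()   # r is uniform in [a, b]
    return 10 ** r                       # alpha = 10^r is log-uniform in [10^a, 10^b]

alpha = sample_learning_rate()           # somewhere in [0.0001, 1]
```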

Hyperparameters for exponentially weighted average

For $\beta \in [0.9, 0.999]$, don't sample uniformly from $[0.9, 0.999]$. Instead sample $1-\beta$ from $[0.001, 0.1]$ with the same log-scale method used for $\alpha$.

Why not sample linearly? Because when $\beta$ is close to 1, even a tiny change has a huge impact on the algorithm: moving $\beta$ from 0.999 to 0.9995 roughly doubles the effective averaging window (from about 1000 to about 2000 samples), while moving from 0.900 to 0.9005 barely changes it.
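The same trick for $\beta$, written out as a sketch (sampling $1-\beta$ on a log scale over the $[0.001, 0.1]$ range from above):

```python
import numpy as np

def sample_beta():
    """Sample beta in [0.9, 0.999] by sampling 1 - beta log-uniformly in [0.001, 0.1]."""
    r = -3 + 2 * np.random.rand()    # r is uniform in [-3, -1]
    return 1 - 10 ** r               # 10^r is log-uniform in [0.001, 0.1]

beta = sample_beta()                 # somewhere in [0.9, 0.999]
```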

5.3 Hyperparameters tuning in practice: Pandas vs Caviar

  • Re-test hyperparameters occasionally

  • Babysitting one model (Pandas)

  • Training many models in parallel (Caviar)

6. Batch Normalization

6.1 Normalizing activations in a network

In logistic regression, normalizing the inputs speeds up learning:

  1. compute the means: $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$
  2. subtract off the means from the training set: $x = x - \mu$
  3. compute the variances: $\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (x^{(i)})^2$ (element-wise, after the means have been subtracted)
  4. normalize the training set: $X = \frac{X}{\sigma}$
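
A minimal NumPy sketch of these steps, assuming $X$ has shape (number of features, m) with examples as columns; the same $\mu$ and $\sigma$ must be reused to normalize the test set:

```python
import numpy as np

def normalize_inputs(X):
    """Zero-center the inputs and scale each feature to unit variance."""
    mu = np.mean(X, axis=1, keepdims=True)     # per-feature mean over the m examples
    X = X - mu                                  # subtract off the means
    sigma = np.std(X, axis=1, keepdims=True)    # per-feature standard deviation
    X = X / sigma                               # scale to unit variance
    return X, mu, sigma                         # keep mu, sigma to normalize the test set the same way
```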

Similarly, in order to speed up training of a neural network, we can normalize the intermediate values in the layers (the $z$ values in the hidden layers); this is called Batch Normalization, or Batch Norm.

Implementing Batch Norm

  1. Given some intermediate values in the neural network, $z^{(1)}, z^{(2)}, \ldots, z^{(m)}$
  2. compute the mean: $\mu = \frac{1}{m} \sum_{i=1}^{m} z^{(i)}$
  3. compute the variance: $\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (z^{(i)} - \mu)^2$
  4. normalize $z$: $z^{(i)}_{norm} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}$
  5. compute $\hat z$: $\hat z^{(i)} = \gamma z^{(i)}_{norm} + \beta$

Now we have normalized $z$ to have mean zero and unit variance. But it may not make sense for every hidden unit to have exactly that distribution, so we use $\hat z$ instead of $z$; $\gamma$ and $\beta$ are learnable parameters of the model.
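A minimal NumPy sketch of these steps for one layer's pre-activations $Z$ (shape (number of units, m), examples as columns); $\gamma$ and $\beta$ would be learned by gradient descent along with the other parameters:

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, epsilon=1e-8):
    """Batch Norm forward pass over one mini-batch of pre-activations Z."""
    mu = np.mean(Z, axis=1, keepdims=True)          # per-unit mean over the mini-batch
    sigma2 = np.var(Z, axis=1, keepdims=True)       # per-unit variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(sigma2 + epsilon)   # zero mean, unit variance
    Z_hat = gamma * Z_norm + beta                   # learnable scale and shift
    return Z_hat, mu, sigma2                        # mu, sigma2 are also needed for test-time estimates
```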

6.2 Fitting Batch Norm into a neural network

Add Batch Norm to a network

$X \rightarrow Z^{[1]} \rightarrow {\hat Z^{[1]}} \rightarrow {a^{[1]}} \rightarrow Z^{[2]} \rightarrow {\hat Z^{[2]}} \rightarrow {a^{[2]}}...$

Parameters:
$W^{[1]}, b^{[1]}$, $W^{[2]}, b^{[2]}...$
$\gamma^{[1]}, \beta^{[1]}$, $\gamma^{[2]}, \beta^{[2]}...$

If you use Batch Norm, the mean is computed and subtracted from $z^{[l]}$, so any bias $b^{[l]}$ would be cancelled out; its role is taken over by $\beta^{[l]}$. We can therefore drop $b^{[l]}$ (set it to 0 permanently), as in the sketch below.
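A hedged sketch of one forward step with Batch Norm between the linear part and the activation; note there is no $b^{[l]}$ term, its role is played by $\beta^{[l]}$ (batch_norm_forward is the sketch from 6.1, and ReLU is just an example activation):

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def layer_forward_with_bn(A_prev, W, gamma, beta):
    """One hidden-layer step: linear -> Batch Norm -> activation."""
    Z = W @ A_prev                                    # no bias: the mean subtraction would cancel it anyway
    Z_hat, mu, sigma2 = batch_norm_forward(Z, gamma, beta)
    A = relu(Z_hat)
    return A, mu, sigma2
```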

6.3 Why does Batch Norm work?

Covariate shift: you have learned a mapping from $x$ to $y$ that works well; if the distribution of $x$ changes, you may need to retrain the function so it keeps working well.

The hidden unit values of later layers keep changing during training (because the earlier layers' parameters keep changing), so those layers suffer from a kind of covariate shift; Batch Norm reduces the amount by which these hidden unit distributions shift around.

Batch Norm as regularization

  • Each mini-batch is scaled by the mean/variance computed on just that mini-batch.
  • This adds some noise to the values $z^{[l]}$ within that mini-batch. So similar to dropout, it adds some noise to each hidden layer's activations.
  • This has a slight regularization effect.

6.4 Batch Norm at test time

In order to apply the neural network at test time (where you may process a single example at a time), you need to come up with a separate estimate of $\mu$ and $\sigma^2$, since there is no mini-batch to compute them over.
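One common way to get these estimates, sketched here under the assumption that we track them per layer, is an exponentially weighted average of the mini-batch statistics seen during training (the momentum value 0.9 is just an illustrative choice):

```python
import numpy as np

def update_running_stats(stats, mu, sigma2, momentum=0.9):
    """Exponentially weighted average of the per-mini-batch mean and variance."""
    stats["mu"] = momentum * stats["mu"] + (1 - momentum) * mu
    stats["sigma2"] = momentum * stats["sigma2"] + (1 - momentum) * sigma2

def batch_norm_test(z, stats, gamma, beta, epsilon=1e-8):
    """At test time, normalize a single example with the running estimates."""
    z_norm = (z - stats["mu"]) / np.sqrt(stats["sigma2"] + epsilon)
    return gamma * z_norm + beta

# During training, initialize stats = {"mu": 0.0, "sigma2": 1.0} once per layer, then after
# each mini-batch call update_running_stats(stats, mu, sigma2) with the mu, sigma2
# returned by batch_norm_forward.
```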

7. Multi-class classification

7.1 Softmax regression

7.2 Training a softmax classifier

Hard max: in contrast to softmax, a "hard max" would put a 1 in the position of the largest value of $z$ and 0 everywhere else; softmax maps $z$ to a softer probability distribution over the classes.

Loss function: $\mathcal{L}(\hat y, y) = -\sum_{j=1}^{C} y_j \log \hat y_j$, which for a one-hot label reduces to $-\log \hat y_c$ for the correct class $c$.

Gradient descent with softmax: backpropagation starts from $dz^{[L]} = \hat y - y$.
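A minimal NumPy sketch of the pieces named above, assuming the labels Y are one-hot encoded with classes along axis 0 and examples as columns:

```python
import numpy as np

def softmax(Z):
    """Turn the output layer's values into a probability distribution over classes."""
    t = np.exp(Z - np.max(Z, axis=0, keepdims=True))   # shift for numerical stability
    return t / np.sum(t, axis=0, keepdims=True)

def cross_entropy_loss(Y_hat, Y):
    """Average cross-entropy loss over the mini-batch."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(Y_hat + 1e-12)) / m

def output_gradient(Y_hat, Y):
    """Per-example gradient of the loss w.r.t. the output pre-activations: dZ = Y_hat - Y."""
    return Y_hat - Y
```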

8. Programming Frameworks

8.1 Deep Learning frameworks

  • Caffe/Caffe2
  • TensorFlow
  • Torch
  • Theano
  • mxnet
  • PaddlePaddle
  • Keras
  • CNTK

Choosing deep learning frameworks

  • Ease of programming (development and deployment)
  • Running speed
  • Truly open (open source with good governance)

8.2 TensorFlow

...
