Full course videos on Bilibili: MIT 6.S191

1. Introduction to deep learning

The perceptron - the structural building block of deep learning

Perceptron and neuron refer to the same thing in a neural network: the smallest unit a neural network is built from.

Forward Propagation: the forward pass that carries the input through the network to produce the output.
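A minimal sketch of forward propagation through a single perceptron, assuming the standard formulation output = g(w0 + Σ x_i·w_i), where w0 is the bias and g is a nonlinearity (sigmoid here). The function name `perceptron_forward` and the weight values are illustrative only.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_forward(x, w, w0):
    """Forward pass of one perceptron: weighted sum plus bias, then nonlinearity."""
    z = w0 + sum(xi * wi for xi, wi in zip(x, w))  # linear combination of the inputs
    return sigmoid(z)                               # non-linear activation

# Illustrative values: z = 1 + (-1)(3) + (2)(-2) = -6
y = perceptron_forward(x=[-1.0, 2.0], w=[3.0, -2.0], w0=1.0)
print(round(y, 4))  # 0.0025
```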

Common Activation Functions: Sigmoid Function, Hyperbolic Tangent, Rectified Linear Unit (ReLU).
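The three activation functions can be sketched in plain Python as below; this is a minimal illustration, not the course's TensorFlow code. The docstrings note each function's output range and derivative, which matter for backpropagation.

```python
import math

def sigmoid(z):
    """Sigmoid: maps R to (0, 1); derivative is sigmoid(z) * (1 - sigmoid(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: maps R to (-1, 1); derivative is 1 - tanh(z) ** 2."""
    return math.tanh(z)

def relu(z):
    """ReLU: max(0, z); derivative is 1 for z > 0 and 0 for z < 0."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 4), round(tanh(z), 4), relu(z))
```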

Building Neural Networks with perceptrons

Dense Layers: layers in which every input is connected to every output are called dense layers.
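A minimal sketch of a dense layer and of forward propagation through two stacked dense layers. The helper name `dense`, the 2 -> 3 -> 1 layer sizes, and all weights are made up for illustration; `W[j][i]` is assumed to hold the weight from input i to output j.

```python
def dense(x, W, b, activation=None):
    """One dense layer: every input contributes to every output.

    W[j][i] is the weight from input i to output j; b[j] is output j's bias.
    """
    out = []
    for row, bj in zip(W, b):
        z = bj + sum(wi * xi for wi, xi in zip(row, x))  # weighted sum for output j
        out.append(activation(z) if activation else z)
    return out

def relu(z):
    return max(0.0, z)

# Hypothetical 2-input -> 3-hidden -> 1-output network with made-up weights
x = [1.0, 2.0]
hidden = dense(x, W=[[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]], b=[0.0, -1.0, 0.5], activation=relu)
output = dense(hidden, W=[[1.0, 2.0, -1.0]], b=[0.1])  # linear output layer
print(hidden, output)  # [0.0, 2.0, 0.5] [3.6]
```

Stacking layers this way, each layer's output becomes the next layer's input, which is all forward propagation through a deeper network amounts to.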

Quantifying Loss: the loss measures the cost incurred by incorrect predictions.

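Types of loss: the Empirical Loss is the average loss over the entire dataset. Common per-example losses are Binary Cross Entropy Loss, for models that output a probability between 0 and 1, and Mean Squared Error Loss, for regression models that output real numbers.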
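A minimal sketch of both losses, each averaged over the dataset to give the empirical loss. The clipping constant `eps` is an assumption added only to avoid log(0); the example targets and predictions are made up.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean BCE over the dataset; y_true in {0, 1}, y_pred in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)         # empirical loss = average over examples

def mean_squared_error(y_true, y_pred):
    """Mean of squared differences, for real-valued (regression) targets."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

bce = binary_cross_entropy([1, 0], [0.9, 0.2])
mse = mean_squared_error([90.0, 80.0], [85.0, 80.0])
print(bce, mse)  # mse is 12.5
```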

Training Neural Networks

Loss Optimization: Gradient Descent

Computing Gradients: Backpropagation (the error is propagated backward through the network with the chain rule to compute the gradients)
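A minimal sketch of gradient descent in which the gradient is obtained by backpropagation, assuming the simplest possible network: one sigmoid neuron trained with binary cross entropy on a tiny made-up dataset. For sigmoid followed by BCE, the chain rule gives dL/dz = p - y, which is then pushed back to the weight and bias.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up dataset of (x, label) pairs: label is 1 when x > 0
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

def empirical_loss(w, b):
    """Average binary cross entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

random.seed(0)
w, b = random.gauss(0.0, 1.0), 0.0   # 1. initialize the weights randomly
lr = 0.5                             # learning rate (step size)
start_loss = empirical_loss(w, b)

for _ in range(200):                 # 2. repeat for a fixed number of steps
    dw = db = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)       # forward pass
        dz = p - y                   # backward pass: dL/dz for sigmoid + BCE
        dw += dz * x                 # chain rule: dz/dw = x
        db += dz                     # chain rule: dz/db = 1
    w -= lr * dw / len(data)         # 3. step against the averaged gradient
    b -= lr * db / len(data)

print(start_loss, empirical_loss(w, b))
```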

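Learning Rate: sets the size of each gradient descent step. If it is too large, training fails to converge; if it is too small, optimization is too slow.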
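The effect of the learning rate can be seen on the toy function f(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2·lr). The function name and the three learning rates below are illustrative choices, not values from the course.

```python
def gradient_descent(lr, w=1.0, steps=20):
    """Minimize f(w) = w**2 (gradient 2w) starting from w; return the final w."""
    for _ in range(steps):
        w -= lr * 2 * w  # each step multiplies w by (1 - 2 * lr)
    return w

for lr in (0.001, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With lr = 0.001, w barely moves (about 0.96 after 20 steps); with lr = 0.1 it shrinks toward the minimum at 0; with lr = 1.1 the factor is -1.2, so w oscillates with growing magnitude and never converges.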

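Adaptive Learning Rate: the learning rate is no longer fixed during training but can change. Common optimizers: SGD (which uses a fixed learning rate), and the adaptive methods Adam, Adadelta, Adagrad, RMSProp.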
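A minimal sketch of Adam, one of the adaptive methods listed above, on a 1-D problem. It follows the standard update rule with its usual default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8); the learning rate 0.1, the step count, and the target function (w - 3)² are illustrative assumptions.

```python
import math

def adam_minimize(grad, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient function."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return w

# f(w) = (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3
w_star = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_star)
```

Dividing by the running root-mean-square of the gradients is what makes the step size adaptive: parameters with consistently large gradients take proportionally smaller steps.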

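Batch Size: computing the loss (and its gradient) over every example in the dataset at each step is too expensive, so each step uses a randomly chosen subset of examples instead. The batch size is the number of examples in that subset. A smaller batch makes each step faster, but its loss is a noisier estimate of the loss on the whole dataset; a larger batch makes each step slower, but its loss better represents overall performance.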
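A minimal sketch of mini-batch stochastic gradient descent, assuming a linear model y ≈ w·x + b fitted with mean squared error on made-up data generated from y = 2x + 1 plus noise. The batch size, learning rate, and epoch count are arbitrary illustrative values.

```python
import random

random.seed(42)
# Made-up regression data: y = 2x + 1 plus small Gaussian noise
data = [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in [i / 100 for i in range(100)]]

def minibatch_sgd(data, batch_size=8, lr=0.1, epochs=200):
    """Fit y ~ w*x + b with MSE, estimating each gradient from a random mini-batch."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for _ in range(len(data) // batch_size):
            batch = random.sample(data, batch_size)  # random subset of the data
            # Gradient of the batch-average MSE: a noisy estimate of the full gradient
            dw = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
            db = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
            w -= lr * dw
            b -= lr * db
    return w, b

w, b = minibatch_sgd(data)
print(round(w, 2), round(b, 2))
```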

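Overfitting: as training goes on, a neural network may memorize features that belong only to the training set. These features do not generalize, which leads to overfitting. Ways to address this problem include: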

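Dropout: during training, randomly set the activations of neurons in each layer to 0 with some probability, so those neurons do not process data on that pass. This simplifies the network and keeps it from relying on any single neuron.
Early Stopping: stop training early once the loss curve on the test data starts moving away from the loss curve on the training data. When this happens, the network is still improving on the training set but getting worse on the test set, which is a sign of overfitting.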
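Both regularization methods can be sketched in a few lines. The dropout below is the common "inverted dropout" variant (an assumption about implementation detail: survivors are scaled by 1/(1 - p) so nothing changes at inference time). The early-stopping helper and its `patience` parameter are hypothetical, and the held-out loss values are made up.

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p during training.

    Survivors are scaled by 1 / (1 - p) so the expected activation is unchanged,
    which lets the layer be used as-is (no dropout) at inference time.
    """
    if not training or p == 0.0:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p) for a in activations]

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the best held-out loss, stopping once that loss
    has not improved for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # held-out loss keeps rising: likely overfitting
    return best_epoch

random.seed(1)
print(dropout([1.0, 2.0, 3.0, 4.0]))
# Made-up held-out losses: they improve, then rise as the model overfits
print(early_stopping_epoch([0.9, 0.6, 0.45, 0.40, 0.42, 0.47, 0.55, 0.61]))  # 3
```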

2. Recurrent Neural Networks

Problem addressed: sequence modeling.

Key requirements for sequence modeling:

Handle variable-length sequences
Track long-term dependencies
Maintain information about order
Share parameters across the sequence
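A minimal sketch of a vanilla RNN forward pass that touches each requirement above, assuming the standard recurrence h_t = tanh(W_hh·h_{t-1} + W_xh·x_t) with output y_t = W_hy·h_t. The helper names, the layer sizes (1-D input, 2-D hidden state, 1-D output), and the weights are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Run a vanilla RNN over a sequence of any length.

    The same weights are reused at every time step (parameter sharing), and the
    hidden state h carries information from earlier steps, so order matters.
    """
    h, ys = h0, []
    for x in xs:
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        h = [math.tanh(a + b) for a, b in zip(matvec(W_hh, h), matvec(W_xh, x))]
        ys.append(matvec(W_hy, h))  # y_t = W_hy h_t
    return ys, h

# Hypothetical weights
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
W_hy = [[1.0, -1.0]]
seq = [[1.0], [0.0], [-1.0]]
ys, h = rnn_forward(seq, W_xh, W_hh, W_hy, h0=[0.0, 0.0])
ys_rev, _ = rnn_forward(seq[::-1], W_xh, W_hh, W_hy, h0=[0.0, 0.0])
print(len(ys), ys[-1], ys_rev[-1])  # same weights, different order, different output
```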

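Problems that can arise when training RNNs: 1. exploding gradients; 2. vanishing gradients. Exploding gradients are commonly handled with gradient clipping, while vanishing gradients motivated gated cells such as the LSTM (Long Short-Term Memory).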
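Why gradients vanish or explode can be seen in a deliberately simplified case, assumed here for illustration: a scalar linear RNN h_t = w·h_{t-1}, where backpropagation through time gives dh_T/dh_0 = w^T. The sketch also shows gradient clipping by norm; both helper names are hypothetical.

```python
import math

def gradient_through_time(w, steps):
    """For a scalar linear RNN h_t = w * h_{t-1}, dh_T/dh_0 = w ** T."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad

def clip_by_norm(grads, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]

print(gradient_through_time(0.5, 50))            # vanishes toward 0
print(gradient_through_time(1.5, 50))            # explodes
print(clip_by_norm([30.0, 40.0], max_norm=5.0))  # [3.0, 4.0]
```

In a real RNN the repeated factor is the recurrent weight matrix times the activation derivative, but the effect is the same: repeated multiplication makes the gradient shrink or blow up exponentially with sequence length.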

RNN的应用

Music generation
Sentiment classification
Machine translation (attention mechanism; example from Google)
Trajectory prediction (self-driving)
Environmental modeling

3. Convolutional Neural Networks

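CNNs are commonly used for feature extraction. Their feature-learning part is typically built from two kinds of layers, convolutional layers and pooling layers, with a ReLU activation applied after each convolution.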
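A minimal sketch of one convolution, ReLU, pooling stage in plain Python. It assumes a "valid" convolution computed as cross-correlation (as deep learning libraries do), non-overlapping 2x2 max pooling, a made-up 5x5 image containing a vertical edge, and a hand-picked edge-detecting kernel; in a real CNN the kernel weights are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as deep learning libraries compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu_map(fmap):
    """Apply ReLU element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (stride 2); drops an odd leftover row/column."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Made-up 5x5 image with a vertical edge between columns 1 and 2
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]  # responds to left-to-right increases
features = max_pool2x2(relu_map(conv2d(image, kernel)))
print(features)  # [[2.0, 0.0], [2.0, 0.0]]
```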

CNN applications

Detection
Semantic segmentation
End-to-end robotic control

4. Deep Generative Modeling

VAEs (Variational Autoencoders)

Learn a lower-dimensional latent space, and sample from it to generate reconstructions of the input.
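A minimal sketch of the sampling step of a VAE, assuming a Gaussian latent space: the encoder outputs a mean and a log-variance per latent dimension, z is drawn with the reparameterization trick, and a KL-divergence term (added to the reconstruction loss during training) pulls the latent distribution toward N(0, 1). The encoder outputs and helper names are hypothetical; the encoder and decoder networks themselves are omitted.

```python
import math
import random

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).

    Randomness is moved into eps, so z stays differentiable w.r.t. mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

# Hypothetical encoder outputs for one input, 2-D latent space
mu, log_var = [0.5, -1.0], [0.0, -0.5]
random.seed(0)
z = sample_latent(mu, log_var)  # would be passed to the decoder to reconstruct the input
print(z, kl_to_standard_normal(mu, log_var))
```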

GANs (Generative Adversarial Networks)

Competing Generator and Discriminator networks.
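A minimal sketch of the two competing objectives, written in terms of the discriminator's output probabilities only (the networks themselves are omitted). The discriminator loss is the standard binary cross entropy for "real vs. fake"; the generator loss uses the non-saturating form -log D(G(z)), which is an assumption here: the original minimax objective has the generator minimize log(1 - D(G(z))) instead. Function names are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """D wants D(x) -> 1 on real data and D(G(z)) -> 0 on generated data.

    d_real, d_fake: discriminator probabilities for real and generated samples.
    """
    n = len(d_real)
    return -sum(math.log(r) + math.log(1 - f) for r, f in zip(d_real, d_fake)) / n

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D(G(z)) -> 1, i.e. to fool D."""
    return -sum(math.log(f) for f in d_fake) / len(d_fake)

# When D cannot tell real from fake it outputs 0.5 everywhere
d_loss, g_loss = discriminator_loss([0.5], [0.5]), generator_loss([0.5])
print(d_loss, g_loss)  # 2*log(2) and log(2)
```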

MIT 6.S191 open course notes (continuously updated)