Basic Principles of Neural Networks (Using Logistic Regression as an Example)

Basics

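Logistic regression can be viewed as the simplest single-layer neural network (one with no hidden layer).
Its structure is shown below:

[Figure: single-layer neural network]

A network with multiple hidden layers (a deep learning network) is essentially an extension of this single-layer network, and both are trained with the backpropagation algorithm.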

Learning process

The learning process of the network is outlined below:

#1 Forward Propagation

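Let the input be $X$, the output of this layer be $A$, and the final output (the prediction) be $\hat{Y}$, reserving $Y$ for the true labels.
In this single-layer network:
$\hat{Y}=A=\sigma( w^T X + b)$

We can write $Z= w^T X + b$, which is a linear step.
$\sigma$ is the activation function and is generally nonlinear, so the model as a whole is a nonlinear function.

$w$ and $b$ are the parameters to be learned. Together they can be referred to as $\theta$.

The quantities in these expressions are generally matrices. Suppose $X$ has shape (n, m), where n is the length of each sample's feature vector and m is the number of samples. Then $w$ has shape (n, 1),
$b$ is a scalar (broadcast to shape (1, m) when it is added), and $Z$, $A$, and $\hat{Y}$ have shape (1, m), holding the output values for the m samples.

For binary classification, $\sigma$ is usually the sigmoid function, $A=\sigma(Z)=\frac{1}{1 + e^{-Z}}$, whose output is a probability in the interval (0, 1).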
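The shapes above are easy to get wrong, so here is a minimal NumPy sketch of the forward pass. The sizes (4 features, 5 samples) and the random input are illustrative assumptions, not values from the course dataset.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 4, 5                      # illustrative sizes: 4 features, 5 samples
X = rng.normal(size=(n, m))      # one sample per column, shape (n, m)
w = np.zeros((n, 1))             # weights, shape (n, 1)
b = 0.0                          # scalar bias, broadcast over the m columns

Z = w.T @ X + b                  # linear step, shape (1, m)
A = sigmoid(Z)                   # activation, shape (1, m)

print(Z.shape, A.shape)          # (1, 5) (1, 5)
print(A)                         # every entry is 0.5 because w and b are zero
```

With zero-initialized parameters every prediction is exactly 0.5, which is the starting point of the training code later in this post.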

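#2 Metric, Loss, and Cost

To describe how well a neural network fits a real problem, we need a metric that reflects its performance. Accuracy is the usual choice.

Let the true output be $y$ and the output computed by the network be $\hat{y}$.
Accuracy itself is not differentiable, so training instead uses a loss function that measures the discrepancy between the two for a single sample:
$Loss=L(y, \hat{y})$

Because we compute over many inputs and outputs at once, the loss over multiple samples is evaluated with the cost function:
$Cost =J = \frac{1}{m} \sum_{i=1}^m L(y^{(i)}, \hat{y}^{(i)})$

For binary classification with a sigmoid activation, $L = -\left[y\log(\hat{y})+(1-y)\log(1-\hat{y})\right]$ (the cross-entropy loss).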
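To make the difference between the metric and the cost concrete, the following sketch computes both for two sets of predictions. The labels and probabilities are made up for illustration, and `cross_entropy_cost` is a hypothetical helper name.

```python
import numpy as np

def cross_entropy_cost(A, Y):
    """Mean binary cross-entropy over m samples; A and Y have shape (1, m)."""
    m = Y.shape[1]
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # per-sample loss L
    return float(np.sum(losses) / m)                       # cost J

Y = np.array([[1, 0, 1, 0]])                # made-up true labels
A_good = np.array([[0.9, 0.1, 0.8, 0.2]])   # confident and correct
A_bad = np.array([[0.6, 0.4, 0.3, 0.7]])    # last two are on the wrong side of 0.5

cost_good = cross_entropy_cost(A_good, Y)
cost_bad = cross_entropy_cost(A_bad, Y)
acc_good = float(np.mean((A_good > 0.5) == Y))  # fraction of correct 0/1 predictions
acc_bad = float(np.mean((A_bad > 0.5) == Y))

print("good:", cost_good, acc_good)
print("bad: ", cost_bad, acc_bad)
```

The cost changes smoothly as the probabilities move, while accuracy only changes when a prediction crosses the 0.5 threshold.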

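#3 Backward Propagation

Once forward propagation has computed the output $\hat{Y}$, backpropagation follows. Its purpose is to obtain (using the chain rule of differentiation) the partial derivatives of the cost function $J$ with respect to the parameters of each layer, and to adjust the parameter values accordingly.
According to gradient descent, the update rule (the optimizer) is:
$Optimizer(\theta)=\theta - \alpha \frac{\partial J}{\partial \theta} = \theta - \alpha \, d\theta$
($\alpha$ is called the learning rate, and $d\theta$ is shorthand for $\frac{\partial J}{\partial \theta}$)

Each backpropagation pass therefore changes the value of $\theta$. The change is in the direction along which $J$ decreases fastest (the negative gradient).

By repeating the forward + backward passes, and provided the learning rate is suitably small, $\theta$ eventually approaches or reaches a point where $J$ attains a minimum (the global minimum for logistic regression, whose cost is convex; possibly only a local one for deeper networks).
This repeated "forward + backward propagation" process is called training. During it, the model's parameters fit the sample distribution, which is what is meant by learning.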
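The update rule can be tried in isolation on a toy problem. This sketch assumes a one-parameter cost $J(\theta) = (\theta - 3)^2$, chosen only because its minimum is known; it is not the logistic regression cost, and the function names are hypothetical.

```python
def grad_J(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)

def gradient_descent(theta, alpha, num_iterations):
    """Apply theta <- theta - alpha * d_theta repeatedly."""
    for _ in range(num_iterations):
        d_theta = grad_J(theta)           # "backward" step: compute the gradient
        theta = theta - alpha * d_theta   # update step with learning rate alpha
    return theta

theta = gradient_descent(theta=0.0, alpha=0.1, num_iterations=100)
print(theta)  # close to 3.0, the minimizer of the toy cost
```

For this toy cost a learning rate above 1.0 makes the iterates diverge, which is why $\alpha$ has to be chosen with some care.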

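For the logistic regression forward pass above, with $Y$ denoting the (1, m) row vector of true labels and $a^{(i)}$, $y^{(i)}$ the $i$-th entries of $A$ and $Y$: $\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T$ and $\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})$. Both follow from $\frac{\partial L}{\partial z} = a - y$, which holds for the sigmoid combined with the cross-entropy loss.
The operations used in neural networks, such as linear maps, convolutions, activation functions, and summations, are differentiable (at least almost everywhere), so backpropagation is feasible.
By the chain rule, derivatives can be computed layer by layer from the output back to the input and then combined, so the derivative of $J$ with respect to every layer's parameters can be obtained.
* Chain rule: $\frac{d\varphi}{dx} = \frac{d\varphi}{dy} \cdot \frac{dy}{dx}$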
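One way to gain confidence in the gradient formulas is to compare them with finite differences. The sketch below does this on small random data; the sizes, seed, and helper names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """Cross-entropy cost J for logistic regression."""
    A = sigmoid(w.T @ X + b)
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

def analytic_grads(w, b, X, Y):
    """Gradients from the closed-form expressions for dJ/dw and dJ/db."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    dw = X @ (A - Y).T / m
    db = float(np.sum(A - Y) / m)
    return dw, db

rng = np.random.default_rng(1)
n, m = 3, 8                                   # illustrative sizes
X = rng.normal(size=(n, m))
Y = (rng.random(size=(1, m)) > 0.5).astype(float)
w = rng.normal(size=(n, 1)) * 0.1
b = 0.05

dw, db = analytic_grads(w, b, X, Y)

# Central finite differences, one parameter at a time
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(n):
    step = np.zeros_like(w)
    step[j, 0] = eps
    dw_num[j, 0] = (cost(w + step, b, X, Y) - cost(w - step, b, X, Y)) / (2 * eps)
db_num = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)

max_err = max(float(np.max(np.abs(dw - dw_num))), abs(db - db_num))
print(max_err)  # should be tiny if the formulas are right
```

A check like this is only practical for small models, but it catches sign and transpose mistakes quickly.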

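Summary

A neural network can be viewed as a complex nonlinear function with inputs, outputs, and many parameters. Training a neural network means repeatedly feeding the model X and Y so that its parameters change until they fit the complex relationship between input and output.

How well a neural network fits a function (the sample distribution) can be expressed by the cost function. Backpropagation supplies the gradients that let gradient descent move the parameters in the direction in which the cost decreases fastest, lowering the cost and thereby fitting the data.

When building a neural network, you need to choose the number of layers, the number of units per layer, the operation and activation function of each layer, as well as the loss function, the optimizer, and so on.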

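Implementation

This follows the Coursera deeplearning course exercise Logistic Regression as a Neural Network.

(The datasets can be found at the course notebook address given in the first comment of the code.)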

# https://hub.coursera-notebooks.org/user/jzpvoopfcbteoxgjrpwsyv/notebooks/Week%202/Logistic%20Regression%20as%20a%20Neural%20Network/Logistic%20Regression%20with%20a%20Neural%20Network%20mindset%20v5.ipynb

# cat/non-cat problem

import h5py
import numpy as np

def load_dataset():
    """
    Load the datasets

    Return:
    train_set_x_orig -- original train set x
    train_set_y_orig -- original train set y
    test_set_x_orig -- original test set x
    test_set_y_orig -- original test set y
    classes -- the list of class names
    """
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))

    return s


def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0

    assert (w.shape == (dim, 1))
    assert (isinstance(b, float) or isinstance(b, int))

    return w, b


def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(w.T @ X + b)  # compute activation
    cost = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute cost

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = (1 / m) * (X @ (A - Y).T)
    db = (1 / m) * np.sum(A - Y)

    assert (dw.shape == w.shape)
    assert (db.dtype == float)
    cost = np.squeeze(cost)
    assert (cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost


def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """

    costs = []
    dw, db = 0, 0

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        grads, cost = propagate(w, b, X, Y)

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs


def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''

    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(w.T @ X + b)

    # Convert probabilities A[0, i] to predictions: 0 if A[0, i] <= 0.5, else 1
    Y_prediction = (A > 0.5).astype(float)

    assert (Y_prediction.shape == (1, m))

    return Y_prediction


def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # Print train/test accuracy
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d

def main():
    # Loading the data
    train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

    m_train = train_set_y.shape[-1]
    m_test = test_set_y.shape[-1]
    num_px = train_set_x_orig.shape[1]

    print("Number of training examples: m_train = " + str(m_train))
    print("Number of testing examples: m_test = " + str(m_test))
    print("Height/Width of each image: num_px = " + str(num_px))
    print("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
    print("train_set_x shape: " + str(train_set_x_orig.shape))
    print("train_set_y shape: " + str(train_set_y.shape))
    print("test_set_x shape: " + str(test_set_x_orig.shape))
    print("test_set_y shape: " + str(test_set_y.shape))

    # Reshape the training and test examples
    train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
    test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

    print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
    print("train_set_y shape: " + str(train_set_y.shape))
    print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
    print("test_set_y shape: " + str(test_set_y.shape))

    train_set_x = train_set_x_flatten / 255.
    test_set_x = test_set_x_flatten / 255.

    d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005,
              print_cost=True)
    # train accuracy: 99.04306220095694 %
    # test accuracy: 70.0 %

if __name__ == '__main__':
    main()
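If the course `.h5` files are not at hand, the same training loop can be exercised on synthetic data. The following is a compact, self-contained sketch under that assumption: the two Gaussian blobs, the `train` helper, and the hyperparameters are all made up for illustration and are not part of the course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, num_iterations=1000, learning_rate=0.1):
    """Batch gradient descent for logistic regression; X is (n, m), Y is (1, m)."""
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    costs = []
    for i in range(num_iterations):
        A = sigmoid(w.T @ X + b)                                    # forward pass
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # cost J
        dw = X @ (A - Y).T / m                                      # backward pass
        db = np.sum(A - Y) / m
        w = w - learning_rate * dw                                  # update
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(float(cost))
    return w, b, costs

rng = np.random.default_rng(42)
m = 200
# Two overlapping 2-D Gaussian blobs: class 0 around (-1, -1), class 1 around (1, 1)
X0 = rng.normal(loc=-1.0, size=(2, m // 2))
X1 = rng.normal(loc=1.0, size=(2, m // 2))
X = np.hstack([X0, X1])
Y = np.hstack([np.zeros((1, m // 2)), np.ones((1, m // 2))])

w, b, costs = train(X, Y)
pred = (sigmoid(w.T @ X + b) > 0.5).astype(float)
accuracy = float(np.mean(pred == Y))
print(costs[0], costs[-1], accuracy)
```

The first recorded cost equals $\log 2$ because zero-initialized parameters predict 0.5 for every sample, and the cost should fall from there as training proceeds.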
