UFLDL is an early deep learning introduction written by Andrew Ng's team. Its rhythm of theory followed by exercises is excellent: every time I wanted to rush through the theory so I could get to coding, because the whole code framework is already laid out for you, with detailed comments, so we only have to implement a small amount of core code ourselves. Very easy to get started!
I can't find a Chinese translation of this part of the new version any more, -_-, so I'm writing it up now while it's fresh, otherwise I'll lose the feel for it!
Section 8 is: Convolutional Neural Network.
I consider this the hardest exercise in the whole tutorial, because it combines the earlier multilayer neural network with convolution and pooling. You not only have to write the forward propagation, but also the backpropagation through the convolutional network (the hardest part of this section), gradient checking, and so on. Just learning how backpropagation works in a convolutional network took me two days, never mind writing the code.
Before the exercise, there are two topics to cover:
Optimization: Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) differs from Batch Gradient Descent in that it does not compute the gradient over the entire training set on every update; instead it computes the gradient on a single training example, as the following update rules show:
BGD: $\theta = \theta - \alpha \, \nabla_\theta \, \mathbb{E}[J(\theta)]$ (the expectation is taken over the full training set)
SGD: $\theta = \theta - \alpha \, \nabla_\theta \, J(\theta; x^{(i)}, y^{(i)})$
If you want more detail, you can read the relevant part of the CS229 notes here.
Compared with BGD, SGD in practice does not really compute the gradient and update the parameters on just a single training example either; it uses a mini-batch instead, typically of size 128, 256, or some other power of 2. Compared with a single example, this converges more stably and takes advantage of efficient vectorized computation. In deep learning, "SGD" by default means mini-batch SGD, which you will see in minFuncSGD.m.
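To make the mini-batch form of the update explicit (my own notation, not from the tutorial), for a mini-batch $B$ of size $b$:
$\theta = \theta - \frac{\alpha}{b} \sum_{i \in B} \nabla_\theta \, J(\theta; x^{(i)}, y^{(i)})$
In the code below, the $1/b$ factor is already folded into the cost and gradient returned by cnnCost.m, so the plain SGD update would just be theta = theta - alpha * grad.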
This section also introduces gradient descent with Momentum, which works a bit better than plain SGD. The update rule is:
$v = \gamma v + \alpha \, \nabla_\theta \, J(\theta; x^{(i)}, y^{(i)})$
$\theta = \theta - v$
where $\gamma$ is the momentum constant.
In the code this is only a line or two; here is minFuncSGD.m:
function [opttheta] = minFuncSGD(funObj,theta,data,labels,...
options)
% called from cnnTrain.m like this:
% opttheta = minFuncSGD(@(x,y,z) cnnCost(x,y,z,numClasses,filterDim,...
% numFilters,poolDim),theta,images,labels,options);
% Runs stochastic gradient descent with momentum to optimize the
% parameters for the given objective.
%
% Parameters:
% funObj - function handle which accepts as input theta,
% data, labels and returns cost and gradient w.r.t
% to theta.
% theta - unrolled parameter vector
% data - stores data in m x n x numExamples tensor
% labels - corresponding labels in numExamples x 1 vector
% options - struct to store specific options for optimization
%
% Returns:
% opttheta - optimized parameter vector
%
% Options (* required)
% epochs* - number of epochs through data
% alpha* - initial learning rate
% minibatch* - size of minibatch
% momentum - momentum constant, defaults to 0.9
%%======================================================================
%% Setup
assert(all(isfield(options,{'epochs','alpha','minibatch'})),...
'Some options not defined');
if ~isfield(options,'momentum')
options.momentum = 0.9;
end;
epochs = options.epochs;
alpha = options.alpha;
minibatch = options.minibatch;
m = length(labels); % training set size
% Setup for momentum
mom = 0.5;
momIncrease = 20;
velocity = zeros(size(theta));
%%======================================================================
%% SGD loop
it = 0; % iterations
for e = 1:epochs
% randomly permute indices of data for quick minibatch sampling
rp = randperm(m);
for s=1:minibatch:(m-minibatch+1) % step through the shuffled data one minibatch at a time
it = it + 1;
% increase momentum after momIncrease iterations
if it == momIncrease
mom = options.momentum;
end;
% get next randomly selected minibatch
mb_data = data(:,:,rp(s:s+minibatch-1));
mb_labels = labels(rp(s:s+minibatch-1));
% evaluate the objective function on the next minibatch
[cost grad] = funObj(theta,mb_data,mb_labels);
% Instructions: Add in the weighted velocity vector to the
% gradient evaluated above scaled by the learning rate.
% Then update the current weights theta according to the
% sgd update rule
%%% YOUR CODE HERE %%%
velocity = mom * velocity + alpha * grad;
theta = theta - velocity;
fprintf('Epoch %d: Cost on iteration %d is %f\n',e,it,cost);
end;
% anneal the learning rate by a factor of two after each epoch
alpha = alpha/2.0;
end;
opttheta = theta;
end
Convolutional Neural Network
The figure in the tutorial shows the architecture of the first layer of the convolutional neural network; see that section of the tutorial for the notation. The forward propagation here is very similar to the earlier Convolution and Pooling exercise, so we can simply call the cnnConvolve and cnnPool functions from before.
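For concreteness, with the default settings I used from cnnTrain.m (28x28 MNIST images, filterDim = 9, numFilters = 20, poolDim = 2), each convolved feature map is 28 - 9 + 1 = 20 pixels on a side, mean pooling shrinks it to 20 / 2 = 10, so the softmax layer sees hiddenSize = 10 * 10 * 20 = 2000 inputs per image.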
The hard part is that the backpropagation here is no longer the same as Back Propagation in a fully connected network. As mentioned before, if layer $l$ and layer $l+1$ are fully connected, the error propagation and gradient computation are:
$\delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \bullet f'(z^{(l)}), \quad \nabla_{W^{(l)}} J = \delta^{(l+1)} (a^{(l)})^T, \quad \nabla_{b^{(l)}} J = \delta^{(l+1)}$
But if layer $l$ is a convolution and subsampling (pooling) layer, the error is propagated like this:
$\delta_k^{(l)} = \operatorname{upsample}\!\left((W_k^{(l)})^T \delta_k^{(l+1)}\right) \bullet f'\!\left(z_k^{(l)}\right)$
where $k$ indexes the filter, and for mean pooling the upsample operation distributes each pooled error uniformly over its pooling region. The gradient is computed like this:
$\nabla_{W_k^{(l)}} J(W,b;x,y) = \sum_{i=1}^{m} \left(a_i^{(l)}\right) \ast \operatorname{rot90}\!\left(\delta_k^{(l+1)}, 2\right), \quad \nabla_{b_k^{(l)}} J(W,b;x,y) = \sum_{a,b} \left(\delta_k^{(l+1)}\right)_{a,b}$
where $a^{(l)}$ is the input to layer $l$ ($a^{(1)}$ is the input image) and $\ast$ denotes the "valid" convolution.
This part is genuinely hard to understand on a first read. To really grasp what the tutorial is saying, I suggest reading my code directly; you can also refer to the following blog posts (a small standalone sketch of the two key operations follows this list):
1. This one uses figures to show that backpropagation in a convolutional network can itself be expressed as a convolution (it seems to be inaccessible from inside China).
2. This one is the most closely related and is worth a careful read, because the references at the end of its slides are exactly UFLDL; unfortunately the slides cannot be downloaded.
3. This one writes up the backpropagation of convolutional neural networks with notation quite close to UFLDL's, and the derivation is explained in great detail.
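Before the full cnnCost.m, here is a tiny standalone sketch of the two operations that make conv-layer backprop different. This is my own toy example (6x6 image, 3x3 filter, 2x2 mean pooling), not part of the starter code:
poolDim   = 2;                              % mean-pooling region (toy value)
img       = rand(6,6);                      % one toy "image"
deltaPool = [1 2; 3 4];                     % toy error at the pooled layer (2x2)
% upsample: each pooled error is spread evenly over its poolDim x poolDim
% region; the 1/poolDim^2 factor appears because mean pooling averaged
% poolDim^2 activations in the forward pass
deltaConv = (1/poolDim^2) * kron(deltaPool, ones(poolDim));   % 4x4
% (in cnnCost.m this is then multiplied elementwise by f'(z) = a.*(1-a))
% filter gradient: a 'valid' convolution of the input with the rotated error;
% rot90(...,2) undoes the kernel flip that conv2 performs internally
gradW = conv2(img, rot90(deltaConv, 2), 'valid');             % 3x3, same size as the filter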
Then here is my cnnCost.m code:
function [cost, grad, preds] = cnnCost(theta,images,labels,numClasses,...
filterDim,numFilters,poolDim,pred)
% Calculate cost and gradient for a single layer convolutional
% neural network followed by a softmax layer with cross entropy
% objective.
%
% Parameters:
% theta - unrolled parameter vector
% images - stores images in imageDim x imageDim x numImages
% array
% numClasses - number of classes to predict
% filterDim - dimension of convolutional filter
% numFilters - number of convolutional filters
% poolDim - dimension of pooling area
% pred - boolean only forward propagate and return
% predictions
%
%
% Returns:
% cost - cross entropy cost
% grad - gradient with respect to theta (if pred==False)
% preds - list of predictions for each example (if pred==True)
if ~exist('pred','var') % pred defaults to false
pred = false;
end;
imageDim = size(images,1); % height/width of image
numImages = size(images,3); % number of images
%% Reshape parameters and setup gradient matrices
% Wc is filterDim x filterDim x numFilters parameter matrix
% bc is the corresponding bias
% Wd is numClasses x hiddenSize parameter matrix where hiddenSize
% is the number of output units from the convolutional layer
% bd is corresponding bias
[Wc, Wd, bc, bd] = cnnParamsToStack(theta,imageDim,filterDim,numFilters,...
poolDim,numClasses);
% Same sizes as Wc,Wd,bc,bd. Used to hold gradient w.r.t above params.
Wc_grad = zeros(size(Wc));
Wd_grad = zeros(size(Wd));
bc_grad = zeros(size(bc));
bd_grad = zeros(size(bd));
%%======================================================================
%% STEP 1a: Forward Propagation
% In this step you will forward propagate the input through the
% convolutional and subsampling (mean pooling) layers. You will then use
% the responses from the convolution and pooling layer as the input to a
% standard softmax layer.
%% Convolutional Layer
% For each image and each filter, convolve the image with the filter, add
% the bias and apply the sigmoid nonlinearity. Then subsample the
% convolved activations with mean pooling. Store the results of the
% convolution in activations and the results of the pooling in
% activationsPooled. You will need to save the convolved activations for
% backpropagation.
convDim = imageDim-filterDim+1; % dimension of convolved output 20
outputDim = (convDim)/poolDim; % dimension of subsampled output 10
% convDim x convDim x numFilters x numImages tensor for storing activations
activations = zeros(convDim,convDim,numFilters,numImages);
% outputDim x outputDim x numFilters x numImages tensor for storing
% subsampled activations
activationsPooled = zeros(outputDim,outputDim,numFilters,numImages);
%%% YOUR CODE HERE %%%
activations = cnnConvolve(filterDim,numFilters,images,Wc,bc);
% pool
activationsPooled = cnnPool(poolDim,activations);
% Reshape activations into 2-d matrix, hiddenSize x numImages,
% for Softmax layer
activationsPooled = reshape(activationsPooled,[],numImages);
%% Softmax Layer
% Forward propagate the pooled activations calculated above into a
% standard softmax layer. For your convenience we have reshaped
% activationPooled into a hiddenSize x numImages matrix. Store the
% results in probs.
% numClasses x numImages for storing probability that each image belongs to
% each class.
probs = zeros(numClasses,numImages);
%%% YOUR CODE HERE %%%
out = Wd * activationsPooled;
out = bsxfun(@plus,out,bd);
% out = sigmoid(out); % when my gradient check failed earlier, this line was still active; the pooled activations must feed the softmax directly, with no extra sigmoid here
out = exp(out);
probs = bsxfun(@rdivide,out,sum(out));
preds = probs;
%%======================================================================
%% STEP 1b: Calculate Cost
% In this step you will use the labels given as input and the probs
% calculated above to evaluate the cross entropy objective. Store your
% results in cost.
cost = 0; % save objective into cost
%%% YOUR CODE HERE %%%
I = sub2ind(size(probs),labels',1:size(probs,2));
cost = (-1) * sum(log(probs(I)));
lambda = 0.0001;
weightDecayCost = (lambda/2) * (sum(Wd(:) .^ 2) + sum(Wc(:) .^ 2));
cost = cost / numImages + weightDecayCost;
% Makes predictions given probs and returns without backpropagating errors.
if pred
[~,preds] = max(probs,[],1);
preds = preds';
grad = 0;
return;
end;
%%======================================================================
%% STEP 1c: Backpropagation
% Backpropagate errors through the softmax and convolutional/subsampling
% layers. Store the errors for the next step to calculate the gradient.
% Backpropagating the error w.r.t the softmax layer is as usual. To
% backpropagate through the pooling layer, you will need to upsample the
% error with respect to the pooling layer for each filter and each image.
% Use the kron function and a matrix of ones to do this upsampling
% quickly.
%%% YOUR CODE HERE %%%
hAct = cell(3,1);
targets = zeros(size(probs)); % one-hot encoding of the labels
targets(I) = 1;
for l = 3:-1:2 % unlike the earlier exercise there is no ei.num_layer here, so 3 is hard-coded
if(l == 3)
hAct{l}.delta = -(targets - probs); % the output layer uses the softmax (cross-entropy) loss, so this differs from the squared-error case; everything else is the same
else
% hAct{l}.delta = (Wd'* hAct{l+1}.delta) .* (activationsPooled
% .*(1- activationsPooled)); % must NOT multiply by the activation derivative here (no sigmoid before the softmax)
hAct{l}.delta = (Wd'* hAct{l+1}.delta);
end
end
hAct{2}.delta = reshape(hAct{2}.delta,outputDim, outputDim, numFilters, numImages);
hAct{1}.delta = zeros(convDim, convDim, numFilters, numImages);
% upsample: error propagation into the convolutional layer works a bit differently
for imageNum = 1:numImages
for filterNum = 1:numFilters
e = hAct{2}.delta(:, :, filterNum, imageNum);
hAct{1}.delta(:, :, filterNum, imageNum) = (1/poolDim^2) * kron(e, ones(poolDim));
end
end
hAct{1}.delta = hAct{1}.delta .* activations .* (1 - activations);
%%======================================================================
%% STEP 1d: Gradient Calculation
% After backpropagating the errors above, we can use them to calculate the
% gradient with respect to all the parameters. The gradient w.r.t the
% softmax layer is calculated as usual. To calculate the gradient w.r.t.
% a filter in the convolutional layer, convolve the backpropagated error
% for that filter with each image and aggregate over images.
%%% YOUR CODE HERE %%%
Wd_grad = (1/numImages) * hAct{3}.delta * activationsPooled'+lambda * Wd;
bd_grad = (1/numImages).*sum(hAct{3}.delta, 2);
for filterNum = 1 : numFilters
for imageNum = 1 : numImages
Wc_grad(:, :, filterNum) = Wc_grad(:, :, filterNum) + conv2(images(:, :, imageNum), rot90(hAct{1}.delta(:, :, filterNum, imageNum), 2), 'valid');
end
Wc_grad(:, :, filterNum) = (1/numImages) * Wc_grad(:, :, filterNum);
end
Wc_grad = Wc_grad + lambda * Wc;
for filterNum = 1 : numFilters
e = hAct{1}.delta(:, :, filterNum, :);
bc_grad(filterNum) = (1/numImages) * sum(e(:));
end
%% Unroll gradient into grad vector for minFunc
grad = [Wc_grad(:) ; Wd_grad(:) ; bc_grad(:) ; bd_grad(:)];
end
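Before training on the full set, it is worth checking the analytic gradient numerically on a tiny configuration. The sketch below is roughly what the DEBUG block in cnnTrain.m does; the helper names (cnnInitParams, computeNumericalGradient) and the images/labels variables are assumed to come from the starter code, so treat this as a sketch rather than the exact script:
% gradient check on a tiny model so it runs fast
db_filterDim  = 9;  db_numFilters = 2;
db_poolDim    = 5;  db_numClasses = 10;
db_images = images(:,:,1:10);               % just 10 training images
db_labels = labels(1:10);
db_theta  = cnnInitParams(28,db_filterDim,db_numFilters,db_poolDim,db_numClasses);
[~, grad] = cnnCost(db_theta,db_images,db_labels,db_numClasses,...
                    db_filterDim,db_numFilters,db_poolDim);
numGrad = computeNumericalGradient(@(x) cnnCost(x,db_images,db_labels,...
                    db_numClasses,db_filterDim,db_numFilters,db_poolDim),db_theta);
% the relative difference should be on the order of 1e-9 if backprop is correct
disp(norm(numGrad - grad) / norm(numGrad + grad));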
Results
Result on an i5 6500U:
Result on an 8th-gen i7:
Apart from some difference in running time, the accuracy of the two runs is about the same.
Reference: https://blog.csdn.net/lingerlanlan/article/details/41390443
If I have misunderstood anything, please point it out; if you have better ideas, feel free to discuss in the comments below!