cs231n: assignment2 - Python file: fc_net.py

In the lecture video Andrej Karpathy described this assignment as "meaty but educational", and it really is meaty. Each assignment consists of .ipynb files and .py files. Because each .ipynb file in this assignment involves several .py files, and those .py files are shared between notebooks, each post covers only one .ipynb file or one .py file. (For the previous assignments each .ipynb file corresponded to a single .py file, so I combined them into one post.)
As always, if you find any mistakes please point them out. Any advice is appreciated, thanks.

**In part two, when writing the class for a fully-connected network with an arbitrary number of layers, my forward and backward passes turned out far too verbose. I only noticed this while writing the arbitrary-depth conv_net, so I haven't bothered to refactor it; a more compact loop-based sketch is given after the listing.**

Contents of fc_net.py:

import numpy as np

from cs231n.layers import *
from cs231n.layer_utils import *


class TwoLayerNet(object):
  """
  A two-layer fully-connected neural network with ReLU nonlinearity and
  softmax loss that uses a modular layer design. We assume an input dimension
  of D, a hidden dimension of H, and perform classification over C classes.

  The architecture should be affine - relu - affine - softmax.

  Note that this class does not implement gradient descent; instead, it
  will interact with a separate Solver object that is responsible for running
  optimization.

  The learnable parameters of the model are stored in the dictionary
  self.params that maps parameter names to numpy arrays.
  """

  def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,
               weight_scale=1e-3, reg=0.0):
    """
    Initialize a new network.

    Inputs:
    - input_dim: An integer giving the size of the input
    - hidden_dim: An integer giving the size of the hidden layer
    - num_classes: An integer giving the number of classes to classify
    - weight_scale: Scalar giving the standard deviation for random
      initialization of the weights.
    - reg: Scalar giving L2 regularization strength.
    """
    self.params = {}
    self.reg = reg

    ############################################################################
    # TODO: Initialize the weights and biases of the two-layer net. Weights    #
    # should be initialized from a Gaussian with standard deviation equal to   #
    # weight_scale, and biases should be initialized to zero. All weights and  #
    # biases should be stored in the dictionary self.params, with first layer  #
    # weights and biases using the keys 'W1' and 'b1' and second layer weights #
    # and biases using the keys 'W2' and 'b2'.                                 #
    ############################################################################
    self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
    self.params['b1'] = np.zeros(hidden_dim)
    self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
    self.params['b2'] = np.zeros(num_classes)
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################


  def loss(self, X, y=None):
    """
    Compute loss and gradient for a minibatch of data.

    Inputs:
    - X: Array of input data of shape (N, d_1, ..., d_k)
    - y: Array of labels, of shape (N,). y[i] gives the label for X[i].

    Returns:
    If y is None, then run a test-time forward pass of the model and return:
    - scores: Array of shape (N, C) giving classification scores, where
      scores[i, c] is the classification score for X[i] and class c.

    If y is not None, then run a training-time forward and backward pass and
    return a tuple of:
    - loss: Scalar value giving the loss
    - grads: Dictionary with the same keys as self.params, mapping parameter
      names to gradients of the loss with respect to those parameters.
    """
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the two-layer net, computing the    #
    # class scores for X and storing them in the scores variable.              #
    ############################################################################
    a2, cache1 = affine_relu_forward(X, self.params['W1'], self.params['b1'])
    scores, cache2 = affine_forward(a2, self.params['W2'], self.params['b2'])
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # If y is None then we are in test mode so just return scores
    if y is None:
      return scores

    loss, grads = 0, {}
    ############################################################################
    # TODO: Implement the backward pass for the two-layer net. Store the loss  #
    # in the loss variable and gradients in the grads dictionary. Compute data #
    # loss using softmax, and make sure that grads[k] holds the gradients for  #
    # self.params[k]. Don't forget to add L2 regularization!                   #
    #                                                                          #
    # NOTE: To ensure that your implementation matches ours and you pass the   #
    # automated tests, make sure that your L2 regularization includes a factor #
    # of 0.5 to simplify the expression for the gradient.                      #
    ############################################################################
    loss_without_reg, dscores = softmax_loss(scores, y)
    loss = loss_without_reg + 0.5 * self.reg * (np.sum(self.params['W1']**2) + \
                                          np.sum(self.params['W2']**2))
    da2, grads['W2'], grads['b2'] = affine_backward(dscores, cache2)
    grads['W2'] += self.reg * cache2[1]     # cache2 = (a2, W2, b2), so cache2[1] is W2
    dx, grads['W1'], grads['b1'] = affine_relu_backward(da2, cache1)
    grads['W1'] += self.reg * cache1[0][1]  # cache1[0] = (X, W1, b1), so cache1[0][1] is W1
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    return loss, grads


class FullyConnectedNet(object):
  """
  A fully-connected neural network with an arbitrary number of hidden layers,
  ReLU nonlinearities, and a softmax loss function. This will also implement
  dropout and batch normalization as options. For a network with L layers,
  the architecture will be

  {affine - [batch norm] - relu - [dropout]} x (L - 1) - affine - softmax

  where batch normalization and dropout are optional, and the {...} block is
  repeated L - 1 times.

  Similar to the TwoLayerNet above, learnable parameters are stored in the
  self.params dictionary and will be learned using the Solver class.
  """

  def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
               dropout=0, use_batchnorm=False, reg=0.0,
               weight_scale=1e-2, dtype=np.float32, seed=None):
    """
    Initialize a new FullyConnectedNet.

    Inputs:
    - hidden_dims: A list of integers giving the size of each hidden layer.
    - input_dim: An integer giving the size of the input.
    - num_classes: An integer giving the number of classes to classify.
    - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=0 then
      the network should not use dropout at all.
    - use_batchnorm: Whether or not the network should use batch normalization.
    - reg: Scalar giving L2 regularization strength.
    - weight_scale: Scalar giving the standard deviation for random
      initialization of the weights.
    - dtype: A numpy datatype object; all computations will be performed using
      this datatype. float32 is faster but less accurate, so you should use
      float64 for numeric gradient checking.
    - seed: If not None, then pass this random seed to the dropout layers. This
      will make the dropout layers deterministic so we can gradient check the
      model.
    """
    self.use_batchnorm = use_batchnorm
    self.use_dropout = dropout > 0
    self.reg = reg
    self.num_layers = 1 + len(hidden_dims)
    self.dtype = dtype
    self.params = {}

    ############################################################################
    # TODO: Initialize the parameters of the network, storing all values in    #
    # the self.params dictionary. Store weights and biases for the first layer #
    # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be #
    # initialized from a normal distribution with standard deviation equal to  #
    # weight_scale and biases should be initialized to zero.                   #
    #                                                                          #
    # When using batch normalization, store scale and shift parameters for the #
    # first layer in gamma1 and beta1; for the second layer use gamma2 and     #
    # beta2, etc. Scale parameters should be initialized to one and shift      #
    # parameters should be initialized to zero.                                #
    ############################################################################
    for i in xrange(self.num_layers):
      if i == 0:
        # initialize the first affine layer
        self.params['W'+str(i+1)] = \
            weight_scale * np.random.randn(input_dim, hidden_dims[i])
        self.params['b'+str(i+1)] = np.zeros(hidden_dims[i])
        # initialize batch norm parameters for the first layer
        if self.use_batchnorm:
          self.params['gamma'+str(i+1)] = np.ones(hidden_dims[i])
          self.params['beta'+str(i+1)] = np.zeros(hidden_dims[i])
      elif i == self.num_layers-1:
        # initialize the last affine layer
        self.params['W'+str(i+1)] = \
            weight_scale * np.random.randn(hidden_dims[i-1], num_classes)
        self.params['b'+str(i+1)] = np.zeros(num_classes)
      else:
        # initialize the intermediate affine layers
        self.params['W'+str(i+1)] = \
            weight_scale * np.random.randn(hidden_dims[i-1], hidden_dims[i])
        self.params['b'+str(i+1)] = np.zeros(hidden_dims[i])
        # initialize batch norm parameters for the intermediate layers
        if self.use_batchnorm:
          self.params['gamma'+str(i+1)] = np.ones(hidden_dims[i])
          self.params['beta'+str(i+1)] = np.zeros(hidden_dims[i])
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # When using dropout we need to pass a dropout_param dictionary to each
    # dropout layer so that the layer knows the dropout probability and the mode
    # (train / test). You can pass the same dropout_param to each dropout layer.
    self.dropout_param = {}
    if self.use_dropout:
      self.dropout_param = {'mode': 'train', 'p': dropout}
      if seed is not None:
        self.dropout_param['seed'] = seed

    # With batch normalization we need to keep track of running means and
    # variances, so we need to pass a special bn_param object to each batch
    # normalization layer. You should pass self.bn_params[0] to the forward pass
    # of the first batch normalization layer, self.bn_params[1] to the forward
    # pass of the second batch normalization layer, etc.
    self.bn_params = []
    if self.use_batchnorm:
      self.bn_params = [{'mode': 'train'} for i in xrange(self.num_layers - 1)]

    # Cast all parameters to the correct datatype
    for k, v in self.params.iteritems():
      self.params[k] = v.astype(dtype)


  def loss(self, X, y=None):
    """
    Compute loss and gradient for the fully-connected net.

    Input / output: Same as TwoLayerNet above.
    """
    X = X.astype(self.dtype)
    mode = 'test' if y is None else 'train'

    # Set train/test mode for batchnorm params and dropout param since they
    # behave differently during training and testing.
    if self.dropout_param is not None:
      self.dropout_param['mode'] = mode
    if self.use_batchnorm:
      for bn_param in self.bn_params:
        bn_param['mode'] = mode
        # bn_param[mode] = mode
        ## I think the original starter code has a typo here: the dictionary key
        ## needs quotes, i.e. it should be bn_param['mode'] as written above.
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the fully-connected net, computing  #
    # the class scores for X and storing them in the scores variable.          #
    #                                                                          #
    # When using dropout, you'll need to pass self.dropout_param to each       #
    # dropout forward pass.                                                    #
    #                                                                          #
    # When using batch normalization, you'll need to pass self.bn_params[0] to #
    # the forward pass for the first batch normalization layer, pass           #
    # self.bn_params[1] to the forward pass for the second batch normalization #
    # layer, etc.                                                              #
    ############################################################################
    # This is overly verbose and inefficient (see the note at the top of the post).
    if self.use_batchnorm and self.use_dropout:
      a = []
      a.append(X)
      cache = []
      for i in xrange(self.num_layers):
        if i==self.num_layers-1:
          scores, cache_last = affine_forward(a[i], \
                                self.params['W'+str(i+1)], \
                                self.params['b'+str(i+1)])
          cache.append(cache_last)
        else:
          a_out_i, cache_i = affine_bn_relu_dp_forward(a[i], \
                                        self.params['W'+str(i+1)], \
                                        self.params['b'+str(i+1)],\
                                        self.params['gamma'+str(i+1)],\
                                        self.params['beta'+str(i+1)],\
                                        self.bn_params[i],\
                                        self.dropout_param)
          a.append(a_out_i)
          cache.append(cache_i)
    elif self.use_batchnorm and not self.use_dropout:
      a = []
      a.append(X)
      cache = []
      for i in xrange(self.num_layers):
        if i==self.num_layers-1:
          scores, cache_last = affine_forward(a[i], \
                                self.params['W'+str(i+1)], \
                                self.params['b'+str(i+1)])
          cache.append(cache_last)
        else:
          a_out_i, cache_i = affine_bn_relu_forward(a[i], \
                                        self.params['W'+str(i+1)], \
                                        self.params['b'+str(i+1)],\
                                        self.params['gamma'+str(i+1)],\
                                        self.params['beta'+str(i+1)],\
                                        self.bn_params[i])
          a.append(a_out_i)
          cache.append(cache_i)
    elif self.use_dropout and not self.use_batchnorm:
      a = []
      a.append(X)
      cache = []
      for i in xrange(self.num_layers):
        if i==self.num_layers-1:
          scores, cache_last = affine_forward(a[i], \
                                self.params['W'+str(i+1)], \
                                self.params['b'+str(i+1)])
          cache.append(cache_last)
        else:
          a_out_i, cache_i = affine_relu_forward(a[i], \
                                        self.params['W'+str(i+1)], \
                                        self.params['b'+str(i+1)])
          # add dropout layer
          d_out_i, dp_cache= dropout_forward(a_out_i, self.dropout_param)

          a.append(d_out_i)
          ad_dp_cache = (cache_i, dp_cache)
          cache.append(ad_dp_cache)
    else:
      a = []
      a.append(X)
      cache = []
      for i in xrange(self.num_layers):
        if i==self.num_layers-1:
          scores, cache_last = affine_forward(a[i], \
                                self.params['W'+str(i+1)], \
                                self.params['b'+str(i+1)])
          cache.append(cache_last)
        else:
          a_out_i, cache_i = affine_relu_forward(a[i], \
                                        self.params['W'+str(i+1)], \
                                        self.params['b'+str(i+1)])
          a.append(a_out_i)
          cache.append(cache_i)
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # If test mode return early
    if mode == 'test':
      return scores

    loss, grads = 0.0, {}
    ############################################################################
    # TODO: Implement the backward pass for the fully-connected net. Store the #
    # loss in the loss variable and gradients in the grads dictionary. Compute #
    # data loss using softmax, and make sure that grads[k] holds the gradients #
    # for self.params[k]. Don't forget to add L2 regularization!               #
    #                                                                          #
    # When using batch normalization, you don't need to regularize the scale   #
    # and shift parameters.                                                    #
    #                                                                          #
    # NOTE: To ensure that your implementation matches ours and you pass the   #
    # automated tests, make sure that your L2 regularization includes a factor #
    # of 0.5 to simplify the expression for the gradient.                      #
    ############################################################################
    # This is overly verbose and inefficient (see the note at the top of the post).
    # In the regularization terms below, cache[i][1] (plain affine layer) and
    # cache[i][0][1] (composite layers) both index the weight matrix W stored in the cache.
    if self.use_batchnorm and self.use_dropout:
      loss, dscores = softmax_loss(scores, y)
      for i in xrange(self.num_layers):
        loss += 0.5 * self.reg * np.sum(self.params['W'+str(i+1)]**2)

      dout = []
      dout.append(dscores)
      for i in reversed(xrange(self.num_layers)):
        #print len(cache[i])
        if i==self.num_layers-1:
          dout_i, dw, db = affine_backward(dout[self.num_layers-i-1], cache[i])
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][1]
          grads['b'+str(i+1)] = db
        else:
          dout_i, dw, db, dgamma, dbeta = \
                    affine_bn_relu_dp_backward(dout[self.num_layers-i-1], cache[i])

          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][0][1]
          grads['b'+str(i+1)] = db
          grads['gamma'+str(i+1)] = dgamma
          grads['beta'+str(i+1)] = dbeta
    elif self.use_batchnorm and not self.use_dropout:
      loss, dscores = softmax_loss(scores, y)
      for i in xrange(self.num_layers):
        loss += 0.5 * self.reg * np.sum(self.params['W'+str(i+1)]**2)

      dout = []
      dout.append(dscores)
      for i in reversed(xrange(self.num_layers)):
        #print len(cache[i])
        if i==self.num_layers-1:
          dout_i, dw, db = affine_backward(dout[self.num_layers-i-1], cache[i])
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][1]
          grads['b'+str(i+1)] = db
        else:
          dout_i, dw, db, dgamma, dbeta = \
                    affine_bn_relu_backward(dout[self.num_layers-i-1], cache[i])

          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][0][1]
          grads['b'+str(i+1)] = db
          grads['gamma'+str(i+1)] = dgamma
          grads['beta'+str(i+1)] = dbeta
    elif self.use_dropout and not self.use_batchnorm:
      loss, dscores = softmax_loss(scores, y)
      for i in xrange(self.num_layers):
        loss += 0.5 * self.reg * np.sum(self.params['W'+str(i+1)]**2)

      dout = []
      dout.append(dscores)
      for i in reversed(xrange(self.num_layers)):
        #print len(cache[i])
        if i==self.num_layers-1:
          dout_i, dw, db = affine_backward(dout[self.num_layers-i-1], cache[i])
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][1]
          grads['b'+str(i+1)] = db
        else:
          cache_i, dp_cache = cache[i]
          dout_ar = dropout_backward(dout[self.num_layers-i-1], dp_cache)

          dout_i, dw, db = affine_relu_backward(dout_ar, cache_i)
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache_i[0][1]
          grads['b'+str(i+1)] = db
    else:
      loss, dscores = softmax_loss(scores, y)
      for i in xrange(self.num_layers):
        loss += 0.5 * self.reg * np.sum(self.params['W'+str(i+1)]**2)

      dout = []
      dout.append(dscores)
      for i in reversed(xrange(self.num_layers)):
        #print len(cache[i])
        if i==self.num_layers-1:
          dout_i, dw, db = affine_backward(dout[self.num_layers-i-1], cache[i])
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][1]
          grads['b'+str(i+1)] = db
        else:
          dout_i, dw, db = affine_relu_backward(dout[self.num_layers-i-1], cache[i])
          dout.append(dout_i)
          grads['W'+str(i+1)] = dw + self.reg * cache[i][0][1]
          grads['b'+str(i+1)] = db
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    return loss, grads
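
The forward and backward passes above call affine_bn_relu_forward, affine_bn_relu_backward, affine_bn_relu_dp_forward and affine_bn_relu_dp_backward, which I keep in layer_utils.py (posted separately). For reference, here is a minimal sketch of what those helpers could look like. It assumes the standard layer signatures from cs231n/layers.py (affine_forward/backward, batchnorm_forward/backward, relu_forward/backward, dropout_forward/backward) and puts fc_cache first in the cache tuple, so that cache[0][1] is the weight matrix, matching the regularization indexing used above.

# Sketch only; assumes `from cs231n.layers import *` as at the top of layer_utils.py.

def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):
  """Convenience layer: affine -> batch norm -> ReLU."""
  a, fc_cache = affine_forward(x, w, b)
  a_bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)
  out, relu_cache = relu_forward(a_bn)
  return out, (fc_cache, bn_cache, relu_cache)

def affine_bn_relu_backward(dout, cache):
  """Backward pass for the affine - batch norm - ReLU convenience layer."""
  fc_cache, bn_cache, relu_cache = cache
  da_bn = relu_backward(dout, relu_cache)
  da, dgamma, dbeta = batchnorm_backward(da_bn, bn_cache)
  dx, dw, db = affine_backward(da, fc_cache)
  return dx, dw, db, dgamma, dbeta

def affine_bn_relu_dp_forward(x, w, b, gamma, beta, bn_param, dropout_param):
  """Convenience layer: affine -> batch norm -> ReLU -> dropout."""
  out, (fc_cache, bn_cache, relu_cache) = \
      affine_bn_relu_forward(x, w, b, gamma, beta, bn_param)
  out, dp_cache = dropout_forward(out, dropout_param)
  return out, (fc_cache, bn_cache, relu_cache, dp_cache)

def affine_bn_relu_dp_backward(dout, cache):
  """Backward pass for the affine - batch norm - ReLU - dropout convenience layer."""
  fc_cache, bn_cache, relu_cache, dp_cache = cache
  ddp = dropout_backward(dout, dp_cache)
  return affine_bn_relu_backward(ddp, (fc_cache, bn_cache, relu_cache))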

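As noted at the top of the post, the four nearly identical branches in FullyConnectedNet.loss could be collapsed into one loop. The sketch below is a hypothetical drop-in replacement for that method (not the version used above): it relies on the same layer functions, keeps the self.params naming, and takes the weights for the regularization gradient from self.params instead of digging them out of the cache.

  # Sketch only: a possible single-loop rewrite of FullyConnectedNet.loss.
  def loss(self, X, y=None):
    X = X.astype(self.dtype)
    mode = 'test' if y is None else 'train'
    if self.use_dropout:
      self.dropout_param['mode'] = mode
    if self.use_batchnorm:
      for bn_param in self.bn_params:
        bn_param['mode'] = mode

    # Forward pass: {affine - [batch norm] - relu - [dropout]} x (L - 1), then affine.
    caches = []
    out = X
    for i in xrange(1, self.num_layers):
      out, fc_cache = affine_forward(out, self.params['W%d' % i], self.params['b%d' % i])
      bn_cache = dp_cache = None
      if self.use_batchnorm:
        out, bn_cache = batchnorm_forward(out, self.params['gamma%d' % i],
                                          self.params['beta%d' % i], self.bn_params[i-1])
      out, relu_cache = relu_forward(out)
      if self.use_dropout:
        out, dp_cache = dropout_forward(out, self.dropout_param)
      caches.append((fc_cache, bn_cache, relu_cache, dp_cache))
    L = self.num_layers
    scores, last_cache = affine_forward(out, self.params['W%d' % L], self.params['b%d' % L])

    if mode == 'test':
      return scores

    # Backward pass: softmax loss plus L2 regularization on every weight matrix.
    loss, dout = softmax_loss(scores, y)
    for i in xrange(1, L + 1):
      loss += 0.5 * self.reg * np.sum(self.params['W%d' % i] ** 2)

    grads = {}
    dout, dw, db = affine_backward(dout, last_cache)
    grads['W%d' % L] = dw + self.reg * self.params['W%d' % L]
    grads['b%d' % L] = db
    for i in reversed(xrange(1, L)):
      fc_cache, bn_cache, relu_cache, dp_cache = caches[i-1]
      if self.use_dropout:
        dout = dropout_backward(dout, dp_cache)
      dout = relu_backward(dout, relu_cache)
      if self.use_batchnorm:
        dout, grads['gamma%d' % i], grads['beta%d' % i] = batchnorm_backward(dout, bn_cache)
      dout, dw, db = affine_backward(dout, fc_cache)
      grads['W%d' % i] = dw + self.reg * self.params['W%d' % i]
      grads['b%d' % i] = db
    return loss, grads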