cs231n: assignment2, Q1: Fully-connected Neural Network

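In the lecture video, Andrej Karpathy described this assignment as "meaty but educational", and it really is meaty. The assignments usually consist of .ipynb files and .py files. This time each .ipynb file involves quite a few .py files, and they overlap with one another, so each blog post contains only one .ipynb file or one .py file. (In the earlier assignments each .ipynb file corresponded to a single .py file, so I combined them into one post.)
As always, if you spot any mistakes, please point them out. Any advice is welcome, thanks.
Contents of FullyConnectedNets.ipynb: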


Fully-Connected Neural Nets

In the previous homework you implemented a fully-connected two-layer neural network on CIFAR-10. The implementation was simple but not very modular since the loss and gradient were computed in a single monolithic function. This is manageable for a simple two-layer network, but would become impractical as we move to bigger models. Ideally we want to build networks using a more modular design so that we can implement different layer types in isolation and then snap them together into models with different architectures.

In this exercise we will implement fully-connected networks using a more modular approach. For each layer we will implement a forward and a backward function. The forward function will receive inputs, weights, and other parameters and will return both an output and a cache object storing data needed for the backward pass, like this:

def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = # ... some intermediate value
  # Do some more computations ...
  out = # the output
   
  cache = (x, w, z, out) # Values we need to compute gradients
   
  return out, cache

The backward pass will receive upstream derivatives and the cache object, and will return gradients with respect to the inputs and weights, like this:

def layer_backward(dout, cache):
  """
  Receive derivative of loss with respect to outputs and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache
  
  # Use values in cache to compute derivatives
  dx = # Derivative of loss with respect to x
  dw = # Derivative of loss with respect to w
  
  return dx, dw

After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures.
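
To make the forward/backward pattern above concrete, here is a minimal sketch of one toy layer written in that style. The layer itself (out = (x * w) ** 2, elementwise) and the names square_scale_forward and square_scale_backward are made up for illustration; they are not part of the assignment.

```python
import numpy as np

def square_scale_forward(x, w):
    """Hypothetical toy layer: out = (x * w) ** 2, computed elementwise."""
    z = x * w            # intermediate value
    out = z ** 2         # the output
    cache = (x, w, z)    # values needed to compute gradients
    return out, cache

def square_scale_backward(dout, cache):
    """Backward pass for the toy layer: returns gradients w.r.t. x and w."""
    x, w, z = cache
    dz = dout * 2 * z    # d(out)/dz = 2 * z
    dx = dz * w          # dz/dx = w
    dw = dz * x          # dz/dw = x
    return dx, dw

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
out, cache = square_scale_forward(x, w)
# Upstream gradient of ones, as if the loss were sum(out)
dx, dw = square_scale_backward(np.ones_like(out), cache)
```

The cache carries exactly what the backward pass needs, so the two functions can be tested and reused independently of any particular network.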

In addition to implementing fully-connected networks of arbitrary depth, we will also explore different update rules for optimization, and introduce Dropout as a regularizer and Batch Normalization as a tool to more efficiently optimize deep networks.

# As usual, a bit of setup

import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in data.iteritems():
  print '%s: ' % k, v.shape
X_val:  (1000, 3, 32, 32)
X_train:  (49000, 3, 32, 32)
X_test:  (1000, 3, 32, 32)
y_val:  (1000,)
y_train:  (49000,)
y_test:  (1000,)

Affine layer: forward

Open the file cs231n/layers.py and implement the affine_forward function.

Once you are done, you can test your implementation by running the following:

# Test the affine_forward function

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around 1e-9.
print 'Testing affine_forward function:'
print 'difference: ', rel_error(out, correct_out)
Testing affine_forward function:
difference:  9.76985004799e-10
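
The contents of cs231n/layers.py are not part of this post, so the following is only a minimal sketch of what an affine_forward with the tested behavior could look like, not necessarily the author's version. It assumes that each input example is flattened into a row vector before the matrix multiply, and that the cache simply stores the inputs.

```python
import numpy as np

def affine_forward(x, w, b):
    """
    Sketch of an affine (fully-connected) forward pass.
    x: (N, d_1, ..., d_k) inputs; w: (D, M) weights; b: (M,) biases,
    with D = d_1 * ... * d_k. Returns out of shape (N, M) and a cache.
    """
    N = x.shape[0]
    x_rows = x.reshape(N, -1)     # flatten each example into a row of length D
    out = x_rows.dot(w) + b       # b is broadcast across the N rows
    cache = (x, w, b)             # assumed cache layout
    return out, cache

# Same inputs as the notebook test above
num_inputs, input_shape, output_dim = 2, (4, 5, 6), 3
input_size = int(num_inputs * np.prod(input_shape))
weight_size = int(output_dim * np.prod(input_shape))
x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(int(np.prod(input_shape)), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)
out, _ = affine_forward(x, w, b)
```

Keeping the original, unflattened x in the cache lets the backward pass reshape dx back to the input's shape.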

Affine layer: backward

Now implement the affine_backward function and test your implementation using numeric gradient checking.

# Test the affine_backward function

x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# The error should be around 1e-10
print 'Testing affine_backward function:'
print 'dx error: ', rel_error(dx_num, dx)
print 'dw error: ', rel_error(dw_num, dw)
print 'db error: ', rel_error(db_num, db)
Testing affine_backward function:
dx error:  5.82176848644e-11
dw error:  1.69054721917e-10
db error:  1.40577633097e-11
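
As a hedged illustration of what this check does, the sketch below pairs one possible affine_backward with a small centered-difference gradient estimator. The helper numeric_grad is a hypothetical stand-in for eval_numerical_gradient_array, whose actual implementation is not shown in this post, and the cache layout (x, w, b) is an assumption.

```python
import numpy as np

def affine_forward(x, w, b):
    out = x.reshape(x.shape[0], -1).dot(w) + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    """Sketch of the affine backward pass; dout has shape (N, M)."""
    x, w, b = cache
    N = x.shape[0]
    dx = dout.dot(w.T).reshape(x.shape)   # (N, M) x (M, D), then back to x's shape
    dw = x.reshape(N, -1).T.dot(dout)     # (D, N) x (N, M) -> (D, M)
    db = dout.sum(axis=0)                 # each bias contributes to every example
    return dx, dw, db

def numeric_grad(f, x, dout, h=1e-5):
    """Hypothetical helper: centered-difference estimate of d(sum(f(x) * dout))/dx."""
    grad = np.zeros_like(x)
    for ix in np.ndindex(*x.shape):
        old = x[ix]
        x[ix] = old + h
        pos = f(x)
        x[ix] = old - h
        neg = f(x)
        x[ix] = old                       # restore the original value
        grad[ix] = np.sum((pos - neg) * dout) / (2 * h)
    return grad

rng = np.random.RandomState(0)
x, w, b = rng.randn(10, 2, 3), rng.randn(6, 5), rng.randn(5)
dout = rng.randn(10, 5)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
dx_num = numeric_grad(lambda v: affine_forward(v, w, b)[0], x, dout)
dw_num = numeric_grad(lambda v: affine_forward(x, v, b)[0], w, dout)
db_num = numeric_grad(lambda v: affine_forward(x, w, v)[0], b, dout)
```

Because the affine layer is linear in each of its arguments, the centered difference is exact up to floating-point rounding, which is why the errors reported above are so small.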

ReLU layer: forward

Implement the forward pass for the ReLU activation function in the relu_forward function and test your implementation using the following:

# Test the relu_forward function

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)

out, _ = relu_forward(x)
correct_out = np.array([[ 0.,          0.,          0.,          0.,        ],
                        [ 0.,          0.,          0.04545455,  0.13636364,],
                        [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]])

# Compare your output with ours. The error should be around 1e-8
print 'Testing relu_forward function:'
print 'difference: ', rel_error(out, correct_out)
Testing relu_forward function:
difference:  4.99999979802e-08

ReLU layer: backward

Now implement the backward pass for the ReLU activation function in the relu_backward function and test your implementation using numeric gradient checking:

x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)

dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)

_, cache = relu_forward(x)
dx = relu_backward(dout, cache)

# The error should be around 1e-12
print 'Testing relu_backward function:'
print 'dx error: ', rel_error(dx_num, dx)
Testing relu_backward function:
dx error:  3.27562740606e-12
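
For reference, a minimal sketch of a ReLU forward/backward pair that is consistent with the two tests above. It assumes the cache is just the input x; the author's actual code in cs231n/layers.py may differ.

```python
import numpy as np

def relu_forward(x):
    """Elementwise max(0, x); the input is cached for the backward pass."""
    out = np.maximum(0, x)
    cache = x
    return out, cache

def relu_backward(dout, cache):
    """The gradient passes through only where the input was positive."""
    x = cache
    dx = dout * (x > 0)
    return dx

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)
out, cache = relu_forward(x)
dx = relu_backward(np.ones_like(x), cache)
```

With this input, the first six entries are negative and the last six are positive, so exactly half of the upstream gradient survives.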

"Sandwich" layers

There are some common patterns of layers that are frequently used in neural nets. For example, affine layers are frequently followed by a ReLU nonlinearity. To make these common patterns easy, we define several convenience layers in the file cs231n/layer_utils.py.

For now take a look at the affine_relu_forward and affine_relu_backward functions, and run the following to numerically gradient check the backward pass:

from cs231n.layer_utils import affine_relu_forward, affine_relu_backward

x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)

print 'Testing affine_relu_forward:'
print 'dx error: ', rel_error(dx_num, dx)
print 'dw error: ', rel_error(dw_num, dw)
print 'db error: ', rel_error(db_num, db)
Testing affine_relu_forward:
dx error:  3.60036208641e-10
dw error:  2.61229361266e-09
db error:  4.99397627854e-12

Loss layers: Softmax and SVM

You implemented these loss functions in the last assignment, so we'll give them to you for free here. You should still make sure you understand how they work by looking at the implementations in cs231n/layers.py.

You can make sure that the implementations are correct by running the following:

num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)

dx_num = eval_numerical_gradient(lambda x: svm_loss(x, y)[0], x, verbose=False)
loss, dx = svm_loss(x, y)

# Test svm_loss function. Loss should be around 9 and dx error should be 1e-9
print 'Testing svm_loss:'
print 'loss: ', loss
print 'dx error: ', rel_error(dx_num, dx)

dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)
loss, dx = softmax_loss(x, y)

# Test softmax_loss function. Loss should be 2.3 and dx error should be 1e-8
print '\nTesting softmax_loss:'
print 'loss: ', loss
print 'dx error: ', rel_error(dx_num, dx)
Testing svm_loss:
loss:  9.00052703662
dx error:  1.40215660067e-09

Testing softmax_loss:
loss:  2.30263822083
dx error:  1.0369484028e-08
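
The two reference values in the comments (about 9 for SVM and about 2.3 for softmax) follow from the scores being nearly zero: each of the 9 wrong classes contributes a margin of about 1, and a uniform softmax over 10 classes gives -log(1/10). The sketch below shows this with all-zero scores. These are my own compact versions of the two losses, written under the usual definitions (hinge margin of 1, mean over the batch); they are not copied from cs231n/layers.py.

```python
import numpy as np

def svm_loss(x, y):
    """Multiclass hinge loss with margin 1. x: (N, C) scores, y: (N,) labels."""
    N = x.shape[0]
    rows = np.arange(N)
    correct = x[rows, y][:, None]
    margins = np.maximum(0, x - correct + 1.0)
    margins[rows, y] = 0                       # the correct class has no margin
    loss = margins.sum() / N
    dx = (margins > 0).astype(x.dtype)
    dx[rows, y] -= dx.sum(axis=1)              # correct class collects -count
    return loss, dx / N

def softmax_loss(x, y):
    """Cross-entropy loss on softmax probabilities, averaged over the batch."""
    N = x.shape[0]
    rows = np.arange(N)
    shifted = x - x.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[rows, y].mean()
    dx = np.exp(log_probs)
    dx[rows, y] -= 1
    return loss, dx / N

num_classes, num_inputs = 10, 50
scores = np.zeros((num_inputs, num_classes))
labels = np.random.RandomState(0).randint(num_classes, size=num_inputs)
svm_value, svm_dx = svm_loss(scores, labels)
softmax_value, softmax_dx = softmax_loss(scores, labels)
```

With exactly zero scores the SVM loss is 9 and the softmax loss is log(10), which is about 2.3026; the small random scores used in the notebook land very close to both.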

Two-layer network

In the previous assignment you implemented a two-layer neural network in a single monolithic class. Now that you have implemented modular versions of the necessary layers, you will reimplement the two layer network using these modular implementations.

Open the file cs231n/classifiers/fc_net.py and complete the implementation of the TwoLayerNet class. This class will serve as a model for the other networks you will implement in this assignment, so read through it to make sure you understand the API. You can run the cell below to test your implementation.

N, D, H, C = 3, 5, 50, 7
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

std = 1e-2
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)

print 'Testing initialization ... '
W1_std = abs(model.params['W1'].std() - std)
b1 = model.params['b1']
W2_std = abs(model.params['W2'].std() - std)
b2 = model.params['b2']
assert W1_std < std / 10, 'First layer weights do not seem right'
assert np.all(b1 == 0), 'First layer biases do not seem right'
assert W2_std < std / 10, 'Second layer weights do not seem right'
assert np.all(b2 == 0), 'Second layer biases do not seem right'

print 'Testing test-time forward pass ... '
model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)
model.params['b1'] = np.linspace(-0.1, 0.9, num=H)
model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)
model.params['b2'] = np.linspace(-0.9, 0.1, num=C)
X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T
scores = model.loss(X)
correct_scores = np.asarray(
  [[11.53165108,  12.2917344,   13.05181771,  13.81190102,  14.57198434, 15.33206765,  16.09215096],
   [12.05769098,  12.74614105,  13.43459113,  14.1230412,   14.81149128, 15.49994135,  16.18839143],
   [12.58373087,  13.20054771,  13.81736455,  14.43418138,  15.05099822, 15.66781506,  16.2846319 ]])
scores_diff = np.abs(scores - correct_scores).sum()
assert scores_diff < 1e-6, 'Problem with test-time forward pass'

print 'Testing training loss (no regularization)'
y = np.asarray([0, 5, 1])
loss, grads = model.loss(X, y)
correct_loss = 3.4702243556
assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'

model.reg = 1.0
loss, grads = model.loss(X, y)
correct_loss = 26.5948426952
assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'

for reg in [0.0, 0.7]:
  print 'Running numeric gradient check with reg = ', reg
  model.reg = reg
  loss, grads = model.loss(X, y)

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)
    print '%s relative error: %.2e' % (name, rel_error(grad_num, grads[name]))
Testing initialization ... 
Testing test-time forward pass ... 
Testing training loss (no regularization)
Running numeric gradient check with reg =  0.0
W1 relative error: 1.22e-08
W2 relative error: 3.34e-10
b1 relative error: 4.73e-09
b2 relative error: 4.33e-10
Running numeric gradient check with reg =  0.7
W1 relative error: 2.53e-07
W2 relative error: 1.37e-07
b1 relative error: 1.56e-08
b2 relative error: 9.09e-10
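
To show how the modular pieces fit together, here is a compact sketch of the affine -> ReLU -> affine -> softmax computation that the TwoLayerNet loss has to perform. It is written as a standalone function (two_layer_loss is a hypothetical name) with the layers inlined so that it runs on its own; it is not the TwoLayerNet class from cs231n/classifiers/fc_net.py. The 0.5 factor on the L2 term is an assumption, chosen because it reproduces the regularized loss value in the test above.

```python
import numpy as np

def two_layer_loss(params, X, y=None, reg=0.0):
    """Returns scores if y is None, otherwise (loss, grads)."""
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]
    rows = np.arange(N)
    X_rows = X.reshape(N, -1)

    # Forward pass: affine -> ReLU -> affine
    a1 = X_rows.dot(W1) + b1
    h1 = np.maximum(0, a1)
    scores = h1.dot(W2) + b2
    if y is None:
        return scores

    # Softmax loss plus L2 regularization (0.5 factor assumed)
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[rows, y].mean()
    loss += 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))

    # Backward pass, in reverse order of the forward pass
    dscores = np.exp(log_probs)
    dscores[rows, y] -= 1
    dscores /= N
    grads = {}
    grads['W2'] = h1.T.dot(dscores) + reg * W2
    grads['b2'] = dscores.sum(axis=0)
    da1 = dscores.dot(W2.T) * (a1 > 0)        # ReLU gate
    grads['W1'] = X_rows.T.dot(da1) + reg * W1
    grads['b1'] = da1.sum(axis=0)
    return loss, grads

# Same deterministic inputs as the notebook test above
N, D, H, C = 3, 5, 50, 7
params = {
    'W1': np.linspace(-0.7, 0.3, num=D * H).reshape(D, H),
    'b1': np.linspace(-0.1, 0.9, num=H),
    'W2': np.linspace(-0.3, 0.4, num=H * C).reshape(H, C),
    'b2': np.linspace(-0.9, 0.1, num=C),
}
X = np.linspace(-5.5, 4.5, num=N * D).reshape(D, N).T
y = np.asarray([0, 5, 1])
scores = two_layer_loss(params, X)
loss, grads = two_layer_loss(params, X, y)
loss_reg, _ = two_layer_loss(params, X, y, reg=1.0)
```

In the real class the same steps would go through affine_relu_forward, affine_forward, softmax_loss and their backward counterparts instead of being inlined.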

Solver

In the previous assignment, the logic for training models was coupled to the models themselves. Following a more modular design, for this assignment we have split the logic for training models into a separate class.

Open the file cs231n/solver.py and read through it to familiarize yourself with the API. After doing so, use a Solver instance to train a TwoLayerNet that achieves at least 50% accuracy on the validation set.

model = TwoLayerNet()
solver = None

##############################################################################
# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least  #
# 50% accuracy on the validation set.                                        #
##############################################################################
solver = Solver(model, data,
                  update_rule='sgd',
                  optim_config={
                    'learning_rate': 1e-3,
                  },
                  lr_decay=0.95,
                  num_epochs=10, batch_size=100,
                  print_every=100)
solver.train()
solver.best_val_acc
##############################################################################
#                             END OF YOUR CODE                               #
##############################################################################
(Iteration 1 / 4900) loss: 2.309509
(Epoch 0 / 10) train acc: 0.111000; val_acc: 0.124000
(Iteration 101 / 4900) loss: 2.031418
(Iteration 201 / 4900) loss: 1.712236
(Iteration 301 / 4900) loss: 1.747420
(Iteration 401 / 4900) loss: 1.549451
(Epoch 1 / 10) train acc: 0.450000; val_acc: 0.414000
(Iteration 501 / 4900) loss: 1.630659
(Iteration 601 / 4900) loss: 1.491387
(Iteration 701 / 4900) loss: 1.442918
(Iteration 801 / 4900) loss: 1.351634
(Iteration 901 / 4900) loss: 1.453418
(Epoch 2 / 10) train acc: 0.491000; val_acc: 0.484000
(Iteration 1001 / 4900) loss: 1.485202
(Iteration 1101 / 4900) loss: 1.383021
(Iteration 1201 / 4900) loss: 1.346942
(Iteration 1301 / 4900) loss: 1.252413
(Iteration 1401 / 4900) loss: 1.537722
(Epoch 3 / 10) train acc: 0.521000; val_acc: 0.480000
(Iteration 1501 / 4900) loss: 1.365271
(Iteration 1601 / 4900) loss: 1.123946
(Iteration 1701 / 4900) loss: 1.315114
(Iteration 1801 / 4900) loss: 1.597782
(Iteration 1901 / 4900) loss: 1.416204
(Epoch 4 / 10) train acc: 0.546000; val_acc: 0.494000
(Iteration 2001 / 4900) loss: 1.114552
(Iteration 2101 / 4900) loss: 1.377966
(Iteration 2201 / 4900) loss: 1.121448
(Iteration 2301 / 4900) loss: 1.306290
(Iteration 2401 / 4900) loss: 1.404830
(Epoch 5 / 10) train acc: 0.559000; val_acc: 0.500000
(Iteration 2501 / 4900) loss: 1.123347
(Iteration 2601 / 4900) loss: 1.449507
(Iteration 2701 / 4900) loss: 1.308397
(Iteration 2801 / 4900) loss: 1.375048
(Iteration 2901 / 4900) loss: 1.259040
(Epoch 6 / 10) train acc: 0.572000; val_acc: 0.491000
(Iteration 3001 / 4900) loss: 1.119232
(Iteration 3101 / 4900) loss: 1.270312
(Iteration 3201 / 4900) loss: 1.204007
(Iteration 3301 / 4900) loss: 1.214074
(Iteration 3401 / 4900) loss: 1.110863
(Epoch 7 / 10) train acc: 0.566000; val_acc: 0.514000
(Iteration 3501 / 4900) loss: 1.253669
(Iteration 3601 / 4900) loss: 1.354838
(Iteration 3701 / 4900) loss: 1.299770
(Iteration 3801 / 4900) loss: 1.184324
(Iteration 3901 / 4900) loss: 1.154244
(Epoch 8 / 10) train acc: 0.594000; val_acc: 0.498000
(Iteration 4001 / 4900) loss: 0.911092
(Iteration 4101 / 4900) loss: 1.154072
(Iteration 4201 / 4900) loss: 1.106225
(Iteration 4301 / 4900) loss: 1.279295
(Iteration 4401 / 4900) loss: 1.046316
(Epoch 9 / 10) train acc: 0.611000; val_acc: 0.503000
(Iteration 4501 / 4900) loss: 1.172954
(Iteration 4601 / 4900) loss: 1.040094
(Iteration 4701 / 4900) loss: 1.369539
(Iteration 4801 / 4900) loss: 1.106506
(Epoch 10 / 10) train acc: 0.588000; val_acc: 0.515000


0.51500000000000001
# Run this cell to visualize training loss and train / val accuracy

plt.subplot(2, 1, 1)
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()
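[Figure: training loss per iteration (top); train and val accuracy per epoch with a dashed line at 0.5 (bottom)]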

Multilayer network

Next you will implement a fully-connected network with an arbitrary number of hidden layers.

Read through the FullyConnectedNet class in the file cs231n/classifiers/fc_net.py.

Implement the initialization, the forward pass, and the backward pass. For the moment don't worry about implementing dropout or batch normalization; we will add those features soon.

Initial loss and gradient check

As a sanity check, run the following to check the initial loss and to gradient check the network both with and without regularization. Do the initial losses seem reasonable?

For gradient checking, you should expect to see errors around 1e-6 or less.

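# Sometimes a relative error is fairly large, up to the order of 1e-2, but after a few more runs the relative errors of all parameters are small; this is probably an effect of the random parameter initialization.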
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for reg in [0, 3.14,0.02]:
  print 'Running check with reg = ', reg
  model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                            reg=reg, weight_scale=5e-2, dtype=np.float64)

  loss, grads = model.loss(X, y)
  print 'Initial loss: ', loss

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print '%s relative error: %.2e' % (name, rel_error(grad_num, grads[name]))
Running check with reg =  0
Initial loss:  2.29966459663
W1 relative error: 2.92e-07
W2 relative error: 2.17e-05
W3 relative error: 4.38e-08
b1 relative error: 3.54e-08
b2 relative error: 1.45e-08
b3 relative error: 1.31e-10
Running check with reg =  3.14
Initial loss:  6.71836699258
W1 relative error: 2.65e-07
W2 relative error: 2.28e-07
W3 relative error: 3.79e-06
b1 relative error: 7.94e-09
b2 relative error: 1.73e-08
b3 relative error: 2.05e-10
Running check with reg =  0.02
Initial loss:  2.32843212504
W1 relative error: 1.19e-07
W2 relative error: 1.47e-06
W3 relative error: 8.67e-06
b1 relative error: 2.08e-08
b2 relative error: 1.21e-02
b3 relative error: 1.39e-10
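
On the question of whether the initial losses seem reasonable, a rough back-of-the-envelope estimate can be computed as follows. It assumes that the initial scores are close to uniform, so the softmax data loss is about log(C), that the weights are drawn from a zero-mean Gaussian with standard deviation weight_scale, and that the L2 term is 0.5 * reg * sum(W ** 2). These are assumptions about the implementation, which is not shown in this post.

```python
import numpy as np

D, H1, H2, C = 15, 20, 30, 10
weight_scale = 5e-2
num_weights = D * H1 + H1 * H2 + H2 * C     # biases start at zero, so they add nothing

data_loss = np.log(C)                       # softmax over C roughly equal scores
expected_sq_sum = num_weights * weight_scale ** 2   # E[sum of W ** 2]

expected_loss = {}
for reg in [0, 3.14, 0.02]:
    expected_loss[reg] = data_loss + 0.5 * reg * expected_sq_sum
    print('reg = %s: expected initial loss ~ %.2f' % (reg, expected_loss[reg]))
```

This gives roughly 2.30, 7.01 and 2.33, which is in line with the observed 2.30, 6.72 and 2.33; the gap at reg = 3.14 is within what one random draw of 1200 weights can produce.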

As another sanity check, make sure you can overfit a small dataset of 50 images. First we will try a three-layer network with 100 units in each hidden layer. You will need to tweak the learning rate and initialization scale, but you should be able to overfit and achieve 100% training accuracy within 20 epochs.

# TODO: Use a three-layer Net to overfit 50 training examples.

num_train = 50
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

#weight_scale = 1e-2
#learning_rate = 1e-4
weight_scale = 4e-2
learning_rate = 1e-3
model = FullyConnectedNet([100, 100],
              weight_scale=weight_scale, dtype=np.float64)
solver = Solver(model, small_data,
                print_every=10, num_epochs=20, batch_size=25,
                update_rule='sgd',
                optim_config={
                  'learning_rate': learning_rate,
                }
         )
solver.train()

plt.plot(solver.loss_history, 'o')
plt.title('Training loss history')
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.show()
(Iteration 1 / 40) loss: 10.016980
(Epoch 0 / 20) train acc: 0.260000; val_acc: 0.110000
(Epoch 1 / 20) train acc: 0.280000; val_acc: 0.131000
(Epoch 2 / 20) train acc: 0.380000; val_acc: 0.130000
(Epoch 3 / 20) train acc: 0.540000; val_acc: 0.114000
(Epoch 4 / 20) train acc: 0.800000; val_acc: 0.110000
(Epoch 5 / 20) train acc: 0.880000; val_acc: 0.121000
(Iteration 11 / 40) loss: 0.474159
(Epoch 6 / 20) train acc: 0.940000; val_acc: 0.136000
(Epoch 7 / 20) train acc: 0.920000; val_acc: 0.143000
(Epoch 8 / 20) train acc: 1.000000; val_acc: 0.141000
(Epoch 9 / 20) train acc: 1.000000; val_acc: 0.140000
(Epoch 10 / 20) train acc: 1.000000; val_acc: 0.138000
(Iteration 21 / 40) loss: 0.049274
(Epoch 11 / 20) train acc: 1.000000; val_acc: 0.139000
(Epoch 12 / 20) train acc: 1.000000; val_acc: 0.141000
(Epoch 13 / 20) train acc: 1.000000; val_acc: 0.142000
(Epoch 14 / 20) train acc: 1.000000; val_acc: 0.141000
(Epoch 15 / 20) train acc: 1.000000; val_acc: 0.141000
(Iteration 31 / 40) loss: 0.011080
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.139000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.138000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.138000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.134000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.133000
[Figure: training loss history for the three-layer net (training loss vs. iteration)]

Now try to use a five-layer network with 100 units on each layer to overfit 50 training examples. Again you will have to adjust the learning rate and weight initialization, but you should be able to achieve 100% training accuracy within 20 epochs.

# TODO: Use a five-layer Net to overfit 50 training examples.

num_train = 50
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

# learning_rate = 1e-3
# weight_scale = 1e-5
learning_rate = 1e-3
weight_scale = 6e-2
model = FullyConnectedNet([100, 100, 100, 100],
                weight_scale=weight_scale, dtype=np.float64)
solver = Solver(model, small_data,
                print_every=10, num_epochs=20, batch_size=25,
                update_rule='sgd',
                optim_config={
                  'learning_rate': learning_rate,
                }
         )
solver.train()

plt.plot(solver.loss_history, 'o')
plt.title('Training loss history')
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.show()
(Iteration 1 / 40) loss: 8.242625
(Epoch 0 / 20) train acc: 0.040000; val_acc: 0.108000
(Epoch 1 / 20) train acc: 0.180000; val_acc: 0.119000
(Epoch 2 / 20) train acc: 0.260000; val_acc: 0.126000
(Epoch 3 / 20) train acc: 0.480000; val_acc: 0.116000
(Epoch 4 / 20) train acc: 0.500000; val_acc: 0.110000
(Epoch 5 / 20) train acc: 0.600000; val_acc: 0.114000
(Iteration 11 / 40) loss: 1.805009
(Epoch 6 / 20) train acc: 0.800000; val_acc: 0.113000
(Epoch 7 / 20) train acc: 0.860000; val_acc: 0.108000
(Epoch 8 / 20) train acc: 0.920000; val_acc: 0.116000
(Epoch 9 / 20) train acc: 0.960000; val_acc: 0.113000
(Epoch 10 / 20) train acc: 0.960000; val_acc: 0.116000
(Iteration 21 / 40) loss: 0.137192
(Epoch 11 / 20) train acc: 0.980000; val_acc: 0.113000
(Epoch 12 / 20) train acc: 0.980000; val_acc: 0.118000
(Epoch 13 / 20) train acc: 0.980000; val_acc: 0.118000
(Epoch 14 / 20) train acc: 0.980000; val_acc: 0.118000
(Epoch 15 / 20) train acc: 0.980000; val_acc: 0.118000
(Iteration 31 / 40) loss: 0.084054
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.118000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.113000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.115000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.118000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.119000
[Figure: training loss history for the five-layer net (training loss vs. iteration)]

Inline question:

Did you notice anything about the comparative difficulty of training the three-layer net vs. training the five-layer net?

Answer:

Training the five-layer net needs a larger weight_scale. The network is deeper, so the values computed along the way keep shrinking from layer to layer and the vanishing effect is more severe; a larger weight scale counteracts this.
Because the weight scale is larger, in my view the learning rate also has to be larger, under otherwise identical conditions, to train the network well in the same number of steps.
The three-layer net is more robust; the five-layer net is more sensitive and fragile.
To be honest, I am not quite sure what this question is getting at, since it felt fairly easy to tune both nets to 100%.

Update rules

So far we have used vanilla stochastic gradient descent (SGD) as our update rule. More sophisticated update rules can make it easier to train deep networks. We will implement a few of the most commonly used update rules and compare them to vanilla SGD.

SGD+Momentum

Stochastic gradient descent with momentum is a widely used update rule that tends to make deep networks converge faster than vanilla stochastic gradient descent.

Open the file cs231n/optim.py and read the documentation at the top of the file to make sure you understand the API. Implement the SGD+momentum update rule in the function sgd_momentum and run the following to check your implementation. You should see errors less than 1e-8.

from cs231n.optim import sgd_momentum

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-3, 'velocity': v}
next_w, _ = sgd_momentum(w, dw, config=config)

expected_next_w = np.asarray([
  [ 0.1406,      0.20738947,  0.27417895,  0.34096842,  0.40775789],
  [ 0.47454737,  0.54133684,  0.60812632,  0.67491579,  0.74170526],
  [ 0.80849474,  0.87528421,  0.94207368,  1.00886316,  1.07565263],
  [ 1.14244211,  1.20923158,  1.27602105,  1.34281053,  1.4096    ]])
expected_velocity = np.asarray([
  [ 0.5406,      0.55475789,  0.56891579, 0.58307368,  0.59723158],
  [ 0.61138947,  0.62554737,  0.63970526,  0.65386316,  0.66802105],
  [ 0.68217895,  0.69633684,  0.71049474,  0.72465263,  0.73881053],
  [ 0.75296842,  0.76712632,  0.78128421,  0.79544211,  0.8096    ]])

print 'next_w error: ', rel_error(next_w, expected_next_w)
print 'velocity error: ', rel_error(expected_velocity, config['velocity'])
next_w error:  8.88234703351e-09
velocity error:  4.26928774328e-09
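
A minimal sketch of an SGD+momentum update that reproduces the expected values in the test above. The default momentum of 0.9 and the config keys are assumptions: the test only passes learning_rate and velocity, and 0.9 is the value that matches expected_velocity. The author's cs231n/optim.py is not shown in this post.

```python
import numpy as np

def sgd_momentum(w, dw, config=None):
    """Sketch: v = momentum * v - learning_rate * dw; next_w = w + v."""
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)        # assumed default
    v = config.get('velocity', np.zeros_like(w))

    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v

    config['velocity'] = v                    # carried over to the next step
    return next_w, config

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N * D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N * D).reshape(N, D)
v = np.linspace(0.6, 0.9, num=N * D).reshape(N, D)
config = {'learning_rate': 1e-3, 'velocity': v}
next_w, config = sgd_momentum(w, dw, config=config)
```

For the first entry, the new velocity is 0.9 * 0.6 - 1e-3 * (-0.6) = 0.5406 and the new weight is -0.4 + 0.5406 = 0.1406, matching the first entries of expected_velocity and expected_next_w.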

Once you have done so, run the following to train a six-layer network with both SGD and SGD+momentum. You should see the SGD+momentum update rule converge faster.

num_train = 4000
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

solvers = {}

for update_rule in ['sgd', 'sgd_momentum']:
  print 'running with ', update_rule
  model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)

  solver = Solver(model, small_data,
                  num_epochs=5, batch_size=100,
                  update_rule=update_rule,
                  optim_config={
                    'learning_rate': 1e-2,
                  },
                  verbose=True)
  solvers[update_rule] = solver
  solver.train()
  print

plt.subplot(3, 1, 1)
plt.title('Training loss')
plt.xlabel('Iteration')

plt.subplot(3, 1, 2)
plt.title('Training accuracy')
plt.xlabel('Epoch')

plt.subplot(3, 1, 3)
plt.title('Validation accuracy')
plt.xlabel('Epoch')

for update_rule, solver in solvers.iteritems():
  plt.subplot(3, 1, 1)
  plt.plot(solver.loss_history, 'o', label=update_rule)
  
  plt.subplot(3, 1, 2)
  plt.plot(solver.train_acc_history, '-o', label=update_rule)

  plt.subplot(3, 1, 3)
  plt.plot(solver.val_acc_history, '-o', label=update_rule)
  
for i in [1, 2, 3]:
  plt.subplot(3, 1, i)
  plt.legend(loc='upper center', ncol=4)
plt.gcf().set_size_inches(15, 15)
plt.show()
running with  sgd
(Iteration 1 / 200) loss: 2.482962
(Epoch 0 / 5) train acc: 0.103000; val_acc: 0.116000
(Iteration 11 / 200) loss: 2.189759
(Iteration 21 / 200) loss: 2.118428
(Iteration 31 / 200) loss: 2.146263
(Epoch 1 / 5) train acc: 0.234000; val_acc: 0.231000
(Iteration 41 / 200) loss: 2.136812
(Iteration 51 / 200) loss: 2.058494
(Iteration 61 / 200) loss: 2.010344
(Iteration 71 / 200) loss: 1.935777
(Epoch 2 / 5) train acc: 0.341000; val_acc: 0.278000
(Iteration 81 / 200) loss: 1.848450
(Iteration 91 / 200) loss: 1.890258
(Iteration 101 / 200) loss: 1.851392
(Iteration 111 / 200) loss: 1.890978
(Epoch 3 / 5) train acc: 0.364000; val_acc: 0.316000
(Iteration 121 / 200) loss: 1.674997
(Iteration 131 / 200) loss: 1.753746
(Iteration 141 / 200) loss: 1.677929
(Iteration 151 / 200) loss: 1.651327
(Epoch 4 / 5) train acc: 0.378000; val_acc: 0.296000
(Iteration 161 / 200) loss: 1.707673
(Iteration 171 / 200) loss: 1.771841
(Iteration 181 / 200) loss: 1.650195
(Iteration 191 / 200) loss: 1.671102
(Epoch 5 / 5) train acc: 0.414000; val_acc: 0.347000

running with  sgd_momentum
(Iteration 1 / 200) loss: 2.779826
(Epoch 0 / 5) train acc: 0.080000; val_acc: 0.090000
(Iteration 11 / 200) loss: 2.151418
(Iteration 21 / 200) loss: 2.005661
(Iteration 31 / 200) loss: 2.018002
(Epoch 1 / 5) train acc: 0.334000; val_acc: 0.327000
(Iteration 41 / 200) loss: 1.914837
(Iteration 51 / 200) loss: 1.745527
(Iteration 61 / 200) loss: 1.829091
(Iteration 71 / 200) loss: 1.646542
(Epoch 2 / 5) train acc: 0.354000; val_acc: 0.311000
(Iteration 81 / 200) loss: 1.561354
(Iteration 91 / 200) loss: 1.687099
(Iteration 101 / 200) loss: 1.644848
(Iteration 111 / 200) loss: 1.604384
(Epoch 3 / 5) train acc: 0.447000; val_acc: 0.334000
(Iteration 121 / 200) loss: 1.727682
(Iteration 131 / 200) loss: 1.569907
(Iteration 141 / 200) loss: 1.565606
(Iteration 151 / 200) loss: 1.674119
(Epoch 4 / 5) train acc: 0.466000; val_acc: 0.390000
(Iteration 161 / 200) loss: 1.364019
(Iteration 171 / 200) loss: 1.449550
(Iteration 181 / 200) loss: 1.510401
(Iteration 191 / 200) loss: 1.353840
(Epoch 5 / 5) train acc: 0.514000; val_acc: 0.368000
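[Figure: training loss per iteration, training accuracy per epoch, and validation accuracy per epoch for sgd and sgd_momentum]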

RMSProp and Adam

RMSProp [1] and Adam [2] are update rules that set per-parameter learning rates by using a running average of the second moments of gradients.

In the file cs231n/optim.py, implement the RMSProp update rule in the rmsprop function and implement the Adam update rule in the adam function, and check your implementations using the tests below.

[1] Tijmen Tieleman and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural Networks for Machine Learning 4 (2012).

[2] Diederik Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization", ICLR 2015.

# Test RMSProp implementation; you should see errors less than 1e-7
from cs231n.optim import rmsprop

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-2, 'cache': cache}
next_w, _ = rmsprop(w, dw, config=config)

expected_next_w = np.asarray([
  [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],
  [-0.132737,   -0.08078555, -0.02881884,  0.02316247,  0.07515774],
  [ 0.12716641,  0.17918792,  0.23122175,  0.28326742,  0.33532447],
  [ 0.38739248,  0.43947102,  0.49155973,  0.54365823,  0.59576619]])
expected_cache = np.asarray([
  [ 0.5976,      0.6126277,   0.6277108,   0.64284931,  0.65804321],
  [ 0.67329252,  0.68859723,  0.70395734,  0.71937285,  0.73484377],
  [ 0.75037008,  0.7659518,   0.78158892,  0.79728144,  0.81302936],
  [ 0.82883269,  0.84469141,  0.86060554,  0.87657507,  0.8926    ]])

print 'next_w error: ', rel_error(expected_next_w, next_w)
print 'cache error: ', rel_error(expected_cache, config['cache'])
next_w error:  9.50264522989e-08
cache error:  2.64779558072e-09
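
A minimal sketch of an RMSProp update consistent with the expected values above. The defaults (decay_rate of 0.99, epsilon of 1e-8) and the config keys are assumptions; the test only sets learning_rate and cache, and 0.99 is the decay rate that matches expected_cache.

```python
import numpy as np

def rmsprop(w, dw, config=None):
    """Sketch: scale each step by a running average of squared gradients."""
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('decay_rate', 0.99)     # assumed default
    config.setdefault('epsilon', 1e-8)        # assumed default
    config.setdefault('cache', np.zeros_like(w))

    decay = config['decay_rate']
    cache = decay * config['cache'] + (1 - decay) * dw ** 2
    next_w = w - config['learning_rate'] * dw / (np.sqrt(cache) + config['epsilon'])

    config['cache'] = cache
    return next_w, config

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N * D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N * D).reshape(N, D)
cache = np.linspace(0.6, 0.9, num=N * D).reshape(N, D)
config = {'learning_rate': 1e-2, 'cache': cache}
next_w, config = rmsprop(w, dw, config=config)
```

For the first entry, the cache becomes 0.99 * 0.6 + 0.01 * 0.36 = 0.5976, which matches the first entry of expected_cache.
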
# Test Adam implementation; you should see errors around 1e-7 or less
from cs231n.optim import adam

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)
v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}
next_w, _ = adam(w, dw, config=config)

expected_next_w = np.asarray([
  [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],
  [-0.1380274,  -0.08544591, -0.03286534,  0.01971428,  0.0722929],
  [ 0.1248705,   0.17744702,  0.23002243,  0.28259667,  0.33516969],
  [ 0.38774145,  0.44031188,  0.49288093,  0.54544852,  0.59801459]])
expected_v = np.asarray([
  [ 0.69966,     0.68908382,  0.67851319,  0.66794809,  0.65738853,],
  [ 0.64683452,  0.63628604,  0.6257431,   0.61520571,  0.60467385,],
  [ 0.59414753,  0.58362676,  0.57311152,  0.56260183,  0.55209767,],
  [ 0.54159906,  0.53110598,  0.52061845,  0.51013645,  0.49966,   ]])
expected_m = np.asarray([
  [ 0.48,        0.49947368,  0.51894737,  0.53842105,  0.55789474],
  [ 0.57736842,  0.59684211,  0.61631579,  0.63578947,  0.65526316],
  [ 0.67473684,  0.69421053,  0.71368421,  0.73315789,  0.75263158],
  [ 0.77210526,  0.79157895,  0.81105263,  0.83052632,  0.85      ]])

print 'next_w error: ', rel_error(expected_next_w, next_w)
print 'v error: ', rel_error(expected_v, config['v'])
print 'm error: ', rel_error(expected_m, config['m'])
next_w error:  1.13956917985e-07
v error:  4.20831403811e-09
m error:  4.21496319311e-09
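
For comparison, here is a minimal sketch of an Adam step. The function name adam_step, the defaults beta1=0.9, beta2=0.999, epsilon=1e-8, and the choice to increment t before the bias correction are all assumptions rather than the contents of cs231n/optim.py. With those assumptions the sketch appears to match the corner values of expected_next_w, expected_m, and expected_v above.

```python
import numpy as np

def adam_step(w, dw, m, v, t, learning_rate=1e-3,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update (illustrative sketch, not the assignment's optim.py)."""
    t += 1                                 # assumed: counter advances before bias correction
    m = beta1 * m + (1 - beta1) * dw       # running mean of gradients
    v = beta2 * v + (1 - beta2) * dw ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
    next_w = w - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return next_w, m, v, t

# Same inputs as the test cell above
N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)
v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)

next_w, next_m, next_v, next_t = adam_step(w, dw, m, v, t=5, learning_rate=1e-2)
print(next_w[0, 0], next_m[0, 0], next_v[0, 0])
```
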

Once you have debugged your RMSProp and Adam implementations, run the following to train a pair of deep networks using these new update rules:

learning_rates = {'rmsprop': 1e-4, 'adam': 1e-3}
for update_rule in ['adam', 'rmsprop']:
  print 'running with ', update_rule
  model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)

  solver = Solver(model, small_data,
                  num_epochs=5, batch_size=100,
                  update_rule=update_rule,
                  optim_config={
                    'learning_rate': learning_rates[update_rule]
                  },
                  verbose=True)
  solvers[update_rule] = solver
  solver.train()
  print

plt.subplot(3, 1, 1)
plt.title('Training loss')
plt.xlabel('Iteration')

plt.subplot(3, 1, 2)
plt.title('Training accuracy')
plt.xlabel('Epoch')

plt.subplot(3, 1, 3)
plt.title('Validation accuracy')
plt.xlabel('Epoch')

for update_rule, solver in solvers.iteritems():
  plt.subplot(3, 1, 1)
  plt.plot(solver.loss_history, 'o', label=update_rule)
  
  plt.subplot(3, 1, 2)
  plt.plot(solver.train_acc_history, '-o', label=update_rule)

  plt.subplot(3, 1, 3)
  plt.plot(solver.val_acc_history, '-o', label=update_rule)
  
for i in [1, 2, 3]:
  plt.subplot(3, 1, i)
  plt.legend(loc='upper center', ncol=4)
plt.gcf().set_size_inches(15, 15)
plt.show()
running with  adam
(Iteration 1 / 200) loss: 2.764716
(Epoch 0 / 5) train acc: 0.128000; val_acc: 0.124000
(Iteration 11 / 200) loss: 2.040898
(Iteration 21 / 200) loss: 1.774376
(Iteration 31 / 200) loss: 1.847699
(Epoch 1 / 5) train acc: 0.376000; val_acc: 0.306000
(Iteration 41 / 200) loss: 1.926563
(Iteration 51 / 200) loss: 1.720461
(Iteration 61 / 200) loss: 1.537673
(Iteration 71 / 200) loss: 1.603966
(Epoch 2 / 5) train acc: 0.422000; val_acc: 0.313000
(Iteration 81 / 200) loss: 1.602464
(Iteration 91 / 200) loss: 1.514707
(Iteration 101 / 200) loss: 1.341900
(Iteration 111 / 200) loss: 1.671358
(Epoch 3 / 5) train acc: 0.467000; val_acc: 0.353000
(Iteration 121 / 200) loss: 1.638983
(Iteration 131 / 200) loss: 1.433005
(Iteration 141 / 200) loss: 1.259506
(Iteration 151 / 200) loss: 1.510506
(Epoch 4 / 5) train acc: 0.552000; val_acc: 0.360000
(Iteration 161 / 200) loss: 1.234063
(Iteration 171 / 200) loss: 1.344069
(Iteration 181 / 200) loss: 1.179093
(Iteration 191 / 200) loss: 1.477335
(Epoch 5 / 5) train acc: 0.590000; val_acc: 0.375000

running with  rmsprop
(Iteration 1 / 200) loss: 2.498833
(Epoch 0 / 5) train acc: 0.164000; val_acc: 0.133000
(Iteration 11 / 200) loss: 2.099360
(Iteration 21 / 200) loss: 1.879164
(Iteration 31 / 200) loss: 1.856589
(Epoch 1 / 5) train acc: 0.367000; val_acc: 0.285000
(Iteration 41 / 200) loss: 1.831192
(Iteration 51 / 200) loss: 1.911368
(Iteration 61 / 200) loss: 1.693546
(Iteration 71 / 200) loss: 1.745929
(Epoch 2 / 5) train acc: 0.447000; val_acc: 0.339000
(Iteration 81 / 200) loss: 1.748112
(Iteration 91 / 200) loss: 1.613525
(Iteration 101 / 200) loss: 1.635379
(Iteration 111 / 200) loss: 1.491368
(Epoch 3 / 5) train acc: 0.467000; val_acc: 0.361000
(Iteration 121 / 200) loss: 1.372249
(Iteration 131 / 200) loss: 1.678353
(Iteration 141 / 200) loss: 1.521616
(Iteration 151 / 200) loss: 1.441466
(Epoch 4 / 5) train acc: 0.514000; val_acc: 0.339000
(Iteration 161 / 200) loss: 1.619760
(Iteration 171 / 200) loss: 1.432048
(Iteration 181 / 200) loss: 1.684465
(Iteration 191 / 200) loss: 1.532590
(Epoch 5 / 5) train acc: 0.528000; val_acc: 0.370000
[Figure: training loss per iteration, and training and validation accuracy per epoch, for Adam and RMSProp]

Train a good model!

Train the best fully-connected model that you can on CIFAR-10, storing your best model in the best_model variable. We require you to get at least 50% accuracy on the validation set using a fully-connected net.

If you are careful it should be possible to get accuracies above 55%, but we don't require it for this part and won't assign extra credit for doing so. Later in the assignment we will ask you to train the best convolutional network that you can on CIFAR-10, and we would prefer that you spend your effort working on convolutional nets rather than fully-connected nets.

You might find it useful to complete the BatchNormalization.ipynb and Dropout.ipynb notebooks before completing this part, since those techniques can help you train powerful models.

best_model = None
best_val_acc = 0
################################################################################
# TODO: Train the best FullyConnectedNet that you can on CIFAR-10. You might   #
# find batch normalization and dropout useful. Store your best model in the    #
# best_model variable.                                                         #
################################################################################
reg_choice = [0, 0.02, 0.05]
#dropout_choice = [0.25, 0.5]
#netstructure_choice = [
#     [100,100],
#      [100, 100, 100],
#      [50, 50, 50, 50, 50, 50, 50]]
dropout_choice = [0]
netstructure_choice = [[100, 100]]
for hidden_dim in netstructure_choice:
    for dropout in dropout_choice:
        model = FullyConnectedNet(hidden_dim, reg=0, weight_scale=5e-2, dtype=np.float64,
                            use_batchnorm=True, dropout=dropout)
        solver = Solver(model, data,
                num_epochs=20, batch_size=100,
                update_rule='adam',
                optim_config={
                    'learning_rate': 5e-3
                    },
                print_every=100,
                lr_decay=0.95,
                verbose=True)
        solver.train()  
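        if solver.best_val_acc > best_val_acc:
            # Record the new best score so later configurations must beat it
            best_val_acc = solver.best_val_acc
            best_model = model
            print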
            plt.subplot(2, 1, 1)
            plt.title('Training loss')
            plt.plot(solver.loss_history, 'o')
            plt.xlabel('Iteration')

            plt.subplot(2, 1, 2)
            plt.title('Accuracy')
            plt.plot(solver.train_acc_history, '-o', label='train')
            plt.plot(solver.val_acc_history, '-o', label='val')
            plt.plot([0.5] * len(solver.val_acc_history), 'k--')
            plt.xlabel('Epoch')
            plt.legend(loc='lower right')
            plt.gcf().set_size_inches(15, 12)
            plt.show()
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
(Iteration 1 / 9800) loss: 2.263781
(Epoch 0 / 20) train acc: 0.179000; val_acc: 0.180000
(Iteration 101 / 9800) loss: 1.624115
(Iteration 201 / 9800) loss: 1.467661
(Iteration 301 / 9800) loss: 1.591997
(Iteration 401 / 9800) loss: 1.432411
(Epoch 1 / 20) train acc: 0.487000; val_acc: 0.491000
(Iteration 501 / 9800) loss: 1.241822
(Iteration 601 / 9800) loss: 1.546403
(Iteration 701 / 9800) loss: 1.411293
(Iteration 801 / 9800) loss: 1.375881
(Iteration 901 / 9800) loss: 1.242919
(Epoch 2 / 20) train acc: 0.520000; val_acc: 0.495000
(Iteration 1001 / 9800) loss: 1.316806
(Iteration 1101 / 9800) loss: 1.340302
(Iteration 1201 / 9800) loss: 1.335680
(Iteration 1301 / 9800) loss: 1.346994
(Iteration 1401 / 9800) loss: 1.156202
(Epoch 3 / 20) train acc: 0.553000; val_acc: 0.509000
(Iteration 1501 / 9800) loss: 1.111737
(Iteration 1601 / 9800) loss: 1.339837
(Iteration 1701 / 9800) loss: 1.218292
(Iteration 1801 / 9800) loss: 1.344992
(Iteration 1901 / 9800) loss: 1.198010
(Epoch 4 / 20) train acc: 0.574000; val_acc: 0.515000
(Iteration 2001 / 9800) loss: 1.185471
(Iteration 2101 / 9800) loss: 1.245266
(Iteration 2201 / 9800) loss: 1.046663
(Iteration 2301 / 9800) loss: 1.128248
(Iteration 2401 / 9800) loss: 1.100717
(Epoch 5 / 20) train acc: 0.626000; val_acc: 0.529000
(Iteration 2501 / 9800) loss: 1.076717
(Iteration 2601 / 9800) loss: 1.154111
(Iteration 2701 / 9800) loss: 1.077080
(Iteration 2801 / 9800) loss: 0.998500
(Iteration 2901 / 9800) loss: 1.051188
(Epoch 6 / 20) train acc: 0.590000; val_acc: 0.536000
(Iteration 3001 / 9800) loss: 1.004974
(Iteration 3101 / 9800) loss: 1.124638
(Iteration 3201 / 9800) loss: 1.073654
(Iteration 3301 / 9800) loss: 0.970181
(Iteration 3401 / 9800) loss: 1.115142
(Epoch 7 / 20) train acc: 0.630000; val_acc: 0.537000
(Iteration 3501 / 9800) loss: 0.869317
(Iteration 3601 / 9800) loss: 1.109377
(Iteration 3701 / 9800) loss: 1.037178
(Iteration 3801 / 9800) loss: 0.947001
(Iteration 3901 / 9800) loss: 0.989016
(Epoch 8 / 20) train acc: 0.633000; val_acc: 0.529000
(Iteration 4001 / 9800) loss: 0.949825
(Iteration 4101 / 9800) loss: 1.007835
(Iteration 4201 / 9800) loss: 0.894922
(Iteration 4301 / 9800) loss: 1.134644
(Iteration 4401 / 9800) loss: 0.932514
(Epoch 9 / 20) train acc: 0.609000; val_acc: 0.541000
(Iteration 4501 / 9800) loss: 1.117945
(Iteration 4601 / 9800) loss: 1.066002
(Iteration 4701 / 9800) loss: 0.858422
(Iteration 4801 / 9800) loss: 0.799150
(Epoch 10 / 20) train acc: 0.692000; val_acc: 0.527000
(Iteration 4901 / 9800) loss: 1.027588
(Iteration 5001 / 9800) loss: 0.903380
(Iteration 5101 / 9800) loss: 0.950514
(Iteration 5201 / 9800) loss: 0.891470
(Iteration 5301 / 9800) loss: 0.947976
(Epoch 11 / 20) train acc: 0.685000; val_acc: 0.523000
(Iteration 5401 / 9800) loss: 1.161916
(Iteration 5501 / 9800) loss: 1.039629
(Iteration 5601 / 9800) loss: 0.895261
(Iteration 5701 / 9800) loss: 0.855530
(Iteration 5801 / 9800) loss: 0.723047
(Epoch 12 / 20) train acc: 0.730000; val_acc: 0.516000
(Iteration 5901 / 9800) loss: 1.015861
(Iteration 6001 / 9800) loss: 0.921310
(Iteration 6101 / 9800) loss: 1.055507
(Iteration 6201 / 9800) loss: 0.917648
(Iteration 6301 / 9800) loss: 0.767686
(Epoch 13 / 20) train acc: 0.699000; val_acc: 0.543000
(Iteration 6401 / 9800) loss: 1.170058
(Iteration 6501 / 9800) loss: 0.810596
(Iteration 6601 / 9800) loss: 0.920641
(Iteration 6701 / 9800) loss: 0.725889
(Iteration 6801 / 9800) loss: 0.931281
(Epoch 14 / 20) train acc: 0.667000; val_acc: 0.531000
(Iteration 6901 / 9800) loss: 0.701817
(Iteration 7001 / 9800) loss: 0.788107
(Iteration 7101 / 9800) loss: 0.818656
(Iteration 7201 / 9800) loss: 0.888433
(Iteration 7301 / 9800) loss: 0.728136
(Epoch 15 / 20) train acc: 0.695000; val_acc: 0.530000
(Iteration 7401 / 9800) loss: 0.857501
(Iteration 7501 / 9800) loss: 0.867369
(Iteration 7601 / 9800) loss: 0.814501
(Iteration 7701 / 9800) loss: 0.763123
(Iteration 7801 / 9800) loss: 0.835519
(Epoch 16 / 20) train acc: 0.711000; val_acc: 0.528000
(Iteration 7901 / 9800) loss: 0.861891
(Iteration 8001 / 9800) loss: 0.667957
(Iteration 8101 / 9800) loss: 0.678417
(Iteration 8201 / 9800) loss: 0.776296
(Iteration 8301 / 9800) loss: 0.846255
(Epoch 17 / 20) train acc: 0.718000; val_acc: 0.532000
(Iteration 8401 / 9800) loss: 0.821841
(Iteration 8501 / 9800) loss: 0.737560
(Iteration 8601 / 9800) loss: 0.734345
(Iteration 8701 / 9800) loss: 0.789014
(Iteration 8801 / 9800) loss: 0.829744
(Epoch 18 / 20) train acc: 0.745000; val_acc: 0.530000
(Iteration 8901 / 9800) loss: 0.688820
(Iteration 9001 / 9800) loss: 0.726195
(Iteration 9101 / 9800) loss: 0.922960
(Iteration 9201 / 9800) loss: 0.791910
(Iteration 9301 / 9800) loss: 0.891499
(Epoch 19 / 20) train acc: 0.728000; val_acc: 0.522000
(Iteration 9401 / 9800) loss: 0.731820
(Iteration 9501 / 9800) loss: 0.721811
(Iteration 9601 / 9800) loss: 0.600602
(Iteration 9701 / 9800) loss: 0.689157
(Epoch 20 / 20) train acc: 0.739000; val_acc: 0.521000
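[Figure: training loss per iteration, and training and validation accuracy per epoch with the 50% line, for the best model]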
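
The nested loops over network structure and dropout are a small grid search. A generic version of that pattern might look like the sketch below. The helper grid_search and the stand-in scoring function fake_val_acc are hypothetical. In this notebook the callable would instead build a FullyConnectedNet, train it with a Solver, and return solver.best_val_acc.

```python
import itertools

def grid_search(train_and_eval, grid):
    """Try every combination in `grid`; return (best_score, best_params).

    `train_and_eval` is a hypothetical callable that accepts the grid's
    keys as keyword arguments and returns a validation accuracy.
    """
    best_score, best_params = float('-inf'), None
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_eval(**params)
        if score > best_score:
            # Keep the running best so later configurations are compared to it
            best_score, best_params = score, params
    return best_score, best_params

# Made-up scoring function standing in for a real training run
def fake_val_acc(reg, dropout):
    return 0.5 - abs(reg - 0.02) - abs(dropout - 0.25)

grid = {'reg': [0, 0.02, 0.05], 'dropout': [0.25, 0.5]}
best_score, best_params = grid_search(fake_val_acc, grid)
print(best_params)
```
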

Test your model

Run your best model on the validation and test sets. You should achieve above 50% accuracy on the validation set.

y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print 'Validation set accuracy: ', (y_val_pred == data['y_val']).mean()
print 'Test set accuracy: ', (y_test_pred == data['y_test']).mean()
Validation set accuracy:  0.554
Test set accuracy:  0.545
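
The accuracy lines above take the argmax of the class scores in each row and compare the result with the labels. The toy example below uses made-up scores and labels to show the same two steps in isolation. It assumes, as the code above implies, that loss() returns a score matrix of shape (N, C) when called without labels.

```python
import numpy as np

# Made-up scores for 4 examples and 3 classes (rows: examples, columns: classes)
scores = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2,  0.3],
                   [0.0, 0.1,  0.9],
                   [0.7, 0.6,  0.5]])
y_true = np.array([1, 0, 2, 1])

y_pred = np.argmax(scores, axis=1)    # index of the highest score in each row
accuracy = (y_pred == y_true).mean()  # fraction of predictions that match
print(y_pred, accuracy)
```
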