"When I study Residual Network
, it has made enormous confusion for me, therefore it is needed to make a memo post for later review as well as beginners' skip connection
."
-
Basic idea
(Wikipedia quote) A residual neural network (ResNet) is an artificial neural network that builds on constructs known from pyramidal cells in the cerebral cortex. Residual neural networks do this by utilizing so-called skip connections, or shortcuts, to jump over some layers in order to avoid the problems of vanishing gradients and training degradation.
-
There are two types of neural networks here: plain networks and deeper networks. Plain networks usually contain fewer than about 25 layers, and their accuracy is roughly as good as it should be, i.e. the result is not too bad and adding layers still brings improvement.
-
Deeper networks often contain more than 25 layers, and people tend to think "the deeper the layers, the better the accuracy." In reality this is often wrong, because the deeper the network gets, the higher the risk of vanishing or exploding gradients becomes. Even if you add regularization to save the whole network from that, there is still a problem called degradation.
Vanishing or exploding gradients (this is straightforward to explain, so we omit this part)
-
Degradation - This problem has been observed while training deeper neural networks: as we increase the network depth, accuracy gets saturated, which is expected since the extra layers of the network are modeling more of the intricacies of the data. But as we increase the depth further (past the saturation region), the accuracy of the network drops. We might think this happens due to overfitting, but it actually does not: the additional layers in a deep model lead to higher training errors (training, not testing). This is hard to grasp at the beginning, since the common sense of training a neural network is to stack deeper layers in order to achieve higher accuracy; we tend to think "more capacity, better results."
-
So in order to avoid these problems, residual networks were developed, and they have proven to be a good solution for deeper neural networks (25+ layers).
-
How
- Intuition behind ResNet
- What is a residual - A residual is the error in a result. For example, when estimating someone's age, if the actual age is 20 and you guessed 18, you are off by 2, and that 2 is the residual. In essence, the residual is what you should have added to your prediction to match the actual data. It is important to realize that when the residual is 0, we don't do anything, since the prediction already matches the actual data.
In the diagram, x is our prediction and we want it to be equal to the actual value. However, if it is off by a margin, our residual function residual() will kick in and produce the residual of the operation so as to correct our prediction to match the actual. If x == Actual, residual(x) will be 0. The identity function just copies x.
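Here is a toy Python sketch of this correction idea; residual() and identity() are just hypothetical helpers for the age analogy, not actual network layers.

def identity(x):
    # The identity branch simply passes the prediction through unchanged.
    return x

def residual(x, actual):
    # The residual is whatever must be added to x to reach the actual value.
    return actual - x

x, actual = 18, 20
corrected = identity(x) + residual(x, actual)
print(corrected)             # 20 -> prediction plus residual matches the actual
print(residual(20, actual))  # 0  -> nothing to correct when the prediction is right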
- How ResNet works - We want to go deeper without degradation in accuracy and error rate. We can do this by injecting identity mappings.
We want to be able to learn the residuals so that our predictions are close to the actuals.
-
Shortcut connections are those skipping one or more layers. In our case, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers. Identity shortcut connections add neither extra parameter nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented using common libraries without modifying the solvers.
H(x) = F(x) + x, where F(x) = W2 * relu(W1 * x + b1) + b2
During training, the residual network learns the weights of its layers such that if the identity mapping were optimal, all the weights would be driven to 0. In effect F(x) becomes 0, so x gets directly mapped to H(x) and no corrections need to be made. These become your identity mappings, which help grow the network deep. And if there is a deviation from the optimal identity mapping, the weights and biases of F(x) are learned to adjust for it. Think of F(x) as learning how to adjust our predictions to match the actuals.
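To make the formula concrete, here is a minimal NumPy sketch of a single residual block computing H(x) = F(x) + x. It is only an illustration of the equation above, not the Keras code used later in this post.

import numpy as np

def relu(z):
    return np.maximum(z, 0)

def residual_block(x, W1, b1, W2, b2):
    # F(x) = W2 * relu(W1 * x + b1) + b2 -- the residual branch
    F = W2 @ relu(W1 @ x + b1) + b2
    # H(x) = F(x) + x -- the shortcut adds the input back unchanged
    return F + x

d = 4
x = np.random.randn(d)
# If the residual branch is all zeros, the block is exactly the identity mapping.
Z = np.zeros((d, d))
print(np.allclose(residual_block(x, Z, np.zeros(d), Z, np.zeros(d)), x))  # True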
Conclusion
Deep residual networks work well due to the flow of information from the very first layer to the last layer of the network. By formulating residual functions as identity mappings, information is able to flow unimpeded throughout the entire network. This allows any layer to be represented as a function of the original input. In pre-activation ResNets, where batch normalization and ReLU are placed before the convolution, the output of the addition becomes the output of the layer, which achieves the identity effect we desire.
PS-1
PS-2
- Take a close look at the residual block. The main takeaway here is to make a[l+2] == relu(a[l]), so that the gradients at every single layer can be computed with the original input taken into consideration. Given the above equation, when G and H are identity functions, information always flows unimpeded and gradients never vanish no matter how deep we go.
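To spell out the gradient argument in standard notation (this is the usual identity-mappings analysis, not something shown in the figure): with identity shortcuts each block computes x[l+1] = x[l] + F(x[l]), so unrolling from layer l to a deeper layer L gives x[L] = x[l] + (sum of F(x[i]) for i = l, ..., L-1). By the chain rule, d(loss)/d(x[l]) = d(loss)/d(x[L]) * (1 + d/d(x[l]) of that sum). The additive 1 carries the gradient from layer L straight back to layer l, so it cannot be wiped out by a long chain of multiplications, which is exactly why the gradients never vanish here.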
PS-3 ResNet code example
#import needed classes
import keras
from keras.datasets import cifar10
from keras.layers import Dense,Conv2D,MaxPooling2D,Flatten,AveragePooling2D,Dropout,BatchNormalization,Activation,Input
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler
from keras.callbacks import ModelCheckpoint
from math import ceil
import os
from keras.preprocessing.image import ImageDataGenerator
#A pre-activation residual unit: BN -> ReLU -> Conv applied twice, plus a shortcut.
#When pool=True, the main path is downsampled with max pooling, so the shortcut
#uses a strided 1x1 convolution to bring res to the same shape before the addition.
def Unit(x,filters,pool=False):
    res = x
    if pool:
        x = MaxPooling2D(pool_size=(2, 2))(x)
        res = Conv2D(filters=filters,kernel_size=[1,1],strides=(2,2),padding="same")(res)
    out = BatchNormalization()(x)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    out = BatchNormalization()(out)
    out = Activation("relu")(out)
    out = Conv2D(filters=filters, kernel_size=[3, 3], strides=[1, 1], padding="same")(out)
    #Identity (or projection) shortcut: H(x) = F(x) + x
    out = keras.layers.add([res,out])
    return out
#Define the model
def MiniModel(input_shape):
    images = Input(input_shape)
    net = Conv2D(filters=32, kernel_size=[3, 3], strides=[1, 1], padding="same")(images)
    net = Unit(net,32)
    net = Unit(net,32)
    net = Unit(net,32)
    net = Unit(net,64,pool=True)
    net = Unit(net,64)
    net = Unit(net,64)
    net = Unit(net,128,pool=True)
    net = Unit(net,128)
    net = Unit(net,128)
    net = Unit(net, 256,pool=True)
    net = Unit(net, 256)
    net = Unit(net, 256)
    net = BatchNormalization()(net)
    net = Activation("relu")(net)
    net = Dropout(0.25)(net)
    net = AveragePooling2D(pool_size=(4,4))(net)
    net = Flatten()(net)
    net = Dense(units=10,activation="softmax")(net)
    model = Model(inputs=images,outputs=net)
    return model
#load the cifar10 dataset
(train_x, train_y) , (test_x, test_y) = cifar10.load_data()
#normalize the data
train_x = train_x.astype('float32') / 255
test_x = test_x.astype('float32') / 255
#Subtract the mean from both train and test set
train_x = train_x - train_x.mean()
test_x = test_x - test_x.mean()
#Divide by the standard deviation
train_x = train_x / train_x.std(axis=0)
test_x = test_x / test_x.std(axis=0)
# Generate batches of tensor image data with real-time data augmentation.
# The data will be looped over (in batches).
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=5. / 32,
                             height_shift_range=5. / 32,
                             horizontal_flip=True)
# Compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(train_x)
#Encode the labels to vectors
train_y = keras.utils.to_categorical(train_y,10)
test_y = keras.utils.to_categorical(test_y,10)
#Define the CIFAR-10 input shape and build the model
input_shape = (32,32,3)
model = MiniModel(input_shape)
#Print a Summary of the model
model.summary()
#Specify the training components
model.compile(optimizer=Adam(0.001),loss="categorical_crossentropy",metrics=["accuracy"])
epochs = 50
steps_per_epoch = ceil(50000/128)
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(train_x, train_y, batch_size=128),
                    validation_data=[test_x,test_y],
                    epochs=epochs,steps_per_epoch=steps_per_epoch, verbose=1, workers=4)
#Evaluate the accuracy of the test dataset
accuracy = model.evaluate(x=test_x,y=test_y,batch_size=128)
model.save("cifar10model.h5")
Running result
Using TensorFlow backend.
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 42s 0us/step
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 32, 32, 3) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 32) 896 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32, 32, 32) 128 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 32, 32, 32) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 32, 32, 32) 9248 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 32, 32, 32) 128 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, 32, 32, 32) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 32, 32) 9248 activation_2[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 32, 32, 32) 0 conv2d_1[0][0]
conv2d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 32, 32, 32) 128 add_1[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, 32, 32, 32) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 32, 32, 32) 9248 activation_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 32, 32, 32) 128 conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 32, 32, 32) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 32, 32, 32) 9248 activation_4[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 32, 32, 32) 0 add_1[0][0]
conv2d_5[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 32, 32, 32) 128 add_2[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 32, 32, 32) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, 32, 32, 32) 9248 activation_5[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 32, 32, 32) 128 conv2d_6[0][0]
__________________________________________________________________________________________________
activation_6 (Activation) (None, 32, 32, 32) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, 32, 32, 32) 9248 activation_6[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 32, 32, 32) 0 add_2[0][0]
conv2d_7[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 16, 16, 32) 0 add_3[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 16, 16, 32) 128 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, 16, 16, 32) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 16, 16, 64) 18496 activation_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 16, 16, 64) 256 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_8 (Activation) (None, 16, 16, 64) 0 batch_normalization_8[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 16, 16, 64) 2112 add_3[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 16, 16, 64) 36928 activation_8[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 16, 16, 64) 0 conv2d_8[0][0]
conv2d_10[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 16, 16, 64) 256 add_4[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, 16, 16, 64) 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 16, 16, 64) 36928 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 16, 16, 64) 256 conv2d_11[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, 16, 16, 64) 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 16, 16, 64) 36928 activation_10[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 16, 16, 64) 0 add_4[0][0]
conv2d_12[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 16, 16, 64) 256 add_5[0][0]
__________________________________________________________________________________________________
activation_11 (Activation) (None, 16, 16, 64) 0 batch_normalization_11[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D) (None, 16, 16, 64) 36928 activation_11[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 16, 16, 64) 256 conv2d_13[0][0]
__________________________________________________________________________________________________
activation_12 (Activation) (None, 16, 16, 64) 0 batch_normalization_12[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D) (None, 16, 16, 64) 36928 activation_12[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 16, 16, 64) 0 add_5[0][0]
conv2d_14[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 8, 8, 64) 0 add_6[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 8, 8, 64) 256 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
activation_13 (Activation) (None, 8, 8, 64) 0 batch_normalization_13[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D) (None, 8, 8, 128) 73856 activation_13[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 8, 8, 128) 512 conv2d_16[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 8, 8, 128) 0 batch_normalization_14[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D) (None, 8, 8, 128) 8320 add_6[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D) (None, 8, 8, 128) 147584 activation_14[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 8, 8, 128) 0 conv2d_15[0][0]
conv2d_17[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 8, 8, 128) 512 add_7[0][0]
__________________________________________________________________________________________________
activation_15 (Activation) (None, 8, 8, 128) 0 batch_normalization_15[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D) (None, 8, 8, 128) 147584 activation_15[0][0]
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 8, 8, 128) 512 conv2d_18[0][0]
__________________________________________________________________________________________________
activation_16 (Activation) (None, 8, 8, 128) 0 batch_normalization_16[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D) (None, 8, 8, 128) 147584 activation_16[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 8, 8, 128) 0 add_7[0][0]
conv2d_19[0][0]
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 8, 8, 128) 512 add_8[0][0]
__________________________________________________________________________________________________
activation_17 (Activation) (None, 8, 8, 128) 0 batch_normalization_17[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D) (None, 8, 8, 128) 147584 activation_17[0][0]
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 8, 8, 128) 512 conv2d_20[0][0]
__________________________________________________________________________________________________
activation_18 (Activation) (None, 8, 8, 128) 0 batch_normalization_18[0][0]
__________________________________________________________________________________________________
conv2d_21 (Conv2D) (None, 8, 8, 128) 147584 activation_18[0][0]
__________________________________________________________________________________________________
add_9 (Add) (None, 8, 8, 128) 0 add_8[0][0]
conv2d_21[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D) (None, 4, 4, 128) 0 add_9[0][0]
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 4, 4, 128) 512 max_pooling2d_3[0][0]
__________________________________________________________________________________________________
activation_19 (Activation) (None, 4, 4, 128) 0 batch_normalization_19[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 4, 4, 256) 295168 activation_19[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 4, 4, 256) 1024 conv2d_23[0][0]
__________________________________________________________________________________________________
activation_20 (Activation) (None, 4, 4, 256) 0 batch_normalization_20[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D) (None, 4, 4, 256) 33024 add_9[0][0]
__________________________________________________________________________________________________
conv2d_24 (Conv2D) (None, 4, 4, 256) 590080 activation_20[0][0]
__________________________________________________________________________________________________
add_10 (Add) (None, 4, 4, 256) 0 conv2d_22[0][0]
conv2d_24[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 4, 4, 256) 1024 add_10[0][0]
__________________________________________________________________________________________________
activation_21 (Activation) (None, 4, 4, 256) 0 batch_normalization_21[0][0]
__________________________________________________________________________________________________
conv2d_25 (Conv2D) (None, 4, 4, 256) 590080 activation_21[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 4, 4, 256) 1024 conv2d_25[0][0]
__________________________________________________________________________________________________
activation_22 (Activation) (None, 4, 4, 256) 0 batch_normalization_22[0][0]
__________________________________________________________________________________________________
conv2d_26 (Conv2D) (None, 4, 4, 256) 590080 activation_22[0][0]
__________________________________________________________________________________________________
add_11 (Add) (None, 4, 4, 256) 0 add_10[0][0]
conv2d_26[0][0]
__________________________________________________________________________________________________
batch_normalization_23 (BatchNo (None, 4, 4, 256) 1024 add_11[0][0]
__________________________________________________________________________________________________
activation_23 (Activation) (None, 4, 4, 256) 0 batch_normalization_23[0][0]
__________________________________________________________________________________________________
conv2d_27 (Conv2D) (None, 4, 4, 256) 590080 activation_23[0][0]
__________________________________________________________________________________________________
batch_normalization_24 (BatchNo (None, 4, 4, 256) 1024 conv2d_27[0][0]
__________________________________________________________________________________________________
activation_24 (Activation) (None, 4, 4, 256) 0 batch_normalization_24[0][0]
__________________________________________________________________________________________________
conv2d_28 (Conv2D) (None, 4, 4, 256) 590080 activation_24[0][0]
__________________________________________________________________________________________________
add_12 (Add) (None, 4, 4, 256) 0 add_11[0][0]
conv2d_28[0][0]
__________________________________________________________________________________________________
batch_normalization_25 (BatchNo (None, 4, 4, 256) 1024 add_12[0][0]
__________________________________________________________________________________________________
activation_25 (Activation) (None, 4, 4, 256) 0 batch_normalization_25[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4, 4, 256) 0 activation_25[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, 1, 1, 256) 0 dropout_1[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten) (None, 256) 0 average_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 10) 2570 flatten_1[0][0]
==================================================================================================
Total params: 4,374,538
Trainable params: 4,368,714
Non-trainable params: 5,824
__________________________________________________________________________________________________
Epoch 1/50
391/391 [==============================] - 27s 68ms/step - loss: 1.2885 - acc: 0.5326 - val_loss: 1.6630 - val_acc: 0.4961
Epoch 2/50
391/391 [==============================] - 21s 53ms/step - loss: 0.8541 - acc: 0.7001 - val_loss: 1.0465 - val_acc: 0.6674
Epoch 3/50
391/391 [==============================] - 21s 54ms/step - loss: 0.6907 - acc: 0.7593 - val_loss: 0.9077 - val_acc: 0.7053
Epoch 4/50
391/391 [==============================] - 22s 56ms/step - loss: 0.6064 - acc: 0.7902 - val_loss: 0.6870 - val_acc: 0.7732
Epoch 5/50
391/391 [==============================] - 21s 53ms/step - loss: 0.5409 - acc: 0.8119 - val_loss: 0.6286 - val_acc: 0.7820
Epoch 6/50
391/391 [==============================] - 20s 52ms/step - loss: 0.4976 - acc: 0.8276 - val_loss: 0.6467 - val_acc: 0.7915
Epoch 7/50
391/391 [==============================] - 21s 53ms/step - loss: 0.4554 - acc: 0.8428 - val_loss: 0.7318 - val_acc: 0.7812
Epoch 8/50
391/391 [==============================] - 21s 54ms/step - loss: 0.4276 - acc: 0.8515 - val_loss: 0.5955 - val_acc: 0.8024
Epoch 9/50
391/391 [==============================] - 20s 51ms/step - loss: 0.4037 - acc: 0.8592 - val_loss: 0.7164 - val_acc: 0.7742
Epoch 10/50
391/391 [==============================] - 20s 52ms/step - loss: 0.3785 - acc: 0.8691 - val_loss: 0.5306 - val_acc: 0.8272
Epoch 11/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3606 - acc: 0.8747 - val_loss: 0.6534 - val_acc: 0.8090
Epoch 12/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3378 - acc: 0.8816 - val_loss: 0.4706 - val_acc: 0.8475
Epoch 13/50
391/391 [==============================] - 20s 51ms/step - loss: 0.3182 - acc: 0.8888 - val_loss: 0.4721 - val_acc: 0.8438
Epoch 14/50
391/391 [==============================] - 21s 54ms/step - loss: 0.3070 - acc: 0.8941 - val_loss: 0.5304 - val_acc: 0.8327
Epoch 15/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2959 - acc: 0.8972 - val_loss: 0.5714 - val_acc: 0.8310
Epoch 16/50
391/391 [==============================] - 22s 56ms/step - loss: 0.2757 - acc: 0.9032 - val_loss: 0.5431 - val_acc: 0.8413
Epoch 17/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2722 - acc: 0.9045 - val_loss: 0.5690 - val_acc: 0.8257
Epoch 18/50
391/391 [==============================] - 21s 54ms/step - loss: 0.2542 - acc: 0.9105 - val_loss: 0.5157 - val_acc: 0.8502
Epoch 19/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2447 - acc: 0.9150 - val_loss: 0.4588 - val_acc: 0.8625
Epoch 20/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2299 - acc: 0.9180 - val_loss: 0.5702 - val_acc: 0.8410
Epoch 21/50
391/391 [==============================] - 20s 51ms/step - loss: 0.2238 - acc: 0.9207 - val_loss: 0.5116 - val_acc: 0.8418
Epoch 22/50
391/391 [==============================] - 20s 52ms/step - loss: 0.2201 - acc: 0.9242 - val_loss: 0.4404 - val_acc: 0.8655
Epoch 23/50
391/391 [==============================] - 21s 53ms/step - loss: 0.2071 - acc: 0.9270 - val_loss: 0.3913 - val_acc: 0.8784
Epoch 24/50
391/391 [==============================] - 21s 55ms/step - loss: 0.2007 - acc: 0.9300 - val_loss: 0.4831 - val_acc: 0.8581
Epoch 25/50
391/391 [==============================] - 20s 52ms/step - loss: 0.1993 - acc: 0.9298 - val_loss: 0.4367 - val_acc: 0.8684
Epoch 26/50
391/391 [==============================] - 24s 61ms/step - loss: 0.1902 - acc: 0.9327 - val_loss: 0.3972 - val_acc: 0.8818
Epoch 27/50
391/391 [==============================] - 25s 64ms/step - loss: 0.1804 - acc: 0.9355 - val_loss: 0.4377 - val_acc: 0.8714
Epoch 28/50
391/391 [==============================] - 24s 62ms/step - loss: 0.1751 - acc: 0.9396 - val_loss: 0.4713 - val_acc: 0.8644
Epoch 29/50
391/391 [==============================] - 23s 60ms/step - loss: 0.1686 - acc: 0.9399 - val_loss: 0.4441 - val_acc: 0.8689
Epoch 30/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1619 - acc: 0.9436 - val_loss: 0.5143 - val_acc: 0.8729
Epoch 31/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1562 - acc: 0.9439 - val_loss: 0.4043 - val_acc: 0.8834
Epoch 32/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1512 - acc: 0.9463 - val_loss: 0.3830 - val_acc: 0.8895
Epoch 33/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1456 - acc: 0.9482 - val_loss: 0.3707 - val_acc: 0.8900
Epoch 34/50
391/391 [==============================] - 23s 58ms/step - loss: 0.1415 - acc: 0.9498 - val_loss: 0.4362 - val_acc: 0.8788
Epoch 35/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1423 - acc: 0.9501 - val_loss: 0.4081 - val_acc: 0.8881
Epoch 36/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1350 - acc: 0.9523 - val_loss: 0.4355 - val_acc: 0.8809
Epoch 37/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1343 - acc: 0.9526 - val_loss: 0.4465 - val_acc: 0.8825
Epoch 38/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1314 - acc: 0.9526 - val_loss: 0.3857 - val_acc: 0.8941
Epoch 39/50
391/391 [==============================] - 22s 57ms/step - loss: 0.1207 - acc: 0.9574 - val_loss: 0.5319 - val_acc: 0.8636
Epoch 40/50
391/391 [==============================] - 21s 55ms/step - loss: 0.1206 - acc: 0.9569 - val_loss: 0.4038 - val_acc: 0.8907
Epoch 41/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1191 - acc: 0.9578 - val_loss: 0.3672 - val_acc: 0.8963
Epoch 42/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1148 - acc: 0.9596 - val_loss: 0.4449 - val_acc: 0.8819
Epoch 43/50
391/391 [==============================] - 21s 54ms/step - loss: 0.1116 - acc: 0.9591 - val_loss: 0.4252 - val_acc: 0.8844
Epoch 44/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1097 - acc: 0.9612 - val_loss: 0.5019 - val_acc: 0.8774
Epoch 45/50
391/391 [==============================] - 22s 55ms/step - loss: 0.1066 - acc: 0.9619 - val_loss: 0.4458 - val_acc: 0.8822
Epoch 46/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1032 - acc: 0.9634 - val_loss: 0.4647 - val_acc: 0.8833
Epoch 47/50
391/391 [==============================] - 22s 56ms/step - loss: 0.1027 - acc: 0.9634 - val_loss: 0.4329 - val_acc: 0.8845
Epoch 48/50
391/391 [==============================] - 22s 56ms/step - loss: 0.0990 - acc: 0.9644 - val_loss: 0.4254 - val_acc: 0.8880
Epoch 49/50
391/391 [==============================] - 22s 57ms/step - loss: 0.0935 - acc: 0.9676 - val_loss: 0.4516 - val_acc: 0.8850
Epoch 50/50
391/391 [==============================] - 22s 55ms/step - loss: 0.0969 - acc: 0.9660 - val_loss: 0.3984 - val_acc: 0.8995
10000/10000 [==============================] - 1s 143us/step