Why does the training time of each epoch vary so heavily?

I am training a CNN model in Keras. I find that the time per epoch is nearly the same for the first 10 epochs, about 140s each. But in the subsequent epochs, the training time increases to about 500s per epoch.
So, what's the problem?
Epoch 1/2000
184s - loss: 0.2587 - fscore_cloud: 0.8348 - val_loss: 0.1987 - val_fscore_cloud: 0.8781
Epoch 2/2000
163s - loss: 0.1899 - fscore_cloud: 0.8868 - val_loss: 0.1927 - val_fscore_cloud: 0.8877
Epoch 3/2000
144s - loss: 0.1821 - fscore_cloud: 0.8915 - val_loss: 0.1885 - val_fscore_cloud: 0.8910
Epoch 4/2000
143s - loss: 0.1794 - fscore_cloud: 0.8931 - val_loss: 0.1856 - val_fscore_cloud: 0.8930
Epoch 5/2000
142s - loss: 0.1784 - fscore_cloud: 0.8937 - val_loss: 0.1846 - val_fscore_cloud: 0.8935
Epoch 6/2000
142s - loss: 0.1774 - fscore_cloud: 0.8939 - val_loss: 0.1835 - val_fscore_cloud: 0.8940
Epoch 7/2000
144s - loss: 0.1766 - fscore_cloud: 0.8942 - val_loss: 0.1827 - val_fscore_cloud: 0.8944
Epoch 8/2000
141s - loss: 0.1759 - fscore_cloud: 0.8944 - val_loss: 0.1820 - val_fscore_cloud: 0.8947
Epoch 9/2000
139s - loss: 0.1754 - fscore_cloud: 0.8946 - val_loss: 0.1813 - val_fscore_cloud: 0.8950
Epoch 10/2000
184s - loss: 0.1749 - fscore_cloud: 0.8947 - val_loss: 0.1806 - val_fscore_cloud: 0.8952
Epoch 11/2000
544s - loss: 0.1743 - fscore_cloud: 0.8948 - val_loss: 0.1800 - val_fscore_cloud: 0.8954
Epoch 12/2000
545s - loss: 0.1738 - fscore_cloud: 0.8950 - val_loss: 0.1796 - val_fscore_cloud: 0.8955
Epoch 13/2000
553s - loss: 0.1731 - fscore_cloud: 0.8952 - val_loss: 0.1791 - val_fscore_cloud: 0.8957
Epoch 14/2000
214s - loss: 0.1723 - fscore_cloud: 0.8955 - val_loss: 0.1776 - val_fscore_cloud: 0.8961
Epoch 15/2000
145s - loss: 0.1706 - fscore_cloud: 0.8965 - val_loss: 0.1768 - val_fscore_cloud: 0.8964
Epoch 16/2000
146s - loss: 0.1683 - fscore_cloud: 0.8975 - val_loss: 0.1743 - val_fscore_cloud: 0.8980
Epoch 17/2000
140s - loss: 0.1658 - fscore_cloud: 0.8983 - val_loss: 0.1734 - val_fscore_cloud: 0.8986
Epoch 18/2000
142s - loss: 0.1640 - fscore_cloud: 0.8987 - val_loss: 0.1719 - val_fscore_cloud: 0.8990
Epoch 19/2000
137s - loss: 0.1621 - fscore_cloud: 0.8996 - val_loss: 0.1699 - val_fscore_cloud: 0.9001
Epoch 20/2000
277s - loss: 0.1601 - fscore_cloud: 0.9007 - val_loss: 0.1678 - val_fscore_cloud: 0.9015
Epoch 21/2000
310s - loss: 0.1579 - fscore_cloud: 0.9018 - val_loss: 0.1655 - val_fscore_cloud: 0.9028
Epoch 22/2000
345s - loss: 0.1558 - fscore_cloud: 0.9031 - val_loss: 0.1635 - val_fscore_cloud: 0.9042
Epoch 23/2000
587s - loss: 0.1538 - fscore_cloud: 0.9044 - val_loss: 0.1621 - val_fscore_cloud: 0.9054
Epoch 24/2000
525s - loss: 0.1519 - fscore_cloud: 0.9056 - val_loss: 0.1610 - val_fscore_cloud: 0.9061
Epoch 25/2000
579s - loss: 0.1500 - fscore_cloud: 0.9068 - val_loss: 0.1597 - val_fscore_cloud: 0.9069
Epoch 26/2000
557s - loss: 0.1485 - fscore_cloud: 0.9075 - val_loss: 0.1575 - val_fscore_cloud: 0.9078
Epoch 27/2000
530s - loss: 0.1469 - fscore_cloud: 0.9084 - val_loss: 0.1561 - val_fscore_cloud: 0.9083
Epoch 28/2000

I also ran into this problem. Even when we just run torch.cuda.FloatTensor(a,b).normal_() every epoch, the later epochs take longer than the earlier ones. I guess this phenomenon is caused by memory usage.
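To narrow this down, it can help to log the wall-clock time of each epoch explicitly. Below is a minimal sketch of a Keras callback that does this; the class name and the commented usage line are illustrative, not from the original post.

import time
import tensorflow as tf

# records wall-clock time per epoch to pinpoint where the slowdown starts
class EpochTimer(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        print(f"Epoch {epoch + 1} took {time.time() - self._start:.1f}s")

# usage: model.fit(..., callbacks=[EpochTimer()])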

Related

Is my model overfitting/underfitting? Can someone explain the behavior?

The accuracy on the training dataset increases steadily, and the loss decreases accordingly.
However, the accuracy on the validation dataset fluctuates strangely. It increases overall but tends to dip at times, and it isn't improving at the same rate as the training accuracy. The validation loss decreases but sometimes increases as well.
Here are my results after 10 epochs:
Epoch 1/10
493/493 [==============================] - 330s 668ms/step - loss: 0.8949 - accuracy: 0.6697 - val_loss: 0.6944 - val_accuracy: 0.6463
Epoch 2/10
493/493 [==============================] - 290s 589ms/step - loss: 0.5457 - accuracy: 0.7958 - val_loss: 0.6451 - val_accuracy: 0.7450
Epoch 3/10
493/493 [==============================] - 331s 672ms/step - loss: 0.5110 - accuracy: 0.8235 - val_loss: 0.8121 - val_accuracy: 0.6904
Epoch 4/10
493/493 [==============================] - 278s 563ms/step - loss: 0.4697 - accuracy: 0.8479 - val_loss: 0.7215 - val_accuracy: 0.7153
Epoch 5/10
493/493 [==============================] - 265s 537ms/step - loss: 0.4395 - accuracy: 0.8726 - val_loss: 0.6471 - val_accuracy: 0.7505
Epoch 6/10
493/493 [==============================] - 277s 561ms/step - loss: 0.4043 - accuracy: 0.8924 - val_loss: 0.5335 - val_accuracy: 0.8169
Epoch 7/10
493/493 [==============================] - 335s 679ms/step - loss: 0.3918 - accuracy: 0.9024 - val_loss: 0.5372 - val_accuracy: 0.8294
Epoch 8/10
493/493 [==============================] - 320s 650ms/step - loss: 0.3679 - accuracy: 0.9111 - val_loss: 0.5790 - val_accuracy: 0.8171
Epoch 9/10
493/493 [==============================] - 299s 606ms/step - loss: 0.3618 - accuracy: 0.9151 - val_loss: 0.3969 - val_accuracy: 0.8874
Epoch 10/10
493/493 [==============================] - 272s 552ms/step - loss: 0.3374 - accuracy: 0.9235 - val_loss: 0.4553 - val_accuracy: 0.8652
Here is my code for the layers etc:
import tensorflow as tf
from tensorflow.keras import regularizers

model = tf.keras.models.Sequential([
    # Conv2D: number of filters and filter size; input_shape is the input image size
    tf.keras.layers.Conv2D(16, (3, 3), kernel_regularizer=regularizers.l2(0.01),
                           activation='relu', input_shape=(200, 200, 1)),
    tf.keras.layers.MaxPool2D(2, 2),  # keeps the max pixel in each 2x2 window
    tf.keras.layers.Conv2D(32, (3, 3), kernel_regularizer=regularizers.l2(0.01), activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), kernel_regularizer=regularizers.l2(0.01), activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    # after increasing the number of channels, flatten for the dense layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              metrics=['accuracy'])
model_fit = model.fit(train_dataset,
                      steps_per_epoch=None,
                      epochs=10,
                      validation_data=validation_dataset)
I added regularization to the Conv2D layers to reduce the overfitting I was previously experiencing. I have also tried changing the regularization values.

Image Classification CNN save best parameters with ModelCheckpoint

I am doing image classification with a CNN.
The following is my model:
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=16, batch_size=64,
                    validation_data=(x_val, y_val))
The epoch results are as below:
Epoch 1/16
416/416 [==============================] - 832s 2s/step - loss: 0.7742 - accuracy: 0.8689 - val_loss: 0.5149 - val_accuracy: 0.8451
Epoch 2/16
416/416 [==============================] - 825s 2s/step - loss: 0.5608 - accuracy: 0.8585 - val_loss: 0.3776 - val_accuracy: 0.8808
Epoch 3/16
416/416 [==============================] - 775s 2s/step - loss: 0.1926 - accuracy: 0.9338 - val_loss: 0.3328 - val_accuracy: 0.9066
Epoch 4/16
416/416 [==============================] - 587s 1s/step - loss: 0.0984 - accuracy: 0.9650 - val_loss: 0.3163 - val_accuracy: 0.9388
Epoch 5/16
416/416 [==============================] - 578s 1s/step - loss: 0.0606 - accuracy: 0.9798 - val_loss: 0.3584 - val_accuracy: 0.9357
Epoch 6/16
416/416 [==============================] - 511s 1s/step - loss: 0.0457 - accuracy: 0.9860 - val_loss: 0.5067 - val_accuracy: 0.9360
Epoch 7/16
416/416 [==============================] - 476s 1s/step - loss: 0.3649 - accuracy: 0.8912 - val_loss: 0.4446 - val_accuracy: 0.8645
Epoch 8/16
416/416 [==============================] - 476s 1s/step - loss: 0.3108 - accuracy: 0.9006 - val_loss: 0.6096 - val_accuracy: 0.8681
Epoch 9/16
416/416 [==============================] - 477s 1s/step - loss: 0.2397 - accuracy: 0.9158 - val_loss: 0.4061 - val_accuracy: 0.9042
Epoch 10/16
416/416 [==============================] - 502s 1s/step - loss: 0.1334 - accuracy: 0.9532 - val_loss: 0.3673 - val_accuracy: 0.9281
Epoch 11/16
416/416 [==============================] - 478s 1s/step - loss: 0.2787 - accuracy: 0.9184 - val_loss: 0.6745 - val_accuracy: 0.9039
Epoch 12/16
416/416 [==============================] - 481s 1s/step - loss: 0.7476 - accuracy: 0.8649 - val_loss: 0.4643 - val_accuracy: 0.8777
Epoch 13/16
416/416 [==============================] - 488s 1s/step - loss: 0.2187 - accuracy: 0.9271 - val_loss: 0.3347 - val_accuracy: 0.9102
Epoch 14/16
416/416 [==============================] - 483s 1s/step - loss: 4.0347 - accuracy: 0.9171 - val_loss: 0.6267 - val_accuracy: 0.7980
Epoch 15/16
416/416 [==============================] - 476s 1s/step - loss: 0.5838 - accuracy: 0.8095 - val_loss: 0.4481 - val_accuracy: 0.8663
Epoch 16/16
416/416 [==============================] - 492s 1s/step - loss: 0.4916 - accuracy: 0.8520 - val_loss: 1.0406 - val_accuracy: 0.6113
My first question: model.fit keeps the result of the last epoch, but my last epoch result is not the best (epoch 4/16 is the best result based on minimum val_loss).
So, how could I build a model using the epoch 4/16 parameters?
Note: I have saved the model.
I realize that if I add ModelCheckpoint to model.fit, then the minimum-val_loss weights may be saved. However, because it takes me a long time to run the code, is it possible to extract the minimum-val_loss result directly from the model I saved, without running the code again?
My second question is that I do not understand how ModelCheckpoint works, since my understanding is that ModelCheckpoint will stop at the best epoch.
If I have a ModelCheckpoint like the one below:
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
If there are 16 epochs and the minimum val_loss happens at epoch 4/16, will ModelCheckpoint stop running the code at epoch 4/16 and save the parameters? But if it does not run the remaining epochs 5 to 16, how does it know that epoch 4 is the best? Or does the code with ModelCheckpoint still run all 16 epochs and just save the best one (epoch 4)?
Thanks!!
ModelCheckpoint does not stop the training. After each epoch it compares the result with the current best so far and keeps the better of the two (see the documented source code); you then only need to reload the saved model to get the best weights.
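As a minimal sketch of that pattern (reusing the variable names from the question, with the fit call adjusted to pass the callback):

from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model

# save weights whenever val_loss improves; training still runs all 16 epochs
mc = ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min',
                     save_best_only=True)
history = model.fit(x_train, y_train, epochs=16, batch_size=64,
                    validation_data=(x_val, y_val), callbacks=[mc])

# after training, reload the checkpoint holding the lowest val_loss
best_model = load_model('best_model.h5')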

Why is the cross-validation loss not decreasing for this model?

GitHub link to the code:
https://github.com/abhijit1247/Resnet50_trial1.git
I am trying to use transfer learning for satellite image classification on the DeepSAT-6 dataset.
Link to the dataset:
https://www.kaggle.com/crawford/deepsat-sat6
My base model is ResNet50.
I am trying to follow the training strategy from https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html: first train the top layers separately on the output of the completely frozen convolutional base, then attach the top with its pre-trained weights and start unfreezing the convolutional blocks.
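In Keras, that freeze-then-unfreeze pattern looks roughly like the sketch below. This is only an illustration under assumptions: the conv5_ prefix matches the layer naming of tf.keras's bundled ResNet50, which differs between Keras versions.

from tensorflow.keras.applications import ResNet50

base_model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# phase 1: freeze the whole convolutional base and train only the new top
for layer in base_model.layers:
    layer.trainable = False

# phase 2: unfreeze the bottommost (last) convolutional block for fine-tuning
for layer in base_model.layers:
    if layer.name.startswith('conv5_'):  # last residual block in tf.keras naming
        layer.trainable = True

# the model must be recompiled after changing trainable flags, ideally with a
# low learning rate so the pre-trained weights are not destroyed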
The top layer:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = Sequential([
    Dense(1024, input_dim=2048),
    BatchNormalization(),
    Activation('relu'),
    Dense(256),
    BatchNormalization(),
    Activation('relu'),
    Dense(6, activation='softmax'),
])
I trained it for 50 epochs and got a training accuracy of around 95.09% and a cross-validation accuracy of 93.71%.
Then I attached this top to the convolutional base and unfroze the bottommost convolutional block:
Train on 275400 samples, validate on 48600 samples.
Epoch 1/10
275400/275400 [==============================] - 649s 2ms/step - loss: 0.0962 - accuracy: 0.9656 - val_loss: 6.1452 - val_accuracy: 0.1554
Epoch 2/10
275400/275400 [==============================] - 652s 2ms/step - loss: 0.0835 - accuracy: 0.9700 - val_loss: 5.5609 - val_accuracy: 0.1554
Epoch 3/10
275400/275400 [==============================] - 665s 2ms/step - loss: 0.0745 - accuracy: 0.9734 - val_loss: 6.6450 - val_accuracy: 0.1554
Epoch 4/10
275400/275400 [==============================] - 663s 2ms/step - loss: 0.0680 - accuracy: 0.9758 - val_loss: 6.4879 - val_accuracy: 0.1554
Epoch 5/10
275400/275400 [==============================] - 678s 2ms/step - loss: 0.0634 - accuracy: 0.9775 - val_loss: 6.2436 - val_accuracy: 0.1554
Epoch 6/10
275400/275400 [==============================] - 651s 2ms/step - loss: 0.0589 - accuracy: 0.9789 - val_loss: 7.9822 - val_accuracy: 0.1554
Epoch 7/10
275400/275400 [==============================] - 662s 2ms/step - loss: 0.0555 - accuracy: 0.9803 - val_loss: 9.0204 - val_accuracy: 0.1554
Epoch 8/10
275400/275400 [==============================] - 701s 3ms/step - loss: 0.0521 - accuracy: 0.9812 - val_loss: 8.3389 - val_accuracy: 0.1554
Epoch 9/10
275400/275400 [==============================] - 669s 2ms/step - loss: 0.0502 - accuracy: 0.9824 - val_loss: 8.9311 - val_accuracy: 0.1554
So why is the cross-validation loss behaving so oddly? Why isn't it decreasing with the epochs?

The validation loss and accuracy during fitting do not match the evaluate results

The loss and accuracy of the validation data during fitting are not equal to those from the evaluate step.
I used an image generator.
The code is in the attachment: incep_v3.py is the fitting code, model_app.py is the evaluation code.
E:\Jason\incep_v3.py
E:\Jason\model_app.py
The fitting log:
Epoch 00001: saving model to ./training2/cp-01.ckpt
Epoch 2/30
150/150 [==============================] - 368s 2s/step - loss: 0.0675 - accuracy: 0.9787 - val_loss: 0.1083 - val_accuracy: 0.9375
Epoch 00002: saving model to ./training2/cp-02.ckpt
Epoch 3/30
150/150 [==============================] - 382s 3s/step - loss: 0.0506 - accuracy: 0.9808 - val_loss: 0.0429 - val_accuracy: 1.0000
Epoch 00003: saving model to ./training2/cp-03.ckpt
Epoch 4/30
150/150 [==============================] - 335s 2s/step - loss: 0.0433 - accuracy: 0.9833 - val_loss: 0.1925 - val_accuracy: 0.8750
Epoch 00004: saving model to ./training2/cp-04.ckpt
Epoch 5/30
150/150 [==============================] - 337s 2s/step - loss: 0.0573 - accuracy: 0.9792 - val_loss: 0.2156 - val_accuracy: 0.9375
Epoch 00005: saving model to ./training2/cp-05.ckpt
Epoch 6/30
150/150 [==============================] - 336s 2s/step - loss: 0.0383 - accuracy: 0.9867 - val_loss: 0.0069 - val_accuracy: 1.0000
I load cp-05.ckpt to evaluate; the evaluation log:
0.20122399926185608 0.9712499976158142
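For reference, the evaluation step presumably looks something like the sketch below (the generator name and metric order are assumptions, since model_app.py is not shown):

# load the checkpointed weights and re-evaluate on the validation data
model.load_weights('./training2/cp-05.ckpt')
val_loss, val_accuracy = model.evaluate(val_generator)
print(val_loss, val_accuracy)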

Validation accuracy stagnates while training accuracy improves

I'm pretty new to deep learning, so I'm sorry if I'm missing something obvious.
I am currently training a CNN with a dataset I put together.
When training, the training accuracy behaves pretty normally and improves, reaching >99% accuracy. My validation accuracy starts off at about 75% and fluctuates around 81% ± 1%. After training, the model performs really well on completely new data.
Epoch 1/100
187/187 [==============================] - 103s 550ms/step - loss: 1.1336 - acc: 0.5384 - val_loss: 0.8065 - val_acc: 0.7405
Epoch 2/100
187/187 [==============================] - 97s 519ms/step - loss: 0.8041 - acc: 0.7345 - val_loss: 0.7566 - val_acc: 0.7720
Epoch 3/100
187/187 [==============================] - 97s 519ms/step - loss: 0.7194 - acc: 0.7945 - val_loss: 0.7410 - val_acc: 0.7846
Epoch 4/100
187/187 [==============================] - 97s 517ms/step - loss: 0.6688 - acc: 0.8324 - val_loss: 0.7295 - val_acc: 0.7924
Epoch 5/100
187/187 [==============================] - 97s 518ms/step - loss: 0.6288 - acc: 0.8611 - val_loss: 0.7197 - val_acc: 0.7961
Epoch 6/100
187/187 [==============================] - 96s 515ms/step - loss: 0.5989 - acc: 0.8862 - val_loss: 0.7252 - val_acc: 0.7961
Epoch 7/100
187/187 [==============================] - 96s 514ms/step - loss: 0.5762 - acc: 0.8981 - val_loss: 0.7135 - val_acc: 0.8063
Epoch 8/100
187/187 [==============================] - 97s 518ms/step - loss: 0.5513 - acc: 0.9186 - val_loss: 0.7089 - val_acc: 0.8077
Epoch 9/100
187/187 [==============================] - 96s 513ms/step - loss: 0.5351 - acc: 0.9280 - val_loss: 0.7113 - val_acc: 0.8053
Epoch 10/100
187/187 [==============================] - 96s 514ms/step - loss: 0.5189 - acc: 0.9417 - val_loss: 0.7167 - val_acc: 0.8094
Epoch 11/100
187/187 [==============================] - 96s 515ms/step - loss: 0.5026 - acc: 0.9483 - val_loss: 0.7104 - val_acc: 0.8162
Epoch 12/100
187/187 [==============================] - 96s 516ms/step - loss: 0.4914 - acc: 0.9538 - val_loss: 0.7114 - val_acc: 0.8101
Epoch 13/100
187/187 [==============================] - 96s 515ms/step - loss: 0.4809 - acc: 0.9583 - val_loss: 0.7099 - val_acc: 0.8141
Epoch 14/100
187/187 [==============================] - 96s 512ms/step - loss: 0.4681 - acc: 0.9656 - val_loss: 0.7149 - val_acc: 0.8182
Epoch 15/100
187/187 [==============================] - 96s 515ms/step - loss: 0.4605 - acc: 0.9701 - val_loss: 0.7139 - val_acc: 0.8172
Epoch 16/100
187/187 [==============================] - 96s 514ms/step - loss: 0.4479 - acc: 0.9753 - val_loss: 0.7102 - val_acc: 0.8182
Epoch 17/100
187/187 [==============================] - 96s 513ms/step - loss: 0.4418 - acc: 0.9805 - val_loss: 0.7087 - val_acc: 0.8247
Epoch 18/100
187/187 [==============================] - 96s 512ms/step - loss: 0.4363 - acc: 0.9809 - val_loss: 0.7148 - val_acc: 0.8213
Epoch 19/100
187/187 [==============================] - 96s 516ms/step - loss: 0.4225 - acc: 0.9870 - val_loss: 0.7184 - val_acc: 0.8203
Epoch 20/100
187/187 [==============================] - 96s 513ms/step - loss: 0.4241 - acc: 0.9863 - val_loss: 0.7216 - val_acc: 0.8189
Epoch 21/100
187/187 [==============================] - 96s 513ms/step - loss: 0.4132 - acc: 0.9908 - val_loss: 0.7143 - val_acc: 0.8199
Epoch 22/100
187/187 [==============================] - 96s 515ms/step - loss: 0.4050 - acc: 0.9936 - val_loss: 0.7109 - val_acc: 0.8233
Epoch 23/100
187/187 [==============================] - 96s 515ms/step - loss: 0.4040 - acc: 0.9928 - val_loss: 0.7118 - val_acc: 0.8203
Epoch 24/100
187/187 [==============================] - 96s 511ms/step - loss: 0.3989 - acc: 0.9930 - val_loss: 0.7194 - val_acc: 0.8165
Epoch 25/100
187/187 [==============================] - 97s 517ms/step - loss: 0.3933 - acc: 0.9946 - val_loss: 0.7163 - val_acc: 0.8155
Epoch 26/100
187/187 [==============================] - 97s 516ms/step - loss: 0.3884 - acc: 0.9957 - val_loss: 0.7225 - val_acc: 0.8148
Epoch 27/100
187/187 [==============================] - 95s 510ms/step - loss: 0.3876 - acc: 0.9959 - val_loss: 0.7224 - val_acc: 0.8179
The plot in itself looks like overfitting, but I've taken plenty of measures against overfitting and none seem to work. Here is my model:
# transfer learning with ResNet50
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Flatten, Dense, BatchNormalization, Activation, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# function to fine-tune the model
def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    # make the base model untrainable
    for layer in base_model.layers:
        layer.trainable = False
    x = base_model.output
    x = Flatten()(x)
    # add dense layers with L2 regularization, batch norm, and dropout
    for fc in fc_layers:
        x = Dense(fc, use_bias=False, kernel_regularizer=l2(0.003))(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        x = Dropout(dropout)(x)
    # new softmax layer
    x = Dense(num_classes, use_bias=False)(x)
    x = BatchNormalization()(x)
    predictions = Activation('softmax')(x)
    finetune_model = Model(inputs=base_model.input, outputs=predictions)
    return finetune_model

FC_LAYERS = [1024, 1024]
dropout = 0.5
model = build_finetune_model(base_model, dropout=dropout, fc_layers=FC_LAYERS,
                             num_classes=len(categories))
I'm adjusting for class weights and have set a really low learning rate in hopes of slowing the learning down.
model.compile(optimizer=Adam(lr=0.000005),loss='categorical_crossentropy',metrics=['accuracy'], weighted_metrics=class_weight)
I'm really confused by the fact that the validation accuracy starts so high (significantly higher than the training accuracy) and barely improves during the entire training process. As mentioned before, it seems to be overfitting, but I've added dropout, batch normalization, and regularizers, and it doesn't seem to work. Augmenting the data with horizontal flips, random cropping, random brightness, and rotation does not change the accuracy significantly either. Turning shuffle off for my training data inside ImageDataGenerator().flow_from_directory() makes the model train at around 25% training accuracy and <50% validation accuracy (edit: the accuracy seems to have been so low because the learning rate was too low in that case).
Again, the model works surprisingly well on new testing data. I'm looking to increase the validation accuracy and want to understand why the neural network is behaving that way.
Your model is overfitting. You may want to use data augmentation for image models, e.g. use ImageDataGenerator (https://keras.io/preprocessing/image/) to randomly shift, rotate, and crop images.
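A minimal augmentation setup along those lines might look like the following sketch. Note that ImageDataGenerator has no direct random-crop option, so zoom_range is a common stand-in; all parameter values and the directory path here are illustrative, not tuned.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,       # random rotations up to +/-20 degrees
    width_shift_range=0.1,   # random horizontal shifts (fraction of width)
    height_shift_range=0.1,  # random vertical shifts (fraction of height)
    zoom_range=0.15,         # random zoom, approximating random cropping
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    'data/train',            # hypothetical directory layout
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')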
SGD tries to find the simplest possible way to minimise the loss function on the dataset; given a large enough set of data points, it is forced to come up with a generic solution, but whenever possible DNNs tend to "memorise" the inputs, since that is the simplest way to reduce the loss. Dropout and regularisation do help, but at the end of the day what matters is the validation metrics, assuming of course that your validation set is correctly balanced.
