Keras model training accuracy dipped drastically while training

I am training a deep CNN with Keras, and the training accuracy was around 90% up to the 16th epoch. It then dropped massively to 40% at the 17th epoch, and to 3% at the next one, where it stayed until the end of training. What could have caused that?
This is my model architecture:
## imports
from keras.layers import Input, Conv3D, Conv2D, Reshape, Flatten, Dense, Dropout

## input layer
input_layer = Input((S, S, L, 1))
## convolutional layers
conv_layer1 = Conv3D(filters=8, kernel_size=(3, 3, 7), activation='relu', padding = 'same')(input_layer)
conv_layer2 = Conv3D(filters=16, kernel_size=(3, 3, 5), activation='relu', padding = 'same')(conv_layer1)
conv_layer3 = Conv3D(filters=32, kernel_size=(3, 3, 3), activation='relu', padding = 'same')(conv_layer2)
print(conv_layer3._keras_shape)
conv3d_shape = conv_layer3._keras_shape
conv_layer3 = Reshape((conv3d_shape[1], conv3d_shape[2], conv3d_shape[3]*conv3d_shape[4]))(conv_layer3)
conv_layer4 = Conv2D(filters=64, kernel_size=(3,3), activation='relu')(conv_layer3)
flatten_layer = Flatten()(conv_layer4)
## fully connected layers
dense_layer1 = Dense(units=256, activation='relu')(flatten_layer)
dense_layer1 = Dropout(0.4)(dense_layer1)
dense_layer2 = Dense(units=128, activation='relu')(dense_layer1)
dense_layer2 = Dropout(0.4)(dense_layer2)
output_layer = Dense(units=output_units, activation='softmax')(dense_layer2)
Here is a screenshot of the training:
[training screenshot]
In this regard, I have two questions:
What are the possible reasons for this to take place?
I suspect the reported numbers could be incorrect. I have set up a checkpoint so that only the best weights are saved. It took about 16 hours to train the model. Is there a way I can still get the weights from the last epoch, i.e. the non-best weights, even though the checkpoint was in place?

Your loss became NaN at the 17th epoch, which is what made the accuracy collapse.
It is impossible to recover weights other than by loading them from saved files, so the last-epoch weights are gone unless they were written to disk.
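If you want the most recent weights to survive future long runs, one option is to keep the best-only checkpoint and add a second ModelCheckpoint that overwrites its file every epoch, plus TerminateOnNaN to stop as soon as the loss blows up. A minimal sketch; the file names, data variables, and epoch count are placeholders:
from keras.callbacks import ModelCheckpoint, TerminateOnNaN

# Keeps only the weights with the best monitored metric (what the question already does).
best_checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_loss',
                                  save_best_only=True, save_weights_only=True)

# Overwrites this file after every epoch, so the latest weights always survive a long run.
last_checkpoint = ModelCheckpoint('last_weights.h5', save_best_only=False,
                                  save_weights_only=True)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=30,
          callbacks=[best_checkpoint, last_checkpoint, TerminateOnNaN()])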

Related

How to use categorical_hinge loss in keras in order to train with an SVM in the last layer?

I want to train a CNN that uses an SVM-style classifier at the last layer. I understand that categorical_hinge is the best loss function for that. I have 6 classes to classify.
My model is as shown below:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(50, (3, 3), activation='relu', input_shape=train_data.shape[1:]))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(400, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'sigmoid'))
Is there a problem with the network, the data processing, or the loss function?
The model does not learn anything after a point, as shown in the image.
What should I do?
Your model has a single output neuron; there is no way this will work with 6 classes. The output of your model should have 6 neurons. Also, the output layer should have no activation function, so that it produces the logits that the categorical hinge can use.
Note that categorical_hinge was added to Keras only recently (2-3 weeks ago at the time of writing), so it is quite new and probably not many people have tested it yet.
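A minimal sketch of the output head that answer describes (num_classes and the optimizer are placeholder choices, and the earlier layers of the model are assumed unchanged):
from keras.layers import Dense

num_classes = 6

# 6 output neurons with no activation, so the model produces raw scores (logits).
model.add(Dense(num_classes))

# categorical_hinge expects one-hot encoded labels (e.g. from keras.utils.to_categorical).
model.compile(loss='categorical_hinge', optimizer='adadelta', metrics=['accuracy'])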
Use the hinge loss and a linear activation in the last layer:
from keras.layers import Activation
from keras.regularizers import l2

model.add(Dense(nb_classes, kernel_regularizer=l2(0.01)))
model.add(Activation('linear'))
model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])
For more information, see https://github.com/keras-team/keras/issues/6090

Layer Counting with Keras Deep Learning

I am working on my first deep-learning project: counting layers in an image with a convolutional neural network.
After fixing tons of errors I could finally train my model, but I am getting 0 accuracy; after the 2nd epoch it just stops because it is not learning anything.
The input is a 1200 x 100 image of layers and the output is an integer.
If anyone can look over my model and suggest a tip, that would be awesome.
Thanks.
import keras
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Convolution2D(32, 5, 5, activation='relu', input_shape=(1, 1200, 100)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='relu'))

batch_size = 1
epochs = 10

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(sgd, loss='poisson', metrics=['accuracy'])
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
history = model.fit(xtrain, ytrain, batch_size=batch_size, nb_epoch=epochs,
                    validation_data=validation, callbacks=[earlyStopping], verbose=1)
There are so many things to criticize:
1. A 1200 x 100 image (I assume those are pixels) is quite big for a CNN. In the ImageNet competitions, the images are 224 x 224 or 299 x 299.
2. Why don't you use a linear or sigmoid activation on the last layer?
3. Did you normalize your outputs to the range 0 to 1? Normalize them: divide your outputs by their maximum, and multiply by the same number when using the CNN for prediction after training.
4. Don't use early stopping with such small data, it's unnecessary:
earlyStopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')
5. Lower the learning rate to 0.001 and use Adam as the optimizer.
Your data isn't actually that big, so this should work; the problem is most likely the normalization of your inputs/outputs, so check that first.
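Putting those suggestions together, a rough sketch of how the model could be changed (the MSE loss is my own substitution rather than something the answer specifies, and ytrain is assumed to be a NumPy array of layer counts):
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

# Normalize targets to [0, 1] by dividing by their maximum (point 3).
y_max = ytrain.max()
ytrain_scaled = ytrain / y_max

model = Sequential()
model.add(Convolution2D(32, 5, 5, activation='relu', input_shape=(1, 1200, 100)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 5, 5, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
# Linear output for a regression-style count instead of ReLU (point 2).
model.add(Dense(1, activation='linear'))

# Adam with learning rate 0.001 (point 5); MSE is a substitution for the Poisson loss.
model.compile(optimizer=Adam(lr=0.001), loss='mse')

history = model.fit(xtrain, ytrain_scaled, batch_size=1, nb_epoch=10, verbose=1)

# Scale predictions back to layer counts.
predicted_counts = model.predict(xtrain) * y_max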

Keras Stateful LSTM - strange val loss

I'm trying to do a contrived one-step-ahead stock market prediction, and I'm unsure whether I am doing everything correctly, since my validation loss does not go down and my graphs look off to me.
With no dropout I get the following; it looks like a common case of overfitting:
[training/validation loss plot]
However, I would expect the predictions to basically mirror the training data in this case, since the model has learnt its pattern, but instead I get this graph when plotting y_train vs the predictions:
[y_train vs predictions plot]
Strangely, when plotting y_test vs the predictions it looks more accurate:
[y_test vs predictions plot]
How can y_train be so far off with such a low training MSE, while y_test is more accurate with a higher MSE?
When adding dropout, the model simply fails to learn, or learns much more slowly on the training set, like this (only 10 epochs to cut down on training time, but the pattern of the validation loss not decreasing holds):
[loss plot with dropout]
Some information about the model.
My data shapes are as follows:
x_train size is: (172544, 20, 197)
x_test size is: (83968, 20, 197)
y_train size is: (172544, 1)
y_test size is: (83968, 1)
X is set up as 197 features at timesteps [0, 1, 2, ..., 19] with a corresponding Y label at timestep [20]. This repeats for the next sequence [1, 2, 3, ..., 20] with Y label at [21], and so on.
All data is normalized to mean 0, std_dev 1 (on the training set) then applied to the test set.
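For reference, a minimal sketch of the windowing and normalization described above (train_features, train_targets, test_features and test_targets are placeholder names, with features shaped (num_timesteps, 197)):
import numpy as np

WINDOW = 20

def make_windows(features, targets, window=WINDOW):
    # Each sample is `window` consecutive timesteps of features,
    # labelled with the target at the following timestep.
    xs, ys = [], []
    for i in range(len(features) - window):
        xs.append(features[i:i + window])
        ys.append(targets[i + window])
    return np.array(xs), np.array(ys).reshape(-1, 1)

x_train, y_train = make_windows(train_features, train_targets)
x_test, y_test = make_windows(test_features, test_targets)

# Standardize with statistics computed on the training set only,
# then apply the same transform to the test set.
mean = x_train.mean(axis=(0, 1))
std = x_train.std(axis=(0, 1))
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std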
Code for the model:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Nadam

batch_size = 512
DROPOUT = 0.0
timesteps = x_train.shape[1]
data_dim = x_train.shape[2]

model = Sequential()
model.add(LSTM(512, stateful=True, return_sequences=True, implementation=2,
               dropout=DROPOUT,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(256, stateful=True, return_sequences=True, implementation=2,
               dropout=DROPOUT))
model.add(LSTM(256, stateful=True, return_sequences=False, implementation=2,
               dropout=DROPOUT))
model.add(Dense(1, activation='linear'))

nadam = Nadam()
model.compile(loss='mse',
              optimizer=nadam,
              metrics=['mse', 'mae', 'mape'])

# reduce_lr is a learning-rate callback defined elsewhere in the original code (not shown here).
history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    epochs=100, batch_size=batch_size, shuffle=False, verbose=1,
                    callbacks=[reduce_lr])
EDIT: The same thing happens even when using only two samples:
x_train size is: (2, 2, 197)
x_test size is: (2, 2, 197)
y_train size is: (2, 1)
y_test size is: (2, 1)
[plot: y_train vs predictions]

Intuition behind Stacking Multiple Conv2D Layers before Dropout in CNN

Background:
Tagging TensorFlow since Keras runs on top of it, and this is more of a general deep learning question.
I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. The model below is the original CNN structure I used for this competition, and it performed okay.
from keras import layers, models

def build_model1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), padding="Same", activation="relu", input_shape=[28, 28, 1]))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding="Same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above in that it achieved better accuracy, lower error rate, and took many more epochs before overfitting the training data.
def build_model2():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.Conv2D(64, (3, 3), padding='Same', activation='relu'))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))
    return model
Question:
Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?
Thank y'all for your time and help.
The main difference between the two approaches is that the latter (2 convs) has more flexibility in expressing non-linear transformations without losing information. Max pooling removes information from the signal and dropout forces a distributed representation, so both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convolutions (with ReLU) makes it easier to learn. Also note that you are comparing a model with 3 max poolings to a model with only 2; consequently, the second one will potentially lose less information. Another difference is that the second model has a much bigger fully connected part at the end, while the first one's is tiny (64 neurons with 0.5 dropout means you effectively have at most 32 neurons active, which is a tiny layer!). To sum up:
These architectures differ in many aspects, not just in the stacking of conv layers.
Stacking conv layers usually leads to less information being lost in processing; see for example "all convolutional" architectures.
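One quick way to see the size difference described above is to compare the two models' parameter counts using the build_model1/build_model2 functions from the question (a sketch; exact numbers depend on the Keras version):
# Compare the capacity of the two architectures from the question.
model1 = build_model1()
model2 = build_model2()

model1.summary()  # prints layer shapes and total trainable parameters
model2.summary()

print('model1 parameters:', model1.count_params())
print('model2 parameters:', model2.count_params())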

Training neural network with keras (observed loss is too low)

I am training a neural network with Keras that takes 2000 x 1 arrays as input; all of the input values are 0s and 1s, and it generates a single output, either 0 or 1.
Here is my model:
from keras.models import Sequential
from keras.layers import Dense

def mdl_normal(sq_len, broker_num):
    model = Sequential()
    model.add(Dense(sq_len * (broker_num + 1), input_dim=(sq_len * (broker_num + 1)), activation='relu'))
    model.add(Dense(800, activation='relu'))
    model.add(Dense(400, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='SGD')
    return model
However, I am getting the following while training:
Epoch 384/600 0s - loss: 1.4224e-04 - val_loss: 2.6322
The loss is extremely low and I am wondering whether I am doing something wrong. Can someone explain what the loss means here?
Thanks!
Louis
