Transfer learning only works with trainable set to false - machine-learning

I have two models initialized like this
vgg19 = keras.applications.vgg19.VGG19(
weights='imagenet',
include_top=False,
input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
layer.trainable = False
model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
and
vgg19_2 = keras.applications.vgg19.VGG19(
weights='imagenet',
include_top=False,
input_shape=(img_height, img_width, img_channels))
model2 = Sequential(layers=vgg19_2.layers)
model2.add(Dense(1024, activation='relu'))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model2.compile(
loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
In other words the only difference is the second model doesn't set vgg19 layers' trainable parameter to false. Unfortunately the model with trainable set to true does not learn the data.
When I use model.fit I get
Trainable set to false:
Epoch 1/51
2500/2500 [==============================] - 49s 20ms/step - loss: 1.4319 - accuracy: 0.5466 - val_loss: 1.3951 - val_accuracy: 0.5693
Epoch 2/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.1508 - accuracy: 0.6009 - val_loss: 0.7832 - val_accuracy: 0.6023
Epoch 3/51
2500/2500 [==============================] - 48s 19ms/step - loss: 1.0816 - accuracy: 0.6256 - val_loss: 0.6782 - val_accuracy: 0.6153
Epoch 4/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.0396 - accuracy: 0.6450 - val_loss: 1.3045 - val_accuracy: 0.6103
The model trains to about 65% accuracy within a few epochs. However using model2 which should be able to make even better predictions (since there are more trainable parameters) I get:
Epoch 1/5
2500/2500 [==============================] - 226s 90ms/step - loss: 2.3028 - accuracy: 0.0980 - val_loss: 2.3038 - val_accuracy: 0.1008
Epoch 2/5
2500/2500 [==============================] - 311s 124ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.2988 - val_accuracy: 0.1017
Epoch 3/5
2500/2500 [==============================] - 306s 123ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.3052 - val_accuracy: 0.0997
Epoch 4/5
2500/2500 [==============================] - 321s 129ms/step - loss: 2.3029 - accuracy: 0.0972 - val_loss: 2.3028 - val_accuracy: 0.0997
Epoch 5/5
2500/2500 [==============================] - 300s 120ms/step - loss: 2.3028 - accuracy: 0.0988 - val_loss: 2.3027 - val_accuracy: 0.1007
When I then try to compute weights gradients on my data I get only zeros. I understand that it may take a long time to train such a big neural net like vgg to optimum but considering the calculated gradients for the last 3 layers should be very similar in both cases why is the accuracy so low? Training for more time gives no improvement.

Try this:
Train the first model, which sets trainable to False. You don't have to train it to saturation, so I would start with your 5 epochs.
Go back and set trainable to True for all the vgg19 parameters. Then, per the documentation, you can rebuild and recompile the model to have these changes take effect.
Continue training on the rebuilt model, which now has all parameters available for tuning.
It is very common in transfer learning to completely freeze the transferred layers in order to preserve them. In the early stages of training your additional layers don't know what to do. That means a noisy gradient by the time it gets to the transferred layers, which will quickly "detune" them away from their previously well-tuned weights.
Putting it all together into some code, it would look something like this.
# Original code. Transfer VGG and freeze the weights.
vgg19 = keras.applications.vgg19.VGG19(
weights='imagenet',
include_top=False,
input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
layer.trainable = False
model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
model.fit()
# New second stage: unfreeze and continue training.
for layer in vgg19.layers:
layer.trainable = True
full_model = Sequential(layers=model.layers)
full_model.compile(
loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
full_model.fit()
You may want to tune the learning rate for the fine-tuning stage. It's not essential to start, just something to keep in mind.
A third option is to use discriminative learning rates, as introduced by Jeremy Howard and Sebastian Ruder in the ULMFiT paper. The idea is that, in Transfer Learning, you usually want the later layers to learn faster than the earlier, transferred layers. So you actually set the learning rates to be different for different sets of layers. The fastai library has a PyTorch implementation that works by dividing the model into "layer groups" and allowing different parameters for each.

Related

Why I am retrieving high value loss for neural network regression

I have data in the following format consisting of 80 instances. I need to predict two-parameter latency and accuracy
No Model Technique Latency Accuracy
0 1 Net Repartition 31308.4 0.99
1 2 Net Connection 30338.2 0.79
2 3 MobiNet Repartition 20360.1 0.89
predictors=data.drop(['Latency','Accuracy'], axis = 1)
target=data[['Latency', 'Accuracy']]
predictors_cat_converted=pd.get_dummies(predictors, prefix=['Model', 'Technique'])
pre_norms = (predictors_cat_converted-predictors_cat_converted.mean()/predictors_cat_converted.std())
def regression():
model=Sequential()
model.add(Dense(50, activation= 'relu',input_shape=(n_cols,)))
model.add(Dense(50, activation='relu'))#hidden layer
model.add(Dense(2))#output
model.compile(optimizer='adam',loss='mean_squared_error')
return model
model=regression()
model.fit(pre_norms, target,validation_split=.3,epochs=100,verbose=1)
Output retrieving high value loss
Epoch 1/100
2/2 [==============================] - 1s 275ms/step - loss: 256321162.6667 - val_loss: 262150224.0000
Epoch 2/100
2/2 [==============================] - 0s 23ms/step - loss: 246612645.3333 - val_loss: 262146176.0000
Epoch 3/100
2/2 [==============================] - 0s 22ms/step - loss: 251778928.0000 - val_loss: 262142000.0000
Epoch 4/100
2/2 [==============================] - 0s 26ms/step - loss: 252470826.6667 - val_loss: 262137664.0000
Epoch 5/100
2/2 [==============================] - 0s 25ms/step - loss: 255799392.0000 - val_loss: 262133200.0000
Epoch 6/100
You have very less data, just 2 columns, 80 rows and 2 target variables. All you can do is:
Add more data.
Normalize your data and then feed it to the neural network.
If neural network not giving good accuracy, try Random Forest or XGBoost.
I also want to add one thing that is your neural network architecture is wrong. Dense layer with 2 outputs and a softmax activation isn't going to give you good result here. You have to use TensorFlow's Funtional API and make 1 input 2 output neural network architecture.
One of your target variables reaches quite big values. As shown in the excerpt of your data, "Latency" reaches values around 30,000 and 20,000.
Evidently if your model makes quite wrong predictions in the beginning, f.e. if it predicts "1" for your Latency, the MSE will be extremely high.
You could normalize your targets as you did with your inputs to make it easier for your network to learn the targets. Your MSE and hence your loss should be much smaller then

Validaton loss decrease and validation accuracy decrease in CNN classification

Im training classification on 2 classes (spawned fish or not from image of scale). The dataset is unbalanced. There is only 5% spawned scales.
I havnt checked how many spawned fish are in each of train/validation/test sets, but there are 9073 images. Splitt in 70/15/15 %. Then I observe in epoke 2 that val_loss decrease while val_acc decrease. How is that possible?
Im using Keras. The network is EfficientNetB4 from github.com/qubvel.
1600/1600 [==============================] - 1557s 973ms/step - loss: 1.3353 - acc: 0.6474 - val_loss: 0.8055 - val_acc: 0.7046
Epoch 00001: val_loss improved from inf to 0.80548, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.001-0.81.hdf5
Epoch 2/150
1600/1600 [==============================] - 1508s 943ms/step - loss: 0.8013 - acc: 0.7084 - val_loss: 0.6816 - val_acc: 0.6973
Epoch 00002: val_loss improved from 0.80548 to 0.68164, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.002-0.68.hdf5
Edit: here is another example - only 1010 images but its balanced - 50/50.
Epoch 5/150
1600/1600 [==============================] - 1562s 976ms/step - loss: 0.0219 - acc: 0.9933 - val_loss: 0.2639 - val_acc: 0.9605
Epoch 00005: val_loss improved from 0.28715 to 0.26390, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.005-0.26.hdf5
Epoch 6/150
1600/1600 [==============================] - 1565s 978ms/step - loss: 0.0059 - acc: 0.9982 - val_loss: 0.4140 - val_acc: 0.9276
Epoch 00006: val_loss did not improve from 0.26390
Epoch 7/150
1600/1600 [==============================] - 1561s 976ms/step - loss: 0.0180 - acc: 0.9941 - val_loss: 0.2379 - val_acc: 0.9276
and val_loss decrease aswell as val_acc.
If you have such an unbalanced dataset, the model first classifies everything as the majority class which gets relatively high accuracy, but all probability is distributed to the majority class. The reason is that the final bias can be learned very quickly because the back-propagation path is very short.
In the later stages of the training, the model basically finds reasons not to classify the input with the majority class. At this point, the model starts to make mistakes, the accuracy goes down, but the probability is more evenly distributed, so from the loss perspective, the error is smaller.
With such an imbalanced dataset, I would rather track F-measure instead of accuracy.

Transfer Learning - Val_loss strange behaviour

I am trying to use transfer-learning on MobileNetV2 from keras.application in phyton.
My images belongs to 4 classes with an amount of 8000, 7000, 8000 and 8000 images in the first, second, third and last class. My images are gray-scaled and resized from 1024x1024 to 128x128.
I removed the classification dense layers from MobileNetV2 and added my own dense layers:
global_average_pooling2d_1 (Glo Shape = (None, 1280) 0 Parameters
______________________________________________________________________________
dense_1 (Dense) Shape=(None, 4) 5124 Parameters
______________________________________________________________________________
dropout_1 (Dropout) Shape=(None, 4) 0 Parameters
________________________________________________________________
dense_2 (Dense) Shape=(None, 4) 20 Parameters
__________________________________________________________________________
dense_3 (Dense) Shape=(None, 4) 20 Parameters
Total params: 2,263,148
Trainable params: 5,164
Non-trainable params: 2,257,984
As you can see I added 2 dense layers with dropout as regularizer.
Furhtermore, I used the following
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
My results on training are very weird... :
Epoch
1 loss: 1.3378 - acc: 0.3028 - val_loss: 1.4629 - val_acc: 0.2702
2 loss: 1.2807 - acc: 0.3351 - val_loss: 1.3297 - val_acc: 0.3208
3 loss: 1.2641 - acc: 0.3486 - val_loss: 1.4428 - val_acc: 0.3707
4 loss: 1.2178 - acc: 0.3916 - val_loss: 1.4231 - val_acc: 0.3758
5 loss: 1.2100 - acc: 0.3909 - val_loss: 1.4009 - val_acc: 0.3625
6 loss: 1.1979 - acc: 0.3976 - val_loss: 1.5025 - val_acc: 0.3116
7 loss: 1.1943 - acc: 0.3988 - val_loss: 1.4510 - val_acc: 0.2872
8 loss: 1.1926 - acc: 0.3965 - val_loss: 1.5162 - val_acc: 0.3072
9 loss: 1.1888 - acc: 0.4004 - val_loss: 1.5659 - val_acc: 0.3304
10 loss: 1.1906 - acc: 0.3969 - val_loss: 1.5655 - val_acc: 0.3260
11 loss: 1.1864 - acc: 0.3999 - val_loss: 1.6286 - val_acc: 0.2967
(...)
Summarizing, the loss of training does not decrease anymore and is still very high. The model also overfits.
You may ask why I added only 2 dense layers with 4 neurons in each. In the beginning I tried different configurations (e.g. 128 neurons and 64 neurons and also different regulaziers), then overfitting was a huge problem, i.e. accuracy on training was almost 1 and loss on test was still far away from 0.
I am a little bit confused what is going on, since something tremendously is wrong here.
Fine-tuning attempts:
Different numbers of neurons in the dense layers in the classification part varying from 1024 to 4.
Different learning rates (0.01, 0.001, 0.0001)
Different batch sizes (16,32, 64)
Different regulaziers L1 with 0.001, 0.0001
Results:
Always huge overfitting
base_model = MobileNetV2(input_shape=(128, 128, 3), weights='imagenet', include_top=False)
# define classificator
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(4, activation='relu')(x)
x = Dropout(0.8)(x)
x = Dense(4, activation='relu')(x)
preds = Dense(4, activation='softmax')(x) #final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
for layer in model.layers[:-4]:
layer.trainable = False
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
EPOCHS = int(trainY.size/batch_size)
H = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=EPOCHS, batch_size=batch_size)
Result should be that there is no overfitting and val_loss close to 0. I know that from some paper working on similiar image sets.
UPDATE:
Here are some pictures of val_loss, train_loss and accuracy:
2 dense layers with 16 and 8 neurons, lr =0.001 with decay 1e-6, batchsize=25
Here, you used
x = Dropout(0.8)(x)
which means to drop 80% but i assume you need 20%
so replace it by x = Dropout(0.2)(x)
Also, please go thorugh keras documentation for the same if needed.
an extract from the above documentation
keras.layers.Dropout(rate, noise_shape=None, seed=None)
rate: float between 0 and 1. Fraction of the input units to drop.
I am not sure what the error was from above, but i know how to fix it. I completely trained the pretrained network (aswell one dense layer with 4 neurons and softmax). The results are more than satisfying.
I also tested on VGG16, where I trained only the dense output layer and it totally worked fine.
It seems to be that MobileNetV2 learns features which undesirable for my set of datas. My data sets are radar images, which looks very artificially (choi williams distribution of 'LPI'-signals). On the other hand those images are very easy (they are basicially just edges in a grayscale image), so it is still unkown to me why model-based transfer learning doesnt work for MobileNetV2).
Could be the result of your dropout rate being to high. You do not show your data generators so I can't tell if there is an issue there but I suspect that you need to compile using
loss='sparse_categorical_crossentropy'

Keras validation accuracy is 0, and stays constant throughout the training

I am doing a time series analysis using Tensorflow/ Keras in Python.
The overall LSTM model looks like,
model = keras.models.Sequential()
model.add(keras.layers.LSTM(25, input_shape = (1,1), activation = 'relu', dropout = 0.2, return_sequences = False))
model.add(keras.layers.Dense(1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['acc'])
tensorboard = keras.callbacks.TensorBoard(log_dir="logs/{}".format(time()))
es = keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=50)
mc = keras.callbacks.ModelCheckpoint('/home/sukriti/best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
history = model.fit(trainX_3d, trainY_1d, epochs=50, batch_size=10, verbose=2, validation_data = (testX_3d, testY_1d), callbacks=[mc, es, tensorboard])
I am having the following outcome,
Train on 14015 samples, validate on 3503 samples
Epoch 1/50
- 3s - loss: 0.0222 - acc: 7.1352e-05 - val_loss: 0.0064 - val_acc: 0.0000e+00
Epoch 2/50
- 2s - loss: 0.0120 - acc: 7.1352e-05 - val_loss: 0.0054 - val_acc: 0.0000e+00
Epoch 3/50
- 2s - loss: 0.0108 - acc: 7.1352e-05 - val_loss: 0.0047 - val_acc: 0.0000e+00
Now the val_acc remains unchanged. Is it normal?
what does it signify?
As signified by loss = 'mean_squared_error', you are in a regression setting, where accuracy is meaningless (it is meaningful only in classification problems).
Unfortunately, Keras will not "protect" you in such a case, insisting in computing and reporting back an "accuracy", despite the fact that it is meaningless and inappropriate for your problem - see my answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?
You should simply remove metrics=['acc'] from your model compilation, and don't bother - in regression settings, MSE itself can (and usually does) serve also as the performance metric.
In my case I had validation accuracy of 0.0000e+00 throughout training (using Keras and CNTK-GPU backend) when my batch size was 64 but there were only 120 samples in my validation set (divided into three classes). After I changed the batch size to 60, I got normal accuracy values.
It will not improve with changing batch size or with metrics. I had the same problem but when I shuffled my training and validation data set 0.0000e+00 gone.

Loading weights after a training run in KERAS not recognising the highest level of accuracy achieved in previous run

I conduct a training run and finish with the following output from a KERAS NN model. I have added a confusion matrix output for model quality
...
Epoch 09998: val_acc did not improve from 0.83780
Epoch 9999/10000
12232/12232 [==============================] - 1s 49us/step - loss: 0.2656 - acc: 0.9133 - val_loss: 0.6134 - val_acc: 0.8051
Epoch 09999: val_acc did not improve from 0.83780
Epoch 10000/10000
12232/12232 [==============================] - 1s 48us/step - loss: 0.2655 - acc: 0.9124 - val_loss: 0.5918 - val_acc: 0.8283
Epoch 10000: val_acc did not improve from 0.83780
3058/3058 [==============================] - 0s 46us/step
acc: 82.83%
Model Quality
Tn: 806 Tp: 1727 Fp: 262 Fn: 263
Precision: 0.8683 Recall: 0.8678 Accuracy 0.8283 F score 0.8681
I then make a change to a hyper parameter and reload everything and recompile using the following code
# prep checkpointing
model_file = output_file.rsplit('.',1)[0] + '_model.h5'
checkpoint = ModelCheckpoint(model_file, monitor='val_acc', verbose=1,
save_best_only=True, save_weights_only=True, mode='max')
callbacks_list = [checkpoint]
model.load_weights(model_file) # - commented out first time thru, reload second time thru
adam = optimizers.Adam(lr=l_rate, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.fit(X_train, Y_train, validation_data=(model_test_X, model_test_y),
batch_size = batch_s, epochs=num_epochs,
callbacks=callbacks_list, verbose=1)
Upon re-starting I would expect the best accuracy to be as it was above. In this instance val_acc = 0.83780
However, after the first two epochs I get this output:
Epoch 1/10000
12232/12232 [==============================] - 1s 99us/step - loss: 0.2747 - acc: 0.9097 - val_loss: 0.6191 - val_acc: 0.8143
Epoch 00001: val_acc improved from -inf to 0.81426, saving model to Data_model.h5
Epoch 2/10000
12232/12232 [==============================] - 1s 42us/step - loss: 0.2591 - acc: 0.9168 - val_loss: 0.6367 - val_acc: 0.8322
Epoch 00002: val_acc improved from 0.81426 to 0.83224, saving model to Data_model.h5
Epoch 3/10000
12232/12232 [==============================] - 1s 44us/step - loss: 0.2699 - acc: 0.9140 - val_loss: 0.6157 - val_acc: 0.8313
Epoch 00003: val_acc did not improve from 0.83224
......
While I understand the model may start from a different place, my assumption was that the best level of accuracy (val_acc) would have been carried over from the previous run.
My question is am i missing something?
About ModelCheckpoint, as Wilmar van Ommeren already wrote in the comment, every time you call this class it is reinitialized the process of searching best result, because it's a class and you assign an instance of it to a variable called checkpoint.
Fortunately Keras is an open source project meaning that you can read how the class ModelCheckpoint is written and it turns out that, every time you make an instance of this class, the special method __init__ always initialize a public variable called best to Infinity or -Infinity, which depends on the argument you pass to the parameter mode inside the constructor of the class.
So every time you repeat the training of the model, the last result is always overwritten by the first training process of the new training session because the variable best is set to an Infinity value, losing all the effort from the previous training session.
To make your training sessions continuing the previous ones you just have to save the best result from the previous training session, which you can do it by taking the History object from the model.fit() method, and save it into a variable or file. Then, when you have to continue the training of the model, before you call the model.fit() method specify the best result from the previous training into the variable checkpoint, in your case is the val_acc. In summary, this is the "pseudocode" you need to solve your problem with ModelCheckpoint:
# Previous training...
# Save the best result to a variable or in a file
# Load the best result into a variable
checkpoint = ModelCheckpoint(model_file, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=True,mode='max')
checkpoint.best = best_accuracy_from_previous_training_session
# Repeat the training...
Another thing I want to point out is the compile method. There is nothing wrong on what you pass as arguments to the method, but recently I have found the following two issues that claims it is necessary to specify the right metrics to the method for saving the right information:
Keras giving low accuracy after loading model
Accuracy is lost after save/load
Although you save and load only the weights of the model, it seems it is better to specify which metric you use based on the loss function. Even tho passing 'accuracy' as a parameter it's ok because Keras chooses the right metric but sometimes might not do it in the right way.
So all you have to do is to change the string 'accuracy' to 'binary_accuracy' in metrics.

Resources