Why would LSTM with one time step perform better than MLP?


Out of curiosity, I compared a stacked LSTM network operating on a single time step with an MLP using tanh activations, expecting them to perform the same.
The two architectures are shown below; both are trained on the same regression dataset with MSE loss:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# MLP
model = Sequential()
model.add(Dense(50, input_dim=num_features, activation='tanh'))
model.add(Dense(100, activation='tanh'))
model.add(Dense(150, activation='tanh'))
model.add(Dense(100, activation='tanh'))
model.add(Dense(50, activation='tanh'))
model.add(Dense(1))

# LSTM (each input is a sequence of length 1)
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(None, num_features)))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(150, return_sequences=True))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(50))
model.add(Dense(1))
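For a rough sense of the size difference between the two stacks above, the trainable parameters can be counted by hand: a Dense layer has n_in * units + units weights, and a Keras LSTM layer has 4 * ((n_in + units) * units + units) (four gates, each with an input kernel, a recurrent kernel and a bias). The question does not give num_features, so the sketch assumes a hypothetical value of 10. With a single time step, the recurrent kernels multiply a zero initial state, so they receive zero gradient yet are still part of the computation; this is one plausible contributor to the slower training, not a full account of it.

```python
def dense_params(n_in, units):
    # weight matrix + bias
    return n_in * units + units

def lstm_params(n_in, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * ((n_in + units) * units + units)

def count(layer_sizes, n_features, layer_fn):
    total, n_in = 0, n_features
    for units in layer_sizes:
        total += layer_fn(n_in, units)
        n_in = units
    return total + dense_params(n_in, 1)  # the final Dense(1)

NUM_FEATURES = 10  # hypothetical; the question does not state num_features
sizes = [50, 100, 150, 100, 50]
mlp_total = count(sizes, NUM_FEATURES, dense_params)
lstm_total = count(sizes, NUM_FEATURES, lstm_params)
print(mlp_total, lstm_total)  # 41001 353851
```

Under this assumption the LSTM stack has roughly 8.6 times as many parameters as the MLP.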
Surprisingly, the loss of the LSTM model decreases much faster than that of the MLP:
Per-epoch MSE (training / validation):
Epoch  MLP train  MLP val   LSTM train  LSTM val
1      0.011504   0.010708  0.011345    0.010634
2      0.010739   0.010623  0.008128    0.003692
3      0.010598   0.010189  0.001488    0.000668
4      0.010046   0.009651  0.000440    0.000232
5      0.009305   0.008502  0.000260    0.000160
6      0.007388   0.004334  0.000200    0.000137
7      0.002576   0.001686  0.000165    0.000093
8      0.001375   0.001217  0.000140    0.000104
9      0.000921   0.000916  0.000127    0.000139
10     0.000696   0.000568  0.000116    0.000091
11     0.000560   0.000479  0.000106    0.000095
12     0.000493   0.000451  0.000099    0.000082
13     0.000439   0.000564  0.000091    0.000135
14     0.000402   0.000478  0.000085    0.000099
15     0.000377   0.000366  0.000082    0.000055
16     0.000351   0.000240  0.000079    0.000062
17     0.000340   0.000352  0.000075    0.000045
18     0.000327   0.000203  0.000073    0.000121
19     0.000311   0.000323  0.000069    0.000045
20     0.000299   0.000264  0.000065    0.000052
After 100 epochs, the MLP's validation loss had decreased to about 1e-4, whereas the LSTM's had decreased to about 1e-5.
I don't see how these two architectures could differ, since with a single time step the LSTM cells have no memory from previous time steps to use. Also, the MLP trains about 3 times faster than the LSTM. Could someone explain the math behind this?
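One way to see that the two are not equivalent even without memory: with one time step and the zero initial state Keras uses by default, h_prev = 0 and c_prev = 0, so the recurrent weights and the forget gate drop out of each LSTM layer. What remains is h = σ(W_o x + b_o) ⊙ tanh(σ(W_i x + b_i) ⊙ tanh(W_c x + b_c)), a product of three input-dependent branches, whereas an MLP layer computes h = tanh(W x + b). That multiplicative gating is a plausible reason the two train differently, not a complete explanation. The NumPy sketch below is not Keras's implementation; sigmoid gates and the i, f, c, o weight order are assumptions. It checks numerically that a full LSTM step from a zero state equals this reduced form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (gate order i, f, c, o assumed)."""
    z = x @ W + h_prev @ U + b            # shape (batch, 4 * units)
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g                # new cell state
    h = o * np.tanh(c)                    # new hidden state = layer output
    return h, c

def first_step_from_zero_state(x, W, b):
    """Same step with h_prev = c_prev = 0: U and the forget gate drop out."""
    i, _, g, o = np.split(x @ W + b, 4, axis=-1)
    return sigmoid(o) * np.tanh(sigmoid(i) * np.tanh(g))

rng = np.random.default_rng(0)
n_in, units, batch = 8, 5, 3
W = rng.normal(size=(n_in, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = rng.normal(size=4 * units)
x = rng.normal(size=(batch, n_in))
zeros = np.zeros((batch, units))

h_full, _ = lstm_step(x, zeros, zeros, W, U, b)
h_reduced = first_step_from_zero_state(x, W, b)
print(np.allclose(h_full, h_reduced))  # True
```

The reduced form makes explicit that, for length-1 sequences, each LSTM layer is a gated feed-forward layer rather than a tanh Dense layer.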

Related

NASNet transfer learning not giving accuracy

I am trying to implement transfer learning with PyTorch on the NASNet model. The only way I have found to import the model is this:
import timm
model = timm.create_model('nasnetalarge', pretrained=True)
Its last three layers are:
(act): ReLU(inplace=True)
(global_pool): SelectAdaptivePool2d (pool_type=avg, flatten=Flatten(start_dim=1, end_dim=-1))
(last_linear): Linear(in_features=4032, out_features=1000, bias=True)
I am doing binary classification and want to fine-tune the network, so I changed the number of output features to 2. But the accuracy stays constant across epochs.
Is this a correct way to use the NASNet model? Also, should I change all of the activation functions, or only some of them? And how should I fine-tune the model so that it converges?
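One pitfall worth ruling out, since the question does not show how the number of outputs was changed: assigning out_features = 2 on an existing nn.Linear only edits an attribute, and the weight matrix keeps its 1000 rows, so the model still produces 1000 logits. The sketch below uses a hypothetical TinyHeadModel with the same head as shown above (pooling followed by last_linear with in_features=4032) instead of downloading NASNet. With the real timm model, the analogous fix is to replace model.last_linear; timm models generally also offer reset_classifier(num_classes) for this.

```python
import torch
import torch.nn as nn

class TinyHeadModel(nn.Module):
    """Hypothetical stand-in for a network whose classifier is `last_linear`."""

    def __init__(self, in_features=4032, num_classes=1000):
        super().__init__()
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.last_linear = nn.Linear(in_features, num_classes)

    def forward(self, feature_map):
        x = self.global_pool(feature_map).flatten(1)
        return self.last_linear(x)

model = TinyHeadModel()
features = torch.randn(2, 4032, 7, 7)  # fake backbone feature maps for 2 images

# Pitfall: this only edits an attribute; the weight is still 1000 x 4032
model.last_linear.out_features = 2
shape_after_attr_edit = model(features).shape
print(shape_after_attr_edit)  # torch.Size([2, 1000])

# Fix: replace the layer, so the new head really has 2 outputs
model.last_linear = nn.Linear(model.last_linear.in_features, 2)
shape_after_replace = model(features).shape
print(shape_after_replace)  # torch.Size([2, 2])
```

Replacing the module also gives the new head freshly initialized weights of the right shape, which the optimizer must then be created with (or rebuilt for).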
Epoch: 0 Train Loss: 0.6450783805564987 Validation loss: 0.6651424169540405 Train Accuracy: tensor(64.5254, device='cuda:0') Validation accuracy: 78.0
Epoch: 1 Train Loss: 0.6464798424893493 Validation loss: 0.6233693957328796 Train Accuracy: tensor(63.9106, device='cuda:0') Validation accuracy: 77.33333333333333
Epoch: 2 Train Loss: 0.6471542623569249 Validation loss: 0.5627642869949341 Train Accuracy: tensor(63.9618, device='cuda:0') Validation accuracy: 77.33333333333333
Epoch: 3 Train Loss: 0.6478866574491537 Validation loss: 0.6459301710128784 Train Accuracy: tensor(64.1540, device='cuda:0') Validation accuracy: 78.66666666666666
Epoch: 4 Train Loss: 0.6494869376566493 Validation loss: 0.6185131072998047 Train Accuracy: tensor(64.1540, device='cuda:0') Validation accuracy: 78.0
Epoch: 5 Train Loss: 0.6495973079446123 Validation loss: 0.6605387926101685 Train Accuracy: tensor(64.3269, device='cuda:0') Validation accuracy: 78.0
Epoch: 6 Train Loss: 0.6508511623683317 Validation loss: 0.7085398435592651 Train Accuracy: tensor(64.1604, device='cuda:0') Validation accuracy: 78.0
Epoch: 7 Train Loss: 0.6518356682885635 Validation loss: 0.6155421137809753 Train Accuracy: tensor(64.3013, device='cuda:0') Validation accuracy: 78.0
Epoch: 8 Train Loss: 0.6525909496022505 Validation loss: 0.6670436859130859 Train Accuracy: tensor(64.0963, device='cuda:0') Validation accuracy: 78.0
I have tried changing the hyperparameters, but that didn't help. I wonder if there is something wrong with my implementation. Can anyone please help?
Here is the training part of my code:
import torch

best_accuracy = 0.0
training_loss = []
validation_loss = []
for epoch in range(num_of_epochs):
    # Training pass over the training set
    model.train()
    running_loss = 0.0
    running_correct = 0
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, preds = torch.max(outputs, 1)
        loss = loss_function(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
        running_correct += torch.sum(preds == labels).item()  # plain int, not a CUDA tensor
    step_lr_scheduler.step()  # step the LR scheduler once per epoch
    train_accuracy = running_correct / train_count * 100
    train_loss = running_loss / train_count
    training_loss.append(train_loss)

    # Evaluation on the validation set
    model.eval()
    running_validloss = 0.0
    correct = 0
    with torch.no_grad():
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            # compute this batch's validation loss; reusing `loss` here would
            # just repeat the loss of the last training batch
            loss = loss_function(outputs, labels)
            running_validloss += loss.item() * images.size(0)
            correct += (preds == labels).sum().item()
    valid_accuracy = 100 * correct / valid_count
    valid_loss = running_validloss / valid_count
    validation_loss.append(valid_loss)
    print('Epoch: ' + str(epoch) + ' Train Loss: ' + str(train_loss) + ' Validation loss: ' + str(valid_loss) + ' Train Accuracy: ' + str(train_accuracy) + ' Validation accuracy: ' + str(valid_accuracy))

Transfer learning only works with trainable set to false

I have two models initialized like this
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
    layer.trainable = False
model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
and
vgg19_2 = keras.applications.vgg19.VGG19(
weights='imagenet',
include_top=False,
input_shape=(img_height, img_width, img_channels))
model2 = Sequential(layers=vgg19_2.layers)
model2.add(Dense(1024, activation='relu'))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model2.compile(
loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
In other words, the only difference is that the second model does not set the VGG19 layers' trainable attribute to False. Unfortunately, the model with trainable set to True does not learn from the data.
When I call model.fit, I get:
Trainable set to false:
Epoch 1/51
2500/2500 [==============================] - 49s 20ms/step - loss: 1.4319 - accuracy: 0.5466 - val_loss: 1.3951 - val_accuracy: 0.5693
Epoch 2/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.1508 - accuracy: 0.6009 - val_loss: 0.7832 - val_accuracy: 0.6023
Epoch 3/51
2500/2500 [==============================] - 48s 19ms/step - loss: 1.0816 - accuracy: 0.6256 - val_loss: 0.6782 - val_accuracy: 0.6153
Epoch 4/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.0396 - accuracy: 0.6450 - val_loss: 1.3045 - val_accuracy: 0.6103
The model reaches about 65% accuracy within a few epochs. However, with model2, which should be able to make even better predictions (since it has more trainable parameters), I get:
Epoch 1/5
2500/2500 [==============================] - 226s 90ms/step - loss: 2.3028 - accuracy: 0.0980 - val_loss: 2.3038 - val_accuracy: 0.1008
Epoch 2/5
2500/2500 [==============================] - 311s 124ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.2988 - val_accuracy: 0.1017
Epoch 3/5
2500/2500 [==============================] - 306s 123ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.3052 - val_accuracy: 0.0997
Epoch 4/5
2500/2500 [==============================] - 321s 129ms/step - loss: 2.3029 - accuracy: 0.0972 - val_loss: 2.3028 - val_accuracy: 0.0997
Epoch 5/5
2500/2500 [==============================] - 300s 120ms/step - loss: 2.3028 - accuracy: 0.0988 - val_loss: 2.3027 - val_accuracy: 0.1007
When I then compute the weight gradients on my data, I get only zeros. I understand that it may take a long time to train a network as big as VGG to its optimum, but since the gradients for the last 3 layers should be very similar in both cases, why is the accuracy so low? Training for longer gives no improvement.

Try this:
Train the first model, which sets trainable to False. You don't have to train it to saturation, so I would start with your 5 epochs.
Go back and set trainable to True for all the vgg19 parameters. Then, per the documentation, you can rebuild and recompile the model to have these changes take effect.
Continue training on the rebuilt model, which now has all parameters available for tuning.
It is very common in transfer learning to completely freeze the transferred layers at first in order to preserve them. In the early stages of training, your additional layers don't know what to do yet. That means the gradient is noisy by the time it reaches the transferred layers, which will quickly "detune" them away from their previously well-tuned weights.
Putting it all together into some code, it would look something like this.
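from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Original code. Transfer VGG and freeze the weights.
vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
    layer.trainable = False
model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
# x_train, y_train, x_val, y_val are placeholders for your own data
model.fit(x_train, y_train, epochs=5, validation_data=(x_val, y_val))

# New second stage: unfreeze and continue training.
for layer in vgg19.layers:
    layer.trainable = True
full_model = Sequential(layers=model.layers)
# A fresh optimizer instance for the new compile; a lower learning rate is worth trying here
fine_tune_opt = Adam(learning_rate=0.001, beta_1=0.9)
full_model.compile(
    loss='categorical_crossentropy',
    optimizer=fine_tune_opt,
    metrics=['accuracy'])
# continue training; set epochs as needed
full_model.fit(x_train, y_train, validation_data=(x_val, y_val))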
You may want to tune the learning rate for the fine-tuning stage. It's not essential to start, just something to keep in mind.
A third option is to use discriminative learning rates, as introduced by Jeremy Howard and Sebastian Ruder in the ULMFiT paper. The idea is that, in Transfer Learning, you usually want the later layers to learn faster than the earlier, transferred layers. So you actually set the learning rates to be different for different sets of layers. The fastai library has a PyTorch implementation that works by dividing the model into "layer groups" and allowing different parameters for each.
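In PyTorch, discriminative learning rates need nothing beyond the optimizer's per-parameter-group options: each group passed to the optimizer can carry its own lr. The sketch below is not fastai's implementation; the tiny `body`/`head` model, the two-group split and the rates 1e-5 and 1e-3 are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical model: a "transferred" body followed by a newly added head
body = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 10)
model = nn.Sequential(body, head)

# One parameter group per layer group, each with its own learning rate
optimizer = torch.optim.Adam([
    {"params": body.parameters(), "lr": 1e-5},  # transferred layers: small steps
    {"params": head.parameters(), "lr": 1e-3},  # new layers: larger steps
])

# One training step on random data, just to show the optimizer is usable
x = torch.randn(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print([group["lr"] for group in optimizer.param_groups])  # [1e-05, 0.001]
```

With a real pretrained network you would typically split it into more groups (early, middle and late blocks plus the head) and give earlier groups smaller rates.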

Validation loss decrease and validation accuracy decrease in CNN classification

I'm training a classifier on 2 classes (spawned fish or not, from an image of a scale). The dataset is unbalanced: only 5% of the scales are from spawned fish.
I haven't checked how many spawned fish are in each of the train/validation/test sets, but there are 9073 images in total, split 70/15/15%. In epoch 2, I observe that val_loss decreases while val_acc also decreases. How is that possible?
I'm using Keras. The network is EfficientNetB4 from github.com/qubvel.
1600/1600 [==============================] - 1557s 973ms/step - loss: 1.3353 - acc: 0.6474 - val_loss: 0.8055 - val_acc: 0.7046
Epoch 00001: val_loss improved from inf to 0.80548, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.001-0.81.hdf5
Epoch 2/150
1600/1600 [==============================] - 1508s 943ms/step - loss: 0.8013 - acc: 0.7084 - val_loss: 0.6816 - val_acc: 0.6973
Epoch 00002: val_loss improved from 0.80548 to 0.68164, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.002-0.68.hdf5
Edit: here is another example, with only 1010 images, but balanced 50/50:
Epoch 5/150
1600/1600 [==============================] - 1562s 976ms/step - loss: 0.0219 - acc: 0.9933 - val_loss: 0.2639 - val_acc: 0.9605
Epoch 00005: val_loss improved from 0.28715 to 0.26390, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.005-0.26.hdf5
Epoch 6/150
1600/1600 [==============================] - 1565s 978ms/step - loss: 0.0059 - acc: 0.9982 - val_loss: 0.4140 - val_acc: 0.9276
Epoch 00006: val_loss did not improve from 0.26390
Epoch 7/150
1600/1600 [==============================] - 1561s 976ms/step - loss: 0.0180 - acc: 0.9941 - val_loss: 0.2379 - val_acc: 0.9276
Here, too, both val_loss and val_acc decrease.

If you have such an unbalanced dataset, the model first classifies everything as the majority class, which already gives relatively high accuracy, but nearly all of the probability mass goes to the majority class. The reason is that the final bias can be learned very quickly, because its back-propagation path is very short.
In the later stages of the training, the model basically finds reasons not to classify the input with the majority class. At this point, the model starts to make mistakes, the accuracy goes down, but the probability is more evenly distributed, so from the loss perspective, the error is smaller.
With such an imbalanced dataset, I would track the F-measure rather than accuracy.
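A small worked example of the effect described above, using binary cross-entropy on 20 made-up samples with one positive (5%, as in the question). The "early" model predicts 0.05 for everything; the "later" model moves the positive above the 0.5 threshold but also pushes two negatives over it. The probabilities are invented for illustration, and F1 is computed for the minority class.

```python
import math

def bce(labels, probs):
    # mean binary cross-entropy
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / len(labels)

def accuracy(labels, probs, threshold=0.5):
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)

def f1(labels, probs, threshold=0.5):
    # F1 score for the minority (positive) class
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

labels = [1] + [0] * 19                       # 1 positive out of 20 (5%)
early = [0.05] * 20                           # everything -> majority class
later = [0.6] + [0.55] * 2 + [0.05] * 17      # positive found, two false alarms

for name, probs in (("early", early), ("later", later)):
    print(f"{name}: loss {bce(labels, probs):.3f}, "
          f"accuracy {accuracy(labels, probs):.2f}, F1 {f1(labels, probs):.2f}")
# early: loss 0.199, accuracy 0.95, F1 0.00
# later: loss 0.149, accuracy 0.90, F1 0.50
```

The loss and the accuracy both drop from "early" to "later", while the F1 score shows that the later model is the more useful one for the minority class.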

Transfer Learning - Val_loss strange behaviour

I am trying to use transfer learning with MobileNetV2 from keras.applications in Python.
My images belong to 4 classes, with 8000, 7000, 8000 and 8000 images in the first, second, third and last class, respectively. The images are grayscale and resized from 1024x1024 to 128x128.
I removed the classification dense layers from MobileNetV2 and added my own dense layers:
Layer (type)                                          Output shape   Params
global_average_pooling2d_1 (GlobalAveragePooling2D)   (None, 1280)   0
dense_1 (Dense)                                       (None, 4)      5124
dropout_1 (Dropout)                                   (None, 4)      0
dense_2 (Dense)                                       (None, 4)      20
dense_3 (Dense)                                       (None, 4)      20
Total params: 2,263,148
Trainable params: 5,164
Non-trainable params: 2,257,984
As you can see, I added 2 dense layers with dropout as a regularizer.
Furthermore, I used the following settings:
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
My training results are very weird:
Epoch
1 loss: 1.3378 - acc: 0.3028 - val_loss: 1.4629 - val_acc: 0.2702
2 loss: 1.2807 - acc: 0.3351 - val_loss: 1.3297 - val_acc: 0.3208
3 loss: 1.2641 - acc: 0.3486 - val_loss: 1.4428 - val_acc: 0.3707
4 loss: 1.2178 - acc: 0.3916 - val_loss: 1.4231 - val_acc: 0.3758
5 loss: 1.2100 - acc: 0.3909 - val_loss: 1.4009 - val_acc: 0.3625
6 loss: 1.1979 - acc: 0.3976 - val_loss: 1.5025 - val_acc: 0.3116
7 loss: 1.1943 - acc: 0.3988 - val_loss: 1.4510 - val_acc: 0.2872
8 loss: 1.1926 - acc: 0.3965 - val_loss: 1.5162 - val_acc: 0.3072
9 loss: 1.1888 - acc: 0.4004 - val_loss: 1.5659 - val_acc: 0.3304
10 loss: 1.1906 - acc: 0.3969 - val_loss: 1.5655 - val_acc: 0.3260
11 loss: 1.1864 - acc: 0.3999 - val_loss: 1.6286 - val_acc: 0.2967
(...)
In summary, the training loss stops decreasing while it is still very high, and the model also overfits.
You may ask why I added only 2 dense layers with 4 neurons each. At the beginning I tried different configurations (e.g. 128 and 64 neurons, and also different regularizers), but then overfitting was a huge problem, i.e. training accuracy was almost 1 while the test loss was still far from 0.
I am confused about what is going on, since something is clearly very wrong here.
Fine-tuning attempts:
Different numbers of neurons in the dense layers of the classification part, from 1024 down to 4
Different learning rates (0.01, 0.001, 0.0001)
Different batch sizes (16, 32, 64)
Different L1 regularizers (0.001, 0.0001)
Results:
Always huge overfitting
from keras.applications import MobileNetV2
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model
from keras import optimizers

base_model = MobileNetV2(input_shape=(128, 128, 3), weights='imagenet', include_top=False)
# define classifier head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(4, activation='relu')(x)
x = Dropout(0.8)(x)
x = Dense(4, activation='relu')(x)
preds = Dense(4, activation='softmax')(x)  # final layer with softmax activation
model = Model(inputs=base_model.input, outputs=preds)
# freeze all but the last 4 layers
for layer in model.layers[:-4]:
    layer.trainable = False
opt = optimizers.SGD(lr=0.001, decay=4e-5, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
batch_size = 32
EPOCHS = int(trainY.size/batch_size)
H = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=EPOCHS, batch_size=batch_size)
The expected result is no overfitting and a val_loss close to 0. I know this from a paper working on similar image sets.
UPDATE:
Here are some plots of val_loss, train_loss and accuracy:
[Figure: 2 dense layers with 16 and 8 neurons, lr = 0.001 with decay 1e-6, batch size = 25]

Here, you used
x = Dropout(0.8)(x)
which drops 80% of the units, but I assume you want to drop 20%,
so replace it with x = Dropout(0.2)(x).
Also, please go through the Keras documentation for Dropout if needed.
An extract from that documentation:
keras.layers.Dropout(rate, noise_shape=None, seed=None)
rate: float between 0 and 1. Fraction of the input units to drop.
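Keras's Dropout is "inverted" dropout: at training time it sets a fraction `rate` of its inputs to 0 and scales the remaining ones by 1 / (1 - rate). The NumPy sketch below mimics that behaviour (it is not Keras's code) to show what rate=0.8 means, and how often all 4 outputs of the Dense(4) layer that precedes Dropout(0.8) in the question are dropped at once.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    # Training-time dropout: zero a fraction `rate` of the entries and scale
    # the surviving ones by 1 / (1 - rate), so the expected value is unchanged.
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)
results = {}
for rate in (0.8, 0.2):
    y = inverted_dropout(x, rate, rng)
    results[rate] = (float((y == 0).mean()), float(y.mean()))
    print(f"rate={rate}: dropped {results[rate][0]:.3f}, mean {results[rate][1]:.3f}")

# With only 4 units before Dropout(0.8), all 4 are zeroed with probability 0.8**4,
# so the next Dense layer sees an all-zero input for roughly 41% of training samples.
p_all_dropped = 0.8 ** 4
print(f"P(all 4 units dropped) = {p_all_dropped:.4f}")  # 0.4096
```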

I am not sure what the error above was, but I know how to fix it: I trained the whole pretrained network (as well as one dense layer with 4 neurons and softmax). The results are more than satisfying.
I also tested on VGG16, where I trained only the dense output layer and it totally worked fine.
It seems that MobileNetV2 learns features that are undesirable for my data. My data are radar images, which look very artificial (Choi-Williams distributions of 'LPI' signals). On the other hand, these images are very simple (basically just edges in a grayscale image), so it is still unknown to me why model-based transfer learning doesn't work for MobileNetV2.

This could be the result of your dropout rate being too high. You do not show your data generators, so I can't tell whether there is an issue there, but I suspect that you need to compile using
loss='sparse_categorical_crossentropy'
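Which of the two cross-entropy losses fits depends only on how the labels are encoded: categorical_crossentropy expects one-hot label vectors, while sparse_categorical_crossentropy expects integer class indices. Both compute the same quantity, as this NumPy sketch (not Keras's implementation, with made-up probabilities) shows, so the choice should follow the label format your data generators actually produce.

```python
import numpy as np

# Softmax outputs for 3 samples and 4 classes (made-up numbers)
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.20, 0.50, 0.20, 0.10],
                  [0.05, 0.05, 0.10, 0.80]])
int_labels = np.array([0, 1, 3])   # integer labels -> sparse_categorical_crossentropy
one_hot = np.eye(4)[int_labels]    # one-hot labels -> categorical_crossentropy

sparse_ce = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()
print(sparse_ce, categorical_ce)   # equal up to floating-point rounding
```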

Keras validation accuracy is 0, and stays constant throughout the training

I am doing a time series analysis using TensorFlow/Keras in Python.
The overall LSTM model looks like this:
from time import time
from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.layers.LSTM(25, input_shape = (1,1), activation = 'relu', dropout = 0.2, return_sequences = False))
model.add(keras.layers.Dense(1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics=['acc'])
tensorboard = keras.callbacks.TensorBoard(log_dir="logs/{}".format(time()))
es = keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=50)
mc = keras.callbacks.ModelCheckpoint('/home/sukriti/best_model.h5', monitor='val_loss', mode='min', save_best_only=True)
history = model.fit(trainX_3d, trainY_1d, epochs=50, batch_size=10, verbose=2, validation_data = (testX_3d, testY_1d), callbacks=[mc, es, tensorboard])
I get the following output:
Train on 14015 samples, validate on 3503 samples
Epoch 1/50
- 3s - loss: 0.0222 - acc: 7.1352e-05 - val_loss: 0.0064 - val_acc: 0.0000e+00
Epoch 2/50
- 2s - loss: 0.0120 - acc: 7.1352e-05 - val_loss: 0.0054 - val_acc: 0.0000e+00
Epoch 3/50
- 2s - loss: 0.0108 - acc: 7.1352e-05 - val_loss: 0.0047 - val_acc: 0.0000e+00
The val_acc stays unchanged. Is this normal?
What does it signify?

As signified by loss = 'mean_squared_error', you are in a regression setting, where accuracy is meaningless (it is meaningful only in classification problems).
Unfortunately, Keras will not "protect" you in such a case, insisting on computing and reporting back an "accuracy", despite the fact that it is meaningless and inappropriate for your problem - see my answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?
You should simply remove metrics=['acc'] from your model compilation, and don't bother - in regression settings, MSE itself can (and usually does) serve also as the performance metric.
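A quick NumPy illustration of why an "accuracy" is uninformative for continuous targets: a metric that counts exact matches between targets and (rounded) predictions stays at essentially zero even for a good regressor, while MSE and MAE reflect the actual error. The data are synthetic, and the rounding rule is only similar in spirit to a binary-accuracy check; it is not a claim about exactly which accuracy function Keras picks here. Incidentally, the training acc of 7.1352e-05 in the question is 1/14015, i.e. exactly one of the 14015 training samples happened to match.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random(3503)                               # synthetic targets in [0, 1)
y_pred = y_true + rng.normal(0.0, 0.05, y_true.shape)   # a reasonably good regressor

exact_match_acc = np.mean(np.round(y_pred) == y_true)   # counts exact hits only
mse = np.mean((y_pred - y_true) ** 2)
mae = np.mean(np.abs(y_pred - y_true))
print(f"accuracy {exact_match_acc:.4f}, MSE {mse:.4f}, MAE {mae:.4f}")

print(1 / 14015)  # about 7.1352e-05, the training acc reported in the question
```

If a second, more interpretable number is wanted alongside the MSE loss, a regression metric such as metrics=['mae'] fits this setting.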

In my case I had validation accuracy of 0.0000e+00 throughout training (using Keras and CNTK-GPU backend) when my batch size was 64 but there were only 120 samples in my validation set (divided into three classes). After I changed the batch size to 60, I got normal accuracy values.

It will not improve by changing the batch size or the metrics. I had the same problem, but when I shuffled my training and validation datasets, the 0.0000e+00 went away.

Develop Reference
