ImageAI understanding output - machine-learning

Running ResNet with the help of ImageAI. I can't understand why there are two sets of numbers under each epoch.
What do 10/10 and 111/111 mean? Why does the model get saved before the epoch finishes?
There is no information about this in the manual.
Using Enhanced Data Generation
Found 442 images belonging to 2 classes.
Found 38 images belonging to 2 classes.
JSON Mapping for the model classes saved to /content/drive/My Drive/Colab Notebooks/images/ML/json/model_class.json
Number of experiments (Epochs) : 100
Epoch 1/100
10/10 [==============================] - 22s 2s/step - loss: 7.6561 - acc: 0.5000
Epoch 00001: val_acc improved from -inf to 0.50000, saving model to /content/drive/My Drive/Colab Notebooks/images/ML/models/model_ex-001_acc-0.500000.h5
111/111 [==============================] - 297s 3s/step - loss: 0.7466 - acc: 0.7941 - val_loss: 7.6561 - val_acc: 0.5000
Epoch 2/100
10/10 [==============================] - 0s 28ms/step - loss: 7.6561 - acc: 0.5000
Epoch 00002: val_acc did not improve from 0.50000
111/111 [==============================] - 12s 105ms/step - loss: 0.3910 - acc: 0.8778 - val_loss: 7.6561 - val_acc: 0.5000

Related

Why am I getting a high loss value for neural network regression

I have data in the following format, consisting of 80 instances. I need to predict two target parameters, Latency and Accuracy:
No Model Technique Latency Accuracy
0 1 Net Repartition 31308.4 0.99
1 2 Net Connection 30338.2 0.79
2 3 MobiNet Repartition 20360.1 0.89
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

predictors = data.drop(['Latency', 'Accuracy'], axis=1)
target = data[['Latency', 'Accuracy']]
predictors_cat_converted = pd.get_dummies(predictors, prefix=['Model', 'Technique'])
pre_norms = (predictors_cat_converted - predictors_cat_converted.mean()) / predictors_cat_converted.std()
n_cols = pre_norms.shape[1]  # number of input features after one-hot encoding

def regression():
    model = Sequential()
    model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(50, activation='relu'))  # hidden layer
    model.add(Dense(2))  # output: Latency and Accuracy
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

model = regression()
model.fit(pre_norms, target, validation_split=.3, epochs=100, verbose=1)
Output showing the high loss values:
Epoch 1/100
2/2 [==============================] - 1s 275ms/step - loss: 256321162.6667 - val_loss: 262150224.0000
Epoch 2/100
2/2 [==============================] - 0s 23ms/step - loss: 246612645.3333 - val_loss: 262146176.0000
Epoch 3/100
2/2 [==============================] - 0s 22ms/step - loss: 251778928.0000 - val_loss: 262142000.0000
Epoch 4/100
2/2 [==============================] - 0s 26ms/step - loss: 252470826.6667 - val_loss: 262137664.0000
Epoch 5/100
2/2 [==============================] - 0s 25ms/step - loss: 255799392.0000 - val_loss: 262133200.0000
Epoch 6/100
You have very little data: just 2 feature columns, 80 rows, and 2 target variables. All you can do is:
Add more data.
Normalize your data and then feed it to the neural network.
If the neural network does not give good accuracy, try Random Forest or XGBoost (see the sketch below).
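A minimal sketch of the Random Forest alternative (scikit-learn's RandomForestRegressor handles multi-output targets natively; the split and hyperparameters below are arbitrary choices, and predictors_cat_converted/target are the frames from the question):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hold out 30% of the 80 rows for validation, mirroring validation_split=.3 above.
X_train, X_val, y_train, y_val = train_test_split(
    predictors_cat_converted, target, test_size=0.3, random_state=42)

# A single forest predicts both Latency and Accuracy (multi-output regression).
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_val, y_val))  # R^2 averaged over the two targets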
I also want to add one thing: your neural network architecture is wrong. A Dense layer with 2 outputs and a softmax activation isn't going to give you good results here. You have to use TensorFlow's Functional API and build a 1-input, 2-output architecture.
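For illustration, a minimal sketch of such a 1-input, 2-output model with the Functional API (reusing pre_norms, target and n_cols from the question's code; the head sizes and output names are placeholders):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(n_cols,))
x = layers.Dense(50, activation='relu')(inputs)
x = layers.Dense(50, activation='relu')(x)

# One head per target, so each output gets its own loss (and optionally its own weight).
latency_out = layers.Dense(1, name='latency')(x)
accuracy_out = layers.Dense(1, name='accuracy')(x)

multi_model = keras.Model(inputs=inputs, outputs=[latency_out, accuracy_out])
multi_model.compile(optimizer='adam', loss='mean_squared_error')

# Targets are passed per output name.
multi_model.fit(pre_norms,
                {'latency': target['Latency'].values, 'accuracy': target['Accuracy'].values},
                validation_split=0.3, epochs=100, verbose=1)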
One of your target variables reaches quite big values. As shown in the excerpt of your data, "Latency" reaches values around 30,000 and 20,000.
Evidently, if your model makes quite wrong predictions in the beginning, e.g. if it predicts "1" for Latency, the MSE will be extremely high.
You could normalize your targets as you did with your inputs to make it easier for your network to learn them. Your MSE, and hence your loss, should then be much smaller.
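A minimal sketch of that, reusing the frames and the regression() helper from the question:

# Standardize the targets so Latency (~30,000) and Accuracy (~1) are on a similar scale.
target_mean = target.mean()
target_std = target.std()
target_norm = (target - target_mean) / target_std

model = regression()
model.fit(pre_norms, target_norm, validation_split=.3, epochs=100, verbose=1)

# Predictions come back in standardized units; undo the scaling to interpret them.
preds = model.predict(pre_norms) * target_std.values + target_mean.values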

Transfer learning only works with trainable set to false

I have two models, initialized like this:
vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
    layer.trainable = False

model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
and
vgg19_2 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))

model2 = Sequential(layers=vgg19_2.layers)
model2.add(Dense(1024, activation='relu'))
model2.add(Dense(512, activation='relu'))
model2.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model2.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
In other words, the only difference is that the second model doesn't set the vgg19 layers' trainable attribute to False. Unfortunately, the model with trainable set to True does not learn the data.
When I use model.fit I get:
Trainable set to false:
Epoch 1/51
2500/2500 [==============================] - 49s 20ms/step - loss: 1.4319 - accuracy: 0.5466 - val_loss: 1.3951 - val_accuracy: 0.5693
Epoch 2/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.1508 - accuracy: 0.6009 - val_loss: 0.7832 - val_accuracy: 0.6023
Epoch 3/51
2500/2500 [==============================] - 48s 19ms/step - loss: 1.0816 - accuracy: 0.6256 - val_loss: 0.6782 - val_accuracy: 0.6153
Epoch 4/51
2500/2500 [==============================] - 47s 19ms/step - loss: 1.0396 - accuracy: 0.6450 - val_loss: 1.3045 - val_accuracy: 0.6103
The model trains to about 65% accuracy within a few epochs. However, using model2, which should be able to make even better predictions (since there are more trainable parameters), I get:
Epoch 1/5
2500/2500 [==============================] - 226s 90ms/step - loss: 2.3028 - accuracy: 0.0980 - val_loss: 2.3038 - val_accuracy: 0.1008
Epoch 2/5
2500/2500 [==============================] - 311s 124ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.2988 - val_accuracy: 0.1017
Epoch 3/5
2500/2500 [==============================] - 306s 123ms/step - loss: 2.3029 - accuracy: 0.0980 - val_loss: 2.3052 - val_accuracy: 0.0997
Epoch 4/5
2500/2500 [==============================] - 321s 129ms/step - loss: 2.3029 - accuracy: 0.0972 - val_loss: 2.3028 - val_accuracy: 0.0997
Epoch 5/5
2500/2500 [==============================] - 300s 120ms/step - loss: 2.3028 - accuracy: 0.0988 - val_loss: 2.3027 - val_accuracy: 0.1007
When I then try to compute weight gradients on my data, I get only zeros. I understand that it may take a long time to train such a big network as VGG to the optimum, but considering that the calculated gradients for the last 3 layers should be very similar in both cases, why is the accuracy so low? Training for longer gives no improvement.
Try this:
Train the first model, which sets trainable to False. You don't have to train it to saturation, so I would start with your 5 epochs.
Go back and set trainable to True for all the vgg19 parameters. Then, per the documentation, you can rebuild and recompile the model to have these changes take effect.
Continue training on the rebuilt model, which now has all parameters available for tuning.
It is very common in transfer learning to completely freeze the transferred layers in order to preserve them. In the early stages of training, your additional layers don't know what to do yet. That means a noisy gradient by the time it reaches the transferred layers, which will quickly "detune" them away from their previously well-tuned weights.
Putting it all together into some code, it would look something like this:
# Original code. Transfer VGG and freeze the weights.
vgg19 = keras.applications.vgg19.VGG19(
    weights='imagenet',
    include_top=False,
    input_shape=(img_height, img_width, img_channels))
for layer in vgg19.layers:
    layer.trainable = False

model = Sequential(layers=vgg19.layers)
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

opt = Adam(learning_rate=0.001, beta_1=0.9)
model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
model.fit()

# New second stage: unfreeze and continue training.
for layer in vgg19.layers:
    layer.trainable = True

full_model = Sequential(layers=model.layers)
full_model.compile(
    loss='categorical_crossentropy',
    optimizer=opt,
    metrics=['accuracy'])
full_model.fit()
You may want to tune the learning rate for the fine-tuning stage. It's not essential to start with, just something to keep in mind.
A third option is to use discriminative learning rates, as introduced by Jeremy Howard and Sebastian Ruder in the ULMFiT paper. The idea is that, in Transfer Learning, you usually want the later layers to learn faster than the earlier, transferred layers. So you actually set the learning rates to be different for different sets of layers. The fastai library has a PyTorch implementation that works by dividing the model into "layer groups" and allowing different parameters for each.
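If you want to approximate this in Keras, one option (an illustrative sketch, assuming TensorFlow Addons is available and the full_model built in the code above) is tfa.optimizers.MultiOptimizer, which assigns a different optimizer, and hence a different learning rate, to each group of layers:

import tensorflow as tf
import tensorflow_addons as tfa  # assumption: tensorflow-addons is installed

# Slow learning rate for the transferred VGG19 layers, a faster one for the new head.
slow_opt = tf.keras.optimizers.Adam(learning_rate=1e-5)
fast_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)

# The last four layers are the Flatten + three Dense layers added on top of VGG19.
optimizers_and_layers = [
    (slow_opt, full_model.layers[:-4]),
    (fast_opt, full_model.layers[-4:]),
]

full_model.compile(
    loss='categorical_crossentropy',
    optimizer=tfa.optimizers.MultiOptimizer(optimizers_and_layers),
    metrics=['accuracy'])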

Validation loss decreases while validation accuracy decreases in CNN classification

I'm training a classifier on 2 classes (spawned fish or not, from images of scales). The dataset is unbalanced: only 5% of the scales are spawned.
I haven't checked how many spawned fish are in each of the train/validation/test sets, but there are 9073 images in total, split 70/15/15%. Then I observe in epoch 2 that val_loss decreases while val_acc also decreases. How is that possible?
I'm using Keras. The network is EfficientNetB4 from github.com/qubvel.
1600/1600 [==============================] - 1557s 973ms/step - loss: 1.3353 - acc: 0.6474 - val_loss: 0.8055 - val_acc: 0.7046
Epoch 00001: val_loss improved from inf to 0.80548, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.001-0.81.hdf5
Epoch 2/150
1600/1600 [==============================] - 1508s 943ms/step - loss: 0.8013 - acc: 0.7084 - val_loss: 0.6816 - val_acc: 0.6973
Epoch 00002: val_loss improved from 0.80548 to 0.68164, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.002-0.68.hdf5
Edit: here is another example, with only 1010 images but balanced 50/50.
Epoch 5/150
1600/1600 [==============================] - 1562s 976ms/step - loss: 0.0219 - acc: 0.9933 - val_loss: 0.2639 - val_acc: 0.9605
Epoch 00005: val_loss improved from 0.28715 to 0.26390, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.005-0.26.hdf5
Epoch 6/150
1600/1600 [==============================] - 1565s 978ms/step - loss: 0.0059 - acc: 0.9982 - val_loss: 0.4140 - val_acc: 0.9276
Epoch 00006: val_loss did not improve from 0.26390
Epoch 7/150
1600/1600 [==============================] - 1561s 976ms/step - loss: 0.0180 - acc: 0.9941 - val_loss: 0.2379 - val_acc: 0.9276
and val_loss decreases as well as val_acc.
If you have such an unbalanced dataset, the model first classifies everything as the majority class, which gets relatively high accuracy, but all probability mass is put on the majority class. The reason is that the final bias can be learned very quickly, because the back-propagation path to it is very short.
In the later stages of training, the model basically finds reasons not to classify the input as the majority class. At this point the model starts to make mistakes and the accuracy goes down, but the probability is more evenly distributed, so from the loss perspective the error is smaller.
With such an imbalanced dataset, I would rather track the F-measure instead of accuracy.
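One way to do that (a sketch; x_val and y_val are hypothetical arrays holding the validation images and integer labels, with class index 1 taken to be "spawned") is to compute the F1 score on validation predictions at the end of every epoch:

from sklearn.metrics import f1_score
from tensorflow import keras

class F1Callback(keras.callbacks.Callback):
    """Prints the F1 score on held-out data after each epoch."""
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        probs = self.model.predict(self.x_val, verbose=0)
        preds = (probs[:, -1] > 0.5).astype(int)  # probability of the "spawned" class
        print(f"epoch {epoch + 1}: val F1 = {f1_score(self.y_val, preds):.3f}")

# usage: model.fit(..., callbacks=[F1Callback(x_val, y_val)])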

Keras EarlyStopping patience parameter

I'm trying to do some binary classification and I use Keras's EarlyStopping callback. However, I have a question regarding the patience parameter.
In the documentation it is stated:
patience: number of epochs with no improvement after which training will be stopped.
but I find that it behaves differently. For example, I have set
EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, verbose=0, mode='auto')
and here are the results:
val_loss: 0.6811
val_loss: 0.6941
val_loss: 0.6532
val_loss: 0.6546
val_loss: 0.6534
val_loss: 0.6489
val_loss: 0.6240
val_loss: 0.6285
val_loss: 0.6144
val_loss: 0.5921
val_loss: 0.5731
val_loss: 0.5956
val_loss: 0.5753
val_loss: 0.5977
After this, training stopped. As far as I can see, there are no 2 consecutively increasing loss values at the end. Could someone explain this behaviour of the parameter?
There are three consecutively worse runs by loss; let's look at the numbers:
val_loss: 0.5921 < current best
val_loss: 0.5731 < current best
val_loss: 0.5956 < patience 1
val_loss: 0.5753 < patience 2
val_loss: 0.5977 < patience >2, stopping the training
You already discovered the min_delta parameter, but I think it is too small to trigger here (you're off by 10x).
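To make the behaviour easier to see, you could also let the callback restore the best weights and log when it fires; a sketch with the same monitor as in the question (the min_delta of 0.001 is just an example value):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',
    min_delta=0.001,             # improvements smaller than this count as "no improvement"
    patience=2,                  # stop after 2 epochs without improvement
    restore_best_weights=True,   # roll back to the best val_loss seen so far
    verbose=1)                   # prints the epoch at which training stops

# usage: model.fit(x, y, validation_data=(x_val, y_val), epochs=100, callbacks=[early_stop])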
Epoch 1 val_loss: 0.6811 <- current best
Epoch 2 val_loss: 0.6941 <- patience 1
Epoch 3 val_loss: 0.6532 <- current best # current best gets updated
Epoch 4 val_loss: 0.6546 <- patience 1
Epoch 5 val_loss: 0.6534 <- patience 2
Training will stop at epoch 5
Try this example in Google Colab for a more intuitive understanding: https://colab.research.google.com/github/minsuk-heo/tf2/blob/master/jupyter_notebooks/06.DropOut_EarlyStopping.ipynb

Keras Notebook GPU Timeout

I am trying to run Keras with TensorFlow on a Windows 10 machine with my GTX 980 GPU in a Jupyter notebook. If I run TensorFlow alone with my GPU, it works perfectly fine without any issues. But problems arise with the Keras interface for a high number of epochs.
The Keras model uses the GPU and gives an output if my number of epochs is low, like the following:
with tf.device('/gpu:0'):
    model.compile('adam', 'categorical_crossentropy', ['accuracy'])
    history = model.fit(X_normalized, y_one_hot, batch_size=128, nb_epoch=2, validation_split=0.2)
Following is the output:
Train on 31367 samples, validate on 7842 samples
Epoch 1/2
31367/31367 [==============================] - 3s - loss: 1.7640 - acc: 0.5438 - val_loss: 1.2872 - val_acc: 0.6486
Epoch 2/2
31367/31367 [==============================] - 2s - loss: 0.8539 - acc: 0.7765 - val_loss: 0.7958 - val_acc: 0.7615
If the number of epochs is high, it times out with the following error and the webpage says busy:
WebSocket ping timeout after 119999 ms.
How do I fix this error?
I guess this issue is related to TDR (Timeout Detection and Recovery) on Windows.
Basically, the OS thinks the GPU has hung and is no longer responding, so it resets the graphics driver. You can try to disable TDR or extend the TdrDelay limit. More details can be found at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys.
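For illustration only, a sketch of raising TdrDelay from Python with the standard winreg module (the registry path is the one documented at the link above; this must be run as administrator, the 60-second value is an arbitrary choice, and a reboot is needed for it to take effect):

import winreg

# Registry key holding the GPU Timeout Detection and Recovery settings.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    # Allow the GPU up to 60 seconds before Windows considers it hung and resets it.
    winreg.SetValueEx(key, "TdrDelay", 0, winreg.REG_DWORD, 60)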
