Keras EarlyStopping patience parameter - machine-learning

I'm doing binary classification and using Keras's EarlyStopping callback. However, I have a question about the patience parameter.
The documentation states:
patience: number of epochs with no improvement after which training will be stopped.
but I find that it behaves in a different way. For example, I have set
EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, verbose=0, mode='auto')
and here are the results:
val_loss: 0.6811
val_loss: 0.6941
val_loss: 0.6532
val_loss: 0.6546
val_loss: 0.6534
val_loss: 0.6489
val_loss: 0.6240
val_loss: 0.6285
val_loss: 0.6144
val_loss: 0.5921
val_loss: 0.5731
val_loss: 0.5956
val_loss: 0.5753
val_loss: 0.5977
After this, training stopped. As far as I can see, there are no 2 consecutively increasing loss values at the end. Could someone explain how this parameter behaves?

There are three consecutive epochs that fail to improve on the best loss. Let's look at the numbers:
val_loss: 0.5921 <- current best
val_loss: 0.5731 <- current best
val_loss: 0.5956 <- patience 1
val_loss: 0.5753 <- patience 2
val_loss: 0.5977 <- patience > 2, stopping the training
You already discovered the min_delta parameter, but I think it is too small to trigger here (you're off by 10x).

Epoch 1 val_loss: 0.6811 <- current best
Epoch 2 val_loss: 0.6941 <- patience 1
Epoch 3 val_loss: 0.6532 <- current best # current best gets updated
Epoch 4 val_loss: 0.6546 <- patience 1
Epoch 5 val_loss: 0.6534 <- patience 2
Training will stop at epoch 5.
Try this example in Google Colab for a more intuitive understanding: https://colab.research.google.com/github/minsuk-heo/tf2/blob/master/jupyter_notebooks/06.DropOut_EarlyStopping.ipynb
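The counting rule walked through above can be sketched in plain Python. This is only an illustration of the rule as the answer describes it, not Keras's source code: the function name early_stopping_epoch is hypothetical, and the exact epoch at which a given Keras version stops may differ by one from this sketch.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Hypothetical helper: an epoch counts as an improvement only if its
    loss beats the best loss so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0  # any improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# First five validation losses from the question.
losses = [0.6811, 0.6941, 0.6532, 0.6546, 0.6534]
stop_epoch = early_stopping_epoch(losses, patience=2, min_delta=0.0001)
print("stops at epoch:", stop_epoch)
```

The key point is that the counter compares each epoch against the best loss seen so far, not against the previous epoch, so the losses do not need to increase consecutively for the counter to run out.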

Related

Why am I getting a high loss value for neural network regression

I have data in the following format, consisting of 80 instances. I need to predict two parameters, latency and accuracy.
   No  Model    Technique    Latency  Accuracy
0   1  Net      Repartition  31308.4      0.99
1   2  Net      Connection   30338.2      0.79
2   3  MobiNet  Repartition  20360.1      0.89
predictors=data.drop(['Latency','Accuracy'], axis = 1)
target=data[['Latency', 'Accuracy']]
predictors_cat_converted=pd.get_dummies(predictors, prefix=['Model', 'Technique'])
pre_norms = (predictors_cat_converted-predictors_cat_converted.mean()/predictors_cat_converted.std())
def regression():
    model=Sequential()
    model.add(Dense(50, activation= 'relu',input_shape=(n_cols,)))
    model.add(Dense(50, activation='relu'))#hidden layer
    model.add(Dense(2))#output
    model.compile(optimizer='adam',loss='mean_squared_error')
    return model
model=regression()
model.fit(pre_norms, target,validation_split=.3,epochs=100,verbose=1)
The output shows a very high loss:
Epoch 1/100
2/2 [==============================] - 1s 275ms/step - loss: 256321162.6667 - val_loss: 262150224.0000
Epoch 2/100
2/2 [==============================] - 0s 23ms/step - loss: 246612645.3333 - val_loss: 262146176.0000
Epoch 3/100
2/2 [==============================] - 0s 22ms/step - loss: 251778928.0000 - val_loss: 262142000.0000
Epoch 4/100
2/2 [==============================] - 0s 26ms/step - loss: 252470826.6667 - val_loss: 262137664.0000
Epoch 5/100
2/2 [==============================] - 0s 25ms/step - loss: 255799392.0000 - val_loss: 262133200.0000
Epoch 6/100
You have very little data: just 2 feature columns, 80 rows, and 2 target variables. What you can do:
Add more data.
Normalize your data before feeding it to the neural network.
If the neural network does not give good accuracy, try Random Forest or XGBoost.
I also want to add that your neural network architecture is wrong. A Dense layer with 2 outputs and a softmax activation isn't going to give you good results here. You have to use TensorFlow's Functional API and build a neural network architecture with 1 input and 2 outputs.
One of your target variables reaches quite large values. As the excerpt of your data shows, "Latency" takes values of around 20,000 to 30,000.
If your model makes badly wrong predictions in the beginning, for example if it predicts "1" for your Latency, the MSE will be extremely high.
You could normalize your targets as you did with your inputs to make them easier for your network to learn. Your MSE, and hence your loss, should then be much smaller.
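To make the scale argument concrete, here is a minimal sketch in plain Python. It uses only the three sample rows shown in the question. The all-zero prediction is an assumption standing in for an untrained network, and the helper names mse and standardize are hypothetical. Note that standardization is (value - mean) / std, with the subtraction inside the parentheses.

```python
from statistics import mean, pstdev

# Target values from the three sample rows in the question.
latency = [31308.4, 30338.2, 20360.1]
accuracy = [0.99, 0.79, 0.89]


def mse(y_true, y_pred):
    """Mean squared error over one target column."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def standardize(values):
    """Return z-scores plus the mean and std needed to undo the scaling."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma


# Assumption: an untrained network outputs values close to zero.
naive = [0.0] * len(latency)

# Loss on the raw targets, averaged over both outputs.
raw_loss = (mse(latency, naive) + mse(accuracy, naive)) / 2

# Loss on standardized targets for the same naive prediction.
lat_z, lat_mu, lat_sigma = standardize(latency)
acc_z, acc_mu, acc_sigma = standardize(accuracy)
scaled_loss = (mse(lat_z, naive) + mse(acc_z, naive)) / 2

print("raw loss:", raw_loss)
print("scaled loss:", scaled_loss)

# Predictions made in the scaled space are mapped back like this.
restored_latency = [z * lat_sigma + lat_mu for z in lat_z]
```

On the raw targets the loss is dominated almost entirely by Latency, which is why it lands in the hundreds of millions. After standardization both targets contribute on the same scale.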

Understanding ImageAI output

I'm running ResNet with the help of ImageAI. I can't understand why there are two sets of data under each epoch.
What do 10/10 and 111/111 mean? Why is the model saved before the epoch finishes?
There is no information about this in the manual.
Using Enhanced Data Generation
Found 442 images belonging to 2 classes.
Found 38 images belonging to 2 classes.
JSON Mapping for the model classes saved to /content/drive/My Drive/Colab Notebooks/images/ML/json/model_class.json
Number of experiments (Epochs) : 100
Epoch 1/100
10/10 [==============================] - 22s 2s/step - loss: 7.6561 - acc: 0.5000
Epoch 00001: val_acc improved from -inf to 0.50000, saving model to /content/drive/My Drive/Colab Notebooks/images/ML/models/model_ex-001_acc-0.500000.h5
111/111 [==============================] - 297s 3s/step - loss: 0.7466 - acc: 0.7941 - val_loss: 7.6561 - val_acc: 0.5000
Epoch 2/100
10/10 [==============================] - 0s 28ms/step - loss: 7.6561 - acc: 0.5000
Epoch 00002: val_acc did not improve from 0.50000
111/111 [==============================] - 12s 105ms/step - loss: 0.3910 - acc: 0.8778 - val_loss: 7.6561 - val_acc: 0.5000

LSTM accuracy unchanged while loss decreases

We installed an accelerometer sensor to detect anomalies.
There is only one sensor, so my data is a 1-D array.
I tried to use an LSTM autoencoder for anomaly detection.
But my model didn't work: the losses on the training and validation sets decreased, but the accuracy stayed unchanged.
Here are my code and training log:
dim = 1
timesteps = 32
data.shape = (-1,timesteps,dim)
model = Sequential()
model.add(LSTM(50,input_shape=(timesteps,dim),return_sequences=True))
model.add(Dense(dim))
lr = 0.00001
Nadam = optimizers.Nadam(lr=lr)
model.compile(loss='mae', optimizer=Nadam ,metrics=['accuracy'])
EStop = EarlyStopping(monitor='val_loss', min_delta=0.001,patience=150, verbose=2, mode='auto',restore_best_weights=True)
history = model.fit(data,data,validation_data=(data,data),epochs=2000,batch_size=64,verbose=2,shuffle=False,callbacks=[EStop]).history
Training log:
Train on 4320 samples, validate on 4320 samples
Epoch 1/2000
- 3s - loss: 0.3855 - acc: 7.2338e-06 - val_loss: 0.3760 - val_acc: 7.2338e-06
Epoch 2/2000
- 2s - loss: 0.3666 - acc: 7.2338e-06 - val_loss: 0.3567 - val_acc: 7.2338e-06
Epoch 3/2000
- 2s - loss: 0.3470 - acc: 7.2338e-06 - val_loss: 0.3367 - val_acc: 7.2338e-06
...
Epoch 746/2000
- 2s - loss: 0.0021 - acc: 1.4468e-05 - val_loss: 0.0021 - val_acc: 1.4468e-05
Epoch 747/2000
- 2s - loss: 0.0021 - acc: 1.4468e-05 - val_loss: 0.0021 - val_acc: 1.4468e-05
Epoch 748/2000
- 2s - loss: 0.0021 - acc: 1.4468e-05 - val_loss: 0.0021 - val_acc: 1.4468e-05
Restoring model weights from the end of the best epoch
Epoch 00748: early stopping
A couple of things:
As Matias pointed out in the comments, you're doing regression, not classification. Accuracy will not give meaningful values for regression. That said, you can see that the accuracy did improve (from 0.0000072 to 0.0000145). Check the direct output from your model to see how well it approximates the original time series.
You can safely omit the validation data when it is the same as the training data.
With autoencoders, you generally want to compress the data in some way so that you can represent the same data in a lower dimension, which is easier to analyze (for anomalies or otherwise). In your case, you are expanding the dimensionality instead of reducing it, meaning the optimal strategy for your autoencoder would be to pass through the same values it gets in (each value of your time series is sent to 50 LSTM units, which send their result to 1 Dense unit). You might be able to combat this if you set return_sequences to False (i.e. only the result from the last timestep is returned), preferably into more than one unit, and then try to rebuild the time series from this instead. It might fail, but it is still likely to lead to a better model.
As @MatiasValdenegro said, you shouldn't use accuracy when you want to do regression.
You can see that your model might be fine, because your loss decreases over the epochs and is very low at early stopping.
In regression problems, these metrics are normally used:
Mean Squared Error: mean_squared_error, MSE or mse
Mean Absolute Error: mean_absolute_error, MAE or mae
Mean Absolute Percentage Error: mean_absolute_percentage_error, MAPE or mape
Cosine Proximity: cosine_proximity, cosine
To get the right metrics, you should change this (e.g. for "Mean Squared Error"):
model.compile(loss='mae', optimizer=Nadam ,metrics=['mse'])
As already said your model seems to be fine, you are just looking at the wrong metrics.
Hope this helps and feel free to ask.
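A small plain-Python sketch shows why accuracy can stay flat while a regression model clearly improves. The signal values and the two sets of predictions are made up for illustration. The accuracy helper assumes accuracy counts a prediction as correct only when its rounded value equals the target exactly, which is an assumption about how such a metric treats continuous outputs, not a statement about Keras internals.

```python
def mae(y_true, y_pred):
    """Mean absolute error, the loss used in the question."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def exact_match_accuracy(y_true, y_pred):
    """Assumed accuracy rule: rounded prediction must equal the target."""
    return sum(t == round(p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical accelerometer readings (continuous values).
signal = [0.12, -0.40, 0.33, 0.05, -0.27]

# Hypothetical reconstructions early and late in training.
early = [v + 0.30 for v in signal]
late = [v + 0.002 for v in signal]

print("early MAE:", mae(signal, early), "accuracy:", exact_match_accuracy(signal, early))
print("late MAE:", mae(signal, late), "accuracy:", exact_match_accuracy(signal, late))
```

The error shrinks by two orders of magnitude, yet the accuracy does not move, because continuous targets are essentially never matched exactly.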
Early stopping is not the best regularization technique while you are facing this problem. At least while you are still trying to fix it, I would take it out or replace it with another regularization method to figure out what is happening.
Another suggestion: can you change the validation set a bit and see how the behavior changes? How did you build the validation set?
Did you normalize or standardize the data? Please note that normalization is even more important for LSTMs.
The metric is definitely a problem. The above suggestions are good.

Validation loss decreases and validation accuracy decreases in CNN classification

I'm training a classifier on 2 classes (spawned fish or not, from an image of a scale). The dataset is unbalanced: only 5% are spawned scales.
I haven't checked how many spawned fish are in each of the train/validation/test sets, but there are 9073 images, split 70/15/15%. In epoch 2, I observe that val_loss decreases while val_acc also decreases. How is that possible?
I'm using Keras. The network is EfficientNetB4 from github.com/qubvel.
1600/1600 [==============================] - 1557s 973ms/step - loss: 1.3353 - acc: 0.6474 - val_loss: 0.8055 - val_acc: 0.7046
Epoch 00001: val_loss improved from inf to 0.80548, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.001-0.81.hdf5
Epoch 2/150
1600/1600 [==============================] - 1508s 943ms/step - loss: 0.8013 - acc: 0.7084 - val_loss: 0.6816 - val_acc: 0.6973
Epoch 00002: val_loss improved from 0.80548 to 0.68164, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.002-0.68.hdf5
Edit: here is another example with only 1010 images, but it is balanced (50/50).
Epoch 5/150
1600/1600 [==============================] - 1562s 976ms/step - loss: 0.0219 - acc: 0.9933 - val_loss: 0.2639 - val_acc: 0.9605
Epoch 00005: val_loss improved from 0.28715 to 0.26390, saving model to ./checkpoints_missing_loss2/salmon_scale_inception.005-0.26.hdf5
Epoch 6/150
1600/1600 [==============================] - 1565s 978ms/step - loss: 0.0059 - acc: 0.9982 - val_loss: 0.4140 - val_acc: 0.9276
Epoch 00006: val_loss did not improve from 0.26390
Epoch 7/150
1600/1600 [==============================] - 1561s 976ms/step - loss: 0.0180 - acc: 0.9941 - val_loss: 0.2379 - val_acc: 0.9276
Here val_loss decreases as well as val_acc.
If you have such an unbalanced dataset, the model first classifies everything as the majority class which gets relatively high accuracy, but all probability is distributed to the majority class. The reason is that the final bias can be learned very quickly because the back-propagation path is very short.
In the later stages of the training, the model basically finds reasons not to classify the input with the majority class. At this point, the model starts to make mistakes, the accuracy goes down, but the probability is more evenly distributed, so from the loss perspective, the error is smaller.
With such an imbalanced dataset, I would rather track F-measure instead of accuracy.
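The effect described in this answer can be reproduced with a toy example in plain Python. The 20 labels (5% positive, as in the question) and both sets of predicted probabilities are invented for illustration, and the helper names are hypothetical.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross-entropy; probs are the predicted P(class 1)."""
    total = sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )
    return -total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)


def f1_score(labels, probs, threshold=0.5):
    """F1 for the positive (minority) class; 0.0 if nothing is detected."""
    preds = [p >= threshold for p in probs]
    tp = sum(pred and y == 1 for pred, y in zip(preds, labels))
    fp = sum(pred and y == 0 for pred, y in zip(preds, labels))
    fn = sum((not pred) and y == 1 for pred, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 19 majority-class samples and 1 minority-class sample (5% positive).
labels = [0] * 19 + [1]

# Early in training: everything is confidently assigned to the majority class.
early_probs = [0.01] * 20

# Later: less extreme probabilities, the positive is found, two negatives are missed.
late_probs = [0.10] * 17 + [0.55, 0.55] + [0.70]

for name, probs in (("early", early_probs), ("late", late_probs)):
    print(name,
          "loss:", round(binary_cross_entropy(labels, probs), 4),
          "acc:", accuracy(labels, probs),
          "f1:", f1_score(labels, probs))
```

In this toy case the later model has a lower loss and a lower accuracy at the same time, while the F-measure for the minority class goes up, which is why it is the more informative metric to track here.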

Keras NoteBook GPU Timeout

I am trying to run Keras with TensorFlow on a Windows 10 machine with my GTX 980 GPU in a Jupyter notebook. If I run TensorFlow alone on my GPU, it works perfectly fine without any issues. But problems arise with the Keras interface for a high number of epochs.
The Keras model uses the GPU and gives an output if my number of epochs is low, as in the following:
with tf.device('/gpu:0'):
    model.compile('adam', 'categorical_crossentropy', ['accuracy'])
    history = model.fit(X_normalized,y_one_hot,batch_size=128,nb_epoch=2,validation_split=0.2)
Following is the output
Train on 31367 samples, validate on 7842 samples
Epoch 1/2
31367/31367 [==============================] - 3s - loss: 1.7640 - acc: 0.5438 - val_loss: 1.2872 - val_acc: 0.6486 - ETA: 0s - loss: 1.8827 - acc: 0.5145 - ETA: 0s - loss: 1.7732 - acc: 0.5416
Epoch 2/2
31367/31367 [==============================] - 2s - loss: 0.8539 - acc: 0.7765 - val_loss: 0.7958 - val_acc: 0.7615
If the number of epochs is high, it times out with the following error and the web page says busy:
WebSocket ping timeout after 119999 ms.
How do I fix this error?
I guess this issue is related to TDR (Timeout Detection and Recovery) on Windows.
Basically, the OS thinks the GPU has hung and is no longer responding, so it resets the graphics card. You can try to disable TDR or raise the limit set by TdrDelay. More details can be found at https://learn.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys.
