keras.fit() re-initialises the weights

I have a model trained using model.fit(), and I used model.save() to save it to a file. Now I have another dataset on which I want to resume training with the saved model. But I found that every fit() call is treated as a fresh training run; that is, it seems to re-initialise the weights that were generated and saved before.
When I call fit() with epochs=0, I do not see the weight-reset problem. But I definitely want to train with epochs > 0.
Am I missing something here, or is this an issue with Keras?
Keras version: 2.0.3
Thanks.

Actually, the situation when calling fit is the following:
Weights are not reset - your model will have exactly the same weights as before calling fit, at least until the optimization algorithm changes them during the first batch (a quick check of this follows after this list).
Model states are reset - this is the scenario you probably came across. Model hidden states (especially in the RNN case) are reset when fit is called. If you want to keep these values as well (the optimizer state in particular is crucial in many cases), you could use the train_on_batch method, which doesn't affect any internal state of the model.
Optimizer state is not reset - calling fit() again and again does not reset the optimizer state. Ref: https://github.com/keras-team/keras/issues/454#issuecomment-125644222
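A minimal sanity check, assuming an already-trained, compiled model and new arrays x_new and y_new (the file name here is illustrative):

import numpy as np
from keras.models import load_model

model.save('my_model.h5')
saved = model.get_weights()

restored = load_model('my_model.h5')
identical = all(np.array_equal(s, r)
                for s, r in zip(saved, restored.get_weights()))
print(identical)                       # True: loading restores the exact weights

restored.fit(x_new, y_new, epochs=1)   # training resumes from these weights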

Calling fit should not re-initialize the weights.
You write that you are using a new dataset; if this dataset has different statistics, it can easily cause the network to rapidly lose accuracy. If this is the case, try a very small learning rate, or set trainable=False for the early layers during the first few epochs.
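A sketch of that fine-tuning recipe, assuming a saved model file 'my_model.h5' and new arrays x_new, y_new (the loss and the choice of which layers to freeze are illustrative):

from keras.models import load_model
from keras.optimizers import Adam

model = load_model('my_model.h5')

for layer in model.layers[:-2]:        # freeze everything but the last two layers
    layer.trainable = False

# Recompile with a very small learning rate so the new data
# doesn't wipe out what the network learned before.
model.compile(optimizer=Adam(1e-5), loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_new, y_new, epochs=3)

Note that recompiling creates a fresh optimizer, so any accumulated optimizer state starts over; changes to the trainable flags only take effect after compiling.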

Related

training a model without any weights, only bias

I was doing some tests using a VGG3 CNN model, and I zeroed out all weights and their gradients after every backprop call, so that only the biases of the model could be trained/updated. What I got were interesting results that I can't quite explain: the accuracy of the model kept increasing. It started at about 27.12% and, after only 10 epochs, reached around 71% accuracy.
This left me wondering how the accuracy can increase so significantly without using any weights at all during training. I would love to get some theories and input on this.
Can you actually train a model (not necessarily the best model) without using any weights at all?
If you zero the weights and only use the bias, then the output of the model is independent of the input; in fact, the model's output depends only on the final layer's bias values. If you implemented it correctly, then you have something akin to a maximum a priori estimator, i.e. your model predicts the most common class in your training data. That means one of the classes in your training/testing data is overrepresented and your model is always predicting that class.
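You can see this with a single dense layer (a minimal numpy sketch; the shapes and values are illustrative):

import numpy as np

x = np.random.randn(5, 10)            # any batch of inputs
W = np.zeros((10, 3))                 # zeroed weights, 3 hypothetical classes
b = np.array([0.2, 1.5, -0.3])        # a trained bias vector

# logits = x @ W + b = b for every row, so the prediction ignores the input.
logits = x @ W + b
print(np.allclose(logits, b))         # True
print(logits.argmax(axis=1))          # every sample gets the same class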

How to look at the parameters of a pytorch model?

I have a simple PyTorch neural net that I copied from OpenAI, and I modified it to some extent (mostly the input).
When I run my code, the output of the network remains the same on every episode, as if no training occurs.
I want to see if any training happens, or if some other reason causes the results to be the same.
How can I check whether the weights are changing at all?
Thanks
Depends on what you are doing, but the easiest would be to check the weights of your model.
You can do this (and compare them with the ones from the previous iteration) using the following code:
# Print every parameter tensor in the model
for parameter in model.parameters():
    print(parameter.data)
If the weights are changing, the neural network is being optimized (which doesn't necessarily mean it is learning anything useful).
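A self-contained sketch of the before/after comparison (the tiny model, optimizer, and placeholder loss are stand-ins; substitute your own):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Snapshot the parameters before one training step.
before = [p.detach().clone() for p in model.parameters()]

loss = model(torch.randn(8, 4)).pow(2).mean()   # placeholder loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Compare each parameter tensor with its snapshot.
for prev, p in zip(before, model.parameters()):
    print(tuple(p.shape), 'changed' if not torch.equal(prev, p.detach()) else 'unchanged')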

Determining number of epochs for model fitting in Keras

I'm trying to automatically determine when a Keras autoencoder converges. For example, look at this link under "Let's build the simplest autoencoder possible." The number of epochs is hardcoded at 50 (when the loss value converges). However, how would you code this using Keras if you didn't know the number was 50? Would you just keep calling fit()?
This question is actually very broad and hard. There are many techniques for setting the number of epochs:
Early stopping - in this case you set the number of epochs to a really high number, and you stop training when the improvement over subsequent epochs is no longer satisfying. In Keras you have a special callback called EarlyStopping which does the job for you (see the sketch after this list).
Model checkpoint - here you once again set a really high number of epochs and simply save only the best model w.r.t. a chosen metric. Once again, there is a special callback, ModelCheckpoint, for this scenario.
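A sketch combining both callbacks, assuming a compiled autoencoder model and arrays x_train, x_test (the file name, patience, and epoch count are illustrative):

from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor='val_loss', patience=5),        # stop when val_loss stalls
    ModelCheckpoint('best_model.h5', monitor='val_loss',  # keep only the best model
                    save_best_only=True),
]
model.fit(x_train, x_train,
          epochs=1000,                 # deliberately too high; EarlyStopping ends earlier
          validation_data=(x_test, x_test),
          callbacks=callbacks)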
Of course, there are other scenarios, e.g. using reinforcement learning to find the stopping time, or more complex scenarios where you choose it in a Bayesian hyperparameter setup, but those are much harder methods which often do not introduce any improvement.
One sure thing is that restarting the fit method might result in unexpected behaviour, as some internal states of the model are reset, which could cause instability. For this scenario I strongly advise you to use train_on_batch, which does not reset model states and makes a lot of fancy training scenarios possible (a sketch follows below).
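For example, a manual training loop with a crude convergence rule, assuming a compiled model and an array x_train used as both input and target, autoencoder-style (the tolerance, batch size, and step counts are illustrative):

import numpy as np

batch_size, prev_loss, tol = 256, np.inf, 1e-4
for step in range(100000):
    idx = np.random.randint(0, len(x_train), batch_size)
    loss = model.train_on_batch(x_train[idx], x_train[idx])
    if step % 100 == 0:
        if prev_loss - loss < tol:     # no meaningful improvement: stop
            break
        prev_loss = loss

Batch losses are noisy, so in practice you would smooth them or monitor a validation loss instead.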

Tensorflow model selection? Which model do I select?

I have trained a network with an LSTM, but I see that there is overfitting, and I have tried several combinations of learning rate / batch size / optimizers, but most combinations give a similar graph.
I would like to know whether I could use a model from before 75k iterations.
And would you consider this model to be overfit?
It is actually hard to say whether this is overfitted, as you have really high variance in the training. It is probable, but not certain.
Which model to choose?
Usually you would create a validation dataset on which you test your network's performance, and you select the model (including the set of hyperparameters) which yields the highest score. That's all. Without an additional validation set it will be hard (a minimal selection loop is sketched below).
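A minimal sketch of validation-based selection, with random stand-in data (substitute your own train/validation split; the candidate LSTM sizes and shapes are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

x_train, y_train = np.random.randn(800, 20, 8), np.random.randn(800, 1)
x_val, y_val = np.random.randn(200, 20, 8), np.random.randn(200, 1)

best_loss, best_units = np.inf, None
for units in (16, 32, 64):                        # candidate hyperparameters
    model = Sequential([LSTM(units, input_shape=(20, 8)), Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(x_train, y_train, epochs=5, verbose=0)
    loss = model.evaluate(x_val, y_val, verbose=0)
    if loss < best_loss:                          # keep whichever scores best on validation
        best_loss, best_units = loss, units
print(best_units, best_loss)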
How to fight overfitting?
There are plenty of techniques, including:
early stopping (you will need, yet again, a validation set to detect when your network starts to overfit)
adding priors:
a prior on the weights - like L2 regularization (see the sketch after this list)
a prior over the structure of the network - maybe you can limit the size of your network?
a prior over the data distribution - maybe you can augment your dataset in some way? For images you can usually distort them a bit (rotate, translate) without losing the label; for generic data, adding noise usually works fine
ensembling - averaging multiple networks (either explicitly, or through dropout) reduces overfitting
last but not least - gathering more data always helps (as in the limit the empirical error converges to the generalization error).
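A sketch of the weight prior and dropout items in Keras (layer sizes and the regularization strength are illustrative):

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2

model = Sequential([
    Dense(64, activation='relu', input_shape=(100,),
          kernel_regularizer=l2(1e-4)),           # L2 prior on the weights
    Dropout(0.5),                                 # implicit ensembling via dropout
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')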
The technique you are suggesting is called early stopping, and many people have used it as a way to combat overfitting. Other things you could do would be to decrease the size of your network or to try to collect more data.

Relation between perceptron accuracy and epoch

Is it possible that the accuracy of a perceptron decreases as I go through the training more times? In this case, I use the same training set several times.
Neither the accuracy on the training set nor the accuracy on the test set is stable as the number of epochs increases. In fact, experimental data indicate that the trend of both the in-sample error and the out-of-sample error is not even monotonic. For this reason a "pocket" strategy is often applied: unlike early stopping, the pocket algorithm keeps the best solution seen so far "in its pocket" instead of the last solution (a minimal sketch follows below).
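A minimal numpy sketch of the pocket algorithm, assuming labels in {-1, +1} and a bias column already folded into X:

import numpy as np

def pocket_perceptron(X, y, epochs=50):
    w = np.zeros(X.shape[1])
    best_w, best_acc = w.copy(), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:     # misclassified: standard perceptron update
                w = w + yi * xi
        acc = np.mean(np.sign(X @ w) == y)
        if acc > best_acc:             # pocket step: keep the best solution so far
            best_w, best_acc = w.copy(), acc
    return best_w, best_acc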
Yes.
This is a commonly studied phenomenon: the accuracy on never-before-seen data (test data) starts to decrease after a certain point, i.e. after a certain number of passes through the training data (what you call epochs). This phenomenon is called overfitting and is well understood. You want to stop early, as early as possible, or use regularization.
