If I only have to make one or a few predictions, do I need to re-train my NN every time? Or can I, pardon me if this is silly, "save" the training and only do the test?
Currently I'm using PyCharm, but I've seen that with other IDEs, like Spyder, you can execute selected lines of code; in that case, how does the NN keep its training without the need to re-train?
Sorry if these questions are too naive.
No, you don't need to re-train your NN every time. Just save your model parameters to a file and load them to make new predictions.
Are you using a machine learning framework like TensorFlow or Keras? In Keras it is very easy to implement this. There are two methods: first, you can save the model during training using Callbacks; second, you can use your_model_name.save('file_name.h5') and then load it with load_model('file_name.h5') to make predictions with your_model_name.predict(x).
By the way, there is a nice guide on how to properly save the full model architecture or only the model weights.
EDIT: For both methods you can use load_model; it is very simple!
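As a minimal sketch (assuming a compiled Keras model named model and new input data x_new, both hypothetical names), saving and reloading could look like this:

from tensorflow.keras.models import load_model

# after training, persist architecture + weights + optimizer state in one file
model.save('file_name.h5')

# later, in a fresh session: no re-training needed
model = load_model('file_name.h5')
predictions = model.predict(x_new)  # x_new is your new input data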
I have trained a neural network and an XGBoost model for the same problem, and now I am confused about how I should stack them. Should I just pass the output of the neural network as an additional input to the XGBoost model, or should I weight their results separately? Which would be better?
This question cannot be answered definitively. I would suggest checking both possibilities and choosing the one that works best.
Using the output of one model as input to the other model
I guess you know what you have to do to use the output of the NN as input to XGBoost. You should just take some time to think about how you handle the test and train data (see below). Use the "probabilities" rather than the binary labels for that. Of course, you could also try it vice versa, so that the NN gets the output of the XGBoost model as an additional input.
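A minimal sketch of this first option, assuming an already-fitted, sklearn-style NN classifier nn_model (with predict_proba) and hypothetical arrays X_train, y_train, X_test that are split identically for both models:

import numpy as np
from xgboost import XGBClassifier

# use the NN's probabilities, not its hard labels, as the extra feature
nn_train_proba = nn_model.predict_proba(X_train)[:, 1]
nn_test_proba = nn_model.predict_proba(X_test)[:, 1]

X_train_stacked = np.column_stack([X_train, nn_train_proba])
X_test_stacked = np.column_stack([X_test, nn_test_proba])

xgb = XGBClassifier()
xgb.fit(X_train_stacked, y_train)
final_proba = xgb.predict_proba(X_test_stacked)[:, 1]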
Using a VotingClassifier
The other possibility is to use a VotingClassifier with soft voting. You can use VotingClassifier(voting='soft') for that (to be precise, sklearn.ensemble.VotingClassifier). You could also play around with the weights here.
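A minimal sketch, assuming an sklearn-compatible NN (here MLPClassifier as a stand-in) and an XGBClassifier; both estimators must implement predict_proba for voting='soft':

from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

ensemble = VotingClassifier(
    estimators=[('nn', MLPClassifier()), ('xgb', XGBClassifier())],
    voting='soft',
    weights=[1, 1],  # play around with the weights here
)
ensemble.fit(X_train, y_train)               # X_train, y_train are your training data
proba = ensemble.predict_proba(X_test)[:, 1]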
Difference
The big difference is that with the first possibility the XGBoost model might learn in which areas the NN is weak and in which it is strong, while with the VotingClassifier the outputs of both models are weighted equally for all samples. The latter relies on the assumption that the models output a "probability" not too close to 0 or 1 when they are not confident about the prediction for a specific input record, but this assumption might not always be true.
Handling of the Train/Test Data
In both cases, you need to think about how you should handle the train/test data. The train/test data should ideally be split the same way for both models. Otherwise you might introduce some kind of data-leakage problem.
For the VotingClassifier this is no problem, because it can be used as a regular sklearn model class. For the first method (the output of model 1 is one feature of model 2), you should make sure you do the train-test split (or the cross-validation) with exactly the same records. If you don't do that, you run the risk of validating the output of your second model on a record that was in the training set of model 1 (except for the additional feature, of course). This clearly could cause a data-leakage problem, which results in a score that appears better than how the model would actually perform on unseen production data.
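One simple way to keep the records aligned, sketched here with a fixed random_state (the variable names are placeholders):

from sklearn.model_selection import train_test_split

# one fixed split reused for both models, so the stacked feature is never
# computed on rows that model 1 was trained on
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# fit the NN on X_train only, add its predictions as a feature for XGBoost on
# exactly the same X_train rows, and evaluate both models on X_test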
I split my dataset into training and testing sets. At the end, after finding the best hyperparameters for the training dataset, should I fit the model again using all the data? The point is to reach the highest possible score on new data.
Yes, that would help your model generalize, as more data generally means better generalization.
I don't think so. If you do that, you will no longer have a valid test set. What happens when you come back to improve the model later? If you do this, then you will need a new test set each model improvement, which means more labeling. You won't be able to compare experiments across model versions, because the test set won't be identical.
If you consider this model finished forever, then ok.
I have a binary classification problem with around 15 features. I chose these features using another model. Now I want to perform Bayesian logistic regression on these features. My target classes are highly imbalanced (the minority class is 0.001%) and I have around 6 million records. I want to build a model that can be re-trained nightly or on weekends using Bayesian logistic regression.
Currently, I have divided the data into 15 parts. I train my model on the first part and test on the last part, then I update my priors using the Interpolated method of pymc3 and re-run the model on the 2nd set of data. I check the accuracy and other metrics (ROC, F1-score) after each run.
Problems:
- My score is not improving.
- Am I using the right approach?
- This process is taking too much time.
If someone could guide me toward the right approach, with code snippets, it would be very helpful.
You can use variational inference. It is faster than sampling and produces very similar results. pymc3 itself provides methods for VI; you can explore those.
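A minimal sketch of ADVI in pymc3 (assuming a recent pymc3 3.x), shown for a toy Bayesian logistic regression; X and y are stand-ins for your feature matrix and labels, and the priors are just placeholders:

import pymc3 as pm

with pm.Model() as model:
    beta = pm.Normal('beta', mu=0.0, sigma=1.0, shape=X.shape[1])
    intercept = pm.Normal('intercept', mu=0.0, sigma=1.0)
    p = pm.math.sigmoid(pm.math.dot(X, beta) + intercept)
    obs = pm.Bernoulli('obs', p=p, observed=y)

    # ADVI is usually much faster than full MCMC sampling on large datasets
    approx = pm.fit(n=30000, method='advi')
    trace = approx.sample(1000)  # draws from the fitted approximation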
I only know about this part of the question. If you can elaborate on your problem a bit further, maybe I can help you.
I am training a customized Named Entity Recognition (NER) model using NeuroNER, which is written using TensorFlow. I am able to train a model and it performs well, but when I re-train it on new observations for which it gave incorrect results, it corrects them while affecting/forgetting some previous observations for which it gave correct results.
I want online re-training. I tried using StanfordNLP, spaCy and now TensorFlow. Please suggest a better way to achieve the desired goals.
Thanks
I think there is a misunderstanding behind this question. When you train a model, you adjust a set of parameters, sometimes millions of them. Your model will then learn to fit this data.
The thing with neural networks is that they may forget. It sounds bad, but it is actually what makes them really strong: they learn to forget what is useless.
That is, if you retrain you should probably:
- run just a few epochs, otherwise the model will overfit the new dataset and thus forget everything else
- learn on a bigger dataset, i.e. past + new data, to ensure that nothing is forgotten (see the sketch after this list)
- maybe use a larger setup (in terms of hidden layer size, or number of layers), since you cannot indefinitely hope to learn more with the same setup.
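As a rough sketch of the "few epochs on past + new data" idea, assuming a Keras-style model object named model that has already been trained, and hypothetical arrays X_old/y_old and X_new/y_new:

import numpy as np

X_combined = np.concatenate([X_old, X_new])  # past + new observations
y_combined = np.concatenate([y_old, y_new])

# continue training from the current weights; only a few epochs so the model
# does not overfit the new data and forget the rest
model.fit(X_combined, y_combined, epochs=3, batch_size=32)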
I'm no expert in online training, but it's not something you can expect without effort; it is in fact quite hard to do in practice. It's far from being the default behavior when you "just" continue training.
Hope it helps.
I am trying to do the following with weka's MultilayerPerceptron:
Train with a small subset of the training Instances for a portion of the epochs,
Train with the whole set of Instances for the remaining epochs.
However, when I do the following in my code, the network seems to reset itself to start with a clean slate the second time.
mlp.setTrainingTime(smallTrainingSetEpochs);
mlp.buildClassifier(smallTrainingSet);
mlp.setTrainingTime(wholeTrainingSetEpochs);
mlp.buildClassifier(wholeTrainingSet);
Am I doing something wrong, or is this the way that the algorithm is supposed to work in weka?
If you need more information to answer this question, please let me know. I am kind of new to programming with weka and am unsure as to what information would be helpful.
This thread on the weka mailing list is a question very similar to yours.
It seems that this is how weka's MultilayerPerceptron is supposed to work. It's designed to be a 'batch' learner; you are trying to use it incrementally. Only classifiers that implement weka.classifiers.UpdateableClassifier can be incrementally trained.