I'm currently using partial_fit with SGDClassifier to fit a model to predict the hashtags on images.
The problem I'm having is that SGDClassifier requires specifying classes upfront. This is ok to fit a model offline but I'd like to add new classes online when observing new hashtags. Currently, I need to retrain a new model from scratch to accommodate the new classes.
Is there a way to have SGDClassifier accept new classes without having to retrain a new model? Or would I be better off training a separate binary SGDClassifier for each hashtag?
Thanks
Hashtags are usually just tags, thus one object can have many of them. In such setting there is no multiclass scenario - and you should have just a single SGD binary classifier per tag. You can obviously fit more complex models taking into account reasoning between tags, but SGD is not duing so, thus using it in a provided setting does not make any more sense than just having N distinct classifiers.
Related
I have trained Gensim's WordToVec on a text corpus,converted it to DocToVec and then used cosine similarity to find the similarity between documents. I need to suggest similar documents. Now suppose among the top 5 suggestions for a particular document, we manually find that 3 of them are not similar.Can this feedback be incorporated in retraining the model?
It's not quite clear what you mean by "converted [a Word2Vec model] to DocToVec". The gensim Doc2Vec class doesn't use or require a Word2Vec model as input.
But, if you have many sets of hand-curated "this is a good suggestion" or "this is a bad suggestion" pairs for your corpus, you can use the model's scoring against all those to compare models, and train many variant models (with different model parameter values like size, window, min_count, sample, etc), picking the one that scores best on your tests.
That sort of automated-parameter-search is the most straightforward way to use performance on real evaluation data to adjust an unsupervised model like Word2Vec.
(Depending on the specifics of your data and problem-domain, you might also start to notice patterns in where the model is better or worse, that help you hand-tune parts of the data preprocessing. For example, a different handling of capitalization or tokenization might be suggested by error cases.)
How train_on_batch() is different from fit()? What are the cases when we should use train_on_batch()?
For this question, it's a simple answer from the primary author:
With fit_generator, you can use a generator for the validation data as
well. In general, I would recommend using fit_generator, but using
train_on_batch works fine too. These methods only exist for the sake of
convenience in different use cases, there is no "correct" method.
train_on_batch allows you to expressly update weights based on a collection of samples you provide, without regard to any fixed batch size. You would use this in cases when that is what you want: to train on an explicit collection of samples. You could use that approach to maintain your own iteration over multiple batches of a traditional training set but allowing fit or fit_generator to iterate batches for you is likely simpler.
One case when it might be nice to use train_on_batch is for updating a pre-trained model on a single new batch of samples. Suppose you've already trained and deployed a model, and sometime later you've received a new set of training samples previously never used. You could use train_on_batch to directly update the existing model only on those samples. Other methods can do this too, but it is rather explicit to use train_on_batch for this case.
Apart from special cases like this (either where you have some pedagogical reason to maintain your own cursor across different training batches, or else for some type of semi-online training update on a special batch), it is probably better to just always use fit (for data that fits in memory) or fit_generator (for streaming batches of data as a generator).
train_on_batch() gives you greater control of the state of the LSTM, for example, when using a stateful LSTM and controlling calls to model.reset_states() is needed. You may have multi-series data and need to reset the state after each series, which you can do with train_on_batch(), but if you used .fit() then the network would be trained on all the series of data without resetting the state. There's no right or wrong, it depends on what data you're using, and how you want the network to behave.
Train_on_batch will also see a performance increase over fit and fit generator if youre using large datasets and don't have easily serializable data (like high rank numpy arrays), to write to tfrecords.
In this case you can save the arrays as numpy files and load up smaller subsets of them (traina.npy, trainb.npy etc) in memory, when the whole set won't fit in memory. You can then use tf.data.Dataset.from_tensor_slices and then using train_on_batch with your subdataset, then loading up another dataset and calling train on batch again, etc, now you've trained on your entire set and can control exactly how much and what of your dataset trains your model. You can then define your own epochs, batch sizes, etc with simple loops and functions to grab from your dataset.
Indeed #nbro answer helps, just to add few more scenarios, lets say you are training some seq to seq model or a large network with one or more encoders. We can create custom training loops using train_on_batch and use a part of our data to validate on the encoder directly without using callbacks. Writing callbacks for a complex validation process could be difficult. There are several cases where we wish to train on batch.
Regards,
Karthick
From Keras - Model training APIs:
fit: Trains the model for a fixed number of epochs (iterations on a dataset).
train_on_batch: Runs a single gradient update on a single batch of data.
We can use it in GAN when we update the discriminator and generator using a batch of our training data set at a time. I saw Jason Brownlee used train_on_batch in on his tutorials (How to Develop a 1D Generative Adversarial Network From Scratch in Keras)
Tip for quick search: Type Control+F and type in the search box the term that you want to search (train_on_batch, for example).
This may sound like a naive question, but i am quite new on this. Let's say I use the Google pre-trained word2vector model (https://github.com/dav/word2vec) to train a classification model. I save my classification model. Now I load back the classification model into memory for testing new instances. Do I need to load the Google word2vector model again? Or is it only used for training my model?
It depends on how your corpuses and test examples are structured and pre-processed.
You are probably using the pre-trained word-vectors to turn text into numerical features. At first, text examples are vectorized to train the classifier. Later, other (test/production) text examples will be vectorized in the same, and presented to get the classifier to get its judgements.
So you will need to use the same text-to-vectors process for test/production text examples as was used during training. Perhaps you've done that in a separate earlier bulk step, in which case you already have the features in the vector form the classifier uses. But often your classifier pipeline will itself take raw text, and vectorize it – in which case it will need the same pre-trained (word)->(vector) mappings available at test time as were available during training.
Is there a way to penalize a feature in so that it doesn't dominate the model? (In Salford Predictive Modeller, there is a setting called "Penalties on Variables")
The situation is that I have one categorical feature which I want to include in the model, but I don't want to have as the most important feature, since then the model doesn't properly capture the variance explained by the other predictors.
I think you cannot do that. Although I don't really understand why you would want to do that, you could try the following:
Train a model on the whole dataset, train a separate model on the dataset after removing this feature. Then, combine the results of the two models (maybe simple averaging or stacking etc.)
I am using the classical SIFT - BOW - SVM for image classification. My classifiers are created using the 1vsAll paradigm.
Let's say that I currently have 100 classes.
Later, I would like to add new classes OR I would like to improve the recognition for some specific classes using additional training sets.
What would be the best approach to do it ?
Of course, the best way would be to re-execute every steps of the training stage.
But would it make any sense to only compute the additional (or modified) classes using the same vocabulary as the previous model, in order avoid to recompute a new vocabulary and train again ALL the classes ?
Thanks
In short - no. If you add new class it should be added to each of the "old" classifiers so "one vs. all" still makes sense. If you assume that new classes can appear with time consider using one class classifiers instead, such as one-class SVM. This way once you get new samples for a particular class you only retrain a particular model, or add a completely new one to the system.
Furthermore, for large number of classes, 1 vs all SVM works quite badly, and one-class approach is usually much better.