Saving an ML pipeline as pickle in Python - machine-learning

Let's say I have an ML pipeline for text classification. In this pipeline I'm using a TF-IDF vectorizer and an SVM from sklearn. After training the model, do I need to save both the fitted TF-IDF vectorizer and the trained SVM model as a pickle, or only the trained SVM?
I think the data to be classified must go through the same pipeline that I used during training, is that right?
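One way to guarantee both pieces are saved together is to wrap the vectorizer and the SVM in a single sklearn Pipeline and pickle that whole object. A minimal sketch (the file name and toy texts are made up for illustration):

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Bundling the vectorizer and the SVM in one Pipeline means a single
# pickle captures both, so new data is transformed exactly as in training.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Toy training data (made up for illustration).
texts = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]
pipe.fit(texts, labels)

# Save the whole fitted pipeline, not just the SVM.
with open("text_clf.pkl", "wb") as f:
    pickle.dump(pipe, f)

# Later, in another process: load and classify raw text directly.
with open("text_clf.pkl", "rb") as f:
    clf = pickle.load(f)
print(clf.predict(["great movie"]))
```

Saving only the SVM would not be enough: without the fitted vectorizer (its learned vocabulary and IDF weights), new text cannot be mapped into the same feature space.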

Related

pytorch class weights for multi class classification

I am using class weights for multiclass classification, computed with sklearn's compute_class_weight function, and PyTorch for training the model. To compute the class weights, do we need to use all the data (training, validation and test) or only the training set? Thanks
When training your model you can only assume that the training data is available to you.
Estimating the class_weights is part of the training -- it defines your loss function.
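Computing the weights from the training split only might look like this (a sketch using sklearn's compute_class_weight; the label array is made up, and the resulting weights would typically be passed to e.g. torch.nn.CrossEntropyLoss as its weight tensor):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced *training* labels only -- validation/test stay untouched.
y_train = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])

classes = np.unique(y_train)
weights = compute_class_weight("balanced", classes=classes, y=y_train)
# "balanced" gives n_samples / (n_classes * count(class)),
# so rarer classes get larger weights.
print(dict(zip(classes, weights)))

# In PyTorch these would then be used roughly as:
#   criterion = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor(weights, dtype=torch.float))
```

Because the weights enter the loss function, estimating them from validation or test data would leak information from splits that are supposed to measure generalization.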

How to perform incremental training of large data set using (scikit) Adaboost classifier?

I have a large training dataset, so in order to fit it into the AdaBoost classifier, I would like to do incremental training.
Just as xgb has a parameter called xgb_model to use a previously trained xgb model for further fitting on new data, I am looking for such a parameter in the AdaBoost classifier.
Currently, I am trying to use the fit function to iteratively train the model, but it seems my classifier is not using the previous weights. How can I solve this?
It's not possible out-of-the-box. sklearn supports incremental/online training in some estimators, but not AdaBoostClassifier.
The estimators that support incremental training are listed here and have a special method named partial_fit().
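For example, with an estimator that does support it, training proceeds batch by batch (a sketch with SGDClassifier and synthetic data; the classes argument is required on the first call because later batches may not contain every class):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)

classes = np.array([0, 1])
for batch in range(5):
    # Each batch simulates one chunk of a dataset too large for memory.
    X = rng.randn(100, 3)
    y = (X[:, 0] > 0).astype(int)
    # partial_fit updates the existing weights instead of refitting
    # from scratch; classes is needed so all labels are known up front.
    clf.partial_fit(X, y, classes=classes)

print(clf.predict([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0]]))
```

By contrast, calling fit() repeatedly, as in the question, discards the previous model on every call, which is why the earlier weights appear to be ignored.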
See gaenari; it is a C++ incremental decision tree.
It supports:
insert(csv), update(), and rebuild()
report()

Pipeline for py-faster-rcnn on custom datasets with VGG16 on caffe

I would like to take an existing VGG16 model trained on ImageNet and fine-tune it with a custom dataset for some other classes that I need.
Where can I find the caffemodel, train_val.prototxt and solver.prototxt for this?
To fine-tune it with the custom dataset, is the procedure the same as in Fine Tuning of GoogLeNet Model and A guide to convert_imageset.cpp?
However, I want to use the newly-trained weights of the VGG16 model to train a faster RCNN (py-faster-rcnn) https://github.com/rbgirshick/py-faster-rcnn on a custom dataset.
For training faster RCNN on a custom dataset, I was planning on following the steps given here http://sgsai.blogspot.com/2016/02/training-faster-r-cnn-on-custom-dataset.html
Will the caffemodel generated from the VGG16 fine-tuning done earlier work here, or do some tweaks need to be made?

how to export trained vectors to SVM in opencv

I am new to OpenCV. I want to use SVM in OpenCV. My question is: can I train the classifier once and save all the support vectors, so that in my main program I just need to import these vectors and do the classification? I read the SVM documentation and I only found the get_support_vector function to retrieve the vectors, but I didn't find a set_support_vector function. Does anybody have an idea how to re-use a trained classifier? Thanks.

How do I update a trained model (weka.classifiers.functions.MultilayerPerceptron) with new training data in Weka?

I would like to load a model I trained before and then update this model with new training data. But I found this task hard to accomplish.
I have learnt from Weka Wiki that
Classifiers implementing the weka.classifiers.UpdateableClassifier interface can be trained incrementally.
However, the regression model I trained is using weka.classifiers.functions.MultilayerPerceptron classifier which does not implement UpdateableClassifier.
Then I checked the Weka API and it turns out that no regression classifier implements UpdateableClassifier.
How can I train a regression model in Weka, and then update the model later with new training data after loading the model?
I have some data mining experience in Weka as well as in scikit-learn and R, and updatable regression models do not exist in Weka or scikit-learn as far as I know. Some R libraries, however, do support updating regression models (take a look at this linear regression example: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.html), so if you are free to switch data mining tools this might help you out.
If you need to stick to Weka then I'm afraid you would probably need to implement such a model yourself, but since I'm not a complete Weka expert, please check with the folks on the Weka mailing list (http://weka.wikispaces.com/Weka+Mailing+List).
The SGD classifier implementation in Weka supports multiple loss functions. Among them are two loss functions meant for linear regression, viz. the epsilon-insensitive and Huber loss functions.
Therefore one can use a linear regression model trained with SGD, as long as either of these two loss functions is used to minimize the training error.
