How to code Naïve Bayes using Information Gain (IG) - machine-learning

I read in a paper that Naive Bayes using IG is the best model for text classification where the dataset is small and has few positives. However, I'm not sure how to code this specific model in Python. Would this be done using TF or scikit-learn and then adjusting a parameter?
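Since the paper's exact setup isn't given, one reasonable reading (an assumption on my part) is Naive Bayes combined with information-gain feature selection. In scikit-learn, `mutual_info_classif` computes mutual information, which for discrete bag-of-words features is the same quantity as information gain, so a minimal sketch looks like this (the toy corpus and `k=5` are purely illustrative):

```python
# Sketch: multinomial Naive Bayes with information-gain-style feature
# selection in scikit-learn. Corpus, labels and k are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "spam offer money now",
    "cheap money offer",
    "meeting agenda notes",
    "project meeting notes",
]
labels = [1, 1, 0, 0]  # 1 = positive (spam), 0 = negative

pipeline = make_pipeline(
    CountVectorizer(),                      # bag-of-words counts
    SelectKBest(mutual_info_classif, k=5),  # keep the 5 most informative words
    MultinomialNB(),                        # Naive Bayes on the selected features
)
pipeline.fit(texts, labels)
print(pipeline.predict(["money offer today"]))
```

So rather than "adjusting a parameter", the feature selection is a separate pipeline step placed before the classifier.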

Related

Supervised training and testing in Gensim's FastText implementation

I am currently training a Gensim FastText model on documents from a certain domain, using Gensim's unsupervised training method.
After this training of the word representations, I would like to train on a set of sentence+label lines and ultimately test the model, returning precision and recall values, as is possible in Facebook's fastText implementation via train_supervised + test. Does Gensim's implementation support supervised training and testing? I couldn't get it to work / find the required methods.
Any help is much appreciated.
Gensim's FastText implementation has so far chosen not to support the same supervised mode as Facebook's original fastText, where known labels can be used to drive the training of word vectors, because Gensim sees its focus as unsupervised topic-modeling techniques.

In Mahout, is there any method for data classification with Naive Bayes?

I am still a newbie at using Mahout, and am currently studying Naive Bayes for data classification.
As far as I know, Mahout has two related programs: trainnb, which trains a Bayes model, and testnb, which evaluates the model. Under the current implementation of Mahout, is there a way to apply the model to classify new data with just a simple command? Or do I need to code an implementation from scratch in Java (e.g. use the model as a base to calculate the likelihood of each possibility, then compute and return the class with the highest value)?

How to use different dataset for scikit and NLTK?

I am trying to use the built-in naive Bayes classifiers of scikit-learn and NLTK on raw data I have. The data is a set of tab-separated rows, each with a label, a paragraph, and some other attributes.
I am interested in classifying the paragraphs.
I need to convert this data into a format suitable for the built-in classifiers of scikit-learn/NLTK.
I want to apply Gaussian, Bernoulli, and multinomial naive Bayes to all paragraphs.
Question 1:
For scikit-learn, the example given imports the iris data. I checked the iris data; it has precalculated numeric values from the data set. How can I convert my data into such a format and directly call the Gaussian function? Is there a standard way of doing so?
Question 2:
For NLTK,
What should the input to the NaiveBayesClassifier.classify function be? Is it a dict with boolean values? How can it be made multinomial or Gaussian?
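For Question 1, a minimal sketch of the conversion (the tab-separated column layout, label then paragraph, is my assumption; swap in your real file reading). The key point is that a vectorizer turns paragraphs into the same kind of numeric matrix the iris example uses, and that GaussianNB, unlike the other two, needs a dense array:

```python
# Sketch: converting tab-separated label/paragraph rows into scikit-learn's
# expected format. The example rows and column order are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rows = [
    "sports\tthe team won the final match",
    "tech\tthe new phone has a fast processor",
    "sports\tthe player scored a late goal",
]
labels, paragraphs = zip(*(r.split("\t", 1) for r in rows))

vec = CountVectorizer()
X = vec.fit_transform(paragraphs)            # sparse count matrix, analogous to iris's numeric array

mnb = MultinomialNB().fit(X, labels)         # works directly on sparse counts
bnb = BernoulliNB().fit(X, labels)           # binarizes the counts internally
gnb = GaussianNB().fit(X.toarray(), labels)  # GaussianNB requires a dense array
print(mnb.predict(vec.transform(["a late goal"])))
```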
# question 2:
nltk.NaiveBayesClassifier.classify expects a so-called 'featureset'. A featureset is a dictionary with feature names as keys and feature values as values, e.g. {'word1': True, 'word2': True, 'word3': False}. NLTK's naive Bayes classifier cannot be used as a multinomial approach. However, you can install scikit-learn and use the nltk.classify.scikitlearn wrapper module to deploy scikit-learn's multinomial classifier.
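A minimal illustration of both points, using toy featuresets of my own (the words and labels are assumptions, not from the question's data):

```python
# Sketch: NLTK featuresets, plus the scikit-learn wrapper for a
# multinomial model. Training data here is a toy assumption.
import nltk
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Each training item is (featureset dict, label).
train = [
    ({"money": True, "offer": True}, "spam"),
    ({"meeting": True, "notes": True}, "ham"),
]

nb = nltk.NaiveBayesClassifier.train(train)
print(nb.classify({"money": True, "offer": True}))   # plain NLTK naive Bayes

multi = SklearnClassifier(MultinomialNB()).train(train)
print(multi.classify({"meeting": True, "notes": True}))  # multinomial NB via the wrapper
```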

Naive Bayes classifier in Data Mining

I Googled a lot to find an answer, then thought this would be the place where someone could resolve my doubt.
A classification algorithm has a model part and a prediction part.
Normally, while testing, we have an accuracy rate.
Likewise, is there any accuracy rate/confidence for the model in the Naive Bayes algorithm?
Evaluation is (usually) not part of the classifier.
It's something you do separately, to evaluate whether you did a good job or not.
If you classify your test data using naive Bayes, you can perform exactly the same kind of evaluation as with other classifiers!
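To make the answer concrete: in scikit-learn, a trained naive Bayes model is evaluated exactly like any other classifier, and `predict_proba` gives a per-prediction confidence score. The iris data is used here only as a stand-in illustration:

```python
# Sketch: evaluating a Naive Bayes model like any other classifier.
# The iris dataset and 70/30 split are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)

# Accuracy: same evaluation you would apply to any classifier.
print("accuracy:", accuracy_score(y_te, nb.predict(X_te)))
# Per-instance confidence: highest class posterior probability.
print("confidence of first test row:", nb.predict_proba(X_te[:1]).max())
```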

How do I update a trained model (weka.classifiers.functions.MultilayerPerceptron) with new training data in Weka?

I would like to load a model I trained before and then update this model with new training data. But I found this task hard to accomplish.
I have learnt from Weka Wiki that
Classifiers implementing the weka.classifiers.UpdateableClassifier interface can be trained incrementally.
However, the regression model I trained is using weka.classifiers.functions.MultilayerPerceptron classifier which does not implement UpdateableClassifier.
Then I checked the Weka API and it turns out that no regression classifier implements UpdateableClassifier.
How can I train a regression model in Weka, and then update the model later with new training data after loading the model?
I have some data mining experience in Weka as well as in scikit-learn and R, and updatable regression models do not exist in Weka or scikit-learn as far as I know. Some R libraries, however, do support updating regression models (take a look at this linear regression example: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/update.html), so if you are free to switch data mining tools this might help you out.
If you need to stick to Weka, then I'm afraid you would probably need to implement such a model yourself, but since I'm not a complete Weka expert, please check with the folks on the Weka mailing list (http://weka.wikispaces.com/Weka+Mailing+List).
The SGD classifier implementation in Weka supports multiple loss functions. Among them are two loss functions meant for linear regression, viz. the epsilon-insensitive and Huber loss functions.
Therefore, one can use a linear regression model trained with SGD, as long as either of these two loss functions is used to minimize training error.
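As an aside (this is a scikit-learn analogue of the Weka setup described above, not Weka code): scikit-learn's `SGDRegressor` exposes the same two losses, and its `partial_fit` method supports the kind of incremental updating with new data that the question asks about. The synthetic data here is purely illustrative:

```python
# Sketch: SGD-trained linear regression with epsilon-insensitive and Huber
# losses, updated incrementally via partial_fit. Data is synthetic.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

for loss in ("epsilon_insensitive", "huber"):
    reg = SGDRegressor(loss=loss, max_iter=1000, tol=1e-3, random_state=0)
    reg.fit(X[:150], y[:150])           # initial training
    reg.partial_fit(X[150:], y[150:])   # later update with new data
    print(loss, "R^2:", round(reg.score(X, y), 3))
```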
