How to load & use a WEKA model in Ruby?

I'm writing a Rails app that helps the user run sentiment analysis on text.
After using the WEKA GUI, I have the model file as output.
I would like to know how to load the WEKA model file in Ruby and get a prediction for a specific piece of data (a string).
Thanks for the help.

Related

How to create an incremental NER training model (appending to an existing model)?

I am training a customized Named Entity Recognition (NER) model using Stanford NLP, but I want to be able to re-train the model.
Example:
Suppose I trained an xyz model and then test it on some text. If the model detects something wrong, I (the end user) will correct it and want to re-train the model (in append mode) on the corrected text.
Stanford doesn't provide a re-training facility, so I shifted to the spaCy library for Python, where I can re-train the model, meaning I can append new entities to the existing model. But after re-training the model with spaCy, it overrides the existing knowledge (the existing training data in it) and only shows results related to the most recent training.
Consider: I trained a model on the TECHNOLOGY tag using 1000 records. After that, let's say I added one more entity, BOOK_NAME, to the existing trained model. If I then test the model, the spaCy model only detects BOOK_NAME in the text.
Please suggest how to tackle this problem.
Thanks in advance!
I think it is a bit late to address this here. The issue you are facing is also called the 'catastrophic forgetting' problem. You can get over it by also training on examples of what the model already knows. For instance, spaCy predicts well on well-formed text such as the BBC corpus. You can choose such a corpus, run spaCy's pretrained model on it, and use the predictions as training examples. Mix these examples with your new examples and then train. You should now get better results. This has already been mentioned in the spaCy issues.
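The mixing step described in this answer can be sketched in plain Python. This is only an illustration under assumptions: `predict_entities` is a hypothetical stand-in for whatever the existing model returns (it is not spaCy's API), and the `(text, entities)` format and the name `build_rehearsal_set` are made up for this example.

```python
import random
from typing import Callable, List, Tuple

# (start, end, label) character span; this format is an assumption for the sketch.
Entity = Tuple[int, int, str]
Example = Tuple[str, List[Entity]]


def build_rehearsal_set(
    new_examples: List[Example],
    raw_texts: List[str],
    predict_entities: Callable[[str], List[Entity]],
    seed: int = 0,
) -> List[Example]:
    """Mix new annotations with examples labelled by the existing model."""
    # Let the existing model label well-formed text so that its current
    # knowledge is represented in the next training round.
    rehearsal = [(text, predict_entities(text)) for text in raw_texts]
    mixed = list(new_examples) + rehearsal
    # Shuffle so old and new examples are interleaved during training.
    random.Random(seed).shuffle(mixed)
    return mixed


def toy_predict(text: str) -> List[Entity]:
    """Hypothetical stand-in for the already trained model: tags 'Python'."""
    start = text.find("Python")
    return [(start, start + 6, "TECHNOLOGY")] if start >= 0 else []


new_data = [("I am reading Dune", [(13, 17, "BOOK_NAME")])]
old_texts = ["Python is popular", "No entity here"]
training_set = build_rehearsal_set(new_data, old_texts, toy_predict)
```

The resulting `training_set` contains both the new BOOK_NAME example and the TECHNOLOGY examples produced by the old model, so a training run over it sees both entity types.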

Do I still need to load word2vec model at model testing?

This may sound like a naive question, but I am quite new to this. Let's say I use the Google pre-trained word2vec model (https://github.com/dav/word2vec) to train a classification model. I save my classification model. Now I load the classification model back into memory for testing new instances. Do I need to load the Google word2vec model again? Or is it only used for training my model?
It depends on how your corpora and test examples are structured and pre-processed.
You are probably using the pre-trained word vectors to turn text into numerical features. At first, text examples are vectorized to train the classifier. Later, other (test/production) text examples will be vectorized in the same way and presented to the classifier to get its judgements.
So you will need to use the same text-to-vectors process for test/production text examples as was used during training. Perhaps you've done that in a separate earlier bulk step, in which case you already have the features in the vector form the classifier uses. But often your classifier pipeline will itself take raw text, and vectorize it – in which case it will need the same pre-trained (word)->(vector) mappings available at test time as were available during training.
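A minimal sketch of why the mapping is needed again at test time. Everything here is a toy assumption: `WORD_VECTORS` stands in for the real pre-trained vectors, and averaging word vectors is just one possible way to turn a text into features.

```python
from typing import Dict, List

# Toy stand-in for pre-trained word vectors; real ones would be loaded from disk.
WORD_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}


def vectorize(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


# Training time: the classifier's features come from the word vectors.
train_features = [vectorize(t, WORD_VECTORS) for t in ["good movie", "bad movie"]]

# Test time: the saved classifier expects the same kind of features,
# so the same word-to-vector mapping has to be available again.
test_features = vectorize("Good film", WORD_VECTORS)
```

If the test features were instead computed and stored in an earlier bulk step, only those stored vectors would be needed, not the mapping itself.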

How can WEKA in Java classify a single sample and print the result rather than reading ARFF files?

I read the unclassified samples line by line and want to append the classification result to the end of each line. I have already loaded the WEKA model file. Can I use WEKA in Java to classify a single sample rather than reading ARFF files?
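Yes. You could use Evaluation.evaluationForSingleInstance to evaluate a single instance. If you only need the predicted label, Classifier.classifyInstance is another option: it takes one Instance and returns the index of the predicted class value.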

How to output resultant documents from Weka text-classification

So we are running a multinomial naive Bayes classification algorithm on a set of 15k tweets. We first break up each tweet into a vector of word features using Weka's StringToWordVector filter. We then save the results to a new ARFF file to use as our training set. We repeat this process with another set of 5k tweets and evaluate that test set using the same model derived from our training set.
What we would like to do is output each sentence that Weka classified in the test set along with its classification. We can see the general information (precision, recall, F-score) about the performance and accuracy of the algorithm, but we cannot see the individual sentences that were classified by Weka based on our classifier. Is there any way to do this?
Another problem is that ultimately our professor will give us 20k more tweets and expect us to classify this new document. We are not sure how to do this, however, because all of the data we have been working with has been classified manually, both the training and test sets, whereas the data we will be getting from the professor will be UNclassified. How can we re-evaluate our model on the unclassified data if Weka requires that the attribute information be the same as in the set used to form the model and the test set we are evaluating against?
Thanks for any help!
The easiest way to accomplish these tasks is to use a FilteredClassifier. This kind of classifier integrates a Filter and a Classifier, so you can connect a StringToWordVector filter with the classifier you prefer (J48, NaiveBayes, whatever). You always keep the original training set (unprocessed text), and you apply the classifier to new (unprocessed) tweets using the vocabulary derived by the StringToWordVector filter.
You can see how to do this in the command line in "Command Line Functions for Text Mining in WEKA" and via a program in "A Simple Text Classifier in Java with WEKA".
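The idea of bundling a filter with a classifier can be illustrated with a toy sketch in plain Python. This is not WEKA's API: the classes `WordCountFilter`, `OverlapClassifier`, and `FilteredTextClassifier` are hypothetical and deliberately simplistic, and only show how a vocabulary learned from the training text is reused for new, unlabelled text.

```python
from collections import Counter
from typing import Dict, List


class WordCountFilter:
    """Toy bag-of-words filter: learns a vocabulary, then maps text to counts."""

    def fit(self, texts: List[str]) -> None:
        self.vocabulary = sorted({w for t in texts for w in t.lower().split()})

    def transform(self, text: str) -> List[int]:
        counts = Counter(text.lower().split())
        # Words outside the training vocabulary are ignored.
        return [counts[w] for w in self.vocabulary]


class OverlapClassifier:
    """Toy classifier: picks the label whose summed training counts overlap most."""

    def fit(self, rows: List[List[int]], labels: List[str]) -> None:
        self.totals: Dict[str, List[int]] = {}
        for row, label in zip(rows, labels):
            current = self.totals.setdefault(label, [0] * len(row))
            self.totals[label] = [a + b for a, b in zip(current, row)]

    def predict(self, row: List[int]) -> str:
        def score(label: str) -> int:
            return sum(a * b for a, b in zip(self.totals[label], row))

        return max(sorted(self.totals), key=score)


class FilteredTextClassifier:
    """Bundles filter and classifier so both always see the same vocabulary."""

    def __init__(self) -> None:
        self.text_filter = WordCountFilter()
        self.classifier = OverlapClassifier()

    def fit(self, texts: List[str], labels: List[str]) -> None:
        self.text_filter.fit(texts)
        rows = [self.text_filter.transform(t) for t in texts]
        self.classifier.fit(rows, labels)

    def predict(self, text: str) -> str:
        return self.classifier.predict(self.text_filter.transform(text))


model = FilteredTextClassifier()
model.fit(["great happy day", "awful sad day"], ["pos", "neg"])

# New tweets are raw, unlabelled text; each is paired with its predicted label.
new_tweets = ["such a happy morning", "sad and awful news"]
predictions = [(tweet, model.predict(tweet)) for tweet in new_tweets]
```

Because the wrapper takes raw text as input, each original sentence is still available next to its prediction, and new unlabelled tweets need no separate pre-processing step.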

Classifying instances of a set with a Classification Model on WEKA GUI

I am new to data mining and I would like to ask you a classification question.
I have trained a classification algorithm in WEKA (GUI), using a training set in ARFF format. I then saved it in model format for future use.
Now I want to use this classification model in WEKA (GUI) to get the predicted class of the instances of a set that is also in ARFF format. Could you please give me instructions on how to do this in WEKA? Unlike the Weka Java API, the GUI version has really poor documentation on the Web and I couldn't find anything relevant.
Is it possible to store the classified set back in ARFF format, with each '?' in the class attribute replaced by the predicted class label? I need such output files for some computations.
Thank you in advance.
