I'm trying to train a logistic regression model in Mahout. The command I use is this:
mahout trainlogistic --input /home/cloudera/Desktop/final.csv --output /home/cloudera/Desktop/model/model --target Action --predictors Open High Close --types word --features 20 --passes 100 --rate 50 --categories 2
The files I use actually exist. I'm reading a book that says that I should expect an output that looks like
Action ~ 647.186*Close+-44.975*High+3.269*Intercept term +-601.454*Open
and then a 4x2 matrix.
What I actually get is a terminal being filled with calculations, no Action ~, and a 5x4 matrix.
What am I doing wrong?
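The problem turned out to be the predictor type: my predictors are numeric, so they should be declared with --types numeric rather than --types word. I have no idea why the book I referenced declared them as words.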
I am new to Weka. I am trying to classify text documents produced by an OCR process. The training corpus contains 286 mortgage documents and 57 note documents. The test dataset contains 1-100 text pages. Each line of the training and test datasets contains a few paragraphs of text. After classification, each text document should be labelled correctly as either mortgage or note.
I apply StringToWordVector to the combined training and test datasets, with the class values of the test instances marked as missing, i.e. "?".
Steps are as follows:
Create the training ARFF file using the following command line:
java -cp weka.jar weka.core.converters.TextDirectoryLoader -dir <text directory>
This creates a training dataset with known classes, i.e. mortgage and note.
Create the test ARFF file with missing classes, i.e. "?".
Combine the training and test datasets.
Run the classifier with the following command line:
java -cp weka.jar weka.classifiers.meta.FilteredClassifier -t train.arff -T test.arff -F "weka.filters.MultiFilter -F weka.filters.unsupervised.attribute.StringToWordVector -F weka.filters.unsupervised.attribute.Standardize" -d trained.model -p 0
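As a rough sketch of the second step (a test file whose class values are all "?"), the following Python writes such an ARFF file. The attribute names text and class, and the helper names arff_quote and build_test_arff, are assumptions made for illustration. The header has to match whatever the training ARFF actually declares, otherwise Weka will treat the two files as incompatible.

```python
def arff_quote(text):
    """Quote a string value for ARFF, escaping characters that would break parsing."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("'", "\\'")
                   .replace("\n", "\\n")
                   .replace("\r", "\\r")
                   .replace("\t", "\\t"))
    return "'" + escaped + "'"


def build_test_arff(texts, class_labels, text_attr="text", class_attr="class"):
    """Return ARFF content with one string attribute and a missing ("?") class per row.

    text_attr and class_attr are placeholder names; copy the real ones
    from the header of the training ARFF.
    """
    lines = [
        "@relation test",
        "",
        "@attribute %s string" % text_attr,
        "@attribute %s {%s}" % (class_attr, ",".join(class_labels)),
        "",
        "@data",
    ]
    # Every test instance gets "?" in the class column.
    lines.extend("%s,?" % arff_quote(t) for t in texts)
    return "\n".join(lines) + "\n"


print(build_test_arff(["page one", "it's a note"], ["mortgage", "note"]))
```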
I have run this both from the Weka GUI and from the command line. The commands themselves execute without errors, but the predictions are not at all correct.
I have also tried running StringToWordVector separately and then testing NaiveBayes, NaiveBayesMultinomial, J48 and other multiclass classifiers on the dataset, but the predictions are still wrong.
Please help me get proper prediction results. Let me know whether the above steps are correct and whether I am doing anything wrong.
I am trying to figure out WEKA and perform some experiments with data that I have.
Basically, what I want to do is take Data Set 1 and use it as a training set to build a J48 decision tree. Then I want to take Data Set 2 and run the trained tree on it, producing the original data set with an extra column containing the prediction.
Then do the same thing again with the Bayes Neural Network.
Can someone point me to detailed instructions on how exactly I would accomplish this? I seem to be missing some steps and cannot get the output of the original data set with the extra column.
Here is one way to do it from the command line. This information is found in Chapter 1 ("A command-line primer") of the Weka manual that comes with the software.
java weka.classifiers.trees.J48 -t training_data.arff -T test_data.arff -p 1-N
where:
-t <training_data.arff> specifies the training data in ARFF format
-T <test_data.arff> specifies the test data in ARFF format
-p 1-N specifies that you want to output the feature vector and the prediction,
where N is the number of features in your feature vector.
For example, here I am using soybean.arff for both training and testing. There are 35 features in the feature vector:
java weka.classifiers.trees.J48 -t soybean.arff -T soybean.arff -p 1-35
The first few lines of the output look like:
=== Predictions on test data ===
inst# actual predicted error prediction (date,plant-stand,precip,temp,hail,crop-hist,area-damaged,severity,seed-tmt,germination,plant-growth,leaves,leafspots-halo,leafspots-marg,leafspot-size,leaf-shread,leaf-malf,leaf-mild,stem,lodging,stem-cankers,canker-lesion,fruiting-bodies,external-decay,mycelium,int-discolor,sclerotia,fruit-pods,fruit-spots,seed,mold-growth,seed-discolor,seed-size,shriveling,roots)
1 1:diaporth 1:diaporth 0.952 (october,normal,gt-norm,norm,yes,same-lst-yr,low-areas,pot-severe,none,90-100,abnorm,abnorm,absent,dna,dna,absent,absent,absent,abnorm,no,above-sec-nde,brown,present,firm-and-dry,absent,none,absent,norm,dna,norm,absent,absent,norm,absent,norm)
2 1:diaporth 1:diaporth 0.952 (august,normal,gt-norm,norm,yes,same-lst-two-yrs,scattered,severe,fungicide,80-89,abnorm,abnorm,absent,dna,dna,absent,absent,absent,abnorm,yes,above-sec-nde,brown,present,firm-and-dry,absent,none,absent,norm,dna,norm,absent,absent,norm,absent,norm)
The columns are: (1) data instance number; (2) ground truth label; (3) predicted label; (4) error; (5) prediction confidence; and (6) feature vector.
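To get from that output to "the original data plus one prediction column", the lines can be post-processed. Below is a minimal Python sketch that assumes the line layout shown above, that a misclassified line carries a "+" marker between the predicted label and the confidence, and that no feature value contains a comma. The function names parse_prediction_line and to_csv are made up for this example, the sample lines are shortened to three features, and the second sample line is hypothetical.

```python
import csv
import io


def parse_prediction_line(line):
    """Split one line of -p output into its columns."""
    head, _, tail = line.partition("(")
    tokens = head.split()
    tail = tail.rstrip()
    if tail.endswith(")"):
        tail = tail[:-1]
    return {
        "inst": int(tokens[0]),
        "actual": tokens[1].split(":", 1)[1],      # drop the "1:" index prefix
        "predicted": tokens[2].split(":", 1)[1],
        "error": "+" in tokens[3:-1],              # "+" only appears on wrong predictions
        "confidence": float(tokens[-1]),
        "features": tail.split(","),
    }


def to_csv(lines):
    """Return CSV text: the feature vector followed by the predicted label."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        row = parse_prediction_line(line)
        writer.writerow(row["features"] + [row["predicted"]])
    return out.getvalue()


sample = [
    "1 1:diaporth 1:diaporth 0.952 (october,normal,gt-norm)",
    "2 2:charcoal 1:diaporth + 0.6 (august,normal,norm)",  # hypothetical error line
]
print(to_csv(sample))
```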
I am trying to solve a numeric prediction problem with numeric attributes in WEKA using linear regression, and then I want to test my model on the existing dataset with "Re-evaluate model on current test dataset".
As a result of the evaluation I am getting the summary:
Correlation coefficient 0.9924
Mean absolute error 1.1017
Root mean squared error 1.2445
Total Number of Instances 17
But I don't have results as it is shown here: http://weka.wikispaces.com/Making+predictions
How can I get WEKA to produce the output I need?
Thank you.
To answer my own question: once the model has been trained and tested, right-click the model in the result list and choose "Visualize classifier errors". In the window that opens, use the Save option to save the actual and predicted values.
Are you using the command line interface (CLI) or the GUI?
If you are using the CLI, the command given in the above link works fine:
java weka.classifiers.trees.J48 -T unclassified.arff -l j48.model -p 0
So when you train the model you save it as *.model (j48.model) and later use it to evaluate on test data (unclassified.arff)
I am trying the libSVM package, playing with RBF and Linear classification, and I followed (I think) all recommendations in their README files.
I have a big file to train on (70K), so I am trying to use liblinear instead of RBF.
The only problem is that I am unable to get the model after the training phase. My command line looks like this:
./train -c 4 -v 5 -s 6 TrainingSet.scal TrainingSet.scal.Model
After the training is done, I have the accuracy estimation but then when I look at the *.model file to use it against my test set, I simply don't find it.
Do you think it is a bug in the package, or is there something I am missing here?
Thanks
Rad
Option -v 5 means that you are doing 5-fold cross-validation on the training set. If this option is enabled, liblinear estimates the error using 5-fold cross-validation and doesn't output a model.
If you want to output a model, don't use -v 5. It doesn't output the training error in that case, but you can use liblinear-predict to estimate the error on a test set.
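The difference between the two modes can be sketched in a few lines of Python. This is not liblinear: the "model" here is a toy one-dimensional threshold, and the names train, predict and cross_validate are invented for the illustration. The point is only that cross-validation builds several temporary models to produce an accuracy estimate and then throws them away, while a plain training run returns one model that can be saved and used for prediction.

```python
def train(xs, ys):
    """Toy 'model': the midpoint between the two class means (a 1-D threshold)."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0


def predict(threshold, x):
    """Assign class 1 to values above the threshold, class 0 otherwise."""
    return 1 if x > threshold else 0


def cross_validate(xs, ys, k):
    """Return the k-fold accuracy estimate; the k temporary models are discarded."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        model = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)


xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10

accuracy = cross_validate(xs, ys, k=5)  # comparable to -v 5: an estimate only
final_model = train(xs, ys)             # comparable to omitting -v: a usable model
print(accuracy, final_model)
```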
I normally use the library directly from code, but I think in your case the training is not being performed because you are using the option -s 6, which I think is undefined.
This is the usage:
-s svm_type : set type of SVM (default 0)
0 -- C-SVC (multi-class classification)
1 -- nu-SVC (multi-class classification)
2 -- one-class SVM
3 -- epsilon-SVR (regression)
4 -- nu-SVR (regression)
You are also omitting the kernel type:
-t kernel_type : set type of kernel function (default 2)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
4 -- precomputed kernel (kernel values in training_set_file)
Hopefully that does the trick.