I am trying to use Weka to classify a dataset with logistic regression, but the Logistic option is unavailable even though I use only numeric values for the attributes and a nominal class (other main classifiers such as NaiveBayes, J48, etc. are also unavailable). My ARFF file is:
@RELATION data_weka
@ATTRIBUTE class {1,0}
@ATTRIBUTE 1 NUMERIC
.
.
.
@ATTRIBUTE 30 NUMERIC
@DATA
1,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
.
.
.
The dataset contains 562 examples.
Can anyone help me please?
In your file, the class attribute is not the last attribute. Did you change the class attribute in the Preprocess editor (right-click to see that menu)?
By default, Weka assumes the class attribute is the last attribute in the file. Your last attribute (30) is numeric, so Weka won't let you run logistic regression on it.
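The same fix can be applied programmatically. A minimal sketch, assuming the Weka jar is on the classpath and using the file name from the question: load the ARFF, point the class index at the first attribute instead of relying on the last-attribute default, then build the classifier.

```java
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RunLogistic {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data_weka.arff");

        // The class is the FIRST attribute in this file, so set it
        // explicitly; otherwise Weka defaults to the last attribute.
        data.setClassIndex(0);

        Logistic logistic = new Logistic();
        logistic.buildClassifier(data);
        System.out.println(logistic);
    }
}
```

In the Explorer GUI, the equivalent is the drop-down box above the Start button on the Classify tab, where you can pick `class` instead of attribute 30.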
I have a course project that I need to finish. I'm using Weka 3.8 and I need to classify text; the result needs to be as accurate as possible. We received a train and a test .arff file. We train on the train file, of course, and then let the model classify the test file. The professor uploaded a 100% accurate classification of the test file; we upload our own results and then the system compares the two files. So far I've been using a FilteredClassifier composed of SMO and StringToWordVector with a Snowball stemmer, but for some reason I can't get better accuracy than 65.9% (this is not the split accuracy, but the one I get when the system compares my results to the 100% accurate one). I can't figure out why.
The train.arff file:
@relation train
@attribute index numeric
@attribute ingredients string
@attribute cuisine {greek,southern_us,filipino,indian,jamaican,spanish,italian,mexican,chinese,british,thai,vietnamese,cajun_creole,brazilian,french,japanese,irish,korean,moroccan,russian}
@data
0,'romaine lettuce;black olives;grape tomatoes;garlic;pepper;purple onion;seasoning;garbanzo beans;feta cheese crumbles',greek
1,'plain flour;ground pepper;salt;tomatoes;ground black pepper;thyme;eggs;green tomatoes;yellow corn meal;milk;vegetable oil',southern_us
2,'eggs;pepper;salt;mayonaise;cooking oil;green chilies;grilled chicken breasts;garlic powder;yellow onion;soy sauce;butter;chicken livers',filipino
3,'water;vegetable oil;wheat;salt',indian
...
and 4995 more lines like these.
The test.arff is similar to this:
@relation test
@attribute index numeric
@attribute ingredients string
@attribute cuisine {greek,southern_us,filipino,indian,jamaican,spanish,italian,mexican,chinese,british,thai,vietnamese,cajun_creole,brazilian,french,japanese,irish,korean,moroccan,russian}
@data
0,'white vinegar;sesame seeds;english cucumber;sugar;extract;Korean chile flakes;shallots;garlic cloves;pepper;salt',?
1,'eggplant;fresh parsley;white vinegar;salt;extra-virgin olive oil;onions;tomatoes;feta cheese crumbles',?
... and 4337 more lines, like these.
This is my Weka configuration:
He told us that in the .arff file some ingredients in the @data section are accidentally separated with ',' instead of ';', and that there are words that occur frequently and might not help much. I don't know if this is important or not. Is there any way I could improve the classification accuracy? Am I even using the right classifier for the job? Thanks in advance!
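One concrete thing to try, given that ingredients are separated with ';' (plus the stray ',' the professor mentioned), is to make the tokenizer split on both delimiters so each whole ingredient becomes a single token, and to enable TF/IDF weighting to down-weight the very frequent words. A sketch, assuming Weka 3.8 on the classpath; the option values are illustrative, not tuned:

```java
import weka.classifiers.functions.SMO;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.tokenizers.WordTokenizer;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class CuisineClassifier {
    public static FilteredClassifier build() {
        // Split on ';' and on the accidental ',', so multi-word
        // ingredients like "feta cheese crumbles" stay one token.
        WordTokenizer tokenizer = new WordTokenizer();
        tokenizer.setDelimiters(";,");

        StringToWordVector filter = new StringToWordVector();
        filter.setTokenizer(tokenizer);
        filter.setLowerCaseTokens(true);
        filter.setWordsToKeep(2000);   // illustrative; worth tuning
        filter.setTFTransform(true);   // TF/IDF reduces the weight of
        filter.setIDFTransform(true);  // words that occur everywhere

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(filter);
        fc.setClassifier(new SMO());
        return fc;
    }
}
```

Also make sure the `index` attribute is excluded from the model (e.g. with the Remove filter), since it carries no information about the cuisine.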
I use Weka for text classification: I have a train set and an untagged test set, and the goal is to classify the test set.
In Weka 3.6.6 everything goes well; I can select Supplied test set, train the model, and get results.
On the same files, Weka 3.7.10 says:
Train and test set are not compatible. Would you like to automatically wrap the classifier in "InputMappedClassifier" before proceeding?
And when I press No, it outputs the following error message:
Problem evaluating classifier: Train and test are not compatible. Class index differs: 2 != 0
I understand that the key is "Class index differs: 2 != 0".
But what does it mean? Why does it work in Weka 3.6.6 but report an incompatibility in Weka 3.7.10?
How can I make the test set compatible with the train set?
When you import the supplied test set, are you selecting the same class attribute as the one you use in the train set? If you don't change this field, Weka automatically selects the last attribute as the class.
I have code to create a decision tree from a data set; I am using the weather data set from the Weka examples. How can I generate the rules from the decision tree in Java?
Data set:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
You can get decision rules from a tree by following the path to each leaf and connecting the conditions on the junctions with "and". That is, for each leaf you would end up with one rule that tells you what conditions must be met to get to that leaf.
Instead of training a tree, though, it might be easier to train a set of decision rules directly, e.g. with the DecisionTable classifier.
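The path-to-leaf idea can be sketched in plain Java with no Weka dependency. The tree below is the one J48 learns on this weather data, hand-coded here for illustration; the traversal collects one rule per leaf, joining the conditions along the path with "and":

```java
import java.util.ArrayList;
import java.util.List;

public class TreeRules {
    // A minimal decision-tree node: either a leaf holding a class label,
    // or an internal node with one condition per child branch.
    static class Node {
        String label;          // non-null only for leaves
        String[] conditions;   // condition on the edge into each child
        Node[] children;
        Node(String label) { this.label = label; }
        Node(String[] conditions, Node[] children) {
            this.conditions = conditions;
            this.children = children;
        }
    }

    // Follow every root-to-leaf path, connecting the conditions with "and".
    static List<String> rules(Node node, String path) {
        List<String> out = new ArrayList<>();
        if (node.label != null) {
            out.add("IF " + path + " THEN play = " + node.label);
        } else {
            for (int i = 0; i < node.children.length; i++) {
                String next = path.isEmpty()
                        ? node.conditions[i]
                        : path + " and " + node.conditions[i];
                out.addAll(rules(node.children[i], next));
            }
        }
        return out;
    }

    // The tree J48 learns on the weather data, hand-coded for illustration.
    static Node weatherTree() {
        return new Node(
            new String[]{"outlook = sunny", "outlook = overcast", "outlook = rainy"},
            new Node[]{
                new Node(new String[]{"humidity <= 75", "humidity > 75"},
                         new Node[]{new Node("yes"), new Node("no")}),
                new Node("yes"),
                new Node(new String[]{"windy = TRUE", "windy = FALSE"},
                         new Node[]{new Node("no"), new Node("yes")})
            });
    }

    public static void main(String[] args) {
        rules(weatherTree(), "").forEach(System.out::println);
    }
}
```

Walking a real J48 model the same way requires its internal `ClassifierTree` structure; the logic, however, is exactly this traversal.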
During the creation of my training set, I entered "true" and "false" in the same order as they were entered while creating the test set in Weka. These nominal values are for the class attribute.
When I run a classifier, I somehow feel that the results look reversed in the test set.
My question is: if the first line in the training set shows that the class value is "false", and the trained model is used in the SVM classifier on a test set, does a returned class of 0 mean I should consider it "false"?
Thanks
Abhishek S
If the nominal attribute was defined in the same order in both data sets (training and test), the output will be in the same order.
Nominal values are coded as doubles, so if you wrote {false, true}, then "false" = 0.0 and "true" = 1.0.
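A plain-Java illustration of that coding (the order of values in the ARFF declaration determines the index):

```java
public class NominalCoding {
    public static void main(String[] args) {
        // Declared order in the ARFF header: {false, true}
        String[] classValues = {"false", "true"};

        // classifyInstance(...) returns such an index as a double.
        double clsLabel = 0.0;
        System.out.println(clsLabel + " -> " + classValues[(int) clsLabel]);
        // prints "0.0 -> false"
    }
}
```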
Here is the excerpt from the weka documentation.
The returned double value from classifyInstance (or the index in the
array returned by distributionForInstance) is just the index for the
string values in the attribute. That is, if you want the string
representation for the class label returned above clsLabel, then you
can print it like this:
System.out.println(clsLabel + " -> " + unlabeled.classAttribute().value((int) clsLabel));
I have tried to classify using both a NaiveBayes classifier and a NaiveBayesSimple classifier, using the following data:
@attribute a real
@attribute b {yes, no}
@data
1,yes
3,yes
5,yes
2,yes
1,yes
4,no
7,no
5,no
8,no
9,no
When using the NaiveBayesSimple classifier, I get the mean and variance values I expect:
=== Classifier model (full training set) ===
Naive Bayes (simple)
Class yes: P(C) = 0.5
Attribute a
Mean: 2.4 Standard Deviation: 1.67332005
Class no: P(C) = 0.5
Attribute a
Mean: 6.6 Standard Deviation: 2.07364414
However, when using the NaiveBayes classifier, I get different values:
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute yes no
(0.5) (0.5)
=============================
a
mean 2.5143 6.6286
std. dev. 1.3328 1.8286
weight sum 5 5
precision 1.1429 1.1429
I was wondering what causes the shifted mean/SD? I've read through the paper that the NaiveBayes classifier is based on, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.3257, and can't see any reason for it there.
Thanks
The two algorithms differ from each other.
NaiveBayes in Weka is defined as follows:
NAME weka.classifiers.bayes.NaiveBayes
SYNOPSIS Class for a Naive Bayes classifier using estimator classes.
Numeric estimator precision values are chosen based on analysis of the
training data. For this reason, the classifier is not an
UpdateableClassifier (which in typical usage are initialized with zero
training instances) -- if you need the UpdateableClassifier
functionality, use the NaiveBayesUpdateable classifier. The
NaiveBayesUpdateable classifier will use a default precision of 0.1
for numeric attributes when buildClassifier is called with zero
training instances.
For more information on Naive Bayes classifiers, see
George H. John, Pat Langley: Estimating Continuous Distributions in
Bayesian Classifiers. In: Eleventh Conference on Uncertainty in
Artificial Intelligence, San Mateo, 338-345, 1995.
OPTIONS debug -- If set to true, classifier may output additional info
to the console.
displayModelInOldFormat -- Use old format for model output. The old
format is better when there are many class values. The new format is
better when there are fewer classes and many attributes.
useKernelEstimator -- Use a kernel estimator for numeric attributes
rather than a normal distribution.
useSupervisedDiscretization -- Use supervised discretization to
convert numeric attributes to nominal ones.
and NaiveBayesSimple is defined as follows:
NAME weka.classifiers.bayes.NaiveBayesSimple
SYNOPSIS Class for building and using a simple Naive Bayes
classifier. Numeric attributes are modelled by a normal distribution.
For more information, see
Richard Duda, Peter Hart (1973). Pattern Classification and Scene
Analysis. Wiley, New York.
OPTIONS debug -- If set to true, classifier may output additional info
to the console.
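Concretely, the shifted numbers can be reproduced by hand. NaiveBayes estimates a precision for each numeric attribute (here roughly the mean gap between adjacent distinct values of `a`: eight distinct values spanning 8 units over 7 gaps, so 8/7 ≈ 1.1429, matching the "precision" line in the output) and rounds every value to the nearest multiple of that precision before computing the mean and standard deviation, whereas NaiveBayesSimple uses the raw values with the sample (n−1) standard deviation. A self-contained sketch reproducing the "yes"-class numbers from the output above:

```java
import java.util.stream.DoubleStream;

public class NaiveBayesPrecision {
    // Round a value to the nearest multiple of the estimated precision,
    // the way NaiveBayes discretises numeric input before estimating.
    static double round(double v, double precision) {
        return Math.rint(v / precision) * precision;
    }

    static double mean(double[] xs) {
        return DoubleStream.of(xs).average().orElse(0);
    }

    // Population standard deviation (divide by n), as NaiveBayes reports it.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double ss = DoubleStream.of(xs).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(ss / xs.length);
    }

    public static void main(String[] args) {
        // Distinct values of attribute a: 1,2,3,4,5,7,8,9 -> mean gap 8/7
        double precision = 8.0 / 7.0;   // ~1.1429, as in the output above

        double[] yes = {1, 3, 5, 2, 1}; // values of a for class "yes"
        double[] rounded = DoubleStream.of(yes)
                                       .map(v -> round(v, precision))
                                       .toArray();

        System.out.printf("mean=%.4f stddev=%.4f%n",
                          mean(rounded), stdDev(rounded));
        // prints "mean=2.5143 stddev=1.3328", matching NaiveBayes above
    }
}
```

Applying the plain sample statistics to the raw values {1, 3, 5, 2, 1} instead gives mean 2.4 and standard deviation 1.6733, matching NaiveBayesSimple.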