I use WEKA for text classification. I have a training set and an untagged test set; the goal is to classify the test set.
In WEKA 3.6.6 everything works: I can select Supplied test set, train the model, and get results.
On the same files, WEKA 3.7.10 says:
Train and test set are not compatible. Would you like to automatically wrap the classifier in "InputMappedClassifier" before proceeding?
And when I press No, it outputs the following error message:
Problem evaluating classifier: Train and test are not compatible
Class index differ: 2 != 0
I understand that the key is "Class index differ: 2 != 0".
But what does it mean? Why does it work in WEKA 3.6.6 but not in WEKA 3.7.10?
How can I make the test set compatible with the training set?
When you import the supplied test set, are you selecting the same class attribute as the one you use in the training set? If you don't change this field, Weka automatically selects the last attribute as the class.
Related
I am trying to analyze the Titanic dataset and build a predictive model. I have preprocessed the datasets. Now I am trying to predict using the test set, and I don't know why it doesn't show any result.
Titanic_test.arff
Titanic_train.arff
If you open the two files (training and test set) you will notice a difference: in the training set the last column has value 0 or 1, whereas in the test set it has ? (undefined).
This means that your test set doesn't contain the answers, therefore Weka cannot do any evaluation. It could do predictions though.
I am getting an error while running an SMO model on a test dataset in Weka:
Problem Evaluating classifier Train and test dataset are not compatible. Class index differ: 3 != 0
Training dataset format
mean,variance,label
54.3333333333,1205.55555556,five
3.0,0.0,five
31739.0,0.0,five
3205.5,4475340.25,one
Test dataset format
mean,variance
3.0,0.0
257.0,0.0
216.0,14884.0
736.0,0.0
I trained on the training dataset and want to get labels for the test dataset. Why am I getting these errors?
The test dataset should have identical structure to the training data. In your case you should add a column to the end called "label". Then, you need to assign some value to the label. This could be simply a question mark "?" to indicate the true label is unknown.
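The padding step described above could be sketched in plain Java as follows. This is only a minimal sketch, assuming the test data is a simple comma-separated file with a header row and no quoted fields. The class name LabelColumn and the method addUnknownLabel are hypothetical helpers for illustration, not part of Weka.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Weka): pads a CSV test set that lacks the
// class column so that its columns line up with the training set.
class LabelColumn {

    // Appends the class column name to the header row and "?" to every data row.
    // The first non-blank line is treated as the header.
    static List<String> addUnknownLabel(List<String> csvLines, String labelName) {
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            out.add(line + "," + (out.isEmpty() ? labelName : "?"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> test = Arrays.asList("mean,variance", "3.0,0.0", "257.0,0.0");
        // Prints:
        // mean,variance,label
        // 3.0,0.0,?
        // 257.0,0.0,?
        for (String row : addUnknownLabel(test, "label")) {
            System.out.println(row);
        }
    }
}
```

The class attribute's declared type also has to match the training set. If the padded file is still reported as incompatible, one thing to check is whether the test set declares the same nominal values for the label (for example in an ARFF header).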
I have already built a tree to classify instances. My tree uses 14 attributes, each discretized with the supervised Discretize filter. When I created a new instance, filled in its values, and classified it with my tree, the result was wrong. Debugging my program showed that the instance's values are not assigned to the correct intervals. For example:
The instance value 0.26879699248120303 is assigned to '(-inf-0]'.
Why?
Problem solved. I hadn't discretized the instance to be tested, so Weka didn't know the format of my instance. Adding the following code fixed it:
discretize.input(instance); // discretize is the filter (the one applied to the training data)
instance = discretize.output(); // fetch the discretized version of the instance
I am trying to solve a numeric classification problem with numeric attributes in WEKA using linear regression, and then I want to test my model on the existing dataset with "Re-evaluate model on current test set".
As a result of the evaluation I am getting the summary:
Correlation coefficient 0.9924
Mean absolute error 1.1017
Root mean squared error 1.2445
Total Number of Instances 17
But I don't get the results shown here: http://weka.wikispaces.com/Making+predictions
How can I get WEKA to produce the output I need?
Thank you.
To answer my own question: for a trained and tested model, right-click the model and choose "Visualize classifier errors". There, use the Save option to save the actual and predicted values.
Are you using the command-line interface (CLI) or the GUI?
If CLI, the command given in the link above works fine:
java weka.classifiers.trees.J48 -T unclassified.arff -l j48.model -p 0
So when you train the model, you save it as *.model (j48.model) and later use it to evaluate on the test data (unclassified.arff).
I am using the Weka GUI to train an SVM classifier (using LibSVM) on a dataset. The data in the .arff file is:
@relation Expandtext
@attribute message string
@attribute Class {positive, negative, objective}
@data
I turn it into a bag of words with StringToWordVector, run the SVM, and get a decent classification rate. Now I have test data whose labels I do not know and want to predict. Its header information is the same, but the class of every instance is a question mark (?), i.e.
'Musical awareness: Great Big Beautiful Tomorrow has an ending\u002c Now is the time does not', ?
Again I pre-processed it with StringToWordVector; the class is in the same position as in the training data.
I go to the "Classify" tab, load my trained SVM model, select "Supplied test set", load the test data, right-click the model, and choose "Re-evaluate model on current test set", but it gives me the error that the train and test sets are not compatible. I am not sure why.
Am I going about this the wrong way to label the test data? What am I doing wrong?
For almost any machine learning algorithm, the training data and the test data need to have the same format. That means both must have the same features (attributes, in Weka terms) in the same format, including the class.
The problem is probably that you pre-process the training set and the test set independently, so the StringToWordVector filter creates different features for each set. Hence the model, trained on the training set, is incompatible with the test set.
What you want to do instead is initialize the filter on the training set and then apply it to both the training and the test set.
The question Weka: ReplaceMissingValues for a test file deals with this issue, but I'll repeat the relevant part here:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

Instances train = ... // from somewhere
Instances test = ... // from somewhere
Filter filter = new StringToWordVector(); // could be any filter
filter.setInputFormat(train); // initialize the filter once, with the training set
Instances newTrain = Filter.useFilter(train, filter); // configures the filter from the training instances and returns them filtered
Instances newTest = Filter.useFilter(test, filter); // filters the test set with the same configuration
Now, you can train the SVM and apply the resulting model on the test data.
If training and testing have to be in separate runs or programs, it should be possible to serialize the initialized filter together with the model. When you load (deserialize) the model, you can also load the filter and apply it to the test data. They should be compatible now.
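To see why the filter has to be initialized once on the training set, here is a small illustration in plain Java. It is a deliberately simplified stand-in for a bag-of-words filter, not Weka's StringToWordVector; the class SharedVocabulary and its fit and transform methods are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a bag-of-words filter; NOT Weka's StringToWordVector.
class SharedVocabulary {
    // word -> column index, learned from the training documents only
    private final Map<String, Integer> index = new LinkedHashMap<>();

    // Learn the word-to-column mapping from the training documents.
    void fit(List<String> trainDocs) {
        for (String doc : trainDocs) {
            for (String word : tokenize(doc)) {
                index.putIfAbsent(word, index.size());
            }
        }
    }

    // Map any document onto the columns learned in fit().
    // Words that were never seen during fit() are dropped.
    int[] transform(String doc) {
        int[] counts = new int[index.size()];
        for (String word : tokenize(doc)) {
            Integer col = index.get(word);
            if (col != null) {
                counts[col]++;
            }
        }
        return counts;
    }

    // Lower-case the text and split on non-word characters.
    private static List<String> tokenize(String doc) {
        List<String> words = new ArrayList<>();
        for (String w : doc.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        SharedVocabulary vocab = new SharedVocabulary();
        vocab.fit(Arrays.asList("great ending", "bad ending")); // training set only
        // The test document is mapped onto the training columns [great, ending, bad].
        System.out.println(Arrays.toString(vocab.transform("great great film"))); // prints [2, 0, 0]
    }
}
```

Every vector produced by transform has the same length and column order as the training vectors, which is the compatibility the model needs. Building a second vocabulary from the test documents would instead produce columns the model has never seen.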