the result weka j48 classifyinstance is not correct - machine-learning

I already build a tree to classify the instance. In my tree, there are 14 attributes. Each attribute is discretize by supervised discrete. When I created a new instance, I put the value in this instance and classify it in my tree, and I found the result is wrong. So I debug my program, and I found the value of the instance is not divided into the interval correctly. For example:
value of the instance:0.26879699248120303 is divided into '(-inf-0]'.
Why?

Problem solved.I didn't discretize the instance that was to be tested so that the weka didn't know the format of my instance.add the following code:
discretize.input(instance);//discretize is a filter
instance = discretize.output();

Related

what's the error using supply test set for prediction

I am trying to analyze the titanic dataset and build a predictive model. I have preprocessed the datasets. Now while I am trying to predict using the test set and I don't know why it doesn't show any result.
Titanic_test.arff
Titanic_train.afff
If you open the two files (training and test set) you will notice a difference: in the training set the last column has value 0 or 1, whereas in the test set it has ? (undefined).
This means that your test set doesn't contain the answers, therefore Weka cannot do any evaluation. It could do predictions though.

pmml4s model.predict() returns array instead of single value

I used sklearn2pmml to serialize my decision tree classifier to a pmml file.
I used pmml4s in java to deserialize the model and use it to predict.
Iuse the code below to make a prediction over a single incoming value. This should return either 0/1/2/3/4/5/6.
Object[] result = model.predict(new String[]{"220"});
The result array looks like this after the prediction:
Does anyone know why this is happening? Is my way of inputting the prediction value wrong or is something wrong in the serialization/deserialization?
It is certainty of model for each class. In your case it means that it's 4 with probability 94.5% or 5 with probability 5.5%
In simple case, if you want to receive value, you should pick index for the maximal value.
However you might use this probabilities to additional control logic, like thresholding when decision is ambiguous (two values with probability ~0.4, etc.)

what's meaning of function predict's returned value in OpenCV?

I use function predict in opencv to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0,2.0,3.0,4.0,5.0), but in fact the value of ret are (0.521220207,-0.247173533,-0.127723947······)
So I am confused about it. As Opencv official document, the function returns a class label (classification) in my case.
update: I don't still know why to appear this result. But I choose new features to train models and the return value of predict function is what I defined during train phase (e.g. 1 or 2 or 3 or etc).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
With Support Vector Machines (SVM) you have a training function and a prediction one. The training function is to train your data and save those informations on an xml file (it facilitates the prediction process in case you use a huge number of training data and you must do the prediction function in another project).
Example : 20 images per class in your case : 20*5=100 training images,each image is associated with a label of its appropriate class and all these informations are stocked in train.xml)
For the prediction function , it tells you what's label to assign to your test image according to your training DATA (the hole work you did in training process). Your prediction results might be good and might be bad , it's all about your training data I think.
If you want try to calculate the error rate for your classifier to see how much it can give good results or bad ones.

WEKA 3.7.10 not compatible format, class index differ

I use weka for text classification, I have a train set and untagged test set, the goal is to classify test set.
In WEKA 3.6.6 everything goes well, I can select Supplied test set and train the model and get result.
On the same files, WEKA 3.7.10 says that
Train and test set are not compatible. Would you like to automatically wrap the classifier in "inputMappedClassifier" before porceeding?
And when I press No it outputs the following error message
Problem evaluating classfier: Train and test are not compatible Class index differ
: 2!= 0
I understand that the key is Class index differ: 2!= 0.
However what does it mean? Why it works in WEKA 3.6.6 and not compatible in WEKA 3.7.10?
How can I make the test set compatible to train set?
When you import the supplied test set, are you selecting the same class attribute as the one that you use in the train set? If you don't change this field, weka selects the last attribute as being the class automatically.

How to control the name of a saved variable from linear regression in SPSS?

When I run a multiple linear regression in SPSS, I want to save Mahalanobis distance.
By default when I do this, SPSS adds a variable called MAH_1 to the data frame.
I would like to indicate using syntax what that variable should be called. This would help make any example more reproducible if this variable were to be used in a subsequent analysis.
How do you control the name of the variable generated by linear regression in SPSS?
For example, here is some sample syntax:
REGRESSION
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT id
/METHOD=ENTER IV1 IV2 IV3 IV4
/SAVE MAHAL.
Is there some syntax which follows the MAHAL command which allows you to specify the name of the generated variable?
I managed to seemingly answer the question with a little experimentation.
Writing the preferred name in parentheses after the saved object name generates the saved object with the preferred name. I.e., changing MAHAL to MAHAL(FOO), means that the saved values of Mahalanobis distance are now called foo instead of the default MAH_1.
REGRESSION
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT id
/METHOD=ENTER IV1 IV2 IV3 IV4
/SAVE MAHAL(foo).
And the section of the Help file for the save option in the regression command elaborates.

Resources