Weka Classification - machine-learning

I was trying to build a classification model on a data set that has 32 attributes, the last column being the target class. I reduced the number of attributes from 32 to 6, which I felt would be more useful for my classification model.
I tried J48 and some incremental classification algorithms.
I expected output containing the confusion matrix, the numbers of correctly and incorrectly classified instances, and the kappa value.
But my result did not give any information on correctly and incorrectly classified instances. It also did not report a confusion matrix or a kappa value. All I received was this:
=== Summary ===
Correlation coefficient 0.9482
Mean absolute error 0.2106
Root mean squared error 0.5673
Relative absolute error 13.4077 %
Root relative squared error 31.9157 %
Total Number of Instances 1461
Can anyone tell me why I did not get the confusion matrix, kappa, and correctly/incorrectly classified instance counts?

Unfortunately you didn't post your code or say which version of Weka you are using.
BTW, to calculate the confusion matrix, kappa, etc. you can use the methods of the Evaluation class: http://weka.sourceforge.net/doc.dev/weka/classifiers/Evaluation.html
For example, after you train your model:
classifier.buildClassifier(train); // train is an Instances object
Evaluation eval = new Evaluation(train);
// evaluate your model with 10-fold cross-validation
eval.crossValidateModel(classifier, train, 10, new Random(1));
System.out.println(classifier);
// print different statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
System.out.println(eval.toClassDetailsString());
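As for why you only got a correlation coefficient and error measures: that summary is what Weka prints for numeric prediction, which suggests your target column was loaded as a numeric attribute, so the classifier ran in regression mode and no confusion matrix or kappa is produced. A minimal sketch, assuming your data is already loaded into an Instances object called data, that converts the last column to nominal with the NumericToNominal filter before training:
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.NumericToNominal;

NumericToNominal toNominal = new NumericToNominal();
toNominal.setAttributeIndices("last");   // only convert the target column
toNominal.setInputFormat(data);          // data is the loaded Instances object (assumption)
Instances nominalData = Filter.useFilter(data, toNominal);
nominalData.setClassIndex(nominalData.numAttributes() - 1);
// training and evaluating on nominalData should now report correctly/incorrectly
// classified instances, kappa, and the confusion matrix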

Related

understanding test and validation set usage for early stop and model selection

I implemented an ANN (1 hidden layer of 64 units, learning rate = 0.001, epsilon = 0.001, iters = 500) with Python's OpenCV module. Train error is ~3% and test error is ~12%.
In order to improve the accuracy/generalisation of my NN, I decided to proceed by implementing model selection (of the number of hidden units and the learning rate) to get good hyperparameter values, and by plotting learning curves to determine if more data is needed (I currently have 2.5k samples).
Having read some sources regarding NN training and model selection, I'm very confused about the following:
1) In order to perform model selection, I know the following needs to be done:
create set possibleHiddenUnits {4, 8, 16, 32, 64}
randomly select Tr & Va sets from the total set of Tr + Va with some split e.g. 80/20
foreach ele in possibleHiddenUnits
(*) compute weights for the NN using backpropagation and an iterative optimisation algorithm like Gradient Descent (where we provide the termination criteria in the form of number of iterations / epsilon)
compute Validation set error using these trained weights
select the number of hidden units which min Va set error
Alternatively, I believe we can also use k-fold cross-validation (a rough sketch of this selection loop is given below).
a. How do you decide what the number of iterations / epsilon for GD should be?
b. Does 1 iteration out of x iterations of GD (where the entire training set is used to compute the gradients of the cost w.r.t. the weights through backprop) constitute an 'epoch'?
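To make the selection procedure from 1) concrete, here is a rough Java sketch of that loop. The helpers trainNetwork and validationError, as well as trainX/trainY/valX/valY, are hypothetical placeholders for whatever framework you actually use; only the structure (train on Tr for each candidate, pick the candidate with the lowest Va error) is the point.
// Hypothetical sketch of hyperparameter selection over the number of hidden units.
int[] possibleHiddenUnits = {4, 8, 16, 32, 64};
double bestError = Double.MAX_VALUE;
int bestHidden = -1;
for (int hidden : possibleHiddenUnits) {
    // step (*): train with backprop + GD using your chosen termination criteria
    double[][] weights = trainNetwork(trainX, trainY, hidden,
                                      /*learningRate=*/0.001, /*maxIters=*/500, /*epsilon=*/0.001);
    double vaError = validationError(weights, valX, valY);
    if (vaError < bestError) {
        bestError = vaError;
        bestHidden = hidden;
    }
}
// retrain with bestHidden (on Tr, or Tr+Va) before the final evaluation on the test set
System.out.println("selected hidden units: " + bestHidden);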
2) Sources ("What is the difference between train, validation and test set, in neural networks?" and "How to use k-fold cross validation in a neural network") mention that training a NN is done in the following way, as it prevents over-fitting:
for each epoch
for each training data instance
propagate error through the network
adjust the weights
calculate the accuracy over training data
for each validation data instance
calculate the accuracy over the validation data
if the threshold validation accuracy is met
exit training
else
continue training
a. I believe this method should be executed once the model selection has been done. But then how do we avoid overfitting of the model in step (*) of the model selection process above?
b. Am I right in assuming that one epoch constitutes one iteration of training where the weights are calculated using the entire Tr set through GD + backprop, and that GD involves x (>1) iterations over the entire Tr set to calculate the weights?
Also, out of 1b and 2b, which is correct?
This is more of a comment, but since I can't make comments yet I write it here. Have you tried other methods like L2 regularization or dropout? I don't know a lot about model selection, but dropout has a very similar effect to taking lots of models and averaging them. Normally dropout should do the trick and you won't have problems with overfitting anymore.
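To illustrate the dropout suggestion: during training you randomly zero a fraction of the hidden activations and scale the survivors so the expected activation stays the same ("inverted dropout"); at test time you use the full layer unchanged. A minimal, framework-free Java sketch (the names are just placeholders):
import java.util.Random;

// Inverted dropout applied to one hidden layer's activations during training.
// keepProb is the probability that a unit survives (e.g. 0.5 for hidden layers).
static double[] dropout(double[] activations, double keepProb, Random rng) {
    double[] out = new double[activations.length];
    for (int i = 0; i < activations.length; i++) {
        if (rng.nextDouble() < keepProb) {
            out[i] = activations[i] / keepProb;   // scale so the expected value is unchanged
        } else {
            out[i] = 0.0;                         // unit is dropped for this example/batch
        }
    }
    return out;
    // At test time, skip this step entirely and use the raw activations.
}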

How to get jlibsvm prediction probability in multi-class classification

I am new to SVM. I am using jlibsvm for a multi-class classification problem. Basically, I am doing a sentence classification problem. There are 3 classes. As I understand it, I am doing one-against-all classification. I have a comparatively small training set: a total of 75 sentences, of which 25 belong to each class.
I am making 3 SVMs (so 3 different models), where, while training SVM_A, sentences belonging to class A get a positive label, i.e., 1, and the other sentences get a -1 label. The same is done for SVM_B and SVM_C.
While testing, to get the label of a sentence, I give the sentence to the 3 models and take the prediction probability returned by each of them. Whichever returns the highest is the class the sentence belongs to.
This is how I am doing it. But I am getting the same prediction probability for every sentence in the test set for all models.
A predicted:0.012820514
B predicted:0.012820514
C predicted:0.012820514
These values repeat for all sentences in the training set.
The following is how I set parameters for training:
C_SVC svm = new C_SVC();
MutableBinaryClassificationProblemImpl problem;
ImmutableSvmParameterGrid.Builder builder = ImmutableSvmParameterGrid.builder();
// create training parameters ------------
HashSet<Float> cSet;
HashSet<LinearKernel> kernelSet;
cSet = new HashSet<Float>();
cSet.add(1.0f);
kernelSet = new HashSet<LinearKernel>();
kernelSet.add(new LinearKernel());
// configure finetuning parameters
builder.eps = 0.001f; // epsilon
builder.Cset = cSet; // C values used
builder.kernelSet = kernelSet; //Kernel used
builder.probability=true; // To get the prediction probability
ImmutableSvmParameter params = builder.build();
What am I doing wrong?
Is there any other better way to do multi-class classification other than this?
You are getting the same output because you generate the same model three times.
The reason for this is that jlibsvm is able to perform multiclass classification out of the box based on the provided data (LIBSVM itself supports this too). If it detects that more than two class labels are present in the given data, it automatically performs multiclass classification. So there is no need for a manual one-vs-rest approach. Just supply the data with class labels for each category.
However, jlibsvm is still in beta and relies on a rather old version of LIBSVM (2.88). A lot has changed since then. For a more intuitive Java binding (in comparison to the default LIBSVM version), you can take a look at zlibsvm, which is available via Maven Central and based on the latest LIBSVM version.

Parameter tuning/model selection using resampling

I have been trying to get into more detail on resampling methods and implemented them on a small data set of 1000 rows. The data was split into a training set of 800 rows and a validation set of 200 rows. I used k-fold cross-validation and repeated k-fold cross-validation to train a KNN on the training set. Based on my understanding I have made some interpretations of the results; however, I have certain doubts about them (see questions below):
Results :
10-fold CV
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 720, 720, 720, 720, 720, 720, ...
Resampling results across tuning parameters:
k Accuracy Kappa
5 0.6600 0.07010791
7 0.6775 0.09432414
9 0.6800 0.07054371
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
Repeated 10-fold CV with 10 repeats
Resampling results across tuning parameters:
k Accuracy Kappa
5 0.670250 0.10436607
7 0.676875 0.09288219
9 0.683125 0.08062622
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
10-fold CV with 1000 repeats
k Accuracy Kappa
5 0.6680438 0.09473128
7 0.6753375 0.08810406
9 0.6831800 0.07907891
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
10-fold CV with 2000 repeats
k Accuracy Kappa
5 0.6677981 0.09467347
7 0.6750369 0.08713170
9 0.6826894 0.07772184
Doubts:
While selecting the parameter, k = 9 is the optimal value for the highest accuracy. However, I don't understand how to take Kappa into consideration when finally choosing the parameter value.
The number of repeats has to be increased until we get a stabilised result; the accuracy changes when the repeats are increased from 10 to 1000. However, the results are similar for 1000 and 2000 repeats. Would it be right to consider the results of 1000/2000 repeats a stabilised performance estimate?
Is there any rule of thumb for the number of repeats?
Finally, should I now train the model on my complete training data (800 rows) and test the accuracy on the validation set?
Accuracy and Kappa are just different classification performance metrics. In a nutshell, their difference is that Accuracy does not take possible class imbalance into account when calculating the metrics, while Kappa does. Therefore, with imbalanced classes, you might be better off using Kappa. With R caret you can do so via the train::metric parameter.
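For intuition, Cohen's kappa can be computed directly from a confusion matrix: it compares the observed accuracy with the agreement expected by chance given the class and prediction frequencies, which is why it is more informative than plain accuracy on imbalanced data. A small illustrative Java sketch:
// kappa = (observedAccuracy - expectedAccuracy) / (1 - expectedAccuracy),
// where expectedAccuracy is the chance agreement computed from the
// row (actual) and column (predicted) totals of the confusion matrix.
static double cohensKappa(int[][] confusion) {
    int n = confusion.length;
    double total = 0, diagonal = 0;
    double[] rowSum = new double[n], colSum = new double[n];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            total += confusion[i][j];
            rowSum[i] += confusion[i][j];
            colSum[j] += confusion[i][j];
        }
        diagonal += confusion[i][i];
    }
    double observed = diagonal / total;
    double expected = 0;
    for (int i = 0; i < n; i++) {
        expected += (rowSum[i] / total) * (colSum[i] / total);
    }
    return (observed - expected) / (1 - expected);
}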
You would see a similar effect of slightly different performance results when running e.g. the 10-fold CV with 10 repeats multiple times - you would just get slightly different results for those as well. Something you should look out for is the variance of classification performance over your partitions and repeats. If you obtain a small variance, you can conclude that, by training on all your data, you will likely obtain a model that gives you similar (hence stable) results on new data. But if you obtain a huge variance, you can conclude that, just by chance (being lucky or unlucky), you might instead obtain a model that gives you either rather good or rather bad performance on new data. BTW: the prediction performance variance is something that e.g. R caret::train will give you automatically, hence I'd advise using it.
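As a simple illustration of what "looking at the variance" means, you can collect the per-fold (or per-repeat) accuracies and summarise them; caret reports this automatically, but a hand-rolled sketch would be:
// foldAccuracies holds one accuracy value per CV fold / repeat.
static double[] meanAndStdDev(double[] foldAccuracies) {
    double mean = 0;
    for (double a : foldAccuracies) mean += a;
    mean /= foldAccuracies.length;
    double variance = 0;
    for (double a : foldAccuracies) variance += (a - mean) * (a - mean);
    variance /= (foldAccuracies.length - 1);    // sample variance
    return new double[] { mean, Math.sqrt(variance) };
}
// A small standard deviation suggests a stable estimate; a large one means the
// performance on new data may depend heavily on the particular split you got.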
See above: look at the variance and increase the repeats until you can e.g. repeat the whole process and obtain a similar average performance and variance of performance.
Yes, CV and resampling methods exist to give you information about how well your model will perform on new data. So, after performing CV and resampling and obtaining this information, you will usually use all your data to train a final model for use in e.g. your application scenario (this includes both the train and the test partition!).

What is the meaning of the value returned by the predict function in OpenCV?

I use the predict function in OpenCV to classify my gestures.
svm.load("train.xml");
float ret = svm.predict(mat);//mat is my feature vector
I defined 5 labels (1.0, 2.0, 3.0, 4.0, 5.0), but in fact the values of ret are (0.521220207, -0.247173533, -0.127723947, ...).
So I am confused about it. According to the official OpenCV documentation, the function should return a class label (classification) in my case.
Update: I still don't know why this result appeared, but I chose new features to train the models and the return value of the predict function is now what I defined during the training phase (e.g. 1 or 2 or 3, etc.).
During the training of an SVM you assign a label to each class of training data.
When you classify a sample the returned result will match up with one of these labels telling you which class the sample is predicted to fall into.
There's some more documentation here which might help:
http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
With Support Vector Machines (SVM) you have a training function and a prediction one. The training function trains on your data and saves that information to an XML file (this facilitates the prediction process in case you use a huge amount of training data and have to do the prediction in another project).
Example: 20 images per class in your case gives 20*5 = 100 training images; each image is associated with the label of its class, and all this information is stored in train.xml.
The prediction function tells you which label to assign to your test image according to your training data (the whole work you did in the training process). Your prediction results might be good or bad; it's all about your training data, I think.
If you want, try to calculate the error rate of your classifier to see how good or bad its results are.
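For that error-rate check, a minimal sketch (assuming you have the ground-truth labels of a held-out test set and the labels returned by predict for the same samples):
// predicted[i] is the label returned by svm.predict for sample i,
// actual[i] is the ground-truth label for the same sample.
static double errorRate(float[] predicted, float[] actual) {
    int wrong = 0;
    for (int i = 0; i < predicted.length; i++) {
        if (predicted[i] != actual[i]) wrong++;
    }
    return (double) wrong / predicted.length;   // fraction of misclassified samples
}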

libSVM giving highly inaccurate predictions even for the file that was used to train it

Here is the deal.
I am trying to make an SVM-based POS tagger.
The feature vectors for the SVM were created with the help of format converters.
Now here is a screenshot of the training file that I am using.
http://tinypic.com/r/n4fn2r/8
I have 25 labels for various POS tags. When I use the Java implementation or the command line tools for prediction I get the following results.
http://tinypic.com/r/2dtw5ky/8
I have tried all the available kernels but they gave more or less the same results.
This is happening even when the training file is used as the testing file.
Please help me out here!
P.S. I cannot share more than two links, so here is a snippet of the model file:
svm_type c_svc
kernel_type rbf
gamma 0.000548546
nr_class 25
total_sv 431
rho -0.929467 1.01073 1.0531 1.03472 1.01585 0.953263 1.03027 -0.921365 0.984535 1.02796 1.01266 1.03374 0.949463 0.977925 0.986551 -0.920912 0.940926 -0.955562 0.975386 -0.981959 -0.884042 0.0516955 -0.980884 -0.966095 0.995091 1.023 1.01489 1.00308 0.948314 1.01137 -0.845876 0.968034 1.0076 1.00064 1.01335 0.942633 0.965703 0.979212 -0.861236 0.935055 -0.91739 0.970223 -0.97103 0.0743777 0.970321 -0.971215 -0.931582 0.972377 0.958193 0.931253 0.825797 0.954894 -0.972884 -0.941726 0.945077 0.922366 0.953999 -1.00503 0.840985 0.882229 -0.961742 0.791631 -0.984971 0.855911 -0.991528 -0.951211 -0.962096 -0.99213 -0.99708 -0.957557 -0.308987 -0.455442 -0.94881 -0.995319 -0.974945 -0.964637 -0.902152 -0.955258 -1.05287 -1.00614 -0.
Update:
I just trained the SVM with the SVM type set to C-SVC and the kernel type set to linear, which gave a non-zero (although very poor) accuracy.
As mentioned by #Pedrom, parameter choice is absolutely crucial when training SVMs. I suggest you have a look at this practical guide. Also, 431 words is nowhere near enough to train a 25-class model. You will definitely need more data.
That said, 0% accuracy is indeed odd. Can you please show us the commands you are using to train and evaluate the model?
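Assuming you are using LIBSVM's Java API, a rough sketch of the parameter search the practical guide recommends (5-fold cross-validated grid search over C and gamma; prob is assumed to be an svm_problem already filled with your training vectors and labels) could look like this:
import libsvm.*;

// Grid search over C and gamma with 5-fold cross-validation, following the
// exponentially growing grids suggested in the LIBSVM practical guide.
static svm_parameter gridSearch(svm_problem prob) {
    svm_parameter best = null;
    double bestAccuracy = -1;
    for (int logC = -5; logC <= 15; logC += 2) {
        for (int logGamma = -15; logGamma <= 3; logGamma += 2) {
            svm_parameter param = new svm_parameter();
            param.svm_type = svm_parameter.C_SVC;
            param.kernel_type = svm_parameter.RBF;
            param.C = Math.pow(2, logC);
            param.gamma = Math.pow(2, logGamma);
            param.eps = 0.001;
            param.cache_size = 100;
            double[] target = new double[prob.l];
            svm.svm_cross_validation(prob, param, 5, target);
            int correct = 0;
            for (int i = 0; i < prob.l; i++) {
                if (target[i] == prob.y[i]) correct++;
            }
            double accuracy = (double) correct / prob.l;
            if (accuracy > bestAccuracy) {
                bestAccuracy = accuracy;
                best = param;
            }
        }
    }
    System.out.println("best CV accuracy " + bestAccuracy
            + " with C=" + best.C + ", gamma=" + best.gamma);
    return best;
}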
