inconsistency between libsvm and scikit-learn.svc results - machine-learning

I have a project based on the SVM algorithm as implemented by libsvm. Recently I decided to try several other classification algorithms, which is where scikit-learn comes into the picture.
Connecting to scikit-learn was pretty straightforward: it supports the libsvm format through the load_svmlight_file routine, and its SVM implementation is based on the same libsvm.
When everything was done, I decided to check the consistency of the results by running libsvm directly and via scikit-learn, and the results were different. Of the 18 measurements in the learning curves, 7 differed, and the differences are located at the small steps of the learning curve. The libsvm results seem much more stable, while the scikit-learn results fluctuate drastically.
The classifiers have exactly the same parameters, of course.
I tried to check the version of libsvm used in the scikit-learn implementation, but I didn't find it; the only thing I found was a libsvm.so file.
Currently I am using libsvm 3.21 and scikit-learn 0.17.1.
I would appreciate any help in addressing this issue.
size libsvm scikit-learn
1 0.1336239435355727 0.1336239435355727
2 0.08699516468193455 0.08699516468193455
3 0.32928301642777424 0.2117238289550198 #different
4 0.2835688734876902 0.2835688734876902
5 0.27846766962743097 0.26651875338163966 #different
6 0.2853854654662907 0.18898048915599963 #different
7 0.28196058132165136 0.28196058132165136
8 0.31473956032575623 0.1958710201604552 #different
9 0.33588303670653136 0.2101641630182972 #different
10 0.4075242509025311 0.2997807499800962 #different
15 0.4391771087975972 0.4391771087975972
20 0.3837789445609818 0.2713167833345173 #different
25 0.4252154334940311 0.4252154334940311
30 0.4256407777477492 0.4256407777477492
35 0.45314944605858387 0.45314944605858387
40 0.4278633233755064 0.4278633233755064
45 0.46174762022239796 0.46174762022239796
50 0.45370452524846866 0.45370452524846866
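To make a comparison like the one above repeatable, the two result columns can be diffed programmatically with a small tolerance instead of by eye. The sketch below is only an illustration: the dictionary re-enters the table above, and the names `curve` and `differing_sizes` as well as the tolerance are assumptions, not part of either library.

```python
import math

# Learning-curve table from the question: size -> (libsvm, scikit-learn).
curve = {
    1: (0.1336239435355727, 0.1336239435355727),
    2: (0.08699516468193455, 0.08699516468193455),
    3: (0.32928301642777424, 0.2117238289550198),
    4: (0.2835688734876902, 0.2835688734876902),
    5: (0.27846766962743097, 0.26651875338163966),
    6: (0.2853854654662907, 0.18898048915599963),
    7: (0.28196058132165136, 0.28196058132165136),
    8: (0.31473956032575623, 0.1958710201604552),
    9: (0.33588303670653136, 0.2101641630182972),
    10: (0.4075242509025311, 0.2997807499800962),
    15: (0.4391771087975972, 0.4391771087975972),
    20: (0.3837789445609818, 0.2713167833345173),
    25: (0.4252154334940311, 0.4252154334940311),
    30: (0.4256407777477492, 0.4256407777477492),
    35: (0.45314944605858387, 0.45314944605858387),
    40: (0.4278633233755064, 0.4278633233755064),
    45: (0.46174762022239796, 0.46174762022239796),
    50: (0.45370452524846866, 0.45370452524846866),
}


def differing_sizes(results, rel_tol=1e-9):
    """Return the training sizes whose two scores differ beyond rel_tol."""
    return [
        size
        for size, (a, b) in sorted(results.items())
        if not math.isclose(a, b, rel_tol=rel_tol)
    ]


mismatches = differing_sizes(curve)
print(mismatches)
```

Listing the mismatching sizes this way makes it easier to rerun the check after changing one setting at a time.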

Related

Matching PyTorch w/ CNTK (VGG on CIFAR)

I am trying to understand how PyTorch works and want to replicate a simple CNN training on CIFAR. The CNTK script gets to 0.76 accuracy after 168 seconds of training (10 epochs), which is similar to my MXNet script (0.75 accuracy after 153 seconds).
However, my PyTorch script is lagging behind a lot at 0.71 accuracy and 354 seconds. I appreciate I will get differences in accuracy due to stochastic weight initialisation, etc. However the difference across frameworks is much greater than difference within a framework, initialising randomly between runs.
The reasons I can think of:
MXNet and CNTK are initialized to xavier/glorot uniform; not sure how to do this in PyTorch and so perhaps the weights are initialised to 0
CNTK does gradient-clipping by default; not sure if PyTorch has the equivalent
Perhaps the bias is dropped in PyTorch by default
I use SGD with momentum; perhaps the PyTorch implementation of momentum is a bit different
Edit:
I have tried specifying the weight initialisation, however it seems to have no big effect:
self.conv1 = nn.Conv2d(3, 50, kernel_size=3, padding=1)
init.xavier_uniform(self.conv1.weight, gain=np.sqrt(2.0))
init.constant(self.conv1.bias, 0)
I'll try to answer your first two questions:
weight initialization: different kinds of layers have their own default methods; you can find the default weight initialization of all these layers at the following link: https://github.com/pytorch/pytorch/tree/master/torch/nn/modules
gradient-clipping: you might want to use torch.nn.utils.clip_grad_norm
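To make the second point concrete, here is a minimal pure-Python sketch of what clipping by global norm does conceptually: if the L2 norm of all gradients taken together exceeds a threshold, every gradient is scaled by the same factor. This is not PyTorch's implementation; the function name `clip_by_global_norm` and the list-of-lists representation are assumptions for illustration only.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale gradient vectors so that their joint L2 norm is at most max_norm.

    `grads` is a list of lists of floats (one inner list per parameter).
    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm or total_norm == 0.0:
        # Already within the limit: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Two one-element "parameters" whose joint norm is 5.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm_before, clipped)
```

The direction of the overall gradient is preserved; only its length is capped.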
In addition, I am curious why you don't use torchvision.transforms, torch.utils.data.DataLoader and torchvision.datasets.CIFAR10 to load and preprocess your data.
There is a similar CIFAR image classification tutorial for PyTorch:
http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
Hope this can help you.

LibSVM - Multi class classification with unbalanced data

I have been experimenting with libsvm and 3D descriptors in order to perform object recognition. So far I have 7 categories of objects, and for each category I have the number of objects (and its percentage):
Category 1. 492 (14%)
Category 2. 574 (16%)
Category 3. 738 (21%)
Category 4. 164 (5%)
Category 5. 369 (10%)
Category 6. 123 (3%)
Category7. 1025 (30%)
So in total I have 3485 objects.
I have followed the practical guide of libsvm.
As a reminder, the steps are:
A. Scaling the training and the testing
B. Cross validation
C. Training
D. Testing
I separated my data into training and testing.
Using 5-fold cross-validation, I was able to determine good values for C and gamma.
However, I obtained poor results (CV is about 30-40 and my accuracy is about 50%).
Then I thought about my data and saw that it is unbalanced (categories 4 and 6, for example). I discovered that libSVM has an option for class weights, so I would now like to set up good weights.
So far I'm doing this:
svm-train -c cValue -g gValue -w1 1 -w2 1 -w3 1 -w4 2 -w5 1 -w6 2 -w7 1
However, the results are the same. I'm sure this is not the right way to do it, which is why I'm asking for help.
I saw some topics on the subject, but they were related to binary classification, not multiclass classification.
I know that libSVM uses "one against one" (so binary classifiers), but I don't know how to handle that when I have multiple classes.
Could you please help me?
Thank you in advance for your help.
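One common heuristic for choosing the -wi values is to weight each class inversely to its frequency, so that errors on rare classes are penalised more. This is a sketch of that heuristic only, with no guarantee that it improves the results here; the helper name `inverse_frequency_weights` and the normalisation (the most frequent class gets weight 1) are assumptions, and cValue and gValue are the placeholders from the command above.

```python
# Class counts as listed in the question: category label -> number of objects.
counts = {1: 492, 2: 574, 3: 738, 4: 164, 5: 369, 6: 123, 7: 1025}


def inverse_frequency_weights(class_counts):
    """Give the largest class weight 1 and rarer classes proportionally more."""
    largest = max(class_counts.values())
    return {label: largest / n for label, n in class_counts.items()}


weights = inverse_frequency_weights(counts)

# Build the -wi flags for svm-train (assumed formatting: two decimals).
weight_flags = " ".join(
    f"-w{label} {w:.2f}" for label, w in sorted(weights.items())
)
print(f"svm-train -c cValue -g gValue {weight_flags}")
```

Because each weight multiplies C for its class, C and gamma would normally be re-tuned by cross-validation after the weights change.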
I've run into the same problem before. I also tried giving the classes different weights, which didn't work.
I recommend training on a subset of the dataset.
Try to use approximately equal numbers of samples from each class. You can use all category 4 and 6 samples, and then pick about 150 samples from each of the other categories.
I used this method and the accuracy did improve. Hope this will help you!
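The subsampling described above can be sketched as follows. The data layout (a list of `(label, features)` pairs), the function name `balanced_subset`, the fixed seed, and the dummy feature vectors are all assumptions for illustration; the cap of 150 and the choice to keep categories 4 and 6 whole come from this answer.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(samples, per_class=150, keep_all=(4, 6), seed=0):
    """Keep every sample of the classes in keep_all, at most per_class of the rest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, features in samples:
        by_label[label].append((label, features))

    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        if label not in keep_all and len(items) > per_class:
            items = rng.sample(items, per_class)  # sample without replacement
        subset.extend(items)
    rng.shuffle(subset)
    return subset


# Toy data with the class sizes from the question and dummy feature vectors.
counts = {1: 492, 2: 574, 3: 738, 4: 164, 5: 369, 6: 123, 7: 1025}
toy = [(label, [float(i)]) for label, n in counts.items() for i in range(n)]

subset = balanced_subset(toy)
sizes = Counter(label for label, _ in subset)
print(sorted(sizes.items()))
```

The held-out test set should keep its original class distribution so that the reported accuracy still reflects the real data.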

libSVM giving highly inaccurate predictions even for the file that was used to train it

Here is the deal.
I am trying to make an SVM-based POS tagger.
The feature vectors for the SVM were created with the help of format converters.
Here is a screenshot of the training file that I am using.
http://tinypic.com/r/n4fn2r/8
I have 25 labels for various POS tags. When I use the Java implementation or the command-line tools for prediction, I get the following results.
http://tinypic.com/r/2dtw5ky/8
I have tried all the available kernels, but they gave more or less the same results.
This happens even when the training file is used as the testing file.
Please help me out here!
P.S. I cannot share more than two links. Thus here is a snippet of the model file
svm_type c_svc
kernel_type rbf
gamma 0.000548546
nr_class 25
total_sv 431
rho -0.929467 1.01073 1.0531 1.03472 1.01585 0.953263 1.03027 -0.921365 0.984535 1.02796 1.01266 1.03374 0.949463 0.977925 0.986551 -0.920912 0.940926 -0.955562 0.975386 -0.981959 -0.884042 0.0516955 -0.980884 -0.966095 0.995091 1.023 1.01489 1.00308 0.948314 1.01137 -0.845876 0.968034 1.0076 1.00064 1.01335 0.942633 0.965703 0.979212 -0.861236 0.935055 -0.91739 0.970223 -0.97103 0.0743777 0.970321 -0.971215 -0.931582 0.972377 0.958193 0.931253 0.825797 0.954894 -0.972884 -0.941726 0.945077 0.922366 0.953999 -1.00503 0.840985 0.882229 -0.961742 0.791631 -0.984971 0.855911 -0.991528 -0.951211 -0.962096 -0.99213 -0.99708 -0.957557 -0.308987 -0.455442 -0.94881 -0.995319 -0.974945 -0.964637 -0.902152 -0.955258 -1.05287 -1.00614 -0.
update
I just trained the SVM with the SVM type set to C-SVC and a linear kernel, which gave a non-zero (although very poor) accuracy.
As mentioned by @Pedrom, parameter choice is absolutely crucial when training SVMs. I suggest you have a look at this practical guide. Also, 431 words is nowhere near enough to train a 25-class model. You will definitely need more data.
That said, 0% accuracy is indeed odd. Can you please show us the commands you are using to train and evaluate the model?
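For reference, the parameter search described in the libsvm practical guide is a coarse grid over exponentially growing values of C and gamma, each scored by cross-validation. The sketch below only builds that grid and the matching svm-train commands; the file name train.scaled, the helper name `cv_commands`, and the 5-fold setting are placeholders, not values from this question.

```python
from itertools import product

# Exponent ranges suggested in the libsvm practical guide:
# C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.
c_values = [2.0 ** e for e in range(-5, 16, 2)]
gamma_values = [2.0 ** e for e in range(-15, 4, 2)]


def cv_commands(train_file, folds=5):
    """Build one cross-validation command per (C, gamma) pair.

    -s 0 selects C-SVC, -t 2 the RBF kernel, -v the number of folds.
    """
    return [
        f"svm-train -s 0 -t 2 -v {folds} -c {c} -g {g} {train_file}"
        for c, g in product(c_values, gamma_values)
    ]


commands = cv_commands("train.scaled")
print(len(commands))
print(commands[0])
```

Each command reports a cross-validation accuracy; the pair with the best score is then used to train the final model on the whole (scaled) training set.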

Import trained SVM from scikit-learn to OpenCV

I'm porting an algorithm that uses a Support Vector Machine from Python (using scikit-learn) to C++ (using the machine learning library of OpenCV).
I have access to the trained SVM in Python, and I can import SVM model parameters from an XML file into OpenCV. Since the SVM implementations of both scikit-learn and OpenCV are based on LibSVM, I think it should be possible to use the parameters of the trained scikit-learn SVM in OpenCV.
The example below shows an XML file which can be used to initialize an SVM in OpenCV:
<?xml version="1.0"?>
<opencv_storage>
<my_svm type_id="opencv-ml-svm">
<svm_type>C_SVC</svm_type>
<kernel><type>RBF</type>
<gamma>0.058823529411764705</gamma></kernel>
<C>100</C>
<term_criteria><epsilon>0.0</epsilon>
<iterations>1000</iterations></term_criteria>
<var_all>17</var_all>
<var_count>17</var_count>
<class_count>2</class_count>
<class_labels type_id="opencv-matrix">
<rows>1</rows>
<cols>2</cols>
<dt>i</dt>
<data>
0 1</data></class_labels>
<sv_total>20</sv_total>
<support_vectors>
<_>
2.562423055146794554e-02 1.195797425735170838e-01
8.541410183822648050e-02 9.395551202204914520e-02
1.622867934926303379e-01 3.074907666176152077e-01
4.099876888234874062e-01 4.697775601102455179e-01
3.074907666176152077e-01 3.416564073529061440e-01
5.124846110293592716e-01 5.039432008455355660e-01
5.466502517646497639e-01 1.494746782168964394e+00
4.168208169705446942e+00 7.214937388193202183e-01
7.400275229357797802e-01</_>
<!-- omit 19 vectors to keep it short -->
</support_vectors>
<decision_functions>
<_>
<sv_count>20</sv_count>
<rho>-5.137523249549433402e+00</rho>
<alpha>
2.668992955678978518e+01 7.079767098112181145e+01
3.554240018130368384e+01 4.787014908624512088e+01
1.308470223155845069e+01 5.499185410034550614e+01
4.160483074010306126e+01 2.885504210853826379e+01
7.816431542954153144e+01 6.882061506693679576e+01
1.069534676985309574e+01 -1.000000000000000000e+02
-5.088050252552544350e+01 -1.101740897543916375e+01
-7.519686789702373630e+01 -3.893481464245511603e+01
-9.497774056452135483e+01 -4.688632332663718927e+00
-1.972745089701982835e+01 -8.169343841768861125e+01</alpha>
<index>
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
</index></_></decision_functions></my_svm>
</opencv_storage>
I would now like to fill this XML file with values from the trained scikit-learn SVM. But I'm not sure how the parameters of scikit-learn and OpenCV correspond. Here is what I have so far (clf is the classifier object in Python):
<kernel><gamma> corresponds to clf.gamma
<C> corresponds to clf.C
<term_criteria><epsilon> corresponds to clf.tol
<support_vectors> corresponds to clf.support_vectors_
Is this correct so far? Here are the items I'm not really sure about:
What about <term_criteria><iterations>?
Does <decision_functions><_><rho> correspond to clf.intercept_?
Does <decision_functions><_><alpha> correspond to clf.dual_coef_? Here I'm not sure because the scikit-learn documentation says "dual_coef_ which holds the product yiαi". It looks like OpenCV expects only αi, and not yiαi.
You don't need epsilon and iterations anymore; they are only used in the training optimization problem. You can set them to any value you like or ignore them.
Porting the support vectors may require some fiddling, as indexing may differ between scikit-learn and OpenCV. For example, the XML in your question does not use a sparse format.
As for the other parameters:
rho should correspond to intercept_, but you may need to change the sign.
scikit-learn's dual_coef_ corresponds to sv_coef in standard libsvm models (which is alpha_i*y_i).
If OpenCV complains about the values you provide for alpha when porting, use the absolute values of scikit-learn's dual_coef_ (i.e. all positive). These are the true alpha values of an SVM model.
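When porting, it can help to evaluate the decision function by hand and compare it with what each library reports for a few test points. The sketch below assumes the libsvm convention for a two-class RBF model, f(x) = sum_i (y_i alpha_i) K(sv_i, x) - rho. The function names and the toy numbers are made up, and the sign conventions of a particular scikit-learn or OpenCV version should be verified against real predictions rather than taken from this sketch.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, signed_coefs, rho, gamma):
    """libsvm-style decision value: sum_i coef_i * K(sv_i, x) - rho."""
    total = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, signed_coefs)
    )
    return total - rho


# Toy model (hypothetical values, not taken from the XML above).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
signed_coefs = [1.0, -1.0]  # y_i * alpha_i, as stored in sv_coef / dual_coef_
rho = 0.0
gamma = 0.5

value = decision_value([0.0, 0.0], support_vectors, signed_coefs, rho, gamma)
alphas = [abs(c) for c in signed_coefs]  # unsigned alpha_i values
print(value, alphas)
```

If the hand-computed values match one library but have the opposite sign in the other, that points at the rho or label-order convention rather than at the support vectors.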

Learning how to map numeric values into an array

Dear all,
I am looking for an appropriate algorithm which can allow me to learn how some numeric values are mapped into an array.
Try to imagine that I have a training data set like this:
1 1 2 4 5 --> [0 1 5 7 8 7 1 2 3 7]
2 3 2 4 1 --> [9 9 5 6 6 6 2 4 3 5]
...
1 2 1 8 9 --> [1 4 5 8 7 4 1 2 3 4]
So that given a new set of numeric values, I would like to predict this new array
5 8 7 4 2 --> [? ? ? ? ? ? ? ? ? ?]
Thank you very much in advance.
Best regards!
Some considerations:
Let us suppose that all numbers are integers and the length of the arrays is fixed.
The quality of each predicted array can be determined by means of a distance function that measures the similarity between the ideal and the predicted array.
This is a challenging task in general. Are your array lengths fixed? What's the loss function? For example, is it better to be "closer" for single digits (is predicting 2 instead of 1 better than predicting 9, or does it not matter)? Do you get credit for partial matches on the array, such as predicting the first half correctly?
In any case, classical regression or classification techniques would likely not work very well for your scenario. I think the best bet would be to try a genetic programming approach. The fitness function would then be the loss measure I mentioned earlier. You can check this nice comparison of genetic programming libraries for different languages.
This is called a structured output problem, where the target you are trying to predict is a complex structure, rather than a simple class (classification) or number (regression).
As mentioned above, the loss function is an important thing you will have to think about. Minimum edit distance, RMS or simple 0-1 loss could be used.
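The three losses named above can be sketched in a few lines of pure Python. The function names are assumptions, and the two arrays are simply the first two target arrays from the question, used here as a stand-in for an ideal array and a prediction.

```python
import math


def zero_one_loss(target, predicted):
    """0 if the arrays match exactly, 1 otherwise."""
    return 0 if list(target) == list(predicted) else 1


def rms_error(target, predicted):
    """Root-mean-square difference between equal-length arrays."""
    if len(target) != len(predicted):
        raise ValueError("arrays must have the same length")
    squared = sum((t - p) ** 2 for t, p in zip(target, predicted))
    return math.sqrt(squared / len(target))


def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions (Levenshtein)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


ideal = [0, 1, 5, 7, 8, 7, 1, 2, 3, 7]
guess = [9, 9, 5, 6, 6, 6, 2, 4, 3, 5]
print(zero_one_loss(ideal, guess), rms_error(ideal, guess), edit_distance(ideal, guess))
```

The 0-1 loss gives no credit for near misses, RMS rewards numerically close values, and edit distance rewards arrays that match up to shifts, so the choice encodes what "close" means for the application.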
Structured support vector machines and variations on ridge regression for structured output problems are two known algorithms that can tackle this problem. See Wikipedia, of course.
We have a research group on this topic at Universite Laval (Canada), led by Mario Marchand and Francois Laviolette. You might want to search for their publications like "Risk Bounds and Learning Algorithms for the Regression Approach to Structured Output Prediction" by Sebastien Giguere et al.
Good luck!
