libSVM - second best classification - machine-learning

I am facing a classification problem, so I thought I could use libSVM and in fact everything works just fine.
Now I would like to introduce some 'tolerance' and see if my system can guess the correct label of the data (which I know a priori) in N guesses (or attempts). What I mean is this: is it possible to have libSVM output not only the label it guesses but also the second-best, third-best, ... ?
EDIT - SOLVED
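Actually I 'discovered' that I can use the option -b 1 to ask libSVM to output probability estimates (the option has to be given both when training with svm-train and when predicting with svm-predict). Then I can just sort them to obtain the N most likely labels.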
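Below is a minimal sketch of that sorting step. It assumes the prediction file written by `svm-predict -b 1` has a particular layout: a header line `labels l1 l2 ...`, followed by one line per instance of the form `predicted_label p1 p2 ...`. Check this against your own output. The function names are hypothetical helpers and are not part of libSVM.

```python
def top_n_labels(prediction_lines, n):
    """Return the n most probable labels for every instance.

    Assumes the `svm-predict -b 1` layout: a header 'labels l1 l2 ...'
    followed by lines 'predicted_label p1 p2 ...' (one probability per label).
    """
    rows = [line.split() for line in prediction_lines if line.strip()]
    header, body = rows[0], rows[1:]
    if header[0] != "labels":
        raise ValueError("expected a 'labels ...' header line")
    labels = header[1:]
    ranked = []
    for row in body:
        probs = [float(p) for p in row[1:]]
        # Sort (label, probability) pairs from most to least probable.
        order = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
        ranked.append([label for label, _ in order[:n]])
    return ranked


def hit_rate_at_n(ranked, true_labels):
    """Fraction of instances whose true label is among their top-n guesses."""
    hits = sum(truth in guesses for guesses, truth in zip(ranked, true_labels))
    return hits / len(true_labels)


# Usage (the file name is hypothetical):
# with open("predictions.txt") as fh:
#     top3 = top_n_labels(fh.readlines(), 3)
```

Counting how often the known label appears among the top N guesses gives the "correct within N attempts" rate the question asks about.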

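You mean the underlying class-membership probabilities. Support vector machine algorithms do not produce these: they output a decision value, not a probability. You can, however, fit an auxiliary model on top of the SVM outputs, as described in Platt (1999). libSVM's -b 1 option does something along these lines internally.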
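The idea behind Platt's method is to fit a sigmoid P(y = +1 | f) = 1 / (1 + exp(A*f + B)) to the SVM decision values f. The sketch below is a simplification. It keeps Platt's smoothed targets but uses plain gradient descent instead of the Newton-type solver from the paper. In practice the sigmoid should be fitted on held-out decision values (for example from cross-validation), not on the training decision values. The function names are hypothetical.

```python
import math


def _one_over_one_plus_exp(z):
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(decision_values, labels, lr=0.1, epochs=2000):
    """Fit P(y = +1 | f) = 1 / (1 + exp(A*f + B)); labels are +1 / -1.

    Keeps Platt's smoothed targets but uses plain gradient descent on the
    log-loss instead of the Newton-type solver from the paper.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)   # smoothed target for positives
    t_neg = 1.0 / (n_neg + 2.0)             # smoothed target for negatives
    targets = [t_pos if y == 1 else t_neg for y in labels]
    a, b = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
    n = len(labels)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for f, t in zip(decision_values, targets):
            # With z = A*f + B and p = 1 / (1 + exp(z)), d(log-loss)/dz = t - p.
            g = t - _one_over_one_plus_exp(a * f + b)
            grad_a += g * f / n
            grad_b += g / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def platt_probability(f, a, b):
    """Calibrated probability of the positive class for decision value f."""
    return _one_over_one_plus_exp(a * f + b)
```

A comes out negative here because larger decision values should mean a higher probability of the positive class. For more than two classes, the binary probabilities still have to be combined, for example by pairwise coupling.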

Related

Prediction of Stock Returns with ML Algorithm

I am working on a prediction model for stock returns over a fixed period of time (say n days). I was hoping to gather a few ideas ahead of time. My questions are:
1) Would it be best to turn this into a classification problem, say, by creating a dummy variable for returns larger than x%? Then I could try the entire arsenal of ML algorithms.
2) If I don't turn it into a classification problem but use say a regression model, would it make sense or be necessary to transform the returns into logs?
Any thoughts are appreciated.
EDIT: My goal is relatively broadly defined, in the sense that I would simply like to improve the performance of the selection process (pick positive returns and avoid negative ones).
Best by what measure? Turning it into a thresholding problem simply means translating the problem into a much simpler one. Your problem definition is your own: you can turn it into a binary classification problem (>x or not), a multi-class classification problem (binning into ranges), or simply keep it as a prediction task. If you do the latter, you can still apply binning or classification as a post-processing step.
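Here is a small sketch of the "keep it as a prediction task and classify afterwards" option. Take the predicted returns from any regression model, then threshold or bin them. The threshold and bin edges below are hypothetical placeholders, not recommendations.

```python
from bisect import bisect_right


def to_binary(predicted_returns, threshold):
    """1 if the predicted return exceeds the threshold, else 0."""
    return [int(r > threshold) for r in predicted_returns]


def to_bins(predicted_returns, edges):
    """Bin index per prediction for sorted edges e0 < e1 < ...:
    0 for r < e0, i for e(i-1) <= r < e(i), len(edges) for r >= last edge."""
    return [bisect_right(edges, r) for r in predicted_returns]


# Hypothetical regression outputs (predicted n-day returns).
preds = [-0.03, 0.0, 0.01, 0.05]
labels_binary = to_binary(preds, threshold=0.02)        # "> x%" classification
labels_bins = to_bins(preds, edges=[-0.02, 0.0, 0.02])  # multi-class ranges
```

Because the regression output is kept, you can try several thresholds or binnings later without retraining.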
Classification is just a subclass of prediction. The log-odds (logit) link used by logistic regression is no more than a neat trick to turn the outputs into something that resembles a probability distribution; don't put too much thought into it. That said, applying transformations to your output is not necessarily bad (for instance, you could apply some normalization to keep your output within the range of an activation function).
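For question 2, log returns, log(1 + r), have one concrete property: they add up over time. The sum of daily log returns equals the log of the compounded return over the whole period, which can be convenient when the target is an n-day return. The sketch below shows that conversion. It also shows a min-max rescaling into [-1, 1] (the output range of tanh), as one example of output normalization. The helper names are hypothetical.

```python
import math


def log_returns(simple_returns):
    """Convert simple returns r to log returns log(1 + r)."""
    return [math.log1p(r) for r in simple_returns]


def min_max_scale(values, lo=-1.0, hi=1.0):
    """Linearly rescale values into [lo, hi], e.g. the output range of tanh."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # constant input: map everything to the midpoint
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

If you scale the training targets this way, keep vmin and vmax from the training set and apply the inverse mapping to the model's predictions.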

How to choose classifier on specific dataset

Given a dataset, normally an m-instances-by-n-features matrix, how do you choose the classifier that is most appropriate for it?
This is like asking which algorithm to use for a prime-number problem: no single algorithm solves every problem, and each problem has a limited set of suitable algorithms. In machine learning, you can apply different algorithms to the same type of problem.
If the matrix contains real-valued features, you can use the KNN algorithm. If the features are words, you can use a naive Bayes classifier, which is one of the best for text classification. Machine learning has tons of algorithms; read about them and apply the one that fits your problem best. Hope that makes sense.
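As a concrete example of the KNN suggestion for real-valued features, here is a minimal k-nearest-neighbours classifier. It uses Euclidean distance and a majority vote. It is an illustrative sketch with no feature scaling, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance). No feature scaling is applied."""
    # Pair each distance with the sample index so ties never compare labels.
    by_distance = sorted(
        zip((math.dist(x, query) for x in train_x), range(len(train_x)))
    )
    nearest_labels = [train_y[i] for _, i in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```

Since KNN works on raw distances, features on very different scales usually need standardizing first.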
An interesting but much more general map I found:
http://scikit-learn.org/stable/tutorial/machine_learning_map/
If you have Weka, you can use the Experimenter to run different algorithms on the same dataset and compare the resulting models.
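The Experimenter idea can also be sketched outside Weka: evaluate several algorithms on the same data under the same protocol, using k-fold cross-validation. Below, two deliberately simple, hypothetical classifiers are scored on identical folds: a majority-class baseline and a nearest-centroid rule. Any function with the same `fit_predict(train_x, train_y, test_x)` signature could be plugged in.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(fit_predict, xs, ys, k=5, seed=0):
    """k-fold cross-validated accuracy; the same seed gives the same folds
    for every classifier, so the comparison is like-for-like."""
    correct = 0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        preds = fit_predict(train_x, train_y, [xs[i] for i in fold])
        correct += sum(p == ys[i] for p, i in zip(preds, fold))
    return correct / len(xs)


def majority_baseline(train_x, train_y, test_x):
    """Always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_x)


def nearest_centroid(train_x, train_y, test_x):
    """Predict the class whose mean training vector is closest."""
    sums, counts = {}, Counter(train_y)
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in test_x]


# Usage: score every candidate on identical folds.
# for name, clf in [("baseline", majority_baseline), ("centroid", nearest_centroid)]:
#     print(name, cv_accuracy(clf, xs, ys))
```

Including a trivial baseline is useful: a candidate that cannot beat it on the same folds is probably not worth tuning.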
This project compares many different classifiers on different typical datasets.
If you have no idea where to start, you could use Auto-WEKA, a simple tool that tests all the classifiers you select under different constraints. Before using Auto-WEKA, you may need to convert your data to ARFF, either with Weka or manually (there are many tutorials on YouTube).
The best classifier depends on your data (binary/string/real/tags, patterns, distribution...), on what kind of output you want to predict (binary class / multi-class / evolving classes / a value from regression?) and on the expected performance (time, memory, accuracy). It also depends on whether you want to update your model frequently (i.e., if the data arrive as a stream, you are better off with an online classifier).
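An online model updates itself one example at a time instead of being retrained from scratch. Here is a minimal sketch, assuming logistic regression trained by stochastic gradient descent on the log-loss. The class name and learning rate are illustrative choices.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with plain SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        """Probability that x belongs to class 1."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x, y):
        """One SGD step on the log-loss for a single example (y in {0, 1})."""
        err = self.predict_proba(x) - y  # d(log-loss)/dz
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
```

Each new observation from the stream is passed to `update` as it arrives, so the model keeps adapting without storing the history.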
Please note that the best classifier may not be a single classifier but an ensemble of different classifiers.
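The simplest form of ensemble is a hard majority vote over the predictions of several classifiers. Below is a minimal sketch; the function name and tie-breaking rule are illustrative choices.

```python
from collections import Counter


def majority_vote(*prediction_lists):
    """Combine per-sample predictions from several classifiers by majority vote.

    Ties go to the tied label voted for by the earliest classifier
    in the argument list.
    """
    combined = []
    for votes in zip(*prediction_lists):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined
```

Listing the individually strongest classifier first makes it the tie-breaker. Weighted votes or averaged probabilities are common refinements.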

Can I implement a classifier using a function?

I was learning about different techniques for classification, like probabilistic classifiers, and stumbled upon a question: why can't we implement a binary classifier as a regression function of all the attributes and classify on the basis of the function's output? For example, if the output is less than a certain value, the instance belongs to class A, otherwise to class B. Is there any limitation to this method compared to the probabilistic approach?
You can do this and it is often done in practice, for example in Logistic Regression. It is not even limited to binary classes. There is no inherent limitation compared to a probabilistic approach, although you should keep in mind that both are fundamentally different approaches and hard to compare.
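The question's idea can be sketched directly. Code class A as -1 and class B as +1, fit an ordinary least-squares line to those targets, and threshold its output at 0. This is a one-feature illustration with hypothetical helper names. Least squares on class labels is a different model from logistic regression, which fits the probability of class B.

```python
def fit_least_squares(xs, ys):
    """Ordinary least squares for y = a*x + c with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def classify(a, c, x, threshold=0.0):
    """Class B if the regression output reaches the threshold, else class A."""
    return "B" if a * x + c >= threshold else "A"


# Class A is coded as -1 and class B as +1 before fitting.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
a, c = fit_least_squares(xs, ys)
```

With this data, the fitted line crosses 0 at x = 2.5, halfway between the two groups.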
I think you have a misunderstanding about classification. No matter which classifier you use (SVM or logistic regression), you can always view the learned model as
f(x) > b ===> positive
f(x) < b ===> negative
This applies to both probabilistic and non-probabilistic models. In fact, this follows from risk minimization, which yields the cut-off b naturally.
Yes, this is possible. For example, a perceptron does exactly that.
However, a single perceptron can only solve linearly separable problems. Many perceptrons combined into a general neural network can solve arbitrarily complex problems.
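Here is a minimal sketch of the classic perceptron learning rule, with labels in {-1, +1}; it predicts +1 when w.x + b > 0. It illustrates both halves of the point above: it converges on the linearly separable AND function but cannot fit XOR.

```python
def train_perceptron(xs, ys, epochs=100):
    """Classic perceptron rule; ys are -1 / +1. Returns weights w and bias b."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): nudge towards y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: data separated
            break
    return w, b


def perceptron_predict(w, b, x):
    """Threshold the linear function: +1 if w.x + b > 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]   # linearly separable
xor_labels = [-1, 1, 1, -1]    # not linearly separable
```

On XOR, the loop simply runs out of epochs, because no line through the plane separates the two classes.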
Another machine learning technique, SVM, works in a similar way. It first transforms the input data into some high-dimensional space (implicitly, through a kernel function) and then separates it with a linear function.
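This example uses XOR with +-1 coding, the case a single perceptron cannot solve. After the explicit degree-2 feature map phi(x1, x2) = (x1^2, x2^2, sqrt(2)*x1*x2), a linear rule separates the four points. The inner product in that space equals the polynomial kernel (x . z)^2, which is what lets an SVM work in the mapped space without computing phi explicitly. The weight vector is hand-picked for illustration, not learned.

```python
import math


def phi(x):
    """Explicit degree-2 feature map: (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2


# XOR with +-1 coding: label +1 when the two signs differ.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]

# In the mapped space a *linear* rule w . phi(x) + b separates XOR.
w, b = (0.0, 0.0, -1.0), 0.0
predictions = [1 if dot(w, phi(p)) + b > 0 else -1 for p in points]
```

The kernel identity (x . z)^2 = phi(x) . phi(z) holds for any pair of points, not just the four XOR corners.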

How to set intercept_scaling in scikit-learn LogisticRegression

I am using scikit-learn's LogisticRegression object for regularized binary classification. I've read the documentation on intercept_scaling but I don't understand how to choose this value intelligently.
The datasets look like this:
10-20 features, 300-500 replicates
Highly non-Gaussian, in fact most observations are zeros
The output classes are not necessarily equally likely. In some cases they are almost 50/50, in other cases they are more like 90/10.
Typically C=0.001 gives good cross-validated results.
The documentation contains warnings that the intercept itself is subject to regularization, like every other feature, and that intercept_scaling can be used to address this. But how should I choose this value? One simple answer is to explore many possible combinations of C and intercept_scaling and choose the parameters that give the best performance. But this parameter search will take quite a while and I'd like to avoid that if possible.
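Here is one way to see why intercept_scaling matters. According to the scikit-learn documentation, with the liblinear solver a synthetic feature with constant value intercept_scaling is appended to every sample, and the intercept equals intercept_scaling times that feature's weight w0. Since w0 is penalized like any other weight, the L2 penalty paid for a given intercept shrinks with the square of intercept_scaling. The arithmetic below is a sketch assuming the 0.5 * ||w||^2 form of the L2 penalty; the helper name is hypothetical.

```python
def intercept_penalty(intercept, intercept_scaling):
    """L2 penalty 0.5 * w0**2 paid for a given intercept,
    where intercept = intercept_scaling * w0 (liblinear's synthetic feature)."""
    w0 = intercept / intercept_scaling
    return 0.5 * w0 ** 2


for s in (1.0, 10.0, 100.0):
    print(f"intercept_scaling={s}: penalty for intercept 2.0 = {intercept_penalty(2.0, s):.6f}")
```

With C as small as 0.001, the penalty term carries a lot of weight, so this shrinkage of the intercept is likely to be noticeable; a larger intercept_scaling weakens it. The documentation also notes that the parameter only has an effect with solver='liblinear' and fit_intercept=True.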
Ideally, I would like to use the intercept to control the distribution of output predictions. That is, I would like to ensure that the probability that the classifier predicts "class 1" on the training set is equal to the proportion of "class 1" data in the training set. I know that this is the case under certain circumstances, but this is not the case in my data. I don't know if it's due to the regularization or to the non-Gaussian nature of the input data.
Thanks for any suggestions!
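For unregularized logistic regression with an intercept, the optimality condition for the intercept forces the mean predicted probability on the training set to equal the fraction of class-1 samples. Penalizing the intercept is therefore a plausible cause of the mismatch described above. If you only want to restore that property after fitting, one option is to shift the intercept until the average predicted probability matches the base rate. This is a post-hoc adjustment, not a scikit-learn feature. The sketch assumes `scores` are the model's decision values w.x + b on the training set (e.g. from `decision_function`); the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_base_rate(scores, y, lo=-50.0, hi=50.0, iters=100):
    """Find a shift delta such that mean(sigmoid(score + delta)) equals the
    fraction of positives in y (labels 0 / 1), by bisection."""
    target = sum(y) / len(y)
    if not 0.0 < target < 1.0:
        raise ValueError("need both classes present")

    def mean_prob(delta):
        return sum(sigmoid(s + delta) for s in scores) / len(scores)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # mean_prob increases with delta, so move towards the target.
        if mean_prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The returned shift would be added to the fitted intercept. This changes only the overall level of the predictions, not how the model ranks samples.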
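Have you tried oversampling the positive class by setting class_weight="auto"? It reweights the training loss inversely to class frequency, which effectively oversamples the underrepresented classes and undersamples the majority class. (Newer scikit-learn versions use class_weight="balanced" instead.)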
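For reference, the scikit-learn documentation gives the "balanced" heuristic as n_samples / (n_classes * count of each class). In a 90/10 dataset, each minority sample therefore counts nine times as much as a majority sample. A stdlib-only sketch of that formula (the function name is hypothetical):

```python
from collections import Counter


def balanced_class_weights(y):
    """weight_c = n_samples / (n_classes * count_c) for every class c in y."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}
```

The weights are chosen so that every class contributes the same total weight, and the weights summed over all samples still equal the number of samples.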
(The current stable docs are a bit confusing since they seem to have been copy-pasted from SVC and not edited for LR; that's just changed in the bleeding edge version.)

Can SVM solution change after shuffling the inputs?

When training a support vector machine (SVM) for classification on exactly the same data, I obtain different results depending on the order of the inputs, i.e., if I shuffle the data I get different SVMs.
If I understood the theory correctly, the SVM solution should be the same regardless of the order of the inputs, so why do I get different results? Is there any implementation "detail" in SVMs that would make shuffling change the solution? I have already checked my code several times, because this smells like a bug.
I use the SVM implementation in OpenCV.
EDIT: in this case, by shuffling I refer to changing the order of the data points not features.
I am not familiar with the OpenCV implementation. But do this: run several trials on exactly the same data set -- no shuffling, same order, same data points. See if the SVM changes. Obviously, in theory, it shouldn't. But it could be that there is some small randomization step somewhere in the implementation that produces different outputs for the same input.
Edit: As Chris A. asks, do the feature vectors correspond to their proper labels after shuffling? If not, that would obviously destroy your results.
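Training an SVM means solving a convex optimization problem, so the optimal value is unique (the weight vector w is unique, although in degenerate cases the dual coefficients and the bias b need not be). Any correct optimization algorithm should therefore end up very close to the same solution, and shuffling should not change the result beyond the solver's stopping tolerance and floating-point accuracy.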
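Here is one way to run that check, sketched under assumptions:

1. Shuffle features and labels together, so each point keeps its own label.
2. Train the same strictly convex model on both orderings until it converges.
3. Compare the parameters.

The model here is L2-regularized logistic regression trained by full-batch gradient descent. It stands in for the SVM and is not OpenCV's solver. Because the objective has a unique minimizer and every iteration uses the whole dataset, the two runs should agree up to floating-point rounding. Iterative solvers that stop at a tolerance can legitimately return slightly different solutions.

```python
import math
import random


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, lam=0.1, lr=0.5, iters=2000):
    """Full-batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    Labels are 0 / 1. With lam > 0 the objective is strictly convex in (w, b),
    so it has a single minimiser; data order only affects rounding."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def shuffle_together(xs, ys, seed):
    """Shuffle samples and labels with one permutation so each x keeps its y."""
    pairs = list(zip(xs, ys))
    random.Random(seed).shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def max_param_difference(model_a, model_b):
    """Largest absolute difference between two (w, b) parameter sets."""
    (wa, ba), (wb, bb) = model_a, model_b
    return max([abs(p - q) for p, q in zip(wa, wb)] + [abs(ba - bb)])
```

If the same check on your real pipeline shows large differences, first confirm that the shuffled labels still line up with their feature vectors.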
