What kernel to select when using LIBSVM [closed]

I am currently performing two-class classification using LIBSVM in MATLAB. I have extracted the features, and there are about 69 of them. I just want to know whether it is all right to use a linear kernel for two-class classification with around 69 features.
Thanks
Marcus

Yes, it's perfectly fine. I've used linear kernels for data that had about 5000 features. (Not saying this was the best way to go, but it's possible.)
Better yet, why not just try the RBF kernel as well and compare the results?

It really depends on the situation: the same kernel can perform well on one dataset and poorly on another. Give the RBF and polynomial kernels a try as well and compare the results; there is no way around experimenting.

It always depends on the nature of your data. If it is linearly separable, then a linear kernel is more than enough.
If the data is non-linear and locally encapsulated (in other words, if there exists a hypersphere that would enclose all the data, new points included), then an RBF kernel sounds like the proper kernel for the job.
If the data is non-linear but not encapsulated (so a new point might always fall far from your training data), then you might want to try a continuous kernel such as a polynomial one.
It is hard to deduce the nature of your data in high-dimensional spaces, so most of the time the practical solution is to try different scenarios and use cross-validation to pick the proper kernel and parameters, as in the sketch below.
However, sometimes plotting different pairs of features has helped me get an idea of my data's nature, though it is only a very rough indicator.
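For concreteness, here is a minimal sketch of that cross-validation loop using scikit-learn (whose SVC class wraps LIBSVM) rather than the MATLAB interface; X and y stand in for your own 69-feature matrix and binary labels:

```python
# Hedged sketch: grid-search over kernels and hyperparameters with 5-fold CV.
# X (n_samples x 69) and y (binary labels) are assumed to come from your own
# feature-extraction pipeline.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1],  # ignored by the linear kernel
}

pipe = make_pipeline(StandardScaler(), SVC())  # SVMs are scale-sensitive
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```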

Related

crossvalidation “balancing” for regression problems [closed]

Classification problems can exhibit a strong label imbalance in the given dataset. This can be addressed by subsampling certain classes or by attributing class weights, which allows the label distribution to be balanced at least during model training. Stratification, on the other hand, keeps a given label distribution in every respective fold.
For a regression problem, this is not defined by standard libraries such as scikit-learn. There are a few approaches that cover stratification, and a well-written theoretical approach to regression subsampling by Scott Lowe here.
I am wondering why label balancing for regression, as opposed to classification problems, gets so little attention in the machine learning community. Regression problems also exhibit different characteristics that might be easier or harder to acquire in a data-collection setting. And then, is there any framework or paper that addresses this issue further?
The complexity of the problem lies in the continuous nature of regression. With classification, it is very natural to split the data into classes because it basically already comes in classes. With regression, the number of possible splits is basically infinite, and most importantly, it is impossible to know what a good split would be. As in the article you sent, you might apply sorted or fractional approaches, but in the end you have no idea to what extent they would be correct. You can also split the target into intervals. This is what the verstack library does; its documentation says: "For continuous target variable verstack uses binning and categoric split based on bins". In other words, it first assigns the continuous values to bins (classes) and then applies stratification to them.
There are not many studies on this because anything you come up with is going to be a heuristic. However, there can be exceptions if you can incorporate some domain knowledge. As an example, say you are trying to predict the frequency of some electromagnetic waves from a set of features. In that case, you have prior knowledge of how the wave frequencies are split (https://en.wikipedia.org/wiki/Electromagnetic_spectrum), so it is natural to split them into contiguous intervals with respect to their wavelengths and do a regression stratification. But otherwise, it is hard to come up with something that would generalize.
I have personally never encountered a study on this.
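For what it's worth, here is a rough sketch of the binning heuristic described above using scikit-learn; X and y are assumed to be your own feature matrix and continuous target, and 10 quantile bins is an arbitrary choice:

```python
# Discretise the continuous target into quantile bins, then stratify on them.
import numpy as np
from sklearn.model_selection import train_test_split

n_bins = 10
# Interior quantile edges, so each bin holds roughly the same number of targets.
edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
y_binned = np.digitize(y, np.unique(edges))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y_binned, random_state=0
)
```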

Binary Image Classification with CNN - best practices for choosing "negative" dataset? [closed]

Say, I want to train a CNN to detect whether an image is a car or not.
What are some best practices or methods for choosing the "Not-Car" dataset?
Because this dataset could potentially be infinite (basically anything that is not a car), is there a guideline on how big the dataset needs to be? Should it contain objects which are very similar to cars but are not (planes, boats, etc.)?
As in all of supervised machine learning, the training set should reflect the real distribution that the model is going to work with. A neural network is basically a function approximator. Your actual goal is to approximate the real-world distribution, but in practice it's only possible to get a sample from it, and this sample is the only thing the neural network will see. For any input far outside of the training manifold, the output will be just a guess (see also this discussion on AI.SE).
So when choosing a negative dataset, the first question you should answer is: what will be the likely use case of this model? E.g., if you're building an app for a smartphone, then the negative sample should probably include street views, pictures of buildings and stores, people, indoor environments, etc. It's unlikely that an image from the smartphone camera will be a wild animal or an abstract painting, i.e., these are improbable inputs in your real distribution.
Including images that look like the positive class (trucks, airplanes, boats, etc.) is a good idea, because the low-level convolutional features (edges, corners) will be very similar, and it's important that the neural network learns the high-level features correctly.
In general, I'd use 5-10x more negative images than positive ones. CIFAR-10 is a good starting point: out of 50,000 training images, 5,000 are cars, 5,000 are planes, etc. In fact, building a 10-class classifier is not a bad idea. In this case, you'd turn the CNN into a binary classifier by thresholding its certainty that the inferred class is a car; anything the CNN isn't certain about is interpreted as not a car (see the sketch below).
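The thresholding step at the end might look like the following sketch; CAR_INDEX and the 0.5 cut-off are illustrative placeholders, not values from the answer:

```python
import numpy as np

CAR_INDEX = 1    # position of the "car" class in the softmax output (hypothetical)
THRESHOLD = 0.5  # tune on a validation set

def is_car(softmax_probs: np.ndarray) -> bool:
    """Binary decision from a 10-class softmax vector: anything the CNN
    isn't sufficiently sure is a car gets interpreted as not-a-car."""
    return softmax_probs[CAR_INDEX] > THRESHOLD

probs = np.array([0.02, 0.81, 0.03, 0.01, 0.04, 0.02, 0.03, 0.01, 0.02, 0.01])
print(is_car(probs))  # True
```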
I think the negative samples should be selected depending on the setting your model will work in. If your model works on the street as a car detector, reasonable negative samples would be street and road backgrounds, trees, pedestrians, and other vehicles that are common in the street. So I don't think there is a universal rule for selecting negative samples; it depends on your needs.

what is the definition of "flexibility" of a method in Machine Learning? [closed]

I want to find the definition of the "flexibility" of a method in machine learning, for methods such as Lasso, SVM, or least squares.
Here is a representation of the tradeoff between flexibility and interpretability.
I also suspect that flexibility is something concrete and numerical.
Because of my reputation, I cannot upload the pictures. If you want the details, you can read An Introduction to Statistical Learning; the pictures are on pages 25 and 31.
Thank you.
You can think of the "flexibility" of a model as the model's "curviness" when graphing the model equation. A linear regression is said to be inflexible. On the other hand, if you have nine training sets that are each very different and each requires a less rigid (wigglier) decision boundary, the model will be deemed flexible, simply because it can't be a straight line.
Of course, there's an essential assumption that these models are adequate representations of the training data (a linear representation doesn't work well for highly spread-out data, and a jagged multinomial representation doesn't work well with straight lines).
As a result, a flexible model will:
Generalize well across different training sets
Come at the cost of higher variance (that's why flexible models are generally associated with low bias)
Perform better as model complexity and/or the number of data points increases (up to a point, beyond which it stops improving)
There's no rigorous definition of a method's flexibility. The aforementioned book says:
"We can try to address this problem by choosing flexible models that can fit many different possible functional forms for f."
In that sense, least squares is less flexible, since it's a linear model. Kernel SVM, on the contrary, doesn't have such a limitation and can model fancy non-linear functions.
Flexibility isn't measured in numbers; the picture in the book shows relational data only, not actual points on a 2D plane.
Flexibility describes the degrees of freedom available to the model to "fit" the training data.
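One way to make the notion concrete is to treat polynomial degree as a flexibility knob. In this illustrative sketch (synthetic data, arbitrary degrees), training error keeps falling as the model gets curvier, while test error eventually rises:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 60)).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.3, 60)
x_train, y_train, x_test, y_test = x[::2], y[::2], x[1::2], y[1::2]

for degree in (1, 3, 9, 15):  # increasing flexibility
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(x_train)),
          mean_squared_error(y_test, model.predict(x_test)))
```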

Automatic flora detection in Images [closed]

My image dataset is from http://www.image-net.org. There are various synsets for different things like flora, fauna, persons, etc.
I have to train a classifier which predicts 1 if the image belongs to the floral synset and 0 otherwise.
Images belonging to the floral synset can be viewed at http://www.image-net.org/explore, by clicking on the plant, flora, plant life option in the left pane.
These images include a wide variety of flora: trees, herbs, shrubs, flowers, etc.
I am not able to figure out what features to use to train the classifier. There is a lot of greenery in these images, but there are many flower images which don't have much of a green component. Another feature is the shape of the leaves and petals.
It would be helpful if anyone could suggest how to extract this shape feature and use it to train the classifier, and also what other features could be used.
And after extracting features, which algorithm should be used to train the classifier?
I'm not sure that shape information is the right approach for the dataset you have linked to.
Just having a quick glance at some of the images, I have a few suggestions for classification:
Natural scenes rarely have straight lines, so try line detection.
You can discount scenes which have swathes of "unnatural" colour in them.
If you want to try something more advanced, I would suggest that a hybrid between entropy and pattern recognition would form a good classifier, as natural scenes have a lot of both.
Attempting template matching or shape matching for leaves and petals will break your heart; you need something much more generalised.
As for which classifier to use, I'd normally advise k-means initially, and once you have some results, determine whether the extra effort to implement Bayes or a neural net would be worth it.
Hope this helps.
T.
Expanded:
"Unnatural colours" are highly saturated colours outside the realm of greens and browns. They are good for detecting nature scenes, as roughly 50% of the scene should be in the green/brown spectrum even if a flower is at the center of it.
Additionally, straight-line detection should yield few results in nature scenes, as straight edges are rare in nature. On a basic level, generate an edge image, threshold it, and then search for line segments (pixel runs which approximate a straight line); a sketch of both cues appears after this answer.
Entropy requires some machine vision knowledge. You would approach the scene by determining localised entropies and then histogramming the results; here is a similar approach that you will have to use.
You would want to be advanced at machine vision to attempt pattern recognition, as it is a difficult subject and not something you can throw together in a code sample. I would only attempt to implement it as a classifier once colour and edge (line) information has been exhausted.
If this is a commercial application, then a machine vision expert should be consulted. If this is a college assignment (unless it is a thesis), colour and edge/line information should be more than enough.
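As a starting point, here is a rough OpenCV sketch of the two simple cues above (green/brown colour fraction and straight-line count); the hue band and Hough parameters are illustrative guesses, not tuned values:

```python
import cv2
import numpy as np

def nature_cues(path):
    img = cv2.imread(path)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Fraction of pixels in a rough green/brown hue band (OpenCV hue is 0-179).
    mask = cv2.inRange(hsv, np.array((10, 40, 20)), np.array((90, 255, 255)))
    green_brown_fraction = mask.mean() / 255.0

    # Straight-line count: edge image, then probabilistic Hough transform.
    edges = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 100, 200)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=40, maxLineGap=5)
    n_lines = 0 if lines is None else len(lines)

    # High green/brown fraction and few straight lines suggest a natural scene.
    return green_brown_fraction, n_lines
```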
HOG features are pretty much the de facto standard for these kinds of problems, I think. They're a bit involved to compute (and I don't know what environment you're working in), but powerful.
A simpler solution, which might get you up and running depending on how hard the dataset is, is to extract all overlapping patches from the images, cluster them using k-means (or whatever you like), and then represent each image as a distribution over this set of quantised image patches for a supervised classifier like an SVM. You'd be surprised how often something like this works, and it should at least provide a competitive baseline.
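A compressed sketch of that patch-quantisation baseline might look like this; train_images (grayscale arrays) and train_labels are assumed inputs, and the patch size and vocabulary size are arbitrary starting points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.svm import SVC

PATCH = (8, 8)
K = 100  # size of the visual vocabulary

def sample_patches(img, n=200, seed=0):
    patches = extract_patches_2d(img, PATCH, max_patches=n, random_state=seed)
    return patches.reshape(len(patches), -1)

# 1. Learn the vocabulary by clustering patches pooled across training images.
all_patches = np.vstack([sample_patches(im) for im in train_images])
kmeans = KMeans(n_clusters=K, n_init=10).fit(all_patches)

# 2. Represent each image as a normalised histogram over the K visual words.
def bow_histogram(img):
    words = kmeans.predict(sample_patches(img))
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

X = np.array([bow_histogram(im) for im in train_images])
clf = SVC(kernel="rbf").fit(X, train_labels)  # 1 = flora, 0 = otherwise
```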

Neural network: should the algorithm be rewritten for every case? [closed]

I have two sequences of numbers and I'd like to continue them using neural algorithms (there is some logic in them, but I don't know what it is, and no external factors affect the selection). There are some relationships within each of the two sequences separately, as well as between them.
So, I'm new to machine learning, but I've had this idea: are there any already-written and well-working applications (libraries) that implement the exact algorithms, so that I don't have to learn them all before using them? Something like a "most-frequently-used neural algorithms kit".
I'm thinking of analysing some music sheets as two sequences: "notes" and "durations".
OK, according to the comments, I think I get what you want.
Generally, no, you don't need to rewrite the standard algorithm of an ANN. But be aware that "ANN" is not a single algorithm but a cluster of algorithms (including backpropagation ANNs, Hopfield ANNs, Boltzmann machines, etc.). Among them, I recommend the BP-ANN, which is simple and suitable for your project. You might want to input a sequence of the known notes and durations and expect an output of the next note and duration (a sketch of this framing appears below).
To use a BP-ANN, you don't need to rewrite it. Because it's a widely used algorithm, there are many toolkits and open-source implementations of it:
Google "back propagation neural network implementation" and you will find it easily. There are also a few open-source projects on GitHub (in both C and MATLAB): https://github.com/search?q=back+propagation&type=Everything&repo=&langOverride=&start_value=1
For further reading, if you also want to understand the details of its implementation in depth, read this: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1279&context=ecetr&sei-redir=1
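To illustrate the input/output framing suggested above, here is a small sketch using scikit-learn's MLPRegressor (a backpropagation network) in place of a dedicated toolkit; notes is an assumed 1-D array of numeric pitch values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

WINDOW = 4  # how many previous notes the network sees

# Sliding windows: the previous WINDOW notes predict the next one.
X = np.array([notes[i:i + WINDOW] for i in range(len(notes) - WINDOW)])
y = np.array([notes[i + WINDOW] for i in range(len(notes) - WINDOW)])

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
net.fit(X, y)

last = np.array(notes[-WINDOW:]).reshape(1, -1)
print(net.predict(last))  # predicted next note
```

The same framing works for durations, either as a second model or by stacking both values into each window.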
If you're interested in neural networks, there are plenty of libraries available. ANNIE is one such open-source example; the MATLAB Neural Network Toolbox is a commercial one. These are libraries to which you describe the architecture of the neural network and which let you train, test, verify, etc. The important part in all these machine learning methods is how you represent your data, and that is what the comments you were getting (for example, Predictor's) were about. Sometimes you get excellent results with one representation and very bad results with others.
There are also libraries to train SVMs with quadratic regularization; LIBSVM is one great example.
There is also plenty of work on predicting time series with neural networks (if that is what you want to do with the music; I am not sure what exactly you want).
If the input is a series of (note, duration) pairs, then I suspect you'd get much farther by summarizing the historical note-to-note transitions or something similar, in an effort to capture the syntax of the music (Markov analysis, etc.), than by stuffing this into a neural network; see the sketch below. It may also help to represent the series as note differentials, measuring how many notes up or down the scale the new note is, rather than the actual value of the note itself.
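A minimal sketch of that Markov idea: count note-to-note transitions and sample the next note from the learned distribution (notes is again an assumed list of pitch values):

```python
import random
from collections import Counter, defaultdict

# First-order Markov model over notes: estimate P(next | current) from counts.
transitions = defaultdict(Counter)
for cur, nxt in zip(notes, notes[1:]):
    transitions[cur][nxt] += 1

def next_note(current):
    counts = transitions[current]
    choices, weights = zip(*counts.items())
    return random.choices(choices, weights=weights)[0]
```

Applying the same counting to note differentials instead of absolute pitches implements the second suggestion.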
