Many packages, such as sklearn, use the RBF kernel as the default for SVM classification. I wonder if there is a proof/explanation of why this is the "best" kernel for general use.
Is there some analysis of the richness of the boundaries that can be written as a_1 K(x,x_1) + ... + a_m K(x,x_m) = 0 for different kernels?
I am looking for some references.
Thank you.
In principal component analysis (PCA), why do we need to calculate the eigenfaces to identify an unknown image? Why don't we just use similarity measures to find the best match between an unknown image and the images in the training data set?
I strongly suggest that you study PCA formally. It is not a difficult algorithm to understand.
PCA is a dimension reduction tool, not a classifier. In Scikit-Learn, all classifiers and estimators have a predict method which PCA does not. You need to fit a classifier on the PCA-transformed data. Scikit-Learn has many classifiers.
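As a minimal sketch of that workflow (using the digits dataset and logistic regression purely as an example classifier), you can chain PCA and a classifier in a Pipeline:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA only reduces dimensionality; the classifier on top does the predicting
clf = make_pipeline(PCA(n_components=40), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```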
I want a list of ML algorithms that are supported by Spark MLlib but not by Mahout, and a list of ML algorithms that are supported by Mahout but not by Spark MLlib. Thanks.
I think this page gives a good overview. It lists all the supported algorithms; if an algorithm is not shown there, you can assume it is not supported. A short PySpark sketch follows the list below.
Classification: logistic regression, naive Bayes, ...
Regression: generalized linear regression, survival regression, ...
Trees: decision trees, random forests, and gradient-boosted trees
Recommendation: alternating least squares (ALS)
Clustering: K-means, Gaussian mixture models (GMMs), ...
Topic modeling: latent Dirichlet allocation (LDA)
Frequent pattern mining: frequent itemsets, association rules, and sequential pattern mining
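To give an idea of what "supported" looks like in practice, here is a minimal sketch (with a toy, made-up DataFrame) that fits one of the listed algorithms, logistic regression, through the DataFrame-based pyspark.ml API:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# toy DataFrame with the columns MLlib estimators expect: "label" and "features"
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)),
     (1.0, Vectors.dense(2.0, 1.0)),
     (0.0, Vectors.dense(0.1, 1.2)),
     (1.0, Vectors.dense(2.2, 0.9))],
    ["label", "features"])

lr = LogisticRegression(maxIter=10)
model = lr.fit(train)
```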
I'm following a TensorFlow example that takes a bunch of features (real-estate related) and "expensive" (i.e., whether the house price is high) as the binary target.
I was wondering if the target could take more than just a 0 or 1. Let's say, 0 (not expensive), 1 (expensive), 3 (very expensive).
I don't think this is possible as the logistic regression model has asymptotes nearing 0 and 1.
This might be a stupid question, but I'm totally new to ML.
I think I found the answer myself. From Wikipedia:
First, the conditional distribution y|x is a Bernoulli distribution rather than a Gaussian distribution, because the dependent variable is binary. Second, the predicted values are probabilities and are therefore restricted to (0,1) through the logistic distribution function because logistic regression predicts the probability of particular outcomes.
Logistic regression is defined for binary classification tasks (for more details, see logistic_regression). For multi-class classification problems, you can use the softmax classification algorithm. The following tutorial shows how to write a softmax classifier with the TensorFlow library.
Softmax_Regression in Tensorflow
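As an illustration, here is a minimal softmax-classifier sketch using tf.keras; the three classes, the feature count, and the synthetic data are all made up for the example:

```python
import numpy as np
import tensorflow as tf

# hypothetical data: 4 real-estate features, labels 0/1/2 for not/expensive/very expensive
X = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 3, size=100)

# a single Dense layer with softmax output = multinomial (softmax) logistic regression
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="softmax", input_shape=(4,))
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=10, verbose=0)
```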
However, if your data set is not linearly separable (which is most often the case with real-world datasets), you have to use an algorithm that can handle nonlinear decision boundaries. Algorithms such as neural networks or SVMs with kernels would be a good choice. The following IPython notebook shows how to create a simple neural network in TensorFlow.
Neural Network in Tensorflow
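For a rough idea of how that differs from the softmax sketch above, adding a hidden layer is enough to get nonlinear decision boundaries (same made-up shapes as before):

```python
import tensorflow as tf

# same toy setup as above, but with a hidden layer so the model can learn
# nonlinear decision boundaries
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```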
Good Luck!
From the documentation, scikit-learn implements SVC, NuSVC and LinearSVC, which are classes capable of performing multi-class classification on a dataset. On the other hand, I also read that scikit-learn uses libsvm for its support vector machine algorithm. I'm a bit confused about the difference between the SVC and libsvm versions; for now my guess is that SVC is the support vector machine algorithm for the multi-class problem and libsvm is for the binary-class problem. Could anybody help me understand the difference between these?
They are just different implementations of the same algorithm. The SVM module (SVC, NuSVC, etc.) is a wrapper around the libsvm library and supports different kernels, while LinearSVC is based on liblinear and only supports a linear kernel. So:
SVC(kernel='linear')
is in theory "equivalent" to:
LinearSVC()
Because the implementations differ, in practice you will get different results; the most important differences are that LinearSVC only supports a linear kernel, is faster, and scales much better.
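Here is a quick sketch (on synthetic data) of the two fitted side by side; the learned coefficients are similar but not identical, since one goes through libsvm and the other through liblinear:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# both fit a linear decision boundary, but via different underlying libraries,
# so the coefficients will not match exactly
svc = SVC(kernel="linear").fit(X, y)
linear_svc = LinearSVC().fit(X, y)
print(svc.coef_)
print(linear_svc.coef_)
```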
This is a snapshot from the book Hands-On Machine Learning.
For all SVM versions, like C-SVM, ν-SVM, soft-margin SVM, etc., can a support vector not be a training sample?
No, it can't. A support vector is always a sample from the training set.
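You can see this directly in scikit-learn (a small sketch on synthetic data): SVC exposes support_, which holds indices into the training set, and support_vectors_ is just those rows of X:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

# clf.support_ indexes into the training set; every support vector is
# literally a row of the training data
assert np.allclose(clf.support_vectors_, X[clf.support_])
```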
This is a good thing, because it means SVMs are oblivious to the internal structure of their samples and their support vectors. Only the kernel function, which is separate from the SVM proper, has to know about the structure of samples. While most kernels operate on vectors of numbers, there exist kernels that operate on strings, trees, graphs, you name it.
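As a toy illustration of a kernel on non-vector data (this is not a real string kernel, just a made-up character-match similarity), scikit-learn's SVC accepts a precomputed Gram matrix:

```python
import numpy as np
from sklearn.svm import SVC

# toy "string kernel": similarity = number of positions where two strings agree
docs = ["spam", "scam", "ham!", "hams"]
y = [1, 1, 0, 0]

def gram(a, b):
    return np.array([[sum(c1 == c2 for c1, c2 in zip(s, t)) for t in b] for s in a],
                    dtype=float)

clf = SVC(kernel="precomputed").fit(gram(docs, docs), y)
print(clf.predict(gram(["spat"], docs)))  # kernel values against the training strings
```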
(Note that linear support vector machines can be trained without taking support vectors into account. I.e., when you train a linear model under hinge loss with appropriate regularization using an algorithm such as SGD, you get a model that is equivalent to an SVM with a linear kernel, but where the support vectors are implicit.)
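For that parenthetical point, a minimal sketch in scikit-learn (synthetic data again) is hinge loss plus L2 regularization trained with SGD, which never materializes the support vectors:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)

# hinge loss + L2 penalty optimized with SGD: equivalent in spirit to a
# linear-kernel SVM, but the support vectors stay implicit
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4).fit(X, y)
```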