Lasvm documentation and information [closed] - machine-learning

I have thousands of samples for training and testing, and I want to classify them with an SVM using an RBF kernel. The problem is that LIBSVM's RBF-kernel implementation becomes very slow with 10k or more samples, and most of that time is spent on the grid search over the kernel parameters.
I have read about LIBLINEAR and LaSVM, but LIBLINEAR is not what I want because SVMs with a linear kernel usually achieve lower accuracy than SVMs with an RBF kernel.
I searched for information on LaSVM but could not find much; the project site is very sparse. I want to know whether LaSVM supports the RBF kernel or only a specific kind of kernel, whether I should scale the training and test data, and whether I can run a grid search over the kernel parameters with cross-validation.

LaSVM has an RBF kernel implementation too. In my experience with large data (>100,000 instances in >1,000 dimensions), it is no faster than LIBSVM, though. If you really want to use a nonlinear kernel on huge data, you could try EnsembleSVM.
If your data is truly huge and you are not familiar with ensemble learning, LIBLINEAR is the way to go. If you have a high number of input dimensions, the linear kernel is usually not much worse than RBF while being orders of magnitude faster.
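If you go that route, a minimal sketch of the scale-then-grid-search workflow, assuming scikit-learn (whose LinearSVC wraps LIBLINEAR) and placeholder X_train/y_train arrays, might look like this:

```python
# Minimal sketch: scaling + cross-validated grid search over C with the
# LIBLINEAR-backed LinearSVC from scikit-learn (illustrative only).
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("scale", StandardScaler()),              # scale features, just as with kernel SVMs
    ("svm", LinearSVC(max_iter=10000)),
])

# Only C needs tuning for a linear SVM, so the grid stays small and fast.
param_grid = {"svm__C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
# search.fit(X_train, y_train)  # X_train, y_train: your training data
```

Because only C is tuned, the grid search is a fraction of the RBF-kernel search over (C, gamma), which is where most of the speedup comes from.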

Related

Why Up sampling over down sampling? [closed]

I have a dataset of 191 samples and have built a logistic regression model. I first ran the model on the raw data and then moved on to upsampling.
What I am not able to understand is:
Why do upsampling before downsampling, or both up- and down-sampling?
If upsampling creates a problem of overfitting, then it can be handled by scaling the data.
After upsampling (or any other sampling), which metrics should I look at to decide whether to try another sampling strategy, e.g., downsampling or combined up- and down-sampling?
I kindly request someone to help me understand the above.
Downsampling always means a loss of information, which is why it is generally best avoided.
Scaling is actually the best alternative. Typically the minority class is up-sampled because it is underrepresented compared to the majority class. As many algorithms try to minimise the empirical risk -- the probability of a misclassification -- they focus more on the majority class. The reason for upsampling/downsampling is that either the class distribution in the training data is not representative, or the cost of misclassifying the minority class is much higher, e.g., in predictive maintenance. The best way to correct for this is actually a cost matrix. However, as quite a few algorithms have no out-of-the-box mechanism for cost functions, upsampling/downsampling is often used as an approximation. Hence, upsampling is only to be preferred if additional "noise" can be introduced during the sampling process.
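As a hedged illustration (not from the original answer), many libraries expose class weights as a lightweight stand-in for a full cost matrix; in scikit-learn's LogisticRegression, for example, it might look like this (the weights and variable names are placeholders):

```python
# Minimal sketch: cost-sensitive logistic regression via class weights,
# as an alternative to up-/down-sampling (illustrative, assumes scikit-learn).
from sklearn.linear_model import LogisticRegression

# Penalise misclassifying the minority class (label 1) five times as heavily;
# class_weight="balanced" would derive the weights from class frequencies instead.
clf = LogisticRegression(class_weight={0: 1, 1: 5}, max_iter=1000)
# clf.fit(X_train, y_train)  # X_train, y_train: your (imbalanced) training data
```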

Why GPUs comes more into play than CPUS when it comes to dealing with Deep Learning? [closed]

In almost all of the cases I come across, GPUs are used for the execution part of deep learning. Why is that?
This has to do with GPU architecture versus CPU architecture. It turns out that gaming requires a lot of matrix multiplications, so GPU architecture was optimized for these types of operations; specifically, GPUs are optimized for high-throughput floating-point arithmetic. More on this here.
It so happens that neural networks are mostly matrix multiplications.
For example,
y = σ(W_o · σ(W_h · x + b_h) + b_o)
is the mathematical formulation of a simple neural network with one hidden layer. W_h is a matrix of weights that multiplies your input x, to which we add a bias b_h. The linear expression W_h · x + b_h can be computed as a single matrix multiplication. The σ is a nonlinear activation such as the sigmoid. The outer σ wraps another matrix multiplication: the output weights W_o times the hidden activations, plus a bias b_o. Hence GPUs.
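As a hedged illustration (not part of the original answer), here is that one-hidden-layer forward pass written as plain matrix multiplications in NumPy; the layer sizes and the sigmoid choice are assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example sizes: 4 input features, 8 hidden units, 1 output.
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(8, 4)), np.zeros(8)   # hidden-layer weights and bias
W_o, b_o = rng.normal(size=(1, 8)), np.zeros(1)   # output-layer weights and bias

x = rng.normal(size=4)                 # a single input vector
h = sigmoid(W_h @ x + b_h)             # hidden layer: matrix product + bias + nonlinearity
y = sigmoid(W_o @ h + b_o)             # output layer: another matrix product
```

Every step of the forward pass is a dense matrix (or matrix-vector) product, which is exactly the workload GPUs are built to parallelise.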

Dimensionality / noise reduction techniques for regression problems [closed]

What are some techniques for dimensionality reduction in regression problems? I have tried the only unsupervised techniques I know, PCA and Kernel PCA (using the scikit learn library), but I have not seen any improvements using these. Perhaps these are only suitable for classification problems? What are some other techniques I can try? Preferably, ones that are implemented in sklearn.
This is a very general question, and the suitability of the techniques (or combinations of them), really depends on your problem specifics.
In general, there are several categories of dimension reduction (aside from those you mentioned).
Perhaps the simplest form of dimension reduction is to just use some of the features, in which case we are really talking about feature selection (see sklearn's feature-selection module).
Another way would be to cluster the features (see sklearn's clustering module) and replace each cluster by an aggregate of its components.
Finally, some regressors use l1 penalization and properties of convex optimization to fit the model and select a subset of features at the same time; in sklearn, see the lasso and elastic net.
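A minimal sketch of these options in scikit-learn, with placeholder feature counts and hyperparameters, might look like this:

```python
# Minimal sketch of the options above (illustrative only;
# feature counts and hyperparameters are placeholder assumptions).
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso, ElasticNet

# 1) Plain feature selection: keep the k features most correlated with the target.
selector = SelectKBest(score_func=f_regression, k=10)
# X_reduced = selector.fit_transform(X, y)

# 2) L1-penalised regressors that select features while fitting.
lasso = Lasso(alpha=0.1)                    # coefficients of irrelevant features shrink to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of l1 and l2 penalties
# lasso.fit(X, y); kept = lasso.coef_ != 0
```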
Once again, this is a very broad problem. There are entire books, and even competitions, devoted to feature selection, which is a subset of dimension reduction.
Adding to @AmiTavory's good answer: PCA (principal component analysis) can be used here. If you do not wish to reduce the dimensionality, simply retain as many eigenvectors from the PCA as the dimensionality of the input: in your case 20.
The resulting output will be a set of orthogonal eigenvectors; you may consider them to provide the "transformation" you are seeking, as follows: the vectors are ranked by the amount of variance they explain with respect to the inputs.
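As a hedged sketch (assuming scikit-learn and the 20-dimensional input mentioned above):

```python
# Minimal sketch (illustrative only): keep all 20 components to get an
# orthogonal change of basis rather than a reduction, then inspect variance.
from sklearn.decomposition import PCA

pca = PCA(n_components=20)            # 20 = the assumed input dimensionality
# X_rotated = pca.fit_transform(X)    # same dimension, orthogonal axes
# pca.explained_variance_ratio_       # components ranked by explained variance
```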

Different usage of Machine Learning classifiers [closed]

I have learned about several classifiers in machine learning: decision trees, neural networks, SVMs, Bayesian classifiers, k-NN, etc.
Can anyone please help me understand when I should prefer one classifier over another? For example, in which situations (nature of the data set, etc.) should I prefer a decision tree over a neural net, or in which situations might an SVM work better than a Bayesian classifier?
Sorry if this is not a good place to post this question.
Thanks.
This is EXTREMELY related to the nature of the dataset. There are several meta-learning approaches that will tell you which classifier to use, but generally there isn't a golden rule.
If your data is easily separable (it is easy to distinguish entries from different classes), decision trees or SVMs with a linear kernel may be good enough. However, if your data needs to be transformed into other (higher-dimensional) spaces, kernel-based classifiers such as RBF SVMs might work well. SVMs also work better with non-redundant, independent features. When combinations of features are needed, artificial neural networks and Bayesian classifiers work well too.
Yet again, this is highly subjective and strongly depends on your feature set. For instance, having a single feature that is highly correlated with the class might determine which classifier works best. That said, overall, the no-free-lunch theorem says that no classifier is better for everything, but SVMs are generally regarded as the current best bet on binary classification.

How does google prediction API work [closed]

Can someone predict :) or guess how the Google Prediction API works under the hood?
I know there are some machine learning techniques:
Decision trees, neural networks, naive Bayesian classification, etc.
Which technique do you think Google is using?
The single answer to the question on Stats SE is good, given the limited information from Google itself. It concludes with the same thought I had: Google isn't telling what is inside the Google Prediction API.
There was a Reddit discussion about this too. The most helpful response was from a user who is credible (in my opinion) due to his prior work in that field. He wasn't certain what the Google Prediction API was using, but based on discussions on the Google Group for the Prediction API, he had some ideas about what it was NOT using:
the current implementation is not able to deal correctly with non-linear separable data sets (XOR and Circular). That probably means that they are fitting linear models such as regularized logistic regression or SVMs but not neural networks or kernel SVMs. Fitting linear models is very scalable to both wide problems (many features) and long problems (many samples) provided that you use... stochastic gradient descent with truncated gradients to handle sparsity inducing regularizers.
There was a little more, and of course, some other responses. Note that Google Prediction API has since released a new version, but it is not any more obvious (to me) how it works "under the hood".
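For what it's worth, the kind of model the quoted comment describes, a regularized linear classifier trained with stochastic gradient descent and a sparsity-inducing (L1) penalty, can be sketched in scikit-learn roughly as follows; this is only my guess at an analogue, not Google's actual implementation:

```python
# Hedged sketch (an analogue, not Google's code): a linear model trained with
# stochastic gradient descent and an L1 (sparsity-inducing) penalty,
# the kind of setup the quoted comment describes; scikit-learn is assumed.
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss",   # regularized logistic regression
                    penalty="l1",      # sparsity-inducing regularizer
                    alpha=1e-4)
# clf.partial_fit(X_batch, y_batch, classes=[0, 1])  # mini-batch updates scale to many samples
```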
