Is the gradient descent algorithm ever used when training unsupervised models, such as clustering or collaborative filtering?
Gradient descent can be used for a whole bunch of unsupervised learning tasks. In fact, neural networks, which are trained with gradient descent, are widely used for unsupervised learning tasks, such as learning vector-space representations of text or natural language (word2vec).
You can also think of dimensionality reduction techniques like autoencoders, which use Gradient Descent as well.
I am not aware of GD being used directly in clustering, but this link discusses an approach that combines k-means with autoencoders, which are trained with GD.
Read this link as well, which discusses a similar question.
In many unsupervised algorithms, you don't need gradient descent at all. For example, in k-means, where you are trying to minimize the mean squared error (MSE), you can minimize the error directly at each step: given the assignments, the optimal centroid is simply the mean of its assigned points, so no gradients are needed.
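A minimal sketch of that point in plain Python, assuming 1-D data and hypothetical function names (`assign`, `update`, `kmeans`): with the assignments fixed, the centroid that minimizes the squared error is the cluster mean, so each update is a closed-form step rather than a gradient step.

```python
def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: (p - centroids[k]) ** 2)
            for p in points]

def update(points, labels, centroids):
    """Closed-form step: each centroid becomes the mean of its assigned points."""
    new = []
    for k, c in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        # Keep the old centroid if a cluster happens to be empty.
        new.append(sum(members) / len(members) if members else c)
    return new

def kmeans(points, centroids, iters=10):
    """Alternate assignment and mean-update steps for a fixed number of rounds."""
    for _ in range(iters):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, assign(points, centroids)

# Toy data (made up for illustration): two well-separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, labels = kmeans(points, [0.0, 6.0])
print(centroids, labels)
```

No learning rate appears anywhere: both the assignment step and the mean step are exact minimizers of the objective with the other quantity held fixed.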
Where and why are linear and non-linear transformations useful? What are the use cases in machine learning and in deep learning, especially for computer vision?
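A neural net can be seen as a framework for composing linear transformations: each layer applies a linear (affine) map, followed by a non-linear activation.
Think of the math operations that define a linear transformation and compare them with the operations in a neural net.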
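To make the comparison concrete, here is a small sketch in plain Python. The weights, the input, and the choice of ReLU as the activation are all assumptions made for illustration. It shows that stacking two purely linear maps collapses into a single matrix, while inserting a non-linearity between them does not.

```python
def linear(W, x):
    """Matrix-vector product W @ x: a purely linear transformation."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    """Element-wise non-linearity (assumed here; any activation would do)."""
    return [max(0.0, a) for a in v]

def layer(W, b, x, activation=relu):
    """One dense layer: affine map W @ x + b followed by an activation."""
    z = [a + b_i for a, b_i in zip(linear(W, x), b)]
    return activation(z)

# Made-up weights and input.
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[2.0, 0.0], [1.0, 1.0]]
x = [1.0, 2.0]

# Without an activation, two stacked linear maps equal one matrix W2 @ W1.
W21 = [[sum(W2[i][k] * W1[k][j] for k in range(2)) for j in range(2)]
       for i in range(2)]
stacked = linear(W2, linear(W1, x))
collapsed = linear(W21, x)

# With ReLU in between, the composition is no longer a single linear map.
with_relu = linear(W2, relu(linear(W1, x)))
print(stacked, collapsed, with_relu)
```

This is one way to see why the non-linear part matters: without it, depth adds no expressive power beyond a single linear transformation.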
If by Machine Learning (ML) we mean any program that learns from data, then yes, regression can be said to be part of ML. But there are other aspects to Machine Learning, such as the solution being improved iteratively based on some performance measure. For linear regression, there is a closed-form solution: a direct formula that determines all the parameters without any iterations. There is also another version of parameter estimation for regression that uses gradient descent and involves several iterations. Does this mean that the iterative version is used only to force regression under the machine learning umbrella? Or does the iterative version have advantages that the direct formula does not offer?
I won't comment on whether regression is part of ML or not (I don't really see where your definitions came from). But regarding the advantage of an iterative approach, please note that the closed-form solution for linear regression is as follows:
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
where X is your design matrix, y is the vector of targets, and $\hat{\beta}$ is the vector of estimated parameters.
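Please note that inverting an n-by-n matrix with standard methods is an O(n^3) operation (here n is the number of features, i.e. the number of columns of X), which becomes infeasible for large n. This is the obvious advantage of the iterative approach using GD.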
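As a rough illustration, the sketch below fits a one-feature linear model both ways in plain Python. The data, learning rate, and step count are made up, and the function names are hypothetical. With a single feature the closed form is trivial, so this only shows that the two routes agree; the cost argument matters once the number of features is large.

```python
def normal_equation(xs, ys):
    """Closed form for y = w*x + b: solve the 2x2 system (X^T X) beta = X^T y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    w = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Iteratively minimize the mean squared error; no matrix inversion."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up, roughly linear data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]

w_cf, b_cf = normal_equation(xs, ys)
w_gd, b_gd = gradient_descent(xs, ys)
print(w_cf, b_cf, w_gd, b_gd)
```

Each gradient step costs time proportional to the number of samples times the number of features, which is why the iterative route scales to problems where forming and inverting the matrix is impractical.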
I intend to build a yes/no classifier. The problem is that the data does not come from me, so I have to work with what I have been given. I have around 150 samples; each sample contains 3 features, which are continuous numeric variables. I know the dataset is quite small. I would like to ask you two questions:
A) What would be the best machine learning algorithm for this? SVM? A neural network? Everything I have read seems to require a big dataset.
B) I could make the dataset a little bit bigger by adding some samples that do not contain all the features, only one or two. I have read that you can use sparse vectors in this case. Is this possible with every machine learning algorithm? (I have seen them used with SVMs.)
Thanks a lot for your help!!!
My recommendation is to use a simple and straightforward algorithm, like a decision tree or logistic regression, although the ones you refer to should work equally well.
The dataset size shouldn't be a problem, given that you have far more samples than variables. But having more data always helps.
Naive Bayes is a good choice for a situation when there are few training examples. When compared to logistic regression, it was shown by Ng and Jordan that Naive Bayes converges towards its optimum performance faster with fewer training examples. (See section 4 of this book chapter.) Informally speaking, Naive Bayes models a joint probability distribution that performs better in this situation.
Do not use a decision tree in this situation. Decision trees have a tendency to overfit, a problem that is exacerbated when you have little training data.
Is this process correct?
Suppose we have a bunch of data, such as MNIST.
We feed all of this data (without labels) to an RBM and then resample each data point from the trained model.
The output can then be treated as new data for classification.
Do I understand it correctly?
What is the purpose of using RBM?
You are correct, RBMs are a form of unsupervised learning algorithm that are commonly used to reduce the dimensionality of your feature space. Another common approach is to use autoencoders.
RBMs are trained using the contrastive divergence algorithm. The best overview of this algorithm comes from Geoffrey Hinton, who came up with it.
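For orientation, here is a minimal sketch of a single-step contrastive divergence (CD-1) update for a tiny binary RBM in plain Python. The layer sizes, learning rate, toy data, and all function names are assumptions chosen for illustration; it is not a tuned or complete implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, W, c):
    # W[i][j] connects visible unit i to hidden unit j; c holds hidden biases.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # b holds visible biases.
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update from a single binary vector v0; modifies W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)       # positive phase, driven by the data
    h0 = sample(ph0, rng)              # sampled hidden states
    pv1 = visible_probs(h0, W, b)      # one Gibbs step: the reconstruction
    ph1 = hidden_probs(pv1, W, c)      # negative phase, driven by the reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return pv1

rng = random.Random(0)
n_visible, n_hidden = 6, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # made-up binary patterns

for epoch in range(500):
    for v in data:
        recon = cd1_step(v, W, b, c, lr=0.1, rng=rng)

# The hidden probabilities are the reduced representation of an input.
features = hidden_probs(data[0], W, c)
print(features)
```

In the dimensionality-reduction use described above, it is the hidden-unit probabilities (here `features`) that would be passed on to a classifier.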
https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
A great paper about how unsupervised learning improves performance can be found at http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf. The paper shows that unsupervised learning provides better generalization and better filters (if using CRBMs).
I'm using an OpenCV Haar classifier in my work, but I keep reading conflicting reports on whether the OpenCV Haar classifier is an SVM or not. Can anyone clarify whether it uses an SVM? Also, if it does not use an SVM, what advantages does the Haar method offer over an SVM approach?
SVM and Boosting (AdaBoost, GentleBoost, etc.) are classification strategies/algorithms. Support Vector Machines solve a complex optimization problem, often using kernel functions, which allow us to separate samples by working in a much higher-dimensional feature space. Boosting, on the other hand, is a strategy based on combining lots of "cheap" (weak) classifiers in a smart way, which leads to very fast classification. Those weak classifiers can even be SVMs.
Haar-like features are a kind of feature computed from integral images and are very suitable for Computer Vision problems.
That is, you can combine Haar features with either of the two classification schemes.
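To illustrate why integral images make Haar-like features cheap, here is a small sketch in plain Python. The function names and the toy image are hypothetical. Once the integral image is built, the sum over any rectangle takes four lookups, so a two-rectangle feature costs a handful of additions regardless of its size.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left corner (x, y), via 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Made-up 4x4 image: bright left half, dark right half (a vertical edge).
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
print(feature)
```

The resulting number is just a feature value; whether it is fed to a boosted classifier or to an SVM is a separate choice.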
It isn't SVM. Here is the documentation:
http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#haar-feature-based-cascade-classifier-for-object-detection
It uses boosting (supporting AdaBoost and a variety of other similar methods -- all based on boosting).
The important difference is speed of evaluation, which matters a great deal in cascade classifiers. Their stage-based boosting algorithms allow very fast evaluation and high accuracy (in particular, they support training with many negatives), striking a better balance than an SVM for this particular application.
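The speed argument can be sketched as follows. This is a simplified illustration in plain Python, not OpenCV's actual implementation; the stage layout, weights, thresholds, and function names are all made up. Each stage is a small boosted vote of weak classifiers, and a window is rejected as soon as any stage fails, so most negatives cost only the first stage.

```python
def stump(feature_index, threshold, polarity=1):
    """Weak classifier: votes +1 or -1 based on a single feature value."""
    def classify(features):
        return 1 if polarity * features[feature_index] > polarity * threshold else -1
    return classify

def stage_score(stage, features):
    """Boosted stage: weighted vote of its weak classifiers."""
    return sum(alpha * weak(features) for alpha, weak in stage["weak"])

def cascade_predict(stages, features):
    """Return (accepted, stages_evaluated); stop at the first rejecting stage."""
    for i, stage in enumerate(stages, start=1):
        if stage_score(stage, features) < stage["threshold"]:
            return False, i
    return True, len(stages)

# Hypothetical two-stage cascade over three feature values.
stages = [
    {"weak": [(1.0, stump(0, 10.0))], "threshold": 0.0},
    {"weak": [(0.7, stump(1, 5.0)), (0.5, stump(2, 3.0))], "threshold": 0.0},
]

easy_negative = cascade_predict(stages, [2.0, 9.0, 9.0])   # rejected by stage 1
positive = cascade_predict(stages, [20.0, 9.0, 1.0])       # passes both stages
hard_negative = cascade_predict(stages, [20.0, 1.0, 9.0])  # rejected by stage 2
print(easy_negative, positive, hard_negative)
```

In a detector that scans many windows per image, nearly all of which are background, this early exit is where most of the time is saved.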