Unsupervised Classification in Machine Learning - machine-learning

Clustering (e.g. k-means, the EM algorithm, etc.) is used for unsupervised classification by forming clusters in a data set using some distance measure between data points.
My question is :
Other than clustering what can I use to perform unsupervised classification and how? Or is there no other option other than clustering for unsupervised Classification?
Edit: Yes I meant k-means

The short answer is no: clustering is not the only field under unsupervised learning. Unsupervised learning is much broader than clustering alone. Clustering is just one sub-field (or type) of unsupervised learning.
Little correction: KNN is not a clustering method, it is a classification algorithm. You probably meant to say k-means.
The essence of unsupervised learning is learning from data without ground-truth labels. Thus, the goal of unsupervised learning is to find useful representations of the given data. The applications of unsupervised learning vary a lot, though academically it is true that the field is less attractive to researchers due to its complexity and the effort needed to build new methods and/or make improvements.
Dimension reduction can be considered unsupervised learning, as you want to find a good representation of the data in fewer dimensions. These methods are also useful for visualizing high-dimensional data. PCA, SNE, t-SNE, Isomap, etc. are examples.
Clustering methods are a type of unsupervised learning as well, where you want to group and label values based on some distance/divergence measure. Examples include k-means, hierarchical clustering, etc.
Generative models model the distribution of the data itself, P(X) (or the class-conditional P(X|Y=y) when labels are available). Research in this field boomed after the publication of GANs (see paper). A GAN's generator learns the data distribution without seeing the data directly; it only receives feedback through the discriminator. There are various methods, including GANs, VAEs, Gaussian mixtures, LDA, and hidden Markov models.
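To make the Gaussian mixture idea a bit more concrete, here is a minimal sketch of fitting a two-component mixture in one dimension with EM and then sampling new points from it. The function names (`fit_gmm_1d`, `sample_gmm`), the initialisation at the data extremes, and the synthetic data are all my own assumptions for illustration, not a reference implementation.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    # Crude initialisation (an assumption): start the means at the data extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


def sample_gmm(weights, means, variances, n, rng):
    """Draw n new points from the fitted mixture (the generative part)."""
    samples = []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1
        samples.append(rng.gauss(means[k], math.sqrt(variances[k])))
    return samples


# Synthetic, unlabeled data: two groups centred near 0 and 10.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data)
new_points = sample_gmm(weights, means, variances, 5, rng)
```

The responsibilities computed in the E-step act as soft cluster assignments, so the same model gives both a grouping of the data and a way to generate new points.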
You can read further here on unsupervised learning.

Clustering is a general term that stands for the case where data points will be split into classes without any information about the true choices. So no matter what kind of algorithm you are applying, it will be a clustering if it is unsupervised classification.
Of course there are many different approaches depending on the case, data, problem, etc. If you could provide more context about your exact task, I might name some approaches.

Related

Multiple sensors = multiple deep learning models?

Let's say I have 30,000 vibration sensors monitoring 30,000 drills (1 sensor per drill) in different workplaces. I need to detect anomalies in vibration patterns.
Given we have enough historical data, how would you go about creating models for this problem?
This is a somewhat ambiguous question; however, you can follow these broad steps to perform anomaly detection:
Load the data into your computing environment, e.g. Python, MATLAB, or R. This assumes your data fits in memory; otherwise you may want to consider setting up a Hadoop or Spark cluster on Amazon EC2 or other virtual clusters.
Perform some exploratory data analysis (EDA) to understand your data better. This will reveal more about the underlying structure of the data, what kind of distribution it comes from, etc.
Make rough visual plots of your data if possible. This will come in handy when you need to polish some final plots for a presentation when reporting your analysis.
Based on the EDA, you can then prepare your data for processing. You may need to transform, rescale or standardize the dataset before applying any machine learning technique for anomaly detection.
For supervised datasets (i.e. labels are provided), you may consider algorithms such as SVMs, neural networks, XGBoost or any other appropriate supervised technique. However, great care must be taken in evaluating the results because, as is typical of anomaly detection datasets, there is more often than not a very small number of positive examples (y = 1) relative to the total number of examples. This is called class imbalance. There are various ways of mitigating this problem. See Class Imbalance Problem.
For unsupervised datasets, consider techniques such as density-based methods (e.g. Local Outlier Factor (LOF) and its variants, and k-Nearest Neighbor (kNN), which is a very popular method), One-class SVM, etc. Unsupervised methods for anomaly detection are surveyed in this study: A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.
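As a rough illustration of the kNN idea for the unsupervised case, the sketch below scores each point by its mean distance to its k nearest neighbours; a large score suggests an anomaly. The two features per drill, the function name `knn_anomaly_scores`, and the synthetic data are hypothetical choices of mine, and the brute-force search is only suitable for small samples.

```python
import math
import random


def knn_anomaly_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point (fine for a small sample).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


rng = random.Random(0)
# Hypothetical, already rescaled features per drill,
# e.g. (vibration amplitude, dominant frequency).
normal = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
points = normal + [(8.0, 8.0)]  # one obviously unusual reading appended last

scores = knn_anomaly_scores(points)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

In practice you would pick a threshold on the score (for example from a high percentile of historical scores) rather than just taking the maximum.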
N.B.:
- Don't forget to consider basic ML practices when building your models, such as splitting into a training set and a test set, or exploring resampling methods such as k-fold CV, LOOCV, etc. to control bias/variance in your results.
- Explore other techniques such as ensemble methods (i.e. boosting and bagging algorithms) to improve model accuracy.
- Deep learning techniques such as the Multi-layer Perceptron can be explored on this problem. If there is some time-series component, a Recurrent Neural Network (RNN) can be explored.

Is train/test-Split in unsupervised learning necessary/useful?

In supervised learning I have the typical train/test split to learn the algorithm, e.g. Regression or Classification. Regarding unsupervised learning, my question is: Is train/test split necessary and useful? If yes, why?
Well, this depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve the particular problem.
Roughly:
Dimensionality reduction techniques are usually tested by calculating the reconstruction error, so there we can use a k-fold cross-validation procedure.
For clustering algorithms, I would suggest doing statistical testing in order to assess performance. There is also a slightly time-consuming trick: split the dataset, hand-label the test set with meaningful classes, and cross-validate.
In any case, if an unsupervised algorithm is used on labelled (supervised) data, then it is always good to cross-validate.
Overall: it is not necessary to split the data into train and test sets, but if we can do it, it is always better.
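For the dimensionality reduction case, the held-out reconstruction error can be sketched roughly as follows. This assumes 2-D data reduced to one principal component, using the closed-form angle of the leading eigenvector of a 2x2 covariance matrix; the helper names and the synthetic data are my own, purely for illustration.

```python
import math
import random


def fit_pca_1d(points):
    """Return (mean, unit direction) of the first principal component of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed form for the major-axis angle of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def reconstruction_error(points, mean, direction):
    """Mean squared distance between points and their 1-D reconstructions."""
    total = 0.0
    for x, y in points:
        cx, cy = x - mean[0], y - mean[1]
        t = cx * direction[0] + cy * direction[1]    # 1-D code
        rx, ry = t * direction[0], t * direction[1]  # reconstruction
        total += (cx - rx) ** 2 + (cy - ry) ** 2
    return total / len(points)


def k_fold_error(points, k=5):
    """Fit on k-1 folds, measure reconstruction error on the held-out fold."""
    errors = []
    for fold in range(k):
        test = points[fold::k]
        train = [p for i, p in enumerate(points) if i % k != fold]
        mean, direction = fit_pca_1d(train)
        errors.append(reconstruction_error(test, mean, direction))
    return sum(errors) / k


rng = random.Random(0)
points = []
for _ in range(100):
    x = rng.gauss(0.0, 3.0)
    points.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))  # data close to a line

cv_error = k_fold_error(points, k=5)
```

The point of the split is that the projection is fitted without ever seeing the points it is scored on, exactly as in the supervised case.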
Here is an article which explains how cross-validation is a good tool for unsupervised learning:
http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ and the full text is available here http://arxiv.org/pdf/0909.3052.pdf
https://www.researchgate.net/post/Which_are_the_methods_to_validate_an_unsupervised_machine_learning_algorithm
Definitely it is useful.
A few points that I know about "why".
When it comes to testing a model, it should always be evaluated on unseen data. So it is better to have split the data using train_test_split.
The second point is that the data should always be shuffled before splitting. Otherwise, the model may be fitted on only n-1 of the n types of data, which may not give good results.

Machine Learning: Unsupervised Backpropagation

I'm having trouble with some of the concepts in machine learning through neural networks. One of them is backpropagation. In the weight updating equation,
delta_w = a*(t - y)*g'(h)*x
t is the "target output", which would be your class label, or something, in the case of supervised learning. But what would the "target output" be for unsupervised learning?
Can someone kindly provide an example of how you'd use BP in unsupervised learning, specifically for clustering or classification?
Thanks in advance.
The most common thing to do is train an autoencoder, where the desired outputs are equal to the inputs. This makes the network try to learn a representation that best "compresses" the input distribution.
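To show what "the desired outputs are equal to the inputs" looks like in the update rule from the question, here is a minimal sketch of a linear 2-1-2 autoencoder trained with per-sample gradient descent. With linear units g'(h) = 1, so the update reduces to a*(t - y)*x with t set to the input itself. The layer sizes, starting weights, learning rate and synthetic data are all assumptions of mine.

```python
import random


def train_autoencoder(data, epochs=100, lr=0.05):
    """Train a linear 2-1-2 autoencoder with per-sample gradient descent."""
    w_enc = [0.1, 0.1]  # input -> hidden weights (small fixed start, an assumption)
    w_dec = [0.1, 0.1]  # hidden -> output weights
    for _ in range(epochs):
        for x in data:
            h = w_enc[0] * x[0] + w_enc[1] * x[1]       # hidden code
            y = [w_dec[0] * h, w_dec[1] * h]            # reconstruction
            err = [x[0] - y[0], x[1] - y[1]]            # (t - y) with target t = x
            # Error propagated back to the hidden unit, using the old decoder weights.
            back = err[0] * w_dec[0] + err[1] * w_dec[1]
            w_dec = [w_dec[i] + lr * err[i] * h for i in range(2)]
            w_enc = [w_enc[j] + lr * back * x[j] for j in range(2)]
    return w_enc, w_dec


def mean_squared_error(data, w_enc, w_dec):
    """Average squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        h = w_enc[0] * x[0] + w_enc[1] * x[1]
        total += (x[0] - w_dec[0] * h) ** 2 + (x[1] - w_dec[1] * h) ** 2
    return total / len(data)


# Unlabeled 2-D data that lies close to a line, so one hidden unit can compress it.
rng = random.Random(0)
data = []
for _ in range(50):
    a = rng.uniform(-1.0, 1.0)
    data.append((a, 0.5 * a + rng.gauss(0.0, 0.02)))

error_before = mean_squared_error(data, [0.1, 0.1], [0.1, 0.1])
w_enc, w_dec = train_autoencoder(data)
error_after = mean_squared_error(data, w_enc, w_dec)
```

No labels appear anywhere: the only "teacher" is the input, and the single hidden value is the learned compressed representation.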
Here's a patent describing a different approach, where the output labels are assigned randomly and then sometimes flipped based on convergence rates. It seems weird to me, but okay.
I'm not familiar with other methods that use backpropagation for clustering or other unsupervised tasks. Clustering approaches with ANNs seem to use other algorithms (example 1, example 2).
I'm not sure which unsupervised machine learning algorithm uses backpropagation specifically; if there is one I haven't heard of it. Can you point to an example?
Backpropagation is used to compute the derivatives of the error function for training an artificial neural network with respect to the weights in the network. It's named as such because the "errors" are "propagating" through the network "backwards". You need it in this case because the final error with respect to the target depends on a function of functions (of functions ... depending on how many layers in your ANN.) The derivatives allow you to then adjust the values to improve the error function, tempered by the learning rate (this is gradient descent).
In unsupervised algorithms, you don't need to do this. For example, in k-Means, where you are trying to minimize the mean squared error (MSE), you can minimize the error directly at each step given the assignments; no gradients needed. In other clustering models, such as a mixture of Gaussians, the expectation-maximization (EM) algorithm is much more powerful and accurate than any gradient-descent based method.
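The k-means point can be sketched in a few lines: given the assignments, the mean of each cluster minimizes the squared error in closed form, so no gradient step is needed. This is a plain version of the usual alternating assignment/update loop; the random initialisation from data points and the synthetic blobs are assumptions for the example.

```python
import math
import random


def k_means(points, k, n_iter=20, seed=0):
    """Alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: the mean minimizes squared error for fixed assignments.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids


rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
blob_b = [(rng.gauss(10.0, 1.0), rng.gauss(10.0, 1.0)) for _ in range(50)]
centroids = sorted(k_means(blob_a + blob_b, k=2))
```

Each of the two steps can only lower (or keep) the total squared error, which is why the loop converges without a learning rate.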
What you might be asking is about unsupervised feature learning and deep learning.
Feature learning is the only unsupervised method I can think of with respect to NNs or their recent variants (there is a variant called a mixture of RBMs, analogous to a mixture of Gaussians, but you can build a lot of models based on the two). Basically, the two models I am familiar with are RBMs (restricted Boltzmann machines) and autoencoders.
Autoencoders (optionally, sparse activations can be encouraged in the optimization function) are just feedforward neural networks which tune their weights in such a way that the output is a reconstruction of the input. Multiple hidden layers can be used, but the weight initialization uses greedy layer-wise training for a better starting point. So, to answer the question, the target will be the input itself.
RBMs are stochastic networks, usually interpreted as graphical models, which have restrictions on connections. In this setting there is no output layer, and the connection between the input and latent layers is bidirectional, as in an undirected graphical model. What an RBM tries to learn is a distribution over the inputs (observed and unobserved variables). Here also the answer would be that the input is the target.
A mixture of RBMs (analogous to a mixture of Gaussians) can be used for soft clustering, or a KRBM (analogous to k-means) can be used for hard clustering. In effect this feels like learning multiple non-linear subspaces.
http://deeplearning.net/tutorial/rbm.html
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
An alternative approach is to use something like generative backpropagation. In this scenario, you train a neural network updating the weights AND the input values. The given values are used as the output values since you can compute an error value directly. This approach has been used in dimensionality reduction and matrix completion (missing value imputation), among other applications. For more information, see non-linear principal component analysis (NLPCA) and unsupervised backpropagation (UBP), which uses the idea of generative backpropagation. UBP extends NLPCA by introducing a pre-training stage. An implementation of UBP and NLPCA can be found in the waffles machine learning toolkit. The documentation for UBP and NLPCA can be found using the nlpca command.
To use back-propagation for unsupervised learning it is merely necessary to set t, the target output, at each stage of the algorithm to the class for which the average distance to each element of the class before updating is least. In short we always try to train the ANN to place its input into the class whose members are most similar in terms of our input. Because this process is sensitive to input scale it is necessary to first normalize the input data in each dimension by subtracting the average and dividing by the standard deviation for each component in order to calculate the distance in a scale-invariant manner.
The advantage to using a back-prop neural network rather than a simple distance from a center definition of the clusters is that neural networks can allow for more complex and irregular boundaries between clusters.
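The normalization step mentioned above (subtract the average and divide by the standard deviation in each dimension) might look like this. The function name `standardize` and the choice of the population standard deviation are my assumptions; a sample standard deviation would work just as well for this purpose.

```python
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]


# Two features on very different scales.
rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
z = standardize(rows)
```

After this step both columns contribute equally to any distance computation, which is what makes the "closest class" target described above scale-invariant.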

When should I use support vector machines as opposed to artificial neural networks?

I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM praising quotes).
But here is where I'm unclear -- do all of these claims of superiority hold for just the case of a 2 class decision problem or do they go further? (I assume they hold for non-linearly separable classes or else no-one would care)
So a sample of some of the cases I'd like to be cleared up:
Are SVMs better than ANNs with many classes?
in an online setting?
What about in a semi-supervised case like reinforcement learning?
Is there a better unsupervised version of SVMs?
I don't expect someone to answer all of these lil' subquestions, but rather to give some general bounds for when SVMs are better than the common ANN equivalents (e.g. FFBP, recurrent BP, Boltzmann machines, SOMs, etc.) in practice, and preferably, in theory as well.
Are SVMs better than ANN with many classes? You are probably referring to the fact that SVMs are, in essence, either one-class or two-class classifiers. Indeed they are; in its standard formulation, an SVM separates only two classes.
The fundamental feature of a SVM is the separating maximum-margin hyperplane whose position is determined by maximizing its distance from the support vectors. And yet SVMs are routinely used for multi-class classification, which is accomplished with a processing wrapper around multiple SVM classifiers that work in a "one against many" pattern--i.e., the training data is shown to the first SVM which classifies those instances as "Class I" or "not Class I". The data in the second class, is then shown to a second SVM which classifies this data as "Class II" or "not Class II", and so on. In practice, this works quite well. So as you would expect, the superior resolution of SVMs compared to other classifiers is not limited to two-class data.
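Here is a rough sketch of such a wrapper. To keep it self-contained I use a very simple linear two-class SVM trained by subgradient descent on the regularized hinge loss, which is a stand-in of my own and not what a production library does. Picking the class with the largest decision score at prediction time is also my assumption; the description above does not say how the individual outputs are combined.

```python
import random


def decision_score(model, x):
    """Signed score w.x + b of one two-class model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(xs, ys, epochs=200, lr=0.01, lam=0.01):
    """Linear two-class SVM via subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * decision_score((w, b), x)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 shrinkage on every step
            if margin < 1.0:  # point is inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(xs, labels):
    """Train one 'this class' vs 'not this class' SVM per class."""
    models = {}
    for cls in sorted(set(labels)):
        ys = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_linear_svm(xs, ys)
    return models


def predict(models, x):
    """Pick the class whose SVM gives the largest decision score (an assumption)."""
    return max(models, key=lambda cls: decision_score(models[cls], x))


rng = random.Random(0)
centers = {"Class I": (0.0, 0.0), "Class II": (4.0, 0.0), "Class III": (0.0, 4.0)}
samples = [
    ((cx + rng.gauss(0.0, 0.4), cy + rng.gauss(0.0, 0.4)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(30)
]
rng.shuffle(samples)
xs = [x for x, _ in samples]
labels = [label for _, label in samples]

models = train_one_vs_rest(xs, labels)
accuracy = sum(predict(models, x) == lab for x, lab in zip(xs, labels)) / len(xs)
```

The wrapper itself is only the two small functions `train_one_vs_rest` and `predict`; any two-class classifier that produces a score could be dropped in.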
As far as I can tell, the studies reported in the literature confirm this. E.g., in the provocatively titled paper Sex with Support Vector Machines, substantially better resolution for sex identification (male/female) in 12-square pixel images was reported for SVM compared with that of a group of traditional linear classifiers; SVM also outperformed RBF NN, as well as a large ensemble of RBF NNs. But there seems to be plenty of similar evidence for the superior performance of SVM in multi-class problems: e.g., SVM outperformed NN in protein-fold recognition and in time-series forecasting.
My impression from reading this literature over the past decade or so, is that the majority of the carefully designed studies--by persons skilled at configuring and using both techniques, and using data sufficiently resistant to classification to provoke some meaningful difference in resolution--report the superior performance of SVM relative to NN. But as your Question suggests, that performance delta seems to be, to a degree, domain specific.
For instance, NN outperformed SVM in a comparative study of author identification from texts in Arabic script; In a study comparing credit rating prediction, there was no discernible difference in resolution by the two classifiers; a similar result was reported in a study of high-energy particle classification.
I have read, from more than one source in the academic literature, that SVM outperforms NN as the size of the training data decreases.
Finally, the extent to which one can generalize from the results of these comparative studies is probably quite limited. For instance, in one study comparing the accuracy of SVM and NN in time series forecasting, the investigators reported that SVM did indeed outperform a conventional (back-propagating over layered nodes) NN but performance of the SVM was about the same as that of an RBF (radial basis function) NN.
[Are SVMs better than ANN] In an Online setting? SVMs are not used in an online setting (i.e., incremental training). The essence of SVMs is the separating hyperplane whose position is determined by a small number of support vectors. So even a single additional data point could in principle significantly influence the position of this hyperplane.
What about in a semi-supervised case like reinforcement learning? Until the OP's comment to this answer, I was not aware of either Neural Networks or SVMs used in this way--but they are.
The most widely used semi-supervised variant of SVM is named Transductive SVM (TSVM), first mentioned by Vladimir Vapnik (the same guy who discovered/invented the conventional SVM). I know almost nothing about this technique other than what it is called and that it follows the principles of transduction (roughly lateral reasoning--i.e., reasoning from training data to test data). Apparently TSVM is a preferred technique in the field of text classification.
Is there a better unsupervised version of SVMs? I don't believe SVMs are suitable for unsupervised learning. Separation is based on the position of the maximum-margin hyperplane determined by support vectors. This could easily be my own limited understanding, but I don't see how that would happen if those support vectors were unlabeled (i.e., if you didn't know beforehand what you were trying to separate). One crucial use case of unsupervised algorithms is when you don't have labeled data or you do and it's badly unbalanced. E.g., online fraud; here you might have in your training data only a few data points labeled as "fraudulent accounts" (and usually with questionable accuracy) versus the remaining >99% labeled "not fraud." In this scenario, a one-class classifier, a typical configuration for SVMs, is a good option. In particular, the training data consists of instances labeled "not fraud" and "unk" (or some other label to indicate they are not in the class)--in other words, "inside the decision boundary" and "outside the decision boundary."
I wanted to conclude by mentioning that, 20 years after their "discovery", the SVM is a firmly entrenched member in the ML library. And indeed, the consistently superior resolution compared with other state-of-the-art classifiers is well documented.
Their pedigree is both a function of their superior performance documented in numerous rigorously controlled studies as well as their conceptual elegance. W/r/t the latter point, consider that multi-layer perceptrons (MLP), though they are often excellent classifiers, are driven by a numerical optimization routine, which in practice rarely finds the global minimum; moreover, that solution has no conceptual significance. On the other hand, the numerical optimization at the heart of building an SVM classifier does in fact find the global minimum. What's more that solution is the actual decision boundary.
Still, I think SVM's reputation has declined a little during the past few years.
The primary reason, I suspect, is the Netflix competition. Netflix emphasized the resolving power of fundamental techniques of matrix decomposition and, even more significantly, the power of combining classifiers. People combined classifiers long before Netflix, but more as a contingent technique than as an attribute of classifier design. Moreover, many of the techniques for combining classifiers are extraordinarily simple to understand and also to implement. By contrast, SVMs are not only very difficult to code (in my opinion, by far the most difficult ML algorithm to implement in code) but also difficult to configure and use as a pre-compiled library--e.g., a kernel must be selected, the results are very sensitive to how the data is re-scaled/normalized, etc.
I loved Doug's answer. I would like to add two comments.
1) Vladimir Vapnik also co-invented the VC dimension which is important in learning theory.
2) I think that SVMs were the best overall classifiers from 2000 to 2009, but after 2009, I am not sure. I think that neural nets have improved very significantly recently due to the work in Deep Learning and Sparse Denoising Auto-Encoders. I thought I saw a number of benchmarks where they outperformed SVMs. See, for example, slide 31 of
http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/nips10-workshop-tutorial-final.pdf
A few of my friends have been using the sparse auto encoder technique. The neural nets built with that technique significantly outperformed the older back propagation neural networks. I will try to post some experimental results at artent.net if I get some time.
I'd expect SVMs to be better when you have good features to start with, i.e., your features succinctly capture all the necessary information. You can tell your features are good if instances of the same class "clump together" in the feature space. Then an SVM with a Euclidean kernel should do the trick. Essentially you can view an SVM as a supercharged nearest neighbor classifier, so whenever nearest neighbor does well, SVM should do even better, by adding automatic quality control over the examples in your set. Conversely, if it's a dataset where nearest neighbor (in feature space) is expected to do badly, SVM will do badly as well.
- Is there a better unsupervised version of SVMs?
Just answering only this question here. Unsupervised learning can be done by so-called one-class support vector machines. Again, similar to normal SVMs, there is an element that promotes sparsity. In normal SVMs only a few points are considered important, the support vectors. In one-class SVMs again only a few points can be used to either:
"separate" a dataset as far from the origin as possible, or
define a radius as small as possible.
The advantages of normal SVMs carry over to this case. Compared to density estimation only a few points need to be considered. The disadvantages carry over as well.
Are SVMs better than ANNs with many classes?
SVMs have been designed for discrete classification. Before moving to ANNs, try ensemble methods like Random Forest, Gradient Boosting, Gaussian Probability Classification, etc.
What about in a semi-supervised case like reinforcement learning?
Deep Q learning provides better alternatives.
Is there a better unsupervised version of SVMs?
SVM is not suited for unsupervised learning. You have other alternatives for unsupervised learning: K-Means, hierarchical clustering, t-SNE, etc.
From the ANN perspective, you can try an autoencoder or a generative adversarial network.
A few more useful links:
towardsdatascience
wikipedia

What is the difference between supervised learning and unsupervised learning? [closed]

In terms of artificial intelligence and machine learning, what is the difference between supervised and unsupervised learning?
Can you provide a basic, easy explanation with an example?
Since you ask this very basic question, it looks like it's worth specifying what Machine Learning itself is.
Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine learning algorithm for face detection in images would try to define what a face is (round skin-like-colored disk, with dark area where you expect the eyes etc). A machine learning algorithm would not have such coded definition, but would "learn-by-examples": you'll show several images of faces and not-faces and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.
This particular example of face detection is supervised, which means that your examples must be labeled, or explicitly say which ones are faces and which ones aren't.
In an unsupervised algorithm your examples are not labeled, i.e. you don't say anything. Of course, in such a case the algorithm itself cannot "invent" what a face is, but it can try to cluster the data into different groups, e.g. it can distinguish that faces are very different from landscapes, which are very different from horses.
Since another answer mentions it (though, in an incorrect way): there are "intermediate" forms of supervision, i.e. semi-supervised and active learning. Technically, these are supervised methods in which there is some "smart" way to avoid a large number of labeled examples. In active learning, the algorithm itself decides which thing you should label (e.g. it can be pretty sure about a landscape and a horse, but it might ask you to confirm if a gorilla is indeed the picture of a face). In semi-supervised learning, there are two different algorithms which start with the labeled examples, and then "tell" each other the way they think about some large number of unlabeled data. From this "discussion" they learn.
Supervised learning is when the data you feed your algorithm with is "tagged" or "labelled", to help your logic make decisions.
Example: Bayes spam filtering, where you have to flag an item as spam to refine the results.
Unsupervised learning covers types of algorithms that try to find correlations without any external inputs other than the raw data.
Example: data mining clustering algorithms.
Supervised learning
Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems.
Unsupervised learning
In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering
Pattern Recognition and Machine Learning (Bishop, 2006)
In supervised learning, the input x is provided with the expected outcome y (i.e., the output the model is supposed to produce when the input is x), which is often called the "class" (or "label") of the corresponding input x.
In unsupervised learning, the "class" of an example x is not provided. So, unsupervised learning can be thought of as finding "hidden structure" in unlabelled data set.
Approaches to supervised learning include:
Classification (1R, Naive Bayes, decision tree learning algorithms such as ID3 and CART, and so on)
Numeric Value Prediction
Approaches to unsupervised learning include:
Clustering (K-means, hierarchical clustering)
Association Rule Learning
I can tell you an example.
Suppose you need to recognize which vehicle is a car and which one is a motorcycle.
In the supervised learning case, your input (training) dataset needs to be labelled, that is, for each input element in your input (training) dataset, you should specify if it represents a car or a motorcycle.
In the unsupervised learning case, you do not label the inputs. The unsupervised model groups the inputs into clusters based, e.g., on similar features/properties. So, in this case, there are no labels like "car".
For instance, very often training a neural network is supervised learning: you're telling the network which class the feature vector you're feeding it corresponds to.
Clustering is unsupervised learning: you let the algorithm decide how to group samples into classes that share common properties.
Another example of unsupervised learning is Kohonen's self organizing maps.
I have always found the distinction between unsupervised and supervised learning to be arbitrary and a little confusing. There is no real distinction between the two cases; instead there is a range of situations in which an algorithm can have more or less 'supervision'. The existence of semi-supervised learning is an obvious example where the line is blurred.
I tend to think of supervision as giving feedback to the algorithm about what solutions should be preferred. For a traditional supervised setting, such as spam detection, you tell the algorithm "don't make any mistakes on the training set"; for a traditional unsupervised setting, such as clustering, you tell the algorithm "points that are close to each other should be in the same cluster". It just so happens that, the first form of feedback is a lot more specific than the latter.
In short, when someone says 'supervised', think classification, when they say 'unsupervised' think clustering and try not to worry too much about it beyond that.
Supervised Learning
Supervised learning is based on training on a data sample from a data source with the correct classification already assigned. Such techniques are utilized in feedforward or MultiLayer Perceptron (MLP) models. These MLPs have three distinctive characteristics:
One or more layers of hidden neurons that are not part of the input or output layers of the network and that enable the network to learn and solve complex problems,
The nonlinearity reflected in the neuronal activity is differentiable, and
The interconnection model of the network exhibits a high degree of connectivity.
These characteristics, along with learning through training, solve difficult and diverse problems. Learning through training in a supervised ANN model is also called the error backpropagation algorithm. The error-correction learning algorithm trains the network based on the input-output samples and finds the error signal, which is the difference between the calculated output and the desired output, and adjusts the synaptic weights of the neurons in proportion to the product of the error signal and the input instance of the synaptic weight. Based on this principle, error backpropagation learning occurs in two passes:
Forward Pass:
Here, the input vector is presented to the network. This input signal propagates forward, neuron by neuron, through the network and emerges at the output end of the network as the output signal: y(n) = φ(v(n)), where v(n) is the induced local field of a neuron defined by v(n) = Σ w(n)y(n). The output calculated at the output layer, o(n), is compared with the desired response d(n) to find the error e(n) for that neuron. The synaptic weights of the network remain the same during this pass.
Backward Pass:
The error signal that originates at the output neuron of that layer is propagated backward through the network. This calculates the local gradient for each neuron in each layer and allows the synaptic weights of the network to undergo changes in accordance with the delta rule:
Δw(n) = η * δ(n) * y(n).
This recursive computation is continued, with a forward pass followed by a backward pass for each input pattern, until the network has converged.
The supervised learning paradigm of an ANN is efficient and finds solutions to several linear and non-linear problems such as classification, plant control, forecasting, prediction, robotics, etc.
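As a small illustration of the two passes and the delta rule Δw(n) = η * δ(n) * y(n), here is a sketch for a single sigmoid neuron, where the local gradient is δ = e * φ'(v). Training on the logical OR function, the learning rate and the epoch count are my own choices for the example; a real MLP would repeat the backward step layer by layer.

```python
import math


def sigmoid(v):
    """Logistic activation function phi(v)."""
    return 1.0 / (1.0 + math.exp(-v))


def predict(weights, inputs):
    """Forward pass: induced local field v, then output phi(v)."""
    y_in = [1.0] + list(inputs)  # constant 1.0 acts as the bias input
    v = sum(w * y for w, y in zip(weights, y_in))
    return sigmoid(v)


def train_neuron(samples, eta=0.5, epochs=2000):
    """Repeat forward and backward passes over all input patterns."""
    weights = [0.0, 0.0, 0.0]  # bias weight, w1, w2
    for _ in range(epochs):
        for inputs, desired in samples:
            y_in = [1.0] + list(inputs)
            o = predict(weights, inputs)   # forward pass (weights unchanged)
            e = desired - o                # error signal e(n) = d(n) - o(n)
            delta = e * o * (1.0 - o)      # local gradient: e * phi'(v)
            # Backward pass: delta rule, weight change = eta * delta * input.
            weights = [w + eta * delta * y for w, y in zip(weights, y_in)]
    return weights


# Labelled training set for logical OR: (inputs, desired response).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_neuron(samples)
predictions = [round(predict(weights, inputs)) for inputs, _ in samples]
```

The desired response d(n) in each sample is exactly the supervision: remove it and there is no error signal to propagate.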
Unsupervised Learning
Self-Organizing neural networks learn using unsupervised learning algorithm to identify hidden patterns in unlabelled input data. This unsupervised refers to the ability to learn and organize information without providing an error signal to evaluate the potential solution. The lack of direction for the learning algorithm in unsupervised learning can sometime be advantageous, since it lets the algorithm to look back for patterns that have not been previously considered. The main characteristics of Self-Organizing Maps (SOM) are:
It transforms an incoming signal pattern of arbitrary dimension into a
one- or two-dimensional map and performs this transformation adaptively.
The network has a feedforward structure with a single
computational layer consisting of neurons arranged in rows and
columns. At each stage of representation, each input signal is kept
in its proper context.
Neurons dealing with closely related pieces of information are close
together and communicate through synaptic connections.
The computational layer is also called the competitive layer, since the neurons in the layer compete with each other to become active. Hence, this learning algorithm is called a competitive algorithm. The unsupervised algorithm in SOM works in three phases:
Competition phase:
For each input pattern x presented to the network, the inner product with each synaptic weight vector w_j is calculated. The neurons in the competitive layer evaluate this discriminant function, which induces competition among them, and the neuron whose synaptic weight vector is closest to the input vector in Euclidean distance is declared the winner of the competition. That neuron is called the best matching neuron,
i.e. i(x) = arg min_j ║x - w_j║.
Cooperative phase:
The winning neuron determines the center of a topological neighborhood h of cooperating neurons. The neighborhood is determined by the lateral distance d between the cooperating neurons on the map, and it shrinks over time.
Adaptive phase:
The winning neuron and its neighboring neurons increase their individual values of the discriminant function in relation to the input pattern through suitable synaptic weight adjustments,
Δw = ηh(x)(x - w),
where η is the learning rate and h(x) is the neighborhood function centered on the neuron that wins for input x.
Upon repeated presentation of the training patterns, the synaptic weight vectors tend to follow the distribution of the input patterns due to the neighborhood updating, and thus the ANN learns without a supervisor.
The Self-Organizing Map naturally represents neurobiological behavior and hence is used in many real-world applications such as clustering, speech recognition, texture segmentation and vector coding.
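The three phases above can be sketched in a few lines of Python. This is only a minimal illustration for a one-dimensional map, under several assumptions that are not in the original text: a Gaussian neighborhood function, linearly decaying learning rate and neighborhood width, random 2-D inputs, and the names best_matching_unit and train_som.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance (same arg min as the distance itself)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(x, weights):
    """Competition phase: index of the weight vector closest to x."""
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def train_som(data, n_neurons=5, epochs=50, eta0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        eta = eta0 * decay                 # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.1)   # neighborhood width shrinks too
        for x in data:
            winner = best_matching_unit(x, weights)
            for j in range(n_neurons):
                # Cooperative phase: neighborhood h from lateral distance d.
                d = abs(j - winner)
                h = math.exp(-(d * d) / (2.0 * sigma * sigma))
                # Adaptive phase: delta_w = eta * h * (x - w).
                weights[j] = [w + eta * h * (xk - w)
                              for w, xk in zip(weights[j], x)]
    return weights


# Hypothetical unlabelled data: 200 random points in the unit square.
data_rng = random.Random(1)
data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
som = train_som(data)
```

No target output appears anywhere in the loop; the weights move only in response to the inputs themselves.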
Reference.
There are many answers already which explain the differences in detail. I found these GIFs on Codecademy, and they often help me explain the differences effectively.
Supervised Learning
Notice that the training images have labels here and that the model is learning the names of the images.
Unsupervised Learning
Notice that what's being done here is just grouping (clustering) and that the model doesn't know anything about any image.
Machine learning:
It explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions.
Supervised learning:
It is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. Specifically, a supervised learning algorithm takes a known set of input data and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new data.
Unsupervised learning:
It is learning without a teacher. One basic thing that you might want to do with data is to visualize it. Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning. Unsupervised learning uses procedures that attempt to find natural partitions of patterns.
With unsupervised learning there is no feedback based on the prediction results, i.e., there is no teacher to correct you. Under unsupervised learning methods, no labeled examples are provided and there is no notion of the output during the learning process. As a result, it is up to the learning scheme/model to find patterns or discover the groups in the input data.
You should use unsupervised learning methods when you have a large
amount of data to train your models, the willingness and ability
to experiment and explore, and of course a challenge that isn't well
solved via more-established methods. With unsupervised learning it is
possible to learn larger and more complex models than with supervised
learning. Here is a good example of it.
Supervised Learning: You give labelled example data as input, along with the correct answers. The algorithm learns from it and then predicts results for new inputs. Example: email spam filter.
Unsupervised Learning: You just give data and don't tell the algorithm anything, such as labels or correct answers. The algorithm automatically analyses patterns in the data. Example: Google News.
Supervised learning:
Say a kid goes to kindergarten. There the teacher shows him 3 toys: a house, a ball and a car. Now the teacher gives him 10 toys.
He will sort them into 3 boxes (house, ball and car) based on his previous experience.
So the kid was first supervised by the teacher to get the right answers for a few examples. Then he was tested on unknown toys.
Unsupervised learning:
Again the kindergarten example. A child is given 10 toys and is told to group similar ones together.
Based on features like shape, size, color, function etc., he will try to make 3 groups, say A, B and C, and sort the toys into them.
The word supervise means you are giving supervision/instruction to the machine to help it find answers. Once it has learned from the instructions, it can easily predict for a new case.
Unsupervised means there is no supervision or instruction on how to find answers/labels, and the machine will use its intelligence to find some pattern in our data. Here it will not make predictions; it will just try to find clusters of similar data.
Supervised learning: given the data with an answer.
Given email labeled as spam/not spam, learn a spam filter.
Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
Unsupervised learning: given the data without an answer, let the PC group things.
Given a set of news articles found on the web, group them into sets of articles about the same story.
Given a database of customer data, automatically discover market segments and group customers into different market segments.
Reference
Supervised Learning
In this, every input pattern that is used to train the network is
associated with an output pattern, which is the target or the desired
pattern. A teacher is assumed to be present during the learning
process, when a comparison is made between the network's computed
output and the correct expected output, to determine the error. The
error can then be used to change network parameters, which result in
an improvement in performance.
Unsupervised Learning
In this learning method, the target output is not presented to the
network. It is as if there is no teacher to present the desired
pattern and hence, the system learns on its own by discovering and
adapting to structural features in the input patterns.
I'll try to keep it simple.
Supervised Learning: In this technique of learning, we are given a data set and the system already knows the correct output for the data set. So here, our system learns by predicting a value of its own. Then it checks its accuracy by using a cost function to measure how close its prediction was to the actual output.
Unsupervised Learning: In this approach, we have little or no knowledge of what our result should be. So instead, we derive structure from the data where we don't know the effect of the variables.
We build this structure by clustering the data based on relationships among the variables in the data.
Here, we don't have feedback based on our prediction.
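To make the supervised half of this concrete, here is a small Python sketch of "predict, then check with a cost function". It fits a straight line by gradient descent on a mean squared error cost. The data (t = 2x + 1), the learning rate and the names mse_cost and fit_line are assumptions chosen for illustration, and mean squared error is only one of many possible cost functions.

```python
def mse_cost(w, b, xs, ts):
    """Mean squared error between predictions w*x + b and known outputs t."""
    return sum((w * x + b - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


def fit_line(xs, ts, eta=0.05, steps=2000):
    """Fit t ~ w*x + b by gradient descent on the MSE cost (sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(steps):
        # Check how close the current predictions are to the known outputs.
        history.append(mse_cost(w, b, xs, ts))
        # Gradients of the cost with respect to w and b.
        grad_w = sum(2 * (w * x + b - t) * x for x, t in zip(xs, ts)) / n
        grad_b = sum(2 * (w * x + b - t) for x, t in zip(xs, ts)) / n
        # Move the parameters in the direction that lowers the cost.
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b, history


# Hypothetical labelled data generated from t = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = fit_line(xs, ts)
```

The known outputs ts are what make this supervised: without them there would be nothing to compute the cost against, which is exactly the missing feedback in the unsupervised case.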
Supervised learning
You have input x and a target output t. So you train the algorithm to generalize to the missing parts. It is supervised because the target is given. You are the supervisor telling the algorithm: For the example x, you should output t!
Unsupervised learning
Although segmentation, clustering and compression are usually counted in this direction, I have a hard time coming up with a good definition for it.
Let's take auto-encoders for compression as an example. While you only have the input x given, it is the human engineer who tells the algorithm that the target is also x. So in some sense, this is not different from supervised learning.
And for clustering and segmentation, I'm not too sure if it really fits the definition of machine learning (see other question).
Supervised Learning: You have labeled data and have to learn from that, e.g. house data along with prices, from which you learn to predict the price.
Unsupervised learning: You have to find the trend and then predict; no prior labels are given,
e.g. there are different people in a class, and when a new person comes you decide which group this new student belongs to.
In supervised learning we know what the input and output should be. For example, given a set of cars, we have to find out which ones are red and which ones are blue.
Whereas unsupervised learning is where we have to find the answer with very little or no idea about what the output should be. For example, a learner might be able to build a model that detects when people are smiling based on the correlation of facial patterns and words such as "what are you smiling about?".
Supervised learning can assign a new item to one of the trained labels based on what it learned during training. You need to provide a large amount of data, split into training, validation and test sets. If you provide, say, pixel image vectors of digits along with their labels as training data, then it can identify the numbers.
Unsupervised learning does not require labeled training data. It can group items into different clusters based on the differences between the input vectors. If you provide pixel image vectors of digits and ask it to group them into 10 categories, it may do that. But it does not know how to label them, as you have not provided training labels.
Supervised learning is basically where you have input variables (x) and an output variable (y) and use an algorithm to learn the mapping function from the input to the output. The reason we call this supervised is that the algorithm learns from the training dataset: it iteratively makes predictions on the training data and is corrected using the known answers.
Supervised learning has two types: classification and regression.
Classification is when the output variable is a category like yes/no or true/false.
Regression is when the output is a real value like the height of a person, temperature etc.
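The difference between the two types is only in the kind of output, which a small Python sketch can show. Here the same nearest-neighbor idea is used twice: a majority vote gives a category, an average gives a real value. The data set (hours studied versus pass/fail and versus exam score) and the function names are hypothetical, invented for this example.

```python
from collections import Counter


def k_nearest(train, x, k):
    """Targets of the k training inputs closest to x (1-D inputs)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]


def knn_classify(train, x, k=3):
    """Classification: the output is a category (majority vote)."""
    return Counter(k_nearest(train, x, k)).most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression: the output is a real value (mean of the neighbors)."""
    neighbors = k_nearest(train, x, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical labelled data: (hours studied, target).
pass_fail = [(1, "no"), (2, "no"), (3, "no"), (6, "yes"), (7, "yes"), (8, "yes")]
scores = [(1, 35.0), (2, 40.0), (3, 50.0), (6, 70.0), (7, 80.0), (8, 90.0)]

label = knn_classify(pass_fail, 6.5)   # a category: "yes"
score = knn_regress(scores, 6.5)       # a real value: 80.0
```

In both calls the inputs are the same kind of thing; what changes is whether the known answers are categories or numbers.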
Unsupervised learning is where we have only input data (X) and no output variables.
This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
Types of unsupervised learning are clustering and association.
Supervised learning is basically a technique in which the training data from which the machine learns is already labelled. Take, for example, a simple even/odd number classifier where you have already classified the data used during training. It therefore uses "LABELLED" data.
Unsupervised learning, on the contrary, is a technique in which the machine labels the data by itself. Or you can say it's the case when the machine learns by itself from scratch.
In simple terms:
Supervised learning is a type of machine learning problem in which we have labels, and using those labels we apply algorithms such as regression and classification. Classification is applied where the output takes a form like
0 or 1, true/false, yes/no, and regression is applied where the output is a real value such as the price of a house.
Unsupervised learning is a type of machine learning problem in which we don't have any labels, meaning we have only some data (unstructured data), and we have to cluster (group) the data using various unsupervised algorithms.
Supervised Machine Learning
"The process of an algorithm learning from a training dataset and
predicting the output."
The accuracy of the predicted output generally improves with the amount of training data.
Supervised learning is where the training dataset gives you input variables (X) together with an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output. A separate testing dataset is then used to check the learned mapping.
Y = f(X)
Major types:
Classification (discrete output Y)
Predictive, i.e. regression (continuous output Y)
Algorithms:
Classification Algorithms:
Neural Networks
Naïve Bayes classifiers
Fisher linear discriminant
KNN
Decision Tree
Support Vector Machines
Predictive Algorithms:
Nearest neighbor
Linear Regression, Multiple Regression
Application areas:
Classifying emails as spam
Classifying whether a patient has a
disease or not
Voice recognition
Predicting whether HR will select a particular candidate or not
Predicting the stock market price
Supervised learning:
A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
We provide training data and we know correct output for a certain input
We know relation between input and output
Categories of problem:
Regression: Predict results within a continuous output => map input variables to some continuous function.
Example:
Given a picture of a person, predict his age
Classification: Predict results in a discrete output => map input variables into discrete categories
Example:
Is this tumor cancerous?
Unsupervised learning:
Unsupervised learning learns from data that has not been labeled, classified or categorized. Unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data.
We can derive this structure by clustering the data based on relationships among the variables in the data.
There is no feedback based on the prediction results.
Categories of problem:
Clustering: is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters)
Example:
Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.
Popular use cases are listed here.
Difference between classification and clustering in data mining?
References:
Supervised_learning
Unsupervised_learning
machine-learning from coursera
towardsdatascience
Supervised Learning
Unsupervised Learning
Example:
Supervised Learning:
One bag with apple
One bag with orange
=> build model
One mixed bag of apple and orange.
=> Please classify
Unsupervised Learning:
One mixed bag of apple and orange.
=> build model
Another mixed bag
=> Please classify
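The apple and orange example can be written out as a short Python sketch. Everything specific here is an assumption made for illustration: each fruit is described by two made-up features (redness, surface roughness), the supervised model is a simple nearest-centroid classifier, and the unsupervised model is a two-cluster k-means with a naive deterministic initialization.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest(centers, x):
    """Key (label or cluster index) of the center closest to x."""
    return min(centers, key=lambda key: sq_dist(centers[key], x))


def fit_supervised(bags):
    """One centroid per labelled bag; the labels come from the teacher."""
    return {label: mean(fruits) for label, fruits in bags.items()}


def fit_unsupervised(mixed, iters=10):
    """2-means on a mixed bag; the clusters only get the ids 0 and 1."""
    first = mixed[0]
    far = max(mixed, key=lambda p: sq_dist(first, p))
    centers = {0: first, 1: far}  # naive initialization (assumption)
    for _ in range(iters):
        groups = {0: [], 1: []}
        for p in mixed:
            groups[nearest(centers, p)].append(p)
        centers = {c: (mean(g) if g else centers[c]) for c, g in groups.items()}
    return centers


# Hypothetical features: (redness, surface roughness), both in [0, 1].
labelled_bags = {
    "apple": [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15)],
    "orange": [(0.2, 0.8), (0.1, 0.9), (0.15, 0.85)],
}
mixed_bag = [(0.9, 0.1), (0.2, 0.8), (0.8, 0.2),
             (0.1, 0.9), (0.85, 0.15), (0.15, 0.85)]
another_bag = [(0.88, 0.12), (0.12, 0.88), (0.82, 0.18)]

supervised_model = fit_supervised(labelled_bags)
unsupervised_model = fit_unsupervised(mixed_bag)

supervised_labels = [nearest(supervised_model, f) for f in another_bag]
cluster_ids = [nearest(unsupervised_model, f) for f in another_bag]
```

The supervised model answers with the names "apple" and "orange"; the unsupervised model can only say "group 0" or "group 1", because nobody ever told it what the groups are called.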
In simple words.. :) It's my understanding, feel free to correct.
Supervised learning is when we know what we are predicting on the basis of the provided data. So we have a column in the dataset which needs to be predicted.
Unsupervised learning is when we try to extract meaning out of the provided dataset. We don't have clarity on what is to be predicted. So the question is, why do we do this? :) The answer is that the outcome of unsupervised learning is groups/clusters (similar data together). So if we receive any new data, we associate it with an identified cluster/group and understand its features.
I hope it will help you.
Supervised learning
Supervised learning is where we know the output for the raw input, i.e. the data is labelled, so that during training the machine learning model understands what it needs to detect in the given output. The labels guide the system during training to detect the pre-labelled objects, and on that basis it will detect objects similar to those we provided in training.
Here the algorithms know the structure and pattern of the data. Supervised learning is used for classification.
As an example, we can have different objects whose shapes are square, circle and triangle; our task is to arrange the same types of shapes together.
The labelled dataset has all the shapes labelled, and we train the machine learning model on that dataset; on the basis of the training dataset it will start detecting the shapes.
Unsupervised learning
Unsupervised learning is unguided learning where the end result is not known. It will cluster the dataset and, based on similar properties of the objects, divide them into different bunches and detect the objects.
Here the algorithms search for different patterns in the raw data and, based on those, cluster the data. Unsupervised learning is used for clustering.
As an example, we can have different objects of multiple shapes (square, circle, triangle), so it will make the bunches based on the object properties: if an object has four sides it will be grouped with the squares, if it has three sides with the triangles, and if it has no sides with the circles. Here the data is not labelled; it will learn by itself to detect the various shapes.
Machine learning is a field where you try to make a machine mimic human behavior.
You train a machine just like a baby. The way humans learn, identify features, recognize patterns and train themselves, the same way you train a machine by feeding it data with various features. The machine learning algorithm identifies the patterns within the data and classifies it into a particular category.
Machine learning is broadly divided into two categories, supervised and unsupervised learning.
Supervised learning is the concept where you have input vectors / data with corresponding target values (outputs). On the other hand, unsupervised learning is the concept where you only have input vectors / data without any corresponding target values.
An example of supervised learning is handwritten digit recognition, where you have images of digits with the corresponding digit [0-9], and an example of unsupervised learning is grouping customers by purchasing behavior.

Resources