Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
I am learning Linear Algebra (started recently) and am curious about its applications in Machine Learning. Where can I read about this?
Thank you
Linear Algebra provides the computational engine for the majority of Machine Learning algorithms.
For instance, probably the most conspicuous and most frequent application of ML
is the recommendation engine.
Aside from data retrieval, the real crux of these algorithms is often
'reconstruction' of the ridiculously sparse data used as input for these engines.
The raw data supplied to Amazon.com's user-based R/E is (probably) a massive
data matrix in which the users are the rows and its products are represented
in the columns. To populate this matrix organically, every customer would have to
purchase every product Amazon.com sells. Linear algebra-based techniques are used to estimate the missing entries instead.
All of the techniques in current use involve some type of matrix decomposition, a fundamental
class of linear algebra techniques (non-negative matrix approximation and
positive-maximum-margin-matrix approximation (warning: link to PDF!) are perhaps the two most common).
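To make the idea of "reconstructing" a sparse user-by-product matrix concrete, here is a minimal sketch. It is an assumption-laden illustration, not either of the two methods named above: it swaps in plain regularized matrix factorization trained by stochastic gradient descent, and the ratings, the rank, and the function names `factorize` and `predict` are all hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, steps=2000, lr=0.02, reg=0.01, seed=0):
    """Fit user factors P and item factors Q to the observed cells only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Estimated rating = dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Hypothetical toy data: (user, item) -> rating; 4 of the 12 cells are unobserved.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1,
           (2, 1): 1, (2, 2): 5, (3, 0): 1, (3, 2): 4}
P, Q = factorize(ratings, n_users=4, n_items=3)

# Error on the observed cells, and an estimate for a cell nobody filled in.
rmse = (sum((r - predict(P, Q, u, i)) ** 2
            for (u, i), r in ratings.items()) / len(ratings)) ** 0.5
estimate_for_missing_cell = predict(P, Q, 0, 2)
```

The product of the two small factor matrices yields a value for every cell, including the ones no customer ever populated, which is the sense in which the decomposition fills in the matrix.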
Second, many if not most ML techniques rely on a numerical optimization technique.
E.g., most supervised ML algorithms involve creation of a trained classifier/regressor by minimizing the delta between the value calculated by the nascent classifier and
the actual value from the training data. This can be done either iteratively or using linear algebra
techniques. If the latter, then the technique is usually SVD or some variant.
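As a hedged illustration of the non-iterative route, the sketch below solves a small least-squares problem through the SVD. The data are made up (points lying exactly on y = 2x + 1), and the variable names are illustrative; `np.linalg.svd` and `np.linalg.lstsq` are existing NumPy calls.

```python
import numpy as np

# Hypothetical training data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Least-squares weights via the pseudo-inverse: w = V @ diag(1/s) @ U.T @ y
w = Vt.T @ ((U.T @ y) / s)

# Cross-check against NumPy's built-in least-squares solver
w_ref = np.linalg.lstsq(A, y, rcond=None)[0]
```

Here `w` holds the slope and intercept that minimize the squared difference between predicted and actual values, obtained in one shot rather than by iterating.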
Third, the spectral-based decompositions--PCA (principal component analysis)
and kernel PCA--are perhaps the most commonly used dimension-reduction techniques,
often applied in a pre-processing step just ahead of the ML algorithm in the data flow.
For instance, PCA is often used in a Kohonen Map to initialize the
lattice. The principal insight underneath these techniques is that the eigenvectors of the covariance matrix (a square, symmetric matrix with the variances down the main diagonal, prepared from the original data matrix) can be scaled to unit length and are orthogonal to each other.
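A small sketch of that insight, under assumptions: the data are synthetic, and the mixing matrix and variable names are invented for illustration. It builds the covariance matrix, takes its eigenvectors with `np.linalg.eigh`, and projects the data onto the top two.

```python
import numpy as np

# Synthetic data: 200 samples, 3 correlated features (mixing matrix is made up)
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mix

# Center the data and form the 3x3 covariance matrix (square and symmetric)
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # sort descending by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Dimension reduction: keep the two directions with the largest variance
Z = Xc @ eigvecs[:, :2]
```

The columns of `eigvecs` form an orthonormal set, so `eigvecs.T @ eigvecs` is the identity, and the variance of each projected column equals the corresponding eigenvalue.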
In machine learning, we generally deal with data in the form of vectors and matrices. Nearly any statistical method used involves linear algebra as an integral part. It is also useful in data mining.
SVD and PCA are well-known dimensionality reduction techniques that rely on linear algebra.
Bayesian decision theory also involves a significant amount of linear algebra. You can look into that as well.
Singular value decomposition (SVD) is a classic method widely used in Machine Learning.
I find this article fairly easy to follow; it explains an SVD-based recommendation system: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/ .
Strang's linear algebra book also contains a section on the application of SVD to ranking web pages (the HITS algorithm); see Google Books.
Closed last year.
I am looking for a research paper or book that has a good, basic definition of what supervised and unsupervised learning are, so that I can quote these definitions in my project.
Thank you so much.
I would refer to the following book: Artificial Intelligence: A Modern Approach (3rd Edition) by Stuart Russell and Peter Norvig. Chapter 18, from page 693 on, contains an analysis of supervised and unsupervised learning. About unsupervised learning:
In unsupervised learning, the agent learns patterns in the input
even though no explicit feedback is supplied.
The most common unsupervised learning task is clustering:
detecting potentially useful clusters of input examples.
For example, a taxi agent might gradually develop a concept
of “good traffic days” and “bad traffic days” without ever being
given labeled examples of each by a teacher
While for supervised:
In supervised learning, the agent observes some example input–output
pairs
and learns a function that maps from input to output. In component 1 above,
the inputs are percepts and the output are provided by a teacher
who says “Brake!” or “Turn left.” In component 2, the inputs are camera
images and the outputs again come from a teacher who says “that’s a bus.”
In 3, the theory of braking is a function from states and braking actions
to stopping distance in feet. In this case the output value is available
directly from the agent’s percepts (after the fact); the environment
is the teacher.
The examples are mentioned in the text above.
Christopher M. Bishop, "Pattern Recognition and Machine Learning", p.3 (emphasis mine)
Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems...
In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data,
where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
Which is as good as you can get. Basically, the most noticeable difference is whether we have labels with respect to which we want the learning model to optimize. If we have only some of the labels, it can still be described as weakly-supervised learning. If no labels are available, the only thing left is to find some structure in the data.
Thanks @Pavel Tyshevskyi for the answer. Your answer is perfect, but it seems a little bit hard to understand for beginners like me.
After an hour of searching, I found my own version of the answer in the book "Machine Learning For Dummies, IBM Limited Edition", in the part "Approaches to Machine Learning" of chapter 1, "Understanding Machine Learning". It has a simpler definition and an example that helped me understand a bit better. Link to the book: Machine Learning For Dummies, IBM Limited Edition
Supervised learning
Supervised learning typically begins with an established set of data and a certain understanding of how that data is classified. Supervised learning is intended to find patterns in data that can be applied to an analytics process. This data has labeled features that define the meaning of data. For example, there could be millions of images of animals and include an explanation of what each animal is and then you can create a machine learning application that distinguishes one animal from another. By labeling this data about types of animals, you may have hundreds of categories of different species. Because the attributes and the meaning of the data have been identified, it is well understood by the users that are training the modeled data so that it fits the details of the labels. When the label is continuous, it is a regression; when the data comes from a finite set of values, it known as classification. In essence, regression used for supervised learning helps you understand the correlation between variables. An example of supervised learning is weather forecasting. By using regression analysis, weather forecasting takes into account known historical weather patterns and the current conditions to provide a prediction on the weather.
The algorithms are trained using preprocessed examples, and at this point, the performance of the algorithms is evaluated with test data. Occasionally, patterns that are identified in a subset of the data can’t be detected in the larger population of data. If the model is fit to only represent the patterns that exist in the training subset, you create a problem called overfitting. Overfitting means that your model is precisely tuned for your training data but may not be applicable for large sets of unknown data. To protect against overfitting, testing needs to be done against unforeseen or unknown labeled data. Using unforeseen data for the test set can help you evaluate the accuracy of the model in predicting outcomes and results. Supervised training models have broad applicability to a variety of business problems, including fraud detection, recommendation solutions, speech recognition, or risk analysis.
Unsupervised learning
Unsupervised learning is best suited when the problem requires a massive amount of data that is unlabeled. For example, social media applications, such as Twitter, Instagram, Snapchat, and.....
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 months ago.
I'm trying to classify/cluster subjects according to 4 features in two classes: healthy and sick.
Two things to know: I know the labels/classes of each subject + I only have 40 subjects (in total: training + testing set!)
What should I choose in this case, clustering or classification?
Clustering vs. classification is not a choice of method but a choice of problem. What is the problem at hand? You have labeled data and want a model that can label more; this is by definition classification. Which specific classification method to use is a whole new, research-driven question rather than a simple programming issue. In particular, many classifiers will try to fit some sort of generative model to the data (and thus learn about the structure even without labels), but in the end the labels are there and should be used.
Clustering is based on unsupervised learning and classification is based on supervised learning. Unsupervised learning is used when you don't have the target labels; it groups the data into clusters. Supervised learning is used when you have labeled data.
You mention that you have labels, so go for classification algorithms like logistic regression, SVM, etc. Also, with a small dataset you should guard against overfitting; to do so, prefer simple algorithms.
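To illustrate the "simple algorithm, small dataset" advice, here is a sketch using only the standard library. The classifier (nearest centroid) and the evaluation scheme (leave-one-out cross-validation) are my illustrative choices, not ones named in the answers, and the 40 four-feature subjects are synthetic stand-ins for the real data.

```python
import random

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose training centroid is closest."""
    cents = {label: centroid([r for r, l in zip(train_X, train_y) if l == label])
             for label in set(train_y)}
    return min(cents, key=lambda label: sqdist(cents[label], x))

def leave_one_out_accuracy(X, y):
    """Hold out each subject once, train on the remaining ones, test on the held-out one."""
    hits = 0
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(tr_X, tr_y, X[i]) == y[i]
    return hits / len(X)

# Synthetic stand-in for the question's data: 40 subjects, 4 features, 2 classes
rng = random.Random(0)
X = ([[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)] +
     [[rng.gauss(3.0, 1.0) for _ in range(4)] for _ in range(20)])
y = ["healthy"] * 20 + ["sick"] * 20

acc = leave_one_out_accuracy(X, y)
```

With only 40 subjects, leave-one-out uses every subject for testing exactly once, which avoids setting aside a separate test set that the data cannot really spare.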
Classification is a type of supervised learning. In classification, you know the algorithm needs to predict from a finite set of outputs. For example, suppose the input data has information about people who were offered a credit card. The algorithm learns patterns from the input data and the output column (took the credit card or not). Once trained, it predicts for unseen data whether a person will take the credit card or not. In this example there is a finite number of outputs (2 in this case: take the credit card or not), so the problem can be solved using classification.
Clustering belongs to unsupervised learning. It mainly deals with data that is not labelled. A clustering algorithm separates the data into groups based on similar characteristics.
I have some questions about SVM:
1- Why use SVM? In other words, what motivated its development?
2- The state of the art (2017)
3- What improvements have been made?
SVMs work very well. In many applications, they are still among the best-performing algorithms.
We've seen some progress, in particular on linear SVMs, which can be trained much faster than kernel SVMs.
Read more literature. Don't expect an exhaustive answer in this QA format. Show more effort on your behalf.
SVMs are most commonly used for classification problems where labeled data is available (supervised learning) and are useful for modeling with limited data. For problems with unlabeled data (unsupervised learning), support vector clustering is a commonly employed algorithm. SVM tends to perform better on binary classification problems since the decision boundaries will not overlap. Your 2nd and 3rd questions are very ambiguous (and need lots of work!), but suffice it to say that SVMs have found wide applicability in medical data science. Here's a link to explore more about this: Applications of Support Vector Machine (SVM) Learning in Cancer Genomics
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
There are several components and techniques used in learning programs. Machine learning components include ANN, Bayesian networks, SVM, PCA and other probability based methods. What role do Bayesian networks based techniques play in machine learning?
Also it would be helpful to know how does integrating one or more of these components into applications lead to real solutions, and how does software deal with limited knowledge and still produce sufficiently reliable results.
Probability and Learning
Probability plays a role in all learning. If we apply Shannon's information theory, the movement of probability toward one of the extremes 0.0 or 1.0 is information. In Shannon's terms, the information gained, measured in bits, is the log_2 of the quotient of the after and before probabilities of a hypothesis. Given the probability of the hypothesis and its logical inversion, if the probability does not increase for either, no bits of information have been learned.
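One reading of the definition above, stated as an assumption: the information gained about a hypothesis is the base-2 logarithm of the ratio of its probability after an observation to its probability before. The function name `bits_gained` is hypothetical.

```python
import math

def bits_gained(p_before, p_after):
    """Bits of information: log2 of (probability after / probability before)."""
    return math.log2(p_after / p_before)

# A fair coin-flip hypothesis that becomes certain carries exactly one bit.
one_bit = bits_gained(0.5, 1.0)

# If the probability does not move, nothing has been learned.
no_change = bits_gained(0.5, 0.5)
```

Under this reading, a hypothesis that moves from 0.25 to 0.5 also yields one bit, since its probability has doubled.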
Bayesian Approaches
Bayesian Networks are directed graphs that represent causality hypotheses. They are generally represented as nodes with conditions connected by arrows that represent the hypothetical causes and corresponding effects. Algorithms have been developed based on Bayes' Theorem that attempt to statistically analyze causality from data that had been or is being collected.
MINOR SIDE NOTE: There are often usage constraints for the analytic tools. Most Bayesian algorithms require that the directed graph be acyclic, meaning that no series of arrows, followed in their direction, leads from a node back to itself. This is to avoid endless loops; however, there may be, now or in the future, algorithms that work with cycles and handle them seamlessly from mathematical theory and software usability perspectives.
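The acyclicity constraint can be checked mechanically. The sketch below uses Kahn's topological-sort algorithm, which is my choice for illustration and not something the answer specifies; the node names in the usage lines are made up.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (parent, child) pairs has no cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Repeatedly remove nodes that have no remaining incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node that could never be removed sits on, or downstream of, a cycle.
    return removed == len(nodes)

# Hypothetical causal structures
valid_network = is_acyclic([("rain", "wet_grass"), ("sprinkler", "wet_grass")])
looping_network = is_acyclic([("a", "b"), ("b", "c"), ("c", "a")])
```

A graph that passes this check can be processed in an order where every cause is visited before its effects, which is what lets inference algorithms terminate.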
Application to Learning
The application to learning is that the probabilities calculated can be used to predict potential control mechanisms. The litmus test for learning is the ability to reliably alter the future through controls. An important application is the sorting of mail from handwriting. Both neural nets and Naive Bayesian classifiers can be useful in general pattern recognition integrated into routing or manipulation robotics.
Keep in mind here that the term network has a very wide meaning. Neural Nets are not at all the same approach as Bayesian Networks, although they may be applied to similar problem-solution topologies.
Relation to Other Approaches and Mechanisms
How a system designer uses support vector machines, principal component analysis, neural nets, and Bayesian networks in multivariate time series analysis (MTSA) varies from author to author. How they tie together also depends on the problem domain and statistical qualities of the data set, including size, skew, sparseness, and the number of dimensions.
The list given includes only four of a much larger set of machine learning tools. For instance Fuzzy Logic combines weights and production system (rule based) approaches.
The year is also a factor. An answer given now might be stale next year. If I were to write software given the same predictive or control goals as I was given ten years ago, I might combine various techniques entirely differently. I would certainly have a plethora of additional libraries and comparative studies to read and analyse before drawing my system topology.
The field is quite active.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am a computer science student and I have to choose the theme of my future research work. I really want to solve some scientific problems in chemistry (or maybe biology) using computers. I also have a huge interest in machine learning.
I have been searching the internet for a while and have found some references on these kinds of problems. Unfortunately, that material is not enough for me.
So, I am interested in the Community's recommendation of particular resources that present the application of an ML technique to solve a problem in chemistry--e.g., a journal article or a good book describing typical (or the new ones) problems in chemistry being solved "in silico".
I should think that chemistry, as much as any domain, would have the richest supply of problems particularly suited for ML. The rubric of problems I have in mind is QSAR (quantitative structure-activity relationships), both for naturally occurring compounds and prospectively, e.g., drug design.
Perhaps have a look at AZOrange--an entire ML library built for the sole purpose of solving chemistry problems using ML techniques. In particular, AZOrange is a re-implementation of the highly-regarded GUI-driven ML Library, Orange, specifically for the solution of QSAR problems.
In addition, here are two particularly good ones--both published within the last year, and in both ML is at the heart (the link is to the article's page on the Journal of Cheminformatics site and includes the full text of each article):
AZOrange-High performance open source machine learning for QSAR modeling in a graphical programming environment.
2D-Qsar for 450 types of amino acid induction peptides with a novel substructure pair descriptor having wider scope
It seems to me that the general nature of QSAR problems is ideal for study by ML:
a highly non-linear relationship between the expectation variables
(e.g., "features") and the response variable (e.g., "class labels" or
"regression estimates")
at least for the larger molecules, the structure-activity
relationship is sufficiently complex that it is at least several
generations from solution by analytical means, so any hope of
accurate prediction of these relationships can only be reliably
realized by empirical techniques
oceans of training data pairing analysis of some form of
instrument-produced data (e.g., protein structure determined by x-ray
crystallography) with laboratory data recording the chemical
behavior of that protein (e.g., reaction kinetics)
So here are a couple of suggestions for interesting and current areas of research at the ML-chemistry interface:
QSAR prediction applying current "best practices"; for instance, the technique that won the NetFlix Prize (awarded Sept 2009) was not based on a state-of-the-art ML algorithm; instead it used kNN. The interesting aspects of the winning technique are:
the data imputation technique--the technique for re-generating the data rows having one or more feature missing; the particular
technique for solving this sparsity problem is usually referred to by
the term Positive Maximum Margin Matrix Factorization (or
Non-Negative Maximum Margin Matrix Factorization). Perhaps there are
interesting QSAR problems which were deemed insoluble by ML
techniques because of poor data quality, in particular sparsity.
Armed with PMMMF, these might be good problems to revisit
algorithm combination--the rubric of post-processing techniques that involve combining the results of two or more
classifiers was generally known to ML practitioners prior to the
NetFlix Prize but in fact these techniques were rarely used. The most
widely used of these techniques are AdaBoost, Gradient Boosting, and
Bagging (bootstrap aggregation). I wonder if there are some QSAR
problems for which the state-of-the-art ML techniques have not quite
provided the resolution or prediction accuracy required by the
problem context; if so, it would certainly be interesting to know if
those results could be improved by combining classifiers. Aside from their often dramatic improvement in prediction accuracy, an additional advantage of these techniques is that many of them are very simple to implement. For instance, boosting works like this: train your classifier for some number of epochs and look at the results; identify those data points in your training data that caused the poorest resolution by your classifier--i.e., the data points it consistently predicted incorrectly over many epochs; apply a higher weight to those training instances (i.e., penalize your classifier more heavily for an incorrect prediction) and re-train your classifier with this "new" data set. Bagging, by contrast, trains each classifier on a bootstrap resample of the training data and combines their predictions by voting or averaging.
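As a hedged illustration of how little code a combination technique needs, here is a sketch of bagging (bootstrap aggregation) in plain Python. The base learner (a one-feature threshold "stump"), the toy data, and the function names are assumptions made for the example; nothing here comes from the NetFlix Prize solution.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Choose the threshold and orientation on one feature with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Hypothetical 1-D data: class 0 at low values, class 1 at high values
xs = [0.1, 0.4, 0.8, 1.2, 1.9, 2.3, 2.7, 3.1, 3.6, 4.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predict = bagging(xs, ys)

low_label = predict(0.0)
high_label = predict(5.0)
```

Each stump sees a slightly different resample, so their individual thresholds differ; the vote smooths out those differences, which is where the variance reduction of bagging comes from.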