Machine learning algorithms: which algorithm for which issue? [closed] - machine-learning

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I am new to the domain of machine learning, and I have noticed that there are a lot of algorithms (or families of algorithms) that can be used: SVM, decision trees, naive Bayes, perceptron, etc.
That is why I wonder which algorithm one should use for which problem. In other words, which algorithm solves which class of problems?
So my question is: do you know a good website or book that focuses on this algorithm-selection problem?
Any help would be appreciated. Thanks in advance.
Horace

Take Andrew Ng's machine learning course on Coursera. It's beautifully put together, explains the differences between different types of ML algorithms, gives advice on when to use each algorithm, and contains material useful for practitioners as well as the maths if you want it. I'm in the process of learning machine learning myself, and this has been by far the most useful resource.
(Another piece of advice you might find useful is to consider learning Python. This is based on a mistake I made: I didn't start learning Python early enough, which ruled out the many books, web pages, SDKs, etc. that are Python-based. As it turns out, Python is pretty easy to pick up and, from my own personal observations at least, widely used in the machine learning and data science communities.)

scikit-learn.org published this infographic, which can be helpful even when you're not using the sklearn library.

@TooTone: In my opinion, Machine Learning in Action could help the OP decide which technique to use for a particular problem, as the book gives a clear classification of the different ML algorithms, with pros, cons, and "works with" for each of them. I do agree the code is somewhat hard to read, especially for people not used to matrix operations. There are years of research condensed into a 10-line Python program, so be prepared for understanding it to take a day (for me at least).

It is very hard to answer the question “which algorithm for which issue?”
That ability comes with a lot of experience and knowledge, so I suggest you read a few good books about machine learning. The following book would probably be a good starting point:
Machine Learning: A Probabilistic Perspective
Once you have some knowledge of machine learning, you can work on a couple of simple machine learning problems. The Iris flower dataset is a good starting point. It consists of four numerical features (sepal length, sepal width, petal length and petal width) measured on flowers from three Iris species. Initially, develop a simple machine learning model (such as logistic regression) to classify the Iris species, and then gradually move on to more advanced models such as neural networks.
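A minimal sketch of that progression in Python, assuming scikit-learn is installed (the answer does not name a library; scikit-learn is simply the one mentioned elsewhere on this page). The split ratio, the `random_state` values and the single hidden layer of 16 units are illustrative choices, not recommendations.

```python
# Sketch: a simple baseline on Iris first, then a small neural network.
# Assumes scikit-learn is installed; hyperparameters are illustrative only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 150 samples, 4 numerical features, 3 species encoded as classes 0, 1, 2
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 1: a simple linear model as the baseline
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)  # accuracy on held-out data

# Step 2: a more flexible model (one hidden layer of 16 units)
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)
mlp_acc = mlp.score(X_test, y_test)

print(f"logistic regression: {baseline_acc:.3f}")
print(f"neural network:      {mlp_acc:.3f}")
```

On a dataset this small, the two models may well score about the same; the point is the workflow (a baseline first, then a more complex model evaluated on the same split), not the particular numbers.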

As a simple starting place, I consider what inputs I have and what outputs I want, which often narrows down the choices in any situation. For example, if I have categories rather than numbers, and a target category for each input, decision trees are a good idea. If I have no target, I can only do clustering. If I have numerical inputs and a numerical output, I could use neural networks or other types of regression. I could also use decision trees that generate regression equations. There are further questions to be asked after this, but it's a good place to start.
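The rule of thumb above can be written down as a tiny lookup, mainly to make the decision order explicit. The function name `suggest_model_families` and its string labels are hypothetical, and the returned lists contain only the examples given in this answer, not an exhaustive catalogue.

```python
# Hypothetical helper encoding the rule of thumb above: look at the inputs you
# have and the output you want, and use them to narrow down the candidates.
from typing import List, Optional

_KINDS = {"categorical", "numerical"}


def suggest_model_families(input_kind: str, target_kind: Optional[str]) -> List[str]:
    """Return candidate model families for the given kinds of input and target.

    input_kind:  "categorical" or "numerical"
    target_kind: "categorical", "numerical", or None when there is no target
    """
    if input_kind not in _KINDS:
        raise ValueError(f"unknown input_kind: {input_kind!r}")
    if target_kind is not None and target_kind not in _KINDS:
        raise ValueError(f"unknown target_kind: {target_kind!r}")

    if target_kind is None:
        # No target to learn from: only unsupervised methods such as clustering
        return ["clustering"]
    if input_kind == "categorical" and target_kind == "categorical":
        return ["decision tree"]
    if input_kind == "numerical" and target_kind == "numerical":
        return [
            "neural network",
            "other regression",
            "model tree (decision tree with regression equations)",
        ]
    # Combinations this rule of thumb does not cover: ask further questions
    return []


print(suggest_model_families("categorical", "categorical"))  # ['decision tree']
print(suggest_model_families("numerical", None))             # ['clustering']
```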

The following DZone Refcard might also be helpful: http://refcardz.dzone.com/refcardz/machine-learning-predictive. But you will have to dig into each algorithm in detail eventually.

Related

Which supervised machine learning classification method suits randomly spread classes? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
If the classes are randomly spread, or the data contain a lot of noise, which type of supervised ML classification model will give better results, and why?
It is difficult to say which classifier will perform best on problems in general. It often requires testing a variety of algorithms on a given problem to determine which classifier performs best.
Best performance also depends on the nature of the problem. There is a great answer in this Stack Overflow question which looks at various scoring metrics. For each problem, one needs to understand and consider which scoring metric will be most appropriate.
All of that said, neural networks, Random Forest classifiers, Support Vector Machines, and a variety of others are all candidates for creating useful models, given that the classes are, as you indicated, equally distributed. When classes are imbalanced, the rules shift slightly, as most ML algorithms implicitly assume balanced classes.
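To illustrate why the choice of scoring metric matters, especially once classes are imbalanced, here is a plain-Python sketch (no library assumed) comparing accuracy with F1 for the positive class. The labels are made-up toy data: 95 negatives, 5 positives, and a "classifier" that always predicts the majority class.

```python
# Accuracy vs. F1 on imbalanced classes. The 95/5 split is made-up toy data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [0] * 95 + [1] * 5  # 95 negatives, 5 positives
y_pred = [0] * 100           # always predict the majority class

print(accuracy(y_true, y_pred))     # 0.95
print(f1_positive(y_true, y_pred))  # 0.0
```

The useless classifier scores 95% accuracy but an F1 of zero for the minority class, which is why the metric has to be chosen with the problem in mind.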
My suggestion would be to try a few different algorithms and tune their hyperparameters, to compare them for your specific application. You will often find that one algorithm is better, but not remarkably so. In my experience, what is often of far greater importance is how your data are preprocessed and how your features are prepared. Once again, this is a highly generic answer, as it depends greatly on your application.
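A minimal sketch of that suggestion, assuming scikit-learn: each candidate gets a small cross-validated grid search on the same data, and preprocessing (scaling, for the SVM) sits inside a pipeline so that it is refitted within each fold. The synthetic dataset, with `flip_y` adding label noise, and the parameter grids are illustrative assumptions.

```python
# Compare a few classifiers on the same noisy data, tuning each one's
# hyperparameters with 5-fold cross-validation. Assumes scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary problem; flip_y randomly reassigns 10% of labels (noise)
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)

candidates = {
    "svm": (
        make_pipeline(StandardScaler(), SVC()),  # step name is "svc"
        {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [100], "max_depth": [None, 5]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="f1")
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(f"{name}: mean CV F1 = {score:.3f} with {params}")
```

The best cross-validation score from a grid search is optimistically biased, so a final comparison is better made on held-out data or with nested cross-validation.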

How to implement feature extraction in Julia [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
I am trying to build a binary classifier using machine learning, and I want to derive additional features for my data from the correlated numerical features (attributes) I already have. I searched a lot but could not find a block of code that works for me.
What should I do?
I searched for dimensionality reduction and found a library (Multivariate Statistics), but I did not really understand it and felt lost :D
No one can choose the exact method for you. There are many, many different ways of doing binary classification and feature extraction. If you feel overwhelmed by all the names that libraries such as Multivariate Statistics offer, take a look at a textbook on statistics and machine learning; understanding the methods is independent of the programming language.
Start with some simple methods such as principal component analysis (PCA) (MultivariateStats.jl provides it), then test others as you gain more knowledge of your data and the methods.
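PCA itself is language-independent, so here is a minimal Python/NumPy sketch of what it does to correlated features (MultivariateStats.jl provides PCA in Julia; its API is not shown here). The `pca` helper and the toy data are illustrative assumptions: two strongly correlated columns plus one independent column are turned into uncorrelated components, and the first component captures most of the variance.

```python
# Sketch of PCA via the singular value decomposition, in NumPy.
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Return (projected data, components, explained variance ratio)."""
    X_centered = X - X.mean(axis=0)  # PCA works on centred data
    # Rows of Vt are the principal directions, sorted by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T  # new features for a classifier
    explained = S**2 / np.sum(S**2)        # share of variance per component
    return projected, components, explained[:n_components]


# Toy data: two strongly correlated features plus one independent feature
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([
    a,
    2 * a + 0.05 * rng.normal(size=200),
    rng.normal(size=200),
])

Z, comps, ratio = pca(X, n_components=2)
print(Z.shape)  # (200, 2)
print(ratio)
```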
Some Julia libraries to take a look at: JuliaStats (https://github.com/JuliaStats), with its packages
StatsBase for the most basic stuff
MultivariateStats for methods like PCA
StatsModels (and DataFrames) for statistical models
and many more ...
For neural networks there are Flux.jl and Knet.jl
For clustering there is Clustering.jl
Then there are also bindings to the Python libraries TensorFlow (neural networks and more) and scikit-learn (all kinds of ML algorithms)
There are many more projects, but these are some that I think are important.

Machine learning systems [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
In his well-known paper "A Few Useful Things to Know about Machine Learning", Pedro Domingos writes that machine learning systems automatically learn programs from data.
But in my experience, we are the ones giving the system algorithms like ANN or SVM.
My question is: how is it automating automation?
Could someone shed some light on this with an example?
When you develop a machine learning algorithm, with ANN or SVM or whatever, you don't tell your program how to solve your problem; you tell it how to learn to solve the problem.
SVM or ANN are ways to learn a solution to a problem, not ways to solve the problem directly.
So when people say "Machine learning systems automatically learn programs from data", they mean that you never programmed a solution to your problem, but rather let the computer learn to do so.
To quote Wikipedia: "Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed"
https://en.wikipedia.org/wiki/Machine_learning
[Edit]
For example, let's take one of the simplest machine learning algorithms: linear regression in a 2D space.
The aim of this algorithm is to learn a linear function from a dataset of (x, y) pairs, so that when you give your system a new x, you get an approximation of what the real y would be.
But when you code a linear regression, you never specify the linear function y = ax + b, i.e. the values of a and b. What you code is a way for the program to deduce them from the dataset.
The linear function y = ax + b is the solution to your problem; the linear regression code is the way you are going to learn that solution.
https://en.wikipedia.org/wiki/Linear_regression
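To make the point concrete, here is a minimal pure-Python sketch of simple linear regression by ordinary least squares. The code never contains the values of a or b, only the procedure for estimating them from data. The function name `fit_line` and the data points are illustrative.

```python
# The program is never told a or b; it estimates them from (x, y) pairs
# with ordinary least squares. The data points are made up for illustration.

def fit_line(xs, ys):
    """Return (a, b) minimising the squared error of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x      # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated by y = 2x + 1, but the code doesn't know that
a, b = fit_line(xs, ys)
print(a, b)       # 2.0 1.0
print(a * 10 + b)  # prediction for a new x = 10: 21.0
```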
Machine learning development helps to improve business operations as well as business scalability. A number of ML algorithms and artificial intelligence tools have gained tremendous popularity in the business analytics community. There has been a rise in the machine learning market due to faster and cheaper computational processing, easy availability of data, and affordable data storage.

Probability basics for machine learning [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I have recently started studying machine learning and found that I need to refresh probability basics such as conditional probability, Bayes' theorem, etc.
I am looking for online resources where I can quickly brush up on probability concepts with respect to machine learning.
The online resources I stumbled upon are either very basic or too advanced.
This might help: http://www.cs.cmu.edu/~tom/10601_fall2012/lectures.shtml
The above link is from Tom Mitchell's Machine Learning class at CMU. Videos are available too. You will gain a very good understanding of ML concepts if you go through all the videos (or just the first few, for conditional probability, Bayes' theorem, etc.).
The notions of conditional probability and Bayes' theorem are very basic themselves. It doesn't get any more basic than that in probabilistic modeling, you might say. Which suggests that you didn't look too well at what you've found, or didn't really do any search at all.
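For a quick refresher, here is a minimal worked example of both ideas in Python, using exact fractions. The scenario and numbers are invented for illustration (1% prevalence, a test with a 90% true-positive rate and a 10% false-positive rate): a positive result raises the probability of the condition from 1/100 to only 1/12, because false positives from the large healthy group dominate.

```python
# Conditional probability and Bayes' theorem with exact fractions.
# The probabilities below are made up for illustration.
from fractions import Fraction

p_d = Fraction(1, 100)               # P(D): prior probability of the condition
p_pos_given_d = Fraction(9, 10)      # P(+ | D)
p_pos_given_not_d = Fraction(1, 10)  # P(+ | not D)

# Law of total probability: P(+) = P(+|D) P(D) + P(+|not D) P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(p_pos)          # 27/250
print(p_d_given_pos)  # 1/12
```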
Off the top of my head, I can name two resources: first, any Coursera course dealing with probabilities or machine learning (see AI, Statistics One or Probabilistic Graphical Models) contains these preliminaries. Second, there are a number of books on statistics freely available online, one example being Information Theory, Inference, and Learning Algorithms.

ANN and SVM classification [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Where is ANN classification (or regression) better than SVM? Are there real-world examples?
There are many applications where they're better, many applications where they're comparable, and many applications where they are worse. It also depends on who you ask. It is hard to single out one particular type of data or application.
An example where ANNs, in particular convolutional neural networks, work better than SVMs is digit classification on MNIST. Another such case is the work of Geoff Hinton's group on speech recognition using Deep Belief Networks.
Recently I read a paper proving the theoretical equivalence between ANN and SVM. However, ANN is usually slower than SVM.
I am just finishing an out-of-the-box comparison between support vector machines and neural networks on several popular regression and classification datasets. First results, in short: SVMs learn fast and predict slowly; neural networks learn slowly but predict fast and have very lightweight models. Concerning accuracy/loss, both methods seem to be on par.
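Part of that trade-off can be explained by what each model stores. A minimal sketch, assuming scikit-learn and synthetic data: a kernel SVM keeps a subset of the training points as support vectors and evaluates the kernel against each of them at prediction time, while an MLP's size is fixed by its architecture. The dataset, kernel and layer size are assumptions, and timings depend on hardware, data and settings, so the printed numbers are not a benchmark.

```python
# What a kernel SVM and an MLP keep after training. Assumes scikit-learn.
import time

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def time_predict(model, X, repeats=5):
    """Average wall-clock time of model.predict(X) over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(X)
    return (time.perf_counter() - start) / repeats


X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

svm = SVC(kernel="rbf").fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

# SVM model size grows with the number of support vectors (20 values each)
n_support_vectors = svm.support_vectors_.shape[0]
svm_stored_values = svm.support_vectors_.size

# MLP model size depends only on the architecture: 20 -> 32 -> 1 output unit
mlp_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)

print("support vectors:", n_support_vectors, "-> stored values:", svm_stored_values)
print("MLP parameters:", mlp_params)
print(f"SVM predict: {time_predict(svm, X):.4f} s, MLP predict: {time_predict(mlp, X):.4f} s")
```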
It will largely depend on the problem, as both have different trade-offs and design criteria. There has been some work showing the relationship between the two, and some claim equivalence, as seen in other answers to this question. Below is another reference which draws links between these two techniques in machine learning:
Ronan Collobert and Samy Bengio. 2004. Links between perceptrons, MLPs and SVMs. In Proceedings of the twenty-first international conference on Machine learning (ICML '04). ACM, New York, NY, USA, 23-. DOI: https://doi.org/10.1145/1015330.1015415
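A commonly cited link between the two, and one that this reference studies, is that a linear SVM can be viewed as a perceptron trained with a margin-based (hinge) loss plus weight decay. The NumPy sketch below illustrates that view: the same stochastic update loop trains both models, and only the loss changes. It is not code from the paper; the function `train_linear`, the toy blobs and the hyperparameters are assumptions.

```python
# A perceptron and a linear SVM trained by the same stochastic loop,
# differing only in the loss (labels y in {-1, +1}, f(x) = w.x + b):
#   perceptron criterion: max(0, -y * f(x))
#   linear SVM:           max(0, 1 - y * f(x)) + (lam / 2) * ||w||^2
# Step size, lam, epochs and the toy data are illustrative assumptions.
import numpy as np


def train_linear(X, y, loss="hinge", lam=0.01, lr=0.1, epochs=20, seed=0):
    if loss not in ("hinge", "perceptron"):
        raise ValueError(f"unknown loss: {loss!r}")
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    margin = 1.0 if loss == "hinge" else 0.0  # perceptron has no margin term
    reg = lam if loss == "hinge" else 0.0     # perceptron has no weight decay
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w -= lr * reg * w  # gradient step on (lam / 2) * ||w||^2
            if y[i] * (X[i] @ w + b) <= margin:
                # Loss is active for this point: subgradient step
                w += lr * y[i] * X[i]
                b += lr * y[i]
    return w, b


# Two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2)),
])
y = np.array([1] * 50 + [-1] * 50)

results = {}
for loss in ("perceptron", "hinge"):
    w, b = train_linear(X, y, loss=loss)
    scores = X @ w + b
    accuracy = np.mean(np.sign(scores) == y)
    min_margin = np.min(y * scores) / np.linalg.norm(w)  # smallest geometric margin
    results[loss] = (accuracy, min_margin)
    print(f"{loss}: training accuracy {accuracy:.2f}, smallest margin {min_margin:.3f}")
```

The perceptron stops updating as soon as every point is on the correct side, whereas the hinge version keeps nudging points until they lie outside the margin while weight decay keeps the weights small, which is where the SVM's margin maximisation comes from.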
