Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I'm developing software with a friend at university. The problem is that we are new to AI: we have not taken any AI courses yet (I think we will have them next year).
Our professor advised us to research the algorithm before we start and to report what we find so that we can use it, so I want to find a good one. We are building this software with machine learning, and here is what we want: suppose I have 100 students' report cards and want to rank them from best to worst, but using machine learning.
From the exercise, it appears that I need to use the "marks" and "opinions" of the professors who filled in the report cards, and also the "class". In the software, I need to show other professors who is best and who is worst, but according to the characteristics that matter to them. For example:
student 1: 10 in maths, 19 in science
student 2: 10 in science, 19 in maths
student 3: 10 in science, 19 in maths, but in a lower class than student 2
The science professor will see student 1 first, then student 2.
The maths professor will see student 2 first, then student 3, then student 1.
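To pin down the requirement, note that the orderings in this example can be reproduced with an ordinary sort, before any learning is involved. A minimal sketch, assuming a hypothetical numeric `class_level` (higher means a higher class) and assuming students 1 and 2 are in the same class, since the example does not say:

```python
# Hypothetical encoding of the three students from the example.
# "class_level" is an assumed numeric stand-in for "class": higher = higher class.
students = [
    {"name": "student 1", "maths": 10, "science": 19, "class_level": 2},
    {"name": "student 2", "maths": 19, "science": 10, "class_level": 2},
    {"name": "student 3", "maths": 19, "science": 10, "class_level": 1},
]

def ranking_for(subject, students):
    """Order students by their mark in `subject`, breaking ties by class level."""
    ordered = sorted(
        students,
        key=lambda s: (s[subject], s["class_level"]),
        reverse=True,  # best first
    )
    return [s["name"] for s in ordered]

print(ranking_for("maths", students))    # ['student 2', 'student 3', 'student 1']
print(ranking_for("science", students))  # ['student 1', 'student 2', 'student 3']
```

A learned model only becomes necessary when the ranking rule itself (for example, how much the professors' "opinions" should weigh) is not known in advance.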
What algorithm do we need, and why? We have read a lot about machine learning algorithms, but I don't know which one is best to use in this case.
Thank you in advance for your help.
This very much depends on the data you have and what exactly you want to predict.
Are the features continuous or categorical? Do you want to predict a continuous or a categorical value?
Let's assume that your report cards have categorical features and you want to predict a binary target such as "best" or "worst"; then look into binary classification. If you need to predict grades (e.g. 1, 2, 3, etc.), then look into multi-class (categorical) classification.
If you want to sort all the report cards by estimating a continuous value, then you need regression.
For all these you have a wide variety of algorithms like Linear Regression, Decision Trees, Random Forests, Naive Bayes, Support Vector Machines or even Neural Networks.
Have a look at scikit-learn's "Choosing the right estimator" cheat sheet to get a first intuition of your choices.
This cheat sheet from Microsoft is very good too.
From what I can derive from your question I would first check a simple Linear Regression.
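A minimal sketch of that suggestion with NumPy, assuming the report cards can be encoded as numeric features (maths mark, science mark, class level) and that there is a hypothetical numeric target to learn from, such as an overall score the professors assigned to past report cards. The training data below is invented for illustration. The model is fit by ordinary least squares, and new students are ranked by their predicted score:

```python
import numpy as np

# Hypothetical training data: one row per past report card.
# Columns: maths mark, science mark, class level (higher = higher class).
X_train = np.array([
    [12, 15, 1],
    [18,  9, 2],
    [ 8, 17, 2],
    [15, 14, 3],
    [10, 10, 1],
], dtype=float)
# Hypothetical target: an overall score the professors gave each card.
y_train = np.array([11.5, 13.7, 11.1, 14.7, 9.0])

def add_intercept(X):
    """Prepend a column of ones so the model also learns a bias term."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

# Ordinary least squares: find w minimising ||A w - y||^2.
A = add_intercept(X_train)
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def rank(names, X):
    """Sort students from best to worst by predicted score."""
    scores = add_intercept(X) @ w
    order = np.argsort(-scores)  # descending order of predicted score
    return [names[i] for i in order]

names = ["student 1", "student 2", "student 3"]
X_new = np.array([[10, 19, 2], [19, 10, 2], [19, 10, 1]], dtype=float)
print(rank(names, X_new))  # ['student 2', 'student 3', 'student 1']
```

Because the hypothetical target is a single overall score, this produces one ranking; a per-subject view like the one in the question would need a separate target (or model) per professor.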
Well, I'm relatively new to machine learning but in your scenario I'd use logistic regression since you are classifying the report cards in terms of best and worst.
Hope I have been of help
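A minimal from-scratch sketch of logistic regression trained by batch gradient descent, assuming hypothetical report-card features (maths and science marks divided by 20) and hypothetical labels where 1 means "best" and 0 means "worst". In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead:

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that report card x belongs to the "best" class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: (maths / 20, science / 20); label 1 = "best", 0 = "worst".
X = [[0.90, 0.85], [0.80, 0.90], [0.75, 0.70],
     [0.30, 0.40], [0.35, 0.20], [0.20, 0.30]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(X, y)
new_card = [0.85, 0.80]
print(predict_proba(weights, bias, new_card))  # probability of "best"
```

Besides the best/worst decision, the predicted probability can also serve as a score for sorting report cards.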
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In his well-known paper "A Few Useful Things to Know about Machine Learning", Pedro Domingos writes that "machine learning systems automatically learn programs from data."
But in my experience, we are the ones supplying the algorithms, such as ANNs or SVMs.
So how is it "automating automation"?
Could someone shed some light on this with an example?
When you develop a machine learning algorithm, with an ANN, an SVM, or anything else, you don't tell your program how to solve your problem; you tell it how to learn to solve the problem.
SVMs and ANNs are ways to learn a solution to a problem, not ways to solve the problem directly.
So when people say "machine learning systems automatically learn programs from data", they mean that you never program a solution to your problem; instead, you let the computer learn to do so.
To quote Wikipedia: "Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed."
https://en.wikipedia.org/wiki/Machine_learning
[Edit]
For example, let's take one of the simplest machine learning algorithms: linear regression in a 2D space.
The aim of this algorithm is to learn a linear function from a dataset of (x, y) pairs, so that when you give your system a new x, you get an approximation of what the real y would be.
But when you code a linear regression, you never write down the specific function y = ax + b (that is, the values of a and b). What you code is a way for the program to deduce them from the dataset.
The linear function y = ax + b is the solution to your problem; the linear regression code is the way you learn that solution.
https://en.wikipedia.org/wiki/Linear_regression
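A minimal from-scratch sketch of the 2D case in Python, using the closed-form ordinary least squares solution. The code contains only the learning procedure; the values of a and b come out of the (made-up) dataset:

```python
def fit_line(xs, ys):
    """Learn a and b of y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # The fitted line passes through the point (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b

# The program is never told "a = 2, b = 1"; it deduces them from the data.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)        # 2.0 1.0
print(a * 10 + b)  # prediction for a new x = 10: 21.0
```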
Machine learning development helps to improve business operations as well as business scalability. A number of ML algorithms and artificial intelligence tools have gained tremendous popularity in the business analytics community. The machine learning market has grown thanks to faster and cheaper computational processing, the easy availability of data, and affordable data storage.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I am new to the domain of machine learning, and I have noticed that there are a lot of algorithms (or sets of algorithms) that can be used: SVMs, decision trees, naive Bayes, perceptrons, etc.
That is why I wonder which algorithm one should use for which issue. In other words, which algorithm solves which class of problem?
So my question is: do you know a good website or book that focuses on this algorithm selection problem?
Any help would be appreciated. Thanks in advance.
Horace
Take Andrew Ng's machine learning course on Coursera. It's beautifully put together, explains the differences between the different types of ML algorithms, gives advice on when to use each one, and contains material useful for practitioners as well as the maths if you want it. I'm in the process of learning machine learning myself, and it has been by far the most useful resource.
(Another piece of advice you might find useful is to consider learning Python. This is based on a mistake I made: by not starting to learn Python earlier, I ruled out the many books, web pages, SDKs, etc. that are Python-based. As it turns out, Python is pretty easy to pick up and, from my own observations at least, widely used in the machine learning and data science communities.)
scikit-learn.org published this infographic, which can be helpful even when you're not using the sklearn library.
@TooTone: In my opinion, Machine Learning in Action could help the OP decide which technique to use for a particular problem, as the book gives a clear classification of the different ML algorithms, with pros, cons, and "works with" for each of them. I do agree the code is somewhat hard to read, especially for people not used to matrix operations. There are years of research condensed into a 10-line Python program, so be prepared for it to take a day to understand (it did for me, at least).
It is very hard to answer the question “which algorithm for which issue?”
That ability comes with a lot of experience and knowledge, so I suggest reading a few good books about machine learning. The following book would probably be a good starting point:
Machine Learning: A Probabilistic Perspective
Once you have some knowledge of machine learning, you can work on a couple of simple machine learning problems. The Iris flower dataset is a good starting point. It consists of several features describing flowers from three Iris species. First develop a simple machine learning model (such as logistic regression) to classify the Iris species, and then gradually move to more advanced models such as neural networks.
As a simple starting point, I consider what inputs I have and what outputs I want, which often narrows down the choices in any situation. For example, if I have categories rather than numbers, and a target category for each input, decision trees are a good idea. If I have no target, I can only do clustering. If I have numerical inputs and a numerical output, I could use neural networks or other types of regression. I could also use decision trees that generate regression equations. There are further questions to ask after this, but it's a good place to start.
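The rule of thumb above can be written down as a tiny lookup, which makes the branching explicit. This is a hypothetical helper (`candidate_methods` is not from any library) that covers only the cases the answer mentions:

```python
def candidate_methods(has_target, input_kind=None, target_kind=None):
    """Hypothetical encoding of the inputs/outputs rule of thumb.

    input_kind and target_kind are "categorical" or "numerical".
    """
    if not has_target:
        # No target value to learn from: only unsupervised grouping is possible.
        return ["clustering"]
    if input_kind == "categorical" and target_kind == "categorical":
        return ["decision trees"]
    if input_kind == "numerical" and target_kind == "numerical":
        return ["neural networks", "other regression methods",
                "decision trees that generate regression equations"]
    # Any other combination needs the "further questions" mentioned above.
    return []

print(candidate_methods(has_target=False))                   # ['clustering']
print(candidate_methods(True, "categorical", "categorical"))  # ['decision trees']
```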
The following DZone Refcard might also be helpful: http://refcardz.dzone.com/refcardz/machine-learning-predictive. But you will eventually have to dig into each algorithm in detail.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
If a survey is given to determine overall customer satisfaction, and there are 20 general questions and a final summary question: "What's your overall satisfaction 1-10", how could it be determined which questions are most significantly related to the summary question's answer?
In short, which questions actually mattered and which ones were just wasting space on the survey...
The weights that linear classification and regression models assign to features give information about how relevant those features are.
For your specific application, you could try training an L1 or L0 regularized regressor (http://en.wikipedia.org/wiki/Least-angle_regression, http://en.wikipedia.org/wiki/Matching_pursuit). These regularizers force many of the regression weights to zero, which means that the features associated with these weights can be effectively ignored.
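A minimal sketch of the L1 case in pure Python. Note that it swaps in a different solver: cyclic coordinate descent for the Lasso objective, rather than the least-angle regression algorithm linked above. The survey data is hypothetical: two centred question scores and a centred overall score that depends almost entirely on the first question. With a large enough penalty `alpha`, the weight of the nearly irrelevant question becomes exactly zero:

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and y are already centred (no intercept term).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Hypothetical centred survey data: two questions and the overall score.
q1 = [-1, -1, 1, 1]
q2 = [-1, 1, -1, 1]
X = [[a, b] for a, b in zip(q1, q2)]
overall = [-3.1, -2.9, 2.9, 3.1]

print(lasso_coordinate_descent(X, overall, alpha=0.0))  # plain least squares: both weights non-zero
print(lasso_coordinate_descent(X, overall, alpha=0.5))  # L1 penalty: second weight driven to 0
```

The penalty also shrinks the surviving weight, so the weights are better read as a relevance signal than as exact effect sizes.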
There are many different approaches for answering this question and at varying levels of sophistication. I would start by calculating the correlation matrix for all pair-wise combinations of answers, thereby indicating which individual questions are most (or most negatively) correlated with the overall satisfaction score. This is pretty straightforward in Excel with the Analysis ToolPak.
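A minimal sketch of that first step in pure Python, assuming hypothetical answers from five respondents to three questions plus the overall score. Questions are ranked by the absolute value of their Pearson correlation with the overall score, so strongly negative relationships count as well:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical answers from five respondents (1-10 scale).
answers = {
    "q1": [9, 7, 5, 3, 1],
    "q2": [2, 8, 5, 6, 4],
    "q3": [2, 3, 5, 8, 9],
}
overall = [10, 8, 6, 4, 2]

# Most strongly related questions first, whatever the sign.
ranked = sorted(answers, key=lambda q: abs(pearson(answers[q], overall)), reverse=True)
for q in ranked:
    print(q, round(pearson(answers[q], overall), 2))
```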
Next, I would look into clustering techniques starting simple and moving up in sophistication only if necessary. Not knowing anything about the domain to which this survey data applies it is hard to say which algorithm would be the most effective, but for starters I would look at k-means and variants if your clusters are likely to all be similarly-sized. However, if a vast majority of the responses are very similar, I would look into expectation-maximization-based algorithms. A good open-source toolkit for exploring data and testing the efficacy of various algorithms is called Weka.
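A minimal sketch of plain k-means (Lloyd's algorithm) in pure Python, assuming each respondent is represented by a hypothetical vector of answer scores. For simplicity it initialises the centroids with the first k points; library implementations typically use smarter seeding such as k-means++:

```python
def kmeans(points, k, n_iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # simplistic initialisation
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical respondents: (score for question A, score for question B).
points = [(9, 8), (2, 1), (8, 9), (1, 2), (9, 9), (2, 2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```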
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I have recently started studying machine learning and found that I need to refresh probability basics such as conditional probability, Bayes' theorem, etc.
I am looking for online resources where I can quickly brush up on probability concepts with respect to machine learning.
The online resources I have stumbled upon are either very basic or too advanced.
This might help: http://www.cs.cmu.edu/~tom/10601_fall2012/lectures.shtml
The above link is from Tom Mitchell's Machine Learning class at CMU. Videos are available too. You will gain a very good understanding of ML concepts if you go through all the videos (or just the first few, for conditional probability, Bayes' theorem, etc.).
The notions of conditional probability and Bayes' theorem are very basic in themselves; you might say it doesn't get any more basic than that in probabilistic modeling. This suggests that you didn't look too closely at what you found, or didn't really search at all.
Off the top of my head, I can name two resources. First, any Coursera course dealing with probability or machine learning (see AI, Statistics One, or Probabilistic Graphical Models) covers these preliminaries. Second, a number of books on statistics are freely available online, one example being Information Theory, Inference, and Learning Algorithms.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 1 year ago.
Where is ANN classification (regression) better than SVM? Some real-world examples?
There are many applications where they're better, many applications where they're comparable, many applications where they are worse. It also depends on who you ask. It is hard to say this type of data or that type of data/application.
An example where ANNs, in particular convolutional neural networks, work better than SVMs is digit classification on MNIST. Another such case is the work of Geoff Hinton's group on speech recognition using Deep Belief Networks.
Recently I read a paper proving the theoretical equivalence between ANNs and SVMs. However, ANNs are usually slower than SVMs.
I am just finishing an out-of-the-box comparison between support vector machines and neural networks on several popular regression and classification datasets. First results, in short: SVMs learn fast and predict slowly; neural networks learn slowly but predict fast and have very lightweight models. In terms of accuracy/loss, both methods seem to be on par.
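One common explanation for the prediction-speed difference described above can be seen in the two decision functions. A minimal sketch, assuming an RBF-kernel SVM and a one-hidden-layer ReLU network with hypothetical, already-trained parameters: the SVM's prediction loops over every stored support vector, whose count tends to grow with the training set, while the network's cost depends only on its layer sizes. This is an illustration, not a reproduction of the benchmark mentioned in the answer:

```python
import math

def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVM score: one kernel evaluation per stored support vector."""
    return sum(c * rbf(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias

def mlp_forward(x, W1, b1, w2, b2):
    """One-hidden-layer MLP score: cost fixed by the layer sizes, not the training-set size."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU units
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

x = [0.5, -1.0]
sv = [[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]]  # hypothetical support vectors
print(svm_decision(x, sv, [0.7, -0.4, 0.2], bias=0.1, gamma=0.5))
print(mlp_forward(x, W1=[[1.0, -1.0], [0.5, 0.5]], b1=[0.0, 0.1], w2=[1.0, -2.0], b2=0.0))
```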
It largely depends, as the two have different trade-offs and design criteria. There has been some work showing the relationship between them (some say equivalence), as seen in other answers to this question. Below is another reference that draws links between these two machine learning techniques:
Ronan Collobert and Samy Bengio. 2004. Links between perceptrons, MLPs and SVMs. In Proceedings of the twenty-first international conference on Machine learning (ICML '04). ACM, New York, NY, USA, 23-. DOI: https://doi.org/10.1145/1015330.1015415